#python #pandas #pandas-groupby
Вопрос:
допустим, у меня есть такой фрейм данных
eff_date,mdl_cd,ast_cd,prop_cd,value
2021-09-22,Comm,Agri,Car,-0.1234
2021-09-22,Comm,Agri,Fund,0.5123
2021-09-22,Comm,Agri,Mmt,-0.7612
2021-09-22,Comm,Engy,Car,0.1212
2021-09-22,Comm,Engy,Fund,-0.1234
2021-09-22,Comm,Engy,Mmt,0.5123
2021-09-22,Comm,Industry,Car,-0.7612
2021-09-22,Comm,Industry,Fund,0.1212
2021-09-22,Comm,Industry,Mmt,-0.1234
2021-09-22,Comm,Metal,Car,0.5123
2021-09-22,Comm,Metal,Fund,-0.7612
2021-09-22,Comm,Metal,Mmt,0.1212
2021-09-23,Equity,Agri,Car,0.6541
2021-09-23,Equity,Agri,Fund,0.5123
2021-09-23,Equity,Agri,Mmt,-0.1874
2021-09-23,Equity,Engy,Car,0.1212
2021-09-23,Equity,Engy,Fund,-0.6234
2021-09-23,Equity,Engy,Mmt,0.5123
2021-09-23,Equity,Industry,Car,-0.1612
2021-09-23,Equity,Industry,Fund,0.1212
2021-09-23,Equity,Industry,Mmt,-0.1934
2021-09-23,Equity,Metal,Car,0.5123
2021-09-23,Equity,Metal,Fund,0.5412
2021-09-23,Equity,Metal,Mmt,0.1212
Я хотел добавить новую строку в каждую группу groupby(by=[‘eff_date’,’mdl_cd’,’ast_cd’])
в каком столбце значение для eff_date,mdl_cd
и ast_cd
останется same
, но для prop_cd
значения станет Hlds
, а значение столбца value станет суммой значений этой группы, например, для первой группы значение столбца value будет (-0.1234 0.5123 -0.7612 ) т.е. -0,3723
следовательно, результат будет таким
eff_date,mdl_cd,ast_cd,prop_cd,value
2021-09-22,Comm,Agri,Car,-0.1234
2021-09-22,Comm,Agri,Fund,0.5123
2021-09-22,Comm,Agri,Mmt,-0.7612
2021-09-22,Comm,Agri,Hlds,-0.3723 row added (sum of value in that group)
2021-09-22,Comm,Engy,Car,0.1212
2021-09-22,Comm,Engy,Fund,-0.1234
2021-09-22,Comm,Engy,Mmt,0.5123
2021-09-22,Comm,Engy,Hlds,0.5101 row added (sum of value in that group)
2021-09-22,Comm,Industry,Car,-0.7612
2021-09-22,Comm,Industry,Fund,0.1212
2021-09-22,Comm,Industry,Mmt,-0.1234
2021-09-22,Comm,Industry,Hlds,-0.7634 row added (sum of value in that group)
2021-09-22,Comm,Metal,Car,0.5123
2021-09-22,Comm,Metal,Fund,-0.7612
2021-09-22,Comm,Metal,Mmt,0.1212
2021-09-22,Comm,Metal,Hlds,-0.1277 row added (sum of value in that group)
2021-09-23,Equity,Agri,Car,0.6541
2021-09-23,Equity,Agri,Fund,0.5123
2021-09-23,Equity,Agri,Mmt,-0.1874
2021-09-23,Equity,Agri,Hlds,0.979 row added (sum of value in that group)
2021-09-23,Equity,Engy,Car,0.1212
2021-09-23,Equity,Engy,Fund,-0.6234
2021-09-23,Equity,Engy,Mmt,0.5123
2021-09-23,Equity,Engy,Hlds,0.0101 row added (sum of value in that group)
2021-09-23,Equity,Industry,Car,-0.1612
2021-09-23,Equity,Industry,Fund,0.1212
2021-09-23,Equity,Industry,Mmt,-0.1934
2021-09-23,Equity,Industry,Hlds,-0.2334 row added (sum of value in that group)
2021-09-23,Equity,Metal,Car,0.5123
2021-09-23,Equity,Metal,Fund,0.5412
2021-09-23,Equity,Metal,Mmt,0.1212
2021-09-23,Equity,Metal,Hlds,1.1747 row added (sum of value in that group)
как выполнить это вычисление с помощью pandas
Комментарии:
1. Не могли бы вы просто выполнить groupby и суммировать все сразу и объединить его обратно в исходный df, а затем отсортировать.
Ответ №1:
Вы можете создать фрейм данных с суммой каждой группы по .groupby()
и .sum()
, установить prop_cd
как Hlds
по .assign()
.
Затем сопоставьте с исходным фреймом данных по pd.concat()
и отсортируйте столбцы, чтобы объединить строки суммы с их соответствующими группами по .sort_values()
, следующим образом:
df_sum = df.groupby(['eff_date','mdl_cd','ast_cd'], as_index=False)['value'].sum().assign(prop_cd='Hlds')
df_out = pd.concat([df, df_sum]).sort_values(['eff_date','mdl_cd','ast_cd'], kind='stable', ignore_index=True)
Результат:
print(df_out)
eff_date mdl_cd ast_cd prop_cd value
0 2021-09-22 Comm Agri Car -0.1234
1 2021-09-22 Comm Agri Fund 0.5123
2 2021-09-22 Comm Agri Mmt -0.7612
3 2021-09-22 Comm Agri Hlds -0.3723
4 2021-09-22 Comm Engy Car 0.1212
5 2021-09-22 Comm Engy Fund -0.1234
6 2021-09-22 Comm Engy Mmt 0.5123
7 2021-09-22 Comm Engy Hlds 0.5101
8 2021-09-22 Comm Industry Car -0.7612
9 2021-09-22 Comm Industry Fund 0.1212
10 2021-09-22 Comm Industry Mmt -0.1234
11 2021-09-22 Comm Industry Hlds -0.7634
12 2021-09-22 Comm Metal Car 0.5123
13 2021-09-22 Comm Metal Fund -0.7612
14 2021-09-22 Comm Metal Mmt 0.1212
15 2021-09-22 Comm Metal Hlds -0.1277
16 2021-09-23 Equity Agri Car 0.6541
17 2021-09-23 Equity Agri Fund 0.5123
18 2021-09-23 Equity Agri Mmt -0.1874
19 2021-09-23 Equity Agri Hlds 0.9790
20 2021-09-23 Equity Engy Car 0.1212
21 2021-09-23 Equity Engy Fund -0.6234
22 2021-09-23 Equity Engy Mmt 0.5123
23 2021-09-23 Equity Engy Hlds 0.0101
24 2021-09-23 Equity Industry Car -0.1612
25 2021-09-23 Equity Industry Fund 0.1212
26 2021-09-23 Equity Industry Mmt -0.1934
27 2021-09-23 Equity Industry Hlds -0.2334
28 2021-09-23 Equity Metal Car 0.5123
29 2021-09-23 Equity Metal Fund 0.5412
30 2021-09-23 Equity Metal Mmt 0.1212
31 2021-09-23 Equity Metal Hlds 1.1747
Настройка
df = pd.read_clipboard(',')
eff_date mdl_cd ast_cd prop_cd value
0 2021-09-22 Comm Agri Car -0.1234
1 2021-09-22 Comm Agri Fund 0.5123
2 2021-09-22 Comm Agri Mmt -0.7612
3 2021-09-22 Comm Engy Car 0.1212
4 2021-09-22 Comm Engy Fund -0.1234
5 2021-09-22 Comm Engy Mmt 0.5123
6 2021-09-22 Comm Industry Car -0.7612
7 2021-09-22 Comm Industry Fund 0.1212
8 2021-09-22 Comm Industry Mmt -0.1234
9 2021-09-22 Comm Metal Car 0.5123
10 2021-09-22 Comm Metal Fund -0.7612
11 2021-09-22 Comm Metal Mmt 0.1212
12 2021-09-23 Equity Agri Car 0.6541
13 2021-09-23 Equity Agri Fund 0.5123
14 2021-09-23 Equity Agri Mmt -0.1874
15 2021-09-23 Equity Engy Car 0.1212
16 2021-09-23 Equity Engy Fund -0.6234
17 2021-09-23 Equity Engy Mmt 0.5123
18 2021-09-23 Equity Industry Car -0.1612
19 2021-09-23 Equity Industry Fund 0.1212
20 2021-09-23 Equity Industry Mmt -0.1934
21 2021-09-23 Equity Metal Car 0.5123
22 2021-09-23 Equity Metal Fund 0.5412
23 2021-09-23 Equity Metal Mmt 0.1212
Промежуточный результат:
print(df_sum)
eff_date mdl_cd ast_cd value prop_cd
0 2021-09-22 Comm Agri -0.3723 Hlds
1 2021-09-22 Comm Engy 0.5101 Hlds
2 2021-09-22 Comm Industry -0.7634 Hlds
3 2021-09-22 Comm Metal -0.1277 Hlds
4 2021-09-23 Equity Agri 0.9790 Hlds
5 2021-09-23 Equity Engy 0.0101 Hlds
6 2021-09-23 Equity Industry -0.2334 Hlds
7 2021-09-23 Equity Metal 1.1747 Hlds