как добавить новую строку в каждую группу groupby в PANDAS, одно из значений этой строки равно сумме значений каждой группы

#python #pandas #pandas-groupby

Вопрос:

допустим, у меня есть такой фрейм данных

 eff_date,mdl_cd,ast_cd,prop_cd,value
2021-09-22,Comm,Agri,Car,-0.1234
2021-09-22,Comm,Agri,Fund,0.5123
2021-09-22,Comm,Agri,Mmt,-0.7612
2021-09-22,Comm,Engy,Car,0.1212
2021-09-22,Comm,Engy,Fund,-0.1234
2021-09-22,Comm,Engy,Mmt,0.5123
2021-09-22,Comm,Industry,Car,-0.7612
2021-09-22,Comm,Industry,Fund,0.1212
2021-09-22,Comm,Industry,Mmt,-0.1234
2021-09-22,Comm,Metal,Car,0.5123
2021-09-22,Comm,Metal,Fund,-0.7612
2021-09-22,Comm,Metal,Mmt,0.1212
2021-09-23,Equity,Agri,Car,0.6541
2021-09-23,Equity,Agri,Fund,0.5123
2021-09-23,Equity,Agri,Mmt,-0.1874
2021-09-23,Equity,Engy,Car,0.1212
2021-09-23,Equity,Engy,Fund,-0.6234
2021-09-23,Equity,Engy,Mmt,0.5123 
2021-09-23,Equity,Industry,Car,-0.1612
2021-09-23,Equity,Industry,Fund,0.1212
2021-09-23,Equity,Industry,Mmt,-0.1934
2021-09-23,Equity,Metal,Car,0.5123
2021-09-23,Equity,Metal,Fund,0.5412
2021-09-23,Equity,Metal,Mmt,0.1212
 

Я хотел добавить новую строку в каждую группу groupby(by=[‘eff_date’,’mdl_cd’,’ast_cd’])
в каком столбце значение для eff_date,mdl_cd и ast_cd останется same , но для prop_cd значения станет Hlds , а значение столбца value станет суммой значений этой группы, например, для первой группы значение столбца value будет (-0.1234 0.5123 -0.7612 ) т.е. -0,3723

следовательно, результат будет таким

 eff_date,mdl_cd,ast_cd,prop_cd,value
2021-09-22,Comm,Agri,Car,-0.1234
2021-09-22,Comm,Agri,Fund,0.5123
2021-09-22,Comm,Agri,Mmt,-0.7612
2021-09-22,Comm,Agri,Hlds,-0.3723         row added   (sum of value in that group)

2021-09-22,Comm,Engy,Car,0.1212
2021-09-22,Comm,Engy,Fund,-0.1234
2021-09-22,Comm,Engy,Mmt,0.5123
2021-09-22,Comm,Engy,Hlds,0.5101          row added  (sum of value in that group)

2021-09-22,Comm,Industry,Car,-0.7612
2021-09-22,Comm,Industry,Fund,0.1212
2021-09-22,Comm,Industry,Mmt,-0.1234
2021-09-22,Comm,Industry,Hlds,-0.7634      row added (sum of value in that group)

2021-09-22,Comm,Metal,Car,0.5123
2021-09-22,Comm,Metal,Fund,-0.7612
2021-09-22,Comm,Metal,Mmt,0.1212
2021-09-22,Comm,Metal,Hlds,-0.1277         row added (sum of value in that group)

2021-09-23,Equity,Agri,Car,0.6541
2021-09-23,Equity,Agri,Fund,0.5123
2021-09-23,Equity,Agri,Mmt,-0.1874
2021-09-23,Equity,Agri,Hlds,0.979           row added (sum of value in that group)

2021-09-23,Equity,Engy,Car,0.1212
2021-09-23,Equity,Engy,Fund,-0.6234
2021-09-23,Equity,Engy,Mmt,0.5123 
2021-09-23,Equity,Engy,Hlds,0.0101          row added (sum of value in that group)

2021-09-23,Equity,Industry,Car,-0.1612
2021-09-23,Equity,Industry,Fund,0.1212
2021-09-23,Equity,Industry,Mmt,-0.1934
2021-09-23,Equity,Industry,Hlds,-0.2334     row added (sum of value in that group)

2021-09-23,Equity,Metal,Car,0.5123
2021-09-23,Equity,Metal,Fund,0.5412
2021-09-23,Equity,Metal,Mmt,0.1212
2021-09-23,Equity,Metal,Hlds,1.1747         row added (sum of value in that group)
 

как выполнить это вычисление с помощью pandas

Комментарии:

1. Не могли бы вы просто выполнить groupby и суммировать все сразу и объединить его обратно в исходный df, а затем отсортировать.

Ответ №1:

Вы можете создать фрейм данных с суммой каждой группы по .groupby() и .sum() , установить prop_cd как Hlds по .assign() .

Затем сопоставьте с исходным фреймом данных по pd.concat() и отсортируйте столбцы, чтобы объединить строки суммы с их соответствующими группами по .sort_values() , следующим образом:

 df_sum = df.groupby(['eff_date','mdl_cd','ast_cd'], as_index=False)['value'].sum().assign(prop_cd='Hlds')

df_out = pd.concat([df, df_sum]).sort_values(['eff_date','mdl_cd','ast_cd'], kind='stable', ignore_index=True)
 

Результат:

 print(df_out)

      eff_date  mdl_cd    ast_cd prop_cd   value
0   2021-09-22    Comm      Agri     Car -0.1234
1   2021-09-22    Comm      Agri    Fund  0.5123
2   2021-09-22    Comm      Agri     Mmt -0.7612
3   2021-09-22    Comm      Agri    Hlds -0.3723
4   2021-09-22    Comm      Engy     Car  0.1212
5   2021-09-22    Comm      Engy    Fund -0.1234
6   2021-09-22    Comm      Engy     Mmt  0.5123
7   2021-09-22    Comm      Engy    Hlds  0.5101
8   2021-09-22    Comm  Industry     Car -0.7612
9   2021-09-22    Comm  Industry    Fund  0.1212
10  2021-09-22    Comm  Industry     Mmt -0.1234
11  2021-09-22    Comm  Industry    Hlds -0.7634
12  2021-09-22    Comm     Metal     Car  0.5123
13  2021-09-22    Comm     Metal    Fund -0.7612
14  2021-09-22    Comm     Metal     Mmt  0.1212
15  2021-09-22    Comm     Metal    Hlds -0.1277
16  2021-09-23  Equity      Agri     Car  0.6541
17  2021-09-23  Equity      Agri    Fund  0.5123
18  2021-09-23  Equity      Agri     Mmt -0.1874
19  2021-09-23  Equity      Agri    Hlds  0.9790
20  2021-09-23  Equity      Engy     Car  0.1212
21  2021-09-23  Equity      Engy    Fund -0.6234
22  2021-09-23  Equity      Engy     Mmt  0.5123
23  2021-09-23  Equity      Engy    Hlds  0.0101
24  2021-09-23  Equity  Industry     Car -0.1612
25  2021-09-23  Equity  Industry    Fund  0.1212
26  2021-09-23  Equity  Industry     Mmt -0.1934
27  2021-09-23  Equity  Industry    Hlds -0.2334
28  2021-09-23  Equity     Metal     Car  0.5123
29  2021-09-23  Equity     Metal    Fund  0.5412
30  2021-09-23  Equity     Metal     Mmt  0.1212
31  2021-09-23  Equity     Metal    Hlds  1.1747
 

Настройка

 df = pd.read_clipboard(',')

      eff_date  mdl_cd    ast_cd prop_cd   value
0   2021-09-22    Comm      Agri     Car -0.1234
1   2021-09-22    Comm      Agri    Fund  0.5123
2   2021-09-22    Comm      Agri     Mmt -0.7612
3   2021-09-22    Comm      Engy     Car  0.1212
4   2021-09-22    Comm      Engy    Fund -0.1234
5   2021-09-22    Comm      Engy     Mmt  0.5123
6   2021-09-22    Comm  Industry     Car -0.7612
7   2021-09-22    Comm  Industry    Fund  0.1212
8   2021-09-22    Comm  Industry     Mmt -0.1234
9   2021-09-22    Comm     Metal     Car  0.5123
10  2021-09-22    Comm     Metal    Fund -0.7612
11  2021-09-22    Comm     Metal     Mmt  0.1212
12  2021-09-23  Equity      Agri     Car  0.6541
13  2021-09-23  Equity      Agri    Fund  0.5123
14  2021-09-23  Equity      Agri     Mmt -0.1874
15  2021-09-23  Equity      Engy     Car  0.1212
16  2021-09-23  Equity      Engy    Fund -0.6234
17  2021-09-23  Equity      Engy     Mmt  0.5123
18  2021-09-23  Equity  Industry     Car -0.1612
19  2021-09-23  Equity  Industry    Fund  0.1212
20  2021-09-23  Equity  Industry     Mmt -0.1934
21  2021-09-23  Equity     Metal     Car  0.5123
22  2021-09-23  Equity     Metal    Fund  0.5412
23  2021-09-23  Equity     Metal     Mmt  0.1212
 

Промежуточный результат:

 print(df_sum)

     eff_date  mdl_cd    ast_cd   value prop_cd
0  2021-09-22    Comm      Agri -0.3723    Hlds
1  2021-09-22    Comm      Engy  0.5101    Hlds
2  2021-09-22    Comm  Industry -0.7634    Hlds
3  2021-09-22    Comm     Metal -0.1277    Hlds
4  2021-09-23  Equity      Agri  0.9790    Hlds
5  2021-09-23  Equity      Engy  0.0101    Hlds
6  2021-09-23  Equity  Industry -0.2334    Hlds
7  2021-09-23  Equity     Metal  1.1747    Hlds