Pandas cumsum over unique values across two columns

#python #pandas #cumsum

Question:

I'm working with Python and want to sum up the cumulative goals of the two teams in each match, based on a results table that looks like this:

I put the desired output in the columns cumsumLocal and cumsumVisitor to explain it better. I want to accumulate each team's goals by season and match. Note that there are two different seasons, so the cumsum must restart for each value of Seasson.

    Seasson  Match  Local       Visitor     GoalLocal  goalVisitor  -->  cumsumLocal  cumsumVisitor
    -----------------------------------------------------------------------------------------------
    1        1      Manchester  Blackburn   2          1            -->  2            1
    1        1      Leeds       Arsenal     2          4            -->  2            4
    1        2      Blackburn   Leeds       1          3            -->  3            5
    1        2      Arsenal     Manchester  2          0            -->  6            2
    1        3      Leeds       Manchester  6          1            -->  11           3
    1        3      Arsenal     Blackburn   5          0            -->
    2        1      Manchester  Blackburn   3          1            -->
    2        1      Leeds       Arsenal     2          0            -->
    2        2      Blackburn   Leeds       2          4            -->
    2        2      Arsenal     Manchester  1          3            -->
    2        3      Leeds       Manchester  2          0            -->
    2        3      Arsenal     Blackburn   6          1            -->

1. Please don't post a picture …. Also, you may need to explain a bit more.

2. Is it possible to add the complete output in the new columns, so that solutions can be verified?

3. Yes, that's right. What I'm trying to do is add the accumulated goals of each team as of each date (match). But each team can play either as local or as visitor. That is where the confusion comes from.
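The rule described in the last comment can be written out without pandas, which may help when checking any solution. This is only a sketch of the intended semantics: the tuple layout and the helper name `running_totals` are assumptions made for illustration, and the rows are season 1 from the table above. A single counter keyed by (season, team) is updated whether the team appears as local or as visitor.

```python
from collections import defaultdict

# Season 1 rows from the question:
# (season, match, local, visitor, goals_local, goals_visitor)
rows = [
    (1, 1, "Manchester", "Blackburn", 2, 1),
    (1, 1, "Leeds", "Arsenal", 2, 4),
    (1, 2, "Blackburn", "Leeds", 1, 3),
    (1, 2, "Arsenal", "Manchester", 2, 0),
    (1, 3, "Leeds", "Manchester", 6, 1),
    (1, 3, "Arsenal", "Blackburn", 5, 0),
]


def running_totals(rows):
    """Return (cum_local, cum_visitor) per row, restarting for each season."""
    totals = defaultdict(int)  # keyed by (season, team)
    out = []
    for season, _match, local, visitor, g_local, g_visitor in rows:
        # The same counter is used regardless of which side the team plays on.
        totals[(season, local)] += g_local
        totals[(season, visitor)] += g_visitor
        out.append((totals[(season, local)], totals[(season, visitor)]))
    return out


season_one = running_totals(rows)
print(season_one)
```

Under this rule Blackburn's total after match 2 is 2 (one goal as visitor, one as local).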

Answer #1:

I believe you need a preprocessing step first that adds _ to the column names:

d = {'Local': 'Team_Local', 'Visitor': 'Team_Visitor',
     'GoalLocal': 'Goal_Local', 'goalVisitor': 'Goal_Visitor'}
df = df.rename(columns=d)
print(df)
    Seasson  Match  Team_Local Team_Visitor  Goal_Local  Goal_Visitor
0         1      1  Manchester    Blackburn           2             1
1         1      1       Leeds      Arsenal           2             4
2         1      2   Blackburn        Leeds           1             3
3         1      2     Arsenal   Manchester           2             0
4         1      3       Leeds   Manchester           6             1
5         1      3     Arsenal    Blackburn           5             0
6         2      1  Manchester    Blackburn           3             1
7         2      1       Leeds      Arsenal           2             0
8         2      2   Blackburn        Leeds           2             4
9         2      2     Arsenal   Manchester           1             3
10        2      3       Leeds   Manchester           2             0
11        2      3     Arsenal    Blackburn           6             1
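The answer assumes `df` already exists. For anyone who wants to reproduce the renaming step above, one possible way to build the frame is sketched here. The values are copied from the printed output, and the constructor layout is an assumption, not necessarily how the asker loaded the data.

```python
import pandas as pd

# Values copied from the printed table; original column names from the question.
df = pd.DataFrame({
    "Seasson": [1] * 6 + [2] * 6,
    "Match": [1, 1, 2, 2, 3, 3] * 2,
    "Local": ["Manchester", "Leeds", "Blackburn",
              "Arsenal", "Leeds", "Arsenal"] * 2,
    "Visitor": ["Blackburn", "Arsenal", "Leeds",
                "Manchester", "Manchester", "Blackburn"] * 2,
    "GoalLocal": [2, 2, 1, 2, 6, 5, 3, 2, 2, 1, 2, 6],
    "goalVisitor": [1, 4, 3, 0, 1, 0, 1, 0, 4, 3, 0, 1],
})

# Same mapping as in the answer: every name becomes '<kind>_<side>'.
d = {"Local": "Team_Local", "Visitor": "Team_Visitor",
     "GoalLocal": "Goal_Local", "goalVisitor": "Goal_Visitor"}
df = df.rename(columns=d)
print(df)
```

The common `<kind>_<side>` pattern is what makes it possible to split the names on `_` afterwards.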

Create a MultiIndex in the columns with split, reshape with stack, create the new column with groupby + cumsum, and finally reshape back with unstack:

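# Move Seasson and Match into the index, keeping the original row number as level 0
df = df.set_index(['Seasson', 'Match'], append=True)
# 'Team_Local' -> ('Team', 'Local'): two-level column MultiIndex
df.columns = df.columns.str.split('_', expand=True)
# Long format: one row per (game, Local/Visitor)
df = df.stack()
# pandas 0.24
df['Cum'] = df.groupby(['Seasson', 'Team'])['Goal'].cumsum()
# older pandas versions
# df['Cum'] = df.reset_index().groupby(['Seasson', 'Team'])['Goal'].cumsum().values
# Back to wide format, with the top-level columns in a fixed order
df = df.unstack().reindex(['Team', 'Goal', 'Cum'], axis=1, level=0)
# ('Team', 'Local') -> 'Team_Local'
df.columns = df.columns.map('_'.join)
# Drop the original row number and turn Seasson and Match back into columns
df = df.reset_index(level=0, drop=True).reset_index()

print(df)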
    Seasson  Match  Team_Local Team_Visitor  Goal_Local  Goal_Visitor  
0         1      1  Manchester    Blackburn           2             1   
1         1      1       Leeds      Arsenal           2             4   
2         1      2   Blackburn        Leeds           1             3   
3         1      2     Arsenal   Manchester           2             0   
4         1      3       Leeds   Manchester           6             1   
5         1      3     Arsenal    Blackburn           5             0   
6         2      1  Manchester    Blackburn           3             1   
7         2      1       Leeds      Arsenal           2             0   
8         2      2   Blackburn        Leeds           2             4   
9         2      2     Arsenal   Manchester           1             3   
10        2      3       Leeds   Manchester           2             0   
11        2      3     Arsenal    Blackburn           6             1   

    Cum_Local  Cum_Visitor  
0           2            1  
1           2            4  
2           2            5  
3           6            2  
4          11            3  
5          11            2  
6           3            1  
7           2            0  
8           3            6  
9           1            6  
10          8            6  
11          7            4
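As a cross-check, the same numbers can be reached without a column MultiIndex. The sketch below is a different technique from the answer's stack/unstack approach, not a restatement of it: it builds a long table with one row per (game, side) using `pd.concat`, runs the same `groupby` + `cumsum`, and writes the result back by index alignment. The names `long`, `part` and `Side` are made up for illustration.

```python
import pandas as pd

# Frame with the renamed columns, values as in the printed output.
df = pd.DataFrame({
    "Seasson": [1] * 6 + [2] * 6,
    "Match": [1, 1, 2, 2, 3, 3] * 2,
    "Team_Local": ["Manchester", "Leeds", "Blackburn",
                   "Arsenal", "Leeds", "Arsenal"] * 2,
    "Team_Visitor": ["Blackburn", "Arsenal", "Leeds",
                     "Manchester", "Manchester", "Blackburn"] * 2,
    "Goal_Local": [2, 2, 1, 2, 6, 5, 3, 2, 2, 1, 2, 6],
    "Goal_Visitor": [1, 4, 3, 0, 1, 0, 1, 0, 4, 3, 0, 1],
})

# One row per (game, side); the original row label is kept as the index.
sides = []
for side in ("Local", "Visitor"):
    part = df[["Seasson", f"Team_{side}", f"Goal_{side}"]].copy()
    part.columns = ["Seasson", "Team", "Goal"]
    part["Side"] = side
    sides.append(part)

# Sorting by the original row label restores chronological order,
# which is what cumsum relies on.
long = pd.concat(sides).sort_index(kind="mergesort")
long["Cum"] = long.groupby(["Seasson", "Team"])["Goal"].cumsum()

# Each side has unique row labels, so assignment aligns on the index.
for side in ("Local", "Visitor"):
    df[f"Cum_{side}"] = long.loc[long["Side"] == side, "Cum"]

print(df[["Seasson", "Match", "Cum_Local", "Cum_Visitor"]])
```

This assumes the rows are already in chronological order within each season, as they are in the question's table.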

1. I get this error: df['Cum'] = df.groupby(['Seasson', 'Team'])['Goal'].cumsum() File "pandas index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandasindex.c:4066) File "pandasindex.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandasindex.c:3930) File "pandashashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandashashtable.c:12408) File "pandashashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandashashtable.c:12359) KeyError: 'Seasson'

2. @Fede: How does the alternative for pandas below 0.24 work? df['Cum'] = df.reset_index().groupby(['Seasson','Team'])['Goal'].cumsum().values ?
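The KeyError above is consistent with a pandas version whose `groupby` looks up keys only among columns, while Seasson is an index level after `set_index`. That reading is an assumption, since the comment does not state the version. A minimal sketch of why the `reset_index` variant sidesteps the problem, using a small hypothetical frame rather than the full data:

```python
import pandas as pd

# Hypothetical stand-in for the stacked frame: Seasson and Match live in
# the index, Team and Goal are ordinary columns.
stacked = pd.DataFrame(
    {"Team": ["Leeds", "Arsenal", "Leeds", "Arsenal"], "Goal": [2, 4, 3, 2]},
    index=pd.MultiIndex.from_tuples(
        [(1, 1), (1, 1), (1, 2), (1, 2)], names=["Seasson", "Match"]
    ),
)

# reset_index() turns the index levels into columns, so both group keys
# are plain column names.
flat = stacked.reset_index()
cum = flat.groupby(["Seasson", "Team"])["Goal"].cumsum()

# cum is labelled 0..n-1, not by the MultiIndex, so .values is used to
# assign by position instead of by label.
stacked["Cum"] = cum.values
print(stacked)
```

The positional assignment is safe here only because `reset_index` keeps the row order unchanged.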