Вычислять формулы на основе результатов группировки Pandas и добавлять в конечный набор результатов

#python #pandas #group-by #pandas-groupby

#python #pandas #группировать по #pandas-groupby

Вопрос:

Доброе утро всем,

Фрейм данных:

 df = {'Client': ['A', 'A', 'A',
                 'B', 'B', 'B', 'B','B','B','B','B',
                 'C',
                 'D','D','D','D','D','D','D','D','D','D','D' ],
'Result': ['Covered', 'Customer Reject', 'Customer Timeout',
           'Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ',
           'No RFQ',
           'Covered','Customer Reject','Customer Timeout','Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ']}
df = pd.DataFrame.from_dict(df)
print(df)

   Client            Result
0       A           Covered
1       A   Customer Reject
2       A  Customer Timeout
3       B     Dealer Reject
4       B    Dealer Timeout
5       B              Done
6       B      Tied Covered
7       B         Tied Done
8       B  Tied Traded Away
9       B       Traded Away
10      B            No RFQ
11      C            No RFQ
12      D           Covered
13      D   Customer Reject
14      D  Customer Timeout
15      D     Dealer Reject
16      D    Dealer Timeout
17      D              Done
18      D      Tied Covered
19      D         Tied Done
20      D  Tied Traded Away
21      D       Traded Away
22      D            No RFQ
  

Добавьте нулевые значения и добавьте итоговые значения в каждой группе (любезно предоставлено @MhDG7)

 groups = pd.melt((
    df
    .assign(count=1)
    .groupby(['Client','Result'],as_index=False)
    .sum()
    .pivot(index='Client', columns='Result', values='count')
    .fillna(0)
    .reset_index()
    )
    ,id_vars=['Client']).sort_values(['Client','value'])

# print(groups)
print("")

for i,rows in groups.groupby('Client').sum().iterrows():
    print(groups[lambda x: x['Client']==i])
    print('Totalttt    ',rows['value'])
    print('n')

   Client            Result  value
12      A     Dealer Reject    0.0
16      A    Dealer Timeout    0.0
20      A              Done    0.0
24      A            No RFQ    0.0
28      A      Tied Covered    0.0
32      A         Tied Done    0.0
36      A  Tied Traded Away    0.0
40      A       Traded Away    0.0
0       A           Covered    1.0
4       A   Customer Reject    1.0
8       A  Customer Timeout    1.0
Total                3.0


   Client            Result  value
1       B           Covered    0.0
5       B   Customer Reject    0.0
9       B  Customer Timeout    0.0
13      B     Dealer Reject    1.0
17      B    Dealer Timeout    1.0
21      B              Done    1.0
25      B            No RFQ    1.0
29      B      Tied Covered    1.0
33      B         Tied Done    1.0
37      B  Tied Traded Away    1.0
41      B       Traded Away    1.0
Total                8.0


   Client            Result  value
2       C           Covered    0.0
6       C   Customer Reject    0.0
10      C  Customer Timeout    0.0
14      C     Dealer Reject    0.0
18      C    Dealer Timeout    0.0
22      C              Done    0.0
30      C      Tied Covered    0.0
34      C         Tied Done    0.0
38      C  Tied Traded Away    0.0
42      C       Traded Away    0.0
26      C            No RFQ    1.0
Total                1.0


   Client            Result  value
3       D           Covered    1.0
7       D   Customer Reject    1.0
11      D  Customer Timeout    1.0
15      D     Dealer Reject    1.0
19      D    Dealer Timeout    1.0
23      D              Done    1.0
27      D            No RFQ    1.0
31      D      Tied Covered    1.0
35      D         Tied Done    1.0
39      D  Tied Traded Away    1.0
43      D       Traded Away    1.0
Total                11.0
  

Желаемый результат:

Я хотел бы добавить две формулы, которые используют результаты каждой возвращенной группировки. Таким образом, будет Total строка (сумма всех результатов для каждого клиента), а также две формулы ( Pct_Calc_1 и Pct_Calc_2 ) на основе суммированных сгруппированных значений, возвращаемых от каждого отдельного клиента:

 Pct_Calc_1 = (Total_Result_Per_Client - Total_Dealer_Reject_Per_Client - Total_Dealer_Timeout_Per_Client) / (Total_Per_Client - Total_Customer_Reject_Per_Client)

Pct_Calc_2 = (Total – (Covered   Tied Covered) – Customer Reject – Dealer Reject – Dealer Timeout – (Traded Away   Tied Traded Away)) / (Total – Dealer Reject – Dealer Timeout)
  

В случае, скажем, клиента B, будет отображаться следующий вывод:

    Client            Result  value
1       B           Covered    0.0
5       B   Customer Reject    0.0
9       B  Customer Timeout    0.0
13      B     Dealer Reject    1.0
17      B    Dealer Timeout    1.0
21      B              Done    1.0
25      B            No RFQ    1.0
29      B      Tied Covered    1.0
33      B         Tied Done    1.0
37      B  Tied Traded Away    1.0
41      B       Traded Away    1.0
Total                          8.0
Pct_Calc_1                     75.00%
Pct_Calc_2                     50.00%
  

где Pct_Calc_1 = (8 — 1 — 1) / (8 — 0) = 75.00%

Pct_Calc_2 = (8 — (0 1) — 0 — 1 — 1 — (1 1))) / (8 — 1 — 1) = 50.00%

В случае клиента D:

    Client            Result  value
3       D           Covered    1.0
7       D   Customer Reject    1.0
11      D  Customer Timeout    1.0
15      D     Dealer Reject    1.0
19      D    Dealer Timeout    1.0
23      D              Done    1.0
27      D            No RFQ    1.0
31      D      Tied Covered    1.0
35      D         Tied Done    1.0
39      D  Tied Traded Away    1.0
43      D       Traded Away    1.0
Total                          11.0
Pct_Calc_1                     90.00%
Pct_Calc_2                     44.44%
  

Pct_Calc_1 = (11 — 1 — 1) / (11 — 1) = 90.00%

Pct_Calc_2 = (11 — (1 1) — 1 — 1 -1 — (1 1))) / (11 — 1 — 1) = 44.44%

Если исходную группировку необходимо изменить, чтобы обеспечить этот дополнительный вывод, тогда все хорошо.

Ответ №1:

Давайте сделаем crosstab

 s = pd.crosstab(df.Client,df.Result,margins=True).drop('All')
s['Cal1']=(s['All'] - s['Dealer Reject'] - s['Dealer Timeout'])/(s['All'] - s['Customer Reject'])
s = s.stack().reset_index(name='value')
s
   Client            Result  value
0       A           Covered   1.00
1       A   Customer Reject   1.00
2       A  Customer Timeout   1.00
3       A     Dealer Reject   0.00
4       A    Dealer Timeout   0.00
5       A              Done   0.00
6       A            No RFQ   0.00
7       A      Tied Covered   0.00
8       A         Tied Done   0.00
9       A  Tied Traded Away   0.00
10      A       Traded Away   0.00
11      A               All   3.00
12      A              Cal1   1.50
13      B           Covered   0.00
14      B   Customer Reject   0.00
15      B  Customer Timeout   0.00
16      B     Dealer Reject   1.00
17      B    Dealer Timeout   1.00
18      B              Done   1.00
19      B            No RFQ   1.00
20      B      Tied Covered   1.00
21      B         Tied Done   1.00
22      B  Tied Traded Away   1.00
23      B       Traded Away   1.00
24      B               All   8.00
25      B              Cal1   0.75
26      C           Covered   0.00
27      C   Customer Reject   0.00
28      C  Customer Timeout   0.00
29      C     Dealer Reject   0.00
30      C    Dealer Timeout   0.00
31      C              Done   0.00
32      C            No RFQ   1.00
33      C      Tied Covered   0.00
34      C         Tied Done   0.00
35      C  Tied Traded Away   0.00
36      C       Traded Away   0.00
37      C               All   1.00
38      C              Cal1   1.00
39      D           Covered   1.00
40      D   Customer Reject   1.00
41      D  Customer Timeout   1.00
42      D     Dealer Reject   1.00
43      D    Dealer Timeout   1.00
44      D              Done   1.00
45      D            No RFQ   1.00
46      D      Tied Covered   1.00
47      D         Tied Done   1.00
48      D  Tied Traded Away   1.00
49      D       Traded Away   1.00
50      D               All  11.00
51      D              Cal1   0.90
  

Комментарии:

1. Превосходно! Спасибо @BEN_YO