#python #pandas #group-by #pandas-groupby
#python #pandas #группировать по #pandas-groupby
Вопрос:
Доброе утро всем,
Фрейм данных:
df = {'Client': ['A', 'A', 'A',
'B', 'B', 'B', 'B','B','B','B','B',
'C',
'D','D','D','D','D','D','D','D','D','D','D' ],
'Result': ['Covered', 'Customer Reject', 'Customer Timeout',
'Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ',
'No RFQ',
'Covered','Customer Reject','Customer Timeout','Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ']}
df = pd.DataFrame.from_dict(df)
print(df)
Client Result
0 A Covered
1 A Customer Reject
2 A Customer Timeout
3 B Dealer Reject
4 B Dealer Timeout
5 B Done
6 B Tied Covered
7 B Tied Done
8 B Tied Traded Away
9 B Traded Away
10 B No RFQ
11 C No RFQ
12 D Covered
13 D Customer Reject
14 D Customer Timeout
15 D Dealer Reject
16 D Dealer Timeout
17 D Done
18 D Tied Covered
19 D Tied Done
20 D Tied Traded Away
21 D Traded Away
22 D No RFQ
Добавьте нулевые значения и добавьте итоговые значения в каждой группе (любезно предоставлено @MhDG7)
groups = pd.melt((
df
.assign(count=1)
.groupby(['Client','Result'],as_index=False)
.sum()
.pivot(index='Client', columns='Result', values='count')
.fillna(0)
.reset_index()
)
,id_vars=['Client']).sort_values(['Client','value'])
# print(groups)
print("")
for i,rows in groups.groupby('Client').sum().iterrows():
print(groups[lambda x: x['Client']==i])
print('Totalttt ',rows['value'])
print('n')
Client Result value
12 A Dealer Reject 0.0
16 A Dealer Timeout 0.0
20 A Done 0.0
24 A No RFQ 0.0
28 A Tied Covered 0.0
32 A Tied Done 0.0
36 A Tied Traded Away 0.0
40 A Traded Away 0.0
0 A Covered 1.0
4 A Customer Reject 1.0
8 A Customer Timeout 1.0
Total 3.0
Client Result value
1 B Covered 0.0
5 B Customer Reject 0.0
9 B Customer Timeout 0.0
13 B Dealer Reject 1.0
17 B Dealer Timeout 1.0
21 B Done 1.0
25 B No RFQ 1.0
29 B Tied Covered 1.0
33 B Tied Done 1.0
37 B Tied Traded Away 1.0
41 B Traded Away 1.0
Total 8.0
Client Result value
2 C Covered 0.0
6 C Customer Reject 0.0
10 C Customer Timeout 0.0
14 C Dealer Reject 0.0
18 C Dealer Timeout 0.0
22 C Done 0.0
30 C Tied Covered 0.0
34 C Tied Done 0.0
38 C Tied Traded Away 0.0
42 C Traded Away 0.0
26 C No RFQ 1.0
Total 1.0
Client Result value
3 D Covered 1.0
7 D Customer Reject 1.0
11 D Customer Timeout 1.0
15 D Dealer Reject 1.0
19 D Dealer Timeout 1.0
23 D Done 1.0
27 D No RFQ 1.0
31 D Tied Covered 1.0
35 D Tied Done 1.0
39 D Tied Traded Away 1.0
43 D Traded Away 1.0
Total 11.0
Желаемый результат:
Я хотел бы добавить две формулы, которые используют результаты каждой возвращенной группировки. Таким образом, будет Total
строка (сумма всех результатов для каждого клиента), а также две формулы ( Pct_Calc_1
и Pct_Calc_2
) на основе суммированных сгруппированных значений, возвращаемых от каждого отдельного клиента:
Pct_Calc_1 = (Total_Result_Per_Client - Total_Dealer_Reject_Per_Client - Total_Dealer_Timeout_Per_Client) / (Total_Per_Client - Total_Customer_Reject_Per_Client)
Pct_Calc_2 = (Total – (Covered Tied Covered) – Customer Reject – Dealer Reject – Dealer Timeout – (Traded Away Tied Traded Away)) / (Total – Dealer Reject – Dealer Timeout)
В случае, скажем, клиента B, будет отображаться следующий вывод:
Client Result value
1 B Covered 0.0
5 B Customer Reject 0.0
9 B Customer Timeout 0.0
13 B Dealer Reject 1.0
17 B Dealer Timeout 1.0
21 B Done 1.0
25 B No RFQ 1.0
29 B Tied Covered 1.0
33 B Tied Done 1.0
37 B Tied Traded Away 1.0
41 B Traded Away 1.0
Total 8.0
Pct_Calc_1 75.00%
Pct_Calc_2 50.00%
где Pct_Calc_1
= (8 — 1 — 1) / (8 — 0) = 75.00%
Pct_Calc_2
= (8 — (0 1) — 0 — 1 — 1 — (1 1))) / (8 — 1 — 1) = 50.00%
В случае клиента D:
Client Result value
3 D Covered 1.0
7 D Customer Reject 1.0
11 D Customer Timeout 1.0
15 D Dealer Reject 1.0
19 D Dealer Timeout 1.0
23 D Done 1.0
27 D No RFQ 1.0
31 D Tied Covered 1.0
35 D Tied Done 1.0
39 D Tied Traded Away 1.0
43 D Traded Away 1.0
Total 11.0
Pct_Calc_1 90.00%
Pct_Calc_2 44.44%
Pct_Calc_1
= (11 — 1 — 1) / (11 — 1) = 90.00%
Pct_Calc_2
= (11 — (1 1) — 1 — 1 -1 — (1 1))) / (11 — 1 — 1) = 44.44%
Если исходную группировку необходимо изменить, чтобы обеспечить этот дополнительный вывод, тогда все хорошо.
Ответ №1:
Давайте сделаем crosstab
s = pd.crosstab(df.Client,df.Result,margins=True).drop('All')
s['Cal1']=(s['All'] - s['Dealer Reject'] - s['Dealer Timeout'])/(s['All'] - s['Customer Reject'])
s = s.stack().reset_index(name='value')
s
Client Result value
0 A Covered 1.00
1 A Customer Reject 1.00
2 A Customer Timeout 1.00
3 A Dealer Reject 0.00
4 A Dealer Timeout 0.00
5 A Done 0.00
6 A No RFQ 0.00
7 A Tied Covered 0.00
8 A Tied Done 0.00
9 A Tied Traded Away 0.00
10 A Traded Away 0.00
11 A All 3.00
12 A Cal1 1.50
13 B Covered 0.00
14 B Customer Reject 0.00
15 B Customer Timeout 0.00
16 B Dealer Reject 1.00
17 B Dealer Timeout 1.00
18 B Done 1.00
19 B No RFQ 1.00
20 B Tied Covered 1.00
21 B Tied Done 1.00
22 B Tied Traded Away 1.00
23 B Traded Away 1.00
24 B All 8.00
25 B Cal1 0.75
26 C Covered 0.00
27 C Customer Reject 0.00
28 C Customer Timeout 0.00
29 C Dealer Reject 0.00
30 C Dealer Timeout 0.00
31 C Done 0.00
32 C No RFQ 1.00
33 C Tied Covered 0.00
34 C Tied Done 0.00
35 C Tied Traded Away 0.00
36 C Traded Away 0.00
37 C All 1.00
38 C Cal1 1.00
39 D Covered 1.00
40 D Customer Reject 1.00
41 D Customer Timeout 1.00
42 D Dealer Reject 1.00
43 D Dealer Timeout 1.00
44 D Done 1.00
45 D No RFQ 1.00
46 D Tied Covered 1.00
47 D Tied Done 1.00
48 D Tied Traded Away 1.00
49 D Traded Away 1.00
50 D All 11.00
51 D Cal1 0.90
Комментарии:
1. Превосходно! Спасибо @BEN_YO