How To Group Tuples By Common Items And Find Average Per Each Group
I have a list of tuples named data: data = [('A', 2), ('B', 2), ('B', 4), ('B', 6), ('B', 8), ('B', 6), ('B', 4), ('B', 3), ('C', 10), ('C', 10), ('C', 10),
Solution 1:
Here is an option with a defaultdict:
from collections import defaultdict
avg = defaultdict(lambda :{'count': 0, 'sum': 0})
# calculate the sum and count for each key
for k, v in data:
    avg[k]['count'] += 1
    avg[k]['sum'] += v
# calculate the average
[(k, v['sum']/v['count']) for k, v in avg.items()]
# Output (group order may differ depending on the Python version):
# [('A', 2.0),
#  ('D', 12.0),
#  ('F', 8.0),
#  ('E', 12.0),
#  ('B', 4.714285714285714),
#  ('C', 10.0)]
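The same idea can be wrapped in a small reusable function. The following is only a sketch: the name group_averages and the shortened sample list are illustrative assumptions, and it collects the values per key and averages them with statistics.fmean (Python 3.8+) instead of tracking a running sum and count.

```python
from collections import defaultdict
from statistics import fmean


def group_averages(pairs):
    """Return {key: average of values} for an iterable of (key, value) pairs."""
    groups = defaultdict(list)
    for key, value in pairs:
        # Collect every value seen for this key.
        groups[key].append(value)
    # fmean always returns a float.
    return {key: fmean(values) for key, values in groups.items()}


# Shortened, illustrative sample (not the full list from the question).
sample = [('A', 2), ('B', 2), ('B', 4), ('C', 10)]
averages = group_averages(sample)
print(sorted(averages.items()))
```

Returning a dict keeps lookups by key cheap; sorted(averages.items()) turns it back into a list of tuples when that shape is needed.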
Solution 2:
Try itertools.groupby, which works here because data is already ordered by its first item:
from itertools import groupby
data_ = [(n,[i[1] for i in g]) for n,g in groupby(data, key = lambda x:x[0])]
result = [(i,float(sum(j))/float(len(j))) for i,j in data_]
Result
[('A', 2.0),
('B', 4.714285714285714),
('C', 10.0),
('D', 12.0),
('E', 12.0),
('F', 8.0)]
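itertools.groupby only merges consecutive items that share a key, so input that is not already ordered by key needs to be sorted first. Below is a minimal sketch of that variation; the list unsorted_data is a made-up example, not the data from the question.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input whose keys are interleaved rather than ordered.
unsorted_data = [('B', 4), ('A', 2), ('B', 2), ('A', 4)]

# Without this sort, groupby would yield four separate groups: B, A, B, A.
ordered = sorted(unsorted_data, key=itemgetter(0))

result = []
for letter, group in groupby(ordered, key=itemgetter(0)):
    # Pull out the numeric part of each tuple in the group.
    values = [value for _, value in group]
    result.append((letter, sum(values) / len(values)))

print(result)
```

Sorting costs O(n log n), so for large unordered inputs a dictionary-based approach may be the cheaper choice.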
Solution 3:
An alternative solution, which you might consider especially when dealing with large data sets, is to use pandas. Here, groupby and mean will do the job:
import pandas as pd
data = [('A', 2),
('B', 2), ('B', 4), ('B', 6), ('B', 8), ('B', 6), ('B', 4), ('B', 3),
('C', 10), ('C', 10), ('C', 10),
('D', 12),
('E', 12),
('F', 10), ('F', 8), ('F', 6)]
df = pd.DataFrame(data, columns=['letter', 'number'])
print(df)
#    letter  number
# 0       A       2
# 1       B       2
# 2       B       4
# 3       B       6
# 4       B       8
# 5       B       6
# 6       B       4
# 7       B       3
# 8       C      10
# 9       C      10
# 10      C      10
# 11      D      12
# 12      E      12
# 13      F      10
# 14      F       8
# 15      F       6

print(df.groupby('letter').mean())
#            number
# letter
# A        2.000000
# B        4.714286
# C       10.000000
# D       12.000000
# E       12.000000
# F        8.000000

print(df.groupby('letter').mean().round().astype(int))
#         number
# letter
# A            2
# B            5
# C           10
# D           12
# E           12
# F            8
You can get back your list of tuples as follows:
averages = df.groupby('letter').mean().round().astype(int)
result = list(averages.to_records())
print(result)
# [('A', 2), ('B', 5), ('C', 10), ('D', 12), ('E', 12), ('F', 8)]
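If the unrounded averages are wanted as plain Python tuples, one possible variation is to select the single column before taking the mean and then iterate over the resulting Series. This is a sketch under the assumption that pandas is installed; the shortened sample list is illustrative.

```python
import pandas as pd

# Shortened, illustrative sample (not the full list from the question).
sample = [('A', 2), ('B', 2), ('B', 4)]
df = pd.DataFrame(sample, columns=['letter', 'number'])

# Selecting the 'number' column yields a Series indexed by letter.
means = df.groupby('letter')['number'].mean()

# Series.items() iterates over (index, value) pairs.
pairs = list(means.items())
print(pairs)
```

This skips the rounding step, so the values stay as floats.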