
How To Group Tuples By Common Items And Find Average Per Each Group

I have a list of tuples named data: data = [('A', 2), ('B', 2), ('B', 4), ('B', 6), ('B', 8), ('B', 6), ('B', 4), ('B', 3), ('C', 10), ('C', 10), ('C', 10),

Solution 1:

Here is an option with a defaultdict:

from collections import defaultdict
avg = defaultdict(lambda: {'count': 0, 'sum': 0})

# calculate the sum and count for each key
for k, v in data:
    avg[k]['count'] += 1
    avg[k]['sum'] += v

# calculate the average
[(k, v['sum']/v['count']) for k, v in avg.items()]

# [('A', 2.0),
#  ('D', 12.0),
#  ('F', 8.0),
#  ('E', 12.0),
#  ('B', 4.714285714285714),
#  ('C', 10.0)]
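The same idea can be written as a small self-contained function. This is only a sketch: the name group_averages is made up for illustration, the data list is copied from Solution 3 on this page, and statistics.fmean assumes Python 3.8 or newer. Sorting the groups by key gives a predictable output order.

```python
from collections import defaultdict
from statistics import fmean

# Sample data, copied from Solution 3 on this page.
data = [('A', 2),
        ('B', 2), ('B', 4), ('B', 6), ('B', 8), ('B', 6), ('B', 4), ('B', 3),
        ('C', 10), ('C', 10), ('C', 10),
        ('D', 12),
        ('E', 12),
        ('F', 10), ('F', 8), ('F', 6)]


def group_averages(pairs):
    """Return (key, average) tuples, sorted by key (hypothetical helper)."""
    groups = defaultdict(list)
    # Collect every value under its key; input order does not matter.
    for key, value in pairs:
        groups[key].append(value)
    # fmean always returns a float, even for integer input.
    return [(key, fmean(values)) for key, values in sorted(groups.items())]


averages = group_averages(data)
print(averages)
```

Collecting the values in a list costs a little more memory than keeping a running sum and count, but it makes it easy to swap in another statistic such as the median.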

Solution 2:

Try itertools.groupby, which groups consecutive items that share a key (data is already ordered by letter here):

from itertools import groupby
data_ = [(n,[i[1] for i in g]) for n,g in groupby(data, key = lambda x:x[0])]   
result = [(i,float(sum(j))/float(len(j))) for i,j in data_]

Result

[('A', 2.0),
 ('B', 4.714285714285714),
 ('C', 10.0),
 ('D', 12.0),
 ('E', 12.0),
 ('F', 8.0)]
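Because itertools.groupby only merges adjacent items with equal keys, input that is not ordered by key needs to be sorted first. The sketch below uses a small made-up list (unsorted_data is not from the question) to show that step:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical unsorted input: the 'B' entries are not adjacent.
unsorted_data = [('B', 2), ('A', 2), ('B', 4), ('C', 10), ('B', 6)]

# Sort by the first tuple element so that equal keys become adjacent.
ordered = sorted(unsorted_data, key=itemgetter(0))

result = []
for key, group in groupby(ordered, key=itemgetter(0)):
    values = [value for _, value in group]
    result.append((key, sum(values) / len(values)))

print(result)
# [('A', 2.0), ('B', 4.0), ('C', 10.0)]
```

Without the sorted() call, 'B' would show up as three separate groups, each with its own average.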

Solution 3:

An alternative solution which you might consider, especially when dealing with large data sets, is to use pandas. Here, groupby and mean will do the job:

import pandas as pd

data = [('A', 2), 
        ('B', 2), ('B', 4), ('B', 6), ('B', 8), ('B', 6), ('B', 4), ('B', 3),
        ('C', 10), ('C', 10), ('C', 10),
        ('D', 12),
        ('E', 12),
        ('F', 10), ('F', 8), ('F', 6)]

df = pd.DataFrame(data, columns=['letter', 'number'])
print(df)
#    letter  number
# 0       A       2
# 1       B       2
# 2       B       4
# 3       B       6
# 4       B       8
# 5       B       6
# 6       B       4
# 7       B       3
# 8       C      10
# 9       C      10
# 10      C      10
# 11      D      12
# 12      E      12
# 13      F      10
# 14      F       8
# 15      F       6

print(df.groupby('letter').mean())
#            number
# letter
# A        2.000000
# B        4.714286
# C       10.000000
# D       12.000000
# E       12.000000
# F        8.000000

print(df.groupby('letter').mean().round().astype(int))
#         number
# letter
# A            2
# B            5
# C           10
# D           12
# E           12
# F            8

You can get back your list of tuples as follows:

averages = df.groupby('letter').mean().round().astype(int)
result = averages.to_records().tolist()  # convert the records to plain Python tuples
print(result)
# [('A', 2), ('B', 5), ('C', 10), ('D', 12), ('E', 12), ('F', 8)]
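If the unrounded averages are wanted instead, one possible variation is to select the number column and iterate over the resulting Series. This is a sketch that assumes pandas is installed; the names means and result are only illustrative:

```python
import pandas as pd

data = [('A', 2),
        ('B', 2), ('B', 4), ('B', 6), ('B', 8), ('B', 6), ('B', 4), ('B', 3),
        ('C', 10), ('C', 10), ('C', 10),
        ('D', 12),
        ('E', 12),
        ('F', 10), ('F', 8), ('F', 6)]

df = pd.DataFrame(data, columns=['letter', 'number'])

# Selecting the column yields a Series indexed by letter.
means = df.groupby('letter')['number'].mean()

# Series.items() yields (index, value) pairs; float() converts the
# NumPy scalar to a plain Python float.
result = [(letter, float(avg)) for letter, avg in means.items()]
print(result)
```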
