Attempting To Find The 5 Largest Values Per Month Using Groupby
I am attempting to show the top three values of nc_type for each month. I tried using n_largest but that doesn't do it by date. Original Data: area
Solution 1:
Scenario 1
MultiIndex series
occurred_date nc_type
1.0 x 3
y 4
z 13
w 24
f 34
12.0 d 18
g 10
w 44
a 27
g 42
Name: test, dtype: int64
Call sort_values
+ groupby
+ head
:
df.sort_values(ascending=False).groupby(level=0).head(2)
occurred_date nc_type
12.0 w 44
g 42
1.0 f 34
w 24
Name: test, dtype: int64
Change head(2)
to head(5)
for your situation.
Or, expanding upon my comment with nlargest
, you could do:
df.groupby(level=0).nlargest(2).reset_index(level=0, drop=1)
occurred_date nc_type
1.0 f 34
w 24
12.0 w 44
g 42
Name: test, dtype: int64
Scenario 2
3-col dataframe
occurred_date nc_type value
0 1.0 x 3
1 1.0 y 4
2 1.0 z 13
3 1.0 w 24
4 1.0 f 34
5 12.0 d 18
6 12.0 g 10
7 12.0 w 44
8 12.0 a 27
9 12.0 g 42
You can use sort_values
+ groupby
+ head
:
df.sort_values(['occurred_date', 'value'],
ascending=[True, False]).groupby('occurred_date').head(2)
occurred_date nc_type value
4 1.0 f 34
3 1.0 w 24
7 12.0 w 44
9 12.0 g 42
Change head(2)
to head(5)
for your scenario.
Scenario 3
MultiIndex Dataframe
test
occurred_date nc_type
1.0 x 3
y 4
z 13
w 24
f 34
12.0 d 18
g 10
w 44
a 27
g 42
Or, with nlargest
.
df.groupby(level=0).test.nlargest(2)\
.reset_index(level=0, drop=1)
occurred_date nc_type
1.0 f 34
w 24
12.0 w 44
g 42
Name: test, dtype: int64
Solution 2:
I'd include group_keys=False
df.groupby('occurred_date', group_keys=False).nlargest(3)
occurred_date nc_type
1.0 f 34
w 24
z 13
12.0 w 44
g 42
a 27
Name: value, dtype: int64
Post a Comment for "Attempting To Find The 5 Largest Values Per Month Using Groupby"