Pandas: Find Group Index Of First Row Matching A Predicate In A Group, If Any
I want to group a DataFrame by some criteria, and then find the integer index in the group (not the DataFrame) of the first row satisfying some predicate. If there is no such row,
Solution 1:
This is a bit longer, but IMHO it is more understandable and customizable.
In [126]: df2 = df.copy()
This is your group metric
In [127]: g = df.a//5
A reference to the created groups
In [128]: grp = df.groupby(g)
Create columns holding the generated group and the cumulative count within each group
In [129]: df2['group'] = g
In [130]: df2['count'] = grp.cumcount()
In [131]: df2
Out[131]:
     a      b  group  count
0    0    red      0      0
1    1  green      0      1
2    2   blue      0      2
3    3    red      0      3
4    4  green      0      4
5    5   blue      1      0
6    6    red      1      1
7    7  green      1      2
8    8   blue      1      3
9    9    red      1      4
10  10  green      2      0
11  11   blue      2      1
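The question never shows how `df` is built, so the sketch below assumes a frame reconstructed from the `Out[131]` table (`a` running 0 to 11, `b` cycling red/green/blue). It repeats the steps above in one self-contained script, so you can check that `cumcount()` restarts at 0 in every group.

```python
import pandas as pd

# Assumed input: reconstructed from the Out[131] table, not taken from the question
df = pd.DataFrame({"a": range(12), "b": ["red", "green", "blue"] * 4})

g = df.a // 5                   # group key: 0 for a in 0..4, 1 for 5..9, 2 for 10..11
grp = df.groupby(g)

df2 = df.copy()
df2["group"] = g
df2["count"] = grp.cumcount()   # 0-based position of each row within its own group
print(df2)
```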
Filtering and then grouping gives you back the first element that you want. The `count` column holds the within-group index.
In [132]: df2[df2.b=='red'].groupby('group').first()
Out[132]:
       a    b  count
group
0      0  red      0
1      6  red      1
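A self-contained version of the filter-then-group step, again assuming the `df` reconstructed from `Out[131]`. One caveat worth knowing: `GroupBy.first()` returns the first non-null value of each column separately, so if the real data contains missing values the columns of a result row may come from different source rows.

```python
import pandas as pd

# Assumed input, reconstructed from the Out[131] table
df = pd.DataFrame({"a": range(12), "b": ["red", "green", "blue"] * 4})
g = df.a // 5
df2 = df.assign(group=g, count=df.groupby(g).cumcount())

# Keep only rows satisfying the predicate, then take the first surviving row per group
first_red = df2[df2.b == "red"].groupby("group").first()
print(first_red)

# Within-group position of the first match, keyed by group (group 2 has no red row)
print(first_red["count"].to_dict())
```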
You can get back all of the group keys (including groups for which nothing came back from your filter) this way:
In [133]: df2[df2.b=='red'].groupby('group').first().reindex(grp.groups.keys())
Out[133]:
     a    b  count
0    0  red      0
1    6  red      1
2  NaN  NaN    NaN
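A runnable sketch of the reindexing step under the same assumed `df`. It passes `sorted(grp.groups)` (a plain list of the group keys) to `reindex` rather than the `dict_keys` view used in `In [133]`; that substitution is a choice made here, not part of the answer. Groups with no matching row come back as all-NaN rows, which also turns `count` into a float column.

```python
import pandas as pd

# Assumed input, reconstructed from the Out[131] table
df = pd.DataFrame({"a": range(12), "b": ["red", "green", "blue"] * 4})
g = df.a // 5
grp = df.groupby(g)
df2 = df.assign(group=g, count=grp.cumcount())

first_red = df2[df2.b == "red"].groupby("group").first()

# grp.groups maps each group key to its row labels; reindex against every key
# so that groups with no match show up as NaN rows instead of disappearing
all_keys = sorted(grp.groups)
result = first_red.reindex(all_keys)
print(result)
```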
Solution 2:
Best I could do:
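import itertools as it
import numpy as np
# np.where lists the positions of matching rows; chain appends None as the fallback when there are none
df.groupby(df.a // 5).apply(lambda group: next(it.chain(np.where(group["b"].to_numpy() == "red")[0], [None])))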
The only real difference is using np.where on the values (so I'd expect this to usually be faster), but you may even want to just write your own first_where function and use that.
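The answer only names a `first_where` function without defining it, so the helper below is a hypothetical sketch of one. It swaps `np.where` for `np.argmax` on the boolean mask: `argmax` returns the index of the first occurrence of the maximum, which for a boolean array is the first `True`, and an explicit `any()` check handles the no-match case. The input `df` is again assumed from the `Out[131]` table.

```python
import numpy as np
import pandas as pd

def first_where(mask):
    """Hypothetical helper: position of the first True in a boolean array, or None if there is none."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return None
    return int(mask.argmax())  # first index where the mask is True

# Assumed input, reconstructed from the Out[131] table
df = pd.DataFrame({"a": range(12), "b": ["red", "green", "blue"] * 4})

# Apply the helper to column b of each group; the result is indexed by group key
result = df.groupby(df.a // 5)["b"].apply(lambda s: first_where(s.to_numpy() == "red"))
print(result)
```

Selecting column `b` before `apply` keeps the predicate explicit about which column it tests, instead of comparing every column's values against `"red"`.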