
Pandas: Find Group Index Of First Row Matching A Predicate In A Group, If Any

I want to group a DataFrame by some criteria, and then find the integer index in the group (not the DataFrame) of the first row satisfying some predicate. If there is no such row,

Solution 1:

This is a bit longer, but IMHO it is more understandable and customizable.

In [126]: df2 = df.copy()

This is your grouping key

In [127]: g = df.a//5

A reference to the created groups

In [128]: grp = df.groupby(g)

Create columns holding the group key and the cumulative count within each group

In [129]: df2['group'] = g

In [130]: df2['count'] = grp.cumcount()

In [131]: df2
Out[131]: 
     a      b  group  count
0    0    red      0      0
1    1  green      0      1
2    2   blue      0      2
3    3    red      0      3
4    4  green      0      4
5    5   blue      1      0
6    6    red      1      1
7    7  green      1      2
8    8   blue      1      3
9    9    red      1      4
10  10  green      2      0
11  11   blue      2      1
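The original df is never shown. Judging from Out[131], it appears to have an integer column a running from 0 to 11 and a column b cycling through red, green, blue. A minimal sketch that rebuilds that assumed frame and the two helper columns, so the session can be replayed:

```python
import pandas as pd

# Assumed input, reconstructed from the printed Out[131] table
df = pd.DataFrame({"a": range(12), "b": ["red", "green", "blue"] * 4})

g = df.a // 5                  # grouping key: 0 for a in 0..4, 1 for 5..9, 2 for 10..11
grp = df.groupby(g)

df2 = df.copy()
df2["group"] = g
df2["count"] = grp.cumcount()  # 0-based position of each row inside its group
print(df2)
```

cumcount() restarts at 0 in every group, which is exactly the "index in the group, not the DataFrame" the question asks for.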

Filtering first and then grouping gives you back the first matching row in each group. The count column holds that row's position within its group.

In [132]: df2[df2.b=='red'].groupby('group').first()
Out[132]: 
       a    b  count
group
0      0  red      0
1      6  red      1
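If only the within-group position is needed, it may be simpler to select the count column before calling first(), so the result is a plain Series keyed by group. A sketch on the same assumed sample frame:

```python
import pandas as pd

# Assumed sample frame (same shape as the Out[131] table)
df = pd.DataFrame({"a": range(12), "b": ["red", "green", "blue"] * 4})
g = df.a // 5

df2 = df.assign(group=g, count=df.groupby(g).cumcount())

# Keep matching rows, then take the first one per group; only the
# position column is kept, so the result is a Series indexed by group
first_red = df2[df2.b == "red"].groupby("group")["count"].first()
print(first_red)
```

Group 2 is simply absent here, because none of its rows passed the filter.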

You can get back all of the group keys this way, including groups where nothing passed your filter (those rows come back as NaN):

In [133]: df2[df2.b=='red'].groupby('group').first().reindex(grp.groups.keys())
Out[133]: 
     a    b  count
0    0  red      0
1    6  red      1
2  NaN  NaN    NaN
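Reindexing inserts NaN for the missing groups, and in current pandas that upcasts integer columns to float. A sketch, assuming pandas 1.0 or later for the nullable Int64 dtype, that keeps the positions as integers and marks groups with no match as <NA>:

```python
import pandas as pd

df = pd.DataFrame({"a": range(12), "b": ["red", "green", "blue"] * 4})  # assumed sample
g = df.a // 5
grp = df.groupby(g)

pos = (
    df.assign(group=g, count=grp.cumcount())
    .loc[lambda d: d.b == "red"]      # keep only rows matching the predicate
    .groupby("group")["count"]
    .first()                          # first matching position per group
    .reindex(sorted(grp.groups))      # every group key, in order (adds NaN)
    .astype("Int64")                  # nullable integer: missing groups become <NA>
)
print(pos)
```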

Solution 2:

Best I could do:

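import itertools as it
import numpy as np
# Position of the first "red" in column b of each group, or None if there is none
df.groupby(df.a // 5).apply(lambda group: next(it.chain(np.where(group['b'].to_numpy() == "red")[0], [None])))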
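The next(it.chain(..., [None])) part is what handles the "if any" case: chaining a one-element [None] fallback after the matches means next() falls through to None when np.where finds nothing. A small sketch of that idiom on plain arrays, together with next(iter(...), None), which should behave the same using only the built-in default argument of next():

```python
import itertools as it
import numpy as np

values = np.array(["green", "blue", "green"])  # a group with no "red"
positions = np.where(values == "red")[0]        # empty array of positions

# Solution 2's idiom: a one-element fallback chained after the matches
a = next(it.chain(positions, [None]))
# Same result via next()'s default argument
b = next(iter(positions), None)
print(a, b)  # None None

hit = np.where(np.array(["blue", "red", "red"]) == "red")[0]
print(next(iter(hit), None))  # first match is at position 1
```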

The main difference from the first solution is that np.where runs on the underlying NumPy array, so I'd expect this to be faster in most cases. You may even want to write your own first_where function and use that.
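One possible shape for such a first_where helper, as a sketch: the name comes from the suggestion above, but the signature (an array plus a vectorized predicate returning a boolean mask) is an assumption, not an existing pandas or NumPy function. It relies on argmax returning the first True position of a boolean array.

```python
import numpy as np
import pandas as pd

def first_where(values, predicate):
    """Hypothetical helper: 0-based position of the first element for which
    the vectorized predicate is True, or None if there is no such element."""
    mask = np.asarray(predicate(values), dtype=bool)
    if not mask.any():
        return None
    return int(mask.argmax())  # argmax on a bool array gives the first True

df = pd.DataFrame({"a": range(12), "b": ["red", "green", "blue"] * 4})  # assumed sample

result = df.groupby(df.a // 5)["b"].apply(
    lambda s: first_where(s.to_numpy(), lambda v: v == "red")
)
print(result)
```

Any vectorized condition can be passed as the predicate. Note that pandas may turn the None for group 2 into NaN when it assembles the result Series.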
