
Get Only The First And Last Rows Of Each Group With Pandas

I am a newbie in Python. I have a huge dataframe with millions of rows and IDs. My data looks like this:

 Time ID   X    Y
 8:00  A  23  100
 9:00  B  24  110
10:00  B  25  120
1

Solution 1:

Use groupby, find the head and tail for each group, and concat the two.

g = df.groupby('ID')

(pd.concat([g.head(1), g.tail(1)])
   .drop_duplicates()
   .sort_values('ID')
   .reset_index(drop=True))

    Time ID   X    Y
0   8:00  A  23  100
1  20:00  A  35  220
2   9:00  B  24  110
3  23:00  B  38  250
4  11:00  C  26  130
5  22:00  C  37  240
6  15:00  D  30  170

If you can guarantee each ID group has at least two rows, the drop_duplicates call is not needed.
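To try this end to end, here is a minimal, self-contained sketch. The sample values are made up for illustration (they are not the asker's full data), and group D deliberately has a single row so the effect of drop_duplicates is visible. The stable sort kind is an addition of mine so that the first row stays ahead of the last row within each ID.

```python
import pandas as pd

# Hypothetical sample data; group D has only one row.
df = pd.DataFrame({
    "Time": ["8:00", "9:00", "10:00", "11:00", "15:00", "20:00", "23:00", "22:00"],
    "ID":   ["A", "B", "B", "C", "D", "A", "B", "C"],
    "X":    [23, 24, 25, 26, 30, 35, 38, 37],
    "Y":    [100, 110, 120, 130, 170, 220, 250, 240],
})

g = df.groupby("ID")

# head(1) and tail(1) both return D's only row, so it appears twice here.
both = pd.concat([g.head(1), g.tail(1)])

result = (both
          .drop_duplicates()                   # collapse the repeated D row
          .sort_values("ID", kind="mergesort")  # stable: keeps first before last
          .reset_index(drop=True))

print(result)
```

Without the drop_duplicates call, the D row would be listed twice in the result.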


Details

g.head(1)

     Time ID   X    Y
0    8:00  A  23  100
1    9:00  B  24  110
3   11:00  C  26  130
7   15:00  D  30  170

g.tail(1)

     Time ID   X    Y
7   15:00  D  30  170
12  20:00  A  35  220
14  22:00  C  37  240
15  23:00  B  38  250

pd.concat([g.head(1), g.tail(1)])

     Time ID   X    Y
0    8:00  A  23  100
1    9:00  B  24  110
3   11:00  C  26  130
7   15:00  D  30  170
7   15:00  D  30  170
12  20:00  A  35  220
14  22:00  C  37  240
15  23:00  B  38  250
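Note that head and tail keep the original index labels, which is why index 7 shows up twice in the concatenated frame. Since drop_duplicates compares row values rather than index labels, it would also merge a first and last row that are different rows but happen to hold identical values. If that matters for your data, one possible variation is to deduplicate on the index instead. The sketch below uses made-up data and assumes the original DataFrame has a unique index.

```python
import pandas as pd

# Hypothetical data: group B's first and last rows hold identical values.
df = pd.DataFrame({
    "ID": ["A", "B", "B", "B"],
    "X":  [1, 5, 6, 5],
})

g = df.groupby("ID")
both = pd.concat([g.head(1), g.tail(1)])

# Value-based: B's first (index 1) and last (index 3) rows are merged.
by_value = both.drop_duplicates()

# Index-based: only the repeated single-row group A (index 0) is collapsed.
by_index = both[~both.index.duplicated(keep="first")]

print(by_value)
print(by_index)
```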

Solution 2:

If you create a small function to only select the first and last rows of a DataFrame, you can apply this to a group-by, like so:

df.groupby('ID').apply(lambda x: x.iloc[[0, -1]])

As others have mentioned, you may also want to call .drop_duplicates() or similar afterwards, because a group with only one row for the 'ID' returns that row twice.
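A runnable sketch of the apply approach might look like the following. The helper name first_and_last and the sample data are hypothetical. Two choices here are my own rather than part of the answer above: the helper returns a single-row group only once, so no deduplication step is needed afterwards, and the value columns are selected explicitly before apply so the helper never operates on the grouping column (recent pandas versions change how apply treats it).

```python
import pandas as pd

# Hypothetical sample data; group D has only one row.
df = pd.DataFrame({
    "Time": ["8:00", "9:00", "10:00", "11:00", "15:00", "20:00", "23:00", "22:00"],
    "ID":   ["A", "B", "B", "C", "D", "A", "B", "C"],
    "X":    [23, 24, 25, 26, 30, 35, 38, 37],
    "Y":    [100, 110, 120, 130, 170, 220, 250, 240],
})


def first_and_last(group):
    """Return the first and last rows of a group (one row if that is all it has)."""
    positions = [0] if len(group) == 1 else [0, -1]
    return group.iloc[positions]


# The result is indexed by (ID, original row label).
picked = df.groupby("ID")[["Time", "X", "Y"]].apply(first_and_last)

# Move ID back into a column and discard the original row labels.
result = picked.reset_index(level="ID").reset_index(drop=True)

print(result)
```

On frames with millions of rows and many groups, apply with a Python function tends to be slower than the head/tail approach in Solution 1, so it is worth timing both on your own data.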
