Get Only The First And Last Rows Of Each Group With Pandas

April 22, 2024

I am a newbie in Python. I have a huge DataFrame with millions of rows and IDs. My data looks like this:

Time   ID  X   Y
8:00   A   23  100
9:00   B   24  110
10:00  B   25  120
1…

Solution 1:

Use groupby, take the first row (head(1)) and the last row (tail(1)) of each group, and concatenate the two.

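import pandas as pd

g = df.groupby('ID')

(pd.concat([g.head(1), g.tail(1)])     # first and last row of every ID
   .drop_duplicates()                  # a single-row ID appears in both; keep one copy
   .sort_values('ID', kind='stable')   # stable sort keeps each ID's first row ahead of its last
   .reset_index(drop=True))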

     Time ID   X    Y
0    8:00  A  23  100
1   20:00  A  35  220
2    9:00  B  24  110
3   23:00  B  38  250
4   11:00  C  26  130
5   22:00  C  37  240
6   15:00  D  30  170

If you can guarantee each ID group has at least two rows, the drop_duplicates call is not needed.
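A minimal, self-contained sketch of the same head/tail idea, assuming a small hypothetical DataFrame whose index labels are unique (such as the default RangeIndex). Instead of drop_duplicates(), which compares row values and could therefore also merge a group's first and last rows if every column happened to match, it drops repeated index labels, which is exactly the duplicate a single-row group produces.

```python
import pandas as pd

# Hypothetical sample data (not from the question); ID "C" has only one row.
df = pd.DataFrame({
    "Time": ["8:00", "9:00", "10:00", "11:00", "12:00", "13:00"],
    "ID":   ["A", "B", "B", "A", "C", "A"],
    "X":    [23, 24, 25, 30, 26, 35],
    "Y":    [100, 110, 120, 150, 130, 220],
})

g = df.groupby("ID")
both = pd.concat([g.head(1), g.tail(1)])  # all first rows, then all last rows

# A single-row group is returned by both head(1) and tail(1) under the same
# index label, so dropping repeated labels removes exactly that extra copy.
result = (
    both[~both.index.duplicated()]
    .sort_values("ID", kind="stable")  # stable: each ID's first row stays ahead of its last
    .reset_index(drop=True)
)
print(result)
```

With this sample, the result keeps the 8:00 and 13:00 rows for A, the 9:00 and 10:00 rows for B, and the single 12:00 row for C.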

Details

g.head(1)

    Time ID   X    Y
0   8:00  A  23  100
1   9:00  B  24  110
3  11:00  C  26  130
7  15:00  D  30  170

g.tail(1)

     Time ID   X    Y
7   15:00  D  30  170
12  20:00  A  35  220
14  22:00  C  37  240
15  23:00  B  38  250

pd.concat([g.head(1), g.tail(1)])

     Time ID   X    Y
0    8:00  A  23  100
1    9:00  B  24  110
3   11:00  C  26  130
7   15:00  D  30  170
7   15:00  D  30  170
12  20:00  A  35  220
14  22:00  C  37  240
15  23:00  B  38  250
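Since the question mentions millions of rows, an alternative that is not part of either answer may be worth considering: build a single boolean mask from GroupBy.cumcount() instead of concatenating head and tail. This is a sketch on hypothetical data; it skips both the concatenation and the de-duplication step, and the selected rows keep their original order and index labels.

```python
import pandas as pd

# Hypothetical sample data (not from the question); ID "C" has only one row.
df = pd.DataFrame({
    "Time": ["8:00", "9:00", "10:00", "11:00", "12:00", "13:00"],
    "ID":   ["A", "B", "B", "A", "C", "A"],
    "X":    [23, 24, 25, 30, 26, 35],
    "Y":    [100, 110, 120, 150, 130, 220],
})

g = df.groupby("ID")
# cumcount() numbers the rows of each group 0, 1, 2, ... in their original order;
# with ascending=False it counts from the end, so 0 marks each group's last row.
is_first = g.cumcount() == 0
is_last = g.cumcount(ascending=False) == 0

# One boolean mask selects every row at most once, so a single-row group
# (first and last at the same time) is never duplicated.
result = df[is_first | is_last]
print(result)
```

Only the middle row of A (11:00) is filtered out here; if you want the output grouped by ID, a stable sort on "ID" can follow as in Solution 1.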

Solution 2:

If you write a small function that selects only the first and last rows of a DataFrame, you can apply it to each group, like so:

df.groupby('ID').apply(lambda x: x.iloc[[0, -1]])  # x is the sub-DataFrame for one ID

As others have mentioned, you may also want to call .drop_duplicates() (or similar) afterwards to remove the duplicated row that appears for any ID with only one row.
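A sketch of the "small function" idea that handles single-row groups inside the function itself, so nothing needs to be de-duplicated afterwards. The helper name first_and_last and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical sample data (not from the question); ID "C" has only one row.
df = pd.DataFrame({
    "Time": ["8:00", "9:00", "10:00", "11:00", "12:00", "13:00"],
    "ID":   ["A", "B", "B", "A", "C", "A"],
    "X":    [23, 24, 25, 30, 26, 35],
    "Y":    [100, 110, 120, 150, 130, 220],
})


def first_and_last(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: first and last row of one group, or its only row."""
    positions = [0, -1] if len(frame) > 1 else [0]
    return frame.iloc[positions]


# Selecting the columns explicitly keeps "ID" inside each group's frame, and
# group_keys=False keeps the original row labels instead of adding the group key.
result = (
    df.groupby("ID", group_keys=False)[df.columns.tolist()]
    .apply(first_and_last)
    .sort_values("ID", kind="stable")  # order by ID, each first row before its last
    .reset_index(drop=True)
)
print(result)
```

Note that apply calls the Python function once per group, so with many IDs it is typically slower than the vectorized head/tail approach of Solution 1.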
