Get Only The First And Last Rows Of Each Group With Pandas
I am a newbie in Python. I have a huge DataFrame with millions of rows and IDs. My data looks like this:
Time   ID  X   Y
8:00   A   23  100
9:00   B   24  110
10:00  B   25  120
1
Solution 1:
Use groupby, find the head and tail for each group, and concat the two.
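import pandas as pd

g = df.groupby('ID')
# Stack the first and last row of every group; a one-row group shows up twice
# until drop_duplicates removes the repeat.
(pd.concat([g.head(1), g.tail(1)])
   .drop_duplicates()
   .sort_values('ID', kind='stable')  # stable sort keeps first before last within an ID
   .reset_index(drop=True))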
    Time ID   X    Y
0   8:00  A  23  100
1  20:00  A  35  220
2   9:00  B  24  110
3  23:00  B  38  250
4  11:00  C  26  130
5  22:00  C  37  240
6  15:00  D  30  170
If you can guarantee that each ID group has at least two rows, the drop_duplicates call is not needed.
Details
g.head(1)
     Time ID   X    Y
0    8:00  A  23  100
1    9:00  B  24  110
3   11:00  C  26  130
7   15:00  D  30  170

g.tail(1)
     Time ID   X    Y
7   15:00  D  30  170
12  20:00  A  35  220
14  22:00  C  37  240
15  23:00  B  38  250

pd.concat([g.head(1), g.tail(1)])
     Time ID   X    Y
0    8:00  A  23  100
1    9:00  B  24  110
3   11:00  C  26  130
7   15:00  D  30  170
7   15:00  D  30  170
12  20:00  A  35  220
14  22:00  C  37  240
15  23:00  B  38  250
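To try Solution 1 end to end, here is a minimal sketch on a small made-up frame. The eight rows are an assumption for illustration only (their values are borrowed from the rows shown above and re-indexed 0 to 7), and kind="stable" is added so that the first row stays ahead of the last row within each ID, which the default sort algorithm does not promise.

```python
import pandas as pd

# Hypothetical sample data: ID "D" has a single row, the other IDs have two or three.
df = pd.DataFrame({
    "Time": ["8:00", "9:00", "10:00", "11:00", "15:00", "20:00", "22:00", "23:00"],
    "ID":   ["A", "B", "B", "C", "D", "A", "C", "B"],
    "X":    [23, 24, 25, 26, 30, 35, 37, 38],
    "Y":    [100, 110, 120, 130, 170, 220, 240, 250],
})

g = df.groupby("ID")

result = (
    pd.concat([g.head(1), g.tail(1)])   # first rows, then last rows
      .drop_duplicates()                # the lone "D" row was picked twice
      .sort_values("ID", kind="stable") # group by ID, first row before last row
      .reset_index(drop=True)
)

print(result)
```

Keep in mind that drop_duplicates compares column values, so two different rows that happen to hold identical values in every column would also be collapsed into one.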
Solution 2:
If you create a small function to only select the first and last rows of a DataFrame, you can apply this to a group-by, like so:
df.groupby('ID').apply(lambda x: x.iloc[[0, -1]])
As others have mentioned, it might be nice to also call .drop_duplicates() or similar afterwards, to filter out the duplicated rows that appear when an 'ID' has only one row.
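The "small function" idea could be sketched as follows. The helper name first_and_last and the sample frame are hypothetical, and the sketch assumes a reasonably recent pandas (2.x). The helper indexes the group it receives rather than the outer df, and it returns a single row for one-row groups, so no de-duplication step is needed afterwards. The value columns are selected explicitly before apply so the result does not depend on how a given pandas version treats the grouping column.

```python
import pandas as pd

# Hypothetical sample data: ID "D" has a single row.
df = pd.DataFrame({
    "Time": ["8:00", "9:00", "10:00", "11:00", "15:00", "20:00", "22:00", "23:00"],
    "ID":   ["A", "B", "B", "C", "D", "A", "C", "B"],
    "X":    [23, 24, 25, 26, 30, 35, 37, 38],
    "Y":    [100, 110, 120, 130, 170, 220, 240, 250],
})


def first_and_last(group: pd.DataFrame) -> pd.DataFrame:
    """Return the first and last row of one group, or its only row."""
    if len(group) == 1:
        return group.iloc[[0]]
    return group.iloc[[0, -1]]


value_cols = ["Time", "X", "Y"]

result = (
    df.groupby("ID")[value_cols]
      .apply(first_and_last)      # index becomes (ID, original row label)
      .reset_index(level="ID")    # move ID back into a regular column
)

print(result)
```

On a frame with millions of rows and many distinct IDs, calling a Python function once per group tends to be noticeably slower than the head/tail approach of Solution 1, so it is worth timing both on real data.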