
How To Keep Only The Most Recent Revised Order For Each Order In Pandas

Say I have a data frame that tracks the order number and the revision number for that order in two different columns, like so:

OrderNum  RevNum  TotalPrice
0AXL3     0       $5.00

Solution 1:

IIUC:

In [100]: df.groupby('OrderNum', as_index=False).last()
Out[100]:
  OrderNum  RevNum TotalPrice
0    0AXL3       3      $8.00
1    0BDF1       2      $8.50

UPDATE:

If there were other columns in the data frame, would this keep those as well?

In [116]: df['new'] = np.arange(len(df))

In [117]: df
Out[117]:
  OrderNum  RevNum TotalPrice  new
0    0AXL3       0      $5.00    0
1    0AXL3       1      $4.00    1
2    0AXL3       2      $7.00    2
3    0AXL3       3      $8.00    3
4    0BDF1       0      $3.00    4
5    0BDF1       1      $2.50    5
6    0BDF1       2      $8.50    6

In [118]: df.groupby('OrderNum', as_index=False).last()
Out[118]:
  OrderNum  RevNum TotalPrice  new
0    0AXL3       3      $8.00    3
1    0BDF1       2      $8.50    6
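
Note that last() works on the rows in whatever order they appear in each group (it returns the last non-null value per column), so the result above relies on the frame already being ordered by RevNum. Here is a minimal sketch, assuming the sample data shown above, that deliberately reverses the rows and then recovers the latest revision in two ways: sorting before grouping, and selecting rows with idxmax. The names shuffled, latest_by_sort and latest_by_idxmax are illustrative and not from the original answer.

```python
import pandas as pd

# Sample data copied from the example above.
df = pd.DataFrame({
    "OrderNum": ["0AXL3"] * 4 + ["0BDF1"] * 3,
    "RevNum": [0, 1, 2, 3, 0, 1, 2],
    "TotalPrice": ["$5.00", "$4.00", "$7.00", "$8.00", "$3.00", "$2.50", "$8.50"],
})

# Reverse the rows to simulate input that is not ordered by RevNum.
shuffled = df.iloc[::-1].reset_index(drop=True)

# Option A: sort by RevNum first, then take the last row of each group.
latest_by_sort = (
    shuffled.sort_values("RevNum")
    .groupby("OrderNum", as_index=False)
    .last()
)

# Option B: look up the index label of the maximum RevNum in each group
# and select those rows; this does not depend on the row order at all.
idx = shuffled.groupby("OrderNum")["RevNum"].idxmax()
latest_by_idxmax = shuffled.loc[idx].reset_index(drop=True)

print(latest_by_sort)
print(latest_by_idxmax)
```

Option B may be preferable when some columns contain missing values, since it keeps whole rows rather than the last non-null value of each column.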

Solution 2:

One way is to use drop_duplicates. Note that the data frame should already be sorted on RevNum from smallest to largest; if it is not, call sort_values first:

df1.drop_duplicates(subset='OrderNum', keep='last')

Output:

  OrderNum  RevNum TotalPrice
3    0AXL3       3      $8.00
6    0BDF1       2      $8.50

OR

df1[~df1.duplicated(subset='OrderNum', keep='last')]

Output:

  OrderNum  RevNum TotalPrice
3    0AXL3       3      $8.00
6    0BDF1       2      $8.50
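
The sort_values variant mentioned in this solution can be sketched as follows. It assumes the same sample data, with the rows deliberately out of order; df1 is rebuilt here only so the snippet runs on its own, and the name latest is illustrative.

```python
import pandas as pd

# Same sample data as above, but with the rows out of order.
df1 = pd.DataFrame({
    "OrderNum": ["0BDF1", "0AXL3", "0AXL3", "0BDF1", "0AXL3", "0BDF1", "0AXL3"],
    "RevNum": [2, 3, 0, 0, 1, 1, 2],
    "TotalPrice": ["$8.50", "$8.00", "$5.00", "$3.00", "$4.00", "$2.50", "$7.00"],
})

# Sort so the highest RevNum comes last within each order,
# then keep only the last row seen for every OrderNum.
latest = (
    df1.sort_values(["OrderNum", "RevNum"])
    .drop_duplicates(subset="OrderNum", keep="last")
)

print(latest)
```

The original index labels are preserved in the result; append reset_index(drop=True) if a fresh 0..n-1 index is wanted.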
