How Can I Resample Pandas Dataframe By Day On Period Time?
I have a dataframe like this:
df.head()
Out[2]:
         price   sale_date
0  477,000,000  1396/10/30
1  608,700,000  1396/10/30
2  580,000,000  1396/10/03
3  350,000,000  139
Solution 1:
It seems that resample and Grouper do not work with Periods here, at least for me in pandas 1.1.3 (I guess it is a bug):
import pandas as pd

# strip the separators so each date becomes an integer like 13961030
df['sale_date'] = df['sale_date'].str.replace('/', '').astype(int)
df['price'] = df['price'].str.replace(',', '').astype(int)

def conv(x):
    # split the YYYYMMDD integer into year, month and day
    return pd.Period(year=x // 10000,
                     month=x // 100 % 100,
                     day=x % 100, freq='D')

df['sale_date'] = df['sale_date'].apply(conv)

# both of the following raise an error:
# df = df.set_index('sale_date').resample('D')['price'].sum()
# OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1396-03-18 00:00:00
# df = df.set_index('sale_date').groupby(pd.Grouper(freq='D'))['price'].sum()
# OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1396-03-18 00:00:00
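To make the integer arithmetic in conv easier to follow, here is a minimal sketch of the same split applied to a single value. The helper name split_yyyymmdd is hypothetical and only used for illustration; it is not part of the original answer.

```python
import pandas as pd

def split_yyyymmdd(x):
    """Split an integer such as 13961030 into (year, month, day)."""
    year = x // 10000        # drop the last four digits
    month = x // 100 % 100   # drop the day, keep the last two digits
    day = x % 100            # keep the last two digits
    return year, month, day

year, month, day = split_yyyymmdd(13961030)
print(year, month, day)

# Building the Period from fields avoids going through a Timestamp,
# which cannot represent year 1396 at nanosecond resolution.
p = pd.Period(year=year, month=month, day=day, freq='D')
print(p.year, p.month, p.day)
```

The field-based constructor is what lets such early years through, since a nanosecond Timestamp cannot hold them.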
A possible solution is to aggregate with sum, so if a sale_date is duplicated, its price values are summed:
df = df.groupby('sale_date')['price'].sum().reset_index()
print(df)
    sale_date      price
0  1396-03-18  328000000
1  1396-10-03  580000000
2  1396-10-30  477000000
3  1396-11-25  608700000
4  1396-12-05  350000000
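As a small self-contained sketch of the aggregation above, the following uses three sample rows, two of which share a date, so the effect of summing duplicates is visible. The sample values and the helper to_period are assumptions made for illustration; they are not the asker's full data.

```python
import pandas as pd

# Hypothetical sample: two sales fall on the same day (1396/10/30).
df = pd.DataFrame({
    'price': ['477,000,000', '608,700,000', '580,000,000'],
    'sale_date': ['1396/10/30', '1396/10/30', '1396/10/03'],
})
df['price'] = df['price'].str.replace(',', '', regex=False).astype('int64')

def to_period(text):
    """Turn 'YYYY/MM/DD' into a daily Period built from its fields."""
    year, month, day = map(int, text.split('/'))
    return pd.Period(year=year, month=month, day=day, freq='D')

df['sale_date'] = df['sale_date'].apply(to_period)

# One row per distinct day; prices on the same day are added together.
out = df.groupby('sale_date')['price'].sum().reset_index()
print(out)
```

Only days that actually occur in the data appear in the result; days without sales are not created by groupby.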
EDIT: Daily resampling is possible with Series.reindex and period_range, which adds the missing days with a price of 0:
s = df.groupby('sale_date')['price'].sum()
rng = pd.period_range(s.index.min(), s.index.max(), name='sale_date')
df = s.reindex(rng, fill_value=0).reset_index()
print(df)
      sale_date      price
0    1396-03-18  328000000
1    1396-03-19          0
2    1396-03-20          0
3    1396-03-21          0
4    1396-03-22          0
..          ...        ...
258  1396-12-01          0
259  1396-12-02          0
260  1396-12-03          0
261  1396-12-04          0
262  1396-12-05  350000000

[263 rows x 2 columns]
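The gap filling can also be tried on a tiny Series, which makes it easy to check by eye. This is only a sketch under assumed data: the two dates and prices below are picked for illustration, and the variable name filled is hypothetical.

```python
import pandas as pd

# Two days with sales, three days apart (assumed sample values).
idx = pd.PeriodIndex(
    [pd.Period(year=1396, month=3, day=18, freq='D'),
     pd.Period(year=1396, month=3, day=21, freq='D')],
    name='sale_date',
)
s = pd.Series([328000000, 350000000], index=idx, name='price')

# period_range takes its daily frequency from the Period endpoints.
rng = pd.period_range(s.index.min(), s.index.max(), name='sale_date')

# Days missing from s are added with a price of 0.
filled = s.reindex(rng, fill_value=0)
print(filled.reset_index())
```

Because everything stays in Period space, no conversion to a nanosecond Timestamp takes place, which is presumably why this route avoids the OutOfBoundsDatetime error.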