import pandas as pd
mydate = ["01/01/2018","19/01/2018","24/01/2018" ,
"27/01/2018","29/01/2018","30/01/2018" ,
"22/02/2018","23/03/2018"]
mydate = pd.to_datetime(mydate, dayfirst=True)  # the strings are day/month/year
events = ["a" , "b" , "c" , "d" , "e" , "f" ,"g" , "h"]
df = pd.DataFrame({"date" :mydate,"events" :events})
df
date events
0 2018-01-01 a
1 2018-01-19 b
2 2018-01-24 c
3 2018-01-27 d
4 2018-01-29 e
5 2018-01-30 f
6 2018-02-22 g
7 2018-03-23 h
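I want to slice the data every 20 days and store each slice in a separate DataFrame. I looked at groupby, date_range, and other functions, but couldn't find a solution to my problem. I could do this with a typical for loop, but I would prefer to use some pandas functionality.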
Expected result:
df = [df1, df2, df3, df4]
where df1 contains rows 0 and 1,
df2 contains rows 2, 3, 4 and 5,
df3 contains row 6,
df4 contains row 7.
You can use pd.Grouper with freq='20d':
In [8]: final_list = [e for _, e in df.groupby(pd.Grouper(key='date', freq='20d')) if not e.empty]
In [9]: for e in final_list: print(e)
date events
0 2018-01-01 a
1 2018-01-19 b
date events
2 2018-01-24 c
3 2018-01-27 d
4 2018-01-29 e
5 2018-01-30 f
date events
6 2018-02-22 g
date events
7 2018-03-23 h
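To keep track of which 20-day window each slice belongs to, the same grouping can be collected into a dict keyed by the start of each bin. This is only a sketch: the name `slices` is my own, the frame is rebuilt from the question's data so the snippet runs on its own, and I use the uppercase alias `'20D'` on the assumption that it is the safer spelling on recent pandas versions.

```python
import pandas as pd

# Rebuild the question's frame; the date strings are day/month/year.
dates = ["01/01/2018", "19/01/2018", "24/01/2018", "27/01/2018",
         "29/01/2018", "30/01/2018", "22/02/2018", "23/03/2018"]
df = pd.DataFrame({
    "date": pd.to_datetime(dates, format="%d/%m/%Y"),
    "events": list("abcdefgh"),
})

# Map each 20-day bin's start to the rows that fall in it,
# skipping bins that contain no rows.
slices = {
    start: group
    for start, group in df.groupby(pd.Grouper(key="date", freq="20D"))
    if not group.empty
}

for start, group in slices.items():
    print(start.date(), group.index.tolist())
```

The bins are counted from the first date in the data, which is why row 7 ends up alone: the window before it contains no rows and is filtered out.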
Here is a solution, although it uses a simple loop:
import pandas as pd
# df is the DataFrame from the question
dfs = []
start = df['date'].min()
delta = df['date'].max() - start
for i in range(0, delta.days + 1, 20):
    lo = start + pd.Timedelta(days=i)
    hi = start + pd.Timedelta(days=i + 20)
    # half-open window [lo, hi) so a date on a boundary lands in only one slice
    mask = (df['date'] >= lo) & (df['date'] < hi)
    if mask.any():  # skip windows that contain no rows
        dfs.append(df.loc[mask])
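If the goal is to avoid stepping through the windows by hand, one possible alternative is to compute a bin number per row with integer division and group on that. This is a sketch rather than a definitive answer; `window` and `bin_id` are names I made up, and the frame is rebuilt from the question's data so the snippet is self-contained.

```python
import pandas as pd

# Rebuild the question's frame; the date strings are day/month/year.
dates = ["01/01/2018", "19/01/2018", "24/01/2018", "27/01/2018",
         "29/01/2018", "30/01/2018", "22/02/2018", "23/03/2018"]
df = pd.DataFrame({
    "date": pd.to_datetime(dates, format="%d/%m/%Y"),
    "events": list("abcdefgh"),
})

# Whole days since the earliest date, integer-divided by the window
# length, gives a bin number per row (0 for days 0-19, 1 for 20-39, ...).
window = 20
bin_id = (df["date"] - df["date"].min()).dt.days // window

# groupby only yields bin numbers that actually occur, so no empty
# frames end up in the list.
dfs = [group for _, group in df.groupby(bin_id)]
```

Each window here covers days `[k*20, (k+1)*20)` counted from the first date, so a row can never appear in two slices.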
I tried this:
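import numpy as np
import pandas as pd
minimum = df['date'].min()
# days elapsed since the earliest date
df['diff'] = (df['date'] - minimum) / pd.Timedelta(days=1)
# 20-day bin edges; the first edge sits just below 0 so that day 0 is included
bins = np.arange(-0.000001, df['diff'].max() + 20, 20)
df['s'] = pd.cut(df['diff'], bins, labels=False)  # integer bin number per row
for u, v in df.groupby('s'):
    print(v.drop(columns='s'))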
Output:
date events diff
0 2018-01-01 a 0.0
1 2018-01-19 b 18.0
date events diff
2 2018-01-24 c 23.0
3 2018-01-27 d 26.0
4 2018-01-29 e 28.0
5 2018-01-30 f 29.0
date events diff
6 2018-02-22 g 52.0
date events diff
7 2018-03-23 h 81.0