Asked by: 小点点

How can I efficiently compute the Cartesian product of two DataFrames in Python?


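My goal is to expand monthly data into daily data, and I think the best way is to merge the two DataFrames below. My pandas version is older than 1.2, so I can't use merge with how='cross'. Is there another efficient way to do this?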

I have two DataFrames. The first has a single column of daily dates:

A = 2000-01-01,2000-01-02....2000-12-31

My second DataFrame has three columns: time, city, and food. I want to merge it with the first DataFrame on time, but its time values are monthly:

time = 2000-01-01,2000-02-01...2000-12-01

These monthly dates repeat for every city. To illustrate:

London 2000-01-01  apple
London 2000-01-01  orange
London 2000-01-01  Rasberry
London 2000-02-01  apple
London 2000-02-01  orange
London 2000-02-01  Rasberry
...
London 2000-12-01  Rasberry
Paris 2000-01-01  apple
Paris 2000-01-01  orange
Paris 2000-01-01  Rasberry
Paris 2000-02-01  apple
Paris 2000-02-01  orange
Paris 2000-02-01  Rasberry
...
Paris  2000-12-01 Rasberry

I want to merge them so that the result looks like this:

London 2000-01-01 apple
London 2000-01-02 NAN
..
London 2000-12-01 Apple
..
London 2000-12-31 NAN
London 2000-01-01 Orange
London 2000-01-02 NAN
..
London 2000-12-01 Orange
..
London 2000-12-31 NAN
Paris 2000-01-01 apple
Paris 2000-01-02 NAN
..
Paris 2000-12-01 Apple
..
Paris 2000-12-31 NAN
Paris 2000-01-01 Orange
Paris 2000-01-02 NAN
..
Paris 2000-12-01 Orange
..
Paris 2000-12-31 NAN

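The NaNs appear because, after the merge, there are no values for days 2 through 31 of each month for each fruit in each city. If there is a simple way to do this, please let me know. I think this is some kind of outer join, but my attempt messed up the format.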


1 Answer

Anonymous user

I would use a groupby followed by a left merge:

df2.groupby('city').apply(lambda df: df1.merge(df, 'left', on='time')
                          )[['time', 'food']].reset_index(
                              level='city').reset_index(drop=True)

I get:

       city        time      food
0    London  2000-01-01     apple
1    London  2000-01-01    orange
2    London  2000-01-01  Rasberry
3    London  2000-01-02       NaN
4    London  2000-01-03       NaN
5    London  2000-01-04       NaN
...
30   London  2000-01-29       NaN
31   London  2000-01-30       NaN
32   London  2000-01-31       NaN
33   London  2000-02-01     apple
34   London  2000-02-01    orange
35   London  2000-02-01  Rasberry
36   London  2000-02-02       NaN
37   London  2000-02-03       NaN
...
337  London  2000-11-29       NaN
338  London  2000-11-30       NaN
339  London  2000-12-01  Rasberry
340  London  2000-12-02       NaN
341  London  2000-12-03       NaN
342  London  2000-12-04       NaN
...
367  London  2000-12-29       NaN
368  London  2000-12-30       NaN
369  London  2000-12-31       NaN
370   Paris  2000-01-01     apple
371   Paris  2000-01-01    orange
372   Paris  2000-01-01  Rasberry
373   Paris  2000-01-02       NaN
374   Paris  2000-01-03       NaN
...
401   Paris  2000-01-30       NaN
402   Paris  2000-01-31       NaN
403   Paris  2000-02-01     apple
404   Paris  2000-02-01    orange
405   Paris  2000-02-01  Rasberry
406   Paris  2000-02-02       NaN
407   Paris  2000-02-03       NaN
...
707   Paris  2000-11-29       NaN
708   Paris  2000-11-30       NaN
709   Paris  2000-12-01  Rasberry
710   Paris  2000-12-02       NaN
711   Paris  2000-12-03       NaN
...
738   Paris  2000-12-30       NaN
739   Paris  2000-12-31       NaN
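
If you need exactly one row per city, food, and day (the layout shown in the question), a Cartesian product can also be built on pandas older than 1.2 by merging on a constant helper column. The following is only a sketch under assumptions: the sample df1 and df2 are made up to mirror the question, and cross_join and _key are hypothetical names, not part of pandas.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's layout.
df1 = pd.DataFrame({"time": pd.date_range("2000-01-01", "2000-12-31", freq="D")})

months = pd.date_range("2000-01-01", "2000-12-01", freq="MS")
df2 = pd.DataFrame(
    [(city, t, food)
     for city in ["London", "Paris"]
     for t in months
     for food in ["apple", "orange", "Rasberry"]],
    columns=["city", "time", "food"],
)


def cross_join(left, right):
    """Cartesian product via a constant helper key (no how='cross' needed)."""
    return (
        left.assign(_key=1)
        .merge(right.assign(_key=1), on="_key")
        .drop(columns="_key")
    )


# Cross every distinct (city, food) pair with every daily date.
pairs = df2[["city", "food"]].drop_duplicates()
grid = cross_join(pairs, df1)

# Left-merge the monthly rows back in; the indicator column marks the
# rows that actually exist in df2, and food is blanked out everywhere else.
expanded = grid.merge(df2, how="left", on=["city", "food", "time"], indicator=True)
expanded["food"] = expanded["food"].where(expanded["_merge"] == "both")
expanded = expanded.drop(columns="_merge").reset_index(drop=True)
```

With this sample data, grid and expanded each have 6 x 366 rows (2000 is a leap year), and food is non-null only on the first day of each month. Note that the intermediate grid holds every combination in memory, so this may be costly for large inputs.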