I am trying to split a DataFrame like the following:
from io import StringIO
import pandas as pd
data = """
A,B,C
87jg,28,3012
h372,28,3011
kj87,27,3011
2yh8,54,3010
802h,53,3010
5d8b,52,3010
"""
df = pd.read_csv(StringIO(data), sep=',')
for key, group in df.groupby(['C','B']):
    group.to_csv(f'df_{key}.csv', index=False)
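This exports the groupby results from the DataFrame to the local machine. Is there a way to do this and upload these multiple split CSVs to S3 (similar to boto3's put_object)?
You can use s3fs, which has to be installed first. You can install it with pip, for example: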
pip install s3fs
An example based on your code, which I verified:
from io import StringIO
import pandas as pd
import s3fs
# I did not use my default AWS profile, so I had to
# provide the key and secret. If you use the default
# AWS profile, passing `key` and `secret` should not
# be required.
fs = s3fs.S3FileSystem(
    anon=False,
    key='<access_key>',
    secret='<secret_key>')
data = """
A,B,C
87jg,28,3012
h372,28,3011
kj87,27,3011
2yh8,54,3010
802h,53,3010
5d8b,52,3010
"""
df = pd.read_csv(StringIO(data), sep=',')
for key, group in df.groupby(['C','B']):
    # Use a `with` block so the S3 object is closed (and flushed) after writing
    with fs.open(f's3://<bucket-name>/df_{key[0]}-M{key[1]}.csv', 'w') as f:
        group.to_csv(f, index=False)
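As a possible shortcut, once s3fs is installed pandas can write to an `s3://` URL directly, so the explicit `fs.open` call may not be needed. The following is only a sketch, assuming pandas 1.2 or later (where `to_csv` accepts `storage_options`). The helper names `s3_url_for` and `write_groups` are mine, not part of any library, and the bucket name and credentials are placeholders.

```python
import pandas as pd


def s3_url_for(bucket, key):
    """Build the target URL for one group; `key` is the (C, B) tuple from groupby."""
    return f"s3://{bucket}/df_{key[0]}-M{key[1]}.csv"


def write_groups(df, by, bucket, storage_options=None):
    """Write one CSV per group straight to S3 (hypothetical helper).

    `storage_options` is passed through to s3fs, e.g.
    {"key": "<access_key>", "secret": "<secret_key>"}; leave it as None
    to rely on the default AWS credentials.
    """
    for key, group in df.groupby(by):
        group.to_csv(s3_url_for(bucket, key), index=False,
                     storage_options=storage_options)
```

Usage would be along the lines of `write_groups(df, ['C', 'B'], '<bucket-name>')`.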
This code uploads the files correctly:
from io import StringIO
import pandas as pd
import boto3
data = """
A,B,C
87jg,28,3012
h372,28,3011
kj87,27,3011
2yh8,54,3010
802h,53,3010
5d8b,52,3010
"""
df = pd.read_csv(StringIO(data), sep=',')
client = boto3.client('s3')
for key, group in df.groupby(['C', 'B']):
    group.to_csv(f'df_{key}.csv', index=False)
    client.upload_file(f'df_{key}.csv', 'my-another-test-bucket-2',
                       f'df_{key[0]}-M{key[1]}.csv')
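If you would rather not write temporary files to local disk, the same loop can probably be done with `put_object` (as the question mentions) by building each CSV in memory. This is a rough sketch that I have not run against a real bucket. The helper names `iter_group_csvs` and `upload_groups` are my own, and `client` is assumed to be a `boto3.client('s3')` object.

```python
from io import StringIO

import pandas as pd

data = """
A,B,C
87jg,28,3012
h372,28,3011
kj87,27,3011
2yh8,54,3010
802h,53,3010
5d8b,52,3010
"""
df = pd.read_csv(StringIO(data), sep=',')


def iter_group_csvs(df, by):
    """Yield (object_key, csv_bytes) for each group, without touching disk."""
    for key, group in df.groupby(by):
        name = f"df_{key[0]}-M{key[1]}.csv"
        # to_csv() with no path returns the CSV as a string
        yield name, group.to_csv(index=False).encode("utf-8")


def upload_groups(client, bucket, df, by):
    """Upload every group as its own object using put_object."""
    for name, body in iter_group_csvs(df, by):
        client.put_object(Bucket=bucket, Key=name, Body=body)
```

Usage would be something like `upload_groups(boto3.client('s3'), '<bucket-name>', df, ['C', 'B'])`. The trade-off is that each CSV is held in memory in full, which is fine for small groups like these.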