Asked by: 小点点

Upload DataFrames to S3 in Python [duplicate]


I am trying to split a DataFrame into groups, like this:

from io import StringIO
import pandas as pd

data = """
A,B,C
87jg,28,3012
h372,28,3011
kj87,27,3011
2yh8,54,3010
802h,53,3010
5d8b,52,3010
"""
df = pd.read_csv(StringIO(data), sep=',')

for key, group in df.groupby(['C','B']):
    group.to_csv(f'df_{key}.csv', index=False)

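This exports each group produced by groupby as a CSV file on the local machine. Is there a way to do the same thing but upload the resulting CSV files to S3 (something like boto3's put_object)?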


2 answers

Anonymous user

You can use s3fs, which has to be installed first. It can be installed with pip, for example:

pip install s3fs

An example based on your code, which I have verified:

from io import StringIO
import pandas as pd
import s3fs

# I did not use my default aws profile
# so had to provide key and secret. If you use
# the default aws profile, providing `key`
# and `secret` should not be required
fs = s3fs.S3FileSystem(
        anon=False,
        key='<access_key>',
        secret='<secret_key>')

data = """
A,B,C
87jg,28,3012
h372,28,3011
kj87,27,3011
2yh8,54,3010
802h,53,3010
5d8b,52,3010
"""
df = pd.read_csv(StringIO(data), sep=',')

for key, group in df.groupby(['C','B']):
    # Use a context manager so the file is closed and the upload is flushed
    with fs.open(f's3://<bucket-name>/df_{key[0]}-M{key[1]}.csv', 'w') as f:
        group.to_csv(f, index=False)

The code uploads the files correctly.

Anonymous user

from io import StringIO
import pandas as pd
import boto3


data = """
A,B,C
87jg,28,3012
h372,28,3011
kj87,27,3011
2yh8,54,3010
802h,53,3010
5d8b,52,3010
"""
df = pd.read_csv(StringIO(data), sep=',')

client = boto3.client('s3')
for key, group in df.groupby(['C', 'B']):
    group.to_csv(f'df_{key}.csv', index=False)
    client.upload_file(f'df_{key}.csv', 'my-another-test-bucket-2',
                       f'df_{key[0]}-M{key[1]}.csv')
