Asked by: 小点点

Access Denied when performing an S3 Multipart Upload to an SSE-KMS encrypted bucket


I am getting Access Denied on multipart uploads to an SSE-KMS encrypted bucket.
The code runs in Glue (possibly from other services too, I cannot verify). I have tried a range of different permission sets, even full access, with no effect.

  1. Access for KMS is granted properly (kms:Decrypt, kms:Encrypt and kms:GenerateDataKey*) and it worked previously!
  2. This issue appears for both Glue Spark and Glue PyShell jobs and does not affect small files.
  3. For Glue PySpark jobs the issue appears only when a Glue security configuration is set (the key is the same one granted to the job)
  4. For PySpark, writing is done via (a fuller sketch follows this list):
    output_sink = glueContext.getSink(...)
    output_sink.writeFrame(dynamic_frame)

    and for PyShell via:

    df = pandas.read_excel(...)
    df.to_parquet(output_file_path, compression="snappy", index=False)
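
    For reference, a fuller version of that PySpark write path might look like the sketch below; the S3 path and sink options are placeholders, not the actual job parameters:

    # Hypothetical sketch of the PySpark write path; path and sink options
    # are placeholders, not the job's real settings.
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glueContext = GlueContext(SparkContext.getOrCreate())

    output_sink = glueContext.getSink(
        connection_type="s3",
        path="s3://my-bucket-name/my-output-folder/",  # placeholder target prefix
        partitionKeys=[],
        transformation_ctx="output_sink",
    )
    output_sink.setFormat("glueparquet")   # matches GlueParquetHadoopWriter in the trace below
    output_sink.writeFrame(dynamic_frame)  # dynamic_frame is built earlier in the job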
    

    What I have already tried without success:

    1. Adding s3:* permissions for the resources
    2. Adding kms:* permissions to the policy of the KMS key used in the job
    3. Additionally, adding the bucket ARN as a resource (without the key prefix)
    4. Setting s3:* with Resource: "*"
    5. Adding glue.amazonaws.com as a service principal on the KMS key
    6. For PySpark jobs, manually specifying fs.s3.enableServerSideEncryption and fs.s3.serverSideEncryption.kms.keyId with the corresponding key ARN (see the sketch after this list)
    7. For PyShell jobs, trying to upgrade the awscli, botocore and boto3 versions (pandas=1.1.5 and s3fs=0.4.2 cannot be upgraded any further because PyShell runs Python 3.6.13)
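
    For attempt 6, the properties were set roughly as in the sketch below; this simply sets the two EMRFS properties named above from inside the job, and the key ARN is a placeholder:

    # Hypothetical sketch of attempt 6: setting EMRFS server-side encryption
    # properties from inside the PySpark job. The key ARN is a placeholder.
    from pyspark.context import SparkContext

    sc = SparkContext.getOrCreate()
    hadoop_conf = sc._jsc.hadoopConfiguration()
    hadoop_conf.set("fs.s3.enableServerSideEncryption", "true")
    hadoop_conf.set(
        "fs.s3.serverSideEncryption.kms.keyId",
        "arn:aws:kms:ca-central-1:111111111111:key/kkkkkk",  # placeholder key ARN
    )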

    Part of the PySpark job's stacktrace pointing at the multipart upload issue:

    : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, 172.36.10.82, executor 1): com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: ..; S3 Extended Request ID: ..), S3 Extended Request ID: ....
    ..<cropped entries>..
        at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.execute(GlobalS3Executor.java:110)
        at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:189)
        at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:184)
        at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.putObject(AmazonS3LiteClient.java:107)
        at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.storeFile(Jets3tNativeFileSystemStore.java:174)
        at com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.uploadSinglePart(MultipartUploadOutputStream.java:208)
        at com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.close(MultipartUploadOutputStream.java:423)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:74)
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:108)
        at org.apache.parquet.nimble.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:579)
    ..<cropped entries>..
        at com.amazonaws.services.glue.sinks.GlueParquetHadoopWriter.writeParquetPartitioned(GlueParquetHadoopWriter.scala:163)
        at com.amazonaws.services.glue.sinks.GlueParquetHadoopWriter$$anonfun$doParquetWrite$2.apply(GlueParquetHadoopWriter.scala:188)
        at com.amazonaws.services.glue.sinks.GlueParquetHadoopWriter$$anonfun$doParquetWrite$2.apply(GlueParquetHadoopWriter.scala:181)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:121)
    ..<cropped entries>..
    

    Error message from the PyShell job:

    Sending http request: <AWSPreparedRequest stream_output=False, method=PUT, 
    url=https://my-bucket-name.s3.ca-central-1.amazonaws.com/folder/folder/folder/file-name.snappy.parquet?partNumber=1&uploadId=~uploadId~, 
    headers={
      'User-Agent': b'Botocore/1.12.232 Python/3.6.13 Linux/4.14.238-125.422.amzn1.x86_64',
      'Content-MD5': b'Ic4VG7BgETssQJOhSK+E/Q==',
      'Expect': b'100-continue',
      'X-Amz-Date': b'20220518T163248Z',
      'X-Amz-Security-Token': b'~token-data~',
      'X-Amz-Content-SHA256': b'UNSIGNED-PAYLOAD',
      'Authorization': b'AWS4-HMAC-SHA256 Credential=~credential~, SignedHeaders=content-md5;host;x-amz-content-sha256;x-amz-date;x-amz-security-token, Signature=~signature~',
      'Content-Length': '5421349'
    }>
    ...
    
    Traceback (most recent call last):
      File "/tmp/glue-python-scripts-2tscdixy/script.py", line 44, in main
        df.to_parquet(output_file_path, compression="snappy", index=False)
      File "/glue/lib/installation/pandas/util/_decorators.py", line 199, in wrapper
        return func(*args, **kwargs)
      File "/glue/lib/installation/pandas/core/frame.py", line 2372, in to_parquet
        **kwargs,
      File "/glue/lib/installation/pandas/io/parquet.py", line 276, in to_parquet
        **kwargs,
      File "/glue/lib/installation/pandas/io/parquet.py", line 123, in write
        self.api.parquet.write_table(table, path, compression=compression, **kwargs)
      File "/glue/lib/installation/pyarrow/parquet.py", line 2034, in write_table
        writer.write_table(table, row_group_size=row_group_size)
      File "/glue/lib/installation/pyarrow/parquet.py", line 686, in __exit__
        self.close()
      File "/glue/lib/installation/pyarrow/parquet.py", line 710, in close
        self.file_handle.close()
      File "pyarrow/io.pxi", line 173, in pyarrow.lib.NativeFile.close
      File "/glue/lib/installation/fsspec/spec.py", line 1630, in close
        self.flush(force=True)
      File "/glue/lib/installation/fsspec/spec.py", line 1501, in flush
        if self._upload_chunk(final=force) is not False:
      File "/glue/lib/installation/s3fs/core.py", line 1245, in _upload_chunk
        raise IOError('Write failed: %r' % exc)
    OSError: Write failed: ClientError('An error occurred (AccessDenied) when calling the UploadPart operation: Access Denied',)
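
    To isolate whether the job role can call UploadPart against this prefix at all, a bare multipart upload can be driven directly with boto3; the sketch below is a diagnostic only, with bucket, key and region as placeholders:

    # Hypothetical diagnostic: run a minimal multipart upload with boto3 to check
    # whether UploadPart itself is denied for the job role. Bucket, key and
    # region are placeholders.
    import boto3

    s3 = boto3.client("s3", region_name="ca-central-1")
    bucket = "my-bucket-name"                     # placeholder
    key = "my-output-folder/multipart-probe.bin"  # placeholder

    mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
    try:
        part = s3.upload_part(
            Bucket=bucket,
            Key=key,
            UploadId=mpu["UploadId"],
            PartNumber=1,
            Body=b"0" * (5 * 1024 * 1024),  # 5 MiB, the usual minimum part size
        )
        s3.complete_multipart_upload(
            Bucket=bucket,
            Key=key,
            UploadId=mpu["UploadId"],
            MultipartUpload={"Parts": [{"ETag": part["ETag"], "PartNumber": 1}]},
        )
    except Exception:
        # cleanup requires s3:AbortMultipartUpload on the object ARN
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=mpu["UploadId"])
        raise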
    

1 answer

Anonymous user

The issue persisted for 3 days and suddenly disappeared today. My only suspicion is that it was caused by an internal AWS error that was fixed yesterday, after which all of the previously non-working policies started granting the proper permissions.

For anyone looking for clues or a solution, the only potentially useful option I found (though I have not confirmed whether it actually works) is to add a separate policy statement with bucket listing permissions, and to add the s3:ListMultipartUploadParts and s3:AbortMultipartUpload actions to the statement for the target folder:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::my-bucket-name"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetEncryptionConfiguration",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload"
      ],
      "Resource": [
        "arn:aws:s3:::my-bucket-name/my-output-folder/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
      ],
      "Resource": "arn:aws:kms:region:11111:key/kkkkkk"
    }
  ]
}
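
Note that the split into two statements matters: s3:ListBucket, s3:ListBucketMultipartUploads and s3:GetBucketLocation are bucket-level actions that must be granted on the bucket ARN, while s3:PutObject, s3:ListMultipartUploadParts and s3:AbortMultipartUpload are evaluated against the object ARNs under the prefix.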