Asked by: 小点点

GCP Dataflow cannot access a BigQuery dataset in a different GCP project


I have a GCP Dataflow pipeline that reads two datasets from two different GCP projects and compares them.

It works for two datasets in the same project. However, when I try to compare two datasets that live in different projects, I get an error:

{
  "message": "java.lang.RuntimeException: Unable to confirm BigQuery dataset presence for table \"my-other-project:my_dataset_other.2022-07-13_My_BigQuery_Table\". If the dataset is created by an earlier stage of the pipeline, this validation can be disabled using #withoutValidation.",
  "stacktrace": "java.lang.RuntimeException: java.lang.RuntimeException: Unable to confirm BigQuery dataset presence for table \"my-other-project:my_dataset_other.2022-07-13_My_BigQuery_Table\". If the dataset is created by an earlier stage of the pipeline, this validation can be disabled using #withoutValidation.\n\tat org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$TypedRead.validate(BigQueryIO.java:1018)\n\tat org.apache.beam.sdk.Pipeline$ValidateVisitor.enterCompositeTransform(Pipeline.java:662)\n\tat org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:581)\n\tat org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:585)\n\tat org.apache.beam.sdk.runners.TransformHierarchy$Node.access$500(TransformHierarchy.java:240)\n\tat org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:214)\n\tat org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:469)\n\tat org.apache.beam.sdk.Pipeline.validate(Pipeline.java:598)\n\tat org.apache.beam.sdk.Pipeline.run(Pipeline.java:322)\n\tat org.apache.beam.sdk.Pipeline.run(Pipeline.java:309)\n\tat ....
  org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)\n\tat org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:793)\n\tat org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)\n\tat org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763)\n\tat org.springframework.security.access.intercept.aopalliance.MethodSecurityInterceptor.invoke(MethodSecurityInterceptor.java:61)\n\tat org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)\n\tat org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763)\n\tat org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:708)\n\tat 
  ........
  
  org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.verifyDatasetPresence(BigQueryHelpers.java:521)\n\t... 116 more\nCaused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden\nGET https://bigquery.googleapis.com/bigquery/v2/projects/my-other-project/datasets/my_dataset_other?prettyPrint=false\n{\n  \"code\" : 403,\n  \"errors\" : [ {\n    \"domain\" : \"global\",\n    \"message\" : \"Access Denied: Dataset my-other-project:my_dataset_other: Permission bigquery.datasets.get denied on dataset my-other-project:my_dataset_other (or it may not exist).\",\n    \"reason\" : \"accessDenied\"\n  } ],\n  \"message\" : \"Access Denied: Dataset my-other-project:my_dataset_other: Permission bigquery.datasets.get denied on dataset my-other-project:my_dataset_other (or it may not exist).\",\n  \"status\" : \"PERMISSION_DENIED\"\n}\n\tat com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)\n\tat com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)\n\tat com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)\n\tat com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:428)\n\tat com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1111)\n\tat com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:514)\n\tat com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:455)\n\tat com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:565)\n\tat org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.executeWithRetries(BigQueryServicesImpl.java:1324)\n\t... 118 more\n"
}
Response headers
 cache-control: no-cache, no-store, max-age=0, must-revalidate 
 connection: close 
 content-type: application/json 
 date: Wed, 20 Jul 2022 13:40:30 GMT 
 expires: 0 
 pragma: no-cache 

This error occurs when the Dataflow pipeline running in ProjectA tries to access data in my-other-project:my_dataset_other. The pipeline runs as the service account my_user@projecta.iam.gserviceaccount.com.

I have already granted this service account the "BigQuery Data Viewer" role on my-other-project:my_dataset_other.

Edit:

The code looks like this:

private PCollection<MyModel> readModelFromBigQuery(Pipeline pipeline, String projectId, String datasetId, String table) {
    var tableReference = new TableReference()
            .setProjectId(projectId)
            .setDatasetId(datasetId)
            .setTableId(table);

    return pipeline
            .apply(BigQueryIO.readTableRows().from(tableReference))
            .apply(MapElements.into(TypeDescriptor.of(MyModel.class)).via(MyModel::fromTableRow));
}



var pCollection1 = readModelFromBigQuery(pipeline, "my-first-project", "my_dataset_first", "2022-07-13_My_BigQuery_Table");
var pCollection2 = readModelFromBigQuery(pipeline, "my-other-project", "my_dataset_other", "2022-07-13_My_BigQuery_Table");

PCollectionList.of(pCollection1).and(pCollection2)
            .apply(new MyTransformation())
            .apply(BigQueryIO.<MyModel>write()
                .to(composeDestinationTableName())
.......

Can anyone tell me what is going wrong?


1 answer

Anonymous user

The solution is simpler than expected! What caught my eye is the 403 Forbidden error: the service account is missing the bigquery.datasets.get permission.
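When the stack trace is long, it can help to pull the denied permission and the resource out of the 403 message programmatically. The following is only a minimal sketch: the class name DeniedPermissionParser and the regular expression are illustrative assumptions based on the message format shown in the question, not part of Beam or the BigQuery client.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Beam or BigQuery): extracts the denied
// permission from an "Access Denied" message like the one in the question.
public class DeniedPermissionParser {
    // Assumed format: "Permission <permission> denied on <type> <project>:<resource>"
    private static final Pattern DENIED = Pattern.compile(
            "Permission ([\\w.]+) denied on (\\w+) ([\\w-]+):(\\w+)");

    /** Returns {permission, resourceType, projectId, resourceId}, or empty if nothing matches. */
    static Optional<String[]> parse(String errorText) {
        Matcher m = DENIED.matcher(errorText);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new String[] {m.group(1), m.group(2), m.group(3), m.group(4)});
    }

    public static void main(String[] args) {
        String error = "Access Denied: Dataset my-other-project:my_dataset_other: "
                + "Permission bigquery.datasets.get denied on dataset "
                + "my-other-project:my_dataset_other (or it may not exist).";
        // Prints which permission is missing, on which resource, in which project.
        parse(error).ifPresent(p -> System.out.println(
                "missing " + p[0] + " on " + p[1] + " " + p[3] + " in project " + p[2]));
    }
}
```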

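Of course, to read the data you only need to be a BigQuery Data Viewer on the dataset. However, the Beam connector apparently lists the datasets first and only then queries the data in the dataset. So you have to grant the ability to list datasets at the project level.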

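This is not great for the principle of least privilege, but the fix is simply to grant your service account a role at the project level. To limit the scope of the permissions, you can grant the role roles/bigquery.metadataViewer at the project level. It is not too broad and not dangerous at all (better than a project-wide Data Viewer).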
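One common way to add such a project-level binding is gcloud projects add-iam-policy-binding my-other-project --member=serviceAccount:my_user@projecta.iam.gserviceaccount.com --role=roles/bigquery.metadataViewer. To check whether the grant took effect without launching the whole pipeline, one option is to replay the same datasets.get request that appears in the stack trace. The sketch below does this with the JDK's own HTTP client (Java 11+). The class name DatasetAccessCheck and the BQ_ACCESS_TOKEN environment variable are assumptions made for illustration; the token would have to be an OAuth2 access token for the service account, for instance obtained with gcloud auth print-access-token while acting as that account.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical pre-flight check: issues the datasets.get call seen in the
// stack trace, so the permission can be tested outside the pipeline.
public class DatasetAccessCheck {
    /** Builds a GET request for the dataset metadata endpoint shown in the error. */
    static HttpRequest buildRequest(String projectId, String datasetId, String accessToken) {
        URI uri = URI.create("https://bigquery.googleapis.com/bigquery/v2/projects/"
                + projectId + "/datasets/" + datasetId);
        return HttpRequest.newBuilder(uri)
                .header("Authorization", "Bearer " + accessToken)
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: BQ_ACCESS_TOKEN holds an access token for the service account.
        String token = System.getenv("BQ_ACCESS_TOKEN");
        if (token == null || token.isEmpty()) {
            System.out.println("Set BQ_ACCESS_TOKEN to run the check.");
            return;
        }
        HttpRequest request = buildRequest("my-other-project", "my_dataset_other", token);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // A 200 status means the call is allowed; a 403 reproduces the pipeline failure.
        System.out.println("HTTP " + response.statusCode());
    }
}
```

If the check still returns 403 after the role is granted, the binding may not have propagated yet or may have been applied to a different project or service account than the one the pipeline uses.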