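I have a GCP Dataflow pipeline that reads two datasets from two different GCP projects and compares them.
It works for two datasets in the same project. However, when I try to compare two datasets in different projects, I get an error: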
{
"message": "java.lang.RuntimeException: Unable to confirm BigQuery dataset presence for table \"my-other-project:my_dataset_other.2022-07-13_My_BigQuery_Table\". If the dataset is created by an earlier stage of the pipeline, this validation can be disabled using #withoutValidation.",
"stacktrace": "java.lang.RuntimeException: java.lang.RuntimeException: Unable to confirm BigQuery dataset presence for table \"my-other-project:my_dataset_other.2022-07-13_My_BigQuery_Table\". If the dataset is created by an earlier stage of the pipeline, this validation can be disabled using #withoutValidation.\n\tat org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$TypedRead.validate(BigQueryIO.java:1018)\n\tat org.apache.beam.sdk.Pipeline$ValidateVisitor.enterCompositeTransform(Pipeline.java:662)\n\tat org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:581)\n\tat org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:585)\n\tat org.apache.beam.sdk.runners.TransformHierarchy$Node.access$500(TransformHierarchy.java:240)\n\tat org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:214)\n\tat org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:469)\n\tat org.apache.beam.sdk.Pipeline.validate(Pipeline.java:598)\n\tat org.apache.beam.sdk.Pipeline.run(Pipeline.java:322)\n\tat org.apache.beam.sdk.Pipeline.run(Pipeline.java:309)\n\tat ....
org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)\n\tat org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:793)\n\tat org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)\n\tat org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763)\n\tat org.springframework.security.access.intercept.aopalliance.MethodSecurityInterceptor.invoke(MethodSecurityInterceptor.java:61)\n\tat org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)\n\tat org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763)\n\tat org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:708)\n\tat
........
org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.verifyDatasetPresence(BigQueryHelpers.java:521)\n\t... 116 more\nCaused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden\nGET https://bigquery.googleapis.com/bigquery/v2/projects/my-other-project/datasets/my_dataset_other?prettyPrint=false\n{\n \"code\" : 403,\n \"errors\" : [ {\n \"domain\" : \"global\",\n \"message\" : \"Access Denied: Dataset my-other-project:my_dataset_other: Permission bigquery.datasets.get denied on dataset my-other-project:my_dataset_other (or it may not exist).\",\n \"reason\" : \"accessDenied\"\n } ],\n \"message\" : \"Access Denied: Dataset my-other-project:my_dataset_other: Permission bigquery.datasets.get denied on dataset my-other-project:my_dataset_other (or it may not exist).\",\n \"status\" : \"PERMISSION_DENIED\"\n}\n\tat com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)\n\tat com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)\n\tat com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)\n\tat com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:428)\n\tat com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1111)\n\tat com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:514)\n\tat com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:455)\n\tat com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:565)\n\tat org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.executeWithRetries(BigQueryServicesImpl.java:1324)\n\t... 118 more\n"
}
Response headers
cache-control: no-cache, no-store, max-age=0, must-revalidate
connection: close
content-type: application/json
date: Wed, 20 Jul 2022 13:40:30 GMT
expires: 0
pragma: no-cache
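This error occurs when the Dataflow pipeline running in ProjectA tries to access data in my-other-project:my_dataset_other. The Dataflow job runs as the service account my_user@projecta.iam.gserviceaccount.com.
I have granted this service account the "BigQuery Data Viewer" role on my-other-project:my_dataset_other.
Edit:
The code looks like this: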
private PCollection<MyModel> readModelFromBigQuery(Pipeline pipeline, String projectId, String datasetId, String table) {
    var tableReference = new TableReference()
        .setProjectId(projectId)
        .setDatasetId(datasetId)
        .setTableId(table);
    return pipeline
        .apply(BigQueryIO.readTableRows().from(tableReference))
        .apply(MapElements.into(TypeDescriptor.of(MyModel.class)).via(MyModel::fromTableRow));
}

var pCollection1 = readModelFromBigQuery(pipeline, "my-first-project", "my_dataset_first", "2022-07-13_My_BigQuery_Table");
var pCollection2 = readModelFromBigQuery(pipeline, "my-other-project", "my_dataset_other", "2022-07-13_My_BigQuery_Table");
PCollectionList.of(pCollection1).and(pCollection2)
    .apply(new MyTransformation())
    .apply(BigQueryIO.<MyModel>write()
        .to(composeDestinationTableName())
.......
Can anyone tell me what is going wrong?
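The solution is easier than expected! Reading the error again today, I see a 403 Forbidden: you are missing the bigquery.datasets.get permission.
Granted, to read the data you only need to be BigQuery Data Viewer on the dataset. But evidently the Beam connector first looks up the dataset and only then queries the data in it; the stack trace shows the failure in BigQueryHelpers.verifyDatasetPresence, during pipeline validation and before any read happens. So you have to grant the ability to list datasets at the project level.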
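To make it concrete which resource the failing check targets, here is a minimal sketch that rebuilds the request URL from the table spec in the error message. The class and method names are hypothetical and not part of Beam, and the parsing is a simplification of whatever Beam does internally; only the URL shape is taken from the stack trace in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Beam: illustrates the dataset lookup seen in the stack trace.
class DatasetCheckUrl {
    // Simplified table spec "project:dataset.table".
    // Assumption: the project ID contains no ':' (no domain-scoped projects).
    private static final Pattern TABLE_SPEC = Pattern.compile("^([^:]+):([^.]+)\\.(.+)$");

    // Returns the dataset URL that the 403 response in the question refers to.
    static String datasetGetUrl(String tableSpec) {
        Matcher m = TABLE_SPEC.matcher(tableSpec);
        if (!m.matches()) {
            throw new IllegalArgumentException("Expected project:dataset.table, got: " + tableSpec);
        }
        return "https://bigquery.googleapis.com/bigquery/v2/projects/"
            + m.group(1) + "/datasets/" + m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(
            datasetGetUrl("my-other-project:my_dataset_other.2022-07-13_My_BigQuery_Table"));
    }
}
```

The service account has to be allowed to GET that dataset resource in my-other-project; if it is not, validation fails before the pipeline is even submitted.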
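That is bad news for the principle of least privilege, but you simply have to grant your service account a role at the project level. To limit the scope of the permissions, you can grant roles/bigquery.metadataViewer at the project level. It is not too broad and not dangerous at all (better than a project-wide Data Viewer).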
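One way to apply that grant is with the gcloud CLI. The sketch below only assembles and prints the command, using the project and service account from the question. The helper class is hypothetical, and it assumes you have gcloud installed and are allowed to change IAM policy on my-other-project.

```java
import java.util.List;

// Hypothetical helper: assembles the gcloud invocation for a project-level role binding.
class GrantMetadataViewer {
    // Builds the argument list; nothing is executed here.
    static List<String> grantCommand(String projectId, String serviceAccountEmail) {
        return List.of(
            "gcloud", "projects", "add-iam-policy-binding", projectId,
            "--member=serviceAccount:" + serviceAccountEmail,
            "--role=roles/bigquery.metadataViewer");
    }

    public static void main(String[] args) {
        // Values taken from the question; print the command instead of running it.
        List<String> cmd = grantCommand(
            "my-other-project", "my_user@projecta.iam.gserviceaccount.com");
        System.out.println(String.join(" ", cmd));
    }
}
```

Printing keeps the sketch free of side effects; the same list could be handed to a ProcessBuilder if you prefer to run it from Java.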