Streaming from Pub/Sub to BigQuery

Asked by: 小点点

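I am trying to stream some data from Google Pub/Sub into BigQuery using a Python Dataflow pipeline. For testing purposes, I adapted the following code, https://github.com/GoogleCloudPlatform/DataflowSDK-examples/blob/master/python/dataflow_examples/cookbook/bigquery_schema.py, into a streaming pipeline by setting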

options.view_as(StandardOptions).streaming = True

Then I changed the record_ids pipeline to read from Pub/Sub:

# ADDED THIS
lines = p | 'Read PubSub' >> beam.io.ReadStringsFromPubSub(INPUT_TOPIC) | beam.WindowInto(window.FixedWindows(15))
# CHANGED THIS # record_ids = p | 'CreateIDs' >> beam.Create(['1', '2', '3', '4', '5'])
record_ids = lines | 'Split' >> (beam.FlatMap(split_fn).with_output_types(unicode))
records = record_ids | 'CreateRecords' >> beam.Map(create_random_record)
records | 'Write' >> beam.io.Write(
    beam.io.BigQuerySink(
        OUTPUT,
        schema=table_schema,
        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE))
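
For readers unfamiliar with the `window.FixedWindows(15)` step in the snippet above, here is a minimal sketch in plain Python of how 15-second fixed windows bucket element timestamps. It is an illustration only, not Beam's implementation; `fixed_window` and its `offset` parameter are hypothetical names introduced for this example.

```python
def fixed_window(timestamp, size=15, offset=0):
    """Return the [start, end) bounds of the fixed window containing timestamp.

    Hypothetical helper for illustration; it is not part of the Beam API.
    """
    # Round the timestamp down to the nearest window boundary.
    start = timestamp - (timestamp - offset) % size
    return start, start + size


# Timestamps (in seconds) that fall in the same 15-second span share a window.
for ts in (0, 14, 15, 31):
    print(ts, fixed_window(ts))
```

Each unbounded Pub/Sub element ends up in exactly one such window, which is what lets downstream steps operate on finite groups of elements.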

Note: I have been whitelisted by Google to run this code (it is in alpha).

Now when I try to run it, I get an error:

Workflow failed. Causes: (f215df7c8fcdbb00): Unknown streaming sink: bigquery

You can find the full code here: https://github.com/marcorigodanzo/gcp_streaming_test/blob/master/my_bigquery_schema.py

I think this is because the pipeline is now a streaming one. Can anyone tell me how to write to BigQuery from a streaming pipeline?

1 answer

Anonymous user

Beam Python does not support writing to BigQuery from streaming pipelines. For now, you need to use Beam Java, where you can use PubsubIO.readStrings() and BigQueryIO.writeTableRows() respectively.