I am building a Cloud SQL (MSSQL Server) to BigQuery integration using Airflow (Composer) on GCP. I have set up a Cloud SQL proxy in the GKE cluster, and it runs fine with no errors:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: cloud-sql-proxy
  name: cloud-sql-proxy
  namespace: cloud-sql-to-bq
spec:
  replicas: 1
  selector:
    matchLabels:
      run: cloud-sql-proxy
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        run: cloud-sql-proxy
    spec:
      containers:
      - command:
        - /cloud_sql_proxy
        - -instances=[INSTANCE-NAME]=tcp:0.0.0.0:1433
        image: b.gcr.io/cloudsql-docker/gce-proxy:latest
        imagePullPolicy: IfNotPresent
        name: airflow-sqlproxy
        ports:
        - containerPort: 1433
          protocol: TCP
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      nodeSelector:
        cloud.google.com/gke-nodepool: default-pool
      restartPolicy: Always
My DAG:
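# Imports assume the Airflow 1.10.x contrib layout (an assumption; not shown in the question).
from airflow import DAG
from airflow.contrib.operators.mssql_to_gcs import MsSqlToGoogleCloudStorageOperator

# default_args is defined elsewhere (not shown in the question).
dag = DAG('mssql-export-demo', catchup=False, default_args=default_args)

cloud_storage_bucket_name = 'mssql-export-test'

# Export the rows of the view to GCS as JSON, plus a schema file.
export_customers = MsSqlToGoogleCloudStorageOperator(
    task_id='export_analysis',
    sql='SELECT * FROM vwAnalysis;',
    bucket=cloud_storage_bucket_name,
    filename='data/customers/export.json',
    schema_filename='schemas/export.json',
    mssql_conn_id='cloud_sql_proxy_conn',
    dag=dag
)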
I have also created the cloud_sql_proxy_conn connection in Airflow. When I run the DAG, I get the following error:
[2020-11-28 01:59:20,555] {taskinstance.py:1153} ERROR - Connection to the database failed for an unknown reason.
Traceback (most recent call last):
  File "src/pymssql.pyx", line 636, in pymssql.connect
  File "src/_mssql.pyx", line 1964, in _mssql.connect
  File "src/_mssql.pyx", line 683, in _mssql.MSSQLConnection.__init__
_mssql.MSSQLDriverException: Connection to the database failed for an unknown reason.
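There is no further error information, which makes this quite hard to debug. Does anyone have experience with MSSQL on Cloud SQL and Composer who can help me resolve this?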
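Because the driver reports no cause, one possible first step is to check whether the proxy's TCP port is reachable at all from where the task runs. The following is only a minimal sketch using the Python standard library; the helper name `can_reach` and the host name in the usage comment are hypothetical and not taken from the setup above.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Hypothetical usage; replace the host with whatever is configured
# on the cloud_sql_proxy_conn connection in Airflow:
# print(can_reach("<proxy-host>", 1433))
```

If this returns False from a Composer worker, the problem is likely network reachability or name resolution rather than SQL Server authentication, although that is only a hint and not a diagnosis.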
Airflow now provides CloudSqlInstanceExportOperator, which means there is no need to set up a Cloud SQL proxy in GKE.
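As a rough sketch of what that could look like: the operator takes a request body in the shape of the Cloud SQL Admin API export call. The helper `build_export_body`, the database name, and the object path below are hypothetical, and the commented-out operator usage assumes the Airflow 1.10.x contrib import path. As far as I know, SQL Server instances export as BAK files, so this is a database-level export rather than the result of a query against a view.

```python
def build_export_body(gcs_uri, databases, file_type="BAK"):
    """Build a request body for the Cloud SQL Admin API instances.export call."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError("gcs_uri must start with gs://")
    return {
        "exportContext": {
            "fileType": file_type,         # "BAK" is assumed here for SQL Server
            "uri": gcs_uri,                # destination object in Cloud Storage
            "databases": list(databases),  # databases to include in the export
        }
    }


# Hypothetical database name and object path, reusing the bucket from the question.
body = build_export_body(
    "gs://mssql-export-test/data/customers/export.bak",
    ["my_database"],
)

# Assumed usage inside a DAG file (Airflow 1.10.x contrib path):
# from airflow.contrib.operators.gcp_sql_operator import CloudSqlInstanceExportOperator
# export_task = CloudSqlInstanceExportOperator(
#     task_id="export_instance",
#     instance="[INSTANCE-NAME]",
#     body=body,
#     dag=dag,
# )
```

The instance's service account would also need write access to the bucket for the export to succeed.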