Asked by: 小点点

Passing the source file name to the sink in an ADF v1 pipeline


I am building an ETL process with Azure Data Factory v1 (unfortunately, I cannot use Azure Data Factory v2).

I want to read all .csv files from a given Blob Storage container and write the contents of each file to a table in Azure SQL.

The target table contains all the columns from the csv files. It must also contain an extra column holding the name of the file the data came from.

This is where I am stuck: I cannot find a way to pass the file name from the source dataset (the .csv file in Blob Storage) to the sink dataset (the Azure SQL sink).

I have already implemented a pipeline that reads a file from Blob Storage and saves it to an Azure SQL table.

Here is an excerpt of the JSON that copies a single file to Azure SQL:

{
    "name": "pipelineFileImport",
    "properties": {
        "activities": [
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "BlobSource",
                        "recursive": false
                    },
                    "sink": {
                        "type": "SqlSink",
                        "writeBatchSize": 0,
                        "writeBatchTimeout": "00:00:00"
                    },
                    "translator": {
                        "type": "TabularTranslator",
                        "columnMappings": "TypeOfRecord:TypeOfRecord,TPMType:TPMType,..."
                    }
                },
                "inputs": [
                    {
                        "name": "InputDataset-cn0"
                    }
                ],
                "outputs": [
                    {
                        "name": "OutputDataset-cn0"
                    }
                ],
                "policy": {
                    "timeout": "1.00:00:00",
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst",
                    "style": "StartOfInterval",
                    "retry": 3,
                    "longRetry": 0,
                    "longRetryInterval": "00:00:00"
                },
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1
                },
                "name": "Activity-0-pipelineFileImport_csv->[staging]_[Files]"
            }
        ],
        "start": "2018-07-20T09:50:55.486Z",
        "end": "2018-07-20T09:50:55.486Z",
        "isPaused": false,
        "hubName": "test_hub",
        "pipelineMode": "OneTime",
        "expirationTime": "3.00:00:00",
        "datasets": [
            {
                "name": "InputDataset-cn0",
                "properties": {
                    "structure": [
                        {
                            "name": "TypeOfRecord",
                            "type": "String"
                        },
                        {
                            "name": "TPMType",
                            "type": "String"
                        },
                        ...
                    ],
                    "published": false,
                    "type": "AzureBlob",
                    "linkedServiceName": "Source-TestBlobStorage",
                    "typeProperties": {
                        "fileName": "testFile001.csv",
                        "folderPath": "fileinput",
                        "format": {
                            "type": "TextFormat",
                            "columnDelimiter": ";",
                            "firstRowAsHeader": true
                        }
                    },
                    "availability": {
                        "frequency": "Day",
                        "interval": 1
                    },
                    "external": true,
                    "policy": {}
                }
            },
            {
                "name": "OutputDataset-cn0",
                "properties": {
                    "structure": [
                        {
                            "name": "TypeOfRecord",
                            "type": "String"
                        },
                        {
                            "name": "TPMType",
                            "type": "String"
                        },
                        ...
                    ],
                    "published": false,
                    "type": "AzureSqlTable",
                    "linkedServiceName": "Destination-SQLAzure-cn0",
                    "typeProperties": {
                        "tableName": "[staging].[Files]"
                    },
                    "availability": {
                        "frequency": "Day",
                        "interval": 1
                    },
                    "external": false,
                    "policy": {}
                }
            }
        ]
    }
}

I need a way to pass the name of the source file to the target dataset so that it can be written to the Azure SQL database.


1 Answer

Anonymous user

There is no native way to handle this. However, I think you could achieve it with a stored procedure.

See the stored procedure properties of the copy activity: https://learn.microsoft.com/en-us/azure/data-factory/v1/data-factory-azure-sql-connector#copy-activity-properties
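A rough sketch of that approach, assuming hypothetical object names (`[staging].[FilesType]`, `[staging].[usp_InsertFiles]`, `@fileName`, `SourceFileName`) that are not part of the original pipeline. The SqlSink streams the copied rows into a table-valued parameter, and the stored procedure appends the file name to each row on insert:

```sql
-- Table type matching the columns of the .csv file. It carries no file-name
-- column; the file name arrives separately as a scalar parameter.
CREATE TYPE [staging].[FilesType] AS TABLE
(
    TypeOfRecord NVARCHAR(255),
    TPMType      NVARCHAR(255)
    -- ... remaining csv columns
);
GO

-- The copy activity fills @rows; the procedure stamps @fileName onto
-- every row while inserting into the target table.
CREATE PROCEDURE [staging].[usp_InsertFiles]
    @rows     [staging].[FilesType] READONLY,
    @fileName NVARCHAR(255)
AS
BEGIN
    INSERT INTO [staging].[Files] (TypeOfRecord, TPMType, SourceFileName)
    SELECT TypeOfRecord, TPMType, @fileName
    FROM @rows;
END
GO
```

The sink in the copy activity would then reference the procedure instead of writing to the table directly. Since each input dataset in the pipeline above names a single file (`testFile001.csv`), the file name can be supplied as a static parameter value per dataset:

```json
"sink": {
    "type": "SqlSink",
    "sqlWriterStoredProcedureName": "staging.usp_InsertFiles",
    "sqlWriterTableType": "FilesType",
    "storedProcedureParameters": {
        "fileName": { "value": "testFile001.csv" }
    }
}
```

Each per-file copy activity would set its own `fileName` value to match the `fileName` in its input dataset.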
