I'm developing an ETL with Azure Data Factory v1 (unfortunately, I cannot use Azure Data Factory v2).
I want to read all .csv files from a given blob storage container and write the contents of each file to a table in SQL Azure.
The target table contains all the columns from the csv files. It must also contain a new column holding the name of the file the data came from.
This is where I'm stuck: I cannot find a way to pass the file name from the source dataset (a .csv file from the blob storage source) to the sink dataset (the SQL Azure sink).
I have already implemented a pipeline that reads a file from blob storage and saves it to an Azure SQL table.
Here is an excerpt of the JSON that copies a single file to Azure SQL:
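As a sketch of what that target table could look like (only the first two column names appear in the pipeline excerpt below; the column types and the FileName column name are hypothetical):

```sql
-- Hypothetical DDL for the staging table: the data columns mirror the
-- csv structure, and FileName records which blob each row came from.
CREATE TABLE [staging].[Files] (
    [TypeOfRecord] NVARCHAR(255) NULL,
    [TPMType]      NVARCHAR(255) NULL,
    -- ...remaining csv columns...
    [FileName]     NVARCHAR(255) NULL
);
```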
{
  "name": "pipelineFileImport",
  "properties": {
    "activities": [
      {
        "type": "Copy",
        "typeProperties": {
          "source": {
            "type": "BlobSource",
            "recursive": false
          },
          "sink": {
            "type": "SqlSink",
            "writeBatchSize": 0,
            "writeBatchTimeout": "00:00:00"
          },
          "translator": {
            "type": "TabularTranslator",
            "columnMappings": "TypeOfRecord:TypeOfRecord,TPMType:TPMType,..."
          }
        },
        "inputs": [
          {
            "name": "InputDataset-cn0"
          }
        ],
        "outputs": [
          {
            "name": "OutputDataset-cn0"
          }
        ],
        "policy": {
          "timeout": "1.00:00:00",
          "concurrency": 1,
          "executionPriorityOrder": "NewestFirst",
          "style": "StartOfInterval",
          "retry": 3,
          "longRetry": 0,
          "longRetryInterval": "00:00:00"
        },
        "scheduler": {
          "frequency": "Day",
          "interval": 1
        },
        "name": "Activity-0-pipelineFileImport_csv->[staging]_[Files]"
      }
    ],
    "start": "2018-07-20T09:50:55.486Z",
    "end": "2018-07-20T09:50:55.486Z",
    "isPaused": false,
    "hubName": "test_hub",
    "pipelineMode": "OneTime",
    "expirationTime": "3.00:00:00",
    "datasets": [
      {
        "name": "InputDataset-cn0",
        "properties": {
          "structure": [
            {
              "name": "TypeOfRecord",
              "type": "String"
            },
            {
              "name": "TPMType",
              "type": "String"
            },
            ...
          ],
          "published": false,
          "type": "AzureBlob",
          "linkedServiceName": "Source-TestBlobStorage",
          "typeProperties": {
            "fileName": "testFile001.csv",
            "folderPath": "fileinput",
            "format": {
              "type": "TextFormat",
              "columnDelimiter": ";",
              "firstRowAsHeader": true
            }
          },
          "availability": {
            "frequency": "Day",
            "interval": 1
          },
          "external": true,
          "policy": {}
        }
      },
      {
        "name": "OutputDataset-cn0",
        "properties": {
          "structure": [
            {
              "name": "TypeOfRecord",
              "type": "String"
            },
            {
              "name": "TPMType",
              "type": "String"
            },
            ...
          ],
          "published": false,
          "type": "AzureSqlTable",
          "linkedServiceName": "Destination-SQLAzure-cn0",
          "typeProperties": {
            "tableName": "[staging].[Files]"
          },
          "availability": {
            "frequency": "Day",
            "interval": 1
          },
          "external": false,
          "policy": {}
        }
      }
    ]
  }
}
I need a way to pass the name of the source file to the sink dataset so that it gets written to the SQL Azure database.
There is no native way to handle this in Data Factory v1. However, I think you can achieve it with a stored procedure.
See the stored procedure-related properties of the SQL sink: https://learn.microsoft.com/en-us/azure/data-factory/v1/data-factory-azure-sql-connector#copy-activity-properties
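As a sketch of that approach (all object and parameter names here are hypothetical, and in v1 the file name still has to be supplied per copy activity, e.g. hardcoded or filled in when you generate one activity per file): define a table type matching the csv columns and a stored procedure that appends the rows together with the file name. Per the linked v1 documentation, the table-type parameter of the procedure must have the same name as the tableName defined in the output dataset, so the dataset's tableName would need to be changed to match (here "Files").

```sql
-- Hypothetical table type matching the csv columns (no FileName column;
-- the file name arrives as a separate procedure parameter).
CREATE TYPE [staging].[FilesType] AS TABLE (
    [TypeOfRecord] NVARCHAR(255),
    [TPMType]      NVARCHAR(255)
    -- ...remaining csv columns...
);
GO

-- The @Files parameter name must match the tableName of the output dataset.
CREATE PROCEDURE [staging].[spInsertFiles]
    @Files [staging].[FilesType] READONLY,
    @fileName NVARCHAR(255)
AS
BEGIN
    INSERT INTO [staging].[Files] ([TypeOfRecord], [TPMType], [FileName])
    SELECT [TypeOfRecord], [TPMType], @fileName
    FROM @Files;
END
```

The copy activity's sink would then reference the procedure, with the file name passed as a fixed parameter value for each per-file activity (v1 offers no expression that resolves to the source file name):

```json
"sink": {
    "type": "SqlSink",
    "SqlWriterTableType": "FilesType",
    "SqlWriterStoredProcedureName": "spInsertFiles",
    "storedProcedureParameters": {
        "fileName": { "value": "testFile001.csv" }
    }
}
```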