I have a script from which I pass arguments to Hive variables. The whole flow is below. One Hive variable is getting truncated.
beeline -u "$HIVESERVER" \
-f $path_to.hql \
--hivevar TRGT_DB=$HIVE_TRGT_DB \
--hivevar SRC_CLMN=$SOURCE_CLMNS \
--hivevar INBND_DB=$HIVE_SRC_DB \
--hivevar SRC_TBL=$SOURCE_TABLE \
--hivevar TRGT_TBL=$TRGT_TABLE \
SRC_CLMN is populated from a file, like this:
SRC_CLMN=`cat source_column.tbl | grep table | (some sed functions)`
trim(regexp_replace(col1, '[^a-zA-Z]', ' ')) as col1,trim(col2) as col2,trim(regexp_extract(col1,"\(([^)]+)\)", 1)) as col3,trim(col4) as col4,trim(col5) as col5,trim(col6) as col6,'' as col7,trim(col8) as col8,trim(col9) as col9,trim(col10) as col10,trim(col11) as col11,trim(col12) as col12,trim(col13) as col13
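When I print the variable in the shell, it prints the whole string.
But when I print the Hive variable with SET SRC_CLMN; I only see it up to "trim(regexp_replace(col1,".
So the Hive query throws an error.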
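One way to see exactly which arguments beeline receives is to run the same command line against a stand-in that just prints its arguments. This is a minimal sketch: show_args is a hypothetical helper, not part of beeline, and the value of SOURCE_CLMNS is a shortened example modeled on the column list above.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for beeline: prints each argument it receives on its own
# line, so you can see how the shell split the command line before the program ran.
show_args() {
  local i=1
  for arg in "$@"; do
    printf 'arg %d: %s\n' "$i" "$arg"
    i=$((i + 1))
  done
}

# Shortened example value (assumption): contains spaces and single quotes like SRC_CLMN.
SOURCE_CLMNS="trim(regexp_replace(col1, '[^a-zA-Z]', ' ')) as col1,trim(col2) as col2"

echo "Unquoted expansion:"
show_args --hivevar SRC_CLMN=$SOURCE_CLMNS

echo "Quoted expansion:"
show_args --hivevar SRC_CLMN="$SOURCE_CLMNS"
```

In this sketch the unquoted form is split at every space, so the second argument ends at trim(regexp_replace(col1, which is the same point where SET SRC_CLMN; stops. A plain echo $SOURCE_CLMNS rejoins the pieces with single spaces, which is why printing the variable in the shell can appear to show the whole string.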
Output of the SRC_CLMN variable:
trim(regexp_replace(col1,'[^a-zA-Z]',' '))as col1,trim(col2)as col2,trim(regexp_extract(col1,"(([^)])",1)) as col3,trim(col4) as col4,trim(col5) as col5,trim(col6) as col6,' ' as col7,trim(col8) as col8,trim(col9) as col9,trim(col10) as col10,trim(col11) as col11,trim(col12) as col12,trim(col13) as col13
As you can see in the output, there are single quotes ('[^a-zA-Z]',' ')). You have to escape them so they can be used inside the string.
This is how the variable stores it: SRC_CLMN='trim(regexp_replace(col1,'[^a-zA-Z]',' '))as col1,trim(col2)as col2,trim(regexp_extract(col1,"(([^)])",1)) as col3,trim(col4) as col4, ... ,trim(col12) as col12,trim(col13) as col13
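If you look at the part 'trim(regexp_replace(col1,' you will see that it starts with a quote and ends with a quote, so the variable treats it as one string.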
You need to escape every quote. You can do that with sed, replacing every ' with \'.
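A minimal sketch of that sed step, assuming GNU or BSD sed and a shortened example value; ESCAPED_CLMN is a hypothetical variable name used only for illustration.

```shell
#!/usr/bin/env bash
# Shortened example value (assumption), modeled on the SRC_CLMN column list.
SRC_CLMN="trim(regexp_replace(col1, '[^a-zA-Z]', ' ')) as col1,trim(col2) as col2"

# Inside the double quotes the shell turns \\\\ into \\, so sed sees s/'/\\&/g:
# in the replacement, \\ is a literal backslash and & is the matched quote,
# so every ' becomes \'.
ESCAPED_CLMN=$(printf '%s\n' "$SRC_CLMN" | sed "s/'/\\\\&/g")

printf '%s\n' "$ESCAPED_CLMN"
```

If the escaped value is then passed on to beeline, keeping the expansion in double quotes (--hivevar SRC_CLMN="$ESCAPED_CLMN") ensures the shell hands it over as a single argument.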