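I am using the google-cloud-pipeline-components operator CustomPythonPackageTrainingJobRunOp in a Vertex AI pipeline. I was previously able to run this same package successfully as a CustomTrainingJob. The logs show multiple (11) error messages, but the only one that means anything to me is "ValueError: too many values to unpack (expected 2)", and I cannot work out how to fix it. I can add all the other error messages if needed. I log some messages at the very start of the training code, so I know the error occurs before the training code is executed. I am completely stuck at this point. A link to an example of someone using CustomPythonPackageTrainingJobRunOp in a pipeline would also be very helpful. Here is the pipeline code I am trying to execute: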
import kfp
from kfp.v2 import compiler
from kfp.v2.google.client import AIPlatformClient
from google_cloud_pipeline_components import aiplatform as gcc_aip

# pipeline_name, package_path, project_id, region and pipeline_root_path
# are defined elsewhere and are not shown here.

@kfp.dsl.pipeline(name=pipeline_name)
def pipeline(
    project: str = "adsfafs-321118",
    location: str = "us-central1",
    display_name: str = "vertex_pipeline",
    python_package_gcs_uri: str = "gs://vertex/training/training-package-3.0.tar.gz",
    python_module_name: str = "trainer.task",
    container_uri: str = "us-docker.pkg.dev/vertex-ai/training/scikit-learn-cpu.0-23:latest",
    staging_bucket: str = "vertex_bucket",
    base_output_dir: str = "gs://vertex_artifacts/custom_training/"
):
    gcc_aip.CustomPythonPackageTrainingJobRunOp(
        display_name=display_name,
        python_package_gcs_uri=python_package_gcs_uri,
        python_module=python_module_name,
        container_uri=container_uri,
        project=project,
        location=location,
        staging_bucket=staging_bucket,
        base_output_dir=base_output_dir,
        args=["--arg1=val1", "--arg2=val2", ...]
    )

# Compile the pipeline to a job spec, then submit it.
compiler.Compiler().compile(
    pipeline_func=pipeline, package_path=package_path
)

api_client = AIPlatformClient(project_id=project_id, region=region)
response = api_client.create_run_from_job_spec(
    package_path,
    pipeline_root=pipeline_root_path
)
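In the documentation for CustomPythonPackageTrainingJobRunOp, the type of the parameter "python_module" appears to be "google.cloud.aiplatform.training_jobs.CustomPythonPackageTrainingJob" rather than a string, which seems strange. I nevertheless tried redefining the pipeline so that python_module in CustomPythonPackageTrainingJobRunOp receives a CustomPythonPackageTrainingJob object instead of a string, as shown below, but I still get the same error: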
from google.cloud import aiplatform

def pipeline(
    project: str = "...",
    location: str = "...",
    display_name: str = "...",
    python_package_gcs_uri: str = "...",
    python_module_name: str = "...",
    container_uri: str = "...",
    staging_bucket: str = "...",
    base_output_dir: str = "...",
):
    job = aiplatform.CustomPythonPackageTrainingJob(
        display_name=display_name,
        python_package_gcs_uri=python_package_gcs_uri,
        python_module_name=python_module_name,
        container_uri=container_uri,
        staging_bucket=staging_bucket
    )

    # Pass the job object, rather than the module name string, as python_module.
    gcc_aip.CustomPythonPackageTrainingJobRunOp(
        display_name=display_name,
        python_package_gcs_uri=python_package_gcs_uri,
        python_module=job,
        container_uri=container_uri,
        project=project,
        location=location,
        base_output_dir=base_output_dir,
        args=["--arg1=val1", "--arg2=val2", ...]
    )
Edit:
Added the args that I am passing but had forgotten to include here.
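It turned out that the way I was passing args to the Python module was incorrect. Instead of specifying
args = ["--arg1=val1", "--arg2=val2", …]
you need to pass each flag and its value as separate list items: args = ["--arg1", val1, "--arg2", val2, …]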
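
If you already have a list in the "--key=value" style, a small helper can convert it to the separated form before handing it to the op. This is only a sketch: split_key_value_args is a hypothetical name and not part of google-cloud-pipeline-components, and it assumes every value is a plain string.

```python
from typing import List


def split_key_value_args(args: List[str]) -> List[str]:
    """Turn ["--k=v", ...] into ["--k", "v", ...].

    Hypothetical helper for illustration only. Items that are not of the
    form "--key=value" are passed through unchanged.
    """
    result: List[str] = []
    for arg in args:
        if arg.startswith("--") and "=" in arg:
            # Split on the first "=" only, so values containing "=" survive.
            key, value = arg.split("=", 1)
            result.extend([key, value])
        else:
            result.append(arg)
    return result


converted = split_key_value_args(["--arg1=val1", "--arg2=val2"])
print(converted)  # ['--arg1', 'val1', '--arg2', 'val2']
```

The converted list can then be passed as args=converted in the CustomPythonPackageTrainingJobRunOp call.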