Asked by: 小点点

Error when trying to use CustomPythonPackageTrainingJobRunOp in a VertexAI pipeline


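I am using the google cloud pipeline component CustomPythonPackageTrainingJobRunOp in a VertexAI pipeline. I was previously able to run this package successfully as a CustomTrainingJob. I can see multiple (11) error messages in the logs, but the only one that means anything to me is "ValueError: too many values to unpack (expected 2)", and I can't figure out how to fix it. I can add all the other error messages if needed. I log some messages at the very start of my training code, so I know the error occurs before the training code is executed. I am completely stuck at this point. A link to an example of someone using CustomPythonPackageTrainingJobRunOp in a pipeline would also be very helpful. Here is the pipeline code I am trying to run: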

import kfp
from kfp.v2 import compiler
from kfp.v2.google.client import AIPlatformClient
from google_cloud_pipeline_components import aiplatform as gcc_aip

@kfp.dsl.pipeline(name=pipeline_name)
def pipeline(
    project: str = "adsfafs-321118",
    location: str = "us-central1",
    display_name: str = "vertex_pipeline",
    python_package_gcs_uri: str = "gs://vertex/training/training-package-3.0.tar.gz",
    python_module_name: str = "trainer.task",
    container_uri: str = "us-docker.pkg.dev/vertex-ai/training/scikit-learn-cpu.0-23:latest",
    staging_bucket: str = "vertex_bucket",
    base_output_dir: str = "gs://vertex_artifacts/custom_training/"
):
    
    gcc_aip.CustomPythonPackageTrainingJobRunOp(
        display_name=display_name,
        python_package_gcs_uri=python_package_gcs_uri,
        python_module=python_module_name,
        container_uri=container_uri,
        project=project,
        location=location,
        staging_bucket=staging_bucket,
        base_output_dir=base_output_dir,
        args = ["--arg1=val1", "--arg2=val2", ...]
    )



compiler.Compiler().compile(
    pipeline_func=pipeline, package_path=package_path
)

api_client = AIPlatformClient(project_id=project_id, region=region)

response = api_client.create_run_from_job_spec(
    package_path,
    pipeline_root=pipeline_root_path
)

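In the documentation for CustomPythonPackageTrainingJobRunOp, the type of the "python_module" parameter appears to be "google.cloud.aiplatform.training_jobs.CustomPythonPackageTrainingJob" rather than a string, which seems odd. I nevertheless tried redefining the pipeline so that the python_module parameter of CustomPythonPackageTrainingJobRunOp receives a CustomPythonPackageTrainingJob object instead of a string, as shown below, but I still get the same error: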

def pipeline(
    project: str = "...",
    location: str = "...",
    display_name: str = "...",
    python_package_gcs_uri: str = "...",
    python_module_name: str = "...",
    container_uri: str = "...",
    staging_bucket: str = "...",
    base_output_dir: str = "...",
):

    job = aiplatform.CustomPythonPackageTrainingJob(
        display_name=display_name,
        python_package_gcs_uri=python_package_gcs_uri,
        python_module_name=python_module_name,
        container_uri=container_uri,
        staging_bucket=staging_bucket
    )
    
    gcc_aip.CustomPythonPackageTrainingJobRunOp(
        display_name=display_name,
        python_package_gcs_uri=python_package_gcs_uri,
        python_module=job,
        container_uri=container_uri,
        project=project,
        location=location,
        base_output_dir=base_output_dir,
        args = ["--arg1=val1", "--arg2=val2", ...]
    )

Edit:

Added the args that I was passing but had forgotten to include here.


1 answer

Anonymous user

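It turned out that the way I was passing args to the Python module was incorrect. Instead of args = ["--arg1=val1", "--arg2=val2", …], you need to specify args = ["--arg1", val1, "--arg2", val2, …], with each flag and its value as separate list items.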
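To illustrate the difference between the two forms, here is a minimal sketch in plain Python that converts the combined "--flag=value" style into separate flag and value items. The helper name split_flag_values and the placeholder strings "val1" and "val2" are assumptions made for illustration. The helper is not part of google_cloud_pipeline_components, and the sketch does not show why the component rejects the combined form.

```python
def split_flag_values(args):
    """Turn items like "--arg1=val1" into two items, "--arg1" and "val1".

    Hypothetical helper for illustration only; items that are not of the
    form "--flag=value" are passed through unchanged.
    """
    result = []
    for item in args:
        if item.startswith("--") and "=" in item:
            # Split only on the first "=" so values containing "=" stay intact.
            flag, value = item.split("=", 1)
            result.extend([flag, value])
        else:
            result.append(item)
    return result


# The form that triggered the error in the question (placeholder values).
combined = ["--arg1=val1", "--arg2=val2"]

# The form the answer reports as working.
separated = split_flag_values(combined)
print(separated)  # ['--arg1', 'val1', '--arg2', 'val2']
```

The resulting list would then be passed as the args parameter of CustomPythonPackageTrainingJobRunOp in place of the combined form.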