Asked by: 小点点

How do I render a confusion matrix visualization in a Kubeflow Pipelines component step?


This is one of the official tutorials I am trying:

import kfp
import kfp.dsl as dsl
from kfp.components import create_component_from_func    

@create_component_from_func
def confusion_visualization(matrix_uri: str = 'https://raw.githubusercontent.com/kubeflow/pipelines/master/samples/core/visualization/confusion_matrix.csv') -> NamedTuple('VisualizationOutput', [('mlpipeline_ui_metadata', 'UI_metadata')]):
    """Provide confusion matrix csv file to visualize as metrics."""
    import json

    metadata = {
        'outputs' : [{
          'type': 'confusion_matrix',
          'format': 'csv',
          'schema': [
            {'name': 'target', 'type': 'CATEGORY'},
            {'name': 'predicted', 'type': 'CATEGORY'},
            {'name': 'count', 'type': 'NUMBER'},
          ],
          'source': matrix_uri,
          'labels': ['rose', 'lily', 'iris'],
        }]
    }
    
    print('Printing the metadata')
    print(metadata)

    from collections import namedtuple
    visualization_output = namedtuple('VisualizationOutput', [
        'mlpipeline_ui_metadata'])
    print()
    return visualization_output(json.dumps(metadata))

@dsl.pipeline(
    name='confusion-matrix-pipeline',
    description='A sample pipeline to generate Confusion Matrix for UI visualization.'
)
def confusion_matrix_pipeline():
    confusion_visualization_task = confusion_visualization('results.json')
    
    
client = kfp.Client()
client.create_run_from_pipeline_func(
    confusion_matrix_pipeline,
    arguments={}
)

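I can't see the visualization in either the Run Output or the Visualizations tab. It says there are no visualizations in this step. What am I missing here?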


1 answer

Anonymous user

I think one problem with your code is that you are not providing the output as a file (try using OutputPath).

From the Kubeflow documentation:

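The component must also export a file output artifact with an artifact name of mlpipeline-ui-metadata, or else the Kubeflow Pipelines UI will not render the visualization.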

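…If the component writes such a file to its container filesystem, the Kubeflow Pipelines system extracts the file, and the Kubeflow Pipelines UI uses the file to generate the specified viewer. The metadata specifies where to load the artifact data from. The Kubeflow Pipelines UI loads the data into memory and renders it.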

The Kubeflow documentation also provides an example, which worked for me:

def confusion_matrix_viz(mlpipeline_ui_metadata_path: kfp.components.OutputPath()):
  import json
    
  metadata = {
    'outputs' : [{
      'type': 'confusion_matrix',
      'format': 'csv',
      'schema': [
        {'name': 'target', 'type': 'CATEGORY'},
        {'name': 'predicted', 'type': 'CATEGORY'},
        {'name': 'count', 'type': 'NUMBER'},
      ],
      'source': <CONFUSION_MATRIX_CSV_FILE>,
      # Convert vocab to strings because for boolean values we want "True|False" to match the CSV data.
      'labels': list(map(str, vocab)),
    }]
  }

  with open(mlpipeline_ui_metadata_path, 'w') as metadata_file:
    json.dump(metadata, metadata_file)
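
The `source` placeholder and `vocab` in the documentation example are left for you to fill in. As a rough sketch, assuming the viewer expects headerless rows of `target,predicted,count` (matching the schema above), the CSV and the label list could be produced with the standard library. The names `write_confusion_matrix_csv`, `y_true`, and `y_pred` are hypothetical and chosen only for illustration:

```python
import csv
from collections import Counter


def write_confusion_matrix_csv(y_true, y_pred, csv_path):
    """Write headerless target,predicted,count rows and return the label list."""
    # Stringify labels so boolean classes become "True"/"False", as in the docs example.
    pairs = Counter((str(t), str(p)) for t, p in zip(y_true, y_pred))
    vocab = sorted({label for pair in pairs for label in pair})
    with open(csv_path, 'w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        # Emit every (target, predicted) combination, including zero counts.
        for target in vocab:
            for predicted in vocab:
                writer.writerow([target, predicted, pairs[(target, predicted)]])
    return vocab
```

The returned list can serve as `labels`. The CSV itself would still have to be placed somewhere the UI can load it from before its URI is used as `source`.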

Applying the same pattern, you could try modifying your code like this:

import kfp
import kfp.dsl as dsl
from kfp.components import create_component_from_func    

@create_component_from_func
def confusion_visualization(
    # The output path comes first: Python rejects a parameter without a default after one with a default.
    mlpipeline_ui_metadata_path: kfp.components.OutputPath(),
    matrix_uri: str = 'https://raw.githubusercontent.com/kubeflow/pipelines/master/samples/core/visualization/confusion_matrix.csv',
):
    """Provide confusion matrix csv file to visualize as metrics."""
    import json

    metadata = {
        'outputs' : [{
          'type': 'confusion_matrix',
          'format': 'csv',
          'schema': [
            {'name': 'target', 'type': 'CATEGORY'},
            {'name': 'predicted', 'type': 'CATEGORY'},
            {'name': 'count', 'type': 'NUMBER'},
          ],
          'source': matrix_uri,
          'labels': ['rose', 'lily', 'iris'],
        }]
    }
    
    print('Printing the metadata')
    print(metadata)

    with open(mlpipeline_ui_metadata_path, 'w') as metadata_file:
        json.dump(metadata, metadata_file)

@dsl.pipeline(
    name='confusion-matrix-pipeline',
    description='A sample pipeline to generate Confusion Matrix for UI visualization.'
)
def confusion_matrix_pipeline():
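    # Note: this overrides the default matrix_uri; the UI must be able to load whatever URI is passed here.
    confusion_visualization_task = confusion_visualization(matrix_uri='results.json')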
    
    
client = kfp.Client()
client.create_run_from_pipeline_func(
    confusion_matrix_pipeline,
    arguments={}
)
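
Before submitting a run, it can help to check the metadata file outside Kubeflow. The sketch below is a plain-Python stand-in for the component body, with no `kfp` involved; `write_ui_metadata` and the temporary file name are hypothetical and used only for illustration. It writes the same JSON and reads it back:

```python
import json
import os
import tempfile

SAMPLE_CSV = ('https://raw.githubusercontent.com/kubeflow/pipelines/master/'
              'samples/core/visualization/confusion_matrix.csv')


def write_ui_metadata(matrix_uri, labels, metadata_path):
    """Plain-Python stand-in for the component body: write the UI metadata JSON."""
    metadata = {
        'outputs': [{
            'type': 'confusion_matrix',
            'format': 'csv',
            'schema': [
                {'name': 'target', 'type': 'CATEGORY'},
                {'name': 'predicted', 'type': 'CATEGORY'},
                {'name': 'count', 'type': 'NUMBER'},
            ],
            'source': matrix_uri,
            'labels': labels,
        }]
    }
    with open(metadata_path, 'w') as metadata_file:
        json.dump(metadata, metadata_file)


# Write to a temporary file and confirm the result parses back as JSON.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'mlpipeline-ui-metadata.json')
    write_ui_metadata(SAMPLE_CSV, ['rose', 'lily', 'iris'], path)
    with open(path) as metadata_file:
        loaded = json.load(metadata_file)

print(loaded['outputs'][0]['type'])
```

This only checks the file contents. Whether the UI renders the viewer still depends on the component exporting the file under the artifact name the documentation requires.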