Asked by: 小点点

Dependencies in Python Cloud Dataflow: requirements.txt works locally but not on the workers


I am trying to get my Cloud Dataflow job to run with a requirements.txt file, following the instructions here:

https://cloud.google.com/dataflow/pipelines/dependencies-python

I can install the prebuilt Python library directly instead of building all of OpenCV from source (which takes 20-30 minutes).

On my Compute Engine instance, this works:

root@fcfca6a4dad2:/DeepMeerkat# pip install opencv-python
Collecting opencv-python
  Downloading opencv_python-3.2.0.7-cp27-cp27mu-manylinux1_x86_64.whl (6.7MB)
    100% |################################| 6.7MB 163kB/s
Collecting numpy>=1.11.1 (from opencv-python)
  Downloading numpy-1.13.0-cp27-cp27mu-manylinux1_x86_64.whl (16.6MB)
    100% |################################| 16.6MB 68kB/s
Installing collected packages: numpy, opencv-python
  Found existing installation: numpy 1.8.2
    DEPRECATION: Uninstalling a distutils installed project (numpy) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
    Uninstalling numpy-1.8.2:
      Successfully uninstalled numpy-1.8.2
Successfully installed numpy-1.13.0 opencv-python-3.2.0.7

I can also bundle it with a few other modules in a requirements file:

root@fcfca6a4dad2:/DeepMeerkat# pip install -r tests/prediction/requirements.txt
Requirement already satisfied: opencv-python in /usr/local/lib/python2.7/dist-packages (from -r tests/prediction/requirements.txt (line 1))
Collecting tensorflow==1.0.1 (from -r tests/prediction/requirements.txt (line 2))
  Downloading tensorflow-1.0.1-cp27-cp27mu-manylinux1_x86_64.whl (44.1MB)
    100% |################################| 44.1MB 27kB/s
Requirement already satisfied: numpy in /usr/local/lib/python2.7/dist-packages (from -r tests/prediction/requirements.txt (line 3))
Requirement already satisfied: mock>=2.0.0 in /usr/local/lib/python2.7/dist-packages (from tensorflow==1.0.1->-r tests/prediction/requirements.txt (line 2))
Requirement already satisfied: wheel in /usr/lib/python2.7/dist-packages (from tensorflow==1.0.1->-r tests/prediction/requirements.txt (line 2))
Requirement already satisfied: six>=1.10.0 in /usr/local/lib/python2.7/dist-packages (from tensorflow==1.0.1->-r tests/prediction/requirements.txt (line 2))
Requirement already satisfied: protobuf>=3.1.0 in /usr/local/lib/python2.7/dist-packages (from tensorflow==1.0.1->-r tests/prediction/requirements.txt (line 2))
Requirement already satisfied: funcsigs>=1; python_version < "3.3" in /usr/local/lib/python2.7/dist-packages (from mock>=2.0.0->tensorflow==1.0.1->-r tests/prediction/requirements.txt (line 2))
Requirement already satisfied: pbr>=0.11 in /usr/local/lib/python2.7/dist-packages (from mock>=2.0.0->tensorflow==1.0.1->-r tests/prediction/requirements.txt (line 2))
Requirement already satisfied: setuptools in /usr/local/lib/python2.7/dist-packages (from protobuf>=3.1.0->tensorflow==1.0.1->-r tests/prediction/requirements.txt (line 2))
Installing collected packages: tensorflow
Successfully installed tensorflow-1.0.1

However, when I submit the job to Cloud Dataflow, opencv-python cannot be found for the workers.

root@fcfca6a4dad2:/DeepMeerkat# python tests/prediction/run.py \
>     --runner DataflowRunner \
>     --project $PROJECT \
>     --staging_location $BUCKET/staging \
>     --temp_location $BUCKET/temp \
>     --job_name $PROJECT-deepmeerkat \
>     --setup_file tests/prediction/setup.py \
>     --requirements_file tests/prediction/requirements.txt
No handlers could be found for logger "oauth2client.contrib.multistore_file"
/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/gcsio.py:113: DeprecationWarning: object() takes no parameters
  super(GcsIO, cls).__new__(cls, storage_client))
INFO:root:Starting the size estimation of the input
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:root:Finished the size estimation of the input at 1 files. Estimation took 0.0855119228363 seconds
INFO:root:Starting the size estimation of the input
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:root:Finished the size estimation of the input at 1 files. Estimation took 0.0597159862518 seconds
/usr/local/lib/python2.7/dist-packages/apache_beam/coders/typecoders.py:135: UserWarning: Using fallback coder for typehint: Any.
  warnings.warn('Using fallback coder for typehint: %r.' % typehint)
INFO:root:Starting GCS upload to gs://api-project-773889352370-testing/staging/api-project-773889352370-deepmeerkat.1499372970.163850/requirements.txt...
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:root:Completed GCS upload to gs://api-project-773889352370-testing/staging/api-project-773889352370-deepmeerkat.1499372970.163850/requirements.txt
INFO:root:Executing command: ['/usr/bin/python', '-m', 'pip', 'install', '--download', '/tmp/dataflow-requirements-cache', '-r', 'tests/prediction/requirements.txt', '--no-binary', ':all:']
DEPRECATION: pip install --download has been deprecated and will be removed in the future. Pip now has a download command that should be used instead.
Collecting opencv-python (from -r tests/prediction/requirements.txt (line 1))
  Could not find a version that satisfies the requirement opencv-python (from -r tests/prediction/requirements.txt (line 1)) (from versions: )
No matching distribution found for opencv-python (from -r tests/prediction/requirements.txt (line 1))
Traceback (most recent call last):
  File "tests/prediction/run.py", line 22, in <module>
    predict.run()
  File "/DeepMeerkat/tests/prediction/modules/predict.py", line 32, in run
    p.run()
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/pipeline.py", line 167, in run
    self.to_runner_api(), self.runner, self._options).run(False)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/pipeline.py", line 176, in run
    return self.runner.run(self)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 252, in run
    self.dataflow_client.create_job(self.job), self)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 168, in wrapper
    return fun(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 425, in create_job
    self.create_job_description(job)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 448, in create_job_description
    job.options, file_copy=self._gcs_file_copy)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/internal/dependency.py", line 307, in stage_job_resources
    setup_options.requirements_file, requirements_cache_path)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/internal/dependency.py", line 241, in _populate_requirements_cache
    processes.check_call(cmd_args)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/processes.py", line 44, in check_call
    return subprocess.check_call(*args, **kwargs)
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python', '-m', 'pip', 'install', '--download', '/tmp/dataflow-requirements-cache', '-r', 'tests/prediction/requirements.txt', '--no-binary', ':all:']' returned non-zero exit status 1

It looks like the --no-binary flag is the problem. Running the same install locally (after uninstalling the packages above):

root@fcfca6a4dad2:/DeepMeerkat# pip install -r tests/prediction/requirements.txt --no-binary :all:
Collecting opencv-python (from -r tests/prediction/requirements.txt (line 1))
  Could not find a version that satisfies the requirement opencv-python (from -r tests/prediction/requirements.txt (line 1)) (from versions: )
No matching distribution found for opencv-python (from -r tests/prediction/requirements.txt (line 1))

The --no-binary flag is described as a way to exclude broken wheels? How does that apply in this case?

I can confirm that the module works when it is installed from the wheel.

Once again:

root@fcfca6a4dad2:/DeepMeerkat# pip install opencv-python
Collecting opencv-python
  Using cached opencv_python-3.2.0.7-cp27-cp27mu-manylinux1_x86_64.whl
Requirement already satisfied: numpy>=1.11.1 in /usr/local/lib/python2.7/dist-packages (from opencv-python)
Installing collected packages: opencv-python
Successfully installed opencv-python-3.2.0.7
root@fcfca6a4dad2:/DeepMeerkat# python
Python 2.7.9 (default, Jun 29 2016, 13:08:31)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>>

1 answer

Anonymous user

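I think the error you are seeing is actually caused by the worker failing to install the wheel file. As noted on the opencv-python package page, problems with wheel files can make the package appear as not found.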
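One way to read the empty "from versions: " list in the question is that, once wheels are excluded with --no-binary :all:, nothing is left for pip to choose from. The sketch below is a deliberately simplified model of that filtering, not pip's real implementation. It assumes that the opencv-python release consisted only of wheel files, which the log suggests but does not prove. The function names and the `example-1.0` files are hypothetical.

```python
# Simplified model of how excluding wheels can empty the candidate list.
# This is an illustration only; pip's actual resolution logic is more involved.

def is_wheel(filename):
    """Treat any file ending in .whl as a binary (wheel) distribution."""
    return filename.endswith(".whl")


def candidates(filenames, no_binary=False):
    """Return the files that remain selectable once wheels are excluded."""
    if no_binary:
        return [name for name in filenames if not is_wheel(name)]
    return list(filenames)


# Wheel file name taken from the pip output in the question.
opencv_files = ["opencv_python-3.2.0.7-cp27-cp27mu-manylinux1_x86_64.whl"]

# Hypothetical project that publishes a source archive as well as a wheel.
other_files = ["example-1.0.tar.gz", "example-1.0-py2.py3-none-any.whl"]

print(candidates(opencv_files))                   # the wheel is selectable
print(candidates(opencv_files, no_binary=True))   # nothing left to install
print(candidates(other_files, no_binary=True))    # the source archive remains
```

Under that assumption, the same requirement succeeds with a plain `pip install` and fails as soon as wheels are ruled out, which matches both logs in the question.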

In that case, you can follow the instructions for packages that are not on PyPI and specify --extra_package.
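A rough sketch of what that could look like follows. It only builds argument lists and runs nothing. The helper names and the `/tmp/wheels` directory are hypothetical, and whether --extra_package accepts a wheel file (rather than only a source tarball) depends on the SDK version, so treat that part as an assumption to check against your installation.

```python
# Builds the command lines only; nothing is downloaded or submitted here.

def wheel_download_command(package, dest_dir):
    """Build a `pip download` command that fetches a prebuilt wheel only."""
    return [
        "python", "-m", "pip", "download", package,
        "--only-binary", ":all:",  # accept wheels, never source archives
        "--no-deps",               # dependencies are handled separately
        "--dest", dest_dir,
    ]


def dataflow_args(base_args, extra_packages):
    """Append one --extra_package flag per locally available package file."""
    args = list(base_args)
    for path in extra_packages:
        args += ["--extra_package", path]
    return args


# Hypothetical local directory; the wheel name is the one from the question.
wheel_dir = "/tmp/wheels"
wheel_path = wheel_dir + "/opencv_python-3.2.0.7-cp27-cp27mu-manylinux1_x86_64.whl"

base = [
    "--runner", "DataflowRunner",
    "--setup_file", "tests/prediction/setup.py",
    "--requirements_file", "tests/prediction/requirements.txt",
]

print(wheel_download_command("opencv-python", wheel_dir))
print(dataflow_args(base, [wheel_path]))
```

With this approach, opencv-python would presumably also have to be removed from requirements.txt. Otherwise the staging step shown in the question runs the same pip command with --no-binary :all: and fails before the job is submitted.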