[Bug]: Mitigate impact of CVE-2023-47248 for Apache Beam

What happened?

There is a recently disclosed vulnerability in the PyArrow dependency: https://nvd.nist.gov/vuln/detail/CVE-2023-47248. It may be a concern for Beam users who read Parquet files from untrusted sources.
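To decide whether an environment needs the workaround, it can help to check which pyarrow version it has installed. The sketch below assumes the affected range listed in the NVD entry (PyArrow 0.14.0 through 14.0.0, inclusive). The helper names release_tuple, is_affected and installed_pyarrow_is_affected are hypothetical, and the version parsing is deliberately rough: pre-release suffixes are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

# Affected range as listed in the NVD entry for CVE-2023-47248
# (PyArrow 0.14.0 through 14.0.0, inclusive).
AFFECTED_MIN = (0, 14, 0)
AFFECTED_MAX = (14, 0, 0)


def release_tuple(ver: str) -> Tuple[int, int, int]:
    """Hypothetical helper: leading (major, minor, patch) of a version string.

    Missing components count as 0, and anything after the numeric prefix
    (e.g. "rc1") is ignored, so "14.0.0rc1" is treated like "14.0.0".
    """
    m = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", ver)
    if m is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    major, minor, patch = (int(g or 0) for g in m.groups())
    return (major, minor, patch)


def is_affected(ver: str) -> bool:
    """Hypothetical helper: True if `ver` falls inside the affected range."""
    return AFFECTED_MIN <= release_tuple(ver) <= AFFECTED_MAX


def installed_pyarrow_is_affected() -> bool:
    """Hypothetical helper: check the pyarrow installed in this environment."""
    try:
        return is_affected(version("pyarrow"))
    except PackageNotFoundError:
        return False  # pyarrow is not installed at all
```

Since the workers are what read the data, this check is most meaningful when run in the same environment (for example, the same container image) that the workers use.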

To address this, we have applied the mitigation provided by https://pypi.org/project/pyarrow-hotfix/ in Beam 2.52.0 and will upgrade Beam to support pyarrow==14 in a future release.

Users of Beam 2.51.0 or below who use pyarrow in their pipelines and are concerned about CVE-2023-47248 can apply the following workaround:

  1. Install the pyarrow-hotfix package on the workers.
  2. Add import pyarrow_hotfix to the pipeline code. If the pipeline consists of a single module, also add the --save_main_session pipeline option. If the pipeline consists of multiple files and uses --setup_file, add the import to the pipeline package's files, for example to the __init__.py file.

Issue Priority

Priority: 1 (major)

Issue Components

  • Component: Python SDK