[Bug]: Mitigate impact of CVE-2023-47248 for Apache Beam
What happened?
There is a recently disclosed vulnerability affecting the PyArrow dependency, https://nvd.nist.gov/vuln/detail/CVE-2023-47248, which may concern Beam users who read Parquet files from untrusted sources.
To address this, we have applied the mitigation provided by https://pypi.org/project/pyarrow-hotfix/ in Beam 2.52.0, and will upgrade Beam to support pyarrow==14 in a future release.
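Whether an environment needs the manual workaround depends only on the installed Beam version. The following is a minimal sketch that assumes the PyPI distribution name `apache-beam` and treats any 2.52 or later release as already mitigated. The helper names are hypothetical, and pre-release suffixes are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# First Beam release that applies the pyarrow-hotfix mitigation, per this issue.
FIXED_IN = (2, 52)


def parse_major_minor(beam_version: str) -> Tuple[int, int]:
    """Extracts (major, minor) from a version string such as '2.51.0'."""
    match = re.match(r"(\d+)\.(\d+)", beam_version)
    if match is None:
        raise ValueError(f"Unrecognized Beam version: {beam_version!r}")
    return int(match.group(1)), int(match.group(2))


def needs_manual_workaround(beam_version: str) -> bool:
    """True for Beam 2.51.0 or below, which do not apply pyarrow-hotfix themselves."""
    return parse_major_minor(beam_version) < FIXED_IN


def installed_beam_needs_workaround() -> Optional[bool]:
    """Checks the locally installed apache-beam distribution; None if it is absent."""
    try:
        return needs_manual_workaround(version("apache-beam"))
    except PackageNotFoundError:
        return None
```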
Users of Beam 2.51.0 or below who use pyarrow in their pipelines and are concerned about CVE-2023-47248 can apply the following workaround:
- Install the `pyarrow-hotfix` package on the workers.
- Add `import pyarrow_hotfix` to the pipeline code. If the pipeline consists of a single module, also add the `--save_main_session` pipeline option. If the pipeline comprises multiple files and uses `--setup_file`, add the import to the pipeline package files, for example in the `__init__.py` file.
Issue Priority
Priority: 1 (major)
Issue Components
- Component: Python SDK