[SPARK-48710][PYTHON] Use NumPy 2.0 compatible types by codesorcery · Pull Request #47083 · apache/spark
added 2 commits
June 25, 2024 12:01
codesorcery changed the title from "Spark [SPARK-48710][PYTHON] Use NumPy 2.0 compatible types" to "[SPARK-48710][PYTHON] Use NumPy 2.0 compatible types"
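For context on what "NumPy 2.0 compatible types" means in practice: NumPy 2.0 removed a number of legacy type aliases, so code using them fails under `numpy>=2`. A minimal sketch of the kind of substitution involved (illustrative only; the actual changes are in this PR's diff):

```python
import numpy as np

# NumPy 2.0 removed several legacy aliases. The right-hand spellings
# below work on both NumPy 1.x and 2.x:
#   np.float_   -> np.float64
#   np.unicode_ -> np.str_
#   np.NaN      -> np.nan
#   np.Inf      -> np.inf

values = np.array([1.5, float("nan")], dtype=np.float64)

# np.isnan behaves the same on both major versions.
print(np.isnan(values).tolist())  # [False, True]
```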
HyukjinKwon pushed a commit that referenced this pull request
Jul 3, 2024

…1.15,<2)

### What changes were proposed in this pull request?
* Add a constraint for `numpy<2` to the PySpark package.

### Why are the changes needed?
PySpark references some code which was removed with NumPy 2.0. Thus, if `numpy>=2` is installed, executing PySpark may fail. #47083 updates the `master` branch to be compatible with NumPy 2. This PR adds a version bound for the older release branches, to which #47083 won't be applied.

### Does this PR introduce _any_ user-facing change?
NumPy will be limited to `numpy<2` when installing `pyspark` with the extras `ml`, `mllib`, `sql`, `pandas_on_spark` or `connect`.

### How was this patch tested?
Via existing CI jobs.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47175 from codesorcery/SPARK-48710-numpy-upper-bound.

Authored-by: Patrick Marx <6949483+codesorcery@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon pushed a commit that referenced this pull request
Jul 3, 2024

…1.15,<2): same commit message as the commit above (cherry picked from commit 44eba46. Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>)
gaecoli pushed a commit to gaecoli/spark that referenced this pull request
Jul 10, 2024

…1.15,<2): same commit message as the commit above (Closes apache#47175 from codesorcery/SPARK-48710-numpy-upper-bound)
HyukjinKwon pushed a commit that referenced this pull request
Jul 30, 2024

…ptional dependencies

### What changes were proposed in this pull request?
This is a follow-up of #47083 to recover the PySpark RDD tests.

### Why are the changes needed?
`PySpark Core` tests should not fail on optional dependencies.

**BEFORE**
```
$ python/run-tests.py --python-executables python3 --modules pyspark-core
...
File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/core/rdd.py", line 5376, in _test
    import numpy as np
ModuleNotFoundError: No module named 'numpy'
```

**AFTER**
```
$ python/run-tests.py --python-executables python3 --modules pyspark-core
...
Tests passed in 189 seconds

Skipped tests in pyspark.tests.test_memory_profiler with python3:
    test_assert_vanilla_mode (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_assert_vanilla_mode) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_aggregate_in_pandas (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_aggregate_in_pandas) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_clear (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_clear) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_cogroup_apply_in_arrow (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_cogroup_apply_in_arrow) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_cogroup_apply_in_pandas (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_cogroup_apply_in_pandas) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_group_apply_in_arrow (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_group_apply_in_arrow) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_group_apply_in_pandas (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_group_apply_in_pandas) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_map_in_pandas_not_supported (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_map_in_pandas_not_supported) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_pandas_udf (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_pandas_udf) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_pandas_udf_iterator_not_supported (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_pandas_udf_iterator_not_supported) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_pandas_udf_window (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_pandas_udf_window) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_udf (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_udf) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_udf_multiple_actions (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_udf_multiple_actions) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_udf_registered (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_udf_registered) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_udf_with_arrow (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_udf_with_arrow) ... skipped 'Must have memory-profiler installed.'
    test_profilers_clear (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_profilers_clear) ... skipped 'Must have memory-profiler installed.'
    test_code_map (pyspark.tests.test_memory_profiler.MemoryProfilerTests.test_code_map) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler (pyspark.tests.test_memory_profiler.MemoryProfilerTests.test_memory_profiler) ... skipped 'Must have memory-profiler installed.'
    test_profile_pandas_function_api (pyspark.tests.test_memory_profiler.MemoryProfilerTests.test_profile_pandas_function_api) ... skipped 'Must have memory-profiler installed.'
    test_profile_pandas_udf (pyspark.tests.test_memory_profiler.MemoryProfilerTests.test_profile_pandas_udf) ... skipped 'Must have memory-profiler installed.'
    test_udf_line_profiler (pyspark.tests.test_memory_profiler.MemoryProfilerTests.test_udf_line_profiler) ... skipped 'Must have memory-profiler installed.'

Skipped tests in pyspark.tests.test_rdd with python3:
    test_take_on_jrdd_with_large_rows_should_not_cause_deadlock (pyspark.tests.test_rdd.RDDTests.test_take_on_jrdd_with_large_rows_should_not_cause_deadlock) ... skipped 'NumPy or Pandas not installed'

Skipped tests in pyspark.tests.test_serializers with python3:
    test_statcounter_array (pyspark.tests.test_serializers.NumPyTests.test_statcounter_array) ... skipped 'NumPy not installed'
    test_serialize (pyspark.tests.test_serializers.SciPyTests.test_serialize) ... skipped 'SciPy not installed'

Skipped tests in pyspark.tests.test_worker with python3:
    test_memory_limit (pyspark.tests.test_worker.WorkerMemoryTest.test_memory_limit) ... skipped "Memory limit feature in Python worker is dependent on Python's 'resource' module on Linux; however, not found or not on Linux."
    test_python_segfault (pyspark.tests.test_worker.WorkerSegfaultNonDaemonTest.test_python_segfault) ... skipped 'SPARK-46130: Flaky with Python 3.12'
    test_python_segfault (pyspark.tests.test_worker.WorkerSegfaultTest.test_python_segfault) ... skipped 'SPARK-46130: Flaky with Python 3.12'
```

### Does this PR introduce _any_ user-facing change?
No. The failure happens during testing.

### How was this patch tested?
Pass the CIs and do the manual test without optional dependencies.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47526 from dongjoon-hyun/SPARK-48710.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
sunchao pushed a commit that referenced this pull request
Mar 10, 2026

…1.15,<2): same commit message as the commit above (Closes #47175 from codesorcery/SPARK-48710-numpy-upper-bound)