add benchmark for arrow scan by kevinjqliu · Pull Request #3126 · apache/iceberg-python
➜ uv run pytest tests/benchmark/test_arrow_scan_benchmark.py -m benchmark -s
======================================================================= test session starts =======================================================================
platform darwin -- Python 3.10.19, pytest-9.0.2, pluggy-1.6.0
rootdir: /Users/kevinliu/repos/iceberg-python
configfile: pyproject.toml
plugins: mock-3.15.1, anyio-4.11.0, lazy-fixtures-1.4.0, checkdocs-2.14.0, requests-mock-1.12.1
collected 1 item
tests/benchmark/test_arrow_scan_benchmark.py
--- ArrowScan.to_record_batches Benchmark (Comparison) ---
runs_per_shape=10, warmup_runs_per_shape=2, sleep_between_scenarios_sec=0.5, files=32, target_file_size_mb=50 (memory only: arr_mb, rss_delta_mb)
| implementation | worker_setting | num_files | file_size_mb_avg | total_rows | total_batches | full_scan_time_ms_avg | full_scan_time_ms_max | arrow_peak_mb_avg | rss_peak_delta_mb_avg | arrow_peak_mb_max | rss_peak_delta_mb_max |
| -------------------------------------- | -------------- | --------- | ---------------- | ---------- | ------------- | --------------------- | --------------------- | ----------------- | --------------------- | ----------------- | --------------------- |
| baseline (fully materialize all tasks) | 1 | 32 | 49.10 | 4324000 | 288 | 132.65 | 163.01 | 49.11 | 0.03 | 49.11 | 0.11 |
| bounded_queue | 1 | 32 | 49.10 | 4324000 | 288 | 142.43 | 154.39 | 100.19 | 0.00 | 103.92 | 0.03 |
| lazy | 1 | 32 | 49.10 | 4324000 | 288 | 131.48 | 148.50 | 101.38 | 0.00 | 103.85 | 0.00 |
| lazy_warmup | 1 | 32 | 49.10 | 4324000 | 288 | 125.48 | 158.24 | 106.02 | 0.00 | 134.75 | 0.00 |
| baseline (fully materialize all tasks) | 2 | 32 | 49.10 | 4324000 | 288 | 87.65 | 92.75 | 135.00 | 0.04 | 159.81 | 0.12 |
| bounded_queue | 2 | 32 | 49.10 | 4324000 | 288 | 97.39 | 105.58 | 201.49 | 0.24 | 204.91 | 1.52 |
| lazy | 2 | 32 | 49.10 | 4324000 | 288 | 126.66 | 131.47 | 100.88 | 0.00 | 102.35 | 0.00 |
| lazy_warmup | 2 | 32 | 49.10 | 4324000 | 288 | 79.60 | 83.19 | 213.08 | 0.36 | 244.99 | 3.56 |
| baseline (fully materialize all tasks) | 4 | 32 | 49.10 | 4324000 | 288 | 66.89 | 81.48 | 308.17 | 0.05 | 343.86 | 0.27 |
| bounded_queue | 4 | 32 | 49.10 | 4324000 | 288 | 73.09 | 78.14 | 394.04 | 0.01 | 401.54 | 0.06 |
| lazy | 4 | 32 | 49.10 | 4324000 | 288 | 127.57 | 132.25 | 103.22 | 0.00 | 109.17 | 0.00 |
| lazy_warmup | 4 | 32 | 49.10 | 4324000 | 288 | 62.09 | 82.48 | 504.49 | 0.53 | 582.62 | 2.30 |
| baseline (fully materialize all tasks) | 8 | 32 | 49.10 | 4324000 | 288 | 61.22 | 63.91 | 699.60 | 12.08 | 826.30 | 37.50 |
| bounded_queue | 8 | 32 | 49.10 | 4324000 | 288 | 66.69 | 73.07 | 752.00 | 0.60 | 787.62 | 3.34 |
| lazy | 8 | 32 | 49.10 | 4324000 | 288 | 125.74 | 127.10 | 101.36 | 0.00 | 106.66 | 0.05 |
| lazy_warmup | 8 | 32 | 49.10 | 4324000 | 288 | 58.10 | 60.26 | 1991.14 | 1.85 | 2429.90 | 9.08 |
| baseline (fully materialize all tasks) | 16 | 32 | 49.10 | 4324000 | 288 | 60.33 | 62.30 | 1585.96 | 2.29 | 1715.55 | 7.94 |
| bounded_queue | 16 | 32 | 49.10 | 4324000 | 288 | 66.26 | 77.49 | 1335.69 | 1.31 | 1482.20 | 10.48 |
| lazy | 16 | 32 | 49.10 | 4324000 | 288 | 128.26 | 133.57 | 100.75 | 0.00 | 102.29 | 0.00 |
| lazy_warmup | 16 | 32 | 49.10 | 4324000 | 288 | 57.81 | 60.90 | 2763.34 | 2.22 | 3079.33 | 9.12 |
| baseline (fully materialize all tasks) | default (18) | 32 | 49.10 | 4324000 | 288 | 63.72 | 72.10 | 1680.22 | 54.69 | 1822.33 | 177.28 |
| bounded_queue | default (18) | 32 | 49.10 | 4324000 | 288 | 64.19 | 69.11 | 1506.08 | 3.60 | 1683.01 | 13.53 |
| lazy | default (18) | 32 | 49.10 | 4324000 | 288 | 138.37 | 180.34 | 102.41 | 0.00 | 106.72 | 0.00 |
| lazy_warmup | default (18) | 32 | 49.10 | 4324000 | 288 | 59.35 | 66.66 | 2823.83 | 7.30 | 3105.95 | 36.11 |
| baseline (fully materialize all tasks) | 32 | 32 | 49.10 | 4324000 | 288 | 70.89 | 102.28 | 2099.31 | 88.90 | 2454.51 | 260.28 |
| bounded_queue | 32 | 32 | 49.10 | 4324000 | 288 | 63.24 | 66.65 | 2276.13 | 9.70 | 2850.23 | 48.03 |
| lazy | 32 | 32 | 49.10 | 4324000 | 288 | 128.86 | 138.36 | 102.16 | 0.01 | 106.72 | 0.12 |
| lazy_warmup | 32 | 32 | 49.10 | 4324000 | 288 | 60.45 | 73.04 | 2846.71 | 11.87 | 3030.61 | 55.73 |
saved graph: tests/benchmark/artifacts/arrow_scan_benchmark_relationships.png
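
For readers who want to reproduce the timing columns, here is a minimal sketch of how `full_scan_time_ms_avg` and `full_scan_time_ms_max` could be derived from the `runs_per_shape` and `warmup_runs_per_shape` settings printed above. It is not the benchmark's actual code: `time_scan` and `fake_scan` are hypothetical names, and `fake_scan` only stands in for draining `ArrowScan.to_record_batches`.

```python
import time
from statistics import mean
from typing import Callable, Dict


def time_scan(scan: Callable[[], int], runs: int = 10, warmup_runs: int = 2) -> Dict[str, float]:
    """Call `scan` untimed `warmup_runs` times, then time it `runs` times.

    Returns the average and maximum wall-clock duration in milliseconds.
    """
    for _ in range(warmup_runs):
        scan()  # warmup: result and duration are discarded

    durations_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        scan()
        durations_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "full_scan_time_ms_avg": mean(durations_ms),
        "full_scan_time_ms_max": max(durations_ms),
    }


def fake_scan() -> int:
    # Hypothetical workload: counts 288 items, mirroring total_batches in the table.
    # A real run would iterate the record batches produced by the scan instead.
    return sum(1 for _ in range(288))


stats = time_scan(fake_scan, runs=10, warmup_runs=2)
print(stats)
```

Discarding the warmup runs keeps one-time costs (imports, cold caches) out of the averages, which matters when the per-scan time is on the order of tens of milliseconds as in the table.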