add benchmark for arrow scan by kevinjqliu · Pull Request #3126 · apache/iceberg-python

➜ uv run pytest tests/benchmark/test_arrow_scan_benchmark.py -m benchmark -s 
======================================================================= test session starts =======================================================================
platform darwin -- Python 3.10.19, pytest-9.0.2, pluggy-1.6.0
rootdir: /Users/kevinliu/repos/iceberg-python
configfile: pyproject.toml
plugins: mock-3.15.1, anyio-4.11.0, lazy-fixtures-1.4.0, checkdocs-2.14.0, requests-mock-1.12.1
collected 1 item                                                                                                                                                  

tests/benchmark/test_arrow_scan_benchmark.py 
--- ArrowScan.to_record_batches Benchmark (Comparison) ---
runs_per_shape=10, warmup_runs_per_shape=2, sleep_between_scenarios_sec=0.5, files=32, target_file_size_mb=50 (memory only: arr_mb, rss_delta_mb)
| implementation                         | worker_setting | num_files | file_size_mb_avg | total_rows | total_batches | full_scan_time_ms_avg | full_scan_time_ms_max | arrow_peak_mb_avg | rss_peak_delta_mb_avg | arrow_peak_mb_max | rss_peak_delta_mb_max |
| -------------------------------------- | -------------- | --------- | ---------------- | ---------- | ------------- | --------------------- | --------------------- | ----------------- | --------------------- | ----------------- | --------------------- |
| baseline (fully materialize all tasks) | 1              | 32        | 49.10            | 4324000    | 288           | 132.65                | 163.01                | 49.11             | 0.03                  | 49.11             | 0.11                  |
| bounded_queue                          | 1              | 32        | 49.10            | 4324000    | 288           | 142.43                | 154.39                | 100.19            | 0.00                  | 103.92            | 0.03                  |
| lazy                                   | 1              | 32        | 49.10            | 4324000    | 288           | 131.48                | 148.50                | 101.38            | 0.00                  | 103.85            | 0.00                  |
| lazy_warmup                            | 1              | 32        | 49.10            | 4324000    | 288           | 125.48                | 158.24                | 106.02            | 0.00                  | 134.75            | 0.00                  |
| baseline (fully materialize all tasks) | 2              | 32        | 49.10            | 4324000    | 288           | 87.65                 | 92.75                 | 135.00            | 0.04                  | 159.81            | 0.12                  |
| bounded_queue                          | 2              | 32        | 49.10            | 4324000    | 288           | 97.39                 | 105.58                | 201.49            | 0.24                  | 204.91            | 1.52                  |
| lazy                                   | 2              | 32        | 49.10            | 4324000    | 288           | 126.66                | 131.47                | 100.88            | 0.00                  | 102.35            | 0.00                  |
| lazy_warmup                            | 2              | 32        | 49.10            | 4324000    | 288           | 79.60                 | 83.19                 | 213.08            | 0.36                  | 244.99            | 3.56                  |
| baseline (fully materialize all tasks) | 4              | 32        | 49.10            | 4324000    | 288           | 66.89                 | 81.48                 | 308.17            | 0.05                  | 343.86            | 0.27                  |
| bounded_queue                          | 4              | 32        | 49.10            | 4324000    | 288           | 73.09                 | 78.14                 | 394.04            | 0.01                  | 401.54            | 0.06                  |
| lazy                                   | 4              | 32        | 49.10            | 4324000    | 288           | 127.57                | 132.25                | 103.22            | 0.00                  | 109.17            | 0.00                  |
| lazy_warmup                            | 4              | 32        | 49.10            | 4324000    | 288           | 62.09                 | 82.48                 | 504.49            | 0.53                  | 582.62            | 2.30                  |
| baseline (fully materialize all tasks) | 8              | 32        | 49.10            | 4324000    | 288           | 61.22                 | 63.91                 | 699.60            | 12.08                 | 826.30            | 37.50                 |
| bounded_queue                          | 8              | 32        | 49.10            | 4324000    | 288           | 66.69                 | 73.07                 | 752.00            | 0.60                  | 787.62            | 3.34                  |
| lazy                                   | 8              | 32        | 49.10            | 4324000    | 288           | 125.74                | 127.10                | 101.36            | 0.00                  | 106.66            | 0.05                  |
| lazy_warmup                            | 8              | 32        | 49.10            | 4324000    | 288           | 58.10                 | 60.26                 | 1991.14           | 1.85                  | 2429.90           | 9.08                  |
| baseline (fully materialize all tasks) | 16             | 32        | 49.10            | 4324000    | 288           | 60.33                 | 62.30                 | 1585.96           | 2.29                  | 1715.55           | 7.94                  |
| bounded_queue                          | 16             | 32        | 49.10            | 4324000    | 288           | 66.26                 | 77.49                 | 1335.69           | 1.31                  | 1482.20           | 10.48                 |
| lazy                                   | 16             | 32        | 49.10            | 4324000    | 288           | 128.26                | 133.57                | 100.75            | 0.00                  | 102.29            | 0.00                  |
| lazy_warmup                            | 16             | 32        | 49.10            | 4324000    | 288           | 57.81                 | 60.90                 | 2763.34           | 2.22                  | 3079.33           | 9.12                  |
| baseline (fully materialize all tasks) | default (18)   | 32        | 49.10            | 4324000    | 288           | 63.72                 | 72.10                 | 1680.22           | 54.69                 | 1822.33           | 177.28                |
| bounded_queue                          | default (18)   | 32        | 49.10            | 4324000    | 288           | 64.19                 | 69.11                 | 1506.08           | 3.60                  | 1683.01           | 13.53                 |
| lazy                                   | default (18)   | 32        | 49.10            | 4324000    | 288           | 138.37                | 180.34                | 102.41            | 0.00                  | 106.72            | 0.00                  |
| lazy_warmup                            | default (18)   | 32        | 49.10            | 4324000    | 288           | 59.35                 | 66.66                 | 2823.83           | 7.30                  | 3105.95           | 36.11                 |
| baseline (fully materialize all tasks) | 32             | 32        | 49.10            | 4324000    | 288           | 70.89                 | 102.28                | 2099.31           | 88.90                 | 2454.51           | 260.28                |
| bounded_queue                          | 32             | 32        | 49.10            | 4324000    | 288           | 63.24                 | 66.65                 | 2276.13           | 9.70                  | 2850.23           | 48.03                 |
| lazy                                   | 32             | 32        | 49.10            | 4324000    | 288           | 128.86                | 138.36                | 102.16            | 0.01                  | 106.72            | 0.12                  |
| lazy_warmup                            | 32             | 32        | 49.10            | 4324000    | 288           | 60.45                 | 73.04                 | 2846.71           | 11.87                 | 3030.61           | 55.73                 |
saved graph: tests/benchmark/artifacts/arrow_scan_benchmark_relationships.png
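
For readers interpreting the `full_scan_time_ms_avg` and `full_scan_time_ms_max` columns, the following is a minimal sketch of how a harness with `runs_per_shape=10` and `warmup_runs_per_shape=2` could produce them. It is not the benchmark code from this PR, which is not shown here. The names `time_scan` and `fake_scan` are hypothetical, and `fake_scan` is only a stand-in for draining `ArrowScan.to_record_batches`.

```python
import statistics
import time
from typing import Callable, Dict


def time_scan(scan: Callable[[], int], runs: int = 10, warmup_runs: int = 2) -> Dict[str, float]:
    """Time `scan` and report average and maximum wall-clock time in milliseconds."""
    # Warmup runs are executed but their timings are discarded, so one-off
    # costs (imports, cold caches) do not skew the recorded samples.
    for _ in range(warmup_runs):
        scan()

    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        scan()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "full_scan_time_ms_avg": statistics.mean(samples_ms),
        "full_scan_time_ms_max": max(samples_ms),
    }


def fake_scan() -> int:
    # Hypothetical stand-in (assumption): a real run would iterate over the
    # record batches returned by the scan and count the rows.
    return sum(range(100_000))


result = time_scan(fake_scan, runs=10, warmup_runs=2)
print(result)
```

Reporting the maximum alongside the average makes outlier runs visible, such as the `lazy` row at `default (18)` workers, where the maximum (180.34 ms) is well above the average (138.37 ms).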