gh-91432: Replace JUMP+FOR_ITER with FOR_END by sweeneyde · Pull Request #70016 · python/cpython
Below are some benchmarks. My machine is not the most stable, but I believe there is a consistent, measurable speedup.
PyPerformance:
Slower (9):
- scimark_sparse_mat_mult: 4.87 ms +- 0.23 ms -> 5.04 ms +- 0.19 ms: 1.04x slower
- spectral_norm: 96.6 ms +- 1.4 ms -> 100.0 ms +- 1.7 ms: 1.03x slower
- regex_v8: 23.3 ms +- 0.4 ms -> 24.0 ms +- 0.5 ms: 1.03x slower
- pathlib: 19.7 ms +- 0.4 ms -> 20.2 ms +- 0.8 ms: 1.03x slower
- sympy_sum: 184 ms +- 2 ms -> 188 ms +- 6 ms: 1.02x slower
- telco: 6.46 ms +- 0.12 ms -> 6.56 ms +- 0.19 ms: 1.02x slower
- sqlalchemy_imperative: 24.5 ms +- 1.0 ms -> 24.8 ms +- 0.7 ms: 1.02x slower
- regex_dna: 209 ms +- 3 ms -> 212 ms +- 3 ms: 1.01x slower
- logging_simple: 6.55 us +- 0.12 us -> 6.64 us +- 0.28 us: 1.01x slower
Faster (20):
- pickle_list: 4.77 us +- 0.17 us -> 4.44 us +- 0.15 us: 1.07x faster
- logging_silent: 105 ns +- 5 ns -> 98.0 ns +- 2.0 ns: 1.07x faster
- nbody: 105 ms +- 5 ms -> 99.1 ms +- 3.7 ms: 1.06x faster
- pickle_dict: 28.6 us +- 0.4 us -> 26.9 us +- 0.5 us: 1.06x faster
- scimark_lu: 111 ms +- 2 ms -> 105 ms +- 3 ms: 1.05x faster
- pickle: 12.6 us +- 1.1 us -> 12.0 us +- 1.4 us: 1.05x faster
- xml_etree_iterparse: 116 ms +- 16 ms -> 110 ms +- 2 ms: 1.05x faster
- deltablue: 4.27 ms +- 0.18 ms -> 4.12 ms +- 0.09 ms: 1.04x faster
- regex_effbot: 3.19 ms +- 0.09 ms -> 3.08 ms +- 0.04 ms: 1.04x faster
- scimark_monte_carlo: 72.9 ms +- 1.5 ms -> 70.4 ms +- 1.2 ms: 1.03x faster
- django_template: 42.1 ms +- 2.2 ms -> 40.9 ms +- 2.4 ms: 1.03x faster
- float: 81.6 ms +- 1.5 ms -> 79.7 ms +- 1.3 ms: 1.02x faster
- pickle_pure_python: 351 us +- 10 us -> 343 us +- 8 us: 1.02x faster
- chaos: 80.3 ms +- 1.8 ms -> 78.8 ms +- 2.1 ms: 1.02x faster
- unpickle_pure_python: 263 us +- 6 us -> 259 us +- 7 us: 1.02x faster
- nqueens: 87.0 ms +- 1.4 ms -> 85.5 ms +- 2.2 ms: 1.02x faster
- raytrace: 329 ms +- 9 ms -> 324 ms +- 10 ms: 1.01x faster
- unpickle_list: 5.11 us +- 0.12 us -> 5.05 us +- 0.08 us: 1.01x faster
- regex_compile: 152 ms +- 3 ms -> 151 ms +- 3 ms: 1.01x faster
- scimark_sor: 119 ms +- 2 ms -> 118 ms +- 1 ms: 1.01x faster
Benchmark hidden because not significant (29): 2to3, chameleon, crypto_pyaes, dulwich_log, fannkuch, go, hexiom, json_dumps, json_loads, logging_format, mako, meteor_contest, pidigits, pyflate, python_startup, python_startup_no_site, richards, scimark_fft, sqlalchemy_declarative, sqlite_synth, sympy_expand, sympy_integrate, sympy_str, tornado_http, unpack_sequence, unpickle, xml_etree_parse, xml_etree_generate, xml_etree_process
Geometric mean: 1.01x faster
Microbenchmarks:
benchmark code:
from itertools import repeat
from pyperf import Runner, perf_counter

runner = Runner()

def time_this(func):
    runner.bench_time_func(func.__name__, func)
    return func

###############################

@time_this
def range_sum(loops):
    s = 0
    r = iter(range(loops))
    t0 = perf_counter()
    for x in r:
        s += x
    return perf_counter() - t0

@time_this
def list_sum(loops):
    s = 0.0
    r = iter([1.0] * loops)
    t0 = perf_counter()
    for x in r:
        s += x
    return perf_counter() - t0

@time_this
def repeat_sum(loops):
    s = 0.0
    r = repeat(1.0, loops)
    t0 = perf_counter()
    for x in r:
        s += x
    return perf_counter() - t0

###############################

@time_this
def range_all(loops):
    r = iter(range(1, loops + 1))
    t0 = perf_counter()
    for x in r:
        if not x:
            break
    return perf_counter() - t0

@time_this
def list_all(loops):
    r = iter([True] * loops)
    t0 = perf_counter()
    for x in r:
        if not x:
            break
    return perf_counter() - t0

@time_this
def repeat_all(loops):
    r = repeat(True, loops)
    t0 = perf_counter()
    for x in r:
        if not x:
            break
    return perf_counter() - t0
Faster (6):
- range_all: 17.5 ns +- 0.1 ns -> 16.6 ns +- 0.3 ns: 1.05x faster
- list_all: 10.6 ns +- 0.7 ns -> 10.3 ns +- 0.4 ns: 1.03x faster
- repeat_sum: 15.2 ns +- 0.2 ns -> 14.8 ns +- 0.5 ns: 1.03x faster
- list_sum: 17.3 ns +- 0.2 ns -> 16.9 ns +- 0.3 ns: 1.03x faster
- range_sum: 31.5 ns +- 0.3 ns -> 30.9 ns +- 0.3 ns: 1.02x faster
- repeat_all: 7.98 ns +- 0.06 ns -> 7.90 ns +- 0.10 ns: 1.01x faster
Geometric mean: 1.03x faster
These results don't entirely add up (some macro-benchmarks speed up more than the tightest micro-benchmarks), so if someone with a more stable machine is willing to re-measure, that would be appreciated.
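For context, the loop bytecode this PR targets can be inspected with the standard-library `dis` module. On a stock interpreter, a `for` loop compiles to a `FOR_ITER` at the loop head plus a backward jump at the loop tail; this branch replaces that pair with a `FOR_END` at the bottom of the loop. The sketch below only shows the stock layout (exact jump opcode names vary across CPython versions, and `FOR_END` exists only on this branch):

```python
import dis

def loop(seq):
    total = 0
    for x in seq:
        total += x
    return total

# Collect the opcode names used by the compiled function. On stock
# CPython builds the loop is driven by FOR_ITER; this PR's branch
# would instead end the loop body with FOR_END.
ops = {ins.opname for ins in dis.get_instructions(loop)}
print("FOR_ITER" in ops)  # → True on stock CPython builds
```

Running `dis.dis(loop)` shows the full disassembly, including where the loop-closing jump (or, on this branch, `FOR_END`) lands.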