gh-91432: Replace JUMP+FOR_ITER with FOR_END by sweeneyde · Pull Request #70016 · python/cpython

Below are some benchmarks. My machine is not the most stable, but I believe there is a consistent, measurable speedup.
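For context, the instruction pair this PR targets shows up when disassembling an ordinary for loop. A quick sketch (the exact opcode layout varies across CPython versions, so no specific output is claimed here):

```python
import dis

def f(seq):
    # A plain for loop: compiled with FOR_ITER at the loop head,
    # plus a backward jump at the loop end on builds without this change.
    total = 0
    for x in seq:
        total += x
    return total

# Prints the bytecode; look for FOR_ITER and the jump back to it.
dis.dis(f)
```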

PyPerformance:

Slower (9):

  • scimark_sparse_mat_mult: 4.87 ms +- 0.23 ms -> 5.04 ms +- 0.19 ms: 1.04x slower
  • spectral_norm: 96.6 ms +- 1.4 ms -> 100.0 ms +- 1.7 ms: 1.03x slower
  • regex_v8: 23.3 ms +- 0.4 ms -> 24.0 ms +- 0.5 ms: 1.03x slower
  • pathlib: 19.7 ms +- 0.4 ms -> 20.2 ms +- 0.8 ms: 1.03x slower
  • sympy_sum: 184 ms +- 2 ms -> 188 ms +- 6 ms: 1.02x slower
  • telco: 6.46 ms +- 0.12 ms -> 6.56 ms +- 0.19 ms: 1.02x slower
  • sqlalchemy_imperative: 24.5 ms +- 1.0 ms -> 24.8 ms +- 0.7 ms: 1.02x slower
  • regex_dna: 209 ms +- 3 ms -> 212 ms +- 3 ms: 1.01x slower
  • logging_simple: 6.55 us +- 0.12 us -> 6.64 us +- 0.28 us: 1.01x slower

Faster (20):

  • pickle_list: 4.77 us +- 0.17 us -> 4.44 us +- 0.15 us: 1.07x faster
  • logging_silent: 105 ns +- 5 ns -> 98.0 ns +- 2.0 ns: 1.07x faster
  • nbody: 105 ms +- 5 ms -> 99.1 ms +- 3.7 ms: 1.06x faster
  • pickle_dict: 28.6 us +- 0.4 us -> 26.9 us +- 0.5 us: 1.06x faster
  • scimark_lu: 111 ms +- 2 ms -> 105 ms +- 3 ms: 1.05x faster
  • pickle: 12.6 us +- 1.1 us -> 12.0 us +- 1.4 us: 1.05x faster
  • xml_etree_iterparse: 116 ms +- 16 ms -> 110 ms +- 2 ms: 1.05x faster
  • deltablue: 4.27 ms +- 0.18 ms -> 4.12 ms +- 0.09 ms: 1.04x faster
  • regex_effbot: 3.19 ms +- 0.09 ms -> 3.08 ms +- 0.04 ms: 1.04x faster
  • scimark_monte_carlo: 72.9 ms +- 1.5 ms -> 70.4 ms +- 1.2 ms: 1.03x faster
  • django_template: 42.1 ms +- 2.2 ms -> 40.9 ms +- 2.4 ms: 1.03x faster
  • float: 81.6 ms +- 1.5 ms -> 79.7 ms +- 1.3 ms: 1.02x faster
  • pickle_pure_python: 351 us +- 10 us -> 343 us +- 8 us: 1.02x faster
  • chaos: 80.3 ms +- 1.8 ms -> 78.8 ms +- 2.1 ms: 1.02x faster
  • unpickle_pure_python: 263 us +- 6 us -> 259 us +- 7 us: 1.02x faster
  • nqueens: 87.0 ms +- 1.4 ms -> 85.5 ms +- 2.2 ms: 1.02x faster
  • raytrace: 329 ms +- 9 ms -> 324 ms +- 10 ms: 1.01x faster
  • unpickle_list: 5.11 us +- 0.12 us -> 5.05 us +- 0.08 us: 1.01x faster
  • regex_compile: 152 ms +- 3 ms -> 151 ms +- 3 ms: 1.01x faster
  • scimark_sor: 119 ms +- 2 ms -> 118 ms +- 1 ms: 1.01x faster

Benchmark hidden because not significant (29): 2to3, chameleon, crypto_pyaes, dulwich_log, fannkuch, go, hexiom, json_dumps, json_loads, logging_format, mako, meteor_contest, pidigits, pyflate, python_startup, python_startup_no_site, richards, scimark_fft, sqlalchemy_declarative, sqlite_synth, sympy_expand, sympy_integrate, sympy_str, tornado_http, unpack_sequence, unpickle, xml_etree_parse, xml_etree_generate, xml_etree_process

Geometric mean: 1.01x faster

Microbenchmarks:

Benchmark code:

from itertools import repeat
from pyperf import Runner, perf_counter

runner = Runner()
def time_this(func):
    # Register func as a pyperf time-func: it is called with a loop
    # count and must return the elapsed time for the timed section.
    runner.bench_time_func(func.__name__, func)
    return func

###############################

@time_this
def range_sum(loops):
    s = 0
    r = iter(range(loops))
    t0 = perf_counter()
    for x in r:
        s += x
    return perf_counter() - t0

@time_this
def list_sum(loops):
    s = 0.0
    r = iter([1.0] * loops)
    t0 = perf_counter()
    for x in r:
        s += x
    return perf_counter() - t0

@time_this
def repeat_sum(loops):
    s = 0.0
    r = repeat(1.0, loops)
    t0 = perf_counter()
    for x in r:
        s += x
    return perf_counter() - t0

###############################

@time_this
def range_all(loops):
    r = iter(range(1, loops + 1))
    t0 = perf_counter()
    for x in r:
        if not x:
            break
    return perf_counter() - t0

@time_this
def list_all(loops):
    r = iter([True] * loops)
    t0 = perf_counter()
    for x in r:
        if not x:
            break
    return perf_counter() - t0

@time_this
def repeat_all(loops):
    r = repeat(True, loops)
    t0 = perf_counter()
    for x in r:
        if not x:
            break
    return perf_counter() - t0
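
Each function above follows pyperf's time-func protocol: take a loop count, time only the hot section, and return the elapsed seconds. A minimal standalone sketch of that shape (no pyperf needed, so it can be sanity-checked directly):

```python
from itertools import repeat
from time import perf_counter  # pyperf.perf_counter is an alias of this

def repeat_sum(loops):
    # Same shape as the benchmark functions above: set up outside the
    # timed region, time the loop itself, return elapsed seconds.
    s = 0.0
    r = repeat(1.0, loops)
    t0 = perf_counter()
    for x in r:
        s += x
    return perf_counter() - t0

# Calling it directly works without the pyperf harness.
elapsed = repeat_sum(10_000)
```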

Faster (6):

  • range_all: 17.5 ns +- 0.1 ns -> 16.6 ns +- 0.3 ns: 1.05x faster
  • list_all: 10.6 ns +- 0.7 ns -> 10.3 ns +- 0.4 ns: 1.03x faster
  • repeat_sum: 15.2 ns +- 0.2 ns -> 14.8 ns +- 0.5 ns: 1.03x faster
  • list_sum: 17.3 ns +- 0.2 ns -> 16.9 ns +- 0.3 ns: 1.03x faster
  • range_sum: 31.5 ns +- 0.3 ns -> 30.9 ns +- 0.3 ns: 1.02x faster
  • repeat_all: 7.98 ns +- 0.06 ns -> 7.90 ns +- 0.10 ns: 1.01x faster

Geometric mean: 1.03x faster

These numbers don't entirely make sense (some macro-benchmarks speed up more than the tightest micro-benchmarks?), so if someone with a stable machine is willing to re-measure, that would be appreciated.