bpo-46841: Quicken code in-place by brandtbucher · Pull Request #31888 · python/cpython
This moves the bytecode to the end of the corresponding PyCodeObject, and quickens it in-place.
Related changes:
- PyCodeObject is now more compact. I've removed the almost-always-unused co_varnames, co_freevars, and co_cellvars member caches, and rearranged some int members to fill some holes in the struct on 64-bit builds.
- co_code has been removed and replaced with _PyCode_GetCode, and co_quickened and co_firstinstr have been replaced with _PyCode_CODE.
- _PyOpcode_Deopt is a new mapping from all opcodes to their un-quickened forms.
- _PyOpcode_InlineCacheEntries is renamed to _PyOpcode_Caches, _Py_IncrementCountAndMaybeQuicken is renamed to _PyCode_Warmup, _Py_Quicken is renamed to _PyCode_Quicken, and _co_quickened is renamed to _co_code_adaptive (and is now a read-only memoryview).
- We don't emit unused nonzero opargs anymore in the compiler.
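As a rough illustration of how in-place quickening and a deopt mapping fit together, here is a toy Python model. It is not CPython's C implementation: the opcode numbers, the table contents, and the function names `quicken_in_place` and `deopt_copy` are all invented for this sketch. It assumes two-byte code units (opcode, oparg) and that some instructions are followed by inline cache units.

```python
# Toy model only: opcode numbers and tables are hypothetical, not CPython's.
LOAD_ATTR = 1              # base (un-quickened) instruction
LOAD_ATTR_ADAPTIVE = 2     # adaptive form installed by quickening
LOAD_ATTR_SPECIALIZED = 3  # specialized form installed at run time
RETURN_VALUE = 4

# Maps every opcode to its un-quickened form (the role _PyOpcode_Deopt plays).
DEOPT = {
    LOAD_ATTR: LOAD_ATTR,
    LOAD_ATTR_ADAPTIVE: LOAD_ATTR,
    LOAD_ATTR_SPECIALIZED: LOAD_ATTR,
    RETURN_VALUE: RETURN_VALUE,
}

# Base opcode -> adaptive form, for instructions that can be specialized.
ADAPTIVE = {LOAD_ATTR: LOAD_ATTR_ADAPTIVE}

# Base opcode -> number of inline cache units that follow the instruction.
CACHES = {LOAD_ATTR: 1, RETURN_VALUE: 0}


def quicken_in_place(code: bytearray) -> None:
    """Overwrite base opcodes with their adaptive forms, skipping caches."""
    i = 0
    while i < len(code):
        base = DEOPT[code[i]]
        code[i] = ADAPTIVE.get(code[i], code[i])
        i += 2 * (1 + CACHES[base])


def deopt_copy(code: bytearray) -> bytes:
    """Return a copy with original opcodes restored and caches zeroed."""
    out = bytearray(code)
    i = 0
    while i < len(out):
        base = DEOPT[out[i]]
        out[i] = base
        n_caches = CACHES[base]
        for j in range(i + 2, i + 2 + 2 * n_caches):
            out[j] = 0  # cache contents are run-time state, not bytecode
        i += 2 * (1 + n_caches)
    return bytes(out)


original = bytes([LOAD_ATTR, 5, 0, 0, RETURN_VALUE, 0])
code = bytearray(original)
quicken_in_place(code)
code[0] = LOAD_ATTR_SPECIALIZED  # simulate a later specialization
code[2] = 7                      # simulate a cache write
recovered = deopt_copy(code)
```

Because the mutable buffer is the only copy of the bytecode in this model, anything that needs the original form has to rebuild it on demand, which is the job the description assigns to _PyCode_GetCode.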
It looks like this results in a 3% memory improvement across all benchmarks:
Slower (1):
- regex_dna: 13.9 MB +- 374.4 kB -> 14.8 MB +- 311.7 kB: 1.07x slower
Faster (49):
- xml_etree_generate: 12.8 MB +- 340.8 kB -> 11.5 MB +- 329.9 kB: 1.11x faster
- logging_simple: 16.4 MB +- 1428.4 kB -> 14.9 MB +- 1906.2 kB: 1.10x faster
- html5lib: 27.4 MB +- 1000.0 kB -> 25.6 MB +- 1662.1 kB: 1.07x faster
- xml_etree_process: 12.8 MB +- 467.9 kB -> 12.1 MB +- 371.3 kB: 1.06x faster
- pidigits: 7594.4 kB +- 233.7 kB -> 7238.7 kB +- 100.3 kB: 1.05x faster
- regex_compile: 8885.3 kB +- 424.5 kB -> 8524.6 kB +- 159.0 kB: 1.04x faster
- unpickle: 8016.9 kB +- 384.9 kB -> 7694.8 kB +- 178.5 kB: 1.04x faster
- telco: 8009.3 kB +- 358.4 kB -> 7690.8 kB +- 327.4 kB: 1.04x faster
- pickle_pure_python: 8018.5 kB +- 390.8 kB -> 7706.8 kB +- 231.7 kB: 1.04x faster
- json_loads: 7637.5 kB +- 239.3 kB -> 7350.9 kB +- 186.3 kB: 1.04x faster
- chaos: 8442.3 kB +- 116.6 kB -> 8125.8 kB +- 39.7 kB: 1.04x faster
- logging_silent: 8057.6 kB +- 264.3 kB -> 7761.4 kB +- 169.1 kB: 1.04x faster
- fannkuch: 7315.6 kB +- 213.7 kB -> 7061.8 kB +- 186.3 kB: 1.04x faster
- pickle_dict: 7948.3 kB +- 257.2 kB -> 7677.0 kB +- 219.3 kB: 1.04x faster
- scimark_lu: 8120.4 kB +- 340.0 kB -> 7843.8 kB +- 348.3 kB: 1.04x faster
- sympy_integrate: 60.9 MB +- 424.5 kB -> 58.8 MB +- 456.4 kB: 1.04x faster
- meteor_contest: 9609.0 kB +- 374.0 kB -> 9283.7 kB +- 125.0 kB: 1.04x faster
- scimark_fft: 8204.3 kB +- 217.3 kB -> 7927.2 kB +- 227.9 kB: 1.03x faster
- spectral_norm: 7379.3 kB +- 241.6 kB -> 7147.8 kB +- 271.3 kB: 1.03x faster
- nbody: 7564.3 kB +- 252.4 kB -> 7327.0 kB +- 259.3 kB: 1.03x faster
- scimark_sor: 8102.5 kB +- 196.4 kB -> 7852.7 kB +- 352.7 kB: 1.03x faster
- crypto_pyaes: 7926.6 kB +- 120.6 kB -> 7689.8 kB +- 206.4 kB: 1.03x faster
- sympy_str: 61.3 MB +- 39.1 kB -> 59.4 MB +- 51.4 kB: 1.03x faster
- richards: 7951.8 kB +- 286.5 kB -> 7715.4 kB +- 244.9 kB: 1.03x faster
- unpack_sequence: 9026.1 kB +- 296.4 kB -> 8760.5 kB +- 387.8 kB: 1.03x faster
- django_template: 37.8 MB +- 116.9 kB -> 36.7 MB +- 109.5 kB: 1.03x faster
- sympy_expand: 60.1 MB +- 44.5 kB -> 58.4 MB +- 50.4 kB: 1.03x faster
- sympy_sum: 71.9 MB +- 2189.4 kB -> 69.9 MB +- 2175.3 kB: 1.03x faster
- raytrace: 8276.0 kB +- 152.7 kB -> 8050.4 kB +- 161.2 kB: 1.03x faster
- dulwich_log: 15.3 MB +- 94.2 kB -> 14.9 MB +- 99.3 kB: 1.03x faster
- pickle_list: 7874.5 kB +- 191.8 kB -> 7666.5 kB +- 173.0 kB: 1.03x faster
- pickle: 7937.0 kB +- 203.7 kB -> 7731.4 kB +- 386.8 kB: 1.03x faster
- regex_effbot: 8133.9 kB +- 222.4 kB -> 7924.1 kB +- 237.5 kB: 1.03x faster
- unpickle_list: 7853.8 kB +- 211.0 kB -> 7656.6 kB +- 201.3 kB: 1.03x faster
- deltablue: 9946.5 kB +- 169.6 kB -> 9698.5 kB +- 98.9 kB: 1.03x faster
- sqlite_synth: 9487.7 kB +- 41.4 kB -> 9255.3 kB +- 40.3 kB: 1.03x faster
- unpickle_pure_python: 7912.4 kB +- 187.2 kB -> 7720.2 kB +- 209.8 kB: 1.02x faster
- tornado_http: 29.9 MB +- 899.9 kB -> 29.2 MB +- 831.3 kB: 1.02x faster
- scimark_sparse_mat_mult: 8554.4 kB +- 114.1 kB -> 8358.8 kB +- 340.0 kB: 1.02x faster
- go: 9354.8 kB +- 372.5 kB -> 9142.5 kB +- 367.4 kB: 1.02x faster
- xml_etree_iterparse: 12.4 MB +- 233.6 kB -> 12.1 MB +- 307.8 kB: 1.02x faster
- pathlib: 9057.6 kB +- 173.9 kB -> 8863.2 kB +- 166.3 kB: 1.02x faster
- chameleon: 20.0 MB +- 346.1 kB -> 19.6 MB +- 316.4 kB: 1.02x faster
- regex_v8: 13.2 MB +- 143.0 kB -> 12.9 MB +- 199.2 kB: 1.02x faster
- python_startup_no_site: 11.3 MB +- 34.3 kB -> 11.1 MB +- 23.3 kB: 1.02x faster
- python_startup: 11.3 MB +- 39.6 kB -> 11.1 MB +- 24.2 kB: 1.02x faster
- xml_etree_parse: 11.7 MB +- 90.7 kB -> 11.5 MB +- 464.2 kB: 1.02x faster
- 2to3: 22.6 MB +- 39.0 kB -> 22.3 MB +- 45.5 kB: 1.01x faster
- json_dumps: 9413.9 kB +- 48.8 kB -> 9294.5 kB +- 325.8 kB: 1.01x faster
Benchmark hidden because not significant (7): float, hexiom, logging_format, mako, nqueens, pyflate, scimark_monte_carlo
Geometric mean: 1.03x faster
Ignore pyperf's incorrect "faster"/"slower" terminology... we're measuring memory usage here. I'm still waiting on actual performance numbers for this.
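For readers who want to see how a summary figure of this kind can be derived, here is a small sketch that computes per-benchmark ratios and their geometric mean from three rows of the table above. It assumes the ratio is taken as before/after (so values above 1 mean less memory) and uses only the rounded numbers printed above, so it is not expected to reproduce pyperf's figures exactly. The helper name `memory_ratio` is made up for this example.

```python
from statistics import geometric_mean

# (before, after) memory usage; units match within each pair.
# Values are copied from three rows of the table above.
samples = {
    "regex_dna": (13.9, 14.8),           # MB
    "xml_etree_generate": (12.8, 11.5),  # MB
    "pidigits": (7594.4, 7238.7),        # kB
}


def memory_ratio(before: float, after: float) -> float:
    """Ratio > 1 means the new build uses less memory (assumed convention)."""
    return before / after


ratios = {name: memory_ratio(b, a) for name, (b, a) in samples.items()}
summary = geometric_mean(list(ratios.values()))

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
print(f"geometric mean of this subset: {summary:.2f}x")
```

A geometric mean is the usual choice for averaging ratios because a 2x gain and a 2x loss cancel out to 1x, which an arithmetic mean would not give.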
1% perf improvement too:
Slower (11):
- pickle_dict: 27.6 us +- 0.2 us -> 28.4 us +- 0.3 us: 1.03x slower
- html5lib: 65.3 ms +- 2.7 ms -> 67.2 ms +- 2.8 ms: 1.03x slower
- pickle_list: 4.33 us +- 0.05 us -> 4.46 us +- 0.05 us: 1.03x slower
- regex_v8: 23.1 ms +- 0.2 ms -> 23.8 ms +- 0.2 ms: 1.03x slower
- regex_dna: 217 ms +- 1 ms -> 223 ms +- 4 ms: 1.03x slower
- scimark_lu: 111 ms +- 2 ms -> 113 ms +- 2 ms: 1.02x slower
- regex_effbot: 3.46 ms +- 0.06 ms -> 3.50 ms +- 0.05 ms: 1.01x slower
- json_dumps: 12.6 ms +- 0.1 ms -> 12.8 ms +- 0.2 ms: 1.01x slower
- fannkuch: 397 ms +- 3 ms -> 400 ms +- 5 ms: 1.01x slower
- json_loads: 28.1 us +- 0.3 us -> 28.3 us +- 0.3 us: 1.01x slower
- xml_etree_iterparse: 105 ms +- 1 ms -> 105 ms +- 1 ms: 1.01x slower
Faster (43):
- go: 149 ms +- 1 ms -> 139 ms +- 1 ms: 1.07x faster
- logging_simple: 5.38 us +- 0.10 us -> 5.16 us +- 0.08 us: 1.04x faster
- pickle: 9.89 us +- 0.14 us -> 9.53 us +- 0.10 us: 1.04x faster
- pycparser: 1.24 sec +- 0.02 sec -> 1.19 sec +- 0.02 sec: 1.04x faster
- thrift: 780 us +- 13 us -> 754 us +- 8 us: 1.03x faster
- deltablue: 3.85 ms +- 0.05 ms -> 3.73 ms +- 0.05 ms: 1.03x faster
- unpack_sequence: 48.5 ns +- 0.5 ns -> 47.1 ns +- 0.9 ns: 1.03x faster
- scimark_sparse_mat_mult: 4.97 ms +- 0.15 ms -> 4.83 ms +- 0.11 ms: 1.03x faster
- pyflate: 451 ms +- 3 ms -> 438 ms +- 4 ms: 1.03x faster
- xml_etree_process: 56.7 ms +- 0.8 ms -> 55.2 ms +- 0.7 ms: 1.03x faster
- pickle_pure_python: 323 us +- 3 us -> 314 us +- 3 us: 1.03x faster
- telco: 6.80 ms +- 0.09 ms -> 6.65 ms +- 0.16 ms: 1.02x faster
- scimark_sor: 120 ms +- 1 ms -> 118 ms +- 1 ms: 1.02x faster
- pidigits: 194 ms +- 0 ms -> 190 ms +- 0 ms: 1.02x faster
- logging_format: 5.87 us +- 0.08 us -> 5.74 us +- 0.09 us: 1.02x faster
- unpickle_pure_python: 238 us +- 2 us -> 233 us +- 2 us: 1.02x faster
- xml_etree_generate: 80.0 ms +- 0.6 ms -> 78.4 ms +- 0.7 ms: 1.02x faster
- meteor_contest: 108 ms +- 3 ms -> 106 ms +- 1 ms: 1.02x faster
- regex_compile: 139 ms +- 1 ms -> 136 ms +- 1 ms: 1.02x faster
- hexiom: 6.96 ms +- 0.03 ms -> 6.83 ms +- 0.02 ms: 1.02x faster
- sympy_sum: 163 ms +- 2 ms -> 160 ms +- 1 ms: 1.02x faster
- tornado_http: 98.2 ms +- 1.3 ms -> 96.5 ms +- 1.4 ms: 1.02x faster
- dulwich_log: 65.8 ms +- 0.4 ms -> 64.7 ms +- 0.5 ms: 1.02x faster
- sympy_integrate: 20.9 ms +- 0.1 ms -> 20.6 ms +- 0.1 ms: 1.02x faster
- scimark_fft: 340 ms +- 4 ms -> 334 ms +- 4 ms: 1.02x faster
- 2to3: 267 ms +- 1 ms -> 263 ms +- 1 ms: 1.02x faster
- scimark_monte_carlo: 69.7 ms +- 1.2 ms -> 68.7 ms +- 0.8 ms: 1.01x faster
- django_template: 35.0 ms +- 0.5 ms -> 34.5 ms +- 0.5 ms: 1.01x faster
- chaos: 71.7 ms +- 0.6 ms -> 70.7 ms +- 0.6 ms: 1.01x faster
- nbody: 94.0 ms +- 1.7 ms -> 92.8 ms +- 1.8 ms: 1.01x faster
- raytrace: 310 ms +- 2 ms -> 306 ms +- 3 ms: 1.01x faster
- sqlalchemy_declarative: 141 ms +- 3 ms -> 140 ms +- 3 ms: 1.01x faster
- float: 76.7 ms +- 0.8 ms -> 75.8 ms +- 1.0 ms: 1.01x faster
- sympy_str: 291 ms +- 2 ms -> 287 ms +- 3 ms: 1.01x faster
- richards: 47.5 ms +- 1.2 ms -> 47.0 ms +- 1.1 ms: 1.01x faster
- sympy_expand: 485 ms +- 6 ms -> 480 ms +- 3 ms: 1.01x faster
- python_startup_no_site: 6.02 ms +- 0.00 ms -> 5.96 ms +- 0.00 ms: 1.01x faster
- chameleon: 6.63 ms +- 0.07 ms -> 6.57 ms +- 0.06 ms: 1.01x faster
- crypto_pyaes: 83.9 ms +- 0.7 ms -> 83.2 ms +- 1.1 ms: 1.01x faster
- spectral_norm: 102 ms +- 1 ms -> 101 ms +- 1 ms: 1.01x faster
- python_startup: 8.41 ms +- 0.01 ms -> 8.34 ms +- 0.01 ms: 1.01x faster
- nqueens: 86.1 ms +- 1.2 ms -> 85.5 ms +- 0.8 ms: 1.01x faster
- pathlib: 18.3 ms +- 0.2 ms -> 18.2 ms +- 0.3 ms: 1.01x faster
Benchmark hidden because not significant (8): json, logging_silent, mako, sqlalchemy_imperative, sqlite_synth, unpickle, unpickle_list, xml_etree_parse
Geometric mean: 1.01x faster
This is still marked as draft, what is left to do?
There is still an awkward spot in _gen_throw where we walk back f_lasti to the previous SEND instruction and perform the jump ourselves when in a yield from (which is sort of a strange control-flow path that isn’t reflected in the CFG/bytecode/dis). I also don’t think the current implementation handles EXTENDED_ARGs correctly.
I’m still trying to understand the code better and figure out a cleaner way of doing this. Any ideas?
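To make the EXTENDED_ARG concern concrete, here is a toy sketch of recovering the full oparg of an instruction when scanning backwards from its position. It is not the code in _gen_throw: the opcode numbers are illustrative constants, the function name `full_oparg` is hypothetical, and the sketch ignores inline cache units, whose bytes a backward scan could mistake for an EXTENDED_ARG prefix.

```python
# Illustrative constants for this sketch only.
EXTENDED_ARG = 144
SEND = 123


def full_oparg(code: bytes, index: int) -> int:
    """Return the oparg of the instruction at code-unit `index`.

    Each code unit is two bytes: opcode, then the low 8 bits of the oparg.
    Any EXTENDED_ARG units directly before the instruction contribute the
    higher bytes, nearest prefix first.
    """
    oparg = code[2 * index + 1]
    shift = 8
    i = index - 1
    while i >= 0 and code[2 * i] == EXTENDED_ARG:
        oparg |= code[2 * i + 1] << shift
        shift += 8
        i -= 1
    return oparg


plain = bytes([SEND, 4])
prefixed = bytes([EXTENDED_ARG, 1, SEND, 4])
double = bytes([EXTENDED_ARG, 1, EXTENDED_ARG, 2, SEND, 3])

short_jump = full_oparg(plain, 0)
long_jump = full_oparg(prefixed, 1)
longer_jump = full_oparg(double, 2)
```

A walk-back that reads only the single byte next to SEND would see 4 in both of the first two cases, which is how a jump target could come out wrong once the oparg no longer fits in one byte.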
#31968 should help.
🤖 New build scheduled with the buildbot fleet by @brandtbucher for commit c8054b9 🤖
asvetlov pushed a commit to YvesDup/cpython that referenced this issue
Mar 25, 2022
* Moves the bytecode to the end of the corresponding PyCodeObject, and quickens it in-place.
* Removes the almost-always-unused co_varnames, co_freevars, and co_cellvars member caches.
* _PyOpcode_Deopt is a new mapping from all opcodes to their un-quickened forms.
* _PyOpcode_InlineCacheEntries is renamed to _PyOpcode_Caches.
* _Py_IncrementCountAndMaybeQuicken is renamed to _PyCode_Warmup.
* _Py_Quicken is renamed to _PyCode_Quicken.
* _co_quickened is renamed to _co_code_adaptive (and is now a read-only memoryview).
* Do not emit unused nonzero opargs anymore in the compiler.