gh-93103: Parser uses PyConfig.parser_debug instead of Py_DebugFlag by vstinner · Pull Request #93106 · python/cpython
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
LGTM but left one comment
| @@ -32,7 +32,7 @@ | |||
| #include "pegen.h" | |||
|
|
|||
| #if defined(Py_DEBUG) && defined(Py_BUILD_CORE) | |||
| # define D(x) if (Py_DebugFlag) x; | |||
| # define D(x) if (_Py_GetConfig()->parser_debug) { x; } | |||
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this makes the debug run slower due to the function call (is everywhere)🤔
Could you do a quick check please?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change only affects the debug build. I don't know how to measure the parser performance. Any suggestion?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this only affects debug mode, but I want to know by how much.
A quick check is running with "python -d" some big python file that doesn't actually execute any code and compare the timing before and after this PR. I don't need anything exact, just a rough estimate
Benchmark with a few large Python files:
import pyperf import tokenize # wc -l $(find -name "*.py")|sort -n large_files = """ 5259 ./Tools/clinic/clinic.py 5553 ./Lib/test/test_email/test_email.py 5577 ./Lib/test/test_argparse.py 5659 ./Lib/test/test_logging.py 5806 ./Lib/test/test_descr.py 5878 ./Lib/test/test_decimal.py 5993 ./Lib/test/_test_multiprocessing.py 6425 ./Lib/_pydecimal.py 6626 ./Lib/test/datetimetester.py 6664 ./Lib/test/test_socket.py 7325 ./Lib/test/test_typing.py 15534 ./Lib/pydoc_data/topics.py """ large_files = [line.split()[-1] for line in large_files.splitlines() if line.strip()] files = [] for filename in large_files: with tokenize.open(filename) as fp: content = fp.read() files.append((filename, content)) def bench(files): for filename, content in files: compile(content, filename, "exec") runner = pyperf.Runner() runner.bench_func('compile', bench, files)
Python built with:
./configure --with-pydebug && make- GCC 12.1.1 using
-Ogoptimization level
Comparison:
$ python3 -m pyperf compare_to ref.json pr.json
Mean +- std dev: [ref] 1.40 sec +- 0.02 sec -> [pr] 1.93 sec +- 0.01 sec: 1.37x slower
Oh, it's way slower.
…Flag * Replace deprecated Py_DebugFlag with PyConfig.parser_debug in the parser. * Add Parser.debug member. * Add tok_state.debug member. * Py_FrozenMain(): Replace Py_VerboseFlag with PyConfig.verbose.
I rewrote my PR to add a member to the Parser structure instead. New benchmark:
$ python3 -m pyperf compare_to ref.json pr2.json
Mean +- std dev: [ref] 1.40 sec +- 0.02 sec -> [pr2] 1.43 sec +- 0.04 sec: 1.02x slower
IMO the slowdown is now acceptable. I didn't use CPU isolation for this benchmark, I was using my laptop to do other stuffs in the meanwhile ;-)
I didn't expect that the D() macro would be called so often, and that the change would have such massive impact on the performance!
Is this debug mode really useful for someone working on the parser? I'm surprised that it has a command line option (-d) and an environment variable (PYTHONDEBUG=1). It's very verbose :-)
Follow-up PR to better document this option (for experts only!!!): #93186