[mypyc] Optionally log a sampled operation trace to a file by JukkaL · Pull Request #19457 · python/mypy
Logging executed ops is useful for performance analysis. For example, we can look for functions which perform many slow operations and try to optimize them. I've already used this successfully to implement several optimizations. A typical optimization that this helps with is replacing a generic Python function call operation with a native call. Compile using `MYPYC_LOG_TRACE=1 mypyc ...` to enable trace logging. The log will be written to `mypyc_trace.txt`. Roughly 1/1000 of ops of certain kinds (e.g. primitive calls) are logged. This can also be enabled by passing `log_trace=True` to `mypycify`. Compared to profiling, this logging data is frequency-oriented instead of CPU time oriented, and it's mostly helpful for micro-optimizations. It also needs some understanding of mypyc internals to be useful. It's not generally possible to reconstruct call stacks from the event data (but we could improve this). However, there is very little noise in the data and even small improvements can be visible. Logging isn't impacted by C compiler optimizations, so for a faster iteration loop, optimizations can be disabled. In the future this could possibly be used for profile-guided optimizations, but we'd probably need to adjust the data collection a bit for this use case. This is currently not documented and mostly intended for mypy or mypyc maintainers for now. Also no tests yet, since this is not a user-evel feature and it's disabled by default. Random example of log entries from mypy self check: ``` mypy.typeops.TypeVarExtractor._merge:1146:call_c:CPyList_Extend mypy.semanal.SemanticAnalyzer.lookup::primitive_op:list_get_item_unsafe mypy.expandtype.ExpandTypeVisitor.visit_type_var__TypeVisitor_glue:239:call:mypy.expandtype.ExpandTypeVisitor.visit_type_var mypy.applytype.apply_generic_arguments:111:call_c:CPy_NoErrOccurred mypy.indirection.TypeIndirectionVisitor.visit_callable_type__TypeVisitor_glue:118:call:mypy.indirection.TypeIndirectionVisitor.visit_callable_type mypy.fastparse.ASTConverter.visit_Call::primitive_op:buf_init_item mypy.semanal.SemanticAnalyzer.is_func_scope::primitive_op:int_eq mypy.meet.is_overlapping_types:397:call:mypy.meet._is_subtype_is_overlapping_types_obj mypy.types.CallableType.serialize:2287:call_c:CPyList_SetItemUnsafe mypy.checkexpr.ExpressionChecker.check_argument_types:2576:call_c:CPyList_SetItemUnsafe ``` For example, let's look at this line: ``` mypy.typeops.TypeVarExtractor._merge:1146:call_c:CPyList_Extend ``` In method `TypeVarExtractor._merge`, on line 1146 of `mypy/typeops.py`, the C primitive CPyList_Extend was called (corresponds to `list.extend`). I'll later add some documentation to the wiki or other developer docs and give examples of using and analyzing the data.
changed the title
[mypyc] Add optional logging of sampled operation trace to a file
[mypyc] Optionally log a sampled operation trace to a file
JukkaL
deleted the
mypyc-op-trace-2
branch
JukkaL added a commit that referenced this pull request
Jul 16, 2025Specialize for concrete sequence types (`list` and `tuple`) for faster iteration, since the traversal code is very hot. I used log tracing (#19457) to identify functions where PyObject_GetIter was called frequently. This is a aart of a set of micro-optimizations that improve self check performance by ~5.5%.
JukkaL added a commit that referenced this pull request
Jul 16, 2025Use precalculated set objects in more places. This is similar to #19385. Some cases were still unoptimized. I used trace logging (#19457) to identify functions where we were creating many set objects, and I noticed that these were unncessary. This is a part of a set of micro-optimizations that improve self check performance by ~5.5%.
JukkaL added a commit that referenced this pull request
Jul 16, 2025Mypyc doesn't generate very efficient code for enums, so switch to a regular class. We can later revert the change if/when we can improve enum support in mypyc. Operations related to ArgKind were pretty prominent in the op trace log (#19457). By itself this improves performance by ~1.7%, based on `perf_compare.py`, which is significant: ``` master 4.168s (0.0%) | stdev 0.037s HEAD 4.098s (-1.7%) | stdev 0.028s ``` This is a part of a set of micro-optimizations that improve self check performance by ~5.5%.
JukkaL added a commit that referenced this pull request
Jul 16, 2025Use precalculated set objects in more places. This is similar to #19385. Some cases were still unoptimized. I used trace logging (#19457) to identify functions where we were creating many set objects, and I noticed that these were unnecessary. This is a part of a set of micro-optimizations that improve self check performance by ~5.5%.
JukkaL added a commit that referenced this pull request
Jul 17, 2025Specialize for concrete sequence types (`list` and `tuple`) for faster iteration, since the traversal code is very hot. I used trace logging (#19457) to identify functions where `PyObject_GetIter` was called frequently. This is a part of a set of micro-optimizations that improve self check performance by ~5.5%.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters