[mypyc] Optionally log a sampled operation trace to a file by JukkaL · Pull Request #19457 · python/mypy

@JukkaL

Logging executed ops is useful for performance analysis. For example,
we can look for functions which perform many slow operations and try
to optimize them. I've already used this successfully to implement
several optimizations. A typical optimization that this helps with is
replacing a generic Python function call operation with a native call.

Compile using `MYPYC_LOG_TRACE=1 mypyc ...` to enable trace logging.
The log will be written to `mypyc_trace.txt`. Roughly 1/1000 of ops
of certain kinds (e.g. primitive calls) are logged.

This can also be enabled by passing `log_trace=True` to `mypycify`.

Compared to profiling, this logging data is frequency-oriented instead
of CPU time oriented, and it's mostly helpful for micro-optimizations.
It also needs some understanding of mypyc internals to be useful. It's
not generally possible to reconstruct call stacks from the event data
(but we could improve this). However, there is very little noise in
the data and even small improvements can be visible.

Logging isn't impacted by C compiler optimizations, so for a faster
iteration loop, optimizations can be disabled.

In the future this could possibly be used for profile-guided
optimizations, but we'd probably need to adjust the data collection a
bit for this use case.

This is currently not documented and mostly intended for mypy or mypyc
maintainers for now. Also no tests yet, since this is not a user-evel
feature and it's disabled by default.

Random example of log entries from mypy self check:
```
mypy.typeops.TypeVarExtractor._merge:1146:call_c:CPyList_Extend
mypy.semanal.SemanticAnalyzer.lookup::primitive_op:list_get_item_unsafe
mypy.expandtype.ExpandTypeVisitor.visit_type_var__TypeVisitor_glue:239:call:mypy.expandtype.ExpandTypeVisitor.visit_type_var
mypy.applytype.apply_generic_arguments:111:call_c:CPy_NoErrOccurred
mypy.indirection.TypeIndirectionVisitor.visit_callable_type__TypeVisitor_glue:118:call:mypy.indirection.TypeIndirectionVisitor.visit_callable_type
mypy.fastparse.ASTConverter.visit_Call::primitive_op:buf_init_item
mypy.semanal.SemanticAnalyzer.is_func_scope::primitive_op:int_eq
mypy.meet.is_overlapping_types:397:call:mypy.meet._is_subtype_is_overlapping_types_obj
mypy.types.CallableType.serialize:2287:call_c:CPyList_SetItemUnsafe
mypy.checkexpr.ExpressionChecker.check_argument_types:2576:call_c:CPyList_SetItemUnsafe
```

For example, let's look at this line:
```
mypy.typeops.TypeVarExtractor._merge:1146:call_c:CPyList_Extend
```
In method `TypeVarExtractor._merge`, on line 1146 of
`mypy/typeops.py`, the C primitive CPyList_Extend was called
(corresponds to `list.extend`).

I'll later add some documentation to the wiki or other developer docs
and give examples of using and analyzing the data.

@JukkaL changed the title [mypyc] Add optional logging of sampled operation trace to a file [mypyc] Optionally log a sampled operation trace to a file

Jul 16, 2025

p-sawicki

@JukkaL

p-sawicki

@JukkaL JukkaL deleted the mypyc-op-trace-2 branch

July 16, 2025 12:43

JukkaL added a commit that referenced this pull request

Jul 16, 2025
Specialize for concrete sequence types (`list` and `tuple`) for faster
iteration, since the traversal code is very hot.

I used log tracing (#19457) to identify functions where
PyObject_GetIter was called frequently.

This is a aart of a set of micro-optimizations that improve self check
performance by ~5.5%.

JukkaL added a commit that referenced this pull request

Jul 16, 2025
Use precalculated set objects in more places.

This is similar to #19385. Some cases were still unoptimized.

I used trace logging (#19457) to identify functions where we were
creating many set objects, and I noticed that these were unncessary.

This is a part of a set of micro-optimizations that improve self check
performance by ~5.5%.

JukkaL added a commit that referenced this pull request

Jul 16, 2025
Mypyc doesn't generate very efficient code for enums, so switch to a
regular class. We can later revert the change if/when we can improve
enum support in mypyc.

Operations related to ArgKind were pretty prominent in the op trace
log (#19457).

By itself this improves performance by ~1.7%, based on
`perf_compare.py`, which is significant:
```
master                    4.168s (0.0%) | stdev 0.037s
HEAD                      4.098s (-1.7%) | stdev 0.028s
```

This is a part of a set of micro-optimizations that improve self check
performance by ~5.5%.

JukkaL added a commit that referenced this pull request

Jul 16, 2025
Use precalculated set objects in more places.

This is similar to #19385. Some cases were still unoptimized.

I used trace logging (#19457) to identify functions where we were
creating many set objects, and I noticed that these were unnecessary.

This is a part of a set of micro-optimizations that improve self check
performance by ~5.5%.

JukkaL added a commit that referenced this pull request

Jul 17, 2025
Specialize for concrete sequence types (`list` and `tuple`) for faster
iteration, since the traversal code is very hot.

I used trace logging (#19457) to identify functions where
`PyObject_GetIter` was called frequently.

This is a part of a set of micro-optimizations that improve self check
performance by ~5.5%.