bpo-38436: Improved performance for list addition. by brandtbucher · Pull Request #16705 · python/cpython
This PR adds a fast path for BINARY_ADD instructions involving two lists, where the left list has a refcount of exactly 1.
In this case, we instead do a PySequence_InPlaceConcat operation. This has the affect of avoiding quadratic complexity for list summations, by keeping only one intermediate result and extending it in-place.
For (potentially large) lists:
Currently executed as:
tmp = a + b tmp = tmp + c tmp = tmp + d ...
With this change:
tmp = a + b tmp += c tmp += d ...
Here are the pyperformance results (optimizations enabled):
EDIT: there are better measurements in the BPO discussion.
This is related to my earlier work in bpo-36229, however this is a much less invasive change that's limited to ceval.c, where our knowledge of context is much better.