bpo-38436: Improved performance for list addition. by brandtbucher · Pull Request #16705 · python/cpython

This PR adds a fast path for BINARY_ADD instructions involving two lists, where the left list has a refcount of exactly 1.

In this case, we instead do a PySequence_InPlaceConcat operation. This has the affect of avoiding quadratic complexity for list summations, by keeping only one intermediate result and extending it in-place.

For (potentially large) lists:

Currently executed as:

tmp = a + b
tmp = tmp + c
tmp = tmp + d
...

With this change:

tmp = a + b
tmp += c
tmp += d
...

Here are the pyperformance results (optimizations enabled):

EDIT: there are better measurements in the BPO discussion.

This is related to my earlier work in bpo-36229, however this is a much less invasive change that's limited to ceval.c, where our knowledge of context is much better.

https://bugs.python.org/issue38436