gh-112075: refactor dictionary lookup functions for better re-usability by DinoV · Pull Request #114629 · python/cpython

Refactor dictionary lookup functions for better usability

Free-threaded builds of Python will need thread-safe versions of the lookup functions that can run without the dictionary being locked. These lookups will differ only subtly: they'll use atomic loads, and they'll need some extra checks for values that may change in flight.

Currently there are 3 lookup functions, so for free-threaded builds we'll need 6 of them. That's going to be a good amount of copying and pasting, sometimes with subtle differences, so I'm curious whether people would prefer an approach like this one instead. I don't want to use macros, as that would destroy debugging. If we're concerned about perf implications for non-mainstream compilers, or don't like this approach, I can just go ahead with copying and pasting.

This factors these functions into one core function which contains the loop, and 3 comparison helper functions which implement the different comparisons. In free-threaded versions we'll get modified versions of these comparisons as that's where the differences will be.
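To make the shape of the refactor concrete, here is a minimal standalone sketch of the pattern, not the actual `dictobject.c` code. It uses a toy linear-probing table, and every name in it (`table_t`, `do_lookup`, `compare_identity`, `compare_content`, and so on) is hypothetical. The point is only that one inlinable core loop takes the comparison as a parameter, and each public lookup is a thin wrapper that passes a fixed helper, which gives the compiler the chance to specialize the loop per wrapper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 8      /* power of two, so masking wraps the index */
#define NOT_FOUND  (-1)

/* Toy entry and table types; hypothetical, not CPython's layout. */
typedef struct {
    const char *key;      /* NULL marks an empty slot */
    size_t hash;
    int value;
} entry_t;

typedef struct {
    entry_t slots[TABLE_SIZE];
} table_t;

/* Comparison helper signature: returns nonzero on a match. */
typedef int (*compare_fn)(const entry_t *ep, const char *key, size_t hash);

/* Helper 1: pointer identity only. */
static inline int
compare_identity(const entry_t *ep, const char *key, size_t hash)
{
    (void)hash;
    return ep->key == key;
}

/* Helper 2: identity first, then hash and string contents. */
static inline int
compare_content(const entry_t *ep, const char *key, size_t hash)
{
    return ep->key == key
        || (ep->hash == hash && strcmp(ep->key, key) == 0);
}

/* The single core function that contains the probe loop. */
static inline int
do_lookup(const table_t *t, const char *key, size_t hash, compare_fn cmp)
{
    size_t mask = TABLE_SIZE - 1;
    size_t i = hash & mask;
    for (size_t n = 0; n < TABLE_SIZE; n++) {
        const entry_t *ep = &t->slots[i];
        if (ep->key == NULL) {
            return NOT_FOUND;          /* empty slot ends the probe chain */
        }
        if (cmp(ep, key, hash)) {
            return (int)i;
        }
        i = (i + 1) & mask;            /* linear probing */
    }
    return NOT_FOUND;
}

/* Thin wrappers: each one fixes the comparison helper. */
static int
lookup_identity(const table_t *t, const char *key, size_t hash)
{
    return do_lookup(t, key, hash, compare_identity);
}

static int
lookup_content(const table_t *t, const char *key, size_t hash)
{
    return do_lookup(t, key, hash, compare_content);
}

/* Simple multiplicative string hash, good enough for the sketch. */
static size_t
hash_str(const char *s)
{
    size_t h = 5381;
    while (*s) {
        h = h * 33 + (unsigned char)*s++;
    }
    return h;
}

/* Insert into the first free slot; assumes keys are distinct. */
static void
table_insert(table_t *t, const char *key, int value)
{
    size_t hash = hash_str(key);
    size_t mask = TABLE_SIZE - 1;
    size_t i = hash & mask;
    for (size_t n = 0; n < TABLE_SIZE; n++) {
        if (t->slots[i].key == NULL) {
            t->slots[i] = (entry_t){ key, hash, value };
            return;
        }
        i = (i + 1) & mask;
    }
    assert(0 && "table full");
}
```

A thread-safe variant would, under this pattern, add another helper (and wrapper) rather than another copy of the loop. Whether a given compiler actually inlines the helper through the function pointer is something to check in the generated code rather than assume.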

This still generates nearly identical code in LTO builds. Here's the before: https://pastebin.com/UZdhKc6f and the after: https://pastebin.com/sdhcFZP5. Even in non-LTO builds the code is remarkably similar (the new code perhaps resembles the LTO code a little more, since for some reason the old non-LTO code gets a weird jmp at the beginning).

There is technically one behavioral change here: the loop unrolling is now applied to all versions of the code. We could easily have 2 versions of the loop function instead, one unrolled and one not.

Perf seems to be mostly neutral to me: https://pastebin.com/LPJYZAj4