Dictionary modifications to enable optimizations.
There are a couple of modifications to the dict that we can make to enable optimizations.
These changes aren't designed to speedup dictionary operations (although they shouldn't slow them down) but to allow a larger range of speculative optimizations.
Exploit the immutability of shared dict keys.
Shared keys can only be added to, never removed from. By setting the dk_usable field to zero, no more keys can be added and the dict-keys becomes immutable.
Here are a couple of examples uses:
-
Loading from builtins. If
globals->keys == EXPECTED_KEYSandkey not in EXPECTED_KEYSandEXPECTED_KEYSis immutable, then we can read frombuiltinswithout have to checkglobals. If the values inglobalschange, but the keys don't (as would be the case for global variables) then we still get fast access to builtins. -
Loading a method. Functions are non-overriding descriptors, so to cache a function in
LOAD_METHODwe just need to check thatobj->dict->keys == EXPECTED_KEYSprovided thatkey not in EXPECTED_KEYSandEXPECTED_KEYSis immutable.
We can achieve the same general effect by versioning the keys, dict_key->version == EXPECTED_VERSION, which would require an extra memory access to check, but wouldn't leak keys as we wouldn't need to hold a reference to EXPECTED_KEYS.
More variants of the lookup function.
The dictionary implements its internal specialization (split-keys, str-only, general) via a function pointer to the relevant lookup function.
By replacing this function pointer with a pair of enums, one for loads and one for stores, would allow additional specializations to insert out-of-line guards, and other special cases.
For example, one way to optimize builtin loads, is to replace them with constant loads, and discard the optimized code should the builtin dict change. We could implement this by specializing the store function of the builtin dict to invalidate all optimized (quickened) code should any value change.
This would leave (on 64bit machines) 6 bytes spare for a keys version, which would be plenty since we only need a version when we want to optimize, not for every change.
Remove the version number from dictionaries
The above optimizations mean that we have no use for the dict version, so we can remove it and save 8 bytes per dict.
LOAD_GLOBAL_BUILTIN example
By making the global's key immutable and adding the ability to de-optimize to the builtin dictionary, LOAD_GLOBAL can be specialized to the highly efficient LOAD_GLOBAL_BUILTIN:
if (UNLIKELY(globals->keys != EXPECTED_KEYS)) { goto maybe_deopt_load_global_builtin; // Handles counter, etc. } saturating_increment(counter); INCREF(cached_value); PUSH(cached_value);
Not only is this about as fast as we can get without JIT compilation, it also works when there is a variable in the global dictionary.