Dictionary modifications to enable optimizations.

There are a couple of modifications to the dict that we can make to enable optimizations.
These changes aren't designed to speedup dictionary operations (although they shouldn't slow them down) but to allow a larger range of speculative optimizations.

Exploit the immutability of shared dict keys.

Shared keys can only be added to, never removed from. By setting the dk_usable field to zero, no more keys can be added and the dict-keys becomes immutable.

Here are a couple of examples uses:

  1. Loading from builtins. If globals->keys == EXPECTED_KEYS and key not in EXPECTED_KEYS and EXPECTED_KEYS is immutable, then we can read from builtins without have to check globals. If the values in globals change, but the keys don't (as would be the case for global variables) then we still get fast access to builtins.

  2. Loading a method. Functions are non-overriding descriptors, so to cache a function in LOAD_METHOD we just need to check that obj->dict->keys == EXPECTED_KEYS provided that key not in EXPECTED_KEYS and EXPECTED_KEYS is immutable.

We can achieve the same general effect by versioning the keys, dict_key->version == EXPECTED_VERSION, which would require an extra memory access to check, but wouldn't leak keys as we wouldn't need to hold a reference to EXPECTED_KEYS.

More variants of the lookup function.

The dictionary implements its internal specialization (split-keys, str-only, general) via a function pointer to the relevant lookup function.

By replacing this function pointer with a pair of enums, one for loads and one for stores, would allow additional specializations to insert out-of-line guards, and other special cases.

For example, one way to optimize builtin loads, is to replace them with constant loads, and discard the optimized code should the builtin dict change. We could implement this by specializing the store function of the builtin dict to invalidate all optimized (quickened) code should any value change.

This would leave (on 64bit machines) 6 bytes spare for a keys version, which would be plenty since we only need a version when we want to optimize, not for every change.

Remove the version number from dictionaries

The above optimizations mean that we have no use for the dict version, so we can remove it and save 8 bytes per dict.

LOAD_GLOBAL_BUILTIN example

By making the global's key immutable and adding the ability to de-optimize to the builtin dictionary, LOAD_GLOBAL can be specialized to the highly efficient LOAD_GLOBAL_BUILTIN:

    if (UNLIKELY(globals->keys != EXPECTED_KEYS)) {
         goto maybe_deopt_load_global_builtin; // Handles counter, etc.
    }
    saturating_increment(counter);
    INCREF(cached_value);
    PUSH(cached_value);

Not only is this about as fast as we can get without JIT compilation, it also works when there is a variable in the global dictionary.