bpo-35780: Add link guards to the lru_cache() C code by rhettinger · Pull Request #11733

bpo-35780: Add link guards to the lru_cache() C code by rhettinger · Pull Request #11733 · python/cpython

Here's a scenario that I'm worried about but haven't proved that it can occur.

When a link is created, the lru_cache gets one reference but doesn't store it anywhere.
When the link is added to the linked list, it is also added to the cache dict, giving a second reference.
When a link is about to be evicted, it is first removed from the linked list.
If a dict access triggers a reentrant or concurrent call into the lru_cache, that new code path can use the dict to find the link (already removed from the linked list but still in the dict) and try to move it to the front of the cache by updating the link fields. That puts it back into the linked list.
The original code path resumes, believing that the link is removed from the linked list, and deletes the link from the dict, killing that last reference to the link.
Now, the linked list has a list that is an orphan (no cache dict entry points at it) and that has a refcount of zero (deleted but still in memory, ready for reuse), and that will fail when it comes time to evict it. That will either segfault, overwrite something in use, drive the refcount negative, or silently corrupt the output.

Possible solutions (if in fact there is a problem):

Don't use borrowed references in the links. Use actual references to establish the invariant: all links in the linked list have a ref count greater than zero, even if the links are orphans.
Adopt the rotating root technique used in the pure Python code so that links don't get removed and readded with intervening dictionary calls. For a cache miss, the pure Python code leaves the links in place and only updates the key/result fields in the new link and invalidates those fields in the old link. In other words, it never leaves the links in an inconsistent state.