Make the MRO cache thread-safe in free-threaded builds

Feature or enhancement

The MRO (method resolution order) cache is used to cache lookups on the type hierarchy, such as method lookups. It's currently implemented per-interpreter, which will not be thread-safe in free-threaded builds.

	/* Internal API to look for a name through the MRO.
	This returns a borrowed reference, and doesn't set an exception! */
	PyObject *
	_PyType_Lookup(PyTypeObject type, PyObject name)
	{
	PyObject *res;
	int error;
	PyInterpreterState *interp = _PyInterpreterState_GET();

	unsigned int h = MCACHE_HASH_METHOD(type, name);
	struct type_cache *cache = get_type_cache();
	struct type_cache_entry *entry = &cache->hashtable[h];
	if (entry->version == type->tp_version_tag &&
	entry->name == name) {
	assert(_PyType_HasFeature(type, Py_TPFLAGS_VALID_VERSION_TAG));
	OBJECT_STAT_INC_COND(type_cache_hits, !is_dunder_name(name));
	OBJECT_STAT_INC_COND(type_cache_dunder_hits, is_dunder_name(name));
	return entry->value;
	}
	OBJECT_STAT_INC_COND(type_cache_misses, !is_dunder_name(name));
	OBJECT_STAT_INC_COND(type_cache_dunder_misses, is_dunder_name(name));

	/* We may end up clearing live exceptions below, so make sure it's ours. */
	assert(!PyErr_Occurred());

	res = find_name_in_mro(type, name, &error);
	/* Only put NULL results into cache if there was no error. */
	if (error) {
	/* It's not ideal to clear the error condition,
	but this function is documented as not setting
	an exception, and I don't want to change that.
	E.g., when PyType_Ready() can't proceed, it won't
	set the "ready" flag, so future attempts to ready
	the same type will call it again -- hopefully
	in a context that propagates the exception out.
	*/
	if (error == -1) {
	PyErr_Clear();
	}
	return NULL;
	}

	if (MCACHE_CACHEABLE_NAME(name) && assign_version_tag(interp, type)) {
	h = MCACHE_HASH_METHOD(type, name);
	struct type_cache_entry *entry = &cache->hashtable[h];
	entry->version = type->tp_version_tag;
	entry->value = res; /* borrowed */
	assert(_PyASCIIObject_CAST(name)->hash != -1);
	OBJECT_STAT_INC_COND(type_cache_collisions, entry->name != Py_None && entry->name != name);
	assert(_PyType_HasFeature(type, Py_TPFLAGS_VALID_VERSION_TAG));
	Py_SETREF(entry->name, Py_NewRef(name));
	}
	return res;
	}

The nogil-3.9 and nogil-3.12 forks used different approaches to make the MRO cache thread-safe. The nogil-3.9 fork moved the type cache to the PyThreadState. The nogil-3.12 fork implemented a thread-safe cache shared across threads ¹. The nogil-3.9 approach is much simpler, I think we should start with that.

Suggested approach:

If Py_GIL_DISABLED is defined, move the type_cache to the private struct PyThreadStateImpl.
Refactor _PyType_Lookup() as necessary to use the correct cache

Linked PRs

For reference, here is the nogil-3.12 implementation: https://github.com/colesbury/nogil-3.12/commit/9c1f7ba1b4 ↩