gh-112075: Make some `dict` operations thread-safe without GIL by chgnrdv · Pull Request #112247 · python/cpython

Closed

chgnrdv wants to merge 3 commits intopython:mainfrom

chgnrdv:make-dict-ops-thread-safe-in-nogil

Conversation

@chgnrdv

#112075

For dict.__len__, _Py_atomic_load_ssize_relaxed is used to access PyDictObject ma_used field.

For the following methods

  • dict.fromkeys
  • dict.copy
  • dict_richcompare
  • dict.clear
  • dict.__sizeof__
  • dict.__or__
  • dict.__ior__
  • dict.__reversed__
  • dict.keys
  • dict.items
  • dict.values
  • dict.update (in dict_update_common)
  • dict.__init__ (in dict_update_common)
  • dict.__repr__

the critical section API is used, either in form of AC directive or macro.

For `dict.__len__`, use `_Py_atomic_load_ssize_relaxed` to access `PyDictObject` `ma_used` field.

For the following methods
* `dict.fromkeys`
* `dict.copy`
* `dict_richcompare`
* `dict.clear`
* `dict.__sizeof__`
* `dict.__or__`
* `dict.__ior__`
* `dict.__reversed__`
* `dict.keys`
* `dict.items`
* `dict.values`

use critical section API, either in form of AC directive or macro.

colesbury

#include "pycore_call.h" // _PyObject_CallNoArgs()
#include "pycore_ceval.h" // _PyEval_GetBuiltin()
#include "pycore_code.h" // stats
#include "pycore_critical_section.h"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment like: #include "pycore_critical_section.h" // Py_BEGIN_CRITICAL_SECTION()

}

/*[clinic input]
@critical_section

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dict_fromkeys_impl should not have a critical_section. This locks the type object, which isn't necessary.

We may need critical sections for some code paths of _PyDict_FromKeys, but that will need to be within _PyDict_FromKeys.



/*[clinic input]
@critical_section

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want a critical section here. Creating the reverse iterator is thread-safe without any special handling, although iterating over the reverse iterator may require special handling.

return _PyDictView_New(dict, &PyDictKeys_Type);
PyObject *view;
Py_BEGIN_CRITICAL_SECTION(dict);
view = _PyDictView_New(dict, &PyDictKeys_Type);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No critical section here. Creating the view objects, like _PyDictView_New (and the reverse iterator above) do not access any of the dictionary internals so they don't need to lock the dictionary.

return _PyDictView_New(dict, &PyDictItems_Type);
PyObject *view;
Py_BEGIN_CRITICAL_SECTION(dict);
view = _PyDictView_New(dict, &PyDictItems_Type);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No critical section here.

return _PyDictView_New(dict, &PyDictValues_Type);
PyObject *view;
Py_BEGIN_CRITICAL_SECTION(dict);
view = _PyDictView_New(dict, &PyDictValues_Type);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No critical section here

static PyObject *
dict_clear(PyDictObject *mp, PyObject *Py_UNUSED(ignored))
{
Py_BEGIN_CRITICAL_SECTION(mp);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need public APIs like PyDict_Clear() to be thread-safe. The critical sections need to be pushed down into that API.

I suggest doing something like in colesbury/nogil-3.12@d896dfc8db:

  • Rename the current implementation of PyDict_Clear() and make it static (e.g. static void _dict_clear(...))
  • Make PyDict_Clear() a simple wrapper around _dict_clear() that adds critical section calls.
  • Just call PyDict_Clear() from this function dict_clear(...).
Py_ReprLeave((PyObject *)mp);
return PyUnicode_FromString("{}");
repr = PyUnicode_FromString("{}");
goto end;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thin this may incorrectly call _PyUnicodeWriter_Dealloc() on an uninitialized writer below if repr is NULL (i.e., if PyUnicode_FromString returns an error).

For these functions with complex control flow, I think it's better to add a wrapper function than try to work Py_BEGIN_CRITICAL_SECTION() calls into the existing control flow. In other words:

  • Rename dict_repr to e.g. dict_repr_impl.
  • Add a new dict_repr that just calls dict_repr_impl under a Py_BEGIN_CRITICAL_SECTION() calls.

@erlend-aasland

@colesbury

I think dict thread-safety has been addressed by Dino's PRs and this PR can be closed now. Thanks for your work @chgnrdv.

Labels