Issue 33196: multiprocessing: serialization must ensure that contexts are compatible (the same)

Created on 2018-04-01 05:58 by arcivanov, last changed 2022-04-11 14:58 by admin.

Files
File name	Uploaded	Description
test_lock_sigsegv.py	arcivanov, 2018-04-01 05:58
testing_on_fedora.png	augustogoulart, 2018-11-13 14:22
coredump	arcivanov, 2018-11-14 09:31	coredump (Fedora 29)

Messages (11)
msg314762	Author: Arcadiy Ivanov (arcivanov)	Date: 2018-04-01 05:58
While working on GH gevent/gevent#993 I encountered a stall trying to read from an mp.Queue passed to mp.Process's target as an argument. Trying to print out the lock state in the child process, I encountered a SEGV in Lock's __repr__. I originally thought it was due to gevent/greenlet stack magic, but it wasn't.

This happens when a `fork` context Queue (the default) is used with a `spawn` context Process (obvious stupidity on my part, but it still shouldn't crash).

Python 3.6.4 from PyEnv
Fedora 27

```
$ python test_lock_sigsegv.py
Parent r_q: <Lock(owner=None)>, <Lock(owner=None)>, <BoundedSemaphore(value=2147483647, maxvalue=2147483647)>
-11
```

```
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __new_sem_getvalue (sem=0x7fc877f54000, sval=sval@entry=0x7fffb130db9c) at sem_getvalue.c:38
38      *sval = atomic_load_relaxed (&isem->data) & SEM_VALUE_MASK;
...
#0  __new_sem_getvalue (sem=0x7fc877f54000, sval=sval@entry=0x7fffb130db9c) at sem_getvalue.c:38
#1  0x00007f1116aeb202 in semlock_getvalue (self=<optimized out>) at /tmp/python-build.20171219170845.6548/Python-3.6.4/Modules/_multiprocessing/semaphore.c:531
```

At a minimum I think there should be a check trying to reduce arguments via incompatible context's process to prevent a SEGV.

Test attached.
msg314792	Author: Antoine Pitrou (pitrou) *	Date: 2018-04-01 23:45
Thanks for the report. Indeed I think it would be worth preventing this programmer error.
msg329491	Author: Augusto Goulart (augustogoulart) *	Date: 2018-11-09 01:12
I couldn't reproduce the error on Debian 9 or OSX, although I tried tweaking the test script a little to force the error. Arcadiy, did you try reproducing the same issue on a different platform? Did someone report something similar in recent issues on gevent?
msg329719	Author: Tal Einat (taleinat) *	Date: 2018-11-12 06:49
On Win10 I've also failed to reproduce the reported issue with the supplied script. I tried with Python versions 3.6.3, 3.7.0, and a recent build of the master branch (to be 3.8). Can someone try to reproduce this on Fedora?
msg329845	Author: Augusto Goulart (augustogoulart) *	Date: 2018-11-13 14:22
I've tested on Fedora 29 server and also failed to reproduce the error.
msg329892	Author: Arcadiy Ivanov (arcivanov)	Date: 2018-11-14 09:20
@gus.goulart you have reproduced it. The screenshot showing `-11` means the process dumped core. Because it's the child that dumps core, it's masked by abrt. Observe:

$ python3 --version
Python 3.7.1
$ python3 ~/Downloads/test_lock_sigsegv.py
Parent r_q: <Lock(owner=None)>, <Lock(owner=None)>, <BoundedSemaphore(value=2147483647, maxvalue=2147483647)>
-11
$ abrt
61bdd28 1x /usr/bin/python3.7 2018-11-14 04:18:06
$ uname -a
Linux myhost 4.18.17-300.fc29.x86_64 #1 SMP Mon Nov 5 17:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
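
For readers unfamiliar with the `-11` above: `multiprocessing.Process.exitcode` is negative when the child was terminated by a signal, and its absolute value is the signal number. The following is a minimal sketch of decoding such a value. `describe_exitcode` is a hypothetical helper name introduced here for illustration; it is not part of the attached test script.

```python
import signal


def describe_exitcode(exitcode):
    """Turn a multiprocessing.Process.exitcode into a readable string.

    Hypothetical helper for illustration only.
    """
    if exitcode is None:
        # The process has not terminated yet (or was never started).
        return "still running"
    if exitcode < 0:
        # A negative value -N means the child was terminated by signal N.
        try:
            name = signal.Signals(-exitcode).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {-exitcode} ({name})"
    return f"exited with status {exitcode}"


print(describe_exitcode(-11))  # killed by signal 11 (SIGSEGV)
```

Printing this instead of the raw exit code would probably have made the crash easier to spot in the screenshot.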
msg329893	Author: Arcadiy Ivanov (arcivanov)	Date: 2018-11-14 09:23
@taleinat The above has been reproduced on Fedora 29.
msg329898	Author: STINNER Victor (vstinner) *	Date: 2018-11-14 10:35
> At a minimum I think there should be a check trying to reduce arguments via incompatible context's process to prevent a SEGV.

I'm not sure that I understand the bug. The reproducer script passes a multiprocessing.Queue to a child process, and then the child crashes when attempting to call multiprocessing.synchronize.Lock.__repr__().

Does the child reuse a copy of the lock of the parent process? Or does the child create a new SemLock?

I reproduced the bug on Fedora 26. I attached the child process in gdb. The crash occurs on sem_getvalue() in the child process.

Program received signal SIGSEGV, Segmentation fault.
0x00007f29a5156610 in sem_getvalue@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
(gdb) where
#0  0x00007f29a5156610 in sem_getvalue@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#1  0x00007f299c60e7bb in semlock_getvalue (self=0x7f299a95e2b0, _unused_ignored=0x0) at /home/haypo/prog/python/master/Modules/_multiprocessing/semaphore.c:541
#2  0x0000000000434537 in _PyMethodDef_RawFastCallKeywords (method=0x7f299c8102e0 <semlock_methods+192>, self=<_multiprocessing.SemLock at remote 0x7f299a95e2b0>, args=0x7f299c5f47e8, nargs=0, kwnames=0x0) at Objects/call.c:629
#3  0x0000000000607aff in _PyMethodDescr_FastCallKeywords (descrobj=<method_descriptor at remote 0x7f299ca42520>, args=0x7f299c5f47e0, nargs=1, kwnames=0x0) at Objects/descrobject.c:288
#4  0x0000000000512f92 in call_function (pp_stack=0x7ffd3591f730, oparg=1, kwnames=0x0) at Python/ceval.c:4595
(...)

(gdb) py-bt
Traceback (most recent call first):
  File "/home/haypo/prog/python/master/Lib/multiprocessing/synchronize.py", line 170, in __repr__
    elif self._semlock._get_value() == 1:
  File "/home/haypo/prog/python/master/test_lock_sigsegv.py", line 20, in child
    print("Child r_q: %r, %r, %r" % (r_q._rlock, r_q._wlock, r_q._sem), flush=True)
  File "/home/haypo/prog/python/master/Lib/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/haypo/prog/python/master/Lib/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/haypo/prog/python/master/Lib/multiprocessing/spawn.py", line 130, in _main
    return self._bootstrap()
  File "/home/haypo/prog/python/master/Lib/multiprocessing/spawn.py", line 629, in spawn_main
  File "<string>", line 1, in <module>
msg329908	Author: Augusto Goulart (augustogoulart) *	Date: 2018-11-14 13:53
@vstinner, on Debian 9 I can see the problem as well, but I wasn't able to debug it at the level of detail you did. Could you please share the process you followed? What I found was:

./python -X dev test_lock_sigsegv.py
Parent r_q: <Lock(owner=None)>, <Lock(owner=None)>, <BoundedSemaphore(value=2147483647, maxvalue=2147483647)>
Fatal Python error: Segmentation fault

Current thread 0x00007fab36124480 (most recent call first):
  File "/home/gus/Workspace/cpython/Lib/multiprocessing/synchronize.py", line 170 in __repr__
  File "/home/gus/Workspace/cpython/test_lock_sigsegv.py", line 17 in child
  File "/home/gus/Workspace/cpython/Lib/multiprocessing/process.py", line 99 in run
  File "/home/gus/Workspace/cpython/Lib/multiprocessing/process.py", line 297 in _bootstrap
  File "/home/gus/Workspace/cpython/Lib/multiprocessing/spawn.py", line 130 in _main
  File "/home/gus/Workspace/cpython/Lib/multiprocessing/spawn.py", line 117 in spawn_main
  File "<string>", line 1 in <module>
-11

Using GDB:

(gdb) set follow-fork-mode child
(gdb) run test_lock_sigsegv.py
Starting program: /home/gus/Workspace/cpython/python test_lock_sigsegv.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Parent r_q: <Lock(owner=None)>, <Lock(owner=None)>, <BoundedSemaphore(value=2147483647, maxvalue=2147483647)>
[New process 4941]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
process 4941 is executing new program: /home/gus/Workspace/cpython/python
-11
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Inferior 2 (process 4941) exited normally]
(gdb) where
No stack.
(gdb) py-bt
Unable to locate python frame
(gdb)
msg329909	Author: Arcadiy Ivanov (arcivanov)	Date: 2018-11-14 14:08
@vstinner

> I'm not sure that I understand the bug.

The bug is, if a user makes an error and passes a Queue from context 'fork' to a child that is spawned using 'spawn', the passed Queue is, for obvious reasons, broken.

The 'print("Child r_q: %r, %r, %r" % (r_q._rlock, r_q._wlock, r_q._sem), flush=True)' is simply a demonstration of a broken state of the SemLock observed in the child.

The expected fix would be to stop the mixed context use of MP objects on the API level (ValueError?) or at least prevent a segfault.
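
As a rough illustration of what such an API-level check could look like from user code, here is a minimal sketch. It is not how multiprocessing itself works or how a fix would necessarily be implemented: `tag_with_context`, `checked_process`, and the `_created_by` attribute are all hypothetical names made up for this example. The idea is to remember the start method of the context that created an object, and to refuse to hand that object to a Process built from a context with a different start method.

```python
import multiprocessing as mp


def tag_with_context(obj, ctx):
    """Record which start method created `obj` (hypothetical helper)."""
    # `_created_by` is an illustrative attribute, not a real multiprocessing API.
    obj._created_by = ctx.get_start_method()
    return obj


def checked_process(ctx, target, args=()):
    """Build ctx.Process, rejecting args tagged with a different start method."""
    method = ctx.get_start_method()
    for arg in args:
        owner = getattr(arg, "_created_by", None)
        if owner is not None and owner != method:
            raise ValueError(
                f"{type(arg).__name__} was created with the {owner!r} context "
                f"but the process uses the {method!r} context"
            )
    # The process is only constructed here; the caller decides when to start it.
    return ctx.Process(target=target, args=args)
```

Under these assumptions, with `ctx = mp.get_context("spawn")` one would create the queue as `tag_with_context(ctx.Queue(), ctx)` and the process as `checked_process(ctx, child, (q,))`, so both come from the same context. A queue tagged by a `fork` context would then raise ValueError in the parent instead of crashing the child.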
msg329913	Author: STINNER Victor (vstinner) *	Date: 2018-11-14 16:02
> The bug is, if a user makes an error and passes a Queue from context 'fork' to a child that is spawned using 'spawn', the passed Queue is, for obvious reasons, broken.

Ok. I rewrote the issue title.

History
Date	User	Action	Args
2022-04-11 14:58:59	admin	set	github: 77377
2018-11-14 17:28:52	davin	set	nosy: + davin
2018-11-14 16:02:18	vstinner	set	messages: + msg329913
2018-11-14 16:01:51	vstinner	set	title: SEGV in mp.synchronize.Lock.__repr__ in spawn'ed proc if ctx mismatched -> multiprocessing: serialization must ensure that contexts are compatible (the same)
2018-11-14 14:08:05	arcivanov	set	messages: + msg329909
2018-11-14 13:53:16	augustogoulart	set	messages: + msg329908
2018-11-14 10:35:40	vstinner	set	messages: + msg329898
2018-11-14 09:31:13	arcivanov	set	files: + coredump
2018-11-14 09:23:49	arcivanov	set	messages: + msg329893
2018-11-14 09:20:44	arcivanov	set	messages: + msg329892
2018-11-13 14:22:01	augustogoulart	set	files: + testing_on_fedora.png messages: + msg329845
2018-11-12 06:49:01	taleinat	set	messages: + msg329719
2018-11-09 01:13:18	augustogoulart	set	nosy: + taleinat
2018-11-09 01:12:39	augustogoulart	set	nosy: + vstinner, augustogoulart messages: + msg329491
2018-04-01 23:45:19	pitrou	set	versions: + Python 3.7, Python 3.8 nosy: + pitrou messages: + msg314792 stage: needs patch
2018-04-01 05:58:11	arcivanov	create