Issue7205
Created on 2009-10-25 21:55 by rbcollins, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| bz2fail.py | rbcollins, 2009-10-25 21:55 | demo of deadlock | ||
| bzthreads.patch | pitrou, 2009-10-26 19:23 | |||
| Messages (12) | |||
|---|---|---|---|
| msg94462 - (view) | Author: Robert Collins (rbcollins) * ![]() |
Date: 2009-10-25 21:55 | |
There is a systemic bug in BZ2File where the GIL is released to perform compression work, and any other thread calling into BZ2File will deadlock. We noticed in the write method, but inspection of the code makes it clear that its systemic. The problem is pretty simple. Say you have two threads and one bz2file object. One calls write(), the other calls (say) seek(), but it could be write() or other methods too. Now, its pretty clear that the question 'should these two threads get serialised' could be contentious. So lets put that aside by saying 'raising an exception or serialising in arbitrary order would be ok'. What happens today is: t1:bz2file.write bz2file.lock.acquire gil-release bz2compression starts t2:gil-acquired bz2file.seek bz2file.lock.acquire(wait=1) <- this thread is stuck now, and has the GIL t1:bz2compression finishes gil.acquire <- this thread is stuck now, waiting for the GIL If any owner of the bz2file object lock will release the GIL, *all* routines that attempt to lock the bz2file object have to release the GIL if they can't get the lock - blocking won't work. I'm not familiar enough with the python threading API to know whether its safe to call without the GIL. If its not then clearly it can't be used to work with getting the GIL, and lower layer locks should be used. |
|||
| msg94463 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2009-10-25 22:00 | |
Thanks, nice catch. |
|||
| msg94464 - (view) | Author: Robert Collins (rbcollins) * ![]() |
Date: 2009-10-25 22:15 | |
On Sun, 2009-10-25 at 22:00 +0000, Antoine Pitrou wrote: > Antoine Pitrou <pitrou@free.fr> added the comment: > > Thanks, nice catch. Yeah :). > versions: +Python 2.6, Python 2.7, Python 3.1, Python 3.2 Python 2.5 is also affected - its what we're running on the server that broke :) -Rob |
|||
| msg94465 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2009-10-25 22:27 | |
> Python 2.5 is also affected - its what we're running on the server that > broke :) Yes, but it doesn't receive any bug fixes anymore -- only security fixes. |
|||
| msg94466 - (view) | Author: Robert Collins (rbcollins) * ![]() |
Date: 2009-10-25 22:34 | |
On Sun, 2009-10-25 at 22:27 +0000, Antoine Pitrou wrote: > Antoine Pitrou <pitrou@free.fr> added the comment: > > > Python 2.5 is also affected - its what we're running on the server that > > broke :) > > Yes, but it doesn't receive any bug fixes anymore -- only security > fixes. Ok, we'll work around the issue there then ;) -Rob |
|||
| msg94500 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2009-10-26 19:23 | |
Here is a patch. |
|||
| msg94509 - (view) | Author: Robert Collins (rbcollins) * ![]() |
Date: 2009-10-26 21:15 | |
On Mon, 2009-10-26 at 19:23 +0000, Antoine Pitrou wrote: > Antoine Pitrou <pitrou@free.fr> added the comment: > > Here is a patch. Looks fine to me assuming that the locking functions can be used outside the GIL. On the test side, the case I supplied was low noise for me - I'd hesitate to do as much compression as you are (50 times more) unless you saw it spuriously pass a lot - the nature of the deadlock isn't dependent on races so much as simple scheduling - as long as the seocnd thread is scheduled before the first thread completes compressing the deadlock will occur. -Rob |
|||
| msg94512 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2009-10-26 21:27 | |
> Looks fine to me assuming that the locking functions can be used outside > the GIL. Yes, they can. Actually, even the GIL uses them. :-) > On the test side, the case I supplied was low noise for me - I'd > hesitate to do as much compression as you are (50 times more) unless you > saw it spuriously pass a lot - the nature of the deadlock isn't > dependent on races so much as simple scheduling - as long as the seocnd > thread is scheduled before the first thread completes compressing the > deadlock will occur. Well, your test case often succeeded here, so I decided on a more aggressive variation. |
|||
| msg94515 - (view) | Author: Robert Collins (rbcollins) * ![]() |
Date: 2009-10-26 21:33 | |
On Mon, 2009-10-26 at 21:27 +0000, Antoine Pitrou wrote: > Antoine Pitrou <pitrou@free.fr> added the comment: > Well, your test case often succeeded here, so I decided on a more > aggressive variation. fair enough, if its needed - its needed :) -Rob |
|||
| msg94567 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2009-10-27 17:50 | |
Fixed in r75818 (trunk), r75819 (2.6), r75820 (py3k), r75821 (3.1). Thanks! |
|||
| msg96472 - (view) | Author: James G. sack (jim) (jgsack) | Date: 2009-12-16 02:55 | |
On my Fedora 11 AMD x86_64 system, it appears the deadlock still occurs (up to the limit of my patience, ie: several minutes). If I reduce to say 3, I can get the test to succeed sometimes. Observed in: trunk -r 76850 (latest) -r 75818 (first occurrence of testThreading in test-bz2.py) but not -r 75817 ~jim |
|||
| msg96478 - (view) | Author: James G. sack (jim) (jgsack) | Date: 2009-12-16 07:06 | |
IMPORTANT Correction: Please disregard msg96472. I was forgetting to do .configure and make, and evidently getting bogus failures. test_bz2 works fine now, ..sorry for the false alarm. ~jim |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:54 | admin | set | github: 51454 |
| 2009-12-16 07:06:50 | jgsack | set | messages: + msg96478 |
| 2009-12-16 02:55:01 | jgsack | set | nosy:
+ jgsack messages: + msg96472 |
| 2009-10-27 17:50:29 | pitrou | set | status: open -> closed resolution: fixed messages: + msg94567 |
| 2009-10-26 21:33:17 | rbcollins | set | messages: + msg94515 |
| 2009-10-26 21:27:30 | pitrou | set | messages: + msg94512 |
| 2009-10-26 21:15:13 | rbcollins | set | messages: + msg94509 |
| 2009-10-26 19:23:08 | pitrou | set | files:
+ bzthreads.patch keywords: + patch messages: + msg94500 |
| 2009-10-25 22:34:16 | rbcollins | set | messages: + msg94466 |
| 2009-10-25 22:27:09 | pitrou | set | messages: + msg94465 |
| 2009-10-25 22:15:03 | rbcollins | set | messages: + msg94464 |
| 2009-10-25 22:00:16 | pitrou | set | priority: high versions: + Python 2.6, Python 3.1, Python 2.7, Python 3.2 nosy: + pitrou messages: + msg94463 stage: needs patch |
| 2009-10-25 21:55:53 | rbcollins | create | |
