> Looking at the output, I think the tests are just going to be inherently flakey. It's not testing the specific scenario directly enough, and relying heavily on implicit synchronization.
My notes to debug race conditions:
https://pythondev.readthedocs.io/unstable_tests.html#debug-race-conditions
In general, you should run the same test in a loop in many processes in parallel *and* stress the machine with a random workload.
My favorite recipe:
* Terminal 1: python -m test -F -j20 <... options for the test ...>
* Terminal 2: python -m test -j0 -r -F
Sadly, there is no silver bullet for -j20: sometimes, the machine must be "more idle" to trigger the bug (ex: -j5), sometimes the machine must almost die, be more stressed (-j100).
Happy hacking! |