Fix race condition between test and supervisor restart by exarkun · Pull Request #530 · gridsync/gridsync
until polls with fairly low resolution. If the supervisor managed to
receive exit notification and launch a new process between two consecutive
polls for is_running() == False the test would miss the stopped condition
entirely and never complete (or rather, time out after 10 seconds).
Now just wait for the supervisor to report the pid has changed and assert the
process ends in the desire state (running).
`until` polls with fairly low resolution. If the supervisor managed to receive exit notification and launch a new process between two consecutive polls for `is_running() == False` the test would miss the stopped condition entirely and never complete (or rather, time out after 10 seconds). Now just wait for the supervisor to report the pid has changed and assert the process ends in the desire state (running).
If the supervisor managed to receive exit notification and launch a new process between two consecutive
polls foris_running() == Falsethe test would miss the stopped condition entirely and never complete (or rather, time out after 10 seconds).
The default restart_delay value of 1 is intended to guard against this (i.e., by enforcing a wait of restart_delay amount of seconds before re-launching), however, since that's merely a default -- and since there's no mechanism/check in place to ensure that the value of restart_delay is greater than the until polling resolution -- I'm fine with this change.
Now just wait for the supervisor to report the pid has changed and assert the process ends in the desire state (running).
I recall reading somewhere that, on Windows, PIDs can be recycled immediately (in which case this approach would not allow us to reliably determine whether the process has been re-started) however a) I can't seem to find an authoritative source for that claim now and b) having this test in place might help to us to demonstrate whether it's true -- so thanks!
exarkun
deleted the
more-reliable-supervisor-kill-test
branch
I recall reading somewhere that, on Windows, PIDs can be recycled immediately (in which case this approach would not allow us to reliably determine whether the process has been re-started) however a) I can't seem to find an authoritative source for that claim now and b) having this test in place might help to us to demonstrate whether it's true -- so thanks!
Ah, this is a good point. PID re-use is certainly a thing. I don't know how likely it is that a PID gets re-used in this particular scenario but maybe it's not impossible.
Sometimes folks use the combination of (pid, start time) as a more robust unique identifier. We could consider doing that kind of thing around here (maybe exposing this functionality behind another Supervisor API), especially if this new version of the test also proves to be unreliable.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters