Fix race condition between test and supervisor restart by exarkun · Pull Request #530

Fix race condition between test and supervisor restart by exarkun · Pull Request #530 · gridsync/gridsync

until polls with fairly low resolution. If the supervisor managed to
receive exit notification and launch a new process between two consecutive
polls for is_running() == False the test would miss the stopped condition
entirely and never complete (or rather, time out after 10 seconds).

Now just wait for the supervisor to report the pid has changed and assert the
process ends in the desire state (running).

`until` polls with fairly low resolution.  If the supervisor managed to
receive exit notification and launch a new process between two consecutive
polls for `is_running() == False` the test would miss the stopped condition
entirely and never complete (or rather, time out after 10 seconds).

Now just wait for the supervisor to report the pid has changed and assert the
process ends in the desire state (running).

If the supervisor managed to receive exit notification and launch a new process between two consecutive
polls for is_running() == False the test would miss the stopped condition entirely and never complete (or rather, time out after 10 seconds).

The default restart_delay value of 1 is intended to guard against this (i.e., by enforcing a wait of restart_delay amount of seconds before re-launching), however, since that's merely a default -- and since there's no mechanism/check in place to ensure that the value of restart_delay is greater than the until polling resolution -- I'm fine with this change.

Now just wait for the supervisor to report the pid has changed and assert the process ends in the desire state (running).

I recall reading somewhere that, on Windows, PIDs can be recycled immediately (in which case this approach would not allow us to reliably determine whether the process has been re-started) however a) I can't seem to find an authoritative source for that claim now and b) having this test in place might help to us to demonstrate whether it's true -- so thanks!

exarkun deleted the more-reliable-supervisor-kill-test branch

August 18, 2022 15:26

I recall reading somewhere that, on Windows, PIDs can be recycled immediately (in which case this approach would not allow us to reliably determine whether the process has been re-started) however a) I can't seem to find an authoritative source for that claim now and b) having this test in place might help to us to demonstrate whether it's true -- so thanks!

Ah, this is a good point. PID re-use is certainly a thing. I don't know how likely it is that a PID gets re-used in this particular scenario but maybe it's not impossible.

Sometimes folks use the combination of (pid, start time) as a more robust unique identifier. We could consider doing that kind of thing around here (maybe exposing this functionality behind another Supervisor API), especially if this new version of the test also proves to be unreliable.