bpo-39678 RFC refactor queue manager thread by tomMoral · Pull Request #18551 · python/cpython

Conversation

@tomMoral

As discussed in #17670, the _queue_management_worker function has grown quite long and complicated.
This PR intends to turn it into an object with a bunch of short and readable helper methods.

I did a first pass, but I am not sure about the name change for _queue_management_thread and a bunch of other changes, in particular to the cancel behavior (it is only done once now). Also, I am not sure whether I should add a What's New entry. Let me know if I should change anything.

https://bugs.python.org/issue39678
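
For readers skimming the thread, here is a rough sketch of the shape of the refactor; the class name and method boundaries below are illustrative assumptions, not the actual diff. The idea is that the long module-level _queue_management_worker() function becomes a threading.Thread subclass whose run() loop delegates to short, focused helpers.

```python
import threading


class _QueueManagerThread(threading.Thread):  # hypothetical name, for illustration
    def __init__(self, executor):
        # Set this thread to be daemonized
        super().__init__()
        self.daemon = True
        self.executor = executor

    def run(self):
        while True:
            self.add_call_item_to_queue()
            result_item = self.wait_for_result()
            if result_item is not None:
                self.process_result_item(result_item)
            if self.is_shutting_down():
                self.join_executor_internals()
                return

    # Each helper replaces one chunk of the old monolithic loop;
    # the bodies are elided stubs here.
    def add_call_item_to_queue(self): ...
    def wait_for_result(self): ...
    def process_result_item(self, result_item): ...
    def is_shutting_down(self): ...
    def join_executor_internals(self): ...
```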

@tomMoral

The failure seems unrelated to this PR (in importlib).

pitrou

Thank you very much. This looks generally good, just some comments.

```python
n_children_alive = sum(p.is_alive() for p in processes.values())

# Set this thread to be daemonized
super().__init__()
self.daemon = True
```

I think it would be worthwhile exploring whether/how we can remove the daemon flag, though in another BPO issue.

From my understanding, the daemon flag is only necessary if we can't guarantee that the thread will be joined at the end of the executor's shutdown process. But I could see this being an issue when calling executor.shutdown(wait=False) on a ProcessPoolExecutor (which already has issues with graceful termination).
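
For readers unfamiliar with the case in question, a minimal usage example of the standard public API (nothing PR-specific): shutdown(wait=False) returns without joining the executor's threads, which is exactly the situation where a non-daemon management thread could otherwise keep the interpreter alive.

```python
from concurrent.futures import ProcessPoolExecutor
import time

if __name__ == "__main__":
    executor = ProcessPoolExecutor(max_workers=2)
    executor.submit(time.sleep, 1)
    # Returns immediately; the worker processes and the management
    # thread are still cleaning up in the background.
    executor.shutdown(wait=False)
```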

Actually, I am not sure this is needed anymore.
The shutdown(wait=False) behavior should be fixed by #17670, and the clean-up process is quite robust, I believe. We should maybe try simply removing the flag and experiment with bad situations to see whether it leads to the interpreter hanging at exit.

The problem is: what happens if the interpreter starts to shut down while an executor is still running? If the management thread is not a daemon, the interpreter will hang trying to join it.
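
A minimal repro of that hang, using plain threading (illustrative only):

```python
import threading
import time

t = threading.Thread(target=time.sleep, args=(60,))
t.daemon = False  # the default; with daemon=True the interpreter exits immediately
t.start()
# At interpreter exit, threading._shutdown() joins this non-daemon
# thread, so the process lingers for ~60 seconds.
```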

I think this is the current behavior. The _python_exit function will join the thread anyway, no?

Ah sorry, I missed these messages.
Indeed, the simplest solution would be to call the atexit _python_exit hook before joining non-daemonized threads.
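
For the record, this ordering is what later CPython versions (3.9+) implemented via the internal threading._register_atexit() hook, which runs callbacks from threading._shutdown() before non-daemon threads are joined. A sketch, with a placeholder body rather than the real implementation:

```python
import threading

def _python_exit():
    # Placeholder body: the real hook wakes the management threads,
    # signals shutdown, and joins them.
    pass

# Internal CPython API (3.9+): callbacks registered here run from
# threading._shutdown(), before non-daemon threads are joined.
threading._register_atexit(_python_exit)
```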

@pitrou

@aeros You may be interested in this, at least abstractly.

aeros

Thanks for working on this @tomMoral!

After recently working with the internals for the cancel_futures parameter of executor.shutdown(), I can certainly attest to the queue management function becoming a bit convoluted. Splitting up the monolithic function into a class with multiple methods should make it significantly easier to make changes and adjustments in the future.
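
For context, a quick illustration of that parameter (Executor.shutdown(cancel_futures=...) landed in Python 3.9): pending calls that have not started running yet are cancelled instead of being run.

```python
from concurrent.futures import ThreadPoolExecutor
import time

executor = ThreadPoolExecutor(max_workers=1)
futures = [executor.submit(time.sleep, 0.5) for _ in range(4)]
# Cancels the calls that have not started running yet, then waits
# for the rest to finish.
executor.shutdown(wait=True, cancel_futures=True)
print([f.cancelled() for f in futures])  # e.g. [False, True, True, True]
```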

@tomMoral

Thanks for the reviews! I addressed your comments, let me know if there are other changes to do.

pitrou

This looks good to me now, thank you. @aeros Do you want to take another look?

aeros

LGTM, the latest commits to the PR addressed all of my concerns.

I think this PR would be a good candidate for the test-with-buildbot label, just to ensure that we don't encounter any odd side effects on other platforms prior to merging (besides the CI checks). I suspect everything will pass, but if anything doesn't, it's much simpler to address it now rather than post-commit (which could require a PR to revert the changes and another to re-apply them later).

@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @aeros for commit b9d5c38 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@pitrou

Buildbot failures look unrelated, I'm merging.

sthagen added a commit to sthagen/python-cpython that referenced this pull request on Mar 1, 2020

@tomMoral

Thanks for the quick reviews and merge!

@tomMoral tomMoral deleted the RFC_queue_management_thread branch on March 1, 2020 at 21:49

@aeros

@pitrou In the post-commit run, it looks like the failures only occurred on the Windows 7 buildbot (buildbot/AMD64 Windows7 SP1 3.x), which we no longer support since Windows 7 recently reached EoL. That buildbot has also been consistently failing across PRs recently, so it's definitely not related to this PR.

(Last time I checked, the buildbot team was working on removing all of the Windows 7 builders from the fleet, the above is the last one remaining)

So, in other words, everything looks good! Thanks again for working on the ProcessPoolExecutor refactor @tomMoral.
