Add MultiprocessingAuthkeyPlugin to propagate authkey to Dask workers by moi90 · Pull Request #9124 · dask/distributed
Closes #9122
- Tests added / passed
- Passes `pre-commit run --all-files`
Unit Test Results
See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.
27 files ±0, 27 suites ±0 — 9h 46m 19s ⏱️ +3m 6s
4,113 tests +1: 4,006 ✅ −1, 104 💤 ±0, 3 ❌ +2
51,529 runs +13: 49,330 ✅ ±0, 2,184 💤 −1, 15 ❌ +14
For more details on these failures, see this check.
Results for commit e54b412. ± Comparison against base commit c9e7aca.
Given that this is a PR to distributed, we don't need to do this in a plugin: it could happen directly in the Cluster object (handing off the key) and the Worker object (receiving it). We could use the Dask config as the transport for the key, to avoid injecting it into the environment all the time.
For testing we could just ensure the key is successfully set, transmitted, and received.
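The hand-off described here could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the `distributed.multiprocessing-authkey` config key and both helper functions are hypothetical names, and a plain dict stands in for the Dask config.

```python
import binascii
import multiprocessing.process

# Hypothetical sketch: a plain dict stands in for dask.config.
config = {}

def hand_off_authkey(config):
    # Cluster side: export the parent's authkey (raw bytes) as hex text,
    # since config values are typically plain strings.
    key = bytes(multiprocessing.process.current_process().authkey)
    config["distributed.multiprocessing-authkey"] = binascii.hexlify(key).decode()

def receive_authkey(config):
    # Worker side: install the transmitted key as this process's authkey.
    hexkey = config.get("distributed.multiprocessing-authkey")
    if hexkey is not None:
        multiprocessing.process.current_process().authkey = binascii.unhexlify(hexkey)

hand_off_authkey(config)
receive_authkey(config)
```

Assigning to `current_process().authkey` is the documented way to set a process's key; `multiprocessing` wraps the bytes in its internal `AuthenticationString` type automatically.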
moi90 marked this pull request as draft
Could you give me a hint how to test this properly?
The following works instantly (without any change), because the cluster started by gen_cluster relies on the multiprocessing module, which already forwards the authkey to child processes.
```python
def _get_authkey():
    import multiprocessing.process

    return multiprocessing.process.current_process().authkey


@gen_cluster(client=True)
async def test_authkey(c, s, a, b):
    import multiprocessing.process

    worker_authkey = await c.submit(_get_authkey)
    assert worker_authkey == multiprocessing.process.current_process().authkey
```
Is there a factory for a cluster that is started using, say, subprocess instead?
I don't think there is a factory, but you could use subprocess to run the dask scheduler and dask worker commands separately.
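To see why a subprocess-based setup exercises the interesting case, one can check that an interpreter spawned via `subprocess` does not inherit the parent's authkey (unlike a `multiprocessing` child, which does). A small stdlib-only demonstration, independent of Dask:

```python
import subprocess
import sys
import multiprocessing.process

# A subprocess-spawned interpreter generates a fresh random authkey,
# unlike a multiprocessing child, which inherits the parent's key.
child_key = subprocess.check_output(
    [sys.executable, "-c",
     "import multiprocessing.process, binascii;"
     "print(binascii.hexlify(bytes("
     "multiprocessing.process.current_process().authkey)).decode())"]
).strip().decode()

parent_key = bytes(multiprocessing.process.current_process().authkey).hex()
print(child_key != parent_key)  # the keys differ without explicit propagation
```

This is the gap the PR addresses: workers launched as plain subprocesses (e.g. via the `dask worker` command) need the key transmitted explicitly.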
> you could use subprocess
That seems to work, thanks!
> We could use the Dask config as the transport for the key
Is this secure? Is the config visible or even serialized to disk? The Python developers make a big deal about the fact that the authkey must always remain inaccessible from outside...
> Is this secure?
When you add it to the config in a Python process, it would only be held in memory; we never write config back to disk, and it's only read once when dask.config is imported.
When workers are launched it would be transferred via environment variables, which is the same as the original proposal here.
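The environment-variable transfer can be sketched with the stdlib alone. This is an illustration under assumptions, not the PR's code: `DASK_DEMO_AUTHKEY` is a made-up variable name, and the key is hex-encoded because environment variables must be text.

```python
import os
import subprocess
import sys
import multiprocessing.process

# Parent side: put the authkey (hex-encoded) into the child's environment.
parent_key = bytes(multiprocessing.process.current_process().authkey)
env = dict(os.environ, DASK_DEMO_AUTHKEY=parent_key.hex())

# Child side: read the variable, install it as the process authkey,
# and echo the installed key back for verification.
child_key_hex = subprocess.check_output(
    [sys.executable, "-c",
     "import os, multiprocessing.process;"
     "multiprocessing.process.current_process().authkey ="
     " bytes.fromhex(os.environ['DASK_DEMO_AUTHKEY']);"
     "print(bytes(multiprocessing.process.current_process().authkey).hex())"],
    env=env,
).strip().decode()

print(child_key_hex == parent_key.hex())
```

The security caveat raised above still applies to this transport: on many systems another process owned by the same user can read a process's environment (e.g. via /proc), so the key is only as private as the user account.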