Issue26692
Created on 2016-04-05 00:46 by Satrajit Ghosh, last changed 2022-04-11 14:58 by admin.
| Messages (4) | | | |
|---|---|---|---|
| msg262881 - (view) | Author: Satrajit Ghosh (Satrajit Ghosh) | Date: 2016-04-05 00:46 | |
multiprocessing's cpu_count() returns the number of CPUs on the system as reported by /proc/cpuinfo. This is true even on machines where Linux kernel cgroups are being used to restrict CPU usage for a given process, which results in significant thread switching on systems with many cores. Some ideas for handling cgroups have been implemented in the following repos: https://github.com/peo3/cgroup-utils http://cpachecker.googlecode.com/svn-history/r12889/trunk/scripts/benchmark/runexecutor.py It would be nice if multiprocessing were a little more intelligent and queried the process's own characteristics. |
|||
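The cgroup CPU quota the reporter mentions can be read directly from the controller files. The sketch below is only a best-effort illustration, not part of multiprocessing: the paths are the standard cgroup v2 and v1 interfaces, and it assumes the process's cgroup is mounted at /sys/fs/cgroup (as it typically is inside a container); whether a quota is actually set depends on the system.

```python
import os

def cgroup_cpu_limit():
    """Best-effort read of the CPU quota for this process's cgroup.

    Returns a whole number of CPUs, or None if no quota applies or the
    cgroup files are not readable.
    """
    # cgroup v2: a single "cpu.max" file containing "<quota> <period>".
    try:
        with open("/sys/fs/cgroup/cpu.max") as f:
            quota, period = f.read().split()
        if quota != "max":
            return max(1, int(quota) // int(period))
    except OSError:
        pass
    # cgroup v1: separate quota/period files under the cpu controller.
    try:
        with open("/sys/fs/cgroup/cpu/cpu.cfs_quota_us") as f:
            quota = int(f.read())
        with open("/sys/fs/cgroup/cpu/cpu.cfs_period_us") as f:
            period = int(f.read())
        if quota > 0:
            return max(1, quota // period)
    except OSError:
        pass
    return None

limit = cgroup_cpu_limit()
workers = limit if limit is not None else os.cpu_count()
```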
| msg298893 - (view) | Author: Charles-François Natali (neologix) * | Date: 2017-07-23 07:56 | |
I'm not convinced. The reason is that using the number of CPU cores is just a heuristic for a *default value*: the API allows the user to specify the number of workers to use, so it's not really a limitation. The problem is that if you try to think about a more "correct" default value, it gets complicated: here, it's about cgroups, but for example: - What if there are multiple processes running on the same box? - What if the process is subject to CPU affinity? Currently, the CPU affinity mask is ignored. - What if the code being executed by the children is itself multi-threaded (maybe because it's using a numerical library built on BLAS, etc.)? - What about hyper-threading? If the code has a lot of cache misses, it would probably be a good idea to use one worker per logical thread, but if it's cache-friendly, probably not. - Etc. In other words, I think there is simply no reasonable default value for the number of workers to use, that any value will make some class of users/use-cases unhappy, and that it would add a lot of unnecessary complexity. Since the user can always specify the number of workers - if you find a place where it's not possible, then please report it - I really think we should leave the choice/burden up to the user. |
|||
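One point in the message above, that the CPU affinity mask is ignored, is easy to observe on Linux, and the suggested workaround of passing an explicit worker count is equally simple. A small illustrative snippet (os.sched_getaffinity is Linux-only):

```python
import os
import multiprocessing

if __name__ == "__main__":
    # os.cpu_count() reports every CPU in the machine, while the affinity
    # mask reflects the CPUs this process may actually run on.
    total = os.cpu_count()
    usable = len(os.sched_getaffinity(0))
    print(f"affinity allows {usable} of {total} CPUs")

    # The pool size can always be chosen explicitly instead of relying on
    # the default, which is what the message above recommends.
    with multiprocessing.Pool(processes=usable) as pool:
        print(pool.map(abs, range(-4, 4)))
```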
| msg298901 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2017-07-23 12:22 | |
Agreed that it is not possible for multiprocessing to choose an optimal default in all settings. However, making the default adequate for more use cases sounds like a reasonable goal. Currently, we are using `os.cpu_count()`. Ideally, we would have a second API `os.usable_cpu_count()` that would return the number of logical CPUs usable by the current process (taking into account affinity settings, cgroups, etc.). |
|||
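A usable_cpu_count() along these lines could combine the existing pieces. The sketch below is only an illustration of the proposal, not an existing os API; the function name and its composition are assumptions.

```python
import os

def usable_cpu_count():
    """Illustrative sketch of the proposed API: the number of logical CPUs
    the current process can actually use, rather than the number installed.
    """
    counts = [os.cpu_count() or 1]
    # Respect the CPU affinity mask where the platform exposes it (Linux).
    if hasattr(os, "sched_getaffinity"):
        counts.append(len(os.sched_getaffinity(0)))
    # A cgroup quota (see the sketch further up) could be folded in the
    # same way before taking the minimum.
    return min(counts)
```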
| msg310113 - (view) | Author: David Chin (hairygristle) | Date: 2018-01-16 20:06 | |
I would like to state strong support for os.get_usable_cpu_count(). I administer a typical HPC cluster, which may have multiple jobs scheduled on the same physical server. The fact that multiprocessing ignores cgroups leads to bad oversubscription. |
|||
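On a cluster like the one described, the per-job CPU allocation is usually exported by the scheduler, so the oversubscription can be avoided today by sizing the pool explicitly. The snippet below assumes Slurm's SLURM_CPUS_PER_TASK variable; other schedulers use different names.

```python
import os
import multiprocessing

if __name__ == "__main__":
    # Use the scheduler-allocated CPU count when available, falling back to
    # the whole machine only when running outside a batch job.
    allocated = int(os.environ.get("SLURM_CPUS_PER_TASK", os.cpu_count() or 1))

    with multiprocessing.Pool(processes=allocated) as pool:
        results = pool.map(str, range(16))
    print(allocated, results[:4])
```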
| History | | | |
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:29 | admin | set | github: 70879 |
| 2018-01-16 20:06:33 | hairygristle | set | nosy: + hairygristle; messages: + msg310113 |
| 2017-11-07 23:02:08 | mihaic | set | nosy: + mihaic |
| 2017-09-05 03:31:34 | giampaolo.rodola | set | nosy: + giampaolo.rodola |
| 2017-07-23 12:22:53 | pitrou | set | messages: + msg298901 |
| 2017-07-23 07:56:19 | neologix | set | messages: + msg298893 |
| 2017-07-22 21:59:53 | pitrou | set | stage: needs patch; type: behavior -> enhancement; versions: + Python 3.7, - Python 3.6 |
| 2017-07-22 21:59:46 | pitrou | set | nosy: + pitrou, neologix |
| 2016-04-05 06:18:39 | SilentGhost | set | nosy: + jnoller, sbt; versions: + Python 3.6 |
| 2016-04-05 00:46:11 | Satrajit Ghosh | create | |
