Message 336082 - Python tracker

Author	bmerry
Recipients	bmerry
Date	2019-02-20.13:00:54
Message-id	<1550667654.79.0.460473844528.issue36051@roundup.psfhosted.org>
In-reply-to

Content

A common pattern in libraries doing I/O is to receive data in chunks, put them in a list, then join them all together using b"".join(chunks). For example, see http.client.HTTPResponse._safe_read. When the output is large, the memory copies can block the interpreter for a non-trivial amount of time and prevent multi-threaded scaling, because other threads cannot run Python code while the join holds the GIL. If the GIL could be dropped during the memcpy calls, it could improve parallel I/O performance in some high-bandwidth scenarios (issue 36050 mentions a case where I've run into this serialisation bottleneck in practice).
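For illustration, here is a minimal sketch of the chunk-and-join pattern described above. It is not the actual code of http.client.HTTPResponse._safe_read; the function name read_exactly and the chunk_size default are hypothetical, chosen only for this example.

```python
import io


def read_exactly(stream, amount, chunk_size=65536):
    """Read exactly `amount` bytes from `stream`, accumulating chunks.

    Hypothetical helper for illustration only; real libraries differ in detail.
    """
    chunks = []
    remaining = amount
    while remaining > 0:
        chunk = stream.read(min(remaining, chunk_size))
        if not chunk:
            raise EOFError(f"expected {amount} bytes, got {amount - remaining}")
        chunks.append(chunk)
        remaining -= len(chunk)
    # The join copies every chunk into one new bytes object. This is the
    # step the report is about: for large outputs the copy is long, and
    # per the report the GIL is held for its whole duration.
    return b"".join(chunks)


payload = bytes(range(256)) * 1024  # 256 KiB of sample data
data = read_exactly(io.BytesIO(payload), len(payload))
```

The reads themselves can overlap across threads, since blocking I/O releases the GIL, so the final join is the part that serialises.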

Obviously it could hurt performance to drop the GIL for small cases, since releasing and reacquiring it has a cost of its own. As far as I know, numpy uses thresholds to decide when it's worth dropping the GIL, and that seems to work fairly well.
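One rough way to see where such a threshold might lie is to time concurrent joins at several sizes. The sketch below is only a measurement harness under stated assumptions: time_joins and its parameters are hypothetical names, the sizes are arbitrary, and the results will vary by machine and Python version. If the copy holds the GIL, the wall time should grow roughly in proportion to the number of threads; if the GIL is dropped, it should grow more slowly for large sizes.

```python
import threading
import time


def time_joins(n_threads, total_bytes, chunk_size=1 << 16, repeats=5):
    """Return the wall time for n_threads threads to each join `repeats` times.

    Hypothetical harness for illustration; not part of any library.
    """
    # All list entries refer to the same zero-filled chunk, which keeps
    # setup cheap; the join still copies total_bytes on every call.
    chunks = [bytes(chunk_size)] * max(1, total_bytes // chunk_size)

    def worker():
        for _ in range(repeats):
            b"".join(chunks)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    for size in (1 << 12, 1 << 20, 1 << 24):  # 4 KiB, 1 MiB, 16 MiB
        single = time_joins(1, size)
        multi = time_joins(4, size)
        print(f"{size:>10} bytes: 1 thread {single:.4f}s, 4 threads {multi:.4f}s")
```

Comparing the ratio of the two timings across sizes gives a first impression of where dropping the GIL starts to pay for itself, although a real threshold would need to be tuned in the C implementation.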

History
Date	User	Action	Args
2019-02-20 13:00:54	bmerry	set	recipients: + bmerry
2019-02-20 13:00:54	bmerry	set	messageid: <1550667654.79.0.460473844528.issue36051@roundup.psfhosted.org>
2019-02-20 13:00:54	bmerry	link	issue36051 messages
2019-02-20 13:00:54	bmerry	create