Issue 36957: Speed up math.isqrt

Issue 36957: Speed up math.isqrt

Issue36957

Created on 2019-05-18 15:43 by mark.dickinson, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked	Edit
PR 13405	merged	mark.dickinson, 2019-05-18 15:46
PR 13416	merged	serhiy.storchaka, 2019-05-19 07:03

Messages (9)
msg342799 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2019-05-18 15:43
The `math.isqrt` algorithm introduce in GH-36887 currently works entirely with Python long integers. That's unnecessarily inefficient for small inputs. For n < 264, `math.isqrt(n)` can be computed, via exactly the same algorithm, using entirely C integer arithmetic. For larger n, the first 5 iterations of the algorithm can similarly be performed entirely in C integer arithmetic, and we can then switch to long integer arithmetic for subsequent iterations. On my machine, these simple changes make a substantial difference (4x faster) for small inputs, and a significant but less substantial difference (70% speedup) for inputs not much larger than 264. The speedup for huge integers is likely to be much smaller, percentage-wise. Some timings: Unpatched --------- lovelace:cpython mdickinson$ ./python.exe -m timeit -s "from math import isqrt" "[isqrt(n) for n in range(2, 1000)]" 1000 loops, best of 5: 327 usec per loop lovelace:cpython mdickinson$ ./python.exe -m timeit -s "from math import isqrt; x = range(263-1000, 263+1000)" "[isqrt(n) for n in x]" 200 loops, best of 5: 1.44 msec per loop lovelace:cpython mdickinson$ ./python.exe -m timeit -s "from math import isqrt; x = range(295-1000, 295+1000)" "[isqrt(n) for n in x]" 200 loops, best of 5: 1.64 msec per loop Patched (PR imminent) ------- lovelace:cpython mdickinson$ ./python.exe -m timeit -s "from math import isqrt" "[isqrt(n) for n in range(2, 1000)]" 5000 loops, best of 5: 78.1 usec per loop lovelace:cpython mdickinson$ ./python.exe -m timeit -s "from math import isqrt; x = range(263-1000, 263+1000)" "[isqrt(n) for n in x]" 1000 loops, best of 5: 355 usec per loop lovelace:cpython mdickinson$ ./python.exe -m timeit -s "from math import isqrt; x = range(295-1000, 295+1000)" "[isqrt(n) for n in x]" 500 loops, best of 5: 954 usec per loop
msg342800 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2019-05-18 15:47
> introduce in GH-36887 Sorry, that should have been: introduced in GH-13244. #36887 was the corresponding b.p.o. issue.
msg342801 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2019-05-18 16:06
Did you try the floating point implementation?
msg342802 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2019-05-18 17:07
> Did you try the floating point implementation? The aim here was to use exactly the same algorithm, but speed it up by working with C integers where possible; that's a fairly simple change. Using floating-point would require more complex changes. Again, the biggest issue with using a floating-point sqrt as an initial value is that we can't assume either IEEE 754 floating-point format, or a correctly rounded libm sqrt, even though both those things are highly likely on a typical modern machine. So any use of floating-point would also have to have an accuracy check and a fallback integer-only implementation for the case where the floating-point fails. It's possible to make those changes, but I think we'd end up crossing the threshold to "too complicated" for the implementation of a simple function. It's a bit of a shame, because if we _are_ allowed to assume IEEE 754, and a correctly-rounded sqrt implementation (using round-ties-to-even), then it turns out that one can prove that for any value `n` smaller than 2*106 and `a := int(math.sqrt(float(n)))` (assuming that the `int` and `float` conversions are also* correctly rounded), we have (a - 1)2 < n < (a + 1)2, which is exactly the loop invariant that the current algorithm needs to maintain.
msg342836 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2019-05-19 07:07
It is possible to get yet 10-20% by avoiding to create temporary Python integers for right arguments of shift operations. PR 13416 adds two private functions _PyLong_Rshift() and _PyLong_Lshift() which take the second argument as C size_t instead of Python integer. _PyLong_Lshift() can also be used in factorial() and in float to int comparison. $ ./python -m timeit -s "from math import isqrt; x = range(263-1000, 263+1000)" "[isqrt(n) for n in x]" Unpatched: 200 loops, best of 5: 1.84 msec per loop Patched: 200 loops, best of 5: 1.51 msec per loop $ ./python -m timeit -s "from math import isqrt; x = range(295-1000, 295+1000)" "[isqrt(n) for n in x]" Unpatched: 100 loops, best of 5: 2.09 msec per loop Patched: 200 loops, best of 5: 1.75 msec per loop
msg342837 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2019-05-19 07:16
I have also some ideas about algorithmic optimizations (they need to be tested). In classic formula $a_{i+1} = a_i + (n - a_i^2)/(2a_i)$ we can calculate $n - a_i^2$ as $(n - a_{i-1}^2) - (a_i^2 - a_{i-1})^2 = (n - a_{i-1}^2) - (a_i^2 - a_{i-1})(a_i^2 + a_{i-1})$. $n - a_i^2$ usually is much smaller than $n$, so this can speed up subtraction and division. Things become more complicated when use shifts as in your formula, but I think that we can get benefit even in this case. This can also speed up the final check $a_i^2 <= n$.
msg342840 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2019-05-19 11:14
New changeset a5119e7d75c9729fc36c059d05f3d7132e7f6bb4 by Serhiy Storchaka in branch 'master': bpo-36957: Add _PyLong_Rshift() and _PyLong_Lshift(). (GH-13416) https://github.com/python/cpython/commit/a5119e7d75c9729fc36c059d05f3d7132e7f6bb4
msg342864 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2019-05-19 16:51
New changeset 5c08ce9bf712acbb3f05a3a57baf51fcb534cdf0 by Mark Dickinson in branch 'master': bpo-36957: Speed up math.isqrt (#13405) https://github.com/python/cpython/commit/5c08ce9bf712acbb3f05a3a57baf51fcb534cdf0
msg342865 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2019-05-19 16:53
Calling this done for now.

History
Date	User	Action	Args
2022-04-11 14:59:15	admin	set	github: 81138
2019-05-19 16:53:34	mark.dickinson	set	status: open -> closed resolution: fixed messages: + msg342865 stage: patch review -> resolved
2019-05-19 16:51:59	mark.dickinson	set	messages: + msg342864
2019-05-19 11:14:48	serhiy.storchaka	set	messages: + msg342840
2019-05-19 07:16:22	serhiy.storchaka	set	messages: + msg342837
2019-05-19 07:07:01	serhiy.storchaka	set	messages: + msg342836
2019-05-19 07:03:24	serhiy.storchaka	set	pull_requests: + pull_request13327
2019-05-18 17:07:57	mark.dickinson	set	messages: + msg342802
2019-05-18 16:06:24	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg342801
2019-05-18 15:47:56	mark.dickinson	set	messages: + msg342800
2019-05-18 15:46:48	mark.dickinson	set	keywords: + patch stage: needs patch -> patch review pull_requests: + pull_request13316
2019-05-18 15:43:46	mark.dickinson	create