Issue36957
Created on 2019-05-18 15:43 by mark.dickinson, last changed 2022-04-11 14:59 by admin. This issue is now closed.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 13405 | merged | mark.dickinson, 2019-05-18 15:46 | |
| PR 13416 | merged | serhiy.storchaka, 2019-05-19 07:03 | |
| Messages (9) | |||
|---|---|---|---|
| msg342799 - (view) | Author: Mark Dickinson (mark.dickinson) * ![]() |
Date: 2019-05-18 15:43 | |
The `math.isqrt` algorithm introduce in GH-36887 currently works entirely with Python long integers. That's unnecessarily inefficient for small inputs. For n < 2**64, `math.isqrt(n)` can be computed, via exactly the same algorithm, using entirely C integer arithmetic. For larger n, the first 5 iterations of the algorithm can similarly be performed entirely in C integer arithmetic, and we can then switch to long integer arithmetic for subsequent iterations. On my machine, these simple changes make a substantial difference (4x faster) for small inputs, and a significant but less substantial difference (70% speedup) for inputs not much larger than 2**64. The speedup for huge integers is likely to be much smaller, percentage-wise. Some timings: Unpatched --------- lovelace:cpython mdickinson$ ./python.exe -m timeit -s "from math import isqrt" "[isqrt(n) for n in range(2, 1000)]" 1000 loops, best of 5: 327 usec per loop lovelace:cpython mdickinson$ ./python.exe -m timeit -s "from math import isqrt; x = range(2**63-1000, 2**63+1000)" "[isqrt(n) for n in x]" 200 loops, best of 5: 1.44 msec per loop lovelace:cpython mdickinson$ ./python.exe -m timeit -s "from math import isqrt; x = range(2**95-1000, 2**95+1000)" "[isqrt(n) for n in x]" 200 loops, best of 5: 1.64 msec per loop Patched (PR imminent) ------- lovelace:cpython mdickinson$ ./python.exe -m timeit -s "from math import isqrt" "[isqrt(n) for n in range(2, 1000)]" 5000 loops, best of 5: 78.1 usec per loop lovelace:cpython mdickinson$ ./python.exe -m timeit -s "from math import isqrt; x = range(2**63-1000, 2**63+1000)" "[isqrt(n) for n in x]" 1000 loops, best of 5: 355 usec per loop lovelace:cpython mdickinson$ ./python.exe -m timeit -s "from math import isqrt; x = range(2**95-1000, 2**95+1000)" "[isqrt(n) for n in x]" 500 loops, best of 5: 954 usec per loop |
|||
| msg342800 - (view) | Author: Mark Dickinson (mark.dickinson) * ![]() |
Date: 2019-05-18 15:47 | |
> introduce in GH-36887 Sorry, that should have been: introduced in GH-13244. #36887 was the corresponding b.p.o. issue. |
|||
| msg342801 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2019-05-18 16:06 | |
Did you try the floating point implementation? |
|||
| msg342802 - (view) | Author: Mark Dickinson (mark.dickinson) * ![]() |
Date: 2019-05-18 17:07 | |
> Did you try the floating point implementation? The aim here was to use exactly the same algorithm, but speed it up by working with C integers where possible; that's a fairly simple change. Using floating-point would require more complex changes. Again, the biggest issue with using a floating-point sqrt as an initial value is that we can't assume either IEEE 754 floating-point format, *or* a correctly rounded libm sqrt, even though both those things are highly likely on a typical modern machine. So any use of floating-point would also have to have an accuracy check and a fallback integer-only implementation for the case where the floating-point fails. It's possible to make those changes, but I think we'd end up crossing the threshold to "too complicated" for the implementation of a simple function. It's a bit of a shame, because if we _are_ allowed to assume IEEE 754, and a correctly-rounded sqrt implementation (using round-ties-to-even), then it turns out that one can prove that for any value `n` smaller than 2**106 and `a := int(math.sqrt(float(n)))` (assuming that the `int` and `float` conversions are *also* correctly rounded), we have (a - 1)**2 < n < (a + 1)**2, which is exactly the loop invariant that the current algorithm needs to maintain. |
|||
| msg342836 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2019-05-19 07:07 | |
It is possible to get yet 10-20% by avoiding to create temporary Python integers for right arguments of shift operations. PR 13416 adds two private functions _PyLong_Rshift() and _PyLong_Lshift() which take the second argument as C size_t instead of Python integer. _PyLong_Lshift() can also be used in factorial() and in float to int comparison. $ ./python -m timeit -s "from math import isqrt; x = range(2**63-1000, 2**63+1000)" "[isqrt(n) for n in x]" Unpatched: 200 loops, best of 5: 1.84 msec per loop Patched: 200 loops, best of 5: 1.51 msec per loop $ ./python -m timeit -s "from math import isqrt; x = range(2**95-1000, 2**95+1000)" "[isqrt(n) for n in x]" Unpatched: 100 loops, best of 5: 2.09 msec per loop Patched: 200 loops, best of 5: 1.75 msec per loop |
|||
| msg342837 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2019-05-19 07:16 | |
I have also some ideas about algorithmic optimizations (they need to be tested). In classic formula $a_{i+1} = a_i + (n - a_i^2)/(2*a_i)$ we can calculate $n - a_i^2$ as $(n - a_{i-1}^2) - (a_i^2 - a_{i-1})^2 = (n - a_{i-1}^2) - (a_i^2 - a_{i-1})*(a_i^2 + a_{i-1})$. $n - a_i^2$ usually is much smaller than $n$, so this can speed up subtraction and division. Things become more complicated when use shifts as in your formula, but I think that we can get benefit even in this case. This can also speed up the final check $a_i^2 <= n$.
|
|||
| msg342840 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2019-05-19 11:14 | |
New changeset a5119e7d75c9729fc36c059d05f3d7132e7f6bb4 by Serhiy Storchaka in branch 'master': bpo-36957: Add _PyLong_Rshift() and _PyLong_Lshift(). (GH-13416) https://github.com/python/cpython/commit/a5119e7d75c9729fc36c059d05f3d7132e7f6bb4 |
|||
| msg342864 - (view) | Author: Mark Dickinson (mark.dickinson) * ![]() |
Date: 2019-05-19 16:51 | |
New changeset 5c08ce9bf712acbb3f05a3a57baf51fcb534cdf0 by Mark Dickinson in branch 'master': bpo-36957: Speed up math.isqrt (#13405) https://github.com/python/cpython/commit/5c08ce9bf712acbb3f05a3a57baf51fcb534cdf0 |
|||
| msg342865 - (view) | Author: Mark Dickinson (mark.dickinson) * ![]() |
Date: 2019-05-19 16:53 | |
Calling this done for now. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:15 | admin | set | github: 81138 |
| 2019-05-19 16:53:34 | mark.dickinson | set | status: open -> closed resolution: fixed messages: + msg342865 stage: patch review -> resolved |
| 2019-05-19 16:51:59 | mark.dickinson | set | messages: + msg342864 |
| 2019-05-19 11:14:48 | serhiy.storchaka | set | messages: + msg342840 |
| 2019-05-19 07:16:22 | serhiy.storchaka | set | messages: + msg342837 |
| 2019-05-19 07:07:01 | serhiy.storchaka | set | messages: + msg342836 |
| 2019-05-19 07:03:24 | serhiy.storchaka | set | pull_requests: + pull_request13327 |
| 2019-05-18 17:07:57 | mark.dickinson | set | messages: + msg342802 |
| 2019-05-18 16:06:24 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages: + msg342801 |
| 2019-05-18 15:47:56 | mark.dickinson | set | messages: + msg342800 |
| 2019-05-18 15:46:48 | mark.dickinson | set | keywords:
+ patch stage: needs patch -> patch review pull_requests: + pull_request13316 |
| 2019-05-18 15:43:46 | mark.dickinson | create | |
