The `math.isqrt` algorithm introduce in GH-36887 currently works entirely with Python long integers. That's unnecessarily inefficient for small inputs.
For n < 2**64, `math.isqrt(n)` can be computed, via exactly the same algorithm, using entirely C integer arithmetic.
For larger n, the first 5 iterations of the algorithm can similarly be performed entirely in C integer arithmetic, and we can then switch to long integer arithmetic for subsequent iterations.
On my machine, these simple changes make a substantial difference (4x faster) for small inputs, and a significant but less substantial difference (70% speedup) for inputs not much larger than 2**64. The speedup for huge integers is likely to be much smaller, percentage-wise.
Some timings:
Unpatched
---------
lovelace:cpython mdickinson$ ./python.exe -m timeit -s "from math import isqrt" "[isqrt(n) for n in range(2, 1000)]"
1000 loops, best of 5: 327 usec per loop
lovelace:cpython mdickinson$ ./python.exe -m timeit -s "from math import isqrt; x = range(2**63-1000, 2**63+1000)" "[isqrt(n) for n in x]"
200 loops, best of 5: 1.44 msec per loop
lovelace:cpython mdickinson$ ./python.exe -m timeit -s "from math import isqrt; x = range(2**95-1000, 2**95+1000)" "[isqrt(n) for n in x]"
200 loops, best of 5: 1.64 msec per loop
Patched (PR imminent)
-------
lovelace:cpython mdickinson$ ./python.exe -m timeit -s "from math import isqrt" "[isqrt(n) for n in range(2, 1000)]"
5000 loops, best of 5: 78.1 usec per loop
lovelace:cpython mdickinson$ ./python.exe -m timeit -s "from math import isqrt; x = range(2**63-1000, 2**63+1000)" "[isqrt(n) for n in x]"
1000 loops, best of 5: 355 usec per loop
lovelace:cpython mdickinson$ ./python.exe -m timeit -s "from math import isqrt; x = range(2**95-1000, 2**95+1000)" "[isqrt(n) for n in x]"
500 loops, best of 5: 954 usec per loop |