It is possible to get yet 10-20% by avoiding to create temporary Python integers for right arguments of shift operations. PR 13416 adds two private functions _PyLong_Rshift() and _PyLong_Lshift() which take the second argument as C size_t instead of Python integer. _PyLong_Lshift() can also be used in factorial() and in float to int comparison.
$ ./python -m timeit -s "from math import isqrt; x = range(2**63-1000, 2**63+1000)" "[isqrt(n) for n in x]"
Unpatched: 200 loops, best of 5: 1.84 msec per loop
Patched: 200 loops, best of 5: 1.51 msec per loop
$ ./python -m timeit -s "from math import isqrt; x = range(2**95-1000, 2**95+1000)" "[isqrt(n) for n in x]"
Unpatched: 100 loops, best of 5: 2.09 msec per loop
Patched: 200 loops, best of 5: 1.75 msec per loop |