Oh, sorry, Stefan, I didn't notice your patch. I wouldn't have written mine if I had noticed yours.
int_free_list_2.patch adds a free list only for single-digit ints. The following patch adds free lists for multi-digit ints as well (up to 3 digits on a 32-bit build, 2 on a 64-bit build), which is enough to represent 32-bit integers. Unfortunately, it makes allocation and deallocation of single-digit ints slower.
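For reference, the digit counts above follow from CPython's internal digit size: 15 bits per digit on typical 32-bit builds and 30 bits on typical 64-bit builds. A minimal sketch of the arithmetic (the helper name `digits_needed` is mine, not something from either patch):

```python
import sys

def digits_needed(value_bits, bits_per_digit):
    """Number of internal digits needed to hold a magnitude of value_bits bits."""
    # Ceiling division without floats.
    return -(-value_bits // bits_per_digit)

# 15-bit digits (typical 32-bit build) vs. 30-bit digits (typical 64-bit build).
print(digits_needed(32, 15))  # 3
print(digits_needed(32, 30))  # 2

# The digit size of the interpreter actually running this script.
print(sys.int_info.bits_per_digit)
```

So a free list that covers ints of up to 3 digits (32-bit) or 2 digits (64-bit) covers every value that fits in 32 bits.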
Microbenchmarks:
$ ./python -m timeit -s "r = range(10**4)" -- "for i in r: pass"
Unpatched: 1000 loops, best of 3: 603 usec per loop
1-digit free list: 1000 loops, best of 3: 390 usec per loop
Multi-digit free list: 1000 loops, best of 3: 428 usec per loop
$ ./python -m timeit -s "r = range(10**5)" -- "for i in r: pass"
Unpatched: 100 loops, best of 3: 6.12 msec per loop
1-digit free list: 100 loops, best of 3: 5.69 msec per loop
Multi-digit free list: 100 loops, best of 3: 4.36 msec per loop
$ ./python -m timeit -s "a = list(range(10**4))" -- "for i, x in enumerate(a): pass"
Unpatched: 1000 loops, best of 3: 1.25 msec per loop
1-digit free list: 1000 loops, best of 3: 929 usec per loop
Multi-digit free list: 1000 loops, best of 3: 968 usec per loop
$ ./python -m timeit -s "a = list(range(10**5))" -- "for i, x in enumerate(a): pass"
Unpatched: 100 loops, best of 3: 11.7 msec per loop
1-digit free list: 100 loops, best of 3: 10.9 msec per loop
Multi-digit free list: 100 loops, best of 3: 9.99 msec per loop
As for more realistic cases, base85 encoding is about 5% faster with the multi-digit free list.
$ ./python -m timeit -s "from base64 import b85encode; a = bytes(range(256))*100" -- "b85encode(a)"
Unpatched: 100 loops, best of 3: 10 msec per loop
1-digit free list: 100 loops, best of 3: 9.85 msec per loop
Multi-digit free list: 100 loops, best of 3: 9.48 msec per loop
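The 5% figure is just the relative reduction in the best-of-3 time for the base85 run. A quick sketch of that calculation, using the timings above (the function name `speedup_percent` is only for illustration):

```python
def speedup_percent(baseline_msec, patched_msec):
    """Relative reduction in run time versus the unpatched build, in percent."""
    return (baseline_msec - patched_msec) / baseline_msec * 100

# b85encode timings from the runs above, in msec per loop.
unpatched = 10.0
one_digit = 9.85
multi_digit = 9.48

print(round(speedup_percent(unpatched, one_digit), 1))    # 1.5
print(round(speedup_percent(unpatched, multi_digit), 1))  # 5.2
```

These are single timeit runs, so differences of a percent or two may be within noise.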