I've noticed that replacing the for loop in the ins1 function in listobject.c with a memmove when the number of pointers to move is greater than 16 seems to speed up list.insert by about 3 to 4x on a contrived benchmark.
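A minimal sketch of the proposed change, assuming a plain array of pointers that already has room for one more entry instead of a real PyListObject. The function name insert_ptr and the MEMMOVE_THRESHOLD constant are hypothetical. Only the cutoff of 16 pointers and the switch to memmove come from the report. The index clamping mirrors how list.insert treats out-of-range indices.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff taken from the report: shifts of more than this many
   pointers use memmove, smaller shifts keep the element-by-element loop. */
#define MEMMOVE_THRESHOLD 16

/* Insert v at index `where` into items[0..n), assuming the buffer already
   has capacity for n + 1 pointers. Returns the new length. Reference
   counting is deliberately omitted: this stores raw pointers only. */
static ptrdiff_t insert_ptr(void **items, ptrdiff_t n, ptrdiff_t where, void *v)
{
    assert(items != NULL && n >= 0);

    /* Clamp the index the way list.insert does. */
    if (where < 0) {
        where += n;
        if (where < 0)
            where = 0;
    }
    if (where > n)
        where = n;

    ptrdiff_t count = n - where;  /* pointers that must shift right by one */
    if (count > MEMMOVE_THRESHOLD) {
        /* Source and destination overlap, so memmove (not memcpy) is required. */
        memmove(&items[where + 1], &items[where], (size_t)count * sizeof(void *));
    } else {
        /* Short shifts: the original backwards copy loop. */
        for (ptrdiff_t i = n; --i >= where; )
            items[i + 1] = items[i];
    }
    items[where] = v;
    return n + 1;
}
```

For example, inserting 40 pointers one at a time at index 0 into a 64-slot buffer goes through the loop for the first 17 inserts and through memmove after that, leaving the pointers in reverse insertion order. In listobject.c, ins1 also has to grow the list with list_resize and Py_INCREF the inserted object. Both steps are left out of this sketch.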
# Before
jeethu@dev:cpython (master)$ ./python -m timeit -s "l = []" "for _ in range(100): l.insert(0, None)"
200 loops, best of 5: 3.07 msec per loop
# After
jeethu@dev:cpython (3.7_list_insert_memmove)$ ./python -m timeit -s "l = []" "for _ in range(100): l.insert(0, None)"
500 loops, best of 5: 744 usec per loop
Both builds were configured with --enable-optimizations and --with-lto.