Optimize rabinkarp_update() by dbaarda · Pull Request #191 · librsync/librsync

added 8 commits

April 15, 2020 10:34
This adds rabinkarp.c, copying the unrolled-loop idea from rollsum.c.
Replace the generic uint32_pow() with rabinkarp_pow(), which is optimized
using a lookup table of powers of RABINKARP_MULT.
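
As a rough illustration of the lookup-table idea, the sketch below raises a multiplier to the n-th power (mod 2^32) by multiplying together precomputed values of MULT^(2^i), one per set bit of n. All names here (DEMO_MULT, demo_pow2_table, demo_init_table, demo_pow, demo_pow_naive) are hypothetical, the multiplier value is an assumption, and the table is filled at runtime for brevity; this is not necessarily how rabinkarp_pow() is written in the PR.

```c
#include <stdint.h>

/* Assumed multiplier, for illustration only. */
#define DEMO_MULT 0x08104225U

/* demo_pow2_table[i] holds DEMO_MULT^(2^i) mod 2^32. */
static uint32_t demo_pow2_table[32];

/* Fill the table by repeated squaring; unsigned overflow wraps mod 2^32. */
static void demo_init_table(void)
{
    uint32_t m = DEMO_MULT;
    for (int i = 0; i < 32; i++) {
        demo_pow2_table[i] = m;
        m *= m;
    }
}

/* Compute DEMO_MULT^n mod 2^32 using one table entry per set bit of n. */
static uint32_t demo_pow(uint32_t n)
{
    const uint32_t *m = demo_pow2_table;
    uint32_t ans = 1;

    while (n) {
        if (n & 1)
            ans *= *m;
        m++;
        n >>= 1;
    }
    return ans;
}

/* Reference version: n successive multiplies, used only for comparison. */
static uint32_t demo_pow_naive(uint32_t n)
{
    uint32_t ans = 1;

    while (n--)
        ans *= DEMO_MULT;
    return ans;
}
```

Compared with a generic power function that also has to square the base as it goes, the table version does at most one multiply per bit of n.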

Use a DOMULT4() macro to unroll rabinkarp_update() so that the 4 multiplies
can be pipelined/parallelized if the hardware supports it, since no multiply
depends on the result of a previous multiply.
Testing of many different variants of the loop unrolling seems to indicate
that this variant is the best.
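
The unrolling idea can be sketched as follows, assuming the hash is updated as hash = hash * MULT + byte for each input byte. Processing 4 bytes at once expands to hash * MULT^4 + b0 * MULT^3 + b1 * MULT^2 + b2 * MULT + b3, where none of the multiplies needs the result of another. The names (DEMO_MULT, demo_update_simple, demo_update_unrolled) and the multiplier value are hypothetical; the actual DOMULT4() macro in the PR may be structured differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed multiplier, for illustration only. */
#define DEMO_MULT 0x08104225U

/* Byte-at-a-time update: each multiply depends on the previous result. */
static uint32_t demo_update_simple(uint32_t hash, const unsigned char *buf,
                                   size_t len)
{
    while (len--)
        hash = DEMO_MULT * hash + *buf++;
    return hash;
}

/* Unrolled update: 4 bytes per step with independent multiplies. */
static uint32_t demo_update_unrolled(uint32_t hash, const unsigned char *buf,
                                     size_t len)
{
    const uint32_t m1 = DEMO_MULT;
    const uint32_t m2 = m1 * m1;    /* MULT^2 mod 2^32 */
    const uint32_t m3 = m2 * m1;    /* MULT^3 mod 2^32 */
    const uint32_t m4 = m2 * m2;    /* MULT^4 mod 2^32 */

    while (len >= 4) {
        /* The four products below do not depend on each other. */
        hash = m4 * hash + m3 * buf[0] + m2 * buf[1] + m1 * buf[2] + buf[3];
        buf += 4;
        len -= 4;
    }
    /* Handle the remaining 0 to 3 bytes one at a time. */
    while (len--)
        hash = m1 * hash + *buf++;
    return hash;
}
```

Both functions compute the same value mod 2^32; the unrolled form only shortens the dependency chain from four multiplies to one per group of 4 bytes.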