Consider a faster alternative algorithm for random.shuffle() · Issue #108598 · python/cpython

For historical contrast, if random.shuffle were based on a PRNG with a mere 7 trillion different internal states (looking at you, Wichmann–Hill), then it would be easy to write a perfect_shuffle detector: you'd just generate 10 million shuffles of a 100-element list and look for duplicates - the birthday paradox says that with shuffle the chance of getting at least one duplicate is >99.92%, while with perfect_shuffle the chance of a duplicate is essentially zero.

@mdickinson, I didn't follow this line of argument. It's certain that WH can't generate more than ~7e12 (its period) distinct permutations, but it goes through every one of its possible internal states before repeating. So if there's a duplicate permutation generated before then, it's not due to WH suffering a birthday-paradox internal state repetition, but due to the shuffling algorithm mapping distinct WH states to the same permutation. It's the algorithm that's subject to the birthday paradox, not the underlying generator.

Some quick coding doesn't change my mind about that 😉. Here's output from a run with a sequence with only 17 elements (I don't want to wait forever on this). Even at such a "small" size, there was a case where using Wichmann-Hill as the generator took over 45 million shuffles to find the first duplicate. Offhand, WH doesn't appear to be doing any worse at this than using the Twister as the generator, or using the system urandom() (via random.SystemRandom.random). But the variance across tries is high, so my "appears" is a very subjective guess:

EDIT: to try to nail this, I put in a far feebler version of WH:

    def whbase(c=4991):
        while True:
            c = 170 * c % 30323
            yield c/30323.0
    wh = whbase().__next__

30323 is a prime, so c internally goes through all values in range(1, 30323). And, indeed, if the input list is large enough, using this generates exactly 30322 distinct permutations - or 15161 - before finding its first duplicate. I don't know why sometimes it's the maximum possible, while other times only half that. But that's a pit I'm not inclined to jump into. The point was to verify that the generator itself is exempt from birthday paradox arguments.

EDIT 2: "all or half?" are special cases of a more general pattern, which turns out to be shallow. If I pass a list with 15162 elements,, the horridly feeble WH variant generates only 2 distinct permutations. Why? Because the shuffling algorithm calls the generator N-1 times for a list of length N. So, whatever the starting state, shuffling a 15162-list advances the state by 15161 positions, and doing it a second time does the same, leaving the starting state advanced by 30332 positions in all: exactly back to the state we started with. 15161 is very special, because it's half the period P. For smaller sizes, "even or odd?" makes the difference. If the length is odd, shuffling consumes an even number of states, so (because P is also even) only half the states can be starting states for the next shuffle - only P/2 distinct perms can be generated. While if the length is even, shuffling consumes an odd number of states, and (again because P is even) no state can ruled out as the starting state - all P perms can potentially be generated. And 2 and 15161 are P's only non-trivial divisors, so there are no other patterns of this kind in this case.

Note that the above is nearly irrelevant for the Twister, because its period is prime.

N=17 N!=355_687_428_096_000 isqrt=18_859_677
using random.SystemRandom().random
dup found at count=6_759_620
dup found at count=18_331_040
dup found at count=24_891_577
dup found at count=3_873_247
dup found at count=8_787_305

using random.random
dup found at count=80_778_804
dup found at count=38_294_138
dup found at count=4_863_331
dup found at count=19_520_382
dup found at count=17_565_755

using WH
dup found at count=11_945_489
dup found at count=11_136_7397
dup found at count=32_105_454
dup found at count=45_458_159
dup found at count=17_871_475
Click here for the code
def shuf(xs, r):
   for i in range(len(xs)-1, 0, -1):
       j = int(r() * (i+1))
       xs[i], xs[j] = xs[j], xs[i]

def whbase(a=123, b=145, c=4991):
   while True:
       a = 171 * a % 30269
       b = 172 * b % 30307
       c = 170 * c % 30323
       yield (a/30269.0 + b/30307.0 + c/30323.0) % 1.0
wh = whbase().__next__

def dupdetect(r, n):
   count = 0
   base = bytearray(range(n))
   seen = set()
   while True:
       count += 1
       xs = base[:]
       shuf(xs, r)
       t = bytes(xs)
       if t in seen:
           print(f"dup found at {count=:_}")
           break
       seen.add(t)

N = 17
import math
fac = math.factorial(N)
s = math.isqrt(fac)
print(f"{N=:} N!={fac:_} isqrt={s:_}")
import random
for meth, name in [(random.SystemRandom().random, "random.SystemRandom().random"),
                  (random.random, "random.random"),
                  (wh, "WH")]:
   print("using", name)
   for i in range(5):
       dupdetect(meth, N)
   print()