Fisher–Yates shuffle

Updated on Dec 12, 2024

Edit

Comment

The Fisher–Yates shuffle is an algorithm for generating a random permutation of a finite set—in plain terms, the algorithm shuffles the set. The algorithm effectively puts all the elements into a hat; it continually determines the next element by randomly drawing an element from the hat until no elements remain. The algorithm produces an unbiased permutation: every permutation is equally likely. The modern version of the algorithm is efficient: it takes time proportional to the number of items being shuffled and shuffles them in place.

The Fisher–Yates shuffle is named after Ronald Fisher and Frank Yates, who first described it, and is also known as the Knuth shuffle after Donald Knuth. A variant of the Fisher–Yates shuffle, known as Sattolo's algorithm, may be used to generate random cyclic permutations of length n instead of random permutations.

Fisher and Yates' original method

The Fisher–Yates shuffle, in its original form, was described in 1938 by Ronald Fisher and Frank Yates in their book Statistical tables for biological, agricultural and medical research. Their description of the algorithm used pencil and paper; a table of random numbers provided the randomness. The basic method given for generating a random permutation of the numbers 1 through N goes as follows:

Write down the numbers from 1 through N.
Pick a random number k between one and the number of unstruck numbers remaining (inclusive).
Counting from the low end, strike out the kth number not yet struck out, and write it down at the end of a separate list.
Repeat from step 2 until all the numbers have been struck out.
The sequence of numbers written down in step 3 is now a random permutation of the original numbers.

Provided that the random numbers picked in step 2 above are truly random and unbiased, so will the resulting permutation be. Fisher and Yates took care to describe how to obtain such random numbers in any desired range from the supplied tables in a manner which avoids any bias. They also suggested the possibility of using a simpler method — picking random numbers from one to N and discarding any duplicates—to generate the first half of the permutation, and only applying the more complex algorithm to the remaining half, where picking a duplicate number would otherwise become frustratingly common.

The modern algorithm

The modern version of the Fisher–Yates shuffle, designed for computer use, was introduced by Richard Durstenfeld in 1964 and popularized by Donald E. Knuth in The Art of Computer Programming as "Algorithm P". Neither Durstenfeld nor Knuth, in the first edition of his book, acknowledged the work of Fisher and Yates; they may not have been aware of it. Subsequent editions of The Art of Computer Programming mention Fisher and Yates' contribution.

The algorithm described by Durstenfeld differs from that given by Fisher and Yates in a small but significant way. Whereas a naive computer implementation of Fisher and Yates' method would spend needless time counting the remaining numbers in step 3 above, Durstenfeld's solution is to move the "struck" numbers to the end of the list by swapping them with the last unstruck number at each iteration. This reduces the algorithm's time complexity to O(n), compared to O(n²) for the naïve implementation. This change gives the following algorithm (for a zero-based array).

-- To shuffle an array a of n elements (indices 0..n-1):for i from n−1 downto 1 do j ← random integer such that 0 ≤ j ≤ i exchange a[j] and a[i]

An equivalent version which shuffles the array in the opposite direction (from lowest index to highest) is:

-- To shuffle an array a of n elements (indices 0..n-1):for i from 0 to n−2 do j ← random integer such that i ≤ j < n exchange a[i] and a[j]

The "inside-out" algorithm

The Fisher–Yates shuffle, as implemented by Durstenfeld, is an in-place shuffle. That is, given a preinitialized array, it shuffles the elements of the array in place, rather than producing a shuffled copy of the array. This can be an advantage if the array to be shuffled is large.

To simultaneously initialize and shuffle an array, a bit more efficiency can be attained by doing an "inside-out" version of the shuffle. In this version, one successively places element number i into a random position among the first i positions in the array, after moving the element previously occupying that position to position i. In case the random position happens to be number i, this "move" (to the same place) involves an uninitialised value, but that does not matter, as the value is then immediately overwritten. No separate initialization is needed, and no exchange is performed. In the common case where source is defined by some simple function, such as the integers from 0 to n − 1, source can simply be replaced with the function since source is never altered during execution.

To initialize an array a of n elements to a randomly shuffled copy of source, both 0-based: for i from 0 to n − 1 do j ← random integer such that 0 ≤ j ≤ i if j ≠ i a[i] ← a[j] a[j] ← source[i]

The inside-out shuffle can be seen to be correct by induction. Assuming a perfect random number generator, every one of the n! different sequences of random numbers that could be obtained from the calls of random will produce a different permutation of the values, so all of these are obtained exactly once. The condition that checks if j ≠ i may be omitted in languages that have no problems accessing uninitialized array values, and for which assigning is cheaper than comparing.

Another advantage of this technique is that the algorithm can be modified so that even when we do not know "n", the number of elements in source, we can still generate a uniformly distributed random permutation of the source data. Below the array a is built iteratively starting from empty, and a.length represents the current number of elements seen.

To initialize an empty array a to a randomly shuffled copy of source whose length is not known: while source.moreDataAvailable j ← random integer such that 0 ≤ j ≤ a.length if j = a.length a.append(source.next) else a.append(a[j]) a[j] ← source.next

Pencil-and-paper method

As an example, we'll permute the numbers from 1 to 8 using Fisher and Yates' original method. We'll start by writing the numbers out on a piece of scratch paper:

Now we roll a random number k from 1 to 8—let's make it 3—and strike out the kth (i.e. third) number (3, of course) on the scratch pad and write it down as the result:

Now we pick a second random number, this time from 1 to 7: it turns out to be 4. Now we strike out the fourth number not yet struck off the scratch pad—that's number 5—and add it to the result:

Now we pick the next random number from 1 to 6, and then from 1 to 5, and so on, always repeating the strike-out process as above:

Modern method

We'll now do the same thing using Durstenfeld's version of the algorithm: this time, instead of striking out the chosen numbers and copying them elsewhere, we'll swap them with the last number not yet chosen. We'll start by writing out the numbers from 1 to 8 as before:

For our first roll, we roll a random number from 1 to 8: this time it is 6, so we swap the 6th and 8th numbers in the list:

The next random number we roll from 1 to 7, and turns out to be 2. Thus, we swap the 2nd and 7th numbers and move on:

The next random number we roll is from 1 to 6, and just happens to be 6, which means we leave the 6th number in the list (which, after the swap above, is now number 8) in place and just move to the next step. Again, we proceed the same way until the permutation is complete:

At this point there's nothing more that can be done, so the resulting permutation is 7 5 4 3 1 8 2 6.

Sattolo's algorithm

A very similar algorithm was published in 1986 by Sandra Sattolo for generating uniformly distributed cycles of (maximal) length n. The only difference between Durstenfeld's and Sattolo's algorithms is that in the latter, in step 2 above, the random number j is chosen from the range between 1 and i−1 (rather than between 1 and i) inclusive. This simple change modifies the algorithm so that the resulting permutation always consists of a single cycle.

In fact, as described below, it is quite easy to accidentally implement Sattolo's algorithm when the ordinary Fisher–Yates shuffle is intended. This will bias the results by causing the permutations to be picked from the smaller set of (n−1)! cycles of length N, instead of from the full set of all n! possible permutations.

The fact that Sattolo's algorithm always produces a cycle of length n can be shown by induction. Assume by induction that after the initial iteration of the loop, the remaining iterations permute the first n − 1 elements according to a cycle of length n − 1 (those remaining iterations are just Sattolo's algorithm applied to those first n − 1 elements). This means that tracing the initial element to its new position p, then the element originally at position p to its new position, and so forth, one only gets back to the initial position after having visited all other positions. Suppose the initial iteration swapped the final element with the one at (non-final) position k, and that the subsequent permutation of first n − 1 elements then moved it to position l; we compare the permutation π of all n elements with that remaining permutation σ of the first n − 1 elements. Tracing successive positions as just mentioned, there is no difference between π and σ until arriving at position k. But then, under π the element originally at position k is moved to the final position rather than to position l, and the element originally at the final position is moved to position l. From there on, the sequence of positions for π again follows the sequence for σ, and all positions will have been visited before getting back to the initial position, as required.

As for the equal probability of the permutations, it suffices to observe that the modified algorithm involves (n−1)! distinct possible sequences of random numbers produced, each of which clearly produces a different permutation, and each of which occurs—assuming the random number source is unbiased—with equal probability. The (n−1)! different permutations so produced precisely exhaust the set of cycles of length n: each such cycle has a unique cycle notation with the value n in the final position, which allows for (n−1)! permutations of the remaining values to fill the other positions of the cycle notation.

A sample implementation of Sattolo's algorithm in Python is:

Comparison with other shuffling algorithms

The Fisher–Yates shuffle is quite efficient; indeed, its asymptotic time and space complexity are optimal. Combined with a high-quality unbiased random number source, it is also guaranteed to produce unbiased results. Compared to some other solutions, it also has the advantage that, if only part of the resulting permutation is needed, it can be stopped halfway through, or even stopped and restarted repeatedly, generating the permutation incrementally as needed.

An alternative method assigns a random number to each element of the set to be shuffled and then sorts the set according to the assigned numbers. The sorting method has the same asymptotic time complexity as Fisher-Yates: although general sorting is O(n log n), numbers are efficiently sorted using Radix sort in O(n) time. Like the Fisher–Yates shuffle, the sorting method produces unbiased results. However, care must be taken to ensure that the assigned random numbers are never duplicated, since sorting algorithms typically don't order elements randomly in case of a tie. Additionally, this method requires asymptotically larger space: O(n) additional storage space for the random numbers, versus O(1) space for the Fisher–Yates shuffle. Finally, we note that the sorting method has a simple parallel implementation, unlike the Fisher–Yates shuffle, which is sequential.

A variant of the above method that has seen some use in languages that support sorting with user-specified comparison functions is to shuffle a list by sorting it with a comparison function that returns random values. However, this is an extremely bad method: it is very likely to produce highly non-uniform distributions, which in addition depends heavily on the sorting algorithm used. For instance suppose quicksort is used as sorting algorithm, with a fixed element selected as first pivot element. The algorithm starts comparing the pivot with all other elements to separate them into those less and those greater than it, and the relative sizes of those groups will determine the final place of the pivot element. For a uniformly distributed random permutation, each possible final position should be equally likely for the pivot element, but if each of the initial comparisons returns "less" or "greater" with equal probability, then that position will have a binomial distribution for p = 1/2, which gives positions near the middle of the sequence with a much higher probability for than positions near the ends. Randomized comparison functions applied to other sorting methods like merge sort may produce results that appear more uniform, but are not quite so either, since merging two sequences by repeatedly choosing one of them with equal probability (until the choice is forced by the exhaustion of one sequence) does not produce results with a uniform distribution; instead the probability to choose a sequence should be proportional to the number of elements left in it. In fact no method that uses only two-way random events with equal probability ("coin flipping"), repeated a bounded number of times, can produce permutations of a sequence (of more than two elements) with a uniform distribution, because every execution path will have as probability a rational number with as denominator a power of 2, while the required probability 1/n! for each possible permutation is not of that form.

In principle this shuffling method can even result in program failures like endless loops or access violations, because the correctness of a sorting algorithm may depend on properties of the order relation (like transitivity) that a comparison producing random values will certainly not have. While this kind of behaviour should not occur with sorting routines that never perform a comparison whose outcome can be predicted with certainty (based on previous comparisons), there can be valid reasons for deliberately making such comparisons. For instance the fact that any element should compare equal to itself allows using them as sentinel value for efficiency reasons, and if this is the case, a random comparison function would break the sorting algorithm.

References

Fisher–Yates shuffle Wikipedia

(Text) CC BY-SA

Contents