Mersenne Twister

The Mersenne Twister is a pseudorandom number generator (PRNG). It is by far the most widely used general-purpose PRNG. Its name derives from the fact that its period length is chosen to be a Mersenne prime.

Adoption in software systems

The Mersenne Twister is the default PRNG for the following software systems:

Microsoft Visual C++, Microsoft Excel, GAUSS, GLib, GNU Multiple Precision Arithmetic Library, GNU Octave, GNU Scientific Library, gretl, IDL, Julia, CMU Common Lisp, Embeddable Common Lisp, Steel Bank Common Lisp, Maple, MATLAB, Free Pascal, PHP, Python, R, Ruby, SageMath, Scilab, Stata. It is also available in Apache Commons, in standard C++ (since C++11), and in Mathematica. Add-on implementations are provided in many program libraries, including the Boost C++ Libraries, the CUDA Library, and the NAG Numerical Library.

The Mersenne Twister is one of two PRNGs in SPSS: the other generator is kept only for compatibility with older programs, and the Mersenne Twister is stated to be "more reliable". The Mersenne Twister is similarly one of the PRNGs in SAS: the other generators are older and deprecated.

Advantages

The commonly used version of Mersenne Twister, MT19937, which produces a sequence of 32-bit integers, has the following desirable properties:

It has a very long period of 2¹⁹⁹³⁷ − 1. While a long period is not a guarantee of quality in a random number generator, short periods (such as the 2³² common in many older software packages) can be problematic.
It is k-distributed to 32-bit accuracy for every 1 ≤ k ≤ 623 (see definition below).
It passes numerous tests for statistical randomness, including the Diehard tests.

Disadvantages

The large state space comes with a performance cost: the 2.5 KiB state buffer will place a load on the memory caches. In 2011, Saito & Matsumoto proposed a version of the Mersenne Twister to address this issue. The tiny version, TinyMT, uses just 127 bits of state space.

By today's standards, the Mersenne Twister is somewhat slow, unless the SFMT implementation is used (see section below).

It passes most, but not all, of the stringent TestU01 randomness tests. Because it is based on simple linear (xor) operations, it fails tests based on linear complexity after relatively few bits of output, despite its extremely large state. Passing the output through a simple hash function can remedy this weakness.
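
The linearity behind this weakness can be observed directly. The sketch below assumes CPython, whose `random.Random` is an MT19937 generator and whose `getrandbits(32)` returns one tempered 32-bit output; the helper names are illustrative only. It loads two arbitrary states and their bitwise XOR into three generators, then checks that XORing the first two output streams reproduces the third.

```python
import random

N = 624  # number of 32-bit words in the MT19937 state


def generator_from_state(words):
    """Build a CPython Random object whose MT19937 state is `words`.

    Assumes CPython's state layout: (version 3, 624 words + index, gauss_next).
    An index of 624 means the next call regenerates the state array first.
    """
    rng = random.Random()
    rng.setstate((3, tuple(words) + (N,), None))
    return rng


def outputs(words, count):
    """First `count` tempered 32-bit outputs produced from the given state."""
    rng = generator_from_state(words)
    return [rng.getrandbits(32) for _ in range(count)]


# Two arbitrary states and their bitwise XOR.
source = random.Random(2024)
state_a = [source.getrandbits(32) for _ in range(N)]
state_b = [source.getrandbits(32) for _ in range(N)]
state_xor = [a ^ b for a, b in zip(state_a, state_b)]

out_a = outputs(state_a, 2000)
out_b = outputs(state_b, 2000)
out_xor = outputs(state_xor, 2000)

# Every output bit is an XOR of state bits, so the streams combine the same way.
is_linear = all(x ^ y == z for x, y, z in zip(out_a, out_b, out_xor))
print(is_linear)
```

Because the twist and the tempering use only shifts, masks, and XOR, this relation holds for any pair of states, which is what linear-complexity tests detect.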

Multiple Mersenne Twister instances that differ only in seed value (but not other parameters) are not generally appropriate for Monte-Carlo simulations that require independent random number generators, though there exists a method for choosing multiple sets of parameter values.

It can take a long time to start generating output that passes randomness tests, if the initial state is highly non-random—particularly if the initial state has many zeros. A consequence of this is that two instances of the generator, started with initial states that are almost the same, will usually output nearly the same sequence for many iterations, before eventually diverging. The 2002 update to the MT algorithm has improved initialization, so that beginning with such a state is very unlikely.
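
The slow recovery from a sparse state can be illustrated with a short experiment. This is a sketch under the assumption that CPython's `random.Random` exposes its MT19937 state through `setstate` in the layout shown; the helper names are hypothetical. It compares the fraction of 1 bits in the first outputs from a state containing a single 1 bit with the fraction from a well-mixed state.

```python
import random

N = 624  # number of 32-bit words in the MT19937 state


def generator_from_state(words):
    """Load `words` as the raw MT19937 state (CPython layout assumed)."""
    rng = random.Random()
    rng.setstate((3, tuple(words) + (N,), None))
    return rng


def ones_density(words, count):
    """Fraction of 1 bits among the first `count` 32-bit outputs."""
    rng = generator_from_state(words)
    ones = sum(bin(rng.getrandbits(32)).count("1") for _ in range(count))
    return ones / (32 * count)


# A state that is zero except for a single 1 bit in the last word.
sparse_state = [0] * N
sparse_state[N - 1] = 1

# A well-mixed state for comparison.
mixer = random.Random(1)
mixed_state = [mixer.getrandbits(32) for _ in range(N)]

print("sparse:", ones_density(sparse_state, 2 * N))
print("mixed: ", ones_density(mixed_state, 2 * N))
```

The sparse state yields almost entirely zero words at first, because each regeneration of the state spreads a nonzero word to only a few others.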

k-distribution

A pseudorandom sequence x_i of w-bit integers of period P is said to be k-distributed to v-bit accuracy if the following holds.

Let $\operatorname{trunc}_v(x)$ denote the number formed by the leading v bits of x, and consider P of the kv-bit vectors

$$(\operatorname{trunc}_v(x_i), \operatorname{trunc}_v(x_{i+1}), \ldots, \operatorname{trunc}_v(x_{i+k-1})) \qquad (0 \le i < P).$$

Then each of the $2^{kv}$ possible combinations of bits occurs the same number of times in a period, except for the all-zero combination that occurs once less often.
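
The definition can be checked mechanically on a generator small enough to enumerate. The sketch below is illustrative only: it uses a toy 4-bit linear feedback shift register with period 15 (not the Mersenne Twister), and the function names are assumptions made for this example.

```python
from collections import Counter
from itertools import product


def is_k_distributed(period, w, k, v):
    """Check k-distribution to v-bit accuracy on one full period of w-bit integers."""
    P = len(period)
    combos = 2 ** (k * v)
    if (P + 1) % combos != 0:
        return False  # the required counts cannot add up to P
    expected = (P + 1) // combos
    top = [x >> (w - v) for x in period]  # trunc_v: the leading v bits
    counts = Counter(
        tuple(top[(i + j) % P] for j in range(k)) for i in range(P)
    )
    for combo in product(range(2 ** v), repeat=k):
        # The all-zero combination must occur once less often than the others.
        want = expected if any(combo) else expected - 1
        if counts[combo] != want:
            return False
    return True


def lfsr_period(start=1, taps=0b1001):
    """One full period of a toy 4-bit Galois LFSR."""
    seq, x = [], start
    while True:
        seq.append(x)
        x = (x >> 1) ^ (taps if x & 1 else 0)
        if x == start:
            return seq


seq = lfsr_period()
print(len(seq))                        # period of the toy generator
print(is_k_distributed(seq, 4, 1, 4))  # every nonzero 4-bit value occurs once
print(is_k_distributed(seq, 4, 2, 1))  # pairs of leading bits are balanced
print(is_k_distributed(seq, 4, 2, 2))  # pairs of leading 2-bit values are not
```

For MT19937 the same property holds with k up to 623 at 32-bit accuracy, but its period is far too long to verify by enumeration.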

Alternatives

The algorithm in its native form is not cryptographically secure. The reason is that observing a sufficient number of iterations (624 in the case of MT19937, since this is the size of the state vector from which future iterations are produced) allows one to predict all future iterations.
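
The prediction attack can be sketched as follows. The example assumes CPython, where `random.Random` is an MT19937 generator, `getrandbits(32)` returns one tempered output, and `setstate` accepts the raw state in the layout shown; the function names are hypothetical. Each observed output is mapped back to its state word by inverting the tempering steps, and the 624 recovered words are loaded into a second generator.

```python
import random

N = 624  # outputs needed to reconstruct the full MT19937 state


def undo_right_shift_xor(y, shift):
    """Invert y = x ^ (x >> shift) for a 32-bit x."""
    x = y
    for _ in range(32 // shift + 1):  # each pass fixes `shift` more high bits
        x = y ^ (x >> shift)
    return x


def undo_left_shift_xor_mask(y, shift, mask):
    """Invert y = x ^ ((x << shift) & mask) for a 32-bit x."""
    x = y
    for _ in range(32 // shift + 1):  # each pass fixes `shift` more low bits
        x = y ^ ((x << shift) & mask)
    return x & 0xFFFFFFFF


def untemper(z):
    """Recover the state word from a tempered MT19937 output."""
    y = undo_right_shift_xor(z, 18)
    y = undo_left_shift_xor_mask(y, 15, 0xEFC60000)
    y = undo_left_shift_xor_mask(y, 7, 0x9D2C5680)
    return undo_right_shift_xor(y, 11)


def clone_generator(observed):
    """Build a generator that continues the sequence after 624 observed outputs."""
    state = [untemper(z) for z in observed[-N:]]
    clone = random.Random()
    # CPython layout assumed: (version 3, 624 words + index, gauss_next).
    clone.setstate((3, tuple(state) + (N,), None))
    return clone


victim = random.Random(12345)
observed = [victim.getrandbits(32) for _ in range(N)]
clone = clone_generator(observed)

predicted = [clone.getrandbits(32) for _ in range(10)]
actual = [victim.getrandbits(32) for _ in range(10)]
print(predicted == actual)
```

The inversion works because every tempering step is an invertible linear map on the 32-bit word, so the output hides nothing about the state word it came from.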

A pair of cryptographic stream ciphers based on output from the Mersenne Twister has been proposed by Matsumoto, Nishimura, and co-authors. The authors claim speeds 1.5 to 2 times faster than Advanced Encryption Standard in counter mode.

An alternative generator, WELL ("Well Equidistributed Long-period Linear"), offers quicker recovery, equal randomness, and nearly equal speed. Marsaglia's xorshift generators and variants are the fastest in this class.

Algorithmic detail

For a w-bit word length, the Mersenne Twister generates integers in the range [0, 2^w−1].

The Mersenne Twister algorithm is based on a matrix linear recurrence over a finite binary field F₂. The algorithm is a twisted generalised feedback shift register (twisted GFSR, or TGFSR) of rational normal form (TGFSR(R)), with state bit reflection and tempering. The basic idea is to define a series $x_i$ through a simple recurrence relation, and then output numbers of the form $x_i T$, where T is an invertible F₂ matrix called a tempering matrix.

The general algorithm is characterized by the following quantities (some of these explanations make sense only after reading the rest of the algorithm):

w: word size (in number of bits)

n: degree of recurrence

m: middle word, an offset used in the recurrence relation defining the series x, 1 ≤ m < n

r: separation point of one word, or the number of bits of the lower bitmask, 0 ≤ r ≤ w - 1

a: coefficients of the rational normal form twist matrix

b, c: TGFSR(R) tempering bitmasks

s, t: TGFSR(R) tempering bit shifts

u, d, l: additional Mersenne Twister tempering bit shifts/masks

with the restriction that $2^{nw-r} - 1$ is a Mersenne prime. This choice simplifies the primitivity test and k-distribution test that are needed in the parameter search.
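
As a quick arithmetic check, the exponent nw − r can be computed for the MT19937 and MT19937-64 parameters listed later in this article, and the primality of a Mersenne number can be tested with the Lucas-Lehmer test. The function names below are illustrative, and the test is a plain, unoptimised sketch.

```python
def mersenne_exponent(w, n, r):
    """Exponent p = n*w - r of the period 2**p - 1."""
    return n * w - r


def is_mersenne_prime(p):
    """Lucas-Lehmer test: for an odd prime p, 2**p - 1 is prime iff s reaches 0."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


print(mersenne_exponent(32, 624, 31))  # MT19937
print(mersenne_exponent(64, 312, 31))  # MT19937-64
print(624 * 32 // 8)                   # state size of MT19937 in bytes
print(is_mersenne_prime(607))          # exponent of the smallest SFMT period
```

Both parameter sets give the exponent 19937. Calling `is_mersenne_prime(19937)` performs the same check for the Mersenne Twister period itself, although it is noticeably slower in pure Python.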

The series x is defined as a series of w-bit quantities with the recurrence relation:

$$x_{k+n} := x_{k+m} \oplus \left( (x_k^{u} \parallel x_{k+1}^{l})\, A \right), \qquad k = 0, 1, \ldots$$

where $\parallel$ denotes concatenation of bit vectors (with upper bits on the left), $\oplus$ the bitwise exclusive or (XOR), $x_k^{u}$ means the upper w − r bits of $x_k$, and $x_{k+1}^{l}$ means the lower r bits of $x_{k+1}$. The twist transformation A is defined in rational normal form as:

$$A = \begin{pmatrix} 0 & I_{w-1} \\ a_{w-1} & (a_{w-2}, \ldots, a_0) \end{pmatrix}$$

with $I_{n-1}$ as the (n − 1) × (n − 1) identity matrix. The rational normal form has the benefit that multiplication by A can be efficiently expressed as: (remember that here matrix multiplication is being done in F₂, and therefore bitwise XOR takes the place of addition)

$$xA = \begin{cases} x \gg 1 & x_0 = 0 \\ (x \gg 1) \oplus a & x_0 = 1 \end{cases}$$

where x₀ is the lowest order bit of x.
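
The shortcut can be compared against an explicit matrix product on a small word size. In this sketch the word size of 8 bits and the coefficient value are arbitrary toy choices, and x is treated as a row vector with its most significant bit first; the function names are illustrative.

```python
def times_a(x, a):
    """Shortcut form: shift right, then XOR in `a` if the lowest bit was set."""
    return (x >> 1) ^ (a if x & 1 else 0)


def times_a_matrix(x, a, w):
    """Explicit row-vector-times-matrix product over F2, for comparison."""
    def bits(value):  # most significant bit first
        return [(value >> (w - 1 - j)) & 1 for j in range(w)]

    # Rows 0..w-2 form the block (0 | I_{w-1}); the last row holds the bits of a.
    rows = [[1 if j == i + 1 else 0 for j in range(w)] for i in range(w - 1)]
    rows.append(bits(a))

    x_bits = bits(x)
    result = 0
    for j in range(w):
        # Addition in F2 is XOR, i.e. the sum taken modulo 2.
        column_sum = sum(x_bits[i] & rows[i][j] for i in range(w)) % 2
        result = (result << 1) | column_sum
    return result


W, A = 8, 0xB5  # toy word size and twist coefficients (arbitrary choice)
agree = all(times_a(x, A) == times_a_matrix(x, A, W) for x in range(1 << W))
print(agree)
```

The identity block shifts every bit down one position, and the last row contributes a only when the lowest bit of x is 1, which is exactly what the shortcut computes.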

As with TGFSR(R), the Mersenne Twister is cascaded with a tempering transform to compensate for the reduced dimensionality of equidistribution (because of the choice of A being in the rational normal form). Note that this is equivalent to using the matrix T⁻¹AT in place of A, for T an invertible matrix, and therefore the analysis of the characteristic polynomial mentioned below still holds.

As with A, we choose a tempering transform to be easily computable, and so do not actually construct T itself. The tempering is defined in the case of Mersenne Twister as

y := x ⊕ ((x >> u) & d)
y := y ⊕ ((y << s) & b)
y := y ⊕ ((y << t) & c)
z := y ⊕ (y >> l)

where x is the next value from the series, y a temporary intermediate value, z the value returned from the algorithm, with <<, >> as the bitwise left and right shifts, and & as the bitwise and. The first and last transforms are added in order to improve lower-bit equidistribution. From the property of TGFSR, s + t ≥ ⌊ w / 2 ⌋ − 1 is required to reach the upper bound of equidistribution for the upper bits.

The coefficients for MT19937 are:

(w, n, m, r) = (32, 624, 397, 31)

a = 9908B0DF₁₆

(u, d) = (11, FFFFFFFF₁₆)

(s, b) = (7, 9D2C5680₁₆)

(t, c) = (15, EFC60000₁₆)

l = 18

Note that 32-bit implementations of the Mersenne Twister generally have d = FFFFFFFF₁₆. As a result, the d is occasionally omitted from the algorithm description, since the bitwise and with d in that case has no effect.

The coefficients for MT19937-64 are:

(w, n, m, r) = (64, 312, 156, 31)

a = B5026F5AA96619E9₁₆

(u, d) = (29, 5555555555555555₁₆)

(s, b) = (17, 71D67FFFEDA60000₁₆)

(t, c) = (37, FFF7EEE000000000₁₆)

l = 43

Initialization

As should be apparent from the above description, the state needed for a Mersenne Twister implementation is an array of n values of w bits each. To initialize the array, a w-bit seed value is used to supply $x_0$ through $x_{n-1}$ by setting $x_0$ to the seed value and thereafter setting

$$x_i = f \times (x_{i-1} \oplus (x_{i-1} \gg (w-2))) + i$$

for i from 1 to n-1. The first value the algorithm then generates is based on x_n, not based on x₀. The constant f forms another parameter to the generator, though not part of the algorithm proper. The value for f for MT19937 is 1812433253 and for MT19937-64 is 6364136223846793005.

Comparison with classical GFSR

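In order to achieve the $2^{nw-r} - 1$ theoretical upper limit of the period in a TGFSR, $\varphi_B(t)$ must be a primitive polynomial, $\varphi_B(t)$ being the characteristic polynomial of

$$B = \begin{pmatrix} 0 & I_w & \cdots & 0 & 0 \\ \vdots & & & & \\ I_w & \vdots & \ddots & \vdots & \vdots \\ \vdots & & & & \\ 0 & 0 & \cdots & I_w & 0 \\ 0 & 0 & \cdots & 0 & I_{w-r} \\ S & 0 & \cdots & 0 & 0 \end{pmatrix} \begin{matrix} \\ \\ \leftarrow m\text{-th row} \\ \\ \\ \\ \\ \end{matrix}$$

$$S = \begin{pmatrix} 0 & I_r \\ I_{w-r} & 0 \end{pmatrix} A$$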

The twist transformation improves the classical GFSR with the following key properties:

The period reaches the theoretical upper limit $2^{nw-r} - 1$ (except if initialized with 0)

Equidistribution in n dimensions (e.g. linear congruential generators can at best manage reasonable distribution in five dimensions)

Pseudocode

The following piece of pseudocode implements the general Mersenne Twister algorithm. The constants w, n, m, r, a, u, d, s, b, t, c, l, and f are as in the algorithm description above. It is assumed that int represents a type sufficient to hold values with w bits:

// Create a length n array to store the state of the generator
int[0..n-1] MT
int index := n+1
const int lower_mask = (1 << r) - 1 // That is, the binary number of r 1's
const int upper_mask = lowest w bits of (not lower_mask)

// Initialize the generator from a seed
function seed_mt(int seed) {
    index := n
    MT[0] := seed
    for i from 1 to (n - 1) { // loop over each element
        MT[i] := lowest w bits of (f * (MT[i-1] xor (MT[i-1] >> (w-2))) + i)
    }
}

// Extract a tempered value based on MT[index]
// calling twist() every n numbers
function extract_number() {
    if index >= n {
        if index > n {
            error "Generator was never seeded"
            // Alternatively, seed with constant value; 5489 is used in reference C code
        }
        twist()
    }

    int y := MT[index]
    y := y xor ((y >> u) and d)
    y := y xor ((y << s) and b)
    y := y xor ((y << t) and c)
    y := y xor (y >> l)

    index := index + 1
    return lowest w bits of (y)
}

// Generate the next n values from the series x_i
function twist() {
    for i from 0 to (n-1) {
        int x := (MT[i] and upper_mask) + (MT[(i+1) mod n] and lower_mask)
        int xA := x >> 1
        if (x mod 2) != 0 { // lowest bit of x is 1
            xA := xA xor a
        }
        MT[i] := MT[(i + m) mod n] xor xA
    }
    index := 0
}

Python implementation

This python implementation hard-codes the constants for MT19937:
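
A minimal sketch of such an implementation, following the pseudocode above. The class layout and attribute names are assumptions made for this example rather than the original listing.

```python
class MT19937:
    """MT19937 with the constants from the algorithm description hard-coded."""

    W, N, M, R = 32, 624, 397, 31
    A = 0x9908B0DF
    U, D = 11, 0xFFFFFFFF
    S, B = 7, 0x9D2C5680
    T, C = 15, 0xEFC60000
    L = 18
    F = 1812433253
    LOWER_MASK = (1 << R) - 1               # the r low bits set
    UPPER_MASK = ~LOWER_MASK & 0xFFFFFFFF   # the remaining w - r high bits

    def __init__(self, seed):
        # Fill the state array from the seed using the initialization recurrence.
        self.mt = [0] * self.N
        self.mt[0] = seed & 0xFFFFFFFF
        for i in range(1, self.N):
            prev = self.mt[i - 1]
            self.mt[i] = (self.F * (prev ^ (prev >> (self.W - 2))) + i) & 0xFFFFFFFF
        self.index = self.N  # forces a twist before the first output

    def twist(self):
        """Generate the next n values of the series in place."""
        for i in range(self.N):
            x = (self.mt[i] & self.UPPER_MASK) | (self.mt[(i + 1) % self.N] & self.LOWER_MASK)
            x_a = x >> 1
            if x & 1:  # lowest bit of x is 1
                x_a ^= self.A
            self.mt[i] = self.mt[(i + self.M) % self.N] ^ x_a
        self.index = 0

    def extract_number(self):
        """Return the next tempered 32-bit value."""
        if self.index >= self.N:
            self.twist()
        y = self.mt[self.index]
        y ^= (y >> self.U) & self.D
        y ^= (y << self.S) & self.B
        y ^= (y << self.T) & self.C
        y ^= y >> self.L
        self.index += 1
        return y & 0xFFFFFFFF
```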

Then MT19937(seed).extract_number() returns the random number, where seed is the initial seed.

C# implementation

This C# implementation hard-codes the constants for MT19937-64:

C/C++ implementation

Simple 32-bit C/C++ implementation (tested using GCC for ARM):

Haskell implementation

This Haskell implementation is optimized for readability and uses an inefficient list representation of the state array x i , … , x i + n − 1 :

Freepascal implementation

Output checked against G++ C++11 mt19937 and FPC's default random.
This is NOT the Mersenne twister implementation that Freepascal uses as its default prng, but the algorithm is the same.
Usage:

SFMT

SFMT, the SIMD-oriented Fast Mersenne Twister (SIMD: single instruction, multiple data), is a variant of Mersenne Twister, introduced in 2006, designed to be fast when it runs on 128-bit SIMD.

It is roughly twice as fast as Mersenne Twister.

It has a better equidistribution property of v-bit accuracy than MT but worse than WELL ("Well Equidistributed Long-period Linear").

It has quicker recovery from zero-excess initial state than MT, but slower than WELL.

It supports various periods from 2⁶⁰⁷−1 to 2²¹⁶⁰⁹¹−1.

Intel SSE2 and PowerPC AltiVec are supported by SFMT. It is also used for games with the Cell BE in the PlayStation 3.

MTGP

MTGP is a variant of Mersenne Twister optimised for graphics processing units published by Mutsuo Saito and Makoto Matsumoto. The basic linear recurrence operations are extended from MT and parameters are chosen to allow many threads to compute the recursion in parallel, while sharing their state space to reduce memory load. The paper claims improved equidistribution over MT and performance on a high specification GPU (Nvidia GTX260 with 192 cores) of 4.7 ms for 5×10⁷ random 32-bit integers.

References

Mersenne Twister Wikipedia

(Text) CC BY-SA
