instituto de engenharia de sistemas e computadores investigação e desenvolvimento em lisboa...
TRANSCRIPT
Segurança em Redes Móveis 2009 1/35
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seedMersenne Twister
Random Number Generator&
Diehard Randomness Test
Samuel Antão
Technical University of Lisbon/INESC-ID, Lisbon, Portugal
14th December, 2009
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
2/35
Outline
• Mersenne Twister Overview
• Mersenne Twister Details
• Bitstream Test (Diehard)
• Other Diehard results
• Conclusions
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
3/35
Mersenne Twister Overview• Proposed in 1998 by M. Matsumoto and T. Nishimura
• Ultra long period of 219937-1
• Based on GF(2) fast arithmetic
• Standard random number generation in several applications
– Maple
– Matlab
– Python scripting language
• Faced some criticism by computer science field (George Marsaglia):
– Not very elegant
– Marsaglia proposed some RNG not so complex and with equivalent randomness properties
M. Matsumoto and Takuji Nishimura. “Mersenne Twister – A 623-Dimensionally Equidistributed Uniform Pseudo-Random Number Generator ”. ACM Transactions on Modeling and Computer Simulation, Vol. 8, No. 1, January 1998, pp 3-30
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
4/35
Mersenne Twister Overview• History
– MT based on General Feedback Shift Register (GFSR) approach (Lewis
and Payne, 1973)
– Example for the polynomial x5+x2+1 and delay 6:
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
5/35
Mersenne Twister Overview• History (cont.)
– The GFSR quality depends on judicious chosen seeds
– To overcome this problem in 1992 Matsumoto and Kurita proposed the Twisted
GFSR (TGFSR)
– In the TGFSR one of the feedback operands is multiplied by a matrix
– In 1994, Tezuka, discovered significant correlation in the 2 least significant bits of a
TGFSR sequence
– In 1998, Matsumoto and Nishimura, overcame that problem by setting one of the
feedback operands as a concatenation of high/low part of other two
– The 1998 proposal sets the generator period to be of the form 2nw-r-1, thus a
Mersenne prime: Mersenne Twister
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
6/35
Mersenne Twister Details• Lack of definition of “good randomness”
• The authors selected two randomness metrics that, together, are
believed to be respected only by good generators
– K-distribution
– Number of terms in the recurrence characteristic polynomial
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
7/35
Mersenne Twister Details• K-distribution
– Considering a a pseudorandom sequence xi of w-bit integers of period P it is
possible to obtain a kv-bit vector as:
(truncv( xi ), truncv( xi+1 ), … , truncv( xi+k-1 )) (0 ≤ i < P)
– truncv( y ) is the most significant v bits of y
– A sequence is said to be k-distributed to v-bit accuracy if all the 2kv possible
combinations occur the same number of times in one period (except for zero)
– k(v) is the maximum k such that the sequence is k-distributed to v-bit accuracy
– Pseudorandom generator period bound: 2k(v)v-1 ≤ P
– We are interested in as large as possible k correspondent to as large as possible v
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
8/35
Mersenne Twister Details• Recurrence characteristic polynomial number of terms
– Each step of the pseudorandom number generation (PRNG) performs a
linear recurrence. This recurrence has a specific characteristic polynomial
– Usually the characteristic polynomial is a trinomial (GFSR)
– A PRNG based on polynomials with few terms shows poor randomness
– A PRNG based on polynomials that can be generated from trinomials also
shows poor randomness
– We are interested in implementing a linear recurrence with a characteristic
polynomial with a large amount of terms
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
9/35
Mersenne Twister Details• MT recurrence step
– General formula
xk+n = xk+m xor (xuk|xl
k+1)A, (k=0,1,2,…)
– Matrix A chosen to be easy to operate
021
1000010
aaa
A
ww
1 if )(shiftright0 if)(shiftright
0
0
xxorx
A axxx
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
10/35
Mersenne Twister Details• MT recurrence step
– Data set (state) transformation
– r bits removed from the state to solve the problems regarding low k-distribution for
small v-bit accuracy (v=2) reported in previous TGFSR versions
– Period given by 2p-1=2nw-r-1
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
11/35
Mersenne Twister Details• Tampering matrix T
– After observing a sufficient large amount of random data, it becomes possible to
predict nest random values
– The recurrence does not suit cryptographic applications
– Solution: after each step the state is multiplied by an invertible matrix T
– It does not completely overcome the problem (hash function required for
cryptographic applications)
– Matrix T operations (z=xT):
• y = x xor (x >> u)
• y = y xor ((y<<s) and b)
• y = y xor ((y<<t) and c)
• z = y xor (y >> l)
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
12/35
Mersenne Twister Details• Matrix T does not change the period of the recurrence
• Parameters for the recurrence and matrix T obtained after extensive
search
• The theoretical period bound 2k(v)v – 1 ≤ 2nw-r – 1 cannot be reached
• To attain the maximum period 2p – 1 = 2nw-r – 1 the recurrence
characteristic polynomial c(t) must be primitive:
• A special method to test for primitivity had to be used to compute the
generator parameters in useful time
)(mod)(mod
2
2
tctttctt
p
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
13/35
Mersenne Twister Details• MT parameters search
K-distribution bounds
MT results
Previous TGFSR
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
14/35
Mersenne Twister Details• MT summary
– Period 219937-1
– 623-distributed to 32-bit accuracy
– Characteristic polynomial with several terms (~100)
– Constrained data set of 624 word data
– Fast generation (from 1.5 to 2 times faster than AES encryption)
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
15/35
Bitstream Test
• The random sequence were obtained from a C library available online
by Geoff Kuenning:
– http://www.cs.hmc.edu/~geoff/mtwist.html
– 132 million random 32-bit integers per second (3.6 GHz Pentium Xeon)
• 3000 files of 11,468,800 bytes were obtained, to support 3000
independent Diehard tests
• The MT random sequences passed all the Diehard tests (Win32
version)
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
16/35
Bitstream Test
• Bitstream test details
– The random sequence is used to obtain 221overlapped 20-bit words
– Considering a repository that contains all the possible words of 20-bit (220), some of
these words may not appear in the 221 obtained from the random sequence
– For a truly random sequence, the number of missing words obtained from that
sequence should approximately follow a normal distribution of µ=141,909 and
σ=428
– Each Diehard test run performs 20 bitstream tests. The total of bitstream tests
performed are 20x3000=60000.
19222122
21432
20321
21212121 ...
......
bbbb
bbbbbbbb
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
17/35
Bitstream Test
• Bitstream test ~ N(141909, 428) (class width = 30)
1.4 1.405 1.41 1.415 1.42 1.425 1.43 1.435 1.44
x 105
0
200
400
600
800
1000
1200
1400
1600
1800
Classes
Dis
trib
utio
n
ObtainedExpected
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
18/35
Other Diehard results
• Birthday Spacings ~ χ2(6) (class width = 0.1)
0 5 10 15 20 250
50
100
150
200
250
300
350
400
450
Classes
Dis
trib
utio
n
ObtainedExpected
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
19/35
Other Diehard results
• Overlapping 5-Permutation ~ χ2(99) (class width = 1)
20 40 60 80 100 120 140 160 180 2000
50
100
150
Classes
Dis
trib
utio
n
ObtainedExpected
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
20/35
Other Diehard results
• Binary Rank for 31x31 Matrices ~ χ2(3) (class width = 0.1)
0 5 10 150
10
20
30
40
50
60
70
80
Classes
Dis
trib
utio
n
ObtainedExpected
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
21/35
Other Diehard results
• Binary Rank for 32x32 Matrices ~ χ2(6) (class width = 0.1)
0 5 10 150
10
20
30
40
50
60
70
80
90
100
Classes
Dis
trib
utio
n
ObtainedExpected
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
22/35
Other Diehard results
• Binary Rank for 6x8 Matrices ~ Exp(2) (class width = 0.01)
0 0.5 1 1.5 2 2.5 350
100
150
200
250
300
350
400
Classes
Dis
trib
utio
n
ObtainedExpected
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
23/35
Other Diehard results
• Overlapping-Pairs-Sparse-Occupancy ~ N(141909,290) (class width = 30)
1.405 1.41 1.415 1.42 1.425 1.43 1.435
x 105
0
500
1000
1500
2000
2500
3000
Classes
Dis
trib
utio
n
ObtainedExpected
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
24/35
Other Diehard results
• Overlapping-Quadruples-Sparse-Occupancy ~ N(141909,295) (class width = 30)
1.405 1.41 1.415 1.42 1.425 1.43 1.435
x 105
0
500
1000
1500
2000
2500
3000
3500
Classes
Dis
trib
utio
n
ObtainedExpected
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
25/35
Other Diehard results
• DNA ~ N(141909,339) (class width = 30)
1.405 1.41 1.415 1.42 1.425 1.43 1.435
x 105
0
500
1000
1500
2000
2500
3000
3500
Classes
Dis
trib
utio
n
ObtainedExpected
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
26/35
Other Diehard results
• Count the 1’s ~ χ2(2500) (class width = 7)
2200 2300 2400 2500 2600 2700 28000
50
100
150
200
250
300
Classes
Dis
trib
utio
n
ObtainedExpected
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
27/35
Other Diehard results
• Count the 1’s for specific bytes ~ χ2(2500) (class width = 5)
2200 2300 2400 2500 2600 2700 28000
500
1000
1500
2000
2500
Classes
Dis
trib
utio
n
ObtainedExpected
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
28/35
Other Diehard results
• Parking Lot ~ N(3523,21.9) (class width = 1)
3400 3450 3500 3550 3600 36500
100
200
300
400
500
600
Classes
Dis
trib
utio
n
ObtainedExpected
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
29/35
Other Diehard results
• Minimum distance ~ Exp(0.995) (class width = 0.01)
0 0.5 1 1.5 2 2.5 3 3.5 40
100
200
300
400
500
600
700
Classes
Dis
trib
utio
n
ObtainedExpected
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
30/35
Other Diehard results
• 3D Spheres ~ Exp(30) (class width = 1)
0 20 40 60 80 1000
200
400
600
800
1000
1200
1400
1600
1800
2000
Classes
Dis
trib
utio
n
ObtainedExpected
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
31/35
Other Diehard results
• Squeeze ~ χ2(42) (class width = 0.7)
10 20 30 40 50 60 70 800
20
40
60
80
100
120
Classes
Dis
trib
utio
n
ObtainedExpected
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
32/35
Other Diehard results
• Craps - Wins ~ N(200000p,sqrt(200000p(1-p))) (class width = 15), p=244/495
9.75 9.8 9.85 9.9 9.95
x 104
0
10
20
30
40
50
60
70
80
90
100
Classes
Dis
trib
utio
n
ObtainedExpected
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
33/35
Other Diehard results
• Craps - Throws ~ χ2(20) (class width = 0.7)
0 5 10 15 20 25 30 35 400
20
40
60
80
100
120
140
160
Classes
Dis
trib
utio
n
ObtainedExpected
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
34/35
Other Diehard results
• The remaining tests only provide the final p-values, thus it was no
possible to plot the results
– Runs test
– Overlapping sums test
• Although, both provide uniform p-values, confirming the good
randomness of the generator
Segurança em Redes Móveis 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technologyfrom seed
35/35
Conclusions
• MT is a widely used pseudorandom number generator
• MT has extra large period
• MT is computationally fast
• MT passes the Diehard statistical tests (and others as well)
• Statistical tests are a good platform to test for randomness
• The bitstream test, is one of statistical tests to be employed