| « Impressions of a bunch of game demos | Half-Life 2 is a nightmare » |
Here’s how to test a pseudo-random number generator to make sure that it’s giving you a uniform distribution. Simple, but something that I just now had time to do.
For my research, pseudo-random numbers are perfect; if you use the same seed at the beginning, you get the same numbers back every time. I don’t even care if the stream is predictable by an outside observer; I just need a series of numbers that are uniformly distributed over a range. This is true for many scientific applications.
That means my “randomness” test is pretty simple at the heart: I just have to make sure the distribution is uniform.
Here’s the random number generator I’m using. It’s from The C++ Programming Language by Bjarne Stroustrup.
class Randint {
unsigned long randx;
public:
Randint(long s = 0) { randx=s; }
void seed(long s) { randx=s; }
//magic numbers chosen to use 31 bits of a 32-bit long
static long abs(long x) {return x & 0x7fffffff; }
static double max() { return 2147483648.0; } // note: a double
long draw() { return randx = randx * 1103515245 + 12345; }
double fdraw() { return abs(draw()) / max(); } // in the interval [0,1]
long operator()() { return abs(draw()); } // in the interval [0,2**31]
};
class Urand : public Randint { // uniform distro in the interval [0:n[
public:
Urand(long s = 0) : Randint(s) { }
long next(long n) { long r = long(n*fdraw()); return (r==n) ? n-1: r; }
};
So you can see that we have a nice concise class here, then some needless inheritance. Yup, must be a Stroustrup design. (Well, in his defence, I think he was using random number generation to illustrate principles of inheritance.)
OK, let’s jump forward a couple of years to yesterday. I’ve read about 3/4 megabyte of other people’s C++ in the interim and discovered that my advisor was right all along: Just Pretend It’s C.
#define N 5
#define TRIALS 300000
void random_test() {
Urand uni((unsigned)time(0));
int is[N];
for(int i = 0; i < N; i++) {
is[i] = 0;
}
for(int _ = 0; _ < N * TRIALS; _++) {
is[uni.next(N)]++;
}
for(int i = 0; i < N; i++) {
cout << is[i] << ' ';
}
cout << endl;
}
Well, you can see right away a C-like use of the preprocessor to define constants. I remember Stroustrup says not to do that, but I can’t remember what the replacement is. So I Just Pretend It’s C.
Next, I declare a random number generator and an array to count how many times each number in the range N shows up.
Now there are three loops. Loop 1 initialises the array to 0. Loop 2 generates N*TRIALS numbers in the range N. Every time a number shows up, it increments the appropriate elements in the counting array. I used ++ because the languages I normally use don’t support it and it was exciting.
Finally, I left-shift cout by a bunch of numbers. Wait, I mean “print out the results". Because there are N elements in the array and N*TRIALS numbers generated, each element should be almost equal to TRIALS. In other words, the output should be something like this:
300902 299324 299833 300159 299782
In fact, because this is C, you should actually check that the sum of those numbers is N*TRIALS to make sure you weren’t incrementing like, is[N] or is[-1] by mistake.
OK, so that’s the simple case. In addition, I want to verify that the specific ways I use the random number generator are working. So I need to test two things: (1) does my sampling code (choice) work? (2) does my shuffling code (shuffle) work?
template <class T> T choice(const vector<T>& l) {
static Urand uni((unsigned)time(0));
return l[uni.next(l.size())];
}
template <class T> void shuffle(vector<T>& l) {
static Urand uni((unsigned)time(0));
for(int i = l.size(); i > 1; i--) {
int j = uni.next(i);
T tmp = l[j];
l[j] = l[i-1];
l[i-1] = tmp;
}
}
Both functions cache their own static instance of the random number generator. I did it this way to avoid globals. I don’t know if this idiomatic C, because I don’t remember ever seeing static used in real code. Besides that, I don’t think there’s anything remarkable about this code. It’s the shuffle algorithm from Knuth, the efficient one that I can’t use in Haskell without dropping into the ST or IO monad. So that’s cool. I like it.
To test choice, we want to see whether each element in a list is chosen about the same number of times. Since Urand::next works, it’s pretty easy to suspect that choice works, but let’s write the test anyway.
void choice_test() {
vector<int> l(N);
int is[N];
for(int i = 0; i < N; i++) {
is[i] = 0;
l[i] = i;
}
for(int _ = 0; _ < N * TRIALS; _++) {
is[choice(l)]++;
}
for(int i = 0; i < N; i++) {
cout << is[i] << ' ';
}
cout << endl;
}
Yeah, pretty much the same as random_test except there’s a vector l that holds the numbers 0-(N-1). Testing shuffle, on the other hand, is more interesting.
void shuffle_test() {
vector<int> l(N);
int rs[N][N];
for(int i = 0; i < N; i++) {
l[i] = i;
for(int j = 0; j < N; j++) {
rs[i][j] = 0;
}
}
for(int _ = 0; _ < N * TRIALS; _++) {
shuffle(l);
for(int i = 0; i < N; i++) {
rs[i][l[i]]++;
}
}
for(int i = 0; i < N; i++) {
for(int j = 0; j < N; j++) {
cout << rs[i][j] << ' ';
}
cout << endl;
}
}
Again there are three loops, but now they’re nested and the test array is 2D. That’s because the way to test shuffle is to make an vector l of length N, shuffle it N*TRIALS times, and make sure that each element of l appears in each position approximately TRIALS times. So if N=3 and TRIALS=1000, then we expect 0 to appear in position 0 1000 times, position 1 1000 times and position 2 1000 times. Likewise, we expect 1 to appear in position 0 1000 times, position 1 1000 times and position 2 1000 times. Finally, 2 should appear in position 0 1000, 1 1000 and 2 1000. So you can see that the test matrix has to be NxN.
| 0 | 1 | 2 | |
|---|---|---|---|
| 0 | 1000 | 1000 | 1000 |
| 1 | 1000 | 1000 | 1000 |
| 2 | 1000 | 1000 | 1000 |
Here is some example output for the actual definitions of N and TRIALS from above (5 and 30000). Again, summing each row to make sure that they’re all equal to N*TRIALS is a good idea.
300943 299623 299526 300206 299702 299383 299473 300661 299518 300965 300022 299529 300791 300685 298973 299796 300818 299084 299717 300585 299856 300557 299938 299874 299775
I have to say that I didn’t think the generator I implemented really worked. I figured either a throwaway example in The C++ Programming Language would be too simple, or I would have made a mistake implementing it somewhere. Neither turned out to be true. Well, I did have a bug in shuffle at first. Turns that you can’t swap things using references:
T& tmp = l[j];
l[j] = l[i-1];
l[i-1] = tmp;
You have to make a real temporary copy of the thing.
T tmp = l[j];
l[j] = l[i-1];
l[i-1] = tmp;
At least if your input list is of type vector<T>; I suppose you could do it with a vector<T*> in which case you have an additional indirection all the time. So never mind.