« Impressions of a bunch of game demosHalf-Life 2 is a nightmare »

Testing random number generation

23/03/10

Permalink 02:31:14 pm, 1183 words
Categories: Code, Linguistics

Testing random number generation

Here’s how to test a pseudo-random number generator to make sure that it’s giving you a uniform distribution. Simple, but something that I just now had time to do.

For my research, pseudo-random numbers are perfect; if you use the same seed at the beginning, you get the same numbers back every time. I don’t even care if the stream is predictable by an outside observer; I just need a series of numbers that are uniformly distributed over a range. This is true for many scientific applications.

That means my “randomness” test is pretty simple at the heart: I just have to make sure the distribution is uniform.

Here’s the random number generator I’m using. It’s from The C++ Programming Language by Bjarne Stroustrup.

class Randint {
  unsigned long randx;
public:
  Randint(long s = 0) { randx=s; }
  void seed(long s) { randx=s; }
  //magic numbers chosen to use 31 bits of a 32-bit long
  static long abs(long x) {return x & 0x7fffffff; }
  static double max() { return 2147483648.0; } // note: a double
  long draw() { return randx = randx * 1103515245 + 12345; }
  double fdraw() { return abs(draw()) / max(); } // in the interval [0,1]
  long operator()() { return abs(draw()); } // in the interval [0,2**31]
};
class Urand : public Randint { // uniform distro in the interval [0:n[
public:
  Urand(long s = 0) : Randint(s) { }
  long next(long n) { long r = long(n*fdraw()); return (r==n) ? n-1: r; }
};

So you can see that we have a nice concise class here, then some needless inheritance. Yup, must be a Stroustrup design. (Well, in his defence, I think he was using random number generation to illustrate principles of inheritance.)

OK, let’s jump forward a couple of years to yesterday. I’ve read about 3/4 megabyte of other people’s C++ in the interim and discovered that my advisor was right all along: Just Pretend It’s C.

#define N 5
#define TRIALS 300000
void random_test() {
  Urand uni((unsigned)time(0));
  int is[N];
  for(int i = 0; i < N; i++) {
    is[i] = 0;
  }
  for(int _ = 0; _ < N * TRIALS; _++) {
    is[uni.next(N)]++;
  }
  for(int i = 0; i < N; i++) {
    cout << is[i] << ' ';
  }
  cout << endl;
}

Well, you can see right away a C-like use of the preprocessor to define constants. I remember Stroustrup says not to do that, but I can’t remember what the replacement is. So I Just Pretend It’s C.

Next, I declare a random number generator and an array to count how many times each number in the range N shows up.

Now there are three loops. Loop 1 initialises the array to 0. Loop 2 generates N*TRIALS numbers in the range N. Every time a number shows up, it increments the appropriate elements in the counting array. I used ++ because the languages I normally use don’t support it and it was exciting.

Finally, I left-shift cout by a bunch of numbers. Wait, I mean “print out the results". Because there are N elements in the array and N*TRIALS numbers generated, each element should be almost equal to TRIALS. In other words, the output should be something like this:

300902 299324 299833 300159 299782

In fact, because this is C, you should actually check that the sum of those numbers is N*TRIALS to make sure you weren’t incrementing like, is[N] or is[-1] by mistake.

OK, so that’s the simple case. In addition, I want to verify that the specific ways I use the random number generator are working. So I need to test two things: (1) does my sampling code (choice) work? (2) does my shuffling code (shuffle) work?

template <class T> T choice(const vector<T>& l) {
  static Urand uni((unsigned)time(0));
  return l[uni.next(l.size())];
}

template <class T> void shuffle(vector<T>& l) {
  static Urand uni((unsigned)time(0));

  for(int i = l.size(); i > 1; i--) {
    int j = uni.next(i);
    T tmp = l[j];
    l[j] = l[i-1];
    l[i-1] = tmp;
  }
}

Both functions cache their own static instance of the random number generator. I did it this way to avoid globals. I don’t know if this idiomatic C, because I don’t remember ever seeing static used in real code. Besides that, I don’t think there’s anything remarkable about this code. It’s the shuffle algorithm from Knuth, the efficient one that I can’t use in Haskell without dropping into the ST or IO monad. So that’s cool. I like it.

To test choice, we want to see whether each element in a list is chosen about the same number of times. Since Urand::next works, it’s pretty easy to suspect that choice works, but let’s write the test anyway.

void choice_test() {
  vector<int> l(N);
  int is[N];
  for(int i = 0; i < N; i++) {
    is[i] = 0;
    l[i] = i;
  }
  for(int _ = 0; _ < N * TRIALS; _++) {
    is[choice(l)]++;
  }
  for(int i = 0; i < N; i++) {
    cout << is[i] << ' ';
  }
  cout << endl;
}

Yeah, pretty much the same as random_test except there’s a vector l that holds the numbers 0-(N-1). Testing shuffle, on the other hand, is more interesting.

void shuffle_test() {
  vector<int> l(N);
  int rs[N][N];
  for(int i = 0; i < N; i++) {
    l[i] = i;
    for(int j = 0; j < N; j++) {
      rs[i][j] = 0;
    }
  }
  for(int _ = 0; _ < N * TRIALS; _++) {
    shuffle(l);
    for(int i = 0; i < N; i++) {
      rs[i][l[i]]++;
    }
  }
  for(int i = 0; i < N; i++) {
    for(int j = 0; j < N; j++) {
      cout << rs[i][j] << ' ';
    }
    cout << endl;
  }
}

Again there are three loops, but now they’re nested and the test array is 2D. That’s because the way to test shuffle is to make an vector l of length N, shuffle it N*TRIALS times, and make sure that each element of l appears in each position approximately TRIALS times. So if N=3 and TRIALS=1000, then we expect 0 to appear in position 0 1000 times, position 1 1000 times and position 2 1000 times. Likewise, we expect 1 to appear in position 0 1000 times, position 1 1000 times and position 2 1000 times. Finally, 2 should appear in position 0 1000, 1 1000 and 2 1000. So you can see that the test matrix has to be NxN.

012
0100010001000
1100010001000
2100010001000

Here is some example output for the actual definitions of N and TRIALS from above (5 and 30000). Again, summing each row to make sure that they’re all equal to N*TRIALS is a good idea.

300943 299623 299526 300206 299702
299383 299473 300661 299518 300965
300022 299529 300791 300685 298973
299796 300818 299084 299717 300585
299856 300557 299938 299874 299775

Postscript

I have to say that I didn’t think the generator I implemented really worked. I figured either a throwaway example in The C++ Programming Language would be too simple, or I would have made a mistake implementing it somewhere. Neither turned out to be true. Well, I did have a bug in shuffle at first. Turns that you can’t swap things using references:

    T& tmp = l[j];
    l[j] = l[i-1];
    l[i-1] = tmp;

You have to make a real temporary copy of the thing.

    T tmp = l[j];
    l[j] = l[i-1];
    l[i-1] = tmp;

At least if your input list is of type vector<T>; I suppose you could do it with a vector<T*> in which case you have an additional indirection all the time. So never mind.

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)
powered by b2evolution free blog software