173 Not So Random

Description

As part of Ruby’s standard library, we have access to a good pseudorandom number generator (PRNG), the Mersenne twister. (In particular, the MT19937 variant.) Ruby provides two kernel functions to make use of this generator: srand and rand. This week’s quiz involves generating random numbers in a predictable manner.

The first part is to create a wrapper around the random number generator rand, supporting the appropriate parameter (i.e. the parameter intrinsic to rand). Your wrapper should implement two additional functions: next which returns the next random number, and reset which restarts the sequence. So, for example:

irb> x = Random.new(100)
=> #<Random:...>

irb> Array.new(5) { x.next }
=> [51, 92, 14, 71, 60]

irb> Array.new(5) { x.next }
=> [20, 82, 86, 74, 74]

irb> x.reset
=> nil

irb> Array.new(5) { x.next }
=> [51, 92, 14, 71, 60]     # after reset, sequence restarts

You may do this as a class, as depicted here, or as a function, lambda, code block… whatever you like.

The second part is a little trickier: creating multiple, concurrent, reproducible sequences of pseudorandom numbers isn’t so easy. As convenient as the built-in generator is, there is only one seed. For example, assume we have two Random objects, created as shown above, x and y.

irb> x = Random.new(100)
=> #<Random:...> 

irb> Array.new(6) { x.next }
=> [51, 92, 14, 71, 60, 20]

irb> y = Random.new(100)
=> #<random:...>

irb> Array.new(6) { y.next }
=> [66, 92, 98, 17, 83, 57]

irb> x.reset
=> nil

irb> Array.new(2) { x.next }
=> [51, 92]       # ok: sequence restarted as requested

irb> Array.new(2) { y.next }
=> [14, 71]       # fail: this is part of _x_ sequence

irb> Array.new(2) { x.next }
=> [60, 20]       # more fail: _x_ is now [51, 92, 60, 20]? wrong...

The reason for the failure should be obvious: my current implementation of Random just blindly uses srand and rand without considering that there may be multiple instances of Random.

So, for this second part, expand your wrapper to support concurrent use. Please note that you are required to make use of the built-in generator: you should not implement your own PRNG.

One final note… It is up to you whether the seed for each wrapper will be user-settable or hidden (as in my examples above). However, if hidden, each wrapper should have a different seed. (Generated randomly, perhaps?)

Summary

Procedural generation is a valuable technique for computer graphics and games that relies on algorithms to produce content on the fly. Use of procedural techniques have increased in recent years, due to the large volume of content that must be created… too large or unwieldy to be completed by hand in a short duration. But procedural generation is not new. Early computer games such as Elite and Rescue on Fractalus used algorithms, rather than preformed data, to represent large worlds on machines with limited storage.

On one hand, CPU cycles are traded for storage limitations. On the other, CPU cycles are exchanged for human and schedule limitations. Essentially, procedural generation is a way to swap CPU time for other limited resources. But if the same algorithms are run repeatedly, won’t everything look the same? An algorithm, given the same input, by definition produces the same output.

That’s where pseudorandom numbers can help, both to provide variety and maintain repeatability. A random sequence starting with a seed value of 42 will – when put through a particular algorithm – always generate the same output. Yet, start the sequence with seed value 43, and the same algorithm will generate new output. We can generate a forest of randomly generated trees, each looking similar but unique, and not have the trees randomly change every frame.

Do we need more than one generator? It depends on your algorithms and how you generate the content. It could work, but it may be easier to keep some things separate. One random number generator for the forest. Another generator for the city buildings. Another one for people. That way, I can avoid generating information for the forest if it happens to go offscreen, because the PRNGs for the buildings and the people are independent of the forest PRNG.

Generally, for task such as this, you don’t usually need a generator such as the Mersenne Twister. A linear congruential generator, while not really very random, is quite simple, fast and often sufficient. For such a task, Ruby code of a few lines would have been completely sufficient.

But I wanted to see how we could convince Ruby’s rand to do what we wanted, without writing a new PRNG. I’m glad I asked, because after playing with closures and continuations, I conceived only one solution myself… I knew there had to be another, better way.

But first, let’s look at the first solution from Frederick Cheung, which implements the only idea I had.

class Random
   def initialize(*rand_args)
     @rand_args = rand_args
     ignored, @start_seed = srand, srand
     @sequence = 0
   end

   def next
     srand(@start_seed)
     @sequence.times { rand *@rand_args}
     @sequence += 1
     rand *@rand_args
   end

   def reset
     @sequence = 0
   end
end 

At first, I thought Frederick was the only one who implemented a solution that would allow rand to produce random floats between zero and one (i.e. what you get when you call rand with no parameters). I consider this important behavior. However, it turns out that passing zero to rand induces the same behavior, so props to everyone who provided a default value of zero.

This implementation is straightforward. Each time we call next to get the next random number in the sequence, we restart the sequence, calling rand a number of times equal to @sequence, which is incremented each call. This is very simple and guarantees we always get back to the correct point, whether or not other Random objects have generated numbers.

(Note that while I did mention concurrent in the quiz description, I did not mean to imply multithreaded, so for our purposes here, this is sufficient.)

I found this line in the initializer a little curious:

   ignored, @start_seed = srand, srand

srand returns the previous seed. Calling it twice in a row will first set the seed, then return that and set it again to a new value. But the saved seed is what is used in calls to next. It’s a nice trick to get the (time-based) seed generated by srand.

While simple, this solution is highly inefficient. Procedural generation can require a lot of random numbers, and this PRNG algorithm is quadratic. (That is, O(n^2).) Very quickly, this generator would become painfully slow.

One possible improvement, provided in a few solutions, is caching. Since we’re generating the same exact data, we can save our work and reuse it later. In our contrived situation, This isn’t as bad as it sounds. Although memory use must be considered, it’s not unlikely that a particular Random object – a particular sequence – will always generate the same quantity of numbers every time. If we’re not uprooting trees in that forest, it’s likely we need to continue generating the same number of trees. Of course, in typical situations, you wouldn’t force yourself into this quiz’s confines, but if a small cache and a one-time generation is sufficient, you could trade a little memory for CPU time, if need be.

To finish up, let’s look at brabuhr’s clean, compact solution.

require 'slave'

class Random
  def initialize(ceiling, seed = nil)
    @ceiling = ceiling
    @seed = seed || rand(2112)
    @slave = Slave::new{
      lambda {|reset|
        if reset
          srand(@seed)
        else
          rand(@ceiling)
        end
      }
    }
    self.reset
  end

  def next
    @slave.object.call(false)
  end

  def reset
    @slave.object.call(true)
  end
end 

The Slave library combines a forked Ruby process with DRb (Distributed Ruby) and creates a client-server relationship between the two processes. This relationship is created in the call to Slave::new, which passes the following block to the slave:

      lambda { |reset|
        if reset
          srand(@seed)
        else
          rand(@ceiling)
        end
      }

Calling this block with a boolean (as done in the next and reset methods) simply invokes either srand or rand. But because the slave object is a forked Ruby process, it contains a separate built-in seed for those methods. Each Random instance will create such a slave, thereby creating independent seeds that cannot interfere with one another. In my own experience, I am so used to threading (i.e. shared state and address space), the idea of using a forked process (i.e. independent state and address space) didn’t cross my mind.

Using Slave may have some overhead from the interprocess communication. However, it is a constant-time operation: it should easily beat the first Random implementation for performance for typical use.


Wednesday, February 04, 2009