212 Genetic Programming

Description

Buon giorno Rubyists,

Let’s say that we have a table of outputs for an unknown function, and we want to find out what that function is. One possible method of finding a solution is to use genetic programming. In genetic programming we begin with a random population of programs, then rank them according to how well they meet the solution criteria. The top ranked programs are then modified to create a new generation of programs. The new generation is ranked by how well the solve the problem in the same way and the process repeats until, hopefully, we’ll have evolved a strong solution.

The modifications made to the programs are based on techniques observed in biological evolution: mutation and crossover. In mutation a piece of one program is altered randomly, for example x + 3 could become x + (y - 1). The mutation occurs at one node in the parse tree and replaces it with a new randomly generated node.

Crossover works in much the same way, except instead of randomly altering the node it is taken from another node of another program. For example: x + (y * x) and 3 - (y + x*x) could yield something like (y*x) - (y + x*x).

In order to mutate our programs it would be nice to work with the structure of the code directly. The ParseTree gem provides a way of getting an Abstract Syntax Tree to manipulate. The AST represents the code as S-expressions, arrays containing operators as the first argument and the operands as the remaining arguments. If you are familiar with LISP then you know that LISP programmers program directly in S-expressions. Using S-expressions makes evolution and manipulation of the programs much easier, as S-expressions are the essence of programs.

When manipulating your ASTs you may need a deep copy to prevent unwanted side effects Here is a deep copy idiom that may be useful:

def deep_copy(tree)
  Marshal.load(Marshal.dump(tree)
end

In order to get the AST back into executable ruby you’ll need ruby2ruby or something like it. As always, feel encouraged to use and share other tools if you feel they are good for the task. Also, if you have any questions don’t be afraid to ask, there are many people on the mailing list who are happy to help!

Here is the table of outputs for the mystery function that we would like to evolve an algorithm for:

["x", "y", "result"]
[8, 16, 20808]
[22, 31, 150847]
[5, 16, 20685]
[12, 19, 34895]
[18, 25, 79349]
[20, 33, 181525]
[31, 1, -119]
[19, 33, 181433]
[0, 12, 8640]
[13, 12, 9017]

In order to determine how well suited a program is we can use the following metric function:

# Return how far off from the data the given method is
# A lower score is better, a score of 0 matches the data perfectly
def fitness(program, data)
  delta = 0
  data.each do |row|
    value = program.calculate(row[0], row[1])
    delta += (value - row[2]).abs
  end
  return delta
end

Have Fun!

Summary

Sander Land’s solution used Charlie a library for genetic algorithms and genetic programming. Let’s take a look at Sander’s solution:

%w[rubygems charlie].each{|lib| require lib}
class Quiz212 < TreeGenotype([:x,:y,proc{rand(20)-10}],  [:-@], [:+,:*,:-])
  DATA = [[8, 16, 20808], [22, 31, 150847],  [5, 16, 20685], [12, 19, 34895], [18, 25, 79349],
         [20, 33, 181525], [31, 1, -119], [19, 33, 181433],  [0, 12, 8640], [13, 12, 9017]]
  SIZE_PENALTY = 100
  
  def fitness
    -DATA.inject(0) do |f,(x,y,z)| 
      f + ( z - eval_genes(:x=>x,:y=>y)).abs
    end - size * SIZE_PENALTY
  end
end

Here Sander creates a class named Quiz212. Quiz212 extends TreeGenotype, provided by the Charlie gem. TreeGenotype takes three arguments: terminals, unary_ops, binary_ops. These are used to construct the tree representing the program. The class also defines a fitness function which is used to evaluate the programs generated by TreeGenotype. The data, as well as a size penalty are stored as constants within the class.

The Charlie gem also provides a Population class. The population’s initializer takes a genotype class as a parameter and uses that to evolve the population. The population also provides an evolve_on_console method that displays the current best program each generation.

Population.new(Quiz212).evolve_on_console(200)

Charlie looks like a great way to get started trying out genetic programming. The TreeGenotype is just one of many possible options available in the library. Please do check it out.

Brabuhr provided a solution using Directed Ruby Programming. DRP uses Grammatical Evolution as a foundation for Genetic Programming. DRP let’s you define a grammar to constrain the kinds of programs that are generated. Here’s an example of defining an expression using DRP::RuleEngine:

def expression; "#{expression} #{binaryop} #{expression}"; end

We can also define a binary operator in a similar way:

def binaryop; "/"; end

These rules, along with any others that we define, are used by the rule engine as a grammar to generate candidate programs. brabuhr collects these generated programs and applies a fitness function to see how well their output matches the data. These programs are then sorted and mutated/crossed to create the next generation of programs. Be sure to take a look at brabuhr’s full solution for more details.

This week we saw two interesting options for working with genetic programming, Charlie, and DRP. Both of these libraries make working with genetic programming much easier by providing an interface to the common tools and techniques used.

And now what you’ve all been waiting for… the mystery function:

def mystery(x, y)
  3 * x * y + 5 * (y**3) - 7 * x
end

Thank you Sander and brabuhr for your solutions to this week’s quiz!

Genetic Programming (#212) - Solutions


Sunday, July 05, 2009