168 Statistician II

Description

Last week’s quiz started the creation of a line-based pattern-matching system: our statistician. This week, your task is to further develop a solution from last week: organize the code and provide a more interesting interface.

The first thing is organization. This little library should be reusable and not tied to any particular parsing need, so we want to separate the “Statistician” from the client. That means moving the appropriate code into a separate file called statistician.rb, containing:

# statistician.rb

module Statistician
  # This module is your task! Your code goes here...
end

Meanwhile, the client code will now begin with:

# client.rb

require 'statistician'

Simple, eh?

Next, we will move the rules out of their own data file and bring them into the code. Admittedly, moving data into code usually is not a wise thing to do, but since the primary data is what the rules parse, not the rules themselves, we’re going to do it anyway. Besides, this is Ruby Quiz, so why not?

Simultaneously, we’re going to group rules together: rules that, while they may differ somewhat in appearance, essentially represent the same kind or category of data. As the rules and categories are client data, they will go into the client’s code. Here’s an example to begin, borrowing the LotRO rules used last week.

# client.rb

class Offense < Statistician::Reportable
  rule "You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage]."
  rule "You reflect <amount> point[s] of <kind> damage to[ the] <name>."
end

class Victory < Statistician::Reportable
  rule "Your mighty blow defeated[ the] <name>."
end

Next, we need a parser (or Reporter, as I like to call it) that can manage these rules and classes, read the input data and process it all line by line. Such client code looks like this:

# client.rb

lotro = Statistician::Reporter.new(Offense, Victory)
lotro.parse(File.read(ARGV[0]))

Finally, we need to begin getting useful information out of all the records that have been read and parsed by the Reporter. After the data is parsed, the final bit will be to support code such as this:

# client.rb

num = Offense.records.size
dmg = Offense.records.inject(0) { |s, x| s + x.amount.to_i }
puts "Average damage inflicted: #{dmg.to_f / num}"

puts Offense.records[0].class   # outputs "Offense"

What is going on here? The class Offense serves three purposes.

1. Its declaration contains the rules for offense-related records.

2. After parsing, the class method records returns an array of records that matched those rules.

3. Those records are instances of the class, and instance methods that match the field names (extracted from the rules) provide access to a record’s data.

Hopefully this isn’t too confusing. I could have broken up some of these responsibilities into other classes or sections of code, but since the three tasks are rather related, I thought it convenient and pleasing to group them all into the client’s declared class.

Below I’ll give the full, sample client file I’m using, as well as the output it generates when run over the hunter.txt file we used last week. A few hints, first…

1. You are welcome to make statistician.rb depend on other Ruby modules. I personally found OpenStruct to be quite useful here.

2. Personally, I found making Offense inherit from Reportable to be the cleanest method. At least, it is in my own code. There may be other ways to accomplish this goal: by include or extend methods. If you find those techniques more appealing, please go ahead, but make a note of it in your submission, since it does require changing how client code is written.

3. Metaprogramming can get a bit tricky to explain in a couple of sentences, so I’ll leave such hints and discussion for the mailing list. Aside from that, there are some good examples of metaprogramming in past Ruby Quizzes. Of particular interest would be the metakoans.rb quiz.

4. Finally, my own solution for this week’s quiz is just under 80 lines long, so it need not be overly complex to support the client file below.
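For hint 2, here is a rough sketch of what an extend-based variant might look like. It covers rule registration only, and the names Reporting and Triumph are hypothetical; the quiz itself only specifies Statistician::Reportable. Note how the client writes extend rather than inheriting, which is the change to client code that the hint asks you to mention.

```ruby
# Hypothetical module; stands in for an extend-based Reportable.
module Reporting
  # Ruby calls this hook when a class does `extend Reporting`.
  # It gives that class its own, separate arrays.
  def self.extended(klass)
    klass.instance_variable_set(:@rules, [])
    klass.instance_variable_set(:@records, [])
  end

  # Because the module is mixed in with extend, these become class methods.
  attr_reader :rules, :records

  def rule(pattern)
    @rules << pattern   # a real solution would compile the pattern here
  end
end

class Triumph           # hypothetical client class
  extend Reporting
  rule "Your mighty blow defeated[ the] <name>."
end

p Triumph.rules     # ["Your mighty blow defeated[ the] <name>."]
p Triumph.records   # []
```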

Here is the complete, sample client file:

require 'statistician'

class Defense < Statistician::Reportable
  rule "[The ]<name> wounds you[ with <attack>] for <amount> point[s] of <kind>[ damage]."
  rule "You are wounded for <amount> point[s] of <kind> damage."
end

class Offense < Statistician::Reportable
  rule "You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage]."
  rule "You reflect <amount> point[s] of <kind> damage to[ the] <name>."
end

class Defeat < Statistician::Reportable
  rule "You succumb to your wounds."
end

class Victory < Statistician::Reportable
  rule "Your mighty blow defeated[ the] <name>."
end

class Healing < Statistician::Reportable
  rule "You heal <amount> points of your wounds."
  rule "<player> heals you for <amount> of wound damagepoints."
end

class Regen < Statistician::Reportable
  rule "You heal yourself for <amount> Power points."
  rule "<player> heals you for <amount> Power points."
end

class Comment < Statistician::Reportable
  rule "### <comment> ###"
end

class Ignored < Statistician::Reportable
  rule "<player> defeated[ the] <name>."
  rule "<player> has succumbed to his wounds."
  rule "You have spotted a creature attempting to move stealthily about."
  rule "You sense that a creature is nearby but hidden from your sight."
  rule "[The ]<name> incapacitated you."
end


if __FILE__ == $0
  lotro = Statistician::Reporter.new(Defense, Offense, Defeat, Victory,
                                     Healing, Regen, Comment, Ignored)
  lotro.parse(File.read(ARGV[0]))

  num = Offense.records.size
  dmg = Offense.records.inject(0) { |sum, off| sum + Integer(off.amount.gsub(',', '_')) }
  d = Defense.records[3]

  puts <<-EOT
Number of Offense records: #{num}
Total damage inflicted: #{dmg}
Average damage per Offense: #{(100.0 * dmg / num).round / 100.0}

Defense record 3 indicates that a #{d.name} attacked me
using #{d.attack}, doing #{d.amount} points of damage.

Unmatched rules:
#{lotro.unmatched.join("\n")}

Comments:
#{Comment.records.map { |c| c.comment }.join("\n")}

  EOT
end

And here is the output it generates, using the hunter.txt data file:

Number of Offense records: 1300
Total damage inflicted: 127995
Average damage per Offense: 98.46

Defense record 3 indicates that a Tempest Warg attacked me
using Melee Double, doing 108 points of damage.

Unmatched rules:
The Trap wounds Goblin-town Guard for 128 points of Common damage.
Nothing to cure.

Comments:
Chat Log: Combat 04/04 00:34 AM

Summary

I don’t know if it was the metaprogramming that scared people away this week, or perhaps folks are away on summer vacations. In any case, I’m going to summarize this week’s quiz by looking at the submission from Matthias Reitinger. The solution is, as Matthias indicates, unexpectedly concise. “I guess that’s just the way Ruby works.”

Matthias’ code implements the Statistician module in three parts, each a class. Here is the first class, Rule:

class Rule
  def initialize(pattern)
    @fields = []
    pattern = Regexp.escape(pattern).gsub(/\\\[(.+?)\\\]/, '(?:\1)?').
      gsub(/<(.+?)>/) { @fields << $1; '(.+?)' }
    @regexp = Regexp.new('^' + pattern + '$')
  end
  
  def match(line)
    @result = if md = @regexp.match(line)
      Hash[*@fields.zip(md.captures).flatten]
    end
  end
  
  def result
    @result
  end
end

Rule makes use of regular expressions built up as discussed in the previous quiz, so I’m not going to discuss that here. I will point out, though, how the @fields member is populated in the initializer. Note the last gsub call: it uses the block form of gsub.

gsub(/<(.+?)>/) { @fields << $1; '(.+?)' }

Because the '(.+?)' string is the last expression evaluated in the block, it becomes the replacement text. Along the way, Matthias uses the just-matched expression ($1) to collect the field names. This avoids a second pass over the source string to get those field names, and is arguably simpler.
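To see the block form in isolation, here is a minimal sketch run against one of the quiz’s rule strings. It skips the Regexp.escape call and the optional-bracket handling, and the local names fields and regexp_source are mine, not part of Matthias’ code.

```ruby
# Block-form gsub: the block's return value is the replacement, and $1
# holds the text captured by the first group of the current match.
pattern = "You reflect <amount> point[s] of <kind> damage to[ the] <name>."

fields = []
regexp_source = pattern.gsub(/<(.+?)>/) do
  fields << $1   # side effect: remember the field name
  '(.+?)'        # last expression: the text that replaces "<...>"
end

p fields             # ["amount", "kind", "name"]
puts regexp_source   # You reflect (.+?) point[s] of (.+?) damage to[ the] (.+?).
```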

The match method matches input lines against the regular expression, returning nil if the input didn’t match, or a hash if it did. Field names (@fields) are first paired (zip) with the matched values (md.captures), then flatten-ed into a single array, finally expanded (*) and passed to a Hash initializer that treats alternate items as keys and values. The end result of Rule#match, when the input matches, is a hash that looks like this:

{ 'amount' => '108', 'name' => 'Tempest Warg' }

That hash is returned, but also stored internally into member @result for future reference, accessed by the last method, result.
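The zip, flatten and Hash[] chain is easier to follow one step at a time. In this sketch the captures array is written out by hand to stand in for md.captures, and the intermediate variable names are only for illustration.

```ruby
field_names = ['name', 'amount']        # field names, in rule order
captures    = ['Tempest Warg', '108']   # stand-in for md.captures

pairs  = field_names.zip(captures)  # [["name", "Tempest Warg"], ["amount", "108"]]
flat   = pairs.flatten              # ["name", "Tempest Warg", "amount", "108"]
record = Hash[*flat]                # {"name"=>"Tempest Warg", "amount"=>"108"}

# On Ruby 2.1 and later, Array#to_h builds the same hash from the pairs
# directly, with no flatten and no splat.
same = pairs.to_h

p record
p same == record   # true
```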

The next class is Reportable:

class Reportable < OpenStruct
  class << self
    attr_reader :records
    
    def inherited(klass)
      klass.class_eval do
        @rules, @records = [], []
      end
      super
    end
    
    def rule(pattern)
      @rules << Rule.new(pattern)
    end
    
    def match(line)
      if rule = @rules.find { |rule| rule.match(line) }
        @records << self.new(rule.result)
      end
    end
  end
end

This small class is the extent of the metaprogramming going on in the solution, and it’s not much, though perhaps unfamiliar to some. Let’s get into some of it. We’ll ignore the OpenStruct inheritance for the moment, coming back to it later.

Everything inside the Reportable class is surrounded by a block that opens with class << self. There is a good summary on the Ruby Talk mailing list, but its use here can be summed up in two words: class methods. The class << self mechanism is not strictly about class methods (it opens the singleton class of whatever object follows the <<), but in this context that is its effect. Alternatively, these methods could have been defined in this manner:

class Reportable < OpenStruct
  def Reportable.rule(pattern)
    # etc.
  end

  def Reportable.match(line)
    # etc.
  end

  # etc.
end

In the end, the class << self mechanism is cleaner looking, and also allows for use of attr_reader in a natural way.
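A small sketch shows the two styles side by side, including attr_reader inside the class << self block. Counter is a hypothetical class made up for this example.

```ruby
class Counter   # hypothetical example class
  @count = 0    # instance variable of the Counter class object itself

  class << self
    attr_reader :count   # defines Counter.count

    def increment        # defines Counter.increment
      @count += 1
    end
  end

  def self.reset         # the same kind of method, written the other way
    @count = 0
  end
end

Counter.increment
Counter.increment
p Counter.count                    # 2
p Counter.singleton_methods.sort   # [:count, :increment, :reset]
```

All three methods end up in the same place, Counter’s singleton class, whichever way they were written.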

The next interesting bit is the inherited method. This is a class method, here implemented on Reportable, that is called whenever Reportable is subclassed (which happens repeatedly in the client code). It’s a convenient hook that allows the other bit of metaprogramming to happen.

klass.class_eval do
  @rules, @records = [], []
end

klass is the class derived from Reportable (i.e. our client’s classes for future statistical analysis). Here, Matthias initializes two members, both to empty arrays, in the scope of class klass. This serves to ensure that every class derived from Reportable gets its own, separate members, not shared with other Reportable subclasses.
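Here is the hook on its own, stripped of everything specific to the quiz. Base, A and B are hypothetical names; the point is only that each subclass ends up with its own array.

```ruby
class Base   # hypothetical stand-in for Reportable
  class << self
    attr_reader :items

    # Ruby calls this each time Base is subclassed, before the
    # subclass body runs.
    def inherited(klass)
      klass.class_eval { @items = [] }   # inside the block, self is klass
      super
    end

    def add(item)
      @items << item
    end
  end
end

class A < Base
  add :a1
  add :a2
end

class B < Base
  add :b1
end

p A.items   # [:a1, :a2]
p B.items   # [:b1]
```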

This could be done without metaprogramming, but would require effort from the client.

class Reportable
  # class methods here
end

class Offense < Reportable
  @rules, @records = [], []
  # rules, etc.
end

class Defense < Reportable
  @rules, @records = [], []
  # rules, etc.
end

If the client forgot to initialize those two members, or got the names wrong, the class wouldn’t work, exceptions would be thrown, cats and dogs living together… you get the idea.

You might consider defining those data members in the Reportable class itself, like so:

class Reportable
  @rules, @records = [], []
  
  # class methods, without inherited
end

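The problem with this is that instance variables set this way belong to the Reportable class object alone; subclasses do not inherit them. Inside Offense or Defense, @rules would be nil, and the first call to rule would raise a NoMethodError. Switching to class variables (@@rules, @@records) swings too far the other way: every Reportable subclass would then share the same rules and records arrays. Neither is the desired outcome.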

In the end, the class_eval used here, called from inherited, is the right way to do things. It provides a way for the superclass to inject functionality into the subclass.

Getting back to functionality, Reportable.match (a class method) is straightforward, but let me highlight one line:

@records << self.new(rule.result)

If you recall, result returns a hash of field names to values. Reportable passes that hash to its own initializer, yet it defines none. This is where OpenStruct comes in.

OpenStruct “allows you to create data objects and set arbitrary attributes.” And OpenStruct provides an initializer that takes the hash Matthias provides, and does the expected.

require 'ostruct'

data = OpenStruct.new( {'amount' => '108', 'name' => 'Tempest Warg'} )
p data.amount     # -> "108"
p data.name       # -> "Tempest Warg"

Because Reportable derives from OpenStruct, all of the client’s classes inherit the same behavior, which fulfills the quiz’s requirement that records expose their fields through instance methods.
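The same thing works through a subclass, which is what the client’s classes are. In this sketch Wound is a hypothetical class with no initializer or accessors of its own; everything comes from OpenStruct.

```ruby
require 'ostruct'

class Wound < OpenStruct   # hypothetical; defines nothing itself
end

w = Wound.new({ 'amount' => '108', 'name' => 'Tempest Warg' })

p w.class    # Wound
p w.amount   # "108"
p w.name     # "Tempest Warg"
p w.attack   # nil, since no such field was supplied
```

The nil in the last line is what a record returns for an optional field, such as attack, when the matching line left it out.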

The final class, Reporter, is trivial.

class Reporter
  attr_reader :unmatched
  
  def initialize(*args)
    @reportables = args
    @unmatched = []
  end
  
  def parse(data)
    data.each_line do |line|
      line.strip!
      @reportables.find { |rep| rep.match(line) } or @unmatched << line
    end
  end
end

It reads through a data source a line at a time, either finding a matching rule (and creating the appropriate record in the process) or adding the input line to @unmatched, which the client can query later.
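To try the pieces together, the three classes can be assembled into one file along these lines. This is a lightly condensed sketch rather than Matthias’ exact submission: the module wrapper and the require of ostruct are my assumptions about how the excerpts fit together, and the first two log lines are invented in the style of the LotRO log rather than taken from hunter.txt.

```ruby
require 'ostruct'

module Statistician
  class Rule
    attr_reader :result

    def initialize(pattern)
      @fields = []
      pattern = Regexp.escape(pattern).gsub(/\\\[(.+?)\\\]/, '(?:\1)?').
        gsub(/<(.+?)>/) { @fields << $1; '(.+?)' }
      @regexp = Regexp.new('^' + pattern + '$')
    end

    # Returns a hash of field names to values, or nil if the line doesn't match.
    def match(line)
      @result = if md = @regexp.match(line)
        Hash[*@fields.zip(md.captures).flatten]
      end
    end
  end

  class Reportable < OpenStruct
    class << self
      attr_reader :records

      def inherited(klass)
        klass.class_eval { @rules, @records = [], [] }
        super
      end

      def rule(pattern)
        @rules << Rule.new(pattern)
      end

      def match(line)
        if rule = @rules.find { |r| r.match(line) }
          @records << new(rule.result)
        end
      end
    end
  end

  class Reporter
    attr_reader :unmatched

    def initialize(*reportables)
      @reportables = reportables
      @unmatched = []
    end

    def parse(data)
      data.each_line do |line|
        line.strip!
        @reportables.find { |rep| rep.match(line) } or @unmatched << line
      end
    end
  end
end

class Offense < Statistician::Reportable
  rule "You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage]."
end

class Victory < Statistician::Reportable
  rule "Your mighty blow defeated[ the] <name>."
end

# Sample input: the first two lines are invented for this sketch.
log = <<~EOT
  You wound the Tempest Warg with Melee Double for 108 points of Common damage.
  Your mighty blow defeated the Tempest Warg.
  Nothing to cure.
EOT

reporter = Statistician::Reporter.new(Offense, Victory)
reporter.parse(log)

p Offense.records.first.amount   # "108"
p Victory.records.first.name     # "Tempest Warg"
p reporter.unmatched             # ["Nothing to cure."]
```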


Wednesday, February 04, 2009