199 Restoring Data From SQL

Description

Ahoy Rubyists,

This week's quiz comes from a real-world situation. While our team was working on one of our internal applications, a bug was introduced that caused certain Composition database entries to be deleted when an associated Product was updated. It only happened with one particular way of updating the products, and it went undetected for a while. Now we've noticed that we are missing lots of Composition data, and we need it back fast!

Fixing the bug was trivial once it was discovered, but restoring the data? That may be much harder… Fortunately, we have lots of backup SQL files lying around, so we can piece the missing data back together. This week's quiz is to parse a sequence of SQL files and output a list of all records that ever existed in them.

Each file looks like this:

--
-- Definition of table `compositions`
--

DROP TABLE IF EXISTS `compositions`;
CREATE TABLE `compositions` (
 `id` int(11) NOT NULL auto_increment,
 `product_id` int(11) default NULL,
 `component_id` int(11) default NULL,
 `created_at` datetime default NULL,
 `updated_at` datetime default NULL,
 `quantity` decimal(15,3) default '1.000',
 `line_num` int(11) default NULL,
 `fixed` tinyint(1) default NULL,
 PRIMARY KEY  (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

--
-- Dumping data for table `compositions`
--

/*!40000 ALTER TABLE `compositions` DISABLE KEYS */;
INSERT INTO `compositions`
(`id`,`product_id`,`component_id`,`created_at`,`updated_at`,`quantity`,`line_num`,`fixed`)
VALUES
 (1,1,2,NULL,NULL,'1.000',NULL,NULL),
 (2,1,3,NULL,NULL,'1.000',NULL,NULL),
 (3,NULL,190,'2008-07-24 10:27:34','2008-07-24 10:27:34','5.000',NULL,NULL),
 /* ... more ... */
(135,259,358,'2008-11-19 16:50:36','2008-11-19 16:52:20','1.000',80,1);

Non-null column values in more recent files should take precedence over those same values in previous files. For example, if row 135 had a quantity of 2.000 in the second-to-last file and a quantity of 1.000 in the last file, then the final output should have a quantity of 1.000 for row 135, overwriting the previous 2.000 value.
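
As a minimal sketch of that rule (the values here are illustrative, and assume each file's row has already been parsed into a hash with the NULL columns left out), Ruby's Hash#merge gives exactly this behavior:

older = {:id => 135, :quantity => 2.000, :line_num => 80}  # from the second-to-last file
newer = {:id => 135, :quantity => 1.000}                   # from the last file; line_num was NULL
merged = older.merge(newer)
merged[:quantity]  #=> 1.0, the last file's value wins
merged[:line_num]  #=> 80, kept because the last file had NULL there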

SQL input files

Have Fun… and save our business!*

* 48-hour no-spoiler period still applies ;-)

Summary

No submissions this week. (There may have been a glitch that kept this quiz from ever being posted to the mailing list, which might explain it…)

Here’s the solution that I ended up using:

file_names = Dir['PROD_*'].sort  # oldest to newest, assuming names sort chronologically

compositions_found = false

# id => merged attributes; the default block gives each new id its own hash
compositions = Hash.new { |hash, id| hash[id] = {} }

file_names.each do |file_name|
  File.read(file_name).each_line do |line|
    # any /* ... */ comment (e.g. the ENABLE KEYS line) ends the INSERT block
    compositions_found = false if line =~ /\/\*/

    if compositions_found
      # crude comma split of one "(v1,v2,...)" tuple; anchoring the id to \d+
      # also skips the (`id`,`product_id`,...) column list after INSERT INTO
      next unless line =~
        /\((\d+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+)/

      if $2 != 'NULL'  # skip rows whose product_id is NULL
        composition = {:id => $1.to_i, :product_id => $2.to_i}
        composition[:component_id] = $3.to_i         if $3 != 'NULL'
        composition[:quantity]     = $6[1...-1].to_f if $6 != 'NULL'  # strip the quotes
        composition[:line_num]     = $7.to_i         if $7 != 'NULL'
        composition[:fixed]        = $8.to_i         if $8 != 'NULL'
        # later files overwrite earlier non-NULL values; the $4/$5 timestamps are ignored
        compositions[$1.to_i].merge!(composition)
      end
    else
      compositions_found = true if line =~ /INSERT INTO `compositions`/
    end
  end
end

# integer keys sort numerically, so records print in id order
compositions.sort.each do |id, composition|
  puts composition.inspect
end

It reads the comma-separated fields using a wonky regex (please let me know of a better way!). The output is inspected hashes, which I eval in another program to load the data back into the DB. It worked surprisingly well.
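
In case it helps a future reader, here is one less wonky sketch (the fields helper is just an illustration, and it assumes the quoted values never contain escaped quotes, which holds for this table's column types): pull the tuple out of the line and scan it for either a single-quoted string or a bare token, rather than splitting blindly on commas:

# Split one "(v1,'v2',...)" tuple into fields; each field is either a
# single-quoted string or a bare run of characters containing no comma.
def fields(line)
  tuple = line[/\((.*)\)/, 1] or return nil
  tuple.scan(/'[^']*'|[^,]+/)
end

fields(" (3,NULL,190,'2008-07-24 10:27:34','2008-07-24 10:27:34','5.000',NULL,NULL),")
#=> ["3", "NULL", "190", "'2008-07-24 10:27:34'",
#    "'2008-07-24 10:27:34'", "'5.000'", "NULL", "NULL"]

The NULL checks and quote stripping then work on the returned array instead of $1 through $8, and a quoted value containing a comma no longer breaks the parse.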


Monday, April 06, 2009