228 Ruby BASIC

Description

10 PRINT "Hello Rubyists"
20 END

Inspired by one of the solutions from quiz #223, this week’s quiz is to write a BASIC interpreter.

There are many different versions of BASIC with some subtle differences, so for the purpose of this quiz let’s focus on unstructured Dartmouth BASIC.

Everyone should feel encouraged to post some samples of their favorite BASIC programs for testing.

Have fun!

Summary

This summary was written by Jean Lazarou.

We received no solution for this quiz. Therefore we are going to present the one we wrote.

The basic language reference we consider is the one from Dartmouth college (back in October 1964). The language is rather simple, it only computes numbers. It does not contain any string handling, except for output printing. The variable naming is limited to a single letters followed by an optional digit. It does not support interactive input.

The language

Let us start with a short description of the language. For a deeper specification see annex or get the Dartmouth College’s document.

Each line of a basic program is a statement. Most statements are instructions to execute.

A line must start with a line number. After the line number comes a word denoting the type of the statement. Basic has fifteen statement types:

Our implementation does not support the DEF and DIM statements.

Source code does not require lines to be ordered. Basic uses only capital letters and spaces are optional. Expression must appear on a single line.

The END statement must be the highest numbered statement, including the data statements. A number must contain up to nine digits with an optional minus sign and an optional decimal point.

As basic programs have no input statements, a program uses the data statements for data input.

A simple program

Next program outputs the first ten square roots.

10 LET X = 0
20 LET X = X + 1
30 PRINT X, SQR(X)
40 IF X <= 10 THEN 20
50 END

The parser

We are going to write a parser and an interpreter. Separating the parsing phase from the interpretation (or the run) makes writing tests easier, we separate the syntaxical aspects from the running aspects.

Parsing basic code is not very difficult, the syntax is not complex. We use a regular expression to validate each line.

01  unless line =~ /^(\d+) *(DATA|DEF|DIM|END|FOR|GOSUB|GOTO|IF|LET|NEXT|PRINT|READ|REM|RETURN|STOP) *(.*)$/ 
02    raise error_message("INVALID STATEMENT", line_number, line) 
03  end

The regular expression matches a string starting with digits, followed by optional spaces, one of the valid basic keywords, optional spaces and anyhting. We use regular expression’s groups so that if the line matches our rule the global variables $1 contains the line number, $2 contains the statement and $3 the rest of the line. We discard any space in between.

After validating the statement type, we need statement specific validation. We use a hash table that maps the statement type to a method that parses each statements.

01   STATEMENT_PARSERS = { 
02     'DATA' => :parse_data, 
03     'END' => :parse_end, 
04     'FOR' => :parse_for, 
05     'GOSUB' => :parse_gosub, 
06     'GOTO' => :parse_goto, 
07     'IF' => :parse_if, 
08     'LET' => :parse_let, 
09     'NEXT' => :parse_next, 
10     'PRINT' => :parse_print, 
11     'READ' => :parse_read, 
12     'REM' => :parse_rem, 
13     'RETURN' => :parse_return, 
14     'STOP' => :parse_stop 
15   }

Each parse method needs the statement line number and the rest of the line (the part to parse). It also needs the current position in the script and the whole line to be able to generate a valuable error message.

01   statement = self.send(method, line_number, line, $1.to_i, $3) 
02   statements << statement

The parser sends the message (the parse method) to it self. If the parse method successfully parses the statement it returns a Satetement object. We add the statement object to the statements array.

Once the parser consumes all the input program, we need to sort the satements because the basic language does not require the statement to be ordered.

01 statements.sort!

The sorting works because the Statement objects provide an implementation of the comparison method (<=>). The comparison method compares the line numbers. Here is the Statement base class.

01 class Statement 
02   
03   attr_reader :line, :arguments 
04   
05   def initialize line, type, arguments = nil 
06     @line, @type, @arguments = line, type, arguments 
07   end 
08   
09   def <=> other 
10     @line - other.line 
11   end 
12   
13   def to_s 
14     "#{@line} #{@type} #{@arguments}".rstrip 
15   end 
16   
17 end

We are not going to present all the parse methods, as an example we are going to look at parse_goto which is rather simple. A goto statement looks like: 10 GOTO 90. Which means: the effect of line 10 is to jump at line 90.

01   def parse_goto line_number, source, statement_line, arguments 
02     
03     scanner = BasicScanner.new(arguments) 
04     
05     scanner.skip_spaces 
06     to_line = scanner.scan_line 
07     
08     raise error_message("MISSING TO LINE", line_number, source) unless to_line 
09     
10     scanner.skip_spaces 
11     raise error_message("INVALID 'GOTO' STATMENT", line_number, source) unless scanner.eos? 
12     
13     GotoStatement.new(statement_line, to_line.to_i) 
14     
15   end

We use the BasicScanner class which derives from the StringScanner class. The reason for extending StringScanner is to add methods, like skip_spaces, that makes it easier to read the code, not to mention that regular expressions are not easy to read.

The arguments parameter is a string containing everything following the basic statement type. We first skip spaces (line 5), then we try retrieving the line number to go to (line 6). If we don’t find a line number, we raise an error (line 8). Otherwise we skip trailing spaces (line 10) and check if we reached the end of the input (line 11). Finally, input was a valid GOTO statement and we return a GotoStatement.

Expressions

When a statement expects an expression the parse methods use the parse_expression method.

parse_expression

The array of tokens is still an infix expression. The reason is that we can easier implement the to_s method use by the to_s method of a Statement object which produces a valid basic source code.

The to_a method returns the array of tokens.

Expression
Testing the parser

Testing the parser is easy, here are some examples.

Testing the expression parsing.

01   expr = @parser.parse_expression('INT(X/Y) + 2') 
02   assert_equal ['INT', '(', :X, '/', :Y, ')', '+', 2], expr.to_a

Testing converting the expression to the postfix version.

01   expr = @parser.parse_expression('SQR(4) + INT(Y/2) * 3') 
02   assert_equal [4, 'SQR', :Y, 2, '/', 'INT', 3, '*', '+'], expr.to_postfix

Testing that spaces are optional.

01   def test_program_witout_spaces 
02 
03     program = <<PROGRAM 
04 10PRINT"HELLO"
05 20END
06 PROGRAM 
07 
08     statements = @parser.parse(program) 
09 
10     assert_equal 2, statements.length 
11     
12     assert_equal '10 PRINT "HELLO"', statements[0].to_s 
13     assert_equal '20 END', statements[1].to_s 
14 
15   end

We use the to_s method of the Statement objects which returns a canonical string to make the test is easier to read. (notice that reading source code without spaces is not easy – line 4)

The interpreter

Now that a parser returns an array of Statement objects, executing the program is easy. We loop through them and keep track of the program position (we need a bit more, see later). The interpreter uses a runtime object which stores the programs’s variables and the program position. Here is the main loop of the interpreter.

01   def run_statements statements 
02     
03     runtime = create_runtime(statements) 
04     
05     while runtime.running? 
06     
07       statement = statements[runtime.program_pos] 
08       
09       statement.execute(runtime) 
10       
11       runtime.program_pos += 1 
12       
13     end 
14     
15     runtime 
16     
17   end

The runtime object has also a running state.

The interpreter expects the statement objects to implement the execute method (line 9). The statement classes in the parser file (basic_parser.rb) do not offer execute methods, we re-open the classes here in the interpreter file (basic.rb). Therefore our files are shorter and the separation of responsibilities does not imply doubling the set of classes.

At the end of the method (line 15) we return the runtime to the calling code so that it can access the variables.

The runtime object

Let us fisrt look at the runtime class layout.

01 class Runtime 
02   
03   attr_accessor :program_pos 
04   
05   def initialize console, data 
06   end 
07   
08   def running? 
09   end 
10 
11   def next_data 
12   end 
13   
14   def end 
15   end 
16   
17   def print line 
18   end 
19   
20   def [] var 
21   end 
22   
23   def []= var, value 
24   end 
25     
26   def each_var 
27   end 
28   
29   def save_pos name, subject 
30   end 
31   
32   def restore_pos name 
33   end 
34   
35 end

A Runtime instance needs a console where the output is sent and an array of all the data available for the read statements.

The program position can be read or changed, for instance the GOTO statement changes it. Notice that the program position is an index in the program’s array of statements, not the source code line number.

The next_data returns the sequence of data or nil when all the data is consumed.

The end method puts the state to program ended.

The print method is used by the PRINT statement.

The [], []= and each_var give access to the current values of the program’s variables.

The last methods, save_pos and restore_pos are used to save the current program position and to retrieve the saved position. A name is accociated with each save, it is useful to validate the correct sequence of the NEXT statements and the RETURN statements. The saved positions are pushed on a stack and must be consumed in reverse order.

Creating the runtime object

The create_runtime method instantiate a Runtime object.

It loops through all the statements creating:

After the first loop, it starts a second loop to set the mapped line number to the program position for all the goto-like statements. The test to find what statements are goto-like is done by searching all the objects that respond to the program_pos= message, needed to set the actual program position.

Turn the Statement objects into runnable objects…

In the interpreter file we re-open the Statement classes and add the execute methods. The default implementation does nothing (some statements like DATA and REM just inherit the no-operation execute).

01 class Statement 
02   
03   def execute runtime 
04   end 
05   
06 end

One interresting thing in Ruby is that we do not need to redeclare inheritance when we re-open a class. Therefore, if we ever need to change the class hierarchy we only need to change it at one place. If we declared the inheritance we would have to give exactly the same.

We are going to look at the STOP and the GOSUB statements.

01 class StopStatement 
02   
03   def execute runtime 
04     runtime.end 
05   end 
06   
07 end

(We do not repeat that the class derives from the Statement class.)

The implementation for stop is very simple, it only needs to notify the runtime that the program stops running (line 4).

The gosub is just a bit more complicated.

01 class GosubStatement 
02   
03   def execute runtime 
04     runtime.save_pos :gosub, self 
05     runtime.program_pos = @to_program_pos 
06   end 
07   
08   def program_pos= line 
09     @to_program_pos = line - 1 
10   end 
11   
12 end

We ask the runtime to save the current program position, because when we later meet the return statement we must continue at the next statement relative to the current position. We register the save position with a special name, the :gosub symbol (line 4). And we set the position where we want to jump (line 5). Actually we set the goto index – 1 (see the way we store the line index at line 9).

Saving the current position prepares the runtime for a later use when the interpreter meets the return statement.

01 class ReturnStatement 
02   
03   def execute runtime 
04     
05     gosub = runtime.restore_pos(:gosub) 
06     
07     raise "UNMATCHED 'GOSUB' AT LINE #{@line}" unless gosub 
08     
09   end 
10   
11 end
Testing the interpreter

The tests that validate the interpreter look like:

01 
02 require 'test/unit' 
03 require 'basic' 
04 
05 class TestBasic < Test::Unit::TestCase 
06   
07   def test_print_one_string 
08     
09     program = <<BASIC 
10 10 PRINT 'HELLO'
11 20 END
12 BASIC 
13 
14     @basic.run program 
15 
16     assert_equal 1, @console.lines.length 
17     assert_equal ["HELLO"], @console.lines 
18     
19   end 
20   
21 end

The test requires the unit test framework and the interpreter (line 3). We call the interpreter, at line 14, with the program. Next we assert we printed out one line (at line 16) and the output content (line 17). The console attribute is a false console that stores all the printed lines. The interpreter needs a console and calls its print method. We create a test console.

01 class TestConsole 
02   
03   attr_reader :lines 
04   
05   def initialize 
06     @lines = [] 
07   end 
08   
09   def print line 
10     @lines << line 
11   end 
12   
13 end 

The test setup creates the interpreter.

01   def setup 
02     @console = TestConsole.new 
03     @basic = BasicInterpreter.new(@console) 
04   end 

If the interpreter was as follow, the test would pass.

01 class BasicInterpreter 
02   
03   def initialize console 
04     @console = console 
05   end 
06   
07   def run program 
08     @console.print "HELLO" 
09   end 
10   
11 end
 
$>ruby  basic_test.rb
Loaded suite basic_test
Started
E
Finished in 0.000369 seconds.
 
  1) Error:
test_print_one_string(TestBasic):
NoMethodError: undefined method `run' for nil:NilClass
    basic_test.rb:14:in `test_print_one_string'
 
1 tests, 0 assertions, 0 failures, 1 errors
Using programs for tests

We wrote some basic programs that we can use to validate our interpreter in automated tests. By using the technique presented above and adding some more methods, the tests become even easier to read. Here if the factorial program.

10 REM
15 REM COMPUTE FACTORIAL
20 REM
25 LET F = 1
30 READ N
31 LET X = N
35 IF N = 0 THEN 55
40 LET F = F * N
45 LET N = N - 1
50 GOTO 35
55 PRINT "FACT", X, "IS", F
60 GOTO 25
65 DATA 1, 3, 5, 6, 20
99 END

Here is the test that runs the factorial program.

01 class TestRunningBasicPrograms < Test::Unit::TestCase 
02 
03   def test_factorial 
04 
05     script 'fact.bas' 
06     
07     expects 'FACT', 1, 'IS', 1 
08     expects 'FACT', 3, 'IS', 6 
09     expects 'FACT', 5, 'IS', 120 
10     expects 'FACT', 6, 'IS', 720 
11     expects 'FACT', 20, 'IS', 2432902008176640000 
12     
13     assert_output_as_expected 
14     
15   end 
16 
17 end

At line 5 we set the program file under test, from line 7 up to 11 we declare the expectations and the last line (line 13) checks the outcome.

Tests using the runtime

As the interpreter returns the runtime object, we can use it during the assertion phase.

01   def test_1et 
02     
03     program = <<BASIC 
04 10 LET A = (23 + 5) / 2
05 40 END
06 BASIC 
07 
08     runtime = @basic.run(program) 
09 
10     assert_equal 14, runtime[:A] 
11     
12   end

We run the program and store the runtime in the runtime variable (line 8). Then we assert that the value of the variable A of the program is the one expected, using the runtime object.

Test results

The code attached to this summary contain tests. Here is the tests results.

 
$>ruby  basic_parser_test.rb
Loaded suite basic_parser_test
Started
.............
Finished in 0.008335 seconds.
 
13 tests, 56 assertions, 0 failures, 0 errors
 
$>ruby  basic_test.rb
Loaded suite basic_test
Started
.............
Finished in 0.020823 seconds.
 
13 tests, 23 assertions, 0 failures, 0 errors

You can download the code here.

Annex

Here is a BNF description of the language syntax.

 
 
  <program>    ::= <statement> {\n <statement>}
  <statement>  ::= <let> | <read> | <data> | <print> | <goto> | <if> | <for> | <next> |
                   <end> | <stop> | <def> | <gosub> | <return> | <dim> | <rem>
  <let>        ::= LET <variable> = <expression>
  <read>       ::= READ <variable> {, <variable>}
  <data>       ::= DATA <number> {, <number>}
  <print>      ::= PRINT <label> | <label> <expression> | <expression>
  <goto>       ::= GOTO <line-number>
  <if>         ::= IF <expression> <relational> <expression> THEN <line-number>
  <for>        ::= FOR <unsubscripted-var> = <expression> TO <expression> [STEP <expression>]
  <next>       ::= NEXT <unsubscripted-var>
  <end>        ::= END
  <stop>       ::= STOP
  <def>        ::= DEF FN <letter> (<unsubscripted-var>) = <expression>
  <gosub>      ::= GOSUB <line-number>
  <return>     ::= RETURN
  <dim>        ::= DIM <letter>(<integer>) | <letter>(<integer>, <integer>)
  <rem>        ::= <string>
  <variable>   ::= <letter> [<digit>] [(<integer>) | (<integer>, <integer>)]
  <label>      ::= "<string>"
  <relational> ::= < | = | > | <= | >= | <>
  <unsubscripted-var> ::= <letter> [<digit>]
 

Expressions may contain the following arithmetic operations: +, -, *, /, ^.

Basic has next built-in functions: SIN, COS, TAN, ATN, EXP, ABS, LOG, SQR, RND, INT


Saturday, January 30, 2010