228 Ruby BASIC
Description
10 PRINT "Hello Rubyists"
20 END
Inspired by one of the solutions from quiz #223, this week’s quiz is to write a BASIC interpreter.
There are many different versions of BASIC with some subtle differences, so for the purpose of this quiz let’s focus on unstructured Dartmouth BASIC.
Everyone should feel encouraged to post some samples of their favorite BASIC programs for testing.
Have fun!
Summary
This summary was written by Jean Lazarou.We received no solution for this quiz. Therefore we are going to present the one we wrote.
The basic language reference we consider is the one from Dartmouth college (back in October 1964). The language is rather simple, it only computes numbers. It does not contain any string handling, except for output printing. The variable naming is limited to a single letters followed by an optional digit. It does not support interactive input.
The language
Let us start with a short description of the language. For a deeper specification see annex or get the Dartmouth College’s document.
Each line of a basic program is a statement. Most statements are instructions to execute.
A line must start with a line number. After the line number comes a word denoting the type of the statement. Basic has fifteen statement types:
DATADEFDIMENDFORGOSUBGOTOIFLETNEXTPRINTREADREMRETURNSTOP
Our implementation does not support the DEF and DIM statements.
Source code does not require lines to be ordered. Basic uses only capital letters and spaces are optional. Expression must appear on a single line.
The END statement must be the highest numbered statement, including the data statements. A number must contain up to nine digits with an optional minus sign and an optional decimal point.
As basic programs have no input statements, a program uses the data statements for data input.
A simple program
Next program outputs the first ten square roots.
10 LET X = 0 20 LET X = X + 1 30 PRINT X, SQR(X) 40 IF X <= 10 THEN 20 50 END
The parser
We are going to write a parser and an interpreter. Separating the parsing phase from the interpretation (or the run) makes writing tests easier, we separate the syntaxical aspects from the running aspects.
Parsing basic code is not very difficult, the syntax is not complex. We use a regular expression to validate each line.
01 unless line =~ /^(\d+) *(DATA|DEF|DIM|END|FOR|GOSUB|GOTO|IF|LET|NEXT|PRINT|READ|REM|RETURN|STOP) *(.*)$/
02 raise error_message("INVALID STATEMENT", line_number, line)
03 end
The regular expression matches a string starting with digits, followed by optional spaces, one of the valid basic keywords, optional spaces and anyhting. We use regular expression’s groups so that if the line matches our rule the global variables $1 contains the line number, $2 contains the statement and $3 the rest of the line. We discard any space in between.
After validating the statement type, we need statement specific validation. We use a hash table that maps the statement type to a method that parses each statements.
01 STATEMENT_PARSERS = {
02 'DATA' => :parse_data,
03 'END' => :parse_end,
04 'FOR' => :parse_for,
05 'GOSUB' => :parse_gosub,
06 'GOTO' => :parse_goto,
07 'IF' => :parse_if,
08 'LET' => :parse_let,
09 'NEXT' => :parse_next,
10 'PRINT' => :parse_print,
11 'READ' => :parse_read,
12 'REM' => :parse_rem,
13 'RETURN' => :parse_return,
14 'STOP' => :parse_stop
15 }Each parse method needs the statement line number and the rest of the line (the part to parse). It also needs the current position in the script and the whole line to be able to generate a valuable error message.
01 statement = self.send(method, line_number, line, $1.to_i, $3) 02 statements << statement
The parser sends the message (the parse method) to it self. If the parse method successfully parses the statement it returns a Satetement object. We add the statement object to the statements array.
Once the parser consumes all the input program, we need to sort the satements because the basic language does not require the statement to be ordered.
01 statements.sort!
The sorting works because the Statement objects provide an implementation of the comparison method (<=>). The comparison method compares the line numbers. Here is the Statement base class.
01 class Statement
02
03 attr_reader :line, :arguments
04
05 def initialize line, type, arguments = nil
06 @line, @type, @arguments = line, type, arguments
07 end
08
09 def <=> other
10 @line - other.line
11 end
12
13 def to_s
14 "#{@line} #{@type} #{@arguments}".rstrip
15 end
16
17 endWe are not going to present all the parse methods, as an example we are going to look at parse_goto which is rather simple. A goto statement looks like: 10 GOTO 90. Which means: the effect of line 10 is to jump at line 90.
01 def parse_goto line_number, source, statement_line, arguments
02
03 scanner = BasicScanner.new(arguments)
04
05 scanner.skip_spaces
06 to_line = scanner.scan_line
07
08 raise error_message("MISSING TO LINE", line_number, source) unless to_line
09
10 scanner.skip_spaces
11 raise error_message("INVALID 'GOTO' STATMENT", line_number, source) unless scanner.eos?
12
13 GotoStatement.new(statement_line, to_line.to_i)
14
15 endWe use the BasicScanner class which derives from the StringScanner class. The reason for extending StringScanner is to add methods, like skip_spaces, that makes it easier to read the code, not to mention that regular expressions are not easy to read.
The arguments parameter is a string containing everything following the basic statement type. We first skip spaces (line 5), then we try retrieving the line number to go to (line 6). If we don’t find a line number, we raise an error (line 8). Otherwise we skip trailing spaces (line 10) and check if we reached the end of the input (line 11). Finally, input was a valid GOTO statement and we return a GotoStatement.
Expressions
When a statement expects an expression the parse methods use the parse_expression method.
parse_expression- a number (a floating point value) for literals,
- a string for operators (
+,-,*,/,^), parentheses and built-in functions - a symbol for variables
The array of tokens is still an infix expression. The reason is that we can easier implement the to_s method use by the to_s method of a Statement object which produces a valid basic source code.
The to_a method returns the array of tokens.
ExpressionTesting the parser
Testing the parser is easy, here are some examples.
Testing the expression parsing.
01 expr = @parser.parse_expression('INT(X/Y) + 2')
02 assert_equal ['INT', '(', :X, '/', :Y, ')', '+', 2], expr.to_aTesting converting the expression to the postfix version.
01 expr = @parser.parse_expression('SQR(4) + INT(Y/2) * 3')
02 assert_equal [4, 'SQR', :Y, 2, '/', 'INT', 3, '*', '+'], expr.to_postfixTesting that spaces are optional.
01 def test_program_witout_spaces 02 03 program = <<PROGRAM 04 10PRINT"HELLO" 05 20END 06 PROGRAM 07 08 statements = @parser.parse(program) 09 10 assert_equal 2, statements.length 11 12 assert_equal '10 PRINT "HELLO"', statements[0].to_s 13 assert_equal '20 END', statements[1].to_s 14 15 end
We use the to_s method of the Statement objects which returns a canonical string to make the test is easier to read. (notice that reading source code without spaces is not easy – line 4)
The interpreter
Now that a parser returns an array of Statement objects, executing the program is easy. We loop through them and keep track of the program position (we need a bit more, see later). The interpreter uses a runtime object which stores the programs’s variables and the program position. Here is the main loop of the interpreter.
01 def run_statements statements 02 03 runtime = create_runtime(statements) 04 05 while runtime.running? 06 07 statement = statements[runtime.program_pos] 08 09 statement.execute(runtime) 10 11 runtime.program_pos += 1 12 13 end 14 15 runtime 16 17 end
The runtime object has also a running state.
The interpreter expects the statement objects to implement the execute method (line 9). The statement classes in the parser file (basic_parser.rb) do not offer execute methods, we re-open the classes here in the interpreter file (basic.rb). Therefore our files are shorter and the separation of responsibilities does not imply doubling the set of classes.
At the end of the method (line 15) we return the runtime to the calling code so that it can access the variables.
The runtime object
Let us fisrt look at the runtime class layout.
01 class Runtime 02 03 attr_accessor :program_pos 04 05 def initialize console, data 06 end 07 08 def running? 09 end 10 11 def next_data 12 end 13 14 def end 15 end 16 17 def print line 18 end 19 20 def [] var 21 end 22 23 def []= var, value 24 end 25 26 def each_var 27 end 28 29 def save_pos name, subject 30 end 31 32 def restore_pos name 33 end 34 35 end
A Runtime instance needs a console where the output is sent and an array of all the data available for the read statements.
The program position can be read or changed, for instance the GOTO statement changes it. Notice that the program position is an index in the program’s array of statements, not the source code line number.
The next_data returns the sequence of data or nil when all the data is consumed.
The end method puts the state to program ended.
The print method is used by the PRINT statement.
The [], []= and each_var give access to the current values of the program’s variables.
The last methods, save_pos and restore_pos are used to save the current program position and to retrieve the saved position. A name is accociated with each save, it is useful to validate the correct sequence of the NEXT statements and the RETURN statements. The saved positions are pushed on a stack and must be consumed in reverse order.
Creating the runtime object
The create_runtime method instantiate a Runtime object.
It loops through all the statements creating:
- a map table, mapping the basic line numbers to the index in the array of statements
- a collection of the goto-like statements (
GOTO,GOSUBandIF) - an array of all the data found in the statements
After the first loop, it starts a second loop to set the mapped line number to the program position for all the goto-like statements. The test to find what statements are goto-like is done by searching all the objects that respond to the program_pos= message, needed to set the actual program position.
Turn the Statement objects into runnable objects…
In the interpreter file we re-open the Statement classes and add the execute methods. The default implementation does nothing (some statements like DATA and REM just inherit the no-operation execute).
01 class Statement 02 03 def execute runtime 04 end 05 06 end
One interresting thing in Ruby is that we do not need to redeclare inheritance when we re-open a class. Therefore, if we ever need to change the class hierarchy we only need to change it at one place. If we declared the inheritance we would have to give exactly the same.
We are going to look at the STOP and the GOSUB statements.
01 class StopStatement 02 03 def execute runtime 04 runtime.end 05 end 06 07 end
(We do not repeat that the class derives from the Statement class.)
The implementation for stop is very simple, it only needs to notify the runtime that the program stops running (line 4).
The gosub is just a bit more complicated.
01 class GosubStatement 02 03 def execute runtime 04 runtime.save_pos :gosub, self 05 runtime.program_pos = @to_program_pos 06 end 07 08 def program_pos= line 09 @to_program_pos = line - 1 10 end 11 12 end
We ask the runtime to save the current program position, because when we later meet the return statement we must continue at the next statement relative to the current position. We register the save position with a special name, the :gosub symbol (line 4). And we set the position where we want to jump (line 5). Actually we set the goto index – 1 (see the way we store the line index at line 9).
Saving the current position prepares the runtime for a later use when the interpreter meets the return statement.
01 class ReturnStatement
02
03 def execute runtime
04
05 gosub = runtime.restore_pos(:gosub)
06
07 raise "UNMATCHED 'GOSUB' AT LINE #{@line}" unless gosub
08
09 end
10
11 endTesting the interpreter
The tests that validate the interpreter look like:
01 02 require 'test/unit' 03 require 'basic' 04 05 class TestBasic < Test::Unit::TestCase 06 07 def test_print_one_string 08 09 program = <<BASIC 10 10 PRINT 'HELLO' 11 20 END 12 BASIC 13 14 @basic.run program 15 16 assert_equal 1, @console.lines.length 17 assert_equal ["HELLO"], @console.lines 18 19 end 20 21 end
The test requires the unit test framework and the interpreter (line 3). We call the interpreter, at line 14, with the program. Next we assert we printed out one line (at line 16) and the output content (line 17). The console attribute is a false console that stores all the printed lines. The interpreter needs a console and calls its print method. We create a test console.
01 class TestConsole 02 03 attr_reader :lines 04 05 def initialize 06 @lines = [] 07 end 08 09 def print line 10 @lines << line 11 end 12 13 end
The test setup creates the interpreter.
01 def setup 02 @console = TestConsole.new 03 @basic = BasicInterpreter.new(@console) 04 end
If the interpreter was as follow, the test would pass.
01 class BasicInterpreter 02 03 def initialize console 04 @console = console 05 end 06 07 def run program 08 @console.print "HELLO" 09 end 10 11 end
$>ruby basic_test.rb
Loaded suite basic_test
Started
E
Finished in 0.000369 seconds.
1) Error:
test_print_one_string(TestBasic):
NoMethodError: undefined method `run' for nil:NilClass
basic_test.rb:14:in `test_print_one_string'
1 tests, 0 assertions, 0 failures, 1 errors
Using programs for tests
We wrote some basic programs that we can use to validate our interpreter in automated tests. By using the technique presented above and adding some more methods, the tests become even easier to read. Here if the factorial program.
10 REM 15 REM COMPUTE FACTORIAL 20 REM 25 LET F = 1 30 READ N 31 LET X = N 35 IF N = 0 THEN 55 40 LET F = F * N 45 LET N = N - 1 50 GOTO 35 55 PRINT "FACT", X, "IS", F 60 GOTO 25 65 DATA 1, 3, 5, 6, 20 99 END
Here is the test that runs the factorial program.
01 class TestRunningBasicPrograms < Test::Unit::TestCase 02 03 def test_factorial 04 05 script 'fact.bas' 06 07 expects 'FACT', 1, 'IS', 1 08 expects 'FACT', 3, 'IS', 6 09 expects 'FACT', 5, 'IS', 120 10 expects 'FACT', 6, 'IS', 720 11 expects 'FACT', 20, 'IS', 2432902008176640000 12 13 assert_output_as_expected 14 15 end 16 17 end
At line 5 we set the program file under test, from line 7 up to 11 we declare the expectations and the last line (line 13) checks the outcome.
Tests using the runtime
As the interpreter returns the runtime object, we can use it during the assertion phase.
01 def test_1et 02 03 program = <<BASIC 04 10 LET A = (23 + 5) / 2 05 40 END 06 BASIC 07 08 runtime = @basic.run(program) 09 10 assert_equal 14, runtime[:A] 11 12 end
We run the program and store the runtime in the runtime variable (line 8). Then we assert that the value of the variable A of the program is the one expected, using the runtime object.
Test results
The code attached to this summary contain tests. Here is the tests results.
$>ruby basic_parser_test.rb Loaded suite basic_parser_test Started ............. Finished in 0.008335 seconds. 13 tests, 56 assertions, 0 failures, 0 errors
$>ruby basic_test.rb Loaded suite basic_test Started ............. Finished in 0.020823 seconds. 13 tests, 23 assertions, 0 failures, 0 errors
You can download the code here.
Annex
Here is a BNF description of the language syntax.
<program> ::= <statement> {\n <statement>}
<statement> ::= <let> | <read> | <data> | <print> | <goto> | <if> | <for> | <next> |
<end> | <stop> | <def> | <gosub> | <return> | <dim> | <rem>
<let> ::= LET <variable> = <expression>
<read> ::= READ <variable> {, <variable>}
<data> ::= DATA <number> {, <number>}
<print> ::= PRINT <label> | <label> <expression> | <expression>
<goto> ::= GOTO <line-number>
<if> ::= IF <expression> <relational> <expression> THEN <line-number>
<for> ::= FOR <unsubscripted-var> = <expression> TO <expression> [STEP <expression>]
<next> ::= NEXT <unsubscripted-var>
<end> ::= END
<stop> ::= STOP
<def> ::= DEF FN <letter> (<unsubscripted-var>) = <expression>
<gosub> ::= GOSUB <line-number>
<return> ::= RETURN
<dim> ::= DIM <letter>(<integer>) | <letter>(<integer>, <integer>)
<rem> ::= <string>
<variable> ::= <letter> [<digit>] [(<integer>) | (<integer>, <integer>)]
<label> ::= "<string>"
<relational> ::= < | = | > | <= | >= | <>
<unsubscripted-var> ::= <letter> [<digit>]
Expressions may contain the following arithmetic operations: +, -, *, /, ^.
Basic has next built-in functions: SIN, COS, TAN, ATN, EXP, ABS, LOG, SQR, RND, INT