rockit - Ruby O-o Compiler construction toolKIT

Version: 0.3.8
Release date: 2001-10-26
Available from: http://rockit.sf.net
Author: Robert Feldt, feldt@ce.chalmers.se
README version: $Id: README,v 1.11 2001/10/17 02:04:19 feldt Exp $
Tarball: http://prdownloads.sf.net/rockit/rockit-0-3-8.tar.gz
Project page: http://sourceforge.net/projects/rockit

What is it?

An easy-to-use, object-oriented compiler construction toolkit written in and generating code for Ruby. Currently focusing on the "front-end" phases of compiler construction.

Main features of rockit:

Grammars are written in Extended Backus-Naur Form (so you can use the *, ? and + operators).
Generates both lexer and parser.
Parsers will return abstract syntax trees (AST).
Generated ASTs support simple tree-walking using iterators.
"Ruby-friendly" with, for example, Arrays for repetition, Regexps for tokens etc.
More advanced parser than yacc's LALR(1) (if you're curious, it's called "Generalized LR parsing with scanner forking"!).
ASTs can be dumped to PostScript (if you have graphviz/dot).
Reports when the grammar is ambiguous and shows the alternative ways to parse the sentence. Helps you resolve ambiguities.
Associativity and precedence can be specified based on productions/rules in the grammar (NOT on operators which is less "portable").
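
To illustrate what associativity and precedence decide, here is a minimal plain-Ruby sketch using precedence climbing. It does not use rockit and is not how rockit's parser works; the PREC table and the parse_expr/parse_atom helpers are hypothetical names chosen for this illustration only.

```ruby
# Hypothetical precedence table: a higher number binds tighter.
PREC = { "+" => 1, "-" => 1, "*" => 2, "/" => 2 }.freeze

# Parses a token array (consumed destructively) into nested arrays
# of the form [operator, left, right].
def parse_expr(tokens, min_prec = 1)
  left = parse_atom(tokens)
  while (op = tokens.first) && PREC[op] && PREC[op] >= min_prec
    tokens.shift
    # Asking for a strictly higher precedence on the right-hand side
    # is what makes every operator left-associative here.
    right = parse_expr(tokens, PREC[op] + 1)
    left = [op, left, right]
  end
  left
end

# An atom is either a number or a parenthesized expression.
def parse_atom(tokens)
  tok = tokens.shift
  return Integer(tok) unless tok == "("
  expr = parse_expr(tokens)
  tokens.shift # drop the closing ")"
  expr
end

p parse_expr("8-3-2*2".scan(/\d+|[-+*\/()]/))
# => ["-", ["-", 8, 3], ["*", 2, 2]]
```

Left associativity groups 8-3 first, and the higher precedence of * keeps 2*2 together.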

Installation?

unpack tarball (if you haven't already)
install: ruby install.rb
If you've got RubyUnit you can also run tests: ruby tests/runtests.rb

Why is it needed?

No need to write a lexer/scanner; rockit gives you both a lexer and a parser from the same grammar.
No need to write standard code for building an abstract syntax tree; rockit automatically generates it and you can specify how the tree should look.
rockit-generated parsers build the AST; NO need to write "action code" in the grammar. "Action code" is kept separate from the grammar.
More powerful operators.
You can write grammars directly in Ruby code.
Rockit will show you why your grammar is ambiguous (if it is!) by showing you the two ways the sentence can be parsed. This helps you resolve the ambiguity by introducing priorities.

But we already have two excellent compiler compilers in/for Ruby!

Yes, but they (racc and rbison) both use the bison/yacc format which, IMHO, is not optimal for an OO language like Ruby. You also have to write the "action code" (the code to be executed for each production) by hand. This is sometimes a good thing if you simply want to extract some info, but for general use you probably want to make several passes over the result of the parse (which will likely be an AST). Instead of making you write the code for building the AST, rockit does it for you.
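
As a rough illustration of why an AST is convenient for several passes, the plain-Ruby sketch below runs two independent passes over one tree. The Num and BinOp node types are hypothetical stand-ins built by hand; they are not the AST classes rockit generates.

```ruby
# Hypothetical node types, defined here only for this illustration.
Num   = Struct.new(:value)
BinOp = Struct.new(:op, :left, :right)

# Pass 1: evaluate the tree to an Integer.
def evaluate(node)
  case node
  when Num   then node.value
  when BinOp then evaluate(node.left).public_send(node.op, evaluate(node.right))
  end
end

# Pass 2: render the same tree as a fully parenthesized string.
def render(node)
  case node
  when Num   then node.value.to_s
  when BinOp then "(#{render(node.left)} #{node.op} #{render(node.right)})"
  end
end

# Hand-built tree for (4*((2+6)-3))/2.
sum  = BinOp.new(:+, Num.new(2), Num.new(6))
diff = BinOp.new(:-, sum, Num.new(3))
ast  = BinOp.new(:/, BinOp.new(:*, Num.new(4), diff), Num.new(2))

p evaluate(ast) # => 10
puts render(ast) # prints ((4 * ((2 + 6) - 3)) / 2)
```

Neither pass knows about the other, and the grammar would not need to change to add a third one.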

In the longer term rockit will include components that are typically not part of compiler compilers like yacc and bison, such as syntax-directed translation, pretty-printer generation etc.

Example of using the rockit command-line program?

$ rockit my_grammar.grammar myparser.rb MyModule my_parser
Generated parser for my_grammar.grammar and saved it in myparser.rb. Use it by
doing:

require 'myparser'
ast = MyModule.my_parser.parse "..."

Example of using the rockit lib in Ruby code?

require 'rockit/rockit'

parser = Parse.generate_parser <<-'END_OF_GRAMMAR'
 Grammar ExampleGrammar
  Tokens
    Blank      = /\s+/             [:Skip]
    Number     = /\d+/
  Productions
    Expr       -> Number           [^]
               |  Expr '+' Expr    [Plus: left,_,right]
               |  Expr '-' Expr    [Minus: left,_,right]
               |  Expr '*' Expr    [Mul: left,_,right]
               |  Expr '/' Expr    [Div: left,_,right]
               |  '(' Expr ')'     [^: _,expr,_]
  Priorities
    left(Plus), left(Minus), left(Mul), left(Div)
    Div = Mul > Plus = Minus
END_OF_GRAMMAR

def calc_eval(ast)
  case ast.name
  when "Plus"
    calc_eval(ast.left) + calc_eval(ast.right)
  when "Minus"
    calc_eval(ast.left) - calc_eval(ast.right)
  when "Mul"
    calc_eval(ast.left) * calc_eval(ast.right)
  when "Div"
    calc_eval(ast.left) / calc_eval(ast.right)
  when "Number"
    ast.lexeme.to_i
  end
end

calc_eval(parser.parse '(4*((2+6)-3))/2')    # => 10
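
The Tokens section of the grammar above pairs token names with Regexps and marks Blank as skipped. A minimal plain-Ruby sketch of that idea follows, using StringScanner from the strscan library. It is not rockit's lexer; the TOKENS table, the :skip flag and the tokenize helper are assumptions made for this illustration.

```ruby
require 'strscan'

# Hypothetical token table: [name, pattern, optional :skip flag].
TOKENS = [
  [:Blank,  /\s+/, :skip],
  [:Number, /\d+/],
  [:Op,     /[-+*\/()]/]
].freeze

# Returns an array of [name, lexeme] pairs, dropping skipped tokens.
def tokenize(input)
  scanner = StringScanner.new(input)
  out = []
  until scanner.eos?
    # Try each pattern at the current position; the first match wins.
    matched = TOKENS.any? do |name, pattern, skip|
      lexeme = scanner.scan(pattern)
      next false unless lexeme
      out << [name, lexeme] unless skip
      true
    end
    raise ArgumentError, "no token matches at offset #{scanner.pos}" unless matched
  end
  out
end

p tokenize("4 * (2+6)")
# => [[:Number, "4"], [:Op, "*"], [:Op, "("], [:Number, "2"], [:Op, "+"], [:Number, "6"], [:Op, ")"]]
```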

Requirements?

Memoize from RAA (or my Ruby page) is used for a slight performance increase if it is installed, but rockit should work without it. Please mail me if it doesn't!

Otherwise it should work with any Ruby >= 1.6. If you've got strscan by Minero Aoki installed it will be used and give a slight performance increase. But things work even if you haven't.

I've successfully used rockit with Ruby 1.7.1 (2001-09-20) and cygwin 1.1.8 (gcc version 2.95.2-6) on Windows 2000 Professional. People have also reported that it works with 1.6.5.

NOTE THAT THIS IS AN ALPHA RELEASE SO THERE WILL LIKELY BE BUGS AND THE API WILL LIKELY CHANGE.

RubyUnit is needed to run unit tests.

Documentation?

Not much yet. Check out the examples in the examples dir.

You can get a good intro to writing grammars by looking at the grammar for rockit grammars. It's in lib and called 'rockit-grammar-files.grammar'. You can also compare it to the grammar in bootstrap.rb, which is (almost) the same grammar but written directly in Ruby code.

Also check out the tests. Lots of good info and examples in there.

More examples of use?

There is some stuff in the examples dir:

calculator - simple read-eval calculators
minibasic - interpreter for a subset of BASIC in 46 LOC!
polynomials - examples on evaluating and differentiating polynomials from the ANTLR tutorial.
ruby - rudimentary parser for Ruby. Translates parse.y to rockit grammar. But more work is needed for it to be useful. Note that I've only tried with parse.y from Ruby 1.6.3 and there is reported to be a problem with later ones. I'll check it out soon...

License and legal issues?

rockit is distributed under LGPL. See LICENSE and COPYING-LESSER. Parsers you generate are LGPL so should not restrict you. If they do, please mail me.

Special things to note?

Rockit is currently:

SLOW! Both when generating the parser and when parsing. I haven't given performance much thought yet and haven't profiled so expect significant performance gains when we get to this issue on the TODO.

BAD AT HANDLING AND REPORTING ERRORS! Will be fixed when someone shows me "the/a right way" to do it.

If you're installing from sources grabbed by CVS you should remove the file "lib/rockit_grammars_parser.rb" before running "ruby install.rb".

Plans for the future?

Lots of stuff, see TODO.

Do you have comments or questions?

I'd appreciate it if you dropped me a note if you're successfully using rockit. If there are some known users I'll be more motivated to pack up additions / new versions and post them to RAA. Please give feedback!

What is Generalized LR parsing?

(You don't need to understand this to use rockit but if you're interested you might learn something about parsing...)

It is a pseudo-parallel parsing algorithm which runs a dynamically varying number of LR parsers in parallel. LR parser generators, such as yacc and bison, generate a table of parsing actions. If there is an ambiguity in the language, or if the generation technique used introduces ambiguities because it's "imperfect", there are multiple actions in some position(s) in the table. In ordinary (yacc-style) LALR(1) parsing these are called conflicts and must be resolved by rewriting the grammar or introducing associativity and precedence rules, since the parser must take one and only one action.

In generalized LR parsing all actions are taken, by spawning off a parser for each one of them. So if the ambiguity arose not because of the grammar but because of the limitations of the parser generator, all but the parser taking the correct action will fail. And if there are multiple ways to parse the sequence they will all be found! This procedure incurs a performance penalty at parse time, but it can be overcome by clever encodings of the different parsers and their data.
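
The forking idea can be sketched in a few lines of plain Ruby. This is a deliberately naive illustration, not rockit's implementation: it uses no LR table and none of the clever encodings mentioned above. It simply forks a new parser for every possible shift or reduce action. The grammar and the RULES, Config, tokenize, all_parses and value names are all hypothetical.

```ruby
# Toy grammar, deliberately ambiguous (hypothetical, for illustration only):
#   E -> E '-' E | num
RULES = [
  [:E, [:E, "-", :E]],
  [:E, [:num]]
].freeze

# One "parser": its stack of partial trees and its position in the input.
Config = Struct.new(:stack, :pos)

# Tokens and tree nodes are arrays whose first element is the grammar symbol.
def tokenize(text)
  text.scan(/\d+|-/).map { |t| t == "-" ? ["-", t] : [:num, t] }
end

def all_parses(tokens)
  live = [Config.new([], 0)]
  finished = []
  until live.empty?
    forked = []
    live.each do |c|
      # Accept: all input consumed and a single E left on the stack.
      if c.pos == tokens.size && c.stack.size == 1 && c.stack[0][0] == :E
        finished << c.stack[0]
      end
      # Fork one parser per applicable reduction...
      RULES.each do |lhs, rhs|
        top = c.stack.last(rhs.size)
        next unless top.map(&:first) == rhs
        rest = c.stack.take(c.stack.size - rhs.size)
        forked << Config.new(rest + [[lhs, *top]], c.pos)
      end
      # ...and one more that shifts the next token, if there is one.
      forked << Config.new(c.stack + [tokens[c.pos]], c.pos + 1) if c.pos < tokens.size
    end
    live = forked # parsers with no possible action simply disappear
  end
  finished
end

# Evaluates a parse tree so the two readings can be told apart.
def value(tree)
  _sym, *kids = tree
  return kids[0][1].to_i if kids.size == 1 # E -> num
  value(kids[0]) - value(kids[2])          # E -> E '-' E
end

trees = all_parses(tokenize("8-3-2"))
p trees.size                      # => 2
p trees.map { |t| value(t) }.sort # => [3, 7]
```

The two surviving parsers correspond to (8-3)-2 and 8-(3-2); every other fork runs out of actions and is dropped. The number of forks grows quickly with input length in this naive form, which is the performance penalty described above.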

Happy coding!

Robert Feldt, feldt@ce.chalmers.se