The Parser API

I favor a "reflection" approach -- a high-level API layered on top of an AST.


context           # [class,meth] at cursor
class_range(c)    # text range for class c
meth_range(c,m)   # text range for method m
                  # et cetera...

Comments. There are two basic ways of using a parser. You can build a tree (an AST) which can then be walked arbitrarily; or you can set up methods that will be called as the source is scanned and certain constructs are encountered. Think of these in terms of XML parsing: the tree approach corresponds to DOM-style parsing and the callback approach to SAX-style parsing.
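
To make the two styles concrete, here is a rough sketch using Ripper, the parser that ships in the standard library of modern Ruby. Ripper is only a stand-in for illustration; it is not the parser this article's code is built on, and the names SOURCE, method_names_from_tree, and MethodListener are invented for the example.

```ruby
require 'ripper'

SOURCE = <<~RUBY
  class Greeter
    def hello
      puts "hi"
    end
  end
RUBY

# Tree style: parse everything first, then walk the nested arrays.
def method_names_from_tree(node, found = [])
  return found unless node.is_a?(Array)
  # A method definition appears as [:def, [:@ident, "name", [line, col]], ...]
  found << node[1][1] if node[0] == :def
  node.each { |child| method_names_from_tree(child, found) }
  found
end

# Event style: the parser calls back as each construct is recognized.
class MethodListener < Ripper
  attr_reader :names

  def initialize(src)
    super
    @names = []
  end

  # Called once per method definition; the first argument is the method name.
  def on_def(name, *_rest)
    @names << name
  end
end

tree_names = method_names_from_tree(Ripper.sexp(SOURCE))

listener = MethodListener.new(SOURCE)
listener.parse
event_names = listener.names
```

Both variables end up holding the same list of method names. The difference is in who drives: with the tree, my code does the walking; with the events, the parser does.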

But I don't want to have to use any such API in my little refactoring scripts. I don't want to keep track of character offsets manually and walk the tree in search of certain tokens.

I am suggesting a third, higher-level approach in addition to the other two: a "reflection"-style approach. It will sit on top of the AST, not replace it; it is simply a higher level of abstraction.

What do I mean by "reflection"? Well, think about Ruby's reflection API. To ask for the names of the instance methods of Array, I can simply say Array.instance_methods. This, of course, only operates on the current program; it operates on source that is loaded and parsed and ready to run. It can't be used to examine the text of some other program.
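
A small illustration of that limitation, assuming a modern Ruby (where method names come back as symbols). The Widget class and the variable names are made up for the example.

```ruby
# Reflection answers questions about classes loaded into the running program.
array_methods = Array.instance_methods(false)  # methods defined directly on Array
has_push = array_methods.include?(:push)

# Source that exists only as text is invisible to reflection
# until it is actually evaluated.
other_program = "class Widget; def spin; end; end"
widget_known = Object.const_defined?(:Widget)
```

Here has_push is true, but widget_known is false: the text in other_program was never loaded, so the reflection API knows nothing about it.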

But the concept is similar to what I'm suggesting. I want easy, pre-parsed, "canned" access to every set of elements in the Ruby source being edited.

For this proof of concept, I only implemented two API methods: context and class_range. The former reports which class and method the cursor is currently sitting in; the latter returns the text range spanned by the given class.
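
To give a feel for what such calls might look like, here is a deliberately naive sketch. It is not the proof-of-concept code described here. ToySource is a hypothetical class; it works in line numbers rather than character offsets, takes the cursor line as an argument to context, and assumes tidy source in which every class or def begins a line and its closing end sits on a later line at the same indentation.

```ruby
# Hypothetical toy, not the real implementation. Assumes each `class`/`def`
# starts a line and is closed by an `end` at the same indentation.
class ToySource
  def initialize(text)
    @lines = text.lines
  end

  # Line range (1-based, inclusive) spanned by class +name+, or nil.
  def class_range(name)
    block_range(/^(\s*)class\s+#{Regexp.escape(name)}\b/)
  end

  # Line range spanned by method +meth+ inside class +klass+, or nil.
  def meth_range(klass, meth)
    outer = class_range(klass)
    return nil unless outer
    block_range(/^(\s*)def\s+#{Regexp.escape(meth)}\b/, outer)
  end

  # [class_name, method_name] enclosing the given line; either may be nil.
  def context(line)
    klass = names(/^\s*class\s+(\w+)/).find { |n| class_range(n)&.cover?(line) }
    meth = klass && names(/^\s*def\s+(\w+)/).find do |m|
      meth_range(klass, m)&.cover?(line)
    end
    [klass, meth]
  end

  private

  # Every distinct name captured by +pattern+, in order of appearance.
  def names(pattern)
    @lines.map { |l| l[pattern, 1] }.compact.uniq
  end

  # Find the first line in +within+ matching +opener+, then the first
  # following `end` that has exactly the same leading whitespace.
  def block_range(opener, within = 1..@lines.size)
    within.each do |n|
      match = opener.match(@lines[n - 1])
      next unless match
      closer = /^#{match[1]}end\b/
      last = ((n + 1)..within.end).find { |k| @lines[k - 1] =~ closer }
      return n..last if last
    end
    nil
  end
end

toy = ToySource.new(<<~RUBY)
  class Stack
    def push(x)
      @items << x
    end

    def pop
      @items.pop
    end
  end
RUBY

toy.class_range("Stack")      # => 1..9
toy.meth_range("Stack", "pop") # => 6..8
toy.context(7)                # => ["Stack", "pop"]
```

The point is the shape of the calls, not the scanning trick: a refactoring script asks a question and gets back a range or a name, with no tree walking and no offset bookkeeping of its own.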

I'm not a parser guy. The parser code I'm using currently is based on basic_parser.rb by Rich Kilmer, hacked together just for FreeRIDE's class browser. (Soon a new and improved parser will be added to FreeRIDE, and this code will be thrown away.) I combined this with a few calls to irb's lexer (which is superb and easy to use), a few regular expressions, and a little black magic.

This is truly ugly code. The file is named medusa.rb, after the monster in Greek mythology, a monster so ugly that if you look directly at it, you will turn to stone.