The Lazy Parsers

Closures are cool. They allow us to inject stack-based local variables anywhere in our parse descent hierarchy. Typically, we store temporary variables generated by our semantic actions in closure variables, as a means to pass information up and down the recursive descent.

Now imagine this... Keeping in mind that closure variables can be of just about any type, we can store a parser, a rule, or a pointer to a parser or rule in a closure variable. Yeah, right, so what?... OK, hold on... What if we can use this closure variable to initiate a parse? Think about it for a second. Suddenly we'll have some powerful dynamic parsers! Suddenly we'll have a full round trip from Phoenix to Spirit and back! Phoenix semantic actions choose the right Spirit parser and Spirit parsers choose the right Phoenix semantic action. Oh MAN, what a honky cool idea, I might say!!

lazy_p

This is the idea behind the lazy_p parser. The lazy_p syntax is:

    lazy_p(actor)

where actor is a Phoenix expression that returns a Spirit parser. This returned parser is used in the parsing process.

Example:

    lazy_p(phoenix::val(int_p))[assign(result)]

Semantic actions attached to the lazy_p parser expect the same signature as that of the returned parser (int_p, in our example above).
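The deferred evaluation described above can be sketched without Spirit or Phoenix. The block below is only an illustration of the idea, not the library's implementation: `Parser`, `make_uint_parser` and `lazy` are hypothetical names invented for this sketch, and a plain callable stands in for the Phoenix actor.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-in for a Spirit parser (not a Spirit type): on success
// it stores the parsed value in `out`, advances `pos` and returns true.
using Parser = std::function<bool(const std::string&, std::size_t&, long&)>;

// Builds a parser for a run of digits in the given base (2 to 10).
Parser make_uint_parser(int base) {
    assert(base >= 2 && base <= 10);
    return [base](const std::string& in, std::size_t& pos, long& out) {
        std::size_t i = pos;
        long value = 0;
        while (i < in.size() && in[i] >= '0' && in[i] < '0' + base) {
            value = value * base + (in[i] - '0');
            ++i;
        }
        if (i == pos) return false;  // no digit consumed: no match
        pos = i;
        out = value;
        return true;
    };
}

// The "lazy" wrapper: `actor` is not called when the wrapper is built, only
// when a parse is attempted, so each parse may use a different parser.
Parser lazy(std::function<Parser()> actor) {
    return [actor](const std::string& in, std::size_t& pos, long& out) {
        return actor()(in, pos, out);
    };
}
```

Because the value produced by the wrapped parser is passed straight through, whatever consumes the result of `lazy(...)` sees the same signature as the parser the actor returned, which mirrors the remark about semantic actions above.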

lazy_p example

To give you a better glimpse (see the lazy_parser.cpp example), say you want to parse inputs made up of possibly nested bin {...} and dec {...} blocks, where bin and dec specify the numeric format (binary or decimal) that we are expecting to read inside the braces. If we analyze the input, we want a grammar like:

    base = "bin" | "dec";
    block = base >> '{' >> *block_line >> '}';
    block_line = number | block;

We intentionally left out the number rule. The tricky part is that the way the number rule behaves depends on the result of the base rule. If base got a "bin", then number should parse binary numbers. If base got a "dec", then number should parse decimal numbers. Typically, we'd have to rewrite our grammar to accommodate the different parsing behavior:

    block = 
            "bin" >> '{' >> *bin_line >> '}'
        |   "dec" >> '{' >> *dec_line >> '}'
        ;
    bin_line = bin_p | block;
    dec_line = int_p | block;

While this is fine, the redundancy makes us want to find a better solution; after all, we'd want to make full use of Spirit's dynamic parsing capabilities. Apart from that, there will be cases where the set of parsing behaviors for our number rule is not known when the grammar is written. We'll only be given a map of string descriptors and corresponding rules [e.g. (("dec", int_p), ("bin", bin_p) ... etc...)].

The basic idea is to have one rule for binary numbers and one for decimal numbers. That's easy enough to do (see numerics). When base is being parsed, in your semantic action, store a pointer to the rule for the selected base in a closure variable (e.g. block.int_rule). Here's an example:

    base 
        = str_p("bin")[block.int_rule = &var(bin_rule)] 
        | str_p("dec")[block.int_rule = &var(dec_rule)]
        ;

With this setup, your number rule will now look something like:

    number = lazy_p(*block.int_rule);

The lazy_parser.cpp example does it a bit differently, ingeniously using the symbol table to dispatch the correct rule, but in essence, both strategies are similar. Admittedly, when you add up all the rules, the resulting grammar is more complex than the hard-coded grammar above. Yet, for more complex grammar patterns with a lot more rules to choose from, the additional setup is well worth it.
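To make the dispatch concrete, here is a hand-written recursive-descent sketch of the block grammar in plain C++. It is neither Spirit code nor the contents of lazy_parser.cpp. All names (`NumberParser`, `number_parsers`, `parse_block` and the helpers) are assumptions made for this illustration. A local variable in each `parse_block` call plays the role of the closure variable block.int_rule, and a `std::map` plays the role of the map of descriptors.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical number-parser signature: read one number at pos, advance pos.
typedef bool (*NumberParser)(const std::string&, std::size_t&, long&);

// Reads a run of digits in the given base (2 to 10).
bool parse_digits(int base, const std::string& in, std::size_t& pos, long& out) {
    assert(base >= 2 && base <= 10);
    std::size_t i = pos;
    long value = 0;
    while (i < in.size() && in[i] >= '0' && in[i] < '0' + base) {
        value = value * base + (in[i] - '0');
        ++i;
    }
    if (i == pos) return false;
    pos = i;
    out = value;
    return true;
}
bool parse_bin(const std::string& in, std::size_t& pos, long& out) { return parse_digits(2, in, pos, out); }
bool parse_dec(const std::string& in, std::size_t& pos, long& out) { return parse_digits(10, in, pos, out); }

// The map of string descriptors to number parsers.
const std::map<std::string, NumberParser>& number_parsers() {
    static const std::map<std::string, NumberParser> table = {
        {"bin", &parse_bin}, {"dec", &parse_dec}};
    return table;
}

void skip_ws(const std::string& in, std::size_t& pos) {
    while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos]))) ++pos;
}

// block = base >> '{' >> *block_line >> '}';   block_line = number | block;
bool parse_block(const std::string& in, std::size_t& pos, std::vector<long>& out) {
    skip_ws(in, pos);
    // base: select the number parser; `number` is local to this block,
    // so nested blocks get their own independent choice.
    NumberParser number = nullptr;
    for (const auto& entry : number_parsers()) {
        if (in.compare(pos, entry.first.size(), entry.first) == 0) {
            number = entry.second;
            pos += entry.first.size();
            break;
        }
    }
    if (!number) return false;
    skip_ws(in, pos);
    if (pos >= in.size() || in[pos] != '{') return false;
    ++pos;
    for (;;) {
        skip_ws(in, pos);
        long value = 0;
        if (number(in, pos, value)) { out.push_back(value); continue; }
        const std::size_t saved_pos = pos;
        const std::size_t saved_size = out.size();
        if (parse_block(in, pos, out)) continue;
        pos = saved_pos;        // backtrack a failed nested block
        out.resize(saved_size); // and discard anything it collected
        break;
    }
    if (pos >= in.size() || in[pos] != '}') return false;
    ++pos;
    return true;
}
```

Supporting another numeric format in this sketch means adding one entry to the table; `parse_block` itself stays unchanged, which is the same trade-off the text describes for the lazy_p version.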

Copyright © 2003 Joel de Guzman
Copyright © 2003 Vaclav Vesely

Permission to copy, use, modify, sell and distribute this document is granted provided this copyright notice appears in all copies. This document is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.