FAQ

The Scanner Business

Question: Why doesn't this compile?

    rule<> r = /*...*/;
    parse("hello world", r, space_p); // BAD [attempts phrase level parsing]

But if I remove the skip-parser, everything goes back to normal again:

    rule<> r = *anychar_p;
    parse("hello world", r); // OK [character level parsing]

Sometimes you'll want to pass in a rule to one of the parse functions that Spirit provides. The problem is that the rule is a template class parameterized by the scanner type. This is rather awkward but unavoidable: the rule is tied to a scanner. What's not obvious is that this scanner must be compatible with the scanner that is ultimately passed to the rule's parse member function. Otherwise, the compiler will complain.

Why does the first call to parse not compile? Because of scanner incompatibility. Behind the scenes, the free parse function creates a scanner from the iterators passed in. In the second call to parse, the scanner created is a plain vanilla scanner<>. This is compatible with the default scanner type of rule<> [see the default template parameters of the rule]. The first call, however, creates a scanner of type phrase_scanner_t, because a skip parser is supplied. Thus, in order for the first call to succeed, the rule must be parameterized as rule<phrase_scanner_t>:

    rule<phrase_scanner_t> r = *anychar_p;
    parse("hello world", r, space_p);       //  OK [phrase level parsing]

Take note, however, that phrase_scanner_t is compatible only when you are using char const* iterators and space_p as the skip parser. Beyond that combination, you'll have to find the right scanner type yourself, which is tedious to do correctly. In light of this issue, it is best to avoid passing rules as arguments to the parse functions. Keep in mind that this happens only with rules: the rule is the only parser that has to be tied to a particular scanner type. For instance:

    parse("hello world", *anychar_p);           //  OK  [character level parsing]
    parse("hello world", *anychar_p, space_p);  //  OK  [phrase level parsing

Eliminating Left Recursion

Question: I ported a grammar from YACC. It's "kinda" working - the parser itself compiles with no errors. But when I try to parse, it gives me an "invalid page fault". I tracked down the problem to this grammar snippet:

    or_expr = xor_expr | (or_expr >> VBAR >> xor_expr);

What you should do is eliminate direct and indirect left recursion. The left recursion is what causes the invalid page fault: the rule keeps calling itself without consuming any input, so the program recurses until it overflows the stack. The code above is fine for bottom-up parsers such as YACC, but not for LL parsers such as Spirit.

This is similar to a rule in Hartmut Kaiser's C parser (in the examples/applications directory):

    inclusive_or_expression
    = exclusive_or_expression
    | inclusive_or_expression >> OR >> exclusive_or_expression
    ;

Transforming left recursion to right recursion, we have:

    inclusive_or_expression
    = exclusive_or_expression >> inclusive_or_expression_helper
    ;

    inclusive_or_expression_helper
    = OR >> exclusive_or_expression >> inclusive_or_expression_helper
    | epsilon_p
    ;

I'd go further. Since:

    r = a | epsilon_p;

is equivalent to:

    r = !a;

we can simplify inclusive_or_expression_helper thus:

    inclusive_or_expression_helper
    = !(OR >> exclusive_or_expression >> inclusive_or_expression_helper)
    ;

Now, since:

    r = !(a >> r);

is equivalent to:

    r = *a;

we have:

    inclusive_or_expression_helper
    = *(OR >> exclusive_or_expression)
    ;

Now simplifying inclusive_or_expression fully, we have:

    inclusive_or_expression
    = exclusive_or_expression >> *(OR >> exclusive_or_expression)
    ;

Reminds me of the calculators. So in short:

    a = b | a >> op >> b;

in pseudo-YACC is:

    a = b >> *(op >> b);

in Spirit. What could be simpler? Look Ma, no recursion, just iteration.
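
Here is the transformation as a small, self-contained program; a sketch assuming the classic Spirit headers, with +digit_p standing in for b and '|' standing in for op:

    #include <boost/spirit/core.hpp>
    #include <iostream>

    using namespace boost::spirit;

    int main()
    {
        // iterative form of the left-recursive  a = b | a >> op >> b;
        rule<> b = +digit_p;                // stand-in for the sub-expression
        rule<> a = b >> *(ch_p('|') >> b);  // a = b >> *(op >> b);

        parse_info<> info = parse("1|2|3", a);
        std::cout << (info.full ? "parsed" : "failed") << '\n';
        return 0;
    }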

The lexeme_d directive and rules

Question: Does lexeme_d not support expressions which include rules? In the example below, the definition of atomicRule compiles,

    rule<ScannerT> atomicRule
        = lexeme_d[(alpha_p | '_') >> *(alnum_p | '.' | '-' | '_')];

but if I move alnum_p | '.' | '-' | '_' into its own rule, the compiler complains about a conversion from const scanner<...> to const phrase_scanner_t&.

    rule<ScannerT> ch 
        = alnum_p | '.' | '-' | '_';

    rule<ScannerT> compositeRule
        = lexeme_d[(alpha_p | '_') >> *(ch)]; // <- error source

You might get the impression that the lexeme_d directive and rules do not mix. Actually, this problem is related to the first FAQ entry: The Scanner Business. More precisely, the lexeme_d directive and rules with incompatible scanner types do not mix. This problem is more subtle, though: here it is the directive itself that causes the scanner incompatibility. The lexeme_d directive transforms the scanner it receives into one that disables the skip parser. This non-skipping scanner, unfortunately, is incompatible with the original scanner before the transformation took place.

The simplest solution is not to use rules inside lexeme_d. If you really need more complex parsers inside the lexeme_d, you can apply it to subrules and grammars instead. If you must use a rule, you need to know the exact scanner type used by the directive. The lexeme_scanner metafunction is your friend here. The example above will work as expected once we give the ch rule the correct scanner type:

    rule<lexeme_scanner<ScannerT>::type> ch 
        = alnum_p | '.' | '-' | '_';

Note: make sure to add "typename" before lexeme_scanner when this is used inside a template class or function.

The same thing happens when rules are used inside the as_lower_d directive. In such cases, you can use the as_lower_scanner metafunction. See lexeme_scanner and as_lower_scanner.
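
For symmetry, a sketch of the as_lower_d case. The keyword_rules struct and its rules are hypothetical, purely for illustration:

    #include <boost/spirit/core.hpp>

    using namespace boost::spirit;

    // The kw rule is used inside as_lower_d, so it must be parameterized
    // with the transformed scanner type. Note the typename, required
    // because we are inside a template.
    template <typename ScannerT>
    struct keyword_rules
    {
        rule<typename as_lower_scanner<ScannerT>::type> kw;
        rule<ScannerT> start;

        keyword_rules()
        {
            kw    = str_p("begin") | str_p("end");
            start = as_lower_d[kw];
        }
    };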

Kleene Star infinite loop

Question: Why does this loop forever?

    rule<> optional = !(str_p("optional"));
    rule<> list_of_optional = *optional;

The problem is that the Kleene star will continue looping until it gets a no-match from its enclosed parser. Because the optional rule is optional, it will always return a match: even if the input doesn't match "optional", it returns a zero-length match. list_of_optional will keep calling optional forever, since optional never returns a no-match. So, in general, any rule that can be "nullable" (that is, can return a zero-length match) must not be put inside a Kleene star.
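
The usual fix is to make the repeated element non-nullable and let the Kleene star itself supply the optionality. A sketch, keeping the names from the question:

    //  *(!p) loops forever; *p already matches zero or more occurrences,
    //  so repeat the non-nullable parser directly instead:
    rule<> list_of_optional = *str_p("optional");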

Boost CVS and Spirit CVS

Question: There is Boost CVS and Spirit CVS. Which is used for further development of Spirit?

Generally, development takes place in Spirit's CVS. However, from time to time a new version of Spirit will be integrated into Boost. When this happens, development takes place in the Boost CVS. There will be announcements on the Spirit mailing lists whenever the status of the Spirit CVS changes.

How to reduce compilation times with complex Spirit grammars

Question: Are there any techniques to minimize compile times using Spirit? For simple parsers compile time doesn't seem to be a big issue, but I recently tried creating a parser with about 78 rules, and it took about 2 hours to compile. I would like to break the grammar up into smaller chunks, but it is not as easy as I thought it would be, because rules in two grammar capsules are defined in terms of each other. Any thoughts?

The only way to reduce compile times is to split up your grammars into smaller chunks, and to prevent the compiler from seeing all of the grammar definitions at the same time (that is, in the same compilation unit).

The first task is merely a logistical problem; the second is rather a technical one.

A good example of solving the first task is given in the Spirit cpp_lexer example written by JCAB, which you can find here.

The cross-referencing problems may be solved by some kind of forward declaration, or, if that does not work, by introducing a dummy template argument for the non-templated grammars, which makes it possible to defer instantiation until the compiler has seen all the definitions:

    template <typename T = int>
    struct grammar2;

    template <typename T = int>
    struct grammar1 : public grammar<grammar1<T> >
    {
        // refers to grammar2<>
    };

    template <typename T>
    struct grammar2 : public grammar<grammar2<T> >
    {
        // refers to grammar1<>
    };

    //...

    grammar1<> g;   // both grammars instantiated here

The second task is slightly more complex. You must ensure that in the first compilation unit the compiler sees only the function/template declaration, and in the second compilation unit the function/template definition. This is no problem as long as no templates are involved. If templates are involved, you need to manually (explicitly) instantiate them with the correct template parameters inside a separate compilation unit. This way the compilation time is split up between several compilation units, which drastically reduces the overall time required.
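
As a sketch of the technique, using a hypothetical parse_identifier function (the file names, the function, and the grammar are placeholders, not part of the library):

    // parse_identifier.hpp -- declaration only; cheap to include everywhere
    template <typename IteratorT>
    bool parse_identifier(IteratorT first, IteratorT last);

    // parse_identifier.cpp -- definition plus explicit instantiation;
    // only this compilation unit pays the template instantiation cost
    #include <boost/spirit/core.hpp>

    template <typename IteratorT>
    bool parse_identifier(IteratorT first, IteratorT last)
    {
        using namespace boost::spirit;
        return parse(first, last,
            (alpha_p | '_') >> *(alnum_p | '_'), space_p).full;
    }

    template bool parse_identifier<char const*>(char const*, char const*);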

For a sample of how to achieve this, you may want to look at the Wave preprocessor library, where this technique is used extensively. You can find Wave here.