The Scanner Business

Question: Why doesn't this compile?

    rule<> r = /*...*/;
    parse("hello world", r, space_p); // BAD [attempts phrase level parsing]

But if I remove the skip-parser, everything goes back to normal again:

    rule<> r = *anychar_p;
    parse("hello world", r); // OK [character level parsing]

Sometimes you'll want to pass a rule to one of the parse functions that Spirit provides. The problem is that the rule is a template class that is parameterized by the scanner type. This is rather awkward but unavoidable: the rule is tied to a scanner. What's not obvious is that this scanner must be compatible with the scanner that is ultimately passed to the rule's parse member function. Otherwise, the compiler will complain.

Why does the call with the skip parser not compile? Because of scanner incompatibility. Behind the scenes, the free parse function creates a scanner from the iterators passed in. In the call without a skip parser, the scanner created is a plain vanilla scanner<>. This is compatible with the default scanner type of rule<> [see default template parameters of the rule]. The call with space_p as the skip parser creates a scanner of type phrase_scanner_t:

    typedef skipper_iteration_policy<>                  iter_policy_t;
    typedef scanner_policies<iter_policy_t>             scanner_policies_t;
    typedef scanner<char const*, scanner_policies_t>    phrase_scanner_t;

Thus, in order for the phrase level call to succeed, the rule must be parameterized as rule<phrase_scanner_t>:

    rule<phrase_scanner_t> r = *anychar_p;
    parse("hello world", r, space_p);       //  OK [phrase level parsing]

Take note however that phrase_scanner_t is compatible only when you are using char const* iterators and space_p as the skip parser. Other than that, you'll have to find the right type of scanner. This is tedious to do correctly. In light of this issue, it is best to avoid rules as arguments to the parse functions. Keep in mind that this happens only with rules. The rule is the only parser that has to be tied to a particular scanner type. For instance:

    parse("hello world", *anychar_p);           //  OK  [character level parsing]
    parse("hello world", *anychar_p, space_p);  //  OK  [phrase level parsing]
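The mismatch can be modelled without Spirit at all. The sketch below is only a toy: plain_scanner, skipping_scanner, count_chars, toy_rule and make_rule are hypothetical stand-ins invented for this illustration, not Spirit's classes. It assumes nothing beyond the standard library and tries to show why a parser whose scanner type is deduced at the call works with either scanner, while a rule whose scanner type is fixed at declaration accepts only that one.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <functional>
#include <type_traits>

// Toy stand-ins for illustration only; these are NOT Spirit's real classes.
struct plain_scanner {
    const char* first;
    const char* last;
    bool at_end() { return first == last; }
    char next() { return *first++; }
};

struct skipping_scanner {
    const char* first;
    const char* last;
    void skip() {
        while (first != last && std::isspace(static_cast<unsigned char>(*first)))
            ++first;
    }
    bool at_end() { skip(); return first == last; }
    char next() { skip(); return *first++; }
};

// Plays the role of *anychar_p: the scanner type is deduced at each call.
// Returns the number of characters consumed.
struct count_chars {
    template <typename ScannerT>
    int parse(ScannerT& scan) const {
        int n = 0;
        while (!scan.at_end()) { scan.next(); ++n; }
        return n;
    }
};

// Plays the role of rule<ScannerT>: the scanner type is fixed at declaration.
template <typename ScannerT>
struct toy_rule {
    std::function<int(ScannerT&)> body;
    int parse(ScannerT& scan) const { return body(scan); }
};

template <typename ScannerT>
toy_rule<ScannerT> make_rule() {
    toy_rule<ScannerT> r;
    r.body = [](ScannerT& s) { return count_chars().parse(s); };
    return r;
}

// The two scanner types are unrelated, so a toy_rule<plain_scanner> cannot
// be handed a skipping_scanner: the call would not compile.
static_assert(!std::is_convertible<skipping_scanner&, plain_scanner&>::value,
              "a skipping scanner is not accepted where a plain one is expected");

int count_plain(const char* text) {
    assert(text != nullptr);
    plain_scanner scan = { text, text + std::strlen(text) };
    return make_rule<plain_scanner>().parse(scan);
}

int count_skipping(const char* text) {
    assert(text != nullptr);
    skipping_scanner scan = { text, text + std::strlen(text) };
    return make_rule<skipping_scanner>().parse(scan);
}
```

In this model the rule has to be declared with the scanner type it will actually meet, which is the same choice the FAQ entry describes between rule<> and rule<phrase_scanner_t>.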

Eliminating Left Recursion

Question: I ported a grammar from YACC. It's "kinda" working - the parser itself compiles with no errors. But when I try to parse, it gives me an "invalid page fault". I tracked down the problem to this grammar snippet:

    or_expr = xor_expr | (or_expr >> VBAR >> xor_expr);

What you should do is eliminate direct and indirect left recursion. Left recursion causes the invalid page fault because the parser keeps calling itself without consuming any input: an infinite recursion that eventually overflows the stack. The code above is good for bottom-up parsers such as YACC but not for LL parsers such as Spirit.
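To see where the runaway recursion comes from, here is a small hand-written model of the rule above. It is a sketch under assumptions, not Spirit code: xor_expr is reduced to a single digit, VBAR to the character '|', and a hypothetical max_depth guard is added so that the demonstration stops instead of crashing. A real recursive descent parser has no such guard.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Hypothetical recursive descent model, not Spirit code.
const int max_depth = 1000; // guard so that the demonstration terminates

// Simplified stand-in for xor_expr: matches exactly one digit.
bool xor_expr(const char*& p, const char* end) {
    if (p == end || !std::isdigit(static_cast<unsigned char>(*p)))
        return false;
    ++p;
    return true;
}

// or_expr = xor_expr | (or_expr >> '|' >> xor_expr)
// Sets overflowed when the depth guard trips.
bool or_expr(const char*& p, const char* end, int depth, bool& overflowed) {
    if (depth > max_depth) { overflowed = true; return false; }
    const char* save = p;
    if (xor_expr(p, end))
        return true;                       // first alternative matched
    p = save;
    // The second alternative begins by calling or_expr again at the SAME
    // position: no input has been consumed, so nothing can ever change.
    if (or_expr(p, end, depth + 1, overflowed) && p != end && *p == '|') {
        ++p;
        if (xor_expr(p, end))
            return true;
    }
    p = save;
    return false;
}

// Returns true if parsing text made the recursion hit the depth guard.
bool recursion_runs_away(const char* text) {
    assert(text != nullptr);
    const char* p = text;
    const char* end = text + std::strlen(text);
    bool overflowed = false;
    or_expr(p, end, 0, overflowed);
    return overflowed;
}
```

In this model the recursion runs away as soon as the first alternative fails at the current position, for example on the input "x|1".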

This is similar to a rule in Hartmut Kaiser's C parser in the examples/applications.

    inclusive_or_expression
        = exclusive_or_expression
        | inclusive_or_expression >> OR >> exclusive_or_expression
        ;

Transforming left recursion to right recursion, we have:

    inclusive_or_expression
        = exclusive_or_expression >> inclusive_or_expression_helper
        ;

    inclusive_or_expression_helper
        = OR >> exclusive_or_expression >> inclusive_or_expression_helper
        | epsilon_p
        ;

I'd go further. Since:

    r = a | epsilon_p;

is equivalent to:

    r = !a;

we can simplify inclusive_or_expression_helper thus:

    inclusive_or_expression_helper
        = !(OR >> exclusive_or_expression >> inclusive_or_expression_helper)
        ;

Now, since:

    r = !(a >> r);

is equivalent to:

    r = *a;

we have:

    inclusive_or_expression_helper
        = *(OR >> exclusive_or_expression)
        ;

Now simplifying inclusive_or_expression fully, we have:

    inclusive_or_expression
        = exclusive_or_expression >> *(OR >> exclusive_or_expression)
        ;

Reminds me of the calculators. So in short:

    a = b | a >> op >> b;

in pseudo-YACC is:

    a = b >> *(op >> b);

in Spirit. What could be simpler? Look Ma, no recursion, just iteration.
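The iterative form maps directly onto a loop. The following is a minimal hand-written sketch of a = b >> *(op >> b), not Spirit code; it assumes, purely for illustration, that b is an unsigned decimal number and op is '|' evaluated as bitwise OR. The names parse_operand, parse_or_expr and eval_or_expr are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// b: one or more decimal digits (an assumption made for this example).
bool parse_operand(const char*& p, const char* end, unsigned& value) {
    if (p == end || !std::isdigit(static_cast<unsigned char>(*p)))
        return false;
    value = 0;
    while (p != end && std::isdigit(static_cast<unsigned char>(*p)))
        value = value * 10 + static_cast<unsigned>(*p++ - '0');
    return true;
}

// a = b >> *(op >> b), with op being '|': no recursion, just a loop.
bool parse_or_expr(const char*& p, const char* end, unsigned& value) {
    if (!parse_operand(p, end, value))
        return false;
    while (p != end && *p == '|') {
        const char* save = p;
        ++p;
        unsigned rhs = 0;
        if (!parse_operand(p, end, rhs)) {
            p = save;      // back out of the incomplete "| b" and stop
            break;
        }
        value |= rhs;      // fold left to right, as the left-recursive rule would
    }
    return true;
}

// Succeeds only if the whole input is consumed.
bool eval_or_expr(const std::string& text, unsigned& value) {
    const char* p = text.data();
    const char* end = p + text.size();
    return parse_or_expr(p, end, value) && p == end;
}
```

Folding each operand into the running value inside the loop keeps the left-to-right evaluation order that the left-recursive YACC rule expresses.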

The lexeme_d directive and rules

Question: Does lexeme_d not support expressions which include rules? In the example below, the definition of atomicRule compiles,

    rule<ScannerT> atomicRule
        = lexeme_d[(alpha_p | '_') >> *(alnum_p | '.' | '-' | '_')];

but if I move alnum_p | '.' | '-' | '_' into its own rule, the compiler complains about conversion from const scanner<...> to const phrase_scanner_t&.

    rule<ScannerT> ch 
        = alnum_p | '.' | '-' | '_';

    rule<ScannerT> compositeRule
        = lexeme_d[(alpha_p | '_') >> *(ch)]; // <- error source

You might get the impression that the lexeme_d directive and rules do not mix. Actually, this problem is related to the first FAQ entry: The Scanner Business. More precisely, the lexeme_d directive and rules with incompatible scanner types do not mix. This problem is more subtle. What's causing the scanner incompatibility is the directive itself. The lexeme_d directive transforms the scanner it receives into something that disables the skip parser. This non-skipping scanner, unfortunately, is incompatible with the original scanner before transformation took place.

The simplest solution is not to use rules in the lexeme_d. Instead, you can definitely apply lexeme_d to subrules and grammars if you really need more complex parsers inside the lexeme_d. If you really must use a rule, you need to know the exact scanner used by the directive. The lexeme_scanner metafunction is your friend here. The example above will work as expected once we give the ch rule a correct scanner type:

    rule<lexeme_scanner<ScannerT>::type> ch 
        = alnum_p | '.' | '-' | '_';

Note: make sure to add "typename" before lexeme_scanner when this is used inside a template class or function.

The same thing happens when rules are used inside the as_lower_d directive. In such cases, you can use the as_lower_scanner metafunction. See the documentation for lexeme_scanner and as_lower_scanner.
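The note about typename is easier to see in context. The sketch below uses toy stand-ins (toy_scanner, toy_lexeme_scanner, toy_rule, toy_definition and the skipping and non_skipping tags are all hypothetical names, not Spirit's classes) to model a metafunction that maps a scanner type to its non-skipping counterpart, and shows where typename is required once the scanner is itself a template parameter.

```cpp
#include <type_traits>

// Toy stand-ins for illustration only; these are NOT Spirit's real classes.
struct skipping {};
struct non_skipping {};

template <typename IteratorT, typename SkipPolicyT>
struct toy_scanner {};

// Plays the role of lexeme_scanner<ScannerT>: computes the scanner type
// seen by parsers inside the directive.
template <typename ScannerT>
struct toy_lexeme_scanner;

template <typename IteratorT, typename SkipPolicyT>
struct toy_lexeme_scanner<toy_scanner<IteratorT, SkipPolicyT> > {
    typedef toy_scanner<IteratorT, non_skipping> type;
};

template <typename ScannerT>
struct toy_rule {};

template <typename ScannerT>
struct toy_definition {
    // ScannerT is a template parameter here, so ::type is a dependent name
    // and must be prefixed with typename.
    typedef typename toy_lexeme_scanner<ScannerT>::type lexeme_scanner_t;

    toy_rule<lexeme_scanner_t> ch;            // used inside the directive
    toy_rule<ScannerT>         compositeRule; // used outside the directive
};

typedef toy_scanner<const char*, skipping> toy_phrase_scanner;

// Outside a template no typename is needed, and the two types differ.
static_assert(!std::is_same<toy_phrase_scanner,
                            toy_lexeme_scanner<toy_phrase_scanner>::type>::value,
              "the transformed scanner is a different type");
```

The point of the model is only that the rule used inside the directive and the rule used outside it end up with different scanner types, which is why each needs its own declaration.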

Kleene Star infinite loop

Question: Why does this loop forever?

    rule<> optional = !(str_p("optional"));
    rule<> list_of_optional = *optional;

The problem with this is that the kleene star will continue looping until it gets a no-match from its enclosed parser. Because the optional rule is optional, it will always return a match. Even if the input doesn't match "optional", it will return a zero-length match. list_of_optional will keep calling optional forever, since optional will never return a no-match. So in general, any rule that is "nullable" (meaning it can return a zero-length match) must not be put inside a kleene star.
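The loop can be reproduced with a few lines of plain C++. This is a hypothetical model rather than Spirit code: a parser is a function returning the length of its match, or -1 for a no-match, and the max_iterations cap exists only so that the demonstration terminates. All function names are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// A parser returns the length of its match, or -1 for "no match".
typedef int (*parser_fn)(const char*, const char*);

int match_literal(const char* p, const char* end, const char* word) {
    std::size_t n = std::strlen(word);
    if (static_cast<std::size_t>(end - p) >= n && std::strncmp(p, word, n) == 0)
        return static_cast<int>(n);
    return -1;
}

// Models str_p("optional"): can fail.
int match_required(const char* p, const char* end) {
    return match_literal(p, end, "optional");
}

// Models !str_p("optional"): a failure becomes a zero-length match.
int match_optional(const char* p, const char* end) {
    int len = match_literal(p, end, "optional");
    return len < 0 ? 0 : len;
}

// Models the kleene star. Returns the number of iterations performed,
// stopping early only because of the artificial max_iterations cap.
int star_iterations(parser_fn parser, const char* text, int max_iterations) {
    assert(text != nullptr);
    const char* p = text;
    const char* end = text + std::strlen(text);
    int iterations = 0;
    while (iterations < max_iterations) {
        int len = parser(p, end);
        if (len < 0)
            break;        // only a no-match ends the loop
        p += len;         // a zero-length match makes no progress
        ++iterations;
    }
    return iterations;
}
```

With match_required the loop stops at the first no-match; with match_optional it runs until the cap. Since *(!a) accepts the same input as *a, dropping the optional and writing *str_p("optional") is one way to get the intended behaviour.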

Boost CVS and Spirit CVS

Question: There is Boost CVS and Spirit CVS. Which is used for further development of Spirit?

Generally, development takes place in Spirit's CVS. However, from time to time a new version of Spirit will be integrated into Boost. When this happens, development takes place in the Boost CVS. There will be announcements on the Spirit mailing lists whenever the status of the Spirit CVS changes.