The Rule

The rule is a polymorphic parser that acts as a named place-holder capturing the behavior of an EBNF expression assigned to it. Naming an EBNF expression allows it to be referenced later by another EBNF expression. The rule is a template class parameterized by the type of iterator (IteratorT) that it will work on. The default iterator type is char const*.

The rule class models EBNF's production rule. Example:

 rule<> a_rule = *(a | b) & +(c | d | e);

The type and functionality of the right-hand EBNF expression, which may be arbitrarily complex, is encoded in the rule named a_rule. a_rule may now be referenced elsewhere in the grammar:

rule<> another_rule = f >> g >> h >> a_rule;
Rules and virtual functions

rules straddle the border between static and dynamic C++. In effect, a rule transforms compile-time polymorphism (using templates) into run-time polymorphism (using virtual functions). This is necessary due to C++'s inability to automatically declare a variable of a type deduced from an arbitrarily complex expression in the right-hand side (rhs) of an assignment . Basically, we want to do something like:

T rule = an_arbitrarily_complex_expression;

without having to know or care about the resulting type of the right-hand side of the assignment expression . Apart from this, we also need a facility to forward declare an unknown type:

T rule;
...
rule = a | b;

These limitations lead us to this implementation of rules. This comes at the expense of the overhead of a virtual function call, once through each invocation of a rule.

When a rule is invoked by an EBNF expression, the rule is held by the expression by reference. It is the responsibility of the client to ensure that the referenced rule stays in scope and does not get destructed while it is being referenced ( cyclic dependency ).

Forward declarations:

rules may be declared before being defined to allow cyclic structures typically found in BNF declarations. Example:

rule<> a, b, c;
a = b | a;
b = c | a;

Recursion:

The right-hand side of a rule may reference other rules, including itself. The limitation is that direct or indirect left recursion is not allowed (this is an unchecked run-time error that results in an infinite loop). This is typical of top-down parsers. Example:

a = a | b; // infinite loop!
Dynamic Parsers

We have seen that the symbol table is a dynamic parser. In fact, hosting declarative EBNF in imperative C++ yields an interesting blend. We have the best of both worlds. We have the ability to conveniently modify the grammar at run time using imperative constructs such as if, else statements. Example:

if (feature_is_available)
    r = add_this_feature;

Rules are essentially dynamic parsers. A dynamic parser is characterized by its ability to modify its behavior at run time. Initially, an undefined rule matches nothing. At any time, alternatives may be added, thus, dynamically altering its behavior.

Undefined rules:

An undefined rule matches nothing.

Multiple declarations:

Some BNF variants allow multiple declarations of a rule. The declarations are taken as alternatives. Example, the definition:

r = a;
r = b;

is equivalent to:

r = a | b;

No start rule:

Typically, parsers have what is called a start symbol, chosen to be the root of the grammar where parsing starts. The Spirit parser compiler has no notion of a start symbol. Any rule can be a start symbol. This feature promotes step-wise creation of parsers. We can build parsers from the bottom up while fully testing each level or module up untill we get to the top-most level.