Virtual functions: From static to dynamic C++

Rules straddle the border between static and dynamic C++. In effect, a rule transforms compile-time polymorphism (using templates) into run-time polymorphism (using virtual functions). This is necessary due to C++'s inability to automatically declare a variable of a type deduced from an arbitrarily complex expression in the right-hand side (rhs) of an assignment. Basically, we want to do something like:

    T rule = an_arbitrarily_complex_expression;

without having to know or care about the resulting type of the right-hand side (rhs) of the assignment expression. Apart from this, we also need a facility to forward declare an unknown type:

    T rule;
    rule = a | b;

These limitations lead us to this implementation of rules. This comes at the expense of the overhead of a virtual function call, once through each invocation of a rule.

Multiple declaration

Some BNF variants allow multiple declarations of a rule. The declarations are taken as alternatives. Example:

   r = a;    
   r = b;

is equivalent to:

   r = a | b;

Spirit v1.3 allowed this behavior. However, the current version of Spirit no longer allows this because experience shows that this behavior leads to unwanted gotchas (for instance, it does not allow rules to be held in containers). In the current release of Spirit, a second assignment to a rule will simply redefine it. The old definition is destructed. This follows more closely C++ semantics and is more in line with what the user expects the rule to behave.

Sequencing Syntax

The comma operator as in a, b seems to be a better candidate, syntax-wise. But then the problem is with its precedence. It has the lowest precedence in C/C++, which makes it virtually useless.

Bjarne Stroustrup, in his article "Generalizing Overloading for C++2000" talks about overloading whitespace. Such a feature would allow juxtapositioning of parser objects exactly as we do in (E)BNF (e.g. a b | c instead of a >> b | c). Unfortunately, the article was dated April 1, 1998. Oh well.

Forward iterators

In general, the scanner expects at least a standard conforming forward iterator. Forward iterators are needed for backtracking where the iterator needs to be saved and restored later. Generally speaking, Spirit is a backtracking parser. The implication of this is that at some point, the iterator position will have to be saved to allow the parser to backtrack to a previous point. Thus, for backtracking to work, the framework requires at least a forward iterator.

Some parsers might require more specialized iterators (bi-directional or even random access). Perhaps in the future, deterministic parsers when added to the framework, will perform no backtracking and will need just a single token lookahead, hence will require input iterators only.

Why are subrules important?

Subrules open up the oportunity to do aggressive meta programming as well because they do not rely on virtual functions. The virtual function is the meta-programmer's hell. Not only does it slow down the program due to the virtual function indirect call, it is also an opaque wall where no metaprogram can get past. It kills all meta-information beyond the virtual function call. Worse, the virtual function cannot be templated. Which means that its arguments have to be tied to a actual types. Many problems stem from this limitation.

While Spirit is a currently classified as a non-deterministic recursive-descent parser, Doug Gregor first noted that other parsing techniques apart from top-down recursive descent may be applied. For instance, apart from non-deterministic recursive descent, deterministic LL(1) and LR(1) can theoretically be implemented using the same expression template front end. Spirit rules use virtual functions to encode the RHS parser expression in an opaque abstract parser type. While it serves its purpose well, the rule's virtual functions are the stumbling blocks to more advanced metaprogramming. Subrules are free from virtual functions.