Annotations

Sequencing Syntax

The comma operator, as in a, b, seems to be a better candidate syntax-wise. The problem is its precedence. It has the lowest precedence of all operators in C/C++, lower even than assignment, which makes it virtually useless here. There was some talk on the Boost mailing list about writing a template meta-program that overrides the precedence of the comma operator. Yet no meta-program can distinguish explicit grouping with parentheses. From the viewpoint of the meta-program, r = a, b | c; is the same as r = a, (b | c);.
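The precedence problem can be made concrete with a small sketch. It uses a hypothetical toy expression type: Expr, Rule, and the seq/alt labels are illustrative names, not Spirit's. The overloaded operators only record how C++ grouped their operands. Because assignment binds more tightly than the comma, r = a, b | c; assigns only a to r. An overloaded comma also receives exactly the same tree for a, b | c as for a, (b | c), so no meta-program can tell the two apart.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy "parser expression": it only records the shape of the
// expression tree as text, so we can see how C++ grouped the operators.
struct Expr {
    std::string text;
};

// Overloaded comma, intended to mean "sequence".
Expr operator,(Expr const& lhs, Expr const& rhs) {
    return Expr{"seq(" + lhs.text + "," + rhs.text + ")"};
}

// Overloaded |, meaning "alternative". It binds tighter than the comma.
Expr operator|(Expr const& lhs, Expr const& rhs) {
    return Expr{"alt(" + lhs.text + "," + rhs.text + ")"};
}

// A hypothetical rule that stores whatever expression is assigned to it.
struct Rule {
    std::string def;
    Rule& operator=(Expr const& e) {
        def = e.text;
        return *this;
    }
};

// Returns what r holds after `r = a, b | c;`.
std::string assigned_without_parens() {
    Expr a{"a"}, b{"b"}, c{"c"};
    Rule r;
    r = a, b | c;   // parsed as (r = a), (b | c): the built-in comma discards b | c
    return r.def;
}

// Returns the tree the overloaded operators build, with or without explicit parentheses.
std::string grouping(bool explicit_parens) {
    Expr a{"a"}, b{"b"}, c{"c"};
    return explicit_parens ? (a, (b | c)).text : (a, b | c).text;
}
```

Both calls to grouping produce "seq(a,alt(b,c))". That is the point made above: the operators see only the resulting tree, never the parentheses.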

The initial release of Spirit [V1.0] caused a bit of commotion and controversy over the >> operator and the prefix * and + operators. While some find the operators livable, likable, and even lovable, some respected people in the field find them confusing and unacceptable. Some have strong opinions, while others are fairly neutral but have varied preferences. The author maintains that we cannot view the syntax issue from a strictly logical perspective, because subjectivity inevitably enters into it. It is not about right or wrong. Ultimately, it is plainly and simply a matter of preference.

Bjarne Stroustrup, in his article "Generalizing Overloading for C++2000", talks about overloading whitespace. Such a feature would allow juxtaposition of parser objects exactly as we do in (E)BNF (e.g. a b | c instead of a >> b | c). If and when C++ incorporates whitespace overloading, we'll get much closer than ever to (E)BNF syntax. Unfortunately, the article was dated April 1, 1998. Oh well.

String Types

strlit is a template class parameterized by the string type, which defaults to cstring<>. The auxiliary class cstring is a template class parameterized by the character type, which defaults to char. The cstring utility class accepts a literal string and is declared as:

template <typename CharT = char>
class cstring {

public:

    typedef CharT const* const_iterator;

    cstring(const_iterator str, unsigned len);
    cstring(const_iterator str);

    const_iterator      begin() const;
    const_iterator      end() const;
    std::size_t         length() const;
};

Clients are free to supply strlit with a different class. The class must have the STL-savvy const_iterator typedef, begin/end member functions that provide iterators to its elements, and a length member function that gives the number of elements. std::string meets these requirements and can be used as the template parameter to strlit.
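Here is a minimal sketch of what those requirements allow. A strlit-like matching function written only against const_iterator, begin(), end(), and length() accepts both a cstring-style class and std::string. The names toy_cstring and match_literal are hypothetical. The matching logic is an assumption about how a literal parser might use the interface, not Spirit's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// A minimal class modeled on the cstring interface (hypothetical sketch).
template <typename CharT = char>
class toy_cstring {
public:
    typedef CharT const* const_iterator;

    toy_cstring(const_iterator str, std::size_t len) : first(str), last(str + len) {}
    toy_cstring(const_iterator str) : first(str), last(str) {
        while (*last) ++last;   // scan to the terminating null
    }

    const_iterator begin() const { return first; }
    const_iterator end() const { return last; }
    std::size_t length() const { return static_cast<std::size_t>(last - first); }

private:
    const_iterator first;
    const_iterator last;
};

// Hypothetical strlit-like matcher that uses only the stated requirements.
// Returns the number of characters matched at the start of [first, last),
// or -1 if the literal does not match there.
template <typename StringT, typename IteratorT>
long match_literal(StringT const& lit, IteratorT first, IteratorT last) {
    for (typename StringT::const_iterator it = lit.begin(); it != lit.end(); ++it, ++first) {
        if (first == last || *first != *it)
            return -1;
    }
    return static_cast<long>(lit.length());
}
```

The same match_literal instantiation pattern works for toy_cstring<>("hello") and std::string("hello"), because neither call relies on anything beyond the four listed members.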

Intersections

Some researchers assert that intersections (e.g. a & b) let us define context-sensitive languages ("XBNF" [citing Leu-Weiner, 1973]): "The theory of defining a language as the intersection of a finite number of context free languages was developed by Leu and Weiner in 1973".
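The textbook example illustrates the idea. {a^n b^n c^n} is not context free, yet it is the intersection of the two context-free languages {a^n b^n c^m} and {a^m b^n c^n}. The sketch below recognizes each language with a small hand-written recursive-descent parser and accepts a string only when both parsers consume all of it. That is the effect an intersection such as a & b aims at. The function names are hypothetical, and this is plain C++, not Spirit code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// X -> 'a' X 'b' | (empty), i.e. a^n b^n. Advances pos; false on mismatch.
static bool parse_anbn(std::string const& s, std::size_t& pos) {
    if (pos < s.size() && s[pos] == 'a') {
        ++pos;
        if (!parse_anbn(s, pos)) return false;
        if (pos < s.size() && s[pos] == 'b') { ++pos; return true; }
        return false;
    }
    return true;  // empty alternative
}

// Y -> 'b' Y 'c' | (empty), i.e. b^n c^n.
static bool parse_bncn(std::string const& s, std::size_t& pos) {
    if (pos < s.size() && s[pos] == 'b') {
        ++pos;
        if (!parse_bncn(s, pos)) return false;
        if (pos < s.size() && s[pos] == 'c') { ++pos; return true; }
        return false;
    }
    return true;  // empty alternative
}

// Consumes zero or more copies of ch.
static void skip_all(std::string const& s, std::size_t& pos, char ch) {
    while (pos < s.size() && s[pos] == ch) ++pos;
}

// L1 = { a^n b^n c^m }, a context-free language.
bool in_L1(std::string const& s) {
    std::size_t pos = 0;
    if (!parse_anbn(s, pos)) return false;
    skip_all(s, pos, 'c');
    return pos == s.size();
}

// L2 = { a^m b^n c^n }, also context free.
bool in_L2(std::string const& s) {
    std::size_t pos = 0;
    skip_all(s, pos, 'a');
    if (!parse_bncn(s, pos)) return false;
    return pos == s.size();
}

// Intersection: the string must be in both L1 and L2, giving { a^n b^n c^n }.
bool in_intersection(std::string const& s) { return in_L1(s) && in_L2(s); }
```

For example, "aabbc" is in L1 but not in L2, so the intersection rejects it, while "aabbcc" is accepted by both.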

~ Operator

The complement operator ~ was originally considered. For character sets (chsets) the ~ operator is well defined. Not so for sets of strings: closer examination of its value and meaning only leads to uncertainty. The basic problem stems from the fact that ~a yields U - a, where U is the universal set of all strings. Might this be another fertile ground for future exploration?
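The contrast can be sketched with a toy character set. Assume 8-bit chars, so that the universe U has only 256 members. Complementing a chset then just means flipping every bit. No finite representation of this kind exists when U is the set of all strings. toy_chset is a hypothetical class, not Spirit's chset.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Toy character set over 8-bit chars (hypothetical; assumes 256 possible values).
class toy_chset {
public:
    toy_chset() = default;

    // Inclusive character range [from, to].
    toy_chset(char from, char to) {
        for (int c = static_cast<unsigned char>(from); c <= static_cast<unsigned char>(to); ++c)
            bits.set(static_cast<std::size_t>(c));
    }

    bool test(char ch) const { return bits.test(static_cast<unsigned char>(ch)); }

    // Complement relative to the finite universe of 256 chars: flip every bit.
    friend toy_chset operator~(toy_chset const& s) {
        toy_chset r;
        r.bits = ~s.bits;
        return r;
    }

private:
    std::bitset<256> bits;
};
```

With digits = toy_chset('0', '9'), ~digits contains 'x' but not '5'. Because the universe is finite, ~digits is a complete and well-defined set.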

Cyclic Dependency

Rules tend to have cyclic dependencies. It is typical for a rule a to reference another rule b, which in turn directly or indirectly references rule a. Because of this, memory management through smart pointers using a reference counting scheme is not an option: the cycle keeps every count above zero, so the rules would never be freed.
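The following sketch shows why plain reference counting fails for cyclic rules. It uses std::shared_ptr and a hypothetical toy_rule node that counts its live instances. Once a references b and b references a, dropping both handles leaves each count at 1, so neither node is destroyed. The demo breaks the cycle by hand through a std::weak_ptr only so that it does not leak itself.

```cpp
#include <cassert>
#include <memory>

// Hypothetical rule node holding a reference-counted pointer to another rule.
// Live instances are counted so leaks can be observed.
struct toy_rule {
    std::shared_ptr<toy_rule> ref;
    static int alive;
    toy_rule() { ++alive; }
    ~toy_rule() { --alive; }
};
int toy_rule::alive = 0;

// Builds a -> b -> a, drops both handles, and reports how many nodes survived.
int alive_after_dropping_cycle() {
    int before = toy_rule::alive;
    std::weak_ptr<toy_rule> observer;
    {
        auto a = std::make_shared<toy_rule>();
        auto b = std::make_shared<toy_rule>();
        a->ref = b;   // a references b
        b->ref = a;   // b references a: a cycle
        observer = a;
    }   // each use_count drops from 2 to 1, never to 0, so no destructor runs
    int leaked = toy_rule::alive - before;
    if (auto survivor = observer.lock())
        survivor->ref.reset();   // manual cycle breaking, which counting alone cannot do
    return leaked;
}

// Same setup without the back reference: everything is released.
int alive_after_dropping_chain() {
    int before = toy_rule::alive;
    {
        auto a = std::make_shared<toy_rule>();
        auto b = std::make_shared<toy_rule>();
        a->ref = b;   // a references b, but nothing references a
    }   // a's count reaches 0, which in turn releases b
    return toy_rule::alive - before;
}
```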

Left Recursion

Left recursion (e.g. a = a | b;) is not allowed in Spirit: a recursive-descent parser would invoke a again before consuming any input and recurse forever. There are ways to rewrite the meta-program hierarchy to circumvent this limitation. This might be implemented in a future release of Spirit in the form of a directive.
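The usual workaround today is a grammar rewrite. The textbook transformation replaces a left-recursive production A = A α | β with A = β α*. In Spirit-style notation, expr = expr >> '-' >> term | term becomes expr = term >> *('-' >> term). The hand-written recursive-descent sketch below applies that rewrite to a hypothetical subtraction grammar with single-digit terms. The loop stands in for the left recursion and keeps '-' left associative. It illustrates the standard grammar transformation, not the directive mentioned above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical recursive-descent parser (a sketch, not Spirit code).
// Left-recursive grammar, unusable by naive recursive descent:
//     expr -> expr '-' term | term
// Equivalent grammar after left-recursion elimination:
//     expr -> term ('-' term)*
// term is a single decimal digit to keep the sketch small.
struct subtraction_parser {
    std::string const& in;
    std::size_t pos;

    explicit subtraction_parser(std::string const& input) : in(input), pos(0) {}

    bool term(long& value) {
        if (pos < in.size() && std::isdigit(static_cast<unsigned char>(in[pos]))) {
            value = in[pos++] - '0';
            return true;
        }
        return false;
    }

    // The loop replaces the left recursion and folds results left to right.
    bool expr(long& value) {
        if (!term(value)) return false;
        while (pos < in.size() && in[pos] == '-') {
            std::size_t save = pos++;
            long rhs = 0;
            if (!term(rhs)) { pos = save; break; }   // un-consume a dangling '-'
            value -= rhs;
        }
        return true;
    }
};

// Parses the whole string; returns true and sets value on success.
bool evaluate(std::string const& s, long& value) {
    subtraction_parser p(s);
    return p.expr(value) && p.pos == s.size();
}
```

For example, evaluate("9-4-3", v) yields 2, that is (9-4)-3. This is the left-associative result the original left-recursive rule describes.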

The auto Keyword

The rule is an engineering compromise constrained by the limitations of C++. A Spirit EBNF rule such as:

rule<> rule = ch_p('a') | 'b';

could actually be rewritten as:

alternative<chlit<>, chlit<> > rule = ch_p('a') | 'b';

There are clear advantages. The second form is much more efficient and more flexible (it is not coupled to the iterator type). The problem is that declaring complex EBNF expressions this way is tedious and error prone, because the type declared on the left hand side (lhs) has to mirror the structure of the right hand side (rhs) expression.
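The trade-off can be sketched with toy stand-ins for chlit and alternative. The names toy_chlit, toy_alternative, and toy_ch are hypothetical, not Spirit's classes. The static_assert confirms that the expression's type encodes its structure, so a statically typed declaration must spell that type out. A std::function-based rule hides the type and accepts any expression, but every call then goes through an indirect dispatch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <type_traits>

// Toy character-literal parser: matches one char at pos and advances on success.
struct toy_chlit {
    char ch;
    bool operator()(std::string const& s, std::size_t& pos) const {
        if (pos < s.size() && s[pos] == ch) { ++pos; return true; }
        return false;
    }
};

// Toy alternative: tries left first, then right.
template <typename L, typename R>
struct toy_alternative {
    L left;
    R right;
    bool operator()(std::string const& s, std::size_t& pos) const {
        return left(s, pos) || right(s, pos);
    }
};

inline toy_chlit toy_ch(char c) { return toy_chlit{c}; }

// chlit | 'x' builds an alternative whose type records both operands.
inline toy_alternative<toy_chlit, toy_chlit> operator|(toy_chlit const& l, char c) {
    return toy_alternative<toy_chlit, toy_chlit>{l, toy_chlit{c}};
}

// The expression's type mirrors its structure...
static_assert(std::is_same<decltype(toy_ch('a') | 'b'),
                           toy_alternative<toy_chlit, toy_chlit> >::value,
              "the declared type must mirror the rhs expression");

// ...so a statically typed rule has to spell that type out on the lhs.
toy_alternative<toy_chlit, toy_chlit> const static_rule = toy_ch('a') | 'b';

// A type-erased rule accepts any expression, at the cost of an indirect call.
typedef std::function<bool(std::string const&, std::size_t&)> erased_rule_t;
erased_rule_t const erased_rule = toy_ch('a') | 'b';
```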

David Abrahams (boost.org) proposed in comp.std.c++ to reuse the auto keyword for this purpose. Example:

auto rule = ch_p('a') | 'b';

This would have been a neat solution and a perfect fit for our purpose. It is not a complete solution, though, since there are still situations where we do not know the rhs beforehand, for instance when pre-declaring cyclically dependent rules.
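C++11 did eventually repurpose auto for type deduction. The sketch below needs C++14 or later, for deduced function return types. It uses hypothetical lambda-based combinators ch, seq, and alt, not Spirit's API. auto deduces the full type of ab with no effort from the programmer. A self-referencing rule, however, still has to be pre-declared with a type that does not depend on its rhs (here std::function), which is exactly the limitation noted above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical combinators: a parser is any callable
// bool(std::string const&, std::size_t&) that advances pos on success.
inline auto ch(char c) {
    return [c](std::string const& s, std::size_t& pos) {
        if (pos < s.size() && s[pos] == c) { ++pos; return true; }
        return false;
    };
}

template <typename P, typename Q>
auto seq(P p, Q q) {
    return [p, q](std::string const& s, std::size_t& pos) {
        std::size_t save = pos;
        if (p(s, pos) && q(s, pos)) return true;
        pos = save;   // undo partial progress on failure
        return false;
    };
}

template <typename P, typename Q>
auto alt(P p, Q q) {
    return [p, q](std::string const& s, std::size_t& pos) {
        return p(s, pos) || q(s, pos);
    };
}

// auto deduces the composite closure type; nobody has to spell it out.
auto const ab = alt(ch('a'), ch('b'));

typedef std::function<bool(std::string const&, std::size_t&)> rule_t;

// Balanced parentheses: nested -> '(' nested ')' nested | (empty).
// `auto nested = ... nested ...;` would be ill-formed, so the rule is
// declared first with a type-erased type and defined afterwards.
bool balanced(std::string const& s) {
    rule_t nested;
    auto self = [&nested](std::string const& in, std::size_t& pos) { return nested(in, pos); };
    nested = alt(seq(ch('('), seq(self, seq(ch(')'), self))),
                 [](std::string const&, std::size_t&) { return true; });
    std::size_t pos = 0;
    return nested(s, pos) && pos == s.size();
}
```

For example, balanced("(())()") returns true and balanced("(()") returns false.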

Similar Work

The work of Dr. Damian Conway shares some similarities with Spirit. Like Spirit, his work involves inline parsing in C++. Although any similarity to Dr. Conway's work is purely coincidental, one may think of Spirit as a modern version of his work. It uses modern C++ features and techniques, such as static polymorphism through partial template specialization instead of dynamic dispatch through virtual functions. In some ways, Spirit could benefit from learning more about Dr. Conway's work, especially its implementation of deferred expressions.

Research, design and implementation of Spirit V1.2 led this author to Haskell parser combinators. In a nutshell, combinators provide inline parsing in a functional language such as Haskell. The works of S. Doaitse Swierstra and Luc Duponcheel on deterministic, error-correcting combinator parsers are of particular interest and should lead the way to the future of Spirit.