The Parser

What makes Spirit tick? First, some concepts. The parser class is the most fundamental entity in the framework. Basically, a parser accepts a pair of iterators and returns a match object as its result. The iterators delimit the data currently being parsed. The match object evaluates to true if the parse succeeds, in which case the input is advanced accordingly. Each parser can represent a specific pattern or algorithm, or it can be a more complex parser formed as a composition of other parsers.

All parsers inherit from the base template class, parser:

template <typename DerivedT>
struct parser {
    /*...*/
    DerivedT&       derived();
    DerivedT const& derived() const;
};

This class is a protocol base class for all parsers; it is essentially an interface contract. The parser class does not really know how to parse anything but instead relies on the template parameter DerivedT to do the actual parsing. This technique is known as the "Curiously Recurring Template Pattern" (CRTP) in template meta-programming circles. This inheritance strategy gives us the power of polymorphism without the virtual function overhead. In essence, this is a way to implement compile-time polymorphism.
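To make the pattern concrete, here is a minimal, self-contained sketch of compile-time polymorphism through a CRTP base. The bodies of derived(), the two derived types digit_p and alpha_p, their name() member, and the describe() function are all invented for illustration; they are not part of the framework.

```cpp
#include <string>

// Stand-in for the base class shown above. The bodies of derived() are an
// assumption: a static_cast to the derived type is the usual CRTP idiom.
template <typename DerivedT>
struct parser {
    DerivedT& derived() { return *static_cast<DerivedT*>(this); }
    DerivedT const& derived() const {
        return *static_cast<DerivedT const*>(this);
    }
};

// Two hypothetical derived types; name() exists only for this sketch.
struct digit_p : parser<digit_p> {
    std::string name() const { return "digit"; }
};

struct alpha_p : parser<alpha_p> {
    std::string name() const { return "alpha"; }
};

// Accepts any parser<DerivedT>. The call to name() is resolved at compile
// time through derived(); no virtual function is involved.
template <typename DerivedT>
std::string describe(parser<DerivedT> const& p) {
    return p.derived().name();
}
```

Calling describe(digit_p()) yields "digit" and describe(alpha_p()) yields "alpha", even though describe() only ever sees the base class.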

Concrete sub-classes inheriting from parser must have a corresponding member function parse() compatible with the conceptual interface:

template <typename IteratorT>
match
parse(IteratorT& first, IteratorT last) const;

where first points to the current input position, last points to one past the end of the input (reminiscent of STL algorithms), and the returned match reports the success or failure of the parse. Note that first is passed by reference: it is advanced appropriately when a match is found; otherwise its position is undefined.

Token Types

Since parse is a template member function, its IteratorT template parameter is an abstract concept. This implies that parsers can deal with arbitrary token types. A token can be a char, a wchar_t, an enum, an integer or a user-defined type. Spirit can work on arbitrary token types as long as the == and < operators are applicable to their instances.

IteratorT is an STL-compliant forward iterator. It is passed by reference to allow the function to 'move' its position accordingly when a match is found. The parse is considered successful if a portion of the input starting at the current iterator position is matched. The parse function terminates as soon as the iterator finds anything that the parser does not recognize or when the input is exhausted (first == last).
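As an illustration of token independence, the sketch below runs one and the same parser template over a stream of enum tokens and over plain characters. The match and parser classes are minimal stand-ins written for this example, and token_p, the token enum and the two helper functions are hypothetical names. The sketch only relies on the == operator of the token type.

```cpp
#include <string>
#include <vector>

// Minimal stand-in for the framework's match class (an assumption).
class match {
public:
    match() : len(-1) {}                          // failed parse
    explicit match(int length) : len(length) {}   // successful parse
    operator bool() const { return len >= 0; }
    int length() const { return len; }
private:
    int len;
};

// Minimal stand-in for the CRTP base class.
template <typename DerivedT>
struct parser {
    DerivedT const& derived() const {
        return *static_cast<DerivedT const*>(this);
    }
};

// Hypothetical parser: matches a single token equal to `expected`.
template <typename TokenT>
struct token_p : parser<token_p<TokenT> > {
    explicit token_p(TokenT expected_) : expected(expected_) {}

    template <typename IteratorT>
    match parse(IteratorT& first, IteratorT last) const {
        if (first != last && *first == expected) {
            ++first;            // advance only when the token matches
            return match(1);
        }
        return match();
    }

    TokenT expected;
};

enum token { tok_ident, tok_number, tok_plus };

// The iterator walks over enum values here...
bool starts_with_number(std::vector<token> const& input) {
    token_p<token> number(tok_number);
    std::vector<token>::const_iterator first = input.begin();
    return number.parse(first, input.end());
}

// ...and over characters here, with the very same parser template.
bool starts_with_char(std::string const& input, char c) {
    token_p<char> ch(c);
    std::string::const_iterator first = input.begin();
    return ch.parse(first, input.end());
}
```

Note that both arguments to parse() must have the same iterator type, since IteratorT is deduced from both; the helpers take their input by const reference so that begin() and end() agree.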

A parser can be quite simple. Here is a sample parser that accepts any single character:

struct anychar : public parser<anychar> {
    template <typename IteratorT>
    match
    parse(IteratorT& first, IteratorT last) const
    {
        if (first != last)
        {
            ++first;
            return match(1);
        }
        return match();
    }
};
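Recall that a parser may also be a composition of other parsers. The following is a rough sketch of how such a composite might look: a parser that matches one parser followed by another. The match and parser classes are minimal stand-ins, and sequence_sketch and match_two_chars are hypothetical names invented for this example; the framework's own composites are covered in later sections and may work differently.

```cpp
#include <string>

// Minimal stand-in for the framework's match class (an assumption).
class match {
public:
    match() : len(-1) {}                          // failed parse
    explicit match(int length) : len(length) {}   // successful parse
    operator bool() const { return len >= 0; }
    int length() const { return len; }
private:
    int len;
};

// Minimal stand-in for the CRTP base class.
template <typename DerivedT>
struct parser {
    DerivedT const& derived() const {
        return *static_cast<DerivedT const*>(this);
    }
};

// The sample parser from the text.
struct anychar : public parser<anychar> {
    template <typename IteratorT>
    match parse(IteratorT& first, IteratorT last) const {
        if (first != last) {
            ++first;
            return match(1);
        }
        return match();
    }
};

// Hypothetical composite: succeeds if `a` matches and `b` matches right
// after it. The resulting length is the sum of both lengths.
template <typename A, typename B>
struct sequence_sketch : public parser<sequence_sketch<A, B> > {
    sequence_sketch(A const& a_, B const& b_) : a(a_), b(b_) {}

    template <typename IteratorT>
    match parse(IteratorT& first, IteratorT last) const {
        IteratorT save = first;   // position is undefined after a failure
        match ma = a.parse(first, last);
        if (ma) {
            match mb = b.parse(first, last);
            if (mb)
                return match(ma.length() + mb.length());
        }
        first = save;             // restoring is a choice made by this sketch
        return match();
    }

    A a;
    B b;
};

// Returns the length matched by two consecutive anychar parsers, or -1.
int match_two_chars(std::string const& input) {
    anychar any;
    sequence_sketch<anychar, anychar> two(any, any);
    std::string::const_iterator first = input.begin();
    return two.parse(first, input.end()).length();
}
```

With this sketch, match_two_chars("ab") gives 2, while an input of fewer than two characters gives -1.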

A match object is returned by the parse function. Most importantly, the match object reports the success of the parse; i.e. it evaluates to true if the parse function is successful, false otherwise. If the parse is successful, the match object may also be queried to report the number of characters matched (match.length()). The length is non-negative if the match is successful, and the typical length of a parse failure is -1. Note: a default-constructed match object represents an unsuccessful parse. Here is a code snippet:

match   hit = parse(first, last);
bool    success = hit;
int     length = hit.length();
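For a fuller picture, the sketch below applies the anychar parser repeatedly and inspects each returned match. The match and parser classes are minimal stand-ins written to agree with the description above (a default-constructed match fails and reports a length of -1), and consume_all is a hypothetical helper, not a framework function.

```cpp
#include <string>

// Minimal stand-in for the framework's match class (an assumption).
class match {
public:
    match() : len(-1) {}                          // failed parse
    explicit match(int length) : len(length) {}   // successful parse
    operator bool() const { return len >= 0; }
    int length() const { return len; }
private:
    int len;
};

// Minimal stand-in for the CRTP base class.
template <typename DerivedT>
struct parser {
    DerivedT const& derived() const {
        return *static_cast<DerivedT const*>(this);
    }
};

// The sample parser from the text.
struct anychar : public parser<anychar> {
    template <typename IteratorT>
    match parse(IteratorT& first, IteratorT last) const {
        if (first != last) {
            ++first;
            return match(1);
        }
        return match();
    }
};

// Hypothetical driver: applies `p` until it fails and returns the total
// number of characters matched.
template <typename ParserT>
int consume_all(ParserT const& p, std::string const& input) {
    std::string::const_iterator first = input.begin();
    int total = 0;
    for (;;) {
        match hit = p.parse(first, input.end());
        bool success = hit;                  // true if the parse succeeded
        if (!success || hit.length() == 0)   // stop on failure or no progress
            break;
        total += hit.length();
    }
    return total;
}
```

Here consume_all(anychar(), "abc") returns 3: three successful matches of length 1, followed by a failure once first == last.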

Clients of the framework generally do not need to write their own hand-coded parsers at all. Spirit has an immense repertoire of pre-defined parsers covering all aspects of syntactic and semantic analysis. We shall examine this repertoire of parsers in the following sections. In the rare case where a specific functionality is not available, it is extremely easy to write a user-defined parser. The ease of writing a parser entity is the main reason for Spirit's extensibility.