The Grammar

The grammar encapsulates a set of rules. The grammar class is a protocol base class for all grammars; it is essentially an interface contract. It is a template class parameterized by its derived class, DerivedT, and its context, ContextT. ContextT defaults to parser_context, a predefined context, and you need not be concerned with it unless you wish to tweak the low-level behavior of the grammar. Detailed information on the ContextT template parameter is provided elsewhere. The grammar relies on DerivedT, a grammar subclass, to define the actual rules.

Presented below is the public API. There may actually be more template parameters after ContextT. Everything after the ContextT parameter is strictly for internal use and should not concern the client.

    template<
        typename DerivedT,
        typename ContextT = parser_context>
    struct grammar;

Grammar definition

A concrete subclass inheriting from grammar is expected to have a nested template class (or struct) named definition:

It is a nested template class with a typename ScannerT parameter.
Its constructor defines the rules.
Its constructor is passed a reference to the enclosing grammar object, self.
It has a member function named start that returns a reference to the start rule.

Grammar skeleton

    struct my_grammar : public grammar<my_grammar>
    {
        template <typename ScannerT>
        struct definition
        {
            rule<ScannerT>  r;
            definition(my_grammar const& self)  { r = /*..define here..*/; }
            rule<ScannerT> const& start() const { return r; }
        };
    };
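
To picture how a base class parameterized by DerivedT can reach the nested definition, here is a minimal Boost-free sketch. It is not Spirit's implementation: toy_grammar, matches and the bool-returning start are hypothetical names invented for illustration, and InputT merely stands in for the scanner type.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for a grammar base class (not Spirit's grammar<>).
template <typename DerivedT>
struct toy_grammar
{
    // The base holds no rules of its own. It builds DerivedT's nested
    // definition for whatever input type it is handed, passing in self.
    template <typename InputT>
    bool matches(InputT const& input) const
    {
        DerivedT const& self = static_cast<DerivedT const&>(*this);
        typename DerivedT::template definition<InputT> def(self);
        return def.start(input);
    }
};

// A "grammar" that accepts non-empty sequences of decimal digits.
struct digits : toy_grammar<digits>
{
    template <typename InputT>
    struct definition
    {
        explicit definition(digits const& /*self*/) {}

        // In this toy, start() does the matching itself instead of
        // returning a reference to a start rule.
        bool start(InputT const& input) const
        {
            if (input.empty())
                return false;
            for (typename InputT::const_iterator it = input.begin();
                 it != input.end(); ++it)
            {
                if (!std::isdigit(static_cast<unsigned char>(*it)))
                    return false;
            }
            return true;
        }
    };
};
```

Calling digits().matches(std::string("12345")) instantiates definition<std::string> only at that point, which is the same reason a real grammar can be declared without naming a scanner.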

Decoupling the scanner type, and hence the iterator, from the rules that form a grammar allows the grammar to be used in different contexts, possibly with different scanners and iterators. We don't care what scanner we are dealing with: the user-defined my_grammar can be instantiated without regard to a scanner type and can be used as a parser with any type of scanner. In short, unlike the rule, the grammar is not tied to a specific scanner type. See "Scanner Business" for why this is important and for further detail on the scanner-rule coupling problem.

Instantiating and using my_grammar

Our grammar above may now be instantiated and put into action:

    my_grammar g;

    if (parse(first, last, g, space_p).full)
        cout << "parsing succeeded\n";
    else
        cout << "parsing failed\n";

my_grammar IS-A parser and can be used anywhere a parser is expected, even referenced by another rule:

    rule<>  r = g >> str_p("cool huh?");

Full Grammar Example

Following our original calculator example, here it is now rewritten using a grammar:

    struct calculator : public grammar<calculator>
    {
        template <typename ScannerT>
        struct definition
        {
            definition(calculator const& self)
            {
                group       = '(' >> expression >> ')';
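                // integer is not declared in this listing; assume an integer
                // parser defined elsewhere, such as int_p.
                factor      = integer | group;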
                term        = factor >> *(('*' >> factor) | ('/' >> factor));
                expression  = term >> *(('+' >> term) | ('-' >> term));
            }

            rule<ScannerT> expression, term, factor, group;

            rule<ScannerT> const&
            start() const { return expression; }
        };
    };

A fully working example with semantic actions is part of the Spirit distribution.
[ See libs/spirit/example/fundamental/calc/calc_plain.cpp ]

self

You might notice that the definition of the grammar has a constructor that accepts a const reference to the outer grammar. In the example above, notice that calculator::definition takes in a calculator const& self. While this is unused in the example above, in many cases, this is very useful. The self argument is the definition's window to the outside world. For example, the calculator class might have a reference to some state information that the definition can update while parsing proceeds through semantic actions.

Grammar Capsules

As a grammar grows more complicated, it is a good idea to group parts of it into logical modules. For instance, when writing a language, it might be wise to put expressions and statements into separate grammar capsules. The grammar takes advantage of the encapsulation properties of C++ classes. The declarative nature of classes makes them a perfect fit for the definition of grammars. Since a grammar is nothing more than a class declaration, we can conveniently publish it in header files. The idea is that once written and fully tested, a grammar can be reused in many contexts. We now have the notion of grammar libraries.

Reentrancy and multithreading

An instance of a grammar may be used in different places multiple times without any problem. The implementation is tuned to allow this at the expense of some overhead. However, we can save considerable cycles and bytes if we are certain that a grammar will only have a single instance. If this is desired, simply define BOOST_SPIRIT_SINGLE_GRAMMAR_INSTANCE before including any spirit header files.

    #define BOOST_SPIRIT_SINGLE_GRAMMAR_INSTANCE

On the other hand, if a grammar is intended to be used in multithreaded code, we should define BOOST_SPIRIT_THREADSAFE before including any Spirit header files. In this case, linking against Boost.Threads is also required.

    #define BOOST_SPIRIT_THREADSAFE