Nov 06

Spirit V2.1 will be released with Boost V1.41

By Hartmut Kaiser

Now that the release of the new Spirit version is at our doorstep, I would like to whet your appetite. Let’s start with a list of high-level things worth knowing. Here is the one floor elevator speech I stole from Spirit’s documentation:

Spirit is an object-oriented, recursive-descent parser and output generation library for C++. It allows you to write grammars and format descriptions using a domain specific embedded language (DSEL) directly in C++. The DSEL has a format similar to Parsing Expression Grammars (PEG). These inline grammar specifications can mix freely with other C++ code and, thanks to the generative power of C++ templates, they are immediately executable.

Let’s try to digest this piece by piece. I do not think I need to dig into Spirit being ‘object-oriented’; that speaks for itself. ‘Recursive descent’ is a way to write parsers. Wikipedia says:

A recursive descent parser is a top-down parser built from a set of mutually-recursive procedures (or a non-recursive equivalent) where each such procedure usually implements one of the production rules of the grammar. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes.

Spirit is special in that it applies the technique of mutually recursive procedures not only to parsers but also to output generators. We are quite proud of this, as we do not know of anybody doing this before. Compared to earlier Spirit versions, this extends the scope of the library considerably, as we now support two very important steps in any data transformation flow. Spirit.Qi (the sub-library responsible for parsing) helps convert arbitrarily formatted (text) input into some internal structured data format, while Spirit.Karma (the sub-library for output generation) simplifies the opposite: converting the internal binary data structures into arbitrarily formatted (text) output. This picture shows what I mean:

The place of Spirit.Qi and Spirit.Karma in a data transformation flow of a typical application

Grammars for Spirit.Qi and Spirit.Karma are written directly in C++. We use extensive operator overloading to make the grammar syntax very similar to PEG, a well-known way to describe formal grammars unambiguously. Let’s have a look at a simple example. The following PEG grammar describes a trivial infix calculator that supports ‘*’, ‘/’, ‘+’, and ‘-’, and allows sub-expressions to be grouped with parentheses:

fact ← integer / '(' expr ')'
term ← fact (('*' fact) / ('/' fact))*
expr ← term (('+' term) / ('-' term))*
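To make the ‘mutually recursive procedures’ idea concrete before turning to Spirit, here is a minimal hand-written sketch of a recognizer for this grammar, with one member function per production. The names CalcRecognizer and matches_calculator are made up for this illustration, and I assume that ‘integer’ means a non-empty run of decimal digits (the grammar above does not define it).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hand-written recursive descent recognizer; one function per grammar rule.
// All names here are illustrative and not part of Spirit.
class CalcRecognizer {
public:
    explicit CalcRecognizer(const std::string& input) : s_(input), pos_(0) {}

    // True only if the complete input matches 'expr'.
    bool matches() { pos_ = 0; return expr() && pos_ == s_.size(); }

private:
    // fact <- integer / '(' expr ')'
    bool fact() {
        if (integer()) return true;
        std::size_t save = pos_;
        if (lit('(') && expr() && lit(')')) return true;
        pos_ = save;  // backtrack if the parenthesized alternative failed
        return false;
    }

    // term <- fact (('*' fact) / ('/' fact))*
    bool term() {
        if (!fact()) return false;
        for (;;) {
            std::size_t save = pos_;
            if ((lit('*') || lit('/')) && fact()) continue;
            pos_ = save;  // undo a partial match of the repeated part
            return true;
        }
    }

    // expr <- term (('+' term) / ('-' term))*
    bool expr() {
        if (!term()) return false;
        for (;;) {
            std::size_t save = pos_;
            if ((lit('+') || lit('-')) && term()) continue;
            pos_ = save;
            return true;
        }
    }

    // Assumption: an integer is one or more decimal digits.
    bool integer() {
        std::size_t start = pos_;
        while (pos_ < s_.size() &&
               std::isdigit(static_cast<unsigned char>(s_[pos_]))) ++pos_;
        return pos_ != start;
    }

    // Consume a single literal character if it is next in the input.
    bool lit(char c) {
        if (pos_ < s_.size() && s_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    std::string s_;
    std::size_t pos_;
};

bool matches_calculator(const std::string& input) {
    return CalcRecognizer(input).matches();
}
```

Calling matches_calculator("2*(3+4)") should succeed, while an input such as "1+" is rejected because the trailing ‘+’ is left unconsumed. Note how fact() calls expr(), which calls term(), which calls fact() again: that is the mutual recursion the Wikipedia quote talks about.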

Here is the same grammar, but written in Spirit.Qi, and yes, that’s perfectly valid C++. This code defines a parser which recognizes infix calculator expressions:

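#include <boost/spirit/include/qi.hpp>
#include <string>

namespace qi = boost::spirit::qi;

// A rule that parses from std::string iterators and exposes no attribute
typedef qi::rule<std::string::iterator> rule;
rule fact, term, expr;

// The definitions are ordinary assignments, so they go inside a function body
fact = qi::int_ | '(' >> expr >> ')' ;
term = fact >> *(('*' >> fact) | ('/' >> fact)) ;
expr = term >> *(('+' >> term) | ('-' >> term)) ;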

The last three lines look almost identical to the original PEG grammar above, except for the ‘>>’ used to denote concatenation (and ‘|’ in place of PEG’s ‘/’ for alternatives). Grammars written for output formatting in Spirit.Karma look very similar. The most notable difference is probably that they use the left shift operator ‘<<’ instead of Spirit.Qi’s right shift operator ‘>>’ for concatenation. No C++ programmer will find that surprising, as it follows the convention of using ‘<<’ for output and ‘>>’ for input.

The last fact in our one floor elevator speech from above is obvious now. Spirit is pure C++, uses only language constructs provided by C++, and for this reason integrates smoothly with any surrounding code. All data items in your program are easily accessible as the destination for parsing results or as the source for generating output. No additional external tools, as conventional parser generators normally require, are needed.

Interested in trying out Spirit? I suggest you start reading the documentation (here) or browse through the extensive examples coming with the library as soon as Boost V1.41 has been released (or grab the beta here). In the meantime, if you have any questions, leave a comment or use Spirit’s mailing list.
