Beginner – Boost.Spirit

Phoenix V3 – An Overview

Joel de Guzman — Wed, 08 Jun 2011 21:36:05 +0000

Here’s another BoostCon video uploaded by Marshall Clow. This one is about Phoenix V3, by Hartmut Kaiser:

http://blip.tv/boostcon/phoenix-v3-an-overview-5250984

The slides for this talk can be found here: https://github.com/boostcon/2011_presentations/blob/master/mon/phoenix_v3.pdf?raw=true.

Phoenix will be the next generation of libraries for creating unnamed, inlined, polymorphic function objects. With V3 we combine the functionality of Boost.Bind and Boost.Lambda and arrange it into a new library. By writing this new library, we were able to fix some limitations of the aforementioned libraries without breaking backwards compatibility. The purpose of the talk is to outline the importance and elegance of functional programming (FP) in C++. The first part of the talk gives an introduction to the Domain Specific Embedded Language (DSEL) we defined with Phoenix. A DSEL is built with the help of regular C++ function and operator overloads. For Phoenix we defined a language that emulates C++, to give potential users a low barrier to entry into the world of FP. A lot of existing C++ code relies on higher-order functions (better known as function objects); for example, the C++ standard library uses them to let users customize operations in certain algorithms. The second part of the talk focuses on examples of how to use Phoenix instead of writing regular function objects, and how to enable your legacy code to be used inside Phoenix expressions. However, Phoenix is more. Phoenix is equipped with a mechanism, unique in C++, to handle the expressions discussed in the previous sections as data. This allows us to evaluate Phoenix expressions not only in the standard C++ way but in any way you like. The last part of the talk gives an overview of these mechanisms, to give potential users an insight into possible future applications that might evolve around Phoenix.

Rating: 5.0/5 (2 votes cast)

Spirit.Qi in the Real World

Joel de Guzman — Wed, 08 Jun 2011 21:34:32 +0000

This is the first time I missed attending BoostCon (May 15-20, 2011 – Aspen, Colorado). Fortunately, for those of us who were not able to attend, Marshall Clow uploaded some videos. Here’s one that’s relevant to Spirit: “Spirit.Qi in the Real World”, by Robert Stewart. Watch the presentation here:

http://blip.tv/boostcon/spirit-qi-in-the-real-world-5254335

You can find the slides here: https://github.com/boostcon/2011_presentations/raw/master/tue/spirit_qi_in_the_real_world.pdf

Past sessions on Spirit have focused on introducing Spirit or showing extracts of real use, intermingled with tutorial highlights. Upon writing real Spirit.Qi parsers, however, one quickly discovers that “the devil is in the details.” There are special cases, tricks, and idioms that one must discover by trial and error or, perhaps, by following the Spirit mailing list, all of which take time and may not be convenient. In this session, we’ll walk through the development of a Spirit.Qi parser for printf()-style format strings. The result will be a replacement for printf() that is typesafe and efficient.

Rating: 5.0/5 (2 votes cast)

Attribute Propagation and Attribute Compatibility

Hartmut Kaiser — Sat, 12 Feb 2011 22:15:46 +0000

Narinder Claire asked a seemingly innocent question on the Spirit mailing list the other day. After starting to write an answer I realized that this question is not innocent at all, as it touches the very fabric of Spirit: the rules of attribute handling. Many people have a hard time properly understanding what is going on in the nether regions of Spirit. More importantly, they have a hard time understanding why Spirit is implemented the way it is.

The new article Attribute Propagation and Attribute Compatibility not only answers Narinder’s question but tries to explain those important concepts in more detail.

Rating: 5.0/5 (3 votes cast)

Using Boost.Spirit V2: Qi and Karma

Joel de Guzman — Sat, 04 Dec 2010 04:04:46 +0000

These are links to the slides and video of Michael Caisse’s BoostCon 2010 talk:

slides:
video:

Enjoy!

Machinery, sensors, equipment, client/server communications, even file formats… Parsing and producing communication streams is everywhere you look. Often these tasks are simple or small enough to tempt ad-hoc solutions. The Spirit 2.1 library provides a model that is simple enough to tackle those “quick hacks” and easily scales for full-featured AST generation.

This session will explore real-life experiences with the parser and generator (Qi/Karma) portions of the Spirit library. As we look at various small and medium-sized parsers/generators employed in various products we will establish some “rules-of-thumb” and guidelines for tackling the parser/generator domain with Qi/Karma. The session will end with the implementation of a usable XML parser and a simplified XPath-like node extractor.

The session will include some lecture and a lot of tutorial. Attendees will walk away with the knowledge and tools to begin parsing and generating with Spirit Qi/Karma.

—Michael Caisse

Rating: 4.0/5 (6 votes cast)

Parsing Escaped String Input Using Spirit.Qi

Hartmut Kaiser — Sat, 13 Nov 2010 20:28:40 +0000

Jeroen Habraken (a.k.a. VeXocide) sent an article about parsing escaped strings using Qi, which we happily publish for everybody to read. Thanks Jeroen!

Continue reading here.

Rating: 4.5/5 (2 votes cast)

Joel de Guzman, Hartmut Kaiser: Spirit: History and Evolution

Joel de Guzman — Thu, 14 Oct 2010 18:03:35 +0000

Care about how Spirit got started? Here’s a link to our BoostCon 2010 presentation:

http://blip.tv/file/4245756

This year, we celebrate Spirit’s 10th anniversary. Spirit had its early beginnings in the 90s as an offshoot of a much larger GUI library and debuted in Boost in May 2001 in the typical “Is there interest in this library?” fashion, like all would-be Boost libraries. From a humble 7-header-file library, Spirit has grown to be one of the most sophisticated Boost libraries. Along the way it became the incubator of other Boost libraries such as Boost.Fusion, Boost.Phoenix, and Boost.Wave, and played a significant role in Boost.Proto maturing.
We would like to present Spirit (and the libraries it inspired) in a historical perspective. The presentation will aim to provide a lighter, more intimate perspective into the development of at least 4 libraries, drawing on almost a decade’s worth of experience as Boost authors and bona fide crazy template metaprogrammers who abuse operators like Mad Scientists. Of course, we can’t help it if we show off some C++ tricks here and there, but we’ll try to keep it as light as we can.

Rating: 5.0/5 (1 vote cast)

Tracking the Input Position While Parsing

Peter Schüller — Fri, 05 Mar 2010 16:21:53 +0000

The following article is about tracking the parsing position with Spirit V2. This is useful for generating error messages which tell the user exactly where an error has occurred. We also show how to use Spirit V2 to parse from an input stream without first reading the whole stream into a std::string.

Rating: 5.0/5 (1 vote cast)

The Anatomy of Semantic Actions in Qi

Hartmut Kaiser — Wed, 03 Mar 2010 18:04:35 +0000

The concept of Spirit’s semantic actions seems to be easy enough to understand, as most people new to the library prefer using them over applying the built-in attribute propagation rules. That is not surprising. The idea of attaching a function to any point of a grammar, to be called whenever the corresponding parser matches, is straightforward to grasp. Earlier versions of Spirit required a semantic action to conform to a very specific interface. Today’s semantic actions are more flexible and more powerful. Recently, a couple of people asked questions about them, so I decided to dedicate this Tip of the Day to the specifics and the usage model of semantic actions in Spirit Qi.

All three of Spirit’s sub-libraries – Qi, Karma, and Lex – support semantic actions. In each case they are different and have some specifics. Today I will highlight semantic actions in Qi. But I will dedicate later Tips of the Day to semantic actions in Karma and Lex.

Semantic actions are functions or function objects attached to some specific part of a grammar. In Qi they are invoked after the corresponding parser successfully recognizes a portion of the input. Here the semantic action receives the attribute value of the matching parser.

Semantic Actions – a General View

A semantic action f is attached to a Qi parser p by simply writing:

p[f]

The function (or function object) f has to expose a certain interface allowing Spirit to pass the proper argument types. In the simplest case this can be a global function taking no arguments at all.

void func()
{
    std::cout << "Matched an integer!\n";
}

std::string input("1234");
std::string::const_iterator begin = input.begin();
std::string::const_iterator end = input.end();
qi::parse(begin, end, qi::int_[func]);     // this will call func

Most of the time this is not sufficient as a semantic action is expected to receive the matched attribute value. This is possible by writing:

void func(int attribute)
{
    std::cout << "Matched integer: " << attribute << "\n";
}

The type of the expected parameter (in this case the int) depends on the parser the semantic action is attached to. The attribute type exposed by the parser has to be convertible to the argument type.

There are actually 2 more arguments being passed: the parser context and a reference to a boolean ‘hit’ parameter. The parser context is meaningful only if the semantic action is attached somewhere on the right hand side of a rule. We will see more information about this shortly. The boolean value can be set to false inside the semantic action, which invalidates the match retroactively and makes the parser fail. Qi also allows us to bind a nullary or a single argument function, as above. In that case the other arguments are simply ignored.

It is feasible to bind any function object (such as generated by Boost.Bind or Boost.Lambda) as a semantic action. Even if the documentation shows a couple of examples (see here), I would not recommend using those libraries in this context. For me the preferred method of writing semantic actions is to employ Boost.Phoenix – a companion library bundled with Spirit. It is like Boost.Lambda on steroids, with special custom features that make it easy to integrate semantic actions with Spirit. If your requirements go beyond simple parsing, I suggest that you use this library. All the following examples in this article will use Boost.Phoenix for semantic actions. But whatever method you use, please let me highlight the following:

The three libraries allow you to utilize special placeholders to control parameter placement (_1, _2, etc.). Unfortunately, each of those libraries has its own implementation of the placeholders, all in different namespaces. You have to make sure not to use placeholders with a library they don’t belong to, and not to mix different libraries while writing a semantic action.

Generally, for Boost.Bind, use ::_1, ::_2, etc. (yes, these placeholders are defined in the global namespace).

For Boost.Lambda use the placeholders defined in the namespace boost::lambda.

For semantic actions written using Boost.Phoenix use the placeholders defined in the namespace boost::spirit. Please note that all existing placeholders for your convenience are also available from the namespace boost::spirit::qi.

The current version of Spirit (V2.2) does not yet support binding a native C++0x lambda function as a semantic action, but this is something we are currently working on. You can expect this to be possible in the near future.

Writing Phoenix based Semantic Actions

Writing a semantic action with Phoenix is beneficial as Spirit ‘knows’ about Phoenix. If you write them with the help of Phoenix you can utilize special placeholders Spirit provides you with. Those placeholders refer to elements in the context of the current parser execution such as attributes, local variables and inherited attributes of rules, etc. None of the other means of writing semantic actions (using Bind, Lambda, or hand written function objects) gives you direct access to those elements. The following table lists all available placeholders exposed by Spirit (as mentioned earlier, all are defined in the namespace boost::spirit::qi). Again, please note, these are only available inside a semantic action and only if the semantic action is written utilizing Phoenix.

Placeholder	Description
`_1, _2, ... , _N`	Nth attribute of the parser `p`
`_pass`	Assign `false` to `_pass` to force a parser failure.
`_val`	The enclosing rule’s synthesized attribute.
`_r1, _r2, ... , _rN`	The enclosing rule’s Nth inherited attribute.
`_a, _b, ... , _j`	The enclosing rule’s local variables (`_a` refers to the first).

Obviously, the placeholders listed in the last three rows of the table are meaningful only if used in a rule definition. As an example, let us rewrite the semantic action from above with Phoenix:

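namespace phoenix = boost::phoenix;

std::string input("1234");
std::string::const_iterator begin = input.begin();
std::string::const_iterator end = input.end();
qi::parse(begin, end,
    qi::int_
    [
        // phoenix::val makes the whole output expression lazy, so nothing
        // is printed before the semantic action is actually invoked
        std::cout << phoenix::val("Matched integer: ") << qi::_1 << "\n"
    ]
);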

One problem with earlier versions of Spirit (i.e. Spirit.Classic) was that while parsing sequences of things it was difficult to avoid calling a semantic action prematurely. For instance, in a sequence of two integer parsers (int_[f1] >> ',' >> int_[f2]) the function f1 got called immediately after the first integer matched, even if the second integer parser failed later on. In the current version of Spirit this is not an issue anymore, as it is possible to attach a semantic action to the whole sequence while still referring to the individual attributes of the different sequence elements:

std::string input("1234,2345");
std::string::const_iterator begin = input.begin();
std::string::const_iterator end = input.end();
qi::parse(begin, end,
    (qi::int_ >> ',' >> qi::int_)
    [
        // boost::phoenix::val keeps the leading string literal lazy as well
        std::cout << boost::phoenix::val("Matched integers: ")
              << qi::_1 << " and " << qi::_2 << "\n"
    ]
);

Here, qi::_1 refers to the attribute matched by the first integer parser, and qi::_2 to the second one.

Initially I was planning to additionally describe the internal interface of a semantic action. Utilizing this interface allows you to write your own function objects and still get access to the elements of the parser context mentioned above (attributes, the rule’s local variables and inherited attributes, etc.). But this post already got longer than anticipated, which is why I defer this discussion to a second Tip of the Day. Stay tuned!

Rating: 4.6/5 (8 votes cast)

Parsing Skippers and Skipping Parsers

Hartmut Kaiser — Wed, 24 Feb 2010 13:32:08 +0000

Spirit has supported skipper-based parsing since its very inception, so this is definitely not something new to Spirit V2. Nevertheless, the recent discussion on the Spirit mailing list around the semantics of Qi’s lexeme[] directive shows the need for some clarification. Today I try to answer questions like: “What does it mean to use a skipper while parsing?”, or “When do I want to use a skipper and when not?”.

While parsing some formatted data stream it is very often desirable to ignore some parts of the input. A common example would be the need to skip whitespace and comments while parsing some computer language. Certainly it is possible to explicitly account for the tokens to skip (such as the whitespace or the comments) while writing the grammar. But this can get very tedious as those tokens are valid to appear at any point in the input.

For the sake of simplicity, let us assume we want to parse a simple key/value expression: key=value, where we want to allow for any number of space characters before, in between, or after the key or the value. A naive grammar matching the plain key/value pair without whitespace skipping would look like (see Parsing a List of Key-Value Pairs Using Spirit.Qi for more details):

pair  =  key >> '=' >> value;
key   =  qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z_0-9");
value = +qi::char_("a-zA-Z_0-9");

If we want to explicitly accommodate the rule pair to match any interspersed space characters we get:

pair  = *space >> key >> *space >> '=' >> *space >> value >> *space;

which, while it produces the desired result, is not only error prone, but additionally difficult to write, to understand, and to maintain. If we look closer we see that the process of skipping the whitespace tokens is easily automated. It seems to be sufficient to insert a repeated invocation of the space parser (or generally, any skip parser) in between the elements of the user defined parser expression sequences.

In fact, that is exactly what Spirit can do for you! The library invokes any supplied skip parser upon entry to the parse member function of any parser conforming to the PrimitiveParser concept. The skip parser has to be supplied by calling a special API function: phrase_parse:

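namespace qi = boost::spirit::qi;
typedef std::string::const_iterator iterator_type;

// key and value are declared without a skipper type: no skipping inside them
qi::rule<iterator_type> key = qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z_0-9");
qi::rule<iterator_type> value = +qi::char_("a-zA-Z_0-9");
// pair is declared with the skipper type: spaces are skipped between its elements
qi::rule<iterator_type, qi::space_type> pair = key >> '=' >> value;

std::string input(" key = value ");
iterator_type begin = input.begin();
iterator_type end = input.end();
qi::phrase_parse(begin, end, pair, qi::space);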

This code snippet illustrates several important things:

The function qi::phrase_parse is equivalent to the API function qi::parse except for its additional parameter, the skip parser. Our example utilizes qi::space, but it is possible to use any other, even more complex parser expression as the skipper instead.
All rules which we want to perform the skip parsing need to be declared with the type of the skip parser they are going to be used with. Our example specifies the type of the qi::space parser expression, which is qi::space_type. For more complex parser expressions you might want to use a (mini) grammar or take advantage of BOOST_TYPEOF to let the compiler deduce the actual type.
All rules which should not perform skip parsing have to be declared without an additional skip parser type. These rules behave as if wrapped in an implicit lexeme[] directive (for more information about lexeme[], see below): they inhibit the invocation of the skip parser even if they are executed as part of a rule with an associated skipper.

In the example above we suppressed skipping while matching either the key or the value, otherwise our grammar would match any additional space character inside the key or value as well. Remember, the expression char_ conforms to the PrimitiveParser concept, it will execute the skip parser for each of its invocations. In this case any skip parser would be executed in between any two of the matched characters.
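To make the mechanism more tangible, here is a plain C++ sketch of what pre-skipping amounts to. It does not use Spirit at all and is not how the library is implemented; skip_spaces, match_char, match_word and parse_pair are hypothetical names, and the key rule is simplified to a run of alphanumeric characters and underscores. Each "primitive" first runs the skipper and then tries to match, while the word matcher skips once on entry and never between characters.

```cpp
#include <cctype>
#include <string>

typedef std::string::const_iterator iterator_type;

// Stand-in for a skip parser: consumes any run of whitespace.
void skip_spaces(iterator_type& first, iterator_type last)
{
    while (first != last && std::isspace(static_cast<unsigned char>(*first)))
        ++first;
}

// A primitive: pre-skip on entry, then match a single character.
bool match_char(iterator_type& first, iterator_type last, char c)
{
    skip_spaces(first, last);
    if (first == last || *first != c)
        return false;
    ++first;
    return true;
}

// Lexeme-like: pre-skip once, then match characters without skipping.
bool match_word(iterator_type& first, iterator_type last, std::string& out)
{
    skip_spaces(first, last);
    iterator_type start = first;
    while (first != last &&
           (std::isalnum(static_cast<unsigned char>(*first)) || *first == '_'))
        ++first;
    out.assign(start, first);
    return !out.empty();
}

// key '=' value, followed by a final skip so trailing spaces are consumed.
bool parse_pair(std::string const& input, std::string& key, std::string& value)
{
    iterator_type first = input.begin();
    iterator_type last = input.end();
    bool ok = match_word(first, last, key)
           && match_char(first, last, '=')
           && match_word(first, last, value);
    skip_spaces(first, last);
    return ok && first == last;
}
```

Called with " key = value ", parse_pair yields key and value without any surrounding spaces. An input such as "k e y = v" is rejected, because spaces are not skipped inside a word.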

Sometimes it is necessary to turn off skipping for a smaller part of the grammar only. For this purpose Spirit implements the lexeme[] directive. This directive inhibits skipping during the execution of the embedded parser. For instance, parsing a quoted string of alphanumeric characters would look like this:

string = lexeme['"' >> *alnum >> '"'];

Here the lexeme directive disables skipping while matching the string, which avoids ‘losing’ characters otherwise matched by the skipper. Please note: lexeme[] performs a pre-skip step, even if it is not a PrimitiveParser itself (it is essentially considered to be a logical primitive by design). If this is undesired, you can utilize the no_skip[] directive instead:

string = '"' >> no_skip[*alnum] >> '"';

This parser will match all the characters in between the quotes, even if the string starts with a character sequence matched by the applied skip parser. The no_skip[] directive is semantically equivalent to lexeme[] except it does not perform a pre-skip before executing the embedded parser. Note: the no_skip[] directive has been added only recently. It will be available starting with the next release (Boost V1.43).

This short article would not be complete without mentioning the skip[] directive. This directive is the counterpart to lexeme[]. It enables skipping for the embedded parser. Without any argument it can be used inside a lexeme or no_skip directive only. In this case it just re-enables the outer skipper:

string = lexeme['"' >> *(alpha | skip[digit]) >> '"'];

This (purely hypothetical) parser would enable skipping inside a string as long as it matches digits. But the skip directive can do more. It may take an additional argument that specifies a new skipper, for instance:

skip(qi::space)[*alnum]

which will skip spaces while executing the embedded *alnum parser. This form of the directive can be applied for two purposes. It can be used either for changing the current skip parser or to establish skipping inside a context otherwise not doing skipping at all (even if invoked with the qi::parse() API function).

For more detailed information about all the mentioned directives please see the corresponding documentation.

Rating: 4.6/5 (8 votes cast)

Parsing Arbitrary Things in Any Sequence

Hartmut Kaiser — Wed, 17 Feb 2010 15:45:25 +0000

Recently, there have been a couple of questions on the Spirit mailing list asking how to parse a set of things known in advance in any sequence and any combination. A simple example would be a list of key/value pairs with known keys, where the keys may appear in any order. This use case seems to be quite common. Fortunately Spirit provides you with a predefined parser component designed for exactly that purpose: the permutation parser.

Spirit’s permutation parser a ^ b matches either a, b, a >> b, or b >> a, where a and b can be arbitrary parser expressions. Just like normal sequences this operator can be utilized to combine more than two operands. For instance, the expression a ^ b ^ c will match a or b or c (or any combination thereof) in any sequence. The attribute propagation rule for the permutation parser is

a: A, b: B --> (a ^ b): tuple<optional<A>, optional<B> >

As usual, if one or more operands of the expression do not expose any attribute (or expose unused_type as their attribute, which is equivalent), these operands disappear from attribute handling:

a: A, b: Unused --> (a ^ b): optional<A>

The permutation parser works out of the box whenever you do not need all of the elements to be present in the input. But what if you want strict permutation (each operand gets matched exactly once)? As so often, you have two possibilities: a simple but less versatile solution, and a more complex but universally applicable one. The simple solution is to parse the input and to check afterward whether all optionals in the resulting attribute have been filled. I will leave that solution as an exercise for the reader.

If we assume the attribute to be a (Fusion) tuple of optionals, containing one optional for each of the parser components in the permutation parser we can write the following code (thanks to Carl Barron for the initial idea).

This code defines a Phoenix function (a lazy function encapsulating some custom functionality) checking whether one or more of the optionals in a given Fusion sequence are empty. The Fusion algorithm any iterates over the given sequence of optionals, invoking optional_empty::operator() for each of the elements. fusion::any stops iterating on the first invocation returning true and reports whether such an element was found. This is very similar to calling the well known std::find_if algorithm and comparing its result against the end iterator. (Fusion’s own find_if is not suitable here, as its predicate operates on the element types at compile time.)

namespace phoenix = boost::phoenix;
namespace fusion = boost::fusion;
namespace qi = boost::spirit::qi;

class no_empties_impl
{
    // helper function object to be invoked by fusion::any
    struct optional_empty
    {
        template <typename T>
        bool operator ()(T const& val) const
        {
            return !val;    // return true if 'val' is empty.
        }
    };

public:
    template <typename T>
    struct result { typedef bool type; };

    // This operator will get called from the semantic action attached
    // to the permutation parser. The parameter refers to its overall
    // attribute: the fusion tuple of optionals.
    template <typename T>
    bool operator ()(T const& t) const
    {
        // look for an empty optional, if any return false.
        return !fusion::any(t, optional_empty());
    }
};

// define the Phoenix function
phoenix::function<no_empties_impl> const no_empties = no_empties_impl();

The overall Phoenix function no_empties returns false if it finds at least one non-initialized optional in the passed sequence. The following code snippet illustrates how everything fits together:

std::string input("BCA");
std::string::const_iterator begin = input.begin();
std::string::const_iterator end = input.end();
qi::parse(begin, end,
    // use char_ for every operand: plain literals such as 'B' expose no
    // attribute and would not appear in the tuple of optionals
    (qi::char_('A') ^ qi::char_('B') ^ qi::char_('C'))
    [
        qi::_pass = no_empties(qi::_0)
    ]
);

We assign the result of the invocation of no_empties to Qi’s predefined placeholder _pass. If we assign false, then the parser the semantic action is attached to will be forced to fail retroactively (even if it matched the input successfully before). As a result the overall parser expression will succeed as long as a) the permutation parser matches its input and b) the Phoenix function inside the semantic action returns true.

For more information about the permutation parser please consult its documentation here. Overall, this example is a bit more complex than the average parser you might usually write. It utilizes three libraries: Spirit, Phoenix, and Fusion in a seamless manner. But for sure, once you understand the idea, it will be easier for you to come up with similar solutions. Spirit has been designed with Phoenix and Fusion in mind, and in fact it relies on Fusion heavily itself. As a result, the integration of those libraries is almost perfect.

Rating: 5.0/5 (2 votes cast)