Jan 14

Over at the Freenode #boost IRC channel somebody (I think it was @VeXocide) suggested writing a ‘Tip of the Day’ about the Qi directive raw[]. I was told it ‘was a major stumbling stone’ while learning Qi. I always appreciate getting suggestions for articles, so here we go…

For those of you following the discussions around Spirit, the raw[] directive might be somewhat surprising. Almost every introduction describes Spirit as being fully attributed. That means that every component exposes its own, specific attribute type representing the matched input (in Qi) or the data to emit output for (in Karma). This is a major change from earlier Spirit versions (we call those Classic today), which created so-called transduction parsers. Transduction parsers ‘return’ a pair of pointers (iterators) to the matched portion of the input without attempting to convert it to any specific result type.

The directive qi::raw[] reintroduces transduction parsing into Qi. It does so by exposing as its attribute the pair of iterators delimiting the input sequence matched by its embedded parser. To be precise, it exposes an instance of boost::iterator_range<Iterator> containing the two iterators (where Iterator is the iterator type of the underlying input as passed to qi::parse).

Here is an example matching any valid C++ identifier:

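#include <boost/spirit/include/qi.hpp>
#include <string>

namespace qi = boost::spirit::qi;

// const, so that begin() and end() both return a const_iterator
std::string const input("abc123_");
std::string::const_iterator b = input.begin();
std::string ident;   // receives the matched character sequence
qi::parse(b, input.end(),
    qi::raw[(qi::alpha | '_') >> *(qi::alnum | '_')], ident);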

This example works as expected because the std::string attribute (the variable ident) is compatible with the raw[] component: std::string, like the iterator_range natively supported by raw[], can be constructed from a pair of iterators. The embedded parser matches the C++ identifier without any attribute conversion, as raw[] invokes it with an attribute of type unused_type, effectively disabling attribute handling. If the embedded parser succeeds, the iterators delimiting the matched input sequence are used to initialize the std::string passed in as the attribute. Voilà!

In this example it would just be more complex (but still possible) to write a parser expression utilizing Spirit’s built-in attribute propagation rules. At the very least you would need to write char_('_') instead of the plain '_', because a plain literal exposes no attribute. Sometimes though, for other use cases, it is a lot more difficult or simply impossible to employ the attribute magic. That is where raw[] really shines. It allows you to embed complex parser expressions without having to worry about attribute matching and propagation. The embedded parser runs as fast as possible because it is invoked in ‘match mode’ only, which skips attribute conversion.

2 Responses to “Why might I want to use the directive qi::raw[ ]?”

  1. Lars Viklund says:

    All C++ identifiers minus the ones containing the nondigit class universal-character-name as per section 2.10 and Annex E in TC2.
    But that’s beside the point somewhat.
