This is the second in a series of articles introducing attribute handling in Spirit. It is a good idea to look at the first part (The Magical Power of Attributes in Spirit – Primitives) before reading on, as I am going to rely on what is described there. In this article I will talk about Spirit’s operators. These are used to combine other components into more complex expressions so as to build more powerful Qi parsers and Karma generators. Spirit has four different types of operators, each of which is covered in more detail below.
Each operator implements its own set of attribute propagation rules. These rules define the synthesized attributes of Qi expressions and the consumed attributes of Karma expressions. As we have seen already, it is very important to understand how the synthesized and consumed attributes are formed. Parsers and generators are most elegant and efficient if they are tightly integrated with your data structures, and learning to match parsers to attributes is most crucial to successfully and effectively use Spirit.
This article covers only the most important of Spirit’s operators. For a full list of all available operators and how they synthesize or consume attributes, please see the documentation sections Parser Compound Attribute Rules and Generator Compound Attribute Rules.
Throughout Spirit’s documentation and for the purpose of these articles we use a special notation to describe the attribute propagation rules for compound components. It is of the form:
a: A, b: B, … –> (composite-expression): composite-attribute
where: a, b, etc. are the operands; A, B, etc. are the operands’ attribute types; composite-expression is the expression involving the operands; and composite-attribute is the resulting attribute type of the composite expression. For instance:
a: A, b: B –> (a >> b): tuple<A, B>
which reads as: given that a and b are components, that A is the type of the attribute of a, and that B is the type of the attribute of b, the type of the attribute of a >> b will be tuple<A, B>.
All examples shown here are based on the test functions introduced in the first part of this article.
The Optional Operator
The optional operator is the simplest of all operators used in Spirit. As the name implies, it is used to make some part (or the whole) of a Spirit expression optional. For Qi parsers the corresponding parts of the input are optional and the whole input will match even if those parts are missing. For Karma generators the corresponding data will be emitted only if it is present, therefore adapting the generated output to the available data at runtime. Spirit uses the unary minus (‘-’) as the optional operator. Here are the attribute propagation rules for the optional operator:
| Library | Attribute propagation rule |
|---|---|
| Qi | a: A –> (-a): optional<A> |
| Qi | a: Unused –> (-a): Unused |
| Karma | a: A –> (-a): optional<A> |
| Karma | a: Unused –> (-a): Unused |
The optional<A> in the table above is a placeholder for boost::optional<A>. Unused stands for unused_type, which is just Spirit’s way of saying ‘I don’t care’. In this context it is interpreted as if the component does not expose any attribute at all. The result of executing an optional parser expression will be either a non-initialized boost::optional<A> if the embedded parser did not match, or it will hold the returned attribute if it did match. An optional generator expression will emit the supplied attribute only if the boost::optional<A> has been properly initialized; otherwise no output will be generated.
The following examples parse an optional integer, showing that parsing succeeds even if no integer is found in the input. The examples also demonstrate the generation of an optional integer, again always succeeding, even if no integer is passed to the generator.
assert(test_parser(-qi::int_, "", boost::optional<int>()));
assert(test_parser(-qi::int_, "1234", boost::optional<int>(1234)));
assert(test_generator(-karma::int_, boost::optional<int>(), ""));
assert(test_generator(-karma::int_, boost::optional<int>(1234), "1234"));
assert(test_generator(-karma::int_, 1234, "1234"));
As the last line shows, even if the consumed attribute of the Karma operator is a boost::optional, it is still compatible with any non-optional attribute as long as it is convertible to the consumed attribute of the embedded generator.
The Repetition Operators
Spirit has several repetition operators, such as the Kleene star (unary ‘*’), the plus operator (unary ‘+’), and the list operator (‘%’). All of them are similar in that they handle several elements of the same type. In terms of attribute handling, they are all equivalent. The following table shows the attribute rules for the Kleene star operator; the rules for the other repetition operators are similar.
| Library | Attribute propagation rule |
|---|---|
| Qi | a: A –> (*a): vector<A> |
| Qi | a: Unused –> (*a): Unused |
| Karma | a: A –> (*a): vector<A> |
| Karma | a: Unused –> (*a): Unused |
The vector<A> is used as a placeholder for any standard container (it is possible to adapt your own data structures). Unused stands for unused_type. In short, repetitive parsers directly fill any supplied standard container as long as the value type of the container is compatible with the synthesized attribute of the embedded parser. Likewise, repetitive generators directly use the elements of any supplied standard container, one element for each invocation of the embedded generator.
std::vector<char> v;
v.push_back('a');
v.push_back('b');
v.push_back('c');
assert(test_parser(*qi::char_, "abc", v));
assert(test_generator(*karma::char_, v, "abc"));
The Sequence Operator
Sequences are arguably the most powerful and useful of all operators but, at the same time, their attribute handling is the most difficult to fully understand. Sequence operators are used to build, well, sequences of other components. In order to build a sequence in Qi we use the ‘>>’ operator. In Karma the ‘<<’ operator is used. This is probably not surprising as it corresponds to the conventions used for C++ streams, where those operators are used for input (‘>>’) and output (‘<<’) as well. Here are the main attribute propagation rules for sequences:
| Library | Attribute propagation rule |
|---|---|
| Qi | a: A, b: B –> (a >> b): tuple<A, B> |
| Qi | a: A, b: Unused –> (a >> b): A |
| Karma | a: A, b: B –> (a << b): tuple<A, B> |
| Karma | a: A, b: Unused –> (a << b): A |
Again, the tuple<A, B, …> in the table above is used as a placeholder only. The notation stands for any Boost.Fusion sequence holding elements of types A, B, … etc. Unused stands for unused_type. Sequences in Boost.Fusion are a generalization of tuples, which includes Fusion’s own data structures (such as fusion::vector, fusion::list, etc.), Fusion’s views, std::pair, boost::array, and any C++ struct (although this requires the use of the macro BOOST_FUSION_ADAPT_STRUCT). In particular, the ability to ‘map’ a plain C++ struct onto a Spirit component sequence is a very powerful concept, enabling tight integration of parsing and generation with your own (arbitrary) data structures.
The following example parses two comma separated double numbers and outputs those in reverse order:
assert(test_parser(qi::double_ >> ',' >> qi::double_, "1.0,2.0", std::make_pair(1.0, 2.0)));
assert(test_generator(karma::double_ << ',' << karma::double_, std::make_pair(2.0, 1.0), "2.0,1.0"));
We use a std::pair<double, double> as the attribute for our sequence (std::pair perfectly conforms to the notion of a Fusion sequence). The literal ‘,’ does not expose any attribute (Unused), which makes it ‘disappear’ from the attribute handling (see the related attribute propagation rules in the table above). As a result, the member variable first of the pair is used as the attribute for the first double in the sequence, while the pair’s member variable second is used as the attribute for the second double in the corresponding Spirit expression.
Qi and Karma expose a set of API functions usable with sequences. Very much like the functions of the printf families, these API functions allow you to pass the attributes for each of the elements of the sequence separately. Using the corresponding overload of Qi’s parse() or Karma’s generate(), the expression above could be rewritten as:
double d1 = 0.0, d2 = 0.0;
qi::parse(f, l, qi::double_ >> ',' >> qi::double_, d1, d2);
karma::generate(sink, karma::double_ << ',' << karma::double_, d2, d1);
where the first argument is used for the first double_, and the second argument is used for the second double_. This provides a clear and comfortable syntax, more similar to a placeholder-based syntax.
It is possible to pass a (standard) container as an attribute for a Spirit sequence if all elements of that sequence expose either the container or its value type as their attributes. So we could rewrite the example above by replacing the std::pair<double, double> with any standard container type, such as std::vector<double>. This feature has been added to ensure full semantic compatibility of repetitive constructs like double_ % ',' with the equivalent sequence expression double_ >> *(',' >> double_), but it has proven to be useful in general.
We have seen in the first article about primitives how the special placeholder _1 can be used in a semantic action to refer to the attribute of the expression it is attached to. But wait, there is more! If you attach a semantic action to a sequence, the attribute of each element can be accessed using the corresponding placeholder: _1 for the first, _2 for the second, _3 for the third, etc.:
(qi::double_ >> ',' >> qi::double_) [ std::cout << qi::_1 << ',' << qi::_2 ]
(karma::double_ << ',' << karma::double_) [ karma::_1 = 1.0, karma::_2 = 2.0 ]
The first expression will print the two matched doubles separated by a comma, while the second expression will emit "1.0,2.0". This is consistent with the multi-attribute API functions mentioned above.
The Alternative Operator
Alternatives in Spirit are built using the ‘|’ operator. Parser alternatives are used to specify different valid input formats, any one of which may match. Because each of the alternatives may expose its own attribute type, the result of an alternative match is an attribute that can hold one of several possibly different types. Generator alternatives are used to specify different output formats to apply depending on the type of the supplied attribute. As a logical consequence, the attribute propagation rules for the alternative operator are:
| Library | Attribute propagation rule |
|---|---|
| Qi | a: A, b: B –> (a \| b): variant<A, B> |
| Qi | a: A, b: A –> (a \| b): A |
| Qi | a: A, b: Unused –> (a \| b): optional<A> |
| Karma | a: A, b: B –> (a \| b): variant<A, B> |
| Karma | a: A, b: Unused –> (a \| b): optional<A> |
Here the variant<A, B> stands for a boost::variant<A, B> and the optional<A> is a placeholder for boost::optional<A>. Please note how Unused is handled in a special way. It disappears from the attribute handling again, but the attribute of the remaining alternatives becomes optional. If all alternatives expose the same attribute type, the overall attribute will be of that same type. Let us have a look at an example:
using boost::variant;
test_parser(qi::int_ | qi::bool_, "1234", variant<bool, int>(1234));
test_parser(qi::int_ | qi::bool_, "true", variant<bool, int>(true));
test_generator(karma::int_ | karma::string, 4321, "4321");
test_generator(karma::int_ | karma::string, std::string("a"), "a");
The alternative parser simply fills the variant with the appropriate attribute type based on the matched input. The generator alternative has the additional feature of doing the selection of the output format based on the supplied attribute type: data driven format selection at work! The generator will choose the first matching alternative for the supplied attribute. You don’t even have to pass a variant to the generator (although, you certainly can), any attribute compatible with any of the types of the alternatives can be used, causing the corresponding formatting expression to be chosen.
Building Spirit expressions is very similar to building normal C++ expressions: the types of the right hand side and the left hand side must be compatible. The difference is that for Spirit expressions you have to think about the attribute types of the parts of the expression, not their C++ types. Once you understand the attribute propagation rules of the different operators, it will be a lot easier to build Qi parsers and Karma generators which are compatible with your own data types. Most of the time you should be able to integrate your data without having to resort to semantic actions. This not only makes your code simpler and more readable, it also has the additional benefit of being faster, both at compile time and at runtime.