Dec 14

The Magical Power of Attributes in Spirit – Operators

By Hartmut Kaiser

This is the second in a series of articles introducing attribute handling in Spirit. It might be a good idea to have a look at the first part before reading on, as I’m going to rely on what is described there (The Magical Power of Attributes in Spirit – Primitives). In this article I will talk about Spirit’s operators. These are used to combine other components into more complex expressions so as to build more powerful Qi parsers and Karma generators. We have four different types of operators in Spirit, each of which is covered in more detail below.

Each operator implements its own set of attribute propagation rules. These rules define the synthesized attributes of Qi expressions and the consumed attributes of Karma expressions. As we have seen already, it is very important to understand how the synthesized and consumed attributes are formed. Parsers and generators are most elegant and efficient if they are tightly integrated with your data structures, and learning to match parsers to attributes is crucial to using Spirit successfully and effectively.

This article covers only the most important of Spirit’s operators. For a full list of all available operators and how they synthesize or consume attributes, please see the documentation sections Parser Compound Attribute Rules and Generator Compound Attribute Rules.

Throughout Spirit’s documentation and for the purpose of these articles we use a special notation to describe the attribute propagation rules for compound components. It is of the form:

a: A, b: B, … –> (composite-expression): composite-attribute

where: a, b, etc. are the operands; A, B, etc. are the operands’ attribute types; composite-expression is the expression involving the operands; and composite-attribute is the resulting attribute type of the composite expression. For instance:

a: A, b: B –> (a >> b): tuple<A, B>

which reads as: given that a and b are components, A is the type of the attribute of a, and B is the type of the attribute of b, the type of the attribute of a >> b will be tuple<A, B>.

All examples shown here are based on the test functions introduced in the first part of this article.

The Optional Operator

The optional operator is the simplest of all operators used in Spirit. As the name implies, it is used to make some part (or the whole) of a Spirit expression optional. For Qi parsers the corresponding parts of the input are optional and the whole input will match even if those parts are missing. For Karma generators the corresponding data will be emitted only if it is present, therefore adapting the generated output to the available data at runtime. Spirit uses the unary minus (‘-’) as the optional operator. Here are the attribute propagation rules for the optional operator:

Qi:    a: A –> (-a): optional<A>
       a: Unused –> (-a): Unused
Karma: a: A –> (-a): optional<A>
       a: Unused –> (-a): Unused

The optional<A> in the table above is a placeholder for boost::optional<A>. Unused stands for unused_type, which is just Spirit’s way of saying ‘I don’t care’. In this context it is interpreted as if the component does not expose any attribute at all. The result of executing an optional parser expression will be either a non-initialized boost::optional<A> if the embedded parser did not match, or it will hold its returned attribute if it did match. An optional generator expression will emit the supplied attribute only if the boost::optional<A> has been properly initialized, otherwise no output will be generated.

The following examples parse an optional integer, showing that the parser succeeds even if no integer is found in the input. The examples also demonstrate the generation of an optional integer, again always succeeding, even if no integer is passed to the generator.

assert(test_parser(-qi::int_, "", boost::optional<int>()));
assert(test_parser(-qi::int_, "1234", boost::optional<int>(1234)));
assert(test_generator(-karma::int_, boost::optional<int>(), ""));
assert(test_generator(-karma::int_, boost::optional<int>(1234), "1234"));
assert(test_generator(-karma::int_, 1234, "1234"));

As the last line shows, even if the consumed attribute of the Karma operator is a boost::optional, it is still compatible with any non-optional attribute as long as it is convertible to the consumed attribute of the embedded generator.
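
The optional rule can also be pictured without Spirit at all. The following is a minimal standard-library sketch, assuming C++17, that models a: A –> (-a): optional<A> with std::optional. The helpers parse_int, parse_optional_int and generate_optional_int are hypothetical names made up for this illustration; they are not part of Qi or Karma and ignore details such as signs and overflow.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical stand-in for qi::int_: consumes leading digits starting at pos.
// Fails, consuming nothing, if there is no digit (no sign or overflow handling).
inline bool parse_int(const std::string& in, std::size_t& pos, int& attr) {
    std::size_t p = pos;
    int value = 0;
    while (p < in.size() && std::isdigit(static_cast<unsigned char>(in[p]))) {
        value = value * 10 + (in[p] - '0');
        ++p;
    }
    if (p == pos) return false;
    pos = p;
    attr = value;
    return true;
}

// Models -qi::int_: never fails; the attribute is engaged only on a match.
inline std::optional<int> parse_optional_int(const std::string& in, std::size_t& pos) {
    int value = 0;
    if (parse_int(in, pos, value)) return value;
    return std::nullopt;
}

// Models -karma::int_: emits output only if the attribute holds a value.
inline std::string generate_optional_int(const std::optional<int>& attr) {
    return attr ? std::to_string(*attr) : std::string();
}
```

Note that generate_optional_int(1234) compiles because an int converts implicitly to std::optional<int>, which loosely mirrors the last Karma line above accepting a plain 1234.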

Repetition Operators

Spirit has several repetition operators, such as the Kleene star (unary ‘*’), the plus (unary ‘+’) operator, and lists (‘%’). They are similar in that they all handle several elements of the same type, and in terms of attribute handling they are all equivalent. The following table shows the attribute rules for the Kleene star operator; the rules for the others are analogous.

Qi:    a: A –> (*a): vector<A>
       a: Unused –> (*a): Unused
Karma: a: A –> (*a): vector<A>
       a: Unused –> (*a): Unused

The vector<A> is used as a placeholder for any standard container (it is possible to adapt your own data structures). Unused stands for unused_type. In short, repetitive parsers directly fill any supplied standard container as long as the value type of the container is compatible with the synthesized attribute of the embedded parser. Likewise, repetitive generators directly use the elements of any supplied standard container, one element for each invocation of the embedded generator.

std::vector<char> v;
v.push_back('a'); v.push_back('b'); v.push_back('c');
assert(test_parser(*qi::char_, "abc", v));
assert(test_generator(*karma::char_, v, "abc"));
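
To make the "any standard container" point concrete, here is a small standard-library sketch (no Spirit involved) of a repetition that appends directly into whatever container it is handed. The names parse_star_alpha and generate_star_char are hypothetical, and the sketch matches letters only so that the stopping condition is visible.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Models a Kleene star over a letter parser: every matched letter is appended
// to the supplied container. Works with any container that offers push_back
// and whose value type can hold a char.
template <typename Container>
bool parse_star_alpha(const std::string& in, std::size_t& pos, Container& attr) {
    while (pos < in.size() && std::isalpha(static_cast<unsigned char>(in[pos]))) {
        attr.push_back(in[pos]);
        ++pos;
    }
    return true;  // zero matches still counts as success
}

// Models a Kleene star over a character generator: one output character
// per element of the supplied container.
template <typename Container>
std::string generate_star_char(const Container& attr) {
    std::string out;
    for (auto c : attr) out.push_back(static_cast<char>(c));
    return out;
}
```

The same function template fills a std::vector<char>, a std::string, or even a std::list<int>, because the only requirement is that the container's value type is compatible with the element being stored.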

Sequence Operators

Sequences are arguably the most powerful and useful of all operators but, at the same time, their attribute handling is the most difficult to fully understand. Sequence operators are used to build, well, sequences of other components. In order to build a sequence in Qi we use the ‘>>’ operator. In Karma the ‘<<’ operator is used. This is probably not surprising as it corresponds to the conventions used for C++ streams, where those operators are used for input (‘>>’) and output (‘<<’) as well. Here are the main attribute propagation rules for sequences:

Qi:    a: A, b: B –> (a >> b): tuple<A, B>
       a: A, b: Unused –> (a >> b): A
Karma: a: A, b: B –> (a << b): tuple<A, B>
       a: A, b: Unused –> (a << b): A

Again, the tuple<A, B, …> in the table above is used as a placeholder only. The notation stands for any Boost.Fusion sequence holding elements of types A, B, etc. Unused stands for unused_type.

Sequences in Boost.Fusion are a generalization of tuples, which includes Fusion’s own data structures (such as fusion::vector, fusion::list, etc.), Fusion’s views, std::pair, boost::array, and any C++ struct (although this requires using the macro BOOST_FUSION_ADAPT_STRUCT). In particular, the ability to ‘map’ a plain C++ struct onto a Spirit component sequence is a very powerful concept, enabling tight integration of parsing and generation with your own (arbitrary) data structures.

The following example parses two comma separated double numbers and outputs those in reverse order:

assert(test_parser(qi::double_ >> ',' >> qi::double_,
    "1.0,2.0", std::make_pair(1.0, 2.0)));
assert(test_generator(karma::double_ << ',' << karma::double_,
    std::make_pair(2.0, 1.0), "2.0,1.0"));

We use a std::pair<double, double> as the attribute for our sequence (std::pair perfectly conforms to the notion of a Fusion sequence). The literal ‘,’ does not expose any attribute (Unused), which makes it ‘disappear’ from the attribute handling (see the related attribute propagation rules in the table above). As a result, the member variable first of the pair is used as the attribute for the first double in the sequence, while the pair’s member variable second is used as the attribute for the second double in the corresponding Spirit expression.
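
The way the pair's members line up with the sequence elements can be modeled with plain standard-library code. This is only a sketch, not how Spirit is implemented: parse_double, parse_lit, parse_double_pair and generate_double_pair are hypothetical helpers, std::strtod stands in for the double parser (it is more permissive than a real one), and the fixed one-digit output precision is an assumption chosen to reproduce the "2.0,1.0" output above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <iomanip>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for a double parser, built on std::strtod.
inline bool parse_double(const std::string& in, std::size_t& pos, double& attr) {
    const char* begin = in.c_str() + pos;
    char* end = nullptr;
    const double value = std::strtod(begin, &end);
    if (end == begin) return false;  // nothing consumed: no match
    pos += static_cast<std::size_t>(end - begin);
    attr = value;
    return true;
}

// Models a literal: it consumes input but exposes no attribute at all.
inline bool parse_lit(const std::string& in, std::size_t& pos, char c) {
    if (pos < in.size() && in[pos] == c) { ++pos; return true; }
    return false;
}

// Models double >> ',' >> double with a pair as attribute: the first double
// lands in .first, the second in .second, and the literal contributes nothing.
// The attribute is only written if the whole input matched.
inline bool parse_double_pair(const std::string& in, std::pair<double, double>& attr) {
    std::size_t pos = 0;
    std::pair<double, double> tmp;
    if (!parse_double(in, pos, tmp.first)) return false;
    if (!parse_lit(in, pos, ',')) return false;
    if (!parse_double(in, pos, tmp.second)) return false;
    if (pos != in.size()) return false;
    attr = tmp;
    return true;
}

// Models double << ',' << double: .first feeds the first element, .second the second.
inline std::string generate_double_pair(const std::pair<double, double>& attr) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(1) << attr.first << ',' << attr.second;
    return out.str();
}
```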

Qi and Karma expose a set of API functions usable with sequences. Very much like the functions of the scanf and printf families, these API functions allow you to pass the attributes for each of the elements of the sequence separately. Using the corresponding overload of Qi’s parse() or Karma’s generate(), the expression above could be rewritten as:

double d1 = 0.0, d2 = 0.0;
qi::parse(f, l, qi::double_ >> ',' >> qi::double_, d1, d2);
karma::generate(sink, karma::double_ << ',' << karma::double_, d2, d1);

where the first argument is used for the first double_, and the second argument is used for the second double_. This provides a clear and comfortable syntax, similar to the placeholder-based syntax of printf or boost::format.

It is possible to pass a (standard) container as an attribute for a Spirit sequence if all elements of that sequence expose either the container or its value type as their attributes. So we could rewrite the example above by replacing the std::pair<double, double> with any standard container type, such as std::vector<double>. This feature was added to ensure full semantic compatibility of repetitive constructs like double_ % ',' with the equivalent sequence expression double_ >> *(',' >> double_), but it has proven to be useful in general.
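
As a rough illustration of why a single container can serve the whole expression, the following standard-library sketch spells out the shape double_ >> *(',' >> double_) by hand and lets every element append to the same std::vector<double>. The helper names are hypothetical and std::strtod is only a convenient stand-in for a double parser; none of this is Spirit code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical stand-in for a double parser, built on std::strtod.
inline bool parse_double(const std::string& in, std::size_t& pos, double& attr) {
    const char* begin = in.c_str() + pos;
    char* end = nullptr;
    const double value = std::strtod(begin, &end);
    if (end == begin) return false;  // nothing consumed: no match
    pos += static_cast<std::size_t>(end - begin);
    attr = value;
    return true;
}

// Models: double >> *(',' >> double) with one std::vector<double> as attribute.
// The leading element and each repeated element expose the container's value
// type, so all of them push into the same vector.
inline bool parse_double_list(const std::string& in, std::vector<double>& attr) {
    std::size_t pos = 0;
    double value = 0.0;
    if (!parse_double(in, pos, value)) return false;  // leading double
    attr.push_back(value);
    for (;;) {
        std::size_t next = pos;
        if (next >= in.size() || in[next] != ',') break;  // no further ','
        ++next;
        if (!parse_double(in, next, value)) break;  // ',' without a double: back off
        attr.push_back(value);
        pos = next;
    }
    return pos == in.size();  // require that the whole input was consumed
}
```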

We have seen in the first article about primitives how the special placeholder _1 can be used in a semantic action to refer to the attribute of the expression it is attached to. But wait, there is more! If you attach a semantic action to a sequence, each of the attributes of the elements can be accessed using the corresponding placeholders _2, _3, etc.:

(qi::double_ >> ',' >> qi::double_)
    [ std::cout << qi::_1 << ',' << qi::_2 ]
(karma::double_ << ',' << karma::double_)
    [ karma::_1 = 1.0, karma::_2 = 2.0 ]

The first expression will print the two matched doubles separated by a comma, while the second expression will emit “1.0,2.0”. This is consistent with the multi-attribute API functions mentioned above.

The Alternative Operator

Alternatives in Spirit are built using the ‘|’ operator. Parser alternatives are used to specify several valid input formats, any one of which may match. Since each of the alternatives may expose its own attribute type, the result of an alternative match is an attribute that can hold any one of several types. Generator alternatives are used to specify different output formats to apply depending on the type of the supplied attribute. As a logical consequence, the attribute propagation rules for the alternative operator are:

Qi:    a: A, b: B –> (a | b): variant<A, B>
       a: A, b: A –> (a | b): A
       a: A, b: Unused –> (a | b): optional<A>
Karma: a: A, b: B –> (a | b): variant<A, B>
       a: A, b: Unused –> (a | b): optional<A>

Here the variant<A, B> stands for a boost::variant<A, B> and the optional<A> is a placeholder for boost::optional<A>. Please note how Unused is handled in a special way. It disappears from the attribute handling again, but the attribute of the remaining alternatives becomes optional. If all alternatives expose the same attribute type, the overall attribute will be of that same type. Let us have a look at an example:

using boost::variant;
test_parser(qi::int_ | qi::bool_, "1234", variant<bool, int>(1234));
test_parser(qi::int_ | qi::bool_, "true", variant<bool, int>(true));
test_generator(karma::int_ | karma::string, 4321, "4321");
test_generator(karma::int_ | karma::string, std::string("a"), "a");

The alternative parser simply fills the variant with the appropriate attribute type based on the matched input. The generator alternative has the additional feature of selecting the output format based on the supplied attribute type: data driven format selection at work! The generator will choose the first matching alternative for the supplied attribute. You don’t even have to pass a variant to the generator (although you certainly can); any attribute compatible with any of the types of the alternatives can be used, causing the corresponding formatting expression to be chosen.
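
The same idea can be sketched with std::variant from the standard library, assuming C++17. This is a model of the attribute rule only: parse_int_or_bool and generate_int_or_string are hypothetical helpers, the parser tries its alternatives in order on the whole input, and nothing here reflects how Spirit dispatches internally.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <type_traits>
#include <variant>

using IntOrBool = std::variant<bool, int>;
using IntOrString = std::variant<int, std::string>;

// Models an int | bool parser: the first alternative that matches decides
// which type ends up in the variant (no sign or overflow handling).
inline bool parse_int_or_bool(const std::string& in, IntOrBool& attr) {
    if (!in.empty()) {
        bool all_digits = true;
        int value = 0;
        for (char c : in) {
            if (!std::isdigit(static_cast<unsigned char>(c))) { all_digits = false; break; }
            value = value * 10 + (c - '0');
        }
        if (all_digits) { attr.emplace<int>(value); return true; }
    }
    if (in == "true")  { attr.emplace<bool>(true);  return true; }
    if (in == "false") { attr.emplace<bool>(false); return true; }
    return false;
}

// Models an int | string generator: the type currently held by the attribute
// selects the output format.
inline std::string generate_int_or_string(const IntOrString& attr) {
    return std::visit([](const auto& held) -> std::string {
        using T = std::decay_t<decltype(held)>;
        if constexpr (std::is_same_v<T, int>) {
            return std::to_string(held);
        } else {
            return held;
        }
    }, attr);
}
```

Calling generate_int_or_string(4321) or generate_int_or_string(std::string("a")) works without building a variant by hand, since both argument types convert implicitly to IntOrString. That is a loose analogue of passing a plain int or string to the Karma alternative.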

Conclusion

Building Spirit expressions is very similar to building normal C++ expressions: the types of the right hand side and the left hand side must be compatible. The difference is that for Spirit expressions you have to think about the attribute types of the parts of the expression, not their C++ types. Once you understand the attribute propagation rules of the different operators, it will be a lot easier to build Qi parsers and Karma generators which are compatible with your own data types. Most of the time you should be able to integrate your data without having to resort to semantic actions. This not only makes your code simpler and more readable, it also has the additional benefit of being faster, both at compile time and at runtime.

12 Responses to “The Magical Power of Attributes in Spirit – Operators”

  1. Matt Sutton says:

    Hi Hartmut,
    Thanks for putting these articles together. They really do help a newbie come up to speed. However, this sentence didn’t make sense to me:
    “It is possible to pass a (standard) container as an attribute for a Spirit sequence if all elements expose either the that container or its value type of as their attributes. ”
    Can you clarify?
    Thanks!

    • Hartmut Kaiser says:

      Matt,

      well, that’s obviously a typo (which should be corrected and more understandable now). The bottom line is that any Spirit sequence can be passed a (standard) container if all elements of that sequence have an attribute that:

      a) is compatible with the value_type of the container or
      b) is a (standard) container itself, while the value_type of that container is compatible with the value_type of the container passed by the user

      Here is an example. Let’s say we want to parse a (C++) identifier:

          qi::alpha >> *qi::alnum
      

      here qi::alpha exposes a char and *qi::alnum exposes a container of char’s. The overall sequence could be used for instance with a std::string or a std::vector<char> as the user supplied attribute because the conditions outlined above are met. So you can write:

      std::string result;
      std::string input("a12345");
      std::string::iterator b = input.begin();
      bool r = qi::parse(b, input.end(), qi::alpha >> *qi::alnum, result);
      assert(r && input == result);
      

      Similarly you could write:

      std::string result;
      std::string input("a12345");
      std::string::iterator b = input.begin();
      qi::rule<std::string::iterator, std::string()> rule = qi::alpha >> *qi::alnum;
      bool r = qi::parse(b, input.end(), rule, result);
      assert(r && input == result);
      

      but that’s probably more a topic for the next (upcoming) article covering non-terminals.

      HTH
      Regards Hartmut

      • Matt Sutton says:

        Ah, yes. OK, that makes sense and is obviously good to know. It sounds like it is almost akin to a list flattening assuming all elements of all lists are of the same type? Or, is it just as simple as list concatenation and I’m envisioning it as more complex?
        Thanks again!
        Matt

        • Hartmut Kaiser says:

          Matt,

          you can call it list flattening indeed as sequence elements might expose parts of that list themselves.

          Regards Hartmut

  2. Great articles, very useful intro to Spirit 2.

    Quick question about the note on semantic actions in this instalment:

    “Most of the time you should be able to integrate your data without having to
    resolve to semantics actions.”

    Does the “integrate your data” mean that now parsing data, fetching values, composing objects from those, etc. is generally possible without employing the semantic actions?

  3. Frank Dellaert says:

    Well, I just spent one unproductive hour to even compile the snippet

    assert(test_parser(
        qi::double_ >> ',' >> qi::double_, "1.0,2.0", 
        std::make_pair(1.0, 2.0)));
    

    I couldn’t. I finally figured out that using a fusion::tuple instead of a make_pair did the trick. Maybe there is some combination of includes that makes it work? Perhaps this could be clarified a bit in the article and in the spirit documentation.

    Thanks for the article, however, which nicely complements the spirit documentation. I’d say the latter falls short in explaining the basic concepts clearly.

    • Hartmut Kaiser says:

      You probably just forgot to include:

      #include <boost/fusion/include/std_pair.hpp>
      

      This header file adapts any std::pair as a Fusion sequence.

  4. Chris says:

    I tried the example, the alternation example with bool and int, and it works great. I want to use an int and a string:

        boost::variant<int, std::string> val;
        std::string test("1234");
        std::string::iterator b = test.begin();
        qi::parse(b, test.end(), qi::int_ | +qi::char_("a-zA-Z_0-9"), val);
    

    But I get screens of errors when I try this. Can you see what I may be missing here?

    Thanks, -Chris

    • Hartmut Kaiser says:

      Hmmm, compiles (and runs) for me (SVN trunk). What Boost version do you use?

      Regards Hartmut

  5. Chris says:

    I was using 1.46.1. I pulled SVN trunk and tried it and it worked for me also.

    I was able to get it to work with 1.46.1 by breaking the +qi::char_("a-zA-Z_0-9") into a separate rule with an explicit std::string signature.

    Thanks, -Chris

  6. Krzyszof says:

    Rules for attributes of compound components are nicely documented in “Compound Attribute Rules”, but what about components combined with operator ()? Is it “obvious” that the attribute of () is the same as the attribute of the internal components?

    I assume:
    1) a: A, b: A, c: A –> a >> ( b > c ): vector<A>
    2) a: A, b: B, c: C –> a >> ( b > c ): tuple<A, tuple<B, C> >

    Is there some (easy?) way to force case 2 to behave like:
    3) a: A, b: B, c: C –> a >> ( b > c ): tuple<A, B, C>

    Of course removing brackets is not an option.
