Feb 12

Attribute Propagation and Attribute Compatibility

By Hartmut Kaiser Add comments

Narinder Claire asked a seemingly innocent question on the Spirit mailing list the other day. After starting to write an answer I realized that this question is not innocent at all as it touches the very fabric of Spirit: the rules of attribute handling. Many people have a hard time to properly understand what is going on in the nether regions of Spirit. More importantly, they have a hard time to understand why is Spirit implemented the way it is.

The question was (I’m paraphrasing to fit it into this article): the following code compiles and runs fine:

std::map<std::string, std::string> the_dictionary;
void insert(std::pair<std::string, std::string>& p)
{
    the_dictionary[p.first] = p.second;
}

namespace qi = boost::spirit::qi;

typedef std::string::iterator iterator;
qi::rule<iterator, std::string()> identifier = qi::alpha >> *qi::alnum;
qi::rule<iterator, std::pair<std::string, std::string>()> assignment;
assignment = identifier >> '=' >> identifier >> ';';

std::string input("a=b");
qi::parse(input.begin(), input.end(), assignment[&insert]);

It parses the input ‘a=b’ and calls the function insert() after it matched the input, passing along a pair of strings “a” and “b”.

It stops compiling if the semantic action is moved into the assignment rule:

assignment = (identifier >> '=' >> identifier >> ';')[&insert];
qi::parse(input.begin(), input.end(), assignment);

On the other hand, it still compiles and runs if the code is written as:

assignment = (identifier >> '=' >> identifier >> ';');
std::pair<std::string, std::string> pp;
qi::parse(input.begin(), input.end(), assignment, pp);

What (the hell) is going on? Any help with an explanation would be appreciated!

The Attribute Rules

Attribute handling in Spirit is governed by two seemingly contradictory set of rules: attribute propagation rules and attribute compatibility rules.

The attribute propagation rules are well formalized and are described in the documentation for each of the components. They are applied from the ‘bottom up’, describing how the exposed attributes of the components compose to form the overall attribute of a Spirit expression. For instance, for sequences we find the general rule to be (see the docs here):

a: A, b: B –> (a >> b): tuple<A, B>

which reads as:

Given the parser component a exposes an attribute of type A and the parser component b exposes an attribute of type B, then the sequence of parser components a >> b will expose an attribute of type tuple<A, B>.

In terms of attribute propagation, tuple stands for fusion::vector.

At the same time, the use of ‘tuple’ above already hints at the attribute compatibility rules. It means, that a sequence will be able to fill any attribute as long as it is a Fusion sequence of any type (for instance std::pair) holding two elements of type A and B.

The attribute compatibility rules are applied from the ‘top downwards’. They prescribe whether a user supplied attribute can be directly used by the component it is passed to. Spirit requires a given attribute type to conform to a set of concepts in order for this type to be compatible with a component. These concepts are specific to every parser or generator component. Unfortunately the current documentation does not contain a full and explicit description of those conceptual requirements, which is something we definitely need to improve on in the future.

Spirit enforces these concepts (i.e. the compatibility rules) by wrapping all attribute handling into special templates, called customization points. All default implementations of the customization points (CPs) generally expose a functionality, which is aligned with what a user would intuitively expect. The tricky part is that you can provide specializations for all CPs, therefore bending the rules for your types. However, here are the three main rules of attribute compatibility in Spirit:

  • Normal C++ convertibility applies in any case: two types A and B are compatible if there exists a default conversion from A to B,
  • STL containers are compatible amongst each other as long as their value_type is compatible
  • All Fusion sequences are compatible if they have the same length and all of their elements are pairwise compatible.

The main exception are sequences, which in addition to their exposed attribute being a Fusion sequence, may be compatible with STL containers if all elements of that sequence are either compatible with the given container or its value_type.

The attribute compatibility rules generally have precedence over the attribute propagation rules and they are applied whenever possible, fully disabling attribute propagation. On the other hand, attribute propagation rules have preference whenever the compatibility rules cannot be applied. This happens:

  • If no (explicit) user supplied attribute is available,
  • Semantic actions are involved,
  • Non-terminals are involved, i.e. for constructs using rule<>, grammar<>, and subrule<>,
  • The user has requested an explicit attribute conversion using attr_cast<>

Each of those cases influences the attribute handling in slightly different ways. We describe the first three below, leaving out attr_cast<> as it exposes semantics very similar to non-terminals.

Edit: some people have been asking what the phrase ‘Semantic actions are involved’ means. It means that attribute compatibility is disabled for a specific rule if the rule’s right hand side (directly) has any semantic actions attached. Semantic actions attached to rules invoked from the right hand side are irrelevant for this specific rule:

// compatibility disabled for 'a'
rule<Iterator, int()> a = int_[f];
// compatibility enforced for 'b'
rule<Iterator, int()> b; b %= int_[f];
// compatibility enabled for 'r'
rule<Iterator, std::vector<int>()> r = a >> b; 
Missing Explicit Attribute

If no user supplied attribute is given, the user either does not want to handle attributes at all (for instance, using a parser to verify any given input only), or he/she uses semantic actions to perform operations related to attributes. We generally assume that as soon as semantic actions are attached to an expression, the user takes full responsibility for attribute handling. Here is a simple example:

using boost::phoenix::ref;
int i = 0;
std::string input("123");
qi::parse(input.begin(), input.end(), qi::int_[ref(i) = qi::_1]);

In this case no attribute is passed to (forced upon) the parser, therefore the placeholder qi::_1 refers to the attribute exposed (propagated) from the parser the semantic action is attached to (the qi::int_): qi::_1 refers to an int. This is caused by the action component, which always makes sure that the attribute type passed to the semantic action is the same as the exposed (propagated) attribute of the component it is attached to.

Non-Terminals

Non-terminals expose a well defined attribute type – it is specified as one of the template parameters. For instance, a Qi rule exposing a std::string has to be defined as:

qi::rule<iterator, std::string()> identifier;
identifier = qi::alpha >> *qi::alnum;

Any right hand side parser for this rule will be invoked with an attribute of type std::string (if no semantic actions are attached to any of its parts). This mainly implies that the rule’s attribute type has to be compatible with the right hand side component (the rule passes an instance of the given attribute type to its right hand side).

At the same time, if the right hand side has a semantic action attached, no attribute is passed to it:

qi::rule<iterator> identifier;
identifier = (qi::alpha >> *qi::alnum)[&store_ident];

This forces the action component to synthesize a new attribute instance. In this case, in accordance with the known attribute propagation rules, the type of this synthesized attribute is fusion::vector2<char, std::vector<char> >, which is passed along to the semantic action. Obviously, the function store_ident has to accept a parameter of this type as well.

Sometimes it is necessary to combine both, semantic actions and attribute compatibility. Spirit allows to enforce this by using the operator%=():

qi::rule<iterator, std::string()> identifier;
identifier %=
    qi::string("ident:") >> (qi::alpha >> *qi::alnum)[&store_ident];

Here the rule’s attribute will be passed to the first component in the right hand side’s sequence (the qi::string(“ident:”)), but the function store_ident will still be called with the fusion::vector2 as described above.

Back to the Question at Hand

Let’s apply the newly gained knowledge to the examples given in the original question.

The first example above compiles fine because the semantic action is attached to the rule assignment, which exposes the std::pair<> as its attribute. This causes the semantic action to be invoked with that very same std::pair<>. At the same time the rule’s std::pair<> attribute is compatible with the sequence at its right hand side (both types are Fusion sequences of the same length and hold the same element types).

The second example does not compile, because the semantic action is attached to the sequence. According to the attribute propagation rules for sequences, its attribute type will be a fusion::vector2, which is exactly the type of the attribute passed to the function insert. We cannot (default-) convert the Fusion vector2<> to a std::pair<>, which causes the compilation error.

The third example above does compile. We can apply the compatibility rules for all components involved. The std::pair<> is passed from the very top is compatible with the rule assignment and with the rule’s right hand side. The attribute is handed down to the single components, splitting of the original std::pair<>, passing one of the pair’s std::strings at a time to each of the elements of the sequence.

Conclusion

Given all the complexity of attribute propagation and attribute compatibility we sometimes wish to have chosen a simpler path. At the same time anything simpler than what we have would be a lot less powerful and less flexible. We have had a lot of discussions whether to abandon the attribute compatibility rules in future versions of Spirit all together. But we always come back to the understanding that overall the current scheme is a good thing. It took us a while to remove the major problems and to reach a sufficient level of consistency. However we believe, that Spirit provides you with a powerful and tunable way of handling attributes, which is one of the biggest innovations introduced by this library.

7 Responses to “Attribute Propagation and Attribute Compatibility”

  1. Jamboree says:

    I noticed that with Spirit V2.4.2, a new directive ‘as<T&>()[] ‘ is introduced in, and I believe that for Narinder’s 2nd example, it can be modified as:

    assignment = as<std::pair<std::string, std::string> >()[(identifier >> '=' >> identifier >> ';')][&insert];
    

    Since I’m not using V2.4.2 (not released yet) , correct me if I’m wrong.

    • Hartmut Kaiser says:

      Yes that should work as well. This creates an instance of the std::pair and passes this to the embedded parser (the compatibility rules apply here). The semantic action will then be invokes using the very same attribute type.

      However, I personally do not like to litter my grammars with attribute related constructs. But that is purely IMHO.

      Regards Hartmut

  2. Frank Dellaert says:

    This is interesting. However, the code snippets you give do not compile, which is probably obvious to experts, but thoroughly confuses newcomers (like me). For example,

    int i = 0;
    std::string input(“123”);
    qi::parse(input.begin(), input.end(), qi::int_[i = qi::_1]);

    I could only get to work by editing as below

    int i = 0;
    std::string input(“123”);
    std::string::iterator it = input.begin();
    qi::parse(it, input.end(), qi::int_[ref(i) = qi::_1]);

    and only compiles if I include the relevant phoenix headers…

    Please help us newbies? I’m willing (expecting) to wrestle a bit, but in this example it was avoidable, I think.

    Frank

    • Hartmut Kaiser says:

      Frank,

      you’re certainly right, the example I gave was erroneous. It’s corrected now.

      Thanks!
      Regards Hartmut

  3. Frank Dellaert says:

    After trying my hand at some attribute synthesizing (see my Point3 example on the Spirit-general mailing list), I guess I sympathize with the comment “Given all the complexity of attribute propagation and attribute compatibility we sometimes wish to have chosen a simpler path.”

    One simpler path that I used a few years back, when I wrote a combinatorial parser in OCaml, would translate here into abandoning _val and -as the single operating principle- have the return type of semantic actions determine the type of the synthesized attribute. The pure functional style also de-emphasized using side effects: the return value of the parser is what you got, with the type you wanted.

    As an aside, I loved my OCaml parser, but had to switch to C++ when I saw that my love for ML was not shared by my students. In Spirit 2, I am re-discovering the joys of combinatorial attribute grammars, and I love it! However, one note of caution: the complexity of making Spirit work might turn off a lot of potential users (including my students), possibly undercutting the impact of the library itself.

    Cheers
    Frank

    • Hartmut Kaiser says:

      The pure functional approach you’re suggesting was exactly what we considered over and over again. It might be simpler in OCaml to deal with the ‘pure’ synthesized attributes returned from the parsers, but in C++ this turns out to be a pain on its own.

      What we do now is to develop utree (should be available with Boost V1.47), a new dynamic data structure you can use as the attribute for your parsers. It can store arbitrary AST’s for almost any grammar. No fuss, no hassle (at least we hope so). Stay tuned!

  4. anders li says:

    hi Hartmut,
    Could you please give us some preview on explanation, usage or some SVN examples on utree ?

Leave a Reply

preload preload preload