Dec 21

The Magical Power of Attributes in Spirit – Directives and Non-terminals

By Hartmut Kaiser

In the previous two installments of this article series (The Magical Power of Attributes in Spirit – Primitives and The Magical Power of Attributes in Spirit – Operators) we talked about the attribute handling in different constructs utilized to build parsers and generators with Spirit. We will continue this walkthrough and touch on the remaining parts. Again, I suggest having a look at the first two parts, as they contain prerequisite information.

In this article I will talk about directives and non-terminals. There is only so much information we can cover in 3 articles, so this will not be exhaustive. The Spirit documentation is still your friend. Please consult it whenever you are stuck.

Component Directives

Directives are special constructs designed to non-intrusively change the behavior of some part of the grammar. The syntax of directives has one of the following forms:

directive[…grammar to modify…]
directive(param1, param2,…)[…grammar to modify…]

The part of the grammar to modify is embedded inside the square brackets. Directives may have additional parameters (here: param1, param2, …). In terms of attributes, Spirit implements three different types of directives:

  • Directives exposing no attribute (unused_type), the classical example is qi::omit[] which inhibits parts of the grammar for attribute handling; we will elaborate on this directive later.
  • Directives exposing the attribute of their embedded components; almost all directives belong to this category, for instance qi::no_case[], qi::lexeme[], karma::left_align[], or karma::buffer[]. These directives are transparent during attribute handling.
  • Directives repeating their embedded component, which therefore expose a container holding the attributes of the embedded component; an example is the repeat[] directive. In terms of its attribute the behavior of repeat[] is very similar to the Kleene star or the plus operators.

The omit[] directive has interesting properties from the standpoint of attributes. If employed as a parser directive, omit[] inhibits the attribute of the embedded parser. This is very useful when parts of the input do not contribute to the overall required attribute, but the parser for that input exposes some arbitrary attribute anyway. For instance:

assert(test_parser(qi::omit[qi::char_] >> qi::int_, "x345", 345));

In this case the character attribute exposed by qi::char_ does not influence the attribute of the sequence. On the other hand, the omit[] directive in Karma is designed to consume an attribute without emitting any output. The type of the consumed attribute is determined by the embedded generator. Consider an attribute of the type: std::pair<int, double>. Let us assume we need to emit output from the second member only (the double):

std::pair<int, double> p(1, 2.0);
assert(test_generator(
    karma::omit[karma::int_] << karma::double_, p, "2.0"));

Apart from omit[], directives do not exhibit such surprising behavior. Please refer to the corresponding documentation to understand the attribute rules for a particular directive.

Non-Terminals

Non-terminals are more interesting. Spirit has two officially supported non-terminal types: grammars and rules (the Spirit Repository additionally has subrules, which conform to a similar interface). The grammar encapsulates a set of rules, primitive parsers and sub-grammars. It is the main mechanism for modularization and composition: grammars can be composed to form more complex grammars. The rule is a polymorphic component that acts as a named placeholder capturing the behavior of a Spirit expression assigned to it. Both grammars and rules are completely identical in terms of attribute handling. For this reason the following section will concentrate on rules only, leaving the grammar construct as an exercise for the reader.

Generally, non-terminals are very similar to functions. They may take parameters – their inherited attributes, and they usually return a value – their synthesized attribute. The types of the inherited and the synthesized attributes have to be explicitly specified when defining the particular grammar or the rule. As an example, the following code declares a Qi rule exposing a double as its synthesized attribute, while expecting a std::string as its only inherited attribute:

qi::rule<Iterator, double(std::string)> r =
    qi::lit(qi::_r1) >> qi::double_;

Three things are worth mentioning here:

  1. The function declaration expression double(std::string) does not declare a function in the common sense. We use this syntax to specify the attributes in a compact way. The double is the synthesized attribute (as ‘returned’ from the rule), while the std::string is an inherited attribute (as it is a ‘parameter’ to the rule).
  2. The right hand side expression assigned to the rule always needs to expose an attribute compatible with the rule’s synthesized attribute. In the example above the lit does not expose any attribute, while the double_ component exposes a double. According to the attribute propagation rules for sequences, the attributes of both sides are compatible indeed.
  3. We use yet another set of special, predefined placeholders to access the inherited attributes a rule has been invoked with. Here qi::_r1 refers to the first inherited attribute of the left hand side rule. Similarly, qi::_r2, qi::_r3, etc. can be used to access the second, third, etc. inherited attributes, if needed.

In the world of generators, non-terminals are just as useful as in the parser world. Generator non-terminals encapsulate a format description for a particular data type, and, whenever we need to emit output for this data type, the corresponding non-terminal is invoked in a similar way as the predefined Karma generator primitives. Karma non-terminals are very similar to the non-terminals in Qi. Generator non-terminals may accept parameters as well, and we call those inherited attributes too. The main difference is that they do not expose a synthesized attribute (as parsers do), but they require a special consumed attribute. Usually the consumed attribute is the value the generator creates its output from. Even if the consumed attribute is not ‘returned’ from the generator we chose to use the same function style declaration syntax as employed in Qi. The example below defines a Karma rule consuming a double while not expecting any additional inherited attributes.

karma::rule<OutputIterator, double()> r = karma::double_;

Karma non-terminals follow the same rules as outlined for Qi non-terminals above. The attributes of the right and left hand sides must be compatible. We did not declare any inherited attributes in the example above, but if we did we would have used karma::_r1, etc. as the placeholders representing those.

It is very important to remember to apply the function style declaration syntax while specifying the attributes of a non-terminal. The problem is that non-terminals may take more template parameters than shown above, one of which is the Qi skip parser (or the Karma delimiting generator). As the template parameters may be passed in any order, specifying the attributes using the function style declaration is the only means to identify them. Forgetting to utilize this declaration style might result in difficult-to-decode compilation errors or in unexpected runtime behavior.

Inherited attributes for rules are very similar to expressions we saw earlier. For instance, we described the primitive components char_('a') or lit("abc"), where the 'a' and the "abc" are the inherited attributes passed to the char_ and lit components. The rule defined in the Qi example above always needs to be invoked with a std::string argument. For instance:

std::string str("num: ");
assert(test_parser(r(phoenix::val(str)), "num: 2.0", 2.0));
str = "prefix: ";
assert(test_parser(r(phoenix::val(str)), "prefix: 3.1", 3.1));

In this case the inherited attribute defines the prefix to be matched before matching the double. We wrapped the string parameter into a phoenix::val() because of a limitation in the current version of Spirit, which requires all inherited attributes except scalars to be passed as lazy expressions (function objects). This limitation will be removed in future versions of the library.

Conclusion

Spirit has a lot more components than I was able to cover in these three articles. Some of those do not fit into any category, making them difficult to describe in a generalized way. If you need more information about a specific component it is always best to consult the documentation. The quick references (see Qi’s quick reference and Karma’s quick reference) give a good overview of which components are available, while each of the components has a separate reference section describing it.

11 Responses to “The Magical Power of Attributes in Spirit – Directives and Non-terminals”

  1. T says:

    Is a Primitive, qi::int_ for example, a rule?
    I am puzzled by the underlying C++ types of a rule and a primitive and their relationship or conversion rule, if any.
    In the following code, the last two forms do not compile, apparently due to a type mismatch. What concept am I missing here?

    #include <iostream>
    #include <boost/spirit/include/qi.hpp>
    
    int main()
    {
        using namespace std;
    
        using namespace boost;
        using namespace boost::spirit;
        using namespace boost::spirit::qi; 
    
        int i, j;
        std::string s;
    
        cin >> s;
    
        // compiles OK
        qi::parse(s.begin(), s.end(), qi::int_, i); 
    
        // does not compile
        qi::parse(s.begin(), s.end(), qi::int_[cout << qi::_1], i); 
        // does not compile
        qi::parse(s.begin(), s.end(), qi::int_[qi::ref(j) = qi::_1], i);
    
        cout << endl << i;
        cout << endl << j;
    }
    

    Thanks.

    • Hartmut Kaiser says:

      Hey T,

      that’s several (unrelated) questions at once which I’ll try to answer.

      A qi::int_ is not a rule, it’s a primitive (predefined) parser component designed to match integers. OTOH, a rule is a special parser component matching the input as defined by the right hand side expression it got assigned to. In parser lingo this is called a non-terminal (see here for some explanations: http://en.wikipedia.org/wiki/Parsing_expression_grammar).

      The code which is not compiling for you fails mainly because you did not include the proper header file:

      #include <boost/spirit/include/phoenix.hpp>
      

      This header pulls in the whole Phoenix library, which is probably more than you really need, but for the sake of simplicity I’d suggest to start with that. The qi::_1 is a Phoenix placeholder. It causes the semantic action to be interpreted as a Phoenix expression, which requires the additional #include.

      One minor detail: ref() doesn’t live in the namespace qi but is a utility defined in Phoenix as well, so you might want to write boost::phoenix::ref instead.

      Regards Hartmut

      • T says:

        Thanks for enlightening me. I modified the code as you suggested and tested it with gcc on Ubuntu, and I receive the following errors for each parse():
        code:

        #include <iostream>
        #include <string>
        #include <boost/spirit/include/qi.hpp>
        #include <boost/spirit/include/phoenix.hpp>
        
        int main()
        {
            using namespace std;
        
            using namespace boost;
            using namespace boost::spirit;
            using namespace boost::spirit::qi;
        
            int i, j;
            std::string s;
        
            cin >> s;
        
            // compiles OK
            qi::parse(s.begin(), s.end(), qi::int_, i);
        
            // does not compile
            qi::parse(s.begin(), s.end(), qi::int_[cout << qi::_1], i);
            // does not compile
            qi::parse(s.begin(), s.end(), qi::int_[boost::phoenix::ref(j) = qi::_1], i);
        
            cout << endl << i;
            cout << endl << j;
        }
        
        g++ -c -pipe -g -Wall -W -D_REENTRANT -DQT_GUI_LIB -DQT_CORE_LIB -DQT_SHARED -I../../qtsdk-2009.05/qt/mkspecs/linux-g++ -I. -I../../qtsdk-2009.05/qt/include/QtCore -I../../qtsdk-2009.05/qt/include/QtGui -I../../qtsdk-2009.05/qt/include -I/media/progs/cpp-includes/boost_svn/trunk -I. -o main.o main.cpp
        main.cpp: In function ‘int main()’:
        main.cpp:20: error: invalid initialization of non-const reference of type ‘__gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits, std::allocator > >&’ from a temporary of type ‘__gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits, std::allocator > >’
        /media/progs/cpp-includes/boost_svn/trunk/boost/spirit/home/qi/parse.hpp:32: error: in passing argument 1 of ‘bool boost::spirit::qi::parse(Iterator&, Iterator, const Expr&, Attr&) [with Iterator = __gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits, std::allocator > >, Expr = boost::spirit::int__type, Attr = int]’
        ...
        

        What could I have done wrong?
        Thanks.

        • Hartmut Kaiser says:

          T,

          your problem is not related to Spirit, but a pure C++ issue. Rewriting your code as

              // qi::parse takes its first iterator by non-const reference,
              // so it must be a named variable, not the temporary s.begin()
              std::string::iterator b = s.begin();
              qi::parse(b, s.end(), qi::int_, i);
          
              // previously did not compile
              b = s.begin();
              qi::parse(b, s.end(), qi::int_[cout << qi::_1], i);
              // previously did not compile
              b = s.begin();
              qi::parse(b, s.end(), 
                  qi::int_[boost::phoenix::ref(j) = qi::_1], i);
          

          should make it compile.

          Regards Hartmut

  2. Jurnell Cockhren says:

    Hey. Concerning the rule used in the article:

    qi::rule r = qi::lit(qi::_r1) >> qi::double_;
    

    I have two questions:
    1. For rule ‘r,’ let’s say instead of qi::double_, we parse with some token_def. Does the rule require a token_def attribute of std::string?
    2. I was under the impression that either the autorule operator ‘%=’ or [_val = _2] was required in order to return a synthesized attribute. If not, could you explain how the attribute is passed up through rules (assuming there’s a rule ‘m’ defined as: qi::rule m = r;)?

    • Hartmut Kaiser says:

      Hey Jurnell,

      1) The short answer is: yes, the token_def should expose a std::string as its attribute too. The slightly longer answer is: it depends. If you write the rule as above, then the right hand side attribute should match the attribute of the rule, which is the std::string. OTOH, if you utilize semantic actions the token_def does not necessarily need to expose a std::string. Then the semantic action could assign any desired value to the left hand side’s attribute (i.e. …[qi::_val = "foo"]). Everything depends on your concrete setup.

      2) You’re right, the autorule operator (‘%=’) forces the left hand side’s attribute to be propagated to the right hand side expression. But in addition rules have some built-in logic that decides whether the autorule behavior should be applied even if the ‘normal’ assignment operator (‘=’) is used. The logic is simple: rules expose autorule behavior by default as long as no semantic actions are attached to any of the components of the right hand side expression. So in the examples above all rules implicitly expose the autorule behavior. A reason to use the operator %= would be to enforce autorule semantics even if semantic actions are involved.

      Regards Hartmut

  3. boulila says:

    When I parse a complex grammar (more than 2 rules) using phrase_parse, I get some undesirable output. I want to omit it, and I see that I can use the “buffer” function, but I have not managed to use it. Can you help me?

  4. reset999 says:

    Ok… I’m very new at this but I have a slight problem using arrays of symbol tables.

    I have defined a symbol table struct, let’s call it ABC that maps char strings onto unsigned int’s like so:

    struct ABC : qi::symbols<char,unsigned int>  {};
    

    and a:

    struct MY_STRUCT {
       unsigned int uindex;
       unsigned int ui;
    };
    

    I then try to define a grammar say XYZ which sports an array of ABC tables and synthesises a struct MY_STRUCT. Let’s assume the array of ABC’s is nicely initialised somewhere else before the first parse. The input is a symbol string which is then mapped onto an UINT which then becomes an index into the ABC array, selecting the proper ABC table and that finally gets used as a parser to map the next string symbol from the input into a final UINT.
    The following does not compile saying there is no global operator [] which takes type const boost::phoenix::actor<Eval> :

    typedef unsigned int UINT;
    template <typename Iterator>
    struct XYZ : qi::grammar<Iterator,MY_STRUCT(),qi::space_type> {
       XYZ () : XYZ::base_type(doit)
       {
             using qi::_val;
             using qi::eps;
             using qi::_1;
             using qi::_r1;
             using qi::space;
             using qi::string;
             using phoenix::at_c;
             
             table_selector = tables[ _r1 ];   // kaboom 
    
             doit = eps[ at_c<0>(_val) = -1,
                         at_c<1>(_val) = -1 ]
                    >> index[ at_c<0>(_val) = _1 ]
                    >> table_selector( at_c<0>(_val) )[ at_c<1>(_val) = _1 ]
                    ;
       }
       ;
    
       qi::rule<Iterator,MY_STRUCT(),qi::space_type>   doit;
       qi::rule<Iterator,UINT(UINT)>                   table_selector;
       ABC                                             tables[10];
       ABC                                             index;
    };
    

    I added the table_selector rule to emphasize the problem but in reality it’s redundant. Going through the tables array directly with at_c<0>(_val) fails the same way.

    Is there a way of using the inherited attribute _r1 as a subscript to the array? Perhaps there is a way to extract the UINT that’s supposed to be inside?
    Or am I totally on the wrong track – if so could someone please help me re-write this so that it compiles and works? 🙂

    Regards,
    reset999

    • Hartmut Kaiser says:

      Hey reset999,

      It’s very difficult to give an elaborate answer in a comment. Would you mind sending your question to the Spirit mailing list instead (see here for details: http://boost-spirit.com/home/feedback-and-support/)? Other people might want to chime in as well.
      But to answer your question, I believe it’s not possible to use a placeholder variable as a subscript to the array here. In order to get resolved, a placeholder needs to be used in a certain context (i.e. a semantic action), which it is not in your example.

      Regards Hartmut

  5. reset999 says:

    Ok done that 🙂
    http://sourceforge.net/mailarchive/message.php?msg_name=27771242.post%40talk.nabble.com

    And I also seem to have answered my own question – although I’m not sure I’m doing this right?

    Cheers,
    reset999
