Nov 20

Creating Your Own Parser Component for Spirit.Qi

By Hartmut Kaiser

Several people have asked how to access the current iterator position from a semantic action. Different solutions have been proposed, all of them somehow abusing the predefined Qi directive raw[], which normally exposes as its attribute the pair of iterators delimiting the range of the input matched by the embedded parser. I thought this to be a nice incentive to write about how you can create your own parser components.

Spirit is a very modular library. It has been designed with full extensibility in mind, so writing additional components is fairly straightforward. Let’s write a parser component for Qi called iter_pos which exposes the current iterator position as its attribute. In the end we will have implemented a parser primitive usable in the same way as any of the predefined parsers, nicely integrated with Qi’s predefined attribute propagation and merging rules.

The new parser primitive will have the following expression semantics: iter_pos creates a parser which never consumes any input while always succeeding (very much like the eps parser). Its attribute type is iterator_type, the type of the iterator used to access the underlying input stream.

In order to create a custom parser component we need to perform 4 easy steps as outlined below.

Defining the Placeholder

The first step is to define the placeholder representing our new component when building parser expressions (that’s the iter_pos symbol). This can be done by using the predefined macro BOOST_SPIRIT_TERMINAL:

 namespace custom_parser { BOOST_SPIRIT_TERMINAL(iter_pos); } 

which can be placed in any namespace. We assume our custom parser component will be created in the namespace custom_parser. This macro defines two types and a const instance of one of the types. It essentially expands to:

namespace custom_parser
{
    namespace tag { struct iter_pos{}; } // tag identifying placeholder
    typedef unspecified<tag::iter_pos> iter_pos_type;
    iter_pos_type const iter_pos = {};   // placeholder itself
}

We will use the type tag::iter_pos later to identify our component, whereas the variable iter_pos is the one to be used as the placeholder representing our parser component in more complex grammars.

Implementing the Enabler

The placeholder needs to be associated with the appropriate extension mechanism (Spirit has separate extension points for primitive parsers, parser directives, operators, etc.). We need to implement an enabler for the custom parser in a way allowing the library to recognize iter_pos as a simple terminal only, as we don’t want it to be valid if used as a directive, etc. The extension point used for simple terminals is boost::spirit::use_terminal<>. We enable our custom parser by providing a specialization of this extension point, which has to be placed in the namespace boost::spirit:

namespace boost { namespace spirit
{
    // We want custom_parser::iter_pos to be usable as a terminal only,
    // and only for parser expressions (qi::domain).
    template <>
    struct use_terminal<qi::domain, custom_parser::tag::iter_pos>
      : mpl::true_
    {};
}}

Spirit will pick up this template specialization whenever it sees our placeholder custom_parser::iter_pos while compiling a parser expression (that’s why the qi::domain).

Implementing the Parser itself

So far, everything we saw was scaffolding that integrates our new parser with the component framework of Spirit. This step describes the actual interface to be implemented in order to expose the required functionality as a parser component. Here is the full code. Please note that we are placing this code into our own namespace:

namespace custom_parser
{
    struct iter_pos_parser
      : boost::spirit::qi::primitive_parser<iter_pos_parser>
    {
        // Define the attribute type exposed by this parser component
        template <typename Context, typename Iterator>
        struct attribute
        {
            typedef Iterator type;
        };

        // This function is called during the actual parsing process
        template <typename Iterator, typename Context
          , typename Skipper, typename Attribute>
        bool parse(Iterator& first, Iterator const& last
          , Context&, Skipper const& skipper, Attribute& attr) const
        {
            boost::spirit::qi::skip_over(first, last, skipper);
            boost::spirit::traits::assign_to(first, attr);
            return true;
        }

        // This function is called during error handling to create
        // a human readable string for the error context.
        template <typename Context>
        boost::spirit::info what(Context&) const
        {
            return boost::spirit::info("iter_pos");
        }
    };
}

The code above shows 4 notable details any parser component needs to implement. All of them are based on the conceptual requirements for parsers (for more information, see the description of the parser concepts in the Spirit documentation).

We derive our parser implementation from boost::spirit::qi::primitive_parser<> to associate our component with the correct parser concept, in this case a PrimitiveParser.

The embedded meta function attribute is a template which will be instantiated with a Context type (we can ignore this for our small example) and with the type of the Iterator the parser component is being used with. It needs to have defined an embedded type definition type. This will be interpreted as the attribute type exposed by the parser component. Obviously, we expose the supplied Iterator type as our attribute.

The member function parse() is where the actual parsing takes place. It is invoked with a pair of iterators (first, last), an instance of the Context (which we will ignore), a reference to the skipper instance in use (skipper), and a reference to the attribute instance where the parser is supposed to store its result (attr). If no skipper is used to invoke this parser, Skipper will be of the type unused_type. Similarly, if no attribute needs to be extracted, Attribute will be of the type unused_type. As we are designing a parser modeling the PrimitiveParser concept, we need to perform a pre-skip operation on function entry by calling the function qi::skip_over. Our iter_pos parser does not perform any real parsing, but is supposed to return the current iterator position as its attribute. We achieve this by assigning the iterator first (which points to the current position in the input stream) to the attribute. For this assignment we invoke the customization point assign_to, which means we do not have to worry about whether an attribute has been supplied or not. The parse function needs to return whether it succeeded; in our case it always returns true.

The function what() is invoked by the library whenever a human readable string identifying this parser is needed. This is most notably used for error handling, where it allows the library to generate a nicely formatted description of the error context.

This part is clearly the most complex step required to write a parser component, but I believe it is not too complex to understand. As mentioned above, Spirit has been designed with extensibility in mind, and we wanted to make it as simple as possible to be extended.

Instantiating the Parser

The last required piece of code is a parser generator function object which the library uses to create an instance of our parser component. Unsurprisingly, the name of the function object we need to specialize is make_primitive, and since we used qi::domain when implementing the enabler above, we now need to place this specialization into the namespace boost::spirit::qi.

namespace boost { namespace spirit { namespace qi
{
    // This is the factory function object invoked in order to create
    // an instance of our iter_pos_parser.
    template <typename Modifiers>
    struct make_primitive<custom_parser::tag::iter_pos, Modifiers>
    {
        typedef custom_parser::iter_pos_parser result_type;

        result_type operator()(unused_type, unused_type) const
        {
            return result_type();
        }
    };
}}}

You can think of this function object as a factory for our parser object. Our specialization is again based on the tag::iter_pos defined above, which identifies our parser component. The function object make_primitive has to expose the type of the component it creates as its embedded type definition result_type. Additionally, it exposes a function call operator as the actual factory function. You have probably already realized that unused_type is Spirit’s fancy way of saying ‘I don’t care’, and as we don’t care about the two (required) parameters we use unused_type for both (ok, if you insist on knowing the details: the first parameter is a reference to the iter_pos placeholder instance that caused this factory to be invoked, and the second parameter is a reference to an object instance of the type Modifiers, which is needed only for directives like no_case[]).

Using the iter_pos Parser Component

The only thing left to show is how to use our newly created parser. As I mentioned before, the iter_pos component is usable like any other predefined parser.

    std::string prefix, suffix;           // attributes receiving the
    std::string::iterator position;       // parsed values

    std::string input("prefix1234567");
    std::string::iterator first = input.begin();
    bool result =
        qi::parse(first, input.end()
          , +qi::alpha >> custom_parser::iter_pos >> +qi::digit
          , prefix, position, suffix);

This code snippet uses the new parser component to retrieve the position in the input at which the letters end and the digits start. If the parse is successful, prefix will hold “prefix”, suffix will hold “1234567”, and position will point to the first digit, that is, to the 7th character of the input string (six positions past input.begin()).

Conclusion

The methodology outlined above is applicable to simple components which do not expose any additional functionality, parameterization, or member functions of their own. But since most of Spirit’s predefined parsers are written using this technique, I assume it will allow you to go a long way before you need to apply more powerful tools. For now, I leave the description of how to implement one of those more complex parsers for a future article. Stay tuned!

If you are interested in learning more about this topic, you can find more material in the main Spirit documentation. The complete source code for this example can be found in the Boost SVN. It consists of two files: the header file iter_pos.hpp, which is self contained and ready to be reused in your projects, and the example source file iter_pos_parser.cpp.

15 Responses to “Creating Your Own Parser Component for Spirit.Qi”

  1. Larry Evans says:

    What’s the purpose of

      unspecified

    in:

      typedef unspecified<tag::iter_pos> iter_pos_type;

    ?

    • Hartmut Kaiser says:

      Larry,

      I wanted to avoid having to write about Boost.Proto terminals and related data structures (because that’s what is defined there). So I left this ‘minor’ detail out of the explanation, all the more as it is not relevant to the topic at hand.

      Regards Hartmut

      • Rob says:

        I’m creating a new parser. I thought this article would show me how to do that, but “unspecified” doesn’t help me. Can you add a pointer to the required information if you don’t care to include it here?

  2. udpn says:

    Hartmut,

    That wasn’t all that straightforward for someone who’s using boost::spirit without deep knowledge of template metaprogramming. Actually, it was something like the call of Metacthulhu :)

    Currently I’m having trouble defining a skipper which will skip both spaces and newlines. Paaanic.

    Regards, Philip.

    • Joel de Guzman says:

      Philip, if you don’t want to do any “Metacthulhu”, then this is the wrong article for you ;-) .

      Hartmut, perhaps we need to tag the articles as beginner/intermediate/advanced? Seems to me, this is an advanced topic as people don’t normally want to extend Spirit and write their own components?

    • Hartmut Kaiser says:

      Philip,

      a skipper which will skip spaces and newlines would be:

      space | eol 
      

      The only problem I could see is passing the proper skipper type to a grammar or rule declaration. Grammars and rules need to be declared with the skipper type they are going to be used with. Fortunately, Boost has a facility which achieves this in a blink:

      rule<Iterator, BOOST_TYPEOF(space | eol)> r;
      

      HTH
      Regards Hartmut

      • udpn says:

        Hartmut, that was awesome.

        I’ve already used user-defined grammar as skipper. New manual lacks a lot of data.

      • udpn says:

        BTW, there’s no data about splitting grammar in parts too. I’ve made a full grammar for C++, but I can’t compile it due to “out of heap” exception.

        • OvermindDL1 says:

          The mini-c example shows how to split things up to prevent that, but you should still update your compiler at the least, if it is not up to date already.

        • Michael S. says:

          udpn, couldn’t you share your Spirit C++ grammar?

          If you’ve successfully made a C++ grammar in Spirit, I believe it really shows the strength of Spirit – and it would be a really good example for everyone to see.

  3. amira says:

    I don’t like to skip space

  4. Larry Evans says:

    Instead of following this 4-5 step solution, why not do something like token_def
    where just 1 class is needed. I just tried it and it seems to work.
    Code is uploaded into vault under Strings-Text Processing directory
    in file, variant_primitive_parser.cpp.

  5. Leopold Talirz says:

    If someone is having problems compiling iter_pos.hpp – the file is not completely self-contained as claimed in the article. It will compile after adding

    #include <boost/spirit/include/karma.hpp>

    (I am sure much less would be needed – I did not check).
