Dec 05

Creating Your Own Generator Component for Spirit.Karma

By Hartmut Kaiser

In the good tradition of highlighting both of Spirit’s major sub-libraries with similar use cases, I will talk about Karma today. In a previous installment (see Creating Your Own Parser Component for Spirit.Qi) I presented the steps needed to create a parser terminal. Our topic here is the creation of a new generator directive that groups output elements into columns. For instance, instead of being restricted to printing a vector of integers in a single row:

1 2 3 4 5 6 7 8 9

we will build the facilities needed to automatically insert a line break after each N-th element:

1 2 3 4 5
6 7 8 9

This is a generally useful feature that is still missing from Karma. For this reason the described directive has recently been added to the library and will be available with the next release. To keep things simple, however, the code presented here is an abridged version of the full implementation contained in the main code base. We chose the following syntax for our new columns directive, where the embedded expression (*int_ in the example) can be any other generator:

columns[*int_]

As you might already know, all generators are invoked with a special delimiting generator as one of their parameters, which is used to insert delimiters between the output emitted by Karma’s primitives. We will see in the example below how to specify it. This delimiting generator is very similar to Qi’s skip parser. As mentioned in an earlier article, Karma is very much a mirror image of Qi, and this applies nicely here as well.

Now, the columns directive will replace the delimiting generator passed to the embedded generator with a new one, which not only wraps and invokes the old delimiter, but additionally emits a column delimiter after every N-th invocation. The original delimiter will be restored once the embedded generator expression has finished its work. As the syntax chosen above allows specifying neither the number N nor the column delimiter, we will assume for this example that we always want to generate columns by inserting a newline after every 5th invocation of our special delimiting generator (the ‘real’ columns directive added to Karma supports specifying both parameters; for more information please consult its documentation).
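To make the mechanism concrete before diving into Spirit’s machinery, here is a minimal, Boost-free sketch of the same idea. It does not use Karma at all: the sink is a std::string, and the names space_delimiter, counting_delimiter and columns_of are hypothetical stand-ins chosen for illustration. Each element is followed by a call to the wrapping delimiter, which first delegates to the outer delimiter and then appends a newline after every 5th call; a final call closes an unfinished last row.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an outer delimiter: appends one space.
struct space_delimiter
{
    bool operator()(std::string& sink) const
    {
        sink += ' ';
        return true;
    }
};

// Wraps an outer delimiter and appends '\n' after every 5th invocation,
// modelling the idea behind the columns directive (not Karma code).
template <typename Delimiter>
struct counting_delimiter
{
    explicit counting_delimiter(Delimiter const& d) : outer(d), count(0) {}

    bool operator()(std::string& sink)
    {
        if (!outer(sink))          // first delegate to the outer delimiter
            return false;
        if (++count % 5 == 0)      // then close the current row
            sink += '\n';
        return true;
    }

    // Terminate an unfinished last row, if any.
    void final_delimit(std::string& sink) const
    {
        if (count % 5 != 0)
            sink += '\n';
    }

    Delimiter outer;
    std::size_t count;
};

// Emits each element followed by the wrapping delimiter, similar to
// what *int_ does with the delimiter it is given.
std::string columns_of(std::vector<int> const& v)
{
    std::string sink;
    counting_delimiter<space_delimiter> d{space_delimiter{}};
    for (int i : v)
    {
        sink += std::to_string(i);
        d(sink);
    }
    d.final_delimit(sink);
    return sink;
}
```

For the values 1 to 9 this model produces "1 2 3 4 5 \n6 7 8 9 \n". Note that the outer space delimiter is still emitted before each newline, since the wrapper always invokes it first.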

Creating a custom directive is a bit more difficult than implementing a primitive component. In addition to the 4 steps similar to those required for a primitive (if you have not seen the related article, please see Creating Your Own Parser Component for Spirit.Qi), we need to perform one more step. Even though this will partly repeat what I wrote in that other article, I will explain the necessary steps in detail.

Defining the Placeholder

The first step is again to define the placeholder representing our new component when building generator expressions (that’s the columns symbol). This has to be done by using the predefined macro BOOST_SPIRIT_TERMINAL:

namespace custom_generator { BOOST_SPIRIT_TERMINAL(columns); }

which can be placed in any namespace. We assume our custom generator component will be created in the namespace custom_generator. This macro defines two types and a const instance of one of the types. It essentially expands to:

namespace tag { struct columns {}; } // tag identifying placeholder
typedef unspecified<tag::columns> columns_type;
columns_type const columns = {};     // placeholder itself

We will use the type tag::columns later to identify our component, whereas the variable columns is the placeholder representing our generator directive in more complex grammars.

Implementing the Enabler

The placeholder needs to be associated with the appropriate extension mechanism (Spirit has separate extension points for primitive generators, generator directives, operators, etc.). We need to implement an enabler for the custom generator that lets the library recognize columns as a directive only, as we don’t want it to be valid if used as a primitive generator, etc. The extension point used for directives is boost::spirit::use_directive<>. We enable our custom generator by providing a specialization of this extension point, which has to be placed in the namespace boost::spirit:

namespace boost { namespace spirit
{
    // We want custom_generator::columns to be usable as a directive only,
    // and only for generator expressions (karma::domain).
    template <>
    struct use_directive<karma::domain, custom_generator::tag::columns>
      : mpl::true_ {};
}}

Spirit will pick up this template specialization whenever it sees our placeholder custom_generator::columns while compiling a generator expression (that’s why the karma::domain).
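The opt-in pattern behind use_directive<> can be illustrated without Boost. The sketch below is a simplified model, assuming a primary template that defaults to false and a full specialization that enables one tag for one domain; the namespace model and the types karma_domain and qi_domain are hypothetical stand-ins, not Spirit’s definitions.

```cpp
#include <type_traits>

// Hypothetical, Boost-free model of an opt-in extension point. The name
// mirrors Spirit's use_directive<>, but this is not Spirit's definition.
namespace model
{
    struct karma_domain {};   // stands in for karma::domain
    struct qi_domain {};      // stands in for qi::domain

    // Primary template: by default no tag is enabled as a directive.
    template <typename Domain, typename Tag>
    struct use_directive : std::false_type {};
}

namespace custom_generator { namespace tag { struct columns {}; } }

namespace model
{
    // Opt-in: columns is a directive, but only in the generator domain.
    template <>
    struct use_directive<karma_domain, custom_generator::tag::columns>
      : std::true_type {};
}

static_assert(model::use_directive<model::karma_domain,
                  custom_generator::tag::columns>::value,
              "columns is enabled as a generator directive");
static_assert(!model::use_directive<model::qi_domain,
                  custom_generator::tag::columns>::value,
              "columns is not enabled as a parser directive");
```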

Implementing the Generator itself

Everything we have seen so far is boilerplate code that integrates our new generator with Spirit’s component framework. This step describes the actual interface to be implemented in order to expose the required functionality as a generator component. The code is presented piece by piece below; please note that everything is placed into our own namespace custom_generator.

The class simple_columns_generator is slightly more complicated than the iter_pos parser component developed earlier, but if you look closer you will see a very similar, equally well defined interface we need to implement in order to fit the new generator into the existing framework.

We derive our generator implementation from boost::spirit::karma::unary_generator<> to associate our component with the correct generator concept, in this case a UnaryGenerator.

// That's the actual columns generator
template <typename Subject>
struct simple_columns_generator
  : boost::spirit::karma::unary_generator<
        simple_columns_generator<Subject> >
{
    // more members go here as explained below
    Subject subject;    // the embedded generator
};

The embedded type properties is specific to Karma generators (it is not needed for Qi parser components). Some background: Karma internally wraps the user-provided output iterator into its own iterator. This enables advanced features such as buffering, character counting, and output position tracking. Usually a concrete generator expression requires only some of those features. The embedded type properties describes the output iterator properties a generator component requires, and Karma uses this information to optimize its wrapping output iterator so that it supports only the required features. In our case the columns generator does not add any requirements of its own, so it is sufficient to expose the properties of the embedded generator.

// Define required output iterator properties
typedef typename Subject::properties properties;

The embedded metafunction attribute is a template which will be instantiated with a Context type (we can ignore this for our small example) and with the type of the iterator the generator component is used with. It needs to define an embedded typedef named type, which is interpreted as the attribute type exposed by the generator component. As the columns generator exposes the attribute of its embedded expression, we use another of Spirit’s customization points, attribute_of, to retrieve the embedded attribute.

// Define the attribute type exposed by this generator component
template <typename Context, typename Iterator>
struct attribute
  : boost::spirit::traits::attribute_of<Subject, Context, Iterator>
{};
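The forwarding done by properties and attribute can be modelled in plain C++. This is a simplified, hypothetical sketch: int_list_generator and columns_wrapper are illustrative names, properties is an arbitrary placeholder type, and the wrapper uses its subject’s nested attribute metafunction directly, whereas the real component goes through traits::attribute_of.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical subject generator exposing properties and an attribute
// metafunction, shaped like the Karma components described above.
struct int_list_generator
{
    // Placeholder meaning "no special output iterator requirements".
    typedef std::integral_constant<int, 0> properties;

    template <typename Context, typename Iterator>
    struct attribute { typedef std::vector<int> type; };
};

// A unary wrapper that adds no requirements of its own: it forwards both
// the properties and the attribute type of its subject.
template <typename Subject>
struct columns_wrapper
{
    typedef typename Subject::properties properties;

    template <typename Context, typename Iterator>
    struct attribute
      : Subject::template attribute<Context, Iterator> {};

    Subject subject;
};
```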

The constructor of the simple_columns_generator component will be called with a reference to the embedded generator; we will see the corresponding code later.

simple_columns_generator(Subject const& s)
  : subject(s) {}

The member function generate() is called to emit the output for this component. It is invoked with the output iterator to use (sink), an instance of the generator context (ctx), a reference to the delimiter in use (delimiter), and a reference to the attribute instance the component generates output from (attr). Most of the arguments are simply forwarded to the embedded component’s generate() function. As mentioned above, our columns generator wraps the outer delimiter into a new column delimiter (which is a generator itself) and passes this column delimiter to the generate() function of its embedded component. We will look at the column delimiter’s implementation in a minute.

// This function is called during the actual output generation.
// It dispatches to the embedded generator while supplying a new
// delimiter to use, wrapping the outer delimiter.
template <typename OutputIterator, typename Context
  , typename Delimiter, typename Attribute>
bool generate(OutputIterator& sink, Context& ctx
  , Delimiter const& delimiter, Attribute const& attr) const
{
    columns_delimiter<Delimiter> d(delimiter);
    return subject.generate(sink, ctx, d, attr) &&
           d.final_delimit(sink);
}

The function what() is invoked by the library whenever a human readable string identifying this generator is needed. This is mostly used for error handling, where it allows generating a nicely formatted description of the error context. Our implementation again mostly forwards to the embedded generator.

// This function is called during error handling to create
// a human readable string for the error context.
template <typename Context>
boost::spirit::info what(Context& ctx) const
{
    return boost::spirit::info("columns", subject.what(ctx));
}
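The nested descriptions produced by what() form a small tree: each component contributes its own tag and embeds the descriptions of its subjects. The following sketch models this with a hypothetical, simplified info struct (it is not boost::spirit::info, and the tags "kleene" and "integer" are only illustrative) and renders the tree as a string. It assumes C++17, which permits std::vector of an incomplete element type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for boost::spirit::info: a tag plus
// the descriptions of any embedded components.
struct info
{
    std::string tag;
    std::vector<info> children;
};

// Render a description such as "columns[kleene[integer]]".
std::string render(info const& i)
{
    std::string out = i.tag;
    if (!i.children.empty())
    {
        out += '[';
        for (std::size_t n = 0; n < i.children.size(); ++n)
        {
            if (n != 0)
                out += ", ";
            out += render(i.children[n]);
        }
        out += ']';
    }
    return out;
}
```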

We now have all required parts of the generator in place. What’s still missing is the column delimiter. It has two data members: the reference to the outer delimiter and the invocation counter. The outer delimiter is passed to its constructor from the generate() function we saw above. As this class is used as a generator (all delimiters are generators), it needs to expose a generate() function as well, with exactly the same signature as described above. The column delimiter will be invoked by the components constituting the embedded generator. Each time, it first invokes the wrapped (outer) delimiter and then increments the invocation counter. After every 5th invocation it inserts an additional newline into the output stream.

The last missing piece is the function final_delimit(), which is invoked once after the embedded generator has returned (see the function generate() described above). Its sole purpose is to append a column delimiter at the very end if the last invocation didn’t already append one.

// Special delimiter wrapping the original one while additionally emitting
// the column delimiter after each 5th invocation
template <typename Delimiter>
struct columns_delimiter
{
    columns_delimiter(Delimiter const& delim)
      : delimiter(delim), count(0) {}

    // This function is called during the actual delimiter output
    template <typename OutputIterator, typename Context
      , typename Delimiter_, typename Attribute>
    bool generate(OutputIterator& sink, Context&, Delimiter_ const&
      , Attribute const&) const
    {
        // first invoke the wrapped delimiter
        if (!karma::delimit_out(sink, delimiter))
            return false;

        // now we count the number of invocations and emit the column
        // delimiter after each 5th column
        if ((++count % 5) == 0)
            *sink++ = '\n';
        return true;
    }

    // Generate a final column delimiter if the last invocation didn't
    // emit one
    template <typename OutputIterator>
    bool final_delimit(OutputIterator& sink) const
    {
        if (count % 5)
            *sink++ = '\n';
        return true;
    }

    Delimiter const& delimiter;   // wrapped delimiter
    mutable unsigned int count;   // invocation counter
};

Overall, this step was a bit more involved, mainly because we needed to write more code. Nevertheless, I believe it highlights the important parts of the interface connecting your own components with the library’s infrastructure.

Instantiating the Generator

The next required piece of code is a function object which will be used by the library to create a new instance of our generator component. Unsurprisingly, the name of the function object we need to specialize is make_directive, and since we used karma::domain in the enabler above, we now need to place this specialization into the namespace boost::spirit::karma.

namespace boost { namespace spirit { namespace karma
{
    // This is the factory function object invoked in order to create
    // an instance of our simple_columns_generator.
    template <typename Subject, typename Modifiers>
    struct make_directive<
        custom_generator::tag::columns, Subject, Modifiers>
    {
        typedef custom_generator::simple_columns_generator<Subject>
            result_type;

        result_type operator()(
            unused_type, Subject const& s, unused_type) const
        {
            return result_type(s);
        }
    };
}}}

You can think of this function object as a factory for our generator object. Our specialization is again based on the tag::columns defined above, which identifies our generator component. The function object make_directive has to expose the type of the component it creates as its embedded typedef result_type. Additionally, it exposes a function call operator as the actual factory function. The unused_type is Spirit’s way of saying ‘I don’t care’, and as we don’t care about two of the three (required) parameters, we use unused_type for them (here are the details anyway: the first parameter is a reference to the columns placeholder instance that caused this factory to be invoked, and the third parameter is a reference to an object of type Modifiers, which is needed only for directives like lower[]). The second parameter refers to the embedded generator used with the columns directive. We pass it to the constructor of the new simple_columns_generator instance returned from the factory function. Earlier we saw how this instance is stored and used by the columns generator.
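Tag dispatch through a specialized function object can also be sketched independently of Spirit. The following is a hypothetical model that only mimics the shape of make_directive: unused_type, tag::columns, simple_columns_generator and int_gen are local stand-ins. The primary template is left undefined, so requesting a factory for an unknown tag fails at compile time.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical model of a tag-dispatched factory; it only mirrors the
// shape of Spirit's make_directive and is not Spirit's code.
struct unused_type {};

namespace tag { struct columns {}; }

template <typename Subject>
struct simple_columns_generator
{
    explicit simple_columns_generator(Subject const& s) : subject(s) {}
    Subject subject;   // copy of the embedded generator
};

// Primary template left undefined: unknown tags fail to compile.
template <typename Tag, typename Subject, typename Modifiers>
struct make_directive;

// Specialization selected for the columns tag.
template <typename Subject, typename Modifiers>
struct make_directive<tag::columns, Subject, Modifiers>
{
    typedef simple_columns_generator<Subject> result_type;

    // Placeholder and modifiers are ignored; only the subject matters.
    result_type operator()(unused_type, Subject const& s, unused_type) const
    {
        return result_type(s);
    }
};

struct int_gen { int id; };   // hypothetical embedded generator
```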

Enabling proper Auto-Rule Behavior

The last step required for creating a new directive is to specialize yet another customization point. To understand why, you need some more background information. Spirit’s rule<> non-terminals have built-in capabilities for proper attribute propagation to and from the right-hand-side expression they are initialized from (we call this auto-rule behavior). This attribute propagation is guaranteed to work only as long as no semantic actions are attached to any of the right-hand side’s sub-expressions. Specializing the template has_semantic_action allows the library to detect whether such semantic actions are present. Our code just uses built-in facilities to forward the detection to the embedded (subject) generator.

namespace boost { namespace spirit { namespace traits
{
    template <typename Subject>
    struct has_semantic_action<
            custom_generator::simple_columns_generator<Subject> >
      : unary_has_semantic_action<Subject> {};
}}}
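The forwarding performed by this specialization can be modelled with a few lines of standard C++. In this hypothetical sketch, action_gen stands in for a generator with an attached semantic action, and the specialization for simple_columns_generator simply inherits the answer from its subject, which is the effect the code above achieves with unary_has_semantic_action.

```cpp
#include <type_traits>

// Hypothetical, Boost-free model of the has_semantic_action trait.
template <typename Component>
struct has_semantic_action : std::false_type {};

// Hypothetical components: a plain generator and one carrying an action.
struct int_gen {};
struct print_action {};

template <typename Subject, typename Action>
struct action_gen { Subject subject; Action f; };

// A component with an attached action reports true.
template <typename Subject, typename Action>
struct has_semantic_action<action_gen<Subject, Action> > : std::true_type {};

template <typename Subject>
struct simple_columns_generator { Subject subject; };

// The directive itself has no action; it just asks its subject.
template <typename Subject>
struct has_semantic_action<simple_columns_generator<Subject> >
  : has_semantic_action<Subject> {};
```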

Using the New columns Component

Last but not least we will have a look at a small example demonstrating the usage of the new columns component. The following code generates exactly the output shown in the very beginning of this article:

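#include <boost/spirit/include/karma.hpp>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
// ... followed by the custom_generator code shown above
namespace karma = boost::spirit::karma;
int main()
{
    std::vector<int> v;
    for (int i = 1; i < 10; ++i)
        v.push_back(i);

    std::string generated;
    std::back_insert_iterator<std::string> sink(generated);

    karma::generate_delimited(sink
      , custom_generator::columns[*karma::int_]    // formatting directive
      , karma::space                               // outer delimiter
      , v);                                        // data to output
    std::cout << generated;
}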

Conclusion

There is not much to add. Perhaps one general observation: Spirit is built around customization points. These are templates to be specialized in order to add functionality. Usually the core library provides the primary template, which exposes some default behavior. As all predefined Spirit components use the customization points of the core library as well, users are free to extend the library to adapt it to their needs. All this became possible only after C++ compilers started to properly support partial template specialization, so you won’t be able to use Spirit with older or non-conforming compilers (see here for a list of supported compilers).

All of the code shown above can be downloaded from the Boost SVN here (and the example code is here). As already mentioned, starting with the next release Karma will provide a more complex columns directive, but this small example will stay around as well.

One Response to “Creating Your Own Generator Component for Spirit.Karma”

  1. Ilya says:

    Hi,

    I’m trying to implement a so-called “indent” directive (for formatted XML, JSON output, etc.) and am facing two problems:

    1) I have to intercept every ‘\n’ passed to the sink in order to generate indentation. It looks like this needs support in output_iterator, similar to the counting policy.
    2) This directive should support nesting (i.e. nested directives should increase indentation level), for example a JSON like generator:

    array = '[' << indent(4)[-(value % ",")] << ']';
    rule<OutputIterator> value = double_ | int_ | bool_ | string_ | array;
    

    But it looks like this will require modification of output_iterator.
    Is it possible to implement without modifications of output_iterator?

    P.S.: This library is so great! I really love it!
