Nov 18

Output Generation from a List of Key-Value Pairs Using Spirit.Karma

By Hartmut Kaiser Add comments

Even if it feels a bit like cheating I decided to get back to the example highlighting lists of key/value pairs I initially wrote about here. On the other hand looking at this very same problem from the standpoint of generating output allows me to highlight a couple a interesting facilities Spirit provides you with.

A Few Words up Front

Starting with V2.1 (released with Boost V1.41) Spirit is more than a parser generator library. It now contains two separate, but nevertheless tightly connected sub-libraries: Spirit.Qi – for parsing and Spirit.Karma – for output generation. Understanding Spirit.Karma is not hard at all if you already know Spirit.Qi. The two sub-libraries are like Yin and Yang. Everything you have learnt so far about Qi’s parsers is equally true for Karma’s generators, with the exception that this knowledge has to be applied ‘inside out’.

  • Qi is all about parsing byte sequences into internal data structures, Karma is about generating byte sequences from those.
  • If Qi gets its input from input iterators, then Karma outputs the generated data using an output iterator.
  • If Qi uses overloaded versions of the ‘>>’ do denote sequences of things, Karma uses the ‘<<’.
  • Qi’s semantic actions receive a value from the parser they are attached to; but Karma’s semantic actions are providing values to their generators.
  • If Qi’s parser attributes are passed up the parser hierarchy to fill the data structures supplied by the user, then Karma’s attributes are passed down the generator hierarchy, extracting the values the user wants to generate output from.

This duality allowed us to develop Qi and Karma using a common library infrastructure and common implementation techniques, and it makes it easier for the user to work with either of the libraries. Most importantly, Qi and Karma share a common syntax, exposing a common Domain Specific Embedded Language (based on Parsing Expression Grammars – PEG) formed by using C++ operator overloading.

The Actual Output Generation

Let’s get back to the problem at hand: generating output from a list of key/value pairs. Again, we start with writing the grammar, which in this case describes the output format for the URL query string: key1=value1&key2=value2&…&keyN=valueN.

query → pair ('&' pair)*
pair  → key ('=' value)?
key   → [a-zA-Z_][a-zA-Z_0-9]*
value → [a-zA-Z_0-9]+

This is almost identical to the grammar we used to derive the parser solution. We slightly modified the usual PEG syntax by changing the ← to the → to denote the top down nature of output generation, but otherwise the notation is unchanged. A query should be printed as a list of at least one pair, where these have to be separated by ‘&’. Each pair should be formatted as a mandatory key, optionally followed by a ‘=’ and the value, where both, key and value have defined some constraints on what characters are allowed.

After you have seen how any PEG can be converted into a Qi parser here, you probably will not be surprised to see a similar conversion to be possible for output formats.

query =  pair << *('&' << pair);
pair  =  karma::string << -('=' << karma::string);

In addition to the usual conversion rules, for now we applied some simplifications, namely removing the constraints on the key and value formats (we will re-introduce this later). This enables us to use the predefined generator component karma::string, a placeholder expression to output any (non-empty) sequence of characters.

Now, as we have the grammar, let’s talk about attributes! Attributes in Qi are the types and values we get as the result of converting the input. Attributes in Karma are the types and values we want to generate output from. We assume the list of key/value pairs to be stored in a std::vector, while the pairs themselves are stored in a std::pair. The only slight difficulty comes from the optional nature of the stored values. As the value may be given or not we store it in a boost::optional<std::string>, leaving it uninitialized if no value is present:

typedef std::pair<std::string, boost::optional<std::string> > pair_type;

karma::rule<OutputIterator, std::vector<pair_type>()> query;
karma::rule<OutputIterator, pair_type()> pair;

Generator non-terminals encapsulate a format description for a particular data type, and, whenever we need to emit output for this data type, the corresponding non-terminal is invoked in a similar way as the predefined Karma generator primitives. The Karma non-terminals are very similar to the Qi non-terminals. The main difference is that they do not expose a synthesized attribute (as parsers do), but they require a special consumed attribute. Even if the consumed attribute is not ‘returned’ from the generator (as the synthesized parser attributes in Qi) we chose to utilize the same function style declaration syntax as used in Qi. For the rule definitions above we need to provide at least the output iterator type of the output data sink (OutputIterator) and the consumed attribute type, which is the type of the data the rule is supposed to generate output from.

The encapsulation of the whole grammar is straightforward when using karma::grammar<>:

namespace karma = boost::spirit::karma;

template <typename OutputIterator>
struct keys_and_values
  : karma::grammar<OutputIterator, std::vector<pair_type>()>
{
    keys_and_values()
      : keys_and_values::base_type(query)
    {
        query =  pair << *('&' << pair);
        pair  =  karma::string << -('=' << karma::string);
    }
    karma::rule<OutputIterator, std::vector<pair_type>()> query;
    karma::rule<OutputIterator, pair_type()> pair;
};

Deriving your type from karma::grammar<> creates a new Karma generator representing the enclosed format description. The base class is initialized with the start rule (query), which is the top most rule of the grammar to be executed when the grammar is invoked. The template parameter OutputIterator makes the grammar usable with any underlying output stream chosen.

The following code snippet concludes the example by demonstrating how to invoke our grammar:

typedef std::back_insert_iterator<std::string> sink_type;
typedef boost::optional<std::string> value_type;

std::vector<pair_type> v;                    // vector containing data
v.push_back(pair_type("key1", value_type("value1")));
v.push_back(pair_type("key2", value_type()));
v.push_back(pair_type("key3", value_type("")));

std::string generated;
sink_type sink(generated);
keys_and_values<sink_type> g;                // create instance of
                                             // generator

bool result = karma::generate(sink, g, v);   // returns true if
                                             // successful

// ‘generated’ will hold: "key1=value1&key2&key3="

The function karma::generate() is another of Spirit’s main API functions. In the simplest case it takes an output iterator representing the output target to send the output to (sink), an instance of the generator to invoke (g), and the attribute instance holding the data (v). This function executes the actual generator operation and returns true if it was successful.

Karma has well defined attribute propagation and splitting rules along the same lines as Qi does. These rules allow for the code above to actually work as expected. The Kleene Star splits the supplied (STL compatible) container and passes its elements to the embedded generator, one at a time. That’s the reason why we use the value_type of the vector as the attribute type of the rule pair. The sequence on the right hand side of pair matches the member elements of its attribute to the corresponding sequence parts, which naturally splits off the std::pair. It is interesting to note that the optional part of this rule, -(‘=’ << karma::string), includes the ‘=’, which allows us to skip generating the ‘=’ if the value is not initialized (empty).

In the beginning I promised to come back to the issue of adding constraints to the keys and values generated. Assuming we want to apply the constraints as listed in the initial PEG grammar we could write:

karma::rule<OutputIterator, std::string()> key, value;
key   =  karma::char_("a-zA-Z_") << *karma::char_("a-zA-Z_0-9");
value = +karma::char_("a-zA-Z_0-9");

Additionally we would have to change the format description for pair:

pair  =  key << -('=' << value);

These changes alone will force the generator to fail if either the key or the value contained ‘illegal’ characters. This works because Karma generators check their attribute against the immediate value they are initialized from, while failing if those do not match.

Concluding Remarks

A last, more general note. Comparing parsing of lists of key/value pairs with how these can be generated reveals something perhaps unexpected. The grammars used are almost identical! But if you think about this it probably will get obvious. This similarity has been maintained based on the idea that a grammar usable to describe the expected structure of some input may as well be used to formalize the structure of a correspondingly generated output. Parsing, most of the time, is implemented by comparing the input with the patterns defined by the grammar. If we use the same patterns to format a matching output, the generated sequence will follow the rules of the grammar as well.

If you want to try out this example for yourself, the complete source code is available from the Boost SVN here. In the future this example will be distributed as part of the Spirit distribution, but for now it lives in the SVN only.

Leave a Reply

preload preload preload