Feb ’10 17

Recently, there have been a couple of questions on the Spirit mailing list asking how to parse as set of things known in advance in any sequence and any combination. A simple example would be a list of key/value pairs with known keys but the keys may be ordered in any sequence. This use case seems to be quite common. Fortunately Spirit provides you with a predefined parser component designed for exactly that purpose: the permutation parser.

Spirit’s permutation parser a ^ b matches either a, b, a >> b, or b >> a, where a and b can be arbitrary parser expressions. Just like normal sequences this operator can be utilized to combine more than two operands. For instance, the expression a ^ b ^ c will match a or b or c (or an combination thereof) in any sequence. The attribute propagation rule for the permutation parser is

a: A, b: B --> (a ^ b): tuple<optional<A>, optional<B> >

As usual, if one or more operand of the expression do not expose any attribute (expose unused_type as their attribute, which is equivalent), this operand disappears from attribute handling:

a: A, b: Unused --> (a ^ b): optional<A>;

The permutation parser works out of the box whenever you do not require to match all of the elements in the input. But what if you want strict permutation (operands get matched exactly once)? You have two possibilities, as often, one simple and less versatile and one more complex but universally applicable solution. The simple solution is to parse the input and to check afterward whether all optionals in the resulting attribute have been filled. I will leave that solution as an exercise for the reader.

If we assume the attribute to be a (Fusion) tuple of optionals, containing one optional for each of the parser components in the permutation parser we can write the following code (thanks to Carl Barron for the initial idea).

This code defines a Phoenix function (a lazy function encapsulating some custom functionality) checking whether one or more of the optionals in a given Fusion sequence are empty. The Fusion algorithm find_if iterates over the given sequence of optionals, invoking the option_empty::operator() for each of the elements. fusion::find_if stops iterating on the first invocation returning true and returns the iterator to the element it stopped on. This is very similar to the well known std::find_if algorithm.

namespace phoenix = boost::phoenix;
namespace fusion = boost::fusion;
namespace qi = boost::spirit::qi;

class no_empties_impl
{
    // helper function object to be invoked by fusion::find_if
    struct optional_empty
    {
        template <typename T>
        bool operator ()(T const& val) const
        {
            return !val;  // return true if 'val' is empty.
        }
    };

public:
    template <typename T>
    struct result { typedef bool type; };

    // This operator will get called from the semantic action attached
    // to the permutation parser. The parameter refers to its overall
    // attribute: the fusion tuple of optionals.
    template <typename T>
    bool operator ()(T const& t) const
    {
        // look for an empty optional, if any return false.
        return fusion::find_if<optional_empty>(t) ==
               fusion::end(t);
    }
};

// define the Phoenix function
phoenix::function<no_empties_impl> const no_empties = no_empties_impl();

The overall Phoenix function no_empties will return false if we found at least one non-initialized optional in the passed sequence. The following code snippet illustrates how everything fits together:

std::string input ("BCA");
std::string::const_iterator begin = input.begin();
std::string::const_iterator end = input.end();
qi::parse(begin, end,
    (qi::char_('A') ^ 'B' ^ 'C')[qi::_pass = no_empties(qi::_0)]);

We assign the result of the invocation of no_empties to Qi’s predefined placeholder _pass. If we assign false, then the parser the semantic action is attached to will be forced to fail in retrospective (even if it matched the input successfully before). As a result the overall parser expression will succeed as long as a) the permutation parser matches its input and b) the Phoenix function inside the semantic action returns true.

For more information about the permutation parser please consult its documentation here. Overall, this example is a bit more complex than the average parser you might usually write. It utilizes three libraries: Spirit, Phoenix, and Fusion in a seamless manner. But for sure, once you understand the idea, it will be easier for you to come up with similar solutions. Spirit has been designed with Phoenix and Fusion in mind, and in fact it relies on Fusion heavily itself. As a result, the integration of those libraries is almost perfect.

13 Responses to “Parsing Arbitrary Things in Any Sequence”

  1. OvermindDL1 says:

    And of course, certainly there is some way to use symbols with rules stuffed in for much larger groupings too. :)

  2. Hicham says:

    This has been working beautifully for me since the initial posts by Carl Barron and ? I put the no_empties phoenix function as a member of my grammar.

    Perhaps there should be an equivalent to the ^ operator which does this by default, ie parsing will fail if not all of the operands are there ?

    • Hartmut Kaiser says:

      I’m not sure about that. Such an implementation would probably do exacty what’s described here (as we don’t have any operators available anymore). What do others think about this? Any syntax related suggestions?

      Regards Hartmut

  3. Olaf says:

    if I want to compile this (even with headers, main etc.) I’ve got an compile error
    no matching function for call to find_if(const boost::fusion::vector1<boost::optional >&)

    Is the shown syntax correct?

    Thanks,
    Olaf

  4. Olaf says:

    with fusion::all(t, optional_empty()) works

    • Patrick says:

      Hi Olaf,

      can you please be a little bit more precise? I got the same error, but I am not able to fix it with your hint. Can you post a source that compiles?

      Thanks a lot
      Patrick

  5. Aaron says:

    I’ve come to the conclusion that the permutation parser isn’t very useful when picking out expressions from an unbounded large set. For instance, when parsing HTTP headers:

    Host: 1.1.1.1:80
    Connection: close
    Content-Length: 1234
    

    …via this rule:

    rule = connection_header
      ^ content_length_header
      ^ host_header
      ^ omit[+print >> eol] // catch-all
      ;
    

    …will not behave as expected. It will match the ‘Host’ header and the ‘Content-Length’ header, but the ‘Connection’ header will get eaten up by the catch-all. I don’t know if this is a bug in the spirit implementation (I’m using 2.3) or if it’s intentional, but it makes the permutation parser useless in those cases, especially if you’re trying to fuse/adapt a rule to a struct. Unfortunately, in my experience, almost all of the cases where the permutation parser would otherwise come into use are of this variety, which means I rarely end up using the ^ operator, and I have to take a messy roundabout strategy with lots of explicit semantic actions.

  6. Robert Nelson says:

    I think a strict parser would be very helpful. Couldn’t you use the binary + operator, for example. It seems that strict permutations are very common, and it would be helpful to not have to manually use phoenix functions and have unnecessarily complicated data types; we should not have to use boost::optional for items that are required. It should also always match items that are first in the list first so that it is possible to have a catch all at the end of the permutation list. I am not sure if the current implementation does this. If operators are not available, a library function that takes variable arguments like strict_permutation(“a”,”b”,”c”) would be good.

    You should also be able to mix required and non required elements. For example a + b + -c would match “ab” “ba” “bac” …
    a + b + *char_ would match all strings that included a and b, but you could have different semantic actions for a and b.
    This would be really useful for parsing key-value pairs where a couple pairs are required, and there are optional user defined pairs.

    You could write the function or operator like this:
    rule operator + (rule a, rule b) {return (a << b) | (b << a);}

    A strict permutation operator or function as described above would be incredible useful. It would result in cleaner looking code, and work in complicated situations that would be hard to implement using phoenix operators. Do you think you could implement one in a future release?

    Thanks
    –Robert Nelson

  7. hicham says:

    Has there been any followup on the mailing list?

    I have come to a 2nd case where I’d like to use a “operator” for strict permutation where all the operands are required to appear exactly once.
    It was mentioned there were no more c++ operators available for this?

    Can a function be used instead?

Leave a Reply

preload preload preload