April 16, 2011

The keyword parser construct has recently been added to Spirit’s repository (available in Boost 1.47 or from SVN). Here’s a short introduction to help you get started with the keyword parsers.

Those of you familiar with the Nabialek trick will recognize it working under the hood. Anything you can do with the keyword parser can also be done with the Nabialek trick, though not always as elegantly or as efficiently.

The two examples presented below are included in the Spirit repository and can be found in the folder:

libs/spirit/repository/example/qi

Data members marked by keywords (options.cpp)

For this small introduction we’ll consider parsing a program command line.

Options are commonly passed to applications delimited by option keywords:


mySuperCompiler --include includePath --define newSymbol=10 --output output.txt --define newSymbol2=20 --source mySourceFile

The order in which the options are specified doesn’t matter at all. The task of the parser we are going to write is to extract the individual options into some internal data structure we will use to control the program.

Here are the structures we could use to hold the options passed on our command line:

#include <cstdint>
#include <string>
#include <utility>
#include <vector>
#include <boost/optional.hpp>

// A basic preprocessor symbol: name and value
typedef std::pair<std::string, std::int32_t> preprocessor_symbol;

struct program_options {
   // symbol container type definition
   typedef std::vector< preprocessor_symbol > preprocessor_symbols_container;
   // include paths
   std::vector<std::string> includes;
   // preprocessor symbols
   preprocessor_symbols_container preprocessor_symbols;
   // output file name (optional)
   boost::optional<std::string> output_filename;
   // input file name
   std::string source_filename;
};

Of course, the structures are Fusion-adapted so that the parsed data can be pulled into them easily.

Now let’s define our options rule:


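using namespace boost::spirit::qi;         // rule, lit, int_, attr, lexeme, char_, space, space_type
using boost::spirit::repository::qi::kwd;  // the repository's keyword directive

// parse_string (assumed definition): one whitespace-free token that stops at '='
rule<const char *, std::string(), space_type> parse_string;
parse_string %= lexeme[ +(char_ - space - '=') ];

rule<const char *, program_options(), space_type> kwd_rule;

kwd_rule %= kwd("--include")[
                parse_string
            ]
          / kwd("--define") [
                parse_string
                >> (
                    (lit('=') > int_) | attr(1)   // value defaults to 1
                   )
            ]
          / kwd("--output",0,1)[
                parse_string
            ]
          / kwd("--source",1)[
                parse_string
            ]
          ;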

The first thing to notice here is that we used the %= operator. It tells Spirit to propagate the attribute of the right-hand side directly into the rule’s attribute, program_options, which only works because the keyword block’s attribute is compatible with our adapted structure!

This is one spot where the keyword parsing construct surpasses the Nabialek trick. The Nabialek trick just can’t do that.

On the next lines we define our keyword parsing constructs. Writing

kwd("--include")[ parse_string ] 

is, for each occurrence of the keyword, equivalent to writing:

lit("--include") > parse_string

The word “--include” must be followed by a string.

kwd directives are combined with the / operator. The kwd directive and the / operator work closely together to achieve attribute compatibility while using the Nabialek trick internally.
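The Nabialek trick boils down to a table that maps each keyword to the parser to run next, consulted in a loop. To illustrate only that idea, here is a plain C++ sketch over a command line that has already been split into tokens. It is not Spirit’s implementation; `options`, `parse_command_line` and the token-vector input are hypothetical, and std::optional (C++17) stands in for boost::optional:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plain C++ counterpart of program_options.
struct options {
    std::vector<std::string> includes;
    std::vector<std::pair<std::string, int>> symbols;
    std::optional<std::string> output;
    std::string source;
};

// Keyword dispatch loop: each keyword is looked up in a table and the handler
// registered for it consumes the following token, whatever the keyword order.
bool parse_command_line(const std::vector<std::string>& tokens, options& opts) {
    using handler = std::function<void(const std::string&)>;
    const std::map<std::string, handler> table = {
        {"--include", [&](const std::string& v) { opts.includes.push_back(v); }},
        {"--define", [&](const std::string& v) {
             // "name=value", or just "name" with a default value of 1
             // (std::stoi throws on a malformed number; fine for a sketch)
             auto eq = v.find('=');
             if (eq == std::string::npos) opts.symbols.emplace_back(v, 1);
             else opts.symbols.emplace_back(v.substr(0, eq), std::stoi(v.substr(eq + 1)));
         }},
        {"--output", [&](const std::string& v) { opts.output = v; }},
        {"--source", [&](const std::string& v) { opts.source = v; }},
    };
    for (std::size_t i = 0; i < tokens.size(); i += 2) {
        auto it = table.find(tokens[i]);
        if (it == table.end() || i + 1 >= tokens.size())
            return false;  // unknown keyword or missing value
        it->second(tokens[i + 1]);
    }
    return true;
}
```

Every keyword here may appear any number of times; counting occurrences is a separate concern.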

One last thing to notice is the occurrence constraints that can be attached to a kwd directive. They work like those of the repeat directive and add validation checks inside the keyword parsing loop.

Writing

kwd("--output",0,1)[ parse_string ] 

means that the keyword “--output” may occur at most once (0 or 1 times). If it occurs more than once, the parser fails.

Writing

kwd("--source",1)[ parse_string ] 

means that the keyword “--source” must occur once and only once. This works just like the repeat directive.

Occurrence constraints have little runtime cost and make it easy to enforce rules that would otherwise be much harder to express.
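Occurrence constraints amount to counting keywords: an upper bound can be enforced as soon as a keyword is matched, a lower bound only once the loop has finished. A plain C++ sketch of that bookkeeping, using the constraints of the options example; `constraint`, `check_occurrences` and `command_line_rules` are hypothetical names, and this is not Spirit’s internal code:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Hypothetical min/max bounds, mirroring kwd(key), kwd(key,n) and kwd(key,min,max).
struct constraint {
    std::size_t min = 0;
    std::size_t max = std::numeric_limits<std::size_t>::max();
};

// Constraints of the options rule.
std::map<std::string, constraint> command_line_rules() {
    return {
        {"--include", {}},     // kwd("--include"): any number of times
        {"--define", {}},      // kwd("--define"): any number of times
        {"--output", {0, 1}},  // kwd("--output",0,1): at most once
        {"--source", {1, 1}},  // kwd("--source",1): exactly once
    };
}

// Fails as soon as a keyword exceeds its maximum (or is unknown), and after
// the loop if any keyword was seen fewer times than its minimum.
bool check_occurrences(const std::vector<std::string>& keywords,
                       const std::map<std::string, constraint>& rules) {
    std::map<std::string, std::size_t> seen;
    for (const auto& k : keywords) {
        auto r = rules.find(k);
        if (r == rules.end()) return false;
        if (++seen[k] > r->second.max) return false;  // e.g. a second --output
    }
    for (const auto& [key, c] : rules)
        if (seen[key] < c.min) return false;          // e.g. --source missing
    return true;
}
```

The early failure on the upper bound is what keeps the check cheap: no extra pass is needed while parsing.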

The kwd directive also exists in a case-insensitive variant, ikwd. You can mix kwd and ikwd freely inside the same keyword block, at the cost of a small runtime overhead.

Derived structures (derived.cpp)

A recent post on the mailing list gave me the idea of showing how the keyword parser can produce different derived structures depending on the keywords in the input.

Here’s the problem as described by MM:

“I have a case where I have a prefix string that will distinguish what will follow it.

prefix string - struct members

this is what is read from the input stream. I have a base struct and 5 derived D1..D5, each derived has a different prefix as a static const std::string member. Parsing the prefix string tells me which struct D1..D5 I should parse after. All these derived structs are fusion adapted. There is a rule for each of the derived.”

To keep the example simple, here are the classes we could consider:


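#include <ostream>
#include <string>

struct base_type {
    base_type(const std::string &name) : name(name)  {}
    // virtual destructor: the objects are later deleted through base_type pointers
    virtual ~base_type() {}

    std::string name;
    virtual std::ostream &output(std::ostream &os) const {
        os<<"Base : "<<name;
        return os;
    }
};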

struct derived1 : public base_type {
    derived1(const std::string &name, unsigned int data1) :
        base_type(name)
      , data1(data1)  {}

    unsigned int data1;
    virtual std::ostream &output(std::ostream &os) const {
        base_type::output(os);
        os<<", "<<data1;
        return os;
    }
};

struct derived2 : public base_type {
    derived2(const std::string &name, unsigned int data2) :
        base_type(name), data2(data2)  {}

    unsigned int data2;
    virtual std::ostream &output(std::ostream &os) const    {
        base_type::output(os);
        os<<", "<<data2;
        return os;
    }
};

struct derived3 : public derived2 {
    derived3(const std::string &name, unsigned int data2, double data3) :
      derived2(name,data2)
    , data3(data3)
    {}

    double data3;
    virtual std::ostream &output(std::ostream &os) const    {
        derived2::output(os);
        os<<", "<<data3;
        return os;
    }
};

Our parse result must be a vector of pointers to our base class:

std::vector<base_type*>

To get that done, we’ll use semantic actions inside the kwd directive:


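namespace phx = boost::phoenix;
// parse_string is assumed here to parse a double-quoted string such as "object1"
rule<const char *, std::vector<base_type*>(), space_type> kwd_rule;

kwd_rule = kwd("derived1")[
              ('=' > parse_string > int_ )
              [phx::push_back(_val,phx::new_<derived1>(_1,_2))]
           ]
         / kwd("derived2")[
              ('=' > parse_string > int_ )
              [phx::push_back(_val,phx::new_<derived2>(_1,_2))]
           ]
         / kwd("derived3")[
              ('=' > parse_string > int_ > double_)
              [phx::push_back(_val,phx::new_<derived3>(_1,_2,_3))]
           ]
           ;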

This rule constructs new derived objects and appends them to our result vector during parsing. The input parsed by this construct is of the form:

 derived2 = "object1" 10 derived3= "object2" 40 20.0 
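Outside Spirit, “the keyword picks the class to build” is a factory table. The plain C++ sketch below (not what the Spirit rule expands to) parses input of the form shown above with std::istringstream and std::quoted (C++14). It uses trimmed copies of the classes above (output() omitted, virtual destructor added) and stores std::unique_ptr<base_type> instead of raw pointers so the objects are released automatically; `parse_objects`, `factory` and `object_ptr` are hypothetical names:

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <iomanip>
#include <istream>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct base_type {
    explicit base_type(const std::string& name) : name(name) {}
    virtual ~base_type() = default;
    std::string name;
};
struct derived1 : base_type {
    derived1(const std::string& n, unsigned int d1) : base_type(n), data1(d1) {}
    unsigned int data1;
};
struct derived2 : base_type {
    derived2(const std::string& n, unsigned int d2) : base_type(n), data2(d2) {}
    unsigned int data2;
};
struct derived3 : derived2 {
    derived3(const std::string& n, unsigned int d2, double d3) : derived2(n, d2), data3(d3) {}
    double data3;
};

using object_ptr = std::unique_ptr<base_type>;
// Reads the members that follow the quoted name and builds the matching object.
using factory = std::function<object_ptr(std::istream&, const std::string&)>;

std::vector<object_ptr> parse_objects(const std::string& input) {
    const std::map<std::string, factory> factories = {
        {"derived1", [](std::istream& is, const std::string& name) -> object_ptr {
             unsigned int d1;
             if (!(is >> d1)) return nullptr;
             return std::make_unique<derived1>(name, d1);
         }},
        {"derived2", [](std::istream& is, const std::string& name) -> object_ptr {
             unsigned int d2;
             if (!(is >> d2)) return nullptr;
             return std::make_unique<derived2>(name, d2);
         }},
        {"derived3", [](std::istream& is, const std::string& name) -> object_ptr {
             unsigned int d2;
             double d3;
             if (!(is >> d2 >> d3)) return nullptr;
             return std::make_unique<derived3>(name, d2, d3);
         }},
    };

    std::vector<object_ptr> result;
    std::istringstream is(input);
    while (is >> std::ws && !is.eof()) {
        // Keyword: a run of letters and digits, so "derived3=" works too.
        std::string keyword;
        while (std::isalnum(is.peek())) keyword += static_cast<char>(is.get());

        auto f = factories.find(keyword);
        char eq = 0;
        std::string name;
        if (f == factories.end() || !(is >> eq) || eq != '=' || !(is >> std::quoted(name)))
            return {};  // unknown keyword or malformed entry: reject the whole input
        object_ptr obj = f->second(is, name);
        if (!obj) return {};
        result.push_back(std::move(obj));
    }
    return result;
}
```

An unknown keyword or a malformed entry makes parse_objects return an empty vector, mirroring a failed parse.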

Keywords vs Nabialek trick

Here’s a small table to compare the features of the keyword parsing constructs and the Nabialek trick to help you decide which solution better suits your needs.

Feature                                     Nabialek trick               Keywords parser
Attribute propagation                       no                           yes
Runtime modification of the keyword set     yes                          no
Occurrence constraints                      not easily implemented       yes
Maximum number of keywords                  available runtime memory     BOOST_VARIANT_LIMIT_TYPES

The keyword parsing construct saves a lot of typing compared with the Nabialek trick and in many cases even performs better. It also makes retrieving the parsed data into usable program structures much easier, since it supports attribute propagation. The main limitation of the keyword parser is the number of keywords a keyword block may contain (limited by the maximum number of variant types, BOOST_VARIANT_LIMIT_TYPES).

2 Responses to “The Keyword parser”

  1. anders li says:

    Very good features !

  2. Krzyszof says:

    I wonder how kwd and operator “/” can be combined with “>>”.

    Suppose one wants to parse:
    item "name1" ( x 1 y 2 )
    item "name2" ( y 3 x 4 )

    Should it be something like:
    item_rule %= lit("item") >> parse_name >> '(' >> kwd("x")[int_] / kwd("y")[int_] >> ')';

    this fails for me. Adding brackets does not help.
