Confix Parsers

Confix Parsers recognize a sequence of three independent elements: an opening, an expression, and a closing. A simple example is a C comment:

    /* This is a C comment */

which can be parsed with the following rule definition:

    rule<> c_comment_rule
        =   confix_p("/*", *anychar_p, "*/")
        ;

The confix_p parser generator is used to generate the required Confix Parser. Its three parameters can be single characters, strings (as above) or, if more complex parsing logic is required, auxiliary parsers; each of them is automatically converted to the parser type needed for successful parsing.

The generated parser is equivalent to the following rule:

    open >> (expr - close) >> close
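The following plain C++ sketch models this shape without depending on Spirit, so the types and helpers in it (Parser, lit, word, seq, diff, confix) are hypothetical stand-ins rather than Spirit's API. A parser is modeled as a function from an input and a start position to the position just past its match. The diff helper assumes the behavior Spirit Classic describes for a - b: a's match is kept unless b matches at the same position with at least the same length.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <optional>
#include <string_view>

// Hypothetical mini parser model (not Spirit): a parser receives the input and
// a start position and returns the position just past its match, or nullopt.
using Parser = std::function<std::optional<std::size_t>(std::string_view, std::size_t)>;

// Matches a fixed string, like a string literal parser.
Parser lit(std::string_view s) {
    return [s](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (in.substr(pos, s.size()) == s) return pos + s.size();
        return std::nullopt;
    };
}

// Matches one or more ASCII letters.
Parser word() {
    return [](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        std::size_t end = pos;
        while (end < in.size() && std::isalpha(static_cast<unsigned char>(in[end]))) ++end;
        if (end == pos) return std::nullopt;
        return end;
    };
}

// Sequence a >> b.
Parser seq(Parser a, Parser b) {
    return [a, b](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (auto p = a(in, pos)) return b(in, *p);
        return std::nullopt;
    };
}

// Difference a - b: keep a's match unless b matches at the same position
// with a length at least as long as a's match (assumed Spirit semantics).
Parser diff(Parser a, Parser b) {
    return [a, b](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        auto ra = a(in, pos);
        if (!ra) return std::nullopt;
        auto rb = b(in, pos);
        if (!rb || *rb < *ra) return ra;
        return std::nullopt;
    };
}

// The generic confix shape: open >> (expr - close) >> close.
Parser confix(Parser open, Parser expr, Parser close) {
    return seq(seq(open, diff(expr, close)), close);
}
```

With these helpers, diff(word(), lit("end")) rejects the input "end", since both sides match three characters, but accepts all seven characters of "endless"; confix(lit("<"), word(), lit(">")) matches the whole of "<abc>".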

If the expr parser is an action_parser_category type parser (a parser with an attached semantic action), special handling is required. This happens if the user writes something like:

    confix_p(open, expr[func], close)

where expr is the parser matching the expression part of the confix sequence and func is a functor to be called after expr has been matched. Without special handling, the resulting code would parse the sequence as follows:

    open >> (expr[func] - close) >> close

In most cases this is not what the user expects. (If it is what you expect, use the confix_p generator function direct(), i.e. confix_p.direct(open, expr[func], close), which inhibits the parser refactoring.) To make the confix parser behave as expected:

    open >> (expr - close)[func] >> close

the action attached to the expr parser has to be re-attached to the (expr - close) construct, which makes the resulting confix parser do the right thing. This refactoring is done with the help of the Refactoring Parsers. Additionally, special care must be taken if the expr parser is a unary_parser_category type parser, as in

    confix_p(open, *anychar_p, close)

which without any refactoring would result in

    open >> (*anychar_p - close) >> close

and would not give the expected result: *anychar_p consumes all the input up to the end of the input stream, leaving nothing for the final close to match. It therefore has to be refactored into:

    open >> *(anychar_p - close) >> close

which gives the correct result, because the repetition now stops as soon as close matches.
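The difference between the two forms can be reproduced with a small plain C++ model. It does not use Spirit: Parser, lit, anychar, seq, diff, kleene, naive_confix and refactored_confix are hypothetical helpers, where diff assumes Spirit Classic's rule for a - b (keep a's match unless b matches at the same position with at least the same length) and kleene is a greedy repetition like *p.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string_view>

// Hypothetical mini parser model (not Spirit): returns the position just past
// the match, or std::nullopt when the parser does not match.
using Parser = std::function<std::optional<std::size_t>(std::string_view, std::size_t)>;

Parser lit(std::string_view s) {
    return [s](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (in.substr(pos, s.size()) == s) return pos + s.size();
        return std::nullopt;
    };
}

// Any single character, like anychar_p.
Parser anychar() {
    return [](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (pos < in.size()) return pos + 1;
        return std::nullopt;
    };
}

Parser seq(Parser a, Parser b) {
    return [a, b](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (auto p = a(in, pos)) return b(in, *p);
        return std::nullopt;
    };
}

// a - b with the assumed length rule.
Parser diff(Parser a, Parser b) {
    return [a, b](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        auto ra = a(in, pos);
        if (!ra) return std::nullopt;
        auto rb = b(in, pos);
        if (!rb || *rb < *ra) return ra;
        return std::nullopt;
    };
}

// Greedy repetition *p: match p as often as possible, zero times included.
Parser kleene(Parser p) {
    return [p](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        while (auto next = p(in, pos)) {
            if (*next == pos) break;  // stop on an empty match to avoid looping forever
            pos = *next;
        }
        return pos;
    };
}

// Unrefactored: open >> (*anychar_p - close) >> close
Parser naive_confix(Parser open, Parser close) {
    return seq(seq(open, diff(kleene(anychar()), close)), close);
}

// Refactored: open >> *(anychar_p - close) >> close
Parser refactored_confix(Parser open, Parser close) {
    return seq(seq(open, kleene(diff(anychar(), close))), close);
}
```

On "/* hi */" the first form fails, because the repetition has already consumed the closing "*/" by the time the trailing close is tried; the refactored form matches all eight characters. On "/* a */ b */" the refactored form stops at the first "*/", after seven characters.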

The case where the expr parser combines both problems (i.e. it is a unary parser with an attached action) is handled accordingly, so:

    confix_p(open, (*anychar_p)[func], close)

will be parsed as expected:

    open >> (*(anychar_p - close))[func] >> close

This refactoring is also implemented with the help of the Refactoring Parsers.
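The effect on the action can be observed with the following plain C++ model. It has no Spirit dependency and all names in it are hypothetical: the action helper stands in for p[f] by calling f with the matched text, and diff assumes Spirit Classic's rule for a - b (keep a's match unless b matches at the same position with at least the same length).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string_view>

// Hypothetical mini parser model (not Spirit).
using Parser = std::function<std::optional<std::size_t>(std::string_view, std::size_t)>;
using Action = std::function<void(std::string_view)>;

Parser lit(std::string_view s) {
    return [s](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (in.substr(pos, s.size()) == s) return pos + s.size();
        return std::nullopt;
    };
}

Parser anychar() {
    return [](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (pos < in.size()) return pos + 1;
        return std::nullopt;
    };
}

Parser seq(Parser a, Parser b) {
    return [a, b](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (auto p = a(in, pos)) return b(in, *p);
        return std::nullopt;
    };
}

Parser diff(Parser a, Parser b) {
    return [a, b](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        auto ra = a(in, pos);
        if (!ra) return std::nullopt;
        auto rb = b(in, pos);
        if (!rb || *rb < *ra) return ra;
        return std::nullopt;
    };
}

Parser kleene(Parser p) {
    return [p](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        while (auto next = p(in, pos)) {
            if (*next == pos) break;  // avoid looping on empty matches
            pos = *next;
        }
        return pos;
    };
}

// Semantic action p[f]: on success, call f with the matched text.
Parser action(Parser p, Action f) {
    return [p, f](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        auto end = p(in, pos);
        if (end) f(in.substr(pos, *end - pos));
        return end;
    };
}

// Unrefactored: open >> ((*anychar_p)[f] - close) >> close
Parser naive_confix_action(Parser open, Parser close, Action f) {
    return seq(seq(open, diff(action(kleene(anychar()), f), close)), close);
}

// Refactored: open >> (*(anychar_p - close))[f] >> close
Parser refactored_confix_action(Parser open, Parser close, Action f) {
    return seq(seq(open, action(kleene(diff(anychar(), close)), f)), close);
}
```

With a recording action, the unrefactored parser calls the action once with " hi */" on the input "/* hi */" and then fails, while the refactored parser calls it once with the comment body " hi " and matches the whole input.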

Summary of Confix Parser refactorings

    You write it as:                        It is refactored to:
    confix_p(open, expr, close)             open >> (expr - close) >> close
    confix_p(open, expr[func], close)       open >> (expr - close)[func] >> close
    confix_p(open, *expr, close)            open >> *(expr - close) >> close
    confix_p(open, (*expr)[func], close)    open >> (*(expr - close))[func] >> close
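One observable reason for the second row of this summary is that an action attached inside the difference fires even when the difference then rejects the match. The plain C++ sketch below illustrates this; it is a model, not Spirit, and Parser, lit, word, seq, diff, action, naive_confix and refactored_confix are hypothetical names. The diff helper assumes Spirit Classic's rule for a - b: a's match is kept unless b matches at the same position with at least the same length.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <optional>
#include <string_view>

// Hypothetical mini parser model (not Spirit).
using Parser = std::function<std::optional<std::size_t>(std::string_view, std::size_t)>;
using Action = std::function<void(std::string_view)>;

Parser lit(std::string_view s) {
    return [s](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (in.substr(pos, s.size()) == s) return pos + s.size();
        return std::nullopt;
    };
}

// One or more ASCII letters, used here as the expr parser.
Parser word() {
    return [](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        std::size_t end = pos;
        while (end < in.size() && std::isalpha(static_cast<unsigned char>(in[end]))) ++end;
        if (end == pos) return std::nullopt;
        return end;
    };
}

Parser seq(Parser a, Parser b) {
    return [a, b](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (auto p = a(in, pos)) return b(in, *p);
        return std::nullopt;
    };
}

Parser diff(Parser a, Parser b) {
    return [a, b](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        auto ra = a(in, pos);
        if (!ra) return std::nullopt;
        auto rb = b(in, pos);
        if (!rb || *rb < *ra) return ra;
        return std::nullopt;
    };
}

// Semantic action p[f]: on success, call f with the matched text.
Parser action(Parser p, Action f) {
    return [p, f](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        auto end = p(in, pos);
        if (end) f(in.substr(pos, *end - pos));
        return end;
    };
}

// Unrefactored: open >> (expr[f] - close) >> close
Parser naive_confix(Parser open, Parser expr, Parser close, Action f) {
    return seq(seq(open, diff(action(expr, f), close)), close);
}

// Refactored: open >> (expr - close)[f] >> close
Parser refactored_confix(Parser open, Parser expr, Parser close, Action f) {
    return seq(seq(open, action(diff(expr, close), f)), close);
}
```

Using "(" as open, a letter sequence as expr and "end" as close, the input "(end" makes the unrefactored version call the action with "end" before failing, because close also matches those three characters; the refactored version fails without calling it. On ordinary input such as "(abc)" with ")" as close, the action is called once with "abc".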

Comment Parsers

The Comment Parser generator template comment_p is a helper for generating a correct Confix Parser from auxiliary parameters. The generated parser recognizes comment constructs of the form:

    StartCommentToken >> Comment text >> EndCommentToken

The supported parameter types are parsers, single characters and strings (see as_parser). With one parameter, it matches a comment that starts with the given parser parameter and runs up to the end of the line. For instance, the following parser matches C++ style comments:

    comment_p("//")

With two parameters, it matches a comment that starts with the first parser parameter and runs up to the second parser parameter. For instance, a C style comment parser could be constructed as:

    comment_p("/*", "*/")
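Both forms can be modeled in plain C++ as confix sequences whose body is *(anychar - close). The sketch assumes that the one-parameter form closes at an end of line (\r\n, \r or \n, treated like eol_p) or at the end of input (treated like end_p). All helper names are hypothetical and the code does not use Spirit.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string_view>

// Hypothetical mini parser model (not Spirit).
using Parser = std::function<std::optional<std::size_t>(std::string_view, std::size_t)>;

Parser lit(std::string_view s) {
    return [s](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (in.substr(pos, s.size()) == s) return pos + s.size();
        return std::nullopt;
    };
}

Parser anychar() {
    return [](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (pos < in.size()) return pos + 1;
        return std::nullopt;
    };
}

Parser seq(Parser a, Parser b) {
    return [a, b](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (auto p = a(in, pos)) return b(in, *p);
        return std::nullopt;
    };
}

// Alternative a | b: try a first, then b.
Parser alt(Parser a, Parser b) {
    return [a, b](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (auto p = a(in, pos)) return p;
        return b(in, pos);
    };
}

// a - b: keep a's match unless b matches at least as long (assumed Spirit rule).
Parser diff(Parser a, Parser b) {
    return [a, b](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        auto ra = a(in, pos);
        if (!ra) return std::nullopt;
        auto rb = b(in, pos);
        if (!rb || *rb < *ra) return ra;
        return std::nullopt;
    };
}

Parser kleene(Parser p) {
    return [p](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        while (auto next = p(in, pos)) {
            if (*next == pos) break;  // avoid looping on empty matches
            pos = *next;
        }
        return pos;
    };
}

// End of line: "\r\n", "\r" or "\n".
Parser eol() { return alt(lit("\r\n"), alt(lit("\r"), lit("\n"))); }

// End of input: an empty match at the very end.
Parser end_of_input() {
    return [](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (pos == in.size()) return pos;
        return std::nullopt;
    };
}

// One-parameter form (assumed): start >> *(anychar - eol) >> (eol | end).
Parser line_comment(Parser start) {
    return seq(seq(start, kleene(diff(anychar(), eol()))), alt(eol(), end_of_input()));
}

// Two-parameter form: start >> *(anychar - end_token) >> end_token.
Parser block_comment(Parser start, Parser end_token) {
    return seq(seq(start, kleene(diff(anychar(), end_token))), end_token);
}
```

Under these assumptions, line_comment(lit("//")) consumes "// hi\n" (six characters, including the newline) from "// hi\nint x;", and block_comment(lit("/*"), lit("*/")) consumes the first seven characters of "/* a */ rest".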

The comment_p parser generator produces parsers for non-nested comments (such as C/C++ comments). Sometimes it is necessary to parse nested comments, as allowed, for instance, in Pascal:

    { This is a { nested } PASCAL-comment }

Such nested comments can be parsed with parsers generated by the comment_nest_p generator template functor. The following example shows a parser that handles both (nestable) Pascal comment styles:

    rule<> pascal_comment
        =   comment_nest_p("(*", "*)")
        |   comment_nest_p('{', '}')
        ;
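The difference between nested and non-nested matching can be sketched in plain C++ with two hypothetical functions, match_flat and match_nested. They are not Spirit code; match_nested assumes a nestable comment behaves roughly like open >> *(nested_comment | (anychar - close)) >> close, i.e. an inner open token starts a comment that must be closed before the outer one can end.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// True if `token` occurs in `in` at position `pos`.
bool at(std::string_view in, std::size_t pos, std::string_view token) {
    return in.substr(pos, token.size()) == token;
}

// Non-nested comment, in the manner of comment_p(open, close):
// the comment ends at the first close token.
std::optional<std::size_t> match_flat(std::string_view in, std::size_t pos,
                                      std::string_view open, std::string_view close) {
    if (!at(in, pos, open)) return std::nullopt;
    for (pos += open.size(); pos < in.size(); ++pos)
        if (at(in, pos, close)) return pos + close.size();
    return std::nullopt;
}

// Nestable comment, loosely modeled on comment_nest_p(open, close).
// Returns the position just past the balancing close token.
std::optional<std::size_t> match_nested(std::string_view in, std::size_t pos,
                                        std::string_view open, std::string_view close) {
    if (!at(in, pos, open)) return std::nullopt;
    pos += open.size();
    while (pos < in.size()) {
        if (auto inner = match_nested(in, pos, open, close)) {
            pos = *inner;               // skip over a complete inner comment
        } else if (at(in, pos, close)) {
            return pos + close.size();  // the close that balances our open
        } else {
            ++pos;                      // ordinary comment text
        }
    }
    return std::nullopt;                // input ended before the comment closed
}
```

On the Pascal example above, match_flat stops right after "nested }", while match_nested consumes the entire comment; "{ a { b }" is rejected by match_nested because the outer brace is never closed.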

Note that a comment is parsed as if the whole comment_p(...) expression were wrapped in a lexeme_d[] directive: no skipping takes place while a comment is being parsed, even if a skip parser is defined for the whole parsing process.
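The following plain C++ sketch shows why skipping inside a comment would be harmful. It is a simplified model with hypothetical names (skip_ws, line_comment), not Spirit code: when skipping is on, whitespace (including newlines) is skipped before every primitive match, as a space_p-style skip parser would do; for brevity only \n is recognized as an end of line.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Skip whitespace, including newlines, like a space_p-style skipper.
std::size_t skip_ws(std::string_view in, std::size_t pos) {
    while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos]))) ++pos;
    return pos;
}

// Hypothetical model of a "//" line comment closed by '\n' or end of input.
// With skipping == true, whitespace is skipped before every primitive match,
// which models parsing without the implicit lexeme_d[]; with false, the
// comment is matched as a lexeme.
std::optional<std::size_t> line_comment(std::string_view in, std::size_t pos, bool skipping) {
    if (skipping) pos = skip_ws(in, pos);
    if (in.substr(pos, 2) != "//") return std::nullopt;
    pos += 2;
    for (;;) {
        if (skipping) pos = skip_ws(in, pos);
        if (pos == in.size()) return pos;     // end of input closes the comment
        if (in[pos] == '\n') return pos + 1;  // end of line closes the comment
        ++pos;                                // any other character is comment text
    }
}
```

For the input "// note\nint x;", the lexeme version stops right after the newline (eight characters), while the skipping version skips the newline as whitespace and runs on to the end of the input, swallowing int x;.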

comments.cpp demonstrates various comment parsing schemes:

  1. Parsing of different comment styles
  2. Parsing tagged data with the help of the confix_parser
  3. Parsing tagged data with the help of the confix_parser, but with the semantic action directly attached to the body sequence parser

comments.cpp is part of the Spirit distribution.