Semantic Actions

Semantic actions have the form: expression[action]

Once we have defined our grammar and generated a corresponding parser, we will usually need to produce some output and do some work besides syntax analysis. The exception is when all we want is to check that an input conforms to our grammar, which is very seldom the case.

What we need is a mechanism that tells the parser what work to do as it traverses the grammar while parsing an input stream. Semantic actions provide this mechanism.

Semantic actions may be attached to any expression at any level within the parser hierarchy. An action is a C/C++ function or function object that will be called if a match is found in the particular context where it is attached. The action function serves as a hook into the parser and may be used to, for example:

Generate output from the parser (ASTs, for example)
Report warnings or errors
Manage symbol tables

Generic Semantic Actions

A generic semantic action can be any free function or function object that is compatible with the interface:

    void f(IteratorT first, IteratorT last);

where IteratorT is the type of iterator used, first points to the current input and last points to one after the end of the input (identical to STL iterator ranges). A functor should have a member operator() with the same signature as above:

    struct my_functor
    {
        void operator()(IteratorT first, IteratorT last) const;
    };

Iterators pointing to the matching portion of the input are passed into the function/functor.

Example:

    void
    my_action(char const* first, char const* last)
    {
        std::string str(first, last);
        std::cout << str << std::endl;
    }

    rule<> myrule = (a | b | *(c >> d))[&my_action];

The function my_action will be called whenever the expression (a | b | *(c >> d)) matches a portion of the input stream while parsing. Two iterators, first and last, are passed into the function. These iterators point to the start and end, respectively, of the portion of the input stream where the match is found.

Const-ness:

With functors, take note that operator() should be const. This implies that functors are immutable. One may wish to have member variables that are modified when the action gets called, but this is not a good idea. First, functors should preferably be lightweight: they are passed around a lot, and heavily laden functors would incur considerable overhead. Second, functors are passed by value, so the functor object that finally attaches to the parser will not be the original instance supplied by the client. Changes to that functor's state will therefore not affect the original functor that the client passed in, since they are distinct copies. If a functor needs to update some state variables, which is often the case, it is better to use references to external data. The following example shows how this can be done:

    struct my_functor
    {
        my_functor(std::string& str_)
        : str(str_) {}

        void
        operator()(IteratorT first, IteratorT last) const
        {
            str.assign(first, last);
        }

        std::string& str;
    };
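
The copy problem can be seen without Spirit at all. The sketch below is only an illustration: call_by_value is a hypothetical helper that stands in for the way a parser stores and invokes its own copy of an action, and counting_functor and ref_counting_functor are made-up names, not part of the library.

```cpp
// Hypothetical stand-in for a parser: it takes the action by value,
// so it invokes its own copy, never the caller's object.
template <typename ActionT>
void call_by_value(ActionT action, char const* first, char const* last)
{
    action(first, last);
}

// Keeps its state inside the functor (the approach the text warns against).
struct counting_functor
{
    counting_functor() : count(0) {}

    void operator()(char const*, char const*) const
    {
        ++count; // only compiles because count is declared mutable
    }

    mutable int count;
};

// Keeps its state outside the functor and holds only a reference to it.
struct ref_counting_functor
{
    explicit ref_counting_functor(int& count_) : count(count_) {}

    void operator()(char const*, char const*) const
    {
        ++count; // updates the caller's variable through the reference
    }

    int& count;
};

// Count visible to the caller after one call, state held internally.
int internal_state_count()
{
    char const text[] = "abc";
    counting_functor f;
    call_by_value(f, text, text + 3);
    return f.count; // the copy was incremented, f itself was not
}

// Count visible to the caller after one call, state held externally.
int external_state_count()
{
    char const text[] = "abc";
    int count = 0;
    call_by_value(ref_counting_functor(count), text, text + 3);
    return count;
}
```

Here internal_state_count() returns 0 because only the copy was modified, while external_state_count() returns 1 because every copy of the functor refers to the same external variable.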

Full Example:

Here now is our calculator enhanced with semantic actions:

    namespace
    {
        void    do_int(char const* str, char const* end)
        {
            string  s(str, end);
            cout << "PUSH(" << s << ')' << endl;
        }

        void    do_add(char const*, char const*)    { cout << "ADD\n"; }
        void    do_subt(char const*, char const*)   { cout << "SUBTRACT\n"; }
        void    do_mult(char const*, char const*)   { cout << "MULTIPLY\n"; }
        void    do_div(char const*, char const*)    { cout << "DIVIDE\n"; }
        void    do_neg(char const*, char const*)    { cout << "NEGATE\n"; }
    }

We augment our grammar with semantic actions:

    struct calculator : public grammar<calculator>
    {
        template <typename ScannerT>
        struct definition
        {
            definition(calculator const& self)
            {
                expression
                    =   term
                        >> *(   ('+' >> term)[&do_add]
                            |   ('-' >> term)[&do_subt]
                            )
                    ;

                term
                    =   factor
                        >> *(   ('*' >> factor)[&do_mult]
                            |   ('/' >> factor)[&do_div]
                            )
                    ;

                factor
                    =   lexeme_d[(+digit_p)[&do_int]]
                    |   '(' >> expression >> ')'
                    |   ('-' >> factor)[&do_neg]
                    |   ('+' >> factor)
                    ;
            }

            rule<ScannerT> expression, term, factor;

            rule<ScannerT> const&
            start() const { return expression; }
        };
    };

Feeding the expression (-1 + 2) * (3 + -4), for example, to the rule expression will produce the following output with the actions defined above:

    PUSH(1)
    NEGATE
    PUSH(2)
    ADD
    PUSH(3)
    PUSH(4)
    NEGATE
    ADD
    MULTIPLY

which, by the way, is the Reverse Polish Notation (RPN) of the given expression, reminiscent of some primitive calculators and the language Forth.
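
To make the RPN connection concrete, here is a minimal sketch of a stack machine that could sit behind the actions in place of the cout statements. The names rpn_machine, push, negate, binary, and evaluate_sample are hypothetical and are not part of Spirit or of the calculator example. The sketch is driven by hand, in the order in which do_int, do_neg, do_add and do_mult fire for (-1 + 2) * (3 + -4).

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical evaluator: each method corresponds to one semantic action.
class rpn_machine
{
public:
    void push(int value) { stack_.push_back(value); }

    void negate()
    {
        int a = pop();
        stack_.push_back(-a);
    }

    // op is one of '+', '-', '*', '/'
    void binary(char op)
    {
        int rhs = pop(); // the right operand was pushed last
        int lhs = pop();
        switch (op)
        {
            case '+': stack_.push_back(lhs + rhs); break;
            case '-': stack_.push_back(lhs - rhs); break;
            case '*': stack_.push_back(lhs * rhs); break;
            case '/':
                if (rhs == 0)
                    throw std::domain_error("division by zero");
                stack_.push_back(lhs / rhs);
                break;
            default:
                throw std::invalid_argument("unknown operator");
        }
    }

    int result() const
    {
        if (stack_.size() != 1)
            throw std::logic_error("malformed expression");
        return stack_.back();
    }

private:
    int pop()
    {
        if (stack_.empty())
            throw std::logic_error("stack underflow");
        int top = stack_.back();
        stack_.pop_back();
        return top;
    }

    std::vector<int> stack_;
};

// Replays the action sequence for (-1 + 2) * (3 + -4).
int evaluate_sample()
{
    rpn_machine m;
    m.push(1); m.negate();    // -1
    m.push(2); m.binary('+'); // -1 + 2 = 1
    m.push(3);
    m.push(4); m.negate();    // -4
    m.binary('+');            // 3 + -4 = -1
    m.binary('*');            // 1 * -1 = -1
    return m.result();
}
```

Because the actions already fire in postfix order, the evaluator needs no precedence logic of its own. Wiring it to the grammar would require functors that hold a reference to the machine, as described under Const-ness.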

The complete source code is part of the Spirit distribution; see libs/spirit/example/fundamental/calc/calc_plain.cpp.

Specialized Actions

In general, semantic actions accept the first-last iterator pair. The action functions or functors receive the unprocessed data representing the matching production directly from the input. There are situations though where we might want to pass data in its processed form. A concrete example is the numeric parser. It is unwise to pass unprocessed data to a semantic action attached to a numeric parser and just throw away what has been done by the parser. Here, we want to pass the actual parsed number.

The function and functor signature of a semantic action varies depending on the parser to which it is attached. The following sections list the parsers that accept unique signatures. Unless explicitly stated in the documentation of a specific parser type, parsers not listed here expect the generic signature explained above.

Numeric Actions

Applies to:

uint_p
int_p
ureal_p
real_p

Signature for functions:

    void func(NumT val);


Signature for functors:

    struct ftor
    {
        void operator()(NumT val) const;
    };

Where NumT is any primitive numeric type such as int, long, float, double, etc., or a user defined numeric type such as big_int. NumT is the same type used as template parameter to uint_p, int_p, ureal_p or real_p. The parsed number is passed into the function/functor.
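
The difference from the generic signature can be sketched in plain C++. In the snippet below, parse_unsigned is a hypothetical stand-in for a numeric parser such as uint_p (it is not Spirit code, and store_value is an illustrative name): it converts the digits once and hands the finished value to the action, so the action never sees the raw iterator range.

```cpp
#include <cctype>

// Hypothetical stand-in for uint_p: consumes leading decimal digits from
// [first, last), calls action(value) if at least one digit matched, and
// returns the position where matching stopped.
// Overflow is not checked in this sketch.
template <typename ActionT>
char const* parse_unsigned(char const* first, char const* last, ActionT action)
{
    char const* it = first;
    unsigned value = 0;
    while (it != last && std::isdigit(static_cast<unsigned char>(*it)))
    {
        value = value * 10 + static_cast<unsigned>(*it - '0');
        ++it;
    }
    if (it != first)
        action(value); // numeric action signature: void(NumT)
    return it;
}

// A numeric action: stores the parsed value through a reference.
struct store_value
{
    explicit store_value(unsigned& out_) : out(out_) {}

    void operator()(unsigned val) const { out = val; }

    unsigned& out;
};

// Parses "1234;" and returns the value delivered to the action.
unsigned parse_sample()
{
    char const text[] = "1234;";
    unsigned result = 0;
    parse_unsigned(text, text + 5, store_value(result));
    return result;
}
```

With a generic action, store_value would instead receive the range covering "1234" and would have to repeat the conversion that the parser has already done.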

Character Actions

Applies to:

chlit, ch_p
range, range_p
anychar
alnum, alpha
cntrl, digit
graph, lower
print, punct
space, upper
xdigit

Signature for functions:

    void func(CharT ch);

Signature for functors:

    struct ftor
    {
        void operator()(CharT ch) const;
    };

Where CharT is the value_type of the iterator used in parsing. A char const* iterator for example has a value_type of char. The matching character is passed into the function/functor.

Cascading Actions

Actions can be cascaded. Cascaded actions also inherit the function/functor interface of the original. For example:

    uint_p[fa][fb][fc]

Here, the functors fa, fb and fc all expect the signature void operator()(unsigned n) const.

Directives and Actions

A directive inherits the function/functor interface of the subject it encloses. Example:

    as_lower_d[ch_p('x')][f]

Here, the functor f expects the signature void operator()(char ch) const, assuming that the iterator used is a char const*.

Templatized Functors

For the sake of genericity, it is often better to make the functor's member operator() a template. That way, we do not have to concern ourselves with the type of the argument to expect, as long as the behavior is appropriate. For instance, rather than hard-coding char const* as the argument type of a generic semantic action, we can make operator() a template member function so that it accepts any type of iterator:

    struct my_functor
    {
        template <typename IteratorT>
        void operator()(IteratorT first, IteratorT last) const;
    };

Note, however, that this is only possible with functors, which clearly shows that functors are superior to plain functions.
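
As a sketch of what the template buys us, the functor below is invoked directly with two different iterator types, with no parser involved. The names collect_match and collect_sample are hypothetical, and the functor follows the earlier advice of keeping its state in external data held by reference.

```cpp
#include <string>
#include <vector>

// Collects each matched range as a string. operator() is a template, so the
// same functor works with any iterator type whose value_type is char.
struct collect_match
{
    explicit collect_match(std::vector<std::string>& out_) : out(out_) {}

    template <typename IteratorT>
    void operator()(IteratorT first, IteratorT last) const
    {
        out.push_back(std::string(first, last));
    }

    std::vector<std::string>& out;
};

// Calls the same functor with two different iterator types.
std::vector<std::string> collect_sample()
{
    std::vector<std::string> matches;
    collect_match action(matches);

    char const raw[] = "abc";
    action(raw, raw + 3);       // IteratorT = char const*

    std::string const s("xyz");
    action(s.begin(), s.end()); // IteratorT = std::string::const_iterator

    return matches;
}
```

A plain function with the signature void f(char const*, char const*) could serve only the first of these two calls.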