Feb 28 ’11

When using expectation points, a parsing failure results in an exception that generically indicates the failure, but probably doesn’t explain the problem in the most meaningful way. It is possible to attach an error handler to react to the failed match in a more specialized way:

rule = alpha > '!';
on_error<fail>(rule,
   // construct<> is boost::phoenix::construct; it builds the string lazily
   std::cerr << val("Expected '!' at offset ") << (_3 - _1)
      << " in \"" << construct<std::string>(_1, _2) << '"'
      << std::endl);

That will produce a message like the following on stderr:

Expected '!' at offset 7 in "Some input"

However, if there’s more than one expectation point in a rule, then the diagnostic may be unhelpfully generic. To do otherwise, one must distinguish which expectation point failed. While it is certainly possible to factor the grammar into additional rules in order to have at most one expectation point per rule, that’s not necessary and can make the grammar less readable than otherwise. Instead, the what parameter (_4) of the error handler can be used:

rule = alpha > '!';
on_error<fail>(rule,
   std::cerr << val("Expected ") << _4 << " at offset "
      << (_3 - _1) << " in \""
      << construct<std::string>(_1, _2) << '"'
      << std::endl);

The what parameter describes the failure. In the case of an expectation point match failure, it is the name of the parser that failed to match or, if the parser is to match literal text, like '!' in the preceding example, the what parameter will be "literal-char" or similar. In this case, _4 will be "literal-char" (in the form of a boost::spirit::utf8_string which is a specialization of std::basic_string), and thus not terribly useful in a diagnostic.

To make the error message more helpful, and especially in rules with more than one literal parser to distinguish, create distinct, named rules:

exclamation = lit('!');
exclamation.name("!");
rule = alpha > exclamation;
on_error<fail>(rule,
   std::cerr << val("Expected ") << _4 << " at offset "
      << (_3 - _1) << " in \""
      << construct<std::string>(_1, _2) << '"'
      << std::endl);

This will report Expected ! at offset... when the exclamation rule fails to match.

Since an expectation point failure is distinguished by the what parameter, it follows that the what parameter can be used to dispatch to different behavior in the error handler based upon which expectation point failed to match. Doing so can be as simple as passing the what parameter to an error handling function which can use normal C++ techniques for dispatch such as cascading if-else’s or a map lookup, using the what string as the key to find a function to call. However, Phoenix offers power to do that work within the context of the on_error() call:

semicolon = lit(';');
semicolon.name(";");
rule = alpha > semicolon > alpha;
on_error<fail>(rule,
   let(_a = bind(&boost::spirit::info::tag, _4))
   [
      if_(";" == _a)
      [
         report_missing(_4, _1, _2, _3)
      ]
      .else_
      [
         if_("alpha" == _a)
         [
            report_missing("second word", _1, _2, _3)
         ]
         .else_
         [
            report_error(_4, _1, _2, _3)
         ]
      ]
   ]);

For the last example to compile, a number of include and using directives are necessary beyond the basics you are probably accustomed to seeing:

#include <boost/spirit/home/phoenix/bind/bind_member_variable.hpp>
#include <boost/spirit/home/phoenix/scope/let.hpp>
#include <boost/spirit/home/phoenix/scope/local_variable.hpp>
#include <boost/spirit/home/phoenix/statement/if.hpp>
using namespace boost::phoenix::local_names;

It would seem, at first blush, that comparing to _4 directly should work, but it doesn’t because _4 is a Phoenix actor that evaluates to a boost::spirit::info, not to a string. Instead, a string type is needed to support the comparisons against the string literals for dispatching. In this example, a local Phoenix variable, _a, is declared and assigned the result of binding _4 to boost::spirit::info::tag, the field of the boost::spirit::info struct that contains the what string. Thus, _a is a variable local to the error handler that is bound to the boost::spirit::utf8_string that describes the error and supports comparisons. Note the use of Phoenix’s let construct to declare a local variable scope. (This _a, which is boost::phoenix::local_names::_a, can be ambiguous with boost::spirit::qi::_a, depending upon using directives and declarations.)

The two functions, report_missing() and report_error(), are not defined here. Presumably they would report on stderr or raise an exception to indicate that a parsing error occurred, give the error context from the input range [_1,_2), and note the error location within that range, as given by _3.
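For concreteness, here is one way the two reporting functions might look as ordinary C++ function templates. This is only a sketch under assumptions of my own: the extra stream parameter, the message wording, the shared report() helper, and taking the what text as a std::string (rather than a boost::spirit::info) are all invented for illustration. To be called lazily from on_error(), functions like these would also need a Phoenix wrapper, much like the phx::function used for error_handler later in this article.

```cpp
#include <cassert>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical shared helper: formats one line such as
//    Missing ; at offset 4 in "word word"
// and writes it to _out in a single operation.
template <class Iterator>
void
report(std::ostream & _out, char const * _prefix,
   std::string const & _what, Iterator _begin, Iterator _end,
   Iterator _where)
{
   typename std::iterator_traits<Iterator>::difference_type const
      offset(std::distance(_begin, _where));
   assert(0 <= offset);   // _where must lie within [_begin, _end]
   std::ostringstream message;
   message << _prefix << _what << " at offset " << offset
      << " in \"" << std::string(_begin, _end) << "\"\n";
   _out << message.str();
}

// Reports that a required element, described by _what, was not found.
template <class Iterator>
void
report_missing(std::ostream & _out, std::string const & _what,
   Iterator _begin, Iterator _end, Iterator _where)
{
   report(_out, "Missing ", _what, _begin, _end, _where);
}

// Reports any other failure, using the what text as the explanation.
template <class Iterator>
void
report_error(std::ostream & _out, std::string const & _what,
   Iterator _begin, Iterator _end, Iterator _where)
{
   report(_out, "Expected ", _what, _begin, _end, _where);
}
```

Passing std::cerr as the first argument gives the stderr behavior described above; a string stream is handy for testing.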

When dispatching in this manner, there can be other parsing errors besides expectation point match failures, hence the final .else_ branch in the example error handler. For lack of a better response, the example just reports a generic error message that includes the what parameter’s text to give some sort of explanation. A real world rule would possibly provide a more context-specific diagnostic.

A final caution regarding this technique: the compile time, maintenance burden, and code size increase with each additional expectation point to be handled. Using a map-based dispatch may well be better when the number of expectation points grows. However, the diagnostic text generation may get out of synchronization with the point in the grammar triggering it because the two are located in different parts of the code.
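The map-based dispatch mentioned above might be sketched like this in plain C++, outside of Phoenix. Everything named here (the handler signature, the dispatcher class, and the two example handlers) is hypothetical; the only idea taken from the article is that the what string serves as the lookup key, with a fallback for anything unrecognized.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Hypothetical handler signature: receives the what string and the
// offset of the failure, and returns the diagnostic to report.
typedef std::function<std::string (std::string const &, std::size_t)>
   handler;

class dispatcher
{
public:
   // _fallback handles any what string with no registered handler.
   explicit dispatcher(handler const & _fallback)
      : fallback_(_fallback)
   {
      assert(fallback_);
   }

   // Registers _handler for failures whose what string is _tag.
   void
   add(std::string const & _tag, handler const & _handler)
   {
      handlers_[_tag] = _handler;
   }

   // Calls the handler registered for _tag, or the fallback if none.
   std::string
   operator ()(std::string const & _tag, std::size_t _offset) const
   {
      std::map<std::string, handler>::const_iterator const it(
         handlers_.find(_tag));
      handler const & h(handlers_.end() == it ? fallback_ : it->second);
      return h(_tag, _offset);
   }

private:
   std::map<std::string, handler> handlers_;
   handler fallback_;
};

// Example handlers, invented for illustration.
inline std::string
missing_semicolon(std::string const &, std::size_t _offset)
{
   return "Missing semicolon at offset " + std::to_string(_offset);
}

inline std::string
generic_error(std::string const & _tag, std::size_t _offset)
{
   return "Expected " + _tag + " at offset " + std::to_string(_offset);
}
```

A grammar would populate one dispatcher with add() calls and have its error handler pass along the tag from _4. Adding an expectation point then costs one add() call rather than another nested if_.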

There is another way to keep the diagnostic text near the rule triggering an error, while avoiding a great deal of code within the grammar. It involves collecting the rule name and corresponding diagnostic in a structure stored in an array that is then passed to an error handler that uses the what parameter to select a diagnostic from the array. If that was as clear as mud, don’t worry. The code should make it clear. Let’s start with the rule name to diagnostic mapping which combines the structure and array within a class template:

template <size_t N>
class diagnostics
{
public:
   diagnostics();

   // Adds a tag and diagnostic message pair to self.
   void
   add(char const * _tag, char const * _diagnostic);

   // Returns the diagnostic, if any, for _tag.
   char const *
   operator [](char const * _tag) const;

private:
   struct entry
   {
      char const * tag;
      char const * diagnostic;
   };

   entry  entries_[N];
   size_t size_;
};

diagnostics, as written, simply saves pointers to string literals. For more flexibility, it could store real strings (std::basic_string<>s, for instance), but this design is useful and simpler for exposition. To use diagnostics, one must create a grammar data member for each rule that will use it, and then populate it as needed by the rule:

semicolon = lit(';');
semicolon.name(";");
rule = alpha > semicolon > alpha;
diags.add(";", "Missing semicolon after first word");
diags.add("alpha", "Missing second word");
on_error<fail>(rule,
   error_handler(ref(diags), _1, _2, _3, _4));

Notice how the first expectation point is identified by a named rule for the required semicolon, which will produce an error message or exception containing the diagnostic text "Missing semicolon after first word". Similarly, if there is no word after a semicolon, then the diagnostic "Missing second word" will be used because the second alpha will fail to match. In each case, the expectation is that the error handler will use _4 to indicate which rule failed to satisfy an expectation point.

To round out this example, here’s how error_handler() might look:

struct error_handler_impl
{
   template <class, class, class, class, class>
   struct result { typedef void type; };

   template <class D, class B, class E, class W, class I>
   void
   operator ()(D const & _diagnostics, B _begin, E _end,
      W _where, I const & _info) const
   {
      utf8_string const & tag(_info.tag);
      char const * const what(tag.c_str());
      char const * diagnostic(_diagnostics[what]);
      std::string scratch;
      if (!diagnostic)
      {
         scratch.reserve(25 + tag.length());
         scratch = "Invalid syntax: expected ";
         scratch += tag;
         diagnostic = scratch.c_str();
      }
      raise_parsing_error(diagnostic, _begin, _end,
         _where);
   }
};
phx::function<error_handler_impl> error_handler;
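raise_parsing_error() is not shown in this article. A minimal sketch, assuming it should throw an exception that carries the diagnostic, the failure offset, and the input context, might look like the following. The parsing_error class and its members are hypothetical, and the sketch assumes that all three iterators have the same type.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical exception type carrying the failure location.
class parsing_error : public std::runtime_error
{
public:
   parsing_error(std::string const & _message, std::size_t _offset,
      std::string const & _context)
      : std::runtime_error(_message)
      , offset_(_offset)
      , context_(_context)
   {
   }

   // Offset of the failure from the start of the parsed input.
   std::size_t
   offset() const { return offset_; }

   // The input range that was being parsed.
   std::string const &
   context() const { return context_; }

private:
   std::size_t offset_;
   std::string context_;
};

// Throws a parsing_error for a failure at _where within [_begin, _end).
template <class Iterator>
void
raise_parsing_error(char const * _diagnostic, Iterator _begin,
   Iterator _end, Iterator _where)
{
   assert(_diagnostic);
   throw parsing_error(_diagnostic,
      static_cast<std::size_t>(std::distance(_begin, _where)),
      std::string(_begin, _end));
}
```

The diagnostic is copied into the exception, so it does not matter that the handler's scratch string is destroyed as the stack unwinds.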

You’re probably wondering where the implementations of diagnostics’ member functions are to be found. Here they are:

template <size_t N>
inline
diagnostics<N>::diagnostics()
   : size_(0)
{
}

template <size_t N>
void
diagnostics<N>::add(char const * const _tag,
   char const * const _diagnostic)
{
   assert(size_ < N);
   entry & e(entries_[size_++]);
   e.tag = _tag;
   e.diagnostic = _diagnostic;
}

template <size_t N>
char const *
diagnostics<N>::operator [](char const * const _tag) const
{
   for (size_t i(0); i < size_; ++i)
   {
      entry const & e(entries_[i]);
      if (0 == std::strcmp(e.tag, _tag))
      {
         return e.diagnostic;
      }
   }
   return 0;
}
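As noted earlier, diagnostics could store real strings instead of pointers to string literals. One possible variant is sketched here with std::map, which is my choice and not something the original design uses. The class name string_diagnostics and its find() member are hypothetical. It drops the fixed capacity N at the cost of dynamic allocation, and, unlike the array version, adding a tag twice replaces the earlier diagnostic.

```cpp
#include <cassert>
#include <map>
#include <string>

class string_diagnostics
{
public:
   // Adds the diagnostic for _tag, replacing any earlier one.
   void
   add(std::string const & _tag, std::string const & _diagnostic)
   {
      assert(!_tag.empty());
      entries_[_tag] = _diagnostic;
   }

   // Returns a pointer to the diagnostic for _tag, or 0 if there is
   // none. The pointer refers to the stored string for as long as
   // self exists, because std::map never relocates its elements.
   std::string const *
   find(std::string const & _tag) const
   {
      std::map<std::string, std::string>::const_iterator const it(
         entries_.find(_tag));
      return entries_.end() == it ? 0 : &it->second;
   }

private:
   std::map<std::string, std::string> entries_;
};
```

An error handler along the lines of error_handler_impl could call find() with the tag from _4 and fall back to a generic message when it returns 0.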

It should now be apparent that there are numerous ways to dispatch error handling when using expectation points, but all revolve around decoding the what parameter. In the end, factor your grammar to be functional and readable and then consider which expectation point failure dispatching technique fits best without sacrificing readability or performance.

9 Responses to “Dispatching on Expectation Point Failures”

  1. Nicely done Rob. Thank you for adding this article with the various examples.

  2. Jamboree says:

    In the first 3 examples, shouldn’t
    std::cerr << "…"
    be
    std::cerr << phx::val("…")
    ?

    And thanks for posting this!

    • Rob says:

      Yes, you’re right. While std::cerr is recognized and can be lazy, it requires that the first expression streamed into it be lazy. val("Expected...") does that.

      Thank you. I’ve updated the post.

  3. Addy says:

    This is a great article though it could benefit from publishing the entire source code.

  4. Steve says:

    The error_handler_impl as written will not work on the latest versions of boost. It took me awhile, but it needs to be structured like this

    struct error_handler_impl
    {
    template
    struct result { typedef void type; };

    template <class D, class B, class E, class W, class I>
    void
    operator ()(D const & _diagnostics, B _begin, E _end,
    W _where, I const & _info) const
    {

    etc

    I was also unable to get let to work with the latest versions of boost. This is on MSVC 2010
