Operators are used as a means for object composition and embedding. Simple parsers may be composed to form composites through operator overloading, crafted to approximate the syntax of an Extended Backus-Normal Form (EBNF) variant. An expression such as:
a | bactually yields a new parser type which is a composite of its operands, a and b. Taking this example further, if a and b were of type chlit<>, the result would have the composite type:
Alternative <chlit<>, chlit<> >A suite of operators facilitate the composition of parsers:
Operator precedence and grouping:
Since we are defining our meta-language in C++, we follow C/C++'s operator precedence rules. Groups override this (e.g., *(a | b) reads: match a or b zero (0) or more times).
Primitive type operands
It is important to emphasize that only one of the operands may be a primitive data type. C++ forbids the overloading operator for primitives such as ints, chars and pointers (e.g. char*). At least one of the operands must be a user-defined type).
Typically, in an expression involving multiple operators, casting the leftmost operand as a parser is enough to 'infect' all the rest of the operands to its right to be acceptable. Examples:r = 'a' | 'b' | 'c' | 'd'; // ill formed r = ch_p('a') | 'b' | 'c' | 'd'; // OKFor binary operators, one of the operands but not both may be a char, wchar_t, char const* or wchar_t const*. Examples:
a | 'x'; a - "Hello World";where a is a parser object.
A few simple classes, when composed and structured in a hierarchy, form a very powerful object oriented recursive descent parsing engine. These classes provide the infrastructure needed for the construction of more complex parsers. The final parser composite is a non-deterministic recursive descent parser with infinite look-ahead. Top-down descent traverses the hierarchy, checking each object for a match, backtracking and checking all possible alternatives.
Non-determinism is not an absolute limitation though. The character set and the symbol table introduce some form of determinism. For instance, whereas 'a' | 'b' | 'c' is non-deterministic, chset('a') | 'b' | 'c' is absolutely deterministic. Soon, we shall see more deterministic parsers in future versions of the framework.
The flexibility of object embedding and composition combined with recursion opens up a unique approach to parsing. Subclasses are free to form aggregates and algorithms of arbitrary complexity. Complex parsers can be created with the composition of only a few primitive classes.
The framework is designed to be fully open-ended and extensible. New primitives or composites, from the trivial to the complex, may be added any time. Composition happens at compile time (static). Again, this is not a requirement. There are facilities, such as the symbol table, that enable dynamic creation and modification of parser behavior and structure. This is possible through the expressive flexibility of C++ template meta-programming.