Operators

Operators are used as a means for object composition and embedding. Simple parsers may be composed to form composites through operator overloading, crafted to approximate the syntax of an Extended Backus-Normal Form (EBNF) variant. An expression such as:

 a | b

actually yields a new parser type which is a composite of its operands, a and b. Taking this example further, if a and b were of type chlit<>, the result would have the composite type:

 Alternative <chlit<>, chlit<> >

A suite of operators facilitate the composition of parsers:

The Complete List of Operators
Set operators
a | b Union match a or b. Also referred to as alternatives
a & b Intersection match a and b
a - b Difference match a but not b. If b's matched text is shorter than a's matched text, a successful match is made.
a ^ b XOR match a or b, but not both
~

Alternative operands are tried one by one on a first come first served basis starting from the leftmost operand. After a successfully matched alternative is found, the parser concludes its search, essentially short-circuiting the search for other potentially viable candidates. This short-circuiting implicitly gives the highest priority to the leftmost alternative.

Short-circuiting is done in the same manner as C or C++'s logical expressions; e.g. if (x < 3 || y < 2) where, if x evaluates to be less than 3, the y < 2 test is not done at all. In addition to providing an implicit priority rule for alternatives which is necessary, given the non-deterministic nature of the Spirit parser compiler, short-circuiting improves the execution time. TIP: If the order of your alternatives is logically irrelevant, strive to put the (expected) most common choice first for maximum efficiency.

Sequencing operators
a >> b Sequence match a and b in sequence
a && b Sequential-and same as above, match a and b in sequence
a || b Sequential-or match a or b in sequence

The sequencing operator >> can alternatively be thought of as the sequential-and operator. The expression a && b reads as match a and b in sequence. Continuing this logic, we can also have a sequential-or operator where the expression a || b reads as match a or b and in sequence. That is, if both a and b match, it must be in sequence; this is equivalent to a | b | a >> b.

Loops
*a Kleene star match a zero (0) or more times
+a Positive match a one (1) or more times
!a Optional match a zero (0) or one (1) time

Operator precedence and grouping:

Since we are defining our meta-language in C++, we follow C/C++'s operator precedence rules. Groups override this (e.g., *(a | b) reads: match a or b zero (0) or more times).

Primitive type operands

It is important to emphasize that only one of the operands may be a primitive data type. C++ forbids the overloading operator for primitives such as ints, chars and pointers (e.g. char*). At least one of the operands must be a user-defined type).

Typically, in an expression involving multiple operators, casting the leftmost operand as a parser is enough to 'infect' all the rest of the operands to its right to be acceptable. Examples:

r = 'a' | 'b' | 'c' | 'd';                 // ill formed
r = ch_p('a')  | 'b' | 'c' | 'd';    // OK

For binary operators, one of the operands but not both may be a char, wchar_t, char const* or wchar_t const*. Examples:

a | 'x';
a - "Hello World";

where a is a parser object.

A few simple classes, when composed and structured in a hierarchy, form a very powerful object oriented recursive descent parsing engine. These classes provide the infrastructure needed for the construction of more complex parsers. The final parser composite is a non-deterministic recursive descent parser with infinite look-ahead. Top-down descent traverses the hierarchy, checking each object for a match, backtracking and checking all possible alternatives.

Non-determinism is not an absolute limitation though. The character set and the symbol table introduce some form of determinism. For instance, whereas 'a' | 'b' | 'c' is non-deterministic, chset('a') | 'b' | 'c' is absolutely deterministic. Soon, we shall see more deterministic parsers in future versions of the framework.

The flexibility of object embedding and composition combined with recursion opens up a unique approach to parsing. Subclasses are free to form aggregates and algorithms of arbitrary complexity. Complex parsers can be created with the composition of only a few primitive classes.

The framework is designed to be fully open-ended and extensible. New primitives or composites, from the trivial to the complex, may be added any time. Composition happens at compile time (static). Again, this is not a requirement. There are facilities, such as the symbol table, that enable dynamic creation and modification of parser behavior and structure. This is possible through the expressive flexibility of C++ template meta-programming.