Directives

Parser directives have the form: directive[expression]

A directive modifies the behavior of its enclosed expression, essentially 'decorating' it. The framework pre-defines a few directives. Clients of the framework are free to define their own directives as needed.

Predefined directives:

lexeme:

Turns off white space skipping. By default, the parser skips white space and anything else treated as white space (possibly including comments), as determined by the scanner passed to the parser's parse member function.

Situations where we want to work at the character level instead of the phrase level call for a special construct. Rules can be directed to work at the character level by enclosing the pertinent parts of the grammar inside the lexeme directive.
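The difference between the phrase level and the character level can be sketched in plain C++. This is a hand-written illustration only, not the framework's implementation; the names skip_space, count_digits_phrase and count_digits_lexeme are hypothetical and do not exist in the framework.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: advance past white space starting at pos.
std::size_t skip_space(const std::string& in, std::size_t pos) {
    assert(pos <= in.size());
    while (pos < in.size() &&
           std::isspace(static_cast<unsigned char>(in[pos]))) {
        ++pos;
    }
    return pos;
}

// Phrase level: white space is skipped before each digit is matched.
// Returns how many digits were matched from the start of the input.
std::size_t count_digits_phrase(const std::string& in) {
    std::size_t pos = 0;
    std::size_t digits = 0;
    for (;;) {
        pos = skip_space(in, pos);
        if (pos >= in.size() ||
            !std::isdigit(static_cast<unsigned char>(in[pos]))) {
            break;
        }
        ++pos;
        ++digits;
    }
    return digits;
}

// Character level: nothing is skipped, so matching stops at the
// first character that is not a digit (including a space).
std::size_t count_digits_lexeme(const std::string& in) {
    std::size_t pos = 0;
    while (pos < in.size() &&
           std::isdigit(static_cast<unsigned char>(in[pos]))) {
        ++pos;
    }
    return pos;
}
```

For the input "1 2 3", the phrase-level function matches three digits, while the character-level function stops after the first one. This is the behavior a rule such as +digit would need inside lexeme so that "1 2 3" is not read as a single number.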

nocase:

There are times when we want to inhibit case sensitivity. The nocase directive converts all characters from the input to lower case. Note that only the input is converted: any parser enclosed inside the nocase directive that expects an upper case character will fail to parse anything. Example: nocase['X'] will never succeed because it expects an upper case 'X' that the nocase directive will never supply.
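The one-sided conversion can be sketched in plain C++. This is a hand-written illustration under the assumption that the directive lower-cases each input character before comparison, as described above; match_literal is a hypothetical name, not part of the framework.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical literal matcher. When nocase is true, only the INPUT is
// converted to lower case; the expected text is compared as written.
bool match_literal(const std::string& expected,
                   const std::string& input,
                   bool nocase) {
    assert(!expected.empty());
    if (expected.size() != input.size()) {
        return false;
    }
    for (std::size_t i = 0; i < input.size(); ++i) {
        char c = input[i];
        if (nocase) {
            c = static_cast<char>(
                std::tolower(static_cast<unsigned char>(c)));
        }
        if (c != expected[i]) {
            return false;
        }
    }
    return true;
}
```

With nocase enabled, an expected "x" matches both "x" and "X", but an expected "X" matches neither, because the converted input never contains an upper case character.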

Parsing Numbers

Consider the parsing of integers and real numbers:

number = real | integer;

A number can be a real or an integer. This grammar is ambiguous: an input such as "1234" can match both real and integer. Recall, though, that alternatives are short-circuited. Thus, for inputs such as the above, the real alternative always wins. If we swap the alternatives:

number = integer | real;

We still have a problem. Now, an input "123.456" will be partially matched by integer until the decimal point. This is not what we want.

The solution here is either to fix the ambiguity by factoring out the common prefixes of real and integer or, if that is neither possible nor desired, to use the longest directive:

number = longest[ integer | real ];

longest:

Alternatives in the Spirit parser compiler are short-circuited. Sometimes, this is not what is desired. The longest directive instructs the parser not to short-circuit the alternatives enclosed inside it. Instead, the parser tries all possible alternatives and chooses the one matching the longest portion of the input stream.
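The difference between short-circuited and longest-match alternatives can be sketched in plain C++. This is a hand-written illustration, not the framework's implementation. All names (Matcher, match_integer, match_real, first_match, longest_match) are hypothetical, and the sketch assumes a simplified real that is a run of digits with an optional fractional part, so that it also matches plain integers as in the discussion of "1234".

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// A matcher returns the number of characters consumed from the start
// of the input, or -1 on failure.
using Matcher = std::function<long(const std::string&)>;

// Hypothetical integer matcher: one or more digits.
long match_integer(const std::string& in) {
    std::size_t n = 0;
    while (n < in.size() &&
           std::isdigit(static_cast<unsigned char>(in[n]))) {
        ++n;
    }
    return n > 0 ? static_cast<long>(n) : -1;
}

// Hypothetical real matcher: digits, optionally followed by '.' digits.
long match_real(const std::string& in) {
    const long whole = match_integer(in);
    if (whole < 0) {
        return -1;
    }
    const std::size_t pos = static_cast<std::size_t>(whole);
    if (pos < in.size() && in[pos] == '.') {
        const long frac = match_integer(in.substr(pos + 1));
        if (frac > 0) {
            return whole + 1 + frac;
        }
    }
    return whole;
}

// Short-circuited alternatives: the first success wins.
long first_match(const std::vector<Matcher>& alts, const std::string& in) {
    for (const auto& m : alts) {
        const long n = m(in);
        if (n >= 0) {
            return n;
        }
    }
    return -1;
}

// Longest-match alternatives: every alternative is tried and the
// greatest match length is kept.
long longest_match(const std::vector<Matcher>& alts, const std::string& in) {
    assert(!alts.empty());
    long best = -1;
    for (const auto& m : alts) {
        best = std::max(best, m(in));
    }
    return best;
}
```

With the alternatives ordered integer, then real, first_match consumes only "123" of "123.456", while longest_match consumes all seven characters. The cost of the longest-match strategy is that every alternative is attempted even after one has succeeded.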

shortest:

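The opposite of the longest directive: the parser tries all alternatives enclosed inside this directive and chooses the one matching the shortest portion of the input stream.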

Now, going back to our original example, the observant reader might notice that the integer rule was left undefined. Although its definition is quite obvious, here's how it is actually defined in the context of the framework. Continuing our original example:

integer = lexeme[ 
!(ch_p('+') | '-') >> +digit
];

where digit is another predefined parser that calls the standard function std::isdigit(ch). Alternatively, we can take advantage of the numeric parser int_p:

integer = int_p;

At last, we have our complete grammar specification:

rule<> integer, group, expr1, expr2, expr;
integer = lexeme[ !(ch_p('+') | '-') >> +digit ];
group   = '(' >> expr >> ')';
expr1   = integer | group;
expr2   = expr1 >> *(('*' >> expr1) | ('/' >> expr1));
expr    = expr2 >> *(('+' >> expr2) | ('-' >> expr2));

The Start Symbol

Typically, parsers have what is called a start symbol, chosen to be the root of the grammar where parsing starts. The Spirit parser compiler has no notion of a start symbol. Any rule can be a start symbol.

This feature promotes the step-wise creation of parsers. We can build parsers from the bottom up while fully testing each level or module until we get to the top-most level.

Yes, it's a calculator. The production rule expr in our grammar specification, traditionally called the start symbol, accepts inputs such as:

12345
-12345
+12345
1 + 2
1 * 2
1/2 + 3/4
1 + 2 + 3 + 4
1 * 2 * 3 * 4
(1 + 2) * (3 + 4)
(-1 + 2) * (3 + -4)
1 + ((6 * 200) - 20) / 6
(1 + (2 + (3 + (4 + 5))))
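
To make the shape of the grammar concrete, here is a hand-written recursive-descent recognizer in plain C++ that mirrors the five rules one function per rule. It is only a sketch and does not use the framework. The names Recognizer and accepts are hypothetical, and the sketch assumes that white space is skipped between tokens but not inside integer, in line with the lexeme directive.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical recognizer: one member function per grammar rule.
struct Recognizer {
    const std::string& in;
    std::size_t pos;

    explicit Recognizer(const std::string& s) : in(s), pos(0) {}

    // Phrase-level white space skipping.
    void skip() {
        assert(pos <= in.size());
        while (pos < in.size() &&
               std::isspace(static_cast<unsigned char>(in[pos]))) {
            ++pos;
        }
    }

    // Match a single character token, skipping white space first.
    bool lit(char c) {
        skip();
        if (pos < in.size() && in[pos] == c) {
            ++pos;
            return true;
        }
        return false;
    }

    // integer: optional sign, then one or more digits, no skipping inside.
    bool integer() {
        skip();
        const std::size_t start = pos;
        if (pos < in.size() && (in[pos] == '+' || in[pos] == '-')) {
            ++pos;
        }
        const std::size_t first_digit = pos;
        while (pos < in.size() &&
               std::isdigit(static_cast<unsigned char>(in[pos]))) {
            ++pos;
        }
        if (pos == first_digit) {
            pos = start;  // no digits: undo the sign and fail
            return false;
        }
        return true;
    }

    // group: '(' expr ')'
    bool group() {
        const std::size_t start = pos;
        if (lit('(') && expr() && lit(')')) {
            return true;
        }
        pos = start;
        return false;
    }

    // expr1: integer | group
    bool expr1() { return integer() || group(); }

    // expr2: expr1 followed by any number of ('*' expr1) or ('/' expr1)
    bool expr2() {
        if (!expr1()) {
            return false;
        }
        for (;;) {
            const std::size_t save = pos;
            if ((lit('*') || lit('/')) && expr1()) {
                continue;
            }
            pos = save;  // incomplete repetition: back up and stop
            return true;
        }
    }

    // expr: expr2 followed by any number of ('+' expr2) or ('-' expr2)
    bool expr() {
        if (!expr2()) {
            return false;
        }
        for (;;) {
            const std::size_t save = pos;
            if ((lit('+') || lit('-')) && expr2()) {
                continue;
            }
            pos = save;
            return true;
        }
    }
};

// Hypothetical entry point: true if the whole input matches expr.
bool accepts(const std::string& input) {
    Recognizer r(input);
    if (!r.expr()) {
        return false;
    }
    r.skip();
    return r.pos == input.size();
}
```

Under these assumptions, accepts returns true for each of the inputs listed above, and false for inputs such as "1 +" or "- 1". The latter fails because the sign and the digits of an integer must be adjacent.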