Character and Phrase Levels

Typical parsers treat the processing of characters (symbols that form words or lexemes) and phrases (words that form sentences) as separate domains. Entities such as reserved words, operators, literal strings, and numerical constants, which constitute the terminals of a grammar, are usually extracted first in a separate lexical analysis stage.

As the examples so far show, the Spirit framework, contrary to standard practice, handles parsing tasks at both the character level and the phrase level. In effect, a lexical analyzer is seamlessly integrated into the Spirit framework.
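
To make the idea of one framework covering both levels concrete, here is a minimal hand-written sketch. It does not use Spirit and is not how Spirit is implemented; the namespace sketch and the functions parse_uint, skip_space, parse_sum and usage are hypothetical names chosen for illustration. Digits are assembled into a number at the character level, and the sum rule works at the phrase level, with no separate token stream in between.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

namespace sketch {  // hypothetical namespace, not part of Spirit

// Character level: consume one or more digits and build an unsigned value.
// On failure, pos and out are left untouched.
bool parse_uint(const std::string& in, std::size_t& pos, unsigned& out) {
    std::size_t p = pos;
    unsigned value = 0;
    while (p < in.size() && std::isdigit(static_cast<unsigned char>(in[p]))) {
        value = value * 10 + static_cast<unsigned>(in[p] - '0');
        ++p;
    }
    if (p == pos) return false;  // no digit found
    pos = p;
    out = value;
    return true;
}

// White space is skipped between lexemes, never inside them.
void skip_space(const std::string& in, std::size_t& pos) {
    while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos])))
        ++pos;
}

// Phrase level: sum ::= uint ('+' uint)*, applied to the raw characters.
bool parse_sum(const std::string& in, unsigned& result) {
    std::size_t pos = 0;
    unsigned total = 0;
    skip_space(in, pos);
    if (!parse_uint(in, pos, total)) return false;
    skip_space(in, pos);
    while (pos < in.size() && in[pos] == '+') {
        ++pos;
        skip_space(in, pos);
        unsigned rhs = 0;
        if (!parse_uint(in, pos, rhs)) return false;
        total += rhs;
        skip_space(in, pos);
    }
    if (pos != in.size()) return false;  // trailing input is an error
    result = total;
    return true;
}

void usage() {
    unsigned result = 0;
    assert(parse_sum("12 + 30", result) && result == 42);
    assert(!parse_sum("12 +", result));  // dangling operator fails
}

}  // namespace sketch
```

The point of the sketch is only that both functions read from the same character sequence; a real Spirit grammar expresses the same thing declaratively.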

Although the Spirit parser library does not need a separate lexical analyzer, there is no reason why we cannot have one. One can always have as many parser layers as needed. In theory, one may create a preprocessor, a lexical analyzer and a parser proper, all using the same framework.
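
For contrast, the layered arrangement described above might look like the following sketch, again written by hand rather than with Spirit. The functions tokenize, is_number, parse_list and usage are hypothetical; the first layer turns characters into tokens, and the second layer parses the token sequence.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {  // hypothetical namespace, not part of Spirit

// Layer 1 (lexical): split the input into digit-run tokens and
// single-character tokens, dropping white space.
std::vector<std::string> tokenize(const std::string& in) {
    std::vector<std::string> tokens;
    std::size_t pos = 0;
    while (pos < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[pos]);
        if (std::isspace(c)) {
            ++pos;
        } else if (std::isdigit(c)) {
            std::size_t start = pos;
            while (pos < in.size() &&
                   std::isdigit(static_cast<unsigned char>(in[pos])))
                ++pos;
            tokens.push_back(in.substr(start, pos - start));
        } else {
            tokens.push_back(std::string(1, in[pos]));
            ++pos;
        }
    }
    return tokens;
}

// A token is a number if it starts with a digit (tokenize guarantees
// that such a token consists of digits only).
bool is_number(const std::string& t) {
    return !t.empty() && std::isdigit(static_cast<unsigned char>(t[0]));
}

// Layer 2 (phrase): list ::= number (',' number)*, over tokens.
bool parse_list(const std::vector<std::string>& tokens) {
    if (tokens.empty() || !is_number(tokens[0])) return false;
    std::size_t i = 1;
    while (i < tokens.size()) {
        if (tokens[i] != "," || i + 1 >= tokens.size() ||
            !is_number(tokens[i + 1]))
            return false;
        i += 2;
    }
    return true;
}

void usage() {
    assert(parse_list(tokenize("10, 20,30")));
    assert(!parse_list(tokenize("10,,20")));  // missing number
}

}  // namespace sketch
```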

The framework predefines some parser primitives. The most common are "chlit" (character literal), "range" (character range), and "strlit" (string literal).

Going back to our original example, the character literals '(', ')', '+', '-', '*' and '/' in the grammar declaration are chlit objects that are implicitly created behind the scenes. One may prefer to declare these explicitly as:

chlit<> plus('+');
chlit<> minus('-');
chlit<> times('*');
chlit<> divide('/');
chlit<> oppar('(');
chlit<> clpar(')');

The framework also predefines a few more utility parsers. There is "epsilon", which matches the null string (it always returns a successful match of length 0); "anychar", which matches any single character (including '\0'); and "nothing", which never matches anything and always fails. Finally, there is a full repertoire of single-character parsers: alnum, alpha, cntrl, digit, graph, lower, print, punct, space, upper and xdigit.
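
The behavior of the three utility parsers can be modeled with a few lines of stand-alone code. This is a sketch under stated assumptions, not Spirit's implementation: the functions below are hypothetical, a match is represented as a length (with -1 meaning failure), and anychar is assumed to fail only when no input is left.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {  // hypothetical namespace, not part of Spirit

// Assumed convention: a result >= 0 is the match length, -1 is failure.
const int no_match = -1;

// epsilon: succeeds at any position and consumes nothing.
int epsilon(const std::string&, std::size_t) { return 0; }

// anychar: succeeds on any single character, '\0' included.
// Assumption: it fails only when the input is exhausted.
int anychar(const std::string& in, std::size_t pos) {
    return pos < in.size() ? 1 : no_match;
}

// nothing: fails at every position.
int nothing(const std::string&, std::size_t) { return no_match; }

void usage() {
    std::string in("a\0b", 3);             // input with an embedded '\0'
    assert(epsilon(in, 3) == 0);           // matches even at end of input
    assert(anychar(in, 1) == 1);           // the '\0' is matched
    assert(nothing(in, 0) == no_match);
}

}  // namespace sketch
```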

The Complete List of Parser Primitives
chlit     Character literal    Example: chlit<>('X');
range     Character range      Example: range<>('0','9');
strlit    String literal       Example: strlit<>("Hello");

chlit and range are template classes parameterized by the character type, which defaults to char.

Examples: chlit<wchar_t>, range<unsigned char>
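
The effect of the defaulted character-type parameter can be illustrated with two small stand-in templates. These are hypothetical look-alikes written for this example, not the Spirit classes: they only test a single character and share nothing with the library beyond the names chlit and range. The test member function and usage are likewise invented here.

```cpp
#include <cassert>

namespace sketch {  // hypothetical namespace, not part of Spirit

// Stand-in for a character-literal parser; CharT defaults to char,
// so chlit<> means chlit<char>.
template <typename CharT = char>
struct chlit {
    CharT ch;
    explicit chlit(CharT c) : ch(c) {}
    bool test(CharT c) const { return c == ch; }
};

// Stand-in for a character-range parser over the closed range
// [first, last].
template <typename CharT = char>
struct range {
    CharT first;
    CharT last;
    range(CharT f, CharT l) : first(f), last(l) {}
    bool test(CharT c) const { return first <= c && c <= last; }
};

void usage() {
    chlit<> x('X');                  // same type as chlit<char>
    assert(x.test('X') && !x.test('x'));

    range<> digit('0', '9');
    assert(digit.test('5') && !digit.test('a'));

    chlit<wchar_t> wide_x(L'X');     // wide-character variant
    assert(wide_x.test(L'X'));

    // With unsigned char, values above 0x7F compare as positive numbers.
    range<unsigned char> high(0x80, 0xFF);
    assert(high.test(0xC3) && !high.test('a'));
}

}  // namespace sketch
```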

strlit is a template class parameterized by the string type, which defaults to cstring<>. The auxiliary class cstring is a template class parameterized by the character type, which defaults to char.

Examples: strlit<cstring<wchar_t> >, strlit<cstring<short> >

ch_p(ch)             Functor version of chlit
range_p(from, to)    Functor version of range
str_p(string)        Functor version of strlit

These functors are objects that create parsers, and they are designed to be used within expressions. Example:

helloworld = str_p("hello") >> str_p("world");
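
How an object can "create parsers" when called may be easier to see in a stand-alone model. The sketch below is an assumption-laden illustration, not Spirit's code: strlit, sequence, strlit_generator, the str_p object and usage are all re-declared here as hypothetical types in their own namespace, and a match is again a length with -1 for failure. It performs no white space skipping.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {  // hypothetical namespace, not part of Spirit

// Matches a fixed string at pos; returns its length or -1.
struct strlit {
    std::string text;
    int parse(const std::string& in, std::size_t pos) const {
        if (pos > in.size()) return -1;
        return in.compare(pos, text.size(), text) == 0
                   ? static_cast<int>(text.size())
                   : -1;
    }
};

// Matches a followed by b; returns the combined length or -1.
template <typename A, typename B>
struct sequence {
    A a;
    B b;
    int parse(const std::string& in, std::size_t pos) const {
        int la = a.parse(in, pos);
        if (la < 0) return -1;
        int lb = b.parse(in, pos + static_cast<std::size_t>(la));
        return lb < 0 ? -1 : la + lb;
    }
};

// The generator: an ordinary object whose call operator builds a parser.
struct strlit_generator {
    strlit operator()(const std::string& s) const {
        strlit p;
        p.text = s;
        return p;
    }
};
const strlit_generator str_p = strlit_generator();

// operator>> composes two string parsers into a sequence parser.
sequence<strlit, strlit> operator>>(const strlit& a, const strlit& b) {
    sequence<strlit, strlit> s;
    s.a = a;
    s.b = b;
    return s;
}

void usage() {
    sequence<strlit, strlit> helloworld = str_p("hello") >> str_p("world");
    assert(helloworld.parse("helloworld", 0) == 10);
    assert(helloworld.parse("hello there", 0) == -1);
}

}  // namespace sketch
```

Because str_p is an object rather than a type, it can appear directly inside an expression without spelling out any template arguments, which is the convenience the functor versions are meant to provide.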

Utility Primitives
anychar    Matches any character
nothing    Matches nothing
epsilon    The epsilon; always a successful match of length 0

alnum      Alphanumeric characters
alpha      Alphabetic characters
cntrl      Control characters
digit      Numeric digits
graph      Non-space printing characters
lower      Lower case letters
print      Printable characters
punct      Punctuation symbols
space      Space, tab, return, etc.
upper      Upper case letters
xdigit     Hexadecimal digits
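
The names of the single-character parsers mirror the classification functions of the C library header <cctype> (isalnum, isalpha, and so on). Assuming that correspondence, and assuming the default "C" locale, each class can be modeled as a one-line predicate. The functions below are hypothetical stand-ins in their own namespace, not the Spirit parsers themselves.

```cpp
#include <cassert>
#include <cctype>

namespace sketch {  // hypothetical namespace, not part of Spirit

// <cctype> functions require a value representable as unsigned char,
// so a plain char is converted before classification.
inline int uc(char c) { return static_cast<unsigned char>(c); }

inline bool alnum(char c)  { return std::isalnum(uc(c)) != 0; }
inline bool alpha(char c)  { return std::isalpha(uc(c)) != 0; }
inline bool cntrl(char c)  { return std::iscntrl(uc(c)) != 0; }
inline bool digit(char c)  { return std::isdigit(uc(c)) != 0; }
inline bool graph(char c)  { return std::isgraph(uc(c)) != 0; }
inline bool lower(char c)  { return std::islower(uc(c)) != 0; }
inline bool print(char c)  { return std::isprint(uc(c)) != 0; }
inline bool punct(char c)  { return std::ispunct(uc(c)) != 0; }
inline bool space(char c)  { return std::isspace(uc(c)) != 0; }
inline bool upper(char c)  { return std::isupper(uc(c)) != 0; }
inline bool xdigit(char c) { return std::isxdigit(uc(c)) != 0; }

void usage() {
    assert(xdigit('f') && !xdigit('g'));
    assert(print(' ') && !graph(' '));  // a space prints but is not graphic
    assert(space('\t') && cntrl('\n'));
}

}  // namespace sketch
```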