Primitives

Character and Phrase Levels

Typical parsers regard the processing of characters (symbols that form words or lexemes) and phrases (words that form sentences) as separate domains. Entities such as reserved words, operators, literal strings and numerical constants, which constitute the terminals of a grammar, are usually extracted first in a separate lexical analysis stage.

As the examples so far show, the Spirit framework departs from standard practice: it handles parsing tasks at both the character level and the phrase level. In effect, a lexical analyzer is seamlessly integrated into the framework.

Although the Spirit parser library does not need a separate lexical analyzer, there is no reason why we cannot have one. One can always have as many parser layers as needed. In theory, one may create a preprocessor, a lexical analyzer and a parser proper, all using the same framework.

The framework predefines some parser primitives. The most common are "chlit" (character literal), "range" (character range), and "strlit" (string literal).

Going back to our original example, the character literals '(', ')', '+', '-', '*' and '/' in the grammar declaration are chlit objects that are implicitly created behind the scenes. One may prefer to declare these explicitly as:
chlit<> plus('+');
chlit<> minus('-');
chlit<> times('*');
chlit<> divide('/');
chlit<> oppar('(');
chlit<> clpar(')');
The framework also predefines a few more utility parsers. There is "epsilon", which matches the null string (it always returns a successful match of length 0); "anychar", which matches any single character (including '\0'); and "nothing", which never matches anything and always fails. Finally, there is a full repertoire of single character parsers: alnum, alpha, cntrl, digit, graph, lower, print, punct, space, upper and xdigit.
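The behaviour of these utility parsers can be modelled with a few lines of standard C++. The following is only a sketch of the semantics described above, not the framework's implementation: the namespace sketch_utility, the type names ending in _t, the helper match_prefix and the convention of returning the match length (or -1 on failure) are all assumptions made for illustration. It also assumes that anychar fails at the end of the input, since there is no character left to match.

```cpp
#include <cassert>
#include <string>

namespace sketch_utility {

// Hypothetical convention: parse() returns the match length, or -1 on failure.
const long no_match = -1;

// epsilon: matches the null string, so it always succeeds with length 0.
struct epsilon_t {
    long parse(const char*, const char*) const { return 0; }
};

// anychar: matches any single character, '\0' included.
// Assumption: it fails when the input is exhausted.
struct anychar_t {
    long parse(const char* first, const char* last) const {
        return first != last ? 1 : no_match;
    }
};

// nothing: never matches and always fails.
struct nothing_t {
    long parse(const char*, const char*) const { return no_match; }
};

// chlit: matches exactly one given character.
struct chlit_t {
    char ch;
    long parse(const char* first, const char* last) const {
        return (first != last && *first == ch) ? 1 : no_match;
    }
};

// Hypothetical helper: try a parser against the start of a string.
template <typename Parser>
long match_prefix(const Parser& p, const std::string& input) {
    const char* first = input.data();
    return p.parse(first, first + input.size());
}

// Usage example.
inline void demo() {
    chlit_t plus = {'+'};
    assert(match_prefix(plus, "+1") == 1);          // one character consumed
    assert(match_prefix(epsilon_t(), "+1") == 0);   // success, nothing consumed
    assert(match_prefix(nothing_t(), "+1") == no_match);
}

} // namespace sketch_utility
```

Note the difference between epsilon and nothing: both consume no input, but epsilon reports success while nothing reports failure.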

The Complete List of Parser Primitives

Basics

chlit Character literal Example: chlit<>('X');

range Character range Example: range<>('0','9');

strlit String literal Example: strlit<>("Hello");

chlit and range are template classes parameterized by the character type, which defaults to char.

Examples: chlit<wchar_t>, range<unsigned char>

strlit is a template class parameterized by the string type, which defaults to cstring<>. The auxiliary class cstring is a template class parameterized by the character type, which defaults to char.

Examples: strlit<cstring<wchar_t> >, strlit<cstring<short> >
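To illustrate what parameterizing on the character type buys, here is a simplified stand-alone model in standard C++. It is a sketch under stated assumptions, not the framework's code: the namespace sketch_chartype, the member functions matches and parse, and the -1 failure value are hypothetical, and strlit is parameterized directly on the character type (using std::basic_string) instead of on a string type such as cstring<>.

```cpp
#include <cassert>
#include <string>

namespace sketch_chartype {

// Character literal; the character type defaults to char.
template <typename CharT = char>
struct chlit {
    CharT ch;
    explicit chlit(CharT c) : ch(c) {}
    bool matches(CharT c) const { return c == ch; }
};

// Inclusive character range; the character type defaults to char.
template <typename CharT = char>
struct range {
    CharT first, last;
    range(CharT f, CharT l) : first(f), last(l) {}
    bool matches(CharT c) const { return first <= c && c <= last; }
};

// String literal. Simplification: parameterized on the character type,
// not on a string class as described in the text.
template <typename CharT = char>
struct strlit {
    std::basic_string<CharT> str;
    explicit strlit(const CharT* s) : str(s) {}

    // Returns the match length if input starts with str, otherwise -1.
    long parse(const std::basic_string<CharT>& input) const {
        return input.compare(0, str.size(), str) == 0
            ? static_cast<long>(str.size()) : -1;
    }
};

// Usage example.
inline void demo() {
    chlit<> x('X');                          // char by default
    range<unsigned char> high(0x80, 0xFF);   // bytes above 7-bit ASCII
    strlit<wchar_t> hello(L"Hello");         // wide-character literal
    assert(x.matches('X'));
    assert(high.matches(0xC3));
    assert(hello.parse(L"Hello, world") == 5);
}

} // namespace sketch_chartype
```

Using range<unsigned char> instead of the default avoids surprises on platforms where plain char is signed and values above 127 compare as negative.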

Functors

ch_p(ch) Functor version of chlit

range_p(from, to) Functor version of range

str_p(string) Functor version of strlit

These functors are designed to be used within expressions. They are objects that, when called, create the corresponding parsers. Example:

helloworld = str_p("hello") >> str_p("world");
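The idea of an object that creates parsers can be sketched in plain C++ as follows. This is a hypothetical model, not the framework's implementation: the namespace sketch_functor, the types str_generator and sequence, the parse signature and the -1 failure value are assumptions. The sketch works purely at the character level, so the two literals must be adjacent in the input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch_functor {

// Hypothetical convention: parse() returns the match length, or -1 on failure.
struct strlit {
    std::string str;
    long parse(const std::string& in, std::size_t pos) const {
        if (pos > in.size()) return -1;
        return in.compare(pos, str.size(), str) == 0
            ? static_cast<long>(str.size()) : -1;
    }
};

// Sequence of two parsers: b must match where a left off.
template <typename A, typename B>
struct sequence {
    A a;
    B b;
    long parse(const std::string& in, std::size_t pos) const {
        const long la = a.parse(in, pos);
        if (la < 0) return -1;
        const long lb = b.parse(in, pos + static_cast<std::size_t>(la));
        return lb < 0 ? -1 : la + lb;
    }
};

// The functor: an object whose call operator creates a strlit parser.
struct str_generator {
    strlit operator()(const char* s) const { return strlit{s}; }
};
const str_generator str_p = str_generator();

// a >> b builds a sequence parser from two existing parsers.
template <typename A, typename B>
sequence<A, B> operator>>(const A& a, const B& b) {
    return sequence<A, B>{a, b};
}

// Usage example.
inline void demo() {
    auto helloworld = str_p("hello") >> str_p("world");
    assert(helloworld.parse("helloworld", 0) == 10);
    assert(helloworld.parse("hello there", 0) == -1);
}

} // namespace sketch_functor
```

Because str_p is an ordinary object with a call operator, str_p("hello") can appear anywhere an expression is allowed, which is what makes it convenient inside grammar expressions.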

Utility Primitives

anychar Matches any character

nothing Matches nothing

epsilon The epsilon; always a successful match with 0 length

alnum Alpha-numeric characters

alpha Alphabetic characters

cntrl Control characters

digit Numeric digits

graph Non-space printing characters

lower Lower case letters

print Printable characters

punct Punctuation symbols

space Space, tab, return, etc.

upper Upper case letters

xdigit Hexadecimal digits
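The names of the single character parsers mirror the C classification functions in <cctype> (isalnum, isalpha, and so on). Assuming each primitive accepts the same character class as its C counterpart, the correspondence can be sketched as below. The namespace sketch_ctype, the char_class wrapper and its matches member are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cctype>

namespace sketch_ctype {

// Hypothetical wrapper: a single-character parser built on a <cctype>-style predicate.
struct char_class {
    int (*pred)(int);
    bool matches(char c) const {
        // Convert through unsigned char: passing a negative value to the
        // <cctype> functions is undefined behaviour.
        return pred(static_cast<unsigned char>(c)) != 0;
    }
};

// Assumption: one primitive per C classification function.
const char_class alnum  = { [](int c) { return std::isalnum(c); } };
const char_class alpha  = { [](int c) { return std::isalpha(c); } };
const char_class cntrl  = { [](int c) { return std::iscntrl(c); } };
const char_class digit  = { [](int c) { return std::isdigit(c); } };
const char_class graph  = { [](int c) { return std::isgraph(c); } };
const char_class lower  = { [](int c) { return std::islower(c); } };
const char_class print  = { [](int c) { return std::isprint(c); } };
const char_class punct  = { [](int c) { return std::ispunct(c); } };
const char_class space  = { [](int c) { return std::isspace(c); } };
const char_class upper  = { [](int c) { return std::isupper(c); } };
const char_class xdigit = { [](int c) { return std::isxdigit(c); } };

// Usage example (results shown hold in the default "C" locale).
inline void demo() {
    assert(digit.matches('7'));
    assert(xdigit.matches('F'));
    assert(space.matches('\t'));
    assert(print.matches(' ') && !graph.matches(' '));
}

} // namespace sketch_ctype
```

The last assertion shows the difference between print and graph: a space is printable but is not a non-space printing character.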