Character and Phrase Levels
Typical parsers regard the processing of characters (symbols that form words
or lexemes) and phrases (words that form sentences) as separate domains.
Entities such as reserved words, operators, literal strings and numerical
constants, which constitute the terminals of a grammar, are usually extracted
first in a separate lexical analysis stage.
At this point, as the examples so far have shown, it is important to note
that, contrary to standard practice, the Spirit framework handles parsing
tasks at both the character level and the phrase level. One may consider that
a lexical analyzer is seamlessly integrated in the Spirit framework.
Although the Spirit parser library does not need a separate lexical analyzer,
there is no reason why we cannot have one. One can always have as many parser
layers as needed. In theory, one may create a preprocessor, a lexical analyzer
and a parser proper, all using the same framework.
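To illustrate what handling both levels in one pass means, here is a minimal hand-written sketch in plain C++. It does not use Spirit, and the names `calc` and `evaluate` are hypothetical. The same set of functions recognizes digits (character level) and arithmetic expressions (phrase level), with no separate tokenizing stage in between.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical scannerless calculator: digits (character level) and
// expressions (phrase level) are handled by the same set of functions.
struct calc {
    const char* p;   // current position in a NUL-terminated input

    // integer ::= digit+            (character level)
    long integer() {
        long v = 0;
        while (std::isdigit(static_cast<unsigned char>(*p)))
            v = v * 10 + (*p++ - '0');
        return v;
    }

    // factor ::= integer | '(' expr ')'
    long factor() {
        if (*p == '(') {
            ++p;                   // consume '('
            long v = expr();
            if (*p == ')') ++p;    // consume ')' if present
            return v;
        }
        return integer();
    }

    // term ::= factor (('*' | '/') factor)*
    long term() {
        long v = factor();
        while (*p == '*' || *p == '/') {
            char op = *p++;
            long rhs = factor();
            if (op == '*') v *= rhs;
            else if (rhs != 0) v /= rhs;   // this sketch ignores division by zero
        }
        return v;
    }

    // expr ::= term (('+' | '-') term)*   (phrase level)
    long expr() {
        long v = term();
        while (*p == '+' || *p == '-') {
            char op = *p++;
            long rhs = term();
            v = (op == '+') ? v + rhs : v - rhs;
        }
        return v;
    }
};

// Evaluates an expression such as "2*(3+4)".
inline long evaluate(const std::string& text) {
    // Precondition: this sketch does not skip whitespace.
    assert(text.find(' ') == std::string::npos);
    calc c{text.c_str()};
    return c.expr();
}
```

A layered design would instead run a tokenizer first and feed its output to these functions; the point here is only that nothing forces the split.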
The framework predefines some parser primitives. The most common are "chlit"
(character literal), "range" (character range) and "strlit" (string literal).
Going back to our original example, the character literals '(', ')', '+', '-',
'*' and '/' in the grammar declaration are chlit objects that are implicitly
created behind the scenes. One may prefer to declare these explicitly as:
chlit<> plus('+');
chlit<> minus('-');
chlit<> times('*');
chlit<> divide('/');
chlit<> oppar('(');
chlit<> clpar(')');
The framework also predefines a few more utility parsers. There is "epsilon",
which matches the null string (it always returns a successful match of length
0), "anychar", which matches any single character (including '\0'), and
"nothing", which never matches anything and always fails. Finally, there is a
full repertoire of single-character parsers: alnum, alpha, cntrl, digit,
graph, lower, print, punct, space, upper and xdigit.
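The behaviour of the three utility parsers can be pictured with a small model. This is a hypothetical sketch, not Spirit's implementation: the function names and the convention of returning the match length (or -1 on failure) are assumptions made for illustration, as is the choice that anychar fails at the end of the input.

```cpp
#include <cassert>

// Hypothetical models of the utility parsers. Each inspects the input
// range [first, last) and returns the match length, or -1 on failure.

// epsilon: always succeeds and consumes nothing.
inline int match_epsilon(const char* first, const char* last) {
    assert(first <= last);
    return 0;
}

// anychar: consumes one character of any value, including '\0'.
// Assumption: it fails only when no input is left.
inline int match_anychar(const char* first, const char* last) {
    assert(first <= last);
    return first != last ? 1 : -1;
}

// nothing: never matches, whatever the input.
inline int match_nothing(const char* first, const char* last) {
    assert(first <= last);
    return -1;
}
```

Note the difference between a successful match of length 0 (epsilon) and a failure (nothing): a caller can continue after the former but must backtrack or stop after the latter.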
The Complete List of Parser Primitives
Basics
chlit | Character literal | Example: chlit<>('X');
range | Character range | Example: range<>('0','9');
strlit | String literal | Example: strlit<>("Hello");
chlit and range are template classes parameterized by the character type,
which defaults to char. Examples: chlit<wchar_t>, range<unsigned char>
strlit is a template class parameterized by the string type, which defaults
to cstring<>. The auxiliary class cstring is a template class parameterized
by the character type, which defaults to char. Examples:
strlit<cstring<wchar_t> >, strlit<cstring<short> >
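The parameterization described above can be pictured with simplified stand-ins. The following is a hypothetical sketch, not Spirit's class definitions: the names `chlit_model`, `range_model` and `strlit_model`, and the `match` member returning a length or -1, are assumptions. For brevity the string model is parameterized directly by character type and stores a `std::basic_string`, rather than taking a string type such as cstring<>.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for chlit, range and strlit. `match` returns the
// number of characters consumed from [first, last), or -1 on failure.
template <typename CharT = char>
struct chlit_model {
    CharT ch;
    explicit chlit_model(CharT c) : ch(c) {}
    int match(const CharT* first, const CharT* last) const {
        return (first != last && *first == ch) ? 1 : -1;
    }
};

template <typename CharT = char>
struct range_model {
    CharT from, to;
    range_model(CharT f, CharT t) : from(f), to(t) {
        assert(!(t < f));   // the range must not be empty
    }
    int match(const CharT* first, const CharT* last) const {
        // Inclusive on both ends: from <= *first <= to.
        return (first != last && !(*first < from) && !(to < *first)) ? 1 : -1;
    }
};

template <typename CharT = char>
struct strlit_model {
    std::basic_string<CharT> str;
    explicit strlit_model(const CharT* s) : str(s) {}
    int match(const CharT* first, const CharT* last) const {
        std::size_t i = 0;
        for (; i < str.size(); ++i, ++first)
            if (first == last || *first != str[i]) return -1;
        return static_cast<int>(i);
    }
};
```

With these definitions, `chlit_model<>('X')` works on char input and `range_model<wchar_t>(L'0', L'9')` on wide-character input, mirroring the defaulted template argument described in the notes.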
Functors
ch_p(ch) | Functor version of chlit
range_p(from, to) | Functor version of range
str_p(string) | Functor version of strlit
These functors are designed to be used within expressions. They are actually
objects that create parsers. Example:
helloworld = str_p("hello") >> str_p("world");
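The idea of an object that creates parsers can be sketched as follows. Everything here is a hypothetical model rather than Spirit's code: `str_gen` plays the role of a generator object like str_p, `strlit_model` is the parser it creates, and `sequence_model` is what this sketch's `operator>>` returns. Matching is purely at the character level, so no whitespace is skipped between the two strings.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical string-literal parser: returns the match length or -1.
struct strlit_model {
    std::string str;
    int match(const char* first, const char* last) const {
        assert(first <= last);
        std::size_t avail = static_cast<std::size_t>(last - first);
        if (avail < str.size()) return -1;
        if (std::memcmp(first, str.data(), str.size()) != 0) return -1;
        return static_cast<int>(str.size());
    }
};

// Hypothetical sequence parser: `a` must match, then `b` right after it.
struct sequence_model {
    strlit_model a, b;
    int match(const char* first, const char* last) const {
        int la = a.match(first, last);
        if (la < 0) return -1;
        int lb = b.match(first + la, last);
        return lb < 0 ? -1 : la + lb;
    }
};

// Composing two parsers with >> yields a sequence parser.
inline sequence_model operator>>(const strlit_model& a, const strlit_model& b) {
    return sequence_model{a, b};
}

// The generator: an object whose call operator creates a parser.
struct str_generator {
    strlit_model operator()(const char* s) const { return strlit_model{s}; }
};
const str_generator str_gen = str_generator();
```

Because `str_gen` is an object and not a class template, an expression such as `str_gen("hello") >> str_gen("world")` needs no explicit template arguments, which is what makes this style convenient inside grammar expressions.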
Utility Primitives
anychar | Matches any character
nothing | Matches nothing
epsilon | The epsilon; always a successful match with 0 length
alnum | Alpha-numeric characters
alpha | Alphabetic characters
cntrl | Control characters
digit | Numeric digits
graph | Non-space printing characters
lower | Lower case letters
print | Printable characters
punct | Punctuation symbols
space | Space, tab, return, etc.
upper | Upper case letters
xdigit | Hexadecimal digits
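The names of the single-character parsers mirror the classification functions of the C library's `<cctype>` header (isalnum, isalpha and so on). Assuming the parsers classify characters the same way, one can model them as a predicate wrapped in a parser. The type `char_class_model` and the `*_model` objects below are hypothetical names for this sketch only.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical single-character class parser wrapping a predicate.
struct char_class_model {
    int (*pred)(int);   // classification predicate, nonzero means "in class"

    // Returns 1 if the first character is in the class, otherwise -1.
    int match(const char* first, const char* last) const {
        assert(first <= last);
        if (first == last) return -1;
        // <cctype> functions require a value representable as unsigned char.
        return pred(static_cast<unsigned char>(*first)) ? 1 : -1;
    }
};

// Thin wrappers so that the predicates have an unambiguous address.
inline int is_digit(int c)  { return std::isdigit(c); }
inline int is_alpha(int c)  { return std::isalpha(c); }
inline int is_space(int c)  { return std::isspace(c); }
inline int is_xdigit(int c) { return std::isxdigit(c); }

const char_class_model digit_model  = { is_digit };
const char_class_model alpha_model  = { is_alpha };
const char_class_model space_model  = { is_space };
const char_class_model xdigit_model = { is_xdigit };
```

The remaining classes in the table (alnum, cntrl, graph, lower, print, punct, upper) would follow the same pattern with the corresponding `<cctype>` function.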