Character and Phrase Levels
Typical parsers regard the processing of characters (symbols that form words or lexemes) and phrases (words that form sentences) as separate domains. Entities such as reserved words, operators, literal strings, numerical constants, etc., which constitute the terminals of a grammar, are usually extracted first in a separate lexical analysis stage.

At this point, as is evident in the examples we have seen so far, it is important to note that, contrary to standard practice, the Spirit framework handles parsing tasks at both the character level and the phrase level. One may consider that a lexical analyzer is seamlessly integrated in the Spirit framework.

Although the Spirit parser library does not need a separate lexical analyzer, there is no reason why we cannot have one. One can always have as many parser layers as needed. In theory, one may create a preprocessor, a lexical analyzer and a parser proper, all using the same framework.
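To make the idea of handling both levels in one pass concrete, here is a minimal hand-written sketch that does not use Spirit at all. The names skip_spaces, parse_integer and parse_sum are hypothetical helpers invented for this illustration; the point is only that the phrase-level rule (integer ('+' integer)*) consumes characters directly, with no separate token stream in between.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Spirit): skip whitespace between lexemes.
inline void skip_spaces(const std::string& in, std::size_t& pos) {
    while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos])))
        ++pos;
}

// Character level: consume one or more digits and accumulate their value.
inline bool parse_integer(const std::string& in, std::size_t& pos, long& value) {
    assert(pos <= in.size());  // precondition: pos is a valid offset
    const std::size_t start = pos;
    value = 0;
    while (pos < in.size() && std::isdigit(static_cast<unsigned char>(in[pos]))) {
        value = value * 10 + (in[pos] - '0');
        ++pos;
    }
    return pos > start;  // at least one digit was consumed
}

// Phrase level: integer ('+' integer)*, driven directly by character input.
inline bool parse_sum(const std::string& in, long& result) {
    std::size_t pos = 0;
    skip_spaces(in, pos);
    if (!parse_integer(in, pos, result)) return false;
    skip_spaces(in, pos);
    while (pos < in.size() && in[pos] == '+') {
        ++pos;
        skip_spaces(in, pos);
        long rhs = 0;
        if (!parse_integer(in, pos, rhs)) return false;
        result += rhs;
        skip_spaces(in, pos);
    }
    return pos == in.size();  // succeed only if all input was consumed
}
```

Calling parse_sum("12 + 30", r) is expected to succeed and leave 42 in r, while an input such as "12 +" fails because the phrase-level rule requires an integer after each '+'.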
The framework predefines some parser primitives. The most common are "chlit" (character literal), "range" (character range), and "strlit" (string literal). Going back to our original example, the character literals '(', ')', '+', '-', '*' and '/' in the grammar declaration are chlit objects that are implicitly created behind the scenes. One may prefer to declare these explicitly as:
chlit<> plus('+');
chlit<> minus('-');
chlit<> times('*');
chlit<> divide('/');
chlit<> oppar('(');
chlit<> clpar(')');
The framework also predefines a few more utility parsers. There is "epsilon", which matches the null string (always returns a successful match with 0 length), "anychar", which matches any single character (including '\0'), and "nothing", which never matches anything and always fails. Finally, there is a full repertoire of single character parsers: alnum, alpha, cntrl, digit, graph, lower, print, punct, space, upper and xdigit.
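The behaviour described above can be sketched with a few free functions. These are hypothetical illustrations rather than Spirit's implementation: the _sketch names and the "match length or -1" return convention are assumptions, and the single character parsers are assumed here to correspond to the <cctype> classification functions of the same names. Only three of them are shown; the others would follow the same pattern.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true when pos refers to a character of the input.
inline bool has_char(const std::string& in, std::size_t pos) {
    assert(pos <= in.size());  // pos may be the end of input, never beyond it
    return pos < in.size();
}

// epsilon: always succeeds and consumes nothing.
inline std::ptrdiff_t epsilon_sketch(const std::string&, std::size_t) { return 0; }

// nothing: always fails.
inline std::ptrdiff_t nothing_sketch(const std::string&, std::size_t) { return -1; }

// anychar: consumes any one character, including '\0', if one is available.
inline std::ptrdiff_t anychar_sketch(const std::string& in, std::size_t pos) {
    return has_char(in, pos) ? 1 : -1;
}

// Single character parsers built on the <cctype> classification functions.
inline std::ptrdiff_t alpha_sketch(const std::string& in, std::size_t pos) {
    return has_char(in, pos) && std::isalpha(static_cast<unsigned char>(in[pos])) ? 1 : -1;
}
inline std::ptrdiff_t digit_sketch(const std::string& in, std::size_t pos) {
    return has_char(in, pos) && std::isdigit(static_cast<unsigned char>(in[pos])) ? 1 : -1;
}
inline std::ptrdiff_t space_sketch(const std::string& in, std::size_t pos) {
    return has_char(in, pos) && std::isspace(static_cast<unsigned char>(in[pos])) ? 1 : -1;
}
```

Note that epsilon_sketch succeeds even at the end of the input, whereas anychar_sketch needs at least one remaining character.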
The Complete List of Parser Primitives

Basics
chlit | Character literal | Example: chlit<>('X');
range | Character range | Example: range<>('0','9');
strlit | String literal | Example: strlit<>("Hello");

chlit and range are template classes parameterized by character type, which defaults to char. Examples: chlit<wchar_t>, range<unsigned char>

strlit is a template class parameterized by the string type, which defaults to cstring<>. The auxiliary class cstring is a template class parameterized by the character type, which defaults to char. Examples: strlit<cstring<wchar_t> >, strlit<cstring<short> >

Functors
ch_p(ch) | Functor version of chlit
range_p(from, to) | Functor version of range
str_p(string) | Functor version of strlit

These functors are designed to be used within expressions. They are actually objects that create parsers. Example:
helloworld = str_p("hello") >> str_p("world");

Utility Primitives
anychar | Matches any character
nothing | Matches nothing
epsilon | The epsilon, always a successful match with 0 length

alnum | Alpha-numeric characters
alpha | Alphabetic characters
cntrl | Control characters
digit | Numeric digits
graph | Non-space printing characters
lower | Lower case letters
print | Printable characters
punct | Punctuation symbols
space | Space, tab, return, etc.
upper | Upper case letters
xdigit | Hexadecimal digits
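
To illustrate how a generator such as str_p can create parser objects inside an expression, here is a small self-contained sketch. It is not Spirit's code: strlit_sketch, sequence_sketch, the generator function str_sketch and the single operator>> overload are hypothetical names chosen for this example, and the generator is written as a plain function rather than a functor object to keep the sketch short. The sequence is assumed to match its operands back to back, with no whitespace skipping.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Minimal stand-in for a string-literal parser (not Spirit's strlit).
// parse() returns the number of characters matched, or -1 on failure.
struct strlit_sketch {
    std::string str;
    std::ptrdiff_t parse(const std::string& in, std::size_t pos) const {
        if (pos > in.size() || in.compare(pos, str.size(), str) != 0) return -1;
        return static_cast<std::ptrdiff_t>(str.size());
    }
};

// Sequence of two parsers: succeeds only if b matches right after a.
template <typename A, typename B>
struct sequence_sketch {
    A a;
    B b;
    std::ptrdiff_t parse(const std::string& in, std::size_t pos) const {
        assert(pos <= in.size());  // precondition: pos is a valid offset
        const std::ptrdiff_t la = a.parse(in, pos);
        if (la < 0) return -1;
        const std::ptrdiff_t lb = b.parse(in, pos + static_cast<std::size_t>(la));
        if (lb < 0) return -1;
        return la + lb;
    }
};

// Generator in the style of str_p: builds a parser object from its argument.
inline strlit_sketch str_sketch(const std::string& s) {
    strlit_sketch p;
    p.str = s;
    return p;
}

// Only strlit >> strlit is provided here, to avoid hijacking unrelated >> uses.
inline sequence_sketch<strlit_sketch, strlit_sketch>
operator>>(const strlit_sketch& a, const strlit_sketch& b) {
    sequence_sketch<strlit_sketch, strlit_sketch> seq = { a, b };
    return seq;
}
```

Under these assumptions, str_sketch("hello") >> str_sketch("world") yields a parser object that matches "helloworld" but not "hello world".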