Symbols

Ternary State Trees

The actual set implementation is supplied by the SetT template parameter (3rd template parameter of the symbols class) . By default, this uses the tst class which is an implementation of the Ternary Search Tree.

Ternary Search Trees are faster than hashing for many typical search problems especially when the search interface is iterator based. Searching for a string of length k in a ternary search tree with n strings will require at most O(log n+k) character comparisons. TSTs are many times faster than hash tables for unsuccessful searches since mismatches are discovered earlier after examining only a few characters. Hash tables always examine an entire key when searching.

For details see
http://www.cs.princeton.edu/~rs/strings/.

This class symbols implements a symbol table. The symbol table holds a dictionary of symbols where each symbol is a sequence of CharTs. The template class is parametized by the character type (CharT) can work efficiently with 8, 16, 32 and even 64 bit characters.

Traditionally, symbol table management is maintained seperately outside the BNF grammar through semantic actions. Contrary to standard practice, the Spirit symbol table symbols is-a parser. An instance of which may be used anywhere in the EBNF grammar specification. It is an example of a dynamic parser. A dynamic parser is characterized by its ability to modify its behavior at run time. Initially, an empty symbols object matches nothing. At any time, symbols may be added, thus, dynamically altering its behavior.

Each entry in a symbol table has an associated mutable data slot. In this regard, one can view the symbol table as an associative container (or map) of key-value pairs where the keys are strings. The data associated with each symbol may be modified any time. Later, when we get to the section on specialized actions, we shall see how that is done.

The symbols class expects two template parameters (actually there is a third, see detail box above). The first parameter (T) specifies the data type associated with each symbol (defaults to int) and the second parameter (CharT) specifies the character type of the symbols (defaults to char). Here are some sample declarations:

symbols<> sym;
symbols<short, wchar_t> sym2; struct my_info { int id;
double value; }; symbols<my_info> sym3;

After having declared our symbol tables, symbols may be added statically using the construct:

 sym = a, b, c, d ...;

where sym is a symbol table and a..d etc. are strings. Note that the comma operator is separating the items being added to the symbol table, through an assignment. Due to operator overloading this is possible and correct (though it may take a little getting used to) and is a concise way to initialize the symbol table with many symbols. Also, it is perfectly valid to make multiple assignments to a symbol table to iteratively add symbols (or groups of symbols) at different times.

Simple example:

 sym = "pineapple", "orange", "banana", "apple", "mango";
Note that it is invalid to add the same symbol multiple times to a symbol table, though you may modify the value associated with a symbol artibrarily many times.

Now, we may use sym in the grammar. Example:

fruits = sym >> *(',' >> sym);

This simple grammar fruits accepts a comma separated list of my favorite fruits. Using specialized actions, we can dynamically add more fruity entries into our symbol table sym. We shall deal with this later.