This page will contain FAQ entries which will eventually be incorporated into the FAQ section of the documentation.
- Large Memory Consumption Compiling Spirit Parser on g++
- Correctly Parsing Identifiers
- Rule and Grammar Attributes
- Semantic Action for Mismatches
- Parse Or Supply a Default
- Alternatives and Attributes
- Skipper-less Rules in Phrase Parsing
Question: I am trying to assign a rule, y, to another rule, x:
x = y;
This fails at runtime with a Boost Assertion saying that I’m trying to initialize a rule from an uninitialized one. What is happening?
Answer: Spirit 2 rules conform to ‘proper’ C++ copy/assign semantics. This gives you the freedom to copy/store/assign your rules as you do any C++ object (pass them as arguments, return them from functions, store them in STL containers, etc). This is a major difference between Classic Spirit and Spirit 2.
You are trying to assign a rule (in the C++ sense rather than the BNF sense) to a different instance of the same type. There is no way to deduce whether you intend to refer to the right-hand-side rule or to make a (C++) copy of it. So in this situation you need to write:
x = y.alias();
to make the lhs a ‘logical’ copy of the rhs (and not a copy in the pure C++ sense).
Question: Hmm… then how about
x = eps >> y;
should there be alias too?
Answer: No. alias() is needed only when the RHS consists of a single rule and nothing else.
Question: I wrote a spirit parser to read in a file which acts as a parameter file for a finite-element simulation code. The file itself has approx. 200 to 300 lines but the grammar for this is quite large, because you have very few repetitions in the job-file.
During the implementation I discovered that compilation needs more and more memory (and time) as the grammar grows. Right now I need 8GB at peak to compile the parser!
Is there a way to reduce the RAM needed for compilation? The machines I have access to have at most 8GB of RAM.
Answer: With gcc, the debugging information uses up a lot of RAM. Try compiling with the '-g0' flag.
Question: I store identifiers in a symbol table. My problem is that prefixes are being incorrectly matched. I don’t want that. For example, if I have an identifier named “bat” in the symbol table, “battery” is being incorrectly parsed as “bat” followed by a trailing “tery”. Is there a way to allow whole identifiers only?
Answer: Sure. The not-predicate is your friend. Check out calc6 code, for example:
>> !(alnum | '_') // make sure we have whole words
ensures that the matched identifier is not immediately followed by an alnum or an underscore.
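To make the look-ahead idea concrete, here is a minimal sketch in plain C++, without Spirit. The helper name matches_whole_word is hypothetical and is not part of any library; it only imitates what a symbol match followed by !(alnum | '_') checks: the word must match, and the next character (if any) must not be alphanumeric or an underscore. The look-ahead consumes nothing.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not a Spirit API: returns true if `word` occurs at
// position `pos` of `input` and is NOT immediately followed by an
// alphanumeric character or '_'.
bool matches_whole_word(const std::string& input, std::size_t pos,
                        const std::string& word)
{
    if (pos > input.size())
        return false;  // nothing to match past the end of the input

    // Step 1: the word itself must match (the symbol table's job).
    if (input.compare(pos, word.size(), word) != 0)
        return false;

    // Step 2: the look-ahead check. No input is consumed here.
    std::size_t next = pos + word.size();
    if (next == input.size())
        return true;  // end of input, so nothing can follow the word

    unsigned char c = static_cast<unsigned char>(input[next]);
    return !(std::isalnum(c) || c == '_');
}
```

With this check, "bat" is accepted in "bat + 1" but rejected as a prefix of "battery".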
Question: Why do I have to specify the attribute for rules and grammars using the strange function declaration style:
qi::rule<Iterator, std::string(), skipper> r;
Answer: This syntax is equivalent to:
typedef std::string f_type();
rule<Iterator, f_type, skipper> r;
The C++ standard calls it a ‘type-id’, “which is a declaration for an object or function of that type that omits the name of the object or function”. We usually call it function declaration syntax (even if it probably should be called function type-id syntax). It is used not only in Spirit but by several other libraries as well (see boost::function for instance).
The main reason for this is not because we wanted it that way, but because it is a nice and concise way of specifying a function interface. If you think about what a rule is, you quickly realize that it is like a function: it may return a value (the parsed result, i.e. synthesized attribute). Additionally, it may take one or more arguments (the inherited attributes).
A second, but minor reason is that we need to be able to distinguish the different template parameters of a rule. The template parameters can be specified in any order (except for the iterator), so we need some means of figuring out what’s what. The function declaration syntax is sufficiently different from the skipper or the encoding to allow it to be recognized at compile time.
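The following sketch shows, in standard C++ only, how a template can take a function type apart and how such a type can be recognized at compile time. The trait name signature_traits is made up for this illustration; it is not Spirit's actual implementation, just one way the technique can work.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical trait (not part of Spirit): splits a function type
// R(Args...) into its result type and its number of arguments.
template <typename Signature>
struct signature_traits;  // primary template intentionally left undefined

template <typename R, typename... Args>
struct signature_traits<R(Args...)> {
    typedef R result_type;                              // "synthesized attribute"
    static constexpr std::size_t arity = sizeof...(Args);  // "inherited attributes"
};

// The two spellings from the answer above name the same function type.
typedef std::string f_type();
static_assert(std::is_same<f_type, std::string()>::value,
              "typedef and inline spelling are the same type");

// A function type is easy to distinguish from an ordinary object type,
// which is what lets a template tell a signature from other parameters.
static_assert(std::is_function<std::string()>::value, "a signature");
static_assert(!std::is_function<std::string>::value, "a plain type");
```

Here signature_traits<std::string()>::result_type is std::string, and signature_traits<int(double, char)>::arity is 2.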
Question: If a rule matches, and thus executes its semantic actions, but a rule which includes that rule mismatches, there seems to be no way to “unwind” the code executed down the chain. For example, if one of your semantic actions allocates memory or increments a reference count, how do you free/release that reference in the mismatch scenario?
Answer: One option is to add a fallback alternative that runs a cleanup action:
r = p | eps[cleanup];
If p fails with side-effects from its direct or indirect semantic actions, the cleanup semantic action can roll them back.
Question: There are times when you want to parse, say, a number or provide a default if it isn’t in the input. How can you do that? (Thanks to Robert Stewart and Michael Caisse for this entry)
Answer: The parser for that value is optional, so the usual mechanism is something like this:
-(uint_ >> 's')
(The ‘s’ parser is present just to illustrate that there can be more expected to follow the optional part of interest.)
The problem is how to provide a default value should there be no number (and whatever follows it). A naive approach is the following:
rule<Iterator, unsigned()> s;
s = (uint_ >> 's') | eps[_val = 0];
Because there’s a semantic action in that rule, attribute compatibility rules are disabled. Thus, %= is needed rather than =. Secondly, the assignment must be lazy, so boost::phoenix::val() must be used:
s %= (uint_ >> 's') | eps[_val = val(0)];
This can be done more easily using boost::spirit::qi::attr(), however, which obviates the semantic action and thereby eliminates the need for %=. This leaves us with the following simple solution:
s = (uint_ >> 's') | attr(0);
Question: I have this rule:
r = a | b;
where a, b, and r all have a container attribute (e.g. std::vector). When a fails and Qi backtracks and tries b, it doesn’t clear the vector before trying the second alternative, and the final attribute ends up doubled. That seems like a bug to me.
Answer: If a fails, then it backtracks and tries b. As a side effect, the work done by a is not rolled back. Why is the action not undone? Well, for the sake of efficiency. We’ve found that it is very expensive to roll back the attribute to its original state on a failed match. Doing so would involve lots of copying and swapping. Consider this pseudo code:
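Attr save = attr;          // copy made up front, on every call
if (parse(f, l, attr))     // if successful
    return true;
swap(save, attr);          // otherwise restore the original state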
Take note that you always pay for that extra copy every time!
To be very honest with you, we are actually violating a postcondition that parsers were required to meet: On a failed match, the attribute is left untouched. You can still see this in the docs up to version 2.4.2. Now, to avoid further confusion, this postcondition has been changed to:
On a failed match, the attribute state is undefined.
And with that, it will be clear that when a fails, we make no guarantee that the attribute will be in a consistent state. Actually, if you think about it, it is the same as:
r = a[f] | b[f];
where, when a fails, it does not roll back what f has already done. So this behavior is consistent with how semantic actions work.
How do you deal with cases like this? The hold directive is your friend: see Parser Directive for Attribute Commit/Rollback. For example:
r = hold[a] | b;
hold does the saving and swapping for you.
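The doubling effect and the copy-then-swap remedy can be sketched in plain C++ without Spirit. All names below (digits_then, alternative_plain, alternative_hold) are hypothetical stand-ins invented for this illustration; they only model r = a | b and r = hold[a] | b for a toy pair of parsers that collect digits into a vector.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a parser: appends the leading digits of `input` to
// `attr`, then requires the character `terminator`. Digits appended before
// a failure stay in `attr`; nothing is rolled back.
bool digits_then(const std::string& input, char terminator,
                 std::vector<int>& attr)
{
    std::size_t i = 0;
    while (i < input.size() &&
           std::isdigit(static_cast<unsigned char>(input[i]))) {
        attr.push_back(input[i] - '0');
        ++i;
    }
    return i < input.size() && input[i] == terminator;
}

// Models r = a | b: both alternatives write into the same attribute.
bool alternative_plain(const std::string& input, std::vector<int>& attr)
{
    return digits_then(input, ';', attr) || digits_then(input, '.', attr);
}

// Models r = hold[a] | b: a works on a copy, which is swapped into the
// real attribute only if a succeeds.
bool alternative_hold(const std::string& input, std::vector<int>& attr)
{
    std::vector<int> copy(attr);
    if (digits_then(input, ';', copy)) {
        attr.swap(copy);
        return true;
    }
    return digits_then(input, '.', attr);
}
```

For the input "12.", the plain version leaves four elements in the vector (1, 2, 1, 2), while the hold-style version leaves two (1, 2). The copy is paid only where it is asked for, which is the trade-off described above.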
Question: I have this start rule:
qi::rule<iterator_type, qi::space_type> start;
start = (int_ % ',');
This parser works great using qi::phrase_parse. However, parsing fails when I refactor the RHS into another rule and use an alias of that in my start rule:
vec = (int_ % ',');
start = vec.alias();
What am I doing wrong?
Answer: A rule needs to know what skipper to use. When a rule that has no skipper is invoked from within phrase_parse, the effect is the same as an implicit lexeme around its right-hand side. You don’t want that. You need to declare your vec rule with a skipper, just as you did with your start rule:
qi::rule<iterator_type, qi::space_type> vec;