This page is a compilation of best practices for using Spirit.
- Separate grammar construction from parsing. I am not entirely sure this merits an entry, since it is pretty much C++ 101 and not directly related to Spirit. Still, it is short, so let’s have it as our first entry. Spirit ships with lots of examples and, for brevity, parsing immediately follows the construction of the grammar in them. Example (example/qi/roman.cpp):
    roman roman_parser; // Our grammar
    /*...*/
    bool r = parse(iter, end, roman_parser, result);
In real-world usage, constructing the grammar right before every parse is not efficient. Grammars are meant to be constructed once and used many times, so it is always a good idea to separate construction from parsing.
There are exceptions, for sure. Daniel James noted (see comments below) that for non-context-free grammars that require a reference to some state, the easiest way is to construct a new grammar each time.
- Avoid complex rules. Rules with complex definitions hurt the compiler badly. We’ve seen rules that are more than a hundred lines long and take a couple of minutes to compile. On some compilers, experience shows that compile time grows exponentially with the length of the RHS expression. C++ compilers were not designed to handle such big expressions, and some simply cannot cope and crash. It is always best to break complex rules into more manageable, easier-to-digest parts. Doing so also makes the rules more readable.
- Avoid complex grammars. Try as much as possible to modularize big grammars into smaller sub-grammars. Spirit grammars are composable. Try to identify the grammar parts, especially those that can be reused, and separate them into their own sub-grammars. Reusable grammars are a real advantage. For example, how often have you written a rule for identifiers?
- Take things one step at a time. Don’t try to write a grammar that covers all the complexity of your input. Start with the simplest piece of the input and write a parser for that. Gradually add more rules to your grammar as you cover more complexity in the input.
You can either develop the whole parsing bit first and then work on the semantic actions associated with each rule, or you can ping-pong back and forth between parsing and actions. I don't know which is better yet. I've been doing a little bit of both.
It’s worth mentioning that quickbook is constantly constructing new grammars. One problem is that when the grammar isn’t context-free, it needs a reference to some state. The easiest way to do that is to construct a new grammar each time.
I think it’s also worth mentioning splitting complicated grammars into several files; that seems to come up a lot.
On second thoughts, I don’t think grammars with state should be mentioned since that’s a more advanced topic. Best practices should be established from the start; knowing when and how to break them can come later.
I haven’t actually used it yet, but error handling is probably worth a mention. Perhaps also some simple ways to keep the grammar efficient (maybe avoiding excessive backtracking?), although that might be best left for another article.
I kept it anyway. It’s just a one-liner.
I agree that a lot of people ask for help on how to reduce compile times by splitting their grammars across several translation units. After doing some reading in the mailing list archives and piecing together a few hints in the documentation, I was able to pull it off. I feel that this is requested often enough that it could be explained here.
I could contribute some example code if that would help.
It may be a little on the advanced side, but it would encourage people to create smaller, simpler grammars.
Definitely! That would be good.
Spirit’s mini_c example shows how to do the separation!
Well, grammars are meant to be constructed once and used many times. In my (and others’) case it would be useful to have an example of this best practice in practice, maybe a simple threaded parse function with a ‘global’ grammar. Where should grammars be placed, and how should they be accessed, as a best practice? Anonymous namespace, static global grammar, foo_init(), singleton, etc.?
Thanks,
Olaf
BTW, even in Spirit’s scheme example the grammar isn’t reused, is it?
Sorry, after thinking about it, even more ideas and questions come up. All the examples I’ve seen bind the error handler to the grammar (scheme, for example), where the grammar takes the filename for reporting errors through the handler. What are the recommendations on this? It may prevent simple reuse (or I have to rebind the phoenix::function again, or similar). Maybe the scheme and mini_c examples should follow this best practice as well, since they serve as inspiration for non-professionals like me.
Regards,
Olaf
Hmm, I’m not sure how binding the error handler to the grammar prevents reuse? What am I missing?
Hey, guys,
I recently started using Spirit; my platform is MinGW.
Because the grammar includes lots of rules, compilation takes a very long time:
1. debug target: does not compile. The errors are:
-------------- Build: Debug in jack ---------------
Compiling: main.cpp
cc1plus.exe: out of memory allocating 38714 bytes
Process terminated with status 1 (9 minutes, 33 seconds)
0 errors, 0 warnings
2. release target: compiles, but takes a long time (>5 mins).
I know there is a lot of compile-time computation going on, and that it consumes a lot of RAM. In my case the compiler needs more than 2 GB of RAM, but my computer has exactly 2 GB, so once the RAM is used up, both the compiler and the system slow down considerably.
Any suggestions on how to write rules and grammars more efficiently?
C++ templates are good, but they have their downsides as well.
anders
Well, as it says in #2, avoid very complex rules. It is best to keep rules as simple as possible. Do you have rules spanning multiple lines? Break them up into smaller rules. As for #3, I’ll be providing a simple example on modular grammar construction (Josh gave me a small example, but I am not sure yet how to best proceed) or you might want to look at the mini_c example.
thanks Joel.
After using Spirit for about two months, I find that char_ is sometimes an invisible killer. Say someone wants to ignore something while parsing; they may write something like this:
When they parse something like this kind of fragment:
Here it cannot be parsed successfully. Why? Because the expr will eat the whole:
after it sees the != sign. It is terrible! Once this happens, the unlucky author will spend a lot of time finding the root cause. I am one of those poor souls.
The solution for this is, for example:
At the very least, do not let it run across C++ lines.
So I wonder whether the following could be considered best practices:
1. Parse as precisely as you can. That is, if a certain identifier can be recognized as
then it’s a bad idea to use something like this:
2. If a rule starts like this:
then it is a good idea to look at it twice. It is definitely a bug candidate.
Dear Joel,
You say “Avoid complex rules”. But what is the performance penalty for breaking up complex grammars and rules into smaller composable pieces?
I have been using smaller “sub-rules”, each with its own name, for better error handling, but I am unsure how much of a performance penalty I am paying.
Leo
There is no performance penalty.
Well, that’s not my experience.
Whenever you break up a rule, you create a qi::rule object, which calls the enclosed parser through a virtual function, and that is slower. For a parser that is called for every word in a text, this can make parsing 2x slower. For a parser called for every character it would be even worse.
Of course you can break the rule using auto (or BOOST_AUTO), but this just clarifies the code, it doesn’t really break the rule from the compiler’s perspective.
Am I missing something?
Hi,
There is a discussion on the Spirit user mailing list about what I think is a problem when using Spirit if you want your grammar to be modular.
So, can you please have a look at this problem? I think it matters more for wider adoption of Spirit than many of the finer details do.
http://news.gmane.org/gmane.comp.parsers.spirit.general
The subject is: “questions on modular grammars”
anders
The Scalpel project models the standard C++ grammar, but it writes all the rules together in one grammar (struct), so there is no modularity in Scalpel. I think modularity should mean separating the grammars into different files and different grammar structs.
But then the “object recursively constructed” problem arises.
So, can the Spirit developers give some ideas on how this modularity can be achieved? (Please have a look at the discussion.)