This page is a compilation of best practices for using Spirit.
- Separate grammar construction from parsing. I am not entirely sure this merits an entry here, since it is pretty much C++ 101 and not specific to Spirit. Still, it is short, so let’s have it as our first entry. Examples speak volumes, and Spirit has lots of them. For brevity, parsing immediately follows the construction of the grammar in those examples. Example (example/qi/roman.cpp):
roman roman_parser; // Our grammar
/*...*/
bool r = parse(iter, end, roman_parser, result);
In real-world usage, this is not efficient. Grammars are meant to be constructed once and used many times, so it is always a good idea to separate construction from parsing.
There are exceptions, for sure. Daniel James noted (see comments below) that for non-context-free grammars that require a reference to some state, the easiest way is to construct a new grammar each time.
- Avoid complex rules. Rules with complex definitions hurt the compiler badly. We’ve seen rules that are more than a hundred lines long and take a couple of minutes to compile. On some compilers, experience shows that compile time grows exponentially with the length of the RHS expression. C++ compilers were not designed to handle such big expressions, and some simply cannot cope and crash. It is always best to break complex rules into more manageable, easier-to-digest parts. Doing so also makes the rules more readable.
- Avoid complex grammars. Try as much as possible to modularize big grammars into smaller sub-grammars. Spirit grammars are composable. Try to identify the grammar parts, especially those that can be reused, and separate them into their own sub-grammars. Reusable grammars are a real advantage. For example, how often have you written a rule for identifiers?
It’s worth mentioning that quickbook is constantly constructing new grammars. One problem is that when the grammar isn’t context-free, it needs a reference to some state. The easiest way to do that is to construct a new grammar each time.
I think it’s also worth mentioning splitting complicated grammars into several files; that seems to come up a lot.
On second thoughts, I don’t think grammars with state should be mentioned since that’s a more advanced topic. Best practices should be established from the start, knowing when and how to break them can come later.
I haven’t actually used it yet, but error handling is probably worth a mention. Perhaps also some simple ways to keep the grammar efficient (maybe avoiding excessive backtracking?), although that might be best left for another article.
I kept it anyway. It’s just a one-liner.
I agree that a lot of people ask for help on how to reduce compile times by splitting their grammars across several translation units. After doing some reading in the mailing list archives and piecing together a few hints in the documentation, I was able to pull it off. I feel that this is requested often enough that it could be explained here.
I could contribute some example code if that would help.
It may be a little on the advanced side, but it would encourage people to create smaller, simpler grammars.
Definitely! That would be good.
Spirit’s mini_c example shows how to do this separation!
Well, grammars are meant to be constructed once and used many times. In my case (and others’), it would be useful to have an example of this best practice in action, maybe a simple threaded parse function with a ‘global’ grammar. Where should grammars be placed, and how should they be accessed? Anonymous namespace, static global grammar, foo_init(), singleton, etc.?
Thanks,
Olaf
BTW, even in Spirit’s scheme example the grammar isn’t reused, is it?
Sorry, after thinking about it, even more ideas and questions come up. All the examples I’ve seen bind the error handler to the grammar (scheme, for example), where the grammar takes the filename for reporting errors through the handler. What are the recommendations on this? It may prevent simple reuse (or I have to rebind the phoenix::function again, or something similar). Maybe the scheme and mini_c examples should follow this best practice too, since they serve as inspiration for non-professionals like me.
Regards,
Olaf
Hmm, I’m not sure how binding the error handler to the grammar prevents reuse? What am I missing?
Hey, guys,
I recently started using Spirit; my platform is MinGW.
Because the grammar includes lots of rules, compilation takes a long time:
1. Debug target: cannot compile. The errors are:
-------------- Build: Debug in jack ---------------
Compiling: main.cpp
cc1plus.exe: out of memory allocating 38714 bytes
Process terminated with status 1 (9 minutes, 33 seconds)
0 errors, 0 warnings
2. Release target: compiles, but takes a long time (>5 mins).
I know there is a lot of compile-time computation going on, and it consumes a lot of RAM. In my example, the compile needs more than 2 GB of RAM, but my computer has exactly 2 GB. So when the RAM is used up, both the compile and the system slow down considerably.
Any suggestions on how to write rules and grammars more efficiently?
C++ templates are good, but they have their downside as well.
anders
Well, as it says in #2, avoid very complex rules. It is best to keep rules as simple as possible. Do you have rules spanning multiple lines? Break them up into smaller rules. As for #3, I’ll be providing a simple example on modular grammar construction (Josh gave me a small example, but I am not sure yet how to best proceed) or you might want to look at the mini_c example.
Thanks, Joel.
In my two months or so of using Spirit, I have found that char_ is sometimes an invisible killer. Say someone wants to ignore something while parsing. He may write something like this:
expr = *( char_ - "!=" ) ;
if_statement = blablabla ;
When he parses a fragment like this:
if(iFs->Open() != KErrNone )
{
User::Leave(whatever);
}
else if( from here goes wrong… )
{
intL();
}
else
{
if( KErrNone != xyz->Open())
{
hereL();
}
}
Here it cannot be parsed successfully. Why? Because expr will eat the whole of:
if( from here goes wrong… )
{
intL();
}
else
{
if( KErrNone
and only stops when it sees the != sign. It is terrible! Once this happens, the poor programmer will spend a lot of time finding the root cause. I am one of those poor programmers.
One solution for this is, for example:
expr = *( char_ - "!=" - ';' ) ;
At least this does not let it run across C++ lines.
So I suggest that the following could be considered best practices:
1. Parse as precisely as you can. That is, if a certain identifier can be recognized as alpha >> *alnum, then it’s a bad idea to use something like *(char_ - "_").
2. If a rule starts like this:
rule A = *(char_ - "}") >> ruleB >> ruleC;
then it is a good idea to look at it twice. It is definitely a bug candidate.