Nov 19

The Spirit2 Advantage

By Joel de Guzman

Why would you want to upgrade to Spirit2 from “Classic” Spirit?

First: Design. “Classic” Spirit evolved mostly ad hoc. Over time, the library accumulated features, one on top of another. While the basic design proved sufficient for extensions up to a point, the accumulated complexity eventually became a real obstacle to advancement. We needed to refactor the library into smaller sub-libraries and modules. Spirit2 is a complete overhaul and redesign. This required new infrastructure libraries such as Fusion and Proto and the new Phoenix, which took a lot of time to perfect. These libraries are now 100% solid Boost-certified libraries in their own right. In terms of design, Spirit2 is just about perfect! 🙂

Second: Performance. “Classic” Spirit was not optimized for speed. We didn’t actually spend time optimizing the code; the main focus was on perfecting the interface and building a solid library. A couple of issues were rather problematic. For example, closures and grammars use thread-local storage (TLS) when multithreading is enabled and statics when it is not. That alone costs dearly.

Now attributes… Somehow, someway, you’ll add semantics to your parser. You’ll want it to compute something, compile something, translate something, etc. In all certainty, you’ll be generating “something”. You won’t write a parser just to check if the input conforms to the grammar. That something… is your attribute. It’s like in functional programming where you always return “something”.

In “Classic” Spirit, we have facilities to generate an AST. In Spirit2, an AST is just a specialized form of, you guessed it, attribute —the AST IS your “something”. In Spirit2, there’s no special case for ASTs. You just make your parser generate your tree for you. You map your grammar to your data structure and, bingo, the parser compiles the data in situ. You can even use your own structs.

Now, in “Classic” Spirit, trees cost a lot. The parse tree and AST in “classic” use a runtime tree data structure composed of vectors of nodes, which we’ve found to be rather expensive. In Spirit2, we use variants and tuples as much as possible. These data structures are known to be an order of magnitude faster than dynamic containers.

You don’t pay for it when you don’t need it. If you don’t need a certain attribute, just pass unused — it is a special type that inhibits attributes and attribute computation.

So how does this translate to performance?

Here’s an informal benchmark sent in by Denis Taniguchi, an early Spirit2 adopter:

Just a follow-up on the performance problems I was facing using Spirit 1.x with parse trees. I managed to switch completely to Spirit 2 and I thought it was interesting to post the results:

Spirit 1.x [ straight pattern matching (no actions) ]
real    0m0.359s
user    0m0.332s
sys     0m0.012s

Spirit 1.x [ parse tree generation ]
real    0m9.305s
user    0m9.209s
sys     0m0.076s

Spirit 2.1 [ AST with variant nodes generation ]
real    0m0.459s
user    0m0.432s
sys     0m0.028s

All done in the same computer using the same file as input.

Take note that the benchmark Spirit 1.x code does not do anything at all —it is just a pattern matcher— while the Spirit 2.1 code collects and saves the relevant data from the input.

3 Responses to “The Spirit2 Advantage”

  1. Joel de Guzman says:

    This one just came in from Henry Tan:

    So to justify the cost of converting all of my grammar to Spirit.Qi, I did a small experiment parsing tuples of expressions. I have equivalent grammars in Qi and Classic that parse tuples of expressions. I generate random inputs containing about 50 expressions in the form of

    <name>=<value>, <name><subname>:<subvalue>=<value>

    The length of the string varies. I loop through (5000x and 50,000x) the same input set and feed it to Qi and Classic. Without getting too scientific in the approach, the result basically shows that Spirit.Qi is about 7.8x faster than Spirit.Classic, and both produce semantically equivalent ParseTrees. For both the 5000x and 50K runs, the result is a very linear 7.8x gain. I believe it was mainly because of a slimmer ParseTree that can be fine-tuned better.

  2. C++0X says:

    You are only talking about runtime performance. IMHO, we should not ignore the compile-time overhead.

