Sep ’14 03

Spirit.Qi numeric parsers are the fastest in the world of C++ and surpass even hand-written, low-level parsers such as atoi and atof, which are written in C. This is old news, actually. The numeric parsers have been highly optimised from the very start, when I began writing Spirit V2. The tests were written in 2009, yet they are still very relevant now. Here are the benchmark results from the same tests, which I ran today on a Mac using Clang. The benchmark code is on GitHub (qi/workbench):

Integer tests

atoi_test:  4.7452590000 [s] {checksum: 213ce33a}
strtol_test: 4.5116520000 [s] {checksum: 213ce33a}
spirit_int_test: 1.4126110000 [s] {checksum: 213ce33a}

Double tests

atof_test: 3.2435210000 [s] {checksum: 84a4f7d}
strtod_test: 3.1327660000 [s] {checksum: 84a4f7d}
spirit_double_test: 0.7237810000 [s] {checksum: 84a4f7d}
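
For readers who want a feel for how a timing-plus-checksum test of this kind can be put together, here is a minimal sketch. It is not the code from qi/workbench: `make_inputs`, `run_test`, `print_result` and `run_c_library_tests` are hypothetical names, the input generator is an arbitrary choice, and only the two C library functions are exercised so that the sketch needs nothing beyond the standard library. The checksum is there so the compiler cannot discard the parsed values and so that different parsers can be checked against each other.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: builds a fixed, reproducible set of integer strings.
std::vector<std::string> make_inputs(std::size_t count) {
    std::vector<std::string> inputs;
    inputs.reserve(count);
    std::uint32_t state = 12345u;  // simple LCG, same sequence on every run
    for (std::size_t i = 0; i < count; ++i) {
        state = state * 1664525u + 1013904223u;
        // 24-bit value shifted to be signed; always fits in an int
        int value = static_cast<int>(state >> 8) - (1 << 23);
        inputs.push_back(std::to_string(value));
    }
    return inputs;
}

struct Result {
    double seconds;          // wall-clock time for all repeats
    std::uint32_t checksum;  // wrapping sum of every parsed value
};

// Times `parse` over all inputs, `repeats` times, accumulating a checksum.
template <typename Parser>
Result run_test(const std::vector<std::string>& inputs, int repeats, Parser parse) {
    std::uint32_t checksum = 0;
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
        for (const std::string& s : inputs)
            checksum += static_cast<std::uint32_t>(parse(s.c_str()));
    auto stop = std::chrono::steady_clock::now();
    return { std::chrono::duration<double>(stop - start).count(), checksum };
}

// Prints one line in the same shape as the results above.
void print_result(const char* name, const Result& r) {
    std::printf("%s: %.10f [s] {checksum: %x}\n",
                name, r.seconds, static_cast<unsigned>(r.checksum));
}

// Runs atoi and strtol on identical data; returns true if the checksums agree.
bool run_c_library_tests(std::size_t count, int repeats) {
    const std::vector<std::string> inputs = make_inputs(count);
    Result a = run_test(inputs, repeats, [](const char* s) { return std::atoi(s); });
    Result b = run_test(inputs, repeats, [](const char* s) {
        return static_cast<int>(std::strtol(s, nullptr, 10));
    });
    print_result("atoi_test", a);
    print_result("strtol_test", b);
    return a.checksum == b.checksum;
}
```

Calling `run_c_library_tests(1000000, 10)` from `main` would print two lines in the format shown above. The timings depend entirely on the machine, compiler and flags, so they should not be compared with the numbers in this post.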

10 Responses to “Fastest numeric parsers in the world!”

  1. stereomatching says:

    Nice to see this comparison result. By now, the biggest defects of Spirit are compilation speed and cryptic error messages. Is there any possibility of alleviating them with the power of C++14/17?

  2. fashanu says:

    This is pretty impressive. 🙂

  3. WTM says:

    Why is “spirit_int_test” slower than “spirit_double_test”? Double needs more ticks, even with SSE or whatever code is in there.

    And wtf? It is “absolute to = ignore everything except 0..9 until end”. Not funny: my own “pwcharlen to int” with char validation is even faster: 0.89 [s] (!). As for “to float”, well… 3.79 [s], okay. But all possible errors are handled, and it returns BOOL false on failure.

    Couldn’t find the spirit_double_test source; there are a lot of hpp files, so I will stay for a while =)

  4. Greg says:

    I think it is a bit disingenuous to claim that the spirit parser is faster than hand-written parsers (lexers, really) without digging into why atoi and related functions are so much slower. I’m pretty sure the basic algorithm is the same in both cases — surely you don’t need backtracking or any fancy tricks to parse a simple integer.

    To investigate further, I wrote a simple atoi implementation myself in C, and indeed, it runs a bit faster than the Spirit parser, and 3x faster than the native atoi (glibc-based, on my machine). So, the question is: “why is atoi so slow?” First, atoi just calls strtol, which is why strtol is a bit faster than atoi. strtol supports much more runtime functionality than atoi, and I suspect a lot more functionality than qi::int_. strtol supports selecting the numeric base at runtime, optionally parsing 0x-prefixed numbers, skipping leading whitespace in a locale-specific manner, and other infrequently used features.

    The real story, here, I believe, is that C’s strtol is breaking a long-held C++ promise: “You don’t pay for what you don’t use”. The overwhelming majority of strtol users don’t want to parse octal numbers, or use exotic locales, and certainly don’t want to pay the price for those features they aren’t using. Moreover, should they want to use these features, they almost certainly can make that decision at compile time.

    The overwhelming benefit is that C++ allows us to choose whether to use these features at compile time (or rather, template instantiation time), and thus not pay for features we know we aren’t ever going to use.
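
The compile-time selection described above can be sketched roughly as follows. This is neither Spirit's implementation nor glibc's: `digit_value` and `parse_unsigned` are hypothetical names. The radix is a template parameter, so a decimal instantiation carries no runtime base argument, no prefix detection and no locale lookup. The sketch assumes an ASCII-like character set and deliberately leaves out sign, whitespace and overflow handling.

```cpp
// Hypothetical helper: digit value of `c` in base Radix, or -1 if `c` is not
// a digit in that base. Assumes letters are contiguous, as in ASCII.
template <unsigned Radix>
int digit_value(char c) {
    static_assert(Radix >= 2 && Radix <= 36, "unsupported radix");
    int v = -1;
    if (c >= '0' && c <= '9')      v = c - '0';
    else if (c >= 'a' && c <= 'z') v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'Z') v = c - 'A' + 10;
    return (v >= 0 && v < static_cast<int>(Radix)) ? v : -1;
}

// Parses as many digits as possible from [first, last).
// On success, stores the value in `out`, advances `first` past the digits
// and returns true. On failure (no digits), leaves `first` and `out` alone.
template <unsigned Radix = 10>
bool parse_unsigned(const char*& first, const char* last, unsigned& out) {
    const char* it = first;
    unsigned value = 0;
    while (it != last) {
        int d = digit_value<Radix>(*it);
        if (d < 0) break;  // first non-digit ends the number
        // Unsigned arithmetic wraps silently; no overflow check in this sketch.
        value = value * Radix + static_cast<unsigned>(d);
        ++it;
    }
    if (it == first) return false;  // nothing consumed
    first = it;
    out = value;
    return true;
}
```

With this shape, `parse_unsigned<10>` and `parse_unsigned<16>` are separate instantiations, and the compiler can fold the constant radix into each one. Whether that alone explains the gap to strtol on a given platform is something only measurement can settle.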

    • Joel de Guzman says:

      Does your hand-coded parser have all the features, such as radix, min digits, max digits, underflow and overflow protection, etc.?

      • Greg says:

        That’s the issue: this test code, which reproduces the atoi interface, doesn’t support any of these, because the atoi interface doesn’t support them. Despite this, you pay a price for these features you don’t use (and can’t use), because glibc atoi just turns around and calls strtol.

        • Joel de Guzman says:

          And then again, the point is that you can beat hand-written code with C++ templates, without sacrificing the feature set. At the very least, you want underflow and overflow protection. Add that to your hand-rolled code and, if you do it right, you’ll get at least the same perf. Mind you, it’s not an easy challenge. Do it wrong and you’ll end up with slower code.
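
To make the underflow and overflow protection mentioned above concrete, here is one way it might look in a hand-rolled decimal parser. This is a sketch under stated assumptions, not how Spirit does it: `parse_int_checked` is a hypothetical name, only base 10 with an optional sign is handled, and the value is accumulated as a negative number so that the most negative `int` can be represented without special casing.

```cpp
#include <limits>

// Hypothetical helper: parses [+|-]digits from [first, last) into `out`.
// Returns false, leaving `first` and `out` untouched, if there are no digits
// or the value does not fit in an int.
bool parse_int_checked(const char*& first, const char* last, int& out) {
    const char* it = first;
    bool negative = false;
    if (it != last && (*it == '-' || *it == '+')) {
        negative = (*it == '-');
        ++it;
    }
    const char* digits_begin = it;
    const int int_min = std::numeric_limits<int>::min();

    int value = 0;  // always <= 0 while accumulating
    for (; it != last && *it >= '0' && *it <= '9'; ++it) {
        const int d = *it - '0';
        // value * 10 - d would drop below int_min exactly when
        // value < (int_min + d) / 10 (division truncates toward zero).
        if (value < (int_min + d) / 10) return false;
        value = value * 10 - d;
    }
    if (it == digits_begin) return false;  // sign only, or no digits at all

    if (!negative) {
        if (value == int_min) return false;  // magnitude one past INT_MAX
        value = -value;
    }
    first = it;
    out = value;
    return true;
}
```

The check costs one comparison and one division by a constant per digit. Keeping that extra work cheap in the hot loop is the kind of detail that decides whether the protected version matches or falls behind the unprotected one.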

    • Michael says:

      Agree, this is a bit of an unfair benchmark. I have no doubt that Qi is fast at parsing numbers, but this benchmark is a rather poorly written attempt to prove that.

      • Christopher Beck says:

        Do you have a better benchmark that you would rather use? The benchmark looks okay to me, and it’s not out of line with what I have observed when benchmarking other programs that use qi / other parsing functions.

  5. Silver Möls says:

    I wrote an optimized CSV parser in C++ that imports ~100M rows (~10GB) with 11 numeric fields per row from a text file to a binary format: ~1 100 000 000 numbers in 1.3s (given that the file is already in the disk cache) on my 3-year-old but still good laptop. IIRC it was ~25x faster than atoi(), and after that I added multi-threading.

    The format isn’t the same as here: 2 fixed-point to float, 2 ints, date (3 ints), time (4 ints), i.e. FX tick data.
    But this test seems to be for 10M ints (~0.1GB of text). I doubt that mine is 50-100x faster, but even when running on 1 thread it still gives a 20x faster conversion rate. It is fixed to this field format, though it also parses the field separators (‘ ‘ and ‘.’ and EOL).

    If anyone is interested: gmail is slyy2048
