Sep 03

Spirit.Qi numeric parsers are the fastest in the world of C++ and surpass even hand-written, low-level parsers such as atoi and atof written in C. This is old news, actually. The numeric parsers have been highly optimised from the very start, when I began writing Spirit V2. The tests were written in 2009, yet are still very relevant now. Here are the benchmark results from the same tests, which I ran today on a Mac using Clang. The benchmark code is on GitHub (qi/workbench):

Integer tests

atoi_test:  4.7452590000 [s] {checksum: 213ce33a}
strtol_test: 4.5116520000 [s] {checksum: 213ce33a}
spirit_int_test: 1.4126110000 [s] {checksum: 213ce33a}

Double tests

atof_test: 3.2435210000 [s] {checksum: 84a4f7d}
strtod_test: 3.1327660000 [s] {checksum: 84a4f7d}
spirit_double_test: 0.7237810000 [s] {checksum: 84a4f7d}
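
The result lines above pair a wall-clock time with a checksum, which lets a reader confirm that every parser produced the same values. A minimal sketch of such a harness follows, using only the standard library. It is not the actual qi/workbench code: `make_inputs`, `Result`, `run`, the input distribution and the additive checksum are all assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical input generator: decimal strings from a small deterministic
// sequence (a toy LCG, for illustration only; assumes 32-bit unsigned).
std::vector<std::string> make_inputs(int count)
{
    std::vector<std::string> inputs;
    inputs.reserve(static_cast<std::size_t>(count));
    unsigned state = 12345u;
    for (int i = 0; i < count; ++i) {
        state = state * 1103515245u + 12345u;
        inputs.push_back(std::to_string(static_cast<int>(state >> 8) - 8000000));
    }
    return inputs;
}

// Elapsed time plus a checksum over all parsed values.
struct Result {
    double seconds;
    unsigned checksum;
};

// Times one parser over all inputs. The checksum also keeps the compiler
// from discarding the parsing work as dead code.
template <typename Parser>
Result run(const std::vector<std::string>& inputs, Parser parse)
{
    auto start = std::chrono::steady_clock::now();
    unsigned checksum = 0;
    for (const std::string& text : inputs)
        checksum += static_cast<unsigned>(parse(text.c_str()));
    auto stop = std::chrono::steady_clock::now();
    return { std::chrono::duration<double>(stop - start).count(), checksum };
}
```

Calling `run(inputs, [](const char* s) { return std::atoi(s); })` and the same with `std::strtol` should give equal checksums; only the timings are expected to differ.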

16 Responses to “Fastest numeric parsers in the world!”

  1. stereomatching says:

    Nice to see this comparison result. By now the biggest defects of Spirit are compilation speed and cryptic error messages. Is it possible to alleviate them with the power of C++14/17?

  2. fashanu says:

    This is pretty impressive. 🙂

  3. WTM says:

    Why is “spirit_int_test” slower than “spirit_double_test”? Double needs more ticks! Even in SSE or whatever code there.

    And wtf? It is “absolute to = ignore everything except 0..9 until end”. Not funny, my own “pwcharlen to int” with character validation is even faster: 0.89[s] (!). As for “to float”, well… 3.79[s], okay. But all possible errors are handled, and it returns BOOL false on failure.

    Couldn’t find spirit_double_test source, a lot of hpp, will stay for a while =)

  4. Greg says:

    I think it is a bit disingenuous to claim that the spirit parser is faster than hand-written parsers (lexers, really) without digging into why atoi and related functions are so much slower. I’m pretty sure the basic algorithm is the same in both cases — surely you don’t need backtracking or any fancy tricks to parse a simple integer.

    To investigate further, I wrote a simple atoi implementation myself in C, and indeed, it runs a bit faster than the spirit parser, and 3x faster than the native atoi (glibc-based, on my machine). So, the question is: “why is atoi so slow”? First, atoi just calls strtol, which is why strtol is a bit faster than atoi. strtol supports much more runtime functionality than atoi, and I suspect a lot more functionality than qi::int_. strtol supports selecting the numeric base at runtime, optionally parsing 0x-prefixed numbers, skipping leading whitespace in a locale-specific manner, and other infrequently used features.

    The real story, here, I believe, is that C’s strtol is breaking a long-held C++ promise: “You don’t pay for what you don’t use”. The overwhelming majority of strtol users don’t want to parse octal numbers, or use exotic locales, and certainly don’t want to pay the price for those features they aren’t using. Moreover, should they want to use these features, they almost certainly can make that decision at compile time.

    The overwhelming benefit is that C++ allows us to choose whether to use these features at compile time (or rather, template instantiation time), and thus not pay for features we know we aren’t ever going to use.
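
To make the compile-time choice described in this comment concrete, here is a minimal sketch of an unsigned integer parser whose radix is a template parameter rather than a runtime argument. `parse_uint` is a hypothetical helper written for this illustration; it is neither Greg's test code nor part of Spirit, and it deliberately omits overflow checking.

```cpp
#include <cassert>

// Hypothetical helper. Because Radix is a template parameter, each
// instantiation is compiled for one fixed base, and there is no runtime
// branch on the base, no prefix detection and no locale lookup.
template <unsigned Radix>
bool parse_uint(const char*& first, const char* last, unsigned& out)
{
    static_assert(Radix >= 2 && Radix <= 10, "this sketch handles digits 0-9 only");
    const char* it = first;
    unsigned value = 0;
    for (; it != last; ++it) {
        // Characters below '0' wrap to a large unsigned value and fail the test.
        unsigned digit = static_cast<unsigned>(*it - '0');
        if (digit >= Radix)
            break;
        value = value * Radix + digit;  // no overflow check in this sketch
    }
    if (it == first)
        return false;  // no digits consumed
    out = value;
    first = it;        // advance past the digits that were parsed
    return true;
}
```

With this sketch, `parse_uint<10>` on "1234xyz" yields 1234 and stops at 'x', while `parse_uint<8>` on "778" yields 63 and stops at '8'.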

    • Joel de Guzman says:

      Does your hand-coded parser have all the features such as radix, min digits, max digits, under and overflow protection, etc?

      • Greg says:

        That’s the issue: this test code, which reproduces the atoi interface, doesn’t support any of these, because the atoi interface doesn’t support them. Despite this, you pay a price for these features you don’t use (and can’t use) because glibc atoi just turns around and calls strtol.

        • Joel de Guzman says:

          And then again, the point is that you can beat hand written code with C++ templates, without sacrificing on the feature set. At the very least you want under and overflow protection. Add that to your hand rolled code and if you do it right, you’ll get at least the same perf. Mind you, it’s not an easy challenge. Do it wrong and you’ll end up with slower code.
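
The under and overflow protection mentioned in this reply is the part that is easy to get wrong in hand-rolled code. One possible way to do it is sketched below: the range check happens before each multiply-add, so signed overflow never occurs. `parse_int_checked` is a hypothetical name, and this is not how Spirit implements the check; it is only one approach under the assumption of plain decimal input with an optional sign.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: parses [+-]digits into an int and fails on
// overflow or underflow instead of wrapping or truncating.
bool parse_int_checked(const char*& first, const char* last, int& out)
{
    const char* it = first;
    if (it == last)
        return false;
    bool negative = false;
    if (*it == '-' || *it == '+') {
        negative = (*it == '-');
        ++it;
    }
    const char* digits_begin = it;
    const int max = std::numeric_limits<int>::max();
    const int min = std::numeric_limits<int>::min();
    int value = 0;
    for (; it != last; ++it) {
        int digit = *it - '0';
        if (digit < 0 || digit > 9)
            break;
        if (negative) {
            // Accumulate downwards so that the minimum int is representable.
            if (value < (min + digit) / 10)
                return false;  // value * 10 - digit would underflow
            value = value * 10 - digit;
        } else {
            if (value > (max - digit) / 10)
                return false;  // value * 10 + digit would overflow
            value = value * 10 + digit;
        }
    }
    if (it == digits_begin)
        return false;  // a sign with no digits, or no digits at all
    out = value;
    first = it;
    return true;
}
```

Assuming a 32-bit int, this accepts "2147483647" and "-2147483648" and rejects "2147483648" and "-2147483649". The extra comparison per digit is the cost that a fair benchmark against an unchecked parser has to account for.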

    • Michael says:

      Agree, this is a bit of an unfair benchmark. I have no doubt that Qi is fast at parsing numbers, but this benchmark is a rather poorly written attempt to prove that.

      • Christopher Beck says:

        Do you have a better benchmark that you would rather use? The benchmark looks okay to me, and it’s not out of line with what I have observed when benchmarking other programs that use qi / other parsing functions.

  5. Silver Möls says:

    I wrote an optimized CSV parser in C++ that imports ~100M rows (~10GB) of 11 numeric fields per row (~1 100 000 000 numbers) from a text file to a binary format in 1.3s (given that the file is already in the disk cache) on my 3-year-old but still good laptop. IIRC it was ~25x faster than atoi(), and after that I added multi-threading.

    The format isn’t the same as here: 2 fixed-point to float, 2 ints, date (3 ints), time (4 ints), i.e. FX tick data.
    But this test seems to be for 10M ints (~0.1GB of text). I doubt that mine is 50-100x faster, but even when running on 1 thread it still gives a 20x faster conversion rate. It is fixed to this field format, though it also parses the field separators (‘ ‘ and ‘.’ and EOL).

    If anyone interested: gmail is slyy2048

  6. zhanxw says:

    It’s really fast. Is there documentation to explain why it is that fast?

  7. Sergey Zlygostev says:

    This parser does not implement the functionality of atoi(). It fails the test with leading spaces, as well as the overflow test.
    Try to parse “ -9” and you will get 0!
    Try to parse “-99999999999” (11 digits) and you will get -999999999 (9 digits)!

    • Joel de Guzman says:

      Well, I could argue that Spirit numeric parsers can do a lot more things that atoi can’t, like custom ints, and a lot more! But I need to correct your claim that Spirit can’t handle leading spaces. Of course it can! And it is also wrong to say that Spirit does not have overflow handling. Of course it has! You are probably using it incorrectly or you do not know what you are talking about.
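
For reference on the C side of this exchange: atoi and strtol skip leading whitespace, but the C standard leaves atoi's behavior undefined when the value is out of range, so overflow can only be detected reliably through strtol and errno. The sketch below shows that pattern. `strtol_checked` is a hypothetical wrapper written for this illustration, and it says nothing about how Spirit itself should be invoked.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical wrapper: base-10 strtol with explicit failure reporting.
bool strtol_checked(const char* text, long& out)
{
    errno = 0;
    char* end = nullptr;
    long value = std::strtol(text, &end, 10);  // skips leading whitespace
    if (end == text)
        return false;  // no digits were consumed
    if (errno == ERANGE)
        return false;  // result was clamped to LONG_MIN or LONG_MAX
    out = value;
    return true;
}
```

With this wrapper, " -9" parses to -9, while a value with far more digits than a long can hold is reported as a failure rather than silently clamped.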

  8. Zoltan Tirinda says:

    The only reason why Spirit Qi is faster when parsing floats is that it doesn’t support round-trips. Round-tripping is when you print out your float in full precision and read it back, and get the exact same float. For example, try to print 7.62223644e-05 in full precision. You will get the string “7.6222364e-05”, and when you read it back with Spirit there will be an inaccuracy: you get 7.62223717e-05, while strtof() will give the exact same float, 7.62223644e-05.

    Spirit implements the parser using float multiplication/division while strtof() is constructing the result using biginteger following the standard.

    It’s nice to have these benchmarks, but without saying the “why” it just seems like a cheap commercial. I’m a bit disappointed this fact is not clearly stated in the Boost Spirit documentation.
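
The round-trip property described in this comment can be checked with the standard library alone. In the sketch below, `round_trips_with_strtof` prints a float with `max_digits10` significant digits and reads it back with strtof, and `naive_parse_float` accumulates digits in float and scales by repeated multiplication or division by ten. Both names are hypothetical. The naive routine is only an illustration of the kind of floating-point scaling the comment refers to, not Spirit's actual code, and whether it differs from strtof in the last bit depends on the input.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <limits>

// Hypothetical helper: true if printing with max_digits10 significant digits
// and reading back with strtof reproduces the same float.
bool round_trips_with_strtof(float value)
{
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%.*g",
                  std::numeric_limits<float>::max_digits10,
                  static_cast<double>(value));
    return std::strtof(buffer, nullptr) == value;
}

// Illustration only: every multiply, add and divide below can round,
// so the result may be off in the last bits compared with strtof.
float naive_parse_float(const char* s)
{
    bool negative = (*s == '-');
    if (*s == '-' || *s == '+')
        ++s;
    float value = 0.0f;
    int scale = 0;  // decimal exponent still to be applied
    for (; *s >= '0' && *s <= '9'; ++s)
        value = value * 10.0f + static_cast<float>(*s - '0');
    if (*s == '.') {
        for (++s; *s >= '0' && *s <= '9'; ++s) {
            value = value * 10.0f + static_cast<float>(*s - '0');
            --scale;
        }
    }
    if (*s == 'e' || *s == 'E')
        scale += std::atoi(s + 1);
    for (; scale > 0; --scale)
        value *= 10.0f;
    for (; scale < 0; ++scale)
        value /= 10.0f;
    return negative ? -value : value;
}
```

Comparing `naive_parse_float(text)` against `std::strtof(text, nullptr)` over a range of inputs is one way to measure how often a fast parser of this style gives up exact round-tripping.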
