The Spirit mailing list is still the place where active discussion takes place. I will be posting some excerpts from the mailing list here every once in a while. Here’s an interesting post from Leo Goodstadt:
Are Qi parsers thread-safe?
The answer, of course, is yes.
I have been trying to speed up some parsing code going through around 25 GB of data. This had been taking 6 hours using Python code. Rewriting it using Qi took it down to 4 minutes. And finally, with the help of Intel Threading Building Blocks, I am down to 37 seconds, a more than 500x speedup!
The final code runs optimally with around 15 threads. The process seems to be entirely bounded by the physical limits of pulling 25 GB off the hard disk RAID over the network.
Intel TBB works really well with Spirit / Qi.
I divide the parsing tasks into three parts:
1. The data is in a one-item-per-line format, so I can divide up the file, as it is being read, into chunks of lines of around (<=) 64 KB at a time. This is a serial process.
2. These chunks are then parsed by Qi and analysed in parallel.
3. TBB then takes the output from (2) and feeds it serially, in order, to step (3).
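The three steps above can be sketched with the standard library alone. This is not the poster's code: `std::async` stands in for the parallel stage of a TBB pipeline, `std::from_chars` stands in for the Qi parser, and the one-integer-per-line format, the function names, and the per-chunk sum are all assumptions made for illustration. Unlike a real pipeline, the sketch reads every chunk before parsing starts and launches one task per chunk, so it only shows the serial, parallel, serial-in-order shape.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <future>
#include <istream>
#include <sstream>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

// Step 1 (serial): gather whole lines into chunks of at most max_bytes.
// A single line longer than max_bytes still gets a chunk of its own.
std::vector<std::string> read_chunks(std::istream& in, std::size_t max_bytes) {
    assert(max_bytes > 0);
    std::vector<std::string> chunks;
    std::string chunk, line;
    while (std::getline(in, line)) {
        if (!chunk.empty() && chunk.size() + line.size() + 1 > max_bytes) {
            chunks.push_back(std::move(chunk));
            chunk.clear();  // put the moved-from string back into a known state
        }
        chunk += line;
        chunk += '\n';
    }
    if (!chunk.empty()) chunks.push_back(std::move(chunk));
    return chunks;
}

// Step 2 (parallel): parse one chunk. Placeholder for the Qi parser:
// each line is assumed to hold one integer, and the "analysis" is a sum.
long long parse_chunk(const std::string& chunk) {
    long long sum = 0;
    const char* p = chunk.data();
    const char* end = p + chunk.size();
    while (p < end) {
        long long value = 0;
        auto [next, ec] = std::from_chars(p, end, value);
        if (ec == std::errc()) sum += value;
        // Move to the start of the next line whether or not the parse succeeded.
        p = next;
        while (p < end && *p != '\n') ++p;
        if (p < end) ++p;
    }
    return sum;
}

// Step 3 (serial, in order): collect results in the order the chunks were read.
std::vector<long long> run_pipeline(std::istream& in, std::size_t max_bytes) {
    const std::vector<std::string> chunks = read_chunks(in, max_bytes);
    std::vector<std::future<long long>> pending;
    for (const std::string& c : chunks)
        pending.push_back(std::async(std::launch::async,
                                     [&c] { return parse_chunk(c); }));
    std::vector<long long> results;
    for (auto& f : pending) results.push_back(f.get());  // preserves chunk order
    return results;
}
```

Each task reads only its own chunk and writes only its own result, so nothing is shared while the parsing runs; ordering is restored simply by waiting on the futures in the order they were created.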
I have to write some classes to provide a set of reusable buffers for the data flow through the pipeline to avoid lots of memory allocations and thrashing the cache.
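One way such reusable buffers might look is a small mutex-protected pool. The class below is hypothetical (the name `BufferPool` and its interface are not from the post) and only illustrates the idea: a buffer that has been allocated once is cleared and handed out again rather than freed.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pool of reusable text buffers, safe to call from several threads.
class BufferPool {
public:
    explicit BufferPool(std::size_t buffer_capacity) : capacity_(buffer_capacity) {
        assert(buffer_capacity > 0);
    }

    // Hand out a recycled buffer if one is free, otherwise allocate a new one.
    std::string acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (free_.empty()) {
            ++allocated_;
            std::string fresh;
            fresh.reserve(capacity_);
            return fresh;
        }
        std::string buf = std::move(free_.back());
        free_.pop_back();
        return buf;
    }

    // Return a buffer to the pool. clear() drops the contents; common
    // standard library implementations keep the allocated capacity.
    void release(std::string buf) {
        buf.clear();
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(std::move(buf));
    }

    // Number of buffers that had to be freshly allocated so far.
    std::size_t allocated() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return allocated_;
    }

private:
    std::size_t capacity_;
    std::size_t allocated_ = 0;
    std::vector<std::string> free_;
    mutable std::mutex mutex_;
};
```

In a pipeline like the one described, the reading stage would call `acquire()` for each chunk and the final stage would call `release()` once the chunk has been consumed, so the number of live buffers stays close to the number of chunks in flight.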
At the moment, each thread has its own lexer. Is there data held statically within each instance of the lexer or are they thread-safe?
Hartmut replies: There is no static data. Everything should be perfectly thread-safe as long as you do not share either the lexer objects or the iterators between threads while accessing them simultaneously.
I should say, thank you so much for Qi. It is the one library I can point to in C++ which trumps anything out there for any other language.