Jan 05 ’10

We have received many questions about how to parse data that has to be read from a standard input stream (usually a std::istream) without first reading all of the data into memory.

The standard way of iterating over a stream would be to wrap it into a std::istream_iterator, but unfortunately this does not work with Qi. Qi requires iterators to be at least forward iterators, because the parser may need to backtrack and re-read input, while std::istream_iterator is an input iterator only. To overcome this limitation, Spirit V2.2 (the next version of Spirit, to be released with Boost V1.42) will implement a new iterator.
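To make the input iterator restriction concrete, here is a small sketch using only the standard library. It walks a stream twice with std::istream_iterator and shows that the second pass finds nothing, because the first pass consumed the data. The helper names count_remaining and demo_single_pass are made up for this illustration and are not part of Spirit.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>

// Hypothetical helper: count the characters still available from `in`
// by walking a std::istream_iterator to the end of the stream.
std::size_t count_remaining(std::istream& in)
{
    std::istream_iterator<char> first(in), last;
    return static_cast<std::size_t>(std::distance(first, last));
}

// Hypothetical demo: a second pass over the same stream sees no data.
void demo_single_pass()
{
    std::istringstream in("abcd");
    in.unsetf(std::ios::skipws);        // do not skip whitespace

    std::size_t first_pass = count_remaining(in);   // consumes the stream
    std::size_t second_pass = count_remaining(in);  // nothing left to read

    assert(first_pass == 4);
    assert(second_pass == 0);
}
```

Calling demo_single_pass() from main() runs both passes. A parser that has to go back to an earlier position cannot do so with this iterator, which is the restriction the iterator declared next removes.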

namespace boost { namespace spirit
{
    // this is functionally equivalent to std::istream_iterator
    template <typename Char, typename Traits = std::char_traits<Char> >
    struct basic_istream_iterator;

    // predefine specialization for 'char'
    typedef basic_istream_iterator<char> istream_iterator;
}}

The new spirit::istream_iterator is functionally equivalent to std::istream_iterator, except that it is a forward iterator, so you can use it for your Qi parsing needs. Here is an example:

#include <fstream>

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>

namespace spirit = boost::spirit;

// open file, disable skipping of whitespace
std::ifstream in("some_data_file");
in.unsetf(std::ios::skipws);

// wrap istream into iterator
spirit::istream_iterator begin(in);
spirit::istream_iterator end;

// use iterator to parse file data
spirit::qi::parse(begin, end, <...your grammar here...>);

This iterator is implemented on top of the multi_pass iterator framework, which is documented here. For those curious enough to have a peek: you can access the new iterator from Boost SVN here. However, to use it with a Boost version prior to V1.42, you will need to download the whole iterators directory from SVN.
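The general idea of turning a stream into something a parser can re-read can be sketched without Boost. The code below is only an illustration of buffering, not the multi_pass implementation: the class buffered_istream_iterator and the helper take are hypothetical names, the iterator omits the usual iterator typedefs, and it never releases buffered input. All copies of an iterator share one buffer, so an older copy can still read characters that a newer copy has already passed.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical illustration only: NOT Spirit's multi_pass implementation.
class buffered_istream_iterator
{
    struct shared_state
    {
        std::istream* in;
        std::string buffer;   // every character read from the stream so far
    };

    std::shared_ptr<shared_state> state_;  // null for the end iterator
    std::size_t pos_ = 0;                  // index into state_->buffer

    // Make sure buffer[pos_] exists; return false at end of input.
    bool fill() const
    {
        while (state_ && pos_ >= state_->buffer.size()) {
            char c;
            if (!state_->in->get(c))
                return false;
            state_->buffer.push_back(c);
        }
        return state_ != nullptr;
    }

public:
    buffered_istream_iterator() = default;   // end-of-input iterator
    explicit buffered_istream_iterator(std::istream& in)
      : state_(std::make_shared<shared_state>(shared_state{&in, std::string()}))
    {}

    // Dereferencing is only valid while the iterator is not at the end.
    char operator*() const { fill(); return state_->buffer[pos_]; }
    buffered_istream_iterator& operator++() { ++pos_; return *this; }

    bool operator==(buffered_istream_iterator const& rhs) const
    {
        bool lhs_end = !fill();
        bool rhs_end = !rhs.fill();
        if (lhs_end || rhs_end)
            return lhs_end && rhs_end;
        return pos_ == rhs.pos_;
    }
    bool operator!=(buffered_istream_iterator const& rhs) const
    {
        return !(*this == rhs);
    }
};

// Hypothetical helper: read up to `n` characters starting at `it`.
std::string take(buffered_istream_iterator it, std::size_t n)
{
    std::string out;
    buffered_istream_iterator end;
    for (; n > 0 && it != end; --n, ++it)
        out.push_back(*it);
    return out;
}

// Hypothetical demo: a saved copy can re-read already consumed input.
void demo_backtracking()
{
    std::istringstream in("abcd");
    buffered_istream_iterator begin(in);
    buffered_istream_iterator saved = begin;   // remember the start

    assert(take(begin, 3) == "abc");   // pulls three characters from the stream
    assert(take(saved, 4) == "abcd");  // re-reads them from the shared buffer
}
```

Calling demo_backtracking() from main() exercises the sketch. The trade-off of this simple design is that the buffer grows with the input; a production iterator has to decide when buffered data can be discarded.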

7 Responses to “Stream-based Parsing Made Easy”

  1. Marshall says:

    Helpful bit:

    #include <boost/spirit/include/support_istream_iterator.hpp>

  2. Felix says:

    The input stream iterator is definitely a useful feature.
    What about other Classic Spirit iterators?
    Is there a reason why File Iterator and Position Iterator are not part of Spirit 2.x?

    • Hartmut Kaiser says:

      Felix,

      The only reason why these have not been ported yet is that nobody invested time doing it :-P .
      But these iterators are still usable with Spirit 2.x (even if they live in namespace spirit::classic), so there is no real rush to port/rewrite them.

      Regards Hartmut

  3. Mark Jarvin says:

    When is Boost 1.42 due to be released? Am I right in guessing early February 2010? I’m extrapolating from here: http://www.boost.org/community/review_schedule.html

    Thanks!

    • Hartmut Kaiser says:

      Mark,

      The release schedule is all I can refer to as well. But I know that beta1 has been posted today, so it seems we’re still on schedule.

      Regards Hartmut

  4. Sehe says:

    Note that it is slow; I have run this comparison (1000x repeated parses of about 60 different files <100k, boost 1.46.1/Spirit 2042 and fully optimized release builds):


    #if 1
    std::string contents = read(spec);
    return do_parse_attempt(contents.begin(), contents.end(), handler);
    #else
    std::ifstream in(spec.c_str());
    in.unsetf(std::ios::skipws); // No white space skipping!

    // wrap istream into iterator
    boost::spirit::istream_iterator begin(in);
    boost::spirit::istream_iterator end;

    return do_parse_attempt(begin, end, handler);
    #endif

    (read() defined below)

    The copying/allocating version (using the read() helper) was done in 6.1s, the spirit::istream_iterator version in 10.9s

    That is with read(…) defined as braindead:


    std::string read(const std::string& spec)
    {
        std::ifstream in(spec.c_str());
        in.unsetf(std::ios::skipws); // keep whitespace in the data
        std::string storage;
        // copy the whole file into memory, one character at a time
        std::copy(
            std::istream_iterator<char>(in),
            std::istream_iterator<char>(),
            std::back_inserter(storage));
        return storage;
    }
