{"id":1018,"date":"2010-03-05T08:03:47","date_gmt":"2010-03-05T16:03:47","guid":{"rendered":"http:\/\/boost-spirit.com\/home\/?page_id=1018"},"modified":"2010-03-06T09:29:04","modified_gmt":"2010-03-06T17:29:04","slug":"tracking-the-input-position-while-parsing","status":"publish","type":"page","link":"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/tracking-the-input-position-while-parsing\/","title":{"rendered":"Tracking the Input Position While Parsing"},"content":{"rendered":"<p>The following example is about tracking the parsing position with <em>Spirit<\/em> V2. This is useful for generating error messages which tell the user exactly where an error has occurred. We also show how to use <em>Spirit<\/em> V2 to parse from an input stream without first reading the whole stream into a <span style=\"font-family: courier new;\">std::string<\/span>.<\/p>\n<p>In our example, we want to parse a list of real numbers and return them in a <span style=\"font-family: courier new;\">std::vector<\/span> and throw an exception if an error occurs. We will start with a naive way of achieving our goal and then improve the program in two steps. This naive way could be the following small program. 
It provides only basic, not very helpful error handling and scales poorly for large inputs:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\n#include &lt;vector&gt;\r\n#include &lt;string&gt;\r\n#include &lt;stdexcept&gt;\r\n#include &lt;istream&gt;\r\n#include &lt;sstream&gt;\r\n#include &lt;iostream&gt;\r\n#include &lt;boost\/spirit\/include\/qi.hpp&gt;\r\n\r\nnamespace qi = boost::spirit::qi;\r\nnamespace ascii = boost::spirit::ascii;\r\n\r\n\/\/ parse list of doubles from input stream\r\n\/\/ throw exception (perhaps including filename) on errors\r\nstd::vector&lt;double&gt;\r\n  parse(std::istream&amp; input, const std::string&amp; filename);\r\n\r\n\/\/ main function\r\nint main(int, char**)\r\n{\r\n  try\r\n  {\r\n    parse(std::cin, &quot;STDIN&quot;);\r\n  }\r\n  catch(const std::exception&amp; e)\r\n  {\r\n    std::cerr &lt;&lt; &quot;Exception: &quot; &lt;&lt; e.what() &lt;&lt; std::endl;\r\n    return -1;\r\n  }\r\n  return 0;\r\n}\r\n\r\n\/\/ implementation\r\nstd::vector&lt;double&gt;\r\n  parse(std::istream&amp; input, const std::string&amp; filename)\r\n{\r\n  \/\/ get input into string\r\n  std::ostringstream buf;\r\n  buf &lt;&lt; input.rdbuf();\r\n  std::string str = buf.str();\r\n\r\n  \/\/ prepare iterators\r\n  std::string::iterator first = str.begin();\r\n  std::string::iterator last = str.end();\r\n\r\n  \/\/ parse\r\n  std::vector&lt;double&gt; output;\r\n  bool r = qi::phrase_parse(\r\n    \/\/ iterators over input\r\n    first, last,\r\n    \/\/ recognize list of doubles\r\n    qi::double_ &gt;&gt; *(',' &gt;&gt; qi::double_) &gt;&gt; qi::eoi,\r\n    \/\/ comment skipper\r\n    ascii::space | '#' &gt;&gt; *(ascii::char_ - qi::eol) &gt;&gt; qi::eol,\r\n    \/\/ store into this object\r\n    output);\r\n\r\n  \/\/ error detection\r\n  if(!r || first != last)\r\n    throw std::runtime_error(&quot;parse error in &quot; + filename);\r\n\r\n  \/\/ return result\r\n  return output;\r\n}\r\n<\/pre>\n<p>On the following input, parsing succeeds (and there is no output):<\/p>\n<pre 
class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\n# good input\r\n1.3,2.3, 5\r\n<\/pre>\n<p>With the following faulty input<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\n# bad input\r\n123,42.0,a,1.4\r\n<\/pre>\n<p>we get an exception:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nException: parse error in STDIN\r\n<\/pre>\n<p>If you want to try out this version of the program, you can download it here: <a href=\"http:\/\/boost-spirit.com\/dl_more\/parsing_tracking_position\/naive_parsing.cpp\" target=\"_blank\">naive_parsing.cpp<\/a>. The two input files are <a href=\"http:\/\/boost-spirit.com\/dl_more\/parsing_tracking_position\/goodinput.txt\" target=\"_blank\">goodinput.txt<\/a> and <a href=\"http:\/\/boost-spirit.com\/dl_more\/parsing_tracking_position\/badinput.txt\" target=\"_blank\">badinput.txt<\/a>.<\/p>\n<p>To make the program scale to large files, we now change the example to parse directly from a standard input stream instead of buffering everything into a <span style=\"font-family: courier new;\">std::string<\/span>. For that, we need the <span style=\"font-family: courier new;\">boost::spirit::multi_pass<\/span> class, which wraps a stream iterator and exposes an iterator usable by <em>Qi<\/em> parsers. 
We need this adaptor because a stream iterator such as <span style=\"font-family: courier new;\">std::istreambuf_iterator<\/span> is only an input iterator, whereas <em>Qi<\/em> requires a forward iterator, which lets the parser backtrack over input it has already read.<\/p>\n<p>So we include the following file<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\n#include &lt;boost\/spirit\/include\/support_multi_pass.hpp&gt;\r\n<\/pre>\n<p>and replace<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\n   \/\/ get input into string\r\n   std::ostringstream buf;\r\n   buf &lt;&lt; input.rdbuf();\r\n   std::string str = buf.str();\r\n\r\n   \/\/ prepare iterators\r\n   std::string::iterator first = str.begin();\r\n   std::string::iterator last = str.end();\r\n<\/pre>\n<p>by<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\n   \/\/ iterate over stream input\r\n   typedef std::istreambuf_iterator&lt;char&gt; base_iterator_type;\r\n   base_iterator_type in_begin(input);\r\n\r\n   \/\/ convert input iterator to forward iterator, usable by spirit parser\r\n   typedef boost::spirit::multi_pass&lt;base_iterator_type&gt; forward_iterator_type;\r\n   forward_iterator_type fwd_begin =\r\n       boost::spirit::make_default_multi_pass(in_begin);\r\n   forward_iterator_type fwd_end;\r\n<\/pre>\n<p>Now we only need to rename all occurrences of <span style=\"font-family: courier new;\">first<\/span> to <span style=\"font-family: courier new;\">fwd_begin<\/span> and all occurrences of <span style=\"font-family: courier new;\">last<\/span> to <span style=\"font-family: courier new;\">fwd_end<\/span> and we are done! 
(The complete second program can be downloaded from here: <a href=\"http:\/\/boost-spirit.com\/dl_more\/parsing_tracking_position\/stream_iterator_parsing.cpp\" target=\"_blank\">stream_iterator_parsing.cpp<\/a>.)<\/p>\n<p>This new program has the same behavior as the first one, but it does not need to buffer the whole input.<\/p>\n<p>The final task is to get more useful error messages if a parsing problem occurs. To that end we wrap the iterator yet another time, using the <span style=\"font-family: courier new;\">boost::spirit::classic::position_iterator2<\/span> class, which records the line number, column number, and filename of the parsed input.<\/p>\n<p>First we need one additional include file:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\n#include &lt;boost\/spirit\/include\/classic_position_iterator.hpp&gt;\r\nnamespace classic = boost::spirit::classic;\r\n<\/pre>\n<p>This header in fact provides several position iterators: <span style=\"font-family: courier new;\">boost::spirit::classic::position_iterator<\/span> is a bit more efficient than <span style=\"font-family: courier new;\">boost::spirit::classic::position_iterator2<\/span>, but the latter can return the whole line of input that an iterator points into (and we will use this feature). Additionally, one can specify a &#8216;position iterator policy&#8217; to store only line numbers and not column numbers. 
This increases efficiency but might decrease the usefulness of the position information.<\/p>\n<p>Wrapping the forward iterator with the position iterator is not difficult: we create another typedef and initialize the iterators as follows:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\n   \/\/ wrap forward iterator with position iterator, to record the position\r\n   typedef classic::position_iterator2&lt;forward_iterator_type&gt;\r\n     pos_iterator_type;\r\n   pos_iterator_type position_begin(fwd_begin, fwd_end, filename);\r\n   pos_iterator_type position_end;\r\n<\/pre>\n<p>Hint: make sure there are no parentheses after <span style=\"font-family: courier new;\">position_end<\/span>. This mistake is common and produces compile errors that are hard to track down: with empty parentheses you do not declare a variable of type <span style=\"font-family: courier new;\">pos_iterator_type<\/span>, but a function returning <span style=\"font-family: courier new;\">pos_iterator_type<\/span>!<\/p>\n<p>We now use the new position iterator for parsing, but the grammar needs to be adjusted a bit. 
We will replace the following fragment:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\n  \/\/ parse\r\n  bool r = qi::phrase_parse(\r\n    \/\/ iterators over input\r\n    fwd_begin, fwd_end,\r\n    \/\/ recognize list of doubles\r\n    qi::double_ &gt;&gt; *(',' &gt;&gt; qi::double_) &gt;&gt; qi::eoi,\r\n    \/\/ comment skipper\r\n    ascii::space | '#' &gt;&gt; *(ascii::char_ - qi::eol) &gt;&gt; qi::eol,\r\n    \/\/ doubles are stored into this object\r\n    output);\r\n\r\n  \/\/ error detection\r\n  if (!r || fwd_begin != fwd_end)\r\n    throw std::runtime_error(&quot;parse error in &quot; + filename);\r\n<\/pre>\n<p>In our new code, we call <span style=\"font-family: courier new;\">qi::phrase_parse<\/span> as before, but this time we use the expectation operator <span style=\"font-family: courier new;\">&gt;<\/span> instead of <span style=\"font-family: courier new;\">&gt;&gt;<\/span> for the sequence operators in the grammar: if the parser on the right-hand side of <span style=\"font-family: courier new;\">&gt;<\/span> fails to match, <em>Qi<\/em> does not backtrack but throws a <span style=\"font-family: courier new;\">qi::expectation_failure&lt;pos_iterator_type&gt;<\/span> exception. Using <span style=\"font-family: courier new;\">&gt;<\/span> in front of <span style=\"font-family: courier new;\">qi::eoi<\/span> as well ensures that trailing garbage is reported instead of being silently ignored.<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\n   try\r\n   {\r\n     qi::phrase_parse(\r\n       \/\/ iterators over input\r\n       position_begin, position_end,\r\n       \/\/ recognize list of doubles\r\n       qi::double_ &gt; *(',' &gt; qi::double_) &gt; qi::eoi,\r\n       \/\/ comment skipper\r\n       ascii::space | '#' &gt;&gt; *(ascii::char_ - qi::eol) &gt;&gt; qi::eol,\r\n       \/\/ store into this object\r\n       output);\r\n   }\r\n   catch(const qi::expectation_failure&lt;pos_iterator_type&gt;&amp; e)\r\n   {\r\n   }\r\n<\/pre>\n<p>To process the caught exception we add the following code inside the <span style=\"font-family: courier new;\">catch<\/span> block:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\n     const classic::file_position_base&lt;std::string&gt;&amp; pos =\r\n         e.first.get_position();\r\n     
std::stringstream msg;\r\n     msg &lt;&lt;\r\n         &quot;parse error at file &quot; &lt;&lt; pos.file &lt;&lt;\r\n         &quot; line &quot; &lt;&lt; pos.line &lt;&lt; &quot; column &quot; &lt;&lt; pos.column &lt;&lt; std::endl &lt;&lt;\r\n         &quot;'&quot; &lt;&lt; e.first.get_currentline() &lt;&lt; &quot;'&quot; &lt;&lt; std::endl &lt;&lt;\r\n         std::setw(pos.column) &lt;&lt; &quot; &quot; &lt;&lt; &quot;^- here&quot;;\r\n     throw std::runtime_error(msg.str());\r\n<\/pre>\n<p>(The complete final program can be downloaded from here: <a href=\"http:\/\/boost-spirit.com\/dl_more\/parsing_tracking_position\/stream_iterator_errorposition_parsing.cpp\" target=\"_blank\">stream_iterator_errorposition_parsing.cpp<\/a>.)<\/p>\n<p>This code builds an error message from the filename, the line and column numbers, and the line of input currently being parsed. Note that <span style=\"font-family: courier new;\">std::setw<\/span> requires the header <span style=\"font-family: courier new;\">&lt;iomanip&gt;<\/span>.<\/p>\n<p>The faulty input now produces the following, far more useful error message:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nException: parse error at file STDIN line 1 column 10\r\n'123,42.0,a,1.4'\r\n          ^- here\r\n<\/pre>\n<p>The programs of this example can be downloaded here: <a href=\"http:\/\/boost-spirit.com\/dl_more\/parsing_tracking_position\/naive_parsing.cpp\" target=\"_blank\">naive_parsing.cpp<\/a>, <a href=\"http:\/\/boost-spirit.com\/dl_more\/parsing_tracking_position\/stream_iterator_parsing.cpp\" target=\"_blank\">stream_iterator_parsing.cpp<\/a>, and <a href=\"http:\/\/boost-spirit.com\/dl_more\/parsing_tracking_position\/stream_iterator_errorposition_parsing.cpp\" target=\"_blank\">stream_iterator_errorposition_parsing.cpp<\/a>. 
The two input files are <a href=\"http:\/\/boost-spirit.com\/dl_more\/parsing_tracking_position\/goodinput.txt\" target=\"_blank\">goodinput.txt<\/a> and <a href=\"http:\/\/boost-spirit.com\/dl_more\/parsing_tracking_position\/badinput.txt\" target=\"_blank\">badinput.txt<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The following example is about tracking the parsing position with Spirit V2. This is useful for generating error messages which tell the user exactly where an error has occurred. We also show how to use Spirit V2 to parse from an input stream without first reading the whole stream into a std::string. 
In our example, [&hellip;]<\/p>\n","protected":false},"author":39,"featured_media":0,"parent":384,"menu_order":5,"comment_status":"open","ping_status":"open","template":"article-page.php","meta":{"_s2mail":"","spay_email":""},"jetpack_shortlink":"https:\/\/wp.me\/PIHdZ-gq","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/pages\/1018"}],"collection":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/comments?post=1018"}],"version-history":[{"count":9,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/pages\/1018\/revisions"}],"predecessor-version":[{"id":1029,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/pages\/1018\/revisions\/1029"}],"up":[{"embeddable":true,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/pages\/384"}],"wp:attachment":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/media?parent=1018"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}