{"id":989,"date":"2010-02-24T05:32:08","date_gmt":"2010-02-24T13:32:08","guid":{"rendered":"http:\/\/boost-spirit.com\/home\/?p=989"},"modified":"2010-06-02T02:40:32","modified_gmt":"2010-06-02T09:40:32","slug":"parsing-skippers-and-skipping-parsers","status":"publish","type":"post","link":"http:\/\/boost-spirit.com\/home\/2010\/02\/24\/parsing-skippers-and-skipping-parsers\/","title":{"rendered":"Parsing Skippers and Skipping Parsers"},"content":{"rendered":"<p><em>Spirit<\/em> has supported skipper-based parsing since its inception, so this is definitely nothing new in Spirit V2. Nevertheless, the recent discussion on the <a href=\"http:\/\/boost-spirit.com\/home\/feedback-and-support\/\">Spirit mailing list<\/a> about the semantics of <em>Qi&#8217;s<\/em> <span style=\"font-family: Courier New;\">lexeme[]<\/span> directive shows the need for some clarification. Today I will try to answer questions like &#8220;What does it mean to use a skipper while parsing?&#8221; and &#8220;When do I want to use a skipper and when not?&#8221;.<\/p>\n<p><!--more--><\/p>\n<p>When parsing a formatted data stream, it is often desirable to ignore parts of the input. A common example is the need to skip whitespace and comments while parsing a programming language. It is certainly possible to account explicitly for the tokens to skip (such as whitespace or comments) in the grammar, but this quickly becomes tedious, since those tokens may appear at any point in the input.<\/p>\n<p>For the sake of simplicity, let us assume we want to parse a simple key\/value expression, <span style=\"font-family: Courier New;\">key=value<\/span>, allowing any number of space characters before, between, or after the <span style=\"font-family: Courier New;\">key<\/span> and the <span style=\"font-family: Courier New;\">value<\/span>. 
A naive grammar matching the plain key\/value pair without any whitespace skipping would look like this (see <a href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/parsing-a-list-of-key-value-pairs-using-spirit-qi\/\">Parsing a List of Key-Value Pairs Using Spirit.Qi<\/a> for more details):<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\npair  =  key &gt;&gt; '=' &gt;&gt; value;\r\nkey   =  qi::char_(&quot;a-zA-Z_&quot;) &gt;&gt; *qi::char_(&quot;a-zA-Z_0-9&quot;);\r\nvalue = +qi::char_(&quot;a-zA-Z_0-9&quot;);\r\n<\/pre>\n<p>If we extend the rule <span style=\"font-family: Courier New;\">pair<\/span> to match interspersed space characters explicitly, we get:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\npair  = *space &gt;&gt; key &gt;&gt; *space &gt;&gt; '=' &gt;&gt; *space &gt;&gt; value &gt;&gt; *space;\r\n<\/pre>\n<p>This produces the desired result, but it is error prone as well as difficult to write, understand, and maintain. On closer inspection, skipping the whitespace is easy to automate: it is sufficient to insert a repeated invocation of the <span style=\"font-family: Courier New;\">space<\/span> parser (or, more generally, of any skip parser) around the elements of the user-defined parser sequences.<\/p>\n<p>In fact, that is exactly what <em>Spirit<\/em> can do for you! The library invokes the supplied skip parser upon entry to the parse member function of every parser conforming to the <a href=\"http:\/\/www.boost.org\/doc\/libs\/1_42_0\/libs\/spirit\/doc\/html\/spirit\/qi\/reference\/parser_concepts\/primitiveparser.html\"><span style=\"font-family: Courier New;\">PrimitiveParser<\/span><\/a> concept. 
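<\/p>\n<p>This pre-skip can be made visible with a minimal, self-contained sketch. It assumes Boost.Spirit Qi, and the helper <span style=\"font-family: Courier New;\">collect_alnum<\/span> is hypothetical, not part of the library. The sketch uses <span style=\"font-family: Courier New;\">qi::phrase_parse<\/span>, discussed below, with <span style=\"font-family: Courier New;\">qi::space<\/span> as the skip parser. Every invocation of the primitive <span style=\"font-family: Courier New;\">alnum<\/span> inside <span style=\"font-family: Courier New;\">*alnum<\/span> first runs the skipper, so the blanks never reach <span style=\"font-family: Courier New;\">alnum<\/span> itself.<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\n#include &lt;boost\/spirit\/include\/qi.hpp&gt;\r\n#include &lt;iostream&gt;\r\n#include &lt;string&gt;\r\n\r\nnamespace qi = boost::spirit::qi;\r\n\r\n\/\/ Hypothetical helper: collects the alphanumeric characters of input while\r\n\/\/ qi::space skips the blanks in front of each alnum invocation.\r\n\/\/ Returns true only if the whole input was consumed.\r\nbool collect_alnum(std::string const&amp; input, std::string&amp; out)\r\n{\r\n    std::string::const_iterator first = input.begin(), last = input.end();\r\n    return qi::phrase_parse(first, last, *qi::alnum, qi::space, out) &amp;&amp; first == last;\r\n}\r\n\r\nint main()\r\n{\r\n    std::string letters;\r\n    bool ok = collect_alnum(&quot; a b  c &quot;, letters);\r\n    std::cout &lt;&lt; ok &lt;&lt; &quot; [&quot; &lt;&lt; letters &lt;&lt; &quot;]&quot; &lt;&lt; std::endl;\r\n}\r\n<\/pre>\n<p>Because the skipper removes the space characters, the program should print <span style=\"font-family: Courier New;\">1 [abc]<\/span>.<\/p>\n<p>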
The skip parser is supplied by calling a dedicated API function, <span style=\"font-family: Courier New;\">phrase_parse<\/span>:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\n#include &lt;boost\/spirit\/include\/qi.hpp&gt;\r\n#include &lt;string&gt;\r\n\r\nnamespace qi = boost::spirit::qi;\r\ntypedef std::string::const_iterator iterator_type;\r\n\r\n\/\/ key and value have no skipper type: no skipping happens inside them\r\nqi::rule&lt;iterator_type&gt; key = qi::char_(&quot;a-zA-Z_&quot;) &gt;&gt; *qi::char_(&quot;a-zA-Z_0-9&quot;);\r\nqi::rule&lt;iterator_type&gt; value = +qi::char_(&quot;a-zA-Z_0-9&quot;);\r\n\r\n\/\/ pair is declared with the skipper type, so it skips around its elements\r\nqi::rule&lt;iterator_type, qi::space_type&gt; pair = key &gt;&gt; '=' &gt;&gt; value;\r\n\r\nstd::string input(&quot; key = value &quot;);\r\niterator_type begin = input.begin();\r\niterator_type end = input.end();\r\nbool r = qi::phrase_parse(begin, end, pair, qi::space);\r\n<\/pre>\n<p>This code snippet illustrates several important things:<\/p>\n<ul>\n<li>The function <span style=\"font-family: Courier New;\">qi::phrase_parse<\/span> is equivalent to the API function <span style=\"font-family: Courier New;\">qi::parse<\/span> except for its additional parameter, the skip parser. Our example uses <span style=\"font-family: Courier New;\">qi::space<\/span>, but any other parser expression, even a much more complex one, can serve as the skipper.<\/li>\n<li>All rules that should perform skip parsing need to be declared with the type of the skip parser they will be used with. Our example specifies the type of the <span style=\"font-family: Courier New;\">qi::space<\/span> parser expression, which is <span style=\"font-family: Courier New;\">qi::space_type<\/span>. For more complex parser expressions you might want to use a (mini) grammar or take advantage of <span style=\"font-family: Courier New;\">BOOST_TYPEOF<\/span> to let the compiler deduce the actual type.<\/li>\n<li>All rules that should not perform skip parsing have to be declared without a skip parser type. 
These rules behave like an implicit <span style=\"font-family: Courier New;\">lexeme[]<\/span> directive (see below for more about <span style=\"font-family: Courier New;\">lexeme[]<\/span>): they inhibit the invocation of the skip parser even when they are executed as part of a rule with an associated skipper.<\/li>\n<\/ul>\n<p>In the example above we suppressed skipping while matching the <span style=\"font-family: Courier New;\">key<\/span> and the <span style=\"font-family: Courier New;\">value<\/span>. Otherwise our grammar would also accept, and silently drop, space characters inside the <span style=\"font-family: Courier New;\">key<\/span> or the <span style=\"font-family: Courier New;\">value<\/span>. Remember that the expression <span style=\"font-family: Courier New;\">char_<\/span> conforms to the <a href=\"http:\/\/www.boost.org\/doc\/libs\/1_42_0\/libs\/spirit\/doc\/html\/spirit\/qi\/reference\/parser_concepts\/primitiveparser.html\">PrimitiveParser<\/a> concept, so it executes the skip parser on each invocation. In this case the skip parser would run between any two matched characters.<\/p>\n<p>Sometimes it is necessary to turn off skipping for only a small part of the grammar. For this purpose Spirit provides the <span style=\"font-family: Courier New;\">lexeme[]<\/span> directive, which inhibits skipping during the execution of the embedded parser. For instance, parsing a quoted string of alphanumeric characters would look like this:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nstring = lexeme['&quot;' &gt;&gt; *alnum &gt;&gt; '&quot;'];\r\n<\/pre>\n<p>Here the <span style=\"font-family: Courier New;\">lexeme[]<\/span> directive disables skipping while matching the string, which avoids &#8216;losing&#8217; characters that would otherwise be consumed by the skipper. 
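<\/p>\n<p>The effect can be checked with a small, self-contained sketch. It assumes Boost.Spirit Qi and a C++11 compiler (for the raw string literal), and the helper <span style=\"font-family: Courier New;\">full_match<\/span> is hypothetical. It parses the quoted string <span style=\"font-family: Courier New;\">&quot;a b&quot;<\/span> with <span style=\"font-family: Courier New;\">qi::space<\/span> as the skipper, once without and once with <span style=\"font-family: Courier New;\">lexeme[]<\/span>:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\n#include &lt;boost\/spirit\/include\/qi.hpp&gt;\r\n#include &lt;iostream&gt;\r\n#include &lt;string&gt;\r\n\r\nnamespace qi = boost::spirit::qi;\r\n\r\n\/\/ Hypothetical helper: parses input with qi::space as the skipper and\r\n\/\/ returns true only if p matched and the whole input was consumed.\r\ntemplate &lt;typename Parser&gt;\r\nbool full_match(std::string const&amp; input, Parser const&amp; p, std::string&amp; out)\r\n{\r\n    std::string::const_iterator first = input.begin(), last = input.end();\r\n    return qi::phrase_parse(first, last, p, qi::space, out) &amp;&amp; first == last;\r\n}\r\n\r\nint main()\r\n{\r\n    std::string const input = R&quot;(&quot;a b&quot;)&quot;;  \/\/ the characters &quot;a b&quot;, quotes included\r\n    std::string plain, lexed;\r\n\r\n    \/\/ No lexeme[]: each alnum pre-skips, so the inner space is silently\r\n    \/\/ dropped and the attribute becomes &quot;ab&quot;.\r\n    bool r1 = full_match(input, '&quot;' &gt;&gt; *qi::alnum &gt;&gt; '&quot;', plain);\r\n\r\n    \/\/ With lexeme[]: no skipping inside, *alnum stops at the space, the\r\n    \/\/ closing quote cannot match, and the parse fails.\r\n    bool r2 = full_match(input, qi::lexeme['&quot;' &gt;&gt; *qi::alnum &gt;&gt; '&quot;'], lexed);\r\n\r\n    std::cout &lt;&lt; r1 &lt;&lt; &quot; [&quot; &lt;&lt; plain &lt;&lt; &quot;] &quot; &lt;&lt; r2 &lt;&lt; std::endl;\r\n}\r\n<\/pre>\n<p>The program should print <span style=\"font-family: Courier New;\">1 [ab] 0<\/span>. Without <span style=\"font-family: Courier New;\">lexeme[]<\/span> the input is accepted but the space is lost. With it, a string containing a non-alphanumeric character is rejected.<\/p>\n<p>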
Please note: <span style=\"font-family: Courier New;\">lexeme[]<\/span> performs a pre-skip step even though it is not a <a href=\"http:\/\/www.boost.org\/doc\/libs\/1_42_0\/libs\/spirit\/doc\/html\/spirit\/qi\/reference\/parser_concepts\/primitiveparser.html\">PrimitiveParser<\/a> itself (by design, it is treated as a logical primitive). If this is undesirable, you can use the <span style=\"font-family: Courier New;\">no_skip[]<\/span> directive instead:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nstring = '&quot;' &gt;&gt; no_skip[*alnum] &gt;&gt; '&quot;';\r\n<\/pre>\n<p>Here the skip parser does not run between the opening quote and the embedded parser, even if the string starts with a character sequence the skip parser would match. The <span style=\"font-family: Courier New;\">no_skip[]<\/span> directive is semantically equivalent to <span style=\"font-family: Courier New;\">lexeme[]<\/span>, except that it does not perform a pre-skip before executing the embedded parser. Note: <span style=\"font-family: Courier New;\">no_skip[]<\/span> was added only recently and will be available starting with the next release (Boost V1.43).<\/p>\n<p>This short article would not be complete without mentioning the <span style=\"font-family: Courier New;\">skip[]<\/span> directive, the counterpart to <span style=\"font-family: Courier New;\">lexeme[]<\/span>: it enables skipping for the embedded parser. Without an argument it can be used only inside a <span style=\"font-family: Courier New;\">lexeme[]<\/span> or <span style=\"font-family: Courier New;\">no_skip[]<\/span> directive, where it simply re-enables the outer skipper:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nstring = lexeme['&quot;' &gt;&gt; *(alpha | skip[digit]) &gt;&gt; '&quot;'];\r\n<\/pre>\n<p>This (purely hypothetical) parser re-enables skipping inside the string only while matching digits. But the <span style=\"font-family: Courier New;\">skip[]<\/span> directive can do more. 
It can take an argument specifying a new skipper, for instance:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nskip(qi::space)[*alnum]\r\n<\/pre>\n<p>which skips spaces while executing the embedded <span style=\"font-family: Courier New;\">*alnum<\/span> parser. This form of the directive serves two purposes. It can change the current skip parser, or it can establish skipping in a context that otherwise does no skipping at all (even when invoked through the <span style=\"font-family: Courier New;\">qi::parse()<\/span> API function).<\/p>\n<p>For more detailed information about all the directives mentioned here, please see the corresponding documentation.<\/p>","protected":false},"excerpt":{"rendered":"<p>Spirit has supported skipper-based parsing since its inception, so this is definitely nothing new in Spirit V2. Nevertheless, the recent discussion on the Spirit mailing list about the semantics of Qi&#8217;s lexeme[] directive shows the need for some clarification. 
Today I try to answer questions like: &#8220;What does it mean to use a [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_s2mail":"","spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[19,5,18],"tags":[8],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/pIHdZ-fX","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/posts\/989"}],"collection":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/comments?post=989"}],"version-history":[{"count":17,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/posts\/989\/revisions"}],"predecessor-version":[{"id":995,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/posts\/989\/revisions\/995"}],"wp:attachment":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/media?parent=989"}],"wp:term":[{"taxonomy":"c
ategory","embeddable":true,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/categories?post=989"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/tags?post=989"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}