{"id":1183,"date":"2010-11-13T12:59:30","date_gmt":"2010-11-13T20:59:30","guid":{"rendered":"http:\/\/boost-spirit.com\/home\/"},"modified":"2010-11-13T13:25:38","modified_gmt":"2010-11-13T21:25:38","slug":"parsing-escaped-string-input-using-spirit-qi","status":"publish","type":"page","link":"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/parsing-escaped-string-input-using-spirit-qi\/","title":{"rendered":"Parsing Escaped String Input Using Spirit.Qi"},"content":{"rendered":"<p>In a previous <a href=\"http:\/\/boost-spirit.com\/home\/articles\/karma-examples\/generate-escaped-string-output-using-spirit-karma\/\">article<\/a> you&#8217;ve learned how to generate escaped strings. Today we&#8217;re going to do the reverse: we&#8217;re going to describe a <em>Qi<\/em> grammar you can use to parse quoted strings in which special characters are escaped.<\/p>\n<p>The purpose of the <span style=\"font-family: Courier New;\">unescaped_string<\/span> grammar is to remove the enclosing quotes and to un-escape the characters which were previously escaped (e.g. convert <span style=\"font-family: Courier New;\">&quot;\\\\n&quot;<\/span> back to <span style=\"font-family: Courier New;\">'\\n'<\/span>). Unsurprisingly, the Parsing Expression Grammar (PEG) is very similar to the one in the <em>Karma<\/em> article:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nunesc_str = '&quot;' (unesc_char \/ .
\/ &quot;\\\\x&quot; hex)* '&quot;'\r\nunesc_char = &amp;&quot;\\\\a&quot; '\\a' \/ &amp;&quot;\\\\b&quot; '\\b' \/ &amp;&quot;\\\\f&quot; '\\f' \/\r\n             &amp;&quot;\\\\n&quot; '\\n' \/ &amp;&quot;\\\\r&quot; '\\r' \/ &amp;&quot;\\\\t&quot; '\\t' \/\r\n             &amp;&quot;\\\\v&quot; '\\v' \/ &amp;&quot;\\\\\\\\&quot; '\\\\' \/\r\n             &amp;&quot;\\\\\\'&quot; '\\'' \/ &amp;&quot;\\\\\\&quot;&quot; '&quot;'\r\n<\/pre>\n<p>An escaped string starts and ends with the quoting character (in this case <span style=\"font-family: Courier New;\">'&quot;'<\/span>), and every character in between is read either as an escaped character (<span style=\"font-family: Courier New;\">unesc_char<\/span>), as a printable character, or as <span style=\"font-family: Courier New;\">&quot;\\\\x&quot;<\/span> followed by the hexadecimal representation of a character code. The <span style=\"font-family: Courier New;\">unesc_char<\/span> rule handles each of the listed character codes by parsing a backslash followed by the corresponding C-style encoding.
As in the <em>Karma<\/em> example, each of the listed alternatives (such as <span style=\"font-family: Courier New;\">&amp;&quot;\\\\a&quot; '\\a'<\/span>) reads as: if the input being parsed starts with <span style=\"font-family: Courier New;\">&quot;\\\\a&quot;<\/span>, return <span style=\"font-family: Courier New;\">'\\a'<\/span>.<\/p>\n<p>Now let&#8217;s convert this to a <em>Spirit<\/em> grammar:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nunesc_str = '&quot;' &gt;&gt; *(unesc_char | qi::alnum | &quot;\\\\x&quot; &gt;&gt; qi::hex) &gt;&gt; '&quot;';\r\nunesc_char.add(&quot;\\\\a&quot;, '\\a')(&quot;\\\\b&quot;, '\\b')(&quot;\\\\f&quot;, '\\f')(&quot;\\\\n&quot;, '\\n')\r\n              (&quot;\\\\r&quot;, '\\r')(&quot;\\\\t&quot;, '\\t')(&quot;\\\\v&quot;, '\\v')(&quot;\\\\\\\\&quot;, '\\\\')\r\n              (&quot;\\\\\\'&quot;, '\\'')(&quot;\\\\\\&quot;&quot;, '\\&quot;');\r\n<\/pre>\n<p>Just like in the <em>Karma<\/em> example we&#8217;re using predefined facilities, but this time <em>Qi<\/em> facilities. Where we previously used <span style=\"font-family: Courier New;\">karma::symbols&lt;&gt;<\/span> to map special characters to their C-style representation, we&#8217;re now doing the reverse: mapping C-style representations to the special characters using <span style=\"font-family: Courier New;\">qi::symbols&lt;&gt;<\/span>. This too will conveniently fail if the representation is not in the symbol table. The next parser to be tried is <span style=\"font-family: Courier New;\">qi::alnum<\/span>, which accepts a single alphanumeric character and fails on all other input. At this point you might be wondering why I&#8217;m not using <span style=\"font-family: Courier New;\">qi::print<\/span>, as we used <span style=\"font-family: Courier New;\">karma::print<\/span> in the <em>Karma<\/em> example; I&#8217;ll come back to this later.
Last but not least, we&#8217;ll try converting a hexadecimal character representation to the actual character code if it&#8217;s prefixed with <span style=\"font-family: Courier New;\">&quot;\\\\x&quot;<\/span>.<\/p>\n<p>Generally, what is true for <em>Karma<\/em> generators holds for their <em>Qi<\/em> parser counterparts. They too fail if their preconditions are not met, which allows them to be used in alternatives just like the <em>Karma<\/em> generators: when one alternative fails, the next one is tried, and only when we run out of alternatives does the whole expression fail. In our case, if <span style=\"font-family: Courier New;\">unesc_char<\/span> fails, <span style=\"font-family: Courier New;\">qi::alnum<\/span> is tried next, and if that fails as well, <span style=\"font-family: Courier New;\">&quot;\\\\x&quot; &gt;&gt; qi::hex<\/span> gets its turn.<\/p>\n<p>Now that we have the grammar, the next step is to figure out which attributes to use for the rules. The attributes in <em>Qi<\/em> are the types and values we get as the result of converting the input, so this is quite straightforward:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nqi::rule&lt;InputIterator, std::string()&gt; unesc_str;\r\nqi::symbols&lt;char const, char const&gt; unesc_char;\r\n<\/pre>\n<p>Note that we&#8217;re assuming a narrow character representation here; for a wide character representation one would use <span style=\"font-family: Courier New;\">std::wstring<\/span> and <span style=\"font-family: Courier New;\">wchar_t const<\/span>.<\/p>\n<p>As in the <em>Karma<\/em> example, we use the <em>Qi<\/em> counterpart <span style=\"font-family: Courier New;\">qi::rule&lt;&gt;<\/span> as the non-terminal that stores the parsed data for <span style=\"font-family: Courier New;\">unesc_str<\/span>, and we again use <span style=\"font-family: Courier New;\">std::string<\/span> as its attribute.
Again, two of the parser alternatives inside the Kleene Star expose a single character as their attribute, while the third (<span style=\"font-family: Courier New;\">&quot;\\\\x&quot; &gt;&gt; qi::hex<\/span>) exposes an <span style=\"font-family: Courier New;\">unsigned int<\/span>, which is implicitly converted to a <span style=\"font-family: Courier New;\">char<\/span>. Strictly following the attribute rules, the Kleene Star exposes a <span style=\"font-family: Courier New;\">std::vector&lt;char&gt;<\/span>, but this is compatible with a <span style=\"font-family: Courier New;\">std::string<\/span>. Secondly, the C-style representations and the characters they map to are stored in a <span style=\"font-family: Courier New;\">qi::symbols&lt;&gt;<\/span> instance. Now you might wonder why it&#8217;s a <span style=\"font-family: Courier New;\">qi::symbols&lt;char const, char const&gt;<\/span> and not a <span style=\"font-family: Courier New;\">qi::symbols&lt;char const*, char const&gt;<\/span>, given that the <em>Karma<\/em> example used <span style=\"font-family: Courier New;\">karma::symbols&lt;char const, char const*&gt;<\/span>. The reason is that the first template parameter of <span style=\"font-family: Courier New;\">qi::symbols&lt;&gt;<\/span> is the character type of the stored strings, not the type of the strings themselves.<\/p>\n<p>That&#8217;s all there is to it!<\/p>\n<p>Since the <em>Karma<\/em> example allows you to specify the quoting character, we&#8217;re going to do so too, and I&#8217;ll use this opportunity to show just how similar <em>Karma<\/em> and <em>Qi<\/em> really are.
Now here&#8217;s the <em>Karma<\/em> code presented in the previous article:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nkarma::rule&lt;OutputIterator, std::string(char const*)&gt; esc_str =\r\n        karma::lit(karma::_r1)\r\n    &lt;&lt; *(esc_char | karma::print | &quot;\\\\x&quot; &lt;&lt; karma::hex)\r\n    &lt;&lt;  karma::lit(karma::_r1);\r\n<\/pre>\n<p>Since we&#8217;re parsing data instead of generating it, we use <span style=\"font-family: Courier New;\">&gt;&gt;<\/span> instead of <span style=\"font-family: Courier New;\">&lt;&lt;<\/span>, and since we&#8217;re using <em>Qi<\/em> we switch to its namespace, which gives us:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nqi::rule&lt;InputIterator, std::string(char const*)&gt; unesc_str =\r\n        qi::lit(qi::_r1)\r\n    &gt;&gt; *(unesc_char | qi::alnum | &quot;\\\\x&quot; &gt;&gt; qi::hex)\r\n    &gt;&gt;  qi::lit(qi::_r1);\r\n<\/pre>\n<p>Easy, huh?<\/p>\n<p>This brings us back to why we&#8217;re using <span style=\"font-family: Courier New;\">qi::alnum<\/span> instead of <span style=\"font-family: Courier New;\">qi::print<\/span>. There are two reasons:<\/p>\n<ol>\n<li>The alternatives are tried in the order in which they appear, so if we had <span style=\"font-family: Courier New;\">*(unesc_char | qi::print | &quot;\\\\x&quot; &gt;&gt; qi::hex)<\/span>, the <span style=\"font-family: Courier New;\">&quot;\\\\x&quot; &gt;&gt; qi::hex<\/span> alternative would never be tried: the backslash, the <span style=\"font-family: Courier New;\">x<\/span> and the hexadecimal digits are all printable characters, so <span style=\"font-family: Courier New;\">qi::print<\/span> would consume them one by one.
Simply re-arranging the order solves this problem, so we could use <span style=\"font-family: Courier New;\">*(unesc_char | &quot;\\\\x&quot; &gt;&gt; qi::hex | qi::print)<\/span>.<\/li>\n<li>The quoting sequence used in the example is <span style=\"font-family: Courier New;\">'''<\/span>, which means we&#8217;re actually parsing the following:<\/li>\n<\/ol>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nunesc_str = qi::lit(&quot;'''&quot;)\r\n    &gt;&gt; *(unesc_char | &quot;\\\\x&quot; &gt;&gt; qi::hex | qi::print)\r\n    &gt;&gt;  qi::lit(&quot;'''&quot;);\r\n<\/pre>\n<p>However, the Kleene Star is greedy (and so is the unary <span style=\"font-family: Courier New;\">+<\/span>, which means &#8220;one or more&#8221;), so <span style=\"font-family: Courier New;\">qi::print<\/span> would consume the closing <span style=\"font-family: Courier New;\">'''<\/span> as well, leaving nothing for the final <span style=\"font-family: Courier New;\">qi::lit<\/span> to match! The solution here is not to use <span style=\"font-family: Courier New;\">qi::print<\/span>, but <span style=\"font-family: Courier New;\">(qi::print - '\\'')<\/span>, which means &#8220;all printable characters except <span style=\"font-family: Courier New;\">'<\/span>&#8221;.<\/p>\n<p>Putting this all together and wrapping it in a <span style=\"font-family: Courier New;\">qi::grammar<\/span>, we get:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\ntemplate &lt;typename InputIterator&gt;\r\nstruct unescaped_string\r\n  : qi::grammar&lt;InputIterator, std::string(char const*)&gt;\r\n{\r\n    unescaped_string()\r\n      : unescaped_string::base_type(unesc_str)\r\n    {\r\n        unesc_char.add(&quot;\\\\a&quot;, '\\a')(&quot;\\\\b&quot;, '\\b')(&quot;\\\\f&quot;, '\\f')(&quot;\\\\n&quot;, '\\n')\r\n                      (&quot;\\\\r&quot;, '\\r')(&quot;\\\\t&quot;, '\\t')(&quot;\\\\v&quot;, '\\v')\r\n                      (&quot;\\\\\\\\&quot;, '\\\\')(&quot;\\\\\\'&quot;, '\\'')(&quot;\\\\\\&quot;&quot;, '\\&quot;')\r\n        ;\r\n\r\n        unesc_str = qi::lit(qi::_r1)\r\n            &gt;&gt;
*(unesc_char | qi::alnum | &quot;\\\\x&quot; &gt;&gt; qi::hex)\r\n            &gt;&gt;  qi::lit(qi::_r1)\r\n        ;\r\n    }\r\n\r\n    qi::rule&lt;InputIterator, std::string(char const*)&gt; unesc_str;\r\n    qi::symbols&lt;char const, char const&gt; unesc_char;\r\n};\r\n<\/pre>\n<p>which, assuming the grammar is defined inside a namespace <span style=\"font-family: Courier New;\">client<\/span>, can be called like this:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\ntypedef std::string::const_iterator iterator_type;\r\n\r\nstd::string parsed;\r\nstd::string str(&quot;'''string\\\\x20to\\\\x20unescape\\\\x3a\\\\x20\\\\n\\\\r\\\\t\\\\\\&quot;\\\\'\\\\x41'''&quot;);\r\nchar const* quote = &quot;'''&quot;;\r\niterator_type iter = str.begin();\r\niterator_type end = str.end();\r\nclient::unescaped_string&lt;iterator_type&gt; p;\r\n\r\nbool ok = qi::parse(iter, end, p(quote), parsed);\r\n\r\n\/\/ this will result in: ok == true, iter == end and\r\n\/\/ parsed == &quot;string to unescape: \\n\\r\\t\\&quot;\\'A&quot;\r\n<\/pre>\n<p>This uses <span style=\"font-family: Courier New;\">qi::parse<\/span>, one of <em>Qi<\/em>&#8217;s main API functions and the parsing counterpart of <span style=\"font-family: Courier New;\">karma::generate<\/span>. It takes the beginning and the end of the input to parse, an instance of the parser (<span style=\"font-family: Courier New;\">p<\/span>), and an attribute instance that receives the parsed data (<span style=\"font-family: Courier New;\">parsed<\/span>). Because the grammar accepts only alphanumeric characters unescaped, the spaces and the colon in the input are written as hexadecimal escapes (<span style=\"font-family: Courier New;\">\\x20<\/span> and <span style=\"font-family: Courier New;\">\\x3a<\/span>). We pass the quoting character sequence (the <span style=\"font-family: Courier New;\">'''<\/span> stored in <span style=\"font-family: Courier New;\">quote<\/span>) as an inherited attribute when invoking the grammar.
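<\/p>\n<p>For reference, the fragments above can be assembled into a complete program along the lines of the following sketch. It is not the example file from the Boost repository. It assumes the Boost.Spirit V2 (<em>Qi<\/em>) headers are available and places the grammar in a namespace <span style=\"font-family: Courier New;\">client<\/span> to match the call shown above. It treats the parse as successful only if <span style=\"font-family: Courier New;\">qi::parse<\/span> returns <span style=\"font-family: Courier New;\">true<\/span> and <span style=\"font-family: Courier New;\">iter<\/span> has reached <span style=\"font-family: Courier New;\">end<\/span>.<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\n\/\/ Sketch only: assumes the Boost.Spirit V2 (Qi) headers are available.\r\n#include &lt;boost\/spirit\/include\/qi.hpp&gt;\r\n#include &lt;iostream&gt;\r\n#include &lt;string&gt;\r\n\r\nnamespace client\r\n{\r\n    namespace qi = boost::spirit::qi;\r\n\r\n    \/\/ The grammar shown above, placed in namespace client to match the call site.\r\n    template &lt;typename InputIterator&gt;\r\n    struct unescaped_string\r\n      : qi::grammar&lt;InputIterator, std::string(char const*)&gt;\r\n    {\r\n        unescaped_string()\r\n          : unescaped_string::base_type(unesc_str)\r\n        {\r\n            unesc_char.add(&quot;\\\\a&quot;, '\\a')(&quot;\\\\b&quot;, '\\b')(&quot;\\\\f&quot;, '\\f')(&quot;\\\\n&quot;, '\\n')\r\n                          (&quot;\\\\r&quot;, '\\r')(&quot;\\\\t&quot;, '\\t')(&quot;\\\\v&quot;, '\\v')\r\n                          (&quot;\\\\\\\\&quot;, '\\\\')(&quot;\\\\\\'&quot;, '\\'')(&quot;\\\\\\&quot;&quot;, '\\&quot;');\r\n\r\n            unesc_str = qi::lit(qi::_r1)\r\n                &gt;&gt; *(unesc_char | qi::alnum | &quot;\\\\x&quot; &gt;&gt; qi::hex)\r\n                &gt;&gt;  qi::lit(qi::_r1);\r\n        }\r\n\r\n        qi::rule&lt;InputIterator, std::string(char const*)&gt; unesc_str;\r\n        qi::symbols&lt;char const, char const&gt; unesc_char;\r\n    };\r\n}\r\n\r\nint main()\r\n{\r\n    namespace qi = boost::spirit::qi;\r\n    typedef std::string::const_iterator iterator_type;\r\n\r\n    \/\/ Spaces and ':' are hex-escaped because qi::alnum rejects them.\r\n    std::string str(&quot;'''string\\\\x20to\\\\x20unescape\\\\x3a\\\\x20\\\\n\\\\r\\\\t\\\\\\&quot;\\\\'\\\\x41'''&quot;);\r\n    char const* quote = &quot;'''&quot;;\r\n    std::string parsed;\r\n    parsed.reserve(str.length());  \/\/ the un-escaped result is never longer than the input\r\n\r\n    iterator_type iter = str.begin();\r\n    iterator_type end = str.end();\r\n    client::unescaped_string&lt;iterator_type&gt; p;\r\n\r\n    \/\/ Successful only if the grammar matched and all input was consumed.\r\n    bool ok = qi::parse(iter, end, p(quote), parsed) &amp;&amp; iter == end;\r\n    if (ok)\r\n        std::cout &lt;&lt; &quot;parsed: &quot; &lt;&lt; parsed &lt;&lt; '\\n';\r\n    else\r\n        std::cout &lt;&lt; &quot;parsing failed\\n&quot;;\r\n    return ok ? 0 : 1;\r\n}\r\n<\/pre>\n<p>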
On success, <span style=\"font-family: Courier New;\">qi::parse<\/span> returns <span style=\"font-family: Courier New;\">true<\/span>; otherwise it returns <span style=\"font-family: Courier New;\">false<\/span>. Because it advances <span style=\"font-family: Courier New;\">iter<\/span> past the input it has consumed, comparing <span style=\"font-family: Courier New;\">iter<\/span> to <span style=\"font-family: Courier New;\">end<\/span> afterwards tells you whether the complete input string has been parsed.<\/p>\n<p>One minor enhancement remains: calling <span style=\"font-family: Courier New;\">parsed.reserve(str.length())<\/span> before parsing reserves enough space in <span style=\"font-family: Courier New;\">parsed<\/span> for the un-escaped string, which is never longer than the input. This saves a few reallocations and gives a small speed-up.<\/p>\n<p>If you want to try out this example for yourself, the complete source code is available from the <a href=\"http:\/\/www.boost.org\">Boost<\/a> SVN <a href=\"http:\/\/svn.boost.org\/svn\/boost\/trunk\/libs\/spirit\/example\/qi\/unescaped_string.cpp\">here<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In a previous article you&#8217;ve learned how to generate escaped strings. Today we&#8217;re going to do the reverse.
We&#8217;re going to describe a Qi grammar you can use to parse quoted strings in which special characters are escaped. The purpose of the unescaped_string grammar is to remove the enclosing quotes and un-escape the characters which [&hellip;]<\/p>\n","protected":false},"author":450,"featured_media":0,"parent":384,"menu_order":1,"comment_status":"open","ping_status":"open","template":"article-page.php","meta":{"_s2mail":"yes","spay_email":""},"jetpack_shortlink":"https:\/\/wp.me\/PIHdZ-j5","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/pages\/1183"}],"collection":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/users\/450"}],"replies":[{"embeddable":true,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/comments?post=1183"}],"version-history":[{"count":6,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/pages\/1183\/revisions"}],"predecessor-version":[{"id":1192,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/pages\/1183\/revisions\/1192"}
],"up":[{"embeddable":true,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/pages\/384"}],"wp:attachment":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/media?parent=1183"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}