{"id":371,"date":"2009-11-15T20:12:48","date_gmt":"2009-11-16T04:12:48","guid":{"rendered":"http:\/\/boost-spirit.com\/home\/?page_id=371"},"modified":"2009-11-19T10:02:08","modified_gmt":"2009-11-19T18:02:08","slug":"parsing-a-list-of-key-value-pairs-using-spirit-qi","status":"publish","type":"page","link":"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/parsing-a-list-of-key-value-pairs-using-spirit-qi\/","title":{"rendered":"Parsing a List of Key-Value Pairs Using Spirit.Qi"},"content":{"rendered":"<p>One of the goals of this blog is to provide shrink wrapped solutions for small everyday parsing and output generation problems. We hope to show how <em>Spirit<\/em> may be used to simplify the life for you as a C++ programmer related to data conversion problems from some external representation to your internal data structures and vice versa.<\/p>\n<p>One of the tasks often to be solved is to parse arbitrary key\/value pairs delimited by some separator into a <span style=\"font-family: courier new\">std::map<\/span>. Parsing the URL query format is one example: <span style=\"font-family: courier new\">key1=value1;key2=value2;\u2026;keyN=valueN<\/span>, and that\u2019s what I would like to provide a reusable solution for.<\/p>\n<p>Let\u2019s start with the corresponding grammar (written using the notation of <a href=\"http:\/\/en.wikipedia.org\/wiki\/Parsing_expression_grammar\" target=\"_blank\">Parsing Expression Grammars<\/a>):<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nquery \u2190 pair ((';' \/ '&amp;') pair)*\r\npair  \u2190 key ('=' value)?\r\nkey   \u2190 [a-zA-Z_][a-zA-Z_0-9]*\r\nvalue \u2190 [a-zA-Z_0-9]+\r\n<\/pre>\n<p>We assume that a <strong>key<\/strong> has to start with any letter or an underscore, but otherwise might consist of letters, digits or underscores. A <strong>value<\/strong> can not be empty and may consist of letters, digits or underscores only as well. Further, we assume a <strong>query<\/strong> to be a sequence of at least one <strong>pair<\/strong> delimited by semicolons (<span style=\"font-family: courier new\">&#8216;;&#8217;<\/span>) or ampersands (<span style=\"font-family: courier new\">&#8216;&amp;&#8217;<\/span>), and a <strong>pair<\/strong> to be a sequence of a mandatory <strong>key<\/strong> optionally followed by a <span style=\"font-family: courier new\">&#8216;=&#8217;<\/span> and a <strong>value<\/strong>.<\/p>\n<p>Converting any Parsing Expression Grammar (PEG) into an equivalent grammar for <em>Spirit.Qi<\/em> is a purely mechanical step. Let me provide the result first and explain some of the details later on:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nquery =  pair &gt;&gt; *((qi::lit(';') | '&amp;') &gt;&gt; pair);\r\npair  =  key &gt;&gt; -('=' &gt;&gt; value);\r\nkey   =  qi::char_(&quot;a-zA-Z_&quot;) &gt;&gt; *qi::char_(&quot;a-zA-Z_0-9&quot;);\r\nvalue = +qi::char_(&quot;a-zA-Z_0-9&quot;);\r\n<\/pre>\n<p>All differences we see are caused by limitations of the C++ language <em>Spirit<\/em> has to live with. Direct juxtaposition is expressed using the right shift operator (<span style=\"font-family: courier new\">&#8216;&gt;&gt;&#8217;)<\/span><span style=\"font-family: trebuchet ms\">, postfix operators as the Kleene Star (<span style=\"font-family: courier new\">&#8216;*&#8217;<\/span>) and the Plus (<span style=\"font-family: courier new\">&#8216;+&#8217;<\/span>) are moved to the front of the corresponding expression and written as prefix operators, and the optional operator (question mark) is rewritten using the unary minus (<span style=\"font-family: courier new\">&#8216;-&#8216;<\/span>). Otherwise the two grammars look fairly similar.<\/span> The <span style=\"font-family: courier new\">char_<\/span> is a predefined Qi parser primitive matching exactly one character based on the description provided as its argument. The <span style=\"font-family: Courier New;\">lit<\/span> is very similar to <span style=\"font-family: Courier New;\">char_<\/span> except that it does not expose the matched value as its attribute.<\/p>\n<p>The next required step is to understand what attribute type each of our defined grammar rules should expose. These attribute types will allow us to map the external representation onto the internal C++ types we will use to store the parsing results. Let us store the keys and the values as <span style=\"font-family: courier new\">std::string<\/span>\u2019s (assuming we have to deal with narrow character representations). The result of a parsed key\/value pair can be conveniently stored into a <span style=\"font-family: courier new\">std::pair&lt;std::string, std::string&gt;<\/span>. Finally, the overall result of parsing the query string should be stored into a <span style=\"font-family: courier new\">std::map&lt;std::string, std::string&gt;<\/span>. This knowledge allows to write the declaration of our rule variables as used above:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nqi::rule&lt;Iterator, std::map&lt;std::string, std::string&gt;()&gt; query;\r\nqi::rule&lt;Iterator, std::pair&lt;std::string, std::string&gt;()&gt; pair;\r\nqi::rule&lt;Iterator, std::string()&gt; key, value;\r\n<\/pre>\n<p>The type <span style=\"font-family: courier new\">qi::rule&lt;&gt;<\/span> is a predefined non-terminal parser provided by <em>Spirit<\/em> usable for storing the grammar definitions above. We need to provide the iterator type of the underlying input data <span style=\"font-family: courier new\">(Iterator)<\/span> and the attribute type, which is the type of the data the rule is supposed to store its parsed data in.<\/p>\n<p>A word about the unusual function declaration syntax used to specify the attribute type of the rule. Non-terminals in recursive descent parsers can be seen as being very similar to functions. They return a value, their (synthesized) attribute, while they optionally may take arguments, their (inherited) attributes. <em>Spirit<\/em> uses the function declaration syntax in order to emphasize this similarity.<\/p>\n<p>In the beginning I promised to provide a shrink wrapped solution, so what\u2019s still left is to encapsulate the whole functionality. Again Spirit has some recommended way of doing that: <span style=\"font-family: courier new\">grammar&lt;&gt;<\/span>\u2019s. Spirit <span style=\"font-family: courier new\">grammar<\/span>\u2019s are special non-terminals acting as containers for one or more rules allowing to encapsulate more complex parsers:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nnamespace qi = boost::spirit::qi;\r\n\r\ntemplate &lt;typename Iterator&gt;\r\nstruct keys_and_values\r\n  : qi::grammar&lt;Iterator, std::map&lt;std::string, std::string&gt;()&gt;\r\n{\r\n    keys_and_values()\r\n      : keys_and_values::base_type(query)\r\n    {\r\n        query =  pair &gt;&gt; *((qi::lit(';') | '&amp;') &gt;&gt; pair);\r\n        pair  =  key &gt;&gt; -('=' &gt;&gt; value);\r\n        key   =  qi::char_(&quot;a-zA-Z_&quot;) &gt;&gt; *qi::char_(&quot;a-zA-Z_0-9&quot;);\r\n        value = +qi::char_(&quot;a-zA-Z_0-9&quot;);\r\n    }\r\n    qi::rule&lt;Iterator, std::map&lt;std::string, std::string&gt;()&gt; query;\r\n    qi::rule&lt;Iterator, std::pair&lt;std::string, std::string&gt;()&gt; pair;\r\n    qi::rule&lt;Iterator, std::string()&gt; key, value;\r\n};\r\n<\/pre>\n<p>The derivation from Qi\u2019s grammar type converts the <span style=\"font-family: courier new\">keys_and_values<\/span> type into a parser. Its member rules define a grammar which makes it usable for recognizing URL query strings. The base class constructor gets passed the <strong>start<\/strong> rule, which is the top most rule of the grammar to be executed when the grammar is invoked. The type <span style=\"font-family: courier new\">keys_and_values<\/span> has a template parameter allowing to utilize this grammar with arbitrary iterator types.<\/p>\n<p>The last missing code piece shows how to invoke the newly created parser.<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nstd::string input(&quot;key1=value1;key2;key3=value3&quot;);  \/\/ input to parse\r\nstd::string::iterator begin = input.begin();\r\nstd::string::iterator end = input.end();\r\n\r\nkeys_and_values&lt;std::string::iterator&gt; p;    \/\/ create instance of parser\r\nstd::map&lt;std::string, std::string&gt; m;        \/\/ map to receive results\r\nbool result = qi::parse(begin, end, p, m);   \/\/ returns true if successful\r\n<\/pre>\n<p>The function <span style=\"font-family: courier new\">qi::parse()<\/span> is one of Spirit\u2019s main API functions. In the simplest case it takes a pair of iterators pointing to the input sequence to parse <span style=\"font-family: courier new\">(begin, end)<\/span>, an instance of the parser to invoke <span style=\"font-family: courier new\">(p)<\/span>, and the attribute instance to be filled with the converted data <span style=\"font-family: courier new\">(m)<\/span>. This function executes the actual parsing operation and returns <span style=\"font-family: Courier New;\">true<\/span> if it was successful.<\/p>\n<p>The fact that attributes of certain types are getting filled on the fly might look like magic to you, or you might think I left out some essential code snippets. But neither it is magic nor did I leave out anything. In <em>Spirit.Qi<\/em> all parser components (such as <span style=\"font-family: Courier New;\">char_<\/span> or <span style=\"font-family: Courier New;\">lit<\/span>) expose specific attribute types and all compound parsers (such as sequences and alternatives) implement well defined rules for attribute propagation and merging. Our example uses the knowledge about these rules, and if you want to understand how this works in more detail, please refer to <em>Spirit\u2019s<\/em> documentation.<\/p>\n<p>Here is an example: as you might expect <span style=\"font-family: Courier New;\">char_<\/span> exposes the matched character as its attribute. The Kleene Star (<span style=\"font-family: courier new\">&#8216;*&#8217;<\/span>) and the Plus (<span style=\"font-family: courier new\">&#8216;+&#8217;<\/span>) are compound parsers collecting the attributes of their embedded parser in a (<a href=\"http:\/\/www.sgi.com\/tech\/stl\/\" target=\"_blank\">STL<\/a> compatible) container (in our case a <span style=\"font-family: Courier New;\">std::string<\/span><span style=\"font-family: Trebuchet MS;\">, which is a container of <span style=\"font-family: Courier New;\">char<\/span>). The non-terminal <span style=\"font-family: Courier New;\">rule<\/span> normally inherits the attribute of the parser expression on the right hand side of its assignment. Last but not least, sequences match the attributes of their elements to the corresponding parts of their attribute data structure. Since literals (such as <span style=\"font-family: Courier New;\">&#8216;=&#8217;<\/span> or <span style=\"font-family: Courier New;\">lit(&#8216;;&#8217;)<\/span>) do not expose any attribute, the expression<\/span><\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\npair  =  key &gt;&gt; -('=' &gt;&gt; value);\r\n<\/pre>\n<p>naturally combines the two string attributes of <strong>key<\/strong> and <strong>value<\/strong> into the pair of strings as expected by the non-terminal <strong>pair<\/strong>, initializing the attribute of <strong>value<\/strong> with an empty string if it is not matched.<\/p>\n<p>If you want to try it out for yourself, the complete source code for this example is available from the <a href=\"http:\/\/www.boost.org\/\" target=\"_blank\">Boost<\/a> SVN <a href=\"http:\/\/svn.boost.org\/svn\/boost\/trunk\/libs\/spirit\/example\/qi\/key_value_sequence.cpp\" target=\"_blank\">here<\/a>. In the future this example will be distributed as part of the Spirit distribution, but for now it lives in the SVN only.<\/p>\n<div class=\"sharedaddy sd-sharing-enabled\"><div class=\"robots-nocontent sd-block sd-social sd-social-icon-text sd-sharing\"><h3 class=\"sd-title\">Share this:<\/h3><div class=\"sd-content\"><ul><li><a href=\"#\" class=\"sharing-anchor sd-button share-more\"><span>Share<\/span><\/a><\/li><li class=\"share-end\"><\/li><\/ul><div class=\"sharing-hidden\"><div class=\"inner\" style=\"display: none;\"><ul><li class=\"share-facebook\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-facebook-371\" class=\"share-facebook sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/parsing-a-list-of-key-value-pairs-using-spirit-qi\/?share=facebook\" target=\"_blank\" title=\"Click to share on Facebook\" ><span>Facebook<\/span><\/a><\/li><li class=\"share-twitter\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-twitter-371\" class=\"share-twitter sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/parsing-a-list-of-key-value-pairs-using-spirit-qi\/?share=twitter\" target=\"_blank\" title=\"Click to share on Twitter\" ><span>Twitter<\/span><\/a><\/li><li class=\"share-end\"><\/li><li class=\"share-pinterest\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-pinterest-371\" class=\"share-pinterest sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/parsing-a-list-of-key-value-pairs-using-spirit-qi\/?share=pinterest\" target=\"_blank\" title=\"Click to share on Pinterest\" ><span>Pinterest<\/span><\/a><\/li><li class=\"share-linkedin\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-linkedin-371\" class=\"share-linkedin sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/parsing-a-list-of-key-value-pairs-using-spirit-qi\/?share=linkedin\" target=\"_blank\" title=\"Click to share on LinkedIn\" ><span>LinkedIn<\/span><\/a><\/li><li class=\"share-end\"><\/li><li class=\"share-reddit\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"\" class=\"share-reddit sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/parsing-a-list-of-key-value-pairs-using-spirit-qi\/?share=reddit\" target=\"_blank\" title=\"Click to share on Reddit\" ><span>Reddit<\/span><\/a><\/li><li class=\"share-tumblr\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"\" class=\"share-tumblr sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/parsing-a-list-of-key-value-pairs-using-spirit-qi\/?share=tumblr\" target=\"_blank\" title=\"Click to share on Tumblr\" ><span>Tumblr<\/span><\/a><\/li><li class=\"share-end\"><\/li><li class=\"share-end\"><\/li><\/ul><\/div><\/div><\/div><\/div><\/div>","protected":false},"excerpt":{"rendered":"<p>One of the goals of this blog is to provide shrink wrapped solutions for small everyday parsing and output generation problems. We hope to show how Spirit may be used to simplify the life for you as a C++ programmer related to data conversion problems from some external representation to your internal data structures and [&hellip;]<\/p>\n<div class=\"sharedaddy sd-sharing-enabled\"><div class=\"robots-nocontent sd-block sd-social sd-social-icon-text sd-sharing\"><h3 class=\"sd-title\">Share this:<\/h3><div class=\"sd-content\"><ul><li><a href=\"#\" class=\"sharing-anchor sd-button share-more\"><span>Share<\/span><\/a><\/li><li class=\"share-end\"><\/li><\/ul><div class=\"sharing-hidden\"><div class=\"inner\" style=\"display: none;\"><ul><li class=\"share-facebook\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-facebook-371\" class=\"share-facebook sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/parsing-a-list-of-key-value-pairs-using-spirit-qi\/?share=facebook\" target=\"_blank\" title=\"Click to share on Facebook\" ><span>Facebook<\/span><\/a><\/li><li class=\"share-twitter\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-twitter-371\" class=\"share-twitter sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/parsing-a-list-of-key-value-pairs-using-spirit-qi\/?share=twitter\" target=\"_blank\" title=\"Click to share on Twitter\" ><span>Twitter<\/span><\/a><\/li><li class=\"share-end\"><\/li><li class=\"share-pinterest\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-pinterest-371\" class=\"share-pinterest sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/parsing-a-list-of-key-value-pairs-using-spirit-qi\/?share=pinterest\" target=\"_blank\" title=\"Click to share on Pinterest\" ><span>Pinterest<\/span><\/a><\/li><li class=\"share-linkedin\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-linkedin-371\" class=\"share-linkedin sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/parsing-a-list-of-key-value-pairs-using-spirit-qi\/?share=linkedin\" target=\"_blank\" title=\"Click to share on LinkedIn\" ><span>LinkedIn<\/span><\/a><\/li><li class=\"share-end\"><\/li><li class=\"share-reddit\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"\" class=\"share-reddit sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/parsing-a-list-of-key-value-pairs-using-spirit-qi\/?share=reddit\" target=\"_blank\" title=\"Click to share on Reddit\" ><span>Reddit<\/span><\/a><\/li><li class=\"share-tumblr\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"\" class=\"share-tumblr sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/parsing-a-list-of-key-value-pairs-using-spirit-qi\/?share=tumblr\" target=\"_blank\" title=\"Click to share on Tumblr\" ><span>Tumblr<\/span><\/a><\/li><li class=\"share-end\"><\/li><li class=\"share-end\"><\/li><\/ul><\/div><\/div><\/div><\/div><\/div>","protected":false},"author":3,"featured_media":0,"parent":384,"menu_order":6,"comment_status":"open","ping_status":"open","template":"article-page.php","meta":{"_s2mail":"","spay_email":""},"jetpack_shortlink":"https:\/\/wp.me\/PIHdZ-5Z","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/pages\/371"}],"collection":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/comments?post=371"}],"version-history":[{"count":11,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/pages\/371\/revisions"}],"predecessor-version":[{"id":413,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/pages\/371\/revisions\/413"}],"up":[{"embeddable":true,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/pages\/384"}],"wp:attachment":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/media?parent=371"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}