{"id":392,"date":"2009-11-17T03:41:59","date_gmt":"2009-11-17T11:41:59","guid":{"rendered":"http:\/\/boost-spirit.com\/home\/?page_id=392"},"modified":"2011-01-09T18:00:18","modified_gmt":"2011-01-10T02:00:18","slug":"nabialek-trick","status":"publish","type":"page","link":"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/nabialek-trick\/","title":{"rendered":"Nabialek trick"},"content":{"rendered":"<p>This sample is a port of the original &#8220;Nabialek trick&#8221; from classic Spirit. <a href=\"https:\/\/svn.boost.org\/svn\/boost\/trunk\/libs\/spirit\/example\/qi\/nabialek.cpp\">See the cpp file here<\/a>.<\/p>\n<h4>Update:<\/h4>\n<blockquote><p>Thomas Bernard did some performance investigations and concluded that the older version of this article which stores the rules directly in the symbol-table is sub-optimal. This updated article uses rule pointers instead.<\/p>\n<p>Thomas notes:<\/p>\n<p>I&#8217;ve done some more investigation on the performance issue the proposed Nabialek trick implementation shows. With the help of callgrind I confirmed that the main performance hit comes from copying the parser out of the symbol table when a successful parse occurs into the rule local variable. (If anybody is interested I can provide the callgrind results)<\/p>\n<p>There is a very simple way to prevent that: use pointers. This time the optimization I propose works \ud83d\ude42<\/p>\n<p>The modifications to the proposed implementation are simple:<\/p>\n<ol>\n<li>Replace the symbol table definition to use rule pointers instead of normal rules<\/li>\n<li>Replace the rule local variable type to be a pointer to a rule instead of a rule<\/li>\n<li>Adapt the call to the lazy parser to take into account the modified local variable type<\/li>\n<\/ol>\n<p>Using pointers prevents copying the rules around and gives about three times faster parsing speed in the most simple cases. The speed improvement can be much bigger as the complexity of the rules stored in the symbol table increases.<\/p><\/blockquote>\n<p>This technique, I&#8217;ll call the &#8220;Nabialek trick&#8221; (from the name of its inventor, Sam Nabialek), can improve the rule dispatch from linear non-deterministic to deterministic. The trick applies to grammars where a keyword (operator, etc), precedes a production. There are lots of grammars similar to this:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nr =\r\n        keyword1 &gt;&gt; production1\r\n    |   keyword2 &gt;&gt; production2\r\n    |   keyword3 &gt;&gt; production3\r\n    |   keyword4 &gt;&gt; production4\r\n    |   keyword5 &gt;&gt; production5\r\n    \/*** etc ***\/\r\n    ;\r\n;\r\n<\/pre>\n<p>The cascaded alternatives are tried one at a time through trial and error until something matches. The Nabialek trick takes advantage of the symbol table&#8217;s search properties to optimize the dispatching of the alternatives. For an example, consider this grammar:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\none = id;\r\ntwo = id &gt;&gt; ',' &gt;&gt; id;\r\n\r\nstart = *(&quot;one&quot; &gt;&gt; one | &quot;two&quot; &gt;&gt; two);\r\n<\/pre>\n<p>There are two rules (one and two). When &#8220;one&#8221; is recognized, rule &#8216;one&#8217; is invoked. When &#8220;two&#8221; is recognized, rule &#8216;two&#8217; is invoked.<\/p>\n<p>Here&#8217;s the corresponding grammar using the Nabialek trick (see <a href=\"https:\/\/svn.boost.org\/svn\/boost\/trunk\/libs\/spirit\/example\/qi\/nabialek.cpp\">nabialek.cpp<\/a>):<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\none = id;\r\ntwo = id &gt;&gt; ',' &gt;&gt; id;\r\n\r\nkeyword.add\r\n    (&quot;one&quot;, &amp;one)\r\n    (&quot;two&quot;, &amp;two)\r\n    ;\r\n\r\nstart = *(keyword[_a = _1] &gt;&gt; lazy(*_a));\r\n<\/pre>\n<p>The grammar works as follows. keyword is a symbol table with two slots containing rule pointers:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nqi::symbols&lt;char, qi::rule&lt;Iterator, ascii::space_type&gt;*&gt; keyword;\r\n<\/pre>\n<p>This is initialized to contain the address of rules &#8220;one&#8221; and &#8220;two&#8221;.<\/p>\n<p>one, two, id, start are rules which are declared as follows:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nqi::rule&lt;Iterator, ascii::space_type&gt; id, one, two;\r\nqi::rule&lt;Iterator, ascii::space_type,\r\n    qi::locals&lt;qi::rule&lt;Iterator,\r\n        ascii::space_type&gt;*&gt; &gt; start;\r\n<\/pre>\n<p>The start rule has a local variable \u2014a rule pointer. In this example, when we enter the start rule, we create a temporary local variable. When a valid keyword is matched, notice that the semantic action stores a pointer to the rule we initially stored in the symbol table into the local variable with this code:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nkeyword[_a = _1]\r\n<\/pre>\n<p>_a refers to the local variable and _1 refers to the attribute of keyword \u2014the rule pointer contained in the symbol slots.<\/p>\n<p>Now that the local variable (referred to by _a) contains a pointer to the rule we want to continue parsing with, the next thing to do is call it. This is done by the lazy pseudo parser:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\n&gt;&gt; lazy(*_a)\r\n<\/pre>\n<p>The dynamic grammar parses inputs such as:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\n&quot;one only&quot;\r\n&quot;one again&quot;\r\n&quot;two first, second&quot;\r\n<\/pre>\n<p>The cool part is that the rule we want to parse with is dynamically chosen at runtime depending on what the symbol table contains. If it got a &#8220;one&#8221; then we use the &#8216;one&#8217; rule. If it got &#8220;two&#8221;, then we use the &#8216;two&#8217; rule. Very nifty! This technique should be very fast, especially when there are lots of keywords. It would be nice to add special facilities to make this easy to use. I imagine:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nr = keywords &gt;&gt; rest;\r\n<\/pre>\n<p>where keywords is a special parser (based on the symbol table) that automatically sets its RHS (rest) depending on the acquired symbol. This, I think, is mighty cool! Someday perhaps&#8230;<\/p>\n<p>Now, that last line was written in 2003. Will someone come forward and finally do it? \ud83d\ude42<\/p>\n<div class=\"sharedaddy sd-sharing-enabled\"><div class=\"robots-nocontent sd-block sd-social sd-social-icon-text sd-sharing\"><h3 class=\"sd-title\">Share this:<\/h3><div class=\"sd-content\"><ul><li><a href=\"#\" class=\"sharing-anchor sd-button share-more\"><span>Share<\/span><\/a><\/li><li class=\"share-end\"><\/li><\/ul><div class=\"sharing-hidden\"><div class=\"inner\" style=\"display: none;\"><ul><li class=\"share-facebook\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-facebook-392\" class=\"share-facebook sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/nabialek-trick\/?share=facebook\" target=\"_blank\" title=\"Click to share on Facebook\" ><span>Facebook<\/span><\/a><\/li><li class=\"share-twitter\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-twitter-392\" class=\"share-twitter sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/nabialek-trick\/?share=twitter\" target=\"_blank\" title=\"Click to share on Twitter\" ><span>Twitter<\/span><\/a><\/li><li class=\"share-end\"><\/li><li class=\"share-pinterest\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-pinterest-392\" class=\"share-pinterest sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/nabialek-trick\/?share=pinterest\" target=\"_blank\" title=\"Click to share on Pinterest\" ><span>Pinterest<\/span><\/a><\/li><li class=\"share-linkedin\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-linkedin-392\" class=\"share-linkedin sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/nabialek-trick\/?share=linkedin\" target=\"_blank\" title=\"Click to share on LinkedIn\" ><span>LinkedIn<\/span><\/a><\/li><li class=\"share-end\"><\/li><li class=\"share-reddit\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"\" class=\"share-reddit sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/nabialek-trick\/?share=reddit\" target=\"_blank\" title=\"Click to share on Reddit\" ><span>Reddit<\/span><\/a><\/li><li class=\"share-tumblr\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"\" class=\"share-tumblr sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/nabialek-trick\/?share=tumblr\" target=\"_blank\" title=\"Click to share on Tumblr\" ><span>Tumblr<\/span><\/a><\/li><li class=\"share-end\"><\/li><li class=\"share-end\"><\/li><\/ul><\/div><\/div><\/div><\/div><\/div>","protected":false},"excerpt":{"rendered":"<p>This sample is a port of the original &#8220;Nabialek trick&#8221; from classic Spirit. See the cpp file here. Update: Thomas Bernard did some performance investigations and concluded that the older version of this article which stores the rules directly in the symbol-table is sub-optimal. This updated article uses rule pointers instead. Thomas notes: I&#8217;ve done [&hellip;]<\/p>\n<div class=\"sharedaddy sd-sharing-enabled\"><div class=\"robots-nocontent sd-block sd-social sd-social-icon-text sd-sharing\"><h3 class=\"sd-title\">Share this:<\/h3><div class=\"sd-content\"><ul><li><a href=\"#\" class=\"sharing-anchor sd-button share-more\"><span>Share<\/span><\/a><\/li><li class=\"share-end\"><\/li><\/ul><div class=\"sharing-hidden\"><div class=\"inner\" style=\"display: none;\"><ul><li class=\"share-facebook\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-facebook-392\" class=\"share-facebook sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/nabialek-trick\/?share=facebook\" target=\"_blank\" title=\"Click to share on Facebook\" ><span>Facebook<\/span><\/a><\/li><li class=\"share-twitter\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-twitter-392\" class=\"share-twitter sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/nabialek-trick\/?share=twitter\" target=\"_blank\" title=\"Click to share on Twitter\" ><span>Twitter<\/span><\/a><\/li><li class=\"share-end\"><\/li><li class=\"share-pinterest\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-pinterest-392\" class=\"share-pinterest sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/nabialek-trick\/?share=pinterest\" target=\"_blank\" title=\"Click to share on Pinterest\" ><span>Pinterest<\/span><\/a><\/li><li class=\"share-linkedin\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-linkedin-392\" class=\"share-linkedin sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/nabialek-trick\/?share=linkedin\" target=\"_blank\" title=\"Click to share on LinkedIn\" ><span>LinkedIn<\/span><\/a><\/li><li class=\"share-end\"><\/li><li class=\"share-reddit\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"\" class=\"share-reddit sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/nabialek-trick\/?share=reddit\" target=\"_blank\" title=\"Click to share on Reddit\" ><span>Reddit<\/span><\/a><\/li><li class=\"share-tumblr\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"\" class=\"share-tumblr sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/articles\/qi-example\/nabialek-trick\/?share=tumblr\" target=\"_blank\" title=\"Click to share on Tumblr\" ><span>Tumblr<\/span><\/a><\/li><li class=\"share-end\"><\/li><li class=\"share-end\"><\/li><\/ul><\/div><\/div><\/div><\/div><\/div>","protected":false},"author":2,"featured_media":0,"parent":384,"menu_order":2,"comment_status":"open","ping_status":"open","template":"article-page.php","meta":{"_s2mail":"yes","spay_email":""},"jetpack_shortlink":"https:\/\/wp.me\/PIHdZ-6k","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/pages\/392"}],"collection":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/comments?post=392"}],"version-history":[{"count":21,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/pages\/392\/revisions"}],"predecessor-version":[{"id":1303,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/pages\/392\/revisions\/1303"}],"up":[{"embeddable":true,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/pages\/384"}],"wp:attachment":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/media?parent=392"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}