<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Boost.Spirit &#187; Beginner</title>
	<atom:link href="http://boost-spirit.com/home/category/experience-level/beginner/feed/" rel="self" type="application/rss+xml" />
	<link>http://boost-spirit.com/home</link>
	<description>Home of The Boost.Spirit Library</description>
	<lastBuildDate>Sat, 24 Jul 2010 02:46:18 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Tracking the Input Position While Parsing</title>
		<link>http://boost-spirit.com/home/2010/03/05/tracking-the-input-position-while-parsing/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=tracking-the-input-position-while-parsing</link>
		<comments>http://boost-spirit.com/home/2010/03/05/tracking-the-input-position-while-parsing/#comments</comments>
		<pubDate>Fri, 05 Mar 2010 16:21:53 +0000</pubDate>
		<dc:creator>Peter Schüller</dc:creator>
				<category><![CDATA[Advanced]]></category>
		<category><![CDATA[Beginner]]></category>
		<category><![CDATA[Qi Example]]></category>
		<category><![CDATA[MultiPass]]></category>
		<category><![CDATA[Qi]]></category>

		<guid isPermaLink="false">http://boost-spirit.com/home/?p=1026</guid>
		<description><![CDATA[The following article is about tracking the parsing position with Spirit V2. This is useful for generating error messages which tell the user exactly where an error has occurred. We also show how to use Spirit V2 to parse from an input stream without first reading the whole stream into a std::string. Continue reading » [...]]]></description>
			<content:encoded><![CDATA[<p>The following article is about tracking the parsing position with <em>Spirit</em> V2. This is useful for generating error messages which tell the user exactly where an error has occurred. We also show how to use <em>Spirit</em> V2 to parse from an input stream without first reading the whole stream into a std::string.</p>
<p><a href="http://boost-spirit.com/home/articles/qi-example/tracking-the-input-position-while-parsing/">Continue reading »</a></p>
]]></content:encoded>
			<wfw:commentRss>http://boost-spirit.com/home/2010/03/05/tracking-the-input-position-while-parsing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Anatomy of Semantic Actions in Qi</title>
		<link>http://boost-spirit.com/home/2010/03/03/the-anatomy-of-semantic-actions-in-qi/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=the-anatomy-of-semantic-actions-in-qi</link>
		<comments>http://boost-spirit.com/home/2010/03/03/the-anatomy-of-semantic-actions-in-qi/#comments</comments>
		<pubDate>Wed, 03 Mar 2010 18:04:35 +0000</pubDate>
		<dc:creator>Hartmut Kaiser</dc:creator>
				<category><![CDATA[Beginner]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Tip of the Day]]></category>
		<category><![CDATA[Karma]]></category>
		<category><![CDATA[Qi]]></category>

		<guid isPermaLink="false">http://boost-spirit.com/home/?p=1010</guid>
		<description><![CDATA[The concept of Spirit&#8217;s semantic actions seems to be easy enough to understand as most people new to the library prefer their usage over applying the built-in attribute propagation rules. That is not surprising. The idea of attaching a function to any point of a grammar which is called whenever the corresponding parser matched is [...]]]></description>
			<content:encoded><![CDATA[<p>The concept of <em>Spirit&#8217;s</em> semantic actions seems to be easy enough to understand, as most people new to the library prefer using them over applying the built-in attribute propagation rules. That is not surprising. The idea of attaching a function to any point of a grammar which is called whenever the corresponding parser matched is straightforward to grasp. Earlier versions of <em>Spirit</em> required a semantic action to conform to a very specific interface. Today&#8217;s semantic actions are more flexible and more powerful. Recently, a couple of people asked questions about them. So I decided to dedicate this Tip of the Day to the specifics and the usage model of semantic actions in <em>Spirit Qi</em>.</p>
<p><span id="more-1010"></span></p>
<p>All three of <em>Spirit&#8217;s</em> sub-libraries &#8211; <em>Qi</em>, <em>Karma</em>, and <em>Lex</em> – support semantic actions. In each case they are different and have some specifics. Today I will highlight semantic actions in <em>Qi</em>. But I will dedicate later Tips of the Day to semantic actions in <em>Karma</em> and  <em>Lex</em>.</p>
<p>Semantic actions are functions or function objects attached to some specific part of a grammar. In <em>Qi</em> they are invoked <em>after</em> the corresponding parser successfully recognizes a portion of the input. Here the semantic action receives the attribute value of the matching parser.</p>
<h5>Semantic Actions – a General View</h5>
<p>A semantic action <span style="font-family: Courier New;">f</span> is attached to a <em>Qi</em> parser <span style="font-family: Courier New;">p</span> by simply writing:</p>
<pre class="brush: cpp;">
p[f]
</pre>
<p>The function (or function object) <span style="font-family: Courier New;">f</span> has to expose a certain interface allowing <em>Spirit</em> to pass the proper argument types. In the simplest case this can be a global function taking no arguments at all.</p>
<pre class="brush: cpp;">
void func()
{
    std::cout &lt;&lt; &quot;Matched an integer!\n&quot;;
}

std::string input(&quot;1234&quot;);
std::string::const_iterator begin = input.begin();
std::string::const_iterator end = input.end();
qi::parse(begin, end, qi::int_[func]);     // this will call func
</pre>
<p>Most of the time this is not sufficient as a semantic action is expected to receive the matched attribute value. This is possible by writing:</p>
<pre class="brush: cpp;">
void func(int attribute)
{
    std::cout &lt;&lt; &quot;Matched integer: &quot; &lt;&lt; attribute &lt;&lt; &quot;\n&quot;;
}
</pre>
<p>The type of the expected parameter (in this case the <span style="font-family: Courier New;">int</span>) depends on the parser the semantic action is attached to. The attribute type exposed by the parser has to be convertible to the argument type.</p>
<p>There are actually 2 more arguments being passed: the parser context and a reference to a boolean &#8216;hit&#8217; parameter. The parser context is meaningful only if the semantic action is attached somewhere to the right hand side of a rule. We will see more information about this shortly. Setting the boolean value to <span style="font-family: Courier New;">false</span> inside the semantic action invalidates the match retroactively, making the parser fail. <em>Qi</em> allows us to bind a nullary or a single argument function, like above. In that case the other arguments are simply ignored.</p>
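<p>To illustrate the &#8216;hit&#8217; parameter, here is a minimal sketch of a function object that receives all three arguments. It assumes the function object form shown in the <em>Qi</em> documentation, where an argument that is not needed (here the parser context) is declared as <span style="font-family: Courier New;">qi::unused_type</span>. The names <span style="font-family: Courier New;">at_most_100</span> and <span style="font-family: Courier New;">parse_small_int</span> are made up for this example.</p>
<pre class="brush: cpp;">
#include &lt;boost/spirit/include/qi.hpp&gt;
#include &lt;string&gt;

namespace qi = boost::spirit::qi;

// Hypothetical function object: accepts the matched integer only if it is
// at most 100. The parser context is not needed, so the second parameter
// is declared as qi::unused_type.
struct at_most_100
{
    void operator()(int const&amp; attribute, qi::unused_type, bool&amp; hit) const
    {
        if (attribute &gt; 100)
            hit = false;    // invalidate the match after the fact
    }
};

// Hypothetical helper: returns true if 'input' starts with an integer
// which is not larger than 100.
bool parse_small_int(std::string const&amp; input)
{
    std::string::const_iterator begin = input.begin();
    std::string::const_iterator end = input.end();
    return qi::parse(begin, end, qi::int_[at_most_100()]);
}
</pre>
<p>With this sketch, <span style="font-family: Courier New;">parse_small_int(&quot;42&quot;)</span> should succeed, while <span style="font-family: Courier New;">parse_small_int(&quot;420&quot;)</span> should fail even though <span style="font-family: Courier New;">int_</span> itself matched the input.</p>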
<p>It is feasible to bind any function object (such as generated by <a href="http://www.boost.org/doc/libs/1_42_0/libs/bind/index.html">Boost.Bind</a> or <a href="http://www.boost.org/doc/libs/1_42_0/libs/lambda/index.html">Boost.Lambda</a>) as a semantic action. Even if the documentation shows a couple of examples (see <a href="http://www.boost.org/doc/libs/1_42_0/libs/spirit/doc/html/spirit/qi/tutorials/semantic_actions.html#spirit.qi.tutorials.semantic_actions.examples_of_semantic_actions">here</a>), I would not recommend using those libraries in this context. For me the preferred method of writing semantic actions is to employ <a href="http://www.boost.org/doc/libs/1_42_0/libs/spirit/phoenix/doc/html/index.html">Boost.Phoenix</a> &#8211; a companion library bundled with <em>Spirit</em>. It is like <a href="http://www.boost.org/libs/lambda/index.html">Boost.Lambda</a> on steroids, with special custom features that make it easy to integrate semantic actions with Spirit. If your requirements go beyond simple parsing, I suggest that you use this library. All the following examples in this article will use <a href="http://www.boost.org/phoenix/doc/html/index.html">Boost.Phoenix</a> for semantic actions. But whatever method you use, please let me highlight the following:</p>
<blockquote><p>The three libraries allow you to utilize special placeholders to control parameter placement (<code>_1</code>, <code>_2</code>, etc.). Unfortunately, each of those libraries has its own implementation of the placeholders, all in different namespaces. You have to make sure not to mix placeholders with a library they don&#8217;t belong to and not to use different libraries while writing a semantic action.</p>
<p>Generally, for <a href="http://www.boost.org/libs/bind/index.html">Boost.Bind</a>, use <code>::_1</code>, <code>::_2</code>, etc. (yes, these placeholders are defined in the global namespace).</p>
<p>For <a href="http://www.boost.org/libs/lambda/index.html">Boost.Lambda</a> use the placeholders defined in the namespace <code>boost::lambda</code>.</p>
<p>For semantic actions written using <a href="http://www.boost.org/phoenix/doc/html/index.html">Boost.Phoenix</a> use the placeholders defined in the namespace <code>boost::spirit</code>. Please note that all existing placeholders for your convenience are also available from the namespace <code>boost::spirit::qi</code>.</p></blockquote>
<p>The current version of Spirit (V2.2) does not yet support binding a native C++0x lambda function as a semantic action, but this is something we are currently working on. You can expect this to be possible in the near future.</p>
<h5>Writing Phoenix based Semantic Actions</h5>
<p>Writing a semantic action with Phoenix is beneficial as <em>Spirit</em>  &#8216;knows&#8217; about Phoenix. If you write them with the help of Phoenix you can utilize special placeholders <em>Spirit</em> provides you with. Those placeholders refer to elements in the context of the current parser execution such as attributes, local variables and inherited attributes of rules, etc. None of the other means of writing semantic actions (using Bind, Lambda, or hand-written function objects) gives you direct access to those elements. The following table lists all available placeholders exposed by Spirit (as mentioned earlier, all are defined in the namespace <span style="font-family: Courier New;">boost::spirit::qi</span>). Again, please note, these are only available inside a semantic action and only if the semantic action is written utilizing Phoenix.</p>
<table border="1" cellspacing="0" cellpadding="2" width="600">
<tbody>
<tr>
<td width="206" valign="top"><strong>Placeholder</strong></td>
<td width="394" valign="top"><strong>Description</strong></td>
</tr>
<tr>
<td width="206" valign="top"><code>_1, _2, ... , _N</code></td>
<td width="394" valign="top">Nth attribute of the parser <code>p</code></td>
</tr>
<tr>
<td width="206" valign="top">
<dt><code>_pass</code></dt>
</td>
<td width="394" valign="top">Assign <code>false</code> to <code>_pass</code> to force a parser failure.</td>
</tr>
<tr>
<td width="206" valign="top">
<dt><code>_val</code></dt>
</td>
<td width="394" valign="top">The enclosing rule&#8217;s synthesized attribute.</td>
</tr>
<tr>
<td width="206" valign="top">
<dt><code>_r1, _r2, ... , _rN</code></dt>
</td>
<td width="394" valign="top">The enclosing rule&#8217;s Nth inherited attribute.</td>
</tr>
<tr>
<td width="206" valign="top">
<dt><code>_a, _b, ... , _j</code></dt>
</td>
<td width="394" valign="top">The enclosing rule&#8217;s local variables (<code>_a</code> refers to the first).</td>
</tr>
</tbody>
</table>
<p>Obviously, the placeholders listed in the last three rows of the table are meaningful only if used in a rule definition. As an example, let us rewrite the semantic action from above with Phoenix:</p>
<pre class="brush: cpp;">
namespace phoenix = boost::phoenix;

std::string input(&quot;1234&quot;);
std::string::const_iterator begin = input.begin();
std::string::const_iterator end = input.end();
qi::parse(begin, end,
    qi::int_
    [
        // phoenix::val makes the whole output expression lazy
        std::cout &lt;&lt; phoenix::val(&quot;Matched integer: &quot;) &lt;&lt; qi::_1 &lt;&lt; &quot;\n&quot;
    ]
);
</pre>
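<p>Printing is rarely all you want to do with an attribute. The following minimal sketch stores the matched value in a variable instead. It assumes a Boost release which ships Phoenix as a standalone library (headers under <span style="font-family: Courier New;">boost/phoenix/</span>); with older releases the equivalent headers live under <span style="font-family: Courier New;">boost/spirit/include/phoenix_*.hpp</span>. The function name <span style="font-family: Courier New;">parse_int</span> is made up for this example.</p>
<pre class="brush: cpp;">
#include &lt;boost/spirit/include/qi.hpp&gt;
#include &lt;boost/phoenix/core.hpp&gt;
#include &lt;boost/phoenix/operator.hpp&gt;
#include &lt;string&gt;

namespace qi = boost::spirit::qi;
namespace phoenix = boost::phoenix;

// Hypothetical helper: parses an integer from 'input' and stores it in
// 'result' from inside the semantic action.
bool parse_int(std::string const&amp; input, int&amp; result)
{
    std::string::const_iterator begin = input.begin();
    std::string::const_iterator end = input.end();
    return qi::parse(begin, end,
        qi::int_[phoenix::ref(result) = qi::_1]);
}
</pre>
<p>Here <span style="font-family: Courier New;">phoenix::ref(result)</span> wraps the variable so that the assignment is executed lazily, that is, only when the semantic action is invoked.</p>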
<p>One problem with earlier versions of Spirit (i.e. <em>Spirit.Classic</em>) was that while parsing sequences of things it was difficult to avoid calling a semantic action prematurely. For instance, in a parser sequence of two integer parsers (<span style="font-family: Courier New;">int_[f1] &gt;&gt; ',' &gt;&gt; int_[f2]</span>) the function <span style="font-family: Courier New;">f1</span> got called immediately after the first integer matched, even if the second integer parser failed later on. In the current version of Spirit this is not an issue anymore as it is possible to attach a semantic action to the whole sequence while still referring to the single attributes of the different sequence elements:</p>
<pre class="brush: cpp;">
namespace phoenix = boost::phoenix;

std::string input(&quot;1234,2345&quot;);
std::string::const_iterator begin = input.begin();
std::string::const_iterator end = input.end();
qi::parse(begin, end,
    (qi::int_ &gt;&gt; ',' &gt;&gt; qi::int_)
    [
        // phoenix::val makes the whole output expression lazy
        std::cout &lt;&lt; phoenix::val(&quot;Matched integers: &quot;)
                  &lt;&lt; qi::_1 &lt;&lt; &quot; and &quot; &lt;&lt; qi::_2 &lt;&lt; &quot;\n&quot;
    ]
);
</pre>
<p>Here, <span style="font-family: Courier New;">qi::_1</span> refers to the attribute matched by the first integer parser, and <span style="font-family: Courier New;">qi::_2</span> to the second one.</p>
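<p>The same placeholders can be combined with <span style="font-family: Courier New;">_val</span> once the sequence is used as the right hand side of a rule. The following is a minimal sketch under the assumption that Phoenix is available as a standalone library (Boost V1.47 or later); the names <span style="font-family: Courier New;">pair_sum</span> and <span style="font-family: Courier New;">parse_sum</span> are made up for this example.</p>
<pre class="brush: cpp;">
#include &lt;boost/spirit/include/qi.hpp&gt;
#include &lt;boost/phoenix/core.hpp&gt;
#include &lt;boost/phoenix/operator.hpp&gt;
#include &lt;string&gt;

namespace qi = boost::spirit::qi;
typedef std::string::const_iterator iterator_type;

// Hypothetical helper: parses &quot;a,b&quot; and stores a + b in 'sum'.
bool parse_sum(std::string const&amp; input, int&amp; sum)
{
    // The synthesized attribute of the rule (_val) is an int.
    qi::rule&lt;iterator_type, int()&gt; pair_sum;
    pair_sum = (qi::int_ &gt;&gt; ',' &gt;&gt; qi::int_)[qi::_val = qi::_1 + qi::_2];

    iterator_type begin = input.begin();
    iterator_type end = input.end();
    return qi::parse(begin, end, pair_sum, sum) &amp;&amp; begin == end;
}
</pre>
<p>The semantic action runs only after the whole sequence has matched, so <span style="font-family: Courier New;">sum</span> is never assigned from a partial match.</p>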
<p>Initially I was planning to additionally describe the internal interface of a semantic action. Utilizing this interface allows you to write your own function objects and still get access to the elements of the parser context mentioned above (attributes, the rule&#8217;s local variables and inherited attributes, etc.). But this post already got longer than anticipated, which is why I defer this discussion to a second Tip of the Day. Stay tuned!</p>
]]></content:encoded>
			<wfw:commentRss>http://boost-spirit.com/home/2010/03/03/the-anatomy-of-semantic-actions-in-qi/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Parsing Skippers and Skipping Parsers</title>
		<link>http://boost-spirit.com/home/2010/02/24/parsing-skippers-and-skipping-parsers/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=parsing-skippers-and-skipping-parsers</link>
		<comments>http://boost-spirit.com/home/2010/02/24/parsing-skippers-and-skipping-parsers/#comments</comments>
		<pubDate>Wed, 24 Feb 2010 13:32:08 +0000</pubDate>
		<dc:creator>Hartmut Kaiser</dc:creator>
				<category><![CDATA[Beginner]]></category>
		<category><![CDATA[Qi Example]]></category>
		<category><![CDATA[Tip of the Day]]></category>
		<category><![CDATA[Qi]]></category>

		<guid isPermaLink="false">http://boost-spirit.com/home/?p=989</guid>
		<description><![CDATA[Spirit has supported skipper based parsing since its very invention. So this is definitely not something new to Spirit V2. Nevertheless, the recent discussion on the Spirit mailing list around the semantics of Qi&#8217;s lexeme[] directive shows the need for some clarification. Today I try to answer questions like: &#8220;What does it mean to use a [...]]]></description>
			<content:encoded><![CDATA[<p><em>Spirit</em> has supported skipper based parsing since its very invention. So this is definitely not something new to Spirit V2. Nevertheless, the recent discussion on the <a href="http://boost-spirit.com/home/feedback-and-support/">Spirit mailing list</a> around the semantics of <em>Qi&#8217;s</em> <span style="font-family: Courier New;">lexeme[]</span> directive shows the need for some clarification. Today I try to answer questions like: &#8220;What does it mean to use a skipper while parsing?&#8221;, or &#8220;When do I want to use a skipper and when not?&#8221;.</p>
<p><span id="more-989"></span></p>
<p>While parsing some formatted data stream it is very often desirable to ignore some parts of the input. A common example would be the need to skip whitespace and comments while parsing some computer language. Certainly it is possible to explicitly account for the tokens to skip (such as the whitespace or the comments) while writing the grammar. But this can get very tedious as those tokens are valid to appear at any point in the input.</p>
<p>For the sake of simplicity, let us assume we want to parse a simple key/value expression: <span style="font-family: Courier New;">key=value</span>, where we want to allow for any number of space characters before, in between, or after the <span style="font-family: Courier New;">key</span> or the <span style="font-family: Courier New;">value</span>. A naive grammar matching the plain key/value pair without whitespace skipping would look like (see <a href="http://boost-spirit.com/home/articles/qi-example/parsing-a-list-of-key-value-pairs-using-spirit-qi/">Parsing a List of Key-Value Pairs Using Spirit.Qi</a> for more details):</p>
<pre class="brush: cpp;">
pair  =  key &gt;&gt; '=' &gt;&gt; value;
key   =  qi::char_(&quot;a-zA-Z_&quot;) &gt;&gt; *qi::char_(&quot;a-zA-Z_0-9&quot;);
value = +qi::char_(&quot;a-zA-Z_0-9&quot;);
</pre>
<p>If we want to explicitly accommodate the rule <span style="font-family: Courier New;">pair</span> to match any interspersed space characters we get:</p>
<pre class="brush: cpp;">
pair  = *space &gt;&gt; key &gt;&gt; *space &gt;&gt; '=' &gt;&gt; *space &gt;&gt; value &gt;&gt; *space;
</pre>
<p>which, while it produces the desired result, is not only error prone, but additionally difficult to write, to understand, and to maintain. If we look closer we see that the process of skipping the whitespace tokens is easily automated. It seems to be sufficient to insert a repeated invocation of the <span style="font-family: Courier New;">space</span> parser (or generally, any skip parser) in between the elements of the user defined parser expression sequences.</p>
<p>In fact, that is exactly what <em>Spirit</em> can do for you! The library invokes any supplied skip parser upon entry to the parse member function of any parser conforming to the <a href="http://www.boost.org/doc/libs/1_42_0/libs/spirit/doc/html/spirit/qi/reference/parser_concepts/primitiveparser.html"><span style="font-family: Courier New;">PrimitiveParser</span></a> concept. The skip parser has to be supplied by calling a special API function: <span style="font-family: Courier New;">phrase_parse:</span></p>
<pre class="brush: cpp;">
namespace qi = boost::spirit::qi;
typedef std::string::const_iterator iterator;

// key and value are declared first because pair refers to them
qi::rule&lt;iterator&gt; key = qi::char_(&quot;a-zA-Z_&quot;) &gt;&gt; *qi::char_(&quot;a-zA-Z_0-9&quot;);
qi::rule&lt;iterator&gt; value = +qi::char_(&quot;a-zA-Z_0-9&quot;);
qi::rule&lt;iterator, qi::space_type&gt; pair = key &gt;&gt; '=' &gt;&gt; value;

std::string input(&quot; key = value &quot;);
iterator begin = input.begin();
iterator end = input.end();
qi::phrase_parse(begin, end, pair, qi::space);
</pre>
<p>This code snippet illustrates several important things:</p>
<ul>
<li>The function <span style="font-family: Courier New;">qi::phrase_parse</span> is equivalent to the API function <span style="font-family: Courier New;">qi::parse</span> except for its additional parameter, the skip parser. Our example utilizes <span style="font-family: Courier New;">qi::space</span>, but it is possible to use any other, even more complex parser expression as the skipper instead.</li>
<li>All rules which we want to perform the skip parsing need to be declared with the type of the skip parser they are going to be used with. Our example specifies the type of the <span style="font-family: Courier New;">qi::space</span> parser expression, which is <span style="font-family: Courier New;">qi::space_type</span>. For more complex parser expressions you might want to use a (mini) grammar or take advantage of <span style="font-family: Courier New;">BOOST_TYPEOF</span> to let the compiler deduce the actual type.</li>
<li>All rules which should not perform skip parsing have to be declared without an additional skip parser type. These rules behave like an implicit <span style="font-family: Courier New;">lexeme[]</span> directive (for more information about <span style="font-family: Courier New;">lexeme[]</span>, see below), they inhibit the invocation of the skip parser even if they are executed as part of a rule with an associated skipper.</li>
</ul>
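<p>The points above can be put together into a small compilable sketch which also extracts the parsed key and value. It assumes that the rules expose <span style="font-family: Courier New;">std::string</span> attributes and that <span style="font-family: Courier New;">std::pair</span> is adapted as a <em>Fusion</em> sequence by including <span style="font-family: Courier New;">boost/fusion/include/std_pair.hpp</span>. The function name <span style="font-family: Courier New;">parse_pair</span> is made up for this example.</p>
<pre class="brush: cpp;">
#include &lt;boost/spirit/include/qi.hpp&gt;
#include &lt;boost/fusion/include/std_pair.hpp&gt;
#include &lt;string&gt;
#include &lt;utility&gt;

namespace qi = boost::spirit::qi;
typedef std::string::const_iterator iterator_type;

// Hypothetical helper: parses &quot;key=value&quot; with optional spaces around the
// key, the '=' and the value.
bool parse_pair(std::string const&amp; input,
                std::pair&lt;std::string, std::string&gt;&amp; result)
{
    // No skipper type: these rules do not skip while matching.
    qi::rule&lt;iterator_type, std::string()&gt; key, value;
    key   =  qi::char_(&quot;a-zA-Z_&quot;) &gt;&gt; *qi::char_(&quot;a-zA-Z_0-9&quot;);
    value = +qi::char_(&quot;a-zA-Z_0-9&quot;);

    // Declared with qi::space_type: spaces between the elements are skipped.
    qi::rule&lt;iterator_type, std::pair&lt;std::string, std::string&gt;(),
             qi::space_type&gt; pair;
    pair = key &gt;&gt; '=' &gt;&gt; value;

    iterator_type begin = input.begin();
    iterator_type end = input.end();
    return qi::phrase_parse(begin, end, pair, qi::space, result)
        &amp;&amp; begin == end;
}
</pre>
<p>An input such as <span style="font-family: Courier New;">&quot;ke y = value&quot;</span> is expected to be rejected by this sketch, because no skipping happens inside <span style="font-family: Courier New;">key</span>.</p>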
<p>In the example above we suppressed skipping while matching either the <span style="font-family: Courier New;">key</span> or the <span style="font-family: Courier New;">value,</span> otherwise our grammar would match any additional <span style="font-family: Courier New;">space</span> character inside the <span style="font-family: Courier New;">key</span> or <span style="font-family: Courier New;">value</span> as well. Remember, the expression <span style="font-family: Courier New;">char_</span> conforms to the <a href="http://www.boost.org/doc/libs/1_42_0/libs/spirit/doc/html/spirit/qi/reference/parser_concepts/primitiveparser.html">PrimitiveParser</a> concept, it will execute the skip parser for each of its invocations. In this case any skip parser would be executed in between any two of the matched characters.</p>
<p>Sometimes it is necessary to turn off skipping for a smaller part of the grammar only. For this purpose Spirit implements the <span style="font-family: Courier New;">lexeme[]</span> directive. This directive inhibits skipping during the execution of the embedded parser. For instance, parsing a quoted string of alphanumeric characters would look like this:</p>
<pre class="brush: cpp;">
string = lexeme['&quot;' &gt;&gt; *alnum &gt;&gt; '&quot;'];
</pre>
<p>Here the lexeme directive disables skipping while matching the string, which avoids &#8216;losing&#8217; characters otherwise matched by the skipper. Please note: <span style="font-family: Courier New;">lexeme[]</span> performs a pre-skip step, even if it is not a <a href="http://www.boost.org/doc/libs/1_42_0/libs/spirit/doc/html/spirit/qi/reference/parser_concepts/primitiveparser.html">PrimitiveParser</a> itself (it is essentially considered to be a logical primitive by design). If this is undesired, you can utilize the <span style="font-family: Courier New;">no_skip[]</span> directive instead:</p>
<pre class="brush: cpp;">
string = '&quot;' &gt;&gt; no_skip[*alnum] &gt;&gt; '&quot;';
</pre>
<p>This parser will match all the characters in between the quotes, even if the string starts with a character sequence matched by the applied skip parser. The <span style="font-family: Courier New;">no_skip[]</span> directive is semantically equivalent to <span style="font-family: Courier New;">lexeme[]</span> except it does not perform a pre-skip before executing the embedded parser. Note: the <span style="font-family: Courier New;">no_skip[]</span> directive has been added only recently. It will be available starting with the next release (Boost V1.43).</p>
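<p>The difference between the two directives can be made visible with a small sketch. It parses a quoted string whose content starts with spaces, once using <span style="font-family: Courier New;">lexeme[]</span> and once using <span style="font-family: Courier New;">no_skip[]</span>. To keep the spaces inside the quotes matchable, the sketch uses <span style="font-family: Courier New;">char_ - '&quot;'</span> instead of <span style="font-family: Courier New;">alnum</span>; the function names are made up for this example, and a Boost version providing <span style="font-family: Courier New;">no_skip[]</span> is assumed.</p>
<pre class="brush: cpp;">
#include &lt;boost/spirit/include/qi.hpp&gt;
#include &lt;string&gt;

namespace qi = boost::spirit::qi;

// Hypothetical helper: the content is matched inside lexeme[], which
// pre-skips leading spaces before it disables skipping.
std::string quoted_with_lexeme(std::string const&amp; input)
{
    std::string result;
    std::string::const_iterator begin = input.begin();
    std::string::const_iterator end = input.end();
    qi::phrase_parse(begin, end,
        '&quot;' &gt;&gt; qi::lexeme[*(qi::char_ - '&quot;')] &gt;&gt; '&quot;',
        qi::space, result);
    return result;
}

// Hypothetical helper: the content is matched inside no_skip[], which
// does not pre-skip, so leading spaces become part of the result.
std::string quoted_with_no_skip(std::string const&amp; input)
{
    std::string result;
    std::string::const_iterator begin = input.begin();
    std::string::const_iterator end = input.end();
    qi::phrase_parse(begin, end,
        '&quot;' &gt;&gt; qi::no_skip[*(qi::char_ - '&quot;')] &gt;&gt; '&quot;',
        qi::space, result);
    return result;
}
</pre>
<p>For a quoted input whose content is two spaces followed by <span style="font-family: Courier New;">ab</span>, the first function is expected to return <span style="font-family: Courier New;">ab</span> only, while the second one keeps the two leading spaces.</p>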
<p>This short article would not be complete without mentioning the <span style="font-family: Courier New;">skip[]</span> directive. This directive is the counterpart to <span style="font-family: Courier New;">lexeme[]</span>. It enables skipping for the embedded parser. Without any argument it can be used inside a lexeme or no_skip directive only. In this case it just re-enables the outer skipper:</p>
<pre class="brush: cpp;">
string = lexeme['&quot;' &gt;&gt; *(alpha | skip[digit]) &gt;&gt; '&quot;'];
</pre>
<p>This (purely hypothetical) parser would enable skipping inside a string as long as it matches digits. But the skip directive can do more. It may take an additional argument that specifies a new skipper, for instance:</p>
<pre class="brush: cpp;">
skip(qi::space)[*alnum]
</pre>
<p>which will skip spaces while executing the embedded <span style="font-family: Courier New;">*alnum</span> parser. This form of the directive can be applied for two purposes. It can be used either for changing the current skip parser or to establish skipping inside a context otherwise not doing skipping at all (even if invoked with the <span style="font-family: Courier New;">qi::parse()</span> API function).</p>
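<p>The second purpose is illustrated by the following minimal sketch, which establishes a skipper although the parser is invoked through <span style="font-family: Courier New;">qi::parse()</span>. The function name <span style="font-family: Courier New;">collect_alnum</span> is made up for this example.</p>
<pre class="brush: cpp;">
#include &lt;boost/spirit/include/qi.hpp&gt;
#include &lt;string&gt;

namespace qi = boost::spirit::qi;

// Hypothetical helper: collects all alphanumeric characters from 'input',
// skipping any spaces in between them.
std::string collect_alnum(std::string const&amp; input)
{
    std::string result;
    std::string::const_iterator begin = input.begin();
    std::string::const_iterator end = input.end();
    // qi::parse() itself does not skip; skip(qi::space)[] enables
    // skipping for the embedded parser only.
    qi::parse(begin, end, qi::skip(qi::space)[*qi::alnum], result);
    return result;
}
</pre>
<p>Calling <span style="font-family: Courier New;">collect_alnum(&quot;a b c&quot;)</span> should yield the three letters without the spaces in between.</p>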
<p>For more detailed information about all the mentioned directives please see the corresponding documentation.</p>
]]></content:encoded>
			<wfw:commentRss>http://boost-spirit.com/home/2010/02/24/parsing-skippers-and-skipping-parsers/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Parsing Arbitrary Things in Any Sequence</title>
		<link>http://boost-spirit.com/home/2010/02/17/parsing-arbitrary-things-in-any-sequence/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=parsing-arbitrary-things-in-any-sequence</link>
		<comments>http://boost-spirit.com/home/2010/02/17/parsing-arbitrary-things-in-any-sequence/#comments</comments>
		<pubDate>Wed, 17 Feb 2010 15:45:25 +0000</pubDate>
		<dc:creator>Hartmut Kaiser</dc:creator>
				<category><![CDATA[Beginner]]></category>
		<category><![CDATA[Intermediate]]></category>
		<category><![CDATA[Qi Example]]></category>
		<category><![CDATA[Tip of the Day]]></category>
		<category><![CDATA[Qi]]></category>

		<guid isPermaLink="false">http://boost-spirit.com/home/?p=977</guid>
		<description><![CDATA[Recently, there have been a couple of questions on the Spirit mailing list asking how to parse a set of things known in advance in any sequence and any combination. A simple example would be a list of key/value pairs with known keys but the keys may be ordered in any sequence. This use case [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, there have been a couple of questions on the <em><a href="http://boost-spirit.com/home/info/mailing-list/">Spirit mailing list</a></em> asking how to parse a set of things known in advance in any sequence and any combination. A simple example would be a list of key/value pairs with known keys but the keys may be ordered in any sequence. This use case seems to be quite common. Fortunately Spirit provides you with a predefined parser component designed for exactly that purpose: the permutation parser.</p>
<p><span id="more-977"></span></p>
<p><em>Spirit&#8217;s</em> permutation parser <span style="font-family: Courier New;">a ^ b</span> matches either <span style="font-family: Courier New;">a</span>, <span style="font-family: Courier New;">b</span>, <span style="font-family: Courier New;">a &gt;&gt; b</span>, or <span style="font-family: Courier New;">b &gt;&gt; a</span>, where <span style="font-family: Courier New;">a</span> and <span style="font-family: Courier New;">b</span> can be arbitrary parser expressions. Just like normal sequences this operator can be utilized to combine more than two operands. For instance, the expression <span style="font-family: Courier New;">a ^ b ^ c</span> will match <span style="font-family: Courier New;">a</span> or <span style="font-family: Courier New;">b</span> or <span style="font-family: Courier New;">c</span> (or any combination thereof) in any sequence. The attribute propagation rule for the permutation parser is</p>
<pre class="brush: cpp;">
a: A, b: B --&gt; (a ^ b): tuple&lt;optional&lt;A&gt;, optional&lt;B&gt; &gt;
</pre>
<p>As usual, if one or more operands of the expression do not expose any attribute (or expose <span style="font-family: Courier New;">unused_type</span> as their attribute, which is equivalent), these operands disappear from attribute handling:</p>
<pre class="brush: cpp;">
a: A, b: Unused --&gt; (a ^ b): optional&lt;A&gt;;
</pre>
<p>The permutation parser works out of the box whenever you do not require to match all of the elements in the input. But what if you want strict permutation (operands get matched exactly once)? You have two possibilities, as often, one simple and less versatile and one more complex but universally applicable solution. The simple solution is to parse the input and to check afterward whether all optionals in the resulting attribute have been filled. I will leave that solution as an exercise for the reader.</p>
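<p>For readers who want a starting point for the simple solution, here is one possible sketch. It assumes that all three operands expose a <span style="font-family: Courier New;">char</span> attribute, so that the overall attribute is a <em>Fusion</em> sequence of three optionals; the function name <span style="font-family: Courier New;">parse_strict_abc</span> is made up for this example.</p>
<pre class="brush: cpp;">
#include &lt;boost/spirit/include/qi.hpp&gt;
#include &lt;boost/fusion/include/vector.hpp&gt;
#include &lt;boost/fusion/include/at_c.hpp&gt;
#include &lt;boost/optional.hpp&gt;
#include &lt;string&gt;

namespace qi = boost::spirit::qi;
namespace fusion = boost::fusion;

// Hypothetical helper: returns true only if 'input' consists of exactly
// one 'A', one 'B', and one 'C', in any order.
bool parse_strict_abc(std::string const&amp; input)
{
    typedef boost::optional&lt;char&gt; opt;
    fusion::vector&lt;opt, opt, opt&gt; attr;

    std::string::const_iterator begin = input.begin();
    std::string::const_iterator end = input.end();
    bool matched = qi::parse(begin, end,
        qi::char_('A') ^ qi::char_('B') ^ qi::char_('C'), attr);

    // check afterward that every optional has been filled
    return matched &amp;&amp; begin == end
        &amp;&amp; fusion::at_c&lt;0&gt;(attr)
        &amp;&amp; fusion::at_c&lt;1&gt;(attr)
        &amp;&amp; fusion::at_c&lt;2&gt;(attr);
}
</pre>
<p>This version is less versatile because the number of operands is hard coded in the final check, which is exactly the limitation the more general solution avoids.</p>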
<p>If we assume the attribute to be a (<em>Fusion</em>) tuple of optionals, containing one optional for each of the parser components in the permutation parser we can write the following code (thanks to Carl Barron for the initial idea).</p>
<p>This code defines a <em>Phoenix</em> function (a lazy function encapsulating some custom functionality) checking whether one or more of the optionals in a given <em>Fusion</em> sequence are empty. The <em>Fusion</em> algorithm <span style="font-family: Courier New;">any</span> iterates over the given sequence of optionals, invoking <span style="font-family: Courier New;">optional_empty::operator()</span> for each of the elements. <span style="font-family: Courier New;">fusion::any</span> stops iterating on the first invocation returning <span style="font-family: Courier New;">true</span> and reports whether such an element was found. This is similar in spirit to the well known <span style="font-family: Courier New;">std::find_if</span> algorithm, except that it returns a boolean instead of an iterator.</p>
<pre class="brush: cpp;">
namespace phoenix = boost::phoenix;
namespace fusion = boost::fusion;
namespace qi = boost::spirit::qi;

class no_empties_impl
{
    // helper function object to be invoked by fusion::any
    struct optional_empty
    {
        template &lt;typename T&gt;
        bool operator ()(T const&amp; val) const
        {
            return !val;  // return true if 'val' is empty.
        }
    };

public:
    template &lt;typename T&gt;
    struct result { typedef bool type; };

    // This operator will get called from the semantic action attached
    // to the permutation parser. The parameter refers to its overall
    // attribute: the fusion tuple of optionals.
    template &lt;typename T&gt;
    bool operator ()(T const&amp; t) const
    {
        // look for an empty optional, if any return false.
        return !fusion::any(t, optional_empty());
    }
};

// define the Phoenix function
phoenix::function&lt;no_empties_impl&gt; const no_empties = no_empties_impl();
</pre>
<p>The overall Phoenix function <span style="font-family: Courier New;">no_empties</span> will return <span style="font-family: Courier New;">false</span> if we found at least one non-initialized optional in the passed sequence. The following code snippet illustrates how everything fits together:</p>
<pre class="brush: cpp;">
std::string input (&quot;BCA&quot;);
std::string::const_iterator begin = input.begin();
std::string::const_iterator end = input.end();
qi::parse(begin, end,
    (qi::char_('A') ^ 'B' ^ 'C')[qi::_pass = no_empties(qi::_0)]);
</pre>
<p>We assign the result of the invocation of <span style="font-family: Courier New;">no_empties</span> to Qi&#8217;s predefined placeholder <span style="font-family: Courier New;">_pass</span>. If we assign <span style="font-family: Courier New;">false</span>, the parser the semantic action is attached to is forced to fail retroactively (even if it had matched the input successfully). As a result, the overall parser expression will succeed as long as a) the permutation parser matches its input and b) the <em>Phoenix</em> function inside the semantic action returns <span style="font-family: Courier New;">true</span>.</p>
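<p>To look at the check in isolation, here is a minimal sketch in plain C++17. It is not part of the original example: it assumes a <span style="font-family: Courier New;">std::tuple</span> of <span style="font-family: Courier New;">std::optional</span> in place of the <em>Fusion</em> sequence of optionals, and the name <span style="font-family: Courier New;">all_engaged</span> is made up for illustration.</p>

```cpp
#include <optional>
#include <tuple>

// Returns true only if every optional in the tuple holds a value.
// Plain C++17 stand-in for the check performed by no_empties; the
// tuple/optional types and the function name are assumptions.
template <typename... Ts>
bool all_engaged(std::tuple<std::optional<Ts>...> const& t)
{
    return std::apply(
        [](auto const&... opt) { return (opt.has_value() && ...); }, t);
}
```

<p>Called with a tuple in which all three optionals are set, the function returns <span style="font-family: Courier New;">true</span>; as soon as one of them is empty it returns <span style="font-family: Courier New;">false</span>, which is the value the semantic action would assign to <span style="font-family: Courier New;">_pass</span>.</p>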
<p>For more information about the permutation parser please consult its documentation <a title="Permutation parser documentation" href="http://www.boost.org/doc/libs/1_41_0/libs/spirit/doc/html/spirit/qi/reference/operator/permutation.html">here</a>. Overall, this example is a bit more complex than the average parser you might usually write. It utilizes three libraries, <em>Spirit</em>, <em>Phoenix</em>, and <em>Fusion</em>, in a seamless manner. But once you understand the idea, it will be easier for you to come up with similar solutions. <em>Spirit</em> has been designed with <em>Phoenix</em> and <em>Fusion</em> in mind, and in fact it relies heavily on <em>Fusion</em> itself. As a result, the integration of those libraries is almost perfect.</p>
]]></content:encoded>
			<wfw:commentRss>http://boost-spirit.com/home/2010/02/17/parsing-arbitrary-things-in-any-sequence/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>What&#8217;s the Difference Between Karma&#8217;s &#8216;!&#8217; and &#8216;~&#8217;?</title>
		<link>http://boost-spirit.com/home/2010/01/26/whats-the-difference-between-karmas-and/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=whats-the-difference-between-karmas-and</link>
		<comments>http://boost-spirit.com/home/2010/01/26/whats-the-difference-between-karmas-and/#comments</comments>
		<pubDate>Tue, 26 Jan 2010 11:41:57 +0000</pubDate>
		<dc:creator>Hartmut Kaiser</dc:creator>
				<category><![CDATA[Beginner]]></category>
		<category><![CDATA[Tip of the Day]]></category>
		<category><![CDATA[Karma]]></category>

		<guid isPermaLink="false">http://boost-spirit.com/home/?p=943</guid>
		<description><![CDATA[A couple of days ago I promised to get back to this topic (if you want to refresh your memory, here is the discussion of those operators in Qi). Today we will discuss Karma&#8217;s unary operators &#8216;!&#8217; and &#8216;~&#8217;. These have very similar semantics as their counterparts in Qi, but as usual, we have to [...]<br /><div><img src="http://boost-spirit.com/home/wp-content/plugins/gd-star-rating/gfx.php?value=0.0" /></div><div>Rating: 0.0/<strong>5</strong> (0 votes cast)</div><br />]]></description>
			<content:encoded><![CDATA[<p>A couple of days ago I promised to get back to this topic (if you want to refresh your memory, <a href="http://boost-spirit.com/home/2010/01/17/whats-the-difference-between-qis-and/">here</a> is the discussion of those operators in <em>Qi</em>). Today we will discuss <em>Karma&#8217;s</em> unary operators <span style="font-family: Courier New;">&#8216;!&#8217;</span> and <span style="font-family: Courier New;">&#8216;~&#8217;</span>. Their semantics are very similar to those of their counterparts in <em>Qi</em>, but as usual, we have to turn things inside out in order to make them fit output generation.</p>
<p><span id="more-943"></span></p>
<p>The one commonality of the two operators is the same as in <em>Qi</em>: both negate the success of the component they are applied to. If the component <span style="font-family: Courier New;">&#8216;c&#8217;</span> succeeds, both compound constructs, <span style="font-family: Courier New;">&#8216;!c&#8217;</span> and <span style="font-family: Courier New;">&#8216;~c&#8217;</span>, will fail, and similarly, if <span style="font-family: Courier New;">&#8216;c&#8217;</span> fails, the components <span style="font-family: Courier New;">&#8216;!c&#8217;</span> and <span style="font-family: Courier New;">&#8216;~c&#8217;</span> will succeed.</p>
<p>Similar to its counterpart in <em>Qi</em>, the unary <em>Karma</em> operator <span style="font-family: Courier New;">&#8216;~&#8217;</span> is applicable to character and character class generators only. It negates the set of characters a generator will be allowed to emit. Let me explain. The <em>Karma</em> character set generators, such as <span style="font-family: Courier New;">char_(&#8220;a-z&#8221;)</span> or <span style="font-family: Courier New;">digit</span>, will emit their attribute only if the attribute value belongs to the character set described. Consequently, applying the operator &#8216;~&#8217; negates the character set of the generator it is attached to. Here are some examples:</p>
<table border="1" cellspacing="0" cellpadding="2" width="600">
<tbody>
<tr>
<td width="109" valign="top"><strong>Expression</strong></td>
<td width="491" valign="top"><strong>Description</strong></td>
</tr>
<tr>
<td width="109" valign="top"><span style="font-family: Courier New;">~char_(&#8220;a-z&#8221;)</span></td>
<td width="491" valign="top">will emit character values outside the character range spanned by &#8216;a&#8217; and &#8216;z&#8217;</td>
</tr>
<tr>
<td width="109" valign="top"><span style="font-family: Courier New;">~digit</span></td>
<td width="491" valign="top">will emit non-digits only</td>
</tr>
<tr>
<td width="109" valign="top"><span style="font-family: Courier New;">~char_(&#8216;a&#8217;)</span></td>
<td width="491" valign="top">will emit everything but an <span style="font-family: Courier New;">&#8216;a&#8217;</span></td>
</tr>
</tbody>
</table>
<p> </p>
<p>If a generator can&#8217;t emit its attribute it will fail. This is very similar to a failing parser component. The possibility for a generator component to fail is very useful. This can be utilized for alternatives, predicates and other constructs. But this is beyond today&#8217;s topic and will be discussed in a different installment of the &#8216;Tip of the Day&#8217;.</p>
<p>The generators created by the operator <span style="font-family: Courier New;">&#8216;~&#8217;</span> do not wrap the underlying generator. Rather, the operator modifies the behavior of the component it is attached to. This means there is no performance difference compared to the plain character generators.</p>
<p>The unary operator <span style="font-family: Courier New;">&#8216;!&#8217;</span> creates a not-predicate generator. Again, similar to its counterpart in <em>Qi</em>, it can be attached to any (arbitrarily complex) generator. The not-predicate generator will succeed if the associated generator fails, and it will fail if its generator succeeds. It invokes the generator it is attached to, but the emitted output will not show up in the overall output stream. So effectively, the not-predicate generator will never emit any output. The following example will succeed, emitting a floating point number, if the first attribute is <span style="font-family: Courier New;">false</span> (which makes the <span style="font-family: Courier New;">true_</span> generator fail); otherwise it will not emit anything and will fail altogether:</p>
<pre class="brush: cpp;">
namespace karma = boost::spirit::karma;
std::string output;
std::back_insert_iterator&lt;std::string&gt; sink(output);
karma::generate(sink, !karma::true_ &lt;&lt; karma::double_, false, 1.0); // will emit: 1.0
</pre>
<p>This example highlights another difference compared to <em>Qi&#8217;s</em> not-predicate. The <em>Karma</em> not-predicate will always consume an attribute. More accurately, it will expose the attribute of the generator it is associated with (while in <em>Qi</em> the not-predicate never exposes any attribute). As mentioned earlier, we utilize the <span style="font-family: Courier New;">true_</span> generator&#8217;s ability to fail to control what output is generated (or whether any output is generated at all, as in our case).</p>
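<p>If it helps to see the success and failure rules without any <em>Karma</em> machinery, the following toy model in plain C++ mimics them. It is only a sketch of the behavior described above, not <em>Karma&#8217;s</em> implementation or API; all names in it (<span style="font-family: Courier New;">char_generator</span>, <span style="font-family: Courier New;">char_range</span>, <span style="font-family: Courier New;">negated_char_range</span>, <span style="font-family: Courier New;">not_predicate</span>) are invented for this illustration.</p>

```cpp
#include <functional>
#include <string>

// A toy "generator": appends to the sink and reports success or failure.
// This models the behaviour described in the article; it is not Karma's API.
using char_generator = std::function<bool(std::string& sink, char attr)>;

// Emits the attribute only if it lies in the closed range [lo, hi].
char_generator char_range(char lo, char hi)
{
    return [=](std::string& sink, char attr) {
        if (attr < lo || attr > hi)
            return false;          // attribute not in the set: fail
        sink.push_back(attr);
        return true;
    };
}

// Models '~': the same kind of generator with the set membership inverted.
// Nothing is wrapped, so there is no extra call per emitted character.
char_generator negated_char_range(char lo, char hi)
{
    return [=](std::string& sink, char attr) {
        if (attr >= lo && attr <= hi)
            return false;          // attribute is in the excluded set: fail
        sink.push_back(attr);
        return true;
    };
}

// Models '!': runs the generator against a scratch sink, discards whatever
// it emitted, and inverts the result. The real sink is never touched.
char_generator not_predicate(char_generator g)
{
    return [g](std::string&, char attr) {
        std::string scratch;       // output is thrown away
        return !g(scratch, attr);
    };
}
```

<p>In this model <span style="font-family: Courier New;">negated_char_range('a', 'z')</span> emits an upper case letter and fails for a lower case one, while <span style="font-family: Courier New;">not_predicate(char_range('a', 'z'))</span> succeeds for the upper case letter without writing anything to the sink.</p>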
]]></content:encoded>
			<wfw:commentRss>http://boost-spirit.com/home/2010/01/26/whats-the-difference-between-karmas-and/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What are the Benefits of Using a Lexer?</title>
		<link>http://boost-spirit.com/home/2010/01/25/what-are-the-benefits-of-using-a-lexer/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=what-are-the-benefits-of-using-a-lexer</link>
		<comments>http://boost-spirit.com/home/2010/01/25/what-are-the-benefits-of-using-a-lexer/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 11:30:34 +0000</pubDate>
		<dc:creator>Hartmut Kaiser</dc:creator>
				<category><![CDATA[Beginner]]></category>
		<category><![CDATA[Tip of the Day]]></category>
		<category><![CDATA[Lex]]></category>

		<guid isPermaLink="false">http://boost-spirit.com/home/?p=935</guid>
		<description><![CDATA[Starting with Spirit V2 we added a module for generating code aimed at the lexical analysis of the input: Spirit.Lex (a lexer module, also called scanner). Lexical analysis is the process of preprocessing the stream of input characters and separating it into strings called tokens, most of the time delimited by whitespace. Most compiler texts [...]<br /><div><img src="http://boost-spirit.com/home/wp-content/plugins/gd-star-rating/gfx.php?value=4.0" /></div><div>Rating: 4.0/<strong>5</strong> (3 votes cast)</div><br />]]></description>
			<content:encoded><![CDATA[<p>Starting with <em>Spirit</em> V2 we added a module for generating code aimed at the lexical analysis of the input: <em>Spirit.Lex</em> (a lexer module, also called scanner). Lexical analysis is the process of preprocessing the stream of input characters and separating it into strings called tokens, most of the time delimited by whitespace. Most compiler texts start here, and devote several chapters to discussing various ways to build scanners. <em>Spirit.Lex</em> is a library built to take care of the complexities of creating a lexer for your grammar.</p>
<p>We know the documentation of <em>Spirit.Lex</em> is not complete yet. So I will write more about it here from now on to fill in the missing pieces and to show a couple of tricks demonstrating its best usage.</p>
<p><span id="more-935"></span></p>
<p>Lexical analysis is done in a separate module from the parser, feeding the parser with a stream of input tokens only. The following picture visualizes this process.</p>
<p><a href="http://boost-spirit.com/home/wp-content/uploads/2010/01/flowofcontrol1.png"><img style="display: block; float: none; margin-left: auto; margin-right: auto; border-width: 0px;" title="The common flow control implemented while parsing combined with lexical analysis" src="http://boost-spirit.com/home/wp-content/uploads/2010/01/flowofcontrol1_thumb.png" border="0" alt="The common flow control implemented while parsing combined with lexical analysis" width="600" height="223" /></a> Instead of directly getting the next character from the input, the parser asks the lexer for the next input token. The lexer in turn looks at the next input characters to decide which token pattern matches best. Only after matching the next token from the input does it return that token to the parser.</p>
<p>Theoretically it is not necessary to implement this separation as in the end the language is defined only by exactly one set of syntactical rules. So we could write the whole parser in one module. In fact, <em>Spirit.Qi</em> allows you to write parsers without using a lexer, while parsing the input character stream directly, and for the most part this is the way <em>Spirit</em> has been used since its invention.</p>
<p>So the question is: &#8220;When does it make sense to invest the additional time and effort to create a separate lexer for your grammar and when is it sufficient to exclusively utilize <em>Spirit.Qi</em>?&#8221; Unfortunately, there is no single answer to this question, and all I can try is to highlight some of the advantages and disadvantages of a separate lexer module. The concrete decision whether to utilize a lexer or to parse the grammar in one step has to be made on a case by case basis.</p>
<p><em>Spirit.Lex</em> gives you the ability to create lexical analyzers based on patterns. These patterns are regular expressions used to define the different tokens to be recognized in the character input sequence. The lexer generates internal tables from all token definitions &#8211; the so-called deterministic finite automata (DFAs). It is well known that matching an input sequence using a DFA is as efficient as it can get. <em>Spirit.Lex is fast!</em> Measurements show that lexers generated with <em>Spirit.Lex</em> are faster than comparable lexers generated with <a href="http://flex.sourceforge.net/">flex</a>, the most widely used lexer generator today.</p>
<p>The input sequence is expected to be provided to the lexical analyzer as an arbitrary standard forward iterator. The lexical analyzer itself exposes a standard forward iterator as well. The difference here is that the exposed iterator provides access to the token sequence instead of to the character sequence. The tokens in this sequence are constructed on the fly by analyzing the underlying character sequence and matching this to the patterns as defined by the application.</p>
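<p>To make the split between characters and tokens concrete, here is a small hand-written tokenizer in plain C++. It is only a sketch of the idea and does not use <em>Spirit.Lex</em>: the token categories and the names <span style="font-family: Courier New;">token_id</span>, <span style="font-family: Courier New;">token</span>, and <span style="font-family: Courier New;">tokenize</span> are assumptions made for this illustration, and for brevity the tokens are collected eagerly into a vector instead of being constructed on the fly behind an iterator.</p>

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class token_id { identifier, number, symbol };

struct token
{
    token_id id;
    std::string text;   // the matched characters
};

// Splits the input into tokens, skipping whitespace. A hand-written
// stand-in for a generated lexer; Spirit.Lex would build its matcher
// from regular expressions instead.
std::vector<token> tokenize(std::string const& input)
{
    std::vector<token> tokens;
    std::size_t const n = input.size();
    std::size_t i = 0;
    while (i < n) {
        unsigned char const c = static_cast<unsigned char>(input[i]);
        if (std::isspace(c)) { ++i; continue; }

        std::size_t const start = i;
        token_id id = token_id::symbol;
        if (std::isalpha(c)) {
            id = token_id::identifier;      // letter followed by letters/digits
            while (i < n && std::isalnum(static_cast<unsigned char>(input[i])))
                ++i;
        } else if (std::isdigit(c)) {
            id = token_id::number;          // run of digits
            while (i < n && std::isdigit(static_cast<unsigned char>(input[i])))
                ++i;
        } else {
            ++i;                            // any other single character
        }
        tokens.push_back(token{id, input.substr(start, i - start)});
    }
    return tokens;
}
```

<p>A parser working on the result of <span style="font-family: Courier New;">tokenize("x1 = 42;")</span> sees four tokens (an identifier, a symbol, a number, and another symbol) and never has to deal with whitespace or individual characters.</p>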
<p>Here is a list of additional advantages <em>Spirit.Lex</em> provides you with:</p>
<ul>
<li>The definition of tokens is done using regular expressions, where the token definitions can refer to special substitution strings (pattern macros) simplifying the token definitions.</li>
<li>The generated lexer may have multiple start states (we will talk about lexer states in a separate post).</li>
<li>It is possible to attach code to any of the token definitions; this code gets executed whenever the corresponding token pattern has been matched.</li>
<li>The iterator exposed by the lexer buffers the last emitted tokens. This significantly speeds up parsing of grammars which require backtracking.</li>
<li>The tokens created at runtime can carry arbitrary token specific data items which are available from the parser as attributes. The input character sequence is converted to the token data items exactly once, regardless of how often the value is accessed.</li>
<li>Token based parsing simplifies the <em>Qi</em> grammar necessary to describe the matching rules. This lessens the runtime overhead introduced by the parser itself.</li>
</ul>
<p>But at the same time this feature list hints at the disadvantages of a separate lexer module:</p>
<ul>
<li>The definition of tokens is done using regular expressions. To some, regular expressions are more difficult to understand compared to EBNF or PEG rules.</li>
<li>Additional effort is required to develop and debug the lexer and its token descriptions.</li>
<li>The lexer definition may add measurable compile time overhead, depending on what features are employed.</li>
<li>Additional runtime overhead is required in order to generate the lexer tables and construct the tokens.</li>
</ul>
<p>The last bullet point touches on the most interesting characteristic: the speed of parsing. As already mentioned, it is generally not possible to predict whether the overhead introduced by the lexer is amortized by savings on grammar specific things like required backtracking, repeated attribute conversion, required symbol lookup, etc. But experience shows that lexers tend to pay off for moderately sized or large grammars, such as full language implementations or data format descriptions with a lot of dynamic structural alternatives. When in doubt, you will have to do measurements based on your concrete grammar and input data to make a sound decision. You know the drill…</p>
]]></content:encoded>
			<wfw:commentRss>http://boost-spirit.com/home/2010/01/25/what-are-the-benefits-of-using-a-lexer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What are Rule Bound Semantic Actions?</title>
		<link>http://boost-spirit.com/home/2010/01/21/what-are-rule-bound-semantic-actions/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=what-are-rule-bound-semantic-actions</link>
		<comments>http://boost-spirit.com/home/2010/01/21/what-are-rule-bound-semantic-actions/#comments</comments>
		<pubDate>Thu, 21 Jan 2010 11:00:00 +0000</pubDate>
		<dc:creator>Hartmut Kaiser</dc:creator>
				<category><![CDATA[Beginner]]></category>
		<category><![CDATA[Intermediate]]></category>
		<category><![CDATA[Tip of the Day]]></category>
		<category><![CDATA[Karma]]></category>
		<category><![CDATA[Qi]]></category>

		<guid isPermaLink="false">http://boost-spirit.com/home/?p=914</guid>
		<description><![CDATA[In the previous installment of the &#8216;Tip of the Day&#8217; I started to talk about some lesser known features related to semantic actions. Today I will highlight some more details. If a semantic action is attached to a component which is part of an expression assigned to a rule (the rule&#8217;s right hand side) it [...]<br /><div><img src="http://boost-spirit.com/home/wp-content/plugins/gd-star-rating/gfx.php?value=5.0" /></div><div>Rating: 5.0/<strong>5</strong> (1 vote cast)</div><br />]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://boost-spirit.com/home/2010/01/19/how-to-access-attributes-from-semantic-actions/">the previous installment</a> of the &#8216;Tip of the Day&#8217; I started to talk about some lesser known features related to semantic actions. Today I will highlight some more details. If a semantic action is attached to a component which is part of an expression assigned to a rule (the rule&#8217;s right hand side) it is not only possible to access the attributes of the components it is connected with. In addition it is possible to access rule specific values! Sounds interesting? Read on!</p>
<p><span id="more-914"></span></p>
<p>I chose this topic even if I got some comments asking me to avoid writing about things already covered in <em>Spirit&#8217;s</em> documentation. At the same time, I believe it&#8217;s valuable for some of you to repeat the facts from the docs but presented in a more pointed and descriptive way. Additionally, I think the articles as published here lay the ground for more detailed things not yet covered in the docs, but the basics need to be explained properly before we can touch the more advanced stuff. But please leave your comments below telling me what you would like to see highlighted. I am always interested in getting feedback on what works for you and what not.</p>
<p>Ok, here we go – let us talk about rule bound semantic actions.</p>
<p>Any semantic action attached to the right hand side of a rule (or a part of it) is special. We will call those &#8216;rule bound semantic actions&#8217;. <em>Spirit</em> provides you with extra placeholder variables (in addition to <span style="font-family: Courier New;">_1</span>, <span style="font-family: Courier New;">_2</span>, etc.) usable to access values related to the enclosing rule (the rule the expression has been assigned to). For the purpose of this post I assume that the semantic actions are written using <a href="http://www.boost.org/doc/libs/1_41_0/libs/spirit/phoenix/doc/html/index.html">Boost.Phoenix</a>. In fact, all placeholders described below are implemented using Phoenix and do not integrate very well with other, similar facilities available in Boost (such as <a href="http://www.boost.org/doc/libs/1_41_0/libs/bind/bind.html">Boost.Bind</a> or <a href="http://www.boost.org/doc/libs/1_41_0/doc/html/lambda.html">Boost.Lambda</a>). You should avoid mixing Phoenix placeholders with expressions built from those libraries. This is no limitation, as in my experience using Phoenix is simply the easiest way to write semantic actions.</p>
<p>Let us further assume that we have a component <span style="font-family: Courier New;">c</span> with an attached semantic action <span style="font-family: Courier New;">f</span>, which is assigned as the right hand side of a rule exposing a synthesized attribute of type <span style="font-family: Courier New;">Attr</span> and providing local variables of types <span style="font-family: Courier New;">L1</span>, <span style="font-family: Courier New;">L2</span>, etc. (I will explain local variables shortly):</p>
<pre class="brush: cpp;">
rule&lt;Iterator, Attr(), locals&lt;L1, L2, ...&gt; &gt; r = c[f];
</pre>
<p>Here is a list of all predefined placeholders available in the rule bound semantic action <span style="font-family: Courier New;">f</span> (see also the related documentation <a href="http://www.boost.org/doc/libs/1_41_0/libs/spirit/doc/html/spirit/qi/quick_reference/phoenix.html">here</a> and <a href="http://www.boost.org/doc/libs/1_41_0/libs/spirit/doc/html/spirit/karma/quick_reference/phoenix.html">here</a>):</p>
<table border="1" cellspacing="0" cellpadding="2" width="600">
<tbody>
<tr>
<td width="181" valign="top"><strong>Placeholders</strong></td>
<td width="419" valign="top"><strong>Description</strong></td>
</tr>
<tr>
<td width="181" valign="top"><code>_1, _2 ... , _N</code></td>
<td width="419" valign="top">Nth attribute of <code>c</code></td>
</tr>
<tr>
<td width="181" valign="top"><code>_val</code></td>
<td width="419" valign="top">The enclosing rule&#8217;s synthesized attribute (of type <span style="font-family: Courier New;">Attr</span>)</td>
</tr>
<tr>
<td width="181" valign="top">
<code>_a, _b ... , _j</code>
</td>
<td width="419" valign="top">The enclosing rule&#8217;s local variables (<code>_a</code> refers to the first of type <span style="font-family: Courier New;">L1</span>, <span style="font-family: Courier New;">_b</span> to the next of type <span style="font-family: Courier New;">L2</span>, etc.)</td>
</tr>
<tr>
<td width="181" valign="top"><code>_r1, _r2 ... , _rN</code></td>
<td width="419" valign="top">The enclosing rule&#8217;s Nth inherited attribute</td>
</tr>
<tr>
<td width="181" valign="top">
<code>_pass</code>
</td>
<td width="419" valign="top">Assign <code>false</code> to <code>_pass</code> to force a parser or generator failure</td>
</tr>
</tbody>
</table>
<p>We already talked about <span style="font-family: Courier New;">_val</span>, <span style="font-family: Courier New;">_1</span>, <span style="font-family: Courier New;">_2</span>, etc. last time, and I will not describe inherited attributes today; that is for another day. I would like to concentrate on local variables here.</p>
<p>Local variables allow you to associate arbitrarily typed data instances with each rule invocation. These variables are very similar to local variables in a function. They are valid only during the invocation of the rule they are defined in and they automatically go out of scope after the right hand side of the rule has finished executing. Further, each invocation of a rule creates a new set of (default constructed) local variables, even if the same rule is invoked recursively.</p>
<p>Here is a small <em>Karma</em> example generating a list of items prefixed with a sequence number (note: <span style="font-family: Courier New;">_a</span>, <span style="font-family: Courier New;">eps</span>, <span style="font-family: Courier New;">string</span>, <span style="font-family: Courier New;">lit</span>, <span style="font-family: Courier New;">rule</span>, and <span style="font-family: Courier New;">locals</span> are imported from <span style="font-family: Courier New;">namespace boost::spirit::karma</span>).</p>
<pre class="brush: cpp;">
rule&lt;OutIter, locals&lt;int&gt;, std::vector&lt;std::string&gt;()&gt; r =
    eps[_a = 1] &lt;&lt; (lit(_a) &lt;&lt; eps[++_a] &lt;&lt; &quot; &quot; &lt;&lt; string) % '\n';
</pre>
<p>Let us analyze what this rule definition does.</p>
<ul>
<li>The attribute of this rule is <span style="font-family: Courier New;">std::vector&lt;std::string&gt;</span>. For the sake of simplicity we will pass in all items stored in this type of container. The items are emitted using the list operator (<span style="font-family: Courier New;">%</span>), causing them to be interleaved with newlines.</li>
<li>One of the rule&#8217;s template parameters is <span style="font-family: Courier New;">locals&lt;int&gt;</span>. That is the way we declare the types of the local variables. The template <span style="font-family: Courier New;">locals&lt;&gt;</span> is a type container you can use to list the types for all local variables to instantiate for each of the rule&#8217;s invocations.</li>
<li>In our example, we have a single local variable of type <span style="font-family: Courier New;">int</span> and we reference it from the right hand side of the rule definition using the placeholder <span style="font-family: Courier New;">_a</span>.</li>
<li>The generator <span style="font-family: Courier New;">eps</span> is utilized to inject invocations of semantic actions initializing and incrementing the local variable. The generator <span style="font-family: Courier New;">eps</span> is perfect for doing this as it succeeds always while emitting nothing.</li>
<li>The most notable construct is <span style="font-family: Courier New;">lit(_a)</span>, showing that it is possible to directly create a generator component from the local variable. Many of <em>Spirit&#8217;s</em> constructs are enabled for lazy evaluation, and <span style="font-family: Courier New;">lit()</span> is one of them.</li>
</ul>
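<p>If the rule looks dense, it may help to read it next to a hand-written counterpart. The following plain C++ function is only a sketch of what the rule is meant to emit for a non-empty container; it is not how <em>Karma</em> works internally, and the name <span style="font-family: Courier New;">numbered_list</span> is made up for this illustration. The comments map each statement to the part of the rule it stands for.</p>

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hand-written counterpart of the rule: emits "1 first\n2 second ...".
std::string numbered_list(std::vector<std::string> const& items)
{
    std::string out;
    int a = 1;                          // eps[_a = 1]
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i != 0)
            out += '\n';                // separator added by the list operator
        out += std::to_string(a);       // lit(_a)
        ++a;                            // eps[++_a]
        out += ' ';                     // " "
        out += items[i];                // string
    }
    return out;
}
```

<p>For the two items <span style="font-family: Courier New;">apple</span> and <span style="font-family: Courier New;">pear</span> the function produces two lines, <span style="font-family: Courier New;">1 apple</span> and <span style="font-family: Courier New;">2 pear</span>. The variable <span style="font-family: Courier New;">a</span> plays the role of the rule&#8217;s local <span style="font-family: Courier New;">_a</span>: it is created afresh on every call and disappears when the call returns.</p>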
<p>The only other thing I want to add is that everything shown above works equally well in <em>Qi</em>, but you probably already guessed this. For your convenience, the corresponding symbols are available from the <span style="font-family: Courier New;">namespace boost::spirit::qi</span> as well. If you are interested in seeing locals in action in a more complete <em>Qi</em> example you should have a look at the mini-xml series as described in the docs <a href="http://www.boost.org/doc/libs/1_41_0/libs/spirit/doc/html/spirit/qi/tutorials/mini_xml___asts_.html">here</a>.</p>
<p>This &#8216;Tip of the Day&#8217; got a little longer than usual, but I hope it was worth reading anyway. Spirit is a large library with countless hidden gems. Some are waiting to be described, and others are still waiting to be discovered. Happy hacking and experimenting!</p>
]]></content:encoded>
			<wfw:commentRss>http://boost-spirit.com/home/2010/01/21/what-are-rule-bound-semantic-actions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to Access Attributes from Semantic Actions?</title>
		<link>http://boost-spirit.com/home/2010/01/19/how-to-access-attributes-from-semantic-actions/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=how-to-access-attributes-from-semantic-actions</link>
		<comments>http://boost-spirit.com/home/2010/01/19/how-to-access-attributes-from-semantic-actions/#comments</comments>
		<pubDate>Tue, 19 Jan 2010 11:00:03 +0000</pubDate>
		<dc:creator>Hartmut Kaiser</dc:creator>
				<category><![CDATA[Beginner]]></category>
		<category><![CDATA[Tip of the Day]]></category>
		<category><![CDATA[Karma]]></category>
		<category><![CDATA[Qi]]></category>

		<guid isPermaLink="false">http://boost-spirit.com/home/?p=898</guid>
		<description><![CDATA[The concept of semantic actions seems to be quite easy to understand. It appears to be at least easier to grasp than the concept of attribute propagation. This might be because semantic actions have been part of Spirit for almost a decade now. Additionally, with semantic actions data flow control is tightly connected to the [...]<br /><div><img src="http://boost-spirit.com/home/wp-content/plugins/gd-star-rating/gfx.php?value=4.7" /></div><div>Rating: 4.7/<strong>5</strong> (3 votes cast)</div><br />]]></description>
			<content:encoded><![CDATA[<p>The concept of semantic actions seems to be quite easy to understand. It appears to be at least easier to grasp than the concept of attribute propagation. This might be because semantic actions have been part of <em>Spirit</em> for almost a decade now. Additionally, with semantic actions data flow control is tightly connected to the component the semantic action is attached to, so the effect is highly localized and easy to spot.</p>
<p><em>Spirit</em> has some new features related to semantic actions. That&#8217;s reason enough to talk about how attributes can be accessed from inside semantic actions.</p>
<p><span id="more-898"></span></p>
<p>First of all, let us revisit the concept of semantic actions.</p>
<blockquote><p>Semantic actions may be attached to any point in the grammar specification. These actions are C++ functions or function objects that are called at certain points during the process of parsing or output generation. In <em>Qi</em> a semantic action is called whenever a part of the parser successfully recognizes a portion of the input. In <em>Karma</em> they are called whenever a part of the generator is about to be invoked.</p></blockquote>
<p>Let us assume you have a component <span style="font-family: Courier New;">C</span>, and a C++ function or function object <code>F</code>, then you can make the component call <code>F</code> by attaching it to the component:</p>
<pre class="brush: cpp;">
C[F]
</pre>
<p>The expression above links <code>F</code> to the component <code>C</code>.</p>
<p>Even if it is possible to utilize almost any separate C++ function as a semantic action, it is a lot simpler to write semantic actions using <a href="http://www.boost.org/doc/libs/1_41_0/libs/spirit/phoenix/doc/html/index.html">Boost.Phoenix</a>. Phoenix lets you write semantic actions in a very straightforward way. The code to be executed is placed inline directly into the square brackets. Here is a very simple <em>Qi</em> example:</p>
<pre class="brush: cpp;">
qi::int_[std::cout &lt;&lt; qi::_1]
</pre>
<p>Here <span style="font-family: Courier New;">qi::_1</span> is a predefined (Phoenix) placeholder referencing the attribute of the parser the semantic action has been attached to (the <span style="font-family: Courier New;">qi::int_</span>). As the semantic action is invoked <span style="text-decoration: underline;">after</span> the parser succeeded, the attribute will be set to the matched value. We can do something very similar in <em>Karma</em> as well:</p>
<pre class="brush: cpp;">
int i = 3;
karma::int_[karma::_1 = phoenix::ref(i)] // will emit: 3
</pre>
<p>Here we assign the value we want to be emitted to the (Phoenix) placeholder <span style="font-family: Courier New;">karma::_1</span> (<span style="font-family: Courier New;">phoenix::ref()</span> is almost identical to <span style="font-family: Courier New;">boost::ref()</span>, except that it integrates nicely with any Phoenix expression). This works as semantic actions in Karma are invoked <span style="text-decoration: underline;">before</span> the generator does its job, allowing to pass the current value of the variable <span style="font-family: Courier New;">&#8216;i&#8217;</span> as the attribute for the generator <span style="font-family: Courier New;">karma::int_</span>.</p>
<p>A lesser known feature of <em>Spirit&#8217;s</em> semantic actions is that they can be attached to arbitrarily complex parser (or generator) expressions, which is particularly useful if the expression is a sequence. In this case predefined placeholders let you access the attributes of the sequence elements separately:</p>
<pre class="brush: cpp;">
(qi::int_ &gt;&gt; qi::double_)[std::cout &lt;&lt; (qi::_1 + qi::_2)]
</pre>
<p>Here <span style="font-family: Courier New;">qi::_1</span> refers to the attribute of the component <span style="font-family: Courier New;">qi::int_</span>, and <span style="font-family: Courier New;">qi::_2</span> refers to the attribute of <span style="font-family: Courier New;">qi::double_</span>. This feature is very handy as the semantic action will be invoked only after both parser components have succeeded, making sure the results are evaluated only if the input has been matched completely. This is something which did not work in <em>Spirit.Classic</em>, and many developers complained about the lack of any related functionality.</p>
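<p>The &#8220;only after both components succeeded&#8221; behavior can be sketched as follows. The function name <code>sum_of_pair</code> and the sentinel <code>-1.0</code> are made up for this illustration, and I assume whitespace separated input, which is why the sketch uses <span style="font-family: Courier New;">qi::phrase_parse</span> with <span style="font-family: Courier New;">ascii::space</span> as the skipper:</p>
<pre class="brush: cpp;">
#include &lt;boost/spirit/include/qi.hpp&gt;
#include &lt;boost/spirit/include/phoenix_core.hpp&gt;
#include &lt;boost/spirit/include/phoenix_operator.hpp&gt;
#include &lt;string&gt;

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace phoenix = boost::phoenix;

// Returns the sum of an int followed by a double,
// or -1.0 (a made-up sentinel) if the sequence does not match.
double sum_of_pair(std::string const&amp; input)
{
    double sum = -1.0;
    std::string::const_iterator first = input.begin();
    std::string::const_iterator last = input.end();
    // the action is attached to the whole sequence:
    // qi::_1 is the int, qi::_2 is the double
    qi::phrase_parse(first, last,
        (qi::int_ &gt;&gt; qi::double_)[phoenix::ref(sum) = qi::_1 + qi::_2],
        ascii::space);
    return sum;
}
</pre>
<p>For the input <code>&quot;1 2.5&quot;</code> the action runs and the function should return 3.5. For <code>&quot;1 x&quot;</code> the first component matches but the second does not, so the action is never invoked and the sentinel is returned.</p>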
<p>Needless to say, similar constructs are available for <em>Karma</em> as well, even if this functionality is not as important there as in <em>Qi</em>.</p>
]]></content:encoded>
			<wfw:commentRss>http://boost-spirit.com/home/2010/01/19/how-to-access-attributes-from-semantic-actions/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>What&#8217;s the Difference Between Qi&#8217;s &#8216;!&#8217; and &#8216;~&#8217;?</title>
		<link>http://boost-spirit.com/home/2010/01/17/whats-the-difference-between-qis-and/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=whats-the-difference-between-qis-and</link>
		<comments>http://boost-spirit.com/home/2010/01/17/whats-the-difference-between-qis-and/#comments</comments>
		<pubDate>Sun, 17 Jan 2010 17:12:21 +0000</pubDate>
		<dc:creator>Hartmut Kaiser</dc:creator>
				<category><![CDATA[Beginner]]></category>
		<category><![CDATA[Tip of the Day]]></category>
		<category><![CDATA[Qi]]></category>

		<guid isPermaLink="false">http://boost-spirit.com/home/?p=876</guid>
		<description><![CDATA[The Freenode #boost IRC channel amazes me every day with the amount of interest Spirit is getting from a lot of people. Thanks to everyone over there! Best of all, people there ask many interesting questions, which allows me to come up with yet another Tip of the Day. Today&#8217;s question was asked [...]]]></description>
			<content:encoded><![CDATA[<p>The Freenode #boost IRC channel amazes me every day with the amount of interest <em>Spirit</em> is getting from a lot of people. Thanks to everyone over there! Best of all, people there ask many interesting questions, which allows me to come up with yet another Tip of the Day.</p>
<p>Today&#8217;s question was asked by @psicode: &#8220;What is the difference between the components created by the unary operators <span style="font-family: Courier New;">&#8216;!&#8217;</span> and <span style="font-family: Courier New;">&#8216;~&#8217;</span>?&#8221; As the semantics of those operators differ slightly between <em>Qi</em> and <em>Karma</em>, I will talk about them separately: the <em>Qi</em> operators today, and the corresponding <em>Karma</em> operators in one of the next installments.</p>
<p><span id="more-876"></span></p>
<p>Let us start with what the two operators have in common: both invert the result of the component they are applied to. If the component <span style="font-family: Courier New;">&#8216;c&#8217;</span> succeeds, both compound constructs, <span style="font-family: Courier New;">&#8216;!c&#8217;</span> and <span style="font-family: Courier New;">&#8216;~c&#8217;</span>, will fail; similarly, if <span style="font-family: Courier New;">&#8216;c&#8217;</span> fails, <span style="font-family: Courier New;">&#8216;!c&#8217;</span> and <span style="font-family: Courier New;">&#8216;~c&#8217;</span> will succeed.</p>
<p>The differences are more interesting. The unary <em>Qi</em> operator &#8216;~&#8217; is applicable to character and character class parsers only. It negates the set of characters matched by the parser component it is attached to. Here are some examples:</p>
<table border="1" cellspacing="0" cellpadding="2" width="600">
<tbody>
<tr>
<td width="109" valign="top"><strong>Expression</strong></td>
<td width="491" valign="top"><strong>Description</strong></td>
</tr>
<tr>
<td width="109" valign="top"><span style="font-family: Courier New;">~char_</span></td>
<td width="491" valign="top">does not match anything</td>
</tr>
<tr>
<td width="109" valign="top"><span style="font-family: Courier New;">~digit</span></td>
<td width="491" valign="top">matches everything except digits</td>
</tr>
<tr>
<td width="109" valign="top"><span style="font-family: Courier New;">~char_(&quot;a-z&quot;)</span></td>
<td width="491" valign="top">matches every character outside the character range spanned by &#8216;a&#8217; and &#8216;z&#8217;</td>
</tr>
</tbody>
</table>
<p> </p>
<p>The operator <span style="font-family: Courier New;">&#8216;~&#8217;</span> does not wrap the underlying parser. Rather, it modifies the behavior of the component it is attached to. This means there is no performance penalty compared to the plain character parsers.</p>
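<p>As a small sketch of how negated character parsers are typically combined with other operators, the two helper functions below collect a leading run of characters. The function names are made up for this illustration, and the character parsers are taken from the <span style="font-family: Courier New;">boost::spirit::ascii</span> namespace, which is my choice of character set here:</p>
<pre class="brush: cpp;">
#include &lt;boost/spirit/include/qi.hpp&gt;
#include &lt;string&gt;

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;

// Collects the leading run of characters that are not digits.
std::string leading_non_digits(std::string const&amp; input)
{
    std::string matched;
    std::string::const_iterator first = input.begin();
    std::string::const_iterator last = input.end();
    qi::parse(first, last, *~ascii::digit, matched);
    return matched;
}

// Collects the leading run of characters outside the range 'a' to 'z'.
std::string leading_non_lowercase(std::string const&amp; input)
{
    std::string matched;
    std::string::const_iterator first = input.begin();
    std::string::const_iterator last = input.end();
    qi::parse(first, last, *~ascii::char_(&quot;a-z&quot;), matched);
    return matched;
}
</pre>
<p>For example, <code>leading_non_digits(&quot;abc123&quot;)</code> should return <code>&quot;abc&quot;</code>, and <code>leading_non_lowercase(&quot;ABC-def&quot;)</code> should return <code>&quot;ABC-&quot;</code>.</p>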
<p>The unary operator <span style="font-family: Courier New;">&#8216;!&#8217;</span> creates a not-predicate parser. It can be attached to any (arbitrarily complex) parser expression. The not-predicate is a look-ahead parser: it tries to match the expression it is attached to without moving the current input position forward. In other words, the not-predicate does not consume any input. If the attached parser fails to match, the overall not-predicate succeeds (and if it matches, the not-predicate fails). In this case the parsing resumes at the same point where the not-predicate started matching. The following (slightly artificial) example will succeed in matching a floating point number after making sure it is not a Boolean value:</p>
<pre class="brush: cpp;">
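namespace qi = boost::spirit::qi;
std::string input(&quot;1.0&quot;);
std::string::const_iterator b = input.begin();
std::string::const_iterator e = input.end();   // same iterator type as b, as qi::parse requires
double result = 0;
// !qi::bool_ consumes nothing; qi::double_ then parses &quot;1.0&quot; from the start
qi::parse(b, e, !qi::bool_ &gt;&gt; qi::double_, result);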
</pre>
<p>Any parser created by the not-predicate is neutral in terms of attribute handling because it exposes <span style="font-family: Courier New;">unused_type</span> as its attribute. As a consequence, parser components augmented with <span style="font-family: Courier New;">&#8216;!&#8217;</span> never expose their attribute and never participate in attribute propagation.</p>
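<p>A more typical use of the not-predicate is to make sure a keyword is not just the prefix of a longer word. The following is a minimal sketch of that idea; the function name <code>match_end_keyword</code>, the keyword <code>end</code> and the return convention are made up for this illustration:</p>
<pre class="brush: cpp;">
#include &lt;boost/spirit/include/qi.hpp&gt;
#include &lt;string&gt;

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;

// Returns the number of characters consumed,
// or -1 if the keyword does not match.
int match_end_keyword(std::string const&amp; input)
{
    std::string::const_iterator first = input.begin();
    std::string::const_iterator last = input.end();
    // match &quot;end&quot;, then look ahead: the next character must not be alphanumeric
    bool ok = qi::parse(first, last, qi::lit(&quot;end&quot;) &gt;&gt; !ascii::alnum);
    return ok ? static_cast&lt;int&gt;(first - input.begin()) : -1;
}
</pre>
<p>For the input <code>&quot;end;&quot;</code> the function should report three consumed characters: the <code>&#8216;;&#8217;</code> was inspected by the not-predicate but not consumed. For <code>&quot;ending&quot;</code> the look-ahead finds an alphanumeric character, so the overall match fails.</p>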
]]></content:encoded>
			<wfw:commentRss>http://boost-spirit.com/home/2010/01/17/whats-the-difference-between-qis-and/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How Do Rules Propagate Their Attributes?</title>
		<link>http://boost-spirit.com/home/2010/01/15/how-do-rules-propagate-attributes/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=how-do-rules-propagate-attributes</link>
		<comments>http://boost-spirit.com/home/2010/01/15/how-do-rules-propagate-attributes/#comments</comments>
		<pubDate>Fri, 15 Jan 2010 15:41:35 +0000</pubDate>
		<dc:creator>Hartmut Kaiser</dc:creator>
				<category><![CDATA[Beginner]]></category>
		<category><![CDATA[Tip of the Day]]></category>
		<category><![CDATA[Karma]]></category>
		<category><![CDATA[Qi]]></category>

		<guid isPermaLink="false">http://boost-spirit.com/home/?p=857</guid>
		<description><![CDATA[If you read the article about attribute handling for non-terminals (The Magical Power of Attributes in Spirit &#8211; Directives and Non-terminals) you might remember that Spirit&#8217;s non-terminals (rules and grammars) are somewhat special with regard to their attribute handling. In today&#8217;s &#8216;Tip of the Day&#8217; I would like to revisit this topic as it still [...]]]></description>
			<content:encoded><![CDATA[<p>If you read the article about attribute handling for non-terminals (<a href="http://boost-spirit.com/home/articles/basics/the-magical-power-of-attributes-in-spirit-directives-and-non-terminals/">The Magical Power of Attributes in Spirit &#8211; Directives and Non-terminals</a>) you might remember that <em>Spirit&#8217;s</em> non-terminals (rules and grammars) are somewhat special with regard to their attribute handling. In today&#8217;s &#8216;Tip of the Day&#8217; I would like to revisit this topic as it still seems to be difficult to understand.</p>
<p><span id="more-857"></span></p>
<p><em>Spirit&#8217;s</em> non-terminals can expose any attribute type, and the required type needs to be explicitly specified while declaring them. This is true for <em>Qi</em> parsers and <em>Karma</em> generators. The following example declares a rule exposing a (synthesized) attribute of type <span style="font-family: Courier New;">double</span> (if you wonder why it uses the unusual function declaration syntax, i.e. <span style="font-family: Courier New;">double()</span>, please read the article mentioned above):</p>
<pre class="brush: cpp;">
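namespace qi = boost::spirit::qi;
std::string input(&quot;1.0&quot;);
std::string::const_iterator b = input.begin();
std::string::const_iterator e = input.end();   // same iterator type as b, as qi::parse requires
double result = 0;
// the rule exposes a double, which is written directly into result
qi::rule&lt;std::string::const_iterator, double()&gt; r = qi::double_;
qi::parse(b, e, r, result);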
</pre>
<p>The left hand side&#8217;s attribute (the <span style="font-family: Courier New;">result</span> passed in by the user) is directly handed over to the right hand side of the rule. The parser created by <span style="font-family: Courier New;">qi::double_</span> will put its result into the very same <span style="font-family: Courier New;">double</span> instance as passed in from the outside. No additional copies are created. We call this behavior <em>auto attribute propagation</em>. For rules it is enabled by default as long as no semantic actions are attached to the right hand side&#8217;s expression.</p>
<p>If you want to enforce auto attribute propagation even if the right hand side has semantic actions, you need to employ the &#8216;%=&#8217; syntax as shown in the next Karma example:</p>
<pre class="brush: cpp;">
namespace karma = boost::spirit::karma;
typedef std::back_insert_iterator&lt;std::string&gt; output_iterator;
std::string output;
output_iterator sink(output);
karma::rule&lt;output_iterator, double()&gt; r;
r %= karma::double_[++karma::_1];
karma::generate(sink, r, 1.0);    // will emit: 2.0
</pre>
<p>This code will increment the attribute (the <span style="font-family: Courier New;">1.0</span>) before emitting the result to the output iterator (remember, semantic actions in <em>Karma</em> are called before invoking the related generator). Note that we incremented the attribute of the generator <span style="font-family: Courier New;">karma::double_</span>, not the left hand side&#8217;s attribute (which would have been <span style="font-family: Courier New;">++karma::_val</span>). This works because we enforced auto attribute propagation using &#8216;%=&#8217;.</p>
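<p>The same distinction can be sketched on the <em>Qi</em> side. Both functions below attach the same semantic action to <span style="font-family: Courier New;">qi::int_</span>; they differ only in how the rule is initialized. The function names, the <code>seen</code> parameter and the sentinel <code>-1</code> are made up for this illustration:</p>
<pre class="brush: cpp;">
#include &lt;boost/spirit/include/qi.hpp&gt;
#include &lt;boost/spirit/include/phoenix_core.hpp&gt;
#include &lt;boost/spirit/include/phoenix_operator.hpp&gt;
#include &lt;string&gt;

namespace qi = boost::spirit::qi;
namespace phoenix = boost::phoenix;

typedef std::string::const_iterator iterator_type;

// '=' with a semantic action: no auto attribute propagation,
// so the attribute passed in by the caller stays untouched.
int parse_with_assign(std::string const&amp; input, int&amp; seen)
{
    int result = -1;   // hypothetical sentinel
    qi::rule&lt;iterator_type, int()&gt; r;
    r = qi::int_[phoenix::ref(seen) = qi::_1];
    iterator_type first = input.begin();
    iterator_type last = input.end();
    qi::parse(first, last, r, result);
    return result;
}

// '%=' with the same semantic action: auto attribute propagation is enforced,
// so the matched value also ends up in the caller's attribute.
int parse_with_percent_assign(std::string const&amp; input, int&amp; seen)
{
    int result = -1;   // hypothetical sentinel
    qi::rule&lt;iterator_type, int()&gt; r;
    r %= qi::int_[phoenix::ref(seen) = qi::_1];
    iterator_type first = input.begin();
    iterator_type last = input.end();
    qi::parse(first, last, r, result);
    return result;
}
</pre>
<p>In both cases the semantic action sees the matched value. Only the <code>%=</code> variant should additionally hand it over to <code>result</code>; with plain <code>=</code> the sentinel is expected to survive.</p>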
<p>There is one more important thing to remember: regardless of how the auto attribute propagation is enabled, either based on the default behavior of <span style="font-family: Courier New;">&#8216;=&#8217;</span> or by enforcing it using <span style="font-family: Courier New;">&#8216;%=&#8217;</span>, the attribute types of the rule and the right hand side must be compatible. It is not possible to define <em>Spirit&#8217;s</em> &#8216;attribute compatibility&#8217; in a short sentence, so we leave the topic for another day. But in the simplest case it means the attributes have to be convertible. In <em>Qi</em> the right hand side&#8217;s attribute must at least be convertible to the rule&#8217;s attribute, while in <em>Karma</em> the opposite needs to be true: the rule&#8217;s attribute should at least be convertible to the right hand side&#8217;s attribute.</p>
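<p>For the <em>Qi</em> direction, the simplest case of convertible attributes can be sketched as follows: the rule exposes a <span style="font-family: Courier New;">double</span>, while its right hand side, <span style="font-family: Courier New;">qi::int_</span>, exposes an <span style="font-family: Courier New;">int</span>. The function name <code>parse_int_as_double</code> and the sentinel <code>-1.0</code> are made up for this illustration:</p>
<pre class="brush: cpp;">
#include &lt;boost/spirit/include/qi.hpp&gt;
#include &lt;string&gt;

namespace qi = boost::spirit::qi;

// The rule's attribute is double, the right hand side's attribute is int;
// int is convertible to double, so the two are compatible.
double parse_int_as_double(std::string const&amp; input)
{
    double result = -1.0;   // hypothetical sentinel, kept if parsing fails
    qi::rule&lt;std::string::const_iterator, double()&gt; r;
    r = qi::int_;
    std::string::const_iterator first = input.begin();
    std::string::const_iterator last = input.end();
    qi::parse(first, last, r, result);
    return result;
}
</pre>
<p>Here <code>parse_int_as_double(&quot;42&quot;)</code> should return 42.0: the integer matched by the right hand side is converted to the rule&#8217;s <span style="font-family: Courier New;">double</span> attribute.</p>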
]]></content:encoded>
			<wfw:commentRss>http://boost-spirit.com/home/2010/01/15/how-do-rules-propagate-attributes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
