{"id":1039,"date":"2010-03-11T16:24:42","date_gmt":"2010-03-12T00:24:42","guid":{"rendered":"http:\/\/boost-spirit.com\/home\/?p=1039"},"modified":"2010-03-12T18:45:38","modified_gmt":"2010-03-13T02:45:38","slug":"s-expressions-and-variants","status":"publish","type":"post","link":"http:\/\/boost-spirit.com\/home\/2010\/03\/11\/s-expressions-and-variants\/","title":{"rendered":"S-expressions and variant"},"content":{"rendered":"<p>I have a mixed relationship with variant&#8230;<\/p>\n<p>I just wrote a parser for S-expressions (that will be the basis of ASTs and intermediate types in my planned &#8220;write-a-compiler&#8221; article series). The parser itself is easy, but as always, I spent more time on the underlying data structures. <\/p>\n<p><!--more--> <\/p>\n<p>What are S-expressions? S-expressions, also called sexps, are recursive, list based, data structures. Being recursive, they can represent hierarchical information. S-expressions are parenthesized prefix expressions, known for their use in LISP (and its sibling Scheme). Here&#8217;s a simple sexp:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\n(* 2 (+ 3 4))\r\n<\/pre>\n<p>The sexp above corresponds to this infix expression:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\n(2 * (3 + 4))\r\n<\/pre>\n<p>S-expressions are simple and infinitely powerful beasts as evident in applications that use LISP as their scripting language. They can represent code and data. Some people even use S-expressions as a suitable (and terser!) replacement for XML. The in-memory data structures are very easy to use, transform and manipulate, traverse and compile or accumulate results from.<\/p>\n<p>The plan is to use S-expressions as our AST representation and embed a minimal LISP\/Scheme interpreter <strong>IN<\/strong> the compiler. This implies that along the way, we&#8217;ll be building an S-expression parser and a LISP\/Scheme interpreter. How cool is that? &#8230; We&#8217;re talking about scripting the compiler with an interpreter! \ud83d\ude42<\/p>\n<p>I needed a dynamic data type that can represent the S-expressions. I called it utree, short for universal-tree. I want it to be as simple as it can be and fast and tight in memory footprint. Boost variant was simply out of the question (I used it in one early prototypes). For one, it failed a basic requirement (tight memory footprint). The padding and the way it aligns the &#8220;what-type&#8221; integer member is quite wasteful. It uses a conservative alignment using the worst alignment of the types in the union. Thus if you have a type in there that aligns to 8 bytes, variant requires another 8 bytes just for the type discriminator! Try it out:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nstruct x { void* a; void* b; void* c; };\r\n\/***\/\r\nstd::cout &lt;&lt; sizeof(x) &lt;&lt; std::endl;\r\nstd::cout &lt;&lt; sizeof(boost::variant&lt;x, int, double&gt;) &lt;&lt; std::endl;\r\n<\/pre>\n<p>I get: 12 and 24 respectively (32 bit system).<\/p>\n<p>I ended up with 40 bytes in my initial prototype (using STL containers and variant) and later squeezed that to 24 (minimum). I did away with variant in my latest version and got 16 bytes. In this case, I &#8220;stole&#8221; unused padding bits from the data to store the discriminator. \u00a0\u00a0With this 16 bytes, I have nil, bool, int, double, string and (double linked) list. The string itself steals memory when it can (i.e. it stores the string in the union when it can and only uses the heap when needed). The string steals as much as it can. So, on 32 bit systems, it can store in-situ as much as 14 bytes. That&#8217;s a lot for storing simple strings like symbols and identifiers. On 64 bit systems, you can store a lot more in-situ and minimize heap usage more.<\/p>\n<p>At this point, I feel like writing my own variant type that can do such things (intrusive variant?). Barring the use of Boost.Variant, I needed to write my own data structures (double linked list). I really wanted to use Boost.Intrusive which is quite efficient, but because I had to squeeze my own variant in there, I had to make use of unions which require PODs!<\/p>\n<p>Here&#8217;s the work in progress:<br \/>\n<a href=\"http:\/\/boost-spirit.com\/dl_more\/scheme\/scheme_v0.2\/\">http:\/\/boost-spirit.com\/dl_more\/scheme\/scheme_v0.2\/<\/a><\/p>\n<p>Here&#8217;s the utree API:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\n\r\n    \/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\r\n    \/\/ The main utree (Universal Tree) class\r\n    \/\/ The utree is a hierarchical, dynamic type that can store:\r\n    \/\/  - a nil\r\n    \/\/  - a bool\r\n    \/\/  - an integer\r\n    \/\/  - a double\r\n    \/\/  - a string (textual or binary)\r\n    \/\/  - a (doubly linked) list of utree\r\n    \/\/  - a reference to a utree\r\n    \/\/\r\n    \/\/ The utree has minimal memory footprint. The data structure size is\r\n    \/\/ 16 bytes on a 32-bit platform. Being a container of itself, it can\r\n    \/\/ represent tree structures.\r\n    \/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\r\n    class utree\r\n    {\r\n    public:\r\n\r\n        typedef utree value_type;\r\n        typedef detail::list::node_iterator&lt;utree&gt; iterator;\r\n        typedef detail::list::node_iterator&lt;utree const&gt; const_iterator;\r\n        typedef utree&amp; reference;\r\n        typedef utree const&amp; const_reference;\r\n        typedef std::ptrdiff_t difference_type;\r\n        typedef std::size_t size_type;\r\n\r\n        struct nil {};\r\n\r\n        utree();\r\n        explicit utree(bool b);\r\n        explicit utree(unsigned int i);\r\n        explicit utree(int i);\r\n        explicit utree(double d);\r\n        explicit utree(char const* str);\r\n        explicit utree(char const* str, std::size_t len);\r\n        explicit utree(std::string const&amp; str);\r\n        explicit utree(boost::reference_wrapper&lt;utree&gt; ref);\r\n\r\n        utree(utree const&amp; other);\r\n        ~utree();\r\n\r\n        utree&amp; operator=(utree const&amp; other);\r\n        utree&amp; operator=(bool b);\r\n        utree&amp; operator=(unsigned int i);\r\n        utree&amp; operator=(int i);\r\n        utree&amp; operator=(double d);\r\n        utree&amp; operator=(char const* s);\r\n        utree&amp; operator=(std::string const&amp; s);\r\n        utree&amp; operator=(boost::reference_wrapper&lt;utree&gt; ref);\r\n\r\n        template &lt;typename F&gt;\r\n        typename F::result_type\r\n        static visit(utree const&amp; x, F f);\r\n\r\n        template &lt;typename F&gt;\r\n        typename F::result_type\r\n        static visit(utree&amp; x, F f);\r\n\r\n        template &lt;typename F&gt;\r\n        typename F::result_type\r\n        static visit(utree const&amp; x, utree const&amp; y, F f);\r\n\r\n        template &lt;typename F&gt;\r\n        typename F::result_type\r\n        static visit(utree&amp; x, utree const&amp; y, F f);\r\n\r\n        template &lt;typename F&gt;\r\n        typename F::result_type\r\n        static visit(utree const&amp; x, utree&amp; y, F f);\r\n\r\n        template &lt;typename F&gt;\r\n        typename F::result_type\r\n        static visit(utree&amp; x, utree&amp; y, F f);\r\n\r\n        template &lt;typename T&gt;\r\n        void push_back(T const&amp; val);\r\n\r\n        template &lt;typename T&gt;\r\n        void push_front(T const&amp; val);\r\n\r\n        template &lt;typename T&gt;\r\n        iterator insert(iterator pos, T const&amp; x);\r\n\r\n        template &lt;typename T&gt;\r\n        void insert(iterator pos, std::size_t, T const&amp; x);\r\n\r\n        template &lt;typename Iter&gt;\r\n        void insert(iterator pos, Iter first, Iter last);\r\n\r\n        template &lt;typename Iter&gt;\r\n        void assign(Iter first, Iter last);\r\n\r\n        void clear();\r\n        void pop_front();\r\n        void pop_back();\r\n        iterator erase(iterator pos);\r\n        iterator erase(iterator first, iterator last);\r\n\r\n        utree&amp; front();\r\n        utree&amp; back();\r\n        utree const&amp; front() const;\r\n        utree const&amp; back() const;\r\n\r\n        utree&amp; operator[](std::size_t i);\r\n        utree const&amp; operator[](std::size_t i) const;\r\n\r\n        void swap(utree&amp; other);\r\n\r\n        iterator begin();\r\n        iterator end();\r\n        const_iterator begin() const;\r\n        const_iterator end() const;\r\n\r\n        bool empty() const;\r\n        std::size_t size() const;\r\n    };\r\n\r\n    bool operator==(utree const&amp; a, utree const&amp; b);\r\n    bool operator&lt;(utree const&amp; a, utree const&amp; b);\r\n    bool operator!=(utree const&amp; a, utree const&amp; b);\r\n    bool operator&gt;(utree const&amp; a, utree const&amp; b);\r\n    bool operator&lt;=(utree const&amp; a, utree const&amp; b);\r\n    bool operator&gt;=(utree const&amp; a, utree const&amp; b);\r\n\r\n<\/pre>\n<div class=\"sharedaddy sd-sharing-enabled\"><div class=\"robots-nocontent sd-block sd-social sd-social-icon-text sd-sharing\"><h3 class=\"sd-title\">Share this:<\/h3><div class=\"sd-content\"><ul><li><a href=\"#\" class=\"sharing-anchor sd-button share-more\"><span>Share<\/span><\/a><\/li><li class=\"share-end\"><\/li><\/ul><div class=\"sharing-hidden\"><div class=\"inner\" style=\"display: none;\"><ul><li class=\"share-facebook\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-facebook-1039\" class=\"share-facebook sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/2010\/03\/11\/s-expressions-and-variants\/?share=facebook\" target=\"_blank\" title=\"Click to share on Facebook\" ><span>Facebook<\/span><\/a><\/li><li class=\"share-twitter\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-twitter-1039\" class=\"share-twitter sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/2010\/03\/11\/s-expressions-and-variants\/?share=twitter\" target=\"_blank\" title=\"Click to share on Twitter\" ><span>Twitter<\/span><\/a><\/li><li class=\"share-end\"><\/li><li class=\"share-pinterest\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-pinterest-1039\" class=\"share-pinterest sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/2010\/03\/11\/s-expressions-and-variants\/?share=pinterest\" target=\"_blank\" title=\"Click to share on Pinterest\" ><span>Pinterest<\/span><\/a><\/li><li class=\"share-linkedin\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-linkedin-1039\" class=\"share-linkedin sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/2010\/03\/11\/s-expressions-and-variants\/?share=linkedin\" target=\"_blank\" title=\"Click to share on LinkedIn\" ><span>LinkedIn<\/span><\/a><\/li><li class=\"share-end\"><\/li><li class=\"share-reddit\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"\" class=\"share-reddit sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/2010\/03\/11\/s-expressions-and-variants\/?share=reddit\" target=\"_blank\" title=\"Click to share on Reddit\" ><span>Reddit<\/span><\/a><\/li><li class=\"share-tumblr\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"\" class=\"share-tumblr sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/2010\/03\/11\/s-expressions-and-variants\/?share=tumblr\" target=\"_blank\" title=\"Click to share on Tumblr\" ><span>Tumblr<\/span><\/a><\/li><li class=\"share-end\"><\/li><li class=\"share-end\"><\/li><\/ul><\/div><\/div><\/div><\/div><\/div>","protected":false},"excerpt":{"rendered":"<p>I have a mixed relationship with variant&#8230; I just wrote a parser for S-expressions (that will be the basis of ASTs and intermediate types in my planned &#8220;write-a-compiler&#8221; article series). The parser itself is easy, but as always, I spent more time on the underlying data structures.<\/p>\n<div class=\"sharedaddy sd-sharing-enabled\"><div class=\"robots-nocontent sd-block sd-social sd-social-icon-text sd-sharing\"><h3 class=\"sd-title\">Share this:<\/h3><div class=\"sd-content\"><ul><li><a href=\"#\" class=\"sharing-anchor sd-button share-more\"><span>Share<\/span><\/a><\/li><li class=\"share-end\"><\/li><\/ul><div class=\"sharing-hidden\"><div class=\"inner\" style=\"display: none;\"><ul><li class=\"share-facebook\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-facebook-1039\" class=\"share-facebook sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/2010\/03\/11\/s-expressions-and-variants\/?share=facebook\" target=\"_blank\" title=\"Click to share on Facebook\" ><span>Facebook<\/span><\/a><\/li><li class=\"share-twitter\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-twitter-1039\" class=\"share-twitter sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/2010\/03\/11\/s-expressions-and-variants\/?share=twitter\" target=\"_blank\" title=\"Click to share on Twitter\" ><span>Twitter<\/span><\/a><\/li><li class=\"share-end\"><\/li><li class=\"share-pinterest\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-pinterest-1039\" class=\"share-pinterest sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/2010\/03\/11\/s-expressions-and-variants\/?share=pinterest\" target=\"_blank\" title=\"Click to share on Pinterest\" ><span>Pinterest<\/span><\/a><\/li><li class=\"share-linkedin\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"sharing-linkedin-1039\" class=\"share-linkedin sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/2010\/03\/11\/s-expressions-and-variants\/?share=linkedin\" target=\"_blank\" title=\"Click to share on LinkedIn\" ><span>LinkedIn<\/span><\/a><\/li><li class=\"share-end\"><\/li><li class=\"share-reddit\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"\" class=\"share-reddit sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/2010\/03\/11\/s-expressions-and-variants\/?share=reddit\" target=\"_blank\" title=\"Click to share on Reddit\" ><span>Reddit<\/span><\/a><\/li><li class=\"share-tumblr\"><a rel=\"nofollow noopener noreferrer\" data-shared=\"\" class=\"share-tumblr sd-button share-icon\" href=\"http:\/\/boost-spirit.com\/home\/2010\/03\/11\/s-expressions-and-variants\/?share=tumblr\" target=\"_blank\" title=\"Click to share on Tumblr\" ><span>Tumblr<\/span><\/a><\/li><li class=\"share-end\"><\/li><li class=\"share-end\"><\/li><\/ul><\/div><\/div><\/div><\/div><\/div>","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_s2mail":"","spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[15,5],"tags":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/pIHdZ-gL","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/posts\/1039"}],"collection":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/comments?post=1039"}],"version-history":[{"count":18,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/posts\/1039\/revisions"}],"predecessor-version":[{"id":1042,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/posts\/1039\/revisions\/1042"}],"wp:attachment":[{"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/media?parent=1039"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/categories?post=1039"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/boost-spirit.com\/home\/wp-json\/wp\/v2\/tags?post=1039"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}