I have a mixed relationship with variant…
I just wrote a parser for S-expressions (that will be the basis of ASTs and intermediate types in my planned “write-a-compiler” article series). The parser itself is easy, but as always, I spent more time on the underlying data structures.
What are S-expressions? S-expressions, also called sexps, are recursive, list based, data structures. Being recursive, they can represent hierarchical information. S-expressions are parenthesized prefix expressions, known for their use in LISP (and its sibling Scheme). Here’s a simple sexp:
(* 2 (+ 3 4))
The sexp above corresponds to this infix expression:
(2 * (3 + 4))
S-expressions are simple and infinitely powerful beasts as evident in applications that use LISP as their scripting language. They can represent code and data. Some people even use S-expressions as a suitable (and terser!) replacement for XML. The in-memory data structures are very easy to use, transform and manipulate, traverse and compile or accumulate results from.
The plan is to use S-expressions as our AST representation and embed a minimal LISP/Scheme interpreter IN the compiler. This implies that along the way, we’ll be building an S-expression parser and a LISP/Scheme interpreter. How cool is that? … We’re talking about scripting the compiler with an interpreter! ๐
I needed a dynamic data type that can represent the S-expressions. I called it utree, short for universal-tree. I want it to be as simple as it can be and fast and tight in memory footprint. Boost variant was simply out of the question (I used it in one early prototypes). For one, it failed a basic requirement (tight memory footprint). The padding and the way it aligns the “what-type” integer member is quite wasteful. It uses a conservative alignment using the worst alignment of the types in the union. Thus if you have a type in there that aligns to 8 bytes, variant requires another 8 bytes just for the type discriminator! Try it out:
struct x { void* a; void* b; void* c; }; /***/ std::cout << sizeof(x) << std::endl; std::cout << sizeof(boost::variant<x, int, double>) << std::endl;
I get: 12 and 24 respectively (32 bit system).
I ended up with 40 bytes in my initial prototype (using STL containers and variant) and later squeezed that to 24 (minimum). I did away with variant in my latest version and got 16 bytes. In this case, I “stole” unused padding bits from the data to store the discriminator. ย ย With this 16 bytes, I have nil, bool, int, double, string and (double linked) list. The string itself steals memory when it can (i.e. it stores the string in the union when it can and only uses the heap when needed). The string steals as much as it can. So, on 32 bit systems, it can store in-situ as much as 14 bytes. That’s a lot for storing simple strings like symbols and identifiers. On 64 bit systems, you can store a lot more in-situ and minimize heap usage more.
At this point, I feel like writing my own variant type that can do such things (intrusive variant?). Barring the use of Boost.Variant, I needed to write my own data structures (double linked list). I really wanted to use Boost.Intrusive which is quite efficient, but because I had to squeeze my own variant in there, I had to make use of unions which require PODs!
Here’s the work in progress:
http://boost-spirit.com/dl_more/scheme/scheme_v0.2/
Here’s the utree API:
/////////////////////////////////////////////////////////////////////////// // The main utree (Universal Tree) class // The utree is a hierarchical, dynamic type that can store: // - a nil // - a bool // - an integer // - a double // - a string (textual or binary) // - a (doubly linked) list of utree // - a reference to a utree // // The utree has minimal memory footprint. The data structure size is // 16 bytes on a 32-bit platform. Being a container of itself, it can // represent tree structures. /////////////////////////////////////////////////////////////////////////// class utree { public: typedef utree value_type; typedef detail::list::node_iterator<utree> iterator; typedef detail::list::node_iterator<utree const> const_iterator; typedef utree& reference; typedef utree const& const_reference; typedef std::ptrdiff_t difference_type; typedef std::size_t size_type; struct nil {}; utree(); explicit utree(bool b); explicit utree(unsigned int i); explicit utree(int i); explicit utree(double d); explicit utree(char const* str); explicit utree(char const* str, std::size_t len); explicit utree(std::string const& str); explicit utree(boost::reference_wrapper<utree> ref); utree(utree const& other); ~utree(); utree& operator=(utree const& other); utree& operator=(bool b); utree& operator=(unsigned int i); utree& operator=(int i); utree& operator=(double d); utree& operator=(char const* s); utree& operator=(std::string const& s); utree& operator=(boost::reference_wrapper<utree> ref); template <typename F> typename F::result_type static visit(utree const& x, F f); template <typename F> typename F::result_type static visit(utree& x, F f); template <typename F> typename F::result_type static visit(utree const& x, utree const& y, F f); template <typename F> typename F::result_type static visit(utree& x, utree const& y, F f); template <typename F> typename F::result_type static visit(utree const& x, utree& y, F f); template <typename F> typename F::result_type static visit(utree& x, utree& y, F f); template <typename T> void push_back(T const& val); template <typename T> void push_front(T const& val); template <typename T> iterator insert(iterator pos, T const& x); template <typename T> void insert(iterator pos, std::size_t, T const& x); template <typename Iter> void insert(iterator pos, Iter first, Iter last); template <typename Iter> void assign(Iter first, Iter last); void clear(); void pop_front(); void pop_back(); iterator erase(iterator pos); iterator erase(iterator first, iterator last); utree& front(); utree& back(); utree const& front() const; utree const& back() const; utree& operator[](std::size_t i); utree const& operator[](std::size_t i) const; void swap(utree& other); iterator begin(); iterator end(); const_iterator begin() const; const_iterator end() const; bool empty() const; std::size_t size() const; }; bool operator==(utree const& a, utree const& b); bool operator<(utree const& a, utree const& b); bool operator!=(utree const& a, utree const& b); bool operator>(utree const& a, utree const& b); bool operator<=(utree const& a, utree const& b); bool operator>=(utree const& a, utree const& b);
What is an S-expression?
would you please be kind enough to explain it to a layman like myself.
Try googling s-expression.
I got several hits.
I’ll add a short summary anyway. Thanks, Darid.
Done ๐
You’re storing the discriminator in the padding. Couldn’t the same be
done by changing boost/variant/variant.hpp:215-216 from:
which_t which_;
storage_t storage_;
to:
storage_t storage_;
which_t which_;
? IOW, the compiler should be able to figure out how to best fit
which_ after storage_.
Nope. I don’t think so. The padding will be the same either way. Try it out.
OK, but now I can’t figure how you store the discriminant inside the
padding at the end of your union of POD’s without somehow
misaligning the discriminant. I guess just saying I wonder why
the compiler writers couldn’t figure how to do this most efficiently.
Do you know of some language rules which require this wasted
space?
I did nothing special. I just stole one byte from the fast_string (the string that places its data in-situ) for the discriminator. Check out the implementation of fast_string:
http://boost-spirit.com/dl_more/scheme/scheme_v0.1/detail/utree_detail1.hpp
Would you mind explaining why you need the
lbuff
field in the fast_string union? I see that it has the sizebuff_size / (sizeof(long) / sizeof(char))
; why that size specifically?You seem to use during utree initialization before assigning a value. My first thought was that its purpose is to zero out the memory, but then why would you need an extra union field?
Mathias, its purpose is to make sure that 1) we initialize the string to zero 2) that we have a size that’s divisible by sizeof(long). The latter has the advantage of a) optimizing the buffer usage since machine architectures tend to align the structs/unions anyway, and b) having to zero the string in size/sizeof(long) operations instead of size operations.
Thanks, I understand (1) and now (2a). For (2b), why didn’t you use
memset
? My understanding is that is the most efficient way to initialize memory with a constant value.That is the common perception alright, but i’m not so sure about that for this case. memset has to deal with off sizes. Anyway, if I really wanted blazing speed, I’d unroll the loop (since the size is static anyway), but that is not the main intent.
I tried using a union with a discriminant inside a struct. I get the same
size of for composite_tagged_seq:
IOW, the following code:
produces:
sizeof(x)=24
sizeof(x_union)=32
sizeof(x_oneof)=32
Yep. You can’t get away with the extra padding. In my case, I just store the discriminant (intrusively) in the largest union type in a place where it can not be overwritten by other inhabitants of the union. That is a caveat. I place this at the right/bottom-most of the largest struct but there can be some obscure OS where some weird alignment can place data such that this is overwritten. This can be tested at compile time though. For a generic variant, one can perhaps use the same trick by having an array of chars with 1+ the largest struct and place the discriminator there at the extra byte.
Sometimes for tree is needed some function on its all elements .
For u-tree its possible to doo for_each?
Definitely. It’s an STL container. You can also do unary and binary visitation like you do with variant, allowing you to traverse the tree any way you want.
The reason for this:
Thus if you have a type in there that aligns to 12 bytes, variant
requires another 12 bytes just for the type discriminator!
is to preserve the 12 byte alignment for the oddly aligned type when the
variant is put into an array. By storing the type discriminant in the
1st 12 bytes of the containing structure, then when the containing structure
is put into a 2 element array, the 2nd array element will have the properly
aligned oddly aligned type. Now I suppose you could avoid this restriction
if you’re sure never to put utree into an array.
Yes, Larry, that is correct. 12 bytes alignment is quite odd though. 64 bit machines I am aware of typically aligns to 8 bytes (64 bits). Correct me if I’m wrong. Anyway, the point really is to make use of the padding bits and optimize the packing of the data.
I like the idea of a more efficient variant. Perhaps I am misunderstanding a fundamental goal of the utree, but how would it be possible to store an arbitrary type T in it, which seems to be a requirement for creating arbitrary ASTs.
You don’t. Like in scheme, you only have one general data structure that can represent any data structure. In scheme S-expressions *can* represent any type of hierarchical data. You store arbitrary heterogeneous data there in an intermediate form that you later convert to your final form by walking the tree and building your actual data structures.
I see. Now the name utree makes even more sense. Thanks for the clarification.
Yet another basic question: say each utree element uses x (24 in your case) bytes. Would the S-expression (true 42 8.7 ‘c’) then use (1+4)*x bytes?
Well, it will be a list, so you’ll also need 2 linked-list pointers per element. I’m considering some optimizations though for small lists that will give you a better memory footprint.
In lisp by a sample syntax aproximately is (operation operand1 operand2 operand3 ), then how i can represent in utree operation and list of operands
in same time when utree has only one variant value ?
A UTree element can also be a list.
Thanks for replay, I’ll try to make my tree (actually i mean operands as rule references, and operation as one of spirit rule operations )
Sorry for maybe dumb question, is possible to construct utree as list from
std::list or std::vector or maybe scheme::detail::list on one line say
scheme::utree my_utree( std::vector(21) );
I spend 3 hours to quess right syntax, but no results…,
only errors like …
C:\csource\utree\test.cpp:117: error: no matching function for call to ‘scheme::utree::utree(scheme::detail::list&)’
C:\csource\utree\detail/utree_detail2.hpp:979: note: candidates are: scheme::utree::utree(scheme::utree::construct_list)
Sorry on more time I have lost < > ,
std::list<int > or std::vector<int > or maybe scheme::detail::list on one line say
scheme::utree my_utree( std::vector<int >(21) );
I spend 3 hours to quess right syntax, but no results…,
only errors like …
C:\csource\utree\test.cpp:117: error: no matching function for call to ‘scheme::utree::utree(scheme::detail::list&)’
C:\csource\utree\detail/utree_detail2.hpp:979: note: candidates are: scheme::utree::utree(scheme::utree::construct_list)
utree has an additional constructor (not shown in the listing above) taking a boost::iterator_range:
which should do what you want.
Regrads Hartmut
Thanks Hartmut, that works, just now I understand that if
utree is constructed as list , only makes a sense to construct it as list of
of utrees ๐
That looks good. I will certainly have a use for this.
But I have a concern… isn’t storing members of a union as one type and then reading them as another forbidden by the standard. IIRC it breaks the strict-aliasing rule, and it can cause problems with certain compilers that do optimization taking advantage of the aliasing rules.
It would probably be better to store it as an unsigned char array and then cast as needed (casting to and from char pointers is allowed by the standard).
How can I handle optional, even rekursive??
Optionals are just T | nil
utree is inherently recursive
Only for interest: Why do you wrote your own list implementation and don’t use std::list? Why is there no allocator interface like the std::list has which would allow to use pools, e.g. boost’s fast_pool_allocator? Would be any benefit from them?
1) size constraints. std::list is 16 bytes last I checked (32 bit compiler) 2) I needed to keep the elements in the variant PODs to simplify the implementation. It’s a lot easier this way than having to rewrite variant for our very tuned and specific application.