Home | Libraries | People | FAQ | More |
Hierarchical data structures are data structures built by repeatedly composing together simpler types, for example:
std::pair<int,char>
composes the simpler int
and char
types to make a more
complex type
struct
can be thought of as
a collection of its members
Composition can be applied repeatedly to build up complex types, std::vector
's
of struct
's containing std::pair
's
and so on. The Traversal Library is concerned with traversing these trees of
data, and performing targetted operations on parts of the tree.
A traversal can be thought of as consisting of 2 distinct 'directions':
std::vector
.
Horizontal traversals are the fundamental building blocks of Traversal Library.
The more interesting recursive traversals are built by repeated application
of the horizontal traversals at the different levels of the tree. The grey
nodes in the diagram below indicate the direction of a horizontal traversal:
Employee
in a std::vector<Employee>
to continue a traversal. As we traverse deeper vertically, the types become
simpler and simpler. Inevitably we eventually reach types with no additional
substructure, and this terminates the vertical traversal. For example if
we traverse our std::vector<Employee>
,
we then enter the simpler Employee
type, and then into the int
representing the employees age, an int
has no interesting structure, so this terminates the vertical traversal.
The grey nodes in the diagram below indicate the direction of a vertical
traversal:
For our examples we will consider a simple model of a business or organization.
namespace demo { struct dept; typedef int salary; typedef int age; template<typename Tag> struct Id { std::string id; }; template<typename Tag> std::ostream& operator<<(std::ostream& s, Id<Tag> const& id); struct surname_tag; struct address_tag; typedef Id<surname_tag> surname_type; typedef Id<address_tag> address_type; struct person { surname_type surname; address_type address; int age; }; struct employee { person detail; salary pay; }; typedef boost::variant<dept, employee> unit; struct dept { std::string name; std::vector<unit> units; }; typedef std::vector<dept> company; } BOOST_FUSION_ADAPT_STRUCT(demo::person, (surname_type, surname) (address_type, address) (int, age)) BOOST_FUSION_ADAPT_STRUCT(demo::employee, (demo::person, detail) (demo::salary, pay)) BOOST_FUSION_ADAPT_STRUCT(demo::dept, (std::string, name) (std::vector<demo::unit>, units))
The BOOST_FUSION_ADAPT_STRUCT
macro used above produces the necessary code to make each of the struct
s that we are using into Boost.Fusion
sequences. Traversal Library can traverse Fusion sequences, and uses this information
to understand the internal structure of each of the structs
.
To get started, lets do about the simplest thing we can do. Lets print out all the different employee names in the firm, using a Boost.Lambda expression:
traversal::full(my_company, traversal::restrict<void(surname_type&)>(std::cout << _1 << std::endl));
There is quite a lot going on here, even in only 1 line of code:
traversal::full
is a traversal strategy. It specifies
that we should walk over every element of its first argument, applying the
function object passed in to every data member.
traversal::restrict
takes a function object, and returns
a new function object. The argument callable is applied to elements of the
specified type. To any other argument type, the resulting function call is
a null op. In this way, we select to only apply our Boost.Lambda expression
to the surname_type
members
we encounter as we walk the tree.
Ok, a few quick variants on our initial example:
traversal::reverse_full(my_company, traversal::restrict<void(surname_type&)>(std::cout << _1 << std::endl)); traversal::bu_full(my_company, traversal::restrict<void(surname_type&)>(std::cout << _1 << std::endl)); traversal::bu_reverse_full(my_company, traversal::restrict<void(surname_type&)>(std::cout << _1 << std::endl));
reverse_full
performs a full
traversal, but reverses the order of horizontal traversals in the tree. For
example, standard containers will be traversed from rbegin
to rend
rather than begin
to end
.
The reverse_
prefix is used
throughout Traversal Library to indicate this switch in horizontal traversal
direction.
bu_full
performs a full traversal,
but visits nodes in the tree from the bottom up, rather than the default
top down direction. The bu_
prefix is used throughout Traversal Library to indicate this switch in vertical
traversal direction.
bu_reverse_full
switches
both the vertical and horizontal traversal directions. When both traversal
directions are to be switched, the prefix bu_reverse_
is used throughout Traversal Library.
Next, we'll try incrementing the salaries of everybody in the firm (yay!):
traversal::full(v, traversal::restrict<void(int&)>(_1 += increase)); // Wont work
The above code wont work as expected. It will increase everybody's salary,
but it will also increase everybody's ages by the same amount (which might
not be quite as popular!), as both salary and age are of type int
. We skimmed over the Id<Tag>
template in the previous examples, but it was important for our code to work
properly, as it provided distinct types for name_type
and address_type
.
We need to find another way of identifying the data we want to modify, given
our lack of type information. What we do know is that the int
we wish to modify is contained in an employee
object. So if we had access to information about the parent nodes, we could
target the data we want. This time we will build our own function object, rather
than using the Traversal Library combinators,
to demonstrate the process of producing bespoke modification schemes:
struct raise_salary { explicit raise_salary(int raise) : raise_(raise) {} int raise_; // This overload matches all the nodes we don't care about template<typename Parent, typename Child> void operator()(Child&, Parent&) const {} // This overload picks out the salary values we wish to modify void operator()(salary_type& salary, employee& e) const { salary += raise_; } };
The function objects used by Traversal Library are just ordinary function objects,
but typically with multiple overloads of the call operator. Overload resolution
then selects the best match at each point in the data structure, and this process
allows us to selectively process different parts of the tree. Generally there
will be one template operator()
to provide default behaviour (such as do
nothing), and overloads of operator()
with more specific typing to perform the
interesting actions involved in the traversal. Using our function object we
can write:
traversal::full_parent(my_company, raise_salary(101))
traversal::full_parent
calls raise_salary
for each node in my_company
,
as with the previous traversal::full
call.
The _parent
postfix means that
two arguments will be passes to the call operator of the function object argument.
The first argument is a reference to the current node, and the second argument
is a reference to its parent node. Using the additional information about the
type of the parent at each node in the tree, our 2 overloads of operator()
allow
us to pick out only the salary member in the employee
type to modify, without inadvertently modifying the age
data elsewhere in the data structure. In effect we are applying a form of pattern
matching to pick out the parts of the tree that we desire. Every Traversal
Library traversal strategy provides _parent
variants to permit this type of behaviour.
We can actually simplify the strategy above, using another tool from the Traversal Library. The code below will have the same effect:
void raise_salary()(salary_type& salary, employee& e) const { salary += 101; // We have to hardwire the raise size this time } traversal::full_parent(my_company, traversal::extend(raise_salary))
traversal::extend
takes a function, and returns a function
object, calling the underlying function when the argument types match, and
returning a default value otherwise. The default value is either a default
constructed value of the function return type, or the value passed as the second
argument to traversal::extend
. As a function has no state, we have
to hardwire a particular salary increase. Extending stateful function objects
will be handled in a later example.
Ok, now we've raised the salary of everybody in the firm, it might be a good idea if we could sum up the total off all the salaries that we're paying to all our workers. The Traversal Library provides 2 types of traverals:
int
summarizing the total salaries across the firm.
We're going to build another function object, similar to raise_salary
that we used previously.
struct sum_salaries { // This overload matches all the nodes we don't care about template<typename Child, typename Parent> int operator()(int state, Child&, Parent&) const { return state; } // This overload picks out the salary values we wish to sum int operator()(int state, salary_type& salary, employee& e) const { state += salary; return state } };
Now we can sum up all the salaries in the firm with:
traversal::tu_full_parent(my_company, 0, sum_salaries())
tu_full
is the type unifying
counterpart to the in place full
traversal that we have seen previously. All the traversal algorithms come in
corresponding pairs, the in place variant, and the type unifying variant, with
a tu_
prefix. tu_full
takes 3 arguments, the data structure
to traverse, the initial state, and a function object describing how to update
the state at different points in the data structure.
We actually could have done the whole thing in a line using the restrict
operator and the Boost.Lambda library:
traversal::tu_full_parent(my_company, 0, traversal::tu_restrict<int(int&, salary_type&, employee&)>(_1 + _2))
In the examples we've seen so far, we always visit every part of the argument
data structure, using different variants of the full
traversal. We may wish to be more selective in our traversals, only walking
a subset of the full tree.
We shall write a function to return the address of a named department within our example company structure. To do so we shall need to produce a suitable function object:
struct find_dept { find_dept(std::string const& name) : name(name) {} // Pick out Dept objects, and check for the named department std::pair<Dept*, bool> operator()(std::pair<Dept*, bool> const& state, Dept& dept) const { if(dept.name == name) return std::make_pair(&dept, true); else return state; } // Ignore everything else template<typename T> std::pair<Dept*, bool> operator()(std::pair<Dept*, bool> const& state, T& t) const { return state; } std::string name; };
The convention for selective traversals is that they return a bool
indicating if the traversal should terminate
early. In the case of a type unifying traversal, function objects return a
std::pair
containing the result, and the boolean
termination flag. So our function object skips everything until it finds the
named department in question, at which point it takes it's address, and sets
the termination flag to true
.
We then use an instance of the function object as follows:
traversal::tu_once(my_company, static_cast<Dept*>(0), find_dept("HR"))
The traversal tu_once
traverses
my_company
until the argument
function object returns a true
termination flag. We need the static_cast
just so the result type is deduced correctly. tu_once
returns a std::pair
containing the result and the final
termination flag. Again, this can be implemented more concisely using the
extend
utility:
std::pair<Dept*, bool> hr_finder(std::pair<Dept*, bool> const& state, Dept& dept) const { if(dept.name == "HR") // We hardwire the dept to search for return std::make_pair(&dept, true); else return state; } traversal::tu_once(my_company, static_cast<Dept*>(0), traversal::extend(hr_finder))
Having to hardwire the department to search for in the above example is unsatisfactory,
we really need a stateful finder
.
Fortunately extend
also works
with function objects that have the appropriate result_type
and xxx_type
typedefs, so we
can have our cake and eat it, a simple function object with the state we need,
and using extend
to fill in
the remaining details:
struct finder { typedef std::pair<Dept*, bool> result_type; typedef std::pair<Dept*, bool>& first_argument_type; typedef Dept& second_argument_type; explicit finder(std::string const& name) : name(name) {} // Function object just for the action we care about // Skipping behaviour and default results will be handled // by extend. result_type operator()(first_argument_type state, second_argument_type dept) const { if(dept.name == name) // We hardwire the dept to search for return std::make_pair(&dept, true); else return state; } std::string name; }; // We can find departments other than HR, much better! traversal::tu_once(my_company, static_cast<Dept*>(0), traversal::extend(finder("Finance")))
In total there are 4 different traversal types provided by Traversal Library
full
traversals. These visit
every node in the data structure, and have been used in the bulk of our examples
so far. The dashed line in the diagram below indicates the path of a full
traversal:
once
traversals. These visit
every node in the data structure until the termination flags is set, at which
point they terminate immediately. The dashed line in the diagram below indicates
the path of a once
traversal.
The grey node indicates the point at which the applied function indicated
the traversal should end:
stop
traversals. These visit
every node in the tree until the termination flag is set. Once the termination
flag is set, they will traverse no further into the current subtree. The
dashed line in the diagram below indicates the path of a stop traversal.
The grey node indicates a point at which the applied function indicated that
traversal further into that subtree was not required:
spine
traversals. These visit
every node in each horizontal level of the tree. Once the termination flag
is set, they will traverse no further horizontally, an proceed into the identified
subtree. The basic principle is to find a single desired node at each level,
and then proceed down into the next level and repeat, finding a spine of
nodes down the structure. The dashed line in the diagram below indicates
the path of a spine
traversal.
The grey nodes indicate the point at which the applied function indicated
the horizontal traversal should be halted, and the traversal should enter
into the subtree below:
Each of the 4 traversal types then has type unifying variants, top down and bottom up variants, and forward and reverse horizontal traversal variants. These different degrees of freedom then described the full family of traversals available.
Starting with version 0.02 the Traversal Library has support for pattern matching on data structures, beyond the straightforward overload selection provided previously.
Pattern matching is supported by 2 cooperating components. The match
combinator generates a function object
that is applied whenever a given pattern matches. For example:
traversal::full(v, traversal::match<my_pattern>(func));
The code above performs a full traversal as usual, but only applying func
where my_pattern
matches the input data type. So what is my_pattern
?
It is simply an MPL metafunction class, describing the types that func
should be applied to. The second part
of the Traversal Library pattern matching support is a family of templates
for building up patterns. For example func
will be applied to all std::vector
s
in the traversal below:
traversal::full(v, traversal::match<pattern::vector<pattern::_> >(func));
pattern::vector
is a template that creates a metafunction
class that matches standard vectors whos elements match the given element pattern.
pattern::_
is a wildcard pattern that matches any
datatype. So the combination above matches vectors with arbitrary element types.
We can build up patterns using composition, just like we did with our original
data structures. Patterns can be named using typedefs or inheritence to improve
code clarity. The example below matches std::vector
s,
with elements that are std::pair
s,
with a first element of type int
.
// Create a named pattern struct my_pattern : pattern::vector<pattern::pair<pattern::lit<int>, pattern::_> > {}; // Match vectors of pairs, where the first element of the pair is an integer traversal::full(v, traversal::match<my_pattern>(func));
As a final example, we show how Traversal Library patterns can be combined with other libraries, in this case MPL and Boost.TypeTraits to provide custom pattern matching:
// Another named pattern struct pointers_to_integral_types : pattern::pointer<is_integral<mpl::_> > {}; // Apply func to all pointers to integral types in v traversal::full(v, traversal::match<pointers_to_integral_types>(func));
The Traversal Library currently provides pattern matching primitives for the
standard containers, Fusion sequences, Boost.Variants, Boost.Optionals, built
in pointers, std::auto_ptr
and the various Boost smart pointer
types.
The Traversal Library also allows you to customize traversal behaviour. For
example, by default the library assumes a std::string
is atomic, and does not traverse the individual char
s
within the string. The way to modify this behaviour is to provide a custom
horizontal traversal scheme, as all the recursive traversals are built upon
the horizontal traversals, they can all be modified in this way. The Traversal
Library also provides utilities for simplifying the generation of common custom
traversal schemes.
More detail here in later versions of the documentation!
Copyright © 2007 -2008 Dan Marsden |