Boost C++ Libraries Home Libraries People FAQ More

PrevUpHomeNext

Quick Start Guide

Traversing Data Structures

Hierarchical data structures are data structures built by repeatedly composing together simpler types, for example:

Composition can be applied repeatedly to build up complex types, std::vector's of struct's containing std::pair's and so on. The Traversal Library is concerned with traversing these trees of data, and performing targetted operations on parts of the tree.

A traversal can be thought of as consisting of 2 distinct 'directions':

horizontal

vertical

Our Example

For our examples we will consider a simple model of a business or organization.

namespace demo {

    struct dept;

    typedef int salary;
    typedef int age;

    template<typename Tag>
    struct Id
    {
        std::string id;
    };

    template<typename Tag>
    std::ostream& operator<<(std::ostream& s, Id<Tag> const& id);

    struct surname_tag;
    struct address_tag;

    typedef Id<surname_tag> surname_type;
    typedef Id<address_tag> address_type;

    struct person
    {
        surname_type surname;
        address_type address;
        int age;
    };

    struct employee
    {
        person detail;
        salary pay;
    };

    typedef boost::variant<dept, employee> unit;

    struct dept
    {
        std::string name;
        std::vector<unit> units;
    };

    typedef std::vector<dept> company;
}

BOOST_FUSION_ADAPT_STRUCT(demo::person,
                          (surname_type, surname)
                          (address_type, address)
                          (int, age))

BOOST_FUSION_ADAPT_STRUCT(demo::employee,
                          (demo::person, detail)
                          (demo::salary, pay))

BOOST_FUSION_ADAPT_STRUCT(demo::dept,
                          (std::string, name)
                          (std::vector<demo::unit>, units))

The BOOST_FUSION_ADAPT_STRUCT macro used above produces the necessary code to make each of the structs that we are using into Boost.Fusion sequences. Traversal Library can traverse Fusion sequences, and uses this information to understand the internal structure of each of the structs.

Something Simple

To get started, lets do about the simplest thing we can do. Lets print out all the different employee names in the firm, using a Boost.Lambda expression:

traversal::full(my_company, traversal::restrict<void(surname_type&)>(std::cout << _1 << std::endl));

There is quite a lot going on here, even in only 1 line of code:

Ok, a few quick variants on our initial example:

traversal::reverse_full(my_company, traversal::restrict<void(surname_type&)>(std::cout << _1 << std::endl));
traversal::bu_full(my_company, traversal::restrict<void(surname_type&)>(std::cout << _1 << std::endl));
traversal::bu_reverse_full(my_company, traversal::restrict<void(surname_type&)>(std::cout << _1 << std::endl));

Parental Involvement

Next, we'll try incrementing the salaries of everybody in the firm (yay!):

traversal::full(v, traversal::restrict<void(int&)>(_1 += increase)); // Wont work

The above code wont work as expected. It will increase everybody's salary, but it will also increase everybody's ages by the same amount (which might not be quite as popular!), as both salary and age are of type int. We skimmed over the Id<Tag> template in the previous examples, but it was important for our code to work properly, as it provided distinct types for name_type and address_type.

We need to find another way of identifying the data we want to modify, given our lack of type information. What we do know is that the int we wish to modify is contained in an employee object. So if we had access to information about the parent nodes, we could target the data we want. This time we will build our own function object, rather than using the Traversal Library combinators, to demonstrate the process of producing bespoke modification schemes:

struct raise_salary
{
    explicit raise_salary(int raise)
        : raise_(raise)
    {}

    int raise_;

    // This overload matches all the nodes we don't care about
    template<typename Parent, typename Child>
    void operator()(Child&, Parent&) const
    {}

    // This overload picks out the salary values we wish to modify
    void operator()(salary_type& salary, employee& e) const
    {
        salary += raise_;
    }
};

The function objects used by Traversal Library are just ordinary function objects, but typically with multiple overloads of the call operator. Overload resolution then selects the best match at each point in the data structure, and this process allows us to selectively process different parts of the tree. Generally there will be one template operator() to provide default behaviour (such as do nothing), and overloads of operator() with more specific typing to perform the interesting actions involved in the traversal. Using our function object we can write:

traversal::full_parent(my_company, raise_salary(101))

traversal::full_parent calls raise_salary for each node in my_company, as with the previous traversal::full call. The _parent postfix means that two arguments will be passes to the call operator of the function object argument. The first argument is a reference to the current node, and the second argument is a reference to its parent node. Using the additional information about the type of the parent at each node in the tree, our 2 overloads of operator() allow us to pick out only the salary member in the employee type to modify, without inadvertently modifying the age data elsewhere in the data structure. In effect we are applying a form of pattern matching to pick out the parts of the tree that we desire. Every Traversal Library traversal strategy provides _parent variants to permit this type of behaviour.

We can actually simplify the strategy above, using another tool from the Traversal Library. The code below will have the same effect:

void raise_salary()(salary_type& salary, employee& e) const
{
    salary += 101; // We have to hardwire the raise size this time
}

traversal::full_parent(my_company, traversal::extend(raise_salary))

traversal::extend takes a function, and returns a function object, calling the underlying function when the argument types match, and returning a default value otherwise. The default value is either a default constructed value of the function return type, or the value passed as the second argument to traversal::extend. As a function has no state, we have to hardwire a particular salary increase. Extending stateful function objects will be handled in a later example.

Type Unification

Ok, now we've raised the salary of everybody in the firm, it might be a good idea if we could sum up the total off all the salaries that we're paying to all our workers. The Traversal Library provides 2 types of traverals:

  1. In place traversals - These are the traversals we've seen so far. They permit us to modify the original data structure in place, or to just report information about the tree, for example printing out all the employee names as we encounter them.
  2. Type unifying traversals - These traversals walk over a data structure, and return a result of a specific type. This is the type of traversal we need to convert a company structure into an int summarizing the total salaries across the firm.

We're going to build another function object, similar to raise_salary that we used previously.

struct sum_salaries
{
    // This overload matches all the nodes we don't care about
    template<typename Child, typename Parent>
    int operator()(int state, Child&, Parent&) const
    {
        return state;
    }

    // This overload picks out the salary values we wish to sum
    int operator()(int state, salary_type& salary, employee& e) const
    {
        state += salary;
        return state
    }
};

Now we can sum up all the salaries in the firm with:

traversal::tu_full_parent(my_company, 0, sum_salaries())

tu_full is the type unifying counterpart to the in place full traversal that we have seen previously. All the traversal algorithms come in corresponding pairs, the in place variant, and the type unifying variant, with a tu_ prefix. tu_full takes 3 arguments, the data structure to traverse, the initial state, and a function object describing how to update the state at different points in the data structure.

We actually could have done the whole thing in a line using the restrict operator and the Boost.Lambda library:

traversal::tu_full_parent(my_company, 0, traversal::tu_restrict<int(int&, salary_type&, employee&)>(_1 + _2))

Selective Traversals

In the examples we've seen so far, we always visit every part of the argument data structure, using different variants of the full traversal. We may wish to be more selective in our traversals, only walking a subset of the full tree.

We shall write a function to return the address of a named department within our example company structure. To do so we shall need to produce a suitable function object:

struct find_dept
{
    find_dept(std::string const& name)
      : name(name)
    {}

    // Pick out Dept objects, and check for the named department
    std::pair<Dept*, bool>
    operator()(std::pair<Dept*, bool> const& state, Dept& dept) const
    {
        if(dept.name == name)
            return std::make_pair(&dept, true);
        else
            return state;
    }

    // Ignore everything else
    template<typename T>
    std::pair<Dept*, bool> 
    operator()(std::pair<Dept*, bool> const& state, T& t) const
    {
        return state;
    }

    std::string name;
};

The convention for selective traversals is that they return a bool indicating if the traversal should terminate early. In the case of a type unifying traversal, function objects return a std::pair containing the result, and the boolean termination flag. So our function object skips everything until it finds the named department in question, at which point it takes it's address, and sets the termination flag to true. We then use an instance of the function object as follows:

traversal::tu_once(my_company, static_cast<Dept*>(0), find_dept("HR"))

The traversal tu_once traverses my_company until the argument function object returns a true termination flag. We need the static_cast just so the result type is deduced correctly. tu_once returns a std::pair containing the result and the final termination flag. Again, this can be implemented more concisely using the extend utility:

std::pair<Dept*, bool>
hr_finder(std::pair<Dept*, bool> const& state, Dept& dept) const
{
    if(dept.name == "HR") // We hardwire the dept to search for
        return std::make_pair(&dept, true);
    else
        return state;
}

traversal::tu_once(my_company, static_cast<Dept*>(0), traversal::extend(hr_finder))

Having to hardwire the department to search for in the above example is unsatisfactory, we really need a stateful finder. Fortunately extend also works with function objects that have the appropriate result_type and xxx_type typedefs, so we can have our cake and eat it, a simple function object with the state we need, and using extend to fill in the remaining details:

struct finder
{
    typedef std::pair<Dept*, bool> result_type;
    typedef std::pair<Dept*, bool>& first_argument_type;
    typedef Dept& second_argument_type;

    explicit finder(std::string const& name)
        : name(name)
    {}

    // Function object just for the action we care about
    // Skipping behaviour and default results will be handled
    // by extend.
    result_type 
    operator()(first_argument_type state, second_argument_type dept) const
    {
        if(dept.name == name) // We hardwire the dept to search for
            return std::make_pair(&dept, true);
        else
            return state;
    }

    std::string name;
};

// We can find departments other than HR, much better!
traversal::tu_once(my_company, static_cast<Dept*>(0), traversal::extend(finder("Finance")))

All The Family

In total there are 4 different traversal types provided by Traversal Library

full

once

stop

spine

Each of the 4 traversal types then has type unifying variants, top down and bottom up variants, and forward and reverse horizontal traversal variants. These different degrees of freedom then described the full family of traversals available.

More powerful pattern matching on data types

Starting with version 0.02 the Traversal Library has support for pattern matching on data structures, beyond the straightforward overload selection provided previously.

Pattern matching is supported by 2 cooperating components. The match combinator generates a function object that is applied whenever a given pattern matches. For example:

traversal::full(v, traversal::match<my_pattern>(func));

The code above performs a full traversal as usual, but only applying func where my_pattern matches the input data type. So what is my_pattern? It is simply an MPL metafunction class, describing the types that func should be applied to. The second part of the Traversal Library pattern matching support is a family of templates for building up patterns. For example func will be applied to all std::vectors in the traversal below:

traversal::full(v, traversal::match<pattern::vector<pattern::_> >(func));

pattern::vector is a template that creates a metafunction class that matches standard vectors whos elements match the given element pattern. pattern::_ is a wildcard pattern that matches any datatype. So the combination above matches vectors with arbitrary element types.

We can build up patterns using composition, just like we did with our original data structures. Patterns can be named using typedefs or inheritence to improve code clarity. The example below matches std::vectors, with elements that are std::pairs, with a first element of type int.

// Create a named pattern
struct my_pattern
  : pattern::vector<pattern::pair<pattern::lit<int>, pattern::_> >
{};

// Match vectors of pairs, where the first element of the pair is an integer
traversal::full(v, traversal::match<my_pattern>(func));

As a final example, we show how Traversal Library patterns can be combined with other libraries, in this case MPL and Boost.TypeTraits to provide custom pattern matching:

// Another named pattern
struct pointers_to_integral_types
  : pattern::pointer<is_integral<mpl::_> >
{};

// Apply func to all pointers to integral types in v
traversal::full(v, traversal::match<pointers_to_integral_types>(func));

The Traversal Library currently provides pattern matching primitives for the standard containers, Fusion sequences, Boost.Variants, Boost.Optionals, built in pointers, std::auto_ptr and the various Boost smart pointer types.

Customizing Traversal Behaviour

The Traversal Library also allows you to customize traversal behaviour. For example, by default the library assumes a std::string is atomic, and does not traverse the individual chars within the string. The way to modify this behaviour is to provide a custom horizontal traversal scheme, as all the recursive traversals are built upon the horizontal traversals, they can all be modified in this way. The Traversal Library also provides utilities for simplifying the generation of common custom traversal schemes.

More detail here in later versions of the documentation!

Copyright © 2007 -2008 Dan Marsden

PrevUpHomeNext