constexpr tree structures
This is a followup of my notes about a constexpr
linked list.
A node
based binary tree
The same consideration for a linked list holds, a node
won’t use pointers but const
references, and it won’t have userdefined constructors (even defaulted). Given this limitation in the design space, there are also the same usability issues, until the already reported compiler bugs are fixed.
For simplicity, let’s see how a node
based binary tree can be implemented.
struct data{int i;};
struct node {
data d;
const node &left;
const node &right;
};
constexpr const node Null{data{0},Null,Null};
constexpr node leaf(data d){
return node{d, Null, Null};
}
constexpr const node tree
{
data{1},
node
{
data{2},
Null,
leaf(data{3})
},
leaf(data{4})
};
The limitations are the same as the list, and nearly no operation is possible at compiletime on all three major compilers.
No parent reference
I know no way how to initialize a tree where every node
has a reference to its parent, just like I have no idea how to initialize a doublylinked list. This makes it more difficult to implement certain types of algorithms.
Traversing the tree
There are two main methods for traversing a tree: breadthfirst and depthfirst.
depthfirst traversal is accomplished easily with
template <class F>
void dfs(const node& n, F f){
f(n);
if(&(n.left) != &Null){
dfs(n.left, f);
}
if(&(n.right) != &Null){
dfs(n.right, f);
}
}
for breadthfirst traversal, one needs to define some helper structure, as no node has a reference to its parent.
For creating an iteratorbased interface, one needs to define some helper structure where to store all nodes sorted in the right order.
Binary array tree
For most usecases, replacing a treelike structure with something that wraps an array and provides a treelike interface would have multiple benefits

better cache locality

smaller data structures

better code reuse (creating iterators both for bfs and dfs is much simpler)

usable in
constexpr
context with current compilers 
if the array size is correct, then it can be interpreted as a valid tree, in the case of node it is possible to create structures that are not trees, as one node could have the root node as children

it is possible to reuse the same data structure for both mutable and immutable trees
The main disadvantage is that if there are multiple data structures with a common subset, it might be harder to remove those duplicates. I do not believe this is an issue for most applications, as replacing the sparse data structure with an array should reduce the size much more, except in rare circumstances.
There is just one (soft) limitation: It is possible to represent a binary tree as an array if this is full.
This means that for a given height h
, the tree has ${2}^{h}1$
elements, as there is the following relation
For nonfull trees… one has to decide what to do with the gaps in the array. Also if the blacklist has many values, the array will be "sparse", thus another data structure for storing the tree might be better,
bfs traversal
This is trivial, as it is just traversing the tree:
constexpr const int tree[]{
0,
1, 2,
3, 4, 5, 6,
7, 8, 9, 10, 11, 12, 13, 14,
15,16, 17,18, 19,20, 21,22, 23,24, 25,26, 27,28, 29,30
};
int main(){
for(const auto& v : tree){
std::cout << v << ' ';
}
}
will print 0 1 2 3 4 5 6 7 8 9 ….
dfs traversal
This is trickier but can be done without creating helper structures, as one can calculate where the next index is located. This also makes it easier to define a proper iterator
constexpr int children_left(int i) {return i*2+1;};
constexpr int children_right(int i){return i*2+2;};
constexpr int parent(int i) {return (i1)/2;};
constexpr int bubble_up(int i, int leaf){
int k = 1;
i = parent(i);
while ((leaf+1) % (k*2) != k ) {
i = parent(i);
k*=2;
}
return i;
}
struct index_iter{
int size=1;
int i = 0;
constexpr void next(){
assert(i != size);
if(i == size1){
i = size;
return;
}
if(i < size/2){
i = children_left(i);
return;
}
int leaf = isize/2;
int tmp_parent = bubble_up(i, leaf);
i = children_right(tmp_parent);
}
};
The condition if(i == size1){i = size;return;}
is straightforward. If we are on the last valid element of the array, that means on the last leaf, then the "next" index is one past the end of the array. This is a common convention, also used for iterators, for notifying the user that the new index is invalid.
if(i < size/2){ i = children_left(i); return;}
means that as longs as we are not on a leaf (as all leafs have index>=size/2
), the next element is simply the left children.
The less obvious part is what to do when we are on a leaf which is not the last. If the leaf is the left child of a node, then we can simply go to its brother, but if the leaf is a right node, then we need to got to traverse the tree in the opposite direction, and from a given node select the right children.
How many times we need to traverse parents depends on which node we are. And the logic of this procedure is encapsulated in the bubble_up
function.
If one manually writes down the number of parents it needs to traverse (even when simply jumping to the brother), it is possible to recognize the following pattern
leaf 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ... bubble steps 1 2 1 3 1 2 1 4 1 2 1 3 1 2 1 ... Starting from 0 every 2nd element is 1 Starting from 1 every 4th element is 2 Starting from 3 every 8th element is 3 Starting from 7 every 16th element is 4 ... Starting from 2**(k1)1 every 2**k th element is k
Thus given a node at index j
, one needs to find the value of k
for which
This is equivalent to testing
Determining which leaf we are on simply means subtracting size/2
from the current index. The algorithm then tests all values of k
by starting from 1
until it finds one that satisfies the condition. And instead of returning the number of times we need to traverse a parent, it calculates the necessary index directly.
Once index_iter
has been declared and implemented, it can be used for implementing an iterator by wrapping it:
struct depthiterator{
using iterator_category = std::forward_iterator_tag;
using difference_type = std::ptrdiff_t;
using value_type = const int;
using pointer = value_type*;
using reference = value_type&;
index_iter ii;
std::span<const int> data;
constexpr depthiterator(std::span<const int> data_) : ii{std::size(data_), 0}, data{data_}{};
struct end_t{};
constexpr depthiterator(std::span<const int> data_, end_t) : ii{std::size(data_), std::size(data_)}, data{data_} {};
constexpr reference operator*() const { return data[ii.i]; }
constexpr pointer operator>() { return &data[ii.i]; }
constexpr depthiterator& operator++() {
ii.next();
return *this;
}
constexpr depthiterator operator++(int) { auto tmp = *this; ++(*this); return tmp; }
constexpr friend bool operator== (const depthiterator& a, const depthiterator& b) { return a.ii.i == b.ii.i; };
constexpr friend bool operator!= (const depthiterator& a, const depthiterator& b) { return a.ii.i != b.ii.i; };
};
Which can be used like
int main(){
for (auto it = depthiterator(tree); it != depthiterator(tree, depthiterator::end_t{}); ++it){
std::cout << *it << '\n';
}
constexpr int i1 = *depthiterator(tree);
static_assert(i1 == 0, "");
constexpr int i2 = *(++depthiterator(tree));
static_assert(i2 == 1, "");
constexpr int i3 = *(++ ++depthiterator(tree));
static_assert(i3 == 3, "");
}
and will print 0 1 3 7 15 16 8 17 18 4 …
as expected.
There are still improvements that can be made, currently depthiterator
does not verify that the size of it’s input is sensible, as complete tree have a size in the form ${2}^{n}1$. By providing appropriate abstractions it is possible to turn arrays with unexpected sizes into errors and enable .begin()
and .end()
functions for both dfs and bfs.
ternary and nary nodebased trees
Similarly to a binary tree, it is possible to define node
for ternary trees:
struct data{int i;};
struct node {
data d;
const node& first;
const node& second;
const node& third;
};
constexpr const node Null{data{0},Null,Null,Null};
constexpr node leaf(data d){
return node{d, Null, Null, Null};
}
constexpr const node tree {
data{1},
node {
data{2},
leaf(data{3}),
leaf(data{4}),
leaf(data{5})
},
leaf(data{6}),
leaf(data{7})
};
template <class F>
void dfs(const node& n, F f){
f(n);
if(&(n.first) != &Null){
dfs(n.first, f)
}
if(&(n.second) != &Null){
dfs(n.second, f)
}
if(&(n.third) != &Null){
dfs(n.third, f)
}
}
But it is annoying that, depending on the number of children, I had to define another node structure.
Thanks to templates it is possible to provide just one definition for all types of trees, but there are a couple of caveats.
#include <array>
namespace detail {
template <typename T, std::size_t ... Is>
constexpr std::array<T, sizeof...(Is)> create_array_i(T value, std::index_sequence<Is...>)
{
return {{(static_cast<void>(Is), value)...}};
}
}
template <std::size_t N, typename T>
constexpr std::array<T, N> create_array(const T& value) {
return detail::create_array_i(value, std::make_index_sequence<N>());
}
struct data{int i;};
template <int N>
struct node;
template <int N>
struct node_ref{
const node<N>& ref;
};
template <int N>
struct node {
data d;
std::array<node_ref<N>, N> children;
};
template <int N, class F>
void dfs(const node<N>& n, F f){
f(n);
for(const auto& v : n.children){
if(&(v.ref) != &Null<N>){
dfs(v.ref, f);
}
}
}
template<int i>
constexpr node<i> Null{data{0}, create_array<i>(node_ref<i>{Null<i>})};
template <int N>
constexpr node<N> leaf(data d){
return node<N>{d, create_array<N>(node_ref<N>{Null<N>})};
}
constexpr const node<4> tree {
data{1},
node<4> {
data{2},
leaf<4>(data{3}),
leaf<4>(data{4}),
leaf<4>(data{5}),
leaf<4>(data{6})
},
leaf<4>(data{7}),
leaf<4>(data{8}),
leaf<4>(data{9}),
};
Something like dfs(tree, [](const node<4>& n){std::cout << n.d.i << '\n';});
will print 1 2 3 4 5 6 7 8 9
.
Multiple Null nodes
The first caveat is that the definition of Null
depends on a template parameter. In order not to define different constexpr
Null values, which would require different names, we need to be able to define a variable template, which requires C++14.
If one still needs to support C++11 it can use a couple of macros to reduce code duplication. Something like the following macros will do the job:
#define define_NullN_constant(j) constexpr const node<j> Null##j{data{0}, create_array<j>(node_ref<j>{Null##j})};
define_NullN_constant(2);
define_NullN_constant(3);
define_NullN_constant(4);
...
template <int N>
constexpr node<N> Null();
#define define_Null_function(j) template<> constexpr node<j> Null(){return Null##j;};
define_Null_function(2)
define_Null_function(3)
define_Null_function(4)
...
Which is not as practical.
Otherwise, for the enduser, it means using Null<j>()
instead of Null<j>
, i.e. a function instead of a variable.
Array of references
References are not real objects like int
, char
, float
a struct
, a pointer, and so on. They are aliases; an alternate name for an object.
While in most cases the difference is not important, in this case, it is. It is not possible to create an array, or std::array
, of references.
The fix is simple and consists of a layer of indirection.: struct node_ref
. This structure does not seem necessary, and in fact, does not add any functionality or flexibility. But it is possible to create an array of node_ref
, while it is not to create an array of const node&
or simply node&
.
The standard library already provides std::reference_wrapper, but

factory functions like
std::cref
have a=delete
overload for temporaries 
it has userdefined constructors

is is only constexpr from C++20
Thus writing down a custom node_ref

makes code easier to write and read/is easier to use

is faster to compile compared to
std::reference_wrapper
, as it is not necessary to includefunction

is the same amount of code for providing factory functions like
std::cref
that accepts temporaries 
is necessary as
std::reference_wrapper
would create dangling references
Initialize array with one value
The standard library and C++ language does not provide any mechanism, except default construction and 0init, for initializing an array with a given value.
In practical terms, it means that creating Null
is problematic, as depending on the size the number of arguments changes. This makes both writing a template variable or a macro impossible.
Fortunately, though std::index_sequence
, it is possible to write the helper function create_array
that creates a std::array
with a given value.
ternary and nary array tree
Similarly to a binary, tree, a nary tree can be trivially mapped to an array if it is full (ie a nary tree with height h
has ${n}^{h}$ leafs.)
For example, a ternary tree can be defined the same way as a binary tree:
constexpr const int tree[]{
0
1 2 3
4 5 6 7 8 9 10 11 12
};
As whitespace and formatting are completely irrelevant, the relevant difference is how the data is interpreted/traversed/iterated.
The expected size of the array is
bfs traversal
Just like a binary tree, bfs traversal is iterating the array, it doesn’t matter if it is a binary, ternary, or another type of tree:
for(const auto& v : tree){
std::cout << v << ' ';
}
dfs traversal
Similar to a binary tree, when a leaf node is reached, it is necessary to calculate how many parents need to be traversed. Contrary to before, we also need to determine which child we need to traverse, while before we simply selected the right child.
The sequence, this time, for a ternary tree looks like this:
1 1 2 1 1 2 1 1 3 1 1 2 1 1 2 1 1 3 1 1 2 1 1 2 1 1 4 ...
but contrary to the sequence of the binary tree, I was not able to write down the pattern as simple rules.
Thus I rechecked the logic of the binary tree and realized that
means finding the gives the highest power of 2 that divides 2n (as this check is done only on right leaves).
A similar relation is valid for a ternary tree:
n (leaf+1) bubble op 1 > 1 as 3n == 3*1 == 3 is divided by 3^1 2 > 1 as 3n == 3*2 == 6 is divided by 3^1 3 > 2 as 3n == 3*3 == 9 is divided by 3^2 4 > 1 as 3n == 3*4 == 12 is divided by 3^1 5 > 1 as 3n == 3*5 == 15 is divided by 3^1 6 > 2 as 3n == 3*6 == 18 is divided by 3^2 7 > 1 as 3n == 3*7 == 21 is divided by 3^1 8 > 1 as 3n == 3*8 == 24 is divided by 3^1 9 > 3 as 3n == 3*9 == 27 is divided by 3^3 10 > 1 as ... ...
And the same relation holds for every nary tree (with n>=2).
A naive implementation of such function is testing with an increasing value if the value we are interested in is divisible or not:
constexpr int highest_pow(int base, int mult){
int count=0;
int k = 1;
while (mult % k == 0 ) {
k*=base;
++count;
}
return count;
}
There is one last issue: given a parent, how do I know which node to select? If no bubble happened, then it’s the first, but if we just bubbled then I should store somewhere the information on how many nodes I already visited.
Fortunately, there is another way. The information on which nodes have been traversed is available while bubbling up. Thus instead of bubbling up n
times, let’s bubble n1
times and choose the "brother" node.
Put it all together, the algorithm looks like
constexpr int children_left(int base, int i) {return i*base+1;};
constexpr int parent(int base, int i) {return (i1)/base;};
constexpr int next_brother(int base, int i) {(void)base;return i+1;};
constexpr int highest_pow(int base, int mult){
int count=0;
int k = 1;
while (mult % k == 0 ) {
k*=base;
++count;
}
return count;
}
int bubble_to_parent_min1(int base, int i, int leaf){
int bubble_op = highest_pow(base, leaf+1);
for(int j = 0 ; j != bubble_op1; ++j){
i = parent(base, i);
}
return i;
}
struct index_iter{
int size;
int i;
int base;
constexpr void next(){
assert(i != size);
if(i == size1){
i = size;
return;
}
if(i < size/base){
i = children_left(base, i);
return;
}
int leaf = isize/base;
int tmp = bubble_to_parent_min1(base, i, leaf);
i = next_brother(base, tmp);
}
};
The class depthiterator
can be changed to
struct depthiterator{
using iterator_category = std::forward_iterator_tag;
using difference_type = std::ptrdiff_t;
using value_type = const int;
using pointer = value_type*;
using reference = value_type&;
index_iter ii;
std::span<const int> data;
constexpr depthiterator(std::span<const int> data_, int base) : ii{std::size(data_), 0, base}, data{data_}{
};
struct end_t{};
constexpr depthiterator(std::span<const int> data_, int base, end_t) : ii{std::size(data_), std::size(data_), base}, data{data_} {
};
constexpr reference operator*() const { return data[ii.i]; }
constexpr pointer operator>() { return &data[ii.i]; }
constexpr depthiterator& operator++() {
ii.next();
return *this;
}
constexpr depthiterator operator++(int) { auto tmp = *this; ++(*this); return tmp; }
constexpr friend bool operator== (const depthiterator& a, const depthiterator& b) { return a.ii.i == b.ii.i; };
constexpr friend bool operator!= (const depthiterator& a, const depthiterator& b) { return a.ii.i != b.ii.i; };
};
and used like
int main(){
for (auto it = depthiterator(tree, 3); it != depthiterator(tree, 3, depthiterator::end_t{}); ++it){
std::cout << *it << '\n';
}
}
thus the output of
constexpr const int tree[]{
0,
1, 2, 3,
4, 5, 6, 7, 8, 9, 10, 11, 12,
13,14,15, 16,17,18, 19,20,21, 22,23,24, 25,26,27, 28,29,30, 31,32,33, 34,35,36, 37,38,39
};
looks like
0 1 4 13 14 15 5 16 17 18 6 19 20 21 2 7 22 23 24 8 25 26 27 9 28 29 30 3 10 31 32 33 11 34 35 36 12 37 38 39
Nonfull nary array trees
I’ve written that nary trees, if full, can be easily mapped to arrays. But what if the tree is not full?
In this case, a possible approach is having a second array that contains the index of nodes to skip (there is no need to adapt index_iter
too).
// for legibility, nodes that should be ignored are written in parenthesis
constexpr const int tree[]{
0,
1, 2,
3, 4, 5, (6),
7, 8, 9, 10, 11, 12, (13), (14),
(15),16, 17,18, 19,20, 21,22, 23,24, 25,26, (27),(28), (29),(30)
};
constexpr const int tree_blacklist[]{
15, 6 // NOTE: decrease order for more efficient to_skip/advance function, but output does not depend on it
};
Then, depthiterator
should accept an optional blacklistspan for knowing which nodes (and subnodes) to skip.
constexpr bool to_skip(int base, const int i_, std::span<const int> blacklist){
return std::any_of(blacklist.begin(), blacklist.end(), [&](int b){
int i = i_;
while(i > b){
i = parent(base, i);
}
return i == b;
});
}
constexpr index_iter advance(index_iter ii, std::span<const int> blacklist){
do{
ii.next();
}while(ii.i != ii.size && to_skip(ii.base, ii.i, blacklist));
return ii;
}
struct depthiterator{
using iterator_category = std::forward_iterator_tag;
using difference_type = std::ptrdiff_t;
using value_type = const int;
using pointer = value_type*;
using reference = value_type&;
index_iter ii;
std::span<const int> data;
std::span<const int> blacklist;
constexpr depthiterator(std::span<const int> data_, int base, std::span<const int> blacklist_) : ii{std::size(data_), 0, base}, data{data_}, blacklist{blacklist_}{
ii= advance(ii, blacklist);
};
struct end_t{};
constexpr depthiterator(std::span<const int> data_, int base, end_t) : ii{std::size(data_), std::size(data_), base}, data{data_} {
};
constexpr reference operator*() const { return data[ii.i]; }
constexpr pointer operator>() { return &data[ii.i]; }
constexpr depthiterator& operator++() {
ii = advance(ii, blacklist);
return *this;
}
constexpr depthiterator operator++(int) { auto tmp = *this; ++(*this); return tmp; }
constexpr friend bool operator== (const depthiterator& a, const depthiterator& b) { return a.ii.i == b.ii.i; };
constexpr friend bool operator!= (const depthiterator& a, const depthiterator& b) { return a.ii.i != b.ii.i; };
};
and can be used like
int main(){
for (auto it = depthiterator(tree, 2, tree_blacklist); it != depthiterator(tree, 2, depthiterator::end_t{}); ++it){
std::cout << *it << '\n';
}
}
In this case, the output will be
1 3 7 16 8 17 18 4 9 19 20 10 21 22 2 5 11 23 24 12 25 26
Notice that also a bfs traversal now requires a custom iterator, and even functions like querying the size require some attention, as it is not the size of the array anymore.
Also, other operations are more complex and errorprone if no adequate abstraction is provided. For example, with a full tree of size N
, a node is a leaf if its index i
is greater than N/2
. If the tree is not full, one needs to check the size and if the elements that are would be it’s children in a full tree are blacklisted or not.
Do you want to share your opinion? Or is there an error, some parts that are not clear enough?
You can contact me here.