The C++ logo, by Jeremy Kratz, licensed under CC0 1.0 Universal

constexpr linked list


6 - 8 minutes read, 1594 words
Categories: c++
Keywords: c++ constants constexpr data structures performance

It is trivial to declare strings and arrays as true constants (data that does not require any runtime processing), and it is a useful technique for improving correctness, binary size, and performance.

Even more, complex data structures, like a map, set, or list, can be "mapped" to an array with a wrapper class that provides the desired API.

This time I wanted to experiment with a data structure without mapping it immediately to an array.

The probably most simple data structure is a linked list; a linear collection of elements.

A barebone implementation

Contrary to arrays, the order of elements is not given by the physical placement in memory, but each element points to the next until the last.

A barebone implementation, where each element contains some data (for simplicity let’s assume it’s just an int) would look similar too

struct data{ int i; };

struct node{
    data d;
    node* next;
};

next would point to the next element, unless it is nullptr.

While this structure, as long as data is constexpr-friendly, is also constexpr-friendly, it is not yet possible to create a list at compile-time

struct data{ int i; };

struct node{
    data d;
    node* next;
};

constexpr const node n3{
    data{42}, nullptr
};

constexpr const node n2{
    data{-1}, &n3 // error: invalid conversion from 'const node*' to 'node*' [-fpermissive]
};

GCC triggers an error with invalid conversion from 'const node*' to 'node*' [-fpermissive], clang error: cannot initialize a member subobject of type 'node *' with an rvalue of type 'const node *' and MSVC error C2440: 'initializing': cannot convert from 'const node *' to 'node *'.

Notice that this has nothing to do with constexpr, as

const node n3{
    data{42}, nullptr
};

const node n2{
    data{-1}, &n3 // error: invalid conversion from 'const node*' to 'node*' [-fpermissive]
};

yields exactly the same error.

To fix this error it is sufficient to change the declaration of next to be a const node*.

Now it is possible to create such structure at compile time:

struct data{ int i; };

struct node{
    data d;
    const node* next;
};

constexpr const node n3{
    data{42}, nullptr
};
constexpr const node n2{
    data{1}, &n3
};
constexpr const node n1{
    data{0}, &n2
};
constexpr const node n0{
    data{-1}, &n1
};

The assembly output is

n3:
        .long   42
        .zero   4
        .quad   0
n2:
        .long   1
        .zero   4
        .quad   n3
n1:
        .long   0
        .zero   4
        .quad   n2
n0:
        .long   -1
        .zero   4
        .quad   n1

While it works, it is impractical for declaring longers lists, as every node needs to be declared separately, even if we are not interested in those single variables, except for initializing n0.

Writing something like

struct data{ int i; };

struct node{
    data d;
    const node* next;
};

constexpr const node list{
    data{-1},
    node{
        data{0},
        &node{
            data{1},
            &node{
                data{42}
                nullptr
            }
        }
    }
};

would be much less error-prone and readable. Unfortunately, it is invalid and does not compile, the compiler diagnostic is error: taking address of rvalue [-fpermissive] (on GCC) and error: taking the address of a temporary object of type 'node' [-Waddress-of-temporary] (on clang).

Using references instead of pointers would make it possible to declare the whole list at once, but it makes apparently impossible to have the last element since every element needs to point to a valid object.

It is fortunately possible (and incredibly easy) to create a self-reference. Thus

  • by changing the detection of the last element

  • by creating some dummy data

it is possible to use a sentinel value instead of nullptr for declaring the end of the list.

Now it’s also possible to create lists with a single expression:

struct data{ int i; };

struct node{
    data d;
    const node& next;
};

constexpr const node Null{data{0}, Null};

constexpr const node list{
    data{-1},
    node{
        data{0},
        node{
            data{1},
            node{
                data{42},
                Null
            }
        }
    }
};

Note: accessing elements at compile-time is generally (currently) not possible

constexpr auto n1 = n0.next;

fails with error: the value of '<temporary>' is not usable in a constant expression with GCC and with something similar with clang (unless using clang >=12), and produces an ICE with MSVC (or a diagnostic with some variations).

The code is valid, but given the scarce support, it is hard to currently sell it as a portable solution. I’ve reported the bugs to both GCC and MSVC, hopefully, the next compiler version of both vendors will be able to support such data structures without issues.

Doubly linked list

node is used for defining a singly linked list, it does not seem possible to me to create a doubly linked list in one go.

Circular linked list

A list is circularly linked when the last node has a next node at the beginning of the list. With the current definition of node, it is also possible to declare such a list, similarly to how the Null value is defined. In this case, we do not even need a sentinel value.

struct data{ int i; };

struct node{
    data d;
    const node& next;
};

constexpr const node list{
    data{-1},
    node{
        data{0},
        node{
            data{1},
            node{
                data{42},
                list
            }
        }
    }
};

It’s hard to use node for manipulating lists

Unfortunately this definition of node makes every mutable operation on a list harder or impossible.

  • next is const, thus we can only create immutable data structures

  • next is a reference, making it impossible to change where it points to

which is a strong indicator that our node is useful for creating only immutable structures with immutable data (unless this is not const-correct). This is a big limitation for code reusability.

It’s not possible to define constructors for node

The reason is that otherwise there would be a dangling reference.

Consider the following snippet:

constexpr const int& i = 42;
constexpr const int j = std::max(0,42);
constexpr const int& k = std::max(0,42);

The first line is safe because of lifetime extension and thus compiles. The second line compiles as the value is stored in the variable j. The third line does not compile. The reason is that there is no lifetime extension, thus k would be dangling, and using it would be UB.

Thus changing node to something like

struct data{ int i; };

class node{
    data d;
    const node& next;
    public:
    constexpr node(data d_, const node& next_) : d{d_}, next{next_} {}

    // and other useful functions
};

constexpr const node Null{data{0}, Null};

constexpr const node list{
    data{-1},
    node{
        data{0},
        Null
    }
};

// or
constexpr const node list(
    data{-1},
    node(
        data{0},
        Null
    )
);

is not possible, as the node constructor introduces a new scope, and assigns next to a temporary. As the lifetime extension does not kick anymore in, it creates a dangling reference. This is unfortunate, it makes for example impossible to add a default constructor that creates a Null node, or a node with some default data.

The issue cannot be sidestepped by creating constexpr factory functions unless it’s for creating the Null node and the last element of the list.

This is one area where currently a function-like macro cannot be substituted by C++ constructs, as it does not (necessarily) introduce a new scope.

It is (currently) not possible to initialize the data with =

Unless not using Clang. GCC (from version 7.x) and MSVC can compile the following snippet without issues (and the code is officially standard-compliant since C++17 because of RVO)

struct data{ int i; };

struct node{
    data d;
    const node& next;
};

constexpr const node Null = node{data{0}, Null}; // here it's OK with all compilers

constexpr const node list = node{ // clang complains
    data{-1},
    node{
        data{0},
        Null
    }
};

Common sublist

While such limitation might make it dubious to create such structures instead of relying on arrays, it has (to my knowledge), one advantage: two or more different lists can have a common tail, making it possible to manually reduce the binary size even further (compared to a dynamically initialized list):

struct data{ int i; };

struct node{
    data d;
    const node& next;
};

constexpr cost node Null{data{0}, Null};

constexpr const node common{
    data{1},
    node{
        data{42},
        Null
    }
};

constexpr const node list1{
    data{-1},
    node{
        data{0},
        common
    }
};

constexpr const node list2{
    data{-2},
    node{
        data{3},
        common
    }
};

With arrays, this is only possible if they have a contiguous intersection. In those cases, it is possible to define only one array and two or more std::span for accessing the data.

Use an array

As handling arrays is easier, a better approach might be to try to replace the list with an array. It is eventually possible to provide a list-like API, like for ct_map the API mimics the one of std::map, even if the underlying data structure is an array.

This also assumes that the data structure has a limited size. Otherwise, assumptions about the data structure, like constant access time, which do not hold, might have considerable effects. Even if when talking about constants, it is expected for them not to be too big, and thus arrays (eventually sorted) to be the best structure.

Another positive side-effect is that it reduces further the binary size, and iterating through elements is generally faster because the elements are probably in the CPU cache.

While it is obviously not possible to represent a circular list as array, it is possible to use an array, and a user-defined iterator to get the desired behavior.


Do you want to share your opinion? Or is there an error, same parts that are not clear enough?

You can contact me here.