constexpr linked list
- A barebone implementation
- Accessing elements at compile-time is generally (currently) not possible
- Doubly linked list
- Circular linked list
- It’s hard to use
node
for manipulating lists - It’s not possible to define constructors for
node
- It is (currently) not possible to initialize the data with
=
- Common sublist
- Use an array
It is trivial to declare strings and arrays as true constants (data that does not require any runtime processing), and it is a useful technique for improving correctness, binary size, and performance.
Even more, complex data structures, like a map, set, or list, can be "mapped" to an array with a wrapper class that provides the desired API.
This time I wanted to experiment with a data structure without mapping it immediately to an array.
The probably most simple data structure is a linked list; a linear collection of elements.
A barebone implementation
Contrary to arrays, the order of elements is not given by the physical placement in memory, but each element points to the next until the last.
A barebone implementation, where each element contains some data
(for simplicity let’s assume it’s just an int
) would look similar too
struct data{ int i; };
struct node{
data d;
node* next;
};
next
would point to the next element, unless it is nullptr
.
While this structure, as long as data
is constexpr
-friendly, is also constexpr
-friendly, it is not yet possible to create a list at compile-time
struct data{ int i; };
struct node{
data d;
node* next;
};
constexpr const node n3{
data{42}, nullptr
};
constexpr const node n2{
data{-1}, &n3 // error: invalid conversion from 'const node*' to 'node*' [-fpermissive]
};
GCC triggers an error with invalid conversion from 'const node*' to 'node*' [-fpermissive]
, clang error: cannot initialize a member subobject of type 'node *' with an rvalue of type 'const node *'
and MSVC error C2440: 'initializing': cannot convert from 'const node *' to 'node *'
.
Notice that this has nothing to do with constexpr
, as
const node n3{
data{42}, nullptr
};
const node n2{
data{-1}, &n3 // error: invalid conversion from 'const node*' to 'node*' [-fpermissive]
};
yields exactly the same error.
To fix this error it is sufficient to change the declaration of next
to be a const node*
.
Now it is possible to create such a structure at compile time:
struct data{ int i; };
struct node{
data d;
const node* next;
};
constexpr const node n3{
data{42}, nullptr
};
constexpr const node n2{
data{1}, &n3
};
constexpr const node n1{
data{0}, &n2
};
constexpr const node n0{
data{-1}, &n1
};
The assembly output is
n3:
.long 42
.zero 4
.quad 0
n2:
.long 1
.zero 4
.quad n3
n1:
.long 0
.zero 4
.quad n2
n0:
.long -1
.zero 4
.quad n1
While it works, it is impractical to declare longer lists, as every node needs to be declared separately, even if we are not interested in those single variables, except for initializing n0
.
Writing something like
struct data{ int i; };
struct node{
data d;
const node* next;
};
constexpr const node list{
data{-1},
node{
data{0},
&node{
data{1},
&node{
data{42}
nullptr
}
}
}
};
would be much less error-prone and readable. Unfortunately, it is invalid and does not compile, the compiler diagnostic is error: taking address of rvalue [-fpermissive]
(on GCC) and error: taking the address of a temporary object of type 'node' [-Waddress-of-temporary]
(on clang).
Using references instead of pointers would make it possible to declare the whole list at once, but it makes apparently impossible to have the last element since every element needs to point to a valid object.
It is fortunately possible (and incredibly easy) to create a self-reference. Thus
-
by changing the detection of the last element
-
by creating some dummy data
it is possible to use a sentinel value instead of nullptr
for declaring the end of the list.
Now it’s also possible to create lists with a single expression:
struct data{ int i; };
struct node{
data d;
const node& next;
};
constexpr const node Null{data{0}, Null};
constexpr const node list{
data{-1},
node{
data{0},
node{
data{1},
node{
data{42},
Null
}
}
}
};
Accessing elements at compile-time is generally (currently) not possible
constexpr auto n1 = n0.next;
fails with error: the value of '<temporary>' is not usable in a constant expression
with GCC and with something similar with clang (unless using clang >=12), and produces an ICE with MSVC (or a diagnostic with some variations).
Doubly linked list
node
is used for defining a singly linked list, it does not seem possible to me to create a doubly linked list in one go.
Circular linked list
A list is circularly linked when the last node has the next node pointing at the beginning of the list. With the current definition of node, it is also possible to declare such a list, similarly to how the Null
value is defined. In this case, we do not even need a sentinel value.
struct data{ int i; };
struct node{
data d;
const node& next;
};
constexpr const node list{
data{-1},
node{
data{0},
node{
data{1},
node{
data{42},
list
}
}
}
};
It’s hard to use node
for manipulating lists
Unfortunately this definition of node
makes every mutable operation on a list harder or impossible.
-
next
isconst
, thus we can only create immutable data structures -
next
is a reference, making it impossible to change where it points to
which is a strong indicator that our node
is useful for creating only immutable structures with immutable data (unless this is not const
-correct). This is a big limitation for code reusability.
It’s not possible to define constructors for node
The reason is that otherwise there would be a dangling reference.
Consider the following snippet:
constexpr const int& i = 42;
constexpr const int j = std::max(0,42);
constexpr const int& k = std::max(0,42);
The first line is safe because of lifetime extension and thus compiles. The second line compiles as the value is stored in the variable j
. The third line does not compile. The reason is that there is no lifetime extension, thus k
would be dangling, and using it would be UB.
Thus changing node
to something like
struct data{ int i; };
class node{
data d;
const node& next;
public:
constexpr node(data d_, const node& next_) : d{d_}, next{next_} {}
// and other useful functions
};
constexpr const node Null{data{0}, Null};
constexpr const node list{
data{-1},
node{
data{0},
Null
}
};
// or
constexpr const node list(
data{-1},
node(
data{0},
Null
)
);
is not possible, as the node
constructor introduces a new scope, and assigns next
to a temporary. As the lifetime extension does not kick anymore in, it creates a dangling reference. This is unfortunate, it makes, for example, impossible to add a default constructor that creates a Null
node
, or a node
with some default data
.
The issue cannot be sidestepped by creating constexpr
factory functions unless it’s for creating the Null
node and the last element of the list.
This is one area where currently a function-like macro cannot be substituted by C++ constructs, as it does not (necessarily) introduce a new scope.
It is (currently) not possible to initialize the data with =
Unless ignoring Clang. GCC (from version 7.x) and MSVC can compile the following snippet without issues (and the code is officially standard-compliant since C++17 because of RVO)
struct data{ int i; };
struct node{
data d;
const node& next;
};
constexpr const node Null = node{data{0}, Null}; // here it's OK with all compilers
constexpr const node list = node{ // clang complains
data{-1},
node{
data{0},
Null
}
};
Common sublist
While such limitation might make it dubious to create such structures instead of relying on arrays, it has (to my knowledge), one advantage: two or more different lists can have a common tail, making it possible to manually reduce the binary size even further (compared to a dynamically initialized list):
struct data{ int i; };
struct node{
data d;
const node& next;
};
constexpr cost node Null{data{0}, Null};
constexpr const node common{
data{1},
node{
data{42},
Null
}
};
constexpr const node list1{
data{-1},
node{
data{0},
common
}
};
constexpr const node list2{
data{-2},
node{
data{3},
common
}
};
With arrays, this is only possible if they have a contiguous intersection. In those cases, it is possible to define only one array and two or more std::span
for accessing the data.
Use an array
As handling arrays is easier, a better approach might be to try to replace the list with an array. It is eventually possible to provide a list-like API, like for ct_map the API mimics the one of std::map
, even if the underlying data structure is an array.
This also assumes that the data structure has a limited size. Otherwise, assumptions about the data structure, like constant access time, which do not hold, might have considerable effects. Even when talking about constants, it is expected for them not to be too big, and thus arrays (eventually sorted) to be the best structure.
Another positive side-effect is that it reduces further the binary size, and iterating through elements is generally faster because the elements are probably in the CPU cache.
While it is obviously not possible to represent a circular list as an array, it is possible to use an array, and a user-defined iterator (and operator[]
) to get the desired behavior.
Do you want to share your opinion? Or is there an error, some parts that are not clear enough?
You can contact me anytime.