Bit fields in C and C++
Bit fields are an obscure and underspecified feature of both C and C++, and I have almost never had to deal with them.
They are not complicated per se, but both language standards are extremely vague about them.
Note 📝: For simplicity, I’m assuming CHAR_BIT == 8.
Limitations on bitfields
A bit field is not an object; it is not possible to create a local (or static or global) bit field variable:
struct bitfield{
unsigned char v : 2 = 0;
};
void foo(){
auto bf = bitfield{}; // compiles
unsigned char v : 2; // fails to compile
}
Similarly, it is not possible to take a pointer or bind a non-const reference to a bit field, and a function parameter cannot be declared as a bit field (whether by value, reference, or pointer):
#include <type_traits>
struct bitfield{
unsigned char v : 2 = 0;
};
template <class T>
void sink1(T&&){}
template <class T>
void sink2(const T&){}
void foo(){
auto bf = bitfield{};
auto* ptr = &bf.v; // fails to compile
auto& ref = bf.v; // fails to compile
sink1(bf.v); // fails to compile
auto v1 = bf.v; // compiles, but v1 has the underlying type
static_assert(std::is_same_v<unsigned char, decltype(v1)>);
decltype(auto) v2 = bf.v; // compiles, but v2 has the underlying type
static_assert(std::is_same_v<unsigned char, decltype(v2)>);
const auto& cref = bf.v; // compiles, but cref refers to a temporary of the underlying type
static_assert(std::is_same_v<const unsigned char&, decltype(cref)>);
sink2(bf.v); // compiles, but sink2 instantiated with T = unsigned char
}
Given those limitations, you cannot pass bit fields around by reference, which makes them yet another use case for a macro instead of a function!
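As a small illustration of that macro workaround, here is a minimal sketch. FLIP_BITS and flip_demo are hypothetical names introduced only for this example; they are not part of any library.

```cpp
struct bitfield {
    unsigned char v : 2;
};

// Hypothetical helper: inverts every bit of a bit-field member in place.
// A function taking `T&` could not do this, because a non-const reference
// cannot bind to a bit field.
#define FLIP_BITS(member) ((member) = ~(member))

inline unsigned flip_demo() {
    bitfield bf{};   // bf.v == 0
    FLIP_BITS(bf.v); // ~0 is -1 as an int; storing it keeps the low 2 bits
    return bf.v;     // 3, the maximum of a 2-bit unsigned field
}
```

The macro works because the member access expression is pasted textually at the point of use, so no reference to the bit field is ever formed.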
There are further restrictions; for example, the sizeof operator cannot be used on a bit field:
struct bitfield{
unsigned char v1 : 2 = 0;
unsigned char v2 = 0;
};
void foo(){
auto bf = bitfield{};
sizeof bf.v1; // fails to compile
sizeof bf.v2; // compiles
}
decltype works, but reports the underlying type:
#include <type_traits>
struct bitfield{
unsigned char v1 : 2 = 0;
unsigned char v2 = 0;
};
static_assert(std::is_same_v<unsigned char, decltype(bitfield::v1)>);
Most of these restrictions exist because the smallest objects, for example char and unsigned char, have a sizeof equal to one.
And yet a char is not "atomic"; it consists of CHAR_BIT bits, and a bit field can be smaller than that.
Creating a const reference works because it creates a temporary object, whose lifetime is extended until the reference is destroyed:
The lifetime of a temporary object may be extended by binding to a reference, see reference initialization for details.
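The consequence of binding to a temporary can be made visible with a short sketch: the reference keeps the value the bit field had when the reference was created. The function name const_ref_is_a_copy is hypothetical.

```cpp
struct bitfield {
    unsigned char v : 2;
};

inline bool const_ref_is_a_copy() {
    bitfield bf{};
    bf.v = 1;
    const unsigned char& ref = bf.v; // binds to a temporary copy, not to bf.v
    bf.v = 2;                        // changes the bit field only
    return ref == 1 && bf.v == 2;    // the reference still sees the old value
}
```

In other words, a const reference to a bit field behaves like a snapshot, which is easy to overlook when a function takes its argument by const reference.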
How to get the "size" of a bitfield
This is the easiest approach I’ve found; unfortunately it requires a macro:
#include <bit>
struct bitfield1{
unsigned int b : 15 = 1;
};
struct bitfield2{
unsigned char c1 : 1 = ~0;
unsigned short c2 : 3;
};
#define BSIZE(s, v) \
[]() consteval {\
s bf = { .v = 0 }; \
bf.v = ~bf.v; \
return std::popcount(bf.v); \
}()
static_assert(BSIZE(bitfield1, b) == 15);
static_assert(BSIZE(bitfield2, c1) == 1);
static_assert(BSIZE(bitfield2, c2) == 3);
One could write s bf = { .v = ~0 };, but at least GCC and Clang diagnose a narrowing conversion there, which leads to a compilation error, even though unsigned char c1 : 1 = ~0; in struct bitfield2 does not produce any warning. On the other hand, bf.v = ~bf.v; does not produce any warning, but it only works if bf.v was set to 0 beforehand, which is what { .v = 0 } ensures.
A pity, as otherwise the macro could have been shortened to a single function call:
#define BSIZE(s, v) std::popcount( s{ .v = ~0 }.v )
Writing something like the following macro
#define BSIZE(s, v) \
[]() consteval {\
s bf{ .v = 0 };\
return std::popcount(~bf.v);\
}()
would also be wrong, because ~bf.v is promoted to int rather than keeping the width of the bit field, and thus has, in general, more bits set than the maximum value of bf.v can have. The code also does not compile, as std::popcount has no overloads for signed types such as int.
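The promotion can be checked directly. The following sketch assumes a platform where int is wider than 15 bits, and a compiler that applies the integral promotion rules for bit fields the way GCC and Clang do; complement_is_int is a hypothetical name.

```cpp
#include <type_traits>

struct bitfield1 {
    unsigned int b : 15;
};

inline bool complement_is_int() {
    bitfield1 bf{}; // bf.b == 0
    // Every value of a 15-bit unsigned field fits in an int, so the operand
    // of ~ is promoted to int before the complement is computed.
    static_assert(std::is_same<decltype(~bf.b), int>::value,
                  "~bf.b is expected to have type int");
    // ~0 as an int is -1: all bits of the int are set, not only 15 of them.
    return ~bf.b == -1;
}
```

This is why the complement has to be stored back into the bit field before counting bits: the assignment truncates the value to the width of the field.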
Because std::popcount rejects signed types, the proposed BSIZE also fails to compile for the following structure:
struct bitfield{
int v : 3; // not unsigned
};
I’m not sure how this should be handled, other than by avoiding bit fields of signed types.
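One conceivable workaround, offered only as a sketch, is to avoid std::popcount altogether and count how many left shifts it takes until a 1 falls out of the field. BWIDTH and the two structures below are hypothetical names. The sketch relies on out-of-range values being truncated when stored into a signed bit field, which is guaranteed since C++20 and implementation-defined before that.

```cpp
struct signed_field {
    int v : 3; // signed: std::popcount would reject it
};
struct unsigned_field {
    unsigned char c : 2;
};

// Hypothetical alternative to BSIZE: start from 1 and shift left until the
// value no longer fits and the field reads back as 0; the number of shifts
// is the width. The shift itself is done on an unsigned type, so no signed
// overflow occurs; truncation happens when storing back into the field.
#define BWIDTH(s, v)                                           \
    []() -> int {                                              \
        s bf{};                                                \
        bf.v = 1;                                              \
        int n = 0;                                             \
        while (bf.v != 0) {                                    \
            bf.v = static_cast<unsigned long long>(bf.v) << 1; \
            ++n;                                               \
        }                                                      \
        return n;                                              \
    }()

inline int width_of_signed() { return BWIDTH(signed_field, v); }
inline int width_of_unsigned() { return BWIDTH(unsigned_field, c); }
```

For the 3-bit signed field the stored values are 1, 2, -4 and finally 0, so the loop runs three times. The loop makes this variant more verbose than the popcount version, and I would still treat signed bit fields with suspicion.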
How to initialize a bitfield structure
Given the following structure
struct data_t {
// b0
unsigned char v1 : 2;
unsigned char v2 : 1;
unsigned char v3 : 1;
unsigned char v4 : 4;
// b1
unsigned char v5 : 2;
unsigned char padding : 6;
};
What is the "best" way to initialize it from a buffer of data?
There are two approaches: copy the buffer over data_t, or extract the bits one by one:
data_t to_data_t1(std::span<const unsigned char, sizeof(data_t)> s) {
data_t t = {};
std::memcpy(&t, s.data(), s.size());
t.padding = {};
return t;
}
data_t to_data_t2(std::span<const unsigned char, sizeof(data_t)> s) {
data_t t = {};
unsigned char b0 = s[0];
t.v1 = (b0 ) & 0x03; // bits 0-1
t.v2 = (b0 >> 2) & 0x01; // bit 2
t.v3 = (b0 >> 3) & 0x01; // bit 3
t.v4 = (b0 >> 4) & 0x0F; // bits 4-7
unsigned char b1 = s[1];
t.v5 = (b1 ) & 0x03; // bits 0-1
return t;
}
I find the first function much easier to read and maintain.
If the layout of data_t changes, the code in to_data_t1 does not need to change.
In the case of to_data_t2, one needs to update the function; verifying its correctness is also less trivial: have all bytes been shifted and masked correctly?
A further advantage of to_data_t1 is that, as of today, it generates better code than to_data_t2 on all compilers I have tested.
Thus less code, fewer chances to introduce accidental errors, and better performance/smaller binaries.
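Since the std::memcpy approach silently depends on how the compiler lays out data_t, it may be worth guarding it with compile-time checks. The sketch below is one way to do that; from_bytes is a hypothetical variant of to_data_t1 that takes a plain array so that the example stays self-contained.

```cpp
#include <cstring>
#include <type_traits>

struct data_t {
    // b0
    unsigned char v1 : 2;
    unsigned char v2 : 1;
    unsigned char v3 : 1;
    unsigned char v4 : 4;
    // b1
    unsigned char v5 : 2;
    unsigned char padding : 6;
};

// Fail the build if the compiler lays the structure out differently from
// the two-byte format assumed here.
static_assert(sizeof(data_t) == 2, "data_t must occupy exactly two bytes");
static_assert(std::is_trivially_copyable<data_t>::value,
              "std::memcpy requires a trivially copyable type");

// Hypothetical variant of to_data_t1 taking a plain array reference.
inline data_t from_bytes(const unsigned char (&raw)[2]) {
    data_t t = {};
    std::memcpy(&t, raw, sizeof t);
    t.padding = 0; // discard whatever the buffer held in the unused bits
    return t;
}
```

These checks catch a change in size, but not a change in the order of the bits inside a byte, so they complement rather than replace a test with known input data.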
to_data_t2 has two advantages:
- it is portable between big- and little-endian systems
- it can be made constexpr
The portability is "only" relevant if the data is exchanged between different systems.
One can achieve a better compatibility by using different data structures depending on the endianness of the system:
struct data_t_little_endian { // do not use directly, use data_t
// b0
unsigned char v1 : 2;
unsigned char v2 : 1;
unsigned char v3 : 1;
unsigned char v4 : 4;
// b1
unsigned char v5 : 2;
unsigned char padding : 6;
};
struct data_t_big_endian { // do not use directly, use data_t
// b0
unsigned char v4 : 4;
unsigned char v3 : 1;
unsigned char v2 : 1;
unsigned char v1 : 2;
// b1
unsigned char padding : 6;
unsigned char v5 : 2;
};
using data_t = std::conditional<std::endian::native == std::endian::little,
data_t_little_endian, data_t_big_endian>::type;
data_t to_data_t1(std::span<const unsigned char, sizeof(data_t)> s) {
data_t t = {};
std::memcpy(&t, s.data(), s.size());
t.padding = {};
return t;
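}
This approach, however, requires declaring the same data structure twice, which increases the chance of introducing a bug that can be observed only on some platforms. It also assumes that the compiler allocates bit fields in the opposite order on big-endian targets, which is implementation-defined.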
How to convert a bitfield structure to an array of bytes
Just as a buffer is converted to data_t, one might also need to do the conversion the other way round.
There are multiple approaches; the standard library and the language offer several ways to achieve the same result:
std::span<const unsigned char, sizeof(data_t)> to_buffer_t1a(const data_t& d) {
return std::span<const unsigned char, sizeof(data_t)>(reinterpret_cast<const
unsigned char*>(&d), sizeof(data_t));
}
std::span<const unsigned char> to_buffer_t1b(const data_t& d) {
return std::span<const unsigned char>(reinterpret_cast<const unsigned
char*>(&d), sizeof(data_t));
}
std::span<const std::byte> to_buffer_t2(const data_t& d) {
return std::as_bytes(std::span{&d, 1});
}
std::array<unsigned char, sizeof(data_t)> to_buffer_t3(const data_t& d) {
return std::bit_cast<std::array<unsigned char, sizeof(data_t)>>(d);
}
std::array<unsigned char, sizeof(data_t)> to_buffer_t4(const data_t& d) {
std::array<unsigned char, sizeof(data_t)> tmp; // filled by memcpy below
std::memcpy(tmp.data(), &d, sizeof(data_t));
return tmp;
}
std::array<unsigned char, sizeof(data_t)> to_buffer_t5(const data_t& d) {
std::array<unsigned char, 2> out = {};
out[0] = ((d.v1 & 0x3) << 0) |
((d.v2 & 0x1) << 2) |
((d.v3 & 0x1) << 3) |
((d.v4 & 0xF) << 4);
out[1] = ((d.v5 & 0x3) << 0);
return out;
}
Note that only to_buffer_t3 (std::bit_cast) and to_buffer_t5 (manual packing) can be made constexpr.
All versions except for to_buffer_t5 have the same portability issue, and to_buffer_t5 is also the only method that naturally sets the padding to 0.
In contrast to parsing a buffer, GCC generates equivalent code on a little-endian machine:
to_buffer_t1a(data_t_little_endian const&):
mov rax, rdi
ret
to_buffer_t1b(data_t_little_endian const&):
mov rax, rdi
mov edx, 2
ret
to_buffer_t2(data_t_little_endian const&):
mov rax, rdi
mov edx, 2
ret
to_buffer_t3(data_t_little_endian const&):
movzx eax, word ptr [rdi]
ret
to_buffer_t4(data_t_little_endian const&):
movzx eax, word ptr [rdi]
ret
to_buffer_t5(data_t_little_endian const&):
movzx eax, word ptr [rdi]
and eax, 1023
ret
The version using std::as_bytes is also slightly less ergonomic, because comparing a std::byte with an unsigned char requires a cast; there is no predefined comparison operator.
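The cast in question can be written with std::to_integer from <cstddef> (C++17). A minimal sketch, where same_value is a hypothetical helper name:

```cpp
#include <cstddef>

// Hypothetical helper: compares a std::byte with an unsigned char.
// std::byte has no arithmetic or mixed comparison operators, so the value
// has to be converted explicitly first.
inline bool same_value(std::byte b, unsigned char c) {
    return std::to_integer<unsigned char>(b) == c;
}
```

A static_cast<unsigned char>(b) would work as well; std::to_integer only makes the intent more explicit.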
Testing on machines with different endianness
#include <bit>
#include <cstdio>
int main() {
constexpr int w = sizeof(void*) * 8;
constexpr const char* b = std::endian::native == std::endian::little ? "little" :
std::endian::native == std::endian::big ? "big" :
"?";
std::printf("%d-bit, %s endian\n", w, b);
}
On my machine, I get the following output:
64-bit, little endian
The easiest way to test on a big-endian machine is to cross-compile the binary and run it in an emulator; I’m going to use Quick Emulator (QEMU):
sudo apt install --install-recommends qemu-user g++-powerpc-linux-gnu
powerpc-linux-gnu-g++ --std=c++20 -static main.cpp
qemu-powerpc a.out
If the system is configured correctly, it is possible to execute ./a.out directly instead of invoking qemu-powerpc manually:
powerpc-linux-gnu-g++ --std=c++20 -static main.cpp
./a.out # no need to use qemu-powerpc directly
In this case, the output looks like the following:
32-bit, big endian
A higher level abstraction
With a couple of helper functions, the bit shifting and masking can be abstracted away:
#define BSIZE(s, v) \
[]() consteval { \
s bf = { .v = 0 }; \
bf.v = ~bf.v; \
return std::popcount(bf.v); \
}()
template<unsigned bit_offset, unsigned bit_width>
constexpr unsigned char extract_bits(std::span<const unsigned char> buf) {
static_assert( bit_offset/8 == (bit_offset+bit_width-1)/8 );
unsigned byte_index = bit_offset / 8;
unsigned bit_index = bit_offset % 8;
return (buf[byte_index]>>bit_index) & ((1u << bit_width) - 1);
}
template<unsigned bit_offset, unsigned bit_width>
constexpr void insert_bits(std::span<unsigned char> buf, unsigned char value)
{
static_assert( bit_offset/8 == (bit_offset+bit_width-1)/8 );
unsigned byte_index = bit_offset / 8;
unsigned bit_index = bit_offset % 8;
unsigned mask = ((1u << bit_width) - 1) << bit_index;
buf[byte_index] &= ~mask;
buf[byte_index] |= (value << bit_index) ;
}
data_t decode(std::span<const unsigned char, sizeof(data_t)> buf) {
constexpr auto bs_v1 = BSIZE(data_t, v1);
constexpr auto bs_v2 = BSIZE(data_t, v2);
constexpr auto bs_v3 = BSIZE(data_t, v3);
constexpr auto bs_v4 = BSIZE(data_t, v4);
constexpr auto bs_v5 = BSIZE(data_t, v5);
constexpr auto bs_vp = BSIZE(data_t, padding);
auto bf = data_t{};
bf.v1 = extract_bits<0, bs_v1>(buf);
bf.v2 = extract_bits<0+bs_v1, bs_v2>(buf);
bf.v3 = extract_bits<0+bs_v1+bs_v2, bs_v3>(buf);
bf.v4 = extract_bits<0+bs_v1+bs_v2+bs_v3, bs_v4>(buf);
bf.v5 = extract_bits<0+bs_v1+bs_v2+bs_v3+bs_v4, bs_v5>(buf);
bf.padding = extract_bits<0+bs_v1+bs_v2+bs_v3+bs_v4+bs_v5, bs_vp>(buf);
// or always set padding to 0
bf.padding = 0;
return bf;
}
std::array<unsigned char, sizeof(data_t)> encode(const data_t& bf) {
constexpr auto bs_v1 = BSIZE(data_t, v1);
constexpr auto bs_v2 = BSIZE(data_t, v2);
constexpr auto bs_v3 = BSIZE(data_t, v3);
constexpr auto bs_v4 = BSIZE(data_t, v4);
constexpr auto bs_v5 = BSIZE(data_t, v5);
constexpr auto bs_vp = BSIZE(data_t, padding);
std::array<unsigned char, sizeof(data_t)> buf = {};
insert_bits<0, bs_v1>(buf, bf.v1);
insert_bits<0+bs_v1, bs_v2>(buf, bf.v2);
insert_bits<0+bs_v1+bs_v2, bs_v3>(buf, bf.v3);
insert_bits<0+bs_v1+bs_v2+bs_v3, bs_v4>(buf, bf.v4);
insert_bits<0+bs_v1+bs_v2+bs_v3+bs_v4, bs_v5>(buf, bf.v5);
insert_bits<0+bs_v1+bs_v2+bs_v3+bs_v4+bs_v5, bs_vp>(buf, bf.padding);
// or always set padding to 0
insert_bits<0+bs_v1+bs_v2+bs_v3+bs_v4+bs_v5, bs_vp>(buf, 0);
return buf;
}
In case you want to support bit fields that span multiple bytes, a possible approach is to change the static_assert to an if constexpr, and then implement the required logic.
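To give an idea of what the multi-byte logic could look like, here is a deliberately simple sketch that works at runtime and reads one bit at a time, rather than extending the templates above. The name extract_bits_wide is hypothetical. It assumes the same bit numbering as extract_bits (bit 0 is the least significant bit of the first byte) and a width of at most 32 bits.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reads bit_width bits (at most 32) starting at
// bit_offset, where bit 0 is the least significant bit of buf[0].
// Bits that would lie past the end of the buffer are read as 0.
inline std::uint32_t extract_bits_wide(const unsigned char* buf, std::size_t len,
                                       unsigned bit_offset, unsigned bit_width) {
    std::uint32_t result = 0;
    for (unsigned i = 0; i < bit_width && i < 32; ++i) {
        const unsigned pos = bit_offset + i;
        const std::size_t byte_index = pos / 8;
        if (byte_index >= len) {
            break; // ran off the end of the buffer
        }
        const std::uint32_t bit = (buf[byte_index] >> (pos % 8)) & 1u;
        result |= bit << i; // bit i of the result comes from bit pos of the buffer
    }
    return result;
}
```

For the buffer {0xF0, 0x0F}, reading 8 bits at offset 4 combines the upper half of the first byte with the lower half of the second one and yields 0xFF. A production version would more likely process whole bytes at a time, but the bit-by-bit loop is easier to verify by inspection.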
Also note that
unsigned mask = ((1u << bit_width) - 1) << bit_index;
buf[byte_index] &= ~mask;
buf[byte_index] |= (value << bit_index) ;
can be simplified to
buf[byte_index] |= (value & ((1u << bit_width) - 1)) << bit_index;
but only if the caller zero-initializes the buffer before the first call to insert_bits.
Conclusion
I suppose that the safe approach is not to use std::memcpy, reinterpret_cast, std::bit_cast, and/or std::as_bytes.
One reason is that none of them, except for std::bit_cast, can be made constexpr.
The second reason is that it avoids all possible issues related to layout, packing and padding of the structure, which are also underspecified both in C and C++.
The third reason is that if the data structure is shared between different machine types, one has to remember to use std::conditional with std::endian::native in order to use the correct structure. It needs to be done only once, but one needs to realize it.
In a bigger project, it is probably easier to ban the usage of std::memcpy, reinterpret_cast, std::bit_cast, and std::as_bytes in most places, and manually parse the buffer to the data structure and vice-versa.
On the other hand, a higher level abstraction is still problematic; even ignoring some potential performance and size penalties, it has a lot of boilerplate compared to the lower-level but endian-sensitive approaches that just copy or reinterpret the data.
Without a higher level abstraction, when "fiddling" with the bits directly, the main drawback is the chance of introducing bugs by writing a wrong constant while shifting.
This can be avoided by:
- using a code generator that parses the structure and creates the appropriate conversion functions
- writing the code in a way that is easy to parse and verify for correctness with an external script
Although adding an external tool to the build process always adds complexity.
It is trivial to write an exhaustive test that parses some array to a structure and back (just toggle all possible bits, they are normally not that many), but it would only verify that the conversion routines do not lose any information, not that the intermediate result, the structure, is correct.
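Such an exhaustive round-trip test could be sketched as follows. The functions parse and serialize are hypothetical shift-and-mask conversions written only for this example. The caveat above applies in full: if v2 and v3 were swapped consistently in both functions, this test would still pass.

```cpp
struct data_t {
    // b0
    unsigned char v1 : 2;
    unsigned char v2 : 1;
    unsigned char v3 : 1;
    unsigned char v4 : 4;
    // b1
    unsigned char v5 : 2;
    unsigned char padding : 6;
};

// Hypothetical shift-and-mask parser; padding is always left at 0.
inline data_t parse(const unsigned char (&b)[2]) {
    data_t t = {};
    t.v1 = b[0] & 0x03;
    t.v2 = (b[0] >> 2) & 0x01;
    t.v3 = (b[0] >> 3) & 0x01;
    t.v4 = (b[0] >> 4) & 0x0F;
    t.v5 = b[1] & 0x03;
    return t;
}

// Hypothetical inverse of parse.
inline void serialize(const data_t& d, unsigned char (&out)[2]) {
    out[0] = static_cast<unsigned char>(d.v1 | (d.v2 << 2) | (d.v3 << 3) | (d.v4 << 4));
    out[1] = static_cast<unsigned char>(d.v5);
}

// Tries all 65536 possible two-byte inputs.
inline bool round_trip_all_inputs() {
    for (unsigned x = 0; x <= 0xFFFFu; ++x) {
        const unsigned char in[2] = {static_cast<unsigned char>(x & 0xFF),
                                     static_cast<unsigned char>(x >> 8)};
        unsigned char out[2] = {};
        serialize(parse(in), out);
        // The six padding bits of the second byte are dropped by design.
        if (out[0] != in[0] || out[1] != (in[1] & 0x03)) {
            return false;
        }
    }
    return true;
}
```

To also verify the intermediate structure, the test would need a handful of hand-written expectations, for example that the input {0x04, 0x00} sets v2 and nothing else.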
If you have questions or comments, found typos, find the notes unclear, or spot some errors, just contact me.