The C++ logo, by Jeremy Kratz, licensed under CC0 1.0 Universal

Magic numbers for runtime checks

Notes published the
Categories: c c++
Keywords: c c++ cast global variable type erasure undefined behavior validation

Where to use magic numbers

Note 📝
"Magic numbers", in this context, are not unnamed constants.

One error-prone way to achieve polymorphism in C and C++ is to use void*.

The most common use case I have found is a library that offers an interface for registering a callback function.

Suppose that the library of a fictitious company, ACME, offers an interface like

typedef int callback(int);
void acme_register_callback(callback* f);

For this example, void* is not necessary, but what if ACME wants to give the user the possibility to access some user-provided data?

A good interface would not force the end-user to resort to global variables.

The only way is then to pass this data as an additional parameter in the callback function, but the author of acme_register_callback has no way to know what type of data the implementer of the callback function wants to access.

One could argue that templates would be the way to go. Unfortunately, they do not work well if ACME wants to provide a precompiled library.

Class hierarchies and dynamic_cast could also help but are still problematic.

ACME might also prefer to provide only a C interface, to ensure a stable ABI and to make the library easier to use from programming languages other than C++.

So we are currently stuck with the following signatures

typedef int callback(int, void* data);
void acme_register_callback(callback* f);
void acme_set_user_data(void*);
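The article does not show how ACME implements these functions. As a minimal sketch, assuming the library simply stores both pointers and hands the user data back untouched, it could look like the following. acme_trigger and count_calls are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Hypothetical sketch of the library side; not ACME's actual implementation.
typedef int callback(int, void* data);

namespace {
callback* registered_callback = nullptr; // set by acme_register_callback
void* registered_user_data = nullptr;    // set by acme_set_user_data
}

void acme_register_callback(callback* f) { registered_callback = f; }
void acme_set_user_data(void* data) { registered_user_data = data; }

// Hypothetical entry point: the library never looks inside the pointer,
// it only passes it back to the callback.
int acme_trigger(int value) {
    assert(registered_callback != nullptr);
    return registered_callback(value, registered_user_data);
}

// A callback whose author knows that the user data is an int counter.
int count_calls(int value, void* data) {
    auto* counter = reinterpret_cast<int*>(data);
    ++*counter;
    return value * 2;
}
```

The library only transports the address; all knowledge about the pointed-to type lives in the callback.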

Using those functions is easy:


#include <cassert>
#include <string>
// assumes the ACME declarations above are visible

struct data_for_callback{
    std::string x = "hello";
    short y = 42;
    int z = -1;
};

int foo(int, void* data){
    assert(data != nullptr);
    const auto d = reinterpret_cast<data_for_callback*>(data);
    // do something with the data
    (void)d; // silences the unused-variable warning in this sketch
    return 0;
}

int main(){
    data_for_callback d;
    acme_register_callback(foo);
    acme_set_user_data(&d);
    // trigger call to callback
}

Note that I am using reinterpret_cast, even if static_cast would do exactly the same thing. This cast is a dangerous operation, no matter how it is spelled, as it bypasses the type system. reinterpret_cast denotes the intent more clearly than a static_cast or a C-style cast.

If void* data is not pointing to some data_for_callback, then the current implementation of foo has undefined behavior.

And there is no standard technique for detecting it reliably.

void* does not carry any metadata, it is just an address, so it is not possible to automatically detect such types of errors.⁠[1]

Sanitizers and compiler flags can help in some cases (see the footnote), but it is also possible to implement by hand some checks that help to diagnose errors: magic numbers.

Note 📝
I’ve always called them check fields or check numbers, as they work similarly to a check digit, but it seems that most people prefer the word "magic".

Consider the following code

#include <cassert>
#include <string>
// assumes the ACME declarations above are visible

constexpr unsigned char data_for_callback_check = 12;
struct data_for_callback{
    private:
    unsigned char check = data_for_callback_check;
    public:
    std::string x = "hello";
    short y = 42;
    int z = -1;
};

data_for_callback& get_data_for_callback(void* ptr){
    assert(ptr != nullptr); // or throw/return error
    auto data = reinterpret_cast<const unsigned char*>(ptr);
    assert(data[0] == data_for_callback_check); // or throw/return error
    return *reinterpret_cast<data_for_callback*>(ptr);
}

int foo(int, void* data){
    const auto& d = get_data_for_callback(data);
    // do something with the data
    (void)d; // silences the unused-variable warning in this sketch
    return 0;
}

int main(){
    auto d = data_for_callback();
    acme_register_callback(foo);
    acme_set_user_data(&d); // pass the address, not the object
    // trigger call to callback
}

data_for_callback_check = 12 is the magic number for identifying if a pointer points to a data_for_callback structure.

The first step is adding the metadata to the structure, thus I’ve added an unsigned char as the first member variable. This variable must never change its value (but it should not be defined as const if the data should be copyable).

get_data_for_callback verifies the check number and casts void* to data_for_callback& in case of success.

The verification needs to be done through an unsigned char*. Casting void* to data_for_callback* and then accessing the check member would be undefined behavior if the void* did not point to a data_for_callback.

Accessing data from unsigned char* (and char* and signed char*) is well defined, even if there is no unsigned char object (as long as there is something/the pointer is valid). Thus even if the void* points to something else, it is possible to handle the error gracefully, or at least detect it.

For this reason, setting the magic number as the first member variable is the safest option. If the magic number is not the first member, one would need to calculate its offset, which is annoying. More importantly, sizeof(unsigned char) == 1, thus accessing data[0] is always valid (as long as data points to something), while a bigger index is problematic.
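The byte-wise inspection can be isolated in a small helper. The following is only a sketch: tagged, tagged_check and looks_like_tagged are hypothetical names, and the struct is deliberately kept standard-layout so that the language guarantees that the first member sits at offset 0, which the static_asserts verify.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

constexpr unsigned char tagged_check = 12; // hypothetical check value

// Simplified stand-in for a structure that carries a check number.
struct tagged {
    unsigned char check = tagged_check;
    int payload = 7;
};
static_assert(std::is_standard_layout<tagged>::value, "first member must be at offset 0");
static_assert(offsetof(tagged, check) == 0, "check must be the first byte");

// Reads only the first byte of whatever ptr points to, without assuming its type.
bool looks_like_tagged(const void* ptr) {
    if (ptr == nullptr) return false;
    const auto* bytes = reinterpret_cast<const unsigned char*>(ptr);
    return bytes[0] == tagged_check;
}
```

Returning a bool instead of asserting leaves it to the caller to decide how to report the error.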

Unfortunately, it does not cover all errors.

If data[0] != data_for_callback_check we know that there is no data_for_callback object, so it’s great for detecting this situation, but what if the check passes?

We still cannot be sure that there is a data_for_callback object.

For example

#include <cassert>
#include <string>
// assumes the ACME declarations above are visible

constexpr unsigned char data_for_callback_check = 12;
struct data_for_callback{
    private:
    unsigned char check = data_for_callback_check;
    public:
    std::string x = "hello";
    short y = 42;
    int z = -1;
};

const data_for_callback& get_data_for_callback(const void* ptr){
    assert(ptr != nullptr); // or throw/return error
    auto data = reinterpret_cast<const unsigned char*>(ptr);
    assert(data[0] == data_for_callback_check); // or throw/return error
    return *reinterpret_cast<const data_for_callback*>(ptr);
}

int foo(int, void* data){
    const auto& d = get_data_for_callback(data); // check passes, but there is no data_for_callback
    // do something with the data
    (void)d;
    return 0;
}

int main(){
    acme_register_callback(foo);
    unsigned char arr[]{data_for_callback_check, 0, 0, 0, 0}; // arr[0] == data_for_callback_check (!)
    acme_set_user_data(arr);
    // trigger call to callback
}

Instead of a single unsigned char it would be possible to use an array of two, three, or even more values, but there is no verification that cannot be fooled (just like with check digits!).
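A wider check value could look like the following sketch. The names wide_check, wide_tagged and looks_like_wide_tagged, as well as the four byte values, are arbitrary choices made for this illustration. Note that comparing four bytes assumes that the pointed-to memory is at least four bytes long, which is exactly the "bigger index" problem a single byte avoids.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical four-byte check value; the bytes are arbitrary.
constexpr unsigned char wide_check[4] = {0xAC, 0x3E, 0x51, 0x0C};

struct wide_tagged {
    unsigned char check[4] = {0xAC, 0x3E, 0x51, 0x0C};
    int payload = 7;
};
static_assert(std::is_standard_layout<wide_tagged>::value, "check must start at offset 0");

// The caller promises that ptr refers to at least sizeof(wide_check) readable bytes.
bool looks_like_wide_tagged(const void* ptr) {
    return ptr != nullptr && std::memcmp(ptr, wide_check, sizeof(wide_check)) == 0;
}
```

A buffer that matches only the first byte is now rejected, but a buffer that happens to contain all four bytes still passes the check.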

Indices instead of pointers

There is actually another way to write a verification that cannot be fooled, and that does not even require a check number.

A pointer is, after all, an address and can be "converted" to a number. You should use intptr_t/uintptr_t, which are guaranteed to be big enough to hold a pointer value.

#include <cstdint>
#include <map>
#include <string>
// assumes the ACME declarations above are visible

struct data_for_callback{
    std::string x = "hello";
    short y = 42;
    int z = -1;
};

std::map<std::uintptr_t, data_for_callback> global_data;

const data_for_callback& get_data_for_callback(const void* ptr){
    // at() throws std::out_of_range if ptr is not a known index
    return global_data.at(reinterpret_cast<std::uintptr_t>(ptr));
}

int foo(int, void* data){
    const auto& d = get_data_for_callback(data);
    // do something with the data
    (void)d;
    return 0;
}

int main(){
    auto index = std::uintptr_t(11);
    global_data.emplace(index, data_for_callback());
    acme_register_callback(foo);
    acme_set_user_data(reinterpret_cast<void*>(index));
    // trigger call to callback
}

Note that this has another set of disadvantages.

  • there is a non-trivial global structure

  • there is a mutable global structure

  • pointers that in general do not point to anything are passed around. It is not even possible to access them through unsigned char*, which is unexpected, even if the library that defines acme_set_user_data should not touch the pointer it receives in any way…​

  • It might be necessary to synchronize the global data structure if this is accessed from multiple threads.
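For the last point, one possible approach is to guard the map with a mutex. This is only a sketch: store_data_for_callback and copy_data_for_callback are hypothetical helpers, and the lookup returns a copy on the assumption that copying the data is acceptable, so that no reference outlives the lock.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <stdexcept>
#include <string>

struct data_for_callback {
    std::string x = "hello";
    short y = 42;
    int z = -1;
};

std::mutex global_data_mutex; // guards global_data
std::map<std::uintptr_t, data_for_callback> global_data;

// Hypothetical helper: inserts or overwrites the entry for index.
void store_data_for_callback(std::uintptr_t index, const data_for_callback& d) {
    const std::lock_guard<std::mutex> lock(global_data_mutex);
    global_data[index] = d;
}

// Hypothetical helper: returns a copy so that no reference outlives the lock.
// Throws std::out_of_range for unknown keys.
data_for_callback copy_data_for_callback(const void* ptr) {
    const std::lock_guard<std::mutex> lock(global_data_mutex);
    return global_data.at(reinterpret_cast<std::uintptr_t>(ptr));
}
```

Whether a copy per callback invocation is affordable depends on the data; it is the simplest way to keep the locking local to two functions.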

Note 📝
It is possible to use std::map<void*, …​> as container, as std::map uses std::less for comparing keys (and not < directly, which is troublesome with pointers), and avoid converting between pointers and intptr_t/uintptr_t. As the pointers do not really point to something but are only used as key/index, it would not really improve the readability of the code.

For some (most?) use cases, you can probably replace the map with a vector, for some cases an array could also be sufficient (in particular as there is no resizing it makes multithreading a little bit easier).
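A vector-based variant could look like the following sketch. add_data_for_callback is a hypothetical helper, and the handle is the index plus one, an assumption made here so that no valid handle has the value zero, which would typically look like a null pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

struct data_for_callback {
    std::string x = "hello";
    short y = 42;
    int z = -1;
};

std::vector<data_for_callback> global_data;

// Hypothetical helper: stores a copy and returns its handle packed into a void*.
void* add_data_for_callback(const data_for_callback& d) {
    global_data.push_back(d);
    // handle = index + 1, which equals the new size
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(global_data.size()));
}

// Throws std::out_of_range if the "pointer" is not a known handle.
const data_for_callback& get_data_for_callback(const void* ptr) {
    const auto handle = reinterpret_cast<std::uintptr_t>(ptr);
    // a handle of 0 wraps around to a huge index and is rejected by at()
    return global_data.at(handle - 1);
}
```

The returned references are invalidated when the vector grows, so entries should be added before the callbacks start running.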

I do not think that for most use cases it is worth the effort to use a global structure, but it is worth remembering.

Should I add a check number to every class in case it might be passed as void* parameter?

No. While it is simple to create check numbers automatically, for example

struct s{
  unsigned char check = sizeof(s); // low-quality check number
};

it should not be necessary, because normally functions should take typed parameters, and not void*.

A better automatically generated check number would take into account not only the size but also the name of the class; a macro should be able to hide the implementation details nicely.

One might think that typeid would be the best tool for this job, since it even has a hash function, but it is not constexpr.
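One way such a macro could be written is sketched below. The hash (32-bit FNV-1a), the way it is folded to one byte, and the names hash_name, make_check and CHECK_NUMBER_FOR are all choices made for this illustration, not something the article prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 32-bit FNV-1a hash of a null-terminated string, usable in constant expressions.
constexpr std::uint32_t hash_name(const char* name, std::uint32_t hash = 2166136261u) {
    return *name == '\0'
        ? hash
        : hash_name(name + 1,
                    static_cast<std::uint32_t>((hash ^ static_cast<unsigned char>(*name)) * 16777619u));
}

// Mixes the name hash with the size and keeps the lowest byte.
constexpr unsigned char make_check(const char* name, std::size_t size) {
    return static_cast<unsigned char>((hash_name(name) ^ size) & 0xFFu);
}

// Hypothetical helper macro: derives the check number from the spelling and size of the type.
#define CHECK_NUMBER_FOR(type) make_check(#type, sizeof(type))

struct first_example {
    unsigned char check = CHECK_NUMBER_FOR(first_example);
    int a = 0;
};

// The macro yields a compile-time constant.
static_assert(CHECK_NUMBER_FOR(first_example) == make_check("first_example", sizeof(first_example)),
              "the macro expands to make_check");
```

With a single byte there are only 256 possible values, so two unrelated classes can still end up with the same check number.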

If you actually need to pass a structure to a function that takes void*, you do not need to change the structure directly (which can be problematic); you can trivially wrap it in another structure:

struct data_for_callback{
    std::string x = "hello";
    short y = 42;
    int z = -1;
};

struct wrap_with_check{
  unsigned char check = 42;
  data_for_callback d;
};

This keeps your commonly used data structure free of those normally useless member variables.
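Retrieving the wrapped data could then look like the following sketch. unwrap and wrap_check are hypothetical names, and the code assumes that the check member sits at offset 0 of the wrapper, which holds in practice but is only guaranteed by the language when the wrapper is a standard-layout type.

```cpp
#include <cassert>
#include <string>

struct data_for_callback {
    std::string x = "hello";
    short y = 42;
    int z = -1;
};

constexpr unsigned char wrap_check = 42; // hypothetical named constant for the check value

struct wrap_with_check {
    unsigned char check = wrap_check;
    data_for_callback d;
};

// Returns nullptr instead of asserting, so the caller decides how to report the error.
data_for_callback* unwrap(void* ptr) {
    if (ptr == nullptr) return nullptr;
    if (*reinterpret_cast<const unsigned char*>(ptr) != wrap_check) return nullptr;
    return &reinterpret_cast<wrap_with_check*>(ptr)->d;
}
```

The callback works with a plain data_for_callback*, and only the code that talks to the void* interface needs to know about the wrapper.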

Final note

Having a stable and at the same time super-flexible interface is an intriguing idea.

But it comes at a cost.

Not only are some types of errors that would otherwise be caught by the compiler now possible, but it also requires more documentation, especially as a project grows in size.

In fact the Linux Kernel is removing many magic numbers because using a type-safe API is much simpler and more robust, even if less generic.

The conclusion is that you should avoid using void*, but if you have to work with an interface that uses it, you can detect some programming errors with a check number, instead of blindly casting from one pointer type to the other.


1. This is not necessarily true: memory tagging is a thing, but it is not part of the C or C++ standard, and depends on the platform and environment.
