
Undefined behavior - the ugly


Categories: c c++ security
Keywords: c c++ clang constexpr cppcheck gcc guideline hardening multithreading performance security static analysis undefined behavior

Everyone who has worked with C and C++ has stumbled more than once into UB: undefined behavior.

There are good reasons why UB exists in the language, there are bad reasons, and then there is the ugly: when undefined behavior manifests itself, it is mostly ugly.

The good

Undefined behavior has multiple advantages; otherwise, newer language features would try to be undefined-behavior-free.

Consider for example std::string_view and std::span.

Those classes were added in C++17 and C++20 respectively; both have an operator[], and both invoke undefined behavior when the index passed as a parameter is out of range, instead of having a well-defined behavior.

If undefined behavior were always as bad as described, there would be little reason why newer library additions still make it so easy to trigger.

Distinguish programming errors from runtime errors

There are obviously multiple possible design decisions for handling an invalid value: throwing an exception and otherwise leaving the function signature unchanged, using an optional return value, or something like std::expected.

The main disadvantage of defining the behavior is that using operator[] with an invalid index is, strictly speaking, not a programming error anymore.

It might not seem like a big issue, but in this case, undefined behavior helps to make static and runtime analyzers more powerful.

Consider the trivial program

std::string s = "123";
s[42]; // out-of-range access

As the string has only three characters, the second line has undefined behavior.

As a consequence, a static analyzer, a runtime analyzer, or another programmer during code review can warn about the error.

In fact, cppcheck diagnoses

[main.cpp:5] (error) Out of bounds access in 's[42]', if 's' size is 3 and '42' is 42 [containerOutOfBounds]

If the program had been

std::string s = "123";
s.at(42);

then it is not possible to say that this code is wrong. There might be, somewhere else, a catch block that handles the exception. Without more context, or without analyzing the whole program (if at all possible), it is not clear whether this snippet has an error or not, even if in this case it is a contrived construct for throwing a std::out_of_range exception.

Note that cppcheck emits a similar warning in this case too. Trying to access a non-existent element is generally viewed as a programming error, even when using functions that do bounds checking. As a guideline, exceptions that are not subclasses of std::runtime_error should be treated as programming errors, and std::out_of_range is not a subclass of it.
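This can be checked directly; a minimal sketch (C++17):

#include <stdexcept>
#include <type_traits>

// std::out_of_range derives from std::logic_error, not from
// std::runtime_error, so by the guideline above it signals a
// programming error.
static_assert(std::is_base_of_v<std::logic_error, std::out_of_range>);
static_assert(!std::is_base_of_v<std::runtime_error, std::out_of_range>);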

Leave the door open for future proposals

The C and C++ committees always try to avoid breaking existing code and binaries, even if this means choosing solutions that are not as performant, easy to use, or user-friendly. Because of that, once something is defined, in most cases it will never change (notable exclusions are features that the committee considered practically unused (for example trigraphs, auto, the comma operator inside brackets, …), unportable, or causing more damage than anything else (like std::auto_ptr)).

If something is not defined or defined as UB, it gives the possibility to standardize something better later.

Can provide a better working environment

Consider dereferencing nullptr. In C and C++ it is undefined behavior, while in Java it produces a NullPointerException.

The main disadvantage of a NullPointerException is that the context where the error happened is lost. Which variable should not have been null? What values did other variables have? With JEP 358 (Helpful NullPointerExceptions), at least some more information is added.

In C, the behavior is not defined, but (handwaving technical difficulties) one can generate a stack dump or break execution automatically and see exactly what is happening, instead of "only" having a stack trace.

Some interesting GCC flags are -ftrapv, the sanitizer flags (-fsanitize=undefined and friends), and the hardening flags -fharden-compares and -fharden-conditional-branches.
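For example, a minimal sketch of what -ftrapv does (the exact failure mode is compiler-specific):

#include <climits>

int main(int argc, char** argv){
    (void)argv;
    int x = INT_MAX;
    // signed overflow for argc >= 1: compiled with g++ -ftrapv, the
    // program aborts here at runtime, so a debugger or core dump shows
    // exactly where things went wrong
    return x + argc;
}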

Makes it easier to define an API/specification

What sorting method does the sort function use? As long as the data gets sorted, and some upper bounds are defined, it should not be relevant for most, if not all, use cases.

If the sort method accepts a comparator (or overloads operator< in C++), then the differences between implementations can be observed. For example, one can count how many elements are compared or moved from one position to another. For covering those details, "implementation-defined" would be sufficient, but what’s the behavior if the comparator is buggy or does not satisfy some criteria required by the sort API?

In this case, it is not possible to define what the sort function will do. For example, in Java 7 the sorting algorithm changed, and some code that seemed to work broke.
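The same issue exists in C++; a sketch of a comparator that violates the strict weak ordering required by std::sort (the exact outcome depends on the implementation):

#include <algorithm>
#include <vector>

int main(){
    std::vector<int> v{3, 1, 2, 2, 1, 3};
    // "<=" returns true for equal elements, so it is not a strict weak
    // ordering; std::sort has undefined behavior. In practice the result
    // may be unsorted, or the program may crash or loop forever.
    std::sort(v.begin(), v.end(), [](int a, int b){ return a <= b; });
}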

Another example is the Java hashCode function, which should consider only members that do not mutate; otherwise, it can break data structures like maps and sets.

An obvious case where undefined behavior helps to simplify a specification is a race condition.

Data races are considered undefined behavior in most programming languages, and as far as I know, trying to specify benign data races has never succeeded.
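A minimal sketch of a data race (two threads modify the same variable without synchronization):

#include <thread>

int counter = 0; // shared, with no synchronization

int main(){
    // both threads write counter concurrently: a data race, and thus
    // undefined behavior; -fsanitize=thread diagnoses it at runtime
    std::thread t1([]{ for(int i = 0; i < 100000; ++i){ ++counter; } });
    std::thread t2([]{ for(int i = 0; i < 100000; ++i){ ++counter; } });
    t1.join();
    t2.join();
}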

Another example would be

void foo(int, int);


void bar(int i){
	foo(++i,++i);
}

In this case, there might be different opinions about what values are passed to foo. If i is 0, is it called as foo(1,2) or foo(2,1)? There are good reasons for both possibilities, and defining one of them might come as a surprise to those who expect the other outcome. Not defining it makes it clear that the code is not valid, and gives tools the possibility to recognize invalid code and emit a diagnostic.

In general, it would be even better if this could be defined as a compile-time error, but how would one specify that

void bar(int& i, int& j){
	foo(++i,++j);
}

is not valid if i and j refer to the same object, and bar is part of a library linked at runtime?

Performance

Performance is probably the most overused reason for allowing undefined behavior in many interfaces.

A simple example could be

// v needs to contain at least 10 elements
void foo(std::vector<int>& v){
    assert(v.size() >= 10);
    for(int i = 9; i >= 0; --i){
        v[i] = 0;
    }
}

In its current form, the compiler could change the for loop into for(int i = 0; i != 10; ++i) if looping in the other direction is faster, or even use alternative approaches, like calling memset:

void foo3(std::vector<int>& v){
    memset(v.data(), 0, 10*sizeof(int)); // memset is from <cstring>
}

When using std::vector::at, for(int i = 9; i >= 0; --i) and for(int i = 0; i != 10; ++i) are not as easy to interchange, as they behave differently when the size is smaller than 10, even if such a state should not be possible. It can also make a significant difference in the size of the final binary, even if at runtime the difference might be hard to measure.
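A sketch of the checked variant (foo_checked is my name for it):

#include <vector>

void foo_checked(std::vector<int>& v){
    // with at(), the two loop directions are observably different when
    // v.size() < 10: counting down throws std::out_of_range on the very
    // first iteration (index 9), while counting up first writes the
    // elements that do exist; the compiler has to preserve that
    for(int i = 9; i >= 0; --i){
        v.at(i) = 0;
    }
}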

Gives more flexibility for the implementation

For example, an implementation can add sanity checks that are not specified, like a throwing operator[].

Another example would be (again) the implementation of Collections.sort in Java 8, which adds more sanity checks and may throw a ConcurrentModificationException, possibly breaking code that seemed to work.

Simplify code

Just like UB makes it easier to define an API, it can also help to simplify code.

Consider a function that looks like

std::string fun(const int i){
    if(i<0){ return ""; }
    // a lot of code
    int j = i%5;
    char c;
    switch(j){
        case 0: c = 'a'; break;
        case 1: c = 'e'; break;
        case 2: c = 'i'; break;
        case 3: c = 'o'; break;
        case 4: c = 'u'; break;
    }
    // a lot of code
}

A possible way to simplify this code is to extract the switch statement to a separate function

char vowel(int j); // declared here, defined below

std::string fun(const int i){
    if(i<0){ return ""; }
    // a lot of code
    int j = i%5;
    const char c = vowel(j);
    // a lot of code
}


char vowel(int i){
    switch(i){
        case 0: return 'a';
        case 1: return 'e';
        case 2: return 'i';
        case 3: return 'o';
        case 4: return 'u';
    }
    // for any other value, control flows off the end without returning
    // a value: undefined behavior
}

Notice that both implementations do not handle the cases i%5 < 0 or i%5 > 4.

In the first implementation, it is obvious that we do not need to handle those cases; i >= 0, and thus i%5 is always at least 0 and less than five.

But in the second implementation, vowel does not know that.

There are alternative design decisions.

  • ignore the error

  • return a dummy value

  • throw an exception (or use something like std::optional or std::expected)

  • explicitly add a non-recoverable error

Note 📝
I’m leaving other design decisions, like jumps, global values, and signals out for brevity, as they have more disadvantages than advantages.

The current implementation ignores the (programming) error. Even if not written explicitly, the contract of vowel is that i is non-negative and less than five.

The second possibility is to return a dummy value (as in garbage in, garbage out). This is "wrong" because, if there is an error, it gets hidden: the caller receives a "valid" character and might recognize the error only much later.

Throwing an exception thus seems more appropriate, as it follows the fail-fast principle.

Unfortunately, throwing is not as fast at failing as one might expect.

The biggest advantage of tools like sanitizers and fuzzing is how they help to find errors without writing explicit tests.

If a future code change introduces an error, the best thing that can happen is to discover it automatically without further changes.

If the tested function (either fun or something that calls fun) is covered by a test, it would be possible to adapt the test to verify that vowel does not throw, assuming that the exception thrown from vowel can escape all exception handlers and that it can be distinguished from other exceptions.

Thus (not considering other tests that might fail because of the introduced error), detecting the error automatically requires an active change, and makes a big assumption (depending on the size of the test) about the implementation details.

The last possible design decision is to add a non-recoverable error. One possibility would be to add assert(i >= 0 && i <= 4 && "precondition failed") at the beginning of vowel, or assert(false && "precondition failed") at the end of it. This is equivalent to ignoring the whole situation, except we documented the intent better, and that a smart compiler might diagnose that there is unreachable code (if the assert is put after the switch statement).
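A sketch of this last option, with the assert at the beginning (the trailing return only silences compiler warnings about a missing return value):

#include <cassert>

char vowel(int i){
    assert(i >= 0 && i <= 4 && "precondition failed");
    switch(i){
        case 0: return 'a';
        case 1: return 'e';
        case 2: return 'i';
        case 3: return 'o';
        case 4: return 'u';
    }
    assert(false && "precondition failed"); // unreachable if the precondition holds
    return '\0';
}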

The bad

Undefined behavior has multiple disadvantages.

It does not help to write portable code: for example, the code might "work" with one compiler on one platform, even for years, and then fail after something changes.

It also does not help much when working with dependencies, especially if those are loaded at runtime. UB is generally a non-recoverable error. Simply loading a shared library, like a plugin, at runtime means that the main application is as stable as the most buggy component. If there were no UB, or if all errors were recoverable, then it would be possible to log them and unload the plugin automatically, possibly ensuring a better user experience.

Flexibility is only granted if one has control over all dependencies

Otherwise, the program has a mix of implementations, some with a throwing operator[] and some with a non-throwing operator[].

The ugly

In practice, UB has many more side effects.

For this reason, the contracts proposals are trying hard not to introduce any UB, especially since their main job is to provide a tool to verify the correctness of a given program.

It can remove code written by programmers

Consider

void foo(int a)
{
    if (a + 1000 < a) { // signed overflow: undefined behavior
        throw std::runtime_error("overflow");
    }
    // other code
}

is, depending on the environment, equivalent to

void foo(int a)
{
    // other code
}

because the compiler may assume that a + 1000 < a is never true: it "could" be true only if a had modulo semantics, which signed integers do not (signed overflow is undefined behavior).

The correct way to write this check is to test for a <= std::numeric_limits<int>::max() - 1000 (as we know that std::numeric_limits<int>::max() minus a positive value does not overflow), or to cast everything to unsigned values: unsigned(a) + 1000 < unsigned(a).
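A sketch of the well-defined version of the check:

#include <limits>
#include <stdexcept>

void foo(int a)
{
    // compare against the limit before adding, so that no signed
    // overflow can occur; this branch cannot be optimized away
    if (a > std::numeric_limits<int>::max() - 1000) {
        throw std::runtime_error("a + 1000 would overflow");
    }
    // other code
}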

Another typical example is dereferencing a pointer and checking afterward if it is nullptr or not:

void foo(int* a)
{
    if(*a == 1){ // dereferences a before the null check below
        return;
    }
    if (a == nullptr) {
        throw std::runtime_error("null pointer");
    }
    // other code
}

Depending on the environment, it is equivalent to

void foo(int* a)
{
    if(*a == 1){
        return;
    }
    // other code
}

Those examples might seem trivial, but often such code appears after simplifying and inlining more complex functions, for example


bool bar(int* a){
    return *a == 1; // dereferences a unconditionally
}
void baz(int* a){
  if (a == nullptr) {
    throw std::runtime_error("null pointer");
  }
  //other code
}
void foo(int* a)
{
  if(bar(a)){
    return;
  }
  baz(a);
}

and the dereference in bar (after inlining) invalidates the null check in baz.

Note 📝
For this specific use case, GCC offers the options -fdelete-null-pointer-checks/-fno-delete-null-pointer-checks; I recall they were added for the Linux kernel.

Uninitialized variables

DWORD cchString;
DWORD cbValue;
HRESULT hr = CorSigUncompressData(pbBlob, cbBlob, &cchString, &cbValue);
if (SUCCEEDED(hr))
{
    cbBlob -= cbValue;
    pbBlob += cbValue;

    if (cbBlob >= cchString)
    {
        //  Convert to unicode
        wchar_t rgchTypeName[c_cchTypeNameMax];
        // BUG: this declaration shadows the outer cchString, and the
        // initializer reads the inner, still-uninitialized variable
        DWORD cchString = MultiByteToWideChar(CP_UTF8, 0, reinterpret_cast<LPCSTR>(pbBlob), static_cast<int>(cchString), rgchTypeName, ARRAYSIZE(rgchTypeName));
        if (cchString != 0 && cchString < ARRAYSIZE(rgchTypeName))
        {
            //  Ensure that the string is null terminated.
            rgchTypeName[cchString] = L'\0';
        }
        cbBlob -= cchString;
        pbBlob += cchString;
    }
}

This piece of code did the "right thing" (AKA behaved as expected) when compiled for 32-bit, but not when compiled for 64-bit.

There is, in fact, an uninitialized cchString variable passed to the MultiByteToWideChar function(!): the inner declaration shadows the outer variable, and its initializer reads the still-uninitialized inner one.

As reading an uninitialized variable is undefined behavior, anything can happen: the 32-bit compiler happened to reuse the same stack storage for the inner and outer cchString variables, while the 64-bit compiler did not.
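A distilled sketch of the same bug (the names are mine); warnings like -Wshadow and -Wuninitialized help to diagnose this pattern:

unsigned f(unsigned n);

void g(){
    unsigned len = 3;
    {
        // the inner declaration shadows the outer variable, and its
        // initializer reads the inner, still-uninitialized 'len':
        // undefined behavior
        unsigned len = f(len);
        (void)len;
    }
}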

Library extensions can often be used only when recompiling everything

For example, an implementation can offer a checked std::vector::operator[], but mixing translation units compiled with and without the checks is UB (because of the ODR), so in practice the checked version can only be used when recompiling everything.
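A sketch of the problem, with a hypothetical CHECKED macro:

// header.h
#include <vector>

#ifdef CHECKED
inline int& get(std::vector<int>& v, int i){ return v.at(i); }
#else
inline int& get(std::vector<int>& v, int i){ return v[i]; }
#endif

// If one translation unit is compiled with -DCHECKED and another without,
// the program contains two different definitions of the inline function
// get: an ODR violation, and thus undefined behavior.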

Pointers are equal, pointed content is not

Consider the following C example, shown first by Michael Spencer:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  int* p = malloc(sizeof(int));
  int* q = realloc(p, sizeof(int));
  *p = 1; // p is invalid after a successful realloc, even if p == q
  *q = 2;
  if (p == q) {
    printf("%d %d\n", *p, *q);
  }
  return 0;
}

After calling realloc, p is invalid, even if it has the same value as q. Because of that, depending on how the code is optimized, the output with clang is either "2 2" or "1 2".

Even more confusing is the slightly modified example:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  int* p = malloc(sizeof(int));
  int* q = realloc(p, sizeof(int));
  if (p == q) {
    *p = 1;
    *q = 2;
    printf("%d %d\n", *p, *q);
  }
  return 0;
}

Pointer provenance, i.e. determining whether a pointer is valid or not, is a tricky topic [1].

In C and C++, pointers can be converted losslessly to integers (if the integer type is big enough), saved, and read back from files. One would assume that, when casting an integer to a pointer type, such optimizations cannot be made: how could the compiler tell that a pointer has been marked as invalid if its value has been read from a file?

But UB is UB; trying to find an explanation does not always make sense:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(){
  int* p = malloc(sizeof(int));
  uintptr_t ap = (uintptr_t)p;
  int* q = realloc(p, sizeof(int));
  uintptr_t aq = (uintptr_t)q;
  if (aq == ap){
    *(int*)ap = 1;
    *(int*)aq = 2;
    printf("%d %d\n", *(int*)ap, *(int*)aq);
  }
  return 0;
}

Even if the two integer variables have the same value, when cast to pointers they behave differently(!). The following slight modification

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void){
  int* p = malloc(sizeof(int));
  uintptr_t ap = (uintptr_t)p;
  int* q = realloc(p, sizeof(int));
  uintptr_t aq = (uintptr_t)q;
  if (aq == ap){
    aq = ap;
    *(int*)ap = 1;
    *(int*)aq = 2;
    printf("%d %d\n", *(int*)ap, *(int*)aq);
  }
  return 0;
}

makes the program well defined, but it is not obvious why the assignment aq = ap after the check aq == ap is necessary, nor why it does not get optimized out.

UB cannot be compartmentalized

While it is clear that a non-recoverable error will bring down the whole process, it is not always clear that perfectly fine-looking code can be affected by some code that comes afterward, because of "time travel".

For example

void foo(int* a){
    if(a == nullptr){
        bar();
    }else{
        baz();
    }
    if(a == nullptr){
        *a = 1; // dereferencing a null pointer: undefined behavior
    }
}

Since dereferencing a when a == nullptr is undefined, the compiler can assume that a is never nullptr and remove the check at the beginning of the function.

Also, code that is never executed, just compiled, can change the behavior of an unrelated piece of code

namespace{
    typedef void fun();
    fun* f = nullptr;
}

void bar();

void never_called() {
  f = bar;
}

int main() {
  f(); // calling f while it is still nullptr is undefined behavior
}

Because calling f() with f==nullptr is undefined behavior, the compiler does not need to add any check for such a case.

As f has internal linkage (and thus cannot be changed from another translation unit) and is assigned in only one place, to bar, the compiler can assume that f points to bar and turn the indirect call into a direct call.

This example also shows why it is generally not possible to localize undefined behavior: the function never_called is, as the name implies, never called. But it does not matter; its presence in the source code is enough to change how the code is transformed, and in the presence of undefined behavior, that can have unintended side effects.

Conclusion

Different programmers have different experiences and expectations about what undefined behavior means.

I believe the biggest issue of undefined behavior is that it gives too much liberty about how the code can be transformed when things go wrong. And obviously defining "when things go wrong" is not easy.

In other languages, if something that is not defined happens, in most cases it does not change code written previously or in an unrelated function.

In C++, sorting a std::vector<double> containing a NaN value can do anything. In other languages, it might return an unsorted container. Returning an unsorted container instead of crashing goes against the fail-fast approach, but having nasal demons when executing an unrelated function does not help much in practice.
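A sketch of that situation:

#include <algorithm>
#include <cmath>
#include <vector>

int main(){
    std::vector<double> v{1.0, std::nan(""), 0.5};
    // NaN compares false against everything, so operator< is not a
    // strict weak ordering over this data: undefined behavior
    std::sort(v.begin(), v.end());
}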

For this reason, even if the compiler can remove safety checks, it is important to

  • compile with as many diagnostics as possible, and do not ignore them silently

  • enable all hardening flags

  • verify invariants, pre, and postconditions

  • have an automated test suite

  • test with and without optimizations

  • test with sanitizers

  • test with Valgrind

  • use static analyzers

in order to detect and avoid undefined behavior as early as possible.
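For example, a typical invocation for a test build could look like this (a sketch; the exact set of flags depends on the compiler version and the project):

g++ -Wall -Wextra -Wpedantic -g -D_GLIBCXX_ASSERTIONS -fsanitize=address,undefined main.cpp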

More specific solutions, given a certain environment, might be to use flags like the already mentioned -ftrapv or -fno-delete-null-pointer-checks.

As a more long-term plan, it would be nice if the standard made it possible to limit what undefined behavior can do.

For example, I would really dislike having code removed because I’m sorting a NaN value. On the other hand, I believe it is perfectly fine if it causes stack overflows, crashes, or deadlocks, emits a signal, reboots my PC, leaves postconditions unfulfilled (the vector is not sorted, or some values are missing), corrupts the heap, and so on.

As long as those things happen while sorting, or afterward (for example because of memory corruption), it is possible to use sanity checks to analyze the state of the program and the reason for the errors.

Unfortunately, I am not aware of any plan/proposal for constraining undefined behavior.

I think the main bet is contracts, as they would give the end user the possibility to enforce them and thus detect UB.

Another area of improvement is compiler diagnostics. It would be great if those were active even when not all compiler flags are turned on, but this is easier said than done. If a new, useful warning generates too many diagnostics that do not result in bugs, it might simply get turned off, as there are alternative (maybe faster) approaches for finding most bugs. Once a warning is disabled, it is unlikely that it gets enabled again.

Defining a "friendly subset" might seem like a good idea, but the impossible task is agreeing on such a subset; at that point, it might be better to define a new language, build better tooling, or try to fix the issue in the official standard.

Note 📝
The C language defines an optional extension, Annex L ("Analyzability"), to limit what undefined behavior can do, but as far as I know, the three major compilers did not implement it.

1. Some interesting papers/articles: N2090, rust, P1726, P2414, and rust again

Do you want to share your opinion? Or is there an error, some parts that are not clear enough?

You can contact me here.