Undefined behavior - the ugly
Everyone who has worked with C and C++ has stumbled more than once upon UB: undefined behavior.
There are good reasons why UB exists in the language, bad reasons, and then the ugly; and undefined behavior, when it manifests itself, is mostly ugly.
The good
Undefined behavior has some advantages; otherwise, all new language features would be free from it.
Consider for example std::string_view
and std::span
.
These classes were added in C++17 and C++20; both of them have an operator[], and both invoke undefined behavior if the index passed as a parameter is out of range, instead of having well-defined behavior.
If undefined behavior were always as bad as described, there would be little reason why newer library additions still make it so easy to trigger.
Distinguish programming errors from runtime errors
There are obviously multiple possible design decisions for handling an invalid value: throwing an exception and leaving the function signature unchanged, using an optional return value, or something like std::expected 🗄️.
The main disadvantage of defining the behavior is that using operator[]
with an invalid index is, strictly speaking, not a programming error anymore.
It might not seem like a big issue, but in this case, undefined behavior helps to make static and runtime analyzers more powerful.
Consider the trivial program
std::string s = "123";
s[42];
As the string does not have 42
characters, the second line has undefined behavior.
As a consequence, a static analyzer, a runtime analyzer, or another programmer during code review can warn about the error.
In fact, cppcheck
diagnoses
[main.cpp:5] (error) Out of bounds access in 's[42]', if 'sv' size is 3 and '42' is 42 [containerOutOfBounds]
If the program had been
std::string s = "123";
s.at(42);
then it would not be possible to say that this code is wrong. There might be, somewhere else, a catch block that handles the exception. Without more context, or without analyzing the whole program (if that is even possible), it is not clear whether this snippet has an error or not, even if in this case it is a contrived construct for throwing a std::out_of_range exception.
Note that also in this case, cppcheck emits a similar warning. Trying to access a non-existent element is generally viewed as a programming error, even when using functions that do bounds checking. As a guideline, exceptions that are not subclasses of std::runtime_error should be treated as programming errors, and std::out_of_range is not a subclass of it. This page lists the exception types defined in the standard library.
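As a quick check (a minimal, compilable sketch, requiring C++17), the standard class hierarchy confirms the guideline:
#include <stdexcept>
#include <type_traits>

// std::out_of_range derives from std::logic_error, not from std::runtime_error,
// so by the guideline above it signals a programming error.
static_assert(std::is_base_of_v<std::logic_error, std::out_of_range>);
static_assert(!std::is_base_of_v<std::runtime_error, std::out_of_range>);

int main() {}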
Leave the door open for future proposals
The C and C++ committees always try to avoid breaking existing code and binaries, even if this means choosing solutions that are not as performant, easy to use, or user-friendly. Because of that, once something is defined, in most cases it won't ever change (notable exceptions are features that the committee considered practically unused (for example trigraphs 🗄️, auto in its original meaning, the comma operator inside brackets 🗄️, …), unportable, or more damaging than useful (like std::auto_ptr 🗄️)).
If something is left undefined, or defined as UB, it leaves the possibility of standardizing something better later.
Can provide a better working environment
Consider dereferencing nullptr
. In C and C++ it is undefined behavior, while in Java it produces a NullPointerException
.
The main disadvantage of a NullPointerException
is that the context where the error happened is lost. Which variable should not have been null
? What values did other variables have? With JEP 358 (Helpful NullPointerExceptions) 🗄️, at least some more information is added.
In C and C++, the behavior is not defined, but (handwaving technical difficulties) one can generate a stack dump or break execution automatically and see exactly what is happening, instead of having "only" a stack trace.
Some interesting GCC flags are -ftrapv, the -fsanitize family of sanitizer flags, -fharden-compares, and -fharden-conditional-branches.
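As a minimal sketch (assuming a reasonably recent GCC or Clang), compiling the following with -g -fsanitize=undefined makes the null dereference produce a runtime report pointing at the exact file and line, instead of a bare crash:
// compile with e.g. `g++ -g -O0 -fsanitize=undefined main.cpp`
int read_value(int* p) {
    return *p; // undefined behavior when p is null
}

int main() {
    int* p = nullptr;
    return read_value(p);
}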
Makes it easier to define an API/specification
What sorting algorithm does the sort function use? As long as the data is sorted, and some complexity upper bounds are defined, it should not be that relevant for most, if not all, use cases.
If the sort function accepts a comparator (or relies on an overloaded operator< in C++), then the differences between implementations can be observed. For example, one can count how many elements are compared or moved from one position to another. For covering those details, "implementation-defined" would be sufficient, but what is the behavior if the comparator is buggy or does not satisfy some criteria required by the sort API?
In this case, it is not possible to define what the sort function will do. For example, in Java 7 the sorting algorithm changed, and some code that seemed to work broke.
Another example is the Java hashCode function, which should depend only on members that do not mutate; otherwise, it can break data structures like maps and sets.
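The same issue can be reproduced in C++ (a hedged sketch; Key, KeyHash, and KeyEq are made up for illustration): if the hash of a key changes while the element is stored in an unordered container, lookups start to fail.
#include <cstddef>
#include <cstdio>
#include <functional>
#include <unordered_set>

struct Key { int id; };
struct KeyHash {
    std::size_t operator()(const Key& k) const { return std::hash<int>{}(k.id); }
};
struct KeyEq {
    bool operator()(const Key& a, const Key& b) const { return a.id == b.id; }
};

int main() {
    std::unordered_set<Key, KeyHash, KeyEq> s;
    auto it = s.insert(Key{1}).first;
    // Mutating the part of the key that the hash depends on, while the element
    // is stored in the set, leaves it in the "wrong" bucket: lookups now fail.
    const_cast<Key&>(*it).id = 2;
    std::printf("%zu %zu\n", s.count(Key{1}), s.count(Key{2})); // most likely "0 0"
}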
An obvious case where undefined behavior helps to simplify a specification is a race condition.
Data races are considered undefined behavior in most programming languages, and as far as I know, trying to specify benign data races has never succeeded.
Another example would be
void foo(int, int);
void bar(int i){
    foo(++i,++i);
}
In this case, there might be different opinions on what values are passed to foo. If i is 0, is it called as foo(1,2) or as foo(2,1)? There are good reasons for both possibilities, and defining one of them might come as a surprise for those who expect the other outcome. Not defining it makes it clear that the code is not valid, and gives tools the possibility to recognize that and emit a diagnostic.
In general, it would be even better if we could define this to be a compile-time error, but how would one specify that
void bar(int& i, int& j){
    foo(++i,++j);
}
is not valid if i and j refer to the same object, and bar is part of a library linked at runtime?
Performance
Performance is probably the most overused reason for allowing undefined behavior in many interfaces.
A simple example could be
// v needs to contain at least 10 elements
void foo(std::vector<int>& v){
    assert(v.size() >= 10);
    for(int i = 9; i >= 0; --i){
        v[i] = 0;
    }
}
In its current form, the compiler could change the for loop to for(int i = 0; i != 10; ++i) if looping in the other direction is faster, or even use alternative approaches like memset:
void foo3(std::vector<int>& v){
    memset(v.data(), 0, 10*sizeof(int));
}
When using std::vector::at, for(int i = 9; i >= 0; --i) and for(int i = 0; i != 10; ++i) cannot be interchanged, as they have different behavior when the size is smaller than 10, even if such a state should never occur. It can also make a significant difference in the size of the final binary, even if at runtime the difference might be hard to measure.
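For comparison, a bounds-checked variant might look like this (a sketch; foo_checked is a made-up name). If v.size() is, say, 5, the descending loop throws before writing anything, while an ascending loop would write five elements first, so the two directions are observably different and cannot be swapped:
#include <vector>

void foo_checked(std::vector<int>& v){
    for(int i = 9; i >= 0; --i){
        v.at(i) = 0; // throws std::out_of_range when i is past the end
    }
}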
Gives more flexibility for the implementation
For example, an implementation can add sanity checks that are not specified, like a throwing operator[]
.
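For instance (assuming libstdc++; other standard libraries have similar switches), compiling with -D_GLIBCXX_ASSERTIONS turns many unchecked accesses into checked ones that abort with a diagnostic instead of silently invoking undefined behavior:
// sketch: compile with `g++ -D_GLIBCXX_ASSERTIONS main.cpp`
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3};
    // Plain undefined behavior without the macro; with _GLIBCXX_ASSERTIONS the
    // library checks the index and aborts instead.
    return v[42];
}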
Another example is (again) the implementation of Collections.sort in Java 8, which adds more sanity checks and may throw a ConcurrentModificationException, possibly breaking code that seemed to work 🗄️.
Simplify code
Just like UB makes it easier to define an API, it can also help to simplify code.
Consider a function that looks like
std::string fun(const int i){
    if(i<0){ return ""; }
    // a lot of code
    int j = i%5;
    char c;
    switch(j){
        case 0: c = 'a'; break;
        case 1: c = 'e'; break;
        case 2: c = 'i'; break;
        case 3: c = 'o'; break;
        case 4: c = 'u'; break;
    }
    // a lot of code
}
A possible way to simplify this code is to extract the switch
statement to a separate function
std::string fun(const int i){
    if(i<0){ return ""; }
    // a lot of code
    int j = i%5;
    const char c = vowel(j);
    // a lot of code
}

char vowel(int i){
    switch(i){
        case 0: return 'a';
        case 1: return 'e';
        case 2: return 'i';
        case 3: return 'o';
        case 4: return 'u';
    }
}
Notice that neither implementation handles the cases i%5 < 0 or i%5 > 4.
In the first implementation, it is obvious that we do not need to handle those cases: i >= 0, and thus i%5 is always at least 0 and less than five.
But in the second implementation, vowel does not know that.
There are alternative design decisions:
- ignore the error
- return a dummy value
- throw an exception (or use something like std::optional or std::expected)
- explicitly add a non-recoverable error (abort)
Note 📝 | I’m leaving other design decisions, like jumps, global values, and signals out for brevity, as they have more disadvantages than advantages. |
The current implementation ignores the error. Even if not written explicitly, the contract of vowel is that i is non-negative and less than five.
The second possibility is to return a dummy value (as in garbage in, garbage out). This is conceptually "wrong" because it would hide the error: the caller would get a "valid" character and might recognize the mistake only later.
Throwing an exception thus seems more appropriate, as it appears to follow the fail-fast principle.
Unfortunately, throwing does not fail as fast as one might expect, since exceptions can be caught.
The biggest advantage of tools like sanitizers and fuzzing is how they help to find errors without writing explicit tests.
If a future code change introduces an error, the best thing that can happen is to discover it automatically without further changes.
If there is a test for fun (or for something that calls fun), it would be possible to adapt it to verify that vowel does not throw, assuming that the exception thrown from vowel can escape all exception handlers and that it can be distinguished from other exceptions.
Thus (not considering other tests that might fail because of the introduced error), detecting the error automatically requires an active change, and makes a big assumption (depending on the size of the test) about the implementation details.
The last possible design decision is to add a non-recoverable error. One possibility would be to add assert(i >= 0 && i <= 4 && "precondition failed") at the beginning of vowel, or assert(false && "precondition failed") at the end of it. This is equivalent to ignoring the whole situation, except that the intent is documented better, and a compiler might diagnose that there is unreachable code (if the assert is put after the switch statement).
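A sketch combining both placements (the assertion messages are illustrative):
#include <cassert>

char vowel(int i){
    assert(i >= 0 && i <= 4 && "precondition failed");
    switch(i){
        case 0: return 'a';
        case 1: return 'e';
        case 2: return 'i';
        case 3: return 'o';
        case 4: return 'u';
    }
    // With NDEBUG defined the asserts disappear, and falling off the end of the
    // function remains undefined behavior, exactly as before.
    assert(false && "precondition failed");
}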
The bad
Undefined behavior has multiple disadvantages.
It does not help to write portable code: for example, the code might "work" with one compiler, on one platform, even for years, and then fail after something changes.
It also does not help much when working with dependencies, especially those loaded at runtime. UB is generally a non-recoverable error. Simply loading a shared library, like a plugin, at runtime means that the main application is only as stable as its most buggy component. If there were no UB, or if all errors were recoverable, then it would be possible to log them and unload the plugin automatically, possibly ensuring a better user experience.
The ugly
In practice, UB has many more side effects.
For this reason, the contracts proposal tries hard not to introduce any new UB, especially since its main job is to provide a tool for verifying the correctness of a given program.
It can remove code written by programmers
Consider
void foo(int a)
{
    if (a + 1000 < a) {
        throw std::runtime_error("a + 1000 overflows");
    }
    // other code
}
this, depending on the environment, is equivalent to
void foo(int a)
{
    // other code
}
This is because a + 1000 < a is never true: it "could" be true only if a had modulo semantics, which it does not.
The correct way to write this check is to test for a <= std::numeric_limits<int>::max() - 1000 (as we know that subtracting a positive int value from std::numeric_limits<int>::max() does not overflow), or to cast everything to unsigned values: unsigned(a) + 1000 < unsigned(a).
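Rewritten with the first alternative (a sketch; the exception message is made up), the check contains no signed overflow and therefore cannot be optimized away:
#include <limits>
#include <stdexcept>

void foo(int a)
{
    // Subtracting a positive constant from INT_MAX is well defined.
    if (a > std::numeric_limits<int>::max() - 1000) {
        throw std::runtime_error("a + 1000 would overflow");
    }
    // other code
}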
Another typical example is dereferencing a pointer and only afterward checking whether it is nullptr:
void foo(int* a)
{
    if(*a == 1){
        return;
    }
    if (a == nullptr) {
        throw std::runtime_error("a is null");
    }
    // other code
}
Depending on the environment, it is equivalent to
void foo(int* a)
{
    if(*a == 1){
        return;
    }
    // other code
}
Those examples might seem trivial, but often such code appears after simplifying and inlining more complex functions, for example
bool bar(int* a){
    return *a == 1;
}
void baz(int* a){
    if (a == nullptr) {
        throw std::runtime_error("a is null");
    }
    // other code
}
void foo(int* a)
{
    if(bar(a)){
        return;
    }
    baz(a);
}
and, after inlining bar, the compiler can remove the check in baz, as if the code had been written as
bool bar(int* a){
    return *a == 1;
}
void baz(int* a){
    // no check for nullptr
    // other code
}
void foo(int* a)
{
    if(bar(a)){
        return;
    }
    baz(a);
}
Note 📝 | For this specific use-case, GCC offers the options -fdelete-null-pointer-checks/-fno-delete-null-pointer-checks; I recall they were added for the Linux kernel 🗄️. |
Uninitialized variables
From Larry Osterman 🗄️:
DWORD cchString;
DWORD cbValue;
HRESULT hr = CorSigUncompressData(pbBlob, cbBlob, &cchString, &cbValue);
if (SUCCEEDED(hr))
{
cbBlob -= cbValue;
pbBlob += cbValue;
if (cbBlob >= cchString)
{
// Convert to unicode
wchar_t rgchTypeName[c_cchTypeNameMax];
DWORD cchString = MultiByteToWideChar(CP_UTF8, 0, reinterpret_cast<LPCSTR>(pbBlob), static_cast<int>(cchString), rgchTypeName, ARRAYSIZE(rgchTypeName));
if (cchString != 0 && cchString < ARRAYSIZE(rgchTypeName))
{
// Ensure that the string is null-terminated.
rgchTypeName[cchString] = L'\0';
}
cbBlob -= cchString;
pbBlob += cchString;
}
}
This piece of code did the "right thing" (AKA behaved as expected) when compiled for 32-bit Windows, but not when compiled for 64-bit.
The function has an uninitialized cchString variable that is passed to the MultiByteToWideChar function(!).
As this is undefined behavior, the 32-bit compiler happened to reuse the same stack storage for the inner and outer cchString variables, while the 64-bit compiler did not.
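The core of the bug can be reduced to a shadowed, uninitialized variable (a simplified sketch, not the original code); compilers can flag this pattern with warnings such as -Wshadow and -Wuninitialized:
#include <cstdio>

int main() {
    int cch = 3;  // outer variable, properly initialized
    if (cch > 0) {
        // The inner `cch` shadows the outer one, and its own initializer reads
        // the inner, still-uninitialized variable: the value is indeterminate.
        int cch = std::printf("%d\n", cch);
        (void)cch;
    }
}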
Library extensions can often be used only when recompiling everything
You should not enable, for example, a checked std::vector::operator[] that throws std::out_of_range unless you can recompile everything. Otherwise, enabling such an extension would introduce undefined behavior because of ODR (one-definition rule) violations.
Pointers are equal, pointed-to content is not
Consider the following C example, shown first by Michael Spencer:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int* p = malloc(sizeof(int));
    int* q = realloc(p, sizeof(int));
    *p = 1;
    *q = 2;
    if (p == q) {
        printf("%d %d\n", *p, *q);
    }
    return 0;
}
After calling realloc
, p
is invalid, even if it has the same value as q
. Because of that, depending on how the code is optimized, the output with clang is either "2 2" or "1 2".
Even more confusing is the slightly modified example:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int* p = malloc(sizeof(int));
    int* q = realloc(p, sizeof(int));
    if (p == q) {
        *p = 1;
        *q = 2;
        printf("%d %d\n", *p, *q);
    }
    return 0;
}
Pointer provenance, i.e. tracking where a pointer comes from and whether it is still valid, is a tricky topic [1].
In C and C++, pointers can be converted losslessly to integers (if the integer type is big enough), saved, and read back from files. One would assume that when casting an integer to a pointer type, such an optimization cannot be made, as "how can the compiler tell if a pointer has been marked as invalid if the value has been read from a file?".
But UB is UB; trying to find an explanation does not always make sense:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(){
    int* p = malloc(sizeof(int));
    uintptr_t ap = (uintptr_t)p;
    int* q = realloc(p, sizeof(int));
    uintptr_t aq = (uintptr_t)q;
    if (aq == ap){
        *(int*)ap = 1;
        *(int*)aq = 2;
        printf("%d %d\n", *(int*)ap, *(int*)aq);
    }
    return 0;
}
Even if the two integers have the same value, when cast back to pointers they behave differently(!)
It is interesting to see that
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void){
    int* p = malloc(sizeof(int));
    uintptr_t ap = (uintptr_t)p;
    int* q = realloc(p, sizeof(int));
    uintptr_t aq = (uintptr_t)q;
    if (aq == ap){
        aq = ap;
        *(int*)ap = 1;
        *(int*)aq = 2;
        printf("%d %d\n", *(int*)ap, *(int*)aq);
    }
    return 0;
}
makes the program well-defined, but it is not obvious why the assignment aq = ap after the comparison aq == ap is necessary, and why it does not get optimized out.
UB cannot be compartmentalized
While it is clear that a non-recoverable error will bring down the whole process, it is not always clear that perfectly fine-looking code is affected by some code that comes afterward because of time travel 🗄️.
For example
void foo(int* a){
    if(a == nullptr){
        bar();
    }else{
        baz();
    }
    if(a == nullptr){
        *a = 1;
    }
}
Since evaluating *a when a == nullptr is undefined behavior, the compiler can assume that a is never nullptr, and remove the check at the beginning of the function.
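In other words, the optimizer is allowed to treat foo as if it had been written like this (a sketch, not the output of any particular compiler):
void foo(int* a){
    // The only dereference is guarded by `a == nullptr`, which would be UB, so
    // the compiler assumes that branch is never taken, concludes that a is not
    // null, and folds both checks away: bar() is never called.
    baz();
}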
Also, code that is never executed, only compiled, can change the behavior of an unrelated piece of code:
namespace{
    typedef void fun();
    fun* f = nullptr;
}

void bar();

void never_called() {
    f = bar;
}

int main() {
    f();
}
Because calling f() with f == nullptr is undefined behavior, the compiler does not need to add any check for such a case.
As f has internal linkage (and thus cannot be changed from another translation unit) and is assigned bar in only one place, the compiler can assume that f points to bar and replace the indirect call with a direct call to bar.
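The observable effect is as if the program had been written as (a sketch):
void bar();

int main() {
    bar(); // the only value ever stored in f, so the indirect call becomes a direct one
}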
In the following example
#include <cstdio>

int main() {
    while (true){}
}

void never_called() {
    std::puts("never_called");
}
The simple fact that the function never_called is defined after main leads to it being called, even though it is effectively dead code. Removing the function, or moving it before main, changes the behavior of the program:
#include <cstdio>

void never_called() {
    std::puts("never_called");
}

int main() {
    while (true){}
}
Those examples show why it is generally not possible to localize undefined behavior; any attempt to do so is doomed to fail. The function never_called is, as the name implies, never used. But it does not matter: its presence in the source code is enough to change how the code is transformed, and in the presence of undefined behavior this can have unintended side effects.
Conclusion
Different programmers have different experiences with, and expectations about, what undefined behavior means.
I believe the biggest issue of undefined behavior is that it gives too much liberty about how the code can be transformed when things go wrong. And defining "when things go wrong" is not easy.
In other languages, if something that is not defined happens, in most cases it does not change code written previously or in an unrelated function.
In C++, sorting a std::vector<double> containing a NaN value can do anything. In other languages, it might return an unsorted container. Returning an unsorted container instead of crashing goes against the fail-fast approach, but having nasal demons when executing an unrelated function does not help much in practice.
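A minimal sketch of that scenario:
#include <algorithm>
#include <cmath>
#include <vector>

int main() {
    std::vector<double> v{3.0, std::nan(""), 1.0, 2.0};
    // NaN compares false with every value, so operator< is not a strict weak
    // ordering here; std::sort's precondition is violated and the behavior is
    // undefined.
    std::sort(v.begin(), v.end());
}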
For this reason, even if the compiler can remove safety checks, it is important to
- compile with as many diagnostics as possible, and not ignore them silently
- enable all hardening flags 🗄️
- verify invariants, preconditions, and postconditions
- have an automated test suite
- test with and without optimizations
- test with sanitizers
- test with Valgrind
- use static analyzers
in order to detect and avoid undefined behavior as early as possible.
More specific solutions, given a certain environment, might be to
- disable some optimizations
As a more long-term plan, it would be nice if the standard made it possible to limit what undefined behavior can do.
For example, I would dislike having some unrelated code removed because I'm sorting a container with a NaN value. On the other hand, I believe it is perfectly fine if it causes stack overflows, crashes, deadlocks, raises a signal, reboots my PC, leaves the postconditions unsatisfied (the vector is not sorted, or some values are missing), corrupts the heap, and so on.
As long as those things happen while sorting or afterward (for example because of memory corruption), it is possible to use sanity checks to analyze the state of the program and the reason for the errors.
Unfortunately, I am not aware of any plan/proposal for constraining undefined behavior.
I think the main bet is contracts, as they would give the end-user the possibility to enforce them and thus detect UB.
Another area of improvement is compiler diagnostics. It would be great if those were active even when not all compiler flags are turned on, but this is easier said than done. If a new, useful warning generates too 🗄️ many diagnostics 🗄️ that do not 🗄️ result in bugs 🗄️, it might simply get turned off, as there are alternative (maybe faster) approaches to find most bugs 🗄️. Once a warning is disabled, it is unlikely that it will ever get enabled again.
Defining a "friendly subset" might seem a good idea, the impossible task is to agree on such subset 🗄️, at that point, it might be better to define a new language, better tooling, or try to fix the issue in the official standard.
Note 📝 | The C language defines an optional extension to limit what undefined behavior can do, but as far as I know, none of the three major compilers has implemented it. |
Do you want to share your opinion? Or is there an error, some parts that are not clear enough?
You can contact me anytime.