Global variables in C++ libraries
While working on a big project, we noticed there were some issues while the application exited.
Most of the time those issues manifested themselves as crashes, but it was not obvious which piece of code caused them.
After investigating this problem for a long time, we learned that (global) constants have a lot of bad side effects we were not aware of, even if those were scoped in a single translation unit.
Long story short, we had an issue similar to those that crypto++ had in 2010: a destructor of a global instance was executed more than once.
This is a minimal code example for reproducing the issue (depending on the environment):
// header file
#include <string>
const std::string hello = "Hello World!"; // might invoke UB
or, to make the issue even more explicit:
#include <iostream>
struct my_struct{
    my_struct(){
        std::cout << this << " hello\n";
    }
    ~my_struct(){
        std::cout << this << " goodbye\n";
    }
};
const my_struct instance;
Baffled that undefined behavior gets triggered by such completely innocent-looking code, I decided to test different environments and configurations to see how to safely write global variables. While it might be a dubious task, as globals are best avoided, there are valid use cases. And, most importantly, constants are globals too. After all, I want to write code that works in all environments, be it on Windows or GNU/Linux, as an application or as a library.
Most classes are already designed in a way that they can be used in most places. An int, like any other primitive type, can be used as a local variable, member variable of another object, and global variable. There is no need to use a special syntax for initializing or using those different types of variables.
The same is true for any class that is part of the standard library, and for most custom classes. Of course, it is possible to disable such behavior, for example by hiding constructors and destructors, overloading operator new, and so on.
Therefore I find it very annoying that some code, apparently completely valid and designed to be used in all situations, causes undefined behavior in some environments.
Long story short, the simplest example I could create was a static library (lib0), used by two shared libraries (lib1 and lib2), which were linked to an executable. If lib0 were shared, or if all libraries were static, the issue was not reproducible. Of course, there are a lot of other factors: where the global instance is defined made a difference, just like the platform. For example, on Windows, the issues were reproducible only in one particular scenario, while on GNU/Linux there were other scenarios.
The triggering factor was mainly symbol visibility. Other factors, like linkage, made the situation better or worse; for more details, see the tables below.
To make the situation more complex and confusing, some language features have multiple effects. For example, declaring a variable as const might not only make it constant; in the case of global variables, it might change its linkage too, but only in C++, not in C!
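The effect of const on linkage can be sketched as follows. This is only a minimal illustration and not code from the test project: all variable and function names are hypothetical. It assumes the declarations sit at namespace scope in a C++ source file.

```cpp
// Sketch of how const affects linkage at namespace scope in C++.
// All names are hypothetical and exist only for illustration.

// Non-const global: external linkage, one symbol visible to the linker.
int mutable_counter = 0;

// const global: internal linkage in C++ (in C it would be external).
// Every translation unit that contains this line gets its own copy.
const int internal_answer = 42;

// const global explicitly declared extern: external linkage again.
extern const int external_answer;
const int external_answer = 43;

// static makes the internal linkage explicit, for const and non-const alike.
static int private_counter = 0;

// Small helper so that the values can be observed from other code.
int sum_of_globals() {
    return mutable_counter + internal_answer + external_answer + private_counter;
}
```

The behavior of the program does not change with the linkage here; the difference only shows up in the symbol table (for example with nm) and once several binaries define the same name.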
Test environment
The class used for testing looks like this:
struct my_struct{
    my_struct(){
        std::cout << this << " hello\n";
    }
    ~my_struct(){
        std::cout << this << " goodbye\n";
    }
};
and the test environment looks as follows:
add_library(lib0 STATIC lib0/lib0.hpp lib0/lib0.cpp)
target_include_directories(lib0 PUBLIC lib0)
add_library(lib1 lib1/lib1.hpp lib1/lib1.cpp)
target_include_directories(lib1 PUBLIC lib1)
target_link_libraries(lib1 lib0)
add_library(lib2 lib2/lib2.hpp lib2/lib2.cpp)
target_include_directories(lib2 PUBLIC lib2)
target_link_libraries(lib2 lib0)
add_executable(main main/main.cpp)
target_link_libraries(main lib1 lib2)
As described above, there is a static library (lib0), used by two separate shared libraries (lib1 and lib2), and both are linked to the same executable (main).
From the description, it looks very similar to the diamond problem of multiple inheritance. While writing this article I found the term "diamond dependency", which seems like an appropriate description. Unfortunately, this term is usually used to describe a different issue: two libraries depend on a common library, but at different versions. The issue described in this article arises even when there are no version mismatches!
Summary
Some considerations of the results:
- No libraries have been loaded by hand.
- Binaries have been compiled and executed under GNU/Linux (thanks to WINE for testing Windows executables).
- A further static analysis can be made with nm, objdump, and readelf, for those who know how to interpret the result.
- Everything tested is undefined behavior or implementation-specific (as the C++ standard does not define libraries in any way!), so it might break and change.
- The column "ctor/dtor" gives the number of times the constructor (or destructor) has been called.
- The column "inst" gives the number of different objects that have been constructed (and destructed).
- The column "visibility" states which -fvisibility flag has been passed to the compiler.
- The column "weak" states if the attribute weak has been used ("w") or not ("nw").
- The column "const" states if the instance has been declared const ("c") or not ("nc").
- The column "static" states if the instance has been declared static ("s") or not ("ns").
- The column "link" states what linkage the instance has: internal ("int"), external ("ext"), or weak. The result depends on qualifiers like extern, static, weak, and const.
- "n/a" (written "na" in the tables) means that the option was not available (or that I'm not aware of it) for the given compiler and/or configuration.
- If a symbol has been hidden, then it does not have weak linkage, even if annotated with the weak attribute.
- If a symbol has weak linkage, then it also has external linkage.
- Since C++17, it is possible to declare variables as inline. In case this made a difference, there is a separate row to show the effects.
Global in a header file
// .hpp
STATIC CONST INLINE my_struct instance = {};
ctor/dtor | inst | compiler | visibility | weak | const | static | link | other settings |
---|---|---|---|---|---|---|---|---|
4 | 4 | gcc/clang | default/hidden | na | c/nc | s/ns | int | if nc, then static |
4 | 4 | msvc | na | na | c/nc | s/ns | int | if nc, then static |
4 | 4 | mingw | default/hidden | na | c/nc | s/ns | int | if nc, then static |
2 | 2 | gcc/clang | hidden | na | c | ns | ext | inline (since C++17) |
1 | 1 | gcc/clang | default | na | c | ns | weak | inline (since C++17) |
2 | 2 | mingw | default/hidden | na | c | ns | ext | inline (since C++17) |
2 | 2 | msvc | na | na | c | ns | ext | inline (since C++17) |
For those not acquainted with the term "translation unit", this code might not do what the author wanted when defining a global instance. Every translation unit (rule of thumb: every .cpp file that includes the given header file) will have a different instance of instance, unless inline (since C++17) is used.
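The difference between the two kinds of header-defined globals can be sketched in a single file. This is a hedged illustration that requires C++17: the type counted and all variable names are hypothetical stand-ins, not the code used for the tables.

```cpp
// Hypothetical instrumented type, modelled on my_struct from the article.
struct counted {
    static inline int constructed = 0;  // C++17 inline static data member
    counted() { ++constructed; }
};

// What a header might contain. Without inline, a const object at namespace
// scope has internal linkage, so each including .cpp file gets its own copy.
const counted per_translation_unit_copy;

// With inline (C++17) the variable has external linkage, and all translation
// units that end up in the same binary share a single object.
inline const counted shared_across_translation_units;

// Helper returning the address, so different .cpp files could compare it.
inline const counted* shared_address() {
    return &shared_across_translation_units;
}
```

Within one binary the inline variable is merged into a single object. Across shared libraries this is not guaranteed: as the table above shows, with hidden visibility each shared library still ended up with its own instance.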
Global variable in a source file, with extern declaration
// .hpp / or .cpp
extern CONST my_struct WEAK_ATTR instance;
// .cpp
CONST my_struct instance = {};
ctor/dtor | inst | compiler | visibility | weak | const | static | link | other settings |
---|---|---|---|---|---|---|---|---|
2 | 1 | gcc/clang | default | nw | c/nc | na | ext | |
2 | 2 | gcc/clang | hidden | w/nw | c/nc | na | ext | |
1 | 1 | gcc/clang | default | w | c/nc | na | weak | |
2 | 2 | msvc | na | na | c/nc | na | ext | |
2 | 2 | mingw | default/hidden | w/nw | c/nc | na | ext | |
1 | 1 | gcc/clang | default | nw | c/nc | na | ext | inline (since C++17) |
Global variable in a source file, in an unnamed namespace
// file.cpp
namespace {
STATIC CONST INLINE my_struct instance = {};
}
ctor/dtor | inst | compiler | visibility | weak | const | static | link | other settings |
---|---|---|---|---|---|---|---|---|
2 | 2 | gcc/clang | default/hidden | na | c/nc | s/ns | int | |
2 | 2 | msvc | na | na | c/nc | s/ns | int | |
2 | 2 | mingw | default/hidden | na | c/nc | s/ns | int | |
This instance is not accessible from other translation units, but it could be made accessible through a function. The static modifier does not have any effect. An unnamed namespace in a translation unit defines an object with internal linkage and (apparently) hidden visibility. I could not find this fact (that anonymous namespaces have hidden visibility) stated anywhere in the documentation, but it makes sense and is a nice feature.
inline, like static, did not make any difference.
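Making such an instance reachable through a function could look like the following sketch. The accessor name get_instance and the value member are assumptions made for illustration; the article's my_struct has no data members.

```cpp
// Hypothetical stand-in for the article's my_struct, with a payload to read.
struct my_struct {
    int value = 7;
};

namespace {
// Internal linkage: invisible to other translation units and to the linker.
const my_struct instance = {};
}  // namespace

// External linkage: other translation units reach the object only through
// this (hypothetical) accessor, which would be declared in the header.
const my_struct& get_instance() {
    return instance;
}
```

Only the function becomes part of the interface, so the object itself cannot clash with a symbol of the same name in another library.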
Global in source file
// .cpp
STATIC CONST my_struct instance = {};
ctor/dtor | inst | compiler | visibility | weak | const | static | link | other settings |
---|---|---|---|---|---|---|---|---|
2 | 2 | gcc/clang | default/hidden | na | c/nc | s/ns | int | not static and const and default visibility |
2 | 1 | gcc/clang | default | na | nc | ns | ext | |
2 | 2 | msvc | na | na | c/nc | s/ns | int | |
2 | 2 | mingw | default/hidden | na | c/nc | s/ns | int | |
1 | 1 | gcc/clang | default | na | c/nc | ns | int | inline (since C++17) |
The anonymous namespace is a superior alternative, as it is less error-prone.
Singleton instance
// .hpp, optional
LIB0_API CONST my_struct& create_my_struct();
// .cpp
CONST my_struct& create_my_struct(){
    static CONST my_struct instance;
    return instance;
}
// or:
// .hpp
inline CONST my_struct& create_my_struct(){
    static CONST my_struct instance;
    return instance;
}
ctor/dtor | inst | compiler | visibility | weak | const | static | link | other settings |
---|---|---|---|---|---|---|---|---|
1 | 1 | gcc/clang | default | na | c/nc | s | int | |
2 | 2 | gcc/clang | hidden | na | c/nc | s | int | |
2 | 2 | msvc | na | na | c/nc | s | int | |
2 | 2 | mingw | default/hidden | na | c/nc | s | int | |
Notice: there does not seem to be a configuration where the number of constructor and destructor calls differs from the number of instances. Also notice that with gcc/clang and default visibility, the singleton behaves as if it had weak linkage, but it does not; it has internal linkage.
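The lazy construction of the function-local static can be observed with a small counting variant. This is a sketch under stated assumptions: counted_struct and create_counted_struct are hypothetical names that mirror my_struct and create_my_struct, and everything lives in a single binary.

```cpp
// Hypothetical counting variant of my_struct.
struct counted_struct {
    static int constructions;
    counted_struct() { ++constructions; }
};
int counted_struct::constructions = 0;

// Function-local static: constructed the first time the function is called,
// and only once, no matter how often the function is called afterwards.
const counted_struct& create_counted_struct() {
    static const counted_struct instance;
    return instance;
}
```

Since C++11 the initialization of a function-local static is thread-safe, and the object is never constructed if the function is never called.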
Static member variable
// .hpp
struct my_struct{
    // ...
    static CONST INLINE my_struct WEAK_ATTR instance;
};
// .cpp
CONST my_struct my_struct::instance = {}; // unless defined inline (since C++17)
ctor/dtor | inst | compiler | visibility | weak | const | static | link | other settings |
---|---|---|---|---|---|---|---|---|
2 | 1 | gcc/clang | default | nw | c/nc | s | ext | |
2 | 2 | gcc/clang | hidden | w/nw | c/nc | s | ext | |
1 | 1 | gcc/clang | default | w | c/nc | s | weak | |
2 | 2 | msvc | na | na | c/nc | s | ext | |
2 | 2 | mingw | default/hidden | w/nw | c/nc | s | ext | |
1 | 1 | gcc/clang | default | nw | c | s | ext | inline (since C++17) |
It is not possible to define a member variable with internal linkage unless the whole class has internal linkage. This means that the class needs to be defined(!) in an anonymous namespace, as shown in the next table.
Static member variable with internal linkage
// .cpp
namespace {
struct my_struct{
    // ...
    static CONST my_struct WEAK_ATTR instance;
};
CONST my_struct my_struct::instance = {};
}
ctor/dtor | inst | compiler | visibility | weak | const | static | link | other settings |
---|---|---|---|---|---|---|---|---|
2 | 2 | gcc/clang | default/hidden | na | c/nc | s | int | |
2 | 2 | msvc | na | na | c/nc | s | int | |
2 | 2 | mingw | default/hidden | na | c/nc | s | int | |
It behaves like a normal global instance in an anonymous namespace.
While this might be a potential use case for unnamed namespaces in header files, it might make more sense to try to avoid static member variables.
Conclusion
Testing different compilers with different types of global variables has been interesting, and some surprises came up.
Supposing that you do not want different (or the same) shared libraries to interfere with each other, the only solution is to prefer internal linkage and avoid public visibility for everything that is not part of the interface of the library.
The weak attribute seemed to help in some situations, but after a closer look at the documentation it became clear that it is not a proper fix. It might be a viable solution in some environments, but it opens the door for anyone to override symbols (even by accident). Thus it does not solve the problem for different shared libraries, only for the same library used multiple times.
For mutable globals, a much more robust solution is to define a symbol as privately as possible. If we want some independence from the compiler (and do not want to resort to macros), the anonymous namespace (in a .cpp file) and a singleton instance are the least error-prone solutions.
Nonetheless, it is advised to set hidden visibility with GCC and clang (even the documentation states it!). As there is no way to trigger an error instead of relying on subtle undefined behavior, it is the safest option. It will also normally improve both the compile times and the runtime performance of the executable.
And of course, avoid global variables with a mutable state as much as possible. While normally when speaking about the state we tend to ignore constructors and destructors, for globals those are relevant too.
If the globals are constants, defining them as constexpr is an even better solution, as no code is executed at runtime, and thus there is no interaction between libraries.
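For the string constant from the beginning of the article, a constexpr alternative could look like this sketch. It assumes C++17 and that a std::string_view is an acceptable replacement for std::string at the call sites; hello_length is a hypothetical helper.

```cpp
#include <cstddef>
#include <string_view>

// Constant-initialized: no constructor runs at startup and no destructor
// runs at exit, so there is nothing that could be executed twice.
constexpr std::string_view hello = "Hello World!";

// The value is already usable at compile time.
static_assert(hello.size() == 12, "length is known at compile time");

// Hypothetical helper showing use in a constexpr function.
constexpr std::size_t hello_length() {
    return hello.size();
}
```

A std::string_view only refers to the string literal and does not own memory, which is what makes it usable here where a global std::string would need a runtime constructor and destructor.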
Some build systems, like CMake, provide tools that help with setting the visibility. The properties VISIBILITY_INLINES_HIDDEN and CXX_VISIBILITY_PRESET ensure that the correct flag is passed to the compiler, while the GENERATE_EXPORT_HEADER function defines macros like those described by GCC for controlling the interface of a shared library.
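An export macro of the kind mentioned above, such as the LIB0_API used in the singleton example, might be written by hand roughly as follows. This is a sketch and not the output of any tool: the macro LIB0_BUILDING, the file name, and internal_helper are assumptions made for illustration.

```cpp
// lib0_api.hpp (hypothetical file name)

// Assumption: normally the build system defines this only while compiling
// lib0 itself; it is defined here so that the sketch is self-contained.
#define LIB0_BUILDING

#if defined(_WIN32)
#  if defined(LIB0_BUILDING)
#    define LIB0_API __declspec(dllexport)
#  else
#    define LIB0_API __declspec(dllimport)
#  endif
#elif defined(__GNUC__)  // GCC and clang
#  define LIB0_API __attribute__((visibility("default")))
#else
#  define LIB0_API
#endif

struct my_struct {};

// Part of the public interface: exported even with -fvisibility=hidden.
LIB0_API const my_struct& create_my_struct();

// Not annotated: hidden when compiling with -fvisibility=hidden.
int internal_helper();

// lib0.cpp (hypothetical): the definitions.
const my_struct& create_my_struct() {
    static const my_struct instance{};
    return instance;
}

int internal_helper() {
    return 1;
}
```

With hidden visibility as the default, only the annotated function appears in the dynamic symbol table of the shared library, which is the "as private as possible" approach recommended above.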