The C++ logo, by Jeremy Kratz, licensed under CC0 1.0 Universal Public Domain Dedication

Post-build static analysis


Most analysis tools that I’m aware of work either on the source code (like Cppcheck or the compiler), before or during compilation, or at runtime (like Valgrind, sanitizers, gcov, …), while or after running the executable.

I’m not aware of many tools or practices for statically analyzing an executable after it has been built, but without executing it (not counting reverse engineering). So I started experimenting a little to see if I could gather some useful information.

Several tools can analyze an executable; the first that come to mind are readelf, objdump, and nm (all part of GNU binutils). Aside from the command-line tools, there are also libraries for different programming languages, which can give greater flexibility for more complex tasks.

Search for global variables

Running either of the following commands

objdump --demangle --all-headers <object file>
objdump --demangle --syms <object file>

prints a lot of interesting information, in particular the symbol table and the shared libraries.

It’s possible to find all global instances, functions, and other interesting details from the symbol table.

I noticed in a project, for example, that many constants were placed in the .bss section, while I would have expected them to be in the .data.rel.ro or .rodata section. They also appeared more than once, some of them at least 8 times!

So I searched for the variable in the source code and found the following construct:

// .hpp
const std::string constant = "value";

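A const variable at namespace scope has internal linkage, so every translation unit that includes the header gets its own copy, and because a std::string is normally initialized at runtime, each copy lands in .bss. After putting so much effort into analyzing the effects of global variables, I’m happy to see that it is possible to find some of those errors statically.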

In this case, there are multiple possible fixes:

// .hpp
extern const std::string constant;

// .cpp (includes the .hpp, so the definition has external linkage)
const std::string constant = "value";

which works since C++98, or (if using at least C++17)

// .hpp
inline const std::string constant = "value";

With both approaches, all unnecessary copies of "value" are avoided, but the variable is still going to land in the .bss and not .rodata section.

To avoid this runtime cost, we need to use a type that does not allocate memory, like std::string_view, a \0-terminated const char*, or a const char array. If \0-termination is important, it might be better to avoid std::string_view and resort to const char*, or to write your own string_view-like type that guarantees \0-termination.
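A minimal sketch of the non-allocating alternatives, assuming C++17. The names constant_sv, constant_arr, and constant_ptr are made up for illustration and do not come from the project discussed above:

```cpp
// constants.hpp (hypothetical header name)
#include <string_view>

// constexpr forces constant initialization, so no code runs at startup;
// inline (C++17) gives a single definition shared by all translation units.
inline constexpr std::string_view constant_sv = "value";

// A character array is always \0-terminated and needs no library type.
inline constexpr char constant_arr[] = "value";

// A \0-terminated pointer for APIs that expect a C string.
constexpr const char* constant_ptr = constant_arr;
```

Since none of these objects needs a constructor call or a heap allocation, compilers can normally place them in a read-only section; the objdump commands in this section are a way to verify that for a given toolchain.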

The following command should list all variables in the .bss section. As the output is sorted by size and then by symbol name, it should also help to find those that probably have unnecessary copies:

objdump --demangle --sym --section=.bss <object file> | grep -v -e '^$' | tail -n +3 | sort -k5 | less

With readelf it is possible to gather similar information:

readelf --wide --symbols <object file> | c++filt | grep -v -e '^$' -e '^Symbol' -e ' vtable' -e ' typeinfo' -e ' FUNC ' | sort -k8

and with nm too

nm --print-size --demangle --debug-syms <object file> | grep -i -e ' B ' -e ' C ' -e ' S ' | sort -k3

Those are a good starting point for finding global variables that are not compile-time constants and that appear multiple times.

Note 📝
If you get an error message similar to nm: <object file>: no symbols, then the symbols have been stripped. Use --dynamic instead of --debug-syms.
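For more complex checks, the textual output can be post-processed with a small program. The following is only a sketch: the function name count_uninitialized_symbols is hypothetical, and it assumes the line layout `[address [size]] type name` produced by the nm command above.

```cpp
#include <cctype>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Counts how often each symbol name of type B, C or S (any case, like the
// grep -i above) appears in the output of `nm --print-size --demangle`.
std::map<std::string, int> count_uninitialized_symbols(std::istream& nm_output) {
    std::map<std::string, int> counts;
    std::string line;
    while (std::getline(nm_output, line)) {
        std::istringstream fields(line);
        std::string token;
        bool found_type = false;
        // Skip the address and the optional size; the type is one letter.
        while (fields >> token) {
            if (token.size() == 1) {
                found_type = true;
                break;
            }
        }
        if (!found_type) continue;
        const int type = std::tolower(static_cast<unsigned char>(token[0]));
        if (type != 'b' && type != 'c' && type != 's') continue;
        std::string name;
        std::getline(fields >> std::ws, name);  // demangled names may contain spaces
        if (!name.empty()) ++counts[name];
    }
    return counts;
}
```

Every entry with a count greater than one is a candidate for an unintended copy. The function can be fed from std::cin when the nm output is piped into the program.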

Detecting singletons

While grepping for occurrences, I also found some entries that looked like

00000000000040a0 l     O .bss	0000000000000008              guard variable for my_struct::instance
0000000000004098 l     O .bss	0000000000000001              my_struct::instance

After inspecting the source code, I concluded that the compiler creates the guard variable for function-local static objects, the usual way of implementing a singleton. Since C++11, the initialization in the following code is guaranteed to be thread-safe, as if std::call_once was used:

struct my_struct{
	// ...
};

const my_struct& foo(){
    static const my_struct instance = {};
    return instance;
}

This is ensured, for example, by creating a hidden guard variable, to prevent data races if foo is called from multiple threads.

So

objdump --demangle --sym <object file> | grep 'guard variable for '

seems to be a good starting point for searching all compiled singletons.

The compiler can avoid emitting those guard variables if not necessary, for example

const char* foo(){
    static const char* instance = "...";
    return instance;
}

does not generate any guard variable (tested with GCC and Clang, both with -O0).

In this particular case, instance ends up in the .data section (which is similar to .bss, except that it is not zero-initialized). I actually expected it to end up in the .rodata section, as instance cannot be modified outside of foo. Even compiling (with both Clang and GCC) with -D_FORTIFY_SOURCE=2 and -O3 did not make any difference.

The following equivalent piece of code puts instance in the .rodata section (tested again with both GCC and Clang; no special compiler flag is necessary):

const char* foo(){
    static const char* const instance = "...";
    return instance;
}

or

const char* foo(){
    static constexpr const char* instance = "...";
    return instance;
}

Even more simply, it is possible to write:

constexpr const char* instance = "...";
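The difference between the variants comes down to whether the pointer itself is const, not only the characters it points to. The following sketch (assuming C++17; next_value and constexpr_pointer are hypothetical names) makes that visible in the type system and shows that the first variant really is a mutable object:

```cpp
#include <string_view>
#include <type_traits>

// Pointer to const char: the characters are read-only, the pointer is not.
static_assert(!std::is_const_v<const char*>);
// Const pointer to const char: neither can change.
static_assert(std::is_const_v<const char* const>);

// constexpr on a variable implies const on the variable itself.
constexpr const char* constexpr_pointer = "...";
static_assert(std::is_const_v<decltype(constexpr_pointer)>);

std::string_view next_value() {
    static const char* instance = "first";  // the pointer is NOT const
    const std::string_view current = instance;
    instance = "second";                    // compiles: only the pointee is const
    return current;
}
```

Because the pointer in next_value can be reassigned, it needs writable storage, which is consistent with it landing in .data rather than .rodata.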

Verify if using a sanitizer with nm

Valgrind is one of my favorite development tools. It helps to catch otherwise hard-to-diagnose errors, like resource leaks and concurrency issues. The biggest downside (if you are not on Windows) is the performance hit: it might even make your application 10 times slower. If you are on Windows, the biggest downside is that the tool is not available outside of GNU/Linux and BSD systems, unless the program runs under WINE.

Generally, sanitizers are faster, sometimes (not always) have much better diagnostic messages, have ports for Windows, and do not require any change in the runtime environment.

Unfortunately, they do not work together with Valgrind and probably other memory checkers that I’m not aware of.

Thus, a script that executes a test suite with Valgrind could check whether the binary has been linked with a sanitizer and, if so, skip Valgrind.

A simple check with nm would be

nm --demangle --debug-syms a.out | grep -E ' (__.san|__sanitizer|__ubsan)::'

As those symbols depend on the sanitizer library used (AddressSanitizer, UndefinedBehaviorSanitizer, ThreadSanitizer, …), and as a less strict grep expression could also match unrelated symbols (it would not be that uncommon to have, for example, a "sanitize_input" function), care should be taken when grepping for those functions.
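The same check can be sketched in C++ with std::regex. The function name has_sanitizer_symbols is hypothetical, and the pattern only mirrors the grep expression above; it is an assumption that the sanitizer runtime in use exposes symbols in one of these namespaces.

```cpp
#include <istream>
#include <regex>
#include <string>

// Returns true if any line of the demangled nm output contains a symbol
// from a sanitizer namespace such as __asan::, __tsan:: or __ubsan::.
bool has_sanitizer_symbols(std::istream& nm_output) {
    // A leading space and the trailing "::" keep ordinary functions such as
    // sanitize_input from matching.
    static const std::regex pattern(R"( (__.san|__sanitizer|__ubsan)::)");
    std::string line;
    while (std::getline(nm_output, line)) {
        if (std::regex_search(line, pattern)) return true;
    }
    return false;
}
```

A test runner could call this on the nm output of the binary and start the test suite under Valgrind only when it returns false.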

hardening-check

The Debian package devscripts contains the command-line utility hardening-check.

This is its output for a simple C++ "Hello World!" program compiled with GCC (g++ (Debian 8.3.0-19) 8.3.0) and no additional flags:

a.out:
 Position Independent Executable: yes
 Stack protected: no, not found!
 Fortify Source functions: unknown, no protectable libc functions used
 Read-only relocations: yes
 Immediate binding: no, not found!
 Stack clash protection: unknown, no -fstack-clash-protection instructions found
 Control flow integrity: unknown, no -fcf-protection instructions found!

The output of Clang (clang version 7.0.1-9 (tags/RELEASE_701/final)):

a.out:
 Position Independent Executable: no, normal executable!
 Stack protected: no, not found!
 Fortify Source functions: unknown, no protectable libc functions used
 Read-only relocations: yes
 Immediate binding: no, not found!
 Stack clash protection: unknown, no -fstack-clash-protection instructions found
 Control flow integrity: unknown, no -fcf-protection instructions found!

As build systems for big or reusable projects tend to be complicated, instead of verifying by hand that flags are passed correctly and not ignored somewhere, it is easier to check if the built binary meets the requirements.

Of course, trying to simplify the build system should be the long-term solution, but hardening-check is a simple tool for verifying if everything (to some degree) is working as expected.

This is the same approach that we should take when writing code: we should strive to write code so simple that there are obviously no bugs. Unfortunately, most of the time we face code so complex that there are no obvious bugs in it, thus we need tests to try to find them (paraphrasing Hoare).

Alternate approaches to static post-build analysis and conclusion

Different use cases popped up while experimenting with my compiled binaries.

  • Test for false expectations and some common errors. For example, the author of the code did not intend to create multiple global instances, or variables that are not really constant.

  • In some places, it showed how code could be simplified, or that code could be removed without affecting what is generated.

  • It can help to ensure the build system is behaving as expected, for example, that -fpie or a hardening flag is not ignored by some libraries.

  • Verify some properties and act accordingly, for example when running a test suite with a memory checker or other debugging facility.

Of course, false expectations and common errors can be found with compiler warnings and static analysis. Unfortunately Cppcheck (version 1.88), Clang (version 7.0.1-9) with -Weverything, GCC, and MSVC (version 19.21.27702.2) with /Wall /analyze did not warn about global constants not being real constants, as in

const char* foo(){
    static const char* instance = ...;
    return instance;
}

Of course, there are other ways of detecting those potential issues.

The first one would be to write plugins (for GCC or Clang), which is surely an interesting task, but those would depend on the compiler or other programs, which limits their usefulness. Enhancing Cppcheck (or another static analyzer) or the compiler itself would be the best approach, as it would make the new diagnostic available to a much wider audience, without the need to install, use, or learn a separate tool. Otherwise, writing a tool from scratch seems to be a nice idea, but parsing C++ is incredibly difficult.

While all these approaches are valid and more robust as long-term solutions, none of them are as simple as invoking a program from the command line and interpreting its textual output.

