Tracing with GCC
Tracing in C and C++
Tracing function calls is a general technique for debugging applications (and collecting information).
When something does not work as expected, the first useful information is to know where the error happened, what’s the status of the program, and so on.
Unfortunately, the most technique I’ve seen boil down to editing the called (or supposedly called) functions and add a printf
call, hidden behind a macro that would add other information like function name and line number.
An example of this technique can be found here here.
The approach works, and has multiple advantages:
-
it does not require appropriate tooling, a library, or a special environment
-
it is portable, works on all platforms and with all compilers
printf debugging is such a ubiquitous technique that it should not be simply dismissed, but it has some disadvantages:
-
it requires to change the code
-
it is a manual process
Requiring to change the code is the biggest disadvantage, especially if there is a lot of it (counting also external libraries). As the process is manual, there is no guarantee that every function is going to be traced as it should.
But even with smaller programs, tracing (or more generally logging) should not depend on the functionality of the application, just like the ability to debug programs. Fortunately, we are not required to annotate in some way the functions we want to generate debug symbols in the source code itself. Just like debugging, tracing should be orthogonal to the functionality of our programs.
__cyg_profile_func_enter
and __cyg_profile_func_exit
GCC (and Clang for compatibility) seem to offer a solution to this problem: __cyg_profile_func_enter
and __cyg_profile_func_exit
.
When compiling a test program with -finstrument-functions
, the function __cyg_profile_func_enter
will be called after entering any function, and __cyg_profile_func_exit
after exiting any function. One of the parameters of those functions is the address of the called functions.
Seems like there is already a solution to tracing, too bad I’ve never seen it used in practice.
I decided to try this compiler flag, as there might be some gotcha. In particular, as the documentation is lacking, I wanted to know how this feature worked in different contexts. For example:
-
are defaulted constructors/destructors traced? (either compiler-generated or user-defined with
= default
) -
are (virtual) member functions and classes traced?
-
are callbacks (call to function pointers) traced?
-
is
__cyg_profile_func_exit
also called in case of an exception (I’m not that interested in alongjmp
) -
are inlined function traces as if they were not inlined?
-
is (
std::
) library code traced too? -
how does it work with
constexpr
functions?
This is the test program:
#include <cstdio>
#include <algorithm>
struct C0 {
C0() {}
};
struct C1 {
C1() = default;
virtual void doit() {}
virtual ~C1() = default;
};
struct C2 : C1 {
virtual void doit() override {}
virtual ~C2() = default;
};
namespace {
void fun_in_unnamed_namespace(C1& c1) {
c1.doit();
}
}
extern "C" {
void __cyg_profile_func_enter(void *this_fn, void *call_site) __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *this_fn, void *call_site) __attribute__((no_instrument_function));
void __cyg_profile_func_enter(void *func, void *caller) {
std::printf("> %p\n", func);
}
void __cyg_profile_func_exit (void *func, void *caller) {
std::printf("< %p\n", func);
}
};
void throwing_foo() {
throw 1;
}
using callback = int (*)(int);
constexpr int foo(int a) {
return a;
}
void foo_with_callback(callback f) {
f(1);
int arr[]{1,2,3,4};
std::sort(arr, arr+3);
}
int main(int argc, char**) try {
C0 c0;
C1 c1;
C2 c2;
fun_in_unnamed_namespace(c1);
fun_in_unnamed_namespace(c2);
foo_with_callback(foo);
constexpr int _1 = foo(1);
int _2 = foo(argc);
auto l = [](){
return 12;
};
auto _3 = l();
foo_with_callback([](int){return 1;});
throwing_foo();
} catch(...) {
return 0;
}
What is actually traced
So let’s compile and run the program:
g++ -finstrument-functions tracing.cpp -o tracing && ./tracing
The output looks like
> 0x55a37721047e
> 0x55a3772106ae
< 0x55a3772106ae
> 0x55a3772101a6
> 0x55a3772106e4
< 0x55a3772106e4
< 0x55a3772101a6
...
> 0x55a37721078e
< 0x55a37721078e
< 0x55a37721047e
So let’s try to add some more useful information, like the function name.
With dladdr it is possible to query the function name. This would be an additional dependency on our program.
I’ve changed the tracing functions to
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <dlfcn.h>
#include <cstdio>
#include <algorithm>
namespace {
Dl_info get_dl_info(void *func, void *caller) __attribute__((no_instrument_function));
Dl_info get_dl_info(void *func, void *caller) {
Dl_info info;
if(!dladdr(func, &info)){
info.dli_fname = "?";
info.dli_sname = "?";
}
if(!info.dli_fname) {
info.dli_fname = "?";
}
if(!info.dli_sname) {
info.dli_sname = "?";
}
return info;
}
}
extern "C" {
void __cyg_profile_func_enter(void *this_fn, void *call_site) __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *this_fn, void *call_site) __attribute__((no_instrument_function));
void __cyg_profile_func_enter(void *func, void *caller) {
auto info = get_dl_info(func, caller);
std::printf("> %p [%s] %s\n", func, info.dli_fname, info.dli_sname);
}
void __cyg_profile_func_exit (void *func, void *caller) {
auto info = get_dl_info(func, caller);
std::printf("< %p [%s] %s\n", func, info.dli_fname, info.dli_sname);
}
};
and the output is
> 0x561775cee556 [./tracing] ? > 0x561775cee78a [./tracing] ? < 0x561775cee78a [./tracing] ? > 0x561775cee2d4 [./tracing] ?
For some reason, gathering information about the function name fails.
To make it work, we need to add another parameter while compiling:
g++ -finstrument-functions -ldl -Wl,--export-dynamic tracing2.cpp -o tracing && ./tracing | c++filt
As the resulting names are mangled, for readability c++filt
is used for demangling
The result looks like
> 0x563b9a37f524 [./tracing] main > 0x563b9a37f78c [./tracing] C0::C0() < 0x563b9a37f78c [./tracing] C0::C0() > 0x563b9a37f2d4 [./tracing] ? > 0x563b9a37f7c2 [./tracing] C1::doit() < 0x563b9a37f7c2 [./tracing] C1::doit() < 0x563b9a37f2d4 [./tracing] ? > 0x563b9a37f2d4 [./tracing] ? > 0x563b9a37f7f8 [./tracing] C2::doit() < 0x563b9a37f7f8 [./tracing] C2::doit() < 0x563b9a37f2d4 [./tracing] ? > 0x563b9a37f3a5 [./tracing] foo_with_callback(int (*)(int)) > 0x563b9a37f82d [./tracing] foo(int) ... < 0x563b9a37f3a5 [./tracing] foo_with_callback(int (*)(int)) > 0x563b9a37f901 [./tracing] void std::sort<int*>(int*, int*) > 0x563b9a37f68a [./tracing] __gnu_cxx::__ops::__iter_less_iter() < 0x563b9a37f68a [./tracing] __gnu_cxx::__ops::__iter_less_iter() > 0x563b9a37f97a [./tracing] void std::__sort<int*, __gnu_cxx::__ops::_Iter_less_iter>(int*, int*, __gnu_cxx::__ops::_Iter_less_iter) ... > 0x563b9a37f344 [./tracing] throwing_foo() < 0x563b9a37f344 [./tracing] throwing_foo() > 0x563b9a380c42 [./tracing] C2::~C2() > 0x563b9a37f86c [./tracing] C1::~C1() < 0x563b9a37f86c [./tracing] C1::~C1() < 0x563b9a380c42 [./tracing] C2::~C2() > 0x563b9a37f86c [./tracing] C1::~C1() < 0x563b9a37f86c [./tracing] C1::~C1() < 0x563b9a37f524 [./tracing] main
and with clang:
... < 0x402860 [./tracing] void std::sort<int*>(int*, int*) > 0x4022f0 [./tracing] throwing_foo() > 0x4028e0 [./tracing] C2::~C2() > 0x402940 [./tracing] C1::~C1() < 0x402940 [./tracing] C1::~C1() < 0x4028e0 [./tracing] C2::~C2() > 0x402940 [./tracing] C1::~C1() < 0x402940 [./tracing] C1::~C1() < 0x4023a0 [./tracing] main
The only difference I could spot between Clang and GCC is that in the case of Clang when an exception is thrown, __cyg_profile_func_exit
is not called in the function where the exception is thrown. It is still called in the other functions while unwinding the stack.
Otherwise:
-
Defaulted constructor (compiler provided or
= default
) are not traced -
(virtual) member functions and lambdas are traced
-
callbacks are traced
-
Library code is traced too if not compiled/linked separately, this means that all implementation details of the templated function in the standard library will be traced.
-
constexpr functions are traced only at runtime
Notice: I have used printf
instead of std::cout
for different reasons, but later I noticed using std::cout
might cause an overflow. Probably operator<<
is inlined in the user code, causing another call to __cyg_profile_func_enter
. This issue did not happen with std::printf
, as it compiled separately, and only linked afterward with the test program.
Thanks to dladdr
, we have a viable alternative to manually edit all the functions one by one. As before, we do not need a special environment or separate tooling, but also this approach has some drawbacks:
-
Some compile-time information needs to be computed at runtime
-
Not portable on any compiler (MSVC does not support
-finstrument-functions
) -
Not portable on all platforms (there might be alternatives to
dladdr
for those platforms) -
There is an additional dependency (
-ldl
) -
To query the function name, we needed to add all symbols to the dynamic symbol table (
-Wl,--export-dynamic
or, only GCC:-rdynamic
).
The importance of those drawbacks depends on the application requirements. Calling dladdr
at (nearly) every function call is an expensive operation, especially compared to the first approach, where the function name is available at compile-time with __func__
. Also querying the function name at runtime opens the door to possible programming and runtime errors.
It occurred to me that saving the function address is sufficient for analyzing the program flow. Tools like addr2line
can pinpoint in the source code which functions at which line corresponds to which address.
The problem is that because of pie
(used by default by GCC in Debian) and ASLR
(used by most modern Operating Systems), the function address is randomized at runtime.
Since, in my tests, the function addresses were global, and not relative to the process, addr2line
is not able to find the correct function.
As I did not find a way to get an address relative to the process, I decided to convert it before calling addr2line
.
By inspecting the binary it is possible to get the relative address, otherwise, it is possible to let the compiler generate a file with those pieces of information with -Wl,-Map=tracing.map
:
g++ -finstrument-functions -Wl,-Map=tracing.map tracing.cpp
This will generate a tracing.map
file, containing the relative address of all functions.
Thus if we can get the runtime address of one function, we can calculate the runtime offset and transform all addresses from global to local.
This is straightforward to implement:
extern "C" {
void __cyg_profile_func_enter(void *this_fn, void *call_site) __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *this_fn, void *call_site) __attribute__((no_instrument_function));
void __cyg_profile_func_enter(void *func, void *caller) {
static const int unused = []() __attribute__((no_instrument_function)) {
void* enter = (void*)(&__cyg_profile_func_enter);
std::printf("%p\n\n", enter);
return 0;
}(); (void) unused;
std::printf("> %p\n", func);
}
void __cyg_profile_func_exit (void *func, void *caller) {
std::printf("< %p\n", func);
}
};
With this code, the first printed address will be of the function __cyg_profile_func_enter
. This code uses a local lambda to ensure thread safety.
The offset can be calculated as follows:
loc_addr_enter="$(grep __cyg_profile_func_enter tracing.map | tr -s ' ' | cut -d' ' -f2)";
./a.out > tracing.txt
glob_addr_enter="$(head -n 1 tracing.txt)";
offset=$(($glob_addr_enter-$loc_addr_enter));
and used as
glob_addr_of_some_fun="$(sed '10q;d' tracing.txt | cut -d' ' -f2)";
addr2line "$(printf '%x\n' $(($glob_addr_of_some_fun-$offset)))" -e a.out -f | c++filt
Debug symbols are not required. If those are not available, in my case, addr2line
was never able to determine the line where the function is implemented, but it was always able to determine its name.
With a couple of command-line tools, we have achieved another possible solution.
Manual editing and __func__ | dladdr | addr2line | |
---|---|---|---|
Need to edit function implementation | yes | no | no |
Access to input/output parameters | yes | no | no |
Access to function name | at compile-time | not all names (function with hidden visibility or missing symbols are not available), at runtime | post-run analysis |
Availability | works in all configurations | needs | needs |
Scalability/Maintainable | does not scale well, error-prone | automated | automated |
Instrument subset of source code | edit function body | attribute | |
Requires specialized tools/environment | no | not on GNU/Linux, but requires additional symbols | not on GNU/Linux, but stripping might break the functionality |
I actually like the approach with addr2line
. The approach is, in my experience, not as much as ubiquitous as manual editing every function, but it permits debugging workflows that were hard to imagine before.
The main issue with dladdr
is that it requires adding symbols to the executable, it has "big" overhead, and fails at least with functions with hidden visibility. Of course, using dladdr
does not imply that addr2line
cannot be used to fill the gaps, and the information provided by dladdr
might be enough for many use cases.
The main disadvantage of addr2line
is that it requires a post-run analysis on a non-stripped binary. The tools required for the analysis should be common enough to be available on most systems.
While analyzing the tooling I looked mainly at GNU/Linux systems, on Windows, there is some support for it.
With g++
and addr2line
on both Cygwin and msys2, everything works as described if building with debug symbols (pass -g
to the compiler), otherwise function name and line numbers could not be determined.
dladdr
does not seem to work in both environments (tested with the following parameters: -g -finstrument-functions -ldl -Wl,--export-all-symbols
), as it is unable to find any symbol.
The fact that there is no good "native" and simple alternative for MSVC or an alternative that does not require specialized tools/environments on Windows is a problem when not working only on GNU/Linux.
Of course, there exist alternatives, otherwise, it would not be possible to write libraries like Interceptor and subhook. Interceptor uses the _penter
🗄️ and _pexit
🗄️, which require writing assembly to work and subhook uses assembly to work on all platforms. This makes them both harder to use, as the assembly limits the portability, and because _penter
and _pexit
does not seem to be available on MSVC with Windows x64 as a target. Otherwise, using a tested library is normally a better approach.
One might argue that relying on symbols is bad, especially on debug symbols. But using __func__
, __FILE__
, or __LINE__
is doing exactly the same, just manually: adding information for debugging.
Do you want to share your opinion? Or is there an error, some parts that are not clear enough?
You can contact me anytime.