The C++ logo, by Jeremy Kratz, licensed under CC0 1.0 Universal Public Domain Dedication

Tracing with GCC

Tracing in C and C++

Tracing function calls is a general technique for debugging applications (and collecting information).

When something does not work as expected, the first useful information is to know where the error happened, what’s the status of the program, and so on.

Unfortunately, the most technique I’ve seen boil down to editing the called (or supposedly called) functions and add a printf call, hidden behind a macro that would add other information like function name and line number.

An example of this technique can be found here here.

The approach works, and has multiple advantages:

  • it does not require appropriate tooling, a library, or a special environment

  • it is portable, works on all platforms and with all compilers

printf debugging is such a ubiquitous technique that it should not be simply dismissed, but it has some disadvantages:

  • it requires to change the code

  • it is a manual process

Requiring to change the code is the biggest disadvantage, especially if there is a lot of it (counting also external libraries). As the process is manual, there is no guarantee that every function is going to be traced as it should.

But even with smaller programs, tracing (or more generally logging) should not depend on the functionality of the application, just like the ability to debug programs. Fortunately, we are not required to annotate in some way the functions we want to generate debug symbols in the source code itself. Just like debugging, tracing should be orthogonal to the functionality of our programs.

__cyg_profile_func_enter and __cyg_profile_func_exit

GCC (and Clang for compatibility) seem to offer a solution to this problem: __cyg_profile_func_enter and __cyg_profile_func_exit.

When compiling a test program with -finstrument-functions, the function __cyg_profile_func_enter will be called after entering any function, and __cyg_profile_func_exit after exiting any function. One of the parameters of those functions is the address of the called functions.

Seems like there is already a solution to tracing, too bad I’ve never seen it used in practice.

I decided to try this compiler flag, as there might be some gotcha. In particular, as the documentation is lacking, I wanted to know how this feature worked in different contexts. For example:

  • are defaulted constructors/destructors traced? (either compiler-generated or user-defined with = default)

  • are (virtual) member functions and classes traced?

  • are callbacks (call to function pointers) traced?

  • is __cyg_profile_func_exit also called in case of an exception (I’m not that interested in a longjmp)

  • are inlined function traces as if they were not inlined?

  • is (std::) library code traced too?

  • how does it work with constexpr functions?

This is the test program:

#include <cstdio>
#include <algorithm>

struct C0 {
	C0() {}
};

struct C1 {
	C1() = default;
	virtual void doit() {}
	virtual ~C1() = default;
};
struct C2 : C1 {
	virtual void doit() override {}
	virtual ~C2() = default;
};

namespace {
	void fun_in_unnamed_namespace(C1& c1) {
		c1.doit();
	}
}


extern "C" {
	void __cyg_profile_func_enter(void *this_fn, void *call_site) __attribute__((no_instrument_function));
	void __cyg_profile_func_exit(void *this_fn, void *call_site) __attribute__((no_instrument_function));

	void __cyg_profile_func_enter(void *func,  void *caller) {
		std::printf("> %p\n", func);
	}
	void __cyg_profile_func_exit (void *func, void *caller) {
		std::printf("< %p\n", func);
	}
};

void throwing_foo() {
	throw 1;
}


using callback = int (*)(int);

constexpr int foo(int a) {
	return a;
}

void foo_with_callback(callback f) {
	f(1);
	int arr[]{1,2,3,4};
	std::sort(arr, arr+3);
}

int main(int argc, char**) try {
	C0 c0;
	C1 c1;
	C2 c2;
	fun_in_unnamed_namespace(c1);
	fun_in_unnamed_namespace(c2);
	foo_with_callback(foo);
	constexpr int _1 = foo(1);
	int _2 = foo(argc);

	auto l = [](){
		return 12;
	};
	auto _3 = l();
	foo_with_callback([](int){return 1;});
	throwing_foo();
} catch(...) {
	return 0;
}

What is actually traced

So let’s compile and run the program:

g++ -finstrument-functions tracing.cpp -o tracing && ./tracing

The output looks like

> 0x55a37721047e
> 0x55a3772106ae
< 0x55a3772106ae
> 0x55a3772101a6
> 0x55a3772106e4
< 0x55a3772106e4
< 0x55a3772101a6
...
> 0x55a37721078e
< 0x55a37721078e
< 0x55a37721047e

So let’s try to add some more useful information, like the function name.

With dladdr it is possible to query the function name. This would be an additional dependency on our program.

I’ve changed the tracing functions to

#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <dlfcn.h>

#include <cstdio>
#include <algorithm>


namespace {
	Dl_info get_dl_info(void *func,  void *caller) __attribute__((no_instrument_function));
	Dl_info get_dl_info(void *func,  void *caller) {
		Dl_info info;
		if(!dladdr(func, &info)){
			info.dli_fname = "?";
			info.dli_sname = "?";
		}
		if(!info.dli_fname) {
			info.dli_fname = "?";
		}
		if(!info.dli_sname) {
			info.dli_sname = "?";
		}
		return info;
	}
}
extern "C" {
	void __cyg_profile_func_enter(void *this_fn, void *call_site) __attribute__((no_instrument_function));
	void __cyg_profile_func_exit(void *this_fn, void *call_site) __attribute__((no_instrument_function));

	void __cyg_profile_func_enter(void *func,  void *caller) {
		auto info = get_dl_info(func, caller);
		std::printf("> %p [%s] %s\n", func, info.dli_fname, info.dli_sname);
	}
	void __cyg_profile_func_exit (void *func, void *caller) {
		auto info = get_dl_info(func, caller);
		std::printf("< %p [%s] %s\n", func, info.dli_fname, info.dli_sname);
	}
};

and the output is

> 0x561775cee556 [./tracing] ?
> 0x561775cee78a [./tracing] ?
< 0x561775cee78a [./tracing] ?
> 0x561775cee2d4 [./tracing] ?

For some reason, gathering information about the function name fails.

To make it work, we need to add another parameter while compiling:

g++ -finstrument-functions -ldl -Wl,--export-dynamic tracing2.cpp -o tracing && ./tracing | c++filt

As the resulting names are mangled, for readability c++filt is used for demangling

The result looks like

> 0x563b9a37f524 [./tracing] main
> 0x563b9a37f78c [./tracing] C0::C0()
< 0x563b9a37f78c [./tracing] C0::C0()
> 0x563b9a37f2d4 [./tracing] ?
> 0x563b9a37f7c2 [./tracing] C1::doit()
< 0x563b9a37f7c2 [./tracing] C1::doit()
< 0x563b9a37f2d4 [./tracing] ?
> 0x563b9a37f2d4 [./tracing] ?
> 0x563b9a37f7f8 [./tracing] C2::doit()
< 0x563b9a37f7f8 [./tracing] C2::doit()
< 0x563b9a37f2d4 [./tracing] ?
> 0x563b9a37f3a5 [./tracing] foo_with_callback(int (*)(int))
> 0x563b9a37f82d [./tracing] foo(int)
...
< 0x563b9a37f3a5 [./tracing] foo_with_callback(int (*)(int))
> 0x563b9a37f901 [./tracing] void std::sort<int*>(int*, int*)
> 0x563b9a37f68a [./tracing] __gnu_cxx::__ops::__iter_less_iter()
< 0x563b9a37f68a [./tracing] __gnu_cxx::__ops::__iter_less_iter()
> 0x563b9a37f97a [./tracing] void std::__sort<int*, __gnu_cxx::__ops::_Iter_less_iter>(int*, int*, __gnu_cxx::__ops::_Iter_less_iter)
...
> 0x563b9a37f344 [./tracing] throwing_foo()
< 0x563b9a37f344 [./tracing] throwing_foo()
> 0x563b9a380c42 [./tracing] C2::~C2()
> 0x563b9a37f86c [./tracing] C1::~C1()
< 0x563b9a37f86c [./tracing] C1::~C1()
< 0x563b9a380c42 [./tracing] C2::~C2()
> 0x563b9a37f86c [./tracing] C1::~C1()
< 0x563b9a37f86c [./tracing] C1::~C1()
< 0x563b9a37f524 [./tracing] main

and with clang:

...
< 0x402860 [./tracing] void std::sort<int*>(int*, int*)
> 0x4022f0 [./tracing] throwing_foo()
> 0x4028e0 [./tracing] C2::~C2()
> 0x402940 [./tracing] C1::~C1()
< 0x402940 [./tracing] C1::~C1()
< 0x4028e0 [./tracing] C2::~C2()
> 0x402940 [./tracing] C1::~C1()
< 0x402940 [./tracing] C1::~C1()
< 0x4023a0 [./tracing] main

The only difference I could spot between Clang and GCC is that in the case of Clang when an exception is thrown, __cyg_profile_func_exit is not called in the function where the exception is thrown. It is still called in the other functions while unwinding the stack.

Otherwise:

  • Defaulted constructor (compiler provided or = default) are not traced

  • (virtual) member functions and lambdas are traced

  • callbacks are traced

  • Library code is traced too if not compiled/linked separately, this means that all implementation details of the templated function in the standard library will be traced.

  • constexpr functions are traced only at runtime

Notice: I have used printf instead of std::cout for different reasons, but later I noticed using std::cout might cause an overflow. Probably operator<< is inlined in the user code, causing another call to __cyg_profile_func_enter. This issue did not happen with std::printf, as it compiled separately, and only linked afterward with the test program.

Thanks to dladdr, we have a viable alternative to manually edit all the functions one by one. As before, we do not need a special environment or separate tooling, but also this approach has some drawbacks:

  • Some compile-time information needs to be computed at runtime

  • Not portable on any compiler (MSVC does not support -finstrument-functions)

  • Not portable on all platforms (there might be alternatives to dladdr for those platforms)

  • There is an additional dependency (-ldl)

  • To query the function name, we needed to add all symbols to the dynamic symbol table (-Wl,--export-dynamic or, only GCC: -rdynamic).

The importance of those drawbacks depends on the application requirements. Calling dladdr at (nearly) every function call is an expensive operation, especially compared to the first approach, where the function name is available at compile-time with __func__. Also querying the function name at runtime opens the door to possible programming and runtime errors.

It occurred to me that saving the function address is sufficient for analyzing the program flow. Tools like addr2line can pinpoint in the source code which functions at which line corresponds to which address.

The problem is that because of pie (used by default by GCC in Debian) and ASLR (used by most modern Operating Systems), the function address is randomized at runtime.

Since, in my tests, the function addresses were global, and not relative to the process, addr2line is not able to find the correct function.

As I did not find a way to get an address relative to the process, I decided to convert it before calling addr2line.

By inspecting the binary it is possible to get the relative address, otherwise, it is possible to let the compiler generate a file with those pieces of information with -Wl,-Map=tracing.map:

g++ -finstrument-functions -Wl,-Map=tracing.map tracing.cpp

This will generate a tracing.map file, containing the relative address of all functions.

Thus if we can get the runtime address of one function, we can calculate the runtime offset and transform all addresses from global to local.

This is straightforward to implement:

extern "C" {
	void __cyg_profile_func_enter(void *this_fn, void *call_site) __attribute__((no_instrument_function));
	void __cyg_profile_func_exit(void *this_fn, void *call_site) __attribute__((no_instrument_function));

	void __cyg_profile_func_enter(void *func,  void *caller) {
		static const int unused = []() __attribute__((no_instrument_function))  {
			void* enter = (void*)(&__cyg_profile_func_enter);
			std::printf("%p\n\n", enter);
			return 0;
		}(); (void) unused;
		std::printf("> %p\n", func);
	}

	void __cyg_profile_func_exit (void *func, void *caller) {
		std::printf("< %p\n", func);
	}
};

With this code, the first printed address will be of the function __cyg_profile_func_enter. This code uses a local lambda to ensure thread safety.

The offset can be calculated as follows:

loc_addr_enter="$(grep __cyg_profile_func_enter tracing.map | tr -s ' ' | cut -d' ' -f2)";
./a.out > tracing.txt
glob_addr_enter="$(head -n 1 tracing.txt)";
offset=$(($glob_addr_enter-$loc_addr_enter));

and used as

glob_addr_of_some_fun="$(sed '10q;d' tracing.txt | cut -d' ' -f2)";
addr2line "$(printf '%x\n' $(($glob_addr_of_some_fun-$offset)))" -e a.out -f | c++filt

Debug symbols are not required. If those are not available, in my case, addr2line was never able to determine the line where the function is implemented, but it was always able to determine its name.

With a couple of command-line tools, we have achieved another possible solution.

Table 1. Comparison
Manual editing and __func__ dladdr addr2line

Need to edit function implementation

yes

no

no

Access to input/output parameters

yes

no

no

Access to function name

at compile-time

not all names (function with hidden visibility or missing symbols are not available), at runtime

post-run analysis

Availability

works in all configurations

needs -finstrument-functions, -Wl,--export-dynamic and dladdr

needs -finstrument-functions and addr2line

Scalability/Maintainable

does not scale well, error-prone

automated

automated

Instrument subset of source code

edit function body

attribute no_instrument_function at function declaration, or blacklist approach through -finstrument-functions-exclude-file-list= to the compiler

Requires specialized tools/environment

no

not on GNU/Linux, but requires additional symbols

not on GNU/Linux, but stripping might break the functionality

I actually like the approach with addr2line. The approach is, in my experience, not as much as ubiquitous as manual editing every function, but it permits debugging workflows that were hard to imagine before.

The main issue with dladdr is that it requires adding symbols to the executable, it has "big" overhead, and fails at least with functions with hidden visibility. Of course, using dladdr does not imply that addr2line cannot be used to fill the gaps, and the information provided by dladdr might be enough for many use cases.

The main disadvantage of addr2line is that it requires a post-run analysis on a non-stripped binary. The tools required for the analysis should be common enough to be available on most systems.

While analyzing the tooling I looked mainly at GNU/Linux systems, on Windows, there is some support for it.

With g++ and addr2line on both Cygwin and msys2, everything works as described if building with debug symbols (pass -g to the compiler), otherwise function name and line numbers could not be determined.

dladdr does not seem to work in both environments (tested with the following parameters: -g -finstrument-functions -ldl -Wl,--export-all-symbols), as it is unable to find any symbol.

The fact that there is no good "native" and simple alternative for MSVC or an alternative that does not require specialized tools/environments on Windows is a problem when not working only on GNU/Linux.

Of course, there exist alternatives, otherwise, it would not be possible to write libraries like Interceptor and subhook. Interceptor uses the _penter 🗄️ and _pexit 🗄️, which require writing assembly to work and subhook uses assembly to work on all platforms. This makes them both harder to use, as the assembly limits the portability, and because _penter and _pexit does not seem to be available on MSVC with Windows x64 as a target. Otherwise, using a tested library is normally a better approach.

One might argue that relying on symbols is bad, especially on debug symbols. But using __func__, __FILE__, or __LINE__ is doing exactly the same, just manually: adding information for debugging.


Do you want to share your opinion? Or is there an error, some parts that are not clear enough?

You can contact me anytime.