Register function at link time


While writing down that a possible improvement for the test suite in 100 lines of code is to reduce the number of allocations, I wrote

In practice, TEST_CASE appear on different translation units that are compiled separately, thus this information is not available, and cannot be used with constexpr/consteval.

— from "A C++ test suite in 100 lines of code", section Fewer allocations

and

In other languages, tests are annotated or follow a specific naming convention that can be queried at runtime; in C++, there is unfortunately no such mechanism.

— from "A C++ test suite in 100 lines of code", section Test runner

In both cases, I avoided going into details, because neither statement is exactly true.

It is possible to annotate functions in C++ too, and it is possible to collect them during the build process.

Use-cases?

The main use case for such a process is test suites, but it is a general mechanism that provides a uniform way to access initialisation routines, a method for registering callbacks or other types of data at compile-time, and so on.

For the rest of the notes, I’ll write about function pointers, but the same logic applies for many more data types.

Register functions at runtime

Registering functions automatically at runtime means adding one global declaration for every element, and a function call for registering the elements.

Example from my other notes

#include <iostream>
#include <vector>
#include <cassert>

using function_sig = void();

constinit struct {
  std::vector<function_sig*> tests = {};
  int add(function_sig* test) {
    tests.push_back(test);
    return 0;
  }
  void execute_tests(){
    for(const auto& t : tests){
      try{
        t();
      } catch(...){
        std::cerr << "test failed\n";
      }
    }
  }
} runner;

#define CONCAT_IMPL(x, y) x##y
#define CONCAT(x, y) CONCAT_IMPL(x, y)
#define TEST_CASE(name) \
  static void name(); \
  const int CONCAT(impl, __LINE__) = runner.add(name); \
  static void name()

// example usage
int add(int a, int b) {return a + b + 1;}

TEST_CASE(test1) {
  assert(add(1, 2) == 3);
  assert(add(4, 5) == 9);
};

TEST_CASE(test2) {
  assert(1 == 1);
};

int main() {
  runner.execute_tests();
}

The main disadvantage of this approach is that it requires allocating memory, but it also has other drawbacks.

The simple fact that some code is executed before main is not ideal, and global mutable function pointers pose a security risk. On MSVC it is possible to mitigate the security risk with the EncodePointer function; something similar can be implemented for all platforms.
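A rough idea of what such a portable mitigation could look like is sketched here. Everything in it is an assumption made for illustration: the names pointer_cookie, encode_pointer, and decode_pointer are hypothetical, and XOR-ing with a per-process random value only shows the principle of not storing raw function pointers in writable memory. It is not a vetted hardening scheme and not a reimplementation of EncodePointer.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

using function_sig = void();

// Hypothetical helper: a per-process secret, generated on first use.
inline std::uintptr_t pointer_cookie() {
  static const std::uintptr_t cookie = [] {
    std::random_device rd;
    std::uintptr_t value = rd();
    // Two shifts by 16 stay well-defined even if uintptr_t is only 32 bits wide.
    value = (value << 16 << 16) ^ rd();
    return value;
  }();
  return cookie;
}

// Store this value instead of the raw function pointer.
inline std::uintptr_t encode_pointer(function_sig* f) {
  return reinterpret_cast<std::uintptr_t>(f) ^ pointer_cookie();
}

// Recover the callable pointer right before invoking it.
inline function_sig* decode_pointer(std::uintptr_t encoded) {
  return reinterpret_cast<function_sig*>(encoded ^ pointer_cookie());
}

// Sample callback, only used to exercise the round trip.
inline int sample_calls = 0;
void sample_test() { ++sample_calls; }
```

With such helpers, the global container would hold std::uintptr_t values, and the runner would call decode_pointer(value)() for each element.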

Register functions at runtime, without allocating memory

It is possible to create other data structures that do not require memory allocation, for example, an out-of-line linked list.

Thus contrary to what I’ve claimed, it is possible to collect all data without allocating memory.

#include <cstdio>

using test_signature = void();

struct node {
  private:
    inline static constinit const node* first = nullptr;
  public:
    static node const* start() noexcept {
      return first;
    }
    test_signature* const callback;
    const node* const next;
    explicit node(test_signature* c) noexcept : callback( c ), next(first)
    {
      first = this;
    }
};

#define CONCAT_IMPL(x, y) x##y
#define CONCAT(x, y) CONCAT_IMPL(x, y)
#define TEST_CASE(name) \
  static void name(); \
  const auto CONCAT(impl, __LINE__) = node(name); \
  static void name()

TEST_CASE(test1){ std::puts("test1"); }
TEST_CASE(test2){ std::puts("test2"); }

int main() {
  for(auto i = node::start(); i != nullptr; i = i->next){
    i->callback();
  }
}

The main advantage is that it does not require a memory allocation.

The disadvantage, compared to an array-like structure, is that there is only one way to traverse the container. It is not possible, for example, to traverse it from last to first, or to process multiple elements in parallel.

Nevertheless, if it is mandatory not to use any dynamic memory, it is a great alternative.

Register functions at compile-time with custom parsing

One could use a custom script that parses all sources, extracts the function name from TEST_CASE, and generates a header file generated.hpp.

The source code would look like

main.cpp
#include <cstdio>
#include "generated.hpp"

#define TEST_CASE(name) void name()

TEST_CASE(test1) {
  std::puts("test1");
}

TEST_CASE(test2) {
  std::puts("test2");
}

int main() {
  for(const auto& v : tests){
    v();
  }
}

and the generated source file like

generated.hpp
using function_sig = void();

void test1();
void test2();

constexpr function_sig* const tests[]{
  &test1,
  &test2,
};

The main advantage of this approach, apart from the fact that it does not require allocations, is that no code is executed before main, and it does not depend on how a specific compiler works.

The main disadvantages are that it requires an external tool for parsing all source files, that the functions cannot be anonymous/private, and that test functions need a unique name and must be in the global scope (unless you want to do more complex parsing and support namespaces too).

It does prove that both points I made were too pessimistic; the macro itself can be used as a tagging mechanism, and we can collect (with some additional limitations) the necessary information at compile-time by generating the relevant source code.
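To illustrate what "available at compile-time" buys, here is a small sketch. It assumes the content of generated.hpp is pasted inline (with two sample tests, the second one failing on purpose), and the names test_count and run_all are hypothetical. Since the array is constexpr, its size can be checked with static_assert and used to size a result buffer without any allocation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <iterator>

using function_sig = void();

// Sample tests; the second one fails on purpose.
void test1() {}
void test2() { throw 1; }

// What generated.hpp is assumed to contain.
constexpr function_sig* const tests[]{
  &test1,
  &test2,
};

// The number of tests is a compile-time constant.
constexpr std::size_t test_count = std::size(tests);
static_assert(test_count == 2, "the generator found exactly two tests");

// One result slot per test, sized at compile-time, no dynamic memory.
std::array<bool, test_count> run_all() {
  std::array<bool, test_count> passed{};
  for (std::size_t i = 0; i != test_count; ++i) {
    try {
      tests[i]();
      passed[i] = true;
    } catch (...) {
      passed[i] = false;
    }
  }
  return passed;
}
```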

We do not necessarily need to parse the whole C++ syntax; we only need to find the relevant TEST_CASE.

One might assume that something like

#!/usr/bin/env python3

# return first i >= input, such that content[i] is not whitespace, throws if none found
def forward_whitespace(i:int, content:str) -> int:
    while(content[i] == ' ' or  content[i] == '\n' or  content[i] == '\r'): i = i+1
    return i

# return first i >= input, such that content[i] is whitespace, throws if none found
def forward_nonwhitespace(i:int, content:str) -> int:
    while(content[i] != ' ' and content[i] != '\n' and content[i] != '\r'): i = i+1
    return i

# return first i <= input, such that content[i] is not whitespace, throws if none found
def rewind_whitespace(i:int, content:str) -> int:
    while(content[i] == ' ' or  content[i] == '\n' or  content[i] == '\r'): i = i-1
    return i

# return first i <= input, such that content[i] is whitespace, throws if none found
def rewind_nonwhitespace(i:int, content:str) -> int:
    while(content[i] != ' ' and content[i] != '\n' and content[i] != '\r'): i = i-1
    return i

TEST_CASE = "TEST_CASE"

if __name__ == '__main__':
    content = open("main-tests.cpp", "r").read()
    functions = []
    index = 0
    while True:
      index = content.find(TEST_CASE, index)
      if (index == -1): break

      assert index == 0 or content[index -1 ] == ' ' or content[index -1 ] == '\n'
      previndex = rewind_whitespace(index-1, content)
      previndex2 = rewind_nonwhitespace(previndex, content)
      if content[previndex2+1:previndex+1] == '#define' or content[previndex2+1:previndex+1] == 'define' :
        index = index + len(TEST_CASE) # otherwise will find define again and again
        continue

      index = forward_whitespace(index+len(TEST_CASE), content)
      assert content[index] == '('
      beginfunction = forward_whitespace(index+1, content)
      endfunction = forward_nonwhitespace(beginfunction, content)
      if(content[endfunction-1] == ')'):
         endfunction = endfunction -1
      assert beginfunction != endfunction
      functions.append(content[beginfunction:endfunction])

    print("#pragma once\n")
    print("using function_sig = void();\n")
    for function in functions: print("void {}();".format(function))

    print("\nconstexpr function_sig* const tests[]{")
    for function in functions: print("  &{},".format(function))
    print("};\n")

is sufficient, but it would fail in multiple ways for the following snippet:

#if 0
TEST_CASE(test4){std::puts("test4");}
#else
TEST_CASE(test5){std::puts("test5");}
#endif


#ifdef FOO
TEST_CASE(test6){std::puts("test6");}
#endif

namespace a::b{
TEST_CASE(test7){std::puts("test7");}
}

namespace {
TEST_CASE(test8){std::puts("test8");}
}
// TEST_CASE(test9){std::puts("test9");}
/*
TEST_CASE(test10){std::puts("test10");}
*/
#define MYTEST TEST_CASE
MYTEST(test11){std::puts("test11");}

#define MYTEST2(V) V
MYTEST2(TEST_CASE(test12)){std::puts("test12");}

For test4, the code should be skipped, while for test6, the parsing script needs to know with which compiler flags the file is compiled, which makes the setup more complex. One could argue that conditional compilation (thus supporting test6) is just a nice-to-have, but it would be too surprising if it is not supported.

For test7, the parsing script needs to remember in which namespace the function has been defined, while for test8, it should generate a warning (or error out) as it cannot generate code for test8.

test9 and test10 should be skipped too; at least adding support for those should not be too hard.
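As a rough idea of the work involved, the comments could be removed in a pre-pass before searching for TEST_CASE. The sketch below is written in C++ rather than in the scripting language of the parser, and strip_comments is a hypothetical helper. It assumes that raw string literals, digit separators, and line continuations inside comments do not appear in the input; a real tool would need to handle those too.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Returns a copy of src without // and /* */ comments.
// String and character literals are copied verbatim.
std::string strip_comments(std::string_view src) {
  std::string out;
  out.reserve(src.size());
  std::size_t i = 0;
  while (i < src.size()) {
    const char c = src[i];
    const char next = (i + 1 < src.size()) ? src[i + 1] : '\0';
    if (c == '/' && next == '/') {
      // Line comment: skip up to, but not including, the newline.
      while (i < src.size() && src[i] != '\n') ++i;
    } else if (c == '/' && next == '*') {
      // Block comment: skip past the closing */ and leave one space behind.
      const std::size_t end = src.find("*/", i + 2);
      i = (end == std::string_view::npos) ? src.size() : end + 2;
      out += ' ';
    } else if (c == '"' || c == '\'') {
      // Literal: copy everything up to the matching quote, honoring escapes.
      const char quote = c;
      out += src[i++];
      while (i < src.size() && src[i] != quote) {
        if (src[i] == '\\' && i + 1 < src.size()) out += src[i++];
        out += src[i++];
      }
      if (i < src.size()) out += src[i++];  // closing quote
    } else {
      out += src[i++];
    }
  }
  return out;
}
```

Running the search on the stripped text means that test9 and test10 are never seen, while a "//" inside a string literal is left untouched.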

While test11 and test12 do not seem to make much sense, someone might want to annotate the test differently, and let a custom parser collect them, just like we are doing for TEST_CASE. Although this use case might be less common, it should be supported too.

A big issue with parsing the source code is the additional restrictions it imposes.

There is another approach, another much more popular "tagging mechanism": linker sections.

Some compilers can be instructed to put specific pieces of data in specific sections of the binary. GCC, Clang, and MSVC support this approach. At run-time, it is then possible to iterate over the global symbols and find the correct code to execute.

The main disadvantages of this approach are

  • the fact that the code is not portable

  • the risk of creating invalid binaries (thus introducing errors that are normally not possible, unless the compiler or linker has a bug)

  • the risk of depending on undefined behavior

GNU ld linker script

#include <cstdio>
#include <cstdint>
#include <span>

using test_signature = void();

#define CONCAT_IMPL(x, y) x##y
#define CONCAT(x, y) CONCAT_IMPL(x, y)
#define TEST_CASE(name) \
  void name();\
  [[gnu::used]] constexpr auto CONCAT(helper, __LINE__) [[gnu::section(".tmptests")]] = &name; \
  void name()

TEST_CASE(test1){std::puts("test1");}
TEST_CASE(test2){std::puts("test2");}

std::span<test_signature* const> get_tests() noexcept {
  extern test_signature* tests_begin[];
  extern test_signature* tests_end[];
  const auto tests_size = ((uintptr_t)(tests_end) - (uintptr_t)(tests_begin))/sizeof(test_signature*);
  test_signature*const* begin = tests_begin;
  asm("":"+r"(begin));
  return std::span<test_signature* const>(begin, begin + tests_size);
}

int main() {
  auto funcs = get_tests();
  for(const auto& v : funcs){
    v();
  }
}

linkerscript.ld
SECTIONS
{
  tests (READONLY) : {
    PROVIDE(tests_begin = .);
    KEEP(*(.tmptests))
    PROVIDE(tests_end = .);
  }
}
INSERT AFTER .data;

How to compile: g++ -std=c++20 -Wl,-Tlinkerscript.ld main.cpp

Some important things to note.

The size of the array has been determined with ((uintptr_t)(tests_end) - (uintptr_t)(tests_begin))/sizeof(test_signature*), and not simply with tests_end - tests_begin.

Doing tests_end - tests_begin, or trying to iterate from tests_begin to tests_end, would invoke undefined behavior, because pointer arithmetic is defined only within an array, and in this case, there is no array tests_begin (and there is no array tests_end either).

The cast to uintptr_t makes the operation valid, as uintptr_t is an integral type, and the subtraction thus does not depend on pointer arithmetic. Also, uintptr_t, if available, is defined to be big enough to hold any pointer.
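The computation can be isolated in a small helper. This is a sketch with a hypothetical name, element_count; it assumes that the conversion from pointer to integer maps to a flat address, which is implementation-defined but holds on the mainstream platforms discussed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of T objects between two addresses, computed on integers
// instead of relying on pointer subtraction.
template <class T>
std::size_t element_count(const T* first, const T* last) noexcept {
  const auto b = reinterpret_cast<std::uintptr_t>(first);
  const auto e = reinterpret_cast<std::uintptr_t>(last);
  return static_cast<std::size_t>((e - b) / sizeof(T));
}
```

For a real array, the result matches ordinary pointer subtraction; the difference is that the helper stays well-defined when the two addresses are linker-provided symbols.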

Before creating the array, the snippet asm("":"+r"(begin)); ensures that the compiler does not use its knowledge that there is no array; the snippet has been copied from the GCC bug tracker.

Note that this piece of assembly is supported by all versions of GCC, on all platforms and operating systems; it should thus not introduce any portability issues as long as one is working with GCC (hopefully Clang too, I’m still waiting for a response).

But it is problematic when working with different compilers.

Since C++23, there is a functionality in the standard library that should permit getting rid of the inline assembly: start_lifetime_as_array.

#include <cstdio>
#include <cstdint>
#include <span>
#include <memory>

using test_signature = void();

#define CONCAT_IMPL(x, y) x##y
#define CONCAT(x, y) CONCAT_IMPL(x, y)
#define TEST_CASE(name) \
  void name();\
  [[gnu::used]] constexpr auto CONCAT(helper, __LINE__) [[gnu::section(".tmptests")]] = &name; \
  void name()

TEST_CASE(test1){std::puts("test1");}
TEST_CASE(test2){std::puts("test2");}

std::span<test_signature* const> get_tests() noexcept {
  extern test_signature* tests_begin[];
  extern test_signature* tests_end[];
  const auto tests_size = ((uintptr_t)(tests_end) - (uintptr_t)(tests_begin))/sizeof(test_signature*);
  test_signature*const* begin = tests_begin;
  begin = std::start_lifetime_as_array<test_signature*>(begin, tests_size);
  return std::span<test_signature* const>(begin, begin + tests_size);
}

int main() {
  auto funcs = get_tests();
  for(const auto& v : funcs){
    v();
  }
}

Although no assembly is required, the implementation is currently less portable, as std::start_lifetime_as_array is missing from most toolchains.
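Until the library function is widely available, both variants could be hidden behind one helper. The sketch below makes several assumptions: as_array is a hypothetical name, the feature-test macro is assumed to be __cpp_lib_start_lifetime_as, and the last branch offers no protection at all and is there only so that the code compiles on other compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <version>

// Returns p, after telling the compiler (as well as the toolchain allows)
// to treat it as the beginning of an array of n elements of type T.
template <class T>
const T* as_array(const T* p, std::size_t n) noexcept {
#if defined(__cpp_lib_start_lifetime_as)
  return std::start_lifetime_as_array<T>(p, n);
#elif defined(__GNUC__)
  (void)n;
  asm("" : "+r"(p));  // hide the origin of the pointer from the optimizer
  return p;
#else
  (void)n;
  return p;  // no barrier available: relies on the compiler not noticing
#endif
}
```

With such a helper, get_tests could call as_array(tests_begin, tests_size) instead of choosing between the inline assembly and the library call by hand.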

Another thing to notice is that I’ve used [[gnu::used]].

The main reason is that with some optimization flags, the compiler removes the functions test1, test2, and the variable placed in the .tmptests section. Since no C++ code uses those functions and variables, the compiler assumes that those can be removed. And normally that would be correct. In this particular case, we are resorting to external tools and iterating over the global memory, thus we have to instruct the compiler not to eliminate what does look like dead code.

There is one last issue with this approach: the linker emits a warning about relocation, and I did not find a way to suppress it (without disabling relocation). Fortunately, I was made aware that there is a way to avoid the linker script altogether.

GNU ld without linker script

If the section name is a valid C identifier (in particular, it does not contain a dot), then the linker will automatically generate a start and a stop symbol for you, named __start_<section> and __stop_<section>.

#include <cstdio>
#include <cstdint>
#include <span>

using test_signature = void();

#define CONCAT_IMPL(x, y) x##y
#define CONCAT(x, y) CONCAT_IMPL(x, y)
#define TEST_CASE(name) \
  void name();\
  [[gnu::used]] constexpr auto CONCAT(helper, __LINE__) [[gnu::section("tests")]] = &name; \
  void name()

TEST_CASE(test1){std::puts("test1");}
TEST_CASE(test2){std::puts("test2");}

std::span<test_signature* const> get_tests() noexcept {
  extern test_signature* __start_tests[];
  extern test_signature* __stop_tests[];
  const auto tests_size = ((uintptr_t)(&__stop_tests) - (uintptr_t)(&__start_tests))/sizeof(test_signature*);
  test_signature*const* begin = __start_tests;
  asm("":"+r"(begin));
  return std::span<test_signature* const>(begin, begin + tests_size);
}

int main() {
  auto funcs = get_tests();
  for(const auto& v : funcs){
    v();
  }
}

MSVC toolchain

Similarly to GCC and Clang, it is possible to define and use sections with the Microsoft compiler too:

#include <cstdint>
#include <cstdio>
#include <span>

using test_signature = void();

#pragma comment(linker, "/merge:tests=.rdata")
#pragma section("tests$a", read)
#pragma section("tests$b", read)
#pragma section("tests$c", read)


#define CONCAT_IMPL(x, y) x##y
#define CONCAT(x, y) CONCAT_IMPL(x, y)

#define TEST_CASE(name) \
  void name(); \
  extern __declspec(allocate("tests$b")) constexpr auto CONCAT(helper, __LINE__) = &name; \
  void name()

TEST_CASE(test1){std::puts("test1");}
TEST_CASE(test2){std::puts("test2");}

std::span<test_signature* const> get_tests() noexcept {
  __declspec(allocate("tests$a")) static constexpr test_signature* tests_begin = nullptr;
  __declspec(allocate("tests$c")) static constexpr test_signature* tests_end = nullptr;
  const auto tests_size = ((uintptr_t)(&tests_end) - (uintptr_t)(&tests_begin))/sizeof(test_signature*);
  test_signature*const* begin = &tests_begin;
  return std::span<test_signature* const>(begin, begin + tests_size);
}

int main() {
  auto funcs = get_tests();
  for(const auto& v : funcs){
    if(v){
      v();
    };
  }
}

There are some differences.

The first is that I’ve used extern instead of an attribute like gnu::used. I’ve currently found no other way to prevent the compiler from optimizing too much code out; the MSVC compiler does not seem to have an attribute like msvc::used. It is not a great solution, and I’m sure there is something better. Using #pragma comment(linker, "/include:<mangled function signature>") has its own set of issues (it does not work with constexpr, auto, and namespaces), and using something like #pragma optimize("", off) has the same limitations.

The second difference is that in the loop I added an if(v), to test whether a pointer is nullptr or not.

The main reason is that the MSVC linker might insert padding between elements, and if there is padding, then it will be memset to 0. In the case of pointers, this is not a big issue: in MSVC a pointer memset to 0 is nullptr, which is conventionally already used for determining if a pointer points to something or not.

It is problematic if you want to store other objects, structs that contain references, and so on.

The simplest workaround would be to add an indirection and always have an array of pointers.

Common solution

To sum it up, the runner could use one of the many mechanisms described here; for brevity, only the registration mechanism that exposes an array-like structure has been wrapped, not the whole runner.

#include <cstdio>
#include <cstdint>
#include <span>

using test_signature = void();

#define CONCAT_IMPL(x, y) x##y
#define CONCAT(x, y) CONCAT_IMPL(x, y)

#ifdef ALLOCATE
#include <vector>

constinit std::vector<test_signature*> tests;

#define TEST_CASE(name) \
  void name(); \
  const int CONCAT(impl, __LINE__) = (tests.push_back(&name),0); \
  void name()

std::span<test_signature* const> get_tests() noexcept {
  return std::span<test_signature* const>(tests);
}

#elif CUSTOM_PARSING
#include "generated.hpp"

#define TEST_CASE(name) void name()

std::span<test_signature* const> get_tests() noexcept {
  return std::span<test_signature* const>(tests);
}

#elif _MSC_VER
#pragma comment(linker, "/merge:tests=.rdata")
#pragma section("tests$a", read)
#pragma section("tests$b", read)
#pragma section("tests$c", read)

#define TEST_CASE(name) \
  void name(); \
  extern __declspec(allocate("tests$b")) constexpr auto CONCAT(helper, __LINE__) = &name; \
  void name()

std::span<test_signature* const> get_tests() noexcept {
  __declspec(allocate("tests$a")) static constexpr test_signature* tests_begin = nullptr;
  __declspec(allocate("tests$c")) static constexpr test_signature* tests_end = nullptr;
  const auto tests_size = ((uintptr_t)(&tests_end) - (uintptr_t)(&tests_begin))/sizeof(test_signature*);
  test_signature*const* begin = &tests_begin;
  return std::span<test_signature* const>(begin, begin + tests_size);
}

#elif __GNUC__

#define TEST_CASE(name) \
  void name();\
  [[gnu::used]] constexpr auto CONCAT(helper, __LINE__) [[gnu::section("tests")]] = &name; \
  void name()

std::span<test_signature* const> get_tests() noexcept {
  extern test_signature* __start_tests[];
  extern test_signature* __stop_tests[];
  const auto tests_size = ((uintptr_t)(&__stop_tests) - (uintptr_t)(&__start_tests))/sizeof(test_signature*);
  test_signature*const* begin = __start_tests;
  asm("":"+r"(begin));
  return std::span<test_signature* const>(begin, begin + tests_size);
}
#endif

TEST_CASE(test1){std::puts("test1");}
TEST_CASE(test2){std::puts("test2");}

int main() {
  auto funcs = get_tests();
  for(const auto& v : funcs){
    if(v){ // necessary for the MSVC implementation
      v();
    };
  }
}

Conclusion

There is no clear winner; all approaches have at least one disadvantage.

| Property                                    | global vector | global linked list | GNU ld linker | MSVC linker | custom parsing |
|---------------------------------------------|---------------|--------------------|---------------|-------------|----------------|
| Works without allocating memory             | no            | yes                | yes           | yes         | yes            |
| Works with temporary lambda                 | yes           | yes                | yes           | partially   | no             |
| Works with functions in anonymous namespace | yes           | yes                | yes           | yes         | no             |
| Works with duplicated function names        | yes           | yes                | yes           | yes         | no             |
| Works without changing build process        | yes           | yes                | no            | yes         | no             |
| Buffer contains only valid elements         | yes           | yes                | yes           | no          | yes            |
| Buffer initialized before runtime           | no            | no                 | yes           | yes         | yes            |
| Is toolchain agnostic                       | yes           | yes                | no            | no          | partially      |
| Avoids low-level C++ functionalities        | yes           | yes                | no            | no          | yes            |
| Is it const-correct                         | no            | no                 | no            | no          | yes            |

This means that supporting several methods while offering a consistent experience requires settling for the lowest common denominator.

The MSVC linker might add padding between the elements, thus the buffer might contain invalid elements that need to be tested for. The easiest way to overcome this issue is to store the elements somewhere else globally and store only pointers in the buffer.

Using custom parsing has the disadvantage that supporting conditional compilation is complex, and requires a change to the build system. The parser needs to know how single files are compiled, and possibly embed a more complex preprocessor, in case TEST_CASE is hidden behind another macro. And even if such cases are supported, lambdas and functions in anonymous namespaces are left out.

Supporting more methods also always has the disadvantage that it might lead to an explosion of test combinations if you develop for multiple systems and platforms.

If such a feature were part of the standard, it could provide an API that makes invoking UB harder and would not require a custom build step.

This would be the winning approach, as it would provide the most benefits:

  • does not need memory allocation

  • const-correct by default

  • buffer initialized before runtime

  • could work with lambda and functions in anonymous namespaces

  • no invalid elements

  • no need to fiddle with extern, attributes, starting manually lifetime, uintptr_t, and other low-level tools

  • works with functions and objects with internal linkage

There is a proposal from 2023, but it seems that no work is going on.

