The CMake logo, licensed under CC BY 2.0

Blacklist dependencies in CMake


Recently I had the following situation:

I had a command-line executable that, during the build process, depended on a library providing functionality for graphical applications.

The executable did not use any of the functions, so the easiest fix would be to remove the offending library. Unfortunately, the library was not a direct dependency, but a transitive one.

In CMake, the project looked similar to

cmake_minimum_required(VERSION 3.25)

project(app)

add_library(lib0 lib0.cpp lib0.h)

add_library(lib1 lib1.cpp lib1.h)
target_link_libraries(lib1 lib0)

add_executable(app main.cpp)
target_link_libraries(app lib1)

Note that app depends on both lib1 and lib0, but only lib1 has been added as a direct dependency to app.

While it is obvious that if app is a command-line executable you should not link it against a library that provides graphical dialogs, the issue is not always easy to spot.

The graphical library could be a dependency of the library lib0. The author of app could (should) check all dependencies manually, but what if the graphical component is added at a later point?

Analyze dependencies with CMake

First of all, it is useful to get a visual representation of all dependencies.

With such a graph, it is trivial to see the relation of all targets.

Reviewing such a graph could be part of a review process when multiple people work together, but it is error-prone, especially when a project has a high number of targets (where "high" could mean something like "more than 15").

Once it has been established that app is a command-line executable, that libgui is a GUI library, and that one should not depend on the other, why perform the review manually?

In CMake, it is currently not possible to query all transitive dependencies, only the direct ones; thus one needs to query and accumulate them by hand:

# Usage: get_recursive_link_libraries(TARGET <target> OUTPUT_LIST <output_list> [KEEP_ALIASES])
function(get_recursive_link_libraries)
    cmake_parse_arguments(PARSED_ARGS "KEEP_ALIASES" "TARGET;OUTPUT_LIST" "" ${ARGN})
    if(PARSED_ARGS_UNPARSED_ARGUMENTS)
        message(FATAL_ERROR "Unknown parameters: ${PARSED_ARGS_UNPARSED_ARGUMENTS}")
    endif()
    get_target_property(ALIAS ${PARSED_ARGS_TARGET} ALIASED_TARGET)
    if(ALIAS AND NOT PARSED_ARGS_KEEP_ALIASES)
        set(PARSED_ARGS_TARGET ${ALIAS})
    endif()
    get_target_property(LIBS ${PARSED_ARGS_TARGET} LINK_LIBRARIES)
    set(OUTPUT_LIST ${PARSED_ARGS_TARGET}) # add the target in case it appears somewhere as a dependency
    while(LIBS)
        list(POP_FRONT LIBS TARGET_LIB)
        if(NOT TARGET ${TARGET_LIB}) # might be custom linker flag or generator expression
            message(WARNING "Skip ${TARGET_LIB}, as it is not a CMake target")
            continue()
        endif()
        get_target_property(ALIAS ${TARGET_LIB} ALIASED_TARGET)
        if(ALIAS AND NOT PARSED_ARGS_KEEP_ALIASES)
            set(TARGET_LIB ${ALIAS})
        endif()
        list(FIND OUTPUT_LIST ${TARGET_LIB} VISITED)
        if(${VISITED} EQUAL -1)
            list(APPEND OUTPUT_LIST ${TARGET_LIB})
            get_target_property(LIBS_OF_LIBS ${TARGET_LIB} LINK_LIBRARIES)
            if(NOT LIBS_OF_LIBS)
                continue()
            endif()
            list(APPEND LIBS ${LIBS_OF_LIBS})
        endif()
    endwhile()
    list(REMOVE_ITEM OUTPUT_LIST ${PARSED_ARGS_TARGET})
    set(${PARSED_ARGS_OUTPUT_LIST} ${OUTPUT_LIST} PARENT_SCOPE)
endfunction()

The current implementation of get_recursive_link_libraries works with aliases 🗄️, interface libraries 🗄️, circular dependencies (every dependency is listed and analyzed only once), and private dependencies 🗄️.

Before target_link_options was introduced, target_link_libraries was also used to add custom linker flags. This should not be done anymore, but we do not necessarily control the CMake files of all the dependencies. Therefore everything that is not a CMake target is skipped. Generator expressions might not work as expected, as they are not recognized as CMake targets.
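As a sketch of what such a non-target entry looks like (the flag itself is just an example), this is the legacy style that the traversal above would skip with a warning:

```cmake
# Legacy style: a raw linker flag passed via target_link_libraries.
# It ends up in the LINK_LIBRARIES property, but it is not a CMake target,
# so get_recursive_link_libraries skips it with a warning.
target_link_libraries(app "-Wl,--no-undefined")

# Modern equivalent: pass linker flags via target_link_options instead,
# so LINK_LIBRARIES contains only actual libraries
target_link_options(app PRIVATE "-Wl,--no-undefined")
```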

I did not see any difference between using INTERFACE_LINK_LIBRARIES and LINK_LIBRARIES; I am not sure in which situations they behave differently.

At that point, verifying if libgui appears as a dependency is trivial:

get_recursive_link_libraries(
    TARGET app
    OUTPUT_LIST APP_LINK_LIBS
)

if(libgui IN_LIST APP_LINK_LIBS)
    message(FATAL_ERROR "libgui is a dependency of app. List of dependencies: ${APP_LINK_LIBS}")
endif()

Is this a problem even if the dependency is unused?

Yes, you generally want to avoid having unnecessary dependencies, and there are multiple reasons.

If the library is effectively unused, removing it at build time might not even make a difference in the final executable. You might obtain a binary-identical file after recompiling your code: thanks to LTO and dead-code elimination, the compiler and linker may be able to remove all the unused code.

The first issue is that even if the irrelevant code is dropped, the build system still needs to compile the dependencies. This has multiple side effects, especially in bigger projects with multiple targets.

Unnecessary dependencies diminish how many components can be built in parallel, so the build system might not use the available resources as efficiently as possible.

On a non-clean build, it means recompiling all dependent targets, slowing down the build-test cycle.

Once the library has been built, the compiler and linker have more work to do to remove the unnecessary code. Especially for the linker, this can take a non-trivial amount of time. LTO can increase the time required for creating the final executable by one or two orders of magnitude; on less powerful devices, it might even be impossible to enable, because it can require too many resources (RAM, for example).

The second issue is that the compiler and linker might not be able to remove all the code. This leads to larger binaries (when linking statically), or to a longer list of components that need to be on the target platform (in the case of shared libraries). Per se, this is not necessarily problematic, but what if one wants to do some post-build analysis, for example, a smoke test that an executable does not create dialogs? Or maybe access some other type of resources, like the open web?

It is possible to bypass those checks by loading libraries manually, but in this case, we are not defending ourselves against a possibly malicious executable; we are doing some sanity checks on an application we have built ourselves.

Linking an unnecessary library can also have side effects at runtime: if the library contains variables initialized at global scope, their initialization code will execute during startup.

Other types of resources

There are many other use cases one might want to guard against. The first example that comes to mind is specialized libraries. If your application does not deal with documents, does it need to link something for parsing .doc, .pdf, and other files?

But how can one avoid such situations? I think a proactive approach is unfortunately useless.

I wrote "unfortunately", because the error is not always easy to spot, and thus a proactive approach would be better.

As mentioned, this is an issue not only during build time but also at runtime, even if the application still works correctly.

An unnecessary dependency might introduce security bugs, and make the main application slower or bigger than expected, but in many cases, finding those issues is harder than diagnosing a program that misbehaves.

Thus such issues might remain undetected for a long time, even if 100% reproducible.

It would be nice to somehow proactively declare that we do not want to depend on certain libraries, but how would we list all libraries for opening PDF documents, which are not even part of our workflow?

Why clutter the build system with checks against libraries that might not even work on your machine?

Thus manually inspecting dependency graphs currently seems to be the best approach, and once an issue occurs, it might make sense to add an automatic verification.

But should we list libraries, or is it possible to use something else?

cmake_minimum_required(VERSION 3.25)

project(app)

add_library(lib0 lib0.cpp lib0.h)
set_target_properties(lib0 PROPERTIES GUILIB TRUE)

add_library(lib1 lib1.cpp lib1.h)
target_link_libraries(lib1 lib0)

add_executable(app main.cpp)
target_link_libraries(app lib1)

get_recursive_link_libraries(
    TARGET app
    OUTPUT_LIST APP_LINK_LIBS
)

foreach(LIB IN LISTS APP_LINK_LIBS)
    get_target_property(ISGUILIB ${LIB} GUILIB)
    if(ISGUILIB)
        message(FATAL_ERROR "app depends on ${LIB}, which has the property GUILIB set")
    endif()
endforeach()

Note that we can set properties on targets we do not own, as long as those are non-alias targets. This is very practical for this use case, as we can add properties to third-party dependencies without patching their build files (especially as they might not use CMake for building).

This means that if we have a set of graphical libraries (internal and external), it is possible to tag them all appropriately and verify for all relevant applications (or libraries) that they do not have a graphical library as a dependency.
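As a sketch, tagging a whole set of graphical targets in one place (the target names below are hypothetical; any internal or third-party targets could be listed):

```cmake
# Hypothetical list of graphical libraries, internal and third-party
set(GUI_LIBS libgui Qt6::Widgets wx::core)

foreach(GUI_LIB IN LISTS GUI_LIBS)
    # Only tag targets that actually exist in this build,
    # as optional dependencies might not always be present
    if(TARGET ${GUI_LIB})
        set_target_properties(${GUI_LIB} PROPERTIES GUILIB TRUE)
    endif()
endforeach()
```

With the tags in place, the same GUILIB check can be run for every command-line target.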

Enforce internal structure

A similar approach could also be used for whitelisting components.

In bigger projects with multiple targets, some libraries provide lower-level abstractions and other libraries provide higher-level abstractions. Lower-level libraries (often) should not depend on higher-level ones.

One could split the CMake projects, maybe even into multiple repositories. It would then be impossible to create an unwanted dependency by accident.

On the other hand, having everything in the same repository, in the same project, enables workflows that are otherwise fragile, error-prone, or time-consuming. For example: deprecating, removing, or updating an API of a component.

With a multi-repo setup, such changes require coordinating multiple repositories. There is a dependency between the commits of different repositories that cannot be found in the history of the project, but needs to be documented (and eventually asserted) somewhere.

This makes understanding the evolution of the code much harder and preparing a working environment more error-prone.

In a single repository, not only is it easier to see how the code evolves, but it is also possible to make such changes atomically. This means that there would never be an inconsistent state; it makes verifying the correctness of such a transition generally easier, and it is possible to recognize from the commit history how and why the code changed. Thus there are fewer opportunities for errors.

set_target_properties(lib0 PROPERTIES LEVEL 0)
set_target_properties(lib1 PROPERTIES LEVEL 1)

get_recursive_link_libraries(
    TARGET lib1
    OUTPUT_LIST APP_LINK_LIBS
)
get_target_property(LEVEL_LIB1 lib1 LEVEL)
foreach(LIB IN LISTS APP_LINK_LIBS)
    get_target_property(LEVEL_DEP_LIB ${LIB} LEVEL)
    message(WARNING "${LIB} - ${LEVEL_DEP_LIB}") # debug output
    if(NOT LEVEL_DEP_LIB GREATER_EQUAL 0) # (1)
        message(FATAL_ERROR "property level of ${LIB} has unexpected value (${LEVEL_DEP_LIB})")
    endif()
    if(LEVEL_DEP_LIB GREATER LEVEL_LIB1)
        message(FATAL_ERROR "level of ${LIB} (${LEVEL_DEP_LIB}) is higher than the level of lib1 (${LEVEL_LIB1})")
    endif()
endforeach()
  1. I’m using GREATER_EQUAL, because if(LEVEL_DEP_LIB) does not distinguish between the value not being set and the value being 0
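The check above could be generalized into a reusable helper. This is only a sketch building on get_recursive_link_libraries; the function name assert_level_not_exceeded is made up for this example:

```cmake
# Sketch: verify that no dependency of TARGET_NAME has a LEVEL property
# higher than the LEVEL of TARGET_NAME itself
function(assert_level_not_exceeded TARGET_NAME)
    get_recursive_link_libraries(TARGET ${TARGET_NAME} OUTPUT_LIST DEP_LIBS)
    get_target_property(OWN_LEVEL ${TARGET_NAME} LEVEL)
    foreach(LIB IN LISTS DEP_LIBS)
        get_target_property(DEP_LEVEL ${LIB} LEVEL)
        # fail loudly if a dependency has no (or an invalid) LEVEL
        if(NOT DEP_LEVEL GREATER_EQUAL 0)
            message(FATAL_ERROR "property LEVEL of ${LIB} has unexpected value (${DEP_LEVEL})")
        endif()
        if(DEP_LEVEL GREATER OWN_LEVEL)
            message(FATAL_ERROR "level of ${LIB} (${DEP_LEVEL}) is higher than the level of ${TARGET_NAME} (${OWN_LEVEL})")
        endif()
    endforeach()
endfunction()

# Usage:
# assert_level_not_exceeded(lib1)
```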

This issue is very specific

On the contrary: I immediately remembered reading that the Microsoft compiler had a similar issue:

If you run the compiler with the /analyze option then it loads mspft120.dll - the /analyze DLL. Then mspft120 loads msxml6.dll to load an XML configuration file. Then msxml6 loads urlmon.dll to open the stream, and finally urlmon loads mshtml.dll. Then mshtml.dll creates a window, because that’s what it does.

…​

I’m sure that every step in this process makes sense in some context - just not in this context. I suspect nobody ever noticed that mshtml.dll was being loaded, or else they didn’t run enough parallel compiles for it to matter.

— You Got Your Web Browser in my Compiler!
from randomascii 🗄️

But this is one of those types of issues that are rarely diagnosed, because the application works correctly, just not optimally.


Do you want to share your opinion? Or is there an error, some parts that are not clear enough?

You can contact me anytime.