The C++ logo, by Jeremy Kratz, licensed under CC0 1.0 Universal

Force null checks

Notes published the
Notes updated the
14 - 17 minutes to read, 3454 words
Categories: c c++ java
Keywords: c c++ data structures java

After encountering (again) The Billion Dollar Mistake in a class, I wanted to experiment a bit and see which patterns could help avoid it.

Assume you have a class with a pointer, something like:

struct bar{
    int doit();
};

class foo{
    bar* b;
public:
    explicit foo(bar* b) : b(b){}

    int baz1(){
        return this->b->doit();
    }

    int baz2(){
        if(not this->b){
            return 42;
        }
        return this->b->doit();
    }
};

What can we conclude?

First, foo has a pointer to bar; whether it is owning or not does not matter.

This pointer, at least according to baz2, can be nullptr.

So baz1 has a bug if it is called when b == nullptr.

While obvious in this small example, this issue is easily overlooked in bigger classes.

There are multiple possible outcomes:

  • b is never (or should never be) nullptr: baz2 contains some dead and misleading code

  • b can be nullptr, but is never dereferenced when it is nullptr, either by accident or by design. This cannot be determined from this piece of code alone.

  • baz1 is sometimes called with b == nullptr, and we have a bug

I am interested in the case where baz1 is called with b == nullptr.

At the time of writing, for the snippet foo(nullptr).baz1(); only GCC (with -Wall -Wextra and -O1 or -O2) warns that an invalid pointer is dereferenced. So static analysis is not good enough yet: the warning is triggered only because, in this specific snippet, GCC can prove that we are dereferencing an invalid pointer.

Is it possible to avoid this issue altogether?

Null Object Pattern

One possible approach is to avoid having a nullptr at all, even when some data is optional. The Null Object Pattern works well when it is possible to express consistently what to do when an object is not there, like returning or using specific values, or throwing an error.

Most of the time, this pattern is trivial to express when there is an interface and a class hierarchy.

We could rewrite the code as:

#include <cassert>

struct interface{
    virtual int foo(int) = 0;
    virtual ~interface() = default;
};

class bar {
    interface* i;
public:
    explicit bar(interface* i_) : i([&] () -> interface* {
        if(i_){return i_;}
        static constinit struct : interface{
            int foo(int) override {return 42;};
        } ie;
        return &ie;
    }()){
        assert(this->i != nullptr);
    }

    int baz(int v){
        return this->i->foo(v); // no need to check for nullptr
    }
};

Since the case i == nullptr cannot happen anymore, there is no way to dereference nullptr.

One could even change the member pointer to a member reference, unless bar needs to be assignable.
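A minimal sketch of the reference variant, reusing the interface from above. null_interface() and doubler are hypothetical names introduced only for this example, and the Null Object is a plain function-local static (without constinit). Since bar stores a reference, its copy assignment operator is implicitly deleted, which is the assignability trade-off mentioned above:

```cpp
#include <cassert>

struct interface{
    virtual int foo(int) = 0;
    virtual ~interface() = default;
};

// Hypothetical helper: returns the shared, stateless "Null Object".
interface& null_interface(){
    static struct : interface{
        int foo(int) override {return 42;}
    } ie;
    return ie;
}

class bar {
    interface& i; // cannot be null, but makes bar non-assignable
public:
    explicit bar(interface* i_) : i(i_ ? *i_ : null_interface()){}

    int baz(int v){
        return i.foo(v); // no null check possible, or needed
    }
};

// Hypothetical implementation, only used to exercise bar.
struct doubler : interface{
    int foo(int v) override {return v * 2;}
};
```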

In Java and other languages where class hierarchies are predominant, the pattern looks even more natural:

interface Interface {
   public int doit(int i);
}

class Bar{
    private Interface i;
    private static final Interface ei = new Interface(){
        public int doit(int i){return 42;}
    };
    public Bar(Interface i_){
        this.i = (i_ == null) ? ei : i_;
        assert(this.i != null);
    }
    int baz(int v){
        return this.i.doit(v); // no need to check for null
    }
}

Declaring an anonymous class is not necessary, but if it is needed in only one place, it makes it easier to keep everything together, as long as it makes sense.

Note 📝
I am using a global variable! And not even const! In this case, it is stateless and hidden, so there should be no undefined behavior, even when using this pattern in libraries. Be careful if you define the global somewhere else, or if it has some mutable state.

If bar has an owning pointer, a similar approach is possible; there are two variants, described in the sections Owning pointer and Conditional owning pointer.

Function parameter

I was asked for an example of how to use this pattern for non-member variables.

Suppose the code looks like:

struct interface{
    int foo(int);
};

int bar(interface* i){
    if(i != nullptr){
        return i->foo(42);
    }
    return 42;
}

and one wants to avoid checking, in the implementation of bar, whether the parameter is nullptr, even though nullptr is a valid argument.

With a wrapper class that replaces the parameter type of bar, it can be done with minimal changes:

#include <cassert>

struct interface{
    virtual int foo(int) = 0;
    virtual ~interface() = default;
};

class non_null_interface {
    interface* i;
public:
    // intentionally not explicit: an interface* (or nullptr) converts implicitly
    non_null_interface(interface* i_) noexcept : i([&] () -> interface* {
        if(i_){return i_;}
        static constinit struct : interface{
            int foo(int) override {return 42;};
        } ie;
        return &ie;
    }()){
        assert(this->i != nullptr);
    }
    interface* operator->() const noexcept{
        return i;
    }
};

int bar(non_null_interface i){
    return i->foo(42); // no need to check for nullptr
}

Since non_null_interface overloads operator->, it can be used with the same syntax as interface*, thus reducing the number of changes needed in the implementation of bar.

I’ve decided not to make the constructor explicit. This makes the signature change of bar API-compatible (although not ABI-compatible); callers of the function do not need to change their code.

As long as non_null_interface is only used as a (non-owning) parameter type, there are no lifetime issues.
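To illustrate that existing call sites keep compiling, here is a small sketch that repeats non_null_interface without the explicit keyword and calls bar both with nullptr and with a real object; plus_one is a hypothetical implementation used only for this example:

```cpp
#include <cassert>

struct interface{
    virtual int foo(int) = 0;
    virtual ~interface() = default;
};

class non_null_interface {
    interface* i;
public:
    // not explicit: interface* -> non_null_interface is an implicit conversion
    non_null_interface(interface* i_) noexcept : i([&] () -> interface* {
        if(i_){return i_;}
        static struct : interface{
            int foo(int) override {return 42;}
        } ie;
        return &ie;
    }()){
        assert(this->i != nullptr);
    }
    interface* operator->() const noexcept{
        return i;
    }
};

int bar(non_null_interface i){
    return i->foo(42); // no need to check for nullptr
}

// Hypothetical implementation, only for the usage example.
struct plus_one : interface{
    int foo(int v) override {return v + 1;}
};
```

Calls such as bar(nullptr) or bar(&p) compile unchanged, since a single user-defined conversion to non_null_interface is allowed.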

Owning pointer

Just use an owning pointer, and if it is nullptr, allocate a "Null Object":

#include <memory>
#include <cassert>

struct interface{
    virtual int foo(int) = 0;
    virtual ~interface() = default;
};

class bar {
    std::unique_ptr<interface> i;
public:
    explicit bar(std::unique_ptr<interface> i_) : i([&] {
        if(i_){return std::move(i_);}
        struct empty : interface{
            int foo(int) override {return 42;};
        };
        return std::unique_ptr<interface>(std::make_unique<empty>());
    }()){
        assert(this->i != nullptr);
    }

    int baz(int v){
        return this->i->foo(v); // no need to check for nullptr
    }
};

The main advantage is that it completely hides from the user the fact that we are using a "Null Object". We could even use a "Null Object" with state, which was not advisable with a non-owning pointer (as the state would have been shared among all instances).

The downside is the additional allocation performed by std::make_unique<empty>(), which was not necessary when manually checking whether i was nullptr.
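As a sketch of a stateful Null Object, assume (hypothetically) that bar accepts an extra fallback parameter; each bar then allocates and owns its own Null Object, so two instances do not share state:

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct interface{
    virtual int foo(int) = 0;
    virtual ~interface() = default;
};

class bar {
    std::unique_ptr<interface> i;
public:
    // `fallback` is a hypothetical parameter, only there to give the Null Object some state
    explicit bar(std::unique_ptr<interface> i_, int fallback = 42) : i([&] {
        if(i_){return std::move(i_);}
        struct empty : interface{
            int value; // per-instance state, not shared with other bar objects
            explicit empty(int v) : value(v){}
            int foo(int) override {return value;}
        };
        return std::unique_ptr<interface>(std::make_unique<empty>(fallback));
    }()){
        assert(this->i != nullptr);
    }

    int baz(int v){
        return this->i->foo(v); // no need to check for nullptr
    }
};
```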

Conditional owning pointer

As std::unique_ptr has a customizable deleter, and since the "Null Object" is stateless, it is possible to avoid unnecessary allocations:

#include <memory>
#include <cassert>

struct interface{
    virtual int foo(int) = 0;
    virtual ~interface() = default;
};

class bar {
    std::unique_ptr<interface, void(*)(interface*)> i;
public:
    explicit bar(std::unique_ptr<interface> i_) : i([&] {
        using uptr = std::unique_ptr<interface, void(*)(interface*)>;
        if(i_){return uptr(i_.release(), [](interface* p){ delete p;});}
        static constinit struct : interface{
            int foo(int) override {return 42;};
        } ie;
        return uptr(&ie, [](interface*){});
    }()){
        assert(this->i != nullptr);
    }

    int baz(int v){
        return this->i->foo(v); // no need to check for nullptr
    }
};

In this case, we are converting a std::unique_ptr<interface> to a std::unique_ptr with a deleter selected at runtime. If the pointer passed to the constructor is not null, the deleter needs to delete it; otherwise, we use an empty deleter.

This is not nice, for multiple reasons.

First, we need to call release, which is often a red flag as it means that memory is managed manually.

Second, we need to define two deleters, an empty one, and another that simply calls delete.

Third, we have a std::unique_ptr that sometimes does not own the object.

It is possible to use a boolean flag in bar, or a stateful deleter for the unique_ptr, instead, but that does not change the fact that these unpleasantries are still there.

But this is what the class is actually doing: since we do not want to waste resources, the unique_ptr might or might not own the resource, depending on the input parameter.

Also, in all cases, this implementation detail leaks into the header file[1], as whether the pointer is owning or not needs to be stored as part of the state of the class.
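A minimal sketch to observe the two ownership modes of the conditional owning pointer. It repeats bar from above (calling release on the constructor parameter i_, and using a plain static instead of constinit) and adds a hypothetical counted implementation whose destructor increments a counter, so one can check that only the owned object gets deleted:

```cpp
#include <cassert>
#include <memory>

struct interface{
    virtual int foo(int) = 0;
    virtual ~interface() = default;
};

class bar {
    std::unique_ptr<interface, void(*)(interface*)> i;
public:
    explicit bar(std::unique_ptr<interface> i_) : i([&] {
        using uptr = std::unique_ptr<interface, void(*)(interface*)>;
        // owning case: take over the pointer, the deleter calls delete
        if(i_){return uptr(i_.release(), [](interface* p){ delete p;});}
        // non-owning case: shared static Null Object, the deleter does nothing
        static struct : interface{
            int foo(int) override {return 42;}
        } ie;
        return uptr(&ie, [](interface*){});
    }()){
        assert(this->i != nullptr);
    }

    int baz(int v){
        return this->i->foo(v);
    }
};

// Hypothetical implementation that counts how often it is destroyed.
struct counted : interface{
    static inline int destroyed = 0;
    int foo(int v) override {return v;}
    ~counted() override {++destroyed;}
};
```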

Note 📝
It is possible to avoid calling release by defining a deleter that is implicitly constructible from the default deleter, but then you have to call delete yourself:
#include <memory>
#include <cassert>

struct interface{
    virtual int foo(int) = 0;
    virtual ~interface() = default;
};

struct custom_delete {
    bool own = false; // non-owning unless converted from std::default_delete
    custom_delete() = default;
    custom_delete(std::default_delete<interface>) : own(true){
    }
    void operator()(interface *ptr) const {if(own){delete ptr;}}
};
using maybe_owning_ptr = std::unique_ptr<interface, custom_delete>;

class bar {
    maybe_owning_ptr i;
public:
    explicit bar(std::unique_ptr<interface> i_) : i([&] {
        if(i_){
            return maybe_owning_ptr(std::move(i_));
        }
        static constinit struct : interface{
            int foo(int) override {return 42;};
        } ie;
        return maybe_owning_ptr(&ie);
    }()){
        assert(this->i != nullptr);
    }

    int baz(int v){
        return this->i->foo(v); // no need to check for nullptr
    }
};

What if…

But what if we do not have a pointer to a class hierarchy, but, for example, a pointer to a final class? What if we are using another "nullable" type, like std::optional? In that case, we cannot replace a type with another, as the type system prevents us from doing so (at least in C++).

Compiler warnings and static analysis tools could help if there were a diagnostic like "you must check for nullptr before dereferencing", but I could not find one that worked reliably.

Custom annotation

A custom annotation could help; imagine writing something like:

struct bar{
    int doit();
};

class foo{
    [[might_be_null]] bar* b;
public:
    explicit foo(bar* b) : b(b){}

    int baz1(){
        return b->doit();
    }

    int baz2(){
        if(not this->b){
            return 42;
        }
        return this->b->doit();
    }
};

and having the compiler emit a warning if b is dereferenced without first verifying, in the same function scope, that it is not nullptr.

But such a check does not seem to exist.

Inversion of control

Since I did not find any compiler extension or static analyzer that forces the user to verify whether an optional value is present before accessing it, let’s remove the if altogether, similarly to the Null Object Pattern, but this time by moving the control flow into a reusable class.

For simplicity, let’s assume a non-owning pointer, but the code can be generalized for other data types too.

Consider the following case:

struct bar{
    int doit(int);
};

class foo{
    bar* b;
public:
    explicit foo(bar* b) : b(b){}

    int baz2(int v){
        if(not this->b){
            return 42;
        }
        return this->b->doit(v);
    }
};

and rewrite it as:

template <class T>
class fwoptional{
    T* opt;
public:
    explicit fwoptional(T* opt) : opt(opt){}

    template <typename L, typename F>
    decltype(auto) apply(L l, F f) const {
        if(opt){
            return f(*opt);
        } else {
            return l();
        }
    }
};

struct bar{
    int doit(int);
};

struct foo{
    fwoptional<bar> b;
    explicit foo(bar* b) : b(b){}

    int baz1(int v){
        return b.apply([]{return 42;}, [v](bar& d){return d.doit(v);});
    }
};

The author of foo cannot dereference nullptr by accident anymore.

Why fwoptional? Because it is a functional wrapper over optional data. Instead of providing access to a value that might not be there, like std::optional does, fwoptional provides an interface for applying functions over values that might not be there, and forces the user to also provide a function in case the value is not there.

This way, it is not possible to ignore the fact that a value might be missing.

You might wonder why we are using lambdas instead of passing the member function and all its arguments, if we just want to call a member function…

#include <utility>

template <class T>
class fwoptional{
    T* opt;
public:
    explicit fwoptional(T* opt) : opt(opt){}

    template <typename L, typename F, typename... Args>
    decltype(auto) apply(L l, F f, Args&&... args ) const {
        if(opt){
            // call the member function pointer f on *opt, forwarding the arguments
            return (opt->*f)(std::forward<Args>(args)...);
        } else {
            return l();
        }
    }
};

struct bar{
    int doit(int);
};

struct foo{
    fwoptional<bar> b;
    explicit foo(bar* b) : b(b){}

    int baz1(int v){
        return b.apply([]{return 42;}, &bar::doit, v);
    }
};

The main disadvantage is that the arguments are always evaluated, even if the optional object is not present.

For a local int, it does not matter. If one of the arguments is a class with an expensive constructor, then wrapping the function call in a lambda instead of passing the arguments, even if more verbose, has the same overhead as writing the if manually. The same holds if the int is the result of an expensive computation, hidden behind a function that we do not want to call if we are going to discard the result.
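A small sketch of the difference, assuming a pointer-based fwoptional that offers both styles side by side (named apply and apply_args here); expensive() and evaluations are hypothetical names, standing in for an expensive computation and a counter of how often it runs:

```cpp
#include <cassert>
#include <utility>

template <class T>
class fwoptional{
    T* opt;
public:
    explicit fwoptional(T* opt) : opt(opt){}

    // callback style: the work is wrapped in f and only runs if opt is set
    template <typename L, typename F>
    decltype(auto) apply(L l, F f) const {
        if(opt){ return f(*opt); }
        return l();
    }

    // argument style: args have already been evaluated by the caller
    template <typename L, typename F, typename... Args>
    decltype(auto) apply_args(L l, F f, Args&&... args) const {
        if(opt){ return (opt->*f)(std::forward<Args>(args)...); }
        return l();
    }
};

struct bar{
    int doit(int v){ return v + 1; }
};

// Hypothetical expensive computation; counts its invocations.
int evaluations = 0;
int expensive(){ ++evaluations; return 41; }
```

With apply_args, expensive() runs even when no bar is present; with the lambda, it runs only when doit is actually called.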

Also, one might find the first lambda, invoked if the object is missing, a dubious design choice, as it leads to unnecessarily verbose code. If in most cases you just want to return a value, why not write something like:

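#include <utility>

template <class T>
class fwoptional{
    T* opt;
public:
    explicit fwoptional(T* opt) : opt(opt){}

    // auto instead of decltype(auto): with decltype(auto), `return l;` would deduce
    // a (dangling) L&&, which also conflicts with the type deduced from the other branch
    template <typename L, typename F, typename... Args>
    auto apply(L&& l, F f, Args&&... args ) const {
        if(opt){
            return (opt->*f)(std::forward<Args>(args)...);
        } else {
            return l;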
        }
    }
};

struct bar{
    int doit(int);
};

struct foo{
    fwoptional<bar> b;
    explicit foo(bar* b) : b(b){}

    int baz1(int v){
        return b.apply(42, &bar::doit, v);
    }
};

The code is even more concise but does not work well if we want to do something else (like throwing an exception), and needs special casing in case of a function that does not return any value.
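For comparison, a sketch of what the callable form allows: the missing-value handler can throw instead of returning a value. It assumes the first, lambda-based fwoptional from above; required_doit and the exception message are hypothetical:

```cpp
#include <cassert>
#include <stdexcept>

template <class T>
class fwoptional{
    T* opt;
public:
    explicit fwoptional(T* opt) : opt(opt){}

    template <typename L, typename F>
    decltype(auto) apply(L l, F f) const {
        if(opt){ return f(*opt); }
        return l();
    }
};

struct bar{
    int doit(int v){ return v * 2; }
};

// Hypothetical helper: returns doit(v), or throws if no bar is available.
int required_doit(fwoptional<bar> b, int v){
    return b.apply(
        // explicit return type: the lambda never returns, but must match f's int
        []() -> int { throw std::runtime_error("bar is missing"); },
        [v](bar& d){ return d.doit(v); });
}
```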

All the presented approaches have their advantages and disadvantages; depending on the situation, you might prefer one of the presented apply functions over another.

Thankfully, with if constexpr it is possible to condense them all into a couple of overloads.

For completeness, the following version also works with "optional" types like std::unique_ptr, std::optional and custom-made classes, as long as they are convertible to bool (for testing the absence) and provide operator* for accessing the object when present. (Strictly speaking, operator-> is not required, as one can use operator* and then call the member function, but it should probably be there for completeness.) Note that, since the member functions are const, dereferencing a std::optional yields a const reference, so the functions passed must be callable on a const object.

#include <functional>
#include <type_traits>
#include <utility>

// dependent "false": static_assert(false) in a discarded branch is rejected by many compilers before C++23
template <class>
inline constexpr bool always_false = false;

template <class T>
class fwoptional{
    T opt;
public:
    explicit fwoptional(T opt) : opt(std::move(opt)){}

    template <typename L, typename F>
    decltype(auto) apply(L&& l, F f) const {
        using ret = std::invoke_result_t<F, decltype(*opt)>;
        if(opt){
            return f(*opt);
        } else if constexpr (std::is_invocable_v<L&&>){
            return l();
        } else if constexpr(std::is_convertible_v<L&&, ret>){
            return ret(l);
        } else {
            static_assert(always_false<L>, "do not know what to do..., "
            "either wrong function overload or return type");
        }
    }

    template <typename L, typename F, typename... Args>
    decltype(auto) apply_args(L&& l, F f, Args&&... args ) const {
        using ret = std::invoke_result_t<F, decltype(*opt), Args&&...>;
        if(opt){
            // std::invoke, unlike ->*, also works when T is not a raw pointer
            return std::invoke(f, *opt, std::forward<Args>(args)...);
        } else if constexpr (std::is_invocable_v<L&&, Args&&...>){
            return l(std::forward<Args>(args)...);
        } else if constexpr (std::is_invocable_v<L&&>){
            return l();
        } else if constexpr(std::is_convertible_v<L&&, ret>){
            return ret(l);
        } else {
            static_assert(always_false<L>, "do not know what to do..., "
            "either wrong function overload or return type");
        }
    }
};

struct bar{
    int doit(int);
};

struct foo{
    fwoptional<bar*> b;
    explicit foo(bar* b) : b(b){}

    int baz1(int v){

        // choose the preferred/more appropriate method
        b.apply_args(42, &bar::doit, v);
        b.apply_args([]{return 42;}, &bar::doit, v);
        b.apply_args([](int){return 42;}, &bar::doit, v);

        b.apply(42, [v](bar& b){return b.doit(v);});
        b.apply([]{return 42;}, [v](bar& b){return b.doit(v);});
        return 0;
    }
};
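A condensed sketch to check that the generalized apply also works with other "optional" types. It keeps only apply, replaces static_assert(false) with a dependent always_false helper (a hypothetical name), and declares bar::doit as const so that it can be called through a const std::optional:

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <type_traits>
#include <utility>

// Dependent "false": the static_assert only fires if that branch is instantiated.
template <class>
inline constexpr bool always_false = false;

template <class T>
class fwoptional{
    T opt;
public:
    explicit fwoptional(T opt) : opt(std::move(opt)){}

    template <typename L, typename F>
    decltype(auto) apply(L&& l, F f) const {
        using ret = std::invoke_result_t<F, decltype(*opt)>;
        if(opt){
            return f(*opt);                // value present
        } else if constexpr (std::is_invocable_v<L&&>){
            return l();                    // fallback is a callable
        } else if constexpr (std::is_convertible_v<L&&, ret>){
            return ret(l);                 // fallback is a plain value
        } else {
            static_assert(always_false<L>, "either wrong function overload or return type");
        }
    }
};

struct bar{
    int doit(int v) const { return v + 1; } // const: callable through const std::optional
};
```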

In Java, the pattern would look like

import java.util.function.*;
import java.util.*;

interface Interface {
   public int doit(int i);
}

class FWOptional<T> {
    private T opt;
    public FWOptional(T opt){
        this.opt = opt;
    }
    public FWOptional(Optional<T> opt){
        this.opt = (opt != null && opt.isPresent()) ? opt.get() : null;
    }
    public <R> R apply(Supplier<R> l, Function<T,R> f){
        assert(l != null && f != null);
        if(opt == null){
            return l.get();
        } else {
            return f.apply(opt);
        }
    }
    public void apply(Runnable l, Consumer<T> f){
        assert(l != null && f != null);
        if(opt == null){
            l.run();
        } else {
            f.accept(opt);
        }
    }
}

class Bar {
    private FWOptional<Interface> i;

    public Bar(Interface i){
        this.i = new FWOptional<Interface>(i);
    }

    public void fun(int v){
        int j = this.i.apply(
            () -> 0,
            (Interface ii) -> ii.doit(v)
        );
    }
}


class Bar2 {
    private FWOptional<Interface> i;

    public Bar2(Optional<Interface> i){
        this.i = new FWOptional<Interface>(i);
    }

    public void fun(int v){
        this.i.apply(
            () -> {},
            (Interface ii) -> {}
        );
    }
}

Since the syntax for lambdas in Java can be much more concise than in C++, there is little need to provide overloads for passing parameters around instead of callbacks.

There seems to be a missing functional interface: one that takes no parameters and returns nothing. Granted, such a function might be dubious, as it either always throws/exits/aborts, does nothing, or changes the state of the program through global variables or this. Thus the need for such an interface does not arise often. Runnable has the correct signature but is documented as an interface for code executed by a thread. Maybe I’m giving too much weight to the documentation and the name of the class, as it works as desired. Implementing a Procedure interface is not difficult; the issue is making different libraries work together if everyone rolls their own interface. Thus using Runnable seems to be the best option.

For FWOptional, the overload that takes a Consumer<T> also needs a function that takes no parameter. The developer has to state if they want to do nothing, throw an error, or do something else.

Omitting the else branch is something I want to avoid for this class. If it is omitted, we are nearly back to square one; we cannot see if the developer forgot to handle the case when the optional data is not there.

Containers

For some types, there is already a value that can often be used as a "Null Object". In the case of containers, a null reference and an empty container can be treated equally in many scenarios.

#include <vector>
#include <cassert>

class bar{
    const std::vector<int>* data;
public:
    explicit bar(const std::vector<int>* data) : data(data){}

    bool is_there_data() const {
        if(data){
            return not data->empty();
        }
        return false;
    }
};

Similarly to the Null Object Pattern, but without creating a subclass, it is possible to use a static empty instance:

#include <vector>
#include <span>
#include <cassert>

class bar{
    const std::vector<int>* data;
public:
    explicit bar(const std::vector<int>* data_) : data([&](){
        if(data_){return data_;}
        static const std::vector<int> sdata{};
        return &sdata;
    }()){
        assert(data != nullptr);
    }

    bool is_there_data() const {
        return not data->empty();
    }
};

In this example, using a non-owning view type avoids creating a global instance and removes an indirection:

#include <vector>
#include <span>

class bar{
    std::span<const int> data;
public:
    explicit bar(const std::vector<int>* data) :
      data( data ? std::span<const int>(*data) : std::span<const int>()){
    }

    bool is_there_data() const {
        return not data.empty();
    }
};

If the absence of a container means using some predefined data, it is still possible to reuse the same pattern:

#include <vector>
#include <span>

class bar{
    std::span<const int> data;
public:
    explicit bar(const std::vector<int>* data_) : data([&]() -> std::span<const int>{
        if(data_){return *data_;}
        static constexpr int default_data[]{0, -1, 42};
        return default_data;
    }()){
    }

    bool is_there_data() const {
        return not data.empty();
    }
};

1. Yes, one could set a global flag in the implementation file…, but don’t do it.

Do you want to share your opinion? Or is there an error, some parts that are not clear enough?

You can contact me anytime.