The private implementation pattern (PIMPA)

The Problem

Many articles have been written about why global variables are bad. I’ve also contributed to this trend, even if I’ve focused on libraries and undefined behavior that appears even when using constants.

The main problem with mutable globals is that they can change the state of the program, and it’s difficult to see where that happens.

Consider following snippet of a single-threaded program:

int bar();

int foo(){
    a = 1; // suppose a is a global declared somewhere else
    int b = bar(); // does a == 1 affect the behaviour of bar? does bar change a?
    assert(a == 1); // might fail
    return a + b;
}

Unfortunately, the assertion might fail. To know if it might fail or not, we need to inspect bar and every function it calls to see if a changes.

Different types of globals

Globals are called this way because there are visible from everyone, thus, they break encapsulation. In a language with function, this pattern is easy to recognize, whatever can be accessed without being passed as a parameter, is generally considered a global variable.

Interestingly, different languages have different types of globals, or better some languages have globals with different scopes.

In C we have for example we can have globals scoped in a translation unit or over multiple. Or we can also ensure that every translation unit has it unshared global. But this is old news, I’ve already explored how this can affect programs (TLDR: just avoid globals, it’s so much easier).

So we already have multiple types of globals with different scopes and visibilities, and if a language support multithreading, then normally it also provides the possibility to make a distinction between globals visible between threads, or thread-local global variables.

What is common to all those types of globals is that they are accessible inside (some) functions, without being passed explicitly as parameters, and their scope is bigger than the scope of the function.

In C++ and other object-oriented languages like Java, there is another, more subtle type of "global variable": member variables.

While it seems a confusing statement, as obviously member variables cannot be accessed from everywhere, there are many similarities with globals.

int myclass::bar();

int myclass::foo(){
    a = 1; // suppose a is a member variable of myclass
    int b = bar(); // does a == 1 affect the behaviour of bar? does bar change a?
    assert(a == 1); // might fail
    reutrn a + b;
}

Even if we can argue that "global" is or is not the right word for a member variable, this code snippet has the same issues like the one where a is a global variable. bar() might access a without that a is passed as a parameter to it.

We need to see the implementation of bar and every other member function it calls to see if a changes.

In reality, a is passed as a parameter to bar through the this pointer. The language syntax makes it so, that for member functions this→bar() and bar() are equivalent.

But we could say the same thing about global variables.

They are "passed" secretly to all functions that can access them.

Thus, if we imagine that globals are like hidden parameters passed to every function, or acknowledge that "member variables are accessible inside (some) functions, without being passed explicitly as parameters, and their scope is bigger than the scope of the function" Thus if we write the code like

int myclass::bar(); // private function of myclass

int myclass::foo(){
    this->a = 1; // suppose a is a private mutable member variable of myclass
    int b = this->bar(); // does a == 1 affect the behaviour of bar? does bar change a?
    assert(this->a == 1); // might fails
}

While our questions are left unanswered, as bar might or might not do something we b, at least we know that it has free access to it. I find such code much more readable, especially when reading code not written by myself, or when dealing with big classes or class hierarchies. First of all, it makes clear to what variables bar can really access.

While classes are small it might seem a non-issue. But when working in bigger code-basis and with other people it’s common not to know all methods (public, protected and private) provided by a class.

So when reading bar() we do not generally know if it is a member function or a free function (yes, a colorized IDE can help, but a diff tool normally not).

As mutable globals are generally avoided, if possible, because we know they are bad, we assume most of the time that the code we are reading does not depend on it, and eventually be surprised when things do not work.

Inside a class, it is harder to work with the same assumption, because it is normal to access and change member variables in member functions. Personally, I’ve never worked on a code-basis (or only read for what its worth) that preferred to write this all the time.

Side-note: python does not have this readability issue, as writing self is not optional.

In case of class hierarchies and public and protected member functions (and even private overridden functions!) we need to check manually all possible call stacks to see if a gets referenced somewhere, even just to understand if we are calling a free or a member function!

Fortunately, there is a simple pattern, that also has other benefits, to improve the situation and fix the underlying problem: the private implementation pattern (PIMPA). Not to be confused with the PIMPL ("Pointer to implementation") pattern, which can provide similar benefits.

PIMPA

The main idea is to reuse the same techniques that we already use to avoid as much as possible global state: function parameters.

But it would not make much sense just to change the signature of int myclass::bar() to int myclass::bar(int). We would be passing an extra parameter for no reason, as we might still access a through this by accident instead of the function parameter.

Thus we need to make bar, and all other private member functions, free functions. And while we are at it, we do not declare them in the header where myclass is defined, but we declare and implement them in an unnamed namespace in the .cpp file.

This has multiple side-effects:

no more globals

First of all, we know that bar cannot access a behind our back anymore. This reason alone would be sufficient for me to implement most classes this way, if possible.

(re)compilation times

Similar to PIMPL (re)compilation times are generally reduced.

If we change some internal details of myclass, like the signature of a bar or another former private member function, all source files that included myclass.hpp needed to be recompiled, even if the class layout did not change and no one external to the class was able to call bar directly.

If all private functions are in the implementation file, generally only relinking is necessary, as the header remains unchanged.

The PIMPL pattern provides a similar advantage, by moving the (private) data of the class to the implementation file.

Similar to PIMPL, this pattern could also reduce the number of includes necessary in the header files, as all type parameters of the private functions do not appear in the header anymore.

Thus those two techniques are othogonal and can both be used to reduce compilation times.

But contrary to the PIMPL pattern, we are exchanging the cost of another indirection (through the pointer) and an allocation for compilation speed (unless using placement new).

reduced complexity for programmers/human reader

Another advantage is that the detail exposed in the header file, are not only less for the compiler, but also for programmers and readers, the end-user of the class.

While a lot of details might still present, like the private data, the header files contain only the functions the programmer is supposed to use, either by using the class directly or by subclassing.

This makes it also easier to identify which functions should be documented for users, and those that are not relevant.

Less work for the linker

As the function is in an anonymous namespace, they have internal linkage since C++11.

This means less work to what is normally the bottleneck during the compilation phase, as the linking phase is fundamentally serial and difficult to parallelize.

Better compiler diagnostics for unused functions

While this depends on the quality of the compiler, a compiler (or static analyzer) can generally detect more easily if a function is unused if declared in an anonymous namespace.

If a function is defined in a header file (even with private visibility), normally it needs to do something like a "whole-program" analysis to verify that the function is unused, and still, if the function is part of a library and exported, it might generally not emit any warning. Member functions also need a more complex-analysis, as there might be class hierarchies, virtual functions, and friend functions, that might provide access to the private data.

With an anonymous namespace, at least GCC and clang with -Wunused-function, are able to tell immediately that a function is unused

Internal linkage for private function

Either the whole class has internal linkage, or none of the functions have it. Normally it is not possible to define a different visibility for single functions. By removing private functions from the class definition, it is possible to give them internal linkage.

Different optimisation opportunities

I did not make any test nor benchmark, so I’m not claiming that the generated code is better.

It is possible, or even to expect, that the compiler would not inline some member functions outside of the class as it did before, as we managed to hide them better.

On the other hand, it has more possibilities to make optimizations inside the class, as it knows that those functions are not directly callable from the outside.

In particular, it could reduce the amount of generated code. Because of the as-if rule, a sufficiently sophisticated compiler (and linker) would ignore all the implementation differences and do the same optimizations and produce the best machine code.

Thus there are no hard rules, even if today the code would be better or worse, the next compiler version might change its behavior.

But it should be interesting to see if this pattern improves the quality of the code.

Enforce invariants

Generally, invariant hold when a public function of the class is called. An invariant might not hold during the processing and is then restored when the function finishes.

A simple approach to verify a complex invariant is to define a class like

template <class C>
struct check_invariant {
    C& c;
    check_invariant(class * c_) : c(*c_){
        assert(invariant(c) && "invariant failed");
    }
    ~check_invariant(){
        assert(invariant(c) && "invariant failed");
    }
};

By creating an object of this type at the beginning of every function where we expect the invariant to hold, thus on every public function and eventually on some private function, we can self-test the correctness of the class.

There exist of course the possibility to add such a check at the wrong place, like a private method, as they might work only on a subset of the whole class while the invariant might be broken.

As generally the private functions in the PIMPA pattern do not have access to the whole class, but only the subset they need to, it is not possible to try to enforce the invariant at the wrong place.

Add thread-safety at posteriori

And last, but not least, in case we decide to make the class thread-safe, it is easier to inspect its correctness.

One common method to make an existing class thread-safe is to apply the :

  • add a mutex to protect all data

  • lock the mutex in all member functions accessible outside of the class (overridden functions, public and protected functions, and friend function/classes)

  • no member function can call another member function accessible out of the class.

As I’ve discovered later, this pattern is named Thread-Safe Interface.

Thanks to PIMPA, we already know that private functions cannot call other member functions, so it’s easy to enforce the third rule.

It is not as easy to verify, for example during code-review, if every public and protected member function does not call any other member function, but if we have as a coding convention (enforced by linters) that member functions and variables need to be prepended with this when calling member functions, then it is much easier to verify the correctness.

This removes the risk of deadlocking. If there is a situation where a public functions call another one, then the common functionality can be extrapolated in a private hidden function, as already explained.

Also, this has the advantage that we can assure, as explained in this coding hint for private and public functions, that the private function should take a constant reference to the locker, to indicate that the data should be locked when accessing it.

What I did dislike about my recommendation was that it cluttered the header file with a lot of implementation details for apparently no good reason. Now all those details are really an implementation detail and the reader of the class does not need to ask themself why some private functions also take a possibly unnamed lock_guard as a parameter.

Where applying PIMPA is not possible

Unfortunately, there are cases where it is not possible to remove the implementation details from the header file.

Templated classes

If the private method needs to be aware of the template, it is generally not possible to move them from the header file to the implementation file, unless we know beforehand what all possible template types are.

Header-only libraries

The reasons should be obvious, as we have no file where to hide the implementation. Nonetheless, if possible, I would recommend to remove all private member functions and add them as stateless free functions. The drawback is that those functions can be called by someone else, but it is a minor problem, and actually promotes code reuse. (If the function is needed somewhere else, it is now easier to move it in a more appropriate place).

overridden functions

Because those functions are callable from the base class, thus they are not an implementation detail (and they should generally also be protected by a mutex in case of multithreading). If it should not be so, then the fix is to remove the virtuality of the function, so that we can remove the overridden private implementation.

If the behavior is correct, it is still possible to move the body of the implementation in a free function, and call this function from the overridden private implementation.