The C++ logo, by Jeremy Kratz, licensed under CC0 1.0 Universal Public Domain Dedication

How to force return value optimization


While discussing how RVO works in practice, and why it is important that compilers can eliminate temporaries (even after C++11), we noticed that most codebases do not seem to take advantage of it.

In practice, many codebases (especially pre-C++11 ones) avoided a potentially expensive copy by allocating on the heap and passing pointers around. Another common technique was two-phase initialization.

While both approaches work, they make the code harder to understand and maintain. Worse, both are generally less efficient than simply returning values from a function.
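To make the comparison concrete, here is a minimal sketch of the two workarounds next to a plain return by value. The `names` type and all function names are hypothetical and used only for illustration; they do not come from any particular codebase.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical payload type, used only for illustration.
typedef std::vector<std::string> names;

// Workaround 1: allocate on the heap and hand a raw owning pointer to the caller.
names* create_on_heap() {
    names* n = new names();
    n->push_back("alice");
    return n; // the caller must remember to delete it
}

// Workaround 2: two-phase initialization, the caller creates an empty object first.
void fill(names& out) {
    out.clear();
    out.push_back("alice");
}

// Returning by value: with RVO the result is built directly in the caller's object.
names create() {
    names n;
    n.push_back("alice");
    return n;
}

// Usage of the three variants side by side.
void use_all_three() {
    names* a = create_on_heap();
    assert(a->size() == 1);
    delete a; // easy to forget

    names b;  // exists in an empty state before it is usable
    fill(b);
    assert(b.size() == 1);

    names c = create(); // one step, no manual cleanup
    assert(c.size() == 1);
}
```

Only the last variant leaves no intermediate state for the caller to manage, which is why it is usually the easiest one to read and maintain.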

Return value optimization is not a novel technique. I was able to find some guidelines on how to take advantage of it in Scott Meyers, More Effective C++, published in 1996, thus even before C++ got standardized.

On Compiler Explorer it is possible to verify, starting from GCC 4.1.2 and Clang 3.0 (and with other compilers too), that RVO is applied consistently. Only newer versions of Visual Studio are available there (VS 2019), but the documentation of Visual Studio 2005 🗄️ indicates that older versions of MSVC applied RVO as well. Also, the first compiler to apply NRVO was apparently the Zortech C++ compiler in 1991.

As a reference, here is a snippet of code:

// implemented elsewhere
// the compiler cannot assume those have no side effects
void i_construct();
void i_copy();
void i_assign();

struct C {
  C(int, int) {i_construct();}
  //private:
  C(const C&) { i_copy(); }
  // the copy constructor has a visible side effect
  C& operator=(const C&){ i_assign(); return *this;}
};

C create(int i){
    C c = C(i,-1);
    return c;
    // or simply return C(i, -1);
}

int main() {
  // direct-initialization, calls C::C(int, int)
  C c1(42, -1);
  // copy-initialization from a temporary, the call to C::C(const C&) is elided
  C c2 = C(42, -1);
  C c3 = create(42);
}

Looking at the assembly (for example on Compiler Explorer), it is possible to see that i_copy and i_assign are never called, even with -O0 (or /Od for Visual Studio), that is, even when the compiler is explicitly told to disable all optimizations. It is also possible to provide an implementation for i_construct, compile the source code locally, and execute it. As far as I could see, it works without further changes with every compiler.
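For readers who prefer to check this at run time instead of reading assembly, here is a small sketch that replaces the external functions with counters. The counters and the `create_unnamed`, `create_named` and `check_unnamed_is_elided` names are assumptions made for this example. Eliding the named local is common but never required by the standard, so only the unnamed case is checked.

```cpp
#include <cassert>

// Hypothetical counters standing in for i_construct, i_copy and i_assign.
static int constructed = 0;
static int copied = 0;
static int assigned = 0;

struct C {
    C(int, int) { ++constructed; }
    C(const C&) { ++copied; }
    C& operator=(const C&) { ++assigned; return *this; }
};

// Returns a temporary: elided by mainstream compilers, guaranteed since C++17.
C create_unnamed(int i) {
    return C(i, -1);
}

// Returns a named local (NRVO): commonly elided, but never guaranteed.
C create_named(int i) {
    C c = C(i, -1);
    return c;
}

// Fails at run time if initializing from the returned temporary made a copy.
void check_unnamed_is_elided() {
    int before = copied;
    C c = create_unnamed(42);
    (void)c;
    assert(copied == before);
}

// Reports how many copies the NRVO case needed with the current compiler.
int copies_for_named() {
    int before = copied;
    C c = create_named(42);
    (void)c;
    return copied - before;
}
```

Calling `copies_for_named()` with different compilers and flags (for example `-fno-elide-constructors` on GCC and Clang) shows when the named case stops being elided.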

As this side effect is observable when building the program (and not only at run time!), it occurred to me that it can be used to detect whether the compiler performs copy elision, and to trigger a build error if it does not.

Thus it makes it possible to enforce, for example, return value optimization.

Return value optimization is not only important because of performance (which is still an argument, even after C++11), but also because there are types that are neither copyable nor movable. Consider for example a factory function that should return a std::atomic or std::lock_guard.

Before C++17, neither type can be returned from a function, because they have neither a copy nor a move constructor, and RVO can only elide a constructor call that would otherwise be valid. The most common workaround I saw is an unnecessary layer of indirection: allocating the object and returning an owning pointer to the caller.

But if the compiler supports copy elision, then there is no need to incur the cost of an additional allocation or add an error-prone interface. It is possible to create a wrapper that can take advantage of RVO.

#include <atomic>

struct force_rvo {
    std::atomic<int> obj;
    force_rvo(int i) : obj(i) {}
    force_rvo(const force_rvo&); // not implemented by design
    force_rvo(force_rvo&&); // not implemented by design
};


force_rvo foo(){
    return force_rvo(42);
}

int main(){
    auto v = foo();
}

foo is effectively returning a std::atomic by value, and this code works pre-C++17. Since C++17 such a workaround is not necessary, because copy elision is guaranteed by the standard, even in the absence of copy and move constructors.
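As a point of comparison, this is a minimal sketch of what the same factory can look like when compiling as C++17 or later. The names `make_counter` and `check_counter` are hypothetical. With an older standard this code is expected to be rejected, because std::atomic has no usable copy or move constructor.

```cpp
#include <atomic>
#include <cassert>

// Requires C++17 or later: the returned temporary initializes the
// caller's object directly, no copy or move constructor is needed.
std::atomic<int> make_counter(int start) {
    return std::atomic<int>(start);
}

// Usage: the atomic is created by the factory and lives in the caller.
void check_counter() {
    std::atomic<int> counter = make_counter(42);
    assert(counter.load() == 42);
    counter.fetch_add(1);
    assert(counter.load() == 43);
}
```

No wrapper type is involved here, so the caller works with a plain std::atomic<int>.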

So this technique is useful for those that still use C++11 or C++14.

Actually, it can even be used in C++98 and C++03 for implementing a much better std::auto_ptr, something that looks and feels like std::unique_ptr.

Leaving custom deleters out, I came up with the implementation below, which differs from std::unique_ptr as follows:

  • The class is named owning_ptr and not unique_ptr to avoid confusion, but the semantics should be the same

  • move_out plays the same role as std::move, again different names to avoid confusion

  • There are no variadic templates, no variadic macros, and no perfect forwarding, so it does not seem possible to implement something like std::make_unique. I think make_owning_from still has some advantages (template argument deduction, avoiding a leak when a function takes multiple owning_ptr constructed in place, a reminder to use std::make_unique when upgrading to newer standards, …), but it is completely optional, as it does not need access to any implementation detail of the class.

#include <cstddef> // NULL

template <class T>
struct owning_ptr {
        owning_ptr(T* ptr_) : ptr(ptr_) {}
        owning_ptr(const owning_ptr&); // not implemented by design
        owning_ptr& operator=(const owning_ptr&); // not implemented by design
        ~owning_ptr() { this->reset(); }
        T* get() { return this->ptr; }
        const T* get() const { return this->ptr; }
        void reset(T* ptr_ = NULL){
            delete this->ptr;
            this->ptr = ptr_;
        }
        T* release() {
            T* ptr_ = this->ptr;
            this->ptr = NULL;
            return ptr_;
        }
    private:
        T* ptr;
};
template <class T>
owning_ptr<T> make_owning_from(T* ptr) {
    return owning_ptr<T>(ptr);
}
template <class T>
owning_ptr<T> move_out(owning_ptr<T>& p) {
    return make_owning_from(p.release());
}

template <class U>
void swap(owning_ptr<U>& a, owning_ptr<U>& b){
    owning_ptr<U> tmp = move_out(a);
    a.reset(b.release());
    b.reset(tmp.release());
}

Why is owning_ptr much better than std::auto_ptr?

Given the following snippet

bool compare(const owning_ptr<int>& lhs, const owning_ptr<int>& rhs){
    return *lhs.get() < *rhs.get();
}

int main(){
    owning_ptr<int> arr[] = {
        make_owning_from(new int(42)),
        make_owning_from(new int(-1)),
        /* ... */
    };
    std::sort(arr, arr + sizeof(arr)/sizeof(arr[0]), compare);
}

the code either compiles and works correctly, or it triggers a linker error. With std::auto_ptr, the code has undefined behavior (as the copy constructor does not copy), while with std::unique_ptr the code always compiles and works correctly.

Compared to std::auto_ptr, owning_ptr is also a much better member variable: it forces the user to write a correct copy constructor (otherwise they get a linker error as soon as the copy is used), whereas the copy constructor generated for a class with an auto_ptr member does the wrong thing (it does not copy).
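A minimal sketch of such a class could look like the following. The `widget` type is hypothetical, and owning_ptr is repeated in a reduced form only to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>

// Reduced copy of owning_ptr, just enough for this example.
template <class T>
struct owning_ptr {
    owning_ptr(T* ptr_) : ptr(ptr_) {}
    owning_ptr(const owning_ptr&);            // not implemented by design
    owning_ptr& operator=(const owning_ptr&); // not implemented by design
    ~owning_ptr() { delete ptr; }
    T* get() { return ptr; }
    const T* get() const { return ptr; }
    void reset(T* ptr_ = NULL) { delete ptr; ptr = ptr_; }
private:
    T* ptr;
};

// Hypothetical class holding an owning_ptr member.
struct widget {
    explicit widget(int v) : value(new int(v)) {}
    // Deep copy written by hand. The implicitly generated copy constructor
    // would reference the undefined owning_ptr copy constructor and fail to link.
    widget(const widget& other) : value(new int(*other.value.get())) {}
    widget& operator=(const widget& other) {
        if (this != &other) {
            value.reset(new int(*other.value.get()));
        }
        return *this;
    }
    int get() const { return *value.get(); }
    void set(int v) { *value.get() = v; }
private:
    owning_ptr<int> value;
};

// Usage: the copy owns its own int, so changing it leaves the original alone.
void check_widget_copy() {
    widget a(1);
    widget b(a); // uses the hand-written deep copy
    b.set(2);
    assert(a.get() == 1);
    assert(b.get() == 2);
}
```

Removing the two hand-written copy operations from `widget` and copying an instance is expected to turn a silent bug (as with std::auto_ptr) into a linker error.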

What is most interesting is that owning_ptr can also be passed to a function that takes ownership of its parameter.

  • With std::auto_ptr this is possible because of the copy constructor that does not copy.

  • With std::unique_ptr this is possible thanks to the move constructor (introduced in C++11).

  • With owning_ptr, this is possible because of copy-elision (especially in C++17, but also available before).

// takes by value!
void sink(owning_ptr<int>){}

int main(){
    owning_ptr<int> p = make_owning_from(new int(42));
    sink(move_out(p));
    sink(make_owning_from(new int(42)));
    //sink(move_out(make_owning_from(new int(42))));
}

Only the last, commented-out call does not work, as move_out takes a non-const reference, which cannot bind to a temporary.

There is actually a trick to get it to work for temporaries too, and it involves a macro:

#define MOVE_OUT(p) make_owning_from((p).release())

// takes by value!
void sink(owning_ptr<int>){}

int main(){
    owning_ptr<int> p = make_owning_from(new int(42));
    sink(MOVE_OUT(p));
    sink(make_owning_from(new int(42)));
    sink(MOVE_OUT(make_owning_from(new int(42))));
}

And this got me thinking… if RVO had been standardized before move semantics, would it have made sense to standardize move semantics as we know them today?

std::unique_ptr is always (for me at least) the first example that comes to mind when thinking about move semantics. The fact that it is practically possible to implement it without move semantics (and with a feature that all compilers I know of enable even with optimizations disabled) raises some questions.

Of course, the current owning_ptr is not as good as std::unique_ptr. For example, the call to std::sort will generally produce a linker error instead of building successfully. Also, the interface is less polished: since C++11 we can express the absence of a copy constructor and introspect that property, while until C++03 it is only possible to fail hard. Also, let us not forget that RVO does not always take place, even if I suspect that in most (not all) situations it would be possible to rewrite the code to take advantage of it, if the compiler emitted a diagnostic.
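The difference in introspection can be sketched with the standard type traits available since C++11. The two structs below are hypothetical stand-ins for the C++03 and the C++11 way of disabling copies.

```cpp
#include <type_traits>

// C++03 style: the copy constructor is declared but never defined.
struct declared_only {
    declared_only() {}
    declared_only(const declared_only&); // not implemented by design
};

// C++11 style: the copy constructor is explicitly deleted.
struct deleted_copy {
    deleted_copy() {}
    deleted_copy(const deleted_copy&) = delete;
};

// The trait only sees the declaration, so the C++03 type looks copyable.
static_assert(std::is_copy_constructible<declared_only>::value,
              "a declared but undefined copy constructor looks usable");

// A deleted copy constructor is visible to the trait.
static_assert(!std::is_copy_constructible<deleted_copy>::value,
              "a deleted copy constructor is reported as not copyable");
```

This is why generic code cannot adapt to a type like owning_ptr ahead of time: the missing definition is only discovered by the linker.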

But it is interesting to see that at least for owning_ptr there was no need to introduce the new syntax used for references, and that the class actually has only one explicit constructor (and the not-implemented copy-constructor), and that move_out did not need any special access.

To conclude, these findings would have been much more interesting before C++11, when move semantics, almost always a superior choice, were not available, and before C++17, when RVO was not mandated (under certain circumstances) by the standard.

As of today, there is no need to define a force_rvo class for returning non-movable and non-copyable types, unless we want to be sure that copy elision applies in places where the standard does not require it, but that seems to be a more niche use case. Since working with codebases that still use an older standard does happen, in some domains more than others, it is at least possible to take advantage of knowing what later standards guarantee.

It is also good to know that compilers newer than those currently in use have improved their support for RVO rather than, for example, removing it. So if the current compiler supports it, it is possible to define a less error-prone std::auto_ptr and to return non-copyable types without incurring an extra allocation, knowing that the code will not break all at once when finally upgrading to a newer standard or a more modern compiler.

