The C++ logo, by Jeremy Kratz, licensed under CC0 1.0 Universal Public Domain Dedication

Guaranteed copy elision


Copy elision and return value optimization (RVO) are not novel techniques, yet many programmers are not aware of them or do not know how they work in practice. Since C++17, the standard also guarantees that no copies (or moves) are made when an object is initialized from a temporary.

Brief history of copy elision

Consider the following piece of code: std::string s = std::string("foo");.

It seems that we are creating a temporary string on the right side and then copying it into s. For this reason, some developers prefer to write std::string s("foo");.

Similarly, consider

std::string foo(){
    return std::string("foo");
}


std::string s = foo();

It looks like we are creating a string inside the function foo, copying it out, and assigning it to s.

In both cases, as the compiler is permitted to elide copies, the string will only be created once.

While less common, the same holds for

std::string s = std::string(std::string(std::string(std::string("foo"))));

Instead of making 4 copies/temporaries, a good compiler will just initialize the string once.
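Since there is no diagnostic for unnecessary copies, one way to observe what happens is to instrument a type by hand. The following is a minimal sketch, assuming C++17 or later; tracer, make_tracer and count_copies_and_moves are hypothetical names introduced only for this illustration.

```cpp
#include <string>
#include <utility>

// Hypothetical instrumented type: counts how often it is copied or moved.
struct tracer {
    static inline int copies = 0;
    static inline int moves = 0;
    std::string value;

    explicit tracer(std::string v) : value(std::move(v)) {}
    tracer(const tracer& other) : value(other.value) { ++copies; }
    tracer(tracer&& other) noexcept : value(std::move(other.value)) { ++moves; }
};

// Nested temporaries, as in the std::string example above.
tracer make_tracer() {
    return tracer(tracer(tracer("foo")));
}

// Returns the number of copy and move constructor calls observed
// while initializing one variable from make_tracer().
int count_copies_and_moves() {
    tracer::copies = 0;
    tracer::moves = 0;
    tracer t = make_tracer();
    (void)t;
    return tracer::copies + tracer::moves;
}
```

With a C++17 compiler, count_copies_and_moves() returns 0. With older standards the result depends on whether the compiler elides the copies, which is usually the case.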

Especially before C++11, since there were no move semantics and this optimization was not guaranteed, one cannot blame developers for making it a habit to follow the standard C technique: pass a complex object by mutable reference (or pointer) and initialize it inside the function, instead of returning an initialized value. Another factor that "forced" developers to use this C technique is that there is, to my knowledge, no automated diagnostic for checking whether some unnecessary copy is made between function calls.

C++11

With C++11, thanks to move semantics, copy elision is no longer an optimization that makes a big difference for many use-cases.

Returning a string or vector from a function by value? For many scenarios, it means swapping three pointers.

Nevertheless, even a move can be expensive, for example for a big object that stores its data inline, like an array on the stack. In those cases, copy elision is still an important optimization.
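The difference between a cheap and an expensive move can be made visible by looking at where the data lives after the move. This is only a sketch; both function names are made up for the example.

```cpp
#include <array>
#include <utility>
#include <vector>

// True if moving the vector handed over its heap buffer
// instead of copying the elements.
bool vector_move_steals_buffer() {
    std::vector<int> source(1000, 7);
    const int* buffer = source.data();
    std::vector<int> target = std::move(source);
    return target.data() == buffer;
}

// A std::array stores its elements inline, so "moving" it
// moves every element, which for int is a plain copy.
bool array_move_copies_elements() {
    std::array<int, 1000> source{};
    source.fill(7);
    std::array<int, 1000> target = std::move(source);
    // Both arrays own separate storage and still hold all values.
    return target.data() != source.data()
        && source[999] == 7
        && target[999] == 7;
}
```

For the vector, the move transfers a handful of pointers regardless of the number of elements. For the array, the cost grows with its size, which is why eliding the move still matters.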

C++17

Nothing changed until C++17, when the paper "Wording for guaranteed copy elision through simplified value categories" got approved.

Contrary to C++11, it did not add a new value category (like xvalue, rvalue, lvalue, …) for identifying those expressions where RVO would apply; instead, it redefined "where" a value is created.

The rules of the language have been changed so that there is no copy in the first place; for users, the effect is nearly the same as with copy elision.

For example, after C++11 and before C++17, std::string{"foo"} is a prvalue of type std::string with value "foo" (Before C++11 it is simply an rvalue).

With C++17, std::string{"foo"} is not a real object anymore, but it is still a prvalue that can be used to initialize some object.
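The value category of an expression can be checked at compile time: decltype((expr)) yields T& for an lvalue, T&& for an xvalue, and plain T for a prvalue. A small sketch, where the variable named exists only for this illustration:

```cpp
#include <string>
#include <type_traits>
#include <utility>

std::string named = "foo";

// A variable name used as an expression is an lvalue.
static_assert(std::is_same_v<decltype((named)), std::string&>);

// std::move(named) is an xvalue: it still refers to an existing object.
static_assert(std::is_same_v<decltype((std::move(named))), std::string&&>);

// std::string{"foo"} is a prvalue: since C++17 it is only a recipe
// for initializing some object, not an object by itself.
static_assert(std::is_same_v<decltype((std::string{"foo"})), std::string>);
```

Only the last kind of expression benefits from the C++17 guarantee.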

Consider:

auto foo = std::string("foo");

Before C++17, this snippet creates a temporary std::string, which is then used for initializing foo through the move constructor, or copy-constructor before C++11. Because the compiler can optimize the code, in practice foo is initialized directly, but the appropriate constructors need to exist. Since C++17, foo is initialized with the prvalue std::string("foo").

Since C++17, no temporary object is created here, thus there are no more calls to copy or move constructors!

This is true even when returning temporaries from functions:

std::string bar(){
    return std::string("foo");
}
auto foo = bar();

Before C++17, this snippet creates a temporary std::string inside bar. This temporary is moved (or copied before C++11) outside of bar() and is finally used for initializing foo through the move constructor (or copy constructor before C++11). Again, because the compiler can optimize the code, in practice foo is initialized directly, but the constructors need to exist.

Since C++17, return std::string("foo"); initializes the result object of bar(), which is foo.

Again, no temporary object is created, thus there are no more calls to copy or move constructors!

While we still say "Return value optimization" and "copy elision", there is in fact no optimization or elision, as there are no objects to optimize away.

Thanks to the new wording, the following snippet

struct s {
    s() = default;
    s(const s&) = delete;
    s(s&&) = delete;
};
s make() { return s(); }
auto x = make();

compiles in C++17 and later standards.
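The guarantee also holds when the prvalue travels through more than one function. In the following sketch (C++17 or later, all names are hypothetical) the object records its own address during construction, which makes it possible to verify that it has been created directly in the final variable.

```cpp
// Hypothetical type that can be neither copied nor moved.
struct pinned {
    const pinned* self;  // address seen by the constructor

    pinned() : self(this) {}
    pinned(const pinned&) = delete;
    pinned(pinned&&) = delete;
};

pinned make_pinned() { return pinned(); }

// The prvalue is passed through a second function unchanged.
pinned forward_pinned() { return make_pinned(); }

// True if the object was constructed in the storage of the variable p.
bool constructed_in_place() {
    pinned p = forward_pinned();
    return p.self == &p;
}
```

Since pinned has no usable copy or move constructor, the compiler has no choice but to construct the object in place, and constructed_in_place() returns true.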

Remaining use-cases

Copy elision is still a thing, as the rules changed in C++17 affect only temporaries (prvalues), not named objects.

Consider for example

struct s {
  s(int){}
  s(const s&) = delete;
  s(s&&) = delete;
};

s make() {
  auto v = s(42);
  return v; // does not compile
}

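As v is not a temporary, but a full-fledged object, returning it still requires the existence of a copy or move constructor, even if it is never called.

As already described in these notes, it is possible to force RVO by declaring the copy and move constructors without defining them. The code compiles, but fails to link if the compiler does not elide the call: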

struct force_rvo {
  force_rvo(int){}
  // declared as public, but not defined anywhere
  force_rvo(const force_rvo&);
  force_rvo(force_rvo&&);
};

// here we still need copy elision
force_rvo make() {
  auto m =  force_rvo(42);
  return m;
}

In some cases, one might want to do some operation after the creation of the object, for example

  • logging

  • conditionally changing the object state

  • constructing the object in multiple phases

While it is possible to add some of those actions in a destructor, for example

struct custom_action{
  ~custom_action();
};
struct s {
  s(int){}
  s(const s&) = delete;
  s(s&&) = delete;
};

// no need for copy elision
s make() {
  custom_action _;
  return s(42);
}

it is hard to ensure that the code behaves as if one could take advantage of RVO.

What if, for example, the constructor of s throws, and we do not want to execute the custom action? Or what if the custom action could throw an exception?

In those cases, it is harder to get the correct logic.
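For the first question, one possible approach is to let the destructor check whether it runs because of an exception. The following is a minimal sketch, assuming C++17 for std::uncaught_exceptions(); on_success and log_lines are hypothetical names, and the sketch does not give the action access to the created object.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

std::vector<std::string> log_lines;  // hypothetical log sink

// Runs its action in the destructor, but only if no new exception
// is in flight compared to when the guard was created.
struct on_success {
    int pending = std::uncaught_exceptions();
    const char* message;

    explicit on_success(const char* m) : message(m) {}
    ~on_success() {
        // Caveat: if the action itself throws, the program terminates,
        // as destructors are noexcept by default.
        if (std::uncaught_exceptions() == pending) {
            log_lines.push_back(message);
        }
    }
};

struct s {
    explicit s(int v) {
        if (v < 0) throw std::invalid_argument("negative value");
    }
    s(const s&) = delete;
    s(s&&) = delete;
};

// no need for copy elision: a prvalue is returned
s make(int v) {
    on_success guard("created s");
    return s(v);
}
```

If the constructor of s throws, the guard is destroyed during stack unwinding and skips the action. The second question remains open: an action that can throw does not fit well in a destructor.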

To my knowledge, no proposal that makes such code well-formed has been adopted so far, probably because there is not enough motivation and because it might be harder to define the intended semantics.

Best practices

While RVO is not generally guaranteed, there are a couple of guidelines that can help to exploit it.

If the code is not part of a public API, and if you get to decide which compilers to support, you can force RVO.

It is then possible to use something like force_rvo where it makes sense. If some future version of the compiler we are interested in does not compile the code anymore, it is still possible to adapt the code without changing the interface.

Conversely, I would not recommend force_rvo as part of the API of a (public) library.

Do not save the result in a variable just to return it

Since C++17, returning a temporary is guaranteed not to make any copy or move. Even before C++17, chances are very high that copies and moves are elided.

One exception to this guideline might be for easing debugging, as it is harder to inspect temporary variables (but not impossible). An alternate approach might be stepping into the constructor.

Do not introduce unnecessary allocations or a two-phase-init to avoid a move or copy when returning a value

This holds especially if the indirection means a two-phase initialization or a memory allocation.

The first is error-prone and might introduce an uninitialized state to a class, which needs to be checked at runtime.

The second might be more expensive than copying the value (worst-case scenario), introduces an optimization barrier, and also introduces a new state (nullptr) from the caller’s perspective.

Do not std::move on a return statement

Doing so disables different kinds of optimizations and forces a call to the move constructor. It also makes the code less readable.
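The effect can be observed by counting constructor calls. A minimal sketch, assuming C++17 or later; counted, plain_return, moved_return and moves_during are hypothetical names used only here.

```cpp
#include <utility>

// Hypothetical type that counts copy and move constructor calls.
struct counted {
    static inline int copies = 0;
    static inline int moves = 0;

    counted() = default;
    counted(const counted&) { ++copies; }
    counted(counted&&) noexcept { ++moves; }
};

counted plain_return() {
    counted c;
    return c;  // candidate for NRVO; otherwise moved implicitly
}

counted moved_return() {
    counted c;
    return std::move(c);  // not a plain name anymore: always a move
}

// Number of move constructor calls made while initializing
// a variable from make().
template <class F>
int moves_during(F make) {
    counted::moves = 0;
    counted result = make();
    (void)result;
    return counted::moves;
}
```

moves_during(moved_return) is always 1, while moves_during(plain_return) is at most 1 and is 0 whenever the compiler applies NRVO. Some compilers can also warn about the std::move in moved_return.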

While it is true that the compiler might not apply RVO, there are some patterns to decrease the chances of generating unnecessary moves/copies. In general, it seems that as long as there is only one possible object to return, all three compilers tested (GCC, Clang, and MSVC) will happily apply RVO:

struct s{
    explicit s(int);
    s(const s&);
    s& operator=(const s&);
    s(s&&) noexcept;
    s& operator=(s&&) noexcept;
    ~s();
};

void bar();

s fun(int i){
    auto a = s(i);
    bar();
    return a;
}

But if there are multiple return paths, even if they all return the same object, like in the following example

s fun(int i){
    auto a = s(i);
    bar(a);
    return i ? a : a;
}

then none of GCC, Clang, and MSVC applies RVO. Even worse, in this case the value is copied instead of moved, so the code won’t compile for classes that do not have a copy constructor, like std::unique_ptr.

Notice that with an if-else, NRVO is still applied by GCC and Clang, but not by MSVC:
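The problem with move-only types can be sketched with std::unique_ptr. The function pick is a made-up example; the commented-out line shows the variant that does not compile.

```cpp
#include <memory>

std::unique_ptr<int> pick(bool first) {
    auto a = std::make_unique<int>(1);
    auto b = std::make_unique<int>(2);

    // return first ? a : b;
    // Does not compile: the conditional expression is an lvalue,
    // so the return statement would need the deleted copy constructor.

    // With separate return statements, each operand is a plain name
    // and is moved implicitly if it cannot be elided.
    if (first) {
        return a;
    }
    return b;
}
```

Writing return first ? std::move(a) : std::move(b); would compile too, but it forces a move and is harder to read.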

s fun(int i){
    auto a = s(i);
    bar(a);
    if(i){
        return a;
    } else {
        return a;
    }
}

When a different object is added, and there exists a scope where both of them are present, then none of the three compilers applies NRVO to a:

s fun(int i){
    auto a = s(i);
    bar(a);
    if(i){
        return a;
    } else {
        return s(i+1);
    }
}

If there is no scope where both objects exist, like

s fun(int i){
    if(i){
        auto a = s(i);
        bar(a);
        return a;
    } else {
        return s(i+1);
    }
}

or

s fun(int i){
    if(i){
        auto a = s(i);
        bar(a);
        return a;
    } else {
        auto a = s(i+1);
        return a;
    }
}

then only Clang will apply RVO.

But if there is no common scope, then the function can probably be split into two functions without too many difficulties:

namespace{
    s fun_0(int i){
        auto a = s(i);
        bar(a);
        return a;
    }
    s fun_1(int i){
        auto a = s(i+1);
        return a;
    }
}
s fun(int i){
    if(i){
        return fun_0(i);
    } else {
        return fun_1(i);
    }
}

It might seem contradictory, but by adding another indirection, we removed unnecessary calls to the move/copy constructor on GCC and Clang.

In the case of MSVC, the compiler needs some optimizations turned on; thanks to #pragma optimize, it is possible to optimize only the relevant functions:

void bar(s&);
namespace{
    #ifdef _MSC_VER
    #pragma optimize("g", on)
    #endif
    s fun_0(int i){
        auto a = s(i);
        bar(a);
        return a;
    }

    #ifdef _MSC_VER
    #pragma optimize("g", on)
    #endif
    s fun_1(int i){
        auto a = s(i+1);
        return a;
    }
}
s fun(int i){
    if(i){
        return fun_0(i);
    } else {
        return fun_1(i);
    }
}

There are normally multiple factors that influence optimizations, like the number of parameters, the accessed globals, whether other functions can be inlined, and the size of the function. All those factors carry a different weight depending on the compiler and its version.

Thus, the examples shown are probably not representative, but they can help to understand how to minimize the risk of incurring unnecessary moves and copies.

TL;DR: best practices

There are not many places where it makes a difference if a move constructor is called or not when returning from a function. For those cases, keeping functions linear and short is the easiest way to increase the chances that the compiler will not unnecessarily call those constructors.

It also holds that

  • temporaries are generally optimized away

  • std::move forces a call to the move constructor and prevents all forms of return value optimization

  • the ternary operator forces a copy in a return statement (return expr ? a : b), unless both a and b are temporaries (guaranteed only since C++17, but likely to happen even before). The copy can be avoided with std::move, which forces a call to the move constructor. An if statement seems to be a superior alternative.

  • the comma operator forces a copy in a return statement (return ((void)expr,a);), unless a is a temporary (guaranteed only since C++17, but likely to happen even before). The copy can be avoided with std::move, which forces a call to the move constructor.
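The last point can be verified with a small experiment. This is a sketch assuming C++17 or later; probe and side_effect are hypothetical, the latter standing in for expr.

```cpp
#include <utility>

// Hypothetical type that counts copy and move constructor calls.
struct probe {
    static inline int copies = 0;
    static inline int moves = 0;

    probe() = default;
    probe(const probe&) { ++copies; }
    probe(probe&&) noexcept { ++moves; }
};

void side_effect() {}  // stand-in for `expr`

probe comma_copy() {
    probe p;
    return (side_effect(), p);  // lvalue result: copy constructor
}

probe comma_move() {
    probe p;
    return (side_effect(), std::move(p));  // xvalue result: move constructor
}
```

Calling comma_copy() increments probe::copies by one, while comma_move() increments probe::moves by one. In both cases the returned expression is not the plain name of a local variable, so the compiler can neither elide the constructor call nor fall back to an implicit move.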

