Extension methods in C++
Extension methods are member functions (methods) added to a class after it has been fully defined.
Different languages use different techniques for supporting extension methods, and C++ currently lacks one.
Why are extension methods missing from C++
I believe the main reason is to avoid breaking user code.
Suppose you add the extension method starts_with to std::string.
Then you upgrade your toolchain to C++20.
What now?
Should it be a compiler error? Should the member function take precedence? Should the extension method take precedence?
None of the listed alternatives looks good, especially if your code is a library that gets compiled by other teams.
If it is a compiler error, then the standard library would see little to no improvement. It could not have added the member function starts_with to std::string, because that would break the code of programmers who had added it themselves.
If it is not a compiler error, but one function takes precedence over the other, it is equally bad. Now the code changes meaning, depending on whether a header file has been included or not.
Similar questions exist if two extensions are defined with the same name and signature.
One could propose that it is undefined behaviour to add extension methods to standard library types, similar to how the standard does not allow taking the address of most standard library functions (the behaviour is unspecified, possibly ill-formed). But then extension methods would not be that useful.
For types you own, you can already add new member functions, so there is little need to resort to extension methods. But the standard library is probably the most used library, and other external libraries face the same issue the standard library has.
One of the features of C++ is being able to take most (if not all) old projects and compile them against the current standard with no changes.
The old code might be a dependency, maybe even part of the operating system, and thus not something that can be changed easily. It might not even be a direct dependency but a transitive one, so modifying the build system to patch something out might be even harder. And if the dependency is shared between multiple projects, it can get even more complex!
The only solution that comes to mind is to use namespaces for member functions. That would be novel, and I’m not convinced it is worth exploring such a big change in the language.
TL;DR C++ does not have extension methods because supporting them makes upgrading libraries (standard or external) very brittle.
Why are they good
Even though extension methods are that problematic (at least in some environments), they do offer one great feature: improved code legibility.
starts_with(foo, bar) is less readable than foo.starts_with(bar).
Note that this does not hold for every function; for example, min(foo, bar) is not less readable than foo.min(bar); on the contrary!
Other examples where, in my opinion, a member function is more readable are optional.value_or(default_value), string.find(substring), string.split(delimiter), container.size(), and value.copy_to(other_value).
Yes, it is just syntactic sugar, but so are loops, lambdas, the built-in operator++, class hierarchies, and many other features that simplify programming and make code easier to maintain.
Workarounds
There are different ways to overcome this limitation if one absolutely wants to avoid using a free function.
Wrapper classes
It is possible to create wrapper classes and add new methods to them.
For example:
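```cpp
#include <vector>

class string{};  // stand-in for the string class we want to extend
std::vector<string> split(const string& str, const string& del);

// extension methods
struct string_with_extensions {
    string str;  // stored by value: creating the wrapper copies the string
    std::vector<string> split(const string& del) const {
        return ::split(str, del);  // forward to the free function
    }
};
```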
Usage would look like this:
```cpp
// with extension method
void bar1(const string& str, const string& del){
    auto res = string_with_extensions(str).split(del);
}
// without extension method
void bar2(const string& str, const string& del){
    auto res = split(str, del);
}
```
The main disadvantage is the introduction of additional copies just for calling some functions, which is normally not acceptable. One could store just a reference or pointer inside string_with_extensions, which would avoid the additional copies, but might introduce lifetime issues.
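A sketch of the reference-holding variant, assuming std::string as the extended type; split, string_ref_extensions and wrapper_demo are hypothetical names used only for illustration. The wrapper avoids the copy, but nothing stops it from outliving the string it refers to:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical split() helper, standing in for the free function declared above.
std::vector<std::string> split(std::string_view str, std::string_view del) {
    if (del.empty()) return {std::string(str)};  // avoid an endless loop
    std::vector<std::string> parts;
    std::size_t start = 0;
    for (;;) {
        const std::size_t pos = str.find(del, start);
        if (pos == std::string_view::npos) {
            parts.emplace_back(str.substr(start));
            return parts;
        }
        parts.emplace_back(str.substr(start, pos - start));
        start = pos + del.size();
    }
}

// Stores a reference instead of a copy: no allocation, but the wrapper
// must not outlive the string it refers to.
class string_ref_extensions {
public:
    explicit string_ref_extensions(const std::string& s) : str_(s) {}
    std::vector<std::string> split(std::string_view del) const {
        return ::split(str_, del);  // :: selects the free function, not this member
    }
private:
    const std::string& str_;
};

void wrapper_demo() {
    const std::string s = "a,b,c";
    // Fine: the temporary wrapper is destroyed at the end of the full-expression.
    const auto parts = string_ref_extensions(s).split(",");
    assert(parts.size() == 3 && parts[1] == "b");

    // Dangling: the temporary std::string dies at the end of this statement,
    // so any later w.split(...) would be undefined behaviour.
    // string_ref_extensions w(std::string("x,y"));
}
```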
Another thing to consider is all the existing member functions and variables of string. Should those be exposed in string_with_extensions? They should, unless the class is named string_extensions. As some classes have a lot of overloads, duplicating every member function just to forward all parameters creates a lot of boilerplate code, which might also be mostly dead code. Such trivial functions are also very error-prone to write, especially if you want to avoid accidental copies, be const-correct and noexcept-correct, do perfect forwarding, and support all overloads and optional parameters.
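To give an idea of the boilerplate, here is a sketch of forwarding just one member function, find, through a by-value wrapper around std::string (assuming C++14 or later; forwarding_demo is a hypothetical name). A variadic template covers all overloads and default arguments at once, and the noexcept specification and return type are copied from the wrapped call:

```cpp
#include <cassert>
#include <string>
#include <utility>

struct string_with_extensions {
    std::string str;

    // Forwards every overload of std::string::find, including default arguments,
    // while preserving its return type and noexcept-ness.
    template <class... Args>
    auto find(Args&&... args) const
        noexcept(noexcept(str.find(std::forward<Args>(args)...)))
        -> decltype(str.find(std::forward<Args>(args)...)) {
        return str.find(std::forward<Args>(args)...);
    }
    // ...and the same again for size(), substr(), compare(), and so on.
    // Non-const members (append(), erase(), ...) would need their own set.
};

void forwarding_demo() {
    const string_with_extensions w{"hello"};
    assert(w.find("lo") == 3);    // const char* overload
    assert(w.find('l', 3) == 3);  // char overload with an explicit start position
}
```

Every other member function needs the same treatment, which is exactly the kind of dead, easy-to-get-wrong code described above.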
In general, one needs to have both string and string_with_extensions as variables to use the extension methods, which is also not ideal.
Both issues can be partially sidestepped by defining and using a subclass, but that introduces another set of issues.
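For comparison, a sketch of the subclass approach, using the hypothetical names ext_string, contains_char and subclass_demo. All std::string members are inherited, so no forwarding is needed, but operations such as operator+ still return a plain std::string, and the extension is lost until the result is converted back. Note also that std::string has no virtual destructor, so deleting an ext_string through a std::string* is undefined behaviour.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical subclass: every std::string member is inherited.
struct ext_string : std::string {
    using std::string::string;  // inherit the std::string constructors

    // Hypothetical converting constructor, so that an existing
    // std::string can be turned into an ext_string.
    ext_string(std::string s) : std::string(std::move(s)) {}

    // The "extension method".
    bool contains_char(char c) const { return find(c) != npos; }
};

void subclass_demo() {
    const ext_string e("hello");
    assert(e.contains_char('e'));
    assert(e.size() == 5);  // inherited, no forwarding needed

    const std::string plain = e + "!";  // operator+ returns a std::string...
    // plain.contains_char('!');        // ...so this would not compile
    const ext_string again(plain);      // converting back costs a copy
    assert(again.contains_char('!'));
}
```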
operator->*
I was recently made aware of a feature of operator->* that I did not know about.
It can be overloaded, and it can be defined out of line, as a non-member function.
Why is defining it outside of a class useful?
It can be used to "approximate" extension methods.
Since it can be defined outside a class, it is possible to add operator->* to a class we do not control.
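A minimal sketch of such a non-member operator->*, attached to std::string, a class we do not control. The tag type has_prefix and the function prefix_demo are hypothetical; the tag exists only to select the overload and carry the argument:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical tag type carrying the argument of the "extension method".
struct has_prefix {
    std::string_view prefix;
};

// Defined as a free function: std::string itself is left untouched.
bool operator->*(const std::string& s, has_prefix p) {
    return std::string_view(s).substr(0, p.prefix.size()) == p.prefix;
}

void prefix_demo() {
    const std::string s = "extension";
    assert(s->*has_prefix{"ext"});
    assert(!(s->*has_prefix{"method"}));  // parentheses needed: ! binds tighter than ->*
}
```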
Changing the API of a class you do not own is a bad thing: what if a future update of the class overloads operator->*?
It would be a compiler error.
But since hardly anyone overloads it, the chances are much smaller.
An example would be the following:
```cpp
struct my_optional{
    bool b = false;
    int v = 0;
    // initializers listed in declaration order (b before v)
    constexpr my_optional(int i):b(true), v(i){}
    constexpr my_optional() = default;
    // no value_or member function
};

// exists only to carry the argument of the "extension method"
struct value_or {
    int value;
    // hidden friend: found through argument-dependent lookup on value_or
    constexpr friend int operator->*(const my_optional& opt, const value_or& def) {
        return opt.b ? opt.v : def.value;
    }
    constexpr value_or(int v):value(v){}
};

static_assert(my_optional(0)->*value_or(1) == 0, "");
static_assert(my_optional()->*value_or(1) == 1, "");
```
The code nearly reads as my_optional(0)->value_or(1); there is just an additional *, and it obviously uses -> instead of a dot.
my_optional does not have value_or as a member function, but thanks to the struct value_or, which exists solely for overloading operator->*, it looks as if we had "extended" my_optional with value_or.
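One practical detail is precedence: ->* binds tighter than the comparison and arithmetic operators, but looser than member access and function calls. A sketch with the hypothetical tag value_or_str (in the style of the value_or example above) shows that calling a real member function on the result requires parentheses:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical tag type, in the style of the value_or example above.
struct value_or_str {
    std::string fallback;
};

std::string operator->*(const std::optional<std::string>& opt, const value_or_str& def) {
    return opt ? *opt : def.fallback;
}

void precedence_demo() {
    const std::optional<std::string> name;  // empty
    // ->* binds tighter than ==, so no parentheses are needed here.
    assert(name->*value_or_str{"anon"} == "anon");
    // Member access binds tighter than ->*, so chaining needs parentheses;
    // without them, .size() would be applied to value_or_str{"anon"}.
    assert((name->*value_or_str{"anon"}).size() == 4);
}
```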
Unintended side-effects
It works for left operands that are not classes too: integers, floating-point values, scoped enumerations, and even arrays!
```cpp
#include <cassert>

struct foo {
    // C++20 abbreviated template: the left operand can be of any type
    friend int operator->*(auto&& opt, foo def) {
        return 0;
    }
    // optional parameters
    int a = 1;
    int b = 2;
    int c = 3;
};

enum class a{e};

int main(){
    // foo(1,2,3) relies on C++20 parenthesized aggregate initialization
    assert(a::e->*foo(1,2,3) == 0);
    assert(1->*foo(1,2) == 0);
    assert(1.0->*foo(1,2) == 0);
    int arr[10]{};
    assert(arr->*foo(1) == 0);
}
```
Does operator->* fix the mentioned issues of extension methods?
Overloading operator->* is uncommon, and this might lead to surprises. I also have to admit that I do not know what the intended use case for operator->* was.
While one should never overload the comma operator (operator,), as it will break code, the built-in operator->* is rarely used. Thus, the chances of breaking something are extremely low.
Overloading operator->* can lead to more readable code, but for now I would use it only as an implementation detail, not as part of the API of a library.
If such functions are part of a public API, then there is a risk of colliding with other overloads of operator->*, just as the same risk exists for free functions and types.
One can use namespaces:
```cpp
#include <optional>

namespace extensions{
    template <class T>
    struct value_or {
        T value;
        // hidden friend: only found via ADL when an extensions::value_or is an operand;
        // auto return type, so it is not limited to int
        constexpr friend auto operator->*(auto&& opt, const value_or& def) {
            return opt.has_value() ? opt.value() : def.value;
        }
    };
    // value_or(1) deduces value_or<int>; taking T by value decays arrays and references
    template <class T> value_or(T) -> value_or<T>;
}

static_assert(std::optional<int>{12}->*extensions::value_or(1) == 12);
static_assert(std::optional<int>{std::nullopt}->*extensions::value_or(1) == 1);

// or
using extensions::value_or;
static_assert(std::optional<int>{12}->*value_or(1) == 12);
static_assert(std::optional<int>{std::nullopt}->*value_or(1) == 1);
```
This approach resembles the one used for user-defined literals.
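For comparison, this is how the standard user-defined literals are opted into: each set lives in its own namespace under std::literals and only becomes usable after a using-directive. A sketch, assuming C++17 or later; the namespace demo is just a hypothetical container for the checks:

```cpp
#include <chrono>
#include <string>
#include <string_view>
#include <type_traits>

namespace demo {
    // Opting in namespace by namespace, like "using extensions::value_or;" above.
    using namespace std::literals::string_literals;       // enables "..."s
    using namespace std::literals::string_view_literals;  // enables "..."sv
    using namespace std::literals::chrono_literals;       // enables 2s, 100ms, ...

    static_assert(std::is_same_v<decltype("text"s), std::string>);
    static_assert(std::is_same_v<decltype("text"sv), std::string_view>);
    static_assert(std::is_same_v<decltype(2s), std::chrono::seconds>);
}
```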
Conclusion
Using subclasses or wrapper classes is an established pattern, but it introduces additional types, variables, and indirections that make the code harder to maintain. I would not recommend this approach. In some environments, programmers would also make such class hierarchies virtual and lose value semantics.
operator->* avoids these issues, but is a more novel approach.
Thanks to namespaces, even if (ab)using operator->* becomes more mainstream, collisions can still be avoided.
Last but not least, as already written, defining operator->* for a class we do not own is problematic.
Thanks to namespaces, it is less problematic; nevertheless, I would follow the advice I gave on function poisoning, which is to avoid making extension methods part of a public API and to use them only internally:
What we have done is changing the public API of a library (standard or third-party, it does not matter). This is a Bad Thing™ because we are not in control of that API. As long as those changes are limited to our project, they provide some benefits, and the possible issues are limited.
If you have questions or comments, found typos, or think the notes are unclear or contain errors, just contact me.