string_view(s)
Since C++17, the standard library provides std::string_view.
Some people are unhappy with some of its design decisions: for example, that std::string_view, just like std::string, has a bloated interface, or that a std::string is implicitly convertible to std::string_view, which can create dangling views by accident. Another apparently controversial design decision is that std::string_view does not necessarily point to a \0-terminated string, which makes it unsuitable as a general drop-in replacement for const char*- and std::string-based interfaces.
- more performant than std::string and const char*
- make it a drop-in replacement for functions accepting non-owning strings
While it is unlikely that the standard library will ever get another string_view type (not counting std::span, from C++20), even though the original proposal 🗄️ mentions that it could make sense, it’s easy to implement such a class.
In fact, \0-termination and implicit conversion are only a small subset of the properties we might want to change in a string class. For example, we might need a string class that guarantees it is never empty, or one that handles errors in a different way.
First of all, let’s look at the properties of the string types of the standard library and of const char*:
string type | since | owning | \0 terminated | std::string conversion | const char* conversion | std::string_view conversion |
---|---|---|---|---|---|---|
(const) char* | C++98 | no | mostly by convention | implicit | implicit | implicit |
(const) char[] | C++98 | yes | literals, otherwise not necessarily | implicit | implicit | implicit |
std::string | C++98 | yes | yes | implicit | explicit | implicit |
std::string_view | C++17 | no | no | explicit | explicit | implicit |
Conversions to std::string and std::string_view from char*, char[] and std::string were made implicit to allow writing:
void foo(const std::string&);
void bar(std::string_view);
foo("literal");
bar("literal");
std::string str;
foo(str);
bar(str);
Spelling out std::string and std::string_view for those use cases would have been very verbose.
Contrary to const char*, the conversion from the non-owning container (std::string_view) to the owning one (std::string) is explicit:
void foo(const std::string&);
std::string_view strv = "...";
foo(std::string(strv));
Contrary to const char*, the conversion from the owning container (std::string) to the non-owning one (std::string_view) is implicit:
void bar(std::string_view);
std::string str = "...";
bar(str);
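The conversion rules described above can be checked at compile time. The following is only a sketch, assuming a C++17 compiler; it relies on the fact that std::is_convertible accepts implicit conversions only, while std::is_constructible also accepts explicit ones.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>

// Implicit: C strings and std::string convert silently.
static_assert(std::is_convertible<const char*, std::string>::value,
              "const char* -> std::string is implicit");
static_assert(std::is_convertible<const char*, std::string_view>::value,
              "const char* -> std::string_view is implicit");
static_assert(std::is_convertible<std::string, std::string_view>::value,
              "std::string -> std::string_view is implicit");

// Explicit: going back to the owning type has to be spelled out.
static_assert(!std::is_convertible<std::string_view, std::string>::value,
              "std::string_view -> std::string is not implicit");
static_assert(std::is_constructible<std::string, std::string_view>::value,
              "std::string_view -> std::string works when written explicitly");

// No conversion operator to const char*: only .c_str() / .data() hand out a pointer.
static_assert(!std::is_convertible<std::string, const char*>::value,
              "std::string -> const char* needs .c_str()");
static_assert(!std::is_convertible<std::string_view, const char*>::value,
              "std::string_view -> const char* needs .data()");
```

If one of these assumptions were wrong for a given standard library, the translation unit would simply fail to compile.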
Gaps in the string classes
If we want a non-owning string type whose conversion from std::string is explicit, to avoid errors like
std::string_view strv = std::string("a"); // strv dangles: the temporary dies at the end of this statement
we need to look outside of the standard library (supposing we want something better than char*).
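As a rough illustration of what such a type could look like, here is a minimal sketch. The name explicit_view is made up for this example and is not the API of any existing library; it assumes C++11 or later.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical non-owning view: conversion from std::string must be spelled out.
class explicit_view {
public:
    // explicit: "explicit_view v = some_string;" does not compile.
    explicit explicit_view(const std::string& s) noexcept
        : data_(s.data()), size_(s.size()) {}

    // A view of a temporary would dangle immediately, so reject rvalues.
    explicit explicit_view(std::string&&) = delete;

    const char* data() const noexcept { return data_; }
    std::size_t size() const noexcept { return size_; }

private:
    const char* data_;
    std::size_t size_;
};

static_assert(!std::is_convertible<std::string, explicit_view>::value,
              "no implicit conversion from std::string");
static_assert(std::is_constructible<explicit_view, const std::string&>::value,
              "explicit construction from an lvalue works");
static_assert(!std::is_constructible<explicit_view, std::string>::value,
              "construction from a temporary is rejected");
```

Deleting the rvalue overload is a trade-off: it also forbids the harmless case of passing a temporary straight into a function call, so a real design might choose differently.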
If we want a non-owning string with an implicit conversion to owning, to be able to write
void foo(const std::string&);
std::string_view strv = "...";
foo(strv);
just as we can with const char*, we again need to look outside of the standard library.
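A conversion operator is enough to get this behavior. The sketch below uses a made-up class name, implicit_view, and a foo that returns the length only so that the call is observable; neither is taken from an existing library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical non-owning view that converts implicitly to the owning type.
class implicit_view {
public:
    // Precondition: s points to a \0-terminated string.
    implicit_view(const char* s) noexcept : data_(s), size_(std::strlen(s)) {}

    // Implicit on purpose: every use of this conversion allocates and copies.
    operator std::string() const { return std::string(data_, size_); }

    std::size_t size() const noexcept { return size_; }

private:
    const char* data_;
    std::size_t size_;
};

// Stand-in for an interface that expects an owning string.
inline std::size_t foo(const std::string& s) { return s.size(); }

static_assert(std::is_convertible<implicit_view, std::string>::value,
              "implicit_view -> std::string is implicit");
```

The price of the convenience is that allocations become invisible at the call site, which is presumably why std::string_view requires the conversion to be explicit.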
If we want a non-owning \0-terminated string, then std::string_view is not the right choice; unless we design another string class, const char* is, unfortunately, the better option.
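Such a class does not need much code. The following sketch uses a hypothetical zstring_view; the name and its members are assumptions made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical non-owning view that is always \0-terminated.
class zstring_view {
public:
    // Precondition: s points to a \0-terminated string.
    zstring_view(const char* s) noexcept : data_(s), size_(std::strlen(s)) {}
    // std::string guarantees that c_str() is \0-terminated.
    zstring_view(const std::string& s) noexcept : data_(s.c_str()), size_(s.size()) {}

    const char* c_str() const noexcept { return data_; }
    std::size_t size() const noexcept { return size_; }

    // Dropping a prefix keeps the terminator, so this operation is safe.
    // Precondition: pos <= size().
    zstring_view suffix(std::size_t pos) const noexcept {
        return zstring_view(data_ + pos, size_ - pos);
    }

private:
    zstring_view(const char* d, std::size_t n) noexcept : data_(d), size_(n) {}
    const char* data_;
    std::size_t size_;
};
```

There is deliberately no general substr(): cutting off the end of the string would lose the terminator.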
How many string types do we need to provide to satisfy all needs?
Probably too many; there are already a lot of string classes in the wild. The MFC library 🗄️ has CString, the Xerces library 🗄️ has XMLString, the Qt framework 🗄️ has QString, the abseil library 🗄️ has absl::string_view, the EASTL 🗄️ library has eastl::string, and so on.
It is interesting to see that most libraries provide only an owning string class.
For my use cases, something like string_view is the default choice. Most of the time I’m searching for something in a string, handling literals, comparing sequences of characters, or just passing them along, so there is no need to allocate any memory or deep-copy the content, as I’m not mutating it.
My take on string_view
So here it is, an incomplete proof of concept of a family of string_view classes!
Obligatory xkcd.
Why a family and not a single class that can do it all?
Because some design decisions are incompatible with others. Some use cases want string_view to convert implicitly from std::string, others want the opposite. Some do not want any implicit conversion, while others want them in one or another direction. A class cannot satisfy all those requirements at once.
Adding a family of string_view-like types is a major overhead for the developer. On the other hand, considering that (not counting wchar_t, other character types, and other libraries) we already have literals, const char*, std::string, and std::string_view for dealing with strings, there will always be a certain complexity in any project when dealing with them.
What does string_views provide
I’ve identified (at least for my most common use cases) the following "policies":
- conversion policy
- format policy
- content policy
- allocation/copy policy
- rw policy
The conversion policy is straightforward. std::string_view needs to be explicitly converted to std::string, while the opposite conversion is implicit. In some use cases this is the desired behavior; in others, I want the opposite.
The format policy is also easy. std::string is \0-terminated, std::string_view is not. On one side, this is unfortunate, since most OS APIs require a \0-terminated string; on the other, it permits functionality (like substring) that would otherwise not be possible. So either we use const char*, or we use std::string, which might do unnecessary allocations, or we use std::string_view and hope that the content is \0-terminated (after all, that is what we do with const char*).
Notice that in a \0-terminated string_views type, the trailing \0 is not part of the content; otherwise .size() would return one character more.
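The tension between substring and \0-termination is easy to reproduce with std::string_view itself. In the sketch below, c_api_length is a made-up helper that mimics a C API receiving only the pointer; calling it is valid only because the underlying buffer in the usage example is a string literal, which is \0-terminated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical stand-in for a C API: it sees the pointer, not the size.
// Precondition: the buffer behind sv is \0-terminated somewhere.
inline std::size_t c_api_length(std::string_view sv) {
    return std::strlen(sv.data());
}

// A substring shares the buffer of the original view; nothing is copied
// and no terminator is inserted after the last character of the substring.
inline std::string_view first_word(std::string_view sv) {
    return sv.substr(0, sv.find(' '));
}
```

For the literal "hello world", first_word returns a view of size 5, but a C API that is handed its data() pointer still reads all 11 characters up to the terminator of the literal.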
Neither std::string nor std::string_view has a content policy, but many interfaces do. For example, when creating files, a filename cannot contain an embedded \0 or /.
It is possible to create new string types with those invariants, or to do the verification outside of the string class. The content policy is a user-defined verification that happens on construction, making it possible to reuse an existing string class without wrapping it. Notice that \0-termination is not a specific content policy, as the trailing \0 is not part of the interface, but a \0-terminated string probably should have a content policy that disallows embedded \0 characters, to avoid silent truncation.
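A content policy of this kind could be sketched as follows. All names (filename_policy, checked_view, accepts_filename) are invented for this example, and throwing std::invalid_argument is just one possible way of reporting a violation.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical policy: a filename must not contain '\0' or '/'.
struct filename_policy {
    static bool is_valid(const char* data, std::size_t size) noexcept {
        for (std::size_t i = 0; i < size; ++i) {
            if (data[i] == '\0' || data[i] == '/') return false;
        }
        return true;
    }
};

// Hypothetical view that runs the user-defined check once, on construction.
template <class ContentPolicy>
class checked_view {
public:
    checked_view(const char* data, std::size_t size) : data_(data), size_(size) {
        if (!ContentPolicy::is_valid(data, size)) {
            throw std::invalid_argument("content policy violated");
        }
    }
    const char* data() const noexcept { return data_; }
    std::size_t size() const noexcept { return size_; }

private:
    const char* data_;
    std::size_t size_;
};

using filename_view = checked_view<filename_policy>;

// Small helper for trying out the policy without writing try/catch each time.
inline bool accepts_filename(const char* data, std::size_t size) {
    try {
        filename_view v(data, size);
        return v.size() == size;
    } catch (const std::invalid_argument&) {
        return false;
    }
}
```

Since all member functions are const, the invariant established by the constructor cannot be broken through the view afterwards.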
The allocation/copy policy is the big difference between std::string_view and std::string. std::string performs deep copies, while std::string_view performs shallow copies. std::string allocates, thus copying is costly, while std::string_view does not allocate, thus passing by value is preferred over passing by reference.
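The difference between deep and shallow copies can be observed by comparing the data() pointers of a copy and its source. The two helper functions are made up for this sketch, which assumes a C++17 standard library without copy-on-write strings.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Deep copy: the copy owns its own characters, so the pointers differ.
inline bool string_copy_shares_buffer(const std::string& s) {
    std::string copy = s;
    return copy.data() == s.data();
}

// Shallow copy: only pointer and size are copied, the characters are shared.
inline bool view_copy_shares_buffer(std::string_view v) {
    std::string_view copy = v;
    return copy.data() == v.data();
}
```

This is also why a std::string_view parameter is normally taken by value: copying it costs about as much as copying a pointer and a size.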
string_view does not have an allocation policy. It is (given the name I chose for the project) out of scope, and it would have profound implications on how the class should be used.
To sum it up: most string classes would have the same underlying implementation; the main differences are the constructors/conversions, the presence or absence of a trailing \0, content validation, and value/reference semantics.
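To illustrate how one implementation could serve several variants, here is a sketch in which only the format is a template parameter. The tag types and basic_view are hypothetical and do not describe the actual interface of string_views; further policies could be added as additional template parameters in the same way.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical format tags.
struct zero_terminated {};
struct not_terminated {};

template <class Format>
class basic_view {
public:
    // A C string satisfies both formats. Precondition: s is \0-terminated.
    basic_view(const char* s) noexcept : data_(s), size_(std::strlen(s)) {}

    const char* data() const noexcept { return data_; }
    std::size_t size() const noexcept { return size_; }

    // Only meaningful when the format guarantees a terminator.
    // The static_assert fires only if this member is actually used.
    const char* c_str() const noexcept {
        static_assert(std::is_same<Format, zero_terminated>::value,
                      "c_str() requires the zero_terminated format");
        return data_;
    }

    // Only offered when no terminator has to be preserved.
    // Precondition: pos <= size().
    basic_view substr(std::size_t pos, std::size_t count) const noexcept {
        static_assert(std::is_same<Format, not_terminated>::value,
                      "substr() would break \\0-termination");
        return basic_view(data_ + pos, std::min(count, size_ - pos));
    }

private:
    basic_view(const char* d, std::size_t n) noexcept : data_(d), size_(n) {}
    const char* data_;
    std::size_t size_;
};

using zview = basic_view<zero_terminated>;
using view = basic_view<not_terminated>;
```

Calling substr() on a zview, or c_str() on a view, is rejected at compile time, while the data members and the remaining functions are shared.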
The library targets at least C++14; it could be backported to C++11 and C++03, but some features might be missing.
It covers use cases that are currently not covered by std::string and std::string_view.
There is one use case that is not covered (at least not yet): a mutable string_view, as all methods are const.
A mutable string_view still has reference semantics, but it imposes some design decisions, like the absence of operator== (it’s not a coincidence that std::span does not have it 🗄️).
It would also not be possible to enforce a content policy, as there are several functions (operator[], iterators, …) that can change the content directly.
Thus it does not make much sense to enforce a content policy on mutable strings inside the class itself.
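The problem can be shown with a few lines. In this sketch, mutable_view and has_no_slash are hypothetical names; the check stands in for any content policy that is verified when the view is created.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical content check: the characters must not contain '/'.
inline bool has_no_slash(const char* data, std::size_t size) noexcept {
    for (std::size_t i = 0; i < size; ++i) {
        if (data[i] == '/') return false;
    }
    return true;
}

// Hypothetical mutable view: reference semantics, but writable characters.
class mutable_view {
public:
    mutable_view(char* data, std::size_t size) noexcept : data_(data), size_(size) {}

    // Hands out a writable reference; the view cannot observe what is written.
    // Precondition: i < size().
    char& operator[](std::size_t i) const noexcept { return data_[i]; }

    const char* data() const noexcept { return data_; }
    std::size_t size() const noexcept { return size_; }

private:
    char* data_;
    std::size_t size_;
};
```

Even if the constructor had validated the content, a single assignment through operator[] is enough to invalidate the result, so the check would have to be repeated by the caller after every modification.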
Another design decision was not to duplicate every function of std::string/std::string_view.
Most functionality can be implemented as free functions, making it easily reusable not only for string_views but also for most other string-like classes. This also has the advantage of providing a more uniform API when dealing with different string-like classes. The main disadvantage is that it does not make string_views a drop-in replacement for the string classes inside std.
As some functionality depends on the invariants of the class (a generic substring for a \0-terminated string cannot work), and other functions are just redundant (.size() vs .length()), trying to add all possible functionality would just bloat the API without much benefit.
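As an example of the free-function approach, a prefix check can be written once for every class that exposes data() and size(). The function name has_prefix is made up for this sketch, which assumes that both arguments store plain char elements.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <string_view>

// Works for std::string, std::string_view and any other string-like class
// that provides data() and size() over char.
template <class String, class Prefix>
bool has_prefix(const String& s, const Prefix& prefix) {
    return s.size() >= prefix.size() &&
           std::equal(prefix.data(), prefix.data() + prefix.size(), s.data());
}
```

The same function accepts any combination of owning and non-owning arguments, without adding a member function to each class.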