string_view(s)
Since C++17, the standard library provides std::string_view.
Some people are not happy with some of its design decisions: for example, that std::string_view, just like std::string, has a bloated interface, or that a std::string is implicitly convertible to std::string_view, which can create dangling views by accident. Another apparently controversial design decision is that std::string_view does not necessarily point to a \0-terminated string, which makes it unsuitable as a general drop-in replacement for const char* and std::string based interfaces.
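To make the \0-termination problem concrete, here is a minimal sketch. The helper names first_word and length_seen_by_c_api are made up for this illustration; only std::string_view::substr, data(), and std::strlen come from the standard library.

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>

// What a C API would "see" if it were handed sv.data() directly.
// This is only safe here because the view used below points into a
// \0-terminated string literal.
inline std::size_t length_seen_by_c_api(std::string_view sv) {
    return std::strlen(sv.data());
}

// Returns a view on "hello"; there is no \0 right after the final 'o'.
inline std::string_view first_word() {
    std::string_view whole = "hello world"; // literal with static storage
    return whole.substr(0, 5);
}
```

The view reports a size of 5, but a function that relies on the terminator reads all 11 characters of the underlying literal. With a buffer that has no terminator at all, the same call would be undefined behavior.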
Most of the design decisions are explained in the original proposal. The main guidelines were:
- be more performant than std::string and const char*
- be a drop-in replacement for functions accepting non-owning strings
It is unlikely that the standard library will gain another string_view type (if we do not count std::span, from C++20), even though the original proposal mentions that it could make sense. Still, it’s easy to implement such a class.
In fact, \0-termination and implicit conversion are only a small subset of the properties we might like to change in a string class. For example, we might need a string class that ensures there is no such thing as an empty string, or one that handles errors in a different way.
First of all, let’s look at what properties the string types of the standard library and const char* have:
string type | since | owning | \0 terminated | std::string conversion | const char* conversion | std::string_view conversion |
---|---|---|---|---|---|---|
(const) char* | C++98 | no | mostly by convention | implicit | implicit | implicit |
(const) char[] | C++98 | yes | literals, otherwise not necessarily | implicit | implicit | implicit |
std::string | C++98 | yes | yes | implicit | explicit | implicit |
std::string_view | C++17 | no | no | explicit | explicit | implicit |
Conversions to std::string and std::string_view from char*, char[] and std::string were made implicit to allow writing:
void foo(const std::string&);
void bar(std::string_view);
foo("literal");
bar("literal");
std::string str;
foo(str);
bar(str);
Spelling out std::string and std::string_view at every call site for those use-cases would have been very verbose.
Contrary to const char*, the conversion from the non-owning container (std::string_view) to the owning one (std::string) is explicit:
void foo(const std::string&);
std::string_view strv = "...";
foo(std::string(strv));
Contrary to const char*, the conversion from the owning container (std::string) to the non-owning one (std::string_view) is implicit:
void bar(std::string_view);
std::string str = "...";
bar(str);
Gaps in the string classes
If we want a non-owning string container type with explicit conversion, to avoid errors like
std::string_view strv = std::string("a");
we need to look outside of the standard library (supposing we want something better than char*).
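As a rough sketch of what such a type could look like: the class name explicit_view is hypothetical and is not the API of any existing library. Conversion from literals stays implicit, while borrowing from a std::string has to be spelled out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical non-owning view with an explicit std::string constructor.
class explicit_view {
    const char* data_ = "";
    std::size_t size_ = 0;

public:
    explicit_view() = default;

    // C strings (typically literals) convert implicitly.
    explicit_view(const char* s) {
        assert(s != nullptr);
        data_ = s;
        size_ = std::strlen(s);
    }

    // Borrowing from an owning string must be visible at the call site.
    explicit explicit_view(const std::string& s)
        : data_(s.data()), size_(s.size()) {}

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }
};

// "explicit_view v = std::string("a");" no longer compiles.
static_assert(!std::is_convertible<std::string, explicit_view>::value,
              "std::string must not convert implicitly");
static_assert(std::is_convertible<const char*, explicit_view>::value,
              "C strings still convert implicitly");
```

Note that this does not make dangling impossible: explicit_view v(std::string("a")); still compiles. It only ensures that the borrow does not happen silently.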
If we want a non-owning string with an implicit conversion to owning, in order to be able to write
void foo(const std::string&);
std::string_view strv = "...";
foo(strv);
just as we can with const char*, we again need to look outside of the standard library.
If we want a non-owning \0-terminated string, then std::string_view is not the right choice; const char* is, unfortunately, a better one, unless we design another string class.
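A minimal sketch of such a class, assuming a made-up name zstring_view and a made-up member suffix (neither is part of the standard library): it can only be built from sources that are known to be \0-terminated, and it only offers operations that preserve the terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical non-owning view that is always \0-terminated.
class zstring_view {
    const char* data_ = "";
    std::size_t size_ = 0;

    // Private: callers cannot pass an arbitrary pointer/length pair.
    zstring_view(const char* s, std::size_t n) : data_(s), size_(n) {}

public:
    zstring_view() = default;

    zstring_view(const char* s) {
        assert(s != nullptr);
        data_ = s;
        size_ = std::strlen(s);
    }

    // c_str() guarantees a terminator at position size().
    zstring_view(const std::string& s) : data_(s.c_str()), size_(s.size()) {}

    const char* c_str() const { return data_; }
    std::size_t size() const { return size_; }

    // Dropping leading characters keeps the original terminator.
    zstring_view suffix(std::size_t pos) const {
        assert(pos <= size_);
        return zstring_view(data_ + pos, size_ - pos);
    }
    // Deliberately no substr(pos, count): the result would not end in \0.
};
```

The price for the guarantee is visible in the interface: a general substring operation cannot be offered, only one that keeps the end of the string.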
How many string types do we need to provide to satisfy all needs?
Probably too many; there are already a lot of string classes in the wild. The MFC library has CString, the Xerces library has XMLString, the Qt framework has QString, the Abseil library has absl::string_view, the EASTL library has eastl::string, and so on.
It is interesting to see that most libraries provide only an owning string class.
For my use-cases, something like string_view is the default choice. Most of the time I’m searching for something in a string, handling literals, comparing sequences of characters, or just passing them along, so there is no need to allocate any memory or deep-copy the content, as I’m not mutating it.
My take on string_view
So here it is, an incomplete proof of concept of a family of string_view classes!
Obligatory xkcd: https://xkcd.com/927/
Why a family and not a single class that can do it all?
Because some design decisions are incompatible with others. Some use cases want string_view to convert implicitly from std::string, others want the opposite. Some do not want any implicit conversion, while others want them in one or another direction. A class cannot satisfy all those requirements at once.
Adding a family of string_view-like types is a major overhead for the developer. On the other hand, considering that (not counting wchar_t, other character types, and other libraries) we already have literals, the const char* convention, std::string and std::string_view for dealing with strings, there will always be a certain complexity in any project when dealing with them.
What does string_views provide
I’ve identified (at least for my most common use-cases) the following "policies":
- conversion policy
- format policy
- content policy
- allocation/copy policy
- rw policy
The conversion policy is straightforward.
std::string_view needs to be explicitly converted to std::string, while the opposite conversion is implicit. In some use-cases this is the desired behavior; in others, I want the opposite.
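One possible way to express a conversion policy as a template parameter is sketched below. All names (to_owning, view_core, policy_view, lenient_view, strict_view, length_of) are invented for this example and are an assumption about how it could be done, not a description of how string_views is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical policy tag: how a view converts to the owning std::string.
enum class to_owning { implicit_conversion, explicit_conversion };

// Policy-independent part shared by every view.
class view_core {
    const char* data_;
    std::size_t size_;

public:
    view_core(const char* s) : data_(s), size_(0) {
        assert(s != nullptr);
        size_ = std::strlen(s);
    }
    const char* data() const { return data_; }
    std::size_t size() const { return size_; }
    std::string to_string() const { return std::string(data_, size_); }
};

template <to_owning Policy>
class policy_view;

template <>
class policy_view<to_owning::implicit_conversion> : public view_core {
public:
    using view_core::view_core;
    operator std::string() const { return to_string(); }
};

template <>
class policy_view<to_owning::explicit_conversion> : public view_core {
public:
    using view_core::view_core;
    explicit operator std::string() const { return to_string(); }
};

using lenient_view = policy_view<to_owning::implicit_conversion>;
using strict_view = policy_view<to_owning::explicit_conversion>;

static_assert(std::is_convertible<lenient_view, std::string>::value,
              "lenient_view converts implicitly");
static_assert(!std::is_convertible<strict_view, std::string>::value,
              "strict_view needs an explicit cast");

// Example of an interface that takes an owning string.
inline std::size_t length_of(const std::string& s) { return s.size(); }
```

With this sketch, length_of(lenient_view("abc")) compiles as is, while a strict_view has to be passed as length_of(static_cast<std::string>(v)), which makes the allocation visible at the call site.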
The format policy is also easy. std::string is \0-terminated, std::string_view is not. On the one hand, this is very unfortunate, since most OS APIs require a \0-terminated string; on the other hand, it permits functionality (like substring) that is otherwise not possible. So either we use const char*, or we use std::string, which might do unnecessary allocations, or we use std::string_view and hope that the content is \0-terminated (after all, it’s what we do with const char*, but it is much easier to misuse).
Notice that the trailing \0 is not part of the content (otherwise .size() would return one character more).
Neither std::string nor std::string_view has a content policy, but many interfaces do. For example, when creating files, a filename cannot contain an embedded \0 or /.
Either we create new string types with those invariants, or the checks are done outside of the string class. The content policy is a user-defined verification that happens on construction, making it possible to reuse an existing string class without wrapping it. Notice that \0-termination is not a specific content policy, as the trailing \0 is not part of the interface, but a \0-terminated string probably should have a content policy that disallows embedded \0, to avoid silent truncations.
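The filename example can be sketched as a content policy that runs once, on construction. The names filename_policy and checked_view are hypothetical, and throwing std::invalid_argument is just one possible error-handling choice made for this illustration.

```cpp
#include <stdexcept>
#include <string_view>

// Hypothetical content policy: rejects characters that cannot appear
// in a file name (an embedded \0 or a '/').
struct filename_policy {
    static bool is_valid(std::string_view s) {
        return s.find('/') == std::string_view::npos &&
               s.find('\0') == std::string_view::npos;
    }
};

// Hypothetical view that verifies its content once, on construction.
template <class ContentPolicy>
class checked_view {
    std::string_view content_;

public:
    explicit checked_view(std::string_view s) : content_(s) {
        if (!ContentPolicy::is_valid(s)) {
            throw std::invalid_argument("content policy violated");
        }
    }

    // All accessors are const, so the invariant cannot be broken later.
    std::string_view get() const { return content_; }
};
```

A function taking checked_view<filename_policy> can then rely on the invariant instead of validating its argument again.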
The allocation/copy policy is the big difference between std::string_view and std::string.
std::string performs deep copies, while std::string_view performs shallow copies. std::string allocates, thus copying is costly, while std::string_view does not allocate, thus passing it by value is preferred over passing it by reference.
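The difference between deep and shallow copies can be observed by comparing data pointers. The two helper functions below are made up for this illustration.

```cpp
#include <string>
#include <string_view>

// Deep copy: the copy owns its own buffer, so the pointers differ.
inline bool string_copy_shares_buffer(const std::string& s) {
    std::string copy = s;
    return copy.data() == s.data();
}

// Shallow copy: only the pointer and the length are copied.
inline bool view_copy_shares_buffer(std::string_view sv) {
    std::string_view copy = sv;
    return copy.data() == sv.data();
}
```

For the same input, the first function returns false and the second returns true, which is why copying a view is cheap and why a view must never outlive the buffer it points to.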
string_views does not have an allocation policy. It is (given the name I chose for the project) out-of-scope, and has profound implications on how the class should be used.
To sum it up: most string classes would have the same underlying implementation; the main differences are the constructors/conversions, the presence or absence of a trailing \0, content validation, and value/reference semantics.
The library targets C++14 and later; it could be backported to C++11 and C++03, but some features might be missing.
It should cover many use cases that are not covered by std::string and std::string_view.
There is one use-case that is not covered (at least not yet): a mutable string_view (as all methods are const). A mutable string_view still has reference semantics, but it imposes some design decisions, like the absence of operator== (it’s not a coincidence that std::span does not have it).
It would also be more difficult to enforce a content policy, as there are different functions (operator[], iterators, …) that are able to change the content directly.
Thus it does not make much sense to enforce a content policy on mutable strings inside the class itself.
Another design decision was not to duplicate every function of std::string/std::string_view.
Most functionalities can be implemented as free functions, making them easily reusable not only for string_view but also for most other string-like classes. This also has the advantage of providing a more uniform API when dealing with different string-like classes. The main disadvantage is that it does not make string_views a drop-in replacement for the string classes inside std.
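As an illustration of the free-function approach, here is a sketch of a starts_with that works with any type exposing data() and size() over char. The function is written for this example and is not necessarily how string_views provides it.

```cpp
#include <string>
#include <string_view>

// Works for any string-like type with data() and size() over char:
// std::string, std::string_view, or a custom view class.
template <class Str, class Prefix>
bool starts_with(const Str& s, const Prefix& prefix) {
    if (prefix.size() > s.size()) {
        return false;
    }
    return std::char_traits<char>::compare(s.data(), prefix.data(),
                                           prefix.size()) == 0;
}
```

The same function serves calls like starts_with(std::string("hello world"), std::string_view("hello")) and any user-defined view, without each class having to carry its own member function.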
As some functionalities depend on the invariants of the class (a generic substring for a \0-terminated string cannot work), and others are just redundant (.size() vs .length()), trying to add all possible functionalities would just bloat the API without much benefit.