The C++ logo, by Jeremy Kratz, licensed under CC0 1.0 Universal

string_view(s)


Categories: c++
Keywords: abseil c++ conversion cstring eastl explicit qstring std::string std::string_view string_view(s)

Since C++17, the standard library provides std::string_view.

Some people are not happy with some of its design decisions: for example, the fact that std::string_view, just like std::string, has a bloated interface, or that a std::string is implicitly convertible to std::string_view, which might create dangling views by accident. Another apparently controversial design decision is that std::string_view does not necessarily point to a \0 terminated string, making it generally unsuitable as a drop-in replacement for const char* and std::string based interfaces.

Most of the design decisions can be found in the original proposal; the main guidelines were:

  • more performant than std::string and const char*

  • make it a drop-in replacement for functions accepting non-owning strings

While it is unlikely that the standard library will get another string_view type (if we do not count std::span, from C++20), even though the original proposal mentions that it could make sense, it’s easy to implement such a class.

In fact, \0 termination and implicit conversion are only part of a small subset of properties we might like to change from a string class. For example, we might need some string class that ensures there is no such thing as an empty string or that handles errors in a different way.

First of all, let’s look at what properties the string types of the standard library and const char* have:

Table 1. Comparison
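| string type      | since | owning | \0 terminated                       | std::string conversion | const char* conversion | std::string_view conversion |
|------------------|-------|--------|-------------------------------------|------------------------|------------------------|-----------------------------|
| (const) char*    | C++98 | no     | mostly by convention                | implicit               | implicit               | implicit                    |
| (const) char[]   | C++98 | yes    | literals, otherwise not necessarily | implicit               | implicit               | implicit                    |
| std::string      | C++98 | yes    | yes                                 | implicit               | explicit               | implicit                    |
| std::string_view | C++17 | no     | no                                  | explicit               | explicit               | implicit                    |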

Conversions to std::string and std::string_view from char*, char[] and std::string were made implicit to allow writing

void foo(const std::string&);
void bar(std::string_view);


foo("literal");
bar("literal");
std::string str;
foo(str);
bar(str);

Spelling out std::string and std::string_view explicitly for those use-cases would have been very verbose.

Contrary to const char*, the conversion from the non-owning container (std::string_view) to the owning one (std::string) is explicit:

void foo(const std::string&);

std::string_view strv = "...";
foo(std::string(strv));

Contrary to const char*, the conversion from the owning container (std::string) to the non-owning one (std::string_view) is implicit:

void bar(std::string_view);

std::string str = "...";
bar(str);

Gaps in the string classes

If we want a non-owning string container type with explicit conversion, to avoid errors like

std::string_view strv = std::string("a");

we need to look outside of the standard library (supposing we want something better than char*).
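As a rough sketch of what such a type could look like: the class name explicit_view and everything inside it are hypothetical and only illustrate the idea of explicit-only conversions. It is not the API of any existing library. The deleted overload is one possible way to reject temporaries; it also rejects some uses that would be safe, which is a trade-off of this particular sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical non-owning view: converting from std::string must be spelled
// out, and binding to a temporary std::string is rejected at compile time.
class explicit_view {
    const char* data_ = "";
    std::size_t size_ = 0;

public:
    constexpr explicit_view() = default;

    // Explicit: `explicit_view v = str;` does not compile, `explicit_view v(str);` does.
    explicit explicit_view(const std::string& s) noexcept
        : data_(s.data()), size_(s.size()) {}

    // Deleted: a view of a temporary would dangle once the temporary is destroyed.
    explicit explicit_view(std::string&&) = delete;

    constexpr const char* data() const noexcept { return data_; }
    constexpr std::size_t size() const noexcept { return size_; }
};

static_assert(std::is_constructible<explicit_view, const std::string&>::value,
              "explicit construction from an lvalue is allowed");
static_assert(!std::is_convertible<const std::string&, explicit_view>::value,
              "there is no implicit conversion");
static_assert(!std::is_constructible<explicit_view, std::string>::value,
              "temporaries are rejected");

inline void explicit_view_example() {
    std::string str = "a";
    explicit_view ok(str);  // fine: str outlives the view
    assert(ok.size() == 1);
    // explicit_view bad1 = str;              // error: constructor is explicit
    // explicit_view bad2(std::string("a"));  // error: deleted overload
}
```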

If we want a non-owning string with an implicit conversion to owning, in order to be able to write

void foo(const std::string&);

std::string_view strv = "...";
foo(strv);

just as we can with const char*, we again need to look outside of the standard library.
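A minimal sketch of a view with that behavior, assuming a hypothetical class called implicit_view (the name and its members are made up for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical non-owning view that converts implicitly to std::string.
class implicit_view {
    const char* data_;
    std::size_t size_;

public:
    // The pointer must not be null and must point to a \0-terminated string.
    implicit_view(const char* s) : data_(s), size_(std::strlen(s)) {}

    // Implicit on purpose; every conversion allocates and copies the characters.
    operator std::string() const { return std::string(data_, size_); }

    std::size_t size() const noexcept { return size_; }
};

inline std::size_t foo(const std::string& s) { return s.size(); }

inline void implicit_conversion_example() {
    implicit_view strv = "...";
    assert(foo(strv) == 3);  // strv is converted to a temporary std::string
}
```

The convenience has a price: the allocation is no longer visible at the call site, which is presumably one reason why the standard library made this conversion explicit.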

If we want a non-owning \0 terminated string, then std::string_view is not the right choice. Unfortunately, const char* is the better choice, unless we design another string class.
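Such a class does not need to be big. The following is only a sketch under stated assumptions: zstring_view and suffix_from are hypothetical names, and the class keeps a single invariant, namely that the character right after the viewed range is \0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical non-owning view with the invariant c_str()[size()] == '\0'.
class zstring_view {
    const char* data_ = "";
    std::size_t size_ = 0;

public:
    zstring_view() = default;

    // A C string is \0-terminated by convention; the pointer must not be null.
    zstring_view(const char* s) : data_(s), size_(std::strlen(s)) {}

    // std::string guarantees a terminator after its last character.
    zstring_view(const std::string& s) noexcept
        : data_(s.c_str()), size_(s.size()) {}

    const char* c_str() const noexcept { return data_; }
    std::size_t size() const noexcept { return size_; }

    // Dropping a prefix keeps the terminator, so the result is still valid.
    zstring_view suffix_from(std::size_t n) const {
        assert(n <= size_);
        zstring_view result;
        result.data_ = data_ + n;
        result.size_ = size_ - n;
        return result;
    }
    // There is deliberately no substr(pos, count): cutting off the tail would
    // lose the terminator and break the invariant.
};

inline void zstring_view_example() {
    zstring_view text("hello world");
    zstring_view tail = text.suffix_from(6);
    assert(std::strcmp(tail.c_str(), "world") == 0);
}
```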

How many string types do we need to provide to satisfy all needs?

Probably too many; there are already a lot of string classes in the wild. The MFC library has CString, the Xerces library has XMLString, the Qt framework has QString, the Abseil library has absl::string_view, the EASTL library has eastl::string, and so on.

It is interesting to see that most libraries provide only an owning string class.

For my use-cases, something like string_view is the default choice. Most of the time I’m searching for something in a string, handling literals, comparing sequences of characters, or just passing them along, so there is no need to allocate any memory or deep-copy the content, as I’m not mutating them.

My take on string_view

So here it is, an incomplete proof of concept of a family of string_view classes!

Obligatory xkcd: https://xkcd.com/927/

Why a family and not a single class that can do it all?

Because some design decisions are incompatible with others. Some use cases want string_view to convert implicitly from std::string, others want the opposite. Some do not want any implicit conversion, while others want them in one or another direction. A class cannot satisfy all those requirements at once.

Adding a family of string_view like types is a major overhead for the developer. Considering that (not counting wchar_t, other character types, and other libraries), we already have literals, const char* convention, std::string and std::string_view for dealing with strings, there will always be a certain complexity in any project when dealing with them.

What does string_views provide?

I’ve identified (at least for my most common use-cases) the following "policies"

  • conversion policy

  • format policy

  • content policy

  • allocation/copy policy

  • rw policy

The conversion policy is straightforward.

std::string_view needs to get explicitly converted to std::string, while the opposite is not true. In some use-cases this is the desired behavior, in other use-cases, I want the opposite behavior.

The format policy is also easy. std::string is \0 terminated, std::string_view is not. On one side, this is very unfortunate, since most OS APIs require a \0 terminated string; on the other, it permits functionalities (like substring) that are otherwise not possible. So either we use const char*, or we use std::string, which might do unnecessary allocations, or we use std::string_view and hope that the content is \0 terminated (after all, it’s what we do with const char*, but much easier to misuse).

Notice that the trailing \0 is not part of the content (otherwise .size() would return one character more).
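The following small example, which requires C++17 and uses only std::string_view, illustrates why a view cannot be assumed to be \0 terminated where it ends. The function name is made up for this sketch.

```cpp
#include <cassert>
#include <cstring>
#include <string_view>

inline void termination_example() {
    std::string_view text = "hello world";
    std::string_view word = text.substr(0, 5);  // views "hello"

    // The terminator of the literal is not counted in the size.
    assert(text.size() == 11);
    assert(word.size() == 5);

    // data() still points into the original literal, so a C API that reads up
    // to the next \0 sees the whole literal, not just "hello".
    assert(std::strlen(word.data()) == 11);
}
```

In this case the read is at least within bounds, because the literal happens to be terminated further on. With a view over an arbitrary buffer there is no such guarantee.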

Neither std::string, nor std::string_view has a content policy, but many interfaces do. For example, when creating files, a filename cannot contain an embedded \0 or /.

It is possible to create new string types with those invariants, or to perform the checks outside of the string class. The content policy is a user-defined verification that happens on construction, making it possible to reuse an existing string class without wrapping it. Notice that \0-termination is not a specific content policy, as the trailing \0 is not part of the interface, but a \0-terminated string probably should have a content policy that disallows \0, to avoid silent truncations.
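One way such a policy could be expressed is as a template parameter whose check runs in the constructor. This is a sketch, not the actual interface of string_views: the names checked_view, accept_all, filename_policy and is_valid_filename are hypothetical, and reporting a violation by throwing is an assumption made here for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical default policy: every content is accepted.
struct accept_all {
    static bool is_valid(const char*, std::size_t) noexcept { return true; }
};

// Hypothetical policy for file names: no embedded \0 and no '/'.
struct filename_policy {
    static bool is_valid(const char* s, std::size_t n) noexcept {
        for (std::size_t i = 0; i != n; ++i) {
            if (s[i] == '\0' || s[i] == '/') return false;
        }
        return true;
    }
};

template <class ContentPolicy = accept_all>
class checked_view {
    const char* data_;
    std::size_t size_;

public:
    // The policy is verified once, on construction (throwing is an assumption).
    checked_view(const char* s, std::size_t n) : data_(s), size_(n) {
        if (!ContentPolicy::is_valid(s, n)) {
            throw std::invalid_argument("content policy violated");
        }
    }
    explicit checked_view(const std::string& s)
        : checked_view(s.data(), s.size()) {}

    const char* data() const noexcept { return data_; }
    std::size_t size() const noexcept { return size_; }
};

using filename_view = checked_view<filename_policy>;

inline bool is_valid_filename(const std::string& name) {
    try {
        filename_view checked(name);
        return checked.size() == name.size();
    } catch (const std::invalid_argument&) {
        return false;
    }
}

inline void content_policy_example() {
    assert(is_valid_filename("notes.txt"));
    assert(!is_valid_filename("dir/notes.txt"));
}
```

Since the view is immutable, the check cannot be invalidated through the view itself after construction.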

The allocation/copy policy is the big difference between std::string_view and std::string.

std::string performs deep copies, while std::string_view performs shallow copies. std::string allocates, thus copying is costly, while std::string_view does not allocate, thus passing it by value is preferred over passing it by reference.
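The difference can be observed directly with the standard types (C++17). The function name below is made up for this illustration.

```cpp
#include <cassert>
#include <string>
#include <string_view>

inline void copy_semantics_example() {
    std::string owner = "a string that owns its characters";

    std::string deep = owner;             // deep copy: the characters are duplicated
    std::string_view shallow = owner;     // shallow: only pointer and length
    std::string_view shallow2 = shallow;  // copying a view copies pointer and length

    assert(deep.data() != owner.data());     // separate storage
    assert(shallow.data() == owner.data());  // same storage
    assert(shallow2.data() == shallow.data());

    owner[0] = 'A';             // modifies in place, no reallocation
    assert(deep[0] == 'a');     // the copy is unaffected
    assert(shallow[0] == 'A');  // the views observe the change
}
```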

string_views does not have an allocation policy. It is (given the name I chose for the project) out-of-scope, and has profound implications on how the class should be used.

To sum it up: most string classes would have the same underlying implementation; the main differences are the constructors/conversions, the presence/absence of a trailing \0, content validation, and value/reference semantics.

The library targets C++14 and later. It could be backported to C++11 and C++03, but some features might be missing.

It should cover many use cases that are not covered by std::string and std::string_view.

There is one use-case that is not covered (at least not yet): a mutable string_view (as all methods are const). A mutable string_view still has reference semantics, but it imposes some design decisions, like the absence of operator== (it’s not a coincidence that std::span does not have it).

It would also be more difficult to enforce a content policy, as there are different functions (operator[], iterators, …​) that are able to change the content directly.

Thus it does not make much sense to enforce a content policy on mutable strings inside the class itself.

Another design decision was not to duplicate every function of std::string/std::string_view.

Most functionalities can be implemented as free functions, making them easily reusable not only for string_view, but also for most other string-like classes. This also has the advantage of providing a more uniform API when dealing with different string-like classes. The main disadvantage is that it does not make string_views a drop-in replacement for the string classes inside std.
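As an illustration of the free-function approach, here is a sketch of a prefix check that works with any type exposing data() and size() over char. The name has_prefix is hypothetical and not necessarily what string_views provides; the example uses std::string_view and therefore needs C++17, although the function itself does not.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Works with std::string, std::string_view, or any custom view class that
// exposes data() and size() over char.
template <class Str, class Prefix>
bool has_prefix(const Str& s, const Prefix& prefix) {
    if (prefix.size() > s.size()) return false;
    return std::char_traits<char>::compare(s.data(), prefix.data(),
                                           prefix.size()) == 0;
}

inline void free_function_example() {
    std::string owning = "string_view";
    std::string_view view = "string";

    assert(has_prefix(owning, view));   // std::string with std::string_view
    assert(has_prefix(view, view));     // std::string_view with itself
    assert(!has_prefix(view, owning));  // a longer prefix can never match
}
```

The same function serves every string-like class without the class having to know about it, which is the uniformity mentioned above.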

As some functionalities depend on the invariant of the class (a generic substring for a \0-terminated string cannot work), and others are just redundant (.size() vs .length()), trying to add all possible functionalities would just bloat the API without much benefit.