
string_view(s)


Categories: c++
Keywords: abseil c++ conversion cstring eastl explicit qstring std::string string_view

Since C++17, the standard library provides std::string_view.

Some people are not happy with some of its design decisions. For example, std::string_view, just like std::string, has a bloated interface, and a std::string is implicitly convertible to std::string_view, which can create dangling views by accident. Another, apparently controversial, design decision is that std::string_view does not necessarily point to a \0-terminated string, making it generally unsuitable as a drop-in replacement for const char* and std::string based interfaces.

Most of the design decisions are explained in the original proposal. The main guidelines were:

  • more performant than std::string and const char*

  • make it a drop-in replacement for functions accepting non-owning strings

While it is unlikely that the standard will get another string_view type (if we do not count std::span, from C++20), even though the original proposal mentions that it could make sense, it’s easy to implement such a class ourselves.

In fact, \0 termination and implicit conversion are only a small subset of the properties we might like to change in a string class. For example, we might need a string class that ensures there is no such thing as an empty string, or one that handles errors in a different way.

First of all, let’s look at what properties the string types of the standard library and const char* have:

Table 1. Comparison
string type       since  owning  \0 terminated                        std::string conversion  const char* conversion  std::string_view conversion
(const) char*     C++98  no      mostly by convention                 implicit                implicit                implicit
(const) char[]    C++98  yes     literals, otherwise not necessarily  implicit                implicit                implicit
std::string       C++98  yes     yes                                  implicit                explicit                implicit
std::string_view  C++17  no      no                                   explicit                explicit                implicit

Conversions to std::string and std::string_view from char*, char[] and std::string were made implicit to allow writing:

void foo(const std::string&);
void bar(std::string_view);


foo("literal");
bar("literal");
std::string str;
foo(str);
bar(str);

Spelling out std::string(...) and std::string_view(...) at every call site for those use-cases would have been very verbose.

Contrary to const char*, the conversion from the non-owning container (std::string_view) to the owning (std::string) is explicit:

void foo(const std::string&);

std::string_view strv = "...";
foo(std::string(strv));

Contrary to const char*, the conversion from the owning container (std::string) to the non-owning (std::string_view) is implicit:

void bar(std::string_view);

std::string str = "...";
bar(str);

Gaps in the string classes

If we want a non-owning string container type with explicit conversion, to avoid errors like

std::string_view strv = std::string("a");

we need to look outside of the standard library (assuming we want something better than char*).
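A minimal sketch of such a type could look like the following. The class name explicit_view is hypothetical and not taken from any library mentioned here; the idea is only that literals still convert implicitly, a named std::string must be converted explicitly, and a temporary std::string is rejected at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical non-owning view; the name explicit_view is illustrative only.
class explicit_view {
    const char* data_;
    std::size_t size_;

public:
    // C strings and literals are still accepted implicitly.
    explicit_view(const char* s) : data_(s), size_(std::strlen(s)) {}

    // Viewing a named std::string has to be spelled out at the call site.
    explicit explicit_view(const std::string& s)
        : data_(s.data()), size_(s.size()) {}

    // Viewing a temporary std::string would dangle, so it does not compile.
    explicit_view(const std::string&&) = delete;

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }
};

static_assert(!std::is_convertible<const std::string&, explicit_view>::value,
              "an lvalue std::string is not converted implicitly");
static_assert(std::is_constructible<explicit_view, const std::string&>::value,
              "explicit construction from an lvalue std::string works");
static_assert(!std::is_constructible<explicit_view, std::string>::value,
              "a temporary std::string is rejected");
```

With this sketch, explicit_view strv = std::string("a"); no longer compiles, while explicit_view strv(str); for a named std::string str still does.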

If we want a non-owning string with an implicit conversion to owning, in order to be able to write

void foo(const std::string&);

std::string_view strv = "...";
foo(strv);

just as we can with const char*, we again need to look outside of the standard library.
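As a rough illustration, a non-owning type with an implicit conversion to std::string only needs a conversion operator. The names owning_convertible_view and takes_string are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical non-owning view that converts implicitly to std::string.
class owning_convertible_view {
    const char* data_;
    std::size_t size_;

public:
    owning_convertible_view(const char* s)
        : data_(s), size_(std::strlen(s)) {}

    // Implicit on purpose: the copy (and a possible allocation) is not
    // visible at the call site.
    operator std::string() const { return std::string(data_, size_); }

    std::size_t size() const { return size_; }
};

// Hypothetical function with an owning-string interface; it returns the
// length only to make the effect observable.
std::size_t takes_string(const std::string& s) { return s.size(); }
```

The price of the convenience is that every such call silently creates a temporary std::string.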

If we want a non-owning \0-terminated string, then std::string_view is not the right choice. Unfortunately, const char* is the better choice, unless we design another string class.
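Such a class does not need to be complicated. The following sketch, with the hypothetical name zstring_view, can only be constructed from sources that are \0 terminated, and only offers operations that keep the terminator in place.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical non-owning view that is \0 terminated by construction.
class zstring_view {
    const char* data_;
    std::size_t size_;

public:
    // Assumes, by the usual convention, that s points to a terminated string.
    zstring_view(const char* s) : data_(s), size_(std::strlen(s)) {}

    // std::string::c_str() always returns a terminated buffer.
    zstring_view(const std::string& s) : data_(s.c_str()), size_(s.size()) {}

    const char* c_str() const { return data_; }
    std::size_t size() const { return size_; }

    // Dropping a prefix keeps the terminator. Dropping a suffix would not,
    // which is why there is no general substr() here.
    zstring_view without_prefix(std::size_t n) const {
        assert(n <= size_);
        zstring_view result(*this);
        result.data_ += n;
        result.size_ -= n;
        return result;
    }
};
```

The conversion policy (implicit from std::string in this sketch) is an independent choice and could just as well be explicit.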

How many string types do we need to provide to satisfy all needs?

Probably too many; there are already a lot of string classes in the wild. The MFC library has CString, the Xerces library has XMLString, the Qt framework has QString, the abseil library has absl::string_view, the EASTL library has eastl::string, and so on.

It is interesting to see that most libraries provide only an owning string class.

For my use-cases, something like string_view is the default choice. Most of the time I’m searching for something in a string, handling literals, comparing sequences of characters, or just passing them along, so there is no need to allocate any memory or deep-copy the content, as I’m not mutating it.

My take on string_view

So here it is, an incomplete proof of concept of a family of string_view classes!

Obligatory xkcd: https://xkcd.com/927/

Why a family and not a single class that can do it all?

Because some design decisions are incompatible with others. Some use cases want string_view to convert implicitly from std::string, others want the opposite. Some do not want any implicit conversion, while others want them in one or another direction. A class cannot satisfy all those requirements at once.

Adding a family of string_view-like types is a major overhead for the developer. Considering that (not counting wchar_t, other character types, and other libraries) we already have literals, the const char* convention, std::string and std::string_view for dealing with strings, there will always be a certain complexity in any project when dealing with them.

What does string_views provide?

I’ve identified (at least for my most common use-cases) the following "policies"

  • conversion policy

  • format policy

  • content policy

  • allocation/copy policy

  • rw policy

The conversion policy is straightforward.

std::string_view needs to get explicitly converted to std::string, while the opposite is not true. In some use-cases this is the desired behavior, in other use-cases, I want the opposite behavior.

The format policy is also easy. std::string is \0 terminated, std::string_view is not. On the one hand, this is very unfortunate, since most OS APIs require a \0-terminated string; on the other hand, it permits functionality (like substring) that would otherwise not be possible. So either we use const char*, or we use std::string, which might do unnecessary allocations, or we use std::string_view and hope that the content is \0 terminated (after all, it’s what we do with const char*, but it is much easier to misuse).

Notice that the trailing \0 is not part of the content (otherwise .size() would return one character more).
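To make the problem concrete, here is a small sketch using std::string_view from C++17. The helper length_seen_by_c_api is a made-up name that stands in for any C function receiving only the pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// What a C API would see if it were handed only sv.data().
// Assumption: the buffer behind sv is \0 terminated somewhere, as is the
// case for the string literal used below.
std::size_t length_seen_by_c_api(std::string_view sv) {
    return std::strlen(sv.data());
}

void substring_demo() {
    std::string_view whole = "hello world";       // backed by a literal
    std::string_view hello = whole.substr(0, 5);  // no copy, no terminator

    assert(hello == "hello");
    assert(hello.size() == 5);  // the trailing \0 is never counted

    // The view ends after 5 characters, but the C function keeps reading
    // until it finds the terminator of the underlying literal.
    assert(length_seen_by_c_api(whole) == whole.size());
    assert(length_seen_by_c_api(hello) == whole.size());
}
```

If the underlying buffer had no terminator at all, the same call would read out of bounds.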

Neither std::string, nor std::string_view has a content policy, but many interfaces do. For example, when creating files, a filename cannot contain an embedded \0 or /.

It is possible to create new string types with those invariants, or to perform the checks outside of the string class. The content policy is a user-defined verification that happens on construction, making it possible to reuse an existing string class without wrapping it. Notice that \0-termination is not a specific content policy, as the trailing \0 is not part of the interface, but a \0-terminated string should probably have a content policy that disallows embedded \0 characters, to avoid silent truncations.
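One way to picture a content policy is as a template parameter whose check runs once, in the constructor. The sketch below is only an illustration: checked_view, filename_policy and is_valid_filename are hypothetical names, and throwing an exception is just one of several possible ways to report a violation.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical content policy: characters a file name cannot contain.
struct filename_policy {
    static void check(const char* data, std::size_t size) {
        for (std::size_t i = 0; i != size; ++i) {
            if (data[i] == '\0' || data[i] == '/') {
                throw std::invalid_argument("invalid character in file name");
            }
        }
    }
};

// Hypothetical view that verifies its content once, on construction.
template <class ContentPolicy>
class checked_view {
    const char* data_;
    std::size_t size_;

public:
    checked_view(const char* data, std::size_t size)
        : data_(data), size_(size) {
        ContentPolicy::check(data, size);
    }
    explicit checked_view(const std::string& s)
        : checked_view(s.data(), s.size()) {}

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }
};

using filename_view = checked_view<filename_policy>;

// Helper for experimenting: true if the view can be constructed.
bool is_valid_filename(const std::string& s) {
    try {
        filename_view v(s);
        (void)v;
        return true;
    } catch (const std::invalid_argument&) {
        return false;
    }
}
```

Because all member functions are const, the invariant established in the constructor cannot be broken through the view afterwards.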

The allocation/copy policy is the big difference between std::string_view and std::string.

std::string performs deep copies, while std::string_view performs shallow copies. std::string allocates, thus copying is costly, while std::string_view does not allocate, thus passing by value is preferred over passing by reference.
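The difference can be observed by comparing the addresses of the characters after a copy. This sketch assumes a C++17 compiler and a std::string implementation without copy-on-write (as required since C++11); shares_storage is a helper invented for the example.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// True if both objects refer to the same characters in memory.
template <class A, class B>
bool shares_storage(const A& a, const B& b) {
    return a.data() == b.data();
}

void copy_demo() {
    std::string owner = "some text";

    std::string deep = owner;         // copies the characters
    std::string_view view = owner;    // copies a pointer and a size
    std::string_view shallow = view;  // same again, still no characters copied

    assert(!shares_storage(owner, deep));
    assert(shares_storage(owner, view));
    assert(shares_storage(view, shallow));
}
```

This is also why a view is only valid as long as the object it points into is alive and unchanged.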

string_views does not have an allocation policy. It is out of scope (given the name I chose for the project), as it has profound implications for how the class should be used.

To sum it up: most string classes would have the same underlying implementation; the main differences are the constructors/conversions, the presence/absence of a trailing \0, content validation, and value/reference semantics.

The library targets C++>=14; it could be backported to C++11 and C++03, but some features might be missing.

It should cover many use cases that are not covered by std::string and std::string_view.

There is one use-case that is not covered (at least not yet): a mutable string_view (as all methods are const). A mutable string_view still has reference semantics, but it imposes some design decisions, like the absence of operator== (it’s not a coincidence that std::span does not have it).

It would also be more difficult to enforce a content policy, as there are different functions (operator[], iterators, …​) that are able to change the content directly.

Thus it does not make much sense to enforce a content policy on mutable strings inside the class itself.

Another design decision was not to duplicate every function of std::string/std::string_view.

Most functionality can be implemented as free functions, making it easily reusable not only for string_view, but also for most other string-like classes. This also has the advantage of providing a more uniform API when dealing with different string-like classes. The main disadvantage is that string_views is not a drop-in replacement for the string classes inside std.
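As an illustration of the free-function approach, the following sketch implements a prefix check that works with any type exposing data() and size(). Both has_prefix and tiny_view are hypothetical names for this example and are not taken from the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical free function: usable with std::string and with any other
// string-like class that provides data() and size().
template <class Str, class Prefix>
bool has_prefix(const Str& s, const Prefix& prefix) {
    return s.size() >= prefix.size() &&
           std::equal(prefix.data(), prefix.data() + prefix.size(), s.data());
}

// Minimal stand-in for a custom string-like class.
struct tiny_view {
    const char* ptr;
    std::size_t len;

    const char* data() const { return ptr; }
    std::size_t size() const { return len; }
};
```

The same function then serves has_prefix(std::string("string_view"), tiny_view{"string", 6}) and the reverse combination, without either class needing a member function for it.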

As some functionalities depend on the invariant of the class (a generic substring for \0-terminated string cannot work), and others are just redundant (.size() vs .length()), trying to add all possible functionalities would just bloat the API, without many benefits.

