<!DOCTYPE html>
    <html>
    <head>
        <meta charset="UTF-8">
        <title>string_view range constructor should be explicit</title>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/Microsoft/vscode/extensions/markdown-language-features/media/highlight.css">
<style>
body { color: #000000; background-color: #FFFFFF; } del { text-decoration: line-through; color: #8B0040; } ins { text-decoration: underline; color: #005100; } table { border-collapse: collapse; } th { border: 1px solid black; padding-left: 0.8em; padding-right: 0.8em; vertical-align: top; } td { border: 1px solid black; padding-left: 0.8em; padding-right: 0.8em; vertical-align: top; } span.comment { font-style: italic; } span.comment code { font-style: normal; } span.comment em { font-weight: bold; } span.comment var { font-style: normal; } p.example { margin-left: 1em; } pre.example { margin-left: 1em; } div.example { margin-left: 1em; } code.extract { background-color: #F5F6A2; } pre.extract { margin-left: 2em; background-color: #F5F6A2; border: 1px solid #E1E28E; } .attribute { margin-left: 2em; } .attribute dt { float: left; font-style: italic; padding-right: 1ex; } .attribute dd { margin-left: 0em; } blockquote.std { color: #000000; background-color: #F1F1F1; border: 1px solid #D1D1D1; padding-left: 0.5em; padding-right: 0.5em; } blockquote.std.ins { text-decoration: underline; color: #000000; background-color: #C8FFC8; border: 1px solid #B3EBB3; } blockquote.std.del { text-decoration: line-through; color: #000000; background-color: #FFC8EB; border: 1px solid #ECB3C7; } blockquote.std div { margin-top: 1em; margin-bottom: 1em; } blockquote.std ins { text-decoration: underline; color: #000000; background-color: #C8FFC8; } blockquote.std del { text-decoration: line-through; color: #000000; background-color: #FFC8EB; } blockquote.std ins * { background-color: inherit; } blockquote.std del * { background-color: inherit; } blockquote.std dt { margin-top: 1em; } blockquote.std ul { list-style-type: none; padding-left: 2em; margin-top: -0.2em; margin-bottom: -0.2em; } blockquote.std li { margin-top: 0.6em; margin-bottom: 0.6em; } blockquote.std ul > li::before { content: '\2014'; position: absolute; margin-left: -1.5em; } blockquote.std table { border: 1px solid 
black; border-collapse: collapse; margin-left: auto; margin-right: auto; margin-top: 0.8em; text-align: left; hyphens: none; } blockquote.std caption { margin-bottom: 1em; } blockquote.std th { border: inherit; padding-left: 1em; padding-right: 1em; vertical-align: top; } blockquote.std td { border: inherit; padding-left: 1em; padding-right: 1em; vertical-align: top; } blockquote.std th.left, td.left { text-align: left; } blockquote.std th.right, td.right { text-align: right; } blockquote.std th.center, td.center { text-align: center; } blockquote.std th.justify, td.justify { text-align: justify; } blockquote.std th.border, td.border { border-left: 1px solid black; } blockquote.std tr.rowsep, td.cline { border-top: 1px solid black; } blockquote.std tr.capsep { border-top: 3px solid black; border-top-style: double; } blockquote.std th { border-bottom: 1px solid black; } div.stdnote { display: inline; } div.stdexample { display: inline; } a.stdref::before { content: "["; } a.stdref::after { content: "]"; } table.frontmatter { border: 0; margin: 0; } table.frontmatter th { border: 0; } table.frontmatter td { border: 0; } span.highlight { background-color: #7FDFFF }
</style>


    </head>
    <body class="vscode-body vscode-light">
        <script type="text/javascript">
    document.addEventListener("DOMContentLoaded", function() {
        var notes = document.getElementsByClassName("stdnote");
        for (var n = 0; n < notes.length; ++n) {
            var node = notes[n];
            node.insertAdjacentHTML("beforebegin",
                "<span>[&nbsp;<i>Note:<\/i> <\/span>");
            node.insertAdjacentHTML("beforeEnd",
                "<span> &mdash;&nbsp;<i>end note<\/i>&nbsp;]<\/span>");
        }

        var notes_ins = document.getElementsByClassName("stdnote-ins");
        for (var n = 0; n < notes_ins.length; ++n) {
            var node = notes_ins[n];
            node.insertAdjacentHTML("beforebegin",
                "<span><ins>[&nbsp;<i>Note:<\/i> <\/ins><\/span>");
            node.insertAdjacentHTML("beforeEnd",
                "<span><ins> &mdash;&nbsp;<i>end note<\/i>&nbsp;]<\/ins><\/span>");
        }

        var examples = document.getElementsByClassName("stdexample");
        for (var n = 0; n < examples.length; ++n) {
            var node = examples[n];
            node.insertAdjacentHTML("beforebegin",
                "<span>[&nbsp;<i>Example:<\/i> <\/span>");
            node.insertAdjacentHTML("beforeEnd",
                "<span> &mdash;&nbsp;<i>end example<\/i>&nbsp;]<\/span>");
        }

        var examples_ins = document.getElementsByClassName("stdexample-ins");
        for (var n = 0; n < examples_ins.length; ++n) {
            var node = examples_ins[n];
            node.insertAdjacentHTML("beforebegin",
                "<span><ins>[&nbsp;<i>Example:<\/i> <\/ins><\/span>");
            node.insertAdjacentHTML("beforeEnd",
                "<span><ins> &mdash;&nbsp;<i>end example<\/i>&nbsp;]<\/ins><\/span>");
        }

        var references = document.getElementsByClassName("stdref");
        for (var n = 0; n < references.length; ++n) {
            var node = references[n];
            node.setAttribute("href", "https://eel.is/c++draft/" + node.innerText);
        }

        var wg21links = document.getElementsByClassName("wg21link");
        for (var n = 0; n < wg21links.length; ++n) {
            var node = wg21links[n];
            node.setAttribute("href", "https://wg21.link/" + node.innerText);
        }
    });
</script>
<h1 id="string_view-range-constructor-should-be-explicit"><code>string_view</code> range constructor should be <code>explicit</code></h1>
<table class="frontmatter" border="0" cellpadding="0" cellspacing="0" width="619">
    <tr>
        <td align="left" valign="top">Document number:</td>
        <td>P2499R0</td>
    </tr>
    <tr>
        <td align="left" valign="top">Date:</td>
        <td>2021-12-07</td>
    </tr>
    <tr>
        <td align="left" valign="top">Project:</td>
        <td>Programming Language C++</td>
    </tr>
    <tr>
        <td align="left" valign="top">Audience:</td>
        <td>LEWG</td>
    </tr>
    <tr>
        <td align="left" valign="top">Reply-to:</td>
        <td>James Touton &lt;<a href="mailto:bekenn@gmail.com">bekenn@gmail.com</a>&gt;</td>
    </tr>
</table>
<h2 id="introduction">Introduction</h2>
<p><a class="wg21link">P1989R2</a> added a new constructor to <code>basic_string_view</code> that allows for implicit conversion from any contiguous range of the corresponding character type.  This implicit conversion relies on the premise that a range of <code>char</code> is inherently string-like.  While that premise holds in some situations, it is hardly universally true, and the implicit conversion is likely to cause problems.  This paper proposes making the conversion explicit instead of implicit in order to avoid misleading programmers.</p>
<h2 id="rationale">Rationale</h2>
<p><a class="wg21link">P1391R3</a> (a precursor to <a class="wg21link">P1989R2</a>) justifies making the conversion implicit with the incorrect notion that &quot;a contiguous range of character[s] is the same platonic thing as a <code>string_view</code>&quot;, despite correctly pointing out that &quot;[ranges] with different [traits types] should not be implicitly convertible&quot;.  The latter acknowledgment recognizes that there are semantic nuances here beyond the value type, and as a result, no direct conversion is provided from range types having a mismatched <code>traits_type</code>.</p>
<p>One such semantic difference between a string and an arbitrary range of <code>char</code> is mentioned in <a class="wg21link">P1391R3</a> (lightly modified for correctness):</p>
<blockquote>
<pre class="example"><code class="language-c++"><span class="hljs-keyword">char</span> <span class="hljs-keyword">const</span> t[] = <span class="hljs-string">&quot;text&quot;</span>;
<span class="hljs-function"><span class="hljs-built_in">std</span>::string_view <span class="hljs-title">s1</span><span class="hljs-params">(t)</span></span>; <span class="hljs-comment">// s1.size() == 4;</span>

<span class="hljs-function"><span class="hljs-built_in">std</span>::span&lt;<span class="hljs-keyword">char</span> <span class="hljs-keyword">const</span>&gt; <span class="hljs-title">tv</span><span class="hljs-params">(t)</span></span>;
<span class="hljs-function"><span class="hljs-built_in">std</span>::string_view <span class="hljs-title">s2</span><span class="hljs-params">(tv)</span></span>; <span class="hljs-comment">// s2.size() == 5;</span>
</code></pre>
</blockquote>
<p>Here, <code>s1</code> and <code>s2</code> are constructed from equivalent ranges of <code>const char</code>, but the resulting <code>string_view</code> objects are different.  This is because overload resolution for the array argument selects <code>string_view</code>'s constructor from <code>const char*</code>, a type which by convention points to a string followed by a null terminator.  The terminator is not semantically part of the string, so the resulting <code>string_view</code> doesn't include it.  The span, by contrast, does include the null terminator.</p>
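<p>The size difference can be checked at compile time.  The following is a minimal sketch, not part of P1391R3; it uses the pointer-and-size constructor as a stand-in for the range constructor (an assumption made so that the sketch does not depend on C++23 library support), and the names <code>array_extent</code> and <code>whole_array</code> are purely illustrative:</p>
<pre class="example"><code class="language-c++">#include &lt;cstddef&gt;
#include &lt;iterator&gt;
#include &lt;string_view&gt;

constexpr char t[] = "text";

// The array holds five elements: 't', 'e', 'x', 't', '\0'.
constexpr std::size_t array_extent = std::size(t);

// The const char* constructor measures up to, but not including, the terminator.
constexpr std::string_view s1(t);

// Pointer plus size stands in for the range constructor here (assumption):
// every element of the array, terminator included, becomes part of the view.
constexpr std::string_view whole_array(t, std::size(t));

static_assert(array_extent == 5);
static_assert(s1.size() == 4);
static_assert(whole_array.size() == 5);
static_assert(whole_array.back() == '\0');
</code></pre>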
<p>Laudably, <a class="wg21link">P1989R2</a> recognizes several mechanisms by which a type may indicate that it provides string-like data, and the range constructor is disabled in these cases:</p>
<ul>
<li>The type is implicitly convertible to <code>const charT*</code></li>
<li>The type provides its own conversion function to the target <code>basic_string_view</code> specialization</li>
<li>The type defines its own <code>traits_type</code>, and that type differs from the string view's <code>traits_type</code></li>
</ul>
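<p>The three opt-outs listed above can be approximated as a C++20 concept.  This is only a sketch: the names <code>range_ctor_enabled</code>, <code>has_mismatched_traits</code>, and <code>other_traits</code> are hypothetical, and the range and value-type requirements are included as assumptions about the rest of the constraint set rather than as a quotation of the normative wording:</p>
<pre class="example"><code class="language-c++">#include &lt;array&gt;
#include &lt;concepts&gt;
#include &lt;ranges&gt;
#include &lt;string&gt;
#include &lt;string_view&gt;
#include &lt;type_traits&gt;
#include &lt;vector&gt;

// True when R names a traits_type that differs from Traits.
template&lt;class R, class Traits&gt;
concept has_mismatched_traits =
    requires { typename std::remove_reference_t&lt;R&gt;::traits_type; } &amp;&amp;
    !std::same_as&lt;typename std::remove_reference_t&lt;R&gt;::traits_type, Traits&gt;;

// Hypothetical approximation of when the range constructor participates.
template&lt;class R, class CharT, class Traits = std::char_traits&lt;CharT&gt;&gt;
concept range_ctor_enabled =
    !std::same_as&lt;std::remove_cvref_t&lt;R&gt;, std::basic_string_view&lt;CharT, Traits&gt;&gt; &amp;&amp;
    std::ranges::contiguous_range&lt;R&gt; &amp;&amp;
    std::ranges::sized_range&lt;R&gt; &amp;&amp;
    std::same_as&lt;std::ranges::range_value_t&lt;R&gt;, CharT&gt; &amp;&amp;
    // Opt-out 1: the type converts to const charT*.
    !std::is_convertible_v&lt;R, const CharT*&gt; &amp;&amp;
    // Opt-out 2: the type has its own conversion function to the view.
    !requires(std::remove_cvref_t&lt;R&gt;&amp; d) {
        d.operator ::std::basic_string_view&lt;CharT, Traits&gt;();
    } &amp;&amp;
    // Opt-out 3: the type names a different traits_type.
    !has_mismatched_traits&lt;R, Traits&gt;;

// Hypothetical traits type, used only to exercise the third opt-out.
struct other_traits : std::char_traits&lt;char&gt; {};

static_assert(range_ctor_enabled&lt;std::vector&lt;char&gt;, char&gt;);
static_assert(range_ctor_enabled&lt;std::array&lt;char, 200&gt;, char&gt;);
static_assert(!range_ctor_enabled&lt;char[5], char&gt;);
static_assert(!range_ctor_enabled&lt;std::string, char&gt;);
static_assert(!range_ctor_enabled&lt;std::basic_string&lt;char, other_traits&gt;, char&gt;);
</code></pre>
<p>Note that <code>std::vector&lt;char&gt;</code> and <code>std::array&lt;char, 200&gt;</code> satisfy the sketch even though neither type makes any claim to hold a string.</p>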
<p>The presence of these mechanisms refutes the notion that &quot;a contiguous range of character[s] is the same platonic thing as a <code>string_view</code>&quot;.  Nonetheless, it is certainly true that constructing a <code>string_view</code> from a range of <code>char</code> is a useful operation, provided that the user knows that the entire range actually constitutes a string.  This paper therefore proposes to keep the range constructor, but make it <code>explicit</code>.</p>
<h3 id="pitfalls">Pitfalls</h3>
<p>Very often, a contiguous range of <code>char</code> is used as a buffer for storing string data.  This does not imply that the <em>entire</em> range constitutes a string:</p>
<pre class="example"><code class="language-c++"><span class="hljs-function"><span class="hljs-keyword">extern</span> <span class="hljs-keyword">void</span> <span class="hljs-title">get_string</span><span class="hljs-params">(<span class="hljs-built_in">std</span>::span&lt;<span class="hljs-keyword">char</span>&gt; buffer)</span></span>;
<span class="hljs-function"><span class="hljs-keyword">extern</span> <span class="hljs-keyword">void</span> <span class="hljs-title">use_string</span><span class="hljs-params">(<span class="hljs-built_in">std</span>::string_view str)</span></span>;

<span class="hljs-keyword">char</span> buf[<span class="hljs-number">200</span>];
get_string(buf);
use_string(buf);
</code></pre>
<p>This code is representative of quite a lot of real-world code that exists today.  The <code>get_string</code> function fills a portion of a buffer with a null-terminated string, and the <code>use_string</code> function consumes that string.  This code works in C++20, and would also work in C++17 with a minor modification to <code>get_string</code> to pass the buffer as a pointer and size instead of as a span.  This code will continue to work in the presence of <a class="wg21link">P1989R2</a>; the range constructor is disabled because the array is convertible to <code>const char*</code> (and even if it weren't disabled, overload resolution would prefer the <code>const char*</code> constructor anyway).</p>
<p>Many code style guidelines emphasize the use of <code>std::array</code> over raw arrays, so let's make that change:</p>
<pre class="example"><code class="language-c++"><span class="hljs-function"><span class="hljs-keyword">extern</span> <span class="hljs-keyword">void</span> <span class="hljs-title">get_string</span><span class="hljs-params">(<span class="hljs-built_in">std</span>::span&lt;<span class="hljs-keyword">char</span>&gt; buffer)</span></span>;
<span class="hljs-function"><span class="hljs-keyword">extern</span> <span class="hljs-keyword">void</span> <span class="hljs-title">use_string</span><span class="hljs-params">(<span class="hljs-built_in">std</span>::string_view str)</span></span>;

<span class="hljs-built_in">std</span>::<span class="hljs-built_in">array</span>&lt;<span class="hljs-keyword">char</span>, <span class="hljs-number">200</span>&gt; buf;
get_string(buf);
use_string(buf); <span class="hljs-comment">// oops</span>
</code></pre>
<p>The code compiles and runs, and in many cases will appear to work, but where the length of the <code>string_view</code> parameter used to be inferred from the presence of a null terminator, it is now unavoidably the size of the entire buffer, and unquestionably wrong given that the prior code was correct.  If the range constructor were <code>explicit</code>, this code would generate an error diagnostic.</p>
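<p>To make the change in behavior concrete, here is a small sketch with hypothetical bodies for <code>get_string</code> and <code>use_string</code> (neither body appears in the example above).  It assumes C++20 for <code>std::span</code>, and it spells the whole-buffer conversion with pointer and size so that it does not rely on the range constructor being available:</p>
<pre class="example"><code class="language-c++">#include &lt;array&gt;
#include &lt;cstddef&gt;
#include &lt;cstring&gt;
#include &lt;span&gt;
#include &lt;string_view&gt;

// Hypothetical stand-in for get_string: writes "hello" and a null terminator
// to the front of the buffer (assumes the buffer has room for six chars).
void get_string(std::span&lt;char&gt; buffer) {
    constexpr char text[] = "hello";
    std::memcpy(buffer.data(), text, sizeof text);
}

// Hypothetical stand-in for use_string: reports the length it was handed.
std::size_t use_string(std::string_view str) { return str.size(); }

// What an implicit range conversion would pass: the entire buffer.
std::size_t length_via_whole_range() {
    std::array&lt;char, 200&gt; buf{};
    get_string(buf);
    return use_string(std::string_view(buf.data(), buf.size()));
}

// What the raw-array version passed: the text up to the terminator.
std::size_t length_via_terminator() {
    std::array&lt;char, 200&gt; buf{};
    get_string(buf);
    return use_string(buf.data());
}
</code></pre>
<p>Under these assumptions, <code>length_via_whole_range()</code> yields 200 while <code>length_via_terminator()</code> yields 5.</p>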
<p>The same sort of thing can easily happen with <code>vector</code>s.  For instance, an API might require the user to invoke a function that provides an estimate for a buffer size, which the user then allocates before calling another function that fills the buffer.  The estimate may return a size greater than that actually needed by the resulting string if calculating the exact size would be expensive:</p>
<pre class="example"><code class="language-c++"><span class="hljs-function"><span class="hljs-keyword">extern</span> <span class="hljs-keyword">size_t</span> <span class="hljs-title">estimate_string_size</span><span class="hljs-params">()</span></span>;
<span class="hljs-function"><span class="hljs-keyword">extern</span> <span class="hljs-keyword">void</span> <span class="hljs-title">get_string</span><span class="hljs-params">(<span class="hljs-built_in">std</span>::span&lt;<span class="hljs-keyword">char</span>&gt; buffer)</span></span>;
<span class="hljs-function"><span class="hljs-keyword">extern</span> <span class="hljs-keyword">void</span> <span class="hljs-title">use_string</span><span class="hljs-params">(<span class="hljs-built_in">std</span>::string_view str)</span></span>;

<span class="hljs-keyword">size_t</span> estimated_size = estimate_string_size();
<span class="hljs-function"><span class="hljs-built_in">std</span>::<span class="hljs-built_in">vector</span>&lt;<span class="hljs-keyword">char</span>&gt; <span class="hljs-title">buf</span><span class="hljs-params">(estimated_size)</span></span>;
get_string(buf);
use_string(buf); <span class="hljs-comment">// oops</span>
</code></pre>
<p><a class="wg21link">P1391R3</a> states: &quot;We think this proposed design is consistent with existing practices of having to be explicit about the size in the presence of embedded nulls[.]&quot;  This paper respectfully disagrees.</p>
<h2 id="design-decisions">Design decisions</h2>
<p>The intent of <a class="wg21link">P1989R2</a> is to allow for conversion from a range to a string view.  LEWG has already decided that this is a good idea, and this paper concurs.  Removing the range constructor would be counter-productive, but keeping it in its current form is also problematic.  That leaves three options.</p>
<h3 id="option-1-make-the-range-constructor-explicit">Option 1: Make the range constructor <code>explicit</code></h3>
<p>This is the preferred approach of this paper.  This approach preserves the functionality gains offered by <a class="wg21link">P1989R2</a> while making it harder to invoke the conversion by accident.  Users who know that the source range actually represents a string can still take advantage of the conversion.  Consider the <code>vector</code> example above, but with <code>get_string</code> modified to return the number of characters written to the buffer:</p>
<pre class="example"><code class="language-c++"><span class="hljs-function"><span class="hljs-keyword">extern</span> <span class="hljs-keyword">size_t</span> <span class="hljs-title">estimate_string_size</span><span class="hljs-params">()</span></span>;
<span class="hljs-function"><span class="hljs-keyword">extern</span> <span class="hljs-keyword">size_t</span> <span class="hljs-title">get_string</span><span class="hljs-params">(<span class="hljs-built_in">std</span>::span&lt;<span class="hljs-keyword">char</span>&gt; buffer)</span></span>;
<span class="hljs-function"><span class="hljs-keyword">extern</span> <span class="hljs-keyword">void</span> <span class="hljs-title">use_string</span><span class="hljs-params">(<span class="hljs-built_in">std</span>::string_view str)</span></span>;

<span class="hljs-keyword">size_t</span> estimated_size = estimate_string_size();
<span class="hljs-function"><span class="hljs-built_in">std</span>::<span class="hljs-built_in">vector</span>&lt;<span class="hljs-keyword">char</span>&gt; <span class="hljs-title">buf</span><span class="hljs-params">(estimated_size)</span></span>;
<span class="hljs-keyword">size_t</span> actual_size = get_string(buf);
buf.resize(actual_size);
use_string(<span class="hljs-built_in">std</span>::string_view(buf)); <span class="hljs-comment">// ok</span>
</code></pre>
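<p>The effect of <code>explicit</code> on call sites can be sketched with a stand-in type.  The <code>explicit_view</code> class below is hypothetical; it is not <code>std::string_view</code> and it omits most of the real constraints.  It exists only to show that the implicit conversion is rejected while the explicit spelling still works:</p>
<pre class="example"><code class="language-c++">#include &lt;concepts&gt;
#include &lt;cstddef&gt;
#include &lt;ranges&gt;
#include &lt;type_traits&gt;
#include &lt;vector&gt;

// Hypothetical stand-in for a string view with an explicit range constructor.
struct explicit_view {
    const char* data;
    std::size_t size;

    template&lt;std::ranges::contiguous_range R&gt;
        requires std::ranges::sized_range&lt;R&gt; &amp;&amp;
                 std::same_as&lt;std::ranges::range_value_t&lt;R&gt;, char&gt;
    constexpr explicit explicit_view(R&amp;&amp; r)
        : data(std::ranges::data(r)), size(std::ranges::size(r)) {}
};

// Hypothetical consumer: reports the length it was handed.
std::size_t use_string(explicit_view str) { return str.size; }

// Implicit conversion is rejected, so use_string(buf) would not compile...
static_assert(!std::is_convertible_v&lt;std::vector&lt;char&gt;&amp;, explicit_view&gt;);
// ...while direct initialization remains available.
static_assert(std::is_constructible_v&lt;explicit_view, std::vector&lt;char&gt;&amp;&gt;);

std::size_t explicit_conversion_demo() {
    std::vector&lt;char&gt; buf(16);   // over-sized estimate
    buf.resize(5);               // trimmed to the size actually written
    return use_string(explicit_view(buf));
}
</code></pre>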
<h3 id="option-2-make-the-range-constructor-conditionally-explicit">Option 2: Make the range constructor conditionally <code>explicit</code></h3>
<p>If the source type defines its own <code>traits_type</code>, and that type is the same as the string view's <code>traits_type</code>, then the source range can reasonably be assumed to represent a string.  Under this option, the range constructor would remain implicit for such types and be <code>explicit</code> for all others.  This appears to be a good approach, but it adds a small amount of complexity to the specification and may be harder to teach than Option 1.  This paper is not opposed to Option 2.</p>
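<p>One way such a rule might be expressed is with a C++20 conditional <code>explicit</code> specifier.  The following is a sketch only; <code>conditional_view</code> and <code>has_matching_traits</code> are hypothetical names, and the stand-in type omits most of the real constraints:</p>
<pre class="example"><code class="language-c++">#include &lt;concepts&gt;
#include &lt;cstddef&gt;
#include &lt;ranges&gt;
#include &lt;string&gt;
#include &lt;type_traits&gt;
#include &lt;vector&gt;

// True when R names a traits_type identical to Traits.
template&lt;class R, class Traits&gt;
concept has_matching_traits =
    requires { typename std::remove_reference_t&lt;R&gt;::traits_type; } &amp;&amp;
    std::same_as&lt;typename std::remove_reference_t&lt;R&gt;::traits_type, Traits&gt;;

// Hypothetical stand-in for a string view with a conditionally explicit
// range constructor.
struct conditional_view {
    using traits_type = std::char_traits&lt;char&gt;;
    const char* data;
    std::size_t size;

    template&lt;std::ranges::contiguous_range R&gt;
        requires std::ranges::sized_range&lt;R&gt; &amp;&amp;
                 std::same_as&lt;std::ranges::range_value_t&lt;R&gt;, char&gt;
    constexpr explicit(!has_matching_traits&lt;R, traits_type&gt;)
    conditional_view(R&amp;&amp; r)
        : data(std::ranges::data(r)), size(std::ranges::size(r)) {}
};

// A source with matching traits converts implicitly.
static_assert(std::is_convertible_v&lt;std::string&amp;, conditional_view&gt;);
// A source without a traits_type requires the explicit spelling.
static_assert(!std::is_convertible_v&lt;std::vector&lt;char&gt;&amp;, conditional_view&gt;);
static_assert(std::is_constructible_v&lt;conditional_view, std::vector&lt;char&gt;&amp;&gt;);
</code></pre>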
<h3 id="option-3-make-the-range-constructor-conditionally-explicit-and-remove-the-traits_type-constraint">Option 3: Make the range constructor (conditionally) <code>explicit</code> and remove the <code>traits_type</code> constraint</h3>
<p>This modifies either Option 1 or Option 2 by additionally removing the constraint that the source range's <code>traits_type</code> (if present) must match the string view's <code>traits_type</code>.  Given that the constructor is already <code>explicit</code>, the user is already primed to expect that the resulting string view is not semantically equivalent to the source range in every respect.  Moreover, the name <code>traits_type</code> is somewhat generic; there's nothing in that name that implies the traits are string traits.</p>
<p>This change would allow for explicit conversion from a string or string view with dissimilar traits.  This paper agrees with <a class="wg21link">P1391R3</a> that &quot;strings with different [traits types] should not be implicitly convertible&quot;, but an <em>explicit</em> conversion may be sensible.  This paper does not attempt to explore the consequences of this design, and so this approach is not recommended.</p>
<h2 id="wording">Wording</h2>
<p>All modifications are presented relative to <a class="wg21link">N4901</a>.</p>
<p>Modify the declaration in §21.4.3.1 <a class="stdref">string.view.template.general</a> and the corresponding declaration preceding paragraph 11 of §21.4.3.2 <a class="stdref">string.view.cons</a>:</p>
<blockquote class="std">
<div><pre><code>template&lt;class R&gt;
  constexpr <ins>explicit </ins>basic_string_view(R&amp;&amp; r);</code></pre></div>
</blockquote>
<h2 id="references">References</h2>
<ol>
<li>[<a class="wg21link">P1391R3</a>] Corentin Jabot; &quot;Range constructor for std::string_view&quot;</li>
<li>[<a class="wg21link">P1989R2</a>] Corentin Jabot; &quot;Range constructor for std::string_view 2: Constrain Harder&quot;</li>
<li>[<a class="wg21link">N4901</a>] &quot;Working Draft, Standard for Programming Language C++&quot;</li>
</ol>

    </body>
    </html>
