<p><em>Document number: P2166R0</em></p>
<p><em>Project: Programming Language C++</em></p>
<p><em>Audience: LEWG-I, LEWG, LWG</em></p>
<p><em>Yuriy Chernyshov &lt;georgthegreat@gmail.com&gt;, &lt;thegeorg@yandex-team.ru&gt;</em></p>
<p><em>Date: 2020-05-06</em></p>
<h1 id="a-proposal-to-prohibit-stdbasic_string-and-stdbasic_string_view-construction-from-nullptr.">A Proposal to Prohibit std::basic_string and std::basic_string_view construction from nullptr.</h1>
<h2 id="introduction-and-motivation">Introduction and Motivation</h2>
<p>According to the C++ Standard, the behavior of the <code>std::basic_string::basic_string(const CharT* s)</code> constructor <em>is undefined if [s, s + Traits::length(s)) is not a valid range (for example, if s is a null pointer)</em> (the citation is taken <a href="https://en.cppreference.com/w/cpp/string/basic_string/basic_string">from cppreference.com</a>; the standard has slightly different wording in <a href="https://wg21.link/string.cons#12">21.3.2.2 [string.cons]</a>). The same applies to the <code>std::basic_string_view::basic_string_view(const CharT* s)</code> constructor.</p>
<p>Existing implementations (e.g. <a href="https://github.com/llvm/llvm-project/blob/1b678ee8a6cc7510801b7c5be2bcde08ff8bbd6e/libcxx/include/string#L822">libc++</a>) might add a runtime assertion to forbid such behavior. Certain open-source projects would trigger this assertion. The list includes, but is not limited to:</p>
<ul>
<li><a href="https://github.com/pocoproject/poco/blob/3fc3e5f5b8462f7666952b43381383a79b8b5d92/Data/ODBC/include/Poco/Data/ODBC/Extractor.h#L465">poco</a>,</li>
<li><a href="https://bitbucket.hdfgroup.org/projects/HDFFV/repos/hdf5/browse/c++/src/H5PropList.cpp#558">hdf5</a>,</li>
<li><a href="https://github.com/llvm/llvm-project/blob/ca09dab303f4fd72343be10dbd362b60a5f91c45/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp#L1319">llvm</a> project itself, though the code is marked as unreachable.</li>
</ul>
<p>On a large private monorepo, applying the proposed changes and running an automatic CI check helped to find 7 problematic projects (the number includes the projects listed above), one of which would actually segfault if the code was reached (and the code was easily reachable).</p>
<p>This proposal attempts to improve the diagnostics by explicitly deleting the problematic constructors, thus moving these assertions to compile time.</p>
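<p>The mechanism can be sketched on a stand-in type. The class <code>checked_string</code> below is hypothetical and exists only for illustration; it is not part of the proposed wording. It mirrors the shape of the proposed overload set, and the <code>static_assert</code>s show that the <code>nullptr</code> literal would be rejected at compile time while ordinary <code>const char*</code> arguments would remain accepted.</p>

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical stand-in for std::basic_string (illustration only).
class checked_string {
public:
    checked_string(const char* s) : value_(s) {}
    // Exact match for the nullptr literal, so overload resolution selects
    // this deleted overload and the program is ill-formed.
    checked_string(std::nullptr_t) = delete;

    checked_string& operator=(const char* s) {
        value_ = s;
        return *this;
    }
    checked_string& operator=(std::nullptr_t) = delete;

    const std::string& value() const { return value_; }

private:
    std::string value_;
};

// Construction and assignment from a character pointer remain well-formed.
static_assert(std::is_constructible_v<checked_string, const char*>);
static_assert(std::is_assignable_v<checked_string&, const char*>);

// Construction and assignment from the nullptr literal are rejected.
static_assert(!std::is_constructible_v<checked_string, std::nullptr_t>);
static_assert(!std::is_assignable_v<checked_string&, std::nullptr_t>);
```

<p>With these declarations, a statement such as <code>checked_string s(nullptr);</code> would fail to compile instead of reaching a runtime assertion.</p>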
<h2 id="impact-on-the-standard">Impact on the Standard</h2>
<p>This proposal changes <code>&lt;string&gt;</code> and <code>&lt;string_view&gt;</code> headers only and does not affect the language core.</p>
<h2 id="proposed-wording">Proposed Wording</h2>
<p>The wording is relative to <a href="https://wg21.link/n4861">N4861</a>.</p>
<ol type="1">
<li>Modify <a href="https://wg21.link/basic.string">21.3.2 [basic.string]</a> as follows:</li>
</ol>
<div class="sourceCode" id="cb1"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb1-1"><a href="#cb1-1"></a>[...]</span>
<span id="cb1-2"><a href="#cb1-2"></a><span class="kw">namespace</span> std {</span>
<span id="cb1-3"><a href="#cb1-3"></a>    <span class="kw">template</span>&lt;<span class="kw">class</span> charT, <span class="kw">class</span> traits = char_traits&lt;charT&gt;,</span>
<span id="cb1-4"><a href="#cb1-4"></a>        <span class="kw">class</span> Allocator = allocator&lt;charT&gt;&gt;</span>
<span id="cb1-5"><a href="#cb1-5"></a>    <span class="kw">class</span> basic_string {</span>
<span id="cb1-6"><a href="#cb1-6"></a>    <span class="kw">public</span>:</span>
<span id="cb1-7"><a href="#cb1-7"></a>        <span class="co">// types</span></span>
<span id="cb1-8"><a href="#cb1-8"></a>        [...]</span>
<span id="cb1-9"><a href="#cb1-9"></a>        <span class="co">// [string.cons], construct/copy/destroy</span></span>
<span id="cb1-10"><a href="#cb1-10"></a>        [...]</span>
<span id="cb1-11"><a href="#cb1-11"></a>        <span class="kw">constexpr</span> basic_string(<span class="at">const</span> charT* s, <span class="dt">size_type</span> n, <span class="at">const</span> Allocator&amp; a = Allocator());</span>
<span id="cb1-12"><a href="#cb1-12"></a>        <span class="kw">constexpr</span> basic_string(<span class="at">const</span> charT* s, <span class="at">const</span> Allocator&amp; a = Allocator());</span>
<span id="cb1-13"><a href="#cb1-13"></a>    +   <span class="kw">constexpr</span> basic_string(<span class="dt">nullptr_t</span>) = <span class="kw">delete</span>;</span>
<span id="cb1-14"><a href="#cb1-14"></a>        [...]</span>
<span id="cb1-15"><a href="#cb1-15"></a>        <span class="kw">template</span>&lt;<span class="kw">class</span> T&gt;</span>
<span id="cb1-16"><a href="#cb1-16"></a>        <span class="kw">constexpr</span> basic_string&amp; <span class="kw">operator</span>=(<span class="at">const</span> T&amp; t);</span>
<span id="cb1-17"><a href="#cb1-17"></a>        <span class="kw">constexpr</span> basic_string&amp; <span class="kw">operator</span>=(<span class="at">const</span> charT* s);</span>
<span id="cb1-18"><a href="#cb1-18"></a>    +   <span class="kw">constexpr</span> basic_string&amp; <span class="kw">operator</span>=(<span class="dt">nullptr_t</span>) = <span class="kw">delete</span>;</span>
<span id="cb1-19"><a href="#cb1-19"></a>        [...]</span>
<span id="cb1-20"><a href="#cb1-20"></a>    };</span>
<span id="cb1-21"><a href="#cb1-21"></a>    [...]</span>
<span id="cb1-22"><a href="#cb1-22"></a>}</span></code></pre></div>
<ol start="2" type="1">
<li>Modify <a href="https://wg21.link/string.view.synop">21.4.1 [string.view.synop]</a> as indicated:</li>
</ol>
<div class="sourceCode" id="cb2"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb2-1"><a href="#cb2-1"></a>[...]</span>
<span id="cb2-2"><a href="#cb2-2"></a><span class="kw">template</span>&lt;<span class="kw">class</span> charT, <span class="kw">class</span> traits = char_traits&lt;charT&gt;&gt;</span>
<span id="cb2-3"><a href="#cb2-3"></a><span class="kw">class</span> basic_string_view {</span>
<span id="cb2-4"><a href="#cb2-4"></a><span class="kw">public</span>:</span>
<span id="cb2-5"><a href="#cb2-5"></a><span class="co">// types</span></span>
<span id="cb2-6"><a href="#cb2-6"></a>    [...]</span>
<span id="cb2-7"><a href="#cb2-7"></a>    <span class="kw">constexpr</span> basic_string_view(<span class="at">const</span> charT* str);</span>
<span id="cb2-8"><a href="#cb2-8"></a>+   <span class="kw">constexpr</span> basic_string_view(<span class="dt">nullptr_t</span>) = <span class="kw">delete</span>;</span>
<span id="cb2-9"><a href="#cb2-9"></a>    [...]</span>
<span id="cb2-10"><a href="#cb2-10"></a>};</span></code></pre></div>
<h2 id="further-discourse">Further Discourse</h2>
<p>These changes do not make the runtime check redundant. The following code still compiles and triggers the assertion:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb3-1"><a href="#cb3-1"></a><span class="at">const</span> <span class="dt">char</span> *p = <span class="kw">nullptr</span>; <span class="co">// or more likely, p = functionThatCanReturnNull()</span></span>
<span id="cb3-2"><a href="#cb3-2"></a>string s(p, <span class="dv">3</span>);</span></code></pre></div>
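<p>Since a pointer whose value is only known at run time still selects the <code>const charT*</code> overloads, callers that cannot rule out a null pointer would need a check of their own. A minimal sketch follows, assuming that mapping a null pointer to an empty string is acceptable for the caller; the helper name <code>string_or_empty</code> is hypothetical and is not proposed for standardization.</p>

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (illustration only): returns an empty string for a
// null pointer instead of invoking the undefined behavior.
inline std::string string_or_empty(const char* p) {
    return p != nullptr ? std::string(p) : std::string();
}

// Sized counterpart: the length is ignored when the pointer is null.
inline std::string string_or_empty(const char* p, std::size_t n) {
    return p != nullptr ? std::string(p, n) : std::string();
}
```

<p>For instance, <code>string_or_empty(p, 3)</code> could replace <code>string s(p, 3)</code> from the snippet above when <code>p</code> may be null.</p>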
<p>As a further development of the above proposal, it seems logical to also delete the sized counterparts of the nullptr constructors, as <em>the behavior is undefined if [s, s + count) is not a valid range</em> (the citation source is the same). That is, the following declarations are suggested where appropriate:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb4-1"><a href="#cb4-1"></a>basic_string(<span class="dt">nullptr_t</span>, <span class="dt">size_t</span>) = <span class="kw">delete</span>;</span>
<span id="cb4-2"><a href="#cb4-2"></a><span class="kw">constexpr</span> basic_string_view(<span class="dt">nullptr_t</span>, <span class="dt">size_t</span>) = <span class="kw">delete</span>;</span></code></pre></div>
<p>These changes would break the legal, yet not legitimate, case of constructing <code>std::string</code> using <code>basic_string(nullptr, 0);</code> and <code>std::string_view</code> using <code>basic_string_view(nullptr, 0);</code> and thus they were not included in the main text of the proposal.</p>
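<p>The legal case mentioned above can be illustrated as follows. This is a sketch rather than normative text: a null pointer paired with a zero length denotes an empty range, so both constructions are expected to yield empty objects. The function name <code>null_with_zero_length_is_empty</code> is hypothetical.</p>

```cpp
#include <string>
#include <string_view>

// Hypothetical check (illustration only): both objects are built from the
// nullptr literal with a zero length, which is the case that deleting the
// sized overloads would turn into a compile-time error.
inline bool null_with_zero_length_is_empty() {
    std::string s(nullptr, 0);
    std::string_view sv(nullptr, 0);
    return s.empty() && sv.empty();
}
```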
<h2 id="acknowledgements">Acknowledgements</h2>
<p>The author would like to thank Antony Poloukhin, Marshall Clow and Eric Fiselier for a thorough review and suggestions.</p>
