<p><em>Document number: P2166R1</em></p>
<p><em>Project: Programming Language C++</em></p>
<p><em>Audience: LEWG-I, LEWG, LWG</em></p>
<p><em>Yuriy Chernyshov &lt;georgthegreat@gmail.com&gt;, &lt;thegeorg@yandex-team.ru&gt;</em></p>
<p><em>Date: 2020-09-06</em></p>
<h1 id="a-proposal-to-prohibit-stdbasic_string-and-stdbasic_string_view-construction-from-nullptr.">A Proposal to Prohibit std::basic_string and std::basic_string_view construction from nullptr.</h1>
<h2 id="revision-history">Revision History</h2>
<h3 id="r0---r1">R0 -&gt; R1</h3>
<ol style="list-style-type: decimal">
<li>Include code listings from open source software.</li>
<li>Add reference to P2037.</li>
</ol>
<h2 id="introduction-and-motivation">Introduction and Motivation</h2>
<p>According to the C++ Standard, the behavior of the <code>std::basic_string::basic_string(const CharT* s)</code> constructor <em>is undefined if [s, s + Traits::length(s)) is not a valid range (for example, if s is a null pointer)</em> (quoted <a href="https://en.cppreference.com/w/cpp/string/basic_string/basic_string">from cppreference.com</a>; the standard has slightly different wording in <a href="https://wg21.link/string.cons#12">21.3.2.2 [string.cons]</a>). The same applies to the <code>std::basic_string_view::basic_string_view(const CharT* s)</code> constructor.</p>
<p>Existing implementations (e.g. <a href="https://github.com/llvm/llvm-project/blob/1b678ee8a6cc7510801b7c5be2bcde08ff8bbd6e/libcxx/include/string#L822">libc++</a>) may add a runtime assertion to forbid such behavior. Certain open source projects would trigger this assertion. The list includes, but is not limited to:</p>
<p><strong>LLVM</strong></p>
<p>The LLVM project <a href="https://github.com/llvm/llvm-project/blob/58b28fa7a2fd57051f3d2911878776d6f57b18d8/llvm/utils/TableGen/DFAEmitter.cpp#L174">had</a> the following code in <code>llvm/utils/TableGen/DFAEmitter.cpp</code>:</p>
<div class="sourceCode"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span class="kw">struct</span> Action {
    Record *R = <span class="kw">nullptr</span>;
    <span class="dt">unsigned</span> I = <span class="dv">0</span>;
    <span class="bu">std::</span>string S = <span class="kw">nullptr</span>;

    <span class="co">// more code here</span>
};</code></pre></div>
<p>According to the comments, <code>struct Action</code> was intended to be an ad hoc implementation of <code>std::variant&lt;Record *, unsigned, std::string&gt;</code>. The bug was fixed in <a href="https://reviews.llvm.org/D87185">D87185</a>.</p>
<p><strong>poco</strong></p>
<p>The Poco project <a href="https://github.com/pocoproject/poco/blob/3fc3e5f5b8462f7666952b43381383a79b8b5d92/Data/ODBC/include/Poco/Data/ODBC/Extractor.h#L465">uses</a> the following generic code for extracting data in its ODBC protocol implementation:</p>
<pre><code>template&lt;typename T&gt;
bool extractManualImpl(std::size_t pos, T&amp; val, SQLSMALLINT cType)
{
    SQLRETURN rc = 0;
    T value = (T) 0;
    resizeLengths(pos);
    rc = SQLGetData(_rStmt, 
        (SQLUSMALLINT) pos + 1, 
        cType,  //C data type
        &amp;value, //returned value
        0,      //buffer length (ignored)
        &amp;_lengths[pos]);  //length indicator
    
     // more code below
}</code></pre>
<p>The project also has the <code>Poco::Data::LOB</code> class for storing Large Objects, with one of the <strong>implicit</strong> constructors implemented as follows:</p>
<pre><code>LOB(const std::basic_string&lt;T&gt;&amp; content):
    _pContent(new std::vector&lt;T&gt;(content.begin(), content.end()))
    /// Creates a LOB from a string.
{
}</code></pre>
<p>Once <code>extractManualImpl</code> is instantiated with <code>T</code> being a <code>LOB</code>, the expression <code>(T) 0</code> converts the null pointer constant <code>0</code> to a <code>std::basic_string</code> through the <code>const CharT*</code> constructor, which leads to undefined behavior. Such an invocation can be found in the <code>bool Extractor::extract(std::size_t pos, Poco::Data::BLOB&amp; val)</code> method <a href="https://github.com/pocoproject/poco/blob/3fc3e5f5b8462f7666952b43381383a79b8b5d92/Data/ODBC/src/Extractor.cpp#L733">in Extractor.cpp</a>.</p>
<p><strong>protobuf</strong></p>
<p>The well-known Google protobuf project had a similar problem, fixed in <a href="https://github.com/protocolbuffers/protobuf/commit/eff1a6a01492988448685c6f9771e80e735d6030">this commit</a>. The code was:</p>
<pre><code>string GetCapitalizedType(const FieldDescriptor* field) {

    switch (field-&gt;type()) {
        // handle all possible enum values, but without adding default label
    }

    // Some compilers report reaching end of function even though all cases of
    // the enum are handed in the switch.
    GOOGLE_LOG(FATAL) &lt;&lt; &quot;Can&#39;t get here.&quot;;
    return NULL;
}</code></pre>
<p>Because this code is unreachable, it does not cause problems in practice.</p>
<p>In a large private monorepo, applying the proposed changes and running an automated CI check revealed 7 problematic projects (including those listed above). Two of them would actually segfault if the offending code was reached, and it was indeed reachable.</p>
<p>This proposal attempts to improve the diagnostics by explicitly deleting the problematic constructors, thus moving these assertions to compile time.</p>
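<p>The effect of a deleted <code>nullptr_t</code> overload can be sketched on a small stand-in class. The name <code>checked_string</code> and its members are hypothetical and chosen only for illustration. This is not the proposed library wording, just a minimal sketch of the overload-resolution mechanism it relies on:</p>
<pre><code>#include &lt;cstddef&gt;
#include &lt;string&gt;
#include &lt;type_traits&gt;

// Hypothetical stand-in for std::basic_string (illustration only).
class checked_string {
public:
    // Ordinary constructor from a C string; a null pointer here is still a runtime bug.
    checked_string(const char* s) : data_(s) {}

    // Deleted overload: the literal nullptr matches this exactly, so the
    // call is rejected at compile time instead of reaching the constructor above.
    checked_string(std::nullptr_t) = delete;

    const std::string&amp; str() const { return data_; }

private:
    std::string data_;
};

// Construction from a C string is still allowed.
static_assert(std::is_constructible&lt;checked_string, const char*&gt;::value,
              "const char* must remain usable");

// Construction from nullptr selects the deleted overload and is ill-formed.
static_assert(!std::is_constructible&lt;checked_string, std::nullptr_t&gt;::value,
              "nullptr must be rejected at compile time");</code></pre>
<p>With these declarations, <code>checked_string s(nullptr);</code> is rejected at compile time because the deleted overload is an exact match, while <code>checked_string s(&quot;abc&quot;);</code> still compiles.</p>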
<h2 id="impact-on-the-standard">Impact on the Standard</h2>
<p>This proposal changes <code>&lt;string&gt;</code> and <code>&lt;string_view&gt;</code> headers only and does not affect the language core.</p>
<h2 id="proposed-wording">Proposed Wording</h2>
<p>The wording is relative to <a href="https://wg21.link/n4861">N4861</a>.</p>
<ol style="list-style-type: decimal">
<li>Modify <a href="https://wg21.link/basic.string">21.3.2 [basic.string]</a> as follows:</li>
</ol>
<div class="sourceCode"><pre class="sourceCode cpp"><code class="sourceCode cpp">[...]
<span class="kw">namespace</span> std {
    <span class="kw">template</span>&lt;<span class="kw">class</span> charT, <span class="kw">class</span> traits = char_traits&lt;charT&gt;,
        <span class="kw">class</span> Allocator = allocator&lt;charT&gt;&gt;
    <span class="kw">class</span> basic_string {
    <span class="kw">public</span>:
        <span class="co">// types</span>
        [...]
        <span class="co">// [string.cons], construct/copy/destroy</span>
        [...]
        <span class="kw">constexpr</span> basic_string(<span class="at">const</span> charT* s, <span class="dt">size_type</span> n, <span class="at">const</span> Allocator&amp; a = Allocator());
        <span class="kw">constexpr</span> basic_string(<span class="at">const</span> charT* s, <span class="at">const</span> Allocator&amp; a = Allocator());
    +   <span class="kw">constexpr</span> basic_string(<span class="dt">nullptr_t</span>) = <span class="kw">delete</span>;
        [...]
        <span class="kw">template</span>&lt;<span class="kw">class</span> T&gt;
        <span class="kw">constexpr</span> basic_string&amp; <span class="kw">operator</span>=(<span class="at">const</span> T&amp; t);
        <span class="kw">constexpr</span> basic_string&amp; <span class="kw">operator</span>=(<span class="at">const</span> charT* s);
    +   <span class="kw">constexpr</span> basic_string&amp; <span class="kw">operator</span>=(<span class="dt">nullptr_t</span>) = <span class="kw">delete</span>;
        [...]
    };
    [...]
}</code></pre></div>
<ol start="2" style="list-style-type: decimal">
<li>Modify <a href="https://wg21.link/string.view.synop">21.4.1 [string.view.synop]</a> as indicated:</li>
</ol>
<div class="sourceCode"><pre class="sourceCode cpp"><code class="sourceCode cpp">[...]
<span class="kw">template</span>&lt;<span class="kw">class</span> charT, <span class="kw">class</span> traits = char_traits&lt;charT&gt;&gt;
<span class="kw">class</span> basic_string_view {
<span class="kw">public</span>:
<span class="co">// types</span>
    [...]
    <span class="kw">constexpr</span> basic_string_view(<span class="at">const</span> charT* str);
+   <span class="kw">constexpr</span> basic_string_view(<span class="dt">nullptr_t</span>) = <span class="kw">delete</span>;
    [...]
};</code></pre></div>
<h2 id="further-discourse">Further Discourse</h2>
<p>These changes do not make the runtime check redundant: the following code still compiles and still triggers the assertion:</p>
<div class="sourceCode"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span class="at">const</span> <span class="dt">char</span> *p = <span class="kw">nullptr</span>; <span class="co">// or more likely, p = functionThatCanReturnNull()</span>
string s(p, <span class="dv">3</span>);</code></pre></div>
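<p>Since the deleted overloads only catch the literal <code>nullptr</code>, callers that hold a possibly-null pointer still need a runtime guard. One way to write such a guard is sketched below. The helper name <code>to_string_or_empty</code> is hypothetical and is not part of this proposal or of the standard library, and mapping a null pointer to an empty string is an assumption that may not suit every code base:</p>
<pre><code>#include &lt;cstddef&gt;
#include &lt;string&gt;

// Hypothetical helper (illustration only): builds a string from the first n
// characters at p, or returns an empty string when p is null.
// The caller must still guarantee that [p, p + n) is a valid range when p is non-null.
inline std::string to_string_or_empty(const char* p, std::size_t n) {
    return p ? std::string(p, n) : std::string();
}</code></pre>
<p>For example, <code>to_string_or_empty(p, 3)</code> yields an empty string when <code>p</code> is null instead of invoking undefined behavior.</p>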
<p>As a further development of this proposal, it seems logical to also delete the sized counterparts of the nullptr constructors, since <em>the behavior is undefined if [s, s + count) is not a valid range</em> (citation source is the same). That is, the following declarations are suggested where appropriate:</p>
<div class="sourceCode"><pre class="sourceCode cpp"><code class="sourceCode cpp">basic_string(<span class="dt">nullptr_t</span>, <span class="dt">size_t</span>) = <span class="kw">delete</span>;
<span class="kw">constexpr</span> basic_string_view(<span class="dt">nullptr_t</span>, <span class="dt">size_t</span>) = <span class="kw">delete</span>;</code></pre></div>
<p>These changes would break the currently valid, though questionable, construction of <code>std::string</code> via <code>basic_string(nullptr, 0);</code> and of <code>std::string_view</code> via <code>basic_string_view(nullptr, 0);</code>. For that reason they are not included in the main text of the proposal.</p>
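<p>The kind of code that a deleted sized overload would reject can be sketched as follows. The function name <code>empty_view</code> is hypothetical. The snippet relies only on the existing <code>basic_string_view(const charT*, size_type)</code> constructor, for which <code>[nullptr, nullptr + 0)</code> is a valid empty range:</p>
<pre><code>#include &lt;string_view&gt;

// Hypothetical example (illustration only): compiles today, but would stop
// compiling if basic_string_view(nullptr_t, size_t) were declared as deleted.
inline std::string_view empty_view() {
    return std::string_view(nullptr, 0);
}</code></pre>
<p>The resulting view is empty and its <code>data()</code> is a null pointer, which is the same state a default-constructed <code>std::string_view</code> has.</p>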
<h2 id="acknowledgements">Acknowledgements</h2>
<p>The author would like to thank Antony Polukhin, Marshall Clow and Eric Fiselier for a thorough review and suggestions.</p>
<p>A similar problem with the assignment operator is being addressed by Andrzej Krzemieński in <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2037r1.html">P2037</a>.</p>
