<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">

<style type="text/css">
pre {font-family: "Consolas", "Lucida Console", monospace; margin-left:20pt; }
code {font-family: "Consolas", "Lucida Console", monospace; }
pre > i   { font-family: "Consolas", "Lucida Console", monospace;  font-style:italic; }
code > i  { font-family: "Consolas", "Lucida Console", monospace;  font-style:italic; }
pre > em  { font-family: "Consolas", "Lucida Console", monospace;  font-style:italic; }
code > em { font-family: "Consolas", "Lucida Console", monospace;  font-style:italic; }
body { color: #000000; background-color: #FFFFFF; }
del { text-decoration: line-through; color: #8B0040; }
ins { text-decoration: underline; color: #005100; }

p.example   { margin-left: 2em; }
pre.example { margin-left: 2em; }
div.example { margin-left: 2em; }

code.extract { background-color: #F5F6A2; }
pre.extract  { margin-left: 2em; background-color: #F5F6A2;  border: 1px solid #E1E28E; }

p.function    { }
.attribute    { margin-left: 2em; }
.attribute dt { float: left; font-style: italic;  padding-right: 1ex; }
.attribute dd { margin-left: 0em; }

blockquote.std    { color: #000000; background-color: #F1F1F1;  border: 1px solid #D1D1D1;  padding-left: 0.5em; padding-right: 0.5em; }
blockquote.stddel { text-decoration: line-through;  color: #000000; background-color: #FFEBFF;  border: 1px solid #ECD7EC;  padding-left: 0.5em; padding-right: 0.5em; }
blockquote.stdins { text-decoration: underline;  color: #000000; background-color: #C8FFC8;  border: 1px solid #B3EBB3; padding: 0.5em; }
table.header { border: 0px; border-spacing: 0;  margin-left: 0px; font-style: normal; }
table { border: 1px solid black; border-spacing: 0px;  margin-left: auto; margin-right: auto; }
th { text-align: left; vertical-align: top;  padding-left: 0.4em; border: none;  padding-right: 0.4em; border: none; }
td { text-align: left;  padding-left: 0.4em; border: none;  padding-right: 0.4em; border: none; }
</style>

<title>P2037R1 &mdash; String's gratuitous assignment</title>

</head>
<body>

<table class="header"><tbody>
  <tr>
    <th>Document number:&nbsp;&nbsp;</th><th> </th><td>P2037r1</td>
  </tr>
  <tr>
    <th>Date:&nbsp;&nbsp;</th><th> </th><td>2020-06-15</td>
  </tr>
  <tr>
    <th>Audience:&nbsp;&nbsp;</th><th> </th><td>LEWG</td>
  </tr>
  <tr>
    <th>Reply-to:&nbsp;&nbsp;</th><th> </th><td><address>Andrzej Krzemie&#324;ski &lt;akrzemi1 at gmail dot com&gt;</address></td>
  </tr>
</tbody></table>

<h1>String's gratuitous assignment</h1>

<p> This paper explores the capability of the assignment from <code>char</code> to <code>std::string</code> and the consequences
    of removing it. We propose to deprecate this assignment, not necessarily with the intention to remove it in the future.</p>


<h2><a name="revisions">Revision history</a></h2>

<h3><a name="revisions.r1">R0 &rarr; R1</a></h3>

    <ol>
    <li>Showing <code>getchar()</code> as a more realistic example of a correct use of an <code>int</code> as a <code>char</code>.</li>
    <li>Added wording for the deprecation of the assignment, per LEWG recommendation.</li>
    <li>Highlighted in the discussion that there is no estimate of how many users will be negatively impacted by the deprecation.</li>
    <li>Added the observation that simply removing the assignment from <code>char</code> would automatically enable the assignment
        from literal <code>0</code>, which is UB.</li>
    </ol>

<h2><a name="background">Background</a></h2>

<p>The interface of <code>std::basic_string</code> provides the following signature:
    </p>

<pre>constexpr basic_string&amp; operator=(charT c);</pre>

<p>This allows the direct assignment from <code>char</code> to <code>std::string</code>:
    </p>

<pre>std::string s;
s = 'A';
assert(s == "A");</pre>

<p> However, due to the implicit conversions between scalar types, this also allows an assignment from numeric types,
    such as <code>int</code> or <code>double</code>, which often has undesired semantics:
    </p>

<pre>std::string s;
s = 50;
assert(s == "2");

s = 48.0;
assert(s == "0");</pre>

<p> In fact, any user-defined type that has an implicit conversion operator to <code>int</code> or <code>double</code>
    is also assignable to <code>std::string</code>.</p>
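
<p> A minimal sketch of this effect, assuming an ASCII-compatible execution character set. The type <code>Celsius</code>
    and the helper functions are hypothetical names introduced only for this illustration:</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical user-defined type with an implicit conversion to double.
struct Celsius {
    double degrees;
    operator double() const { return degrees; }
};

// Assigns a Celsius value to a string. The only viable overload is
// operator=(char): Celsius -> double (user-defined conversion),
// followed by double -> char (standard conversion).
inline std::string assign_from(Celsius c) {
    std::string s;
    s = c;
    return s;
}

// Assuming ASCII, 48.0 becomes the single character '0', not the text "48".
inline void check_celsius_assignment() {
    assert(assign_from(Celsius{48.0}) == "0");
}
```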

<p> In order to prevent the likely inadvertent conversions,
    <a href="https://github.com/cplusplus/nbballot/issues/13">[RU013]</a> proposes to change the signature so that it is
    equivalent to:
    </p>

<pre>template &lt;class T&gt;
  requires is_same_v&lt;T, charT&gt;
constexpr basic_string&amp; operator=(T c);
</pre>



<h2><a name="discussion">Discussion</a></h2>


<h3><a name="discussion.intended">Intended usage</a></h3>

<p> Even the intended usage of the assignment from <code>char</code> is suspicious. We have a direct interface for assigning
    a single character to an existing <code>std::string</code>:
    </p>

<pre>std::string s;
s = 'A';</pre>

<p> However, there is no corresponding interface, in the form of a constructor, for initializing a string
    from a single character. We have to use a more verbose syntax:</p>

<pre>const std::string s1 (1u, 'C');
const std::string s2 = {'C'};</pre>


<p> Whatever the motivation for the assignment from <code>char</code> was, the same motivation would surely apply to
    a converting constructor.
    </p>


<h3><a name="discussion.pitfall">Common pitfall</a></h3>

<p> There are two common situations where the gratuitous converting assignment from <code>int</code> to
    <code>std::string</code> is used inadvertently and results in a well-formed C++ program that does
    something other than what the programmer intended.
    </p>

<p> First is when inexperienced C++ programmers try to use their experience from weakly typed languages when trying to
    convert from <code>int</code> to <code>std::string</code> through an assignment syntax:
    </p>

<pre>template &lt;typename From, typename To&gt;
  requires std::is_assignable_v&lt;To&amp;, From const&amp;&gt;
void convert(From const&amp; from, To&amp; to)
{
  to = from;
}

std::string s;
convert(50, s);
std::cout &lt;&lt; s; <em>// outputs "2"</em>
</pre>

<p> The second situation is when a piece of data used throughout a program, such as a unique identifier,
    changes type from <code>int</code> to <code>std::string</code>. Consider the common concept of an "id".
    While the concept is common and universally understood, there exists no natural internal representation
    of an identifier. It can be represented by an <code>int</code> or by a <code>std::string</code>,
    and sometimes the representation changes over time. If we decide to change the representation in our
    program, the expectation is that after the change, whenever a raw <code>int</code> is converted to an id
    (either in initialization or in an assignment),
    the compiler should detect a type mismatch and report a compile-time error. But because of the surprising
    "conversion" this is not the case for assignments.
    </p>
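
<p> The scenario can be sketched as follows, assuming an ASCII-compatible execution character set. The names
    <code>Id</code>, <code>Record</code> and <code>make_record</code> are hypothetical and serve only as an illustration:</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical identifier type. It used to be: using Id = int;
using Id = std::string;

struct Record {
    Id id;
};

// Written when Id was still int; it still compiles after the change.
inline Record make_record() {
    Record r;
    r.id = 65;  // selects operator=(char) after an int -> char conversion
    return r;
}

inline void check_record_id() {
    // Assuming ASCII, the id is the one-character string "A", not "65".
    assert(make_record().id == "A");
}
```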


<h3><a name="discussion.valid">Valid conversions from <code>int</code></a></h3>

<p> There are usages of the assignment from type <code>int</code> to <code>std::string</code> that
    are nonetheless valid and behave exactly as intended. These are the cases when we already treat
    the value stored in an <code>int</code> as a character, but we store it in a variable of
    type <code>int</code> either for convenience or because of the peculiar rules of type promotions in C++.
    The first case is when a character is read through an interface that returns <code>int</code>, such as <code>std::getchar()</code>:
    </p>

<pre>if (auto ch = std::getchar(); ch != EOF) { <em>// "Almost Always Auto" philosophy</em>
  str = ch;
}</pre>


<p> The function <code>std::getchar()</code> returns <code>int</code> so that,
    apart from any <code>char</code> value, it can also return the special value <code>EOF</code>. But once
    we have confirmed that the return value is not <code>EOF</code>, we can treat the value as a <code>char</code>.
    </p>

<p> Sometimes we may not even be aware that we are producing a value of type <code>int</code>:
    </p>

<pre>
void assign_digit(int d, std::string&amp; s)
<em>// precondition: 0 &lt;= d &amp;&amp; d &lt;= 9</em>
{
  constexpr char zero = '0';
  s = (char)d + zero;
}</pre>

<p> In the example above we might believe that because we are adding two <code>char</code>s, the result will
    also be of type <code>char</code>, but the result of the addition of two <code>char</code>s is in fact of type
    <code>int</code>. This incorrect expectation is reinforced by the way narrowing is defined in C++:
    </p>

<pre><em>// test if char + char == char :</em>
constexpr char zero = '0';
const int d = 9;
char ch {(char)d + zero}; <em>// brace-init prevents narrowing</em>
</pre>

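<p> Brace initialization prevents narrowing, and the above "test" compiles fine.
    From this, a programmer could draw the incorrect conclusion that the type of expression
    <code>(char)d + zero</code> must be <code>char</code>; but it is not. The expression has type <code>int</code>;
    the initialization is accepted only because the operand is a constant expression whose value fits in a
    <code>char</code>, which the language does not treat as a narrowing conversion.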
    </p>
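
<p> A more reliable test inspects the type of the expression directly. The following is a minimal sketch; the helper
    <code>digit_to_char</code> is a hypothetical name used only for illustration:</p>

```cpp
#include <cassert>
#include <type_traits>

constexpr char zero = '0';

// Both operands undergo integral promotion to int before the addition.
static_assert(std::is_same<decltype(zero + zero), int>::value,
              "char + char has type int");
static_assert(std::is_same<decltype((char)9 + zero), int>::value,
              "casting one operand does not change the result type");

// Converting the sum back to char makes the intent explicit.
// Precondition: 0 <= d && d <= 9
inline char digit_to_char(int d) {
    return static_cast<char>(d + zero);
}

inline void check_digit_to_char() {
    assert(digit_to_char(9) == '9');
}
```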


<h3><a name="discussion.options">Our options</a></h3>

<p> There are a number of ways we can respond to this problem.</p>


<h4><a name="discussion.options.noting">Do nothing</a></h4>

<p> That is, do not modify the interface of <code>std::basic_string</code>.
    The potential bugs resulting from the suspicious conversion
    can be detected by static analyzers rather than compilers. For instance,
    clang-tidy has the check
    <a href="https://clang.llvm.org/extra/clang-tidy/checks/bugprone-string-integer-assignment.html"><code>bugprone-string-integer-assignment</code></a>
    that reports all places where the suspicious assignment from an <code>int</code>
    is performed.
    This avoids breaking any correct code, and leaves the option for the
    bugs to be detected by other tools.
    </p>


<h4><a name="discussion.options.remove">Remove the assignment operator from <code>charT</code></a></h4>

<p> We can just remove the assignment from <code>charT</code> altogether.
    This assignment is suspicious even if no conversions are applied.
    It is like an assignment of a container element to a container.
    This warrants the usage of syntax that expresses the element-container relation, like:
    </p>

<pre>str.assign(1, ch);
str = {ch};</pre>

<p> A migration procedure can be provided for changing the program that
    previously used the suspicious assignment.
    </p>

<p> However, it should be noted that currently, owing to the existence of the assignment from <code>char</code>,
    the following code fails to compile:</p>
    
<pre>str = 0;
str = NULL;</pre>    

<p> This is because there are two competing assignment operators: one taking <code>char</code>
    and the other taking <code>const char *</code>. If we removed the former, the code above would start
    compiling by selecting the latter, but the assignment from a null <code>const char *</code> would cause Undefined Behavior. In order
    to avoid current bugs and not introduce the potential for new ones, the removal of one assignment operator
    would have to be accompanied by the addition of another:</p>

<pre>   constexpr basic_string&amp; operator=(nullptr_t) = delete;</pre>

<p> An alternative solution would be to declare the assignment from <code>char</code> itself as deleted.</p>
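
<p> The effect of a plain removal can be sketched with a toy type. The struct <code>without_char_assign</code> below is
    hypothetical; it only mimics a string that has lost its assignment from <code>char</code> and records what the
    remaining overload receives, instead of dereferencing the pointer:</p>

```cpp
#include <cassert>

// Hypothetical stand-in for basic_string after removing operator=(char).
struct without_char_assign {
    bool received_null = false;

    // The only remaining candidate for an assignment from literal 0.
    without_char_assign& operator=(const char* s) {
        received_null = (s == nullptr);
        return *this;
    }
};

// With a competing operator=(char) the assignment below would be ambiguous.
// Without it, literal 0 converts to a null pointer and the code compiles.
inline bool zero_selects_pointer_overload() {
    without_char_assign w;
    w = 0;
    return w.received_null;
}

inline void check_zero_assignment() {
    assert(zero_selects_pointer_overload());
}
```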



<h4><a name="discussion.options.deprecate">Deprecate the assignment</a></h4>

<p> A softer variant of the above would be to declare the assignment from <code>charT</code>
    as deprecated. This does not affect the semantics of any existing program, and at the same time encourages 
    tools (compilers included) to diagnose any usage of such assignment.
    </p>
    
<p> A deprecation is <em>not</em> a commitment to ever remove a feature in the future. A possible outcome of
    such deprecation would be that we keep the assignment forever. Nonetheless, it should be noted that
    if the deprecated assignment is ever removed, it would introduce the problem of reenabling assignment from
    literal 0.</p>


<h4><a name="discussion.options.poison">Poison the conversion from scalar types to <code>charT</code> in the assignment</a></h4>

<p> Do what <a href="https://github.com/cplusplus/nbballot/issues/13">[RU013]</a> proposes:
    replace the current signature of the assignment with something equivalent to:
    </p>

<pre>template &lt;class T&gt;
  requires is_same_v&lt;T, charT&gt;
constexpr basic_string&amp; operator=(T c);
</pre>

<p> This may still compromise some valid programs, but the damage is smaller
    than if the operator were removed altogether. An automated mechanical fix
    can be easily provided: you just need to apply a cast:
    </p>

<pre>str = std::char_traits&lt;char&gt;::to_char_type(i);</pre>

<p>This solution also suffers from the problem of reenabling assignment from literal 0.</p>
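
<p> A minimal sketch of the poisoned assignment, written with <code>std::enable_if</code> rather than a
    <em>requires-clause</em> so that it also compiles before C++20. The type <code>strict_string</code> and the function
    <code>assign_via_cast</code> are hypothetical, and the expected result assumes an ASCII-compatible character set:</p>

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for basic_string with the "poisoned" assignment.
struct strict_string {
    std::string value;

    // Participates in overload resolution only when T is exactly char.
    template <class T,
              typename std::enable_if<std::is_same<T, char>::value, int>::type = 0>
    strict_string& operator=(T c) {
        value.assign(1, c);
        return *this;
    }
};

static_assert(std::is_assignable<strict_string&, char>::value, "char is accepted");
static_assert(!std::is_assignable<strict_string&, int>::value, "int is rejected");
static_assert(!std::is_assignable<strict_string&, double>::value, "double is rejected");

// The mechanical fix: convert the int to char explicitly before assigning.
inline std::string assign_via_cast(int i) {
    strict_string s;
    s = std::char_traits<char>::to_char_type(i);
    return s.value;
}

inline void check_assign_via_cast() {
    assert(assign_via_cast(50) == "2");  // assuming ASCII
}
```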



<h4><a name="discussion.options.poison_float">Poison all conversion but the one from <code>int</code></a></h4>


<p> There is no controversy about disallowing an assignment from <code>float</code> or <code>unsigned int</code>.
    Chances that such usages are correct are so small that sacrificing them would be acceptable.
    The only assignment from non-<code>charT</code> that could be potentially correct is the
    one from <code>int</code>, as <code>int</code>s are often produced from <code>char</code> in unexpected
    places. Given that, we could poison other assignments, but leave the assignment from <code>int</code> intact.
    </p>

<p> However, in all places where this bug has been reported, the culprit was exactly the assignment from <code>int</code>, so
    this option may not be much more attractive than doing nothing.
    </p>


<h4><a name="discussion.options.assign_char">Offer an alternative interface</a></h4>


<p> If the assignment is narrowed in applicability or removed, this change can be accompanied
    by adding a dedicated interface for putting a single character into a string. We could add
    the following signature to <code>basic_string</code>:
    </p>

<pre>
constexpr basic_string&amp; assign_char(charT c);
</pre>

<p>And this avoids any pitfalls, even if an <code>int</code> is passed to it:</p>

<pre>
str.assign_char('0' + 0); <em>// we obviously mean a </em>numeric<em> conversion to char</em>
</pre>

<p> It is superior to <code>str = {ch}</code> because it allows correct assignments from
    <code>int</code>, and it is superior to <code>str = {char(ch)}</code> because it avoids
    explicit casts.
  </p>
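
<p> Since members cannot be added to <code>std::string</code> from user code, the idea can only be approximated here
    with a free function. The following is a sketch under that assumption; the free function <code>assign_char</code>
    and the helper <code>digit_string</code> are hypothetical:</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical free-function counterpart of the proposed member assign_char.
inline std::string& assign_char(std::string& s, char c) {
    s.assign(1, c);
    return s;
}

// Precondition: 0 <= d && d <= 9
inline std::string digit_string(int d) {
    std::string s;
    assign_char(s, '0' + d);  // the int sum is converted to char at the call
    return s;
}

inline void check_digit_string() {
    assert(digit_string(7) == "7");
}
```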



<h2><a name="impact">Impact on users</a></h2>



<p> There was a consensus in LEWG to deprecate the assignment from <code>charT</code> by moving it to Annex D.
    So we are only discussing the impact of deprecating the assignment. Deprecation technically does not alter the
    interface, in the sense that programs that used to be valid remain valid with unaltered semantics, and programs
    that used to be invalid remain invalid with the same diagnostics. However, deprecation will impact the users
    who configure their compilers to warn about the usage of deprecated features and to treat warnings as errors.
    For users who use the string assignment inadvertently and incorrectly, this breakage will be a gain. But for
    users who are aware of the semantics and assign from <code>int</code> to <code>string</code> consciously, this
    will be a harm. The <code>int</code>-to-<code>string</code> assignment can be treated as a dangerous but useful
    tool. Such impact could be mitigated if compilers allow the users to control which deprecations are warned about.</p>
    
<p> The deprecation warning about the <code>int</code>-to-<code>string</code> assignment has not been implemented
    in any compiler that we are aware of. (It is implemented in clang-tidy, though.) The impact on the users has
    not been estimated.</p>



<h2><a name="wording">Proposed wording</a></h2>


<p> Changes are relative to <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/n4861.pdf">[N4861]</a>.
    </p>

<p>In [basic.string] paragraph 3, remove the declaration of the assignment from <code>charT</code> from the class synopsis:</p>


<blockquote class="std"><pre>    <em>// 21.3.2.2, construct/copy/destroy</em>
    constexpr basic_string() noexcept(noexcept(Allocator())) : basic_string(Allocator()) { }
    constexpr explicit basic_string(const Allocator&amp; a) noexcept;
    constexpr basic_string(const basic_string&amp; str);
    constexpr basic_string(basic_string&amp;&amp; str) noexcept;
    constexpr basic_string(const basic_string&amp; str, size_type pos,
                           const Allocator&amp; a = Allocator());
    constexpr basic_string(const basic_string&amp; str, size_type pos, size_type n,
                           const Allocator&amp; a = Allocator());
    template&lt;class T&gt;
      constexpr basic_string(const T&amp; t, size_type pos, size_type n,
                             const Allocator&amp; a = Allocator());
    template&lt;class T&gt;
      constexpr explicit basic_string(const T&amp; t, const Allocator&amp; a = Allocator());
    constexpr basic_string(const charT* s, size_type n, const Allocator&amp; a = Allocator());
    constexpr basic_string(const charT* s, const Allocator&amp; a = Allocator());
    constexpr basic_string(size_type n, charT c, const Allocator&amp; a = Allocator());
    template&lt;class InputIterator&gt;
      constexpr basic_string(InputIterator begin, InputIterator end,
                             const Allocator&amp; a = Allocator());
    constexpr basic_string(initializer_list&lt;charT&gt;, const Allocator&amp; = Allocator());
    constexpr basic_string(const basic_string&amp;, const Allocator&amp;);
    constexpr basic_string(basic_string&amp;&amp;, const Allocator&amp;);
    constexpr ~basic_string();
    constexpr basic_string&amp; operator=(const basic_string&amp; str);
    constexpr basic_string&amp; operator=(basic_string&amp;&amp; str)
      noexcept(allocator_traits&lt;Allocator&gt;::propagate_on_container_move_assignment::value ||
               allocator_traits&lt;Allocator&gt;::is_always_equal::value);
    template&lt;class T&gt;
      constexpr basic_string&amp; operator=(const T&amp; t);
    constexpr basic_string&amp; operator=(const charT* s);
    <del>constexpr basic_string&amp; operator=(charT c);</del>
    constexpr basic_string&amp; operator=(initializer_list&lt;charT&gt;);</pre></blockquote>


<p>Remove paragraph 30 from [string.cons]:</p>

<blockquote class="std"><p><code>constexpr basic_string&amp; operator=(const charT* s);</code></p>
<div style="text-indent: +2em;"><em>Effects:</em> Equivalent to: <code>return *this = basic_string_view&lt;charT, traits&gt;(s);</code><br><br></div>

<p><del><code>constexpr basic_string&amp; operator=(charT c);</code></del></p>
<del><div style="text-indent: +2em;"><em>Effects:</em> Equivalent to:<br></div>
<div style="text-indent: +4em;"><code>return *this = basic_string_view&lt;charT, traits&gt;(addressof(c), 1);</code><br><br></div></del>

<p><code>constexpr basic_string&amp; operator=(initializer_list&lt;charT&gt; il);</code></p>
<div style="text-indent: +2em;"><em>Effects:</em> Equivalent to:<br></div>
<div style="text-indent: +4em;"><code>return *this = basic_string_view&lt;charT, traits&gt;(il.begin(), il.size());</code></div>
</blockquote>

<p>Modify section D.19 as follows (this includes changing the stable links):</p>

<blockquote class="std"><p><span style="font-size: 140%;">D.19 Deprecated <code>basic_string</code> <del>capacity</del><ins>members</ins> 
<span style="float: right;">[depr.string.capacity]</span></span><br></p>

<p>The following member<del> is</del><ins>s are</ins> declared in addition to those members specified in <ins>21.3.2.2 and </ins>21.3.2.4:<br></p>

<pre>  namespace std {
    template&lt;class charT, class traits = char_traits&lt;charT&gt;,
             class Allocator = allocator&lt;charT&gt;&gt;
    class basic_string {
    public:
      <ins>constexpr basic_string&amp; operator=(charT c);</ins>
      void reserve();
    };
  }</pre>
 
  
<p><ins><code>constexpr basic_string&amp; operator=(charT c);</code></ins></p>
<ins><div style="text-indent: +2em;"><em>Effects:</em> Equivalent to:<br></div>
<div style="text-indent: +4em;"><code>return *this = basic_string_view&lt;charT, traits&gt;(addressof(c), 1);</code></div></ins>
  
<p><code>void reserve();</code></p>

<div style="text-indent: +2em;"><em>Effects:</em> After this call, <code>capacity()</code> 
has an unspecified value greater than or equal to <code>size()</code>.
 [<em>Note:</em> This is a non-binding shrink to fit request. <em>&mdash;end note</em>]</div>

</blockquote>

<h2><a name="acknowledgements">Acknowledgements</a></h2>

<p> I am grateful to Antony Polukhin and Jorg Brown for their useful feedback.
    I am also grateful to Tomasz Kamiński for reviewing the proposed wording.
    Barry Revzin and Ville Voutilainen stressed the importance of estimating
    the impact of the deprecation on the users. This is now reflected in the paper.
    </p>

<h2><a name="literature">References</a></h2>

<ol>
  <li>[RU013] --
      [string.cons].30 <br>
      (<a href="https://github.com/cplusplus/nbballot/issues/13">https://github.com/cplusplus/nbballot/issues/13</a>).
      </li>

   <li>[CLANG] --
      "Extra Clang Tools 10 documentation" <br>
      (<a href="https://clang.llvm.org/extra/clang-tidy/">https://clang.llvm.org/extra/clang-tidy/</a>).
      </li>
</ol>


</body>
</html>
