<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">

<style type="text/css">
pre {font-family: "Consolas", "Lucida Console", monospace; margin-left:20pt; }
code {font-family: "Consolas", "Lucida Console", monospace; }
pre > i   { font-family: "Consolas", "Lucida Console", monospace;  font-style:italic; }
code > i  { font-family: "Consolas", "Lucida Console", monospace;  font-style:italic; }
pre > em  { font-family: "Consolas", "Lucida Console", monospace;  font-style:italic; }
code > em { font-family: "Consolas", "Lucida Console", monospace;  font-style:italic; }
body { color: #000000; background-color: #FFFFFF; }
del { text-decoration: line-through; color: #8B0040; }
ins { text-decoration: underline; color: #005100; }

p.example   { margin-left: 2em; }
pre.example { margin-left: 2em; }
div.example { margin-left: 2em; }

code.extract { background-color: #F5F6A2; }
pre.extract  { margin-left: 2em; background-color: #F5F6A2;  border: 1px solid #E1E28E; }

p.function    { }
.attribute    { margin-left: 2em; }
.attribute dt { float: left; font-style: italic;  padding-right: 1ex; }
.attribute dd { margin-left: 0em; }

blockquote.std    { color: #000000; background-color: #F1F1F1;  border: 1px solid #D1D1D1;  padding-left: 0.5em; padding-right: 0.5em; }
blockquote.stddel { text-decoration: line-through;  color: #000000; background-color: #FFEBFF;  border: 1px solid #ECD7EC;  padding-left: 0.5empadding-right: 0.5em; ; }
blockquote.stdins { text-decoration: underline;  color: #000000; background-color: #C8FFC8;  border: 1px solid #B3EBB3; padding: 0.5em; }
table.header { border: 0px; border-spacing: 0;  margin-left: 0px; font-style: normal; }
table { border: 1px solid black; border-spacing: 0px;  margin-left: auto; margin-right: auto; }
th { text-align: left; vertical-align: top;  padding-left: 0.4em; border: none;  padding-right: 0.4em; border: none; }
td { text-align: left;  padding-left: 0.4em; border: none;  padding-right: 0.4em; border: none; }
</style>

<title>P2037R0 &mdash; String's gratuitous assignment</title>

</head>
<body>

<table class="header"><tbody>
  <tr>
    <th>Document number:&nbsp;&nbsp;</th><th> </th><td>P2037r0</td>
  </tr>
  <tr>
    <th>Date:&nbsp;&nbsp;</th><th> </th><td>2020-01-11</td>
  </tr>
  <tr>
    <th>Audience:&nbsp;&nbsp;</th><th> </th><td>LEWG</td>
  </tr>
  <tr>
    <th>Reply-to:&nbsp;&nbsp;</th><th> </th><td><address>Andrzej Krzemie&#324;ski &lt;akrzemi1 at gmail dot com&gt;</address></td>
  </tr>
</tbody></table>

<h1>String's gratuitous assignment</h1>

<p> This paper explores the capability of the assignment from <code>char</code> to <code>std::string</code> and the consequences
    of removing it.</p>


<h2><a name="background">Background</a></h2>

<p>The interface of <code>std::basic_string</code> provides the following signature:
    </p>

<pre>constexpr basic_string&amp; operator=(charT c);</pre>

<p>This allows the direct assignment from <code>char</code> to <code>std::string</code>:
    </p>

<pre>std::string s;
s = 'A';
assert(s == "A");</pre>

<p> However, due to the implicit conversion between scalar types, this allows an assignment from numeric types,
    such as <code>int</code> or <code>double</code>, which often has an undesired semantics:
    </p>

<pre>std::string s;
s = 50;
assert(s == "2");

s = 48.0;
assert(s == "0");</pre>

<p> In order to prevent the likely inadvertent conversions,
    <a href="https://github.com/cplusplus/nbballot/issues/13">[RU013]</a> proposes to change the signature so that it is
    equivalent to:
    </p>

<pre>template &lt;class T&gt;
  requires is_same_v&lt;T, charT&gt;
constexpr basic_string&amp; operator=(charT c);
</pre>


<h2><a name="discussion">Discussion</a></h2>

<h3><a name="discussion.intended">Intended usage</a></h3>

<p> Even the intended usage of the assignment from <code>char</code> is suspicious. We have a direct interface for assigning
    a single character to an existing <code>std::string</code>:
    </p>

<pre>std::string s;
s = 'A';</pre>

<p> However, there is no corresponding interface &mdash; in the form of constructor &mdash; for initializing a string
    from a single character. We have to use a more verbose syntax:</p>

<pre>const std::string s1 (1u, 'C');
const std::string s2 = {'C'};</pre>


<p> Whatever the motivation for the assignment from <code>char</code> was, surely the same motivation applied for
    the converting constructor.
    </p>


<h3><a name="discussion.pitfall">Common pitfall</a></h3>

<p> There are two common situations where the gratuitous converting assignment from <code>int</code> to
    <code>std::string</code> is used inadvertantly and results in a well-formed C++ program that does
    something else than what the programmer intended.
    </p>

<p> First is when inexperienced programmers try to use their experience from weakly typed languages when trying to
    convert from <code>int</code> to <code>std::string</code> through an assignment syntax:
    </p>

<pre>template &lt;typename From, typename To&gt;
  requires std::is_assignable_v&lt;To&amp;, From const&amp;&gt;
void convert(From const& from, To& to)
{
  to = from;
}

std::string s;
convert(50, s);
std::cout << s; // outputs "2"
</pre>

<p> The second situation is when a piece of data used throughout a program, such as a unique identifier,
    is changed type from <code>int</code> to <code>std::string</code>. Consider the common concept of an "id".
    While he concept is common and universally understood, there exists no natural internal representation
    of an identifier. It can be represented by an <code>int</code> or by a <code>std::string</code>,
    and sometimes the representation can change in time. If we decide to change the representation in our
    program, the expextation is that after the change  whenever a raw <code>int</code> is converted to an id &mdash;
    either in initialization or in the assignment &mdash;
    a compiler should detect a type mismatch and report a compie-time error. But because of the surprising
    "conversion" this is not the case.
    </p>


<h3><a name="discussion.valid">Valid conversions from <code>int</code></a></h3>

<p> There are usages of the assignment from type <code>int</code> to <code>std::string</code> that
    are nonetheless valid and behave exactly as intended. These are the cases when we already treat
    the value stored in an <code>int</code> as a character, but we store it in a variable of
    type <code>int</code> either for convenience or because of the peculiar rules of type promotions in C++.
    The first case is when we use literal <code>0</code> to indicate a null character <code>'\0'</code>:
    </p>

<pre>if (cond1) {
  str = 'A';
}
else if (cond2) {
  str = 'B';
}
else {
  str = 0;  <em>// I mean '\0'</em>
}</pre>

<p>or:</p>

<pre>  str = NULL;</pre>

<p> which &mdash; although suspicious &mdash; is reported to be used, and is the reason why compilers do not
    define macro <code>NULL</code> as <code>nullptr</code>.
    </p>

<p> Sometimes we may not even be aware that we are producing a value of type <code>int</code>:
    </p>

<pre>
void assign_digit(int d, std::string& s)
<em>// precondition: 0 &lt;= d && d &lt;= 9</em>
{
  constexpr char zero = '0';
  s = (char)d + zero;
}</pre>

<p> In the example above we might believe that because we are adding two <code>char</code>s, the resulting type will
    also be of type <code>char</code>, but the result of the addition of two <code>char</code>s is in fact of type
    <code>int</code>. This incorrect expectation is enforced by the way narrowing is defined in C++:
    </p>

<pre><em>// test if char + char == char :</em>
constexpr char zero = '0';
const int d = 9;
char ch {(char)d + zero}; <em>// brace-init prevents narrowing</em>
</pre>

<p> Brace initialization prevents narrowing. The above "test" compiles fine, so no narrowing occur.
    From this, a programmer could draw an incorrect conclusion that the type of expression
    <code>(char)d + zero</code> must be <code>char</code>; but it is not.
    </p>


<h3><a name="discussion.options">Our options</a></h3>

<p> There is a number of ways we can respond to this problem.</p>


<h4><a name="discussion.options.noting">Do nothing</a></h4>

<p> That is, do not modify the interface of <code>std::basic_string</code>.
    The potential bugs resulting from the suspicious conversion
    can be detected by static analyzers rather than compilers. For instance,
    clang-tidy has checker
    <a href="https://clang.llvm.org/extra/clang-tidy/checks/bugprone-string-integer-assignment.html"><code>bugprone-string-integer-assignment</code></a>
    that reports all places where the suspicious assignment from an <code>int</code>
    is performed.
    This avoids any correct code breakage, and leaves the option for the
    bugs to be detected by other tools.
    </p>


<h4><a name="discussion.options.remove">Remove the assignment operator from <code>charT</code></a></h4>

<p> We can just remove the assignment from <code>charT</code> altogether.
    This assignment is suspicious even if no conversions are applied.
    It is like an assignment of a container element to a container.
    This warrants the usage of syntax that expresses the element-container relation, like:
    </p>

<pre>str.assign(1, ch);
str = {ch};</pre>

<p> A migration procedure can be provided for changing the program that
    previously used the suspicious assignment.
    </p>


<h4><a name="discussion.options.deprecate">Deprecate the assignment</a></h4>

<p> A softer variant of the above would be to declare the assignment from <code>charT</code>
    as deprecated. This does not break any correct code, and allows potential bugs to be
    detected by the compiler.
    </p>


<h4><a name="discussion.options.poison">Poison te conversion from scalar types to <code>charT</code> in the assignment</a></h4>

<p> Do what <a href="https://github.com/cplusplus/nbballot/issues/13">[RU013]</a> proposes:
    replace the current signature of the assignment with something equivalent to:
    </p>

<pre>template &lt;class T&gt;
  requires is_same_v&lt;T, charT&gt;
constexpr basic_string&amp; operator=(charT c);
</pre>

<p> This may still compromize some valid programs, but the damage is smaller
    than if the operator was removed altogether. An automated mecanical fix
    can be easily provided: you just need to apply a cast:
    <p>

<pre>str = std::char_traits&lt;char&gt;::to_char_type(i);</pre>


<h4><a name="discussion.options.poison_float">Poison all conversion but the one from <code>int</code></a></h4>


<p> There is no controversy about disallowing an assignment from <code>float</code> or <code>unsigned int</code>.
    Chances that such usages are correct are so small that sacrificing them would be acceptable.
    The only assignment from non-<code>charT</code> that could be potentially correct is the
    one from <code>int</code>, as <code>int</code>s are often produced from <code>char</code> in unexpected
    places. Given that, we could poison other assignments, but leave the assignment from <code>int</code> intact.
    </p>

<p> However, all places where this bug has been reported, it was exactly the assignment from <code>int</code>, so
    this option may not be much more attractive than doing nothing.
    </p>


<h4><a name="discussion.options.assign_char">Offer an alternative interface</a></h4>


<p> If the assignment is narrowed in applicability or removed, this change can be accompanied
    by adding a dedicated interface for putting a single character into a string. we could add
    the following signature to <code>basic_string</code>:
    </p>

<pre>
constexpr basic_string&amp; assign_char(charT c);
</pre>

<p>And this avoids any pitfalls, even if an <code>int</code> is passed to it:</p>

<pre>
str.assign_char('0' + 0); <em>// we obviously mean a </em>numeric<em> conversion to char</em>
</pre>

<p> It is superior to <code>str = {ch}</code> because it allows correct assignments from
    <code>int</code>, and it is superior to <code>str = {char(ch)}</code> because it avoids
    explicit conversion operators.
  </p>


<h2><a name="acknowledgements">Acknowledgements</a></h2>

<p> I am grateful to Antony Polukhin and Jorg Brown for their useful feedback.
    </p>

<h2><a name="literature">References</a></h2>

<ol>
  <li>[RU013] --
      [string.cons].30 <br>
      (<a href="https://github.com/cplusplus/nbballot/issues/13">https://github.com/cplusplus/nbballot/issues/13</a>).
      </li>

   <li>[CLANG] --
      "Extra Clang Tools 10 documentation" <br>
      (<a href="https://clang.llvm.org/extra/clang-tidy/">https://clang.llvm.org/extra/clang-tidy/3</a>).
      </li>
</ol>


</body>
</html>
