<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC
    "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
    "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:svg="http://www.w3.org/2000/svg" xml:lang="en">
<head><meta http-equiv="Content-type" content="application/xhtml+xml;charset=utf-8" /><title>Fixing operator>>(basic_istream&, CharT*) (LWG 2499)</title></head>
<body><!-- maruku -o lwg2499.html lwg2499.md --><style type='text/css'>
pre { margin: 0; }
pre code { display: block; margin-left: 2em; }
div code { display: block; margin-left: 2em; }
ins { text-decoration: none; font-weight: bold; background-color: #A0FFA0 }
del { text-decoration: line-through; background-color: #FFA0A0 }
</style><table><tbody>
<tr><th>Doc. no.:</th>	<td>P0487R1</td></tr>
<tr><th>Date:</th>	<td>2018-08-23</td></tr>
<tr><th>Audience:</th>	<td>Library Working Group</td></tr>
<tr><th>Reply-to:</th>	<td>Zhihao Yuan &lt;zy at miator dot net&gt;</td></tr>
</tbody></table>
<h1 id="fixing_operatorbasic_istream_chart_lwg_2499">Fixing operator&gt;&gt;(basic_istream&amp;, CharT*) (LWG 2499)</h1>

<h2 id="background">Background</h2>

<p>The <a href="https://cplusplus.github.io/LWG/issue2499">issue</a> was submitted with the following rationale: the most obvious use of this overload</p>

<pre><code>std::cin &gt;&gt; buffer;</code></pre>

<p>does not protect against buffer overflow, thus shares the same problem of stdio’s <code>gets()</code>, which has been removed from both C11 and C++. So maybe we should remove this overload as well.</p>

<p>However, comparing it to <code>gets()</code> brings in some distortion here. More precisely, <code>scanf</code>‘s <code>&quot;%s&quot;</code> is where this overload copies from. Both deal with formatted input, read “words”, and naive uses of them suffer from buffer overflow, plus both have ways to prevent this issue. For <code>scanf</code>, you can limit the field widths,</p>

<pre><code>scanf(&quot;%20s %20s&quot;, a, b);</code></pre>

<p>and the iostreams’ version improved this practice by allowing programmatically passing the width:</p>

<pre><code>cin &gt;&gt; setw(21) &gt;&gt; a;</code></pre>

<p>The idea is as same as the <code>&quot;%.*s&quot;</code> conversion specification in <code>printf</code>, while <code>scanf</code> doesn’t support the asterisk ( <code>&#39;*&#39;</code> ) arguments.</p>

<h2 id="discussion">Discussion</h2>

<p>What should we do to this library issue? People have raised the voices to deprecate or remove this overload. However, I want to mention that:</p>

<ol>
<li>C is not deprecating or removing either <code>&quot;%s&quot;</code> or <code>&quot;%</code><em><code>N</code></em><code>s&quot;</code> from <code>scanf</code>;</li>

<li>There are more than one existing legitimate uses of this overload. The users can pass the <code>.width()</code> argument to read unknown inputs, or read from streams with known contents and customized streams.</li>
</ol>

<p>As shown as the proposed resolution to this issue, rather than deprecating or removing the whole overload, I try to:</p>

<ol>
<li>Preserve some legitimate uses;</li>

<li>Protect the users against the bad uses.</li>
</ol>

<p>More specifically, we can safely claim that when a width is not specified ( <code>.width() == 0</code> ), the user’s intention is to read as if the length of the buffer is being passed. To an array type, the length is known at compile-time so that we can “fix” this for the user. However, due to implementability, unless we want to place additional preconditions on this function such as “Requires: <code>width() &gt; 0</code> if the argument is of type <code>charT*</code>”, all uses of passing a pointer to characters will have to be deprecated or removed.</p>

<p>In a previous <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0487r0.html">revision</a> of this paper, two wordings were provided, one for deprecating the pointer arguments and one for removing. The deprecation option is nontrivial and has dual-behavior, while the removal option fortunately does not break ABI since the affected signatures were not candidates for explicit instantiation, therefore LWG decided to go for the removal option.</p>

<h2 id="implementation">Implementation</h2>

<p>libc++ review: <a href="https://reviews.llvm.org/D51268">https://reviews.llvm.org/D51268</a></p>

<h2 id="wording">Wording</h2>

<p>This wording is relative to N4762.</p>

<p>Modify 27.7.4.2.3 [istream.extractors] as indicated:</p>
<div><code>
template&lt;class charT, class traits<ins>, size_t N</ins>&gt;<br />
&nbsp;&nbsp;basic_istream&lt;charT, traits&gt;&amp; operator&gt;&gt;(basic_istream&lt;charT, traits&gt;&amp; in,
					   <del>charT* s</del><ins>charT (&amp;s)[N]</ins>);<br />
template&lt;class traits<ins>, size_t N</ins>&gt;<br />
&nbsp;&nbsp;basic_istream&lt;char, traits&gt;&amp; operator&gt;&gt;(basic_istream&lt;char, traits&gt;&amp; in,
					  <del>unsigned char* s</del><ins>unsigned char (&amp;s)[N]</ins>);<br />
template&lt;class traits<ins>, size_t N</ins>&gt;<br />
&nbsp;&nbsp;basic_istream&lt;char, traits&gt;&amp; operator&gt;&gt;(basic_istream&lt;char, traits&gt;&amp; in,
					  <del>signed char* s</del><ins>signed char (&amp;s)[N]</ins>);
</code></div>
<blockquote>
<p><em>Effects:</em> Behaves like a formatted input member (as described in 27.7.4.2.1) of <code>in</code>. After a <code>sentry</code> object is constructed, <code>operator&gt;&gt;</code> extracts characters and stores them into <del>successive locations of an array whose first element is designated by</del> <code>s</code>. If <code>width()</code> is greater than zero, <code>n</code> is <del><code>width()</code></del><ins><code>min(size_t(width()), N)</code></ins>. Otherwise <code>n</code> is <del>the number of elements of the largest array of <code>char_type</code> that can store a terminating <code>charT()</code></del><ins><code>N</code></ins>. <code>n</code> is the maximum number of characters stored.</p>
</blockquote>

<p>Update 27.7.4.1 [istream] synopsis:</p>

<blockquote>
<p>[…]</p>
</blockquote>
<div><code>
&nbsp;&nbsp;template&lt;class charT, class traits<ins>, size_t N</ins>&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;basic_istream&lt;charT, traits&gt;&amp; operator&gt;&gt;(basic_istream&lt;charT, traits&gt;&amp; in, <del>charT*</del><ins>charT(&amp;)[N]</ins>);<br />
&nbsp;&nbsp;template&lt;class traits<ins>, size_t N</ins>&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;basic_istream&lt;char, traits&gt;&amp; operator&gt;&gt;(basic_istream&lt;char, traits&gt;&amp; in, <del>unsigned char*</del><ins>unsigned char(&amp;)[N]</ins>);<br />
&nbsp;&nbsp;template&lt;class traits<ins>, size_t N</ins>&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;basic_istream&lt;char, traits&gt;&amp; operator&gt;&gt;(basic_istream&lt;char, traits&gt;&amp; in, <del>signed char*</del><ins>signed char(&amp;)[N]</ins>);<br />
}
</code></div>
<p>Add a new compatibility item to C.5 [diff.cpp17]:</p>

<blockquote>
<p>Clause 27: input/output library [diff.cpp17.input.output]</p>
</blockquote>

<blockquote>
<p><strong>Change:</strong> Character array extraction only takes array types.</p>

<p><strong>Rationale:</strong> Increase safety via preventing buffer overflow at compile time.</p>

<p><strong>Effect on original feature:</strong> Valid C++ 2017 code may fail to compile in this International Standard:</p>
</blockquote>

<pre><code>auto p = new char[100];
std::cin &gt;&gt; std::setw(20) &gt;&gt; p;</code></pre>

<blockquote></blockquote>
</body></html>
