<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Issue 2036: istream &gt;&gt; char and eofbit</title>
<meta property="og:title" content="Issue 2036: istream &gt;&gt; char and eofbit">
<meta property="og:description" content="C++ library issue. Status: NAD">
<meta property="og:url" content="https://cplusplus.github.io/LWG/issue2036.html">
<meta property="og:type" content="website">
<meta property="og:image" content="http://cplusplus.github.io/LWG/images/cpp_logo.png">
<meta property="og:image:alt" content="C++ logo">
<style>
  p {text-align:justify}
  li {text-align:justify}
  pre code.backtick::before { content: "`" }
  pre code.backtick::after { content: "`" }
  blockquote.note
  {
    background-color:#E0E0E0;
    padding-left: 15px;
    padding-right: 15px;
    padding-top: 1px;
    padding-bottom: 1px;
  }
  ins {background-color:#A0FFA0}
  del {background-color:#FFA0A0}
  table.issues-index { border: 1px solid; border-collapse: collapse; }
  table.issues-index th { text-align: center; padding: 4px; border: 1px solid; }
  table.issues-index td { padding: 4px; border: 1px solid; }
  table.issues-index td:nth-child(1) { text-align: right; }
  table.issues-index td:nth-child(2) { text-align: left; }
  table.issues-index td:nth-child(3) { text-align: left; }
  table.issues-index td:nth-child(4) { text-align: left; }
  table.issues-index td:nth-child(5) { text-align: center; }
  table.issues-index td:nth-child(6) { text-align: center; }
  table.issues-index td:nth-child(7) { text-align: left; }
  table.issues-index td:nth-child(5) span.no-pr { color: red; }
  @media (prefers-color-scheme: dark) {
     html {
        color: #ddd;
        background-color: black;
     }
     ins {
        background-color: #225522
     }
     del {
        background-color: #662222
     }
     a {
        color: #6af
     }
     a:visited {
        color: #6af
     }
     blockquote.note
     {
        background-color: rgba(255, 255, 255, .10)
     }
  }
</style>
</head>
<body>
<hr>
<p><em>This page is a snapshot from the LWG issues list, see the <a href="lwg-active.html">Library Active Issues List</a> for more information and the meaning of <a href="lwg-active.html#NAD">NAD</a> status.</em></p>
<h3 id="2036"><a href="lwg-closed.html#2036">2036</a>. <code>istream &gt;&gt; char</code> and <code>eofbit</code></h3>
<p><b>Section:</b> 31.7.5.2 <a href="https://wg21.link/istream">[istream]</a> <b>Status:</b> <a href="lwg-active.html#NAD">NAD</a>
 <b>Submitter:</b> Howard Hinnant <b>Opened:</b> 2011-02-27 <b>Last modified:</b> 2016-01-28</p>
<p><b>Priority: </b>Not Prioritized
</p>
<p><b>View all other</b> <a href="lwg-index.html#istream">issues</a> in [istream].</p>
<p><b>View all issues with</b> <a href="lwg-status.html#NAD">NAD</a> status.</p>
<p><b>Discussion:</b></p>
<p>The question is: when a single character is extracted from an <code>istream</code> using <code>operator&gt;&gt;</code>, 
does <code>eofbit</code> get set if that character is the last one in the stream? The current standard is at 
best ambiguous on the subject. 31.7.5.2 <a href="https://wg21.link/istream">[istream]</a>/p3 describes all extraction operations with:</p>

<blockquote><p>
3 If <code>rdbuf()-&gt;sbumpc()</code> or <code>rdbuf()-&gt;sgetc()</code> returns <code>traits::eof()</code>, then the input 
function, except as explicitly noted otherwise, completes its actions and does <code>setstate(eofbit)</code>, which may 
throw <code>ios_base::failure</code> (31.5.4.4 <a href="https://wg21.link/iostate.flags">[iostate.flags]</a>), before returning.
</p></blockquote>
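<p>As an illustration of the p3 rule, the sketch below reads a two-character <code>std::istringstream</code> with the unformatted <code>get()</code> and <code>peek()</code> members. The helper <code>state_after_reading_ab</code> and the <code>Observed</code> struct are hypothetical names used only for this example. Assuming a conforming library, <code>eofbit</code> should stay clear after <code>get()</code> consumes the last character, and should be set (without <code>failbit</code>) only when <code>peek()</code>, which returns <code>rdbuf()-&gt;sgetc()</code>, actually receives <code>traits::eof()</code>.</p>

```cpp
#include <sstream>

// Hypothetical record of the stream state at two points while reading "ab".
struct Observed {
    bool eof_after_last_get;  // eofbit right after get() returned 'b'
    bool eof_after_peek;      // eofbit after peek() found nothing left
    bool fail_after_peek;     // failbit after that same peek()
};

Observed state_after_reading_ab() {
    std::istringstream ss("ab");
    Observed o{};
    ss.get();                        // consumes 'a'
    ss.get();                        // consumes 'b', the last character
    o.eof_after_last_get = ss.eof();
    ss.peek();                       // rdbuf()->sgetc() returns traits::eof()
    o.eof_after_peek = ss.eof();
    o.fail_after_peek = ss.fail();
    return o;
}
```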

<p>And [istream.extractors]/p12, in describing <code>operator&gt;&gt;(basic_istream&lt;charT,traits&gt;&amp; in, charT&amp; c)</code>, 
offers no further clarification:
</p>

<blockquote><p>
12 <i>Effects</i>: Behaves like a formatted input member (as described in [istream.formatted.reqmts]) of <code>in</code>. 
After a <code>sentry</code> object is constructed a character is extracted from <code>in</code>, if one is available, and 
stored in <code>c</code>. Otherwise, the function calls <code>in.setstate(failbit)</code>.
</p></blockquote>

<p>I implemented it one way in libc++, and g++ implemented it another way. Chris Jefferson noted that some Boost code is 
sensitive to the difference and fails with libc++. I therefore believe it is very important that we specify 
this extraction operator in enough detail that both vendors and users know what behavior is required and expected.
</p>

<p>Here is a brief code example demonstrating the issue:</p>

<blockquote><pre>
#include &lt;sstream&gt;
#include &lt;cassert&gt;

int main()
{
  std::istringstream ss("1");
  char t;
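  ss &gt;&gt; t;           // extracts '1', the only character in the stream
  assert(!ss.eof());    // holds only if extracting the last char leaves eofbit clear
}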
</pre></blockquote>

<p>For every other type that can be extracted from this istringstream (<code>bool</code>, <code>int</code>, <code>double</code>, etc.), 
<code>ss.eof()</code> will be true after the extraction. So, for consistency's sake, we might want 
<code>char</code> to behave the same way as the other built-in types.</p>
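<p>The asymmetry can be checked directly. The sketch below (the function template <code>eof_after_extracting</code> is a hypothetical helper) extracts a single value from a stream containing <code>"1"</code> and reports whether <code>eofbit</code> is set afterwards. For the arithmetic types, <code>num_get</code> has to look past the <code>'1'</code> to know the number has ended, so it reaches the end of the stream; the <code>char</code> extractor can stop after one character. The expected <code>char</code> result assumes an implementation that follows the rationale recorded at the end of this issue, not necessarily every implementation of the time.</p>

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: extract one T from a stream containing `text`
// (with the default skipws) and report whether eofbit is set afterwards.
template <class T>
bool eof_after_extracting(const std::string& text) {
    std::istringstream ss(text);
    T value{};
    ss >> value;
    return ss.eof();
}
```

<p>For example, <code>eof_after_extracting&lt;int&gt;("1")</code> should yield <code>true</code>, while <code>eof_after_extracting&lt;char&gt;("1")</code> should yield <code>false</code> under that assumption.</p>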

<p>However, Jean-Marc Bourguet offers this counterexample using an interactive stream. He argues that 
setting <code>eofbit</code> inhibits reading the next line:</p>

<blockquote><pre>
#include &lt;iostream&gt;

int main()
{
  char c;
  std::cin &gt;&gt; std::noskipws;          // extract whitespace too, including '\n'
  std::cout &lt;&lt; "First line: ";
  while (std::cin &gt;&gt; c) {
    if (c == '\n') {
      std::cout &lt;&lt; "Next line: ";      // prompt as soon as a line ends
    }
  }
}
</pre></blockquote>

<p>As these two code examples demonstrate, whether or not <code>eofbit</code> gets set is an observable difference and it 
is impacting real-world code.  I feel it is critical that we clearly and unambiguously choose one behavior or the other.  
I am proposing wording for both behaviors and ask the LWG to choose one (and only one!).</p>

<p>Wording for setting <code>eofbit</code>:</p>

<p>Modify [istream.extractors]/p12 as follows:</p>

<blockquote><p>
12 <i>Effects</i>: Behaves like a formatted input member (as described in [istream.formatted.reqmts]) of <code>in</code>. 
After a <code>sentry</code> object is constructed a character is extracted from <code>in</code>, if one is available, and 
stored in <code>c</code>. <del>Otherwise, the function calls <code>in.setstate(failbit)</code>.</del>  <ins>If a character is 
extracted and it is the last character in the pending sequence, the function calls <code>in.setstate(eofbit)</code>.  
If a character is not extracted the function calls <code>in.setstate(failbit | eofbit)</code>.</ins>
</p></blockquote>

<p>Wording for not setting <code>eofbit</code>:</p>

<blockquote><p>
12 <i>Effects</i>: Behaves like a formatted input member (as described in [istream.formatted.reqmts]) of <code>in</code>. 
After a <code>sentry</code> object is constructed a character is extracted from <code>in</code><del>, if one is available, 
and stored in <code>c</code>. Otherwise, the function calls <code>in.setstate(failbit)</code>.</del> <ins>with 
<code>in.rdbuf()-&gt;sbumpc()</code>.  If <code>traits::eof()</code> is returned, the function calls 
<code>in.setstate(failbit | eofbit)</code>.  Otherwise the return value is converted to type <code>charT</code> and stored
in <code>c</code>.</ins>
</p></blockquote>

<p><i>[2011-02-27: Jean-Marc Bourguet comments]</i></p>


<p>Just for completeness: it [the counterexample] doesn't prevent reading the next line; it prevents the prompt 
from being output at the appropriate time.</p>
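<p>The timing problem is easier to see with a stream buffer that records how far the <code>istream</code> looked. In the sketch below, <code>ProbeBuf</code> and <code>extraction_looks_past_end</code> are hypothetical names. <code>ProbeBuf</code> never sets up a get area, so every <code>sgetc()</code> reaches its <code>underflow()</code> and every <code>sbumpc()</code> reaches its <code>uflow()</code>; it notes whether either was asked for a character beyond its data. On an interactive stream, that request is the one that would wait for the user to type more, delaying the <code>"Next line: "</code> prompt. Assuming an implementation that does not set <code>eofbit</code> after the last <code>char</code>, extracting a <code>char</code> never makes that request, while extracting an <code>int</code> must.</p>

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical unbuffered stream buffer over a fixed string. It never sets up
// a get area, so every sgetc() reaches underflow() and every sbumpc() reaches
// uflow(); that lets us see whether the istream asked for more characters.
class ProbeBuf : public std::streambuf {
public:
    explicit ProbeBuf(std::string data) : data_(std::move(data)) {}
    bool looked_past_end() const { return looked_past_end_; }

protected:
    // Peek at the current character without consuming it.
    int_type underflow() override { return char_at(pos_); }

    // Return the current character and advance past it.
    int_type uflow() override {
        int_type c = char_at(pos_);
        if (!traits_type::eq_int_type(c, traits_type::eof())) ++pos_;
        return c;
    }

private:
    int_type char_at(std::size_t i) {
        if (i >= data_.size()) {
            looked_past_end_ = true;  // an interactive source would wait here
            return traits_type::eof();
        }
        return traits_type::to_int_type(data_[i]);
    }

    std::string data_;
    std::size_t pos_ = 0;
    bool looked_past_end_ = false;
};

// Hypothetical helper: extract one T from `text` and report whether the
// istream requested a character beyond the ones it consumed.
template <class T>
bool extraction_looks_past_end(const std::string& text) {
    ProbeBuf buf(text);
    std::istream in(&buf);
    T value{};
    in >> value;
    return buf.looked_past_end();
}
```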

<p>More information to take into account when deciding:</p>

<ul>
<li><p>if I'm reading the section on extracting boolean values with <code>boolalpha</code> set correctly, it mandates 
that <code>eofbit</code> isn't set if reading past the end of the pending sequence wasn't needed to determine the result.
</p></li>

<li><p>
see also the behaviour of <code>getline</code> (which isn't a formatted input function, but won't set <code>eofbit</code> 
if the end of input occurs just after the delimiter)
</p></li>

<li><p>
if I'm reading the C standard correctly, <code>scanf("%c")</code> wouldn't set <code>feof</code> either in that situation.
</p></li>
</ul>
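<p>The <code>getline</code> behavior mentioned in the second bullet can be checked with a short sketch (the names <code>GetlineStates</code> and <code>getline_states</code> are hypothetical). After <code>std::getline</code> extracts and discards the trailing <code>'\n'</code>, it returns without looking further, so <code>eofbit</code> is still clear; only the next call, which finds nothing to read, sets <code>eofbit</code> and <code>failbit</code>.</p>

```cpp
#include <sstream>
#include <string>

// Hypothetical record of the stream state around two getline calls on "abc\n".
struct GetlineStates {
    std::string first;
    bool eof_after_first;
    bool eof_after_second;
    bool fail_after_second;
};

GetlineStates getline_states() {
    std::istringstream ss("abc\n");
    GetlineStates s{};
    std::getline(ss, s.first);       // stops after extracting the '\n' delimiter
    s.eof_after_first = ss.eof();
    std::string rest;
    std::getline(ss, rest);          // extracts nothing: eofbit and failbit
    s.eof_after_second = ss.eof();
    s.fail_after_second = ss.fail();
    return s;
}
```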

<p><i>[2011-02-28: Martin Sebor comments]</i></p>


<p>[Responds to bullet 1 of Jean-Marc's list]</p>

<p>
Yes, this matches the stdcxx test suite for <code>num_get</code> and <code>time_get</code>,
but not for <code>money_get</code> when the currency symbol is last. I don't see
where in [locale.money.get.virtuals] we specify whether, and when,
<code>eofbit</code> is set.
</p>
<p>
IMO, if we try to fix the <code>char</code> extractor to be consistent, we
should also fix all the other extractors and manipulators that
aren't consistent (including <code>std::get_money</code> and <code>std::get_time</code>).
</p>

<p><i>[2011-03-24 Madrid meeting]</i></p>


<p>Dietmar convinced Howard that the standard already says the right words.</p>



<p><b>Rationale:</b></p><p>Reading the last character does not set <code>eofbit</code>, and the standard already says so.</p>

<p id="res-2036"><b>Proposed resolution:</b></p>
<p></p>





</body>
</html>
