<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Issue 662: Inconsistent handling of incorrectly-placed thousands separators</title>
<meta property="og:title" content="Issue 662: Inconsistent handling of incorrectly-placed thousands separators">
<meta property="og:description" content="C++ library issue. Status: NAD">
<meta property="og:url" content="https://cplusplus.github.io/LWG/issue662.html">
<meta property="og:type" content="website">
<meta property="og:image" content="http://cplusplus.github.io/LWG/images/cpp_logo.png">
<meta property="og:image:alt" content="C++ logo">
<style>
  p {text-align:justify}
  li {text-align:justify}
  pre code.backtick::before { content: "`" }
  pre code.backtick::after { content: "`" }
  blockquote.note
  {
    background-color:#E0E0E0;
    padding-left: 15px;
    padding-right: 15px;
    padding-top: 1px;
    padding-bottom: 1px;
  }
  ins {background-color:#A0FFA0}
  del {background-color:#FFA0A0}
  table.issues-index { border: 1px solid; border-collapse: collapse; }
  table.issues-index th { text-align: center; padding: 4px; border: 1px solid; }
  table.issues-index td { padding: 4px; border: 1px solid; }
  table.issues-index td:nth-child(1) { text-align: right; }
  table.issues-index td:nth-child(2) { text-align: left; }
  table.issues-index td:nth-child(3) { text-align: left; }
  table.issues-index td:nth-child(4) { text-align: left; }
  table.issues-index td:nth-child(5) { text-align: center; }
  table.issues-index td:nth-child(6) { text-align: center; }
  table.issues-index td:nth-child(7) { text-align: left; }
  table.issues-index td:nth-child(5) span.no-pr { color: red; }
  @media (prefers-color-scheme: dark) {
     html {
        color: #ddd;
        background-color: black;
     }
     ins {
        background-color: #225522
     }
     del {
        background-color: #662222
     }
     a {
        color: #6af
     }
     a:visited {
        color: #6af
     }
     blockquote.note
     {
        background-color: rgba(255, 255, 255, .10)
     }
  }
</style>
</head>
<body>
<hr>
<p><em>This page is a snapshot from the LWG issues list, see the <a href="lwg-active.html">Library Active Issues List</a> for more information and the meaning of <a href="lwg-active.html#NAD">NAD</a> status.</em></p>
<h3 id="662"><a href="lwg-closed.html#662">662</a>. Inconsistent handling of incorrectly-placed thousands separators</h3>
<p><b>Section:</b> 28.3.4.3.2.3 <a href="https://wg21.link/facet.num.get.virtuals">[facet.num.get.virtuals]</a> <b>Status:</b> <a href="lwg-active.html#NAD">NAD</a>
 <b>Submitter:</b> Cosmin Truta <b>Opened:</b> 2007-04-05 <b>Last modified:</b> 2016-01-28</p>
<p><b>Priority: </b>Not Prioritized
</p>
<p><b>View other</b> <a href="lwg-index-open.html#facet.num.get.virtuals">active issues</a> in [facet.num.get.virtuals].</p>
<p><b>View all other</b> <a href="lwg-index.html#facet.num.get.virtuals">issues</a> in [facet.num.get.virtuals].</p>
<p><b>View all issues with</b> <a href="lwg-status.html#NAD">NAD</a> status.</p>
<p><b>Discussion:</b></p>
<p>
From Section 28.3.4.3.2.3 <a href="https://wg21.link/facet.num.get.virtuals">[facet.num.get.virtuals]</a>, paragraphs 11 and 12, it is implied
that the value read from a stream must be stored
even if the placement of thousands separators does not conform to the
<code>grouping()</code> specification from the <code>numpunct</code> facet.
Since incorrectly-placed thousands separators are flagged as an extraction
failure (by means of <code>failbit</code>), we believe it is better not
to store the value. A consistent strategy, in which any kind of extraction
failure leaves the input item intact, is conceptually cleaner, is able to avoid
corner-case traps, and is also more understandable from the programmer's point
of view.
</p>
<p>
Here is a quote from <i>"The C++ Programming Language (Special Edition)"</i>
by B.&nbsp;Stroustrup (Section&nbsp;D.4.2.3, pg.&nbsp;897):
</p>
<blockquote><p>
<i>"If a value of the desired type could not be read, failbit is set in r.
[...] An input operator will use r to determine how to set the state of its
stream. If no error was encountered, the value read is assigned through v;
otherwise, v is left unchanged."</i>
</p></blockquote>
<p>
This statement implies that <code>rdstate()</code> alone is sufficient to
determine whether an extracted value is to be assigned to the input item
<i>val</i> passed to <code>do_get</code>. However, this is in disagreement
with the current C++ Standard. The above-mentioned assumption is true in all
cases, except when there are mismatches in digit grouping. In the latter case,
the parsed value is assigned to <i>val</i>, and, at the same time,
<code>ios_base::failbit</code> is assigned to <i>err</i> (essentially "lying" about the
success of the operation). Is this intentional? The current behavior raises
both consistency and usability concerns.
</p>
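<p>
The grouping case can be probed with a small sketch such as the one below. The facet
<code>comma_grouping</code>, the struct <code>extraction_result</code> and the function
<code>extract_grouped</code> are hypothetical names introduced only for this illustration.
Whether the input item is modified when the grouping is wrong depends on the library
implementation and on the revision of the standard it follows, so the sketch merely reports
what it observes.
</p>

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: digits are grouped in threes, separated by ','.
struct comma_grouping : std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
};

struct extraction_result {
    int value;    // content of the input item after the extraction
    bool failed;  // true if failbit (or badbit) is set on the stream
};

// Extracts a single int from text; the input item starts out as initial.
extraction_result extract_grouped(const std::string& text, int initial) {
    std::istringstream in(text);
    // The locale takes ownership of the facet.
    in.imbue(std::locale(in.getloc(), new comma_grouping));
    int i = initial;
    in >> i;
    return {i, in.fail()};
}
```

<p>
Comparing <code>extract_grouped("1,2,3", -1).value</code> against the initial value
<code>-1</code> shows whether a given implementation stores the parsed value even though
<code>failed</code> is <code>true</code>.
</p>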
<p>
Although digit grouping is outside the scope of <code>scanf</code> (on which
the virtual methods of <code>num_get</code> are based), handling of grouping
should be consistent with the overall behavior of <code>scanf</code>. The specification of
<code>scanf</code> makes a distinction between input failures and matching
failures, and yet both kinds of failures have no effect on the input items
passed to <code>scanf</code>. A mismatch in digit grouping logically falls in
the category of matching failures, and it would be more consistent, and less
surprising to the user, to leave the input item intact whenever a failure is
being signaled.
</p>
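<p>
For comparison, the behavior of the <code>scanf</code> family can be sketched as follows.
The struct <code>scan_result</code> and the function <code>scan_int</code> are hypothetical
helpers, not part of any library. The sketch records the return value of
<code>sscanf</code> together with the content of the input item afterwards.
</p>

```cpp
#include <cstdio>

struct scan_result {
    int items;  // return value of sscanf: number of items assigned, or EOF
    int value;  // content of the input item after the call
};

// Scans a single int from text; the input item starts out as initial.
scan_result scan_int(const char* text, int initial) {
    int i = initial;
    int items = std::sscanf(text, "%d", &i);
    return {items, i};
}
```

<p>
With a matching failure (<code>scan_int("abc", 7)</code>) or an input failure
(<code>scan_int("", 7)</code>), the input item keeps its initial value; it changes only
when the conversion succeeds.
</p>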
<p>
The extraction of <code>bool</code> is another case outside the scope of
<code>scanf</code>, and yet it is handled consistently: the input item is left
intact even when a <code>long</code> is extracted successfully but the conversion from
<code>long</code> to <code>bool</code> fails.
</p>
<p>
The inconsistency is further aggravated by the fact that, when <code>failbit</code> is set,
subsequent extraction operations are no-ops until <code>failbit</code> is
explicitly cleared. Assuming that there is no explicit handling of
<code>rdstate()</code> (as in <code>cin&gt;&gt;i&gt;&gt;j</code>) it is
counter-intuitive to be able to extract an integer with mismatched digit
grouping, but to be unable to extract another, properly-formatted integer
that immediately follows.
</p>
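<p>
The chained case can be sketched as below, under the same assumptions as before: the facet
<code>comma_grouping</code>, the struct <code>chain_result</code> and the function
<code>extract_two</code> are hypothetical names used only for illustration. Both input
items start out as <code>-1</code>, so an item that still holds <code>-1</code> afterwards
was not assigned.
</p>

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: digits are grouped in threes, separated by ','.
struct comma_grouping : std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
};

struct chain_result {
    int first;    // first input item after the chained extraction
    int second;   // second input item after the chained extraction
    bool failed;  // true if failbit (or badbit) is set on the stream
};

// Extracts two ints in one chained expression, without checking the
// stream state in between.
chain_result extract_two(const std::string& text) {
    std::istringstream in(text);
    in.imbue(std::locale(in.getloc(), new comma_grouping));
    int i = -1, j = -1;
    in >> i >> j;
    return {i, j, in.fail()};
}
```

<p>
For an input such as <code>"1,2,3 456"</code>, the second item stays at <code>-1</code>
because the second extraction is a no-op once <code>failbit</code> is set; whether the
first item holds the parsed value is implementation-dependent.
</p>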
<p>
Moreover, setting <code>failbit</code> while selectively assigning a value to
the input item raises usability problems. Both the strategy of
<code>scanf</code> (no value is extracted in case of failure) and
the strategy of the <code>strtol</code> family (a value is always
extracted, with well-defined defaults in case of failure) are
easy to understand and easy to use. On the other hand, if <code>failbit</code>
alone cannot consistently distinguish between a failed extraction and a
successful but not-quite-correct extraction whose output happens to be the same
as the previous value, the programmer must resort to implementation tricks.
Consider the following example:
</p>
<pre>
    int i = old_i;
    cin &gt;&gt; i;
    if (cin.fail()) {
        // can the value of i be trusted?
        // what does it mean if i == old_i?
        // ...
    }
</pre>
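<p>
One possible workaround of this kind, shown here only as a sketch, is to extract into a
temporary and commit the value after the stream state has been checked. The function
<code>try_read_int</code> is a hypothetical helper, not something proposed by this issue.
</p>

```cpp
#include <istream>
#include <sstream>

// Hypothetical helper: reads an int into a temporary and assigns it to out
// only if the extraction succeeded, so out is never touched on failure.
bool try_read_int(std::istream& in, int& out) {
    int tmp = 0;
    if (in >> tmp) {
        out = tmp;
        return true;
    }
    return false;
}
```

<p>
With such a wrapper the caller's variable is left intact on any failure, regardless of what
the library does with the temporary.
</p>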
<p>
Last but not least, the current behavior is not only confusing to the casual
reader, but it has also been confusing to some book authors. Besides
Stroustrup's book, other books (e.g. "Standard C++ IOStreams and Locales" by
Langer and Kreft) make the same mistaken assumption. Although books
are not to be used instead of the standard reference, the readers of these
books, as well as people who are generally familiar with <code>scanf</code>,
are even more likely to misinterpret the standard and to expect the input items
to remain intact when a failure occurs.
</p>


<p id="res-662"><b>Proposed resolution:</b></p>

<p>
Change 28.3.4.3.2.3 <a href="https://wg21.link/facet.num.get.virtuals">[facet.num.get.virtuals]</a>:
</p>

<blockquote>
<p>
<b>Stage 3:</b> The result of stage 2 processing can be one of
</p>
<ul>
<li>A sequence of <code>chars</code> has been accumulated in stage 2 that is converted (according to the rules of <code>scanf</code>) to a value of the type of <code><i>val</i></code>.  <del>This value is stored in <code><i>val</i></code> and <code>ios_base::goodbit</code> is stored in <code><i>err</i></code>.</del></li>

<li>The sequence of <code>chars</code> accumulated in stage 2 would have caused <code>scanf</code> to report an input failure. <code>ios_base::failbit</code> is assigned to <code><i>err</i></code>.</li>
</ul>
<p>
<ins>In the first case,</ins> <del>D</del><ins>d</ins>igit grouping is checked.  That is, the positions of discarded separators is examined for consistency with <code>use_facet&lt;numpunct&lt;charT&gt; &gt;(<i>loc</i>).grouping()</code>.  If they are not consistent then <code>ios_base::failbit</code> is assigned to <code><i>err</i></code>.  <ins>Otherwise, the value that was converted in stage 2 is stored in <code><i>val</i></code> and <code>ios_base::goodbit</code> is stored in <code><i>err</i></code>.</ins>
</p>
</blockquote>
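<p>
The intent of the proposed wording can be modeled roughly as follows. This is a simplified
sketch, not library code: the function <code>stage3_proposed</code> and its parameters
<code>converted</code>, <code>grouping_ok</code> and <code>parsed</code> are hypothetical
names standing for the outcome of the conversion, the outcome of the grouping check and the
converted value.
</p>

```cpp
#include <ios>

// Hypothetical model of Stage 3 under the proposed wording: the value is
// stored only when the conversion succeeded and the grouping is consistent.
template <class T>
void stage3_proposed(bool converted, bool grouping_ok, const T& parsed,
                     T& val, std::ios_base::iostate& err) {
    if (converted && grouping_ok) {
        val = parsed;
        err = std::ios_base::goodbit;
    } else {
        // val is deliberately left untouched on any failure.
        err = std::ios_base::failbit;
    }
}
```

<p>
Under this model, <code>failbit</code> in <code><i>err</i></code> always means that
<code><i>val</i></code> was not modified.
</p>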


<p><b>Rationale:</b></p><p>
post-Toronto: Changed from New to NAD at the request of the author.  The preferred solution of
<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2327.pdf">N2327</a>
makes this resolution obsolete.
</p>




</body>
</html>
