<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Issue 2381: Inconsistency in parsing floating point numbers</title>
<meta property="og:title" content="Issue 2381: Inconsistency in parsing floating point numbers">
<meta property="og:description" content="C++ library issue. Status: C++23">
<meta property="og:url" content="https://cplusplus.github.io/LWG/issue2381.html">
<meta property="og:type" content="website">
<meta property="og:image" content="http://cplusplus.github.io/LWG/images/cpp_logo.png">
<meta property="og:image:alt" content="C++ logo">
<style>
  p {text-align:justify}
  li {text-align:justify}
  pre code.backtick::before { content: "`" }
  pre code.backtick::after { content: "`" }
  blockquote.note
  {
    background-color:#E0E0E0;
    padding-left: 15px;
    padding-right: 15px;
    padding-top: 1px;
    padding-bottom: 1px;
  }
  ins {background-color:#A0FFA0}
  del {background-color:#FFA0A0}
  table.issues-index { border: 1px solid; border-collapse: collapse; }
  table.issues-index th { text-align: center; padding: 4px; border: 1px solid; }
  table.issues-index td { padding: 4px; border: 1px solid; }
  table.issues-index td:nth-child(1) { text-align: right; }
  table.issues-index td:nth-child(2) { text-align: left; }
  table.issues-index td:nth-child(3) { text-align: left; }
  table.issues-index td:nth-child(4) { text-align: left; }
  table.issues-index td:nth-child(5) { text-align: center; }
  table.issues-index td:nth-child(6) { text-align: center; }
  table.issues-index td:nth-child(7) { text-align: left; }
  table.issues-index td:nth-child(5) span.no-pr { color: red; }
  @media (prefers-color-scheme: dark) {
     html {
        color: #ddd;
        background-color: black;
     }
     ins {
        background-color: #225522
     }
     del {
        background-color: #662222
     }
     a {
        color: #6af
     }
     a:visited {
        color: #6af
     }
     blockquote.note
     {
        background-color: rgba(255, 255, 255, .10)
     }
  }
</style>
</head>
<body>
<hr>
<p><em>This page is a snapshot from the LWG issues list; see the <a href="lwg-active.html">Library Active Issues List</a> for more information and the meaning of <a href="lwg-active.html#C++23">C++23</a> status.</em></p>
<h3 id="2381"><a href="lwg-defects.html#2381">2381</a>. Inconsistency in parsing floating point numbers</h3>
<p><b>Section:</b> 28.3.4.3.2.3 <a href="https://wg21.link/facet.num.get.virtuals">[facet.num.get.virtuals]</a> <b>Status:</b> <a href="lwg-active.html#C++23">C++23</a>
 <b>Submitter:</b> Marshall Clow <b>Opened:</b> 2014-04-30 <b>Last modified:</b> 2023-11-22</p>
<p><b>Priority: </b>2
</p>
<p><b>View other</b> <a href="lwg-index-open.html#facet.num.get.virtuals">active issues</a> in [facet.num.get.virtuals].</p>
<p><b>View all other</b> <a href="lwg-index.html#facet.num.get.virtuals">issues</a> in [facet.num.get.virtuals].</p>
<p><b>View all issues with</b> <a href="lwg-status.html#C++23">C++23</a> status.</p>
<p><b>Discussion:</b></p>
<p>
In 28.3.4.3.2.3 <a href="https://wg21.link/facet.num.get.virtuals">[facet.num.get.virtuals]</a> we have:
</p>
<blockquote><p>
Stage 3: The sequence of chars accumulated in stage 2 (the field) is converted to a numeric value by the
rules of one of the functions declared in the header <code>&lt;cstdlib&gt;</code>:
</p>
<ul>
<li><p>For a signed integer value, the function <code>strtoll</code>.</p></li>
<li><p>For an unsigned integer value, the function <code>strtoull</code>.</p></li>
<li><p>For a floating-point value, the function <code>strtold</code>.</p></li>
</ul>
</blockquote>
<p>
This implies that for many cases, this routine should return true:
</p>
<blockquote><pre>
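#include &lt;cmath&gt;    // std::isinf, std::isnan
#include &lt;cstdlib&gt;  // std::strtod
#include &lt;sstream&gt;  // std::stringstream
#include &lt;string&gt;   // std::string

bool is_same(const char* p)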
{
  std::string str{p};
  double val1 = std::strtod(str.c_str(), nullptr);
  std::stringstream ss(str);
  double val2;
  ss &gt;&gt; val2;
  return std::isinf(val1) == std::isinf(val2) &amp;&amp;                 // either they're both infinity
         std::isnan(val1) == std::isnan(val2) &amp;&amp;                 // or they're both NaN
         (std::isinf(val1) || std::isnan(val1) || val1 == val2); // or they're equal
}
</pre></blockquote>
<p>
and this is indeed true, for many strings:
</p>
<blockquote><pre>
assert(is_same("0"));
assert(is_same("1.0"));
assert(is_same("-1.0"));
assert(is_same("100.123"));
assert(is_same("1234.456e89"));
</pre></blockquote>
<p>
but not for others:
</p>
<blockquote><pre>
assert(is_same("0xABp-4")); // hex float
assert(is_same("inf"));
assert(is_same("+inf"));
assert(is_same("-inf"));
assert(is_same("nan"));
assert(is_same("+nan"));
assert(is_same("-nan"));

assert(is_same("infinity"));
assert(is_same("+infinity"));
assert(is_same("-infinity"));
</pre></blockquote>
<p>
These are all strings that are correctly parsed by <code>std::strtod</code>, but not by the stream extraction operators.
They contain characters that are deemed invalid in stage 2 of parsing.
<p/>
If we're going to say that we're converting by the rules of <code>strtold</code>, then we should accept all the things that
<code>strtold</code> accepts.
</p>
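
<p>The remark about Stage 2 can be illustrated with a small sketch. It assumes narrow characters and the "C" locale, so that widening is the identity, and it models only the <code>src</code> lookup. The helper <code>first_unmapped</code> and the names <code>old_src</code> and <code>new_src</code> are hypothetical; the two tables are the <code>src</code> string without and with <code>p</code>/<code>P</code>, as shown in the wording changes recorded on this page.</p>
<blockquote><pre>
#include &lt;cstddef&gt;
#include &lt;string&gt;

// The Stage 2 table without and with 'p'/'P' (assumed names, for illustration).
const std::string old_src = "0123456789abcdefxABCDEFX+-";
const std::string new_src = "0123456789abcdefpxABCDEFPX+-";

// Hypothetical helper: index of the first character of field that the src
// lookup cannot map (the lookup would yield the terminating '\0'), or
// field.size() if every character maps. The decimal point is handled
// separately in Stage 2, so '.' is treated as mappable here.
std::size_t first_unmapped(const std::string&amp; src, const std::string&amp; field)
{
  for (std::size_t i = 0; i &lt; field.size(); ++i)
    if (field[i] != '.' &amp;&amp; src.find(field[i]) == std::string::npos)
      return i;
  return field.size();
}
</pre></blockquote>
<p>Under these assumptions, <code>first_unmapped(old_src, "0xABp-4")</code> is 4 (the position of <code>p</code>), <code>first_unmapped(new_src, "0xABp-4")</code> is 7 (the whole string maps), and <code>"inf"</code> and <code>"nan"</code> stop at index 0 with either table.</p>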

<p><i>[2016-04, Issues Telecon]</i></p>

<p>
People are much more interested in round-tripping hex floats than handling <code>inf</code> and <code>nan</code>. Priority changed to P2.
</p>
<p>
Marshall says he'll try to write some wording, noting that this is a very closely specified part of the standard, and has remained unchanged for a long time. Also, there will need to be a sample implementation.
</p>

<p><i>[2016-08, Chicago]</i></p>

<p>Zhihao provides wording</p>
<p>The <code>src</code> array in Stage 2 does narrowing only. The actual
input validation is delegated to <code>strtold</code> (independently of
the conversion in Stage 3, which is also delegated
to <code>strtold</code>) by the following sentence:</p>

<p>  [...] If it is not discarded, then a check is made to determine
  if c is allowed as the next character of an input field of the
  conversion specifier returned by Stage 1.</p>

<p>So a conforming C++11 <code>num_get</code> is supposed to magically
accept a hexfloat without an exponent</p>

<p>  0x3.AB</p>

<p>because we refer to C99, and the fix for this issue should be
just to expand the <code>src</code> array.</p>

<p>Support for Infs and NaNs is not proposed because of the
complexity of nan(n-chars).</p>
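
<p>As a hedged illustration of the conversion side, the sketch below calls <code>std::strtod</code> directly (the <code>double</code> counterpart of <code>strtold</code>) and reports how many characters were consumed. It assumes a C99-conforming <code>strtod</code> that accepts hexadecimal floating-point input; <code>parsed</code> and <code>parse_with_strtod</code> are hypothetical names.</p>
<blockquote><pre>
#include &lt;cstddef&gt;
#include &lt;cstdlib&gt;

// Result of one conversion: the value and the number of characters used.
struct parsed
{
  double value;
  std::size_t consumed;
};

// Hypothetical helper: convert text by the rules of std::strtod.
parsed parse_with_strtod(const char* text)
{
  char* end = nullptr;
  double v = std::strtod(text, &amp;end);
  return {v, static_cast&lt;std::size_t&gt;(end - text)};
}
</pre></blockquote>
<p>On such an implementation, <code>"0x3.AB"</code> converts to 3.66796875 (3 + 171/256) with all six characters consumed. Every character of that field is already present in the unmodified <code>src</code> array; only an exponent such as the <code>p-4</code> in <code>"0xABp-4"</code> needs the added <code>p</code>.</p>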

<p><i>[2016-08, Chicago]</i></p>

<p>Tues PM: Move to Open</p>

<p><i>[2016-09-08, Zhihao Yuan comments and updates proposed wording]</i></p>

<p>
Examples added.
</p>

<p><i>[2018-08-23 Batavia Issues processing]</i></p>

<p>Needs an Annex C entry. Tim to write Annex C.</p>

<p><strong>Previous resolution [SUPERSEDED]:</strong></p>
<blockquote class="note">
<p>This wording is relative to N4606.</p>

<ol>
<li><p>Change 28.3.4.3.2.3 <a href="https://wg21.link/facet.num.get.virtuals">[facet.num.get.virtuals]</a>/3 Stage 2 as indicated:</p>

<blockquote>
<p><code>static const char src[] = "0123456789abcdef<ins>p</ins>xABCDEF<ins>P</ins>X+-";</code></p>
</blockquote>
</li>

<li><p>Append the following examples to 28.3.4.3.2.3 <a href="https://wg21.link/facet.num.get.virtuals">[facet.num.get.virtuals]</a>/3 Stage 2 as indicated:</p>
<blockquote>
<p>
<ins>[<i>Example:</i></ins>
</p>
<blockquote>
<p>
<ins>Given an input sequence of <code>"0x1a.bp+07p"</code>,</ins>
</p>
<ul>
<li><p><ins>if Stage 1 returns <code>%d</code>, <code>"0"</code> is accumulated;</ins></p></li>
<li><p><ins>if Stage 1 returns <code>%i</code>, <code>"0x1a"</code> are accumulated;</ins></p></li>
<li><p><ins>if Stage 1 returns <code>%g</code>, <code>"0x1a.bp+07"</code> are accumulated.</ins></p></li>
</ul>
<p>
<ins>In all cases, leaving the rest in the input.</ins>
</p>
</blockquote>
<p><ins>&mdash; end example]</ins></p>
</blockquote>
</li>
</ol>
</blockquote>

<p><i>[2021-05-18 Tim updates wording]</i></p>

<p>Based on the git history, libc++ appears to have always included
<code>p</code> and <code>P</code> in <code>src</code>.</p>

<p><i>[2021-09-20; Reflector poll]</i></p>

<p>
Set status to Tentatively Ready after eight votes in favour during reflector poll.
</p>

<p><i>[2021-10-14 Approved at October 2021 virtual plenary. Status changed: Voting &rarr; WP.]</i></p>



<p id="res-2381"><b>Proposed resolution:</b></p>
<p>This wording is relative to <a href="https://wg21.link/n4885">N4885</a>.</p>

<ol>
<li><p>Change 28.3.4.3.2.3 <a href="https://wg21.link/facet.num.get.virtuals">[facet.num.get.virtuals]</a>/3 Stage 2 as indicated:</p>

<blockquote>
<p>
&mdash; Stage 2:
<p/>
If <code>in == end</code> then stage 2 terminates. Otherwise a <code>charT</code>
is taken from <code>in</code> and local variables are initialized as if by
</p>
<blockquote>
<pre>
char_type ct = *in;
char c = src[find(atoms, atoms + sizeof(src) - 1, ct) - atoms];
if (ct == use_facet&lt;numpunct&lt;charT&gt;&gt;(loc).decimal_point())
c = '.';
bool discard =
  ct == use_facet&lt;numpunct&lt;charT&gt;&gt;(loc).thousands_sep()
  &amp;&amp; use_facet&lt;numpunct&lt;charT&gt;&gt;(loc).grouping().length() != 0;
</pre>
</blockquote>
<p>
where the values <code>src</code> and <code>atoms</code> are defined as if by:
</p>
<blockquote>
<pre>
static const char src[] = "0123456789abcdef<ins>p</ins>xABCDEF<ins>P</ins>X+-";
char_type atoms[sizeof(src)];
use_facet&lt;ctype&lt;charT&gt;&gt;(loc).widen(src, src + sizeof(src), atoms);
</pre>
</blockquote>
<p>
for this value of <code>loc</code>.
<p/>
If <code>discard</code> is true, then if <code>'.'</code> has not yet been accumulated,
then the position of the character is remembered, but the character is otherwise
ignored. Otherwise, if <code>'.'</code> has already been accumulated, the character
is discarded and Stage 2 terminates. If it is not discarded, then a check is
made to determine if <code>c</code> is allowed as the next character of an input
field of the conversion specifier returned by Stage 1. If so, it is accumulated.
<p/>
If the character is either discarded or accumulated then <code>in</code> is advanced
by <code>++in</code> and processing returns to the beginning of stage 2.
<p/>
<ins>[<i>Example:</i></ins>
</p>
<blockquote>
<p>
<ins>Given an input sequence of <code>"0x1a.bp+07p"</code>,</ins>
</p>
<ul>
<li><p><ins>if the conversion specifier returned by Stage 1 is <code>%d</code>, <code>"0"</code> is accumulated;</ins></p></li>
<li><p><ins>if the conversion specifier returned by Stage 1 is <code>%i</code>, <code>"0x1a"</code> are accumulated;</ins></p></li>
<li><p><ins>if the conversion specifier returned by Stage 1 is <code>%g</code>, <code>"0x1a.bp+07"</code> are accumulated.</ins></p></li>
</ul>
<p>
<ins>In all cases, the remainder is left in the input.</ins>
</p>
</blockquote>
<p><ins>&mdash; end example]</ins></p>
</blockquote>
</li>
<li>
<p>Add the following new subclause to C.6 <a href="https://wg21.link/diff.cpp03">[diff.cpp03]</a>:</p>
<blockquote>
<p>
<ins><b>C.4.? [locale]: localization library [diff.cpp03.locale]</b></ins>
<p/>
<ins><b>Affected subclause:</b> 28.3.4.3.2.3 <a href="https://wg21.link/facet.num.get.virtuals">[facet.num.get.virtuals]</a><br/>
<b>Change:</b> The <code>num_get</code> facet recognizes hexadecimal floating point values.<br/>
<b>Rationale:</b> Required by new feature.<br/>
<b>Effect on original feature:</b> Valid C++2003 code may have different behavior in this
revision of C++.
</ins>
</p>
</blockquote>
</li>
</ol>
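
<p>The Stage 2 example in the wording can be reproduced with a simplified, non-normative model. The sketch assumes narrow characters, the "C" locale, and no digit grouping, and its grammar check ignores <code>inf</code> and <code>nan</code>. All names in it (<code>src_before</code>, <code>src_after</code>, <code>is_field_prefix</code>, <code>accumulate</code>) are hypothetical and are not taken from any implementation.</p>
<blockquote><pre>
#include &lt;cstddef&gt;
#include &lt;string&gt;

// The src table without and with 'p'/'P' (assumed names, for illustration).
const std::string src_before = "0123456789abcdefxABCDEFX+-";
const std::string src_after  = "0123456789abcdefpxABCDEFPX+-";

bool is_dec(char c) { return c &gt;= '0' &amp;&amp; c &lt;= '9'; }
bool is_oct(char c) { return c &gt;= '0' &amp;&amp; c &lt;= '7'; }
bool is_hex(char c)
{
  return is_dec(c) || (c &gt;= 'a' &amp;&amp; c &lt;= 'f') || (c &gt;= 'A' &amp;&amp; c &lt;= 'F');
}

using pred_t = bool (*)(char);

// Advance i past a run of characters satisfying pred; return the run length.
std::size_t skip(const std::string&amp; s, std::size_t&amp; i, pred_t pred)
{
  const std::size_t start = i;
  while (i &lt; s.size() &amp;&amp; pred(s[i])) ++i;
  return i - start;
}

// Hypothetical check: is s a prefix of some input field for %d, %i or %g
// (spec is 'd', 'i' or 'g')? Simplified grammar, see the assumptions above.
bool is_field_prefix(char spec, const std::string&amp; s)
{
  const std::size_t n = s.size();
  std::size_t i = 0;
  if (i &lt; n &amp;&amp; (s[i] == '+' || s[i] == '-')) ++i;      // optional sign
  bool hex = false;
  if (spec != 'd' &amp;&amp; i + 1 &lt; n &amp;&amp; s[i] == '0' &amp;&amp;
      (s[i + 1] == 'x' || s[i + 1] == 'X')) {
    hex = true;                                         // "0x" prefix
    i += 2;
  }
  if (spec == 'd') {
    skip(s, i, is_dec);
  } else if (spec == 'i') {
    pred_t digit = is_dec;
    if (hex) digit = is_hex;
    else if (i &lt; n &amp;&amp; s[i] == '0') digit = is_oct;    // leading 0: octal
    skip(s, i, digit);
  } else {
    pred_t digit = hex ? is_hex : is_dec;
    std::size_t digits = skip(s, i, digit);             // integer part
    if (i &lt; n &amp;&amp; s[i] == '.') {                      // fractional part
      ++i;
      digits += skip(s, i, digit);
    }
    const char lo = hex ? 'p' : 'e';
    const char up = hex ? 'P' : 'E';
    if (digits &gt; 0 &amp;&amp; i &lt; n &amp;&amp; (s[i] == lo || s[i] == up)) {
      ++i;                                              // exponent marker
      if (i &lt; n &amp;&amp; (s[i] == '+' || s[i] == '-')) ++i;
      skip(s, i, is_dec);
    }
  }
  return i == n;
}

// Simplified Stage 2 loop: map each character through src, then accumulate
// it while the field stays a valid prefix for the conversion specifier.
std::string accumulate(const std::string&amp; src, char spec, const std::string&amp; input)
{
  std::string field;
  for (char c : input) {
    if (c != '.' &amp;&amp; src.find(c) == std::string::npos) break;  // maps to '\0'
    if (!is_field_prefix(spec, field + c)) break;
    field += c;
  }
  return field;
}
</pre></blockquote>
<p>With <code>src_after</code> and the input <code>"0x1a.bp+07p"</code>, this model accumulates <code>"0"</code> for <code>'d'</code>, <code>"0x1a"</code> for <code>'i'</code>, and <code>"0x1a.bp+07"</code> for <code>'g'</code>, matching the example. With <code>src_before</code>, the <code>'g'</code> case stops at <code>"0x1a.b"</code> because <code>p</code> cannot be mapped.</p>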





</body>
</html>
