<html>
<head>
<title>P0682R0: Repairing elementary string conversions (LWG 2955)</title>

<style type="text/css">
  ins { text-decoration:none; font-weight:bold; background-color:#A0FFA0 }
  .new { text-decoration:none; font-weight:bold; background-color:#D0FFD0 }
  del { text-decoration:line-through; background-color:#FFA0A0 }  
  strong { font-weight: inherit; color: #2020ff }
  table, td, th { border: 1px solid black; border-collapse:collapse; padding: 5px }
</style>
</head>

<body>
ISO/IEC JTC1 SC22 WG21 P0682R0<br/>
Jens Maurer &lt;Jens.Maurer@gmx.net&gt;<br/>
Target audience: LEWG and LWG<br/>
2017-06-19<br/>

<h1>P0682R0: Repairing elementary string conversions</h1>

<h2>Introduction</h2>

<p>

This paper proposes a few repairs for the elementary string
conversions introduced into C++17 via P0067R5 "Elementary string
conversions, revision 5".  In particular, the following issues were
raised in e-mail discussions:

<ul>
  <li>Move the functions into a separate header; do not cram them into &lt;utility&gt; (suggestion by Zhihao Yuan: &lt;textproc&gt;)</li>
  <li>Depending on std::error_code may cause an indirect dependency on std::string; use std::errc instead (LWG 2955).</li>
  <li>Clarify the behavior when parsing "-" as a signed number.  The
    outcome should be errc::invalid_argument.</li>
  <li>Rounding behavior for floating-point to_chars / from_chars (Edward Catmur via std-discussion)</li>
</ul>

This paper should be treated as a Defect Report against C++17.

<h2>Discussion</h2>

<h3>Separate header</h3>

The &lt;utility&gt; header contains a number of unrelated tools; its
scope should not be expanded even further.  Alternative candidates
among existing headers would be:

<ul>
  <li>&lt;string&gt; &mdash; but to_chars / from_chars don't use std::string</li>
  <li>&lt;cstring&gt; &mdash; but adding C++ names to headers adapted from C is probably unwise</li>
  <li>&lt;cstdlib&gt; &mdash; but adding C++ names to headers adapted from C is probably unwise</li>
</ul>

So, assuming consensus to create a new header is achieved, this boils
down to a naming discussion for which I suggest the "expert champion"
approach previously discussed.  Candidate names are:

<ul>
  <li>&lt;textproc&gt; (Zhihao Yuan) &mdash; but there is not a lot of text processed</li>
  <li>&lt;textconv&gt; &mdash; but might be confused with encoding conversions</li>
  <li>&lt;charconv&gt; &mdash; but more than one character is converted</li>
  <li>&lt;stringconv&gt; &mdash; but might be confused with encoding conversions</li>
  <li>&lt;fromto_chars&gt; &mdash; a bit long and artificial</li>
  <li>&lt;chars&gt;</li>
</ul>

<h3>Dependency on std::error_code (LWG 2955)</h3>

Raised by Billy Robert O'Neal III in
http://lists.isocpp.org/lib/2017/03/2400.php.
<p>

This dependency brings in functions returning
a <code>std::string</code>, which is at a higher level than the
functions to_chars and from_chars.  For a low-level facility which
might also be made available on freestanding environments, such a
dependency seems to violate boundaries between abstraction levels.
<p>
The recommendation is to directly use the std::errc enumeration
(22.5.1 [system_error.syn]).  However, the usage pattern of the
functions then deteriorates to something like:
<pre>
  if (auto [ptr, ec] = to_chars(p, last, 42); ec != std::errc()) {
    // failure case
  }
</pre>
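<p>
A slightly fuller sketch of that pattern follows.  It assumes the
header is named &lt;charconv&gt; (one of the candidates listed above)
and that <code>ec</code> has type <code>std::errc</code> as proposed
here; the helper <code>write_decimal</code> is hypothetical and exists
only for illustration.

```cpp
#include <charconv>
#include <cstddef>
#include <system_error>

// Hypothetical helper (not part of the proposal): writes `value` in
// decimal into [first, last) and returns the number of characters
// written, or 0 if to_chars reports a failure.
std::size_t write_decimal(char* first, char* last, int value) {
    const std::to_chars_result r = std::to_chars(first, last, value);
    if (r.ec != std::errc()) {
        // Failure case, e.g. errc::value_too_large when the buffer
        // is too small to hold the digits.
        return 0;
    }
    return static_cast<std::size_t>(r.ptr - first);
}
```

Note that success is detected by comparing against a value-initialized
<code>std::errc</code>, since an enumeration has no conversion to bool.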

<h3>Clarify the behavior when parsing "-" as a signed number</h3>

This is a near-editorial clarification.
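<p>
The intended outcome can be sketched as follows, again assuming the
header name &lt;charconv&gt; and an <code>ec</code> member of type
<code>std::errc</code>.  The wrapper <code>parse_int</code> is a
hypothetical name used only for this illustration.

```cpp
#include <charconv>
#include <system_error>

// Hypothetical wrapper (not part of the proposal): parses a decimal
// int from [first, last) and returns only the error condition.
std::errc parse_int(const char* first, const char* last, int& value) {
    const std::from_chars_result r = std::from_chars(first, last, value);
    return r.ec;
}
```

With this wrapper, parsing the one-character string "-" is expected to
yield <code>std::errc::invalid_argument</code> and leave
<code>value</code> unmodified, whereas "-12" is expected to succeed
and store -12.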

<h3>Rounding behavior for floating-point to_chars / from_chars</h3>

This topic only concerns "shortest string representation" conversions,
i.e. those without a precision.  The round-trip requirements already
remove some freedom in this area.
<p>
"For example 0x1.0000000000001p0 is approx. 1.000000000000000222045,
so is 1.0000000000000003 an acceptable output from to_chars, or only
1.0000000000000002?" The suggested resolution is to prefer the output
that has the smallest difference to the original floating-point value.
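<p>
A minimal sketch of the case in question, assuming the header name
&lt;charconv&gt;, an <code>ec</code> member of type
<code>std::errc</code>, and an implementation that provides the
floating-point overload of to_chars.  The helper
<code>shortest_repr</code> is hypothetical, and it returns a
<code>std::string</code> purely for convenience of the example.

```cpp
#include <charconv>
#include <string>
#include <system_error>

// Hypothetical helper (not part of the proposal): returns the
// "shortest string representation" of `value`, i.e. the result of
// to_chars without a precision, or an empty string on failure.
std::string shortest_repr(double value) {
    char buf[32];  // ample room for the shortest form of any double
    const std::to_chars_result r = std::to_chars(buf, buf + sizeof buf, value);
    if (r.ec != std::errc()) {
        return std::string();
    }
    return std::string(buf, r.ptr);
}
```

Under the suggested resolution,
<code>shortest_repr(0x1.0000000000001p0)</code> would be
"1.0000000000000002": both that string and "1.0000000000000003" have
the minimal length and round-trip, but the former is closer to the
original value.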
<p>
Should the current rounding mode be considered in from_chars? For
example, 1e23 might be read as
0x1.52d02c7e14af<span style="color:red">6</span>p76 or as
0x1.52d02c7e14af<span style="color:red">7</span>p76.  If the current
rounding mode is, in fact, considered, the follow-on question is
whether the round-trip guarantee only applies when both to_chars and
from_chars operate under the same rounding mode, or regardless of
rounding mode.  For applications such as interval arithmetic, it seems
worthwhile if from_chars does consider the rounding mode, but to_chars
should certainly not be forced to output a long digit string for 1e23 to
disambiguate.


<h2>Wording changes</h2>

Change in 23.2.1 [utility.syn]:

<blockquote>
  <pre>
  struct to_chars_result {
    char* ptr;
    <del>error_code</del><ins>errc</ins> ec;
  };
[...]
  struct from_chars_result {
    const char* ptr;
    <del>error_code</del><ins>errc</ins> ec;
  };
  </pre>
</blockquote>

Change in 23.2.8 [utility.to.chars] paragraphs 1 and 2:

<blockquote>
... If the member ec of the return value is such that the value<del>,
when converted to bool, is false</del> <ins>is equal to the value of a
value-initialized <code>errc</code></ins>, the conversion was
successful and the member ptr is the one-past-the-end pointer of the
characters written. ...
<p>
The functions that take a floating-point value but not a precision
parameter ensure that the string representation consists of the
smallest number of characters such that there is at least one digit
before the radix point (if present) and parsing the representation
using the corresponding from_chars function <ins>using the same
rounding mode (29.4 [cfenv])</ins> recovers <code>value</code>
exactly. <ins>If there are several such representations, the
representation with the smallest difference to the floating-point
argument value is chosen, resolving any remaining ties using the
current rounding mode.</ins> [ Note: ... ]
</blockquote>

Change in 23.2.9 [utility.from.chars] paragraph 1:

<blockquote>
All functions named from_chars analyze the string [first, last) for a
pattern, where [first, last) is required to be a valid range. If no
characters match the pattern, <code>value</code> is unmodified, the
member <code>ptr</code> of the return value is <code>first</code> and
the member <code>ec</code> is equal to errc::invalid_argument. <ins>[
Note: If the pattern allows for an optional sign, but the string has
no digit characters following the sign, no characters match the
pattern. ]</ins> ... Otherwise, <code>value</code> is set to the
parsed value<ins>, after rounding according to the current rounding
mode (29.4 [cfenv]),</ins> and the member <code>ec</code> is <del>set
such that the conversion to bool yields
false</del> <ins>value-initialized</ins>.

</blockquote>

</body>
</html>
