<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Issue 2854: wstring_convert provides no indication of incomplete input or output</title>
<meta property="og:title" content="Issue 2854: wstring_convert provides no indication of incomplete input or output">
<meta property="og:description" content="C++ library issue. Status: NAD">
<meta property="og:url" content="https://cplusplus.github.io/LWG/issue2854.html">
<meta property="og:type" content="website">
<meta property="og:image" content="http://cplusplus.github.io/LWG/images/cpp_logo.png">
<meta property="og:image:alt" content="C++ logo">
<style>
  p {text-align:justify}
  li {text-align:justify}
  pre code.backtick::before { content: "`" }
  pre code.backtick::after { content: "`" }
  blockquote.note
  {
    background-color:#E0E0E0;
    padding-left: 15px;
    padding-right: 15px;
    padding-top: 1px;
    padding-bottom: 1px;
  }
  ins {background-color:#A0FFA0}
  del {background-color:#FFA0A0}
  table.issues-index { border: 1px solid; border-collapse: collapse; }
  table.issues-index th { text-align: center; padding: 4px; border: 1px solid; }
  table.issues-index td { padding: 4px; border: 1px solid; }
  table.issues-index td:nth-child(1) { text-align: right; }
  table.issues-index td:nth-child(2) { text-align: left; }
  table.issues-index td:nth-child(3) { text-align: left; }
  table.issues-index td:nth-child(4) { text-align: left; }
  table.issues-index td:nth-child(5) { text-align: center; }
  table.issues-index td:nth-child(6) { text-align: center; }
  table.issues-index td:nth-child(7) { text-align: left; }
  table.issues-index td:nth-child(5) span.no-pr { color: red; }
  @media (prefers-color-scheme: dark) {
     html {
        color: #ddd;
        background-color: black;
     }
     ins {
        background-color: #225522
     }
     del {
        background-color: #662222
     }
     a {
        color: #6af
     }
     a:visited {
        color: #6af
     }
     blockquote.note
     {
        background-color: rgba(255, 255, 255, .10)
     }
  }
</style>
</head>
<body>
<hr>
<p><em>This page is a snapshot from the LWG issues list, see the <a href="lwg-active.html">Library Active Issues List</a> for more information and the meaning of <a href="lwg-active.html#NAD">NAD</a> status.</em></p>
<h3 id="2854"><a href="lwg-closed.html#2854">2854</a>. <code>wstring_convert</code> provides no indication of incomplete input or output</h3>
<p><b>Section:</b> 99 [depr.conversions.string] <b>Status:</b> <a href="lwg-active.html#NAD">NAD</a>
 <b>Submitter:</b> PowerGamer <b>Opened:</b> 2017-01-08 <b>Last modified:</b> 2017-06-05</p>
<p><b>Priority: </b>3
</p>
<p><b>View other</b> <a href="lwg-index-open.html#depr.conversions.string">active issues</a> in [depr.conversions.string].</p>
<p><b>View all other</b> <a href="lwg-index.html#depr.conversions.string">issues</a> in [depr.conversions.string].</p>
<p><b>View all issues with</b> <a href="lwg-status.html#NAD">NAD</a> status.</p>
<p><b>Discussion:</b></p>
<p>
Example:
</p>
<blockquote><pre>
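#include &lt;codecvt&gt; // std::codecvt_utf8_utf16
#include &lt;locale&gt;  // std::wstring_convert

// The input UTF-16 string is incomplete: it holds only the first half
// of the UTF-16 surrogate pair L"\xD843\xDEF9":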
wchar_t in_utf16[] = L"\xD843";

std::wstring_convert&lt;std::codecvt_utf8_utf16&lt;wchar_t&gt;&gt; cvt;
auto out_utf8 = cvt.to_bytes(in_utf16); // No error.
</pre></blockquote>
<p>
There is no indication that the input was incomplete (the value returned
by <code>cvt.state()</code> is not documented and so cannot be examined by the user for
that purpose). As such, the user will not know that more input data
should be provided in an additional call to <code>cvt.to_bytes()</code>.
<p/>
The output can be incomplete too: the MSVC 2017 implementation (which, as
far as I can tell, is standard-conforming) produces <code>"\xF0"</code> in <code>out_utf8</code>.
Again, <code>std::wstring_convert</code> gives no indication that the output
it produced is incomplete.
<p/>
IMO this makes <code>std::wstring_convert</code> in its current state completely
useless (it cannot be relied upon either to produce a complete and valid
UTF sequence or to throw an error in all situations).
<p/>
Imagine a file that contains UTF-16 encoded text. You want to read all the data
from the file at once and convert it into UTF-8 using
<code>std::wstring_convert&lt;std::codecvt_utf8_utf16&lt;wchar_t&gt;&gt;</code>.
<p/>
Now, if the file contains completely <em>invalid</em> UTF-16 (for example,
forbidden or incorrectly encoded Unicode code points) you will get an
exception from <code>std::wstring_convert&lt;std::codecvt_utf8_utf16&lt;wchar_t&gt;&gt;</code>.
<p/>
But if the file contains <em>incomplete</em> (but in all other regards <em>valid</em>)
UTF-16 (for example, the file ends with only the first half of a valid surrogate
pair) you will <em>neither</em> get an exception from
<code>std::wstring_convert&lt;std::codecvt_utf8_utf16&lt;wchar_t&gt;&gt;</code> <em>nor</em> any
indication that the input provided to
<code>std::wstring_convert&lt;std::codecvt_utf8_utf16&lt;wchar_t&gt;&gt;</code> was incomplete.
</p>
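<p>
Since the converter itself does not report an incomplete trailing surrogate
pair, one possible workaround is to check the input before converting it. The
following is only a minimal sketch of such a check, not part of the issue or of
any proposed resolution. The helper names <code>is_high_surrogate</code>,
<code>ends_with_unpaired_high_surrogate</code> and <code>complete_prefix_length</code>
are hypothetical, and the sketch assumes the text is held as
<code>char16_t</code> code units so that it does not depend on the width of
<code>wchar_t</code>. It detects only a high surrogate at the very end of the
input and does not validate the rest of the sequence.
</p>

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration; none of these names exist in the
// standard library.

// True if `c` is a UTF-16 high (leading) surrogate, U+D800..U+DBFF.
inline bool is_high_surrogate(char16_t c)
{
    return c >= 0xD800 && c <= 0xDBFF;
}

// True if the last code unit is a high surrogate with no low surrogate
// after it, i.e. the input stops in the middle of a surrogate pair.
inline bool ends_with_unpaired_high_surrogate(const std::u16string& s)
{
    return !s.empty() && is_high_surrogate(s.back());
}

// Number of leading code units that can be converted now. A trailing
// unpaired high surrogate is excluded so that the caller can prepend it
// to the next chunk of input instead of converting half a pair.
inline std::size_t complete_prefix_length(const std::u16string& s)
{
    return ends_with_unpaired_high_surrogate(s) ? s.size() - 1 : s.size();
}
```

<p>
With the input from the example above, <code>complete_prefix_length</code>
would report that none of the single code unit <code>0xD843</code> should be
converted yet, whereas for the full pair <code>0xD843 0xDEF9</code> it would
report both code units as convertible.
</p>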

<p><i>[2017-01-27 Telecon]</i></p>

<p>Priority 3; send to LEWG</p>

<p><i>[2017-02 in Kona, LEWG recommends NAD]</i></p>


<p><i>[2017-06-02 Issues Telecon]</i></p>

<p>This facility has a number of known problems, including poor error handling.
The feature has been deprecated, and the plan is to replace it with better
facilities with a better API.</p>
<p>Resolve as NAD</p>


<p id="res-2854"><b>Proposed resolution:</b></p>





</body>
</html>
