<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Issue 382: codecvt do_in/out result</title>
<meta property="og:title" content="Issue 382: codecvt do_in/out result">
<meta property="og:description" content="C++ library issue. Status: NAD">
<meta property="og:url" content="https://cplusplus.github.io/LWG/issue382.html">
<meta property="og:type" content="website">
<meta property="og:image" content="http://cplusplus.github.io/LWG/images/cpp_logo.png">
<meta property="og:image:alt" content="C++ logo">
<style>
  p {text-align:justify}
  li {text-align:justify}
  pre code.backtick::before { content: "`" }
  pre code.backtick::after { content: "`" }
  blockquote.note
  {
    background-color:#E0E0E0;
    padding-left: 15px;
    padding-right: 15px;
    padding-top: 1px;
    padding-bottom: 1px;
  }
  ins {background-color:#A0FFA0}
  del {background-color:#FFA0A0}
  table.issues-index { border: 1px solid; border-collapse: collapse; }
  table.issues-index th { text-align: center; padding: 4px; border: 1px solid; }
  table.issues-index td { padding: 4px; border: 1px solid; }
  table.issues-index td:nth-child(1) { text-align: right; }
  table.issues-index td:nth-child(2) { text-align: left; }
  table.issues-index td:nth-child(3) { text-align: left; }
  table.issues-index td:nth-child(4) { text-align: left; }
  table.issues-index td:nth-child(5) { text-align: center; }
  table.issues-index td:nth-child(6) { text-align: center; }
  table.issues-index td:nth-child(7) { text-align: left; }
  table.issues-index td:nth-child(5) span.no-pr { color: red; }
  @media (prefers-color-scheme: dark) {
     html {
        color: #ddd;
        background-color: black;
     }
     ins {
        background-color: #225522
     }
     del {
        background-color: #662222
     }
     a {
        color: #6af
     }
     a:visited {
        color: #6af
     }
     blockquote.note
     {
        background-color: rgba(255, 255, 255, .10)
     }
  }
</style>
</head>
<body>
<hr>
<p><em>This page is a snapshot from the LWG issues list, see the <a href="lwg-active.html">Library Active Issues List</a> for more information and the meaning of <a href="lwg-active.html#NAD">NAD</a> status.</em></p>
<h3 id="382"><a href="lwg-closed.html#382">382</a>. codecvt do_in/out result</h3>
<p><b>Section:</b> 28.3.4.2.5 <a href="https://wg21.link/locale.codecvt">[locale.codecvt]</a> <b>Status:</b> <a href="lwg-active.html#NAD">NAD</a>
 <b>Submitter:</b> Martin Sebor <b>Opened:</b> 2002-08-30 <b>Last modified:</b> 2016-01-28</p>
<p><b>Priority: </b>Not Prioritized
</p>
<p><b>View all other</b> <a href="lwg-index.html#locale.codecvt">issues</a> in [locale.codecvt].</p>
<p><b>View all issues with</b> <a href="lwg-status.html#NAD">NAD</a> status.</p>
<p><b>Discussion:</b></p>
<p>
It seems that the descriptions of codecvt do_in() and do_out() leave
sufficient room for interpretation so that two implementations of
codecvt may not work correctly with the same filebuf. Specifically,
the following seems less than adequately specified:
</p>

<ol>
<li>
  the conditions under which the functions terminate
</li>
<li>
  precisely when the functions return ok
</li>
<li>
  precisely when the functions return partial
</li>
<li>
  the full set of conditions when the functions return error
</li>
</ol>

<ol>
<li>
   28.3.4.2.5.3 <a href="https://wg21.link/locale.codecvt.virtuals">[locale.codecvt.virtuals]</a>, p2 says this about the effects of the
   function: ...Stops if it encounters a character it cannot
   convert...  This assumes that there *is* a character to
   convert. What happens when there is a sequence that doesn't form a
   valid source character, such as an unassigned or invalid UNICODE
   character, or a sequence that cannot possibly form a character
   (e.g., the sequence "\xc0\xff" in UTF-8)?
</li>
<li>
   Table 53 says that the function returns codecvt_base::ok
   to indicate that the function(s) "completed the conversion."
   Suppose that the source sequence is "\xc0\x80" in UTF-8,
   with from pointing to '\xc0' and (from_end==from + 1).
   It is not clear whether the return value should be ok
   or partial (see below).
</li>
<li>
   Table 53 says that the function returns codecvt_base::partial
   if "not all source characters converted." With the from pointers
   set up the same way as above, it is not clear whether the return
   value should be partial or ok (see above).
</li>
<li>
   Table 53, in the row describing the meaning of error mistakenly
   refers to a "from_type" character, without the symbol from_type
   having been defined. Most likely, the word "source" character
   is intended, although that is not sufficient. The functions
   may also fail when they encounter an invalid source sequence
   that cannot possibly form a valid source character (e.g., as
   explained in bullet 1 above).
</li>
</ol>
<p>
Finally, the conditions described at the end of 28.3.4.2.5.3 <a href="https://wg21.link/locale.codecvt.virtuals">[locale.codecvt.virtuals]</a>, p4 don't seem to be possible:
</p>
<blockquote><p>
    "A return value of partial, if (from_next == from_end),
    indicates that either the destination sequence has not
    absorbed all the available destination elements, or that
    additional source elements are needed before another
    destination element can be produced."
</p></blockquote>
<p>
If the value is partial, it's not clear to me that (from_next
==from_end) could ever hold if there isn't enough room
in the destination buffer. In order for (from_next==from_end) to
hold, all characters in that range must have been successfully
converted (according to 28.3.4.2.5.3 <a href="https://wg21.link/locale.codecvt.virtuals">[locale.codecvt.virtuals]</a>, p2) and since there are no
further source characters to convert, no more room in the
destination buffer can be needed.
</p>
<p>
It's also not clear to me that (from_next==from_end) could ever
hold if additional source elements are needed to produce another
destination character (not element as incorrectly stated in the
text). partial is returned if "not all source characters have
been converted" according to Table 53, which also implies that
(from_next==from) does NOT hold.
</p>
<p>
Could it be that the intended qualifying condition was actually
(from_next != from_end), i.e., that the sentence was supposed
to read
</p>
<blockquote><p>
    "A return value of partial, if (from_next != from_end),..."
</p></blockquote>
<p>
which would make perfect sense, since, as far as I understand it,
partial can only occur if (from_next != from_end)?
</p>
<p><i>[Lillehammer: Defer for the moment, but this really needs to be
  fixed. Right now, the description of codecvt is too vague for it to
  be a useful contract between providers and clients of codecvt
  facets.  (Note that both vendors and users can be both providers and
  clients of codecvt facets.) The major philosophical issue is whether
  the standard should only describe mappings that take a single wide
  character to multiple narrow characters (and vice versa), or whether
  it should describe fully general N-to-M conversions. When the
  original standard was written only the former was contemplated, but
  today, in light of the popularity of utf8 and utf16, that doesn't
  seem sufficient for C++0x. Bill supports general N-to-M conversions;
  we need to make sure Martin and Howard agree.]</i></p>


<p><i>[
2009-07 Frankfurt
]</i></p>


<blockquote>
<p>
codecvt is meant to be a 1-to-N to N-to-1 conversion. It does not work
well for N-to-M conversions. wbuffer_convert now exists, and handles
N-to-M cases. Also, there is a new specialization of codecvt that
permits UTF-16 &lt;-&gt; UTF-8 conversions.
</p>
<p>
NAD without prejudice. Will reopen if proposed resolution is supplied.
</p>
</blockquote>



<p id="res-382"><b>Proposed resolution:</b></p>




</body>
</html>
