<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Issue 3767: codecvt&lt;charN_t, char8_t, mbstate_t&gt; incorrectly added to locale</title>
<meta property="og:title" content="Issue 3767: codecvt&lt;charN_t, char8_t, mbstate_t&gt; incorrectly added to locale">
<meta property="og:description" content="C++ library issue. Status: WP">
<meta property="og:url" content="https://cplusplus.github.io/LWG/issue3767.html">
<meta property="og:type" content="website">
<meta property="og:image" content="http://cplusplus.github.io/LWG/images/cpp_logo.png">
<meta property="og:image:alt" content="C++ logo">
<style>
  p {text-align:justify}
  li {text-align:justify}
  pre code.backtick::before { content: "`" }
  pre code.backtick::after { content: "`" }
  blockquote.note
  {
    background-color:#E0E0E0;
    padding-left: 15px;
    padding-right: 15px;
    padding-top: 1px;
    padding-bottom: 1px;
  }
  ins {background-color:#A0FFA0}
  del {background-color:#FFA0A0}
  table.issues-index { border: 1px solid; border-collapse: collapse; }
  table.issues-index th { text-align: center; padding: 4px; border: 1px solid; }
  table.issues-index td { padding: 4px; border: 1px solid; }
  table.issues-index td:nth-child(1) { text-align: right; }
  table.issues-index td:nth-child(2) { text-align: left; }
  table.issues-index td:nth-child(3) { text-align: left; }
  table.issues-index td:nth-child(4) { text-align: left; }
  table.issues-index td:nth-child(5) { text-align: center; }
  table.issues-index td:nth-child(6) { text-align: center; }
  table.issues-index td:nth-child(7) { text-align: left; }
  table.issues-index td:nth-child(5) span.no-pr { color: red; }
  @media (prefers-color-scheme: dark) {
     html {
        color: #ddd;
        background-color: black;
     }
     ins {
        background-color: #225522
     }
     del {
        background-color: #662222
     }
     a {
        color: #6af
     }
     a:visited {
        color: #6af
     }
     blockquote.note
     {
        background-color: rgba(255, 255, 255, .10)
     }
  }
</style>
</head>
<body>
<hr>
<p><em>This page is a snapshot from the LWG issues list; see the <a href="lwg-active.html">Library Active Issues List</a> for more information and for the meaning of <a href="lwg-active.html#WP">WP</a> status.</em></p>
<h3 id="3767"><a href="lwg-defects.html#3767">3767</a>. <code>codecvt&lt;char<i>N</i>_t, char8_t, mbstate_t&gt;</code> incorrectly added to locale</h3>
<p><b>Section:</b> 28.3.3.1.2.1 <a href="https://wg21.link/locale.category">[locale.category]</a>, 28.3.4.2.5.1 <a href="https://wg21.link/locale.codecvt.general">[locale.codecvt.general]</a> <b>Status:</b> <a href="lwg-active.html#WP">WP</a>
 <b>Submitter:</b> Victor Zverovich <b>Opened:</b> 2022-09-05 <b>Last modified:</b> 2024-04-02</p>
<p><b>Priority: </b>3
</p>
<p><b>View all other</b> <a href="lwg-index.html#locale.category">issues</a> in [locale.category].</p>
<p><b>View all issues with</b> <a href="lwg-status.html#WP">WP</a> status.</p>
<p><b>Discussion:</b></p>
<p>
Table [tab:locale.category.facets] includes the following two facets:
</p>
<ul>
<li><p><code>codecvt&lt;char16_t, char8_t, mbstate_t&gt;</code></p></li>
<li><p><code>codecvt&lt;char32_t, char8_t, mbstate_t&gt;</code></p></li>
</ul>
<p>
However, neither of these has anything to do with a locale, so it makes no sense 
to register them dynamically with <code>std::locale</code>. 
Instead, they convert between fixed encodings (UTF-8, UTF-16, UTF-32) 
that are unrelated to locale encodings, except that they may coincide with 
the encodings of some locales by accident.
<p/>
The issue was introduced when adding <code>codecvt&lt;char[16|32]_t, char, mbstate_t&gt;</code> in 
<a href="https://wg21.link/N2035" title=" Minimal Unicode support for the standard library">N2035</a>, which gave no design rationale for using <code>codecvt</code> in the first 
place. It was likely trying to make a minimal set of changes and copied the wording for 
<code>codecvt&lt;wchar_t, char, mbstate_t&gt;</code>, but unfortunately didn't consider the encoding implications.
<p/>
<a href="https://wg21.link/P0482" title=" char8_t: A type for UTF-8 characters and strings (Revision 6)">P0482</a> changed <code>char</code> to <code>char8_t</code> in these facets, which 
made the issue more glaring, but unfortunately, despite being a breaking change, it failed to address it.
<p/>
Apart from being an obvious design mistake, this also adds a small overhead to every locale 
construction, because the implementation has to copy these pseudo-facets for no good 
reason, violating the "don't pay for what you don't use" principle.
<p/>
A simple fix is to remove the two facets from table [tab:locale.category.facets] and make them 
directly constructible.
</p>
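<p>
As a minimal sketch of what the table entry guarantees, the check below asks an arbitrary locale 
whether it carries the UTF-16 facet. The names <code>utf8_unit</code>, <code>utf16_codecvt</code> and 
<code>carries_utf16_codecvt</code> are hypothetical and exist only for this illustration; the alias 
falls back to <code>char</code> (the older facet) when the compiler does not provide <code>char8_t</code>.
</p>

```cpp
#include <cwchar>
#include <locale>

// Hypothetical alias: use char8_t where available, otherwise fall back to
// char so the sketch also builds before C++20.
#if defined(__cpp_char8_t)
using utf8_unit = char8_t;
#else
using utf8_unit = char;
#endif

// Hypothetical alias for the facet discussed in this issue.
using utf16_codecvt = std::codecvt<char16_t, utf8_unit, std::mbstate_t>;

// Reports whether the given locale carries the UTF-8/UTF-16 conversion facet.
inline bool carries_utf16_codecvt(const std::locale& loc) {
    return std::has_facet<utf16_codecvt>(loc);
}
```

<p>
While the facets are listed in the table, the check is expected to succeed for 
<code>std::locale::classic()</code> and for the global locale alike, whatever encoding either one uses.
</p>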

<p><i>[2022-09-23; Reflector poll]</i></p>

<p>
Set priority to 3 after reflector poll. Send to SG16 (then maybe LEWG).
</p>

<p><i>[2022-09-28; SG16 responds]</i></p>

<p>
SG16 agrees that the codecvt facets mentioned in LWG3767
"<code>codecvt&lt;char<i>N</i>_t, char8_t, mbstate_t&gt;</code>
incorrectly added to locale" are intended to be invariant
with respect to locale. Unanimously in favor.
</p>

<p><i>[Issaquah 2023-02-10; LWG issue processing]</i></p>

<p>
Removing these breaks most code using them today, because the most obvious
way to use them is via <code>use_facet</code> on a locale, which would throw
if they're removed (and because they were guaranteed to be present, code
using them might not have bothered to check for them using <code>has_facet</code>).
Instead of removing them, deprecate the guarantee that they're always present
(so move them to D.19 <a href="https://wg21.link/depr.locale.category">[depr.locale.category]</a>).
Don't bother changing the destructor.
Victor to update wording.
</p>

<p><strong>Previous resolution [SUPERSEDED]:</strong></p>
<blockquote class="note">

<p>
This wording is relative to <a href="https://wg21.link/N4917" title=" Working Draft, Standard for Programming Language C++">N4917</a>.
</p>

<ol>

<li><p>Modify 28.3.3.1.2.1 <a href="https://wg21.link/locale.category">[locale.category]</a>, Table 105 ([tab:locale.category.facets]) &mdash; 
"Locale category facets" &mdash; as indicated:</p>

<blockquote>
<table border="1">
<caption>Table 105: Locale category facets [tab:locale.category.facets]</caption>
<tr>
<th align="center">Category</th>
<th align="center">Includes facets</th>
</tr>

<tr>
<td colspan="2" align="center">
<code>&hellip;</code>
</td>
</tr>

<tr>
<td>
ctype
</td>

<td>
<code>ctype&lt;char&gt;, ctype&lt;wchar_t&gt;<br/>
codecvt&lt;char, char, mbstate_t&gt;<br/>
<del>codecvt&lt;char16_t, char8_t, mbstate_t&gt;</del><br/>
<del>codecvt&lt;char32_t, char8_t, mbstate_t&gt;</del><br/>
codecvt&lt;wchar_t, char, mbstate_t&gt;</code>
</td>
</tr>

<tr>
<td colspan="2" align="center">
<code>&hellip;</code>
</td>
</tr>

</table>
</blockquote>

</li>

<li><p>Modify 28.3.4.2.5.1 <a href="https://wg21.link/locale.codecvt.general">[locale.codecvt.general]</a> as indicated:</p>

<blockquote>
<blockquote>
<pre>
namespace std {
  [&hellip;]
  template&lt;class internT, class externT, class stateT&gt;
    class codecvt : public locale::facet, public codecvt_base {
    public:
      using intern_type = internT;
      using extern_type = externT;
      using state_type = stateT;

      explicit codecvt(size_t refs = 0);
      <ins>~codecvt();</ins>

      [&hellip;]
    protected:
      <del>~codecvt();</del>
      [&hellip;]
    };
}
</pre>
</blockquote>
<p>
[&hellip;]
<p/>
-3- The specializations required in Table <del>105 [tab:locale.category.facets]</del><ins>106 [tab:locale.spec]</ins>
(28.3.3.1.2.1 <a href="https://wg21.link/locale.category">[locale.category]</a>) convert the implementation-defined native character set. 
<code>codecvt&lt;char, char, mbstate_t&gt;</code> implements a degenerate conversion; it does not 
convert at all. The specialization <code>codecvt&lt;char16_t, char8_t, mbstate_t&gt;</code> converts 
between the UTF-16 and UTF-8 encoding forms, and the specialization 
<code>codecvt&lt;char32_t, char8_t, mbstate_t&gt;</code> converts between the UTF-32 and UTF-8 encoding forms. 
<code>codecvt&lt;wchar_t, char, mbstate_t&gt;</code> converts between the native character sets for ordinary 
and wide characters. Specializations on <code>mbstate_t</code> perform conversion between encodings known to 
the library implementer. Other encodings can be converted by specializing on a program-defined 
<code>stateT</code> type. Objects of type <code>stateT</code> can contain any state that is useful to communicate 
to or from the specialized <code>do_in</code> or <code>do_out</code> members.
</p>
</blockquote>
</li>

</ol>
</blockquote>

<p><i>[2023-02-10; Victor Zverovich comments and provides improved wording]</i></p>

<p>
Per today's LWG discussion the following changes have been implemented in revised wording:
</p>
<ul>
<li><p>Deprecated the facets instead of removing them (including the <code>_byname</code> variants, which were previously missed).</p></li>
<li><p>Removed the changes to the facet destructor, since with deprecation it's no longer critical to provide other ways to access the facets.</p></li>
</ul>
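<p>
Because the destructor stays protected, using one of these facets without a locale still requires a 
derived type with a public destructor. A minimal sketch of that approach follows; the names 
<code>utf8_unit</code>, <code>utf16_codecvt</code>, <code>standalone_utf16_codecvt</code> and 
<code>utf8_to_utf16</code> are hypothetical, and the alias falls back to <code>char</code> when 
<code>char8_t</code> is unavailable.
</p>

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical alias: char8_t where available, char otherwise.
#if defined(__cpp_char8_t)
using utf8_unit = char8_t;
#else
using utf8_unit = char;
#endif

using utf16_codecvt = std::codecvt<char16_t, utf8_unit, std::mbstate_t>;

// codecvt's destructor is protected, so a directly constructed object needs
// a derived type whose destructor is public.
struct standalone_utf16_codecvt : utf16_codecvt {
    ~standalone_utf16_codecvt() {}
};

// Converts UTF-8 code units to UTF-16 without consulting any std::locale.
// Returns an empty string if the facet reports anything other than ok.
inline std::u16string utf8_to_utf16(const utf8_unit* first, const utf8_unit* last) {
    standalone_utf16_codecvt cvt;
    std::mbstate_t state{};
    // A UTF-8 sequence never produces more UTF-16 code units than it has
    // UTF-8 code units, so an output buffer of the same length suffices.
    std::u16string out(static_cast<std::size_t>(last - first), u'\0');
    const utf8_unit* from_next = first;
    char16_t* to_next = &out[0];
    const std::codecvt_base::result r =
        cvt.in(state, first, last, from_next, &out[0], &out[0] + out.size(), to_next);
    if (r != std::codecvt_base::ok) return std::u16string();
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

<p>
The conversion here depends only on the facet object itself, which is the sense in which these 
facets are independent of any locale.
</p>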
<p><i>[Kona 2023-11-07; move to Ready]</i></p>


<p><i>[Tokyo 2024-03-23; Status changed: Voting &rarr; WP.]</i></p>



<p id="res-3767"><b>Proposed resolution:</b></p>
<p>
This wording is relative to <a href="https://wg21.link/N4928" title=" Working Draft, Standard for Programming Language C++">N4928</a>.
</p>

<ol>

<li><p>Modify 28.3.3.1.2.1 <a href="https://wg21.link/locale.category">[locale.category]</a>, Table 105 ([tab:locale.category.facets]) &mdash; 
"Locale category facets" &mdash; and Table 106 ([tab:locale.spec]) "Required specializations" as indicated:</p>

<blockquote>
<table border="1">
<caption>Table 105: Locale category facets [tab:locale.category.facets]</caption>
<tr>
<th align="center">Category</th>
<th align="center">Includes facets</th>
</tr>

<tr>
<td colspan="2" align="center">
<code>&hellip;</code>
</td>
</tr>

<tr>
<td>
ctype
</td>

<td>
<code>ctype&lt;char&gt;, ctype&lt;wchar_t&gt;<br/>
codecvt&lt;char, char, mbstate_t&gt;<br/>
<del>codecvt&lt;char16_t, char8_t, mbstate_t&gt;</del><br/>
<del>codecvt&lt;char32_t, char8_t, mbstate_t&gt;</del><br/>
codecvt&lt;wchar_t, char, mbstate_t&gt;</code>
</td>
</tr>

<tr>
<td colspan="2" align="center">
<code>&hellip;</code>
</td>
</tr>

</table>
[&hellip;]
<table border="1">
<caption>Table 106: Required specializations [tab:locale.spec]</caption>
<tr>
<th align="center">Category</th>
<th align="center">Includes facets</th>
</tr>

<tr>
<td colspan="2" align="center">
<code>&hellip;</code>
</td>
</tr>

<tr>
<td>
ctype
</td>

<td>
<code>ctype_byname&lt;char&gt;, ctype_byname&lt;wchar_t&gt;<br/>
codecvt_byname&lt;char, char, mbstate_t&gt;<br/>
<del>codecvt_byname&lt;char16_t, char8_t, mbstate_t&gt;</del><br/>
<del>codecvt_byname&lt;char32_t, char8_t, mbstate_t&gt;</del><br/>
codecvt_byname&lt;wchar_t, char, mbstate_t&gt;</code>
</td>
</tr>

<tr>
<td colspan="2" align="center">
<code>&hellip;</code>
</td>
</tr>

</table>
</blockquote>

</li>

<li><p>Modify 28.3.4.2.5.1 <a href="https://wg21.link/locale.codecvt.general">[locale.codecvt.general]</a> as indicated:</p>

<blockquote>
<p>
[&hellip;]
<p/>
-3- The specializations required in Table 105 (28.3.3.1.2.1 <a href="https://wg21.link/locale.category">[locale.category]</a>) 
convert the implementation-defined native character set. 
<code>codecvt&lt;char, char, mbstate_t&gt;</code> implements a degenerate conversion; it does not 
convert at all. <del>The specialization <code>codecvt&lt;char16_t, char8_t, mbstate_t&gt;</code> converts 
between the UTF-16 and UTF-8 encoding forms, and the specialization 
<code>codecvt&lt;char32_t, char8_t, mbstate_t&gt;</code> converts between the UTF-32 and UTF-8 encoding forms.</del> 
<code>codecvt&lt;wchar_t, char, mbstate_t&gt;</code> converts between the native character sets for ordinary 
and wide characters. Specializations on <code>mbstate_t</code> perform conversion between encodings known to 
the library implementer. Other encodings can be converted by specializing on a program-defined 
<code>stateT</code> type. Objects of type <code>stateT</code> can contain any state that is useful to communicate 
to or from the specialized <code>do_in</code> or <code>do_out</code> members.
</p>
</blockquote>
</li>


<li><p>Modify D.19 <a href="https://wg21.link/depr.locale.category">[depr.locale.category]</a> (Deprecated locale category facets) in Annex D as indicated:</p>

<blockquote>
<p>
-1- The <code>ctype</code> locale category includes the following facets as if they were specified in Table 105
[tab:locale.category.facets] of 28.3.3.1.2.1 <a href="https://wg21.link/locale.category">[locale.category]</a>.
</p>
<blockquote><pre>
codecvt&lt;char16_t, char, mbstate_t&gt;
codecvt&lt;char32_t, char, mbstate_t&gt;
<ins>codecvt&lt;char16_t, char8_t, mbstate_t&gt;
codecvt&lt;char32_t, char8_t, mbstate_t&gt;</ins>
</pre></blockquote>
<p>
-2- The <code>ctype</code> locale category includes the following facets as if they were specified in Table 106
[tab:locale.spec] of 28.3.3.1.2.1 <a href="https://wg21.link/locale.category">[locale.category]</a>.
</p>
<blockquote><pre>
codecvt_byname&lt;char16_t, char, mbstate_t&gt;
codecvt_byname&lt;char32_t, char, mbstate_t&gt;
<ins>codecvt_byname&lt;char16_t, char8_t, mbstate_t&gt;
codecvt_byname&lt;char32_t, char8_t, mbstate_t&gt;</ins>
</pre></blockquote>
<p>
-3- The following class template specializations are required in addition to those specified in 28.3.4.2.5 <a href="https://wg21.link/locale.codecvt">[locale.codecvt]</a>. 
The specialization<ins>s</ins> <code>codecvt&lt;char16_t, char, mbstate_t&gt;</code> <ins>and <code>codecvt&lt;char16_t, char8_t, mbstate_t&gt;</code></ins> 
convert<del>s</del> between the UTF-16 and UTF-8 encoding forms, and the specialization<ins>s</ins> 
<code>codecvt&lt;char32_t, char, mbstate_t&gt;</code> <ins>and <code>codecvt&lt;char32_t, char8_t, mbstate_t&gt;</code></ins> 
convert<del>s</del> between the UTF-32 and UTF-8 encoding forms.
</p>
</blockquote>
</li>

</ol>





</body>
</html>
