<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Issue 3944: Formatters converting sequences of char to sequences of wchar_t</title>
<meta property="og:title" content="Issue 3944: Formatters converting sequences of char to sequences of wchar_t">
<meta property="og:description" content="C++ library issue. Status: WP">
<meta property="og:url" content="https://cplusplus.github.io/LWG/issue3944.html">
<meta property="og:type" content="website">
<meta property="og:image" content="http://cplusplus.github.io/LWG/images/cpp_logo.png">
<meta property="og:image:alt" content="C++ logo">
<style>
  p {text-align:justify}
  li {text-align:justify}
  pre code.backtick::before { content: "`" }
  pre code.backtick::after { content: "`" }
  blockquote.note
  {
    background-color:#E0E0E0;
    padding-left: 15px;
    padding-right: 15px;
    padding-top: 1px;
    padding-bottom: 1px;
  }
  ins {background-color:#A0FFA0}
  del {background-color:#FFA0A0}
  table.issues-index { border: 1px solid; border-collapse: collapse; }
  table.issues-index th { text-align: center; padding: 4px; border: 1px solid; }
  table.issues-index td { padding: 4px; border: 1px solid; }
  table.issues-index td:nth-child(1) { text-align: right; }
  table.issues-index td:nth-child(2) { text-align: left; }
  table.issues-index td:nth-child(3) { text-align: left; }
  table.issues-index td:nth-child(4) { text-align: left; }
  table.issues-index td:nth-child(5) { text-align: center; }
  table.issues-index td:nth-child(6) { text-align: center; }
  table.issues-index td:nth-child(7) { text-align: left; }
  table.issues-index td:nth-child(5) span.no-pr { color: red; }
  @media (prefers-color-scheme: dark) {
     html {
        color: #ddd;
        background-color: black;
     }
     ins {
        background-color: #225522
     }
     del {
        background-color: #662222
     }
     a {
        color: #6af
     }
     a:visited {
        color: #6af
     }
     blockquote.note
     {
        background-color: rgba(255, 255, 255, .10)
     }
  }
</style>
</head>
<body>
<hr>
<p><em>This page is a snapshot from the LWG issues list, see the <a href="lwg-active.html">Library Active Issues List</a> for more information and the meaning of <a href="lwg-active.html#WP">WP</a> status.</em></p>
<h3 id="3944"><a href="lwg-defects.html#3944">3944</a>. Formatters converting sequences of <code>char</code> to sequences of <code>wchar_t</code></h3>
<p><b>Section:</b> 28.5.6.4 <a href="https://wg21.link/format.formatter.spec">[format.formatter.spec]</a> <b>Status:</b> <a href="lwg-active.html#WP">WP</a>
 <b>Submitter:</b> Mark de Wever <b>Opened:</b> 2023-06-01 <b>Last modified:</b> 2024-07-08</p>
<p><b>Priority: </b>3
</p>
<p><b>View other</b> <a href="lwg-index-open.html#format.formatter.spec">active issues</a> in [format.formatter.spec].</p>
<p><b>View all other</b> <a href="lwg-index.html#format.formatter.spec">issues</a> in [format.formatter.spec].</p>
<p><b>View all issues with</b> <a href="lwg-status.html#WP">WP</a> status.</p>
<p><b>Discussion:</b></p>
<p>
I noticed some interesting features introduced by the range-based
formatters in C++23:
</p>
<blockquote><pre>
// Ill-formed in C++20 and C++23
const char* cstr = "hello";
char* str = const_cast&lt;char*&gt;(cstr);
std::format(L"{}", str);
std::format(L"{}", cstr);

// Ill-formed in C++20
// In C++23 they give L"['h', 'e', 'l', 'l', 'o']"
std::format(L"{}", "hello"); // A libc++ bug prevents this from working.
std::format(L"{}", std::string_view("hello"));
std::format(L"{}", std::string("hello"));
std::format(L"{}", std::vector{'h', 'e', 'l', 'l', 'o'});
</pre></blockquote>
<p>
An example is shown <a href="https://godbolt.org/z/P9E6TK3YW">here</a>. This only
shows libc++ since libstdc++ and MSVC STL have not implemented the
formatting ranges papers (<a href="https://wg21.link/P2286R8" title=" Formatting Ranges">P2286R8</a> and <a href="https://wg21.link/P2585R0" title=" Improving default container formatting">P2585R0</a>) yet.
<p/>
The difference between C++20 and C++23 is the existence of range
formatters. These formatters use the formatter specialization
<code>formatter&lt;char, wchar_t&gt;</code> which converts the sequence of <code>char</code>s 
to a sequence of <code>wchar_t</code>s.
<p/>
In this conversion <code>same_as&lt;char, charT&gt;</code> is <code>false</code>, thus the requirements
of the range-type <code>s</code> and <code>?s</code> ([tab:formatter.range.type]) aren't met. So
the following is ill-formed:
</p>
<blockquote><pre>
std::format(L"{:s}", std::string("hello")); // Not L"hello"
</pre></blockquote>
<p>
It is surprising that some string types can be formatted as a sequence
of wide characters while others cannot. A sequence of characters can be a
sequence of UTF-8 code units. This is explicitly supported in the width
estimation of string types. The conversion of <code>char</code> to <code>wchar_t</code> will
convert the individual code units, which will give incorrect results for
multi-byte code points. It will not transcode UTF-8 to UTF-16/32. The
current behavior is not in line with the note in
28.5.6.4 <a href="https://wg21.link/format.formatter.spec">[format.formatter.spec]</a>/2
</p>
<blockquote><p>
[<i>Note 1</i>: Specializations such as <code>formatter&lt;wchar_t, char&gt;</code> and
<code>formatter&lt;const char*, wchar_t&gt;</code> that would require implicit
multibyte / wide string or character conversion are disabled. &mdash; <i>end note</i>]
</p></blockquote>
<p>
Disabling this could be done by explicitly disabling the <code>char</code> to <code>wchar_t</code>
sequence formatter. Something along the lines of
</p>
<blockquote><pre>
template&lt;ranges::input_range R&gt;
  requires(format_kind&lt;R&gt; == range_format::sequence &amp;&amp;
           same_as&lt;remove_cvref_t&lt;ranges::range_reference_t&lt;R&gt;&gt;, char&gt;)
struct formatter&lt;R, wchar_t&gt; : __disabled_formatter {};
</pre></blockquote>
<p>
where <code>__disabled_formatter</code> satisfies 28.5.6.4 <a href="https://wg21.link/format.formatter.spec">[format.formatter.spec]</a>/5, would
do the trick. This disables the conversion for all sequences, not only
the string types. So <code>vector</code>, <code>array</code>, <code>span</code>, etc. would be disabled.
<p/>
This does not disable the conversion in the <code>range_formatter</code>. This allows
users to explicitly opt in to this formatter for their own
specializations.
<p/>
An alternative would be to only disable this conversion for string type
specializations (28.5.6.4 <a href="https://wg21.link/format.formatter.spec">[format.formatter.spec]</a>/2.2) where <code>char</code> to 
<code>wchar_t</code> is used:
</p>
<blockquote><pre>
template&lt;size_t N&gt; struct formatter&lt;charT[N], charT&gt;;
template&lt;class traits, class Allocator&gt;
  struct formatter&lt;basic_string&lt;charT, traits, Allocator&gt;, charT&gt;;
template&lt;class traits&gt;
  struct formatter&lt;basic_string_view&lt;charT, traits&gt;, charT&gt;;
</pre></blockquote>
<p>
Disabling the following two is not strictly required:
</p>
<blockquote><pre>
template&lt;&gt; struct formatter&lt;char*, wchar_t&gt;;
template&lt;&gt; struct formatter&lt;const char*, wchar_t&gt;;
</pre></blockquote>
<p>
However, if (<code>const</code>) <code>char*</code> becomes an <code>input_range</code>
in a future version of C++, these formatters would become enabled.
Disabling all five instead of the three required specializations seems like a
future-proof solution.
<p/>
Since there is no enabled narrowing formatter specialization
</p>
<blockquote><pre>
template&lt;&gt; struct formatter&lt;wchar_t, char&gt;;
</pre></blockquote>
<p>
there are no issues for <code>wchar_t</code> to <code>char</code> conversions.
<p/>
Before proceeding with a proposed resolution the following design
questions need to be addressed:
</p>
<ul>
<li><p>Do we want to allow string types of <code>char</code>s to be formatted as
sequences of <code>wchar_t</code>s?</p></li>
<li><p>Do we want to allow non-string-type sequences of <code>char</code>s to be
formatted as sequences of <code>wchar_t</code>s?</p></li>
<li><p>Should we disable <code>char</code> to <code>wchar_t</code> conversion in the <code>range_formatter</code>?</p></li>
</ul>
<p>
SG16 has indicated they would like to discuss this issue during a telecon.
</p>

<p><i>[2023-06-08; Reflector poll]</i></p>

<p>
Set status to SG16 and priority to 3 after reflector poll.
</p>

<p><i>[2023-07-26; Mark de Wever provides wording confirmed by SG16]</i></p>

<p><i>[2024-03-18; Tokyo: move to Ready]</i></p>


<p><i>[St. Louis 2024-06-29; Status changed: Voting &rarr; WP.]</i></p>



<p id="res-3944"><b>Proposed resolution:</b></p>
<p>
This wording is relative to <a href="https://wg21.link/N4950" title=" Working Draft, Standard for Programming Language C++">N4950</a>.
</p>

<ol>

<li><p>Modify 28.5.6.4 <a href="https://wg21.link/format.formatter.spec">[format.formatter.spec]</a> as indicated:</p>

<blockquote class="note">
<p>
[<i>Drafting note</i>: The unwanted conversion happens due to the <code>formatter</code> base class
specialization (28.5.7.3 <a href="https://wg21.link/format.range.fmtdef">[format.range.fmtdef]</a>)
</p>
<pre>
struct <i>range-default-formatter</i>&lt;range_format::sequence, R, charT&gt;
</pre>
<p>
which is defined in the header <code>&lt;format&gt;</code>. Therefore the disabling is only
needed in this header. &mdash; <i>end drafting note</i>]
</p>
</blockquote>

<blockquote>
<p>
-2- [&hellip;]
<p/>
The <code>parse</code> member functions of these formatters interpret the format specification as a 
<i>std-format-spec</i> as described in 28.5.2.2 <a href="https://wg21.link/format.string.std">[format.string.std]</a>.
<p/>
[<i>Note 1</i>: Specializations such as <code>formatter&lt;wchar_t, char&gt;</code> <del>and 
<code>formatter&lt;const char*, wchar_t&gt;</code></del> that would require implicit multibyte / wide string 
or character conversion are disabled. &mdash; <i>end note</i>]
<p/>
<ins>
-?- The header <code>&lt;format&gt;</code> provides the following disabled specializations:
</ins>
</p>
<ol style="list-style-type: none">
<li><p><ins>(?.1) &mdash; The string type specializations</ins></p>
<blockquote><pre>
<ins>template&lt;&gt; struct formatter&lt;char*, wchar_t&gt;;
template&lt;&gt; struct formatter&lt;const char*, wchar_t&gt;;
template&lt;size_t N&gt; struct formatter&lt;char[N], wchar_t&gt;;
template&lt;class traits, class Allocator&gt;
  struct formatter&lt;basic_string&lt;char, traits, Allocator&gt;, wchar_t&gt;;
template&lt;class traits&gt;
  struct formatter&lt;basic_string_view&lt;char, traits&gt;, wchar_t&gt;;</ins>
</pre></blockquote>
</li>
</ol>
<p>
-3- For any types <code>T</code> and <code>charT</code> for which neither the library nor the user provides 
an explicit or partial specialization of the class template <code>formatter</code>, 
<code>formatter&lt;T, charT&gt;</code> is disabled.
</p>
</blockquote>
</li>

</ol>




</body>
</html>
