<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Issue 3965: Incorrect example in [format.string.escaped] p3 for formatting of combining characters</title>
<meta property="og:title" content="Issue 3965: Incorrect example in [format.string.escaped] p3 for formatting of combining characters">
<meta property="og:description" content="C++ library issue. Status: WP">
<meta property="og:url" content="https://cplusplus.github.io/LWG/issue3965.html">
<meta property="og:type" content="website">
<meta property="og:image" content="http://cplusplus.github.io/LWG/images/cpp_logo.png">
<meta property="og:image:alt" content="C++ logo">
<style>
  p {text-align:justify}
  li {text-align:justify}
  pre code.backtick::before { content: "`" }
  pre code.backtick::after { content: "`" }
  blockquote.note
  {
    background-color:#E0E0E0;
    padding-left: 15px;
    padding-right: 15px;
    padding-top: 1px;
    padding-bottom: 1px;
  }
  ins {background-color:#A0FFA0}
  del {background-color:#FFA0A0}
  table.issues-index { border: 1px solid; border-collapse: collapse; }
  table.issues-index th { text-align: center; padding: 4px; border: 1px solid; }
  table.issues-index td { padding: 4px; border: 1px solid; }
  table.issues-index td:nth-child(1) { text-align: right; }
  table.issues-index td:nth-child(2) { text-align: left; }
  table.issues-index td:nth-child(3) { text-align: left; }
  table.issues-index td:nth-child(4) { text-align: left; }
  table.issues-index td:nth-child(5) { text-align: center; }
  table.issues-index td:nth-child(6) { text-align: center; }
  table.issues-index td:nth-child(7) { text-align: left; }
  table.issues-index td:nth-child(5) span.no-pr { color: red; }
  @media (prefers-color-scheme: dark) {
     html {
        color: #ddd;
        background-color: black;
     }
     ins {
        background-color: #225522
     }
     del {
        background-color: #662222
     }
     a {
        color: #6af
     }
     a:visited {
        color: #6af
     }
     blockquote.note
     {
        background-color: rgba(255, 255, 255, .10)
     }
  }
</style>
</head>
<body>
<hr>
<p><em>This page is a snapshot from the LWG issues list, see the <a href="lwg-active.html">Library Active Issues List</a> for more information and the meaning of <a href="lwg-active.html#WP">WP</a> status.</em></p>
<h3 id="3965"><a href="lwg-defects.html#3965">3965</a>. Incorrect example in [format.string.escaped] p3 for formatting of combining characters</h3>
<p><b>Section:</b> 28.5.6.5 <a href="https://wg21.link/format.string.escaped">[format.string.escaped]</a> <b>Status:</b> <a href="lwg-active.html#WP">WP</a>
 <b>Submitter:</b> Tom Honermann <b>Opened:</b> 2023-07-31 <b>Last modified:</b> 2023-11-22</p>
<p><b>Priority: </b>Not Prioritized
</p>
<p><b>View all issues with</b> <a href="lwg-status.html#WP">WP</a> status.</p>
<p><b>Discussion:</b></p>
<p>
The C++23 DIS contains the following example in 28.5.6.5 <a href="https://wg21.link/format.string.escaped">[format.string.escaped]</a> p3. (This example does 
not appear in the most recent <a href="https://wg21.link/N4950" title=" Working Draft, Standard for Programming Language C++">N4950</a> WP or on <a href="https://eel.is/c++draft">https://eel.is/c++draft</a> 
because the project editor has not yet merged the changes needed to render some of the characters involved.)
</p>
<blockquote><pre>
string s6 = format("[{:?}]", "&#x1F937;&#x200D;&#x2642;&#xFE0F;"); // s6 has value: ["&#x1F937;\u{200d}&#x2642;\u{fe0f}"]
</pre></blockquote>
<p>
The character to be formatted (&#x1F937;&#x200D;&#x2642;&#xFE0F;) consists of the following sequence of code points 
in the order presented:
</p>
<ul>
<li><p>U+1F937 (SHRUG)</p></li>
<li><p>U+200D (ZERO WIDTH JOINER)</p></li>
<li><p>U+2642 (MALE SIGN)</p></li>
<li><p>U+FE0F (VARIATION SELECTOR-16)</p></li>
</ul>
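<p>
The code point sequence listed above can be checked with a small decoder. The following is a minimal sketch, not part of the issue or of the standard library: <code>decode_utf8</code>, <code>shrug_utf8</code> and <code>check_example_code_points</code> are hypothetical names, the argument is assumed to be UTF-8 encoded, and the decoder assumes well-formed input and performs no validation.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Decode UTF-8 into code points.
// Assumption: the input is well-formed; no validation is performed.
std::vector<char32_t> decode_utf8(std::string_view s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        // Sequence length from the lead byte: 0xxxxxxx, 110xxxxx, 1110xxxx, 11110xxx.
        const std::size_t len = lead < 0x80          ? 1
                              : (lead >> 5) == 0x06 ? 2
                              : (lead >> 4) == 0x0E ? 3
                                                    : 4;
        // Keep only the payload bits of the lead byte.
        char32_t cp = len == 1 ? lead : (lead & (0xFF >> (len + 1)));
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// The argument from the example, spelled as UTF-8 bytes so that the source
// file's own encoding does not matter (one literal per code point).
inline constexpr std::string_view shrug_utf8 =
    "\xF0\x9F\xA4\xB7"  // U+1F937
    "\xE2\x80\x8D"      // U+200D
    "\xE2\x99\x82"      // U+2642
    "\xEF\xB8\x8F";     // U+FE0F

// Self-check: the argument decodes to the four code points listed above.
void check_example_code_points() {
    const std::vector<char32_t> expected{0x1F937, 0x200D, 0x2642, 0xFE0F};
    assert(decode_utf8(shrug_utf8) == expected);
}
```

<p>
Calling <code>check_example_code_points()</code> from <code>main</code> exercises the decoder on the example argument. The sketch needs C++17 or later for <code>std::string_view</code>.
</p>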
<p>
28.5.6.5 <a href="https://wg21.link/format.string.escaped">[format.string.escaped]</a> bullet 2.2.1 specifies which code points are to be formatted as a 
<code>\u{<i>hex-digit-sequence</i>}</code> escape sequence:
</p>
<ol style="list-style-type: none">
<li><p>(2.2.1) &mdash; If <i>X</i> encodes a single character <i>C</i>, then:</p>
<ol style="list-style-type: none">
<li><p>(2.2.1.1) &mdash; If <i>C</i> is one of the characters in Table 75 [tab:format.escape.sequences], then the two characters shown 
as the corresponding escape sequence are appended to <i>E</i>.</p></li>
<li><p>(2.2.1.2) &mdash; Otherwise, if <i>C</i> is not U+0020 SPACE and</p>
<ol style="list-style-type: none">
<li><p>(2.2.1.2.1) &mdash; <i>CE</i> is UTF-8, UTF-16, or UTF-32 and <i>C</i> corresponds to a Unicode scalar value whose
Unicode property <code>General_Category</code> has a value in the groups <code>Separator</code> (<code>Z</code>) or <code>Other</code>
(<code>C</code>), as described by UAX #44 of the Unicode Standard, or</p></li>
<li><p>(2.2.1.2.2) &mdash; <i>CE</i> is UTF-8, UTF-16, or UTF-32 and <i>C</i> corresponds to a Unicode scalar value with
the Unicode property <code>Grapheme_Extend=Yes</code> as described by UAX #44 of the Unicode
Standard and <i>C</i> is not immediately preceded in <i>S</i> by a character <i>P</i> appended to <i>E</i> without
translation to an escape sequence, or</p></li>
<li><p>(2.2.1.2.3) &mdash; <i>CE</i> is neither UTF-8, UTF-16, nor UTF-32 and <i>C</i> is one of an implementation-defined
set of separator or non-printable characters</p></li>
</ol>
<p>
then the sequence <code>\u{<i>hex-digit-sequence</i>}</code> is appended to <i>E</i>, where <code><i>hex-digit-sequence</i></code>
is the shortest hexadecimal representation of <i>C</i> using lower-case hexadecimal digits.
</p>
</li>
<li><p>(2.2.1.3) &mdash; Otherwise, <i>C</i> is appended to <i>E</i>.</p></li>
</ol>
</li>
</ol>
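<p>
The decision procedure in bullet 2.2.1 can be sketched as follows for the UTF-encoded case. This is an illustration under stated assumptions, not an implementation of <code>std::format</code>: all names are hypothetical, the input is taken as already-decoded code points, the two property lookups are stand-ins that know only the code points needed here, the surrounding quotes are omitted, and the Table 75 entries are assumed to be the five string-case escapes (<code>\t</code>, <code>\n</code>, <code>\r</code>, <code>\"</code>, <code>\\</code>). Bullet 2.2.1.2.3 (non-UTF encodings) is not modeled.
</p>

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Stand-in for a General_Category lookup (hypothetical helper).
// Knows only the C0/C1 controls (Cc) and U+200D (Cf); a real implementation
// needs the full Unicode data for the Separator (Z) and Other (C) groups.
bool is_separator_or_other(char32_t c) {
    return c < 0x20 || (c >= 0x7F && c <= 0x9F) || c == 0x200D;
}

// Stand-in for a Grapheme_Extend lookup (hypothetical helper).
// Knows only U+FE0F, which is all this example consults it for.
bool is_grapheme_extend(char32_t c) {
    return c == 0xFE0F;
}

// Append \u{hex-digit-sequence}: shortest form, lower-case hexadecimal digits.
void append_hex_escape(std::u32string& e, char32_t c) {
    static constexpr char digits[] = "0123456789abcdef";
    std::u32string hex;
    do {
        hex.insert(hex.begin(), static_cast<char32_t>(digits[c & 0xF]));
        c >>= 4;
    } while (c != 0);
    e += U'\\';
    e += U'u';
    e += U'{';
    e += hex;
    e += U'}';
}

// Build the escaped string E from S, following bullets 2.2.1.1 to 2.2.1.3.
std::u32string escape_sketch(std::u32string_view s) {
    std::u32string e;
    // True when the immediately preceding character of S was appended to E
    // without translation to an escape sequence.
    bool prev_unescaped = false;
    for (char32_t c : s) {
        char32_t two = 0;  // second character of a Table 75 escape (assumed entries)
        switch (c) {
            case U'\t': two = U't'; break;
            case U'\n': two = U'n'; break;
            case U'\r': two = U'r'; break;
            case U'"':  two = U'"'; break;
            case U'\\': two = U'\\'; break;
            default: break;
        }
        if (two != 0) {                                   // (2.2.1.1)
            e += U'\\';
            e += two;
            prev_unescaped = false;
        } else if (c != U' ' &&
                   (is_separator_or_other(c) ||           // (2.2.1.2.1)
                    (is_grapheme_extend(c) && !prev_unescaped))) {  // (2.2.1.2.2)
            append_hex_escape(e, c);
            prev_unescaped = false;
        } else {                                          // (2.2.1.3)
            e += c;
            prev_unescaped = true;
        }
    }
    return e;
}

// Self-check on the example's code points: only U+200D is escaped.
void check_example() {
    std::u32string expected = U"\U0001F937";
    expected += U'\\';
    expected += U"u{200d}\u2642\uFE0F";
    assert(escape_sketch(U"\U0001F937\u200D\u2642\uFE0F") == expected);
}
```

<p>
The <code>prev_unescaped</code> flag carries the only state the rule needs: a <code>Grapheme_Extend</code> character is escaped at the start of the string or after an escaped character, and is appended as is after a character that was itself appended as is.
</p>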
<p>
The example is not consistent with the above specification for the final code point. 
<a href="https://util.unicode.org/UnicodeJsps/character.jsp?a=FE0F">U+FE0F</a> is a single character, 
is not one of the characters in Table 75, and is not U+0020. Its <code>General_Category</code> is <code>Nonspacing Mark (Mn)</code>, 
which is in neither the <code>Z</code> nor the <code>C</code> group, so bullet 2.2.1.2.1 does not apply. It has <code>Grapheme_Extend=Yes</code>, 
but the preceding character (U+2642) is appended without translation to an escape sequence, so bullet 2.2.1.2.2 does not apply. 
Bullet 2.2.1.2.3 does not apply because the example assumes a UTF-8 encoding. Thus, formatting for this character falls to 
bullet 2.2.1.3 and the character should be appended as is (without translation to an escape sequence). 
Since this character is a combining character, it should combine with the previous character and alter the 
appearance of U+2642 (thus producing <code>"&#x2642;&#xFE0F;"</code> instead of <code>"&#x2642;\u{fe0f}"</code>).
</p>

<p><i>[2023-10-27; Reflector poll]</i></p>

<p>
Set status to Tentatively Ready after six votes in favour during reflector poll.
</p>

<p><i>[2023-11-11 Approved at November 2023 meeting in Kona. Status changed: Voting &rarr; WP.]</i></p>



<p id="res-3965"><b>Proposed resolution:</b></p>
<p>
This wording is relative to <a href="https://wg21.link/N4950" title=" Working Draft, Standard for Programming Language C++">N4950</a> plus missing editorial pieces from <a href="https://wg21.link/P2286R8" title=" Formatting Ranges">P2286R8</a>.
</p>

<ol>

<li><p>Modify the example following 28.5.6.5 <a href="https://wg21.link/format.string.escaped">[format.string.escaped]</a> p3 as indicated:</p>

<blockquote class="note">
<p>
[<i>Drafting note</i>: The presented example was voted in as part of <a href="https://wg21.link/P2286R8" title=" Formatting Ranges">P2286R8</a> during the July 2022 
Virtual Meeting but is not yet accessible in the most recent working draft <a href="https://wg21.link/N4950" title=" Working Draft, Standard for Programming Language C++">N4950</a>.
</p>
<p>
Note that the final character (&#x2642;&#xFE0F;) is composed from the two code points U+2642 and U+FE0F.
]
</p>
</blockquote>


<blockquote>
<blockquote><pre>
string s6 = format("[{:?}]", "&#x1F937;&#x200D;&#x2642;&#xFE0F;"); // s6 has value: <del>["&#x1F937;\u{200d}&#x2642;\u{fe0f}"]</del><ins>["&#x1F937;\u{200d}&#x2642;&#xFE0F;"]</ins>
</pre></blockquote>

</blockquote>
</li>
</ol>





</body>
</html>
