<HTML>
<HEAD>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<TITLE>
    CWG Issue 1656</TITLE>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<STYLE TYPE="text/css">
  INS { text-decoration:none; font-weight:bold; background-color:#A0FFA0 }
  .INS { text-decoration:none; background-color:#D0FFD0 }
  DEL { text-decoration:line-through; background-color:#FFA0A0 }
  .DEL { text-decoration:line-through; background-color: #FFD0D0 }
  @media (prefers-color-scheme: dark) {
    HTML { background-color:#202020; color:#f0f0f0; }
    A { color:#5bc0ff; }
    A:visited { color:#c6a8ff; }
    A:hover, a:focus { color:#afd7ff; }
    INS { background-color:#033a16; color:#aff5b4; }
    .INS { background-color: #033a16; }
    DEL { background-color:#67060c; color:#ffdcd7; }
    .DEL { background-color:#67060c; }
  }
  SPAN.cmnt { font-family:Times; font-style:italic }
</STYLE>
</HEAD>
<BODY>
<P><EM>This is an unofficial snapshot of the ISO/IEC JTC1 SC22 WG21
  Core Issues List revision 118b.
  See http://www.open-std.org/jtc1/sc22/wg21/ for the official
  list.</EM></P>
<P>2025-09-28</P>
<HR>
<A NAME="1656"></A><H4>1656. Encoding of numerically-escaped characters</H4>
<B>Section: </B>5.13.3&#160; [<A href="https://wg21.link/lex.ccon">lex.ccon</A>]
 &#160;&#160;&#160;

 <B>Status: </B>CD6
 &#160;&#160;&#160;

 <B>Submitter: </B>Mike Miller
 &#160;&#160;&#160;

 <B>Date: </B>2013-04-30<BR>


<P>[Accepted at the November, 2020 meeting as part of paper P2029R4.]</P>

<P>According to 5.13.3 [<A href="https://wg21.link/lex.ccon#4">lex.ccon</A>] paragraph 4,</P>

<BLOCKQUOTE>

The escape <TT>\ooo</TT> consists of the backslash followed by one, two, or
three octal digits that are taken to specify the value of the desired
character. The escape <TT>\xhhh</TT> consists of the backslash followed
by <TT>x</TT> followed by one or more hexadecimal digits that are taken to
specify the value of the desired character. There is no limit to the number
of digits in a hexadecimal sequence. A sequence of octal or hexadecimal
digits is terminated by the first character that is not an octal digit or a
hexadecimal digit, respectively. The value of a character literal is
implementation-defined if it falls outside of the implementation-defined
range defined for <TT>char</TT> (for literals with no
prefix), <TT>char16_t</TT> (for literals prefixed
by <TT>'u'</TT>), <TT>char32_t</TT> (for literals prefixed
by <TT>'U'</TT>), or <TT>wchar_t</TT> (for literals prefixed
by <TT>'L'</TT>).

</BLOCKQUOTE>
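<P>The termination rules quoted above can be illustrated with a few
compile-time checks. This is a minimal sketch, not part of the issue text;
the constant names are hypothetical, and the literals are chosen so that
every escaped value fits in <TT>char</TT>.</P>

```cpp
#include <cstddef>

// An octal escape takes at most three digits, so "\1234" is the
// character with value 0123 followed by the ordinary character '4'.
// sizeof also counts the terminating null character.
constexpr std::size_t octal_size = sizeof("\1234");

// A hexadecimal escape has no digit limit; it ends at the first
// character that is not a hexadecimal digit. Here 'g' ends it.
constexpr std::size_t hex_size = sizeof("\x41g");

// Splitting the literal keeps a following hexadecimal digit out of the
// escape: escapes are processed before adjacent literals are concatenated.
constexpr std::size_t split_size = sizeof("\x41" "b");

static_assert(octal_size == 3, "octal escape stops after three digits");
static_assert("\1234"[0] == 0123, "first character has octal value 123");
static_assert("\1234"[1] == '4', "fourth digit is an ordinary character");
static_assert(hex_size == 3, "hex escape stops at the first non-hex digit");
static_assert("\x41g"[0] == 0x41, "escape specifies the value 0x41");
static_assert(split_size == 3, "concatenation does not extend the escape");
static_assert(("\x41" "b")[1] == 'b', "'b' stays a separate character");
```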

<P>It is not clearly stated whether the &#8220;desired character&#8221;
being specified reflects the source or the target encoding.  This
particularly affects UTF-8 string literals (5.13.5 [<A href="https://wg21.link/lex.string#7">lex.string</A>] paragraph 7):</P>

<BLOCKQUOTE>

A string literal that begins with <TT>u8</TT>, such as <TT>u8"asdf"</TT>,
is a UTF-8 string literal and is initialized with the given characters as
encoded in UTF-8.

</BLOCKQUOTE>

<P>For example, assuming the source encoding is Latin-1, is
<TT>u8"\xff"</TT> supposed to specify a three-byte string (counting the
terminating null) whose first two bytes are <TT>0xc3 0xbf</TT> (the UTF-8
encoding of <TT>\u00ff</TT>), or a two-byte string whose first byte has the
value <TT>0xff</TT>?  (At least some current implementations assume the
latter interpretation.)</P>

<P><B>Notes from the September, 2013 meeting:</B></P>

<P>The second interpretation (that the escape sequence specifies the
execution-time code unit) is intended.</P>
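<P>The intended interpretation can be checked at compile time. The
following is a minimal sketch, not part of the issue text; it assumes a
compiler that accepts <TT>u8</TT> string literals (C++11 or later), and
<TT>code_unit</TT> is a hypothetical helper introduced only for this
illustration.</P>

```cpp
#include <cstddef>

// Hypothetical helper: read code unit i of a string literal as an
// unsigned value. It works whether the element type of a u8 literal
// is char (before C++20) or char8_t (C++20 and later).
template <typename CharT, std::size_t N>
constexpr unsigned code_unit(const CharT (&s)[N], std::size_t i) {
    return static_cast<unsigned>(static_cast<unsigned char>(s[i]));
}

// sizeof counts the terminating null code unit as well.
constexpr std::size_t hex_escape_size = sizeof(u8"\xff");
constexpr std::size_t ucn_size = sizeof(u8"\u00ff");

// The numeric escape names a single code unit with value 0xff;
// it is not re-encoded as UTF-8.
static_assert(hex_escape_size == 2, "one code unit plus the null");
static_assert(code_unit(u8"\xff", 0) == 0xFFu, "code unit is 0xff");

// The universal-character-name names the character U+00FF, which
// UTF-8 encodes as the two code units 0xc3 0xbf.
static_assert(ucn_size == 3, "two code units plus the null");
static_assert(code_unit(u8"\u00ff", 0) == 0xC3u, "lead byte is 0xc3");
static_assert(code_unit(u8"\u00ff", 1) == 0xBFu, "trail byte is 0xbf");
```

<P>Writing <TT>\u00ff</TT> instead of <TT>\xff</TT> is therefore the way to
request the character rather than the code unit.</P>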

<BR><BR>
</BODY>
</HTML>
