<HTML>
<HEAD>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<TITLE>
    CWG Issue 1335</TITLE>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<STYLE TYPE="text/css">
  INS { text-decoration:none; font-weight:bold; background-color:#A0FFA0 }
  .INS { text-decoration:none; background-color:#D0FFD0 }
  DEL { text-decoration:line-through; background-color:#FFA0A0 }
  .DEL { text-decoration:line-through; background-color: #FFD0D0 }
  @media (prefers-color-scheme: dark) {
    HTML { background-color:#202020; color:#f0f0f0; }
    A { color:#5bc0ff; }
    A:visited { color:#c6a8ff; }
    A:hover, a:focus { color:#afd7ff; }
    INS { background-color:#033a16; color:#aff5b4; }
    .INS { background-color: #033a16; }
    DEL { background-color:#67060c; color:#ffdcd7; }
    .DEL { background-color:#67060c; }
  }
  SPAN.cmnt { font-family:Times; font-style:italic }
</STYLE>
</HEAD>
<BODY>
<P><EM>This is an unofficial snapshot of the ISO/IEC JTC1 SC22 WG21
  Core Issues List revision 118b.
  See http://www.open-std.org/jtc1/sc22/wg21/ for the official
  list.</EM></P>
<P>2025-09-28</P>
<HR>
<A NAME="1335"></A><H4>1335.
  
Stringizing, extended characters, and universal-character-names
</H4>
<B>Section: </B>15.7.3&#160; [<A href="https://wg21.link/cpp.stringize">cpp.stringize</A>]
 &#160;&#160;&#160;

 <B>Status: </B>CD6
 &#160;&#160;&#160;

 <B>Submitter: </B>Johannes Schaub
 &#160;&#160;&#160;

 <B>Date: </B>2011-07-03
  &#160;&#160;&#160;
  <B>Liaison: </B>WG14<BR>


<P>[Resolved at the October, 2021 meeting by paper P2314R4.]</P>



<P>When a string literal containing an extended character is
stringized (15.7.3 [<A href="https://wg21.link/cpp.stringize">cpp.stringize</A>]), the result contains a
universal-character-name instead of the original extended character.
The reason is that the extended character is translated to a
universal-character-name in translation phase 1 (5.2 [<A href="https://wg21.link/lex.phases">lex.phases</A>]), so that the string literal <TT>"@"</TT> (where <TT>@</TT>
represents an extended character) becomes <TT>"\uXXXX"</TT>.  Because
the preprocessing token is a string literal, when the stringizing
occurs in translation phase 4, the <TT>\</TT> is doubled, and the
resulting string literal is <TT>"\"\\uXXXX\""</TT>.  As a result, the
universal-character-name is not recognized as such when the translation
to the execution character set occurs in translation phase 5.  (Note
that phase 5 translation does occur if the stringized extended character
does not appear in a string literal.)  Existing practice, however,
appears to ignore these rules and to preserve extended characters in
stringized string literals.</P>
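
<P>The doubling of the <TT>\</TT> can be observed by spelling the
universal-character-name explicitly in the source, which stands in for
what phase 1 produced from an extended character under the rules described
above.  The following is a minimal sketch and not part of the issue text;
<TT>kStringized</TT> and <TT>ucn_left_unconverted</TT> are hypothetical
names chosen for illustration, and <TT>U+00E9</TT> is an arbitrary
example character.</P>

```cpp
#include <cassert>
#include <cstring>

#define STRINGIZE_LITERAL(X) #X

// The argument is a single string-literal token whose spelling contains a
// UCN. Stringizing inserts a \ before each " and \ of that token.
static const char kStringized[] = STRINGIZE_LITERAL("\u00e9");

// Eight characters plus the terminating NUL:  " \ u 0 0 e 9 "
static_assert(sizeof(kStringized) == 9, "quotes and backslash are escaped");

// Hypothetical helper: true if the UCN survived as six ordinary characters
// rather than being converted to one execution-character-set element.
inline bool ucn_left_unconverted() {
    return std::strcmp(kStringized, "\"\\u00e9\"") == 0;
}
```

<P>A caller could confirm the behavior with
<TT>assert(ucn_left_unconverted());</TT>.</P>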

<P>See also <A HREF="578.html">issue 578</A>.</P>

<P><B>Additional note (August, 2013):</B></P>

<P>Implementations are granted substantial latitude in their
handling of extended characters and universal-character-names in
5.2 [<A href="https://wg21.link/lex.phases#1">lex.phases</A>] paragraph 1 phase 1, i.e.,</P>

<BLOCKQUOTE>

(An implementation may use any internal encoding, so long as an actual
extended character encountered in the source file, and the same extended
character expressed in the source file as a universal-character-name (i.e.,
using the <TT>\uXXXX</TT> notation), are handled equivalently except where
this replacement is reverted in a raw string literal.)

</BLOCKQUOTE>

<P>However, this freedom is mostly nullified by the requirements of
stringizing in 15.7.3 [<A href="https://wg21.link/cpp.stringize#2">cpp.stringize</A>] paragraph 2:</P>

<BLOCKQUOTE>

If, in the replacement list, a parameter is immediately preceded by
a <TT>#</TT> preprocessing token, both are replaced by a single character
string literal preprocessing token that contains the spelling of the
preprocessing token sequence for the corresponding argument.

</BLOCKQUOTE>

<P>This means that, in order to handle a construct like</P>

<PRE>
  #define STRINGIZE_LITERAL( X ) # X
  #define STRINGIZE( X ) STRINGIZE_LITERAL( X )

  STRINGIZE( STRINGIZE( identifier_\u00fC\U000000Fc ) )
</PRE>

<P>an implementation must recall the original spelling, including
the form of each UCN (<TT>\u</TT> or <TT>\U</TT>) and the case of any
alphabetic hexadecimal digits, rather than simply translating the
characters into a convenient internal representation.</P>
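
<P>One way to see which spelling a given implementation retains is to
capture the doubly stringized result and compare it with the original
spelling.  This is a sketch under stated assumptions, not part of the issue
text: <TT>doubly_stringized</TT> and <TT>preserves_original_spelling</TT>
are hypothetical helper names, and the result of the comparison depends on
the implementation, so no particular outcome is claimed here.</P>

```cpp
#include <cassert>
#include <string>

#define STRINGIZE_LITERAL(X) #X
#define STRINGIZE(X) STRINGIZE_LITERAL(X)

// Returns the doubly stringized identifier exactly as this implementation
// spells it. The inner STRINGIZE yields a string literal; the outer one
// stringizes that literal, escaping its quotes and any backslashes.
inline std::string doubly_stringized() {
    return STRINGIZE(STRINGIZE(identifier_\u00fC\U000000Fc));
}

// Hypothetical check: true only if the implementation kept both UCN forms
// (\u and \U) and the original case of the hexadecimal digits.
inline bool preserves_original_spelling() {
    return doubly_stringized() == "\"identifier_\\u00fC\\U000000Fc\"";
}
```

<P>Whatever spelling is retained for the UCNs, the result begins with
<TT>"identifier_</TT> and ends with <TT>"</TT>, since those parts of the
inner string literal do not involve a universal-character-name.</P>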

<P>To effect the freedom asserted in 5.2 [<A href="https://wg21.link/lex.phases">lex.phases</A>], the
description of stringizing should make the spelling of a
universal-character-name implementation-defined.</P>

<P><B>Additional note (February, 2022):</B></P>

<P>P2314R4 Character sets and encodings (approved in October, 2021)
effected changes so that extended characters are no longer translated
to UCNs in phase 1.</P>

<BR><BR>
</BODY>
</HTML>
