<HTML>
<HEAD>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<TITLE>
    CWG Issue 1403</TITLE>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<STYLE TYPE="text/css">
  INS { text-decoration:none; font-weight:bold; background-color:#A0FFA0 }
  .INS { text-decoration:none; background-color:#D0FFD0 }
  DEL { text-decoration:line-through; background-color:#FFA0A0 }
  .DEL { text-decoration:line-through; background-color: #FFD0D0 }
  @media (prefers-color-scheme: dark) {
    HTML { background-color:#202020; color:#f0f0f0; }
    A { color:#5bc0ff; }
    A:visited { color:#c6a8ff; }
    A:hover, a:focus { color:#afd7ff; }
    INS { background-color:#033a16; color:#aff5b4; }
    .INS { background-color: #033a16; }
    DEL { background-color:#67060c; color:#ffdcd7; }
    .DEL { background-color:#67060c; }
  }
  SPAN.cmnt { font-family:Times; font-style:italic }
</STYLE>
</HEAD>
<BODY>
<P><EM>This is an unofficial snapshot of the ISO/IEC JTC1 SC22 WG21
  Core Issues List revision 118b.
  See http://www.open-std.org/jtc1/sc22/wg21/ for the official
  list.</EM></P>
<P>2025-09-28</P>
<HR>
<A NAME="1403"></A><H4>1403.
  
Universal-character-names in comments
</H4>
<B>Section: </B>5.4&#160; [<A href="https://wg21.link/lex.comment">lex.comment</A>]
 &#160;&#160;&#160;

 <B>Status: </B>CD6
 &#160;&#160;&#160;

 <B>Submitter: </B>David Krauss
 &#160;&#160;&#160;

 <B>Date: </B>2011-10-05<BR>


<P>[ Resolved by P2314R4, adopted in October, 2021. ]</P>

<P>According to 5.3.1 [<A href="https://wg21.link/lex.charset#2">lex.charset</A>] paragraph 2,</P>

<BLOCKQUOTE>

If the hexadecimal value for a universal-character-name
corresponds to a surrogate code point (in the range
0xD800-0xDFFF, inclusive), the program is
ill-formed. Additionally, if the hexadecimal value for a
universal-character-name outside the <I>c-char-sequence</I>,
<I>s-char-sequence</I>, or <I>r-char-sequence</I> of a character
or string literal corresponds to a control character (in either
of the ranges 0x00-0x1F or 0x7F-0x9F, both inclusive) or to a
character in the basic source character set, the program is
ill-formed.

</BLOCKQUOTE>

<P>These restrictions should not apply to comment text.  Arguably,
the prohibitions on control characters and on characters in the basic
source character set already do not apply: they exempt the contents of
character and string literals, which presupposes that the preprocessing
tokens for literals have already been recognized. That recognition
occurs in phase 3, which also replaces each comment with a single space.
However, the prohibition on surrogate code points has no such limitation
and might conceivably be applied within comments.</P>
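
<P>As a minimal illustration of the concern (a sketch, not part of the
issue text; the function name <TT>answer</TT> is hypothetical), the
comment below contains a character sequence that resembles a
universal-character-name designating a surrogate code point. Compilers
are expected to accept this in practice, on the assumption that they do
not interpret universal-character-names inside comments, but a literal
reading of the wording quoted above might make the program
ill-formed:</P>

```cpp
// Hypothetical example; the name answer() is an assumption for illustration.
constexpr int answer() {
    // This comment mentions \uD800, which resembles a
    // universal-character-name for a surrogate code point.
    return 42;
}
```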

<P>Probably the most straightforward way of addressing this problem
would be simply to state in 5.4 [<A href="https://wg21.link/lex.comment">lex.comment</A>] that
character sequences that resemble universal-character-names are not
recognized as such within comment text.</P>

<P><B>Additional note (February, 2022):</B></P>

<P>P2314R4, "Character sets and encodings" (approved in October, 2021),
effected changes so that extended characters are no longer translated
to UCNs in phase 1.</P>

<BR><BR>
</BODY>
</HTML>
