<HTML>
<HEAD>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<TITLE>
    CWG Issue 1332</TITLE>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<STYLE TYPE="text/css">
  INS { text-decoration:none; font-weight:bold; background-color:#A0FFA0 }
  .INS { text-decoration:none; background-color:#D0FFD0 }
  DEL { text-decoration:line-through; background-color:#FFA0A0 }
  .DEL { text-decoration:line-through; background-color: #FFD0D0 }
  @media (prefers-color-scheme: dark) {
    HTML { background-color:#202020; color:#f0f0f0; }
    A { color:#5bc0ff; }
    A:visited { color:#c6a8ff; }
    A:hover, a:focus { color:#afd7ff; }
    INS { background-color:#033a16; color:#aff5b4; }
    .INS { background-color: #033a16; }
    DEL { background-color:#67060c; color:#ffdcd7; }
    .DEL { background-color:#67060c; }
  }
  SPAN.cmnt { font-family:Times; font-style:italic }
</STYLE>
</HEAD>
<BODY>
<P><EM>This is an unofficial snapshot of the ISO/IEC JTC1 SC22 WG21
  Core Issues List revision 118b.
  See http://www.open-std.org/jtc1/sc22/wg21/ for the official
  list.</EM></P>
<P>2025-09-28</P>
<HR>
<A NAME="1332"></A><H4>1332.
  
Handling of invalid universal-character-names
</H4>
<B>Section: </B>5.3.1&#160; [<A href="https://wg21.link/lex.charset">lex.charset</A>]
 &#160;&#160;&#160;

 <B>Status: </B>CD5
 &#160;&#160;&#160;

 <B>Submitter: </B>Mike Miller
 &#160;&#160;&#160;

 <B>Date: </B>2011-06-20<BR>


<P>According to 5.3.1 [<A href="https://wg21.link/lex.charset#2">lex.charset</A>] paragraph 2,</P>

<BLOCKQUOTE>

The character designated by the universal-character-name
<TT>\UNNNNNNNN</TT> is that character whose character short name in
ISO/IEC 10646 is <TT>NNNNNNNN</TT>; the character designated by the
universal-character-name <TT>\uNNNN</TT> is that character whose
character short name in ISO/IEC 10646 is <TT>0000NNNN</TT>. If the
hexadecimal value for a universal-character-name corresponds to a
surrogate code point (in the range 0xD800-0xDFFF,
inclusive), the program is ill-formed. Additionally, if the
hexadecimal value for a universal-character-name outside the
<I>c-char-sequence</I>, <I>s-char-sequence</I>, or
<I>r-char-sequence</I> of a character or string literal corresponds to
a control character (in either of the ranges 0x00-0x1F or 0x7F-0x9F,
both inclusive) or to a character in the basic source character set,
the program is ill-formed.

</BLOCKQUOTE>

<P>It is not specified what should happen if the hexadecimal value
does not designate a Unicode code point: is that undefined behavior
or does it make the program ill-formed?</P>
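
<P>The quoted rules, and the gap this issue points out, can be sketched as
a small classifier. This is only an illustration and not Standard text:
<TT>UcnStatus</TT>, <TT>in_range</TT> and <TT>classify_ucn</TT> are
hypothetical names, and the sketch assumes that "does not designate a
Unicode code point" means a value above 0x10FFFF. It reports that case
separately because the quoted paragraph does not say how to treat it.
C++14 or later is assumed.</P>

<PRE>
// Hypothetical names throughout; an illustration, not Standard text.
enum class UcnStatus {
    Ok,
    Surrogate,        // 0xD800-0xDFFF: ill-formed everywhere
    Control,          // 0x00-0x1F or 0x7F-0x9F outside a literal: ill-formed
    BasicSourceChar,  // basic source character outside a literal: ill-formed
    NotACodePoint     // above 0x10FFFF (assumption): the case left unspecified
};

constexpr bool in_range(unsigned long v, unsigned long lo, unsigned long hi) {
    return !(lo > v || v > hi);
}

// in_literal is true when the universal-character-name appears inside the
// c-char-sequence or s-char-sequence of a character or string literal.
constexpr UcnStatus classify_ucn(unsigned long v, bool in_literal) {
    if (v > 0x10FFFF) return UcnStatus::NotACodePoint;
    if (in_range(v, 0xD800, 0xDFFF)) return UcnStatus::Surrogate;
    if (in_literal) return UcnStatus::Ok;
    if (in_range(v, 0x00, 0x1F) || in_range(v, 0x7F, 0x9F))
        return UcnStatus::Control;
    // Among the remaining values below 0x7F, only '$', '@' and '`' are
    // outside the basic source character set.
    if (v == 0x24 || v == 0x40 || v == 0x60) return UcnStatus::Ok;
    if (0x7F > v) return UcnStatus::BasicSourceChar;
    return UcnStatus::Ok;
}

static_assert(classify_ucn(0xD800, true) == UcnStatus::Surrogate, "");
static_assert(classify_ucn(0x0041, false) == UcnStatus::BasicSourceChar, "");
static_assert(classify_ucn(0x0041, true) == UcnStatus::Ok, "");
static_assert(classify_ucn(0x110000, true) == UcnStatus::NotACodePoint, "");
</PRE>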

<P>As an aside, a note should be added explaining why these
requirements apply to an <I>r-char-sequence</I> when, as the
footnote at the end of the paragraph explains,</P>

<BLOCKQUOTE>

A sequence of characters resembling a universal-character-name in an
<I>r-char-sequence</I> (5.13.5 [<A href="https://wg21.link/lex.string">lex.string</A>]) does not form a
universal-character-name.

</BLOCKQUOTE>
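
<P>The effect described by the footnote can be observed with a short
check. This is a sketch that assumes an implementation in which each of
the characters involved occupies one <TT>char</TT> in the execution
character set:</P>

<PRE>
// In an ordinary string literal, \u0041 is a universal-character-name
// designating 'A': one character plus the terminating null.
static_assert(sizeof("\u0041") == 2, "universal-character-name");
static_assert("\u0041"[0] == 'A', "designates A");

// In a raw string literal the same six characters are taken literally:
// six characters plus the terminating null.
static_assert(sizeof(R"(\u0041)") == 7, "no universal-character-name");
static_assert(R"(\u0041)"[0] == '\\', "starts with a backslash");
</PRE>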

<P><B>Additional note, February, 2021:</B></P>

<P>This issue was resolved editorially in N4842.</P>

<BR><BR>
</BODY>
</HTML>
