﻿<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">

<style type="text/css">

body { color: #000000; background-color: #FFFFFF; }
del { text-decoration: line-through; color: #8B0040; }
ins { text-decoration: underline; color: #005100; }

p.example { margin-left: 2em; }
pre.example { margin-left: 2em; }
div.example { margin-left: 2em; }

code.extract { background-color: #F5F6A2; }
pre.extract { margin-left: 2em; background-color: #F5F6A2;
  border: 1px solid #E1E28E; }

p.function { }
.attribute { margin-left: 2em; }
.attribute dt { float: left; font-style: italic;
  padding-right: 1ex; }
.attribute dd { margin-left: 0em; }

blockquote.std { color: #000000; background-color: #F1F1F1;
  border: 1px solid #D1D1D1;
  padding-left: 0.5em; padding-right: 0.5em; }
blockquote.stddel { text-decoration: line-through;
  color: #000000; background-color: #FFEBFF;
  border: 1px solid #ECD7EC;
  padding-left: 0.5empadding-right: 0.5em; ; }

blockquote.stdins { text-decoration: underline;
  color: #000000; background-color: #C8FFC8;
  border: 1px solid #B3EBB3; padding: 0.5em; }

blockquote pre em { font-family: normal }

table { border: 1px solid black; border-spacing: 0px;
  margin-left: auto; margin-right: auto; }
th { text-align: left; vertical-align: top;
  padding-left: 0.8em; border: none; }
td { text-align: left; vertical-align: top;
  padding-left: 0.8em; border: none; }

</style>

<title>Adding u8 character literals</title>
</head>
<body>

<div style="text-align: right; float: right">
<p>
ISO/IEC JTC1 SC22 WG21<br>
N4267 /
EWG 119<br>
Richard Smith<br>
richard@metafoo.co.uk<br>
2014-11-05
</div>

<h1>Adding u8 character literals</h1>

<h2><a name="Wording">Wording</a></h2>

<p>
Change in 2.14.3 (lex.ccon):
<blockquote class="std">
<pre><em>character-literal:</em>
        <del>' <em>c-char-sequence</em> '</del>
        <del>u' <em>c-char-sequence</em> '</del>
        <del>U' <em>c-char-sequence</em> '</del>
        <del>L' <em>c-char-sequence</em> '</del>
        <ins><em>encoding-prefix<sub>opt</sub></em> ' <em>c-char-sequence</em> '</ins>
<ins><em>encoding-prefix:</em> <span style="font-family: normal">one of</span></ins>
        <ins><tt>u8 u U L</tt></ins>
</pre>
[&hellip;]
</blockquote>

<p>
Change in 2.14.3 (lex.ccon) paragraph 1 and split it into two paragraphs:
<blockquote class="std">
A character literal is one or more characters enclosed in single quotes, as in <tt>'x'</tt>, optionally preceded by
<del>one of the letters</del> <ins><tt>u8</tt>,</ins> <tt>u</tt>, <tt>U</tt>,
or <tt>L</tt>, as in <ins><tt>u8'w'</tt>,</ins> <tt>u'y'</tt>, <tt>U'z'</tt>,
or <tt>L'x'</tt>, respectively.

<p>
A character literal that does not begin
with <ins><tt>u8</tt>,</ins> <tt>u</tt>, <tt>U</tt>, or <tt>L</tt> is an ordinary character literal<del>,
also referred to as a narrow-character literal</del>.
An ordinary
character literal that contains a single <em>c-char</em> representable in the execution character set has type <tt>char</tt>,
with value equal to the numerical value of the encoding of the <em>c-char</em> in the execution character set. An
ordinary character literal that contains more than one <em>c-char</em> is a <em>multicharacter literal</em>. A multicharacter
literal, or an ordinary character literal containing a single <em>c-char</em> not representable in the execution character
set, is conditionally-supported, has type <tt>int</tt>, and has an implementation-defined value.
</blockquote>

<p><em>Drafting note: the term "narrow-character literal" was not used anywhere else in the standard, and confusingly sometimes referred to literals of non-narrow-character type.</em></p>

<p>
Change in 2.14.3 (lex.ccon) paragraph 2 and split it into four paragraphs:
<blockquote class="std">
<p>
<ins>
A character literal that begins with <tt>u8</tt>, such as <tt>u8'w'</tt>, is a character literal of type
<tt>char</tt>, known as a <i>UTF-8 character literal</i>. The value of a UTF-8 character literal is equal to
its ISO 10646 code point value, provided that the code point value is representable with a single UTF-8
code unit (that is, provided it is in the C0 Controls and Basic Latin Unicode block). If the value is not representable
with a single UTF-8 code unit, the program is ill-formed. A UTF-8 character
literal containing multiple <em>c-char</em>s is ill-formed.
</ins>

<p>
A character literal that begins with the letter <tt>u</tt>, such as <tt>u'y'</tt>, is a character literal of type <tt>char16_t</tt>. The
value of a <tt>char16_t</tt> literal containing a single <em>c-char</em> is equal to its ISO 10646 code point value, provided that
the code point is representable with a single 16-bit code unit. (That is, provided it is a basic multi-lingual
plane code point.) If the value is not representable within 16 bits, the program is ill-formed. A <tt>char16_t</tt>
literal containing multiple <em>c-char</em>s is ill-formed.

<p>
A character literal that begins with the letter <tt>U</tt>, such as
<tt>U'z'</tt>, is a character literal of type <tt>char32_t</tt>. The value of a <tt>char32_t</tt> literal containing a single <em>c-char</em> is
equal to its ISO 10646 code point value. A <tt>char32_t</tt> literal containing multiple <em>c-char</em>s is ill-formed.

<p>A character literal that begins with the letter <tt>L</tt>, such as <tt>L'x'</tt>, is a wide-character literal. A wide-character
literal has type <tt>wchar_t</tt>. [<i>Footnote:</i> &hellip;] The value of a wide-character literal containing a single <em>c-char</em> has value equal
to the numerical value of the encoding of the <em>c-char</em> in the execution wide-character set, unless the <em>c-char</em>
has no representation in the execution wide-character set, in which case the value is implementation-defined.
[ <i>Note:</i> The type <tt>wchar_t</tt> is able to represent all members of the execution wide-character set (see 3.9.1).
]. The value of a wide-character literal containing multiple <em>c-char</em>s is implementation-defined.
</blockquote>

<p>
Change in 2.14.3 (lex.ccon) paragraph 4:
<blockquote class="std">
[&hellip;] The value of a character literal is implementation-defined if it falls
outside of the implementation-defined range defined for
<tt>char</tt> (for literals with no prefix)<del>,
<tt>char16_t</tt> (for literals prefixed by <tt>'u'</tt>),
<tt>char32_t</tt> (for literals prefixed by <tt>'U'</tt>),</del> or
<tt>wchar_t</tt> (for literals prefixed by <tt><del>'</del>L<del>'</del></tt>).
<ins>[&nbsp;<em>Note</em>: If the value of a character literal prefixed by <tt>u</tt>, <tt>u8</tt>, or
<tt>U</tt> is outside the range defined for its type, the program is ill-formed.&nbsp;]</ins>
</blockquote>

<p>
Change in 2.14.5 (lex.string):
<blockquote class="std">
[&hellip;]
<pre><del><em>encoding-prefix:</em></del>
        <del>u8</del>
        <del>u</del>
        <del>U</del>
        <del>L</del></pre>
[&hellip;]
</blockquote>
