<?xml version="1.0" encoding="us-ascii" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
	<meta http-equiv="content-type" content="text/html; charset=us-ascii" />
	<title>Recommendations for extended identifier characters for C and C++</title>
	<style type="text/css">
		.altid
		{
			border: medium solid black;
		}
		.right
		{
			float: right;
		}
		.xml5
		{
			background-color: Silver;
		}
	</style>
</head>
<body>
	<table class="right">
		<tr>
			<th>
				Doc. no.:
			</th>
			<td>WG14/N1518<br />
				WG21/N3146<br />
				PL22.16/10-0136</td>
		</tr>
		<tr>
			<th>
				Date:
			</th>
			<td>2010-10-04</td>
		</tr>
		<tr>
			<th>
				Reply to:
			</th>
			<td>Clark Nelson </td>
		</tr>
		<tr>
			<th>
				Phone:
			</th>
			<td>+1-503-712-8433</td>
		</tr>
		<tr>
			<th>
				Email:
			</th>
			<td>clark.nelson@intel.com</td>
		</tr>
	</table>
	<h1>
		Recommendations for extended identifier characters for C and C++</h1>
	<h2>
		Introduction</h2>
	<p>
		In response to their 2010 FCD ballot, WG21 received the following comment (designated
		as CA 24) from the Canadian national body:
	</p>
	<blockquote>
		<p>
			A list of issues related TR 10176:2003</p>
		<ol>
			<li>"Combining characters should not appear as the first character of an identifier."
				Reference: ISO/IEC TR 10176:2003 (Annex A) This is not reflected in FCD.</li>
			<li>Restrictions on the first character of an identifier are not observed as recommended
				in TR 10176:2003. The inclusion of digits (outside of those in the basic character
				set) under
				<var>
					identifier-nondigit</var>
				is implied by FCD.</li>
			<li>It is implied that only the "main listing" from Annex A is included for C++. That
				is, the list ends with the Special Characters section. This is not made explicit
				in FCD. Existing practice in C++03 as well as WG 14 (C, as of N1425) and WG 4 (COBOL,
				as of N4315) is to include a list in a normative Annex.</li>
			<li>Specify width sensitivity as implied by C++03: <code>\uFF21</code> is not the same
				as <code>A</code>. Case sensitivity is already stated in [lex.name].</li>
		</ol>
	</blockquote>
	<p>
		It is reasonable to expect that WG14 would receive a very similar comment in response
		to an upcoming ballot.</p>
	<h2>
		Background</h2>
	<p>
		In investigating what various standards say about extended characters in identifiers,
		the following facts came to light.</p>
	<h3>
		General</h3>
	<ol>
		<li>The C++ WD now cites TR 10176:2003 for the specification of extended identifier
			characters.</li>
		<li>The C standard (and WD) incorporates the lists from TR 10176:1998 for the specification
			of valid extended identifier characters.</li>
		<li>There are differences between the lists in the C standard and in TR 10176:2003.</li>
		<li>The lists in TR 10176:2003 are recommended as the <strong>minimum</strong> set of
			characters that an implementation should allow in identifiers.</li>
		<li>The C standard explicitly allows implementations to accept other characters (implementation-defined)
			outside the basic source character set in identifiers.</li>
		<li>The C++ standard (and WD) gives no such permission.</li>
		<li>In C, a UCN not in the set of those specified as allowed results in undefined behavior.</li>
		<li>In C++, a UCN not in the set of those specified as allowed requires a diagnostic.</li>
		<li>TR 10176 is based on the same principles as the "Default Identifier Syntax" defined
			in Unicode UAX#31 (see [DefId]): very roughly, characters defined to be letters
			are allowed initially, characters defined to be digits and combining marks are allowed
			non-initially.</li>
		<li>The identifier character set in XML 1.0 was originally defined using the same principles.</li>
		<li>For the sake of stability in the presence of an expanding character set, UAX#31
			also defines an "Alternative Identifier Syntax" (see [AltId]), in which anything
			is allowed but: white space, "syntax" characters, private use characters, surrogates,
			control characters, and non-characters.</li>
		<li>Unicode defines "syntax" characters to include most punctuation and symbols (including
			mathematical operators).</li>
		<li>XML 1.1 also defines the identifier character set generously, excluding only certain
			characters and character ranges.</li>
		<li>The latest edition of XML 1.0 has adopted the identifier specification from XML
			1.1.</li>
	</ol>
	<h3>
		Combining characters</h3>
	<ol>
		<li>The C standard (and presumably TR 10176:1998) does not list combining characters,
			compatibility presentation forms, or fullwidth or halfwidth forms as valid in identifiers.</li>
		<li>TR 10176:2003 has a separate list, also in Annex A, for combining characters, compatibility
			presentation forms, fullwidth &amp; halfwidth forms, etc.</li>
		<li>The C++ standard references Annex A, without even acknowledging that it has two
			parts.</li>
		<li>TR 10176:2003 recommends that a combining character should not appear as the first
			character of an identifier.</li>
	</ol>
	<h3>
		Digits</h3>
	<ol>
		<li>TR 10176:2003 gives digits as an example of a kind of character often not allowed
			initially, and calls them out as a separate category in Annex A, but makes no actual
			recommendation.</li>
		<li>In C, a UCN representing a digit is not allowed to start an identifier.</li>
		<li>The C++ standard has no such restriction.</li>
	</ol>
	<h3>
		Halfwidth and fullwidth variants</h3>
	<ol>
		<li>TR 10176:2003 makes no recommendation with respect to halfwidth or fullwidth variants,
			but notes that COBOL (in particular) considers halfwidth and fullwidth variants
			to be equivalent to the original character (at least in some contexts).</li>
		<li>The C standard does not explicitly include halfwidth or fullwidth variants as identifier
			characters.</li>
		<li>The C++ standard is silent on this topic.</li>
	</ol>
	<h2>
		Discussion</h2>
	<h3>
		General</h3>
	<p>
		WG14 and WG21 have found that trying to keep a language standard in sync
		with an expanding character set definition can be problematic. The Unicode Consortium
		and the World Wide Web Consortium have both acknowledged this, and provided (normative)
		guidance to avoid the problem. Although TR 10176 is based on problematic principles,
		the recommendation that <strong>at least</strong> the specified characters should
		be accepted in identifiers can be completely satisfied using the [AltId] approach. Therefore,
		it seems reasonable to abandon the [DefId] and TR 10176 approach, in favor of something
		simpler and more stable.</p>
	<p>
		In a sane world, C and C++ would use the same definition of valid extended identifier
		characters. In an ideal world, the definition would appear in the C standard, and
		be referenced by the C++ standard. The publication schedules of WG14 and WG21 would
		appear to put an ideal world out of reach. But putting textually identical specifications
		in annexes of both C and C++, based on the [AltId] principles, would seem to be
		feasible.</p>
	<h3>
		Specifics</h3>
	<p>
		Establishing the principle by which the set of valid extended identifier characters
		will be defined is clearly not enough; it is also necessary to select the exact
		definition.</p>
	<p>
		In an ideal world, it would be possible to cite a definition from a different standard
		(as C++ was going to attempt to do by referencing [TR2003]). Citing the identifier
		syntax from [XML2008] would be problematic, for several reasons:</p>
	<ol>
		<li>It's not an ISO standard.</li>
		<li>It covers so much more than just the identifier syntax.</li>
		<li>Its identifier syntax allows some inappropriate characters (including the basic
			source characters &quot;<code>-</code>&quot;, &quot;<code>.</code>" and &quot;<code>:</code>&quot;,
			and several punctuation marks used in CJK text), and disallows some appropriate
			characters (including "<code title="FEMININE ORDINAL INDICATOR">&#xaa;</code>",
			"<code title="MICRO SIGN">&#xb5;</code>" and "<code title="MASCULINE ORDINAL INDICATOR">&#xba;</code>",
			which were allowed in C99).</li>
	</ol>
	<p>
		The content and organization of UAX#31 would make it easier to cite. It's still
		not an ISO standard, but that may not be an insuperable problem. In general, the
		current definition of [AltId] seems to be better than [XML2008] at tracking recent
		assignments of blocks in Unicode. But the exact definition of [AltId] has what appears
		to be a very serious flaw.</p>
	<p>
		[AltId] defines a property called &quot;Pattern_White_Space&quot;; characters having
		that property are disallowed as identifier characters. Several other categories of
		characters are also disallowed, including &quot;Pattern_Syntax&quot; characters
		and control characters. The &quot;Pattern_White_Space&quot; category includes the
		ASCII SPACE character, several control characters (including HT, LF and CR), and
		a few others (including the Unicode LINE SEPARATOR and PARAGRAPH SEPARATOR characters)
		&mdash; but it does not include several other characters defined as spaces, including
		the Latin-1 NO-BREAK SPACE (NBSP).</p>
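	<p>
		As a rough illustration only (it is not part of any proposed wording), the gap
		can be seen by comparing &quot;Pattern_White_Space&quot; with the Unicode General
		Category &quot;Separator, space&quot; (Zs). The following Python sketch assumes
		that the hand-transcribed set below matches the published property data; the function
		names are hypothetical, and the result depends on the Unicode version of the Python
		installation.</p>
<pre>
```python
import unicodedata

# Pattern_White_Space code points, transcribed by hand (an assumption of
# this sketch; they are not read from the Unicode data files).
PATTERN_WHITE_SPACE = (
    set(range(0x0009, 0x000E)) | {0x0020, 0x0085, 0x200E, 0x200F, 0x2028, 0x2029}
)

def is_space_separator(cp):
    """True if the code point has General Category Zs ("Separator, space")."""
    return unicodedata.category(chr(cp)) == "Zs"

def spaces_missed_by_pattern_white_space(limit=0x3001):
    """Zs code points below `limit` that Pattern_White_Space does not cover."""
    return [cp for cp in range(limit)
            if is_space_separator(cp) and cp not in PATTERN_WHITE_SPACE]

missed = spaces_missed_by_pattern_white_space()
print([f"{cp:04X}" for cp in missed])
```
</pre>
	<p>
		The list printed should include 00A0 (NBSP) but not 0020 (SPACE), since the latter
		is already covered by &quot;Pattern_White_Space&quot;.</p>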
	<p>
		In addition, it should be noted that [AltId] seems to be unique among standards
		and recommendations for identifiers, in having no restriction on characters allowed
		initially.</p>
	<h3>
		Combining characters</h3>
	<p>
		There is a fairly serious technical reason why an identifier should not start with
		a combining character: a combining character combines, semantically and visually,
		with the character <strong>preceding</strong> it. So disallowing this would prevent
		potentially serious, gratuitous confusion. Thus TR 10176 (reasonably) recommends
		that it be disallowed.</p>
	<p>
		But [AltId] imposes no restrictions on the first character of an identifier. On
		the other hand, [XML2008] disallows (some) general combining characters initially,
		but not any script-specific combining character.</p>
	<p>
		It should be noted that the C and C++ standards published to date appear to scrupulously
		disallow all combining characters in identifiers, both general and script-specific. Therefore,
		it has not previously been necessary to place restrictions on whether to allow them
		initially. It is difficult to imagine how to continue to avoid this problem using
		an extended identifier character definition based on [AltId].</p>
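	<p>
		The confusion caused by an initial combining character can be sketched in a few
		lines of Python. This is only an illustration under stated assumptions: the two
		strings stand for some arbitrary preceding source text and a would-be identifier,
		and the helper name is hypothetical.</p>
<pre>
```python
import unicodedata

def is_combining_mark(ch):
    """True for General Categories Mn, Mc and Me (the combining marks)."""
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")

# Hypothetical source text: some preceding character, followed directly by a
# would-be identifier that begins with U+0301 COMBINING ACUTE ACCENT.
preceding = "e"
identifier = "\u0301abc"

# Composing the text (NFC) attaches the mark to the preceding character,
# so the identifier's first character is absorbed by its neighbor.
joined = unicodedata.normalize("NFC", preceding + identifier)

print(is_combining_mark(identifier[0]))
print([f"{ord(ch):04X}" for ch in joined])
```
</pre>
	<p>
		After composition the text begins with the single character U+00E9, so a reader
		(or a tool that normalizes its input) no longer sees the identifier starting where
		the compiler does.</p>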
	<h3>
		Digits</h3>
	<p>
		While an identifier starting with a script-specific digit might be confused with
		a number by a human (who recognizes the digit as such), it will not be so confused
		by a C or C++ compiler &mdash; unless the compiler has been specifically extended
		to recognize script-specific numerical literals. Thus the problem of initial script-specific
		digits is less severe than the problem of initial combining characters (which is
		inherent in the structure of Unicode). TR 10176 does not go so far as to actually
		recommend that they be disallowed &mdash; it only mentions that they might be disallowed.</p>
	<p>
		To date, C++ has never allowed script-specific digits as identifier characters,
		initially or otherwise. C allows them. While C disallows them initially, the &quot;shall&quot;
		imposing the requirement is not in a &quot;Constraint&quot; section, which of course
		means that implementations have not, strictly speaking, been required to diagnose
		them; instead, an identifier starting with a script-specific digit yields undefined
		behavior.</p>
	<h3>
		Halfwidth and fullwidth variants</h3>
	<p>
		This is the tip of a very large iceberg; see <a href="http://www.unicode.org/reports/tr15/tr15-31.html">
			UAX#15</a> (if you dare) for much, much more information. Basically, width variations
		are just one of a dozen or so ways in which one character, or sequence of characters,
		can be confused for another. Unicode defines four different normalization forms,
		which can be used to resolve these confusions. The fact that there are multiple
		normalization forms can reasonably be taken as an indication of the complexity of
		the situation.</p>
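	<p>
		As a small, non-authoritative illustration, the following Python sketch applies
		the four normalization forms to a fullwidth letter and to an accented letter. The
		helper name is hypothetical; <code>unicodedata.normalize</code> is the standard
		library function.</p>
<pre>
```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def normalized_forms(s):
    """Map each normalization form name to the code points it produces."""
    return {form: [f"{ord(ch):04X}" for ch in unicodedata.normalize(form, s)]
            for form in FORMS}

fullwidth_a = "\uFF21"   # FULLWIDTH LATIN CAPITAL LETTER A
e_acute = "\u00E9"       # LATIN SMALL LETTER E WITH ACUTE

for name, text in (("fullwidth A", fullwidth_a), ("e with acute", e_acute)):
    print(name, normalized_forms(text))
```
</pre>
	<p>
		Only the compatibility forms (NFKC and NFKD) fold the fullwidth letter to the ASCII
		<code>A</code>; the canonical forms (NFC and NFD) leave it alone, while NFD and
		NFKD split the accented letter into a base letter and a combining mark.</p>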
	<p>
		Any standard (including COBOL and TR 10176) that talks about width variations, and
		no other form of canonical or compatibility equivalence, is very likely demonstrating
		a potent combination of hubris and ignorance.</p>
	<p>
		I think the C and C++ standards should be silent on this whole topic. An implementer
		should be able to decide whether his implementation should normalize or not, and
		if so which normalization form should be used, based on his understanding of the
		needs of his customers. The implication of that would be that users should never
		name different things using identifiers that would normalize to the same string,
		nor attempt to reference something using anything but its exact name (for example,
		by using a name that would normalize to the same string as the original name).
	</p>
	<h2>
		Recommendations</h2>
	<p>
		The definition of the ranges of UCNs allowed in an identifier should appear in an
		annex in each standard; the text of these two annexes should be identical. (The
		wording of the citations of these annexes will be different between the two standards.)</p>
	<p>
		The set of UCNs <strong>disallowed</strong> in identifiers in C and C++ should exactly
		match the specification in [AltId], <strong>with the following additions</strong>:
		all characters in the Basic Latin (i.e. ASCII, basic source character) block, and
		all characters in the Unicode General Category "Separator, space".</p>
	<p>
		General combining characters, appearing in blocks dedicated to that purpose, should
		be disallowed as the initial character of an identifier. (Script-specific combining
		characters would be allowed initially, only because of the additional complexity
		and instability of specifying them.)</p>
	<p>
		There should be no restriction on script-specific digits initially in an identifier.</p>
	<h2>
		Proposed wording</h2>
	<p>
		The annex specifying the ranges of allowed identifier characters should be Annex
		D in the C standard, and Annex E in the C++ standard. It should have two sub-clauses:
		one for allowed identifier characters, and one for characters disallowed initially.</p>
	<h3>
		Annex</h3>
	<blockquote>
		<h4>
			<ins>
				<var>
					X</var>. Universal character names for identifier characters (normative)</ins></h4>
		<h5>
			<ins>
				<var>
					X</var>.1 Ranges of characters allowed</ins></h5>
		<p>
			<ins>00A8, 00AA, 00AD, 00AF, 00B2-00B5, 00B7-00BA, 00BC-00BE, 00C0-00D6, 00D8-00F6,
				00F8-00FF</ins></p>
		<p>
			<ins>0100-167F, 1681-180D, 180F-1FFF</ins></p>
		<p>
			<ins>200B-200D, 202A-202E, 203F-2040, 2054, 2060-206F</ins></p>
		<p>
			<ins>2070-218F, 2460-24FF, 2776-2793, 2C00-2DFF, 2E80-2FFF</ins></p>
		<p>
			<ins>3004-3007, 3021-302F, 3031-303F</ins></p>
		<p>
			<ins>3040-D7FF</ins></p>
		<p>
			<ins>F900-FD3D, FD40-FDCF, FDF0-FE44, FE47-FFFD</ins></p>
		<p>
			<ins>10000-1FFFD, 20000-2FFFD, 30000-3FFFD, 40000-4FFFD, 50000-5FFFD, 60000-6FFFD, 70000-7FFFD,
				80000-8FFFD, 90000-9FFFD, A0000-AFFFD, B0000-BFFFD, C0000-CFFFD, D0000-DFFFD, E0000-EFFFD</ins></p>
		<h5>
			<ins>
				<var>
					X</var>.2 Ranges of characters disallowed initially</ins></h5>
		<p>
			<ins>0300-036F, 1DC0-1DFF, 20D0-20FF, FE20-FE2F</ins></p>
	</blockquote>
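	<p>
		To show how an implementation might consult these two lists, here is a minimal Python
		sketch. It is not part of the proposed wording: the range text is copied from the
		annex above, and every name in the code is hypothetical. It covers only universal
		character names; basic source characters are handled by the existing grammar.</p>
<pre>
```python
# Ranges copied from the proposed subclauses X.1 and X.2 (hexadecimal,
# inclusive). All names in this sketch are hypothetical.
ALLOWED_TEXT = """
00A8, 00AA, 00AD, 00AF, 00B2-00B5, 00B7-00BA, 00BC-00BE, 00C0-00D6, 00D8-00F6,
00F8-00FF,
0100-167F, 1681-180D, 180F-1FFF,
200B-200D, 202A-202E, 203F-2040, 2054, 2060-206F,
2070-218F, 2460-24FF, 2776-2793, 2C00-2DFF, 2E80-2FFF,
3004-3007, 3021-302F, 3031-303F,
3040-D7FF,
F900-FD3D, FD40-FDCF, FDF0-FE44, FE47-FFFD,
10000-1FFFD, 20000-2FFFD, 30000-3FFFD, 40000-4FFFD, 50000-5FFFD, 60000-6FFFD,
70000-7FFFD, 80000-8FFFD, 90000-9FFFD, A0000-AFFFD, B0000-BFFFD, C0000-CFFFD,
D0000-DFFFD, E0000-EFFFD
"""
DISALLOWED_INITIALLY_TEXT = "0300-036F, 1DC0-1DFF, 20D0-20FF, FE20-FE2F"

def parse_ranges(text):
    """Turn 'XXXX' and 'XXXX-YYYY' items into (low, high) integer pairs."""
    ranges = []
    for item in text.replace("\n", " ").split(","):
        item = item.strip()
        if item:
            low, _, high = item.partition("-")
            ranges.append((int(low, 16), int(high or low, 16)))
    return ranges

ALLOWED = parse_ranges(ALLOWED_TEXT)
DISALLOWED_INITIALLY = parse_ranges(DISALLOWED_INITIALLY_TEXT)

def in_ranges(cp, ranges):
    """True if the code point falls inside any (low, high) pair."""
    return any(high >= cp >= low for low, high in ranges)

def ucn_allowed(cp, initial=False):
    """Apply X.1 to every UCN, and X.2 as well when it is the initial element."""
    if not in_ranges(cp, ALLOWED):
        return False
    return not (initial and in_ranges(cp, DISALLOWED_INITIALLY))

# Columns: code point, allowed non-initially, allowed initially.
for cp in (0x00AA, 0x00A0, 0x0301, 0x0661, 0xFF21):
    print(f"{cp:04X}", ucn_allowed(cp), ucn_allowed(cp, initial=True))
```
</pre>
	<p>
		Under these lists a general combining character such as U+0301 is accepted only
		non-initially, a script-specific digit such as U+0661 is accepted in any position,
		and NBSP (U+00A0) is rejected everywhere.</p>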
	<h3>
		Citing the annex from C</h3>
	<p>
		Change 6.4.2.1p3:</p>
	<blockquote>
		<p>
			Each universal character name in an identifier shall designate a character whose
			encoding in ISO/IEC 10646 falls into one of the ranges specified in annex D<ins>, subclause
				D.1</ins>.<sup>71)</sup> The initial character shall not be a universal character
			name designating a <del>digit</del> <ins>character whose encoding falls into one of
				the ranges specified in subclause D.2</ins>. An implementation may allow multibyte
			characters that are not part of the basic source character set to appear in identifiers;
			which characters and their correspondence to universal character names is implementation-defined.</p>
	</blockquote>
	<h3>
		Citing the annex from C++</h3>
	<p>
		Change 2.11p1:</p>
	<blockquote>
		<p>
			An identifier is an arbitrarily long sequence of letters and digits. Each universal-character-name
			in an identifier shall designate a character whose encoding in ISO 10646 falls into
			one of the ranges specified in <del>Annex A of TR 10176:2003</del> <ins>annex E, subclause
				E.1. The initial element shall not be a universal-character-name designating a character
				whose encoding falls into one of the ranges specified in subclause E.2</ins>.
			Upper- and lower-case letters are different. All characters are significant.<sup>19</sup></p>
	</blockquote>
	<p>
		It may also be appropriate to delete the normative reference to TR 10176:2003. Even
		though the standard will (substantially) follow its recommendations for extended
		identifier characters, there will remain no actual reference to it.</p>
	<h2>
		References</h2>
	<dl>
		<dt>[AltId]</dt>
		<dd>
			Unicode Standard Annex #31: Unicode Identifier and Pattern Syntax, "Alternative
			Identifier Syntax", <a href="http://www.unicode.org/reports/tr31/tr31-11.html#Alternative_Identifier_Syntax">
				http://www.unicode.org/reports/tr31/tr31-11.html#Alternative_Identifier_Syntax</a></dd>
		<dt>[DefId]</dt>
		<dd>
			Unicode Standard Annex #31: Unicode Identifier and Pattern Syntax, "Default Identifier
			Syntax", <a href="http://www.unicode.org/reports/tr31/tr31-11.html#Default_Identifier_Syntax">
				http://www.unicode.org/reports/tr31/tr31-11.html#Default_Identifier_Syntax</a></dd>
		<dt>[TR2003]</dt>
		<dd>
			ISO/IEC TR 10176:2003 (WG 20 DTR ballot draft), <a href="http://www.open-std.org/JTC1/sc22/WG20/docs/n970-tr10176-2002.pdf">
				http://www.open-std.org/JTC1/sc22/WG20/docs/n970-tr10176-2002.pdf</a></dd>
		<dt>[TR2003a]</dt>
		<dd>
			Online text of identifier character repertoire recommended by [TR2003], <a href="http://www.iso.org/ittf/ISOIEC_TR_10176_2003_Table.txt">
				http://www.iso.org/ittf/ISOIEC_TR_10176_2003_Table.txt</a></dd>
		<dt>[XML2006]</dt>
		<dd>
			Extensible Markup Language (XML) 1.0 (Fourth Edition), "Common Syntactic Constructs",
			<a href="http://www.w3.org/TR/2006/REC-xml-20060816/#sec-common-syn">http://www.w3.org/TR/2006/REC-xml-20060816/#sec-common-syn</a></dd>
		<dt>[XML2008]</dt>
		<dd>
			Extensible Markup Language (XML) 1.0 (Fifth Edition), "Common Syntactic Constructs",
			<a href="http://www.w3.org/TR/2008/REC-xml-20081126/#sec-common-syn">http://www.w3.org/TR/2008/REC-xml-20081126/#sec-common-syn</a></dd>
	</dl>
	<hr />
	<h2>
		Appendix: [AltId] and [XML2008] illustrated</h2>
	<h3>
		The Basic Multilingual Plane</h3>
	<p>
		Legend: Characters disallowed by [AltId] are indicated by a <span class="altid">box</span>.
		Characters disallowed by [XML2008] are indicated by a <span class="xml5">gray background</span>.</p>
	<p>
		Every block is presented, most only by name. Certain blocks (especially including
		those corresponding to the ASCII and Latin-1 &quot;legacy encodings&quot;) have
		significant numbers of punctuation and symbol characters; these are presented character
		by character. Each assigned, non-control character appears in the HTML source as
		the appropriate character entity; if it doesn't display correctly in your browser,
		the fault is almost certainly in your browser setup. The formal name of each character
		is also present as an HTML title, which will hopefully pop up if you hover your mouse
		pointer over the character.</p>
	<p>
		It should be noted that [AltId] is unique in having no distinction between characters
		allowed initially and non-initially in an identifier. Where [XML2008] makes such
		a distinction, it is indicated below as a note. There are also notes for a few isolated
		(plausible) non-identifier characters.</p>
	<table>
		<tr>
			<th>
				0000
			</th>
			<th colspan="16">
				Basic Latin
			</th>
		</tr>
		<tr>
			<th>
				0000
			</th>
			<td class="altid xml5" title="[NULL]">&nbsp;</td>
			<td class="altid xml5" title="[START OF HEADING]">&nbsp;</td>
			<td class="altid xml5" title="[START OF TEXT]">&nbsp;</td>
			<td class="altid xml5" title="[END OF TEXT]">&nbsp;</td>
			<td class="altid xml5" title="[END OF TRANSMISSION]">&nbsp;</td>
			<td class="altid xml5" title="[ENQUIRY]">&nbsp;</td>
			<td class="altid xml5" title="[ACKNOWLEDGE]">&nbsp;</td>
			<td class="altid xml5" title="[BELL]">&nbsp;</td>
			<td class="altid xml5" title="[BACKSPACE]">&nbsp;</td>
			<td class="altid xml5" title="[CHARACTER TABULATION]">&nbsp;</td>
			<td class="altid xml5" title="[LINE FEED]">&nbsp;</td>
			<td class="altid xml5" title="[LINE TABULATION]">&nbsp;</td>
			<td class="altid xml5" title="[FORM FEED]">&nbsp;</td>
			<td class="altid xml5" title="[CARRIAGE RETURN]">&nbsp;</td>
			<td class="altid xml5" title="[SHIFT OUT]">&nbsp;</td>
			<td class="altid xml5" title="[SHIFT IN]">&nbsp;</td>
		</tr>
		<tr>
			<th>
				0010
			</th>
			<td class="altid xml5" title="[DATA LINK ESCAPE]">&nbsp;</td>
			<td class="altid xml5" title="[DEVICE CONTROL ONE]">&nbsp;</td>
			<td class="altid xml5" title="[DEVICE CONTROL TWO]">&nbsp;</td>
			<td class="altid xml5" title="[DEVICE CONTROL THREE]">&nbsp;</td>
			<td class="altid xml5" title="[DEVICE CONTROL FOUR]">&nbsp;</td>
			<td class="altid xml5" title="[NEGATIVE ACKNOWLEDGE]">&nbsp;</td>
			<td class="altid xml5" title="[SYNCHRONOUS IDLE]">&nbsp;</td>
			<td class="altid xml5" title="[END OF TRANSMISSION BLOCK]">&nbsp;</td>
			<td class="altid xml5" title="[CANCEL]">&nbsp;</td>
			<td class="altid xml5" title="[END OF MEDIUM]">&nbsp;</td>
			<td class="altid xml5" title="[SUBSTITUTE]">&nbsp;</td>
			<td class="altid xml5" title="[ESCAPE]">&nbsp;</td>
			<td class="altid xml5" title="[INFORMATION SEPARATOR FOUR]">&nbsp;</td>
			<td class="altid xml5" title="[INFORMATION SEPARATOR THREE]">&nbsp;</td>
			<td class="altid xml5" title="[INFORMATION SEPARATOR TWO]">&nbsp;</td>
			<td class="altid xml5" title="[INFORMATION SEPARATOR ONE]">&nbsp;</td>
		</tr>
		<tr>
			<th>
				0020
			</th>
			<td class="altid xml5" title="SPACE">&nbsp;</td>
			<td class="altid xml5" title="EXCLAMATION MARK">!</td>
			<td class="altid xml5" title="QUOTATION MARK">"</td>
			<td class="altid xml5" title="NUMBER SIGN">#</td>
			<td class="altid xml5" title="DOLLAR SIGN">$</td>
			<td class="altid xml5" title="PERCENT SIGN">%</td>
			<td class="altid xml5" title="AMPERSAND">&amp;</td>
			<td class="altid xml5" title="APOSTROPHE">'</td>
			<td class="altid xml5" title="LEFT PARENTHESIS">(</td>
			<td class="altid xml5" title="RIGHT PARENTHESIS">)</td>
			<td class="altid xml5" title="ASTERISK">*</td>
			<td class="altid xml5" title="PLUS SIGN">+</td>
			<td class="altid xml5" title="COMMA">,</td>
			<td class="altid" title="HYPHEN-MINUS">-</td>
			<td class="altid" title="FULL STOP">.</td>
			<td class="altid xml5" title="SOLIDUS">/</td>
			<td>XML disallows <span class="altid" title="HYPHEN-MINUS">-</span> and <span class="altid"
				title="FULL STOP">.</span> initially</td>
		</tr>
		<tr>
			<th>
				0030
			</th>
			<td title="DIGIT ZERO">0</td>
			<td title="DIGIT ONE">1</td>
			<td title="DIGIT TWO">2</td>
			<td title="DIGIT THREE">3</td>
			<td title="DIGIT FOUR">4</td>
			<td title="DIGIT FIVE">5</td>
			<td title="DIGIT SIX">6</td>
			<td title="DIGIT SEVEN">7</td>
			<td title="DIGIT EIGHT">8</td>
			<td title="DIGIT NINE">9</td>
			<td class="altid" title="COLON">:</td>
			<td class="altid xml5" title="SEMICOLON">;</td>
			<td class="altid xml5" title="LESS-THAN SIGN">&lt;</td>
			<td class="altid xml5" title="EQUALS SIGN">=</td>
			<td class="altid xml5" title="GREATER-THAN SIGN">&gt;</td>
			<td class="altid xml5" title="QUESTION MARK">?</td>
			<td>XML allows <span class="altid" title="COLON">:</span> initially, disallows digits
				initially </td>
		</tr>
		<tr>
			<th>
				0040
			</th>
			<td class="altid xml5" title="COMMERCIAL AT">@</td>
			<td title="LATIN CAPITAL LETTER A">A</td>
			<td title="LATIN CAPITAL LETTER B">B</td>
			<td title="LATIN CAPITAL LETTER C">C</td>
			<td title="LATIN CAPITAL LETTER D">D</td>
			<td title="LATIN CAPITAL LETTER E">E</td>
			<td title="LATIN CAPITAL LETTER F">F</td>
			<td title="LATIN CAPITAL LETTER G">G</td>
			<td title="LATIN CAPITAL LETTER H">H</td>
			<td title="LATIN CAPITAL LETTER I">I</td>
			<td title="LATIN CAPITAL LETTER J">J</td>
			<td title="LATIN CAPITAL LETTER K">K</td>
			<td title="LATIN CAPITAL LETTER L">L</td>
			<td title="LATIN CAPITAL LETTER M">M</td>
			<td title="LATIN CAPITAL LETTER N">N</td>
			<td title="LATIN CAPITAL LETTER O">O</td>
		</tr>
		<tr>
			<th>
				0050
			</th>
			<td title="LATIN CAPITAL LETTER P">P</td>
			<td title="LATIN CAPITAL LETTER Q">Q</td>
			<td title="LATIN CAPITAL LETTER R">R</td>
			<td title="LATIN CAPITAL LETTER S">S</td>
			<td title="LATIN CAPITAL LETTER T">T</td>
			<td title="LATIN CAPITAL LETTER U">U</td>
			<td title="LATIN CAPITAL LETTER V">V</td>
			<td title="LATIN CAPITAL LETTER W">W</td>
			<td title="LATIN CAPITAL LETTER X">X</td>
			<td title="LATIN CAPITAL LETTER Y">Y</td>
			<td title="LATIN CAPITAL LETTER Z">Z</td>
			<td class="altid xml5" title="LEFT SQUARE BRACKET">[</td>
			<td class="altid xml5" title="REVERSE SOLIDUS">\</td>
			<td class="altid xml5" title="RIGHT SQUARE BRACKET">]</td>
			<td class="altid xml5" title="CIRCUMFLEX ACCENT">^</td>
			<td title="LOW LINE">_</td>
		</tr>
		<tr>
			<th>
				0060
			</th>
			<td class="altid xml5" title="GRAVE ACCENT">`</td>
			<td title="LATIN SMALL LETTER A">a</td>
			<td title="LATIN SMALL LETTER B">b</td>
			<td title="LATIN SMALL LETTER C">c</td>
			<td title="LATIN SMALL LETTER D">d</td>
			<td title="LATIN SMALL LETTER E">e</td>
			<td title="LATIN SMALL LETTER F">f</td>
			<td title="LATIN SMALL LETTER G">g</td>
			<td title="LATIN SMALL LETTER H">h</td>
			<td title="LATIN SMALL LETTER I">i</td>
			<td title="LATIN SMALL LETTER J">j</td>
			<td title="LATIN SMALL LETTER K">k</td>
			<td title="LATIN SMALL LETTER L">l</td>
			<td title="LATIN SMALL LETTER M">m</td>
			<td title="LATIN SMALL LETTER N">n</td>
			<td title="LATIN SMALL LETTER O">o</td>
		</tr>
		<tr>
			<th>
				0070
			</th>
			<td title="LATIN SMALL LETTER P">p</td>
			<td title="LATIN SMALL LETTER Q">q</td>
			<td title="LATIN SMALL LETTER R">r</td>
			<td title="LATIN SMALL LETTER S">s</td>
			<td title="LATIN SMALL LETTER T">t</td>
			<td title="LATIN SMALL LETTER U">u</td>
			<td title="LATIN SMALL LETTER V">v</td>
			<td title="LATIN SMALL LETTER W">w</td>
			<td title="LATIN SMALL LETTER X">x</td>
			<td title="LATIN SMALL LETTER Y">y</td>
			<td title="LATIN SMALL LETTER Z">z</td>
			<td class="altid xml5" title="LEFT CURLY BRACKET">{</td>
			<td class="altid xml5" title="VERTICAL LINE">|</td>
			<td class="altid xml5" title="RIGHT CURLY BRACKET">}</td>
			<td class="altid xml5" title="TILDE">~</td>
			<td class="altid xml5" title="[DELETE]">&nbsp;</td>
		</tr>
		<tr>
			<th>
				0080
			</th>
			<th colspan="16">
				Latin-1 Supplement
			</th>
		</tr>
		<tr>
			<th>
				0080
			</th>
			<td class="altid xml5" title="&lt;control&gt;">&nbsp;</td>
			<td class="altid xml5" title="&lt;control&gt;">&nbsp;</td>
			<td class="altid xml5" title="[BREAK PERMITTED HERE]">&nbsp;</td>
			<td class="altid xml5" title="[NO BREAK HERE]">&nbsp;</td>
			<td class="altid xml5" title="[INDEX]">&nbsp;</td>
			<td class="altid xml5" title="[NEXT LINE]">&nbsp;</td>
			<td class="altid xml5" title="[START OF SELECTED AREA]">&nbsp;</td>
			<td class="altid xml5" title="[END OF SELECTED AREA]">&nbsp;</td>
			<td class="altid xml5" title="[CHARACTER TABULATION SET]">&nbsp;</td>
			<td class="altid xml5" title="[CHARACTER TABULATION WITH JUSTIFICATION]">&nbsp;
			</td>
			<td class="altid xml5" title="[LINE TABULATION SET]">&nbsp;</td>
			<td class="altid xml5" title="[PARTIAL LINE FORWARD]">&nbsp;</td>
			<td class="altid xml5" title="[PARTIAL LINE BACKWARD]">&nbsp;</td>
			<td class="altid xml5" title="[REVERSE LINE FEED]">&nbsp;</td>
			<td class="altid xml5" title="[SINGLE SHIFT TWO]">&nbsp;</td>
			<td class="altid xml5" title="[SINGLE SHIFT THREE]">&nbsp;</td>
		</tr>
		<tr>
			<th>
				0090
			</th>
			<td class="altid xml5" title="[DEVICE CONTROL STRING]">&nbsp;</td>
			<td class="altid xml5" title="[PRIVATE USE ONE]">&nbsp;</td>
			<td class="altid xml5" title="[PRIVATE USE TWO]">&nbsp;</td>
			<td class="altid xml5" title="[SET TRANSMIT STATE]">&nbsp;</td>
			<td class="altid xml5" title="[CANCEL CHARACTER]">&nbsp;</td>
			<td class="altid xml5" title="[MESSAGE WAITING]">&nbsp;</td>
			<td class="altid xml5" title="[START OF GUARDED AREA]">&nbsp;</td>
			<td class="altid xml5" title="[END OF GUARDED AREA]">&nbsp;</td>
			<td class="altid xml5" title="[START OF STRING]">&nbsp;</td>
			<td class="altid xml5" title="&lt;control&gt;">&nbsp;</td>
			<td class="altid xml5" title="[SINGLE CHARACTER INTRODUCER]">&nbsp;</td>
			<td class="altid xml5" title="[CONTROL SEQUENCE INTRODUCER]">&nbsp;</td>
			<td class="altid xml5" title="[STRING TERMINATOR]">&nbsp;</td>
			<td class="altid xml5" title="[OPERATING SYSTEM COMMAND]">&nbsp;</td>
			<td class="altid xml5" title="[PRIVACY MESSAGE]">&nbsp;</td>
			<td class="altid xml5" title="[APPLICATION PROGRAM COMMAND]">&nbsp;</td>
		</tr>
		<tr>
			<th>
				00A0
			</th>
			<td class="xml5" title="NO-BREAK SPACE">&nbsp;</td>
			<td class="altid xml5" title="INVERTED EXCLAMATION MARK">&#xa1;</td>
			<td class="altid xml5" title="CENT SIGN">&#xa2;</td>
			<td class="altid xml5" title="POUND SIGN">&#xa3;</td>
			<td class="altid xml5" title="CURRENCY SIGN">&#xa4;</td>
			<td class="altid xml5" title="YEN SIGN">&#xa5;</td>
			<td class="altid xml5" title="BROKEN BAR">&#xa6;</td>
			<td class="altid xml5" title="SECTION SIGN">&#xa7;</td>
			<td class="xml5" title="DIAERESIS">&#xa8;</td>
			<td class="altid xml5" title="COPYRIGHT SIGN">&#xa9;</td>
			<td class="xml5" title="FEMININE ORDINAL INDICATOR">&#xaa;</td>
			<td class="altid xml5" title="LEFT-POINTING DOUBLE ANGLE QUOTATION MARK">&#xab;
			</td>
			<td class="altid xml5" title="NOT SIGN">&#xac;</td>
			<td class="xml5" title="SOFT HYPHEN">&#xad;</td>
			<td class="altid xml5" title="REGISTERED SIGN">&#xae;</td>
			<td class="xml5" title="MACRON">&#xaf;</td>
		</tr>
		<tr>
			<th>
				00B0
			</th>
			<td class="altid xml5" title="DEGREE SIGN">&#xb0;</td>
			<td class="altid xml5" title="PLUS-MINUS SIGN">&#xb1;</td>
			<td class="xml5" title="SUPERSCRIPT TWO">&#xb2;</td>
			<td class="xml5" title="SUPERSCRIPT THREE">&#xb3;</td>
			<td class="xml5" title="ACUTE ACCENT">&#xb4;</td>
			<td class="xml5" title="MICRO SIGN">&#xb5;</td>
			<td class="altid xml5" title="PILCROW SIGN">&#xb6;</td>
			<td title="MIDDLE DOT">&#xb7;</td>
			<td class="xml5" title="CEDILLA">&#xb8;</td>
			<td class="xml5" title="SUPERSCRIPT ONE">&#xb9;</td>
			<td class="xml5" title="MASCULINE ORDINAL INDICATOR">&#xba;</td>
			<td class="altid xml5" title="RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK">&#xbb;
			</td>
			<td class="xml5" title="VULGAR FRACTION ONE QUARTER">&#xbc;</td>
			<td class="xml5" title="VULGAR FRACTION ONE HALF">&#xbd;</td>
			<td class="xml5" title="VULGAR FRACTION THREE QUARTERS">&#xbe;</td>
			<td class="altid xml5" title="INVERTED QUESTION MARK">&#xbf;</td>
			<td>XML disallows <span title="MIDDLE DOT">&#xb7;</span> initially</td>
		</tr>
		<tr>
			<th>
				00C0
			</th>
			<td title="LATIN CAPITAL LETTER A WITH GRAVE">&#xc0;</td>
			<td title="LATIN CAPITAL LETTER A WITH ACUTE">&#xc1;</td>
			<td title="LATIN CAPITAL LETTER A WITH CIRCUMFLEX">&#xc2;</td>
			<td title="LATIN CAPITAL LETTER A WITH TILDE">&#xc3;</td>
			<td title="LATIN CAPITAL LETTER A WITH DIAERESIS">&#xc4;</td>
			<td title="LATIN CAPITAL LETTER A WITH RING ABOVE">&#xc5;</td>
			<td title="LATIN CAPITAL LETTER AE">&#xc6;</td>
			<td title="LATIN CAPITAL LETTER C WITH CEDILLA">&#xc7;</td>
			<td title="LATIN CAPITAL LETTER E WITH GRAVE">&#xc8;</td>
			<td title="LATIN CAPITAL LETTER E WITH ACUTE">&#xc9;</td>
			<td title="LATIN CAPITAL LETTER E WITH CIRCUMFLEX">&#xca;</td>
			<td title="LATIN CAPITAL LETTER E WITH DIAERESIS">&#xcb;</td>
			<td title="LATIN CAPITAL LETTER I WITH GRAVE">&#xcc;</td>
			<td title="LATIN CAPITAL LETTER I WITH ACUTE">&#xcd;</td>
			<td title="LATIN CAPITAL LETTER I WITH CIRCUMFLEX">&#xce;</td>
			<td title="LATIN CAPITAL LETTER I WITH DIAERESIS">&#xcf;</td>
		</tr>
		<tr>
			<th>
				00D0
			</th>
			<td title="LATIN CAPITAL LETTER ETH">&#xd0;</td>
			<td title="LATIN CAPITAL LETTER N WITH TILDE">&#xd1;</td>
			<td title="LATIN CAPITAL LETTER O WITH GRAVE">&#xd2;</td>
			<td title="LATIN CAPITAL LETTER O WITH ACUTE">&#xd3;</td>
			<td title="LATIN CAPITAL LETTER O WITH CIRCUMFLEX">&#xd4;</td>
			<td title="LATIN CAPITAL LETTER O WITH TILDE">&#xd5;</td>
			<td title="LATIN CAPITAL LETTER O WITH DIAERESIS">&#xd6;</td>
			<td class="altid xml5" title="MULTIPLICATION SIGN">&#xd7;</td>
			<td title="LATIN CAPITAL LETTER O WITH STROKE">&#xd8;</td>
			<td title="LATIN CAPITAL LETTER U WITH GRAVE">&#xd9;</td>
			<td title="LATIN CAPITAL LETTER U WITH ACUTE">&#xda;</td>
			<td title="LATIN CAPITAL LETTER U WITH CIRCUMFLEX">&#xdb;</td>
			<td title="LATIN CAPITAL LETTER U WITH DIAERESIS">&#xdc;</td>
			<td title="LATIN CAPITAL LETTER Y WITH ACUTE">&#xdd;</td>
			<td title="LATIN CAPITAL LETTER THORN">&#xde;</td>
			<td title="LATIN SMALL LETTER SHARP S">&#xdf;</td>
		</tr>
		<tr>
			<th>
				00E0
			</th>
			<td title="LATIN SMALL LETTER A WITH GRAVE">&#xe0;</td>
			<td title="LATIN SMALL LETTER A WITH ACUTE">&#xe1;</td>
			<td title="LATIN SMALL LETTER A WITH CIRCUMFLEX">&#xe2;</td>
			<td title="LATIN SMALL LETTER A WITH TILDE">&#xe3;</td>
			<td title="LATIN SMALL LETTER A WITH DIAERESIS">&#xe4;</td>
			<td title="LATIN SMALL LETTER A WITH RING ABOVE">&#xe5;</td>
			<td title="LATIN SMALL LETTER AE">&#xe6;</td>
			<td title="LATIN SMALL LETTER C WITH CEDILLA">&#xe7;</td>
			<td title="LATIN SMALL LETTER E WITH GRAVE">&#xe8;</td>
			<td title="LATIN SMALL LETTER E WITH ACUTE">&#xe9;</td>
			<td title="LATIN SMALL LETTER E WITH CIRCUMFLEX">&#xea;</td>
			<td title="LATIN SMALL LETTER E WITH DIAERESIS">&#xeb;</td>
			<td title="LATIN SMALL LETTER I WITH GRAVE">&#xec;</td>
			<td title="LATIN SMALL LETTER I WITH ACUTE">&#xed;</td>
			<td title="LATIN SMALL LETTER I WITH CIRCUMFLEX">&#xee;</td>
			<td title="LATIN SMALL LETTER I WITH DIAERESIS">&#xef;</td>
		</tr>
		<tr>
			<th>
				00F0
			</th>
			<td title="LATIN SMALL LETTER ETH">&#xf0;</td>
			<td title="LATIN SMALL LETTER N WITH TILDE">&#xf1;</td>
			<td title="LATIN SMALL LETTER O WITH GRAVE">&#xf2;</td>
			<td title="LATIN SMALL LETTER O WITH ACUTE">&#xf3;</td>
			<td title="LATIN SMALL LETTER O WITH CIRCUMFLEX">&#xf4;</td>
			<td title="LATIN SMALL LETTER O WITH TILDE">&#xf5;</td>
			<td title="LATIN SMALL LETTER O WITH DIAERESIS">&#xf6;</td>
			<td class="altid xml5" title="DIVISION SIGN">&#xf7;</td>
			<td title="LATIN SMALL LETTER O WITH STROKE">&#xf8;</td>
			<td title="LATIN SMALL LETTER U WITH GRAVE">&#xf9;</td>
			<td title="LATIN SMALL LETTER U WITH ACUTE">&#xfa;</td>
			<td title="LATIN SMALL LETTER U WITH CIRCUMFLEX">&#xfb;</td>
			<td title="LATIN SMALL LETTER U WITH DIAERESIS">&#xfc;</td>
			<td title="LATIN SMALL LETTER Y WITH ACUTE">&#xfd;</td>
			<td title="LATIN SMALL LETTER THORN">&#xfe;</td>
			<td title="LATIN SMALL LETTER Y WITH DIAERESIS">&#xff;</td>
		</tr>
	</table>
	<table>
		<tr>
			<th>
				0100
			</th>
			<td>Latin Extended-A</td>
		</tr>
		<tr>
			<th>
				0180<br />
				0200
			</th>
			<td>Latin Extended-B</td>
		</tr>
		<tr>
			<th>
				0250
			</th>
			<td>IPA Extensions, Spacing Modifier Letters</td>
		</tr>
		<tr>
			<th>
				0300
			</th>
			<td>Combining Diacritical Marks</td>
			<td>XML disallows these initially</td>
		</tr>
		<tr>
			<th>
				0370
			</th>
			<td>Greek and Coptic</td>
			<td>XML specifically disallows <span class="xml5" title="GREEK QUESTION MARK">&#x37e;</span>
			</td>
		</tr>
		<tr>
			<th>
				0400
			</th>
			<td>Cyrillic</td>
		</tr>
		<tr>
			<th>
				0500
			</th>
			<td>Cyrillic Supplement, Armenian, Hebrew</td>
		</tr>
		<tr>
			<th>
				0600
			</th>
			<td>Arabic</td>
		</tr>
		<tr>
			<th>
				0700
			</th>
			<td>Syriac, Arabic Supplement, Thaana, NKo</td>
		</tr>
		<tr>
			<th>
				0800
			</th>
			<td>Samaritan</td>
		</tr>
		<tr>
			<th>
				0900
			</th>
			<td>Devanagari, Bengali</td>
		</tr>
		<tr>
			<th>
				0A00
			</th>
			<td>Gurmukhi, Gujarati</td>
		</tr>
		<tr>
			<th>
				0B00
			</th>
			<td>Oriya, Tamil</td>
		</tr>
		<tr>
			<th>
				0C00
			</th>
			<td>Telugu, Kannada</td>
		</tr>
		<tr>
			<th>
				0D00
			</th>
			<td>Malayalam, Sinhala</td>
		</tr>
		<tr>
			<th>
				0E00
			</th>
			<td>Thai, Lao</td>
		</tr>
		<tr>
			<th>
				0F00
			</th>
			<td>Tibetan</td>
		</tr>
		<tr>
			<th>
				1000
			</th>
			<td>Myanmar, Georgian</td>
		</tr>
		<tr>
			<th>
				1100
			</th>
			<td>Hangul Jamo</td>
		</tr>
		<tr>
			<th>
				1200<br />
				1300
			</th>
			<td>Ethiopic</td>
		</tr>
		<tr>
			<th>
				1380
			</th>
			<td>Ethiopic Supplement, Cherokee</td>
		</tr>
		<tr>
			<th>
				1400<br />
				1600
			</th>
			<td>Unified Canadian Aboriginal Syllabics</td>
		</tr>
		<tr>
			<th>
				1680
			</th>
			<td>Ogham, Runic</td>
			<td>The Ogham block contains a script-specific space: <span title="OGHAM SPACE MARK">
				&#x1680;</span></td>
		</tr>
		<tr>
			<th>
				1700
			</th>
			<td>Tagalog, Hanunoo, Buhid, Tagbanwa, Khmer</td>
		</tr>
		<tr>
			<th>
				1800
			</th>
			<td>Mongolian, Unified Canadian Aboriginal Syllabics Extended</td>
			<td>The Mongolian block contains a script-specific space: <span title="MONGOLIAN VOWEL SEPARATOR">
				&#x180E;</span></td>
		</tr>
		<tr>
			<th>
				1900
			</th>
			<td>Limbu, Tai Le, New Tai Lue, Khmer Symbols</td>
		</tr>
		<tr>
			<th>
				1A00
			</th>
			<td>Buginese, Tai Tham</td>
		</tr>
		<tr>
			<th>
				1B00
			</th>
			<td>Balinese, Sundanese</td>
		</tr>
		<tr>
			<th>
				1C00
			</th>
			<td>Lepcha, Ol Chiki, Vedic Extensions</td>
		</tr>
		<tr>
			<th>
				1D00
			</th>
			<td>Phonetic Extensions, Phonetic Extensions Supplement</td>
		</tr>
		<tr>
			<th>
				1DC0
			</th>
			<td>Combining Diacritical Marks Supplement</td>
			<td>XML does <strong>not</strong> disallow these initially</td>
		</tr>
		<tr>
			<th>
				1E00
			</th>
			<td>Latin Extended Additional</td>
		</tr>
		<tr>
			<th>
				1F00
			</th>
			<td>Greek Extended</td>
		</tr>
	</table>
	<table>
		<tr>
			<th>
				2000
			</th>
			<th colspan="16">
				General Punctuation
			</th>
		</tr>
		<tr>
			<th>
				2000
			</th>
			<td class="xml5" title="EN QUAD">&#x2000;</td>
			<td class="xml5" title="EM QUAD">&#x2001;</td>
			<td class="xml5" title="EN SPACE">&#x2002;</td>
			<td class="xml5" title="EM SPACE">&#x2003;</td>
			<td class="xml5" title="THREE-PER-EM SPACE">&#x2004;</td>
			<td class="xml5" title="FOUR-PER-EM SPACE">&#x2005;</td>
			<td class="xml5" title="SIX-PER-EM SPACE">&#x2006;</td>
			<td class="xml5" title="FIGURE SPACE">&#x2007;</td>
			<td class="xml5" title="PUNCTUATION SPACE">&#x2008;</td>
			<td class="xml5" title="THIN SPACE">&#x2009;</td>
			<td class="xml5" title="HAIR SPACE">&#x200a;</td>
			<td class="xml5" title="ZERO WIDTH SPACE">&#x200b;</td>
			<td title="ZERO WIDTH NON-JOINER">&#x200c;</td>
			<td title="ZERO WIDTH JOINER">&#x200d;</td>
			<td class="altid xml5" title="LEFT-TO-RIGHT MARK">&#x200e;</td>
			<td class="altid xml5" title="RIGHT-TO-LEFT MARK">&#x200f;</td>
		</tr>
		<tr>
			<th>
				2010
			</th>
			<td class="altid xml5" title="HYPHEN">&#x2010;</td>
			<td class="altid xml5" title="NON-BREAKING HYPHEN">&#x2011;</td>
			<td class="altid xml5" title="FIGURE DASH">&#x2012;</td>
			<td class="altid xml5" title="EN DASH">&#x2013;</td>
			<td class="altid xml5" title="EM DASH">&#x2014;</td>
			<td class="altid xml5" title="HORIZONTAL BAR">&#x2015;</td>
			<td class="altid xml5" title="DOUBLE VERTICAL LINE">&#x2016;</td>
			<td class="altid xml5" title="DOUBLE LOW LINE">&#x2017;</td>
			<td class="altid xml5" title="LEFT SINGLE QUOTATION MARK">&#x2018;</td>
			<td class="altid xml5" title="RIGHT SINGLE QUOTATION MARK">&#x2019;</td>
			<td class="altid xml5" title="SINGLE LOW-9 QUOTATION MARK">&#x201a;</td>
			<td class="altid xml5" title="SINGLE HIGH-REVERSED-9 QUOTATION MARK">&#x201b;</td>
			<td class="altid xml5" title="LEFT DOUBLE QUOTATION MARK">&#x201c;</td>
			<td class="altid xml5" title="RIGHT DOUBLE QUOTATION MARK">&#x201d;</td>
			<td class="altid xml5" title="DOUBLE LOW-9 QUOTATION MARK">&#x201e;</td>
			<td class="altid xml5" title="DOUBLE HIGH-REVERSED-9 QUOTATION MARK">&#x201f;</td>
		</tr>
		<tr>
			<th>
				2020
			</th>
			<td class="altid xml5" title="DAGGER">&#x2020;</td>
			<td class="altid xml5" title="DOUBLE DAGGER">&#x2021;</td>
			<td class="altid xml5" title="BULLET">&#x2022;</td>
			<td class="altid xml5" title="TRIANGULAR BULLET">&#x2023;</td>
			<td class="altid xml5" title="ONE DOT LEADER">&#x2024;</td>
			<td class="altid xml5" title="TWO DOT LEADER">&#x2025;</td>
			<td class="altid xml5" title="HORIZONTAL ELLIPSIS">&#x2026;</td>
			<td class="altid xml5" title="HYPHENATION POINT">&#x2027;</td>
			<td class="altid xml5" title="LINE SEPARATOR">&#x2028;</td>
			<td class="altid xml5" title="PARAGRAPH SEPARATOR">&#x2029;</td>
			<td class="xml5" title="LEFT-TO-RIGHT EMBEDDING">&#x202a;</td>
			<td class="xml5" title="RIGHT-TO-LEFT EMBEDDING">&#x202b;</td>
			<td class="xml5" title="POP DIRECTIONAL FORMATTING">&#x202c;</td>
			<td class="xml5" title="LEFT-TO-RIGHT OVERRIDE">&#x202d;</td>
			<td class="xml5" title="RIGHT-TO-LEFT OVERRIDE">&#x202e;</td>
			<td class="xml5" title="NARROW NO-BREAK SPACE">&#x202f;</td>
		</tr>
		<tr>
			<th>
				2030
			</th>
			<td class="altid xml5" title="PER MILLE SIGN">&#x2030;</td>
			<td class="altid xml5" title="PER TEN THOUSAND SIGN">&#x2031;</td>
			<td class="altid xml5" title="PRIME">&#x2032;</td>
			<td class="altid xml5" title="DOUBLE PRIME">&#x2033;</td>
			<td class="altid xml5" title="TRIPLE PRIME">&#x2034;</td>
			<td class="altid xml5" title="REVERSED PRIME">&#x2035;</td>
			<td class="altid xml5" title="REVERSED DOUBLE PRIME">&#x2036;</td>
			<td class="altid xml5" title="REVERSED TRIPLE PRIME">&#x2037;</td>
			<td class="altid xml5" title="CARET">&#x2038;</td>
			<td class="altid xml5" title="SINGLE LEFT-POINTING ANGLE QUOTATION MARK">&#x2039;
			</td>
			<td class="altid xml5" title="SINGLE RIGHT-POINTING ANGLE QUOTATION MARK">&#x203a;
			</td>
			<td class="altid xml5" title="REFERENCE MARK">&#x203b;</td>
			<td class="altid xml5" title="DOUBLE EXCLAMATION MARK">&#x203c;</td>
			<td class="altid xml5" title="INTERROBANG">&#x203d;</td>
			<td class="altid xml5" title="OVERLINE">&#x203e;</td>
			<td title="UNDERTIE">&#x203f;</td>
			<td>XML disallows <span title="UNDERTIE">&#x203f;</span> initially</td>
		</tr>
		<tr>
			<th>
				2040
			</th>
			<td title="CHARACTER TIE">&#x2040;</td>
			<td class="altid xml5" title="CARET INSERTION POINT">&#x2041;</td>
			<td class="altid xml5" title="ASTERISM">&#x2042;</td>
			<td class="altid xml5" title="HYPHEN BULLET">&#x2043;</td>
			<td class="altid xml5" title="FRACTION SLASH">&#x2044;</td>
			<td class="altid xml5" title="LEFT SQUARE BRACKET WITH QUILL">&#x2045;</td>
			<td class="altid xml5" title="RIGHT SQUARE BRACKET WITH QUILL">&#x2046;</td>
			<td class="altid xml5" title="DOUBLE QUESTION MARK">&#x2047;</td>
			<td class="altid xml5" title="QUESTION EXCLAMATION MARK">&#x2048;</td>
			<td class="altid xml5" title="EXCLAMATION QUESTION MARK">&#x2049;</td>
			<td class="altid xml5" title="TIRONIAN SIGN ET">&#x204a;</td>
			<td class="altid xml5" title="REVERSED PILCROW SIGN">&#x204b;</td>
			<td class="altid xml5" title="BLACK LEFTWARDS BULLET">&#x204c;</td>
			<td class="altid xml5" title="BLACK RIGHTWARDS BULLET">&#x204d;</td>
			<td class="altid xml5" title="LOW ASTERISK">&#x204e;</td>
			<td class="altid xml5" title="REVERSED SEMICOLON">&#x204f;</td>
			<td>XML disallows <span title="CHARACTER TIE">&#x2040;</span> initially</td>
		</tr>
		<tr>
			<th>
				2050
			</th>
			<td class="altid xml5" title="CLOSE UP">&#x2050;</td>
			<td class="altid xml5" title="TWO ASTERISKS ALIGNED VERTICALLY">&#x2051;</td>
			<td class="altid xml5" title="COMMERCIAL MINUS SIGN">&#x2052;</td>
			<td class="altid xml5" title="SWUNG DASH">&#x2053;</td>
			<td class="xml5" title="INVERTED UNDERTIE">&#x2054;</td>
			<td class="altid xml5" title="FLOWER PUNCTUATION MARK">&#x2055;</td>
			<td class="altid xml5" title="THREE DOT PUNCTUATION">&#x2056;</td>
			<td class="altid xml5" title="QUADRUPLE PRIME">&#x2057;</td>
			<td class="altid xml5" title="FOUR DOT PUNCTUATION">&#x2058;</td>
			<td class="altid xml5" title="FIVE DOT PUNCTUATION">&#x2059;</td>
			<td class="altid xml5" title="TWO DOT PUNCTUATION">&#x205a;</td>
			<td class="altid xml5" title="FOUR DOT MARK">&#x205b;</td>
			<td class="altid xml5" title="DOTTED CROSS">&#x205c;</td>
			<td class="altid xml5" title="TRICOLON">&#x205d;</td>
			<td class="altid xml5" title="VERTICAL FOUR DOTS">&#x205e;</td>
			<td class="xml5" title="MEDIUM MATHEMATICAL SPACE">&#x205f;</td>
		</tr>
		<tr>
			<th>
				2060
			</th>
			<td class="xml5" title="WORD JOINER">&#x2060;</td>
			<td class="xml5" title="FUNCTION APPLICATION">&#x2061;</td>
			<td class="xml5" title="INVISIBLE TIMES">&#x2062;</td>
			<td class="xml5" title="INVISIBLE SEPARATOR">&#x2063;</td>
			<td class="xml5" title="INVISIBLE PLUS">&#x2064;</td>
			<td class="xml5" title="&lt;unassigned&gt;">&nbsp;</td>
			<td class="xml5" title="&lt;unassigned&gt;">&nbsp;</td>
			<td class="xml5" title="&lt;unassigned&gt;">&nbsp;</td>
			<td class="xml5" title="&lt;unassigned&gt;">&nbsp;</td>
			<td class="xml5" title="&lt;unassigned&gt;">&nbsp;</td>
			<td class="xml5" title="INHIBIT SYMMETRIC SWAPPING">&#x206a;</td>
			<td class="xml5" title="ACTIVATE SYMMETRIC SWAPPING">&#x206b;</td>
			<td class="xml5" title="INHIBIT ARABIC FORM SHAPING">&#x206c;</td>
			<td class="xml5" title="ACTIVATE ARABIC FORM SHAPING">&#x206d;</td>
			<td class="xml5" title="NATIONAL DIGIT SHAPES">&#x206e;</td>
			<td class="xml5" title="NOMINAL DIGIT SHAPES">&#x206f;</td>
		</tr>
	</table>
	<table>
		<tr>
			<th>
				2070
			</th>
			<td>Superscripts and Subscripts</td>
		</tr>
		<tr>
			<th>
				20A0
			</th>
			<td>Currency Symbols</td>
		</tr>
		<tr>
			<th>
				20D0
			</th>
			<td>Combining Diacritical Marks for Symbols</td>
			<td>XML does <strong>not</strong> disallow these initially</td>
		</tr>
		<tr>
			<th>
				2100
			</th>
			<td>Letterlike Symbols</td>
		</tr>
		<tr>
			<th>
				2150
			</th>
			<td>Number Forms</td>
		</tr>
		<tr>
			<th>
				2190
			</th>
			<td class="altid xml5">Arrows</td>
		</tr>
		<tr>
			<th>
				2200
			</th>
			<td class="altid xml5">Mathematical Operators</td>
		</tr>
		<tr>
			<th>
				2300
			</th>
			<td class="altid xml5">Miscellaneous Technical</td>
		</tr>
		<tr>
			<th>
				2400
			</th>
			<td class="altid xml5">Control Pictures</td>
		</tr>
		<tr>
			<th>
				2440
			</th>
			<td class="altid xml5">Optical Character Recognition</td>
		</tr>
		<tr>
			<th>
				2460
			</th>
			<td class="xml5">Enclosed Alphanumerics</td>
		</tr>
		<tr>
			<th>
				2500
			</th>
			<td class="altid xml5">Box Drawing</td>
		</tr>
		<tr>
			<th>
				2580
			</th>
			<td class="altid xml5">Block Elements</td>
		</tr>
		<tr>
			<th>
				25A0
			</th>
			<td class="altid xml5">Geometric Shapes</td>
		</tr>
		<tr>
			<th>
				2600
			</th>
			<td class="altid xml5">Miscellaneous Symbols</td>
		</tr>
		<tr>
			<th>
				2700
			</th>
			<td class="altid xml5">Dingbats</td>
		</tr>
		<tr>
			<th>
				2776
			</th>
			<td class="xml5">Dingbats (circled digits)</td>
		</tr>
		<tr>
			<th>
				2794
			</th>
			<td class="altid xml5">Dingbats</td>
		</tr>
		<tr>
			<th>
				27C0
			</th>
			<td class="altid xml5">Miscellaneous Mathematical Symbols-A</td>
		</tr>
		<tr>
			<th>
				27F0
			</th>
			<td class="altid xml5">Supplemental Arrows-A</td>
		</tr>
		<tr>
			<th>
				2800
			</th>
			<td class="altid xml5">Braille Patterns</td>
		</tr>
		<tr>
			<th>
				2900
			</th>
			<td class="altid xml5">Supplemental Arrows-B</td>
		</tr>
		<tr>
			<th>
				2980
			</th>
			<td class="altid xml5">Miscellaneous Mathematical Symbols-B</td>
		</tr>
		<tr>
			<th>
				2A00
			</th>
			<td class="altid xml5">Supplemental Mathematical Operators</td>
		</tr>
		<tr>
			<th>
				2B00
			</th>
			<td class="altid xml5">Miscellaneous Symbols and Arrows</td>
		</tr>
	</table>
	<table>
		<tr>
			<th>
				2C00
			</th>
			<td>Glagolitic, Latin Extended-C, Coptic</td>
		</tr>
		<tr>
			<th>
				2D00
			</th>
			<td>Georgian Supplement, Tifinagh, Ethiopic Extended, Cyrillic Extended-A</td>
		</tr>
		<tr>
			<th>
				2E00
			</th>
			<td class="altid">Supplemental Punctuation</td>
		</tr>
		<tr>
			<th>
				2E80
			</th>
			<td>CJK Radicals Supplement, Kangxi Radicals</td>
		</tr>
		<tr>
			<th>
				2FF0
			</th>
			<td class="xml5">Ideographic Description</td>
		</tr>
	</table>
	<table>
		<tr>
			<th>
				3000
			</th>
			<th colspan="16">
				CJK Symbols and Punctuation
			</th>
		</tr>
		<tr>
			<th>
				3000
			</th>
			<td class="xml5" title="IDEOGRAPHIC SPACE">&#x3000;</td>
			<td class="altid" title="IDEOGRAPHIC COMMA">&#x3001;</td>
			<td class="altid" title="IDEOGRAPHIC FULL STOP">&#x3002;</td>
			<td class="altid" title="DITTO MARK">&#x3003;</td>
			<td title="JAPANESE INDUSTRIAL STANDARD SYMBOL">&#x3004;</td>
			<td title="IDEOGRAPHIC ITERATION MARK">&#x3005;</td>
			<td title="IDEOGRAPHIC CLOSING MARK">&#x3006;</td>
			<td title="IDEOGRAPHIC NUMBER ZERO">&#x3007;</td>
			<td class="altid" title="LEFT ANGLE BRACKET">&#x3008;</td>
			<td class="altid" title="RIGHT ANGLE BRACKET">&#x3009;</td>
			<td class="altid" title="LEFT DOUBLE ANGLE BRACKET">&#x300a;</td>
			<td class="altid" title="RIGHT DOUBLE ANGLE BRACKET">&#x300b;</td>
			<td class="altid" title="LEFT CORNER BRACKET">&#x300c;</td>
			<td class="altid" title="RIGHT CORNER BRACKET">&#x300d;</td>
			<td class="altid" title="LEFT WHITE CORNER BRACKET">&#x300e;</td>
			<td class="altid" title="RIGHT WHITE CORNER BRACKET">&#x300f;</td>
		</tr>
		<tr>
			<th>
				3010
			</th>
			<td class="altid" title="LEFT BLACK LENTICULAR BRACKET">&#x3010;</td>
			<td class="altid" title="RIGHT BLACK LENTICULAR BRACKET">&#x3011;</td>
			<td class="altid" title="POSTAL MARK">&#x3012;</td>
			<td class="altid" title="GETA MARK">&#x3013;</td>
			<td class="altid" title="LEFT TORTOISE SHELL BRACKET">&#x3014;</td>
			<td class="altid" title="RIGHT TORTOISE SHELL BRACKET">&#x3015;</td>
			<td class="altid" title="LEFT WHITE LENTICULAR BRACKET">&#x3016;</td>
			<td class="altid" title="RIGHT WHITE LENTICULAR BRACKET">&#x3017;</td>
			<td class="altid" title="LEFT WHITE TORTOISE SHELL BRACKET">&#x3018;</td>
			<td class="altid" title="RIGHT WHITE TORTOISE SHELL BRACKET">&#x3019;</td>
			<td class="altid" title="LEFT WHITE SQUARE BRACKET">&#x301a;</td>
			<td class="altid" title="RIGHT WHITE SQUARE BRACKET">&#x301b;</td>
			<td class="altid" title="WAVE DASH">&#x301c;</td>
			<td class="altid" title="REVERSED DOUBLE PRIME QUOTATION MARK">&#x301d;</td>
			<td class="altid" title="DOUBLE PRIME QUOTATION MARK">&#x301e;</td>
			<td class="altid" title="LOW DOUBLE PRIME QUOTATION MARK">&#x301f;</td>
		</tr>
		<tr>
			<th>
				3020
			</th>
			<td class="altid" title="POSTAL MARK FACE">&#x3020;</td>
			<td title="HANGZHOU NUMERAL ONE">&#x3021;</td>
			<td title="HANGZHOU NUMERAL TWO">&#x3022;</td>
			<td title="HANGZHOU NUMERAL THREE">&#x3023;</td>
			<td title="HANGZHOU NUMERAL FOUR">&#x3024;</td>
			<td title="HANGZHOU NUMERAL FIVE">&#x3025;</td>
			<td title="HANGZHOU NUMERAL SIX">&#x3026;</td>
			<td title="HANGZHOU NUMERAL SEVEN">&#x3027;</td>
			<td title="HANGZHOU NUMERAL EIGHT">&#x3028;</td>
			<td title="HANGZHOU NUMERAL NINE">&#x3029;</td>
			<td title="IDEOGRAPHIC LEVEL TONE MARK">&#x302a;</td>
			<td title="IDEOGRAPHIC RISING TONE MARK">&#x302b;</td>
			<td title="IDEOGRAPHIC DEPARTING TONE MARK">&#x302c;</td>
			<td title="IDEOGRAPHIC ENTERING TONE MARK">&#x302d;</td>
			<td title="HANGUL SINGLE DOT TONE MARK">&#x302e;</td>
			<td title="HANGUL DOUBLE DOT TONE MARK">&#x302f;</td>
		</tr>
		<tr>
			<th>
				3030
			</th>
			<td class="altid" title="WAVY DASH">&#x3030;</td>
			<td title="VERTICAL KANA REPEAT MARK">&#x3031;</td>
			<td title="VERTICAL KANA REPEAT WITH VOICED SOUND MARK">&#x3032;</td>
			<td title="VERTICAL KANA REPEAT MARK UPPER HALF">&#x3033;</td>
			<td title="VERTICAL KANA REPEAT WITH VOICED SOUND MARK UPPER HALF">&#x3034;</td>
			<td title="VERTICAL KANA REPEAT MARK LOWER HALF">&#x3035;</td>
			<td title="CIRCLED POSTAL MARK">&#x3036;</td>
			<td title="IDEOGRAPHIC TELEGRAPH LINE FEED SEPARATOR SYMBOL">&#x3037;</td>
			<td title="HANGZHOU NUMERAL TEN">&#x3038;</td>
			<td title="HANGZHOU NUMERAL TWENTY">&#x3039;</td>
			<td title="HANGZHOU NUMERAL THIRTY">&#x303a;</td>
			<td title="VERTICAL IDEOGRAPHIC ITERATION MARK">&#x303b;</td>
			<td title="MASU MARK">&#x303c;</td>
			<td title="PART ALTERNATION MARK">&#x303d;</td>
			<td title="IDEOGRAPHIC VARIATION INDICATOR">&#x303e;</td>
			<td title="IDEOGRAPHIC HALF FILL SPACE">&#x303f;</td>
		</tr>
	</table>
	<table>
		<tr>
			<th>
				3040
			</th>
			<td>Hiragana, Katakana</td>
		</tr>
		<tr>
			<th>
				3100
			</th>
			<td>Bopomofo, Hangul Compatibility Jamo, Kanbun, Bopomofo Extended, CJK Strokes, Katakana
				Phonetic Extensions</td>
		</tr>
		<tr>
			<th>
				3200
			</th>
			<td>Enclosed CJK Letters and Months</td>
		</tr>
		<tr>
			<th>
				3300
			</th>
			<td>CJK Compatibility</td>
		</tr>
		<tr>
			<th>
				3400<br />
				4D00
			</th>
			<td>CJK Unified Ideographs Extension A</td>
		</tr>
		<tr>
			<th>
				4DC0
			</th>
			<td>Yijing Hexagram Symbols</td>
		</tr>
		<tr>
			<th>
				4E00<br />
				9F00
			</th>
			<td>CJK Unified Ideographs</td>
		</tr>
		<tr>
			<th>
				A000<br />
				A400
			</th>
			<td>Yi Syllables</td>
		</tr>
		<tr>
			<th>
				A490
			</th>
			<td>Yi Radicals, Lisu</td>
		</tr>
		<tr>
			<th>
				A500<br />
				A600
			</th>
			<td>Vai</td>
		</tr>
		<tr>
			<th>
				A640
			</th>
			<td>Cyrillic Extended-B, Bamum</td>
		</tr>
		<tr>
			<th>
				A700
			</th>
			<td>Modifier Tone Letters, Latin Extended-D</td>
		</tr>
		<tr>
			<th>
				A800
			</th>
			<td>Syloti Nagri, Common Indic Number Forms, Phags-pa, Saurashtra, Devanagari Extended
			</td>
		</tr>
		<tr>
			<th>
				A900
			</th>
			<td>Kayah Li, Rejang, Hangul Jamo Extended-A, Javanese</td>
		</tr>
		<tr>
			<th>
				AA00
			</th>
			<td>Cham, Myanmar Extended-A, Tai Viet</td>
		</tr>
		<tr>
			<th>
				AB00
			</th>
			<td>Meetei Mayek</td>
		</tr>
		<tr>
			<th>
				AC00<br />
				D700
			</th>
			<td>Hangul Syllables</td>
		</tr>
		<tr>
			<th>
				D7B0
			</th>
			<td>Hangul Jamo Extended-B</td>
		</tr>
		<tr>
			<th>
				D800<br />
				DB00
			</th>
			<td class="altid xml5">High Surrogates</td>
		</tr>
		<tr>
			<th>
				DB80
			</th>
			<td class="altid xml5">High Private Use Surrogates</td>
		</tr>
		<tr>
			<th>
				DC00<br />
				DF00
			</th>
			<td class="altid xml5">Low Surrogates</td>
		</tr>
		<tr>
			<th>
				E000<br />
				F800
			</th>
			<td class="altid xml5">Private Use Area</td>
		</tr>
		<tr>
			<th>
				F900<br />
				FA00
			</th>
			<td>CJK Compatibility Ideographs</td>
		</tr>
		<tr>
			<th>
				FB00
			</th>
			<td>Alphabetic Presentation Forms</td>
		</tr>
		<tr>
			<th>
				FB50<br />
				FD00
			</th>
			<td>Arabic Presentation Forms-A</td>
		</tr>
		<tr>
			<th>
				FDD0
			</th>
			<td class="altid xml5">(non-characters)</td>
		</tr>
		<tr>
			<th>
				FDF0
			</th>
			<td>Arabic Presentation Forms-A</td>
			<td>This block contains Pattern_Syntax characters <span class="altid" title="ORNATE LEFT PARENTHESIS">
				&#xFD3E;</span> and <span class="altid" title="ORNATE RIGHT PARENTHESIS">&#xFD3F;</span>
			</td>
		</tr>
		<tr>
			<th>
				FE00
			</th>
			<td>Variation Selectors, Vertical Forms</td>
		</tr>
		<tr>
			<th>
				FE20
			</th>
			<td>Combining Half Marks</td>
			<td>XML does <strong>not</strong> disallow these initially</td>
		</tr>
		<tr>
			<th>
				FE30
			</th>
			<td>CJK Compatibility Forms</td>
			<td>This block also contains Pattern_Syntax characters <span class="altid" title="SESAME DOT">
				&#xFE45;</span> and <span class="altid" title="WHITE SESAME DOT">&#xFE46;</span>
			</td>
		</tr>
		<tr>
			<th>
				FE50
			</th>
			<td>Small Form Variants, Arabic Presentation Forms-B</td>
		</tr>
		<tr>
			<th>
				FF00
			</th>
			<td>Halfwidth and Fullwidth Forms</td>
		</tr>
	</table>
	<table>
		<tr>
			<th>
				FFF0
			</th>
			<th colspan="16">
				Specials
			</th>
		</tr>
		<tr>
			<th>
				FFF0
			</th>
			<td title="&lt;unassigned&gt;">&nbsp;</td>
			<td title="&lt;unassigned&gt;">&nbsp;</td>
			<td title="&lt;unassigned&gt;">&nbsp;</td>
			<td title="&lt;unassigned&gt;">&nbsp;</td>
			<td title="&lt;unassigned&gt;">&nbsp;</td>
			<td title="&lt;unassigned&gt;">&nbsp;</td>
			<td title="&lt;unassigned&gt;">&nbsp;</td>
			<td title="&lt;unassigned&gt;">&nbsp;</td>
			<td title="&lt;unassigned&gt;">&nbsp;</td>
			<td title="INTERLINEAR ANNOTATION ANCHOR">&nbsp;</td>
			<td title="INTERLINEAR ANNOTATION SEPARATOR">&nbsp;</td>
			<td title="INTERLINEAR ANNOTATION TERMINATOR">&nbsp;</td>
			<td title="OBJECT REPLACEMENT CHARACTER">&nbsp;</td>
			<td title="REPLACEMENT CHARACTER">&#xfffd;</td>
			<td class="altid xml5" title="&lt;not a character&gt;">&nbsp;</td>
			<td class="altid xml5" title="&lt;not a character&gt;">&nbsp;</td>
		</tr>
	</table>
	<h3>
		Beyond the BMP</h3>
	<p>
		The Supplementary Private Use Areas (planes 15 and 16) extend from F0000 through
		10FFFF; both [AltId] and [XML2008] disallow characters in that range.</p>
	<p>
		In addition, [AltId] disallows, as non-characters, the last two code positions of
		each plane, i.e. every position of the form
		<var>
			P</var>FFFE or
		<var>
			P</var>FFFF, for any value of
		<var>
			P</var>.
	</p>
	<p>
		Otherwise, no character outside the BMP is disallowed as an identifier character
		by either specification.</p>
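	<p>
		The rules above for code points beyond the BMP can be sketched in C as follows.
		This is only an illustration of the ranges stated in this section: the function
		names are hypothetical and appear in neither [AltId] nor [XML2008], and every
		function assumes its argument has already been checked to lie in the range 10000
		through 10FFFF.</p>

```c
/* Illustrative sketch only; all names here are hypothetical.
   Precondition for every function: cp lies in 10000..10FFFF. */

/* Supplementary private use: F0000 through 10FFFF. */
static int is_supplementary_private_use(unsigned long cp)
{
    return cp >= 0xF0000UL;
}

/* Last two code positions of a plane: PFFFE or PFFFF. */
static int is_plane_final_noncharacter(unsigned long cp)
{
    return cp % 0x10000UL >= 0xFFFEUL;
}

/* [AltId] beyond the BMP: private use and non-characters are disallowed. */
static int altid_allows_beyond_bmp(unsigned long cp)
{
    if (is_supplementary_private_use(cp))
        return 0;
    if (is_plane_final_noncharacter(cp))
        return 0;
    return 1;
}

/* [XML2008] beyond the BMP: only the private use range is disallowed. */
static int xml2008_allows_beyond_bmp(unsigned long cp)
{
    return !is_supplementary_private_use(cp);
}
```

	<p>
		Under these assumptions the two predicates differ beyond the BMP only at positions
		such as 1FFFE and 1FFFF, which the [AltId] sketch rejects and the [XML2008] sketch
		accepts.</p>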
</body>
</html>
