<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>

<meta http-equiv="Content-Type" content="text/html;charset=US-ASCII">

<style type="text/css">

body { color: #000000; background-color: #FFFFFF; }
del { text-decoration: line-through; color: #8B0040; }
ins { text-decoration: underline; color: #005100; }

p.example { margin-left: 2em; }
pre.example { margin-left: 2em; }
div.example { margin-left: 2em; }

code.extract { background-color: #F5F6A2; }
pre.extract { margin-left: 2em; background-color: #F5F6A2;
  border: 1px solid #E1E28E; }

p.function { }
.attribute { margin-left: 2em; }
.attribute dt { float: left; font-style: italic;
  padding-right: 1ex; }
.attribute dd { margin-left: 0em; }

blockquote.std { color: #000000; background-color: #F1F1F1;
  border: 1px solid #D1D1D1;
  padding-left: 0.5em; padding-right: 0.5em; }
blockquote.stddel { text-decoration: line-through;
  color: #000000; background-color: #FFEBFF;
  border: 1px solid #ECD7EC;
  padding-left: 0.5em; padding-right: 0.5em; }

blockquote.stdins { text-decoration: underline;
  color: #000000; background-color: #C8FFC8;
  border: 1px solid #B3EBB3; padding: 0.5em; }

table { border: 1px solid black; border-spacing: 0px;
  margin-left: auto; margin-right: auto; }
th { text-align: left; vertical-align: top;
  padding-left: 0.8em; border: none; }
td { text-align: left; vertical-align: top;
  padding-left: 0.8em; border: none; }

</style>

<title>Digit Separators</title>
</head>

<body>
<h1>Digit Separators</h1>

<p>
ISO/IEC JTC1 SC22 WG21 N3661 - 2013-04-19
</p>

<address>
Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org
</address>

<p>
<a href="#Problem">Problem</a><br>
<a href="#Solution">Solution</a><br>
<a href="#Proposal">Proposal</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#new.lex.icon">2.14.2 Integer literals [lex.icon]</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#new.lex.fcon">2.14.4 Floating literals [lex.fcon]</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#new.lex.ext">2.14.8 User-defined literals [lex.ext]</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#diff.cpp11.literalsep">C.<var>new</var>.<var>new</var> Clause 2: lexical conventions [diff.cpp11.lex]</a><br>
</p>

<h2><a name="Problem">Problem</a></h2>

<p>
Numeric literals of more than a few digits are hard to read.
Consider the following tasks.
</p>

<ul>
<li>Pronounce <code>7237498123</code>.</li>
<li>Compare <code>237498123</code>
with <code>237499123</code> for equality.</li>
<li>Decide whether <code>237499123</code>
or <code>20249472</code> is larger.</li>
</ul>


<h2><a name="Solution">Solution</a></h2>

<p>
The problem has a long-standing solution in writing and typography:
digit separators.
In the English-speaking world,
commas are usually used to separate digits.
</p>

<ul>
<li>Pronounce <code>7,237,498,123</code>.</li>
<li>Compare <code>237,498,123</code>
with <code>237,499,123</code> for equality.</li>
<li>Decide whether <code>237,499,123</code>
or <code>20,249,472</code> is larger.</li>
</ul>

<p>
We wish to introduce digit separators into C++.
Much discussion of constraints and alternatives appears in N3499.
We propose using an underscore (aka low line) as a digit separator
and a double radix point (aka double dot) as a disambiguating suffix separator.
</p>


<h2><a name="Proposal">Proposal</a></h2>

<h3><a name="new.lex.icon">2.14.2 Integer literals [lex.icon]</a></h3>

<p>
Edit the grammar as follows.
Editor, note the change to the binary literal syntax
as described in N3472.
</p>

<blockquote>
<dl>
<dt><var>integer-literal:</var></dt>
<dd><var>decimal-literal integer-suffix<sub>opt</sub></var></dd>
<dd><var>octal-literal integer-suffix<sub>opt</sub></var></dd>
<dd><var>hexadecimal-literal integer-suffix<sub>opt</sub></var></dd>
<dt><var>decimal-literal:</var></dt>
<dd><var>nonzero-digit</var></dd>
<dd><var>decimal-literal <ins>digit-separator<sub>opt</sub></ins>
digit</var></dd>
<dt><var>octal-literal:</var></dt>
<dd><code>0</code></dd>
<dd><var>octal-literal <ins>digit-separator<sub>opt</sub></ins>
octal-digit</var></dd>
<dt><var>hexadecimal-literal:</var></dt>
<dd><code>0x</code> <var>hexadecimal-digit</var></dd>
<dd><code>0X</code> <var>hexadecimal-digit</var></dd>
<dd><var>hexadecimal-literal <ins>digit-separator<sub>opt</sub></ins>
hexadecimal-digit</var></dd>
<dt><var>binary-literal:</var></dt>
<dd><code>0b</code> <var>binary-digit</var></dd>
<dd><code>0B</code> <var>binary-digit</var></dd>
<dd><var>binary-literal <ins>digit-separator<sub>opt</sub></ins>
binary-digit</var></dd>
<dt><var>nonzero-digit:</var> one of</dt>
<dd><code>1 2 3 4 5 6 7 8 9</code></dd>
<dt><var>octal-digit:</var> one of</dt>
<dd><code>0 1 2 3 4 5 6 7</code></dd>
<dt><var>hexadecimal-digit:</var> one of</dt>
<dd><code>0 1 2 3 4 5 6 7 8 9</code></dd>
<dd><code>a b c d e f</code></dd>
<dd><code>A B C D E F</code></dd>
<dt><ins><var>digit-separator:</var></ins></dt>
<dd><ins><code>_</code></ins></dd>
</dl>
</blockquote>

<p>
Edit paragraph 1 as follows.
</p>

<blockquote>
<p>
An <dfn>integer literal</dfn>
is a sequence of digits
that has no period or exponent part<ins>,
with optional digit separators.
These separators are ignored when determining its value</ins>.
....
[<i>Example:</i>
<del>the</del> <ins>The</ins> number twelve can be written
<code>12</code>, <code>014</code>, or <code>0XC</code>.
<ins>The literals
<code>1048576</code>,
<code>1_048_576</code>,
<code>0X100000</code>,
<code>0x10_0000</code>, and
<code>0_004_000_000</code>
all have the same value.</ins>
&mdash;<i>end example</i>]
</p>
</blockquote>
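
<p>
As an illustration only, and not part of the proposed wording,
the following C++ sketch computes the value of an unsuffixed integer literal
spelled with the proposed separators.
Because the syntax is only proposed here,
the sketch works on the spelling of the literal as a string.
The names <code>digit_value</code>, <code>has_prefix</code>,
and <code>proposed_integer_value</code> are hypothetical,
and overflow checking is omitted.
</p>

<pre class="example">
#include &lt;cstddef&gt;
#include &lt;cstdint&gt;
#include &lt;optional&gt;
#include &lt;string_view&gt;

// Hypothetical helper: value of one digit in the given base,
// or -1 if the character is not a digit of that base.
int digit_value(char c, int base) {
    int v = -1;
    if (c &gt;= '0' &amp;&amp; c &lt;= '9') v = c - '0';
    else if (c &gt;= 'a' &amp;&amp; c &lt;= 'f') v = c - 'a' + 10;
    else if (c &gt;= 'A' &amp;&amp; c &lt;= 'F') v = c - 'A' + 10;
    return (v &gt;= 0 &amp;&amp; v &lt; base) ? v : -1;
}

// Hypothetical helper: true if text starts with '0' followed by
// the given lower-case or upper-case prefix letter.
bool has_prefix(std::string_view text, char lower, char upper) {
    return text.size() &gt;= 2 &amp;&amp; text[0] == '0' &amp;&amp;
           (text[1] == lower || text[1] == upper);
}

// Value of an unsuffixed integer literal written with the proposed
// '_' digit separators, or std::nullopt if the spelling does not match
// the proposed grammar. Overflow is not checked in this sketch.
std::optional&lt;std::uint64_t&gt; proposed_integer_value(std::string_view text) {
    int base = 10;
    std::size_t i = 0;
    bool need_digit = false;  // true right after a prefix or a separator
    if (has_prefix(text, 'x', 'X')) {
        base = 16; i = 2; need_digit = true;
    } else if (has_prefix(text, 'b', 'B')) {
        base = 2; i = 2; need_digit = true;
    } else if (!text.empty() &amp;&amp; text[0] == '0') {
        base = 8; i = 1;  // the leading 0 is itself an octal-literal
    } else if (text.empty() || digit_value(text[0], 10) &lt; 1) {
        return std::nullopt;  // a decimal-literal starts with a nonzero-digit
    }
    std::uint64_t value = 0;
    for (; i &lt; text.size(); ++i) {
        if (text[i] == '_') {
            if (need_digit) return std::nullopt;  // separator needs a digit before it
            need_digit = true;
            continue;  // separators do not contribute to the value
        }
        const int d = digit_value(text[i], base);
        if (d &lt; 0) return std::nullopt;
        value = value * static_cast&lt;std::uint64_t&gt;(base)
              + static_cast&lt;std::uint64_t&gt;(d);
        need_digit = false;
    }
    if (need_digit) return std::nullopt;  // bare prefix or trailing separator
    return value;
}
</pre>

<p>
Under these assumptions,
all five spellings in the example above yield 1048576,
while <code>1__0</code> and <code>0x_10</code> are rejected,
because the grammar places a separator only between two digits.
</p>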


<h3><a name="new.lex.fcon">2.14.4 Floating literals [lex.fcon]</a></h3>

<p>
Edit the grammar as follows.
</p>

<blockquote>
<dl>
<dt><var>floating-literal:</var></dt>
<dd><var>fractional-constant exponent-part<sub>opt</sub>
floating-suffix<sub>opt</sub></var></dd>
<dd><var>digit-sequence exponent-part floating-suffix<sub>opt</sub></var></dd>
<dt><var>fractional-constant:</var></dt>
<dd><var>digit-sequence<sub>opt</sub></var> <code>.</code>
<var>digit-sequence</var></dd>
<dd><var>digit-sequence</var> <code>.</code></dd>
<dt><var>exponent-part:</var></dt>
<dd><code>e</code> <var>sign<sub>opt</sub> digit-sequence</var></dd>
<dd><code>E</code> <var>sign<sub>opt</sub> digit-sequence</var></dd>
<dt><var>sign:</var> one of</dt>
<dd><code>+ -</code></dd>
<dt><var>digit-sequence:</var></dt>
<dd><var>digit</var></dd>
<dd><var>digit-sequence <ins>digit-separator<sub>opt</sub></ins>
digit</var></dd>
</dl>
</blockquote>

<p>
Edit within paragraph 1 as follows.
</p>

<blockquote>
<p>
....
The integer and fraction parts
both consist of a sequence of decimal (base ten) digits<ins>,
with optional digit separators</ins>.
<ins>These separators are ignored when determining the value.
[<i>Example:</i>
The literals <code>1.602_176_565e-19</code>
and <code>1.602176565e-19</code>
have the same value.
&mdash;<i>end example</i>]</ins>
....
</p>
</blockquote>


<h3><a name="new.lex.ext">2.14.8 User-defined literals [lex.ext]</a></h3>

<p>
Edit the grammar as follows.
Editor, note the change to the binary literal syntax
as described in N3472.
</p>

<blockquote>
<dl>
<dt><var>user-defined-literal:</var></dt>
<dd><var>user-defined-integer-literal</var></dd>
<dd><var>user-defined-floating-literal</var></dd>
<dd><var>user-defined-string-literal</var></dd>
<dd><var>user-defined-character-literal</var></dd>
<dt><var>user-defined-integer-literal:</var></dt>
<dd><var>decimal-literal
<del>ud-suffix</del> <ins>separated-suffix</ins></var></dd>
<dd><var>octal-literal
<del>ud-suffix</del> <ins>separated-suffix</ins></var></dd>
<dd><var>hexadecimal-literal
<del>ud-suffix</del> <ins>separated-suffix</ins></var></dd>
<dd><var>binary-literal
<del>ud-suffix</del> <ins>separated-suffix</ins></var></dd>
<dt><var>user-defined-floating-literal:</var></dt>
<dd><var>fractional-constant exponent-part<sub>opt</sub>
<del>ud-suffix</del> <ins>separated-suffix</ins></var></dd>
<dd><var>digit-sequence exponent-part
<del>ud-suffix</del> <ins>separated-suffix</ins></var></dd>
<dt><var>user-defined-string-literal:</var></dt>
<dd><var>string-literal ud-suffix</var></dd>
<dt><var>user-defined-character-literal:</var></dt>
<dd><var>character-literal ud-suffix</var></dd>
<dt><ins><var>separated-suffix:</var></ins></dt>
<dd><ins><var>suffix-separator<sub>opt</sub> ud-suffix</var></ins></dd>
<dt><ins><var>suffix-separator:</var></ins></dt>
<dd><ins><code>..</code></ins></dd>
<dt><var>ud-suffix:</var></dt>
<dd><var>identifier</var></dd>
</dl>
</blockquote>

<p>
Edit paragraph 1 as follows.
</p>

<blockquote>
<p>
If a token matches both <var>user-defined-literal</var>
and another literal kind,
it is treated as the latter.
[<i>Example:</i>
<code>123_km</code> <ins>and <code>123.._km</code></ins>
<del>is a <var>user-defined-literal</var></del>
<ins>are <var>user-defined-literal</var>s</ins>,
but <ins><code>123_456</code> and</ins> <code>12LL</code>
<del>is an <var>integer-literal</var></del>
<ins>are <var>integer-literal</var>s</ins>.
&mdash;<i>end example</i>]
The syntactic non-terminal preceding the <var>ud-suffix</var>
<ins>or <var>separated-suffix</var></ins>
in a <var>user-defined-literal</var>
is taken to be the longest sequence of characters
that could match that non-terminal.
</p>
</blockquote>

<h3><a name="diff.cpp11.literalsep">C.<var>new</var>.<var>new</var> Clause 2: lexical conventions [diff.cpp11.lex]</a></h3>

<p>Add a new section as follows.
Editor: please incorporate with N3652.</p>

<p>Add the new text block below.</p>

<blockquote class="stdins">

<p>
2.14 [lex.literal]
</p>

<p>
<b>Change:</b>
Digit separator support.
</p>

<p>
<b>Rationale:</b>
Required for new features.
</p>

<p>
<b>Effect on original feature:</b>
Valid C++ 2011 code may change meaning,
and hence possibly fail to compile,
in this International Standard.
A user-defined literal suffix that
begins with an underscore followed by
a character that may be interpreted as a digit
within the context of the enclosing literal
may change meaning.
For example,
<code>10_10</code>
changes from integer <code>10</code> with a suffix of <code>_10</code>
to an integer <code>1010</code>.
The original meaning can be restored with
<code>10.._10</code>.
The literal <code>0x1234_goo</code>
has suffix <code>_goo</code>
but the literal <code>0x1234_foo</code>
has suffix <code>oo</code>.
The literal <code>0x1234.._foo</code>
has suffix <code>_foo</code>.
</p>
</blockquote>
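
<p>
The examples above can be reproduced with a small C++ sketch
of the longest-match rule.
It is illustrative only and not part of the proposed wording.
It handles decimal and hexadecimal integer tokens only,
does not validate octal digits,
and does not distinguish standard suffixes such as <code>LL</code>
from user-defined ones.
The names <code>SplitLiteral</code>, <code>is_digit_of</code>,
and <code>split_proposed_literal</code> are hypothetical.
</p>

<pre class="example">
#include &lt;cstddef&gt;
#include &lt;string&gt;
#include &lt;string_view&gt;

// Hypothetical result type for this sketch.
struct SplitLiteral {
    std::string number;  // prefix and digits, with separators removed
    std::string suffix;  // trailing identifier, possibly empty
};

// True if c is a digit of the literal's base (decimal or hexadecimal).
bool is_digit_of(char c, bool hex) {
    if (c &gt;= '0' &amp;&amp; c &lt;= '9') return true;
    return hex &amp;&amp; ((c &gt;= 'a' &amp;&amp; c &lt;= 'f') || (c &gt;= 'A' &amp;&amp; c &lt;= 'F'));
}

// Splits a token into its numeric part and its suffix, taking the
// longest sequence of characters that can match the numeric part.
SplitLiteral split_proposed_literal(std::string_view token) {
    const bool hex = token.size() &gt;= 2 &amp;&amp; token[0] == '0' &amp;&amp;
                     (token[1] == 'x' || token[1] == 'X');
    const std::size_t start = hex ? 2 : 0;
    std::size_t i = start;
    while (i &lt; token.size()) {
        if (is_digit_of(token[i], hex)) {
            ++i;
        } else if (token[i] == '_' &amp;&amp; i &gt; start &amp;&amp; i + 1 &lt; token.size() &amp;&amp;
                   is_digit_of(token[i + 1], hex)) {
            i += 2;  // a separator together with the digit that follows it
        } else {
            break;   // anything else begins the suffix
        }
    }
    SplitLiteral result;
    for (char c : token.substr(0, i)) {
        if (c != '_') result.number += c;
    }
    std::string_view rest = token.substr(i);
    if (rest.substr(0, 2) == "..") rest.remove_prefix(2);  // suffix-separator
    result.suffix = std::string(rest);
    return result;
}
</pre>

<p>
With this sketch,
<code>0x1234_foo</code> splits into the number <code>0x1234f</code>
and the suffix <code>oo</code>,
whereas <code>0x1234.._foo</code> splits into <code>0x1234</code>
and <code>_foo</code>.
</p>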

</body>
</html>
