<html>
<!--
questions:
- deprecate octal constants starting with a leading zero at the same time as introducing this proposal?
- modify also octal escape sequences in character constants as proposed in the addendum?
- allow binary escape sequences in character constants (for coherency)?
- prescribe the behaviour of compilers in case the value of a character literal falls outside the implementation-defined range defined for char. 
- synchronise standardization with C standard?
- propose a change of the C-run time library function strol?
-->
<head>
<title>Oo... adding a coherent character sequence to start octal-literals</title>
<style type="text/css">
  p {text-align:justify}
  li {text-align:justify}
  blockquote.note
  {
    background-color:#E0E0E0;
    padding-left: 15px;
    padding-right: 15px;
    padding-top: 1px;
    padding-bottom: 1px;
  }
  ins {background-color:#A0FFA0}
  del {background-color:#FFA0A0}
</style>
</head>

<body>
<table>
<tr>
  <td align="left">Doc. no.</td>
  <td align="left">P0085R0</td>
</tr>
<tr>
  <td align="left">Date:</td>
  <td align="left">2015-05-08</td>
</tr>
<tr>
  <td align="left">Project:</td>
  <td align="left">ISO JTC1/SC22/WG21: Programming Language C++, evolution group</td>
</tr>
<tr>
  <td align="left">Reply to:</td>
  <td align="left">Michael Jonker &lt;<a href="mailto:Michael.Jonker@cern.ch">Michael.Jonker@cern.ch</a>&gt;, Axel Naumann &lt;<a href="mailto:Axel.Naumann@cern.ch">Axel.Naumann@cern.ch</a>&gt;</td>
</tr>
</table>


<h1 align=center>Oo... adding a coherent character sequence to begin octal-literals</h1>
<!-- style and layout copied from: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4340.html -->
<!-- for submission instructions: https://isocpp.org/std/submit-a-proposal -->

<a name="Proposal"></a><h2>Proposal</h2>
<p>Proposal to add <tt>0o</tt> and <tt>0O</tt> as an alternative (and preferred) sequence to introduce octal-literals.
</p>
<p>
The syntax rule to interpret integer literals starting with a zero as octal-literals probably dates back from the time people were still tinkering with 8 or 16 bit processors in their garages.
This syntax rule is what might be called an historical mistake.
Nowadays, this feature is hardly used and can be easily misunderstood or overseen by novice programmers leading to unexpected errors.
</p>
<p>
To allow future generations (of developers if not compilers) to correct this feature,
I propose to add the character sequence <tt>0o</tt> as an alternative
(and preferred) sequence to introduce an octal-literal.
The prefix <tt>0o</tt> follows the model set by the prefix <tt>0x</tt> to introduce a hex-literal, and (since c++14) <tt>0b</tt> to introduce a binary-literal.
</p>

<p>
From <a href="http://en.wikipedia.org/wiki/Octal">http://en.wikipedia.org/wiki/Octal//en.wikipedia.org/wiki/Octal</a>) :
<i>
"Newer languages have been abandoning the prefix 0, as decimal numbers  are often represented with leading zeroes.
The prefix q was introduced to avoid the prefix o being mistaken for a zero,
while the prefix 0o was introduced to avoid starting a numerical literal with an alphabetic character (like o or q),
since these might cause the literal to be confused with a variable name.
The prefix 0o also follows the model set by the prefix 0x used for hexadecimal literals in the C language;
it is supported by Haskell,[11] OCaml,[12] Perl 6,[13] Python as of version 3.0,[14] Ruby,[15] Tcl as of version 9,[16] and it is intended to be supported by ECMAScript 6[17] (the prefix 0 has been discouraged in ECMAScript 3 and dropped in ECMAScript 5[18])."
</i>
</p>


<a name="Examples"></a><h2>Examples</h2>
<pre>
<code>
// The following literals all specify the same number.

int literal_octal_prefered          = 0o52;
int literal_octal_to_be_deprecated  = 052;
int literal_decimal                 = 42;
int literal_hex                     = 0x2A;
int literal_binary                  = 0b00101010;
</code>
</pre>

<a name="Effects on existing code"></a><h2>Effects on existing code</h2>
<p>
This proposal does not invalidate the existing syntax rule for integer literals starting with a zero.
Also, under the current standard, any sequence starting with 0o is illegal.
As a consequence, the proposed modification will not break existing code.
</p>

<a name="Discussion"></a><h2>Discussion</h2>
<p>
The objective of this proposal is to create the possibility, by adding this more coherent syntax to introduce octal-literals,
to deprecate and phase out the current octal literal syntax in a future version (if so decided).
</p>

<p>
It is recommended that at the places where the 0o sequence is introduced in the documentation, explicit attention is drawn to the existence of an incoherent (but backward compatible) syntax of integer literals starting with a zero.
</p>
<p>
It should be noted that a similar 'clean-up' is required in the specification of the 
<a href="http://pubs.opengroup.org/onlinepubs/009695399/functions/fscanf.html"><tt>scanf</tt></a> and
<a href="http://pubs.opengroup.org/onlinepubs/009695399/functions/strtol.html"><tt>strtol</tt></a> families of C run-time library functions.
The required modifications, although outside the scope of the C++ language, are also detailed below.
</p>

<h2><a name="6.0">Technical Specification</a></h2>

<p>
Make the following edits (relative to
<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296">n4296</a>)
with <ins>insertions</ins> and <del>removals</del> marked like so:
</p>

<h3><a name="2.13.2">2.13.2 Integer literals [lex.icon]</a></h3>
<pre>
      <i>octal-literal:
            <ins>0o octal-digit</ins>
            <ins>0O octal-digit</ins>
            0
            octal-literal <sub>opt</sub> octal-digit</i>
</pre>
<p>
1 An <i>integer literal</i> is a sequence of digits that has no period or exponent part, with optional separating single
quotes that are ignored when determining its value. An integer literal may have a prefix that specifies its base
and a suffix that specifies its type. The lexically first digit of the sequence of digits is the most significant.
A <i>binary</i> integer literal (base two) begins with <tt>0b</tt> or <tt>0B</tt> and consists of a sequence of binary digits.
An <i>octal</i> integer literal (base eight) begins <ins>with <tt>0o</tt>, <tt>0O</tt> or </ins>with the digit <tt>0</tt> and consists of a sequence of octal digits.<sup>22</sup>
A <i>decimal</i> integer literal (base ten) begins with a digit other than <tt>0</tt> and consists of a sequence of decimal digits.
A <i>hexadecimal</i> integer literal (base sixteen) begins with <tt>0x</tt> or <tt>0X</tt> and consists of a sequence of hexadecimal
digits, which include the decimal digits and the letters a through f and A through F with decimal values ten through fifteen.
[ <i>Example:</i> The number twelve can be written 12, <ins>0o14, </ins>014, 0XC, or 0b1100. The literals
1048576, 1048576, 0X100000, 0x100000, and <ins>0o</ins>0004000000 all have the same value.  <i>end example</i> ]
</p>


<h3><a name="A.2">A.2 Lexical conventions [gram.lex]</a></h3>
<pre>
      <i>octal-literal:
            <ins>0o octal-digit</ins>
            <ins>0O octal-digit</ins>
            0
            octal-literal <sub>opt</sub> octal-digit</i>
</pre>

<a name="Addendum"></a><h2>Addendum</h2>
<p>
While browsing the standard for the keyword 'octal', a second occurrence of octal encoded information was spotted in escape sequences of Character literals.
The following modification could be considered in addition to the modification of octal-literals discussed above.
</p>
<p>
Note that the proposed modification hereunder also removes the restriction that limits the number of octal digits in an escape sequence to <tt>3</tt>,
as there is no such restriction for hexadecimal escape sequences.
Of course, the specification that <i>'the value of a character literal is implementation-defined if it falls outside of the implementation-defined range defined for char ...'</i> is itself ill defined.
Better would be to specify explicitly that the compiler produces a warning/error and/or to prescribe that the value equals the specified value modulus the implementation-defined range for char.
</p>

<h3><a name="2.13.3">2.13.3 Character literals [lex.icon]</a></h3>
<pre>
      <i>octal-escape-sequence:
            <ins>\o octal-digit</ins>
            \ octal-digit
            <del>\ octal-digit octal-digit</del>
            <del>\ octal-digit octal-digit octal-digit</del>
            <ins>octal-escape-sequence octal-digit</ins></i>
</pre>

<h4><a name="2.13.3 Table 6">Table 6  Escape sequences (2.13.3)</a></h4>
<pre>
<ins>octal number nnn \onnn</ins>
octal number ooo \ooo
hex number hhh \xhhh
</pre>
<p>
<sup>8</sup> 
<del>The escape <tt>\ooo</tt> consists of the backslash followed by one, two, or three octal digits that are taken to specify the value of the desired character.</del>
<ins>The escape <tt>\onnn</tt> consists of the backslash followed o followed by one or more octal digits that are taken to specify the value of the desired character.
The escape <tt>\ooo</tt> consists of the backslash followed by one or more octal digits that are taken to specify the value of the desired character.
</ins>
The escape <tt>\xhhh</tt> consists of the backslash followed by x followed by one
or more hexadecimal digits that are taken to specify the value of the desired character. There is no limit to
the number of digits in <ins>an octal or </ins>a hexadecimal sequence. A sequence of octal or hexadecimal digits is terminated by
the first character that is not an octal digit or a hexadecimal digit, respectively. The value of a character
literal is implementation-defined if it falls outside of the implementation-defined range defined for char (for
literals with no prefix) or wchar_t (for literals prefixed by L). [ <i>Note:</i> If the value of a character literal
prefixed by u, u8, or U is outside the range defined for its type, the program is ill-formed.  <i>end note</i> ]
</p>
<h3><a name="A.2">A.2 Lexical conventions [gram.lex]</a></h3>
<pre>
      <i>octal-escape-sequence:
            <ins>\o octal-digit</ins>
            \ octal-digit
            <del>\ octal-digit octal-digit</del>
            <del>\ octal-digit octal-digit octal-digit</del>
            <ins>octal-escape-sequence octal-digit</ins></i>
</pre>

<a name="fscan strol"></a><h2>Technical Specification C run time library functions: strol, scanf</h2>
Although related, but outside the scope of the C++ language, proposed modifications to the technical specification of the C run time library functions <tt>strol</tt> and <tt>scanf</tt> are given here for completeness:
<h3><a name="strol">strol</a></h3>
<p>
Make the following edits (relative to
<a href="http://pubs.opengroup.org/onlinepubs/009695399/functions/strtol.html">pubs.opengroup.org:strtol</a>)
with <ins>insertions</ins> and <del>removals</del> marked like so:
</p>
<p>
If the value of base is <tt>0</tt>, the expected form of the subject sequence is that of a decimal constant, <ins>binary constant</ins>, octal constant, or hexadecimal constant,
any of which may be preceded by a <tt>'+'</tt> or <tt>'-'</tt> sign.
A decimal constant <ins>[<i>deprecated:</i></ins> begins with a non-zero digit, and<ins>- <i> end deprecated</i>]</ins> consists of a sequence of decimal digits.
<ins>A binary constant consists of the prefix <tt>0b</tt> or <tt>0B</tt> followed by a sequence of the digits <tt>'0'</tt> or <tt>'1'</tt> only.</ins>
An octal constant consists of <ins>the prefix <tt>0o</tt> or <tt>0O</tt> [<i>deprecated:</i> or</ins> the prefix <tt>'0'</tt>, optionally<ins>- <i>end deprecated]</i></ins> followed by a sequence of the digits <tt>'0'</tt> to <tt>'7'</tt> only.
A hexadecimal constant consists of the prefix <tt>0x</tt> or <tt>0X</tt> followed by a sequence of the decimal digits and letters <tt>'a'</tt> (or <tt>'A'</tt> ) to <tt>'f'</tt> (or <tt>'F'</tt> ) with values 10 to 15 respectively.
</p>
<p>
If the value of base is between 2 and 36, the expected form of the subject sequence is a sequence of letters and digits representing an integer with the radix specified by base,
optionally preceded by a <tt>'+'</tt> or <tt>'-'</tt> sign.
The letters from <tt>'0a'</tt> (or <tt>'A'</tt> ) to <tt>'z'</tt> (or <tt>'Z'</tt> ) inclusive are ascribed the values 10 to 35;
only letters whose ascribed values are less than that of base are permitted.
If the value of base is <ins>2, 8,</ins> 16,
the characters <ins><tt>0b</tt> or <tt>0B</tt>, <tt>0o</tt> or <tt>0O</tt>,</ins> <tt>0x</tt> or <tt>0X</tt> <ins>respectively</ins>
may optionally precede the sequence of letters and digits, following the sign if present.
</p>
<h3><a name="scanf">scanf</a></h3>
As the technical specification of <tt>scanf</tt> (
<a href="http://pubs.opengroup.org/onlinepubs/009695399/functions/scanf.html">pubs.opengroup.org:scanf</a>)
is based on the specification of <tt>strol</tt>, no modification is required to this specification.
In practice, the documentation of <tt>scanf</tt> (in e.g. man pages) is often implicitly expressing the behaviour of the underlying <tt>strol</tt> function.
This documentation should be adapted accordingly. Example:

<pre>
<code>
conversions
i   Matches an optionally signed integer; the next pointer must be a pointer to int.
    The integer is read:
      in base 16 if it begins with <tt>0x</tt> or <tt>0X</tt>,
      in base 8 if it begins with 0<ins><tt>o</tt> or <tt>0O</tt> [<i>deprecated:</I> or 0 - <i>end deprecated</i>]</ins>,
      <ins>in base 2 if it begins with <tt>0b</tt> or <tt>0B</tt></ins>
      and in base 10 otherwise.
    Only characters that correspond to the base are used.
</code>
</pre>
<a name="Acknowledgements"></a><h2>Acknowledgements</h2>
<p>
Thanks to Axel Naumann for guidance, discussion and providing references to the standard documents.
<p>
The document style was borrowed from Doc. no. <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4340.html">N4340</a>
</p>
</body>

</html>
