<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>

<meta http-equiv="Content-Type" content="text/html;charset=US-ASCII">

<style type="text/css">

body { color: #000000; background-color: #FFFFFF; }
del { text-decoration: line-through; color: #8B0040; }
ins { text-decoration: underline; color: #005100; }

p.example { margin: 2em; }
pre.example { margin: 2em; }
div.example { margin: 2em; }

code.extract { background-color: #F5F6A2; }
pre.extract { margin: 2em; background-color: #F5F6A2;
  border: 1px solid #E1E28E; }

p.function { }
p.attribute { text-indent: 3em; }

blockquote.std { color: #000000; background-color: #F1F1F1;
  border: 1px solid #D1D1D1; padding: 0.5em; }
blockquote.stddel { text-decoration: line-through; color: #000000;
  background-color: #FFEBFF; border: 1px solid #ECD7EC; padding: 0.5em; }
blockquote.stdins { text-decoration: underline; color: #000000;
  background-color: #C8FFC8; border: 1px solid #B3EBB3; padding: 0.5em; }

table { border: 1px solid black; border-spacing: 0px;
  border-collapse: collapse; margin-left: auto; margin-right: auto; }
th { text-align: left; vertical-align: top; padding: 0.2em;
  border: 1px solid black; }
td { text-align: left; vertical-align: top; padding: 0.2em;
  border: 1px solid black; }

</style>


<title>More Better Operators</title>
</head>
<body>
<h1>More Better Operators</h1>

<p>
Committee: ISO/IEC JTC1 SC22 WG21 EWG Evolution<br>
Document Number: P0611R0<br>
Date: 2017-03-18<br>
Authors: Lawrence Crowl<br>
Reply To: Lawrence Crowl, Lawrence@Crowl.org<br>
Audience: EWG Evolution
</p>


<h2>Abstract</h2>

<p>
C++ can and should extend and improve its set of operators.
Attention to several approaches and principles
can reduce negative impacts from doing so.
This paper provides two examples of improving operator support.
</p>


<h2>Contents</h2>

<p>
<a href="#Introduction">Introduction</a><br>
<a href="#Problem">Problem</a><br>
<a href="#Solution">Solution</a><br>
<a href="#Examples">Examples Of Change</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Dereference">A Better Dereference Operator</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ExclusiveOr">A Better Exclusive Or Operator</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Exponentiation">A New Exponentiation Operator</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Miscellaneous">Miscellaneous Possibilitites</a><br>
<a href="#Operators">Table of Operators</a><br>
</p>


<h2><a name="Introduction">Introduction</a></h2>

<p>
When introduced,
C was notable for its rich set of operators.
Many subsequent languages have adopted those operators.
They have also extended those operators,
so that today the C operator set appears rather small.
C++ is one of those subsequent languages,
but one that has been more conservative in adding operators.
See the <a href="#Operators">Table of Operators</a>
for the current C and C++ operators.
</p>

<p>
The following table gives a sampling of extended operators
in several languages.
Named operators are particularly problematic,
as sometimes they require more special syntax than operators,
and sometimes they are not listed with the punctuation operators.
I have omitted all compound assignment operators.
Entries may not be correct or complete.
</p>

<table>
<thead>

<tr>
<th>language</th>
<th>operators</th>
</tr>

</thead>
<tbody>

<tr>
<td>C++</td>
<td><code>:: .* -&gt;* new new[] delete delete[] throw
typeof decltype</code></td>
</tr>

<tr>
<td>C#</td>
<td><code>:: ?. ?[] checked unchecked delegate await Is As ?? =&gt;</code></td>
</tr>

<tr>
<td>D</td>
<td><code>is &lt;&gt; &lt;&gt;= !&lt; !&gt; !&lt;= !&gt;= !&lt;&gt; !&lt;&gt;=
in !in &gt;&gt;&gt; ~ new delete cast ^^ ..</code></td>
</tr>

<tr>
<td>Go</td>
<td><code>[:] := &lt;- &^</code></td>
</tr>

<tr>
<td>Groovy</td>
<td><code>new {} .& .@ ?. *. *: ** &gt;&gt;&gt; .. ..&gt; in instanceof as
&lt;=&gt; =~ ==~</code></td>
</tr>

<tr>
<td>Java</td>
<td><code>instanceof &gt;&gt;&gt;</code></td>
</tr>

<tr>
<td>Perl</td>
<td><code>** \ =~ !~ lt gt le ge &lt;=&gt; eq ne cmp ~~ .. ... =&gt;</code></td>
</tr>

<tr>
<td>PHP</td>
<td><code>clone new ** @ instanceof === !== ?? . .=</code></td>
</tr>

<tr>
<td>Python</td>
<td><code>** // &lt;&gt; in 'not in' is 'is not'</code></td>
</tr>

<tr>
<td>Ruby</td>
<td><code>:: []= ** &lt;=&gt; === .eql? equal? =~ !~ .. ... defined?</code></td>
</tr>

<tr>
<td>Swift</td>
<td><code>... ..&lt; ?? &+ &- &*</code> and extensible operators</td>
</tr>

</tbody>
</table>


<h2><a name="Problem">Problem</a></h2>

<p>
How do we extend the operator set in a way
that meets the need for concise syntax
while not producing unparsable or unreadable expressions?
</p>

<p>
In addition, how do we remove operators that are problems?
</p>


<h2><a name="Solution">Solution</a></h2>

<p>
There are several approaches to adding operators.
</p>

<dl>

<dt><b>
Extend the token set with new combinations of punctuation characters.
</b></dt>
<dd>

<p>
Existing examples include <code>::</code> and <code>.*</code>.
There exist proposals suggesting <code>&lt;=&gt;</code> and <code>-&gt;</code>.
</p>

<p>
One problem with this approach is that
what was formerly two tokens can become one token.
For example, <code>a**b</code> changes meaning
if one adds a Fortran-style <code>**</code> exponentiation operator.
</p>

<p>
A careful examination of possible adjacent operators
will reduce the likelyhood of problematic operator introduction.
In general, one should avoid concatenation ambiguity
between postfix operators and infix operators
as well as between infix operators and prefix operators.
</p>

<p>
Some rare ambiguities may be acceptable.
For example, the code <code>a//*here*/b</code>
changed meaning with the introduction of <code>//</code> comments.
This change was not a significant problem in practice.
Automated tools could help.
</p>

</dd>


<dt><b>
Extend the token set with new keywords.
</b></dt>
<dd>

<p>
Examples include <code>new</code> and <code>throw</code>.
</p>

<p>
One problem with this approach is that
identifers in use in existing programs can suddenly become ill-formed.
I had one customer strenuously objecting
to introduction of the <code>export</code> keyword
invalidating the public interface of his library.
</p>

<p>
The policy of vetting keywords by searching open source software
has reduced the severity of this problem.
</p>

</dd>


<dt><b>
Extend the 'fixity' of exiting operators.
</b></dt>
<dd>

<p>
The most widely known example of this kind of extension
is that of using the infix subtraction operator <code>-</code>
as prefix negation operator.
An example more specific to C and C++
is the extension of the infix operator <code>*</code> (multiply)
to the prefix operator <code>*</code> (dereference).
</p>

<p>
While all such 'fixity' extensions will yield parsable expressions,
such expression may not be easily written or read.
To avoid confusion, it is probably best that
any given operator have at most two of the three fixities.
</p>

<p>
A more subtle problem is adding a fixity
that would naturally bring two tokens together
that would be interpreted as a different combined token.
The best example of this problem
is the introduction of the <code>&gt;</code>
as a postfix 'close template argument list' operator.
Closing two lists simultaneously led to the right shift operator.
</p></dd>


<dt><b>
Extend the basic character set to enable more operators.
</b></dt>
<dd>

<p>
Some programming languages use Unicode (ISO 10646)
as their base character set
and allow use of the rich set of symbols within various code pages.
While this approach is certainly viable for C and C++,
it could cause compatiblity problems.
</p>

<p>
C and C++ predate Unicode and presently depend on 'common' characters.
These characters are mostly the non-national-use characters
of the ISO 646 collection of character sets.
However, C and C++ do use some ASCII-specific characters.
Digraphs and trigraphs enable avoiding these characters
when using a non-US variant of ISO 646
or when using other character sets, e.g. EBCDIC.
</p>

<p>
There are two ASCII characters that are not in use by C and C++,
<code>@</code> (commercial at)
and
<code>`</code> (accent grave).
Of those, <code>@</code> is in use by Objective C and Objective C++.
These languages are meta-languages built on top of C and C++.
In fact, other programmers may often need to build meta-languages
on top of C and C++.
Furthermore, they may need to compose two different meta language.
That composition is possible in C and C++ today precisely
because they do not use <code>@</code> and <code>`</code>.  
I strongly recommend the C and C++
do not use those characters.
</p>

</dd>

</dl>

<p>
No approach above is likely to be sufficient alone.
Furthermore,
all approaches have the potential to invalidate existing code.
If designers investigate potential invalidations
and provide migration strategies,
we can sanely extend the operator set.
</p>


<h2><a name="Examples">Examples Of Change</a></h2>

<p>
In this section,
I outline some changes to operators
that I think would improve C and C++.
</p>


<h3><a name="Dereference">A Better Dereference Operator</a></h3>

<p>
The prefix dereference operator
is a known and persistent problem for programmers.
It complicates declaration and expression syntax.
It occasionally requires grouping parentheses.
It occasionally requires reading expressions inside-out.
</p>

<p>
Solution is a postfix dereference operator.
The only parenthesis required are those for function calls.
Declarations (and their corresponding expressions)
are read strictly left-to-right,
with the exception of the final type.
</p>

<table>

<tr>
<th>prefix</th>
<th>postfix</th>
</tr>

<tr>
<td><code>int *v</code></td>
<td><code>int v^</code></td>
</tr>

<tr>
<td><code>int *a[6]</code></td>
<td><code>int a[6]^</code></td>
</tr>

<tr>
<td><code>int *f(int)</code></td>
<td><code>int f(int)^</code></td>
</tr>

<tr>
<td><code>int (*f)(int)</code></td>
<td><code>int f^(int)</code></td>
</tr>

<tr>
<td><code>int *(*f[6])(int)</code></td>
<td><code>int f[6]^(int)^</code></td>
</tr>

<tr>
<td><code>int *(*(*f)[6])(int)</code></td>
<td><code>int f^[6]^(int)^</code></td>
</tr>

</table>

<p>
Once a postifix dereference operator is in place,
the prefix dereference operator becomes redundant.
It can be removed at some later time.
After the prefix dereference operator is gone,
that operator space could be reused for another purpose.
This timeline leads to the following migration strategy.
</p>

<table>
<thead>

<tr>
<th>Standards</th>
<th>Programmers</th>
</tr>

</thead>
<tbody>

<tr>
<td>add <code>a^</code> as dereference</td>
<td rowspan=2>replace <code>*a</code> expressions with <code>a^</code></td>
</tr>

<tr>
<td>deprecate <code>*a</code></td>
</tr>

<tr>
<td>remove <code>*a</code></td><td>wait</td>
</tr>

<tr>
<td>add <code>a**b</code> for exponentiation
<br>and/or <code>*a</code> for something else</td>
<td>use that meaning</td>
</tr>

</tbody>
</table>


<h3><a name="ExclusiveOr">A Better Exclusive Or Operator</a></h3>

<p>
The expressions <code>a-b</code> and <code>-b</code>
are natural pairs in the sense that
the second can be rewritten as the first with a constant;
<code>-b</code> is <code>0-b</code>.
</p>

<p>
The same is true of one's complement and exclusive or;
<code>~b</code> is <code>ALLONES^b</code>.
However, they don't use the same operator.
They could though,
as there is no current use of <code>~</code> as an infix operator.
Indeed, PL/I took this approach
where the operator was the &not; symbol.
</p>

<p>
Changing the exclusive or operator to <code>~</code>
would free up the infix <code>^</code> operator for other purposes.
The most pressing use is exponentiation.
It would not suprise me if most uses of binary <code>^</code>
in C/C++ comments is as an exponentiation operator,
not an exclusive or operator.
This leads to the following migration strategy.
</p>

<table>
<thead>

<tr>
<th>Standards</th>
<th>Programmers</th>
</tr>

</thead>
<tbody>

<tr>
<td>add <code>a~b</code> as xor
<br>redefine <code>xor</code> to <code>~</code>
<br>redefine <code>xor_eq</code> to <code>~=</code></td>
<td rowspan=2>replace uses of token <code>^</code> with <code>~</code></td>
</tr>

<tr>
<td>deprecate <code>a^b</code></td>
</tr>

<tr>
<td>remove <code>a^b</code></td><td>wait</td>
</tr>

<tr>
<td>add <code>a^b</code> as exponentiation
<br>add <code>pwr</code> as <code>^</code></td>
<td>use <code>a^b</code> for exponentiation</td>
</tr>

</tbody>
</table>

<p>
Standards could either retain or remove token <code>xor</code>.
The migration strategy for removal follows.
</p>

<table>
<thead>

<tr>
<th>Standards</th>
<th>Programmers</th>
</tr>

</thead>
<tbody>

<tr>
<td>wait</td>
<td rowspan=2>replace uses of token <code>xor</code> with <code>compl</code></td>
</tr>

<tr>
<td>deprecate <code>xor</code></td>
</tr>

<tr>
<td>remove <code>xor</code></td><td>do nothing</td>
</tr>

</tbody>
</table>


<h3><a name="Exponentiation">A New Exponentiation Operator</a></h3>

<p>
Both of the operator improvements above
provide space for two different exponentiation (power) operators.
</p>

<p>
That space is needed because:
</p>

<ul>

<li>
The C <code>pow</code> function means different things in different libraries.
</li>

<li>
Use of the <code>pow</code> function
is not a clean mapping from typical mathematical expressions.
</li>

<li>
A floating-point <code>pow</code> function
is not a replacement for a integer power operation.
</li>

</ul>

<p>
I prefer the <code>^</code> operator
to the <code>**</code> operator,
but <code>**</code> has the advange of not using national-use characters
and hence needing no lexical work-arounds.
</p>


<h3><a name="Miscellaneous">Miscellaneous Possibilitites</a></h3>

<p>
To round out the examples,
let me provide some short hints at possibilities.
</p>

<table>
<thead>
<tr><th>expression</th><th>meaning</th></tr>
</thead>

<tbody>
<tr><td><code>/a</code></td><td>inversion <code>1/a</code></td></tr>
<tr><td><code>^a</code></td><td>exp function <code>e^a</code></td></tr>
<tr><td><code>!!a</code></td><td>logical identity <code>! !a</code></td></tr>
<tr><td><code>a!!b</code></td><td>logical xor <code>!a~!b</code></td></tr>
<tr><td><code>a~~b</code></td><td>logical xor <code>!a~!b</code></td></tr>
<tr><td><code>|a</code></td><td>absolute value <code>a<0?-a:a</code></td></tr>
</tbody>

</table>


<h2><a name="Operators">Table of Operators</a></h2>

<p>
This table merges the operators of both C and C++.
It extends the precedence levels as necessary
to assign a shared consistent level.
</p>

<table>
<thead>

<tr>
<th>operators</th>
<th>prefix</th>
<th>infix</th>
<th>postfix</th>
</tr>

</thead>
<tbody>

<tr>
<td><code>::</code></td>
<td>1L scope (C++)</td>
<td>1L scope (C++)</td>
<td></td>
</tr>

<tr>
<td><code>()</code></td>
<td>3R C-style cast (around type)
<br>0 group (around expression)</td>
<td></td>
<td>2L functional cast (after type) (C++)
<br>2L call (after expression)</td>
</tr>

<tr>
<td><code>{} <%%></code></td>
<td>0 list (a literal in C)</td>
<td></td>
<td>2L functional cast (after type)</td>
</tr>

<tr>
<td><code>[] <::></code></td>
<td>0 lambda (C++)</td>
<td></td>
<td>2L index</td>
</tr>

<tr>
<td><code>. -&gt;</code></td>
<td></td>
<td>2L member</td>
<td></td>
</tr>

<tr>
<td><code>sizeof</code>
<br><code>_Alignof</code> (C)
<br><code>alignof</code> (C++)</td>
<td>3R type attribute</td>
<td></td>
<td></td>
</tr>

<tr>
<td><code>new new[]
<br>delete delete[]</code></td>
<td>3R allocation (C++)</td>
<td></td>
<td></td>
</tr>

<tr>
<td><code>++ --</code></td>
<td>3R pre-(in|dec)crement</td>
<td></td>
<td>2L post-(in|dec)crement</td>
</tr>

<tr>
<td><code>~ compl</code></td>
<td>3R bitwise not</td>
<td></td>
<td></td>
</tr>

<tr>
<td><code>! not</code></td>
<td>3R logical not</td>
<td></td>
<td></td>
</tr>

<tr>
<td><code>.* -&gt;*</code></td>
<td></td>
<td>4L member pointer (C++)</td>
<td></td>
</tr>

<tr>
<td><code>*</code></td>
<td>3R dereference</td>
<td>5L multiply</td>
<td></td>
</tr>

<tr>
<td><code>/</code></td>
<td></td>
<td>5L divide</td>
<td></td>
</tr>

<tr>
<td><code>%</code></td>
<td><var>potential digraph conflict</var></td>
<td>5L modulo</td>
<td><var>potential digraph conflict</var></td>
</tr>

<tr>
<td><code>+</code></td>
<td>3R promote</td>
<td>6L add</td>
<td></td>
</tr>

<tr>
<td><code>-</code></td>
<td>3R negate</td>
<td>6L subtract</td>
<td></td>
</tr>

<tr>
<td><code>&lt;&lt; &gt;&gt;</code></td>
<td></td>
<td>7L shift</td>
<td></td>
</tr>

<tr>
<td><code>&lt;</code></td>
<td></td>
<td>8L order
<br>? template argument</td>
<td></td>
</tr>

<tr>
<td><code>&gt;</code></td>
<td></td>
<td>8L order</td>
<td>? template argument</td>
</tr>

<tr>
<td><code>&lt;= &gt;=</code></td>
<td></td>
<td>8L order</td>
<td></td>
</tr>

<tr>
<td><code>== != not_eq</code></td>
<td></td>
<td>9L equality</td>
<td></td>
</tr>

<tr>
<td><code>&amp; bitand</code></td>
<td>3R address-of</td>
<td>10L bitwise and</td>
<td></td>
</tr>

<tr>
<td><code>^ xor</code></td>
<td></td>
<td>11L bitwise xor</td>
<td></td>
</tr>

<tr>
<td><code>| bitor</code></td>
<td></td>
<td>12L bitwise or</td>
<td></td>
</tr>

<tr>
<td><code>&amp;&amp; and</code></td>
<td></td>
<td>13L logical and</td>
<td></td>
</tr>

<tr>
<td><code>|| or</code></td>
<td></td>
<td>14L logical or</td>
<td></td>
</tr>

<tr>
<td><code>?:</code></td>
<td></td>
<td>15R conditional (C)
<br>16R conditional (C++)</td>
<td></td>
</tr>

<tr>
<td><code>throw</code></td>
<td>16R throw (C++)</td>
<td></td>
<td></td>
</tr>

<tr>
<td><code>=
<br>*= /= %=
<br>+= -=
<br>&lt;&lt;= &gt;&gt;=
<br>&amp;= ^= |=
<br>and_eq xor_eq or_eq</code></td>
<td></td>
<td>16R assignment</td>
<td></td>
</tr>

<tr>
<td><code>,</code></td>
<td></td>
<td>argument separator (within call)
<br>element separator (within list)
<br>17L sequence (otherwise)</td>
<td></td>
</tr>

</tbody>
</table>

</body>
</html>
