<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>

<meta http-equiv="Content-Type" content="text/html;charset=US-ASCII">

<style type="text/css">

body { color: #000000; background-color: #FFFFFF; }
del { text-decoration: line-through; color: #8B0040; }
ins { text-decoration: underline; color: #005100; }

p.example { margin-left: 2em; }
pre.example { margin-left: 2em; }
div.example { margin-left: 2em; }

code.extract { background-color: #F5F6A2; }
pre.extract { margin-left: 2em; background-color: #F5F6A2;
  border: 1px solid #E1E28E; }

p.function { }
.attribute { margin-left: 2em; }
.attribute dt { float: left; font-style: italic;
  padding-right: 1ex; }
.attribute dd { margin-left: 0em; }

blockquote.std { color: #000000; background-color: #F1F1F1;
  border: 1px solid #D1D1D1;
  padding-left: 0.5em; padding-right: 0.5em; }
blockquote.stddel { text-decoration: line-through;
  color: #000000; background-color: #FFEBFF;
  border: 1px solid #ECD7EC;
  padding-left: 0.5em; padding-right: 0.5em; }

blockquote.stdins { text-decoration: underline;
  color: #000000; background-color: #C8FFC8;
  border: 1px solid #B3EBB3; padding: 0.5em; }

table { border: 1px solid black; border-spacing: 0px;
  margin-left: auto; margin-right: auto; }
th { text-align: left; vertical-align: top;
  padding-left: 0.8em; border: none; }
td { text-align: left; vertical-align: top;
  padding-left: 0.8em; border: none; }

</style>

<title>Pass by Const Reference or Value</title>
</head>

<body>
<h1>Pass by Const Reference or Value</h1>

<p>
ISO/IEC JTC1 SC22 WG21 N3538 - 2013-03-06
</p>

<address>
Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org
</address>

<p>
<a href="#Introduction">Introduction</a><br>
<a href="#Background">Background</a><br>
<a href="#Problem">Problem</a><br>
<a href="#Approaches">Approaches</a><br>
<a href="#Solution">Solution</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Either">Either Or</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Undefined">Undefined Behavior</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Dealiasing">Compiler Dealiasing</a><br>
<a href="#Discussion">Discussion</a><br>
<a href="#Revision">Revision History</a><br>
<a href="#References">References</a><br>
</p>


<h2><a name="Introduction">Introduction</a></h2>

<p>
Efficiency and expressiveness are hallmarks of C++,
but sometimes those hallmarks conflict in ways
that force programmers to compromise on one or the other.
The programmer's choice
of passing a given argument by const reference or by value
is one of those compromises.
That compromise has become more of a problem
with modern calling conventions.
</p>

<p>
In this paper,
we
describe the problem,
discuss possible approaches to reduce the problem,
and explore a solution that introduces a new language feature.
</p>


<h2><a name="Background">Background</a></h2>

<p>
Consider the following code for some class <code>type</code>.
</p>

<pre class="example">
<code>extern type va1(type input);
extern type va2(type input);

void <a name="vf1">vf1</a>(type&amp; output, type input) {
    output += va1(input);
    output += va2(input);
}</code>
</pre>

<p>
Passing class types by value is expensive,
as the compiler must often
</p>

<ul>
<li>allocate a temporary local variable of the type
(increasing cache pressure),</li>
<li>copy the bytes of the argument to the temporary,</li>
<li>pass a pointer to the temporary into the function,</li>
<li>access the bytes of the parameter indirectly, and</li>
<li>deallocate the temporary on return.</li>
</ul>

<p>
To avoid this expense,
the traditional response in C
is to pass the type "by pointer".
However, in C++,
the type can be passed by const reference,
which provides both
the syntactic convenience of pass-by-value and
the efficiency of pass-indirectly.
</p>

<p>
The performance difference, coupled with the convenience,
has resulted in an automatic tendency of programmers
to pass classes by const reference.
The class template <code>complex</code>
(26.4 Complex numbers [complex.numbers])
is a prime example of this tendency.
</p>

<p>
Unfortunately, the semantics are not the same,
as there is now the potential for a parameter
to alias with other objects.
In the following rewrite of the above example,
the input parameter may alias with the output parameter
and so the first statement may modify the input parameter,
thus introducing an error into the code
at the second statement.
</p>

<pre class="example">
<code>extern type ra1(const type&amp; input);
extern type ra2(const type&amp; input);

void rf1(type&amp; output, const type&amp; input) {
    output += ra1(input);
    output += ra2(input);
}</code>
</pre>
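
<p>
As a minimal sketch of the hazard,
assume a hypothetical class <code>num</code> standing in for <code>type</code>
and trivial bodies for the helper functions,
which the example above leaves undefined.
Calling each version with the same object for both arguments
gives different results.
</p>

```cpp
// Hypothetical stand-in for the paper's `type`: a tiny value class.
struct num {
    int v;
    num& operator+=(const num& other) { v += other.v; return *this; }
};

// Assumed helper bodies; each simply returns a copy of its argument.
num va1(num input) { return input; }
num va2(num input) { return input; }
num ra1(const num& input) { return input; }
num ra2(const num& input) { return input; }

// By value: `input` is a private copy, so writes to `output` cannot change it.
void vf1(num& output, num input) {
    output += va1(input);
    output += va2(input);
}

// By const reference: `input` may be the same object as `output`.
void rf1(num& output, const num& input) {
    output += ra1(input);   // if aliased, this write also changes `input`
    output += ra2(input);   // so this statement reads the modified value
}

// Both helpers pass the same object as output and input.
int by_value_result()     { num x{1}; vf1(x, x); return x.v; }
int by_reference_result() { num x{1}; rf1(x, x); return x.v; }
```

<p>
Starting from 1,
the by-value version adds the original value twice and yields 3,
while the by-reference version reads the already modified object
in the second statement and yields 4.
</p>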

<p>
Programmers generally have three responses.
</p>

<blockquote>
<dl>

<dt>Ignore aliasing.</dt>
<dd>
<p>
As actual aliasing is uncommon,
programmers often forget to consider it,
or decide that it will "never happen".
</p>
</dd>

<dt>Document aliasing.</dt>
<dd>
<p>
The programmer of a function may document when arguments must not alias.
In C, the <code>restrict</code> type qualifier
<a href="#C11restrict">[C11restrict]</a>
provides this documentation.
In essence, this approach gives up
on trying to match the operational semantics of primitive scalar types.
</p>
</dd>

<dt>Overcome aliasing.</dt>
<dd>
<p>
The programmer of a function may write extra code to overcome aliasing.
This response eases the burden on the calling programmer
at the expense of some loss in efficiency.
</p>
</dd>

</dl>
</blockquote>

<p>
The simplest technique for overcoming aliasing
is to copy the potentially aliasing parameter.
</p>

<pre class="example">
<code>void rf2(type&amp; output, const type&amp; input) {
    type temp = input;
    output += ra1(temp);
    output += ra2(temp);
}</code>
</pre>

<p>
This technique is both more complex and less efficient
than simply passing the parameter by value.
While the technique can be useful when dealing with legacy interfaces,
it should not be a primary technique.
</p>

<p>
The next technique is to detect the aliasing
and copy the parameter only when necessary.
</p>

<pre class="example">
<code>void <a name="rf3">rf3</a>(type&amp; output, const type&amp; input) {
    if ( &amp;output == &amp;input ) {
        type temp = input;
        output += ra1(temp);
        output += ra2(temp);
    } else {
        output += ra1(input);
        output += ra2(input);
    }
}</code>
</pre>

<p>
This technique introduces a comparison
and may introduce an instruction pipeline bubble.
For small classes,
this overhead may exceed the cost of simply copying the class.
</p>

<p>
The next technique is to write the function in two phases.
The first phase reads from parameters
and writes only to temporaries, not potentially aliasable objects.
The second phase writes results.
This technique may not be possible
if subordinate function calls mutate arguments.
</p>

<pre class="example">
<code>void rf4(type&amp; output, const type&amp; input) {
    type temp1 = ra1(input);
    type temp2 = ra2(input);
    output += temp1;
    output += temp2;
}</code>
</pre>

<p>
This technique requires allocating more local storage,
which puts pressure on the memory cache.
</p>

<p>
One can also mix the last two techniques.
</p>

<pre class="example">
<code>void rf5(type&amp; output, const type&amp; input) {
    if ( &amp;output == &amp;input ) {
        type temp1 = ra1(input);
        type temp2 = ra2(input);
        output += temp1;
        output += temp2;
    } else {
        output += ra1(input);
        output += ra2(input);
    }   
}</code>
</pre>

<p>
The value of this technique depends on several attributes of the platform,
such as the cost of the comparison and branch
relative to the cost of the extra local storage.
</p>

<p>
Finally, even when programmers properly avoid aliased writes,
there is additional expense in reading the parameters
because the compiler may be unable to determine that there are no aliases,
and hence must keep values in memory rather than in registers.
</p>


<h2><a name="Problem">Problem</a></h2>

<p>
The latest generation of calling conventions
will pass small classes in registers,
provided that those classes
have a trivial copy constructor and trivial destructor.
(This convention was often introduced
along with 64-bit addresses,
as in
AMD64 <a href="#AMD64abi">[AMD64abi]</a>,
IA-64 <a href="#IA64abi">[IA64abi]</a>, and
SPARC V9 <a href="#SparcV9abi">[SparcV9abi]</a>.)
Thus small classes
can have value parameter performance near that of primitive scalar types.
Just as importantly,
aliasing for such parameters is not possible,
and thus they are safer.
</p>

<p>
The problem is that a programmer writing portable software
(or defining widely used interfaces)
cannot know whether the software
will run on an old calling convention or on a new one.
Furthermore, the programmer cannot know what constitutes 'small'.
Thus, the programmer has no choice
but to pessimize some platforms.
</p>

<p>
Small classes are quite common.
They are used for handles, numbers, coordinates, etc.
These classes are often critical to overall application performance.
Being unable to write efficient portable programs for these classes
is a significant problem.
</p>


<h2><a name="Approaches">Approaches</a></h2>

<p>
There are at least three approaches to reducing the problem.
</p>

<blockquote>
<dl>

<dt>Pass small classes by value.</dt>
<dd>
<p>
We can change our habits
to pass trivially copyable classes
of sixteen bytes (two <code>double</code>s) or less by value.
With this approach, complex numbers would primarily be passed by value,
not by reference as they are in
the class template <code>complex</code>
(26.4 Complex numbers [complex.numbers]).
This approach avoids aliasing issues
and enables removing some indirect references when accessing parameters,
but may introduce copying on older platforms.
The net performance change is unclear.
</p>
</dd>

<dt>Add the <code>restrict</code> qualifier to C++.</dt>
<dd>
<p>
The <code>restrict</code> qualifier
would at least enable the compiler
to remove some indirect references when accessing reference parameters.
This approach effectively requires the calling programmer to avoid aliasing.
It also does not take full advantage of modern calling conventions.
</p>
</dd>

<dt>Add another language feature.</dt>
<dd>
<p>
Adding a more carefully targeted language feature
may make code less sensitive to the platform
and enable compilers to better optimize the program.
</p>
</dd>

</dl>
</blockquote>
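
<p>
The first approach can be approximated in a library.
The following is a sketch only:
the alias <code>in_param</code>, the classes <code>cplx</code> and <code>big</code>,
and the function <code>norm2</code> are hypothetical names,
and the sixteen-byte threshold is the guideline above
rather than the rule of any particular ABI.
</p>

```cpp
#include <type_traits>

// Hypothetical alias: choose pass-by-value for small trivially copyable
// classes and pass-by-const-reference for everything else.
// The 16-byte limit is an assumption, not an ABI guarantee.
template <typename T>
using in_param = typename std::conditional<
    std::is_trivially_copyable<T>::value && sizeof(T) <= 16,
    T,
    const T&>::type;

struct cplx { double re; double im; };   // two doubles, trivially copyable
struct big  { double data[8]; };         // trivially copyable but too large

// These checks assume the common case of 8-byte doubles.
static_assert(std::is_same<in_param<cplx>, cplx>::value,
              "small class is passed by value");
static_assert(std::is_same<in_param<big>, const big&>::value,
              "large class is passed by const reference");

// The parameter is spelled once; the alias picks the passing mode.
double norm2(in_param<cplx> z) { return z.re * z.re + z.im * z.im; }
```

<p>
Such an alias does not solve the problem stated earlier:
the threshold is fixed in source code rather than chosen per platform,
and a parameter declared this way
cannot take part in template argument deduction.
</p>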

<p>
As the first two approaches are relatively well understood,
we consider only the third approach in detail.
</p>


<h2><a name="Solution">Solution</a></h2>

<p>
Our solution for the third approach
is to introduce a syntax for 'input' parameters.
These parameters give the compiler
the choice of passing the parameter by reference or by value.
The Ada programming language uses this solution
for its parameter modes
<a href="#AdaLRMparam">[AdaLRMparam]</a>
<a href="#AdaRDparam">[AdaRDparam]</a>.
</p>

<p>
Pragmatically, the choice of how to pass such parameters
will be defined by some combination of
the platform ABI
<a href="#AMD64abi">[AMD64abi]</a>
<a href="#IA64abi">[IA64abi]</a>
<a href="#SparcV9abi">[SparcV9abi]</a>
and the C++ ABI
<a href="#ItaniumCXXabi">[ItaniumCXXabi]</a>
<a href="#SparcCXXabi">[SparcCXXabi]</a>.
</p>

<p>
As a strawman,
we propose a <var>ptr-operator</var> of a <var>declarator</var>
of the form
</p>

<blockquote>
<dl>
<dt><dfn>ptr-operator</dfn>:</dt>
<dd><code>|</code>
<var>attribute-specifier-seq<sub>opt</sub>
cv-qualifier-seq<sub>opt</sub></var></dd>
</dl>
</blockquote>

<p>
but with the additional constraints
that the <code>const</code> qualifier must be present
and that the operator may only be used on parameters.
</p>

<p>
Example <a href="#vf1"><code>vf1</code></a> becomes:
</p>

<pre class="example">
<code>extern type oa1(const type| input);
extern type oa2(const type| input);

void of1(type&amp; output, const type| input) {
    output += oa1(input);
    output += oa2(input);
}</code>
</pre>

<p>
This example works well
when the 'input' parameter is actually passed by value.
However, when passed by reference,
aliasing issues arise
and the feature needs detailed semantics for aliasing.
We discuss various choices in these semantics below.
</p>


<h3><a name="Either">Either Or</a></h3>

<p>
One semantic choice is that
the parameter is either exactly const reference or exactly (const) value.
The programmer may assume only the intersection of guarantees.
</p>
<ul>
<li>
Because the parameter might be a const reference,
the programmer must assume the parameter may alias.
</li>
<li>
Because the parameter might be a value,
the programmer must assume the parameter
may be a distinct object from all others.
</li>
<li>
Because the parameter might be a value,
the programmer may not assume that a pointer to the parameter
remains valid after the function returns.
</li>
<li>
More importantly,
value parameters are sliced
and const reference parameters are not,
so the virtual semantics of such parameters would be significantly different.
So, it might be unwise to use this parameter mode with polymorphic classes.
</li>
</ul>

<p>
Under this choice,
overcoming aliasing with the technique
used in <a href="#rf3"><code>rf3</code></a>
has the virtue that if pass-by-value is chosen,
then the condition becomes statically false,
and the condition and corresponding dead code may be eliminated.
There is no loss in efficiency for the fast case.
</p>
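
<p>
This effect can be imitated in current C++
by making the passing mode a template argument.
In the sketch below, the class <code>num</code>,
the helpers <code>a1</code> and <code>a2</code>,
and the function <code>of3</code> are hypothetical;
the template parameter <code>In</code> stands for the choice
that the compiler would make for an 'input' parameter.
</p>

```cpp
// Hypothetical stand-in for the paper's `type`.
struct num {
    int v;
    num& operator+=(const num& other) { v += other.v; return *this; }
};

// Assumed helper bodies standing in for oa1 and oa2.
num a1(const num& input) { return input; }
num a2(const num& input) { return input; }

// `In` models the passing mode: `num` (by value) or `const num&`.
template <typename In>
void of3(num& output, In input) {
    if (&output == &input) {
        // Reachable only when In is a reference type; a by-value
        // parameter is a distinct object, so the test is always false.
        num temp = input;
        output += a1(temp);
        output += a2(temp);
    } else {
        output += a1(input);
        output += a2(input);
    }
}

// Both helpers pass the same object as output and input.
int run_by_value()     { num x{1}; of3<num>(x, x); return x.v; }
int run_by_reference() { num x{1}; of3<const num&>(x, x); return x.v; }
```

<p>
Both instantiations yield 3 when started from 1.
In the by-value instantiation the comparison can never succeed,
so an optimizer is free to drop the test and the copying branch.
</p>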


<h3><a name="Undefined">Undefined Behavior</a></h3>

<p>
Another choice is to make aliasing undefined behavior.
This approach has been used, for example,
in Fortran 66
<a href="#Fortran66args">[Fortran66args]</a>,
Ada 83
<a href="#AdaLRMparam">[AdaLRMparam]</a>
<a href="#AdaRDparam">[AdaRDparam]</a>,
and C11
<a href="#C11restrict">[C11restrict]</a>.
Taking this approach in C++ would not be novel.
</p>

<p>
In practice,
read-read aliasing causes no trouble,
so we only need to make write-write and read-write aliasing undefined.
Going further, (non-concurrent) writes to mutable members
that do not change the abstract state of an object
produce well-defined behavior.
</p>

<p>
Fortran, Ada, and C make the prohibition on aliases
a global program property,
which makes it an incomputable property.
In the worst case,
this approach could lead to many latent bugs.
However, the approach has proven workable in practice
because programmers need only follow a few rules to avoid the problem.
First, calling programmers ensure that
arguments do not alias each other.
Second, called programmers ensure that
the functions do not access objects accessible by callers.
</p>

<p>
While under this choice
programmers are ultimately responsible for avoiding aliasing,
static analysis tools and runtime checks can help programmers substantially.
</p>


<h3><a name="Dealiasing">Compiler Dealiasing</a></h3>

<p>
We can choose to require that the compiler dealias the arguments.
This dealiasing is really only feasible for aliasing between parameters.
We must rely on programmers to avoid aliases on non-local objects.
</p>

<p>
Only this choice tries to preserve the illusion that
operations on classes are the same as operations on primitive types.
</p>

<p>
There are a few strategies for implementing the dealiasing.
In all cases the alias checking is potentially O(n<sup>2</sup>)
in the number of arguments.
</p>

<blockquote>
<dl>

<dt>Detect in Callers</dt>
<dd>
<p>
The compiler does alias analysis at the call site
and copies arguments that it cannot prove are unaliased.
Many aliases can be eliminated a priori
because temporary values cannot be aliased.
On the other hand, the compiler will not know which parameters
are potentially conflicting within the function implementation.
</p>
</dd>

<dt>Detect in Callees</dt>
<dd>
<p>
The compiler does conflict analysis in the function implementation
and copies parameters that alias with one another.
This approach limits dealiasing to the function body,
rather than the more numerous function call sites.
There is no argument information available to avoid some of the checks.
</p>
</dd>

<dt>Leak Information</dt>
<dd>
<p>
The compiler can leak information about conflicts
from the callee to the caller,
enabling a joint detection in both callers and callee.
Then the caller can make minimal copies,
only those at the intersection of aliases and conflicts.
While this strategy works well in source-based environments,
it works less well on systems exploiting ABIs.
(Thanks to Chandler Carruth.)
</p>
</dd>

<dt>Multiple Implementations</dt>
<dd>
<p>
The compiler can produce multiple instances of the function,
each protecting against some subset of conflicts.
At the call site, the compiler matches the aliases
against the versions of the callee
and chooses to call the one needing
the fewest dealiasing copies.
(Thanks to Chandler Carruth.)
</p>
</dd>

</dl>
</blockquote>
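
<p>
As an illustration of the first strategy,
the following hand-written sketch shows
what a compiler might emit at a call site
where it cannot prove that the arguments are distinct.
The names <code>num</code>, <code>callee</code>, and <code>call_dealiased</code>
are hypothetical,
and the run-time address comparison is only one possible lowering;
a compiler that can decide the question statically
would emit just one of the two branches.
</p>

```cpp
// Hypothetical stand-in for the paper's `type`.
struct num {
    int v;
    num& operator+=(const num& other) { v += other.v; return *this; }
};

// A callee written with no protection against aliasing.
void callee(num& output, const num& input) {
    output += input;
    output += input;
}

// Caller-side dealiasing: copy the input only when it is the same
// object as the output, then make the call.
void call_dealiased(num& output, const num& input) {
    if (&output == &input) {
        num copy = input;
        callee(output, copy);
    } else {
        callee(output, input);
    }
}

// Both helpers pass the same object as output and input.
int direct_call()    { num x{1}; callee(x, x); return x.v; }
int dealiased_call() { num x{1}; call_dealiased(x, x); return x.v; }
```

<p>
Starting from 1, the direct aliased call doubles the value twice and yields 4,
while the dealiased call adds the original value twice and yields 3,
matching what pass-by-value would produce.
</p>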


<h2><a name="Discussion">Discussion</a></h2>

<p>
Suppose we choose to adopt both of the first two approaches above &mdash;
pass small classes by value and add the <code>restrict</code> qualifier.
What do we then lose by
not adopting the 'input' parameter solution presented?
</p>

<ul>
<li>
We lose the effective prohibition on argument references
persisting past the function return.
</li>

<li>
We lose the illusion of class operations
behaving like primitive type operations.
</li>

<li>
We lose the documentation or enforcement of intent that
the parameter is not intended to be polymorphic.
</li>

</ul>


<h2><a name="Revision">Revision History</a></h2>

<p>
This paper revises N3445 = 12-0135 - 2012-09-23 as follows.
</p>

<ul>

<li><p>
Make editorial corrections.
</p></li>

<li><p>
Add a 'Revision History' section.
</p></li>

</ul>


<h2><a name="References">References</a></h2>

<dl>

<dt><a name="AdaLRMparam">[AdaLRMparam]</a></dt>
<dd>
<cite>Ada '83 Language Reference Manual</cite>,
Section 6.2 Formal Parameter Modes,
<a href="http://archive.adaic.com/standards/83lrm/html/lrm-06-02.html#6.2">
http://archive.adaic.com/standards/83lrm/html/lrm-06-02.html#6.2</a>
</dd>

<dt><a name="AdaRDparam">[AdaRDparam]</a></dt>
<dd>
<cite>Rationale for the Design of the Ada Programming Language</cite>,
Section 8.2 Parameter Modes,
<a href="http://archive.adaic.com/standards/83rat/html/ratl-08-02.html#8.2">
http://archive.adaic.com/standards/83rat/html/ratl-08-02.html#8.2</a>
</dd>

<dt><a name="AMD64abi">[AMD64abi]</a></dt>
<dd>
<cite>System V Application Binary Interface,
AMD64 Architecture Processor Supplement,
Draft Version 0.99.6</cite>,
<a href="http://www.x86-64.org/documentation/abi.pdf">
http://www.x86-64.org/documentation/abi.pdf</a>,
Michael Matz, Jan Hubi&#269;ka, Andreas Jaeger, Mark Mitchell,
July 2012,
Chapter 3: Low-Level System Information,
3.2 Function Calling Sequence
</dd>

<dt><a name="C11restrict">[C11restrict]</a></dt>
<dd>
<cite>ISO/IEC 9899:2011 Programming languages -- C</cite>,
<a href="http://www.open-std.org/JTC1/SC22/wg14/www/docs/n1570.pdf">
http://www.open-std.org/JTC1/SC22/wg14/www/docs/n1570.pdf</a>,
Section 6.7.3.1 Formal definition of <code>restrict</code>
</dd>

<dt><a name="Fortran66args">[Fortran66args]</a></dt>
<dd>
<cite>USA Standard FORTRAN (USAS X3.9-1966)</cite>,
(also known as Fortran 66),
<a href="ftp://ftp.nag.co.uk/sc22wg5/ARCHIVE/Fortran66.pdf">
ftp://ftp.nag.co.uk/sc22wg5/ARCHIVE/Fortran66.pdf</a>,
section 8.3.2 Referencing External Functions,
section 8.4.2 Referencing Subroutines
</dd>

<dt><a name="IA64abi">[IA64abi]</a></dt>
<dd>
<cite>Itanium Software Conventions and Runtime Architecture Guide</cite>,
<a href="http://www.intel.com/content/dam/www/public/us/en/documents/guides/itanium-software-runtime-architecture-guide.pdf">
http://www.intel.com/content/dam/www/public/us/en/documents/guides/itanium-software-runtime-architecture-guide.pdf</a>,
Section 8.5 Parameter Passing
</dd>

<dt><a name="ItaniumCXXabi">[ItaniumCXXabi]</a></dt>
<dd>
<cite>Itanium C++ ABI</cite>,
<a href="http://mentorembedded.github.com/cxx-abi/abi.html">
http://mentorembedded.github.com/cxx-abi/abi.html</a>
</dd>

<dt><a name="SparcCXXabi">[SparcCXXabi]</a></dt>
<dd>
<cite>The C++ Application Binary Interface, SPARC Processor Supplement</cite>,
Sun Microsystems, Inc.,
December 1995,
Section 3.3: Function Calling Sequence
</dd>

<dt><a name="SparcV9abi">[SparcV9abi]</a></dt>
<dd>
<cite>SPARC Compliance Definition 2.4.1</cite>,
<a href="http://www.sparc.org/standards/64.psabi.1.34.ps.Z">
http://www.sparc.org/standards/64.psabi.1.34.ps.Z</a>,
SPARC International, Inc.,
July 1999,
Chapter 3: Low-Level System Information,
Low-Level System Information (64-bit psABI),
Function Calling Sequence
</dd>

</dl>

</body>
</html>
