<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html><head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" >
<title>C++ Data-Dependency Ordering: Function Annotation</title>
</head><body>
<h1>C++ Data-Dependency Ordering: Function Annotation</h1>

<p>ISO/IEC JTC1 SC22 WG21 N2361 = 07-0221 - 2007-08-02

</p><p>Paul E. McKenney, paulmck@linux.vnet.ibm.com

</p><h2>Introduction</h2>

<p> This document presents an interface and minimal implementation
for preservation of data dependency ordering to expedite access to
dynamic linked data structures that are read frequently and seldom modified.
This document extends the proposal in
<A HREF="http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2359.html">N2359</A> and
<A HREF="http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2360.html">N2360</A>
to permit dependency chains to cross compilation-unit boundaries, by
providing annotations for function arguments and return values.
The rationale for this proposal may be found in
<A HREF="http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2359.html">N2359</A>.
<P>
This proposal does not affect library functions.
Changes to library functions (for example, annotating the Vector templates)
were considered,
but rejected because current uses of data dependency ordering are restricted
to basic operations such as indirection, field selection, array access,
and casting.
Longer-term experience might indicate that a future
proposal affecting library classes is warranted.
However, there is insufficient motivation for such a proposal at this time.
<P>
This proposal is expected to have minimal effect on strongly ordered
machines (e.g., x86) and on weakly ordered machines that do not
support data dependency ordering (e.g., Alpha).
It has no effect on implementations that refrain from breaking
dependency chains.
The major burden of this proposal would fall on weakly ordered machines
that order data-dependent operations, such as ARM, Itanium, and PowerPC.
Even for these architectures, a fully conforming compiler could use
the same approach as weakly ordered machines that do not support
data dependency ordering, albeit at a performance penalty.
Alternatively, such a compiler could simply refrain from carrying out
optimizations that break dependency chains.
<P>
This proposal enforces only data dependencies, not control dependencies.
If experience indicates that control dependencies also need to be
enforced, a separate proposal will be put forward for them.
<P>
This proposal is based on
<A HREF="http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2153.pdf">N2153</A>
by Silvera, Wong, McKenney, and Blainey, on
<A HREF="http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2176.html">N2176</A>
by Hans Boehm, on
<A HREF="http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2195.html">N2195</A>
by Peter Dimov, on
<A HREF="http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2260.html">N2260</A>
by Paul E. McKenney, on
discussions on the
cpp-threads list, and on discussions
in the concurrency workgroup at the 2007 Oxford and Toronto meetings.

<h3>Rationale</h3>

<P>
See <A HREF="http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2359.html">N2359</A>.

<h3>Prior Approaches</h3>

<P>
See <A HREF="http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2359.html">N2359</A>.


<h3>Dependency Chains</h3>

<P>
See <A HREF="http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2359.html">N2359</A>.


<h3>Current Approach</h3>

<P>
This proposal is in two parts:

<OL>
<LI>	Preserving dependency ordering within the
	confines of a single function body, as described in
<A HREF="http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2359.html">N2359</A> and
<A HREF="http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2360.html">N2360</A>.
<LI>	Preserving dependency
	ordering across multiple functions, possibly in multiple
	compilation units.
	This is addressed in the remainder of this document.
</OL>

<P>

<h3>Multiple Functions</h3>

<P>
As noted in the previous section, a dependency chain terminates
when a value is passed into or returned from an unannotated function.
If a given function's return value is annotated, then dependency chains
survive being returned from that function.
If a particular argument of a given function is annotated, then
dependency chains survive being passed in via that argument.
If the function has a prototype, then the annotation must be applied
to the prototype as well as to the function definition itself.
<P>
For example, the following propagates dependencies through argument
<code>y</code> to the return value:
<pre>
[[dependence_propagate]] int *f([[dependence_propagate]] int *y)
{
        return y;
}
</pre>
<P>
The following example propagates dependency chains in, but not out:
<pre>
int f([[dependence_propagate]] int *y)
{
        return *y;
}
</pre>
<P>
The following propagates dependency chains out, but not in:
<pre>
[[dependence_propagate]] struct foo *f(int i)
{
        return &amp;foo_head[i];
}
</pre>
<P>
Finally, the following does not propagate dependency chains at all:
<pre>
template&lt;typename T&gt; T *kill_dependency_chain(T *y)
{
        return y;
}
</pre>
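<P>
The rule that a prototype and its definition must carry the same annotations
can be illustrated with the following sketch.
It is not part of the proposal: the macro <code>DEPENDENCE_PROPAGATE</code>,
the list layout, and the functions <code>find()</code>, <code>lookup()</code>,
and <code>demo()</code> are hypothetical.
Because current compilers provide neither <code>[[dependence_propagate]]</code>
nor <code>memory_order_dependency</code>, the macro expands to nothing and
<code>std::memory_order_acquire</code> is assumed as a conservative stand-in
for the head-of-chain load.

```cpp
#include <atomic>
#include <cassert>

/* Hypothetical stand-in for the proposed [[dependence_propagate]] attribute.
   It expands to nothing, so the annotation is purely documentary here. */
#define DEPENDENCE_PROPAGATE

struct foo {
    int key;
    int value;
    foo *next;
};

/* Prototype, as it might appear in a shared header.
   The annotations match those on the definition below. */
DEPENDENCE_PROPAGATE foo *find(DEPENDENCE_PROPAGATE foo *head, int key);

/* Definition, which could live in a different compilation unit. */
DEPENDENCE_PROPAGATE foo *find(DEPENDENCE_PROPAGATE foo *head, int key)
{
    for (foo *p = head; p != nullptr; p = p->next) {
        if (p->key == key)
            return p;   /* the returned pointer is derived from 'head' */
    }
    return nullptr;
}

std::atomic<foo *> list_head{nullptr};

int lookup(int key)
{
    /* Assumption: acquire stands in for the proposed memory_order_dependency. */
    foo *head = list_head.load(std::memory_order_acquire);
    foo *p = find(head, key);           /* chain passes in and back out */
    return p != nullptr ? p->value : -1;
}

/* Usage sketch: publish a two-node list, then look up keys. */
int demo()
{
    static foo b{2, 20, nullptr};
    static foo a{1, 10, &b};
    list_head.store(&a, std::memory_order_release);
    assert(lookup(2) == 20);
    return lookup(3);   /* no such key, so -1 */
}
```

Under the proposal, the dereference of <code>p</code> in <code>lookup()</code>
would be ordered after the load of <code>list_head</code> only because both the
argument and the return value of <code>find()</code> are annotated.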
<P>


<h3>Minimal Implementation</h3>

<P>
Minimal implementations of <code>[[dependence_propagate]]</code>
simply ignore this attribute, assuming the minimal implementation of
dependency ordering described in
<A HREF="http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2359.html">N2359</A>.
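<P>
As a rough illustration of why ignoring the attribute does not change what a
program computes, the sketch below hides the annotation behind a macro.
The feature-test macro <code>HAVE_DEPENDENCE_PROPAGATE</code> and the helper
functions are hypothetical and are not defined by this proposal or by any
compiler; when the macro is absent, the annotation simply disappears.

```cpp
#include <cassert>

/* Hypothetical feature-test macro: no real compiler defines it. */
#if defined(HAVE_DEPENDENCE_PROPAGATE)
#  define DEPENDENCE_PROPAGATE [[dependence_propagate]]
#else
#  define DEPENDENCE_PROPAGATE /* ignored, as a minimal implementation would */
#endif

/* Same shape as the pass-through example in this document. */
DEPENDENCE_PROPAGATE int *pass_through(DEPENDENCE_PROPAGATE int *y)
{
    return y;
}

int read_through(int *p)
{
    /* The value computed is the same with or without the annotation;
       only the ordering guarantee would differ. */
    return *pass_through(p);
}

/* Usage sketch. */
int demo()
{
    int v = 7;
    assert(pass_through(&v) == &v);
    return read_through(&v);
}
```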


<h3>Behavior on Dependency Examples</h3>

<P>
In
<A HREF="http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2176.html">N2176</A>,
Hans Boehm lists a number of example optimizations that can break
dependency chains.
Most of these are addressed in
<A HREF="http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2359.html">N2359</A>;
the remainder are covered below.


<h4>Example 5</h4><P>

<A HREF="http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2176.html">N2176</A>
example code:
<pre>
r1 = x.load(memory_order_relaxed);
if (r1)
	f(&amp;y);
else
	g(&amp;y);
</pre><P>

In this case, there is no data dependency leading into <code>f()</code>
and <code>g()</code>, so this control-dependency example is out of scope.
The example can be modified by replacing <code>&amp;y</code> with <code>r1</code>,
which creates a data dependency leading into the two functions:
<pre>
r1 = x.load(memory_order_relaxed);
if (r1)
	f(r1);
else
	g(r1);
</pre><P>

Recoding this based on this proposal and on
<A HREF="http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2359.html">N2359</A>:
<pre>
void f([[dependence_propagate]] atomic&lt;struct foo *&gt; p);
void g([[dependence_propagate]] atomic&lt;struct foo *&gt; p);

r1 = x.load(memory_order_dependency);
if (r1)
	f(r1);
else
	g(r1);
</pre><P>

Assuming that <code>x</code> is an atomic, the
<code>x.load(memory_order_dependency)</code> will form the head of a dependency
chain.
The <code>[[dependence_propagate]]</code> annotations will cause 
the dependency chain to propagate into <code>f()</code> and <code>g()</code>.
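<P>
A compilable approximation of the recoded example is sketched below.
Several details are assumptions rather than part of the proposal: the macro
<code>DEPENDENCE_PROPAGATE</code> stands in for the attribute and expands to
nothing, <code>std::memory_order_acquire</code> stands in for
<code>memory_order_dependency</code>, the contents of <code>struct foo</code>
and the variable <code>last_seen</code> are invented, and the parameters are
plain <code>foo *</code> because <code>std::atomic</code> objects cannot be
copied in standard C++.

```cpp
#include <atomic>
#include <cassert>

/* Hypothetical stand-in for [[dependence_propagate]]; expands to nothing. */
#define DEPENDENCE_PROPAGATE

struct foo {
    int a;
};

std::atomic<foo *> x{nullptr};
int last_seen = 0;   /* invented, only to make the effect observable */

/* f() dereferences the pointer, so it relies on the incoming chain. */
void f(DEPENDENCE_PROPAGATE foo *p)
{
    last_seen = p->a;
}

/* g() is reached only when the pointer is null, so it does not dereference. */
void g(DEPENDENCE_PROPAGATE foo *p)
{
    (void)p;
    last_seen = -1;
}

void reader()
{
    /* Assumption: acquire stands in for the proposed memory_order_dependency. */
    foo *r1 = x.load(std::memory_order_acquire);
    if (r1)
        f(r1);
    else
        g(r1);
}

/* Usage sketch: read before and after publication. */
int demo()
{
    static foo item{42};
    reader();                 /* nothing published yet: takes the g() path */
    assert(last_seen == -1);
    x.store(&item, std::memory_order_release);
    reader();                 /* takes the f() path */
    return last_seen;
}
```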


<h3>Alternatives Considered</h3>

<P>
<UL>
<LI>	Require the compiler to globally analyze the program to infer
	which dependency chains must be preserved.
	This conflicts with the common practice of compiling C++
	programs on a module-by-module basis, so this alternative
	was rejected.
<LI>	Prohibit dependency chains from crossing function boundaries.
	There are a large number of examples of RCU dependency chains
	crossing function boundaries in the Linux kernel, so this
	alternative was rejected.
<LI>	Prohibit dependency chains from crossing compilation-unit
	boundaries.
	There are a large number of examples of RCU dependency chains
	crossing compilation-unit boundaries in the Linux kernel.
	In some cases, the overall overhead of the instructions making
	up the dependency chain overwhelms that of a memory fence.
	However, in a number of cases, the actual number of instructions is
	small, so that the overhead of a memory fence would still be
	prohibitive.
	Therefore, this alternative was rejected.
</UL>



</body></html>
