<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html><head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" >
<title>Converting Memory Fences to N2324 Form</title>
</head><body>
<h1>Converting Memory Fences to N2324 Form</h1>

<p>ISO/IEC JTC1 SC22 WG21 N2362 = 07-0222 - 2007-08-04

</p><p>Paul E. McKenney, paulmck@linux.vnet.ibm.com<BR>
Lawrence Crowl, Lawrence@Crowl.org

</p><h2>Introduction</h2>

<P>Existing parallel code using memory fences typically uses the address-free
varieties provided by most hardware, for example:
</P>
<UL>
<LI>	Alpha: mb, wmb
<LI>	Itanium: mf
<LI>	POWER: sync, lwsync, eieio
<LI>	SPARC: membar
<LI>	x86: mfence, lfence, sfence
</UL>
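<P> As a point of reference, code written in this style might resemble
the following minimal sketch.
The names <code>legacy_full_fence</code>, <code>payload</code>, and
<code>ready</code> are hypothetical, and the standard
<code>std::atomic_thread_fence</code> is assumed here as a portable stand-in
for a hardware instruction such as sync or mf.
Note that the fence names no variable.
</P>

```cpp
#include <atomic>
#include <cassert>

// Hypothetical legacy primitive: an address-free full fence.
// std::atomic_thread_fence stands in for an instruction such as sync or mf.
inline void legacy_full_fence() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
}

std::atomic<int> payload{0};     // hypothetical data being published
std::atomic<bool> ready{false};  // hypothetical "data is ready" flag

// Intended to run on one thread.
void producer() {
    payload.store(42, std::memory_order_relaxed);
    legacy_full_fence();  // orders the two stores; names no variable
    ready.store(true, std::memory_order_relaxed);
}

// Intended to run on another thread.
int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {
        // spin until the producer publishes
    }
    legacy_full_fence();  // orders the two loads; names no variable
    return payload.load(std::memory_order_relaxed);
}

// Single-threaded smoke check of the two halves.
void smoke_check() {
    producer();
    assert(consumer() == 42);
}
```

<P> Each call to <code>legacy_full_fence</code> is a site at which a
conversion to address-based fences would have to name some variable.
</P>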

<P> When converting such programs to use N2324's address-based memory fences,
developers must supply the corresponding variable.  This document lists
a number of methods that might be used to accomplish this, and is
particularly concerned with the memory_order_seq_cst and the
memory_order_acq_rel variants.  A brief description of each method
follows:
</P>
<OL>
<LI>	Developers could carefully select separate variables for
	each related use of fences within the system being ported,
	carefully validating that the additional potential for compiler
	and hardware misordering did not render the program incorrect.
<LI>	Developers could randomly select variables that were conveniently
	in scope, potentially using a different variable for each fence
	in the program.
<LI>	Developers could create a new global variable, and assign that global
	variable to all N2324-based fences.  Developers would most likely
	use macros or inline functions to map their existing API into
	that provided by N2324.
<LI>	The C++ standard could specify the name of the global variable
	to be used in such cases, and developers would be advised to
	use that variable.  N2324 might in addition be extended to take
	a default argument, which would map to the standard name.
<LI>	The relevant ABI standard could specify the name for a given
	platform.  N2324 might again be extended to supply this
	ABI-specific name as the default for fence operations.
</OL>

<P>Each of these approaches is expanded on in the following sections.
</P>

<H2>1. Select Separate Fence Variables</H2>

<P> This approach offers the greatest potential performance on platforms
that can exploit the additional opportunities for reordering.  However,
it also requires the greatest effort on the part of the programmers and
incurs the greatest risk, although this risk is incurred only on systems
that use special facilities or optimizations that take advantage of the
greater freedom to reorder or to reduce communications.
</P>

<P> In contrast, platforms that choose to implement the N2324 fence operations
as address-free fence instructions (as listed above) would be guaranteed
to run the program with the old semantics.
</P>

<P> Furthermore, given that all existing hardware would likely use
address-free fences, any validation that the developers might do would
be theoretical.
Although it is hoped that program analysis tools will eventually
be capable of analyzing fence usage,
there is currently no way to test the variable choices,
which in turn means that any design errors or even typographical errors
would persist.
Such a program would therefore <em>look</em> like it was written
for a machine with address-sensitive fence instructions even though it
would not in fact run correctly on such hardware.
</P>

<P> This situation forces the conclusion that (a) programmers are extremely
unlikely to choose this option and (b) if they do choose it, they will
almost certainly get it wrong.
</P>


<H2>2.  Select Random Fence Variables</H2>

<P> In this scenario, the developers randomly choose any convenient atomic
that is in scope for each separate fence primitive.  This requires very
little effort on the part of the developers, and is guaranteed to preserve
program behavior on existing machines with address-free fence instructions.
The program would very likely fail on machines with address-sensitive
fence instructions, though casual inspection of the program would have
a fair chance of fooling the inspector into believing that the variables
had been properly selected.
</P>
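<P> The hazard can be sketched as follows.
Everything here is illustrative: <code>address_based_fence</code> is a
hypothetical helper that approximates an N2324-style fence with a no-op
read-modify-write on the named variable (an assumption made only so that the
sketch compiles against a standard <code>&lt;atomic&gt;</code> header), and
<code>stats_counter</code> and <code>queue_depth</code> are hypothetical
atomics that merely happened to be in scope.
</P>

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for an N2324 address-based fence: a read-modify-write
// that leaves the value unchanged. This is an assumption, not N2324's API.
inline void address_based_fence(std::atomic<int>& var, std::memory_order order) {
    var.fetch_add(0, order);
}

std::atomic<int> stats_counter{0};  // hypothetical, unrelated to the data
std::atomic<int> queue_depth{0};    // hypothetical, unrelated to the data
std::atomic<int> payload{0};
std::atomic<bool> ready{false};

// Intended to run on one thread.
void producer() {
    payload.store(42, std::memory_order_relaxed);
    address_based_fence(stats_counter, std::memory_order_seq_cst);  // convenient variable
    ready.store(true, std::memory_order_relaxed);
}

// Intended to run on another thread.
int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {
        // spin until the producer publishes
    }
    address_based_fence(queue_depth, std::memory_order_seq_cst);  // a different variable
    return payload.load(std::memory_order_relaxed);
}
```

<P> On hardware with address-free fences both calls would presumably compile
to full fences and the original ordering would be retained.
Because the two fences name different variables, nothing in the source
ties the producer's fence to the consumer's fence, which is exactly what
an address-sensitive implementation would be entitled to exploit.
</P>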

<P> A moment's reflection on human nature and on experience with real people
on real projects should be sufficient to force the conclusion that
this option is depressingly likely to be chosen.
</P>


<H2>3.  Create New Global Variable</H2>

<P> Here, a single new global atomic variable is chosen to be used in
conjunction with N2324 fence operations.  In standalone roll-your-own
software projects, this option is reasonably likely to be chosen,
and it has the virtue of preserving program semantics on hardware
that has address-sensitive fence instructions.  Unfortunately, such
a choice of global variable may result in extremely low levels of
performance and scalability.
</P>
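<P> A project taking this route might wrap its existing fence API roughly
as sketched below.
The names <code>project_fence_var</code>, <code>smp_mb</code>, and
<code>smp_acq_rel</code> are hypothetical, and
<code>address_based_fence</code> is again a hypothetical stand-in that
approximates an N2324-style fence with a no-op read-modify-write; the actual
N2324 spelling of the fence operation is not reproduced here.
</P>

```cpp
#include <atomic>
#include <cassert>

// Hypothetical project-wide variable named by every converted fence.
std::atomic<int> project_fence_var{0};

// Hypothetical stand-in for an N2324 address-based fence: a read-modify-write
// that leaves the value unchanged. This is an assumption, not N2324's API.
inline void address_based_fence(std::atomic<int>& var, std::memory_order order) {
    var.fetch_add(0, order);
}

// Hypothetical legacy API names, retained so that call sites need not change.
inline void smp_mb() {
    address_based_fence(project_fence_var, std::memory_order_seq_cst);
}

inline void smp_acq_rel() {
    address_based_fence(project_fence_var, std::memory_order_acq_rel);
}

// Existing call sites continue to compile unmodified.
void legacy_call_site() {
    smp_mb();
    smp_acq_rel();
}
```

<P> Because every fence in the project funnels through the one variable,
related fences always agree on the variable they name, at the cost of making
that variable a potential point of contention.
</P>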

<P> Furthermore, if the program is built using multiple third-party
modules and libraries that are independently converted, it is
unlikely that all the parties would choose the same global variable,
thus raising the possibility of bugs appearing on hardware with
address-sensitive fence instructions.
</P>

<P> Even worse, a casual investigation of the code might erroneously
conclude that the program had been optimized to run on hardware
featuring address-sensitive fences.
A better approach would make it quite clear that no such optimization
had been undertaken.
</P>

<P> Of course, such code would continue to run correctly on conventional
machines with address-free fence instructions, increasing the likelihood
that any such errors would go undetected until much later when the
software was actually run on a machine with address-sensitive
fence instructions.
</P>


<H2>4.  C++ Standard Specifies Global Variable</H2>

<P> With this option, the C++ standard specifies the name of the global
variable to be used for N2324 fences, and developers would be advised to
use that variable.  N2324 might in addition be extended to take a default
argument, which would map to the standard name.  This latter approach
has the virtue that developers would have a strong incentive
to let the compiler reliably choose the correct name.
However, the C language does not permit default arguments, so
the C-language API would need to either require the variable be
specified or require an additional API member.
</P>
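<P> The two API shapes discussed above might look roughly as follows.
This is a sketch under several assumptions: the variable name is borrowed
from the candidate definition in this paper's conclusion, but it is declared
here as a non-const <code>std::atomic&lt;int&gt;</code> only so that the
stand-in read-modify-write compiles, and the function names
<code>fence</code>, <code>fence_explicit</code>, and
<code>fence_global</code> are hypothetical rather than taken from N2324.
</P>

```cpp
#include <atomic>
#include <cassert>

namespace sketch {

// Name borrowed from the paper's candidate definition; the type differs
// (non-const atomic<int>) purely so that this stand-in can compile.
std::atomic<int> atomic_global_fence_compatibility{0};

// C++ flavor: the variable is a default argument, so a naive port can
// simply omit it and let the compiler supply the standard name.
inline void fence(std::memory_order order,
                  std::atomic<int>& var = atomic_global_fence_compatibility) {
    var.fetch_add(0, order);  // stand-in for an address-based fence
}

// C flavor: no default arguments, so either the variable is always
// specified...
inline void fence_explicit(std::atomic<int>* var, std::memory_order order) {
    var->fetch_add(0, order);
}

// ...or an additional API member supplies the standard name.
inline void fence_global(std::memory_order order) {
    fence_explicit(&atomic_global_fence_compatibility, order);
}

}  // namespace sketch
```

<P> A naively ported call site would then read
<code>sketch::fence(std::memory_order_seq_cst)</code>, whereas a call site
that had been deliberately optimized would pass its own variable as the
second argument.
</P>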

<P> This approach would permit programs, even those produced by multiple
parties working in isolation, to produce correct results when run
on machines with address-sensitive fence instructions.
In addition, it would be obvious that the program had not been
specifically optimized for hardware featuring address-sensitive
fences, as such optimization would almost invariably avoid the
standard name.
</P>

<P> However, the C++ standard is arguably a strange place to put such
a variable name, particularly when no platform that we are currently
aware of needs it.
On platforms with efficient address-free fence instructions,
placing the name in the standard would consume an identifier to
no purpose.
</P>


<H2>5.  ABI Standard Specifies Global Variable</H2>

<P> A more logical place to put the name of the global variable would be in
the relevant ABI standard.  This works especially well for the majority
of the platforms with address-free fence instructions, as such platforms
need not specify a variable name at all, given that they don't need one.
In this case, N2324 might be extended to offer a default value for the
address to be associated with the fence operation, permitting the common
address-free-fence platforms to simply sidestep the whole issue.
In addition, the name of the global variable in the ABI standard
could potentially be an illegal C++ identifier, which would avoid
consuming a legal C++ name.
</P>
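<P> One way an implementation might arrange such a default is sketched below.
The macro <code>SKETCH_ABI_FENCE_VARIABLE</code> is hypothetical and stands
for whatever name a given ABI might choose; no existing ABI is known to
define one.
When the macro is absent, the sketch assumes an address-free platform and
falls back to the standard <code>std::atomic_thread_fence</code>.
</P>

```cpp
#include <atomic>
#include <cassert>

namespace sketch {

#if defined(SKETCH_ABI_FENCE_VARIABLE)
// Address-sensitive platform: a hypothetical ABI supplies the variable name.
std::atomic<int> SKETCH_ABI_FENCE_VARIABLE{0};
constexpr bool uses_abi_variable = true;

inline void default_fence(std::memory_order order) {
    // Stand-in for an address-based fence on the ABI-named variable.
    SKETCH_ABI_FENCE_VARIABLE.fetch_add(0, order);
}
#else
// Address-free platform: no variable is needed at all.
constexpr bool uses_abi_variable = false;

inline void default_fence(std::memory_order order) {
    std::atomic_thread_fence(order);
}
#endif

}  // namespace sketch
```

<P> Ported code would call <code>sketch::default_fence</code> in either
case, so the choice of variable, or the absence of one, would remain a
property of the platform rather than of the program.
</P>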

<P> In other words, only the ABI standards for prospective machines with
address-sensitive fence instructions would need to take this requirement
into account.  Platforms such as Itanium that have both address-free
and address-sensitive variants of fence instructions could choose to
modify their ABI standards or not, as performance, convenience, or other
considerations dictated.
</P>

<P> This approach preserves the correctness advantages of option #4.
It likewise retains the performance shortfall of naively ported code,
but only when such code is run on machines having only address-sensitive
fence instructions for which simulating address-free fence instructions
is expensive.
However, the C language does not permit default arguments, so
the C-language API would need to either require the variable be
specified or require an additional API member.
</P>


<H2>Conclusion</H2>

<P> At the current moment, option #4 seems to be the most straightforward
in terms of standards effort (when the C-language standard is taken
into account) and also in terms of correct operation of
programs with pre-existing fence operations.
</P>

<P> A candidate variable definition is as follows:</P>
<pre>
const atomic_bool atomic_global_fence_compatibility = { false };
</pre>


</body></html>
