<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html><head>



<meta http-equiv="Content-Type" content="text/html;charset=us-ascii"><title>Updates to C++ Memory Model Based on Formalization</title></head><body>
<h1>Updates to C++ Memory Model Based on Formalization</h1>

<p>
ISO/IEC JTC1 SC22 WG21 N3045 = 10-0035 - 2010-02-15
</p>
<!--Based on XNNNN.2010.02.14a.html -->

<p>
Paul E. McKenney, paulmck@linux.vnet.ibm.com
<br>
Mark Batty, mjb220@cl.cam.ac.uk
<br>
Clark Nelson, clark.nelson@intel.com
<br>
N.M. Maclaren, nmm1@cam.ac.uk
<br>
Hans Boehm, hans.boehm@hp.com
<br>
Anthony Williams, anthony@justsoftwaresolutions.co.uk
<br>
Peter Dimov, pdimov@mmltd.net
<br>
Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org
</p>

<h2>Introduction</h2>

<p>
Mark Batty recently undertook a partial formalization of the C++
memory model, which Mark summarized in
<a href="http://open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2955.html">N2955</a>.
This paper summarizes the discussions of Mark's paper, both verbal and
by email, and recommends appropriate actions.
We expect that this working paper will be divided into a group of
issues to be applied to the working draft.
</p>

<h2>Core Issues</h2>

<h3>Issue 1: 1.10p2 &ldquo;Might Be&rdquo; Might Be Indefinite (Editorial)</h3>

<p>
The phrase &ldquo;might be&rdquo; is indefinite and should be reworded.
</p>

<p>Replace &ldquo;might be&rdquo; with &ldquo;is&rdquo; in 1.10p2.
</p>

<p><b>Priority: Low.</b>
</p>

<h3>Issue 2: 1.10p14 Lock and SC Operations Interleave Simply (Non-Normative)</h3>

<p>
<a href="http://open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2955.html">N2955</a>
suggests two changes to 1.10p14:
</p>

<ol>
<li>	Qualifying the conflicting accesses as being both non-atomic.
	This change is unnecessary, because a single non-atomic operation
	in a set of conflicting operations is all that is required to
	result in a data race.
	This change is further incorrect, because it is possible to
	publish a reference to an atomic object within its constructor,
	which would permit a data race between an atomic operation from
	some other thread on the one hand and the remainder of the
	(non-atomic) accesses from the constructor on the other.
	<p>
	Recommendation: no change.
	<p>
<li>	Correct use of simple locks results in simple interleaving, as
	currently stated in 1.10p14.
	If atomic <code>memory_order_seq_cst</code>
	operations are also used outside
	of lock-based critical sections, the result is still simple
	interleaving.
	If atomic <code>memory_order_seq_cst</code>
	operations are also used both
	inside and outside of lock-based critical sections, the
	result is still sequentially consistent, but the individual
	lock-based critical sections are no longer simply interleaved.
	However, the result will be consistent with at least one simple
	interleaving of
	the individual operations making up each critical section.
	<p>
	Recommendation: update the note to cover atomic
	<code>memory_order_seq_cst</code> operations, reworking the wording
	appropriately.
</ol>
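<p>
The second point can be illustrated with the classic store-buffering
litmus test, written here as a minimal sketch using the C++0x
<code>std::atomic</code>, <code>std::mutex</code>, and
<code>std::thread</code> facilities.
One thread performs its <code>memory_order_seq_cst</code> operations
inside a lock-based critical section and the other performs them outside
any critical section.
The second thread can interleave between the first thread's store and
load, so the critical section is not interleaved as a unit, but the outcome
<code>r1 == 0 &amp;&amp; r2 == 0</code>, which no interleaving of the
individual operations can produce, is still forbidden.
The names <code>sb_result</code>, <code>store_buffering_trial</code>, and
<code>never_both_zero</code> are hypothetical.
</p>

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Outcome of one run of the store-buffering litmus test.
struct sb_result {
    int r1;
    int r2;
};

// Hypothetical helper: thread a performs its seq_cst operations inside a
// lock-based critical section; thread b performs its seq_cst operations
// outside any critical section.
sb_result store_buffering_trial() {
    std::atomic<int> x(0);
    std::atomic<int> y(0);
    std::mutex m;
    int r1 = -1;
    int r2 = -1;

    std::thread a([&] {
        std::lock_guard<std::mutex> guard(m);
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    a.join();  // join() synchronizes with thread completion, so
    b.join();  // reading r1 and r2 below is race-free
    assert((r1 == 0 || r1 == 1) && (r2 == 0 || r2 == 1));
    return sb_result{r1, r2};
}

// Returns true if no trial produced r1 == 0 && r2 == 0, an outcome that
// no simple interleaving of the four seq_cst operations allows.
bool never_both_zero(int trials) {
    for (int i = 0; i < trials; ++i) {
        sb_result r = store_buffering_trial();
        if (r.r1 == 0 && r.r2 == 0)
            return false;
    }
    return true;
}
```

<p>
A run of trials that never observes both zeros is only a sanity check,
not a proof; the guarantee comes from the single total order over all
<code>memory_order_seq_cst</code> operations.
Had the threads used <code>memory_order_relaxed</code> or release/acquire
ordering instead, both loads returning zero would be permitted.
</p>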

<h4>Wording for Issue 2</h4>

<p>
Reword the non-normative note in 1.10p14 to include sequentially consistent
atomic operations as well as lock-based critical sections, as follows:
</p>

<blockquote>
	<p>
	The execution of a program contains a data race if it contains
	two conflicting actions in different threads, at least one of
	which is not atomic, and neither happens before the other. Any
	such data race results in undefined behavior. [ Note: It can be
	shown that programs that correctly use simple locks
	<ins>
	and <code>memory_order_seq_cst</code> operations
	</ins>
	to prevent all
	data races and
	<ins>
	that
	</ins>
	use no other synchronization operations behave as
	<del>
	the executions of
	</del>
	<ins>
	if the operations executed by
	</ins>
	their constituent threads
	<del>
	were
	</del>
	<ins>
	are
	</ins>
	simply
	interleaved, with each
	<del>
	observed value
	</del>
	<ins>
	value computation
	</ins>
	of an object being the
	<del>
	last value assigned
	</del>
	<ins>
	last side effect on that object
	</ins>
	in that interleaving. This is normally
	referred to as &ldquo;sequential consistency&rdquo;. However,
	this applies only to race-free programs, and race-free programs
	cannot observe most program transformations that do not change
	single-threaded program semantics. In fact, most single-threaded
	program transformations continue to be allowed, since any program
	that behaves differently as a result must perform an undefined
	operation. &mdash; end note ]
	</p>
</blockquote>

<p><b>Priority: Medium.</b>
</p>

<h2>Core Non-Issues</h2>

<h3>Non-Issue 1: 1.10p4 Overhead of Access to Atomic Objects</h3>

<p>Atomic and locking objects are not trivially copyable [29.5.1p2, 29.5.2p1,
29.5.3p2], so the result of copying them (for example,
via <code>std::memcpy</code>)
is not specified by the standard [3.9].
Additionally, if the <code>memcpy</code>
operation results in a data race, then undefined behavior is explicitly
specified by the working draft [1.10p14].
</p>

<p>
There was some spirited discussion of the non-data-race case on the
email reflector, with the following positions outlined:
</p>

<ol>
<li>	Peter Dimov argued that atomic integral types have standard layout
	[29.5.1p2, 29.5.2p1], and that there was therefore no good reason
	to prohibit copying out the underlying memory
	locations of an atomic object.
	Peter further argued that atomic accesses to large objects
	can incur high overheads, even when using
	<code>memory_order_relaxed</code>,
	and that there are a number of
	situations (including some implementations of resizeable
	hash tables) where most accesses to a given object are not
	subject to data races.
	In such cases, there is good reason to avoid
	<code>memory_order_relaxed</code>'s
	overhead for accesses known to be data-race free.
<li>	Clark Nelson argued against copying to atomic objects, even in
	the absence of a data race, given that some implementations might
	have non-trivial representations.  Clark was willing to
	entertain the thought of copying from atomic objects, but only in
	the absence of data races.
<li>	An informal poll of the Core Working Group resulted in the position
	that copying non-trivially copyable objects (e.g., via
	<code>memcpy</code>)
	was at best unspecified, at worst undefined.
<li>	Paul McKenney argued that mandating copyability might rule out
	active-memory hardware optimizations, and that the behavior
	should thus remain undefined.
	The effect of copying out an atomic object's
	underlying representation can be
	efficiently emulated via a <code>memory_order_relaxed</code> load, so
	it is not necessary to define the effect of copying the
	underlying representation.
	Furthermore, the effect of copying an underlying representation
	to an atomic object can be both safely and efficiently emulated
	via a <code>memory_order_relaxed</code> store for machine-word-sized
	accesses, which are the most common in practice.
<li>	Some time back, Alexander Terekhov is said to have proposed
	an additional <code>memory_order</code> <code>enum</code>
	member that would permit the implementation to access the
	atomic object non-atomically (for the purposes of this paper,
	call it <code>memory_order_nonatomic</code>).
	This could be thought of as specifying memory ordering that is
	so relaxed that the implementation need not even guarantee
	indivisibility of different accesses to the same
	atomic object.
	A <code>memory_order_nonatomic</code>
	operation would therefore be subject
	to data races.
<li>	Hans Boehm proposed leaving 1.10p4 as is and stating some
	form of the prohibition in clause 29 or 30.  Peter Dimov
	and Paul McKenney agreed with this approach, with Paul
	suggesting 29.3p1.
</ol>

<p>Therefore, this paper recommends no changes to 1.10p4.
This paper does not recommend adding <code>memory_order_nonatomic</code>
to C++0x, but something similar should be considered for a later TR
or a later version of the standard.
</p>
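<p>
The fourth point above, that copying out of or into an atomic object can
be emulated with <code>memory_order_relaxed</code> operations rather than
<code>memcpy</code>, might look like the following minimal sketch.
The helpers <code>copy_out</code>, <code>copy_in</code>, and
<code>demo_copy</code> are hypothetical, and the example assumes a
machine-word-sized <code>std::atomic&lt;long&gt;</code>, for which such
relaxed accesses are typically plain load and store instructions.
</p>

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: reads an atomic object's current value with no
// ordering constraints, in place of memcpy()ing its bytes out.
template <typename T>
T copy_out(const std::atomic<T>& src) {
    return src.load(std::memory_order_relaxed);
}

// Hypothetical helper: writes a plain value into an atomic object with no
// ordering constraints, in place of memcpy()ing bytes into it.
template <typename T>
void copy_in(std::atomic<T>& dst, T value) {
    dst.store(value, std::memory_order_relaxed);
}

// Usage: snapshot a machine-word-sized atomic, then overwrite it.
inline void demo_copy() {
    std::atomic<long> a(42);
    long snapshot = copy_out(a);
    assert(snapshot == 42);
    copy_in(a, snapshot + 1);
    assert(a.load(std::memory_order_relaxed) == 43);
}
```

<p>
Unlike <code>memcpy</code>, both helpers remain well defined even when
another thread accesses the same object concurrently, because each is an
indivisible atomic access.
</p>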

<p><b>Priority: N/A.</b>
</p>

<h3>Non-Issue 2: 1.10p6 Mathematical Meaning of Maximal</h3>

<p>
The phrase &ldquo;<var>M</var> is a maximal contiguous&rdquo; could
be interpreted as meaning the sequence having the maximum value or
any of a number of alternative interpretations.
However, there are other instances of this usage that were
not objected to, so this paper recommends no change.
</p>

<p><b>Priority: N/A.</b>
</p>

<h3>Non-Issue 3: 1.10p12 Initialization as Visible Side Effect</h3>

<p>
The intent of this paragraph is that initialization be considered a
separate access, but this is not explicitly stated.
There is some debate as to whether this needs to be explicitly stated.
In the absence of consensus, readers of this paragraph should
apply appropriate common sense.
</p>
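<p>
The intended reading can be illustrated with an atomic object that is
initialized before a second thread starts.
In this minimal sketch (<code>read_initial_value</code> is a hypothetical
name), completion of the <code>std::thread</code> constructor
synchronizes with the start of the new thread's function, so the
initialization happens before the relaxed load and is the only side
effect that load can observe.
</p>

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: initializes an atomic object, then reads it from a
// second thread with a relaxed load.
int read_initial_value(int init) {
    std::atomic<int> a(init);  // initialization: the first side effect on a
    int seen = 0;
    // Completion of the std::thread constructor synchronizes with the start
    // of the lambda, so the initialization happens before the load.
    std::thread t([&] { seen = a.load(std::memory_order_relaxed); });
    t.join();  // makes the lambda's write to seen visible here
    assert(seen == init);
    return seen;
}
```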

<p><b>Priority: N/A.</b>
</p>

<h3>Non-Issue 4: 1.10p13 Initialization as Visible Side Effect</h3>

<p>
As with 1.10p12, the intent of this paragraph is that initialization
be considered a separate access, but this is not explicitly stated.
There is some debate as to whether this needs to be explicitly stated.
In the absence of consensus, readers of this paragraph should
apply appropriate common sense.
</p>

<p><b>Priority: N/A.</b>
</p>

<h2>Library Issues</h2>

<h3>Issue 1: 29.3p1 Limits to Memory-Order Relaxation (Non-Normative)</h3>

<p>
Add a note stating that <code>memory_order_relaxed</code> operations
must maintain indivisibility, as described in the discussion of 1.10p4.
This must be considered in conjunction with the resolution to LWG 1151,
which is expected to be addressed by Hans Boehm in N3040.
</p>

<h4>Wording for Issue 1</h4>

<p>
Add a note as follows:
</p>

<blockquote>
	<p>
	The enumeration <code>memory_order</code> specifies the detailed
	regular (non-atomic) memory synchronization order
	as defined in 1.10 and may provide for operation ordering.
	Its enumerated values and their meanings are as follows:</p>
	<p>
	&mdash;	<code>memory_order_relaxed</code>: no operation orders memory.<br>
	&mdash;	<code>memory_order_release</code>,
		<code>memory_order_acq_rel</code>, and
		<code>memory_order_seq_cst</code>: a store operation performs
		a release operation on the affected memory location.<br>
	&mdash; <code>memory_order_consume</code>: a load operation
		performs a consume operation on the affected memory
		location.<br>
	&mdash; <code>memory_order_acquire</code>,
		<code>memory_order_acq_rel</code>, and
		<code>memory_order_seq_cst</code>: a load operation performs
		an acquire operation on the affected memory location.
	<p>
	<ins> [ Note: Atomic operations specifying
	<code>memory_order_relaxed</code> are relaxed only with
	respect to memory ordering.  Implementations must still
	guarantee that any given atomic access to a particular
	atomic object be indivisible with respect to all other
	atomic accesses to that object.  &mdash; end note ]</ins>
	</p>
</blockquote>
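<p>
The distinction the note draws between ordering and indivisibility can be
seen with concurrent <code>memory_order_relaxed</code> read-modify-write
operations.
In the following minimal sketch (<code>relaxed_count</code> is a
hypothetical name), the increments impose no ordering on other memory,
yet no increment is lost, because each <code>fetch_add</code> is
indivisible with respect to all other atomic accesses to the counter.
</p>

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: nthreads threads each perform iters relaxed
// increments of one shared counter; the final count is returned.
long relaxed_count(int nthreads, int iters) {
    assert(nthreads >= 0 && iters >= 0);
    std::atomic<long> counter(0);
    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; ++t) {
        threads.emplace_back([&counter, iters] {
            for (int i = 0; i < iters; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);  // indivisible
        });
    }
    for (std::thread& th : threads)
        th.join();  // join() makes every increment visible here
    // No increment is lost, so the result is exactly nthreads * iters.
    return counter.load(std::memory_order_relaxed);
}
```

<p>
Replacing the atomic counter with a plain <code>long</code> incremented
via <code>++</code> would introduce a data race, and such a counter can
lose updates in practice.
</p>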

<p><b>Priority: Low.</b>
</p>

<h3>Issue 2: 29.3p9 Schedulers, Loops, and Atomics (Normative)</h3>

<p>
The second sentence of this paragraph, &ldquo;Implementations shall not
move an atomic operation out of an unbounded loop&rdquo;, does not add
anything to the first sentence, and, worse, can be interpreted as restricting
the meaning of the first sentence.
This sentence should therefore be deleted.
The Library Working Group discussed this change during the Santa Cruz
meeting in October 2009, and agreed with this deletion.
</p>

<h4>Wording for Issue 2</h4>

<p>
Therefore, remove the second sentence of 29.3p9 as follows:
</p>

<blockquote>
	<p>
	Implementations should make atomic stores visible to atomic
	loads within a reasonable amount of time.
	<del>Implementations shall not move an atomic operation out of an
	unbounded loop.</del>
	</p>
</blockquote>
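<p>
The remaining first sentence is what a spin-wait such as the following
relies on: the waiting thread's atomic load is re-executed on each
iteration, and the implementation is expected to make the producer's store
visible to it within a reasonable amount of time.
This is a minimal sketch; <code>wait_for_flag</code> is a hypothetical
name, and the release/acquire pair is used so that the plain
<code>payload</code> write is also visible once the flag is seen.
</p>

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: a producer thread writes a payload and raises a
// flag; the calling thread spins on the flag, then reads the payload.
int wait_for_flag() {
    std::atomic<bool> ready(false);
    int payload = 0;

    std::thread producer([&] {
        payload = 42;                                  // plain store
        ready.store(true, std::memory_order_release);  // raise the flag
    });

    // The atomic load is re-executed on every iteration; per 29.3p9 the
    // producer's store should become visible here within a reasonable time.
    while (!ready.load(std::memory_order_acquire))
        std::this_thread::yield();

    int seen = payload;  // release/acquire pairing makes 42 visible
    producer.join();
    assert(seen == 42);
    return seen;
}
```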

<p><b>Priority: Medium.</b>
</p>

<h3>Issue 3: 29.5.1 Uninitialized Atomics and C/C++ Compatibility (Normative)</h3>

<p>
This topic was the subject of a spirited discussion among a subset of
the participants in the C/C++-compatibility effort this past October and
November.
</p>

<p>
Unlike C++, C has no mechanism to force a given variable to be
initialized.
Therefore, if C++ atomics are going to be compatible with those of C,
either C++ needs to tolerate uninitialized atomic objects, or C needs
to require that all atomic objects be initialized.
There are a number of cases to consider:
</p>
<ol>
<li>	C static variables.  The C standard specifies that these are
	initialized bitwise to zero.
	The C &ldquo;<code>={value}</code>&rdquo; syntax may be used
	to explicitly initialize these values; however, such initialization
	may <i>not</i> contain any statements executing at run time.
<li>	C on-stack <code>auto</code> variables.  The C standard does not
	require that these be initialized.
	On some machines, such variables might be
	initialized to an error value (for example, Not-a-Thing (NaT)
	for variables on Itanium that live only in a machine register).
	The C &ldquo;<code>={value}</code>&rdquo; syntax may be used
	to explicitly initialize these values, and may include
	statements executing at run time.
<li>	C dynamically allocated variables, for example, via
	<code>malloc()</code>.
	The C standard does not require that these be initialized.
	The C &ldquo;<code>={value}</code>&rdquo; syntax may <i>not</i> be
	used to explicitly initialize these values.
</ol>

<p>Of course, C on-stack <code>auto</code> variables and dynamically
allocated variables are inaccessible to other threads until references
to them are published.
Such publication must ensure that any initialization happens before any
access to the variable from another thread, for example, by use of
store release or locking.
</p>
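<p>
In C++ terms, such publication might look like the following minimal
sketch: the object, including its atomic member, is fully initialized
before a pointer to it is stored with <code>memory_order_release</code>,
so a reader that obtains the pointer with
<code>memory_order_acquire</code> is guaranteed to see the initialized
value.
The names <code>node</code>, <code>shared_node</code>,
<code>publish</code>, <code>read_value</code>, and
<code>demo_publication</code> are hypothetical.
</p>

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical node type with an atomic member.
struct node {
    std::atomic<int> a;
    explicit node(int v) : a(v) {}  // initialization precedes publication
};

// Pointer through which a node is published to other threads.
std::atomic<node*> shared_node(nullptr);

// Fully initializes a node, then publishes it with a release store.
void publish(int v) {
    node* p = new node(v);  // not yet reachable by other threads
    shared_node.store(p, std::memory_order_release);
}

// Waits for a node to be published, then reads its atomic member. The
// acquire load synchronizes with the release store, so the initialization
// of p->a happens before the load of p->a.
int read_value() {
    node* p;
    while ((p = shared_node.load(std::memory_order_acquire)) == nullptr)
        std::this_thread::yield();
    return p->a.load(std::memory_order_relaxed);
}

// Usage: one thread publishes while the calling thread reads.
int demo_publication() {
    std::thread writer(publish, 99);
    int v = read_value();
    writer.join();
    assert(v == 99);
    delete shared_node.exchange(nullptr);  // unpublish and free the node
    return v;
}
```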

<p>
There are also a number of interesting constraints on these types:
</p>
<ol>
<li>	The
	<a href="http://open-std.org/jtc1/sc22/wg21/docs/papers/2009/n3000.pdf">
	C++0x Working Draft</a>
	requires that the atomic integral type have
	standard layout (29.5.1p2).
<li>	The
	<a href="http://open-std.org/jtc1/sc22/wg21/docs/papers/2009/n3000.pdf">
	C++0x Working Draft</a>
	requires that the atomic pointer type have
	standard layout (29.5.2p1).
<li>	The
	<a href="http://open-std.org/jtc1/sc22/wg21/docs/papers/2009/n3000.pdf">
	C++0x Working Draft</a>
	requires that the atomic flag type have
	standard layout (29.7p3).
</ol>

<p>
These constraints permit the following known ways for C++ to make use of
non-generic atomic types defined in C-language translation units:
</p>

<ol>
<li>	The atomic type is a structure containing a single field of
	the underlying type, possibly followed by padding.
	There is an implementation-provided external lock table,
	and the implementation locates the lock corresponding to
	a given instance of an atomic type by hashing that
	instance's address.
	The implementation is of course responsible for correctly
	initializing the array of locks.
	This implementation permits C++ to tolerate an unspecified initial
	value for a given instance of an atomic type, but only in cases
	where every bit pattern corresponds to a valid value of the atomic
	type in question.
<li>	The atomic type is a structure containing a single field of
	the underlying type, possibly followed by padding.
	If the atomic type is implemented in a non-lock-free manner,
	an external table is used to check whether a given instance of
	an atomic type has been initialized, allowing it to be initialized
	if required.
	Such initialization could include any locks that might be embedded
	in an instance of the atomic type.
	This external table would
	be accessed by both C and C++ code for each access to the atomic
	variable in question (although a clever optimizer might be able
	to elide some table accesses).
	This table would clearly need to be implemented so as to tolerate
	multithreaded access and modification.
	In addition, special handling might be required to ensure that
	any atomic variables residing in deallocated memory were removed
	from the external table.
	There are therefore serious concerns about the overhead of this
	approach.
<li>	If the underlying hardware supports atomic operations that are
	large enough to cover the given non-generic atomic type, then
	those atomic operations can be used directly.
<li>	Any instance of an atomic type that is defined in a C-language
	translation unit must be initialized by C code before
	the first C++ use of that instance.
	This approach requires two syntaxes for C-language initialization,
	one to be applied to static variables and another for dynamically
	allocated objects.  Either syntax may be applied to <code>auto</code>
	variables.
</ol>
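<p>
The first alternative above, an implementation-provided lock table
indexed by a hash of the object's address, might be sketched in C++ as
follows.
This is illustrative only: <code>lock_table</code>,
<code>lock_for</code>, <code>locked_atomic</code>, and
<code>demo_locked_atomic</code> are hypothetical names, the table size of
64 and the shift-and-mask hash are arbitrary choices, and a real
implementation would use such a table only for types that are not
lock-free.
Because the object holds nothing but its value, an unspecified initial
bit pattern is harmless whenever every bit pattern is a valid value of the
underlying type.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical implementation-provided lock table. As a static array of
// std::mutex it is initialized by the implementation, not by user code.
constexpr std::size_t kLockTableSize = 64;  // arbitrary power of two
std::mutex lock_table[kLockTableSize];

// Locates the lock guarding an object by hashing the object's address.
std::mutex& lock_for(const volatile void* addr) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(addr);
    return lock_table[(bits >> 4) & (kLockTableSize - 1)];
}

// Hypothetical non-lock-free atomic: a single field of the underlying type
// and no embedded lock, so it needs no initialization beyond its value.
template <typename T>
struct locked_atomic {
    T value;

    T load() const {
        std::lock_guard<std::mutex> guard(lock_for(this));
        return value;
    }
    void store(T desired) {
        std::lock_guard<std::mutex> guard(lock_for(this));
        value = desired;
    }
    T exchange(T desired) {
        std::lock_guard<std::mutex> guard(lock_for(this));
        T old = value;
        value = desired;
        return old;
    }
};

// Usage: the object holds only its value; the locks live in the table.
inline long demo_locked_atomic() {
    locked_atomic<long> x{0};
    x.store(5);
    long old = x.exchange(7);
    assert(old == 5 && x.load() == 7);
    return x.load();
}
```

<p>
Distinct objects may hash to the same lock, which costs some concurrency
but not correctness, since a given object always maps to the same lock and
that lock serializes every access to it.
</p>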

<p>
The wording below permits any of the above implementation alternatives.
</p>

<h4>Wording for Issue 3</h4>

<p>Add the following to WG21 29.5.1 (Integral Types) in locations
corresponding to the existing <code>atomic_is_lock_free()</code> functions:
</p>

<blockquote>
	<p>
	<ins><code>atomic_bool ATOMIC_VAR_INIT(bool);</code></ins>
	</p>
	<p>
	<ins><code>void atomic_init(volatile atomic_bool*, bool);</code></ins>
	</p>
	<p>
	<ins><code>void atomic_init(atomic_bool*, bool);</code></ins>
	</p>
	<p>
	<ins><code>atomic_<i>itype</i> ATOMIC_VAR_INIT(<i>itype</i>);</code></ins>
	</p>
	<p>
	<ins><code>void atomic_init(volatile atomic_<i>itype</i>*, <i>itype</i>);</code></ins>
	</p>
	<p>
	<ins><code>void atomic_init(atomic_<i>itype</i>*, <i>itype</i>);</code></ins>
	</p>
</blockquote>

<p>
Note that <code>ATOMIC_INIT</code> is already in use, for example, in
the Linux kernel.
Google code search was unable to find <code>ATOMIC_VAR_INIT</code> or
<code>atomic_init</code>.
</p>

<p>Add the following to WG21 29.5.2 (Address Type) in the location
corresponding to the existing <code>atomic_is_lock_free()</code> function:
</p>

<blockquote>
	<p>
	<ins><code>atomic_address ATOMIC_VAR_INIT(void *);</code></ins>
	</p>
	<p>
	<ins><code>void atomic_init(volatile atomic_address*, void *);</code></ins>
	</p>
	<p>
	<ins><code>void atomic_init(atomic_address*, void *);</code></ins>
	</p>
</blockquote>

<p>Add the following after WG21 29.6p4 (Operations on Atomic Types):
</p>

<blockquote>
	<p>
	<ins><code>ATOMIC_VAR_INIT(x);</code></ins>
	</p>
	<p>
	<ins>A macro expanding to a token sequence suitable for initializing an
	atomic variable of a type that is the atomic equivalent of the
	type of <var>x</var>.
	Concurrent access to the variable being initialized, even via an
	atomic operation, constitutes a data race. </ins>
	</p>
	<p>
	<ins>[ Example: </ins>
	</p>
	<p>
	<ins><code>atomic_int v = ATOMIC_VAR_INIT(5);</code></ins>
	</p>
	<p>
	<ins>&mdash; end example ]</ins>
	</p>
</blockquote>

<p>Add the following after WG21 29.6p5 (Operations on Atomic Types):
</p>

<blockquote>
	<p>
	<ins><code>void atomic_init(volatile A *object, C desired);</code></ins>
	</p>
	<p>
	<ins><code>void atomic_init(A *object, C desired);</code></ins>
	</p>
	<p>
	<ins><i>Effects:</i> Non-atomically assigns the value <var>desired</var> to
	<var>object</var>.
	Concurrent access from another thread, even via an atomic operation,
	constitutes a data race.</ins>
	</p>
</blockquote>
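<p>
As later standardized in C++11, both facilities appear in the
<code>&lt;atomic&gt;</code> header as <code>ATOMIC_VAR_INIT</code> and
<code>std::atomic_init</code> (both deprecated as of C++20, where the
default constructor of <code>std::atomic</code> value-initializes).
The following minimal sketch shows the two uses distinguished by the
wording above: initialization of a static object from a constant, and
execution-time initialization of a dynamically allocated object before any
other thread can reach it.
The names <code>holder</code> and <code>make_holder</code> are
hypothetical.
</p>

```cpp
#include <atomic>
#include <cassert>

// Static initialization from a constant, mirroring the proposed
// "atomic_int v = ATOMIC_VAR_INIT(5);" example.
std::atomic_int v = ATOMIC_VAR_INIT(5);

// Hypothetical heap-allocated structure with an atomic member.
struct holder {
    std::atomic_int a;  // before C++20, default-initialization leaves this uninitialized
};

// Execution-time initialization, mirroring "atomic_init(&p->a, 42);".
// atomic_init is not an atomic operation, so p must not be published to
// other threads until it returns.
holder* make_holder(int value) {
    holder* p = new holder;
    std::atomic_init(&p->a, value);
    assert(p->a.load(std::memory_order_relaxed) == value);
    return p;
}
```

<p>
In C++20 and later, direct initialization such as
<code>std::atomic_int v{5};</code> serves the same purpose as
<code>ATOMIC_VAR_INIT</code>.
</p>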

<p>
In addition, WG14's C-language working draft needs initializers
for non-flag atomic types (the C++ working draft already provides
initialization via constructors).
These are listed below for convenience, but will need to be the subject
of a later WG14 paper.
</p>

<p>Change WG14 7.16.1p1 as follows:
</p>

<blockquote>
	<p>
	The header &lt;stdatomic.h&gt; defines <del>three</del><ins>four</ins>
	macros and declares several types and functions for performing
	atomic operations on data shared between threads.
	</p>
</blockquote>

<p>Change WG14 7.16.1p2 as follows:
</p>

<blockquote>
	<p> The macros defined are</p>
	<p>
	<code>ATOMIC_INTEGRAL_LOCK_FREE</code><br>
	<code>ATOMIC_ADDRESS_LOCK_FREE</code><br>
	</p>
	<p>which indicate the general lock-free property of integer and
	address atomic types; and</p>
	<p>
	<del><code>ATOMIC_FLAG_INIT</code></del><br>
	<ins><code>ATOMIC_VAR_INIT</code></ins><br>
	<ins><code>atomic_init</code></ins><br>
	</p>
	<p><del>which expands to an initializer for an object of type
	atomic_flag.</del>
	<ins>which expand to an initializer for an atomic type
	and to an execution-time initializer for an atomic type,
	respectively.</ins></p>
</blockquote>

<p>Add a new section to WG14 named &ldquo;Initialization&rdquo;:
</p>

<blockquote>
	<p><ins>7.16.N Initialization</ins></p>
	<p><ins>The macro <code>ATOMIC_VAR_INIT</code> may be used
	to initialize an atomic variable declaration; however,
	the default zero-initialization is guaranteed to produce
	a valid object where it applies.</ins></p>
	<p><ins>EXAMPLE</ins></p>
	<p><ins><code>atomic_int guide = ATOMIC_VAR_INIT(42);</code></ins></p>
	<p><ins>The macro <code>atomic_init</code> may be used
	to initialize an atomic variable at execution time, for example,
	for atomic variables that have been dynamically allocated.</ins></p>
	<p><ins>EXAMPLE</ins></p>
	<p><ins><code>atomic_init(&amp;p->a, 42);</code></ins></p>
	<p><ins>An atomic variable that is not explicitly initialized with
	<code>ATOMIC_VAR_INIT</code> is initially in an indeterminate
	state.</ins></p>
</blockquote>

<p>Delete WG14 7.16.7p4:
</p>

<blockquote>
	<p>
	<del>
	The macro <code>ATOMIC_FLAG_INIT</code> may be used
	to initialize an atomic_flag to the clear state. An
	atomic_flag that is not explicitly initialized with
	<code>ATOMIC_FLAG_INIT</code> is initially in an indeterminate
	state.</del></p>
	<p><del>EXAMPLE</del></p>
	<p><del><code>atomic_flag guard = ATOMIC_FLAG_INIT;</code></del></p>
</blockquote>

<p><b>Priority: Medium.</b>
</p>

</body></html>
