<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html><head>



<meta http-equiv="Content-Type" content="text/html;charset=us-ascii"><title>3074: Updates to C++ Memory Model Based on Formalization</title></head><body>
<h1>Updates to C++ Memory Model Based on Formalization</h1>

<p>
ISO/IEC JTC1 SC22 WG21 N3074 = 10-0064 - 2010-03-11
</p>

<p>
Paul E. McKenney, paulmck@linux.vnet.ibm.com
<br>
Mark Batty, mjb220@cl.cam.ac.uk
<br>
Clark Nelson, clark.nelson@intel.com
<br>
N.M. Maclaren, nmm1@cam.ac.uk
<br>
Hans Boehm, hans.boehm@hp.com
<br>
Anthony Williams, anthony@justsoftwaresolutions.co.uk
<br>
Peter Dimov, pdimov@mmltd.net
<br>
Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org
</p>

<h2>Introduction</h2>

<p>
Mark Batty recently undertook a partial formalization of the C++
memory model, which he summarized in
<a href="http://open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2955.html">N2955</a>.
This paper summarizes the discussions on Mark's paper, both verbal and
email, recommending appropriate actions for the Core Working Group.
Library issues are dealt with in a separate paper, N3057.
</p>

<p>This paper is based on
<a href="http://open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3045.html">N3045</a>,
and has been updated to reflect discussions in the Concurrency subgroup
of the Library Working Group in Pittsburgh.
This paper also carries the C-language side of
<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3040.html">N3040</a>,
which was also discussed in the Concurrency subgroup of the Library
Working Group in Pittsburgh.
</p>

<h2>Core Issues</h2>

<h3>Core Issue 1: 1.10p2 &ldquo;Might Be&rdquo; Might Be Indefinite (Editorial)</h3>

<p>
The phrase &ldquo;might be&rdquo; is indefinite and should be reworded.
</p>

<p>Recommendation: replace &ldquo;might be&rdquo; with &ldquo;is&rdquo; in 1.10p2.
</p>

<h3>Core Issue 2: 1.10p14 Lock and SC Operations Interleave Simply (Non-Normative)</h3>

<p>
<a href="http://open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2955.html">N2955</a>
suggests two changes to 1.10p14:
</p>

<ol>
<li>	Qualifying the conflicting accesses as being both non-atomic.
	This change is unnecessary, because a single non-atomic operation
	in a set of conflicting operations is all that is required to
	result in a data race.
	This change is further incorrect, because it is possible to
	publish a reference to an atomic object within its constructor,
	which would permit a data race between an atomic operation from
	some other thread on the one hand and the remainder of the
	(non-atomic) accesses from the constructor on the other.
	<p>
	Recommendation: no change.
	<p>
<li>	Correct use of simple locks results in simple interleaving, as
	currently stated in 1.10p14.
	If atomic <code>memory_order_seq_cst</code>
	operations are also used outside
	of lock-based critical sections, the result is still simple
	interleaving.
	If atomic <code>memory_order_seq_cst</code>
	operations are also used both
	inside and outside of lock-based critical sections, the
	result is still sequentially consistent, but the individual
	lock-based critical sections are no longer simply interleaved.
	However, the result will be consistent with at least one simple
	interleaving of
	the individual operations making up each critical section.
	<p>
	Recommendation: update note to include atomic
	<code>memory_order_seq_cst</code>, reworking the wording
	appropriately.
</ol>
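<p>
The second item can be illustrated with a minimal sketch, which is not part
of any proposed wording.
All names in it (<code>Outcome</code>, <code>run_once</code>,
<code>counter</code>) are hypothetical and chosen only for this example.
Two threads use <code>memory_order_seq_cst</code> operations outside of a
critical section and a simple lock around a non-atomic counter.
Assuming a conforming implementation, every execution should be explainable
by some simple interleaving of the individual operations, so the two loads
cannot both return zero.
</p>

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical result record for one run of the sketch.
struct Outcome {
    int r1;       // value of y observed by thread a
    int r2;       // value of x observed by thread b
    int counter;  // lock-protected non-atomic counter after both threads finish
};

// Runs the two-thread scenario once and reports what each thread observed.
Outcome run_once() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    std::mutex m;
    int counter = 0;  // non-atomic; every access is inside a critical section
    int r1 = -1;
    int r2 = -1;

    std::thread a([&] {
        // seq_cst operations used outside of any critical section.
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
        // Simple lock protecting the non-atomic counter.
        std::lock_guard<std::mutex> guard(m);
        ++counter;
    });
    std::thread b([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
        std::lock_guard<std::mutex> guard(m);
        ++counter;
    });

    a.join();
    b.join();

    // In any simple interleaving, whichever store comes first is seen by the
    // other thread's later load, so r1 and r2 cannot both be zero.
    assert(!(r1 == 0 && r2 == 0));
    assert(counter == 2);
    return {r1, r2, counter};
}
```

<p>
Replacing the <code>memory_order_seq_cst</code> arguments with weaker
orderings would take the program outside the scope of the note, and the
outcome in which both loads return zero would then be permitted.
</p>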

<h2>Core Non-Issues</h2>

<h3>Core Non-Issue 1: 1.10p4 Overhead of Access to Atomic Objects</h3>

<p>Atomic and locking objects are not trivially copyable [29.5.1p2, 29.5.2p1,
29.5.3p2], so the result of copying them (for example,
via <code>std::memcpy</code>)
is not specified by the standard [3.9].
Additionally, if the <code>memcpy</code>
operation results in a data race, then undefined behavior is explicitly
specified by the working draft [1.10p14].
</p>

<p>
There was some spirited discussion of the non-data-race case on the
email reflector, with the following positions outlined:
</p>

<ol>
<li>	Peter Dimov argued that atomic integral types have standard layout
	[29.5.1p2, 29.5.2p1], and that there was therefore no good reason
	to prohibit copying out the underlying memory
	locations of an atomic object.
	Peter further argued that atomic accesses to large objects
	can incur high overheads, even when using
	<code>memory_order_relaxed</code>,
	and that there are a number of
	situations (including some implementations of resizeable
	hash tables) where most accesses to a given object are not
	subject to data races.
	In such cases, there is good reason to avoid
	<code>memory_order_relaxed</code>'s
	overhead for accesses known to be data-race free.
<li>	Clark Nelson argued against copying to atomic objects, even in
	the absence of a data race, given that some implementations might
	have non-trivial representations.  Clark was willing to
	entertain the thought of copying from atomic objects, but only in
	the absence of data races.
<li>	An informal poll of the Core Working Group resulted in the position
	that copying non-trivially copyable objects (e.g., via
	<code>memcpy</code>)
	was at best unspecified, at worst undefined.
<li>	Paul McKenney argued that mandating copyability might rule out
	active-memory hardware optimizations, and that the behavior
	should thus remain undefined.
	The effect of copying out an atomic object's
	underlying representation can be
	efficiently emulated via a <code>memory_order_relaxed</code> load, so
	it is not necessary to define the effect of copying the
	underlying representation.
	Furthermore, the effect of copying an underlying representation
	to an atomic object can be both safely and efficiently emulated
	via a <code>memory_order_relaxed</code> store for machine-word-sized
	accesses, which are the most common in practice.
<li>	Some time back, Alexander Terekhov is said to have proposed
	an additional <code>memory_order</code> <code>enum</code>
	member that would permit the implementation to access the
	atomic object non-atomically (for the purposes of this paper,
	call it <code>memory_order_nonatomic</code>).
	This could be thought of as specifying memory ordering that is
	so relaxed that the implementation need not even guarantee
	indivisibility of different accesses to the same
	atomic object.
	A <code>memory_order_nonatomic</code>
	operation would therefore be subject
	to data races.
<li>	Hans Boehm proposed leaving 1.10p4 as is and stating some
	form of the prohibition in clause 29 or 30.  Peter Dimov
	and Paul McKenney agreed with this approach, with Paul
	suggesting 29.3p1.
</ol>

<p>Therefore, this paper recommends no changes to 1.10p4.
This paper does not recommend adding <code>memory_order_nonatomic</code>
to C++0x, but something similar should be considered for a later TR
or a later version of the standard.
</p>
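<p>
The emulation described in Paul McKenney's position can be sketched as
follows.
This is an illustration only: the helper names <code>copy_out</code>,
<code>copy_in</code>, and <code>relaxed_copy_demo</code> are hypothetical,
and <code>std::uintptr_t</code> is assumed to stand in for a
machine-word-sized type.
The sketch uses <code>memory_order_relaxed</code> loads and stores in place
of <code>std::memcpy</code> on the underlying representation.
</p>

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: read the current value of an atomic object without
// imposing any ordering on surrounding accesses.
std::uintptr_t copy_out(const std::atomic<std::uintptr_t>& src) {
    return src.load(std::memory_order_relaxed);
}

// Hypothetical helper: write a value into an atomic object, again without
// imposing any ordering on surrounding accesses.
void copy_in(std::atomic<std::uintptr_t>& dst, std::uintptr_t value) {
    dst.store(value, std::memory_order_relaxed);
}

// Single-threaded round trip through the two helpers.
void relaxed_copy_demo() {
    std::atomic<std::uintptr_t> word{42};
    assert(copy_out(word) == 42);
    copy_in(word, 7);
    assert(copy_out(word) == 7);
}
```

<p>
Unlike <code>std::memcpy</code>, these accesses remain atomic operations, so
they cannot themselves introduce a data race on the atomic object.
</p>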

<h3>Core Non-Issue 2: 1.10p6 Mathematical Meaning of Maximal</h3>

<p>
The phrase &ldquo;<var>M</var> is a maximal contiguous&rdquo; could
be interpreted as meaning the sequence having the maximum value or
any of a number of alternative interpretations.
However, there were other instances of this usage that were
not objected to, so this paper recommends no change.
</p>

<h3>Core Non-Issue 3: 1.10p12 Initialization as Visible Side Effect</h3>

<p>
The intent of this paragraph is that initialization be considered a
separate access, but this is not explicitly stated.
There is some debate as to whether this needs to be explicitly stated.
In the absence of consensus, let those who read the words of this paragraph
apply appropriate common sense.
</p>

<h3>Core Non-Issue 4: 1.10p13 Initialization as Visible Side Effect</h3>

<p>
As with 1.10p12, the intent of this paragraph is that initialization
be considered a separate access, but this is not explicitly stated.
There is some debate as to whether this needs to be explicitly stated.
In the absence of consensus, let those who read the words of this paragraph
apply appropriate common sense.
</p>

<h2>WG21 C++-Language Wording</h2>

<p>This section lists WG21 C++-language wording.
The corresponding WG14 C-language wording is shown in a separate section
to ease coordination of changes to the two working drafts.
</p>

<h4>Wording for Core Issue 1</h4>

<p>Reword 1.10p2 as follows:
</p>

<blockquote>
	<p>
	The value of an object visible to a thread <var>T</var>
	at a particular point <del>might be</del><ins>is</ins> the
	initial value of the object, a value assigned to the object by
	<var>T</var>, or a value assigned to the object by another
	thread, according to the rules below. [ <i>Note:</i> In some
	cases, there may instead be undefined behavior. Much of this
	section is motivated by the desire to support atomic operations
	with explicit and detailed visibility constraints. However,
	it also implicitly supports a simpler view for more restricted
	programs. &mdash; <i>end note</i> ]
	</p>
</blockquote>

<h4>Wording for Core Issue 2</h4>

<p>
Reword the non-normative note in 1.10p14 to include sequentially consistent
atomic operations as well as lock-based critical sections, as follows:
</p>

<blockquote>
	<p>
	The execution of a program contains a data race if it contains
	two conflicting actions in different threads, at least one of
	which is not atomic, and neither happens before the other. Any
	such data race results in undefined behavior. [ Note: It can be
	shown that programs that correctly use simple locks
	<ins>
	and <code>memory_order_seq_cst</code> operations
	</ins>
	to prevent all
	data races and
	<ins>
	that
	</ins>
	use no other synchronization operations behave as
	<del>
	the executions of
	</del>
	<ins>
	if the operations executed by
	</ins>
	their constituent threads
	<del>
	were
	</del>
	<ins>
	are
	</ins>
	simply
	interleaved, with each
	<del>
	observed value
	</del>
	<ins>
	value computation
	</ins>
	of an object being the
	<del>
	last value assigned
	</del>
	<ins>
	last side effect on that object
	</ins>
	in that interleaving. This is normally
	referred to as &ldquo;sequential consistency&rdquo;. However,
	this applies only to
	<ins>data-</ins>race-free programs, and
	<ins>data-</ins>race-free programs
	cannot observe most program transformations that do not change
	single-threaded program semantics.
	In fact, most single-threaded
	program transformations continue to be allowed, since any program
	that behaves differently as a result must perform an undefined
	operation. &mdash; end note ]
	</p>
</blockquote>
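<p>
As an illustration only, and not as part of the wording above, the following
sketch shows how a <code>memory_order_seq_cst</code> operation can prevent a
data race on a non-atomic object.
The names <code>publish_and_read</code>, <code>payload</code>, and
<code>ready</code> are hypothetical.
The write to <code>payload</code> and the read of it are conflicting
non-atomic actions in different threads, but the store to and load from
<code>ready</code> are assumed to order them, so one happens before the other.
</p>

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example: publish a non-atomic value through a seq_cst flag.
int publish_and_read() {
    int payload = 0;                  // non-atomic object
    std::atomic<bool> ready{false};   // atomic flag guarding payload
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // non-atomic write
        ready.store(true, std::memory_order_seq_cst);  // publish
    });
    std::thread consumer([&] {
        // Wait until the producer's store to the flag becomes visible.
        while (!ready.load(std::memory_order_seq_cst)) {
            std::this_thread::yield();
        }
        seen = payload;  // non-atomic read, ordered after the write
    });

    producer.join();
    consumer.join();

    // The read happens after the write, so it observes the published value.
    assert(seen == 42);
    return seen;
}
```

<p>
If the consumer read <code>payload</code> without first observing
<code>ready</code> as true, neither access would happen before the other,
and the program would contain a data race as defined in 1.10p14.
</p>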

<h2>WG14 C-Language Wording</h2>

<p>Change WG14 5.1.2.4p2 as follows:
</p>

<blockquote>
	<p>The value of an object visible to a thread <var>T</var>
	at a particular point <del>might be</del> <ins>is</ins> the
	initial value of the object, a value assigned to the object
	by <var>T</var>, or a value assigned to the object by another
	thread, according to the rules below.
	</p>
</blockquote>

<p>Change WG14 5.1.2.4p23 as follows:
</p>

<blockquote>
	<p>
	NOTE 11 It can be shown that programs that correctly use simple
	locks <ins>and <code>memory_order_seq_cst</code> operations</ins>
	to prevent all data races<del>,</del> and <ins>that</ins>
	use no other synchronization operations<del>,</del> behave
	as though <del>the executions of</del> <ins>the operations
	executed by</ins> their constituent threads <del>were</del>
	<ins>are</ins> simply interleaved, with each <del>observed
	value</del> <ins>value computation</ins> of an object being
	the <del>last value assigned</del> <ins>last side effect on
	that object</ins> in that interleaving.  This is normally
	referred to as &ldquo;sequential consistency&rdquo;. However,
	this applies only to <ins>data-</ins>race-free programs, and
	<ins>data-</ins>race-free programs cannot observe most program
	transformations that do not change single-threaded program
	semantics. In fact, most single-threaded program transformations
	continue to be allowed, since any program that behaves differently
	as a result must contain undefined behavior.
	</p>
</blockquote>

</body></html>
