<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.1//EN"
	"http://www.w3.org/TR/xhtml-basic/xhtml-basic11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-us">

<head>
<title>WG21/N2429: Concurrency memory model (final revision)</title>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />
<style type="text/css">
.deleted {
	text-decoration: line-through
}
.inserted {
	text-decoration: underline
}
</style>
</head>

<body>

<table summary="This table provides identifying information for this document.">
	<tr>
		<th>Doc. No.:</th>
		<td>WG21/N2429<br />
		J16/07-0299</td>
	</tr>
	<tr>
		<th>Date:</th>
		<td>2007-10-05</td>
	</tr>
	<tr>
		<th>Reply to:</th>
		<td>Clark Nelson</td>
		<td>Hans-J. Boehm</td>
	</tr>
	<tr>
		<th>Phone:</th>
		<td>+1-503-712-8433</td>
		<td>+1-650-857-3406</td>
	</tr>
	<tr>
		<th>Email:</th>
		<td><a href="mailto:clark.nelson@intel.com">clark.nelson@intel.com</a></td>
		<td><a href="mailto:Hans.Boehm@hp.com">Hans.Boehm@hp.com</a></td>
	</tr>
</table>
<h1>Concurrency memory model (final revision)</h1>
<p>This paper is a revision of
<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2334.htm">N2334</a>. 
The most significant changes since N2334 include:</p>
<ul>
	<li>The statement about non-terminating non-interacting loops was redrafted 
	to avoid giving them undefined behavior.</li>
	<li>A note was added to the paragraph defining &quot;memory location&quot;, pointing out 
	that references and other memory objects introduced by the implementation 
	may be included.</li>
	<li>Paragraph 1.10p4 was tweaked to clarify dependencies between the language 
	and the library.</li>
	<li>The definition of "visible sequence" (1.10p10) was improved.</li>
	<li>Most rationale was deleted and the paper was rearranged, to simplify the 
	job of the Project Editor in applying the proposed changes.</li>
</ul>
<p>This paper has benefited from feedback from many people, including Sarita Adve, 
Paul McKenney, Raul Silvera, Lawrence Crowl, and Peter Dimov.</p>
<h2>Contents</h2>
<ul>
	<li><a href="#loops">Non-terminating loops</a></li>
	<li><a href="#location">The definition of &quot;memory location&quot;</a></li>
	<li><a href="#races">Multi-threaded executions and data races</a></li>
	<li><a href="#exceptions">Treatment of uncaught exceptions</a></li>
</ul>
<h2><a id="loops">Non-terminating loops</a></h2>
<p>It is generally felt to be important to allow the transformation of potentially 
non-terminating loops (e.g. by merging two loops that iterate over the same potentially 
infinite set, or by eliminating a side-effect-free loop), even when such a transformation 
would not otherwise be justified in the case in which the first loop never terminates.</p>
<p>Existing compilers commonly assume that code immediately following a loop is 
executed if and only if code immediately preceding a loop is executed. This assumption 
is clearly invalid if the loop fails to terminate. Even if we wanted to prohibit 
this behavior, it is unclear that all relevant compilers could comply in a reasonable 
amount of time. The assumption appears both pervasive and hard to test for.</p>
<p>The treatment of non-terminating loops in the current standard is very unclear. 
We believe that some implementations already eliminate potentially non-terminating, 
side-effect-free loops, probably based on 1.9p9, which appears to impose very weak 
requirements on conforming implementations for non-terminating programs. We had 
previously arrived at a tentative conclusion that non-terminating loops were already 
sufficiently weakly specified that no changes were needed. We no longer believe 
this, for the following reasons:</p>
<ul>
	<li>On closer inspection, it is at best unclear that this reasoning would continue 
	to apply in a world in which the program may terminate even if one of the threads 
	does not.</li>
	<li>In the presence of threads, the elimination of certain side-effect-free 
	potentially infinite loops (e.g. <code>while (!please_self_destruct.load_acquire()) 
	{}; self_destruct()</code>) is clearly hazardous, and a bit more clarity seems 
	appropriate.</li>
</ul>
<hr />
<p><strong>Proposed changes to the Working Draft begin here:</strong></p>
<hr />
<p>Add new paragraph 6.5p5:</p>
<blockquote class="inserted">
	<p>A loop that, outside of the <var>for-init-statement</var> in the case of 
	a <code>for</code> statement,</p>
	<ul>
		<li>performs no I/O operations, and</li>
		<li>does not access or modify volatile objects, and</li>
		<li>performs no synchronization or atomic operations</li>
	</ul>
	<p>may be assumed by the implementation to terminate. [<em>Note:</em> This is 
	intended to allow compiler transformations, such as removal of empty loops, 
	even when termination cannot be proven. <em>end note</em>]</p>
</blockquote>
<h2><a id="location">The definition of &quot;memory location&quot;</a></h2>
<p>New paragraphs inserted as 1.7p3 et seq.:</p>
<blockquote class="inserted">
	<p>A <dfn>memory location</dfn> is either an object of scalar type, or a maximal 
	sequence of adjacent bit-fields all having non-zero width. [<em>Note:</em> Various 
	features of the language, such as references and virtual functions, might involve 
	additional memory locations that are not accessible to programs but are managed 
	by the implementation. <em>end note</em> ] Two threads of execution can update 
	and access separate memory locations without interfering with each other.</p>
	<p>[<em>Note</em>: Thus a bit-field and an adjacent non-bit-field are in separate 
	memory locations, and therefore can be concurrently updated by two threads of 
	execution without interference. The same applies to two bit-fields, if one is 
	declared inside a nested struct declaration and the other is not, or if the 
	two are separated by a zero-length bit-field declaration, or if they are separated 
	by a non-bit-field declaration. It is not safe to concurrently update two bit-fields 
	in the same struct if all fields between them are also bit-fields, no matter 
	what the sizes of those intervening bit-fields happen to be. <em>end note</em> 
	]</p>
	<p>[<em>Example</em>: A structure declared as <code>struct {char a; int b:5, 
	c:11, :0, d:8; struct {int ee:8;} e;}</code> contains four separate memory locations: 
	The field <code>a</code>, and bit-fields <code>d</code> and <code>e.ee</code> 
	are each separate memory locations, and can be modified concurrently without 
	interfering with each other. The bit-fields <code>b</code> and <code>c</code> 
	together constitute the fourth memory location. The bit-fields <code>b</code> 
	and <code>c</code> cannot be concurrently modified, but <code>b</code> and
	<code>a</code>, for example, can be. <em>end example</em>.]</p>
</blockquote>
<h2><a id="races">Multi-threaded executions and data races</a></h2>
<p>Insert a new section between 1.9 and 1.10, titled &quot;Multi-threaded executions 
and data races&quot;.</p>
<p>1.10p1:</p>
<blockquote class="inserted">
	<p>Under a hosted implementation, a C++ program can have more than one <dfn>
	thread of execution</dfn> (a.k.a. <dfn>thread</dfn>) running concurrently. The 
	execution of each thread proceeds as defined by the remainder of this standard. 
	The execution of the entire program consists of an execution of all of its threads. 
	[<em>Note:</em> Usually the execution can be viewed as an interleaving of all 
	its threads. However some kinds of atomic operations, for example, allow executions 
	inconsistent with a simple interleaving, as described below. <em>end note</em> 
	] Under a freestanding implementation, it is implementation-defined whether 
	a program can have more than one thread of execution.</p>
</blockquote>
<p>1.10p2:</p>
<blockquote class="inserted">
	<p>The value of an object visible to a thread <var>T</var> at a particular point 
	might be the initial value of the object, a value assigned to the object by
	<var>T</var>, or a value assigned to the object by another thread, according 
	to the rules below. [<em>Note:</em> In some cases, there may instead be undefined 
	behavior. Much of this section is motivated by the desire to support atomic 
	operations with explicit and detailed visibility constraints. However, it also 
	implicitly supports a simpler view for more restricted programs. <em>end note</em> 
	]</p>
</blockquote>
<p>1.10p3:</p>
<blockquote class="inserted">
	<p>Two expression evaluations <dfn>conflict</dfn> if one of them modifies a 
	memory location and the other one accesses or modifies the same memory location.</p>
</blockquote>
<p>1.10p4:</p>
<blockquote class="inserted">
	<p>The library defines a number of <dfn>atomic operations</dfn> (clause 29[atomics]) 
	and operations on locks (clause 30[thread]) that are specially identified as 
	synchronization operations. These operations play a special role in making assignments 
	in one thread visible to another. A <dfn>synchronization operation</dfn> is 
	either an <dfn>acquire</dfn> operation or a <dfn>release</dfn> operation, or 
	both, on one or more memory locations; the semantics of these are described 
	below. In addition, there are <dfn>relaxed</dfn> atomic operations, which are 
	not synchronization operations, and atomic <dfn>read-modify-write</dfn> operations, 
	which have special characteristics, also described below. [<em>Note:</em> For 
	example, a call that acquires a lock will perform an acquire operation on the 
	locations comprising the lock. Correspondingly, a call that releases the same 
	lock will perform a release operation on those same locations. Informally, performing 
	a release operation on <var>A</var> forces prior side effects on other memory 
	locations to become visible to other threads that later perform an acquire operation 
	on <var>A</var>. We do not include &quot;relaxed&quot; atomic operations as &quot;synchronization&quot; 
	operations although, like synchronization operations, they cannot contribute to 
	data races. <em>end note</em> ]</p>
</blockquote>
<p>1.10p5:</p>
<blockquote class="inserted">
	<p>All modifications to a particular atomic object <var>M</var> occur in some 
	particular total order, called the <dfn>modification order</dfn> of <var>M</var>. 
	If <var>A</var> and <var>B</var> are modifications of an atomic object <var>
	M</var>, and <var>A</var> happens before <var>B</var>, then <var>A</var> shall 
	precede <var>B</var> in the modification order of <var>M</var>, which is defined 
	below. [<em>Note:</em> This states that the modification orders must respect 
	&quot;happens before&quot;. <em>end note</em> ] [<em>Note:</em> There is a separate order 
	for each scalar object. There is no requirement that these can be combined into 
	a single total order for all objects. In general this will be impossible since 
	different threads may observe modifications to different variables in inconsistent 
	orders. <em>end note</em> ]</p>
</blockquote>
<p>1.10p6:</p>
<blockquote class="inserted">
	<p>A <dfn>release sequence</dfn> on an atomic object <var>M</var> is a maximal 
	contiguous sub-sequence of side effects in the modification order of <var>M</var>, 
	where the first operation is a release, and every subsequent operation</p>
	<ul>
		<li>is performed by the same thread that performed the release, or</li>
		<li>is a non-relaxed atomic read-modify-write operation.</li>
	</ul>
</blockquote>
<p>1.10p7:</p>
<blockquote class="inserted">
	<p>An evaluation <var>A</var> that performs a release operation on an object
	<var>M</var> <dfn>synchronizes with</dfn> an evaluation <var>B</var> that performs 
	an acquire operation on <var>M</var> and reads a value written by any side effect 
	in the release sequence headed by <var>A</var>. [<em>Note:</em> Except in the 
	specified cases, reading a later value does not necessarily ensure visibility 
	as described below. Such a requirement would sometimes interfere with efficient 
	implementation. <em>end note</em> ] [<em>Note:</em> The specifications of the 
	synchronization operations define when one reads the value written by another. 
	For atomic variables, the definition is clear. All operations on a given lock 
	occur in a single total order. Each lock acquisition &quot;reads the value written&quot; 
	by the last lock release. <em>end note</em> ]</p>
</blockquote>
<p>1.10p8:</p>
<blockquote class="inserted">
	<p>An evaluation <var>A</var> <dfn>happens before</dfn> an evaluation <var>B</var> 
	if:</p>
	<ul>
		<li><var>A</var> is sequenced before <var>B</var>; or</li>
		<li><var>A</var> synchronizes with <var>B</var>; or</li>
		<li>for some evaluation <var>X</var>, <var>A</var> happens before <var>X</var> 
		and <var>X</var> happens before <var>B</var>.</li>
	</ul>
</blockquote>
<p>1.10p9:</p>
<blockquote class="inserted">
	<p>A <dfn>visible</dfn> side effect <var>A</var> on an object <var>M</var> with 
	respect to a value computation <var>B</var> of <var>M</var> satisfies the conditions:</p>
	<ul>
		<li><var>A</var> happens before <var>B</var>, and</li>
		<li>there is no other side effect <var>X</var> to <var>M</var> such that
		<var>A</var> happens before <var>X</var> and <var>X</var> happens before
		<var>B</var>.</li>
	</ul>
	<p>The value of a non-atomic scalar object <var>M</var>, as determined by evaluation
	<var>B</var>, shall be the value stored by the visible side effect <var>A</var>. 
	[ <em>Note:</em> If there is ambiguity about which side effect to a non-atomic 
	object is visible, then there is a data race, and the behavior is undefined. 
	<em>end note</em> ] [ <em>Note:</em> This states that operations on ordinary 
	variables are not visibly reordered. This is not actually detectable without 
	data races, but it is necessary to ensure that data races, as defined here, 
	and with suitable restrictions on the use of atomics, correspond to data races 
	in a simple interleaved (sequentially consistent) execution. <em>end note</em> 
	]</p>
</blockquote>
<p>1.10p10:</p>
<blockquote class="inserted">
	<p>The <dfn>visible sequence</dfn> of side effects on an atomic object <var>
	M</var>, with respect to a value computation <var>B</var> of <var>M</var>, is 
	a maximal contiguous sub-sequence of side effects in the modification order 
	of <var>M</var>, where the first side effect is visible with respect to <var>
	B</var>, and for every subsequent side effect, it is not the case that <var>
	B</var> happens before it. The value of an atomic object <var>M</var>, as determined 
	by evaluation <var>B</var>, shall be the value stored by some operation in the 
	visible sequence of <var>M</var> with respect to <var>B</var>. Furthermore, 
	if a value computation <var>A</var> of an atomic object <var>M</var> happens 
	before a value computation <var>B</var> of <var>M</var>, and the value computed 
	by <var>A</var> corresponds to the value stored by side effect <var>X</var>, 
	then the value computed by <var>B</var> shall either equal the value computed 
	by <var>A</var>, or be the value stored by side effect <var>Y</var>, where
	<var>Y</var> follows <var>X</var> in the modification order of <var>M</var>. 
	[<em>Note:</em> This effectively disallows compiler reordering of atomic operations 
	to a single object, even if both operations are &quot;relaxed&quot; loads. By doing so, 
	we effectively make the &quot;cache coherence&quot; guarantee provided by most hardware 
	available to C++ atomic operations. <em>end note</em> ] [<em>Note:</em> The visible 
	sequence depends on the &quot;happens before&quot; relation, which depends on the values 
	observed by loads of atomics, which we are restricting here. The intended reading 
	is that there must exist an association of atomic loads with the modifications 
	they observe that, together with suitably chosen modification orders and the 
	&quot;happens before&quot; relation derived as described above, satisfies the constraints 
	imposed here. <em>end note</em> ]</p>
</blockquote>
<p>1.10p11:</p>
<blockquote class="inserted">
	<p>The execution of a program contains a <dfn>data race</dfn> if it contains 
	two conflicting actions in different threads, at least one of which is not atomic, 
	and neither happens before the other. Any such data race results in undefined 
	behavior. [<em>Note:</em> It can be shown that programs that correctly use simple 
	locks to prevent all data races, and use no other synchronization operations, 
	behave as though the executions of their constituent threads were simply interleaved, 
	with each observed value of an object being the last value assigned in that 
	interleaving. This is normally referred to as &quot;sequential consistency&quot;. However, 
	this applies only to race-free programs, and race-free programs cannot observe 
	most program transformations that do not change single-threaded program semantics. 
	In fact, most single-threaded program transformations continue to be allowed, 
	since any program that behaves differently as a result must perform an undefined 
	operation. <em>end note</em> ]</p>
</blockquote>
<p>1.10p12:</p>
<blockquote class="inserted">
	<p>[<em>Note:</em> Compiler transformations that introduce assignments to a 
	potentially shared memory location that would not be modified by the abstract 
	machine are generally precluded by this standard, since such an assignment might 
	overwrite another assignment by a different thread in cases in which an abstract 
	machine execution would not have encountered a data race. This includes implementations 
	of data member assignment that overwrite adjacent members in separate memory 
	locations. We also generally preclude reordering of atomic loads in cases in 
	which the atomics in question may alias, since this may violate the &quot;visible 
	sequence&quot; rules. <em>end note</em> ]</p>
</blockquote>
<p>1.10p13:</p>
<blockquote class="inserted">
	<p>[<em>Note:</em> Transformations that introduce a speculative read of a potentially 
	shared memory location may not preserve the semantics of the C++ program as 
	defined in this standard, since they potentially introduce a data race. However, 
	they are typically valid in the context of an optimizing compiler that targets 
	a specific machine with well-defined semantics for data races. They would be 
	invalid for a hypothetical machine that is not tolerant of races or provides 
	hardware race detection. <em>end note</em> ]</p>
</blockquote>
<h2><a id="exceptions">Treatment of uncaught exceptions</a></h2>
<p>15.3p9:</p>
<blockquote>
	<p>If no matching handler is found <span class="deleted">in a program</span>, the function 
	<code>std::terminate()</code> is called; whether or not the stack is unwound before this 
	call to <code>std::terminate()</code> is implementation-defined (15.5.1).</p>
</blockquote>

</body>

</html>
