<?xml version="1.0" encoding="us-ascii"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
	"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-us">

<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii" />
<title>WG21/N2052: Sequencing and the concurrency memory model</title>
<style type="text/css">
.deleted {
	text-decoration: line-through;
}
.inserted {
	text-decoration: underline;
}
</style>
</head>

<body>

<table summary="This table provides identifying information for this document.">
	<tr>
		<th>Doc. No.:</th>
		<td>WG21/N2052<br />
		J16/06-0122</td>
	</tr>
	<tr>
		<th>Date:</th>
		<td>2006-09-07</td>
	</tr>
	<tr>
		<th>Reply to:</th>
		<td>Clark Nelson</td>
		<td>Hans-J. Boehm</td>
	</tr>
	<tr>
		<th>Phone:</th>
		<td>+1-503-712-8433</td>
		<td>+1-650-857-3406</td>
	</tr>
	<tr>
		<th>Email:</th>
		<td><a href="mailto:clark.nelson@intel.com">clark.nelson@intel.com</a></td>
		<td><a href="mailto:Hans.Boehm@hp.com">Hans.Boehm@hp.com</a></td>
	</tr>
</table>
<h1>Sequencing and the concurrency memory model</h1>
<p>This paper is a successor to, but not a revision of,
<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n1944.htm">N1944</a>. 
N1944 was basically an exploratory paper, despite the amount of nearly-WD-ready 
text proposed; its style of presentation was very heavy on explanation and motivation. 
Consequently, it should remain useful as a tutorial introduction and/or rationale 
for this paper.</p>
<p>Based on the amount of positive feedback received, the exploratory phase 
can reasonably be considered complete. Furthermore, some of the feedback received 
would have been difficult to address in a document organized as N1944 was. It now 
seems highly desirable to have a cohesive presentation of the changed WD text, emphasizing 
the result rather than the process. This paper also presents work on aspects of 
sequencing explicitly related to concurrency, addressing other feedback on N1944.</p>
<p>This paper should also be viewed as a successor to
<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n1942.html">N1942</a>, 
the memory model proposal. Again, much of the explanatory material from N1942 is 
not repeated here. In an attempt to simplify, some of the terminology has changed 
from N1942.</p>
<h2>Contents</h2>
<ul>
	<li><a href="#n1944">Significant changes in the proposed wording since N1944</a></li>
	<li><a href="#rearranging">Rearranging the text of &quot;Program execution&quot;</a></li>
	<li><a href="#execution">The text proposed for &quot;Program execution&quot;</a></li>
	<li><a href="#location">The definition of &quot;memory location&quot;</a></li>
	<li><a href="#races">Multi-threaded executions and data races</a></li>
	<li><a href="#operators">Sequencing for specific operators</a></li>
	<li><a href="#temporaries">Sequencing for destruction of temporaries</a></li>
	<li><a href="#miscellaneous">Fixes for miscellaneous sequencing issues</a></li>
	<li><a href="#loops">Semantics of some non-terminating loops</a></li>
	<li><a href="#library">Library thread-safety</a></li>
</ul>
<h2><a id="n1944">Significant changes in the proposed wording since N1944</a></h2>
<p>The WD text proposed in N1944 introduced ambiguity in the use of the term &quot;evaluation&quot;. 
Most new uses of that term were intended to reflect usage in mathematics, as in 
the computation of a value, without side effects. This usage is inconsistent with 
C/C++ tradition, and the way the term is used in the standard. So when it is necessary 
to talk about evaluations that do not have side effects, the term &quot;value computation&quot; 
is now used.</p>
<p>There is a new paragraph defining and explaining the &quot;sequenced before&quot; relation; 
see <a href="#s1.9p14">1.9p14</a>.</p>
<p>To reflect the consensus from the discussion in Berlin, a note has been added 
clearly stating that there is no requirement of consistency for operations whose 
sequencing is not constrained; see <a href="#s1.9p16">1.9p16</a>.</p>
<p>The statement of the &quot;no interleaving&quot; rule for functions has been updated; see
<a href="#s1.9p17">1.9p17</a>. Also, an example has been added pointing out a possibly-surprising 
interpretation of &quot;unspecified behavior&quot;.</p>
<p>Resolutions are proposed for several questions raised but not answered in N1944, 
mostly in <a href="#miscellaneous">Fixes for miscellaneous sequencing issues</a>.</p>
<h2><a id="rearranging">Rearranging the text of &quot;Program execution&quot;</a></h2>
<p>The changes proposed in N1944 were mainly in section 1.9 (Program execution) 
and various locations in clause 5 (Expressions), plus a couple of spots in clause 
12 (Special member functions). The &quot;undefined behavior&quot; rule, a key paragraph in 
the understanding of sequencing, which basically describes what may be called an 
&quot;intra-thread data race&quot;, is currently in 5p4, which is widely separated from the 
bulk of the discussion of the principles of sequencing in 1.9. Furthermore, it would 
seem logical to describe concurrency &#8212; and particularly inter-thread data races 
&#8212; in a new section building on and immediately following 1.9. Therefore we propose 
to move the &quot;undefined behavior&quot; rule from 5p4 to 1.9.</p>
<p>Within 1.9 with the changes proposed in N1944, the bulk of the discussion of 
sequencing is in p15-16. Paragraph 8, which currently contains the &quot;no overlap&quot; 
rule for function execution, should be merged into p16, which discusses many other 
sequencing constraints on function calls. And if, as proposed, the references to 
sequence points and evaluation are removed from p11 (the &quot;least requirements&quot;), 
then the definitions in p7 are not needed until p15; moving paragraph 7 down would 
result in a more cohesive presentation.</p>
<p>Finally, it could be argued that cohesiveness would be increased still further 
by moving the discussion of reassociation (concerning implications of the &quot;as-if&quot; 
rule) to immediately follow the &quot;least requirements&quot; (which is basically the normative 
statement of the &quot;as-if&quot; rule), instead of showing up in the middle of the discussion 
of expressions and sequencing.</p>
<p>This table shows the proposed shifting of content, assuming regular paragraph 
(re-)numbering. The letters in the central columns are just tags, intended to illustrate 
how text moves around (in lieu of arrows): the tag stays with the content.</p>
<table>
	<tr>
		<th>Paragraph number</th>
		<th>Old content</th>
		<th colspan="2">Tag (old, new)</th>
		<th>New content</th>
	</tr>
	<tr>
		<td>1.9p7</td>
		<td>Definitions of &quot;side effect&quot;, &quot;sequence point&quot;</td>
		<td>A</td>
		<td>C</td>
		<td>Effect of asynchronous signal</td>
	</tr>
	<tr>
		<td>1.9p8</td>
		<td>&quot;No overlap&quot; rule for function execution</td>
		<td>B</td>
		<td>C</td>
		<td>Allocation of automatic objects</td>
	</tr>
	<tr>
		<td>1.9p9</td>
		<td>Effect of asynchronous signal</td>
		<td>C</td>
		<td>C</td>
		<td>The &quot;least requirements&quot;</td>
	</tr>
	<tr>
		<td>1.9p10</td>
		<td>Allocation of automatic objects</td>
		<td>C</td>
		<td>E</td>
		<td>Note concerning reassociation</td>
	</tr>
	<tr>
		<td>1.9p11</td>
		<td>The &quot;least requirements&quot;</td>
		<td>C</td>
		<td>D</td>
		<td>Definition of &quot;full-expression&quot;</td>
	</tr>
	<tr>
		<td>1.9p12</td>
		<td>Definition of &quot;full-expression&quot;</td>
		<td>D</td>
		<td>D</td>
		<td>Note concerning default arguments</td>
	</tr>
	<tr>
		<td>1.9p13</td>
		<td>Note concerning default arguments</td>
		<td>D</td>
		<td>A</td>
		<td>Definition of &quot;side effect&quot;, &quot;evaluation&quot;</td>
	</tr>
	<tr>
		<td>1.9p14</td>
		<td>Note concerning reassociation</td>
		<td>E</td>
		<td>[new]</td>
		<td>Definition of &quot;sequenced before&quot;</td>
	</tr>
	<tr>
		<td>1.9p15</td>
		<td>Sequencing between full-expressions</td>
		<td>F</td>
		<td>F</td>
		<td>Sequencing between full-expressions</td>
	</tr>
	<tr>
		<td>1.9p16</td>
		<td>Sequencing constraints on function calls</td>
		<td>G</td>
		<td>5p4</td>
		<td>The &quot;undefined behavior&quot; rule</td>
	</tr>
	<tr>
		<td>1.9p17</td>
		<td>Operators that impose a sequence point</td>
		<td>[delete]</td>
		<td>G+B</td>
		<td>Sequencing constraints on function calls, including the &quot;no overlap&quot; 
		rule</td>
	</tr>
</table>
<h2><a id="execution">The text proposed for &quot;Program execution&quot;</a></h2>
<p>So here is the proposed reading of section 1.9, beginning with p6 (just for the 
sake of context). Each paragraph is introduced with its proposed paragraph number, 
and an explanation of its source. Text from the current working draft to be replaced 
or deleted is <span class="deleted">stricken through</span>. Replacement or added 
text is <span class="inserted">underlined</span>. Footnotes are presented here in 
the same style as examples and notes. If the introductory paragraphs and stricken 
text were deleted, the result would be a longish block of consecutive paragraphs, 
as proposed for the standard.</p>
<p>1.9p6 (unchanged):</p>
<blockquote>
	<p>The observable behavior of the abstract machine is its sequence of reads 
	and writes to <code>volatile</code> data and calls to library I/O functions. 
	An implementation can offer additional library I/O functions as an extension. 
	[ <em>Footnote:</em> Implementations that do so should treat calls to those 
	functions as &quot;observable behavior&quot; as well. &#8212;<em>end footnote</em> ]</p>
</blockquote>
<p>1.9p7 (unchanged from the current p9, except for the addition of an omitted word):</p>
<blockquote>
	<p>When the processing of the abstract machine is interrupted by receipt of 
	a signal, the values of objects with type other than <code>volatile std::sig_atomic_t</code> 
	are unspecified, and the value of any object not of <span class="inserted">type</span>
	<code>volatile std::sig_atomic_t</code> that is modified by the handler becomes 
	undefined.</p>
</blockquote>
<p>1.9p8 (unchanged from the current p10):</p>
<blockquote>
	<p>An instance of each object with automatic storage duration (3.7.2) is associated 
	with each entry into its block. Such an object exists and retains its last-stored 
	value during the execution of the block and while the block is suspended (by 
	a call of a function or receipt of a signal).</p>
</blockquote>
<p>1.9p9 (original text from p11):</p>
<blockquote>
	<p>The least requirements on a conforming implementation are:</p>
	<ul>
		<li><span class="deleted">At sequence points, volatile objects are stable 
		in the sense that previous evaluations are complete and subsequent evaluations 
		have not yet occurred.</span> <span class="inserted">Accesses to volatile 
		objects are initiated strictly according to the rules of the abstract machine.</span></li>
		<li>At program termination, all data written into files shall be identical 
		to one of the possible results that execution of the program according to 
		the abstract semantics would have produced.</li>
		<li>The input and output dynamics of interactive devices shall take place 
		in such a fashion that prompting messages actually appear prior to a program 
		waiting for input. What constitutes an interactive device is implementation-defined.</li>
	</ul>
	<p>[ <em>Note:</em> more stringent correspondences between abstract and actual 
	semantics may be defined by each implementation. &#8212;<em>end note</em> ]</p>
</blockquote>
<p>1.9p10 (unchanged from p14):</p>
<blockquote>
	<p>[ <em>Note:</em> operators can be regrouped according to the usual mathematical 
	rules only where the operators really are associative or commutative.<sup>11)</sup> 
	For example, in the following fragment</p>
	<blockquote>
		<p><em>[unchanged text omitted]</em></p>
	</blockquote>
	<p>However on a machine in which overflows do not produce an exception and in 
	which the results of overflows are reversible, the above expression statement 
	can be rewritten by the implementation in any of the above ways because the 
	same result will occur. &#8212;<em>end note</em> ]</p>
</blockquote>
<p>1.9p11 (original text from p12):</p>
<blockquote>
	<p>A <dfn>full-expression</dfn> is an expression that is not a subexpression 
	of another expression. If a language construct is defined to produce an implicit 
	call of a function, a use of the language construct is considered to be an expression 
	for the purposes of this definition. <span class="inserted">A call to a destructor 
	generated at the end of the lifetime of an object other than a temporary object 
	is an implicit full-expression.</span> Conversions applied to the result of 
	an expression in order to satisfy the requirements of the language construct 
	in which the expression appears are also considered to be part of the full-expression. 
	[ <em>Example:</em></p>
	<blockquote>
		<p><em>[unchanged example omitted]</em></p>
	</blockquote>
</blockquote>
<p>1.9p12 (unchanged from p13):</p>
<blockquote>
	<p>[ <em>Note:</em> the evaluation of a full-expression can include the evaluation 
	of subexpressions that are not lexically part of the full-expression. For example, 
	subexpressions involved in evaluating default argument expressions (8.3.6) are 
	considered to be created in the expression that calls the function, not the 
	expression that defines the default argument. &#8212;<em>end note</em> ]</p>
</blockquote>
<p>1.9p13 (original text from p7):</p>
<blockquote>
	<p>Accessing an object designated by a <code>volatile</code> lvalue (3.10), 
	modifying an object, calling a library I/O function, or calling a function that 
	does any of those operations are all <dfn>side effects</dfn>, which are changes 
	in the state of the execution environment. <span class="deleted">Evaluation 
	of an expression might produce side effects.</span> <span class="inserted">
	<dfn>Evaluation</dfn> of an expression (or sub-expression) in general includes 
	both value computations (including fetching a value previously assigned to an 
	object) and initiation of side effects.</span> <span class="deleted">At certain 
	specified points in the execution sequence called <dfn>sequence points</dfn>, 
	all side effects of previous evaluations shall be complete and no side effects 
	of subsequent evaluations shall have taken place.</span> [ <em>Footnote:</em> 
	Note <span class="deleted">that some aspects of sequencing in the abstract machine 
	are unspecified; the preceding restriction upon side effects applies to that 
	particular execution sequence in which the actual code is generated. Also note</span> 
	that when a call to a library I/O function returns, the side effect is considered 
	complete, even though some external actions implied by the call (such as the 
	I/O itself) may not have completed yet. &#8212;<em>end footnote</em> ]</p>
</blockquote>
<p><a id="s1.9p14">1.9p14 (new paragraph):</a></p>
<blockquote>
	<p><span class="inserted">&quot;<dfn>Sequenced before</dfn>&quot; is an asymmetric, transitive, 
	pair-wise relation between evaluations executed by a single thread, which induces 
	a partial order among those evaluations. Given any two evaluations <var>A</var> 
	and <var>B</var>, if <var>A</var> is sequenced before <var>B</var>, then the 
	execution of <var>A</var> shall precede the execution of <var>B</var>. If
	<var>A</var> is not sequenced before <var>B</var> and <var>B</var> is not sequenced 
	before <var>A</var>, then <var>A</var> and <var>B</var> are <dfn>unsequenced</dfn>. 
	[ <em>Note:</em> The execution of unsequenced evaluations can overlap. &#8212;<em>end 
	note</em> ] When <var>A</var> and <var>B</var> are <dfn>indeterminately sequenced</dfn>, 
	then either <var>A</var> is sequenced before <var>B</var>, or <var>B</var> is 
	sequenced before <var>A</var>, but which is unspecified. [ <em>Note:</em> Indeterminately 
	sequenced evaluations cannot overlap, but either could be executed first. &#8212;<em>end 
	note</em> ]</span></p>
</blockquote>
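<p>As an informal illustration (not proposed WD text), the difference between &quot;sequenced 
before&quot; and &quot;indeterminately sequenced&quot; might be sketched as follows. The helper names
<code>first</code>, <code>second</code> and <code>trace</code> are hypothetical and exist 
only to record the order in which the calls run.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers that record the order in which they run.
static std::string trace;
static int first()  { trace += 'f'; return 1; }
static int second() { trace += 's'; return 2; }

// Sequenced before: the two calls are separate full-expressions, so
// everything in the first statement precedes everything in the second.
std::string sequenced_demo() {
    trace.clear();
    first();
    second();
    return trace;  // always "fs"
}

// Indeterminately sequenced: the calls are operands of one '+'.
// They cannot overlap, but either may run first.
std::string indeterminate_demo() {
    trace.clear();
    int sum = first() + second();
    assert(sum == 3);  // the value does not depend on the order chosen
    return trace;      // "fs" or "sf"; which one is unspecified
}

// Unsequenced side effects on one scalar, as in i = v[i++], are
// undefined under 1.9p16 and are deliberately not executed here.
```

<p>A caller can rely on the result of <code>sequenced_demo()</code>, but should accept 
either order from <code>indeterminate_demo()</code>, and need not get the same order 
on every evaluation.</p>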
<p>1.9p15 (original text from p15):</p>
<blockquote>
	<p><span class="deleted">There is a sequence point at the completion of evaluation 
	of each full-expression.</span> <span class="inserted">Every value computation 
	and side effect associated with a full-expression is sequenced before every 
	value computation and side effect associated with the next full-expression to 
	be evaluated.</span> [ <em>Footnote:</em> As specified in 12.2,
	<span class="deleted">after the &quot;end-of-full-expression&quot; sequence point</span>
	<span class="inserted">after a full-expression is evaluated</span>, a sequence 
	of zero or more invocations of destructor functions for temporary objects takes 
	place, usually in reverse order of the construction of each temporary object. 
	&#8212;<em>end footnote</em> ]</p>
</blockquote>
<p><a id="s1.9p16">1.9p16 (original text from clause 5 paragraph 4):</a></p>
<blockquote>
	<p>Except where noted, <span class="deleted">the order of evaluation</span>
	<span class="inserted">evaluations</span> of operands of individual operators<span class="inserted">,</span> 
	and <span class="inserted">of</span> subexpressions of individual expressions<span class="deleted">, 
	and the order in which side effects take place, is unspecified</span>
	<span class="inserted">are unsequenced</span>. [ <em>Footnote:</em> The precedence 
	of operators is not directly specified, but it can be derived from the syntax. 
	&#8212;<em>end footnote</em> ] <span class="inserted">[ <em>Note:</em> In an expression 
	that is evaluated more than once during the execution of a program, unsequenced 
	and indeterminately sequenced evaluations of its subexpressions need not be 
	performed consistently in different evaluations. &#8212;<em>end note</em> ]</span>
	<span class="deleted">Between the previous and next sequence point a scalar 
	object shall have its stored value modified at most once by the evaluation of 
	an expression. Furthermore, the prior value shall be accessed only to determine 
	the value to be stored. The requirements of this paragraph shall be met for 
	each allowable ordering of the subexpressions of a full expression; otherwise 
	the behavior is undefined.</span> <span class="inserted">If a side effect on 
	a scalar object is not sequenced relative to either a different side effect 
	on the same scalar object, or a value computation using the value of the same 
	scalar object, the behavior is undefined.</span> [ <em>Example:</em></p>
	<blockquote>
		<pre>i = v[i++];       <em>// the behavior is undefined</em>
<!--      -->i = 7, i++, i++;  <em>//</em> i <em>becomes</em> 9
<!--      -->i = ++i + 1;      <em>// the behavior is undefined</em>
<!--      -->i = i + 1;        <em>// the value of</em> i <em>is incremented</em></pre>
	</blockquote>
	<p>&#8212;<em>end example</em> ]</p>
</blockquote>
<p><a id="s1.9p17">1.9p17 (original text is p16 with p8 inserted):</a></p>
<blockquote>
	<p>When calling a function (whether or not the function is inline),
	<span class="deleted">there is a sequence point after the evaluation of all 
	function arguments (if any) which takes place</span> <span class="inserted">
	every value computation and side effect associated with any argument expression, 
	or with the postfix expression designating the called function, is sequenced</span> 
	before execution of any <span class="deleted">expressions or statements</span>
	<span class="inserted">expression or statement</span> in the
	<span class="inserted">body of the called</span> function
	<span class="deleted">body</span>. <span class="inserted">[ <em>Note:</em> Value 
	computations and side effects associated with different argument expressions 
	are unsequenced. &#8212;<em>end note</em> ]</span> <span class="deleted">There is 
	also a sequence point after the copying of a returned value and before the execution 
	of any expressions outside the function. [ <em>Footnote:</em> The sequence point 
	at the function return is not explicitly specified in ISO C, and can be considered 
	redundant with sequence points at full-expressions, but the extra clarity is 
	important in C++. In C++, there are more ways in which a called function can 
	terminate its execution, such as the throw of an exception. &#8212;<em>end footnote</em> 
	]</span> <span class="deleted">Once the execution of a function begins, no expressions 
	from the calling function are evaluated until execution of the called function 
	has completed.</span> <span class="inserted">Every evaluation in the calling 
	function (including other function calls) that is not otherwise specifically 
	sequenced before or after the execution of the body of the called function is 
	indeterminately sequenced with respect to the execution of the called function.</span> 
	[ <em>Footnote:</em> In other words, function executions do not &quot;interleave&quot; 
	with each other. &#8212;<em>end footnote</em> ] Several contexts in C++ cause evaluation 
	of a function call, even though no corresponding function call syntax appears 
	in the translation unit. [ <em>Example:</em> evaluation of a new expression 
	invokes one or more allocation and constructor functions; see 5.3.4. For another 
	example, invocation of a conversion function (12.3.2) can arise in contexts 
	in which no function call syntax appears. &#8212;<em>end example</em> ] The
	<span class="deleted">sequence points at function-entry and function-exit</span>
	<span class="inserted">sequencing constraints on the execution of the called 
	function</span> (as described above) are features of the function calls as evaluated, 
	whatever the syntax of the expression that calls the function might be.
	<span class="inserted">[ <em>Example:</em></span></p>
	<blockquote class="inserted">
		<pre>int increment_x() { return x++; }
<!--      -->x++ + increment_x();                <em>// Evaluation order unspecified; x may be incremented only once</em>
<!--      -->increment_x() + increment_x();      <em>// </em>x<em> is incremented twice</em></pre>
	</blockquote>
	<p class="inserted">&#8212;<em>end example</em> ]</p>
</blockquote>
<p>Deleted as redundant with descriptions of operators (original text from p17):</p>
<blockquote>
	<p><span class="deleted">In the evaluation of each of the expressions</span></p>
	<blockquote>
		<pre><span class="deleted">a &amp;&amp; b
a || b
a ? b : c
a , b</span></pre>
	</blockquote>
	<p><span class="deleted">using the built-in meaning of the operators in these 
	expressions (5.14, 5.15, 5.16, 5.18), there is a sequence point after the evaluation 
	of the first expression. [ <em>Footnote:</em> The operators indicated in this 
	paragraph are the built-in operators, as described in clause 5. When one of 
	these operators is overloaded (clause 13) in a valid context, thus designating 
	a user-defined operator function, the expression designates a function invocation, 
	and the operands form an argument list, without an implied sequence point between 
	them. &#8212;<em>end footnote</em> ]</span></p>
</blockquote>
<h2><a id="location">The definition of &quot;memory location&quot;</a></h2>
<p>New paragraphs inserted as 1.7p3 et seq.:</p>
<blockquote>
	<p><span class="inserted">A <dfn>memory location</dfn> is either an object of 
	scalar type, or a maximal sequence of adjacent bit-fields all having non-zero 
	width. Two threads of execution can update and access separate memory locations 
	without interfering with each other.</span></p>
	<p><span class="inserted">[<em>Note</em>: Thus a bit-field and an adjacent non-bit-field 
	are in separate memory locations, and therefore can be concurrently updated 
	by two threads of execution without interference. The same applies to two bit-fields, 
	if one is declared inside a nested struct declaration and the other is not, 
	or if the two are separated by a zero-length bit-field declaration, or if they 
	are separated by a non-bit-field declaration. It is not safe to concurrently 
	update two bit-fields in the same struct if all fields between them are also 
	bit-fields, no matter what the sizes of those intervening bit-fields happen 
	to be.]</span></p>
	<p><span class="inserted">[<em>Example</em>: A structure declared as <code>struct 
	{char a; int b:5, c:11, :0, d:8; struct {int ee:8;} e;}</code> contains four 
	separate memory locations: The field <code>a</code> and the bit-fields <code>d</code> 
	and <code>e.ee</code> are each separate memory locations, and can be modified 
	concurrently without interfering with each other. The bit-fields <code>b</code> 
	and <code>c</code> together constitute the fourth memory location. The bit-fields
	<code>b</code> and <code>c</code> cannot be concurrently modified, but
	<code>b</code> and <code>a</code>, for example, can be. <em>--end example</em>.]</span>
	</p>
</blockquote>
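<p>As an informal sketch of what this definition permits (not proposed WD text), the 
structure from the example can be updated by several threads without locking, provided 
each thread touches a different memory location. The sketch assumes a thread library 
along the lines of <code>std::thread</code>, which this paper does not specify; the 
names <code>S</code> and <code>update_separate_locations</code> are hypothetical.</p>

```cpp
#include <thread>

// The structure from the example, given a hypothetical name.
struct S {
    char a;                    // memory location 1
    int b : 5, c : 11;         // memory location 2 (b and c together)
    int : 0;                   // zero-width bit-field ends the sequence
    int d : 8;                 // memory location 3
    struct { int ee : 8; } e;  // memory location 4
};

// Each thread updates a different memory location, so under the
// proposed definition the updates do not interfere and need no lock.
// (std::thread is an assumed library facility, not part of this paper.)
int update_separate_locations() {
    S s = {};
    std::thread t1([&s] { for (int i = 0; i < 100; ++i) s.a = static_cast<char>(i); });
    std::thread t2([&s] { for (int i = 0; i < 100; ++i) s.d = i; });
    std::thread t3([&s] { for (int i = 0; i < 100; ++i) s.e.ee = i; });
    t1.join();
    t2.join();
    t3.join();
    // Every location holds the last value its own thread stored (99).
    return s.a + s.d + s.e.ee;
}
```

<p>Replacing the update of <code>d</code> with an update of <code>c</code> while another 
thread updates <code>b</code> would not be safe, since <code>b</code> and <code>c</code> 
share one memory location.</p>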
<h2><a id="races">Multi-threaded executions and data races</a></h2>
<p>Insert a new section between 1.9 and 1.10, titled &quot;Multi-threaded executions 
and data races&quot;.</p>
<p>1.10p1:</p>
<blockquote class="inserted">
	<p>Under a hosted implementation, a C++ program can have more than one <dfn>
	thread of execution</dfn> (a.k.a. <dfn>thread</dfn>) running concurrently. Each 
	thread executes a single function according to the rules expressed in this standard. 
	The execution of the entire program consists of an interleaved execution of 
	all of its threads. Under a freestanding implementation, it is implementation-defined 
	whether a program can have more than one thread of execution.</p>
</blockquote>
<p>1.10p2:</p>
<blockquote class="inserted">
	<p>The execution of each thread proceeds as defined by the remainder of this 
	standard. The value of an object visible to a thread <var>T</var> at a particular 
	point might be the initial value of the object, a value assigned to the object 
	by <var>T</var>, or a value assigned to the object by another thread, according 
	to the rules below.</p>
</blockquote>
<p>1.10p3:</p>
<blockquote class="inserted">
	<p>Two expression evaluations <dfn>conflict</dfn> if one of them modifies a 
	memory location and the other one accesses or modifies the same memory location.</p>
</blockquote>
<!--
<blockquote class="inserted">
	<p>If two conflicting evaluations are performed by the same thread, and neither 
	is sequenced before the other, then the execution sequence contains an intra-thread 
	data race. Any intra-thread data race is an undefined operation, and no requirements 
	are placed on such an execution.</p>
</blockquote>
-->
<!--
<blockquote class="inserted">
	<p>[<em>Note:</em> The purpose of the rest of this section is (1) to define 
	an inter-thread data race, which will also give rise to an undefined operation, 
	and (2) to define how an assignment to an object in one thread might affect 
	the value of that object as seen by other threads. None of this is relevant 
	to implementations that are limited to a single thread.]</p>
</blockquote>
-->
<p>1.10p4:</p>
<blockquote class="inserted">
	<p>The library defines a number of operations, such as operations on locks, 
	that are specially identified as synchronization operations. These operations 
	play a special role in making assignments in one thread visible to another. 
	A <dfn>synchronization operation</dfn> is either an acquire operation or a release 
	operation, or both, on one or more memory locations. [<em>Note:</em> For example, 
	a call that acquires a lock will perform an acquire operation on the locations 
	comprising the lock. Correspondingly, a call that releases the same lock will 
	perform a release operation on those same locations. Informally, performing 
	a release operation on <var>A</var> forces prior side effects on other memory 
	locations to become visible to other threads that later perform an acquire operation 
	on <var>A</var>.]</p>
</blockquote>
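<p>A minimal sketch of the lock case described in the note, assuming a library lock 
along the lines of <code>std::mutex</code> and <code>std::lock_guard</code> (neither 
is specified by this paper; <code>locked_increments</code> is a hypothetical name). 
Each acquisition is an acquire operation on the lock, and each release is a release 
operation on it, so the increments made by one thread become visible to the other.</p>

```cpp
#include <mutex>
#include <thread>

// Two threads increment a shared, non-atomic counter under one lock.
// std::mutex stands in for the library lock mentioned in the note;
// it is an assumption, not something this paper defines.
int locked_increments() {
    int counter = 0;
    std::mutex m;
    auto work = [&] {
        for (int i = 0; i < 1000; ++i) {
            std::lock_guard<std::mutex> hold(m);  // acquire operation on the lock
            ++counter;                            // protected side effect
        }                                         // release operation at scope exit
    };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    return counter;  // no increment is lost
}
```

<p>Without the lock, the two threads would perform conflicting, unordered modifications 
of <code>counter</code>.</p>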
<p>The following merges in the &quot;depends-on&quot; relation from the description in N1942. 
Hopefully this is easier to follow.</p>
<p>1.10p5:</p>
<blockquote class="inserted">
	<p>An expression evaluation <var>A</var> is <dfn>inter-thread-ordered-before</dfn> 
	another evaluation <var>B</var> if:</p>
	<ul>
		<li><var>A</var> is sequenced before <var>B</var> and either <var>A</var> 
		performs an acquire operation, or <var>B</var> performs a release operation; 
		or</li>
		<li><var>A</var> is an unordered atomic read and <var>B</var> is an unordered 
		atomic write, and either the value written by <var>B</var> is computed 
		using the value read by <var>A</var>, or the execution of <var>B</var> is conditioned 
		on the value read by <var>A</var>.</li>
	</ul>
	<!--
	<p>[<em>Note:</em> This definition is redundant for most synchronization 
	operations, since those that read a value will usually have acquire semantics, 
	and those that update a value will usually have release semantics. However, 
	for isolated operations that do not provide such guarantees, it avoids results 
	that can only be justified by inherently &quot;circular&quot; executions.]</p>
	-->
</blockquote>
<p>1.10p6:</p>
<blockquote class="inserted">
	<p>[<em>Note:</em> An evaluation <var>A</var> can only be inter-thread-ordered-before
	<var>B</var> if <var>A</var> is also sequenced before <var>B</var>. For race-free 
	programs making conventional use of locks, the distinction between inter-thread-ordered-before 
	and sequenced-before is unimportant. The distinction becomes important with 
	very weakly ordered library synchronization primitives.]</p>
</blockquote>
<p>This was rewritten in terms of &quot;synchronizes-with&quot;, which is restricted to synchronization 
operations, instead of explicitly including store-load dependencies in a &quot;communicates-with&quot; 
relation as in N1942. This version is intended to be equivalent, since we insist 
that &quot;happens-before&quot; together with store-load dependencies remains acyclic. We 
need that for the proof that race freedom implies sequential consistency, and for one of 
the examples.</p>
<p>1.10p7:</p>
<blockquote class="inserted">
	<p>An evaluation <var>A</var> that performs a release operation on a location
	<var>L</var> <dfn>synchronizes-with</dfn> an evaluation <var>B</var> that performs 
	an acquire operation on <var>L</var> and reads the value written by <var>A</var>.
	[<em>Note:</em> The specifications of the synchronization operations define 
	when one reads the value written by another. For atomic variables, the definition 
	is clear. For locks, we assume that all lock operations occur in a single total 
	order. Each lock acquisition &quot;reads the value written&quot; by the last lock release.]</p>
</blockquote>
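<p>For atomic variables, the relation might be illustrated by the following sketch. 
It assumes an atomics library along the lines of <code>std::atomic</code> with explicit 
release and acquire orderings, which this paper does not specify; <code>payload</code>,
<code>ready</code> and <code>pass_message</code> are hypothetical names.</p>

```cpp
#include <atomic>
#include <thread>

// Message passing through one atomic flag (the location L).
// std::atomic and std::memory_order_* are assumed library facilities.
int pass_message() {
    int payload = 0;                 // ordinary, non-atomic object
    std::atomic<bool> ready(false);  // location L
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // sequenced before the release
        ready.store(true, std::memory_order_release);  // release operation on L
    });
    std::thread consumer([&] {
        // Acquire operation on L; loop until it reads the value
        // written by the producer's release.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // ordered after the producer's write to payload
    });
    producer.join();
    consumer.join();
    return seen;
}
```

<p>The store synchronizes-with the load that finally reads <code>true</code>, so the 
write to <code>payload</code> happens-before the read of it, and the two accesses do 
not race.</p>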
<p>1.10p8:</p>
<blockquote class="inserted">
	<p>An evaluation <var>A</var> <dfn>happens-before</dfn> an evaluation <var>B</var> 
	if:</p>
	<ul>
		<li><var>A</var> is inter-thread-ordered-before <var>B</var>; or</li>
		<li><var>A</var> synchronizes-with <var>B</var>; or</li>
		<li>for some evaluation <var>X</var>, <var>A</var> happens-before <var>X</var>, 
		and <var>X</var> happens-before <var>B</var>.</li>
	</ul>
</blockquote>
<p>1.10p9:</p>
<blockquote class="inserted">
	<p>An evaluation <var>A</var> <dfn>precedes</dfn> an evaluation <var>B</var> 
	if:</p>
	<ul>
		<li><var>A</var> happens-before <var>B</var>; or</li>
		<li><var>A</var> is an assignment, and <var>B</var> observes the value stored 
		by <var>A</var>.</li>
	</ul>
</blockquote>
<p>1.10p10:</p>
<blockquote class="inserted">
	<p>A multi-threaded execution is <dfn>consistent</dfn> if each thread observes 
	values of objects that obey the following constraints:</p>
	<ul>
		<li>No evaluation precedes itself.</li>
		<li>Each read access <var>B</var> to a scalar object observes the value 
		assigned to that object by a side effect <var>A</var> only if there is no 
		other side effect <var>X</var> to the same object such that
		<ul>
			<li><var>A</var> is sequenced before or happens-before <var>X</var>, 
			and</li>
			<li><var>X</var> is sequenced before or happens-before <var>B</var>.
			</li>
		</ul>
		</li>
	</ul>
	<p>[<em>Note:</em> The first condition implies that a read operation <var>B</var> 
	cannot &quot;see&quot; an assignment <var>A</var> if <var>B</var> happens-before <var>
	A</var>. It also prevents a cyclic situation in which, for example, <code>x</code> 
	and <code>y</code> are initially zero, one thread evaluates <code>x = y;</code> 
	while another evaluates <code>y = x;</code>, each sees the result of the other 
	thread, and both <code>x</code> and <code>y</code> obtain a value of 42. The 
	second condition effectively asserts that later assignments hide earlier ones 
	if there is a well-defined order between them.]</p>
</blockquote>
<p>1.10p11:</p>
<blockquote class="inserted">
	<p>An execution contains an <dfn>inter-thread data race</dfn> if it contains 
	two conflicting actions in different threads, at least one of which is not atomic, 
	and neither happens-before the other. Any inter-thread data race results in 
	undefined behavior. A multi-threaded program that does not contain a data race exhibits the behavior 
	of a consistent execution. [<em>Note:</em> It can be shown that programs that correctly use simple locks 
	to prevent all inter-thread data races, and use no other synchronization operations, 
	behave as though the executions of their constituent threads were simply interleaved, 
	with each observed value of an object being the last value assigned in that 
	interleaving. This is normally referred to as &quot;sequential consistency&quot;. However, 
	this applies only to race-free programs, and race-free programs cannot observe 
	most program transformations that do not change single-threaded program semantics. 
	In fact, most single-threaded program transformations continue to be allowed, 
	since any program that behaves differently as a result must perform an undefined 
	operation.]</p>
</blockquote>
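<p>A minimal sketch of the lock-based case described in the note to 1.10p11, 
assuming <code>std::mutex</code> as the "simple lock" (a later-standardized facility, 
not something this paper defines). The function name and iteration count are 
hypothetical.</p>

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Sketch: every access to `counter` is protected by the same lock, so the
// program is race-free and behaves as some interleaving of the two threads.
long locked_counter(int iterations) {
    long counter = 0;
    std::mutex m;

    auto work = [&] {
        for (int i = 0; i < iterations; ++i) {
            std::lock_guard<std::mutex> guard(m);  // acquire; released at end of scope
            ++counter;
        }
    };

    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();

    assert(counter == 2L * iterations);  // no increment is lost in any interleaving
    return counter;
}
```

<p>Removing the lock would make the two increments conflicting, unordered actions, 
and the behavior would be undefined rather than merely unpredictable.</p>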
<p>1.10p12:</p>
<blockquote class="inserted">
	<p>[<em>Note:</em> Compiler transformations that introduce assignments to a 
	potentially shared memory location that would not be modified by the abstract 
	machine are generally precluded by this standard, since such an assignment might 
	overwrite another assignment by a different thread in cases in which an abstract 
	machine execution would not have encountered a data race.]</p>
</blockquote>
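<p>The kind of transformation that 1.10p12 rules out can be sketched as follows. 
Both functions are hypothetical; the second is a hand-written stand-in for what a 
compiler might otherwise generate, not output from any real compiler.</p>

```cpp
// As written: the abstract machine stores to x only when cond is true.
void store_if(bool cond, int& x) {
    if (cond) x = 1;
}

// Speculative form: the same result in a single thread, but x is written
// even when cond is false.
void store_if_speculative(bool cond, int& x) {
    int saved = x;
    x = 1;                  // store the abstract machine would not perform if !cond
    if (!cond) x = saved;   // "undo"; an update made by another thread in between
                            // would be silently overwritten here
}
```

<p>With <code>cond</code> false, a second thread may legitimately assign to 
<code>x</code> without any race in the original program; the speculative form 
introduces one.</p>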
<p>Various other changes to the base language are no doubt needed, but they are not 
yet clear. I think there is something of a consensus that thread-safety of static 
initialization should be explicitly indicated, perhaps with a new keyword such as 
&quot;async&quot;. Exception issues should probably be deferred to the thread API proposal.</p>
<h2><a id="operators">Sequencing for specific operators</a></h2>
<p>5.2.2p8 (function call); deleted as redundant with (new) 1.9p17:</p>
<blockquote>
	<p><span class="deleted">The order of evaluation of arguments is unspecified. 
	All side effects of argument expression evaluations take effect before the function 
	is entered. The order of evaluation of the postfix expression and the argument 
	expression list is unspecified.</span></p>
</blockquote>
<p>5.2.6p1 (post-increment):</p>
<blockquote>
	<p>The value <span class="deleted">obtained by applying</span>
	<span class="inserted">of</span> a postfix <code>++</code>
	<span class="inserted">expression</span> is the value <span class="deleted">
	that the</span> <span class="inserted">of its</span> operand
	<span class="deleted">had before applying the operator</span>. [ <em>Note:</em> 
	the value obtained is a copy of the original value &#8212;<em>end note</em> ] The 
	operand shall be a modifiable lvalue. The type of the operand shall be an arithmetic 
	type or a pointer to a complete object type. <span class="deleted">After the 
	result is noted, the</span> <span class="inserted">The</span> value of the
	<span class="inserted">operand</span> object is modified by adding <code>1</code> 
	to it, unless the object is of type <code>bool</code>, in which case it is set 
	to <code>true</code>. [ <em>Note:</em> this use is deprecated, see Annex D. 
	&#8212;<em>end note</em> ] <span class="inserted">The value computation of the
	<code>++</code> expression is sequenced before the modification of the operand 
	object.</span> The result is an rvalue. The type of the result is the cv-unqualified 
	version of the type of the operand. See also 5.7 and 5.17.</p>
</blockquote>
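<p>A small sketch of the post-increment wording above; the function name is 
hypothetical.</p>

```cpp
#include <utility>

// Returns {value of the expression i++, value of i afterwards}.
std::pair<int, int> post_increment(int i) {
    int result = i++;  // the value of i++ is computed before i is modified
    return {result, i};
}
```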
<p>5.14p2 (logical AND operator), and also 5.15p2 (logical OR operator):</p>
<blockquote>
	<p>The result is a <code>bool</code>. <span class="deleted">All side effects 
	of the first expression except for destruction of temporaries (12.2) happen 
	before the second expression is evaluated.</span> <span class="inserted">If 
	the second expression is evaluated, every value computation and side effect 
	associated with the first expression is sequenced before every value computation 
	and side effect associated with the second expression.</span></p>
</blockquote>
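<p>The sequencing guarantee for <code>&amp;&amp;</code> and <code>||</code> is what 
makes the usual guarded-access idiom safe. A sketch, with hypothetical names:</p>

```cpp
struct Node { int value; };

// The null test is completely evaluated before p->value is touched.
bool is_positive(const Node* p) {
    return p != nullptr && p->value > 0;
}

// The increment in the first operand is complete before the second operand reads n.
bool increment_then_check() {
    int n = 0;
    return (++n == 1) && (n == 1);
}

// Counts how often the second operand of || is evaluated.
int second_operand_evaluations(bool first) {
    int calls = 0;
    bool result = first || (++calls > 0);
    (void)result;
    return calls;
}
```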
<p>5.16p1 (conditional operator):</p>
<blockquote>
	<p>Conditional expressions group right-to-left. The first expression is implicitly 
	converted to <code>bool</code> (clause 4). It is evaluated and if it is
	<code>true</code>, the result of the conditional expression is the value of 
	the second expression, otherwise that of the third expression.
	<span class="deleted">All side effects of the first expression except for destruction 
	of temporaries (12.2) happen before the second or third expression is evaluated.</span> 
	Only one of the second and third expressions is evaluated.
	<span class="inserted">Every value computation and side effect associated with 
	the first expression is sequenced before every value computation and side effect 
	associated with the second or third expression.</span></p>
</blockquote>
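<p>For the conditional operator, a sketch (hypothetical function name) in which the 
selected operand observes the side effect of the first:</p>

```cpp
// The side effect of n++ is complete before either branch is evaluated.
int select_after_increment(int n) {
    return (n++ != 0) ? n : -n;
}
```

<p>Called with 5, the condition tests the old value 5 and the second operand then 
reads 6; called with 0, the third operand reads 1 and yields -1.</p>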
<p>5.17p1 (assignment and compound assignment operators):</p>
<blockquote>
	<p>The assignment operator (<code>=</code>) and the compound assignment operators 
	all group right-to-left. All require a modifiable lvalue as their left operand 
	and return <span class="deleted">an lvalue with the type and value of the left 
	operand after the assignment has taken place</span> <span class="inserted">an 
	lvalue referring to the left operand</span>. The result in all cases is a bit-field 
	if the left operand is a bit-field. <span class="inserted">In all cases, the 
	assignment is sequenced after the value computation of the right and left operands, 
	and before the value computation of the assignment expression.</span></p>
</blockquote>
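<p>A sketch of the two properties stated for assignment: the store precedes the 
value computation of the assignment expression, and the result is an lvalue referring 
to the left operand. Function names are hypothetical.</p>

```cpp
// a = b = 3 groups as a = (b = 3); the inner store is complete before
// the inner expression's value is used by the outer assignment.
int chained_assignment() {
    int a = 0, b = 0;
    a = b = 3;
    return a + b;
}

// The result of (a = 5) is an lvalue referring to a itself, not a copy.
int assign_then_modify() {
    int a = 0;
    int& r = (a = 5);
    r += 2;
    return a;
}
```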
<p>5.18p1 (comma operator):</p>
<blockquote>
	<p>A pair of expressions separated by a comma is evaluated left-to-right and 
	the value of the left expression is discarded. The lvalue-to-rvalue (4.1), array-to-pointer 
	(4.2), and function-to-pointer (4.3) standard conversions are not applied to 
	the left expression. <span class="deleted">All side effects (1.9) of the left 
	expression, except for the destruction of temporaries (12.2), are performed 
	before the evaluation of the right expression.</span> <span class="inserted">
	Every value computation and side effect associated with the left expression 
	is sequenced before every value computation and side effect associated with 
	the right expression.</span> The type and value of the result are the type and 
	value of the right operand; the result is an lvalue if its right operand is 
	an lvalue, and is a bit-field if its right operand is an lvalue and a bit-field.</p>
</blockquote>
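<p>For the comma operator, a brief sketch (hypothetical function name):</p>

```cpp
// The store in the left expression is complete before the right expression reads n.
int comma_sequence() {
    int n = 0;
    int r = (n = 10, n + 1);
    return r;
}
```

<p>Note that the commas separating function arguments are not comma operators and 
carry no such guarantee.</p>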
<h2><a id="temporaries">Sequencing for destruction of temporaries</a></h2>
<p>12.2p3:</p>
<blockquote>
	<p>When an implementation introduces a temporary object of a class that has 
	a non-trivial constructor (12.1, 12.8), it shall ensure that a constructor is 
	called for the temporary object. Similarly, the destructor shall be called for 
	a temporary with a non-trivial destructor (12.4). Temporary objects are destroyed 
	as the last step in evaluating the full-expression (1.9) that (lexically) contains 
	the point where they were created. This is true even if that evaluation ends 
	in throwing an exception. <span class="inserted">The value computations and 
	side effects of destroying a temporary object are associated only with the full-expression, 
	not with any specific subexpression.</span></p>
</blockquote>
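<p>The end-of-full-expression rule in 12.2p3 can be observed with a small logging 
type. This is a sketch; <code>Tracer</code>, <code>use</code> and 
<code>temporary_lifetime</code> are hypothetical names.</p>

```cpp
#include <string>
#include <vector>

// Records its own construction and destruction in a caller-supplied log.
struct Tracer {
    std::vector<std::string>& log;
    std::string name;
    Tracer(std::vector<std::string>& l, const std::string& n) : log(l), name(n) {
        log.push_back("ctor " + name);
    }
    ~Tracer() { log.push_back("dtor " + name); }
    int value() const { return 1; }
};

int use(std::vector<std::string>& log, int v) {
    log.push_back("use");
    return v;
}

std::vector<std::string> temporary_lifetime() {
    std::vector<std::string> log;
    use(log, Tracer(log, "t").value());  // the temporary outlives the call to use()
    log.push_back("next statement");     // but not the full-expression above
    return log;
}
```

<p>The destruction is tied to the whole statement, not to the argument subexpression 
that created the temporary.</p>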
<p>12.2p4:</p>
<blockquote>
	<p>There are two contexts in which temporaries are destroyed at a different 
	point than the end of the full-expression. The first context is when a default 
	constructor is called to initialize an element of an array. If the constructor 
	has one or more default arguments, <span class="inserted">the destruction of</span> 
	any <span class="deleted">temporaries</span> <span class="inserted">temporary</span> 
	created in <span class="deleted">the</span> <span class="inserted">a</span> 
	default argument <span class="deleted">expressions are destroyed immediately 
	after return from the constructor</span> <span class="inserted">expression is 
	sequenced before the construction of the next array element, if any</span>.</p>
</blockquote>
<p>12.2p5:</p>
<blockquote>
	<p>The second context is when a reference is bound to a temporary. The temporary 
	to which the reference is bound or the temporary that is the complete object 
	of a subobject to which the reference is bound persists for the lifetime of 
	the reference except as specified below. A temporary bound to a reference member 
	in a constructor&#8217;s ctor-initializer (12.6.2) persists until the constructor 
	exits. A temporary bound to a reference parameter in a function call (5.2.2) 
	persists until the completion of the full expression containing the call. A 
	temporary bound to the returned value in a function return statement (6.6.3) 
	persists until the function exits. <span class="deleted">In all these cases, 
	the temporaries created during the evaluation of the expression initializing 
	the reference, except the temporary to which the reference is bound, are destroyed 
	at the end of the full-expression in which they are created and in the reverse 
	order of the completion of their construction.</span> <span class="inserted">
	The destruction of a temporary whose lifetime is not extended by being bound 
	to a reference is sequenced before the destruction of any temporary which 
	is constructed earlier in the same full-expression.</span> If the lifetime of 
	two or more temporaries to which references are bound ends at the same point, 
	these temporaries are destroyed at that point in the reverse order of the completion 
	of their construction. In addition, the destruction of temporaries bound to 
	references shall take into account the ordering of destruction of objects with 
	static or automatic storage duration (3.7.1, 3.7.2); that is, if <code>obj1</code> 
	is an object with the same storage duration as the temporary and created before 
	the temporary is created the temporary shall be destroyed before <code>obj1</code> 
	is destroyed; if <code>obj2</code> is an object with the same storage duration as the temporary 
	and created after the temporary is created the temporary shall be destroyed 
	after <code>obj2</code> is destroyed. [ Example:</p>
</blockquote>
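<p>The difference between a temporary bound to a reference and one that is not can 
be sketched with a type that counts live instances. <code>Counted</code> and the two 
functions are hypothetical.</p>

```cpp
// Tracks how many Counted objects are currently alive.
struct Counted {
    static int live;
    Counted() { ++live; }
    Counted(const Counted&) { ++live; }
    ~Counted() { --live; }
};
int Counted::live = 0;

// Bound to a reference: the temporary persists for the lifetime of ref.
int live_while_bound() {
    const Counted& ref = Counted();
    (void)ref;
    return Counted::live;  // evaluated while the temporary is still alive
}

// Not bound: destroyed at the end of the full-expression that created it.
int live_after_statement() {
    Counted();
    return Counted::live;
}
```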
<h2><a id="miscellaneous">Fixes for miscellaneous sequencing issues</a></h2>
<p>3.6.2p1 (initialization of non-local objects):</p>
<blockquote>
	<p>Objects with static storage duration (3.7.1) shall be zero-initialized (8.5) 
	before any other initialization takes place. A reference with static storage 
	duration and an object of POD type with static storage duration can be initialized 
	with a constant expression (5.19); this is called <dfn>constant initialization</dfn>. 
	Together, zero-initialization and constant initialization are called <dfn>static 
	initialization</dfn>; all other initialization is <dfn>dynamic initialization</dfn>. 
	Static initialization shall be performed before any dynamic initialization takes 
	place. Dynamic initialization of an object is either ordered or unordered. Definitions 
	of explicitly specialized class template static data members have ordered initialization. 
	Other class template static data members (i.e., implicitly or explicitly instantiated 
	specializations) have unordered initialization. Other objects defined in namespace 
	scope have ordered initialization. Objects defined within a single translation 
	unit and with ordered initialization shall be initialized in the order of their 
	definitions in the translation unit. The order of initialization is unspecified 
	for objects with unordered initialization and for objects defined in different 
	translation units. <span class="inserted">An unordered initialization is indeterminately 
	sequenced with respect to every other dynamic initialization.</span> [ <em>Note:</em> 
	8.5.1 describes the order in which aggregate members are initialized. The initialization 
	of local static objects is described in 6.7. &#8212;<em>end note</em> ]</p>
</blockquote>
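<p>A sketch of ordered initialization within one translation unit, relying on 
zero-initialization happening first. The namespace and all names in it are 
hypothetical.</p>

```cpp
namespace demo {

int counter;                        // static storage: zero-initialized first
int next_id() { return ++counter; }

int first = next_id();              // ordered dynamic initialization
int second = next_id();             // defined later in this unit, so initialized later

}  // namespace demo
```

<p>Had <code>first</code> and <code>second</code> been defined in different 
translation units, their relative order would be unspecified.</p>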
<p>8.5.1p17 (aggregate initialization); new paragraph:</p>
<blockquote>
	<p><span class="inserted">The full-expressions in an <var>initializer-clause</var> 
	are evaluated in the order in which they appear.</span></p>
</blockquote>
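<p>Under the proposed 8.5.1p17, initializers with side effects in an aggregate 
initializer list have a predictable outcome. A sketch, with hypothetical names:</p>

```cpp
struct Triple { int a, b, c; };

// Returns the current value of n and then advances it.
int next(int& n) { return n++; }

Triple numbered() {
    int n = 0;
    Triple t = { next(n), next(n), next(n) };  // evaluated in order of appearance
    return t;
}
```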
<p>12.6.2p3 (mem-initializers):</p>
<blockquote>
	<p>The <var>expression-list</var> in a <var>mem-initializer</var> is used to 
	initialize the base class or non-static data member subobject denoted by the
	<var>mem-initializer-id</var>. The semantics of a <var>mem-initializer</var> 
	are as follows:</p>
	<ul>
		<li>if the <var>expression-list</var> of the <var>mem-initializer</var> 
		is omitted, the base class or member subobject is value-initialized (see 
		8.5);</li>
		<li>otherwise, the subobject indicated by <var>mem-initializer-id</var> 
		is direct-initialized using <var>expression-list</var> as the <var>initializer</var> 
		(see 8.5).</li>
	</ul>
	<blockquote>
		<p><em>[unchanged example omitted]</em></p>
	</blockquote>
	<p><span class="deleted">There is a sequence point (1.9) after the initialization 
	of each base and member.</span> <span class="inserted">The initialization of 
	each base and member constitutes a full-expression.</span>
	<span class="deleted">The <var>expression-list</var> of</span>
	<span class="inserted">Any expression in</span> a <var>mem-initializer</var> 
	is evaluated as part of the <span class="deleted">initialization of the corresponding 
	base or member</span> <span class="inserted">full-expression that performs the 
	initialization</span>.</p>
</blockquote>
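<p>Because each base and member initialization constitutes its own full-expression, 
side effects in one <var>mem-initializer</var> are complete before the next begins. 
A sketch; <code>Pair</code> is a hypothetical type.</p>

```cpp
struct Pair {
    int first;
    int second;
    // Members are initialized in declaration order; each initialization is a
    // separate full-expression, so the two increments of n are ordered.
    explicit Pair(int& n) : first(n++), second(n++) {}
};
```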
<p>14.2 (template arguments):</p>
<blockquote>
	<dl>
		<dt><var>template-argument:</var></dt>
		<dd><var><span class="deleted">assignment-expression</span>
		<span class="inserted">constant-expression</span></var></dd>
		<dd><var>type-id</var></dd>
		<dd><var>id-expression</var></dd>
	</dl>
</blockquote>
<h2><a id="loops">Semantics of some non-terminating loops</a></h2>
<!--
<p>Insert a new paragraph just before 6.5.1:</p>
<p>6.5p5:</p>
<blockquote class="inserted">
	<p>A non-terminating loops that occurs in a program with more than one thread, 
	and fails to perform an acquire operation after a finite number of initial operations, 
	has unspecified behavior. [ <em>Note:</em> This allows compilers to move assignments 
	above non-terminating loops under certain conditions, and allows some such erroneous 
	loops to be diagnosed. Intentionally infinite loops should contain an acquire 
	operation, such as accessing an atomic variable. &#8212;<em>end note</em> ]</p>
</blockquote>
-->
<p>Concern has been expressed about whether it is safe and legal for a compiler 
to optimize based on the assumption that a loop will terminate. The canonical example:</p>
<blockquote>
	<pre>for (T * p = q; p != 0; p = p-&gt;next)
<!--  -->    ++count;
<!--  -->x = 42;</pre>
</blockquote>
<p>Is it valid for the compiler to move the assignment to <code>x</code> above the 
loop? If the loop terminates, clearly yes, because the overall effect of the code 
doesn&#39;t change, and, in the absence of synchronization, there is no guarantee that 
the assignment to <code>x</code> will not be visible to a different thread before 
any assignments to <code>count</code>. But what if the loop doesn&#39;t terminate? For 
example, may a user assume that a non-terminating loop effects synchronization, 
and may therefore be used to prevent a data race? Clearly, a loop that contains 
any explicit synchronizations must be assumed to interact with a different thread, 
and a loop that contains a volatile access or a call to an I/O function must be 
assumed to interact with the environment, so optimization opportunities for such 
a loop are already limited. But what about a simple loop, as above?</p>
<p>If such a loop does not terminate, then clearly neither the loop itself nor any 
code following the loop can have any observable behavior. Moreover, as the &quot;least 
requirements&quot; refer to data written to files &quot;at program termination&quot;, the presence 
of a non-terminating loop may even nullify observable behavior preceding entry to 
the loop (for example, because of buffered output). For these reasons, there are 
problems with concluding that a strictly-conforming program can contain any non-terminating 
loop. We therefore conclude that a compiler is free to assume that a simple loop 
will terminate, and to optimize based on that assumption.</p>
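<p>For a loop that does terminate, the original and the transformed code are 
indistinguishable to a single thread. A sketch of both forms, in which 
<code>Node</code> stands in for the <code>T</code> of the canonical example and the 
function names are hypothetical:</p>

```cpp
struct Node { Node* next; };

// The canonical example as written.
void as_written(Node* q, int& count, int& x) {
    for (Node* p = q; p != nullptr; p = p->next)
        ++count;
    x = 42;
}

// The same code with the assignment to x moved above the loop.
void as_transformed(Node* q, int& count, int& x) {
    x = 42;
    for (Node* p = q; p != nullptr; p = p->next)
        ++count;
}
```

<p>The two differ only if the list is circular, in which case the first form never 
reaches the assignment.</p>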
<h2><a id="library">Library thread-safety</a></h2>
<p>Add a new section after 17.4.4.8, entitled &quot;Thread safety&quot;:</p>
<blockquote class="inserted">
	<p>Unless otherwise specified:</p>
	<ul>
		<li>Every data type (e.g. container) implemented by the library shall be 
		thread-safe in the same sense as an ordinary scalar object: The client must 
		ensure that an operation that logically updates an object is not executed 
		concurrently with another operation that reads or writes the same object. 
		The implementation must protect against accesses to shared data that do 
		not correspond to conflicting accesses at the abstract level, i.e. updates 
		that occur in response to logical &quot;read&quot; operations, or against accesses 
		to a data structure shared by multiple abstract objects. For example, implementations 
		of &quot;read operations&quot; that maintain an internal shared cache must use internal 
		synchronization mechanisms to protect that cache, as will any implementations 
		that maintain other forms of per class, as opposed to per object, data.</li>
		<li>Library calls do not introduce synchronizes-with relationships.</li>
		<li>Operations that allocate memory, such as <code>allocator&lt;T&gt;::allocate()</code>, 
		do not modify shared data. Hence they can be invoked concurrently from different 
		threads without introducing a data race. </li>
	</ul>
</blockquote>
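<p>The division of responsibility in the first bullet can be sketched as follows, 
assuming <code>std::thread</code> and <code>std::mutex</code> (later-standardized 
facilities) and hypothetical function names: concurrent logical reads need no client 
locking, while logical updates must be serialized by the client.</p>

```cpp
#include <cstddef>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Two threads only read the same container: no client synchronization needed.
long concurrent_read_sum(const std::vector<int>& shared) {
    long a = 0, b = 0;
    std::thread t1([&] { a = std::accumulate(shared.begin(), shared.end(), 0L); });
    std::thread t2([&] { b = std::accumulate(shared.begin(), shared.end(), 0L); });
    t1.join();
    t2.join();
    return a + b;
}

// Two threads update the same container: the client serializes with a lock.
std::size_t concurrent_push(int per_thread) {
    std::vector<int> shared;
    std::mutex m;
    auto work = [&] {
        for (int i = 0; i < per_thread; ++i) {
            std::lock_guard<std::mutex> guard(m);
            shared.push_back(i);
        }
    };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    return shared.size();
}
```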

</body>

</html>
