<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html><head>



<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<title>Comments on the C++ Memory Model Following a Partial Formalization Attempt</title>
</head><body>
<h1>Comments on the C++ Memory Model Following a Partial Formalization Attempt</h1>

<p>
ISO/IEC JTC1 SC22 WG21 DRAFT N2955 = 09-0145 - 2009-09-28
</p>

<p>
Mark Batty, mjb220@cl.cam.ac.uk
</p>

<p>
<a href="#Motivation%20for%20Suggestions">Motivation for Suggestions</a><br>
<a href="#Mixed%20Accesses%20to%20Memory%20Locations">Mixed Accesses to Memory Locations</a><br>
<a href="#Release%20Sequence%20and%20Visible%20Sequence%20of%20Side%20Effects">Release Sequence and Visible Sequence of Side Effects</a><br>
<a href="#The%20Conditions%20for%20Sequential%20Consistency">The Conditions for Sequential Consistency</a><br>
<a href="#Reading%20the%20Initial%20State">Reading the Initial State</a><br>
<a href="#Restriction%20on%20Compiler%20Hoisting">Restriction on Compiler Hoisting</a><br>

<a href="#Collected%20Suggestions%20for%20Change%20to%20the%20Standard">Collected Suggestions for Change to the Standard</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#1.10p2">1.10p2</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#1.10p4">1.10p4</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#1.10p6">1.10p6</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#1.10p12">1.10p12</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#1.10p13">1.10p13</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#1.10p14">1.10p14</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#29.3p9">29.3p9</a><br>
<a href="#References">References</a><br>
</p>



<h2><a name="Motivation for Suggestions">Motivation for Suggestions</a></h2>

The issues highlighted in this document were first raised in
<a href="#BattySewell09">[BattySewell09]</a>, an attempt to formalize
part of the memory model as it is described in the current working
draft of the standard <a href="#N2914">[N2914]</a>, using the HOL
proof assistant <a href="#HOL">[HOL]</a>, paying particular attention
to the subset of the standard corresponding to the sequentially
consistent model of
<a href="#BoehmAdve08">[BoehmAdve08]</a>. Such formalization work
commonly uncovers ambiguities, inconsistencies, or omissions. In this
case, we have identified a number of places where some local
clarification or improvement may be appropriate.


<h2><a name="Mixed Accesses to Memory Locations">Mixed Accesses to Memory Locations</a></h2>

<p>
The atomic memory operations in section 29 [Atomics] are the programmer's
interface to the explicitly synchronized memory operations. They are
presented as a set of types with functions for operating on them. It
does not seem sufficiently clear what provision there is for
non-atomic access to objects of this type.
</p>

<p>
At first glance the Standard seems to allow non-atomic accesses at
atomic locations. In 1.10 [intro.multithread] paragraph 14 this is
implied from the definition of a data race (note that memory
operations in conflict act on the same location):
</p>

<blockquote><p>
  The execution of a program contains a <i>data race</i> if it
  contains two conflicting actions in different threads, at least one
  of which is not atomic, and neither happens-before the other. Any
  such data race results in undefined behavior.
</p></blockquote>

<p>
How then would a programmer write down a non-atomic access to the
atomic location? There are no operations built into the C++ library
functions on the atomic types for non-atomic access. Non-atomic access
will have to take place on the object through an lvalue of some other
type. 3.10 [basic.lval] paragraph 15 lists the types of lvalue that would not result in
undefined behavior:
</p>

<blockquote><p>
  If a program attempts to access the stored value of an object
  through an lvalue of other than one of the following types the
  behavior is undefined
</p>

  <ul>
    <li> the dynamic type of the object,
    </li><li> a cv-qualified version of the dynamic type of the object,
    </li><li> a type similar (as defined in 4.4) to the dynamic type of the object,
    </li><li> a type that is the signed or unsigned type corresponding to the dynamic type of the object,
    </li><li> a type that is the signed or unsigned type corresponding to
      a cv-qualified version of the dynamic type of the object,
    </li><li> an aggregate or union type that includes one of the
      aforementioned types among its members (including, recursively,
      a member of a subaggregate or contained union),
    </li><li> a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
    </li><li> a char or unsigned char type.
  </li></ul>
</blockquote>

<p>
All of these types are slight variations of the original type and are
still atomic, except for the char and unsigned char types. This
suggests that any non-atomic access to an atomic location would have
to be through a char or unsigned char lvalue. 
There is evidence in the standard to suggest that using such an lvalue
to modify an atomic object would result in undefined behavior, but it
is not definitive.
</p>

<p>
Each atomic type in section 29 has a deleted copy constructor,
therefore has no trivial copy constructor, and ought not to be
trivially copyable. This is not certain however from the definition in
9p6 [class] which defines a trivially copyable class, but does not
explicitly require such a class to have a trivial copy
constructor. Section 3.9 [basic.types] leaves <code>memcpy</code>
undefined over non-trivially copyable types. If <code>memcpy</code> is
undefined over the atomics, this suggests that other non-atomic memory
accesses might not work either.
</p>

<p>
This argument is not very strong, but if it does indeed hold, then the
following note, to be appended to 1.10p4, will clarify that mixed
accesses are not allowed:
</p>

<blockquote><p>
  <b>( Note: Atomic and locking objects are not trivially copyable [3.9],
   so executions that access atomic or locking objects as
   non-atomic objects, through an array of character type, for example,
   shall have undefined behaviour - end note )</b>
</p></blockquote>

<p>
The possibility of mixed accesses should be removed from the definition
of a data race in 1.10p14:
</p>

<blockquote><p>
  The execution of a program contains a <i>data race</i> if it
  contains two <b>non-atomic</b> conflicting actions in different
  threads, <del>at least one of which is not atomic,</del> and neither
  happens-before the other. Any such data race results in undefined
  behavior.
</p></blockquote>



<h2><a name="Release Sequence and Visible Sequence of Side Effects">Release Sequence and Visible Sequence of Side Effects</a></h2>

<p>
The standard presents several definitions with the phrase "maximal
contiguous sub-sequence of side effects in the modification order of
M". The usages of this phrase seem unclear, in the definitions of a
visible sequence of side effects and a release sequence in 1.10p13
and 1.10p6 respectively:
</p>

<blockquote><p>
  The visible sequence of side effects on an atomic object <var>M</var>, with
  respect to a value computation <var>B</var> of <var>M</var>, is a <i>maximal contiguous
  sub-sequence of side effects in the modification order of <var>M</var></i>, where
  the first side effect is visible with respect to <var>B</var>, and for every
  subsequent side effect, it is not the case that <var>B</var> happens-before it.
[our emphasis]
</p></blockquote>

<blockquote><p>
  A release sequence on an atomic object <var>M</var> is a <i>maximal contiguous
  sub-sequence of side effects in the modification order of <var>M</var></i>, where
  the first operation is a release, and every subsequent operation is
  performed by the same thread that performed the release, or is an
  atomic read-modify-write operation.
[our emphasis]
</p></blockquote>

<p>
Early in the process of formalization, the word 'maximal' was
understood to be specifying that the first operation in the sequence
is maximal in the modification order. This was an error in
interpretation rather than in the prose of the standard, but it might
improve the clarity of the paragraphs to change the word 'maximal' for
the words 'maximal-length'. So respectively 1.10p13 and 1.10p6 become:
</p>

<blockquote><p>
  The visible sequence of side effects on an atomic object <var>M</var>, with
  respect to a value computation <var>B</var> of <var>M</var>, is a
  maximal<b>-length</b> contiguous sub-sequence of side effects in
  the modification order of <var>M</var>, where the first side effect is
  visible with respect to <var>B</var>, and for every subsequent side effect,
  it is not the case that <var>B</var> happens-before it.
</p></blockquote>

<blockquote><p>
  A release sequence on an atomic object <var>M</var> is a
  maximal<b>-length</b> contiguous sub-sequence of side effects in
  the modification order of <var>M</var>, where the first operation is a
  release, and every subsequent operation is performed by the same
  thread that performed the release, or is an atomic read-modify-write
  operation.
</p></blockquote>




<h2><a name="The Conditions for Sequential Consistency">The Conditions for Sequential Consistency</a></h2>

<p>
The standard alludes to supporting a simplified sequentially
consistent model of program execution although it does not explicitly
specify it. It would be valuable for programmers to be guaranteed
sequential consistency with minimal well-specified conditions.
</p>

<p>
The standard does promise that programmers can rely on a sequentially
consistent execution of memory operations, but only if they do not use
atomic operations at all. The note that makes this claim is in
1.10p14:
</p>

<blockquote><p>
  It can be shown that programs that correctly use simple locks to
  prevent all data races, and use no other synchronization operations,
  behave as though the executions of their constituent threads were
  simply interleaved, with each observed value of an object being the
  last value assigned in that interleaving. This is normally referred
  to as "sequential consistency". However, this applies only to
  race-free programs, and race-free programs cannot observe most
  program transformations that do not change single-threaded program
  semantics. [...]
</p></blockquote>

<p>
In 1.10p12 the standard alludes to requiring sequential
consistency with slightly weaker conditions, describing the
correspondence of data races in restricted executions to data races in
a sequentially consistent execution.
</p>

<blockquote><p>
  This states that operations on ordinary variables are not visibly
  reordered. This is not actually detectable without data races, but
  it is necessary to ensure that data races, as defined here, and with
  suitable restrictions on the use of atomics, correspond to data
  races in a simple interleaved (sequentially consistent) execution.
</p></blockquote>

<p>
Given the models described in <a href="#BoehmAdve08">[BoehmAdve08]</a> and <a href="#BattySewell09">[BattySewell09]</a> correspond to data race
free executions in the model of the standard over a restricted set of
the atomic operations, it seems that the language regarding the
conditions for sequential consistency could be made stronger.
</p>

<p>
The suggestion for change is that the note in 1.10p14 be changed to:
</p>

<blockquote><p>
  It can be shown that programs that correctly use simple locks
  <b>and atomic <code>memory_order_seq_cst</code> operations</b> to
  prevent all data races, and use no other synchronization operations,
  behave as though the executions of their constituent threads were
  simply interleaved [...]
</p></blockquote>


<h2><a name="Reading the Initial State">Reading the Initial State</a></h2>

<p>
There is only one paragraph discussing reads from the initial state in
the standard. In 1.10p2 it is mentioned that reads might read from
the initial state.
</p>

<blockquote><p>
  The value of an object visible to a thread <var>T</var> at a particular point
  might be the initial value of the object, a value assigned to the
  object by <var>T</var>, or a value assigned to the object by another thread,
  according to the rules below.
</p></blockquote>

<p>
Given the weakness of the memory model it does not seem enough to
leave the reader to come up with a common-sense semantics for reads
from the initial state. The standard should provide explicit
conditions for such reads, for the non atomic memory objects, 1.10p12
states:
</p>

<blockquote><p>
  The value of a non-atomic scalar object <var>M</var> as determined by
  evaluation <var>B</var>, shall be the value stored by the visible side effect
  <var>A</var>.
</p></blockquote>

<p>
Which could be changed to:
</p>

<blockquote><p>
  The value of a non-atomic scalar object <var>M</var> as determined by
  evaluation <var>B</var>, shall be the value stored by the visible side effect
  <var>A</var><b>, if one exists, or the initial value of the object
  otherwise</b>.
</p></blockquote>

<p>
Similarly, 1.10p13 describes the values determined for evaluations
on atomic objects:
</p>

<blockquote><p>
  The visible sequence of side effects on an atomic object <var>M</var>, with
  respect to a value computation <var>B</var> of <var>M</var>, is a maximal contiguous
  sub-sequence of side effects in the modification order of <var>M</var>, where
  the first side effect is visible with respect to <var>B</var>, and for every
  subsequent side effect, it is not the case that <var>B</var> happens before
  it.
</p>

<p>
  The value of an atomic object <var>M</var>, as determined by evaluation <var>B</var>,
  shall be the value stored by some operation in the visible sequence
  of <var>M</var> with respect to <var>B</var>.
</p></blockquote>

<p>
In the atomic case, if there is no visible side effect with respect to
an evaluation, then we assume that the evaluation could take the value
of any side effect that it does not happen before or the initial value
of the object. Altering 1.10p13 to incorporate this gives:
</p>

<blockquote><p>
  The visible sequence of side effects on an atomic object <var>M</var>, with
  respect to a value computation <var>B</var> of <var>M</var>, is a maximal contiguous
  sub-sequence of side effects in the modification order of <var>M</var>, where
  the first side effect is visible with respect to <var>B</var>, and for every
  subsequent side effect, it is not the case that <var>B</var> happens before
  it.
</p>

<p><b> If there are no visible side effects with respect
to <val>B</val> and there are some side effects that <val>B</val> does
not happen before, then <val>B</val> has an initial side effect
sequence, a maximal-length contiguous sub-sequence of side effects in
the modification order of <val>M</val> where for each side effect, it
is not the case that <val>B</val> happens before it.
</b></p>

<p>
The value of an atomic object <var>M</var>, as determined by evaluation <var>B</var>,
shall be the value stored by some operation in the visible sequence of
<var>M</var> with respect to <var>B</var><b>, or if no such sequence
exists, the initial value or any value from the initial side effect
sequence</b>.

</p></blockquote>

<p>
Returning now to 1.10p2, the word 'might' suggests that there may be
some ambiguity over what values are read. The intended meaning is not
that the value of an object might be the initial value at any
arbitrary point, nor is the intention that the value might not be the
initial value at the initial point. Given the edits of 10.1p12 and
10.1p13 above, 10.1p2 could be changed slightly to remove the word
'might':
</p>

<blockquote><p>
  The value of an object visible to a thread <var>T</var> at a particular point
  <del>might be</del> <b>is</b> the initial value of the object, a value
  assigned to the object by <var>T</var>, or a value assigned to the object by
  another thread, according to the rules below.
</p></blockquote>


<h2><a name="Restriction on Compiler Hoisting">Restriction on Compiler Hoisting</a></h2>

<p>
Consider paragraph 29.3p9 [atomics.order]:
</p>

<blockquote><p>
Implementations shall not move an atomic operation out of an unbounded loop.
</p></blockquote>

<p>
The phrase "moving an atomic operation" may be meaningful in the
context of a specific implementation, but the presentation of the rest
of the model has been more general. The word "moved" seems to refer to
memory operations being moved by a compiler (or some abstract
interpreter), whereas the style of the rest of the standard is more
axiomatic: given some particular candidate execution, the memory model
determines whether it is valid or not.
</p>

<p>
Even leaving that aside, paragraph 29.3p9 mentions an unbounded
loop. This term is not used anywhere else in the standard. Given the
context it seems to be referring to iteration statements that are not
assumed to terminate by 6.5p5 [stmt.iter]. Any iteration statement
containing an atomic operation is not assumed to terminate by
paragraph 6.5p5, so 29.3p9 is redundant.
</p>

<p>
There seems to be a serious question as to just what is intended
here. If the clause is meant to prevent compilers from hoisting atomic
memory actions out of the body of a loop, then this restriction is
overly strong. If a given transformation would produce observably
different behavior, then 1.9p1 [intro.execution] dictates that the
transformation would not be valid:
</p>

<blockquote><p> The semantic descriptions in this International
Standard define a parametrized non-deterministic abstract
machine. This International Standard places no requirement on the
structure of conforming implementations. In particular, they need not
copy or emulate the structure of the abstract machine. Rather,
conforming implementations are required to emulate (only) the
observable behavior of the abstract machine as explained below.
</p></blockquote>

<p>
The suggestion for change is to remove the second sentence in 29.3p9
entirely.
</p>





<h2><a name="Collected Suggestions for Change to the Standard">Collected Suggestions for Change to the Standard</a></h2>
<p>
This section collects all of the suggestions for change to the
standard described in this document. The relevant paragraphs label is
given then the whole paragraph is reproduced with any added text in
<b>bold</b> and any removed text <del>struck through</del>.
</p>

<h3><a name="1.10p2">1.10p2</a></h3>
<p>
The value of an object visible to a thread <val>T</val> at a particular point
<del>might be</del> <b>is</b> the initial value of the object, a value
assigned to the object by <val>T</val>, or a value assigned to the object by
another thread, according to the rules below. ( Note: In some cases,
there may instead be undefined behavior. Much of this section is
motivated by the desire to support atomic operations with explicit and
detailed visibility constraints. However, it also implicitly supports
a simpler view for more restricted programs. - end note )
</p>

<h3><a name="1.10p4">1.10p4</a></h3>
<p>
The library defines a number of atomic operations (paragraph 29) and
operations on locks (paragraph 30) that are specially identified as
synchronization operations. These operations play a special role in
making assignments in one thread visible to another. A synchronization
operation on one or more memory locations is either a consume
operation, an acquire operation, a release operation, or both an
acquire and release operation. A synchronization operation without an
associated memory location is a fence and can be either an acquire
fence, a release fence, or both an acquire and release fence. In
addition, there are relaxed atomic operations, which are not
synchronization operations, and atomic read-modify-write operations,
which have special characteristics. ( Note: For example, a call that
acquires a lock will perform an acquire operation on the locations
comprising the lock. Correspondingly, a call that releases the same
lock will perform a release operation on those same
locations. Informally, performing a release operation on <val>A</val> forces
prior side effects on other memory locations to become visible to
other threads that later perform a consume or an acquire operation
on <val>A</val>. ``Relaxed'' atomic operations are not synchronization
operations even though, like synchronization operations, they cannot
contribute to data races. - end note )
<b>( Note: Atomic and locking objects are not trivially copyable
   [3.9], so executions that access atomic or locking objects as
   non-atomic objects, through an array of character type, for
   example, will have undefined behaviour - end note )</b>
</p>

<h3><a name="1.10p6">1.10p6</a></h3>
<p>
A release sequence on an atomic object <val>M</val> is a maximal<b>-length</b>
contiguous sub-sequence of side effects in the modification order of
<val>M</val>, where the first operation is a release, and every subsequent
operation is performed by the same thread that performed the release,
or is an atomic read-modify-write operation.
</p>

<h3><a name="1.10p12">1.10p12</a></h3>
<p>
A visible side effect <val>A</val> on an object <val>M</val> with respect to a value
computation <val>B</val> of <val>M</val> satisfies the conditions:
</p>
<ul>
<li> <val>A</val> happens before <val>B</val>, and
</li><li> there is no other side effect <val>X</val> to <val>M</val> such that <val>A</val> happens
  before <val>X</val> and <val>X</val> happens before <val>B</val>.
</li></ul>
<p>
The value of a non-atomic scalar object <val>M</val>, as determined by
evaluation <val>B</val>, shall be the value stored by the visible side effect
<val>A</val> <b>if one exists, or the initial value of the object
  otherwise</b>. ( Note: If there is ambiguity about which side effect to
a non-atomic object is visible, then there is a data race, and the
behavior is undefined. - end note ) ( Note: This states that
operations on ordinary variables are not visibly reordered. This is
not actually detectable without data races, but it is necessary to
ensure that data races, as defined here, and with suitable
restrictions on the use of atomics, correspond to data races in a
simple interleaved (sequentially consistent) execution. - end note )
</p>

<h3><a name="1.10p13">1.10p13</a></h3>
<p>
The visible sequence of side effects on an atomic object <val>M</val>, with
respect to a value computation <val>B</val> of <val>M</val>, is a
maximal<b>-length</b> contiguous sub-sequence of side effects in the
modification order of <val>M</val>, where the first side effect is visible with
respect to <val>B</val> and for every subsequent side effect, it is not the
case that <val>B</val> happens-before it.
</p>

<p><b> If there are no visible side effects with respect
to <val>B</val> and there are some side effects that <val>B</val> does
not happen before, then <val>B</val> has an initial side effect
sequence, a maximal-length contiguous sub-sequence of side effects in
the modification order of <val>M</val> where for each side effect, it
is not the case that <val>B</val> happens before it.
</b></p>

The value of an atomic object <val>M</val>, as determined by
evaluation <val>B</val>, shall be the value stored by some operation
in the visible sequence of
<val>M</val> with respect to <val>B</val><b>, or if no such sequence
exists, the initial value or any value from the initial side effect
sequence</b>. Furthermore, if a value computation <val>A</val> of an
atomic object <val>M</val> happens before a value
computation <val>B</val> of <val>M</val>, and the value computed
by <val>A</val> corresponds to the value stored by side
effect <val>X</val>, then the value computed by <val>B</val> shall
either equal the value computed by <val>A</val>, or be the value
stored by side effect <val>Y</val>, where <val>Y</val>
follows <val>X</val> in the modification order of
<val>M</val>. ( Note: This effectively disallows compiler reordering
of atomic operations to a single object, even if both operations are
relaxed loads. By doing so, we This effectively makes the cache
coherence guarantee provided by most hardware available to C++ atomic
operations. - end note ) ( Note: The visible sequence depends on the
happens before relation, which depends on the values observed by loads
of atomics, which we are restricting here. The intended reading is
that there must exist an association of atomic loads with
modifications they observe that, together with suitably chosen
modification orders and the happens before relation derived as
described above, satisfy the resulting constraints as imposed here. -
end note )

<h3><a name="1.10p14">1.10p14</a></h3>

The execution of a program contains a data race if it contains two
<b>non-atomic</b> conflicting actions in different threads, <del>at
  least one of which is not atomic,</del> and neither happens before the
other. Any such data race results in undefined behavior. ( Note: It
can be shown that programs that correctly use simple locks <b>and
  atomic <code>memory_order_seq_cst</code> operations</b> to prevent all
data races, and use no other synchronization operations, behave as
though the executions of their constituent threads were simply
interleaved, with each observed value of an object being the last
value assigned in that interleaving. This is normally referred to as
"sequential consistency". However, this applies only to race-free
programs, and race-free programs cannot observe most program
transformations that do not change single-threaded program
semantics. In fact, most single-threaded program transformations
continue to be allowed, since any program that behaves differently as
a result must perform an undefined operation. - end note )

<h3><a name="29.3p9">29.3p9</a></h3>

Implementations should make atomic stores visible to atomic loads
within a reasonable amount of time.
<del>Implementations shall not move an atomic operation out of an unbounded loop.</del>


<h2><a name="References">References</a></h2>

<dl>


<dt>
<a name="N2914">[N2914]</a>
</dt>
<dd>
<cite>Working Draft, Standard for Programming Language C++</cite>,
ISO/IEC JTC1 SC22 WG21 N2914 = 09-0104 - 2009-06-22,
<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2914.pdf" target="_blank">
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2914.pdf</a>.
Accessed 2009/09/28.
</dd>




<dt>
<a name="BoehmAdve08">[BoehmAdve08]</a>
</dt>
<dd>
<cite>Foundations of the C++ Concurrency Memory Model</cite>,
H.-J. Boehm and S. Adve,
In <cite>Proc. PLDI,</cite>
pages 68-78, New York, NY, USA, 2008. ACM.
<a href="http://www.hpl.hp.com/techreports/2008/HPL-2008-56.pdf" target="_blank">
http://www.hpl.hp.com/techreports/2008/HPL-2008-56.pdf</a>.
Accessed 2009/09/28.
</dd>


<dt>
<a name="HOL">[HOL]</a>
</dt>
<dd>
The HOL 4 system.
<a href="http://hol.sourceforge.net/" target="_blank">
http://hol.sourceforge.net/</a>.
</dd>

<dt>
<a name="BattySewell09">[BattySewell09]</a>
</dt>
<dd>
<cite>Notes on the Draft C++ Memory Model</cite>,
Mark Batty, Peter Sewell,
<a href="http://www.cl.cam.ac.uk/%7Emjb220/cpp/cpp2440.pdf" target="_blank">
http://www.cl.cam.ac.uk/~mjb220/cpp/cpp2440.pdf
</a>
</dd>




</dl>
</body></html>