
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <link href='https://fonts.googleapis.com/css?family=Roboto' rel='stylesheet' type='text/css'>
    <link href='https://fonts.googleapis.com/css?family=Inconsolata:bold' rel='stylesheet' type='text/css'>
    <style>
      body {
        font-family: 'Roboto', sans-serif;
      }
      .body {
        margin: 0 auto;
        max-width: 80em;
      }
      a {
        color: #455A64;
      }
      h1, h2, h3, h4, h5, h6 {
        color: #37474F;
      }
      h1 a, h2 a, h3 a, h4 a, h5 a, h6 a {
        padding-left: 1em;
        padding-right: 3em;
        text-decoration: none;
        opacity: 0;
      }
      a.headerlink:hover {
        opacity: 1;
      }
      .field-name {
        text-align: right;
        padding-right: 1em;
      }
      tt, .highlight {
        color: #263238;
        background-color: #ECEFF1;
        font-family: 'Inconsolata', monospace;
        font-weight: bold;
      }
      tt {
        padding: 0em 0.5em;
      }
      .highlight {
        margin: 0em 1em;
        padding: 0.1em 1em;
      }
    </style>
    <title>P0153R0 std::atomic_object_fence(mo, T&amp;&amp;...)</title> 
  </head>
  <body>
    <div class="body">

  <div class="section" id="p0153r0-std-atomic-object-fence-mo-t">
<h1>P0153R0 <tt class="docutils literal"><span class="pre">std::atomic_object_fence(mo,</span> <span class="pre">T&amp;&amp;...)</span></tt><a class="headerlink" href="#p0153r0-std-atomic-object-fence-mo-t" title="Permalink to this headline">¶</a></h1>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Author:</th><td class="field-body">Olivier Giroux</td>
</tr>
<tr class="field-even field"><th class="field-name">Contact:</th><td class="field-body"><a class="reference external" href="mailto:ogiroux&#37;&#52;&#48;nvidia&#46;com">ogiroux<span>&#64;</span>nvidia<span>&#46;</span>com</a></td>
</tr>
<tr class="field-odd field"><th class="field-name">Author:</th><td class="field-body">JF Bastien</td>
</tr>
<tr class="field-even field"><th class="field-name">Contact:</th><td class="field-body"><a class="reference external" href="mailto:jfb&#37;&#52;&#48;google&#46;com">jfb<span>&#64;</span>google<span>&#46;</span>com</a></td>
</tr>
<tr class="field-odd field"><th class="field-name">Date:</th><td class="field-body">2015-11-05</td>
</tr>
<tr class="field-even field"><th class="field-name">Previous:</th><td class="field-body"><a class="reference external" href="http://wg21.link/N4522">http://wg21.link/N4522</a></td>
</tr>
<tr class="field-odd field"><th class="field-name">URL:</th><td class="field-body"><a class="reference external" href="https://github.com/jfbastien/papers/blob/master/source/P0153R0.rst">https://github.com/jfbastien/papers/blob/master/source/P0153R0.rst</a></td>
</tr>
</tbody>
</table>
<div class="section" id="rationale">
<h2>Rationale<a class="headerlink" href="#rationale" title="Permalink to this headline">¶</a></h2>
<p>Fences allow programmers to express a conservative approximation to the precise
pair-wise relations of operations required to be ordered in the happens-before
relation. This is conservative because fences use the sequenced-before relation
to select vast extents of the program into the happens-before relation.</p>
<p>This conservatism is commonly desired because it is difficult to reason about
operations hidden behind layers of abstraction in C++ programs. An unfortunate
consequence of this is that precise expression of ordering is not possible in
C++ currently, which makes it easy to over-constrain the order of operations
internal to synchronization primitives that comprise multiple atomic objects.
This constrains the ability of implementations (compiler and hardware) to
reorder, ignore, or assume the absence of operations that are not relevant or
not visible.</p>
<p>In existing practice, the <tt class="docutils literal"><span class="pre">flush</span></tt> primitive of OpenMP is more expressive than
the fences of C++ in at least this one sense: it can optionally restrict the
ordering of operations to a developer-specified set of memory locations. This is
enough to exactly express the required pair-wise ordering for short lock-free
algorithms. This capability isn&#8217;t only relevant to OpenMP and would be further
enhanced if it were integrated with the other facets of the more modern C++
memory model.</p>
<p>An example use-case for this capability is a likely implementation strategy for
<a class="reference external" href="http://wg21.link/N4392">N4392</a>&#8217;s <tt class="docutils literal"><span class="pre">std::barrier</span></tt> object. This algorithm makes ordered modifications on
the atomic sub-objects of a larger non-atomic synchronization object, but the
internal modifications need only be ordered with respect to each other, not all
surrounding objects (they are ordered separately).</p>
<p>In one example implementation, <tt class="docutils literal"><span class="pre">std::barrier</span></tt> is coded as follows:</p>
<div class="highlight-c++"><div class="highlight"><pre><span class="k">struct</span> <span class="n">barrier</span> <span class="p">{</span>
    <span class="c1">// Some member functions elided.</span>
    <span class="kt">void</span> <span class="n">arrive_and_wait</span><span class="p">()</span> <span class="p">{</span>
        <span class="kt">int</span> <span class="k">const</span> <span class="n">myepoch</span> <span class="o">=</span> <span class="n">epoch</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">memory_order_relaxed</span><span class="p">);</span>
        <span class="kt">int</span> <span class="k">const</span> <span class="n">result</span> <span class="o">=</span> <span class="n">arrived</span><span class="p">.</span><span class="n">fetch_add</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">memory_order_acq_rel</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">result</span> <span class="o">==</span> <span class="n">expected</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">expected</span> <span class="o">=</span> <span class="n">nexpected</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">memory_order_relaxed</span><span class="p">);</span>
            <span class="n">arrived</span><span class="p">.</span><span class="n">store</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">memory_order_relaxed</span><span class="p">);</span>
            <span class="c1">// Only need to order {expected, arrived} -&gt; {epoch}.</span>
            <span class="n">epoch</span><span class="p">.</span><span class="n">store</span><span class="p">(</span><span class="n">myepoch</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="n">memory_order_release</span><span class="p">);</span>
        <span class="p">}</span>
        <span class="k">else</span>
            <span class="k">while</span> <span class="p">(</span><span class="n">epoch</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">memory_order_acquire</span><span class="p">)</span> <span class="o">==</span> <span class="n">myepoch</span><span class="p">)</span>
                <span class="p">;</span>
    <span class="p">}</span>
<span class="k">private</span><span class="o">:</span>
    <span class="kt">int</span> <span class="n">expected</span><span class="p">;</span>
    <span class="n">atomic</span><span class="o">&lt;</span><span class="kt">int</span><span class="o">&gt;</span> <span class="n">arrived</span><span class="p">,</span> <span class="n">nexpected</span><span class="p">,</span> <span class="n">epoch</span><span class="p">;</span>
<span class="p">};</span>
</pre></div>
</div>
<p>The release operation on the epoch atomic is likely to require the compiler to
insert a fence that has an effect that goes beyond the intended constraint,
which is to order only the operations on the barrier object. The barrier
object is likely to be smaller than a cache line, and the library&#8217;s
implementation can control its alignment using <tt class="docutils literal"><span class="pre">alignas</span></tt>, so it would be
possible to compile this program without a fence in this location on
architectures that are cache-line coherent.</p>
<p>To concisely express the bound on the set of memory operations whose order is
constrained, we propose to accompany <tt class="docutils literal"><span class="pre">std::atomic_thread_fence</span></tt> with an
<tt class="docutils literal"><span class="pre">object</span></tt> variant which takes a reference to the object(s) to be ordered by
the fence.</p>
</div>
<div class="section" id="proposed-addition">
<h2>Proposed addition<a class="headerlink" href="#proposed-addition" title="Permalink to this headline">¶</a></h2>
<p>Under 29.2 Header <tt class="docutils literal"><span class="pre">&lt;atomic&gt;</span></tt> synopsis [<strong>atomics.syn</strong>]:</p>
<div class="highlight-c++"><div class="highlight"><pre><span class="k">namespace</span> <span class="n">std</span> <span class="p">{</span>
   <span class="c1">// 29.8, fences</span>
   <span class="c1">// ...</span>
   <span class="k">template</span><span class="o">&lt;</span><span class="n">class</span><span class="p">...</span> <span class="n">T</span><span class="o">&gt;</span>
   <span class="kt">void</span> <span class="n">atomic_object_fence</span><span class="p">(</span><span class="n">memory_order</span><span class="p">,</span> <span class="n">T</span><span class="o">&amp;&amp;</span><span class="p">...</span> <span class="n">objects</span><span class="p">)</span> <span class="n">noexcept</span><span class="p">;</span>
 <span class="p">}</span>
</pre></div>
</div>
<p>Under 29.8 Fences [<strong>atomics.fences</strong>], after the current
<tt class="docutils literal"><span class="pre">atomic_thread_fence</span></tt> paragraph:</p>
<p><tt class="docutils literal"><span class="pre">template&lt;class...</span> <span class="pre">T&gt;</span> <span class="pre">void</span> <span class="pre">atomic_object_fence(memory_order</span> <span class="pre">order,</span> <span class="pre">T&amp;&amp;...</span> <span class="pre">objects)</span> <span class="pre">noexcept;</span></tt></p>
<p><em>Effect</em>: Equivalent to <tt class="docutils literal"><span class="pre">atomic_thread_fence(order)</span></tt> except that operations on
objects other than those named by the <em>objects</em> arguments and their
sub-objects are <em>un-sequenced</em> with the fence. The <em>objects</em> operands are not
accessed.</p>
<p><em>Note</em>: The compiler may omit fences entirely depending on alignment
information, may generate a dynamic test leading to a fence for under-aligned
objects, or may emit the same fence an <tt class="docutils literal"><span class="pre">atomic_thread_fence</span></tt> would.</p>
<p>The <tt class="docutils literal"><span class="pre">__cpp_lib_atomic_object_fence</span></tt> feature test macro should be added.</p>
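<p>As a minimal sketch of how user code might consume this macro, the helper below
selects the object fence when the library advertises it and otherwise falls back
to a full thread fence, which orders at least as much. The name
<tt class="docutils literal"><span class="pre">object_fence_or_thread_fence</span></tt> is hypothetical and not part of this proposal.</p>
<div class="highlight-c++"><div class="highlight"><pre>#include &lt;atomic&gt;
#include &lt;utility&gt;

// Hypothetical helper, not part of the proposal.
template &lt;class... T&gt;
void object_fence_or_thread_fence(std::memory_order order, T&amp;&amp;... objects) noexcept {
#if defined(__cpp_lib_atomic_object_fence)
  // The library provides the proposed fence: pass the objects through.
  std::atomic_object_fence(order, std::forward&lt;T&gt;(objects)...);
#else
  // Fallback: the operands are never accessed, only the order is used.
  (void)sizeof...(objects);
  std::atomic_thread_fence(order);
#endif
}
</pre></div>
</div>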
</div>
<div class="section" id="alternate-wording">
<h2>Alternate wording<a class="headerlink" href="#alternate-wording" title="Permalink to this headline">¶</a></h2>
<p>At the Kona meeting, the SG1 group expressed concerns about the current wording
and suggested that it be reworked. The main concern was that the exclusive
behavior expressed in the <em>effect</em> clause wasn&#8217;t fully correct.</p>
<p>The authors seek comments on the following approach.</p>
<p>The current definition from 1.10 (13) is:</p>
<blockquote>
<div><p>An evaluation A inter-thread happens before an evaluation B if</p>
<ul class="simple">
<li>(13.1) — A synchronizes with B, or</li>
<li>(13.2) — A is dependency-ordered before B, or</li>
<li>(13.3) — for some evaluation X<ul>
<li>(13.3.1) — A synchronizes with X and X is sequenced before B, or</li>
<li>(13.3.2) — A is sequenced before X and X inter-thread happens before B, or</li>
<li>(13.3.3) — A inter-thread happens before X and X inter-thread happens before B.</li>
</ul>
</li>
</ul>
</div></blockquote>
<p>An alternate wording could update (13.3.1) and (13.3.2) for the case where X is
an object fence. In that case, these clauses apply if the memory locations
modified by A and B are named in the fence&#8217;s <em>objects</em> parameters.</p>
<p>This could be done by either:</p>
<ol class="arabic simple">
<li>Adding two new (13.3.*) clauses.</li>
<li>Keeping all the wording updates in 29.8 and declaring an exception to 1.10
from there.</li>
</ol>
</div>
<div class="section" id="example-implementation">
<h2>Example implementation<a class="headerlink" href="#example-implementation" title="Permalink to this headline">¶</a></h2>
<p>A trivial, yet conforming implementation may implement the new fence in terms of
the existing <tt class="docutils literal"><span class="pre">std::atomic_thread_fence</span></tt> using the same memory order:</p>
<div class="highlight-c++"><div class="highlight"><pre><span class="k">template</span><span class="o">&lt;</span><span class="n">class</span><span class="p">...</span> <span class="n">T</span><span class="o">&gt;</span>
<span class="kt">void</span> <span class="n">atomic_object_fence</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">memory_order</span> <span class="n">order</span><span class="p">,</span> <span class="n">T</span> <span class="o">&amp;&amp;</span><span class="p">...)</span> <span class="n">noexcept</span> <span class="p">{</span>
  <span class="n">std</span><span class="o">::</span><span class="n">atomic_thread_fence</span><span class="p">(</span><span class="n">order</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
</div>
<p>A more advanced implementation can overload this for the single-object case
on architectures (or micro-architectures) that have cache coherency with a known
line size, even if it is conservatively approximated:</p>
<div class="highlight-c++"><div class="highlight"><pre><span class="cp">#define __CACHELINE_SIZE </span><span class="c1">// Secret (micro-)architectural value.</span>
<span class="k">template</span> <span class="o">&lt;</span><span class="k">class</span> <span class="nc">T</span><span class="o">&gt;</span>
<span class="n">std</span><span class="o">::</span><span class="n">enable_if_t</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">is_standard_layout</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">remove_reference_t</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;&gt;::</span><span class="n">value</span> <span class="o">&amp;&amp;</span>
                 <span class="n">__CACHELINE_SIZE</span> <span class="o">-</span> <span class="n">alignof</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="o">%</span> <span class="n">__CACHELINE_SIZE</span> <span class="o">&gt;=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">T</span><span class="p">)</span><span class="o">&gt;</span>
<span class="n">atomic_object_fence</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">memory_order</span><span class="p">,</span> <span class="n">T</span> <span class="o">&amp;&amp;</span><span class="n">object</span><span class="p">)</span> <span class="n">noexcept</span> <span class="p">{</span>
  <span class="k">asm</span> <span class="k">volatile</span><span class="p">(</span><span class="s">&quot;&quot;</span> <span class="o">:</span> <span class="s">&quot;+m&quot;</span><span class="p">(</span><span class="n">object</span><span class="p">)</span> <span class="o">:</span> <span class="s">&quot;m&quot;</span><span class="p">(</span><span class="n">object</span><span class="p">));</span>  <span class="c1">// Code motion barrier.</span>
<span class="p">}</span>
</pre></div>
</div>
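<p>One conservative way to phrase such a compile-time test is sketched below. The
constant <tt class="docutils literal"><span class="pre">kAssumedCacheLineSize</span></tt>, the predicate name and the two demonstration types are
assumptions made for illustration. Because <tt class="docutils literal"><span class="pre">T&amp;&amp;</span></tt> is a forwarding reference, <tt class="docutils literal"><span class="pre">T</span></tt>
deduces to a reference type for lvalue arguments, so the sketch strips the
reference before querying the type. It accepts only types whose size equals their
alignment, since an object larger than its alignment may straddle a line boundary.</p>
<div class="highlight-c++"><div class="highlight"><pre>#include &lt;cstddef&gt;
#include &lt;type_traits&gt;

// Assumption: 64 bytes stands in for the (micro-)architectural line size.
constexpr std::size_t kAssumedCacheLineSize = 64;

// True when every object of type T (reference stripped) is guaranteed to lie
// within one cache line, wherever it is placed. sizeof is always a multiple of
// alignof, so sizeof == alignof means the object fills exactly one aligned
// block, and a power-of-two block no larger than a line never crosses a line.
template &lt;class T&gt;
constexpr bool fits_in_one_cache_line() {
  using U = std::remove_reference_t&lt;T&gt;;
  return std::is_standard_layout&lt;U&gt;::value &amp;&amp;
         sizeof(U) == alignof(U) &amp;&amp;
         sizeof(U) &lt;= kAssumedCacheLineSize;
}

// Hypothetical demonstration types.
struct alignas(16) Packed { char bytes[16]; };  // size 16, alignment 16
struct alignas(4) Loose { char bytes[12]; };    // size 12, alignment 4

static_assert(fits_in_one_cache_line&lt;Packed&amp;&gt;(), "aligned block fits");
static_assert(!fits_in_one_cache_line&lt;Loose&gt;(), "may straddle two lines");
</pre></div>
</div>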
<p>To extend this for multiple objects, an implementation for the same architecture may
emit a run-time check that the total footprint of all the objects fits in the span of
a single cache line.  This check may commonly be eliminated as dead code, for example
when the objects are references from a common base pointer.</p>
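<p>The run-time check described above could be sketched as follows. The names
<tt class="docutils literal"><span class="pre">kAssumedLineBytes</span></tt>, <tt class="docutils literal"><span class="pre">Extent</span></tt>, <tt class="docutils literal"><span class="pre">extent_of</span></tt>, <tt class="docutils literal"><span class="pre">share_one_cache_line</span></tt> and <tt class="docutils literal"><span class="pre">Demo</span></tt> are
hypothetical, and the sketch assumes that converting a pointer to
<tt class="docutils literal"><span class="pre">std::uintptr_t</span></tt> yields a linear address, which is implementation-defined.</p>
<div class="highlight-c++"><div class="highlight"><pre>#include &lt;algorithm&gt;
#include &lt;cstdint&gt;
#include &lt;initializer_list&gt;
#include &lt;memory&gt;

// Assumption: 64 bytes stands in for the (micro-)architectural line size.
constexpr std::uintptr_t kAssumedLineBytes = 64;

// Addresses of the first and last byte occupied by an object.
struct Extent {
  std::uintptr_t first_byte;
  std::uintptr_t last_byte;
};

template &lt;class T&gt;
Extent extent_of(const T&amp; object) noexcept {
  const auto first = reinterpret_cast&lt;std::uintptr_t&gt;(std::addressof(object));
  return {first, first + sizeof(T) - 1};
}

// True when the combined footprint of all the objects lies in one line.
template &lt;class T0, class... T&gt;
bool share_one_cache_line(const T0&amp; first, const T&amp;... rest) noexcept {
  std::uintptr_t lo = extent_of(first).first_byte;
  std::uintptr_t hi = extent_of(first).last_byte;
  for (const Extent e : {extent_of(first), extent_of(rest)...}) {
    lo = std::min(lo, e.first_byte);
    hi = std::max(hi, e.last_byte);
  }
  return lo / kAssumedLineBytes == hi / kAssumedLineBytes;
}

// Hypothetical demonstration type: a and b share the first line, pad does not.
struct alignas(64) Demo {
  int a;
  int b;
  char pad[120];
};
</pre></div>
</div>
<p>When the check succeeds, an implementation on a cache-line coherent architecture
could take the fence-free path; otherwise it would fall back to the same fence
<tt class="docutils literal"><span class="pre">std::atomic_thread_fence</span></tt> emits.</p>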
<p>The inner code of the above <tt class="docutils literal"><span class="pre">std::barrier</span></tt> example can use the new overload as follows:</p>
<div class="highlight-c++"><div class="highlight"><pre><span class="k">if</span> <span class="p">(</span><span class="n">result</span> <span class="o">==</span> <span class="n">expected</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">expected</span> <span class="o">=</span> <span class="n">nexpected</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">memory_order_relaxed</span><span class="p">);</span>
    <span class="n">arrived</span><span class="p">.</span><span class="n">store</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">memory_order_relaxed</span><span class="p">);</span>
    <span class="n">atomic_object_fence</span><span class="p">(</span><span class="n">memory_order_release</span><span class="p">,</span> <span class="o">*</span><span class="k">this</span><span class="p">);</span>
    <span class="n">epoch</span><span class="p">.</span><span class="n">store</span><span class="p">(</span><span class="n">myepoch</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="n">memory_order_relaxed</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
</div>
<p>It is equally valid to list the individual members of <tt class="docutils literal"><span class="pre">barrier</span></tt> instead of
<tt class="docutils literal"><span class="pre">*this</span></tt>; both forms are equivalent.</p>
<p>Less trivial implementations of <tt class="docutils literal"><span class="pre">std::atomic_object_fence</span></tt> can enable more
optimizations for new hardware and portable program representations.</p>
</div>
<div class="section" id="relation-to-p0154r0">
<h2>Relation to P0154R0<a class="headerlink" href="#relation-to-p0154r0" title="Permalink to this headline">¶</a></h2>
<p>In <a class="reference external" href="http://wg21.link/P0154R0">P0154R0</a> we propose to formalize the notions of false-sharing and
true-sharing as perceived by the implementation in relation to the placement of
objects in memory. In the expository implementation of the previous section we
also showed how a cache-line coherent architecture or micro-architecture can
elide fences that only bisect relations between objects that are in the same
cache line, if provable at compile-time. These notions interact in a virtuous
way because P0154R0&#8217;s abstraction enables reasoning about likely cache behavior
that implementations can optimize for.</p>
<p>The example application of <tt class="docutils literal"><span class="pre">std::atomic_object_fence</span></tt> to the <tt class="docutils literal"><span class="pre">std::barrier</span></tt>
object is improved by combining these notions as follows:</p>
<div class="highlight-c++"><div class="highlight"><pre><span class="k">struct</span> <span class="n">alignas</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="kr">thread</span><span class="o">::</span><span class="n">hardware_true_sharing_size</span><span class="p">)</span> <span class="c1">// P0154</span>
<span class="n">barrier</span> <span class="p">{</span>
    <span class="c1">// Some member functions elided.</span>
    <span class="kt">void</span> <span class="n">arrive_and_wait</span><span class="p">()</span> <span class="p">{</span>
        <span class="kt">int</span> <span class="k">const</span> <span class="n">myepoch</span> <span class="o">=</span> <span class="n">epoch</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">memory_order_relaxed</span><span class="p">);</span>
        <span class="kt">int</span> <span class="k">const</span> <span class="n">result</span> <span class="o">=</span> <span class="n">arrived</span><span class="p">.</span><span class="n">fetch_add</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">memory_order_acq_rel</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">result</span> <span class="o">==</span> <span class="n">expected</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">expected</span> <span class="o">=</span> <span class="n">nexpected</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">memory_order_relaxed</span><span class="p">);</span>
            <span class="n">arrived</span><span class="p">.</span><span class="n">store</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">memory_order_relaxed</span><span class="p">);</span>
            <span class="n">atomic_object_fence</span><span class="p">(</span><span class="n">memory_order_release</span><span class="p">,</span> <span class="o">*</span><span class="k">this</span><span class="p">);</span> <span class="c1">// P0153</span>
            <span class="n">epoch</span><span class="p">.</span><span class="n">store</span><span class="p">(</span><span class="n">myepoch</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="n">memory_order_relaxed</span><span class="p">);</span>
        <span class="p">}</span>
        <span class="k">else</span>
            <span class="k">while</span> <span class="p">(</span><span class="n">epoch</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">memory_order_acquire</span><span class="p">)</span> <span class="o">==</span> <span class="n">myepoch</span><span class="p">)</span>
                <span class="p">;</span>
    <span class="p">}</span>
<span class="k">private</span><span class="o">:</span>
    <span class="kt">int</span> <span class="n">expected</span><span class="p">;</span>
    <span class="n">atomic</span><span class="o">&lt;</span><span class="kt">int</span><span class="o">&gt;</span> <span class="n">arrived</span><span class="p">,</span> <span class="n">nexpected</span><span class="p">,</span> <span class="n">epoch</span><span class="p">;</span>
<span class="p">};</span>
</pre></div>
</div>
<p>By aligning the barrier object to the true-sharing granularity, it is
significantly more likely that the implementation will be able to elide the
fence if the architecture or micro-architecture has cache-line coherency. Of
course, an implementation of the Standard is free to ensure this by other means;
we provide this example as exposition for what developer programs might do.</p>
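<p>For readers who want to try the alignment part today, the sketch below shows one
way to do so. The notion P0154 calls true sharing was later standardized in C++17
as <tt class="docutils literal"><span class="pre">std::hardware_constructive_interference_size</span></tt> in <tt class="docutils literal"><span class="pre">&lt;new&gt;</span></tt>. The names
<tt class="docutils literal"><span class="pre">kTrueSharingSize</span></tt> and <tt class="docutils literal"><span class="pre">aligned_barrier_state</span></tt> are hypothetical, and the fallback value
of 64 is an assumption for libraries that do not provide the constant.</p>
<div class="highlight-c++"><div class="highlight"><pre>#include &lt;atomic&gt;
#include &lt;cstddef&gt;
#include &lt;new&gt;

#if defined(__cpp_lib_hardware_interference_size)
constexpr std::size_t kTrueSharingSize =
    std::hardware_constructive_interference_size;
#else
constexpr std::size_t kTrueSharingSize = 64;  // Assumed fallback value.
#endif

// Data members of the barrier example, aligned to the true-sharing granularity.
// Note that alignas goes between the class-key and the class name.
struct alignas(kTrueSharingSize) aligned_barrier_state {
  int expected;
  std::atomic&lt;int&gt; arrived, nexpected, epoch;
};

static_assert(alignof(aligned_barrier_state) == kTrueSharingSize,
              "state starts on a true-sharing boundary");

// Whether the whole state occupies a single true-sharing unit.
constexpr bool kStateFitsOneUnit =
    sizeof(aligned_barrier_state) == kTrueSharingSize;
</pre></div>
</div>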
</div>
<div class="section" id="memory-model-example">
<h2>Memory model example<a class="headerlink" href="#memory-model-example" title="Permalink to this headline">¶</a></h2>
<table border="1" class="docutils">
<colgroup>
<col width="50%" />
<col width="50%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">T0</th>
<th class="head">T1</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td><tt class="docutils literal"><span class="pre">0:</span> <span class="pre">w</span> <span class="pre">=</span> <span class="pre">1;</span></tt></td>
<td><tt class="docutils literal"><span class="pre">4:</span> <span class="pre">while(!a.load(rlx));</span></tt></td>
</tr>
<tr class="row-odd"><td><tt class="docutils literal"><span class="pre">1:</span> <span class="pre">x</span> <span class="pre">=</span> <span class="pre">1;</span></tt></td>
<td><tt class="docutils literal"><span class="pre">5:</span> <span class="pre">objfence(acq,</span> <span class="pre">a,</span> <span class="pre">x);</span></tt></td>
</tr>
<tr class="row-even"><td><tt class="docutils literal"><span class="pre">2:</span> <span class="pre">objfence(rel,</span> <span class="pre">a,</span> <span class="pre">x);</span></tt></td>
<td><tt class="docutils literal"><span class="pre">6:</span> <span class="pre">assert(x);</span></tt></td>
</tr>
<tr class="row-odd"><td><tt class="docutils literal"><span class="pre">3:</span> <span class="pre">a.store(1,rlx);</span></tt></td>
<td><tt class="docutils literal"><span class="pre">7:</span> <span class="pre">assert(w);</span></tt></td>
</tr>
</tbody>
</table>
<p>The semantics of fences mean that:</p>
<dl class="docutils">
<dt><tt class="docutils literal"><span class="pre">2</span></tt> synchronizes-with <tt class="docutils literal"><span class="pre">5</span></tt> because [<strong>29.8¶2</strong>]:</dt>
<dd><ol class="first last upperalpha simple">
<li><tt class="docutils literal"><span class="pre">2</span></tt> is sequenced-before <tt class="docutils literal"><span class="pre">3</span></tt>,</li>
<li><tt class="docutils literal"><span class="pre">3</span></tt> inter-thread happens-before <tt class="docutils literal"><span class="pre">4</span></tt>, and</li>
<li><tt class="docutils literal"><span class="pre">4</span></tt> is sequenced-before <tt class="docutils literal"><span class="pre">5</span></tt>.</li>
</ol>
</dd>
<dt><tt class="docutils literal"><span class="pre">1</span></tt> happens-before <tt class="docutils literal"><span class="pre">6</span></tt> because [<strong>1.10¶13-14</strong>]:</dt>
<dd><ol class="first last upperalpha simple">
<li><tt class="docutils literal"><span class="pre">1</span></tt> is sequenced-before <tt class="docutils literal"><span class="pre">2</span></tt>,</li>
<li><tt class="docutils literal"><span class="pre">2</span></tt> synchronizes-with <tt class="docutils literal"><span class="pre">5</span></tt>, and</li>
<li><tt class="docutils literal"><span class="pre">5</span></tt> is sequenced-before <tt class="docutils literal"><span class="pre">6</span></tt>.</li>
</ol>
</dd>
</dl>
<p>Therefore the program is well-defined (so far) and the <tt class="docutils literal"><span class="pre">assert(x)</span></tt> of <tt class="docutils literal"><span class="pre">6</span></tt>
does not fire.</p>
<p>However, the <em>un-sequenced</em> semantics of the object fence also mean that:</p>
<dl class="docutils">
<dt><tt class="docutils literal"><span class="pre">0</span></tt>  conflicts with <tt class="docutils literal"><span class="pre">7</span></tt> because [<strong>1.10¶23</strong>]:</dt>
<dd><ol class="first last upperalpha simple">
<li><tt class="docutils literal"><span class="pre">0</span></tt> is a store to <tt class="docutils literal"><span class="pre">w</span></tt>, <tt class="docutils literal"><span class="pre">7</span></tt> is a load of <tt class="docutils literal"><span class="pre">w</span></tt> and they are not both
atomic, and</li>
<li><tt class="docutils literal"><span class="pre">0</span></tt> is not sequenced-before <tt class="docutils literal"><span class="pre">2</span></tt> and <tt class="docutils literal"><span class="pre">5</span></tt> is not sequenced-before
<tt class="docutils literal"><span class="pre">7</span></tt>.</li>
</ol>
</dd>
</dl>
<p>Therefore the <tt class="docutils literal"><span class="pre">assert(w)</span></tt> of <tt class="docutils literal"><span class="pre">7</span></tt> makes the program undefined due to a
data-race.</p>
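<p>The table above can be transcribed into a small program, sketched here under the
assumption that <tt class="docutils literal"><span class="pre">objfence</span></tt> is the trivial implementation that forwards to
<tt class="docutils literal"><span class="pre">std::atomic_thread_fence</span></tt>. With that stand-in both reads are ordered, so the sketch
is well-defined; under the proposed object-fence semantics only the read of <tt class="docutils literal"><span class="pre">x</span></tt>
would be, and the read of <tt class="docutils literal"><span class="pre">w</span></tt> would race. The names <tt class="docutils literal"><span class="pre">Observed</span></tt> and <tt class="docutils literal"><span class="pre">run_litmus</span></tt>
are hypothetical.</p>
<div class="highlight-c++"><div class="highlight"><pre>#include &lt;atomic&gt;
#include &lt;thread&gt;

// Stand-in for the proposed fence: ignores the objects, fences everything.
template &lt;class... T&gt;
void objfence(std::memory_order order, T&amp;&amp;...) noexcept {
  std::atomic_thread_fence(order);
}

struct Observed {
  int x;
  int w;
};

Observed run_litmus() {
  int w = 0;
  int x = 0;
  std::atomic&lt;int&gt; a{0};
  Observed seen{0, 0};

  std::thread t0([&amp;] {
    w = 1;                                         // 0
    x = 1;                                         // 1
    objfence(std::memory_order_release, a, x);     // 2
    a.store(1, std::memory_order_relaxed);         // 3
  });
  std::thread t1([&amp;] {
    while (!a.load(std::memory_order_relaxed)) {}  // 4
    objfence(std::memory_order_acquire, a, x);     // 5
    seen.x = x;                                    // 6
    // 7: ordered only because the stand-in is a full thread fence. With the
    // proposed semantics this read would be a data race, since w is not
    // named in either fence.
    seen.w = w;
  });

  t0.join();
  t1.join();
  return seen;
}
</pre></div>
</div>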
</div>
</div>


    </div>
  </body>
</html>