﻿<html>
<head>
    <title>Proposed Text for Bidirectional Fences</title>
    <meta content="http://schemas.microsoft.com/intellisense/ie5" name="vs_targetSchema" />
    <meta http-equiv="Content-Language" content="en-us" />
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body bgcolor="#ffffff">
    <address>
        Document number: N2731=08-0241</address>
    <address>
        Programming Language C++</address>
    <address>
        &nbsp;</address>
    <address>
        Peter Dimov, &lt;<a href="mailto:pdimov@pdimov.com">pdimov@pdimov.com</a>&gt;</address>
    <address>
        &nbsp;</address>
    <address>
        2008-08-22</address>
    <h1>
        Proposed Text for Bidirectional Fences</h1>
    <h2>
        <a name="changes">Changes from N2633</a></h2>
    <ul>
        <li>Renamed <code>atomic_memory_fence</code> to <code>atomic_thread_fence</code>.</li>
        <li>Renamed <code>atomic_compiler_fence</code> to <code>atomic_signal_fence</code>.</li>
        <li>Defined <code>atomic_*_fence(memory_order_consume)</code> as an acquire fence.</li>
        <li>Moved the portions of the text that modify <em>synchronizes-with</em> into 1.10
            [intro.multithreaded].</li>
        <li>Changed the release sequence definition to include relaxed read-modify-write operations.
            The PowerPC architecture, which was the original reason for the restriction, is
            now believed to propagate B-cumulativity across <code>lwarx</code>/<code>stwcx.</code>
            loops. Relaxed RMW operations are no longer believed to break a release sequence
            on PowerPC.</li>
        <li>Changed the specification of <code>memory_order_seq_cst</code> fences to interact
            properly with other <code>memory_order_seq_cst</code> operations.</li>
        <li>Moved the portions of the text that describe <code>memory_order_seq_cst</code> fences
            into 29.1 [atomics.order].</li>
    </ul>
    <h2>
        <a name="proposed">Proposed Text</a></h2>
    <p>
        All edits are relative to N2691.</p>
    <h3>
        Chapter 1 edits</h3>
    <p>
        Change 1.10 [intro.multithreaded] p4 as follows:</p>
    <blockquote>
        The library defines a number of atomic operations (clause 29) and operations on
        locks (clause 30) that are specially identified as synchronization operations. These
        operations play a special role in making assignments in one thread visible to another.
        A synchronization operation <ins>on one or more memory locations</ins> is either
        a consume operation, an acquire operation, a release operation, or both an acquire
        and release operation<del>, on one or more memory locations; the semantics of these
            are described below</del>. <ins>A synchronization operation without an associated memory
                location is a fence and can be either an acquire fence, a release fence or both
                an acquire and release fence.</ins> In addition, there are relaxed atomic
        operations, which are not synchronization operations, and atomic read-modify-write
        operations, which have special characteristics<del>, also described below</del>.
        [ <em>Note:</em> For example, a call that acquires a lock will perform an acquire
        operation on the locations comprising the lock. Correspondingly, a call that releases
        the same lock will perform a release operation on those same locations. Informally,
        performing a release operation on A forces prior side effects on other memory locations
        to become visible to other threads that later perform a consume or an acquire operation
        on A. We do not include “relaxed” atomic operations as synchronization operations
        although, like synchronization operations, they cannot contribute to data races.
        <em>&mdash;end note</em> ]</blockquote>
    <p>
        Change 1.10 [intro.multithreaded] p6 as follows:</p>
    <blockquote>
        A release sequence on an atomic object <em>M</em> is a maximal contiguous sub-sequence
        of side effects in the modification order of <em>M</em>, where the first operation
        is a release, and every subsequent operation
        <ul>
            <li>is performed by the same thread that performed the release, or</li>
            <li>is <del>a non-relaxed</del> <ins>an</ins> atomic read-modify-write operation.</li>
        </ul>
    </blockquote>
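    <p>
        The effect of the revised release sequence definition can be sketched as follows.
        This is an illustration only, not proposed wording. It assumes the later-standardized
        <code>&lt;atomic&gt;</code> and <code>&lt;thread&gt;</code> headers, and the names
        <code>run_release_sequence</code>, <code>payload</code> and <code>m</code> are hypothetical.
        The acquire load reads the value written by a relaxed read-modify-write operation
        of another thread, which under the revised definition still belongs to the release
        sequence headed by the release store.</p>

```cpp
#include <atomic>
#include <thread>

// Illustrative sketch; all names are hypothetical and not part of the proposal.
int run_release_sequence() {
    int payload = 0;           // ordinary, non-atomic data
    std::atomic<int> m{0};     // plays the role of the atomic object M
    int seen = -1;

    std::thread releaser([&] {
        payload = 42;
        m.store(1, std::memory_order_release);      // heads the release sequence
    });
    std::thread bumper([&] {
        // Wait until the release store is visible, then modify M with a relaxed RMW.
        while (m.load(std::memory_order_relaxed) == 0) std::this_thread::yield();
        m.fetch_add(1, std::memory_order_relaxed);  // part of the release sequence
    });
    std::thread acquirer([&] {
        // Reads the value 2 written by the relaxed RMW, so the releaser's
        // store synchronizes with this acquire load.
        while (m.load(std::memory_order_acquire) != 2) std::this_thread::yield();
        seen = payload;
    });

    releaser.join();
    bumper.join();
    acquirer.join();
    return seen;
}
```

    <p>
        Under the previous wording, the relaxed <code>fetch_add</code> would have ended
        the release sequence and the read of <code>payload</code> would have been a data race.</p>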
    <p>
        Change 1.10 [intro.multithreaded] p7 as follows:</p>
    <blockquote>
        <ins>Certain library calls <em>synchronize with</em> other library calls performed by
            another thread. In particular, an</ins> <del>An</del> evaluation <em>A</em>
        that performs a release operation on an object <em>M</em> <em>synchronizes with</em>
        an evaluation <em>B</em> that performs an acquire operation on <em>M</em> and reads
        a value written by any side effect in the release sequence headed by <em>A</em>.
        [ <em>Note:</em> Except in the specified cases, reading a later value does not necessarily
        ensure visibility as described below. Such a requirement would sometimes interfere
        with efficient implementation. <em>&mdash;end note</em> ] [ <em>Note:</em> The specifications
        of the synchronization operations define when one reads the value written by another.
        For atomic variables, the definition is clear. All operations on a given lock occur
        in a single total order. Each lock acquisition “reads the value written” by the
        last lock release. <em>&mdash;end note</em> ]</blockquote>
    <blockquote>
        <ins>A release fence <em>R</em> <em>synchronizes with</em> an acquire fence <em>A</em>
            if there exist evaluations <em>X</em> and <em>Y</em> such that <em>X</em> is sequenced
            after <em>R</em> and performs an atomic modification operation on an object <em>M</em>,
            <em>Y</em> is sequenced before <em>A</em> and performs an atomic operation on <em>M</em>,
            and <em>Y</em> reads the value written by <em>X</em> or a value written by any side
            effect in the release sequence <em>X</em> would head had it been a release operation.</ins></blockquote>
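    <p>
        A minimal sketch of the fence-to-fence rule, assuming the later-standardized
        <code>&lt;atomic&gt;</code> and <code>&lt;thread&gt;</code> headers; the names
        <code>run_fence_to_fence</code>, <code>data</code> and <code>ready</code> are
        hypothetical. The comments map each statement to the evaluations <em>R</em>,
        <em>X</em>, <em>Y</em> and <em>A</em> of the paragraph above.</p>

```cpp
#include <atomic>
#include <thread>

// Illustrative sketch; all names are hypothetical and not part of the proposal.
int run_fence_to_fence() {
    int data = 0;                     // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // plays the role of the atomic object M
    int seen = -1;

    std::thread writer([&] {
        data = 42;
        std::atomic_thread_fence(std::memory_order_release);  // R: release fence
        ready.store(true, std::memory_order_relaxed);         // X: sequenced after R
    });
    std::thread reader([&] {
        // Y: relaxed load sequenced before A; the loop exits once Y reads X's value.
        while (!ready.load(std::memory_order_relaxed)) std::this_thread::yield();
        std::atomic_thread_fence(std::memory_order_acquire);  // A: acquire fence
        seen = data;                  // ordered after the write of 42
    });

    writer.join();
    reader.join();
    return seen;
}
```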
    <blockquote>
        <ins>A release fence <em>R</em> <em>synchronizes with</em> an evaluation <em>A</em>
            that performs an acquire operation on an object <em>M</em> if there exists an evaluation
            <em>X</em> such that <em>X</em> is sequenced after <em>R</em>, <em>X</em> performs
            an atomic modification operation on <em>M</em> and <em>A</em> reads the value written
            by <em>X</em> or a value written by any side effect in the release sequence <em>X</em>
            would head had it been a release operation.</ins></blockquote>
    <blockquote>
        <ins>An evaluation <em>R</em> that is a release operation on an object <em>M</em> <em>
            synchronizes with</em> an acquire fence <em>A</em> if an evaluation <em>Y</em>,
            sequenced before <em>A</em>, performs an atomic operation on <em>M</em> and reads
            the value written by <em>R</em>, or a value written by any side effect in the release
            sequence headed by <em>R</em>.</ins></blockquote>
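    <p>
        The two mixed pairings above (a release fence with an acquire operation, and a
        release operation with an acquire fence) might be exercised as in the following
        sketch. It assumes the later-standardized <code>&lt;atomic&gt;</code> and
        <code>&lt;thread&gt;</code> headers, and both function names are hypothetical.</p>

```cpp
#include <atomic>
#include <thread>

// Illustrative sketch; all names are hypothetical and not part of the proposal.

// Release fence R in the writer, acquire operation A in the reader.
int run_release_fence_acquire_load() {
    int data = 0;
    std::atomic<bool> ready{false};
    int seen = -1;
    std::thread writer([&] {
        data = 1;
        std::atomic_thread_fence(std::memory_order_release);  // R
        ready.store(true, std::memory_order_relaxed);         // X, sequenced after R
    });
    std::thread reader([&] {
        // A: acquire load that eventually reads the value written by X.
        while (!ready.load(std::memory_order_acquire)) std::this_thread::yield();
        seen = data;
    });
    writer.join();
    reader.join();
    return seen;
}

// Release operation R in the writer, acquire fence A in the reader.
int run_release_store_acquire_fence() {
    int data = 0;
    std::atomic<bool> ready{false};
    int seen = -1;
    std::thread writer([&] {
        data = 2;
        ready.store(true, std::memory_order_release);         // R
    });
    std::thread reader([&] {
        // Y: relaxed load sequenced before A that eventually reads R's value.
        while (!ready.load(std::memory_order_relaxed)) std::this_thread::yield();
        std::atomic_thread_fence(std::memory_order_acquire);  // A
        seen = data;
    });
    writer.join();
    reader.join();
    return seen;
}
```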
    <h3>
        Chapter 29 edits</h3>
    <p>
        Remove the</p>
    <blockquote>
        <code>void fence(memory_order) const volatile;</code></blockquote>
    <p>
        members from all types in <em>[atomics]</em>.</p>
    <p>
        Remove the</p>
    <blockquote>
        <code>void atomic_flag_fence(const volatile atomic_flag *object, memory_order order);</code></blockquote>
    <p>
        function.</p>
    <p>
        Remove the</p>
    <blockquote>
        <code>void atomic_fence(const volatile atomic_<em>type</em>*, memory_order);</code></blockquote>
    <p>
        functions.</p>
    <p>
        Remove the definition</p>
    <blockquote>
        <code>extern const atomic_flag atomic_global_fence_compatibility;</code></blockquote>
    <p>
        Change 29.1 [atomics.order] p2 as follows:</p>
    <blockquote>
        The <code>memory_order_seq_cst</code> operations that load a value are acquire operations
        on the affected locations. The <code>memory_order_seq_cst</code> operations that
        store a value are release operations on the affected locations. In addition, in
        a consistent execution, there must be a single total order <em>S</em> on all <code>memory_order_seq_cst</code>
        operations <ins>and fences</ins>, consistent with the happens before order and modification
        orders for all affected locations, such that each <code>memory_order_seq_cst</code>
        operation observes either the last preceding modification according to this order
        <em>S</em>, or the result of an operation that is not <code>memory_order_seq_cst</code>.
        [ <em>Note:</em> Although it is not explicitly required that <em>S</em> include
        locks, it can always be extended to an order that does include lock and unlock operations,
        since the ordering between those is already included in the happens before ordering.
        <em>&mdash;end note</em> ]</blockquote>
    <blockquote>
        <ins>If a <code>memory_order_seq_cst</code> fence <em>F</em> is sequenced before an
            atomic operation <em>A</em> on an object <em>M</em>, <em>A</em> observes either
            the last <code>memory_order_seq_cst</code> modification of <em>M</em> preceding
            <em>F</em> in the total order <em>S</em>, or a later modification of <em>M</em>
            in its modification order.</ins></blockquote>
    <blockquote>
        <ins>If an atomic modification operation <em>A</em> of an object <em>M</em> is sequenced
            before a <code>memory_order_seq_cst</code> fence <em>F</em>, and a <code>memory_order_seq_cst</code>
            operation <em>B</em> on <em>M</em> follows <em>F</em> in <em>S</em>, <em>B</em>
            observes either the effects of <em>A</em> on <em>M</em>, or a later modification
            of <em>M</em> in its modification order.</ins></blockquote>
    <blockquote>
        <ins>If an atomic modification operation <em>A</em> of an object <em>M</em> is sequenced
            before a <code>memory_order_seq_cst</code> fence <em>Fa</em>, a <code>memory_order_seq_cst</code>
            fence <em>Fb</em> is sequenced before an atomic operation <em>B</em> on <em>M</em>,
            and <em>Fb</em> follows <em>Fa</em> in <em>S</em>, <em>B</em> observes either the
            effects of <em>A</em> on <em>M</em>, or a later modification of <em>M</em> in its
            modification order.</ins></blockquote>
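    <p>
        The last of these paragraphs is what makes the classic store-buffering pattern work
        with relaxed operations separated by <code>memory_order_seq_cst</code> fences. The
        sketch below assumes the later-standardized <code>&lt;atomic&gt;</code> and
        <code>&lt;thread&gt;</code> headers, and the name <code>run_store_buffering</code>
        is hypothetical. Whichever fence comes later in <em>S</em>, the load sequenced
        after it must observe the other thread's store, so the two loads cannot both return zero.</p>

```cpp
#include <atomic>
#include <thread>

// Illustrative sketch; all names are hypothetical and not part of the proposal.
// Returns false if any round observes r1 == 0 && r2 == 0, which the
// fence rules above are intended to forbid.
bool run_store_buffering(int rounds) {
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);                // modification of x
            std::atomic_thread_fence(std::memory_order_seq_cst);  // first fence
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);                // modification of y
            std::atomic_thread_fence(std::memory_order_seq_cst);  // second fence
            r2 = x.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();

        if (r1 == 0 && r2 == 0) return false;
    }
    return true;
}
```

    <p>
        The roles of <em>Fa</em> and <em>Fb</em> are symmetric here: the rule applies to
        <code>x</code> if the fence in <code>t1</code> precedes the fence in <code>t2</code>
        in <em>S</em>, and to <code>y</code> otherwise.</p>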
    <p>
        Add</p>
    <blockquote>
        <code>// 29.6, fences</code><br />
        <code>void atomic_thread_fence(memory_order);</code><br />
        <code>void atomic_signal_fence(memory_order);</code>
    </blockquote>
    <p>
        to the synopsis of <code>&lt;cstdatomic&gt;</code>.</p>
    <p>
        Add a new section, <em>[atomic.fences]</em>, with the following contents:</p>
    <blockquote>
        <h3>
            29.6 Fences</h3>
    </blockquote>
    <blockquote>
        This section introduces synchronization primitives called <em>fences</em>. Their
        synchronization properties are described in [intro.multithreaded] and [atomics.order].</blockquote>
    <blockquote>
        <code>void atomic_thread_fence(memory_order mo);</code></blockquote>
    <blockquote>
        <em>Effects:</em> Depending on the value of <code>mo</code>, this operation:
        <ul>
            <li>has no effects, if <code>mo == memory_order_relaxed</code>;</li>
            <li>is an acquire fence, if <code>mo == memory_order_acquire || mo == memory_order_consume</code>;</li>
            <li>is a release fence, if <code>mo == memory_order_release</code>;</li>
            <li>is both an acquire fence and a release fence, if <code>mo == memory_order_acq_rel</code>;</li>
            <li>is a sequentially consistent acquire and release fence, if <code>mo == memory_order_seq_cst</code>.</li>
        </ul>
    </blockquote>
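    <p>
        For readers who prefer code to a list, the mapping in the <em>Effects</em> clause
        above can be restated as a small helper. This is only a restatement for illustration:
        <code>fence_kind</code> and <code>classify_fence</code> are hypothetical names, not
        part of the proposed library, and the sketch assumes the later-standardized
        <code>&lt;atomic&gt;</code> header.</p>

```cpp
#include <atomic>

// Illustrative sketch; fence_kind and classify_fence are hypothetical names.
enum class fence_kind { none, acquire, release, acquire_release, seq_cst };

// Restates the Effects list: which kind of fence a given memory_order requests.
constexpr fence_kind classify_fence(std::memory_order mo) {
    return mo == std::memory_order_relaxed ? fence_kind::none
         : (mo == std::memory_order_acquire ||
            mo == std::memory_order_consume) ? fence_kind::acquire
         : mo == std::memory_order_release   ? fence_kind::release
         : mo == std::memory_order_acq_rel   ? fence_kind::acquire_release
         : fence_kind::seq_cst;  // sequentially consistent acquire and release fence
}
```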
    <blockquote>
        <code>void atomic_signal_fence(memory_order mo);</code></blockquote>
    <blockquote>
        <em>Effects:</em> Equivalent to <code>atomic_thread_fence(mo)</code>, except that
        <em>synchronizes with</em> relationships are established only between a thread and a signal
        handler executed in the same thread.</blockquote>
    <blockquote>
        [<em>Note:</em> <code>atomic_signal_fence</code> can be used to specify the order
        in which actions performed by the thread become visible to the signal handler. <em>&mdash;
            end note</em>]</blockquote>
    <blockquote>
        [<em>Note:</em> Compiler optimizations or reorderings of loads and stores are inhibited
        in the same way as with <code>atomic_thread_fence</code>, but the hardware fence
        instructions that <code>atomic_thread_fence</code> would have inserted are not emitted.
        <em>&mdash; end note</em>]</blockquote>
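    <p>
        A minimal sketch of the intended use of <code>atomic_signal_fence</code>, assuming
        the later-standardized <code>&lt;atomic&gt;</code> header, a lock-free
        <code>std::atomic&lt;int&gt;</code>, and a handler installed for <code>SIGINT</code>
        and triggered synchronously with <code>std::raise</code>. The names
        <code>on_signal</code>, <code>run_signal_fence</code>, <code>message</code>,
        <code>ready</code> and <code>observed</code> are hypothetical.</p>

```cpp
#include <atomic>
#include <csignal>

// Illustrative sketch; all names are hypothetical and not part of the proposal.
static int message = 0;                // plain data written by the thread
static std::atomic<int> ready{0};      // assumed to be lock-free on the target
static std::atomic<int> observed{-1};  // what the handler saw

extern "C" void on_signal(int) {
    if (ready.load(std::memory_order_relaxed) == 1) {
        // Acquire fence: pairs with the release fence in the interrupted thread.
        std::atomic_signal_fence(std::memory_order_acquire);
        observed.store(message, std::memory_order_relaxed);
    }
}

int run_signal_fence() {
    auto previous = std::signal(SIGINT, on_signal);

    message = 42;
    // Release fence: keeps the write to message ordered before the flag store
    // as seen by a handler running in this same thread.
    std::atomic_signal_fence(std::memory_order_release);
    ready.store(1, std::memory_order_relaxed);

    std::raise(SIGINT);             // the handler runs before raise returns
    std::signal(SIGINT, previous);  // restore the earlier disposition
    return observed.load(std::memory_order_relaxed);
}
```

    <p>
        Because the handler runs in the thread that raised the signal, only compiler
        reordering has to be prevented, which is why no hardware fence is needed here.</p>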
    <hr />
    <p>
        <em>Thanks to Hans Boehm, Lawrence Crowl, Paul McKenney, Clark Nelson and Raul Silvera
            for reviewing this paper.</em></p>
    <p>
        <em>--end</em></p>
</body>
</html>
