<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=US-ASCII">
<title>C++ Atomic Types and Operations</title>
</head>
<body>
<h1>C++ Atomic Types and Operations</h1>

<p>
ISO/IEC JTC1 SC22 WG21 N2427 = 07-0297 - 2007-10-03

<p>
Hans-J. Boehm, Hans.Boehm@hp.com, boehm@acm.org
<br>Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org

<p>
This document is a revision of N2393 = 07-0253 - 2007-09-10.

<p>
<br><a href="#Introduction">Introduction</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#IntroRationale">Rationale</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#IntroGoals">Goals</a>
<br><a href="#Discussion">Discussion of Design</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#DiscussLibrary">A Library API</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#DiscussVolatile">Atomic Types, Operations, and Volatile</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#DiscussInterop">C and C++ Interoperability</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#DiscussOrder">Memory Ordering</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#DiscussConsistency">Consistency</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#DiscussFences">Fences</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#DiscussDependent">Dependent Memory Order</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#DiscussLockFree">Lock-Free Property</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#DiscussFlag">Flag Type and Operations</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#DiscussIntegral">Integral and Address Types</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#DiscussOperations">Atomic Operations</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#DiscussGeneric">Generic Types</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#DiscussImplement">Implementability</a>
<br><a href="#Prior">Prior Approaches</a>
<br><a href="#Changes">Changes to the Standard</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#library">Chapter 17 Library introduction [library]</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#utilities">Chapter 20 General utilities library [utilities]</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#utility.concepts">20.1 Concepts [utility.concepts]</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#concept.comparison">20.1.2 Comparisons [concept.comparison]</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#concept.copymove">20.1.5 Copy and move [concept.copymove]</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#atomics">Chapter 29 Atomics library [atomics]</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#atomics.order">29.1 Order and Consistency [atomics.order]</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#atomics.lockfree">29.2 Lock-Free Property [atomics.lockfree]</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#atomics.flag">29.3 Flag Type and Operations [atomics.flag]</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#atomics.types">29.4 Integral and Address Types [atomics.types]</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#atomics.operations">29.5 Operations [atomics.operations]</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#atomics.generic">29.6 Generic Types [atomics.generic]</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#atomics.synopsis">29.7 Synopsis [atomics.synopsis]</a>
<br><a href="#Implementation">Implementation</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ImplPresent">Notes on the Presentation</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ImplFiles">Implementation Files</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ImplCPP0X">C++0x Features</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ImplOrder">Memory Order</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ImplFlag">Flag Type and Operations</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ImplMacros">Implementation Macros</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ImplLockFree">Lock-Free Macro</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ImplIntegral">Integral and Address Types</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ImplBoolean">Boolean</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ImplAddress">Address</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ImplIntegers">Integers</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ImplTypedefs">Integer Typedefs</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ImplCharacters">Characters</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ImplTemplate">Template Types</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ImplGeneric">Fully Generic Type</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ImplPointer">Pointer Partial Specialization</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ImplSpecial">Integral Full Specializations</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ImplFunctions">C++ Core Functions</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ImplCoreMacros">C Core Macros</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ImplMethods">C++ Methods</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ImplCleanup">Implementation Header Cleanup</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ImplStandard">Standard Headers</a>
<br><a href="#Examples">Examples of Use</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ExampleFlag">Flag</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ExampleLazy">Lazy Initialization</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ExampleInteger">Integer</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ExampleEvent">Event Counter</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ExampleList">List Insert</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ExampleUpdate">Update</a>
<br>&nbsp;&nbsp;&nbsp;&nbsp;<a href="#ExampleMain">Main</a>

<h2><a name="Introduction">Introduction</a></h2>

<p>
We propose standard atomic types and operations.
In addition to normative wording for the interface,
we present a minimal conforming implementation
and example uses of the interface.

<h3><a name="IntroRationale">Rationale</a></h3>

<p>
The traditional shared-memory notion
that every store is instantly visible to all threads
induces an unacceptable performance loss on modern systems.
Therefore programmers must have a mechanism
to indicate when stores in one thread should be communicated to another.

<p>
Specifically, a program that wishes to communicate the fact that
a particular piece of data prepared by one thread
is ready to be examined by another thread,
needs a shared variable <em>flag</em>,
that both:

<ul>
<li> Allows atomic accesses,
in the sense that concurrent reads and writes are allowed
and that the reads result in only one of the assigned values
and never undefined behavior.</li>
<li> Ensures that any ordinary data written before the flag is set
(i.e. the prepared data)
is seen correctly by another thread after it sees a set flag.</li>
</ul>

<p>
Although the second aspect is often glossed over,
it is usually not automatic with modern hardware and compilers,
and is just as important as the first in ensuring correctness.
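<p>
The flag idiom above can be sketched as follows. The code is written in the spelling this interface eventually received in C++11, which differs in places from the names proposed here; the variables are purely illustrative.

```cpp
#include <atomic>

int data;                        // ordinary, non-atomic data
std::atomic<bool> ready{false};  // the shared flag

// The release store ensures that the prepared data is visible to any
// thread whose acquire load subsequently observes ready == true.
void producer() {
    data = 42;                                     // prepare the data
    ready.store(true, std::memory_order_release);  // set the flag
}

bool try_consume(int& out) {
    if (ready.load(std::memory_order_acquire)) {   // saw a set flag
        out = data;  // guaranteed to see the prepared data
        return true;
    }
    return false;
}
```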

<p>
In POSIX, this communication is achieved by calling certain functions,
particularly mutex lock and unlock.
While mutexes are adequate for many applications,
there is a significant need for finer-grained atomic operations:

<dl>

<dt>Lock-free concurrent data structures</dt>
<dd>
Lock-free concurrent data structures
are difficult to design and implement correctly.
As such, there is a significant need for programmers
capable of doing so to write portably,
and for that they need standards.
</dd>

<dt>Inter-thread memory synchronization</dt>
<dd>
Occasionally, lock-based synchronization offers insufficient performance,
and other, more specific idioms suffice.
</dd>

</dl>

<h3><a name="IntroGoals">Goals</a></h3>

<p>
We had several goals for our design.
Each had significant impact on the details.

<ul>
<li>Provide the primitives critical to inter-thread memory synchronization.</li>
<li>Provide the primitives critical to effective lock-free algorithms.</li>
<li>Provide a very high degree of C and C++ interoperability.</li>
<li>Cover the common range of existing hardware primitives.</li>
<li>Expose those hardware primitives efficiently to programmers,
that is, the design should minimize overhead.</li>
<li>Provide a syntax that reduces errors.</li>
<li>Support generic programming.</li>
</ul>


<h2><a name="Discussion">Discussion of Design</a></h2>

<p>
While our proposal is based on existing practice,
that practice is somewhat fragmented,
so some design choices may not be obvious.
In this section, we discuss our choices and their reasoning.

<h3><a name="DiscussLibrary">A Library API</a></h3>

<p>
We propose to add atomic types and operations purely as a library API.
In practice, for C++, this API would have to be implemented largely with
either compiler intrinsics or assembly code.
(As such, this proposal should be implemented by compiler vendors,
not library vendors,
much as the exception facilities are implemented by compiler vendors.)
For C, a compiler implementation is required for the type-generic macros.

<h3><a name="DiscussVolatile">Atomic Types, Operations, and Volatile</a></h3>

<p>
We propose atomic <em>types</em>
rather than atomic <em>operations</em> on general-purpose types.
Doing so enforces a single synchronization protocol
for all operations on an object.
Using two protocols simultaneously will result in synchronization failure.

<p>
We chose to specify atomic types with conventional type names
rather than modify the <samp>volatile</samp> type qualifier for concurrency.
The <samp>volatile</samp> type qualifier
has a long history within C and C++,
and changing its meaning is both risky and unnecessary.
In addition, the existing meaning of volatile,
"may be modified by external agents",
is somewhat orthogonal to
"may be modified concurrently by the program".
See <cite><a href="../2006/n2016.html">N2016</a>
Should <samp>volatile</samp>
Acquire Atomicity and Thread Visibility Semantics?</cite>
for a more complete discussion.
As a consequence, objects of atomic type may also be volatile qualified.
Compilers may optimize some non-volatile atomic operations,
where they would not be able to optimize volatile operations.

<h3><a name="DiscussInterop">C and C++ Interoperability</a></h3>

<p>
The proposal defines the atomic types as C++ standard-layout structs.
In C, these would be simply opaque types.
Headers common to both C and C++
use exactly the same syntax to declare atomic variables.

<p>
Furthermore, the proposal defines the C++ atomic types
such that the static initialization syntax
can be identical to C aggregate initialization syntax.
That is, atomic variables can be initialized in common headers as well.

<p>
The core atomic operations are free functions
that take pointers to atomic types.
Programmers use the same syntax for these operations
in both C and C++.
That is, a header included from both C and C++
can provide inline functions that operate on atomic types.

<p>
The proposal defines the core functions
as overloaded functions in C++
and as type-generic macros in C.
This approach helps programmers avoid changing code
when an atomic type changes size.

<p>
Because free functions are occasionally clumsy,
the proposal additionally provides member operators and member functions
so that C++ programmers may use a more concise syntax.

<h3><a name="DiscussOrder">Memory Ordering</a></h3>

<p>
Synchronization operations in the memory model
(<a href="../2006/n2052.htm">N2052</a>,
<a href="../2006/n2138.html">N2138</a>,
<a href="../2007/n2171.htm">N2171</a>,
<a href="../2007/n2176.html">N2176</a>,
<a href="../2007/n2177.html">N2177</a>,
<a href="../2007/n2300.html">N2300</a>,
<a href="../2007/n2392.html">N2392</a>)
may be either <em>acquire</em> or <em>release</em> operations, or both.
These operations govern the communication of non-atomic stores between threads.
A release operation ensures that
prior memory operations become visible
to a thread performing a subsequent acquire operation
on the same object.

<p>
Rather than have atomic operations implicitly provide
both acquire and release semantics,
we choose to complicate the interface
by adding explicit ordering specifications to various operations.
Many comparable packages do not,
and instead provide only a single version of operations,
like compare-and-swap,
which implicitly include a full memory fence.

<p>
Unfortunately,
the extra ordering constraint introduced by the single version
is almost never completely necessary.
For example,
an atomic increment operation
may be used simply to count the number of times a function is called,
as in a profiler.
This requires atomicity, but no ordering constraints.
And on many architectures (PowerPC, ARM, Itanium, Alpha, though not X86),
the extra ordering constraints are at least moderately expensive.

<p>
The proposal defines an enumeration
that enables detailed specification of the memory order
for every operation.
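<p>
The profiling counter mentioned above can be written with the weakest member of that enumeration; only atomicity, not ordering, is requested. (The sketch uses the C++11 spelling of this interface.)

```cpp
#include <atomic>

std::atomic<unsigned long> call_count{0};

// Count calls for a profiler: the increment must be atomic, but no
// ordering with surrounding memory operations is required.
void count_call() {
    call_count.fetch_add(1, std::memory_order_relaxed);
}
```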

<h3><a name="DiscussConsistency">Consistency</a></h3>

<p>
A strict interpretation of acquire and release
yields a fairly weak consistency model,
which allows threads to have a different notion of the order of writes.
For stronger consistency,
this proposal distinguishes between
an operation with acquire and release semantics
and an operation with sequentially consistent semantics.
See <cite><a href="../2007/n2177.html">N2177</a>,
Sequential Consistency for Atomics</cite>.
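<p>
The distinction matters in the classic "store buffering" test, sketched here in the eventual C++11 spelling. Under the default sequentially consistent ordering, all threads agree on one total order of the four atomic operations, so the outcome in which both loads return 0 is impossible; under mere acquire/release semantics, it is allowed.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Run the store-buffering test once and return (r1, r2).
// With sequentially consistent operations (the default), at least one
// of the two loads must observe the other thread's store.
std::pair<int, int> store_buffering_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    std::thread a([&] { x.store(1); r1 = y.load(); });  // seq_cst defaults
    std::thread b([&] { y.store(1); r2 = x.load(); });
    a.join();
    b.join();
    return std::pair<int, int>(r1, r2);
}
```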

<p>
The member operators provide no mechanism
for specifying memory order enumeration,
and so they have only the strongest synchronization.
This default is not an undue burden
because using weaker synchronization requires considerable thought.
Any extra syntactic burden is dwarfed by the semantic burden.

<p>
Program auditors can search for uses of the memory order enumeration
to identify code that uses synchronization models
that are weaker than sequentially consistent.


<h3><a name="DiscussFences">Fences</a></h3>

<p>
It is also unclear how a convention requiring full global memory fences
would properly interact with an interface
that supported simple atomic loads and stores.
Here a full memory fence would generally multiply the cost by a large factor.
(The current gcc interface
does not seem to support simple atomic loads and stores explicitly,
which makes it unclear how to support e.g. lock-based emulation,
or architectures on which the relevant loads and stores
are not implicitly atomic.)

<p>
There are two possible approaches to specifying ordering constraints:

<ul>
<li> Have the programmer provide explicit memory fences/barriers,
perhaps most usefully
in a way that is analogous to the SPARC membar instructions.</li>
<li> Associate the ordering semantics with operations.
The closest hardware analog for this is probably Itanium,
though we carry this through more consistently.</li>
</ul>

<p>
Both approaches appear to have their merits.
We chose the latter for reasons described in
<a href="../2007/n2176.html">N2176</a>.</p>

<p>
Some architectures provide fences that are limited to loads or stores.
We have, so far, not included them,
since it seems to be hard to find cases in which both:

<ul>
<li> Such limited ordering constraints
are useful and not excessively brittle.</li>
<li> They actually result in a performance benefit
over the more general constraint.</li>
</ul>

<p>
However, we have provided <em>per-variable</em> fence operations,
which are semantically modeled as read-modify-write operations.
Such fences enable programmers to conditionalize memory synchronization,
which can substantially improve performance.
See
<cite><a href="../2007/n2153.pdf">N2153
A simple and efficient memory model for weakly-ordered architectures</a></cite>
for motivating examples
and the <a href="#ExampleLazy">Lazy Initialization</a> example
for a use of the proposed syntax.
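<p>
As a rough illustration of conditionalized synchronization, a lazily initialized object can make its common path a single acquire load. The sketch below uses the eventual C++11 spelling with a mutex for the slow path, rather than this proposal's per-variable fence syntax; the <samp>Widget</samp> type is illustrative.

```cpp
#include <atomic>
#include <mutex>

struct Widget { int value = 42; };

std::atomic<Widget*> instance{nullptr};
std::mutex init_mutex;

// Fast path: one acquire load. The expensive synchronization happens
// only on first use; the release store publishes the new object.
Widget* get_instance() {
    Widget* p = instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(init_mutex);
        p = instance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new Widget;
            instance.store(p, std::memory_order_release);
        }
    }
    return p;
}
```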

<p>
We expect that implementations that have hardware fences
will use such operations to implement the language fences.
See
<cite><a href="../2007/n2362.html">
N2362 Converting Memory Fences to N2324 Form</a></cite>,
for a discussion of the issues and approaches to converting
from existing practice of global fences
to the per-variable fences in this proposal.


<h3><a name="DiscussDependent">Dependent Memory Order</a></h3>

<p>
Most architectures provide additional ordering guarantees
if one memory operation is dependent on another.
In fact, these are critical
for efficient implementation of languages like Java.

<p>
For C++, there is near-universal agreement
that it would be nice to have some such guarantees.
The fundamental issues are:

<ul>
<li>Compilers may remove or change data and/or control dependencies.</li>
<li>Detailed guarantees vary across architectures.</li>
</ul>

<p>
See papers
<a href="../2007/n2359.html">
N2359 C++ Dependency Ordering: Atomics</a>,
<a href="../2007/n2360.html">
N2360 C++ Dependency Ordering: Memory Model</a>, and
<a href="../2007/n2361.html">
N2361 C++ Dependency Ordering: Function Annotation</a>
for exploration of these issues.


<h3><a name="DiscussLockFree">Lock-Free Property</a></h3>

<p>
In some cases, the decision to use a lock-free algorithm,
and sometimes the choice of lock-free algorithm,
depends on the availability of underlying hardware primitives.
In other cases, e.g. when dealing with asynchronous signals,
it may be important to know that operations like compare-and-swap
are really lock-free,
because a lock-based emulation might result in deadlock.

<p>
The proposal defines feature queries to determine
whether or not operations are lock-free.
We provide two kinds of feature queries,
compile-time preprocessor macros and run-time functions.
To facilitate optimal storage use,
the proposal supplies feature macros
that indicate the general lock-free status of the integral and address atomic types.
The run-time functions provide per-object information.
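<p>
In the C++11 spelling that descended from this proposal, the two kinds of query look as follows; the macro values 0, 1, and 2 mean never, sometimes, and always lock-free, respectively.

```cpp
#include <atomic>

// Compile-time query: a preprocessor macro per atomic type.
static_assert(ATOMIC_INT_LOCK_FREE >= 0 && ATOMIC_INT_LOCK_FREE <= 2,
              "the macro yields one of the three defined values");

// Run-time query: per object, since e.g. alignment may vary per object.
bool object_is_lock_free(const std::atomic<int>& a) {
    return a.is_lock_free();
}
```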

<p>
The proposal provides run-time lock-free query functions
rather than compile-time constants
because subsequent implementations of a platform
may upgrade locking operations with lock-free operations,
so it is common for systems to abstract such facilities
behind dynamic libraries,
and we wish to leave that possibility open.
Furthermore, we recommend that
implementations without hardware atomic support use that technique.

<p>
The proposal provides lock-free query functions on individual objects
rather than whole types
to permit unavoidably misaligned atomic variables
without penalizing the performance of aligned atomic variables.

<p>
Because consistent use requires
that all operations on a type use the same protocol,
all operations are lock-free or none of them are.
That is, the lock-free property applies to whole objects,
not individual operations.

<p>
To facilitate inter-process communication via shared memory,
it is our intent that lock-free operations also be <em>address-free</em>.
That is, atomic operations on the same memory location
via two different addresses
will communicate atomically.
The implementation shall not depend on any per-process state.
While such a definition is beyond the scope of the standard,
a clear statement of our intent
will enable a portable expression of a class of programs already extant.


<h3><a name="DiscussFlag">Flag Type and Operations</a></h3>

<p>
The proposal includes a very simple atomic flag type,
providing two primary operations,
test-and-set and clear.
This type is the minimum hardware-implemented type
needed to conform to this standard.
The remaining types can be emulated with the atomic flag,
though with less than ideal properties.
Few programmers should be using this type.

<p>
We considered adding a wait operation in this proposal,
but ultimately rejected it because
pure busy waiting has pathological performance characteristics
on multi-programmed machines
and because
doing better requires interaction with the operating system scheduler,
which seems inappropriate to a processor-level facility.</p>
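<p>
A minimal spin lock shows the flag type's two operations in use; this is the kind of code with which an implementation can emulate the remaining atomic types. (C++11 spelling shown; the busy-wait is subject to the caveats above.)

```cpp
#include <atomic>

class spin_lock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // test-and-set returns the prior value; spin until it was clear.
        while (flag.test_and_set(std::memory_order_acquire))
            ;  // busy-wait
    }
    void unlock() {
        flag.clear(std::memory_order_release);
    }
};
```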


<h3><a name="DiscussIntegral">Integral and Address Types</a></h3>

<p>
The proposal includes a full set of integral atomic types
and an address atomic type.

<p>
This proposal includes atomic integral types smaller than a machine word,
even though many architectures do not have such operations.
For machines that implement a word-based compare-and-swap operation,
the effect of sub-word operations can be achieved
by loading the containing word,
modifying the sub-word in place,
and performing a compare-and-swap on the containing word.
In the event that no compare-and-swap is available,
the implementation may need to
either implement smaller types with larger types
or use locking algorithms.
Using the same size for the atomic type as its base type
eases effort required to port existing code,
e.g. code using <samp>__sync</samp>.
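<p>
The word-based emulation described above can be sketched as follows. This illustrative fragment (not part of the proposed wording) emulates an 8-bit fetch-and-add with a compare-and-swap on the containing 32-bit word, in C++11 spelling.

```cpp
#include <atomic>
#include <cstdint>

// Add `delta` to the byte at `byte_index` within `word`, atomically,
// returning the byte's previous value (two's-complement wraparound).
std::uint8_t fetch_add_byte(std::atomic<std::uint32_t>& word,
                            unsigned byte_index, std::uint8_t delta) {
    const unsigned shift = byte_index * 8;
    const std::uint32_t mask = std::uint32_t{0xFF} << shift;
    std::uint32_t old_word = word.load(std::memory_order_relaxed);
    for (;;) {
        const std::uint8_t old_byte =
            std::uint8_t((old_word & mask) >> shift);
        const std::uint8_t new_byte = std::uint8_t(old_byte + delta);
        const std::uint32_t new_word =
            (old_word & ~mask) | (std::uint32_t{new_byte} << shift);
        // On failure, compare_exchange_weak reloads old_word,
        // setting up the next iteration.
        if (word.compare_exchange_weak(old_word, new_word))
            return old_byte;
    }
}
```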

<h3><a name="DiscussOperations">Atomic Operations</a></h3>

<p>
The simplest atomic operations are load and store.

<p>
There are times when one wishes to store a new value,
but also load the old value.
An atomic load followed by an atomic store
is not an atomic sequence,
so we provide an atomic swap operation
that does a combined load and store atomically.
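<p>
A sketch of the swap operation (which C++11 names <samp>exchange</samp>):

```cpp
#include <atomic>

std::atomic<int> latest{0};

// Store a new value and return the old one in a single atomic step;
// a separate load followed by a store is not atomic as a unit.
int publish_and_get_previous(int v) {
    return latest.exchange(v);
}
```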

<p>
We also provide the common fetch-and-modify operations.
The fetch-and-modify functions return the original stored value.
The original stored value is required for fetch-and-or and fetch-and-and
because there is no means to compute the original stored value
from the new value and the modifying argument.
In contrast to the functions,
the fetch-and-modify assignment operators return the new value.
We do this for consistency with normal assignment operators.
Unlike normal C++ assignment operators, though,
the atomic assignments return values rather than references,
which is like C.
The reason is that another thread might intervene
between an assignment and a subsequent read.
Rather than introduce this classic parallel programming bug,
we return a value.
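<p>
The two return conventions can be contrasted directly (C++11 spelling):

```cpp
#include <atomic>

std::atomic<unsigned> flags{0};

// The named function returns the original stored value...
unsigned set_bits_fetch(unsigned bits) { return flags.fetch_or(bits); }

// ...while the assignment operator returns the new value -- as a value,
// not a reference, so no second (non-atomic) read can intervene.
unsigned set_bits_assign(unsigned bits) { return flags |= bits; }
```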

<p>
The normal signed integral addition operations
have undefined behavior in the event of an overflow.
For atomic variables,
this undefined behavior introduces significant burden
because there is in general no way to pre-test for overflow.
Rather than leave overflow undefined,
we recognize the de facto behavior of modern systems
and define atomic fetch-and-add (-subtract) to use two's-complement arithmetic.
We are aware of no implementation of C++
for which this definition is a problem.

<p>
The compare-and-swap operation
seems to be the minimum general-purpose synchronization primitive.
It appears to be both necessary and sufficient
for most interesting lock-free algorithms.
The compare-and-swap operations may fail spuriously.
That is, the stored value may be known to be equal to the expected value,
and yet the operation fails to swap.
We permit this failure for two reasons.
First, it appears unavoidable for load-locked store-conditional implementations.
Second, hardware memory switches may prefer to fail some operations
to ensure overall machine performance.

<p>
The compare-and-swap operations
replace the expected value with the found value
in the event that the found value does not match the expected value.
The intent of the design is to set up a compare-and-swap loop
for the next iteration.
The reasons for this design are:

<ul>
<li>Recovering from spurious failure requires a loop.
</li>
<li>Setting up the loop for the next iteration
requires the least syntax in user code.
</li>
<li>The more comprehensive operation
subsumes the two hardware variants of compare and swap.
Some architectures (e.g. SPARC) return the mismatched value in hardware,
and it is more efficient to pass that back up
rather than artificially introduce another atomic load.
Architectures that don't provide the mismatched value
must do a subsequent load.
Likewise some architectures do not set conditions (e.g. SPARC)
and those will need to insert a comparison.
</li>
<li>Compilers will be able to examine the compilation context
and remove any unnecessary overhead
for the relatively rare cases where a loop is not needed.
</li>
</ul>
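<p>
A typical compare-and-swap loop illustrates the design: on failure, the expected value is replaced by the value actually found, so the loop is immediately set up for its next iteration. (The sketch uses the C++11 spelling, <samp>compare_exchange_weak</samp>, which permits the spurious failures discussed above.)

```cpp
#include <atomic>

// Atomically raise `a` to at least `value`, returning the last value
// observed in `a` before the update (or the current value if none).
int fetch_max(std::atomic<int>& a, int value) {
    int expected = a.load(std::memory_order_relaxed);
    while (expected < value &&
           !a.compare_exchange_weak(expected, value)) {
        // A failed (possibly spuriously failed) compare-and-swap has
        // stored the observed value into `expected`; just retry.
    }
    return expected;
}
```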

<p>
Unlike other operations,
the compare-and-swap operations
have two memory synchronization order parameters.
The first is for operations that succeed;
the second is for those that fail.
The reason for this pair of specifications
is that there are several circumstances in which
the failure case can have substantially weaker,
and substantially cheaper, synchronization.
However, for sequential consistency,
full synchronization is required.
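<p>
The pair of ordering parameters appears as follows (C++11 spelling; names are illustrative). A failed compare-and-swap performs no store, so its ordering can be weaker, and cheaper, than that of a successful one.

```cpp
#include <atomic>

// On success the swap publishes data and uses acquire-release ordering;
// on failure only an acquire load is performed.
bool try_claim(std::atomic<int>& slot, int& expected, int desired) {
    return slot.compare_exchange_strong(
        expected, desired,
        std::memory_order_acq_rel,    // ordering if the swap succeeds
        std::memory_order_acquire);   // ordering if it fails
}
```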

<h3><a name="DiscussGeneric">Generic Types</a></h3>

<p>
Generic atomic types provide significant benefits:

<ul>
<li>They ease type-safe use of atomic pointers.</li>
<li>They ease the process of making atomic versions of user-defined types.</li>
<li>They enable generic algorithms,
such as lock-free pointer-based queues.</li>
</ul>

<p>
The generic atomic type has
a partial specialization for pointers
(derived from the atomic address type),
and full specializations for integral types
(derived from the atomic integrals).

<p>
The primary problem with a generic atomic template
is that effective use of machine operations
requires that their parameter types
are trivially copy assignable and bitwise equality comparable.

<p>
Furthermore, parameter types that are not also
statically initializable and trivially destructible
may be difficult to use.

<p>
In the present language,
there is no mechanism to require these properties of a type parameter.
Roland Schwarz suggests using a template union to enforce POD type parameters.
Unfortunately, that approach also prevents
the derivation of specializations of atomic for the types above,
which is unacceptable.
Furthermore, Lois Goldthwaite
proposes generalizing unions to permit non-POD types in
<a href="../2007/n2248.html">N2248 Toward a More Perfect Union</a>.
We believe that concepts are a more appropriate mechanism
to enforce this restriction,
and so we have proposed the concepts and concept maps
necessary for safe use of generic atomics.
The burden on programmers using generic atomics
is to declare that types passed as template parameters
are in fact bitwise equality comparable.
The proposed mechanisms will infer trivially copy assignable types.

<p>
The intent is that vendors will specialize
a fully-general locking implementation of a generic atomic template
with implementations using hardware primitives
when those primitives are applicable.
This specialization can be accomplished
by defining a base template with the size and alignment
of the parameter type as additional template parameters,
and then specializing on those two arguments.
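<p>
The specialization technique can be sketched as follows; the names and the <samp>lock_free</samp> marker are illustrative, not part of the proposed wording.

```cpp
#include <cstddef>
#include <cstdint>

// Base template keyed on size and alignment: the general case would
// fall back to a fully general, lock-based implementation.
template<class T,
         std::size_t Size = sizeof(T),
         std::size_t Align = alignof(T)>
struct atomic_impl {
    static const bool lock_free = false;  // lock-based fallback
};

// A vendor specializes the shapes the hardware supports, e.g. 4-byte,
// 4-aligned objects handled with 32-bit hardware primitives.
template<class T>
struct atomic_impl<T, 4, 4> {
    static const bool lock_free = true;
};
```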


<h3><a name="DiscussImplement">Implementability</a></h3>

<p>
We believe that there is ample evidence for implementability.

<p>
The Intel/gcc
<a href="http://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html#Atomic-Builtins"><samp>__sync</samp></a> intrinsics
provide evidence for compiler implementability of the proposed interface.

<blockquote>
<p>(We do not advocate standardizing these intrinsics as is.
They provide far less control over memory ordering
than we advocated above.
For example,
they provide no way to atomically increment a counter
without imposing unnecessary ordering constraints.
The lack of appropriate ordering control
appears to already have resulted in implementation shortcuts,
e.g. gcc does not implement <samp>__sync_synchronize()</samp>
as a full memory barrier on X86,
in spite of the documentation.
We believe a number of issues were not fully understood
when that design was developed,
and it could greatly benefit from another iteration at this stage.)
</blockquote>

<p>
Other packages,
particularly Boehm's
<a href="http://www.hpl.hp.com/research/linux/atomic_ops/">atomic_ops</a>
package provide evidence of efficient implementability
over a range of architectures.

<p>
The remaining implementation issue is the burden on implementors
to produce a minimally conforming implementation.
The minimum hardware support required
is the atomic test-and-set and clear operations
that form the basis of the atomic flag type.
This proposal includes an example implementation
based on that minimal hardware support,
and thus shows the vendor work <em>required</em>.


<h2><a name="Prior">Prior Approaches</a></h2>

<p>
This proposal results from experience with many approaches.
For brevity, we address only those approaches
with formal proposals before the C++ standards committee.
</p>

<p>
<a href="../2005/n1875.html">N1875</a>
presented an atomic statement,
which would require the compiler to select the appropriate atomic operations.
We chose to abandon this approach because:
</p>

<ul>
<li>
The non-atomic evaluations within the atomic block were less than obvious.
</li>
<li>
The atomic block syntax was too invitingly simple
for the level of skill required to use it correctly.
</li>
<li>
The atomic block syntax
seems far more appropriate to express general atomic memory transactions.
It is premature to standardize those at this point.
</li>
</ul>

<p>
<a href="../2006/n2047.html">N2047</a>
presented a template-based C++ library of atomic types
layered on top of a C library of atomic operations on plain integral types.
We chose to abandon this approach because:
</p>

<ul>
<li>
The template-based approach severely limits interoperability with C.
</li>
<li>
The proposal had
both "guaranteed atomic" and "possibly emulated" versions of atomic types,
which appears to have insufficient expressive power
to warrant the added complexity.
It was originally designed in part
to allow for access to hardware-provided atomic load and store operations
on platforms that do not provide hardware provided compare-and-swap,
a property not shared by the current proposal.
Our current feeling is that such platforms
will be rare enough by 2010
that this property is not a major consideration.
</li>
<li>
The distinction between wait-free and lock-free
had insufficient user support and use cases.
</li>
<li>
The C-level approach
failed to identify concurrently accessed data at the point of declaration,
and made it too easy to access such data with non-atomic primitives.
</li>
</ul>

<p>
<a href="../2007/n2145.html">N2145</a>
introduced the basic approach adopted in this proposal.
</p>

<ul>
<li>
There are atomic types, with atomic operations on them.
</li>
<li>
The operations include synchronization semantics.
</li>
<li>
There is a set of primitive types common to both C and C++.
</li>
<li>
There were definitional weaknesses in the proposal
due to an overly-strong definition of POD.
</li>
</ul>

<p>
<a href="../2007/n2153.pdf">N2153</a>
proposes a fence-based memory model.
</p>

<ul>
<li>
There are atomic types, with atomic operations on them.
</li>
<li>
The operations include synchronization semantics.
</li>
<li>
There are global fences.
</li>
<li>
There is some discussion of load-dependent memory order.
</li>

</ul>

<p>
<a href="../2007/n2195.html">N2195</a>
presents a lower level interface than that of
<a href="../2007/n2145.html">N2145</a>.
N2195 deviates from the previous proposals in the following major areas:
</p>

<ul>
<li>
Absence of feature test macros or functions.
</li>
<li>
Absence of a dedicated enumerated set of atomic types.
</li>
<li>
Absence of a high-level C++ API
(the <samp>std::atomic&lt;T&gt;</samp> class template).
</li>
<li>
Inclusion of an acquire+release ordering constraint.
</li>
<li>
Passing the constraint as a first argument
instead of using several functions or macros with the appropriate suffix.
</li>
<li>
Introduction of bi-directional fences as proposed in N2153.
</li>
<li>
Additional requirements on the ordered constraint
that enable the programmer to achieve sequential consistency
and CCCC (cache-coherent causal consistency).
</li>
</ul>

<p>
<a href="../2007/n2324.html">N2334</a>
derives from N2145, but was refined in several areas.
</p>

<ul>
<li>
The synchronization specification
has changed from a part of the function name to a function parameter.
(Adopted from
<a href="../2007/n2195.html">N2195</a>.)
The change simplifies the specification and
simplifies the programming of generic algorithms.
Its major cost is that optimizers
will need to do constant parameter propagation to obtain full performance.
We feel this capability is common enough in modern compilers
to not be an excessive burden.
</li>
<li>
We have split the "ordered" synchronization primitive
into two parts, "acquire-release" and "sequentially-consistent".
This split provides finer control over synchronization.
(Adopted from
<a href="../2007/n2195.html">N2195</a>.)
</li>
<li>
We have added per-variable fence operations,
consistent with the updated concurrency model.
These fences were motivated by
<a href="../2007/n2153.html">N2153</a>,
but are not isomorphic to those of N2153.
In particular, N2153 provides global fences,
whereas this proposal provides per-variable fences.
</li>
<li>
We have updated the proposal to include the new character types from
<a href="../2007/n2249.html">N2249
New Character Types in C++</a>.
</li>
<li>
We have addressed the definitional weaknesses
with respect to POD-ness
with the help of the following papers.
<ul>
<li>
<a href="../2007/n2215.pdf">N2215</a> Initializer lists (Rev.3)
</li>
<li>
<a href="../2007/n2230.html">N2230</a> POD's Revisited;
Resolving Core Issue 568 (Revision 3)
</li>
<li>
<a href="../2007/n2235.pdf">N2235</a> Generalized Constant Expressions &mdash;
Revision 5
</li>
<li>
<a href="../2007/n2326.html">N2326</a> Defaulted and Deleted Functions
</li>
</ul>
</li>
</ul>

<p>
<a href="../2007/n2381.html">N2381</a>
is primarily a set of editorial changes to N2324,
but with the following substantive changes.
</p>

<ul>
<li>
We included summaries of more prior work.
</li>
<li>
We moved some discussion of specifying ordering constraints to 
<a href="../2007/n2176.html">N2176</a>.
</li>
<li>
We added references to the work on load-dependent ordering.
</li>
<li>
We added a discussion of the design for compare-and-swap.
</li>
<li>
We added a second synchronization parameter to compare-and-swap
to specify the synchronization semantics for a failed comparison.
</li>
<li>
We removed the implicit conversion operators.
</li>
</ul>

<p>
<a href="../2007/n2393.html">N2393</a>
is a substantial editorial rewrite of N2381,
with primary emphasis on providing normative wording.
Changes are generally not substantive,
but with the following exceptions.
</p>

<ul>

<li>
We split our use of "scalar" into "integral and address"
to correctly indicate the set of atomic types.
We have split the lock-free property macro to correspond.
</li>

<li>
We added the concepts necessary for the atomic generics to behave properly.
</li>

<li>
We have eliminated one unsupportable requirement on ordering,
replacing it with non-normative encouragement.
</li>

<li>
We have eliminated one requirement
as a result of simplification in the memory model.
</li>

</ul>

<p>
<a href="../2007/n2427.html">N2427</a>
is a minor change to N2393
correcting a weakness in the interaction between concepts,
booleans, and hardware compare-and-swap.
</p>

<ul>

<li>
We changed <samp>BitwiseEqualityComparable</samp>
to <samp>AtomicComparable</samp>.
</li>

<li>
We changed <samp>AtomicComparable</samp>
to not be a refinement of <samp>EqualityComparable</samp>.
</li>

</ul>

<h2><a name="Changes">Changes to the Standard</a></h2>

<p>
This section of the proposal provides the formal wording for the standard.

<h3><a name="library">Chapter 17 Library introduction [library]</a></h3>

<p>
Add a new paragraph with the following wording:

<blockquote>
<p>
The atomic components allow more fine-grained concurrent access to shared data
than is possible with locks.
</blockquote>

<h3><a name="utilities">Chapter 20 General utilities library [utilities]</a></h3>

<p>
This chapter of the working paper has not yet been modified for concepts.
The changes listed under this chapter will apply after those of
<a href="../2007/n2322.pdf">
N2322 Concepts for the C++0x Standard Library: Utilities</a>
or its adopted successor.

<h3><a name="utility.concepts">20.1 Concepts [utility.concepts]</a></h3>

<p>
In the synopsis, move <samp>LessThanComparable</samp>
immediately before <samp>EqualityComparable</samp>.

<p>
In the synopsis, immediately after <samp>EqualityComparable</samp> add:

<blockquote>
<p>
<samp>concept AtomicComparable&lt;typename T&gt;</samp>
<i>see below;</i>
<br>
<samp>concept_map AtomicComparable&lt;T&gt;</samp>
<i>see below;</i>
</blockquote>

<p>
In the synopsis, immediately after <samp>CopyAssignable</samp> add:

<blockquote>
<p>
<samp>concept TriviallyCopyAssignable&lt;typename T&gt;</samp>
<i>see below;</i>
<br>
<samp>template&lt;CopyAssignable T&gt;
concept_map TriviallyCopyAssignable&lt;T&gt;</samp>
<i>see below;</i>
</blockquote>

<h3><a name="concept.comparison">20.1.2 Comparisons [concept.comparison]</a></h3>

<p>
Move the paragraphs on <samp>LessThanComparable</samp>
immediately before those on <samp>EqualityComparable</samp>.

<p>
Add a new paragraph after those on <samp>EqualityComparable</samp>.

<blockquote>
<p>
Concept <samp>AtomicComparable</samp>
enables a bitwise equality comparison,
as with <samp>memcmp</samp>,
in the implementation of compare-and-swap operations.
[<i>Note:</i>
Such types should not have padding,
i.e. the size of the type is the sum of the sizes of its elements.
If padding exists, the comparison may provide false negatives,
but never false positives.
<i>&mdash;end note</i>]


<pre><samp>
concept AtomicComparable&lt;typename T&gt; { }
</samp></pre>
</blockquote>

<p>
Add a new paragraph.

<blockquote>
<p>
All integral types are <samp>AtomicComparable</samp>.

<pre><samp>
concept_map AtomicComparable&lt;char&gt; { }
concept_map AtomicComparable&lt;signed char&gt; { }
concept_map AtomicComparable&lt;unsigned char&gt; { }
concept_map AtomicComparable&lt;short&gt; { }
concept_map AtomicComparable&lt;unsigned short&gt; { }
concept_map AtomicComparable&lt;int&gt; { }
concept_map AtomicComparable&lt;unsigned int&gt; { }
concept_map AtomicComparable&lt;long&gt; { }
concept_map AtomicComparable&lt;unsigned long&gt; { }
concept_map AtomicComparable&lt;long long&gt; { }
concept_map AtomicComparable&lt;unsigned long long&gt; { }
concept_map AtomicComparable&lt;wchar_t&gt; { }
concept_map AtomicComparable&lt;char16_t&gt; { }
concept_map AtomicComparable&lt;char32_t&gt; { }
concept_map AtomicComparable&lt;bool&gt; { }
</samp></pre>
</blockquote>

<p>
Add a new paragraph.

<blockquote>
<p>
All pointer types are <samp>AtomicComparable</samp>.

<pre><samp>
template&lt; typename T &gt; concept_map AtomicComparable&lt;T*&gt; { }
</samp></pre>
</blockquote>

<h3><a name="concept.copymove">20.1.5 Copy and move [concept.copymove]</a></h3>

<p>
Add a new paragraph after the paragraph on <samp>CopyAssignable</samp>:

<blockquote>
<p>
Concept <samp>TriviallyCopyAssignable</samp>
requires the ability to assign to an object
with a trivial assignment.

<pre><samp>
concept TriviallyCopyAssignable&lt;typename T&gt; : CopyAssignable&lt;T&gt; { }

template&lt;CopyAssignable T&gt;
    requires True&lt;has_trivial_assign&lt;T&gt;::value&gt;
concept_map TriviallyCopyAssignable&lt;T&gt; { }
</samp></pre>

</blockquote>

<h3><a name="atomics">Chapter 29 Atomics library [atomics]</a></h3>

<p>
Add a new clause with the following paragraphs.
</p>

<blockquote>
<p>
This clause describes components
for fine-grained atomic access.
This access is provided via operations on atomic objects.
[<i>Footnote:</i>
Atomic objects are neither active nor radioactive.
<i>&mdash;end footnote</i>]
</p>

<p>
The following subclauses describe atomics requirements,
and components for types and operations,
as summarized below.
</p>

<table border=1>
<caption>Atomics library summary</caption>
<tr><th>Subclause</th><th>Header(s)</th></tr>
<tr><td>29.1 Order and Consistency [atomics.order]</td>
<td></td></tr>
<tr><td>29.2 Lock-Free Property [atomics.lockfree]</td>
<td></td></tr>
<tr><td>29.3 Flag Type and Operations [atomics.flag]</td>
<td></td></tr>
<tr><td>29.4 Integral and Address Types [atomics.types]</td>
<td></td></tr>
<tr><td>29.5 Operations [atomics.operations]</td>
<td></td></tr>
<tr><td>29.6 Generic Types [atomics.generic]</td>
<td></td></tr>
<tr><td>29.7 Synopsis [atomics.synopsis]</td>
<td><samp>&lt;cstdatomic&gt; &lt;stdatomic.h&gt;</samp></td></tr>
</table>

</blockquote>


<h3><a name="atomics.order">29.1 Order and Consistency [atomics.order]</a></h3>

<p>
Add a new sub-clause with the following paragraphs.

<blockquote>
<p>
The enumeration <samp>memory_order</samp>
specifies the detailed
regular (non-atomic) memory synchronization order
as defined in
[the new section added by <a href="../2007/n2334.html">N2334</a>
or its adopted successor]
and may provide for operation ordering.
Its enumerated values and their meanings are as follows.

<dl>
<dt><samp>memory_order_relaxed</samp>
<dd>
The operation does not order memory.
</dd>

<dt><samp>memory_order_release</samp>
<dd>
Performs a release operation on the affected memory locations,
thus making regular memory writes visible to other threads
through the atomic variable to which it is applied.
</dd>

<dt><samp>memory_order_acquire</samp>
<dd>
Performs an acquire operation on the affected memory locations,
thus making regular memory writes in other threads
released through the atomic variable to which it is applied,
visible to the current thread.
</dd>

<dt><samp>memory_order_acq_rel</samp>
<dd>
The operation has both acquire and release semantics.
</dd>

<dt><samp>memory_order_seq_cst</samp>
<dd>
The operation has both acquire and release semantics,
and in addition, has sequentially-consistent operation ordering.
</dd>

</dl>

<p>
The <samp>memory_order_seq_cst</samp> operations that load a value
are acquire operations on the affected locations.
The <samp>memory_order_seq_cst</samp> operations that store a value
are release operations on the affected locations.
In addition, in a consistent execution,
there must be a single total order <var>S</var>
on all <samp>memory_order_seq_cst</samp> operations,
consistent with the happens before order and modification orders
for all affected locations,
such that each <samp>memory_order_seq_cst</samp> operation
observes either
the last preceding modification according to this order <var>S</var>,
or the result of an operation that is not <samp>memory_order_seq_cst</samp>.
[<i>Note:</i>
Although we do not explicitly require that <var>S</var> include locks,
it can always be extended
to an order that does include lock and unlock operations,
since the ordering between those
is already included in the happens before ordering.
<i>&mdash;end note</i>]

<p>
An atomic store shall only store a value
that has been computed from constants and program input values
by a finite sequence of program evaluations,
such that each evaluation observes the values of variables
as computed by the last prior assignment in the sequence.
[<i>Footnote:</i>
Among other implications, atomic variables shall not decay.
<i>&mdash;end footnote</i>]
The ordering of evaluations in this sequence must be such that

<ol>

<li>If an evaluation B observes a value computed by A in a different thread,
then B must not happen before A.
</li>

<li>If an evaluation A is included in the sequence,
then all evaluations that assign to the same variable
and are sequenced before or happens-before A
must be included.
</li>
</ol>

<p>
[<i>Note:</i> Requirement 2 disallows "out-of-thin-air",
or "speculative" stores of atomics when relaxed atomics are used.
Since unordered operations are involved,
evaluations may appear in this sequence out of thread order.
For example, with <samp>x</samp> and <samp>y</samp> initially zero,

<dl>
<dt>Thread 1:</dt>
<dd><samp>r1 = y.load( memory_order_relaxed );</samp></dd>
<dd><samp>x.store( r1, memory_order_relaxed );</samp></dd>
<dt>Thread 2:</dt>
<dd><samp>r2 = x.load( memory_order_relaxed );</samp></dd>
<dd><samp>y.store( 42, memory_order_relaxed );</samp></dd>
</dl>

<p>
is allowed to produce <samp>r1</samp> == <samp>r2</samp> == 42.
The sequence of evaluations justifying this consists of:

<blockquote><p><samp>
y.store( 42, memory_order_relaxed );
<br>r1 = y.load( memory_order_relaxed );
<br>x.store( r1, memory_order_relaxed );
<br>r2 = x.load( memory_order_relaxed );
</samp></blockquote>

<p>
On the other hand,

<dl>
<dt>Thread 1:</dt>
<dd><samp>r1 = y.load( memory_order_relaxed );</samp></dd>
<dd><samp>x.store( r1, memory_order_relaxed );</samp></dd>
<dt>Thread 2:</dt>
<dd><samp>r2 = x.load( memory_order_relaxed );</samp></dd>
<dd><samp>y.store( r2, memory_order_relaxed );</samp></dd>
</dl>

<p>
may not produce <samp>r1</samp> == <samp>r2</samp> == 42,
since there is no sequence of evaluations
that results in the computation of 42.
In the absence of "relaxed" operations
and read-modify-write operations
with weaker than <samp>memory_order_acq_rel</samp> ordering,
requirement 2 has no impact.
<i>&mdash;end note</i>]

<p>
[<i>Note:</i>
The requirements do allow <samp>r1</samp> == <samp>r2</samp> == 42 in
(<samp>x</samp>, <samp>y</samp> initially zero):

<dl>
<dt>Thread 1:</dt>
<dd><samp>r1 = x.load( memory_order_relaxed );</samp></dd>
<dd><samp>if ( r1 == 42 ) y.store( r1, memory_order_relaxed );</samp></dd>
<dt>Thread 2:</dt>
<dd><samp>r2 = y.load( memory_order_relaxed );</samp></dd>
<dd><samp>if ( r2 == 42 ) x.store( 42, memory_order_relaxed );</samp></dd>
</dl>

<p>
Implementations are discouraged from allowing such behavior.
<i>&mdash;end note</i>]

<p>
Implementations shall strive
to make atomic stores
visible to atomic loads within a reasonable amount of time.
They shall never move an atomic operation out of an unbounded loop.

</blockquote>


<h3><a name="atomics.lockfree">29.2 Lock-Free Property [atomics.lockfree]</a></h3>

<p> Add a new sub-clause with the following paragraphs.

<blockquote>

<p>
The macros <samp>ATOMIC_INTEGRAL_LOCK_FREE</samp>
and <samp>ATOMIC_ADDRESS_LOCK_FREE</samp>
indicate the general lock-free property of integral and address atomic types.
The properties also apply
to the corresponding specializations of the atomic template.
A value of 0 indicates that the types are never lock-free.
A value of 1 indicates that the types are sometimes lock-free.
A value of 2 indicates that the types are always lock-free.

<p>
The function <samp>atomic_is_lock_free</samp>
indicates whether or not the object is lock-free.
The result of a lock-free query on one object
cannot be inferred from the result of a lock-free query
on another object.

<p>
[<i>Note:</i>
Operations that are lock-free should also be <dfn>address-free</dfn>.
That is, atomic operations on the same memory location
via two different addresses will communicate atomically.
The implementation shall not depend on any per-process state.
This restriction enables communication
via memory mapped into a process more than once
and memory shared between two processes.
&mdash;<i>end note</i>]</p>

</blockquote>


<h3><a name="atomics.flag">29.3 Flag Type and Operations [atomics.flag]</a></h3>

<p> Add a new sub-clause with the following paragraphs.

<blockquote>

<p>
The <samp>atomic_flag</samp> type
provides the classic test-and-set functionality.
It has two states, set and clear.

<p>
Operations on an <samp>atomic_flag</samp>
must be lock free.
[<i>Note:</i>
Hence the operations must be address-free.
No other type requires lock-free operations,
and hence the <samp>atomic_flag</samp> type
is the minimum hardware-implemented type needed to conform to this standard.
The remaining types can be emulated with <samp>atomic_flag</samp>,
though with less than ideal properties.
&mdash;<i>end note</i>]

<p>
The <samp>atomic_flag</samp> type shall have standard layout.
It shall have a trivial default constructor,
deleted copy constructor, a deleted copy assignment operator,
and a trivial destructor.

<p>
The macro <samp>ATOMIC_FLAG_INIT</samp>
shall be used to initialize an <samp>atomic_flag</samp>
to the clear state.
Such initialization shall be static for static-duration variables.
An <samp>atomic_flag</samp> shall not be zero initialized.
[<i>Example:</i>
<pre><samp>atomic_flag guard = ATOMIC_FLAG_INIT;</samp></pre>
<p>
<i>&mdash;end example</i>].

<pre><samp>
bool atomic_flag_test_and_set( volatile atomic_flag* <var>object</var> )
bool atomic_flag_test_and_set_explicit( volatile atomic_flag* <var>object</var>,
        memory_order )
bool atomic_flag::test_and_set(
        memory_order <var>order</var> = memory_order_seq_cst ) volatile
</samp></pre>
<dl>
<dt><i>Effects:</i>
<dd>
Atomically, set the value in <samp><var>object</var></samp> to true.
Memory is affected as per <samp><var>order</var></samp>.
These operations are read-modify-write operations
in the sense of the "synchronizes with" definition in
[the new section added by
<a href="../2007/n2334.htm">N2334</a> or successor],
and hence both such an operation and the evaluation
that produced the input value
synchronize with any evaluation that reads the updated value.
</dd>
<dt><i>Returns:</i>
<dd>
Atomically, the value of <samp><var>object</var></samp>
immediately before the effects.
</dd>
</dl>

<p>
<pre><samp>
void atomic_flag_clear( volatile atomic_flag* <var>object</var> )
void atomic_flag_clear_explicit( volatile atomic_flag* <var>object</var>,
        memory_order )
void atomic_flag::clear(
        memory_order <var>order</var> = memory_order_seq_cst ) volatile
</samp></pre>
<dl>
<dt><i>Effects:</i>
<dd>
Atomically, set the value in <samp><var>object</var></samp> to false.
Memory is affected as per <samp><var>order</var></samp>.
</dd>
<dt><i>Requires:</i>
<dd>
The <samp><var>order</var></samp> argument shall be
neither <samp>memory_order_acquire</samp> nor <samp>memory_order_acq_rel</samp>.
</dd>
</dl>

<p>
<pre><samp>
void atomic_flag_fence( const volatile atomic_flag* <var>object</var>,
        memory_order <var>order</var> )
void atomic_flag::fence( memory_order <var>order</var> ) const volatile
</samp></pre>
<dl>
<dt><i>Effects:</i>
<dd>
Memory is affected as per <samp><var>order</var></samp>.
These operations are read-modify-write operations
in the sense of the "synchronizes with" definition in
[the new section added by
<a href="../2007/n2334.htm">N2334</a> or successor],
and hence such an operation 
synchronizes with any operation on the same variable.
</dd>
<dt><i>Requires:</i>
<dd>
The <samp><var>order</var></samp> argument shall not be
<samp>memory_order_relaxed</samp>.
</dd>
</dl>

<p>
The
static-duration variable <samp>atomic_global_fence_compatibility</samp>
of type <samp>atomic_flag</samp>
may be used to emulate global fences.
[<i>Example:</i>
<pre>
atomic_flag_fence( &amp;atomic_global_fence_compatibility, memory_order_acquire );
atomic_global_fence_compatibility.fence( memory_order_release );
</pre>
<p>
&mdash;<i>end example</i>]

</blockquote>


<h3><a name="atomics.types">29.4 Integral and Address Types [atomics.types]</a></h3>

<p> Add a new sub-clause with the following paragraphs.

<blockquote>

<p>
The atomic integral and address types are listed below.
These types shall have standard layout.
They shall have a trivial default constructor,
a constexpr explicit value constructor,
a deleted copy constructor,
a deleted copy assignment operator,
and a trivial destructor.
These types shall support aggregate initialization syntax.

<p>
The set of operations available for these types
is defined in
<a href="#atomics.synopsis">29.7 Synopsis [atomics.synopsis]</a>.
The semantics of those operations are defined in
<a href="#atomics.operations">29.5 Operations [atomics.operations]</a>.

<p>
The <samp>atomic_bool</samp> type
provides an atomic boolean.

<p>
The <samp>atomic_address</samp> type
provides atomic <samp>void*</samp> operations.
The unit of addition/subtraction is one byte.
[<i>Note:</i>
The fetch-and-and, fetch-and-or, and fetch-and-xor operations
are not supported as those operations have no semantics for pointers.
&mdash;<i>end note</i>]

<p>
There is a full set of atomic integral types.

<blockquote>
<p><samp>atomic_char</samp>,
<samp>atomic_schar</samp>,
<samp>atomic_uchar</samp>,
<samp>atomic_short</samp>,
<samp>atomic_ushort</samp>,
<samp>atomic_int</samp>,
<samp>atomic_uint</samp>,
<samp>atomic_long</samp>,
<samp>atomic_ulong</samp>,
<samp>atomic_llong</samp>,
<samp>atomic_ullong</samp>,
<samp>atomic_char16_t</samp>,
<samp>atomic_char32_t</samp>,
<samp>atomic_wchar_t</samp>
</blockquote>

<p>In addition to integral types, there are typedefs
for atomic types corresponding to the <samp>&lt;cstdint&gt;</samp> typedefs.
<blockquote>
<p><samp>atomic_int_least8_t</samp>,
<samp>atomic_uint_least8_t</samp>,
<samp>atomic_int_least16_t</samp>,
<samp>atomic_uint_least16_t</samp>,
<samp>atomic_int_least32_t</samp>,
<samp>atomic_uint_least32_t</samp>,
<samp>atomic_int_least64_t</samp>,
<samp>atomic_uint_least64_t</samp>,
<samp>atomic_int_fast8_t</samp>,
<samp>atomic_uint_fast8_t</samp>,
<samp>atomic_int_fast16_t</samp>,
<samp>atomic_uint_fast16_t</samp>,
<samp>atomic_int_fast32_t</samp>,
<samp>atomic_uint_fast32_t</samp>,
<samp>atomic_int_fast64_t</samp>,
<samp>atomic_uint_fast64_t</samp>,
<samp>atomic_intptr_t</samp>,
<samp>atomic_uintptr_t</samp>,
<samp>atomic_size_t</samp>,
<samp>atomic_ssize_t</samp>,
<samp>atomic_ptrdiff_t</samp>,
<samp>atomic_intmax_t</samp>,
<samp>atomic_uintmax_t</samp>
</blockquote>

<p>[<i>Note:</i>
The representation of atomic integral and address types
need not have the same size
as their corresponding regular types.
They should have the same size whenever possible,
as it eases effort required to port existing code.
&mdash;<i>end note</i>]</p>

</blockquote>

<h3><a name="atomics.operations">29.5 Operations [atomics.operations]</a></h3>

<p> Add a new sub-clause with the following paragraphs.

<blockquote>

<p>
There are only a few kinds of operations on atomic types,
though there are many instances of those kinds.
This section specifies each general kind,
and the specific instances are defined in
<a href="#atomics.synopsis">29.7 Synopsis [atomics.synopsis]</a>.

<p>
In the following operation definitions:
<ul>
<li>
An <samp><var>A</var></samp> refers to one of the atomic types.
</li>
<li>
A <samp><var>C</var></samp> refers to its corresponding non-atomic type.
The <samp>atomic_address</samp> atomic type
corresponds to the <samp>void*</samp> non-atomic type.
</li>
<li>
An <samp><var>M</var></samp>
refers to the type of the other argument for arithmetic operations.
For integral atomic types, <samp><var>M</var></samp>
is <samp><var>C</var></samp>.
For atomic address types, <samp><var>M</var></samp>
is <samp>std::ptrdiff_t</samp>.
</li>
<li>
The free functions not ending in <samp>_explicit</samp>
have the semantics of their corresponding <samp>_explicit</samp>
with <samp>memory_order</samp> arguments of <samp>memory_order_seq_cst</samp>.
</li>
</ul>

<p>
[<i>Note:</i>
Many operations are volatile-qualified.
The "volatile as device register" semantics have not changed
in the standard.
This qualification means that volatility is preserved
when applying these operations to volatile objects.
It does not mean that operations
on non-volatile objects
become volatile.
Thus, volatile qualified operations on non-volatile objects
may be merged under some conditions.
<i>&mdash;end note</i>]

<pre><samp>
bool atomic_is_lock_free( const volatile <var>A</var>* <var>object</var> )
bool <var>A</var>::is_lock_free() const volatile
</samp></pre>
<dl>
<dt><i>Returns:</i>
<dd>
True if the object's operations are lock-free,
false otherwise.
</dd>
</dl>

<pre><samp>
void atomic_store( volatile <var>A</var>* <var>object</var>, <var>C</var> <var>desired</var> )
void atomic_store_explicit( volatile <var>A</var>* <var>object</var>, <var>C</var> <var>desired</var>,
         memory_order <var>order</var> )
void <var>A</var>::store( <var>C</var> <var>desired</var>,
         memory_order <var>order</var> = memory_order_seq_cst ) volatile
</samp></pre>
<dl>
<dt><i>Effects:</i>
<dd>
Atomically replace the value in <samp><var>object</var></samp>
with <samp><var>desired</var></samp>.
Memory is affected as per <samp><var>order</var></samp>.
</dd>
<dt><i>Requires:</i>
<dd>
The <samp><var>order</var></samp> argument shall be
neither <samp>memory_order_acquire</samp> nor <samp>memory_order_acq_rel</samp>.
</dd>
</dl>

<p>
<pre><samp>
<var>C</var> <var>A</var>::operator =( <var>C</var> <var>desired</var> ) volatile
</samp></pre>
<dl>
<dt><i>Effects:</i>
<dd>
<samp>store( <var>desired</var> )</samp>
</dd>
<dt><i>Returns:</i>
<dd>
<samp><var>desired</var></samp>
</dd>
</dl>

<p>
<pre><samp>
<var>C</var> atomic_load( volatile <var>A</var>* <var>object</var> )
<var>C</var> atomic_load_explicit( volatile <var>A</var>* <var>object</var>, memory_order )
<var>C</var> <var>A</var>::load( memory_order <var>order</var> = memory_order_seq_cst ) volatile
</samp></pre>
<dl>
<dt><i>Effects:</i>
<dd>
Memory is affected as per <samp><var>order</var></samp>.
</dd>
<dt><i>Returns:</i>
<dd>
Atomically, the value of <samp><var>object</var></samp>.
</dd>
<dt><i>Requires:</i>
<dd>
The <samp><var>order</var></samp> argument shall be
neither <samp>memory_order_release</samp> nor <samp>memory_order_acq_rel</samp>.
</dd>
</dl>

<p>
<pre><samp>
<var>C</var> atomic_swap( volatile <var>A</var>* <var>object</var>, <var>C</var> <var>desired</var> )
<var>C</var> atomic_swap_explicit( volatile <var>A</var>* <var>object</var>, <var>C</var> <var>desired</var>, memory_order )
<var>C</var> <var>A</var>::swap( <var>C</var> <var>desired</var>,
         memory_order <var>order</var> = memory_order_seq_cst ) volatile
</samp></pre>
<dl>
<dt><i>Effects:</i>
<dd>
Atomically replace the value in <samp><var>object</var></samp>
with <samp><var>desired</var></samp>.
Memory is affected as per <samp><var>order</var></samp>.
These operations are read-modify-write operations
in the sense of the "synchronizes with" definition in
[the new section added by
<a href="../2007/n2334.htm">N2334</a> or successor],
and hence both such an operation and the evaluation
that produced the input value
synchronize with any evaluation that reads the updated value.
</dd>
<dt><i>Returns:</i>
<dd>
Atomically, the value of <samp><var>object</var></samp>
immediately before the effects.
</dd>
</dl>

<p>
<pre><samp>
bool atomic_compare_swap( volatile <var>A</var>* <var>object</var>, <var>C</var>* <var>expected</var>, <var>C</var> <var>desired</var> )
bool atomic_compare_swap_explicit( volatile <var>A</var>* <var>object</var>,
        <var>C</var>* <var>expected</var>, <var>C</var> <var>desired</var>, memory_order <var>success</var>, memory_order <var>failure</var> )
bool <var>A</var>::compare_swap( <var>C</var>& <var>expected</var>, <var>C</var> <var>desired</var>,
        memory_order <var>success</var>, memory_order <var>failure</var> ) volatile
bool <var>A</var>::compare_swap( <var>C</var>& <var>expected</var>, <var>C</var> <var>desired</var>,
        memory_order <var>order</var> = memory_order_seq_cst ) volatile
</samp></pre>
<dl>
<dt><i>Effects:</i>
<dd>
Atomically, compares the value in <samp><var>object</var></samp>
for equality with that in <samp><var>expected</var></samp>,
and if true,
replaces the value in <samp><var>object</var></samp>
with <samp><var>desired</var></samp>,
and if false,
updates the value in <samp><var>expected</var></samp>
with the value in <samp><var>object</var></samp>.
If true, 
memory is affected as per <samp><var>success</var></samp>.
If false, 
memory is affected as per <samp><var>failure</var></samp>.
When only a single <samp><var>order</var></samp> is specified,
<samp><var>success</var></samp> is <samp><var>order</var></samp>
and <samp><var>failure</var></samp> is <samp><var>order</var></samp>
with <samp>memory_order_acq_rel</samp>
decaying to <samp>memory_order_acquire</samp>
and <samp>memory_order_release</samp>
decaying to <samp>memory_order_relaxed</samp>.
[<i>Footnote:</i>
The memory order decays, not the memory itself.
&mdash;<i>end footnote</i>]
These operations are read-modify-write operations
in the sense of the "synchronizes with" definition in
[the new section added by
<a href="../2007/n2334.htm">N2334</a> or successor],
and hence both such an operation and the evaluation
that produced the input value
synchronize with any evaluation that reads the updated value.
</dd>
<dt><i>Returns:</i>
<dd>
The result of the comparison.
</dd>
<dt><i>Requires:</i>
<dd>
The <samp><var>failure</var></samp> argument shall be
neither <samp>memory_order_release</samp> nor <samp>memory_order_acq_rel</samp>.
The <samp><var>failure</var></samp> argument shall be
no stronger than the <samp><var>success</var></samp> argument.
</dd>
</dl>

<p>
[<i>Note:</i>
The effect of the compare-and-swap operations is
<blockquote><pre><samp>
if ( *<var>object</var> == *<var>expected</var> )
    *<var>object</var> = <var>desired</var>;
else
    *<var>expected</var> = *<var>object</var>;
</samp></pre></blockquote>
<p>
&mdash;<i>end note</i>]

<p>
The compare-and-swap operations may fail spuriously,
that is, return false
while leaving the value in <samp><var>expected</var></samp> unchanged.
[<i>Note:</i>
This spurious failure enables implementation of compare-and-swap
on a broader class of machines, e.g. load-locked store-conditional machines.
&mdash;<i>end note</i>]
[<i>Example:</i>
A consequence of spurious failure
is that nearly all uses of compare-and-swap will be in a loop.
<blockquote><pre><samp>
expected = current.load();
do desired = function( expected );
while ( ! current.compare_swap( expected, desired ) );
</samp></pre></blockquote>
<p>
&mdash;<i>end example</i>]

<pre><samp>
void atomic_fence( const volatile <var>A</var>* <var>object</var>, memory_order <var>order</var> )
void <var>A</var>::fence( memory_order <var>order</var> ) const volatile
</samp></pre>
<dl>
<dt><i>Effects:</i>
<dd>
Memory is affected as per <samp><var>order</var></samp>.
These operations are read-modify-write operations
in the sense of the "synchronizes with" definition in
[the new section added by
<a href="../2007/n2334.htm">N2334</a> or successor],
and hence such an operation 
synchronizes with any operation on the same variable.
</dd>
<dt><i>Requires:</i>
<dd>
The <samp><var>order</var></samp> argument shall not be
<samp>memory_order_relaxed</samp>.
</dd>
</dl>

<p>
The following operations perform arithmetic computations.
The key, operator, and computation correspondence is:

<p>
<table border=1 cellpadding=2>
<tr>
<th><samp><var>key</var></samp></th><th><var>op</var></th><th><var>computation</var></th>
<th><samp><var>key</var></samp></th><th><var>op</var></th><th><var>computation</var></th>
</tr>
<tr>
<td><samp>add</samp></td><td><samp>+</samp></td><td>addition</td>
<td><samp>sub</samp></td><td><samp>-</samp></td><td>subtraction</td>
</tr>
<tr>
<td><samp>or</samp></td><td><samp>|</samp></td><td>bitwise inclusive or</td>
<td><samp>xor</samp></td><td><samp>^</samp></td><td>bitwise exclusive or</td>
</tr>
<tr>
<td><samp>and</samp></td><td><samp>&amp;</samp></td><td>bitwise and</td>
</tr>
</table>

<pre><samp>
<var>C</var> atomic_fetch_<var>key</var>( volatile <var>A</var>* <var>object</var>, <var>M</var> <var>operand</var> )
<var>C</var> atomic_fetch_<var>key</var>_explicit( volatile <var>A</var>* <var>object</var>, <var>M</var> <var>operand</var>,
        memory_order )
<var>C</var> <var>A</var>::fetch_<var>key</var>( <var>M</var> <var>operand</var>,
        memory_order <var>order</var> = memory_order_seq_cst ) volatile
</samp></pre>
<dl>
<dt><i>Effects:</i>
<dd>
Atomically replace the value in <samp><var>object</var></samp>
with result of the <var>computation</var>
applied to the value in <samp><var>object</var></samp>
and the given <samp><var>operand</var></samp>.
Memory is affected as per <samp><var>order</var></samp>.
These operations are read-modify-write operations
in the sense of the "synchronizes with" definition in
[the new section added by
<a href="../2007/n2334.htm">N2334</a> or successor],
and hence both such an operation and the evaluation
that produced the input value
synchronize with any evaluation that reads the updated value.
</dd>
<dt><i>Returns:</i>
<dd>
Atomically, the value of <samp><var>object</var></samp>
immediately before the effects.
</dd>
</dl>

<p>
For signed integral types,
arithmetic is defined to use two's-complement representation.
There are no undefined results.
For address types,
the result may be an undefined address,
but the operations otherwise have no undefined behavior.

<pre><samp>
<var>C</var> <var>A</var>::operator <var>op</var>=( <var>M</var> <var>operand</var> ) volatile
</samp></pre>
<dl>
<dt><i>Effects:</i>
<dd>
<samp>fetch_<var>key</var>( <var>operand</var> )</samp>
</dd>
<dt><i>Returns:</i>
<dd>
<samp>fetch_<var>key</var>( <var>operand</var> ) <var>op</var> <var>operand</var></samp>
</dd>
</dl>

<pre><samp>
<var>C</var> <var>A</var>::operator ++( int ) volatile
</samp></pre>
<dl>
<dt><i>Effects:</i>
<dd>
<samp>fetch_add( 1 )</samp>
</dd>
<dt><i>Returns:</i>
<dd>
<samp>fetch_add( 1 )</samp>
</dd>
</dl>

<pre><samp>
<var>C</var> <var>A</var>::operator --( int ) volatile
</samp></pre>
<dl>
<dt><i>Effects:</i>
<dd>
<samp>fetch_sub( 1 )</samp>
</dd>
<dt><i>Returns:</i>
<dd>
<samp>fetch_sub( 1 )</samp>
</dd>
</dl>

<pre><samp>
<var>C</var> <var>A</var>::operator ++() volatile
</samp></pre>
<dl>
<dt><i>Effects:</i>
<dd>
<samp>fetch_add( 1 )</samp>
</dd>
<dt><i>Returns:</i>
<dd>
<samp>fetch_add( 1 ) + 1</samp>
</dd>
</dl>

<pre><samp>
<var>C</var> <var>A</var>::operator --() volatile
</samp></pre>
<dl>
<dt><i>Effects:</i>
<dd>
<samp>fetch_sub( 1 )</samp>
</dd>
<dt><i>Returns:</i>
<dd>
<samp>fetch_sub( 1 ) - 1</samp>
</dd>
</dl>

</blockquote>


<h3><a name="atomics.generic">29.6 Generic Types [atomics.generic]</a></h3>

<p> Add a new sub-clause with the following paragraphs.

<blockquote>

<p>
There is a generic <samp>atomic</samp> class template.
The template requires that its type argument be
trivially copy assignable and
bitwise equality comparable.
[<i>Note:</i>
Type arguments that are not also statically initializable
and trivially destructible
may be difficult to use.
<i>&mdash;end note</i>]

<p>
The <samp>atomic</samp> template instances
shall have a deleted copy constructor,
and a deleted copy assignment operator.
They shall have a constexpr explicit value constructor.

<p>
There are pointer partial specializations
on the <samp>atomic</samp> class template.
These specializations publicly derive from <samp>atomic_address</samp>.
For <samp>atomic</samp> pointer partial specializations,
the unit of addition/subtraction is the size of the referenced type.
These instances shall have trivial default constructors
and trivial destructors.

<p>
There are full specializations
over the integral types
on the <samp>atomic</samp> class template.
These specializations publicly
derive from the corresponding atomic integral type.
These instances shall have trivial default constructors
and trivial destructors.

</blockquote>


<h3><a name="atomics.synopsis">29.7 Synopsis [atomics.synopsis]</a></h3>

<p> Add a new sub-clause with the following paragraphs.

<blockquote>

<p>
The atomic types and operations have the following synopsis.

<pre><samp>
namespace std {

#define ATOMIC_INTEGRAL_LOCK_FREE <var>implementation-defined</var>
#define ATOMIC_ADDRESS_LOCK_FREE <var>implementation-defined</var>

typedef enum memory_order {
       memory_order_relaxed, memory_order_acquire, memory_order_release,
       memory_order_acq_rel, memory_order_seq_cst
} memory_order;

typedef struct atomic_flag
{
    bool test_and_set( memory_order = memory_order_seq_cst ) volatile;
    void clear( memory_order = memory_order_seq_cst ) volatile;
    void fence( memory_order ) const volatile;

    atomic_flag() = default;
    atomic_flag( const atomic_flag&amp; ) = delete;
    atomic_flag&amp; operator =( const atomic_flag&amp; ) = delete;
} atomic_flag;

#define ATOMIC_FLAG_INIT <var>implementation-defined</var>

extern const atomic_flag atomic_global_fence_compatibility;

bool atomic_flag_test_and_set( volatile atomic_flag* );
bool atomic_flag_test_and_set_explicit( volatile atomic_flag*,
         memory_order );
void atomic_flag_clear( volatile atomic_flag* );
void atomic_flag_clear_explicit( volatile atomic_flag*, memory_order );
void atomic_flag_fence( const volatile atomic_flag*, memory_order );

typedef struct atomic_bool
{
    bool is_lock_free() const volatile;
    void store( bool, memory_order = memory_order_seq_cst ) volatile;
    bool load( memory_order = memory_order_seq_cst ) volatile;
    bool swap( bool, memory_order = memory_order_seq_cst ) volatile;
    bool compare_swap( bool&amp;, bool, memory_order, memory_order ) volatile;
    bool compare_swap( bool&amp;, bool,
                       memory_order = memory_order_seq_cst ) volatile;
    void fence( memory_order ) const volatile;

    atomic_bool() = default;
    constexpr explicit atomic_bool( bool );
    atomic_bool( const atomic_bool&amp; ) = delete;
    atomic_bool&amp; operator =( const atomic_bool &amp; ) = delete;
    bool operator =( bool ) volatile;
} atomic_bool;

bool atomic_is_lock_free( const volatile atomic_bool* );
void atomic_store( volatile atomic_bool*, bool );
void atomic_store_explicit( volatile atomic_bool*, bool, memory_order );
bool atomic_load( volatile atomic_bool* );
bool atomic_load_explicit( volatile atomic_bool*, memory_order );
bool atomic_swap( volatile atomic_bool*, bool );
bool atomic_swap_explicit( volatile atomic_bool*, bool, memory_order );
bool atomic_compare_swap( volatile atomic_bool*, bool*, bool );
bool atomic_compare_swap_explicit( volatile atomic_bool*, bool*, bool,
                                   memory_order, memory_order );
void atomic_fence( const volatile atomic_bool*, memory_order );

typedef struct atomic_address
{
    bool is_lock_free() const volatile;
    void store( void*, memory_order = memory_order_seq_cst ) volatile;
    void* load( memory_order = memory_order_seq_cst ) volatile;
    void* swap( void*, memory_order = memory_order_seq_cst ) volatile;
    bool compare_swap( void*&amp;, void*,
                       memory_order, memory_order ) volatile;
    bool compare_swap( void*&amp;, void*,
                       memory_order = memory_order_seq_cst ) volatile;
    void fence( memory_order ) const volatile;
    void* fetch_add( ptrdiff_t,
                     memory_order = memory_order_seq_cst ) volatile;
    void* fetch_sub( ptrdiff_t,
                     memory_order = memory_order_seq_cst ) volatile;

    atomic_address() = default;
    constexpr explicit atomic_address( void* );
    atomic_address( const atomic_address&amp; ) = delete;
    atomic_address&amp; operator =( const atomic_address&amp; ) = delete;
    void* operator =( void* ) volatile;
    void* operator +=( ptrdiff_t ) volatile;
    void* operator -=( ptrdiff_t ) volatile;
} atomic_address;

bool atomic_is_lock_free( const volatile atomic_address* );
void atomic_store( volatile atomic_address*, void* );
void atomic_store_explicit( volatile atomic_address*, void*, memory_order );
void* atomic_load( volatile atomic_address* );
void* atomic_load_explicit( volatile atomic_address*, memory_order );
void* atomic_swap( volatile atomic_address*, void* );
void* atomic_swap_explicit( volatile atomic_address*, void*, memory_order );
bool atomic_compare_swap( volatile atomic_address*, void**, void* );
bool atomic_compare_swap_explicit( volatile atomic_address*, void**, void*,
                                   memory_order, memory_order );
void atomic_fence( const volatile atomic_address*, memory_order );
void* atomic_fetch_add( volatile atomic_address*, ptrdiff_t );
void* atomic_fetch_add_explicit( volatile atomic_address*, ptrdiff_t,
                                 memory_order );
void* atomic_fetch_sub( volatile atomic_address*, ptrdiff_t );
void* atomic_fetch_sub_explicit( volatile atomic_address*, ptrdiff_t,
                                 memory_order );
</samp></pre>

<p>
And for each of
the <samp><var>integral</var></samp> (character and integer) types listed above,

<pre><samp>
typedef struct atomic_<var>integral</var>
{
    bool is_lock_free() const volatile;
    void store( <var>integral</var>, memory_order = memory_order_seq_cst ) volatile;
    <var>integral</var> load( memory_order = memory_order_seq_cst ) volatile;
    <var>integral</var> swap( <var>integral</var>,
                   memory_order = memory_order_seq_cst ) volatile;
    bool compare_swap( <var>integral</var>&amp;, <var>integral</var>,
                       memory_order, memory_order ) volatile;
    bool compare_swap( <var>integral</var>&amp;, <var>integral</var>,
                       memory_order = memory_order_seq_cst ) volatile;
    void fence( memory_order ) const volatile;
    <var>integral</var> fetch_add( <var>integral</var>,
                        memory_order = memory_order_seq_cst ) volatile;
    <var>integral</var> fetch_sub( <var>integral</var>,
                        memory_order = memory_order_seq_cst ) volatile;
    <var>integral</var> fetch_and( <var>integral</var>,
                        memory_order = memory_order_seq_cst ) volatile;
    <var>integral</var> fetch_or( <var>integral</var>,
                        memory_order = memory_order_seq_cst ) volatile;
    <var>integral</var> fetch_xor( <var>integral</var>,
                        memory_order = memory_order_seq_cst ) volatile;

    atomic_<var>integral</var>() = default;
    constexpr explicit atomic_<var>integral</var>( <var>integral</var> );
    atomic_<var>integral</var>( const atomic_<var>integral</var>&amp; ) = delete;
    atomic_<var>integral</var>&amp; operator =( const atomic_<var>integral</var> &amp; ) = delete;
    <var>integral</var> operator =( <var>integral</var> ) volatile;
    <var>integral</var> operator ++( int ) volatile;
    <var>integral</var> operator --( int ) volatile;
    <var>integral</var> operator ++() volatile;
    <var>integral</var> operator --() volatile;
    <var>integral</var> operator +=( <var>integral</var> ) volatile;
    <var>integral</var> operator -=( <var>integral</var> ) volatile;
    <var>integral</var> operator &amp;=( <var>integral</var> ) volatile;
    <var>integral</var> operator |=( <var>integral</var> ) volatile;
    <var>integral</var> operator ^=( <var>integral</var> ) volatile;
} atomic_<var>integral</var>;

bool atomic_is_lock_free( const volatile atomic_<var>integral</var>* );
void atomic_store( volatile atomic_<var>integral</var>*, <var>integral</var> );
void atomic_store_explicit( volatile atomic_<var>integral</var>*, <var>integral</var>,
                            memory_order );
<var>integral</var> atomic_load( volatile atomic_<var>integral</var>* );
<var>integral</var> atomic_load_explicit( volatile atomic_<var>integral</var>*, memory_order );
<var>integral</var> atomic_swap( volatile atomic_<var>integral</var>*, <var>integral</var> );
<var>integral</var> atomic_swap_explicit( volatile atomic_<var>integral</var>*, <var>integral</var>,
                               memory_order );
bool atomic_compare_swap( volatile atomic_<var>integral</var>*, <var>integral</var>*, <var>integral</var> );
bool atomic_compare_swap_explicit( volatile atomic_<var>integral</var>*, <var>integral</var>*,
                                   <var>integral</var>, memory_order, memory_order );
void atomic_fence( const volatile atomic_<var>integral</var>*, memory_order );
<var>integral</var> atomic_fetch_add( volatile atomic_<var>integral</var>*, <var>integral</var> );
<var>integral</var> atomic_fetch_add_explicit( volatile atomic_<var>integral</var>*, <var>integral</var>,
                                    memory_order );
<var>integral</var> atomic_fetch_sub( volatile atomic_<var>integral</var>*, <var>integral</var> );
<var>integral</var> atomic_fetch_sub_explicit( volatile atomic_<var>integral</var>*, <var>integral</var>,
                                    memory_order );
<var>integral</var> atomic_fetch_and( volatile atomic_<var>integral</var>*, <var>integral</var> );
<var>integral</var> atomic_fetch_and_explicit( volatile atomic_<var>integral</var>*, <var>integral</var>,
                                    memory_order );
<var>integral</var> atomic_fetch_or( volatile atomic_<var>integral</var>*, <var>integral</var> );
<var>integral</var> atomic_fetch_or_explicit( volatile atomic_<var>integral</var>*, <var>integral</var>,
                                    memory_order );
<var>integral</var> atomic_fetch_xor( volatile atomic_<var>integral</var>*, <var>integral</var> );
<var>integral</var> atomic_fetch_xor_explicit( volatile atomic_<var>integral</var>*, <var>integral</var>,
                                    memory_order );

template&lt; TriviallyCopyConstructible T &gt;
    requires AtomicComparable< T >
struct atomic
{
    bool is_lock_free() const volatile;
    void store( T, memory_order = memory_order_seq_cst ) volatile;
    T load( memory_order = memory_order_seq_cst ) volatile;
    T swap( T, memory_order = memory_order_seq_cst ) volatile;
    bool compare_swap( T&amp;, T, memory_order, memory_order ) volatile;
    bool compare_swap( T&amp;, T,
                       memory_order = memory_order_seq_cst ) volatile;
    void fence( memory_order ) const volatile;

    atomic() = default;
    constexpr explicit atomic( T );
    atomic( const atomic&amp; ) = delete;
    atomic&amp; operator =( const atomic&amp; ) = delete;
    T operator =( T ) volatile;
};

template&lt;typename T&gt;
struct atomic&lt; T* &gt; : atomic_address
{
    T* fetch_add( ptrdiff_t, memory_order = memory_order_seq_cst ) volatile;
    T* fetch_sub( ptrdiff_t, memory_order = memory_order_seq_cst ) volatile;

    atomic() = default;
    constexpr explicit atomic( T* );
    atomic( const atomic&amp; ) = delete;
    atomic&amp; operator =( const atomic&amp; ) = delete;

    T* operator =( T* ) volatile;
    T* operator ++( int ) volatile;
    T* operator --( int ) volatile;
    T* operator ++() volatile;
    T* operator --() volatile;
    T* operator +=( ptrdiff_t ) volatile;
    T* operator -=( ptrdiff_t ) volatile;
};

template&lt;&gt;
struct atomic&lt; <var>integral</var> &gt; : atomic_<var>integral</var>
{
    atomic() = default;
    constexpr explicit atomic( <var>integral</var> );
    atomic( const atomic&amp; ) = delete;
    atomic&amp; operator =( const atomic&amp; ) = delete;

    <var>integral</var> operator =( <var>integral</var> ) volatile;
};

} // namespace std
</samp></pre>

<p>
The standard provides two headers,
<samp>cstdatomic</samp> and
<samp>stdatomic.h</samp>.
The <samp>cstdatomic</samp> header
defines the types and functions in namespace <samp>std</samp>.
The <samp>stdatomic.h</samp> header
declares the types and functions in namespace <samp>std</samp>
and has using declarations for those types and functions
in the global namespace.

</blockquote>

<h2><a name="Implementation">Implementation</a></h2>

<p>
This proposal embeds
an example, minimally conforming implementation
for both C and C++.
The implementation uses a hash table of flags
and does busy waiting on the flags.

<h3><a name="ImplPresent">Notes on the Presentation</a></h3>

<p>
The proposal marks the defined interface
with the <samp>&lt;code&gt;</samp> font tag,
which typically renders in a <samp>teletype</samp> font.

<p>
The proposal marks the example implementation
with the <samp>&lt;var&gt;</samp> font tag
within the <samp>&lt;code&gt;</samp> font tag,
which typically renders in an <samp><var>italic teletype</var></samp> font.
This example implementation is <em>not</em> part of the standard;
it is evidence of implementability.

<p>
The embedded source is a bash script
that generates the C and C++ source files.
We have taken this approach
because the definitions have a high degree of redundancy,
which would otherwise interfere with the readability of the document.

<p>
To extract the bash script from the HTML source,
use the following sed script.
(The bash script will also generate the sed script.)
<pre><code>
echo n2427.sed
cat &lt;&lt;EOF &gt;<var>n2427.sed</var>

<var>1,/&lt;code&gt;/        d
/&lt;\/code&gt;/,/&lt;code&gt;/    d
            s|&lt;var&gt;||g
            s|&lt;/var&gt;||g
            s|&amp;lt;|&lt;|g
            s|&amp;gt;|&gt;|g
            s|&amp;amp;|\&amp;|g</var>

EOF
</code></pre>

<p>
To compile the enclosed sources and examples,
use the following Makefile.
(The bash script will also generate the Makefile.)
<pre><code>
echo Makefile
cat &lt;&lt;EOF &gt;<var>Makefile</var>

<var>default : test

n2427.bash : n2427.html
	sed -f n2427.sed n2427.html &gt; n2427.bash

stdatomic.h cstdatomic impatomic.h impatomic.c n2427.c : n2427.bash
	bash n2427.bash

impatomic.o : impatomic.h impatomic.c
	gcc -std=c99 -c impatomic.c

n2427.c.exe : n2427.c stdatomic.h impatomic.o
	gcc -std=c99 -o n2427.c.exe n2427.c impatomic.o

n2427.c++.exe : n2427.c stdatomic.h impatomic.o
	g++ -o n2427.c++.exe n2427.c impatomic.o

test : n2427.c.exe n2427.c++.exe

clean :
	rm -f n2427.bash stdatomic.h cstdatomic impatomic.h impatomic.c
	rm -f impatomic.o n2427.c.exe n2427.c++.exe</var>

EOF
</code></pre>

<h3><a name="ImplFiles">Implementation Files</a></h3>

<p>
As is common practice,
we place the common portions of the C and C++ standard headers
in a separate implementation header.

<p>
The implementation header includes standard headers
to obtain basic typedefs.
<pre><code>
echo impatomic.h includes
cat &lt;&lt;EOF &gt;<var>impatomic.h</var>

#ifdef __cplusplus
#include &lt;cstddef&gt;
namespace std {
#else
#include &lt;stddef.h&gt;
#include &lt;stdbool.h&gt;
#endif

EOF
</code></pre>

<p>
The corresponding implementation source
includes the implementation header and <samp>stdint.h</samp>.
<pre><code>
echo impatomic.c includes
cat &lt;&lt;EOF &gt;<var>impatomic.c</var>

#include &lt;stdint.h&gt;
#include "<var>impatomic.h</var>"

EOF
</code></pre>

<h3><a name="ImplCPP0X">C++0x Features</a></h3>

<p>
Because current compilers do not support the new C++0x features,
we have surrounded these with a macro to conditionally remove them.

<pre><code>
echo impatomic.h CPP0X
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

<var>#define CPP0X( feature )</var>

EOF
</code></pre>

<h3><a name="ImplOrder">Memory Order</a></h3>

<pre><code>
echo impatomic.h order
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

typedef enum memory_order {
    memory_order_relaxed, memory_order_acquire, memory_order_release,
    memory_order_acq_rel, memory_order_seq_cst
} memory_order;

EOF
</code></pre>

<h3><a name="ImplFlag">Flag Type and Operations</a></h3>

<p>
To aid the emulated implementation,
the example implementation includes a predefined hash table of locks
implemented via flags.

<pre><code>
echo impatomic.h flag
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

typedef struct atomic_flag
{
#ifdef __cplusplus
    bool test_and_set( memory_order = memory_order_seq_cst ) volatile;
    void clear( memory_order = memory_order_seq_cst ) volatile;
    void fence( memory_order ) const volatile;

    CPP0X( atomic_flag() = default; )
    CPP0X( atomic_flag( const atomic_flag&amp; ) = delete; )
    atomic_flag&amp; operator =( const atomic_flag&amp; ) CPP0X(=delete);

CPP0X(private:)
#endif
    <var>bool __f__</var>;
} atomic_flag;

#define ATOMIC_FLAG_INIT { <var>false</var> }

#ifdef __cplusplus
extern "C" {
#endif

<var>extern</var> bool atomic_flag_test_and_set( volatile atomic_flag* );
<var>extern</var> bool atomic_flag_test_and_set_explicit
( volatile atomic_flag*, memory_order );
<var>extern</var> void atomic_flag_clear( volatile atomic_flag* );
<var>extern</var> void atomic_flag_clear_explicit
( volatile atomic_flag*, memory_order );
<var>extern</var> void atomic_flag_fence
( const volatile atomic_flag*, memory_order );
<var>extern void __atomic_flag_wait__
( volatile atomic_flag* );</var>
<var>extern void __atomic_flag_wait_explicit__
( volatile atomic_flag*, memory_order );</var>
<var>extern volatile atomic_flag* __atomic_flag_for_address__
( const volatile void* __z__ )
__attribute__((const))</var>;

#ifdef __cplusplus
}
#endif

#ifdef __cplusplus

inline bool atomic_flag::test_and_set( memory_order __x__ ) volatile
{ return atomic_flag_test_and_set_explicit( this, __x__ ); }

inline void atomic_flag::clear( memory_order __x__ ) volatile
{ atomic_flag_clear_explicit( this, __x__ ); }

inline void atomic_flag::fence( memory_order __x__ ) const volatile
{ atomic_flag_fence( this, __x__ ); }

#endif

EOF
</code></pre>

<p>
The wait operation may be implemented with busy-waiting,
and hence must be used with care.

<p>
The for_address function returns the address of a flag
for the given address.
Multiple argument values may map to a single flag,
and the implementation of locks may itself use these flags,
so no operation should attempt to acquire any other flag or lock
while holding a flag obtained for an address; doing so risks deadlock.

<p>
The prototype implementation of flags
uses the
<a href="http://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html#Atomic-Builtins"><samp>__sync</samp> macros</a>
from the <a href="http://gcc.gnu.org/">GNU C/C++ compiler</a>,
when available,
and otherwise uses a non-atomic implementation
with the expectation that vendors will replace it.
It might even be implemented with, for example,
<samp>pthread_mutex_trylock</samp>,
in which case the internal flag wait function
might just be <samp>pthread_mutex_lock</samp>.
This would of course tend to
make <samp>atomic_flag</samp> larger than necessary.

<p>
The prototype implementation of flags is implemented in C.
<pre><code>
echo impatomic.c flag
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.c</var>

<var>#if defined(__GNUC__)
#if __GNUC__ &gt; 4 || (__GNUC__ == 4 &amp;&amp; __GNUC_MINOR__ &gt; 0)
#define USE_SYNC
#endif
#endif</var>

bool atomic_flag_test_and_set_explicit
( volatile atomic_flag* __a__, memory_order __x__ )
<var>{
#ifdef USE_SYNC
    if ( __x__ &gt;= memory_order_acq_rel )
        __sync_synchronize();
    return __sync_lock_test_and_set( &amp;(__a__-&gt;__f__), 1 );
#else
    bool result = __a__-&gt;__f__;
    __a__-&gt;__f__ = true;
    return result;
#endif
}</var>

bool atomic_flag_test_and_set( volatile atomic_flag* __a__ )
{ return atomic_flag_test_and_set_explicit( __a__, memory_order_seq_cst ); }

void atomic_flag_clear_explicit
( volatile atomic_flag* __a__, memory_order __x__ )
<var>{
#ifdef USE_SYNC
    __sync_lock_release( &amp;(__a__-&gt;__f__) );
    if ( __x__ &gt;= memory_order_acq_rel )
        __sync_synchronize();
#else
    __a__-&gt;__f__ = false;
#endif
} </var>

void atomic_flag_clear( volatile atomic_flag* __a__ )
{ atomic_flag_clear_explicit( __a__, memory_order_seq_cst ); }

void atomic_flag_fence( const volatile atomic_flag* __a__, memory_order __x__ )
<var>{ 
#ifdef USE_SYNC
    __sync_synchronize();
#endif
} </var>

</code></pre>

<p>
Note that the following implementation of wait,
though correct, is almost always inadequate in practice
because it generates high contention.
Some form of exponential backoff would prevent excessive contention.

<pre><code>
<var>void __atomic_flag_wait__( volatile atomic_flag* __a__ )
{ while ( atomic_flag_test_and_set( __a__ ) ); }</var>

<var>void __atomic_flag_wait_explicit__( volatile atomic_flag* __a__,
                                    memory_order __x__ )
{ while ( atomic_flag_test_and_set_explicit( __a__, __x__ ) ); }</var>

<var>#define LOGSIZE 4

static atomic_flag volatile __atomic_flag_anon_table__[ 1 &lt;&lt; LOGSIZE ] =
{
    ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT,
    ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT,
    ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT,
    ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT,
};</var>

<var>volatile atomic_flag* __atomic_flag_for_address__( const volatile void* __z__ )
{
    uintptr_t __u__ = (uintptr_t)__z__;
    __u__ += (__u__ &gt;&gt; 2) + (__u__ &lt;&lt; 4);
    __u__ += (__u__ &gt;&gt; 7) + (__u__ &lt;&lt; 5);
    __u__ += (__u__ &gt;&gt; 17) + (__u__ &lt;&lt; 13);
    if ( sizeof(uintptr_t) &gt; 4 ) __u__ += (__u__ &gt;&gt; 31);
    __u__ &amp;= ~((~(uintptr_t)0) &lt;&lt; LOGSIZE);
    return __atomic_flag_anon_table__ + __u__;
}</var>

EOF
</code></pre>

<h3><a name="ImplMacros">Implementation Macros</a></h3>

<p>
The remainder of the example implementation uses the following macros.
These macros exploit GNU extensions for
value-returning blocks (AKA statement expressions)
and <samp>__typeof__</samp>.

<p>
The macros rely on data fields of atomic structs being named <samp>__f__</samp>.
Other symbols used are
<samp>__a__</samp>=atomic, 
<samp>__e__</samp>=expected, 
<samp>__f__</samp>=field, 
<samp>__g__</samp>=flag, 
<samp>__m__</samp>=modified, 
<samp>__o__</samp>=operation, 
<samp>__r__</samp>=result, 
<samp>__p__</samp>=pointer to field,
<samp>__v__</samp>=value (for single evaluation),
<samp>__x__</samp>=memory-ordering, and
<samp>__y__</samp>=memory-ordering.

<pre><code>
echo impatomic.h macros implementation
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

<var>#define _ATOMIC_LOAD_( __a__, __x__ ) \\
({ volatile __typeof__((__a__)-&gt;__f__)* __p__ = &amp;((__a__)-&gt;__f__); \\
   volatile atomic_flag* __g__ = __atomic_flag_for_address__( __p__ ); \\
   __atomic_flag_wait_explicit__( __g__, __x__ ); \\
   __typeof__((__a__)-&gt;__f__) __r__ = *__p__; \\
   atomic_flag_clear_explicit( __g__, __x__ ); \\
   __r__; })</var>

<var>#define _ATOMIC_STORE_( __a__, __m__, __x__ ) \\
({ volatile __typeof__((__a__)-&gt;__f__)* __p__ = &amp;((__a__)-&gt;__f__); \\
   __typeof__(__m__) __v__ = (__m__); \\
   volatile atomic_flag* __g__ = __atomic_flag_for_address__( __p__ ); \\
   __atomic_flag_wait_explicit__( __g__, __x__ ); \\
   *__p__ = __v__; \\
   atomic_flag_clear_explicit( __g__, __x__ ); \\
   __v__; })</var>

<var>#define _ATOMIC_MODIFY_( __a__, __o__, __m__, __x__ ) \\
({ volatile __typeof__((__a__)-&gt;__f__)* __p__ = &amp;((__a__)-&gt;__f__); \\
   __typeof__(__m__) __v__ = (__m__); \\
   volatile atomic_flag* __g__ = __atomic_flag_for_address__( __p__ ); \\
   __atomic_flag_wait_explicit__( __g__, __x__ ); \\
   __typeof__((__a__)-&gt;__f__) __r__ = *__p__; \\
   *__p__ __o__ __v__; \\
   atomic_flag_clear_explicit( __g__, __x__ ); \\
   __r__; })</var>

<var>#define _ATOMIC_CMPSWP_( __a__, __e__, __m__, __x__ ) \\
({ volatile __typeof__((__a__)-&gt;__f__)* __p__ = &amp;((__a__)-&gt;__f__); \\
   __typeof__(__e__) __q__ = (__e__); \\
   __typeof__(__m__) __v__ = (__m__); \\
   bool __r__; \\
   volatile atomic_flag* __g__ = __atomic_flag_for_address__( __p__ ); \\
   __atomic_flag_wait_explicit__( __g__, __x__ ); \\
   __typeof__((__a__)-&gt;__f__) __t__ = *__p__; \\
   if ( __t__ == *__q__ ) { *__p__ = __v__; __r__ = true; } \\
   else { *__q__ = __t__; __r__ = false; } \\
   atomic_flag_clear_explicit( __g__, __x__ ); \\
   __r__; })</var>

<var>#define _ATOMIC_FENCE_( __a__, __x__ ) \\
({ volatile __typeof__((__a__)-&gt;__f__)* __p__ = &amp;((__a__)-&gt;__f__); \\
   volatile atomic_flag* __g__ = __atomic_flag_for_address__( __p__ ); \\
   atomic_flag_fence( __g__, __x__ ); \\
   })</var>

EOF
</code></pre>

<h3><a name="ImplLockFree">Lock-Free Macro</a></h3>

<pre><code>
echo impatomic.h lock-free macros
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

#define ATOMIC_INTEGRAL_LOCK_FREE <var>0</var>
#define ATOMIC_ADDRESS_LOCK_FREE <var>0</var>

EOF
</code></pre>

<h3><a name="ImplIntegral">Integral and Address Types</a></h3>

<p>
The standard defines atomic types
corresponding to booleans, addresses, integers, and for C++, wider characters.
These atomic types are defined in terms of a base type.

<p>
Each base type has two names in this proposal:
a short name, usually embedded within other identifiers,
and a long name, the C++ name of the type itself.
The mapping between them is as follows.
<pre><code>
bool="bool"
address="void*"

INTEGERS="char schar uchar short ushort int uint long ulong llong ullong"
char="char"
schar="signed char"
uchar="unsigned char"
short="short"
ushort="unsigned short"
int="int"
uint="unsigned int"
long="long"
ulong="unsigned long"
llong="long long"
ullong="unsigned long long"

CHARACTERS="wchar_t"
# CHARACTERS="char16_t char32_t wchar_t" // char*_t not yet in compilers
char16_t="char16_t"
char32_t="char32_t"
wchar_t="wchar_t"
</code></pre>

<p>
In addition to types, some operations also need two names,
one for embedding within other identifiers,
and one consisting of the operator.
<pre><code>
ADR_OPERATIONS="add sub"
INT_OPERATIONS="add sub and or xor"
add="+"
sub="-"
and="&amp;"
or="|"
xor="^"
</code></pre>

<h4><a name="ImplBoolean">Boolean</a></h4>
<pre><code>
echo impatomic.h type boolean
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

typedef struct atomic_bool
{
#ifdef __cplusplus
    bool is_lock_free() const volatile;
    void store( bool, memory_order = memory_order_seq_cst ) volatile;
    bool load( memory_order = memory_order_seq_cst ) volatile;
    bool swap( bool, memory_order = memory_order_seq_cst ) volatile;
    bool compare_swap ( bool&amp;, bool, memory_order, memory_order ) volatile;
    bool compare_swap ( bool&amp;, bool,
                        memory_order = memory_order_seq_cst) volatile;
    void fence( memory_order ) const volatile;

    CPP0X( atomic_bool() = default; )
    CPP0X( constexpr explicit atomic_bool( bool __v__ ) : __f__( __v__ ) { } )
    CPP0X( atomic_bool( const atomic_bool&amp; ) = delete; )
    atomic_bool&amp; operator =( const atomic_bool&amp; ) CPP0X(=delete);

    bool operator =( bool __v__ ) volatile
    { store( __v__ ); return __v__; }

    friend void atomic_store_explicit( volatile atomic_bool*, bool,
                                       memory_order );
    friend bool atomic_load_explicit( volatile atomic_bool*, memory_order );
    friend bool atomic_swap_explicit( volatile atomic_bool*, bool,
                                      memory_order );
    friend bool atomic_compare_swap_explicit( volatile atomic_bool*, bool*, bool,
                                              memory_order, memory_order );
    friend void atomic_fence( const volatile atomic_bool*, memory_order );

CPP0X(private:)
#endif
    <var>bool __f__;</var>
} atomic_bool;

EOF
</code></pre>

<h4><a name="ImplAddress">Address</a></h4>
<pre><code>
echo impatomic.h type address
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

typedef struct atomic_address
{
#ifdef __cplusplus
    bool is_lock_free() const volatile;
    void store( void*, memory_order = memory_order_seq_cst ) volatile;
    void* load( memory_order = memory_order_seq_cst ) volatile;
    void* swap( void*, memory_order = memory_order_seq_cst ) volatile;
    bool compare_swap( void*&amp;, void*, memory_order, memory_order ) volatile;
    bool compare_swap( void*&amp;, void*,
                       memory_order = memory_order_seq_cst ) volatile;
    void fence( memory_order ) const volatile;
    void* fetch_add( ptrdiff_t, memory_order = memory_order_seq_cst ) volatile;
    void* fetch_sub( ptrdiff_t, memory_order = memory_order_seq_cst ) volatile;

    CPP0X( atomic_address() = default; )
    CPP0X( constexpr explicit atomic_address( void* __v__ ) : __f__( __v__) { } )
    CPP0X( atomic_address( const atomic_address&amp; ) = delete; )
    atomic_address&amp; operator =( const atomic_address &amp; ) CPP0X(=delete);

    void* operator =( void* __v__ ) volatile
    { store( __v__ ); return __v__; }

    void* operator +=( ptrdiff_t __v__ ) volatile
    { return fetch_add( __v__ ); }

    void* operator -=( ptrdiff_t __v__ ) volatile
    { return fetch_sub( __v__ ); }

    friend void atomic_store_explicit( volatile atomic_address*, void*,
                                       memory_order );
    friend void* atomic_load_explicit( volatile atomic_address*, memory_order );
    friend void* atomic_swap_explicit( volatile atomic_address*, void*,
                                       memory_order );
    friend bool atomic_compare_swap_explicit( volatile atomic_address*,
                              void**, void*, memory_order, memory_order );
    friend void atomic_fence( const volatile atomic_address*, memory_order );
    friend void* atomic_fetch_add_explicit( volatile atomic_address*, ptrdiff_t,
                                            memory_order );
    friend void* atomic_fetch_sub_explicit( volatile atomic_address*, ptrdiff_t,
                                            memory_order );

CPP0X(private:)
#endif
    void* __f__;
} atomic_address;

EOF
</code></pre>

<h4><a name="ImplIntegers">Integers</a></h4>
<pre><code>
echo impatomic.h type integers
for TYPEKEY in ${INTEGERS}
do
TYPENAME=${!TYPEKEY}
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

typedef struct atomic_${TYPEKEY}
{
#ifdef __cplusplus
    bool is_lock_free() const volatile;
    void store( ${TYPENAME},
                memory_order = memory_order_seq_cst ) volatile;
    ${TYPENAME} load( memory_order = memory_order_seq_cst ) volatile;
    ${TYPENAME} swap( ${TYPENAME},
                      memory_order = memory_order_seq_cst ) volatile;
    bool compare_swap( ${TYPENAME}&amp;, ${TYPENAME},
                       memory_order, memory_order ) volatile;
    bool compare_swap( ${TYPENAME}&amp;, ${TYPENAME},
                       memory_order = memory_order_seq_cst ) volatile;
    void fence( memory_order ) const volatile;
    ${TYPENAME} fetch_add( ${TYPENAME},
                           memory_order = memory_order_seq_cst ) volatile;
    ${TYPENAME} fetch_sub( ${TYPENAME},
                           memory_order = memory_order_seq_cst ) volatile;
    ${TYPENAME} fetch_and( ${TYPENAME},
                           memory_order = memory_order_seq_cst ) volatile;
    ${TYPENAME} fetch_or( ${TYPENAME},
                           memory_order = memory_order_seq_cst ) volatile;
    ${TYPENAME} fetch_xor( ${TYPENAME},
                           memory_order = memory_order_seq_cst ) volatile;

    CPP0X( atomic_${TYPEKEY}() = default; )
    CPP0X( constexpr atomic_${TYPEKEY}( ${TYPENAME} __v__ ) : __f__( __v__) { } )
    CPP0X( atomic_${TYPEKEY}( const atomic_${TYPEKEY}&amp; ) = delete; )
    atomic_${TYPEKEY}&amp; operator =( const atomic_${TYPEKEY}&amp; ) CPP0X(=delete);

    ${TYPENAME} operator =( ${TYPENAME} __v__ ) volatile
    { store( __v__ ); return __v__; }

    ${TYPENAME} operator ++( int ) volatile
    { return fetch_add( 1 ); }

    ${TYPENAME} operator --( int ) volatile
    { return fetch_sub( 1 ); }

    ${TYPENAME} operator ++() volatile
    { return fetch_add( 1 ) + 1; }

    ${TYPENAME} operator --() volatile
    { return fetch_sub( 1 ) - 1; }

    ${TYPENAME} operator +=( ${TYPENAME} __v__ ) volatile
    { return fetch_add( __v__ ) + __v__; }

    ${TYPENAME} operator -=( ${TYPENAME} __v__ ) volatile
    { return fetch_sub( __v__ ) - __v__; }

    ${TYPENAME} operator &amp;=( ${TYPENAME} __v__ ) volatile
    { return fetch_and( __v__ ) &amp; __v__; }

    ${TYPENAME} operator |=( ${TYPENAME} __v__ ) volatile
    { return fetch_or( __v__ ) | __v__; }

    ${TYPENAME} operator ^=( ${TYPENAME} __v__ ) volatile
    { return fetch_xor( __v__ ) ^ __v__; }

    friend void atomic_store_explicit( volatile atomic_${TYPEKEY}*, ${TYPENAME},
                                       memory_order );
    friend ${TYPENAME} atomic_load_explicit( volatile atomic_${TYPEKEY}*,
                                             memory_order );
    friend ${TYPENAME} atomic_swap_explicit( volatile atomic_${TYPEKEY}*,
                                             ${TYPENAME}, memory_order );
    friend bool atomic_compare_swap_explicit( volatile atomic_${TYPEKEY}*,
                      ${TYPENAME}*, ${TYPENAME}, memory_order, memory_order );
    friend void atomic_fence( const volatile atomic_${TYPEKEY}*, memory_order );
    friend ${TYPENAME} atomic_fetch_add_explicit( volatile atomic_${TYPEKEY}*,
                                                  ${TYPENAME}, memory_order );
    friend ${TYPENAME} atomic_fetch_sub_explicit( volatile atomic_${TYPEKEY}*,
                                                  ${TYPENAME}, memory_order );
    friend ${TYPENAME} atomic_fetch_and_explicit( volatile atomic_${TYPEKEY}*,
                                                  ${TYPENAME}, memory_order );
    friend ${TYPENAME} atomic_fetch_or_explicit(  volatile atomic_${TYPEKEY}*,
                                                  ${TYPENAME}, memory_order );
    friend ${TYPENAME} atomic_fetch_xor_explicit( volatile atomic_${TYPEKEY}*,
                                                  ${TYPENAME}, memory_order );

CPP0X(private:)
#endif
    <var>${TYPENAME} __f__;</var>
} atomic_${TYPEKEY};

EOF
done
</code></pre>

<h4><a name="ImplTypedefs">Integer Typedefs</a></h4>

<p>
The following typedefs
support atomic versions
of the <samp>cstdint</samp> and <samp>stdint.h</samp> types.
<pre><code>
echo impatomic.h typedefs integers
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

typedef <var>atomic_schar</var> atomic_int_least8_t;
typedef <var>atomic_uchar</var> atomic_uint_least8_t;
typedef <var>atomic_short</var> atomic_int_least16_t;
typedef <var>atomic_ushort</var> atomic_uint_least16_t;
typedef <var>atomic_int</var> atomic_int_least32_t;
typedef <var>atomic_uint</var> atomic_uint_least32_t;
typedef <var>atomic_llong</var> atomic_int_least64_t;
typedef <var>atomic_ullong</var> atomic_uint_least64_t;

typedef <var>atomic_schar</var> atomic_int_fast8_t;
typedef <var>atomic_uchar</var> atomic_uint_fast8_t;
typedef <var>atomic_short</var> atomic_int_fast16_t;
typedef <var>atomic_ushort</var> atomic_uint_fast16_t;
typedef <var>atomic_int</var> atomic_int_fast32_t;
typedef <var>atomic_uint</var> atomic_uint_fast32_t;
typedef <var>atomic_llong</var> atomic_int_fast64_t;
typedef <var>atomic_ullong</var> atomic_uint_fast64_t;

typedef <var>atomic_long</var> atomic_intptr_t;
typedef <var>atomic_ulong</var> atomic_uintptr_t;

typedef <var>atomic_long</var> atomic_ssize_t;
typedef <var>atomic_ulong</var> atomic_size_t;

typedef <var>atomic_long</var> atomic_ptrdiff_t;

typedef <var>atomic_llong</var> atomic_intmax_t;
typedef <var>atomic_ullong</var> atomic_uintmax_t;

EOF
</code></pre>

<h4><a name="ImplCharacters">Characters</a></h4>
<pre><code>
echo impatomic.h type characters
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

#ifdef __cplusplus

EOF

for TYPEKEY in ${CHARACTERS}
do
TYPENAME=${!TYPEKEY}
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

typedef struct atomic_${TYPEKEY}
{
#ifdef __cplusplus
    bool is_lock_free() const volatile;
    void store( ${TYPENAME}, memory_order = memory_order_seq_cst ) volatile;
    ${TYPENAME} load( memory_order = memory_order_seq_cst ) volatile;
    ${TYPENAME} swap( ${TYPENAME},
                      memory_order = memory_order_seq_cst ) volatile;
    bool compare_swap( ${TYPENAME}&amp;, ${TYPENAME},
                       memory_order, memory_order ) volatile;
    bool compare_swap( ${TYPENAME}&amp;, ${TYPENAME},
                       memory_order = memory_order_seq_cst ) volatile;
    void fence( memory_order ) const volatile;
    ${TYPENAME} fetch_add( ${TYPENAME},
                           memory_order = memory_order_seq_cst ) volatile;
    ${TYPENAME} fetch_sub( ${TYPENAME},
                           memory_order = memory_order_seq_cst ) volatile;
    ${TYPENAME} fetch_and( ${TYPENAME},
                           memory_order = memory_order_seq_cst ) volatile;
    ${TYPENAME} fetch_or( ${TYPENAME},
                           memory_order = memory_order_seq_cst ) volatile;
    ${TYPENAME} fetch_xor( ${TYPENAME},
                           memory_order = memory_order_seq_cst ) volatile;

    CPP0X( atomic_${TYPEKEY}() = default; )
    CPP0X( constexpr atomic_${TYPEKEY}( ${TYPENAME} __v__ ) : __f__( __v__) { } )
    CPP0X( atomic_${TYPEKEY}( const atomic_${TYPEKEY}&amp; ) = delete; )
    atomic_${TYPEKEY}&amp; operator =( const atomic_${TYPEKEY}&amp; ) CPP0X(=delete);

    ${TYPENAME} operator =( ${TYPENAME} __v__ ) volatile
    { store( __v__ ); return __v__; }

    ${TYPENAME} operator ++( int ) volatile
    { return fetch_add( 1 ); }

    ${TYPENAME} operator --( int ) volatile
    { return fetch_sub( 1 ); }

    ${TYPENAME} operator ++() volatile
    { return fetch_add( 1 ) + 1; }

    ${TYPENAME} operator --() volatile
    { return fetch_sub( 1 ) - 1; }

    ${TYPENAME} operator +=( ${TYPENAME} __v__ ) volatile
    { return fetch_add( __v__ ) + __v__; }

    ${TYPENAME} operator -=( ${TYPENAME} __v__ ) volatile
    { return fetch_sub( __v__ ) - __v__; }

    ${TYPENAME} operator &amp;=( ${TYPENAME} __v__ ) volatile
    { return fetch_and( __v__ ) &amp; __v__; }

    ${TYPENAME} operator |=( ${TYPENAME} __v__ ) volatile
    { return fetch_or( __v__ ) | __v__; }

    ${TYPENAME} operator ^=( ${TYPENAME} __v__ ) volatile
    { return fetch_xor( __v__ ) ^ __v__; }

    friend void atomic_store_explicit( volatile atomic_${TYPEKEY}*, ${TYPENAME},
                                       memory_order );
    friend ${TYPENAME} atomic_load_explicit( volatile atomic_${TYPEKEY}*,
                                             memory_order );
    friend ${TYPENAME} atomic_swap_explicit( volatile atomic_${TYPEKEY}*,
                                             ${TYPENAME}, memory_order );
    friend bool atomic_compare_swap_explicit( volatile atomic_${TYPEKEY}*,
                    ${TYPENAME}*, ${TYPENAME}, memory_order, memory_order );
    friend void atomic_fence( const volatile atomic_${TYPEKEY}*, memory_order );
    friend ${TYPENAME} atomic_fetch_add_explicit( volatile atomic_${TYPEKEY}*,
                                                  ${TYPENAME}, memory_order );
    friend ${TYPENAME} atomic_fetch_sub_explicit( volatile atomic_${TYPEKEY}*,
                                                  ${TYPENAME}, memory_order );
    friend ${TYPENAME} atomic_fetch_and_explicit( volatile atomic_${TYPEKEY}*,
                                                  ${TYPENAME}, memory_order );
    friend ${TYPENAME} atomic_fetch_or_explicit( volatile atomic_${TYPEKEY}*,
                                                  ${TYPENAME}, memory_order );
    friend ${TYPENAME} atomic_fetch_xor_explicit( volatile atomic_${TYPEKEY}*,
                                                  ${TYPENAME}, memory_order );

CPP0X(private:)
#endif
    <var>${TYPENAME} __f__;</var>
} atomic_${TYPEKEY};

EOF
done

cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

#else

typedef <var>atomic_int_least16_t</var> atomic_char16_t;
typedef <var>atomic_int_least32_t</var> atomic_char32_t;
typedef <var>atomic_int_least32_t</var> atomic_wchar_t;

#endif

EOF
</code></pre>

<h3><a name="ImplTemplate">Template Types</a></h3>

<p>
This section defines the <samp>atomic</samp> template
and its various specializations.</p>

<h4><a name="ImplGeneric">Fully Generic Type</a></h4>

<p>
This minimal implementation does not specialize on size.

<pre><code>
echo impatomic.h type generic
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

#ifdef __cplusplus

template&lt; typename T &gt;
struct atomic
{
#ifdef __cplusplus

    bool is_lock_free() const volatile;
    void store( T, memory_order = memory_order_seq_cst ) volatile;
    T load( memory_order = memory_order_seq_cst ) volatile;
    T swap( T __v__, memory_order = memory_order_seq_cst ) volatile;
    bool compare_swap( T&amp;, T, memory_order, memory_order ) volatile;
    bool compare_swap( T&amp;, T, memory_order = memory_order_seq_cst ) volatile;
    void fence( memory_order ) const volatile;

    CPP0X( atomic() = default; )
    CPP0X( constexpr explicit atomic( T __v__ ) : __f__( __v__ ) { } )
    CPP0X( atomic( const atomic&amp; ) = delete; )
    atomic&amp; operator =( const atomic&amp; ) CPP0X(=delete);

    T operator =( T __v__ ) volatile
    { store( __v__ ); return __v__; }

CPP0X(private:)
#endif
    T <var>__f__</var>;
};

#endif
EOF
</code></pre>

<h4><a name="ImplPointer">Pointer Partial Specialization</a></h4>
<pre><code>
echo impatomic.h type pointer
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

#ifdef __cplusplus

template&lt;typename T&gt; struct atomic&lt; T* &gt; : atomic_address
{
    T* load( memory_order = memory_order_seq_cst ) volatile;
    T* swap( T*, memory_order = memory_order_seq_cst ) volatile;
    bool compare_swap( T*&amp;, T*, memory_order, memory_order ) volatile;
    bool compare_swap( T*&amp;, T*,
                       memory_order = memory_order_seq_cst ) volatile;
    T* fetch_add( ptrdiff_t, memory_order = memory_order_seq_cst ) volatile;
    T* fetch_sub( ptrdiff_t, memory_order = memory_order_seq_cst ) volatile;

    CPP0X( atomic() = default; )
    CPP0X( constexpr explicit atomic( T* __v__ ) : atomic_address( __v__ ) { } )
    CPP0X( atomic( const atomic&amp; ) = delete; )
    atomic&amp; operator =( const atomic&amp; ) CPP0X(=delete);

    T* operator =( T* __v__ ) volatile
    { store( __v__ ); return __v__; }

    T* operator ++( int ) volatile
    { return fetch_add( 1 ); }

    T* operator --( int ) volatile
    { return fetch_sub( 1 ); }

    T* operator ++() volatile
    { return fetch_add( 1 ) + 1; }

    T* operator --() volatile
    { return fetch_sub( 1 ) - 1; }

    T* operator +=( ptrdiff_t __v__ ) volatile
    { return fetch_add( __v__ ) + __v__; }

    T* operator -=( ptrdiff_t __v__ ) volatile
    { return fetch_sub( __v__ ) - __v__; }
};

#endif
EOF
</code></pre>

<h4><a name="ImplSpecial">Integral Full Specializations</a></h4>

<p>
We provide full specializations of the generic <samp>atomic</samp> template
for booleans, addresses, integers, and characters.
These specializations derive from the corresponding specific atomic types
to enable implicit reference conversions.
The implicitly declared copy assignment operator of each derived class
hides the base class assignment operators,
and so the value assignment operator must be explicitly redeclared.
<pre><code>
echo impatomic.h type specializations
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

#ifdef __cplusplus

EOF

for TYPEKEY in bool address ${INTEGERS} ${CHARACTERS}
do
TYPENAME=${!TYPEKEY}
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

template&lt;&gt; struct atomic&lt; ${TYPENAME} &gt; : atomic_${TYPEKEY}
{
    CPP0X( atomic() = default; )
    CPP0X( constexpr explicit atomic( ${TYPENAME} __v__ )
    : atomic_${TYPEKEY}( __v__ ) { } )
    CPP0X( atomic( const atomic&amp; ) = delete; )
    atomic&amp; operator =( const atomic&amp; ) CPP0X(=delete);

    ${TYPENAME} operator =( ${TYPENAME} __v__ ) volatile
    { store( __v__ ); return __v__; }
};

EOF
done

cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

#endif

EOF
</code></pre>

<h3><a name="ImplFunctions">C++ Core Functions</a></h3>

<p>
In C++, these operations are implemented as overloaded functions.
<pre><code>
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

#ifdef __cplusplus

EOF

echo impatomic.h functions ordinary basic
for TYPEKEY in bool address ${INTEGERS} ${CHARACTERS}
do
TYPENAME=${!TYPEKEY}
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

<var>inline</var> bool atomic_is_lock_free( const volatile atomic_${TYPEKEY}* __a__ )
<var>{ return false; }</var>

<var>inline</var> ${TYPENAME} atomic_load_explicit
( volatile atomic_${TYPEKEY}* __a__, memory_order __x__ )
<var>{ return _ATOMIC_LOAD_( __a__, __x__ ); }</var>

<var>inline</var> ${TYPENAME} atomic_load( volatile atomic_${TYPEKEY}* __a__ )
<var>{ return atomic_load_explicit( __a__, memory_order_seq_cst ); }</var>

<var>inline</var> void atomic_store_explicit
( volatile atomic_${TYPEKEY}* __a__, ${TYPENAME} __m__, memory_order __x__ )
<var>{ _ATOMIC_STORE_( __a__, __m__, __x__ ); }</var>

<var>inline</var> void atomic_store
( volatile atomic_${TYPEKEY}* __a__, ${TYPENAME} __m__ )
<var>{ atomic_store_explicit( __a__, __m__, memory_order_seq_cst ); }</var>

<var>inline</var> ${TYPENAME} atomic_swap_explicit
( volatile atomic_${TYPEKEY}* __a__, ${TYPENAME} __m__, memory_order __x__ )
<var>{ return _ATOMIC_MODIFY_( __a__, =, __m__, __x__ ); }</var>

<var>inline</var> ${TYPENAME} atomic_swap
( volatile atomic_${TYPEKEY}* __a__, ${TYPENAME} __m__ )
<var>{ return atomic_swap_explicit( __a__, __m__, memory_order_seq_cst ); }</var>

<var>inline</var> bool atomic_compare_swap_explicit
( volatile atomic_${TYPEKEY}* __a__, ${TYPENAME}* __e__, ${TYPENAME} __m__,
  memory_order __x__, memory_order __y__ )
<var>{ return _ATOMIC_CMPSWP_( __a__, __e__, __m__, __x__ ); }</var>

<var>inline</var> bool atomic_compare_swap
( volatile atomic_${TYPEKEY}* __a__, ${TYPENAME}* __e__, ${TYPENAME} __m__ )
<var>{ return atomic_compare_swap_explicit( __a__, __e__, __m__,
                 memory_order_seq_cst, memory_order_seq_cst ); }</var>

<var>inline</var> void atomic_fence
( const volatile atomic_${TYPEKEY}* __a__, memory_order __x__ )
<var>{ _ATOMIC_FENCE_( __a__, __x__ ); }</var>

EOF
done

echo impatomic.h functions address fetch
TYPEKEY=address
TYPENAME=${!TYPEKEY}

for FNKEY in ${ADR_OPERATIONS}
do
OPERATOR=${!FNKEY}

cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

<var>inline</var> ${TYPENAME} atomic_fetch_${FNKEY}_explicit
( volatile atomic_${TYPEKEY}* __a__, ptrdiff_t __m__, memory_order __x__ )
<var>{ ${TYPENAME} volatile* __p__ = &amp;((__a__)-&gt;__f__);
  volatile atomic_flag* __g__ = __atomic_flag_for_address__( __p__ );
  __atomic_flag_wait_explicit__( __g__, __x__ );
  ${TYPENAME} __r__ = *__p__;
  *__p__ = (${TYPENAME})((char*)(*__p__) ${OPERATOR} __m__);
  atomic_flag_clear_explicit( __g__, __x__ );
  return __r__; }</var>

<var>inline</var> ${TYPENAME} atomic_fetch_${FNKEY}
( volatile atomic_${TYPEKEY}* __a__, ptrdiff_t __m__ )
<var>{ return atomic_fetch_${FNKEY}_explicit( __a__, __m__, memory_order_seq_cst ); }</var>

EOF
done

echo impatomic.h functions integer fetch
for TYPEKEY in ${INTEGERS} ${CHARACTERS}
do
TYPENAME=${!TYPEKEY}

for FNKEY in ${INT_OPERATIONS}
do
OPERATOR=${!FNKEY}

cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

<var>inline</var> ${TYPENAME} atomic_fetch_${FNKEY}_explicit
( volatile atomic_${TYPEKEY}* __a__, ${TYPENAME} __m__, memory_order __x__ )
<var>{ return _ATOMIC_MODIFY_( __a__, ${OPERATOR}=, __m__, __x__ ); }</var>

<var>inline</var> ${TYPENAME} atomic_fetch_${FNKEY}
( volatile atomic_${TYPEKEY}* __a__, ${TYPENAME} __m__ )
<var>{ return atomic_fetch_${FNKEY}_explicit( __a__, __m__, memory_order_seq_cst ); }</var>

EOF
done
done
</code></pre>

<h3><a name="ImplCoreMacros">C Core Macros</a></h3>

<p>
For C, we need type-generic macros.
<pre><code>
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

#else

EOF

echo impatomic.h type-generic macros basic
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

#define atomic_is_lock_free( __a__ ) \\
<var>false</var>

#define atomic_load( __a__ ) \\
<var>_ATOMIC_LOAD_( __a__, memory_order_seq_cst )</var>

#define atomic_load_explicit( __a__, __x__ ) \\
<var>_ATOMIC_LOAD_( __a__, __x__ )</var>

#define atomic_store( __a__, __m__ ) \\
<var>_ATOMIC_STORE_( __a__, __m__, memory_order_seq_cst )</var>

#define atomic_store_explicit( __a__, __m__, __x__ ) \\
<var>_ATOMIC_STORE_( __a__, __m__, __x__ )</var>

#define atomic_swap( __a__, __m__ ) \\
<var>_ATOMIC_MODIFY_( __a__, =, __m__, memory_order_seq_cst )</var>

#define atomic_swap_explicit( __a__, __m__, __x__ ) \\
<var>_ATOMIC_MODIFY_( __a__, =, __m__, __x__ )</var>

#define atomic_compare_swap( __a__, __e__, __m__ ) \\
<var>_ATOMIC_CMPSWP_( __a__, __e__, __m__, memory_order_seq_cst )</var>

#define atomic_compare_swap_explicit( __a__, __e__, __m__, __x__, __y__ ) \\
<var>_ATOMIC_CMPSWP_( __a__, __e__, __m__, __x__ )</var>

#define atomic_fence( __a__, __x__ ) \\
<var>({ _ATOMIC_FENCE_( __a__, __x__ ); })</var>

EOF

echo impatomic.h type-generic macros fetch
for FNKEY in ${INT_OPERATIONS}
do
OPERATOR=${!FNKEY}

cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

#define atomic_fetch_${FNKEY}_explicit( __a__, __m__, __x__ ) \\
<var>_ATOMIC_MODIFY_( __a__, ${OPERATOR}=, __m__, __x__ )</var>

#define atomic_fetch_${FNKEY}( __a__, __m__ ) \\
<var>_ATOMIC_MODIFY_( __a__, ${OPERATOR}=, __m__, memory_order_seq_cst )</var>

EOF
done

cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

#endif

EOF
</code></pre>

<h3><a name="ImplMethods">C++ Methods</a></h3>

<p>
The core functions are verbose to use,
and so the proposal includes member function definitions
that are syntactically simpler.
The member operators are defined within the class definitions themselves.

<pre><code>
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

#ifdef __cplusplus

EOF

echo impatomic.h methods ordinary basic
for TYPEKEY in bool address ${INTEGERS} ${CHARACTERS}
do
TYPENAME=${!TYPEKEY}

cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

<var>inline</var> bool atomic_${TYPEKEY}::is_lock_free() const volatile
<var>{ return false; }</var>

<var>inline</var> void atomic_${TYPEKEY}::store
( ${TYPENAME} __m__, memory_order __x__ ) volatile
<var>{ atomic_store_explicit( this, __m__, __x__ ); }</var>

<var>inline</var> ${TYPENAME} atomic_${TYPEKEY}::load
( memory_order __x__ ) volatile
<var>{ return atomic_load_explicit( this, __x__ ); }</var>

<var>inline</var> ${TYPENAME} atomic_${TYPEKEY}::swap
( ${TYPENAME} __m__, memory_order __x__ ) volatile
<var>{ return atomic_swap_explicit( this, __m__, __x__ ); }</var>

<var>inline</var> bool atomic_${TYPEKEY}::compare_swap
( ${TYPENAME}&amp; __e__, ${TYPENAME} __m__,
  memory_order __x__, memory_order __y__ ) volatile
<var>{ return atomic_compare_swap_explicit( this, &amp;__e__, __m__, __x__, __y__ ); }</var>

<var>inline</var> bool atomic_${TYPEKEY}::compare_swap
( ${TYPENAME}&amp; __e__, ${TYPENAME} __m__, memory_order __x__ ) volatile
<var>{ return atomic_compare_swap_explicit( this, &amp;__e__, __m__, __x__,
      __x__ == memory_order_acq_rel ? memory_order_acquire :
      __x__ == memory_order_release ? memory_order_relaxed : __x__ ); }</var>

<var>inline</var> void atomic_${TYPEKEY}::fence
( memory_order __x__ ) const volatile
<var>{ return atomic_fence( this, __x__ ); }</var>

EOF
done

echo impatomic.h methods template basic
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

template&lt; typename T &gt;
<var>inline</var> bool atomic&lt;T&gt;::is_lock_free() const volatile
<var>{ return false; }</var>

template&lt; typename T &gt;
<var>inline</var> void atomic&lt;T&gt;::store( T __v__, memory_order __x__ ) volatile
<var>{ _ATOMIC_STORE_( this, __v__, __x__ ); }</var>

template&lt; typename T &gt;
<var>inline</var> T atomic&lt;T&gt;::load( memory_order __x__ ) volatile
<var>{ return _ATOMIC_LOAD_( this, __x__ ); }</var>

template&lt; typename T &gt;
<var>inline</var> T atomic&lt;T&gt;::swap( T __v__, memory_order __x__ ) volatile
<var>{ return _ATOMIC_MODIFY_( this, =, __v__, __x__ ); }</var>

template&lt; typename T &gt;
<var>inline</var> bool atomic&lt;T&gt;::compare_swap
( T&amp; __r__, T __v__, memory_order __x__, memory_order __y__ ) volatile
<var>{ return _ATOMIC_CMPSWP_( this, &amp;__r__, __v__, __x__ ); }</var>

template&lt; typename T &gt;
<var>inline</var> bool atomic&lt;T&gt;::compare_swap
( T&amp; __r__, T __v__, memory_order __x__ ) volatile
<var>{ return compare_swap( __r__, __v__, __x__,
      __x__ == memory_order_acq_rel ? memory_order_acquire :
      __x__ == memory_order_release ? memory_order_relaxed : __x__ ); }</var>

EOF

echo impatomic.h methods address fetch
TYPEKEY=address
TYPENAME=${!TYPEKEY}

cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

<var>inline</var> void* atomic_address::fetch_add
( ptrdiff_t __m__, memory_order __x__ ) volatile
<var>{ return atomic_fetch_add_explicit( this, __m__, __x__ ); }</var>

<var>inline</var> void* atomic_address::fetch_sub
( ptrdiff_t __m__, memory_order __x__ ) volatile
<var>{ return atomic_fetch_sub_explicit( this, __m__, __x__ ); }</var>

EOF

echo impatomic.h methods integer fetch
for TYPEKEY in ${INTEGERS} ${CHARACTERS}
do
TYPENAME=${!TYPEKEY}

for FNKEY in ${INT_OPERATIONS}
do
OPERATOR=${!FNKEY}

cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

<var>inline</var> ${TYPENAME} atomic_${TYPEKEY}::fetch_${FNKEY}
( ${TYPENAME} __m__, memory_order __x__ ) volatile
<var>{ return atomic_fetch_${FNKEY}_explicit( this, __m__, __x__ ); }</var>

EOF
done
done

echo impatomic.h methods pointer fetch
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

template&lt; typename T &gt;
T* atomic&lt;T*&gt;::load( memory_order __x__ ) volatile
<var>{ return static_cast&lt;T*&gt;( atomic_address::load( __x__ ) ); }</var>

template&lt; typename T &gt;
T* atomic&lt;T*&gt;::swap( T* __v__, memory_order __x__ ) volatile
<var>{ return static_cast&lt;T*&gt;( atomic_address::swap( __v__, __x__ ) ); }</var>

template&lt; typename T &gt;
bool atomic&lt;T*&gt;::compare_swap
( T*&amp; __r__, T* __v__, memory_order __x__, memory_order __y__) volatile
<var>{ return atomic_address::compare_swap( *reinterpret_cast&lt;void**&gt;( &amp;__r__ ),
               static_cast&lt;void*&gt;( __v__ ), __x__, __y__ ); }</var>
//<var>{ return _ATOMIC_CMPSWP_( this, &amp;__r__, __v__, __x__ ); }</var>

template&lt; typename T &gt;
bool atomic&lt;T*&gt;::compare_swap
( T*&amp; __r__, T* __v__, memory_order __x__ ) volatile
<var>{ return compare_swap( __r__, __v__, __x__,
      __x__ == memory_order_acq_rel ? memory_order_acquire :
      __x__ == memory_order_release ? memory_order_relaxed : __x__ ); }</var>

template&lt; typename T &gt;
T* atomic&lt;T*&gt;::fetch_add( ptrdiff_t __v__, memory_order __x__ ) volatile
<var>{ return atomic_fetch_add_explicit( this, sizeof(T) * __v__, __x__ ); }</var>

template&lt; typename T &gt;
T* atomic&lt;T*&gt;::fetch_sub( ptrdiff_t __v__, memory_order __x__ ) volatile
<var>{ return atomic_fetch_sub_explicit( this, sizeof(T) * __v__, __x__ ); }</var>

EOF

cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

#endif

EOF
</code></pre>

<h3><a name="ImplCleanup">Implementation Header Cleanup</a></h3>
<pre><code>
echo impatomic.h close namespace
cat &lt;&lt;EOF &gt;&gt;<var>impatomic.h</var>

#ifdef __cplusplus
} // namespace std
#endif

EOF
</code></pre>

<h3><a name="ImplStandard">Standard Headers</a></h3>

<p>
The C standard header.
<pre><code>
echo stdatomic.h
cat &lt;&lt;EOF &gt;stdatomic.h

#include "<var>impatomic.h</var>"

#ifdef __cplusplus

EOF

for TYPEKEY in flag bool address ${INTEGERS} ${CHARACTERS}
do
cat &lt;&lt;EOF &gt;&gt;stdatomic.h

using std::atomic_${TYPEKEY};

EOF
done

cat &lt;&lt;EOF &gt;&gt;stdatomic.h

using std::atomic;
using std::memory_order;
using std::memory_order_relaxed;
using std::memory_order_acquire;
using std::memory_order_release;
using std::memory_order_acq_rel;
using std::memory_order_seq_cst;

#endif

EOF

</code></pre>

<p>
The C++ standard header.
<pre><code>
echo cstdatomic
cat &lt;&lt;EOF &gt;cstdatomic

#include "<var>impatomic.h</var>"

EOF
</code></pre>

<h2><a name="Examples">Examples of Use</a></h2>

<p>
The following program shows example uses of the atomic types,
in both C and C++.
These examples also serve as tests for the interface definition.
<pre><code>
echo n2427.c include
cat &lt;&lt;EOF &gt;n2427.c

#include "stdatomic.h"

EOF
</code></pre>

<h3><a name="ExampleFlag">Flag</a></h3>

<p>
We show two uses:
global functions with explicit memory ordering,
and member functions with implicit memory ordering.

<pre><code>
echo n2427.c flag
cat &lt;&lt;EOF &gt;&gt;n2427.c

atomic_flag af = ATOMIC_FLAG_INIT;

void flag_example( void )
{
    if ( ! atomic_flag_test_and_set_explicit( &amp;af, memory_order_acquire ) )
        atomic_flag_clear_explicit( &amp;af, memory_order_release );
#ifdef __cplusplus
    if ( ! af.test_and_set() )
        af.clear();
#endif
}

EOF
</code></pre>

<h3><a name="ExampleLazy">Lazy Initialization</a></h3>

<p>
For lazy initialization,
a thread that does not do initialization
may need to wait on the thread that does.
(Lazy initialization is similar to double-checked locking.)
For this example, we busy wait on a boolean.
Busy waiting like this is usually ill-advised,
but it suffices for the example.
There are three variants of the example:
one using strong C++ operators and methods,
one using weak C functions, and
one using fence-based C++ operators and methods.

<pre><code>
echo n2427.c lazy
cat &lt;&lt;EOF &gt;&gt;n2427.c

atomic_bool lazy_ready = { false };
atomic_bool lazy_assigned = { false };
int lazy_value;

#ifdef __cplusplus

int lazy_example_strong_cpp( void )
{
    if ( ! lazy_ready.load() ) {
        /* the value is not yet ready */
        if ( lazy_assigned.swap( true ) ) {
            /* initialization assigned to another thread; wait */
            while ( ! lazy_ready.load() );
        }
        else {
            lazy_value = 42;
            lazy_ready = true;
        }
    }
    return lazy_value;
}

#endif

int lazy_example_weak_c( void )
{
    if ( ! atomic_load_explicit( &amp;lazy_ready, memory_order_acquire ) ) {
        if ( atomic_swap_explicit( &amp;lazy_assigned, true,
                                   memory_order_relaxed ) ) {
            while ( ! atomic_load_explicit( &amp;lazy_ready,
                                            memory_order_acquire ) );
        }
        else {
            lazy_value = 42;
            atomic_store_explicit( &amp;lazy_ready, true, memory_order_release );
        }
    }
    return lazy_value;
}

#ifdef __cplusplus

int lazy_example_fence_cpp( void )
{
    if ( lazy_ready.load( memory_order_relaxed ) )
        lazy_ready.fence( memory_order_acquire );
    else if ( lazy_assigned.swap( true, memory_order_relaxed ) ) {
        while ( ! lazy_ready.load( memory_order_relaxed ) );
        lazy_ready.fence( memory_order_acquire );
    }
    else {
        lazy_value = 42;
        lazy_ready.store( true, memory_order_release );
    }
    return lazy_value;
}

#endif

EOF
</code></pre>

<h3><a name="ExampleInteger">Integer</a></h3>
<pre><code>
echo n2427.c integer
cat &lt;&lt;EOF &gt;&gt;n2427.c

atomic_ulong volatile aulv = { 0 };
atomic_ulong auln = { 1 };
#ifdef __cplusplus
atomic&lt; unsigned long &gt; taul CPP0X( { 3 } );
#endif

void integer_example( void )
{
    atomic_ulong a = { 3 };
    unsigned long x = atomic_load( &amp;auln );
    atomic_store_explicit( &amp;aulv, x, memory_order_release );
    unsigned long y = atomic_fetch_add_explicit( &amp;aulv, 1,
                                                 memory_order_relaxed );
    unsigned long z = atomic_fetch_xor( &amp;auln, 4 );
#ifdef __cplusplus
    // x = auln; // implicit conversion disallowed
    x = auln.load();
    aulv = x;
    auln += 1;
    aulv ^= 4;
    // auln = aulv; // uses a deleted operator
    aulv -= auln++;
    auln |= --aulv;
    aulv &amp;= 7;
    atomic_store_explicit( &amp;taul, 7, memory_order_release );
    x = taul.load( memory_order_acquire );
    y = atomic_fetch_add_explicit( &amp; taul, 1, memory_order_acquire );
    z = atomic_fetch_xor( &amp; taul, 4 );
    x = taul.load();
    // auln = taul; // uses a deleted operator
    // taul = aulv; // uses a deleted operator
    taul = x;
    taul += 1;
    taul ^= 4;
    taul -= taul++;
    taul |= --taul;
    taul &amp;= 7;
#endif
}

EOF
</code></pre>

<p>
Note that because <samp>taul</samp> is not a volatile variable,
the compiler would be permitted to merge the last six statements.</p>

<h3><a name="ExampleEvent">Event Counter</a></h3>

<p>
An event counter is not itself part of the communication between threads,
and so it can use relaxed, and hence faster, operations.
<pre><code>
echo n2427.c event
cat &lt;&lt;EOF &gt;&gt;n2427.c

#ifdef __cplusplus

struct event_counter
{
    void inc() { au.fetch_add( 1, memory_order_relaxed ); }
    unsigned long get() { return au.load( memory_order_relaxed ); }
    atomic_ulong au;
};
event_counter ec = { 0 };

void generate_events()
{
    ec.inc();
    ec.inc();
    ec.inc();
}

int read_events()
{
    return ec.get();
}

int event_example()
{
    generate_events(); // possibly in multiple threads
    // join all other threads, ensuring that final value is written
    return read_events();
}

#endif

EOF
</code></pre>

<p>
An important point here is that this example is safe,
and we are guaranteed to see exactly the final value,
because the thread joins enforce the necessary ordering
between the <samp>inc</samp> calls and the <samp>get</samp> call.

<h3><a name="ExampleList">List Insert</a></h3>

<p>
Insertion into a shared linked list
can be accomplished in a lock-free manner with compare-and-swap,
provided that compare-and-swap is itself lock-free.
(Note that adding a correct "remove" operation
is harder than it seems,
because removing a node from the list does not imply
that there are no outstanding accesses to it,
and thus any modification or deallocation of the node
might constitute a race condition.)

<p>
In both of the following examples,
a comparison failure in the compare-and-swap
will update <samp>candidate-&gt;next</samp>
with the current value of <samp>head</samp>.</p>

<pre><code>
echo n2427.c list
cat &lt;&lt;EOF &gt;&gt;n2427.c

#ifdef __cplusplus

struct data;
struct node
{
    node* next;
    data* value;
};

atomic&lt; node* &gt; head CPP0X( { (node*)0 } );

void list_example_strong( data* item )
{
    node* candidate = new node;
    candidate-&gt;value = item;
    candidate-&gt;next = head.load();
    while ( ! head.compare_swap( candidate-&gt;next, candidate ) );
}

void list_example_weak( data* item )
{
    node* candidate = new node;
    candidate-&gt;value = item;
    candidate-&gt;next = head.load( memory_order_relaxed );
    while ( ! head.compare_swap( candidate-&gt;next, candidate,
                                 memory_order_release, memory_order_relaxed ) );
}

#endif

EOF
</code></pre>

<h3><a name="ExampleUpdate">Update</a></h3>

<p>
The best algorithm for updating a variable
may depend on whether or not atomics are lock-free.
In the example below,
this update can be accomplished in a lock-free manner with compare-and-swap
when atomic integrals are lock-free,
but may require other mechanisms when
atomic integrals are not lock-free.
This example uses the <samp>ATOMIC_INTEGRAL_LOCK_FREE</samp> feature macro
to generate minimal code
when the lock-free status is known a priori.</p>
<pre><code>
echo n2427.c update
cat &lt;&lt;EOF &gt;&gt;n2427.c

#if ATOMIC_INTEGRAL_LOCK_FREE &lt;= 1
atomic_flag pseudo_mutex = ATOMIC_FLAG_INIT;
unsigned long regular_variable = 1;
#endif
#if ATOMIC_INTEGRAL_LOCK_FREE &gt;= 1
atomic_ulong atomic_variable = { 1 };
#endif

void update()
{
#if ATOMIC_INTEGRAL_LOCK_FREE == 1
    if ( atomic_is_lock_free( &amp;atomic_variable ) ) {
#endif
#if ATOMIC_INTEGRAL_LOCK_FREE &gt; 0
        unsigned long full = atomic_load( &amp;atomic_variable );
        unsigned long half = full / 2;
        while ( ! atomic_compare_swap( &amp;atomic_variable, &amp;full, half ) )
            half = full / 2;
#endif
#if ATOMIC_INTEGRAL_LOCK_FREE == 1
    } else {
#endif
#if ATOMIC_INTEGRAL_LOCK_FREE &lt; 2
        __atomic_flag_wait__( &amp;pseudo_mutex );
        regular_variable /= 2 ;
        atomic_flag_clear( &amp;pseudo_mutex );
#endif
#if ATOMIC_INTEGRAL_LOCK_FREE == 1
    }
#endif
}

EOF
</code></pre>

<h3><a name="ExampleMain">Main</a></h3>
<pre><code>
echo n2427.c main
cat &lt;&lt;EOF &gt;&gt;n2427.c

int main()
{
}

EOF
</code></pre>
</body></html>
