<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
 "http://www.w3.org/TR/html4/strict.dtd">
<html><head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<title>P0250R1: Wording improvements for initialization and thread ids (CWG 2046)</title>
</head>
<body>
<table>
	<tr>
                <th>Doc. No.:</th>
                <td>WG21/P0250R1</td>
        </tr>
        <tr>
                <th>Date:</th>
                <td>2016-03-20</td>
        </tr>
        <tr>
                <th>Author:</th>
                <td>Hans-J. Boehm</td>
        </tr>
        <tr>
                <th>Email:</th>
                <td><a href="mailto:hboehm@google.com">hboehm@google.com</a></td>
        </tr>
	<tr>
		<th>Target audience:</th>
		<td>SG1 and CWG</td>
	</tr>
</table>
<h1>P0250R1: Wording improvements for initialization and thread ids (CWG 2046)</h1>
<p>
This is an attempt to turn the Kona SG1 discussion of CWG 2046 into wording.
This revision also reflects discussion in Jacksonville.
Unfortunately, these discussions exposed a number of related issues that are
still not fully resolved.

<h2>Happens-before constraints on the implementation</h2>
<p>
In several places the standard needs to require that some operation,
e.g. an initialization, happens
before another one, in roughly, but not quite, the sense of the 1.10 memory model.
The difference is that we expect this happens before relation to compose with
other happens-before relationships.
But our normal happens before relation no longer composes in this way,
due to the presence of <tt>memory_order_consume</tt>.  Hence we suggest
introducing the required notion in 1.10.
<p>
There was some discussion in Jacksonville about using "synchronizes with"
instead of introducing new terminology. That appears to be technically
workable, but rather misleading.  "Synchronizes with" is currently only
defined between actions by different threads, and does not include sequencing.
Many of the guarantees we would like to provide will often rely on sequencing
rather than synchronization.  Thus my current feeling is to introduce
new terminology.
<p>
The current standard sometimes uses "sequenced before" to express this
kind of relationship.  This is incorrect and repaired below,
since sequencing is a relation on evaluations within a single thread.

<h2>Parallel initialization for sequential programs</h2>
<p>
In Kona, SG1 rather quickly concluded that a C++ program
should be allowed to construct
multiple threads for constructors etc., even if the program itself
did not use threads. That is not reflected here, mostly because it is a
more profound issue than we appreciated at the time.
<p>
If we went this route, we would break compatibility with
C++03, in what may be a rather significant way.  Static constructors for
old single-threaded programs would be required to be thread-safe.
This is unlikely to be automatically true.  It is not that uncommon
for constructors to e.g. add to a global data structure.  The order in
which items are added may not matter.  But such a data structure implemented
in a single-threaded program is unlikely to be thread-safe.
<p>
In Jacksonville, it became clear that some of the confusion here stems from
uncertaintly about the motivation for the current wording.  Currently, we
very explicitly allow static constructors to run after the start of main,
whether or not other threads are started.  Running a constructor while main
is running inherently implies some notion of concurrency.
The intent appears to have been to allow construction right before
the first odr-use within the using thread, rather than at an arbitrary
point before the initial odr-use, so that unexpected races between
the constructor and main can be avoided.  We don't yet know how to
phrase this.
<p>
The Jacksonville discussion seemed to lean towards preserving wording
that allows parallel initialization only after a second thread is forked.
However it remains unclear what this means, given that I don't think we're
currently very clear that random library calls cannot start threads.
We would probably need a similar restriction prohibiting under-the-covers
thread creation by library calls unless a second thread has already been
created.

<h2>Thread id for main</h2>
<p>
We concluded in Kona that main did not necessarily have a thread id, and
that some implementations went out of their way to keep it from having one.
After looking at the N4567 wording again, my conclusion is that this
interpretation of the standard is a stretch.  Some of this wording may
have improved since the implementers looked at it.
<p>
The first sentence of 1.10 [intro.multithread] states
<blockquote>
<p>
A thread of execution (also known as a thread) is a single flow of control
within a program, including the initial invocation of a specific top-level
function, and recursively including every function invocation subsequently
executed by the thread.
</blockquote>
<p>
30.3.1.1 [thread.thread.id] states
<blockquote>
<p>
An object of type thread::id provides a unique identifier for each thread of
execution and a single distinct value for all thread objects that do not
represent a thread of execution (30.3.1).
</blockquote>
<p>
30.3.2 [thread.thread.this] states
<blockquote>
<p>
thread::id this_thread::get_id() noexcept;
<p>
Returns: An object of type thread::id that uniquely identifies the current
thread of execution. No other thread of execution shall have this id and this
thread of execution shall always have this id. The object returned shall not
compare equal to a default constructed thread::id.
</blockquote>
<p>
I would nonetheless suggest changing the first sentence of 3.6.1
[basic.start.main] as follows:
<blockquote>
<p>
A program shall contain a global function called main, which is the designated
start of the program <ins>and of its initial thread of execution</ins>.
</blockquote>
<p>
There were some voices in Jacksonville in favor of preparing the standard
for execution agents that do not support a thread id (or thread local
storage).  This would simplify GPU execution, but it seems to require that
the standard library specify which calls need thread local storage, and
which do not, something we don't currently specify.  It is unclear to
me that we should address that issue in this context.

<h2>Wording changes for construction/destruction ordering</h2>
<p>
(Fixes an editorial ugliness that somehow crept in.) Move 1.9p6, which uses
sequenced before, to someplace below the definition of sequenced before in
1.9p13 [intro.execution].
<p>
Add after 1.10p14 [intro.multithread]:
<blockquote>
<ins>An evaluation A <i>strongly happens before</i> an evaluation B if either</ins>
<ul>
<li><ins>A is sequenced before B, or</ins>
<li><ins>A synchronizes with B, or</ins>
<li><ins>A strongly happens before X and X strongly happens before B.</ins>
</ul>
<ins>[Note: In the absence of consume operations the happens before and strong happens before relations
are identical.  Strong happens before essentially excludes consume operations.]</ins>
</blockquote>
<p>
There may be cleaner ways to introduce this.
<p>
Change the last part of 3.6.2p2 [basic.start.static] as follows:
<blockquote>
<p>
If constant initialization is not performed, a variable with static storage
duration (3.7.1) or thread storage duration (3.7.2) is zero-initialized (8.5).
Together, zero-initialization and constant initialization are called static
initialization; all other initialization is dynamic initialization. <ins>Within each phase,</ins> static
initialization shall <del>be performed</del> <ins>strongly happen</ins> before
<ins>(1.10)</ins> any dynamic initialization <del>takes place</del>. [ Note: The dynamic
initialization of non-local variables is described in 3.6.3; that of local
static variables is described in 6.7. -- end note ]
</blockquote>
<p>
Change 3.6.3p1 [basic.start.dynamic] as follows:
<blockquote>
<p>
Dynamic initialization of a
non-local variable with static storage duration is unordered if the variable is
an implicitly or explicitly instantiated specialization, and otherwise is
ordered [ Note: an explicitly specialized static data member or variable
template specialization has ordered initialization. -- end note ].
Variables with ordered initialization defined within a single translation unit
shall be initialized <del>in the order of their definitions</del><ins>such that
initialization corresponding to earlier definitions strongly happen before
those for later definitions</ins> in the translation unit.  If a program starts
a thread (30.3), <ins>any initialization not specified to
happen before the first thread creation may be performed by either the main
thread or a thread implicitly created to perform initializations.  If such
initializations are unordered or defined in separate translation units, no
happens before ordering is specified between them.</ins> <del>the subsequent
initialization of a variable is
unsequenced with respect to the initialization of a variable defined in a
different translation unit.  Otherwise, the initialization of a variable is
indeterminately sequenced with respect to the initialization of a variable
defined in a different translation unit. If a program starts a thread, the
subsequent unordered initialization of a variable is unsequenced with respect
to every other dynamic initialization. Otherwise,</del> <ins>If two unordered
initializations happen before the start of the first additional thread, or the
program starts no additional threads, then those</ins> <del>the</del> unordered
initialization<ins>s</ins><del> of a variable is</del> <ins>are</ins>
indeterminately sequenced with respect to <ins>each other</ins> <del>every other
dynamic initialization</del>. [ Note: This definition permits initialization of
a sequence of ordered variables concurrently with another sequence.
-- end note ]
</blockquote>
<p>
3.6.3p2 appears to have some unfortunate interactions with signal handlers.  We
may want to address those here, but they are currently not addressed. P0270R0
has more details.
<p>
Change the beginning of 3.6.3p2 as follows:
<blockquote>
<p>
It is implementation-defined whether the dynamic initialization of a non-local
variable with static storage duration <ins>strongly</ins>
happens before the first statement of main. If the initialization is
deferred to some point in time after the first statement of main, it
<ins>strongly</ins> happens before the first odr-use (3.2) of
any function or variable defined in the same translation unit as the variable
to be initialized.
</blockquote>
<p>
As we discussed above, this almost certainly needs to be pinned down further.
Can we say that it happens immediately before the first odr-use, i.e. after
all evaluations sequenced before the odr-use?  Can we just insist that
it must happen before main, and drop the second option?
<p>
Change 3.6.3p3 as follows to fix what looks like an unrelated bug:
<blockquote>
<p>
It is implementation-defined whether the dynamic initialization of a non-local
variable with <del>static or</del> thread storage duration is
sequenced before the first statement of the initial function of the
thread. If the initialization is deferred to some point in time after the first
statement of the initial function of the thread, it is
sequenced before the first odr-use (3.2) of any variable with
thread storage duration defined in the same translation unit as the variable to
be initialized.
</blockquote>
<p>
The preceding seems to have similar issues to static variables, in that it
allows essentially concurrent initialization of thread-locals.  If a
thread-local constructor touches a data structure also touched by the mainline
thread code, this allows concurrent access from the same thread, which seems
highly undesirable and unexpected. This presumably needs clarification similar
to the static case.
<p>
Change 3.6.4p1 [basic.start.term] as follows:
<blockquote>
<p>
Destructors (12.4) for initialized objects (that is, objects whose lifetime
(3.8) has begun) with static storage duration are called as a result of
returning from main and as a result of calling std::exit (18.5).
<ins>The return from main or the call to std::exit respectively strongly
happen before the destructor invocations.</ins>
<p>
Destructors for initialized objects with thread storage duration within a given
thread are called as a result of returning from the initial function of that
thread and as a result of that thread calling std::exit. The completions of the
destructors for all initialized objects with thread storage duration within
that thread <del>are sequenced</del> <ins>strongly happen</ins> before the
initiation of the destructors of any object with static storage duration. If
the completion of the constructor or dynamic initialization of an object with
thread storage duration is sequenced before that of another, the completion of
the destructor of the second is sequenced before the initiation of the
destructor of the first.
<p>
If the completion of the constructor or dynamic initialization of an object
with static storage duration <del>is sequenced</del> <ins>strongly
happens</ins> before that of another, the completion of the destructor of the
second <del>is sequenced</del> <ins>strongly happens</ins> before the
initiation of the destructor of the first. [ Note: This definition permits
concurrent destruction.  -- end note ] If an object is initialized statically,
the object is destroyed in the same order as if the object was dynamically
initialized.  For an object of array or class type, all subobjects of that
object are destroyed before any block-scope object with static storage duration
initialized during the construction of the subobjects is destroyed. If the
destruction of an object with static or thread storage duration exits via an
exception, std::terminate is called (15.5.1).
</blockquote>
<p>
Change 3.6.4p3 [basic.start.term] as follows:
<blockquote>
<p>
If the completion of the initialization of an object with static storage duration <del>is sequenced</del> <ins>strongly happens</ins> before a call
to std::atexit (see &lt;cstdlib&gt;, 18.5), the call to the function passed to std::atexit <del>is sequenced</del> <ins>strongly happens</ins> before
the call to the destructor for the object. If a call to std::atexit <del>is sequenced</del> <ins>strongly happens</ins> before the completion of the
initialization of an object with static storage duration, the call to the destructor for the object <del>is sequenced</del> <ins>strongly happens</ins>
before the call to the function passed to std::atexit. If a call to std::atexit <del>is sequenced</del> <ins>strongly happens</ins> before another
call to std::atexit, the call to the function passed to the second std::atexit call <del>is sequenced</del> <ins>strongly happens</ins> before the
call to the function passed to the first std::atexit call.
</blockquote>
</body>
</html>
