<html><head><title>Thread-Local Storage</title></head>
<body><h1>Thread-Local Storage</h1>

<p>ISO/IEC JTC1 SC22 WG21 N2280 = 07-0140 - 2007-05-02

<p>Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org

<p>This proposal is a revision of N2147 = 07-0007 - 2007-01-05.
The revision replaces the keyword <code>__thread</code>
with the keyword <code>thread_local</code>.

<h2>Introduction</h2>

<p> In multi-threaded applications,
there often arises the need to maintain data
that is unique to a thread.
We call this thread-local storage.

<p> Several techniques have been used to accomplish this task.
Notable among them is the POSIX
<code>pthread_getspecific</code> and <code>pthread_setspecific</code> facility.
Unfortunately, this facility is clumsy and slow.
In addition, the facility is not particularly helpful
when converting a single-threaded application
to a multi-threaded application.
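
<p> To make the clumsiness concrete, the following is a minimal sketch
of a per-thread counter built on the POSIX key interface
(<code>pthread_key_create</code>, <code>pthread_getspecific</code>,
<code>pthread_setspecific</code>).
The names <code>counter_key</code>, <code>bump_counter</code>,
and <code>run_posix_demo</code> are hypothetical
and chosen only for illustration; error checking is omitted.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical per-thread counter built on the POSIX key interface.
static pthread_key_t counter_key;
static pthread_once_t counter_once = PTHREAD_ONCE_INIT;

// Run by the system at thread exit for each non-null value.
static void free_counter(void* p) { delete static_cast<int*>(p); }

static void make_key() { pthread_key_create(&counter_key, free_counter); }

// Every access pays for a library call and a cast, and the first access
// in each thread must allocate the value by hand.
int bump_counter() {
    pthread_once(&counter_once, make_key);
    int* p = static_cast<int*>(pthread_getspecific(counter_key));
    if (p == nullptr) {
        p = new int(0);
        pthread_setspecific(counter_key, p);
    }
    return ++*p;
}

static void* worker(void* out) {
    bump_counter();
    *static_cast<int*>(out) = bump_counter();  // second call in this thread
    return nullptr;
}

// Runs one extra thread and returns the value that thread's second
// bump_counter() call observed.
int run_posix_demo() {
    int seen = 0;
    pthread_t t;
    pthread_create(&t, nullptr, worker, &seen);
    pthread_join(t, nullptr);
    return seen;
}
```

<p> The counter itself is a single <code>int</code>,
yet the program needs a key, a one-time initializer, a destructor callback,
and a null check on every access.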

<p> Several vendors have provided a language extension
for a new storage class that indicates that
a variable has thread storage duration.
Use of thread variables is relatively easy and
access to thread variables is relatively fast.
In addition,
the conversion of a single-threaded application using static-duration variables 
to a multi-threaded application using thread-duration variables
requires less wholesale program restructuring.
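
<p> As a sketch of such a conversion,
assume a hypothetical function <code>allocate_id</code>
that originally relied on a static-duration counter.
With a thread storage class, the declaration is the only line that changes.
The example assumes a compiler that implements <code>thread_local</code>
and uses <code>std::thread</code>, which is not part of this proposal,
purely to create a second thread.

```cpp
#include <cassert>
#include <thread>

// Single-threaded original: one counter shared by the whole program.
//   static int next_id = 0;
// Converted form: each thread gets its own counter, and the function
// body below is unchanged.
static thread_local int next_id = 0;

int allocate_id() { return ++next_id; }

// Hypothetical driver: calls allocate_id() n times on a fresh thread
// and returns the last value that thread observed.
int last_id_on_new_thread(int n) {
    int last = 0;
    std::thread t([&last, n] {
        for (int i = 0; i < n; ++i) last = allocate_id();
    });
    t.join();
    return last;
}
```

<p> The new thread counts from its own zero,
and the calling thread's counter is unaffected by it.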

<p> Roughly equivalent extensions are available from the following vendors.

<table>

<tr>
<td><a href="http://www.gnu.org/">GNU</a>
<td><a href="http://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Thread-Local.html#Thread-Local">Thread-Local Storage</a></td>
</tr>

<tr>
<td><a href="http://www.hp.com/">Hewlett-Packard</a>
<td><a href="http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V51B_HTML/ARH9VDTE/THRDSCHP.HTM#anch_1024">Using Thread Local Storage</a></td>
</tr>

<tr>
<td><a href="http://www.hp.com/">Hewlett-Packard</a>
<td><a href="http://devrsrc1.external.hp.com/STKT/impacts/i320.html">Tru64 UNIX to HP-UX STK: critical Impact: TLS - feature differences (CrCh320)</a></td>
</tr>

<tr>
<td><a href="http://www.intel.com/">Intel</a>
<td><a href="http://www.intel.com/software/products/compilers/clin/docs/ug_cpp/lin1057.htm">Thread-local Storage</a></td>
</tr>

<tr>
<td><a href="http://www.microsoft.com/">Microsoft</a>
<td><a href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccelng/htm/decla_44.asp">Thread Local Storage</a></td>
</tr>

<tr>
<td><a href="http://www.sun.com/">Sun Microsystems</a>
<td><a href="http://docs.sun.com/source/817-5070/Language_Extensions.html#pgfId-997650">Thread-Local Storage</a></td>
</tr>

</table>

<p> The C++ standard should adopt existing practice for thread-local storage.
In addition, the C++ standard should extend existing practice
to enable broader use.

<h2>Proposal</h2>

<p> The specification outline is as follows.
We defer detailed changes to the text of the standard
to the final section.

<h3>Thread Storage Duration</h3>

<p> Add a new storage duration called thread storage duration.
Objects with thread storage duration are unique to each thread.

<p> Those objects which may have static storage duration
may have thread storage duration instead.
These objects include
namespace-scope variables,
function-local static variables, and class static member variables.

<h3>Storage Class <code>thread_local</code></h3>

<p> Add <code>thread_local</code>,
a new keyword and storage class specifier.
The <code>thread_local</code> specifier
indicates that the variable has thread storage duration.

<p> Variables declared with the <code>thread_local</code> specifier
are bound as they would be without the <code>thread_local</code> specifier.
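
<p> The three kinds of variable named above can be sketched as follows.
The names <code>ns_counter</code>, <code>Stats</code>,
<code>touch_all</code>, and <code>touch_twice_on_new_thread</code>
are hypothetical, and <code>std::thread</code> is assumed
only as a convenient way to start a second thread.

```cpp
#include <cassert>
#include <thread>

// Namespace-scope variable with thread storage duration.
thread_local int ns_counter = 0;

// Class static member variable with thread storage duration.
struct Stats {
    static thread_local int hits;
};
thread_local int Stats::hits = 0;

// Function-local static variable with thread storage duration.
int touch_all() {
    static thread_local int local_calls = 0;
    ++ns_counter;
    ++Stats::hits;
    ++local_calls;
    return ns_counter + Stats::hits + local_calls;
}

// Hypothetical driver: runs touch_all() twice on a new thread and
// returns the second result, which reflects only that thread's copies.
int touch_twice_on_new_thread() {
    int result = 0;
    std::thread t([&result] {
        touch_all();
        result = touch_all();
    });
    t.join();
    return result;
}
```

<p> Name lookup and linkage are the same as for the corresponding
static-duration declarations; only the number of objects changes.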

<h3>Addresses of Thread Variables</h3>

<p> The address-of operator (<code>&amp;</code>),
when applied to a thread variable,
is evaluated at run time
and returns the address of the current thread's variable.
Therefore, the address of a thread variable is not a constant.

<p> Thread-local storage defines lifetime and scope, not accessibility.
That is, one may take the address of a thread-local variable
and pass it to other threads.

<p> The address of a thread variable is stable
for the lifetime of the corresponding thread.
The address of a thread variable
may be freely used during the variable's lifetime
by any thread in the program.
When a thread terminates,
all addresses of that thread's variables are invalid
and may not be used.
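
<p> A minimal sketch of these rules, with hypothetical names
<code>slot</code>, <code>address_of_slot</code>,
and <code>worker_sees_distinct_copy</code>,
and assuming <code>std::thread</code> for thread creation:
the calling thread hands the address of its own copy to a worker,
which may write through it while the calling thread is alive.

```cpp
#include <cassert>
#include <thread>

thread_local int slot = 0;

// Evaluated at run time: yields the calling thread's copy of slot.
int* address_of_slot() { return &slot; }

// The calling thread passes the address of its own copy to a worker.
// The worker writes through that pointer and checks that its own copy
// lives at a different address.
bool worker_sees_distinct_copy() {
    int* caller_copy = address_of_slot();
    bool distinct = false;
    std::thread t([caller_copy, &distinct] {
        *caller_copy = 42;  // valid: the owning thread is still alive
        distinct = (address_of_slot() != caller_copy);
    });
    t.join();
    // The worker's own copy no longer exists here, so its address is
    // never carried out of the worker.
    return distinct && slot == 42;
}
```

<p> The comparison is done inside the worker on purpose:
once the worker has terminated,
the address of its copy must not be used.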

<h3>Thread Variable Dynamic Initialization</h3>

<p> A thread variable may be statically initialized
as would any other static-duration variable.

<p> At present, no implementation of thread-local storage
supports dynamic initialization (or, presumably, non-trivial destructors).
There was mild consensus at the Mont Tremblant meeting
to support dynamic initialization of function-local, thread-local variables.
The initialization of such variables is already guarded and synchronous,
so new technology is not required.
On the other hand, the implementation for dynamic initialization
of namespace-scope variables is much more difficult,
and may require additional linker and operating system support.
There was no consensus to support dynamic initialization of
namespace-scope variables at this time.
However, interviews with prospective users
indicated a <em>firm</em> desire for full dynamic initialization
of thread storage duration variables.
The programmers simply did not want to partition their types this way.
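
<p> The function-local case might look as follows,
assuming a compiler that implements the dynamic initialization proposed here.
<code>Logger</code>, <code>log_line</code>, and <code>log_on_threads</code>
are hypothetical names, and <code>std::thread</code>
and <code>std::atomic</code> are assumed only to drive the example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Counts Logger constructions across all threads.
std::atomic<int> constructions{0};

// Hypothetical type with a non-trivial constructor.
struct Logger {
    int lines = 0;
    Logger() { ++constructions; }
};

// The function-local thread variable is dynamically initialized the
// first time each thread passes through its declaration.
int log_line() {
    static thread_local Logger logger;
    return ++logger.lines;
}

// Hypothetical driver: logs twice on each of n new threads, one at a time.
void log_on_threads(int n) {
    for (int i = 0; i < n; ++i) {
        std::thread t([] { log_line(); log_line(); });
        t.join();
    }
}
```

<p> Each thread constructs its own <code>Logger</code> exactly once,
no matter how many times it calls <code>log_line</code>.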

<p> Dynamic initialization and destruction
can be implemented with either of two approaches.

<dl>
<dt><code>.init</code> sections</dt>
<dd>
Extend the semantics of <code>.init</code> sections
to also include sections for thread-local storage.
These thread-local inits will be invoked
whenever the corresponding storage section is allocated.
This approach requires operating-system support.
</dd>

<dt>initialized flags</dt>
<dd>
The compiler inserts dynamic tests on an initialized flag
into the program before access to a thread-local variable.
The initialization of a thread-local variable
must initialize all such variables defined within its translation unit.
Note though, that initializations should be marked complete
before executing the initialization
to prevent recursive attempts to initialize the same variable.
(Such recursive initializations have undefined behavior
and are governed by the zero-initialization clause.)
This approach does not require operating-system support,
but has higher run-time cost.
</dd>
</dl>
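
<p> The initialized-flags approach can be sketched by hand.
The code below is an illustration of the idea,
not the output of any particular compiler:
it uses only statically initialized thread variables
and emulates the dynamic initialization of a hypothetical declaration
<code>thread_local int ticket = next_ticket();</code>.
All names are assumptions made for the example,
and registration of destructors is omitted
because <code>int</code> has none.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// What the programmer would write (needs dynamic initialization):
//   thread_local int ticket = next_ticket();

std::atomic<int> tickets_issued{0};
int next_ticket() { return ++tickets_issued; }  // hypothetical initializer

// What a compiler might emit instead: storage plus a flag, both
// zero-initialized for every thread without running any code.
thread_local int ticket_storage;
thread_local bool ticket_initialized;

// Every access goes through the flag test.
int& ticket() {
    if (!ticket_initialized) {
        // Marked complete before the initializer runs, so a recursive
        // access does not start a second initialization.
        ticket_initialized = true;
        ticket_storage = next_ticket();
    }
    return ticket_storage;
}

// Hypothetical driver: reads ticket() twice on a new thread.
int ticket_on_new_thread() {
    int seen = 0;
    std::thread t([&seen] {
        ticket();
        seen = ticket();
    });
    t.join();
    return seen;
}
```

<p> The flag test on every access is the higher run-time cost
mentioned above.
With several thread variables in one translation unit,
a single flag would guard the initialization of all of them.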

<p> In either case, the initialization of a thread-local variable
must place the destruction on a thread-local list
for subsequent handling on exit from the thread
(potentially with cancellation cleanup functions).

<h2>Other Issues</h2>

<p> Some other issues deserve mention
because they affect real programs,
even though they are not properly part of the C++ standard.

<h3>Dynamic Libraries</h3>

<p> The allocation of thread-local storage
for the full product of threads and dynamic libraries
could result in very large storage requirements.
The Sun Microsystems implementation
only allocates thread-local storage for a dynamic library
when the thread uses a variable from that library.
That is, the Sun implementation
allocates memory lazily for each thread and dynamic library pair.
To avoid bloated programs,
the language definition must permit this optimization.

<p> The system may immediately deallocate
the storage associated with a thread and dynamic library pair
when either the thread terminates or the library is closed.
The system is not required to deallocate immediately.
However, the system is required to not leak storage.
Thread-local storage for a thread must be reclaimed
no later than a subsequent thread creation.
Thread-local storage for a library within a thread
must be reclaimed no later than a subsequent open of that library.
(Opening another library does not require storage reclamation,
though doing so would certainly reduce storage consumption.)

<p> While storage deallocation can be deferred,
variable destruction must not be deferred
because destruction depends on access to thread state.
In the presence of programmed closing of a dynamic library,
its thread-local variables may need to be destructed out of order
with respect to thread-local variables outside of the library.

<h3>System Interface</h3>

<p> When <code>dlsym()</code> is used on a thread variable,
the address returned
will be the address of the currently executing thread's variable.

<h2>Standard Changes</h2>

<p> The text of the standard changes
as specified in this section.

<h3>2.11 Keywords [lex.key]</h3>

<p> To table 3, add <ins><code>thread_local</code></ins>.

<h3>3.6.1 Main function [basic.start.main]</h3>

<p> In paragraph 4, edit as follows.
This change is the minimum necessary to accommodate
thread-duration objects.
A more robust specification of termination is needed.
See 18.4 Start and termination [support.start.term].

<blockquote>
Calling the function <code>std::exit(int)</code>
declared in <code>&lt;cstdlib&gt;</code> (18.4)
terminates the program without leaving the current block
<ins>or current thread</ins>
and hence without destroying any objects with automatic storage duration (12.4)
<ins>or thread storage duration (3.7.2(new))</ins>.
If <code>std::exit</code> is called to end a program
during the destruction of an object
with static <ins>or thread</ins> storage duration,
the program has undefined behavior.
</blockquote>

<h3>3.6.2 Initialization of non-local objects [basic.start.init]</h3>

<p> Before paragraph 1, add a new paragraph

<blockquote>
<ins>There are two broad classes of non-local objects,
those with static storage duration (3.7.1)
and those with thread storage duration (3.7.2(new)).
Objects with static storage duration are initialized
as a consequence of program initiation.
Objects with thread storage duration are initialized
as a consequence of thread initiation.
Within each initiation,
initialization occurs as follows.</ins>
</blockquote>

<p> In paragraph 1, edit

<blockquote>
Objects with static storage duration (3.7.1)
<ins>or thread storage duration (3.7.2(new))</ins>
shall be zero-initialized (8.5) before any other initialization takes place.
A reference with static <ins>or thread</ins> storage duration
and an object of POD type with static <ins>or thread</ins> storage duration
can be initialized with a constant expression (5.19);
</blockquote>

<p> In paragraph 2, edit

<blockquote>
An implementation is permitted
to perform the initialization of an object of namespace scope
<del>with static storage duration</del>
as a static initialization
even if such initialization is not required to be done statically,
provided that
<ul>
<li> the dynamic version of the initialization
does not change the value of any other object of namespace scope
<del>with static storage duration</del> prior to its initialization, and
<li>....
<li>[<em>Note:</em> as a consequence,
if the initialization of an object <code>obj1</code>
refers to an object <code>obj2</code> of namespace scope
<del>with static storage duration</del> ....
</ul>
</blockquote>

<p> In paragraph 3, edit

<blockquote>
It is implementation-defined whether or not
the dynamic initialization (8.5, 9.4, 12.1, 12.6.1)
of an object of namespace scope <ins>and with static storage duration</ins>
is done before the first statement of <code>main</code>.
....
</blockquote>

<p> After paragraph 3, add new paragraph 4.

<blockquote>
<ins>It is implementation-defined whether or not
the dynamic initialization (8.5, 9.4, 12.1, 12.6.1)
of an object of namespace scope and with thread storage duration
is done before the first statement of the initial function of the thread.
If the initialization is deferred to some point in time
after the first statement of the initial function of the thread,
it shall occur before the first use of any object with thread storage duration
defined in the same translation unit as the object to be initialized.</ins>
</blockquote>

<p> In existing paragraph 4, edit

<blockquote>
If construction or destruction of a
<del>non-local static</del>
object <ins>of namespace scope</ins>
ends in throwing an uncaught exception,
the result is a call to <code>std::terminate</code> (18.7.3.3).
</blockquote>

<h3>3.6.3 Termination [basic.start.term]</h3>

<p> In paragraph 1, edit

<blockquote>
Destructors (12.4) for initialized objects of static storage duration
(declared at block scope or at namespace scope)
are called as a result of returning from <code>main</code>
and as a result of calling <code>exit</code> (18.3).
<ins>Destructors (12.4) for initialized objects with thread storage duration
(declared at block scope or at namespace scope)
are called as a result of returning from the initial function of a thread.
When the initial function of a thread is the <code>main</code> function,
the objects are destructed before those of static storage duration.</ins>
These objects are destroyed
in the reverse order of the completion of their constructor
or of the completion of their dynamic initialization.
If an object is initialized statically,
the object is destroyed in the same order
as if the object was dynamically initialized.
For an object of array or class type,
all subobjects of that object are destroyed
before any local object with static storage duration
initialized during the construction of the subobjects is destroyed.
</blockquote>
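
<p> The intended ordering at thread exit can be observed with a small sketch,
assuming a compiler that implements non-trivial destructors
for thread variables.
<code>Tracer</code>, <code>use_a</code>, <code>use_b</code>,
and <code>destruction_order_on_new_thread</code> are hypothetical names,
and <code>std::thread</code> is assumed for thread creation.

```cpp
#include <cassert>
#include <string>
#include <thread>

// Records the order in which destructors run. Only one thread touches
// it at a time in this demo, so no locking is needed.
std::string destruction_log;

struct Tracer {
    char tag;
    explicit Tracer(char t) : tag(t) {}
    ~Tracer() { destruction_log += tag; }
};

// Two function-local thread variables, constructed on first call.
void use_a() { static thread_local Tracer a('a'); (void)a; }
void use_b() { static thread_local Tracer b('b'); (void)b; }

// Constructs a and then b on a new thread, and returns what was logged
// by the time that thread has been joined.
std::string destruction_order_on_new_thread() {
    destruction_log.clear();
    std::thread t([] { use_a(); use_b(); });
    t.join();
    return destruction_log;
}
```

<p> Returning from the thread's initial function destroys
<code>b</code> before <code>a</code>,
the reverse of the order in which their constructors completed.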

<p> In paragraph 4, edit

<blockquote>
Calling the function <code>std::abort()</code>
declared in <code>&lt;cstdlib&gt;</code>
terminates the program without executing destructors for objects
<del>of</del> <ins>with</ins>
automatic<ins>, thread,</ins> or <del>with</del> static storage duration
and without calling the functions passed to <code>std::atexit()</code>.
</blockquote>

<h3>3.7 Storage Duration [basic.stc]</h3>

<p> To the list of storage durations in paragraph 1,
between static and automatic, add

<ul>
<li><ins>thread storage duration</ins></li>
</ul>

<p> In paragraph 2, edit

<blockquote>
Static<ins>, thread,</ins> and automatic durations
are associated with objects
introduced by declarations (3.1) and
implicitly created by the implementation (12.2).
</blockquote>

<p> In paragraph 3, edit

<blockquote>
The storage class specifiers
<code>static</code><ins>, <code>thread_local</code>,</ins> and <code>auto</code>
are related to storage duration as described below.
</blockquote>

<h3>3.7.1 Static storage duration [basic.stc.static]</h3>

<p> In paragraph 1, edit

<blockquote>
All objects which
<del>neither</del> <ins>do not</ins> have dynamic storage duration<ins>,
do not have thread storage duration, and</ins>
<del>nor</del> are <ins>not</ins> local<ins>,</ins>
have static storage duration.
</blockquote>

<h3>3.7.2(new) Thread storage duration [basic.stc.thread]</h3>

<p> Add a new section after 3.7.1 Static storage duration [basic.stc.static]
with the following contents.

<blockquote>
<p> <ins>All objects declared with the <code>thread_local</code> keyword
have <em>thread storage duration</em>.
The storage for these objects
shall last for the duration of the thread in which they are created.
There is a distinct object per thread,
and use of the declared name
refers to the object associated with the current thread.</ins>

<p> <ins>An object with thread storage duration shall be initialized
before its first use,
and if initialized, shall be destroyed on thread exit.</ins>
</blockquote>

<h3>3.7.3.1(old) Allocation functions [basic.stc.dynamic.allocation]</h3>

<p> In paragraph 4, edit

<blockquote>
[ Note: in particular,
a global allocation function is not called to allocate storage
for objects with static storage duration (3.7.1),
<ins>for objects with thread storage duration (3.7.2(new)),</ins>
for objects of type <code>std::type_info</code> (5.2.8),
for the copy of an object thrown by a throw expression (15.1). --end note ]
</blockquote>

<h3>3.8 Object Lifetime [basic.life]</h3>

<p> In paragraph 8, edit

<blockquote>
If a program ends the lifetime of an object of type <code>T</code>
with static (3.7.1)<ins>, thread (3.7.2(new)),</ins> or
automatic <del>(3.7.2)</del><ins>(3.7.3(new))</ins>
storage duration and if <code>T</code> has a non-trivial destructor,
</blockquote>

<p> In footnote 40, edit

<blockquote>
that is, an object for which a destructor will be called implicitly --
<del>either either</del>
upon exit from the block for an object with automatic storage duration<ins>,
upon exit from the thread for an object with thread storage duration,</ins>
or upon exit from the program for an object with static storage duration.
</blockquote>

<p> In paragraph 9, edit

<blockquote>
Creating a new object at the storage location
that a <code>const</code> object with
static<ins>, thread,</ins> or automatic storage duration
occupies or,
at the storage location
that such a <code>const</code> object used to occupy
before its lifetime ended results in undefined behavior.
</blockquote>

<h3>5.19 Constant expressions [expr.const]</h3>

<p> Paragraph 2 remains unchanged,
interpreting "static" as modifying initialization
rather than as a reference to duration.

<blockquote>
Other expressions are considered constant-expressions
only for the purpose of
non-local static object initialization (3.6.2).
Such constant expressions shall evaluate to one of the following:
</blockquote>

<p> Paragraphs 4 (address constant expressions)
and 5 (reference constant expressions)
remain unchanged.
The omission of thread storage duration becomes significant, though,
in that objects with thread storage duration do not have constant addresses.
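
<p> One visible consequence, sketched here with hypothetical names:
the address of a static-duration object can be used
where an address constant expression is required,
such as a template argument,
while the address of a thread-duration object cannot.

```cpp
#include <cassert>

int shared_value = 7;                   // static storage duration
thread_local int per_thread_value = 9;  // thread storage duration

// A template parameterized on an address constant expression.
template <int* P>
int read_through() { return *P; }

// OK: &shared_value is an address constant, the same in every thread.
int read_shared() { return read_through<&shared_value>(); }

// read_through<&per_thread_value>() would be rejected: the address
// differs from thread to thread, so it is computed at run time instead.
int read_per_thread() {
    int* p = &per_thread_value;
    return *p;
}
```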

<h3>6.7 Declaration statement [stmt.dcl]</h3>

<p> In paragraph 4, edit

<blockquote>
The zero-initialization (8.5)
of all local objects with static storage duration (3.7.1)
<ins>or thread storage duration (3.7.2(new))</ins>
is performed before any other initialization takes place.
A local object of POD type (3.9)
with static <ins> or thread</ins> storage duration
initialized with constant-expressions
is initialized before its block is first entered.
An implementation is permitted to perform early initialization
of other local objects with static <ins>or thread</ins> storage duration
under the same conditions
that an implementation is permitted
to statically initialize an object
with static <ins>or thread</ins> storage duration
in namespace scope (3.6.2).
</blockquote>

<p> Paragraph 5 is unchanged,
which by implication states that thread storage duration objects
must be destructed.

<h3>7.1.1 Storage class specifiers [dcl.stc]</h3>

<p> In paragraph 1, add "<ins><code>thread_local</code></ins>"
to the list of storage class specifiers.

<p> In paragraph 1, edit

<blockquote>
At most one <var>storage-class-specifier</var>
shall appear in a given <var>decl-specifier-seq</var><del>.</del><ins>,
except that <code>thread_local</code>
may appear with <code>static</code> and <code>extern</code>.
If <code>thread_local</code> does appear,
it shall be present in all declarations referring to the same object.</ins>
</blockquote>

<p> After paragraph 3, add a new paragraph

<blockquote>
<ins>The <code>thread_local</code> specifier
can be applied only to the names of
objects of block scope that also specify <code>static</code>
or to the names of objects of namespace scope.
It specifies that the named object
has thread storage duration (3.7.2(new)).</ins>
</blockquote>

<p> In paragraph 4, edit

<blockquote>
A <code>static</code> specifier used in the declaration of an object
declares the object to have static storage duration (3.7.1)<ins>,
unless accompanied by the <code>thread_local</code> specifier,
which declares the object to have thread storage duration (3.7.2(new))</ins>
</blockquote>

<p> Paragraph 5 on <code>extern</code> is missing the parallel text.

<h3>8.5 Initializers [dcl.init]</h3>

<p> In paragraph 2, edit

<blockquote>
Automatic, register, <ins>thread,</ins> static, and
<ins>namespace-scoped</ins> external variables <del>of namespace scope</del>
can be initialized by arbitrary expressions
involving literals and previously declared variables and functions.
</blockquote>

<p> Paragraph 7 remains unchanged,
which implies that thread storage duration objects
may be uninitialized at program startup.

<h3>8.5.1 Aggregates [dcl.init.aggr]</h3>

<p> In paragraph 14, edit as follows.
The expanded scope of 3.6.2 leaves this text mostly untouched.

<blockquote>
When an aggregate with static <ins>or thread</ins> storage duration
is initialized with a brace-enclosed <var>initializer-list</var>,
if all the member initializer expressions are constant expressions,
and the aggregate is a POD type, the initialization shall be done during
a static phase of initialization (3.6.2);
otherwise, it is unspecified
whether the initialization of members with constant expressions
takes place
during the static phase or during the dynamic phase of initialization.
</blockquote>

<h3>9.2 Class members [class.mem]</h3>

<p> In paragraph 6, edit

<blockquote>
A member shall not be declared
to have automatic storage duration
(<code>auto</code>, <code>register</code>)<ins>,
with the <code>thread_local</code> <var>storage-class-specifier</var>
unless also declared <code>static</code>,</ins>
or with the <code>extern</code> <var>storage-class-specifier</var>.
</blockquote>

<h3>9.4.2 Static data members [class.static.data]</h3>

<p> In paragraph 1, edit

<blockquote>
A <code>static</code> data member is not part of the subobjects of a class.
<ins>For such a member declared <code>thread_local</code>,
there is only one copy of the member per thread.
For such a member not declared <code>thread_local</code>,
there</ins> <del>There</del> is only one copy of
<del>a <code>static</code></del> <ins>the</ins> data member
shared by all the objects of the class.
</blockquote>

<h3>12.1 Constructors [class.ctor]</h3>

<p> In paragraph 8, edit

<blockquote>
Default constructors are called implicitly to create class objects
of static<ins>, thread,</ins> or automatic
storage duration (3.7.1, <ins>3.7.2(new),</ins> 3.7.<del>2</del><ins>3(new)</ins>)
defined without an initializer (8.5),
...
</blockquote>

<h3>12.2 Temporary objects [class.temporary]</h3>

<p> In paragraph 5, edit

<blockquote>
In addition, the destruction of temporaries bound to references
shall take into account the ordering of destruction of objects
with static<ins>, thread,</ins> or automatic storage duration
(3.7.1, 3.7.2<ins>(new), 3.7.3(new)</ins>);
</blockquote>

<h3>12.4 Destructors [class.dtor]</h3>

<p> In paragraph 10, edit

<blockquote>
Destructors are invoked implicitly
(1) for a constructed object with static storage duration (3.7.1)
at program termination (3.6.3),
<ins>(new) for a constructed object with thread storage duration (3.7.2(new))
at thread exit,</ins>
(2) for a constructed object with automatic storage duration
(3.7.<del>2</del><ins>3(new)</ins>)
when the block in which the object is created exits (6.7),
(3) for a constructed temporary object
when the lifetime of the temporary object ends (12.2),
(4) for a constructed object allocated by a new-expression (5.3.4),
through use of a delete-expression (5.3.5),
(5) in several situations due to the handling of exceptions (15.3).
</blockquote>

<h3>12.6.1 Explicit initialization [class.expl.init]</h3>

<p> In paragraph 4, edit

<blockquote>
[ Note: the order in which objects
with static <ins>or thread</ins> storage duration
are initialized is described in 3.6.2 and 6.7. -- end note ]
</blockquote>

<h3>15.3 Handling an exception [except.handle]</h3>

<p> In paragraph 4, edit

<blockquote>
Exceptions thrown
in destructors of objects with static storage duration
or in constructors of <ins>static-duration</ins> namespace-scope objects
are not caught by a <var>function-try-block</var> on <code>main()</code>.
<ins>Likewise, exceptions thrown
in destructors of objects with thread storage duration
or in constructors of thread-duration namespace-scope objects
are not caught by a <var>function-try-block</var>
on the initial function of the thread.</ins>
</blockquote>

<h3>15.5.1 The <code>std::terminate()</code> function [except.terminate]</h3>

<p> In paragraph 1,
in the list of causes for termination,
edit

<blockquote>
when construction or destruction of a non-local object
with static <ins>or thread</ins> storage duration
exits using an exception (3.6.2), or
</blockquote>

<p> Another possibility is to propagate the exception to the joiner,
but then there would be no distinction between
the thread function exiting with an exception
and one of its thread-duration objects exiting with an exception.

<h3>18.4 Start and termination [support.start.term]</h3>

<p> In paragraph 3, edit

<blockquote>
The program is terminated
without executing destructors for objects
of automatic<ins>, thread,</ins> or static storage duration
and without calling the functions passed to atexit() (3.6.3).
</blockquote>

<p> Paragraph 8 discusses the interaction of destruction
and calling <code>exit</code>.
The following edit is the minimum possible change to the standard
to accommodate thread storage duration objects.

<blockquote>
The function exit() has additional behavior in this International Standard:
<ul>
<li>First,
objects with static storage duration are destroyed
and functions registered by calling <code>atexit</code> are called.
Non-local objects with static storage duration
are destroyed in the reverse order of the completion of their constructor.
(<del>Automatic objects</del>
<ins>Objects with either automatic or thread storage duration</ins>
are not destroyed
as a result of calling <code>exit()</code>.)
Functions registered with <code>atexit</code>
are called in the reverse order of their registration,
except that a function is called after any previously registered functions
that had already been called at the time it was registered.
A function registered with <code>atexit</code>
before a non-local object <code>obj1</code> of static storage duration
is initialized
will not be called until <code>obj1</code>'s destruction has completed.
A function registered with <code>atexit</code>
after a non-local object <code>obj2</code> of static storage duration
is initialized
will be called before <code>obj2</code>'s destruction starts.
A local static object <code>obj3</code>
is destroyed at the same time it would be
if a function calling the <code>obj3</code> destructor
were registered with <code>atexit</code>
at the completion of the <code>obj3</code> constructor.
</ul>
</blockquote>

</body></html>
