<html><head><title>C++ Threads</title></head>
<body>
<h1>C++ Threads</h1>

Lawrence Crowl, 2005-10-03, N1875=05-0135

<ul>
<li><a href="#introduction">Introduction</a>
<li><a href="#approach">Approach</a>
<li><a href="#fork">Fork and Join</a>
<li><a href="#timeouts">Timeouts</a>
<li><a href="#managing">Managing Threads</a>
<li><a href="#monitor">Mutual Exclusion and Conditions</a>
<li><a href="#synchronized">Synchronized Initialization</a>
<li><a href="#atomic">Atomic Operations</a>
<li><a href="#errors">Errors and Exceptions</a>
<li><a href="#library">Standard Library</a>
</ul>

<a name=introduction><h1>Introduction</h1></a>

<p>The challenge in defining a C++ threading model
is that we need to serve the needs of relatively casual thread programmers
and deliver good performance on modern systems.
However, we should not attempt to duplicate
the work of significant industry standards,
such as OpenMP.
Furthermore, a standard C++ threading model
should be reasonably implementable on existing operating systems.

<p>This paper defers the discussion of a C++ memory model,
i.e. memory synchronization,
to other papers,
most recently
<cite>N1876=05-0136
"Memory model for multithreaded C++: August 2005 status update"</cite>.

<p>This paper defers the discussion of thread-local storage
to paper <cite>N1874=05-0134 "Thread-Local Storage"</cite>.

<a name=approach><h1>Approach</h1></a>

<p>Proposals for threads in C++ generally adopt one of two approaches,
a language extension to classes or a library.
While generally workable, these approaches suffer from disadvantages.
Extensions to classes tend to tie threading, a control abstraction,
to the primary mechanism for data abstraction.
This approach prematurely binds the choice of parallelism
and synchronization into the API,
and the result is often oversynchronized programs.
A library approach often requires a fair amount of packaging
at the library interface.
Furthermore, a library interface implies either
a compiler that recognizes the library
or a compiler that avoids certain optimizations.
Finally, both of these approaches suffer from an inapplicability to C.
The problems that threads address are equally prevalent in C programs,
and so a C++ solution that can be relatively easily adopted into C
would serve the C/C++ community well.

<p>This proposal adopts a syntax-based implementation
of threads and synchronization.
There are new operators and new control statements
to address the new control domain.
This proposal requires significant compiler implementation effort.
On the other hand,
that effort is lower
than the effort in recognizing a library-based implementation.
In keeping with the desire for feasible C adoption,
the syntax is C-like, just as are C++ control constructs.
Likewise, the proposal adopts C-like function names where necessary.

<p>Introducing new keywords is problematic.
This proposal follows the <tt>_Upper</tt> convention for new keywords,
recognizing that many persons find this convention ugly.
It is, however, the least likely to introduce conflicts with existing programs
and the most likely to ensure debate on the choice of keywords.

<p>Any C++ thread design is also constrained by the core thread utilities
provided by the various operating systems.
The two primary constraints are POSIX threads and Windows threads.

<a name=fork><h2>Fork and Join</h2></a>

<p>The <tt>_Fork</tt> (prefix) operator
creates threads to execute a function call.
The thread operator's value is a join object.
The <tt>_Join</tt> (prefix) operator, applied to a join object,
yields the return value of the function.
Variables of join type are declared
with the <tt>_Join</tt> operator, as shown in the example below.

<p><blockquote><tt><pre>
int function( int argument ) {
    int _Join joiner = _Fork work1( argument );
    // concurrent execution of work1 has begun
    int value = work2( argument );
    return value + _Join joiner;
}
</pre></tt></blockquote>

<p>The join object must be live until the thread has been joined.
The return value within joiner is live from the function return
until the join.

<p>There is no select-like facility
for joining the first ready joiner of a set of joiners.
The expectation is that programmers
will use other thread synchronization facilities for this purpose.
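<p>As a point of comparison only, the fork/join example above maps closely onto the <tt>std::async</tt> and <tt>std::future</tt> facilities of later standard C++ (an anachronism relative to this proposal, not part of it); the <tt>work1</tt> and <tt>work2</tt> bodies below are stand-in workloads:

```cpp
#include <future>

int work1( int argument ) { return argument * 2; }  // stand-in workload
int work2( int argument ) { return argument + 1; }  // stand-in workload

int function( int argument ) {
    // analogous to: int _Join joiner = _Fork work1( argument );
    std::future<int> joiner = std::async( std::launch::async, work1, argument );
    // concurrent execution of work1 has begun
    int value = work2( argument );
    // analogous to: return value + _Join joiner;
    return value + joiner.get();
}
```

As with the join object, the future must remain live until its value is retrieved.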

<a name=timeouts><h2>Timeouts</h2></a>

<p>Several operations require a notion of timeout.
The type <tt>struct _Timeout</tt> provides a means to specify a timeout.
There are two interfaces to create timeout values,
one suitable for C,
<blockquote><pre>
<tt>struct _Timeout;
void _Timeout_set( struct _Timeout * result, int32_t seconds, int32_t nanoseconds );
</tt></pre></blockquote>
and the other suitable for C++.
<blockquote><pre>
<tt>struct _Timeout {
    _Timeout( int32_t seconds, int32_t nanoseconds );
    // private implementation
};
</tt></pre></blockquote>
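<p>One plausible implementation of such a type is sketched below, under the assumption that a timeout is stored as normalized seconds and nanoseconds; the proposal leaves the actual representation private:

```cpp
#include <cstdint>

// Hypothetical sketch of a timeout type; the normalization to
// nanoseconds < 1,000,000,000 is an assumption, not specified text.
struct Timeout {
    int32_t seconds;
    int32_t nanoseconds;
    Timeout( int32_t s, int32_t ns )
        : seconds( s + ns / 1000000000 ),
          nanoseconds( ns % 1000000000 ) { }
};
```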

<a name=managing><h2>Managing Threads</h2></a>

<p>Managing threads requires a handle on threads.
A pointer to the join object serves this purpose.
A pointer to a join object of arbitrary type
may be cast to a pointer to a join object of void type
and back to a pointer of the original type.
When joining through such a pointer to a void joiner,
the return value is lost.
Pointers to join objects may be compared for equality or inequality
regardless of the result type of joiner.
So that a thread may manage itself,
a thread may obtain a pointer to its own join object.

<blockquote><tt>void _Join * _Thread_self();</tt></blockquote>

<p>A thread may indicate a good point
to cede the processor to other threads
with the yield function.

<blockquote><tt>void _Thread_yield();</tt></blockquote>

<p>Likewise, a thread may induce a <em>minimum delay</em>
with the sleep function.

<blockquote><tt>void _Thread_sleep( struct _Timeout );</tt></blockquote>

<p>It is an open issue
whether threads should be able to induce yields and sleeps in other threads.
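<p>For comparison, later standard C++ provides directly analogous facilities as <tt>std::this_thread::yield</tt> and <tt>std::this_thread::sleep_for</tt>; the helper below is illustrative only and not part of the proposal:

```cpp
#include <chrono>
#include <thread>

// Sleep for at least the given number of milliseconds and report the
// elapsed wall-clock time; sleep is a *minimum* delay, never exact.
long long sleep_at_least( int milliseconds ) {
    auto start = std::chrono::steady_clock::now();
    std::this_thread::yield();  // analogous to _Thread_yield()
    std::this_thread::sleep_for( std::chrono::milliseconds( milliseconds ) );
    auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::milliseconds>( elapsed ).count();
}
```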

<a name=monitor><h2>Mutual Exclusion and Conditions</h2></a>

<p>Mutual exclusion is provided via the monitor paradigm.
This paradigm is realized with a mutex variable, a lock statement,
a condition variable, a wait statement, and a notify statement.

<p>There are three types of mutex variables,
<tt>_Mutex_simple</tt>, <tt>_Mutex_tested</tt>, and <tt>_Mutex_timed</tt>.

<p>The lock statement provides an else clause in the event that the mutex
is already held
(<tt>_Mutex_tested</tt> or <tt>_Mutex_timed</tt>)
or cannot be acquired within the specified timeout
(<tt>_Mutex_timed</tt> only).

<blockquote>
<tt>_Lock ( <var>mutex-lvalue</var>
</tt> [ <tt>
; <var>timeout-value</var>
</tt> ] <tt>
)<br>
<var>statement</var>
</tt> [ <tt>
else <var>statement</var></tt> ]
</blockquote>
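<p>For comparison, the timed form of the lock statement corresponds to a try-lock-with-timeout on the mutex. The sketch below uses the <tt>std::timed_mutex</tt> of later standard C++ (not this proposal's syntax), and the <tt>locked_update</tt> helper is hypothetical:

```cpp
#include <chrono>
#include <mutex>

std::timed_mutex mutex;  // analogous to a _Mutex_timed variable

// Returns true if the critical section ran, false if the else branch did;
// a sketch of: _Lock ( mutex ; timeout ) statement else statement
bool locked_update( int & shared, std::chrono::milliseconds timeout ) {
    if ( mutex.try_lock_for( timeout ) ) {
        shared += 1;          // the primary statement, under the lock
        mutex.unlock();
        return true;
    } else {
        return false;         // the else statement, without the lock
    }
}
```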

<p>Within the active statement of a lock statement,
programs may wait on conditions.
These conditions are represented as variables of type <tt>_Condition</tt>.
The corresponding statements are wait and notify.

<p>The wait statement specifies a condition variable,
a boolean expression, and an optional timeout.

<blockquote>
<tt>_Wait ( <var>condition-lvalue</var>
; <var>boolean-expr</var>
</tt> [ <tt>
; <var>timeout-value</var>
</tt> ] <tt>
)<br>
<var>statement</var> 
</tt> [ <tt>
else <var>statement</var></tt> ]
</blockquote>

<p>If the boolean expression of the wait statement evaluates to false,
the thread releases the lock and waits for a notify statement
against the same condition variable,
then reacquires the lock and re-evaluates the expression,
repeating until it evaluates to true.
When the expression evaluates to true,
the primary statement executes.
If the timeout is present
and the thread is not notified within the specified time,
the thread will reacquire the lock and execute the else statement.

<p>The notify statement specifies a condition variable
and an optional count.

<blockquote>
<tt>_Notify ( <var>condition-lvalue</var>
</tt> [ <tt>
; <var>count</var>
</tt> ] <tt>
); </tt>
</blockquote>

<p>If the count is not present,
all threads waiting on the condition will be notified.
If the count is present,
at most that number of waiting threads will be notified.

<p>For example, an int buffer might be implemented as
<blockquote><tt><pre>
class buffer {
    int head;
    int tail;
    int store[10];
    _Mutex_simple mutex;
    _Condition not_full;
    _Condition not_empty;
public:
    buffer() : head( 0 ) , tail( 0 ) { }
    void insert( int arg ) {
        _Wait( not_full; (head+1)%10 != tail ) {
            store[head] = arg;
            head = (head+1)%10;
            _Notify( not_empty );
        }
    }
    int remove() {
        int result;
        _Wait( not_empty; head != tail ) {
            result = store[tail];
            tail = (tail+1)%10;
            _Notify( not_full );
        }
        return result;
    }
};
</pre></tt></blockquote>
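<p>The <tt>_Wait</tt> statement bundles the release, wait, reacquire, and retest steps that must be written out by hand against a plain mutex and condition variable. For comparison, a rough analogue of the buffer in later standard C++ (not this proposal's syntax):

```cpp
#include <condition_variable>
#include <mutex>

class buffer {
    int head = 0;
    int tail = 0;
    int store[10];
    std::mutex mutex;
    std::condition_variable not_full;
    std::condition_variable not_empty;
public:
    void insert( int arg ) {
        std::unique_lock<std::mutex> lock( mutex );
        // _Wait's retest loop, written explicitly as a predicate wait:
        not_full.wait( lock, [this]{ return (head+1)%10 != tail; } );
        store[head] = arg;
        head = (head+1)%10;
        not_empty.notify_all();  // analogous to _Notify( not_empty )
    }
    int remove() {
        std::unique_lock<std::mutex> lock( mutex );
        not_empty.wait( lock, [this]{ return head != tail; } );
        int result = store[tail];
        tail = (tail+1)%10;
        not_full.notify_all();   // analogous to _Notify( not_full )
        return result;
    }
};
```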

<a name=synchronized><h2>Synchronized Initialization</h2></a>

<p>In C++, 
variables with program extent, e.g. static local variables,
may be initialized dynamically,
and hence are at risk of unsynchronized initialization.
A new storage class, <tt>_Synchronized</tt>,
declares that the variable initialization is to be synchronized.
Only one thread will initialize the variable,
and all threads will wait for initialization to complete.
The <tt>_Synchronized</tt> keyword is only required on variable definition.

<p><blockquote><pre><tt>
struct type {
     static int shared;
};
_Synchronized int type::shared = initial_value();

int function( int arg ) {
    _Synchronized static int holder = arg;
    return holder + arg;
}
</tt></pre></blockquote>

<p>Due to the potential for deadlock,
programmers should minimize reliance on other synchronized variables
in the initialization of a synchronized variable.
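<p>For comparison, the same guarantee (one thread runs the initializer, all other threads wait for it to complete) is what POSIX <tt>pthread_once</tt> provides, and what later standard C++ exposes as <tt>std::call_once</tt>; a sketch, with a stand-in initializer:

```cpp
#include <mutex>

int initial_value() { return 42; }  // stand-in initializer

int holder;
std::once_flag holder_flag;

// Analogous to: _Synchronized static int holder = initial_value();
// exactly one caller runs the lambda; the rest block until it finishes.
int get_holder() {
    std::call_once( holder_flag, []{ holder = initial_value(); } );
    return holder;
}
```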

<a name=atomic><h2>Atomic Operations</h2></a>

<p>Atomic operations are among the most difficult to standardize
because of the wide range of atomic primitives
provided by the various machines.
Atomic operations preferred on one machine
may be inefficient or unavailable on another machine.
(Though compare-and-swap is becoming nearly universal
on modern general-purpose machines.)
However, all machines provide at least one atomic synchronization primitive
that may be used to emulate others.
The requirement, then, is a mechanism for atomicity
that does not require any specific atomic operation,
and that enables efficient exploitation of many operations.

<p>This requirement is satisfied by the <tt>_Atomic</tt> statement,
which accepts a variable and a statement.
The overall model of the <tt>_Atomic</tt> statement is 
that of a compiler-generated compare-and-swap loop.
However, the intent is that compilers recognize simple statements
corresponding to the processor's atomic operations
and use those atomic operations.
On the other hand, when a compare-and-swap implementation is not feasible,
the intent is that the compiler use tested busy locking.
As a consequence, the <tt>_Atomic</tt> statement behavior
is undefined in the presence of asynchronous signals.

<p>Because atomic operations may fail,
the usual approach is to retry those operations.
However, unbounded retry may lead to live-lock.
To prevent this problem,
the <tt>_Atomic</tt> statement accepts a count of the number
of times an atomic operation may be attempted before
failing over to the else clause.

<p><blockquote>
<tt>_Atomic ( <var>lvalue</var>
</tt> [ <tt>
; <var>attempt-count</var>
</tt> ] <tt>
) <var>statement</var> 
</tt> [ <tt>
else <var>statement</var></tt> ]
</blockquote>

<p>The variable specified in the <tt>_Atomic</tt> statement
may be read at most once and written at most once,
in that order,
and must be referenced through the same lvalue as specified.
Reads of other non-local variables have acquire semantics.
Writes to other non-local variables have release semantics.

<p>The classic test-and-set is equivalent to 
<blockquote><tt><pre>
extern bool bit;
bool old;
_Atomic( bit ) { old = bit; bit = true; }
</pre></tt></blockquote>

<p>A simple but more realistic example is inserting a new list head element.
<blockquote><tt><pre>
extern element * head;
element * item;
_Atomic( head ) { item-&gt;next = head; head = item; }
</pre></tt></blockquote>

<p>A more comprehensive example is decrementing a reference count.
When decrementing a reference count,
the program needs to know if the count has reached zero
to deallocate the item.
However, reading the reference count after changing it
is not thread-safe.
The solution is to store the new value in a function-local variable
within the <tt>_Atomic</tt> statement.
<blockquote><tt><pre>
element * item;
int current;
_Atomic( item-&gt;refs ) { item-&gt;refs = current = item-&gt;refs - 1; }
</pre></tt></blockquote>
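<p>The compare-and-swap loop that a compiler would generate for the reference-count example can be written out directly. The sketch below uses <tt>std::atomic</tt>, which postdates this proposal, and the <tt>decrement_refs</tt> helper is hypothetical:

```cpp
#include <atomic>

// A sketch of the compare-and-swap loop the compiler would generate for
//     _Atomic( item->refs ) { item->refs = current = item->refs - 1; }
// Returns the new count so the caller can test it against zero.
int decrement_refs( std::atomic<int> & refs ) {
    int old = refs.load();
    int current;
    do {
        current = old - 1;  // the body, computed from the value read
        // on failure, compare_exchange_weak reloads `old` and we retry
    } while ( !refs.compare_exchange_weak( old, current ) );
    return current;
}
```

An attempt count, as in the <tt>_Atomic</tt> statement, would simply bound the iterations of this loop before failing over to the else statement.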

<a name=errors><h2>Errors and Exceptions</h2></a>

<p>Exceptions thrown out of the function called at a <tt>_Fork</tt> operator
are propagated through the <tt>_Join</tt> operator.

<p>It is an open issue whether
threads should be able to induce exceptions in another thread.
It is clear, however, that such exceptions can be induced
only synchronously and at predictable points in the program.

<p>It is an open issue how to handle errors through a C interface.

<a name=library><h2>Standard Library</h2></a>

<p>The standard library may need significant extension
in the definition of concurrent data structures.

</body></html>
