<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>

<meta http-equiv="Content-Type" content="text/html;charset=US-ASCII">

<style type="text/css">

body {
  color: #000000;
  background-color: #FFFFFF;
}

del {
  text-decoration: line-through;
  color: #8B0040;
}
ins {
  text-decoration: underline;
  color: #005100;
}

p.example {
  margin: 2em;
}
pre.example {
  margin: 2em;
}
div.example {
  margin: 2em;
}

code.extract {
  background-color: #F5F6A2;
}
pre.extract {
  margin: 2em;
  background-color: #F5F6A2;
  border: 1px solid #E1E28E;
}

p.function {
}

p.attribute {
  text-indent: 3em;
}

blockquote.std {
  color: #000000;
  background-color: #F1F1F1;
  border: 1px solid #D1D1D1;
  padding: 0.5em;
}

blockquote.stddel {
  text-decoration: line-through;
  color: #000000;
  background-color: #FFEBFF;
  border: 1px solid #ECD7EC;
  padding: 0.5em;
}

blockquote.stdins {
  text-decoration: underline;
  color: #000000;
  background-color: #C8FFC8;
  border: 1px solid #B3EBB3;
  padding: 0.5em;
}

table {
  border: 1px solid black;
  border-spacing: 0px;
  margin-left: auto;
  margin-right: auto;
}
th {
  text-align: left;
  vertical-align: top;
  padding: 0.2em;
  border: none;
}
td {
  text-align: left;
  vertical-align: top;
  padding: 0.2em;
  border: none;
}

</style>

<title>C++ Latches and Barriers</title>
</head>
<body>
<h1>C++ Latches and Barriers</h1>

<p>
ISO/IEC JTC1 SC22 WG21 N3817 - 2013-10-11
</p>

<p>
Alasdair Mackintosh, alasdair@google.com, alasdair.mackintosh@gmail.com
</p>

<p>
<a href="#Introduction">Introduction</a><br>
<a href="#Solution">Solution</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#latch_operations">Latch Operations</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#barrier_operations">Barrier Operations</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#example">Sample Usage</a><br>
<a href="#synopsis">Synopsis</a><br>
<a href="#wording">Wording</a><br>
</p>


<h2><a name="Introduction">Introduction</a></h2>

<p>
Certain idioms that are commonly used in concurrent programming are missing
from the standard libraries. Although many of these these can be relatively
straightforward to implement, we believe it is more efficient to have a
standard version.
</p><p>
</p>
In addition, although some idioms can be provided using
mutexes, higher performance can often be obtained with atomic operations and
lock-free algorithms. However, these algorithms are more complex to write, and
are prone to error.
<p></p>
<p>
Other standard concurrency idioms may have difficult corner cases, and can be
hard to implement correctly. For these reasons, we believe that it is
valuable to provide these in the standard library.
</p>

<h2><a name="Solution">Solution</a></h2>

<p>
We propose a set of commonly-used concurrency classes, some of which may be
implemented using efficient lock-free algorithms where appropriate. This paper
describes the <i>latch</i> and <i>barrier</i> classes.
</p>
<p>
Latches are a thread co-ordination mechanism that allow one or more threads to
block until an operation is completed. An individual latch is a single-use
object; once the operation has been completed, it cannot be re-used.
</p>
<p>
Barriers are a thread co-ordination mechanism that allow multiple threads to
block until an operation is completed. Unlike a latch, a barrier is re-usable;
once the operation has been completed, the threads can re-use the same
barrier. It is thus useful for managing repeated tasks handled by multiple
threads.
</p>
<p>
A reference implementation of these two classes has been written.
</p>

<h3><a name="latch_operations">Latch Operations</a></h3>
A latch maintains an internal counter that is initialized when the latch is
created. One or more threads may block waiting until the counter is decremented
to 0.

<dl>
<dt><code>constructor( size_t );</code></dt>
<dd>
<p>
The parameter is the initial value of the internal counter.
</p>
</dd>

<dt><code>destructor( );</code></dt>
<dd>
<p>
Destroys the latch. If the latch is destroyed while other threads are blocked in
<code>wait()</code>, the behaviour is undefined. Once the internal count has
reached 0, the latch may safely be destroyed.
</p>
</dd>

<dt><code>void count_down( );</code></dt>
<dd>
<p>
Decrements the internal count by 1, and returns. If the count reaches 0, any
threads blocked in <code>wait()</code> will be released.
</p>
<p>
Throws <code>std::logic_error</code> if the internal count is already 0.
</p>
</dd>

<dt><code>void wait( );</code></dt>
<dd>
<p>
Blocks the calling thread until the internal count is decremented to 0 by one
or more other threads calling <code>count_down()</code>. If the count is
already 0, this is a no-op.
</p>
</dd>

<dt><code>bool try_wait( );</code></dt>
<dd>
<p>
Returns true if the internal count has been decremented to 0 by one
or more other threads calling <code>count_down()</code>, and false otherwise.
Does not block the calling thread.
</p>
</dd>

<dt><code>void count_down_and_wait( );</code></dt>
<dd>
<p>
Decrements the internal count by 1. If the resulting count is not 0, blocks the
calling thread until the internal count is decremented to 0 by one or more
other threads calling <code>count_down()</code>.
</p>
</dd>
</dl>
<p>
There are no copy or assignment operations.
</p>

<h4>Memory Ordering</h4>
All calls to <code>count_down()</code> synchronize with any thread
returning from <code>wait()</code>. All calls to <code>count_down()</code>
synchronize with any thread that gets a true value from <code>try_wait()</code>.

<h3><a name="barrier_operations">Barrier Operations</a></h3>
<p>
A barrier maintains an internal thread counter that is initialized when the
barrier is created. Threads may decrement the counter and then block waiting
until the specified number of threads are blocked. All threads will then be
woken up, and the barrier will reset. In addition, there is a mechanism to
change the thread count value after the count reaches 0.
</p>

<dl>
<dt><code>constructor( size_t );</code></dt>
<dd>
<p>
The parameter is the initial value of the internal thread counter.
</p>
<p>
Throws <code>std::invalid_argument</code> if the specified count is 0.
</p>
</dd>

<dt><code>constructor( size_t, std::function&lt;void()&gt; );</code></dt>
<dt><code>constructor( size_t, void (*completion)() );</code></dt>
<dd>
<p>
The parameters are the initial value of the internal thread counter, and a
function that will be invoked when the counter reaches 0.
</p>
<p>
Throws <code>std::invalid_argument</code> if the specified count is 0.
</p>
<p>
(Note that as well as these forms, a constructor templated on any callable type
could also be provided.)
</p>
</dd>

<dt><code>constructor( size_t, std::function&lt;size_t()&gt; );</code></dt>
<dt><code>constructor( size_t, size_t (*completion)() );</code></dt>
<dd>
<p>
The parameters are the initial value of the internal thread counter, and a
function that will be invoked when the counter reaches 0. The return value
from the function will be used to reset the internal thread counter.
</p>
<p>
Throws <code>std::invalid_argument</code> if the specified count is 0.
</p>
<p>
(Note that as well as these forms, a constructor templated on any callable type
could also be provided.)
</p>
</dd>

<dt><code>destructor( );</code></dt>
<dd>
<p>
Destroys the barrier. If the barrier is destroyed while other threads are
blocked in <code>count_down_and_wait()</code>, the behaviour is undefined.
</p>
</dd>

<dt><code>void count_down_and_wait( );</code></dt>
<dd>
<p>
Decrements the internal thread count by 1. If the resulting count is not 0,
blocks the calling thread until the internal count is decremented to 0 by one
or more other threads calling <code>count_down_and_wait()</code>.
</p>
<p>
Before any threads are released, the completion function registered in the
constructor will be invoked (if specified and non-NULL). Note that the
completion function may be invoked in the context of one of the blocked
threads.  When the completion function returns, the internal thread count will
be reset, and all blocked threads will be unblocked. If the barrier was created
with a void function, then the internal thread count will be reset to the
initial thread count specified in the contructor. If the barrier was created
with a function returning size_t, then the thread count will be reset to the
function's return value. It is illegal to return 0.
</p><p>
Note that it is safe for a thread to re-enter <code>count_down_and_wait()</code>
immediately. It is not necessary to ensure that all blocked threads have exited
<code>count_down_and_wait()</code> before one thread re-enters it.
</p></dd>
</dl>
<p>
There are no copy or assignment operations.
</p>
<p>
Note that the barrier does not have separate <code>count_down()</code> and
<code>wait()</code> methods. Every thread that counts down will then block until
all threads have counted down. Hence only the
<code>count_down_and_wait()</code> method is supported.
</p>


<h4>Memory Ordering</h4>
For threads X and Y that call <code>count_down_and_wait()</code>, the
call to <code>count_down_and_wait()</code> in X synchronizes with the return from
<code>count_down_and_wait()</code> in Y.

<h4>A Note on Completion Functions and Templates</h4>
The proposed barrier takes an optional completion function, which may either
return void or size_t. A barrier may thus do one of three things after all
threads have called <code>count_down_and_wait()</code>:
<ul>
<li>
Reset itself automatically (if given no completion function.)
</li>
<li>
Invoke the completion function and then reset itself automatically (if given a
function returning void).
</li>
<li>
Invoke the completion function and use the return value to reset itself (if
given a function returning size_t).
</li>
</ul>
<p>
It has been suggested that the barrier class could be templated on the type of
the completion function, so that a specialisation could be made for the first
type of barrier. (If the compiler knew that no completion function would be
involved, it could generate faster code, possibly using hardware barriers.)
</p>
<p>
However, doing this would require users of the barrier to declare a templated
barrier parameter rather than using a plain barrier type. Given this, it would
be simpler to create multipe barrier types (e.g. simple_barrier and barrier,
with the same interface) and let users of the barrier class template that.
</p>

<h4>Use of a scoped guard to manage latches and barriers</h4>
A future paper will propose a <code>scoped_guard</code>; a helper class that invokes a
function when it goes out of scope. This is analagous to
a <code>std::unique_ptr</code>, which deletes an object when it goes out of
scope.

The latch and barrier classes could provide scoped guards that would ensure
that a latch was always decremented when a worker had finished, regardless of
how the worker terminated. For example:
<pre>
<code>
  void DoWork(latch& completion_latch) {
    // Automatically invokes completion_latch.count_down() when this
    // function terminates 
    scoped_guard g = completion_latch.count_down_guard();
    
    switch (state) {
      case 0:
        return;
      case 1:
        alogorithm1();
        return;
      case 2:
        alogorithm2();
        return;
      default:
        throw std::logic_error     
    }  
  }  
</code>
</pre>
We suggest that a suitable <code>xxx_guard()</code> method be provided for all
of the latch and barrier methods, as an aid to safe useage. This would avoid
having threads waiting for termination conditions that are never triggered.

<h3><a name="example">Sample Usage</a></h3>
Sample use cases for the latch include:
<ul>
<li>
Setting multiple threads to perform a task, and then waiting until all threads
have reached a common point.
</li>
<li>
Creating multiple threads, which wait for a signal before advancing beyond a
common point.
</li>
</ul>
An example of the first use case would be as follows:
<pre>
<code>
  void DoWork(threadpool* pool) {
    latch completion_latch(NTASKS);
    for (int i = 0; i &lt; NTASKS; ++i) {
      pool-&gt;add_task([&amp;] {
        // perform work
        ...
        completion_latch.count_down();
      }));
    }
    // Block until work is done
    completion_latch.wait();
  }
</code>
</pre>
An example of the second use case is shown below. We need to load data and then
process it using a number of threads. Loading the data is I/O bound, whereas
starting threads and creating data structures is CPU bound. By running these in
parallel, throughput can be increased.
<pre>
<code>
  void DoWork() {
    latch start_latch(1);
    vector&lt;thread*&gt; workers;
    for (int i = 0; i &lt; NTHREADS; ++i) {
      workers.push_back(new thread([&amp;] {
        // Initialize data structures. This is CPU bound.
        ...
        start_latch.wait();
        // perform work
        ...
      }));
    }
    // Load input data. This is I/O bound.
    ...
    // Threads can now start processing
    start_latch.count_down();

    // Wait for threads to finish, delete allocated objects.
    ...
  }

</code>
</pre>
<p>
The barrier can be used to co-ordinate a set of threads carrying out a repeated
task. The number of threads can be adjusted dynamically to respond to changing
requirements.
</p>
<p>
In the example below, a number of threads are performing a multi-stage
task. Some tasks may require fewer steps than others, meaning that some threads
may finish before others. We reduce the number of threads waiting on the
barrier when this happens.
</p>

<pre>
<code>

  void DoWork() {
    Tasks&amp; tasks;
    size_t initial_threads;
    atomic&lt;size_t&gt; current_threads(initial_threads)
    vector&lt;thread*&gt; workers;

    // Create a barrier, and set a lambda that will be invoked every time the
    // barrier counts down. If one or more active threads have completed,
    // reduce the number of threads.
    std::function rf = [&amp;] { return current_threads;};
    barrier task_barrier(n_threads, rf);

    for (int i = 0; i &lt; n_threads; ++i) {
      workers.push_back(new thread([&amp;] {
        bool active = true;
        while(active) {
          Task task = tasks.get();
          // perform task
          ...
          if (finished(task)) {
            current_threads--;
            active = false;
          }
          task_barrier.count_down_and_wait();
         }
       });
    }

    // Read each stage of the task until all stages are complete.
    while (!finished()) {
      GetNextStage(tasks);
    }
  }
</code>
</pre>

<h2><a name="synopsis">Synopsis</a></h2>
<p>
The synopsis is as follows.
</p>

<pre>
<code>
class latch {
 public:
  explicit latch(size_t count);
  ~latch();

  void count_down();

  void wait();

  bool try_wait();

  void count_down_and_wait();
};

class barrier {
 public:
  explicit barrier(size_t num_threads) throw (std::invalid_argument);
  barrier(size_t num_threads, std::function&lt;void()&gt; completion) throw (std::invalid_argument);
  barrier(size_t num_threads, void (*completion)()) throw (std::invalid_argument);

  barrier(size_t num_threads, std::function&lt;size_t()&gt; completion) throw (std::invalid_argument);
  barrier(size_t num_threads, size_t (*completion)()) throw (std::invalid_argument);
  ~barrier();

  void count_down_and_wait();

};
</code>
</pre>

<h2><a name="wording">Wording</a></h2>
The wording in this section is relative to N3242.

<h3>30.7 Thread coordination [thread.coordination]</h3>
<p>
This section provides mechanisms for thread coordination: latches and
barriers.
These mechanisms allow multiple threads to block until a given operation
has completed.
</p>
<h3>30.7.1 Latches [thread.latch]</h3>
A latch is a thread co-ordination mechanism that allow one or more threads to
block until an operation is completed.
An individual latch is a single-use object; once the operation has been
completed, it cannot be re-used.
<h4>30.7.1.1 Class latch [thread.latch]</h4>
<pre>
<code>
namespace std {
  class latch {
  public:
    explicit latch(size_t count);
    ~latch();

    void count_down();

    void wait();

    bool try_wait();

    void count_down_and_wait();
  };
}
032840</pre>
</code>
<p>
The class <code>latch</code> provides a single-use thread coordination mechanism.
An individual latch instance maintains an internal counter that is initialized
when the latch is created.
One or more threads may block waiting until the counter is decremented to 0.
</p>
<p>
<code>latch(size_t count);</code>
</p><blockquote>
<p>
<i>Effects:</i>
Constructs an object of type latch.
The parameter is the initial value of the internal counter.
</p>
<p>
<i>Throws:</i>
Nothing.
</p>
</blockquote>

<code>~latch();</code>
<blockquote>
<p>
<i>Effects:</i>
Destroys the latch.
</p>
<p>
<i>Requires:</i>
There are no threads currently
calling <code>wait()</code>, <code>count_down()</code>
or <code>count_down_and_wait()</code>.
If there are, the behaviour is undefined.
</p>
<p>
<i>Throws:</i>
Nothing.
</p>
</blockquote>

<code>void count_down();</code>
<blockquote>
<p>
<i>Effects:</i>
Decrements the internal count by 1, and returns. If the count reaches 0, any
threads blocked in <code>wait()</code> will be released.
</p><p>
<i>Synchronization:</i>
All calls to <code>count_down()</code> synchronize with any thread
returning from <code>wait()</code>.
All calls to <code>count_down()</code> synchronize with any thread 
that obtains a true return value from <code>try_wait()</code>.
</p>
<p>
<i>Throws:</i>
<code>std::logic_error</code> if the internal count is already 0.
<code>system_error</code> when an exception is required.
</p>
</blockquote>
<p></p>

<code>void wait();</code>
<blockquote>
<p>
<i>Effects:</i>
Blocks the calling thread until the internal count is decremented to 0 by one
or more other threads calling <code>count_down()</code>.
If the count is already 0, returns immediately.
</p>
<p>
<i>Synchronization:</i>
See <code>count_down()</code>.
</p>
<p>
<i>Throws:</i>
 <code>system_error</code> when an exception is required.
</p>
</blockquote>

<code>bool try_wait();</code>
<blockquote>
<p>
<i>Effects:</i>
Returns true if the internal count has been decremented to 0 by one
or more other threads calling <code>count_down()</code>, and false otherwise.
Does not block the calling thread.
</p>
<p>
<i>Synchronization:</i>
See <code>count_down()</code>.
</p>
<p>
<i>Throws:</i>
Nothing.
</p>
</blockquote>

<code>void count_down_and_wait();</code>
<blockquote>
<p>
<i>Effects:</i>
Decrements the internal count by 1.
If the resulting internal count is not 0, blocks the
calling thread until the internal count is decremented to 0 by one or more
other threads calling <code>count_down()</code> or <code>count_down_and_wait</code>.
</p>
<p>
<i>Synchronization:</i>
All calls to <code>count_down_and_wait()</code> synchronize with any thread
returning from <code>wait()</code>.
All calls to <code>count_down_and_wait()</code> synchronize with any thread 
that obtains a true return value from <code>try_wait()</code>.
</p>
<p>
<i>Throws:</i>
<code>std::logic_error</code> if the internal count is already 0.
<code>system_error</code> when an exception is required.
</p>
</blockquote>
<h3>30.7.2 Barriers [thread.barrier]</h3>
A barrier is a thread co-ordination mechanism that allows one or more threads to
block until an operation is completed. A barrier is re-usable; once the operation
has been completed, it can be used again to block until a subsequent set of
operations has completed.

<h4>30.7.2.1 Class barrier [thread.barrier]</h4>
<pre>
<code>
class barrier {
 public:
  explicit barrier(size_t num_threads) throw (std::invalid_argument);
  barrier(size_t num_threads, std::function&lt;void()&gt; completion) throw (std::invalid_argument);
  barrier(size_t num_threads, void (*completion)()) throw (std::invalid_argument);

  barrier(size_t num_threads, std::function&lt;size_t()&gt; completion) throw (std::invalid_argument);
  barrier(size_t num_threads, size_t (*completion)()) throw (std::invalid_argument);

  ~barrier();

  void count_down_and_wait();

};
</code>
</pre>
<p>
The class <code>barrier</code> provides a re-usable coordination mechanism.
An individual barrier maintains an internal thread counter that is
initialized when the barrier is created.
One or more threads may decrement the counter and then block waiting
until the specified number of threads are similarly blocked.
All threads will then be unblocked, and the barrier will reset.
While resetting the barrier, there is a mechanism to
change the internal thread count value.
</p>
<p>
<code>barrier(size_t count);</code>
</p><blockquote>
<p>
<i>Effects:</i>
Constructs an object of type barrier.
The parameter is the initial value of the internal counter.
</p>
<p>
<i>Throws:</i>
<code>std::invalid_argument</code> if the specified count is 0.
</p>
</blockquote>
<p>

<code>barrier(size_t num_threads, std::function&lt;void()&gt; completion);</code>
</p><blockquote>
<p>
<i>Effects:</i>
Constructs an object of type barrier.
The parameters are the initial value of the internal counter, and a 
<code>std::function</code> completion that will be invoked when the 
internal count reaches 0.
If the function argument is NULL, it will be ignored.
</p>
<p>
<i>Throws:</i>
<code>std::invalid_argument</code> if the specified count is 0.
</p>
</blockquote>

<code>barrier(size_t num_threads, void (*completion)()));</code>
<blockquote>
<p>
<i>Effects:</i>
Constructs an object of type barrier.
The parameters are the initial value of the internal counter, and a 
completion function that will be invoked when the internal count reaches 0.
If the function argument is NULL, it will be ignored.
</p>
<p>
<i>Throws:</i>
<code>std::invalid_argument</code> if the specified count is 0.
</p>
</blockquote>

<code>barrier(size_t num_threads, std::function&lt;size_t()&gt; completion);</code>
<blockquote>
<p>
<i>Effects:</i>
Constructs an object of type barrier.
The parameters are the initial value of the internal counter, and a 
<code>std::function</code> completion function that will be invoked
when the internal count reaches 0.
If the function argument is NULL, it will be ignored.
</p>
<p>
<i>Throws:</i>
<code>std::invalid_argument</code> if the specified count is 0.
</p>
</blockquote>

<code>barrier(size_t num_threads, size_t (*completion)()));</code>
<blockquote>
<p>
<i>Effects:</i>
Constructs an object of type barrier.
The parameters are the initial value of the internal counter, and a 
completion function that will be invoked when the internal count reaches 0.
If the function argument is NULL, it will be ignored.
</p>
<p>
<i>Throws:</i>
<code>std::invalid_argument</code> if the specified count is 0.
</p>
</blockquote>


<code>~barrier();</code>
<blockquote>
<p>
<i>Effects:</i>
Destroys the barrier.
</p>
<p>
<i>Requires:</i>
There are no threads currently
calling <code>count_down_and_wait()</code>.
If there are, the behaviour is undefined.
</p>
<p>
<i>Throws:</i>
Nothing.
</p>
</blockquote>

<code>void count_down_and_wait();</code>
<blockquote>
<p>
<i>Effects:</i>
Decrements the internal thread count by 1.
If the resulting count is not 0, blocks the calling thread until the 
internal count is decremented to 0 by one or more other threads calling
<code>count_down_and_wait()</code>. 
Before any threads are unblocked, the completion function parameter passed to the 
constructor (if specified and non-NULL) will be invoked.
When the completion function returns, the internal thread count will
be reset, and all blocked threads will be unblocked.
If the barrier was created with a void completion function, then the internal
thread count will be reset to the initial thread count specified in the
contructor.
If the barrier was created with a completion function returning size_t, then
the thread count will be reset to the function's return value. It is illegal to
return 0, and a <code>std::logic_error</code> will be thrown in the completion
function's thread if this occurs.
</p>
<p>
<i>Note:</i>
It is safe for a thread to re-enter <code>count_down_and_wait()</code>
immediately.
It is not necessary to ensure that all blocked threads have exited
<code>count_down_and_wait()</code> before one thread re-enters it.
</p>
<p>
<i>Note:</i>The thread that the completion function is invoked in is
implementation dependent.
It may be invoked in one of the threads that has called <code>count_down_and_wait()</code>.
</p>
<p>
<i>Synchronization:</i>
All calls to <code>count_down_and_wait()</code> synchronize with any thread
returning from <code>count_down_and_wait()</code>.
The invocation of the completion function synchronizes with any thread that has
called <code>count_down_and_wait()</code>.
</p>
<p>
<i>Throws:</i>
<code>std::logic_error</code> in the thread of the completion function if the
completion function returns 0.
<code>system_error</code> when an exception is required.
</p>
</blockquote>
</body>
</html>
