<html><head><title>C++ Latches and Barriers</title><meta content="text/html; charset=UTF-8" http-equiv="content-type"><style type="text/css">.lst-kix_list_1-1>li:before{content:"\002022  "}.lst-kix_list_2-8>li:before{content:"\002022  "}.lst-kix_list_3-2>li:before{content:"\002022  "}.lst-kix_list_3-7>li:before{content:"\002022  "}.lst-kix_list_2-0>li:before{content:"\002022  "}.lst-kix_list_1-2>li:before{content:"\002022  "}.lst-kix_list_1-5>li:before{content:"\002022  "}.lst-kix_list_2-3>li:before{content:"\002022  "}.lst-kix_list_2-4>li:before{content:"\002022  "}.lst-kix_list_3-5>li:before{content:"\002022  "}.lst-kix_list_1-4>li:before{content:"\002022  "}.lst-kix_list_3-0>li:before{content:"\002022  "}.lst-kix_list_1-0>li:before{content:"\002022  "}.lst-kix_list_2-5>li:before{content:"\002022  "}.lst-kix_list_2-7>li:before{content:"\002022  "}.lst-kix_list_1-8>li:before{content:"\002022  "}.lst-kix_list_3-4>li:before{content:"\002022  "}.lst-kix_list_1-3>li:before{content:"\002022  "}.lst-kix_list_3-3>li:before{content:"\002022  "}.lst-kix_list_3-6>li:before{content:"\002022  "}.lst-kix_list_3-8>li:before{content:"\002022  "}ul.lst-kix_list_3-7{list-style-type:none}ul.lst-kix_list_3-8{list-style-type:none}.lst-kix_list_1-6>li:before{content:"\002022  "}.lst-kix_list_2-6>li:before{content:"\002022  "}ul.lst-kix_list_3-0{list-style-type:none}ul.lst-kix_list_3-1{list-style-type:none}ul.lst-kix_list_3-2{list-style-type:none}ul.lst-kix_list_3-3{list-style-type:none}ul.lst-kix_list_3-4{list-style-type:none}.lst-kix_list_3-1>li:before{content:"\002022  "}ul.lst-kix_list_3-5{list-style-type:none}ul.lst-kix_list_3-6{list-style-type:none}ul.lst-kix_list_1-0{list-style-type:none}ul.lst-kix_list_1-2{list-style-type:none}ul.lst-kix_list_2-4{list-style-type:none}.lst-kix_list_2-2>li:before{content:"\002022  
"}ul.lst-kix_list_1-1{list-style-type:none}ul.lst-kix_list_2-5{list-style-type:none}ul.lst-kix_list_1-4{list-style-type:none}ul.lst-kix_list_2-6{list-style-type:none}.lst-kix_list_1-7>li:before{content:"\002022  "}ul.lst-kix_list_1-3{list-style-type:none}ul.lst-kix_list_2-7{list-style-type:none}ul.lst-kix_list_2-0{list-style-type:none}ul.lst-kix_list_1-6{list-style-type:none}ul.lst-kix_list_2-1{list-style-type:none}ul.lst-kix_list_1-5{list-style-type:none}ul.lst-kix_list_2-2{list-style-type:none}ul.lst-kix_list_1-8{list-style-type:none}ul.lst-kix_list_1-7{list-style-type:none}ul.lst-kix_list_2-3{list-style-type:none}.lst-kix_list_2-1>li:before{content:"\002022  "}ul.lst-kix_list_2-8{list-style-type:none}ol{margin:0;padding:0}.c20{border-bottom-width:1pt;border-top-style:solid;width:398pt;border-right-style:solid;padding:1.4pt 1.4pt 1.4pt 1.4pt;border-bottom-color:#000000;border-top-width:1pt;border-bottom-style:solid;vertical-align:middle;border-top-color:#000000;border-left-color:#000000;border-right-color:#000000;border-left-style:solid;border-right-width:1pt;border-left-width:1pt}.c37{border-bottom-width:1pt;border-top-style:solid;width:38.7pt;border-right-style:solid;padding:1.4pt 1.4pt 1.4pt 1.4pt;border-bottom-color:#000000;border-top-width:1pt;border-bottom-style:solid;vertical-align:middle;border-top-color:#000000;border-left-color:#000000;border-right-color:#000000;border-left-style:solid;border-right-width:1pt;border-left-width:1pt}.c21{border-bottom-width:1pt;border-top-style:solid;width:60.8pt;border-right-style:solid;padding:1.4pt 1.4pt 1.4pt 1.4pt;border-bottom-color:#000000;border-top-width:1pt;border-bottom-style:solid;vertical-align:middle;border-top-color:#000000;border-left-color:#000000;border-right-color:#000000;border-left-style:solid;border-right-width:1pt;border-left-width:1pt}.c1{vertical-align:baseline;color:#000000;font-size:10pt;font-style:normal;font-family:"Courier 
New";text-decoration:none;font-weight:normal}.c9{vertical-align:baseline;color:#000000;font-size:14pt;font-style:normal;font-family:"Times New Roman";text-decoration:none;font-weight:normal}.c15{color:#000000;font-style:normal;font-family:"Times New Roman";text-decoration:none;font-weight:normal}.c24{vertical-align:baseline;font-size:14pt;font-style:normal;font-family:"Times New Roman";font-weight:normal}.c2{line-height:1.0;padding-top:0pt;text-align:left;padding-bottom:0pt}.c3{line-height:1.0;padding-top:0pt;text-align:left;padding-bottom:6pt}.c19{line-height:1.0;padding-top:0pt;text-align:left;padding-bottom:14.2pt}.c29{color:#000000;font-style:normal;text-decoration:none;font-weight:normal}.c41{max-width:540pt;background-color:#ffffff;padding:21.6pt 36pt 21.6pt 36pt}.c0{widows:2;orphans:2;direction:ltr}.c43{margin-right:auto;border-collapse:collapse}.c22{line-height:1.15;direction:ltr;margin-left:36pt}.c40{list-style-position:inside;text-indent:45pt;margin-left:35.4pt}.c35{font-size:14pt;text-decoration:none;font-weight:normal}.c4{vertical-align:baseline;font-size:12pt;font-family:"Courier New"}.c13{color:inherit;text-decoration:inherit}.c46{line-height:1.15;direction:ltr}.c32{color:#000000;font-family:"Times New Roman"}.c11{vertical-align:baseline;font-size:12pt}.c5{font-size:10pt;font-family:"Courier New"}.c27{font-size:11pt;font-family:"Courier New"}.c33{font-size:11pt;font-family:"Arial"}.c16{margin-left:28.4pt;padding-bottom:14.2pt}.c6{font-size:12pt;font-family:"Courier 
New"}.c34{margin:0;padding:0}.c8{color:#1155cc;text-decoration:underline}.c44{font-weight:normal}.c17{height:0pt}.c38{padding-bottom:14.2pt}.c10{margin-left:72pt}.c26{margin-left:28.4pt}.c28{margin-left:18pt}.c18{page-break-after:avoid}.c45{font-family:"Arial"}.c7{font-style:italic}.c30{vertical-align:baseline}.c25{margin-left:36pt}.c42{padding-bottom:6pt}.c39{padding-top:12pt}.c12{padding-bottom:0pt}.c23{font-size:12pt}.c31{margin-left:54pt}.c14{height:14pt}.c36{font-size:14pt}.title{widows:2;padding-top:12pt;line-height:1.0;orphans:2;text-align:center;color:#000000;font-size:16pt;font-family:"Arial";font-weight:bold;padding-bottom:3pt}.subtitle{widows:2;padding-top:0pt;line-height:1.0;orphans:2;text-align:center;color:#000000;font-size:14pt;font-family:"Arial";padding-bottom:3pt}li{color:#000000;font-size:14pt;font-family:"Times New Roman"}p{color:#000000;font-size:14pt;margin:0;font-family:"Times New Roman"}h1{widows:2;padding-top:12pt;line-height:1.0;orphans:2;text-align:left;color:#000000;font-size:24pt;font-family:"Times New Roman";font-weight:bold;padding-bottom:6pt;page-break-after:avoid}h2{widows:2;padding-top:12pt;line-height:1.0;orphans:2;text-align:left;color:#000000;font-size:18pt;font-family:"Times New Roman";font-weight:bold;padding-bottom:6pt;page-break-after:avoid}h3{widows:2;padding-top:12pt;line-height:1.0;orphans:2;text-align:left;color:#000000;font-size:14pt;font-family:"Times New Roman";font-weight:bold;padding-bottom:6pt;page-break-after:avoid}h4{widows:2;padding-top:12pt;line-height:1.0;orphans:2;text-align:left;color:#000000;font-size:14pt;font-family:"Times New Roman";font-weight:bold;padding-bottom:6pt;page-break-after:avoid}h5{widows:2;padding-top:12pt;line-height:1.0;orphans:2;text-align:left;color:#000000;font-style:italic;font-size:13pt;font-family:"Times New Roman";font-weight:bold;padding-bottom:3pt}h6{widows:2;padding-top:12pt;line-height:1.0;orphans:2;text-align:left;color:#000000;font-size:11pt;font-family:"Times New 
Roman";font-weight:bold;padding-bottom:3pt}</style></head><body class="c41"><h1 class="c0 c18"><a name="h.mrc6952e0whh"></a><span class="c32">C++ Latches and Barriers</span></h1><p class="c3 c0"><span class="c9">ISO/IEC JTC1 SC22 WG21 N</span><span>3998</span><span class="c9">&nbsp;- 2014-0</span><span>5</span><span class="c9">-</span><span>21</span></p><p class="c3 c0"><span class="c9">Alasdair Mackintosh, </span><span class="c24 c8"><a class="c13" href="mailto:alasdair@google.com">alasdair@google.com</a></span><span class="c9">, </span><span class="c8 c24"><a class="c13" href="mailto:alasdair.mackintosh@gmail.com">alasdair.mackintosh@gmail.com</a></span></p><p class="c3 c0"><span>Olivier Giroux, </span><span class="c8"><a class="c13" href="mailto:OGiroux@nvidia.com">OGiroux@nvidia.com</a></span><span>, </span><span class="c8"><a class="c13" href="mailto:ogiroux@gmail.com">ogiroux@gmail.com</a></span></p><p class="c3 c0 c14"><span></span></p><p class="c0 c28"><span class="c8"><a class="c13" href="#h.mrc6952e0whh">C++ Latches and Barriers</a></span></p><p class="c0 c31"><span class="c8"><a class="c13" href="#h.qfiyr9nhb3on">Revision History</a></span></p><p class="c0 c25"><span class="c8"><a class="c13" href="#h.r7uhgh9a3qqw">Introduction</a></span></p><p class="c0 c25"><span class="c8"><a class="c13" href="#h.rs9bik786ahx">Solution</a></span></p><p class="c0 c31"><span class="c8"><a class="c13" href="#h.94fny07f18oa">Concepts</a></span></p><p class="c0 c10"><span class="c8"><a class="c13" href="#h.tvy8lbn78dwj">ArriveAndWaitable</a></span></p><p class="c0 c10"><span class="c8"><a class="c13" href="#h.kfz8b1e14yq7">Latch</a></span></p><p class="c0 c10"><span class="c8"><a class="c13" href="#h.s6u4pup9f1v">Barrier</a></span></p><p class="c0 c31"><span class="c8"><a class="c13" href="#h.w1z5efbsofkv">Classes</a></span></p><p class="c0 c31"><span class="c8"><a class="c13" href="#h.ochtd04q8zj5">Header std::latch Synopsis</a></span></p><p class="c0 c10"><span 
class="c8"><a class="c13" href="#h.4kr7i3465ny5">Memory Ordering</a></span></p><p class="c0 c31"><span class="c8"><a class="c13" href="#h.ecoi17nlqseu">Class std::barrier</a></span></p><p class="c0 c10"><span class="c8"><a class="c13" href="#h.34mqtw978gfl">Memory Ordering</a></span></p><p class="c0 c31"><span class="c8"><a class="c13" href="#h.9fyt9rmi4g2z">Class std::notifying_barrier</a></span></p><p class="c0 c10"><span class="c8"><a class="c13" href="#h.iqd3r8x75vrc">Memory Ordering</a></span></p><p class="c0 c25"><span class="c8"><a class="c13" href="#h.p3kxt6srrk2q">Notes</a></span></p><p class="c0 c31"><span class="c8"><a class="c13" href="#h.7jwjdi7yg2r3">Use of a scoped guard to manage latches and barriers</a></span></p><p class="c0 c31"><span class="c8"><a class="c13" href="#h.dcshjdgb0rki">Sample Usage</a></span></p><p class="c0 c25"><span class="c8"><a class="c13" href="#h.gtvi3sys2b7e">Alternative Solutions</a></span></p><p class="c0 c25"><span class="c8"><a class="c13" href="#h.32y3d6lgyg3p">Synopsis</a></span></p><h3 class="c0 c18"><a name="h.qfiyr9nhb3on"></a><span class="c32">Revision History</span></h3><a href="#" name="b5c341676f4a23547c82137abbf5ad1c8e5015de"></a><a href="#" name="0"></a><table cellpadding="0" cellspacing="0" class="c43"><tbody><tr class="c17"><td class="c37"><p class="c2 c0"><span class="c15 c11">N3666</span></p></td><td class="c21"><p class="c2 c0"><span class="c15 c11">2013-04-18</span></p></td><td class="c20"><p class="c2 c0"><span class="c11 c15">Initial Version</span></p></td></tr><tr class="c17"><td class="c37"><p class="c2 c0"><span class="c15 c11">N3817</span></p></td><td class="c21"><p class="c2 c0"><span class="c15 c11">2013-10-11</span></p></td><td class="c20"><p class="c2 c0"><span class="c15 c11">Clarify destructor behaviour. 
Add comment on templatised completion functions.</span></p></td></tr><tr class="c17"><td class="c37"><p class="c2 c0"><span class="c15 c11">N3885</span></p></td><td class="c21"><p class="c2 c0"><span class="c15 c11">2014-01-21</span></p></td><td class="c20"><p class="c2 c0"><span class="c15 c11">Add Alternative Solutions section. (Not formally published)</span></p></td></tr><tr class="c17"><td class="c37"><p class="c2 c0"><span class="c23">N3998</span></p></td><td class="c21"><p class="c2 c0"><span class="c23">2014-05-21</span></p></td><td class="c20"><p class="c2 c0"><span class="c15 c11">Add Concepts, simplify latch and barrier, add notifying_barrier</span></p></td></tr></tbody></table><h2 class="c0 c18"><a name="h.r7uhgh9a3qqw"></a><span class="c32">Introduction</span></h2><p class="c3 c0"><span class="c9">Certain idioms that are commonly used in concurrent programming are missing from the standard libraries. Although many of these can be relatively straightforward to implement, we believe it is more efficient to have a standard version.</span></p><p class="c3 c0"><span class="c9">In addition, although some idioms can be provided using mutexes, higher performance can often be obtained with atomic operations and lock-free algorithms. However, these algorithms are more complex to write, and are prone to error.</span></p><p class="c3 c0"><span class="c9">Other standard concurrency idioms may have difficult corner cases, and can be hard to implement correctly. For these reasons, we believe that it is valuable to provide these in the standard library.</span></p><a href="#" name="id.30j0zll"></a><h2 class="c0 c18"><a name="h.rs9bik786ahx"></a><span class="c32">Solution</span></h2><p class="c3 c0"><span class="c9">We propose a set of commonly-used concurrency classes, some of which may be implemented using efficient lock-free algorithms where appropriate. 
This paper describes var</span><span class="c36">ious concepts related to thread co-ordination, a</span><span>n</span><span class="c36">d defines </span><span class="c9">the</span><span class="c35 c32 c7 c30">&nbsp;latch</span><span class="c36">,</span><span class="c35 c32 c7 c30">&nbsp;</span><span class="c7 c36">barrier</span><span class="c36">&nbsp;and </span><span class="c7 c36">notifying_barrier</span><span class="c9">&nbsp;classes.</span></p><p class="c0 c3"><span class="c9">Latches are a thread coordination mechanism that allow one or more threads to block until an operation is completed. An individual latch is a single-use object; once the operation has been completed, it cannot be reused.</span></p><p class="c3 c0"><span class="c9">Barriers are a thread coordination mechanism that allow multiple threads to block until an operation is completed. Unlike a latch, a barrier is re-usable; once the operation has been completed, the threads can re-use the same barrier. It is thus useful for managing </span><span class="c9">repeated tasks</span><span class="c9">, or phases of a larger task, that are handled by multiple threads.</span></p><p class="c3 c0"><span class="c36">Notifying Barriers allow additional behaviour to be defined when an operation has completed.</span></p><p class="c3 c0"><span class="c9">A reference implementation of these classes has been written</span><span class="c9">.</span></p><a href="#" name="id.w3v6sd8c9dfs"></a><h3 class="c0 c18 c42"><a name="h.94fny07f18oa"></a><span>Concepts</span></h3><p class="c0"><span>In the section below, a </span><span class="c7">synchronization point</span><span>&nbsp;represents a point at which </span><span>a</span><span>&nbsp;thread may block until a given </span><span class="c7">synchronization condition</span><span>&nbsp;has been reached </span><span>or at which it may notify other threads that a synchronization condition has been achieved.</span></p><p class="c0"><span class="c36">We define the following 
concepts:</span></p><h4 class="c0 c18"><a name="h.tvy8lbn78dwj"></a><span>ArriveAndWaitable</span></h4><p class="c0"><span class="c36">Provides:</span></p><p class="c22"><span class="c27">arrive_and_wait()</span><span>&nbsp;- Allows a single thread to indicate that it has arrived at a synchronization point. The thread will block until the synchronization condition has been reached. May only be called once by a given thread.</span></p><h4 class="c0 c18"><a name="h.kfz8b1e14yq7"></a><span>Latch</span></h4><p class="c0"><span>Provides ArriveAndWaitable, plus:</span></p><p class="c22"><span class="c6">wait()</span><span>&nbsp;- The calling thread will block until the synchronization condition has been reached.</span></p><p class="c22"><span class="c6">arrive()</span><span>&nbsp;- Allows a single thread to indicate that it has arrived at the synchronization point. Does not block. May only be called once by a given thread.</span></p><p class="c22"><span class="c6">count_down(N)</span><span>&nbsp;- Decrements </span><span>by N</span><span>&nbsp;the internal counter that determines when the synchronization condition has been reached. May be called more than once by a given thread. </span></p><p class="c14 c46"><span class="c33"></span></p><h4 class="c0 c18"><a name="h.s6u4pup9f1v"></a><span>Barrier</span></h4><p class="c0"><span>Provides ArriveAndWaitable, plus:</span></p><p class="c0 c25"><span class="c27">arrive_and_wait()</span><span>&nbsp;- Allows a single thread to indicate that it has arrived at a synchronization point. The thread will block until the synchronization condition has been reached. May be called repeatedly by a given thread.</span></p><p class="c0 c25"><span class="c6">arrive_and_drop</span><span class="c6">()</span><span class="c23 c45">&nbsp;- </span><span>Allows a single thread to indicate that it has arrived at a synchronization point. The thread will not block. 
Once a thread returns from this function, it shall not invoke other methods on the </span><span>barrier</span><span>&nbsp;(except the destructor, if otherwise valid).</span></p><h3 class="c0 c18"><a name="h.w1z5efbsofkv"></a><span>Class</span><span>es</span></h3><h3 class="c0 c18"><a name="h.ochtd04q8zj5"></a><span>Header std::latch Synopsis</span></h3><p class="c0"><span>Provides the Latch concept.</span></p><p class="c3 c0"><span class="c9">A latch maintains an internal counter that is initialized when the latch is created. The </span><span class="c35 c32 c7 c30">synchronization condit</span><span class="c7">ion</span><span>&nbsp;is reached when the counter is decremented to 0. Threads may block at a </span><span class="c7">synchronization point</span><span>&nbsp;waiting for the condition to be reached.</span><span class="c7">&nbsp;</span><span>When the condition is reached, any such blocked threads will be released.</span></p><p class="c3 c0 c14"><span></span></p><p class="c19 c0"><span class="c6">latch</span><span class="c29 c4">( </span><span class="c29 c4">int</span><span class="c6">&nbsp;</span><span class="c29 c4">C</span><span class="c29 c4">&nbsp;);</span></p><p class="c19 c0 c26"><span class="c7">Requires</span><span>: C shall be &gt;= zero.</span></p><p class="c19 c0 c26"><span class="c7">Effects</span><span>: initializes the latch with a count of C.</span><span>&nbsp; </span><span class="c7">[Note: </span><span class="c7">If C is zero, the synchronization condition has been reached. - End note]</span></p><p class="c19 c0 c26"><span class="c7">Synchronization: </span><span>None</span></p><p class="c19 c0"><span class="c6">~latch</span><span class="c29 c4">( );</span></p><p class="c0 c16"><span class="c7">Requires:</span><span>&nbsp;No threads are blocked at the synchronization </span><span>point</span><span>. 
</span><span class="c9">Note that the latch may be destroyed if threads have not yet returned from </span><span class="c6">wait()</span><span>&nbsp;or </span><span class="c6">arrive_and_wait()</span><span>&nbsp;provided that the condition has been reached.</span></p><p class="c0 c38"><span class="c6">void arrive( );</span></p><p class="c0 c16"><span class="c7">Effects</span><span>: </span><span>Decrements</span><span>&nbsp;the internal count by 1. </span><span>If the count reaches 0 the</span><span>&nbsp;synchronization condition is reached.</span><span>&nbsp;I</span><span>f called more than once by a given thread the behaviour is undefined.</span></p><p class="c0 c16"><span class="c7">Throws:</span><span>&nbsp;std::logic_error if the internal count would be decremented below 0.</span></p><p class="c0 c16"><span class="c7">Synchronization: </span><span>Synchronizes with calls unblocked as a result of this call and try_wait calls on the same latch that return true as a result</span><span>.</span></p><p class="c0 c38"><span class="c6">void arrive_and_wait( );</span></p><p class="c0 c16"><span class="c7">Effects</span><span>: </span><span>Decrements the internal count by 1. If the count reaches 0 </span><span>the synchronization condition is reached. Otherwise b</span><span>locks</span><span>&nbsp;at the synchronization point until the synchronization condition is reached. 
If called more than once by a given thread the behaviour is undefined.</span></p><p class="c0 c16"><span class="c7">Throws:</span><span>&nbsp;std::logic_error if the internal count would be decremented below 0.</span></p><p class="c0 c16"><span class="c7">Synchronization: </span><span>Synchronizes with calls unblocked as a result of this call and try_wait calls on the same latch that return true as a result.</span></p><p class="c19 c0"><span class="c29 c4">void count_down( </span><span class="c6">int</span><span class="c29 c4">&nbsp;N );</span></p><p class="c19 c0 c26"><span class="c7">Effects</span><span>: </span><span class="c9">Decrements the internal count by </span><span>N</span><span class="c9">. If the count reaches 0, </span><span>the synchronization condition is reached</span><span class="c9">. May be called by any thread. Does not block.</span></p><p class="c19 c0 c26"><span class="c32 c7 c30 c35">Throws:</span><span class="c9">&nbsp;std::logic_error if the internal count </span><span>would be decremented below </span><span>0</span><span class="c9">.</span></p><p class="c19 c0 c26"><span class="c7">Synchronization: </span><span>Synchronizes with calls unblocked as a result of this call and try_wait calls on the same latch that return true as a result.</span></p><p class="c19 c0"><span class="c29 c4">void wait( );</span></p><p class="c19 c0 c26"><span class="c7">Effects</span><span>: </span><span class="c9">Blocks the calling thread at the synchronization point until the </span><span>synchronization condition is reached.</span><span class="c9">&nbsp;If the</span><span>&nbsp;condition has already been reached</span><span class="c9">, </span><span>the thread does not block.</span></p><p class="c0 c38"><span class="c6">bool try_wait( );</span></p><p class="c19 c0 c26"><span class="c7">Returns: </span><span class="c9">Returns true if the synchronization condition has been reached, and false otherwise. 
Does not block.</span></p><p class="c0"><span class="c6">latch(const latch&amp;) = delete;<br>latch&amp; operator=(const latch&amp;) = delete;</span></p><h4 class="c0 c18"><a name="h.4kr7i3465ny5"></a><span class="c32 c36">Memory Ordering</span></h4><p class="c3 c0"><span class="c9">All calls to</span><span class="c15 c11">&nbsp;</span><span class="c29 c4">count_down()</span><span>, </span><span class="c29 c4">arrive(</span><span class="c6">)</span><span>, and </span><span class="c6">arrive_and_wait()</span><span class="c9">&nbsp;synchronize with any </span><span>calls to</span><span class="c15 c11">&nbsp;</span><span class="c6">wait()</span><span>&nbsp;or </span><span class="c6">arrive_and_wait()</span><span>&nbsp;that complete as a result</span><span class="c29 c30">.</span><span class="c9">&nbsp; All calls </span><span class="c30">to </span><span class="c6">count_down()</span><span>,</span><span>&nbsp;</span><span class="c6">arrive()</span><span>, or </span><span class="c6">arrive_and_wait()</span><span class="c30">&nbsp;s</span><span class="c9">ynchronize with any </span><span>call to</span><span class="c4">&nbsp;try_wait()</span><span class="c11">&nbsp;that returns </span><span class="c4">true</span><span class="c30">&nbsp;as a result.</span></p><p class="c3 c0 c14"><span></span></p><h3 class="c0 c18"><a name="h.ecoi17nlqseu"></a><span>Header std::barrier Synopsis</span></h3><p class="c3 c0"><span>Provides the Barrier concept.</span></p><p class="c3 c0"><span>A barrier is created with an initial value representing the number of threads that can arrive at the </span><span class="c7">synchronization point. </span><span>When that many threads have arrived, the </span><span class="c7">synchronization condition</span><span>&nbsp;is reached and the threads are released. The barrier will then reset, and may be reused for a new cycle, in which the same set of threads may arrive again at the synchronization point. 
T</span><span>he same set of threads</span><span>&nbsp;shall arrive at the barrier in each cycle; otherwise the behaviour is undefined</span><span class="c9">.</span></p><p class="c3 c0 c14"><span></span></p><p class="c3 c0"><span class="c6">barrier</span><span class="c29 c4">( int C );<br></span></p><p class="c19 c0 c26"><span class="c7">Requires: </span><span>C shall be &gt;= </span><span>zero</span><span>. </span><span>[Note: </span><span>If C is zero, the synchronization condition is considered to have already been reached. In that case, the barrier may only be destroyed. -- End note]</span><span><br><br></span><span class="c7">Effects:</span><span>&nbsp;initializes the barrier with the </span><span>number of participating threads</span><span>&nbsp;C.<br></span></p><p class="c19 c0"><span class="c6">~barrier</span><span class="c29 c4">( );</span></p><p class="c19 c0 c26"><span class="c7">Requires:</span><span>&nbsp;</span><span>No </span><span class="c9">threads are blocked </span><span>at the synchronization </span><span>point</span><span>.</span></p><p class="c19 c0 c26"><span class="c7">Effects:</span><span>&nbsp;destroys the barrier.</span></p><p class="c19 c0"><span class="c29 c4">void </span><span class="c6">arrive</span><span class="c29 c4">_and_wait( );</span></p><p class="c19 c0 c26"><span class="c7">Effects: </span><span>Blocks at the </span><span class="c7">synchronization point</span><span>&nbsp;until the </span><span class="c7">synchronization condition</span><span>&nbsp;is reached. When all threads (as determined by the initial thread count parameter to the constructor) have arrived, the </span><span class="c7">synchronization condition</span><span>&nbsp;is reached and all threads are released. 
The barrier may then be re-used for another cycle.</span></p><p class="c19 c0 c26"><span>[</span><span class="c9">Note: it is safe for a thread to re-enter </span><span class="c6">arrive_and_wait</span><span class="c29 c4">()</span><span class="c9">&nbsp;immediately. It is not necessary to ensure that all blocked threads have exited </span><span class="c6">arrive_and_wait</span><span class="c29 c4">()</span><span class="c9">&nbsp;before one thread re-enters it. -- </span><span>E</span><span class="c9">nd note]</span></p><p class="c19 c0 c26"><span class="c7">Synchronization: </span><span>Synchronizes with calls unblocked as a result of this call.</span></p><p class="c0"><span class="c6">void arrive_and_drop( );</span></p><p class="c0 c16"><span class="c7">Effects: </span><span>Signals that this thread has arrived at the </span><span class="c7">synchronization point</span><span>. </span><span>May block</span><span>&nbsp;until the </span><span class="c7">synchronization condition </span><span>is reached</span><span>.</span><span>&nbsp;When the barrier resets, </span><span>the current thread is removed from the set of participating threads</span><span>.</span></p><p class="c0 c16"><span>I</span><span>f the barrier was created with an initial count of N, </span><span>and all N threads call</span><span class="c6">&nbsp;</span><span class="c6">arrive_and_drop()</span><span>, any further operations on the barrier are undefined, apart from calling the destructor.</span></p><p class="c0 c16"><span>Calling </span><span class="c6">arrive_and_drop()</span><span>&nbsp;modifies the requirement that the same set of threads must arrive in each cycle. A thread that drops out is no longer considered part of this set. 
</span><span>If a thread that has called </span><span class="c6">arrive_and_drop()</span><span>&nbsp;calls another </span><span>method </span><span>on the same barrier, other than the destructor, the results are undefined.</span></p><p class="c0 c16"><span class="c7">Synchronization: </span><span>Synchronizes with calls unblocked as a result of this call.</span></p><p class="c0"><span class="c6">barrier(const barrier&amp;) = delete;<br>barrier&amp; operator=(const barrier&amp;) = delete;</span></p><h4 class="c0 c18"><a name="h.34mqtw978gfl"></a><span class="c32 c36">Memory Ordering</span></h4><p class="c0"><span>All calls to</span><span class="c23">&nbsp;</span><span class="c6">arrive_and_wait()</span><span>&nbsp;or </span><span class="c6">arrive_and_drop()</span><span>&nbsp;synchronize with any calls to</span><span class="c23">&nbsp;</span><span class="c6">arrive_and_wait()</span><span>&nbsp;that complete as a result.</span></p><h3 class="c0 c18"><a name="h.9fyt9rmi4g2z"></a><span><br>Header std::</span><span>notifying_barrier</span><span>&nbsp;Synopsis</span></h3><p class="c0"><span>Provides the Barrier concept.</span><span><br><br>A notifying barrier behaves as a barrier, but is constructed with a callable completion function that is invoked after all threads have arrived at the </span><span class="c7">synchronization point</span><span>, and before the </span><span class="c7">synchronization condition</span><span>&nbsp;is reached. 
The completion may </span><span>modify</span><span>&nbsp;the set of threads</span><span>&nbsp;that arrives at the barrier in each cycle.</span></p><p class="c0 c14"><span></span></p><p class="c0 c12"><span class="c6">template &lt;typename T&gt;</span></p><p class="c0 c12"><span class="c6">notifying_barrier( int C, T F );</span></p><p class="c0 c16 c14"><span class="c6"></span></p><p class="c0 c16"><span class="c7">Requires: </span><span>C shall be &gt;= </span><span>zero</span><span>.</span><span>&nbsp;F shall conform to the callable </span><span class="c6">int()</span><span>&nbsp;concept. </span><span>[Note: </span><span>If C is zero, the synchronization condition is considered to have already been reached. In that case, the barrier may only be destroyed. -- End note]</span><span><br></span><span class="c7">Effects:</span><span>&nbsp;initializes the barrier with a thread count of C, and </span><span>a callable object that will be invoked after all threads have arrived at</span><span>&nbsp;the </span><span class="c7">synchronization point</span><span>.</span><span>&nbsp;</span></p><p class="c0 c38"><span class="c6">~notifying_barrier( );</span></p><p class="c0 c16"><span class="c7">Requires:</span><span>&nbsp;No threads are blocked at the </span><span class="c7">synchronization point</span><span>.</span></p><p class="c0 c16"><span class="c7">Effects:</span><span>&nbsp;destroys the barrier.</span></p><p class="c0 c38"><span class="c6">void arrive_and_wait( );</span></p><p class="c0 c16"><span class="c7">Effects: </span><span>Blocks at the </span><span class="c7">synchronization point</span><span>&nbsp;until the </span><span class="c7">synchronization condition</span><span>&nbsp;is reached. When all threads (as determined by the initial thread count parameter to the constructor) have arrived, the </span><span class="c7">synchronization condition</span><span>&nbsp;is reached. 
Before any threads are released, the callable completion object registered in the constructor will be invoked. (The completion may be invoked in the context of one of the threads that invoked </span><span class="c6">arrive_and_wait()</span><span>&nbsp;or </span><span class="c6">arrive_and_drop()</span><span>.) When the completion returns, </span><span>the internal state will be reset</span><span>, and all blocked threads will be unblocked. The barrier may then be used for a new cycle.</span></p><p class="c0 c16"><span>If the </span><span>completion </span><span>returns 0 then the set of participating threads is unchanged</span><span>.</span><span>&nbsp;</span><span>Otherwise the count of expected threads is set to the completion&#39;s return value</span><span>. (If the count is altered, this relaxes the restriction that the same set of threads must arrive at the barrier in each new cycle.)</span></p><p class="c0 c16"><span>Note that it is safe for a thread to re-enter </span><span class="c6">arrive_and_wait()</span><span>&nbsp;immediately. It is not necessary to ensure that all blocked threads have exited </span><span class="c6">arrive_and_wait()</span><span>&nbsp;before one thread re-enters it.</span></p><p class="c0 c16"><span class="c7">Synchronization: </span><span>Synchronizes with invocations of </span><span>the</span><span>&nbsp;completion function. Invocations of the c</span><span>ompletion</span><span>&nbsp;function then s</span><span>ynchronize with calls unblocked as a result of this call.</span></p><p class="c0"><span class="c6">void arrive_and_drop( );</span></p><p class="c0 c16"><span class="c7">Effects: </span><span>Signals that this thread has arrived at the </span><span class="c7">synchronization point</span><span>. When all threads (as determined by the initial thread count parameter to the constructor) have arrived, the </span><span class="c7">synchronization condition</span><span>&nbsp;is reached. 
May block until the </span><span class="c7">synchronization condition</span><span>&nbsp;is reached</span><span>.</span><span>&nbsp;Before any threads are released, the callable completion object registered in the constructor will be invoked. (The completion may be invoked in the context of one of the threads that invoked </span><span class="c6">arrive_and_wait()</span><span>&nbsp;or </span><span class="c6">arrive_and_drop()</span><span>.) When the completion returns, </span><span>the internal state will be reset</span><span>, and all blocked threads will be unblocked. The barrier may then be used for a new cycle.</span></p><p class="c0 c16"><span>If the </span><span>completion </span><span>returns 0 then those threads that called </span><span class="c6">arrive_and_drop() </span><span>are removed from the set of expected threads</span><span>.</span><span>&nbsp;</span><span>Otherwise the count of expected threads is set to the completion&#39;s return value</span><span>.</span><span>&nbsp;(If the count is altered, this relaxes the restriction that the same set of threads must arrive at the barrier in each new cycle.)</span></p><p class="c0 c16"><span class="c7">Synchronization: </span><span>Synchronizes with calls unblocked as a result of this call.</span></p><p class="c0"><span class="c6">notifying_barrier(const notifying_barrier&amp;) = delete;<br>notifying_barrier&amp; operator=(const notifying_barrier&amp;) = delete;</span></p><h4 class="c0 c18 c39"><a name="h.iqd3r8x75vrc"></a><span>Memory Ordering</span></h4><p class="c0"><span>All calls to the completion function synchronize with those calls to </span><span class="c6">arrive_and_wait()</span><span>&nbsp;or </span><span class="c6">arrive_and_drop()</span><span>&nbsp;that triggered the completion.</span><span>&nbsp;The completion function synchronizes with all calls unblocked after it has run</span><span>.</span></p><p class="c0 c14"><span></span></p><h2 class="c0 c18"><a 
name="h.p3kxt6srrk2q"></a><span>Notes</span></h2><p class="c0"><span>(The following notes have not changed significantly since the last revision of this paper, and may be skipped by readers familiar with that revision.)</span></p><h3 class="c0 c18 c39"><a name="h.7jwjdi7yg2r3"></a><span>Use of a scoped guard to manage latches and barriers</span></h3><p class="c3 c0"><span class="c9">A future paper will propose a</span><span class="c15 c11">&nbsp;</span><span class="c29 c4">scoped_guard</span><span class="c9">: a helper class that invokes a function when it goes out of scope. This is analogous to a</span><span class="c15 c11">&nbsp;</span><span class="c29 c4">std::unique_ptr</span><span class="c9">, which deletes an object when it goes out of scope. The latch and barrier classes could provide scoped guards that would ensure that a latch was always decremented when a worker had finished, regardless of how the worker terminated. For example:</span></p><p class="c2 c0 c14"><span class="c9"></span></p><p class="c2 c0"><span class="c1">&nbsp; void DoWork(latch&amp; completion_latch) {</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; // Automatically invokes completion_latch.count_down() when this</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; // function terminates</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; scoped_guard g = completion_latch.count_down_guard();</span></p><p class="c2 c0 c14"><span class="c1"></span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; switch (state) {</span></p><p class="c0 c2"><span class="c1">&nbsp; &nbsp; &nbsp; case 0:</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; &nbsp; return;</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; case 1:</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; &nbsp; algorithm1();</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; &nbsp; return;</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; case 
2:</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; &nbsp; algorithm2();</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; &nbsp; return;</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; default:</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; &nbsp; throw std::logic_error(&quot;unknown state&quot;);</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; }</span></p><p class="c2 c0"><span class="c1">&nbsp; }</span></p><p class="c19 c0 c14"><span class="c1"></span></p><p class="c3 c0"><span class="c9">We suggest that a suitable</span><span class="c15 c11">&nbsp;</span><span class="c29 c4">xxx_guard()</span><span class="c9">&nbsp;method be provided for all of the latch and barrier methods, as an aid to safe usage. This would avoid having threads waiting for termination conditions that are never triggered.</span></p><a href="#" name="id.2et92p0"></a><h3 class="c0 c18"><a name="h.dcshjdgb0rki"></a><span class="c32">Sample Usage</span></h3><p class="c3 c0"><span class="c9">Sample use cases for the latch include:</span></p><ul class="c34 lst-kix_list_2-0 start"><li class="c2 c0 c40"><span class="c9">Setting multiple threads to perform a task, and then waiting until all threads have reached a common point.</span></li><li class="c3 c0 c40"><span class="c9">Creating multiple threads, which wait for a signal before advancing beyond a common point.</span></li></ul><p class="c3 c0"><span class="c9">An example of the first use case would be as follows:</span></p><p class="c2 c0 c14"><span class="c9"></span></p><p class="c2 c0"><span class="c1">&nbsp; void DoWork(threadpool* pool) {</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; latch completion_latch(NTASKS);</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; for (int i = 0; i &lt; NTASKS; ++i) {</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; pool-&gt;add_task([&amp;] {</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; &nbsp; // perform 
work</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; &nbsp; ...</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; &nbsp; completion_latch.count_down();</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; });</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; }</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; // Block until work is done</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; completion_latch.wait();</span></p><p class="c2 c0"><span class="c1">&nbsp; }</span></p><p class="c19 c0 c14"><span class="c1"></span></p><p class="c3 c0"><span class="c9">An example of the second use case is shown below. We need to load data and then process it using a number of threads. Loading the data is I/O bound, whereas starting threads and creating data structures is CPU bound. By running these in parallel, throughput can be increased.</span></p><p class="c2 c0 c14"><span class="c9"></span></p><p class="c2 c0"><span class="c1">&nbsp; void DoWork() {</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; latch start_latch(1);</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; vector&lt;thread*&gt; workers;</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; for (int i = 0; i &lt; NTHREADS; ++i) {</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; workers.push_back(new thread([&amp;] {</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; &nbsp; // Initialize data structures. 
This is CPU bound.</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; &nbsp; ...</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; &nbsp; start_latch.wait();</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; &nbsp; // perform work</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; &nbsp; ...</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; }));</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; }</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; // Load input data. This is I/O bound.</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; ...</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; // Threads can now start processing</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; start_latch.count_down();</span></p><p class="c2 c0 c14"><span class="c1"></span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; // Wait for threads to finish, delete allocated objects.</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; ...</span></p><p class="c2 c0"><span class="c1">&nbsp; }<br></span></p><p class="c3 c0"><span class="c9">The barrier can be used to co-ordinate a set of threads carrying out a repeated task.</span></p><p class="c2 c0"><span class="c1">&nbsp; void DoWork() {</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; Tasks&amp; tasks;</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; </span><span class="c5">int</span><span class="c1">&nbsp;</span><span class="c5">n</span><span class="c1">_threads;</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; vector&lt;thread*&gt; workers;</span></p><p class="c2 c0 c14"><span class="c1"></span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; barrier task_barrier(n_threads);</span></p><p class="c2 c0 c14"><span class="c1"></span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; for (int i = 0; i &lt; n_threads; ++i) {</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; 
workers.push_back(new thread([&amp;] {</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; &nbsp; bool active = true;</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; &nbsp; while(active) {</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Task task = tasks.get();</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; // perform task</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ...</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; task_barrier.</span><span class="c5">arrive_and_wait</span><span class="c1">();</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;}</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; &nbsp;}));</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; }</span></p><p class="c2 c0 c14"><span class="c1"></span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; // Read each stage of the task until all stages are complete.</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; while (!finished()) {</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; &nbsp; GetNextStage(tasks);</span></p><p class="c2 c0"><span class="c1">&nbsp; &nbsp; }</span></p><p class="c2 c0"><span class="c1">&nbsp; }</span></p><p class="c0 c14"><span></span></p><p class="c0"><span>The notifying_barrier can be used to co-ordinate a set of threads where the number of threads can vary. In the example below, we reduce the number of threads when a task finishes. (Alternatively we could increase the number of threads if the task required it.) 
Note that reducing the number of threads can be done via the </span><span class="c6">barrier::arrive_and_drop()</span><span>&nbsp;method, but increasing the number of threads can only be done with a notifying_barrier.</span></p><p class="c0 c12 c14"><span class="c5"></span></p><p class="c0 c12"><span class="c5">&nbsp; void DoWork() {</span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; Tasks&amp; tasks;</span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; int initial_threads;</span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; atomic&lt;int&gt; current_threads(initial_threads);</span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; vector&lt;thread*&gt; workers;</span></p><p class="c0 c12 c14"><span class="c5"></span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; // Create a notifying_barrier, and set a lambda that will be </span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; // invoked every time the barrier counts down. If one or more </span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; // active threads have completed, reduce the number of threads.</span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; std::function&lt;int()&gt; rf = [&amp;] { return current_threads.load(); };</span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; notifying_barrier task_barrier(initial_threads, rf);</span></p><p class="c0 c12 c14"><span class="c5"></span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; for (int i = 0; i &lt; initial_threads; ++i) {</span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; &nbsp; workers.push_back(new thread([&amp;] {</span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; &nbsp; &nbsp; bool active = true;</span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; &nbsp; &nbsp; while(active) {</span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Task task = tasks.get();</span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; // perform task</span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ...</span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if (finished(task)) {</span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; current_threads--;</span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; active = false;</span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }</span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; task_barrier.arrive_and_wait();</span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;}</span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; &nbsp; &nbsp;}));</span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; }</span></p><p class="c0 c12 c14"><span class="c5"></span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; // Read each stage of the task until all stages are complete.</span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; while (!finished()) {</span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; &nbsp; GetNextStage(tasks);</span></p><p class="c0 c12"><span class="c5">&nbsp; &nbsp; }</span></p><p class="c0 c12"><span class="c5">&nbsp; }</span></p><p class="c0 c14 c19"><span class="c5"></span></p><a href="#" name="id.tyjcwt"></a><h2 class="c0 c18"><a name="h.gtvi3sys2b7e"></a><span class="c32">Alternative Solutions</span></h2><p class="c3 c0"><span class="c9">Java provides a Phaser interface for thread co-ordination. See</span><span class="c15 c11">&nbsp;</span><span class="c15 c11"><a class="c13" href="http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Phaser.html">http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Phaser.html</a></span><span class="c9">. 
It has been suggested that a similar concept could be used in the C++ standard instead of the current latch/barrier proposal.</span></p><p class="c3 c0"><span class="c9">The Phaser interface offers the following potential advantages:</span></p><ul class="c34 lst-kix_list_3-0 start"><li class="c2 c0 c40"><span class="c9">The number of tasks can be updated at any point.</span></li><li class="c3 c0 c40"><span class="c9">A thread can complete a phase without blocking.</span></li></ul><p class="c3 c0"><span class="c9">To describe the differences between Phasers and barriers we use the term &#39;cycle&#39;. Both constructs support the use case where a number of threads block until they all arrive at a certain point. Once all threads have arrived, they are unblocked, and can continue. We refer to this as a &#39;cycle&#39;. Both Phasers and barriers are reusable: after one cycle has completed, a new cycle can begin.</span></p><p class="c3 c0"><span class="c9">The proposed notifying_barrier can only update the number of threads during the completion function that is invoked when the count reaches zero at the end of a cycle. The Phaser allows a thread to call register() at any point during the cycle. However, it is difficult to reason about the ordering of this behaviour. To ensure that a thread is added to a particular cycle would require additional synchronization with the other threads in the cycle, with the complexity and performance overheads that this involves. If there is no such synchronization then a thread cannot control which cycle it joins. As the purpose of the barrier is to allow threads to co-operate on tasks, this feature of Phaser does not seem useful.</span></p><p class="c3 c0"><span class="c9">The Phaser class supports an arriveAndAwaitAdvance() method that corresponds to barrier&#39;s </span><span>arrive_and_wait</span><span class="c9">(). It also supports an arrive() method that decrements the internal count without blocking. 
This potentially allows some threads to arrive at a later cycle while other threads are still completing earlier cycles. Again, we feel that it can be hard to reason about the order in which threads will enter different cycles, or when the sequence of operations controlled by this Phaser has finally terminated. In addition, this capability adds to the complexity of the implementation.</span></p><p class="c3 c0"><span class="c9">We feel that it would be possible to implement a C++ Phaser, possibly using the proposed latch and barrier classes, but that the latter two offer a simpler interface that will be useful for most thread co-ordination tasks.</span></p><a href="#" name="id.3dy6vkm"></a><h2 class="c0 c18"><a name="h.32y3d6lgyg3p"></a><span class="c32">Synopsis</span></h2><p class="c3 c0"><span class="c9">The synopsis is as follows.</span></p><p class="c2 c0 c14"><span class="c9"></span></p><p class="c2 c0"><span class="c5">class latch {<br>public:<br> &nbsp;explicit latch(int count);<br> &nbsp;~latch();<br> &nbsp;void arrive();<br> &nbsp;void arrive_and_wait();<br> &nbsp;void count_down(int n);<br> &nbsp;void wait();<br> &nbsp;bool try_wait();</span></p><p class="c2 c0"><span class="c1">};</span></p><p class="c2 c0 c14"><span class="c1"></span></p><p class="c2 c0"><span class="c5">class barrier {<br> public:<br> &nbsp;explicit barrier(int num_threads);<br> &nbsp;~barrier();<br> &nbsp;void arrive_and_wait();<br> &nbsp;void arrive_and_drop();</span></p><p class="c2 c0"><span class="c1">};</span></p><p class="c2 c0 c14"><span class="c5"></span></p><p class="c19 c0"><span class="c5">class notifying_barrier {<br> public:<br> &nbsp;template &lt;typename F&gt;<br> &nbsp;notifying_barrier(int num_threads, F completion);<br> &nbsp;~notifying_barrier();<br> &nbsp;void arrive_and_wait();<br> &nbsp;void arrive_and_drop();<br>};</span></p><p class="c0 c14"><span class="c9"></span></p></body></html>