<html><head><title>C++ Latches and Barriers (post Rapperswil)</title><meta content="text/html; charset=UTF-8" http-equiv="content-type"><style type="text/css">.lst-kix_list_1-1>li:before{content:"\002022  "}.lst-kix_list_2-8>li:before{content:"\002022  "}.lst-kix_list_3-2>li:before{content:"\002022  "}.lst-kix_list_3-7>li:before{content:"\002022  "}.lst-kix_list_2-0>li:before{content:"\002022  "}.lst-kix_list_1-2>li:before{content:"\002022  "}.lst-kix_list_1-5>li:before{content:"\002022  "}.lst-kix_list_2-3>li:before{content:"\002022  "}.lst-kix_list_2-4>li:before{content:"\002022  "}.lst-kix_list_3-5>li:before{content:"\002022  "}.lst-kix_list_1-4>li:before{content:"\002022  "}.lst-kix_list_3-0>li:before{content:"\002022  "}.lst-kix_list_1-0>li:before{content:"\002022  "}.lst-kix_list_2-5>li:before{content:"\002022  "}.lst-kix_list_2-7>li:before{content:"\002022  "}.lst-kix_list_1-8>li:before{content:"\002022  "}.lst-kix_list_3-4>li:before{content:"\002022  "}.lst-kix_list_1-3>li:before{content:"\002022  "}.lst-kix_list_3-3>li:before{content:"\002022  "}.lst-kix_list_3-6>li:before{content:"\002022  "}.lst-kix_list_3-8>li:before{content:"\002022  "}ul.lst-kix_list_3-7{list-style-type:none}ul.lst-kix_list_3-8{list-style-type:none}.lst-kix_list_1-6>li:before{content:"\002022  "}.lst-kix_list_2-6>li:before{content:"\002022  "}ul.lst-kix_list_3-0{list-style-type:none}ul.lst-kix_list_3-1{list-style-type:none}ul.lst-kix_list_3-2{list-style-type:none}ul.lst-kix_list_3-3{list-style-type:none}ul.lst-kix_list_3-4{list-style-type:none}.lst-kix_list_3-1>li:before{content:"\002022  "}ul.lst-kix_list_3-5{list-style-type:none}ul.lst-kix_list_3-6{list-style-type:none}ul.lst-kix_list_1-0{list-style-type:none}ul.lst-kix_list_1-2{list-style-type:none}ul.lst-kix_list_2-4{list-style-type:none}.lst-kix_list_2-2>li:before{content:"\002022  
"}ul.lst-kix_list_1-1{list-style-type:none}ul.lst-kix_list_2-5{list-style-type:none}ul.lst-kix_list_1-4{list-style-type:none}ul.lst-kix_list_2-6{list-style-type:none}.lst-kix_list_1-7>li:before{content:"\002022  "}ul.lst-kix_list_1-3{list-style-type:none}ul.lst-kix_list_2-7{list-style-type:none}ul.lst-kix_list_2-0{list-style-type:none}ul.lst-kix_list_1-6{list-style-type:none}ul.lst-kix_list_2-1{list-style-type:none}ul.lst-kix_list_1-5{list-style-type:none}ul.lst-kix_list_2-2{list-style-type:none}ul.lst-kix_list_1-8{list-style-type:none}ul.lst-kix_list_1-7{list-style-type:none}ul.lst-kix_list_2-3{list-style-type:none}.lst-kix_list_2-1>li:before{content:"\002022  "}ul.lst-kix_list_2-8{list-style-type:none}ol{margin:0;padding:0}.c31{border-bottom-width:1pt;border-top-style:solid;width:60.8pt;border-right-style:solid;padding:1.4pt 1.4pt 1.4pt 1.4pt;border-bottom-color:#000000;border-top-width:1pt;border-bottom-style:solid;vertical-align:middle;border-top-color:#000000;border-left-color:#000000;border-right-color:#000000;border-left-style:solid;border-right-width:1pt;border-left-width:1pt}.c32{border-bottom-width:1pt;border-top-style:solid;width:398pt;border-right-style:solid;padding:1.4pt 1.4pt 1.4pt 1.4pt;border-bottom-color:#000000;border-top-width:1pt;border-bottom-style:solid;vertical-align:middle;border-top-color:#000000;border-left-color:#000000;border-right-color:#000000;border-left-style:solid;border-right-width:1pt;border-left-width:1pt}.c33{border-bottom-width:1pt;border-top-style:solid;width:38.7pt;border-right-style:solid;padding:1.4pt 1.4pt 1.4pt 1.4pt;border-bottom-color:#000000;border-top-width:1pt;border-bottom-style:solid;vertical-align:middle;border-top-color:#000000;border-left-color:#000000;border-right-color:#000000;border-left-style:solid;border-right-width:1pt;border-left-width:1pt}.c2{vertical-align:baseline;color:#000000;font-size:10pt;font-style:normal;font-family:"Courier 
New";text-decoration:none;font-weight:normal}.c1{line-height:1.0;padding-top:0pt;widows:2;orphans:2;text-align:left;direction:ltr;padding-bottom:6pt}.c0{vertical-align:baseline;color:#000000;font-size:12pt;font-style:normal;font-family:"Times New Roman";text-decoration:none;font-weight:normal}.c11{vertical-align:baseline;color:#000000;font-style:normal;font-family:"Times New Roman";text-decoration:none;font-weight:normal}.c20{vertical-align:baseline;color:#000000;font-style:normal;text-decoration:none;font-weight:normal}.c34{font-style:normal;font-family:"Times New Roman";font-weight:normal}.c36{max-width:540pt;background-color:#ffffff;padding:21.6pt 36pt 21.6pt 36pt}.c16{list-style-position:inside;text-indent:45pt;margin-left:35.4pt}.c26{vertical-align:baseline;text-decoration:none;font-weight:normal}.c3{widows:2;orphans:2;direction:ltr}.c5{line-height:1.0;padding-top:0pt;text-align:left}.c40{margin-right:auto;border-collapse:collapse}.c9{color:inherit;text-decoration:inherit}.c27{margin:0;padding:0}.c30{font-size:11pt;font-family:"Arial"}.c6{font-size:12pt;font-family:"Courier New"}.c29{line-height:1.15;direction:ltr}.c19{font-size:10pt;font-family:"Courier New"}.c17{color:#000000;font-family:"Times New Roman"}.c12{color:#1155cc;text-decoration:underline}.c10{margin-left:28.4pt;padding-bottom:14.2pt}.c25{font-size:11pt;font-family:"Courier 
New"}.c28{vertical-align:baseline}.c7{font-size:14pt}.c37{font-family:"Arial"}.c21{margin-left:72pt}.c13{margin-left:54pt}.c39{margin-left:18pt}.c22{font-size:12pt}.c41{padding-bottom:6pt}.c23{margin-left:36pt}.c24{height:0pt}.c18{page-break-after:avoid}.c15{padding-bottom:14.2pt}.c4{padding-bottom:0pt}.c35{padding-top:12pt}.c8{height:14pt}.c38{font-weight:normal}.c14{font-style:italic}.title{widows:2;padding-top:12pt;line-height:1.0;orphans:2;text-align:center;color:#000000;font-size:16pt;font-family:"Arial";font-weight:bold;padding-bottom:3pt}.subtitle{widows:2;padding-top:0pt;line-height:1.0;orphans:2;text-align:center;color:#000000;font-size:14pt;font-family:"Arial";padding-bottom:3pt}li{color:#000000;font-size:14pt;font-family:"Times New Roman"}p{color:#000000;font-size:14pt;margin:0;font-family:"Times New Roman"}h1{widows:2;padding-top:12pt;line-height:1.0;orphans:2;text-align:left;color:#000000;font-size:24pt;font-family:"Times New Roman";font-weight:bold;padding-bottom:6pt;page-break-after:avoid}h2{widows:2;padding-top:12pt;line-height:1.0;orphans:2;text-align:left;color:#000000;font-size:18pt;font-family:"Times New Roman";font-weight:bold;padding-bottom:6pt;page-break-after:avoid}h3{widows:2;padding-top:12pt;line-height:1.0;orphans:2;text-align:left;color:#000000;font-size:14pt;font-family:"Times New Roman";font-weight:bold;padding-bottom:6pt;page-break-after:avoid}h4{widows:2;padding-top:12pt;line-height:1.0;orphans:2;text-align:left;color:#000000;font-size:14pt;font-family:"Times New Roman";font-weight:bold;padding-bottom:6pt;page-break-after:avoid}h5{widows:2;padding-top:12pt;line-height:1.0;orphans:2;text-align:left;color:#000000;font-style:italic;font-size:13pt;font-family:"Times New Roman";font-weight:bold;padding-bottom:3pt}h6{widows:2;padding-top:12pt;line-height:1.0;orphans:2;text-align:left;color:#000000;font-size:11pt;font-family:"Times New Roman";font-weight:bold;padding-bottom:3pt}</style></head><body class="c36"><h1 class="c3 c18"><a 
name="h.mrc6952e0whh"></a><span class="c17">C++ Latches and Barriers</span></h1><p class="c1"><span class="c11 c7">ISO/IEC JTC1 SC22 WG21 N4204</span><span class="c11 c7">&nbsp;- 2014-0</span><span>8</span><span class="c11 c7">-</span><span>06</span></p><p class="c1"><span class="c11 c7">Alasdair Mackintosh, </span><span class="c12 c28 c7 c34"><a class="c9" href="mailto:alasdair@google.com">alasdair@google.com</a></span><span class="c11 c7">, </span><span class="c34 c12 c28 c7"><a class="c9" href="mailto:alasdair.mackintosh@gmail.com">alasdair.mackintosh@gmail.com</a></span></p><p class="c1"><span>Olivier Giroux, </span><span class="c12"><a class="c9" href="mailto:OGiroux@nvidia.com">OGiroux@nvidia.com</a></span><span>, </span><span class="c12"><a class="c9" href="mailto:ogiroux@gmail.com">ogiroux@gmail.com</a></span></p><p class="c1 c8"><span></span></p><p class="c3 c39"><span class="c12"><a class="c9" href="#h.mrc6952e0whh">C++ Latches and Barriers</a></span></p><p class="c3 c13"><span class="c12"><a class="c9" href="#h.qfiyr9nhb3on">Revision History</a></span></p><p class="c3 c23"><span class="c12"><a class="c9" href="#h.r7uhgh9a3qqw">Introduction</a></span></p><p class="c3 c23"><span class="c12"><a class="c9" href="#h.rs9bik786ahx">Solution</a></span></p><p class="c3 c13"><span class="c12"><a class="c9" href="#h.94fny07f18oa">Concepts</a></span></p><p class="c3 c21"><span class="c12"><a class="c9" href="#h.tvy8lbn78dwj">ArriveAndWaitable</a></span></p><p class="c3 c21"><span class="c12"><a class="c9" href="#h.kfz8b1e14yq7">Latch</a></span></p><p class="c3 c21"><span class="c12"><a class="c9" href="#h.s6u4pup9f1v">Barrier</a></span></p><p class="c3 c13"><span class="c12"><a class="c9" href="#h.w1z5efbsofkv">Classes</a></span></p><p class="c3 c13"><span class="c12"><a class="c9" href="#h.ochtd04q8zj5">Header std::latch Synopsis</a></span></p><p class="c3 c21"><span class="c12"><a class="c9" href="#h.4kr7i3465ny5">Memory Ordering</a></span></p><p class="c3 
c13"><span class="c12"><a class="c9" href="#h.ecoi17nlqseu">Class std::barrier</a></span></p><p class="c3 c21"><span class="c12"><a class="c9" href="#h.34mqtw978gfl">Memory Ordering</a></span></p><p class="c3 c13"><span class="c12"><a class="c9" href="#h.9fyt9rmi4g2z">Class std::flex_barrier</a></span></p><p class="c3 c21"><span class="c12"><a class="c9" href="#h.iqd3r8x75vrc">Memory Ordering</a></span></p><p class="c3 c23"><span class="c12"><a class="c9" href="#h.p3kxt6srrk2q">Notes</a></span></p><p class="c3 c13"><span class="c12"><a class="c9" href="#h.7jwjdi7yg2r3">Use of a scoped guard to manage latches and barriers</a></span></p><p class="c3 c13"><span class="c12"><a class="c9" href="#h.dcshjdgb0rki">Sample Usage</a></span></p><p class="c3 c23"><span class="c12"><a class="c9" href="#h.gtvi3sys2b7e">Alternative Solutions</a></span></p><p class="c3 c23"><span class="c12"><a class="c9" href="#h.32y3d6lgyg3p">Synopsis</a></span></p><h3 class="c3 c18"><a name="h.qfiyr9nhb3on"></a><span class="c17">Revision History</span></h3><a href="#" name="9a2fb9adf60688834597adc303766058e96c3c2b"></a><a href="#" name="0"></a><table cellpadding="0" cellspacing="0" class="c40"><tbody><tr class="c24"><td class="c33"><p class="c3 c5 c4"><span class="c0">N3666</span></p></td><td class="c31"><p class="c3 c4 c5"><span class="c0">2013-04-18</span></p></td><td class="c32"><p class="c3 c5 c4"><span class="c0">Initial Version</span></p></td></tr><tr class="c24"><td class="c33"><p class="c3 c5 c4"><span class="c0">N3817</span></p></td><td class="c31"><p class="c3 c5 c4"><span class="c0">2013-10-11</span></p></td><td class="c32"><p class="c3 c5 c4"><span class="c0">Clarify destructor behaviour. 
Add comment on templatised completion functions.</span></p></td></tr><tr class="c24"><td class="c33"><p class="c3 c5 c4"><span class="c0">N3885</span></p></td><td class="c31"><p class="c3 c5 c4"><span class="c0">2014-01-21</span></p></td><td class="c32"><p class="c3 c5 c4"><span class="c0">Add Alternative Solutions section. (Not formally published)</span></p></td></tr><tr class="c24"><td class="c33"><p class="c3 c5 c4"><span class="c22">N3998</span></p></td><td class="c31"><p class="c3 c5 c4"><span class="c22">2014-05-21</span></p></td><td class="c32"><p class="c3 c5 c4"><span class="c0">Add Concepts, simplify latch and barrier, add notifying_barrier</span></p></td></tr><tr class="c24"><td class="c33"><p class="c3 c5 c4"><span class="c0">N4204</span></p></td><td class="c31"><p class="c3 c5 c4"><span class="c0">2014-08-06</span></p></td><td class="c32"><p class="c3 c5 c4"><span class="c0">Minor revisions after Rapperswil meeting</span></p></td></tr></tbody></table><h2 class="c3 c18"><a name="h.r7uhgh9a3qqw"></a><span class="c17">Introduction</span></h2><p class="c1"><span class="c11 c7">Certain idioms that are commonly used in concurrent programming are missing from the standard libraries. Although many of these can be relatively straightforward to implement, we believe it is more efficient to have a standard version.</span></p><p class="c1"><span class="c11 c7">In addition, although some idioms can be provided using mutexes, higher performance can often be obtained with atomic operations and lock-free algorithms. However, these algorithms are more complex to write, and are prone to error.</span></p><p class="c1"><span class="c11 c7">Other standard concurrency idioms may have difficult corner cases, and can be hard to implement correctly. 
For these reasons, we believe that it is valuable to provide these in the standard library.</span></p><a href="#" name="id.30j0zll"></a><h2 class="c3 c18"><a name="h.rs9bik786ahx"></a><span class="c17">Solution</span></h2><p class="c1"><span class="c11 c7">We propose a set of commonly-used concurrency classes, some of which may be implemented using efficient lock-free algorithms where appropriate. This paper describes var</span><span class="c7">ious concepts related to thread co-ordination, a</span><span>n</span><span class="c7">d defines </span><span class="c11 c7">the</span><span class="c26 c17 c7 c14">&nbsp;latch</span><span class="c7">,</span><span class="c26 c17 c7 c14">&nbsp;</span><span class="c7 c14">barrier</span><span class="c7">&nbsp;and </span><span class="c14">flex_barrier</span><span class="c11 c7">&nbsp;classes.</span></p><p class="c1"><span class="c11 c7">Latches are a thread coordination mechanism that allow one or more threads to block until an operation is completed. An individual latch is a single-use object; once the operation has been completed, it cannot be reused.</span></p><p class="c1"><span class="c11 c7">Barriers are a thread coordination mechanism that allow multiple threads to block until an operation is completed. Unlike a latch, a barrier is re-usable; once the operation has been completed, the threads can re-use the same barrier. 
It is thus useful for managing </span><span class="c11 c7">repeated tasks</span><span class="c11 c7">, or phases of a larger task, that are handled by multiple threads.</span></p><p class="c1"><span>Flex</span><span class="c7">&nbsp;Barriers allow additional behaviour to be defined when an operation has completed.</span></p><p class="c1"><span class="c11 c7">A reference implementation of these classes has been written</span><span class="c7 c11">.</span></p><a href="#" name="id.w3v6sd8c9dfs"></a><h3 class="c3 c18 c41"><a name="h.94fny07f18oa"></a><span>Concepts</span></h3><p class="c3"><span>In the section below, a </span><span class="c14">synchronization point</span><span>&nbsp;represents a point at which </span><span>a</span><span>&nbsp;thread may block until a given </span><span class="c14">synchronization condition</span><span>&nbsp;has been reached </span><span>or at which it may notify other threads that a synchronization condition has been achieved.</span></p><p class="c3"><span class="c7">We define the following concepts:</span></p><h4 class="c3 c18"><a name="h.tvy8lbn78dwj"></a><span>ArriveAndWaitable</span></h4><p class="c3"><span class="c7">Provides:</span></p><p class="c23 c29"><span class="c25">arrive_and_wait()</span><span>&nbsp;- Allows a single thread to indicate that it has arrived at a synchronization point. The thread will block until the synchronization condition has been reached. May only be called once by a given thread.</span></p><h4 class="c3 c18"><a name="h.kfz8b1e14yq7"></a><span>Latch</span></h4><p class="c3"><span>Provides ArriveAndWaitable, plus:</span></p><p class="c29 c23"><span class="c6">wait()</span><span>&nbsp;- The calling thread will block until the synchronization condition has been reached.</span></p><p class="c29 c23"><span class="c6">arrive()</span><span>&nbsp;- Allows a single thread to indicate that it has arrived at the synchronization point. Does not block. 
May only be called once by a given thread.</span></p><p class="c29 c23"><span class="c6">count_down(N)</span><span>&nbsp;- decrements </span><span>by N</span><span>&nbsp;the internal counter that determines when the synchronization condition has been reached. May be called more than once by a given thread. </span></p><p class="c29 c8"><span class="c30"></span></p><h4 class="c3 c18"><a name="h.s6u4pup9f1v"></a><span>Barrier</span></h4><p class="c3"><span>Provides ArriveAndWaitable, plus:</span></p><p class="c3 c23"><span class="c25">arrive_and_wait()</span><span>&nbsp;- Allows a single thread to indicate that it has arrived at a synchronization point. The thread will block until the synchronization condition has been reached. May be called repeatedly by a given thread.</span></p><p class="c3 c23"><span class="c6">arrive_and_drop</span><span class="c6">()</span><span class="c22 c37">&nbsp;- </span><span>Allows a single thread to indicate that it has arrived at a synchronization point. The thread will not block. Once a thread returns from this function, it shall not invoke other methods on the </span><span>barrier</span><span>&nbsp;(except the destructor, if otherwise valid).</span></p><h3 class="c3 c18"><a name="h.w1z5efbsofkv"></a><span>Class</span><span>es</span></h3><h3 class="c3 c18"><a name="h.ochtd04q8zj5"></a><span>Header std::latch Synopsis</span></h3><p class="c3"><span>Provides the Latch concept.</span></p><p class="c1"><span class="c11 c7">A latch maintains an internal counter that is initialized when the latch is created. The </span><span class="c17 c7 c14 c26">synchronization condition</span><span>&nbsp;is reached when the counter is decremented to 0. 
Threads may block at a </span><span class="c14">synchronization point</span><span>&nbsp;waiting for the condition to be reached.</span><span class="c14">&nbsp;</span><span>When the condition is reached, any such blocked threads will be released.</span></p><p class="c1 c8"><span></span></p><p class="c3 c5 c15"><span class="c6">latch</span><span class="c20 c6">( </span><span class="c20 c6">int</span><span class="c6">&nbsp;</span><span class="c20 c6">C</span><span class="c20 c6">&nbsp;);</span></p><p class="c3 c5 c10"><span class="c14">Requires</span><span>: C shall be &gt;= zero.</span></p><p class="c3 c5 c10"><span class="c14">Effects</span><span>: initializes the latch with a count of C.</span><span>&nbsp; </span><span>[Note: </span><span>If C is zero, the synchronization condition has been reached. End note]</span></p><p class="c3 c5 c10"><span class="c14">Synchronization: </span><span>None</span></p><p class="c3 c5 c15"><span class="c6">~latch</span><span class="c20 c6">( );</span></p><p class="c3 c10"><span class="c14">Requires:</span><span>&nbsp;No threads are blocked at the synchronization </span><span>point</span><span>. May </span><span class="c11 c7">be </span><span>called</span><span class="c11 c7">&nbsp;if threads have not yet returned from </span><span class="c6">wait()</span><span>&nbsp;or </span><span class="c6">arrive_and_wait()</span><span>&nbsp;provided that the condition has been reached.</span><span>&nbsp; [Note: The destructor may not return until all threads have exited </span><span>&nbsp;</span><span class="c6">wait()</span><span>&nbsp;or </span><span class="c6">arrive_and_wait()</span><span>. End note]</span></p><p class="c3 c15"><span class="c6">void arrive( );</span></p><p class="c3 c10"><span class="c14">Effects</span><span>: </span><span>Decrements</span><span>&nbsp;the internal count by 1. 
</span><span>If the count reaches 0 the</span><span>&nbsp;synchronization condition is reached.</span><span>&nbsp;I</span><span>f called more than once by a given thread the behaviour is undefined.</span></p><p class="c3 c10"><span class="c14">Throws:</span><span>&nbsp;std::logic_error if the internal count would be decremented below 0.</span></p><p class="c3 c10"><span class="c14">Synchronization: </span><span>Synchronizes with calls unblocked as a result of this call and try_wait calls on the same latch that return true as a result</span><span>.</span></p><p class="c3 c15"><span class="c6">void arrive_and_wait( );</span></p><p class="c3 c10"><span class="c14">Effects</span><span>: </span><span>Decrements the internal count by 1. If the count reaches 0 </span><span>the synchronization condition is reached. Otherwise b</span><span>locks</span><span>&nbsp;at the synchronization point until the synchronization condition is reached. If called more than once by a given thread the behaviour is undefined.</span></p><p class="c3 c10"><span class="c14">Throws:</span><span>&nbsp;std::logic_error if the internal count would be decremented below 0.</span></p><p class="c3 c10"><span class="c14">Synchronization: </span><span>Synchronizes with calls unblocked as a result of this call and try_wait calls on the same latch that return true as a result.</span></p><p class="c3 c5 c15"><span class="c20 c6">void count_down( </span><span class="c6">int</span><span class="c20 c6">&nbsp;N );</span></p><p class="c3 c5 c10"><span class="c14">Effects</span><span>: </span><span class="c11 c7">Decrements the internal count by </span><span>N</span><span class="c11 c7">. If the count reaches 0, </span><span>the synchronization condition is reached</span><span class="c11 c7">. May be called by any thread. 
Does not block.</span></p><p class="c3 c5 c10"><span class="c26 c17 c7 c14">Throws:</span><span class="c11 c7">&nbsp;std::logic_error if the internal count </span><span>would be decremented below </span><span>0</span><span class="c11 c7">.</span></p><p class="c3 c5 c10"><span class="c14">Synchronization: </span><span>Synchronizes with calls unblocked as a result of this call and try_wait calls on the same latch that return true as a result.</span></p><p class="c3 c5 c15"><span class="c6 c20">void wait( );</span></p><p class="c3 c5 c10"><span class="c14">Effects</span><span>: </span><span class="c11 c7">Blocks the calling thread at the synchronization point until the </span><span>synchronization condition is reached.</span><span class="c11 c7">&nbsp;If the</span><span>&nbsp;condition has already been reached</span><span class="c11 c7">, </span><span>the thread does not block.</span></p><p class="c3 c15"><span class="c6">bool try_wait( );</span></p><p class="c3 c5 c10"><span class="c14">Returns: </span><span class="c11 c7">Returns true if the synchronization condition has been reached, and false otherwise. 
Does not block.</span></p><p class="c3"><span class="c6">latch(const latch&amp;) = delete;<br>latch&amp; operator=(const latch&amp;) = delete;</span></p><h4 class="c3 c18"><a name="h.4kr7i3465ny5"></a><span class="c17 c7">Memory Ordering</span></h4><p class="c1"><span class="c11 c7">All calls to</span><span class="c0">&nbsp;</span><span class="c20 c6">count_down()</span><span>, </span><span class="c20 c6">arrive(</span><span class="c6">)</span><span>, and </span><span class="c6">arrive_and_wait()</span><span class="c11 c7">&nbsp;synchronize with any </span><span>calls to</span><span class="c0">&nbsp;</span><span class="c6">wait()</span><span>&nbsp;or </span><span class="c6">arrive_and_wait()</span><span>&nbsp;that complete as a result</span><span class="c20">.</span><span class="c11 c7">&nbsp; All calls </span><span class="c28">to </span><span class="c6">count_down()</span><span>,</span><span>&nbsp;</span><span class="c6">arrive()</span><span>, or </span><span class="c6">arrive_and_wait()</span><span class="c28">&nbsp;s</span><span class="c11 c7">ynchronize with any </span><span>call to</span><span class="c6 c28">&nbsp;try_wait()</span><span class="c28 c22">&nbsp;that returns </span><span class="c6 c28">true</span><span class="c28">&nbsp;as a result.</span></p><p class="c1 c8"><span></span></p><h3 class="c3 c18"><a name="h.ecoi17nlqseu"></a><span>Header std::barrier synopsis</span></h3><p class="c1"><span>Provides the Barrier concept.</span></p><p class="c1"><span>A barrier is created with an initial value representing the number of threads that can arrive at the </span><span class="c14">synchronization point. </span><span>When that many threads have arrived, the </span><span class="c14">synchronization condition</span><span>&nbsp;is reached and the threads are released. The barrier will then reset, and may be reused for a new cycle, in which the same set of threads may arrive again at the synchronization point. 
T</span><span>he same set of threads</span><span>&nbsp;shall arrive at the barrier in each cycle, otherwise the behaviour is undefined</span><span class="c11 c7">.</span></p><p class="c1 c8"><span></span></p><p class="c1"><span class="c6">barrier</span><span class="c20 c6">( int C );<br></span></p><p class="c3 c5 c10"><span class="c14">Requires: </span><span>C shall be &gt;= </span><span>zero</span><span>. </span><span>[Note: </span><span>If C is zero, the synchronization condition is considered to have already been reached. In this case, the barrier may only be destroyed. End Note]</span><span><br><br></span><span class="c14">Effects:</span><span>&nbsp;initializes the barrier with the </span><span>number of participating threads</span><span>&nbsp;C.<br></span></p><p class="c3 c5 c15"><span class="c6">~barrier</span><span class="c20 c6">( );</span></p><p class="c3 c5 c10"><span class="c14">Requires:</span><span>&nbsp;</span><span>No </span><span class="c11 c7">threads are blocked </span><span>at the synchronization </span><span>point</span><span>.</span></p><p class="c3 c5 c10"><span class="c14">Effects:</span><span>&nbsp;destroys the barrier</span></p><p class="c3 c5 c15"><span class="c20 c6">void </span><span class="c6">arrive</span><span class="c20 c6">_and_wait( );</span></p><p class="c3 c5 c10"><span class="c14">Effects: </span><span>Blocks at the </span><span class="c14">synchronization point</span><span>&nbsp;until the </span><span class="c14">synchronization condition</span><span>&nbsp;is reached. When all threads (as determined by the initial thread count parameter to the constructor) have arrived, the </span><span class="c14">synchronization condition</span><span>&nbsp;is reached and all threads are released. 
The barrier may then be re-used for another cycle.</span></p><p class="c3 c5 c10"><span>[</span><span class="c11 c7">Note: it is safe for a thread to re-enter </span><span class="c6">arrive_and_wait</span><span class="c20 c6">()</span><span class="c11 c7">&nbsp;immediately. It is not necessary to ensure that all blocked threads have exited </span><span class="c6">arrive_and_wait</span><span class="c20 c6">()</span><span class="c11 c7">&nbsp;before one thread re-enters it. -- </span><span>E</span><span class="c11 c7">nd note]</span></p><p class="c3 c5 c10"><span class="c14">Synchronization: </span><span>Synchronizes with calls unblocked as a result of this call.</span></p><p class="c3"><span class="c6">void arrive_and_drop( );</span></p><p class="c3 c10"><span class="c14">Effects: </span><span>Signals that this thread has arrived at the </span><span class="c14">synchronization point</span><span>. </span><span>May block</span><span>&nbsp;until the </span><span class="c14">synchronization condition </span><span>is reached</span><span>.</span><span>&nbsp;When the barrier resets, </span><span>the current thread is removed from the set of participating threads</span><span>.</span></p><p class="c3 c10"><span>I</span><span>f the barrier was created with an initial count of N, </span><span>and all N threads call</span><span class="c6">&nbsp;</span><span class="c6">arrive_and_drop()</span><span>, any further operations on the barrier are undefined, apart from calling the destructor.</span></p><p class="c3 c10"><span>Calling </span><span class="c6">arrive_and_drop()</span><span>&nbsp;modifies the requirement that the same set of threads must arrive in each cycle. The thread that has dropped is no longer considered part of this set. 
</span><span>If a thread that has called </span><span class="c6">arrive_and_drop()</span><span>&nbsp;calls another </span><span>method </span><span>on the same barrier, other than the destructor, the results are undefined.</span></p><p class="c3 c10"><span class="c14">Synchronization: </span><span>Synchronizes with calls unblocked as a result of this call.</span></p><p class="c3"><span class="c6">barrier(const barrier&amp;) = delete;<br>barrier&amp; operator=(const barrier&amp;) = delete;</span></p><h4 class="c3 c18"><a name="h.34mqtw978gfl"></a><span class="c7 c17">Memory Ordering</span></h4><p class="c3"><span>All calls to</span><span class="c22">&nbsp;</span><span class="c6">arrive_and_wait()</span><span>&nbsp;or </span><span class="c6">arrive_and_drop()</span><span>&nbsp;synchronize with any calls to</span><span class="c22">&nbsp;</span><span class="c6">arrive_and_wait()</span><span>&nbsp;that complete as a result.</span></p><h3 class="c3 c18"><a name="h.9fyt9rmi4g2z"></a><span><br>Header std::</span><span>flex_barrier</span><span>&nbsp;synopsis</span></h3><p class="c3"><span>Provides the Barrier concept.</span><span><br><br>A flex barrier behaves as a barrier, but may be constructed with a callable completion function that is invoked after all threads have arrived at the </span><span class="c14">synchronization point</span><span>, and before the </span><span class="c14">synchronization condition</span><span>&nbsp;is reached. The completion may </span><span>modify</span><span>&nbsp;the set of threads</span><span>&nbsp;that arrives at the barrier in each cycle.</span></p><p class="c3"><span>The flex barrier maintains the concept of a </span><span class="c14">count of expected threads</span><span>. This is the number of threads that are expected to arrive at each completion point. 
Unlike the regular barrier, this count may change from cycle to cycle.</span></p><p class="c3 c8"><span></span></p><p class="c3 c4"><span class="c6">template &lt;typename T&gt;</span></p><p class="c3 c4"><span class="c6">flex_barrier( int C, T F );</span></p><p class="c3 c10 c8"><span class="c6"></span></p><p class="c3 c10"><span class="c14">Requires: </span><span>C shall be &gt;= </span><span>zero</span><span>.</span><span>&nbsp;F shall conform to the callable </span><span class="c6">int()</span><span>&nbsp;concept. </span><span>[Note: </span><span>If C is zero, the synchronization condition is considered to have already been reached. In this case, the barrier may only be destroyed. End Note]</span><span><br></span><span class="c14">Effects:</span><span>&nbsp;Initializes the barrier with an expected thread count of C, and </span><span>a callable object that will be invoked after</span><span>&nbsp;the </span><span class="c14">synchronization condition</span><span>&nbsp;is reached.</span><span>&nbsp;</span></p><p class="c3 c4 c8"><span class="c6"></span></p><hr style="page-break-before:always;display:none;"><p class="c3 c4 c8"><span class="c6"></span></p><p class="c3 c4"><span class="c6">flex_barrier( int C );</span></p><p class="c3 c10 c8"><span class="c6"></span></p><p class="c3 c10"><span class="c14">Requires: </span><span>C shall be &gt;= </span><span>zero</span><span>. </span><span>[Note: </span><span>If C is zero, the synchronization condition is considered to have already been reached. In this case, the barrier may only be destroyed. End Note]</span><span><br></span><span class="c14">Effects:</span><span>&nbsp;Initializes the barrier with an expected thread count of C. 
Has the same effect as creating a flex barrier with a callable object that returns 0 and has no other side effects</span><span>.</span><span>&nbsp; </span><span>[Note: </span><span>The documentation below describes the behaviour of the flex barrier in terms of the callable object used in the constructor. If the flex barrier was created with this constructor, it behaves as if an internal callable object existed. </span><span>End Note]</span><span><br></span></p><p class="c3 c15"><span class="c6">~flex_barrier( );</span></p><p class="c3 c10"><span class="c14">Requires:</span><span>&nbsp;No threads are blocked at the </span><span class="c14">synchronization point</span><span>.</span></p><p class="c3 c10"><span class="c14">Effects:</span><span>&nbsp;destroys the barrier</span></p><p class="c3 c15"><span class="c6">void arrive_and_wait( );</span></p><p class="c3 c10"><span class="c14">Effects: </span><span>Blocks at the </span><span class="c14">synchronization point</span><span>&nbsp;until the </span><span class="c14">synchronization condition</span><span>&nbsp;is reached. When a number of threads equal to the count of expected threads has arrived, the </span><span class="c14">synchronization condition</span><span>&nbsp;is reached. Before any threads are released, the callable completion object registered in the constructor will be invoked. (The completion will be invoked in the context of one of the threads that invoked </span><span class="c6">arrive_and_wait()</span><span>&nbsp;or </span><span class="c6">arrive_and_drop()</span><span>.) When the completion returns, </span><span>the internal state will be reset</span><span>, and all blocked threads will be unblocked. 
The barrier may then be used for a new cycle.</span></p><p class="c3 c10"><span>If the </span><span>completion </span><span>returns 0 then the count of expected threads is unchanged, and the set of participating threads is unchanged</span><span>.</span><span>&nbsp;</span><span>Otherwise the count of expected threads is set to the completion&#39;s return value</span><span>. &nbsp;Doing so relaxes the restriction that the same set of threads must arrive at the barrier in each new cycle.</span></p><p class="c3 c10"><span>Note that it is safe for a thread to re-enter </span><span class="c6">arrive_and_wait()</span><span>&nbsp;immediately. It is not necessary to ensure that all blocked threads have exited </span><span class="c6">arrive_and_wait()</span><span>&nbsp;before one thread re-enters it.</span></p><p class="c3 c10"><span class="c14">Synchronization: </span><span>Synchronizes with invocations of </span><span>the</span><span>&nbsp;completion function. Invocations of the c</span><span>ompletion</span><span>&nbsp;function then s</span><span>ynchronize with calls unblocked as a result of this call.</span></p><p class="c3"><span class="c6">void arrive_and_drop( );</span></p><p class="c3 c10"><span class="c14">Effects: </span><span>Signals that this thread has arrived at the </span><span class="c14">synchronization point</span><span>. </span><span>When a number of threads equal to the count of expected threads has arrived,</span><span>&nbsp;the </span><span class="c14">synchronization condition</span><span>&nbsp;is reached. May block until the </span><span class="c14">synchronization condition</span><span>&nbsp;is reached</span><span>.</span><span>&nbsp;Before any threads are released, the callable completion object registered in the constructor will be invoked. (The completion will be invoked in the context of one of the threads that invoked </span><span class="c6">arrive_and_wait()</span><span>&nbsp;or </span><span class="c6">arrive_and_drop()</span><span>.) 
When the completion returns, </span><span>the internal state will be reset</span><span>, and all blocked threads will be unblocked. The barrier may then be used for a new cycle.</span></p><p class="c3 c10"><span>If the </span><span>completion </span><span>returns 0 then those threads that called </span><span class="c6">arrive_and_drop()</span><span>&nbsp;are removed from the set of expected threads</span><span>.</span><span>&nbsp;</span><span>Otherwise the count of expected threads is set to the completion&#39;s return value</span><span>&nbsp;and </span><span>the restriction that the same set of threads must arrive at the barrier in each new cycle is relaxed.</span></p><p class="c3 c10"><span class="c14">Synchronization: </span><span>Synchronizes with calls unblocked as a result of this call.</span></p><p class="c3"><span class="c6">flex_barrier(const flex_barrier&amp;) = delete;<br>flex_barrier&amp; operator=(const flex_barrier&amp;) = delete;</span></p><h4 class="c3 c18 c35"><a name="h.iqd3r8x75vrc"></a><span>Memory Ordering</span></h4><p class="c3"><span>All calls to the completion function synchronize with those calls to </span><span class="c6">arrive_and_wait()</span><span>&nbsp;or </span><span class="c6">arrive_and_drop()</span><span>&nbsp;that triggered the completion.</span><span>&nbsp;The completion function synchronizes with all calls unblocked after it has run</span><span>.</span></p><p class="c3 c8"><span></span></p><h2 class="c3 c18"><a name="h.p3kxt6srrk2q"></a><span>Notes</span></h2><p class="c3"><span>(The following notes have not changed significantly since the last revision of this paper, and may be skipped by readers familiar with it.)</span></p><h3 class="c3 c18 c35"><a name="h.7jwjdi7yg2r3"></a><span>Use of a scoped guard to manage latches and barriers</span></h3><p class="c1"><span class="c11 c7">A future paper will propose a</span><span class="c0">&nbsp;</span><span class="c20 c6">scoped_guard</span><span class="c11 c7">; a helper class that 
invokes a function when it goes out of scope. This is analogous to a</span><span class="c0">&nbsp;</span><span class="c20 c6">std::unique_ptr</span><span class="c11 c7">, which deletes an object when it goes out of scope. The latch and barrier classes could provide scoped guards that would ensure that a latch was always decremented when a worker had finished, regardless of how the worker terminated. For example:</span></p><p class="c3 c5 c4 c8"><span class="c11 c7"></span></p><p class="c3 c5 c4"><span class="c2">&nbsp; void DoWork(latch&amp; completion_latch) {</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; // Automatically invokes completion_latch.count_down() when this</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; // function terminates</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; scoped_guard g = completion_latch.count_down_guard();</span></p><p class="c3 c5 c4 c8"><span class="c2"></span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; switch (state) {</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; case 0:</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; &nbsp; return;</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; case 1:</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; &nbsp; algorithm1();</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; &nbsp; return;</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; case 2:</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; &nbsp; algorithm2();</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; &nbsp; return;</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; default:</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; &nbsp; throw std::logic_error(&quot;unknown state&quot;);</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; }</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; }</span></p><p class="c3 c5 
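c15 c8"><span class="c2"></span></p><p class="c1"><span class="c11 c7">The scoped_guard class itself will only be specified in that future paper, so nothing here is normative. As a rough illustration of the intended behaviour, such a guard might be sketched as follows (the class name and interface are assumptions based on the description above):</span></p>

```cpp
#include <functional>
#include <utility>

// Illustrative sketch only: scoped_guard will be specified in a future
// paper. It stores a callable and invokes it exactly once, when the
// guard is destroyed (i.e. when it goes out of scope).
class scoped_guard {
 public:
  explicit scoped_guard(std::function<void()> f) : f_(std::move(f)) {}
  scoped_guard(scoped_guard&& other) : f_(std::move(other.f_)) {
    other.f_ = nullptr;  // a moved-from guard must not fire
  }
  scoped_guard(const scoped_guard&) = delete;
  scoped_guard& operator=(const scoped_guard&) = delete;
  ~scoped_guard() {
    if (f_) f_();  // runs on every exit path, including exceptions
  }

 private:
  std::function<void()> f_;
};
```

<p class="c1"><span class="c11 c7">With such a class, a hypothetical latch::count_down_guard() could simply return scoped_guard([this] { count_down(1); }).</span></p><p class="c3 c5 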
c15 c8"><span class="c2"></span></p><p class="c1"><span class="c11 c7">We suggest that a suitable</span><span class="c0">&nbsp;</span><span class="c20 c6">xxx_guard()</span><span class="c11 c7">&nbsp;method be provided for all of the latch and barrier methods, as an aid to safe usage. This would avoid having threads waiting for termination conditions that are never triggered.</span></p><a href="#" name="id.2et92p0"></a><h3 class="c3 c18"><a name="h.dcshjdgb0rki"></a><span class="c17">Sample Usage</span></h3><p class="c1"><span class="c11 c7">Sample use cases for the latch include:</span></p><ul class="c27 lst-kix_list_2-0 start"><li class="c16 c3 c5 c4"><span class="c11 c7">Setting multiple threads to perform a task, and then waiting until all threads have reached a common point.</span></li><li class="c1 c16"><span class="c11 c7">Creating multiple threads, which wait for a signal before advancing beyond a common point.</span></li></ul><p class="c1"><span class="c11 c7">An example of the first use case would be as follows:</span></p><p class="c3 c5 c4 c8"><span class="c11 c7"></span></p><p class="c3 c5 c4"><span class="c2">&nbsp; void DoWork(threadpool* pool) {</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; latch completion_latch(NTASKS);</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; for (int i = 0; i &lt; NTASKS; ++i) {</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; pool-&gt;add_task([&amp;] {</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; &nbsp; // perform work</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; &nbsp; ...</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; &nbsp; completion_latch.count_down();</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; });</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; }</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; // Block until work is done</span></p><p class="c3 c5 
c4"><span class="c2">&nbsp; &nbsp; completion_latch.wait();</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; }</span></p><p class="c3 c5 c15 c8"><span class="c2"></span></p><p class="c1"><span class="c11 c7">An example of the second use case is shown below. We need to load data and then process it using a number of threads. Loading the data is I/O bound, whereas starting threads and creating data structures is CPU bound. By running these in parallel, throughput can be increased.</span></p><p class="c3 c5 c4 c8"><span class="c11 c7"></span></p><p class="c3 c5 c4"><span class="c2">&nbsp; void DoWork() {</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; latch start_latch(1);</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; vector&lt;thread*&gt; workers;</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; for (int i = 0; i &lt; NTHREADS; ++i) {</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; workers.push_back(new thread([&amp;] {</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; &nbsp; // Initialize data structures. This is CPU bound.</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; &nbsp; ...</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; &nbsp; start_latch.wait();</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; &nbsp; // perform work</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; &nbsp; ...</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; }));</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; }</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; // Load input data. 
This is I/O bound.</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; ...</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; // Threads can now start processing</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; start_latch.count_down();</span></p><p class="c3 c5 c4 c8"><span class="c2"></span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; // Wait for threads to finish, delete allocated objects.</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; ...</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; }<br></span></p><p class="c1"><span class="c11 c7">The barrier can be used to co-ordinate a set of threads carrying out a repeated task.</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; void DoWork() {</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; Tasks&amp; tasks;</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; </span><span class="c19">int</span><span class="c2">&nbsp;</span><span class="c19">n</span><span class="c2">_threads;</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; vector&lt;thread*&gt; workers;</span></p><p class="c3 c5 c4 c8"><span class="c2"></span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; barrier task_barrier(n_threads);</span></p><p class="c3 c5 c4 c8"><span class="c2"></span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; for (int i = 0; i &lt; n_threads; ++i) {</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; workers.push_back(new thread([&amp;] {</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; &nbsp; bool active = true;</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; &nbsp; while(active) {</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Task task = tasks.get();</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; // perform task</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 
...</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; task_barrier.</span><span class="c19">arrive_and_wait</span><span class="c2">();</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;}</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; &nbsp;});</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; }</span></p><p class="c3 c5 c4 c8"><span class="c2"></span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; // Read each stage of the task until all stages are complete.</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; while (!finished()) {</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; &nbsp; GetNextStage(tasks);</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; &nbsp; }</span></p><p class="c3 c5 c4"><span class="c2">&nbsp; }</span></p><p class="c3 c8"><span></span></p><p class="c3"><span>The flex_barrier can be used to co-ordinate a set of threads where the number of threads can vary. In the example below, we reduce the number of threads when a task finishes. (Alternatively we could increase the number of threads if the task required it.) 
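</span></p><p class="c3"><span>Since no implementation of flex_barrier exists yet, it may help to see the semantics specified above in executable form. The sketch below is purely illustrative: a mutex and condition variable are one possible strategy, not the proposed implementation, and only the interface follows this paper. A completion returning 0 leaves the expected count unchanged; any other return value becomes the count for the next cycle.</span></p>

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>

// Illustrative sketch of the flex_barrier semantics described above.
// NOT the proposed implementation; just one way to realise the interface.
class flex_barrier {
 public:
  template <typename F>
  flex_barrier(int num_threads, F completion)
      : expected_(num_threads),
        count_(num_threads),
        cycle_(0),
        completion_(completion) {}

  // Behaves as if constructed with a completion that returns 0
  // and has no other side effects.
  explicit flex_barrier(int num_threads)
      : flex_barrier(num_threads, [] { return 0; }) {}

  void arrive_and_wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    long cycle = cycle_;
    if (--count_ == 0) {
      complete();  // last arrival: run the completion, release everyone
    } else {
      cv_.wait(lock, [&] { return cycle != cycle_; });
    }
  }

  void arrive_and_drop() {
    std::lock_guard<std::mutex> lock(mutex_);
    --expected_;  // leave the set of participating threads
    if (--count_ == 0) complete();
  }

 private:
  // Called with mutex_ held, by the thread that completed the cycle.
  void complete() {
    int next = completion_();
    if (next != 0) expected_ = next;  // 0 means "count unchanged"
    count_ = expected_;
    ++cycle_;  // start a new cycle
    cv_.notify_all();
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  int expected_;  // expected thread count for the next cycle
  int count_;     // arrivals still outstanding in the current cycle
  long cycle_;
  std::function<int()> completion_;
};
```

<p class="c3"><span>The sketch runs the completion in the context of whichever thread arrives last, matching the requirement that it be invoked by one of the arriving threads.</span></p><p class="c3"><span>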
Note that reducing the number of threads can be done via the </span><span class="c6">barrier::arrive_and_drop()</span><span>&nbsp;method, but increasing the number of threads can only be done with a flex_barrier.</span></p><p class="c3 c4 c8"><span class="c19"></span></p><p class="c3 c4"><span class="c19">&nbsp; void DoWork() {</span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; Tasks&amp; tasks;</span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; int initial_threads;</span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; atomic&lt;int&gt; current_threads(initial_threads);</span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; vector&lt;thread*&gt; workers;</span></p><p class="c3 c4 c8"><span class="c19"></span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; // Create a flex_barrier, and set a lambda that will be </span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; // invoked every time the barrier counts down. If one or more </span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; // active threads have completed, reduce the number of threads.</span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; std::function&lt;int()&gt; rf = [&amp;] { return current_threads; };</span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; flex_barrier task_barrier(initial_threads, rf);</span></p><p class="c3 c4 c8"><span class="c19"></span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; for (int i = 0; i &lt; initial_threads; ++i) {</span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; &nbsp; workers.push_back(new thread([&amp;] {</span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; &nbsp; &nbsp; bool active = true;</span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; &nbsp; &nbsp; while (active) {</span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Task task = tasks.get();</span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; // perform task</span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; 
&nbsp; &nbsp; &nbsp; ...</span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if (finished(task)) {</span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; current_threads--;</span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; active = false;</span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }</span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; task_barrier.arrive_and_wait();</span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;}</span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; &nbsp; &nbsp;});</span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; }</span></p><p class="c3 c4 c8"><span class="c19"></span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; // Read each stage of the task until all stages are complete.</span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; while (!finished()) {</span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; &nbsp; GetNextStage(tasks);</span></p><p class="c3 c4"><span class="c19">&nbsp; &nbsp; }</span></p><p class="c3 c4"><span class="c19">&nbsp; }</span></p><p class="c3 c5 c8 c15"><span class="c19"></span></p><a href="#" name="id.tyjcwt"></a><h2 class="c3 c18"><a name="h.gtvi3sys2b7e"></a><span class="c17">Alternative Solutions</span></h2><p class="c1"><span class="c11 c7">Java provides a Phaser interface for thread co-ordination. See</span><span class="c0">&nbsp;</span><span class="c0"><a class="c9" href="http://www.google.com/url?q=http%3A%2F%2Fdocs.oracle.com%2Fjavase%2F7%2Fdocs%2Fapi%2Fjava%2Futil%2Fconcurrent%2FPhaser.html&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNHdvbxqj93pJSIF--RIB_ywiLV-ug">http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Phaser.html</a></span><span class="c11 c7">. 
It has been suggested that a similar concept could be used in the C++ standard instead of the current latch/barrier proposal.</span></p><p class="c1"><span class="c11 c7">The Phaser interface offers the following potential advantages:</span></p><ul class="c27 lst-kix_list_3-0 start"><li class="c3 c5 c4 c16"><span class="c11 c7">The number of tasks can be updated at any point.</span></li><li class="c1 c16"><span class="c11 c7">A thread can complete a phase without blocking.</span></li></ul><p class="c1"><span class="c11 c7">To describe the differences between Phasers and barriers we use the term &#39;cycle&#39;. Both constructs support the use-case where a number of threads block until they all arrive at a certain point. Once all threads have arrived, they are unblocked, and can continue. We refer to this as a &#39;cycle&#39;. Both Phasers and barriers are reusable: after one cycle has completed a new cycle can begin.</span></p><p class="c1"><span class="c11 c7">The proposed </span><span>flex_barrier</span><span class="c11 c7">&nbsp;can only update the number of threads during the completion function that is invoked when the count reaches zero at the end of a cycle. The Phaser allows a thread to call register() at any point during the cycle. However it is difficult to reason about the ordering of this behaviour. To ensure that a thread is added to a particular cycle would require additional synchronization with the other threads in the cycle, with the complexity and performance overheads that this involves. If there is no such synchronization then a thread cannot control which cycle it joins. As the purpose of the barrier is to allow threads to co-operate on tasks, this feature of Phaser does not seem useful.</span></p><p class="c1"><span class="c11 c7">The Phaser class supports an arriveAndAwaitAdvance() method that corresponds to barrier&#39;s </span><span>arrive_and_wait</span><span class="c11 c7">(). 
It also supports an arrive() method that decrements the internal count without blocking. This potentially allows some threads to arrive at a later cycle while other threads are still completing earlier cycles. Again, we feel that it can be hard to reason about the order in which threads will enter different cycles, or when the sequence of operations controlled by this Phaser has finally terminated. In addition, this capability adds to the complexity of the implementation.</span></p><p class="c1"><span class="c11 c7">We feel that it would be possible to implement a C++ Phaser, possibly using the proposed latch and barrier classes, but that the latter two offer a simpler interface that will be useful for most thread co-ordination tasks.</span></p><a href="#" name="id.3dy6vkm"></a><h2 class="c3 c18"><a name="h.32y3d6lgyg3p"></a><span class="c17">Synopsis</span></h2><p class="c1"><span class="c11 c7">The synopsis is as follows.</span></p><p class="c3 c5 c4 c8"><span class="c11 c7"></span></p><p class="c3 c5 c4"><span class="c19">class latch {<br>public:<br> &nbsp;explicit latch(int count);<br> &nbsp;~latch();<br> &nbsp;void arrive();<br> &nbsp;void arrive_and_wait();<br> &nbsp;void count_down(int n);<br> &nbsp;void wait();<br> &nbsp;bool try_wait();</span></p><p class="c3 c5 c4"><span class="c2">};</span></p><p class="c3 c5 c4 c8"><span class="c2"></span></p><p class="c3 c5 c4"><span class="c19">class barrier {<br> public:<br> &nbsp;explicit barrier(int num_threads);<br> &nbsp;~barrier();<br> &nbsp;void arrive_and_wait();<br> &nbsp;void arrive_and_drop();</span></p><p class="c3 c5 c4"><span class="c2">};</span></p><p class="c3 c5 c4 c8"><span class="c19"></span></p><p class="c3 c5 c15"><span class="c19">class flex_barrier {<br> public:<br> &nbsp;template &lt;typename F&gt;<br> &nbsp;flex_barrier(int num_threads, F completion);<br> &nbsp;~flex_barrier();<br> &nbsp;void arrive_and_wait();<br> &nbsp;void arrive_and_drop();<br>};</span></p><p class="c3 c8"><span class="c11 
c7"></span></p></body></html>
