<div style="text-align: right">

<p><em>Document No.: P2257R0</em></p>
<p><em>Date: 2020-11-15</em></p>
<p><em>Audience: LEWG Library Evolution</em></p>
<p><em>Reply-to: Dalton M. Woodard &lt;<script type="text/javascript">
<!--
h='&#x61;&#114;&#x67;&#x6f;&#46;&#x61;&#x69;';a='&#64;';n='&#100;&#x61;&#108;&#116;&#x6f;&#110;&#x6d;&#x77;&#x6f;&#x6f;&#100;&#x61;&#114;&#100;';e=n+a+h;
document.write('<a h'+'ref'+'="ma'+'ilto'+':'+e+'" clas'+'s="em' + 'ail">'+e+'<\/'+'a'+'>');
// -->
</script><noscript>&#100;&#x61;&#108;&#116;&#x6f;&#110;&#x6d;&#x77;&#x6f;&#x6f;&#100;&#x61;&#114;&#100;&#32;&#x61;&#116;&#32;&#x61;&#114;&#x67;&#x6f;&#32;&#100;&#x6f;&#116;&#32;&#x61;&#x69;</noscript>&gt;</em></p>
</div>

<hr />
<h1>Blocking is an insufficient description for senders and receivers</h1>
<h2>Abstract</h2>
<p>Proposes an initial direction for a reformulation and extension of the blocking property to senders. No wording is suggested as of this revision.</p>
<h2>Introduction</h2>
<p>The most recent revision of <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0443r14.html">P0443, A Unified Executors Proposal for C++</a> specifies a number of generic properties, the vast majority of which are focused on the particulars of executor types and, secondarily, schedulers. Recent design work, however, has emphasized the importance of senders, receivers, and schedulers to the overall picture. There is as of late an increased understanding that these concepts likely represent the fundamental abstractions for generic concurrent programming, rather than executors. Indeed, eager executors should probably be viewed as limited tools of expedience rather than fundamental abstractions. For reference see the papers <a href="http://open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1525r1.pdf">One-Way is a Poor Basis Operation</a>, and <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2235r0.html">Disentangling schedulers and executors</a>.</p>
<p>Even so, most aspects of the design work already laid out for executors remain important for senders and receivers. Particularly important are the properties of certain classes of types, with which generic algorithms may be conditionally enabled and optimized.</p>
<p>New approaches to specifying these properties have been suggested in <a href="http://open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2220r0.pdf">redefine properties in P0443</a>, the general direction of which we agree with and the proposed mechanism of which we'll assume for the rest of this paper.</p>
<p>Our concern for the remainder of this paper shall be the blocking property, which <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0443r14.html">P0443</a> specifies as &quot;[describing] what guarantees executors provide about the blocking behavior of their execution functions.&quot; When adapted to a property query of sender types, we can use this information to perform library-internal optimizations. For instance, it can be shown that a default implementation of the <code>execution::submit()</code> algorithm for senders and receivers can elide heap allocations, conditional on whether the given sender guarantees it blocks execution of the calling thread pending completion of the operation.</p>
<p>This alone should be sufficient motivation to redesign the properties in <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0443r14.html">P0443</a> to apply generically to types other than executors, but there are other benefits as well. Staying focused on the blocking property, such a redesign allows for more ergonomic and streamlined implementations of custom sender types, which <em>could</em> choose to omit customization of <code>submit()</code> entirely, provided a guarantee that the default implementation in terms of <code>connect()</code> and <code>start()</code> will be just as efficient. As it stands now, custom sender types, even those that can guarantee completion inline such as the proposed algorithm <code>just()</code> from <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1897r3.html">P1897</a>, would likely always have to customize <code>submit()</code> to avoid unnecessary heap allocations.</p>
<p>A straightforward adaptation of the blocking property to senders does not appear to be possible, however, and <a href="https://github.com/executors/executors/issues/480">issue #480 of the executors design review</a> from earlier this year highlighted the basic problems. We believe some of these issues can be resolved by carefully reformulating the blocking property.</p>
<h2>Proposal</h2>
<p>First, we believe that blocking is the wrong description for how senders behave, at least to a point. The principal benefit of senders and receivers is their ability to compose cleanly, efficiently, and <em>lazily</em>. Describing a sender with terms relating to blocking, therefore, seems inappropriate. Doubly so since senders are just <em>one half</em> of the picture. We think blocking should instead be reserved for describing operations, which we will explore in more detail later on.</p>
<p>As regards the potential for eliding heap allocations in <code>execution::submit()</code>, what matters is not blocking <em>per se</em>, but rather <em>how and when</em> a sender completes, and in <em>which context</em> a connected receiver's completion channels are guaranteed to be signaled. In particular, a default implementation of <code>execution::submit()</code> may elide allocation of temporary state and enjoy the efficiency of a direct implementation along the lines of</p>
<pre><code>operation_state auto op = execution::connect(S, R);
execution::start(op);
</code></pre>
<p>if and only if the sender type <code>S</code> can guarantee it fulfills the receiver contract synchronously with the invocation of <code>start()</code>. This likely means it must guarantee a <em>strongly happens before</em> relationship with return from <code>start()</code>. Notice how this is <em>not</em> a description of a blocking operation in the general case, but rather a description of a synchronous operation, and we believe it would be unfortunate to conflate those two terms.</p>
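<p>The allocation-eliding dispatch described above can be sketched with a toy model. Everything in it is an assumption made for illustration and not part of P0443: the tag types, the nested <code>completion</code> alias, the <code>completion_of</code> trait, the <code>pending</code> queue, the <code>heap_allocations</code> counter, and the senders <code>just_int</code> and <code>deferred_int</code> are all hypothetical, and only the value channel is modeled.</p>

```cpp
#include <functional>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical completion-guarantee tags (only the two needed here).
struct unspecified_completion_t {};
struct synchronous_completion_t {};

// Hypothetical query: a sender opts in with a nested `completion` alias;
// senders that say nothing are treated as unspecified.
template <class S, class = void>
struct completion_of { using type = unspecified_completion_t; };
template <class S>
struct completion_of<S, std::void_t<typename S::completion>> {
    using type = typename S::completion;
};

// Toy queue standing in for some other execution context.
inline std::vector<std::function<void()>> pending;
inline int heap_allocations = 0;  // counts allocations made by submit()

// Drains the queue, completing any deferred operations.
inline void run_pending() {
    auto work = std::move(pending);
    pending.clear();
    for (auto& task : work) task();
}

// Completes inside start(), and says so.
struct just_int {
    using completion = synchronous_completion_t;
    int value;
    template <class R> struct op {
        int value; R receiver;
        void start() { receiver.set_value(value); }
    };
    template <class R> op<R> connect(R r) const { return op<R>{value, std::move(r)}; }
};

// Completes later, from run_pending(); makes no completion guarantee.
struct deferred_int {
    int value;
    template <class R> struct op {
        int value; R receiver;
        void start() { pending.push_back([this] { receiver.set_value(value); }); }
    };
    template <class R> op<R> connect(R r) const { return op<R>{value, std::move(r)}; }
};

// Heap-allocated holder that keeps the operation state alive until completion.
template <class S, class R>
struct heap_state {
    struct wrapped {
        heap_state* self;
        void set_value(int v) {
            R r = std::move(self->receiver);
            delete self;  // the operation has finished; release its storage
            r.set_value(v);
        }
    };
    using op_t = decltype(std::declval<S&>().connect(std::declval<wrapped>()));
    R receiver;
    op_t op;
    heap_state(S s, R r) : receiver(std::move(r)), op(s.connect(wrapped{this})) {}
};

template <class S, class R>
void submit(S s, R r) {
    using guarantee = typename completion_of<S>::type;
    if constexpr (std::is_same_v<guarantee, synchronous_completion_t>) {
        // The receiver is signaled before start() returns, so the
        // operation state can live on this stack frame.
        auto op = s.connect(std::move(r));
        op.start();
    } else {
        // The operation may outlive this call, so its state goes on the heap.
        ++heap_allocations;
        (new heap_state<S, R>(std::move(s), std::move(r)))->op.start();
    }
}

// Minimal receiver that records the value it is sent.
struct store_into {
    int* out;
    void set_value(int v) { *out = v; }
};
```

<p>In this sketch, submitting <code>just_int</code> leaves <code>heap_allocations</code> untouched, while submitting <code>deferred_int</code> allocates once and delivers its value only when <code>run_pending()</code> is called.</p>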
<p>Consider for instance the algorithm <code>just()</code> mentioned previously. This is probably not what most would consider a &quot;blocking&quot; algorithm -- in fact, that description is in disagreement with the standard's definition of &quot;blocking&quot; -- but the only language provided by <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0443r14.html">P0443</a> to describe it would be as &quot;always blocking&quot;. The same goes for inline schedulers and inline executors.</p>
<p>As a consequence, we think senders should first and foremost be described by their <em>completion guarantees</em>. Taking the language of the blocking property and turning it around, more or less, we'd have the following possibilities for guarantees a sender type might make:</p>
<ul>
<li><code>unspecified_completion_t</code>, from the prior description <code>possibly_blocking_t</code>, guaranteeing nothing about when or where a connected receiver's completion-signal operations may occur;</li>
<li><code>asynchronous_completion_t</code>, from the prior description <code>never_blocking_t</code>, guaranteeing that a connected receiver's completion-signal operations will not occur on the calling thread before <code>execution::start()</code> returns, but not prohibiting them from occurring on another thread prior to, concurrently with, or after return from <code>execution::start()</code>;</li>
<li>and <code>synchronous_completion_t</code>, from the prior description <code>always_blocking_t</code>, guaranteeing that a connected receiver's completion-signal operations will occur before <code>execution::start()</code> returns, but not guaranteeing on which thread they occur -- specifically, they need not occur on the thread calling <code>execution::start()</code>.</li>
</ul>
<p>We can also strengthen the requirement for <code>synchronous_completion_t</code> to obtain another possibly useful guarantee:</p>
<ul>
<li><code>inlined_completion_t</code>, guaranteeing that a connected receiver's completion-signal operations will occur before <code>execution::start()</code> returns, and on the thread calling <code>execution::start()</code>.</li>
</ul>
<p>The default assumption in generic code would be <code>unspecified_completion_t</code> when interfacing with a sender type that does not customize this property.</p>
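<p>One way the completion guarantees and their default could be modeled is sketched here. The tag names come from the lists above; the nested <code>completion</code> alias, the <code>completion_of_t</code> trait, the <code>completes_before_start_returns</code> helper, and the three example sender types are hypothetical stand-ins for whatever query mechanism is eventually adopted. Expressing <code>inlined_completion_t</code> as a type derived from <code>synchronous_completion_t</code> is likewise only one possible encoding of "strengthening".</p>

```cpp
#include <type_traits>

// Tag types named after the guarantees listed above.
struct unspecified_completion_t {};
struct asynchronous_completion_t {};
struct synchronous_completion_t {};
// Assumption: "inlined" strengthens "synchronous", modeled by inheritance.
struct inlined_completion_t : synchronous_completion_t {};

// Hypothetical query: read a nested `completion` alias when the sender
// provides one, and fall back to unspecified_completion_t otherwise.
template <class S, class = void>
struct completion_of { using type = unspecified_completion_t; };
template <class S>
struct completion_of<S, std::void_t<typename S::completion>> {
    using type = typename S::completion;
};
template <class S>
using completion_of_t = typename completion_of<S>::type;

// True when the receiver is guaranteed to be signaled before start() returns,
// which holds for both the synchronous and the inlined guarantee.
template <class S>
inline constexpr bool completes_before_start_returns =
    std::is_base_of_v<synchronous_completion_t, completion_of_t<S>>;

// Hypothetical senders, reduced to the property they advertise.
struct just_like_sender { using completion = inlined_completion_t; };
struct pool_sender { using completion = asynchronous_completion_t; };
struct silent_sender {};  // customizes nothing

static_assert(completes_before_start_returns<just_like_sender>);
static_assert(!completes_before_start_returns<pool_sender>);
static_assert(std::is_same_v<completion_of_t<silent_sender>, unspecified_completion_t>);
```

<p>With this encoding, generic code that only needs "completes before <code>start()</code> returns" accepts either of the two stronger guarantees without naming them separately.</p>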
<p>Now, we mentioned earlier how blocking should be a description of <em>operations</em> rather than senders, so we suggest something like the blocking property be redesigned to describe operation states.</p>
<p>Before moving on, recall the definition of blocking provided in <a href="http://eel.is/c++draft/defns.block">defns.block</a>:</p>
<blockquote>
<p>⟨execution⟩ wait for some condition (other than for the implementation to execute the execution steps of the thread of execution) to be satisfied before continuing execution past the blocking operation</p>
</blockquote>
<p>Note that we do not think this should be a property of senders, because senders <em>do not</em> comprise the whole picture of an asynchronous operation, and <em>do not</em> have visibility into the work performed underneath a call to <code>execution::set_value()</code>. Likewise, receivers, which represent the completions of (possibly intermediate) asynchronous operations, have no visibility into the upstream computations of the senders they are connected to. Both of these facts are a <em>good thing</em> for the design! But it does mean that both halves matter equally when determining whether a fully composed operation is blocking.</p>
<p>This is all to say, in the general case we do not know whether an operation is blocking until both sender and receiver are connected. Indeed, we believe it is necessary for information about blocking to back-propagate from receiver to sender at the time of connecting one to the other, and then forward again to any code requiring knowledge of it, exposed through the returned operation state. This is appropriate since a given thread of execution ought to care mostly about the behavior of <code>execution::start()</code>. Also, the required property queries could be performed with the same customization point, and we suggest that the name <code>get_blocking</code> from <a href="http://open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2220r0.pdf">P2220</a> be retained for that purpose.</p>
<p>Moreover, the above description intuitively corresponds to the distinction between schedulers, composed lazily with senders and receivers, and executors, used eagerly. The language of blocking, appropriately redefined to describe operations, could even be used to recover the description of blocking for executors.</p>
<p>We also have the following rough descriptions for blocking properties redefined for operation states, adapted from the wording in <a href="http://open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2220r0.pdf">P2220</a>:</p>
<ul>
<li><p><code>possibly_blocking_t</code>, guaranteeing nothing about invocation of an operation state's <code>start()</code> customization: execution may or may not block pending some condition external to the steps of the thread of execution when invoking <code>execution::start()</code>;</p></li>
<li><p><code>never_blocking_t</code>, guaranteeing that execution shall not block pending any condition external to the steps of the thread of execution when invoking <code>execution::start()</code>;</p></li>
<li><p>and <code>always_blocking_t</code>, guaranteeing that execution shall block pending some condition external to the steps of the thread of execution when invoking <code>execution::start()</code>.</p></li>
</ul>
<p>Like that for the completion property, the default assumption in generic code would be <code>possibly_blocking_t</code> when interfacing with an operation that does not customize this property.</p>
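<p>The back-propagation of blocking information from receiver to operation state can be sketched for the simplest case, a <code>just()</code>-like sender that completes inline. The nested <code>blocking</code> alias, the <code>blocking_of_t</code> trait, and the types <code>just_sender</code>, <code>just_operation</code>, <code>store_receiver</code>, and <code>opaque_receiver</code> are hypothetical. The rule that such an operation blocks exactly when its receiver does is an assumption of this sketch, not proposed wording.</p>

```cpp
#include <type_traits>
#include <utility>

// Tag types named after the blocking guarantees listed above.
struct possibly_blocking_t {};
struct never_blocking_t {};
struct always_blocking_t {};

// Hypothetical query: read a nested `blocking` alias when present,
// and fall back to possibly_blocking_t otherwise.
template <class T, class = void>
struct blocking_of { using type = possibly_blocking_t; };
template <class T>
struct blocking_of<T, std::void_t<typename T::blocking>> {
    using type = typename T::blocking;
};
template <class T>
using blocking_of_t = typename blocking_of<T>::type;

// Operation state of a just()-like sender. start() does nothing except
// signal the receiver, so (by assumption) the operation inherits whatever
// the receiver says about the blocking behavior of its completion.
template <class R>
struct just_operation {
    using blocking = blocking_of_t<R>;  // back-propagated at connect time
    int value;
    R receiver;
    void start() { receiver.set_value(value); }
};

struct just_sender {
    int value;
    template <class R>
    just_operation<R> connect(R r) const { return {value, std::move(r)}; }
};

// A receiver that promises its completion never blocks.
struct store_receiver {
    using blocking = never_blocking_t;
    int* out;
    void set_value(int v) { *out = v; }
};

// A receiver that says nothing about blocking.
struct opaque_receiver {
    void set_value(int) {}
};

static_assert(std::is_same_v<
    blocking_of_t<just_operation<store_receiver>>, never_blocking_t>);
static_assert(std::is_same_v<
    blocking_of_t<just_operation<opaque_receiver>>, possibly_blocking_t>);
```

<p>The same sender thus yields operation states with different blocking guarantees depending on the receiver it is connected to, which is why the sketch attaches the property to the operation state rather than to the sender.</p>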
<p>Disregarding for the moment the categories <code>unspecified_completion_t</code> and <code>possibly_blocking_t</code>, we think there are four meaningful combinations of completion guarantees for senders and blocking guarantees of the operation states they produce when connected to a receiver. These are, with brief concrete examples of each:</p>
<ul>
<li><code>asynchronous_completion_t</code> / <code>never_blocking_t</code>, such as work enqueued to a background thread pool that completes in a resident thread (assuming the enqueue performed in <code>start()</code> can be implemented in a non-blocking manner);</li>
<li><code>synchronous_completion_t</code> / <code>always_blocking_t</code>, such as fork/join parallelism synchronously waited on for completion (consider <code>bulk_schedule</code> to a thread pool or GPU resource followed by <code>sync_wait()</code>);</li>
<li><code>inlined_completion_t</code> / <code>always_blocking_t</code>, such as a write to or read from a network socket configured in blocking mode, resulting in the number of bytes transferred with no additional dependent work;</li>
<li>and <code>inlined_completion_t</code> / <code>never_blocking_t</code>, such as a write to or read from a network socket configured in non-blocking mode, resulting in the number of bytes transferred with no additional dependent work.</li>
</ul>
<p>Note how the language of blocking as currently specified for <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0443r14.html">P0443</a> is insufficient to distinguish between all of the above examples.</p>
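<p>The loss of information can be made concrete with a small compile-time model. The enumerations, the <code>guarantees</code> struct, and the <code>collapse()</code> function are hypothetical. The mapping in <code>collapse()</code> is our reading of how a single blocking property would have to label each case, following the correspondence between the completion guarantees and the prior descriptions given earlier.</p>

```cpp
// Hypothetical encodings of the two proposed properties.
enum class completion { asynchronous, synchronous, inlined };
enum class blocking { never, always };

// The single property available when only "blocking" describes a sender.
enum class single_blocking { never, always };

struct guarantees {
    completion c;
    blocking b;
};

// Assumed collapse: anything that completes before start() returns can only
// be described as "always blocking"; everything else as "never blocking".
constexpr single_blocking collapse(guarantees g) {
    return g.c == completion::asynchronous ? single_blocking::never
                                           : single_blocking::always;
}

// The four combinations from the list above.
constexpr guarantees thread_pool_work{completion::asynchronous, blocking::never};
constexpr guarantees fork_join_wait{completion::synchronous, blocking::always};
constexpr guarantees blocking_socket_io{completion::inlined, blocking::always};
constexpr guarantees nonblocking_socket_io{completion::inlined, blocking::never};

// Three distinct combinations receive the same single-property label.
static_assert(collapse(fork_join_wait) == collapse(blocking_socket_io));
static_assert(collapse(blocking_socket_io) == collapse(nonblocking_socket_io));
static_assert(collapse(thread_pool_work) != collapse(nonblocking_socket_io));
```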
<p>It is worth emphasizing the last two examples above. When applying senders and receivers to future designs of fundamental I/O abstractions, the blocking property allows us to express that asynchrony is not required to guarantee non-blocking operation. It would be unfortunate to not have the requisite vocabulary to describe this fact.</p>
<h2>Impact</h2>
<p>One significant benefit obtained with the above design is a disentangling of the concerns around blocking operations and completion guarantees. They are properly orthogonal, and our suggested direction reflects this. We also think this is a direction that's more harmonious with the language's current descriptions of blocking and concurrency. Moreover, when applying senders and receivers to the implementation of latency-sensitive and safety-critical applications, it may be paramount to afford generic code comprising execution contexts and runtime systems a deep understanding of the work being scheduled and executed. Reflecting these properties in the primitive layers of a design, in a way that's consistent with their fundamental mode of operation, can let us achieve this.</p>
<h2>Limitations and Open Questions</h2>
<p>It is still unclear how forward progress guarantees (specifically, the concurrent, parallel, and weakly parallel guarantees described in the standard) fit into this picture. We believe further research is needed in this direction, along with an appropriate description for execution allowances along the lines of sequenced, parallel, parallel-unsequenced, and unsequenced.</p>
<p>The completion properties for senders described above assume that all of the receiver completion channels are of equal status. This may not be appropriate, and it could be desirable to instead focus on the value channel, specifically, allowing senders wide discretion in choosing when and where the error and/or cancellation channels are signaled, without compromising their completion guarantee. For example, can an operation that initiates a truly asynchronous computation still claim &quot;completes asynchronously&quot; if sometimes it must call <code>execution::set_error()</code> or <code>execution::set_done()</code> on the initiating thread? We believe so, but the wording will have to be specified carefully to avoid confusion.</p>
