<!DOCTYPE html>
    <html>
    <head>
        <meta http-equiv="Content-type" content="text/html;charset=UTF-8">
        <style>
/*--------------------------------------------------------------------------------------------- * Copyright (c) Microsoft Corporation. All rights reserved. * Licensed under the MIT License. See License.txt in the project root for license information. *--------------------------------------------------------------------------------------------*/ body { font-family: "Segoe WPC", "Segoe UI", "SFUIText-Light", "HelveticaNeue-Light", sans-serif, "Droid Sans Fallback"; font-size: 14px; padding: 0 12px; line-height: 22px; word-wrap: break-word; } body.scrollBeyondLastLine { margin-bottom: calc(100vh - 22px); } body.showEditorSelection .code-line { position: relative; } body.showEditorSelection .code-active-line:before, body.showEditorSelection .code-line:hover:before { content: ""; display: block; position: absolute; top: 0; left: -12px; height: 100%; } body.showEditorSelection li.code-active-line:before, body.showEditorSelection li.code-line:hover:before { left: -30px; } .vscode-light.showEditorSelection .code-active-line:before { border-left: 3px solid rgba(0, 0, 0, 0.15); } .vscode-light.showEditorSelection .code-line:hover:before { border-left: 3px solid rgba(0, 0, 0, 0.40); } .vscode-dark.showEditorSelection .code-active-line:before { border-left: 3px solid rgba(255, 255, 255, 0.4); } .vscode-dark.showEditorSelection .code-line:hover:before { border-left: 3px solid rgba(255, 255, 255, 0.60); } .vscode-high-contrast.showEditorSelection .code-active-line:before { border-left: 3px solid rgba(255, 160, 0, 0.7); } .vscode-high-contrast.showEditorSelection .code-line:hover:before { border-left: 3px solid rgba(255, 160, 0, 1); } img { max-width: 100%; max-height: 100%; } a { color: #4080D0; text-decoration: none; } a:focus, input:focus, select:focus, textarea:focus { outline: 1px solid -webkit-focus-ring-color; outline-offset: -1px; } hr { border: 0; height: 2px; border-bottom: 2px solid; } h1 { padding-bottom: 0.3em; line-height: 1.2; border-bottom-width: 1px; 
border-bottom-style: solid; } h1, h2, h3 { font-weight: normal; } h1 code, h2 code, h3 code, h4 code, h5 code, h6 code { font-size: inherit; line-height: auto; } a:hover { color: #4080D0; text-decoration: underline; } table { border-collapse: collapse; } table > thead > tr > th { text-align: left; border-bottom: 1px solid; } table > thead > tr > th, table > thead > tr > td, table > tbody > tr > th, table > tbody > tr > td { padding: 5px 10px; } table > tbody > tr + tr > td { border-top: 1px solid; } blockquote { margin: 0 7px 0 5px; padding: 0 16px 0 10px; border-left: 5px solid; } code { font-family: Menlo, Monaco, Consolas, "Droid Sans Mono", "Courier New", monospace, "Droid Sans Fallback"; font-size: 14px; line-height: 19px; } body.wordWrap pre { white-space: pre-wrap; } .mac code { font-size: 12px; line-height: 18px; } code > div { padding: 16px; border-radius: 3px; overflow: auto; } /** Theming */ .vscode-light { color: rgb(30, 30, 30); } .vscode-dark { color: #DDD; } .vscode-high-contrast { color: white; } .vscode-light code { color: #A31515; } .vscode-dark code { color: #D7BA7D; } .vscode-light code > div { background-color: rgba(220, 220, 220, 0.4); } .vscode-dark code > div { background-color: rgba(10, 10, 10, 0.4); } .vscode-high-contrast code > div { background-color: rgb(0, 0, 0); } .vscode-high-contrast h1 { border-color: rgb(0, 0, 0); } .vscode-light table > thead > tr > th { border-color: rgba(0, 0, 0, 0.69); } .vscode-dark table > thead > tr > th { border-color: rgba(255, 255, 255, 0.69); } .vscode-light h1, .vscode-light hr, .vscode-light table > tbody > tr + tr > td { border-color: rgba(0, 0, 0, 0.18); } .vscode-dark h1, .vscode-dark hr, .vscode-dark table > tbody > tr + tr > td { border-color: rgba(255, 255, 255, 0.18); } .vscode-light blockquote, .vscode-dark blockquote { background: rgba(127, 127, 127, 0.1); border-color: rgba(0, 122, 204, 0.5); } .vscode-high-contrast blockquote { background: transparent; border-color: #fff; }
</style>
<style>
/* Tomorrow Theme */ /* http://jmblog.github.com/color-themes-for-google-code-highlightjs */ /* Original theme - https://github.com/chriskempson/tomorrow-theme */ /* Tomorrow Comment */ .hljs-comment, .hljs-quote { color: #8e908c; } /* Tomorrow Red */ .hljs-variable, .hljs-template-variable, .hljs-tag, .hljs-name, .hljs-selector-id, .hljs-selector-class, .hljs-regexp, .hljs-deletion { color: #c82829; } /* Tomorrow Orange */ .hljs-number, .hljs-built_in, .hljs-builtin-name, .hljs-literal, .hljs-type, .hljs-params, .hljs-meta, .hljs-link { color: #f5871f; } /* Tomorrow Yellow */ .hljs-attribute { color: #eab700; } /* Tomorrow Green */ .hljs-string, .hljs-symbol, .hljs-bullet, .hljs-addition { color: #718c00; } /* Tomorrow Blue */ .hljs-title, .hljs-section { color: #4271ae; } /* Tomorrow Purple */ .hljs-keyword, .hljs-selector-tag { color: #8959a8; } .hljs { display: block; overflow-x: auto; color: #4d4d4c; padding: 0.5em; } .hljs-emphasis { font-style: italic; } .hljs-strong { font-weight: bold; }
</style>
<style>
ul.contains-task-list { padding-left: 0; } ul ul.contains-task-list { padding-left: 40px; } .task-list-item { list-style-type: none; } .task-list-item-checkbox { vertical-align: middle; }
</style>
        <style>
            body {
                font-family: -apple-system, BlinkMacSystemFont, 'Segoe WPC', 'Segoe UI', 'HelveticaNeue-Light', 'Ubuntu', 'Droid Sans', sans-serif;
                font-size: 14px;
                line-height: 1.6;
            }
        </style>
    </head>
    <body>
        <table>
<thead>
<tr>
<th>Document Number:</th>
<th>P0978R0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Date:</td>
<td>2018-03-31</td>
</tr>
<tr>
<td>Audience:</td>
<td>Evolution</td>
</tr>
<tr>
<td>Revises:</td>
<td>none</td>
</tr>
<tr>
<td>Reply to:</td>
<td>Gor Nishanov (gorn@microsoft.com)</td>
</tr>
</tbody>
</table>
<h1 id="a-response-to-p0973r0-coroutines-ts-use-cases-and-design-issues">A Response to &quot;P0973r0: Coroutines TS Use Cases and Design Issues&quot;</h1>
<h2 id="introduction-a-refintroa">Introduction <a ref="#intro"></a></h2>
<p>A coroutine is a generalization of a function that, in addition to the usual control flow operations such as call and return, can suspend its own execution and yield control back to the caller, with the ability to resume execution at a later time. In C++, coroutines were explicitly designed to efficiently and succinctly support the following use patterns:</p>
<ul>
<li>asynchronous tasks, where <code>co_await &lt;expr&gt;</code> suspends a coroutine while waiting for the result of an expression, and the coroutine is resumed once the result is available;</li>
<li>generators, where <code>co_yield &lt;expr&gt;</code> suspends the coroutine yielding the result of the expression to the consumer and the coroutine is resumed once the consumer asks for the next value;</li>
<li>asynchronous streams, which can be thought of as an asynchronous version of a generator where both <code>co_await</code> and <code>co_yield</code> can be used.</li>
</ul>
<p>Unlike most other languages that support coroutines, C++ coroutines are open and not tied to any particular runtime or generator type and allow libraries to imbue coroutines with meaning, whereas the compiler is responsible solely for efficient transformation of a function to a state machine that is the foundation of the coroutine.</p>
<p>Because C++ coroutines are open in nature and their semantics are provided by the library, they can be applied to some non-traditional use cases, such as automatic error propagation of <code>expected&lt;T,E&gt;</code>, which happens to be a major use case at Google.</p>
<p>The P0973r0 paper identifies a number of issues that make coroutines in their current form sub-optimal for exception-less error propagation. We agree with some of the issues (for example, the awkwardness of the <code>co_return</code> and <code>co_await</code> keywords in that scenario), but we categorically, absolutely, emphatically, vociferously object to the notion that coroutines violate the zero-overhead principle.
Before diving into details of why we believe that P0973r0 is mistaken on this issue, let's go through the areas of agreement (or mostly agreement).</p>
<!--This fundamental mistake significantly undermines the strength of arguments made and conclusions reached in P0973r0.
-->
<h2 id="const-reference-parameters-are-dangerous-a-refreferencesa">Const reference parameters are dangerous <a ref="#references"></a></h2>
<p>We agree with P0973r0's point here and would go even further. Any reference or raw pointer is dangerous if it survives and is used after the lifetime of the object it refers to has ended.</p>
<p>While this is a hazard in C++ in general, it is a particularly likely hazard in asynchronous scenarios. A mitigation is available in the coroutine design itself.</p>
<p>Though standard coroutine types are unlikely to be that drastic, custom coroutine types used in a particular codebase by a particular company can choose to ban reference and pointer arguments to a coroutine, with an escape hatch such as some version of a <code>std::ref</code> wrapper for cases where a developer is sure that the use of a reference is safe. It only requires a little bit of template meta-programming and a <code>static_assert</code> when defining a coroutine type.</p>
<p>Note that the banning is done by the library defining the semantics of the coroutine. It will be a compile time error to declare a parameter that is deemed unsafe by the coroutine type designer. No coding guidelines or static analyzers are required. The code won't compile if a parameter of unacceptable type is declared in a coroutine.</p>
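<p>A minimal sketch of such a check follows. All names here (<code>is_safe_coroutine_param</code>, <code>all_params_safe</code>, <code>checked_task_promise</code>) are hypothetical and not part of the Coroutines TS; the sketch assumes the coroutine type designer routes the coroutine's parameter types into the promise type, for example through a <code>coroutine_traits</code> specialization.</p>

```cpp
#include <functional>
#include <type_traits>

// Hypothetical trait: a parameter type is considered safe unless it is a
// reference or a raw pointer. std::reference_wrapper<T> (what std::ref
// returns) is an object type passed by value, so it passes the check and
// serves as the explicit escape hatch.
template <typename T>
struct is_safe_coroutine_param
    : std::integral_constant<bool, !std::is_reference<T>::value &&
                                   !std::is_pointer<T>::value> {};

// True only if every type in Args... is safe (true for an empty list).
template <typename... Args>
struct all_params_safe : std::true_type {};

template <typename First, typename... Rest>
struct all_params_safe<First, Rest...>
    : std::integral_constant<bool, is_safe_coroutine_param<First>::value &&
                                   all_params_safe<Rest...>::value> {};

// Hypothetical promise type for a coroutine returning task<T> and taking
// Args.... Assumption: a coroutine_traits specialization for the library's
// task type names this class as its promise_type, so the static_assert
// fires whenever a coroutine with an unsafe parameter is declared.
template <typename T, typename... Args>
struct checked_task_promise {
    static_assert(all_params_safe<Args...>::value,
                  "reference and raw pointer parameters are banned in this "
                  "coroutine type; wrap the argument with std::ref if its "
                  "lifetime is known to be safe");
    // ... the usual promise members would go here ...
};
```

<p>With this in place, instantiating <code>checked_task_promise&lt;int, const int&amp;&gt;</code> fails to compile, while <code>checked_task_promise&lt;int, std::reference_wrapper&lt;int&gt;&gt;</code> is accepted.</p>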
<h2 id="banning-return-is-user-hostile-and-make-migration-difficult-a-refreturna">Banning return is user-hostile, and makes migration difficult <a ref="#return"></a></h2>
<p>We agree with the authors' points. There is no technical reason why coroutines cannot use <code>return</code> in place of <code>co_return</code>.</p>
<p>If the committee so wishes, we can revisit the issue and explore alternatives that address this concern. For example:</p>
<ul>
<li><code>return</code> / <code>co_return</code> are interchangeable in coroutines</li>
<li><code>co_return</code> is gone completely (warning: breaking change)</li>
<li><code>return</code> can only be used in <em>&quot;non-suspending&quot;</em> coroutines, such as the ones used with <code>expected&lt;T,E&gt;</code>, whereas traditional coroutines will still be required to use <code>co_return</code>.</li>
</ul>
<h2 id="the-name-coawait-privileges-a-single-use-case-a-refawaita">The name co_await privileges a single use case <a ref="#await"></a></h2>
<p>Indeed, the names of the keywords <code>co_await</code> and <code>co_yield</code> bake in certain expectations of what semantics the library should provide, with <code>co_await</code> implying waiting for some value to get into the coroutine and <code>co_yield</code> implying pushing some value out of the coroutine.</p>
<p>Even though <code>co_yield</code> can be implemented purely in terms of <code>co_await</code>, we chose to have a dedicated <code>co_yield</code> keyword in anticipation of enabling return type deduction in coroutines (N4499, [dcl.spec.auto]/16). These keywords, used alone or together, nicely cover the intended design space:</p>
<pre class="hljs"><code><div><span class="hljs-comment">// lambda return type deduces to generator&lt;int&gt; (C++20)</span>
[] { <span class="hljs-keyword">for</span> (<span class="hljs-keyword">int</span> i = <span class="hljs-number">0</span>; i &lt; <span class="hljs-number">10</span>; ++i) co_yield i; } 

<span class="hljs-comment">// function return type deduces to task&lt;double&gt; (C++20)</span>
<span class="hljs-function"><span class="hljs-keyword">auto</span> <span class="hljs-title">f</span><span class="hljs-params">()</span> </span>{ <span class="hljs-function">co_await <span class="hljs-title">foo</span><span class="hljs-params">()</span></span>; co_return <span class="hljs-number">3.14</span>; }

<span class="hljs-comment">// lambda return type deduces to async_stream&lt;size_t&gt; (post C++20)</span>
[] { <span class="hljs-keyword">for</span> (;;) {
       <span class="hljs-keyword">size_t</span> val = co_await read_async(); 
       co_yield val;
     }
};
</div></code></pre>
<p>Replacing meaningful <code>await</code> and <code>yield</code> keywords with some semantic-less keyword or symbol may make coroutines less readable, less user-friendly and is likely to make automatic type deduction in coroutines difficult if not impossible.</p>
<!--
I agree that for error-propagation scenarios the use of `co_await` and `co_return` is sub-optimal and more aesthetically pleasing syntax is desirable. Whether such syntax is tied to coroutines (as in functions that can suspend and resume) or it is a distinct facility is subject to debate. -->
<h2 id="constexpr-is-not-supported-a-refconstexpra">constexpr is not supported <a ref="#constexpr"></a></h2>
<p>We intentionally kept the scope of the coroutine design <em>relatively</em> small but sufficient to cover the design space, with the expectation that as major compiler vendors implement the feature, become familiar with it, and gain a better understanding of related issues, we will be better equipped to evolve coroutines in the future. <code>constexpr</code> coroutines were one of the things that were cut to keep the design and implementation manageable.</p>
<p>While we do not want to rush <code>constexpr</code> in coroutines in general, we could consider <code>constexpr</code> for <em>non-suspending</em> coroutines if that is critical for Google's use cases.</p>
<p>In earlier discussions with Richard Smith, we discussed how to make it easier for the compiler to deal with cases like <code>expected&lt;T,E&gt;</code>, where no actual suspension and resumption happens, by allowing the library writer to indicate that to the compiler. For example, as a strawman, by specializing a trait:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">template</span> &lt;<span class="hljs-keyword">typename</span> T, <span class="hljs-keyword">typename</span> E&gt;
<span class="hljs-keyword">struct</span> is_non_suspending&lt;expected&lt;T,E&gt;&gt;: true_type {};
</div></code></pre>
<h2 id="coawait-chains-awkwardly-a-refawait-compositiona">co_await chains awkwardly <a ref="#await-composition"></a></h2>
<p>We believe that precedence for <code>co_await</code> allows excellent composition properties for intended usage scenarios.</p>
<p>This holds when composing awaitables prior to applying operator <code>co_await</code>, as is the case when altering the executor used for resumption or requesting a different error handling mode:</p>
<pre class="hljs"><code><div>                      <span class="hljs-keyword">int</span> v = co_await s.async_read().on_executor(e);
expected&lt;<span class="hljs-keyword">int</span>, error_code&gt; v = co_await s.async_read().as_expected_ec();
</div></code></pre>
<p>when sending the result of the <code>co_await</code> to standard algorithms via view composition:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">int</span> v = co_await when_all(s1.async_read(), s2.async_read())
        | reduce_exceptions()
        | accumulate(<span class="hljs-number">0</span>);  
</div></code></pre>
<p>and even when combining all of them together in one expression:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">int</span> v = co_await when_all(s1.async_read(), s2.async_read())
                         .on_executor(e)
        | reduce_exceptions()
        | accumulate(<span class="hljs-number">0</span>);  
</div></code></pre>
<p>Coming back to the issue of using parentheses when mixing postfix and prefix operators raised in P0973r0:</p>
<p>Yes. Mixing postfix and prefix operators requires wrapping an expression in parentheses or splitting it out onto a separate line.</p>
<pre class="hljs"><code><div><span class="hljs-keyword">int</span> size = (co_await GetString(some_complicated_expression)).size();

<span class="hljs-comment">// or</span>

<span class="hljs-keyword">const</span> <span class="hljs-keyword">auto</span> str = co_await GetString(some_complicated_expression);
<span class="hljs-comment">// use str.size() or assign it to a variable</span>
</div></code></pre>
<p>Given that <code>co_await</code> currently composes cleanly with await transformers and view composition operators, having to occasionally add extra parentheses or split out an <em>await-expression</em> for clarity seems like a non-issue.</p>
<p>Moreover, in the particular case shown by the authors, it is unlikely that the <em>awaitable</em> returned from <code>GetString</code> would have anything but <code>await_ready</code>, <code>await_suspend</code>, and <code>await_resume</code> members defined, so an attempt to mistakenly extract <code>size()</code> from the <em>awaitable</em> will be immediately caught by the compiler with a helpful <em>Fix-it</em> explaining how to fix it.</p>
<h2 id="the-library-bindings-are-massive-and-yet-inflexible">The library bindings are massive and yet inflexible</h2>
<p>While adjectives characterizing the flexibility and girth of coroutine library bindings are subjective, the fact is that the Coroutines TS carefully balanced the library bindings in favor of readability and ease of use over the beauty of the core wording.</p>
<p>Coroutine bindings are usually only a few lines of code and easy to specify. In fact, they are so compact that the Coroutines TS shows an entire generator implementation as an example.</p>
<p>Let us consider the other points raised by the authors:</p>
<blockquote>
<p>the library extension points are ... keyed off of the function signature
... This effectively disallows per-function customization.</p>
</blockquote>
<p>This is correct. Per-function customization was intentionally cut from the design to keep it small, and that decision has proven to be correct: in four years and with thousands of users, not a single customer has asked for per-function customization.</p>
<p>A particular form of per-function customization that was considered and cut was:</p>
<pre class="hljs"><code><div>task&lt;T&gt; foo() <span class="hljs-keyword">using</span>(different-coroutine-trait-than-<span class="hljs-keyword">default</span>-one) {
  ...
}
</div></code></pre>
<p>where the <em>using-traits-clause</em> would appear only in the function definition.</p>
<blockquote>
<p>implicit core/library coupling inside
<code>coroutine_handle</code> will mean that many kinds of implementation changes will require NxM coordination between vendors of compilers and standard libraries.</p>
</blockquote>
<p>If one looks at the standard library headers of one's favorite library vendor, one will discover that many library facilities are implemented using compiler intrinsics, such as <code>__builtin_launder</code>, <code>__builtin_addressof</code>, <code>__builtin_nan</code>, <code>__is_abstract</code>, <code>__is_union</code>, <code>__is_class</code> or <code>__is_final</code>, to name a few. Adding 3-4 more intrinsics to a list of a hundred does not make the existing situation substantially different. MSVC, Clang and GCC do attempt to harmonize their intrinsics to be able to compile each other's standard libraries. The <code>&lt;coroutine&gt;</code> header is no different.</p>
<p>Unlike the actual coroutine transformation, where the implementation may vary dramatically between vendors, the <code>coroutine_handle</code> type in question is a tiny wrapper around a <code>void*</code> and there is not much room to go wild. At the moment MSVC and Clang differ in only one intrinsic, but we (MSVC) plan to match Clang before C++20.</p>
<table>
<thead>
<tr>
<th>Member</th>
<th>libcxx</th>
<th>MSVC</th>
</tr>
</thead>
<tbody>
<tr>
<td>resume()</td>
<td>__builtin_coro_resume</td>
<td>_coro_resume</td>
</tr>
<tr>
<td>destroy()</td>
<td>__builtin_coro_destroy</td>
<td>_coro_destroy</td>
</tr>
<tr>
<td>done()</td>
<td>__builtin_coro_done</td>
<td>_coro_done</td>
</tr>
<tr>
<td>promise()</td>
<td>__builtin_coro_promise</td>
<td>N/A</td>
</tr>
</tbody>
</table>
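<p>To illustrate how thin that wrapper is, here is a minimal sketch. It is not the libcxx or MSVC implementation: <code>mini_handle</code>, <code>frame_header</code> and the <code>sketch_coro_*</code> functions are hypothetical stand-ins for <code>coroutine_handle</code> and the intrinsics in the table, and the frame layout (a resume and a destroy function pointer at the start of the frame, with a null resume pointer meaning "done") is an assumption made for illustration only.</p>

```cpp
#include <cassert>

// Assumed frame layout (illustration only): every frame starts with two
// function pointers that the "intrinsics" dispatch through.
struct frame_header {
    void (*resume_fn)(void*);
    void (*destroy_fn)(void*);
};

// Hypothetical stand-ins for the compiler intrinsics listed in the table.
inline void sketch_coro_resume(void* frame) {
    static_cast<frame_header*>(frame)->resume_fn(frame);
}
inline void sketch_coro_destroy(void* frame) {
    static_cast<frame_header*>(frame)->destroy_fn(frame);
}
inline bool sketch_coro_done(void* frame) {
    return static_cast<frame_header*>(frame)->resume_fn == nullptr;
}

// The handle itself: a single pointer plus forwarding members.
class mini_handle {
    void* ptr_ = nullptr;
public:
    static mini_handle from_address(void* p) { mini_handle h; h.ptr_ = p; return h; }
    void* address() const { return ptr_; }
    explicit operator bool() const { return ptr_ != nullptr; }
    void resume() const  { assert(ptr_); sketch_coro_resume(ptr_); }
    void destroy() const { assert(ptr_); sketch_coro_destroy(ptr_); }
    bool done() const    { assert(ptr_); return sketch_coro_done(ptr_); }
};
static_assert(sizeof(mini_handle) == sizeof(void*), "handle is pointer-sized");

// A hand-written "frame" standing in for what a compiler would generate.
struct counter_frame {
    frame_header header;
    int steps_left;
    bool destroyed;
};
inline void counter_resume(void* p) {
    auto* f = static_cast<counter_frame*>(p);
    if (--f->steps_left == 0) f->header.resume_fn = nullptr;  // mark as done
}
inline void counter_destroy(void* p) {
    static_cast<counter_frame*>(p)->destroyed = true;
}

// Drives the frame to completion through the handle; returns the number of
// resumptions, or -1 if destroy() did not reach the frame.
inline int run_counter(int steps) {
    assert(steps > 0);
    counter_frame f{{counter_resume, counter_destroy}, steps, false};
    mini_handle h = mini_handle::from_address(&f);
    int resumes = 0;
    while (!h.done()) { h.resume(); ++resumes; }
    h.destroy();
    return f.destroyed ? resumes : -1;
}
```

<p>Everything vendor-specific lives behind the three forwarding calls; the wrapper has no other state or logic that would need coordinating.</p>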
<p>Now let's proceed to the most controversial claim of P0973r0 with which we disagree violently and emphatically!</p>
<h2 id="implicit-allocation-violates-the-zero-overhead-principle-a-refzeroa">Implicit allocation violates the zero-overhead principle <a ref="#zero"></a></h2>
<h3 id="brief-recap-of-rationale-for-the-c-coroutine-design-a-refdesigna">Brief recap of rationale for the C++ Coroutine design <a ref="#design"></a></h3>
<p>The design of C++ coroutines was a delicate balancing act, weighing different concerns against each other, that led to a decision to rely on elidable implicit memory allocation for the coroutine frame by default while giving a coroutine designer an option to override the default allocation if desired.</p>
<p>We prioritized having zero-overhead coroutines with light-weight syntax out of the box over losing some control over allocation of the coroutine frame to a compiler. Note that in regular functions, developers have zero control over how activation frames are allocated.</p>
<p>Zero-overhead out of the box meant that we wanted an experience where developers do not have to pack their entire logic into a single coroutine out of fear that breaking it out into smaller coroutines for readability and good hygiene will impose overhead. We also categorically did not want to force them to deal with custom allocators for simple tasks like breaking one coroutine into smaller pieces. The following has to work efficiently out of the box, without requiring users to tinker with custom allocators and without restrictions on how many nested coroutines can be called in this fashion (without recursion).</p>
<pre class="hljs"><code><div>task&lt;&gt; big_task() { <span class="hljs-comment">// lifetime of `subtask` is fully enclosed in its caller</span>
  ...
  <span class="hljs-function">co_await <span class="hljs-title">subtask</span><span class="hljs-params">()</span></span>; <span class="hljs-comment">// coroutine frame allocation elided</span>
  ...
}
</div></code></pre>
<p>We also wanted to have generators that are as efficient as any other way of expressing a range of lazily produced values:</p>
<pre class="hljs"><code><div>generator&lt;<span class="hljs-keyword">int</span>&gt; range(<span class="hljs-keyword">int</span> from, <span class="hljs-keyword">int</span> to) {
  <span class="hljs-keyword">for</span> (<span class="hljs-keyword">int</span> i = from; i &lt; to; ++i)
    co_yield i;
}
<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">main</span><span class="hljs-params">()</span> </span>{ <span class="hljs-comment">// lifetime of `range` is fully enclosed in its caller</span>
  <span class="hljs-keyword">auto</span> s = range(<span class="hljs-number">1</span>, <span class="hljs-number">10</span>); <span class="hljs-comment">// coroutine frame allocation elided</span>
  <span class="hljs-keyword">return</span> <span class="hljs-built_in">std</span>::accumulate(s.begin(), s.end(), <span class="hljs-number">0</span>);
}
</div></code></pre>
<p>The requirements for this optimization to occur are that the coroutine type has RAII semantics, that some members of the coroutine type (i.e. <code>generator</code> or <code>task</code>) are available for inlining, and that the lifetime of the coroutine does not escape its caller (see P0981r0 for a more detailed exposition). Here are examples where there <strong>will be</strong> an allocation of memory (and the coroutine designer can control which allocator is used if desired).</p>
<pre class="hljs"><code><div>task&lt;&gt; session(tcp::socket s, <span class="hljs-keyword">size_t</span> block_size) { ... }

task&lt;&gt; server(io_context&amp; io, tcp::endpoint <span class="hljs-keyword">const</span>&amp; endpoint, <span class="hljs-keyword">size_t</span> block_size) {
    tcp::<span class="hljs-function">acceptor <span class="hljs-title">acceptor</span><span class="hljs-params">(io, endpoint)</span></span>;
    acceptor.listen();
    <span class="hljs-keyword">for</span> (;;) <span class="hljs-comment">// coroutine `session` escapes `server`. Will require allocation</span>
        spawn(io, session(co_await async_accept(acceptor), block_size));
}
</div></code></pre>
<p>Similarly, if you start moving generators around, they will need allocation:</p>
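<pre class="hljs"><code><div><span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">main</span><span class="hljs-params">()</span> </span>{ <span class="hljs-comment">// coroutine `seq` escapes its caller</span>
  <span class="hljs-keyword">auto</span> gen = seq();
  <span class="hljs-comment">// do something with gen</span>
  do_more_work(<span class="hljs-built_in">std</span>::move(gen)); <span class="hljs-comment">// escapes, `seq` will require allocation</span>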
}
</div></code></pre>
<p>Now that we gave a brief overview of the rationale behind the decision to rely on implicit elidable allocation of the coroutine frame, let's proceed to the concerns expressed in P0973r0.</p>
<!--
Further in the paper, I will be referring to this elidable implicit allocation of the coroutine frame as HALO (derived from Heap Allocation eLision Optimization). Pronounced hay-low.
ooo 1.8.2. Concerns about HALO from D0978R0
-->
<h3 id="concerns-about-implicit-elidable-coroutine-frame-allocation-a-refconcernsa">Concerns about implicit elidable coroutine frame allocation <a ref="#concerns"></a></h3>
<p>Let us briefly restate the authors' points in condensed form (please see the original paper [P0973r0] for full context):</p>
<ul>
<li>not clear it is feasible in general; Clang, for example, only does it when the coroutine is available in the same translation unit</li>
<li>even if eventually this optimization will be reliable, programmers won't trust it</li>
<li>at odds with design philosophy of C++</li>
<li>overloading of operator new is obscure and not flexible enough</li>
<li>and besides, allocation is not needed at all for expected&lt;T,E&gt;</li>
</ul>
<p>Let's start with analyzing the first point:</p>
<blockquote>
<ul>
<li>not clear it is feasible in general; Clang, for example, only does it when the coroutine is available in the same translation unit</li>
</ul>
</blockquote>
<!--
The second part of that statement is obviously untrue and could be verified with a few lines of code in your favorite online compiler (https://godbolt.org/g/6qRsCT look at codegen for `main`).

```c++
generator<int> range(int from, int to) {
  // comment out the next line to see what happens when coroutine is inlined.
  a();a();a();a();a();a();a();a();a(); // added to prevent inlining
  for (int i = from; i < to; ++i)
    co_yield i;
}
int main() {
  auto s = range(1, 10);
  return std::accumulate(s.begin(), s.end(), 0);
}
```
-->
<p>Indeed, the current implementation of coroutines in Clang only runs the heap elision optimization if the coroutine is defined in the same translation unit as its user. However, this is purely a temporary situation, and removing the limitation requires only a little more investment in the compiler.</p>
<p>Moreover, Richard Smith, Chandler Carruth and Gor Nishanov, on one rainy evening of 2014, sketched out how this optimization would work across hard ABI boundaries where peeking at the body of the coroutine is impossible.</p>
<p>With respect to the feasibility of this optimization in general, if there is <strong>reasonable doubt</strong> that this optimization is feasible for the designed use cases, we absolutely need to stop work on the Coroutines TS and look for alternatives! <strong>Update:</strong> At the end of the Jacksonville 2018 meeting, Google and Microsoft compiler engineers met and wrote a joint statement on the feasibility of heap allocation elision optimizations (see: P0981R0: &quot;Halo: Coroutine Heap Allocation eLision Optimization: the joint response&quot;) where the feasibility of this optimization was reaffirmed.</p>
<p>Elidable implicit allocation of the coroutine frame is the foundational point in the design. This is what gives coroutines light-weight syntax combined with zero overhead. If it does not work for the targeted use cases, we need to rethink our approach to coroutines. So far, Clang and MSVC implementors are in agreement about feasibility. GCC has not tried implementing coroutines yet, but given that Clang and MSVC think that they can do it, GCC likely could as well.</p>
<blockquote>
<ul>
<li>even if eventually this optimization will be reliable, programmers won't trust it</li>
</ul>
</blockquote>
<p>The authors present developers' hesitance to return large objects by value, due to distrust in copy elision, as an analogy for why developers may distrust implicit elidable frame allocation. This analogy does not work with coroutines.</p>
<p>With copy elision, developers have a relatively straightforward alternative to returning by value, namely, adding a reference parameter. With coroutines, there is no easy alternative.</p>
<p>Coroutines address such a dire need that developers are grabbing raw compiler bits of an incomplete, not-yet-standard feature and starting to use it, because the alternatives to coroutines are significantly more verbose and/or less readable and maintainable. This seems to match the view of the authors of P0973r0 as well, who concur in their introduction that coroutines address such a dire need that their developers will be grabbing non-standardized bits too: &quot;we believe that coroutines have the potential to solve several problems for C++ programmers at Google, and moreover those problems are likely serious enough, and the potential solution good enough, to justify the risk of adopting coroutines prior to full standardization.&quot;</p>
<p>Unlike copy elision, where there is an easy alternative, coroutines are so badly needed that it is much less likely that they will go unused because of mistrust in the compiler.</p>
<blockquote>
<ul>
<li>at odds with design philosophy of C++</li>
</ul>
</blockquote>
<p>Please see the earlier section, &quot;Brief recap of rationale for the C++ Coroutine design&quot;, where the philosophy behind the coroutine design was explained and which, in our view, is fully in line with the design philosophy of C++.
This view is shared by many experts including the creator of C++ Bjarne Stroustrup.</p>
<blockquote>
<ul>
<li>overloading of operator new is obscure and not flexible enough</li>
</ul>
</blockquote>
<p>As a core language feature, coroutines rely on core language facilities, such as overloading of <code>operator new</code>, to control allocations. On the library side, coroutine designers working on user-facing library types can choose to expose customization points for allocation in the ways traditional for libraries, namely, with the full richness of <code>std::allocator</code> and its friends.</p>
<p>In the draft version of P0973r0, the authors claimed that it is impossible to store coroutines with small state on the stack and those with larger state on the heap. In the latest revision, the authors state that the Coroutines TS does not give the coroutine designer the tools to allow stack-like allocation for nested generators.</p>
<p>While we understand that the authors have these concerns, we are happy to report that in both cases the limiting factor is not the Coroutines TS customization points, but a lack of imagination on the part of the coroutine designer. Both are possible with rather straightforward code.</p>
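<p>As an illustration of the first case (small states in caller-provided storage, larger states on the heap), here is a minimal sketch of the allocation logic. It is written as free functions so that it compiles without coroutine support; the assumption is that in a real coroutine type the same logic would live in the promise type's <code>operator new</code> and <code>operator delete</code> overloads, with the storage passed as a coroutine parameter. The names <code>coro_storage</code>, <code>frame_allocate</code>, <code>frame_deallocate</code> and <code>frame_goes_to_heap</code> are hypothetical, and the caller's buffer is assumed to be suitably aligned for the frame.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Caller-provided storage, type-erased so users need not write templates.
struct coro_storage {
    unsigned char* mem;
    std::size_t size;
};

// Would be promise_type::operator new(size_t, coro_storage, ...).
// One extra byte past the frame records where the frame ended up.
inline void* frame_allocate(std::size_t sz, coro_storage s) {
    if (sz + 1 <= s.size) {
        s.mem[sz] = 0;  // flag: frame lives in the caller's buffer
        return s.mem;
    }
    auto* mem = static_cast<unsigned char*>(::operator new(sz + 1));
    mem[sz] = 1;        // flag: frame lives on the heap
    return mem;
}

// Would be promise_type::operator delete(void*, size_t). Returns whether
// heap memory was released, purely so the sketch can be observed; the real
// overload returns void.
inline bool frame_deallocate(void* mem, std::size_t sz) {
    const bool on_heap = static_cast<unsigned char*>(mem)[sz] != 0;
    if (on_heap) ::operator delete(mem);
    return on_heap;
}

// Demonstration helper: does a frame of `frame_size` bytes fit into a
// 32-byte caller buffer, or does it fall back to the heap?
inline bool frame_goes_to_heap(std::size_t frame_size) {
    alignas(std::max_align_t) unsigned char buffer[32];
    coro_storage s{buffer, sizeof buffer};
    void* frame = frame_allocate(frame_size, s);
    assert(frame != nullptr);
    return frame_deallocate(frame, frame_size);
}
```

<p>With a 32-byte buffer, a 16-byte frame stays in the caller's storage and a 64-byte frame falls back to the heap; the user of the coroutine only has to pass the buffer along.</p>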
<!--
Authors also claim:
> For example, users might
want to avoid the allocation and accept the cost of copying the coroutine state around on the
stack instead. Or they might want to split the difference by allocating for large states, but
keeping small states on the stack. The Coroutines TS does not give the user the tools to do this.

That is obviously untrue and verifiable with just a few lines of code:

```c++
// type-erased array of bytes as not to force template use on the user
struct coro_storage {
    byte* mem;
    size_t size;

    template <size_t N>
    coro_storage(array<byte,N>& arr)
        : mem(arr.begin()), size(arr.size()) {}
};

// write coroutine as
coro g(coro_storage) {
    puts("Hello");
    co_return;
}

// use coroutine as
int main() { 
    array<byte, 32> store_coroutine_here;
    g(store_coroutine_here); // no allocations if fits
}
```

To enable the snippet above to work, you only need to overload `operator new` and `operator delete` in the definition of the coroutine promise. It is rather simple to write:

```c++
template <typename... Whatever>
void *operator new(size_t sz, coro_storage s, Whatever const&...) {
  if (sz + 1 <= s.size) {
    s.mem[sz] = {0}; // did not allocate
    return s.mem;
  }
  auto *mem = (byte*)::operator new(sz + 1);
  mem[sz] = {1}; // did allocate
  return mem;
}
      
void operator delete(void* mem, size_t sz) {
  if (static_cast<byte*>(mem)[sz] != byte{0}) // was allocated
    ::operator delete(mem);
}
```
Full example is available here: https://godbolt.org/g/PiKYg2.
-->
<!--
In regular functions, when users do not want to consume too much stack memory, they use classes like `llvm::SmallVector<T,N>` that would use stack memory if the vector is small and switch to dynamic allocation if the vector grows too big.

The same technique is fully applicable to coroutines, if a coroutine author wants to keep the coroutine frame small, they would need to be frugal with how much automatic storage they consume. -->
<blockquote>
<ul>
<li>and besides, allocation is not needed at all for expected&lt;T,E&gt;</li>
</ul>
</blockquote>
<p>We are in full agreement with the authors of P0973 on this subject. <strong>Moreover</strong>, efficient use of coroutines with <code>expected&lt;T,E&gt;</code> <strong>does not rely on the heap elision optimization at all!</strong> We rely only on the Coroutines TS's blessing to skip the allocation when it is not needed.</p>
<p>In LLVM, there is a very simple coroutine optimization called &quot;suspend point simplification and elimination&quot;, which checks whether suspend points can be simplified and removed. It looks for cases where the expansion of a suspend point would lead to purely local control flow (continue execution or jump to the coroutine end); if so, the suspend point is removed and replaced with normal local control flow within the function. If <strong>all</strong> suspend points are eliminated from the coroutine due to simplification or unreachability, all of the coroutine-ness is stripped out and the coroutine becomes a normal function. No allocations and no coroutine transformations are required.</p>
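<p>The following hand-written sketch illustrates the control flow that remains once every suspend point has been simplified; it is not compiler output. It assumes a minimal stand-in type <code>simple_expected</code> and a hypothetical <code>expected_awaiter</code>, neither of which comes from the Coroutines TS or from any library. Each conceptual <code>co_await</code> either continues execution or jumps to the end of the function, so nothing is left to suspend and no frame has to be allocated.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical minimal stand-in for expected<T,E> (illustration only).
template <typename T, typename E>
struct simple_expected {
    bool has_value;
    T value;
    E error;
};

using result = simple_expected<int, std::string>;

result parse_digit(char c) {
    if (c >= '0' && c <= '9') return result{true, c - '0', ""};
    return result{false, 0, "not a digit"};
}

// Hypothetical awaiter: ready exactly when the expected holds a value.
struct expected_awaiter {
    const result& e;
    bool await_ready() const { return e.has_value; }
    int await_resume() const { return e.value; }
};

// Conceptually this is the coroutine
//     int a = co_await parse_digit(x);
//     int b = co_await parse_digit(y);
//     co_return a + b;
// after each suspend point has been reduced to a local branch.
result add_digits(char x, char y) {
    result rx = parse_digit(x);
    expected_awaiter ax{rx};
    if (!ax.await_ready()) return result{false, 0, rx.error};  // jump to the coroutine end
    int a = ax.await_resume();                                  // continue execution

    result ry = parse_digit(y);
    expected_awaiter ay{ry};
    if (!ay.await_ready()) return result{false, 0, ry.error};
    int b = ay.await_resume();

    return result{true, a + b, ""};
}
```

<p>Because both branches of every conceptual suspend point are local, the function above is an ordinary function with early returns, which is the shape the optimization aims for.</p>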
<p>As mentioned earlier, when discussing constexpr, we don't have to make the optimizer work that hard to turn a coroutine back into a function if we can explain to the compiler that coroutines using <code>expected&lt;T,E&gt;</code> are not coroutines at all and can be dealt with completely by the frontend of the compiler.</p>
<p>Now, while it is flattering that coroutines are so flexible and efficient that they can be applied to things which are not coroutines at all, this discussion makes one think that using coroutines for expansion of <code>expected&lt;T,E&gt;</code> is perhaps not a good thing for C++ in the long term. Working on making exceptions more acceptable to developers, and finding ways to evolve exceptions to deal with cases where <code>expected&lt;T,E&gt;</code> is preferable today, could be a more rewarding long-term goal.</p>
<h2 id="conclusion-a-refconclusiona">Conclusion <a ref="#conclusion"></a></h2>
<p>We thank the authors of P0973r0 for taking the time to try out coroutines and write a concerns paper. We agree with some of the concerns, believe that some are not relevant to the Coroutines TS because a solution is available via library bindings, and believe that the others can be addressed with small non-breaking improvements, if desired, to improve support for non-exceptional error propagation via <code>expected&lt;T,E&gt;</code>.</p>
<!--
P0973r0 conclusion that coroutines are not well suited for generator and `expected<T,E>` use cases is based primarily on the alleged unfeasibilty and untrustworsiness of elidable implicit allocation of the coroutine frame and inability to control allocations.
-->
<p>As we understand it, the P0973r0 paper was written partially to defend and contrast an alternative coroutine design to be presented in Rapperswil. Without seeing the alternative design, it is difficult to evaluate whether it addresses the concerns fully without introducing its own set of problems. This document provides responses to the reported issues. While we provide a vigorous defense of the current TS, we do not preclude the possibility that some of the ideas from the not-yet-revealed alternative design can be beneficially added to the existing TS to address some of the expressed concerns.</p>
<!-- The demonstrated unfamiliarity with coroutine allocation customization techniques, optimizations used to implement coroutines in Clang compiler and obviously untrue statements made in "Concerns about implicit elidable coroutine frame allocation" section of P0973r0 undermine the credibility of the paper's conclusion.-->
<!-- Softer version: 
‘we understand the authors are concerned about X and Y, and are happy to report that experience in practice shows X and Y are not actually problems in practice as demonstrated by <countexamples>’
-->
<!--
With respect to other concerns, we can explore small incremental improvements that can help the users to smooth out the transition from macros to coroutines to propagate errors from `expected<T,E>` by, for example, addressing keyword concerns and (possibly) lifting `constexpr` restriction for some coroutine types if it is absolutely critical for adoption.

These seem to be rather minor non-breaking improvements and they should not delay the merge of the Coroutines TS into the working paper for C++20.
-->
<h2 id="acknowledgements">Acknowledgements</h2>
<p>Thanks to Bjarne Stroustrup, Geoffrey Romer, Casey Carter, and many others for feedback on previous drafts of this paper.</p>
<h2 id="references-a-refreferencesa">References: <a ref="#references"></a></h2>
<ul>
<li>P0973R0: &quot;Coroutines TS Use Cases and Design Issues&quot; shared on reflector (https://wg21.link/P0973r0)</li>
<li>N4499: http://open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4499.pdf</li>
<li>Coroutines TS: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/n4723.pdf</li>
<li>Coroutines using allocator example: https://godbolt.org/g/QjRTEt</li>
<li>Coroutines storing themselves in <code>array&lt;byte,N&gt;</code>: https://godbolt.org/g/PiKYg2</li>
<li>P0981R0: &quot;Halo: Coroutine Heap Allocation eLision Optimization: the joint response&quot; (https://wg21.link/P0981r0)</li>
</ul>

    </body>
    </html>