<!DOCTYPE html>
    <html>
    <head>
        <meta http-equiv="Content-type" content="text/html;charset=UTF-8">
        <style>
/*--------------------------------------------------------------------------------------------- * Copyright (c) Microsoft Corporation. All rights reserved. * Licensed under the MIT License. See License.txt in the project root for license information. *--------------------------------------------------------------------------------------------*/ body { font-family: "Segoe WPC", "Segoe UI", "SFUIText-Light", "HelveticaNeue-Light", sans-serif, "Droid Sans Fallback"; font-size: 14px; padding: 0 12px; line-height: 22px; word-wrap: break-word; } body.scrollBeyondLastLine { margin-bottom: calc(100vh - 22px); } body.showEditorSelection .code-line { position: relative; } body.showEditorSelection .code-active-line:before, body.showEditorSelection .code-line:hover:before { content: ""; display: block; position: absolute; top: 0; left: -12px; height: 100%; } body.showEditorSelection li.code-active-line:before, body.showEditorSelection li.code-line:hover:before { left: -30px; } .vscode-light.showEditorSelection .code-active-line:before { border-left: 3px solid rgba(0, 0, 0, 0.15); } .vscode-light.showEditorSelection .code-line:hover:before { border-left: 3px solid rgba(0, 0, 0, 0.40); } .vscode-dark.showEditorSelection .code-active-line:before { border-left: 3px solid rgba(255, 255, 255, 0.4); } .vscode-dark.showEditorSelection .code-line:hover:before { border-left: 3px solid rgba(255, 255, 255, 0.60); } .vscode-high-contrast.showEditorSelection .code-active-line:before { border-left: 3px solid rgba(255, 160, 0, 0.7); } .vscode-high-contrast.showEditorSelection .code-line:hover:before { border-left: 3px solid rgba(255, 160, 0, 1); } img { max-width: 100%; max-height: 100%; } a { color: #4080D0; text-decoration: none; } a:focus, input:focus, select:focus, textarea:focus { outline: 1px solid -webkit-focus-ring-color; outline-offset: -1px; } hr { border: 0; height: 2px; border-bottom: 2px solid; } h1 { padding-bottom: 0.3em; line-height: 1.2; border-bottom-width: 1px; 
border-bottom-style: solid; } h1, h2, h3 { font-weight: normal; } h1 code, h2 code, h3 code, h4 code, h5 code, h6 code { font-size: inherit; line-height: auto; } a:hover { color: #4080D0; text-decoration: underline; } table { border-collapse: collapse; } table > thead > tr > th { text-align: left; border-bottom: 1px solid; } table > thead > tr > th, table > thead > tr > td, table > tbody > tr > th, table > tbody > tr > td { padding: 5px 10px; } table > tbody > tr + tr > td { border-top: 1px solid; } blockquote { margin: 0 7px 0 5px; padding: 0 16px 0 10px; border-left: 5px solid; } code { font-family: Menlo, Monaco, Consolas, "Droid Sans Mono", "Courier New", monospace, "Droid Sans Fallback"; font-size: 14px; line-height: 19px; } body.wordWrap pre { white-space: pre-wrap; } .mac code { font-size: 12px; line-height: 18px; } code > div { padding: 16px; border-radius: 3px; overflow: auto; } /** Theming */ .vscode-light { color: rgb(30, 30, 30); } .vscode-dark { color: #DDD; } .vscode-high-contrast { color: white; } .vscode-light code { color: #A31515; } .vscode-dark code { color: #D7BA7D; } .vscode-light code > div { background-color: rgba(220, 220, 220, 0.4); } .vscode-dark code > div { background-color: rgba(10, 10, 10, 0.4); } .vscode-high-contrast code > div { background-color: rgb(0, 0, 0); } .vscode-high-contrast h1 { border-color: rgb(0, 0, 0); } .vscode-light table > thead > tr > th { border-color: rgba(0, 0, 0, 0.69); } .vscode-dark table > thead > tr > th { border-color: rgba(255, 255, 255, 0.69); } .vscode-light h1, .vscode-light hr, .vscode-light table > tbody > tr + tr > td { border-color: rgba(0, 0, 0, 0.18); } .vscode-dark h1, .vscode-dark hr, .vscode-dark table > tbody > tr + tr > td { border-color: rgba(255, 255, 255, 0.18); } .vscode-light blockquote, .vscode-dark blockquote { background: rgba(127, 127, 127, 0.1); border-color: rgba(0, 122, 204, 0.5); } .vscode-high-contrast blockquote { background: transparent; border-color: #fff; }
</style>
<style>
/* Tomorrow Theme */ /* http://jmblog.github.com/color-themes-for-google-code-highlightjs */ /* Original theme - https://github.com/chriskempson/tomorrow-theme */ /* Tomorrow Comment */ .hljs-comment, .hljs-quote { color: #8e908c; } /* Tomorrow Red */ .hljs-variable, .hljs-template-variable, .hljs-tag, .hljs-name, .hljs-selector-id, .hljs-selector-class, .hljs-regexp, .hljs-deletion { color: #c82829; } /* Tomorrow Orange */ .hljs-number, .hljs-built_in, .hljs-builtin-name, .hljs-literal, .hljs-type, .hljs-params, .hljs-meta, .hljs-link { color: #f5871f; } /* Tomorrow Yellow */ .hljs-attribute { color: #eab700; } /* Tomorrow Green */ .hljs-string, .hljs-symbol, .hljs-bullet, .hljs-addition { color: #718c00; } /* Tomorrow Blue */ .hljs-title, .hljs-section { color: #4271ae; } /* Tomorrow Purple */ .hljs-keyword, .hljs-selector-tag { color: #8959a8; } .hljs { display: block; overflow-x: auto; color: #4d4d4c; padding: 0.5em; } .hljs-emphasis { font-style: italic; } .hljs-strong { font-weight: bold; }
</style>
<style>
ul.contains-task-list { padding-left: 0; } ul ul.contains-task-list { padding-left: 40px; } .task-list-item { list-style-type: none; } .task-list-item-checkbox { vertical-align: middle; }
</style>
        <style>
            body {
                font-family: -apple-system, BlinkMacSystemFont, 'Segoe WPC', 'Segoe UI', 'HelveticaNeue-Light', 'Ubuntu', 'Droid Sans', sans-serif;
                font-size: 14px;
                line-height: 1.6;
            }
        </style>
    </head>
    <body>
        <table>
<thead>
<tr>
<th>Document Number:</th>
<th>P0981R0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Date:</td>
<td>2018-03-18</td>
</tr>
<tr>
<td>Audience:</td>
<td>Evolution</td>
</tr>
<tr>
<td>Revises:</td>
<td>none</td>
</tr>
<tr>
<td>Reply to:</td>
<td>Richard Smith (richardsmith@google.com), Gor Nishanov (gorn@microsoft.com)</td>
</tr>
</tbody>
</table>
<h1 id="halo-coroutine-heap-allocation-elision-optimization-the-joint-response">Halo: coroutine Heap Allocation eLision Optimization: the joint response</h1>
<h2 id="summary">Summary</h2>
<p>During the discussion of coroutines in a Tuesday session in Jacksonville 2018, we had the following exchange:</p>
<blockquote>
<p>Richard Smith: I have concerns about the generator example. ... The optimization
relies on <code>accumulate</code> being inlinable or optimizable. ...
You need to inline at least <code>begin</code> to know that the value of the handle is not changed.</p>
</blockquote>
<blockquote>
<p>Gor Nishanov: Let's take this offline and come back with the joint response.</p>
</blockquote>
<p>This document is that joint response. It evaluates the necessary conditions for heap allocation elision
to work. After careful study, we find that:</p>
<ol>
<li>Heap allocation elision optimization DOES NOT require inlining a potentially unbounded amount of code, such as the coroutine body OR the algorithms using the coroutine (such as <code>accumulate</code> in the generator example).</li>
<li>It DOES REQUIRE inlining some compiler synthesized code (such as the coroutine ramp) and inlining a bounded amount of wrapper/glue code that gives the coroutine the desired semantics (such as <code>begin</code> in the generator example). Specifically, we need to inline (or analyze interprocedurally) sufficient code to conclude that all execution paths that create the coroutine and then leave the calling function will include a call to <code>destroy</code> on a <code>coroutine_handle</code> denoting the coroutine (typically in the destructor of the coroutine type).</li>
<li>With the implementation of coroutines in clang today, inlining the coroutine ramp function requires the coroutine to be defined in the same translation unit. We believe that alternative implementation strategies may allow this optimization to apply across ABI boundaries when the definition of the coroutine is not available in the current translation unit, but at this time have not analyzed the tradeoffs of such approaches.</li>
</ol>
<p>The rest of this document describes in detail what needs to be inlined, using generator and task examples.</p>
<h2 id="terminology">Terminology</h2>
<p><strong>Coroutine frame</strong>: the memory in which the state of the coroutine that has to be preserved across suspend points is stored. The coroutine frame can reside on the heap or on the stack, depending on whether coroutine heap allocation elision is in effect or whether the user has provided a custom allocator to override the default allocation of the coroutine state.</p>
<p><strong>User-authored coroutine body</strong>: the <em>function-body</em> of the coroutine.</p>
<p><strong>Coroutine state machine</strong>: a compiler-created transformation of the <em>user-authored coroutine body</em>.</p>
<p><strong>Coroutine ramp function</strong>: a compiler synthesized ramp/trampoline/thunk that creates an initial coroutine state and starts the coroutine state machine.</p>
<p><strong>Coroutine handle</strong>: an object of a <code>coroutine_handle</code> type that points at a coroutine state machine.</p>
<p><strong>Coroutine type</strong>: a wrapper around the coroutine handle that has the necessary bindings to give the coroutine the desired semantics. For example: a <code>generator&lt;T&gt;</code> or a <code>task&lt;T&gt;</code>.</p>
<p><strong>Coroutine escaping its caller</strong>: an operation within the caller of the coroutine that could make the address of the object of coroutine type visible outside the caller, excluding such cases where the compiler can prove that such escaping does not occur. For example: storing the address of the object of coroutine type into a global or through an externally-supplied pointer, or calling a member function on the object of coroutine type whose body is not available for examination (or is not examined due to optimizer limitations).</p>
<h2 id="case-analysis">Case analysis</h2>
<h3 id="generator-case">Generator case</h3>
<p>Let's consider the example presented in P0978R0:</p>
<pre class="hljs"><code><div>generator&lt;<span class="hljs-keyword">int</span>&gt; range(<span class="hljs-keyword">int</span> from, <span class="hljs-keyword">int</span> to) { <span class="hljs-comment">// user-authored coroutine body starts</span>
   <span class="hljs-keyword">for</span> (<span class="hljs-keyword">int</span> i = from; i &lt; to; ++i)
      co_yield i;
} <span class="hljs-comment">// user-authored coroutine body ends</span>

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">main</span><span class="hljs-params">()</span> </span>{
  <span class="hljs-keyword">auto</span> s = range(<span class="hljs-number">1</span>, <span class="hljs-number">10</span>);
  <span class="hljs-keyword">return</span> <span class="hljs-built_in">std</span>::accumulate(s.begin(), s.end(), <span class="hljs-number">0</span>);
}
</div></code></pre>
<p>Conceptually, the coroutine transformation will transform <code>range(int,int)</code> into something like this:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">struct</span> range$state-machine { ... transformed user-authored-body-is-here ... };

generator&lt;<span class="hljs-keyword">int</span>&gt; range(<span class="hljs-keyword">int</span> from, <span class="hljs-keyword">int</span> to) { <span class="hljs-comment">// coroutine ramp/thunk/trampoline</span>
   <span class="hljs-keyword">auto</span> s = <span class="hljs-keyword">new</span> range$state-machine(from, to);
   <span class="hljs-keyword">return</span> s-&gt;promise.get_return_object();
}
</div></code></pre>
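<p>The split between the ramp and the state machine can be made concrete with a hand-written model. This is a sketch under stated assumptions, not what any compiler emits: <code>RangeStateMachine</code>, <code>IntGenerator</code> and the numeric resume points are hypothetical, a plain pointer stands in for <code>coroutine_handle&lt;promise_type&gt;</code>, and the generator is assumed to start suspended, so the ramp only allocates the frame and builds the return object.</p>

```cpp
#include <cassert>

// Hypothetical hand-written stand-in for range$state-machine: everything that
// must survive a suspension (parameters, loop variable, resume point) lives here.
struct RangeStateMachine {
  int from, to;
  int i = 0;
  int current = 0;       // operand of the most recent co_yield
  int resume_point = 0;  // 0: not started, 1: suspended at co_yield, 2: finished

  RangeStateMachine(int f, int t) : from(f), to(t) {}

  bool done() const { return resume_point == 2; }

  // Runs the transformed body until the next co_yield or until it ends.
  void resume() {
    assert(!done());     // resuming a finished coroutine is undefined behavior
    if (resume_point == 0)
      i = from;          // for (int i = from; ...
    else
      ++i;               // ...; ++i) after resuming from co_yield
    if (i < to) {
      current = i;       // co_yield i;
      resume_point = 1;
    } else {
      resume_point = 2;  // fell off the end of the body
    }
  }
};

// Minimal owning wrapper standing in for generator<int>.
struct IntGenerator {
  RangeStateMachine* frame;
  explicit IntGenerator(RangeStateMachine* f) : frame(f) {}
  IntGenerator(IntGenerator&& rhs) noexcept : frame(rhs.frame) { rhs.frame = nullptr; }
  IntGenerator(const IntGenerator&) = delete;
  IntGenerator& operator=(const IntGenerator&) = delete;
  ~IntGenerator() { delete frame; }  // plays the role of h.destroy()
};

// The ramp: allocate the frame and hand back the return object.
// The loop from the user-authored body does not run here.
IntGenerator range(int from, int to) {
  return IntGenerator(new RangeStateMachine(from, to));
}
```

<p>The loop only starts on the first <code>resume()</code>; the ramp itself is an allocation plus the construction of the return object, which is why it is small enough to inline.</p>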
<p>For the coroutine heap allocation elision optimization to work, the following need to be available
for inlining:</p>
<ul>
<li>the coroutine ramp function</li>
<li><code>get_return_object()</code></li>
<li><code>generator&lt;int&gt;::begin()</code></li>
<li>the constructor and move constructor of <code>generator&lt;int&gt;</code></li>
<li>the destructor of <code>generator&lt;int&gt;</code></li>
<li><code>coroutine_handle&lt;&gt;::destroy</code> (either a compiler builtin or inlinable to a compiler builtin)</li>
</ul>
<p>We DO NOT need to inline the body of the coroutine or the algorithm <code>accumulate</code>,
unless the <code>iterator</code> type retains a pointer or reference to the <code>generator&lt;int&gt;</code>
object.</p>
<p>Note that authors of coroutine types should be aware of this point: at least one
blogger has described an implementation of <code>generator&lt;T&gt;::iterator</code> that holds a
<code>generator&lt;T&gt;*</code>. However, every implementation of <code>generator&lt;T&gt;</code> we can find in the
wild holds a <code>coroutine_handle&lt;Promise&gt;</code> instead, which avoids the problem, and
allows heap allocation elision without inlining <code>accumulate</code>.</p>
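<p>The iterator point can be sketched with a simplified model in which a raw <code>Frame*</code> stands in for <code>coroutine_handle&lt;Promise&gt;</code>; <code>Frame</code>, <code>Generator</code> and <code>range</code> are hypothetical names. Because the iterator stores a copy of the frame pointer rather than a <code>Generator*</code>, calling <code>std::accumulate(s.begin(), s.end(), 0)</code> never passes the address of <code>s</code> to the algorithm. With a <code>Generator*</code> inside the iterator, the optimizer would have to inline or otherwise analyze <code>accumulate</code> to rule out the generator escaping.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>

// Hypothetical frame for range(from, to); stands in for the coroutine frame.
struct Frame {
  int next, to;
  int current = 0;    // operand of the most recent co_yield
  bool done = false;

  void resume() {     // run to the next co_yield, or finish the body
    assert(!done);
    if (next < to)
      current = next++;
    else
      done = true;
  }
};

class Generator {
 public:
  struct iterator {
    using iterator_category = std::input_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = const int*;
    using reference = const int&;

    Frame* h_copy;    // copy of the handle, never a Generator*; nullptr means end()

    reference operator*() const { return h_copy->current; }
    iterator& operator++() {
      h_copy->resume();
      if (h_copy->done) h_copy = nullptr;
      return *this;
    }
    bool operator==(const iterator& rhs) const { return h_copy == rhs.h_copy; }
    bool operator!=(const iterator& rhs) const { return h_copy != rhs.h_copy; }
  };

  explicit Generator(Frame* f) : h(f) {}
  Generator(Generator&& rhs) noexcept : h(rhs.h) { rhs.h = nullptr; }
  Generator(const Generator&) = delete;
  Generator& operator=(const Generator&) = delete;
  ~Generator() { delete h; }

  // Resumes once, as a typical generator's begin() does; a body that
  // finishes immediately maps to end().
  iterator begin() {
    if (h) {
      h->resume();
      if (h->done) return end();
    }
    return {h};
  }
  iterator end() { return {nullptr}; }

 private:
  Frame* h;
};

// Stand-in for the ramp of range(from, to).
Generator range(int from, int to) { return Generator(new Frame{from, to}); }
```

<p>Usage mirrors <code>main</code> above: <code>auto s = range(1, 10);</code> followed by <code>std::accumulate(s.begin(), s.end(), 0)</code>, which sums 1 through 9 and yields 45.</p>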
<p>To give a feel for the size of the functions that need to be inlined, here is a typical generator implementation:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">struct</span> generator {
  <span class="hljs-keyword">struct</span> promise_type { ... };

  <span class="hljs-keyword">struct</span> iterator {
    ...
    coroutine_handle&lt;promise_type&gt; h_copy;
  };
  ...
  coroutine_handle&lt;promise_type&gt; h; <span class="hljs-comment">// wrapper around void* trivial to construct/destruct</span>
};
</div></code></pre>
<p>Constructors that need to be inlinable:</p>
<pre class="hljs"><code><div>generator(coroutine_handle&lt;promise_type&gt; h) : h(h) {}
generator(generator&amp;&amp; rhs) : h(rhs.h) { rhs.h = <span class="hljs-literal">nullptr</span>; }
</div></code></pre>
<p>The destructor must be inlinable:</p>
<pre class="hljs"><code><div>~generator() { <span class="hljs-keyword">if</span> (h) h.destroy(); }
</div></code></pre>
<p><code>get_return_object</code> must be inlinable:</p>
<pre class="hljs"><code><div>generator promise_type::get_return_object() {
  <span class="hljs-keyword">return</span> {coroutine_handle&lt;promise_type&gt;::from_promise(*<span class="hljs-keyword">this</span>)};
}
</div></code></pre>
<p><code>begin()</code> must be inlinable:</p>
<pre class="hljs"><code><div><span class="hljs-function">iterator <span class="hljs-title">begin</span><span class="hljs-params">()</span> </span>{
  <span class="hljs-keyword">if</span> (h) h.resume();
  <span class="hljs-keyword">return</span> {h};
}
</div></code></pre>
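<p>Together, the move constructor and destructor above establish the key condition from the summary: every path that creates the coroutine and then leaves the caller reaches a call to <code>destroy</code>. The sketch below checks that property on a model; <code>Frame</code>, <code>Gen</code>, <code>ramp</code>, <code>caller</code> and the <code>live_frames</code> counter are hypothetical instrumentation, and a plain pointer stands in for the coroutine handle. The counter returns to zero after a move followed by either a normal return or an exception, which is the fact an optimizer has to prove (by inlining these small functions) before it may place the frame in the caller's storage.</p>

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

int live_frames = 0;  // hypothetical instrumentation: frames not yet destroyed

struct Frame {
  Frame() { ++live_frames; }
  ~Frame() { --live_frames; }
};

// Model of generator<int>: owns the frame through one pointer, like h.
struct Gen {
  Frame* h;
  explicit Gen(Frame* f) : h(f) {}                         // generator(coroutine_handle)
  Gen(Gen&& rhs) noexcept : h(rhs.h) { rhs.h = nullptr; }  // moved-from object owns nothing
  Gen(const Gen&) = delete;
  Gen& operator=(const Gen&) = delete;
  ~Gen() { if (h) delete h; }                              // if (h) h.destroy();
};

Gen ramp() { return Gen(new Frame); }  // stands in for the coroutine ramp

// A caller with two exits; on both, the frame must be destroyed exactly once.
int caller(bool fail) {
  Gen g = ramp();
  Gen moved = std::move(g);  // only 'moved' still owns the frame
  assert(g.h == nullptr);
  if (fail) throw std::runtime_error("early exit");
  return live_frames;        // the frame is alive while the caller runs
}
```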
<!--
https://godbolt.org/g/2FaMtz (heap allocation elision with most of the function bodies removed)

https://godbolt.org/g/6qRsCT (original)

https://godbolt.org/g/PXaJHD (task)
-->
<h3 id="generator-case-with-the-ranges-ts">Generator case with the Ranges TS</h3>
<p>Under the Ranges TS, we expect that generators will be passed by reference to algorithms.
If these algorithms extract the <code>begin</code> / <code>end</code> iterators from their ranges and then call
other functions, only the range wrapper function need be inlined. For example, given:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">template</span>&lt;InputRange Rng, <span class="hljs-keyword">class</span> Proj = identity,
         IndirectUnaryPredicate&lt;projected&lt;<span class="hljs-keyword">iterator_t</span>&lt;Rng&gt;, Proj&gt;&gt; Pred&gt;
<span class="hljs-keyword">bool</span> all_of(Rng &amp;&amp;rng, Pred pred, Proj proj = Proj{}) {
  <span class="hljs-keyword">return</span> all_of(begin(rng), end(rng), ref(pred), ref(proj));
}
</div></code></pre>
<p>Heap allocation elision optimization in <code>all_of(range(1, 5), pred)</code> only requires that
the wrapper version of <code>all_of</code> be inlined in addition to those functions identified above.
The algorithm implementation does not need to be inlined.</p>
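<p>The shape of such a wrapper can be sketched against the iterator-pair <code>std::all_of</code>. <code>all_of_range</code> is a hypothetical name, and this simplified sketch drops the Ranges TS concepts and the projection. The range object is used only to obtain its iterators; everything after that sees iterators alone, so inlining this one-line wrapper is enough to see that the range object's address goes no further.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>

// Hypothetical range overload in the style of the Ranges TS all_of above.
// rng is touched only to obtain its begin/end iterators; the algorithm
// itself receives iterators and a reference to the predicate.
template <class Rng, class Pred>
bool all_of_range(Rng&& rng, Pred pred) {
  using std::begin;
  using std::end;
  return std::all_of(begin(rng), end(rng), std::ref(pred));
}
```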
<h3 id="task-example">Task example</h3>
<p>Let's consider the <code>big_task</code> example from P0978R0, with one small addition: a call to an executor (because the question was raised as to whether a coroutine posted to execute on an executor can have its allocation elided).</p>
<pre class="hljs"><code><div>task&lt;&gt; subtask(Executor ex) {
  ...
  <span class="hljs-function">co_await <span class="hljs-title">execute_on</span><span class="hljs-params">(ex)</span></span>;
  ...
}

task&lt;&gt; big_task(Executor ex) {
  ...
  <span class="hljs-function">co_await <span class="hljs-title">subtask</span><span class="hljs-params">(ex)</span></span>; <span class="hljs-comment">// subtask frame allocation elided</span>
  ...
}
</div></code></pre>
<p>Similar to the generator case, only a small number of library functions need to be available for
inlining:</p>
<ul>
<li>coroutine ramp function</li>
<li><code>get_return_object()</code></li>
<li><code>await_suspend</code>/<code>await_ready</code>/<code>await_resume</code> of the task</li>
<li>constructor and move constructor of <code>task&lt;&gt;</code></li>
<li>destructor of <code>task&lt;&gt;</code></li>
<li><code>coroutine_handle&lt;&gt;::destroy</code> (either a compiler builtin or inlinable to a compiler builtin)</li>
</ul>
<p>We DO NOT need to inline:</p>
<ul>
<li>the executor's <code>execute</code> or <code>execute_on</code></li>
<li>body of <code>subtask</code> or <code>big_task</code></li>
</ul>
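<p>What the optimizer does once these conditions hold can be pictured as replacing the <code>new</code> in the subtask's ramp with storage owned by the caller; when the caller is itself a coroutine such as <code>big_task</code>, that storage ends up inside the caller's own frame. The sketch below models the rewrite by hand with placement new. <code>SubtaskFrame</code>, <code>BigTaskFrame</code>, the <code>heap_*</code>/<code>elided_*</code> functions and the allocation counter are hypothetical; a compiler performs this transformation internally, not through any source-level API.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <new>

int heap_allocations = 0;  // hypothetical instrumentation

struct SubtaskFrame {
  int state = 0;  // stand-in for the subtask's locals and resume point

  // Counts heap allocations of the frame.
  static void* operator new(std::size_t n) {
    ++heap_allocations;
    return ::operator new(n);
  }
  static void operator delete(void* p) { ::operator delete(p); }
};

// Default path: the ramp allocates the frame; destroy frees it.
SubtaskFrame* heap_ramp() { return new SubtaskFrame; }
void heap_destroy(SubtaskFrame* f) { delete f; }

// Elided path: the caller supplies suitably aligned storage, the ramp
// constructs the frame there, and destroy only runs the destructor.
SubtaskFrame* elided_ramp(void* storage) {
  assert(storage != nullptr);
  return ::new (storage) SubtaskFrame;
}
void elided_destroy(SubtaskFrame* f) { f->~SubtaskFrame(); }

// Model of big_task's frame: it reserves room for the subtask's frame.
struct BigTaskFrame {
  alignas(SubtaskFrame) unsigned char subtask_storage[sizeof(SubtaskFrame)];

  int run() {
    SubtaskFrame* f = elided_ramp(subtask_storage);
    f->state = 1;        // the subtask's body would run here
    int result = f->state;
    elided_destroy(f);   // destroy is still called; it just frees nothing
    return result;
  }
};
```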
<p>To give a feel for the size of the functions that need to be inlined, here is a typical <code>task</code> implementation:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">struct</span> task {
  <span class="hljs-keyword">struct</span> promise_type { ... };
  ...
  coroutine_handle&lt;promise_type&gt; h; <span class="hljs-comment">// wrapper around void* trival to construct/destruct</span>
};
</div></code></pre>
<p>Constructors:</p>
<pre class="hljs"><code><div>task(coroutine_handle&lt;promise_type&gt; h) : h(h) {}
task(task&amp;&amp; rhs) : h(rhs.h) { rhs.h = <span class="hljs-literal">nullptr</span>; }
</div></code></pre>
<p><code>get_return_object</code>:</p>
<pre class="hljs"><code><div>task promise_type::get_return_object() {
  <span class="hljs-keyword">return</span> {coroutine_handle&lt;promise_type&gt;::from_promise(*<span class="hljs-keyword">this</span>)};
}
</div></code></pre>
<p>Destructor:</p>
<pre class="hljs"><code><div>~task() { <span class="hljs-keyword">if</span> (h) h.destroy(); }
</div></code></pre>
<p><code>await_suspend</code>:</p>
<pre class="hljs"><code><div><span class="hljs-function"><span class="hljs-keyword">auto</span> <span class="hljs-title">await_suspend</span><span class="hljs-params">(coroutine_handle&lt;&gt; waiter)</span> </span>{
  h.promise().waiter = waiter;
  <span class="hljs-keyword">return</span> h;
}
</div></code></pre>
<p><code>await_ready</code>/<code>await_resume</code>:</p>
<pre class="hljs"><code><div><span class="hljs-function"><span class="hljs-keyword">constexpr</span> <span class="hljs-keyword">bool</span> <span class="hljs-title">await_ready</span><span class="hljs-params">()</span> <span class="hljs-keyword">const</span> <span class="hljs-keyword">noexcept</span> </span>{ <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>; }
<span class="hljs-function"><span class="hljs-keyword">constexpr</span> <span class="hljs-keyword">void</span> <span class="hljs-title">await_resume</span><span class="hljs-params">()</span> <span class="hljs-keyword">const</span> <span class="hljs-keyword">noexcept</span> </span>{}
</div></code></pre>
<h2 id="conclusions">Conclusions</h2>
<p>We find that the functions that need to be inlined for the coroutine heap allocation elision optimization
to work are tiny, and that we would want them inlined anyway, regardless of whether heap allocation
elision depends on them. We anticipate that optimizers will reliably perform the optimization on code
patterns similar to those above, but that the allocation will still be performed when the object of
coroutine type escapes or when compiling without optimization.</p>
<p>Some constraints are imposed on authors and on consumers of coroutine types.
Code should avoid unnecessarily retaining pointers or references to objects of coroutine type,
and should instead extract the information it needs from them (such as a copy of the coroutine handle)
in an inlinable wrapper. These constraints are straightforward to satisfy in the cases we have examined,
but they are novel.</p>

    </body>
    </html>