<!DOCTYPE html>
<html>
<head>
<title>D0055_BetterTogether</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<style type="text/css">
/* GitHub stylesheet for MarkdownPad (http://markdownpad.com) */
/* Author: Nicolas Hery - http://nicolashery.com */
/* Version: b13fe65ca28d2e568c6ed5d7f06581183df8f2ff */
/* Source: https://github.com/nicolahery/markdownpad-github */

/* RESET
=============================================================================*/

html, body, div, span, applet, object, iframe, h1, h2, h3, h4, h5, h6, p, blockquote, pre, a, abbr, acronym, address, big, cite, code, del, dfn, em, img, ins, kbd, q, s, samp, small, strike, strong, sub, sup, tt, var, b, u, i, center, dl, dt, dd, ol, ul, li, fieldset, form, label, legend, table, caption, tbody, tfoot, thead, tr, th, td, article, aside, canvas, details, embed, figure, figcaption, footer, header, hgroup, menu, nav, output, ruby, section, summary, time, mark, audio, video {
  margin: 0;
  padding: 0;
  border: 0;
}

/* BODY
=============================================================================*/

body {
  font-family: Helvetica, arial, freesans, clean, sans-serif;
  font-size: 14px;
  line-height: 1.6;
  color: #333;
  background-color: #fff;
  padding: 20px;
  max-width: 960px;
  margin: 0 auto;
}

body>*:first-child {
  margin-top: 0 !important;
}

body>*:last-child {
  margin-bottom: 0 !important;
}

/* BLOCKS
=============================================================================*/

p, blockquote, ul, ol, dl, table, pre {
  margin: 15px 0;
}

/* HEADERS
=============================================================================*/

h1, h2, h3, h4, h5, h6 {
  margin: 20px 0 10px;
  padding: 0;
  font-weight: bold;
  -webkit-font-smoothing: antialiased;
}

h1 tt, h1 code, h2 tt, h2 code, h3 tt, h3 code, h4 tt, h4 code, h5 tt, h5 code, h6 tt, h6 code {
  font-size: inherit;
}

h1 {
  font-size: 28px;
  color: #000;
}

h2 {
  font-size: 24px;
  border-bottom: 1px solid #ccc;
  color: #000;
}

h3 {
  font-size: 18px;
}

h4 {
  font-size: 16px;
}

h5 {
  font-size: 14px;
}

h6 {
  color: #777;
  font-size: 14px;
}

body>h2:first-child, body>h1:first-child, body>h1:first-child+h2, body>h3:first-child, body>h4:first-child, body>h5:first-child, body>h6:first-child {
  margin-top: 0;
  padding-top: 0;
}

a:first-child h1, a:first-child h2, a:first-child h3, a:first-child h4, a:first-child h5, a:first-child h6 {
  margin-top: 0;
  padding-top: 0;
}

h1+p, h2+p, h3+p, h4+p, h5+p, h6+p {
  margin-top: 10px;
}

/* LINKS
=============================================================================*/

a {
  color: #4183C4;
  text-decoration: none;
}

a:hover {
  text-decoration: underline;
}

/* LISTS
=============================================================================*/

ul, ol {
  padding-left: 30px;
}

ul li > :first-child, 
ol li > :first-child, 
ul li ul:first-of-type, 
ol li ol:first-of-type, 
ul li ol:first-of-type, 
ol li ul:first-of-type {
  margin-top: 0px;
}

ul ul, ul ol, ol ol, ol ul {
  margin-bottom: 0;
}

dl {
  padding: 0;
}

dl dt {
  font-size: 14px;
  font-weight: bold;
  font-style: italic;
  padding: 0;
  margin: 15px 0 5px;
}

dl dt:first-child {
  padding: 0;
}

dl dt>:first-child {
  margin-top: 0px;
}

dl dt>:last-child {
  margin-bottom: 0px;
}

dl dd {
  margin: 0 0 15px;
  padding: 0 15px;
}

dl dd>:first-child {
  margin-top: 0px;
}

dl dd>:last-child {
  margin-bottom: 0px;
}

/* CODE
=============================================================================*/

pre, code, tt {
  font-size: 12px;
  font-family: Consolas, "Liberation Mono", Courier, monospace;
}

code, tt {
  margin: 0 0px;
  padding: 0px 0px;
  white-space: nowrap;
  border: 1px solid #eaeaea;
  background-color: #f8f8f8;
  border-radius: 3px;
}

pre>code {
  margin: 0;
  padding: 0;
  white-space: pre;
  border: none;
  background: transparent;
}

pre {
  background-color: #f8f8f8;
  border: 1px solid #ccc;
  font-size: 13px;
  line-height: 19px;
  overflow: auto;
  padding: 6px 10px;
  border-radius: 3px;
}

pre code, pre tt {
  background-color: transparent;
  border: none;
}

kbd {
    -moz-border-bottom-colors: none;
    -moz-border-left-colors: none;
    -moz-border-right-colors: none;
    -moz-border-top-colors: none;
    background-color: #DDDDDD;
    background-image: linear-gradient(#F1F1F1, #DDDDDD);
    background-repeat: repeat-x;
    border-color: #DDDDDD #CCCCCC #CCCCCC #DDDDDD;
    border-image: none;
    border-radius: 2px 2px 2px 2px;
    border-style: solid;
    border-width: 1px;
    font-family: "Helvetica Neue",Helvetica,Arial,sans-serif;
    line-height: 10px;
    padding: 1px 4px;
}

/* QUOTES
=============================================================================*/

blockquote {
  border-left: 4px solid #DDD;
  padding: 0 15px;
  color: #777;
}

blockquote>:first-child {
  margin-top: 0px;
}

blockquote>:last-child {
  margin-bottom: 0px;
}

/* HORIZONTAL RULES
=============================================================================*/

hr {
  clear: both;
  margin: 15px 0;
  height: 0px;
  overflow: hidden;
  border: none;
  background: transparent;
  border-bottom: 4px solid #ddd;
  padding: 0;
}

/* TABLES
=============================================================================*/

table th {
  font-weight: bold;
}

table th, table td {
  border: 1px solid #ccc;
  padding: 6px 13px;
}

table tr {
  border-top: 1px solid #ccc;
  background-color: #fff;
}

table tr:nth-child(2n) {
  background-color: #f8f8f8;
}

/* IMAGES
=============================================================================*/

img {
  max-width: 100%
}
</style>
<style type="text/css">
.highlight  { background: #ffffff; }
.highlight .c { color: #999988; font-style: italic } /* Comment */
.highlight .err { color: #a61717; background-color: #e3d2d2 } /* Error */
.highlight .k { font-weight: bold } /* Keyword */
.highlight .o { font-weight: bold } /* Operator */
.highlight .cm { color: #999988; font-style: italic } /* Comment.Multiline */
.highlight .cp { color: #999999; font-weight: bold } /* Comment.Preproc */
.highlight .c1 { color: #999988; font-style: italic } /* Comment.Single */
.highlight .cs { color: #999999; font-weight: bold; font-style: italic } /* Comment.Special */
.highlight .gd { color: #000000; background-color: #ffdddd } /* Generic.Deleted */
.highlight .gd .x { color: #000000; background-color: #ffaaaa } /* Generic.Deleted.Specific */
.highlight .ge { font-style: italic } /* Generic.Emph */
.highlight .gr { color: #aa0000 } /* Generic.Error */
.highlight .gh { color: #999999 } /* Generic.Heading */
.highlight .gi { color: #000000; background-color: #ddffdd } /* Generic.Inserted */
.highlight .gi .x { color: #000000; background-color: #aaffaa } /* Generic.Inserted.Specific */
.highlight .go { color: #888888 } /* Generic.Output */
.highlight .gp { color: #555555 } /* Generic.Prompt */
.highlight .gs { font-weight: bold } /* Generic.Strong */
.highlight .gu { color: #aaaaaa } /* Generic.Subheading */
.highlight .gt { color: #aa0000 } /* Generic.Traceback */
.highlight .kc { font-weight: bold } /* Keyword.Constant */
.highlight .kd { font-weight: bold } /* Keyword.Declaration */
.highlight .kp { font-weight: bold } /* Keyword.Pseudo */
.highlight .kr { font-weight: bold } /* Keyword.Reserved */
.highlight .kt { color: #445588; font-weight: bold } /* Keyword.Type */
.highlight .m { color: #009999 } /* Literal.Number */
.highlight .s { color: #d14 } /* Literal.String */
.highlight .na { color: #008080 } /* Name.Attribute */
.highlight .nb { color: #0086B3 } /* Name.Builtin */
.highlight .nc { color: #445588; font-weight: bold } /* Name.Class */
.highlight .no { color: #008080 } /* Name.Constant */
.highlight .ni { color: #800080 } /* Name.Entity */
.highlight .ne { color: #990000; font-weight: bold } /* Name.Exception */
.highlight .nf { color: #990000; font-weight: bold } /* Name.Function */
.highlight .nn { color: #555555 } /* Name.Namespace */
.highlight .nt { color: #000080 } /* Name.Tag */
.highlight .nv { color: #008080 } /* Name.Variable */
.highlight .ow { font-weight: bold } /* Operator.Word */
.highlight .w { color: #bbbbbb } /* Text.Whitespace */
.highlight .mf { color: #009999 } /* Literal.Number.Float */
.highlight .mh { color: #009999 } /* Literal.Number.Hex */
.highlight .mi { color: #009999 } /* Literal.Number.Integer */
.highlight .mo { color: #009999 } /* Literal.Number.Oct */
.highlight .sb { color: #d14 } /* Literal.String.Backtick */
.highlight .sc { color: #d14 } /* Literal.String.Char */
.highlight .sd { color: #d14 } /* Literal.String.Doc */
.highlight .s2 { color: #d14 } /* Literal.String.Double */
.highlight .se { color: #d14 } /* Literal.String.Escape */
.highlight .sh { color: #d14 } /* Literal.String.Heredoc */
.highlight .si { color: #d14 } /* Literal.String.Interpol */
.highlight .sx { color: #d14 } /* Literal.String.Other */
.highlight .sr { color: #009926 } /* Literal.String.Regex */
.highlight .s1 { color: #d14 } /* Literal.String.Single */
.highlight .ss { color: #990073 } /* Literal.String.Symbol */
.highlight .bp { color: #999999 } /* Name.Builtin.Pseudo */
.highlight .vc { color: #008080 } /* Name.Variable.Class */
.highlight .vg { color: #008080 } /* Name.Variable.Global */
.highlight .vi { color: #008080 } /* Name.Variable.Instance */
.highlight .il { color: #009999 } /* Literal.Number.Integer.Long */
.pl-c {
    color: #969896;
}

.pl-c1,.pl-mdh,.pl-mm,.pl-mp,.pl-mr,.pl-s1 .pl-v,.pl-s3,.pl-sc,.pl-sv {
    color: #0086b3;
}

.pl-e,.pl-en {
    color: #795da3;
}

.pl-s1 .pl-s2,.pl-smi,.pl-smp,.pl-stj,.pl-vo,.pl-vpf {
    color: #333;
}

.pl-ent {
    color: #63a35c;
}

.pl-k,.pl-s,.pl-st {
    color: #a71d5d;
}

.pl-pds,.pl-s1,.pl-s1 .pl-pse .pl-s2,.pl-sr,.pl-sr .pl-cce,.pl-sr .pl-sra,.pl-sr .pl-sre,.pl-src,.pl-v {
    color: #df5000;
}

.pl-id {
    color: #b52a1d;
}

.pl-ii {
    background-color: #b52a1d;
    color: #f8f8f8;
}

.pl-sr .pl-cce {
    color: #63a35c;
    font-weight: bold;
}

.pl-ml {
    color: #693a17;
}

.pl-mh,.pl-mh .pl-en,.pl-ms {
    color: #1d3e81;
    font-weight: bold;
}

.pl-mq {
    color: #008080;
}

.pl-mi {
    color: #333;
    font-style: italic;
}

.pl-mb {
    color: #333;
    font-weight: bold;
}

.pl-md,.pl-mdhf {
    background-color: #ffecec;
    color: #bd2c00;
}

.pl-mdht,.pl-mi1 {
    background-color: #eaffea;
    color: #55a532;
}

.pl-mdr {
    color: #795da3;
    font-weight: bold;
}

.pl-mo {
    color: #1d3e81;
}
.task-list {
padding-left:10px;
margin-bottom:0;
}

.task-list li {
    margin-left: 20px;
}

.task-list-item {
list-style-type:none;
padding-left:10px;
}

.task-list-item label {
font-weight:400;
}

.task-list-item.enabled label {
cursor:pointer;
}

.task-list-item+.task-list-item {
margin-top:3px;
}

.task-list-item-checkbox {
display:inline-block;
margin-left:-20px;
margin-right:3px;
vertical-align:1px;
}
</style>
</head>
<body>
<table>
<thead>
<tr>
<th>Document Number:</th>
<th>P0055R1</th>
</tr>
</thead>
<tbody>
<tr>
<td>Date:</td>
<td>2015-09-12</td>
</tr>
<tr>
<td>Project:</td>
<td>LEWG</td>
</tr>
<tr>
<td>Revises:</td>
<td>P0055R0</td>
</tr>
<tr>
<td>Reply to:</td>
<td>gorn@microsoft.com</td>
</tr>
</tbody>
</table>
<h1 id="p0055r1-on-interactions-between-coroutines-and-networking-library">P0055R1: On Interactions Between Coroutines and Networking Library</h1>
<h2 id="introduction">Introduction</h2>
<p>The proposed Networking Library (<a href="http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2015/n4478.html">N4478</a>) uses the callback-based asynchronous model described in <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4045.pdf">N4045</a>, which is shown to have lower overhead than asynchronous I/O abstractions based on future.then (<a href="http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2015/n4399.html">N4399</a>). The overhead of the Networking Library abstractions can be made even lower if they take advantage of coroutines (<a href="http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2015/n4499.pdf">N4499</a>). This paper suggests altering the completion token transformation class templates described in N4478/[async.reqmts.async] to achieve near-zero overhead when used with coroutines. Changes in this revision clarify that the current CompletionToken model supports only one programming model efficiently, namely continuation using callbacks, whereas this proposal offers an efficient mechanism that supports both the callback and the coroutine models.</p>
<h2 id="performance-for-callbacks-and-callbacks-only">Performance for callbacks and callbacks only</h2>
<p>N4045 (Library Foundations for Asynchronous Operations) argues that std::future is a poor choice as a fundamental building block for asynchronous programming because of its inherent performance limitations, and suggests using a callback model as the foundation mechanism instead. To support both users who want callbacks and those who want future-like objects, it offers a CompletionToken model that was subsequently adopted by the Networking [P0112] and Executors [P0113] proposals.</p>
<p>We wholeheartedly agree with N4045's assessment of the performance limitations of std::future. Unfortunately, the CompletionToken model proposed in N4045 and adopted by P0112 and P0113 does not offer any efficient mechanism for consuming its APIs other than callbacks.</p>
<p>Without changes to the traits similar to the ones proposed in this paper, one has to resort to the <code>use_future</code> adapter described in P0112, which brings back the inefficiencies of the future-based programming model. Indeed, a benchmark modeled after the one described in P0162 shows that the overhead of the <code>use_future</code> mechanism makes execution nearly 50 times slower (390ns versus 7.9ns per iteration) than direct consumption of the async API via the traits mechanism described in this paper.</p>
<!-- <img src="SlowBench.png" height=200/> -->
<pre><code>// 7.9ns per iteration (as proposed)             // 390ns per iteration (using use_future)

std::future&lt;void&gt; loop() {                    |   std::future&lt;void&gt; loop() {
  for (int i = 0; i &lt;= 100&#39;000&#39;000; ++i) {    |     for (int i = 0; i &lt;= 100&#39;000&#39;000; ++i) {
    co_await async_xyz(0);                    |       co_await async_xyz(0, use_future);
  }                                           |     }                                                     
}                                             |   }
</code></pre><p>The tests were performed with the use_future adapter mapping to the lightweight <code>rexp::future</code> from <a href="https://github.com/chriskohlhoff/resumable-expressions">https://github.com/chriskohlhoff/resumable-expressions</a>.<br>26% of the time was spent in allocation and deallocation of promises and future shared objects, and 20% was spent in synchronization primitives. Allocation overhead can be reduced with a custom allocator; synchronization overhead, however, is unavoidable in the current CompletionToken model. The proposed model avoids both. It defers launching the operation until <code>.then</code> or <code>await_suspend</code> can provide the completion callback to the API, and it avoids allocation by using a temporary on the coroutine frame that is stable in memory for the duration of the asynchronous operation.</p>
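<p>The effect of deferring the launch can be illustrated with a small sketch. It is not the implementation measured above: <code>fake_os_xyz</code>, <code>pending_ops</code>, <code>complete_pending</code>, <code>LazyOp</code> and <code>Operation</code> are hypothetical names, and the "platform" is a simple in-process queue. The only point shown is that nothing is submitted until the continuation is known, so the continuation can live inside a caller-owned object and no shared promise/future state is required.</p>
<pre><code class="lang-c++">#include &lt;cassert&gt;
#include &lt;utility&gt;
#include &lt;vector&gt;

// Hypothetical stand-in for the platform: queues (callback, context) pairs
// and completes them when complete_pending() is called.
using RawCallback = void (*)(int result, void* context);
struct PendingOp { RawCallback cb; void* ctx; int param; };

inline std::vector&lt;PendingOp&gt;&amp; pending_ops() {
    static std::vector&lt;PendingOp&gt; q;
    return q;
}
inline void fake_os_xyz(int param, RawCallback cb, void* ctx) {
    pending_ops().push_back(PendingOp{cb, ctx, param});
}
inline void complete_pending() {
    std::vector&lt;PendingOp&gt; ops;
    ops.swap(pending_ops());
    for (const PendingOp&amp; op : ops) op.cb(op.param * 2, op.ctx); // result is param * 2
}

// Caller-owned operation state. It plays the role of the temporary on the
// coroutine frame: it must stay at the same address from start() to completion.
template &lt;typename Continuation&gt;
class Operation {
public:
    Operation(int param, Continuation k) : param_(param), k_(std::move(k)) {}
    void start() {
        assert(!started_);
        started_ = true;
        fake_os_xyz(param_, &amp;Operation::on_complete, this);
    }
private:
    static void on_complete(int result, void* self) {
        static_cast&lt;Operation*&gt;(self)-&gt;k_(result);
    }
    int param_;
    Continuation k_;
    bool started_ = false;
};

// What the async function returns: only the captured parameters.
struct LazyOp {
    int param;
    template &lt;typename Continuation&gt;
    Operation&lt;Continuation&gt; then(Continuation k) const {
        return Operation&lt;Continuation&gt;(param, std::move(k));
    }
};
inline LazyOp async_xyz(int param) { return LazyOp{param}; }

inline int demo_deferred_launch() {
    int seen = 0;
    LazyOp lazy = async_xyz(21);                        // nothing submitted yet
    auto op = lazy.then([&amp;seen](int r) { seen = r; });  // continuation attached
    assert(pending_ops().empty());                      // still nothing submitted
    op.start();                                         // the platform call happens here
    complete_pending();                                 // simulate completion
    return seen;
}
</code></pre>
<p>In this sketch <code>demo_deferred_launch()</code> performs no heap allocation for the operation state and uses no synchronization primitive; the operation object sits on the caller's stack in the same way the awaiter would sit on the coroutine frame.</p>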
<h2 id="coroutines-offer-lower-overhead-than-the-callback-model">Coroutines offer lower overhead than the callback model</h2>
<p>Using the same benchmark as in P0162, applied to slightly more sophisticated code (we added an index variable <code>i</code> to count the number of iterations and code to delete the state machine once the desired number of iterations is reached), we reaffirmed the proposition of this paper: coroutines offer lower overhead than the callback model while allowing a very compact and readable representation of an asynchronous state machine.</p>
<pre><code>                  coroutine                         callback based equivalent

std::future&lt;void&gt; loop() {                 |  struct loop_state {
  for (int i = 0; i &lt;= 100&#39;000&#39;000; ++i) { |     int i = 0;
    co_await async_xyz(0);                 |     loop_state() {
  }                                        |       async_xyz(0, [this](OsResultType o) { OnComplete(o); });
}                                          |     }
                                           |     void OnComplete(OsResultType) {
                                           |       if (++i &gt; 100&#39;000&#39;000) {
                                           |         delete this;
                                           |         return;
                                           |       }
                                           |       async_xyz(0, [this](OsResultType o) { OnComplete(o); });
                                           |     }
                                           |   };
                                           |   void loop() { new loop_state();}
</code></pre><p><strong>Nanoseconds per iteration:</strong></p>
<table>
<thead>
<tr>
<th>Test</th>
<th>/O2</th>
<th>/O2 /GL (whole program opt)</th>
</tr>
</thead>
<tbody>
<tr>
<td>LLVM Coro</td>
<td><strong>5.6ns</strong></td>
<td>6.1ns</td>
</tr>
<tr>
<td>VS 2015 Coro</td>
<td>6.2ns</td>
<td><strong>6.0ns</strong></td>
</tr>
<tr>
<td>Callback Default Alloc</td>
<td>6.8ns</td>
<td>6.7ns</td>
</tr>
<tr>
<td>Callback Custom Alloc</td>
<td>7.5ns</td>
<td>7.0ns</td>
</tr>
</tbody>
</table>
<!-- <img src=FastBench.png height=300/> -->  
<p>In this benchmark the callback model was measured with the default thread-caching allocator and with a custom allocator that serves memory from a preallocated fixed-size arena (same as in P0162). It was compared against two coroutine implementations: the stock version from VS2015 Update 1 (VS 2015 Coro) and an implementation modeling the coroutine optimization passes of an experimental LLVM implementation (LLVM Coro). As shown above, coroutines offer lower overhead, and we expect the results to improve further as code generation and optimization strategies for coroutines mature.</p>
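<p>The custom allocator used in the benchmark is not reproduced in this paper. As a rough illustration only, a fixed-size arena of the kind described could look like the sketch below. The name <code>FixedArena</code>, the block size and count, and the free-list strategy are assumptions made for the example, not the benchmark's actual code.</p>
<pre><code class="lang-c++">#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;new&gt;

// Hypothetical fixed-size arena: hands out equally sized blocks from a
// preallocated buffer and recycles them through an intrusive free list.
template &lt;std::size_t BlockSize, std::size_t BlockCount&gt;
class FixedArena {
    static_assert(BlockSize &gt;= sizeof(void*), "a block must be able to hold a free-list link");
    union Block {
        Block* next;                                           // used while the block is free
        alignas(std::max_align_t) unsigned char bytes[BlockSize]; // used while allocated
    };
public:
    FixedArena() {
        for (std::size_t i = 0; i &lt; BlockCount; ++i) {
            blocks_[i].next = free_;
            free_ = &amp;blocks_[i];
        }
    }
    void* allocate() {
        if (!free_) throw std::bad_alloc();
        Block* b = free_;
        free_ = b-&gt;next;
        return b;
    }
    void deallocate(void* p) noexcept {
        Block* b = static_cast&lt;Block*&gt;(p);
        b-&gt;next = free_;   // most recently freed block is handed out first
        free_ = b;
    }
private:
    Block blocks_[BlockCount];
    Block* free_ = nullptr;
};

// A state machine object that takes its memory from the arena.
struct LoopState {
    int i = 0;
    static FixedArena&lt;64, 4&gt;&amp; arena() { static FixedArena&lt;64, 4&gt; a; return a; }
    static void* operator new(std::size_t n) { assert(n &lt;= 64); return arena().allocate(); }
    static void operator delete(void* p) noexcept { arena().deallocate(p); }
};

inline bool demo_arena_reuse() {
    LoopState* a = new LoopState();
    void* first = a;
    delete a;                        // block goes back to the head of the free list
    LoopState* b = new LoopState();  // and is handed out again immediately
    bool reused = (static_cast&lt;void*&gt;(b) == first);
    delete b;
    return reused;
}
</code></pre>
<p>Because the free list is last-in first-out, a block released just before the next request is reused immediately, which is the allocator behavior a callback-based state machine benefits from.</p>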
<h2 id="overview">Overview</h2>
<p>Networking Library asynchronous functions use the class templates <code>completion_handler_type_t</code> and <code>async_result</code> to transform the <code>CompletionToken</code> passed as a parameter to the interface functions whose names start with the prefix async_ into a callable function object that is submitted to unspecified underlying implementation functions. This transformation allows the same set of functions to be used with either a callback model or a future-based continuation mechanism. For the latter, an object of type <code>use_future_t</code> is provided in place of the callback parameter (for example: <code>async_xyz(buf, len, use_future)</code>).</p>
<pre><code class="lang-c++">template&lt;class CompletionToken&gt;
auto async_xyz(T1 t1, T2 t2, CompletionToken&amp;&amp; token)
{
  completion_handler_type_t&lt;decay_t&lt;CompletionToken&gt;, void(R1 r1, R2 r2)&gt;
    completion_handler(forward&lt;CompletionToken&gt;(token));

  async_result&lt;decltype(completion_handler)&gt; result(completion_handler);

  async_xyz_impl(t1, t2, completion_handler); // do the work

  return result.get();
}
</code></pre>
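<p>To make the cooperation of the two traits concrete, here is a minimal, self-contained sketch of the same shape. It deliberately uses hypothetical names (<code>my_handler_type</code>, <code>my_async_result</code>, <code>my_use_future_t</code>, <code>promise_handler</code>) rather than the N4478 templates, fixes the result type to <code>int</code>, and lets the stand-in implementation complete immediately.</p>
<pre><code class="lang-c++">#include &lt;cassert&gt;
#include &lt;future&gt;
#include &lt;memory&gt;
#include &lt;type_traits&gt;
#include &lt;utility&gt;

// Hypothetical token requesting future-based completion.
struct my_use_future_t {};

// Trait 1: maps a token to the handler type that is actually invoked.
// By default the token is itself the handler (plain callback).
template &lt;typename Token&gt;
struct my_handler_type { using type = Token; };

// Trait 2: produces the initiating function's return value.
// For plain callbacks there is nothing to return.
template &lt;typename Handler&gt;
struct my_async_result {
    explicit my_async_result(Handler&amp;) {}
    void get() {}
};

// Handler substituted for my_use_future_t: fulfils a promise on completion.
// Note the shared state allocated here; this is the cost discussed earlier.
struct promise_handler {
    std::shared_ptr&lt;std::promise&lt;int&gt;&gt; p;
    explicit promise_handler(my_use_future_t) : p(std::make_shared&lt;std::promise&lt;int&gt;&gt;()) {}
    void operator()(int r) { p-&gt;set_value(r); }
};
template &lt;&gt; struct my_handler_type&lt;my_use_future_t&gt; { using type = promise_handler; };
template &lt;&gt; struct my_async_result&lt;promise_handler&gt; {
    std::future&lt;int&gt; f;
    explicit my_async_result(promise_handler&amp; h) : f(h.p-&gt;get_future()) {}
    std::future&lt;int&gt; get() { return std::move(f); }
};

// Stand-in for the unspecified implementation: completes immediately.
template &lt;typename Handler&gt;
void async_xyz_impl(int value, Handler&amp; handler) { handler(value); }

template &lt;typename CompletionToken&gt;
auto async_xyz(int value, CompletionToken&amp;&amp; token) {
    typename my_handler_type&lt;std::decay_t&lt;CompletionToken&gt;&gt;::type
        handler(std::forward&lt;CompletionToken&gt;(token));
    my_async_result&lt;decltype(handler)&gt; result(handler);
    async_xyz_impl(value, handler);
    return result.get(); // void for callbacks, std::future&lt;int&gt; for my_use_future_t
}

inline int demo_two_trait_model() {
    int from_callback = 0;
    async_xyz(20, [&amp;from_callback](int r) { from_callback = r; }); // callback token
    std::future&lt;int&gt; f = async_xyz(22, my_use_future_t{});          // future token
    return from_callback + f.get();
}
</code></pre>
<p>The same initiating function serves both callers, but every token other than a plain callback has to be expressed as a handler object that is created before the operation starts, which is why the future path cannot avoid its shared state.</p>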
<p>We propose to use a single <code>completion_token_transform</code> function to perform the transformation currently done via <code>completion_handler_type_t</code> and <code>async_result</code>. Not only does this result in less boilerplate code for the user or library developer to write, it also enables a zero-overhead mode when working with coroutines, as described in the next section.</p>
<pre><code class="lang-c++">template&lt;class CompletionToken&gt;
auto async_xyz(T1 t1, T2 t2, CompletionToken&amp;&amp; token) noexcept(auto)
{
  return completion_token_transform&lt;void(R1 r1, R2 r2)&gt;(
       forward&lt;CompletionToken&gt;(token),
       [=](auto typeErasedHandler) { async_xyz_impl_raw(t1, t2, typeErasedHandler); });
}
</code></pre>
<h2 id="details">Details</h2>
<p>Let&#39;s explore how a high-level asynchronous function <code>async_xyz</code> can be built on top of a low-level <code>os_xyz</code> supplied by the platform. First, we will write the callback-based and the coroutine-based solutions separately. Then, we will show how utilizing <code>completion_token_transform</code> as shown in the previous section allows the same API to handle both cases efficiently.</p>
<p>Let <code>ParamType</code> be the type representing all the input parameters to an asynchronous call, <code>ResultType</code> be the type of the result provided asynchronously, and <code>OsContext*</code> be a pointer to a context structure <code>OsContext</code> that <code>os_xyz</code> requires to remain valid until the asynchronous operation is complete. The general shape of the low-level API is assumed to be as shown below.</p>
<pre><code>using CallbackFnPtr = void(*)(ResultType r, OsContext*); // os wants this signature
void os_associate_completion_callback(CallbackFnPtr cb); // usually per handle or per threadpool
void os_xyz(ParamType p, OsContext* o); // initiating routine (per operation)
</code></pre><p>To transform a call to <code>async_xyz(P, CompletionHandler)</code> into a call to <code>os_xyz</code>, we need to type-erase the completion handler and pass it to <code>os_xyz</code> as the <code>OsContext*</code> parameter. In the completion callback, given an OsContext*, the callback downcasts it to the type containing the actual handler and invokes it. In simplified form it can look like:</p>
<pre><code>template &lt;typename CompletionHandler&gt;
void async_xyz(ParamType p, CompletionHandler &amp;&amp; cb) {
    auto o = make_unique&lt;Handler&lt;decay_t&lt;CompletionHandler&gt;&gt;&gt;(forward&lt;CompletionHandler&gt;(cb));
    os_xyz(p, o.get());
    o.release();
}

// where Handler and HandlerBase are defined as follows

struct HandlerBase : OsContext {
    CallbackFnPtr cb;
    explicit HandlerBase(CallbackFnPtr cb) : cb(cb) {}
    static void callback(ResultType r, OsContext* o) { // register this with OS
        static_cast&lt;HandlerBase*&gt;(o)-&gt;cb(r, o);
    }
};

template &lt;typename CompletionHandler&gt;
struct Handler : HandlerBase, CompletionHandler {
    template &lt;typename CompletionHandlerFwd&gt;
    explicit Handler(CompletionHandlerFwd&amp;&amp; h)
        : HandlerBase(&amp;Handler::callback) // bases are initialized in declaration order
        , CompletionHandler(forward&lt;CompletionHandlerFwd&gt;(h))
    {}
    static void callback(ResultType r, OsContext* o) {
        auto me = static_cast&lt;Handler*&gt;(o);
        auto handler = move(*static_cast&lt;CompletionHandler*&gt;(me));
        delete me;  // deleting before the invoke improves allocator behavior,
        handler(r); // as the handler is likely to request a similar block, which can then be reused immediately
    }
};
</code></pre><p>While sophisticated implementations may utilize specialized allocation / deallocation functions to lessen the overhead of type erasure and memory allocations, the overhead cannot be eliminated completely in a callback model. </p>
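<p>The ownership hand-off in the listing above can be tried out against a simulated platform. The sketch below is an assumption-laden variant, not the listing itself: <code>FakeOs</code>, <code>fake_os</code> and <code>complete_all</code> are hypothetical, <code>ParamType</code> and <code>ResultType</code> are both <code>int</code>, and <code>Handler</code> stores the completion handler as a member instead of deriving from it.</p>
<pre><code class="lang-c++">#include &lt;cassert&gt;
#include &lt;memory&gt;
#include &lt;type_traits&gt;
#include &lt;utility&gt;
#include &lt;vector&gt;

struct OsContext {};                 // opaque per-operation context (assumed shape)
using ResultType = int;
using CallbackFnPtr = void (*)(ResultType, OsContext*);

// Simulated platform: remembers submitted contexts and completes them on demand.
struct FakeOs {
    CallbackFnPtr callback = nullptr;
    std::vector&lt;std::pair&lt;int, OsContext*&gt;&gt; pending;
};
inline FakeOs&amp; fake_os() { static FakeOs os; return os; }
inline void os_associate_completion_callback(CallbackFnPtr cb) { fake_os().callback = cb; }
inline void os_xyz(int param, OsContext* o) { fake_os().pending.emplace_back(param, o); }
inline void complete_all() {
    auto ops = std::move(fake_os().pending);
    fake_os().pending.clear();
    for (auto&amp; op : ops) fake_os().callback(op.first + 1, op.second); // result is param + 1
}

// Non-template base: lets the single OS-level callback find the typed one.
struct HandlerBase : OsContext {
    CallbackFnPtr cb;
    explicit HandlerBase(CallbackFnPtr cb) : cb(cb) {}
    static void callback(ResultType r, OsContext* o) {
        static_cast&lt;HandlerBase*&gt;(o)-&gt;cb(r, o);
    }
};

template &lt;typename CompletionHandler&gt;
struct Handler : HandlerBase {
    CompletionHandler handler;
    template &lt;typename H&gt;
    explicit Handler(H&amp;&amp; h) : HandlerBase(&amp;Handler::invoke), handler(std::forward&lt;H&gt;(h)) {}
    static void invoke(ResultType r, OsContext* o) {
        // Take ownership back, move the handler out, free the block, then call.
        std::unique_ptr&lt;Handler&gt; me(static_cast&lt;Handler*&gt;(static_cast&lt;HandlerBase*&gt;(o)));
        CompletionHandler h = std::move(me-&gt;handler);
        me.reset();
        h(r);
    }
};

template &lt;typename CompletionHandler&gt;
void async_xyz(int param, CompletionHandler&amp;&amp; cb) {
    auto o = std::make_unique&lt;Handler&lt;std::decay_t&lt;CompletionHandler&gt;&gt;&gt;(
        std::forward&lt;CompletionHandler&gt;(cb));
    os_xyz(param, o.get()); // if this threw, unique_ptr would free the block
    o.release();            // the pending operation now owns the block
}

inline int demo_type_erased_callbacks() {
    os_associate_completion_callback(&amp;HandlerBase::callback);
    int sum = 0;
    async_xyz(1,  [&amp;sum](ResultType r) { sum += r; });      // completes with 2
    async_xyz(10, [&amp;sum](ResultType r) { sum += r * 2; });  // completes with 11
    assert(sum == 0);   // nothing runs until the simulated OS completes the operations
    complete_all();
    return sum;
}
</code></pre>
<p>Each call to <code>async_xyz</code> in this sketch performs exactly one heap allocation and one indirect call per completion; these are the costs the callback model cannot remove entirely.</p>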
<p>However, when an asynchronous API is used in a coroutine, no type erasure or memory allocation needs to be performed at all. Not only does this result in less code and faster execution, it also eliminates the sole source of failure in async APIs, allowing the library to mark async_xxx functions as noexcept. <!--, in addition, absence of allocations simplifies using of the networking library in the environments where memory allocations are highly undesirable beyond initial preallocation of resources (game developement, for example).--></p>
<p>Let&#39;s now look at mapping <code>async_xyz</code> to <code>os_xyz</code> when it is used in a coroutine. To be usable in an await expression (<a href="http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2015/n4499.pdf">N4499</a>/[expr.await]), the <code>async_xyz(P, use_await_t)</code> function needs to return an object with member functions await_ready, await_suspend and await_resume defined as follows:</p>
<pre><code>auto async_xyz(ParamType p, use_await_t = use_await_t{}) {
    struct Awaiter : AwaitableBase {
        ParamType p;
        explicit Awaiter(ParamType &amp; p) : p(move(p)) {}

        bool await_ready() { return false; } // the operation has not started yet
        auto await_resume() { return move(this-&gt;result); } // unpack the result when done
        void await_suspend(coroutine_handle&lt;&gt; h) { // call the OS and setup completion
            this-&gt;resume = h;
            os_xyz(p, this);
        }
    };
    return Awaiter{ p };
}

// where AwaitableBase is defined as follows

struct AwaitableBase : HandlerBase {
    coroutine_handle&lt;&gt; resume;
    ResultType result;

    AwaitableBase() : HandlerBase(&amp;AwaitableBase::Callback) {}

    static void Callback(ResultType r, OsContext* o) {
        auto me = static_cast&lt;AwaitableBase*&gt;(o);
        me-&gt;result = r;
        me-&gt;resume();
    }
};
</code></pre><p>The following example illustrates how a compiler transforms the expression <code>await async_xyz(p)</code>.<br>Note the absence of memory allocations, deallocations, and type erasure of any kind.</p>
<pre><code>ResultType r = await async_xyz(p);

becomes

     async_xyz`Awaiter __tmp{p}; 
     $promise.resume_addr = &amp;__resume_label;   // save the resumption point of the coroutine
     __tmp.resume = $RBP;                      // inlined await_suspend
     os_xyz(p, &amp;__tmp);                        // inlined await_suspend
     jmp Epilogue; // suspends the coroutine
__resume_label:    // will be resumed at this point once the operation is finished
     ResultType r = move(__tmp.result); // inlined await_resume
</code></pre><h2 id="now-with-completion_token_transform">Now with completion_token_transform</h2>
<p>Given the public async function async_xyz defined as described in the Overview section (and repeated below for the reader&#39;s convenience)</p>
<pre><code class="lang-c++">template&lt;class CompletionToken&gt;
auto async_xyz(T1 t1, T2 t2, CompletionToken&amp;&amp; token) noexcept(auto)
{
  return completion_token_transform&lt;void(R1 r1, R2 r2)&gt;(
       forward&lt;CompletionToken&gt;(token),
       [=](auto typeErasedHandler) { async_xyz_impl_raw(t1, t2, typeErasedHandler); });
}
</code></pre>
<p>with the <code>completion_token_transform</code> defined as follows, we can achieve the same efficient implementation of the asynchronous function when using callbacks:</p>
<pre><code>template &lt;typename Signature, typename CompletionHandler, typename Invoker&gt;
void completion_token_transform(CompletionHandler &amp;&amp; fn, Invoker invoker)
{
    auto p = make_unique&lt;Handler&lt;decay_t&lt;CompletionHandler&gt;&gt;&gt;(forward&lt;CompletionHandler&gt;(fn));
    invoker(p.get());
    p.release(); // if we reached this point, handler is owned by async activity and unique_ptr can relinquish the ownership
}
</code></pre><p>By defining an overload for <code>use_await_t</code>, we get an efficient implementation of async_xyz when it is used in coroutines.</p>
<pre><code>template &lt;typename Signature, typename Invoker&gt;
auto completion_token_transform(use_await_t, Invoker invoker)
{
    struct Awaiter : AwaitableBase, Invoker {
        bool await_ready() { return false; }
        ResultType await_resume() { return move(this-&gt;result); }
        void await_suspend(coroutine_handle&lt;&gt; h) {
            this-&gt;resume = h;
            static_cast&lt;Invoker*&gt;(this)-&gt;operator()(this);
        }
        Awaiter(Invoker&amp; invoker) : Invoker(move(invoker)) {}
    };
    return Awaiter{ invoker };
}
</code></pre><p>And finally, for completeness, here is what the <code>completion_token_transform</code> overload for <code>use_future_t</code> would look like:</p>
<pre><code>template &lt;typename Signature, typename Invoker&gt;
auto completion_token_transform(use_future_t, Invoker invoker) {
    struct FutHandler {
        promise&lt;ResultType&gt; p;
        void operator()(ResultType r) { p.set_value(move(r)); }
    };
    auto p = make_unique&lt;Handler&lt;FutHandler&gt;&gt;(FutHandler{});
    auto f = p-&gt;p.get_future();
    invoker(p.get());
    p.release();
    return f;
}
</code></pre><h2 id="summary">Summary</h2>
<p>The proposed changes improve the efficiency of the networking library by altering how the high-level public API interprets the CompletionToken when invoking the unspecified internal implementation. If this direction has support, the author of this paper will gladly help the author of the Networking Library proposal flesh out the relevant details and test the proposed changes using the coroutines available in the MSVC compiler.</p>
<h2 id="future-work-musing">Future Work / Musing</h2>
<p>There is an upcoming proposal (see [c++std-ext-17433]) to add a [[nodiscard]] attribute/context-sensitive keyword applicable to classes and functions. If that attribute is applied to the awaiter class returned from completion_token_transform, it becomes safe to make use_await_t the default CompletionToken of all async_xyz APIs.</p>
<!-- as shown below,
```
template <typename Signature, typename Invoker>
auto completion_token_transform(use_await_t, Invoker invoker)
{
    nodiscard struct Awaiter : AwaiterBase, Invoker {
``` 
-->
<pre><code class="lang-c++">template&lt;class CompletionToken = use_await_t&gt;
auto async_xyz(T1 t1, T2 t2, CompletionToken&amp;&amp; token  = use_await_t{}) noexcept(auto)
{
  return completion_token_transform&lt;void(R1 r1, R2 r2)&gt;(
       forward&lt;CompletionToken&gt;(token),
       [=](auto typeErasedHandler) { async_xyz_impl_raw(t1, t2, typeErasedHandler); });
}
</code></pre>
<p>If a user accidentally writes <code>async_xyz(t1,t2)</code> instead of <code>await async_xyz(t1,t2)</code>, the mistake will be caught at compile time due to <code>nodiscard</code> tag on the awaitable class.</p>
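<p>The intended effect can be tried out with the <code>[[nodiscard]]</code> attribute as it was later standardized in C++17. The sketch below makes several assumptions: <code>LazyAwaiter</code>, <code>start</code> and <code>started_count</code> are hypothetical, <code>start()</code> merely stands in for <code>await_suspend</code>, and the diagnostic for a discarded value is a warning that implementations are encouraged, not required, to emit.</p>
<pre><code class="lang-c++">#include &lt;cassert&gt;

// Counts how many operations were actually started (illustration only).
inline int&amp; started_count() { static int n = 0; return n; }

// Hypothetical lazy awaiter: creating it does nothing. The operation starts
// only when start() is called, which stands in for await_suspend.
struct [[nodiscard]] LazyAwaiter {
    int param;
    int start() const { ++started_count(); return param * 2; }
};

struct use_await_t {};

// use_await_t is the default token, as suggested in the text.
inline LazyAwaiter async_xyz(int param, use_await_t = use_await_t{}) {
    return LazyAwaiter{param};
}

inline int demo_nodiscard() {
    const int before = started_count();
    // async_xyz(1);  // if uncommented: a conforming compiler is encouraged to warn
    //                // that a [[nodiscard]] value is discarded; nothing would start
    LazyAwaiter op = async_xyz(21);      // the value is used, so no diagnostic
    assert(started_count() == before);   // constructing the awaiter started nothing
    return op.start();
}
</code></pre>
<p>Because the awaiter is lazy, a forgotten <code>await</code> would otherwise be a silent no-op; marking the type <code>[[nodiscard]]</code> is what turns that mistake into a diagnostic.</p>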
<p>Moreover, given that coroutines combine the coding simplicity of synchronous functions with the efficiency and scalability of asynchronous I/O, we may choose to give the nicest names (send, receive, accept) to the asynchronous functions and use the <code>CompletionToken</code> form of the API to deal with all cases. A single API function <code>async_xyz</code> can then serve all flavors of an operation, which shrinks the required API surface by two thirds.</p>
<pre><code>Instead of 3 forms of every API:

   void send(T1,T2);
   void send(T1,T2,error_code&amp;);
   void async_send(T1,T2, CompletionToken);

We can use a single form

   auto send(T1,T2,CompletionToken);

To be used as follows:

   await send(t1,t2); // CompletionToken defaults to use_await_t as being the most efficient and convenient way of using the async API
   send(t1,t2,block); // synchronous version throwing an exception
   send(t1,t2,block[ec]); // synchronous version reporting an error by setting error code into ec
   send(t1,t2,[]{ completion }); // asynchronous call using callback model
   auto fut = send(t1,t2,use_future); // completion via future
</code></pre><p>The benefit of this approach extends beyond the networking library to other future standard or non-standard libraries that model their APIs on CompletionToken/completion_token_transform.</p>
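<p>As a rough sketch of how one function could serve the synchronous and the callback forms, consider the following. Everything in it is an assumption made for illustration: the shape of the <code>block</code> and <code>block[ec]</code> tokens, the hypothetical <code>raw_send</code> that completes immediately, and the use of a class template <code>token_transform</code> with specializations in place of overloaded <code>completion_token_transform</code> functions (chosen only to keep overload resolution out of the example). The <code>use_await_t</code> and <code>use_future</code> forms are omitted.</p>
<pre><code class="lang-c++">#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;system_error&gt;
#include &lt;type_traits&gt;
#include &lt;utility&gt;

namespace sketch {

// Tokens modeled on the ones named above; their exact shape is an assumption.
struct block_ec_t { std::error_code&amp; ec; };
struct block_t {
    block_ec_t operator[](std::error_code&amp; ec) const { return block_ec_t{ec}; }
};
constexpr block_t block{};

// Hypothetical low-level operation: "sends" len bytes and fails for len == 0.
// It always reports through a completion function, as an async API would.
template &lt;typename Completion&gt;
void raw_send(std::size_t len, Completion&amp;&amp; done) {
    if (len == 0) done(std::make_error_code(std::errc::invalid_argument), std::size_t{0});
    else          done(std::error_code{}, len);
}

// Primary template: the token is an ordinary callback; nothing is returned.
template &lt;typename Token&gt;
struct token_transform {
    template &lt;typename Invoker&gt;
    static void apply(Token token, Invoker invoker) { invoker(std::move(token)); }
};

// block[ec]: run to completion and report failure through ec.
template &lt;&gt; struct token_transform&lt;block_ec_t&gt; {
    template &lt;typename Invoker&gt;
    static std::size_t apply(block_ec_t token, Invoker invoker) {
        std::size_t sent = 0;
        invoker([&amp;](std::error_code e, std::size_t n) { token.ec = e; sent = n; });
        return sent;
    }
};

// block: run to completion and throw on failure.
template &lt;&gt; struct token_transform&lt;block_t&gt; {
    template &lt;typename Invoker&gt;
    static std::size_t apply(block_t, Invoker invoker) {
        std::error_code ec;
        std::size_t sent = token_transform&lt;block_ec_t&gt;::apply(block_ec_t{ec}, std::move(invoker));
        if (ec) throw std::system_error(ec);
        return sent;
    }
};

// The single public form.
template &lt;typename CompletionToken&gt;
auto send(std::size_t len, CompletionToken&amp;&amp; token) {
    return token_transform&lt;std::decay_t&lt;CompletionToken&gt;&gt;::apply(
        std::forward&lt;CompletionToken&gt;(token),
        [len](auto&amp;&amp; handler) { raw_send(len, std::forward&lt;decltype(handler)&gt;(handler)); });
}

inline std::size_t demo_send_forms() {
    std::size_t total = 0;
    total += send(3, block);                  // synchronous form that throws on error
    std::error_code ec;
    total += send(0, block[ec]);              // synchronous form reporting through ec
    assert(ec == std::errc::invalid_argument);
    send(4, [&amp;total](std::error_code, std::size_t n) { total += n; }); // callback form
    return total;
}

} // namespace sketch
</code></pre>
<p>The return type of <code>send</code> is chosen by the token: <code>std::size_t</code> for the two blocking forms and <code>void</code> for the callback form, which is what lets one declaration replace the three listed above.</p>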
<h2 id="acknowledgments">Acknowledgments</h2>
<p>Great thanks to Christopher Kohlhoff whose N4045 provided the inspiration for this work.</p>
<h2 id="references">References</h2>
<p><a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4045.pdf">N4045</a>: <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4045.pdf">Library Foundations for Asynchronous Operations, Revision 2</a><br><a href="http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2015/n4399.html">N4399</a>: <a href="http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2015/n4399.html">Technical Specification for C++ Extensions for Concurrency</a><br><a href="http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2015/n4478.html">N4478</a>: <a href="http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2015/n4478.html">Networking Library Proposal (Revision 5)</a><br><a href="http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2015/n4499.pdf">N4499</a>: <a href="http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2015/n4499.pdf">Draft Wording For Coroutines (Revision 2)</a><br><a href="http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2015/p0057r1.pdf">P0057</a>: <a href="http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2015/p0057r1.pdf">Wording for Coroutines</a><br><a href="http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2015/p0162r0.pdf">P0162</a>: <a href="http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2015/p0162r0.pdf">A response to &quot;P0055R0: On Interactions Between Coroutines and Networking Library&quot;</a></p>

</body>
</html>
