<!DOCTYPE html>
        <html>
        <head>
            <meta charset="UTF-8">
            <title>system&lowbar;scheduler on Win32&comma; Darwin and Linux</title>
            <style>
/* From extension vscode.github */
/*---------------------------------------------------------------------------------------------
 *  Copyright (c) Microsoft Corporation. All rights reserved.
 *  Licensed under the MIT License. See License.txt in the project root for license information.
 *--------------------------------------------------------------------------------------------*/

.vscode-dark img[src$=\#gh-light-mode-only],
.vscode-light img[src$=\#gh-dark-mode-only],
.vscode-high-contrast:not(.vscode-high-contrast-light) img[src$=\#gh-light-mode-only],
.vscode-high-contrast-light img[src$=\#gh-dark-mode-only] {
	display: none;
}

</style>
            
        <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/Microsoft/vscode/extensions/markdown-language-features/media/markdown.css">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/Microsoft/vscode/extensions/markdown-language-features/media/highlight.css">
<style>
            body {
                font-family: -apple-system, BlinkMacSystemFont, 'Segoe WPC', 'Segoe UI', system-ui, 'Ubuntu', 'Droid Sans', sans-serif;
                font-size: 14px;
                line-height: 1.6;
            }
        </style>
        <style>
.task-list-item {
    list-style-type: none;
}

.task-list-item-checkbox {
    margin-left: -20px;
    vertical-align: middle;
    pointer-events: none;
}
</style>
<style>
:root {
  --color-note: #0969da;
  --color-tip: #1a7f37;
  --color-warning: #9a6700;
  --color-severe: #bc4c00;
  --color-caution: #d1242f;
  --color-important: #8250df;
}

</style>
<style>
@media (prefers-color-scheme: dark) {
  :root {
    --color-note: #2f81f7;
    --color-tip: #3fb950;
    --color-warning: #d29922;
    --color-severe: #db6d28;
    --color-caution: #f85149;
    --color-important: #a371f7;
  }
}

</style>
<style>
.markdown-alert {
  padding: 0.5rem 1rem;
  margin-bottom: 16px;
  color: inherit;
  border-left: .25em solid #888;
}

.markdown-alert>:first-child {
  margin-top: 0
}

.markdown-alert>:last-child {
  margin-bottom: 0
}

.markdown-alert .markdown-alert-title {
  display: flex;
  font-weight: 500;
  align-items: center;
  line-height: 1
}

.markdown-alert .markdown-alert-title .octicon {
  margin-right: 0.5rem;
  display: inline-block;
  overflow: visible !important;
  vertical-align: text-bottom;
  fill: currentColor;
}

.markdown-alert.markdown-alert-note {
  border-left-color: var(--color-note);
}

.markdown-alert.markdown-alert-note .markdown-alert-title {
  color: var(--color-note);
}

.markdown-alert.markdown-alert-important {
  border-left-color: var(--color-important);
}

.markdown-alert.markdown-alert-important .markdown-alert-title {
  color: var(--color-important);
}

.markdown-alert.markdown-alert-warning {
  border-left-color: var(--color-warning);
}

.markdown-alert.markdown-alert-warning .markdown-alert-title {
  color: var(--color-warning);
}

.markdown-alert.markdown-alert-tip {
  border-left-color: var(--color-tip);
}

.markdown-alert.markdown-alert-tip .markdown-alert-title {
  color: var(--color-tip);
}

.markdown-alert.markdown-alert-caution {
  border-left-color: var(--color-caution);
}

.markdown-alert.markdown-alert-caution .markdown-alert-title {
  color: var(--color-caution);
}

</style>
        
        </head>
        <body class="vscode-body vscode-light">
            <style type="text/css">
p {text-align:justify}
li {text-align:justify}
blockquote.note
{
background-color:#E0E0E0;
padding-left: 15px;
padding-right: 15px;
padding-top: 1px;
padding-bottom: 1px;
}
code
{
color:#000000;
}
ins {background-color:#A0FFA0}
del {background-color:#FFA0A0}
table {border-collapse: collapse;}
table, th, td {
border: 1px solid black;
border-collapse: collapse;
}
</style>
<table>
<thead>
<tr>
<th>Document Number:</th>
<th>p3456r0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Date:</td>
<td>2024-10-15</td>
</tr>
<tr>
<td>Target:</td>
<td>SG1</td>
</tr>
<tr>
<td>Revises:</td>
<td></td>
</tr>
<tr>
<td>Reply to:</td>
<td>Gor Nishanov (<a href="mailto:gorn@microsoft.com">gorn@microsoft.com</a>)</td>
</tr>
</tbody>
</table>
<h1 id="system_scheduler-on-win32-darwin-and-linux">system_scheduler on Win32, Darwin and Linux</h1>
<h2 id="abstract">Abstract</h2>
<p>This document compares the features offered by the global threadpool on Windows and MacOS with the facilities proposed in the P2079R4 System Execution Context paper. It explores extensions to P2079R4 that capture most of the primitives offered by the operating systems and outlines implementations for Windows, MacOS, and Linux.</p>
<h2 id="overview">Overview</h2>
<p>Win32 and Darwin offer global system threadpools with the following features:</p>
<ol>
<li>Post work items for execution.</li>
<li>Schedule a work item to run at a particular time (or after a delay).</li>
<li>Associate an I/O operation with a threadpool, so that the handler processing completion will be executed by the threadpool.</li>
<li>Bulk execution (Darwin only), though a bulk algorithm implemented on top of posting work items yields better performance (see the bulk section).</li>
<li>Allow associating priorities with work items.</li>
<li>Maintain an optimal number of threads processing work items:
<ol>
<li>If a thread gets blocked, another one is released/created to maintain the desired number of active threads.</li>
<li>The number of active threads is kept proportional to the number of cores.</li>
<li>Guards against thread explosion (when newly created threads get blocked).</li>
<li>Shrinks the number of threadpool threads to zero when not needed.</li>
</ol>
</li>
</ol>
<p>The following table summarizes the features:</p>
<table>
<thead>
<tr>
<th></th>
<th>p2079r4</th>
<th>Windows</th>
<th>MacOS</th>
<th>Asio</th>
</tr>
</thead>
<tbody>
<tr>
<td>schedule</td>
<td>+</td>
<td>+</td>
<td>+</td>
<td>+</td>
</tr>
<tr>
<td>schedule_at</td>
<td></td>
<td>+</td>
<td>+</td>
<td>+</td>
</tr>
<tr>
<td>defer</td>
<td></td>
<td></td>
<td></td>
<td>+</td>
</tr>
<tr>
<td>bulk</td>
<td>+</td>
<td></td>
<td>+</td>
<td></td>
</tr>
<tr>
<td>i/o</td>
<td></td>
<td>+</td>
<td>+</td>
<td>+</td>
</tr>
<tr>
<td>priorities</td>
<td></td>
<td>+</td>
<td>+</td>
<td></td>
</tr>
</tbody>
</table>
<p>In subsequent sections we outline how much parity with system threadpools we can achieve in the C++26 system context, and what can be added later.</p>
<h2 id="system_scheduler-of-p2079r4">system_scheduler of P2079r4</h2>
<p>As a starting point, we use p2079r4's
system_scheduler and, over the course of this paper,
expand it to cover most of the functionality offered
by operating system global threadpools and boost asio.</p>
<pre><code class="language-c++">system_scheduler get_system_scheduler();

class system_scheduler {
public:
  system_scheduler() = delete;
  bool operator==(const system_scheduler&amp;) const noexcept;
  std::execution::forward_progress_guarantee get_forward_progress_guarantee() noexcept;

  sender auto schedule();
  sender auto bulk(integral auto i, auto f);
};
</code></pre>
<h2 id="priorities">Priorities</h2>
<p>The Win32 threadpool offers three priority levels: high, normal and low. When a thread comes to pick up work, it takes
work from a lower-priority queue only when there are no work items in the higher-priority queues. MacOS offers an additional
priority level, <code>background</code>, which is lower than <code>low</code>;
work with <code>background</code> priority will not run
while low-power mode is enabled.</p>
<p>To reach parity we recommend extending <code>get_system_scheduler()</code> to take an optional <code>priority</code> parameter that takes advantage
of the underlying system threadpools. On systems without
an OS threadpool, this can easily be supported by providing
a queue per priority level.</p>
<pre><code class="language-c++">enum class system_scheduler_priority {
   background,
   low,
   normal,
   high,
};

system_scheduler get_system_scheduler(system_scheduler_priority priority = system_scheduler_priority::normal);
</code></pre>
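<p>To illustrate the queue-per-priority fallback, here is a minimal sketch (all names are hypothetical, and a single-threaded drain loop stands in for the worker threads):</p>
<pre><code class="language-c++">#include &lt;array&gt;
#include &lt;deque&gt;
#include &lt;functional&gt;
#include &lt;mutex&gt;

// Hypothetical fallback: one FIFO queue per priority level; workers
// always drain the highest non-empty queue first.
class priority_queues {
public:
  // priority: 0 = background .. 3 = high, mirroring system_scheduler_priority.
  void post(int priority, std::function&lt;void()&gt; work) {
    std::lock_guard lk(m_);
    queues_[priority].push_back(std::move(work));
  }
  // Single-threaded drain loop standing in for the worker threads.
  void drain() {
    for (;;) {
      std::function&lt;void()&gt; work;
      {
        std::lock_guard lk(m_);
        int p = highest_nonempty();
        if (p &lt; 0) return;      // all queues empty
        work = std::move(queues_[p].front());
        queues_[p].pop_front();
      }
      work();                   // run outside the lock
    }
  }
private:
  int highest_nonempty() const {
    for (int p = 3; p &gt;= 0; --p)
      if (!queues_[p].empty()) return p;
    return -1;
  }
  mutable std::mutex m_;
  std::array&lt;std::deque&lt;std::function&lt;void()&gt;&gt;, 4&gt; queues_;
};
</code></pre>
<p>Work items of equal priority retain FIFO order; a higher-priority item posted later still runs before any lower-priority item.</p>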
<p>On Darwin, it can be implemented as:</p>
<pre><code class="language-c++">class system_scheduler {
   ...
   using native_handle_type = dispatch_queue_t;
   native_handle_type native_handle();

   friend system_scheduler get_system_scheduler(system_scheduler_priority);
private:
   explicit system_scheduler(native_handle_type h) : handle{h} {}
   native_handle_type handle{};
};

system_scheduler get_system_scheduler(system_scheduler_priority priority = system_scheduler_priority::normal) {
   static constexpr array&lt;long, 4&gt; arr = {
      DISPATCH_QUEUE_PRIORITY_BACKGROUND,
      DISPATCH_QUEUE_PRIORITY_LOW,
      DISPATCH_QUEUE_PRIORITY_DEFAULT,
      DISPATCH_QUEUE_PRIORITY_HIGH
   };
   auto queue = dispatch_get_global_queue(arr[to_underlying(priority)], 0);
   return system_scheduler(queue);
}
</code></pre>
<p>On Windows:</p>
<pre><code class="language-c++">class system_scheduler {
   ...
   using native_handle_type = TP_CALLBACK_ENVIRON*;
   native_handle_type native_handle();

   friend system_scheduler get_system_scheduler(system_scheduler_priority);
private:
   explicit system_scheduler(native_handle_type h) : handle{h} {}
   native_handle_type handle{};
};

TP_CALLBACK_ENVIRON env_for_priority(TP_CALLBACK_PRIORITY pri) {
  TP_CALLBACK_ENVIRON result;
  InitializeThreadpoolEnvironment(&amp;result);
  SetThreadpoolCallbackPriority(&amp;result, pri);
  return result;
}
// Win32 offers only three levels, so background maps to low.
array&lt;TP_CALLBACK_ENVIRON, 4&gt; arr = {
   env_for_priority(TP_CALLBACK_PRIORITY_LOW),
   env_for_priority(TP_CALLBACK_PRIORITY_LOW),
   env_for_priority(TP_CALLBACK_PRIORITY_NORMAL),
   env_for_priority(TP_CALLBACK_PRIORITY_HIGH),
};

system_scheduler get_system_scheduler(system_scheduler_priority priority = system_scheduler_priority::normal)
{
   return system_scheduler(&amp;arr[to_underlying(priority)]);
}
</code></pre>
<p>One design point is whether normal should be the default
enum value, i.e. <code>system_scheduler_priority{} == system_scheduler_priority::normal</code>.</p>
<p>If we choose to do that, the enum will look like:</p>
<pre><code class="language-c++"><span class="hljs-keyword">enum class</span> <span class="hljs-title class_">system_scheduler_priority</span> : <span class="hljs-type">int</span> {
   background = <span class="hljs-number">-2</span>,
   low = <span class="hljs-number">-1</span>,
   normal = <span class="hljs-number">0</span>,
   high = <span class="hljs-number">1</span>,
};
</code></pre>
<h2 id="bulk">Bulk</h2>
<p>Bulk returns a sender describing the task of invoking the provided function with every index in the provided shape, along with the values sent by the input sender. For example, the following
coroutine performs matrix multiplication using bulk.</p>
<pre><code class="language-c++"><span class="hljs-function">task&lt;<span class="hljs-type">void</span>&gt; <span class="hljs-title">matmul</span><span class="hljs-params">(<span class="hljs-type">float</span>* xout, <span class="hljs-type">float</span>* x, <span class="hljs-type">float</span>* w, <span class="hljs-type">int</span> n, <span class="hljs-type">int</span> d)</span> </span>{
   <span class="hljs-keyword">auto</span> sh = <span class="hljs-built_in">get_system_scheduler</span>();
   <span class="hljs-comment">// Parallel matrix multiplication using bulk.</span>
   <span class="hljs-comment">// W (d,n) * x (n) -&gt; xout (d)</span>
   <span class="hljs-keyword">co_await</span> sh.<span class="hljs-built_in">bulk</span>(d, [&amp;](<span class="hljs-type">size_t</span> i) {
        <span class="hljs-type">float</span> val = <span class="hljs-number">0.0f</span>;
        <span class="hljs-keyword">for</span> (<span class="hljs-type">int</span> j = <span class="hljs-number">0</span>; j &lt; n; j++)
            val += w[i * n + j] * x[j];

        xout[i] = val;
   });
}
</code></pre>
<p>Darwin libdispatch offers dispatch_apply_f, which submits a single function to a dispatch queue and causes the function to be executed the specified number of times.</p>
<pre><code class="language-c++"><span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">dispatch_apply_f</span><span class="hljs-params">(<span class="hljs-type">size_t</span> iterations, <span class="hljs-type">dispatch_queue_t</span> queue, <span class="hljs-type">void</span> *context, <span class="hljs-type">void</span> (*work)(<span class="hljs-type">void</span> *context, <span class="hljs-type">size_t</span> iteration))</span></span>;
</code></pre>
<p>Windows does not have an equivalent, but such a facility can be readily implemented as an algorithm on top of non-bulk schedule.</p>
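<p>A minimal sketch of such an algorithm, layered on plain work-item posting (illustrative only: <code>bulk_over_post</code> is a hypothetical name, and std::thread stands in for posting to the pool):</p>
<pre><code class="language-c++">#include &lt;algorithm&gt;
#include &lt;cstddef&gt;
#include &lt;thread&gt;
#include &lt;vector&gt;

// Hypothetical sketch: bulk(shape, f) built on top of posting plain work
// items, modeled here with std::thread. Each chunk of indices becomes one
// posted item; joining the threads models completion of the bulk sender.
template &lt;typename F&gt;
void bulk_over_post(std::size_t shape, F f) {
  unsigned workers = std::max(1u, std::thread::hardware_concurrency());
  if (workers &gt; shape) workers = static_cast&lt;unsigned&gt;(shape ? shape : 1);
  std::size_t chunk = (shape + workers - 1) / workers; // ceiling division
  std::vector&lt;std::thread&gt; pool;
  for (unsigned w = 0; w &lt; workers; ++w) {
    std::size_t begin = w * chunk;
    std::size_t end = std::min(shape, begin + chunk);
    if (begin &gt;= end) break;
    // Each chunk stands in for one posted work item.
    pool.emplace_back([=] { for (std::size_t i = begin; i &lt; end; ++i) f(i); });
  }
  for (auto&amp; t : pool) t.join();
}
</code></pre>
<p>Chunking by index range (rather than posting one item per index) keeps the number of posted items proportional to the core count instead of the shape.</p>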
<p>In a few tests we ran on Darwin, running the bulk algorithm
over the scheduler was more performant than dispatch_apply_f.</p>
<p>Additionally, the bulk algorithm supports cancellation, which
allows early exit from the calculation when the result is no longer needed.</p>
<p>Another observation is that on MSVC, running parallel algorithms (which internally use the global threadpool) completely starves all other threadpool work until the parallel computation is completed. This appears to be suboptimal behavior.</p>
<p>We would like to propose that bulk should:</p>
<ol>
<li>support cancellation</li>
<li>cooperate with other items in the threadpool to not starve &quot;concurrency&quot; work items while &quot;parallel&quot; computation is run.</li>
</ol>
<h2 id="defer">Defer</h2>
<p>This feature is not available in Win32 or Darwin; it is
offered only by boost::asio and
earlier revisions of the executor proposal.</p>
<p>It allows deferring execution of a work item until immediately after the current work item returns control to the threadpool.</p>
<ul>
<li>it avoids stack overflow when performing inline completion, without the need to resort to posting
a threadpool work item</li>
<li>it avoids deadlocks when code executing a callback holds a lock and some actions need to be deferred until control is returned to the threadpool.</li>
</ul>
<p>The implementation is straightforward. A scheduler
needs to maintain a thread-local queue that
accumulates deferred items and executes them all immediately
after the currently executing work item finishes.</p>
<pre><code class="language-c++">system_scheduler get_system_scheduler();

class system_scheduler {
public:
  system_scheduler() = delete;
  bool operator==(const system_scheduler&amp;) const noexcept;
  std::execution::forward_progress_guarantee get_forward_progress_guarantee() noexcept;

  sender auto schedule();
  sender auto defer(); // &lt;-- freshly added
  sender auto bulk(integral auto i, auto f);
};
</code></pre>
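<p>The thread-local deferral described above can be sketched as follows (names are illustrative, not proposed API):</p>
<pre><code class="language-c++">#include &lt;deque&gt;
#include &lt;functional&gt;

// Work scheduled via defer() lands in a thread-local queue that the
// worker drains right after the current work item returns.
thread_local std::deque&lt;std::function&lt;void()&gt;&gt; deferred_items;

void defer_item(std::function&lt;void()&gt; f) {
  deferred_items.push_back(std::move(f));
}

// Called by a worker thread for each dequeued work item.
void run_work_item(std::function&lt;void()&gt; item) {
  item();
  // Drain deferrals; items deferred while draining also run here,
  // so nested deferrals never grow the stack.
  while (!deferred_items.empty()) {
    auto next = std::move(deferred_items.front());
    deferred_items.pop_front();
    next();
  }
}
</code></pre>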
<h2 id="timed">Timed</h2>
<p>Win32, Darwin and Asio threadpools support scheduling
work items based on time (absolute or relative).</p>
<p>The underlying implementation maintains a data structure
that keeps the scheduled work items in expiration order.
Usually it is a pairing heap (O(1) insert, O(log n) extraction of the expiring item).</p>
<p>To reduce the impact of the O(log n) extraction, Windows lets
the user indicate whether it is acceptable to delay execution
of a work item within a specified window, allowing some
of the timed items to be grouped together and reducing the total number
of elements maintained by the heap.</p>
<p>Windows also supports cancellation of timed work items, whereas Darwin does not.</p>
<p>Thus, on Darwin, support for timed operations will need to
rely on a pairing heap maintained by the C++ runtime if we want to support cancellation. Since the heap is then in the C++ library implementor's hands, we can also add support for a coalescing window, improving performance when the number of timed items is high.</p>
<pre><code class="language-c++"><span class="hljs-keyword">template</span> &lt;<span class="hljs-keyword">typename</span> Rep, <span class="hljs-keyword">typename</span> Ratio&gt;
<span class="hljs-function">schedule_after_sender <span class="hljs-title">system_scheduler::schedule_after</span><span class="hljs-params">(std::chrono::duration&lt;Rep, Ratio&gt; delay, milliseconds window = {})</span> <span class="hljs-type">const</span> <span class="hljs-keyword">noexcept</span></span>;

<span class="hljs-keyword">template</span> &lt;<span class="hljs-keyword">typename</span> Clock, <span class="hljs-keyword">typename</span> Duration&gt;
<span class="hljs-function">schedule_at_sender <span class="hljs-title">system_scheduler::schedule_at</span><span class="hljs-params">(std::chrono::time_point&lt;Clock, Duration&gt;&amp; abs_time, milliseconds window = {})</span> <span class="hljs-type">const</span> <span class="hljs-keyword">noexcept</span></span>;
</code></pre>
<p>We chose to keep the window in milliseconds
as that matches the Win32 API. While it is possible to specify it in microseconds or nanoseconds (or as a templatized duration), we settled on
milliseconds based on simplicity and Windows experience.</p>
<h2 id="elastic-threadpool">Elastic threadpool</h2>
<p>One fundamental feature of OS-provided threadpools is that they
have visibility into what threads are doing and can make
intelligent decisions about when to add or remove a thread.</p>
<p>We explored two contending approaches:</p>
<ol>
<li>Use Berkeley Packet Filter (BPF) to inject code into the kernel that reports when a thread goes to sleep or wakes up</li>
<li>Periodically sample all threadpool threads by reading /proc/self/task/{}/stat</li>
</ol>
<p>The first approach seems suboptimal, as it imposes system-wide overhead on all processes and typically requires root privileges to install.</p>
<p>Sampling /proc/self/task/{}/stat seems more promising; it is the approach used by libdispatch on Linux. Moreover, the overhead of sampling can be reduced significantly by utilizing <code>io_uring</code> to sample thousands of thread stats in one kernel transition.</p>
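<p>A minimal sketch of the sampling approach (Linux-specific; <code>count_threads_in_state</code> is a hypothetical helper; error handling elided):</p>
<pre><code class="language-c++">#include &lt;cstdio&gt;
#include &lt;filesystem&gt;
#include &lt;string&gt;

// Read field 3 of /proc/self/task/&lt;tid&gt;/stat to learn each thread's run
// state: 'R' running, 'S' sleeping, 'D' blocked in uninterruptible I/O.
// A pool manager can count blocked threads and release replacements.
int count_threads_in_state(char wanted) {
  int count = 0;
  for (auto&amp; entry : std::filesystem::directory_iterator("/proc/self/task")) {
    std::string path = entry.path().string() + "/stat";
    if (std::FILE* f = std::fopen(path.c_str(), "r")) {
      char state = '?';
      // Format: "pid (comm) state ...". %*s assumes comm has no spaces;
      // robust code should skip to the last ')' instead.
      std::fscanf(f, "%*d %*s %c", &amp;state);
      std::fclose(f);
      if (state == wanted) ++count;
    }
  }
  return count;
}
</code></pre>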
<p>Knowledge of the threadpool thread state makes it possible to implement
the desired policies to grow and shrink the threadpool as needed.</p>
<h2 id="scheduled-item-cancellation">Scheduled item cancellation</h2>
<p>One open question is what to do when a pending work item (scheduled via schedule() or as a result of I/O completion) is cancelled. We see several strategies:</p>
<ol>
<li>(Conservative) Leave the work item in the queue; it will be processed like a non-cancelled work item, except that set_stopped is called instead of set_value.</li>
<li>(Eager) Remove the work item from the queue and execute the handler inline.</li>
<li>(Reprioritize) Move cancelled work items to the beginning of the queue.</li>
<li>(Defer) Move the cancelled work item to the deferred queue.</li>
</ol>
<p>Of these options, the first looks least problematic. While it does not speed up completion of a cancelled item if there
are many items ahead of it in the queue, it does not disturb the execution of other items, and if there are many cancellations, they are handled by the threadpool concurrently.</p>
<p>Eager can lead to deadlocks and stack overflows, and slows down cancellation, as the handler is executed by the thread initiating the cancellation.</p>
<p>Defer is immune to the deadlock but has the same issue of running the cancellation on the initiating thread.</p>
<p>Finally, reprioritize favors cancelled items over regular items. It may or may not be beneficial over the conservative approach depending on the exact scenario.</p>
<p>Thus, the conservative approach is recommended as the default.</p>
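<p>A minimal sketch of the conservative strategy (names are illustrative): cancellation merely flips a flag, and the pool routes completion to set_stopped when the item is eventually dequeued.</p>
<pre><code class="language-c++">#include &lt;atomic&gt;
#include &lt;functional&gt;

// The cancelled item stays in the queue; a stop flag decides whether the
// dequeuing pool thread calls set_value or set_stopped.
struct work_item {
  std::atomic&lt;bool&gt; stop_requested{false};
  std::function&lt;void()&gt; set_value;
  std::function&lt;void()&gt; set_stopped;
};

// Cancellation flips the flag; the queue itself is left untouched.
void request_stop(work_item&amp; w) { w.stop_requested.store(true); }

// Executed when the item finally reaches the front of the queue.
void execute(work_item&amp; w) {
  if (w.stop_requested.load())
    w.set_stopped();
  else
    w.set_value();
}
</code></pre>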
<h2 id="timer-implementation">Timer implementation</h2>
<p>The most likely implementation of a timer (when such a facility is not available in the OS) is to maintain
a heap of timer items in expiration order and use
a threadpool control thread or other facility to wait until
the nearest timer expires. Once it expires, the work item is queued
into a threadpool queue as if it had been scheduled via schedule_sender.</p>
<p>If a timed work item is cancelled prior to expiration, it is likewise posted
into the threadpool in a cancelled state.</p>
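<p>The scheme above can be sketched with a standard-library heap and a condition variable (illustrative names; a single cooperative drain loop stands in for the control thread):</p>
<pre><code class="language-c++">#include &lt;chrono&gt;
#include &lt;condition_variable&gt;
#include &lt;functional&gt;
#include &lt;mutex&gt;
#include &lt;queue&gt;
#include &lt;vector&gt;

using timer_clock = std::chrono::steady_clock;

// Heap of timer items in expiration order; once an item expires it is
// handed to the pool as if scheduled via schedule_sender.
struct timer_item {
  timer_clock::time_point deadline;
  std::function&lt;void()&gt; post_to_pool; // enqueue as a regular work item
  bool operator&gt;(const timer_item&amp; o) const { return deadline &gt; o.deadline; }
};

class timer_queue {
public:
  void schedule_at(timer_clock::time_point tp, std::function&lt;void()&gt; f) {
    std::lock_guard lk(m_);
    heap_.push({tp, std::move(f)});
    cv_.notify_one(); // wake the waiter if this deadline is earlier
  }
  // Wait for and fire every currently scheduled item (simplified).
  void drain() {
    std::unique_lock lk(m_);
    while (!heap_.empty()) {
      auto next = heap_.top().deadline;
      cv_.wait_until(lk, next);
      if (timer_clock::now() &gt;= next) {
        auto item = heap_.top();
        heap_.pop();
        lk.unlock();
        item.post_to_pool(); // expired: queue into the threadpool
        lk.lock();
      }
    }
  }
private:
  std::mutex m_;
  std::condition_variable cv_;
  std::priority_queue&lt;timer_item, std::vector&lt;timer_item&gt;,
                      std::greater&lt;&gt;&gt; heap_;
};
</code></pre>
<p>A coalescing window fits naturally here: before pushing, search a small neighborhood of the new deadline and merge items whose windows overlap.</p>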
<h2 id="io">I/O</h2>
<p>Libunifex explored what sender/receiver I/O may look like. For example:</p>
<pre><code class="language-c++">task&lt;void&gt; read(scheduler auto sched) {
   auto in = open_file_read_only(sched, &quot;file.txt&quot;);
   std::array&lt;char, 1024&gt; buffer;
   co_await async_read_some_at(
      in, 0, as_writable_bytes(span{buffer.data(), buffer.size()}));
}
</code></pre>
<p>While it is unlikely that we will be able to specify an I/O suite
covering the various aspects of I/O in time for C++26, we can expose a <code>native_handle()</code> on the <code>system_scheduler</code> that
would allow developers to write libraries on top of it, providing
I/O support that executes completions on the
system scheduler.</p>
<p>Earlier, in the priority section, we got a sneak preview of native_handle. In this section we show how it can be
used to support I/O.</p>
<p>On Windows, native_handle is an alias for <code>TP_CALLBACK_ENVIRON*</code>, which
carries information about which threadpool to use and
the priority of the items being scheduled. It can be used
with existing Win32 threadpool APIs as follows:</p>
<pre><code class="language-c++"><span class="hljs-keyword">auto</span> io = <span class="hljs-built_in">CreateThreadpoolIo</span>(file, cb, context, 
   <span class="hljs-built_in">get_system_scheduler</span>().<span class="hljs-built_in">native_handle</span>());
</code></pre>
<p>Of course, we expect that users will have nice C++ wrappers around the raw Win32 APIs; the purpose here is to show how to make the C++ system context interact with components that expect a Windows threadpool environment (<code>TP_CALLBACK_ENVIRON*</code>) as an argument.</p>
<p>Similarly, on Darwin, it would look like:</p>
<pre><code class="language-c++">dispatch_read(f, 1024, get_system_scheduler().native_handle(),
   ^(dispatch_data_t data, int error) {...});
</code></pre>
<p>It is likely that a Linux implementation will
use io_uring to send commands and receive completions,
subsequently posting the completion handler
to the threadpool as if by schedule().
Thus, on Linux, the native handle needs to carry
enough context for users to develop
their own I/O wrappers on top.</p>
<h2 id="summary">Summary</h2>
<p>Thus, in order to expose most of the features
of the underlying OS global threadpool, the system_scheduler
would look like:</p>
<pre><code class="language-c++">enum class system_scheduler_priority {
   background,
   low,
   normal,
   high,
};

system_scheduler get_system_scheduler(system_scheduler_priority priority = system_scheduler_priority::normal);

class system_scheduler {
public:
  system_scheduler() = delete;
  bool operator==(const system_scheduler&amp;) const noexcept;
  std::execution::forward_progress_guarantee get_forward_progress_guarantee() noexcept;

  using native_handle_type = implementation-defined;
  native_handle_type native_handle();

  sender auto schedule();
  sender auto defer();
  sender auto bulk(integral auto i, auto f);

  template &lt;typename Rep, typename Ratio&gt;
  schedule_after_sender schedule_after(
     std::chrono::duration&lt;Rep, Ratio&gt; delay,
     milliseconds window = {}) const noexcept;

  template &lt;typename Clock, typename Duration&gt;
  schedule_at_sender schedule_at(
     std::chrono::time_point&lt;Clock, Duration&gt;&amp; abs_time,
     milliseconds window = {}) const noexcept;
};
</code></pre>
<p>This paper was coming in hot for the deadline. We intend to clean it up and improve it before presenting it in Poland.</p>
<h2 id="references">References:</h2>
<p>[p2300] <a href="https://wg21.link/p2300">https://wg21.link/p2300</a> std::execution</p>
<p>[p2079r4] <a href="https://wg21.link/p2079r4">https://wg21.link/p2079r4</a> System execution context</p>
<p>[asio-executors] <a href="https://github.com/chriskohlhoff/executors/">https://github.com/chriskohlhoff/executors/</a></p>
<p>[win32 threadpool details] <a href="https://www.youtube.com/watch?v=CzgNVuXVMWo">https://www.youtube.com/watch?v=CzgNVuXVMWo</a></p>
<p>[libdispatch] <a href="https://developer.apple.com/documentation/dispatch?language=objc">https://developer.apple.com/documentation/dispatch?language=objc</a></p>
<p>[io_uring] <a href="https://kernel.dk/io_uring.pdf">https://kernel.dk/io_uring.pdf</a></p>
<!--
tips about libdispatch
lib dispatch background work paused
https://gist.github.com/tclementdev/6af616354912b0347cdf6db159c37057
-->
            
            
        </body>
        </html>