<!DOCTYPE html>
<html>
<head>
  <title>P0469R0: Sample in place</title>
    <style type="text/css">
  p {text-align:justify}
  li {text-align:justify}
  blockquote.note
  {
    background-color:#E0E0E0;
    padding-left: 15px;
    padding-right: 15px;
    padding-top: 1px;
    padding-bottom: 1px;
  }
  ins {background-color:#A0FFA0}
  del {background-color:#FFA0A0}
</style>
</head>
<body>
<h1>Sample in place</h1>
<table>
<tbody>
<tr>
<td>Document number:</td>
<td>P0469R0</td>
</tr>
<tr>
<td>Date:</td>
<td>2016-10-17</td>
</tr>
<tr>
<td>Audience:</td>
<td>Library Evolution Working Group</td>
</tr>
<tr>
<td>Reply-to:</td>
<td>R. &quot;Tim&quot; Song &lt;rs2740@gmail.com&gt;</td>
</tr>
</tbody>
</table>
<h2>Abstract</h2>

<p>This paper proposes adding an algorithm <code>inplace_sample</code> that is like <code>std::sample</code>,
but partitions the input range instead of copying the sampled elements.</p>

<h2>Background</h2>

<p>The <a href="http://wg21.link/P0347R0"><code>random_generator</code> proposal</a>, based on Melissa O&#39;Neill&#39;s randutils library,
included a member function named <code>sample</code> that performed sampling in-place.
According to Walter Brown, there is interest in a standalone <code>inplace_sample</code>
algorithm, hence this proposal.</p>

<h2>Motivation</h2>

<p>There are two drawbacks for <code>std::sample</code>:</p>

<ul><li>It copies the sampled elements, which could be potentially expensive, and makes it hard to use with move-only types.</li><li>There is no easy way to get the elements <em>not</em> included in the sample.</li></ul>

<p><code>inplace_sample</code> solves both problems. Instead of copying elements, it partitions them so that
every element in the sample comes before every element not included in the sample. And since it returns
the partition point, accessing the remaining elements is trivial.</p>

<h2>Impact on the standard</h2>

<p>This is a pure extension.</p>

<h2>Design questions</h2>

<h3>Stability</h3>

<p>The stability guarantee in the specification comes from the original implementation&#39;s
use of <code>stable_partition</code>, but it does impose a cost in the form of additional memory
or move/swap operations (as opposed to the case in the selection sampling version of <code>std::sample</code>, which gets it for free).</p>

<p>It may be appropriate to remove the requirement.</p>

<h3>Naming</h3>

<p>There appears to be two naming conventions in <code>&lt;algorithm&gt;</code> for non-copying and copying variants of the same algorithm:</p>

<ul><li>One appends <code>_copy</code> to the copying variant, hence <code>partition</code>/<code>partition_copy</code> and <code>remove</code>/<code>remove_copy</code>, etc.</li><li>The other prepends <code>inplace_</code> to the non-copying variant: <code>merge</code
>/<code>inplace_merge</code></li></ul>

<p>The second alternative is chosen in this paper since it seems too late to rename <code>std::sample</code>.</p>

<h2>Proposed specification</h2>

<p>Add the following signature to [algorithms.general], header <code>&lt;algorithm&gt;</code> synopsis:</p>

<pre><code>template &lt;class ForwardIterator, class Distance,
          class UniformRandomBitGenerator&gt;
ForwardIterator inplace_sample(ForwardIterator first, ForwardIterator last,
                               Distance n, UniformRandomBitGenerator&amp;&amp; g);</code></pre>

<p>Add to the end of [alg.random.sample]:</p>

<pre><code>template &lt;class ForwardIterator, class Distance,
          class UniformRandomBitGenerator&gt;
ForwardIterator inplace_sample(ForwardIterator first, ForwardIterator last,
                               Distance n, UniformRandomBitGenerator&amp;&amp; g);</code></pre>

<p>-?- <em>Requires</em>:
<ul>
<li><code>ForwardIterator</code> shall satisfy the requirements of <code>ValueSwappable</code> ([swappable.requirements]).</li>
<li>The type of <code>*first</code> shall satisfy the requirements of <code>MoveConstructible</code> (Table [tab:moveconstructible]) and <code>MoveAssignable</code> (Table [tab:moveassignable]).</li>
<li><code>Distance</code> shall be an integer type.</li>
<li><code>remove_reference_t&lt;UniformRandomBitGenerator&gt;</code> shall meet the requirements of a uniform random bit generator type ([rand.req.urng]) whose return type is convertible to <code>Distance</code>.</li></ul></p>

<p>-?- <em>Effects</em>: If <code>n &gt;= last - first</code>, has no effects. Otherwise, reorders elements in the range <code>[first, last)</code>, such that each possible sample of size <code>n</code> has equal probability of appearing in the range <code>[first, first
+ n)</code>. The relative order of the elements both inside the sample and outside the sample is preserved.</p>

<p>-?- <em>Returns</em>: <code>first + min(n, last - first)</code>.</p>

<p>-?- <em>Remarks</em>: To the extent that the implementation of this function makes use of random numbers, the object <code>g</code> shall serve as the implementation&#39;s source of randomness.</p>

<h2>Possible implementation</h2>

<p>The following naïve implementation assumes a version of <code>std::stable_partition</code> that
accepts forward iterators.</p>

<p><pre><code>
template &lt;class ForwardIterator, class Distance,
          class UniformRandomBitGenerator&gt;
ForwardIterator inplace_sample(ForwardIterator first, ForwardIterator last,
                               Distance n, UniformRandomBitGenerator&amp;&amp; g);
{
    auto total = std::distance(first, last);
    using dist_t = uniform_int_distribution&lt;Distance&gt;;
    using param_t = typename dist_t::param_type;
    dist_t d{};

    return std::stable_partition(first, last,
     [&amp;](const auto&amp;) {
        --total;
        if (d(g, param_t{0, total}) &lt; n) {
            --n;
            return true;
        } else {
            return false;
        }
     });
}</code></pre></p>
</body></html>
