<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>A new specification for std::generate_canonical</title>
    <style type="text/css">
      html { margin: 0; padding: 0; color: black; background-color: white; }
      body { padding: 2em; font-size: medium; font-family: "DejaVu Serif", serif; line-height: 150%; }
      code { font-family: "DejaVu Sans Mono", monospace; color: #006; }

      h1, h2, h3 { margin: 1.5em 0 .75em 0; line-height: 125%; clear: both; }

      sup, sub { line-height: 0; }

      div.code { white-space: pre-line; font-family: "DejaVu Sans Mono", monospace;
                 border: thin solid #E0E0E0; background-color: #F8F8F8; padding: 1em;
                 border-radius: 4px; }

      div.strictpre { white-space: pre; }

      div.code em { font-family: "DejaVu Serif", serif; }

      .docinfo { float: right }
      .docinfo p { margin: 0; text-align:right; }
      .docinfo address { font-style: normal; }

      .quote { display: inline-block; clear: both; margin-left: 1ex;
               border: thin solid #E0E0E0; background-color: #F8F8F8; padding: 1ex; }

      /*  Use DIV[insert] and DIV[delete] if the entire paragraph is added or removed; otherwise
       *  use DIV[modify] and use INS/DEL elements to mark up individual changes.
       */

      div.insert { border-left: thick solid #0A0; border-right: thick solid #0A0; padding: 0 1em; }
      div.modify { border-left: thick solid #999; border-right: thick solid #999; padding: 0 1em; }
      div.delete { border-left: thick solid #A00; border-right: thick solid #A00; padding: 0 1em; }

      del { color: #A00; text-decoration: line-through; }
      ins { color: #090; }
      ins code, del code { color: inherit; }

      #nostrike:checked ~ .out  { text-decoration: line-through; }

      table { border-collapse: collapse; margin: 3em auto; padding: 0; caption-side: bottom; }
      table caption { padding: 1ex 0 0 0; text-align: left; font-family: "DejaVu Sans", sans-serif; }
      td { text-align: center; margin: 0; padding: 0; height: 3em; width: 3em; border: 1px solid black; border-width: 3px 1px; }
      td:first-child { border-left-width: 3px; }
      td.br { border-right-width: 3px; }
      tr.bits td { background-color: #AFA; }
      tr.bits td.r { background-color: #FAA; }
    </style>
  </head>
  <body>
    <div class="docinfo">
      <p>ISO/IEC JTC1 SC22 WG21 P0952R1</p>
      <p>Date: 2023-09-20</p>
      <p>To: LWG</p>
      <address>
        Thomas K&ouml;ppe &lt;<a href="mailto:tkoeppe@google.com">tkoeppe@google.com</a>&gt;<br/>
        Davis Herring &lt;<a href="mailto:herring@lanl.gov">herring@lanl.gov</a>&gt;<br/>
      </address>
    </div>

    <h1>A new specification for <code>std::generate_canonical</code></h1>

    <p>This paper proposes a new specification for the function template
      <code>generate_canonical</code> [rand.util.canonical, 28.5.8.2]. The outcome is preserved
      (namely a random number in the interval [0, 1)), but the algorithm and the constraints are
      changed to ensure the desired statistical properties. The new specification
      obsoletes <a href="https://cplusplus.github.io/LWG/issue2524">LWG 2524</a>.</p>

    <h2>Contents</h2>
    <!-- fgrep -e "<h2 id=" gen_canonical.html | sed -e 's/.*id="\(.*\)">\(.*\)<\/h2>/<li><a href="#\1">\2<\/a><\/li>/g' -->
    <ol>
      <li><a href="#problem">Problem statement</a></li>
      <li><a href="#driveby">Drive-by fixes</a></li>
      <li><a href="#notwording">DO NOT SUBMIT: Proposed naive wording</a></li>
      <li><a href="#flaws">Statistical flaws in the current specification</a></li>
      <li><a href="#newspec">A new specification</a></li>
      <li><a href="#impact">Impact on the Standard</a></li>
      <li><a href="#questions">Questions for LWG and SG6</a></li>
      <li><a href="#wording">Proposed wording</a></li>
    </ol>

    <h2 id="problem">Problem statement</h2>

    <p>The specification of <code>generate_canonical</code> in C++23 and in the current working
      paper (N4958) is effectively unimplementable since it is over-constrained. To be more
      precise, it is <em>wrongly</em> constrained in terms of purely mathematical expressions
      that ignore the reality of floating point rounding on real implementations. We will point
      out two problems with the current specification. The first one is immediate; the second is
      somewhat more subtle and will be discussed later in this paper. First, consider the
      following three current requirements.</p>
    <ol>
      <li>The result must lie in [0, 1).</li>
      <li>The algorithm is specified exactly and the underlying URBG must be invoked a specific,
      fixed number of times for a given set of parameters.</li>
      <li>The results must be uniformly distributed.</li>
    </ol>

    <p>The immediate problem is that this is unimplementable for the following reasons: The
      algorithm is currently specified exactly as a particular computation, which results in a
      fraction
      <em>S</em>/<em>R</em><sup><em>k</em></sup> that is mathematically guaranteed to be less
      than 1. However, the value may be arbitrarily close to 1, and when expressed as a value of
      a bounded-precision type in C++, the result may actually be exactly 1 due to rounding.
      (This causes real bugs, e.g. when a computation divides by <code>(1&nbsp;-&nbsp;x)</code>, where
      <code>x</code> was obtained from <code>generate_canonical</code>.) If we accept (2), the
      algorithm as written, the rounding violates constraint (1). If we modify the result when
      the algorithm results in 1, we violate (3), uniformity. If we want to preserve (1) and (3),
      we need to rerun the algorithm in the case where it results in 1, which violates (2), the
      precise computational prescription.</p>
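    <p>As a minimal illustration of the rounding problem, consider a single draw from a 32-bit
      URBG (so <em>k</em>&nbsp;=&nbsp;1 and <em>R</em>&nbsp;=&nbsp;2<sup>32</sup>). The sketch below
      assumes an IEEE-754 binary32 <code>float</code> and the default round-to-nearest conversion;
      the helper name <code>naive_canonical</code> is hypothetical and not part of any library.</p>

```cpp
#include <cstdint>

// Naive S / R^k for a single draw from a 32-bit URBG (k = 1, R = 2^32),
// evaluated in float arithmetic.
// Assumption: IEEE-754 binary32 float with round-to-nearest conversions.
inline float naive_canonical(std::uint32_t g0) {
    const float R = 4294967296.0f;          // 2^32, exactly representable
    // A float has only 24 significand bits, so converting a large g0
    // may round it up to 2^32 before the division takes place.
    return static_cast<float>(g0) / R;
}
```

    <p>Under these assumptions, <code>naive_canonical(0xFFFFFFFFu)</code> compares equal to
      <code>1.0f</code>, even though the mathematical quotient is strictly less than 1.</p>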

    <p>LWG decided at the 2017 Albuquerque meeting that the best solution is to change the
      specification to rerun the algorithm until the result is not equal to 1. This means that
      the complexity of the algorithm can no longer be stated precisely, but only in
      expectation. If one round of the algorithm invokes the URBG <em>k</em> times, and the
      result is 1 with probability (1&nbsp;&minus;&nbsp;<em>p</em>), then the expected number of
      invocations of the URBG is now <em>k</em>/<em>p</em>. (In practice, <em>p</em> will be
      very close to (and less than) 1.) This paper attempts to improve on that decision by
      also addressing some statistical issues.</p>

    <h2 id="driveby">Drive-by fixes</h2>

    <p>The original expression for <em>b</em> assumes that the radix is 2 when comparing
      <code>numeric_limits&lt;RealType&gt;::digits</code> to the template parameter <code>bits</code>. We propose
      to interpret that template parameter as a number of digits in the radix
      <code>numeric_limits&lt;RealType&gt;::radix</code>; SG6 concurred with this direction in March 2022.
      Accordingly, we rename it to <code>digits</code>, and we also rename the variable used in the
      specification from <em>b</em> to <em>d</em>.</p>

    <h2 id="notwording">DO NOT SUBMIT: Proposed naive wording</h2>

    <p>If we simply modified the specification to rerun the algorithm until it results in a
      value less than 1, we could use the following wording. However, this wording fails to address
      the radix issue or the deeper statistical problems that we will discuss below, so we do not want to keep the
      change as small as this. This section is included solely for historical interest because this
      wording has been discussed on the reflector before.</p>

    <label for="nostrike">Strike out bad wording:</label> <input id="nostrike" type="checkbox" checked="checked">

    <p class="out">Modify [rand.util.canonical, 29.6.7.2] paragraphs 3 and 4 as follows.</p>
    <div class="modify out">
      <div class="code">template&lt;class RealType, size_t bits, class URBG&gt;
        &nbsp; RealType generate_canonical(URBG&amp; g);</div>
      <p>3. <em>Complexity:</em> <ins>For each attempt (see below), exactly</ins><del>Exactly</del>
        <em>k</em> = max(1, &lceil;<em>b</em> / log<sub>2</sub><em>R</em>&rceil;)
        invocations of <code>g</code>, where <em>b</em> is the lesser of
        <del><code>numeric_limits&lt;RealType&gt;::digits</code> and </del><code>bits</code><ins> and
        log<sub>2</sub>(<code>numeric_limits&lt;RealType&gt;::radix</code>)
      &times; <code>numeric_limits&lt;RealType&gt;::digits</code></ins>, and <em>R</em> is the value of
        <code>g.max()</code>&nbsp;&minus;&nbsp;<code>g.min()</code>&nbsp;+&nbsp;1.</p>
      <p>4. <em>Effects:</em> <ins>For each attempt, invokes</ins><del>Invokes</del> <code>g()</code>
        <em>k</em> times to obtain values <em>g</em><sub>0</sub>, &hellip;, <em>g</em><sub><em>k</em>&minus;1</sub>,
        respectively<ins> and calculates</ins><del>. Calculates</del> a quantity [<em>S</em> = &hellip;]
        using arithmetic of type <code>RealType</code>.
        <ins>Attempts are repeated as long as the quantity <em>S</em>/<em>R</em><sup><em>k</em></sup>
          has the value <code>1.0</code> when expressed as type <code>RealType</code>.</ins></p>
      <p>5. <em>Returns:</em> <em>S</em>/<em>R</em><sup><em>k</em></sup>.</p>
      <p>6. <em>Throws:</em> What and when <code>g</code> throws.</p>
    </div>

    <h2 id="flaws">Statistical flaws in the current specification</h2>

    <p>The currently specified algorithm does not always result in uniform output due to
      rounding. To see this, consider first a simple lemma.</p>

    <p><strong>Claim.</strong> <em>The restriction of a uniform distribution on a finite
    set to a subset is uniform.</em>&#8718;</p>

    <p>We use this to look for statistical properties of <code>generate_canonical</code>. If
      the function generates uniformly distributed floats in the range [0, 1), then by throwing
      away all numbers less than 0.5, we retain a uniform distribution on the set [0.5, 1). In
      the popular IEEE-754 floating-point representation, numbers in this range have a fixed
      exponent (of effective value &minus;1), and so we get a uniform distribution of mantissas,
      and thus each bit of the mantissa (or perhaps of a leading subset of significant bits, when
      the algorithm is used with low precision) is independently uniformly distributed.
      This is a property we can look for. More generally, similar statements should hold for restrictions
      to intervals of the form [2<sup>&minus;<em>n</em></sup>, 2<sup>&minus;<em>n</em>&nbsp;+&nbsp;1</sup>).</p>

    <p>The problem in the current specification comes from the use of division combined with
      floating point rounding. Whenever a larger range is used to derive a smaller range via
      division (rather than just discarding bits, when that is an option), the floating point
      rounding behaviour affects the results. To illustrate, consider the sequence 0.0, 0.5,
      1.0, 1.5, 2.0, 2.5, 3.0, 3.5. If we <em>discard</em> the least significant bit, we obtain 0, 0, 1,
      1, 2, 2, 3, 3, which is uniform. But if we employ the popular to-nearest-even rounding, we
      obtain 0, 0, 1, 2, 2, 2, 3, 4, which favours even over odd numbers at a ratio of 3-to-1.
      (This also shows another bias, namely that 0 only gets hit twice.)</p>
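    <p>The eight-element example can be reproduced with a short sketch. It assumes the default
      floating-point rounding mode (to-nearest-even), which is what <code>std::nearbyint</code>
      then uses; the names <code>EvenCounts</code> and <code>count_even_results</code> are
      illustrative only.</p>

```cpp
#include <cmath>

// Number of inputs (out of 0.0, 0.5, ..., 3.5) that end up on an even integer.
struct EvenCounts {
    int discard;  // last bit discarded (floor)
    int nearest;  // rounded in the current rounding mode
};

inline EvenCounts count_even_results() {
    EvenCounts c{0, 0};
    for (int i = 0; i < 8; ++i) {
        const double v = i * 0.5;
        // Discarding the last bit: 0, 0, 1, 1, 2, 2, 3, 3.
        if (static_cast<int>(std::floor(v)) % 2 == 0) ++c.discard;
        // Assumption: the rounding mode is the default to-nearest-even,
        // which gives 0, 0, 1, 2, 2, 2, 3, 4.
        if (static_cast<int>(std::nearbyint(v)) % 2 == 0) ++c.nearest;
    }
    return c;
}
```

    <p>With the default rounding mode, discarding yields 4 even results out of 8, whereas rounding
      yields 6 out of 8, which is the 3-to-1 ratio described above.</p>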

    <p>To make this problem concrete, consider a typical 32-bit float with 24 mantissa digits.
      For an extreme case, if the URBG returns 25 bits, then in the restricted range [0.5, 1.0)
      the least significant bit results from rounding away the last digit of <em>g</em><sub>0</sub>
      in just the same way as in the previous example, which leads to a 3-to-1 bias of zeros over ones
      in the last bit. A 26-bit URBG would have to strip two bits in the range [0.5, 1.0) (leading to
      a 5-to-3 bias), but only one bit in the range [0.25, 0.5), and so on. With the typical 32-bit URBG,
      the bias in the last bit shows up when restricting to the range [2<sup>&minus;8</sup>, 2<sup>&minus;7</sup>) (or
      <code>[0x1p-8, 0x1p-7)</code> in code).</p>
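    <p>The 25-bit case can be checked exhaustively. The following sketch assumes an IEEE-754
      binary32 <code>float</code> and that integer-to-float conversion rounds to nearest-even (the
      C++ standard leaves the choice between the two neighbouring values to the implementation).
      The names <code>LastBitCounts</code> and <code>last_bit_counts_25bit</code> are hypothetical.</p>

```cpp
#include <cstdint>

struct LastBitCounts {
    std::uint32_t zeros;
    std::uint32_t ones;
};

// Enumerates every 25-bit value g0 in [2^24, 2^25), i.e. every output of a
// 25-bit URBG that maps to [0.5, 1.0), converts it to float, and tallies the
// least significant bit of the resulting 24-bit significand.
inline LastBitCounts last_bit_counts_25bit() {
    LastBitCounts c{0, 0};
    for (std::uint32_t g0 = 1u << 24; g0 < (1u << 25); ++g0) {
        // Odd g0 is not representable and is rounded to a neighbouring
        // multiple of 2 (assumed: the one with an even significand).
        const float f = static_cast<float>(g0);
        // f is a multiple of 2, so halving is exact and yields the significand.
        // The largest g0 rounds up to 2^25 (i.e. to 1.0 after scaling); its
        // significand is even and is tallied as a zero here.
        const std::uint32_t significand = static_cast<std::uint32_t>(f / 2.0f);
        if (significand % 2 == 0) ++c.zeros; else ++c.ones;
    }
    return c;
}
```

    <p>Under the stated assumptions the tally shows three zeros for every one in the last bit.</p>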

    <p>Other rounding modes may be fairer when rounding is needed for random number generation.
      However, the rounding mode is not in scope of the specification and thus not under our
      control. Our proposed algorithm will use discard-and-retry to create a value that can
      be normalized without rounding.</p>

    <p>The problem with rounding to nearest-even was also discovered in
      <a href="https://docs.oracle.com/javase/8/docs/api/java/util/Random.html#nextDouble--">the
      Java library&rsquo;s <code>nextDouble</code> function</a>, which originally used a
      non-trivial division and would thus experience a rounding-induced bias in the lowest bit.
      (In the corrected, present version, the division by <code>1L &lt;&lt; 53</code> is just a
      trivial exponent adjustment.)</p>

    <h2 id="newspec">A new specification</h2>

    <p>We propose a new specification that provably results in uniform output, does not suffer
      from rounding problems, and is independent of the radix of the floating point
      implementation. The algorithm does not use non-trivial division to produce a limited
      output range. Rather, it will discard any result that falls outside the desired range
      and retry, so that the final division does not round. This ensures uniformity and avoids
      any dependency on floating-point rounding behaviour.
      An extra integer division (usually just a shift) is used to prevent an unreasonable expected number of retries in certain cases.</p>

    <p>The set of possible outcomes is not required to contain <em>every</em> representable
      value of <code>RealType</code> in the interval [0, 1).
      It is difficult to define uniformity over such a distribution, and providing it would be inordinately complicated in service of very unlikely, very small results.
      Instead, the resulting values will on common implementations be precisely the
      values 2<sup>&minus;<em>d</em></sup>{0, &hellip;, 2<sup><em>d</em></sup>&nbsp;&minus;&nbsp;1},
      where <em>d</em> is <code>digits</code> (restricted to the precision of the type).
      For example, for <code>digits = 2</code> the outcome on a typical implementation is
      (uniformly) 0, 0.25, 0.5 and 0.75. Note also that the mean of the resulting distribution
      is smaller (by 2<sup>&minus;<em>d</em>&nbsp;&minus;&nbsp;1</sup>) than the ideal mean 0.5.</p>

    <p>The proposed algorithm is as follows.</p>
    <ul>
      <li>Let <em>r</em> be <code>numeric_limits&lt;RealType&gt;::radix</code>,
        let <em>d</em> be the smaller of <code>digits</code> and <code>numeric_limits&lt;RealType&gt;::digits</code>,
        and let <em>R</em> be <code>g.max()</code>&nbsp;&minus;&nbsp;<code>g.min()</code>&nbsp;+&nbsp;1.</li>

      <li>Let <em>k</em> be the smallest integer such that
        <em>R</em><sup><em>k</em></sup>&nbsp;&ge;&nbsp;<em>r</em><sup><em>d</em></sup>.</li>

      <li>Let <em>x</em> be &lfloor;<em>R</em><sup><em>k</em></sup>/<em>r</em><sup><em>d</em></sup>&rfloor;,
        which is in (0,&nbsp;<em>R</em>) and need not be a power of <em>r</em>.</li>
    </ul>

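    <p>Now compute <em>S</em> = &sum;<sub><em>i</em>&isin;[0,<em>k</em>)</sub>
      (<em>g</em><sub><em>i</em></sub>&nbsp;&minus;&nbsp;<code>g.min()</code>)&nbsp;<em>R</em><sup><em>i</em></sup>
      in unbounded precision, where the <em>g</em><sub><em>i</em></sub> are the results of <em>k</em> successive invocations of <code>g()</code>.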
      Whenever <em>S</em> &ge; <em>x</em><em>r</em><sup><em>d</em></sup>,
      discard the result and retry; this occurs with probability less than 1/2 because (2<em>x</em>)&nbsp;<em>r</em><sup><em>d</em></sup>&nbsp;&gt;&nbsp;<em>R</em><sup><em>k</em></sup>.
      The return value is
      &lfloor;<em>S</em>&nbsp;/&nbsp;<em>x</em>&rfloor;&nbsp;/&nbsp;<em>r</em><sup><em>d</em></sup>,
      which can be computed without rounding since <em>d</em>&nbsp;&le;&nbsp;<code>numeric_limits&lt;RealType&gt;::digits</code>.</p>
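    <p>The steps above can be sketched in code as follows. This is an illustration under a
      simplifying assumption that is not part of the proposal: <em>R</em><sup><em>k</em></sup> must
      fit in 64 bits, so that ordinary integer arithmetic can stand in for unbounded precision
      (the assertions reject other parameter combinations). The names
      <code>sketch_generate_canonical</code> and <code>example_draw</code> are hypothetical.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>

// Sketch of the proposed algorithm.
// Assumption: R^k fits in 64 bits (checked by the assertions below).
template <class RealType, std::size_t digits, class URBG>
RealType sketch_generate_canonical(URBG& g) {
    using U = std::uint64_t;
    const U umax = std::numeric_limits<U>::max();
    const U r = std::numeric_limits<RealType>::radix;
    const std::size_t type_digits = std::numeric_limits<RealType>::digits;
    const std::size_t d = digits < type_digits ? digits : type_digits;
    const U gmin = static_cast<U>(URBG::min());
    const U R = static_cast<U>(URBG::max()) - gmin + 1;
    assert(R >= 2);  // also rejects a full 64-bit range, where R wraps to 0

    U rd = 1;  // r^d
    for (std::size_t i = 0; i < d; ++i) {
        assert(rd <= umax / r);
        rd *= r;
    }
    U Rk = 1;  // R^k for the smallest k with R^k >= r^d
    std::size_t k = 0;
    while (Rk < rd) {
        assert(Rk <= umax / R);
        Rk *= R;
        ++k;
    }
    const U x = Rk / rd;  // floor(R^k / r^d), at least 1

    for (;;) {  // one iteration per attempt
        U S = 0;
        U weight = 1;  // R^i
        for (std::size_t i = 0; i < k; ++i) {
            S += (static_cast<U>(g()) - gmin) * weight;
            if (i + 1 < k) weight *= R;
        }
        if (S < x * rd) {
            // floor(S / x) < r^d is exactly representable, and dividing by
            // r^d only adjusts the exponent, so neither step rounds.
            return static_cast<RealType>(S / x) / static_cast<RealType>(rd);
        }
        // Otherwise discard the attempt and retry.
    }
}

// Example use with a standard engine: for float and std::mt19937,
// R = 2^32, d = 24, k = 1 and x = 2^8, so no retries occur.
inline float example_draw(std::mt19937& engine) {
    return sketch_generate_canonical<float, 24>(engine);
}
```

    <p>With <code>std::minstd_rand</code>, whose range is not a power of 2, the same sketch
      occasionally discards an attempt; with <code>digits = 2</code> it returns only the values
      0, 0.25, 0.5 and 0.75.</p>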

    <p>For the edge case <code>digits = 0</code>:</p>
    <ul>
      <li><em>d</em> = 0,</li>
      <li><em>k</em> = 0, so the URBG is not called,</li>
      <li><em>x</em> = 1, and</li>
      <li><em>S</em> &equiv; 0, so the results have zero entropy.</li>
    </ul>
    
    <p>Note that on the most common platforms <em>r</em> = 2
      and <em>R</em> = 2<sup><em>n</em></sup>
      is a power of 2, so that the definitions simplify:
      <em>k</em> = &lceil;<em>d</em> / <em>n</em>&rceil;
      and <em>x</em> = 2<sup><em>m</em></sup>, where <em>m</em> = &minus;<em>d</em>&nbsp;mod&nbsp;<em>n</em>.
      No retries ever occur (because <em>R</em><sup><em>k</em></sup> is a multiple of <em>r</em><sup><em>d</em></sup>);
      the final return expression is &lfloor;<em>S</em>&nbsp;/&nbsp;2<sup><em>m</em></sup>&rfloor;&nbsp;/&nbsp;2<sup><em>d</em></sup>.
      (When <em>d</em> is a multiple of <em>n</em>, this is equivalent to the current specification.)</p>
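    <p>For the power-of-2 case, the simplified parameters can be computed directly. The sketch
      below assumes <em>r</em>&nbsp;=&nbsp;2 and <em>R</em>&nbsp;=&nbsp;2<sup><em>n</em></sup> with
      <em>n</em>&nbsp;&ge;&nbsp;1; the names <code>PowerOfTwoParams</code> and
      <code>power_of_two_params</code> are illustrative only.</p>

```cpp
#include <cstddef>

struct PowerOfTwoParams {
    std::size_t k;  // URBG invocations per attempt: ceil(d / n)
    std::size_t m;  // x = 2^m with m = (-d) mod n
};

// Assumes r = 2 and R = 2^n with n >= 1.
constexpr PowerOfTwoParams power_of_two_params(std::size_t d, std::size_t n) {
    return {(d + n - 1) / n, (n - d % n) % n};
}
```

    <p>For instance, a 32-bit URBG with <em>d</em>&nbsp;=&nbsp;24 gives <em>k</em>&nbsp;=&nbsp;1 and
      <em>m</em>&nbsp;=&nbsp;8, while <em>d</em>&nbsp;=&nbsp;53 gives <em>k</em>&nbsp;=&nbsp;2 and
      <em>m</em>&nbsp;=&nbsp;11.</p>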

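    <p>It is easy to see that this algorithm produces uniform outputs: for an ideal URBG,
      <em>S</em> is uniformly distributed on [0, <em>R</em><sup><em>k</em></sup>), and by the claim above
      its restriction to the range [0, <em>x</em><em>r</em><sup><em>d</em></sup>) is again uniform.
      Each of the <em>r</em><sup><em>d</em></sup> possible values of &lfloor;<em>S</em>&nbsp;/&nbsp;<em>x</em>&rfloor;
      has exactly <em>x</em> preimages in that range, so &lfloor;<em>S</em>&nbsp;/&nbsp;<em>x</em>&rfloor; is
      uniform on [0, <em>r</em><sup><em>d</em></sup>). Since <em>d</em> does not exceed
      the precision of <code>RealType</code>, the final division by <em>r</em><sup><em>d</em></sup> does not round:
      the result is representable as a value of <code>RealType</code> strictly less than 1.</p>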
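    <p>The uniformity argument can be checked by brute force for small parameters. The sketch
      below models a toy URBG with <em>R</em> equally likely outputs, enumerates every possible
      <em>S</em>, and counts how many accepted values map to each output. It assumes that
      <em>R</em><sup><em>k</em></sup> is small enough to enumerate; the function name
      <code>preimage_counts</code> is hypothetical.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns, for each value v in [0, r^d), the number of S in [0, R^k) that
// are accepted (S < x * r^d) and satisfy floor(S / x) == v.
// Assumption: R^k is small enough to enumerate and fits in 64 bits.
inline std::vector<std::uint64_t> preimage_counts(std::uint64_t R,
                                                  std::uint64_t r,
                                                  unsigned d) {
    std::uint64_t rd = 1;  // r^d
    for (unsigned i = 0; i < d; ++i) rd *= r;
    std::uint64_t Rk = 1;  // smallest power of R that is >= r^d
    while (Rk < rd) Rk *= R;
    const std::uint64_t x = Rk / rd;

    std::vector<std::uint64_t> counts(static_cast<std::size_t>(rd), 0);
    for (std::uint64_t S = 0; S < Rk; ++S) {
        if (S < x * rd) {  // values at or above x * r^d would be discarded
            ++counts[static_cast<std::size_t>(S / x)];
        }
    }
    return counts;
}
```

    <p>For a six-sided die (<em>R</em>&nbsp;=&nbsp;6) with <em>r</em>&nbsp;=&nbsp;2 and
      <em>d</em>&nbsp;=&nbsp;3, this gives <em>k</em>&nbsp;=&nbsp;2 and <em>x</em>&nbsp;=&nbsp;4, and
      each of the 8 outputs has exactly 4 preimages; the remaining 4 of the 36 combinations are
      discarded.</p>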

    <h2 id="impact">Impact on the Standard</h2>

    <p>This proposal changes the side effects, computational complexity and specific algorithmic
      details of the library facility <code>std::generate_canonical</code>. In particular, code
      that depends on a specific sequence of results from repeated invocations, or on a particular
      number of calls to the URBG argument, will be broken.</p>

    <h2 id="questions">Questions for LWG and SG6</h2>

    <p>We requested clarification on the following details; some feedback from SG6 is presented below.</p>
    <ol>
      <li><p>Is the design intention that <code>std::generate_canonical&lt;RealType, bits&gt;</code> picks
        an integer uniformly from [0, 2<sup><em>M</em></sup>) and returns the value divided by
        2<sup><em>M</em></sup>? This is in contrast to possible alternative interpretations such as &ldquo;pick
        a uniform real number (mathematically) from [0, 1) and return its rounded <code>RealType</code>
        representation&rdquo;. The latter suffers from the round-to-1.0 bug, of course, but it could be
        amended to &ldquo;round-down&rdquo; semantics to avoid this. However, the requirement of a mathematically
        uniform real number requires a highly variable (and potentially large) number of URBG invocations, so we
        believe that this is not the design intent.</p>

        <p>This has been the subject of some reflector discussion; SG6 confirmed this direction in March 2022.</p></li>

      <li><p>How much do we want to specify the algorithm? The above description is fairly detailed
        and prescriptive, which we intend to result in identical results on conforming implementations.
        However, for certain parameters our prescription requires more retries than strictly necessary.
        An alternative would be to give implementations more freedom to use more efficient sampling
        strategies; or we could consider <em>mandating</em> more efficient sampling, at the cost of
        making the specification more complex.</p>

        <p>SG6 was against removing the precise algorithm specification in March 2022.</p></li>
    </ol>

    <h2 id="wording">Proposed wording</h2>

    <p>Relative to N4958.</p>

    <p>Modify [rand.synopsis, 28.5.2] as follows.</p>

    <div class="modify">
      <div class="code">// 28.5.8.2, function template generate_canonical
        template&lt;class RealType, size_t <del>bits</del><ins>digits</ins>, class URBG&gt;
        &nbsp; RealType generate_canonical(URBG&amp; g);
      </div>
    </div>

    <p>Modify [rand.util.canonical] as follows.</p>
    <div class="modify">
      <div class="code">template&lt;class RealType, size_t <del>bits</del><ins>digits</ins>, class URBG&gt;
        &nbsp; RealType generate_canonical(URBG&amp; g);</div>
      <p>1. <em>Effects:</em> <ins>Let</ins></p>

      <div class="insert">
        <ol>
          <li><ins><em>r</em> be <code>numeric_limits&lt;RealType&gt;::radix</code>,</ins></li>
          <li><ins><em>R</em> be <code>g.max()</code>&nbsp;&minus;&nbsp;<code>g.min()</code>&nbsp;+&nbsp;1,</ins></li>
          <li><ins><em>d</em> be the smaller of <code>digits</code> and <code>numeric_limits&lt;RealType&gt;::digits</code>,</ins><sup>[<em>footnote</em>: <em><del>b</del><ins>d</ins></em> is introduced [&hellip;]]</sup></li>
          <li><ins><em>k</em> be the smallest integer such that <em>R</em><sup><em>k</em></sup>&nbsp;&ge;&nbsp;<em>r</em><sup><em>d</em></sup>, and</ins></li>
          <li><ins><em>x</em> be &lfloor;<em>R</em><sup><em>k</em></sup>/<em>r</em><sup><em>d</em></sup>&rfloor;.</ins></li>
        </ol>
      </div>

      <p><del>Invokes</del><ins>An <em>attempt</em> is <em>k</em> invocations of</ins> <code>g()</code><del> <em>k</em> times</del> to obtain values <em>g</em><sub>0</sub>, &hellip;, <em>g</em><sub><em>k</em>&minus;1</sub>, respectively<del>. Calculates</del><ins>, and the calculation of</ins> a quantity [<em>S</em> = &hellip;]<del> using arithmetic of type <code>RealType</code></del>.<ins>  Attempts are made until <em>S</em> &lt; <em>x</em><em>r</em><sup><em>d</em></sup>.</ins></p>
      <p>2. <em>Returns:</em> <ins>&lfloor;</ins><em>S</em><ins>/<em>x</em>&rfloor;</ins>/<del><em>R</em><sup><em>k</em></sup></del><ins><em>r</em><sup><em>d</em></sup></ins>.</p>
      <p>[Note: <ins>The return value <em>c</em> satisfies </ins>0 &le; <del><em>S</em>/<em>R</em><sup><em>k</em></sup></del><ins><em>c</em></ins> &lt; 1.&nbsp; &mdash;end note]</p>
      <p>3. <em>Throws:</em> What and when <code>g</code> throws.</p>
      <p>4. <em>Complexity:</em> Exactly <em>k</em><del> = max(1, &lceil;<em>b</em> / log<sub>2</sub><em>R</em>&rceil;)</del> invocations of <code>g</code><del>, where <em>b</em><sup>[<em>footnote</em>]</sup> is the lesser of <code>numeric_limits&lt;RealType&gt;::digits</code> and <code>bits</code>, and <em>R</em> is the value of <code>g.max()</code>&nbsp;&minus;&nbsp;<code>g.min()</code>&nbsp;+&nbsp;1</del><ins> per attempt</ins>.</p>
      <p>5. [Note: [&hellip;]&nbsp; &mdash;end note]</p>
    </div>

    <p>Drafting note: the <em>Complexity</em> footnote is moved into the <em>Effects</em> bullet defining what is now <em>d</em> (the erstwhile <em>b</em>).</p>
    
  </body>
</html>
