<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">

<html>
  <head>
    <title>Comments on the Initialization of Random Engines</title>
    <meta name="author" content="Marc Paterno">
    <style type="text/css">
      pre {
      margin-top: 1ex;
      margin-bottom: 1ex;
      background-color: #FFFFEE;
      white-space:pre;
      border-style:solid;
      border-width:1px;
      border-color:#999999;
      padding:5px;
      width:95%;
      }      
    </style>

  </head>

  <body bgcolor="#FFFFFF" text="#000000">
    <h1>Comments on the Initialization of Random Engines</h1>

    <table align="right">
      <tr>
        <td></td>
      </tr>

      <tr>
        <td align="right"><b><i>Document number:</i></b></td>

        <td>&nbsp;N1547=03-0130</td>
      </tr>

      <tr>
        <td align="right"><b><i>Date:</i></b></td>

        <td>&nbsp;October 29, 2003</td>
      </tr>

      <tr>
        <td align="right"><b><i>Project:</i></b></td>

        <td>&nbsp; Programming Language C++</td>
      </tr>

      <tr>
        <td align="right"><b><i>Reference:</i></b></td>

        <td>&nbsp; ISO/IEC IS 14882:1998(E)</td>
      </tr>

      <tr>
        <td align="right"><b><i>Reply to:</i></b></td>

        <td>&nbsp; Marc Paterno</td>
      </tr>

      <tr>
        <td></td>

        <td>&nbsp; Fermi National Accelerator Laboratory</td>
      </tr>

      <tr>
        <td></td>

        <td>&nbsp; <a href="mailto:paterno@fnal.gov">paterno@fnal.gov</a></td>
      </tr>
    </table><br clear="all">

    <h2>Purpose</h2>

    <p>
      This document addresses some perceived problems with the
      signature of initialization functions in N1452, <i>A Proposal to
      Add an Extensible Random Number Facility to the Standard Library
      (Revision 2)</i>. It does <i>not</i> propose new language for
      TR1; if the arguments in this paper are accepted by the
      Committee, new language would be required.
    </p>

    <h2>Initializers in N1452</h2>

    <p>
      The Engines of N1452 can all be initialized and seeded by
      passing two iterators that specify a range of values. The
      specification of the range is unusual in that the first iterator
      is passed by (non-const) reference, rather than by value. The
      reason for doing this is to allow for the correct initialization
      of composed engines.
    </p>
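
    <p>
      The motivation for the by-reference iterator can be illustrated
      with a minimal sketch. The names <tt>toy_engine</tt>,
      <tt>toy_pair</tt> and <tt>composed_range_seeding_demo</tt> are
      hypothetical and are not taken from N1452, and the exception
      type is this sketch's own choice; only the shape of the
      <tt>seed</tt> signature follows the description above. Because
      <tt>first</tt> is a reference, each sub-engine advances it, so
      the next sub-engine continues where the previous one stopped.
    </p>
    <pre>
#include &lt;cassert&gt;
#include &lt;stdexcept&gt;

// Hypothetical one-word engine; only the seeding signature matters here.
struct toy_engine {
    unsigned long state;
    toy_engine() : state(1) {}

    // Mimics the shape described above: 'first' is a non-const reference
    // and is advanced past the value this engine consumes.
    template &lt;class It&gt;
    void seed(It&amp; first, It last) {
        if (first == last)
            throw std::invalid_argument("seed range exhausted");
        state = *first;
        ++first;
    }
};

// Hypothetical composed engine holding two sub-engines.
struct toy_pair {
    toy_engine a, b;

    template &lt;class It&gt;
    void seed(It&amp; first, It last) {
        a.seed(first, last);  // consumes one value and advances 'first'
        b.seed(first, last);  // continues from where 'a' stopped
    }
};

// Seeds a composed engine and checks that each part got its own value.
inline void composed_range_seeding_demo() {
    unsigned long init[] = { 7, 11, 13 };
    unsigned long* p = init;   // the caller must supply a modifiable lvalue
    toy_pair e;
    e.seed(p, init + 3);
    assert(e.a.state == 7);
    assert(e.b.state == 11);
    assert(p == init + 2);     // two values were consumed
}
    </pre>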

    <h2>The problem</h2>

    <p>
      As Pete Becker points out in N1535, item 6, this signature
      causes a problem, in that natural use leads to a compilation
      failure. Pete presents the following example:
    </p>
    <pre>
unsigned long init[] = { 1, 2, 3 };
minstd_rand rng0(init, init + 3);  // illegal, init not a modifiable lvalue
minstd_rand rng1;
rng1.seed(init, init + 3);         // illegal, init not a modifiable lvalue
    </pre>

    <p>
      Pete proposes a solution to this problem, using two-phase
      initialization. This involves a constructor that takes an object
      of type <tt>rng_no_init</tt>; this constructor does no
      initialization. This solution neatly solves the problem in that
      it allows the natural use of the initialization range as
      above. But it has the drawback of allowing the construction of
      engines in a state that is not usable for most purposes: an
      engine constructed from an <tt>rng_no_init</tt> argument is
      suitable for nothing except subsequent initialization.
    </p>

    <p>
      I believe that there is another natural solution to this
      problem, which does not require having a constructor which
      leaves engines in an incompletely initialized state. This
      solution is mentioned in section 3E of N1452; I repeat it here,
      to draw attention to its advantages.
    </p>

    <h2>An Alternative Solution</h2>

    <p>
      As mentioned in N1452, section 3E, an alternative approach would
      be to pass a zero-argument function object (a
      &quot;generator&quot;, in an unfortunate collision of
      terms). N1452 presents the following analysis:
    </p>

    <blockquote>
      An alternative approach is to pass a zero-argument function
      object (&quot;generator&quot;) for seeding. It is trivial to
      implement a generator from a given iterator range, but it is
      more complicated to implement an iterator range from a
      generator. Also, the exception object that is specified to be
      thrown when the iterator range is exhausted could be configured
      in a user-provided iterator to generator mapping. With this
      approach, some engines would have three one-argument
      constructors: One taking a single integer for seeding, one
      taking a (reference?) to a (templated) generator, and the copy
      constructor. It appears that the opportunities for ambiguities
      or choosing the wrong overload are too confusing to the
      unsuspecting user.
    </blockquote>

    <p>
      In this discussion, I see only one argument made against this
      solution: that the presence of three one-argument constructors
      for some engines may lead to ambiguity or confusion on the part
      of the user. I believe there is no ambiguity, and that there is
      very little risk of confusion on the part of the user.
    </p>
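
    <p>
      The remark that a generator is easily built from an iterator
      range can be made concrete with a minimal sketch. The names
      <tt>range_generator</tt>, <tt>make_range_generator</tt> and
      <tt>range_generator_demo</tt> are hypothetical, as is the choice
      of <tt>std::out_of_range</tt> for exhaustion; a user-provided
      adapter could throw whatever exception it prefers.
    </p>
    <pre>
#include &lt;cassert&gt;
#include &lt;stdexcept&gt;

// Hypothetical adapter: presents [first, last) as a zero-argument
// function object. The exception thrown on exhaustion is its own choice.
template &lt;class It&gt;
class range_generator {
public:
    typedef unsigned long result_type;

    range_generator(It first, It last) : cur_(first), end_(last) {}

    // Returns the next value of the range, or throws once it is used up.
    result_type operator()() {
        if (cur_ == end_)
            throw std::out_of_range("seed values exhausted");
        return *cur_++;
    }

private:
    It cur_;
    It end_;
};

// Helper so that the iterator type is deduced from the arguments.
template &lt;class It&gt;
range_generator&lt;It&gt; make_range_generator(It first, It last) {
    return range_generator&lt;It&gt;(first, last);
}

inline void range_generator_demo() {
    unsigned long init[] = { 1, 2, 3 };
    // No modifiable lvalue is needed here: the adapter keeps its own
    // copies of the iterators.
    range_generator&lt;unsigned long*&gt; g = make_range_generator(init, init + 3);
    unsigned long v1 = g();
    unsigned long v2 = g();
    unsigned long v3 = g();
    assert(v1 == 1 &amp;&amp; v2 == 2 &amp;&amp; v3 == 3);

    bool exhausted = false;
    try { g(); } catch (const std::out_of_range&amp;) { exhausted = true; }
    assert(exhausted);
}
    </pre>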

    <h3>No Ambiguity</h3>
    <p>
      I believe that it is possible to implement the three
      one-argument constructors to avoid ambiguity.
      Note that:
    </p>
    <ol>
      <li>One of the constructors is the copy constructor, which is
      not a template.</li>
      <li>The second of the constructors takes an integral type;
      this is also not a template.</li>
      <li>The third of the constructors takes, by non-const
      reference, a function or object of a type that models the
      concept <tt><b>Generator</b></tt>.</li>
    </ol>
    <p>
      If the implementer uses
      something like the &quot;enable-if&quot; idiom to prevent the
      templated constructor from being chosen when an integer argument
      is passed, or when the copy constructor is wanted, then there is
      no ambiguity. The resulting syntax for the user is very
      natural. Consider the result of several attempts to instantiate
      a specific engine:
    </p>
    <pre>
mt19937 e1(1);       // integral c'tor, conversion if needed
mt19937 e2(e1);      // copy c'tor
mt19937 e3("wrong"); // compilation failure -- const char* is not a generator
SomeGenerator_t g;   // SomeGenerator_t is a model of Generator
mt19937 e4(g);       // initialization using the given generator
    </pre>
    <p>
      Construction from integral types works, including conversion of
      the passed integral type when appropriate. The
      &quot;enable-if&quot; technique prevents the templated
      constructor from matching such a call. Copy construction works:
      a template constructor is never a copy constructor, and the same
      technique keeps the templated constructor from being preferred
      when the argument is a non-const engine of the same
      type. Construction from
      an inappropriate type fails (with a diagnostic of a quality
      dependent upon the compiler). Construction with a type which is
      a model of generator succeeds, as desired.
    </p>
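
    <p>
      The following is a minimal sketch of how such a guarded
      constructor set might look; it is not proposed wording and not
      taken from N1452. It assumes <tt>std::enable_if</tt> and the
      other facilities of the C++11 header
      <tt>&lt;type_traits&gt;</tt>, which postdate this paper. The
      names <tt>is_generator</tt>, <tt>guarded_engine</tt>,
      <tt>counting_generator</tt> and
      <tt>constructor_selection_demo</tt> are hypothetical, and the
      <tt>origin</tt> bookkeeping exists only to show which
      constructor was selected.
    </p>
    <pre>
#include &lt;cassert&gt;
#include &lt;type_traits&gt;
#include &lt;utility&gt;

// Hypothetical trait standing in for "models Generator": true when an
// lvalue of type G can be called with no arguments.
template &lt;class G&gt;
struct is_generator {
    template &lt;class U&gt;
    static auto test(int)
        -&gt; decltype(void(std::declval&lt;U&amp;&gt;()()), std::true_type());
    template &lt;class U&gt;
    static std::false_type test(...);

    typedef decltype(test&lt;G&gt;(0)) type;
    static const bool value = type::value;
};

// Hypothetical engine with the three one-argument constructors.
class guarded_engine {
public:
    typedef unsigned long result_type;
    enum origin { from_integer, from_copy, from_generator };

    // (1) Integral constructor; other integral types convert to it.
    explicit guarded_engine(unsigned long s = 1UL)
        : state_(s), origin_(from_integer) {}

    // (2) Copy constructor, written out only to record that it ran.
    guarded_engine(const guarded_engine&amp; other)
        : state_(other.state_), origin_(from_copy) {}

    // (3) Generator constructor. It drops out of overload resolution
    // unless Gen is callable and is not guarded_engine itself.
    template &lt;class Gen&gt;
    explicit guarded_engine(
        Gen&amp; g,
        typename std::enable_if&lt;
            is_generator&lt;Gen&gt;::value &amp;&amp;
            !std::is_same&lt;typename std::remove_cv&lt;Gen&gt;::type,
                          guarded_engine&gt;::value&gt;::type* = 0)
        : state_(g()), origin_(from_generator) {}

    // Toy update rule; unsigned arithmetic simply wraps around.
    result_type operator()() {
        state_ = state_ * 1103515245UL + 12345UL;
        return state_;
    }

    result_type state() const { return state_; }
    origin how() const { return origin_; }

private:
    result_type state_;
    origin origin_;
};

// Hypothetical model of Generator that yields 1, 2, 3, ...
struct counting_generator {
    unsigned long n;
    unsigned long operator()() { return ++n; }
};

// A string literal is rejected at compile time.
static_assert(!std::is_constructible&lt;guarded_engine, const char (&amp;)[6]&gt;::value,
              "a string literal is not a generator");

inline void constructor_selection_demo() {
    int seed = 42;                  // non-const integer lvalue
    guarded_engine e1(seed);        // (1): Gen = int is not callable
    guarded_engine e2(e1);          // (2): Gen = guarded_engine is excluded
    counting_generator g = { 0 };
    guarded_engine e4(g);           // (3): seeded with the first value of g
    assert(e1.how() == guarded_engine::from_integer &amp;&amp; e1.state() == 42);
    assert(e2.how() == guarded_engine::from_copy &amp;&amp; e2.state() == 42);
    assert(e4.how() == guarded_engine::from_generator &amp;&amp; e4.state() == 1);
}
    </pre>
    <p>
      Without the guard, the template would be an exact match for a
      non-const <tt>int</tt> or a non-const engine lvalue and would be
      preferred over constructors (1) and (2); the condition on
      <tt>Gen</tt> is what removes it from consideration in those
      cases.
    </p>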

    <h3>Advantages</h3>
    <p>
      I believe that the generator-based constructor is a superior
      alternative to the range-based constructor. Partly, this is
      because it avoids the difficulties pointed out in N1535 without
      introducing the <tt>rng_no_init</tt> constructor, which
      produces an engine in a mostly unusable state. More importantly,
      I believe the concept of construction from a generator more
      closely matches the idea of what is being done.
    </p>
    <p>
      It seems to be a burden on the user to create a range of the
      appropriate size to avoid exhaustion when initializing an
      engine. &quot;Obvious&quot; uses of the range-constructor may,
      in fact, be incorrect in the sense of being certain to fail,
      because the range will be exhausted before completion of the
      constructor. A generator seems more naturally inexhaustible than
      a range specified by a pair of iterators, making it easy to
      pass the generator through as many levels of composed engines as
      the user may require.
    </p>
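
    <p>
      As a rough illustration of passing one generator through several
      levels of composition, consider the sketch below. All names
      (<tt>leaf_engine</tt>, <tt>pair_engine</tt>,
      <tt>nested_engine</tt>, <tt>counting_generator</tt>,
      <tt>composed_generator_seeding_demo</tt>) are hypothetical, and
      the overload guards needed for safe copying are left out to keep
      the example short.
    </p>
    <pre>
#include &lt;cassert&gt;

// Hypothetical leaf engine: takes exactly one value from the generator.
// (Overload guards for copying are omitted for brevity.)
struct leaf_engine {
    unsigned long state;
    template &lt;class Gen&gt;
    explicit leaf_engine(Gen&amp; g) : state(g()) {}
};

// Hypothetical composed engine: hands the same generator to each part.
struct pair_engine {
    leaf_engine a, b;
    template &lt;class Gen&gt;
    explicit pair_engine(Gen&amp; g) : a(g), b(g) {}
};

// A second level of composition needs no extra bookkeeping.
struct nested_engine {
    pair_engine inner;
    leaf_engine extra;
    template &lt;class Gen&gt;
    explicit nested_engine(Gen&amp; g) : inner(g), extra(g) {}
};

// Hypothetical model of Generator that yields 1, 2, 3, ...
struct counting_generator {
    unsigned long n;
    unsigned long operator()() { return ++n; }
};

inline void composed_generator_seeding_demo() {
    counting_generator g = { 0 };
    nested_engine e(g);   // the caller never states how many values are needed
    assert(e.inner.a.state == 1);
    assert(e.inner.b.state == 2);
    assert(e.extra.state == 3);
    assert(g.n == 3);     // three values were drawn in total
}
    </pre>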
    <p>
      A significant benefit of the generator-based constructor is that
      one engine can be used to seed another engine, as long as the
      return type of the &quot;seeding&quot; engine is appropriately
      convertible to the type required by the &quot;seeded&quot;
      engine.
    </p>
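
    <p>
      A minimal sketch of one engine seeding another follows. It
      assumes <tt>std::mt19937</tt> from the C++11 header
      <tt>&lt;random&gt;</tt>, which postdates this paper and is used
      here only as a convenient existing engine; <tt>wide_engine</tt>
      and <tt>engine_seeds_engine_demo</tt> are hypothetical names,
      and overload guards are again omitted.
    </p>
    <pre>
#include &lt;cassert&gt;
#include &lt;random&gt;

// Hypothetical engine whose result type is wider than that of the
// seeding engine.
class wide_engine {
public:
    typedef unsigned long long result_type;

    // Seeds from any generator whose return type converts to result_type.
    // (Overload guards for copying are omitted for brevity.)
    template &lt;class Gen&gt;
    explicit wide_engine(Gen&amp; g) : state_(g()) {}

    result_type state() const { return state_; }

private:
    result_type state_;
};

inline void engine_seeds_engine_demo() {
    std::mt19937 source(5489u);     // the "seeding" engine
    std::mt19937 reference(5489u);  // identical engine, for comparison only
    wide_engine seeded(source);     // draws one value from 'source'
    const std::mt19937::result_type expected = reference();
    assert(seeded.state() == expected);
}
    </pre>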
    <h2>Conclusion</h2>
    <p>
      In summary, I believe that the generator-based constructor can
      be implemented in a manner unambiguous to the
      compiler. Additionally, it is clear in meaning to the
      user. Finally, it is easier to use than the range-based
      constructor in the construction of composed engines.
    </p>
    <p>
      While this paper addresses directly the issue of constructors, I
      believe the same arguments apply to the <tt>seed</tt> member functions.
      Changing the function <tt>seed</tt> to take a
      generator rather than a range also helps to disentangle two
      functionalities that have been conflated in previous
      discussions: the use of <tt>seed</tt> to do what the function
      name says, seeding an engine, as opposed to its use to reset the
      state from a sequence describing the state of the engine
      previously captured with the &quot;save&quot; function, however
      it is spelled.
    </p>
  </body>
</html>
