<html>
  <head>
    <title>Explicit model definitions are necessary</title>
  </head>
  <body>

    <h1>Explicit model definitions are necessary</h1>

    <table>
      <tr><th align="left">Author</th><td>Doug Gregor, Jeremy Siek</td></tr>
      <tr><th align="left">Contact</th><td><a href="mailto:dgregor@cs.indiana.edu">dgregor@cs.indiana.edu</a></td></tr>
      <tr><th align="left">Date</th><td>2005-04-14</td></tr>
      <tr><th align="left">Number</th><td>N1798=05-0058</td></tr>
      <tr><th align="left">Working Group</th><td>Evolution</td></tr>
    </table>

    <h2>Abstract</h2>
    <p>This paper illustrates that explicit model definitions (as
      required by the <a
      href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1758.pdf"
      >Indiana concepts proposal, N1758</a>) are necessary for any
      safe, correct, and backward-compatible formulation of concepts
      in C++. We show that certain common idioms necessitate explicit
      model definitions, in particular the use of iterators in the C++
      Standard Library. More precisely, we show that existing
      (correct) code using the standard library will still compile but
      will fail at run-time if concepts are matched implicitly based
      on structure, as in the Texas A&amp;M concepts proposal,
      N1782.</p>

      <ul>
        <li><a href="#istream_iterator">The trouble with <code>istream_iterator</code></a></li>
        <li><a href="#examples">Additional examples</a></li>
        <li><a href="#conclusions">Conclusions</a></li>
      </ul>

  <a name="istream_iterator"><h2>The trouble with <code>istream_iterator</code></h2></a>
    <p>Consider the following definition of the non-mutable
  <code>Forward_iterator</code> concept,
  adapted from N1782:</p>

<pre>
  concept Forward_iterator&lt;Input_iterator Iter&gt;
    where Default_constructible&lt;Iter&gt;
          &amp;&amp; Assignable&lt;Iter&gt; {
      Iter p, q;     
      Iter&amp; r = (p = q);   
      const value_type&amp; t = *p;     
      Iter&amp; q2 = ++p;    
      const Iter&amp; q3 = p++;    
      const value_type&amp; t2 = *p++;
  };
</pre>

<p>and this definition of <code>istream_iterator</code> (from the GNU implementation
  of the C++ Standard Library, libstdc++):</p>

<pre>
  template&lt;typename _Tp, typename _CharT = char, 
           typename _Traits = char_traits&lt;_CharT&gt;, typename _Dist = ptrdiff_t&gt; 
    class istream_iterator 
      : public iterator&lt;input_iterator_tag, _Tp, _Dist, const _Tp*, const _Tp&amp;&gt;
    {
    public:
      typedef _CharT                         char_type;
      typedef _Traits                        traits_type;
      typedef basic_istream&lt;_CharT, _Traits&gt; istream_type;

      istream_iterator();
      istream_iterator(istream_type&amp; __s);
      istream_iterator(const istream_iterator&amp; __obj);
      const _Tp&amp; operator*() const;
      const _Tp* operator-&gt;() const;
      istream_iterator&amp; operator++();
      istream_iterator operator++(int);
    };
  
  template&lt;typename _Tp, typename _CharT, typename _Traits, typename _Dist&gt;
    inline bool 
    operator==(const istream_iterator&lt;_Tp, _CharT, _Traits, _Dist&gt;&amp; __x,
	       const istream_iterator&lt;_Tp, _CharT, _Traits, _Dist&gt;&amp; __y);

  template &lt;class _Tp, class _CharT, class _Traits, class _Dist&gt;
    inline bool 
    operator!=(const istream_iterator&lt;_Tp, _CharT, _Traits, _Dist&gt;&amp; __x,
	       const istream_iterator&lt;_Tp, _CharT, _Traits, _Dist&gt;&amp; __y);
</pre>

    <p>Does <code>istream_iterator</code> match the syntactic
    constraints of the <code>Forward_iterator</code> concept? Manually
    checking this confirms that it does. We also
    checked <code>istream_iterator</code> against
    the <code>ForwardIterator</code> concept checker in the <a
    href="http://www.boost.org/libs/concept_check">Boost Concept
    Checking library</a> (commenting
    out <code>iterator_category</code> checks, of course!)  and it did
    pass, i.e., it structurally matches
    the <code>Forward_iterator</code> concept.</p>

    <p>So, there are certain input iterator types (such
    as <code>istream_iterator</code>) that would be misclassified as
    forward iterators.  What is the danger in this? Some algorithms
    dispatch based on <code>Input_iterator</code>
    vs. <code>Forward_iterator</code>. For instance, the range
    constructor for a <code>std::vector</code> has two conceptual
    implementations required by the standard:</p>

<pre>
  // O(lg n) allocations
  template&lt;Input_iterator Iter&gt;
    vector(Iter first, Iter last)
    { while (first != last) push_back(*first++); }

  // 1 allocation
  template&lt;Forward_iterator Iter&gt;
    vector(Iter first, Iter last)
    {
      typename Iter::difference_type n = distance(first, last);
      reserve(n);
      while (n--) push_back(*first++); 
    }
</pre>

    <p>Concept-based overloading will pick the bottom version for
    <code>Forward_iterator</code>s and the top version for types that
    are only <code>Input_iterator</code>s. The <code>Forward_iterator</code>
    version is preferred when it can be used
    because <code>Forward_iterator</code> is a refinement
    of <code>Input_iterator</code>.</p>

    <p>Now, we put the pieces together in one line of code, which
    reads integers from <code>cin</code> and puts them into
    the <code>vector</code> named <code>ints</code>:</p>

<pre>
  vector&lt;int&gt; ints(istream_iterator&lt;int&gt;(cin), istream_iterator&lt;int&gt;());
</pre>

    <p>We've established that <code>istream_iterator</code>
      structurally matches <code>Forward_iterator</code>, so under
      implicit matching the second <code>vector</code> range
      constructor will be selected. Now, consider the execution of
      that function:</p>

    <ol>
      <li>The first line determines how long the sequence is, by
        running through the input stream. Note that this reads all values in
        the stream in the process.</li>

      <li>The second line reserves space in the <code>vector</code>.</li>

      <li>The third line tries to read the sequence to put it into the
        <code>vector</code>, but the stream has already been exhausted
        (remember, input iterators are single-pass: you can only read them
        once!). This causes undefined behavior, because <code>first</code>
        is no longer a valid iterator (a copy of it has already been
        advanced to the end of the stream) and we are dereferencing
        and incrementing it.</li>
    </ol>
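
    <p>The single-pass behavior behind these steps can be observed
      directly. The following is a minimal sketch, not part of either
      proposal: the names <code>PassCounts</code>
      and <code>count_two_passes</code> are hypothetical, and a string
      stream stands in for <code>cin</code>. It measures the range once,
      as the first line of the constructor would, and then counts what
      a fresh iterator on the same stream can still read.</p>

```cpp
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical names, for illustration only.
struct PassCounts {
  std::ptrdiff_t first_pass;   // elements seen while measuring the range
  std::ptrdiff_t second_pass;  // elements still readable afterwards
};

PassCounts count_two_passes(const std::string& text) {
  std::istringstream in(text);  // stands in for cin
  std::istream_iterator<int> first(in), last;

  // Step 1 of the Forward_iterator constructor: measure the range.
  // This reads every value from the underlying stream.
  std::ptrdiff_t n = std::distance(first, last);

  // `first` must not be used again: a copy of it has been advanced.
  // A fresh iterator on the same stream shows what a second pass sees.
  std::istream_iterator<int> again(in);
  std::ptrdiff_t m = std::distance(again, last);

  PassCounts result = {n, m};
  return result;
}
```

    <p>For the input <code>"1 2 3"</code> the first pass counts three
      elements and the second pass finds none, which is why the copy
      loop in the constructor has nothing valid left to read.</p>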

    <p>Why did this fail? The <code>ForwardIterator</code> concept
      adds one very important semantic guarantee
      to <code>InputIterator</code>: you can pass through and read a
      <code>ForwardIterator</code> sequence many times, but
      an <code>InputIterator</code> sequence can only be read
      once. Structural conformance assumes that semantics follow from
      syntax, which is not the case with these iterator concepts.</p>

    <p>The code above works today (without concepts) because we have
      the notion of <code>iterator_category</code>
      (in <code>iterator_traits</code>), which is an explicit model
      declaration. <code>input_iterator_tag</code>, 
      <code>forward_iterator_tag</code>, etc. are just hacks that help
      us tell the library "my type semantically models
      the <code>InputIterator</code> concept", etc. However, if we go
      to a purely structural model for concepts in C++ (only syntactic
      matching), the example program will compile but fail at run
      time.</p>
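
    <p>As a rough illustration of how category tags act as explicit
      model declarations today, the sketch below dispatches on
      <code>iterator_traits&lt;Iter&gt;::iterator_category</code>. It is
      not taken from any library implementation: the names
      <code>Collected</code>, <code>collect_ints</code>,
      and <code>collect_from_text</code> are hypothetical, and the
      <code>reserved_up_front</code> flag exists only to make the chosen
      path observable.</p>

```cpp
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: the flag records which overload ran.
struct Collected {
  std::vector<int> values;
  bool reserved_up_front;
};

// Single-pass version: chosen when the type declares input_iterator_tag.
template <typename Iter>
Collected collect_ints(Iter first, Iter last, std::input_iterator_tag) {
  Collected out;
  out.reserved_up_front = false;
  for (; first != last; ++first) out.values.push_back(*first);
  return out;
}

// Multi-pass version: chosen when the declared tag is (or derives
// from) forward_iterator_tag, so traversing the range twice is safe.
template <typename Iter>
Collected collect_ints(Iter first, Iter last, std::forward_iterator_tag) {
  Collected out;
  out.reserved_up_front = true;
  out.values.reserve(static_cast<std::size_t>(std::distance(first, last)));
  for (; first != last; ++first) out.values.push_back(*first);
  return out;
}

// Entry point: the declared category, not the syntax, picks the overload.
template <typename Iter>
Collected collect_ints(Iter first, Iter last) {
  typedef typename std::iterator_traits<Iter>::iterator_category category;
  return collect_ints(first, last, category());
}

// Usage sketch: a string stream stands in for cin.
Collected collect_from_text(const std::string& text) {
  std::istringstream in(text);
  std::istream_iterator<int> first(in), last;
  return collect_ints(first, last);
}
```

    <p>Because <code>istream_iterator</code> declares
      <code>input_iterator_tag</code>, the stream-based call takes the
      single-pass path even though the type would pass a purely
      syntactic check for the multi-pass one.</p>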

    <p>The second problem with misinterpreting input iterators as
    forward iterators is that one will not receive a compiler error
    for code such as:</p>

    <pre>
istream_iterator&lt;int&gt; i = adjacent_find(istream_iterator&lt;int&gt;(cin), istream_iterator&lt;int&gt;());
    </pre>

    <p>This code is incorrect, and will fail to compile with standard
    library implementations that explicitly check
    the <code>iterator_category</code>. Using purely structural
    conformance, the program compiles. When executed, it invokes
    undefined behavior (because it reads an input iterator after a
    copy of it has been incremented). In practice, without checking,
    it returns almost the right answer: the iterator returned will be
    one step too far, violating the semantics in the standard. Using
    explicit model declarations ensures that this error is detected at
    compile time.</p>
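
    <p>One way such a compile-time check can be emulated with category
      tags is sketched below. This is an assumption about how a checking
      implementation might look, not a description of any particular
      library: <code>declares_forward</code>,
      <code>checked_adjacent_find</code>,
      and <code>first_repeat_index</code> are hypothetical names, and
      the sketch relies on <code>static_assert</code> from C++0x.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical trait: true only when Iter *declares* itself to be at
// least a forward iterator through its iterator_category tag.
template <typename Iter>
struct declares_forward
    : std::is_convertible<
          typename std::iterator_traits<Iter>::iterator_category,
          std::forward_iterator_tag> {};

// Hypothetical wrapper that rejects single-pass iterators at compile time.
template <typename Iter>
Iter checked_adjacent_find(Iter first, Iter last) {
  static_assert(declares_forward<Iter>::value,
                "adjacent_find requires a forward iterator");
  return std::adjacent_find(first, last);
}

// istream_iterator matches the syntax but declares input_iterator_tag,
// so a call through the wrapper would be rejected by the compiler.
static_assert(!declares_forward<std::istream_iterator<int> >::value,
              "istream_iterator does not declare itself multi-pass");

// Usage sketch with iterators that do declare themselves multi-pass.
std::ptrdiff_t first_repeat_index(const std::vector<int>& v) {
  return std::distance(v.begin(), checked_adjacent_find(v.begin(), v.end()));
}
```

    <p>The check consults only the declared tag, so it distinguishes
      the two iterator kinds even though their operations look the
      same.</p>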

    <a name="examples"><h2>Additional examples</h2></a>
    <p>Here are some other examples of concepts that differ only by
    semantics, for which code that works today (because it emulates
    explicit model declarations) would not work if we introduce a
    purely syntactic concept system into C++0x:</p>

    <ul>
      <li><p>The Abstract Algebra concepts of <a
          href="http://mathworld.wolfram.com/Groupoid.html"><code>Groupoid</code></a>
          and <a
          href="http://mathworld.wolfram.com/Semigroup.html"><code>SemiGroup</code></a>
          can be described by:</p>

        <pre>
    template&lt;typeid T&gt;
      concept Groupoid
      {
        T operator+(T,T);
      };

    template&lt;typeid T&gt;
      concept SemiGroup : Groupoid&lt;T&gt;
      {
        // operator+ is associative
      };
      </pre>

        <p>The semantic difference becomes important in parallel
          reduction operations, which can only operate in parallel when the
          operation is associative (i.e., the type models
          the <code>SemiGroup</code> concept). For instance, a parallelizing
          STL could apply parallel reduction to
          implement <code>std::accumulate</code> when the "plus" operation
          is associative (i.e., the type models the <code>SemiGroup</code>
          concept) but it must perform sequential accumulation when the
          "plus" operation is not associative (i.e., the type only models
          the <code>Groupoid</code> concept). The two concepts are
          syntactically indistinguishable, meaning that there is no way to
          safely determine that the parallelization can be performed.</p>
      </li>

      <li>Other iterator categories that syntactically match
        <code>Input_iterator</code>, <code>Output_iterator</code>,
        or <code>Forward_iterator</code> include read-once iterators and
        write-once iterators that are useful in the context of move
        semantics.</li>

      <li>The Parallel Boost Graph Library has the following concepts
      with identical syntax:
    
        <pre>
          template&lt;typeid PG&gt; concept <a href="http://www.osl.iu.edu/research/pbgl/documentation/parallel/MessagingProcessGroup.html">MessagingProcessGroup</a> {};
          template&lt;typeid PG&gt; concept <a href="http://www.osl.iu.edu/research/pbgl/documentation/parallel/BSPProcessGroup.html">BSPProcessGroup</a> : MessagingProcessGroup&lt;PG&gt; {};
          template&lt;typeid PG&gt; concept <a href="http://www.osl.iu.edu/research/pbgl/documentation/parallel/ImmediateProcessGroup.html">ImmediateProcessGroup</a> : MessagingProcessGroup&lt;PG&gt; {};
        </pre>

        <p>The difference between these concepts is entirely semantic:
          <code>MessagingProcessGroup</code> gives very weak
    guarantees about message delivery time (in a distributed computing
    environment), whereas the two refining concepts tighten up these
    delivery time guarantees. We need to dispatch on the various
    process group types because certain distributed algorithms require
    certain delivery time guarantees that are part
    of <code>BSPProcessGroup</code>
          or <code>ImmediateProcessGroup</code>.</p></li>
    </ul>
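
    <p>The <code>Groupoid</code>/<code>SemiGroup</code> case can be
      emulated in current C++ with a trait that plays the role of an
      explicit model definition. The sketch below is illustrative only:
      <code>models_semigroup</code>, <code>Strategy</code>,
      <code>pairwise_reduce</code>,
      and <code>model_aware_reduce</code> are hypothetical names, and
      the pairwise (tree-shaped) reduction merely stands in for a real
      parallel reduction. Unsigned addition is used as the declared
      model because it is associative.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <type_traits>
#include <vector>

// Hypothetical trait standing in for an explicit SemiGroup model
// definition. Nothing is assumed associative unless stated.
template <typename T, typename Op>
struct models_semigroup : std::false_type {};

// Explicit statement: unsigned addition is associative.
template <>
struct models_semigroup<unsigned, std::plus<unsigned> > : std::true_type {};

enum class Strategy { sequential, pairwise };

// Tree-shaped reduction over [lo, hi). It regroups the operands, so
// it is only valid for associative operations.
template <typename T, typename Op>
T pairwise_reduce(const std::vector<T>& v, std::size_t lo, std::size_t hi,
                  Op op) {
  assert(lo < hi);
  if (hi - lo == 1) return v[lo];
  std::size_t mid = lo + (hi - lo) / 2;
  return op(pairwise_reduce(v, lo, mid, op), pairwise_reduce(v, mid, hi, op));
}

// Regroups only when the (T, Op) pair has been declared a SemiGroup;
// otherwise falls back to strict left-to-right accumulation.
template <typename T, typename Op>
Strategy model_aware_reduce(const std::vector<T>& v, T init, Op op,
                            T& result) {
  if (models_semigroup<T, Op>::value && !v.empty()) {
    result = op(init, pairwise_reduce(v, 0, v.size(), op));
    return Strategy::pairwise;
  }
  result = std::accumulate(v.begin(), v.end(), init, op);
  return Strategy::sequential;
}
```

    <p>Since <code>std::plus</code> and <code>std::minus</code> have
      the same call syntax, only the explicit specialization tells the
      algorithm that regrouping is safe for one and not the other.</p>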

    <a name="conclusions"><h2>Conclusions</h2></a>
    <p>Concepts contain both syntax and semantics. When model
    definitions are required, the user states explicitly that the
    semantics are satisfied. When model definitions are optional, the
    compiler matches the syntax of the concept and then assumes that
    the semantics are correct. The danger is that a data type can have
    the semantics of one concept but the syntax of another (typically
    more refined) concept, invoking improper optimizations. We avoid
    these problems in current C++ by emulating explicit model
    definitions using traits and category tags.</p>
  </body>
</html>

