<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html><head>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8">

    <title>P0574R0: Algorithm Complexity Constraints and Parallel Overloads</title>
    <style type="text/css">
      p {text-align:justify}
      li {text-align:justify}
      blockquote.note
      {
      background-color:#E0E0E0;
      padding-left: 15px;
      padding-right: 15px;
      padding-top: 1px;
      padding-bottom: 1px;
      }
      ins, .inserted, ins p, .inserted p
      {
      color: black;
      background: #a0ffa0;
      text-decoration: underline;
      }
      del, del p
      {
      color: black;
      background: #ffa0a0;
      text-decoration: line-through;
      }
    </style>
  </head><body>
    <table>
      <tr><td>Document Number:</td><td>P0574R0</td></tr>
      <tr><td>Date:</td><td>2017-02-04</td></tr>
      <tr><td>Author:</td><td><a href="mailto:anthony@justsoftwaresolutions.co.uk">Anthony
            Williams</a><br>Just Software Solutions Ltd</td></tr>
      <tr><td>Audience:</td><td>SG1, LWG</td></tr>
    </table>
    <h1>P0574R0: Algorithm Complexity Constraints and Parallel Overloads</h1>

    <h2>Introduction</h2>

    <p>The C++17 draft features overloads of a lot of the standard library
    algorithms which take an <code>ExecutionPolicy</code> argument, in order to
    allow parallel implementations. However, the requirements of many of the
    algorithms are written in such a way as to require a sequential
    implementation, or a sub-optimal parallel implementation, thus eliminating
    some or all of the benefit in the new overloads.</p>

    <p><a href="http://wg21.link/p0523">P0523r0: Wording for CH 10: Complexity
    of parallel algorithms</a> attempts to address this problem by a blanket
    relaxation of the constraints for overloads with
    an <code>ExecutionPolicy</code>. I think this is an incorrect solution. It
    simultaneously grants too much leeway for some algorithms, while not fixing
    the problems with others. Instead, I think it is better to address each
    algorithm individually.</p>

    <p>In this paper I have gathered those algorithms where the requirements are
    specified in such a way as to rule out an efficient parallel implementation,
    and proposed alternative specifications for those algorithms.</p>

    <h2>Algorithms and proposed wording</h2>

    <h3>25.3.8 <code>adjacent_find</code> [alg.adjacent.find]</h3>

    <p>Change paragraph 2 as follows:</p>

    <blockquote>
      Complexity: <ins>For the overloads with no <code>ExecutionPolicy</code>,
      or an <code>ExecutionPolicy</code>
      of <code>std::execution::sequential_policy</code>, f</ins><del>F</del>or a
      nonempty range, exactly <code>min((i - first) + 1, (last - first) -
      1)</code> applications of the corresponding predicate,
      where <code>i</code> is <code>adjacent_find</code>’s return
      value.  <ins>For the overloads with an <code>ExecutionPolicy</code> other
      than <code>std::execution::sequential_policy</code>, at
      most <code>(last-first)-1</code> applications of the corresponding
      predicate.</ins>
    </blockquote>

    <h3>25.4.1 Copy [alg.copy]</h3>

    <p>Modify the requirements for <code>copy_if</code> in paragraph 12 as
      follows:</p>

    <blockquote>
      Requires: The ranges <code>[first, last)</code> and <code>[result, result
      + (last - first))</code> shall not overlap. <ins>For the overloads with
      an <code>ExecutionPolicy</code> other
      than <code>std::execution::sequential_policy</code>,
      <code>std::iterator_traits&lt;InputIterator&gt;::value_type</code> shall
      be <code>MoveConstructible</code>.</ins>
    </blockquote>

    <h3>25.6.8 Remove [alg.remove]</h3>

    <p>Modify the requirements for <code>remove_copy</code>
      and <code>remove_copy_if</code> in paragraph 7 as follows:</p>

    <blockquote>
      Requires: The ranges <code>[first, last)</code> and <code>[result, result
      + (last - first))</code> shall not overlap.  The expression <code>*result
      = *first</code> shall be valid. <ins>For the overloads with
      an <code>ExecutionPolicy</code> other
      than <code>std::execution::sequential_policy</code>, the type
      of <code>*first</code> shall satisfy the <code>CopyConstructible</code>
      requirements.</ins>
    </blockquote>

    <h3>25.6.9 Unique [alg.unique]</h3>

    <p>Modify the requirements for <code>unique_copy</code>
      and <code>unique_copy_if</code> in paragraph 5 as follows:</p>

    <blockquote>
      Requires: The comparison function shall be an equivalence relation. The ranges <code>[first, last)</code> and
      <code>[result, result+(last-first))</code> shall not overlap. The
      expression <code>*result = *first</code> shall be valid.  Let <code>T</code> be the value type of
      <code>InputIterator</code>. <ins>For the overloads with
      no <code>ExecutionPolicy</code>, or an <code>ExecutionPolicy</code>
      of <code>std::execution::sequential_policy</code>, the following shall
        hold:</ins><p><ins>&mdash;</ins> If <code>InputIterator</code> meets the forward
      iterator requirements, then there are no additional requirements
      for <code>T</code>. Otherwise, if <code>OutputIterator</code> meets the
      forward iterator requirements and its value type is the same
      as <code>T</code>, then <code>T</code> shall
      be <code>CopyAssignable</code> (Table 26).  Otherwise, <code>T</code>
      shall be both <code>CopyConstructible</code> (Table 24)
      and <code>CopyAssignable.</code></p>
      <p><ins>For the overloads with an <code>ExecutionPolicy</code> other
      than <code>std::execution::sequential_policy</code>, <code>T</code> shall
      be both <code>CopyConstructible</code> (Table 24)
      and <code>CopyAssignable.</code></ins></p>
    </blockquote>

    <h3>25.6.14 Partitions [alg.partitions]</h3>

    <p>Modify the complexity for <code>partition</code> in paragraph 7 as
      follows:</p>

    <blockquote>
      Complexity: <ins>For the overloads with no <code>ExecutionPolicy</code>,
      or an <code>ExecutionPolicy</code>
        of <code>std::execution::sequential_policy</code>, i</ins><del>I</del>f
      <code>ForwardIterator</code> meets the requirements for
      a <code>BidirectionalIterator</code>, at most
      <code>(last - first) / 2</code> swaps are done; otherwise at
        most <code>last - first</code> swaps are done. Exactly <code>last -
        first</code> applications of the predicate are done. <ins>For the
        overloads with an <code>ExecutionPolicy</code> other
        than <code>std::execution::sequential_policy</code>, no worse than O(N
        log N) swaps, where N = <code>last - first</code>. Exactly <code>last -
          first</code> applications of the predicate are done.</ins>
    </blockquote>

    <p>Modify the complexity for <code>stable_partition</code> in paragraph 11 as
      follows:</p>

    <blockquote>
      Complexity: <ins>For the overloads with no <code>ExecutionPolicy</code>,
      or an <code>ExecutionPolicy</code>
      of <code>std::execution::sequential_policy</code>, a</ins><del>A</del>t
      most N log(N ) swaps, where N = <code>last - first</code>, but only O(N )
      swaps if there is enough extra memory. Exactly <code>last - first</code>
      applications of the predicate. <ins>For the overloads with
      an <code>ExecutionPolicy</code> other
      than <code>std::execution::sequential_policy</code>, O(N log N) swaps,
      where N = <code>last - first</code>. Exactly <code>last - first</code>
        applications of the predicate are done.</ins>
    </blockquote>

    <p>Modify the requirements for <code>partition_copy</code> in paragraph 12
      as follows:</p>

    <blockquote>
      Requires: <code>InputIterator</code>’s value type shall
      be <code>CopyAssignable</code>, and shall be writable (24.2.1) to
      the <code>out_true</code>
      and <code>out_false</code> <code>OutputIterator</code>s, and shall be
      convertible to <code>Predicate</code>’s argument type.  The input range shall not
      overlap with either of the output ranges. <ins>For the overloads with
      an <code>ExecutionPolicy</code> other
      than <code>std::execution::sequential_policy</code>, <code>InputIterator</code>’s
        value type shall also be <code>CopyConstructible</code>.</ins>
    </blockquote>

    <h3>25.7.2 Nth element [alg.nth.element]</h3>

    <p>Modify the complexity for <code>nth_element</code> in paragraph 3 as
      follows:</p>

    <blockquote>
      Complexity: <ins>For the overloads with no <code>ExecutionPolicy</code>,
      or an <code>ExecutionPolicy</code>
      of <code>std::execution::sequential_policy</code>,
      l</ins><del>L</del>inear on average. <ins>For the overloads with
      an <code>ExecutionPolicy</code> other
      than <code>std::execution::sequential_policy</code>, O(N) executions of
      the predicate, and no worse than O(N log N) swaps, where N = <code>last -
      first</code>.</ins>
    </blockquote>

    <h3>25.7.4 Merge [alg.merge]</h3>

    <p>Modify the complexity for <code>merge</code> in paragraph 4 as
      follows:</p>

    <blockquote>
      Complexity: <ins>For the overloads with no <code>ExecutionPolicy</code>,
        or an <code>ExecutionPolicy</code>
        of <code>std::execution::sequential_policy</code>, a</ins><del>A</del>t
        most <code>(last1 - first1) + (last2 - first2) - 1</code>
        comparisons. <ins>For the overloads with an <code>ExecutionPolicy</code>
        other than <code>std::execution::sequential_policy</code>,
        O(<code>(last1 - first1) + (last2 - first2) - 1</code>)
        comparisons.</ins>
    </blockquote>

    <p>Modify the complexity for <code>inplace_merge</code> in paragraph 8 as
      follows:</p>

    <blockquote>
      Complexity: <ins>For the overloads with no <code>ExecutionPolicy</code>,
        or an <code>ExecutionPolicy</code>
        of <code>std::execution::sequential_policy</code>,
        if</ins><del>When</del> enough additional memory is
        available, <code>(last - first) - 1</code> comparisons. <del>If no
        additional memory is available</del><ins>Otherwise</ins>, an algorithm
        with complexity <ins>at most O(</ins><code>N</code> log(<code>N</code>
        )<ins>)</ins> may be used, where <code>N = last - first</code>.
    </blockquote>

    <h3>26.8.3 Reduce [reduce]</h3>

    <p>Modify paragraph 5 as follows:</p>

    <blockquote>
      Requires: <code>binary_op</code> shall neither invalidate iterators or
      subranges, nor modify elements in the range <code>[first,
      last)</code>. <ins>For the overloads with an <code>ExecutionPolicy</code>
      other than <code>std::execution::sequential_policy</code>,
      both <code>T</code> and <code>BinaryOperation</code> shall
      be <code>CopyConstructible</code>. All
        of <code>binary_op(init,*first)</code>,
        <code>binary_op(*first,init)</code>, <code>binary_op(init,init)</code>,
      and <code>binary_op(*first,*first)</code> shall be convertible
        to <code>T</code>.</ins>
    </blockquote>

    <h3>26.8.4 Transform reduce [transform.reduce]</h3>

    <p>Modify paragraph 1 as follows:</p>

    <blockquote>
      Requires: Neither <code>unary_op</code> nor <code>binary_op</code> shall
      invalidate subranges, or modify elements in the range <code>[first,
      last)</code>. <ins>For the overloads with an <code>ExecutionPolicy</code>
      other than <code>std::execution::sequential_policy</code>, all
      of <code>T</code>, <code>UnaryOperation</code>
      and <code>BinaryOperation</code> shall
        be <code>CopyConstructible</code>. All
        of <code>binary_op(init,unary_op(*first)</code>,
        <code>binary_op(unary_op(*first),init)</code>, <code>binary_op(init,init)</code>,
      and <code>binary_op(unary_op(*first),unary_op(*first))</code> shall be
      convertible to <code>T</code>.</ins>
    </blockquote>

    <h3>26.8.11 Adjacent difference [adjacent.difference]</h3>

    <p>Change paragraph 2 as follows:</p>
    
    <blockquote>
      Effects: <ins>For the overloads with no <code>ExecutionPolicy</code>, or an
        <code>ExecutionPolicy</code>
        of <code>std::execution::sequential_policy</code>, f</ins><del>F</del>or
        a non-empty range, the function creates an accumulator <code>acc</code>
        whose type is <code>InputIterator</code>’s value type, initializes it
        with <code>*first</code>, and assigns the result
        to <code>*result</code>. For every iterator <code>i</code> in
        [<code>first + 1</code>,<code> last</code>) in order, creates an
        object <code>val</code> whose type is <code>InputIterator</code>’s value
        type, initializes it with <code>*i</code>, computes <code>val -
        acc</code> or <code>binary_op(val, acc)</code>, assigns the result
        to <code>*(result + (i - first))</code>, and move assigns
        from <code>val</code> to <code>acc</code>. <ins>For the overloads with
        an <code>ExecutionPolicy</code> other
        than <code>std::execution::sequential_policy</code>, for a non-empty
        range, for every value <code>d</code> in
        [<code>1</code>,<code>last-first-1</code>) the function
        computes <code>*(first+d)-*(first+d-1)</code>
        or <code>binary_op(*(first+d),*(first+d-1))</code> and assigns the
        result to <code>*(result + d)</code></ins>
    </blockquote>
      
    </body></html>
    
    
