<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Issue 291: Underspecification of set algorithms</title>
<meta property="og:title" content="Issue 291: Underspecification of set algorithms">
<meta property="og:description" content="C++ library issue. Status: CD1">
<meta property="og:url" content="https://cplusplus.github.io/LWG/issue291.html">
<meta property="og:type" content="website">
<meta property="og:image" content="http://cplusplus.github.io/LWG/images/cpp_logo.png">
<meta property="og:image:alt" content="C++ logo">
<style>
  p {text-align:justify}
  li {text-align:justify}
  pre code.backtick::before { content: "`" }
  pre code.backtick::after { content: "`" }
  blockquote.note
  {
    background-color:#E0E0E0;
    padding-left: 15px;
    padding-right: 15px;
    padding-top: 1px;
    padding-bottom: 1px;
  }
  ins {background-color:#A0FFA0}
  del {background-color:#FFA0A0}
  table.issues-index { border: 1px solid; border-collapse: collapse; }
  table.issues-index th { text-align: center; padding: 4px; border: 1px solid; }
  table.issues-index td { padding: 4px; border: 1px solid; }
  table.issues-index td:nth-child(1) { text-align: right; }
  table.issues-index td:nth-child(2) { text-align: left; }
  table.issues-index td:nth-child(3) { text-align: left; }
  table.issues-index td:nth-child(4) { text-align: left; }
  table.issues-index td:nth-child(5) { text-align: center; }
  table.issues-index td:nth-child(6) { text-align: center; }
  table.issues-index td:nth-child(7) { text-align: left; }
  table.issues-index td:nth-child(5) span.no-pr { color: red; }
  @media (prefers-color-scheme: dark) {
     html {
        color: #ddd;
        background-color: black;
     }
     ins {
        background-color: #225522
     }
     del {
        background-color: #662222
     }
     a {
        color: #6af
     }
     a:visited {
        color: #6af
     }
     blockquote.note
     {
        background-color: rgba(255, 255, 255, .10)
     }
  }
</style>
</head>
<body>
<hr>
<p><em>This page is a snapshot from the LWG issues list, see the <a href="lwg-active.html">Library Active Issues List</a> for more information and the meaning of <a href="lwg-active.html#CD1">CD1</a> status.</em></p>
<h3 id="291"><a href="lwg-defects.html#291">291</a>. Underspecification of set algorithms</h3>
<p><b>Section:</b> 26.8.7 <a href="https://wg21.link/alg.set.operations">[alg.set.operations]</a> <b>Status:</b> <a href="lwg-active.html#CD1">CD1</a>
 <b>Submitter:</b> Matt Austern <b>Opened:</b> 2001-01-03 <b>Last modified:</b> 2016-01-28</p>
<p><b>Priority: </b>Not Prioritized
</p>
<p><b>View all other</b> <a href="lwg-index.html#alg.set.operations">issues</a> in [alg.set.operations].</p>
<p><b>View all issues with</b> <a href="lwg-status.html#CD1">CD1</a> status.</p>
<p><b>Discussion:</b></p>
<p>
The standard library contains four algorithms that compute set
operations on sorted ranges: <code>set_union</code>, <code>set_intersection</code>,
<code>set_difference</code>, and <code>set_symmetric_difference</code>.  Each
of these algorithms takes two sorted ranges as inputs, and writes the
output of the appropriate set operation to an output range.  The elements
in the output range are sorted.
</p>

<p>
The ordinary mathematical definitions are generalized so that they
apply to ranges containing multiple copies of a given element.  Two
elements are considered to be &quot;the same&quot; if, according to an
ordering relation provided by the user, neither one is less than the
other.  So, for example, if one input range contains five copies of an
element and another contains three, the output range of <code>set_union</code>
will contain five copies, the output range of
<code>set_intersection</code> will contain three, the output range of
<code>set_difference</code> will contain two, and the output range of
<code>set_symmetric_difference</code> will contain two.
</p>

<p>
Because two elements can be &quot;the same&quot; for the purposes
of these set algorithms, without being identical in other respects
(consider, for example, strings under case-insensitive comparison),
this raises a number of unanswered questions:
</p>

<ul>
<li>If we're copying an element that's present in both of the
input ranges, which one do we copy it from?</li>
<li>If there are <i>n</i> copies of an element in the relevant
input range, and the output range will contain fewer copies (say
<i>m</i>) which ones do we choose?  The first <i>m</i>, or the last
<i>m</i>, or something else?</li>
<li>Are these operations stable?  That is, does a run of equivalent
elements appear in the output range in the same order as as it
appeared in the input range(s)?</li>
</ul>

<p>
The standard should either answer these questions, or explicitly
say that the answers are unspecified.  I prefer the former option,
since, as far as I know, all existing implementations behave the
same way.
</p>



<p id="res-291"><b>Proposed resolution:</b></p>

<p>Add the following to the end of 26.8.7.3 <a href="https://wg21.link/set.union">[set.union]</a> paragraph 5:</p>
<blockquote><p>
If [first1, last1) contains <i>m</i> elements that are equivalent to
each other and [first2, last2) contains <i>n</i> elements that are
equivalent to them, then max(<i>m</i>, <i>n</i>) of these elements
will be copied to the output range: all <i>m</i> of these elements
from [first1, last1), and the last max(<i>n-m</i>, 0) of them from
[first2, last2), in that order.
</p></blockquote>

<p>Add the following to the end of 26.8.7.4 <a href="https://wg21.link/set.intersection">[set.intersection]</a> paragraph 5:</p>
<blockquote><p>
If [first1, last1) contains <i>m</i> elements that are equivalent to each
other and [first2, last2) contains <i>n</i> elements that are
equivalent to them, the first min(<i>m</i>, <i>n</i>) of those 
elements from [first1, last1) are copied to the output range.
</p></blockquote>

<p>Add a new paragraph, <b>Notes</b>, after 26.8.7.5 <a href="https://wg21.link/set.difference">[set.difference]</a>
paragraph 4:</p>
<blockquote><p>
If [first1, last1) contains <i>m</i> elements that are equivalent to each
other and [first2, last2) contains <i>n</i> elements that are
equivalent to them, the last max(<i>m-n</i>, 0) elements from 
[first1, last1) are copied to the output range.
</p></blockquote>

<p>Add a new paragraph, <b>Notes</b>, after 26.8.7.6 <a href="https://wg21.link/set.symmetric.difference">[set.symmetric.difference]</a>
paragraph 4:</p>
<blockquote><p>
If [first1, last1) contains <i>m</i> elements that are equivalent to
each other and [first2, last2) contains <i>n</i> elements that are
equivalent to them, then |<i>m - n</i>| of those elements will be
copied to the output range: the last <i>m - n</i> of these elements
from [first1, last1) if <i>m</i> &gt; <i>n</i>, and the last <i>n -
m</i> of these elements from [first2, last2) if <i>m</i> &lt; <i>n</i>.
</p></blockquote>

<p><i>[Santa Cruz: it's believed that this language is clearer than
  what's in the Standard.  However, it's also believed that the
  Standard may already make these guarantees (although not quite in
  these words).  Bill and Howard will check and see whether they think
  that some or all of these changes may be redundant.  If so, we may
  close this issue as NAD.]</i></p>




<p><b>Rationale:</b></p>
<p>For simple cases, these descriptions are equivalent to what's
  already in the Standard.  For more complicated cases, they describe
  the behavior of existing implementations.</p>





</body>
</html>
