<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>Binary Search with Heterogeneous Comparison</TITLE>
<META content="MSHTML 5.00.2919.6307" name=GENERATOR>
<META content="text/html; charset=windows-1252" http-equiv=Content-Type></HEAD>
<BODY bgColor=#ffffff text=#000000>
<H1>Binary Search with Heterogeneous Comparison</H1>
<P>Document number: J16-01/0027 = WG21 N1313 
<P>Author: David Abrahams 
<P>Date: 18-April-2001 
<H2>Introduction</H2>
<P>The standard library provides the following powerful binary search algorithms 
which operate on sorted ranges. Each of these functions takes the following 
parameters: 
<UL>
  <LI><TT>start</TT> - an iterator indicating the start of a range to search 
  <LI><TT>finish</TT> - an iterator indicating the end of a range to search 
  <LI><TT>value</TT> - a value to be searched for 
  <LI><TT>compare</TT> - an optional comparison function </LI></UL>
<TABLE border=1>
  <TBODY>
  <TR>
    <TH>Operation 
    <TH>Effects 
  <TR>
    <TD><TT><A 
      href="http://www.sgi.com/tech/stl/binary_search.html">binary_search</A></TT> 

    <TD>Return <TT>true</TT> iff [<TT>start</TT>, <TT>finish</TT>) contains an 
      element matching <TT>value</TT> 
  <TR>
    <TD><TT><A 
      href="http://www.sgi.com/tech/stl/lower_bound.html">lower_bound</A></TT> 
    <TD>Find a <TT>value</TT>'s first order-preserving position in 
      [<TT>start</TT>, <TT>finish</TT>) 
  <TR>
    <TD><TT><A 
      href="http://www.sgi.com/tech/stl/upper_bound.html">upper_bound</A></TT> 
    <TD>Find a <TT>value</TT>'s last order-preserving position in 
      [<TT>start</TT>, <TT>finish</TT>) 
  <TR>
    <TD><TT><A 
      href="http://www.sgi.com/tech/stl/equal_range.html">equal_range</A></TT> 
    <TD>Find a <TT>value</TT>'s first <I>and</I> last order-preserving 
      positions in [<TT>start</TT>, <TT>finish</TT>) </TD></TR></TBODY></TABLE>
<P>If the comparison function is omitted, the search uses the less-than operator 
to compare the supplied value to elements of the range. Unfortunately, there is 
a large class of useful applications of binary searching for which the standard 
library algorithms are not guaranteed to work. 
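<P>For concreteness, the following sketch runs all four algorithms with the default less-than comparison on a sorted <TT>std::vector&lt;int&gt;</TT>. The <TT>SearchResult</TT> struct and <TT>search_all</TT> function are hypothetical names introduced only for this illustration; positions are reported as offsets from the start of the range.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Results of the four standard binary searches for one key
// (hypothetical helper type, for illustration only).
struct SearchResult
{
  bool found;                 // std::binary_search
  std::ptrdiff_t lower;       // std::lower_bound, as an offset from begin()
  std::ptrdiff_t upper;       // std::upper_bound, as an offset from begin()
  std::ptrdiff_t range_first; // std::equal_range().first, as an offset
  std::ptrdiff_t range_last;  // std::equal_range().second, as an offset
};

// Runs all four algorithms; v must already be sorted by operator<.
SearchResult search_all(const std::vector<int>& v, int value)
{
  typedef std::vector<int>::const_iterator Iter;
  SearchResult r;
  r.found = std::binary_search(v.begin(), v.end(), value);
  r.lower = std::lower_bound(v.begin(), v.end(), value) - v.begin();
  r.upper = std::upper_bound(v.begin(), v.end(), value) - v.begin();
  std::pair<Iter, Iter> eq = std::equal_range(v.begin(), v.end(), value);
  r.range_first = eq.first - v.begin();
  r.range_last = eq.second - v.begin();
  return r;
}
```

<P>For the sorted sequence 1, 3, 3, 3, 7 and the value 3, <TT>search_all</TT> reports that the value is found, with a lower bound at offset 1 and an upper bound at offset 4 (so <TT>equal_range</TT> yields offsets [1, 4)). For the absent value 5, both bounds fall at offset 4, the one position where 5 could be inserted without disturbing the order.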
<H2>Example</H2>
<P>Suppose you tried to use these functions to implement a dictionary lookup. A 
dictionary is composed of entries, each of which contains a word plus the word's 
definition. You would like to be able to look up a word's dictionary entry 
given just the word, without being forced to build an entire dictionary entry just 
to do the search, since the definition part of that entry would not be 
used at all. To experienced users of hand-crafted binary searches, this usage is 
familiar and reliable. For example: 
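<BLOCKQUOTE><PRE>#include &lt;cstddef&gt;
#include &lt;stdexcept&gt;
#include &lt;string&gt;
#include &lt;utility&gt;
#include &lt;vector&gt;

// Type definitions
typedef std::string Word;
typedef std::string Definition;
typedef std::pair&lt;Word, Definition&gt; DictionaryEntry;
typedef std::vector&lt;DictionaryEntry&gt; Dictionary;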

// Binary search for the position of word in d.
// Almost exactly like std::lower_bound
Dictionary::const_iterator word_position(
    const Dictionary&amp; d,
    const Word&amp; word)
{
  Dictionary::const_iterator first = d.begin();
  std::size_t len = d.size();
  while (len &gt; 0) {
    const std::size_t half = len &gt;&gt; 1;
    const Dictionary::const_iterator middle = first + half;
    if (middle-&gt;first &lt; word) {  // compare the entry's word with the key
      first = middle;
      ++first;
      len -= half + 1;
    }
    else {
      len = half;
    }
  }
  return first;
}

// Return a pointer to the definition of the given word, or 0 if the
// word doesn't appear in the dictionary
const Definition*
find_definition(const Dictionary&amp; d, const Word&amp; word)
{
  Dictionary::const_iterator p = word_position(d, word);
  return (p == d.end() || p-&gt;first != word) ? 0 : &amp;p-&gt;second;
}

// Define a word in the dictionary or throw if already defined
void define_word(
  Dictionary&amp; d, 
  const Word&amp; word, 
  const Definition&amp; definition)
{
  Dictionary::const_iterator p = word_position(d, word);
  if (p != d.end() &amp;&amp; p-&gt;first == word) {
    throw std::runtime_error("duplicate definition");
  }
  else {
    d.insert(d.begin() + (p - d.begin()),
             DictionaryEntry(word, definition));
  }
}

</PRE></BLOCKQUOTE>
<P>The question is, instead of writing the word_position() function above, which 
is tedious and error-prone, can we reuse the generic algorithms in the standard 
library? This is what Scott Meyers was trying to accomplish in the comp.std.c++ 
thread entitled <A 
href="http://groups.google.com/groups?hl=en&amp;lr=&amp;safe=off&amp;ic=1&amp;th=70fa0a02e7d49135,31&amp;seekm=MPG.143ab91a14f21268989743%40news.supernews.com#p">Heterogeneous 
comparison in binary search</A>. 
<P>Certainly, with nearly all implementations<A 
href="#1">[1]</A> of the standard 
library, we can use the following comparison function object with 
std::lower_bound(), and it will give us the expected results: 
<BLOCKQUOTE><PRE>// A "heterogeneous comparison object"
struct CompareEntryWord1
{
  bool operator()(const DictionaryEntry&amp; e, const Word&amp; w) const
    { return e.first &lt; w; }
};

</PRE></BLOCKQUOTE>
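<P>For illustration, here is how <TT>word_position()</TT> and <TT>find_definition()</TT> could be rewritten on top of <TT>std::lower_bound()</TT> using that object. This is a sketch of the intended usage, not a claim that the current wording sanctions it; the legality of exactly this call is what the rest of this paper examines. The typedefs are repeated from the example above so that the fragment stands alone.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

typedef std::string Word;
typedef std::string Definition;
typedef std::pair<Word, Definition> DictionaryEntry;
typedef std::vector<DictionaryEntry> Dictionary;

// Heterogeneous comparison: range element on the left, bare key on the
// right, matching the argument order in which lower_bound calls comp.
struct CompareEntryWord1
{
  bool operator()(const DictionaryEntry& e, const Word& w) const
    { return e.first < w; }
};

// word_position() delegated to std::lower_bound; no DictionaryEntry
// has to be constructed just to hold the search key.
Dictionary::const_iterator word_position(const Dictionary& d, const Word& word)
{
  return std::lower_bound(d.begin(), d.end(), word, CompareEntryWord1());
}

// Pointer to the definition of word, or 0 if word is not in d.
const Definition* find_definition(const Dictionary& d, const Word& word)
{
  Dictionary::const_iterator p = word_position(d, word);
  return (p == d.end() || p->first != word) ? 0 : &p->second;
}
```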
<P>But is it legal? The standard's position on this question is not encouraging. 
For one thing, 25.3 says that for the algorithms to work correctly, the 
comparison object has to induce a strict weak ordering on the values. If we take 
"the values" to mean the elements of the iterator range, then our comparison 
function clearly fails: you can't use it when both arguments are 
DictionaryEntry objects. The standard also says the algorithms "assume that the 
sequence being searched is in order according to the implied or explicit 
comparison function," which makes little sense when the comparison function 
can't compare the sequence elements. 
<P>Technically, though, we can satisfy the standard's requirements for the 
comparison function by adding an overloaded <TT>operator()()</TT> to "keep the 
language lawyers happy": 
<BLOCKQUOTE><PRE>struct CompareEntryWord2
{
  // Heterogeneous comparison actually gets used
  bool operator()(const DictionaryEntry&amp; e, const Word&amp; w) const
    { return e.first &lt; w; }

  // Homogeneous comparison just to satisfy the legal requirements.
  bool operator()(
    const DictionaryEntry&amp; e, const DictionaryEntry&amp; w) const
    { return e.first &lt; w.first; }
};

</PRE></BLOCKQUOTE>
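<P>As a sketch of how the two overloads divide the work, the hypothetical <TT>sort_and_find()</TT> function below passes the same <TT>CompareEntryWord2</TT> object to <TT>std::sort()</TT>, which compares two range elements and so uses only the homogeneous overload, and to <TT>std::lower_bound()</TT>, which compares range elements against the key and so uses only the heterogeneous one. The parameters of the homogeneous overload are renamed here for clarity.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

typedef std::string Word;
typedef std::string Definition;
typedef std::pair<Word, Definition> DictionaryEntry;
typedef std::vector<DictionaryEntry> Dictionary;

struct CompareEntryWord2
{
  // Heterogeneous comparison: selected by lower_bound (entry, key).
  bool operator()(const DictionaryEntry& e, const Word& w) const
    { return e.first < w; }

  // Homogeneous comparison: selected by sort (entry, entry).
  bool operator()(const DictionaryEntry& e1, const DictionaryEntry& e2) const
    { return e1.first < e2.first; }
};

// Hypothetical helper: sorts d by word, then finds word's position,
// using one function object for both steps.
Dictionary::const_iterator sort_and_find(Dictionary& d, const Word& word)
{
  std::sort(d.begin(), d.end(), CompareEntryWord2());
  const Dictionary& cd = d;  // search through const_iterators
  return std::lower_bound(cd.begin(), cd.end(), word, CompareEntryWord2());
}
```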
<P>This version is arguably legal, but it subverts the intent of the standard. 
The authors of that text clearly never meant to leave this loophole in there for 
us. One dead giveaway is that the EFFECTS: clause for lower_bound says "Finds 
the first position into which value can be inserted without violating the 
ordering." Clearly, when the value doesn't have the same type as the rest of the 
range, the clause becomes nonsensical<A 
href="#2">[2]</A>. 
<P>Dietmar Kuehl has suggested an alternative which doesn't suffer these 
problems: define a new iterator which wraps Dictionary::const_iterator, but 
returns a const Word&amp; (instead of a const DictionaryEntry&amp;) when 
dereferenced. This approach has two new problems, though: 
<OL>
  <LI>It won't work for many similar cases where the value to be compared with 
  must be computed from the elements of the range, because an iterator used with 
  the standard binary searches is required to return a reference type when 
  dereferenced. It may well be that no appropriate lvalue exists to which a 
  reference can be returned. 
  <LI>It requires much more code than simply rewriting the binary search 
  algorithm (<A 
  href="http://groups.google.com/groups?hl=en&amp;lr=&amp;safe=off&amp;th=959c8231aa8754b8,7&amp;rnum=4&amp;ic=1&amp;selm=8qsvee%24tk0%241%40nnrp1.deja.com">this 
  posting</A> shows an example), and is correspondingly more error-prone. 
</LI></OL>
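<P>To make the comparison with <TT>word_position()</TT> concrete, here is a minimal sketch of such an adaptor. It is our own rendering of the idea, not the code from the posting cited above, and <TT>WordIterator</TT> is a hypothetical name. Because <TT>std::lower_bound()</TT> is most efficient with random access iterators, the adaptor forwards the full random access interface. It can return <TT>const Word&amp;</TT> only because every <TT>DictionaryEntry</TT> contains a <TT>Word</TT> lvalue, which is precisely what is missing in the cases described in point 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

typedef std::string Word;
typedef std::string Definition;
typedef std::pair<Word, Definition> DictionaryEntry;
typedef std::vector<DictionaryEntry> Dictionary;

// Hypothetical adaptor: wraps Dictionary::const_iterator but dereferences
// to the entry's Word, so plain std::lower_bound compares Word with Word.
class WordIterator
{
public:
  typedef std::random_access_iterator_tag iterator_category;
  typedef Word value_type;
  typedef std::ptrdiff_t difference_type;
  typedef const Word* pointer;
  typedef const Word& reference;

  WordIterator() {}
  explicit WordIterator(Dictionary::const_iterator it) : it_(it) {}
  Dictionary::const_iterator base() const { return it_; }

  reference operator*() const { return it_->first; }
  pointer operator->() const { return &it_->first; }
  reference operator[](difference_type n) const { return it_[n].first; }

  WordIterator& operator++() { ++it_; return *this; }
  WordIterator operator++(int) { WordIterator t(*this); ++it_; return t; }
  WordIterator& operator--() { --it_; return *this; }
  WordIterator operator--(int) { WordIterator t(*this); --it_; return t; }
  WordIterator& operator+=(difference_type n) { it_ += n; return *this; }
  WordIterator& operator-=(difference_type n) { it_ -= n; return *this; }
  WordIterator operator+(difference_type n) const { return WordIterator(it_ + n); }
  WordIterator operator-(difference_type n) const { return WordIterator(it_ - n); }
  difference_type operator-(const WordIterator& o) const { return it_ - o.it_; }

  bool operator==(const WordIterator& o) const { return it_ == o.it_; }
  bool operator!=(const WordIterator& o) const { return it_ != o.it_; }
  bool operator<(const WordIterator& o) const { return it_ < o.it_; }
  bool operator>(const WordIterator& o) const { return it_ > o.it_; }
  bool operator<=(const WordIterator& o) const { return it_ <= o.it_; }
  bool operator>=(const WordIterator& o) const { return it_ >= o.it_; }

private:
  Dictionary::const_iterator it_;
};

// n + it, also required of random access iterators.
inline WordIterator operator+(std::ptrdiff_t n, const WordIterator& it)
{
  return it + n;
}

// word_position() using the unmodified homogeneous lower_bound.
Dictionary::const_iterator word_position(const Dictionary& d, const Word& word)
{
  return std::lower_bound(WordIterator(d.begin()), WordIterator(d.end()), word).base();
}
```

<P>Even this stripped-down adaptor is several times longer than the hand-written <TT>word_position()</TT> it replaces, and it still omits niceties such as comparison with the underlying iterator type.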
<P>I think "the right answer" in the long run is to figure out how to loosen 
the standard's requirements so that CompareEntryWord1 can be guaranteed to work 
as expected. Matt Austern made a first stab at it in <A 
href="http://groups.google.com/groups?start=10&amp;hl=en&amp;lr=&amp;safe=off&amp;th=70fa0a02e7d49135,31&amp;rnum=12&amp;ic=1&amp;selm=39D2B72E.58D35CF2%40ihug.co.nz">this 
posting</A>, but was discouraged to find that his first attempt, though probably 
already too complicated, wasn't complicated enough to do the job. This is the 
formulation he ended up with: 
<BLOCKQUOTE><I>comp</I> is a function object whose first argument type is 
  <I>V</I> and whose second argument type is <I>T</I>, and where <I>comp(x, 
  y)</I> is equivalent to <I>comp'(pi(x),&nbsp;y)</I>, where <I>comp'</I> is 
  some strict weak ordering on <I>T</I> and where <I>pi</I> is some homomorphism 
  from <I>V</I> to <I>T</I>. The sequence <TT>[first, last)</TT> must be sorted 
  in ascending order by <I>comp''</I>, where <I>comp''(x, y)</I> is equivalent 
  to <I>comp'(pi(x), pi(y))</I>. </BLOCKQUOTE>
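<P>To see what this formulation asks a programmer to identify, here is one possible instantiation for the dictionary example, assuming <I>pi</I> extracts the entry's word and <I>comp'</I> is <TT>Word</TT>'s <TT>operator&lt;</TT>. The names <TT>Pi</TT>, <TT>CompPrime</TT>, <TT>Comp</TT>, and <TT>CompDoublePrime</TT> are hypothetical, chosen only to mirror the symbols in the quoted text.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::string Word;
typedef std::string Definition;
typedef std::pair<Word, Definition> DictionaryEntry;  // V
typedef std::vector<DictionaryEntry> Dictionary;      // T is Word

// pi: maps a V (DictionaryEntry) to a T (Word).
struct Pi
{
  const Word& operator()(const DictionaryEntry& e) const { return e.first; }
};

// comp': a strict weak ordering on T, here Word's operator<.
struct CompPrime
{
  bool operator()(const Word& a, const Word& b) const { return a < b; }
};

// comp(x, y) is defined as comp'(pi(x), y); only this goes to lower_bound.
struct Comp
{
  bool operator()(const DictionaryEntry& x, const Word& y) const
    { return CompPrime()(Pi()(x), y); }
};

// comp''(x, y) is defined as comp'(pi(x), pi(y)); the range must be sorted by it.
struct CompDoublePrime
{
  bool operator()(const DictionaryEntry& x, const DictionaryEntry& y) const
    { return CompPrime()(Pi()(x), Pi()(y)); }
};

// Precondition check: no adjacent pair of entries is out of order under comp''.
bool is_sorted_by_comp_double_prime(const Dictionary& d)
{
  for (std::size_t i = 1; i < d.size(); ++i)
    if (CompDoublePrime()(d[i], d[i - 1]))
      return false;
  return true;
}

Dictionary::const_iterator word_position(const Dictionary& d, const Word& word)
{
  return std::lower_bound(d.begin(), d.end(), word, Comp());
}
```

<P>Only <TT>Comp</TT> is actually passed to the algorithm; <TT>Pi</TT>, <TT>CompPrime</TT>, and <TT>CompDoublePrime</TT> exist solely to state, and check, the precondition.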
<P>Even if this is formally correct, it is probably beyond the ken of most 
committee members to verify its correctness, and beyond the ken of even most 
expert programmers to verify that their comparison functions satisfy these 
criteria. That makes it a bad choice for the standard on both counts. 
<P>The problem with both of these early attempts is that they focus on the sort 
order of the range. This is an obvious way to think of things if the range 
elements and the search key are the same type, but I think to solve the problem 
for the case we're interested in, a shift in thinking is required. Strict weak 
ordering is a great concept for sorting, but maybe it's not appropriate for 
searching. 
<P>Suppose, as a simplification, that we think about the search key as though it 
were bound to one of the arguments of the comparison function (say, using 
<TT>std::bind2nd</TT> for <TT>lower_bound()</TT>). That gives us a simple unary 
comparison function object operating on elements of the range and returning 
bool. In the lower_bound algorithm, we are searching for the first element for 
which the unary function object returns false (or the end position if no such 
element exists). For this to work, of course, the unary function object must 
return true for zero or more initial elements, and false for all the rest. That 
is, the sequence must be <I>partitioned</I> with respect to 
<TT>comp(e,&nbsp;value)</TT>, where <TT>value</TT> is the search key. I believe this 
formulation captures what is actually going on in binary searching more generally, 
and, to boot, is simpler to express. 
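<P>The partition view leads directly to a binary search driven by a unary predicate. The sketch below is illustrative: <TT>first_false()</TT> and <TT>EntryPrecedesWord</TT> are hypothetical names, and <TT>EntryPrecedesWord</TT> hand-writes the binding that <TT>std::bind2nd</TT> would otherwise perform (<TT>bind2nd</TT> requires an adaptable binary function, which <TT>CompareEntryWord1</TT> is not). The search never compares two range elements with each other, so only the partition property is needed.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

typedef std::string Word;
typedef std::string Definition;
typedef std::pair<Word, Definition> DictionaryEntry;
typedef std::vector<DictionaryEntry> Dictionary;

// Binary search for the first element of [first, last) for which pred is
// false, or last if there is none. Precondition: [first, last) is
// partitioned by pred (every true precedes every false).
template <class ForwardIterator, class Predicate>
ForwardIterator first_false(ForwardIterator first, ForwardIterator last, Predicate pred)
{
  typedef typename std::iterator_traits<ForwardIterator>::difference_type Diff;
  Diff len = std::distance(first, last);
  while (len > 0) {
    Diff half = len / 2;
    ForwardIterator middle = first;
    std::advance(middle, half);
    if (pred(*middle)) {  // middle lies in the "true" prefix: look after it
      first = ++middle;
      len -= half + 1;
    }
    else {                // answer is middle or something before it
      len = half;
    }
  }
  return first;
}

// comp(e, value) with the search key bound in; stores a copy of the key.
struct EntryPrecedesWord
{
  explicit EntryPrecedesWord(const Word& w) : word(w) {}
  bool operator()(const DictionaryEntry& e) const { return e.first < word; }
  Word word;
};

Dictionary::const_iterator word_position(const Dictionary& d, const Word& word)
{
  return first_false(d.begin(), d.end(), EntryPrecedesWord(word));
}
```

<P>Because only the partition matters, the range need not even be sorted: the sequence 5, 2, 8, 30, 11, 15 is partitioned by <TT>x &lt; 10</TT>, and <TT>first_false()</TT> finds the 30 at offset 3.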
<P>This point-of-view is reflected in the currently proposed wording for <A 
href="http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/lwg-active.html#270">library 
issue 270</A>. 
<H2>Footnotes</H2>
<P><A name=1>[1]</A> SGI's library implementation actually has some fancy 
"concept checks" which try to make sure you're following all the rules at 
compile time, and it would fail to compile the use of the above comparison 
object with lower_bound. 
<P><A name=2>[2]</A> On the other hand, the EFFECTS: clause is arguably 
redundant, since the result of the algorithm is much more clearly specified by 
the RETURNS: clause, which still makes perfect sense: 
<BLOCKQUOTE>Returns: The furthermost iterator i in the range [first, last] 
  such that for any iterator j in the range [first, i) the following 
  corresponding conditions hold: *j &lt; value or comp(*j, value) != false 
</BLOCKQUOTE>
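<P>The RETURNS: clause can be transcribed almost word for word. In the sketch below, <TT>returns_clause()</TT> is a hypothetical name, and its quadratic loop is meant only as an executable restatement of the specification, not as a practical search. It evaluates nothing but <TT>comp(*j, value)</TT>, so it is well defined even when the element and value types differ:

```cpp
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> DictionaryEntry;
typedef std::vector<DictionaryEntry> Dictionary;

struct CompareEntryWord1
{
  bool operator()(const DictionaryEntry& e, const std::string& w) const
    { return e.first < w; }
};

// Literal transcription of lower_bound's RETURNS: clause: the furthermost
// i in [first, last] such that comp(*j, value) holds for every j in
// [first, i). Quadratic; for illustration only.
template <class Iterator, class T, class Compare>
Iterator returns_clause(Iterator first, Iterator last, const T& value, Compare comp)
{
  Iterator best = first;  // i == first qualifies vacuously
  for (Iterator i = first; i != last; ) {
    ++i;                  // next candidate i in (first, last]
    bool all_hold = true;
    for (Iterator j = first; j != i; ++j)
      if (!comp(*j, value)) { all_hold = false; break; }
    if (all_hold)
      best = i;           // later qualifying candidates are further along
  }
  return best;
}
```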
<P>Revised 15 Feb 2001 
<P> Copyright David Abrahams 2001. Permission to copy, use, modify, sell and 
distribute this document is granted provided this copyright notice appears in 
all copies. This document is provided "as is" without express or implied 
warranty, and with no claim as to its suitability for any purpose. <!--  LocalWords:  HTML html charset alt gif abrahams htm const
        incrementable david abrahams
         --><!--  LocalWords:  jeremy siek mishandled interoperable typename struct Iter iter src
         --><!--  LocalWords:  int bool ForwardIterator BidirectionalIterator BaseIterator
         --><!--  LocalWords:  RandomAccessIterator DifferenceType AdaptableUnaryFunction
         --><!--  LocalWords:  iostream hpp sizeof InputIterator constness ConstIterator
         David Abrahams
         --></P></BODY></HTML>
