<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html class="gr__open-std_org"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

<style type="text/css">
pre {margin-left:20pt; }
pre > i {
  font-family: "OCR A Extended", "Consolas", "Lucida Console", monospace;
  font-style:italic;
}
code > i {
  font-family: "OCR A Extended", "Consolas", "Lucida Console", monospace;
  font-style:italic;
}
pre > em {
  font-family: "OCR A Extended", "Consolas", "Lucida Console", monospace;
  font-style:italic;
}
code > em {
  font-family: "OCR A Extended", "Consolas", "Lucida Console", monospace;
  font-style:italic;
}
body { color: #000000; background-color: #FFFFFF; }
del { text-decoration: line-through; color: #8B0040; }
ins { text-decoration: underline; color: #005100; }

p.example { margin-left: 2em; }
pre.example { margin-left: 2em; }
div.example { margin-left: 2em; }

code.extract { background-color: #F5F6A2; }
pre.extract { margin-left: 2em; background-color: #F5F6A2;
  border: 1px solid #E1E28E; }

p.function { }
.attribute { margin-left: 2em; }
.attribute dt { float: left; font-style: italic;
  padding-right: 1ex; }
.attribute dd { margin-left: 0em; }

blockquote.std { color: #000000; background-color: #F1F1F1;
  border: 1px solid #D1D1D1;
  padding-left: 0.5em; padding-right: 0.5em; }
blockquote.stddel { text-decoration: line-through;
  color: #000000; background-color: #FFEBFF;
  border: 1px solid #ECD7EC;
  padding-left: 0.5empadding-right: 0.5em; ; }

blockquote.stdins { text-decoration: underline;
  color: #000000; background-color: #C8FFC8;
  border: 1px solid #B3EBB3; padding: 0.5em; }

table.header { border: 0px; border-spacing: 0;
  margin-left: 0px; font-style: normal; }

table { border: 1px solid black; border-spacing: 0px;
  margin-left: auto; margin-right: auto; }
th { text-align: left; vertical-align: top;
  padding-left: 0.4em; border: none;
  padding-right: 0.4em; border: none; }
td { text-align: left; vertical-align: top;
  padding-left: 0.4em; border: none;
  padding-right: 0.4em; border: none; }
</style>

<title>Remove dedicated precalculated hash lookup interface</title>
</head>

<body data-gr-c-s-loaded="true">

<table class="header"><tbody>
  <tr>
    <th>Document number:&nbsp;&nbsp;</th><th> </th><td>P1661R1</td>
  </tr>
  <tr>
    <th>Date:&nbsp;&nbsp;</th><th> </th><td>2019-07-17</td>
  </tr>
  <tr>
    <th>Audience:&nbsp;&nbsp;</th><th> </th><td>Library Working Group</td>
  </tr>
  <tr>
    <th>Reply-to:&nbsp;&nbsp;</th><th> </th><td><address>Tomasz Kamiński &lt;tomaszkam at gmail dot com&gt;</address></td>
  </tr>
</tbody></table>

<h1><a name="title">Remove dedicated precalculated hash lookup interface</a></h1>

<h2><a name="intro">1. Introduction</a></h2>

<p>This paper proposed to remove the precalculated hash lookup interface (as proposed in  <a href="https://wg21.link/p0920r2">P0920R2: Precalculated hash values in lookup</a>),
   as this functionality can be implemented by the user in less error-prone manner, via heterogeneous lookup for unordered containers (<a href="https://wg21.link/p0919r3">P0919R3</a>).</p>

<p>[ Note: The implementation of hashers from this paper are included only to illustrate how precalculated hash lookup can be implemented.
     They are not proposed for standardization. ]</p>

<!--h2><a name="toc">Table of contents</a></h2-->

<h2><a name="history">2. Revision history</a></h2>

<h3><a name="history.r0">2.2. Revision 1</a></h3>

<ul>
  <li>Changed audience to LWG.</li>
  <li>Updated links to most recent revision (R2) of P0920.</li>
</ul>


<h3><a name="history.r0">2.1. Revision 0</a></h3>

<p>Initial revision.</p>


<h2><a name="motivation">3. Motivation and Scope</a></h2>

<p>Contrary to information presented in the <a href="https://wg21.link/p0920r2">P0920R2: Precalculated hash values in lookup</a>,
   the C++20 is already providing efficient way to precalculate hash value for multiple lookups,
   via heterogeneous lookup for unordered containers (<a href="https://wg21.link/p0919r3">P0919R3</a>).</p>

<p>To illustrate, the following code provides a type-safe implementation for the <code>std::unordered_map</code>
   with <code>std::string</code> key:</p>
<pre>
template&lt;typename Key&gt;
class PrehashedStringKey
{
public:
  using key_type = Key;

  explicit PrehashedStringKey(std::size_t h, key_type v)
   : key_(std::move(v)), hash_(h)
  {}

  std::size_t hash() const
  { return hash_; }

  key_type const&amp; key() const&amp;
  { return key_; }

  key_type&amp;&amp; key() &amp;&amp;
  { return std::move(key_); }

  friend bool operator==(PrehashedStringKey const&amp; lhs, std::string_view rhs)
  { return lhs.key() == rhs; }

  template&lt;typename U&gt;
  friend bool operator==(PrehashedStringKey const&amp; lhs, PrehashedStringKey&lt;U&gt; const&amp; rhs)
  { return lhs.key() == rhs.key(); }

private:
  key_type key_;
  std::size_t hash_;
};

struct PrecalculationEnabledStringHasher
{
   using transparent_key_equal = std::equal_to&lt;&gt;;

   template&lt;typename T&gt;
   using prehashed_key_type = PrehashedStringKey&lt;T&gt;;

   std::size_t operator()(std::string_view sv) const
   { return std::hash&lt;std::string_view&gt;{}(sv); }

   template&lt;typename T&gt;
   std::size_t operator()(prehashed_key_type&lt;T&gt; const&amp; v) const
   { return v.hash(); }

   template&lt;typename T&gt;
     requires std::is_convertible_v&lt;T, std::string_view&gt;
   prehashed_key_type&lt;T&gt; precompute(T t) const
   {
     std::size_t hash = operator()(std::string_view(t));
     return prehashed_key_type&lt;T&gt;(hash, std::move(t));
   }
};
</pre>
<p>The above interface introduces a <code>precompute</code> function that bundles given key value (regardless if it is <code>std::string</code>, <code>std::string_view</code>, or <code>char const*</code>),
   that can be used with multiple <code>unordered_map</code> that uses the same hasher. To illustrate the example the from Motivation section of <a href="https://wg21.link/p0920r1">P0920R1</a>,
   will now look as follows:</p>
<pre>std::array&lt;std::unordered_map&lt;std::string, int, PrecalculationEnabledStringHasher&gt;, array_size&gt; maps;

void update(const std::string&amp; user)
{
  const auto hashedUser = maps.front().hash_function().precompute(user);
  for(auto&amp; m : maps) {
    auto it = m.find(hashedUser);
    // ...
  }
}</pre>

<h3><a name="motivation.errors.parity">3.1. Eliminate the possibility of pairing errors</a></h3>

<p>The obvious advantage of the above solution is that it eliminates the possibility of pairing key with incorrect hash
   value when passed to <code>find</code> function. For example, the following code contains undefined behaviour, due
   paring of <code>user2</code> with <code>hash1</code>:</p>
<pre>std::array&lt;std::unordered_map&lt;std::string, int&gt;, array_size&gt; maps;

void update(const std::string&amp; user1, const std::string&amp; user2)
{
  const auto hash1 = maps.front().hash_function()(user1);
  const auto hash1 = maps.front().hash_function()(user2);
  for(auto&amp; m : maps) {
    auto it1 = m.find(user1, hash1);
    auto it2 = m.find(user2, hash1); // undefined behaviour
    // ...
  }
}</pre>

<p>However, such kind of error is not possible in case of <code>PrehashedStringKey</code>
   as we are bundling the key value with the appropriate hash:</p>
<pre>std::array&lt;std::unordered_map&lt;std::string, int, PrecalculationEnabledStringHasher&gt;, array_size&gt; maps;

void update(const std::string&amp; user)
{
  const auto hashedUser1 = maps.front().hash_function().precompute(user1);
  const auto hashedUser2 = maps.front().hash_function().precompute(user2);
  for(auto&amp; m : maps) {
    auto it1 = m.find(hashedUser1);
    auto it2 = m.find(hashedUser2);
    // ...
  }
}</pre>

<p>The difference in the above interface can be compared with the difference between using a range,
   versus passing iterator/sentinel as separate arguments.</p>


<h3><a name="motivation.error.type_safety">3.2. Providing type-safety for key</a></h3>

<p>Similarly to above, the currently proposed interface does not provide type-safety for key &mdash;
   it is possible to accidentally pass a hash of unrelated type to the <code>find</code> function,
   and thus causing undefined behaviour.</p>

<p>To illustrate, lets consider following example of correct code:</p>
<pre>std::unordered_map&lt;std::chrono::milliseconds, int&gt; m1;
std::unordered_map&lt;std::chrono::milliseconds, int&gt; m2;

void update(std::chrono::seconds secs)
{
   const auto hash = m1.hash_function()(secs);
   {
      auto it1 = m1.find(secs, hash);
      // ...
   }
   {
      auto it2 = m2.find(secs, hash);
      // ...
   }
}</pre>

<p>This code will silently break (undefined behaviour) in situation when one of the maps
   would be changed to use <code>std::chrono::seconds</code> as key:</p>
<pre>std::unordered_map&lt;std::chrono::milliseconds, int&gt;&gt; m1;
std::unordered_map&lt;std::chrono::seconds, int&gt;&gt; m2;

void update(std::chrono::seconds secs)
{
   vconst auto hash = m1.hash_function()(secs);
   {
      auto it1 = m1.find(secs, hash);
      // ...
   }
   {
      auto it2 = m2.find(secs, hash); // undefined behaviour
      // ...
   }
}</pre>
<p>[ Note: That above code still compiles, as <code>std::chrono::seconds</code> are convertible to keys of both
     <code>m1</code> (<code>std::chrono::milliseconds</code>) and <code>m2</code> (<code>std::chrono::seconds</code>). ]</p>

<p>However, in case if wrapper similar to <code>PrecomptedStringHash</code> would be used (see <a href="#implementability">Implementability</a> section),
   such mistakes would be reported as compilation error:</p>
<pre>std::unordered_map&lt;std::chrono::milliseconds, int, PrecalculationEnabledHasherFor&lt;std::chrono::milliseconds&gt;&gt; m1;
std::unordered_map&lt;std::chrono::seconds, int, PrecalculationEnabledHasherFor&lt;std::chrono::seconds&gt;&gt; m2;

void update(std::chrono::seconds secs)
{
   const auto hashed = m1.hash_function().precompute(secs);
   {
      auto it1 = m1.find(hashed);
      // ...
   }
   {
      auto it2 = m2.find(hashed); // ill-formed
      // ...
   }
}</pre>

<h3><a name="motivation.error.mixing_hashers">3.3. Mixing different hashers</a></h3>

<p>Another drawback of current interface, is that it allows silently mixing of <code>unordered_map</code> instances,
   that use  different hashers. This applies both for the type of the hasher and its state.</p>

<p>To illustrate, following code contains undefined behavior, due hash mismatch:</p>
<pre>std::unordered_map&lt;std::string, int, MurmurHash&gt; m1;
std::unordered_map&lt;std::string, int, CityHash&gt; m2;

void update(std::string const&amp; secs)
{
   const auto hash = m1.hash_function()(secs);
   {
      auto it1 = m1.find(secs, hash);
      // ...
   }
   {
      auto it2 = m2.find(secs, hash); // undefined behavior
      // ...
   }
}</pre>

<p>Above problem, can be immediately addressed, by tagging the key/hash pair produced by the hasher
   with the hash type (see <a href="#implementability">Implementability</a> section).</p>
 <pre>std::unordered_map&lt;std::string, int, PrecalculationEnabledHasher&lt;MurmurHash&gt;&gt; m1;
std::unordered_map&lt;std::string, int, PrecalculationEnabledHasher&lt;CityHash&gt;&gt; m2;

void update(std::string const&amp; secs)
{
   const auto hashedKey = m1.hash_function()(secs);
   {
      auto it1 = m1.find(hashedKey);
      // ...
   }
   {
      auto it2 = m2.find(hashedKey); // ill-formed
      // ...
   }
}</pre>

<p>However, all of above does not address the possibility of using hasher of the same type,
   but having different state. [ Note: This is not an issue is all hasher use the same program-wide
   state. ]</p>

<p>One of the possible options is to introduce a check to the <code>find</code> method,
   that will rehash the object and compare it with the provided hash value. However, such
   solution eliminates are performance gains of this feature, making it more suited for the debug builds.</p>

<p>Use of the <code>PrehashedKey</code> approach, provides alternative solutions.
   Firstly, we can store the state of the hasher (salt) in key/hash bundle, and validating it inside
   <code>operator()</code>:</p>
<pre>template&lt;typename T&gt;
std::size_t operator()(prehashed_key_type&lt;T&gt; const&amp; v) const
{
  if (v.salt() != hash_.salt())
    report_error();
  return v.hash();
}
</pre>
<p>Above solution is reducing performance impact (comparing state of hasher is usually less
   expensive than rehashing of a key), and can be eliminated in case of stateless hasher
   object.</p>

<p>Secondly, we can mitigate hasher state mismatch errors, by using different hasher (and <code>PrehashedKey</code>) for each
   container definition. This can be achieved by adding an otherwise unused tag template parameter to
   <code>PrecalculationEnabledHasher</code> and <code>PrehashedKey</code>.</p>

<h3><a name="motivation.summary">3.4. Summary</a></h3>

<p>Examples presented above show that the "power user" that need to squeeze the extra performance from
   lookup of multiple <code>unordered_map</code> with the same hashes (type and state) are already able to achieve
   that goal via custom hasher.</p>

<p>In addition, with this customized implementation, each user can take a different
   approach regarding the unavoidable safety versus speed trade-off for the solution, e.g.:</p>
<ul>
  <li>always compare the state of hasher in <code>precompute</code> method,</li>
  <li>store reference to key instead of copy in <code>prehashed_key</code>,</li>
  <li>use different hasher for each container definition (via tagging).</li>
</ul>
<p>making the solution more fitting to the specific use case.</p>

<p>Finally, the hasher based solution makes it possible to localize the usage error-prone
   interface to only specific part of the code. This is not possible if the lookup mechanism
   is included as part of the <code>unordered_map</code> interface.</p>

<p>In light of the above author believes that dedicated precalculated lookup interface shall be removed
   from the standard, in favor of custom user-provided implementation, that suits their need.</p>


<h2><a name="wording">4. Proposed Wording</a></h2>

<p>Revert the changes introduced in by <a href="https://wg21.link/p0920r2">P0920R2</a>.</p>

<h2><a name="implementability">5. Implementability</a></h2>

<p>Below you may find the general implementation of the hash wrapper, that allow to extend any
   existing hasher with precalculated hash lookup functionality.</p>
<pre>
template&lt;typename Key, typename Hasher&gt;
struct PrehashedKey
{
  using key_type = Key;

  template&lt;typename U&gt;
  explicit PrehashedKey(std::size_t h, U&amp;&amp; u)
   : key_(std::forward&lt;U&gt;(u)), hash_(h)
  {}

  std::size_t hash() const
  { return hash_; }

  key_type const&amp; key() const&amp;
  { return key_; }

  key_type&amp;&amp; key() &amp;&amp;
  { return std::move(key_); }

private:
  key_type key_;
  std::size_t hash_;
};

template&lt;typename Equal, typename Hasher&gt;
class PrecalculationEnabledHashEqual
{
  [[no_unique_address]] Equal equal;

public:
  using is_transparent = void;

  template&lt;typename T&gt;
  using prehashed_key_type = PrehashedKey&lt;T, Hasher&gt;;

  PrecalculationEnabledHashEqual() = default;
  PrecalculationEnabledHashEqual(Equal e) : equal(std::move(e)) {}

  template&lt;typename T, typename U&gt;
    requires std::is_invocable_v&lt;Equal const&amp;, T const&amp;, U const&amp;&gt;
  decltype(auto) operator()(T const&amp; lhs, U const&amp; rhs) const
  {
     return equal(lhs, rhs);
  }

  template&lt;typename T, typename U&gt;
    requires std::is_invocable_v&lt;Equal const&amp;, T const&amp;, U const&amp;&gt;
  decltype(auto) operator()(prehashed_key_type&lt;T&gt; const&amp; lhs, U const&amp; rhs) const
  {
     return equal(lhs.key(), rhs);
  }

  template&lt;typename T, typename U&gt;
    requires std::is_invocable_v&lt;Equal const&amp;, T const&amp;, U const&amp;&gt;
  decltype(auto) operator()(T const&amp; lhs, prehashed_key_type&lt;U&gt; const&amp; rhs) const
  {
     return equal(lhs, rhs.key());
  }

  template&lt;typename T, typename U&gt;
    requires std::is_invocable_v&lt;Equal const&amp;, T const&amp;, U const&amp;&gt;
  decltype(auto) operator()(prehashed_key_type&lt;T&gt; const&amp; lhs, prehashed_key_type&lt;U&gt; const&amp; rhs) const
  {
     return equal(lhs.key(), rhs.key());
  }
};

template&lt;typename Hasher&gt;
struct HasherEquailty
{ using type = std::equal_to&lt;&gt;; };

template&lt;typename Hasher&gt;
  requires (requires () { typename Hasher::transparent_key_equal; })
struct HasherEquailty&lt;Hasher&gt;
{ using type = typename Hasher::transparent_key_equal; };

template&lt;typename Hasher&gt;
class PrecalculationEnabledHasher
{
  [[no_unique_address]] Hasher hasher;

public:
   using transparent_key_equal = PrecalculationEnabledHashEqual&lt;HasherEquailty&lt;Hasher&gt;, Hasher&gt;;

   template&lt;typename T&gt;
   using prehashed_key_type = PrehashedKey&lt;T, Hasher&gt;;

   PrecalculationEnabledHasher() = default;
   PrecalculationEnabledHasher(Hasher h) : hasher(std::move(h)) {}

   template&lt;typename T&gt;
     requires std::is_invocable_r_v&lt;std::size_t, Hasher const&amp;, T const&amp;&gt;
   std::size_t operator()(T const&amp; t) const
   { return hasher(t); }

   template&lt;typename T&gt;
   std::size_t operator()(prehashed_key_type&lt;T&gt; const&amp; v) const
   { return v.hash(); }

   template&lt;typename T&gt;
     requires std::is_invocable_r_v&lt;std::size_t, Hasher const&amp;, std::decay_t&lt;T&gt; const&amp;&gt;
   prehashed_key_type&lt;std::decay_t&lt;T&gt;&gt; precompute(T&amp;&amp; t) const
   {
     std::size_t hash = operator()(std::as_const(t));
     return prehashed_key_type&lt;std::decay_t&lt;T&gt;&gt;(hash, std::forward&lt;T&gt;(t));
   }
};

template&lt;typename T&gt;
using PrecalculationEnabledHasherFor = PrecalculationEnabledHasher&lt;std::hash&lt;T&gt;&gt;;
</pre>

With above code, the <code>PrecalculationEnabledStringHasher</code> is equivalent to <code>PrecalculationEnabledHasherFor&lt;std::string_view&gt;</code>.</p>


<h2><a name="acknowledgements">6. Acknowledgements</a></h2>

<p>Special thanks and recognition goes to Sabre (<a href="http://www.sabre.com/">http://www.sabre.com</a>) for supporting the production of this proposal
   and author's participation in standardization committee.</p>

<h2><a name="literature">7. References</a></h2>

<ol>
  <li>Mateusz Pusz,
      "Precalculated hash values in lookup"
      (P0920R2, <a href="http://wg21.link/p0920r2">http://wg21.link/p0920r2</a>)</li>

  <li>Mateusz Pusz,
      "Heterogeneous lookup for unordered containers"
      (P0919R3, <a href="http://wg21.link/p0919r3">http://wg21.link/p0919r3</a>)</li>

  <li>Richard Smith,
      "Working Draft, Standard for Programming Language C++"
      (N4762, <a href="https://wg21.link/n4762">https://wg21.link/n4762</a>)</li>

</ol>

</body></html>
