<?xml version="1.0" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>

<head>
<meta content="text/html; charset=windows-1252" http-equiv="Content-Type" />
<title>Proposal to extend atomic with priority update functions</title>
<style type="text/css">
p {
	text-align: justify;
}
li {
	text-align: justify;
}
ins {
	color: #008040;
	text-decoration: underline;
}
del {
	color: #FF0000;
	text-decoration: line-through;
}
tbody {
	font-style: italic;
}
</style>
</head>

<body>

<h2 style="text-align: center">Proposal to extend atomic with priority update functions</h2>

<div>
	<br />
</div>

<address>
	Document number: WG21 N3696<br />
	Date: 2013-06-26<br />
	Reply-to: Bronek Kozicki <a href="mailto:brok@spamcop.net">brok@spamcop.net</a><br />
	Subgroup: SG1 - Concurrency<br />
</address>

<div>
	<br />
</div>

<h3>Introduction</h3>
<p>Recent research in concurrent programming [1] has identified a useful primitive 
which may be used to significantly reduce memory write contention in parallel and 
concurrent programs. A <em>priority update</em> is an operation which reads a memory 
location, compares its content to a provided value using a predicate, and writes 
the value to the memory location only if the predicate returns true. Two common 
predicates employed for this operation are <em>less-than</em> (in which case a priority 
update is called <em>write-with-min</em>; for consistency with the existing 
atomic operations, such as <code>fetch_add</code>, I propose the name <em>fetch-min</em>) and <em>greater-than</em> 
(which I suggest calling <em>fetch-max</em>). A generalization of this operation 
would take an arbitrary predicate provided by the user, alongside the value.</p>
<p>The operation can be robustly implemented using only an atomic load and a compare-and-swap 
(CAS) loop. Implemented as member functions of <code>std::atomic</code>, the operations might look like this:</p>
<blockquote>
	<pre>
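// Sketch of members of std::atomic&lt;T&gt;; T is the class template parameter.
template &lt;typename V&gt;
T priority_update(T value, V predicate)
{
  T read = this-&gt;load();
  // Write only while 'value' has priority over the stored value.
  while (predicate(value, read)) {
    // On failure, 'read' is refreshed with the current stored value.
    if (this-&gt;compare_exchange_weak(read, value))
      return read;   // the value that was replaced
  }
  return read;       // predicate is false: no write was performed
}

T fetch_min(T value)
{ return priority_update(value, std::less&lt;T&gt;()); }

T fetch_max(T value)
{ return priority_update(value, std::greater&lt;T&gt;()); }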

</pre>
</blockquote>
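<p>Because user code cannot add members to <code>std::atomic</code>, the snippet above cannot be compiled as written outside the standard library. The following sketch restates it as free function templates taking a <code>std::atomic&lt;T&gt;&amp;</code>; the namespace <code>sketch</code> and these free-function signatures are assumptions made for illustration, not part of the proposed wording.</p>

```cpp
#include <atomic>
#include <cassert>
#include <functional>

namespace sketch {

// Hypothetical free-function form of the proposed members: the atomic
// object is passed by reference instead of being accessed through 'this'.
template <typename T, typename Pred>
T priority_update(std::atomic<T>& obj, T value, Pred predicate)
{
  T read = obj.load();
  // Write only while 'value' has priority over the stored value.
  while (predicate(value, read)) {
    // On failure (including spurious failure), 'read' is refreshed
    // with the current stored value and the predicate is re-evaluated.
    if (obj.compare_exchange_weak(read, value))
      return read;  // the value that was replaced
  }
  return read;      // predicate is false: no write was performed
}

template <typename T>
T fetch_min(std::atomic<T>& obj, T value)
{ return sketch::priority_update(obj, value, std::less<T>()); }

template <typename T>
T fetch_max(std::atomic<T>& obj, T value)
{ return sketch::priority_update(obj, value, std::greater<T>()); }

} // namespace sketch
```

<p>For example, with <code>std::atomic&lt;int&gt; a(10)</code>, <code>sketch::fetch_min(a, 7)</code> returns 10 and stores 7, while a subsequent <code>sketch::fetch_min(a, 9)</code> returns 7 and performs no write.</p>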
<p>Paper [1] identifies a range of concurrent algorithms which, when implemented 
using the primitive described above, exhibit very good performance characteristics. 
If such algorithms were to become more popular in C++, it would be useful to provide 
the primitive in <code>&lt;atomic&gt;</code>, rather than rely on users to &quot;Bring 
Your Own&quot;. This would establish a common primitive which can 
be used for reasoning about, writing, and reading such concurrent algorithms, 
and would allow users to benefit automatically from hardware support for certain 
specializations of these operations, where it is available [2].</p>
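<p>The write-reduction effect can be observed directly. In the sketch below (the names <code>fetch_min</code>, <code>MinResult</code> and <code>concurrent_min</code> are hypothetical), several threads call a free-function <code>fetch_min</code> on one shared <code>std::atomic&lt;long long&gt;</code> and count how many calls actually stored a value. A call writes exactly when its value is smaller than the value it returns, and a thread can only write when its value is smaller than every value it has passed in before, so for inputs in random order most calls end with a plain load and no store. This is an illustration, not a benchmark.</p>

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <thread>
#include <vector>

// Free-function sketch of fetch_min (users cannot add members to std::atomic).
template <typename T>
T fetch_min(std::atomic<T>& obj, T value)
{
  T read = obj.load();
  while (value < read) {
    if (obj.compare_exchange_weak(read, value))
      return read;
  }
  return read;
}

struct MinResult {
  long long minimum;   // final value of the shared location
  long long expected;  // minimum of all inputs, computed sequentially
  std::size_t calls;   // number of fetch_min calls
  std::size_t writes;  // number of calls that stored their value
};

// Hypothetical demo: each of 'threads' threads calls fetch_min 'per_thread'
// times on one shared location, using pseudo-random inputs.
MinResult concurrent_min(unsigned threads, std::size_t per_thread)
{
  std::vector<std::vector<long long>> inputs(threads);
  long long expected = std::numeric_limits<long long>::max();
  for (unsigned t = 0; t < threads; ++t) {
    std::mt19937_64 gen(12345 + t);  // fixed seeds: reproducible inputs
    for (std::size_t i = 0; i < per_thread; ++i) {
      inputs[t].push_back(static_cast<long long>(gen() >> 1));
      expected = std::min(expected, inputs[t].back());
    }
  }

  std::atomic<long long> shared(std::numeric_limits<long long>::max());
  std::atomic<std::size_t> writes(0);
  std::vector<std::thread> pool;
  for (unsigned t = 0; t < threads; ++t) {
    pool.emplace_back([&, t] {
      std::size_t local = 0;
      for (long long v : inputs[t])
        if (v < fetch_min(shared, v))  // true exactly when this call wrote
          ++local;
      writes += local;
    });
  }
  for (std::thread& th : pool)
    th.join();
  return MinResult{shared.load(), expected, threads * per_thread, writes.load()};
}
```

<p>In a sequential run over a random permutation of <em>n</em> distinct values, the expected number of such writes is the harmonic number <em>H<sub>n</sub></em>, approximately ln <em>n</em>, whereas a plain store or exchange would write on every call.</p>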
<h3>Proposal</h3>
<p>I propose that a set of new member functions <code>priority_update</code>,
<code>fetch_min</code>, <code>fetch_max</code>, and an associated set of overloads 
taking explicit memory ordering parameters, with the behaviour shown in the 
code snippet above, be added to the template <code>std::atomic</code> for all types 
(that is, integral, pointer, and user-defined types), and that the corresponding free functions
<code>atomic_priority_update</code>, <code>atomic_fetch_min</code>, <code>atomic_fetch_max</code>
and <code>atomic_priority_update_explicit</code>, <code>atomic_fetch_min_explicit</code>, <code>atomic_fetch_max_explicit</code> be added as well.</p>
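<p>A possible shape for the proposed free functions, modelled on the existing <code>atomic_fetch_add</code> and <code>atomic_fetch_add_explicit</code>, is sketched below. The exact signatures, the <code>sketch</code> namespace, and the helper <code>load_order</code> are assumptions. One design point the explicit overloads must settle is the ordering of the initial load, because a load may not use <code>memory_order_release</code> or <code>memory_order_acq_rel</code>; this sketch derives it the same way the single-order <code>compare_exchange_weak</code> derives its failure ordering.</p>

```cpp
#include <atomic>
#include <cassert>
#include <functional>

namespace sketch {

// Assumption: the initial read uses the ordering that compare_exchange
// would use on failure, since a load may not have release semantics.
constexpr std::memory_order load_order(std::memory_order order)
{
  return order == std::memory_order_acq_rel ? std::memory_order_acquire
       : order == std::memory_order_release ? std::memory_order_relaxed
       : order;
}

template <typename T, typename Pred>
T atomic_priority_update_explicit(std::atomic<T>* obj, T value,
                                  Pred predicate, std::memory_order order)
{
  T read = obj->load(sketch::load_order(order));
  while (predicate(value, read)) {
    // The single-order overload derives a valid failure ordering from 'order'.
    if (obj->compare_exchange_weak(read, value, order))
      return read;
  }
  return read;
}

template <typename T>
T atomic_fetch_min_explicit(std::atomic<T>* obj, T value, std::memory_order order)
{ return sketch::atomic_priority_update_explicit(obj, value, std::less<T>(), order); }

template <typename T>
T atomic_fetch_max_explicit(std::atomic<T>* obj, T value, std::memory_order order)
{ return sketch::atomic_priority_update_explicit(obj, value, std::greater<T>(), order); }

// The non-explicit forms use sequential consistency, like atomic_fetch_add.
template <typename T, typename Pred>
T atomic_priority_update(std::atomic<T>* obj, T value, Pred predicate)
{ return sketch::atomic_priority_update_explicit(obj, value, predicate, std::memory_order_seq_cst); }

template <typename T>
T atomic_fetch_min(std::atomic<T>* obj, T value)
{ return sketch::atomic_fetch_min_explicit(obj, value, std::memory_order_seq_cst); }

template <typename T>
T atomic_fetch_max(std::atomic<T>* obj, T value)
{ return sketch::atomic_fetch_max_explicit(obj, value, std::memory_order_seq_cst); }

} // namespace sketch
```

<p>When the predicate is false, the operation reduces to a single load, so in this sketch a call that performs no write has only the ordering of that load.</p>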
<p>The rationale for including pointer types 
may require some explanation: the primary use of priority update is not to yield 
a meaningful number (in fact, the return value may often be ignored), but 
to significantly reduce the number of memory writes. For such uses it does not matter 
what quantity is being compared as long as a strict total order is guaranteed, so comparing 
memory addresses is, from the point of view of an algorithm designer, a perfectly 
valid operation. (For pointer types, <code>std::less</code> provides such an order even where the built-in <code>&lt;</code> operator does not.)</p>
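<p>To illustrate the pointer case, the sketch below finds the first element equal to a key in a <code>std::vector&lt;int&gt;</code> using several threads. Each thread scans its own contiguous chunk and publishes the address of its first match with a pointer <code>fetch_min</code>, so the lowest address, that is, the earliest match, wins regardless of which thread finishes first. The names <code>fetch_min</code> and <code>parallel_find_first</code> are hypothetical.</p>

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Free-function sketch of fetch_min. std::less<T> is used because its
// specializations for pointer types yield a total order.
template <typename T>
T fetch_min(std::atomic<T>& obj, T value)
{
  T read = obj.load();
  while (std::less<T>()(value, read)) {
    if (obj.compare_exchange_weak(read, value))
      return read;
  }
  return read;
}

// Hypothetical example: returns a pointer to the first element of 'data'
// equal to 'key', or data.data() + data.size() if there is none.
const int* parallel_find_first(const std::vector<int>& data, int key,
                               unsigned threads)
{
  assert(threads > 0);
  const int* const first = data.data();
  std::atomic<const int*> result(first + data.size());
  const std::size_t chunk = (data.size() + threads - 1) / threads;
  std::vector<std::thread> pool;
  for (unsigned t = 0; t < threads; ++t) {
    const std::size_t begin = std::min(data.size(), t * chunk);
    const std::size_t end = std::min(data.size(), begin + chunk);
    pool.emplace_back([=, &result] {
      for (const int* p = first + begin; p != first + end; ++p) {
        if (*p == key) {
          // Later matches in this chunk have higher addresses and could
          // never win, so each thread publishes at most one candidate.
          fetch_min(result, p);
          break;
        }
      }
    });
  }
  for (std::thread& th : pool)
    th.join();
  return result.load();
}
```

<p>Because each thread stops at its first match, each thread performs at most one priority update, and threads whose chunk contains no match perform none.</p>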
<h3>Existing practice</h3>
<p>Operations <code>Atomic_IMIN</code>, <code>Atomic_IMAX</code>, <code>Atomic_UMIN</code>, <code>Atomic_UMAX</code> in 
the Intel GFX L3 cache [2].</p>
<p>Operations <code>atomic_imin</code>, <code>atomic_imax</code>, <code>atomic_umin</code>, <code>atomic_umax</code> in Microsoft Shader Model 5 [3].</p>
<p>Paper [1] contains a large number of references to research on priority 
updates.</p>
<h3>Acknowledgments</h3>
<p>Phillip B. Gibbons and Arch Robison encouraged writing of this paper.</p>
<h3>References</h3>
<p>[1] Julian Shun, Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, &quot;Reducing Contention Through Priority Updates&quot;, CMU-CS-13-101, February 2013,
<a href="http://reports-archive.adm.cs.cmu.edu/anon/2013/CMU-CS-13-101.pdf">http://reports-archive.adm.cs.cmu.edu/anon/2013/CMU-CS-13-101.pdf</a></p>

<p>[2] Intel OpenSource HD Graphics Programmers Reference Manual (PRM), Volume 
1 Part 7: L3$/URB (Ivy Bridge), May 2012,
<a href="https://01.org/linuxgraphics/sites/default/files/documentation/ivb_ihd_os_vol1_part7.pdf">
https://01.org/linuxgraphics/sites/default/files/documentation/ivb_ihd_os_vol1_part7.pdf</a></p>

<p>[3] Microsoft Shader Model 5 Assembly, <a href="http://msdn.microsoft.com/en-us/library/windows/desktop/hh447232%28v=vs.85%29.aspx">
http://msdn.microsoft.com/en-us/library/windows/desktop/hh447232%28v=vs.85%29.aspx</a></p>


</body>

</html>