<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html><head>
<meta http-equiv="content-type" content="text/html; charset=windows-1252">
  <style type="text/css">
h3 {  margin-bottom: 0;}
.comment { color: #999999; font-style: italic; }
.pre { color: #000099; }
.string { color: #009900; }
.char { color: #009900; }
.float { color: #996600; }
.int { color: #999900; }
.bool { color: #000000; font-weight: bold; }
.type { color: #FF6633; }
.flow { color: #FF0000; }
.keyword { color: #990000; }
.operator { color: #663300; font-weight: bold; }
pre.code {
	border: 2px solid #666;
	background-color: #F4F4F4;
	padding-left: 10px;
	padding-top: 0px;
}
code {
	border: 2px solid #d0d0d0;
	background-color: LightYellow;
	padding: 2px;
	padding-left: 10px;
	display:table;
	white-space:pre;
	margin:2px;
	margin-bottom:10px;
}
dt {
	font-weight: bold;
}
.ins {
	background-color:#A0FFA0;
}
.del {
	background-color:#FFA0A0;
	text-decoration:line-through
}    
</style>

<title>Timing barriers</title>
</head>

<body>
<p>Document number: P0342R0<br>
Date: 2016-05-30<br>
Reply-To:&nbsp;&nbsp;Mike Spertus, Symantec (<a href="mailto:mike_spertus@symantec.com">mike_spertus@symantec.com</a>)<br>
Audience: Evolution Working Group
</p>
<h1>Timing barriers</h1> 
<h2>Abstract</h2>
Timing blocks of code, whether for profiling or for comparing alternative implementations,
is central to production-quality software development. However, code reordering by modern
	compilers can invalidate such measurements. This paper proposes a simple implementation-defined
	timing fence to support such use cases.
<h2>The problem</h2>
When teaching my students C++, I wanted to demonstrate how poor the performance of the 
	naive implementation of the Fibonacci sequence is, so I ran the following code.
<code>#include&lt;chrono&gt;
#include&lt;atomic&gt;
#include&lt;iostream&gt;
using namespace std;

size_t
fib(size_t n)
{
  if (n == 0)
    return n;
  if (n == 1)
    return 1;
  return fib(n - 1) + fib(n - 2);
}

int const which{42};

int main()
{
  auto start = chrono::high_resolution_clock::now();
  auto result = fib(which);
  auto end = chrono::high_resolution_clock::now();
  cout &lt;&lt; "fib(" &lt;&lt; which &lt;&lt; ") is " &lt;&lt; result &lt;&lt; endl;
  cout &lt;&lt; "Elapsed time is " &lt;&lt; chrono::duration_cast&lt;chrono::milliseconds&gt;(end - start).count() &lt;&lt; "ms" &lt;&lt; endl;

  return 0;
}
</code>
<p>I was surprised to discover that the program printed an elapsed time of zero milliseconds, and quickly saw
	that I had embarrassingly neglected the possibility that the compiler would move the calculation out of
	the timing region. However, when I tried to fix this problem to get valid timings, I realized that
	the <em>as-if rule</em> means that there is no standards-compliant way to do this basic activity.
	It was tempting to try mutexes or atomic fences, which also inhibit code motion, but they do so only
	with respect to other threads and therefore impose no requirements on this single-threaded program.
	I was able to use separate compilation to get a proper demonstration, but even that is vulnerable
	to whole-program optimization, link-time code generation, and similar techniques. As non-local compiler
	optimizations improve, the ability to benchmark algorithms in a portable way will become increasingly degraded.</p>
<h2>The proposal</h2>
Without undermining the <em>as-if rule</em>, there appears to be no way to normatively
require such timings to be correct. Nevertheless, timing a block of code or an algorithm is not devoid 
	of meaning. This suggests the following simple proposal: Add an implementation-defined <tt>std::timing_fence()</tt>
	function that serves as a hint that code should not be moved from one side of the fence to the other. While 
	compiler writers are free to ignore it as a matter of compliance, 
	keeping the measured code inside the benchmarked region would clearly count as a real-world
	quality-of-implementation improvement.
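<p>To make the intended usage concrete, here is a minimal sketch of how such a fence might bracket a timed region.
	Everything in the <tt>sketch</tt> namespace is hypothetical: <tt>sketch::timing_fence()</tt> merely stands in for the
	proposed <tt>std::timing_fence()</tt>, and its body assumes a GCC- or Clang-style empty <tt>asm</tt> statement with a
	<tt>"memory"</tt> clobber, falling back to <tt>std::atomic_signal_fence</tt> elsewhere. Because a memory clobber does
	not by itself constrain a pure computation held in registers, the sketch also adds a hypothetical <tt>pin()</tt> helper
	that passes the input and the result through the barrier. None of this behavior is guaranteed by the standard, which is
	exactly the gap the proposal is meant to address.</p>

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>

namespace sketch {  // hypothetical names; not part of any standard library

// Stand-in for the proposed std::timing_fence(). Assumption: on GCC/Clang an
// empty asm statement with a "memory" clobber acts as a compiler-level barrier
// for memory accesses. Elsewhere we fall back to a compiler-only fence.
inline void timing_fence() {
#if defined(__GNUC__) || defined(__clang__)
  asm volatile("" ::: "memory");
#else
  std::atomic_signal_fence(std::memory_order_seq_cst);
#endif
}

// Makes `value` opaque to the optimizer at this point in the program, so that
// work producing it must finish before, and work consuming it must start after.
inline void pin(std::size_t& value) {
#if defined(__GNUC__) || defined(__clang__)
  asm volatile("" : "+r"(value) : : "memory");
#else
  volatile std::size_t sink = value;  // round trip through a volatile object
  value = sink;
#endif
}

// Naive Fibonacci, as in the example above.
inline std::size_t fib(std::size_t n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

struct Timing {
  std::size_t result;
  std::chrono::steady_clock::duration elapsed;
};

// Times fib(n) with the computation bracketed by fences.
inline Timing time_fib(std::size_t n) {
  using clock = std::chrono::steady_clock;

  timing_fence();
  auto start = clock::now();
  pin(n);                       // fib(n) cannot begin before this point
  std::size_t result = fib(n);
  pin(result);                  // fib(n) must be complete by this point
  auto end = clock::now();
  timing_fence();

  return Timing{result, end - start};
}

}  // namespace sketch
```

<p>Calling <tt>sketch::time_fib(42)</tt> from <tt>main</tt> and printing <tt>elapsed</tt> would be the fenced counterpart
	of the original program. With a standard <tt>std::timing_fence()</tt>, the compiler-specific pieces would reduce
	to two portable calls, with the quality of the result left to the implementation.</p>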
</body>
</html>