<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
  <title>p0037r7</title>
  <style type="text/css">
      code{white-space: pre-wrap;}
      span.smallcaps{font-variant: small-caps;}
      span.underline{text-decoration: underline;}
      div.column{display: inline-block; vertical-align: top; width: 50%;}
  </style>
  <style type="text/css">
a.sourceLine { display: inline-block; line-height: 1.25; }
a.sourceLine { pointer-events: none; color: inherit; text-decoration: inherit; }
a.sourceLine:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode { white-space: pre; position: relative; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
code.sourceCode { white-space: pre-wrap; }
a.sourceLine { text-indent: -1em; padding-left: 1em; }
}
pre.numberSource a.sourceLine
  { position: relative; left: -4em; }
pre.numberSource a.sourceLine::before
  { content: attr(title);
    position: relative; left: -1em; text-align: right; vertical-align: baseline;
    border: none; pointer-events: all; display: inline-block;
    -webkit-touch-callout: none; -webkit-user-select: none;
    -khtml-user-select: none; -moz-user-select: none;
    -ms-user-select: none; user-select: none;
    padding: 0 4px; width: 4em;
    color: #aaaaaa;
  }
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa;  padding-left: 4px; }
div.sourceCode
  {  }
@media screen {
a.sourceLine::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; } /* Alert */
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #7d9029; } /* Attribute */
code span.bn { color: #40a070; } /* BaseN */
code span.bu { } /* BuiltIn */
code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4070a0; } /* Char */
code span.cn { color: #880000; } /* Constant */
code span.co { color: #60a0b0; font-style: italic; } /* Comment */
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #ba2121; font-style: italic; } /* Documentation */
code span.dt { color: #902000; } /* DataType */
code span.dv { color: #40a070; } /* DecVal */
code span.er { color: #ff0000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #40a070; } /* Float */
code span.fu { color: #06287e; } /* Function */
code span.im { } /* Import */
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #007020; font-weight: bold; } /* Keyword */
code span.op { color: #666666; } /* Operator */
code span.ot { color: #007020; } /* Other */
code span.pp { color: #bc7a00; } /* Preprocessor */
code span.sc { color: #4070a0; } /* SpecialChar */
code span.ss { color: #bb6688; } /* SpecialString */
code span.st { color: #4070a0; } /* String */
code span.va { color: #19177c; } /* Variable */
code span.vs { color: #4070a0; } /* VerbatimString */
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
  </style>
  <!--adapted from https://github.com/Thiht/markdown-viewer/blob/master/chrome/lib/sss/sss.css-->
  <style type="text/css">
  body {
      color: #333;
      font-family: 'Segoe UI', 'Lucida Grande', Helvetica, sans-serif;
      line-height: 1.5;
      margin: auto;
      max-width: 1000px;
  }
  
  h1, h2, h3, h4, h5, h6 {
      font-weight: normal;
      line-height: 1em;
      margin: 20px 0;
  }
  
  h1 {
      font-size: 2.25em;
  }
  
  h2 {
      font-size: 1.75em;
  }
  
  h3 {
      font-size: 1.5em;
  }
  
  h4, h5, h6 {
      font-size: 1.25em;
  }
  
  a {
      color: #08C;
      text-decoration: none;
  }
  
  a:hover, a:focus {
      text-decoration: underline;
  }
  
  a:visited {
      color: #058;
  }
  
  img {
      max-width: 100%;
  }
  
  li + li {
      margin-top: 3px;
  }
  
  dt {
      font-weight: bold;
  }
  
  code {
      background: #EEE;
      font-family: "Consolas", "Lucida Console", monospace;
      padding: 1px 5px;
  }
  
  pre {
      background: #EEE;
      padding: 5px 10px;
      white-space: pre-wrap;
  }
  
  pre code {
      padding: 0;
  }
  
  blockquote {
      border-left: 5px solid #EEE;
      margin: 0;
      padding: 0 10px;
  }
  
  table {
      border-collapse: collapse;
      width: 100%;
  }
  
  table + table {
      margin-top: 1em;
  }
  
  thead {
      background: #EEE;
      text-align: left;
  }
  
  th, td {
      border: 1px solid #EEE;
      padding: 5px 10px;
  }
  
  hr {
      background: #EEE;
      border: 0;
      height: 1px;
  }
  </style>
</head>
<body>
<p><strong>Document number</strong>: P0037R7<br />
<strong>Date</strong>: 2019-06-17<br />
<strong>Reply-to</strong>: John McFarlane, <a href="mailto:wg21@john.mcfarlane.name">wg21@john.mcfarlane.name</a><br />
<strong>Audience</strong>: SG6, LEWGI</p>
<h1>Fixed-Point Real Numbers</h1>
<h2><a name="Introduction"></a>Introduction</h2>
<p>This proposal introduces a system for performing fixed-point arithmetic using integral types.</p>
<h2>Contents</h2>
<ul>
<li><a href="#Motivation">Motivation</a></li>
<li><a href="#Impact-On-the-Standard">Impact On the Standard</a></li>
<li><a href="#Design-Decisions">Design Decisions</a>
<ul>
<li><a href="#Class-Template">Class Template</a>
<ul>
<li><a href="#Rep-Type-Template-Parameter"><code>Rep</code> Type Template Parameter</a></li>
<li><a href="#Scale-Type-Template-Parameters"><code>Scale</code> Type Template Parameters</a></li>
<li><a href="#Exponent-and-Radix-Non-Type-Template-Parameters"><code>Exponent</code> and <code>Radix</code> Non-Type Template Parameters</a></li>
</ul></li>
<li><a href="#Requirements-on-Rep">Requirements on <code>Rep</code></a></li>
<li><a href="#Conversion">Conversion</a></li>
<li><a href="#Access-to-Rep-Value">Access to <code>Rep</code> Value</a></li>
<li><a href="#Class-Template-Deduction">Class Template Deduction</a></li>
<li><a href="#Operators">Operators</a></li>
<li><a href="#Division-Operator">Division Operator</a></li>
<li><a href="#Custom-Division">Custom Division</a></li>
<li><a href="#Alternative-Types-for-Rep">Alternative Types for <code>Rep</code></a>
<ul>
<li><a href="#Required-Specializations">Required Specializations</a></li>
</ul></li>
<li><a href="#Example">Example</a></li>
</ul></li>
<li><a href="#Technical-Specification">Technical Specification</a>
<ul>
<li><a href="#Header-scaled_integer-Synopsis">Header &lt;scaled_integer&gt; Synopsis</a>
<ul>
<li><a href="#scaled_integer-Class-Template"><code>scaled_integer&lt;&gt;</code> Class Template</a></li>
</ul></li>
</ul></li>
<li><a href="#Open-Issues">Open Issues</a>
<ul>
<li><a href="#Library-Support">Library Support</a></li>
<li><a href="#Extended-Comparison-Range">Extended Comparison Range</a></li>
<li><a href="#Allow-Binary-Operations-if-Radixes-are-Different">Allow Binary Operations if Radixes are Different</a></li>
</ul></li>
<li><a href="#Prior-Art">Prior Art</a>
<ul>
<li><a href="#N1169">N1169</a></li>
<li><a href="#P0106">P0106</a></li>
<li><a href="#Ada-Language-Support">Ada Language Support</a></li>
</ul></li>
<li><a href="#Acknowledgements">Acknowledgements</a></li>
<li><a href="#Revisions">Revisions</a></li>
<li><a href="#Reference-Implementation">Appendix 1: Reference Implementation</a></li>
<li><a href="#Performance">Appendix 2: Performance</a>
<ul>
<li><a href="#Types">Types</a></li>
<li><a href="#Basic-Arithmetic">Basic Arithmetic</a></li>
<li><a href="#3-Dimensional-Magnitude-Squared">3-Dimensional Magnitude Squared</a></li>
<li><a href="#Circle-Intersection">Circle Intersection</a></li>
</ul></li>
</ul>
<h2><a name="Motivation"></a>Motivation</h2>
<p>Floating-point types are an exceedingly versatile and widely supported method of expressing real numbers on modern architectures.</p>
<p>However, there are certain situations where fixed-point arithmetic is preferable:</p>
<ul>
<li>Some systems lack native floating-point registers and must emulate them in software;</li>
<li>many others are capable of performing some or all operations more efficiently using integer arithmetic;</li>
<li>certain applications can suffer from the variability in precision which comes from a dynamic radix point <a href="http://www.pathengine.com/Contents/Overview/FundamentalConcepts/WhyIntegerCoordinates/page.php">[pathengine]</a>;</li>
<li>binary arithmetic operations including <code>+</code>, <code>-</code>, <code>*</code> and <code>/</code> result in underflow, i.e. loss of lower-precision bits;</li>
<li>in situations where a variable exponent is not desired, it takes valuable space away from the significand and reduces precision and</li>
<li>not all hardware and compilers produce exactly the same results, leading to non-deterministic results.</li>
</ul>
<p>Integer types provide the basis for an efficient, lossless representation of binary fixed-point real numbers. However, laborious, error-prone steps are required to normalize the results of certain operations and to convert to and from fixed-point types.</p>
<p>A set of tools for defining and manipulating fixed-point types is proposed. These tools are designed to make work easier for those who traditionally use integers to perform low-level, high-performance fixed-point computation. They are composable such that a wide range of trade-offs between speed, accuracy and safety are supported.</p>
<h2><a name="Impact-On-the-Standard"></a>Impact On the Standard</h2>
<p>This proposal is a pure library extension. It does not require changes to any standard classes or functions. It adds several new class and function templates to a new header file, <code>&lt;scaled_integer&gt;</code>. Some optional deduction guides, member functions and operator overloads rely on types proposed in <a href="https://github.com/johnmcfarlane/papers/blob/master/wg21/p0828r0.md">[P0828]</a> and <a href="https://github.com/johnmcfarlane/papers/blob/master/wg21/p1050r0.md">[P1050]</a>.</p>
<h2><a name="Design-Decisions"></a>Design Decisions</h2>
<p>The design is driven by the following aims in roughly descending order:</p>
<ol>
<li>to automate the task of performing low-level fixed-point arithmetic using integer types — including non-standard fundamental integers or classes;</li>
<li>to minimise precision loss due to underflow;</li>
<li>to treat fixed-point as a super-set of integer such that a fixed-point type with an exponent of zero can provide a drop-in replacement for its underlying integer type;</li>
<li>to avoid incurring expense — including compilation time — for unused features and</li>
<li>to facilitate a style of code that is intuitive to anyone who is comfortable with integer and floating-point arithmetic.</li>
</ol>
<p>More generally, the aim of this proposal is to contain within a single API all the tools necessary to perform fixed-point arithmetic. The design facilitates a wide range of competing compile-time strategies for avoiding overflow and precision loss, but implements only the simplest by default. Similarly, orthogonal concerns such as run-time overflow detection and rounding modes are deferred to the underlying integer types used as storage.</p>
<h3><a name="Class-Template"></a>Class Template</h3>
<p>Fixed-point numbers are specializations of</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb1-1" title="1"><span class="kw">template</span> &lt;<span class="kw">class</span> Rep = <span class="dt">int</span>, <span class="kw">class</span> Scale = power&lt;&gt;&gt;</a>
<a class="sourceLine" id="cb1-2" title="2"><span class="kw">class</span> scaled_integer;</a></code></pre></div>
<p>where <code>power</code> is a tag type declared as</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb2-1" title="1"><span class="kw">template</span> &lt;<span class="dt">int</span> Exponent = <span class="dv">0</span>, <span class="dt">int</span> Radix = <span class="dv">2</span>&gt;</a>
<a class="sourceLine" id="cb2-2" title="2"><span class="kw">struct</span> power;</a></code></pre></div>
<p>and where the template parameters are described as follows.</p>
<h4><a name="Rep-Type-Template-Parameter"></a><code>Rep</code> Type Template Parameter</h4>
<p>This parameter indicates the integral type used as storage. Fundamental integral types other than <code>bool</code> are ideal choices but any suitably integer-like type can be used.</p>
<p>Other than scale, the characteristics of <code>scaled_integer&lt;Rep&gt;</code> are the characteristics of <code>Rep</code> including:</p>
<ul>
<li>signedness;</li>
<li>number of digits;</li>
<li>behavior of operators and</li>
<li>alignment.</li>
</ul>
<h4><a name="Scale-Type-Template-Parameters"></a><code>Scale</code> Type Template Parameter</h4>
<p><code>Scale</code> is a tag type used to specify the type of static scaling applied to the integer in order to convert it between its underlying value and the semantic value of the <code>scaled_integer</code> type. Future specification of the relationship between <code>Scale</code> and <code>Rep</code> could be used to allow user-defined alternatives to <code>power</code>. An example of an existing type which might replace <code>power</code> is <code>ratio</code>.</p>
<h4><a name="Exponent-and-Radix-Non-Type-Template-Parameters"></a><code>Exponent</code> and <code>Radix</code> Non-Type Template Parameters</h4>
<p>The radix (or base) of a fixed-point type is typically two to denote scaling by powers of two. In financial applications, accurate representation of decimal fractions requires a radix of ten. Thus, while <code>Radix</code> can be any integer greater than one, <code>2</code> is the default.</p>
<p>The exponent of a fixed-point type is the equivalent of the exponent field in a floating-point type and shifts the stored value by the requisite number of digits necessary to produce the desired range. The default value of <code>Exponent</code> is zero, giving <code>scaled_integer&lt;T&gt;</code> the same range as <code>T</code>. By far the most common use of fixed-point is to store values with fractional digits. Thus, the exponent is typically a negative value.</p>
<p>The resolution of an instantiation of <code>scaled_integer</code> is</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb3-1" title="1">pow(Radix, Exponent)</a></code></pre></div>
<p>and the minimum and maximum values are</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb4-1" title="1"><span class="bu">std::</span>numeric_limits&lt;Rep&gt;::min() * pow(Radix, Exponent)</a></code></pre></div>
<p>and</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb5-1" title="1"><span class="bu">std::</span>numeric_limits&lt;Rep&gt;::max() * pow(Radix, Exponent)</a></code></pre></div>
<p>respectively.</p>
<p>Any usage that results in values of <code>Exponent</code> which lie outside the range (<code>INT_MIN / 2</code>, <code>INT_MAX / 2</code>) may result in undefined behavior and/or overflow or underflow. This range of exponent values is far in excess of the exponent range of the largest built-in floating-point type and should be adequate for all intents and purposes.</p>
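<p>As a rough illustration of the formulas above, the following sketch evaluates the resolution and range for a given <code>Rep</code> and exponent. The helper names <code>scale_factor</code>, <code>min_value</code> and <code>max_value</code> are hypothetical and not part of the proposed API; <code>double</code> is used only to make the fractional results visible.</p>

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of the proposal): Radix raised to Exponent,
// computed as a double so that negative exponents yield fractions.
constexpr double scale_factor(int exponent, int radix = 2)
{
    double result = 1.0;
    for (int i = 0; i < exponent; ++i) result *= radix;
    for (int i = 0; i > exponent; --i) result /= radix;
    return result;
}

// Smallest representable value for a given Rep and exponent.
template <class Rep>
constexpr double min_value(int exponent, int radix = 2)
{
    return std::numeric_limits<Rep>::min() * scale_factor(exponent, radix);
}

// Largest representable value for a given Rep and exponent.
template <class Rep>
constexpr double max_value(int exponent, int radix = 2)
{
    return std::numeric_limits<Rep>::max() * scale_factor(exponent, radix);
}

// An 8-bit signed Rep with Exponent = -2 has two fractional bits.
static_assert(scale_factor(-2) == 0.25, "resolution is 2^-2");
static_assert(min_value<std::int8_t>(-2) == -32.0, "-128 * 0.25");
static_assert(max_value<std::int8_t>(-2) == 31.75, "127 * 0.25");
```

<p>With an exponent of zero the same helpers return the limits of <code>Rep</code> itself, which matches the statement that <code>scaled_integer&lt;T&gt;</code> has the same range as <code>T</code> by default.</p>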
<h3><a name="Requirements-on-Rep"></a>Requirements on <code>Rep</code></h3>
<p>First and foremost, <code>Rep</code> is required to be an arithmetic component as discussed in <a href="https://wg21.link/p0554">[P0554]</a>.</p>
<p>Semantically <code>Rep</code> is a fundamental integral type or a class emulating parts of such a type necessary to invoke the equivalent <code>scaled_integer</code> operation. For example, <code>scaled_integer&lt;Rep&gt;</code>'s binary <code>operator+</code> requires that <code>Rep</code> similarly has an arithmetic binary <code>operator+</code>. To be convertible to/from other arithmetic types, <code>Rep</code> must support the same conversion.</p>
<p>Additionally, <code>Rep</code> must be scalable, which is to say it must support arithmetic left-shift, multiplication and division. This is necessary not only for conversion to and from other arithmetic types but also to normalize the operands in operations such as addition and comparison.</p>
<p>Finally, <code>Rep</code> should provide specializations for the customisation points laid down in <a href="#Required-Specializations">Required Specializations</a>.</p>
<h3><a name="Conversion"></a>Conversion</h3>
<p>While effort is made to ensure that significant digits are not lost during conversion, no effort is made to avoid rounding errors. Whatever would happen when converting to and from <code>Rep</code> largely applies to <code>scaled_integer</code> objects also. For example:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb6-1" title="1">scaled_integer&lt;<span class="dt">int</span>, power&lt;-<span class="dv">1</span>&gt;&gt;{<span class="fl">.499</span>}==<span class="dv">0</span></a></code></pre></div>
<p>...evaluates to <code>true</code> and is considered an acceptable rounding error.</p>
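<p>The rounding behavior in this example can be reproduced with plain built-in types. The sketch below assumes a radix of two, a non-positive exponent, and truncation toward zero as performed by a floating-point-to-<code>int</code> cast; <code>rep_from_double</code> is a hypothetical helper, not a function from the proposal.</p>

```cpp
// Hypothetical helper: convert a floating-point value to the integer stored by
// a radix-2 fixed-point type with the given (non-positive) exponent. The cast
// to int truncates toward zero, as any float-to-int conversion does.
constexpr int rep_from_double(double value, int exponent)
{
    for (int i = 0; i > exponent; --i) value *= 2.0;
    return static_cast<int>(value);
}

// .499 scaled by 2^1 is .998, which truncates to a stored value of 0.
static_assert(rep_from_double(0.499, -1) == 0, ".499 becomes 0");
// .5 is exactly representable with one fractional bit: stored value 1.
static_assert(rep_from_double(0.5, -1) == 1, ".5 is stored as 1 * 2^-1");
```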
<h3><a name="Access-to-Rep-Value"></a>Access to <code>Rep</code> Value</h3>
<p>It is sometimes necessary to read from and write to the <code>Rep</code> value contained in a <code>scaled_integer&lt;Rep&gt;</code> object. This is supported through numeric traits, <code>to_rep</code> and <code>from_rep</code> respectively. These traits are described in paper, <a href="https://github.com/johnmcfarlane/papers/blob/master/wg21/p0675r0.md">[P0675]</a>.</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb7-1" title="1"><span class="kw">constexpr</span> <span class="kw">auto</span> a = from_rep&lt;scaled_integer&lt;<span class="dt">int</span>, power&lt;-<span class="dv">8</span>&gt;&gt;&gt;{}(<span class="dv">320</span>);</a>
<a class="sourceLine" id="cb7-2" title="2"><span class="kw">static_assert</span>(a == <span class="fl">1.25</span>);</a>
<a class="sourceLine" id="cb7-3" title="3"></a>
<a class="sourceLine" id="cb7-4" title="4"><span class="kw">constexpr</span> <span class="kw">auto</span> b = to_rep(a);</a>
<a class="sourceLine" id="cb7-5" title="5"><span class="kw">static_assert</span>(b == <span class="dv">320</span>);    <span class="co">// 1.25*(1&lt;&lt;8)</span></a></code></pre></div>
<h3><a name="Class-Template-Deduction"></a>Class Template Deduction</h3>
<p>The type of a <code>scaled_integer</code> object can be deduced by an integer initializer:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb8-1" title="1"><span class="kw">auto</span> a = scaled_integer(<span class="dv">0</span><span class="bu">ul</span>);</a>
<a class="sourceLine" id="cb8-2" title="2"><span class="kw">static_assert</span>(is_same_v&lt;<span class="kw">decltype</span>(a), scaled_integer&lt;<span class="dt">unsigned</span> <span class="dt">long</span>&gt;&gt;);</a></code></pre></div>
<p>It can also be deduced with an integral constant of type <code>constant</code> (described in <a href="https://github.com/johnmcfarlane/papers/blob/master/wg21/p0827r0.md">[P0827]</a>):</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb9-1" title="1"><span class="kw">constexpr</span> <span class="kw">auto</span> b = scaled_integer(constant&lt;<span class="bn">0xFF00000000</span><span class="bu">L</span>&gt;{});</a>
<a class="sourceLine" id="cb9-2" title="2"><span class="kw">static_assert</span>(is_same_v&lt;<span class="kw">decltype</span>(b), <span class="at">const</span> scaled_integer&lt;<span class="dt">int</span>, power&lt;<span class="dv">32</span>&gt;&gt;&gt;);</a>
<a class="sourceLine" id="cb9-3" title="3"><span class="kw">static_assert</span>(to_rep(b) == <span class="bn">0xFF</span>);</a></code></pre></div>
<p>For <code>Exponent</code>, the highest value which does not incur data loss is used. This minimizes the required range of the underlying integer value which reduces the likelihood of out-of-range errors during arithmetic operations. For <code>Rep</code>, a fundamental integer type of <code>int</code> width is preferred unless a wider type is required.</p>
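<p>One way to picture the choice of <code>Exponent</code> is as a count of trailing zero bits. The following sketch assumes a radix of two and a non-negative constant; <code>deduced_exponent</code> and <code>deduced_rep</code> are hypothetical names used for illustration only and do not describe how a deduction guide would be implemented.</p>

```cpp
#include <cstdint>

// Hypothetical helper: the highest radix-2 exponent that loses no data is the
// number of trailing zero bits in the constant (zero is given exponent 0).
constexpr int deduced_exponent(std::uint64_t value)
{
    int exponent = 0;
    while (value != 0 && (value & 1) == 0) {
        value >>= 1;
        ++exponent;
    }
    return exponent;
}

// The value left to store in Rep once those zero bits are shifted out.
constexpr std::uint64_t deduced_rep(std::uint64_t value)
{
    return value >> deduced_exponent(value);
}

// The constant from the example above: 32 trailing zero bits, leaving 0xFF.
static_assert(deduced_exponent(0xFF00000000) == 32, "32 trailing zero bits");
static_assert(deduced_rep(0xFF00000000) == 0xFF, "remaining value fits in an int");
```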
<h3><a name="Operators"></a>Operators</h3>
<p>Any arithmetic, comparison, logic and bitwise operators that might be applied to integer types can also be applied to fixed-point types. A guiding principle of operator overloads is that they perform as little run-time computation as is practically possible.</p>
<p>With the exception of shift and comparison operators, binary operators can take any combination of:</p>
<ul>
<li>one or two fixed-point arguments and</li>
<li>zero or one arguments of any arithmetic type, i.e. a type for which <code>numeric_limits</code> is specialized.</li>
</ul>
<p>Assuming a binary operation, <code>@</code>, in the form</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb10-1" title="1"><span class="kw">auto</span> R = S <span class="er">@</span> T;</a></code></pre></div>
<p>where <code>S</code> is of type <code>scaled_integer&lt;RepS, power&lt;ExponentS, 2&gt;&gt;</code> and <code>T</code> is a numeric type (possibly another <code>scaled_integer</code> instantiation), then the result, <code>R</code>, of the operation is determined as follows.</p>
<ol>
<li><p>If <code>T</code> is a floating-point type, <code>Float</code>, then <code>S</code> is cast to <code>Float</code> and a floating-point operation takes place, e.g.:</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb11-1" title="1"><span class="kw">auto</span> a = scaled_integer&lt;<span class="dt">long</span> <span class="dt">long</span>&gt;(<span class="dv">3</span>) + <span class="fl">4.</span><span class="bu">f</span>;</a>
<a class="sourceLine" id="cb11-2" title="2"><span class="kw">static_assert</span>(is_same_v&lt;<span class="kw">decltype</span>(a), <span class="kw">decltype</span>(<span class="fl">3.</span><span class="bu">f</span> + <span class="fl">4.</span><span class="bu">f</span>)&gt;);</a></code></pre></div></li>
<li><p>If <code>T</code> is a <code>constant</code> of integer type, <code>Integer</code>, then:</p>
<p>a. If the operator is bitwise left shift (<code>&lt;&lt;</code>), then the result is <code>scaled_integer&lt;RepS, power&lt;ExponentS+T::value&gt;&gt;</code> with the shift operator applied.</p>
<p>b. If the operator is bitwise right shift (<code>&gt;&gt;</code>), then the result is <code>scaled_integer&lt;RepS, power&lt;ExponentS-T::value&gt;&gt;</code> with the shift operator applied.</p>
<p>c. Otherwise, <code>T</code> is converted to <code>scaled_integer(T{})</code>, e.g.</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb12-1" title="1"><span class="kw">auto</span> b = scaled_integer(<span class="dv">200</span><span class="bu">U</span>) - constant&lt;<span class="dv">100</span><span class="bu">L</span>&gt;{};</a>
<a class="sourceLine" id="cb12-2" title="2"><span class="kw">static_assert</span>(is_same_v&lt;</a>
<a class="sourceLine" id="cb12-3" title="3">    <span class="kw">decltype</span>(b),</a>
<a class="sourceLine" id="cb12-4" title="4">    <span class="kw">decltype</span>(scaled_integer&lt;<span class="dt">unsigned</span>&gt;(<span class="dv">200</span>) - scaled_integer&lt;<span class="dt">int</span>&gt;(<span class="dv">100</span>))&gt;);</a></code></pre></div>
<p>and rule #4 subsequently applies.</p></li>
<li><p>If <code>T</code> is an integer type, <code>Integer</code>, then:</p>
<p>a. If the operator is bitwise shift (<code>&lt;&lt;</code> or <code>&gt;&gt;</code>), then the result is type <code>S</code> with the shift operator applied.</p>
<p>b. Otherwise, <code>T</code> is cast to <code>scaled_integer&lt;Integer&gt;</code>, e.g.</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb13-1" title="1"><span class="kw">auto</span> c = scaled_integer&lt;&gt;(<span class="dv">5</span>) * <span class="dv">6</span><span class="bu">ul</span>;</a>
<a class="sourceLine" id="cb13-2" title="2"><span class="kw">static_assert</span>(is_same_v&lt;</a>
<a class="sourceLine" id="cb13-3" title="3">    <span class="kw">decltype</span>(c),</a>
<a class="sourceLine" id="cb13-4" title="4">    <span class="kw">decltype</span>(scaled_integer&lt;&gt;(<span class="dv">5</span>) * scaled_integer&lt;<span class="dt">unsigned</span> <span class="dt">long</span>&gt;(<span class="dv">6</span>))&gt;);</a></code></pre></div>
<p>and rule #4 subsequently applies.</p></li>
<li><p>If <code>T</code> is type, <code>scaled_integer&lt;RepT, power&lt;ExponentT, 2&gt;&gt;</code>, then:</p>
<p>a. If the operator is multiplication (<code>*</code>), then the result is <code>scaled_integer&lt;decltype(RepS*RepT), power&lt;ExponentS+ExponentT&gt;&gt;</code>, e.g.:</p>
<div class="sourceCode" id="cb14"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb14-1" title="1"><span class="kw">constexpr</span> <span class="kw">auto</span> d = scaled_integer&lt;<span class="dt">uint8_t</span>, power&lt;-<span class="dv">7</span>&gt;&gt;{<span class="fl">1.25</span>} </a>
<a class="sourceLine" id="cb14-2" title="2">     * scaled_integer&lt;<span class="dt">uint8_t</span>, power&lt;-<span class="dv">3</span>&gt;&gt;{<span class="dv">8</span>};</a>
<a class="sourceLine" id="cb14-3" title="3"><span class="kw">static_assert</span>(is_same_v&lt;<span class="kw">decltype</span>(d), <span class="at">const</span> scaled_integer&lt;<span class="dt">int</span>, power&lt;-<span class="dv">10</span>&gt;&gt;&gt;);</a>
<a class="sourceLine" id="cb14-4" title="4"><span class="kw">static_assert</span>(d == <span class="dv">10</span>);</a></code></pre></div>
<p>b. If the operator is division (<code>/</code>), then the result is <code>scaled_integer&lt;decltype(RepS/RepT), power&lt;ExponentS-ExponentT&gt;&gt;</code>, e.g.:</p>
<div class="sourceCode" id="cb15"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb15-1" title="1"><span class="kw">constexpr</span> <span class="kw">auto</span> e = scaled_integer&lt;<span class="dt">short</span>, power&lt;-<span class="dv">5</span>&gt;&gt;{<span class="fl">1.5</span>} / scaled_integer&lt;<span class="dt">short</span>, power&lt;-<span class="dv">3</span>&gt;&gt;{<span class="fl">2.5</span>};</a>
<a class="sourceLine" id="cb15-2" title="2"><span class="kw">static_assert</span>(is_same_v&lt;<span class="kw">decltype</span>(e), <span class="at">const</span> scaled_integer&lt;<span class="dt">int</span>, power&lt;-<span class="dv">2</span>&gt;&gt;&gt;);</a>
<a class="sourceLine" id="cb15-3" title="3"><span class="kw">static_assert</span>(e == <span class="fl">.5</span>);</a></code></pre></div>
<p>c. If the operator is modulo (<code>%</code>), then the result is <code>scaled_integer&lt;decltype(RepS%RepT), power&lt;ExponentS&gt;&gt;</code>, e.g.:</p>
<div class="sourceCode" id="cb16"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb16-1" title="1"><span class="kw">constexpr</span> <span class="kw">auto</span> f = scaled_integer&lt;<span class="dt">short</span>, power&lt;-<span class="dv">5</span>&gt;&gt;{<span class="fl">1.5</span>} % scaled_integer&lt;<span class="dt">short</span>, power&lt;-<span class="dv">3</span>&gt;&gt;{<span class="fl">2.5</span>};</a>
<a class="sourceLine" id="cb16-2" title="2"><span class="kw">static_assert</span>(is_same_v&lt;<span class="kw">decltype</span>(f), <span class="at">const</span> scaled_integer&lt;<span class="dt">int</span>, power&lt;-<span class="dv">5</span>&gt;&gt;&gt;);</a>
<a class="sourceLine" id="cb16-3" title="3"><span class="kw">static_assert</span>(f == <span class="fl">.25</span>);</a></code></pre></div>
<p>d. If the operator is addition (<code>+</code>) or subtraction (<code>-</code>), then the operand with the greater exponent is converted such that its exponent matches the other operand's exponent. Then the result is <code>scaled_integer&lt;decltype(RepS@RepT), power&lt;min(ExponentS,ExponentT)&gt;&gt;</code>, e.g.:</p>
<div class="sourceCode" id="cb17"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb17-1" title="1"><span class="kw">constexpr</span> <span class="kw">auto</span> g = scaled_integer&lt;<span class="dt">int8_t</span>, power&lt;-<span class="dv">2</span>&gt;&gt;{<span class="fl">12.5</span>} - scaled_integer&lt;<span class="dt">short</span>&gt;{<span class="dv">8</span>};</a>
<a class="sourceLine" id="cb17-2" title="2"><span class="kw">static_assert</span>(is_same_v&lt;<span class="kw">decltype</span>(g), <span class="at">const</span> scaled_integer&lt;<span class="dt">int</span>, power&lt;-<span class="dv">2</span>&gt;&gt;&gt;);</a>
<a class="sourceLine" id="cb17-3" title="3"><span class="kw">static_assert</span>(g == <span class="fl">4.5</span>);</a></code></pre></div>
<p>e. If the operator is comparison (<code>==</code>, <code>!=</code>, <code>&lt;</code>, <code>&gt;</code>, <code>&lt;=</code> or <code>&gt;=</code>), then the operand with the greater exponent is converted such that its exponent matches the other operand's exponent. Then the result is <code>decltype(RepS@RepT)</code>, e.g.:</p>
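<div class="sourceCode" id="cb18"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb18-1" title="1"><span class="kw">constexpr</span> <span class="kw">auto</span> h = scaled_integer&lt;<span class="dt">int8_t</span>, power&lt;-<span class="dv">2</span>&gt;&gt;{<span class="fl">12.5</span>} &lt;= scaled_integer&lt;<span class="dt">short</span>&gt;{<span class="dv">8</span>};</a>
<a class="sourceLine" id="cb18-2" title="2"><span class="kw">static_assert</span>(is_same_v&lt;<span class="kw">decltype</span>(h), <span class="at">const</span> <span class="dt">bool</span>&gt;);</a>
<a class="sourceLine" id="cb18-3" title="3"><span class="kw">static_assert</span>(h == <span class="kw">false</span>);</a></code></pre></div>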
<p>(See section, <a href="#Extended-Comparison-Range">Extended Comparison Range</a>, for additional details.)</p>
<p>f. If the operator is bitwise or (<code>|</code>) or xor (<code>^</code>) then the same rules as addition (<code>+</code>) are applied.</p>
<p>g. If the operator is bitwise and (<code>&amp;</code>) then the same rules as bitwise or (<code>|</code>) are applied except that the greater (not the lesser) exponent is preferred.</p></li>
</ol>
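<p>The arithmetic behind rules 4a, 4b and 4d can be sketched on plain integers. In the snippet below, <code>raw_fixed</code> and the free functions are hypothetical stand-ins: the exponent is carried as a run-time member purely for illustration, whereas in the proposal it is part of the type and costs nothing at run time. A radix of two is assumed and result-type promotion is not modelled.</p>

```cpp
// Hypothetical stand-in for a radix-2 fixed-point value: the stored integer
// plus its exponent. In the proposal the exponent is part of the type.
struct raw_fixed {
    long long rep;
    int exponent;
};

// Rule 4a: multiply the stored values and add the exponents.
constexpr raw_fixed multiply(raw_fixed s, raw_fixed t)
{
    return raw_fixed{s.rep * t.rep, s.exponent + t.exponent};
}

// Rule 4b: integer-divide the stored values and subtract the exponents.
constexpr raw_fixed divide(raw_fixed s, raw_fixed t)
{
    return raw_fixed{s.rep / t.rep, s.exponent - t.exponent};
}

// Re-express a value at a lesser (or equal) exponent. No bits are lost because
// the stored value is only ever scaled up.
constexpr raw_fixed rescale(raw_fixed v, int exponent)
{
    return raw_fixed{v.rep * (1LL << (v.exponent - exponent)), exponent};
}

// Rule 4d: bring both operands to the lesser exponent, then subtract.
constexpr raw_fixed subtract(raw_fixed s, raw_fixed t)
{
    return s.exponent < t.exponent
        ? raw_fixed{s.rep - rescale(t, s.exponent).rep, s.exponent}
        : raw_fixed{rescale(s, t.exponent).rep - t.rep, t.exponent};
}

// 1.25 * 8: stored values 160 (2^-7) and 64 (2^-3) give 10240 at 2^-10, i.e. 10.
static_assert(multiply({160, -7}, {64, -3}).rep == 10240, "rule 4a value");
static_assert(multiply({160, -7}, {64, -3}).exponent == -10, "rule 4a exponent");
// 1.5 / 2.5: stored values 48 (2^-5) and 20 (2^-3) give 2 at 2^-2, i.e. 0.5.
static_assert(divide({48, -5}, {20, -3}).rep == 2, "rule 4b value");
// 12.5 - 8: stored values 50 (2^-2) and 8 (2^0) give 18 at 2^-2, i.e. 4.5.
static_assert(subtract({50, -2}, {8, 0}).rep == 18, "rule 4d value");
```

<p>The three assertions mirror examples <code>d</code>, <code>e</code> and <code>g</code> above, which shows why the quotient in rule 4b is <code>.5</code> rather than <code>.6</code>: the stored integers are divided without any prior widening.</p>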
<p>Some details have been left out for brevity. Unary operators are supported. Some minor variations occur when <code>S</code> is not <code>scaled_integer</code> and <code>T</code> is <code>scaled_integer</code>. Rules for bit-shifting values where <code>Radix!=2</code> do not necessarily involve a different result type. Other binary operations involving different radixes produce a return type which is optimized to contain the precise result with the minimum value stored in <code>Rep</code> and the minimum viable value of <code>Radix</code>.</p>
<p>The complete set of rules may appear to be large and complex. However, this mostly reflects the existing complexity in the behavior of arithmetic types. Relatively few design principles govern these rules:</p>
<ol>
<li>A <code>scaled_integer&lt;T&gt;</code> should follow the same behavior as <code>T</code> to the greatest extent practical, reflecting the facts that: a) all integers are fixed-point, rather than floating-point, types and b) integer arithmetic generally provides the best efficiency and performance characteristics.</li>
<li>In situations where a trade-off between overflow and underflow must be made, the design guards against underflow. This follows from principle #1. Far more operations can cause overflow and users are generally more wary of it. And detection/handling of overflow is an orthogonal concern which is best implemented using a custom numeric type such as the <code>safe_integer</code> and <code>elastic_integer</code> types discussed in <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0554r0.html#componentization">[P0554]</a>.</li>
</ol>
<h3><a name="Division-Operator"></a>Division Operator</h3>
<p>The behavior of the division operator, <code>operator/</code>, poses a dilemma which has proven contentious. Following an impromptu review of P0037 by SG6 in San Diego, Davis Herring volunteered to write a paper which explores two competing strategies as identified by SG6 <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1368r0.html">[P1368]</a>. The two strategies identified are:</p>
<ol>
<li>'Quasi-exact' is the strategy pursued by P0106 and perceived as being most like floating-point and therefore least surprising.</li>
<li>The strategy currently proposed for P0037 which follows the behavior of integer division and therefore maximizes efficiency, control, precision and consistency with the other operators.</li>
</ol>
<p>Unfortunately, fixed-point is commonly seen as a replacement for floating-point — rather than an extension to integer — arithmetic. There is a desire to write generic code which can accept both floating-point and fixed-point types and which involves division operations. However, there are problems with this aim which are not necessarily solved by using 'quasi-exact' division. Conversely, it is feasible to write generic code in which fixed-point and integer types can be interchanged. 'Quasi-exact' is a poor fit for such code. Finally, generic code which can be successfully instantiated with all three is possible but must be written with extra care where division is concerned.</p>
<p>Some observations which may help back up these claims are as follows.</p>
<h4>'Quasi-exact' quotients are sensitive to operand types</h4>
<p>Consider the following examples using P0106 types:</p>
<div class="sourceCode" id="cb19"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb19-1" title="1"><span class="kw">auto</span> q = negatable{<span class="dv">1</span>} / negatable{<span class="dv">3</span><span class="bu">L</span>};  <span class="co">// example 1</span></a></code></pre></div>
<p>In the above example, the types of the dividend and divisor are deduced from the initializers. On systems with 32-bit <code>long</code> integers, the type of the divisor will be deduced as <code>negatable&lt;31, 0&gt;</code> and on systems with 64-bit <code>long</code> integers, the type of the divisor will be deduced as <code>negatable&lt;63, 0&gt;</code>. This will in turn affect the number of fractional digits of the quotient which will in turn affect the resultant value.</p>
<p>Certainly, there are ways in which this situation can be avoided but in general, making the result sensitive to the range of the divisor may have drawbacks in some situations.</p>
<div class="sourceCode" id="cb20"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb20-1" title="1"><span class="kw">auto</span> q = negatable&lt;<span class="dv">1</span>, <span class="dv">0</span>&gt;{<span class="dv">1</span>} / negatable&lt;<span class="dv">2</span>, <span class="dv">0</span>&gt;{<span class="dv">3</span>};  <span class="co">// example 2</span></a></code></pre></div>
<p>In this second example, the minimum width is chosen for the operands. The result is now consistent but is not exact at all, having only two fractional digits.</p>
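<p>The width sensitivity can be sketched with plain integers. The following assumes, as the discussion above implies, that a quotient of two integer-valued operands is given as many fractional digits as the divisor has integer digits. The helper names are hypothetical and are not part of P0106.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: number of fractional digits given to the quotient,
// assumed here to equal the number of integer digits of the divisor type.
template <class Divisor>
constexpr int quotient_fractional_digits()
{
    return std::numeric_limits<Divisor>::digits;
}

// Hypothetical helper: 1/3 stored as an integer scaled by 2^-fractional_digits.
constexpr std::uint64_t one_third_rep(int fractional_digits)
{
    return (std::uint64_t{1} << fractional_digits) / 3;
}

// A 32-bit divisor yields 31 fractional digits; a 64-bit divisor yields 63.
static_assert(quotient_fractional_digits<std::int32_t>() == 31, "");
static_assert(quotient_fractional_digits<std::int64_t>() == 63, "");

// The stored quotients differ, and so do the values they approximate.
static_assert(one_third_rep(31) == 715827882ULL, "");
static_assert(one_third_rep(63) == 3074457345618258602ULL, "");

// With only two fractional digits (example 2), 1/3 is stored as 1, that is 0.25.
static_assert(one_third_rep(2) == 1ULL, "");
```

<p>Under these assumptions, the same source expression approximates one third to a different precision depending only on the width of <code>long</code>.</p>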
<h4>'Quasi-exact' results are inexact</h4>
<p>In trying to emulate floating-point division, the 'quasi-exact' strategy accepts precision loss. In contrast, integer division is lossless, provided that the remainder is taken into account. Consider the example from the <a href="#Operators">Operators</a> section.</p>
<div class="sourceCode" id="cb21"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb21-1" title="1"><span class="co">// quotient is scaled_integer&lt;short, power&lt;-2&gt;&gt;{.5}</span></a>
<a class="sourceLine" id="cb21-2" title="2"><span class="kw">constexpr</span> <span class="kw">auto</span> e = scaled_integer&lt;<span class="dt">short</span>, power&lt;-<span class="dv">5</span>&gt;&gt;{<span class="fl">1.5</span>} / scaled_integer&lt;<span class="dt">short</span>, power&lt;-<span class="dv">3</span>&gt;&gt;{<span class="fl">2.5</span>};</a>
<a class="sourceLine" id="cb21-3" title="3"></a>
<a class="sourceLine" id="cb21-4" title="4"><span class="co">// remainder is scaled_integer&lt;short, power&lt;-5&gt;&gt;{.25}</span></a>
<a class="sourceLine" id="cb21-5" title="5"><span class="kw">constexpr</span> <span class="kw">auto</span> f = scaled_integer&lt;<span class="dt">short</span>, power&lt;-<span class="dv">5</span>&gt;&gt;{<span class="fl">1.5</span>} % scaled_integer&lt;<span class="dt">short</span>, power&lt;-<span class="dv">3</span>&gt;&gt;{<span class="fl">2.5</span>};</a>
<a class="sourceLine" id="cb21-6" title="6"></a>
<a class="sourceLine" id="cb21-7" title="7"><span class="co">// dividend is scaled_integer&lt;int, power&lt;-5&gt;&gt;{1.5}</span></a>
<a class="sourceLine" id="cb21-8" title="8"><span class="kw">constexpr</span> <span class="kw">auto</span> dividend = scaled_integer&lt;<span class="dt">int</span>, power&lt;-<span class="dv">3</span>&gt;&gt;{<span class="fl">2.5</span>} * e + f;</a></code></pre></div>
<p>The initial dividend is retrieved by working back from the quotient, the remainder and the divisor. This is guaranteed by the underlying integer type. Indeed, an optimizing compiler will recognize that the input and output dividend are the same and elide them. (<a href="https://godbolt.org/z/yGcEsq">example</a>)</p>
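<p>The same guarantee can be checked directly on the stored integers. This is a minimal sketch using plain <code>int</code> values rather than <code>scaled_integer</code>; <code>div_result</code> and <code>div_rem</code> are hypothetical names introduced for illustration.</p>

```cpp
#include <cassert>

// Quotient and remainder of an integer division (hypothetical helper type).
struct div_result {
    int quotient;
    int remainder;
};

constexpr div_result div_rem(int dividend, int divisor)
{
    return div_result{dividend / divisor, dividend % divisor};
}

// Stored integers from the example above:
// 1.5 with exponent -5 is stored as 48; 2.5 with exponent -3 is stored as 20.
constexpr int dividend_rep = 48;
constexpr int divisor_rep = 20;
constexpr div_result example = div_rem(dividend_rep, divisor_rep);

static_assert(example.quotient == 2, "");   // 2 * 2^-2 == 0.5
static_assert(example.remainder == 8, "");  // 8 * 2^-5 == 0.25

// Working back from quotient, remainder and divisor recovers the dividend exactly.
static_assert(divisor_rep * example.quotient + example.remainder == dividend_rep, "");
```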
<h4>'Quasi-exact' division is slow</h4>
<p>The above example requires no shift operations. P1368 <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1368r0.html">[P1368]</a> speculates that for some <code>Rep</code> types, shift operations may be elided. This is a valid point. However, nothing about providing a thin abstraction precludes such elision. And at this level the bare minimum of scaling can be performed regardless of <code>Rep</code> type.</p>
<h4>'Quasi-exact' division is rigid</h4>
<p>What 'quasi-exact' division really delivers is a reliable way to get <em>some</em> precision. The fact that it takes a best guess is evidence of a lack of control afforded the user. And the user is badly served when this guess is wrong. The ideal API should provide this best guess as a default only and allow the user to override it to choose the precision they need. That is the topic of the next section.</p>
<h3><a name="Custom-Division"></a>Custom Division</h3>
<p>The <code>scaled_integer</code> division operator, <code>/</code>, performs the least work possible and, combined with the modulo operator, <code>%</code>, produces lossless results. However, it behaves very differently from floating-point division and is likely to be a source of surprises for some users.</p>
<p>In particular, the choice of quotient type can have a dramatic effect on precision. If, for example, the dividend and divisor have the same <code>Exponent</code> and <code>Radix</code>, then the quotient's <code>Exponent</code> will be zero and all fractional digits will be dropped. In contrast to floating-point division, the choice of <code>Exponent</code> cannot be tailored to the result.</p>
<p>For this reason, the <code>fractional</code> type <a href="https://github.com/johnmcfarlane/papers/blob/master/wg21/p1050r0.md">[P1050]</a> is provided in order to facilitate two important use cases.</p>
<p>Firstly, a 'sane default' result type can be calculated automatically in line with the formula detailed in P0106 <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0106r0.html#basic_operations">[P0106]</a>. Here, a deduction guide does the work of determining that a division involving a dividend with 31 integer digits should result in a quotient with 31 fractional digits:</p>
<div class="sourceCode" id="cb22"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb22-1" title="1"><span class="kw">constexpr</span> <span class="kw">auto</span> i = scaled_integer{fractional{<span class="dv">1</span>, <span class="dv">3</span>}};</a>
<a class="sourceLine" id="cb22-2" title="2"><span class="kw">static_assert</span>(i == <span class="fl">0.333333333022892475128173828125</span><span class="bu">L</span>);</a>
<a class="sourceLine" id="cb22-3" title="3"><span class="kw">static_assert</span>(is_same_v&lt;<span class="kw">decltype</span>(i), <span class="at">const</span> scaled_integer&lt;<span class="dt">int64_t</span>, power&lt;-<span class="dv">31</span>&gt;&gt;&gt;);</a></code></pre></div>
<p>This suffers from the problem (discussed above) that the number of fractional digits is sensitive to the width of the divisor. However, the <code>scaled_integer</code> type does not attempt to abstract away the integers with which it represents values. This should make such portability issues more apparent. And in the above style, retention of fractional values using the <code>fractional</code> type becomes encouraged.</p>
<p>Alternatively, the user can forgo CTAD and choose the template parameters explicitly:</p>
<div class="sourceCode" id="cb23"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb23-1" title="1"><span class="kw">constexpr</span> <span class="kw">auto</span> j = scaled_integer&lt;<span class="dt">int</span>, power&lt;-<span class="dv">16</span>&gt;&gt;{fractional{<span class="dv">1</span>, <span class="dv">3</span>}};</a>
<a class="sourceLine" id="cb23-2" title="2"><span class="kw">static_assert</span>(j == <span class="fl">0.3333282470703125</span>);</a>
<a class="sourceLine" id="cb23-3" title="3"><span class="kw">static_assert</span>(is_same_v&lt;<span class="kw">decltype</span>(j), <span class="at">const</span> scaled_integer&lt;<span class="dt">int</span>, power&lt;-<span class="dv">16</span>&gt;&gt;&gt;);</a></code></pre></div>
<p>This usage avoids all ambiguity and provides complete control over the precision expressed in the quotient.</p>
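<p>The effect of choosing the quotient's precision can be sketched with plain integers. The helpers <code>divide</code> and <code>to_double</code> below are hypothetical and only mimic the arithmetic; they are not the proposed interface.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: divides two integers and returns the quotient as an
// integer scaled by 2^-FractionalDigits. The dividend is widened and scaled
// up before the division so that the requested digits survive it.
template <int FractionalDigits>
constexpr std::int64_t divide(std::int32_t numerator, std::int32_t denominator)
{
    return (std::int64_t{numerator} * (std::int64_t{1} << FractionalDigits)) / denominator;
}

// Hypothetical helper: converts the scaled integer back to a real value.
template <int FractionalDigits>
constexpr double to_double(std::int64_t rep)
{
    return static_cast<double>(rep)
        / static_cast<double>(std::int64_t{1} << FractionalDigits);
}

// Sixteen fractional digits, as in the explicit example above.
static_assert(divide<16>(1, 3) == 21845, "");
static_assert(to_double<16>(divide<16>(1, 3)) == 0.3333282470703125, "");

// Thirty-one fractional digits, as in the deduced example.
static_assert(divide<31>(1, 3) == 715827882, "");

// Zero fractional digits: the fractional part of 1/3 is dropped entirely.
static_assert(divide<0>(1, 3) == 0, "");
```

<p>The last assertion corresponds to the case described at the start of this section, in which dividend and divisor share an exponent and the quotient retains no fractional digits.</p>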
<h3><a name="Alternative-Types-for-Rep"></a>Alternative Types for <code>Rep</code></h3>
<p>Using built-in integral types as the default underlying representation minimizes certain costs:</p>
<ul>
<li>many fixed-point operations are as efficient as their integral equivalents;</li>
<li>compile-time complexity is kept relatively low and</li>
<li>the behavior of fixed-point types should cause few surprises.</li>
</ul>
<p>However, this choice also brings with it many of the deficiencies of built-in types. For example:</p>
<ul>
<li>the typical rounding behavior is distinct for:
<ul>
<li>conversion from floating-point types;</li>
<li>right shift and</li>
<li>divide operations;</li>
</ul></li>
<li>all of these rounding behaviors cause drift and propagate error;</li>
<li>overflow, underflow and flush are handled silently with wrap-around or undefined behavior;</li>
<li>divide-by-zero similarly results in undefined behavior and</li>
<li>the range of values is limited by the largest type: <code>long long int</code>.</li>
</ul>
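<p>Two of these rounding behaviors can be observed with built-in integers alone. The following sketch uses hypothetical function names and relies on arithmetic right shift of negative values, which is guaranteed from C++20 and is implementation-defined, though common, in earlier standards.</p>

```cpp
#include <cassert>

// Division truncates toward zero.
constexpr int halve_by_division(int value)
{
    return value / 2;
}

// Arithmetic right shift rounds toward negative infinity.
// Guaranteed from C++20; implementation-defined for negative values before that.
constexpr int halve_by_shift(int value)
{
    return value >> 1;
}

// For non-negative values the two operations agree.
static_assert(halve_by_division(7) == 3, "");
static_assert(halve_by_shift(7) == 3, "");

// For negative values division rounds toward zero.
static_assert(halve_by_division(-7) == -3, "");
```

<p>On an implementation with arithmetic right shift, <code>halve_by_shift(-7)</code> yields <code>-4</code> where <code>halve_by_division(-7)</code> yields <code>-3</code>, so the choice of operation changes the direction of the error.</p>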
<p>The effort involved in addressing these deficiencies is non-trivial and on-going (for example <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0105r0.html">[P0105]</a>). As solutions are made available, it should become easier to define custom integral types which address concerns surrounding robustness and correctness. How to combine such numeric types is the topic of <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0554r0.html">[P0554]</a>, <strong>Composition of Arithmetic Types</strong>.</p>
<p>Of particular note is the <code>elastic_integer</code> type detailed in <a href="https://github.com/johnmcfarlane/papers/blob/master/wg21/p0828r0.md">[P0828]</a>. When used in combination with <code>scaled_integer</code>, the resultant composite type is able to avoid a large proportion of the out-of-range errors associated with fixed-point arithmetic while avoiding expensive run-time overflow checks.</p>
<h4><a name="Required-Specializations"></a>Required Specializations</h4>
<p>For a type to be suitable as the <code>Rep</code> parameter of <code>scaled_integer</code>, it must meet the following requirements:</p>
<ul>
<li>it must have specialized the following existing standard library types:
<ul>
<li><code>numeric_limits</code></li>
<li><code>make_signed</code> and <code>make_unsigned</code></li>
</ul></li>
<li>it must have specialized the following proposed standard library types as described in <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0675r0.html">[P0675]</a>:
<ul>
<li><code>num_digits</code> and <code>set_num_digits</code>,</li>
<li><code>to_rep</code>, <code>from_rep</code> and <code>from_value</code>.</li>
</ul></li>
</ul>
<p>Note that <code>make_signed</code> and <code>make_unsigned</code> cannot be specialized for custom types. Unless this rule can be relaxed, some equivalent mechanism must be introduced in order for custom types to be used with <code>scaled_integer&lt;&gt;</code>. One possibility is the addition of <code>numeric_limits&lt;&gt;::signed</code> and <code>numeric_limits&lt;&gt;::unsigned</code> type aliases.</p>
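<p>As an illustration of the first requirement only, a <code>numeric_limits</code> specialization for a custom type might look as follows. The type <code>wrapped_int</code> is hypothetical, and the traits proposed in P0675 are not sketched here.</p>

```cpp
#include <cassert>
#include <limits>

// Hypothetical custom integer type, used only for illustration.
struct wrapped_int {
    int value;
};

namespace std {
    // Specialization of numeric_limits for a program-defined type.
    // Most members are inherited from the underlying type for brevity; only
    // min, max and lowest are redeclared here. A complete specialization would
    // redeclare every member that returns a value of the type.
    template <>
    struct numeric_limits<wrapped_int> : numeric_limits<int> {
        static constexpr wrapped_int min() noexcept
        {
            return wrapped_int{std::numeric_limits<int>::min()};
        }
        static constexpr wrapped_int max() noexcept
        {
            return wrapped_int{std::numeric_limits<int>::max()};
        }
        static constexpr wrapped_int lowest() noexcept
        {
            return wrapped_int{std::numeric_limits<int>::lowest()};
        }
    };
}

static_assert(std::numeric_limits<wrapped_int>::is_specialized, "");
static_assert(std::numeric_limits<wrapped_int>::is_integer, "");
static_assert(std::numeric_limits<wrapped_int>::digits == std::numeric_limits<int>::digits, "");
```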
<h3><a name="Example"></a>Example</h3>
<p>The following function, <code>magnitude</code>, calculates the magnitude of a 3-dimensional vector.</p>
<div class="sourceCode" id="cb24"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb24-1" title="1"><span class="kw">template</span>&lt;<span class="kw">class</span> Fp&gt;</a>
<a class="sourceLine" id="cb24-2" title="2"><span class="kw">constexpr</span> <span class="kw">auto</span> magnitude(Fp x, Fp y, Fp z)</a>
<a class="sourceLine" id="cb24-3" title="3">{</a>
<a class="sourceLine" id="cb24-4" title="4">    <span class="cf">return</span> sqrt(x*x+y*y+z*z);</a>
<a class="sourceLine" id="cb24-5" title="5">}</a></code></pre></div>
<p>And here is a call to <code>magnitude</code>.</p>
<div class="sourceCode" id="cb25"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb25-1" title="1"><span class="kw">auto</span> m = magnitude(</a>
<a class="sourceLine" id="cb25-2" title="2">        scaled_integer&lt;<span class="dt">uint16_t</span>, power&lt;-<span class="dv">12</span>&gt;&gt;(<span class="dv">1</span>),</a>
<a class="sourceLine" id="cb25-3" title="3">        scaled_integer&lt;<span class="dt">uint16_t</span>, power&lt;-<span class="dv">12</span>&gt;&gt;(<span class="dv">4</span>),</a>
<a class="sourceLine" id="cb25-4" title="4">        scaled_integer&lt;<span class="dt">uint16_t</span>, power&lt;-<span class="dv">12</span>&gt;&gt;(<span class="dv">9</span>));</a>
<a class="sourceLine" id="cb25-5" title="5"><span class="co">// m === scaled_integer&lt;uint32_t, power&lt;-24&gt;&gt;{9.8994948863983154}</span></a></code></pre></div>
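<p>The arithmetic behind this call can be sketched on the stored integers. The sketch assumes that <code>sqrt</code> returns a result with the same exponent as its argument, as the types in the comment above suggest; <code>isqrt</code> and <code>magnitude_rep</code> are hypothetical names.</p>

```cpp
#include <cassert>
#include <cstdint>

// Largest integer r such that r * r <= n, found by binary search.
constexpr std::uint64_t isqrt(std::uint64_t n)
{
    std::uint64_t lo = 0;
    std::uint64_t hi = std::uint64_t{1} << 32;  // the root of any 64-bit value is below 2^32
    while (hi - lo > 1) {
        std::uint64_t const mid = lo + (hi - lo) / 2;
        if (mid * mid <= n) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return lo;
}

// Hypothetical helper: operands are integers scaled by 2^-12 and the result is
// an integer scaled by 2^-24, matching the types in the call above.
constexpr std::uint64_t magnitude_rep(std::uint16_t x, std::uint16_t y, std::uint16_t z)
{
    // Each product of two 2^-12 values is a 2^-24 value.
    // Shifting the sum left by 24 before taking the root keeps the root at 2^-24.
    return isqrt((std::uint64_t{x} * x + std::uint64_t{y} * y + std::uint64_t{z} * z) << 24);
}

static_assert(isqrt(16) == 4, "");
static_assert(isqrt(17) == 4, "");
```

<p>Calling <code>magnitude_rep(1 &lt;&lt; 12, 4 &lt;&lt; 12, 9 &lt;&lt; 12)</code> and dividing the result by 2<sup>24</sup> gives approximately 9.899495, in line with the value in the comment above.</p>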
<h2><a name="Technical-Specification"></a>Technical Specification</h2>
<h3><a name="Header-scaled_integer-Synopsis"></a>Header &lt;scaled_integer&gt; Synopsis</h3>
<div class="sourceCode" id="cb26"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb26-1" title="1"><span class="kw">namespace</span> std {</a>
<a class="sourceLine" id="cb26-2" title="2">  <span class="kw">template</span> &lt;<span class="dt">int</span> Exponent, <span class="dt">int</span> Radix&gt; <span class="kw">class</span> power;</a>
<a class="sourceLine" id="cb26-3" title="3"></a>
<a class="sourceLine" id="cb26-4" title="4">  <span class="kw">template</span> &lt;<span class="kw">class</span> Rep, <span class="kw">class</span> Scale&gt; <span class="kw">class</span> scaled_integer;</a>
<a class="sourceLine" id="cb26-5" title="5"></a>
<a class="sourceLine" id="cb26-6" title="6">  <span class="co">// for each unary arithmetic, comparison, logic and bitwise operator, @</span></a>
<a class="sourceLine" id="cb26-7" title="7">  <span class="kw">template</span> &lt;<span class="kw">class</span> RhsRep, <span class="dt">int</span> RhsExponent, <span class="dt">int</span> RhsRadix&gt;</a>
<a class="sourceLine" id="cb26-8" title="8">    <span class="kw">constexpr</span> <span class="kw">auto</span> <span class="kw">operator</span><span class="er">@</span>(</a>
<a class="sourceLine" id="cb26-9" title="9">      <span class="at">const</span> scaled_integer&lt;RhsRep, power&lt;RhsExponent, RhsRadix&gt;&gt; &amp; rhs);</a>
<a class="sourceLine" id="cb26-10" title="10"></a>
<a class="sourceLine" id="cb26-11" title="11">  <span class="co">// for each binary arithmetic, comparison, logic and bitwise operator, @</span></a>
<a class="sourceLine" id="cb26-12" title="12">  <span class="kw">template</span> &lt;<span class="kw">class</span> LhsRep, <span class="dt">int</span> LhsExponent, <span class="dt">int</span> LhsRadix, <span class="kw">class</span> RhsRep, <span class="dt">int</span> RhsExponent, <span class="dt">int</span> RhsRadix&gt;</a>
<a class="sourceLine" id="cb26-13" title="13">    <span class="kw">constexpr</span> <span class="kw">auto</span> <span class="kw">operator</span><span class="er">@</span>(</a>
<a class="sourceLine" id="cb26-14" title="14">      <span class="at">const</span> scaled_integer&lt;LhsRep, power&lt;LhsExponent, LhsRadix&gt;&gt; &amp; lhs,</a>
<a class="sourceLine" id="cb26-15" title="15">      <span class="at">const</span> scaled_integer&lt;RhsRep, power&lt;RhsExponent, RhsRadix&gt;&gt; &amp; rhs);</a>
<a class="sourceLine" id="cb26-16" title="16"></a>
<a class="sourceLine" id="cb26-17" title="17">  <span class="kw">template</span> &lt;<span class="kw">class</span> LhsRep, <span class="dt">int</span> LhsExponent, <span class="dt">int</span> LhsRadix, <span class="kw">class</span> RhsFloat,</a>
<a class="sourceLine" id="cb26-18" title="18">        <span class="kw">typename</span> = _impl::<span class="dt">enable_if_t</span>&lt;numeric_limits&lt;RhsFloat&gt;::is_iec559&gt;&gt;</a>
<a class="sourceLine" id="cb26-19" title="19">    <span class="kw">constexpr</span> <span class="kw">auto</span> <span class="kw">operator</span><span class="er">@</span>(</a>
<a class="sourceLine" id="cb26-20" title="20">      <span class="at">const</span> scaled_integer&lt;LhsRep, power&lt;LhsExponent, LhsRadix&gt;&gt; &amp; lhs,</a>
<a class="sourceLine" id="cb26-21" title="21">      <span class="at">const</span> RhsFloat &amp; rhs);</a>
<a class="sourceLine" id="cb26-22" title="22">  <span class="kw">template</span> &lt;<span class="kw">class</span> LhsFloat, <span class="kw">class</span> RhsRep, <span class="dt">int</span> RhsExponent, <span class="dt">int</span> RhsRadix,</a>
<a class="sourceLine" id="cb26-23" title="23">        <span class="kw">typename</span> = _impl::<span class="dt">enable_if_t</span>&lt;numeric_limits&lt;LhsFloat&gt;::is_iec559&gt;&gt;</a>
<a class="sourceLine" id="cb26-24" title="24">    <span class="kw">constexpr</span> <span class="kw">auto</span> <span class="kw">operator</span><span class="er">@</span>(</a>
<a class="sourceLine" id="cb26-25" title="25">      <span class="at">const</span> LhsFloat &amp; lhs,</a>
<a class="sourceLine" id="cb26-26" title="26">      <span class="at">const</span> scaled_integer&lt;RhsRep, power&lt;RhsExponent, RhsRadix&gt;&gt; &amp; rhs);</a>
<a class="sourceLine" id="cb26-27" title="27"></a>
<a class="sourceLine" id="cb26-28" title="28">  <span class="kw">template</span> &lt;<span class="kw">class</span> LhsRep, <span class="dt">int</span> LhsExponent, <span class="dt">int</span> LhsRadix, <span class="kw">class</span> RhsInteger,</a>
<a class="sourceLine" id="cb26-29" title="29">        <span class="kw">typename</span> = _impl::<span class="dt">enable_if_t</span>&lt;numeric_limits&lt;RhsInteger&gt;::is_integer&gt;&gt;</a>
<a class="sourceLine" id="cb26-30" title="30">    <span class="kw">constexpr</span> <span class="kw">auto</span> <span class="kw">operator</span><span class="er">@</span>(</a>
<a class="sourceLine" id="cb26-31" title="31">      <span class="at">const</span> scaled_integer&lt;LhsRep, power&lt;LhsExponent, LhsRadix&gt;&gt; &amp; lhs,</a>
<a class="sourceLine" id="cb26-32" title="32">      <span class="at">const</span> RhsInteger &amp; rhs);</a>
<a class="sourceLine" id="cb26-33" title="33">  <span class="kw">template</span> &lt;<span class="kw">class</span> LhsInteger, <span class="kw">class</span> RhsRep, <span class="dt">int</span> RhsExponent, <span class="dt">int</span> RhsRadix,</a>
<a class="sourceLine" id="cb26-34" title="34">        <span class="kw">typename</span> = _impl::<span class="dt">enable_if_t</span>&lt;numeric_limits&lt;LhsInteger&gt;::is_integer&gt;&gt;</a>
<a class="sourceLine" id="cb26-35" title="35">    <span class="kw">constexpr</span> <span class="kw">auto</span> <span class="kw">operator</span><span class="er">@</span>(</a>
<a class="sourceLine" id="cb26-36" title="36">      <span class="at">const</span> LhsInteger &amp; lhs,</a>
<a class="sourceLine" id="cb26-37" title="37">      <span class="at">const</span> scaled_integer&lt;RhsRep, power&lt;RhsExponent, RhsRadix&gt;&gt; &amp; rhs);</a>
<a class="sourceLine" id="cb26-38" title="38"></a>
<a class="sourceLine" id="cb26-39" title="39">  <span class="kw">template</span> &lt;<span class="kw">class</span> LhsRep, <span class="dt">int</span> LhsExponent, <span class="dt">int</span> LhsRadix, <span class="kw">auto</span> RhsValue&gt;</a>
<a class="sourceLine" id="cb26-40" title="40">    <span class="kw">constexpr</span> <span class="kw">auto</span> <span class="kw">operator</span><span class="er">@</span>(</a>
<a class="sourceLine" id="cb26-41" title="41">      <span class="at">const</span> scaled_integer&lt;LhsRep, power&lt;LhsExponent, LhsRadix&gt;&gt; &amp; lhs,</a>
<a class="sourceLine" id="cb26-42" title="42">      constant&lt;RhsValue&gt;);</a>
<a class="sourceLine" id="cb26-43" title="43">  <span class="kw">template</span> &lt;<span class="kw">auto</span> LhsValue, <span class="kw">class</span> RhsRep, <span class="dt">int</span> RhsExponent, <span class="dt">int</span> RhsRadix&gt;</a>
<a class="sourceLine" id="cb26-44" title="44">    <span class="kw">constexpr</span> <span class="kw">auto</span> <span class="kw">operator</span><span class="er">@</span>(</a>
<a class="sourceLine" id="cb26-45" title="45">      constant&lt;LhsValue&gt;,</a>
<a class="sourceLine" id="cb26-46" title="46">      <span class="at">const</span> scaled_integer&lt;RhsRep, power&lt;RhsExponent, RhsRadix&gt;&gt; &amp; rhs);</a>
<a class="sourceLine" id="cb26-47" title="47"></a>
<a class="sourceLine" id="cb26-48" title="48">  <span class="co">// for each arithmetic, comparison, logic and bitwise compound assignment operator, @=</span></a>
<a class="sourceLine" id="cb26-49" title="49">  <span class="kw">template</span> &lt;<span class="kw">class</span> LhsRep, <span class="dt">int</span> LhsExponent, <span class="dt">int</span> LhsRadix, <span class="kw">class</span> RhsRep, <span class="dt">int</span> RhsExponent, <span class="dt">int</span> RhsRadix&gt;</a>
<a class="sourceLine" id="cb26-50" title="50">    <span class="kw">constexpr</span> <span class="kw">auto</span> <span class="kw">operator</span><span class="er">@</span>=(</a>
<a class="sourceLine" id="cb26-51" title="51">      scaled_integer&lt;LhsRep, power&lt;LhsExponent, LhsRadix&gt;&gt; &amp; lhs,</a>
<a class="sourceLine" id="cb26-52" title="52">      <span class="at">const</span> scaled_integer&lt;RhsRep, power&lt;RhsExponent, RhsRadix&gt;&gt; &amp; rhs);</a>
<a class="sourceLine" id="cb26-53" title="53"></a>
<a class="sourceLine" id="cb26-54" title="54">  <span class="kw">template</span> &lt;<span class="kw">class</span> LhsRep, <span class="dt">int</span> LhsExponent, <span class="dt">int</span> LhsRadix, <span class="kw">class</span> RhsFloat,</a>
<a class="sourceLine" id="cb26-55" title="55">        <span class="kw">typename</span> = _impl::<span class="dt">enable_if_t</span>&lt;numeric_limits&lt;RhsFloat&gt;::is_iec559&gt;&gt;</a>
<a class="sourceLine" id="cb26-56" title="56">    <span class="kw">constexpr</span> <span class="kw">auto</span> <span class="kw">operator</span><span class="er">@</span>=(</a>
<a class="sourceLine" id="cb26-57" title="57">      scaled_integer&lt;LhsRep, power&lt;LhsExponent, LhsRadix&gt;&gt; &amp; lhs,</a>
<a class="sourceLine" id="cb26-58" title="58">      <span class="at">const</span> RhsFloat &amp; rhs);</a>
<a class="sourceLine" id="cb26-59" title="59">  <span class="kw">template</span> &lt;<span class="kw">class</span> LhsFloat, <span class="kw">class</span> RhsRep, <span class="dt">int</span> RhsExponent, <span class="dt">int</span> RhsRadix,</a>
<a class="sourceLine" id="cb26-60" title="60">        <span class="kw">typename</span> = _impl::<span class="dt">enable_if_t</span>&lt;numeric_limits&lt;LhsFloat&gt;::is_iec559&gt;&gt;</a>
<a class="sourceLine" id="cb26-61" title="61">    <span class="kw">constexpr</span> <span class="kw">auto</span> <span class="kw">operator</span><span class="er">@</span>=(</a>
<a class="sourceLine" id="cb26-62" title="62">      LhsFloat &amp; lhs,</a>
<a class="sourceLine" id="cb26-63" title="63">      <span class="at">const</span> scaled_integer&lt;RhsRep, power&lt;RhsExponent, RhsRadix&gt;&gt; &amp; rhs);</a>
<a class="sourceLine" id="cb26-64" title="64"></a>
<a class="sourceLine" id="cb26-65" title="65">  <span class="kw">template</span> &lt;<span class="kw">class</span> LhsRep, <span class="dt">int</span> LhsExponent, <span class="dt">int</span> LhsRadix, <span class="kw">class</span> RhsInteger,</a>
<a class="sourceLine" id="cb26-66" title="66">        <span class="kw">typename</span> = _impl::<span class="dt">enable_if_t</span>&lt;numeric_limits&lt;RhsInteger&gt;::is_integer&gt;&gt;</a>
<a class="sourceLine" id="cb26-67" title="67">    <span class="kw">constexpr</span> <span class="kw">auto</span> <span class="kw">operator</span><span class="er">@</span>=(</a>
<a class="sourceLine" id="cb26-68" title="68">      scaled_integer&lt;LhsRep, power&lt;LhsExponent, LhsRadix&gt;&gt; &amp; lhs,</a>
<a class="sourceLine" id="cb26-69" title="69">      <span class="at">const</span> RhsInteger &amp; rhs);</a>
<a class="sourceLine" id="cb26-70" title="70">  <span class="kw">template</span> &lt;<span class="kw">class</span> LhsInteger, <span class="kw">class</span> RhsRep, <span class="dt">int</span> RhsExponent, <span class="dt">int</span> RhsRadix,</a>
<a class="sourceLine" id="cb26-71" title="71">        <span class="kw">typename</span> = _impl::<span class="dt">enable_if_t</span>&lt;numeric_limits&lt;LhsInteger&gt;::is_integer&gt;&gt;</a>
<a class="sourceLine" id="cb26-72" title="72">    <span class="kw">constexpr</span> <span class="kw">auto</span> <span class="kw">operator</span><span class="er">@</span>=(</a>
<a class="sourceLine" id="cb26-73" title="73">      LhsInteger &amp; lhs,</a>
<a class="sourceLine" id="cb26-74" title="74">      <span class="at">const</span> scaled_integer&lt;RhsRep, power&lt;RhsExponent, RhsRadix&gt;&gt; &amp; rhs);</a>
<a class="sourceLine" id="cb26-75" title="75"></a>
<a class="sourceLine" id="cb26-76" title="76">  <span class="kw">template</span> &lt;<span class="kw">class</span> LhsRep, <span class="dt">int</span> LhsExponent, <span class="dt">int</span> LhsRadix, <span class="kw">auto</span> RhsValue&gt;</a>
<a class="sourceLine" id="cb26-77" title="77">    <span class="kw">constexpr</span> <span class="kw">auto</span> <span class="kw">operator</span><span class="er">@</span>=(</a>
<a class="sourceLine" id="cb26-78" title="78">      scaled_integer&lt;LhsRep, power&lt;LhsExponent, LhsRadix&gt;&gt; &amp; lhs,</a>
<a class="sourceLine" id="cb26-79" title="79">      constant&lt;RhsValue&gt;);</a>
<a class="sourceLine" id="cb26-80" title="80"></a>
<a class="sourceLine" id="cb26-81" title="81">  <span class="kw">template</span> &lt;<span class="kw">auto</span> Value&gt;</a>
<a class="sourceLine" id="cb26-82" title="82">  scaled_integer(constant&lt;Value&gt;)</a>
<a class="sourceLine" id="cb26-83" title="83">  -&gt; <span class="co">/* ... */</span>;</a>
<a class="sourceLine" id="cb26-84" title="84"></a>
<a class="sourceLine" id="cb26-85" title="85">  <span class="kw">template</span> &lt;<span class="kw">class</span> Integer&gt;</a>
<a class="sourceLine" id="cb26-86" title="86">  scaled_integer(Integer)</a>
<a class="sourceLine" id="cb26-87" title="87">  -&gt; scaled_integer&lt;Integer, power&lt;<span class="dv">0</span>&gt;&gt;;</a>
<a class="sourceLine" id="cb26-88" title="88">}</a></code></pre></div>
<h4><a name="scaled_integer-Class-Template"></a><code>scaled_integer&lt;&gt;</code> Class Template</h4>
<div class="sourceCode" id="cb27"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb27-1" title="1"><span class="kw">template</span> &lt;<span class="dt">int</span> Exponent = <span class="dv">0</span>, <span class="dt">int</span> Radix = <span class="dv">2</span>&gt;</a>
<a class="sourceLine" id="cb27-2" title="2"><span class="kw">struct</span> power {};</a>
<a class="sourceLine" id="cb27-3" title="3"></a>
<a class="sourceLine" id="cb27-4" title="4"><span class="kw">template</span> &lt;<span class="kw">class</span> Rep = <span class="dt">int</span>, <span class="kw">class</span> Scale = power&lt;&gt;&gt;</a>
<a class="sourceLine" id="cb27-5" title="5"><span class="kw">class</span> scaled_integer</a>
<a class="sourceLine" id="cb27-6" title="6">{</a>
<a class="sourceLine" id="cb27-7" title="7"><span class="kw">public</span>:</a>
<a class="sourceLine" id="cb27-8" title="8">  <span class="kw">using</span> rep = Rep;</a>
<a class="sourceLine" id="cb27-9" title="9">  <span class="kw">constexpr</span> <span class="at">static</span> <span class="dt">int</span> radix;</a>
<a class="sourceLine" id="cb27-10" title="10"></a>
<a class="sourceLine" id="cb27-11" title="11">  <span class="kw">constexpr</span> <span class="at">static</span> <span class="dt">int</span> exponent;</a>
<a class="sourceLine" id="cb27-12" title="12"></a>
<a class="sourceLine" id="cb27-13" title="13">  <span class="kw">constexpr</span> scaled_integer();</a>
<a class="sourceLine" id="cb27-14" title="14">  <span class="kw">template</span>&lt;<span class="kw">class</span> FromRep, <span class="dt">int</span> FromExponent, <span class="dt">int</span> FromRadix&gt;</a>
<a class="sourceLine" id="cb27-15" title="15">    <span class="kw">constexpr</span> scaled_integer(scaled_integer&lt;FromRep, power&lt;FromExponent, FromRadix&gt;&gt; <span class="at">const</span>&amp;);</a>
<a class="sourceLine" id="cb27-16" title="16">  <span class="kw">template</span>&lt;<span class="kw">auto</span> Value&gt;</a>
<a class="sourceLine" id="cb27-17" title="17">    <span class="kw">constexpr</span> scaled_integer(constant&lt;Value&gt;);</a>
<a class="sourceLine" id="cb27-18" title="18">  <span class="kw">template</span>&lt;<span class="kw">class</span> S, _impl::<span class="dt">enable_if_t</span>&lt;numeric_limits&lt;S&gt;::is_integer, <span class="dt">int</span>&gt; Dummy = <span class="dv">0</span>&gt;</a>
<a class="sourceLine" id="cb27-19" title="19">    <span class="kw">constexpr</span> scaled_integer(S <span class="at">const</span>&amp;);</a>
<a class="sourceLine" id="cb27-20" title="20">  <span class="kw">template</span>&lt;<span class="kw">class</span> S, _impl::<span class="dt">enable_if_t</span>&lt;numeric_limits&lt;S&gt;::is_iec559, <span class="dt">int</span>&gt; Dummy = <span class="dv">0</span>&gt;</a>
<a class="sourceLine" id="cb27-21" title="21">    <span class="kw">constexpr</span> scaled_integer(S);</a>
<a class="sourceLine" id="cb27-22" title="22">  <span class="kw">template</span>&lt;<span class="kw">class</span> Numerator, <span class="kw">class</span> Denominator&gt;</a>
<a class="sourceLine" id="cb27-23" title="23">    <span class="kw">constexpr</span> scaled_integer(<span class="at">const</span> fractional&lt;Numerator, Denominator&gt;&amp;);</a>
<a class="sourceLine" id="cb27-24" title="24">  <span class="kw">template</span>&lt;<span class="kw">class</span> S, _impl::<span class="dt">enable_if_t</span>&lt;numeric_limits&lt;S&gt;::is_integer, <span class="dt">int</span>&gt; Dummy = <span class="dv">0</span>&gt;</a>
<a class="sourceLine" id="cb27-25" title="25">    <span class="kw">constexpr</span> scaled_integer&amp; <span class="kw">operator</span>=(S);</a>
<a class="sourceLine" id="cb27-26" title="26">  <span class="kw">template</span>&lt;<span class="kw">class</span> S, _impl::<span class="dt">enable_if_t</span>&lt;numeric_limits&lt;S&gt;::is_iec559, <span class="dt">int</span>&gt; Dummy = <span class="dv">0</span>&gt;</a>
<a class="sourceLine" id="cb27-27" title="27">    <span class="kw">constexpr</span> scaled_integer&amp; <span class="kw">operator</span>=(S);</a>
<a class="sourceLine" id="cb27-28" title="28">  <span class="kw">template</span>&lt;<span class="kw">class</span> FromRep, <span class="dt">int</span> FromExponent, <span class="dt">int</span> FromRadix&gt;</a>
<a class="sourceLine" id="cb27-29" title="29">    <span class="kw">constexpr</span> scaled_integer&amp; <span class="kw">operator</span>=(</a>
<a class="sourceLine" id="cb27-30" title="30">        scaled_integer&lt;FromRep, power&lt;FromExponent, FromRadix&gt;&gt; <span class="at">const</span>&amp;);</a>
<a class="sourceLine" id="cb27-31" title="31">  <span class="kw">template</span>&lt;<span class="kw">class</span> S, _impl::<span class="dt">enable_if_t</span>&lt;numeric_limits&lt;S&gt;::is_integer, <span class="dt">int</span>&gt; Dummy = <span class="dv">0</span>&gt;</a>
<a class="sourceLine" id="cb27-32" title="32">    <span class="kw">explicit</span> <span class="kw">constexpr</span> <span class="kw">operator</span> S() <span class="at">const</span>;</a>
<a class="sourceLine" id="cb27-33" title="33">  <span class="kw">template</span>&lt;<span class="kw">class</span> S, _impl::<span class="dt">enable_if_t</span>&lt;numeric_limits&lt;S&gt;::is_iec559, <span class="dt">int</span>&gt; Dummy = <span class="dv">0</span>&gt;</a>
<a class="sourceLine" id="cb27-34" title="34">    <span class="kw">explicit</span> <span class="kw">constexpr</span> <span class="kw">operator</span> S() <span class="at">const</span>;</a>
<a class="sourceLine" id="cb27-35" title="35">};</a></code></pre></div>
<h2><a name="Open-Issues"></a>Open Issues</h2>
<h3><a name="Library-Support"></a>Library Support</h3>
<p>Because the aim is to provide an alternative to existing arithmetic types which are supported by the standard library, it is conceivable that a future proposal might specialize existing class templates and overload existing functions.</p>
<p>Possible candidates for overloading include the functions defined in &lt;cmath&gt; and a templated specialization of <code>numeric_limits</code>. A new type trait, <code>is_scaled_integer</code>, would also be useful.</p>
<p>While <code>scaled_integer</code> is intended to provide drop-in replacements for existing built-ins, it may be preferable to deviate slightly from the behavior of certain standard functions. For example, overloads of functions from &lt;cmath&gt; will be considerably less concise, efficient and versatile if they obey rules surrounding error cases. In particular, the guarantee of setting <code>errno</code> in the case of an error prevents a function from being defined as pure. This highlights a wider issue surrounding the adoption of the functional approach and compile-time computation that is beyond the scope of this document.</p>
<p>One suggested addition is a specialization of <code>std::complex</code>. This would take the form:</p>
<div class="sourceCode" id="cb28"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb28-1" title="1"><span class="kw">template</span>&lt;<span class="kw">class</span> Rep, <span class="dt">int</span> Exponent, <span class="dt">int</span> Radix&gt;</a>
<a class="sourceLine" id="cb28-2" title="2"><span class="kw">class</span> complex&lt;scaled_integer&lt;Rep, power&lt;Exponent, Radix&gt;&gt;&gt;;</a></code></pre></div>
<p>This type's arithmetic operators would differ from existing specializations because <code>scaled_integer&lt;&gt;</code> operators often return results of a different type to their operands. Hence signatures such as</p>
<div class="sourceCode" id="cb29"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb29-1" title="1"><span class="kw">template</span>&lt;<span class="kw">class</span> T&gt;</a>
<a class="sourceLine" id="cb29-2" title="2">complex&lt;T&gt; <span class="kw">operator</span>*( <span class="at">const</span> complex&lt;T&gt;&amp; lhs, <span class="at">const</span> complex&lt;T&gt;&amp; rhs);</a></code></pre></div>
<p>would need to be replaced with:</p>
<div class="sourceCode" id="cb30"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb30-1" title="1"><span class="kw">template</span>&lt;<span class="kw">class</span> T&gt;</a>
<a class="sourceLine" id="cb30-2" title="2"><span class="kw">auto</span> <span class="kw">operator</span>*( <span class="at">const</span> complex&lt;T&gt;&amp; lhs, <span class="at">const</span> complex&lt;T&gt;&amp; rhs);</a></code></pre></div>
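<p>As a rough illustration of why a deduced return type is needed, the following sketch uses a hypothetical <code>toy_complex</code> template (a stand-in for illustration only, not <code>std::complex</code> and not part of this proposal) with built-in integer components. Integer promotion here plays the role that the widening <code>scaled_integer</code> operators would play: the component type of the product is not the component type of the operands. The sketch assumes C++14 and a 32-bit <code>int</code>.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a complex type; not std::complex.
template<class T>
struct toy_complex {
    T re;
    T im;
};

// The result's component type is whatever T*T - T*T yields,
// which need not be T itself, so the return type is deduced.
template<class T>
constexpr auto operator*(toy_complex<T> const& lhs, toy_complex<T> const& rhs)
{
    using result_component = decltype(lhs.re * rhs.re - lhs.im * rhs.im);
    return toy_complex<result_component>{
        lhs.re * rhs.re - lhs.im * rhs.im,
        lhs.re * rhs.im + lhs.im * rhs.re};
}

// 16-bit operands; the products would not fit in 16 bits.
constexpr auto product =
    toy_complex<std::int16_t>{300, 200} * toy_complex<std::int16_t>{300, 200};

// Integer promotion makes the components int rather than int16_t.
static_assert(std::is_same<decltype(product), toy_complex<int> const>::value,
              "result components are wider than operand components");
```

<p>With a signature returning <code>complex&lt;T&gt;</code>, the result would have to be narrowed back to the operand type, discarding exactly the information that the wider result type preserves.</p>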
<h3><a name="Extended-Comparison-Range"></a>Extended Comparison Range</h3>
<p>Comparison operations between two <code>scaled_integer</code> operands require that they both have the same exponent. When they do not, conversion takes place to ensure they do. Unfortunately, if the difference in exponents is too great, the conversion may cause an out-of-bounds condition.</p>
<p>However, where two operands have bits whose values are in ranges that do not overlap, it may not be necessary to perform a conversion which results in out-of-range results: a result that ensures they continue to not overlap may be sufficient. For example,</p>
<div class="sourceCode" id="cb31"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb31-1" title="1"><span class="kw">static_assert</span>(scaled_integer&lt;<span class="dt">uint8_t</span>&gt;{<span class="dv">0</span>} &lt; scaled_integer&lt;<span class="dt">uint8_t</span>, power&lt;<span class="dv">128</span>&gt;&gt;{<span class="fl">4.e38</span>});</a></code></pre></div>
<p>requires that the right-hand operand be converted to <code>scaled_integer&lt;uint8_t&gt;</code>. This will result in the underlying integer being scaled up by 128 bits, resulting in undefined behavior and/or a flushed value. But in this case, it only needs to be scaled by 8 bits in order for none of its bit values to overlap with those of the left-hand operand.</p>
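<p>One way this could work is sketched below using raw 8-bit values and explicit base-2 exponents rather than <code>scaled_integer</code> itself. The function <code>less_scaled</code> is hypothetical and only illustrates the idea of clamping the shift to the width of the representation. It assumes C++14 and that the difference between the two exponents fits in an <code>int</code>.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: is  lhs * 2^lhs_exp  <  rhs * 2^rhs_exp ?
// Both representations have 8 bits, so a shift of 8 already moves one
// operand's bits clear of the other's. Larger differences are clamped.
constexpr bool less_scaled(std::uint8_t lhs, int lhs_exp,
                           std::uint8_t rhs, int rhs_exp)
{
    constexpr int digits = 8;
    int diff = rhs_exp - lhs_exp;
    if (diff > digits) diff = digits;
    if (diff < -digits) diff = -digits;

    // Widen so that a shift of up to 8 bits cannot overflow.
    std::uint32_t l = lhs;
    std::uint32_t r = rhs;
    if (diff >= 0) {
        r <<= diff;
    } else {
        l <<= -diff;
    }
    return l < r;
}

// The comparison from the text, assuming 4.e38 is stored as 1 * 2^128.
static_assert(less_scaled(0, 0, 1, 128), "");
// Small exponent differences are still compared exactly: 4 < 1 * 2^2 is false.
static_assert(!less_scaled(4, 0, 1, 2), "");
```

<p>The clamp is safe because a non-zero 8-bit value shifted up by 8 bits exceeds every unshifted 8-bit value, and a zero value compares the same however far it is shifted.</p>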
<h3><a name="Allow-Binary-Operations-if-Radixes-are-Different"></a>Allow Binary Operations if Radixes are Different</h3>
<p>Does it make sense to allow binary operations which take, say, a base-2 and a base-10 number? The answer is relatively straightforward when one considers that all base-2 numbers can be expressed using base-10 numbers. What about a base-2 and a base-3 number? At this point, we may need to convert them to a base-6 number to proceed.</p>
<p>Next, how do the exponents interact in situations when the radixes are different? For example, when adding <code>scaled_integer&lt;int, power&lt;-2, 2&gt;&gt;</code> and <code>scaled_integer&lt;int, power&lt;-1, 4&gt;&gt;</code>, is the result the former or the latter? They are computationally equivalent because they both represent units of 0.25.</p>
<p>The likely solution is to choose a result type with the minimum radix and then the minimum exponent necessary in order to be able to represent all possible values. However, this may result in a set of operators which are surprising to the user. Thus it is tempting to simply forbid inter-radix operations. (Note: a similar problem is faced by <code>common_type(chrono::duration)</code>.)</p>
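<p>The equivalence of the two scales mentioned above can be checked with a small sketch. The helper <code>unit</code> is hypothetical: it computes the value of one unit, radix<sup>exponent</sup>, as a <code>double</code>, and assumes C++14 for the loops in a <code>constexpr</code> function.</p>

```cpp
#include <cassert>

// Hypothetical helper: the value of one unit of a scale, radix^exponent.
constexpr double unit(int exponent, int radix)
{
    double result = 1.0;
    for (int i = 0; i < exponent; ++i) result *= radix;
    for (int i = 0; i > exponent; --i) result /= radix;
    return result;
}

// power<-2, 2> and power<-1, 4> both describe units of 0.25.
static_assert(unit(-2, 2) == 0.25, "");
static_assert(unit(-1, 4) == 0.25, "");

// By contrast, a decimal tenth has no exact base-2 counterpart nearby.
static_assert(unit(-1, 10) != unit(-3, 2), "");
```

<p>Two parameterizations that denote the same unit are distinct types, which is part of what makes the choice of result type for mixed-radix operators awkward.</p>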
<h2><a name="Prior-Art"></a>Prior Art</h2>
<p>Many examples of fixed-point support in programming languages and their libraries exist. While almost all of them aim for low run-time cost and expressive alternatives to raw integer manipulation, they vary greatly in detail and in terms of their interface.</p>
<p>One especially interesting dichotomy is between solutions which offer a discrete selection of fixed-point types and libraries which contain a continuous range of exponents through type parameterization.</p>
<h3><a name="N1169"></a>N1169</h3>
<p>One example of the former is found in proposal N1169 <a href="http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1169.pdf">[N1169]</a>, the intent of which is to expose features found in certain embedded hardware. It introduces a succinct set of language-level fixed-point types and imposes constraints on the number of integer or fractional digits each can possess.</p>
<p>As with all examples of discrete-type fixed-point support, the limited choice of exponents is a considerable restriction on the versatility and expressiveness of the API.</p>
<p>Nevertheless, it may be possible to harness performance gains provided by N1169 fixed-point types through explicit template specialization. This is likely to be a valuable proposition to potential users of the library who find themselves targeting platforms which support fixed-point arithmetic at the hardware level.</p>
<h3><a name="P0106"></a>P0106</h3>
<p>There are many other C++ libraries available which fall into the latter category of continuous-range fixed-point arithmetic <a href="https://github.com/mizvekov/fp">[mizvekov]</a> <a href="http://www.codeproject.com/Articles/37636/Fixed-Point-Class">[schregle]</a> <a href="https://github.com/viboes/fixed_point">[viboes]</a>. In particular, an existing library proposal by Lawrence Crowl, P0106 <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0106r0.html">[P0106]</a> (formerly N3352), aims to achieve very similar goals through similar means and warrants closer comparison than N1169.</p>
<p>P0106 introduces four class templates covering the quadrant of signed versus unsigned and fractional versus integer numeric types. It is intended to replace built-in types in a wide variety of situations and accordingly, is highly compile-time configurable in terms of how rounding and overflow are handled. Parameters to these four class templates include the range in bits and - for fractional types - the resolution.</p>
<p>The <code>scaled_integer</code> class template could probably - with a few caveats - be generated using the two fractional types, <code>nonnegative</code> and <code>negatable</code>, replacing the <code>Rep</code> parameter with the integer bit count of <code>Rep</code>, specifying <code>fastest</code> for the rounding mode and specifying <code>undefined</code> as the overflow mode.</p>
<p>However, <code>scaled_integer</code> more closely and concisely caters to the needs of users who already use integer types and simply desire a less error-prone form. It more closely follows the five design aims of this paper and, arguably, more closely follows the spirit of the standard in its pursuit of zero-cost abstraction.</p>
<p>Some aspects of the design of the P0106 API which back up these conclusions are that:</p>
<ul>
<li>the nature of the range-specifying template parameters - through careful framing in mathematical terms - abstracts away valuable information regarding machine-critical type size information;</li>
<li>the breaking up of duties amongst four separate class templates introduces four new concepts and incurs additional mental load for relatively little gain while further detaching the interface from vital machine-level details;</li>
<li>the absence of the most negative number from signed types reduces the capacity of all types by one; and</li>
<li>the selection of rounding and overflow modes via enumerations limits the choices to a pre-existing finite set and prevents the user from providing their own such strategies — or indeed from customizing fixed-point types in more exotic ways such as by using non-standard, platform-specific integer types.</li>
</ul>
<p>A more detailed comparison of the approaches taken in this paper and P0106 can be found in <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0554r0.html">[P0554]</a>.</p>
<h3><a name="Ada-Language-Support"></a>Ada Language Support</h3>
<p>Most languages lack fixed-point support. One difficulty in supporting fixed-point in a type system is that applications of fixed-point are sensitive to the choice of exponent value and parameterization of types is limited by syntax.</p>
<p>Ada provides binary and decimal fixed-point types and specifies resolution and range using <code>delta</code> and <code>range</code> literals, e.g.:</p>
<div class="sourceCode" id="cb32"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb32-1" title="1">type T is delta <span class="fl">0.0625</span> range <span class="fl">0.0</span> .. <span class="fl">16.0</span>;</a></code></pre></div>
<p>Limitations imposed on C++'s UDL syntax and non-type template parameters make it difficult to form types using literals in this way.</p>
<h2><a name="Acknowledgements"></a>Acknowledgements</h2>
<p>SG6: Davis Herring, Lawrence Crowl, Lisa Lippincott<br />
SG14: Guy Davidson, Michael Wong<br />
Contributors: Ed Ainsley, Billy Baker, Lance Dyson, Marco Foco, Mathias Gaunard, Clément Grégoire, Nicolas Guillemot, Kurt Guntheroth, Matt Kinzelman, Joël Lamotte, Sean Middleditch, Paul Robinson, Patrice Roy, Peter Schregle, Ryhor Spivak</p>
<h2><a name="Revisions"></a>Revisions</h2>
<p>This paper revises <a href="https://github.com/johnmcfarlane/papers/blob/master/wg21/p0037r5.md">P0037R6</a>:</p>
<ul>
<li>renamed <code>fixed_point</code> to <code>scaled_integer</code> following feedback from SG6 and LEWG in Kona</li>
<li>replaced <code>Exponent</code> and <code>Radix</code> parameters with the <code>power</code> tag type</li>
<li>removed sections discussing <code>fixed_point</code> versus <code>scaled_integer</code> including:
<ul>
<li>division operator issues</li>
<li>details of template parameters</li>
</ul></li>
<li>added mention of fixed-point types in Ada</li>
</ul>
<p>P0037R6 revises <a href="https://github.com/johnmcfarlane/papers/blob/master/wg21/p0037r5.md">P0037R5</a>:</p>
<ul>
<li>response to <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1368r0.html">[P1368]</a>
<ul>
<li>added section, <a href="#Division-Operator">Division Operator</a></li>
<li>added section, <a href="#Rename-fixed_point">Rename <code>fixed_point</code> to avoid confusion over division</a></li>
<li>added detail to section, <a href="#Custom-Division">Custom Division</a></li>
</ul></li>
</ul>
<p>P0037R5 revises <a href="https://github.com/johnmcfarlane/papers/blob/master/wg21/p0037r4.md">P0037R4</a>:</p>
<ul>
<li>feedback from SG6 chair:
<ul>
<li>removed <code>multiply</code> and <code>divide</code> functions</li>
<li>added <code>Radix</code> non-type template parameter to <code>fixed_point</code>
<ul>
<li>added discussion of operators with different radixes</li>
<li>removed discussion of operators with different exponents</li>
</ul></li>
<li>put case for rearranging template parameters of <code>fixed_point</code> to put <code>Exponent</code> first</li>
<li>corrected inaccurate or out-of-date information in the P0106 prior art section</li>
</ul></li>
<li>added references to <code>fractional</code> and <code>elastic_integer</code></li>
<li>added deduction guides and <code>fractional</code> c'tor and assignment operator</li>
</ul>
<p>P0037R4 revises <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0037r3.html">P0037R3</a>:</p>
<ul>
<li>removed mention of <code>width</code> and <code>set_width</code></li>
<li>rewritten description of <code>Rep</code> template parameter</li>
<li>added sections, <strong>Access to <code>Rep</code> Value</strong> and <strong>Class Template Deduction</strong></li>
<li>removed <code>make_fixed</code> and <code>make_ufixed</code> function templates</li>
<li>rewritten <strong>Operator Overloads</strong> section, renamed <strong>Operators</strong> and included <code>constant</code> operators</li>
<li>removed sections, <strong>Overflow</strong> and <strong>Underflow</strong></li>
<li>removed <code>add</code> and <code>subtract</code> function templates</li>
<li>reversed the roles of <code>operator/</code> and <code>divide</code></li>
<li>replaced section on composability with link to <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0554r0.html">[P0554]</a></li>
<li>replaced reference to <a href="http://johnmcfarlane.github.io/fixed_point/papers/p0381r1.html">[P0381]</a> with reference to <a href="https://github.com/johnmcfarlane/papers/blob/master/wg21/p0675r0.md">[P0675]</a></li>
<li>revised synopsis</li>
<li>renamed section <strong>Future Issues</strong> to <strong>Open Issues</strong> and added sections,
<ul>
<li><strong>Different Division Strategies</strong>, <strong>Template Parameter Order</strong>, <strong>Disable Addition/Subtraction if Exponents are Different</strong>, <strong>Named Functions</strong>, <strong>Extended Comparison Range</strong> and removed sections</li>
<li><strong>Compile-Time Bit-Shift Operations</strong>, <strong>Alternative Return Type Policies</strong>,</li>
</ul></li>
<li>removed section, <strong>References</strong>, moving links into document body</li>
<li>rewrote <strong>Appendix 1: Reference Implementation</strong> referencing <a href="https://github.com/johnmcfarlane/cnl">[github]</a></li>
<li>formatting changes intended to make markdown more readable as plain text</li>
</ul>
<h2><a name="Reference-Implementation"></a>Appendix 1: Reference Implementation</h2>
<p>An in-development implementation of the <code>scaled_integer</code> class template and its essential supporting functions and types is available <a href="https://github.com/johnmcfarlane/cnl">[github]</a>.</p>
<h2><a name="Performance"></a>Appendix 2: Performance</h2>
<p>Despite a focus on usable interface and direct translation from integer-based fixed-point operations, there is an overwhelming expectation that the source code result in minimal instructions and clock cycles. A few preliminary numbers are presented to give a very early idea of how the API might perform.</p>
<p>Some notes:</p>
<ul>
<li><p>A few test functions were run, ranging from single arithmetic operations to basic geometric functions, performed against integer, floating-point and fixed-point types for comparison.</p></li>
<li><p>Figures were taken from a single CPU, OS and compiler, namely:</p>
<ul>
<li>Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0)</li>
<li>Target: x86_64-pc-linux-gnu</li>
<li>Thread model: posix</li>
</ul></li>
<li><p>Fixed inputs were provided to each function, meaning that branch prediction rarely fails. Results may also not represent the full range of inputs.</p></li>
<li><p>Details of the test harness used can be found in the source project mentioned in Appendix 1.</p></li>
<li><p>Times are in nanoseconds.</p></li>
<li><p>Code has not yet been optimized for performance.</p></li>
</ul>
<h3><a name="Types"></a>Types</h3>
<p>Where applicable, various combinations of integer, floating-point and fixed-point types were tested with the following identifiers:</p>
<ul>
<li><code>uint8_t</code>, <code>int8_t</code>, <code>uint16_t</code>, <code>int16_t</code>, <code>uint32_t</code>, <code>int32_t</code>, <code>uint64_t</code> and <code>int64_t</code> built-in integer types;</li>
<li><code>float</code>, <code>double</code> and <code>long double</code> built-in floating-point types;</li>
<li>s3:4, u4:4, s7:8, u8:8, s15:16, u16:16, s31:32 and u32:32 format fixed-point types.</li>
</ul>
<h3><a name="Basic-Arithmetic"></a>Basic Arithmetic</h3>
<p>Plus, minus, multiplication and division were tested in isolation using a number of different numeric types with the following results:</p>
<pre><code>name              cpu_time (ns)
add(float)        1.78011
add(double)       1.73966
add(long double)  3.46011
add(u4_4)         1.87726
add(s3_4)         1.85051
add(u8_8)         1.85417
add(s7_8)         1.82057
add(u16_16)       1.94194
add(s15_16)       1.93463
add(u32_32)       1.94674
add(s31_32)       1.94446
add(int8_t)       2.14857
add(uint8_t)      2.12571
add(int16_t)      1.9936
add(uint16_t)     1.88229
add(int32_t)      1.82126
add(uint32_t)     1.76
add(int64_t)      1.76
add(uint64_t)     1.83223
sub(float)        1.96617
sub(double)       1.98491
sub(long double)  3.55474
sub(u4_4)         1.77006
sub(s3_4)         1.72983
sub(u8_8)         1.72983
sub(s7_8)         1.72983
sub(u16_16)       1.73966
sub(s15_16)       1.85051
sub(u32_32)       1.88229
sub(s31_32)       1.87063
sub(int8_t)       1.76
sub(uint8_t)      1.74994
sub(int16_t)      1.82126
sub(uint16_t)     1.83794
sub(int32_t)      1.89074
sub(uint32_t)     1.85417
sub(int64_t)      1.83703
sub(uint64_t)     2.04914
mul(float)        1.9376
mul(double)       1.93097
mul(long double)  102.446
mul(u4_4)         2.46583
mul(s3_4)         2.09189
mul(u8_8)         2.08
mul(s7_8)         2.18697
mul(u16_16)       2.12571
mul(s15_16)       2.10789
mul(u32_32)       2.10789
mul(s31_32)       2.10789
mul(int8_t)       1.76
mul(uint8_t)      1.78011
mul(int16_t)      1.8432
mul(uint16_t)     1.76914
mul(int32_t)      1.78011
mul(uint32_t)     2.19086
mul(int64_t)      1.7696
mul(uint64_t)     1.79017
div(float)        5.12
div(double)       7.64343
div(long double)  8.304
div(u4_4)         3.82171
div(s3_4)         3.82171
div(u8_8)         3.84
div(s7_8)         3.8
div(u16_16)       9.152
div(s15_16)       11.232
div(u32_32)       30.8434
div(s31_32)       34
div(int8_t)       3.82171
div(uint8_t)      3.82171
div(int16_t)      3.8
div(uint16_t)     3.82171
div(int32_t)      3.82171
div(uint32_t)     3.81806
div(int64_t)      10.2286
div(uint64_t)     8.304
</code></pre>
<p>Among the slowest types is <code>long double</code>. It is likely that it is emulated in software. The next slowest operations are fixed-point multiply and divide operations - especially with 64-bit types. This is because values need to be promoted temporarily to double-width types. This is a known fixed-point technique which inevitably experiences slowdown where a 128-bit type is required on a 64-bit system.</p>
<p>Here is a section of the disassembly of the s15:16 multiply call:</p>
<pre><code>30:   mov    %r14,%rax  
      mov    %r15,%rax  
      movslq -0x28(%rbp),%rax  
      movslq -0x30(%rbp),%rcx  
      imul   %rax,%rcx  
      shr    $0x10,%rcx  
      mov    %ecx,-0x38(%rbp)  
      mov    %r12,%rax  
4c:   movzbl (%rbx),%eax  
      cmp    $0x1,%eax  
    ↓ jne    68  
54:   mov    0x8(%rbx),%rax  
      lea    0x1(%rax),%rcx  
      mov    %rcx,0x8(%rbx)  
      cmp    0x38(%rbx),%rax  
    ↑ jb     30
</code></pre>
<p>The two 32-bit numbers are multiplied together and the result shifted down - much as they would be if raw <code>int</code> values were used. The efficiency of this operation varies with the exponent. An exponent of zero should mean no shift at all.</p>
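<p>For comparison, the equivalent hand-written operation on raw integers might look like the following sketch. The names <code>to_s15_16</code> and <code>mul_s15_16</code> are hypothetical and are not taken from the reference implementation. The sketch assumes that values are stored in units of 2<sup>-16</sup> and that right-shifting a negative value is an arithmetic shift, which holds on mainstream compilers but is only guaranteed from C++20.</p>

```cpp
#include <cassert>
#include <cstdint>

// Convert a real value to a raw s15:16 representation (units of 2^-16).
constexpr std::int32_t to_s15_16(double value)
{
    return static_cast<std::int32_t>(value * 65536.0);
}

// Widen to 64 bits, multiply, then shift the product back down by 16 bits.
constexpr std::int32_t mul_s15_16(std::int32_t lhs, std::int32_t rhs)
{
    return static_cast<std::int32_t>(
        (static_cast<std::int64_t>(lhs) * rhs) >> 16);
}

// 1.5 * 2.25 == 3.375, all exactly representable in s15:16.
static_assert(mul_s15_16(to_s15_16(1.5), to_s15_16(2.25)) == to_s15_16(3.375), "");
```

<p>The widening multiply and the 16-bit shift correspond to the <code>imul</code> and <code>shr $0x10</code> instructions in the listing above.</p>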
<h3><a name="3-Dimensional-Magnitude-Squared"></a>3-Dimensional Magnitude Squared</h3>
<p>A fast <code>sqrt</code> implementation has not yet been tested with <code>fixed_point</code>. (The naive implementation takes over 300ns.) For this reason, a magnitude-squared function is measured, combining multiply and add operations:</p>
<div class="sourceCode" id="cb34"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb34-1" title="1"><span class="kw">template</span> &lt;<span class="kw">class</span> FP&gt;</a>
<a class="sourceLine" id="cb34-2" title="2"><span class="kw">constexpr</span> FP magnitude_squared(<span class="at">const</span> FP &amp; x, <span class="at">const</span> FP &amp; y, <span class="at">const</span> FP &amp; z)</a>
<a class="sourceLine" id="cb34-3" title="3">{</a>
<a class="sourceLine" id="cb34-4" title="4">    <span class="cf">return</span> x * x + y * y + z * z;</a>
<a class="sourceLine" id="cb34-5" title="5">}</a></code></pre></div>
<p>Only real number formats are tested:</p>
<pre><code>name         cpu_time (ns)
float        2.42606
double       2.08
long double  4.5056
s3_4         2.768
s7_8         2.77577
s15_16       2.752
s31_32       4.10331
</code></pre>
<p>Again, the size of the type seems to have the largest impact.</p>
<h3><a name="Circle-Intersection"></a>Circle Intersection</h3>
<p>A similar operation includes a comparison and branch:</p>
<div class="sourceCode" id="cb35"><pre class="sourceCode c++"><code class="sourceCode cpp"><a class="sourceLine" id="cb35-1" title="1"><span class="kw">template</span> &lt;<span class="kw">class</span> Real&gt;</a>
<a class="sourceLine" id="cb35-2" title="2"><span class="dt">bool</span> circle_intersect_generic(Real x1, Real y1, Real r1, Real x2, Real y2, Real r2)</a>
<a class="sourceLine" id="cb35-3" title="3">{</a>
<a class="sourceLine" id="cb35-4" title="4">    <span class="kw">auto</span> x_diff = x2 - x1;</a>
<a class="sourceLine" id="cb35-5" title="5">    <span class="kw">auto</span> y_diff = y2 - y1;</a>
<a class="sourceLine" id="cb35-6" title="6">    <span class="kw">auto</span> distance_squared = x_diff * x_diff + y_diff * y_diff;</a>
<a class="sourceLine" id="cb35-7" title="7"></a>
<a class="sourceLine" id="cb35-8" title="8">    <span class="kw">auto</span> touch_distance = r1 + r2;</a>
<a class="sourceLine" id="cb35-9" title="9">    <span class="kw">auto</span> touch_distance_squared = touch_distance * touch_distance;</a>
<a class="sourceLine" id="cb35-10" title="10"></a>
<a class="sourceLine" id="cb35-11" title="11">    <span class="cf">return</span> distance_squared &lt;= touch_distance_squared;</a>
<a class="sourceLine" id="cb35-12" title="12">}</a></code></pre></div>
<pre><code>name         cpu_time (ns)
float        3.46011
double       3.48
long double  6.4
s3_4         3.88
s7_8         4.5312
s15_16       3.82171
s31_32       5.92
</code></pre>
<p>Again, fixed-point and native performance are comparable.</p>
</body>
</html>
