<p><strong>Document number</strong>: LEWG, SG14, SG6: P0037R2<br />
<strong>Date</strong>: 2016-05-30<br />
<strong>Reply-to</strong>: John McFarlane, <a href="mailto:mcfarlane.john+fixed-point@gmail.com">mcfarlane.john+fixed-point@gmail.com</a><br />
<strong>Audience</strong>: SG6, SG14</p>
<h1>Fixed-Point Real Numbers</h1>
<h2>I. Introduction</h2>
<p>This proposal introduces a system for performing binary fixed-point
arithmetic using integral types.</p>
<h2>II. Motivation</h2>
<p>Floating-point types are an exceedingly versatile and widely supported
method of expressing real numbers on modern architectures.</p>
<p>However, there are certain situations where fixed-point arithmetic is
preferable:</p>
<ul>
<li>Some systems lack native floating-point registers and must emulate them in software;</li>
<li>many others are capable of performing some or all operations more efficiently using integer arithmetic;</li>
<li>certain applications can suffer from the variability in precision which comes from a dynamic radix point
<a href="http://www.pathengine.com/Contents/Overview/FundamentalConcepts/WhyIntegerCoordinates/page.php">[1]</a>;</li>
<li>in situations where a variable exponent is not desired,
it takes valuable space away from the significand and reduces precision and</li>
<li>not all hardware and compilers produce exactly the same floating-point results, leading to non-deterministic behavior.</li>
</ul>
<p>Integer types provide the basis for an efficient
representation of binary fixed-point real numbers. However, laborious,
error-prone steps are required to normalize the results of certain
operations and to convert to and from fixed-point types.</p>
<p>A set of tools for defining and manipulating fixed-point types is
proposed. These tools are designed to make work easier for those who
traditionally use integers to perform low-level, high-performance
fixed-point computation.
They are composable such that a wide range of trade-offs between speed, accuracy and safety are supported.</p>
<h2>III. Impact On the Standard</h2>
<p>This proposal is a pure library extension. It does not require
changes to any standard classes or functions.
It adds several new class and function templates
to a new header file, <code>&lt;fixed_point&gt;</code>.</p>
<p>It depends on two new class templates, <code>width&lt;class&gt;</code> and <code>set_width&lt;class, int&gt;</code>,
to be added to the existing header file, <code>&lt;type_traits&gt;</code>, as proposed in
P0381 <a href="http://johnmcfarlane.github.io/fixed_point/papers/p0381r0.html">[10]</a>.</p>
<h2>IV. Design Decisions</h2>
<p>The design is driven by the following aims in roughly descending
order:</p>
<ol>
<li>to automate the task of using integer types to perform low-level
binary fixed-point arithmetic;</li>
<li>to facilitate a style of code that is intuitive to anyone who is
comfortable with integer and floating-point arithmetic;</li>
<li>to treat fixed-point as a superset of integer such that
a fixed-point type with an exponent of zero can provide
a drop-in replacement for its underlying integer type and</li>
<li>to avoid incurring expense for unused features - including compilation time.</li>
</ol>
<p>More generally, the aim of this proposal is to contain within a single API
all the tools necessary to perform binary fixed-point arithmetic.
The design facilitates a wide range of competing compile-time strategies for
avoiding overflow and precision loss, but implements only the simplest by default.
Similarly, orthogonal concerns such as run-time overflow detection and rounding modes
are deferred to the underlying integer types used as storage.</p>
<h3>Class Template</h3>
<p>Fixed-point numbers are specializations of</p>
<pre><code>template &lt;class Rep, int Exponent&gt;
class fixed_point;
</code></pre>
<p>where the template parameters are described as follows.</p>
<h4><code>Rep</code> Type Template Parameter</h4>
<p>This parameter identifies the capacity and signedness of the
underlying type used to represent the value. In other words, the size
of the resulting type will be <code>sizeof(Rep)</code> and it will be
signed iff <code>is_signed&lt;Rep&gt;::value</code> is true.</p>
<p><code>Rep</code> may be a fundamental integral type or similar integer-like type.
The most suitable types are: <code>std::int8_t</code>, <code>std::uint8_t</code>,
<code>std::int16_t</code>, <code>std::uint16_t</code>, <code>std::int32_t</code> and <code>std::uint32_t</code>.
In limited situations, <code>std::int64_t</code> and <code>std::uint64_t</code> can be used.
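The reasons for these limitations relate to the difficulty of finding
a type wide enough to perform lossless integer multiplication:
the product of two 64-bit values can require 128 bits, which no
standard integral type is guaranteed to provide.</p>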
<p>The characteristics of <code>Rep</code> are passed to the fixed-point type.
If, for example, <code>Rep</code> has an alternative rounding style,
overflow handling strategy or large storage capacity,
then the <code>fixed_point</code> specialization will benefit from this feature.
By defaulting to <code>int</code> for its representation, <code>fixed_point</code>
defaults to machine-level efficiency and minimal compile-time overhead.</p>
<h4><code>Exponent</code> Non-Type Template Parameter</h4>
<p>The exponent of a fixed-point type is the equivalent of the exponent
field in a floating-point type and shifts the stored value by the
number of bits required to produce the desired range. The
default value of <code>Exponent</code> is zero, giving <code>fixed_point&lt;T&gt;</code> the same
range as <code>T</code>.</p>
<p>The resolution of a specialization of <code>fixed_point</code> is</p>
<pre><code>pow(2, Exponent)
</code></pre>
<p>and the minimum and maximum values are</p>
<pre><code>std::numeric_limits&lt;Rep&gt;::min() * pow(2, Exponent)
</code></pre>
<p>and</p>
<pre><code>std::numeric_limits&lt;Rep&gt;::max() * pow(2, Exponent)
</code></pre>
<p>respectively.</p>
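<p>As an illustration only, the three formulas above can be evaluated at compile time. In the following sketch, <code>pow2</code>, <code>resolution</code>, <code>lowest_value</code> and <code>highest_value</code> are hypothetical helper names, not part of the proposed API, and the example parameters (a <code>Rep</code> of <code>std::int8_t</code> and an <code>Exponent</code> of -4) are arbitrary.</p>
<pre><code class="language-c++">#include &lt;cstdint&gt;
#include &lt;limits&gt;

// Hypothetical helper: 2 raised to the given power, as a double.
// Repeated doubling or halving is exact for the small exponents used here.
constexpr double pow2(int exponent) {
    double result = 1.0;
    for (int i = 0; i &lt; exponent; ++i) result *= 2.0;
    for (int i = 0; i &gt; exponent; --i) result /= 2.0;
    return result;
}

// Hypothetical helpers mirroring the formulas for fixed_point&lt;Rep, Exponent&gt;.
template &lt;class Rep, int Exponent&gt;
constexpr double resolution() { return pow2(Exponent); }

template &lt;class Rep, int Exponent&gt;
constexpr double lowest_value() { return std::numeric_limits&lt;Rep&gt;::min() * pow2(Exponent); }

template &lt;class Rep, int Exponent&gt;
constexpr double highest_value() { return std::numeric_limits&lt;Rep&gt;::max() * pow2(Exponent); }

// Rep = std::int8_t, Exponent = -4: steps of 1/16 from -128/16 to 127/16.
static_assert(resolution&lt;std::int8_t, -4&gt;() == 0.0625, "resolution");
static_assert(lowest_value&lt;std::int8_t, -4&gt;() == -8.0, "minimum");
static_assert(highest_value&lt;std::int8_t, -4&gt;() == 7.9375, "maximum");
</code></pre>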
<p>Any usage that results in values of <code>Exponent</code> outside the
range (<code>INT_MIN / 2</code>, <code>INT_MAX / 2</code>) may result in undefined
behavior and/or overflow or underflow. This range of exponent values
far exceeds that of the largest built-in floating-point type and should
be adequate for all practical purposes.</p>
<h3><code>make_fixed</code> and <code>make_ufixed</code> Helper Types</h3>
<p>The <code>Exponent</code> template parameter is versatile and concise. It is an
intuitive scale to use when considering the full range of positive and
negative exponents a fixed-point type might possess. It also
corresponds to the exponent field of built-in floating-point types.</p>
<p>However, most fixed-point formats can be described more intuitively by
the cardinal number of integer and/or fractional digits they contain.
Most users will prefer to distinguish fixed-point types using these
parameters.</p>
<p>For this reason, two aliases are defined in the style of
<code>make_signed</code>.</p>
<p>These aliases are declared as:</p>
<pre><code>template &lt;int IntegerDigits, int FractionalDigits = 0, class Archetype = signed&gt;
using make_fixed;
</code></pre>
<p>and</p>
<pre><code>template &lt;int IntegerDigits, int FractionalDigits = 0, class Archetype = unsigned&gt;
using make_ufixed;
</code></pre>
<p>They resolve to a <code>fixed_point</code> specialization with the given
signedness and number of integer and fractional digits. Because the size of the
underlying type is rounded up to that of an available integer type, the result
may contain additional integer digits.</p>
<p>For example, one could define and initialize an 8-bit, unsigned,
fixed-point variable with four integer digits and four fractional
digits:</p>
<pre><code>make_ufixed&lt;4, 4&gt; value { 15.9375 };
</code></pre>
<p>or a 32-bit, signed, fixed-point number with two integer digits and 29
fractional digits:</p>
<pre><code>make_fixed&lt;2, 29&gt; value { 3.141592653 };
</code></pre>
<p>The type parameter, <code>Archetype</code>, is provided for cases where the desired
<code>fixed_point</code> specialization has as its <code>Rep</code>
parameter some type other than a built-in integral. The signedness of
<code>Archetype</code> determines the signedness of the resultant
<code>fixed_point</code> specialization, but its size does not.</p>
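<p>One way such aliases might select a representation is sketched below. This is a hypothetical illustration rather than the proposed definition: the names <code>smallest_int_t</code>, <code>make_fixed_sketch</code> and <code>make_ufixed_sketch</code> are invented, <code>Archetype</code> is ignored, and only the fixed-width types from <code>&lt;cstdint&gt;</code> are considered. The exponent is taken to be <code>-FractionalDigits</code> and the width is rounded up to the next available type, which is where any extra integer digits come from.</p>
<pre><code class="language-c++">#include &lt;cstdint&gt;
#include &lt;type_traits&gt;

// Forward declaration standing in for the proposed class template.
template &lt;class Rep = int, int Exponent = 0&gt;
class fixed_point;

// Hypothetical: smallest fixed-width integer holding TotalBits bits
// (sign bit included for signed types). Widths above 64 are not handled.
template &lt;int TotalBits, bool Signed&gt;
using smallest_int_t =
    std::conditional_t&lt;(TotalBits &lt;= 8), std::conditional_t&lt;Signed, std::int8_t, std::uint8_t&gt;,
    std::conditional_t&lt;(TotalBits &lt;= 16), std::conditional_t&lt;Signed, std::int16_t, std::uint16_t&gt;,
    std::conditional_t&lt;(TotalBits &lt;= 32), std::conditional_t&lt;Signed, std::int32_t, std::uint32_t&gt;,
                       std::conditional_t&lt;Signed, std::int64_t, std::uint64_t&gt;&gt;&gt;&gt;;

// Signed types need one extra bit for the sign.
template &lt;int IntegerDigits, int FractionalDigits = 0&gt;
using make_fixed_sketch =
    fixed_point&lt;smallest_int_t&lt;IntegerDigits + FractionalDigits + 1, true&gt;, -FractionalDigits&gt;;

template &lt;int IntegerDigits, int FractionalDigits = 0&gt;
using make_ufixed_sketch =
    fixed_point&lt;smallest_int_t&lt;IntegerDigits + FractionalDigits, false&gt;, -FractionalDigits&gt;;

// The two examples from the text.
static_assert(std::is_same&lt;make_ufixed_sketch&lt;4, 4&gt;, fixed_point&lt;std::uint8_t, -4&gt;&gt;::value, "8-bit unsigned");
static_assert(std::is_same&lt;make_fixed_sketch&lt;2, 29&gt;, fixed_point&lt;std::int32_t, -29&gt;&gt;::value, "32-bit signed");

// 3 + 2 bits round up to 8, so this type actually has 6 integer digits.
static_assert(std::is_same&lt;make_ufixed_sketch&lt;3, 2&gt;, fixed_point&lt;std::uint8_t, -2&gt;&gt;::value, "rounded up");
</code></pre>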
<h3>Conversion</h3>
<p>Fixed-point numbers can be explicitly converted to and from
arithmetic types.</p>
<p>While effort is made to ensure that significant digits are not lost
during conversion, no effort is made to avoid rounding errors.
Whatever would happen when converting to and from <code>Rep</code> largely
applies to <code>fixed_point</code> objects also. For example:</p>
<pre><code>make_ufixed&lt;4, 4&gt;(.006) == make_ufixed&lt;4, 4&gt;(0)
</code></pre>
<p>...evaluates to <code>true</code> and is considered an acceptable rounding error.</p>
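<p>The conversion in this example can be traced with plain integer arithmetic. The sketch below is not the proposed implementation; it assumes that a conversion from <code>double</code> multiplies by <code>pow(2, -Exponent)</code> and then converts to <code>Rep</code>, so that the result inherits the truncation toward zero of a built-in floating-to-integer conversion. The helper names are hypothetical and hard-wire the <code>make_ufixed&lt;4, 4&gt;</code> layout (<code>std::uint8_t</code> with an <code>Exponent</code> of -4).</p>
<pre><code class="language-c++">#include &lt;cstdint&gt;

// Hypothetical: double to stored value for Exponent -4. Multiplying by 2^4
// and converting to std::uint8_t truncates any remaining fraction.
constexpr std::uint8_t to_ufixed_4_4(double value) {
    return static_cast&lt;std::uint8_t&gt;(value * 16.0);
}

// Hypothetical: stored value back to double.
constexpr double from_ufixed_4_4(std::uint8_t rep) {
    return rep / 16.0;
}

static_assert(to_ufixed_4_4(15.9375) == 255, "exactly representable");
static_assert(to_ufixed_4_4(.006) == 0, ".006 * 16 == .096 truncates to 0");
static_assert(from_ufixed_4_4(to_ufixed_4_4(.006)) == 0.0, "same as converting 0");
</code></pre>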
<h3>Operator Overloads</h3>
<p>Any operators that might be applied to integer types can also be
applied to fixed-point types. A guiding principle of operator
overloads is that they perform as little run-time computation as is
practically possible.</p>
<p>With the exception of shift and comparison operators, binary operators
can take any combination of:</p>
<ul>
<li>one or two fixed-point arguments and</li>
<li>zero or one arguments of any arithmetic type, i.e. a type for which
<code>is_arithmetic</code> is true.</li>
</ul>
<p>Where the inputs are not identical fixed-point types, a simple set of
promotion-like rules are applied to determine the return type:</p>
<ol>
<li>If both arguments are fixed-point,
then the result has a <code>Rep</code> which is the common type of the <code>Rep</code> of the inputs
and the <code>Exponent</code> value of the input with the greater integer capacity.</li>
<li>If one of the arguments is a floating-point type, then the type of
the result is the smallest floating-point type of equal or greater
size than the inputs.</li>
<li>If one of the arguments is an integral type,
then the result has a <code>Rep</code> which is the common type of the input fixed-point <code>Rep</code> and the integral type
and the same <code>Exponent</code> value as the input fixed-point type.</li>
</ol>
<p>Some examples:</p>
<pre><code>fixed_point&lt;uint8_t, -3&gt;{8} + fixed_point&lt;int8_t, -4&gt;{3} == fixed_point&lt;int, -3&gt;{11};
fixed_point&lt;uint8_t, -3&gt;{8} + 3 == fixed_point&lt;int, -3&gt;{11};
fixed_point&lt;uint8_t, -3&gt;{8} + float{3} == float{11};
</code></pre>
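<p>The first example can be traced at the level of the stored integers. The sketch below is illustrative only: it checks the <code>Rep</code> chosen by rule 1 with <code>std::common_type</code> and shows the right shift which aligns the operand with exponent -4 to the result's exponent of -3 before an ordinary integer addition. The result takes exponent -3 because <code>fixed_point&lt;uint8_t, -3&gt;</code> has 5 integer digits against 3 for <code>fixed_point&lt;int8_t, -4&gt;</code>. The shift discards the operand's least significant bit, which happens to be zero here.</p>
<pre><code class="language-c++">#include &lt;cstdint&gt;
#include &lt;type_traits&gt;

// Rule 1: the result Rep is the common type of the input Reps.
static_assert(std::is_same&lt;std::common_type&lt;std::uint8_t, std::int8_t&gt;::type, int&gt;::value,
              "the common type of uint8_t and int8_t is int");

// Stored values: fixed_point&lt;uint8_t, -3&gt;{8} holds 8 * 2^3 and
// fixed_point&lt;int8_t, -4&gt;{3} holds 3 * 2^4.
constexpr std::uint8_t lhs_rep = 64;
constexpr std::int8_t rhs_rep = 48;

// Align the right-hand value to Exponent -3 with a one-place right shift;
// both operands are promoted to int before the addition.
constexpr int sum_rep = lhs_rep + (rhs_rep &gt;&gt; 1);

static_assert(std::is_same&lt;decltype(lhs_rep + (rhs_rep &gt;&gt; 1)), int&gt;::value, "int arithmetic");
static_assert(sum_rep == 88, "88 * 2^-3 == 11");
</code></pre>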
<p>The reasoning behind this choice is a combination of predictability
and performance. It is explained for each rule as follows:</p>
<ol>
<li>ensures that the least computation is performed where fixed-point
types are used exclusively. Aside from multiplication and division,
which require shift operations, such operations should have similar computational
costs to equivalent integer operations;</li>
<li>loosely follows the promotion rules for mixed-mode arithmetic,
ensures values with exponents far beyond the range of the
fixed-point type are catered for and avoids costly conversion from
floating-point to integer and</li>
<li>preserves the input fixed-point type whose range is far more likely
to be of deliberate importance to the operation.</li>
</ol>
<p>A guiding aim is for specializations with <code>Exponent</code> set to 0 to behave as closely as possible like their <code>Rep</code>.
For instance, where possible, an object of type, <code>int</code>, should be interchangeable with <code>fixed_point&lt;&gt;</code>.</p>
<p>Shift operator overloads require an integer type as the right-hand
parameter and return a type which is adjusted to accommodate the new
value without risk of overflow or underflow.</p>
<p>Comparison operators convert the inputs to a common result type
following the rules above before performing a comparison and returning
<code>true</code> or <code>false</code>.</p>
<h4>Overflow</h4>
<p>Because arithmetic operators return a result of equal capacity to
their inputs, they carry a risk of overflow. For instance,</p>
<pre><code>make_ufixed&lt;2, 30&gt;(3) + make_ufixed&lt;2, 30&gt;(1)
</code></pre>
<p>is zero on architectures where <code>int</code> is 4 bytes because a type with 2 integer bits cannot
store a value of 4.</p>
<p>The result of overflow of any bits in a fixed-point value depends
entirely on how <code>Rep</code> handles overflow. Thus, for built-in
signed types, the behavior is undefined and, for built-in unsigned types,
the value wraps around.</p>
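<p>Assuming, as the text does, that <code>make_ufixed&lt;2, 30&gt;</code> is stored in a 32-bit unsigned integer, the overflow example above reduces to the following integer arithmetic. The sketch shows the wrap-around of the unsigned case. The signed case cannot be demonstrated this way because its behavior is undefined, and a compiler rejects signed overflow in a constant expression.</p>
<pre><code class="language-c++">#include &lt;cstdint&gt;

// 3 and 1 in make_ufixed&lt;2, 30&gt;, assumed stored as std::uint32_t with Exponent -30.
constexpr std::uint32_t three = std::uint32_t{3} &lt;&lt; 30;
constexpr std::uint32_t one = std::uint32_t{1} &lt;&lt; 30;

// The true sum, 4 * 2^30 == 2^32, needs 33 bits; unsigned arithmetic wraps modulo 2^32.
constexpr std::uint32_t sum = three + one;

static_assert(sum == 0, "the sum has wrapped around to zero");
</code></pre>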
<h4>Underflow</h4>
<p>The other typical cause of lost bits is underflow where, for example,</p>
<pre><code>make_fixed&lt;7, 0&gt;(15) / make_fixed&lt;7, 0&gt;(2)
</code></pre>
<p>results in a value of 7. This results in loss of precision but is
generally considered acceptable.</p>
<p>However, when all bits are lost due to underflow, the value is said
to be flushed. As with overflow, the result of a flush is the same for
a fixed-point type as it is for its underlying <code>Rep</code>. In the case
of built-in integral types, the value becomes zero.</p>
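<p>Both effects can be seen directly on stored values. The sketch assumes that <code>make_fixed&lt;7, 0&gt;</code> is stored as <code>std::int8_t</code> with an <code>Exponent</code> of 0. It also uses a second, hypothetical value stored with an <code>Exponent</code> of -4 and divides it by the integer 2, which under rule 3 above leaves the exponent unchanged.</p>
<pre><code class="language-c++">#include &lt;cstdint&gt;

// make_fixed&lt;7, 0&gt;: Exponent 0, so stored values equal the values themselves.
constexpr std::int8_t fifteen = 15;
constexpr std::int8_t two = 2;

// Integer division discards the fractional .5: precision is lost.
static_assert(fifteen / two == 7, "15 / 2 underflows to 7");

// 1/16 stored with Exponent -4 as the integer 1. Halving it would need a
// fifth fractional bit, so its only set bit is lost and the value is flushed.
constexpr std::int8_t one_sixteenth = 1;
static_assert(one_sixteenth / 2 == 0, "1/16 divided by 2 flushes to zero");
</code></pre>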
<h3>Dealing With Overflow and Flushes</h3>
<p>Errors resulting from overflow and flushes are two of the biggest
headaches related to fixed-point arithmetic. Integers suffer the same
kinds of errors but are somewhat easier to reason about as they lack
fractional digits. Floating-point numbers are largely shielded from
these errors by their variable exponent and implicit bit.</p>
<p>Four strategies for avoiding overflow in fixed-point types are
presented:</p>
<ol>
<li>simply leave it to the user to avoid overflow;</li>
<li>allow the user to provide a custom type for <code>Rep</code>
which behaves differently from built-in integral types;</li>
<li>promote the result to a larger type to ensure sufficient capacity
or</li>
<li>adjust the exponent of the result upward to ensure that the top
limit of the type is sufficient to preserve the most significant
digits at the expense of the less significant digits.</li>
</ol>
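<p>The strategies differ in what happens to the result type. The following sketch uses hypothetical function names and works on stored values only. It adds 15 and 1 held in the <code>make_ufixed&lt;4, 4&gt;</code> layout (<code>std::uint8_t</code> with an <code>Exponent</code> of -4) under strategies 1, 3 and 4. Strategy 2 depends on the chosen <code>Rep</code> and is not shown.</p>
<pre><code class="language-c++">#include &lt;cstdint&gt;

// Operands: 15.0 and 1.0 stored with Exponent -4, i.e. as 15 * 16 and 1 * 16.
constexpr std::uint8_t a = 240;
constexpr std::uint8_t b = 16;

// Strategy 1: keep the operand type. The sum 256 does not fit in 8 bits
// and wraps to 0 when stored back, as with built-in unsigned types.
constexpr std::uint8_t add_same_type(std::uint8_t x, std::uint8_t y) {
    return static_cast&lt;std::uint8_t&gt;(x + y);
}

// Strategy 3: promote to a wider type; the Exponent stays -4.
constexpr std::uint16_t add_promoted(std::uint8_t x, std::uint8_t y) {
    return static_cast&lt;std::uint16_t&gt;(x + y);
}

// Strategy 4: keep 8 bits but raise the Exponent to -3 by discarding
// each operand's least significant bit before adding.
constexpr std::uint8_t add_shifted(std::uint8_t x, std::uint8_t y) {
    return static_cast&lt;std::uint8_t&gt;((x &gt;&gt; 1) + (y &gt;&gt; 1));
}

static_assert(add_same_type(a, b) == 0, "overflow: 0 * 2^-4 == 0.0");
static_assert(add_promoted(a, b) == 256, "256 * 2^-4 == 16.0");
static_assert(add_shifted(a, b) == 128, "128 * 2^-3 == 16.0");
</code></pre>
<p>Strategy 4 preserves the magnitude at the expense of the lowest bit: <code>add_shifted(240, 17)</code> also yields 128, so the 1/16 carried by 17 is lost.</p>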
<p>For arithmetic operators, choice 1) is taken because it most closely
follows the behavior of integer types. Thus it should cause the least
surprise to the fewest users. This makes it far easier to reason
about in code where functions are written with a particular type in
mind. It also requires the least computation in most cases.</p>
<p>Choice 2) is beyond the scope of this proposal
and is covered in more detail in the section, <strong>Alternative Types for <code>Rep</code></strong>.</p>
<p>Choices 3) and 4) are reasonably robust to overflow events.
However, they represent different trade-offs and neither one is the best fit in all situations.
Notably, because the type of <code>a + b</code> would differ from that of <code>a</code>, replacing an instance of <code>a = a + b</code> with <code>a += b</code> may change results in surprising ways.
For these reasons, they are presented as named functions.</p>
<h4>Named Arithmetic Functions</h4>
<p>The following named function templates can be used as general-purpose alternatives to arithmetic operators, <code>-</code>, <code>+</code>, <code>*</code> and <code>/</code>.</p>
<pre><code class="language-c++">template &lt;class Result, class Rhs&gt;
constexpr Result negate(const Rhs&amp;);

template &lt;class Result, class Lhs, class Rhs&gt;
constexpr Result add(const Lhs&amp;, const Rhs&amp;);

template &lt;class Result, class Lhs, class Rhs&gt;
constexpr Result subtract(const Lhs&amp;, const Rhs&amp;);

template &lt;class Result, class Lhs, class Rhs&gt;
constexpr Result multiply(const Lhs&amp;, const Rhs&amp;);

template &lt;class Result, class Lhs, class Rhs&gt;
constexpr Result divide(const Lhs&amp;, const Rhs&amp;);
</code></pre>
<p>They represent the cornerstone of all other fixed-point arithmetic
and offer the user maximum control over the computation being performed.
If used carefully, these functions can provide the same performance as built-in integer arithmetic.</p>
<p>The following example loses no information and performs no shifts, effectively boiling down to <code>255*255</code>.</p>
<pre><code class="language-c++">auto f = make_ufixed&lt;4, 4&gt;{15.9375};
auto p = multiply&lt;make_ufixed&lt;8, 8&gt;&gt;(f, f);
// p === make_ufixed&lt;8, 8&gt;{254.00390625}
</code></pre>
<p>However, these functions shield the user from none of the pitfalls of fixed-point arithmetic.
For example, naive use of <code>multiply</code> can easily lead to surprising results.</p>
<pre><code class="language-c++">auto f = make_ufixed&lt;4, 4&gt;{15.9375};
auto p = multiply&lt;make_ufixed&lt;4, 4&gt;&gt;(f, f);
// p === make_ufixed&lt;4, 4&gt;{14.00000000};
</code></pre>
<p>In contrast, the <code>*</code> operator uses wider intermediate types and language-level promotion rules.
It does a better job of avoiding unnecessary narrowing and catastrophic information loss.</p>
<pre><code class="language-c++">auto f = make_ufixed&lt;4, 4&gt;{15.9375};
auto p = f * f;
// p === make_fixed&lt;27, 4&gt;{254.00000000}
</code></pre>
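<p>The three results above can be reproduced with integer arithmetic alone. This sketch is not the proposed implementation of <code>multiply</code> or <code>*</code>; it assumes the layouts implied by the text: <code>make_ufixed&lt;4, 4&gt;</code> as <code>std::uint8_t</code> with an <code>Exponent</code> of -4, <code>make_ufixed&lt;8, 8&gt;</code> as <code>std::uint16_t</code> with an <code>Exponent</code> of -8, and <code>make_fixed&lt;27, 4&gt;</code> as <code>int</code> with an <code>Exponent</code> of -4.</p>
<pre><code class="language-c++">#include &lt;cstdint&gt;

// 15.9375 in make_ufixed&lt;4, 4&gt;: stored as 15.9375 * 2^4 == 255.
constexpr std::uint8_t f = 255;

// The product of two Exponent -4 values has Exponent -8. The operands are
// promoted to int before multiplying, so 255 * 255 cannot overflow.
constexpr int product = f * f;  // 65025

// multiply&lt;make_ufixed&lt;8, 8&gt;&gt;: Exponent -8 already matches, so no shift.
constexpr std::uint16_t p_8_8 = static_cast&lt;std::uint16_t&gt;(product);

// multiply&lt;make_ufixed&lt;4, 4&gt;&gt;: shift to Exponent -4, then narrow to 8 bits.
// 65025 &gt;&gt; 4 == 4064, which wraps to 4064 % 256 == 224.
constexpr std::uint8_t p_4_4 = static_cast&lt;std::uint8_t&gt;(product &gt;&gt; 4);

// f * f: shift to Exponent -4 but keep the promoted int representation.
constexpr int p_27_4 = product &gt;&gt; 4;

static_assert(p_8_8 / 256.0 == 254.00390625, "multiply&lt;make_ufixed&lt;8, 8&gt;&gt;");
static_assert(p_4_4 / 16.0 == 14.0, "multiply&lt;make_ufixed&lt;4, 4&gt;&gt;");
static_assert(p_27_4 / 16.0 == 254.0, "f * f");
</code></pre>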
<h3>Alternative Types for <code>Rep</code></h3>
<p>Using built-in integral types as the default underlying representation
minimizes certain costs:</p>
<ul>
<li>many fixed-point operations are as efficient as their integral equivalents;</li>
<li>compile-time complexity is kept relatively low and</li>
<li>the behavior of fixed-point types should cause few surprises.</li>
</ul>
<p>However, this choice also brings with it many of the deficiencies of built-in types.
For example:</p>
<ul>
<li>the typical rounding behavior is distinct for:
<ul>
<li>conversion from floating-point types;</li>
<li>right shift and</li>
<li>divide operations;</li>
</ul>
</li>
<li>all of these rounding behaviors cause drift and propagate error;</li>
<li>overflow, underflow and flush are handled silently with wrap-around or undefined behavior;</li>
<li>divide-by-zero similarly results in undefined behavior and</li>
<li>the range of values is limited by the largest type: <code>long long int</code>.</li>
</ul>
<p>The effort involved in addressing these deficiencies is non-trivial
and ongoing (for example, <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0105r0.html">[2]</a>).
As solutions are made available, it should become easier
to define custom integral types which address concerns surrounding robustness and correctness.
Such types deserve their place in the standard library.</p>
<h4>Example Custom Type, <code>integer</code></h4>
<p>A composable system of integer types that is suitable for use with <code>fixed_point</code> might take the following form:</p>
<pre><code>// hypothetical enumerations; only the enumerators used below are shown
enum class rounding { towards_odd /* , ... */ };
enum class overflow { exception /* , ... */ };

// size may be rounded up in some cases
template &lt;int NumBytes, bool IsSigned = true&gt;
class sized_integer;

// may take built-in or sized_integer as Rep parameter
template &lt;class Rep = int, rounding Rounding = rounding::towards_odd&gt;
class rounding_integer;

// may take built-in, sized_integer or rounding_integer as Rep parameter
template &lt;class Rep = int, overflow Overflow = overflow::exception&gt;
class overflow_integer;

// a 'kitchen sink' custom integer type
template &lt;int NumBytes, bool IsSigned = true, rounding Rounding = rounding::towards_odd, overflow Overflow = overflow::exception&gt;
using integer =
  overflow_integer&lt;
    rounding_integer&lt;
      sized_integer&lt;NumBytes, IsSigned&gt;,
      Rounding&gt;,
    Overflow&gt;;
</code></pre>
<p>Any of these types might be used to compose <code>fixed_point</code> specializations
without paying (in compile-time complexity) for features that are not used.</p>
<p>While the issues related to integer types affect the fixed-point types they support,
they are not specific to fixed-point.
It would be not only premature but inappropriate
to attempt to address rounding and error handling at the level of a fixed-point type.</p>
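<p>As a hypothetical illustration of handling errors in <code>Rep</code> rather than in <code>fixed_point</code>, the following sketch defines a much-reduced analogue of <code>overflow_integer&lt;std::int32_t, overflow::exception&gt;</code>. The class name <code>checked_int32</code> is invented, only addition is shown, and the trait specializations required of a <code>Rep</code> (listed below) are omitted, so it could not be used with <code>fixed_point</code> as written.</p>
<pre><code class="language-c++">#include &lt;cstdint&gt;
#include &lt;limits&gt;
#include &lt;stdexcept&gt;

// Hypothetical, much-reduced checked integer: construction and addition only.
class checked_int32 {
public:
    constexpr explicit checked_int32(std::int32_t value) : value_(value) {}

    constexpr std::int32_t value() const { return value_; }

    friend constexpr checked_int32 operator+(checked_int32 lhs, checked_int32 rhs) {
        // Add in 64 bits, where the sum of two 32-bit values cannot overflow,
        // then report, rather than wrap or invoke undefined behavior, if the
        // result does not fit back into 32 bits.
        const std::int64_t sum = std::int64_t{lhs.value_} + rhs.value_;
        if (sum &gt; std::numeric_limits&lt;std::int32_t&gt;::max() ||
            sum &lt; std::numeric_limits&lt;std::int32_t&gt;::min()) {
            throw std::overflow_error("checked_int32: addition overflowed");
        }
        return checked_int32(static_cast&lt;std::int32_t&gt;(sum));
    }

private:
    std::int32_t value_;
};

// With Exponent -30, 0.5 and 0.25 are stored as 2^29 and 2^28; their sum fits.
static_assert((checked_int32(1 &lt;&lt; 29) + checked_int32(1 &lt;&lt; 28)).value() == (3 &lt;&lt; 28),
              "0.5 + 0.25 == 0.75");
</code></pre>
<p>By contrast, adding the stored values of 1.5 and 1.25 (1610612736 and 1342177280) exceeds the 32-bit range, so this type throws <code>std::overflow_error</code> where a built-in <code>int</code> would have undefined behavior.</p>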
<h4>Required Specializations</h4>
<p>For a type to be suitable as parameter, <code>Rep</code>, of <code>fixed_point</code>,
it must meet the following requirements:</p>
<ul>
<li>it must have specialized the following existing standard library types:
<ul>
<li><code>is_signed</code> and <code>is_unsigned</code></li>
<li><code>make_signed</code> and <code>make_unsigned</code></li>
<li><code>numeric_limits</code> (with <code>is_integer</code> equal to <code>true</code>)</li>
</ul>
</li>
<li>it must have specialized the following proposed standard library types:
<ul>
<li><code>width</code> and <code>set_width</code> as described in P0381 <a href="http://johnmcfarlane.github.io/fixed_point/papers/p0381r0.html">[10]</a></li>
</ul>
</li>
</ul>
<h3>Example</h3>
<p>The following function, <code>magnitude</code>, calculates the magnitude of a 3-dimensional vector.</p>
<pre><code class="language-c++">template&lt;class Fp&gt;
constexpr auto magnitude(Fp x, Fp y, Fp z)
{
    return sqrt(x*x+y*y+z*z);
}
</code></pre>
<p>And here is a call to <code>magnitude</code>.</p>
<pre><code class="language-c++">auto m = magnitude(
        make_ufixed&lt;4, 12&gt;(1),
        make_ufixed&lt;4, 12&gt;(4),
        make_ufixed&lt;4, 12&gt;(9));
// m === make_fixed&lt;19, 12&gt;(9.8994140625)
</code></pre>
<p>Observe that the arithmetic operations in <code>magnitude</code> obey the promotion rules of their underlying type.
Hence <code>m</code> is backed by <code>int</code>, which is 32 bits wide on most modern architectures.</p>
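<p>For illustration, the same result can be obtained without floating-point arithmetic. The sketch below is not the proposed <code>sqrt</code>; it assumes truncating (floor) rounding and uses a classic digit-by-digit integer square root, given here the hypothetical name <code>isqrt</code>. For simplicity it keeps the squares at an <code>Exponent</code> of -24 rather than reproducing the intermediate <code>int</code>-backed types of <code>magnitude</code>; for these inputs no bits are lost either way.</p>
<pre><code class="language-c++">#include &lt;cstdint&gt;

// Floor of the square root of n, computed two bits at a time.
constexpr std::uint64_t isqrt(std::uint64_t n) {
    std::uint64_t result = 0;
    std::uint64_t bit = std::uint64_t{1} &lt;&lt; 62;  // highest power of four in 64 bits
    while (bit &gt; n) bit &gt;&gt;= 2;
    while (bit != 0) {
        if (n &gt;= result + bit) {
            n -= result + bit;
            result = (result &gt;&gt; 1) + bit;
        } else {
            result &gt;&gt;= 1;
        }
        bit &gt;&gt;= 2;
    }
    return result;
}

// 1, 4 and 9 in make_ufixed&lt;4, 12&gt;: stored as value * 2^12.
constexpr std::uint64_t x = 1 * 4096, y = 4 * 4096, z = 9 * 4096;

// Each square, and so the sum, has Exponent -24: 98 * 2^24.
constexpr std::uint64_t sum_of_squares = x * x + y * y + z * z;

// The square root of a value with Exponent -24 has Exponent -12.
constexpr std::uint64_t m = isqrt(sum_of_squares);

static_assert(sum_of_squares == 98ull &lt;&lt; 24, "1 + 16 + 81 == 98");
static_assert(m / 4096.0 == 9.8994140625, "matches the result above");
</code></pre>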
<h2>V. Technical Specification</h2>
<h3>Header &lt;fixed_point&gt; Synopsis</h3>
<pre><code class="language-c++">namespace std {
  template &lt;class Rep, int Exponent&gt; class fixed_point;

  template &lt;int IntegerDigits, int FractionalDigits = 0, class Archetype = signed&gt;
    using make_fixed;
  template &lt;int IntegerDigits, int FractionalDigits = 0, class Archetype = unsigned&gt;
    using make_ufixed;

  template &lt;class Rep, int Exponent&gt;
    constexpr bool operator==(
      const fixed_point&lt;Rep, Exponent&gt; &amp; lhs,
      const fixed_point&lt;Rep, Exponent&gt; &amp; rhs) noexcept;
  template &lt;class Rep, int Exponent&gt;
    constexpr bool operator!=(
      const fixed_point&lt;Rep, Exponent&gt; &amp; lhs,
      const fixed_point&lt;Rep, Exponent&gt; &amp; rhs) noexcept;
  template &lt;class Rep, int Exponent&gt;
    constexpr bool operator&lt;(
      const fixed_point&lt;Rep, Exponent&gt; &amp; lhs,
      const fixed_point&lt;Rep, Exponent&gt; &amp; rhs) noexcept;
  template &lt;class Rep, int Exponent&gt;
    constexpr bool operator&gt;(
      const fixed_point&lt;Rep, Exponent&gt; &amp; lhs,
      const fixed_point&lt;Rep, Exponent&gt; &amp; rhs) noexcept;
  template &lt;class Rep, int Exponent&gt;
    constexpr bool operator&gt;=(
      const fixed_point&lt;Rep, Exponent&gt; &amp; lhs,
      const fixed_point&lt;Rep, Exponent&gt; &amp; rhs) noexcept;
  template &lt;class Rep, int Exponent&gt;
    constexpr bool operator&lt;=(
      const fixed_point&lt;Rep, Exponent&gt; &amp; lhs,
      const fixed_point&lt;Rep, Exponent&gt; &amp; rhs) noexcept;

  template &lt;class Rep, int Exponent&gt;
    constexpr fixed_point&lt;Rep, Exponent&gt; operator-(
      const fixed_point&lt;Rep, Exponent&gt; &amp; rhs) noexcept;
  template &lt;class Rep, int Exponent&gt;
    constexpr fixed_point&lt;Rep, Exponent&gt; operator+(
      const fixed_point&lt;Rep, Exponent&gt; &amp; lhs,
      const fixed_point&lt;Rep, Exponent&gt; &amp; rhs) noexcept;
  template &lt;class Rep, int Exponent&gt;
    constexpr fixed_point&lt;Rep, Exponent&gt; operator-(
      const fixed_point&lt;Rep, Exponent&gt; &amp; lhs,
      const fixed_point&lt;Rep, Exponent&gt; &amp; rhs) noexcept;
  template &lt;class Rep, int Exponent&gt;
    fixed_point&lt;Rep, Exponent&gt; &amp; operator+=(
      fixed_point&lt;Rep, Exponent&gt; &amp; lhs,
      const fixed_point&lt;Rep, Exponent&gt; &amp; rhs) noexcept;
  template &lt;class Rep, int Exponent&gt;
    fixed_point&lt;Rep, Exponent&gt; &amp; operator-=(
      fixed_point&lt;Rep, Exponent&gt; &amp; lhs,
      const fixed_point&lt;Rep, Exponent&gt; &amp; rhs) noexcept;
  template &lt;class Rep, int Exponent&gt;
    fixed_point&lt;Rep, Exponent&gt; &amp; operator*=(
      fixed_point&lt;Rep, Exponent&gt; &amp; lhs,
      const fixed_point&lt;Rep, Exponent&gt; &amp; rhs) noexcept;
  template &lt;class Rep, int Exponent&gt;
    fixed_point&lt;Rep, Exponent&gt; &amp; operator/=(
      fixed_point&lt;Rep, Exponent&gt; &amp; lhs,
      const fixed_point&lt;Rep, Exponent&gt; &amp; rhs) noexcept;

  template &lt;class Lhs, class Rhs&gt;
    constexpr auto operator==(const Lhs &amp; lhs, const Rhs &amp; rhs) noexcept;
  template &lt;class Lhs, class Rhs&gt;
    constexpr auto operator!=(const Lhs &amp; lhs, const Rhs &amp; rhs) noexcept;
  template &lt;class Lhs, class Rhs&gt;
    constexpr auto operator&lt;(const Lhs &amp; lhs, const Rhs &amp; rhs) noexcept;
  template &lt;class Lhs, class Rhs&gt;
    constexpr auto operator&gt;(const Lhs &amp; lhs, const Rhs &amp; rhs) noexcept;
  template &lt;class Lhs, class Rhs&gt;
    constexpr auto operator&gt;=(const Lhs &amp; lhs, const Rhs &amp; rhs) noexcept;
  template &lt;class Lhs, class Rhs&gt;
    constexpr auto operator&lt;=(const Lhs &amp; lhs, const Rhs &amp; rhs) noexcept;

  template &lt;class Lhs, class Rhs&gt;
    constexpr auto operator+(
      const Lhs &amp; lhs,
      const Rhs &amp; rhs) noexcept;
  template &lt;class Lhs, class Rhs&gt;
    constexpr auto operator-(
      const Lhs &amp; lhs,
      const Rhs &amp; rhs) noexcept;
  template &lt;class LhsRep, int LhsExponent, class RhsRep, int RhsExponent&gt;
    constexpr auto operator*(
      const fixed_point&lt;LhsRep, LhsExponent&gt; &amp; lhs,
      const fixed_point&lt;RhsRep, RhsExponent&gt; &amp; rhs) noexcept;
  template &lt;class LhsRep, int LhsExponent, class RhsRep, int RhsExponent&gt;
    constexpr auto operator/(
      const fixed_point&lt;LhsRep, LhsExponent&gt; &amp; lhs,
      const fixed_point&lt;RhsRep, RhsExponent&gt; &amp; rhs) noexcept;
  template &lt;class LhsRep, int LhsExponent, class Integer&gt;
    constexpr auto operator*(
      const fixed_point&lt;LhsRep, LhsExponent&gt; &amp; lhs,
      const Integer &amp; rhs) noexcept;
  template &lt;class LhsRep, int LhsExponent, class Integer&gt;
    constexpr auto operator/(
      const fixed_point&lt;LhsRep, LhsExponent&gt; &amp; lhs,
      const Integer &amp; rhs) noexcept;
  template &lt;class Integer, class RhsRep, int RhsExponent&gt;
    constexpr auto operator*(
      const Integer &amp; lhs,
      const fixed_point&lt;RhsRep, RhsExponent&gt; &amp; rhs) noexcept;
  template &lt;class Integer, class RhsRep, int RhsExponent&gt;
    constexpr auto operator/(
      const Integer &amp; lhs,
      const fixed_point&lt;RhsRep, RhsExponent&gt; &amp; rhs) noexcept;
  template &lt;class LhsRep, int LhsExponent, class Float&gt;
    constexpr auto operator*(
      const fixed_point&lt;LhsRep, LhsExponent&gt; &amp; lhs,
      const Float &amp; rhs) noexcept;
  template &lt;class LhsRep, int LhsExponent, class Float&gt;
    constexpr auto operator/(
      const fixed_point&lt;LhsRep, LhsExponent&gt; &amp; lhs,
      const Float &amp; rhs) noexcept;
  template &lt;class Float, class RhsRep, int RhsExponent&gt;
    constexpr auto operator*(
      const Float &amp; lhs,
      const fixed_point&lt;RhsRep, RhsExponent&gt; &amp; rhs) noexcept;
  template &lt;class Float, class RhsRep, int RhsExponent&gt;
    constexpr auto operator/(
      const Float &amp; lhs,
      const fixed_point&lt;RhsRep, RhsExponent&gt; &amp; rhs) noexcept;
  template &lt;class LhsRep, int Exponent, class Rhs&gt;
    fixed_point&lt;LhsRep, Exponent&gt; &amp; operator+=(fixed_point&lt;LhsRep, Exponent&gt; &amp; lhs, const Rhs &amp; rhs) noexcept;
  template &lt;class LhsRep, int Exponent, class Rhs&gt;
    fixed_point&lt;LhsRep, Exponent&gt; &amp; operator-=(fixed_point&lt;LhsRep, Exponent&gt; &amp; lhs, const Rhs &amp; rhs) noexcept;
  template &lt;class LhsRep, int Exponent&gt;
  template &lt;class Rhs, typename std::enable_if&lt;std::is_arithmetic&lt;Rhs&gt;::value, int&gt;::type Dummy&gt;
    fixed_point&lt;LhsRep, Exponent&gt; &amp;
    fixed_point&lt;LhsRep, Exponent&gt;::operator*=(const Rhs &amp; rhs) noexcept;
  template &lt;class LhsRep, int Exponent&gt;
  template &lt;class Rhs, typename std::enable_if&lt;std::is_arithmetic&lt;Rhs&gt;::value, int&gt;::type Dummy&gt;
    fixed_point&lt;LhsRep, Exponent&gt; &amp;
    fixed_point&lt;LhsRep, Exponent&gt;::operator/=(const Rhs &amp; rhs) noexcept;
  template &lt;class Rep, int Exponent&gt;
    constexpr fixed_point&lt;Rep, Exponent&gt;
      sqrt(const fixed_point&lt;Rep, Exponent&gt; &amp; x) noexcept;
}
</code></pre>
<h4><code>fixed_point&lt;&gt;</code> Class Template</h4>
<pre><code class="language-c++">template &lt;class Rep = int, int Exponent = 0&gt;
class fixed_point
{
public:
  using rep = Rep;

  constexpr static int exponent;
  constexpr static int digits;
  constexpr static int integer_digits;
  constexpr static int fractional_digits;

  fixed_point() noexcept;
  template &lt;class S, typename std::enable_if&lt;_impl::is_integral&lt;S&gt;::value, int&gt;::type Dummy = 0&gt;
    explicit constexpr fixed_point(S s) noexcept;
  template &lt;class S, typename std::enable_if&lt;std::is_floating_point&lt;S&gt;::value, int&gt;::type Dummy = 0&gt;
    explicit constexpr fixed_point(S s) noexcept;
  template &lt;class FromRep, int FromExponent&gt;
    explicit constexpr fixed_point(const fixed_point&lt;FromRep, FromExponent&gt; &amp; rhs) noexcept;
  template &lt;class S, typename std::enable_if&lt;_impl::is_integral&lt;S&gt;::value, int&gt;::type Dummy = 0&gt;
    fixed_point &amp; operator=(S s) noexcept;
  template &lt;class S, typename std::enable_if&lt;std::is_floating_point&lt;S&gt;::value, int&gt;::type Dummy = 0&gt;
    fixed_point &amp; operator=(S s) noexcept;
  template &lt;class FromRep, int FromExponent&gt;
    fixed_point &amp; operator=(const fixed_point&lt;FromRep, FromExponent&gt; &amp; rhs) noexcept;

  template &lt;class S, typename std::enable_if&lt;_impl::is_integral&lt;S&gt;::value, int&gt;::type Dummy = 0&gt;
    explicit constexpr operator S() const noexcept;
  template &lt;class S, typename std::enable_if&lt;std::is_floating_point&lt;S&gt;::value, int&gt;::type Dummy = 0&gt;
    explicit constexpr operator S() const noexcept;
  explicit constexpr operator bool() const noexcept;

  template &lt;class Rhs, typename std::enable_if&lt;std::is_arithmetic&lt;Rhs&gt;::value, int&gt;::type Dummy = 0&gt;
    fixed_point &amp;operator*=(const Rhs &amp; rhs) noexcept;
  template &lt;class Rhs, typename std::enable_if&lt;std::is_arithmetic&lt;Rhs&gt;::value, int&gt;::type Dummy = 0&gt;
    fixed_point &amp; operator/=(const Rhs &amp; rhs) noexcept;

  constexpr rep data() const noexcept;
  static constexpr fixed_point from_data(rep r) noexcept;
};
</code></pre>
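<p>The synopsis leaves the relationship between a value, <code>data()</code> and <code>from_data()</code> implicit. Assuming that <code>data()</code> returns the stored <code>Rep</code> and that <code>from_data()</code> constructs an object from one without scaling, a heavily reduced model might look like the following. <code>fixed_point_sketch</code> is a hypothetical name, most members are omitted, and conversion from <code>double</code> is assumed to truncate as <code>Rep</code> would.</p>
<pre><code class="language-c++">#include &lt;cstdint&gt;

// Hypothetical reduced model: a Rep value interpreted as data() * 2^Exponent.
template &lt;class Rep = int, int Exponent = 0&gt;
class fixed_point_sketch {
public:
    using rep = Rep;
    static constexpr int exponent = Exponent;

    constexpr fixed_point_sketch() : data_(0) {}

    // Scales by 2^-Exponent, then converts to Rep (truncating, as Rep would).
    explicit constexpr fixed_point_sketch(double value)
        : data_(static_cast&lt;Rep&gt;(value * scale())) {}

    explicit constexpr operator double() const { return data_ / scale(); }

    // The stored integer, unscaled.
    constexpr rep data() const { return data_; }

    // Wraps a stored integer without scaling it.
    static constexpr fixed_point_sketch from_data(rep r) {
        fixed_point_sketch result;
        result.data_ = r;
        return result;
    }

private:
    // 2^-Exponent as a double.
    static constexpr double scale() {
        double s = 1.0;
        for (int i = 0; i &lt; -Exponent; ++i) s *= 2.0;
        for (int i = 0; i &lt; Exponent; ++i) s /= 2.0;
        return s;
    }

    Rep data_;
};

static_assert(fixed_point_sketch&lt;std::uint8_t, -4&gt;(15.9375).data() == 255, "scaled by 2^4");
static_assert(static_cast&lt;double&gt;(fixed_point_sketch&lt;std::uint8_t, -4&gt;::from_data(8)) == 0.5,
              "8 * 2^-4 == 0.5");
</code></pre>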
<h2>VI. Future Issues</h2>
<h3>Library Support</h3>
<p>Because the aim is to provide an alternative to existing arithmetic
types which are supported by the standard library, it is conceivable
that a future proposal might specialize existing class templates and
overload existing functions.</p>
<p>Possible candidates for overloading include the functions defined in
&lt;cmath&gt; and a templated specialization of <code>numeric_limits</code>. A new type
trait, <code>is_fixed_point</code>, would also be useful.</p>
<p>While <code>fixed_point</code> is intended to provide drop-in replacements to
existing built-ins, it may be preferable to deviate slightly from the
behavior of certain standard functions. For example, overloads of
functions from &lt;cmath&gt; will be considerably less concise, efficient
and versatile if they obey rules surrounding error cases. In
particular, the guarantee of setting <code>errno</code> in the case of an error
prevents a function from being defined as pure. This highlights a
wider issue surrounding the adoption of the functional approach and
compile-time computation that is beyond the scope of this document.</p>
<h3>Compile-Time Bit-Shift Operations</h3>
<p>A notable feature of the <em>fp</em> library <a href="https://github.com/mizvekov/fp">[4]</a>
is the creation of an alias for <code>integral_constant</code> which can be applied to the right-hand side of bit-shift operations.
The type returned from this operation has identical bit-wise value to the left-hand input
but with <code>Exponent</code> value adjusted by the amount of the right-hand side.
It is essentially equivalent to the <code>trunc_shift_</code> functions
and means that, when shifting by literal values, what looks like a run-time operation
is a compile-time calculation which guarantees no overflow or underflow.</p>
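<p>The following sketch shows how such a shift might work. The <code>fixed</code> struct and the <code>shift</code> alias are hypothetical stand-ins, not the API of <em>fp</em> or of <code>fixed_point</code>. Shifting by a <code>std::integral_constant</code> copies the stored bits unchanged and adjusts only the exponent carried in the type.</p>
<pre><code class="language-c++14">#include &lt;cstdint&gt;
#include &lt;type_traits&gt;

// Hypothetical stand-in for a fixed-point type: value == repr * 2^Exponent.
template &lt;class Rep, int Exponent&gt;
struct fixed {
    Rep repr;
};

// Hypothetical alias for a shift amount known at compile time.
template &lt;int N&gt;
using shift = std::integral_constant&lt;int, N&gt;;

// Left shift by a constant: same bits, exponent increased by N,
// so the value is multiplied by 2^N without any bits being lost.
template &lt;class Rep, int Exponent, int N&gt;
constexpr fixed&lt;Rep, Exponent + N&gt; operator&lt;&lt;(fixed&lt;Rep, Exponent&gt; f, shift&lt;N&gt;) noexcept
{
    return {f.repr};
}

// Right shift by a constant: same bits, exponent decreased by N.
template &lt;class Rep, int Exponent, int N&gt;
constexpr fixed&lt;Rep, Exponent - N&gt; operator&gt;&gt;(fixed&lt;Rep, Exponent&gt; f, shift&lt;N&gt;) noexcept
{
    return {f.repr};
}
</code></pre>
<p>For example, 1.5 stored as s15:16 (raw value 98304) becomes 24 after <code>a &lt;&lt; shift&lt;4&gt;{}</code>. The raw value is still 98304, but the type now reads it with an exponent of -12.</p>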
<h3>Alternative Return Type Policies</h3>
<p>When devising a strategy for mitigating the risk of overflow during arithmetic operations,
the number of integer and fractional bits stored in the result is an important choice.
The <code>fixed_point</code> type picks one of the simpler options by default,
but it is by no means the only viable one.</p>
<p>The <em>fp</em> library <a href="https://github.com/mizvekov/fp">[4]</a> returns a type
whose size matches the inputs but whose exponent is shifted to preserve high bits.
The arithmetic types proposed in P0106
<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0106r0.html">[8]</a>
increase capacity to ensure that precision is preserved.
Even greater control of the required capacity of a fixed-point type can be afforded by
systems such as the <em>bounded::integer</em> library <a href="http://doublewise.net/c++/bounded/">[3]</a>.</p>
<p>A common requirement among these approaches is the ability
to specify the return type of arithmetic operations.
For this reason, named-function arithmetic operators
which are more expressive than those proposed so far may be necessary.</p>
<p>These functions would specify a return type as a template parameter. For example:</p>
<pre><code>template &lt;class Result, class Lhs, class Rhs&gt;
constexpr Result fixed_point_multiply(const Lhs &amp; lhs, const Rhs &amp; rhs);
</code></pre>
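<p>The sketch below shows one way such a function might be implemented. Only the declaration above comes from this paper. The body, the <code>fixed</code> stand-in and the <code>rescale</code> helper are hypothetical. The product of the two stored integers is computed in a 64-bit intermediate, which assumes operands of at most 32 bits. It is then rescaled to the exponent of the caller-chosen <code>Result</code>.</p>
<pre><code class="language-c++14">#include &lt;cstdint&gt;

// Hypothetical stand-in for a fixed-point type: value == repr * 2^Exponent.
template &lt;class Rep, int Exponent&gt;
struct fixed {
    using rep = Rep;
    static constexpr int exponent = Exponent;
    Rep repr;
};

// Hypothetical helper: move a raw value from one exponent to another.
// A positive shift discards low-order bits; a negative shift makes room
// for extra fractional bits.
constexpr std::int64_t rescale(std::int64_t value, int shift)
{
    return shift &gt;= 0 ? value &gt;&gt; shift : value * (std::int64_t{1} &lt;&lt; -shift);
}

// Named multiply whose return type is chosen by the caller.
// Assumes each operand's representation is at most 32 bits wide so the
// intermediate product fits in 64 bits.
template &lt;class Result, class Lhs, class Rhs&gt;
constexpr Result fixed_point_multiply(const Lhs &amp; lhs, const Rhs &amp; rhs)
{
    // The exact product's exponent is the sum of the operand exponents.
    return Result{static_cast&lt;typename Result::rep&gt;(
        rescale(std::int64_t{lhs.repr} * rhs.repr,
                Result::exponent - (Lhs::exponent + Rhs::exponent)))};
}
</code></pre>
<p>Passing a 64-bit <code>Result</code> with 32 fractional bits keeps every bit of the exact product of two s15:16 values, in the spirit of P0106's widening types. Passing s15:16 truncates the product back to the inputs' format.</p>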
<h3>Non-Binary Radixes</h3>
<p>Interest in decimal fixed-point arithmetic has been observed.
It seems plausible that a general-purpose type could support both
binary and decimal fixed-point types as follows:</p>
<pre><code class="language-c++14">template&lt;class Rep, int Exponent, int Radix&gt;
class basic_fixed_point;

template&lt;class Rep, int Exponent&gt;
using fixed_point = basic_fixed_point&lt;Rep, Exponent, 2&gt;;

template&lt;class Rep, int Exponent&gt;
using decimal_fixed_point = basic_fixed_point&lt;Rep, Exponent, 10&gt;;
</code></pre>
<p>This naming scheme imitates <code>basic_string</code>
in order to illustrate a similar pattern.</p>
<p>Further investigation needs to be conducted
in order to ascertain whether this break-down
can maintain the same level of efficiency.</p>
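<p>The following sketch shows how the <code>Radix</code> parameter might be used. Only the three template names come from the outline above. The <code>repr</code> member, <code>to_double</code> and the helper <code>ipow</code> are hypothetical. The sketch also hints at the efficiency question. With radix 2, scaling can be reduced to a shift. With radix 10, it requires genuine multiplication or division.</p>
<pre><code class="language-c++14">#include &lt;cstdint&gt;

// Hypothetical helper: base raised to a non-negative power, as a double.
constexpr double ipow(int base, int n)
{
    double result = 1.0;
    for (int i = 0; i &lt; n; ++i) {
        result *= base;
    }
    return result;
}

// Hypothetical sketch: the represented value is repr * Radix^Exponent.
template &lt;class Rep, int Exponent, int Radix&gt;
struct basic_fixed_point {
    Rep repr;

    // For negative exponents, divide rather than multiply by a reciprocal,
    // so that e.g. 1999 with radix 10 and exponent -2 converts to the
    // double nearest to 19.99.
    constexpr double to_double() const
    {
        return Exponent &lt; 0 ? repr / ipow(Radix, -Exponent)
                            : repr * ipow(Radix, Exponent);
    }
};

template &lt;class Rep, int Exponent&gt;
using fixed_point = basic_fixed_point&lt;Rep, Exponent, 2&gt;;

template &lt;class Rep, int Exponent&gt;
using decimal_fixed_point = basic_fixed_point&lt;Rep, Exponent, 10&gt;;
</code></pre>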
<h2>VII. Prior Art</h2>
<p>Many examples of fixed-point support in C and C++ exist. While almost
all of them aim for low run-time cost and expressive alternatives to
raw integer manipulation, they vary greatly in detail and in terms of
their interface.</p>
<p>One especially interesting dichotomy is between solutions which offer
a discrete selection of fixed-point types and libraries which contain
a continuous range of exponents through type parameterization.</p>
<h3>N1169</h3>
<p>One example of the former is found in proposal N1169
<a href="http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1169.pdf">[5]</a>,
the intent of which is to expose features found in certain embedded
hardware. It introduces a succinct set of language-level fixed-point
types and imposes constraints on the number of integer or fractional
digits each can possess.</p>
<p>As with all examples of discrete-type fixed-point support, the limited
choice of exponents is a considerable restriction on the versatility
and expressiveness of the API.</p>
<p>Nevertheless, it may be possible to harness performance gains provided
by N1169 fixed-point types through explicit template specialization.
This is likely to be a valuable proposition to potential users of the
library who find themselves targeting platforms which support
fixed-point arithmetic at the hardware level.</p>
<h3>P0106</h3>
<p>There are many other C++ libraries available which fall into the
latter category of continuous-range fixed-point arithmetic
<a href="https://github.com/mizvekov/fp">[4]</a>
<a href="http://www.codeproject.com/Articles/37636/Fixed-Point-Class">[6]</a>
<a href="https://github.com/viboes/fixed_point">[7]</a>. In particular, an
existing library proposal, P0106 <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0106r0.html">[8]</a>,
aims to achieve very similar goals through similar means and warrants
closer comparison than N1169.</p>
<p>P0106 introduces four class templates covering the quadrant of signed
versus unsigned and fractional versus integer numeric types. It is
intended to replace built-in types in a wide variety of situations and
accordingly is highly configurable at compile time in terms of how
rounding and overflow are handled. Parameters to these four class
templates include the storage in bits and - for fractional types - the
resolution.</p>
<p>The <code>fixed_point</code> class template could probably - with a few caveats -
be generated using the two fractional types, <code>nonnegative</code> and
<code>negatable</code>, replacing the <code>Rep</code> parameter with the integer bit
count of <code>Rep</code>, specifying <code>fastest</code> for the rounding mode and
specifying <code>undefined</code> as the overflow mode.</p>
<p>However, <code>fixed_point</code> more closely caters to the needs of
users who already use integer types and simply desire a more concise,
less error-prone form. It more closely follows the four design aims of
the library and - it can be argued - more closely follows the spirit
of the standard in its pursuit of zero-cost abstraction.</p>
<p>Some aspects of the design of the P0106 API which back up these
conclusions are that:</p>
<ul>
<li>the results of arithmetic operations closely resemble the <code>trunc_</code>
function templates and are potentially more costly at run-time;</li>
<li>the nature of the range-specifying template parameters - through
careful framing in mathematical terms - abstracts away valuable
information regarding machine-critical type size information;</li>
<li>the breaking up of duties amongst four separate class templates
introduces four new concepts and incurs additional mental load for
relatively little gain while further detaching the interface from
vital machine-level details and</li>
<li>the absence of the most negative number from signed types reduces
the capacity of all types by one.</li>
</ul>
<p>The added versatility that the P0106 API provides regarding rounding
and overflow handling is of relatively low priority to users who
already bear the scars of battles with raw integer types.
Nevertheless, providing them as options to be turned on or off at
compile time is an ideal way to leave the choice in the hands of the
user.</p>
<p>Many high-performance applications - in which fixed-point is of
potential value - favor run-time checks during development which are
subsequently deactivated in production builds.
The P0106 interface is highly conducive to this style of development.
The design proposed in this paper aims to achieve similar results
by composing fixed-point types from custom integral types.</p>
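<p>The sketch below illustrates this kind of composition by swapping the representation type of a fixed-point stand-in. Both <code>checked_int32</code> and <code>fixed</code> are hypothetical. Additions through <code>checked_int32</code> are verified with <code>assert</code>, which is active in development builds and compiled out when <code>NDEBUG</code> is defined. A production build could instead use the unchecked <code>std::int32_t</code> representation directly.</p>
<pre><code class="language-c++14">#include &lt;cassert&gt;
#include &lt;cstdint&gt;
#include &lt;limits&gt;

// Hypothetical checked integer: addition asserts that the result fits.
// The assertion is compiled out when NDEBUG is defined.
struct checked_int32 {
    std::int32_t value;

    friend checked_int32 operator+(checked_int32 a, checked_int32 b)
    {
        // Add in 64 bits, where two 32-bit values cannot overflow.
        std::int64_t sum = std::int64_t{a.value} + b.value;
        assert(sum &gt;= std::numeric_limits&lt;std::int32_t&gt;::min() &amp;&amp;
               sum &lt;= std::numeric_limits&lt;std::int32_t&gt;::max());
        return checked_int32{static_cast&lt;std::int32_t&gt;(sum)};
    }
};

// Hypothetical fixed-point stand-in that defers arithmetic to Rep.
template &lt;class Rep, int Exponent&gt;
struct fixed {
    Rep repr;  // value == repr * 2^Exponent

    friend fixed operator+(fixed a, fixed b)
    {
        return fixed{static_cast&lt;Rep&gt;(a.repr + b.repr)};
    }
};

using checked_s15_16 = fixed&lt;checked_int32, -16&gt;;  // development builds
using fast_s15_16 = fixed&lt;std::int32_t, -16&gt;;      // production builds
</code></pre>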
<h2>VIII. Acknowledgements</h2>
<p>SG6: Lawrence Crowl<br />
SG14: Guy Davidson, Michael Wong<br />
Contributors: Ed Ainsley, Billy Baker, Lance Dyson, Marco Foco,
Mathias Gaunard, Clément Grégoire, Nicolas Guillemot, Kurt Guntheroth, Matt Kinzelman, Joël Lamotte,
Sean Middleditch, Paul Robinson, Patrice Roy, Peter Schregle, Ryhor Spivak</p>
<h2>IX. Revisions</h2>
<p>This paper revises <a href="http://johnmcfarlane.github.io/fixed_point/papers/p0037r1.html">P0037R1</a>:</p>
<ul>
<li>notes that fixed-point can exhibit greater determinism compared to floating-point;</li>
<li>renames <code>resize</code> to <code>set_width</code>;</li>
<li>adds <code>width</code>;</li>
<li>moves details of <code>width</code> and <code>set_width</code> to a separate paper, P0381, Numeric Width
<a href="http://johnmcfarlane.github.io/fixed_point/papers/p0381r0.html">[10]</a>;</li>
<li>revises design aims, making the <code>fixed_point</code> class template more like integer types;</li>
<li>renames template parameter <code>ReprType</code> to <code>Rep</code> in line with the <code>chrono</code> API;</li>
<li>makes more effort to explain the reason for exposing <code>Rep</code> to the API;</li>
<li>minor typo fixes;</li>
<li>notes change in approach away from throwing spare bits at the fractional part of the value;</li>
<li>changes rules concerning the result of mixed-mode arithmetic operations;</li>
<li>updates various examples to reflect changes in design;</li>
<li>removes <code>promote_</code> and <code>trunc_</code> functions;</li>
<li>adds a simpler set of named arithmetic functions such as <code>multiply</code>, <code>subtract</code> etc.;</li>
<li>removes section on bounded integers;</li>
<li>adds section on non-binary radixes and</li>
<li>lists additions to the reference implementation, including the <code>elastic</code> type,
which returns expanded results in order to avoid overflow à la P0106's types.</li>
</ul>
<h2>X. References</h2>
<ol>
<li>Why Integer Coordinates?, <a href="http://www.pathengine.com/Contents/Overview/FundamentalConcepts/WhyIntegerCoordinates/page.php">http://www.pathengine.com/Contents/Overview/FundamentalConcepts/WhyIntegerCoordinates/page.php</a></li>
<li>Rounding and Overflow in C++, <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0105r0.html">http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0105r0.html</a></li>
<li>C++ bounded::integer library, <a href="http://doublewise.net/c++/bounded/">http://doublewise.net/c++/bounded/</a></li>
<li>fp, C++14 Fixed Point Library, <a href="https://github.com/mizvekov/fp">https://github.com/mizvekov/fp</a></li>
<li>N1169, Extensions to support embedded processors, <a href="http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1169.pdf">http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1169.pdf</a></li>
<li>fpmath, Fixed Point Math Library, <a href="http://www.codeproject.com/Articles/37636/Fixed-Point-Class">http://www.codeproject.com/Articles/37636/Fixed-Point-Class</a></li>
<li>Boost fixed_point (proposed), Fixed point integral and fractional types, <a href="https://github.com/viboes/fixed_point">https://github.com/viboes/fixed_point</a></li>
<li>P0106, C++ Binary Fixed-Point Arithmetic, <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0106r0.html">http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0106r0.html</a></li>
<li>fixed_point, Reference Implementation of P0037, <a href="https://github.com/johnmcfarlane/fixed_point">https://github.com/johnmcfarlane/fixed_point</a></li>
<li>P0381, Numeric Width, <a href="http://johnmcfarlane.github.io/fixed_point/papers/p0381r0.html">http://johnmcfarlane.github.io/fixed_point/papers/p0381r0.html</a></li>
</ol>
<h2>XI. Appendix 1: Reference Implementation</h2>
<p>An in-development implementation of the fixed_point class template and
its essential supporting functions and types is available
<a href="https://github.com/johnmcfarlane/fixed_point">[9]</a>.</p>
<p>Items include:</p>
<ul>
<li>utility header containing definitions for:
<ul>
<li>math and trigonometric functions and</li>
<li>a partial <code>numeric_limits</code> specialization;</li>
</ul>
</li>
<li>a type, <code>integer</code>, intended to explore and illustrate the potential of custom <code>Rep</code>;</li>
<li>a type, <code>elastic</code>, intended to illustrate the use of <code>fixed_point</code>
to design better-behaved numeric types such as those presented in
P0106 <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0106r0.html">[8]</a>;</li>
<li>compile-time tests of <code>constexpr</code> operations;</li>
<li>run-time tests of assignment and exception-throwing behavior and</li>
<li>benchmarking support (used to generate results in this paper).</li>
</ul>
<h2>XII. Appendix 2: Performance</h2>
<p>Despite a focus on usable interface and direct translation from
integer-based fixed-point operations, there is an overwhelming
expectation that the source code result in minimal instructions and
clock cycles. A few preliminary numbers are presented to give a very
early idea of how the API might perform.</p>
<p>Some notes:</p>
<ul>
<li>
<p>A few test functions were run, ranging from single arithmetic
operations to basic geometric functions, performed against integer,
floating-point and fixed-point types for comparison.</p>
</li>
<li>
<p>Figures were taken from a single CPU, OS and compiler, namely:</p>
<pre><code>Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0)
Target: x86_64-pc-linux-gnu
Thread model: posix
</code></pre>
</li>
<li>
<p>Fixed inputs were provided to each function, meaning that branch
prediction rarely fails. Results may also not represent the full
range of inputs.</p>
</li>
<li>
<p>Details of the test harness used can be found in the source
project mentioned in Appendix 1.</p>
</li>
<li>
<p>Times are in nanoseconds.</p>
</li>
<li>
<p>Code has not yet been optimized for performance.</p>
</li>
</ul>
<h3>Types</h3>
<p>Where applicable various combinations of integer, floating-point and
fixed-point types were tested with the following identifiers:</p>
<ul>
<li><code>uint8_t</code>, <code>int8_t</code>, <code>uint16_t</code>, <code>int16_t</code>, <code>uint32_t</code>, <code>int32_t</code>,
<code>uint64_t</code> and <code>int64_t</code> built-in integer types;</li>
<li><code>float</code>, <code>double</code> and <code>long double</code> built-in floating-point types;</li>
<li>s3:4, u4:4, s7:8, u8:8, s15:16, u16:16, s31:32 and u32:32 format
fixed-point types.</li>
</ul>
<h3>Basic Arithmetic</h3>
<p>Plus, minus, multiplication and division were tested in isolation
using a number of different numeric types with the following results:</p>
<table>
<thead>
<tr><th>type</th><th>add (ns)</th><th>sub (ns)</th><th>mul (ns)</th><th>div (ns)</th></tr>
</thead>
<tbody>
<tr><td>float</td><td>1.78011</td><td>1.96617</td><td>1.9376</td><td>5.12</td></tr>
<tr><td>double</td><td>1.73966</td><td>1.98491</td><td>1.93097</td><td>7.64343</td></tr>
<tr><td>long double</td><td>3.46011</td><td>3.55474</td><td>102.446</td><td>8.304</td></tr>
<tr><td>u4_4</td><td>1.87726</td><td>1.77006</td><td>2.46583</td><td>3.82171</td></tr>
<tr><td>s3_4</td><td>1.85051</td><td>1.72983</td><td>2.09189</td><td>3.82171</td></tr>
<tr><td>u8_8</td><td>1.85417</td><td>1.72983</td><td>2.08</td><td>3.84</td></tr>
<tr><td>s7_8</td><td>1.82057</td><td>1.72983</td><td>2.18697</td><td>3.8</td></tr>
<tr><td>u16_16</td><td>1.94194</td><td>1.73966</td><td>2.12571</td><td>9.152</td></tr>
<tr><td>s15_16</td><td>1.93463</td><td>1.85051</td><td>2.10789</td><td>11.232</td></tr>
<tr><td>u32_32</td><td>1.94674</td><td>1.88229</td><td>2.10789</td><td>30.8434</td></tr>
<tr><td>s31_32</td><td>1.94446</td><td>1.87063</td><td>2.10789</td><td>34</td></tr>
<tr><td>int8_t</td><td>2.14857</td><td>1.76</td><td>1.76</td><td>3.82171</td></tr>
<tr><td>uint8_t</td><td>2.12571</td><td>1.74994</td><td>1.78011</td><td>3.82171</td></tr>
<tr><td>int16_t</td><td>1.9936</td><td>1.82126</td><td>1.8432</td><td>3.8</td></tr>
<tr><td>uint16_t</td><td>1.88229</td><td>1.83794</td><td>1.76914</td><td>3.82171</td></tr>
<tr><td>int32_t</td><td>1.82126</td><td>1.89074</td><td>1.78011</td><td>3.82171</td></tr>
<tr><td>uint32_t</td><td>1.76</td><td>1.85417</td><td>2.19086</td><td>3.81806</td></tr>
<tr><td>int64_t</td><td>1.76</td><td>1.83703</td><td>1.7696</td><td>10.2286</td></tr>
<tr><td>uint64_t</td><td>1.83223</td><td>2.04914</td><td>1.79017</td><td>8.304</td></tr>
</tbody>
</table>
<p>Among the slowest types is <code>long double</code>. It is likely that its
operations are emulated in software. The next slowest operations are fixed-point
divisions, especially with the 64-bit types u32:32 and s31:32, while fixed-point
multiplication costs only slightly more than integer multiplication.
The cost arises because values must be temporarily promoted to double-width types.
This is a known fixed-point technique which inevitably slows down
where a 128-bit type is required on a 64-bit system.</p>
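<p>The promotion step can be sketched for s15:16 division on raw integers. This is an illustrative stand-in, not the reference implementation's code. The dividend is widened to 64 bits and scaled by 2<sup>16</sup> before dividing, so the quotient keeps 16 fractional bits. For s31:32, the same step would need a 128-bit intermediate. Standard C++ does not provide one, although GCC and Clang offer <code>__int128</code> as an extension on 64-bit targets.</p>
<pre><code class="language-c++14">#include &lt;cstdint&gt;

// Illustrative s15:16 division on raw representations (value == repr / 2^16).
// Assumes rhs != 0 and that the quotient fits in 32 bits.
constexpr std::int32_t s15_16_divide(std::int32_t lhs, std::int32_t rhs)
{
    // Promote to 64 bits and pre-scale by 2^16 (multiplying rather than
    // left-shifting, which is undefined for negative values in C++14).
    return static_cast&lt;std::int32_t&gt;((std::int64_t{lhs} * 65536) / rhs);
}
</code></pre>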
<p>Here is a section of the disassembly of the s15:16 multiply call:</p>
<pre><code>30:   mov    %r14,%rax  
      mov    %r15,%rax  
      movslq -0x28(%rbp),%rax  
      movslq -0x30(%rbp),%rcx  
      imul   %rax,%rcx  
      shr    $0x10,%rcx  
      mov    %ecx,-0x38(%rbp)  
      mov    %r12,%rax  
4c:   movzbl (%rbx),%eax  
      cmp    $0x1,%eax  
    ↓ jne    68  
54:   mov    0x8(%rbx),%rax  
      lea    0x1(%rax),%rcx  
      mov    %rcx,0x8(%rbx)  
      cmp    0x38(%rbx),%rax  
    ↑ jb     30
</code></pre>
<p>The two 32-bit numbers are multiplied together and the result shifted
down, much as it would be if raw <code>int</code> values were used. The efficiency
of this operation varies with the exponent. An exponent of zero should
mean no shift at all.</p>
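<p>The same sequence can be written by hand. In this illustrative sketch, the function name and its <code>FractionalBits</code> parameter are hypothetical. Both operands are widened to 64 bits, multiplied, and shifted right by the number of fractional bits. With zero fractional bits, the shift is a no-op that an optimizing compiler can be expected to remove.</p>
<pre><code class="language-c++14">#include &lt;cstdint&gt;

// Illustrative multiply of two 32-bit fixed-point representations that each
// have FractionalBits fractional bits: widen (cf. movslq), multiply (imul),
// shift out the extra fractional bits (cf. shr) and narrow back to 32 bits.
template &lt;int FractionalBits&gt;
constexpr std::int32_t fixed_multiply(std::int32_t lhs, std::int32_t rhs)
{
    return static_cast&lt;std::int32_t&gt;((std::int64_t{lhs} * rhs) &gt;&gt; FractionalBits);
}
</code></pre>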
<h3>3-Dimensional Magnitude Squared</h3>
<p>A fast <code>sqrt</code> implementation has not yet been tested with
<code>fixed_point</code>. (The naive implementation takes over 300ns.) For this
reason, a magnitude-squared function is measured, combining multiply
and add operations:</p>
<pre><code>template &lt;class FP&gt;
constexpr FP magnitude_squared(const FP &amp; x, const FP &amp; y, const FP &amp; z)
{
    return x * x + y * y + z * z;
}
</code></pre>
<p>Only real number formats are tested:</p>
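<table>
<thead>
<tr><th>type</th><th>cpu_time (ns)</th></tr>
</thead>
<tbody>
<tr><td>float</td><td>2.42606</td></tr>
<tr><td>double</td><td>2.08</td></tr>
<tr><td>long double</td><td>4.5056</td></tr>
<tr><td>s3_4</td><td>2.768</td></tr>
<tr><td>s7_8</td><td>2.77577</td></tr>
<tr><td>s15_16</td><td>2.752</td></tr>
<tr><td>s31_32</td><td>4.10331</td></tr>
</tbody>
</table>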
<p>Again, the size of the type seems to have the largest impact.</p>
<h3>Circle Intersection</h3>
<p>A similar operation includes a comparison and branch:</p>
<pre><code>template &lt;class Real&gt;
bool circle_intersect_generic(Real x1, Real y1, Real r1, Real x2, Real y2, Real r2)
{
    auto x_diff = x2 - x1;
    auto y_diff = y2 - y1;
    auto distance_squared = x_diff * x_diff + y_diff * y_diff;

    auto touch_distance = r1 + r2;
    auto touch_distance_squared = touch_distance * touch_distance;

    return distance_squared &lt;= touch_distance_squared;
}
</code></pre>
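<table>
<thead>
<tr><th>type</th><th>cpu_time (ns)</th></tr>
</thead>
<tbody>
<tr><td>float</td><td>3.46011</td></tr>
<tr><td>double</td><td>3.48</td></tr>
<tr><td>long double</td><td>6.4</td></tr>
<tr><td>s3_4</td><td>3.88</td></tr>
<tr><td>s7_8</td><td>4.5312</td></tr>
<tr><td>s15_16</td><td>3.82171</td></tr>
<tr><td>s31_32</td><td>5.92</td></tr>
</tbody>
</table>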
<p>Again, fixed-point and native performance are comparable.</p>
