<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>

<meta http-equiv="Content-Type" content="text/html;charset=US-ASCII">

<style type="text/css">

body { color: #000000; background-color: #FFFFFF; }
del { text-decoration: line-through; color: #8B0040; }
ins { text-decoration: underline; color: #005100; }

p.example { margin: 2em; }
pre.example { margin: 2em; }
div.example { margin: 2em; }

code.extract { background-color: #F5F6A2; }
pre.extract { margin: 2em; background-color: #F5F6A2;
  border: 1px solid #E1E28E; }

p.function { }
p.attribute { text-indent: 3em; }

blockquote.std { color: #000000; background-color: #F1F1F1;
  border: 1px solid #D1D1D1; padding: 0.5em; }
blockquote.stddel { text-decoration: line-through;
  color: #000000; background-color: #FFEBFF;
  border: 1px solid #ECD7EC; padding: 0.5em; }
blockquote.stdins { text-decoration: underline;
  color: #000000; background-color: #C8FFC8;
  border: 1px solid #B3EBB3; padding: 0.5em; }

table { border: 1px solid black; border-spacing: 0px;
  margin-left: auto; margin-right: auto; }
th { text-align: left; vertical-align: top;
  padding-left: 0.8em; border: none; }
td { text-align: left; vertical-align: top;
  padding-left: 0.8em; border: none; }

</style>

<title>C++ Binary Fixed-Point Arithmetic</title>
</head>
<body>
<h1>C++ Binary Fixed-Point Arithmetic</h1>

<p>
ISO/IEC JTC1 SC22 WG21 N3352 = 12-0042 - 2012-01-15
</p>

<p>
Lawrence Crowl, Lawrence@Crowl.org
</p>

<h2><a name="Introduction">Introduction</a></h2>

<p>
C++ supports integer arithmetic and floating-point arithmetic,
but it does not support fixed-point arithmetic.
We propose support for fixed-point arithmetic via standard library facilities.
</p>


<h3><a name="versus_integer">Fixed-Point versus Integer</a></h3>

<p>
In C and C++,
the basic integer types have several problematic behaviors.
</p>

<ul>

<li><p>
Signed integer arithmetic overflow results in undefined behavior.
Pre-emptively checking for overflow is challenging and tedious,
so programmers very rarely do so.
As a result, most programmers simply assume it will not happen.
</p></li>

<li><p>
Unsigned integer arithmetic overflow results in well-defined behavior,
but that behavior is not always desirable.
Again, pre-emptively checking for overflow is challenging and tedious.
</p></li>

<li><p>
C/C++ signed integers promote to unsigned integers,
which is the exact opposite of the relationship
between their mathematical analogs;
(unsigned) cardinal numbers are a strict subset of (signed) integral numbers.
Because of this promotion,
it is difficult to prevent signed numbers from being used
in places where they are not intended.
</p></li>

<li><p>
C/C++ integer ranges are platform-specific,
which generally either binds programs to a platform
or requires considerable care in production of portable code.
</p></li>

</ul>


<h3><a name="versus_floating">Fixed-Point versus Floating-Point</a></h3>

<p>
Fixed-point arithmetic is better than floating-point arithmetic
in several domains.
</p>

<ul>

<li><p>
Problems with a constrained range can use bits for resolution
that would otherwise have gone to an exponent.
Examples include Mandelbrot set computation and angular position.
</p></li>

<li><p>
Problems with a constrained resolution
can use a smaller representation than single-precision floating-point.
An example is a pixel color value.
</p></li>

<li><p>
Low-cost or low-power systems may not provide floating-point hardware.
For these systems, fixed-point arithmetic provides a much higher performance
alternative to software-implemented floating-point arithmetic
for many problems.
</p></li>

</ul>


<h3><a name="prior_art">Prior Art</a></h3>

<p>
The popular computing literature abounds with
articles on how to use integers to implement fixed-point arithmetic.
However, manually writing fixed-point arithmetic with integer arithmetic
is tedious and prone to error.
Direct support is desirable.
</p>
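<p>
As an illustration of the manual approach,
the following minimal sketch multiplies two values
in a 16.16 binary fixed-point format stored in a 32-bit integer.
The names <code>q16_16</code>, <code>frac_bits</code>,
<code>from_int</code> and <code>mul</code> are hypothetical
and are not taken from any of the cited libraries.
The programmer must remember to widen before multiplying
and to shift afterwards;
forgetting either step silently produces wrong values.
</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 16.16 format: 16 integer bits, 16 fraction bits.
using q16_16 = std::int32_t;
constexpr int frac_bits = 16;

// Convert a small integer to 16.16 by scaling with 2^16.
constexpr q16_16 from_int(int n) {
    return static_cast<q16_16>(n) * (1 << frac_bits);
}

// Multiply two 16.16 values. The raw product carries 32 fraction
// bits, so it is computed in 64 bits and shifted back down by 16.
// Right-shifting a negative value is implementation-defined before
// C++20; this sketch assumes the usual arithmetic shift.
constexpr q16_16 mul(q16_16 a, q16_16 b) {
    return static_cast<q16_16>(
        (static_cast<std::int64_t>(a) * b) >> frac_bits);
}

// 1.5 * 2 == 3 in 16.16 representation.
static_assert(mul(from_int(3) / 2, from_int(2)) == from_int(3),
              "1.5 * 2 should be 3");
```

<p>
Note that the sketch performs no overflow check and always truncates,
which are two of the decisions the proposed library makes explicit.
</p>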

<p>
ISO/IEC TR 18037 <a href="#E">[E]</a>
provides fixed-point support in the C programming language.
However, this support is not general;
only a few possible radix positions are supported.
The feature is essentially limited to the digital signal processing domain.
</p>

<p>
Likewise,
software implementations of fixed-point arithmetic in C and C++,
e.g. libfixmath <a href="#F">[F]</a>,
are also not general
as they support only a limited number of radix positions.
</p>

<p>
The programming languages
Ada <a href="#A">[A]</a>,
COBOL <a href="#B">[B]</a>,
CORAL 66 <a href="#C1">[C1]</a> <a href="#C2">[C2]</a>,
JOVIAL <a href="#J">[J]</a>, and
PL/I <a href="#P">[P]</a>
provide direct support for fixed-point arithmetic.
</p>


<h2><a name="proposal_summary">Proposal Outline</a></h2>

<p>
We propose extending the standard library
to provide general purpose binary fixed-point arithmetic.
We anticipate that these library components
will be used for the manipulation of program data variables,
and not program control variables.
That is, we do not view the proposal
as a replacement for index or size variables.
</p>

<p>
The design relies on generally 'safe' defaults,
but with additional explicit controls
to match particular application domains
or to enable more efficient execution.
</p>

<p>
Our proposal requires no new hardware,
and is implementable as a pure library.
Indeed, much of that implementation already exists.
However, some operations could be substantially
faster with direct hardware support.
</p>


<h3><a name="basic_types">Basic Types</a></h3>

<p>
The fixed-point library contains four class templates.
They are <code>cardinal</code> and <code>integral</code>
for integer arithmetic,
and <code>nonnegative</code> and <code>negatable</code>
for fractional arithmetic.
</p>

<p>
These types have a range specified by an integer.
The range of an unsigned number <var>n</var>
is 0 &lt;= <var>n</var> &lt; 2<sup><var>g</var></sup>
where <var>g</var> is the range parameter.
The range of a signed number <var>n</var>
is -2<sup><var>g</var></sup> &lt; <var>n</var> &lt; 2<sup><var>g</var></sup>.
Note that the range interval is half-open for unsigned numbers
and open for signed numbers.
For example,
<code>cardinal&lt;8&gt;</code> has values <var>n</var>
such that 0 &lt;= <var>n</var> &lt; 256
and
<code>integral&lt;8&gt;</code> has values <var>n</var>
such that -256 &lt; <var>n</var> &lt; 256.
</p>
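<p>
The bounds implied by a range parameter can be sketched
with a few compile-time helpers.
The function names below are hypothetical
and are not part of the proposal;
the sketch assumes 0 &lt;= <var>g</var> &lt; 63
so that every bound fits in a 64-bit integer.
</p>

```cpp
#include <cassert>
#include <cstdint>

// Exclusive upper bound 2^g, shared by unsigned and signed types.
// Assumes 0 <= g < 63.
constexpr std::int64_t upper_bound_exclusive(int g) {
    return std::int64_t{1} << g;
}

// Largest value of an unsigned type with range g: 2^g - 1.
constexpr std::int64_t cardinal_max(int g) {
    return upper_bound_exclusive(g) - 1;
}

// Smallest value of a signed type with range g: -(2^g - 1).
// The signed interval is open at both ends, hence symmetric.
constexpr std::int64_t integral_min(int g) {
    return -(upper_bound_exclusive(g) - 1);
}

static_assert(cardinal_max(8) == 255, "cardinal<8> holds 0..255");
static_assert(integral_min(8) == -255, "integral<8> holds -255..255");
```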

<p>
The fractional types have a resolution specified by an integer.
The resolution of a fractional number <var>n</var>
is 2<sup><var>s</var></sup>,
where <var>s</var> is the resolution parameter.
For example, <code>negatable&lt;8,-4&gt;</code> has values <var>n</var>
such that -256 &lt; <var>n</var> &lt; 256
in increments of 2<sup>-4</sup> = 1/16.
</p>

<p>
Both range and resolution parameters may be either positive or negative.
The number of significant bits is <var>g</var>-<var>s</var>.
This specification enables representing
both very small and very large values
with few bits.
In any event,
the range must be greater than the resolution,
that is <var>g</var>&gt;<var>s</var>.
</p>
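<p>
A small sketch may help relate the two parameters to stored values.
It assumes, for illustration only,
that a fractional value is kept as an integer count of resolution steps;
the helper names are hypothetical.
</p>

```cpp
#include <cassert>
#include <cmath>

// Number of significant bits for range g and resolution s.
constexpr int significant_bits(int g, int s) {
    return g - s;
}

// Real value represented by `raw` resolution steps of size 2^s.
inline double value_of(long raw, int s) {
    return std::ldexp(static_cast<double>(raw), s);
}

// negatable<8,-4> needs 12 significant bits.
static_assert(significant_bits(8, -4) == 12, "8 - (-4) == 12");
// Eight bits can cover very large or very small magnitudes.
static_assert(significant_bits(40, 32) == 8, "large values, few bits");
static_assert(significant_bits(-32, -40) == 8, "small values, few bits");
```

<p>
For example, 41 steps at resolution -4 represent 41/16,
so <code>value_of(41, -4)</code> yields 2.5625.
</p>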


<h3><a name="basic_operations">Basic Operations</a></h3>

<p>
The basic arithmetic operations are
addition, subtraction, multiplication and division.
When mixing operands of different template class types,
<code>cardinal</code> types will promote to the other types, and
the other types will promote to <code>negatable</code> type.
The effect is that unsigned fixed-point types
promote to signed fixed-point types.
There are notable exceptions.
Negation and subtraction on unsigned types yield a signed type.
Comparison across types is direct;
there is no conversion beforehand.
</p>

<p>
In general,
the range and resolution of the result of basic operations
are large enough to hold the mathematical results.
The following table shows the range and resolution
for the results of basic operations on fractional types.
The $ operator returns the range parameter.
The @ operator returns the resolution parameter.
</p>

<table>
<thead>
<tr><th>operation</th><th>result range</th><th>result resolution</th></tr>
</thead>
<tbody>
<tr><td>a+b</td><td>max($a,$b)+1</td><td>min(@a,@b)</td></tr>
<tr><td>a-b</td><td>max($a,$b)+1</td><td>min(@a,@b)</td></tr>
<tr><td>a*b</td><td>$a+$b</td><td>@a+@b</td></tr>
<tr><td>a/b</td><td>$a-@b</td><td>@a-$b</td></tr>
</tbody>
</table>
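<p>
The addition and multiplication rows of the table
can be expressed as compile-time functions.
This is only a sketch:
the <code>shape</code> record and the function names are hypothetical
stand-ins for the template argument computations
an implementation would perform.
</p>

```cpp
#include <cassert>

// Hypothetical record of a type's parameters.
struct shape {
    int rng;  // range parameter, written $ in the table
    int rsl;  // resolution parameter, written @ in the table
};

constexpr int max_of(int x, int y) { return x > y ? x : y; }
constexpr int min_of(int x, int y) { return x < y ? x : y; }

// a+b and a-b: one extra range bit for the carry,
// and the finer of the two resolutions.
constexpr shape sum_shape(shape a, shape b) {
    return shape{ max_of(a.rng, b.rng) + 1, min_of(a.rsl, b.rsl) };
}

// a*b: both ranges and resolutions add.
constexpr shape product_shape(shape a, shape b) {
    return shape{ a.rng + b.rng, a.rsl + b.rsl };
}

static_assert(sum_shape(shape{8, -4}, shape{4, -8}).rng == 9,
              "max(8,4)+1");
static_assert(product_shape(shape{8, -4}, shape{4, -8}).rsl == -12,
              "-4 + -8");
```

<p>
For operands shaped like <code>negatable&lt;8,-4&gt;</code>
and <code>negatable&lt;4,-8&gt;</code>,
the sum has range 9 and resolution -8,
and the product has range 12 and resolution -12.
</p>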

<p>
Overflow in template argument computation is undefined behavior.
In practice, overflow is unlikely to be a significant problem
because even small machines can represent numbers with thousands of bits
and because compilers can diagnose overflow in template arguments.
</p>

<p>
The special case in the operations above is division,
where the mathematical result may require an infinite number of bits.
The actual value must be rounded to a representable value.
The above resolution is sufficient to ensure that
if the mathematical result is not zero,
the fixed-point result is not zero.
Furthermore,
assuming values have an error of one-half ULP,
the defined resolution is close to the error bound in the computation.
</p>

<p>
When the computation is not exact,
rounding will be to one of the two nearest representable values.
The algorithm for choosing between these values is the rounding mode.
Different applications desire different modes,
so programmers may specify the rounding mode
with a value of type <code>enum class round</code>. 
The possible values are:
</p>

<dl>
<dt><code>fastest</code></dt>
<dd>Speed is more important than the choice in value.</dd>

<dt><code>negative</code></dt>
<dd>Round towards negative infinity.
This mode is useful in interval arithmetic.</dd>

<dt><code>truncated</code></dt>
<dd>Round towards zero.
This mode is useful in implementing integral arithmetic.</dd>

<dt><code>positive</code></dt>
<dd>Round towards positive infinity.
This mode is useful in interval arithmetic.</dd>

<dt><code>classic</code></dt>
<dd>Round towards the nearest value,
but exactly-half values are rounded towards maximum magnitude.
This mode is the standard school algorithm.</dd>

<dt><code>near_even</code></dt>
<dd>Round towards the nearest value,
but exactly-half values are rounded towards even values.
This mode has more balance than the classic mode.</dd>

<dt><code>near_odd</code></dt>
<dd>Round towards the nearest value,
but exactly-half values are rounded towards odd values.
This mode has as much balance as the <code>near_even</code> mode,
but preserves more information.</dd>
</dl>

<p>
In general, these modes get slower but more accurate working down the list.
</p>
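<p>
To make the modes concrete,
the following sketch reduces the resolution of a stored integer
by <var>k</var> bits, that is, it divides by 2<sup><var>k</var></sup>
and rounds according to the selected mode.
The function <code>drop_bits</code> is hypothetical,
and the enumeration is named <code>rounding</code> here
only to avoid a clash with the C library function <code>round</code>.
Treating <code>fastest</code> as rounding towards negative infinity
is an assumption of this sketch, not a requirement of the proposal.
</p>

```cpp
#include <cassert>
#include <cstdint>

enum class rounding {
    fastest, negative, truncated, positive, classic, near_even, near_odd
};

// Divide v by 2^k (0 < k < 62) and round per `mode`.
inline std::int64_t drop_bits(std::int64_t v, int k, rounding mode) {
    const std::int64_t unit = std::int64_t{1} << k;
    std::int64_t q = v / unit;          // truncated quotient
    std::int64_t r = v % unit;          // remainder, sign follows v
    if (r < 0) { r += unit; q -= 1; }   // now q == floor(v / 2^k)
    if (r == 0) return q;               // exact, no rounding needed
    const std::int64_t half = unit / 2;
    const bool tie = (r == half);
    const std::int64_t nearest = (r > half) ? q + 1 : q;
    switch (mode) {
    case rounding::fastest:             // assumption: same as negative
    case rounding::negative:  return q;
    case rounding::positive:  return q + 1;
    case rounding::truncated: return v < 0 ? q + 1 : q;
    case rounding::classic:             // ties away from zero
        return tie ? (v < 0 ? q : q + 1) : nearest;
    case rounding::near_even:           // ties to the even neighbor
        return tie ? (q % 2 == 0 ? q : q + 1) : nearest;
    case rounding::near_odd:            // ties to the odd neighbor
        return tie ? (q % 2 != 0 ? q : q + 1) : nearest;
    }
    return q;
}
```

<p>
With <var>k</var> = 2, the value 10 represents 2.5:
<code>near_even</code> gives 2,
while <code>near_odd</code> and <code>classic</code> give 3.
</p>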


<h3><a name="construction">Construction and Assignment</a></h3>

<p>
Since the range of intermediate values grows to hold all possible values,
and variables have a static range and resolution,
construction and assignment may need to reduce the range and resolution.
Reducing the resolution is done with a rounding mode
associated with the variable.
When the dynamic value exceeds the range of the variable,
the assignment overflows.
</p>

<p>
When an overflow does occur,
the desirable behavior depends on the application,
so programmers may specify the overflow mode
with a value of type <code>enum class overflow</code>. 
The possible values are:
</p>

<dl>
<dt><code>impossible</code></dt>
<dd>Programmer analysis of the program
has determined that overflow cannot occur.
Uses of this mode should be accompanied by
an argument supporting the conclusion.</dd>

<dt><code>undefined</code></dt>
<dd>Programmers are willing to accept undefined
behavior in the event of an overflow.</dd>

<dt><code>modulus</code></dt>
<dd>The assigned value is the dynamic value mod the range of the variable.
This mode makes sense only with unsigned numbers.
It is useful for angular measures.</dd>

<dt><code>saturate</code></dt>
<dd>If the dynamic value exceeds the range of the variable,
assign the nearest representable value.</dd>

<dt><code>exception</code></dt>
<dd>If the dynamic value exceeds the range of the variable,
throw an exception of type <code>std::overflow_error</code>.</dd>
</dl>

<p>
In general, these modes get slower but safer working down the list.
</p>
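<p>
The observable modes can be sketched
for assignment into an unsigned variable of range <var>g</var>.
The function <code>assign_cardinal</code> is hypothetical,
and the sketch assumes 0 &lt; <var>g</var> &lt; 63.
For <code>impossible</code> and <code>undefined</code>
the sketch simply passes the value through,
since the proposal places no requirement on the result in those cases.
</p>

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

enum class overflow { impossible, undefined, modulus, saturate, exception };

// Store v into an unsigned range 0 <= n < 2^g under the given mode.
inline std::int64_t assign_cardinal(std::int64_t v, int g, overflow mode) {
    const std::int64_t limit = std::int64_t{1} << g;
    if (v >= 0 && v < limit) return v;      // fits, nothing to do
    switch (mode) {
    case overflow::modulus: {
        const std::int64_t m = v % limit;   // may be negative
        return m < 0 ? m + limit : m;       // wrap into [0, 2^g)
    }
    case overflow::saturate:
        return v < 0 ? 0 : limit - 1;       // nearest representable
    case overflow::exception:
        throw std::overflow_error("value exceeds range");
    case overflow::impossible:
    case overflow::undefined:
        break;                              // no guarantee is made
    }
    return v;                               // placeholder for "anything"
}
```

<p>
Assigning 300 to a range of 8 yields 44 under <code>modulus</code>
and 255 under <code>saturate</code>.
</p>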


<h3><a name="literals">Literals</a></h3>

<p>
There exists no mechanism in C++11
to specify literals for the template types above.
However, we can get close with template functions
that yield the appropriate fixed-point value
based on a template <code>int</code> parameter.
For example, the expression <code>to_cardinal&lt;24&gt;()</code>
will produce a <code>cardinal</code> constant
with a range just sufficient to hold the value 24.
Likewise, the expression <code>to_nonnegative&lt;2884,-4&gt;()</code>
will produce a <code>nonnegative</code> constant
with a range and resolution just sufficient
to hold the value 2884*2<sup>-4</sup>.
</p>
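<p>
The phrase "just sufficient" can be read as
the smallest range <var>g</var> for which the value is below
2<sup><var>g</var></sup>.
The sketch below computes that range at compile time;
<code>bits_for</code> and <code>literal_range</code>
are hypothetical helpers, not proposed names.
</p>

```cpp
#include <cassert>

// Smallest g such that v < 2^g.
constexpr int bits_for(unsigned long long v) {
    return v == 0 ? 0 : 1 + bits_for(v >> 1);
}

// Range of a literal with integer mantissa v and resolution s,
// i.e. of the value v * 2^s.
constexpr int literal_range(unsigned long long v, int s) {
    return bits_for(v) + s;
}

// to_cardinal<24>(): 16 <= 24 < 32, so the range is 5.
static_assert(bits_for(24) == 5, "24 needs 5 bits");
// to_nonnegative<2884,-4>(): 2884 needs 12 bits, shifted down by 4.
static_assert(literal_range(2884, -4) == 8, "2884/16 < 256");
```

<p>
Under this reading, <code>to_nonnegative&lt;2884,-4&gt;()</code>
would have range 8 and resolution -4,
which holds 2884/16 = 180.25.
</p>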


<h2><a name="Examples">Examples</a></h2>

<h3><a name="alpha_blending">Alpha Blending</a></h3>

<p>
Consider the alpha blending of two RGBA pixels,
<code>a</code> and <code>b</code>.
To avoid redundancy, we will only show computation for the red color.
The algorithm is somewhat complicated
by the need to convert the [0-255] range of color representation
to the [0-1] range of color values.
</p>

<pre>
<code>struct pixel { cardinal&lt;8&gt; r, g, b, a; };

pixel blend( pixel a, pixel b ) {
  // Convert the [0-255] representation to [0-1] values.
  constexpr auto scale = to_nonnegative&lt;255,0&gt;();
  auto a_r = a.r / scale;
  auto b_r = b.r / scale;
  auto a_a = a.a / scale;
  auto b_a = b.a / scale;
  // Contribution of b that remains visible under a.
  auto aia = b_a * (to_cardinal&lt;1&gt;() - a_a);
  auto c_a = a_a + aia;
  auto c_r = (a_r*a_a + b_r*aia) / c_a;
  // Convert back to the [0-255] representation.
  pixel c;
  c.a = static_cast&lt;nonnegative&lt;8,0&gt;&gt;(c_a * scale);
  c.r = static_cast&lt;nonnegative&lt;8,0&gt;&gt;(c_r * scale);
  return c;
}</code>
</pre>


<h2><a name="proposal_details">Proposal Details</a></h2>

<h3><a name="type_signatures">Type Signatures</a></h3>

<p>
The class template signatures are as follows.
</p>

<pre>
<code>template&lt;
    int Crng,
    overflow Covf = overflow::exception
&gt;
class cardinal;</code>
</pre>

<pre>
<code>template&lt;
    int Crng,
    overflow Covf = overflow::exception
&gt;
class integral;</code>
</pre>

<pre>
<code>template&lt;
    int Crng,
    int Crsl,
    round Crnd = round::near_odd,
    overflow Covf = overflow::exception
&gt;
class nonnegative;</code>
</pre>

<pre>
<code>template&lt;
    int Crng,
    int Crsl,
    round Crnd = round::near_odd,
    overflow Covf = overflow::exception
&gt;
class negatable;</code>
</pre>

<h3><a name="operations">Operations</a></h3>

<table>

<thead>
<tr><th>operations</th>
<th>types</th>
<th>notes</th></tr>
</thead>

<tbody>
<tr><td>default construction</td>
<td>all</td>
<td>uninitialized</td></tr>

<tr><td>copy construction</td>
<td>all</td>
<td>identical value</td></tr>

<tr><td>value construction</td>
<td>from same or lower type</td>
<td>value subject to overflow and/or rounding</td></tr>

<tr><td><code>v.increment&lt;overflow&gt;();<br>v++; v--; ++v; --v</code></td>
<td><code>cardinal</code> and <code>integral</code></td>
<td>value subject to overflow</td></tr>

<tr><td><code>-v</code></td>
<td>all</td>
<td>result is a signed type</td></tr>

<tr><td><code>!v</code></td>
<td>all</td>
<td>test for value == 0</td></tr>

<tr><td><code>~v</code></td>
<td><code>cardinal</code></td>
<td>invert bits, but still within range</td></tr>

<tr><td><code>v.scale_up&lt;</code><var>n</var><code>&gt;()</code></td>
<td>all</td>
<td>multiply by 2<sup><var>n</var></sup>, where <var>n</var>&gt;0</td></tr>

<tr><td><code>v.scale&lt;</code><var>n</var><code>,</code><var>r</var><code>&gt;()</code></td>
<td>all</td>
<td>multiply by 2<sup><var>n</var></sup>,
apply the rounding mode when <var>n</var>&lt;0</td></tr>

<tr><td><code>v*u</code></td>
<td>all</td>
<td>multiplication</td></tr>

<tr><td><code>v.divide&lt;round&gt;(u)</code></td>
<td><code>cardinal</code> and <code>integral</code></td>
<td>integer division with the given rounding mode</td></tr>

<tr><td><code>v.divide&lt;round&gt;(u)</code></td>
<td><code>nonnegative</code> and <code>negatable</code></td>
<td>fractional division with the given rounding mode</td></tr>

<tr><td><code>v/u</code></td>
<td><code>cardinal</code> and <code>integral</code></td>
<td><code>v.divide&lt;truncated&gt;(u)</code></td></tr>

<tr><td><code>v/u</code></td>
<td><code>nonnegative</code> and <code>negatable</code></td>
<td><code>v.divide&lt;</code>class-specified
rounding mode<code>&gt;(u)</code></td></tr>

<tr><td><code>v%u</code></td>
<td><code>cardinal</code> and <code>integral</code></td>
<td>remainder</td></tr>

<tr><td><code>v+u</code></td>
<td>all</td>
<td>addition</td></tr>

<tr><td><code>v-u</code></td>
<td>all</td>
<td>subtraction, the result is always signed</td></tr>

<tr><td>comparisons</td>
<td>all</td>
<td>no value promotion/conversion, all comparisons are value-based</td></tr>

<tr><td><code>v&amp;u; v^u; v|u</code></td>
<td><code>cardinal</code></td>
<td>bitwise logical operations</td></tr>

<tr><td><code>v=u; v*=u; v/=u; v%=u; v+=u; v-=u;</code></td>
<td>all</td>
<td>(compound) assignment</td></tr>

<tr><td><code>v&amp;=u; v^=u; v|=u;</code></td>
<td><code>cardinal</code></td>
<td>bitwise logical compound assignment</td></tr>

</tbody>
</table>


<h2><a name="References">References</a></h2>

<dl>

<dt><a name="A">[A]</a></dt>
<dd>
<cite>Ada Reference Manual</cite>,
<a
href="http://www.ada-auth.org/arm.html">
http://www.ada-auth.org/arm.html</a>
</dd>

<dt><a name="B">[B]</a></dt>
<dd>
<cite>ISO IEC JTC1/SC22/WG4 - COBOL</cite>,
<a
href="http://www.cobolstandard.info/wg4/wg4.html">
http://www.cobolstandard.info/wg4/wg4.html</a>
</dd>

<dt><a name="C1">[C1]</a></dt>
<dd>
<cite>BS 5905:1980
Specification for computer programming language CORAL 66</cite>,
October 1980,
<a
href="http://shop.bsigroup.com/ProductDetail/?pid=000000000000133933">
http://shop.bsigroup.com/ProductDetail/?pid=000000000000133933</a>
</dd>

<dt><a name="C2">[C2]</a></dt>
<dd>
<cite>XGC CORAL 66
Language Reference Manual</cite>,
2001,
<a
href="http://www.xgc.com/manuals/xgc-c66-rm/book1.html">
http://www.xgc.com/manuals/xgc-c66-rm/book1.html</a>
</dd>

<dt><a name="E">[E]</a></dt>
<dd>
<cite>ISO/IEC TR 18037:2008: Programming languages -- C --
Extensions to support embedded processors</cite>,
<a href="http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=51126">
http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=51126</a>
</dd>

<dt><a name="F">[F]</a></dt>
<dd>
<cite>libfixmath: Cross Platform Fixed Point Maths Library</cite>,
<a href="http://code.google.com/p/libfixmath">
http://code.google.com/p/libfixmath</a>
</dd>

<dt><a name="J">[J]</a></dt>
<dd>
<cite>MIL-STD-1589C (USAF) JOVIAL (J73)</cite>,
6 July 1984,
<a
href="http://www.everyspec.com/MIL-STD/MIL-STD+%281500+-+1599%29/MIL-STD-1589C_14577/">
http://www.everyspec.com/MIL-STD/MIL-STD+%281500+-+1599%29/MIL-STD-1589C_14577/</a>
</dd>

<dt><a name="P">[P]</a></dt>
<dd>
<cite>PL/I</cite>,
Wikipedia,
<a
href="http://en.wikipedia.org/wiki/PL/I">
http://en.wikipedia.org/wiki/PL/I</a>
</dd>

</dl>

</body>
</html>
