<html>
<head>
<title>P0543R1: Saturation arithmetic</title>

<style type="text/css">
  ins { text-decoration:none; font-weight:bold; background-color:#A0FFA0 }
  .new { text-decoration:none; font-weight:bold; background-color:#D0FFD0 }
  del { text-decoration:line-through; background-color:#FFA0A0 }  
  strong { font-weight: inherit; color: #2020ff }
  table, td, th { border: 1px solid black; border-collapse:collapse; padding: 5px }
</style>
</head>

<body>
ISO/IEC JTC1 SC22 WG21 P0543R1<br/>
Jens Maurer &lt;Jens.Maurer@gmx.net><br/>
Target audience: SG6, LEWG<br/>
2022-05-02<br/>

<h1>P0543R1: Saturation arithmetic</h1>

<h2>Introduction</h2>

Arithmetic on unsigned integers is performed modulo 2<sup>N</sup> in C
and C++ (6.8.2 [basic.fundamental] p2):

<blockquote>
The range of representable values for the unsigned type is 0 to
2<sup>N</sup> − 1 (inclusive); arithmetic for the unsigned type is
performed modulo 2<sup>N</sup>.
</blockquote>

Signed integer operations have undefined behavior when the result is
not representable (7.1 [expr.pre] p4):

<blockquote>
If during the evaluation of an expression, the result is not
mathematically defined or not in the range of representable values for
its type, the behavior is undefined.
</blockquote>

Some algorithms require saturation arithmetic, in which an operation
whose mathematical result is not representable instead returns the
smallest or largest representable value, whichever is closer.  For
example, when determining the color of a
pixel, it would not make sense that brightening a white pixel suddenly
turns it black or dark-grey. Instead, brightening a white pixel should
simply yield a white pixel.

<p>

This paper proposes to add simple free functions for basic saturating
operations on all signed and unsigned integer types.  Further,
a <code>saturate_cast&lt;T></code> is provided that can convert from
any of those types to any other, saturating the value as needed.

<p>
An earlier paper, "N3864: A constexpr bitwise operations library for
C++" by Matthew Fioravante, proposed saturating operations, but only
for addition and subtraction.

<p>
It is expected that the functions provided with this proposal will be,
at some later time, overloaded for <code>std::simd&lt;T></code>, the
nascent SIMD data type (see P0214R2 "Data-Parallel Vector Types &amp;
Operations" by Matthias Kretz).
</p>

<h2>Revision history</h2>

<h3>R1</h3>

<ul>
<li>Updated section references.</li>
<li>Addressed SG6 review comments on the reflector.</li>
<li>Added naming discussion and renamed functions to the <code>add_sat</code>
pattern.</li>
</ul>

<h2>Examples</h2>

The following examples assume <code>CHAR_BIT == 8</code>.
<pre>
  int x1 = add_sat(3, 4);               // ok, yields 7
  int x2 = sub_sat(INT_MIN, 1);         // ok, yields INT_MIN
  unsigned char x3 = add_sat(255, 4);   // compiles, but yields 3
  unsigned char x4 = add_sat&lt;unsigned char>(255, 4);   // ok, yields 255
  unsigned char x5 = add_sat(252, x3);  // error: inconsistent deductions for T
  unsigned char x6 = add_sat&lt;unsigned char>(251, x1);  // ok, yields 255; might yield an int -> unsigned char conversion warning for x1
</pre>

<h2>Design considerations</h2>

<p>
All of addition, subtraction, multiplication, and division are provided.
</p>

<p>
The operations are not defined on the integral types <code>bool</code>,
<code>char</code>, <code>wchar_t</code>, <code>char16_t</code>,
and <code>char32_t</code>, as these are not intended for arithmetic.
</p>

<p>
Unlike the built-in arithmetic operators on integers, this proposal
expressly does not apply integral promotions to the arguments, since
that would be beside the point for saturation arithmetic.
</p>
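<p>
As an illustration of why promotion would defeat the purpose, the
following sketch contrasts the built-in <code>+</code> with a
hypothetical helper <code>add_sat_u8</code> (a name invented here, not
part of this proposal) that saturates in the operand type. It
assumes <code>CHAR_BIT == 8</code> and an <code>int</code> wider
than <code>unsigned char</code>.
</p>

```cpp
#include <limits>
#include <type_traits>

// Hypothetical helper for illustration only; not the proposed interface.
// Saturating addition carried out in the range of unsigned char.
constexpr unsigned char add_sat_u8(unsigned char x, unsigned char y) {
  constexpr int max = std::numeric_limits<unsigned char>::max();
  const int sum = x + y;  // operands are promoted to int; cannot overflow
  return static_cast<unsigned char>(sum > max ? max : sum);
}

constexpr unsigned char a = 255, b = 4;

// The built-in operator promotes both operands to int, so the sum is 259.
static_assert(std::is_same_v<decltype(a + b), int>);
static_assert(a + b == 259);

// Converting the promoted result back wraps modulo 2^8 ...
static_assert(static_cast<unsigned char>(a + b) == 3);

// ... whereas saturating in the operand type clamps at the maximum.
static_assert(add_sat_u8(a, b) == 255);
```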

<p>
The situation for template argument deduction presented by
these functions is the same as for <code>std::min</code>
or <code>std::max</code>: If two arguments of different type are
passed, the call fails to compile.
</p>
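<p>
The deduction behavior can be sketched as follows.
<code>same_type_only</code> is a hypothetical stand-in that only
mirrors the shape of the proposed signatures (one deduced
type <code>T</code> for both parameters); its body is a placeholder
and does not saturate.
</p>

```cpp
#include <algorithm>

// Hypothetical stand-in with the same shape as the proposed signatures:
// both parameters share a single deduced type T. The body (plain addition)
// is a placeholder and does NOT saturate.
template<class T>
constexpr T same_type_only(T x, T y) { return x + y; }

// std::max(1, 2L);         // error: T deduced as both int and long
// same_type_only(1, 2L);   // error: same deduction conflict

// Naming T explicitly turns off deduction; the arguments are then
// converted to T by the usual implicit conversions.
static_assert(std::max<long>(1, 2L) == 2L);
static_assert(same_type_only<long>(1, 2L) == 3L);
```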

<p>
Instead of free functions, it is conceivable to provide an
integer-like class template with the arithmetic operators suitably
overloaded. This would, however, make it impossible to adopt this
proposal for C, and seems slightly over-engineered for a rather simple
facility.
</p>

<p>
The header <code>&lt;cmath></code> contains mostly (except for <code>abs</code>)
floating-point functions, so integer-only arithmetic functions do not
seem to fit.  The header <code>&lt;cstdlib></code> does
contain <code>abs</code> and <code>div</code> functions for integers,
but its contents are mostly defined by the related C
header <code>&lt;stdlib.h></code>; therefore I suggest creating a new
header.
</p>

<p>
Regarding customization for user-defined types, these functions are
considered in the same category as <code>std::sin</code>
or <code>std::cos</code>.  There is no intent to offer full
customization point objects at this time.
</p>

<h2>Prior art</h2>

<p>
A lot of SIMD instruction sets contain CPU instructions for saturation
arithmetic on SIMD vectors, including SSE2 for x86 and NEON for ARM.
</p>

<p>
A branch-free implementation for scalars is available here:
https://locklessinc.com/articles/sat_arithmetic/ .
</p>
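<p>
A minimal sketch of the general idea for unsigned types follows. It is
written for this discussion and not taken from the linked article; the
function names are hypothetical. The source avoids conditional
statements by turning the wrap-around test into an all-ones or
all-zeros mask; whether the generated machine code is actually free of
branches depends on the compiler and target.
</p>

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical names; unsigned-only sketch, not the proposed interface.
template<class T>
constexpr T add_sat_branchfree(T x, T y) {
  static_assert(std::is_unsigned_v<T> && !std::is_same_v<T, bool>,
                "sketch covers unsigned integer types only");
  const T sum = static_cast<T>(x + y);  // wraps modulo 2^N
  // sum < x exactly when the addition wrapped; negating 1 gives all ones.
  const T mask = static_cast<T>(-static_cast<T>(sum < x));
  return static_cast<T>(sum | mask);    // all ones is the maximum of T
}

template<class T>
constexpr T sub_sat_branchfree(T x, T y) {
  static_assert(std::is_unsigned_v<T> && !std::is_same_v<T, bool>,
                "sketch covers unsigned integer types only");
  const T diff = static_cast<T>(x - y);  // wraps modulo 2^N
  // diff <= x exactly when no borrow occurred; the mask is all ones then.
  const T mask = static_cast<T>(-static_cast<T>(diff <= x));
  return static_cast<T>(diff & mask);    // a borrow clears the result to 0
}

static_assert(add_sat_branchfree<std::uint8_t>(3, 4) == 7);
static_assert(add_sat_branchfree<std::uint8_t>(250, 10) == 255);
static_assert(sub_sat_branchfree<std::uint8_t>(5, 3) == 2);
static_assert(sub_sat_branchfree<std::uint8_t>(3, 5) == 0);
static_assert(add_sat_branchfree(std::numeric_limits<unsigned>::max() - 1u, 5u)
              == std::numeric_limits<unsigned>::max());
```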

<h2>Naming</h2>

Considerations:

<ul>
<li>These are basic low-level operations.  Names should be short.</li>
<li>add / sub / mul / div are common abbreviations for the
corresponding arithmetic operations.</li>
</ul>

Options (only showing the operation "saturated addition"):

<ul>
<li>addsat, satadd: not easily readable</li>
<li>saturated_add, add_saturated: too long</li>
<li>sat_add: more readable due to underscore</li>
<li>add_sat: more readable due to underscore, precedent in OpenCL</li>
</ul>

The last choice is taken.

<h2>Wording</h2>

Add the header <code>&lt;saturation></code> to the table in 16.4.2.3 [headers].
<p>

Append a new subsection to clause 28 [numerics] with the following content:

<blockquote class="new">
<h2>28.9 Saturation arithmetic [numerics.sat]</h2>

<h3>28.9.1 Header &lt;saturation> synopsis [saturation.syn]</h3>
<blockquote class="new">
  <pre>
namespace std {
  template&lt;class T>
    constexpr T add_sat(T x, T y) noexcept;
  template&lt;class T>
    constexpr T sub_sat(T x, T y) noexcept;
  template&lt;class T>
    constexpr T mul_sat(T x, T y) noexcept;
  template&lt;class T>
    constexpr T div_sat(T x, T y) noexcept;
  template&lt;class T, class U>
    constexpr T saturate_cast(U x) noexcept;
}
  </pre>
</blockquote>


<h3>28.9.2 Arithmetic functions [numerics.sat.func]</h3>
[ Note: In the following descriptions, an arithmetic operation is
performed as a mathematical operation with infinite range and then it
is determined whether the mathematical result fits into the result
type. ]

<pre>
  template&lt;class T>
    constexpr T add_sat(T x, T y) noexcept;
</pre>
<em>Constraints:</em> T is a signed or unsigned integer type (6.8.2
[basic.fundamental]).
<p>
<em>Returns:</em> If x + y is representable as a value of type T, x
+ y, otherwise either the largest or smallest representable value of
type T, whichever is closer to the value of x + y.

<pre>
  template&lt;class T>
    constexpr T sub_sat(T x, T y) noexcept;
</pre>
<em>Constraints:</em> T is a signed or unsigned integer type (6.8.2
[basic.fundamental]).
<p>
<em>Returns:</em> If x - y is representable as a value of type T, x
- y, otherwise either the largest or smallest representable value of
type T, whichever is closer to the value of x - y.

<pre>
  template&lt;class T>
    constexpr T mul_sat(T x, T y) noexcept;
</pre>
<em>Constraints:</em> T is a signed or unsigned integer type (6.8.2
[basic.fundamental]).
<p>
<em>Returns:</em> If x * y is representable as a value of type T, x *
y, otherwise either the largest or smallest representable value of
type T, whichever is closer to the value of x * y.

<pre>
  template&lt;class T>
    constexpr T div_sat(T x, T y) noexcept;
</pre>
<em>Constraints:</em> T is a signed or unsigned integer type (6.8.2
[basic.fundamental]).
<p>
<em>Preconditions:</em> <code>y != 0</code> is <code>true</code>.
<p>
Let q be x / y, with any fractional part discarded.
<p>
<em>Returns:</em> If q is representable as a value of type T, q,
otherwise either the largest or smallest representable value of
type T, whichever is closer to the value of q.

<h3>28.9.3 Casting [numerics.sat.cast]</h3>

<pre>
  template&lt;class T, class U>
    constexpr T saturate_cast(U x) noexcept;
</pre>

<em>Constraints:</em> T and U are signed or unsigned integer types (6.8.2
[basic.fundamental]).
<p>
<em>Returns:</em> If x is representable as a value of type T, x,
otherwise either the largest or smallest representable value of type
T, whichever is closer to the value of x.

</blockquote>
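<p>
For illustration only, the <em>Returns</em> clauses of the four
arithmetic functions above could be realized along the following
lines. This is a non-normative sketch, not a reference implementation:
the namespace <code>sketch</code> is hypothetical, the range checks are
arranged so that no intermediate computation overflows, and the
constraint on T is approximated with <code>std::is_integral_v</code>.
</p>

```cpp
#include <limits>
#include <type_traits>

namespace sketch {  // hypothetical namespace; not the proposed std:: functions

template<class T>
constexpr T add_sat(T x, T y) noexcept {
  static_assert(std::is_integral_v<T>, "integer types only");
  constexpr T lo = std::numeric_limits<T>::min();
  constexpr T hi = std::numeric_limits<T>::max();
  if (y > 0 && x > hi - y) return hi;  // x + y would exceed the maximum
  if (y < 0 && x < lo - y) return lo;  // x + y would fall below the minimum
  return static_cast<T>(x + y);
}

template<class T>
constexpr T sub_sat(T x, T y) noexcept {
  static_assert(std::is_integral_v<T>, "integer types only");
  constexpr T lo = std::numeric_limits<T>::min();
  constexpr T hi = std::numeric_limits<T>::max();
  if (y > 0 && x < lo + y) return lo;  // x - y would fall below the minimum
  if (y < 0 && x > hi + y) return hi;  // x - y would exceed the maximum
  return static_cast<T>(x - y);
}

template<class T>
constexpr T mul_sat(T x, T y) noexcept {
  static_assert(std::is_integral_v<T>, "integer types only");
  constexpr T lo = std::numeric_limits<T>::min();
  constexpr T hi = std::numeric_limits<T>::max();
  if (x == 0 || y == 0) return 0;
  // The mathematical product is negative iff the operand signs differ.
  const bool negative = (x < 0) != (y < 0);
  bool overflow = false;
  if (x > 0) {
    overflow = y > 0 ? x > hi / y : y < lo / x;
  } else {
    overflow = y > 0 ? x < lo / y : y < hi / x;
  }
  if (overflow) return negative ? lo : hi;
  return static_cast<T>(x * y);
}

template<class T>
constexpr T div_sat(T x, T y) noexcept {  // precondition: y != 0
  static_assert(std::is_integral_v<T>, "integer types only");
  if constexpr (std::is_signed_v<T>) {
    // The only quotient that is not representable is min / -1.
    if (x == std::numeric_limits<T>::min() && y == -1)
      return std::numeric_limits<T>::max();
  }
  return static_cast<T>(x / y);
}

}  // namespace sketch

using int_lim = std::numeric_limits<int>;
static_assert(sketch::add_sat(3, 4) == 7);
static_assert(sketch::sub_sat(int_lim::min(), 1) == int_lim::min());
static_assert(sketch::add_sat<unsigned char>(255, 4)
              == std::numeric_limits<unsigned char>::max());
static_assert(sketch::mul_sat(int_lim::max(), -2) == int_lim::min());
static_assert(sketch::div_sat(int_lim::min(), -1) == int_lim::max());
```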

Add a feature-test macro in [version.syn]:

<blockquote>
<code>#define __cpp_lib_saturation_arithmetic</code>
</blockquote>
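<p>
As a non-normative illustration of the <code>saturate_cast</code>
wording in [numerics.sat.cast] above, one possible realization is
sketched below. The namespace <code>sketch</code> is hypothetical, and
the code assumes that every supported type fits
into <code>std::intmax_t</code> or <code>std::uintmax_t</code>, which
need not hold for extended integer types.
</p>

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

namespace sketch {  // hypothetical namespace; not the proposed std:: function

template<class T, class U>
constexpr T saturate_cast(U x) noexcept {
  static_assert(std::is_integral_v<T> && std::is_integral_v<U>,
                "integer types only");
  constexpr T lo = std::numeric_limits<T>::min();
  constexpr T hi = std::numeric_limits<T>::max();
  if constexpr (std::is_signed_v<U>) {
    if (x < 0) {
      if constexpr (std::is_unsigned_v<T>) {
        return lo;  // negative source, unsigned target: clamp to 0
      } else {
        // Both types are signed: compare in the widest signed type.
        if (static_cast<std::intmax_t>(x) < static_cast<std::intmax_t>(lo))
          return lo;
        return static_cast<T>(x);
      }
    }
  }
  // x is non-negative here, so comparing in the widest unsigned type is exact.
  if (static_cast<std::uintmax_t>(x) > static_cast<std::uintmax_t>(hi))
    return hi;
  return static_cast<T>(x);
}

}  // namespace sketch

static_assert(sketch::saturate_cast<unsigned char>(-5) == 0);
static_assert(sketch::saturate_cast<unsigned char>(300)
              == std::numeric_limits<unsigned char>::max());
static_assert(sketch::saturate_cast<signed char>(-1000)
              == std::numeric_limits<signed char>::min());
static_assert(sketch::saturate_cast<int>(std::numeric_limits<unsigned>::max())
              == std::numeric_limits<int>::max());
static_assert(sketch::saturate_cast<long long>(42u) == 42);
```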


<h2>References</h2>

<ul>
  <li>ISO/IEC JTC1 SC22 WG21 <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3864.html">N3864</a>: "A constexpr bitwise operations library for C++" by Matthew Fioravante</li>
  <li>ISO/IEC JTC1 SC22 WG21 <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0214r2.pdf">P0214R2</a>: "Data-Parallel Vector Types &amp; Operations" by Matthias Kretz</li>
  <li>Wikipedia: <a href="https://en.wikipedia.org/wiki/Saturation_arithmetic">Saturation arithmetic</a></li>
  <li><a href="http://infocenter.arm.com/help/topic/com.arm.doc.dui0801g/DUI0801G_armasm_user_guide.pdf">ARM Compiler User Guide</a> [large PDF], section 13.89 "QSUB" instruction reference</li>
  <li><a href="http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf">Intel x86 Instruction Set Reference</a> [large PDF], PSUBUSB instruction</li>
</ul>


</body>
</html>
