<html>
<head>
<title>P0543R2: Saturation arithmetic</title>

<style type="text/css">
  ins { text-decoration:none; font-weight:bold; background-color:#A0FFA0 }
  .new { text-decoration:none; font-weight:bold; background-color:#D0FFD0 }
  del { text-decoration:line-through; background-color:#FFA0A0 }  
  strong { font-weight: inherit; color: #2020ff }
  table, td, th { border: 1px solid black; border-collapse:collapse; padding: 5px }
</style>
</head>

<body>
ISO/IEC JTC1 SC22 WG21 P0543R2<br/>
Jens Maurer &lt;Jens.Maurer@gmx.net><br/>
Target audience: LWG<br/>
2022-09-18<br/>

<h1>P0543R2: Saturation arithmetic</h1>

<h2>Introduction</h2>

Arithmetic on unsigned integers is performed modulo 2<sup>N</sup> in C
and C++ (6.8.2 [basic.fundamental] p2):

<blockquote>
The range of representable values for the unsigned type is 0 to
2<sup>N</sup> − 1 (inclusive); arithmetic for the unsigned type is
performed modulo 2<sup>N</sup>.
</blockquote>

Signed integer operations have undefined behavior when the result is
not representable (7.1 [expr.pre] p4):

<blockquote>
If during the evaluation of an expression, the result is not
mathematically defined or not in the range of representable values for
its type, the behavior is undefined.
</blockquote>

Some algorithms require saturation arithmetic, in which an operation
whose mathematical result is not representable instead returns the
smallest or largest representable value, whichever is closer.  For
example, when determining the color of a pixel, brightening a white
pixel should not suddenly turn it black or dark grey. Instead,
brightening a white pixel should simply yield a white pixel.
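<p>
To make the pixel example concrete, here is a small sketch that assumes an 8-bit color channel stored in a <code>std::uint8_t</code>. The helpers <code>brighten_wrapping</code> and <code>brighten_saturating</code> are hypothetical illustrations and are not part of this proposal. The first shows what ordinary modular arithmetic does, and the second shows the saturating behavior motivated above.

```cpp
#include <cstdint>

// Hypothetical helper: ordinary modular arithmetic on an 8-bit channel.
// The int result of channel + amount is converted back to std::uint8_t,
// which reduces it modulo 256 (e.g. 250 + 10 -> 4, a nearly black value).
constexpr std::uint8_t brighten_wrapping(std::uint8_t channel, std::uint8_t amount) noexcept {
    return static_cast<std::uint8_t>(channel + amount);
}

// Hypothetical helper: saturating version that clamps at 255 (white).
constexpr std::uint8_t brighten_saturating(std::uint8_t channel, std::uint8_t amount) noexcept {
    const unsigned sum = unsigned{channel} + amount;  // at most 510, no wrap-around
    return sum > 255u ? std::uint8_t{255} : static_cast<std::uint8_t>(sum);
}
```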

<p>

This paper proposes to add simple free functions for basic saturating
operations on all signed and unsigned integer types.  Further,
a <code>saturate_cast&lt;T></code> is provided that can convert from
any of those types to any other, saturating the value as needed.
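<p>
As a rough, non-normative sketch of the semantics of such a conversion, the hypothetical <code>saturate_cast_sketch</code> below clamps any integer value into the range of the target type. It assumes that every value involved fits into <code>std::intmax_t</code> or <code>std::uintmax_t</code>. This holds for the standard integer types but not necessarily for extended ones. The sketch also does not exclude <code>bool</code> or the character types.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical sketch: convert x to T, clamping to T's range instead of
// wrapping. Assumes all values fit into std::intmax_t / std::uintmax_t.
template<class T, class U>
constexpr T saturate_cast_sketch(U x) noexcept {
    static_assert(std::is_integral_v<T> && std::is_integral_v<U>);
    using limits = std::numeric_limits<T>;
    if constexpr (std::is_signed_v<U>) {
        if (x < 0) {
            if constexpr (std::is_unsigned_v<T>) {
                return 0;  // negative values saturate to 0 for unsigned T
            } else {
                // Both values are negative or zero here; compare as intmax_t.
                return static_cast<std::intmax_t>(x) < static_cast<std::intmax_t>(limits::min())
                           ? limits::min()
                           : static_cast<T>(x);
            }
        }
    }
    // Here x >= 0, so converting it to std::uintmax_t preserves its value.
    return static_cast<std::uintmax_t>(x) > static_cast<std::uintmax_t>(limits::max())
               ? limits::max()
               : static_cast<T>(x);
}
```

In C++20, <code>std::in_range</code> and <code>std::cmp_less</code> from <code>&lt;utility></code> could express the same comparisons more directly.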

<p>
A previous proposal, "N3864: A constexpr bitwise operations library
for C++" by Matthew Fioravante, covered only addition and
subtraction.

<p>
It is expected that the functions provided with this proposal will be,
at some later time, overloaded for <code>std::simd&lt;T></code>, the
nascent SIMD data type (see P0214R2 "Data-Parallel Vector Types &amp;
Operations" by Matthias Kretz).
</p>

<h2>Revision history</h2>

<h3>R2</h3>
<ul>
<li>As directed by LEWG: put into <code>&lt;numeric&gt;</code> header,
mark as freestanding</li>
</ul>

<h3>R1</h3>

<ul>
<li>Updated section references.</li>
<li>Addressed SG6 review comments on the reflector.</li>
<li>Added naming discussion and renamed functions to the <code>add_sat</code>
pattern.</li>
</ul>

<h2>Examples</h2>

The following examples assume <code>CHAR_BIT == 8</code>.
<pre>
  int x1 = add_sat(3, 4);               // ok, yields 7
  int x2 = sub_sat(INT_MIN, 1);         // ok, yields INT_MIN
  unsigned char x3 = add_sat(255, 4);   // compiles, but yields 3
  unsigned char x4 = add_sat&lt;unsigned char>(255, 4);   // ok, yields 255
  unsigned char x5 = add_sat(252, x3);  // error: inconsistent deductions for T
  unsigned char x6 = add_sat&lt;unsigned char>(251, x1);  // ok, yields 255; might yield an int -> unsigned char conversion warning for x1
</pre>

<h2>Design considerations</h2>

<p>
All of addition, subtraction, multiplication, and division are provided.
</p>
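<p>
For types narrower than <code>int</code>, one simple way to implement all four operations is to compute the exact result in a wider type and then clamp it. This mirrors the "infinite range, then clamp" model of the proposed wording. The following sketch assumes that <code>std::int16_t</code> and <code>std::int32_t</code> exist; all function names are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: clamp an exact std::int32_t result into the range
// of std::int16_t, picking whichever limit is closer.
constexpr std::int16_t clamp_to_int16(std::int32_t r) noexcept {
    constexpr std::int32_t lo = std::numeric_limits<std::int16_t>::min();
    constexpr std::int32_t hi = std::numeric_limits<std::int16_t>::max();
    return static_cast<std::int16_t>(r < lo ? lo : (r > hi ? hi : r));
}

// Each operation is computed exactly in std::int32_t, which can hold any
// sum, difference, product, or quotient of two std::int16_t values.
constexpr std::int16_t add_sat16(std::int16_t x, std::int16_t y) noexcept {
    return clamp_to_int16(std::int32_t{x} + y);
}
constexpr std::int16_t sub_sat16(std::int16_t x, std::int16_t y) noexcept {
    return clamp_to_int16(std::int32_t{x} - y);
}
constexpr std::int16_t mul_sat16(std::int16_t x, std::int16_t y) noexcept {
    return clamp_to_int16(std::int32_t{x} * y);
}
// Precondition (as in the proposal): y != 0. Integer division truncates
// toward zero; only INT16_MIN / -1 exceeds the range and gets clamped.
constexpr std::int16_t div_sat16(std::int16_t x, std::int16_t y) noexcept {
    return clamp_to_int16(std::int32_t{x} / y);
}
```

This approach does not extend to the widest integer types, for which no wider type may be available; there, overflow has to be detected without widening. Division is the easy case: it can only overflow when the most negative value is divided by -1.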

<p>
The operations are not defined on the integral types <code>bool</code>,
<code>char</code>, <code>wchar_t</code>, <code>char8_t</code>,
<code>char16_t</code>, and <code>char32_t</code>, as these are not
intended for arithmetic.
</p>
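<p>
The restriction to signed and unsigned integer types could be expressed, for example, as a type trait. The hypothetical <code>is_sat_integer_v</code> below sketches such a check. It excludes <code>bool</code> and the character types, including <code>char8_t</code> where the compiler supports it. Such a trait could then be used with <code>std::enable_if_t</code> or in a C++20 <em>requires-clause</em>. It is not proposed wording.

```cpp
#include <type_traits>

// Hypothetical trait mirroring the proposed constraint: true for the signed
// and unsigned integer types, false for bool and the character types.
// T is assumed to be cv-unqualified, as it is when deduced from by-value arguments.
template<class T>
inline constexpr bool is_sat_integer_v =
    std::is_integral_v<T> &&
    !std::is_same_v<T, bool> &&
    !std::is_same_v<T, char> &&
    !std::is_same_v<T, wchar_t> &&
#ifdef __cpp_char8_t
    !std::is_same_v<T, char8_t> &&
#endif
    !std::is_same_v<T, char16_t> &&
    !std::is_same_v<T, char32_t>;

// signed char and unsigned char are distinct from char and remain allowed.
static_assert(is_sat_integer_v<signed char> && is_sat_integer_v<unsigned char>);
static_assert(!is_sat_integer_v<char> && !is_sat_integer_v<bool>);
```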

<p>
Unlike the built-in arithmetic operators on integers, this proposal
expressly does not apply integral promotions to the arguments, since
that would be beside the point for saturation arithmetic.
</p>
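<p>
The following compile-time checks sketch why integral promotions would defeat the purpose. They assume <code>CHAR_BIT == 8</code>. With the built-in <code>+</code>, both <code>unsigned char</code> operands are promoted to <code>int</code>, where their sum can never come close to a limit worth saturating at.

```cpp
#include <type_traits>

// Assumes CHAR_BIT == 8.
constexpr unsigned char a = 200, b = 100;

// The built-in + promotes both unsigned char operands to int ...
static_assert(std::is_same_v<decltype(a + b), int>);
// ... so the sum is computed exactly in int, far below INT_MAX;
// saturating at int's limits would never trigger here.
static_assert(a + b == 300);
// Only the conversion back to unsigned char loses information (wraps to 44).
static_assert(static_cast<unsigned char>(a + b) == 44);
```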

<p>
The situation for template argument deduction presented by
these functions is the same as for <code>std::min</code>
or <code>std::max</code>: if two arguments of different types are
passed, the call fails to compile.
</p>

<p>
Instead of free functions, it is conceivable to provide an
integer-like class template with the arithmetic operators suitably
overloaded. This would, however, make it impossible to adopt this
proposal for C, and seems slightly over-engineered for a rather simple
facility.
</p>
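<p>
For comparison, here is a hypothetical sketch of the integer-like class template alternative, showing only addition for unsigned types. The name <code>saturating</code> is illustrative only; this design is not proposed.

```cpp
#include <limits>
#include <type_traits>

// Hypothetical integer-like wrapper illustrating the alternative design
// that this paper does not pursue; only addition on unsigned T is sketched.
template<class T>
struct saturating {
    static_assert(std::is_unsigned_v<T>, "sketch handles unsigned types only");
    T value;

    friend constexpr saturating operator+(saturating a, saturating b) noexcept {
        const T sum = static_cast<T>(a.value + b.value);  // wraps modulo 2^N
        // The addition wrapped exactly when the result is smaller than an operand.
        return {sum < a.value ? std::numeric_limits<T>::max() : sum};
    }
};
```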

<p>
The header <code>&lt;cmath></code> contains mostly floating-point
functions (<code>abs</code> being an exception), so integer-only
arithmetic functions do not seem to fit there.  The
header <code>&lt;cstdlib></code> does contain the <code>abs</code>
and <code>div</code> functions for integers, but its contents are
mostly defined by the related C header <code>&lt;stdlib.h></code>.
Earlier revisions of this paper therefore suggested creating a new
header; as directed by LEWG, this revision places the functions in
the <code>&lt;numeric&gt;</code> header instead.
</p>

<p>
Regarding customization for user-defined types, these functions are
considered in the same category as <code>std::sin</code>
or <code>std::cos</code>.  There is no intent to offer full
customization point objects at this time.
</p>

<h2>Prior art</h2>

<p>
Many SIMD instruction sets contain CPU instructions for saturation
arithmetic on SIMD vectors, including SSE2 for x86 and NEON for ARM.
</p>

<p>
A branch-free implementation for scalars is available at
<a href="https://locklessinc.com/articles/sat_arithmetic/">https://locklessinc.com/articles/sat_arithmetic/</a>.
</p>
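<p>
The article cited above describes branch-free saturating arithmetic. The following sketch works in that spirit but does not transcribe the article's code. It writes unsigned addition and subtraction with a mask derived from a comparison instead of an <code>if</code> statement. The function names are hypothetical, and whether the generated machine code is actually branch-free depends on the compiler and target.

```cpp
#include <type_traits>

// Hypothetical sketch: saturating unsigned addition without a conditional
// branch in the source.
template<class T>
constexpr T add_sat_branchless(T x, T y) noexcept {
    static_assert(std::is_unsigned_v<T>, "sketch handles unsigned types only");
    T res = static_cast<T>(x + y);  // wraps modulo 2^N
    // res < x exactly when the addition wrapped; the mask is then all ones,
    // which forces the result to the maximum value of T.
    res |= static_cast<T>(-static_cast<int>(res < x));
    return res;
}

// Hypothetical sketch: saturating unsigned subtraction, same technique.
template<class T>
constexpr T sub_sat_branchless(T x, T y) noexcept {
    static_assert(std::is_unsigned_v<T>, "sketch handles unsigned types only");
    T res = static_cast<T>(x - y);  // wraps modulo 2^N
    // res <= x exactly when no borrow occurred; otherwise the mask is zero,
    // which forces the result to 0.
    res &= static_cast<T>(-static_cast<int>(res <= x));
    return res;
}
```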

<h2>Naming</h2>

Considerations:

<ul>
<li>These are basic low-level operations.  Names should be short.</li>
<li>add / sub / mul / div are common abbreviations for the
corresponding arithmetic operations.</li>
</ul>

Options (only showing the operation "saturated addition"):

<ul>
<li>addsat, satadd: not easily readable</li>
<li>saturated_add, add_saturated: too long</li>
<li>sat_add: more readable due to underscore</li>
<li>add_sat: more readable due to underscore, precedent in OpenCL</li>
</ul>

The last option, <code>add_sat</code>, is chosen.

<h2>LEWG deliberations</h2>

On 2022-08-02, LEWG discussed P0543R1.  There was no consensus to
introduce the saturation arithmetic functions as customization point
objects.  There was no consensus to change the proposed names.  There
was consensus to put the functions into
the <code>&lt;numeric&gt;</code> header.  There was consensus to mark
the functions as freestanding.  Finally, there was consensus to send
the paper so modified to LWG.

<h2>Wording</h2>

In subclause 27.9 [numeric.ops.overview],
add to header <code>&lt;numeric&gt;</code> as indicated:

<blockquote>
  <pre>
  // 27.10.16, midpoint
  template&lt;class T>
    constexpr T midpoint(T a, T b) noexcept;
  template&lt;class T>
    constexpr T* midpoint(T* a, T* b);

<ins>  // 27.10.17, saturation arithmetic
  template&lt;class T>
    constexpr T add_sat(T x, T y) noexcept;           // freestanding
  template&lt;class T>
    constexpr T sub_sat(T x, T y) noexcept;           // freestanding
  template&lt;class T>
    constexpr T mul_sat(T x, T y) noexcept;           // freestanding
  template&lt;class T>
    constexpr T div_sat(T x, T y) noexcept;           // freestanding
  template&lt;class T, class U>
    constexpr T saturate_cast(U x) noexcept;          // freestanding</ins>
}
</pre>
</blockquote>

Append a new subsection to subclause 27.10 [numeric.ops] with the following content:

<blockquote class="new">
<h2>27.10.17 Saturation arithmetic [numeric.sat]</h2>

<h3>27.10.17.1 Arithmetic functions [numeric.sat.func]</h3>
[ Note: In the following descriptions, an arithmetic operation is
performed as a mathematical operation with infinite range and then it
is determined whether the mathematical result fits into the result
type. ]

<pre>
  template&lt;class T>
    constexpr T add_sat(T x, T y) noexcept;
</pre>
<em>Constraints:</em> T is a signed or unsigned integer type (6.8.2
[basic.fundamental]).
<p>
<em>Returns:</em> If x + y is representable as a value of type T, x
+ y, otherwise either the largest or smallest representable value of
type T, whichever is closer to the value of x + y.

<pre>
  template&lt;class T>
    constexpr T sub_sat(T x, T y) noexcept;
</pre>
<em>Constraints:</em> T is a signed or unsigned integer type (6.8.2
[basic.fundamental]).
<p>
<em>Returns:</em> If x - y is representable as a value of type T, x
- y, otherwise either the largest or smallest representable value of
type T, whichever is closer to the value of x - y.

<pre>
  template&lt;class T>
    constexpr T mul_sat(T x, T y) noexcept;
</pre>
<em>Constraints:</em> T is a signed or unsigned integer type (6.8.2
[basic.fundamental]).
<p>
<em>Returns:</em> If x * y is representable as a value of type T, x *
y, otherwise either the largest or smallest representable value of
type T, whichever is closer to the value of x * y.

<pre>
  template&lt;class T>
    constexpr T div_sat(T x, T y) noexcept;
</pre>
<em>Constraints:</em> T is a signed or unsigned integer type (6.8.2
[basic.fundamental]).
<p>
<em>Preconditions:</em> <code>y != 0</code> is <code>true</code>.
<p>
Let q be <code>x</code> / <code>y</code>, with any fractional part discarded.
<p>
<em>Returns:</em> If q is representable as a value of type T, q,
otherwise either the largest or smallest representable value of
type T, whichever is closer to the value of q.

<h3>27.10.17.2 Casting [numeric.sat.cast]</h3>

<pre>
  template&lt;class T, class U>
    constexpr T saturate_cast(U x) noexcept;
</pre>

<em>Constraints:</em> T and U are signed or unsigned integer types (6.8.2
[basic.fundamental]).
<p>
<em>Returns:</em> If x is representable as a value of type T, x,
otherwise either the largest or smallest representable value of type
T, whichever is closer to the value of x.

</blockquote>

Add a feature-test macro in [version.syn]:

<blockquote class="new">
<code>#define __cpp_lib_saturation_arithmetic 20XXXXL // also in &lt;numeric&gt;</code>
</blockquote>


<h2>References</h2>

<ul>
  <li>ISO/IEC JTC1 SC22 WG21 <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3864.html">N3864</a>: "A constexpr bitwise operations library for C++" by Matthew Fioravante</li>
  <li>ISO/IEC JTC1 SC22 WG21 <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0214r2.pdf">P0214R2</a>: "Data-Parallel Vector Types & Operations" by Matthias Kretz</li>
  <li>Wikipedia: <a href="https://en.wikipedia.org/wiki/Saturation_arithmetic">Saturation arithmetic</a></li>
  <li><a href="http://infocenter.arm.com/help/topic/com.arm.doc.dui0801g/DUI0801G_armasm_user_guide.pdf">ARM Compiler User Guide</a> [large PDF], section 13.89 "QSUB" instruction reference</li>
  <li><a href="http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf">Intel x86 Instruction Set Reference</a> [large PDF], PSUBUSB instruction</li>
</ul>


