<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=US-ASCII">

<style type="text/css">

body { color: #000000; background-color: #FFFFFF; }
del { text-decoration: line-through; color: #8B0040; }
ins { text-decoration: underline; color: #005100; }

p.example { margin-left: 2em; }
pre.example { margin-left: 2em; }
div.example { margin-left: 2em; }

code.extract { background-color: #F5F6A2; }
pre.extract { margin-left: 2em; background-color: #F5F6A2;
  border: 1px solid #E1E28E; }

p.function { }
.attribute { margin-left: 2em; }
.attribute dt { float: left; font-style: italic;
  padding-right: 1ex; }
.attribute dd { margin-left: 0em; }

blockquote.std { color: #000000; background-color: #F1F1F1;
  border: 1px solid #D1D1D1;
  padding-left: 0.5em; padding-right: 0.5em; }
blockquote.stddel { text-decoration: line-through;
  color: #000000; background-color: #FFEBFF;
  border: 1px solid #ECD7EC;
  padding-left: 0.5em; padding-right: 0.5em; }

blockquote.stdins { text-decoration: underline;
  color: #000000; background-color: #C8FFC8;
  border: 1px solid #B3EBB3; padding: 0.5em; }

table { border: 1px solid black; border-spacing: 0px;
  margin-left: auto; margin-right: auto; }
th { text-align: left; vertical-align: top;
  padding-left: 0.8em; border: none; }
td { text-align: left; vertical-align: top;
  padding-left: 0.8em; border: none; }

</style>

<title>Clarifying Memory Allocation</title>
</head>
<body>
<h1>Clarifying Memory Allocation</h1>

<p>
ISO/IEC JTC1 SC22 WG21 N3537 - 2013-03-12
</p>

<address>
Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org
<br>
Chandler Carruth, chandlerc@google.com
</address>

<p>
<a href="#Introduction">Introduction</a><br>
<a href="#Problem">Problem</a><br>
<a href="#Solution">Solution</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Memory">Memory</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Races">Data Races</a><br>
<a href="#Wording">Wording</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#basic.stc.dynamic">3.7.4 Dynamic storage duration [basic.stc.dynamic]</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#basic.stc.dynamic.allocation">3.7.4.1 Allocation functions [basic.stc.dynamic.allocation]</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#replacement.functions">5.3.4 New [expr.new]</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#replacement.functions">17.6.4.6 Replacement functions [replacement.functions]</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#new.delete.dataraces">18.6.1.4 Data races [new.delete.dataraces]</a><br>
<a href="#Revision">Revision History</a><br>
<a href="#References">References</a><br>
</p>


<h2><a name="Introduction">Introduction</a></h2>

<p>
The allocation and deallocation of memory
has become a significant expense in modern systems.
The optimization of that process is important to good performance.
However, it is important to distinguish between
micro-optimization of the calls
and macro-optimization of the allocation strategy.
In particular, good system performance
may well require adapting the allocation strategy
to the dynamic behavior of the application,
or even to hints provided by the application.
</p>


<h2><a name="Problem">Problem</a></h2>

<p>
A strict reading of the current C and C++ standards
may lead one to conclude that 
the allocation strategy shall not consider any information
not derivable from the sequence of allocation and deallocation calls.
In essence, the standards may exclude macro-optimization of allocation.
</p>

<p>
On the other hand,
a strict reading of the standards
may lead one to conclude that
the implementation must make an implementation function call
for each and every abstract call.
This reading may exclude micro-optimization of allocation.
</p>


<h2><a name="Solution">Solution</a></h2>

<p>
We propose to replace existing mechanistic wording
with wording more precisely focused on essential requirements.
The intent is to enable behavior
that some existing memory allocators already have.
For example, see TCMalloc <a href="#TCM">[TCM]</a>.
</p>


<h3><a name="Memory">Memory</a></h3>

<p>
An essential requirement on implementations
is that they deliver usable memory,
not that they have a particular sequence of calls.
We propose to explicitly decouple the implementation calls
from the abstract calls.
</p>

<ol>
<li><p>
The number of implementation calls
is not part of the observable behavior of the program.
</p></li>
<li><p>
The parameters and return values of implementation calls
are not part of the observable behavior of the program,
except that the sum of the size parameters
of live implementation allocation calls
shall not exceed the sum of the size parameters,
adjusted for alignment,
of live abstract allocation calls.
(That is, implementations may "round up" size parameters,
as they already do.)
</p></li>
</ol>
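
<p>
As an illustration only, the first point can be sketched with a replacement
global <code>operator new</code> that counts the calls reaching it.
The counter <code>g_impl_calls</code> and the helpers <code>sum_two</code>
and <code>implementation_calls</code> are hypothetical names
introduced for this sketch.
Under the proposal, a program could rely only on the count
being no greater than the number of abstract calls.
</p>

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter of implementation allocation calls (illustration only).
static std::atomic<unsigned long> g_impl_calls{0};

// Replacement global allocation function: counts every call that reaches it.
void* operator new(std::size_t size) {
    g_impl_calls.fetch_add(1, std::memory_order_relaxed);
    if (void* p = std::malloc(size ? size : 1)) return p;
    throw std::bad_alloc();
}

// Matching replacement deallocation functions.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Two abstract allocation calls whose lifetimes overlap.
int sum_two() {
    int* a = new int(1);
    int* b = new int(2);
    int s = *a + *b;
    assert(s == 3);  // the values are observable; the call count is not
    delete a;
    delete b;
    return s;
}

// Number of implementation calls seen so far.
unsigned long implementation_calls() {
    return g_impl_calls.load(std::memory_order_relaxed);
}
```

<p>
Calling <code>sum_two()</code> performs two abstract allocation calls;
the difference in <code>implementation_calls()</code> before and after
may be 2, 1, or 0, depending on what the implementation avoids or fuses.
</p>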

<p>
Together these changes enable implementations
to reduce the number of malloc calls by avoiding them or fusing them.
However, an implementation may fuse
mallocs into a larger malloc
only if it can prove that the fused mallocs have overlapping lifetimes
(each ended by a corresponding call to free),
so that the peak allocated memory of the program remains unchanged.
</p>
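
<p>
The lifetime condition can be sketched with a small model.
The type <code>Alloc</code> and the functions below are hypothetical;
the model ignores alignment and assumes that a fused block
lives from the earlier allocation to the later deallocation.
</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical model of one abstract allocation: its size and the
// half-open time interval [begin, end) during which it is live.
struct Alloc {
    std::size_t size;
    int begin;
    int end;
};

// True if both allocations are live at some common point in time.
bool lifetimes_overlap(const Alloc& x, const Alloc& y) {
    assert(x.begin <= x.end && y.begin <= y.end);
    return x.begin < y.end && y.begin < x.end;
}

// Peak live bytes when each allocation is its own implementation call.
std::size_t peak_separate(const Alloc& x, const Alloc& y) {
    return lifetimes_overlap(x, y) ? x.size + y.size
                                   : std::max(x.size, y.size);
}

// Peak live bytes when both are served from one fused block.
std::size_t peak_fused(const Alloc& x, const Alloc& y) {
    return x.size + y.size;
}

// Fusion is acceptable in this model only if it does not raise the peak.
bool fusion_preserves_peak(const Alloc& x, const Alloc& y) {
    return peak_fused(x, y) <= peak_separate(x, y);
}
```

<p>
In this model, allocations of 16 and 32 bytes live over [0, 10) and [5, 15)
may be fused, while the same sizes live over [0, 5) and [5, 10) may not,
because fusing them would raise the peak from 32 to 48 bytes.
</p>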

<p>
Because C++ class-specific memory allocators
are often tuned to specific class sizes,
we do not apply this relaxation to those allocators.
</p>


<h3><a name="Races">Data Races</a></h3>

<p>
An essential requirement on implementations
is that they be data-race free,
yet the standards do not say so directly.
We propose to replace the current wording with direct wording,
thus explicitly enabling an implementation to consider information
beyond the strict sequence of allocation and deallocation calls.
</p>


<h2><a name="Wording">Wording</a></h2>

<p>
The wording in this section is relative to WG21
<a href="http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2012/n3337.pdf">
N3337</a>.
</p>


<h3><a name="basic.stc.dynamic">3.7.4 Dynamic storage duration [basic.stc.dynamic]</a></h3>

<p>
Paragraph 2 is unchanged.
</p>

<blockquote class="std">
<p>
The library provides default definitions
for the global allocation and deallocation functions.
Some global allocation and deallocation functions are replaceable (18.6.1).
A C++ program shall provide
at most one definition of a replaceable allocation or deallocation function.
Any such function definition
replaces the default version provided in the library (17.6.4.6).
The following allocation and deallocation functions (18.6)
are implicitly declared in global scope in each translation unit of a program.
</p>
<blockquote><pre>
<code>void* operator new(std::size_t);
void* operator new[](std::size_t);
void operator delete(void*);
void operator delete[](void*);</code>
</pre></blockquote>
<p>
These implicit declarations introduce only the function names
<code>operator new</code>, <code>operator new[]</code>,
<code>operator delete</code>, and <code>operator delete[]</code>.
[<em>Note:</em>
The implicit declarations do not introduce the names
<code>std</code>, <code>std::size_t</code>, or any other names
that the library uses to declare these names.
Thus, a <var>new-expression</var>, <var>delete-expression</var> or
function call that refers to one of these functions
without including the header <code>&lt;new&gt;</code> is well-formed.
However, referring to <code>std</code> or <code>std::size_t</code>
is ill-formed unless the name has been declared
by including the appropriate header.
&mdash;<i>end note</i>]
Allocation and/or deallocation functions
can also be declared and defined for any class (12.5).
</p>
</blockquote>

<p>
Add a new paragraph between the existing paragraphs 2 and 3.
</p>

<blockquote class="stdins">
<p>
A <dfn>live allocation call</dfn>
is one in which the call has occurred
but the corresponding deallocation call has not occurred.
An <dfn>abstract allocation call</dfn>
is a call executed by the abstract machine (1.9 [intro.execution]).
An <dfn>implementation allocation call</dfn>
is a call executed by the physical machine.
When allocation calls are <dfn>relaxed</dfn>,
the number of live implementation allocation calls
may be less than the number of live abstract allocation calls.
That is, two or more relaxed abstract allocation calls
may be merged into a single implementation allocation,
but only provided that
the sum of the size parameters of the live implementation calls
shall not exceed the sum of the size parameters of live abstract calls,
where each parameter
is rounded up to a multiple of the corresponding alignment.
Corresponding deallocation calls are likewise merged.
Non-class non-array allocation calls are relaxed with respect to each other.
Non-class array allocation calls are relaxed with respect to each other.
Other allocation calls are not relaxed.
[<i>Note:</i>
Placement <code>new</code> expressions (5.3.4 [expr.new])
and pseudo destructor call expressions (5.2.4 [expr.pseudo])
do not call allocators
and are therefore unaffected.
&mdash;<i>end note</i>]
</p>
</blockquote>
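
<p>
The size bound in the paragraph above can be checked arithmetically.
The following is a minimal sketch, not part of the proposed wording;
<code>Request</code>, <code>round_up</code>, <code>abstract_bound</code>,
and <code>merge_within_bound</code> are hypothetical names.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one live abstract allocation call.
struct Request {
    std::size_t size;
    std::size_t alignment;
};

// Round size up to the next multiple of alignment.
std::size_t round_up(std::size_t size, std::size_t alignment) {
    assert(alignment > 0);
    return (size + alignment - 1) / alignment * alignment;
}

// Sum of the abstract size parameters, each rounded up to its alignment.
std::size_t abstract_bound(const std::vector<Request>& live_abstract) {
    std::size_t total = 0;
    for (const Request& r : live_abstract)
        total += round_up(r.size, r.alignment);
    return total;
}

// True if the live implementation calls stay within the bound.
bool merge_within_bound(const std::vector<std::size_t>& impl_sizes,
                        const std::vector<Request>& live_abstract) {
    std::size_t impl_total = 0;
    for (std::size_t s : impl_sizes) impl_total += s;
    return impl_total <= abstract_bound(live_abstract);
}
```

<p>
For example, abstract requests of 5 and 12 bytes, each with alignment 8,
give a bound of 8 + 16 = 24 bytes,
so a single merged implementation call of 24 bytes is within the bound
and one of 32 bytes is not.
</p>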

<p>
Paragraph 3 is unchanged.
</p>

<blockquote class="std">
<p>
Any allocation and/or deallocation functions defined in a C++ program,
including the default versions in the library,
shall conform to the semantics specified in 3.7.4.1 and 3.7.4.2.
</p>
</blockquote>


<h3><a name="basic.stc.dynamic.allocation">3.7.4.1 Allocation functions [basic.stc.dynamic.allocation]</a></h3>

<p>
Paragraph 2 is unchanged.
</p>

<blockquote class="std">
<p>
The allocation function attempts to allocate the requested amount of storage.
If it is successful,
it shall return the address of the start of a block of storage
whose length in bytes shall be at least as large as the requested size.
There are no constraints on the contents of the allocated storage
on return from the allocation function.
The order, contiguity, and initial value of storage allocated
by successive calls to an allocation function are unspecified.
The pointer returned shall be suitably aligned
so that it can be converted to a pointer of any complete object type
with a fundamental alignment requirement (3.11)
and then used to access the object or array in the storage allocated
(until the storage is explicitly deallocated
by a call to a corresponding deallocation function).
Even if the size of the space requested is zero, the request can fail.
If the request succeeds,
the value returned shall be a non-null pointer value (4.10) <code>p0</code>
different from any previously returned value <code>p1</code>,
unless that value <code>p1</code>
was subsequently passed to an operator <code>delete</code>.
The effect of dereferencing a pointer returned as a request for zero size
is undefined.
[<i>Footnote:</i>
The intent is to have operator <code>new()</code>
implementable by calling
<code>std::malloc()</code> or <code>std::calloc()</code>,
so the rules are substantially the same.
C++ differs from C in requiring a zero request to return a non-null pointer.
&mdash;<i>end footnote</i>]
</p>
</blockquote>
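
<p>
The zero-size rule in this paragraph can be exercised directly.
The sketch below calls the global allocation function explicitly;
the function name <code>zero_size_requests_are_distinct</code>
is hypothetical.
</p>

```cpp
#include <cassert>
#include <new>

// Makes two zero-size requests that are live at the same time and
// reports whether the returned pointers differ, as the paragraph requires.
bool zero_size_requests_are_distinct() {
    void* p0 = ::operator new(0);  // throws std::bad_alloc on failure
    void* p1 = ::operator new(0);
    assert(p0 != nullptr && p1 != nullptr);  // success means non-null
    bool distinct = (p0 != p1);
    // Neither pointer is dereferenced: that would be undefined behavior.
    ::operator delete(p1);
    ::operator delete(p0);
    return distinct;
}
```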


<h3><a name="replacement.functions">5.3.4 New [expr.new]</a></h3>

<p>
Paragraph 8 is unchanged,
but note that the number of implementation allocation calls
may be less than the number of abstract calls specified below,
by virtue of the new paragraph proposed for 3.7.4.
</p>

<blockquote class="std">
<p>
A <var>new-expression</var> obtains storage for the object
by calling an <var>allocation function</var> (3.7.4.1).
If the <var>new-expression</var> terminates by throwing an exception,
it may release storage by calling a deallocation function (3.7.4.2).
If the allocated type is a non-array type,
the allocation function's name is <code>operator new</code> and
the deallocation function's name is <code>operator delete</code>.
If the allocated type is an array type,
the allocation function's name is <code>operator new[]</code> and
the deallocation function's name is <code>operator delete[]</code>.
[<i>Note:</i>
an implementation shall provide default definitions
for the global allocation functions (3.7.4, 18.6.1.1, 18.6.1.2).
A C++ program can provide
alternative definitions of these functions (17.6.4.6)
and/or class-specific versions (12.5).
&mdash;<i>end note</i>]
</p>
</blockquote>

<p>
Paragraph 10 is unchanged,
but note that the number of implementation allocation calls
may be less than the number of abstract calls specified below,
by virtue of the new paragraph proposed for 3.7.4.
</p>

<blockquote class="std">
<p>
A <var>new-expression</var>
passes the amount of space requested
to the allocation function
as the first argument of type <code>std::size_t</code>.
That argument shall be no less than
the size of the object being created;
it may be greater than
the size of the object being created
only if the object is an array.
For arrays of <code>char</code> and <code>unsigned char</code>,
the difference between the result of the <var>new-expression</var>
and the address returned by the allocation function
shall be an integral multiple
of the strictest fundamental alignment requirement (3.11)
of any object type whose size is no greater than
the size of the array being created.
[<i>Note:</i>
Because allocation functions
are assumed to return pointers to storage
that is appropriately aligned for objects
of any type with fundamental alignment,
this constraint on array allocation overhead
permits the common idiom of allocating character arrays
into which objects of other types will later be placed.
&mdash;<i>end note</i>]
</p>
</blockquote>
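
<p>
The size argument described in this paragraph can be observed
with a class-specific <code>operator new[]</code>,
which the proposal leaves unrelaxed.
The class <code>Tracked</code> and the function
<code>array_request_size</code> are hypothetical,
and whether any overhead is actually added depends on the implementation.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical class used only to observe the size passed to operator new[].
struct Tracked {
    static std::size_t last_request;  // size seen by the latest operator new[]
    int value;
    ~Tracked() {}  // non-trivial destructor; some ABIs then store a count

    static void* operator new[](std::size_t size) {
        last_request = size;
        if (void* p = std::malloc(size ? size : 1)) return p;
        throw std::bad_alloc();
    }
    static void operator delete[](void* p) noexcept { std::free(p); }
};

std::size_t Tracked::last_request = 0;

// Returns the size requested for an array of n Tracked objects.
std::size_t array_request_size(std::size_t n) {
    Tracked* a = new Tracked[n];
    std::size_t requested = Tracked::last_request;
    assert(requested >= n * sizeof(Tracked));  // never less than the array
    delete[] a;
    return requested;
}
```

<p>
The returned value is at least <code>n * sizeof(Tracked)</code>;
any excess is the array allocation overhead that this paragraph permits.
</p>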


<h3><a name="replacement.functions">17.6.4.6 Replacement functions [replacement.functions]</a></h3>

<p>
Paragraph 2 is unchanged.
</p>

<blockquote class="std">
<p>
A C++ program may provide the definition for
any of eight dynamic memory allocation function signatures
declared in header <code>&lt;new&gt;</code> (3.7.4, 18.6):
</p>
<ul>
<li><code>operator new(std::size_t)</code></li>
<li><code>operator new(std::size_t, const std::nothrow_t&amp;)</code></li>
<li><code>operator new[](std::size_t)</code></li>
<li><code>operator new[](std::size_t, const std::nothrow_t&amp;)</code></li>
<li><code>operator delete(void*)</code></li>
<li><code>operator delete(void*, const std::nothrow_t&amp;)</code></li>
<li><code>operator delete[](void*)</code></li>
<li><code>operator delete[](void*, const std::nothrow_t&amp;)</code></li>
</ul>
</blockquote>

<p>
Paragraph 3 is unchanged.
</p>

<blockquote class="std">
<p>
The program's definitions are used
instead of the default versions supplied by the implementation (18.6).
Such replacement occurs prior to program startup (3.2, 3.6).
The program's definitions shall not be specified as inline.
No diagnostic is required.
</p>
</blockquote>


<h3><a name="new.delete.dataraces">18.6.1.4 Data races [new.delete.dataraces]</a></h3>

<p>
Edit paragraph 1 as follows.
</p>

<blockquote class="std">
<p>
For purposes of determining the existence of data races,
the library versions of operator <code>new</code>,
user replacement versions of global operator <code>new</code>,
and the C standard library functions <code>calloc</code> and <code>malloc</code>
<del>
shall behave as though they accessed and modified
only the storage referenced by the return value.
</del>
<ins>
shall conform to the provisions of 17.6.5.9 [res.on.data.races].
</ins>
The library versions of operator <code>delete</code>,
user replacement versions of operator <code>delete</code>,
and the C standard library function <code>free</code>
<del>
shall behave as though they accessed and modified
only the storage referenced by their first argument.
</del>
<ins>
shall conform to the provisions of 17.6.5.9 [res.on.data.races].
</ins>
The C standard library function <code>realloc</code>
<del>
shall behave as though it accessed and modified
only the storage referenced by its first argument and by its return value.
</del>
<ins>
shall conform to the provisions of 17.6.5.9 [res.on.data.races].
</ins>
Calls to these functions
that allocate or deallocate a particular unit of storage
<ins><var>p</var></ins>
shall occur in a single total order,
and each such deallocation call shall happen before
<ins>(1.10 [intro.multithread])</ins>
the next allocation (if any) in this order.
<ins>
Programs shall ensure that
a call that allocates a region <var>p</var> of memory
happens before all accesses to <var>p</var>
and that
all modifications of <var>p</var>
happen before
the call that deallocates <var>p</var>.
</ins>
</p>
</blockquote>
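
<p>
The requirement placed on programs can be sketched as follows.
This is one possible way to establish the needed happens-before edges,
using thread creation and <code>join</code>;
the function name <code>allocate_modify_deallocate</code> is hypothetical.
</p>

```cpp
#include <cassert>
#include <thread>

// Allocates, lets another thread modify the storage, then deallocates.
int allocate_modify_deallocate() {
    // The allocation is sequenced before the thread is created, and
    // thread creation synchronizes with the start of the thread function,
    // so the allocation happens before the access in the worker.
    int* p = new int(0);
    std::thread worker([p] { *p = 42; });

    // Completion of the worker synchronizes with the return from join(),
    // so the modification happens before the deallocation below.
    assert(worker.joinable());
    worker.join();

    int result = *p;
    delete p;
    return result;
}
```

<p>
Deleting <code>p</code> without first joining the worker
would leave the modification unordered with respect to the deallocation,
which the added wording makes the program's responsibility to avoid.
</p>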


<h2><a name="Revision">Revision History</a></h2>

<p>
This paper revises N3433 - 2012-09-23 as follows.
</p>

<ul>

<li><p>
Clarify that class-specific allocation operators are unaffected.
</p></li>

<li><p>
Clarify that placement new is unaffected.
</p></li>

<li><p>
Clarify that array and non-array allocations are not merged.
</p></li>

<li><p>
Clarify the happens-before constraints
on programs' allocations and deallocations.
</p></li>

<li><p>
Change terminology from "nominal" calls to "abstract" calls,
in analogy with the abstract machine.
Likewise, change the terminology from "external" calls to "implementation" calls.
</p></li>

<li><p>
Remove wording for the C standard.
The C committee has decided to make no changes.
</p></li>

<li><p>
Add a 'Revision History' section.
</p></li>

</ul>


<h2><a name="References">References</a></h2>

<dl>

<dt><a name="TCM">[TCM]</a></dt>
<dd>
<cite>TCMalloc : Thread-Caching Malloc</cite>,
<a href="http://goog-perftools.sourceforge.net/doc/tcmalloc.html">
http://goog-perftools.sourceforge.net/doc/tcmalloc.html</a>.
</dd>

</dl>


</body>
</html>
