<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
  <meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
  <title>Introduction of std::colony to the standard library</title>
  <style type="text/css">
      body {
         font-size: 12pt;
         font-weight: normal;
         font-style: normal;
         color: black;
         background-color: white;
         line-height: 1.2em;
         margin-left: 4em;
         margin-right: 2em;
      }
      /* paragraphs */

      p {
         padding: 0;
         line-height: 1.3em;
         margin-top: 2.5em;
         margin-bottom: 1em;
         text-align: left;
      }
      /* tables */

      table {
         margin-top: 3.8em;
         margin-bottom: 2em;
         text-align: left;
      }
      /* headings */

      h1 {
         font-size: 195%;
         font-weight: bold;
         font-style: normal;
         font-variant: small-caps;
         line-height: 1.6em;
         text-align: left;
         padding: 0;
         margin-top: 3.5em;
         margin-bottom: 1.7em;
      }
      h2 {
         font-size: 122%;
         font-weight: bold;
         font-style: normal;
         text-decoration: underline;
         padding: 0;
         margin-top: 4.5em;
         margin-bottom: 1.1em;
      }
      h3 {
         font-size: 110%;
         font-weight: bold;
         font-style: normal;
         text-decoration: underline;
         padding: 0;
         margin-top: 4em;
         margin-bottom: 1.1em;
      }
      h4 {
         font-size: 100%;
         font-weight: bold;
         font-style: normal;
         padding: 0;
         margin-top: 4em;
         margin-bottom: 1.1em;
      }
      h5 {
         font-size: 90%;
         font-weight: bold;
         font-style: italic;
         padding: 0;
         margin-top: 3em;
         margin-bottom: 1em;
      }
      h6 {
         font-size: 80%;
         font-weight: bold;
         font-style: normal;
         padding: 0;
         margin-top: 1em;
         margin-bottom: 1em;
      }
      /* divisions */

      div {
         padding: 0;
         margin-top: 0em;
         margin-bottom: 0em;
      }
      ul {
         margin: 0pt 0pt 22pt 15.7pt;
         padding: 0pt 0pt 0pt 0pt;
         list-style-type: square;
         font-size: 98%;
      }
      ol {
         margin: 12pt 0pt 8pt 15.7pt;
         padding: 0pt 0pt 0pt 0pt;
         font-size: 98%;
      }
      li {
         margin: 0pt 0pt 10.5pt 0pt;
         padding: 0pt 0pt 0pt 0pt;
         text-indent: 0pt;
         font-size: 98%;
         display: list-item;
      }
      /* inline */

      strong {
         font-weight: bold;
      }
      sup,
      sub {
         vertical-align: baseline;
         position: relative;
         top: -0.4em;
         font-size: 70%;
      }
      sub {
         top: 0.4em;
      }
      em {
         font-style: italic;
      }
code {
    font-family: Courier New, Courier, monospace;
    font-size: 90%;
    padding: 0 0 0 0em;
   }
      ins {
         background-color: yellow;
         text-decoration: underline;
      }
      del {
         text-decoration: line-through;
      }
      a:hover {
         color: #4398E1;
      }
      a:active {
         color: #4598E1;
         text-decoration: none;
      }
      a:link.review {
         color: #AAAAAF;
      }
      a:hover.review {
         color: #4398E1;
      }
      a:visited.review {
         color: #444444;
      }
      a:active.review {
         color: #AAAAAF;
         text-decoration: none;
      }
  </style>
</head>

<body>
Audience: LEWG, SG14, WG21<br>
Document number: D0447R8<br>
Date: 2019-09-05<br>
Project: Introduction of std::colony to the standard library <br>
Reply-to: Matthew Bentley &lt;mattreecebentley@gmail.com&gt; <br>


<h1>Introduction of std::colony to the standard library</h1>

<h2>Table of Contents</h2>
<ol type="I">
  <li><a href="#introduction">Introduction</a></li>
  <li><a href="#motivation">Motivation and Scope</a></li>
  <li><a href="#impact">Impact On the Standard</a></li>
  <li><a href="#design">Design Decisions</a></li>
  <li><a href="#technical">Technical Specifications</a></li>
  <li><a href="#acknowledgements">Acknowledgements</a></li>
  <li>Appendixes:
    <ol type="A">
      <li><a href="#functions">Member functions list</a></li>
      <li><a href="#benchmarks">Reference implementation benchmarks</a></li>
      <li><a href="#faq">Frequently Asked Questions</a></li>
      <li><a href="#responses">Specific responses to previous committee
        feedback</a></li>
      <li><a href="#sg14gameengine">Typical game engine requirements</a></li>
      <li><a href="#questions">Questions for reviewers</a></li>
      <li><a href="#revisions">Paper revision history</a></li>
    </ol>
  </li>
</ol>

<h2><a id="introduction"></a>I. Introduction</h2>

<p>The purpose of a container in the standard library cannot be to provide the optimal solution for every scenario. Inevitably, in fields such as high-performance trading or gaming, the optimal solution within critical loops will be a custom-made one that fits that scenario perfectly. However, outside of those most critical of hot paths, there is a wide range of applications for more generalised solutions.</p>

<p>Colony is a formalisation, extension and optimization of what is typically
known as a 'bucket array' container in game programming circles; similar
structures exist in various incarnations across the high-performance computing,
high-performance trading, physics simulation, robotics, server/client
application and particle simulation fields (see: <a
href="https://groups.google.com/a/isocpp.org/forum/#!topic/sg14/1iWHyVnsLBQ">https://groups.google.com/a/isocpp.org/forum/#!topic/sg14/1iWHyVnsLBQ</a>).</p>

<p>The concept of a bucket array is: you have multiple memory blocks of
elements, and a boolean token for each element which denotes whether or not
that element is 'active' or 'erased'. If it is 'erased', it is skipped over
during iteration. When all elements in a block are erased, the block is
removed, so that iteration does not lose performance by having to skip empty
blocks. If an insertion occurs when all the blocks are full, a new memory block
is allocated.</p>

<p>The advantages of this structure are as follows: because a skipfield is
used, no reallocation of elements is necessary upon erasure. Because the structure uses multiple memory blocks,
insertions to a full container also do not trigger reallocations. This means that element memory
locations stay stable and pointers/references stay valid
regardless of erasure/insertion. This is highly desirable, for
example, in game programming because there are usually multiple elements in
different containers which need to reference each other during gameplay and elements are being inserted or erased in real time.</p>

<p>Problematic aspects of a typical bucket array are that they tend to have a
fixed memory block size, do not re-use memory locations from erased elements,
and utilize a boolean skipfield. The fixed block size (as opposed to block
sizes with a growth factor) and lack of erased-element re-use lead to far more
allocations/deallocations than are necessary. Given that allocation is typically
a costly operation on most operating systems, this becomes important in
performance-critical environments. The boolean skipfield makes iteration time
complexity undefined, as there is no way of knowing ahead of time how many
erased elements occur between any two non-erased elements. It also requires
branching code, which may cause issues on processors with deep pipelines and
poor performance on branch-prediction failure.</p>

<p>A colony uses a non-boolean, largely non-branching method for skipping
<i>runs</i> of erased elements, which allows for O(1) amortized iteration time
complexity and more-predictable iteration performance than a bucket array. It
also utilizes a growth factor for memory blocks and reuses erased element
locations upon insertion, which leads to fewer allocations/reallocations.
Because it reuses erased element memory space, the exact location of insertion
is undefined, unless no erasures have occurred or an equal number of erasures
and insertions have occurred (in which case the insertion location is the back
of the container). The container is therefore considered unordered but
sortable. Lastly, because there is no way of predicting in advance where
erasures ('skips') may occur during iteration, an O(1) time complexity []
operator is impossible and the container is bidirectional, but not
random-access.</p>

<img src="https://plflib.org/vector_addition.gif"
alt="Visual demonstration of inserting to a full vector" height="540"
width="960"> <img src="https://plflib.org/colony_addition.gif"
alt="Visual demonstration of inserting to a full colony" height="540"
width="960"> <img src="https://plflib.org/vector_erasure.gif"
alt="Visual demonstration of randomly erasing from a vector" height="540"
width="960"> <img src="https://plflib.org/colony_erasure.gif"
alt="Visual demonstration of randomly erasing from a colony" height="540"
width="960"> 

<p>There are two patterns for accessing stored elements in a colony: the first
is to iterate over the container and process each element (or skip some
elements using the advance/prev/next/++/-- functions). The second is to store
the iterator returned by the insert() function (or a pointer derived from the
iterator) in some other structure and access the inserted element in that
way.</p>

<h2><a id="motivation"></a>II. Motivation and Scope</h2>

<p><i>Note: Throughout this document I will use the term 'link' to denote any
form of referencing between elements whether it be via
iterators/pointers/indexes/references/IDs/etcetera.</i></p>

<p>There are situations where data is heavily interlinked, iterated over
frequently, and changing often. An example is the typical video game engine.
Most games will have a central generic 'entity' or 'actor' class, regardless of
their overall schema (an entity class does not imply an <a
href="https://en.wikipedia.org/wiki/Entity-component-system">ECS</a>).
Entity/actor objects tend to be 'has a'-style objects rather than 'is a'-style
objects, which link to, rather than contain, shared resources like sprites,
sounds and so on. Those shared resources are usually located in separate
containers/arrays so that they can be re-used by multiple entities. Entities are
in turn referenced by other structures within a game engine, such as
quadtrees/octrees, level structures, and so on.</p>

<p>Entities may be erased at any time (for example, a wall gets destroyed and
no longer is required to be processed by the game's engine, so is erased) and
new entities are inserted (for example, a new enemy is spawned). While this is
all happening, the links between entities, resources and superstructures such as
levels and quadtrees must stay valid in order for the game to run. The order
of the entities and resources themselves within the containers is, in the
context of a game, typically unimportant, so an unordered container is okay.</p>

<p>Unfortunately the container with the best iteration performance in the
standard library, vector<sup><a href="#benchmarks">[1]</a></sup>, loses pointer
validity to elements within it upon insertion, and pointer/index validity upon
erasure. This tends to lead to sophisticated and often restrictive workarounds
when developers attempt to utilize vector or similar containers under the above
circumstances.</p>

<p>std::list and the like are not suitable due to their poor locality, which
leads to poor cache performance during iteration. This is however an ideal
situation for a container such as colony, which has a high degree of locality.
Even though that locality can be punctuated by gaps from erased elements, it
still works out better in terms of iteration performance<sup><a
href="#benchmarks">[1]</a></sup> than every existing standard library container
other than deque/vector, regardless of the ratio of erased to non-erased
elements.</p>

<p>Some more specific requirements for containers in the context of game
development are listed in the <a href="#sg14gameengine">appendix</a>.</p>

<p>As another example, particle simulation (weather, physics etcetera) often
involves large clusters of particles which interact with external objects and
each other. The particles each have individual properties (spin, momentum,
direction etc.) and are being created and destroyed continuously. Therefore the
order of the particles is unimportant; what is important is the speed of
erasure and insertion. No current standard library container has both strong
insertion and non-back erasure speed, so again this is a good match for
colony.</p>

<p><a
href="https://groups.google.com/a/isocpp.org/forum/#!topic/sg14/1iWHyVnsLBQ">Reports
from other fields</a> suggest that, because most developers aren't aware of
containers such as this, they often end up using solutions which are sub-par
for iteration such as std::map and std::list in order to preserve pointer
validity, when most of their processing work is actually iteration-based. So,
introducing this container would both create a convenient solution to these
situations and increase awareness of better-performing approaches in
general. It will also ease communication across fields, as opposed to the
current scenario where each field uses a similar container but each has a
different name for it.</p>

<h2><a id="impact"></a>III. Impact On the Standard</h2>

<p>This is a pure library addition; no changes to the standard are necessary
aside from the introduction of the colony container. <br>
A reference implementation of colony is available for download and use <a
href="https://plflib.org/colony.htm#download">here</a>. </p>

<h2><a id="design"></a>IV. Design Decisions</h2>

<p>The three core aspects of a colony from an abstract perspective are: </p>
<ol>
  <li>A collection of element memory blocks + metadata, to prevent reallocation
    during insertion (as opposed to a single memory block)</li>
  <li>A non-boolean skipfield, to enable O(1) skipping of erased elements
    during iteration (as opposed to reallocating subsequent elements during
    erasure)</li>
  <li>An erased-element location recording mechanism, to enable the re-using of
    memory from erased elements during subsequent insertions</li>
</ol>

<p>Each memory block houses multiple elements. The metadata about each block
may or may not be allocated with the blocks themselves (could be contained in a
separate structure). This metadata might include, for example, the number of
erased elements within each block and the block's capacity - which would allow
the container to know when the block is empty. A non-boolean skipfield is
required in order to skip over erased elements during iteration while
maintaining O(1) amortized iteration time complexity. Finally, a mechanism for
keeping track of elements which have been erased must be present, so that those
memory locations can be reused upon subsequent element insertions.</p>

<p>The following aspects of a colony must be implementation-defined in order to
allow for variance in implementations:</p>
<ul>
  <li>the skipfield structure</li>
  <li>skipfield modification time complexity</li>
  <li>erasure-recording mechanism</li>
  <li>element memory block metadata</li>
  <li>iterator structure</li>
  <li>memory block growth factor</li>
  <li>time complexity of advance()/next()/prev()</li>
</ul>

<p>But their implementation <em>is</em> significantly constrained by the
requirements of the container (lack of reallocation and stable pointers to
non-erased elements regardless of erasures/insertions, etcetera).</p>

<p>In terms of the <a href="https://plflib.org/colony.htm">reference
implementation</a>, the specific structure and mechanisms have changed many
times over the course of development; however, the interface to the container
and its time complexity guarantees have remained largely unchanged (with the
exception of the time complexity for updating skipfield nodes). So it is
reasonably likely that, regardless of specific implementation, it is possible to
maintain this general specification without precluding future improvements in
implementation, so long as time complexity guarantees for updating skipfields
are left implementation-defined.</p>

<p>Below I will explain the reference implementation's approach in terms of the
three aspects described above, along with some alternatives for
implementation.</p>

<h4>1. Collection of element memory blocks + metadata</h4>

<p>In the reference implementation this is essentially a doubly-linked list of
'group' structs containing (a) memory blocks, (b) memory block metadata and (c)
skipfields. The memory blocks and skipfields have a growth factor of 2 from one
group to the next. The metadata includes information necessary for an iterator
to iterate over colony elements, such as the last insertion point within the
memory block, and other information useful to specific functions, such as the
total number of non-erased elements in the group. This approach keeps the
operation of freeing empty memory blocks from the colony container at O(1) time
complexity. Further information is available <a
href="https://plflib.org/chained_group_allocation_pattern.htm">here</a>.</p>

<p>An alternative implementation could be to use a vector of pointers to
dynamically-allocated memory blocks + skipfields in a single struct, with a
separate vector of memory block metadata structs. Such an approach would have
some advantages in terms of increasing the locality for metadata during
iteration, but would create reallocation costs when memory blocks + their
skipfields and metadata were removed upon becoming empty.</p>

<p>A vector of memory blocks, as opposed to a vector of pointers to memory
blocks, would not work as it would (a) disallow a growth factor in the memory
blocks and (b) invalidate pointers to elements in subsequent blocks when a
memory block became empty of elements and was therefore removed from the
vector. In short it would negate all of a colony's beneficial aspects.</p>

<h4>2. Non-boolean skipfield</h4>

<p>The reference implementation currently uses a skipfield pattern temporarily
named the 'bentley' skipfield pattern (paper in progress, name will probably
change). This effectively encodes the run-length of sequences of contiguous
erased elements into a skipfield, which allows for O(1) time complexity during
iteration. Since there is no branching involved in iterating over the skipfield
aside from end-of-block checks, it is less problematic than a boolean skipfield
(which has to branch for every skipfield read) in terms of CPUs which don't
handle branching or branch-prediction failure efficiently.</p>

<p>This pattern stores and modifies the run-lengths during insertion and
erasure, with O(1) time complexity. It has a lot of similarities to the <a
href="http://em.rdcu.be/wf/click?upn=KP7O1RED-2BlD0F9LDqGVeSPyQHezub7M4gGFa4NTPPTU-3D_ih77hK-2FwXUNPXOClzbShNQsKzXRuUomlRdQ1DjaMsrpnUBwwtbFTp5VEo6fdTXEOm5aVQpvVZ28aaMucmOmTG7j6bqKdutLSZ5s-2FvVOpi0U-2BRUm-2BokLgfiyljrkOnlzzohhddyytFQ6xbOHnaSP-2BiryryBzyk0-2FcqHJHqWla0UjauVoYm2aWi5no-2F91Tum6XKjVqwSLtk3SZQBA-2BuYbfglhC7NGb-2F0qoV47pMMnoV-2Fo-3D">advanced
jump-counting skipfield pattern</a>, which was the pattern previously used by
the reference implementation.</p>

<p>Using an advanced jump-counting skipfield is an alternative, though the
skipfield update time complexity guarantees for that pattern are effectively
undefined, or between O(1) and O(skipfield length) for each insertion/erasure.
In practice those updates result in one memcpy operation which resolves to a
single block-copy operation, but it is still a little slower than the 'bentley'
skipfield. The skipfield type you use will also typically have an effect on the
type of memory-reuse mechanism you can utilize.</p>

<p>A boolean skipfield is not usable because it makes iteration time complexity
undefined - it could for example result in thousands of branching statements +
skipfield reads for a single ++ operation in the case of many consecutive
erased elements. In the high-performance fields for which this container was
initially designed, this brings with it unacceptable latency.</p>

<h4>3. Erased-element location recording mechanism</h4>

<p>The reference implementation currently uses two things to keep track of
erased element locations:</p>
<ol type="a">
  <li>Metadata for each memory block includes a 'next block with erasures'
    pointer. The container itself contains a 'blocks with erasures' intrusive
    list-head pointer. These are used by the container to create an intrusive
    singly-linked list of memory blocks with erased elements which can be
    re-used for future insertions.</li>
  <li>Metadata for each memory block also includes a 'free list head' index
    number, which gives the index within the memory block, of the last erased
    element. The memory space of this element is reinterpret_cast'd as two
    index numbers, the first ("previous" index) giving the index of the
    previously erased element, the second ("next" index) giving the next index
    in the sequence (in this case a unique number because it's the head of the
    free list), and so on - this forms a free list of erased element memory
    locations which may be re-used.</li>
</ol>

<p>Previous versions of the reference implementation used a singly-linked free
list instead of a doubly-linked one. This is possible with the advanced
jump-counting skipfield, but not with the 'bentley' pattern, for various
reasons.</p>

<p>One cannot use a stack of pointers to erased elements for this mechanism, as
early versions of the reference implementation did, because this can create
allocations during erasure, which changes the exception guarantees of erase.
One could instead scan all skipfields until an erased location is found, though
this would be slow.</p>

<h3>Implementation of iterator class</h3>

<p>The reference implementation's iterator stores a pointer to the current
'group' struct mentioned above, plus a pointer to the current element and a
pointer to its corresponding skipfield node. An alternative approach is to
store the group pointer + an index, since the index can indicate both the
offset from the memory block for the element, as well as the offset from the
start of the skipfield for the skipfield node. However multiple implementations
and benchmarks across many processors have shown this to be worse-performing
than the separate pointer-based approach, despite the increased memory cost for
the iterator class itself.</p>

<p>The ++ operation is as follows, utilizing the reference implementation's
'bentley' skipfield pattern:</p>
<ol>
  <li>Add 1 to the existing element and skipfield pointers.</li>
  <li>Dereference skipfield pointer to get content of skipfield node, add
    content of skipfield node to both the skipfield pointer and the element
    pointer. If the node is erased, its value will be a positive integer
    indicating the number of nodes until the next non-erased node, if not
    erased it will be zero.</li>
  <li>If element pointer is beyond end of element memory block, change group
    pointer to next group, element pointer to the start of the next group's
    element memory block, skipfield pointer to the start of the next group's
    skipfield. Then go back to 2.</li>
</ol>

<p>The -- operation is the same, except that steps 1 and 2 both involve
subtraction rather than addition, and step 3 checks whether the element pointer
is before the beginning of the element memory block and, if so, relocates to
the previous group rather than the next group.</p>

<h3>Results of implementation</h3>

<p>In practical application the reference implementation is generally faster
for insertion and (non-back) erasure than current standard library containers,
and generally faster for iteration than any container except vector and deque.
See benchmarks <a href="#benchmarks">here</a>.</p>

<h2><a id="technical"></a>V. Technical Specifications</h2>

<h4>Time complexities for basic operations</h4>
<ul>
  <li>insert (single): O(1) amortized</li>
  <li>insert (multiple): O(n)</li>
  <li>erase (single): O(1) amortized</li>
  <li>erase (multiple): O(n) for non-trivially-destructible elements; for
    trivially-destructible elements, O(log n) or better (between O(1) and O(log
    n) depending on start and end erasure locations and state of colony).</li>
  <li>std::find: O(n)</li>
  <li>splice: O(1) amortized</li>
  <li>all iterator operations: O(1) amortized</li>
  <li>begin()/end(): O(1)</li>
  <li>advance()/next()/prev(): Implementation defined, as this depends on the
    metadata stored for each memory block. In terms of current reference
    implementation, between O(1) and O(n), depending on current location, end
    location and state of colony.</li>
</ul>


<h4>General specification</h4>

<p>Colony meets the requirements of the C++ <a
href="https://en.cppreference.com/w/cpp/named_req/Container">Container</a>, <a
href="https://en.cppreference.com/w/cpp/named_req/AllocatorAwareContainer">AllocatorAwareContainer</a>,
and <a
href="https://en.cppreference.com/w/cpp/named_req/ReversibleContainer">ReversibleContainer</a>
concepts.</p>

<p>For the most part the syntax and semantics of colony functions are very
similar to those of existing C++ standard library containers. Formal description
is as follows:</p>
<code>template &lt;class T, class Allocator = std::allocator&lt;T&gt;, typename
Skipfield_Type = unsigned short&gt; class colony</code>

<p><code><b>T</b></code> - the element type. In general T must meet the
requirements of <a
href="https://en.cppreference.com/w/cpp/named_req/Erasable">Erasable</a>, <a
href="https://en.cppreference.com/w/cpp/named_req/CopyAssignable">CopyAssignable</a>
and <a
href="https://en.cppreference.com/w/cpp/named_req/CopyConstructible">CopyConstructible</a>.<br>
However, if emplace is utilized to insert elements into the colony, and no
functions which involve copying or moving are utilized, T is only required to
meet the requirements of <a
href="https://en.cppreference.com/w/cpp/named_req/Erasable">Erasable</a>.<br>
If move-insert is utilized instead of emplace, T must also meet the
requirements of <a
href="https://en.cppreference.com/w/cpp/named_req/MoveConstructible">MoveConstructible</a>.<br>
<br>
<code><b>Allocator</b></code> - an allocator that is used to acquire memory to
store the elements. The type must meet the requirements of <a
href="https://en.cppreference.com/w/cpp/named_req/Allocator">Allocator</a>. The
behavior is undefined if <code>Allocator::value_type</code> is not the same as
T.<br>
<br>
<code><b>Skipfield_Type</b></code> - an unsigned integer type. This type is
used to form the skipfield which skips over erased T elements. In terms of the
reference implementation, this also acts as a limiting factor to the maximum
size of memory blocks, due to the way that the skipfield pattern works
(e.g. <code>unsigned short</code> is 16-bit on most platforms, which constrains
the size of individual memory blocks to a maximum of 65535 elements).
<code>unsigned short</code> has been found to be the optimal type for the
current reference implementation. However in the case of small collections (i.e.
&lt; 1000 elements) in a memory-constrained environment, it may be useful to
reduce the memory usage of the skipfield by reducing the skipfield bit depth to
an 8-bit unsigned type. The reduced skipfield size may also reduce cache
saturation in this case without impacting iteration speed, due to the small
number of elements.
However whether or not this constitutes a performance advantage is <a
href="https://plflib.org/blog.htm#shortandchardifferences">largely
situational</a>, so it is best to leave control in the end user's hands.</p>

<h4>Basic example of usage (using <a
href="https://plflib.org/colony.htm">reference implementation</a>)</h4>

<div
style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;">
<pre style="margin: 0; line-height: 125%"><code><span style="color: #557799">#include &lt;iostream&gt;</span>
<span style="color: #557799">#include "plf_colony.h"</span>

<span style="color: #333399; font-weight: bold">int</span> <span style="color: #0066BB; font-weight: bold">main</span>(<span style="color: #333399; font-weight: bold">int</span> argc, <span style="color: #333399; font-weight: bold">char</span> <span style="color: #333333">**</span>argv)
{
  plf<span style="color: #333333">::</span>colony<span style="color: #333333">&lt;</span><span style="color: #333399; font-weight: bold">int</span><span style="color: #333333">&gt;</span> i_colony;

  <span style="color: #888888">// Insert 100 ints:</span>
  <span style="color: #008800; font-weight: bold">for</span> (<span style="color: #333399; font-weight: bold">int</span> i <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">0</span>; i <span style="color: #333333">!=</span> <span style="color: #0000DD; font-weight: bold">100</span>; <span style="color: #333333">++</span>i)
  {
    i_colony.insert(i);
  }

  <span style="color: #888888">// Erase half of them:</span>
  <span style="color: #008800; font-weight: bold">for</span> (plf<span style="color: #333333">::</span>colony<span style="color: #333333">&lt;</span><span style="color: #333399; font-weight: bold">int</span><span style="color: #333333">&gt;::</span>iterator it <span style="color: #333333">=</span> i_colony.begin(); it <span style="color: #333333">!=</span> i_colony.end(); <span style="color: #333333">++</span>it)
  {
    it <span style="color: #333333">=</span> i_colony.erase(it);
  }

  <span style="color: #888888">// Total the remaining ints:</span>
  <span style="color: #333399; font-weight: bold">int</span> total <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">0</span>;

  <span style="color: #008800; font-weight: bold">for</span> (plf<span style="color: #333333">::</span>colony<span style="color: #333333">&lt;</span><span style="color: #333399; font-weight: bold">int</span><span style="color: #333333">&gt;::</span>iterator it <span style="color: #333333">=</span> i_colony.begin(); it <span style="color: #333333">!=</span> i_colony.end(); <span style="color: #333333">++</span>it)
  {
    total <span style="color: #333333">+=</span> <span style="color: #333333">*</span>it;
  }

  std<span style="color: #333333">::</span>cout <span style="color: #333333">&lt;&lt;</span> <span style="background-color: #fff0f0">"Total: "</span> <span style="color: #333333">&lt;&lt;</span> total <span style="color: #333333">&lt;&lt;</span> std<span style="color: #333333">::</span>endl;
  std<span style="color: #333333">::</span>cin.get();
  <span style="color: #008800; font-weight: bold">return</span> <span style="color: #0000DD; font-weight: bold">0</span>;
} </code></pre>
</div>

<h4>Example demonstrating pointer stability</h4>

<div
style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;">
<pre style="margin: 0; line-height: 125%"><code><span style="color: #557799">#include &lt;iostream&gt;</span>
<span style="color: #557799">#include "plf_colony.h"</span>

<span style="color: #333399; font-weight: bold">int</span> <span style="color: #0066BB; font-weight: bold">main</span>(<span style="color: #333399; font-weight: bold">int</span> argc, <span style="color: #333399; font-weight: bold">char</span> <span style="color: #333333">**</span>argv)
{
  plf<span style="color: #333333">::</span>colony<span style="color: #333333">&lt;</span><span style="color: #333399; font-weight: bold">int</span><span style="color: #333333">&gt;</span> i_colony;
  plf<span style="color: #333333">::</span>colony<span style="color: #333333">&lt;</span><span style="color: #333399; font-weight: bold">int</span><span style="color: #333333">&gt;::</span>iterator it;
  plf<span style="color: #333333">::</span>colony<span style="color: #333333">&lt;</span><span style="color: #333399; font-weight: bold">int</span> <span style="color: #333333">*&gt;</span> p_colony;
  plf<span style="color: #333333">::</span>colony<span style="color: #333333">&lt;</span><span style="color: #333399; font-weight: bold">int</span> <span style="color: #333333">*&gt;::</span>iterator p_it;

  <span style="color: #888888">// Insert 100 ints to i_colony and pointers to those ints to p_colony:</span>
  <span style="color: #008800; font-weight: bold">for</span> (<span style="color: #333399; font-weight: bold">int</span> i <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">0</span>; i <span style="color: #333333">!=</span> <span style="color: #0000DD; font-weight: bold">100</span>; <span style="color: #333333">++</span>i)
  {
    it <span style="color: #333333">=</span> i_colony.insert(i);
    p_colony.insert(<span style="color: #333333">&amp;</span>(<span style="color: #333333">*</span>it));
  }

  <span style="color: #888888">// Erase half of the ints:</span>
  <span style="color: #008800; font-weight: bold">for</span> (it <span style="color: #333333">=</span> i_colony.begin(); it <span style="color: #333333">!=</span> i_colony.end(); <span style="color: #333333">++</span>it)
  {
    it <span style="color: #333333">=</span> i_colony.erase(it);
  }

  <span style="color: #888888">// Erase half of the int pointers:</span>
  <span style="color: #008800; font-weight: bold">for</span> (p_it <span style="color: #333333">=</span> p_colony.begin(); p_it <span style="color: #333333">!=</span> p_colony.end(); <span style="color: #333333">++</span>p_it)
  {
    p_it <span style="color: #333333">=</span> p_colony.erase(p_it);
  }

  <span style="color: #888888">// Total the remaining ints via the pointer colony (pointers will still be valid even after insertions and erasures):</span>
  <span style="color: #333399; font-weight: bold">int</span> total <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">0</span>;

  <span style="color: #008800; font-weight: bold">for</span> (p_it <span style="color: #333333">=</span> p_colony.begin(); p_it <span style="color: #333333">!=</span> p_colony.end(); <span style="color: #333333">++</span>p_it)
  {
    total <span style="color: #333333">+=</span> <span style="color: #333333">*</span>(<span style="color: #333333">*</span>p_it);
  }

  std<span style="color: #333333">::</span>cout <span style="color: #333333">&lt;&lt;</span> <span style="background-color: #fff0f0">"Total: "</span> <span style="color: #333333">&lt;&lt;</span> total <span style="color: #333333">&lt;&lt;</span> std<span style="color: #333333">::</span>endl;

  <span style="color: #008800; font-weight: bold">if</span> (total <span style="color: #333333">==</span> <span style="color: #0000DD; font-weight: bold">2500</span>)
  {
    std<span style="color: #333333">::</span>cout <span style="color: #333333">&lt;&lt;</span> <span style="background-color: #fff0f0">"Pointers still valid!"</span> <span style="color: #333333">&lt;&lt;</span> std<span style="color: #333333">::</span>endl;
  }

  std<span style="color: #333333">::</span>cin.get();
  <span style="color: #008800; font-weight: bold">return</span> <span style="color: #0000DD; font-weight: bold">0</span>;
} </code></pre>
</div>

<h4>Iterator Invalidation</h4>

<table border="1">
  <tbody>
    <tr>
      <td><b>Operation</b></td>
      <td><b>Iterators invalidated</b></td>
    </tr>
    <tr>
      <td>All read-only operations, swap, std::swap, free_unused_memory</td>
      <td>Never</td>
    </tr>
    <tr>
      <td>clear, sort, reinitialize, operator =</td>
      <td>Always</td>
    </tr>
    <tr>
      <td>change_block_sizes, change_minimum_block_size,
        change_maximum_block_size</td>
      <td>Only if supplied minimum block size is larger than smallest block in
        colony, or supplied maximum block size is smaller than largest block in
        colony.</td>
    </tr>
    <tr>
      <td>erase</td>
      <td>Only for the erased element. If an iterator is == end() it may be
        invalidated if the last element in the colony is erased, in some cases
        (similar to std::deque). If a reverse_iterator is == rend() it may be
        invalidated if the first element in the colony is erased, in some
      cases.</td>
    </tr>
    <tr>
      <td>insert, emplace</td>
      <td>If an iterator is == end() it may be invalidated by a subsequent
        insert/emplace, in some cases.</td>
    </tr>
  </tbody>
</table>

<h4>Member types</h4>

<table border="1">
  <tbody>
    <tr>
      <td><b>Member type</b></td>
      <td><b>Definition</b></td>
    </tr>
    <tr>
      <td><code>value_type</code></td>
      <td><code>T</code></td>
    </tr>
    <tr>
      <td><code>allocator_type</code></td>
      <td><code>Allocator</code></td>
    </tr>
    <tr>
      <td><code>skipfield_type</code> </td>
      <td><code>T_skipfield_type</code> </td>
    </tr>
    <tr>
      <td><code>size_type</code></td>
      <td><code>std::allocator_traits&lt;Allocator&gt;::size_type</code></td>
    </tr>
    <tr>
      <td><code>difference_type</code></td>
      <td><code>std::allocator_traits&lt;Allocator&gt;::difference_type</code></td>
    </tr>
    <tr>
      <td><code>reference</code></td>
      <td><code>value_type &amp;</code></td>
    </tr>
    <tr>
      <td><code>const_reference</code></td>
      <td><code>const value_type &amp;</code></td>
    </tr>
    <tr>
      <td><code>pointer</code></td>
      <td><code>std::allocator_traits&lt;Allocator&gt;::pointer</code></td>
    </tr>
    <tr>
      <td><code>const_pointer</code></td>
      <td><code>std::allocator_traits&lt;Allocator&gt;::const_pointer</code></td>
    </tr>
    <tr>
      <td><code>iterator</code></td>
      <td><code>BidirectionalIterator</code></td>
    </tr>
    <tr>
      <td><code>const_iterator</code></td>
      <td><code>Constant BidirectionalIterator</code></td>
    </tr>
    <tr>
      <td><code>reverse_iterator</code></td>
      <td><code>BidirectionalIterator</code></td>
    </tr>
    <tr>
      <td><code>const_reverse_iterator</code></td>
      <td><code>Constant BidirectionalIterator</code></td>
    </tr>
  </tbody>
</table>

<h3>Constructors</h3>

<table border="1">
  <tbody>
    <tr>
      <td>standard</td>
      <td><code>colony()<br>
        <br>
        explicit colony(const allocator_type &amp;alloc)</code></td>
    </tr>
    <tr>
      <td>fill</td>
      <td><code>colony(size_type n, Skipfield_type min_block_size = 8,
        Skipfield_type max_block_size =
        std::numeric_limits&lt;Skipfield_type&gt;::max(), const allocator_type
        &amp;alloc = allocator_type())<br>
        <br>
        explicit colony(size_type n, const value_type &amp;element, Skipfield_type
        min_block_size = 8, Skipfield_type max_block_size =
        std::numeric_limits&lt;Skipfield_type&gt;::max(), const allocator_type
        &amp;alloc = allocator_type()) </code> </td>
    </tr>
    <tr>
      <td>range</td>
      <td><code>template&lt;typename InputIterator&gt; colony(const
        InputIterator &amp;first, const InputIterator &amp;last, Skipfield_type
        min_block_size = 8, Skipfield_type max_block_size =
        std::numeric_limits&lt;Skipfield_type&gt;::max(), const allocator_type
        &amp;alloc = allocator_type())<br>
        </code> </td>
    </tr>
    <tr>
      <td>copy</td>
      <td><code>colony(const colony &amp;source)<br>
        <br>
        colony(const colony &amp;source, const allocator_type &amp;alloc) </code> </td>
    </tr>
    <tr>
      <td>move</td>
      <td><code>colony(colony &amp;&amp;source) noexcept<br>
        <br>
        colony(colony &amp;&amp;source, const allocator_type &amp;alloc)</code><br>

        <p><i>Note: postcondition state of source colony is the same as that of
        an empty colony.</i></p>
      </td>
    </tr>
    <tr>
      <td>initializer list</td>
      <td><code>colony(const std::initializer_list&lt;value_type&gt;
        &amp;element_list, Skipfield_type min_block_size = 8, Skipfield_type
        max_block_size = std::numeric_limits&lt;Skipfield_type&gt;::max(),
        const allocator_type &amp;alloc = allocator_type()) </code> </td>
    </tr>
  </tbody>
</table>

<h5>Some constructor usage examples</h5>
<ul>
  <li><code>colony&lt;T&gt; a_colony</code> 
    <p>Default constructor - default minimum block size is 8, default maximum
    block size is std::numeric_limits&lt;Skipfield_type&gt;::max() (typically
    65535). You cannot set the block sizes from the constructor in this
    scenario, but you can call the change_block_sizes() member function after
    construction has occurred. <br>
    Example: <code style="color: brown">std::colony&lt;int&gt;
    int_colony;</code> </p>
  </li>
  <li><code>colony&lt;T, the_allocator&lt;T&gt; &gt; a_colony(const
    allocator_type &amp;alloc = allocator_type())</code> 
    <p>Default constructor, but using a custom memory allocator, e.g. something
    other than std::allocator. <br>
    Example: <code style="color: brown">std::colony&lt;int,
    tbb::allocator&lt;int&gt; &gt; int_colony;</code> <br>
    Example 2: <br>
    <code style="color: brown">// Using an instance of an allocator as well as
    its type<br>
    tbb::allocator&lt;int&gt; alloc_instance;<br>
    std::colony&lt;int, tbb::allocator&lt;int&gt; &gt;
    int_colony(alloc_instance);</code> </p>
  </li>
  <li><code>colony&lt;T&gt; a_colony(size_type n, Skipfield_type min_block_size
    = 8, Skipfield_type max_block_size =
    std::numeric_limits&lt;Skipfield_type&gt;::max())</code>
    <p>Fill constructor with value_type unspecified, so the value_type's
    default constructor is used. <code>n</code> specifies the number of
    elements to create upon construction. If <code>n</code> is larger than
    <code>min_block_size</code>, the size of the blocks created will be either
    <code>n</code> or <code>max_block_size</code>, whichever is
    smaller. <code>min_block_size</code> (i.e. the smallest possible number of
    elements which can be stored in a colony block) can be defined, as can the
    <code>max_block_size</code>. Setting the block sizes can be a performance
    advantage if you know in advance roughly how many objects are likely to be
    stored in your colony long-term - or at least the rough scale of storage.
    In that case, using this can stop many small initial blocks being
    allocated. <br>
    Example: <code style="color: brown">std::colony&lt;int&gt;
    int_colony(62);</code> </p>
  </li>
  <li><code>colony&lt;T&gt; a_colony(const std::initializer_list&lt;value_type&gt;
    &amp;element_list,<br>
    Skipfield_type min_block_size = 8, Skipfield_type max_block_size =
    std::numeric_limits&lt;Skipfield_type&gt;::max())</code> 
    <p>Using an initializer list to insert into the colony upon construction.
    <br>
    Example: <code style="color: brown">std::initializer_list&lt;int&gt;
    el = {3, 5, 2, 1000};<br>
    std::colony&lt;int&gt; int_colony(el, 64, 512);</code> </p>
  </li>
  <li><code>colony&lt;T&gt; a_colony(const colony &amp;source)</code>
    <p>Copies all contents from the source colony, removing any empty (erased) element
    locations in the process. The size of the blocks created is either the total size
    of the source colony or the maximum block size of the source colony,
    whichever is smaller. <br>
    Example: <code style="color: brown">std::colony&lt;int&gt;
    int_colony_2(int_colony_1);</code> </p>
  </li>
  <li><code>colony&lt;T&gt; a_colony(colony &amp;&amp;source)</code>
    <p>Moves all contents from the source colony without removing any erased element
    locations or altering any of the source block sizes. The source colony is then
    empty and can be safely destructed or otherwise used.<br>
    Example: <code style="color: brown">std::colony&lt;int&gt; int_colony_1(50,
    5, 512, 512); // Fill-construct a colony with min and max block sizes set
    at 512 elements. Fill with 50 instances of int == 5.<br>
    std::colony&lt;int&gt; int_colony_2(std::move(int_colony_1)); // Move all
    data to int_colony_2. All of the above characteristics are now applied to
    int_colony_2.</code> </p>
  </li>
</ul>

<h3>Iterators</h3>

<p>Iterators are bidirectional but also provide O(1) time complexity &gt;,
&lt;, &gt;= and &lt;= operators for convenience (for example, for use in
<code>for</code> loops when skipping over multiple elements per loop). The O(1)
complexity of these operators is achieved by keeping a record of the order of
memory blocks in some way (in the reference implementation this is done by
assigning a number to each memory block in its metadata), comparing the
relative order of the two iterators' memory blocks via this number, then
comparing the memory locations of the elements themselves if they happen to be
in the same memory block. The full list of operators for iterator,
reverse_iterator, const_iterator and const_reverse_iterator follows:</p>

<p><code>operator * <br>
operator -&gt; <br>
operator ++<br>
operator --<br>
operator = <br>
operator == <br>
operator != <br>
operator &lt; <br>
operator &gt; <br>
operator &lt;= <br>
operator &gt;= <br>
base() (reverse_iterator and const_reverse_iterator only)</code> </p>


<p>For more information see the <a href="#functions">member functions list</a> in the appendices.</p>




<h2><a id="acknowledgements"></a>VI. Acknowledgements</h2>
<p>Matt would like to thank: Glen Fernandes and Ion Gaztanaga for restructuring
advice, Robert Ramey for documentation advice, various Boost and SG14 members
for support, Baptiste Wicht for teaching me how to construct decent benchmarks,
Jonathan Wakely for standards-compliance advice and critiques, Sean Middleditch, Patrice Roy
and Guy Davidson for critiques, support and bug reports, that guy from Lionhead for
annoying me enough to force me to implement the original skipfield
pattern, Jon Blow for some initial advice and Mike Acton for some influence.<br>
Also Nico Josuttis for doing such an excellent job in terms of explaining the general format of the structure to the committee.
</p>




<h2>Appendices</h2>

<h3><a id="functions"></a>Appendix A: Member functions</h3>

<h4>Insert</h4>

<table border="1">
  <tbody>
    <tr>
      <td>single element</td>
      <td><code>iterator insert (const value_type &amp;val)</code></td>
    </tr>
    <tr>
      <td>fill</td>
      <td><code>iterator insert (size_type n, const value_type &amp;val)</code></td>
    </tr>
    <tr>
      <td>range</td>
      <td><code>template &lt;class InputIterator&gt; iterator insert
        (InputIterator first, InputIterator last)</code></td>
    </tr>
    <tr>
      <td>move</td>
      <td><code>iterator insert (value_type&amp;&amp; val)</code></td>
    </tr>
    <tr>
      <td>initializer list</td>
      <td><code>iterator insert (std::initializer_list&lt;value_type&gt;
        il)</code></td>
    </tr>
  </tbody>
</table>
<ul>
  <li><code>iterator insert(const value_type &amp;element)</code> 
    <p>Inserts the element supplied to the colony, using the object's
    copy-constructor. Will insert the element into a previously erased element
    slot if one exists, otherwise will insert to back of colony. Returns
    iterator to location of inserted element. Example:</p>
    <code style="color: brown">std::colony&lt;unsigned int&gt; i_colony;<br>
    i_colony.insert(23);</code> </li>
  <li><code>iterator insert(value_type &amp;&amp;element)</code> 
    <p>Moves the element supplied to the colony, using the object's
    move-constructor. Will insert the element in a previously erased element
    slot if one exists, otherwise will insert to back of colony. Returns
    iterator to location of inserted element. Example:</p>
    <p><code style="color: brown">std::string string1 = "Some text";<br>
    <br>
    std::colony&lt;std::string&gt; data_colony;<br>
    data_colony.insert(std::move(string1));</code> </p>
  </li>
  <li><code>void insert (size_type n, const value_type &amp;val)</code> 
    <p>Inserts <code>n</code> copies of <code>val</code> into the colony. Will
    insert the element into a previously erased element slot if one exists,
    otherwise will insert to back of colony. Example:</p>
    <code style="color: brown">std::colony&lt;unsigned int&gt; i_colony;<br>
    i_colony.insert(10, 3);</code> </li>
  <li><code>template &lt;class InputIterator&gt; void insert (InputIterator
    first, InputIterator last)</code> 
    <p>Inserts a series of <code>value_type</code> elements from an external
    source into a colony holding the same <code>value_type</code> (eg. int,
    float, a particular class, etcetera). Stops inserting once it reaches
    <code>last</code>. Example:</p>
    <code style="color: brown">// Insert all contents of colony2 into
    colony1:<br>
    colony1.insert(colony2.begin(), colony2.end());</code> </li>
  <li><code>void insert (std::initializer_list&lt;value_type&gt;
    il)</code> 
    <p>Copies elements from an initializer list into the colony. Will insert
    the element in a previously erased element slot if one exists, otherwise
    will insert to back of colony. Example:</p>
    <p><code style="color: brown">std::initializer_list&lt;int&gt; some_ints =
    {4, 3, 2, 5};<br>
    <br>
    std::colony&lt;int&gt; i_colony;<br>
    i_colony.insert(some_ints);</code> </p>
  </li>
  <li><code>iterator emplace(Arguments &amp;&amp;...parameters)</code> 
    <p>Constructs new element directly within colony. Will insert the element
    in a previously erased element slot if one exists, otherwise will insert to
    back of colony. Returns iterator to location of inserted element.
    "...parameters" are whatever parameters are required by the element's
    constructor. Example:</p>
    <p><code style="color: brown">class simple_class<br>
    {<br>
    private:<br>
    int number;<br>
    public:<br>
    simple_class(int a_number): number (a_number) {};<br>
    };<br>
    <br>
    std::colony&lt;simple_class&gt; simple_classes;<br>
    simple_classes.emplace(45); </code> </p>
  </li>
</ul>

<h4>Erase</h4>

<table border="1">
  <tbody>
    <tr>
      <td>single element</td>
      <td><code>iterator erase(const_iterator it)</code></td>
    </tr>
    <tr>
      <td>range</td>
      <td><code>void erase(const_iterator first, const_iterator
      last)</code></td>
    </tr>
  </tbody>
</table>
<ul>
  <li><code>iterator erase(const_iterator it)</code> 
    <p>Removes the element pointed to by the supplied iterator, from the
    colony. Returns an iterator pointing to the next non-erased element in the
    colony (or to end() if no more elements are available). Attempting to erase
    a previously-erased element results in undefined behaviour (this is checked
    for via an assert in debug mode). Example:</p>
    <code style="color: brown">std::colony&lt;unsigned int&gt;
    data_colony(50);<br>
    std::colony&lt;unsigned int&gt;::iterator an_iterator;<br>
    an_iterator = data_colony.insert(23);<br>
    an_iterator = data_colony.erase(an_iterator);</code> </li>
  <li><code>void erase(const_iterator first, const_iterator last)</code> 
    <p>Erases all elements of a given colony from <code>first</code> to the
    element before the <code>last</code> iterator. This function is optimized
    for multiple consecutive erasures and will always be faster than sequential
    single-element erase calls in that scenario. Example:</p>
    <code style="color: brown">std::colony&lt;int&gt;::iterator iterator1 =
    colony1.begin();<br>
    colony1.advance(iterator1, 10);<br>
    std::colony&lt;int&gt;::iterator iterator2 = colony1.begin();<br>
    colony1.advance(iterator2, 20);<br>
    colony1.erase(iterator1, iterator2);</code> </li>
</ul>

<h4>Other functions</h4>
<ul>
  <li><code>bool empty()</code> 
    <p>Returns a boolean indicating whether the colony is currently empty of
    elements.<br>
    Example: <code style="color: brown">if (object_colony.empty())
    return;</code></p>
  </li>
  <li><code>size_type size()</code> 
    <p>Returns total number of elements currently stored in container.<br>
    Example: <code style="color: brown">std::cout &lt;&lt; i_colony.size()
    &lt;&lt; std::endl;</code></p>
  </li>
  <li><code>size_type max_size()</code> 
    <p>Returns the maximum number of elements that the allocator can store in
    the container.<br>
    Example: <code style="color: brown">std::cout &lt;&lt; i_colony.max_size()
    &lt;&lt; std::endl;</code></p>
  </li>
  <li><code>size_type capacity()</code> 
    <p>Returns total number of elements currently able to be stored in
    container without expansion.<br>
    Example: <code style="color: brown">std::cout &lt;&lt; i_colony.capacity()
    &lt;&lt; std::endl;</code></p>
  </li>
  <li><code>void clear()</code> 
    <p>Empties the colony and removes all elements and blocks.<br>
    Example: <code style="color: brown">object_colony.clear();</code></p>
  </li>
  <li><code>void change_group_sizes(Skipfield_type min_group_size, Skipfield_type max_group_size)</code>
    <p>Changes the minimum and maximum internal group sizes, in terms of number
    of elements stored per group. If the colony is not empty and either
    min_group_size is larger than the smallest group in the colony, or
    max_group_size is smaller than the largest group in the colony, the colony
    will be internally copy-constructed into a new colony which uses the new
    group sizes, invalidating all pointers/iterators/references. If trying to change group sizes with a colony storing a non-copyable/movable type, please use the reinitialize function instead.<br>
    Example: <code style="color: brown">object_colony.change_group_sizes(1000,
    10000);</code></p>
  </li>
  <li><code>void change_minimum_group_size(Skipfield_type
    min_group_size)</code>
    <p>Changes the minimum internal group size only, in terms of minimum number
    of elements stored per group. If the colony is not empty and min_group_size
    is larger than the smallest group in the colony, the colony will be
    internally move-constructed (if possible) or copy-constructed into a new colony which uses the new minimum
    group size, invalidating all pointers/iterators/references. If trying to change group sizes with a colony storing a non-copyable/movable type, please use the reinitialize function instead.<br>
    Example: <code
    style="color: brown">object_colony.change_minimum_group_size(100);</code></p>
  </li>
  <li><code>void change_maximum_group_size(Skipfield_type
    max_group_size)</code>
    <p>Changes the maximum internal group size only, in terms of maximum number
    of elements stored per group. If the colony is not empty and
    max_group_size is smaller than the largest group in the colony, the colony
    will be internally move-constructed (if possible) or copy-constructed into a new colony which uses the new
    maximum group size, invalidating all pointers/iterators/references. If trying to change group sizes with a colony storing a non-copyable/movable type, please use the reinitialize function instead.<br>
    Example: <code
    style="color: brown">object_colony.change_maximum_group_size(1000);</code></p>
  </li>
  <li><code>void reinitialize(Skipfield_type min_group_size,
    const Skipfield_type max_group_size)</code>
    <p>Semantics of this function are the same as "clear();
    change_group_sizes(min_group_size, max_group_size);", but without the
    move/copy-construction code of the change_group_sizes() function - this means it
    can be used with element types which are non-copy-constructible and non-move-constructible, unlike
    change_group_sizes().<br>
    Example: <code style="color: brown">object_colony.reinitialize(1000, 10000);</code></p>
  </li>
  <li><code>void swap(colony &amp;source)</code>
    <p>Swaps the colony's contents with that of <code>source</code>.<br>
    Example: <code
    style="color: brown">object_colony.swap(other_colony);</code></p>
  </li>
  <li><code>void sort();<br>
    <br>
    template &lt;class comparison_function&gt;<br>
    void sort(comparison_function compare);</code>
    <p>Sort the content of the colony. By default this compares the colony
    content using a less-than operator, unless the user supplies a comparison
    function (ie. same conditions as std::list's sort function). Uses std::sort
    internally but will use plf::timsort if plf_timsort.h is included in the
    project before plf_colony.h.<br>
    Example: <code style="color: brown">// Sort a colony of integers in
    ascending order:<br>
    int_colony.sort();<br>
    // Sort a colony of doubles in descending order:<br>
    double_colony.sort(std::greater&lt;double&gt;());</code></p>
  </li>
  <li><code>void splice(colony &amp;source)</code> 
    <p>Transfer all elements from source colony into destination colony without
    invalidating pointers/iterators to either colony's elements (in other words
    the destination takes ownership of the source's memory blocks). After the
    splice, the source colony is empty. Splicing is much faster than
    range-moving or copying all elements from one colony to another. Colony
    does not guarantee a particular order of elements after splicing, for
    performance reasons; the insertion location of source elements in the
    destination colony is chosen based on the most positive performance outcome
    for subsequent iterations/insertions. For example if the destination colony
    is {1, 2, 3, 4} and the source colony is {5, 6, 7, 8} the destination
    colony post-splice could be {1, 2, 3, 4, 5, 6, 7, 8} or {5, 6, 7, 8, 1, 2,
    3, 4}, depending on internal state of both colonies and prior
    insertions/erasures.</p>
    <p>Note: If the minimum block size of the source is smaller than that of the
    destination, the destination will change its minimum block size to match
    the source. The same applies for maximum block sizes (if the source's is
    larger, the destination will adjust its size).<br>
    Example: <code style="color: brown">// Splice two colonies of integers
    together:<br>
    colony&lt;int&gt; colony1 = {1, 2, 3, 4}, colony2 = {5, 6, 7, 8};<br>
    colony1.splice(colony2);</code> </p>
  </li>
  <li><code>colony &amp; operator = (colony &amp;source)</code>
    <p>Copy the elements from another colony to this colony, clearing this
    colony of existing elements first.<br>
    Example: <code style="color: brown">// Manually swap data_colony1 and
    data_colony2 in C++03<br>
    data_colony3 = data_colony1;<br>
    data_colony1 = data_colony2;<br>
    data_colony2 = data_colony3;</code></p>
  </li>
  <li><code>colony &amp; operator = (colony &amp;&amp;source)</code> 
    <p>Move the elements from another colony to this colony, clearing this
    colony of existing elements first. Source colony is now empty and in a
    valid state (same as a new colony without any insertions), can be safely
    destructed or used in any regular way without problems.<br>
    Example: <code style="color: brown">// Manually swap data_colony1 and
    data_colony2<br>
    data_colony3 = std::move(data_colony1);<br>
    data_colony1 = std::move(data_colony2);<br>
    data_colony2 = std::move(data_colony3);</code></p>
  </li>
  <li><code>bool operator == (colony &amp;source)</code> 
    <p>Compare contents of another colony to this colony. Returns a boolean as
    to whether they are equal.<br>
    Example: <code style="color: brown">if (object_colony == object_colony2)
    return;</code></p>
  </li>
  <li><code>bool operator != (colony &amp;source)</code> 
    <p>Compare contents of another colony to this colony. Returns a boolean as
    to whether they are not equal.<br>
    Example: <code style="color: brown">if (object_colony != object_colony2)
    return;</code></p>
  </li>
  <li><code>iterator begin(), iterator end(), const_iterator cbegin(),
    const_iterator cend()</code> 
    <p>Return iterators pointing to, respectively, the first element of the
    colony and the element one-past the end of the colony (as per standard STL
    guidelines).</p>
  </li>
  <li><code>reverse_iterator rbegin(), reverse_iterator rend(),
    const_reverse_iterator crbegin(), const_reverse_iterator crend()</code> 
    <p>Return reverse iterators pointing to, respectively, the last element of
    the colony and the element one-before the first element of the colony (as
    per standard STL guidelines).</p>
  </li>
  <li><code>iterator get_iterator_from_pointer(element_pointer_type
    the_pointer)</code> 
    <p>Getting a pointer from an iterator is simple - simply dereference it
    then grab the address ie. <code>"&amp;(*the_iterator);"</code>. Getting an
    iterator from a pointer is typically not so simple. This function enables
    the user to do exactly that. This is expected to be useful in the use-case
    where external containers are storing pointers to colony elements instead
    of iterators (as iterators for colonies have 3 times the size of an element
    pointer) and the program wants to erase the element being pointed to or
    possibly change the element being pointed to. Converting a pointer to an
    iterator using this method and then erasing, is about 20% slower on average
    than erasing when you already have the iterator. This is less dramatic than
    it sounds, as it is still faster than other std:: container erasure times.
    However this is generally a slower, lookup-based operation. If the lookup
    doesn't find a non-erased element based on that pointer, it returns
    <code>end()</code>. Otherwise it returns an iterator pointing to the
    element in question. Example:</p>
    <p><code style="color: brown">std::colony&lt;a_struct&gt; data_colony;<br>
    std::colony&lt;a_struct&gt;::iterator an_iterator;<br>
    a_struct struct_instance;<br>
    an_iterator = data_colony.insert(struct_instance);<br>
    a_struct *struct_pointer = &amp;(*an_iterator);<br>
    std::colony&lt;a_struct&gt;::iterator another_iterator =
    data_colony.get_iterator_from_pointer(struct_pointer);<br>
    if (an_iterator == another_iterator) std::cout &lt;&lt; "Iterator is
    correct" &lt;&lt; std::endl;</code> </p>
  </li>
  <li><code>size_type get_index_from_iterator(iterator/const_iterator
    &amp;the_iterator) <b>(slow)</b></code> 
    <p>While colony is a container with unordered insertion (and is therefore
    unordered), it still has a (transitory) order which changes upon any
    erasure or insertion. <i>Temporary</i> index numbers are therefore
    obtainable. These can be useful, for example, when creating a save file in
    a computer game, where certain elements in a container may need to be
    re-linked to other elements in other container upon reloading the save
    file. Example:</p>
    <p><code style="color: brown">std::colony&lt;a_struct&gt; data_colony;<br>
    std::colony&lt;a_struct&gt;::iterator an_iterator;<br>
    a_struct struct_instance;<br>
    data_colony.insert(struct_instance);<br>
    data_colony.insert(struct_instance);<br>
    an_iterator = data_colony.insert(struct_instance);<br>
    unsigned int index = data_colony.get_index_from_iterator(an_iterator);<br>
    if (index == 2) std::cout &lt;&lt; "Index is correct" &lt;&lt;
    std::endl;</code> </p>
  </li>
  <li><code>size_type
    get_index_from_reverse_iterator(reverse_iterator/const_reverse_iterator
    &amp;the_iterator) <b>(slow)</b></code>
    <p>The same as get_index_from_iterator, but for reverse_iterators and
    const_reverse_iterators. Index is from front of colony (same as iterator),
    not back of colony. Example:</p>
    <p><code style="color: brown">std::colony&lt;a_struct&gt; data_colony;<br>
    std::colony&lt;a_struct&gt;::reverse_iterator r_iterator;<br>
    a_struct struct_instance;<br>
    data_colony.insert(struct_instance);<br>
    data_colony.insert(struct_instance);<br>
    r_iterator = data_colony.rbegin();<br>
    unsigned int index =
    data_colony.get_index_from_reverse_iterator(r_iterator);<br>
    if (index == 1) std::cout &lt;&lt; "Index is correct" &lt;&lt;
    std::endl;</code> </p>
  </li>
  <li><code>iterator get_iterator_from_index(size_type index)
    <b>(slow)</b></code> 
    <p>As described above, there may be situations where obtaining iterators to
    specific elements based on an index can be useful, for example, when
    reloading save files. This function is basically a shorthand to avoid
    typing <code>"iterator it = colony.begin(); colony.advance(it,
    50);"</code>. Example:</p>
    <p><code style="color: brown">std::colony&lt;a_struct&gt; data_colony;<br>
    std::colony&lt;a_struct&gt;::iterator an_iterator;<br>
    a_struct struct_instance;<br>
    data_colony.insert(struct_instance);<br>
    data_colony.insert(struct_instance);<br>
    an_iterator = data_colony.insert(struct_instance);<br>
    std::colony&lt;a_struct&gt;::iterator another_iterator = data_colony.get_iterator_from_index(2);<br>
    if (an_iterator == another_iterator) std::cout &lt;&lt; "Iterator is
    correct" &lt;&lt; std::endl;</code> </p>
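    <p>The save-file use case mentioned above can be sketched with standard
    components. This is only an illustration under stated assumptions:
    <code>std::list</code> stands in for colony (it is not a colony), and
    <code>entity</code>, <code>links_to_indices</code> and
    <code>indices_to_links</code> are hypothetical names that are not part of
    this proposal. Links are converted to temporary indices when saving and
    converted back to iterators after reloading:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

struct entity { int id; };

// Saving: turn each link (an iterator into 'entities') into a temporary index.
// Each conversion walks from the front of the container, hence "slow".
std::vector<std::size_t> links_to_indices(
    const std::list<entity> &entities,
    const std::vector<std::list<entity>::const_iterator> &links)
{
    std::vector<std::size_t> indices;
    indices.reserve(links.size());
    for (const auto &link : links)
        indices.push_back(static_cast<std::size_t>(
            std::distance(entities.cbegin(), link)));
    return indices;
}

// Reloading: turn the stored indices back into iterators. This is only valid
// if the container holds the same elements in the same order as when saved.
std::vector<std::list<entity>::iterator> indices_to_links(
    std::list<entity> &entities, const std::vector<std::size_t> &indices)
{
    std::vector<std::list<entity>::iterator> links;
    links.reserve(indices.size());
    for (std::size_t index : indices)
    {
        assert(index < entities.size());
        links.push_back(std::next(entities.begin(),
                                  static_cast<std::ptrdiff_t>(index)));
    }
    return links;
}
```

    <p>The indices are only meaningful while no insertion or erasure occurs
    between the two conversions, which is why they are described above as
    temporary.</p>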
  </li>
  <li><code>allocator_type get_allocator()</code> 
    <p>Returns a copy of the allocator used by the colony instance.</p>
  </li>
</ul>

<h3>Non-member functions</h3>
<ul>
  <li><code>void swap(colony &amp;A, colony &amp;B)</code> 
    <p>Swaps colony A's contents with that of colony B (assumes both colonies
    have same element type, allocator type, etc). <br>
    Example: <code style="color: brown">swap(object_colony,
    other_colony);</code> </p>
  </li>
  <li><code>template &lt;iterator_type&gt; void advance(iterator_type &amp;iterator,
    distance_type distance)</code> 
    <p>Increments/decrements the iterator supplied by the positive or negative
    amount indicated by <i>distance</i>. Speed of incrementation will almost
    always be faster than using the ++ operator on the iterator for increments
    greater than 1. In some cases it may approximate O(1). The iterator_type
    can be an iterator, const_iterator, reverse_iterator or
    const_reverse_iterator.<br>
    Example: <code style="color: brown">colony&lt;int&gt;::iterator it =
    i_colony.begin();<br>
    i_colony.advance(it, 20); </code></p>
  </li>
  <li><code>template &lt;iterator_type&gt; iterator_type next(iterator_type
    &amp;iterator, distance_type distance)</code> 
    <p>Creates a copy of the iterator supplied, then increments/decrements this
    iterator by the positive or negative amount indicated by
    <i>distance</i>.<br>
    Example: <code style="color: brown">colony&lt;int&gt;::iterator it =
    i_colony.next(i_colony.begin(), 20);</code></p>
  </li>
  <li><code>template &lt;iterator_type&gt; iterator_type prev(iterator_type
    &amp;iterator, distance_type distance)</code> 
    <p>Creates a copy of the iterator supplied, then decrements/increments this
    iterator by the positive or negative amount indicated by
    <i>distance</i>.<br>
    Example: <code style="color: brown">colony&lt;int&gt;::iterator it2 =
    i_colony.prev(i_colony.end(), 20);</code></p>
  </li>
  <li><code>template &lt;iterator_type&gt; difference_type distance(const
    iterator_type &amp;first, const iterator_type &amp;last)</code> 
    <p>Measures the distance between two iterators, returning the result, which
    will be negative if the second iterator supplied is before the first
    iterator supplied in terms of its location in the colony.<br>
    Example: <code style="color: brown">colony&lt;int&gt;::iterator it =
    i_colony.next(i_colony.begin(), 20);<br>
    colony&lt;int&gt;::iterator it2 = i_colony.prev(i_colony.end(), 20);<br>
    std::cout &lt;&lt; "Distance: " &lt;&lt; i_colony.distance(it, it2) &lt;&lt; std::endl;</code></p>
  </li>
</ul>

<p>Note: the four functions immediately above (advance, next, prev and
distance) are member functions in the reference implementation as a workaround
for an unfixed bug in MSVC2013.</p>
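<p>To illustrate what a signed distance means for a non-random-access
container, here is a minimal sketch using <code>std::list</code> as a stand-in.
<code>signed_distance</code> is a hypothetical helper, not the proposed
function, and it is linear-time rather than optimized. It is written out by
hand because <code>std::distance</code> has undefined behaviour for
non-random-access iterators when the second iterator is not reachable from the
first:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical helper: signed distance between two iterators into the same
// std::list. Negative if 'last' comes before 'first' in the container.
template <class T>
std::ptrdiff_t signed_distance(const std::list<T> &container,
                               typename std::list<T>::const_iterator first,
                               typename std::list<T>::const_iterator last)
{
    // Walk forwards from 'first', looking for 'last'.
    std::ptrdiff_t steps = 0;
    for (auto it = first; it != container.cend(); ++it, ++steps)
        if (it == last) return steps;
    if (last == container.cend()) return steps;

    // 'last' was not found ahead of 'first', so it must precede it:
    // walk forwards from 'last' to 'first' and negate the count.
    steps = 0;
    for (auto it = last; it != first; ++it) ++steps;
    assert(steps > 0);
    return -steps;
}
```

<p>Both iterators must refer to the same container; the sketch does not detect
iterators from different containers.</p>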



<h3><a id="benchmarks"></a>Appendix B - reference implementation benchmarks</h3>

<p>Benchmark results for the colony v5 reference implementation under GCC 8.1
x64 on an Intel Xeon E3-1241 (Haswell) are <a
href="https://plflib.org/benchmarks_haswell_gcc.htm">here</a>.</p>

<p>Old benchmark results for an earlier version of colony under MSVC 2015
update 3, on an Intel Xeon E3-1241 (Haswell) are <a
href="https://plflib.org/benchmarks_haswell_msvc.htm">here</a>. There is no
commentary for the MSVC results.</p>

<h3><a id="faq"></a>Appendix C - Frequently Asked Questions</h3>
<ol>
  <li><h4>Where is it worth using a colony in place of other std::
    containers?</h4>
    <p>As mentioned, it is worthwhile for performance reasons in situations
    where:</p>
    <ol type="a">
      <li>Insertion order is unimportant</li>
      <li>Insertions and erasures to the container occur frequently in
        performance-critical code, <i><b>and</b></i> </li>
      <li>Links to non-erased container elements must not be invalidated by
        insertion or erasure.</li>
    </ol>
    <p>Under these circumstances a colony will generally out-perform other
    std:: containers. In addition, because it never invalidates pointers or
    references to container elements (except when the element being pointed to
    has been erased), it may make many programming tasks involving
    inter-related structures in an object-oriented or modular environment much
    faster, and is worth considering in those circumstances.</p>
  </li>
  <li><h4>What are some examples of situations where a colony might improve
    performance?</h4>
    <p>Some ideal situations to use a colony: cellular/atomic simulation,
    persistent octrees/quadtrees, game entities or destructible objects in a
    video game, particle physics, and anywhere objects are being created and
    destroyed continuously. It also suits any situation where a vector of
    pointers to dynamically-allocated objects or a std::list would typically
    end up being used in order to preserve pointer stability but where order is
    unimportant.</p>
  </li>
  <li><h4>Is it similar to a deque?</h4>
    <p>A deque is reasonably dissimilar to a colony - being a double-ended
    queue, it requires a different internal framework. In addition, being a
    random-access container, having a growth factor for memory blocks in a
    deque is problematic (not impossible though). A deque and colony have no
    comparable performance characteristics except for insertion (assuming a
    good deque implementation). Deque erasure performance varies wildly
    depending on the implementation, but is generally similar to vector erasure
    performance. A deque invalidates pointers to subsequent container elements
    when erasing elements, which a colony does not, and is ordered.</p>
  </li>
  <li><h4>What are the thread-safety guarantees?</h4>
    <p>Unlike a std::vector, a colony can be read from and inserted into at the
    same time (assuming different locations for read and write). However, it
    cannot be iterated over and written to at the same time. If we look at a
    (non-concurrent implementation of) std::vector's thread-safety matrix to see
    which basic operations can occur at the same time, it reads as follows
    (please note push_back() is the same as insertion in this regard):</p>

    <table border="1" cellspacing="3">
      <tbody>
        <tr>
          <td><b>std::vector</b></td>
          <td>Insertion</td>
          <td>Erasure</td>
          <td>Iteration</td>
          <td>Read</td>
        </tr>
        <tr>
          <td>Insertion</td>
          <td>No</td>
          <td>No</td>
          <td>No</td>
          <td>No</td>
        </tr>
        <tr>
          <td>Erasure</td>
          <td>No</td>
          <td>No</td>
          <td>No</td>
          <td>No</td>
        </tr>
        <tr>
          <td>Iteration</td>
          <td>No</td>
          <td>No</td>
          <td>Yes</td>
          <td>Yes</td>
        </tr>
        <tr>
          <td>Read</td>
          <td>No</td>
          <td>No</td>
          <td>Yes</td>
          <td>Yes</td>
        </tr>
      </tbody>
    </table>
    <p>In other words, multiple reads and iterations over iterators can happen
    simultaneously, but the potential reallocation and pointer/iterator
    invalidation caused by insertion/push_back and erasure means those
    operations cannot occur at the same time as anything else. </p>
    <p>Colony on the other hand does not invalidate pointers/iterators to
    non-erased elements during insertion and erasure, resulting in the
    following matrix:</p>

    <table border="1" cellspacing="3">
      <tbody>
        <tr>
          <td><b>colony</b></td>
          <td>Insertion</td>
          <td>Erasure</td>
          <td>Iteration</td>
          <td>Read</td>
        </tr>
        <tr>
          <td>Insertion</td>
          <td>No</td>
          <td>No</td>
          <td>No</td>
          <td>Yes</td>
        </tr>
        <tr>
          <td>Erasure</td>
          <td>No</td>
          <td>No</td>
          <td>No</td>
          <td>Mostly*</td>
        </tr>
        <tr>
          <td>Iteration</td>
          <td>No</td>
          <td>No</td>
          <td>Yes</td>
          <td>Yes</td>
        </tr>
        <tr>
          <td>Read</td>
          <td>Yes</td>
          <td>Mostly*</td>
          <td>Yes</td>
          <td>Yes</td>
        </tr>
      </tbody>
    </table>
    <p><span style="font-size: 10pt">* Erasures will not invalidate iterators
    unless the iterator points to the erased element. </span></p>
    <p>In other words, reads may occur at the same time as insertions and
    erasures (provided that the element being erased is not the element being
    read), multiple reads and iterations may occur at the same time, but
    iterations may not occur at the same time as an erasure or insertion, as
    either of these may change the state of the skipfield which is being
    iterated over. Note that iterators pointing to end() may be invalidated by
    insertion.</p>
    <p>So, colony could be considered more inherently threadsafe than a
    (non-concurrent implementation of) std::vector, but still has some areas
    which would require mutexes or atomics to navigate in a multithreaded
    environment.</p>
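    <p>As a rough sketch of the kind of locking this implies, the wrapper
    below guards the operations which the colony matrix marks as mutually
    exclusive (insertion and iteration). It is an illustration only:
    <code>std::list</code> stands in for colony because it also keeps pointers
    to non-erased elements valid, and <code>guarded_container</code> and
    <code>guarded_container_example</code> are hypothetical names.</p>

```cpp
#include <cassert>
#include <list>
#include <mutex>

// std::list is a stand-in for colony here, chosen only because insertion
// does not invalidate pointers to existing elements.
template <class T>
class guarded_container
{
public:
    // Insertion changes the container's structure, so it takes the lock.
    T *insert(const T &value)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        elements_.push_back(value);
        return &elements_.back();
    }

    // Iteration must not overlap insertion or erasure, so it takes the
    // same lock for its whole duration.
    template <class Function>
    void for_each(Function function)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        for (T &element : elements_) function(element);
    }

private:
    std::mutex mutex_;
    std::list<T> elements_;
};

// Single-threaded usage example; the assertions document expected results.
inline void guarded_container_example()
{
    guarded_container<int> container;
    int *first = container.insert(1);
    container.insert(2);
    container.insert(3);

    int sum = 0;
    container.for_each([&sum](int &value) { sum += value; });
    assert(sum == 6);
    assert(*first == 1); // pointer from the first insert is still valid
}
```

    <p>A single mutex also serializes concurrent iterations, which the matrix
    above would allow to run in parallel; a reader-writer lock would be one way
    to recover that. Reads through a previously obtained pointer are not
    covered by this lock, so the element being read must not be erased or
    written to concurrently.</p>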
  </li>
  <li><h4>Any pitfalls to watch out for?</h4>
    <p>Because erased-element memory locations may be reused by
    <code>insert()</code> and <code>emplace()</code>, insertion position is
    essentially random unless no erasures have been made, or an equal number of
    erasures and insertions have been made.</p>
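    <p>The effect can be seen in a much-simplified model of erased-location
    reuse. The sketch below is not colony's actual structure:
    <code>slot_pool</code> is a hypothetical class that keeps a stack of erased
    slot indices and reuses the most recently erased slot before appending a
    new one.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of erased-slot reuse (single block, ints only).
class slot_pool
{
public:
    // Stores 'value' and returns the index of the slot it was placed in.
    std::size_t insert(int value)
    {
        if (!free_slots_.empty())
        {
            // Reuse the most recently erased slot.
            const std::size_t slot = free_slots_.back();
            free_slots_.pop_back();
            values_[slot] = value;
            occupied_[slot] = true;
            return slot;
        }
        // No erased slots available: append to the back.
        values_.push_back(value);
        occupied_.push_back(true);
        return values_.size() - 1;
    }

    // Marks a slot as erased and records it for reuse.
    void erase(std::size_t slot)
    {
        assert(slot < values_.size() && occupied_[slot]);
        occupied_[slot] = false;
        free_slots_.push_back(slot);
    }

private:
    std::vector<int> values_;
    std::vector<bool> occupied_;
    std::vector<std::size_t> free_slots_; // stack of erased slot indices
};
```

    <p>After inserting three values and erasing the one in slot 1, the next
    insertion lands in slot 1 rather than at the back, so iteration order no
    longer matches insertion order.</p>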
  </li>
  <li><h4>What is the purpose of limiting memory block minimum and maximum
    sizes?</h4>
    <p>One reason might be to ensure that memory blocks match a certain
    processor's cache or memory pathway sizes. Another reason to do this is
    that it is slightly slower to obtain an erased-element location from the
    list of groups-with-erasures (subsequently utilizing that group's free list
    of erased locations) and to reuse that space than to insert a new element
    to the back of the colony (the default behaviour when there are no
    previously-erased elements). If there are any erased elements in the
    colony, the colony will recycle those memory locations, unless the entire
    block is empty, at which point the block is deallocated.</p>
    <p>So if a block size is large, and many erasures occur but the block is
    not completely emptied, iteration performance might suffer due to large
    memory gaps between any two non-erased elements and the subsequent drop in
    data locality and cache performance. In that scenario you may want to
    benchmark different minimum/maximum block sizes, so that memory blocks are
    freed earlier, and find the optimal size for the given use case.</p>
  </li>
  <li><h4>What is colony's Abstract Data Type (ADT)?</h4>
    <p>Though I am happy to be proven wrong, I suspect colonies/bucket arrays
    are their own abstract data type. Some have suggested its ADT is of type
    bag. I would somewhat dispute this, as it does not have typical bag
    functionality such as <a
    href="http://www.austincc.edu/akochis/cosc1320/bag.htm">searching based on
    value</a> (you can use std::find but it's O(n)), and adding this
    functionality would slow down other performance characteristics. <a
    href="https://en.wikipedia.org/wiki/Set_(abstract_data_type)#Multiset">Multisets/bags</a>
    are also not sortable (by means other than automatically by key value).
    Colony does not utilize key values, is sortable, and does not provide the
    sort of functionality frequently associated with a bag (eg. counting the
    number of times a specific value occurs).</p>
  </li>
  <li><h4><a id="remove_when_empty"></a>Why must blocks be removed when
    empty?</h4>
    <p>Two reasons:</p>
    <ol type="a">
      <li>Standards compliance: if blocks aren't removed then <code>++</code>
        and <code>--</code> iterator operations become undefined in terms of
        time complexity, making them non-compliant with the C++ standard. At
        the moment they are O(1) amortized, typically one update for both
        skipfield and element pointers, but two if a skipfield jump takes the
        iterator beyond the bounds of the current block and into the next
        block. But if empty blocks are allowed, there could be anywhere between
        1 and <code>std::numeric_limits&lt;size_type&gt;::max</code> empty
        blocks between the current element and the next. Essentially you get
        the same scenario as you do when iterating over a boolean skipfield. It
        would be possible to move these to the back of the colony as trailing
        blocks, or house them in a separate list or vector for future usage,
        but this may create performance issues if any of the blocks are not at
        their maximum size (see below).</li>
      <li>Performance: iterating over empty blocks is slower than them not
        being present, of course - but also if you have to allow for empty
        blocks while iterating, then you have to include a while loop in every
        iteration operation, which increases cache misses and code size. The
        strategy of removing blocks when they become empty also statistically
        removes (assuming randomized erasure patterns) smaller blocks from the
        colony before larger blocks, which has a net result of improving
        iteration, because with a larger block, more iterations within the
        block can occur before the end-of-block condition is reached and a jump
        to the next block (and subsequent cache miss) occurs. Lastly, pushing
        to the back of a colony, provided there is still space and no new block
        needs to be allocated, will be faster than recycling memory locations
        as each subsequent insertion occurs in a subsequent memory location
        (which is cache-friendlier) and also less computational work is
        necessary. If a block is removed its recyclable memory locations are
        also of course removed, hence subsequent insertions are more likely to
        be pushed to the back of the colony.</li>
    </ol>
  </li>
  <li><h4>Why not preserve empty memory blocks for future use, in a separate
    list or vector instead of freeing them to the OS, or leave this decision
    undefined by the specification?</h4>
    <p>The default scenario, for reasons of predictability, should be to free
    the memory block rather than making this undefined. If a scenario calls for
    retaining memory blocks instead of deallocating them, this should be left
    to an allocator to manage. Otherwise you get unpredictable memory behaviour
    across implementations, and the lack of predictable behaviour across
    standard library implementations is one of the things that SG14 members
    have complained about time and time again. Ameliorating this
    unpredictability is best in my view.</p>
  </li>
  <li><h4>Memory block sizes - what are they based on, how do they expand,
    etc</h4>
    <p>In the reference implementation memory block sizes start from either the
    default minimum size (8 elements, larger if the type stored is small) or an
    amount defined by the programmer (with a minimum of 3 elements). Subsequent
    block sizes then increase the <i>total capacity</i> of the colony by a
    factor of 2 (so, 1st block 8 elements, 2nd 8 elements, 3rd 16 elements, 4th
    32 elements etcetera) until the maximum block size is reached. The default
    maximum block size is the maximum possible number that the skipfield
    bitdepth is capable of representing
    (std::numeric_limits&lt;skipfield_type&gt;::max()). By default the
    skipfield bitdepth is 16 so the maximum size of a block is 65535
    elements.</p>
    <p>However the skipfield bitdepth is also a template parameter which can be
    set to any unsigned integer type - unsigned char, unsigned int, uint64_t, etc.
    Unsigned short (guaranteed to be at least 16 bit, equivalent to C++11's
    uint_least16_t type) was found to have the best performance in real-world
    testing due to the balance between memory contiguousness, memory waste and
    the number of allocations.</p>
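    <p>The growth pattern described above can be written out as a short
    calculation. This is a sketch of the arithmetic only, not code from the
    reference implementation; <code>max_block_size</code> and
    <code>block_capacities</code> are hypothetical helper names.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Largest block size representable by a given skipfield type.
template <class SkipfieldType>
constexpr std::size_t max_block_size()
{
    return static_cast<std::size_t>(std::numeric_limits<SkipfieldType>::max());
}

// Capacities of the first 'block_count' blocks. Each new block doubles the
// total capacity (so it is as large as everything allocated before it),
// until the maximum block size is reached.
std::vector<std::size_t> block_capacities(std::size_t first_block,
                                          std::size_t max_block,
                                          std::size_t block_count)
{
    assert(first_block >= 3 && first_block <= max_block);
    std::vector<std::size_t> capacities;
    std::size_t total = 0;
    for (std::size_t i = 0; i < block_count; ++i)
    {
        std::size_t next = (total == 0) ? first_block : total;
        if (next > max_block) next = max_block;
        capacities.push_back(next);
        total += next;
    }
    return capacities;
}
```

    <p>With a first block of 8 and a 16-bit skipfield this yields 8, 8, 16, 32,
    64 and so on, matching the sequence above. With an 8-bit skipfield the
    sequence is capped at 255 elements per block from the seventh block
    onwards.</p>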
  </li>
  <li><h4><a id="simd"></a>Can a colony be used with SIMD instructions?</h4>
    <p>No and yes. Yes if you're careful, no if you're not.<br>
	 On platforms which support scatter and gather operations in hardware (eg. AVX512) you can use colony with SIMD as much as you want, using gather to load elements from disparate or sequential locations directly into a SIMD register, in parallel, then using scatter to write the processed values elsewhere afterwards. On platforms which do not support this in hardware, you would need to manually implement a scalar gather-and-scatter operation, which may be significantly slower.<br>
	 Situations where gather and scatter operations are too expensive, and elements must therefore be contiguous in memory for SIMD processing, are more complicated. When you have a bunch of erasures in a colony, there's no guarantee that your objects will be contiguous in memory, even though they are sequential during iteration. Some of them may also be in different memory blocks to each other. In these situations, if you want to use SIMD with colony, you must do the following:</p>
    <ul>
    <li>Set your minimum and maximum group sizes to multiples of the width of your SIMD instruction. If it supports 8 elements at once, set the group sizes to multiples of 8.</li>
    <li>Either never erase from the colony, or:<br>
    	<ol>
    	<li>Shrink-to-fit after you erase (will invalidate all pointers to elements within the colony).</li>
    	<li>Only erase from the back or front of the colony, and only erase elements in multiples of the width of your SIMD instruction eg. 8 consecutive elements at once. This will ensure that the end-of-memory-block boundaries line up with the width of the SIMD instruction, provided you've set your min/max block sizes as above.</li>
    	</ol>
   </li>
   </ul>
	<p>Generally if you want to use SIMD without gather/scatter, it's probably preferable to use a vector or an array.</p>
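	<p>Two of the points above lend themselves to a small sketch: rounding a block size up to a multiple of the SIMD width, and a scalar gather fallback for platforms without hardware gather. Both helpers are hypothetical and use plain standard containers rather than colony:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rounds a desired block size up to the nearest multiple of the SIMD width
// (both measured in elements). simd_width must be non-zero.
constexpr std::size_t round_up_to_simd_width(std::size_t block_size,
                                             std::size_t simd_width)
{
    return ((block_size + simd_width - 1) / simd_width) * simd_width;
}

// Scalar gather fallback: copies the elements at the given indices into a
// contiguous buffer which can then be processed with SIMD loads.
std::vector<float> gather(const std::vector<float> &source,
                          const std::vector<std::size_t> &indices)
{
    std::vector<float> packed;
    packed.reserve(indices.size());
    for (std::size_t index : indices)
    {
        assert(index < source.size());
        packed.push_back(source[index]);
    }
    return packed;
}
```

	<p>For example, a desired block size of 100 with an 8-element SIMD width rounds up to 104. The scalar gather performs one copy per element, which is the cost the paragraph above warns about.</p>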
<!-- 	<p>A version of colony designed for SIMD with gather/scatter could be more fully realised by using a slight alteration of the existing skipfield pattern. -->
   </li>
</ol>

<h3><a id="responses" name="responses"></a>Appendix D - Specific responses to
previous committee feedback</h3>
<ol>
  <li><h4>"Why not 'bag'? Colony is too selective a name."</h4>
    <p>'bag' is problematic partially because it has been synonymous with a
    multiset (and colony is not one of those) in both <a
    href="https://en.wikipedia.org/wiki/Set_(abstract_data_type)#Multiset">computer
    science</a> and <a
    href="https://en.wikipedia.org/wiki/Multiset">mathematics</a> since the
    1970's, and partially because it's a bit vague - it doesn't describe how
    the container works. However I accept that it is a familiar name and
    describes a similar territory for most programmers, and will accept that as
    a name if needed. 'colony' is an intuitive name if you understand the
    container, and allows for easy conveyance of how it functions internally
    (colony = human colony/ant colony etc, memory blocks = houses, elements =
    people/ants in the houses who come and go). The claim that the use of the
    word is selective in terms of its meaning is also true for vector, set,
    'bag', and many other C++ names.</p>
  </li>
  <li><h4>"Unordered and no associative lookup, so this only supports use cases
    where you're going to do something to every element."</h4>
    <p>As noted the container was originally designed for highly
    object-oriented situations where you have many elements in different
    containers linking to many other elements in other containers. This linking
    can be done with pointers or iterators in colony (insert returns an
    iterator which can be dereferenced to get a pointer, pointers can be
    converted into iterators with the supplied functions (for erase etc)) and
    because pointers/iterators stay stable regardless of insertion/erasure,
    this usage is unproblematic. You could say the pointer is equivalent to a
    key in this case (but without the overhead). That is the first access
    pattern; the second is straight iteration over the container, as you say.
    In addition, the container does have (typically better than O(n))
    advance/next/prev implementations, so multiple elements can be skipped.</p>
  </li>
  <li><h4>"Do we really need the skipfield_type template argument?"</h4>
    <p>This argument currently promotes use of the container in heavily
    memory-constrained environments, and in high-performance small-N
    collections (where the type of the skipfield can be reduced to 8 bits
    without having a negative effect on maximum block sizes and subsequent
    iteration speed). See more explanation in V. Technical Specifications.
    Unfortunately this parameter also means <code>operator =</code> and some
    other functions won't work between colonies of the same type but differing
    skipfield types. Further, the template argument is chiefly relevant to the
    use of the skipfield patterns utilized in the reference implementations,
    and there may be better techniques. </p>
    <p>However, the parameter can always be ignored in an implementation.
    Retaining it, even if significantly more advanced structures are discovered
    for skipping elements, harms nothing, and it can be deprecated if
    necessary. At this point in time I do not personally see many alternatives
    to the two skipfield patterns which have been used in the reference
    implementations, both of which benefit from having this optional parameter.
    Please note, that is not the same as saying there are no alternatives, just
    that none have been thought of yet. This is something I am flexible on, as
    a singular skipfield type will cover the majority of scenarios.</p>
    <p><a href="https://plflib.org/blog.htm#shortandchardifferences">Research
    into this area</a> has determined that there is only really an advantage to
    using unsigned char for the skipfield type if the number of elements is
    under 1000, and not in all scenarios. So whether or not this constitutes a
    performance gain is largely scenario-dependent. It always constitutes a
    memory usage reduction, but the relative effect of this depends on the size
    of your stored type.</p>
  </li>
  <li><h4>"Prove this is not an allocator"</h4>
    <p>I'm not really sure how to answer this, as I don't see the resemblance,
    unless you count maps, vectors etc as being allocators also. The only
    aspect of it which resembles what an allocator might do, is the memory
    re-use mechanism. It would be impossible for an allocator to perform a
    similar function while still allowing the container to iterate over the
    data linearly in memory, preserving locality, in the manner described in
    this document.</p>
  </li>
  <li><h4>"If this is for games, won't game devs just write their own versions
    for specific types in order to get a 1% speed increase anyway?"</h4>
    <p>This is true for many/most AAA game companies who're on the bleeding
    edge, but they also do this for vector etc, so they aren't the target
    audience of std:: for the most part; sub-AAA game companies are more likely
    to use third party/pre-existing tools. As mentioned earlier, this structure
    (bucket-array-like) crops up in <a
    href="https://groups.google.com/a/isocpp.org/forum/#!topic/sg14/1iWHyVnsLBQ">many,
    many fields</a>, not just game dev. So the target audience is probably
    everyone other than AAA gaming, but even then, it facilitates communication
    across fields and companies as to this type of container, giving it a
    standardised name and understanding.</p>
  </li>
  <li><h4>"Is there active research in this problem space? Is it likely to
    change in future?"</h4>
    <p>The only current analysis has been around the question of whether it's
    possible for this specification to fail to allow for a better
    implementation in future. This is unlikely given the container's
    requirements and how this impacts on implementation. Bucket arrays have
    been around since the 90's, and there's been no significant innovation in them
    until now. I've been researching/working on colony since early 2015, and
    while I can't say for sure that a better implementation might not be
    possible, I am confident that no change should be necessary to the
    specification to allow for future implementations, if it is done correctly.
    </p>
    <p>The requirement of allowing no reallocations upon insertion or erasure
    narrows possible implementation strategies significantly. Memory blocks
    have to be independently allocated so that they can be removed (when empty)
    without triggering reallocation of subsequent elements. There are a limited
    number of ways to do that and keep track of the memory blocks at the same
    time. Erased element locations must be recorded (for future re-use by
    insertion) in a way that doesn't create allocations upon erasure, and
    there are a limited number of ways to do this also. Multiple consecutive
    erased elements have to be skipped in O(1) time, and again there are limits
    to how many ways you can do that. That covers the three core aspects upon
    which this specification is based. See <a id="design1"></a>IV. Design
    Decisions for the various ways these aspects can be designed.</p>
    <p>Skipfield update time complexity should, I think, be left
    implementation-defined, as defining time complexity may obviate better
    solutions which are faster but are not necessarily O(1). Skipfield updates
    occur during erasure, insertion, splicing, sorting and container copying. I
    have looked into alternatives to a 1-node-per-element skipfield, such as a
    compressed skipfield (a series of numbers denoting alternating lengths of
    non-erased/erased elements), but all the possible implementations I can
    think of either involve resizing of an array on-the-fly (which doesn't work
    well with low latency) and/or slowing down iteration time significantly.</p>
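    <p>To make the O(1) skipping requirement concrete, here is a deliberately
    simplified sketch of a 1-node-per-element skipfield in which the first node
    of each run of erased elements stores the length of that run. It is an
    illustration only and not the pattern used in the reference
    implementation: the skipfield is rebuilt from scratch rather than updated
    incrementally, only forward iteration is supported, and
    <code>build_skipfield</code> and <code>visit_order</code> are hypothetical
    names.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Builds a skipfield with one node per element plus one trailing node.
// The first node of each run of erased elements holds the run length;
// every other node is 0.
std::vector<unsigned short> build_skipfield(const std::vector<bool> &erased)
{
    assert(erased.size() <= std::numeric_limits<unsigned short>::max());
    // The extra trailing node (always 0) lets iteration stop at the end
    // without a separate bounds check before reading the skipfield.
    std::vector<unsigned short> skipfield(erased.size() + 1, 0);
    std::size_t i = 0;
    while (i < erased.size())
    {
        if (!erased[i]) { ++i; continue; }
        const std::size_t run_start = i;
        while (i < erased.size() && erased[i]) ++i;
        skipfield[run_start] = static_cast<unsigned short>(i - run_start);
    }
    return skipfield;
}

// Returns the indices of the non-erased elements in iteration order.
std::vector<std::size_t> visit_order(const std::vector<unsigned short> &skipfield)
{
    assert(!skipfield.empty());
    const std::size_t element_count = skipfield.size() - 1;
    std::vector<std::size_t> visited;
    // Equivalent of begin(): jump over any erased run at the very front.
    std::size_t index = skipfield[0];
    while (index < element_count)
    {
        visited.push_back(index);
        ++index;                    // step to the next node...
        index += skipfield[index];  // ...and jump a whole erased run at once
    }
    return visited;
}
```

    <p>Each increment performs one addition regardless of how many consecutive
    elements were erased, which is the property a boolean skipfield lacks.</p>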
  </li>
</ol>

<h3><a id="sg14gameengine"></a>Appendix E - Typical game engine
requirements</h3>

<p>Here are some more specific requirements with regards to game engines,
verified by game developers within SG14:</p>
<ol type="a">
  <li>Elements within data collections refer to elements within other data
    collections (through a variety of methods - indices, pointers, etc). These
    references must stay valid throughout the course of the game/level. Any
    container which causes pointer or index invalidation creates difficulties
    or necessitates workarounds.</li>
  <li>Order is unimportant for the most part. The majority of data is simply
    iterated over, transformed, referred to and utilized with no regard to
    order.</li>
  <li>Erasing or otherwise "deactivating" objects occurs frequently in
    performance-critical code. For this reason methods of erasure which create
    strong performance penalties are avoided.</li>
  <li>Inserting new objects in performance-critical code (during gameplay) is
    also common - for example, a tree drops leaves, or a player spawns in an
    online multiplayer game.</li>
  <li>It is not always clear in advance how many elements there will be in a
    container at the beginning of development, or at the beginning of a level
    during play. Genericized game engines in particular have to adapt to
    considerably different user requirements and scopes. For this reason
    extensible containers which can expand and contract in realtime are
    necessary.</li>
  <li>Due to the effects of cache on performance, memory storage which is
    more-or-less contiguous is preferred.</li>
  <li>Memory waste is avoided.</li>
</ol>

<p>std::vector in its default state does not meet these requirements due to:
</p>
<ol>
  <li>Poor (non-fill) singular insertion performance (regardless of insertion
    position) due to the need for reallocation upon reaching capacity</li>
  <li>Insert invalidates pointers/iterators to all elements </li>
  <li>Erase invalidates pointers/iterators/indexes to all elements after the
    erased element</li>
</ol>
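<p>The invalidation problem is easy to observe with standard containers. The
sketch below uses hypothetical function names and compares storage addresses
as integers to avoid dereferencing anything that may have been freed. It
contrasts std::vector, whose storage moves when capacity is exceeded, with
std::list, whose elements stay in place:</p>

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <vector>

// Returns true if inserting into a full vector moved its element storage,
// which is what invalidates pointers and iterators to existing elements.
bool vector_insert_moves_elements()
{
    std::vector<int> values(4, 0);
    while (values.size() < values.capacity()) values.push_back(0);
    assert(values.size() == values.capacity()); // vector is now full

    const auto before = reinterpret_cast<std::uintptr_t>(values.data());
    values.push_back(1); // exceeds capacity, forcing a reallocation
    const auto after = reinterpret_cast<std::uintptr_t>(values.data());
    return before != after;
}

// Returns true if the first element of a list stays at the same address
// after many insertions.
bool list_insert_keeps_elements_in_place()
{
    std::list<int> values(4, 0);
    const int *first = &values.front();
    for (int i = 0; i < 1000; ++i) values.push_back(i);
    return first == &values.front();
}
```

<p>std::list avoids the invalidation but gives up the memory contiguity listed
in the requirements above.</p>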

<p>Game developers therefore either develop custom solutions for each scenario
or implement workarounds for vector. The most common workarounds are most
likely the following or derivatives:</p>
<ol>
  <li>Using a boolean flag or similar to indicate the inactivity of an object
    (as opposed to actually erasing from the vector). Elements flagged as
    inactive are skipped during iteration. <br>
    <br>
    Advantages: Fast "deactivation". Easy to manage in multi-access
    environments.<br>
    Disadvantages: Can be slower to iterate due to branching.</li>
  <li>Using a vector of data and a secondary vector of indexes. When erasing,
    the erasure occurs only in the vector of indexes, not the vector of data.
    When iterating it iterates over the vector of indexes and accesses the data
    from the vector of data via the remaining indexes. <br>
    <br>
    Advantages: Fast iteration.<br>
    Disadvantages: Erasure still incurs some reallocation cost which can
    increase jitter.</li>
  <li>Combining a swap-and-pop approach to erasure with some form of
    dereferenced lookup system to enable contiguous element iteration
    (sometimes called a 'packed array': <a
    href="http://bitsquid.blogspot.ca/2011/09/managing-decoupling-part-4-id-lookup.html">http://bitsquid.blogspot.ca/2011/09/managing-decoupling-part-4-id-lookup.html</a>).
    <br>
    Advantages: Iteration is at standard vector speed.<br>
    Disadvantages: Erasure will be slow if objects are large and/or
    non-trivially copyable, thereby making swap costs large. All link-based
    access to elements incur additional costs due to the dereferencing system.
  </li>
</ol>
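<p>As an illustration of the third workaround, a minimal swap-and-pop erase for
std::vector might look like the following. <code>swap_and_pop</code> is a
hypothetical helper, and the accompanying lookup system that a packed array
needs is deliberately left out:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Erases values[index] without shifting the following elements: the last
// element is moved into the erased position, then the vector is shortened.
template <class T>
void swap_and_pop(std::vector<T> &values, std::size_t index)
{
    assert(index < values.size());
    if (index != values.size() - 1)
        values[index] = std::move(values.back());
    values.pop_back();
}
```

<p>Erasure costs one move instead of shifting every later element, but the
element that used to be last now lives at <code>index</code>. Any stored index
or pointer to it must be redirected, which is the role of the lookup system
and the source of the additional access cost described above.</p>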

<p>Colony brings a more generic solution to these contexts. While some
developers, particularly AAA developers, will almost always develop a custom
solution for specific use-cases within their engine, I believe most sub-AAA and
indie developers are more likely to rely on third party solutions. Regardless,
standardising the container will allow for greater cross-discipline
communication.</p>


<h3><a id="questions"></a>Appendix F - Questions for reviewers</h3>

<p>Please feel free to get in touch with information and opinions on the
following topics:</p>
<ul>
  <li>Can you see any possible alternative to a 1-node-per-element
  skipfield?</li>
  <li>Can you see any possible difficulties with implementing a thread-safe version of this structure?</li>
</ul>


<h3><a id="revisions"></a>Appendix G - Paper revision history</h3>
<ul>
  <li>R8: Correction to SIMD info. Correction to structure (missing appendices title, member functions and technical specification were conjoined, acknowledgments section had mysteriously gone missing since an earlier version, now restored and updated). Update intro. HTML corrections.</li>
  <li>R7: Minor changes to member functions.</li>
  <li>R6: Re-write. Reserve() and shrink_to_fit() removed from
  specification.</li>
  <li>R5: Additional note for reserve, re-write of introduction.</li>
  <li>R4: Addition of revision history and review feedback appendices. General
    rewording. Update of benchmarks to v4 of colony, using max 1000000 N for
    most benchmarks, and using GCC 7.1 as compiler on a Haswell-core machine.
    Previous benchmarks also still available at external links. Expansion of
    initial metaphorical explanation. Cutting of some dead wood. Addition of
    some more dead wood. Reversion to HTML, benchmarks moved to external URL,
    based on feedback. Change of font to Times New Roman based on looking at
    what other papers were using, though I did briefly consider Comic Sans.
    Change to insert specifications.</li>
  <li>R3: Jonathan Wakely's extensive technical critique has been actioned on,
    in both documentation and the reference implementation. "Be clearer about
    what operations this supports, early in the paper." - done (V. Technical
    Specifications). "Be clear about the O() time of each operation, early in
    the paper." - done for main operations, see V. Technical Specifications.
    Responses to some other feedback included in the foreword.</li>
  <li>R2: Rewording.</li>
</ul>
</body>
</html>
