<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html><head>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8">

    <title>DXXXX: Zap the Zap: Pointers should just be bags of bits</title>
    <style type="text/css">
      p {text-align:justify}
      li {text-align:justify}
      blockquote.note
      {
      background-color:#E0E0E0;
      padding-left: 15px;
      padding-right: 15px;
      padding-top: 1px;
      padding-bottom: 1px;
      }
      ins, .inserted, ins p, .inserted p
      {
      color: black;
      background: #a0ffa0;
      text-decoration: underline;
      }
      del, del p
      {
      color: black;
      background: #ffa0a0;
      text-decoration: line-through;
      }
    </style>
  </head><body>
    <table>
      <tr><td>Document Number:</td><td>DXXXXR1</td></tr>
      <tr><td>Date:</td><td>2020-06-15</td></tr>
      <tr><td>Author:</td><td><a href="mailto:anthony@justsoftwaresolutions.co.uk">Anthony
            Williams</a><br>Just Software Solutions Ltd</td></tr>
      <tr><td>Audience:</td><td>EWG</td></tr>
    </table>
    <h1>DXXXX: Zap the Zap: Pointers should just be bags of bits</h1>

    <p>This paper relates to <a href="http://wg21.link/p1726">P1726: Pointer
        lifetime-end zap and provenance, too</a>. My argument is that in many
        ways pointers are <strong>already</strong> treated like bags of bits by
        the language, so we should be consistent, and treat them as such
        throughout. A consequence of this is that there can be no "lifetime-end
        pointer zap".</p>

    <p>This paper provides a series of examples. I believe all these examples
      are clearly defined by the standard due to the fact that pointers
      are <em>scalar types</em> and <em>trivially copyable</em> types.</p>

    <p>As shown by these examples, the "pointer zap" from the final sentence of
        [basic.stc] p4 (<q> Any other use of an invalid pointer value has
        implementation-defined behavior</q>), and especially note 31 (<q>Some
        implementations might define that copying an invalid pointer value
        causes a system-generated runtime fault.</q>) is clearly incompatible
        with pointers being <em>trivially copyable</em> types from [basic.types]
        p3. Consequently we should either strike that permission from the
        standard and require that invalid pointer values are still copyable and
        comparable, or we should decide that pointers are not <em>trivially
        copyable</em> after all, which would have far-reaching
        consequences.</p>

    <p>All standard references are to the C++ working draft from the 2020-04
      mailing: <a href="http://wg21.link/n4861.pdf">N4861</a>.</p>

    <p>All these examples have been tested with gcc, clang and MSVC. Links to
      compiler explorer are provided for each example.</p>

    <h2>Wording</h2>

    <p>Strike the final sentence and note 31 from [basic.stc] p4:

      <blockquote>When the end of the duration of a region of storage is
      reached, the values of all pointers representing the address of any part of
      that region of storage become invalid pointer values (6.8.2). Indirection
      through an invalid pointer value and passing an invalid pointer value to a
      deallocation function have undefined behavior. <del>Any other use of an
      invalid pointer value has implementation-defined
          behavior.<sup>31</sup></del></blockquote></p>

    
    <p>Add a new sentence to the end of [basic.stc] p4:

      <blockquote>
        <ins>Copying and assigning invalid pointer values preserves the value
        representation. Comparisons involving an invalid pointer value return an
        unspecified result. An invalid pointer value will become a valid pointer
        value if a region of storage with dynamic storage duration is allocated
        and the value representation of a pointer to the newly allocated storage
        cast to the same pointer type as the erstwhile-invalid pointer value is
        the same as the value representation of the erstwhile-invalid pointer
        value.</ins>
      </blockquote>
    </p>
    
    <h2>Examples</h2>

    <h3>Example 1: <code>memcpy</code> on a pointer</h3>

    <p><a href="https://godbolt.org/z/psYj_a">Compiler explorer link</a></p>

    <pre>
#include &lt;assert.h&gt;
#include &lt;string.h&gt;

int main() {
    int *x= new int(42);
    int *y= nullptr;
    memcpy(&y,&x,sizeof(x));
    assert(x == y);
    assert(*y==42);
}
    </pre>

    <p>Here, we use <code>memcpy</code> to copy the bits of a pointer from one
    pointer to another. The second pointer is now valid and points to the same
    thing the original did because pointers are <em>trivially copyable</em>
      ([basic.types] p3).</p>

    <h3>Example 2: <code>memcpy</code> via a buffer</h3>

    <p><a href="https://godbolt.org/z/wbMTXf">Compiler explorer link</a></p>
    
    <pre>
#include &lt;assert.h&gt;
#include &lt;string.h&gt;

int main() {
    int *x= new int(42);
    int *y= nullptr;
    unsigned char buffer[sizeof(x)];
    memcpy(buffer, &x, sizeof(x));
    memcpy(&y, buffer, sizeof(x));
    assert(x == y);
    assert(*y == 42);
}
    </pre>

    <p>Here, we use <code>memcpy</code> to copy the bits of a pointer from one
      pointer to a buffer, and then from that buffer to another pointer. The
      second pointer is now valid and points to the same thing the original did
      because pointers are <em>trivially copyable</em> ([basic.types] p2).</p>

    <h3>Example 3: <code>reinterpret_cast</code> to an integer</h3>

    <p><a href="https://godbolt.org/z/VDN9wp">Compiler explorer link</a></p>
    
    <pre>
#include &lt;assert.h&gt;
#include &lt;string.h&gt;
#include &lt;stdint.h&gt;

int main() {
    int *x= new int(42);
    int *y= nullptr;
    uintptr_t temp= reinterpret_cast&lt;uintptr_t&gt;(x);
    y= reinterpret_cast&lt;int *&gt;(temp);
    assert(x == y);
    assert(*y == 42);
}
    </pre>

    <p>Here we rely on the provision of [expr.reinterpret.cast] p5 that a
      pointer may be cast to an integer and back and retain its value.</p>

    <h3>Example 4: <code>memcpy</code> with modifications</h3>

    <p><a href="https://godbolt.org/z/8Com8V">Compiler explorer link</a></p>
    
    <pre>
#include &lt;assert.h&gt;
#include &lt;string.h&gt;

int main() {
    int *x= new int(42);
    int *y= nullptr;
    unsigned char buffer[sizeof(x)];
    memcpy(buffer, &x, sizeof(x));
    for(auto &c : buffer) {
        c^= 0x55;
    }
    for(auto &c : buffer) {
        c^= 0x55;
    }
    memcpy(&y, buffer, sizeof(x));
    assert(x == y);
    assert(*y == 42);
}
    </pre>

    <p>Now we take example 2 a step further: we perform a reversible
      modification on the bits in the buffer after the
      first <code>memcpy</code>, then reverse that modification
      and <code>memcpy</code> it back. Since the bits in the buffer now hold
      their original values, we can copy them to a pointer, which will have the
      same value, because pointers are <em>trivially copyable</em>.</p>

    <h3>Example 5: <code>memcpy</code> and write to file</h3>

    <p><a href="https://godbolt.org/z/6Pk3Ri">Compiler explorer link</a></p>
    
    <pre>
#include &lt;assert.h&gt;
#include &lt;string.h&gt;
#include &lt;stdio.h&gt;

int main() {
    int *x= new int(42);
    int *y= nullptr;
    unsigned char buffer[sizeof(x)];
    memcpy(buffer, &x, sizeof(x));

    auto file= fopen("tempfile", "wb");
    auto written= fwrite(buffer, 1, sizeof(buffer), file);
    assert(written == sizeof(buffer));
    fclose(file);

    memset(buffer, 0, sizeof(buffer));

    file= fopen("tempfile", "rb");
    auto read= fread(buffer, 1, sizeof(buffer), file);
    assert(read == sizeof(buffer));
    fclose(file);
    memcpy(&y, buffer, sizeof(x));
    assert(x == y);
    assert(*y == 42);
}
    </pre>

    <p>This time we are copying the pointer to a buffer, writing our bytes to a
      file, clearing the buffer and reading the bytes back from the file, then
      copying the bytes back to the pointer. If our file is unmodified then the
      buffer will have the same contents after reading as it did before writing,
      so copying the buffer back to the pointer yields the same value, and the
      pointer is again valid and points to the same object.</p>


    <h3>Example 6: destroy and recreate the object</h3>

    <p><a href="https://godbolt.org/z/d5cZ-A">Compiler explorer link</a></p>
    
    <pre>
#include &lt;assert.h&gt;
#include &lt;string.h&gt;
#include &lt;stdio.h&gt;
#include &lt;new&gt;

struct X {
    int i;
};

int main() {
    X *x= new X{42};
    X *y= nullptr;
    unsigned char buffer[sizeof(x)];
    memcpy(buffer, &x, sizeof(x));

    x-&gt;~X();
    new(x) X{99};

    memcpy(&y, buffer, sizeof(x));
    assert(x == y);
    assert(y-&gt;i == 99);
    assert(x-&gt;i == 99);
}
    </pre>

    <p>This time, we destroy the pointed-to object and recreate a new object
      with a new value at the same memory location.</p>

    <p>The pointer <code>x</code> still holds the same bit pattern, and still
    points to a valid object, so both the original pointer <code>x</code> and
    the newly constructed copy <code>y</code> point to the new object, and all
    is well by [basic.life] p8.</p>

    <h3>Example 7: <code>delete</code> and <code>new</code> the object</h3>

    <p><a href="https://godbolt.org/z/2-Awa4">Compiler explorer link</a></p>
    
    <pre>
#include &lt;assert.h&gt;
#include &lt;string.h&gt;
#include &lt;stdio.h&gt;
#include &lt;new&gt;

struct X {
    int i;
};

int main() {
    X *x= new X{42};
    X *y= nullptr;
    unsigned char buffer[sizeof(x)];
    memcpy(buffer, &x, sizeof(x));

    delete x;
    y= new X{99};

    unsigned char buffer2[sizeof(x)];
    memcpy(buffer2, &y, sizeof(x));

    if(memcmp(buffer, buffer2, sizeof(x))) {
        printf("Different address\n");
        return 0;
    }

    memcpy(&x, buffer2, sizeof(x));

    assert(x == y);
    assert(y-&gt;i == 99);
    assert(x-&gt;i == 99);
}
    </pre>

    <p>This time, we destroy the pointed-to object with <code>delete</code> and recreate a new object
      with a new value with <code>new</code>.</p>

    <p>We then copy the new pointer into a buffer and compare the buffers. If
      the buffers are different, then the pointers are clearly different and our
      test doesn't work, so we stop.</p>

    <p>If the buffers are the same, then we copy the new buffer (which is a copy
      of our new pointer) into the old pointer.</p>

    <p><code>x</code> is now a copy of the raw bits of our new pointer, so
      everything must work.</p>

    
    
    <h3>Example 8: <code>delete</code> and <code>new</code> the object again</h3>

    <p><a href="https://godbolt.org/z/jhsTgF">Compiler explorer link</a></p>
    
    <pre>
#include &lt;assert.h&gt;
#include &lt;string.h&gt;
#include &lt;stdio.h&gt;
#include &lt;new&gt;

struct X {
    int i;
};

int main() {
    X *x= new X{42};
    X *y= nullptr;
    unsigned char buffer[sizeof(x)];
    memcpy(buffer, &x, sizeof(x));

    delete x;
    y= new X{99};

    unsigned char buffer2[sizeof(x)];
    memcpy(buffer2, &y, sizeof(x));

    if(memcmp(buffer, buffer2, sizeof(x))) {
        printf("Different address\n");
        return 0;
    }

    assert(x == y);
    assert(y-&gt;i == 99);
    assert(x-&gt;i == 99);
}
    </pre>

    <p>This is the same as example 7, except we don't copy the raw bits from the
      new buffer over our old pointer.</p>

    <p>We know that the bits of <code>x</code> and the bits of <code>y</code>
      are the same because we compared them with <code>memcmp</code>. Since
      pointers are trivially copyable, the value of a pointer is determined by
      its <em>value representation</em>, which is the set of bits of
      the <em>object representation</em> that participate in representing the
      value. Since we know the <em>object representations</em> are the same,
      the <em>value representations</em> must be the same, so the pointers must
      have the same value.</p>

    <p>Since the pointers must have the same value, <code>x</code> must be equal
      to <code>y</code>, and must point to the same object, and all is well.</p>

    <h3>Example 9: using <code>std::atomic</code> to hold the pointer</h3>

    <p><a href="https://godbolt.org/z/FZyzJh">Compiler explorer link</a></p>
    
    <pre>
#include &lt;assert.h&gt;
#include &lt;string.h&gt;
#include &lt;stdio.h&gt;
#include &lt;new&gt;
#include &lt;atomic&gt;

struct X {
    int i;
};

int main() {
    X *x= new X{42};
    X *y= nullptr;
    std::atomic&lt;X *&gt; p(x);

    delete x;
    y= new X{99};

    X *temp= y;
    if(!p.compare_exchange_strong(temp, y)) {
        printf("Different address\n");
        return 0;
    }

    assert(x == y);
    assert(y-&gt;i == 99);
    assert(x-&gt;i == 99);
}
    </pre>

    
    <p>This is the same as example 8, except that instead of
      using <code>memcmp</code> to determine the equivalence, we
      use <code>compare_exchange_strong</code>, which compares pointers as if
      with <code>memcmp</code>.</p>

    
    <h3>Example 10: using <code>std::atomic</code> to hold the pointer, comparison the other way round</h3>

    <p><a href="https://godbolt.org/z/Gh8vBC">Compiler explorer link</a></p>
    
    <pre>
#include &lt;assert.h&gt;
#include &lt;string.h&gt;
#include &lt;stdio.h&gt;
#include &lt;new&gt;
#include &lt;atomic&gt;

struct X {
    int i;
};

int main() {
    X *x= new X{42};
    X *y= nullptr;

    delete x;
    y= new X{99};

    std::atomic&lt;X *&gt; p(y);
    if(!p.compare_exchange_strong(x, nullptr)) {
        printf("Different address\n");
        return 0;
    }

    assert(x == y);
    assert(y-&gt;i == 99);
    assert(x-&gt;i == 99);
}
    </pre>

    
    <p>This is the same as example 9, except that rather than comparing
    the <code>temp</code> value copied from <code>y</code> with our stored
    pointer, we store the new value in the atomic, and compare it to our
    original <code>x</code>. This still works because
    <code>compare_exchange_strong</code> compares as if
    using <code>memcmp</code>, so we are comparing the object representation
    of <code>x</code> against the object representation of the copy
    of <code>y</code> stored in <code>p</code>: if the pointers have the same
    object representation then they have the same value representation, so must
    be the same and point to the same object.</p>

    
