<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <meta name="generator" content="pandoc" />
  <title>Supporting offsetof for All Classes</title>
  <style type="text/css">code{white-space: pre;}</style>
  <style type="text/css">
div.sourceCode { overflow-x: auto; }
table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
  margin: 0; padding: 0; vertical-align: baseline; border: none; }
table.sourceCode { width: 100%; line-height: 100%; }
td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
td.sourceCode { padding-left: 5px; }
code > span.kw { color: #007020; font-weight: bold; } /* Keyword */
code > span.dt { color: #902000; } /* DataType */
code > span.dv { color: #40a070; } /* DecVal */
code > span.bn { color: #40a070; } /* BaseN */
code > span.fl { color: #40a070; } /* Float */
code > span.ch { color: #4070a0; } /* Char */
code > span.st { color: #4070a0; } /* String */
code > span.co { color: #60a0b0; font-style: italic; } /* Comment */
code > span.ot { color: #007020; } /* Other */
code > span.al { color: #ff0000; font-weight: bold; } /* Alert */
code > span.fu { color: #06287e; } /* Function */
code > span.er { color: #ff0000; font-weight: bold; } /* Error */
code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
code > span.cn { color: #880000; } /* Constant */
code > span.sc { color: #4070a0; } /* SpecialChar */
code > span.vs { color: #4070a0; } /* VerbatimString */
code > span.ss { color: #bb6688; } /* SpecialString */
code > span.im { } /* Import */
code > span.va { color: #19177c; } /* Variable */
code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code > span.op { color: #666666; } /* Operator */
code > span.bu { } /* BuiltIn */
code > span.ex { } /* Extension */
code > span.pp { color: #bc7a00; } /* Preprocessor */
code > span.at { color: #7d9029; } /* Attribute */
code > span.do { color: #ba2121; font-style: italic; } /* Documentation */
code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
  </style>
  <link href="data:text/css;charset=utf-8,del%20%7B%20text%2Ddecoration%3A%20line%2Dthrough%3B%20color%3A%20%23aa0000%3B%20%7D%0Ains%20%7B%20text%2Ddecoration%3A%20underline%3B%20color%3A%20%2300aa00%3B%20%7D%0A" rel="stylesheet" type="text/css" />
</head>
<body>
<table>
<tr>
<td>
Title:
</td>
<td>
Supporting <tt>offsetof</tt> for All Classes
</td>
</tr>
<tr>
<td>
Document number:
</td>
<td>
P0897R0
</td>
</tr>
<tr>
<td>
Date:
</td>
<td>
2018-01-05
</td>
</tr>
<tr>
<td>
Project:
</td>
<td>
ISO JTC1/SC22/WG21: Programming Language C++
</td>
</tr>
<tr>
<td>
Audience:
</td>
<td>
EWG, LEWG
</td>
</tr>
<tr>
<td>
Reply-to:
</td>
<td>
Andrey Semashev &lt;andrey.semashev at gmail dot com&gt;
</td>
</tr>
</table>
<h1 id="introduction">1. Introduction</h1>
<p>This proposal extends the set of types <code>offsetof</code> is required to support to all classes. It builds upon the feedback received for the <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0545r0.html">P0545R0</a> (Supporting <tt>offsetof</tt> for Stable-layout Classes) proposal. Its presentation in Albuquerque in November 2017 resulted in general acknowledgement that the described problems seemed worth solving, and the subsequent discussion provided guidance toward a more comprehensive solution. The suggested avenues for further development were to broaden the set of types supported by <code>offsetof</code> either by relaxing the standard-layout class restrictions or by specifying <code>offsetof</code> for more categories of types, without introducing new ones. While relaxing the standard-layout class requirements may be useful on its own, it does not coincide with the intention of the original proposal, which was to improve <code>offsetof</code>. Also, changing the specification of standard-layout classes might require coordinating more parts of the standard and could affect compatibility with C. Therefore this proposal takes the second approach. It does not preclude future proposals regarding standard-layout classes.</p>
<p>This paper inherits much of the motivation presented in <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0545r0.html">P0545R0</a> and refers to that document to avoid duplication. Some of the sections from that proposal are updated by this proposal, as noted in the text.</p>
<h1 id="impact-on-the-standard">2. Impact on the Standard</h1>
<p>This proposal is a pure extension to the C++ language and standard library. Its intent is to define behavior that was previously not defined, in a way that most current implementations already work. No existing valid portable code is made invalid or changes its behavior.</p>
<h1 id="design-decisions">3. Design Decisions</h1>
<h2 id="trivially-copyable-and-non-trivially-copyable-types">3.1. Trivially-copyable and Non-trivially-copyable Types</h2>
<p><a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0545r0.html">P0545R0</a>, Sections 4.1 and 4.2, described that trivially-copyable types are already de-facto required to be compatible with <code>offsetof</code> by the current standard. The presence of non-trivial copy constructors and assignment operators also makes no difference to the object data layout in current implementations. This makes those categories of types an easy extension.</p>
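<p>The following is a minimal sketch of the two categories discussed above, not part of the proposed wording. The classes <code>Mixed</code> and <code>MixedNonTrivial</code> and the member function <code>secret_offset_matches</code> are hypothetical names introduced only for illustration. The sketch assumes an implementation that already accepts <code>offsetof</code> for non-standard-layout classes as a conditionally-supported construct, and it silences the related GCC/Clang warning.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// GCC and Clang warn about offsetof on non-standard-layout types, because
// such use is only conditionally-supported today; silence that here.
#if defined(__GNUC__)
#pragma GCC diagnostic ignored "-Winvalid-offsetof"
#endif

// Hypothetical class: mixed access control makes it non-standard-layout,
// yet it remains trivially copyable.
class Mixed
{
public:
    int id = 0;

    // True if offsetof(Mixed, secret) locates `secret` within *this.
    bool secret_offset_matches() const
    {
        const char* base = reinterpret_cast<const char*>(this);
        return base + offsetof(Mixed, secret) == reinterpret_cast<const char*>(&secret);
    }

private:
    double secret = 0.0;
};

// Hypothetical class: same members, plus a user-provided (non-trivial)
// copy constructor.
class MixedNonTrivial
{
public:
    int id = 0;

    MixedNonTrivial() = default;
    MixedNonTrivial(const MixedNonTrivial& other) : id(other.id), secret(other.secret) {}

    // True if offsetof(MixedNonTrivial, secret) locates `secret` within *this.
    bool secret_offset_matches() const
    {
        const char* base = reinterpret_cast<const char*>(this);
        return base + offsetof(MixedNonTrivial, secret) == reinterpret_cast<const char*>(&secret);
    }

private:
    double secret = 0.0;
};

static_assert(std::is_trivially_copyable<Mixed>::value, "Mixed is trivially copyable");
static_assert(!std::is_standard_layout<Mixed>::value, "Mixed is not standard-layout");
static_assert(!std::is_trivially_copyable<MixedNonTrivial>::value, "copy constructor is non-trivial");
```

<p>Calling <code>secret_offset_matches()</code> on an original object and on a copy of it is a simple way to check that the offset does not depend on how the object was constructed.</p>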
<h2 id="classes-with-virtual-functions-andor-virtual-base-classes">3.2. Classes with Virtual Functions and/or Virtual Base Classes</h2>
<p>Unlike <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0545r0.html">P0545R0</a>, this proposal aims to add <code>offsetof</code> support for all classes, including those with virtual functions and virtual base classes. Although such classes are not suitable for data exchange between processes, there may be use cases where <code>offsetof</code> would still be useful. For example, an algorithm supporting heterogeneous input (e.g. elements of different types in a heterogeneous intrusive sequence) could be implemented more efficiently if it had information about offsets to the relevant data in the elements of the sequence. The alternative implementations typically involve some sort of visitation and dynamic dispatch, which is likely more expensive in terms of performance and code size than pointer manipulation using offsets.</p>
<p>The most common implementation of virtual functions involves a virtual function table (vftable), a pointer to which is stored as a hidden data member in class objects. Like virtual functions, virtual base classes also typically introduce a base offset table (vbtable) and a hidden data member that points to it. This member is used to obtain the address of the virtual base class subobject at runtime. When both virtual functions and virtual base classes are used, some compilers may add yet another hidden member (vtordisp). The size and position of these hidden members are constant, so they do not preclude <code>offsetof</code> from working as expected.</p>
<p>It is possible that other implementations of virtual functions or virtual base classes exist. However, the author is not aware of implementations that make non-static data member layout for a given class a runtime property that is not available at compile time. Such a design would complicate ABI definition and likely have significant performance consequences with no apparent benefits. Therefore, the author considers such a design unlikely to be implemented and not worth allowing for, compared to the benefit from the improved support for <code>offsetof</code>.</p>
<p>Given the above, it is possible to calculate the offset of a data member at compile time, provided that the type of the most derived object is known. In this case the compiler has full knowledge of the layout of non-static data members in the object and does not require runtime resolution of the address of the base class subobject. Fortunately, the definition of <code>offsetof</code> given in the <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf">C standard</a> (7.19/3), to which C++ refers in [support.types.layout]/1, is compatible with this precondition:</p>
<blockquote>
<p>
The macros are [...]; and
<pre><code>offsetof(<i>type</i>, <i>member-designator</i>)
</code></pre>
which expands to an integer constant expression that has type <tt>size_t</tt>, the value of which is the offset in bytes, to the structure member (designated by <i>member-designator</i>), from the beginning of its structure (designated by <i>type</i>). The type and member designator shall be such that given
<pre><code>static <i>type</i> t;
</code></pre>
then the expression <tt>&amp;(t.<i>member-designator</i>)</tt> evaluates to an address constant. (If the specified member is a bit-field, the behavior is undefined.)
</p>
</blockquote>
<p>There are no base subobjects in C, so in terms of C++ the structure <em>type</em> can be only the most derived class. Also, the object <code>t</code> in this definition is the complete object of type <em>type</em>.</p>
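<p>As an illustration of the complete-object case, the sketch below applies <code>offsetof</code> to a class with virtual functions and compares the result with the actual member address. The names <code>Shape</code> and <code>scale_offset_matches</code> are hypothetical, and the code assumes an implementation that accepts <code>offsetof</code> for polymorphic classes as a conditionally-supported construct.</p>

```cpp
#include <cassert>
#include <cstddef>

// GCC and Clang warn about offsetof on non-standard-layout types; the use is
// conditionally-supported today, so the warning is silenced for this sketch.
#if defined(__GNUC__)
#pragma GCC diagnostic ignored "-Winvalid-offsetof"
#endif

// Hypothetical polymorphic class. Implementations typically store a hidden
// pointer to the virtual function table somewhere in each object.
struct Shape
{
    Shape() : id(0), scale(1.0) {}
    virtual ~Shape() {}
    virtual double area() const { return 0.0; }

    int id;
    double scale;
};

// Returns true if adding offsetof(Shape, scale) to the address of the given
// Shape object yields the address of its `scale` member.
bool scale_offset_matches(const Shape& s)
{
    const char* base = reinterpret_cast<const char*>(&s);
    return base + offsetof(Shape, scale) == reinterpret_cast<const char*>(&s.scale);
}
```

<p>Because the hidden members have a fixed size and position, the check is expected to hold for every complete object of type <code>Shape</code>, regardless of its storage duration.</p>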
<h2 id="abstract-classes">3.3. Abstract Classes</h2>
<p>Complete objects of abstract classes cannot be created, which puts them in conflict with the literal <code>offsetof</code> definition given in the C standard and quoted above. However, the object that is used in the definition is only given for the purpose of the definition itself; invoking <code>offsetof</code> does not result in creation of the object of class <em>type</em>. The presence of pure virtual functions does not affect data layout compared to regular virtual functions.</p>
<p>The result of <code>offsetof</code> applied to an abstract class can be useful if it can be used to adjust a pointer to a base subobject of that class. For example:</p>
<div class="sourceCode"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span class="kw">struct</span> Base
{
    <span class="dt">int</span> n;

    <span class="kw">virtual</span> <span class="dt">void</span> foo() = <span class="dv">0</span>;
};

<span class="kw">struct</span> Derived1 : <span class="kw">public</span> Base
{
    <span class="dt">float</span> x;

    <span class="dt">void</span> foo() <span class="kw">override</span>;
};

<span class="kw">struct</span> Derived2 : <span class="kw">public</span> Derived1
{
    <span class="bu">std::</span>string str;

    <span class="dt">void</span> foo() <span class="kw">override</span>;
};

Derived1 d1;
Derived2 d2;

<span class="kw">constexpr</span> <span class="bu">std::</span>size_t offset = offsetof(Base, n); <span class="co">// ok, works as if Base::foo was not marked as pure virtual, but only virtual</span>
<span class="dt">int</span>* p1 = <span class="kw">reinterpret_cast</span>&lt;<span class="dt">int</span>*&gt;(<span class="kw">reinterpret_cast</span>&lt;<span class="bu">std::</span>byte*&gt;(<span class="kw">static_cast</span>&lt;Base*&gt;(&amp;d1)) + offset); <span class="co">// points to d1.n</span>
<span class="dt">int</span>* p2 = <span class="kw">reinterpret_cast</span>&lt;<span class="dt">int</span>*&gt;(<span class="kw">reinterpret_cast</span>&lt;<span class="bu">std::</span>byte*&gt;(<span class="kw">static_cast</span>&lt;Base*&gt;(&amp;d2)) + offset); <span class="co">// points to d2.n</span></code></pre></div>
<p>This becomes possible if we require base class subobjects to be contiguous and have the same layout in every most derived object (Section 3.4 expands on this). This requirement is currently fulfilled in every implementation the author is familiar with, unless the base class has virtual base classes itself. In that case the placement of the virtual base subobject in the storage typically depends on the size and layout of the most derived object. But when the most derived object has no non-static data members or other base classes of non-zero size (i.e. its size is equal to the size of the base class subobject), the <code>offsetof</code> result is still valid.</p>
<div class="sourceCode"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span class="kw">struct</span> BaseData
{
    <span class="dt">int</span> n;
};

<span class="kw">struct</span> Base : <span class="kw">virtual</span> <span class="kw">public</span> BaseData
{
    <span class="kw">virtual</span> <span class="dt">void</span> foo() = <span class="dv">0</span>;
};

<span class="kw">struct</span> Good : <span class="kw">public</span> Base
{
    <span class="co">// No non-static data members or base classes other than Base, sizeof(Good) == sizeof(Base)</span>

    <span class="dt">void</span> foo() <span class="kw">override</span>;
};

<span class="kw">struct</span> Bad : <span class="kw">public</span> Base
{
    <span class="dt">float</span> x;

    <span class="dt">void</span> foo() <span class="kw">override</span>;
};

Good g;
Bad b;

<span class="kw">constexpr</span> <span class="bu">std::</span>size_t offset = offsetof(Base, n); <span class="co">// still ok, works as if Base::foo was not marked as pure virtual, but only virtual</span>
<span class="dt">int</span>* p1 = <span class="kw">reinterpret_cast</span>&lt;<span class="dt">int</span>*&gt;(<span class="kw">reinterpret_cast</span>&lt;<span class="bu">std::</span>byte*&gt;(<span class="kw">static_cast</span>&lt;Base*&gt;(&amp;g)) + offset); <span class="co">// ok, points to g.n</span>
<span class="dt">int</span>* p2 = <span class="kw">reinterpret_cast</span>&lt;<span class="dt">int</span>*&gt;(<span class="kw">reinterpret_cast</span>&lt;<span class="bu">std::</span>byte*&gt;(<span class="kw">static_cast</span>&lt;Base*&gt;(&amp;b)) + offset); <span class="co">// bad, does not point to b.n because the location of BaseData may not be adjacent to Base</span></code></pre></div>
<p>The author believes that <code>offsetof</code> is still useful with classes like <code>Good</code>. The broken uses with classes like <code>Bad</code> could potentially be diagnosed with a warning, if the compiler is able to track the actual type of the pointed object. However, the warning is not a requirement imposed by this proposal and is left as an aspect of quality of implementation.</p>
<h2 id="object-storage-and-data-member-layout">3.4. Object Storage and Data Member Layout</h2>
<p>In order for <code>offsetof</code> to be able to operate and provide a useful result, certain requirements must be fulfilled:</p>
<ul>
<li>objects of a class compatible with <code>offsetof</code> must occupy a contiguous region of storage, and</li>
<li>offsets to non-static data members of such class must be constant and not depend on a particular most derived object of that class (this proposal refers to this property as <em>offset stability</em> or <em>stability of offsets</em>).</li>
</ul>
<p>The first requirement makes it possible to obtain the offset information and then apply it to an object, and the second requirement guarantees that the information can be applied to any most derived object of that class.</p>
<p>Currently, the standard guarantees contiguous storage only for objects of trivially copyable or standard-layout types ([intro.object]/7). This proposal requires contiguous storage for objects of any type, except for objects that have virtual base class subobjects. Virtual base class subobjects may occupy storage bytes that are not immediately adjacent to the containing object, if that object is also a base class subobject. For example:</p>
<div class="sourceCode"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span class="kw">struct</span> X
{
    <span class="dt">int</span> a;
};

<span class="kw">struct</span> V : <span class="kw">virtual</span> <span class="kw">public</span> X
{
    <span class="dt">int</span> b;
};

<span class="kw">struct</span> Y : <span class="kw">public</span> V
{
    <span class="dt">int</span> c;
};

V v;
Y y;</code></pre></div>
<p>Both <code>v</code> and <code>y</code> are required to occupy contiguous bytes of storage, but the subobject of <code>y</code> that represents the base class <code>V</code> may not be stored contiguously if the implementation chooses the following data layout:</p>
<pre><code>-----
|*V*|   includes V::b
-----
| Y |   includes Y::c
-----
|*X*|   includes X::a
-----</code></pre>
<p>In this layout, the storage used by the base class <code>V</code> subobject is marked with asterisks (*).</p>
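<p>The effect of such a layout can be observed with a small sketch that reuses the classes <code>X</code>, <code>V</code> and <code>Y</code> from the example above. The helper <code>distance_to_a</code> is a hypothetical name; it measures how far <code>X::a</code> lies from the start of a <code>V</code> object or <code>V</code> base class subobject. The sketch assumes, as is the case in the implementations discussed in this paper, that the virtual base class subobject is stored within the storage of the complete object.</p>

```cpp
#include <cassert>
#include <cstddef>

struct X
{
    int a;
};

struct V : virtual public X
{
    int b;
};

struct Y : public V
{
    int c;
};

// Distance in bytes from the start of the V (sub)object to X::a.
// `obj` may be a complete V object or the V base class subobject of a Y.
// The address of `a` is resolved at run time through the virtual base.
std::ptrdiff_t distance_to_a(const V& obj)
{
    return reinterpret_cast<const char*>(&obj.a) - reinterpret_cast<const char*>(&obj);
}
```

<p>For <code>V v; Y y;</code> the values <code>distance_to_a(v)</code> and <code>distance_to_a(y)</code> may differ, because in <code>y</code> the member <code>Y::c</code> may be placed between the rest of the <code>V</code> subobject and the <code>X</code> subobject. This is why the offset of a virtual base class member is only meaningful relative to a known most derived class.</p>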
<p>Note also that the requirement of contiguous storage does not mean that any implementation-specific data such as virtual function tables or virtual base displacement tables need to be placed in the object's storage. These kinds of data do not contribute to the object's value representation ([basic.types]/4) and therefore don't need to occupy the object storage.</p>
<p>The standard has no explicit provision about stability of offsets of non-static data members across different objects of a class. For trivially copyable classes the data member offsets are required to be stable, as discussed in <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0545r0.html">P0545R0</a>. For standard-layout classes the current definition of <code>offsetof</code> implies that members of such classes must have stable offsets. By extending <code>offsetof</code> to support other class types, this proposal requires that member offsets are stable for all classes.</p>
<h2 id="reference-data-members">3.5. Reference Data Members</h2>
<p>The definition of <code>offsetof(type, member-designator)</code> in the C standard only requires <code>&amp;(t.member-designator)</code> to evaluate to an address constant (here, <code>t</code> is an object of class <code>type</code>). Since this proposal allows using <code>offsetof</code> with non-trivial types, <code>member-designator</code> can now identify a reference member. Taking the address of a reference member would result in the address of the referred object, which is not the intended effect of <code>offsetof</code>. For this reason, this proposal prohibits the use of references in a <code>member-designator</code>; the behavior is undefined if this requirement is violated. Note that this rule applies if references are used at any level of <code>member-designator</code>, if it identifies a nested member. For example:</p>
<div class="sourceCode"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span class="kw">struct</span> A
{
    <span class="dt">int</span> a;
};

<span class="kw">struct</span> Bad
{
    <span class="dt">int</span>&amp; n;
    A&amp; a;
    <span class="dt">int</span> x;
};

offsetof(Bad, n);    <span class="co">// undefined behavior, Bad::n is a reference</span>
offsetof(Bad, a.a);  <span class="co">// undefined behavior, Bad::a is a reference</span>
offsetof(Bad, x);    <span class="co">// ok, returns offset of Bad::x</span></code></pre></div>
<h1 id="impact-on-existing-implementations">4. Impact on Existing Implementations</h1>
<p>The currently widespread compilers (GCC 7.2, Clang 4.0.1, MSVC 19.12.25831) all support <code>offsetof</code> for trivially copyable and non-trivially copyable classes, as well as for classes with virtual functions (including pure virtual functions), although some of the compilers emit warnings, as currently required by <a href="http://open-std.org/JTC1/SC22/WG21/docs/papers/2017/n4713.pdf">N4713</a>. This proposal makes all classes supported unconditionally, which will remove the need for a diagnostic.</p>
<p>The tested compilers also have limited support for <code>offsetof</code> applied to classes that have virtual base classes.</p>
<div class="sourceCode"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span class="kw">struct</span> A
{
    <span class="dt">int</span> a;
};

<span class="kw">struct</span> D : <span class="kw">virtual</span> <span class="kw">public</span> A
{
    <span class="dt">int</span> c;
    <span class="dt">char</span> str[<span class="dv">1024</span>];
};

offsetof(D, a);       <span class="co">// (1)</span>
offsetof(D, c);       <span class="co">// (2)</span>
offsetof(D, str);     <span class="co">// (3)</span>
offsetof(D, str[<span class="dv">10</span>]); <span class="co">// (4)</span></code></pre></div>
<p>In the above example, all three tested compilers were able to compile all <code>offsetof</code> expressions, except (1), which is the only one that requests an offset of a member of a virtual base class. The updated compilers would have to be able to calculate the offset to members of virtual base classes at compile time.</p>
<p>Other than case (1), all tested compilers appeared to return values that are aligned with this proposal (i.e. the returned values were equal to the offset to the requested member in a <em>complete</em> object of the specified type). Therefore, even programs that previously relied on implementation-specific behavior of <code>offsetof</code> are unlikely to be affected.</p>
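<p>A check of this kind can be sketched as follows. The classes <code>A</code> and <code>D</code> repeat the example above, while <code>distance_in</code> and <code>offsets_match_complete_object</code> are hypothetical helper names. The sketch covers only cases (2) to (4) and assumes a compiler that accepts them, as the tested compilers did; it is not the test program used by the author.</p>

```cpp
#include <cassert>
#include <cstddef>

// GCC and Clang warn about offsetof on non-standard-layout types; the use is
// conditionally-supported today, so the warning is silenced for this sketch.
#if defined(__GNUC__)
#pragma GCC diagnostic ignored "-Winvalid-offsetof"
#endif

struct A
{
    int a;
};

struct D : virtual public A
{
    int c;
    char str[1024];
};

// Byte distance from the start of the complete object `d` to `member`,
// which must point to a member stored inside `d`.
std::size_t distance_in(const D& d, const void* member)
{
    return static_cast<std::size_t>(
        static_cast<const char*>(member) - reinterpret_cast<const char*>(&d));
}

// Compares the offsetof results for cases (2), (3) and (4) with the actual
// member positions in a complete object of type D.
bool offsets_match_complete_object()
{
    D d;
    return offsetof(D, c) == distance_in(d, &d.c)
        && offsetof(D, str) == distance_in(d, &d.str)
        && offsetof(D, str[10]) == distance_in(d, &d.str[10])
        && offsetof(D, str[10]) == offsetof(D, str) + 10;
}
```

<p>Case (1) is deliberately left out, since the tested compilers reject it.</p>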
<h1 id="proposed-wording">5. Proposed Wording</h1>
<p>The proposed wording below is given relative to <a href="http://open-std.org/JTC1/SC22/WG21/docs/papers/2017/n4713.pdf">N4713</a>. Inserted text is marked like <ins>this</ins>, removed text is marked like <del>this</del>.</p>
<h2 id="core-wording">5.1. Core Wording</h2>
<p>Modify [intro.object]/7:</p>
<blockquote>
<p>
Unless it is a bit-field (12.2.4), a most derived object shall have a nonzero size and shall occupy one or more bytes of storage. Base class subobjects may have zero size. A<del>n</del><ins> most derived</ins> object <del>of trivially copyable or standard-layout type (6.7)</del><ins>or a base class subobject of nonzero size excluding any of its virtual base class (13.1) subobjects</ins> shall occupy contiguous bytes of storage.<ins> A virtual base class subobject may occupy a distinct non-adjacent contiguous region of storage from the rest of its containing base class subobject. Each such region of storage shall be within the enclosing contiguous region of storage occupied by the containing most derived object.</ins>
</p>
</blockquote>
<h2 id="library-wording">5.2. Library Wording</h2>
<p>Modify [support.types.layout]/1:</p>
<blockquote>
<p>
The macro <tt>offsetof</tt>(<i>type</i>, <i>member-designator</i>) has the same semantics as the corresponding macro in the C standard library header <tt>&lt;stddef.h&gt;</tt>, <del>but accepts a restricted set of <i>type</i> arguments in this International Standard. Use of the <tt>offsetof</tt> macro with a <i>type</i> other than a standard-layout class (Clause 12) is conditionally-supported</del><ins>with the following additions</ins>. <ins>The argument <i>type</i> is interpreted as the most derived class (6.6.2). <i>[Note:</i> For the purpose of this definition, all <i>pure-specifiers</i> (12.2) in member function declarations of <i>type</i> and its base classes are ignored. <i>&#8212; end note]</i> </ins>The expression <tt>offsetof</tt>(<i>type</i>, <i>member-designator</i>) is never type-dependent (17.7.2.2) and it is value-dependent (17.7.2.3) if and only if <i>type</i> is dependent. The result of applying the <tt>offsetof</tt> macro to a static data member or a function member is undefined.<ins> Given an object <tt>t</tt> of type <i>type</i>, if <tt>t.<i>member-designator</i></tt> accesses or identifies a reference data member, the result of the <tt>offsetof</tt> expression is undefined.</ins> No operation invoked by the <tt>offsetof</tt> macro shall throw an exception and <tt>noexcept(offsetof(<i>type</i>, <i>member-designator</i>))</tt> shall be <tt>true</tt>.
</p>
</blockquote>
<h1 id="acknowledgements">6. Acknowledgements</h1>
<ul>
<li>Thanks to Walter E. Brown for the help with preparing and presenting the proposal.</li>
</ul>
<h1 id="references">7. References</h1>
<ul>
<li>P0545R0 proposal, Supporting <tt>offsetof</tt> for Stable-layout Classes (<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0545r0.html" class="uri">http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0545r0.html</a>)</li>
<li>N4713 Working Draft, Standard for Programming Language C++ (<a href="http://open-std.org/JTC1/SC22/WG21/docs/papers/2017/n4713.pdf" class="uri">http://open-std.org/JTC1/SC22/WG21/docs/papers/2017/n4713.pdf</a>)</li>
<li>N1570 Working Draft, Programming languages &#8208; C (<a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf" class="uri">http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf</a>)</li>
</ul>
</body>
</html>
