<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang xml:lang>
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="mpark/wg21" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
  <meta name="dcterms.date" content="2025-01-11" />
  <title>Make idiomatic usage of offsetof well-defined</title>
  <style>
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
div.csl-block{margin-left: 1.5em;}
ul.task-list{list-style: none;}
pre > code.sourceCode { white-space: pre; position: relative; }
pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
pre > code.sourceCode > span:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
pre > code.sourceCode { white-space: pre-wrap; }
pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
{ counter-reset: source-line 0; }
pre.numberSource code > span
{ position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
{ content: counter(source-line);
position: relative; left: -1em; text-align: right; vertical-align: baseline;
border: none; display: inline-block;
-webkit-touch-callout: none; -webkit-user-select: none;
-khtml-user-select: none; -moz-user-select: none;
-ms-user-select: none; user-select: none;
padding: 0 4px; width: 4em;
color: #aaaaaa;
}
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; }
div.sourceCode
{ background-color: #f6f8fa; }
@media screen {
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
code span { } 
code span.al { color: #ff0000; } 
code span.an { } 
code span.at { } 
code span.bn { color: #9f6807; } 
code span.bu { color: #9f6807; } 
code span.cf { color: #00607c; } 
code span.ch { color: #9f6807; } 
code span.cn { } 
code span.co { color: #008000; font-style: italic; } 
code span.cv { color: #008000; font-style: italic; } 
code span.do { color: #008000; } 
code span.dt { color: #00607c; } 
code span.dv { color: #9f6807; } 
code span.er { color: #ff0000; font-weight: bold; } 
code span.ex { } 
code span.fl { color: #9f6807; } 
code span.fu { } 
code span.im { } 
code span.in { color: #008000; } 
code span.kw { color: #00607c; } 
code span.op { color: #af1915; } 
code span.ot { } 
code span.pp { color: #6f4e37; } 
code span.re { } 
code span.sc { color: #9f6807; } 
code span.ss { color: #9f6807; } 
code span.st { color: #9f6807; } 
code span.va { } 
code span.vs { color: #9f6807; } 
code span.wa { color: #008000; font-weight: bold; } 
code.diff {color: #898887}
code.diff span.va {color: #006e28}
code.diff span.st {color: #bf0303}
</style>
  <style type="text/css">
body {
margin: 5em;
font-family: serif;

hyphens: auto;
line-height: 1.35;
text-align: justify;
}
@media screen and (max-width: 30em) {
body {
margin: 1.5em;
}
}
div.wrapper {
max-width: 60em;
margin: auto;
}

ul {
list-style-type: none;
padding-left: 2.5em;
margin-top: -0.2em;
margin-bottom: -0.2em;
}
ol {
padding-left: 2.5em;
}
a {
text-decoration: none;
color: #4183C4;
}
a.hidden_link {
text-decoration: none;
color: inherit;
}
li {
margin-top: 0.6em;
margin-bottom: 0.6em;
}
h1, h2, h3, h4 {
position: relative;
line-height: 1;
}
a.self-link {
position: absolute;
top: 0;
left: calc(-1 * (3.5rem - 26px));
width: calc(3.5rem - 26px);
height: 2em;
text-align: center;
border: none;
transition: opacity .2s;
opacity: .5;
font-family: sans-serif;
font-weight: normal;
font-size: 83%;
}
a.self-link:hover { opacity: 1; }
a.self-link::before { content: "§"; }
ul > li:before {
content: "\2014";
position: absolute;
margin-left: -1.5em;
}

#TOC ul > li:before {
content: none;
}
#TOC > ul {
padding-left: 0;
}

.toc-section-number {
margin-right: 0.5em;
}
:target { background-color: #C9FBC9; }
:target .codeblock { background-color: #C9FBC9; }
:target ul { background-color: #C9FBC9; }
.abbr_ref { float: right; }
.folded_abbr_ref { float: right; }
:target .folded_abbr_ref { display: none; }
:target .unfolded_abbr_ref { float: right; display: inherit; }
.unfolded_abbr_ref { display: none; }
.secnum { display: inline-block; min-width: 35pt; }
.header-section-number { display: inline-block; min-width: 35pt; }
.annexnum { display: block; }
div.sourceLinkParent {
float: right;
}
a.sourceLink {
position: absolute;
opacity: 0;
margin-left: 10pt;
}
a.sourceLink:hover {
opacity: 1;
}
a.itemDeclLink {
position: absolute;
font-size: 75%;
text-align: right;
width: 5em;
opacity: 0;
}
a.itemDeclLink:hover { opacity: 1; }
span.marginalizedparent {
position: relative;
left: -5em;
}
li span.marginalizedparent { left: -7em; }
li ul > li span.marginalizedparent { left: -9em; }
li ul > li ul > li span.marginalizedparent { left: -11em; }
li ul > li ul > li ul > li span.marginalizedparent { left: -13em; }
div.footnoteNumberParent {
position: relative;
left: -4.7em;
}
a.marginalized {
position: absolute;
font-size: 75%;
text-align: right;
width: 5em;
}
a.enumerated_item_num {
position: relative;
left: -3.5em;
display: inline-block;
margin-right: -3em;
text-align: right;
width: 3em;
}
div.para { margin-bottom: 0.6em; margin-top: 0.6em; text-align: justify; }
div.section { text-align: justify; }
div.sentence { display: inline; }
span.indexparent {
display: inline;
position: relative;
float: right;
right: -1em;
}
a.index {
position: absolute;
display: none;
}
a.index:before { content: "⟵"; }

a.index:target {
display: inline;
}
.indexitems {
margin-left: 2em;
text-indent: -2em;
}
div.itemdescr {
margin-left: 3em;
}
.bnf {
font-family: serif;
margin-left: 40pt;
margin-top: 0.5em;
margin-bottom: 0.5em;
}
.ncbnf {
font-family: serif;
margin-top: 0.5em;
margin-bottom: 0.5em;
margin-left: 40pt;
}
.ncsimplebnf {
font-family: serif;
font-style: italic;
margin-top: 0.5em;
margin-bottom: 0.5em;
margin-left: 40pt;
background: inherit; 
}
span.textnormal {
font-style: normal;
font-family: serif;
white-space: normal;
display: inline-block;
}
span.rlap {
display: inline-block;
width: 0px;
}
span.descr { font-style: normal; font-family: serif; }
span.grammarterm { font-style: italic; }
span.term { font-style: italic; }
span.terminal { font-family: monospace; font-style: normal; }
span.nonterminal { font-style: italic; }
span.tcode { font-family: monospace; font-style: normal; }
span.textbf { font-weight: bold; }
span.textsc { font-variant: small-caps; }
a.nontermdef { font-style: italic; font-family: serif; }
span.emph { font-style: italic; }
span.techterm { font-style: italic; }
span.mathit { font-style: italic; }
span.mathsf { font-family: sans-serif; }
span.mathrm { font-family: serif; font-style: normal; }
span.textrm { font-family: serif; }
span.textsl { font-style: italic; }
span.mathtt { font-family: monospace; font-style: normal; }
span.mbox { font-family: serif; font-style: normal; }
span.ungap { display: inline-block; width: 2pt; }
span.textit { font-style: italic; }
span.texttt { font-family: monospace; }
span.tcode_in_codeblock { font-family: monospace; font-style: normal; }
span.phantom { color: white; }

span.math { font-style: normal; }
span.mathblock {
display: block;
margin-left: auto;
margin-right: auto;
margin-top: 1.2em;
margin-bottom: 1.2em;
text-align: center;
}
span.mathalpha {
font-style: italic;
}
span.synopsis {
font-weight: bold;
margin-top: 0.5em;
display: block;
}
span.definition {
font-weight: bold;
display: block;
}
.codeblock {
margin-left: 1.2em;
line-height: 127%;
}
.outputblock {
margin-left: 1.2em;
line-height: 127%;
}
div.itemdecl {
margin-top: 2ex;
}
code.itemdeclcode {
white-space: pre;
display: block;
}
span.textsuperscript {
vertical-align: super;
font-size: smaller;
line-height: 0;
}
.footnotenum { vertical-align: super; font-size: smaller; line-height: 0; }
.footnote {
font-size: small;
margin-left: 2em;
margin-right: 2em;
margin-top: 0.6em;
margin-bottom: 0.6em;
}
div.minipage {
display: inline-block;
margin-right: 3em;
}
div.numberedTable {
text-align: center;
margin: 2em;
}
div.figure {
text-align: center;
margin: 2em;
}
table {
border: 1px solid black;
border-collapse: collapse;
margin-left: auto;
margin-right: auto;
margin-top: 0.8em;
text-align: left;
hyphens: none; 
}
td, th {
padding-left: 1em;
padding-right: 1em;
vertical-align: top;
}
td.empty {
padding: 0px;
padding-left: 1px;
}
td.left {
text-align: left;
}
td.right {
text-align: right;
}
td.center {
text-align: center;
}
td.justify {
text-align: justify;
}
td.border {
border-left: 1px solid black;
}
tr.rowsep, td.cline {
border-top: 1px solid black;
}
tr.even, tr.odd {
border-bottom: 1px solid black;
}
tr.capsep {
border-top: 3px solid black;
border-top-style: double;
}
tr.header {
border-bottom: 3px solid black;
border-bottom-style: double;
}
th {
border-bottom: 1px solid black;
}
span.centry {
font-weight: bold;
}
div.table {
display: block;
margin-left: auto;
margin-right: auto;
text-align: center;
width: 90%;
}
span.indented {
display: block;
margin-left: 2em;
margin-bottom: 1em;
margin-top: 1em;
}
ol.enumeratea { list-style-type: none; background: inherit; }
ol.enumerate { list-style-type: none; background: inherit; }

code.sourceCode > span { display: inline; }
</style>
  <link href="data:image/x-icon;base64,AAABAAIAEBAAAAEAIABoBAAAJgAAACAgAAABACAAqBAAAI4EAAAoAAAAEAAAACAAAAABACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA////AIJEAACCRAAAgkQAAIJEAACCRAAAgkQAVoJEAN6CRADegkQAWIJEAACCRAAAgkQAAIJEAACCRAAA////AP///wCCRAAAgkQAAIJEAACCRAAsgkQAvoJEAP+CRAD/gkQA/4JEAP+CRADAgkQALoJEAACCRAAAgkQAAP///wD///8AgkQAAIJEABSCRACSgkQA/IJEAP99PQD/dzMA/3czAP99PQD/gkQA/4JEAPyCRACUgkQAFIJEAAD///8A////AHw+AFiBQwDqgkQA/4BBAP9/PxP/uZd6/9rJtf/bybX/upd7/39AFP+AQQD/gkQA/4FDAOqAQgBc////AP///wDKklv4jlEa/3o7AP+PWC//8+3o///////////////////////z7un/kFox/35AAP+GRwD/mVYA+v///wD///8A0Zpk+NmibP+0d0T/8evj///////+/fv/1sKz/9bCs//9/fr//////+/m2/+NRwL/nloA/5xYAPj///8A////ANKaZPjRmGH/5cKh////////////k149/3UwAP91MQD/lmQ//86rhv+USg3/m1YA/5hSAP+bVgD4////AP///wDSmmT4zpJY/+/bx///////8+TV/8mLT/+TVx//gkIA/5lVAP+VTAD/x6B//7aEVv/JpH7/s39J+P///wD///8A0ppk+M6SWP/u2sf///////Pj1f/Nj1T/2KFs/8mOUv+eWhD/lEsA/8aee/+0glT/x6F7/7J8Rvj///8A////ANKaZPjRmGH/48Cf///////+/v7/2qt//82PVP/OkFX/37KJ/86siv+USg7/mVQA/5hRAP+bVgD4////AP///wDSmmT40ppk/9CVXP/69O////////7+/v/x4M//8d/P//7+/f//////9u7n/6tnJf+XUgD/nFgA+P///wD///8A0ppk+NKaZP/RmWL/1qNy//r07///////////////////////+vXw/9akdP/Wnmn/y5FY/6JfFvj///8A////ANKaZFTSmmTo0ppk/9GYYv/Ql1//5cWm//Hg0P/x4ND/5cWm/9GXYP/RmGH/0ppk/9KaZOjVnmpY////AP///wDSmmQA0ppkEtKaZI7SmmT60ppk/9CWX//OkVb/zpFW/9CWX//SmmT/0ppk/NKaZJDSmmQS0ppkAP///wD///8A0ppkANKaZADSmmQA0ppkKtKaZLrSmmT/0ppk/9KaZP/SmmT/0ppkvNKaZCrSmmQA0ppkANKaZAD///8A////ANKaZADSmmQA0ppkANKaZADSmmQA0ppkUtKaZNzSmmTc0ppkVNKaZADSmmQA0ppkANKaZADSmmQA////AP5/AAD4HwAA4AcAAMADAACAAQAAgAEAAIABAACAAQAAgAEAAIABAACAAQAAgAEAAMADAADgBwAA+B8AAP5/AAAoAAAAIAAAAEAAAAABACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA////AP///wCCRAAAgkQAAIJEAACCRAAAgkQAAIJEAACCRAAAgkQAAIJEAACCRAAAgkQAAIJEAAyCRACMgkQA6oJEAOqCRACQgkQAEIJEAACCRAAAgkQAAIJEAACCRAAAgkQAAIJEAACCRAAAgkQAAIJEAACCRAAA////AP///wD///8A////AIJEAACCRAAAgkQAAIJEAACCRAAAgkQAAIJEAACCRAAAgkQAAIJEAACCRABigkQA5oJEAP+CRAD/gkQA/4JEAP+CRADqgkQAZoJEAACCRAAAgkQAAIJEAACCRAAAgkQAAIJEAACCRAAAgkQAAIJEAAD///8A////AP///wD///8Ag
kQAAIJEAACCRAAAgkQAAIJEAACCRAAAgkQAAIJEAACCRAA4gkQAwoJEAP+CRAD/gkQA/4JEAP+CRAD/gkQA/4JEAP+CRAD/gkQAxIJEADyCRAAAgkQAAIJEAACCRAAAgkQAAIJEAACCRAAAgkQAAP///wD///8A////AP///wCCRAAAgkQAAIJEAACCRAAAgkQAAIJEAACCRAAWgkQAmIJEAP+CRAD/gkQA/4JEAP+CRAD/gkQA/4JEAP+CRAD/gkQA/4JEAP+CRAD/gkQA/4JEAJyCRAAYgkQAAIJEAACCRAAAgkQAAIJEAACCRAAA////AP///wD///8A////AIJEAACCRAAAgkQAAIJEAACCRAAAgkQAdIJEAPCCRAD/gkQA/4JEAP+CRAD/gkQA/4JEAP+CRAD/gkQA/4JEAP+CRAD/gkQA/4JEAP+CRAD/gkQA/4JEAPSCRAB4gkQAAIJEAACCRAAAgkQAAIJEAAD///8A////AP///wD///8AgkQAAIJEAACCRAAAgkQASoJEANKCRAD/gkQA/4JEAP+CRAD/g0YA/39AAP9zLgD/bSQA/2shAP9rIQD/bSQA/3MuAP9/PwD/g0YA/4JEAP+CRAD/gkQA/4JEAP+CRADUgkQAToJEAACCRAAAgkQAAP///wD///8A////AP///wB+PwAAgkUAIoJEAKiCRAD/gkQA/4JEAP+CRAD/hEcA/4BBAP9sIwD/dTAA/5RfKv+viF7/vp56/76ee/+wiF7/lWAr/3YxAP9sIwD/f0AA/4RHAP+CRAD/gkQA/4JEAP+CRAD/gkQArIJEACaBQwAA////AP///wD///8A////AIBCAEBzNAD6f0EA/4NFAP+CRAD/gkQA/4VIAP92MwD/bSUA/6N1Tv/ezsL/////////////////////////////////38/D/6V3Uv9uJgD/dTEA/4VJAP+CRAD/gkQA/4JEAP+BQwD/fUAA/4FDAEj///8A////AP///wD///8AzJRd5qBlKf91NgD/dDUA/4JEAP+FSQD/cy4A/3YyAP/PuKP//////////////////////////////////////////////////////9K7qP94NQD/ciwA/4VJAP+CRAD/fkEA/35BAP+LSwD/mlYA6v///wD///8A////AP///wDdpnL/4qx3/8KJUv+PUhf/cTMA/3AsAP90LgD/4dK+/////////////////////////////////////////////////////////////////+TYxf91MAD/dTIA/31CAP+GRwD/llQA/6FcAP+gWwD8////AP///wD///8A////ANGZY/LSm2X/4ap3/92mcP+wdT3/byQA/8mwj////////////////////////////////////////////////////////////////////////////+LYxv9zLgP/jUoA/59bAP+hXAD/nFgA/5xYAPL///8A////AP///wD///8A0ppk8tKaZP/RmWL/1p9q/9ubXv/XqXj////////////////////////////7+fD/vZyG/6BxS/+gcUr/vJuE//r37f//////////////////////3MOr/5dQBf+dVQD/nVkA/5xYAP+cWAD/nFgA8v///wD///8A////AP///wDSmmTy0ppk/9KaZP/SmWP/yohJ//jo2P//////////////////////4NTG/4JDFf9lGAD/bSQA/20kAP9kGAD/fz8S/+Xb0f//////5NG9/6txN/+LOgD/m1QA/51aAP+cWAD/m1cA/5xYAP+cWADy////AP///wD///8A////ANKaZPLSmmT/0ppk/8+TWf/Unmv//v37//////////////////////+TWRr/VwsA/35AAP+ERgD/g0UA/4JGAP9lHgD/kFga/8KXX/+TRwD/jT4A/49CAP+VTQD/n
10A/5xYAP+OQQD/lk4A/55cAPL///8A////AP///wD///8A0ppk8tKaZP/SmmT/y4tO/92yiP//////////////////////8NnE/8eCQP+rcTT/ez0A/3IyAP98PgD/gEMA/5FSAP+USwD/jj8A/5lUAP+JNwD/yqV2/694Mf+HNQD/jkAA/82rf/+laBj/jT4A8v///wD///8A////AP///wDSmmTy0ppk/9KaZP/LiUr/4byY///////////////////////gupX/0I5P/+Wuev/Lklz/l1sj/308AP+QSwD/ol0A/59aAP+aVQD/k0oA/8yoh///////+fXv/6pwO//Lp3v///////Pr4f+oay7y////AP///wD///8A////ANKaZPLSmmT/0ppk/8uJSv/hvJj//////////////////////+G7l//Jhkb/0ppk/96nc//fqXX/x4xO/6dkFP+QSQD/llEA/5xXAP+USgD/yaOA///////38uv/qG05/8ijdv//////8efb/6ZpLPL///8A////AP///wD///8A0ppk8tKaZP/SmmT/zIxO/9yxh///////////////////////7dbA/8iEQf/Sm2X/0Zlj/9ScZv/eqHf/2KJv/7yAQf+XTgD/iToA/5lSAP+JNgD/yKFv/611LP+HNQD/jT8A/8qmeP+kZRT/jT4A8v///wD///8A////AP///wDSmmTy0ppk/9KaZP/Pk1n/1J5q//78+//////////////////+/fv/1aFv/8iEQv/Tm2b/0ppl/9GZY//Wn2z/1pZc/9eldf/Bl2b/kUcA/4w9AP+OQAD/lUwA/59eAP+cWQD/jT8A/5ZOAP+eXADy////AP///wD///8A////ANKaZPLSmmT/0ppk/9KZY//KiEn/8d/P///////////////////////47+f/05tm/8iCP//KiEj/yohJ/8eCP//RmGH//vfy///////n1sP/rXQ7/4k4AP+TTAD/nVoA/5xYAP+cVwD/nFgA/5xYAPL///8A////AP///wD///8A0ppk8tKaZP/SmmT/0ptl/8uLTf/aq37////////////////////////////+/fz/6c2y/961jv/etY7/6Myx//78+v//////////////////////3MWv/5xXD/+ORAD/mFQA/51ZAP+cWAD/nFgA8v///wD///8A////AP///wDSmmTy0ppk/9KaZP/SmmT/0ppk/8mFRP/s1b//////////////////////////////////////////////////////////////////////////////+PD/0JFU/7NzMv+WUQD/kUsA/5tXAP+dWQDy////AP///wD///8A////ANKaZP/SmmT/0ppk/9KaZP/Sm2X/z5NZ/8yMT//z5NX/////////////////////////////////////////////////////////////////9Ofa/8yNUP/UmGH/36p5/8yTWv+qaSD/kksA/5ROAPz///8A////AP///wD///8A0ppk5NKaZP/SmmT/0ppk/9KaZP/TnGf/zY9T/82OUv/t1sD//////////////////////////////////////////////////////+7Yw//OkFX/zI5R/9OcZ//SmmP/26V0/9ymdf/BhUf/ol8R6P///wD///8A////AP///wDSmmQ80ppk9tKaZP/SmmT/0ppk/9KaZP/TnGj/zpFW/8qJSv/dson/8uHS//////////////////////////////////Lj0//etIv/y4lL/86QVf/TnGj/0ppk/9KaZP/RmWP/05xn/9ymdfjUnWdC////AP///wD///8A////ANKaZADSmmQc0ppkotKaZP/SmmT/0ppk/9KaZP/Tm2b/0Zli/8qJSf/NjlH/16Z3/+G8mP/myKr/5
siq/+G8mP/Xp3f/zY5S/8qISf/RmGH/05tm/9KaZP/SmmT/0ppk/9KaZP/SmmSm0pljINWdaQD///8A////AP///wD///8A0ppkANKaZADSmmQA0ppkQtKaZMrSmmT/0ppk/9KaZP/SmmT/0ptl/9GYYf/Nj1P/y4lL/8qISP/KiEj/y4lK/82PU//RmGH/0ptl/9KaZP/SmmT/0ppk/9KaZP/SmmTO0ppkRtKaZADSmmQA0ppkAP///wD///8A////AP///wDSmmQA0ppkANKaZADSmmQA0ppkANKaZGzSmmTu0ppk/9KaZP/SmmT/0ppk/9KaZP/SmmT/0ppk/9KaZP/SmmT/0ppk/9KaZP/SmmT/0ppk/9KaZP/SmmTw0ppkcNKaZADSmmQA0ppkANKaZADSmmQA////AP///wD///8A////ANKaZADSmmQA0ppkANKaZADSmmQA0ppkANKaZBLSmmSQ0ppk/9KaZP/SmmT/0ppk/9KaZP/SmmT/0ppk/9KaZP/SmmT/0ppk/9KaZP/SmmT/0ppklNKaZBTSmmQA0ppkANKaZADSmmQA0ppkANKaZAD///8A////AP///wD///8A0ppkANKaZADSmmQA0ppkANKaZADSmmQA0ppkANKaZADSmmQy0ppkutKaZP/SmmT/0ppk/9KaZP/SmmT/0ppk/9KaZP/SmmT/0ppkvtKaZDbSmmQA0ppkANKaZADSmmQA0ppkANKaZADSmmQA0ppkAP///wD///8A////AP///wDSmmQA0ppkANKaZADSmmQA0ppkANKaZADSmmQA0ppkANKaZADSmmQA0ppkXNKaZODSmmT/0ppk/9KaZP/SmmT/0ppk5NKaZGDSmmQA0ppkANKaZADSmmQA0ppkANKaZADSmmQA0ppkANKaZADSmmQA////AP///wD///8A////ANKaZADSmmQA0ppkANKaZADSmmQA0ppkANKaZADSmmQA0ppkANKaZADSmmQA0ppkBtKaZIbSmmTo0ppk6tKaZIrSmmQK0ppkANKaZADSmmQA0ppkANKaZADSmmQA0ppkANKaZADSmmQA0ppkANKaZAD///8A////AP/8P///+B///+AH//+AAf//AAD//AAAP/AAAA/gAAAHwAAAA8AAAAPAAAADwAAAA8AAAAPAAAADwAAAA8AAAAPAAAADwAAAA8AAAAPAAAADwAAAA8AAAAPAAAADwAAAA+AAAAfwAAAP/AAAP/8AAP//gAH//+AH///4H////D//" rel="icon" />
  
  <!--[if lt IE 9]>
    <script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
  <![endif]-->
</head>
<body>
<div class="wrapper">
<header id="title-block-header">
<h1 class="title" style="text-align:center">Make idiomatic usage of
<code class="sourceCode cpp">offsetof</code> well-defined</h1>
<table style="border:none;float:right">
  <tr>
    <td>Document #:</td>
    <td>P3407R1</td>
  </tr>
  <tr>
    <td>Date:</td>
    <td>2025-01-11</td>
  </tr>
  <tr>
    <td style="vertical-align:top">Project:</td>
    <td>Programming Language C++</td>
  </tr>
  <tr>
    <td style="vertical-align:top">Audience:</td>
    <td>
      EWG<br>
    </td>
  </tr>
  <tr>
    <td style="vertical-align:top">Reply-to:</td>
    <td>
      Brian Bi<br>&lt;<a href="mailto:bbi10@bloomberg.net" class="email">bbi10@bloomberg.net</a>&gt;<br>
    </td>
  </tr>
</table>
</header>
<div style="clear:both">
<div id="TOC" role="doc-toc">
<h1 id="toctitle">Contents</h1>
<ul>
<li><a href="#abstract" id="toc-abstract"><span class="toc-section-number">1</span> Abstract</a></li>
<li><a href="#revision-history" id="toc-revision-history"><span class="toc-section-number">2</span> Revision history</a></li>
<li><a href="#introduction" id="toc-introduction"><span class="toc-section-number">3</span> Introduction</a></li>
<li><a href="#shouldnt-this-be-fixed-by-the-accessing-object-representations-paper" id="toc-shouldnt-this-be-fixed-by-the-accessing-object-representations-paper"><span class="toc-section-number">4</span> Shouldn’t this be fixed by the
“Accessing object representations” paper?</a></li>
<li><a href="#problem-data-members-are-not-reachable-from-other-data-members-except-the-first" id="toc-problem-data-members-are-not-reachable-from-other-data-members-except-the-first"><span class="toc-section-number">5</span> Problem: data members are not
reachable from other data members, except the first</a></li>
<li><a href="#reachability-is-not-about-pointer-arithmetic" id="toc-reachability-is-not-about-pointer-arithmetic"><span class="toc-section-number">6</span> Reachability is not about pointer
arithmetic</a></li>
<li><a href="#provenance-in-c" id="toc-provenance-in-c"><span class="toc-section-number">7</span> Provenance in C</a></li>
<li><a href="#provenance-in-c-1" id="toc-provenance-in-c-1"><span class="toc-section-number">8</span> Provenance in C++</a></li>
<li><a href="#removing-undefined-behavior-and-making-optimizations-opt-in" id="toc-removing-undefined-behavior-and-making-optimizations-opt-in"><span class="toc-section-number">9</span> Removing undefined behavior and
making optimizations opt-in</a></li>
<li><a href="#design-space-for-a-solution" id="toc-design-space-for-a-solution"><span class="toc-section-number">10</span> Design space for a solution</a>
<ul>
<li><a href="#which-pointer-types-can-be-used-for-the-pointer-arithmetic" id="toc-which-pointer-types-can-be-used-for-the-pointer-arithmetic"><span class="toc-section-number">10.1</span> Which pointer types can be used
for the pointer arithmetic?</a></li>
<li><a href="#what-about-casts-to-char-that-are-already-well-defined" id="toc-what-about-casts-to-char-that-are-already-well-defined"><span class="toc-section-number">10.2</span> What about casts to <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>
that are already well-defined?</a></li>
<li><a href="#should-we-just-standardize-container_of" id="toc-should-we-just-standardize-container_of"><span class="toc-section-number">10.3</span> Should we just standardize
<code class="sourceCode cpp">container_of</code>?</a></li>
<li><a href="#should-non-standard-layout-types-be-supported" id="toc-should-non-standard-layout-types-be-supported"><span class="toc-section-number">10.4</span> Should non-standard-layout types
be supported?</a></li>
</ul></li>
<li><a href="#proposed-wording" id="toc-proposed-wording"><span class="toc-section-number">11</span> Proposed wording</a></li>
<li><a href="#appendix-a" id="toc-appendix-a"><span class="toc-section-number">12</span> Appendix A</a></li>
<li><a href="#bibliography" id="toc-bibliography"><span class="toc-section-number">13</span> References</a></li>
</ul>
</div>
<h1 data-number="1" id="abstract"><span class="header-section-number">1</span> Abstract<a href="#abstract" class="self-link"></a></h1>
<p>I propose a change to the core language specification that would make
it well defined to compute a pointer to the beginning of an object from
a pointer to one of its data members (<em>i.e.</em> by subtracting the
offset of the data member, as given by the
<code class="sourceCode cpp">offsetof</code> macro). Such code, which is
often written in C, arguably had well defined behavior prior to C++17.
The proposed change will standardize existing practice and is
anticipated to have no impact on existing C++ compilers, but will
eliminate the possibility of certain (as yet unimplemented) hypothetical
reachability-based optimizations that were made possible by the C++17
wording.</p>
<h1 data-number="2" id="revision-history"><span class="header-section-number">2</span> Revision history<a href="#revision-history" class="self-link"></a></h1>
<ul>
<li><strong>R0</strong>, 2024-10-14: Initial revision.</li>
<li><strong>R1</strong>, 2025-01-10: Applied wording improvements
discovered during CWG review of <span class="citation" data-cites="P1839R6">[<a href="https://wg21.link/p1839r6" role="doc-biblioref">P1839R6</a>]</span>, including the prohibition on
accessing object representations of volatile objects. Improved
readability of prose. Added discussion of non-standard-layout types and
the possibility of standardizing
<code class="sourceCode cpp">container_of</code>. Added a table
comparing the three main design alternatives discussed in this
paper.</li>
</ul>
<h1 data-number="3" id="introduction"><span class="header-section-number">3</span> Introduction<a href="#introduction" class="self-link"></a></h1>
<p>In C, an intrusive data structure, such as a doubly-linked list, must
be implemented using composition, not inheritance, since C does not have
inheritance. Given a pointer to a node within the data structure,
accessing the rest of the object requires the use of
<code class="sourceCode cpp">offsetof</code>:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">struct</span> ListNode <span class="op">{</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>    <span class="kw">struct</span> ListNode<span class="op">*</span> prev<span class="op">;</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>    <span class="kw">struct</span> ListNode<span class="op">*</span> next<span class="op">;</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="op">};</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="kw">typedef</span> <span class="kw">struct</span> <span class="op">{</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a>    <span class="dt">int</span> data<span class="op">;</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a>    <span class="kw">struct</span> ListNode node<span class="op">;</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a><span class="op">}</span> Foo<span class="op">;</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a>Foo<span class="op">*</span> next_foo<span class="op">(</span>Foo<span class="op">*</span> foo<span class="op">)</span> <span class="op">{</span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a>    <span class="kw">struct</span> ListNode<span class="op">*</span> next_node <span class="op">=</span> foo<span class="op">-&gt;</span>node<span class="op">.</span>next<span class="op">;</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a>    <span class="cf">return</span> <span class="op">(</span>Foo<span class="op">*)((</span><span class="dt">char</span><span class="op">*)</span>next_node <span class="op">-</span> offsetof<span class="op">(</span>Foo<span class="op">,</span> node<span class="op">));</span></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>This pattern of casting to <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>,
subtracting the appropriate <code class="sourceCode cpp">offsetof</code>
value, and then casting to a pointer to the enclosing type, is often
encapsulated in a macro that is named
<code class="sourceCode cpp">container_of</code> or similar (see
<em>e.g.</em> <a href="https://github.com/search?q=container_of+language%3AC+&amp;type=code">GitHub
code search</a>)<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>.</p>
<p>A C++-only project would typically make
<code class="sourceCode cpp">ListNode</code> a base class. Converting a
<code class="sourceCode cpp">ListNode<span class="op">*</span></code> to
a <code class="sourceCode cpp">Foo<span class="op">*</span></code> could
then be done easily using
<code class="sourceCode cpp"><span class="kw">static_cast</span></code>,
and <code class="sourceCode cpp">offsetof</code> would be unnecessary.
This option is not available in C. In C, the
<code class="sourceCode cpp">container_of</code> pattern is the only
option, unless the <code class="sourceCode cpp">ListNode</code> can be
arranged to always be the first member of the enclosing struct.</p>
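<p>As a rough sketch of the C++-only alternative described above, the
node can be made a base class. The names
<code class="sourceCode cpp">ListNodeBase</code>,
<code class="sourceCode cpp">DerivedFoo</code>, and
<code class="sourceCode cpp">next_derived_foo</code> are hypothetical and
chosen only to avoid clashing with the C example; they do not appear
elsewhere in this paper.</p>

```cpp
#include <cassert>

// Hypothetical C++-only counterparts of ListNode and Foo from the C example.
struct ListNodeBase {
    ListNodeBase* prev;
    ListNodeBase* next;
};

// The list node is a base class subobject rather than a data member.
struct DerivedFoo : ListNodeBase {
    int data;
};

// Precondition: foo->next points to the ListNodeBase subobject of a DerivedFoo.
DerivedFoo* next_derived_foo(DerivedFoo* foo) {
    assert(foo != nullptr && foo->next != nullptr);
    // Base-to-derived static_cast: no offsetof or char* arithmetic is needed.
    return static_cast<DerivedFoo*>(foo->next);
}
```

<p>The cast is well defined only when the precondition holds, which is
the same assumption that the C version makes about its list.</p>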
<p>Unfortunately, the operand of the
<code class="sourceCode cpp"><span class="cf">return</span></code>
statement in <code class="sourceCode cpp">next_foo</code> has undefined
behavior in C++. This incompatibility between C and C++ should be fixed,
and can be fixed without changing any current C++ compilers.</p>
<h1 data-number="4" id="shouldnt-this-be-fixed-by-the-accessing-object-representations-paper"><span class="header-section-number">4</span> Shouldn’t this be fixed by the
“Accessing object representations” paper?<a href="#shouldnt-this-be-fixed-by-the-accessing-object-representations-paper" class="self-link"></a></h1>
<p>At the November 2019 WG21 meeting, EWG approved <span class="citation" data-cites="P1839R1">[<a href="https://wg21.link/p1839r1" role="doc-biblioref">P1839R1</a>]</span> in the following poll:</p>
<blockquote>
<p>It should be possible to access the entire object representation
through a pointer to a char-like type as a DR</p>
</blockquote>
<p>Something like P1839 is certainly needed in order to allow the code
given in the previous section to be valid. Currently, casting
<code class="sourceCode cpp">next_node</code> to type <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>
does not yield a pointer that points into an array of
<code class="sourceCode cpp"><span class="dt">char</span></code>;
therefore, subtracting any value other than 0 can only have UB
(§<span>7.6.6
<a href="https://wg21.link/N5001#expr.add">[expr.add]</a><a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a></span>p4.3). To solve this problem,
<span class="citation" data-cites="P1839R7">[<a href="https://isocpp.org/files/papers/P1839R7.html" role="doc-biblioref">P1839R7</a>]</span> proposes that object
representations be made arrays of <code class="sourceCode cpp"><span class="dt">unsigned</span> <span class="dt">char</span></code>
(and that pointers to
<code class="sourceCode cpp"><span class="dt">char</span></code> also be
allowed to traverse such arrays). This issue has also been pointed out
by <span class="citation" data-cites="P2883R0">[<a href="https://wg21.link/p2883r0" role="doc-biblioref">P2883R0</a>]</span>, which noted that,
although this use of <code class="sourceCode cpp">offsetof</code> has UB
in C++, every known C++ implementation “consistently produced the same
behavior as the C program”.</p>
<p>However, EWG did not discuss the issue of <em>reachability</em>.
Therefore, recent revisions of P1839 have been designed to preserve
reachability-based restrictions that currently exist in C++. To put it
another way, if P1839 is adopted, it will not change <em>which</em>
bytes a piece of code is allowed to access; these remain the bytes that
it is already able to access by calling
<code class="sourceCode cpp">memcpy</code>. In order to allow the code
given in the Introduction to have well defined behavior, we must expand
the set of bytes that are considered reachable from a pointer to a data
member.</p>
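<p>To make the <code class="sourceCode cpp">memcpy</code> baseline
concrete, the following is a minimal sketch of byte access that is
already permitted given only a pointer to the
<code class="sourceCode cpp">node</code> member. The helper name
<code class="sourceCode cpp">node_bytes</code> is hypothetical, and the
types are C++ restatements of those in the Introduction.</p>

```cpp
#include <array>
#include <cassert>
#include <cstring>

struct ListNode {
    ListNode* prev;
    ListNode* next;
};

struct Foo {
    int data;
    ListNode node;
};

// Hypothetical helper: copies the object representation of *p.
// Only the sizeof(ListNode) bytes occupied by *p are read, and all of them
// are reachable through p. The bytes of an enclosing Foo's data member
// are never touched.
std::array<unsigned char, sizeof(ListNode)> node_bytes(const ListNode* p) {
    assert(p != nullptr);
    std::array<unsigned char, sizeof(ListNode)> bytes{};
    std::memcpy(bytes.data(), p, sizeof(ListNode));
    return bytes;
}
```

<p>Nothing in this sketch reaches
<code class="sourceCode cpp">foo<span class="op">.</span>data</code> from
<code class="sourceCode cpp"><span class="op">&amp;</span>foo<span class="op">.</span>node</code>;
that step is the one that depends on reachability.</p>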
<p>To be clear, P1839 <em>could</em> just make the example in the
Introduction valid, but this is a separate evolutionary question from
the approved direction in P1839. Therefore, the reachability issue has
been made the subject of <em>this</em> paper instead of being added to
P1839.</p>
<h1 data-number="5" id="problem-data-members-are-not-reachable-from-other-data-members-except-the-first"><span class="header-section-number">5</span> Problem: data members are not
reachable from other data members, except the first<a href="#problem-data-members-are-not-reachable-from-other-data-members-except-the-first" class="self-link"></a></h1>
<p>Reachability was introduced into C++17 by the adoption of <span class="citation" data-cites="P0137R1">[<a href="https://wg21.link/p0137r1" role="doc-biblioref">P0137R1</a>]</span>. The definition of reachability
is currently given in §<span>6.8.4
<a href="https://wg21.link/N5001#basic.compound">[basic.compound]</a></span>p6:</p>
<blockquote>
<p>A byte of storage <em>b</em> is <em>reachable through</em> a pointer
value that points to an object <em>x</em> if there is an object
<em>y</em>, pointer-interconvertible with <em>x</em>, such that
<em>b</em> is within the storage occupied by <em>y</em>, or the
immediately-enclosing array object if <em>y</em> is an array
element.</p>
</blockquote>
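<p>As an illustration of the pointer-interconvertibility clause in this
definition, consider a sketch in which the node is the <em>first</em>
member of a standard-layout class. An object of such a class is
pointer-interconvertible with its first non-static data member, so every
byte of the enclosing object is reachable through a pointer to that
member. The names <code class="sourceCode cpp">NodeFirstFoo</code> and
<code class="sourceCode cpp">enclosing_foo</code> are hypothetical.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct ListNode {
    ListNode* prev;
    ListNode* next;
};

// Hypothetical variant of Foo in which the node is the first member.
struct NodeFirstFoo {
    ListNode node;
    int data;
};

static_assert(std::is_standard_layout<NodeFirstFoo>::value,
              "pointer-interconvertibility with the first member requires standard layout");
static_assert(offsetof(NodeFirstFoo, node) == 0,
              "the first member of a standard-layout class is at offset zero");

// Precondition: p points to the node member of a NodeFirstFoo object.
NodeFirstFoo* enclosing_foo(ListNode* p) {
    assert(p != nullptr);
    // *p and the enclosing object are pointer-interconvertible, so this cast
    // yields a pointer to the enclosing object without any pointer arithmetic.
    return reinterpret_cast<NodeFirstFoo*>(p);
}
```

<p>No comparable guarantee exists when the member is not the first one,
which is the situation in the
<code class="sourceCode cpp">next_foo</code> example.</p>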
<p>The cumulative effect of all changes in P0137R1 was to make it
impossible for a pointer derived from a given pointer value,
<code class="sourceCode cpp">p</code>, to access bytes that are not
reachable from <code class="sourceCode cpp">p</code>. The adoption of
that paper therefore gave the Committee’s blessing to compiler
optimizations based on the <em>assumption</em> that unreachable bytes
cannot be accessed at all. Unfortunately, in the
<code class="sourceCode cpp">next_foo</code> example given in the
Introduction, the bytes constituting the
<code class="sourceCode cpp">foo<span class="op">-&gt;</span>data</code>
member are not reachable from a pointer to
<code class="sourceCode cpp">foo<span class="op">-&gt;</span>node</code>.</p>
<p>For example, assuming <code class="sourceCode cpp">ListNode</code>
and <code class="sourceCode cpp">Foo</code> have been defined as above,
consider the following.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> access_node<span class="op">(</span>ListNode<span class="op">*</span> p<span class="op">)</span>;</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> use_foo<span class="op">()</span> <span class="op">{</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>    Foo foo;</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a>    foo<span class="op">.</span>data <span class="op">=</span> <span class="dv">1</span>;</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a>    foo<span class="op">.</span>node<span class="op">.</span>prev <span class="op">=</span> <span class="op">&amp;</span>foo<span class="op">.</span>node;</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a>    foo<span class="op">.</span>node<span class="op">.</span>next <span class="op">=</span> <span class="op">&amp;</span>foo<span class="op">.</span>node;</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a>    access_node<span class="op">(&amp;</span>foo<span class="op">.</span>node<span class="op">)</span>;</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a>    <span class="cf">return</span> foo<span class="op">.</span>data;</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>When the body of <code class="sourceCode cpp">use_foo</code> is
compiled, the compiler is allowed to assume that
<code class="sourceCode cpp">foo<span class="op">.</span>data</code>
cannot be modified by <code class="sourceCode cpp">access_node</code>,
even though <code class="sourceCode cpp">access_node</code> is given a
pointer to <em>another</em> member of the
<code class="sourceCode cpp">foo</code> object. In order to allow
<code class="sourceCode cpp">access_node</code> to access the
<code class="sourceCode cpp">data</code> member through
<code class="sourceCode cpp">offsetof</code> and pointer arithmetic, we
must also take away the possibility that a conforming implementation
could unconditionally optimize the
<code class="sourceCode cpp"><span class="cf">return</span></code>
statement to <code class="sourceCode cpp"><span class="cf">return</span> <span class="dv">1</span>;</code>.</p>
<p>The allowance of this particular reachability-based optimization
conflicts with the more important goal of allowing the code given in the
Introduction, which would have well-defined behavior in C, to have the
same behavior in C++. In addition, such optimizations do not seem to
have been implemented in any real C++ compilers. Therefore, the changes
I propose will not have any impact on existing implementations. I will
give more detail in the “Provenance in C++” section below.</p>
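<p>To make the preceding discussion concrete, here is a minimal sketch of
what such an <code class="sourceCode cpp">access_node</code> might look
like. The definitions of <code class="sourceCode cpp">ListNode</code> and
<code class="sourceCode cpp">Foo</code> below are assumptions made for
this sketch only; they are meant to resemble, not reproduce, the ones in
the Introduction. Under the current wording, the write through the
recovered pointer has undefined behavior; the sketch shows the result
that programmers expect and that existing implementations appear to
deliver in practice.</p>

```cpp
#include <cstddef>  // offsetof

// Assumed definitions, for this sketch only.
struct ListNode {
    ListNode* prev;
    ListNode* next;
};

struct Foo {
    int data;
    ListNode node;
};

// Recovers the enclosing Foo from a pointer to its `node` member, then
// writes to a sibling member that is not reachable from `p` under the
// C++17 reachability rules.
void access_node(ListNode* p) {
    char* bytes = reinterpret_cast<char*>(p) - offsetof(Foo, node);
    reinterpret_cast<Foo*>(bytes)->data = 2;
}

int use_foo() {
    Foo foo;
    foo.data = 1;
    foo.node.prev = &foo.node;
    foo.node.next = &foo.node;
    access_node(&foo.node);
    return foo.data;  // 2 if the write made by access_node is honored
}
```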
<h1 data-number="6" id="reachability-is-not-about-pointer-arithmetic"><span class="header-section-number">6</span> Reachability is not about pointer
arithmetic<a href="#reachability-is-not-about-pointer-arithmetic" class="self-link"></a></h1>
<p>In the status quo (prior to the adoption of <span class="citation" data-cites="P1839R7">[<a href="https://isocpp.org/files/papers/P1839R7.html" role="doc-biblioref">P1839R7</a>]</span>, if any), reachability can
prevent some memory accesses even when no pointer arithmetic is
involved. For example:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">struct</span> S <span class="op">{</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>    <span class="dt">int</span> a<span class="op">[</span><span class="dv">2</span><span class="op">]</span>;</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>    <span class="dt">int</span> data;</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="op">}</span>;</span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> f1<span class="op">(</span><span class="dt">int</span><span class="op">*</span> p<span class="op">)</span>;</span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> f2<span class="op">()</span> <span class="op">{</span></span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a>    S s;</span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a>    s<span class="op">.</span>data <span class="op">=</span> <span class="dv">1</span>;</span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a>    f1<span class="op">(&amp;</span>s<span class="op">.</span>a<span class="op">[</span><span class="dv">0</span><span class="op">])</span>;</span>
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a>    <span class="cf">return</span> s<span class="op">.</span>data;</span>
<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>If <code class="sourceCode cpp">f1</code> is defined as follows:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> f1<span class="op">(</span><span class="dt">int</span><span class="op">*</span> p<span class="op">)</span> <span class="op">{</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>    <span class="kw">reinterpret_cast</span><span class="op">&lt;</span>S<span class="op">*&gt;(</span><span class="kw">reinterpret_cast</span><span class="op">&lt;</span><span class="dt">int</span> <span class="op">(*)[</span><span class="dv">2</span><span class="op">]&gt;(</span>p<span class="op">))-&gt;</span>data <span class="op">=</span> <span class="dv">2</span>;</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>then calling <code class="sourceCode cpp">f1</code> has undefined
behavior, because the entire array
<code class="sourceCode cpp">s<span class="op">.</span>a</code> is not
pointer-interconvertible with the element <code class="sourceCode cpp">s<span class="op">.</span>a<span class="op">[</span><span class="dv">0</span><span class="op">]</span></code>.
The inner <code class="sourceCode cpp"><span class="kw">reinterpret_cast</span></code>
yields a “wrongly typed” pointer: a pointer value that is of type <code class="sourceCode cpp"><span class="dt">int</span> <span class="op">(*)[</span><span class="dv">2</span><span class="op">]</span></code>,
but points to a single
<code class="sourceCode cpp"><span class="dt">int</span></code>, namely
<code class="sourceCode cpp">s<span class="op">.</span>a<span class="op">[</span><span class="dv">0</span><span class="op">]</span></code>;
it does not point to the array
<code class="sourceCode cpp">s<span class="op">.</span>a</code>.
Consequently, the outer <code class="sourceCode cpp"><span class="kw">reinterpret_cast</span></code>,
which attempts to go from the first member of a standard-layout struct
to the struct itself (allowed in C++17), cannot work; instead another
wrongly typed pointer is produced: a value of type
<code class="sourceCode cpp">S<span class="op">*</span></code> that
points to <code class="sourceCode cpp">s<span class="op">.</span>a<span class="op">[</span><span class="dv">0</span><span class="op">]</span></code>
(not <code class="sourceCode cpp">s</code>). Dereferencing this pointer
yields an lvalue that does not refer to an
<code class="sourceCode cpp">S</code> object, which renders the
attempted access to <code class="sourceCode cpp">data</code> UB
(§<span>7.6.1.5
<a href="https://wg21.link/N5001#expr.ref">[expr.ref]</a></span>p9).</p>
<p>The
<code class="sourceCode cpp">std<span class="op">::</span>launder</code>
function, which can accept a pointer and return a different pointer
value that holds the same address, does not help, because it has a
reachability restriction: calling
<code class="sourceCode cpp">std<span class="op">::</span>launder</code>
on a wrongly typed pointer picks out the object of the correct type that
lives at the address that the pointer holds, but if there are bytes
reachable from that object that are not reachable from the object that
the original pointer points to, the behavior is undefined (§<span>17.6.5
<a href="https://wg21.link/N5001#ptr.launder">[ptr.launder]</a></span>p2).</p>
<p>Therefore, the implementation can assume that the call to
<code class="sourceCode cpp">f1</code> in
<code class="sourceCode cpp">f2</code> never modifies
<code class="sourceCode cpp">s<span class="op">.</span>data</code>: if
any attempt were made to do so, then the behavior of the program would
be undefined.</p>
<p>In <span class="citation" data-cites="P1839R7">[<a href="https://isocpp.org/files/papers/P1839R7.html" role="doc-biblioref">P1839R7</a>]</span>, I have attempted to ensure
that the proposed wording is consistent with the reachability
restrictions that exist in current C++, because there is no record of
EWG having discussed the question of whether those restrictions should
be relaxed. If the <code class="sourceCode cpp">get_next_foo</code>
example is to be made well-defined, then some reachability-based
assumptions that are currently allowed to implementations must be
invalidated. This paper proposes to do just that.</p>
<h1 data-number="7" id="provenance-in-c"><span class="header-section-number">7</span> Provenance in C<a href="#provenance-in-c" class="self-link"></a></h1>
<p>The C standard does not currently have a notion of
<em>provenance</em>, but it is widely assumed that one ought to exist.
For example, in the following translation unit:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> evil<span class="op">(</span><span class="dt">void</span><span class="op">);</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> main<span class="op">(</span><span class="dt">void</span><span class="op">)</span> <span class="op">{</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>    <span class="dt">int</span> x <span class="op">=</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a>    evil<span class="op">();</span></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a>    <span class="cf">return</span> x<span class="op">;</span></span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>notwithstanding that <code class="sourceCode cpp">evil</code> might
be able to “guess” the address of <code class="sourceCode cpp">x</code>
based on knowledge of the platform ABI, it is widely agreed that
<code class="sourceCode cpp">evil</code> should be allowed to neither
read nor write the value of <code class="sourceCode cpp">x</code>, and,
therefore, the compiler can eliminate
<code class="sourceCode cpp">x</code> and optimize the last statement to
<code class="sourceCode cpp"><span class="cf">return</span> <span class="dv">1</span>;</code>.
GCC and Clang both perform this optimization at
<code class="sourceCode cpp"><span class="op">-</span>O1</code> and
higher.</p>
<p>One can say that even if <code class="sourceCode cpp">evil</code>
correctly guesses the numerical value of
<code class="sourceCode cpp">x</code>’s address, casting that numerical
value to <code class="sourceCode cpp"><span class="dt">int</span><span class="op">*</span></code>
would yield a pointer that <em>lacks provenance</em> and, therefore,
causes UB when dereferenced. Such provenance-based restrictions on the
use of pointers do not exist in the current C standard, but work is
underway on a Draft Technical Specification for pointer provenance in C
(referred to as the “Provenance TS” from this point onward). The latest
version of the Provenance TS is <span class="citation" data-cites="WG14_N3057">[<a href="https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3057.pdf" role="doc-biblioref">N3057</a>]</span>.</p>
<p>In the Provenance TS, values of pointer-to-object type<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a> are
augmented to include provenance, which may be empty. A non-empty
provenance is the ID of a <em>storage instance</em>, and a pointer value
whose provenance is the ID of a storage instance <em>I</em> can be used
only to access bytes that lie within <em>I</em>. In the example above, a
storage instance is created when <code class="sourceCode cpp">x</code>
is defined. In contrast to the address that a pointer value represents,
there is no way to directly change the provenance of a pointer, other
than by storing into it another pointer value that has the desired
provenance. That is, no cast or other operation in
<code class="sourceCode cpp">evil</code> can construct a pointer value
whose provenance is the ID of <code class="sourceCode cpp">x</code>.
Therefore, the implementation can assume that any pointer constructed by
<code class="sourceCode cpp">evil</code> that happens to represent the
address of <code class="sourceCode cpp">x</code> cannot be used to
access <code class="sourceCode cpp">x</code>, since the provenance of
such a pointer value is either empty or a storage ID other than that of
<code class="sourceCode cpp">x</code>.</p>
<p>Although the Provenance TS doesn’t explicitly state that subobjects
have the provenance of their complete object, the definition of “storage
instance” given in section 3.20 of Annex C implies that only a single
storage instance is created by an object definition. A note to section
3.20 states that two subobjects within an object of structure type share
a storage instance.</p>
<p>Therefore, under the Provenance TS, if the address of a subobject is
taken, the resulting pointer’s provenance is the ID of a storage instance
that contains at least the complete object<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a>. Consequently, all bytes of
a complete object are always reachable starting from a valid pointer to
any subobject.</p>
<h1 data-number="8" id="provenance-in-c-1"><span class="header-section-number">8</span> Provenance in C++<a href="#provenance-in-c-1" class="self-link"></a></h1>
<p>C++ has had a provenance-based pointer model since <span class="citation" data-cites="P0137R1">[<a href="https://wg21.link/p0137r1" role="doc-biblioref">P0137R1</a>]</span>. However, the C++ standard does
not use the term “provenance”. Instead, every dereferenceable pointer in
C++ has a unique object or function to which it points. But the set of
bytes that an object pointer can reach is not necessarily limited to the
bytes occupied by the object that the pointer points to. For example, a
pointer to any element of an array can be used to access any byte of the
array, including bytes that are occupied by other elements. C++ is more
restrictive than the C Provenance TS: all bytes reachable from the
pointer value “pointer to <em>o</em>” (where <em>o</em> is an object)
lie within <em>o</em>’s complete object, but not all bytes of a complete
object are reachable from a pointer to a subobject. In particular, as
stated previously, if a pointer points to a non-static data member of a
standard-layout struct other than the first non-static data member, no
other members are reachable from that pointer.</p>
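<p>The array case mentioned above can be sketched as follows; the
function names are illustrative only. Given a pointer to one element,
ordinary pointer arithmetic may reach the other elements, because all
bytes of the immediately enclosing array are reachable from that
pointer.</p>

```cpp
// Hypothetical helper: receives a pointer to the element at index 1 of a
// four-element array and writes to two other elements of the same array.
void touch_neighbours(int* p) {
    p[-1] = 10;  // element 0
    p[2] = 30;   // element 3
}

int sum_after_call() {
    int a[4] = {1, 2, 3, 4};
    touch_neighbours(&a[1]);
    return a[0] + a[1] + a[2] + a[3];  // 10 + 2 + 3 + 30
}
```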
<p>To look at it from the point of view of the compiler, all
provenance-based optimizations that are valid in C are also valid in
C++. For example, Clang, GCC, and MSVC are all capable of performing the
optimization mentioned in the previous section (<em>i.e.</em> that the
value of <code class="sourceCode cpp">x</code> is not accessed by
<code class="sourceCode cpp">evil</code>). Since C++ is stricter than C,
some provenance-based optimizations that are not valid in C are valid in
C++. However, <strong>I have not been able to find any cases in which
C++ implementations exploit provenance-based optimizations that are not
valid in C.</strong> For example, in the following translation unit:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">struct</span> S <span class="op">{</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>    <span class="dt">int</span> x;</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>    <span class="dt">int</span> y;</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="op">}</span>;</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> f4<span class="op">(</span><span class="dt">int</span><span class="op">*</span> p<span class="op">)</span>;</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> f3<span class="op">()</span> <span class="op">{</span></span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a>    S s;</span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a>    s<span class="op">.</span>x <span class="op">=</span> <span class="dv">1</span>;</span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a>    f4<span class="op">(&amp;</span>s<span class="op">.</span>y<span class="op">)</span>;</span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a>    <span class="cf">return</span> s<span class="op">.</span>x <span class="op">*</span> s<span class="op">.</span>x;</span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>even at maximum optimization levels, Clang, GCC, and MSVC all
generate a load of
<code class="sourceCode cpp">s<span class="op">.</span>x</code> and an
<code class="sourceCode cpp">imul</code> instruction on x86-64; no
implementation assumes that, because only the address of
<code class="sourceCode cpp">s<span class="op">.</span>y</code> escapes
from <code class="sourceCode cpp">f3</code>, the value of
<code class="sourceCode cpp">s<span class="op">.</span>x</code> cannot
be changed.</p>
<p>I believe that the reason why such optimizations are not performed is
that C++ implementations wish to maintain a reasonable degree of
compatibility with C. Since C code often uses the
<code class="sourceCode cpp">container_of</code> idiom, which could be
used to obtain a pointer to <code class="sourceCode cpp">s</code> given
a pointer to
<code class="sourceCode cpp">s<span class="op">.</span>y</code>,
implementations make allowances for the same operation to take place in
a C++ program. Therefore, not only do implementations not currently
perform this optimization, but it is unlikely that future versions will
do so, either. Implementations are more constrained by the needs of
their users, in this case, than by the availability of compiler
engineers to implement the optimization.</p>
<p>Similarly, the function <code class="sourceCode cpp">f1</code>
defined earlier could be given the following definition in C. The offset
value will always be 0 in this case, so the subtraction can be omitted
without changing the meaning.</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> f1<span class="op">(</span><span class="dt">int</span><span class="op">*</span> p<span class="op">)</span> <span class="op">{</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>    <span class="op">((</span><span class="kw">struct</span> S<span class="op">*)((</span><span class="dt">char</span><span class="op">*)</span>p <span class="op">-</span> offsetof<span class="op">(</span><span class="kw">struct</span> S<span class="op">,</span> a<span class="op">[</span><span class="dv">0</span><span class="op">])))-&gt;</span>data <span class="op">=</span> <span class="dv">2</span><span class="op">;</span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>Therefore, even in C++ mode, implementations do not assume that
<code class="sourceCode cpp">f1</code> cannot change the value of
<code class="sourceCode cpp">data</code>, even though the reachability
rules of the language permit optimizations based on this assumption.
Clang, GCC, and MSVC all emit both a store to
<code class="sourceCode cpp">s<span class="op">.</span>data</code>
before the call to <code class="sourceCode cpp">f1</code> and a load
after.</p>
<h1 data-number="9" id="removing-undefined-behavior-and-making-optimizations-opt-in"><span class="header-section-number">9</span> Removing undefined behavior and
making optimizations opt-in<a href="#removing-undefined-behavior-and-making-optimizations-opt-in" class="self-link"></a></h1>
<p>The overly strict reachability rules adopted in C++17 have an
additional disadvantage besides limiting compatibility with C: they
create a category of constructs that:</p>
<ol type="1">
<li>A programmer can easily use without realizing that UB will result,
and</li>
<li>Can be given perfectly sensible defined behavior (which may include
implementation-defined or unspecified results) only at the cost of minor
optimizations.</li>
</ol>
<p>My opinion is that the Committee should not create new forms of UB
that meet the above criteria, and should strongly consider removing any
such UB that already exists in the language. UB that is actually
exploited by compilers for optimization purposes makes the use of C++
less safe; UB that is not currently exploited still has a negative
impact on the perception of how safe C++ is, and is scary to beginners,
who don’t have enough context to distinguish between benign UB that is
unlikely to ever be exploited and dangerous UB that may eventually
result in an unbounded set of possible executions.<a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a> I
do not mean to suggest that all or even most UB can be removed from C++,
but when the two criteria above are met, I think the cost/benefit
analysis heavily favors giving the construct a defined behavior.</p>
<p>I believe that a better way to obtain the optimizations that such UB
is meant to enable is to provide mechanisms to opt in: that is, language
or library features whose sole purpose is to cause UB, which can then be
used to optimize. Experts can use such features to produce faster code,
while beginners can easily avoid them because they cannot be used by
accident while writing code that uses other C++ features. (The <code class="sourceCode cpp"><span class="op">[[</span><span class="at">assume</span><span class="op">]]</span></code>
attribute is a well-known example of this genre.) It seems much more
defensible to provide “sharp tools” for experts to use in order to
improve performance than to build sharp edges into the most basic
language constructs, making it difficult for beginners to use them
safely.</p>
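<p>As a minimal sketch of what an opt-in mechanism of this kind looks
like in use (the function name is hypothetical, and the attribute
requires a C++23 implementation; older compilers typically ignore it
with a warning):</p>

```cpp
// Hypothetical example: the caller promises that `divisor` is positive.
// If the promise is broken, the behavior is undefined; in exchange, the
// optimizer is allowed to rely on the stated condition.
int divide_positive(int value, int divisor) {
    [[assume(divisor > 0)]];
    return value / divisor;
}
```

<p>The point of the sketch is that the UB is introduced by a construct
that cannot be written by accident, rather than by an ordinary member
access or cast.</p>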
<p>Consider again this example from the previous section:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="kw">struct</span> S <span class="op">{</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>    <span class="dt">int</span> x;</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>    <span class="dt">int</span> y;</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a><span class="op">}</span>;</span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> f4<span class="op">(</span><span class="dt">int</span><span class="op">*</span> p<span class="op">)</span>;</span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> f3<span class="op">()</span> <span class="op">{</span></span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a>    S s;</span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a>    s<span class="op">.</span>x <span class="op">=</span> <span class="dv">1</span>;</span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a>    f4<span class="op">(&amp;</span>s<span class="op">.</span>y<span class="op">)</span>;</span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a>    <span class="cf">return</span> s<span class="op">.</span>x <span class="op">*</span> s<span class="op">.</span>x;</span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>This paper proposes that <code class="sourceCode cpp">f4</code> would
have the ability to modify
<code class="sourceCode cpp">s<span class="op">.</span>x</code>, and
that if there is sufficient interest from C++ experts in having a way to
tell the compiler that
<code class="sourceCode cpp">s<span class="op">.</span>x</code>
<em>cannot</em> be reached through the pointer passed to
<code class="sourceCode cpp">f4</code>, a new mechanism can be added to
the language. This possibility is discussed in Appendix A.</p>
<h1 data-number="10" id="design-space-for-a-solution"><span class="header-section-number">10</span> Design space for a solution<a href="#design-space-for-a-solution" class="self-link"></a></h1>
<p>To make the C++ standard match existing practice of implementations
and to bless <code class="sourceCode cpp">container_of</code>-like
constructs in C++, it is necessary to permit pointer arithmetic within
objects, which is already being proposed by <span class="citation" data-cites="P1839R7">[<a href="https://isocpp.org/files/papers/P1839R7.html" role="doc-biblioref">P1839R7</a>]</span>, and also to relax the
reachability rules in C++. However, this paper does not propose to relax
the C++ reachability rules all the way to the “complete object or
allocation” model proposed by the C Provenance TS because doing so is
not necessary to solve the immediate problem. Instead, it suffices to
allow a pointer to an object to reach all bytes of the complete object.
For example, this paper does not propose to enable the use of flexible
array members in C++, which are allowed by the C Provenance TS because
the trailing bytes belong to the same storage instance (allocation) as
the preceding members. The
<code class="sourceCode cpp">container_of</code> technique was valid in
C++ prior to C++17 and this paper aims to restore the <em>status quo
ante</em>, not to propose a new feature that has never been in C++.</p>
<h2 data-number="10.1" id="which-pointer-types-can-be-used-for-the-pointer-arithmetic"><span class="header-section-number">10.1</span> Which pointer types can be
used for the pointer arithmetic?<a href="#which-pointer-types-can-be-used-for-the-pointer-arithmetic" class="self-link"></a></h2>
<p>Because typical <code class="sourceCode cpp">container_of</code>
macros in C use a cast to <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>
(not <code class="sourceCode cpp"><span class="dt">unsigned</span> <span class="dt">char</span><span class="op">*</span></code>),
this paper proposes that a cast to <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>
be allowed to yield a pointer to an object’s object representation;
pointers to <code class="sourceCode cpp"><span class="dt">unsigned</span> <span class="dt">char</span></code>
and
<code class="sourceCode cpp">std<span class="op">::</span>byte</code>
are also supported, as these types are already exempt from the strict
aliasing rule (§<span>7.2.1
<a href="https://wg21.link/N5001#basic.lval">[basic.lval]</a></span>p11).</p>
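<p>As a rough sketch of what this would permit, the same step back to the
enclosing object is written below with each of the three byte-like
pointer types. The types <code class="sourceCode cpp">Inner</code> and
<code class="sourceCode cpp">Outer</code> and the function names are
hypothetical and are introduced only for illustration.</p>

```cpp
#include <cstddef>  // offsetof, std::byte

// Hypothetical types, for this sketch only.
struct Inner {
    int value;
};

struct Outer {
    int tag;
    Inner inner;
};

// Each function subtracts the member offset from a byte-like pointer to
// the member and converts the result back to a pointer to the enclosing
// object.
Outer* outer_via_char(Inner* p) {
    return reinterpret_cast<Outer*>(
        reinterpret_cast<char*>(p) - offsetof(Outer, inner));
}

Outer* outer_via_unsigned_char(Inner* p) {
    return reinterpret_cast<Outer*>(
        reinterpret_cast<unsigned char*>(p) - offsetof(Outer, inner));
}

Outer* outer_via_byte(Inner* p) {
    return reinterpret_cast<Outer*>(
        reinterpret_cast<std::byte*>(p) - offsetof(Outer, inner));
}
```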
<h2 data-number="10.2" id="what-about-casts-to-char-that-are-already-well-defined"><span class="header-section-number">10.2</span> What about casts to <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>
that are already well-defined?<a href="#what-about-casts-to-char-that-are-already-well-defined" class="self-link"></a></h2>
<p>In some cases, a C-style cast to <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>
already has well-defined behavior in C++ that is different from
producing a pointer to the object representation. One of these cases is
when the operand is of a class type that has a conversion
function to <em>cv</em> <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>.
I do not propose to change the behavior of such casts in C++; doing so
would be a disastrous breaking change that is not needed for C
compatibility, because C does not have conversion functions. The
remaining two cases are:</p>
<ol type="1">
<li>The cast is a
<code class="sourceCode cpp"><span class="kw">const_cast</span></code>
because the operand has type <em>cv</em> <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>
or array of <em>cv</em>
<code class="sourceCode cpp"><span class="dt">char</span></code>. (This
includes the case where no conversion is needed at all.)</li>
<li>The cast can be interpreted as a <code class="sourceCode cpp"><span class="kw">reinterpret_cast</span></code>
followed by an optional
<code class="sourceCode cpp"><span class="kw">const_cast</span></code>
because there is a “real” <em>cv</em>
<code class="sourceCode cpp"><span class="dt">char</span></code> (not an
element of an object representation) that is located at the address
represented by the operand and is pointer-interconvertible with it.</li>
</ol>
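<p>The conversion-function case and the two cases listed above can be
sketched as follows. All type and function names are hypothetical; each
cast already has a well-defined meaning today that does not involve an
object representation.</p>

```cpp
// Hypothetical types, for this sketch only.
struct HasConversion {
    char text[4];
    operator char*() { return text; }  // conversion function to char*
};

struct StartsWithChar {
    char first;  // a "real" char, pointer-interconvertible with the struct
    int rest;
};

// Conversion-function case: the cast calls operator char*().
char via_conversion(HasConversion& h) { return *(char*)h; }

// Case 1: the cast is a const_cast; the result points to the same char.
char via_const_cast(const char* s) { return *(char*)s; }

// Case 2: the cast is a reinterpret_cast that yields a pointer to the
// first member, not to an element of an object representation.
char via_first_member(StartsWithChar* s) { return *(char*)s; }
```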
<p>I searched GitHub for uses of
<code class="sourceCode cpp">container_of</code> and uses of
<code class="sourceCode cpp">offsetof</code> for the purpose of reaching
an enclosing struct. In the 65 files that I analyzed manually, I found
two files in which the pointer from which the
<code class="sourceCode cpp">offsetof</code> value is subtracted points
to an array of
<code class="sourceCode cpp"><span class="dt">char</span></code>. (In
one of these cases, the array was a flexible array member, which is not
part of standard C++, but is often accepted as an extension.) That is,
the relevant details of the code are similar to:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">struct</span> S2 <span class="op">{</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>    <span class="dt">int</span> data;</span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a>    <span class="dt">char</span> buf<span class="op">[</span><span class="dv">100</span><span class="op">]</span>;</span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a><span class="op">}</span>;</span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> get_data<span class="op">(</span><span class="dt">char</span><span class="op">*</span> p<span class="op">)</span> <span class="op">{</span></span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a>    <span class="cf">return</span> <span class="op">((</span><span class="kw">struct</span> S2<span class="op">*)(</span>p <span class="op">-</span> offsetof<span class="op">(</span>S2, buf<span class="op">)))-&gt;</span>data;</span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> f5<span class="op">()</span> <span class="op">{</span></span>
<span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a>    S2 s;</span>
<span id="cb6-10"><a href="#cb6-10" aria-hidden="true" tabindex="-1"></a>    <span class="co">// ...</span></span>
<span id="cb6-11"><a href="#cb6-11" aria-hidden="true" tabindex="-1"></a>    get_data<span class="op">(</span>s<span class="op">.</span>buf<span class="op">)</span>;</span>
<span id="cb6-12"><a href="#cb6-12" aria-hidden="true" tabindex="-1"></a>    <span class="co">// ...</span></span>
<span id="cb6-13"><a href="#cb6-13" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>In C++, this code performs out-of-bounds array arithmetic, and thus
exhibits UB even before the attempt to access
<code class="sourceCode cpp">data</code>.</p>
<p>Essentially, this gives us three design options to deal with Case
1.</p>
<ol type="1">
<li>We can say that <em>cv</em> <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>
is exempt from bounds checking, just as it’s exempt from the strict
aliasing rule. In other words, while a <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>
may point to a specific
<code class="sourceCode cpp"><span class="dt">char</span></code> object
during constant evaluation, in all other cases it merely points to a
byte of storage, and pointer arithmetic that would reach any other byte
in the same complete object is permitted. In this case, <em>cv</em>
<code class="sourceCode cpp"><span class="dt">unsigned</span> <span class="dt">char</span><span class="op">*</span></code>
would also be exempt from bounds checking (for symmetry with the strict
aliasing rule). This might have a negative impact on performance
relative to the status quo if compilers are currently relying on the
assumption that a pointer into a
<code class="sourceCode cpp"><span class="dt">char</span></code> array
that is a subobject cannot be used to perform pointer arithmetic outside
the bounds of the array. However, I have not yet found any examples
where compilers do use such assumptions for optimization. The more
likely impact is on sanitizers and static analyzers: they might be
forced to disable bounds checking for <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>
and <code class="sourceCode cpp"><span class="dt">unsigned</span> <span class="dt">char</span><span class="op">*</span></code>,
which would reduce their ability to detect UB.</li>
<li>We can say that a C-style cast from <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>
to <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>
or a similar cast (as described in Case 1) sometimes <em>changes</em>
the pointer value such that the above example would have defined
behavior if <code class="sourceCode cpp">p</code> were to be cast to
<code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>
prior to the pointer arithmetic. (A similar allowance would be made for
casts to <code class="sourceCode cpp"><span class="dt">unsigned</span> <span class="dt">char</span><span class="op">*</span></code>.)
In many cases in the real world, such a cast might be present because it
will have been introduced by a generic
<code class="sourceCode cpp">container_of</code>-like macro that is not
“aware” of the fact that the pointer argument, in some particular cases,
has type <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>
already. However, this design has two disadvantages. First, some
compilers might simply ignore casts from <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>
to <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>
at some early stage of semantic analysis. Later stages would then be
unaware that the cast was ever present, so the cast could not
achieve its purpose of giving the program defined behavior. It is not
clear how much work would be required to change the implementations.
Second, it would violate the current design in which a C-style cast is
equivalent to trying C++-style casts in a particular order; instead, the
C-style cast would have the additional power of producing a pointer to
the object representation instead of performing a no-op
<code class="sourceCode cpp"><span class="kw">const_cast</span></code>.</li>
<li>We can say that we don’t care enough about solving the problem in
the case of pointers that are already of type <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>.
The example above would continue to have UB, regardless of whether an
additional cast is inserted. We would still solve most of the problem,
because in the vast majority of cases, the subobject pointer points to
an object of struct or union type, not a
<code class="sourceCode cpp"><span class="dt">char</span></code>.</li>
</ol>
<p>To understand the practical implications of these options, consider
the following three possible definitions of
<code class="sourceCode cpp">get_data</code>:</p>
<ol type="A">
<li><code class="sourceCode cpp"><span class="cf">return</span> <span class="op">((</span><span class="kw">struct</span> S2<span class="op">*)(</span>p <span class="op">-</span> offsetof<span class="op">(</span>S2, buf<span class="op">)))-&gt;</span>data;</code></li>
<li><code class="sourceCode cpp"><span class="cf">return</span> <span class="op">((</span><span class="kw">struct</span> S2<span class="op">*)((</span><span class="dt">char</span><span class="op">*)</span>p <span class="op">-</span> offsetof<span class="op">(</span>S2, buf<span class="op">)))-&gt;</span>data;</code></li>
<li><code class="sourceCode cpp"><span class="cf">return</span> <span class="op">((</span><span class="kw">struct</span> S2<span class="op">*)((</span><span class="dt">unsigned</span> <span class="dt">char</span><span class="op">*)</span>p <span class="op">-</span> offsetof<span class="op">(</span>S2, buf<span class="op">)))-&gt;</span>data;</code></li>
</ol>
<p>In the status quo, even assuming <span class="citation" data-cites="P1839R7">[<a href="https://isocpp.org/files/papers/P1839R7.html" role="doc-biblioref">P1839R7</a>]</span> is adopted, the behavior is
undefined in C++ in all three cases (A, B, and C). Some or all of these
cases are well defined under C<a href="#fn6" class="footnote-ref" id="fnref6" role="doc-noteref"><sup>6</sup></a> and under the proposed
options discussed above:</p>
<table>
<thead>
<tr class="header">
<th style="text-align: center;"><div style="text-align:center">
<strong>Case</strong>
</div></th>
<th style="text-align: center;"><div style="text-align:center">
<strong>ISO C with Provenance TS</strong>
</div></th>
<th style="text-align: center;"><div style="text-align:center">
<strong>Option 1</strong>
</div></th>
<th style="text-align: center;"><div style="text-align:center">
<strong>Option 2</strong>
</div></th>
<th style="text-align: center;"><div style="text-align:center">
<strong>Option 3</strong>
</div></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">A</td>
<td style="text-align: center;">Undefined</td>
<td style="text-align: center;">Well defined</td>
<td style="text-align: center;">Undefined</td>
<td style="text-align: center;">Undefined</td>
</tr>
<tr class="even">
<td style="text-align: center;">B</td>
<td style="text-align: center;">Undefined</td>
<td style="text-align: center;">Well defined</td>
<td style="text-align: center;">Well defined</td>
<td style="text-align: center;">Undefined</td>
</tr>
<tr class="odd">
<td style="text-align: center;">C</td>
<td style="text-align: center;">Well defined</td>
<td style="text-align: center;">Well defined</td>
<td style="text-align: center;">Well defined</td>
<td style="text-align: center;">Well defined</td>
</tr>
</tbody>
</table>
<p>This paper proposes option 3 as the conservative option, without
prejudice to adopting something similar to option 1 or 2 in the future.
The status of Case A and Case B in ISO C is a bit unclear; if the C
standard were to be clarified to make Case A and Case B definitely well
defined, their specification strategy could provide inspiration for a
corresponding change in C++ to preserve compatibility. Assuming such a
change is not made, the small amount of C code that is similar to Case A
or Case B could be rewritten so that, if the subobject has type
<code class="sourceCode cpp"><span class="dt">char</span></code>, then
the pointer arithmetic is done using <code class="sourceCode cpp"><span class="dt">unsigned</span> <span class="dt">char</span><span class="op">*</span></code>
(Case C), and <em>vice versa</em>; it would then have the desired
behavior in both C and C++ under option 3.</p>
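<p>As a minimal sketch of that rewrite, the following version of
<code class="sourceCode cpp">get_data</code> performs the pointer arithmetic
through <code class="sourceCode cpp"><span class="dt">unsigned</span> <span class="dt">char</span><span class="op">*</span></code>
(Case C). The explicit
<code class="sourceCode cpp"><span class="kw">reinterpret_cast</span></code>s
stand in for the C-style casts used above, and
<code class="sourceCode cpp">S2</code> is repeated only to keep the sketch
self-contained. Under the status quo this remains formally undefined in C++;
it would be well defined only under the rules proposed in this paper.</p>

```cpp
#include <cstddef>  // offsetof

struct S2 {
    int data;
    char buf[100];
};

// Case C: the subobject has type char, so the arithmetic is done through
// unsigned char* rather than char*.
int get_data(char* p) {
    unsigned char* bytes = reinterpret_cast<unsigned char*>(p);
    // Step back from buf to the start of the enclosing S2 object.
    return reinterpret_cast<S2*>(bytes - offsetof(S2, buf))->data;
}
```

<p>A caller would pass the array member itself, for example
<code class="sourceCode cpp">get_data(s.buf)</code> for an
<code class="sourceCode cpp">S2 s</code>.</p>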
<p>For Case 2, I also found two examples in the 65 files that I analyzed
in which the subobject pointer points to a struct that is
pointer-interconvertible with an <code class="sourceCode cpp"><span class="dt">unsigned</span> <span class="dt">char</span></code>
subobject. I didn’t find any examples with
<code class="sourceCode cpp"><span class="dt">char</span></code>, but
given that examples exist that use <code class="sourceCode cpp"><span class="dt">unsigned</span> <span class="dt">char</span></code>,
I assume there are others that use
<code class="sourceCode cpp"><span class="dt">char</span></code>. Such
code would have relevant details similar to:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">struct</span> S3 <span class="op">{</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>    <span class="dt">char</span> a<span class="op">;</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>    <span class="dt">int</span>  b<span class="op">;</span></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="op">};</span></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="kw">struct</span> S4 <span class="op">{</span></span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a>    <span class="dt">char</span>      c<span class="op">;</span></span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a>    <span class="kw">struct</span> S3 d<span class="op">;</span></span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a><span class="op">};</span></span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a><span class="kw">struct</span> S4<span class="op">*</span> get_s4<span class="op">(</span><span class="kw">struct</span> S3<span class="op">*</span> s3<span class="op">)</span> <span class="op">{</span></span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a>    <span class="cf">return</span> <span class="op">(</span><span class="kw">struct</span> S4<span class="op">*)((</span><span class="dt">char</span><span class="op">*)</span>s3 <span class="op">-</span> offsetof<span class="op">(</span><span class="kw">struct</span> S4<span class="op">,</span> d<span class="op">));</span></span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>In current C++, the cast to <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>
yields a pointer to the <code class="sourceCode cpp">a</code> subobject.
Note, however, that the entire example has undefined behavior because of
the subsequent pointer arithmetic. If we change the rules of C++ so that
the cast would be allowed to yield a pointer to the object
representation of the <code class="sourceCode cpp">S4</code> object, we
could make this example well-defined when it currently is not. In order
to avoid changing the behavior of any code that is already well-defined,
we could say that the status quo interpretation of the cast takes
precedence, and a pointer to the object representation is obtained only
when the former interpretation would produce undefined behavior. This
specification strategy is similar to that of implicit object creation,
in which the specific objects that are created may be determined only by
the details of a later operation, which would have UB under every choice
of objects to create except one.</p>
<h2 data-number="10.3" id="should-we-just-standardize-container_of"><span class="header-section-number">10.3</span> Should we just standardize
<code class="sourceCode cpp">container_of</code>?<a href="#should-we-just-standardize-container_of" class="self-link"></a></h2>
<p>Standardizing <code class="sourceCode cpp">container_of</code> could
provide a solution to the <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>/<code class="sourceCode cpp"><span class="dt">unsigned</span> <span class="dt">char</span><span class="op">*</span></code>
problem: if the operand is of type <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>,
it would be cast to <code class="sourceCode cpp"><span class="dt">unsigned</span> <span class="dt">char</span><span class="op">*</span></code>,
and vice versa, thus ensuring that a pointer to the object
representation can always be obtained. (In C, this behavior can be
implemented using a <code class="sourceCode cpp">_Generic</code>
expression.) Cases A and B in the previous subsection would remain
undefined, but such undefined behavior would not be invoked as long as
<code class="sourceCode cpp">container_of</code> is used. However,
pursuing standardization of
<code class="sourceCode cpp">container_of</code> must begin in WG14, not
WG21, and in any case, such work would have to be <em>in addition</em>
to this paper, not instead of it. Casting to the other pointer type is
of no use if reachability restrictions are not also relaxed in C++.</p>
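<p>In C++, the same dispatch does not need
<code class="sourceCode cpp">_Generic</code>; a template can select the byte
pointer type. The sketch below is an illustration only:
<code class="sourceCode cpp">container_of_impl</code>,
<code class="sourceCode cpp">CONTAINER_OF</code>, and
<code class="sourceCode cpp">Node</code> are hypothetical names, not part of
any proposal. It handles only non-const member pointers, and the result is
well defined only if reachability restrictions are relaxed as this paper
proposes.</p>

```cpp
#include <cstddef>      // offsetof, std::size_t
#include <type_traits>  // std::conditional_t, std::is_same_v

// Hypothetical helper, not a standard facility: steps back `offset` bytes
// from a pointer to a member subobject and reinterprets the result as
// Container*.
template <class Container, class Member>
Container* container_of_impl(Member* member, std::size_t offset) {
    // If the member is unsigned char, go through char*; otherwise go through
    // unsigned char*. The byte pointer type therefore always differs from
    // Member*, so the cast is never a no-op char*-to-char* conversion.
    using Byte = std::conditional_t<std::is_same_v<Member, unsigned char>,
                                    char, unsigned char>;
    Byte* bytes = reinterpret_cast<Byte*>(member);
    return reinterpret_cast<Container*>(bytes - offset);
}

// Hypothetical macro wrapper: offsetof needs the type and member names.
#define CONTAINER_OF(ptr, type, member) \
    container_of_impl<type>((ptr), offsetof(type, member))

// Hypothetical example type with both kinds of byte array member.
struct Node {
    int id;
    char name[16];
    unsigned char tag[4];
};
```

<p>With these definitions,
<code class="sourceCode cpp">CONTAINER_OF(n.name, Node, name)</code> does its
arithmetic through
<code class="sourceCode cpp"><span class="dt">unsigned</span> <span class="dt">char</span><span class="op">*</span></code>,
while <code class="sourceCode cpp">CONTAINER_OF(n.tag, Node, tag)</code> does
it through
<code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>.</p>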
<h2 data-number="10.4" id="should-non-standard-layout-types-be-supported"><span class="header-section-number">10.4</span> Should non-standard-layout
types be supported?<a href="#should-non-standard-layout-types-be-supported" class="self-link"></a></h2>
<p>The <code class="sourceCode cpp">offsetof</code> macro is
conditionally-supported when its type argument is not a standard-layout
class (§<span>17.2.4
<a href="https://wg21.link/N5001#support.types.layout">[support.types.layout]</a></span>p1).
Any struct that is valid in C will produce a standard-layout class when
its definition is compiled as C++ code. Therefore, for purposes of C++
compatibility, we do not necessarily need to allow all bytes of a
non-standard-layout class to be reachable from a pointer to one of its
subobjects.</p>
<p>However, limiting the changes in this paper to standard-layout
classes has some downsides, and no known upsides:</p>
<ol type="1">
<li>It would complicate the specification.</li>
<li>It would leave a footgun in the language. For example, if a codebase
originally written in C were converted to be C++-only, and a
non-standard-layout member were then added to a standard-layout class,
existing code that uses <code class="sourceCode cpp">offsetof</code> to
reach the beginning of the class would silently acquire undefined
behavior.</li>
<li>Although a derived class object can always be reached (even if the
derived class is not standard-layout) from a pointer to a base class
subobject, there are situations in which it is desirable for the
complete object to have data members instead of base classes, including
data members of non-standard-layout types (which cause the enclosing
class to be non-standard-layout, too). The reasons for this, in the context of
the C++ execution library, are discussed by Lewis Baker in <span class="citation" data-cites="P3425R0">[<a href="https://wg21.link/p3425r0" role="doc-biblioref">P3425R0</a>]</span>. Allowing the equivalent of
<code class="sourceCode cpp">container_of</code> to be used in C++ would
simplify the use case discussed therein. (And perhaps the C++ standard
should be amended to require
<code class="sourceCode cpp">offsetof</code> to be supported for
non-standard-layout types that are aggregates, but that’s beyond the
scope of this paper.)</li>
</ol>
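<p>The footgun in item 2 can be sketched as follows. All type names here are
hypothetical. Adding a single member of non-standard-layout type changes the
classification of the enclosing class, and nothing at the
<code class="sourceCode cpp">offsetof</code> call site would need to change
for that to happen.</p>

```cpp
#include <type_traits>  // std::is_standard_layout_v

// Standard-layout, as any struct that is valid C would be.
struct Header {
    int refcount;
};

// Hypothetical non-standard-layout type: it has a virtual function.
struct Tracer {
    virtual ~Tracer() = default;
};

// The class as originally written: standard-layout.
struct Before {
    int value;
    Header hdr;
};

// The same class after a non-standard-layout member is added.
struct After {
    int value;
    Header hdr;
    Tracer tracer;  // newly added member
};

static_assert(std::is_standard_layout_v<Before>);
static_assert(!std::is_standard_layout_v<After>);
```

<p>If the changes in this paper were limited to standard-layout classes, code
that steps back from <code class="sourceCode cpp">hdr</code> to the enclosing
object would be covered for <code class="sourceCode cpp">Before</code> but not
for <code class="sourceCode cpp">After</code>.</p>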
<h1 data-number="11" id="proposed-wording"><span class="header-section-number">11</span> Proposed wording<a href="#proposed-wording" class="self-link"></a></h1>
<p>This wording is a modified version of the wording in <span class="citation" data-cites="P1839R7">[<a href="https://isocpp.org/files/papers/P1839R7.html" role="doc-biblioref">P1839R7</a>]</span> and is relative to working
draft <span class="citation" data-cites="N5001">[<a href="https://wg21.link/n5001" role="doc-biblioref">N5001</a>]</span>.</p>
<p>Modify §<span>6.7.2
<a href="https://wg21.link/N5001#intro.object">[intro.object]</a></span>p3
as follows:</p>
<blockquote>
<p>If a complete object is created ([expr.new]) in storage associated
with another object <em>e</em> of type “array of <em>N</em> <code class="sourceCode cpp"><span class="dt">unsigned</span> <span class="dt">char</span></code>”
<span class="add" style="color: #006e28"><ins>other than a synthesized
object representation ([basic.types.general])</ins></span> or of type
“array of <em>N</em>
<code class="sourceCode cpp">std<span class="op">::</span>byte</code>”
([cstddef.syn]), that array <em>provides storage</em> for the created
object if […]</p>
</blockquote>
<p>Modify §<span>6.7.2
<a href="https://wg21.link/N5001#intro.object">[intro.object]</a></span>p4
as follows:</p>
<blockquote>
<p>An object <em>a</em> is <em>nested within</em> another object
<em>b</em> if</p>
<ul>
<li><em>a</em> is a subobject of <em>b</em>, or</li>
<li><em>b</em> provides storage for <em>a</em>, or</li>
<li><span class="add" style="color: #006e28"><ins><em>a</em> and
<em>b</em> are the object representations of two objects <em>o1</em> and
<em>o2</em>, where <em>o2</em> provides storage for <em>o1</em>,
or</ins></span></li>
<li>there exists an object <em>c</em> where <em>a</em> is nested within
<em>c</em>, and <em>c</em> is nested within <em>b</em>.</li>
</ul>
</blockquote>
<p>Modify §<span>6.7.2
<a href="https://wg21.link/N5001#intro.object">[intro.object]</a></span>p10
as follows:</p>
<blockquote>
<p>Unless an object is a bit-field or a subobject of zero size, the
address of that object is the address of the first byte it occupies. Two
objects with overlapping lifetimes that are not bit-fields may have the
same address if</p>
<ul>
<li>one is nested within the other,</li>
<li>at least one is a subobject of zero size and they are not of similar
types ([conv.qual]), <span class="rm" style="color: #bf0303"><del>or</del></span></li>
<li><span class="add" style="color: #006e28"><ins>at least one is a
synthesized object representation or element thereof,
or</ins></span></li>
<li>they are both potentially non-unique objects;</li>
</ul>
<p>otherwise, they have distinct addresses and occupy distinct bytes of
storage.</p>
</blockquote>
<p>Modify §<span>6.7.2
<a href="https://wg21.link/N5001#intro.object">[intro.object]</a></span>p14
as follows:</p>
<blockquote>
<p>Except during constant evaluation, an operation that begins the
lifetime of an array of <code class="sourceCode cpp"><span class="dt">unsigned</span> <span class="dt">char</span></code>
or <code class="sourceCode cpp">std<span class="op">::</span>byte</code>
<span class="add" style="color: #006e28"><ins>other than a synthesized
object representation ([basic.types.general])</ins></span> implicitly
creates objects within the region of storage occupied by the array.</p>
</blockquote>
<p>Edit §<span>6.7.4
<a href="https://wg21.link/N5001#basic.life">[basic.life]</a></span>p1
as follows:</p>
<blockquote>
<p>[…] The lifetime of an object of type
<code class="sourceCode cpp">T</code> <span class="add" style="color: #006e28"><ins>other than an element of a synthesized
object representation ([basic.types.general])</ins></span> begins
when</p>
<ul>
<li>storage with the proper alignment and size for type
<code class="sourceCode cpp">T</code> is obtained, and</li>
<li><span class="add" style="color: #006e28"><ins>if it is not a
synthesized object representation,</ins></span> its initialization (if
any) is complete (including vacuous initialization) ([dcl.init]),</li>
</ul>
<p>except […]. The lifetime of an object <em>o</em> of type
<code class="sourceCode cpp">T</code> <span class="add" style="color: #006e28"><ins>other than an element of a synthesized
object representation</ins></span> ends when:</p>
<ul>
<li>if <code class="sourceCode cpp">T</code> is a non-class type, the
object is destroyed, or</li>
<li>if <code class="sourceCode cpp">T</code> is a class type, the
destructor call starts, or</li>
<li>the storage which the object occupies is released, or is reused by
an object that is <span class="rm" style="color: #bf0303"><del>not</del></span><span class="add" style="color: #006e28"><ins>neither</ins></span> nested within
<em>o</em> ([intro.object]) <span class="add" style="color: #006e28"><ins>nor nested within the object of which
<em>o</em> is the object representation, if any
([basic.types.general])</ins></span>.</li>
</ul>
<p>When evaluating a <em>new-expression</em>, […]<br />
[<em>Example 1</em>: […] <em>end example</em>]<br />
<span class="add" style="color: #006e28"><ins>A synthesized object
representation is not considered to reuse the storage of any other
object.</ins></span></p>
</blockquote>
<p>Insert a new paragraph after §<span>6.7.4
<a href="https://wg21.link/N5001#basic.life">[basic.life]</a></span>p3
as follows:</p>
<blockquote>
<p>The lifetime of a reference begins when its initialization is
complete. The lifetime of a reference ends as if it were a scalar object
requiring storage.</p>
</blockquote>
<blockquote>
<p>[<em>Note 1</em>: [class.base.init] describes the lifetime of base
and member subobjects. —<em>end note</em>]</p>
</blockquote>
<blockquote>
<p>For an object <em>o</em> of class type, the lifetimes of the
elements of the synthesized object representation begin when the
construction of <em>o</em> begins and end when the destruction of
<em>o</em> completes. Otherwise, the lifetimes of the elements of the
synthesized object representation (if any) are the lifetime of
<em>o</em>.</p>
</blockquote>
<p>Modify §<span>6.8.1
<a href="https://wg21.link/N5001#basic.types.general">[basic.types.general]</a></span>p4
as follows and add a paragraph after it:</p>
<blockquote>
<p>The <em>object representation</em> of a complete object type
<code class="sourceCode cpp">T</code> is the sequence of <em>N</em>
<span class="rm" style="color: #bf0303"><del><span><code class="sourceCode default">unsigned char</code></span>
objects</del></span><span class="add" style="color: #006e28"><ins>bytes</ins></span> taken up by a <span class="rm" style="color: #bf0303"><del>non-bit-field</del></span>
complete object of type <code class="sourceCode cpp">T</code>, where
<em>N</em> equals <code class="sourceCode cpp"><span class="kw">sizeof</span><span class="op">(</span>T<span class="op">)</span></code>.
The <em>value representation</em> of a type
<code class="sourceCode cpp">T</code> is the set of bits in the object
representation of <code class="sourceCode cpp">T</code> that participate
in representing a value of type <code class="sourceCode cpp">T</code>.
The object and value representation of a <span class="rm" style="color: #bf0303"><del>non-bit-field</del></span> complete object
of type <span class="add" style="color: #006e28"><ins><em>cv</em></ins></span>
<code class="sourceCode cpp">T</code> are the bytes and bits,
respectively, of the object corresponding to the object and value
representation of its type<span class="add" style="color: #006e28"><ins>; the object representation is considered to
be an array of <em>N</em> <em>cv</em>
<span><code class="sourceCode default">unsigned char</code></span> if
the object occupies contiguous bytes of storage
([intro.object])</ins></span>. The object representation of a bit-field
object is the sequence of <em>N</em> bits taken up by the object, where
<em>N</em> is the width of the bit-field ([class.bit]). The value
representation of a bit-field object is the set of bits in the object
representation that participate in representing its value. Bits in the
object representation of a type or object that are not part of the value
representation are padding bits. For trivially copyable types, the value
representation is a set of bits in the object representation that
determines a value, which is one discrete element of an
implementation-defined set of values.</p>
</blockquote>
<div class="add" style="color: #006e28">

<blockquote>
<p>For a complete object <em>o</em> with type <em>cv</em>
<code class="sourceCode default">T</code> whose object representation is
an array <em>A</em>:</p>
<ul>
<li>If <em>o</em> has type “array of <em>cv</em>
<code class="sourceCode default">unsigned char</code>”, then <em>A</em>
is <em>o</em>.</li>
<li>Otherwise, <em>A</em> is said to be a <em>synthesized object
representation</em>, and is distinct from any object that is not an
object representation.<br />
[<em>Note</em>: In particular, when an array <em>B</em> of <em>N</em>
<code class="sourceCode default">unsigned char</code> provides storage
for an object <em>o</em> of size <em>N</em>, the object representation
of <em>o</em> is a different array that occupies the same storage as
<em>B</em>. —<em>end note</em>]<br />
For each element <em>e</em> of <em>A</em>:
<ul>
<li>If <em>e</em> occupies the same storage as an object having type
<em>cv</em> <code class="sourceCode default">char</code>, <em>cv</em>
<code class="sourceCode default">unsigned char</code>, or <em>cv</em>
<code class="sourceCode default">std::byte</code> that is either
<em>o</em> or a non-bit-field subobject thereof, the value of <em>e</em>
is the value congruent ([basic.fundamental]) to that of the
subobject.</li>
<li>Otherwise, for each bit <em>b</em> in the byte of <em>o</em> that
corresponds to <em>e</em>, let <em>b</em>’ be the corresponding bit of
<em>e</em> and let <em>p(b)</em> be the smallest subobject of <em>o</em>
that contains <em>b</em>, other than an inactive union member or
subobject thereof. If <em>p(b)</em> is a union object or is not within
its lifetime or has an indeterminate value, or if <em>b</em> is not part
of the value representation of <em>p(b)</em>, then <em>b</em>’ has
indeterminate value. Otherwise, if <em>b</em> has an erroneous value,
then <em>b</em>’ has an erroneous value. Otherwise, <em>b</em>’ has an
unspecified value that is neither indeterminate nor erroneous; such a
bit retains its value until <em>p(b)</em> is subsequently modified.</li>
</ul>
[<em>Note:</em> Attempting to access an element of a synthesized object
representation of a volatile object results in undefined behavior
([dcl.type.cv]). —<em>end note</em>]</li>
</ul>
<p>[<em>Note</em>: An object representation is always a complete object.
—<em>end note</em>]</p>
</blockquote>

</div>
<p>Modify §<span>6.8.4
<a href="https://wg21.link/N5001#basic.compound">[basic.compound]</a></span>p5
as follows:</p>
<blockquote>
<p>Two objects <em>a</em> and <em>b</em> are
<em>pointer-interconvertible</em> if <span class="add" style="color: #006e28"><ins>they have the same address
and</ins></span>:</p>
<ul>
<li><span class="rm" style="color: #bf0303"><del>they are the same
object, or</del></span></li>
<li><span class="rm" style="color: #bf0303"><del>one is a union object
and the other is a non-static data member of that object
([class.union]), or</del></span></li>
<li><span class="rm" style="color: #bf0303"><del>one is a
standard-layout class object and the other is the first non-static data
member of that object or any base class subobject of that object
([class.mem]), or</del></span></li>
<li><span class="rm" style="color: #bf0303"><del>there exists an object
<em>c</em> such that <em>a</em> and <em>c</em> are
pointer-interconvertible, and <em>c</em> and <em>b</em> are
pointer-interconvertible.</del></span></li>
<li><span class="add" style="color: #006e28"><ins>they have the same
complete object, or</ins></span></li>
<li><span class="add" style="color: #006e28"><ins>the complete object of
one is the object representation of the complete object of the
other.</ins></span></li>
</ul>
<p><span class="rm" style="color: #bf0303"><del>If two objects are
pointer-interconvertible, then they have the same address, and it is
possible to obtain a pointer to one from a pointer to the other via a
<span><code class="sourceCode default">reinterpret_cast</code></span>
([expr.reinterpret.cast]).</del></span><br />
<span class="add" style="color: #006e28"><ins>[<em>Note</em>: A
<span><code class="sourceCode default">reinterpret_cast</code></span>
([expr.reinterpret.cast]) never converts a pointer to <em>a</em> to a
pointer to <em>b</em> unless <em>a</em> and <em>b</em> are
pointer-interconvertible. —<em>end note</em>]</ins></span></p>
<p><span class="add" style="color: #006e28"><ins>[<em>Note</em>: A
standard-layout class object is pointer-interconvertible with its first
non-static data member (if any) and each of its base class subobjects
([class.mem]). An array object and an object that the array provides
storage for are not pointer-interconvertible. —<em>end
note</em>]</ins></span></p>
</blockquote>
<p>Modify §<span>6.8.4
<a href="https://wg21.link/N5001#basic.compound">[basic.compound]</a></span>p6
as follows:</p>
<blockquote>
<p>A byte of storage <em>b</em> is <em>reachable through</em> a pointer
value that points to an object <em>x</em> if <span class="rm" style="color: #bf0303"><del>there is an object <em>y</em>,
pointer-interconvertible with <em>x</em>, such that <em>b</em> is within
the storage occupied by <em>y</em>, or the immediately-enclosing array
object if <em>y</em> is an array element</del></span><span class="add" style="color: #006e28"><ins><em>b</em> is within the storage occupied by
<em>x</em>’s complete object</ins></span>.</p>
</blockquote>
<p>Modify §<span>7.2.1
<a href="https://wg21.link/N5001#basic.lval">[basic.lval]</a></span>p11
as follows:</p>
<blockquote>
<p>An object of dynamic type
<code class="sourceCode cpp">T</code><sub>obj</sub> is
<em>type-accessible</em> through a glvalue of type
<code class="sourceCode cpp">T</code><sub>ref</sub> if
<code class="sourceCode cpp">T</code><sub>ref</sub> is similar
([conv.qual]) to:</p>
<ul>
<li><code class="sourceCode cpp">T</code><sub>obj</sub>,</li>
<li>a type that is the signed or unsigned type corresponding to
<code class="sourceCode cpp">T</code><sub>obj</sub>, or</li>
<li>a
<code class="sourceCode cpp"><span class="dt">char</span></code><span class="rm" style="color: #bf0303"><del>,
<span><code class="sourceCode default">unsigned char</code></span>,</del></span>
or <code class="sourceCode cpp">std<span class="op">::</span>byte</code>
type<span class="add" style="color: #006e28"><ins>, if the object is an
element of an object representation
([basic.types.general])</ins></span>.</li>
</ul>
<p>If a program attempts to access ([defns.access]) the stored value of
an object through a glvalue through which it is not type-accessible, the
behavior is undefined. […]<br />
[<em>Note 11</em>: […]]<br />
<span class="add" style="color: #006e28"><ins>[<em>Example 2</em>: An
element of an object representation can be accessed through a glvalue of
type <span><code class="sourceCode default">char</code></span>,
<span><code class="sourceCode default">unsigned char</code></span>,
<span><code class="sourceCode default">signed char</code></span>,
<span><code class="sourceCode default">std::byte</code></span>, or a
cv-qualified version of any of these types. —<em>end
example</em>]</ins></span></p>
</blockquote>
<p><em>Drafting note</em>: Because we don’t guarantee that all complete
objects are contiguous (see <span class="citation" data-cites="P1945R0">[<a href="https://wg21.link/p1945r0" role="doc-biblioref">P1945R0</a>]</span>), it cannot always be guaranteed
that, e.g., a <code class="sourceCode cpp"><span class="kw">reinterpret_cast</span></code>
to <code class="sourceCode cpp"><span class="dt">unsigned</span> <span class="dt">char</span><span class="op">*</span></code>
will yield a pointer to an element of an object representation: no
synthesized object representation is present at all in the discontiguous
case. In those cases, we do not attempt to specify the behavior of
accessing the original object through a glvalue of char-like type, so we
shouldn’t claim that it’s well defined to do so.</p>
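<p>As a rough illustration of the kind of code that the amended [basic.lval] wording is meant to cover, the following sketch reads the bytes of an object through glvalues of type <code class="sourceCode cpp"><span class="dt">unsigned</span> <span class="dt">char</span></code>. The helper names <code class="sourceCode cpp">copy_object_bytes</code> and <code class="sourceCode cpp">matches_memcpy</code> are hypothetical and are not part of the proposed wording. The sketch assumes a trivially copyable, contiguous object, which is the case in which an object representation is present.</p>

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of the paper): copies the bytes of `obj`
// one at a time through glvalues of type `const unsigned char`.
// Assumes `T` is trivially copyable and that the object is contiguous.
template <class T>
std::vector<unsigned char> copy_object_bytes(const T& obj) {
    const unsigned char* first = reinterpret_cast<const unsigned char*>(&obj);
    std::vector<unsigned char> bytes;
    bytes.reserve(sizeof(T));
    for (std::size_t i = 0; i != sizeof(T); ++i) {
        // Each read accesses one element of the object representation.
        bytes.push_back(first[i]);
    }
    return bytes;
}

// Hypothetical check: the byte-wise copy should agree with `std::memcpy`.
template <class T>
bool matches_memcpy(const T& obj) {
    unsigned char expected[sizeof(T)];
    std::memcpy(expected, &obj, sizeof(T));
    const std::vector<unsigned char> actual = copy_object_bytes(obj);
    return std::memcmp(expected, actual.data(), sizeof(T)) == 0;
}
```

<p>Existing implementations accept code of this shape. The sketch is only meant to show which accesses the rule about elements of an object representation would apply to, not to assert how any particular implementation treats them.</p>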
<p>Modify §<span>7.3.2
<a href="https://wg21.link/N5001#conv.lval">[conv.lval]</a></span>p3.4,
as amended by the proposed resolution of <span class="citation" data-cites="CWG2901">[<a href="https://wg21.link/cwg2901" role="doc-biblioref">CWG2901</a>]</span>, as follows:</p>
<blockquote>
<ul>
<li>Otherwise, the object indicated by the glvalue is read
([defns.access]). Let <em>V</em> be the value contained in the object.
If <code class="sourceCode cpp">T</code> is an integer type <span class="add" style="color: #006e28"><ins>or <em>cv</em>
<span><code class="sourceCode default">std::byte</code></span></ins></span>,
the prvalue result is the value of <code class="sourceCode cpp">T</code>
congruent ([basic.fundamental]) to <em>V</em>, and <em>V</em> otherwise.
[…]</li>
</ul>
</blockquote>
<p>Modify §<span>7.6.1.9
<a href="https://wg21.link/N5001#expr.static.cast">[expr.static.cast]</a></span>p13
as follows:</p>
<blockquote>
<p>[…] Otherwise, if the original pointer value points to an object
<em>a</em>, <span class="rm" style="color: #bf0303"><del>and there is an
object <em>b</em> of type similar to
<span><code class="sourceCode default">T</code></span> that is
pointer-interconvertible ([basic.compound]) with <em>a</em>, the result
is a pointer to <em>b</em>. Otherwise, the pointer value is unchanged by
the conversion.</del></span><span class="add" style="color: #006e28"><ins>let <em>S</em> be the set of objects that
are pointer-interconvertible with <em>a</em> and have type similar to
<span><code class="sourceCode default">T</code></span>.</ins></span></p>
<div class="add" style="color: #006e28">

<ul>
<li>If <em>S</em> contains <em>a</em>, the result is a pointer to
<em>a</em>.</li>
<li>Otherwise, the result is a member of <em>S</em> whose complete
object is not a synthesized object representation if any such result
would give the program defined behavior. If there are multiple possible
results that would give the program defined behavior, the result is an
unspecified choice among them.</li>
<li>Otherwise (i.e. when there are no such members of <em>S</em> that
would give the program defined behavior), if the object representation
of <em>a</em>’s object is an array <em>A</em>,
<code class="sourceCode default">T</code> is similar to the type of
<em>A</em>, and <em>A</em> is a member of <em>S</em>, the result is a
pointer to <em>A</em>.</li>
<li>Otherwise, if the object representation of <em>a</em>’s complete
object is an array and <code class="sourceCode default">T</code> is
<em>cv</em> <code class="sourceCode default">unsigned char</code>, the
result is a pointer to the element of that object representation that
has the same address as <em>a</em>.</li>
<li>Otherwise, if <code class="sourceCode default">T</code> is
<em>cv</em> <code class="sourceCode default">char</code> or <em>cv</em>
<code class="sourceCode default">std::byte</code>, or an array of one of
these types, let <code class="sourceCode default">U</code> be the type
obtained from <code class="sourceCode default">T</code> by replacing
<code class="sourceCode default">char</code> or
<code class="sourceCode default">std::byte</code> with
<code class="sourceCode default">unsigned char</code>. If a
<code class="sourceCode default">static_cast</code> of the operand to
<code class="sourceCode default">U*</code> would be well-formed and
would yield a pointer to an object representation or element thereof,
the result of the cast to <code class="sourceCode default">T*</code> is
that pointer value.</li>
<li>Otherwise, the result is a pointer to <em>a</em>.</li>
</ul>
<p>Otherwise, if the original pointer value points past the end of an
object <em>a</em>:</p>
<ul>
<li>If the object representation of the complete object of <em>a</em> is
an array <em>A</em>, <code class="sourceCode default">T</code> is
similar to the type of <em>A</em>, and <em>a</em> has the same address
as <em>A</em>, the result is
<code class="sourceCode default">&amp;</code><em>A</em><code class="sourceCode default">+1</code>.</li>
<li>Otherwise, if the object representation of the complete object of
<em>a</em> is an array <em>A</em> and
<code class="sourceCode default">T</code> is <em>cv</em>
<code class="sourceCode default">unsigned char</code>, the result is a
pointer to the element of <em>A</em> (possibly the past-the-end element)
that has the same address as the one represented by the operand.</li>
<li>Otherwise, if <code class="sourceCode default">T</code> is
<em>cv</em> <code class="sourceCode default">char</code> or <em>cv</em>
<code class="sourceCode default">std::byte</code>, or an array of one of
these types, let <code class="sourceCode default">U</code> be the type
obtained from <code class="sourceCode default">T</code> by replacing
<code class="sourceCode default">char</code> or
<code class="sourceCode default">std::byte</code> with
<code class="sourceCode default">unsigned char</code>. If a
<code class="sourceCode default">static_cast</code> of the operand to
<code class="sourceCode default">U*</code> would be well-formed and
would yield a pointer value defined by one of the above cases, the
result of the cast to <code class="sourceCode default">T*</code> is that
pointer value.</li>
<li>Otherwise, the result is the value of the operand.</li>
</ul>

</div>
</blockquote>
<p>Modify §<span>7.6.6
<a href="https://wg21.link/N5001#expr.add">[expr.add]</a></span>p6 as
follows:</p>
<blockquote>
<p>For addition or subtraction, if the expressions
<code class="sourceCode cpp">P</code> or
<code class="sourceCode cpp">Q</code> have type “pointer to <em>cv</em>
<code class="sourceCode cpp">T</code>”<span class="rm" style="color: #bf0303"><del>, where
<span><code class="sourceCode default">T</code></span> and the array
element type are not similar, the behavior is undefined.</del></span>
<span class="add" style="color: #006e28"><ins>, one of the following
shall hold:</ins></span></p>
<ul>
<li><span class="add" style="color: #006e28"><ins><span><code class="sourceCode default">T</code></span>
is similar to the array element type, or</ins></span></li>
<li><span class="add" style="color: #006e28"><ins><span><code class="sourceCode default">T</code></span>
is similar to <span><code class="sourceCode default">char</code></span>
or <span><code class="sourceCode default">std::byte</code></span> and
the pointer value points to a (possibly-hypothetical) element of an
object representation.</ins></span></li>
</ul>
<p><span class="add" style="color: #006e28"><ins>Otherwise, the behavior
is undefined.</ins></span></p>
</blockquote>
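<p>The second bullet above is what would permit byte-wise pointer arithmetic of the following kind. This is a minimal sketch of the <code class="sourceCode cpp">offsetof</code> idiom, assuming a standard-layout class; the type <code class="sourceCode cpp">Node</code> and the function <code class="sourceCode cpp">node_from_payload</code> are hypothetical names introduced only for illustration. Under N5001 as it stands, the arithmetic is formally undefined even though implementations accept it in practice.</p>

```cpp
#include <cstddef>

// Hypothetical standard-layout class used only for this illustration.
struct Node {
    int key;
    int payload;
};

// Hypothetical helper illustrating the idiom; not proposed by the paper.
// Given a pointer to the `payload` member of some `Node`, recover a pointer
// to the enclosing `Node` by stepping back `offsetof(Node, payload)` bytes.
inline Node* node_from_payload(int* payload_ptr) {
    char* bytes = reinterpret_cast<char*>(payload_ptr);
    // Arithmetic on `char*` within the enclosing object's bytes.
    char* start = bytes - offsetof(Node, payload);
    return reinterpret_cast<Node*>(start);
}
```

<p>Whether the final conversion back to <code class="sourceCode cpp">Node<span class="op">*</span></code> yields a pointer to the enclosing object depends on the amended conversion rules in [expr.static.cast] given earlier; the sketch does not attempt to trace them case by case.</p>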
<p>Modify §<span>9.2.9.2
<a href="https://wg21.link/N5001#dcl.type.cv">[dcl.type.cv]</a></span>p5
as follows:</p>
<blockquote>
<p><span class="add" style="color: #006e28"><ins>If an attempt is made
to access an element <em>e</em> of a synthesized object
([basic.types.general]) and <em>e</em> overlaps the storage occupied by
a volatile object (including a subobject) that is within its lifetime,
the behavior is undefined. Otherwise, the</ins></span> <span class="rm" style="color: #bf0303"><del>The</del></span> semantics of an access
through a volatile glvalue are implementation-defined. If an attempt is
made to access an object defined with a volatile-qualified type through
the use of a non-volatile glvalue, the behavior is undefined.</p>
</blockquote>
<h1 data-number="12" id="appendix-a"><span class="header-section-number">12</span> Appendix A<a href="#appendix-a" class="self-link"></a></h1>
<p>The C programming language already contains an opt-in feature that
can be used to tell the compiler that a pointer to part of an object
cannot be used to access other parts of the same object. That feature is
the <code class="sourceCode cpp">restrict</code> keyword. Using
<code class="sourceCode cpp">restrict</code>, the definition of the
function <code class="sourceCode cpp">f3</code> given previously could
be changed to:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> f3<span class="op">(</span><span class="dt">void</span><span class="op">)</span> <span class="op">{</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>    <span class="kw">struct</span> S s<span class="op">;</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>    s<span class="op">.</span>x <span class="op">=</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>    <span class="op">{</span></span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a>        <span class="kw">struct</span> S<span class="op">*</span> <span class="dt">restrict</span> p <span class="op">=</span> <span class="op">&amp;</span>s<span class="op">;</span></span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a>        f4<span class="op">(&amp;</span>s<span class="op">.</span>y<span class="op">);</span></span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a>        <span class="cf">return</span> p<span class="op">-&gt;</span>x <span class="op">*</span> p<span class="op">-&gt;</span>x<span class="op">;</span></span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a>    <span class="op">}</span></span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>In the example above, if
<code class="sourceCode cpp">s<span class="op">.</span>x</code> is
accessed through an lvalue that is <em>based on</em> the restricted
pointer <code class="sourceCode cpp">p</code> <strong>and</strong>
<code class="sourceCode cpp">s<span class="op">.</span>x</code> is
modified at any point during the execution of the block in which
<code class="sourceCode cpp">p</code> is defined, then all accesses to
<code class="sourceCode cpp">s<span class="op">.</span>x</code> during
that block must be through lvalues that are based on
<code class="sourceCode cpp">p</code>. The first condition (that
<code class="sourceCode cpp">s<span class="op">.</span>x</code> is
accessed through an lvalue based on
<code class="sourceCode cpp">p</code>) is already met by the return
statement in <code class="sourceCode cpp">f3</code>; the second
condition will be met if <code class="sourceCode cpp">f4</code> attempts
to modify
<code class="sourceCode cpp">s<span class="op">.</span>x</code>. In that
case, all accesses to
<code class="sourceCode cpp">s<span class="op">.</span>x</code> during
the lifetime of <code class="sourceCode cpp">p</code> would need to be
through lvalues based on <code class="sourceCode cpp">p</code>, but the
modification in <code class="sourceCode cpp">f4</code> could not be, so
the behavior would be undefined. The compiler can assume that this
scenario does not occur, and that
<code class="sourceCode cpp">s<span class="op">.</span>x</code> will
still have the value 1 upon return from
<code class="sourceCode cpp">f4</code>.</p>
<p>GCC does not actually perform this optimization, even with
<code class="sourceCode cpp"><span class="op">-</span>O3</code>. I can
only speculate as to the reason: I suspect that this is not the kind of
optimization that <code class="sourceCode cpp">restrict</code> was
designed to enable, and that such an optimization is simply not very
useful. However, let’s assume for the sake of argument that some experts
would benefit from being given a tool to enable such an optimization in
C++: one that (unlike the current reachability rules in C++) could
actually be used by implementations without breaking compatibility with
C. What might that tool look like?
<code class="sourceCode cpp">restrict</code> itself is unlikely to be
added to C++. If we were to design a different feature for this purpose,
we would probably want it to be in a form that could also be added to
C.</p>
<p>For example, we could change the definition of pointer values in the
C++ standard so that, in the case of an object pointer, the value not
only identifies the object that the pointer value points to or past the
end of, but also includes a <em>reachable range</em>, which is a
contiguous set of bytes; a pointer could be used to access memory only
at addresses that lie within the pointer value’s reachable range. This
provenance model is the one used by CHERI, which refers to the reachable
range as the <em>bounds</em> of a pointer value. The <em>CHERI C/C++
Programming Guide</em> <span class="citation" data-cites="CHERI">[<a href="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-947.pdf" role="doc-biblioref">CHERI</a>]</span> states that the <em>subobject
bounds</em> feature (described in Section 4.3.3), in which taking the
address of a subobject produces a pointer value whose bounds are
narrowed to the memory occupied by the subobject, is not enabled by
default, and when enabled, breaks code that uses the
“<code class="sourceCode cpp">containerof</code> pattern” (p. 16); such
code must be modified to <em>opt out</em> of subobject bounds. However,
CHERI aims to provide improved safety (e.g., by “[preventing] an
overflow on [an array subobject] from affecting the remainder of the
structure”); when the objective of narrowing bounds is to create
potential UB and enable additional optimizations, an opt-in mechanism is
more appropriate. Such an opt-in mechanism, which would be based on Core
wording that defines reachable ranges, might be a library function like
the following:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="co">/// If `p1` is a null pointer, return `p1`.  Otherwise, return a pointer that</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="co">/// points to or past the end of the same object `o` as `p1` but whose</span></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="co">/// reachable range consists of the bytes in [p2, p3).  The storage occupied by</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a><span class="co">/// `o` shall be a subrange of [p2, p3), which shall be a subrange of the</span></span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a><span class="co">/// reachable range of `p1`; otherwise, the behavior is undefined.</span></span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span><span class="op">*</span> narrow_reachable_range_to<span class="op">(</span><span class="dt">void</span><span class="op">*</span> p1,</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a>                                <span class="kw">const</span> <span class="dt">void</span><span class="op">*</span> p2,</span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a>                                <span class="kw">const</span> <span class="dt">void</span><span class="op">*</span> p3<span class="op">)</span>;</span></code></pre></div>
<p>The same library function could also be available in C; for example,
it could be in the <code class="sourceCode cpp"><span class="op">&lt;</span>stdlib<span class="op">.</span>h<span class="op">&gt;</span></code>
header. The previously given example would then become:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> f3<span class="op">(</span><span class="dt">void</span><span class="op">)</span> <span class="op">{</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>    <span class="kw">struct</span> S s<span class="op">;</span></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a>    s<span class="op">.</span>x <span class="op">=</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a>    f4<span class="op">((</span><span class="dt">int</span><span class="op">*)</span>narrow_reachable_range_to<span class="op">(&amp;</span>s<span class="op">.</span>y<span class="op">,</span> <span class="op">&amp;</span>s<span class="op">.</span>y<span class="op">,</span> <span class="op">&amp;</span>s<span class="op">.</span>y <span class="op">+</span> <span class="dv">1</span><span class="op">));</span></span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a>    <span class="cf">return</span> s<span class="op">.</span>x <span class="op">*</span> s<span class="op">.</span>x<span class="op">;</span></span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>The C++ standard library could provide more convenient (presumably
templated) facilities built on top of
<code class="sourceCode cpp">narrow_reachable_range_to</code>.</p>
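<p>One possible shape for such a facility is sketched below. Everything here is an assumption made for illustration: the wrapper name <code class="sourceCode cpp">narrow_to_object</code> is hypothetical, the definitions of <code class="sourceCode cpp">S</code> and <code class="sourceCode cpp">f4</code> are stand-ins for those used earlier, and the body given for <code class="sourceCode cpp">narrow_reachable_range_to</code> is a placeholder that returns its first argument unchanged, because no implementation of reachable ranges exists.</p>

```cpp
// Placeholder definition so that the sketch compiles: this stand-in simply
// returns `p1` and performs no narrowing at all.
inline void* narrow_reachable_range_to(void* p1,
                                       const void* /*p2*/,
                                       const void* /*p3*/) {
    return p1;
}

// Hypothetical convenience wrapper: narrows the reachable range of `p` to
// exactly the storage occupied by the single object `*p`.
template <class T>
T* narrow_to_object(T* p) {
    if (p == nullptr) return nullptr;  // avoid forming `p + 1` from null
    return static_cast<T*>(narrow_reachable_range_to(p, p, p + 1));
}

// Stand-in definitions (assumptions) so that the example is self-contained.
struct S { int x; int y; };
inline void f4(int* q) { *q = 2; }

inline int f3() {
    S s;
    s.x = 1;
    // Only the bytes of `s.y` would be reachable through the argument.
    f4(narrow_to_object(&s.y));
    return s.x * s.x;
}
```

<p>With a wrapper of this kind, the call site no longer has to spell out the bounds by hand, which removes one opportunity to pass a range that does not match the object.</p>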
<p>This paper does not propose to add reachable ranges to the C++
standard, nor a library function similar to
<code class="sourceCode cpp">narrow_reachable_range_to</code>. This
Appendix merely aims to describe one possibility as to how the
optimizations that the paper seeks to invalidate could be recovered by a
future opt-in mechanism.</p>
<h1 data-number="13" id="bibliography"><span class="header-section-number">13</span> References<a href="#bibliography" class="self-link"></a></h1>
<div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="1" role="doc-bibliography">
<div id="ref-CHERI" class="csl-entry" role="doc-biblioentry">
[CHERI] Robert N. M. Watson et al. 2020-06. CHERI C/C++ Programming
Guide. <a href="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-947.pdf"><div class="csl-block">https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-947.pdf</div></a>
</div>
<div id="ref-CWG2901" class="csl-entry" role="doc-biblioentry">
[CWG2901] Jan Schultke. 2024-06-14. Unclear semantics for near-match
aliased access. <a href="https://wg21.link/cwg2901"><div class="csl-block">https://wg21.link/cwg2901</div></a>
</div>
<div id="ref-WG14_N3057" class="csl-entry" role="doc-biblioentry">
[N3057] Jens Gustedt et al. 2022-09-20. Programming languages - A
Provenance-aware memory object model for C. <a href="https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3057.pdf"><div class="csl-block">https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3057.pdf</div></a>
</div>
<div id="ref-N5001" class="csl-entry" role="doc-biblioentry">
[N5001] Thomas Köppe. 2024-12-17. Working Draft, Programming Languages —
C++. <a href="https://wg21.link/n5001"><div class="csl-block">https://wg21.link/n5001</div></a>
</div>
<div id="ref-P0137R1" class="csl-entry" role="doc-biblioentry">
[P0137R1] Richard Smith. 2016-06-23. Core Issue 1776: Replacement of
class objects containing reference members. <a href="https://wg21.link/p0137r1"><div class="csl-block">https://wg21.link/p0137r1</div></a>
</div>
<div id="ref-P1839R1" class="csl-entry" role="doc-biblioentry">
[P1839R1] Krystian Stasiowski. 2019-10-02. Accessing Object
Representations. <a href="https://wg21.link/p1839r1"><div class="csl-block">https://wg21.link/p1839r1</div></a>
</div>
<div id="ref-P1839R6" class="csl-entry" role="doc-biblioentry">
[P1839R6] Brian Bi, Krystian Stasiowski, Timur Doumler. 2024-10-14.
Accessing object representations. <a href="https://wg21.link/p1839r6"><div class="csl-block">https://wg21.link/p1839r6</div></a>
</div>
<div id="ref-P1839R7" class="csl-entry" role="doc-biblioentry">
[P1839R7] Timur Doumler, Krystian Stasiowski, Brian Bi. 2025-01.
Accessing object representations. <a href="https://isocpp.org/files/papers/P1839R7.html"><div class="csl-block">https://isocpp.org/files/papers/P1839R7.html</div></a>
</div>
<div id="ref-P1945R0" class="csl-entry" role="doc-biblioentry">
[P1945R0] Krystian Stasiowski. 2019-10-28. Making More Objects
Contiguous. <a href="https://wg21.link/p1945r0"><div class="csl-block">https://wg21.link/p1945r0</div></a>
</div>
<div id="ref-P2795R5" class="csl-entry" role="doc-biblioentry">
[P2795R5] Thomas Köppe. 2024-03-22. Erroneous behaviour for
uninitialized reads. <a href="https://wg21.link/p2795r5"><div class="csl-block">https://wg21.link/p2795r5</div></a>
</div>
<div id="ref-P2883R0" class="csl-entry" role="doc-biblioentry">
[P2883R0] Alisdair Meredith. 2023-05-19. `offsetof` Should Be A Keyword
In C++26. <a href="https://wg21.link/p2883r0"><div class="csl-block">https://wg21.link/p2883r0</div></a>
</div>
<div id="ref-P3425R0" class="csl-entry" role="doc-biblioentry">
[P3425R0] Lewis Baker. 2024-10-16. Reducing operation-state sizes for
subobject child operations. <a href="https://wg21.link/p3425r0"><div class="csl-block">https://wg21.link/p3425r0</div></a>
</div>
</div>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr />
<ol>
<li id="fn1"><p>In cases where the macro’s name is precisely
<code class="sourceCode cpp">container_of</code>, it appears that it
usually refers to the version defined by the Linux kernel. This version
uses <code class="sourceCode cpp"><span class="dt">void</span><span class="op">*</span></code>,
not <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>;
pointer arithmetic using <code class="sourceCode cpp"><span class="dt">void</span><span class="op">*</span></code>
is not proposed by this paper. However, <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>
is used in many other cases.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>All citations to the Standard are to working draft N5001
unless otherwise specified.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p>In C,
<code class="sourceCode cpp"><span class="dt">void</span></code> is an
object type.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4"><p>Note that the Provenance TS does not state that two
different complete objects always have different storage IDs. According
to section 3.20, a single allocation creates a single storage instance.
For example, when <code class="sourceCode cpp">malloc</code> succeeds,
it returns a pointer to “the allocated storage instance” (per section
7.22.3.4).<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5"><p>An example of dangerous UB is reading from uninitialized
variables. I’ve observed recent versions of Clang eliding branches along
which uninitialized variables are read, causing unit tests to fail when
Clang was upgraded. Such behavior will become (mostly) disallowed in
C++26 due to the adoption of <span class="citation" data-cites="P2795R5">[<a href="https://wg21.link/p2795r5" role="doc-biblioref">P2795R5</a>]</span>.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn6"><p>§6.2.6.1p7 in <span class="citation" data-cites="WG14_N3057">[<a href="https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3057.pdf" role="doc-biblioref">N3057</a>]</span> refers to the “byte array of the
storage instance”, implying that pointer arithmetic can be used to
traverse the entire storage instance. However, a pointer to the first
element of <code class="sourceCode cpp">buf</code> does not appear to be
specified to be interchangeable with a pointer to the corresponding
element of the byte array of the storage instance. The latter value must
be obtained from the former through conversion, as in Case C. In Case B,
§6.5.4p6 of the C23 draft would appear to apply; it states that “A cast
that specifies no conversion has no effect on the type or value of the
expression”. Therefore, casting from <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>
to <code class="sourceCode cpp"><span class="dt">char</span><span class="op">*</span></code>
behaves as if the cast were absent.<a href="#fnref6" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>
</div>
</div>
</body>
</html>
