<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>Erroneous behaviour for uninitialized reads</title>
    <style type="text/css">
      html { margin: 0; padding: 0; color: black; background-color: white; }
      body { margin: 0 auto; padding: 2em; font-size: medium; font-family: "DejaVu Serif", serif; line-height: 150%; max-width: 60em; }
      code { font-family: "DejaVu Sans Mono", monospace; color: #006; }

      h1, h2, h3 { margin: 1.5em 0 .75em 0; line-height: 125%; }
      h1 { clear: both; }

      div.code { white-space: pre-line; font-family: "DejaVu Sans Mono", monospace;
                 border: thin solid #E0E0E0; background-color: #F8F8F8; padding: 1em;
                 border-radius: 4px; }

      div.strictpre { white-space: pre; }

      sub, sup { margin: 0; padding: 0; line-height: 100%; }

      table { border-collapse: collapse; margin: 2em auto; }
      table caption { margin: 2ex 0 0 0; caption-side: bottom; font-family: "DejaVu Sans", sans-serif; font-size: small; }

      th, td { text-align: left; vertical-align: top; padding: .5ex 1em; margin: 0; }

      td.new { background-color: #EFE; }
      td.new:after { content: "new!"; font-family: "DejaVu Sans", sans-serif; font-weight: bold; font-size: xx-small;
                     vertical-align: top; top: -1em; right: -1em; position: relative; float: right; color: #090; }

      thead th { border-top: 2px solid #333; border-bottom: 2px solid #333; }
      tbody tr:last-child th, tbody tr:last-child td, tbody tr th.last { border-bottom: 2px solid #333; }
      tbody.lined td, tr.line td { border-bottom: 1px solid #333; }

      .code .note { font-family: "DejaVu Sans", sans-serif; font-size: small; padding: 0; margin: 0; color: #333; }

      .docinfo { float: right }
      .docinfo p { margin: 0; text-align:right; }
      .docinfo address { font-style: normal; margin-bottom: 2em; }

      .quote { display: inline-block; clear: both; margin-left: 1ex;
                 border: thin solid #E0E0E0; background-color: #F8F8F8; padding: 1ex; }

      .modify { border-left: thick solid #999; border-right: thick solid #999; padding: 0 1em; }
      .insert { border-left: thick solid #0A0; border-right: thick solid #0A0; padding: 0 1em; }
      .insert h3, .insert h4, .insert p { text-decoration: underline; color: #090; }
      .comment { color: #456; }
      .inclassit { font-family: "DejaVu Serif", serif; font-style: italic; }
      .insinline { border-bottom: 2px solid #0A0; }

      ins { color: #090; }
      del { color: #A00; }
      ins code, del code, .insert code { color: inherit; }

      ul.wide li { margin-bottom: 1em; }
      ul.wide li div.code { padding: 0.25ex 1ex; margin: 1ex 0; }
    </style>
  </head>
  <body>
    <div class="docinfo">
      <p>ISO/IEC JTC1 SC22 WG21 P2795R4</p>
      <p>Date: 2023-11-10</p>
      <p>To: SG12, SG23, EWG, CWG</p>
      <address>
        Thomas K&ouml;ppe &lt;<a href="mailto:tkoeppe@google.com">tkoeppe@google.com</a>&gt;
      </address>
    </div>

    <h1>Erroneous behaviour for uninitialized reads</h1>

    <h2>Contents</h2>
    <!-- fgrep -e "<h2 id=" meaning_of_code.html | sed -e 's/.*id="\(.*\)">\(.*\)<\/h2>/<li><a href="#\1">\2<\/a><\/li>/g' -->
    <ol>
      <li><a href="#history">Revision history</a></li>
      <li><a href="#summary">Summary</a></li>
      <li><a href="#motivation">Motivation</a></li>
      <li><a href="#proposal">Proposal: reading an uninitialized variable is erroneous</a></li>
      <li><a href="#implications">Performance and security implications</a></li>
      <li><a href="#optout">An opt-out mechanism</a></li>
      <li><a href="#wording">Proposed wording</a></li>
      <li><a href="#impact">Impact and implementability</a></li>
      <li><a href="#bigpic">The broader picture: Erroneous behaviour in C++</a></li>
      <li><a href="#tooling">Tooling</a></li>
      <li><a href="#optout-alts">Design alternatives for the opt-out mechanism</a></li>
      <li><a href="#meaning">What is code?</a></li>
      <li><a href="#relwork">Related work</a></li>
      <li><a href="#qna">Questions and answers</a></li>
      <li><a href="#ack">Acknowledgements</a></li>
      <li><a href="#references">References</a></li>
    </ol>

    <h2 id="history">Revision history</h2>
    <ul>
      <li>
        <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2795r0.html">P2795R0</a>:
        Initial revision. This revision was presented to SG23 and to EWG in Issaquah.
      </li>
      <li>
        <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2795r1.html">P2795R1</a>:
        Changed title and revised presentation to focus on uninitialized variables, and extract
        erroneous behaviour in general only as a future direction. This revision was presented
        to EWG in Varna.
      </li>
      <li>
        <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2795r2.html">P2795R2</a>:
        Reviewed by CWG and considered done as far as the present proposal goes, but CWG has
        requested that an opt-out mechanism be included in the proposal.
      </li>
      <li><a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2795r3.html">P2795R3</a>:
        The target of the modified behaviour has been changed from
        &ldquo;object with automatic storage duration&rdquo; to &ldquo;non-static local
        variable&rdquo;. This excludes temporary objects and function parameters, which proved
        difficult to handle. Also, an opt-out mechanism is added via an attribute on variable
        definitions (backward-compatible with C++11).
      </li>
      <li>P2795R4: This version. The target of the modified behaviour has been restored to
        once again be <em>all</em> objects with automatic storage duration and temporaries.
        (However, there is no opt-out mechanism for temporary objects.)
        Wording has been added for <code>std::bit_cast</code> to handle erroneous values.
        It has been clarified that erroneously initialized storage can still result in
        undefined behaviour if the object representation is not valid for the type (e.g.
        for <code>bool</code>).
        It has been clarified that the opt-out attribute on a function parameter has to
        appear on the function's first declaration.
        The wording has been updated to establish a standard phrasing used to define
        situations that have erroneous behaviour.
      </li>
    </ul>

    <h2 id="summary">Summary</h2>

    <p>
      We propose to address the safety problems of reading a default-initialized automatic
      variable (an &ldquo;uninitialized read&rdquo;) by adding a novel kind of behaviour for
      C++. This new behaviour, called <em>erroneous behaviour</em>, allows us to formally speak
      about &ldquo;buggy&rdquo; (or &ldquo;incorrect&rdquo;) code, that is, code that does not
      mean what it should mean (in a sense we will discuss). This behaviour is both
      &ldquo;wrong&rdquo; in the sense of indicating a programming bug, and also well-defined in
      the sense of not posing a safety risk.
    </p>

    <h2 id="motivation">Motivation</h2>

    <p>
      Pragmatically, there are very few C++ programs in the real world that are entirely
      correct. In terms of the Standard, that means most programs are not constrained by the
      specification at all, since they run into undefined behaviour. This is ultimately not very
      helpful to real software development efforts. The term &ldquo;safety&rdquo; has been
      mentioned as a concern in both C and C++, but it is a nebulous and slippery term that
      means different things to different people. A useful definition that has come up is that
      &ldquo;safety is about the behaviour of incorrect programs&rdquo;.
    </p>
    <p>
      The motivating example of unsafe code that we address in this proposal is reading a
      default-initialized variable of automatic storage duration and scalar type:
    </p>
    <p id="exm"><strong>Example M:</strong></p>
    <div class="code">extern void f(int);

    int main() {
    &nbsp; int x; &nbsp; &nbsp; // default-initialized, value of x is indeterminate
    &nbsp; f(x); &nbsp; &nbsp; &nbsp;// glvalue-to-prvalue conversion has undefined behaviour
    }</div>

    <p>
      This code is blatantly <em>incorrect</em>, but it occurs commonly as a programming error.
      The code is also <em>unsafe</em> because this error is exploitable and leads to real,
      serious vulnerabilities.
    </p>

    <p>
      With increased community interest in safety, and a growing track record of exploited
      vulnerabilities stemming from errors such as this one, there have been calls to fix
      C++. The
      recent <a href="https://open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2723r1.html">P2723R1</a>
      proposes to make this fix by changing the undefined behaviour into well-defined behaviour,
      and specifically to well-define the initialization to be zero. We will argue below that
      such an expansion of well-defined behaviour would be a great detriment to the
      understandability of C++ code. In fact, if we want to both preserve the expressiveness of
      C++ and also fix the safety problems, we need a novel kind of behaviour.
    </p>

    <p>
      The excellent survey paper
      <a href="https://open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2754r0.html">P2754R0</a>
      analyses a number of possible changes to automatic variable initialization. We had
      circulated the core idea of proposed erroneous behaviour on the reflector previously,
      and that option is contained in the survey. We reproduce the survey summary here,
      with minor modifications and added colour:
    </p>

    <table>
      <caption>Conclusion from <a href="https://open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2754r0.html">P2754R0</a></caption>
      <thead>
        <tr>
          <th>Proposed Solution</th>
          <th>Viability</th>
          <th>Backward Compatibility</th>
          <th>Expressibility</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td style="color:#800">Always Zero-Initialize</td>
          <td>Viable</td>
          <td>Compatible</td>
          <td style="color:#800"><strong>Worse</strong></td>
        </tr>
        <tr>
          <td style="color:#800">Zero-Initialize or Diagnose</td>
          <td style="color:#800"><strong>Unclear</strong></td>
          <td>Correct-Code Compatible</td>
          <td>Unchanged</td>
        </tr>
        <tr>
          <td style="color:#800">Force-Initialize in Source</td>
          <td>Viable</td>
          <td style="color:#800"><strong>Incompatible</strong></td>
          <td>Better</td>
        </tr>
        <tr>
          <td style="color:#800">Force-Initialize or Annotate</td>
          <td>Viable</td>
          <td style="color:#800"><strong>Incompatible</strong></td>
          <td>Better</td>
        </tr>
        <tr>
          <td style="color:#800">Default Value, Still UB</td>
          <td style="color:#800"><strong>Nonviable</strong></td>
          <td>Compatible</td>
          <td>Unchanged</td>
        </tr>
        <tr>
          <td style="color:#060">Default Value, Erroneous</td>
          <td>Viable</td>
          <td>Compatible</td>
          <td>Unchanged</td>
        </tr>
        <tr>
          <td style="color:#800">Value-Initialize Only</td>
          <td style="color:#800"><strong>Unclear</strong></td>
          <td style="color:#800"><strong>Unclear</strong></td>
          <td style="color:#800"><strong>Unclear</strong></td>
        </tr>
      </tbody>
    </table>

    <p>
      The introduction of a novel notion of erroneous behaviour is the only solution that is
      viable, compatible with existing code, and that does not sacrifice expressiveness of the
      language. (A detailed discussion of expressiveness and the meaning of code follows
      <a href="#meaning">below</a>.) This leads us to our main proposal.
    </p>

    <h2 id="proposal">Proposal: reading an uninitialized variable is erroneous</h2>

    <p>
      We propose to change the semantics of reading an uninitialized variable:
    </p>
    <p style="margin: 0 2em;">Default-initialization of an automatic-storage object initializes the
      object with a <strong>fixed value defined by the implementation</strong>;
      however, <strong>reading that value is a conceptual error</strong>. Implementations
      are <strong>allowed and encouraged to diagnose this error</strong>, but they are also
      allowed to ignore the error and <strong>treat the read as valid</strong>. Additionally, an
      <strong>opt-out mechanism</strong> (in the form of an attribute on a variable definition
      or function parameter) is provided to restore the previous behaviour.
    </p>
    <p>
      This is a novel kind of behaviour. Reading an uninitialized value is never intended and a
      definitive sign that the code is not written correctly and needs to be fixed. At the same
      time, we do give this code well-defined behaviour, and if the situation has not been diagnosed,
      we want the program to be stable and predictable. This is what we call <em>erroneous behaviour</em>.
    </p>
    <p>
      In other words, it is still &ldquo;wrong&rdquo; to read an uninitialized value, but if you do read
      it and the implementation does not otherwise stop you, you get some specific value. In
      general, implementations must exhibit the defined behaviour, at least up until a
      diagnostic is issued (if ever). There is no risk of running into the consequences
      associated with undefined behaviour (e.g. executing instructions not reflected in the
      source code, time-travel optimisations) when executing erroneous behaviour.
    </p>
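    <p>
      The contract described above can be pictured with a small model. The following is only a
      sketch: <code>ModelledInt</code>, its <code>read</code> function, and the value
      <code>0x5A5A5A5A</code> are hypothetical names and choices made for illustration, not part
      of the proposal, and the permission to terminate after a diagnostic is left out. The point
      it illustrates is that a read of a never-assigned object yields the same fixed value
      whether or not the bug is reported.
    </p>

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical model of one automatic `int` under the proposal. Nothing here
// is proposed API; the class name and the value 0x5A5A5A5A are assumptions.
class ModelledInt {
 public:
  // Stands in for the fixed value chosen by the implementation.
  static constexpr int kErroneousValue = 0x5A5A5A5A;

  // Default-initialization: the object holds the fixed value and is marked
  // as never having been assigned.
  ModelledInt() = default;

  // Assignment replaces the erroneous value with an ordinary one.
  ModelledInt& operator=(int v) {
    value_ = v;
    erroneous_ = false;
    return *this;
  }

  // Reading always returns the stored value. When `diagnose` is true, the
  // model also reports the bug and counts it in `*diagnostics`, mirroring an
  // implementation that chooses to diagnose.
  int read(bool diagnose, int* diagnostics) const {
    assert(diagnostics != nullptr);
    if (erroneous_ && diagnose) {
      std::fputs("erroneous read of a default-initialized object\n", stderr);
      ++*diagnostics;
    }
    return value_;
  }

 private:
  int value_ = kErroneousValue;
  bool erroneous_ = true;
};
```

    <p>
      In this model, reading a default-constructed <code>ModelledInt</code> returns
      <code>kErroneousValue</code> both with and without the diagnostic, and after an assignment
      the read is no longer reported.
    </p>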

    <div style="float: right; margin: 0 -18em 0 0; font-size: small;">
    <p>Recall:</p>
    <div class="code">extern void f(int);

    int main() {
    &nbsp; int x;
    &nbsp; f(x);
    }</div></div>

    <p>
      Here is a comparison of the status quo, the proposal P2723R1, and this proposal, with
      regard to the above <a href="#exm">Example&nbsp;M</a>.
    </p>

    <table>
      <caption>Comparison of Example M under various proposals</caption>
      <col width="30%">
      <col width="35%">
      <col width="35%">
      <thead>
        <tr><th>C++23</th><th>P2723R1<br><span style="font-size: x-small;">(default-init zero)</span></th><th>This proposal</th></tr>
      </thead>
      <tbody>
        <tr>
          <td>undefined behaviour</td>
          <td>well-defined behaviour</td>
          <td>erroneous behaviour</td>
        </tr>
        <tr>
          <td>definitely a bug</td>
          <td>may be intentionally using 0, or may be a bug</td>
          <td>definitely a bug</td>
        </tr>
        <tr>
          <td>common compilers allow rejecting (e.g. with <code>-Werror</code>), but this selects a non-conforming compiler mode</td>
          <td>conforming compilers cannot diagnose anything</td>
          <td>conforming compilers generally have to accept, but can reject as QoI in non-conforming modes</td>
        </tr>
      </tbody>
    </table>

    <p>
      A change from the earlier revision
      <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2795r0.html">P2795R0</a>
      is that the permission for an implementation to reject a translation unit
      &ldquo;if it can determine that erroneous behaviour is reachable within that translation
      unit&rdquo; has been removed: Richard Smith pointed out that such a determination is not
      generally possible. Any attempt to reject any erroneous behaviour at all would most likely
      have false positives, since it is in general impossible to determine whether a particular
      piece of code ends up being used. Whereas undefined behaviour in unused code would not
      currently prevent a build from succeeding, it could be rather disruptive if that code were
      now rejected for containing erroneous behaviour, even if it was never used. Therefore,
      in this revision we leave it as pure QoI whether implementations attempt to detect that
      erroneous behaviour might be encountered and issue appropriate warnings or errors, similar
      to how implementations currently attempt to warn about undefined behaviour (e.g. with the
      <code>-Wuninitialized</code> flag).
    </p>

    <p>
      In <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2795r3.html">R3</a>
      we had changed the proposal to only target <em>variables</em> with automatic storage
      duration, whereas R2 targeted, and this R4 again targets, <em>all</em> objects with automatic storage
      duration. The latter include temporary objects (probably, though this is subject to
      <a href="https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#365">CWG 365</a> and
      <a href="https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1634">CWG 1634</a>)
      and function parameters. In R3, we had
      found it difficult to provide adequate opt-out mechanisms for non-variable objects,
      whereas an opt-out mechanism for local variables is straightforward (see below). However,
      EWG has clarified that <em>all</em> automatic-storage objects should be treated the same,
      both in order to keep the mental model simple and to provide the safety benefits in all
      cases. Applying the opt-out mechanism to function parameters should be possible after all,
      but none will be provided for temporary objects. However, temporary objects are much less
      likely to be observable, so that an optimizing compiler should rarely need to create
      additional memory writes to implement the proposed behaviour.
    </p>

    <p>
      Note that we do not want to mandate that the specific value actually be zero (like P2723R1
      does), since we consider it valuable to allow implementations to use different
      &ldquo;poison&rdquo; values in different build modes. Different choices are conceivable
      here. A fixed value is more predictable, but also prevents useful debugging hints, and
      poses a greater risk of being deliberately relied upon by programmers.
    </p>
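    <p>
      To see why a non-zero value can serve as a debugging hint, assume (purely for illustration)
      an implementation that fills every byte of the storage with one repeated byte. The helper
      name <code>value_after_fill</code> and the fill bytes <code>0x00</code> and
      <code>0xAA</code> below are assumptions of this sketch, not values required or suggested
      by the wording.
    </p>

```cpp
#include <cstdint>
#include <cstring>

// Returns the 32-bit value whose object representation consists of four
// copies of `fill`. The fill byte is an illustrative assumption; the proposal
// leaves the actual choice to the implementation.
inline std::uint32_t value_after_fill(unsigned char fill) {
  unsigned char storage[sizeof(std::uint32_t)];
  std::memset(storage, fill, sizeof storage);  // emulate the pre-filled storage
  std::uint32_t value = 0;
  std::memcpy(&value, storage, sizeof value);  // read the bytes back as a value
  return value;
}
```

    <p>
      A fill byte of <code>0x00</code> yields <code>0</code>, which is hard to tell apart from a
      legitimate value, whereas <code>0xAA</code> yields <code>0xAAAAAAAA</code>, which tends to
      stand out in a debugger or a log.
    </p>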

    <h2 id="implications">Performance and security implications</h2>

    <p>
      During core wording review, we noted a number of implications of changing initialization
      semantics, which we want to call out explicitly here.
    </p>
    <ul>
      <li>
        The automatic storage for an automatic variable is always fully initialized, which has
        potential performance implications. P2723R1 discusses the costs in some detail. Note that
        this cost even applies when a class-type variable is constructed that has no padding and
        whose default constructor initializes all members.
      </li>
      <li>
        In particular, unions are fully initialized. Copying a union is not erroneous, and in
        general, copying padding bits is not erroneous. In detail, this implies that
        glvalue-to-prvalue conversion of erroneous values is not itself erroneous, but doing
        anything with such a value other than copying it is erroneous. This is entirely parallel
        to the current undefined behaviour rules around indeterminate values.
      </li>
      <li>
        This proposal affects only the semantics of the initialization of variables, not of all
        uses of indeterminate values in general. For example, one can copy an indeterminate
        value into an initialized variable, and reading that value can still lead to undefined
        behaviour, notwithstanding the variable&rsquo;s well-intentioned initialization.
      </li>
      <li>
        The proposed changes to the initialization rules only affect the initialization of an
        automatic variable as a single operation. Here is an example that is <em>not</em> affected by
        this proposal and that involves an automatic variable and default-initialization separately:
        <div class="code" style="margin: 1ex 0;">void f() {
          &nbsp; char data[] = {'s', 'e', 'c', 'r', 'e', 't'}; &nbsp; // automatic variable
          &nbsp; ::new (static_cast&lt;void*&gt;(data)) char; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;// default-initialization
          }
        </div>
        The placement-new is <em>not</em> guaranteed to overwrite the data in its storage.</li>
      <li>Function parameters and temporary objects are also affected by this proposal.</li>
    </ul>

    <h2 id="optout">An opt-out mechanism</h2>

    <h3 id="optout-motivation">Motivation</h3>

    <p>The proposed change of behaviour has a runtime cost for existing code, since in general
      additional initialization of memory is now required. We reiterate at this point that the
      proposed change is nothing more and nothing less than a safety and security feature: It
      does not affect the semantics of correct code, and it does not alter the meaning of
      correct code or affect whether code is correct.</p>

    <p>Users who do not wish to accept this performance penalty should be given an option to
      disable this new safety feature and thereby recover the previous C++23 behaviour.
      Concretely, reading from a variable that has been opted out of the &ldquo;erroneous
      behaviour&rdquo; initialization has undefined behaviour again (so the compiler is allowed
      to assume that this does not happen and optimise accordingly).</p>

    <h3 id="optout-caution">A word of caution</h3>

    <p>Before we go into the details of the opt-out syntax, we have to discuss a potential
      pitfall (pointed out by the indefatigable Richard Smith): Users may wish to annotate
      their code to be explicit about the fact that they do not intend to initialize a
      variable. This is entirely unrelated to the safety feature of erroneous initialization.
      However, there is a danger that the opt-out for the safety feature is mistaken for a
      mechanism to document intention. If, for example, we made a syntax <code>= noinit</code>
      available, users might be tempted to use this to document their code:</p>
    <div class="code" style="background-color: #FEE;"><div class="note">hypothetical counter-example, inappropriate use</div>
      int x; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="comment">// C++23, legacy</span>
      int x = noinit; &nbsp; <span class="comment">// same thing, modern replacement, self-documenting in C++26</span></div>
    <p>It is plausible that a modern codebase would adopt a rule that initializers must never be omitted
      (as in <code>int x;</code>). Users might then expect that the new syntax would provide a principled alternative.
      (This was proposed, for example, in
      <a href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0632r0.html">P0632R0</a>.)
      But instead they would unwittingly and unintentionally opt out of a safety mechanism.
    </p>

    <h3 id="optout-decision">Decision</h3>

    <p>We propose to use an <em>attribute</em> to request opting out of the new behaviour.
      Attributes are a familiar mechanism in C++, and attributes are allowed in the two locations
      where we need it, namely appertaining to a variable definition and to a function parameter.
      An attribute works well
      semantically, since the sole purpose of this attribute is to disable a safety
      mechanism. Code that is correct with the attribute is also correct if the attribute is
      removed or ignored. Indeed, the attribute makes code strictly less permissive.</p>

    <p>The attribute is allowed only on variable definitions and on function parameters.
      It can appear either at the
      beginning of a declaration (in which case it applies to all declarators), or on an
      individual declarator immediately after the <em>declarator-id</em>. It is ill-formed to
      place the attribute anywhere else.</p>

    <p>No opt-out mechanism is provided for temporary objects, since we have not been able to
      find a suitable syntactic location. However, we consider this only a minor problem, since
      it is comparatively much harder to observe temporary objects, and if an optimizing compiler
      can prove that the storage is not observed, it can plausibly avoid dead stores.</p>

    <p>Rejected alternative approaches, as well as previous discussions and polls, are covered
      <a href="#optout-alts">below</a>. Note that unlike the alternative suggestions, attributes
      are backward-compatible with C++11.</p>

    <p>It remains to decide the name of the attribute. In light of the word of caution above, we
      would like to stay clear of the much-suggested term &ldquo;uninitialized&rdquo;. The
      opt-out is expected to be an expert-only feature that disables a safety guardrail and
      would be used only when performance concerns warrant it. We consider it acceptable for the
      name to be long and unwieldy, and it is perhaps even a desirable feature for the attribute
      to <em>not</em> appeal to the regular user for a mistaken purpose, as discussed above. We
      propose the spelling <code>[[indeterminate]]</code>. This seems to describe the effect
      reasonably well and is not prone to being misused to document intentional lack of
      initialization. To witness this in action:</p>

    <div class="code">int x [[indeterminate]];
      std::cin >> x;

      [[indeterminate]] int a, b[10], c[10][10];
      compute_values(&amp;a, b, c, 10);

      int d[3] = {}, e [[indeterminate]] [3]; &nbsp; <span class="comment">// [sic!]</span>
      copy_three_values(<span class="comment">/*from=*/</span>d, <span class="comment">/*to=*/</span>e);

      <span class="comment">// This class benefits from avoiding determinate-storage initialization guards.</span>
      struct SelfStorage {
      &nbsp; std::byte data_[512];
      &nbsp; void f(); &nbsp; <span class="comment">// uses data_ as scratch space</span>
      };

      SelfStorage s [[indeterminate]]; &nbsp; <span class="comment">// documentation suggested this</span>

      void g([[indeterminate]] SelfStorage s = SelfStorage()); &nbsp; <span class="comment">// same; unusual, but plausible</span></div>
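    <p>
      A codebase that must also build with toolchains that predate the attribute might wrap it
      in a macro, along the following lines. This is a sketch under assumptions:
      <code>EXAMPLE_INDETERMINATE</code> and <code>sum_of_squares</code> are hypothetical names,
      and we assume that an implementation supporting the attribute reports it through
      <code>__has_cpp_attribute</code>.
    </p>

```cpp
#include <cstddef>

// EXAMPLE_INDETERMINATE is a hypothetical macro name, not part of the proposal.
#ifdef __has_cpp_attribute
#  if __has_cpp_attribute(indeterminate)
#    define EXAMPLE_INDETERMINATE [[indeterminate]]
#  endif
#endif
// Fallback for toolchains that do not report the attribute: expand to nothing.
#ifndef EXAMPLE_INDETERMINATE
#  define EXAMPLE_INDETERMINATE
#endif

// Sums the squares 0*0 + 1*1 + ... for the first `n` integers (capped at 64),
// using a scratch buffer that is opted out of pre-filling where supported.
inline unsigned sum_of_squares(std::size_t n) {
  unsigned scratch EXAMPLE_INDETERMINATE [64];
  if (n > 64) n = 64;
  // Every element that is read below is written here first.
  for (std::size_t i = 0; i < n; ++i) {
    scratch[i] = static_cast<unsigned>(i * i);
  }
  unsigned total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    total += scratch[i];
  }
  return total;
}
```

    <p>
      Because the function writes every element before reading it, it is correct whether the
      macro expands to the attribute or to nothing; only the cost of pre-filling
      <code>scratch</code> differs.
    </p>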

    <h2 id="wording">Proposed wording</h2>

    <p>To establish the meaning of erroneous behaviour, first add an entry to [3, intro.defs]:</p>
    <div class="insert">
      <p><strong><ins>3.? [defns.erroneous]</ins></strong></p>
      <p><strong><ins>erroneous behavior</ins></strong></p>
      <p><ins>well-defined behavior that the implementation is recommended to diagnose</ins></p>
      <p><ins>
        [<em>Note 1 to entry</em>: Erroneous behavior is always the consequence of
        incorrect program code. Implementations are allowed, but not required, to diagnose it
        ([4.1.1, intro.compliance.general]).
        Evaluation of a constant expression ([7.7, expr.const]) never exhibits behavior
        specified as erroneous in [4, intro] through [15, cpp].
        &mdash;&nbsp;<em>end note</em>]</ins></p>
    </div>
    <p>Also modify the entry in [3, intro.defs] for &ldquo;undefined behavior&rdquo; to
      free up the term &ldquo;erroneous&rdquo;:</p>
    <div class="modify">
      <p><strong>3.65 [defns.undefined]</strong></p>
      <p><strong>undefined behavior</strong></p>
      <p>behavior for which this document imposes no requirements</p>
      <p>
        [<em>Note 1 to entry</em>:
        Undefined behavior may be expected when this document omits any explicit definition of
        behavior or when a program uses an <del>erroneous</del><ins>incorrect</ins> construct
        or <del>erroneous</del><ins>invalid</ins> data. Permissible undefined behavior ranges
        from ignoring the situation completely with unpredictable results, to behaving during
        translation or program execution in a documented manner characteristic of the
        environment (with or without the issuance of a diagnostic message), to terminating a
        translation or execution (with the issuance of a diagnostic
        message). Many <del>erroneous</del><ins>incorrect</ins> program constructs do not
        engender undefined behavior; they are required to be diagnosed. [&hellip;]
        &mdash;&nbsp;<em>end note</em>]
      </p>
    </div>

    <p>Update the footnote in [4.1.1, intro.compliance.general]p(2.1):</p>
    <div class="modify">
      <p>Although this document states [&hellip;]</p>
      <ul>
        <li>
          If a program contains no violations of the rules in [Clause 5, lex] through [Clause
          33] and [Annex D, depr], a conforming implementation shall, within its resource limits
          as described in [Annex B, implimits], accept
          and correctly execute[Footnote: &ldquo;Correct execution&rdquo;
          can include undefined behavior<ins> and erroneous behavior</ins>, depending on the data being processed; see [intro.defs]
          and [intro.execution].] that program.</li>
        <li>If a program contains a violation [&hellip;]</li>
      </ul>
    </div>

    <p>Then, define erroneous behaviour; modify [4.1.2, intro.abstract] paragraph 5
      and append two new paragraphs:</p>
    <div class="modify">
      <p>
        5. A conforming implementation executing a well-formed program shall
        produce the same observable behavior as one of the possible executions
        of the corresponding instance of the abstract machine with the
        same program and the same input.
        However, if any such execution contains an undefined operation, this document places no
        requirement on the implementation executing that program with that input
        (not even with regard to operations preceding the first undefined
        operation).
        <ins>If the execution contains an operation specified as having erroneous behavior,
        the implementation is permitted to issue a diagnostic
        and is permitted to terminate the execution at an unspecified time
        after that operation.</ins>
      </p>
      <p><ins>?. <em>Recommended practice</em>: An implementation should issue a diagnostic
        when such an operation is executed.</ins></p>
      <p><ins>?. [<em>Note&nbsp;?</em>:
        An implementation can issue a diagnostic if it can determine that erroneous behavior
        is reachable under an implementation-specific set of assumptions about the program behavior,
        which can result in false positives. &mdash;&nbsp;<em>end note</em>]</ins></p>
    </div>

    <p>
      Exclude erroneous behaviour from constant expressions;
      modify [7.7, expr.const] paragraph 5 as follows:
    </p>
    <div class="modify">
      <ul>
        <li>[&hellip;]</li>
        <li>an expression that would exceed the implementation-defined limits (see [implimits]);</li>
        <li>an operation that would have undefined<ins> or erroneous</ins> behavior
          as specified in [intro] through [cpp],
          excluding [dcl.attr.assume];<sup>[footnote]</sup></li>
        <li>an lvalue-to-rvalue conversion ([conv.lval]) unless [&hellip;]</li>
        <li>[&hellip;]</li>
      </ul>
    </div>
    <p>
      Finally, make uninitialized reads erroneous. This requires changing both initialization
      and glvalue-to-prvalue conversion. Modify [6.7.4, basic.indet] heading and paragraph 1:</p>
    <div class="modify">
      <p><strong>6.7.4 Indeterminate<ins> and erroneous</ins> values [basic.indet]</strong></p>
      <p>
        1. When storage for an object with automatic or dynamic storage duration is
        obtained, the <del>object has an
        <em>indeterminate value</em></del><ins>bytes comprising the storage for the
        object have the following initial value:</ins></p>
      <ul>
        <li><ins>If the object has dynamic storage duration,
          or is the object associated with a variable or function parameter
          whose first declaration is marked with the <code>[[indeterminate]]</code>
          attribute [9.12.?, dcl.attr.indet], the bytes have <em>indeterminate values</em></ins>,</li>
        <li><ins>otherwise, the bytes have <em>erroneous values</em>.</ins></li>
      </ul>
      <p><del>, and if</del><ins>If</ins> no initialization is performed for
        <del>the</del><ins>an</ins> object<ins> (including for subobjects)</ins>,
        <del>that object</del><ins>such a byte</ins> retains
        <del>an indeterminate</del><ins>its initial</ins> value
        until that value is replaced ([<ins>dcl.init.general, </ins>expr.ass]).
        <ins>If any bit in the value representation has an indeterminate value,
          the object has an <em>indeterminate value</em>;
          otherwise, if any bit in the value representation has an erroneous value,
          the object has an <em>erroneous value</em> ([conv.lval]).</ins>
      </p>
      <p>
        [<em>Note&nbsp;?</em>:
        Objects with static or thread storage duration are zero-initialized,
        see [basic.start.static].
        &mdash;&nbsp;<em>end note</em>]
      </p>
      <p>
        2. <del>If</del><ins>Except in the following cases, if</ins> an indeterminate value is
        produced by an evaluation, the behavior is undefined<del> except in the following
        cases</del><ins>, and if an erroneous value is produced by an evaluation, the behavior
        is erroneous and the result of the evaluation is a value determined by the implementation
        independent of the state of the program</ins>:
      </p>
      <ul>
        <li>
          If an indeterminate<ins> or erroneous</ins> value of unsigned ordinary character type ([basic.fundamental])
          or <code>std::byte</code> type ([cstddef.syn]) is produced by the evaluation of:
          <ul>
            <li>the second or third operand of a conditional expression ([expr.cond]),</li>
            <li>the right operand of a comma expression ([expr.comma]),</li>
            <li>the operand of a cast or conversion ([conv.integral,
              expr.type.conv, expr.static.cast, expr.cast])
              to an unsigned ordinary character type
              or <code>std::byte</code> type ([cstddef.syn]), or</li>
            <li>a discarded-value expression ([expr.context]),</li>
          </ul>
          then the result of the operation is an indeterminate value<ins> or that erroneous value, respectively</ins>.</li>
        <li>
          If an indeterminate<ins> or erroneous</ins> value of unsigned ordinary character type or <code>std::byte</code>
          type is produced by the evaluation of the right operand of a simple assignment
          operator ([expr.ass]) whose first operand is an lvalue of unsigned ordinary character
          type or <code>std::byte</code> type, an indeterminate value<ins> or that erroneous value, respectively,</ins> replaces the value of the
          object referred to by the left operand.
        </li>
        <li>
          If an indeterminate<ins> or erroneous</ins> value of unsigned ordinary character type
          is produced by the evaluation of the initialization expression when initializing an
          object of unsigned ordinary character type, that object is initialized to an
          indeterminate value<ins> or that erroneous value, respectively</ins>.
        </li>
        <li>
          If an indeterminate<ins> or erroneous</ins> value of unsigned ordinary character type
          or <code>std::byte</code> type is produced by the evaluation of the initialization
          expression when initializing an object of <code>std::byte</code> type, that object is
          initialized to an indeterminate value<ins> or that erroneous value, respectively</ins>.
        </li>
      </ul>
      <p>[<em>Example&nbsp;?</em>:</p>
      <div class="code">int f(bool b) {
        &nbsp; unsigned char <ins>*</ins>c<ins> = new unsigned char</ins>;
        &nbsp; unsigned char d = <ins>*</ins>c; &nbsp;// OK, d has an indeterminate value
        &nbsp; int e = d; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; // undefined behavior
        &nbsp; return b ? d : 0; &nbsp; &nbsp; &nbsp;// undefined behavior if b is true
        }

        <ins>int g(bool b) {</ins>
        &nbsp; <ins>unsigned char c;</ins>
        &nbsp; <ins>unsigned char d = c; &nbsp; // OK, d has an erroneous value</ins>
        &nbsp; <ins>int e = d; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; // OK, erroneous behavior</ins>
        &nbsp; <ins>return b ? d : 0; &nbsp; &nbsp; &nbsp;// OK, erroneous behavior if b is true</ins>
        <ins>}</ins></div>
      <p>&mdash;&nbsp;<em>end example</em>]</p>
    </div>

    <p>Modify [7.3.2, conv.lval]:
    </p>
    <div class="modify">
      <p>
        The result of the conversion is determined according to the
        following rules:
      </p>
      <ul>
        <li>[&hellip;]</li>
        <li>
          Otherwise, the object indicated by the glvalue is read ([defns.access]), and the value
          contained in the object is the prvalue result.
          <ins>If the result is an erroneous value ([basic.indet]) and the
            bits in the value representation are not valid for the object's
            type, the behavior is undefined.</ins>
        </li>
      </ul>
    </div>

    <p>Insert a new subclause [9.12.?, dcl.attr.indet]:
    </p>
    <div class="insert">
      <p><strong>9.12.? Indeterminate storage [dcl.attr.indet]</strong></p>
      <p>?. The <em>attribute-token</em> <code>indeterminate</code> may be applied to the definition
        of a block variable with automatic storage duration or to a
        <em>parameter-declaration</em> of a function declaration.
        No <em>attribute-argument-clause</em> shall be present.
        The attribute specifies that the
        storage of an object with automatic storage duration is initially indeterminate rather
        than erroneous [6.7.4, basic.indet].</p>
      <p>?.
        If a function parameter is declared with the <code>indeterminate</code> attribute,
        it shall be so declared in the first declaration of its function.
        If a function parameter is declared with the <code>indeterminate</code>
        attribute in the first declaration of its function in one
        translation unit and the same function is declared without the <code>indeterminate</code> attribute on the
        same parameter in its first declaration in another translation unit, the
        program is ill-formed, no diagnostic required.</p>
      <p>?. [<em>Note</em>: Reading from an uninitialized variable that is marked
        <code>[[indeterminate]]</code> can cause undefined behavior.</p>
      <p>[<em>Example</em>:</p>
      <div class="code"><ins>void f(int);</ins>
        <ins>void g() {</ins>
        &nbsp; <ins>int x [[indeterminate]], y;</ins>
        &nbsp; <ins>f(y); &nbsp; &nbsp; // erroneous behavior [basic.indet]</ins>
        &nbsp; <ins>f(x); &nbsp; &nbsp; // undefined behavior</ins>
        <ins>}</ins>

        <ins>struct T { T(){} int x; };</ins>
        <ins>void h(T t = T()) {</ins>
        &nbsp; <ins>f(t.x); &nbsp; // undefined behavior if default argument is used</ins>
        <ins>}</ins></div>
      <p>&mdash;&nbsp;<em>end example</em>]</p>
      <p>&mdash;&nbsp;<em>end note</em>]</p>
    </div>
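    <p>
      As an illustration of the intended use of the opt-out, the following minimal sketch
      marks a scratch buffer as indeterminate and then writes every element before reading it,
      so no uninitialized read occurs either way. The macro <code>DEMO_INDETERMINATE</code> and
      the function <code>sum_of_squares</code> are hypothetical names introduced only for this
      example; the macro expands to nothing on compilers that do not report support for the
      proposed attribute.
    </p>

```cpp
#include <cassert>

// Hypothetical portability macro (not part of the proposal): expands to the
// proposed attribute only where the compiler reports support for it.
#ifdef __has_cpp_attribute
#  if __has_cpp_attribute(indeterminate)
#    define DEMO_INDETERMINATE [[indeterminate]]
#  endif
#endif
#ifndef DEMO_INDETERMINATE
#  define DEMO_INDETERMINATE
#endif

// Sums the squares 0*0 + 1*1 + ... + (n-1)*(n-1) using a scratch buffer.
int sum_of_squares(int n) {
    constexpr int capacity = 16;
    assert(n >= 0 && n <= capacity);

    // Opt-out: the storage is initially indeterminate rather than erroneous.
    DEMO_INDETERMINATE int scratch[capacity];

    // Every element that is read below is written here first.
    for (int i = 0; i < n; ++i) scratch[i] = i * i;

    int total = 0;
    for (int i = 0; i < n; ++i) total += scratch[i];
    return total;
}
```

    <p>
      Because the attribute is ignorable, the function behaves identically whether or not the
      attribute is honoured; the attribute only affects what happens if an element were read
      before being written.
    </p>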

    <p>Modify [11.9.3, class.base.init] paragraph 9:</p>
    <div class="modify">
      <p>
        [&hellip;] An attempt to initialize more than one non-static data member of a union renders the
        program ill-formed.
      </p>
      <p>
        [<em>Note&nbsp;?</em>:
        After the call to a constructor for class
        <code>X</code> for an object
        with automatic or dynamic storage duration has completed,
        if the constructor was not invoked as part of value-initialization and a member of
        <code>X</code> is neither initialized nor given a value during execution of
        the <em>compound-statement</em> of the body of the constructor, the member has
        an indeterminate <ins>or erroneous</ins> value<ins> ([basic.indet])</ins>.
        &mdash;&nbsp;<em>end note</em>]
      </p>
      <p>[<em>Example&nbsp;?</em>:</p>
      <div class="code">struct A {
        &nbsp; A();
        };

        struct B {
        &nbsp; B(int);
        };

        struct C {
        &nbsp; C() { } &nbsp; &nbsp; &nbsp;// initializes members as follows:
        &nbsp; A a; &nbsp; &nbsp; &nbsp; &nbsp; // OK, calls A::A()
        &nbsp; const B b; &nbsp; // error: B has no default constructor
        &nbsp; int i; &nbsp; &nbsp; &nbsp; // OK, i has indeterminate<ins> or erroneous</ins> value
        &nbsp; int j = 5; &nbsp; // OK, j has the value 5
        };</div>
      <p>&mdash;&nbsp;<em>end example</em>]</p>
    </div>

    <p>Free up the term &ldquo;erroneous&rdquo; by modifying [7.6.2.8, expr.new] paragraph 8:</p>
    <div class="modify">
      <p>
        If the <em>expression</em> in a <em>noptr-new-declarator</em>
        is present, it is implicitly converted to <code>std::size_t</code>.
        The<ins> value of the</ins> <em>expression</em> is <del>erroneous</del><ins>invalid</ins> if:
      </p>
      <ul>
        <li>the expression is of non-class type and its value before converting to
          <code>std::size_t</code> is less than zero;</li>
        <li>the expression is of class type and its value before application of the second
          standard conversion ([over.ics.user])<sup>[footnote]</sup> is less than zero;</li>
        <li>its value is such that the size of the allocated object would exceed the
          implementation-defined limit ([implimits]); or</li>
        <li>
          the <em>new-initializer</em> is a <em>braced-init-list</em> and the
          number of array elements for which initializers are provided (including the
          terminating <code>'\0'</code> in a <em>string-literal</em> ([lex.string]))
          exceeds the number of elements to initialize.</li>
      </ul>
      <p>
        If the<ins> value of the</ins> <em>expression</em> is <del>erroneous</del><ins>invalid</ins>
        after converting to <code>std::size_t</code>:
      </p>
      <ul>
        <li>if the <em>expression</em> is a potentially-evaluated core constant expression,
          the program is ill-formed;</li>
        <li>otherwise, an allocation function is not called; instead
          <ul>
            <li>if the allocation function that would have been called
              has a non-throwing exception specification ([except.spec]),
              the value of the <em>new-expression</em>
              is the null pointer value of the required result type;</li>
            <li>otherwise, the <em>new-expression</em> terminates by throwing an
              exception of a type that would match a handler ([except.handle]) of type
              <code>std::bad_array_new_length</code> ([new.badlength]).</li>
          </ul>
        </li>
      </ul>
      <p>
        When the value of the <em>expression</em> is zero, the allocation
        function is called to allocate an array with no elements.
      </p>
    </div>
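    <p>
      The run-time branch of the rule above can be observed with a small sketch. The names
      <code>Outcome</code> and <code>try_allocate_ints</code> are hypothetical. The wording
      specifies <code>std::bad_array_new_length</code>; the sketch additionally catches its
      base class <code>std::bad_alloc</code>, on the assumption that some implementations may
      report an invalid length that way.
    </p>

```cpp
#include <cassert>
#include <new>

// Hypothetical classification of what a new-expression did.
enum class Outcome { allocated, bad_array_new_length, other_bad_alloc };

// `count` is a run-time value, so the "ill-formed" branch for core constant
// expressions does not apply; a negative count is an invalid array length.
Outcome try_allocate_ints(int count) {
    try {
        // The volatile pointer keeps the allocation observable to the optimizer.
        int* volatile p = new int[count];
        delete[] p;
        return Outcome::allocated;
    } catch (const std::bad_array_new_length&) {
        return Outcome::bad_array_new_length;   // the outcome the wording specifies
    } catch (const std::bad_alloc&) {
        return Outcome::other_bad_alloc;        // assumption: implementation variance
    }
}
```

    <p>
      For example, <code>try_allocate_ints(4)</code> is expected to allocate, whereas
      <code>try_allocate_ints(-1)</code> is expected to end in one of the two exception outcomes.
    </p>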

    <p>Free up the term &ldquo;erroneous&rdquo; by modifying [9.4.2, dcl.init.aggr] paragraph
      16. It uses the term in the same sense as the preceding edit, so we use the same
      replacement term to keep that connection clear:</p>
    <div class="modify">
      <p>
        Braces can be elided in an <em>initializer-list</em> as follows.
        If the <em>initializer-list</em> begins with a left brace, then the
        succeeding comma-separated list of <em>initializer-clause</em>s initializes
        the elements of a subaggregate; it is <del>erroneous</del><ins>invalid</ins>
        for there to be more <em>initializer-clause</em>s than elements.
      </p>
    </div>

    <p>Modify [16.4.4.4, nullablepointer.requirements] paragraph 2:</p>
    <div class="modify">
      <p>
        A value-initialized object of type <code>P</code> produces the null value of the type.
        The null value shall be equivalent only to itself. A default-initialized object
        of type <code>P</code> may have an indeterminate<ins> or erroneous</ins> value.
      </p>
      <p>
        [<em>Note&nbsp;?</em>:
        Operations involving
        indeterminate values can cause undefined behavior<ins>, and
        operations involving erroneous values can cause erroneous behavior</ins>.
        &mdash;&nbsp;<em>end note</em>]
      </p>
    </div>
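    <p>
      The distinction between value-initialization and default-initialization in this
      requirement can be sketched as follows. The helper names are hypothetical, and the
      sketch deliberately never reads a default-initialized object before assigning to it.
    </p>

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: a value-initialized P produces the null value of the type.
template <class P>
bool value_initialized_is_null() {
    P p{};                 // value-initialization
    return p == nullptr;
}

// A default-initialized local pointer must be given a value before it is read.
int read_after_assignment(int* target) {
    int* p;                // default-initialized: its value is not read here
    p = target;            // replace the initial value first
    return *p;             // well-defined: p now holds `target`
}
```

    <p>
      Both <code>value_initialized_is_null&lt;int*&gt;()</code> and
      <code>value_initialized_is_null&lt;std::nullptr_t&gt;()</code> return <code>true</code>.
    </p>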

    <p>Modify the specification of <code>std::bit_cast</code> to account for erroneous values
      by modifying [22.15.3, bit.cast] paragraph 2:</p>
    <div class="modify">
      <p>
        <em>Returns</em>: An object of type <code>To</code>.
        Implicitly creates objects nested within the result ([intro.object]).
        Each bit of the value representation of the result
        is equal to the corresponding bit in the object representation
        of <code>from</code>. Padding bits of the result are unspecified.
        For the result and each object created within it,
        if there is no value of the object's type corresponding to the
        value representation produced, the behavior is undefined.
        If there are multiple such values, which value is produced is unspecified.
        A bit in the value representation of the result is indeterminate if
        it does not correspond to a bit in the value representation of <code>from</code> or
        corresponds to a bit of an object that is not within its lifetime or
        has an indeterminate value ([basic.indet]).
        </p>
        <p>For each bit in the value representation of the result that is indeterminate,
        the smallest object containing that bit has an indeterminate value;
        the behavior is undefined unless that object is
        of unsigned ordinary character type or <code>std::byte</code> type.
        The result does not otherwise contain any indeterminate values.
        <ins>If a bit of the result corresponds to a bit in the object representation
        of <code>from</code> that has an erroneous value,
        and the bits in the value representation for the smallest object <code>O</code>
        containing that bit are not valid for the type of <code>O</code>,
        the behavior is undefined (see also [conv.lval]);
        otherwise the behavior is erroneous, and the result is as
        specified above, where the value of <code>O</code> is erroneous.</ins>
      </p>
    </div>    

    <p>Free up the term &ldquo;erroneous&rdquo; by modifying [33.7.3, thread.condition.nonmember]
      paragraph 5:</p>
    <div class="modify">
      <p>
        [<em>Note&nbsp;2</em>:
        It is the user&rsquo;s responsibility to ensure that waiting threads do
        not <del>erroneously</del><ins>wrongly</ins> assume that the thread has finished if they
        experience spurious wakeups. [&hellip;] &mdash;&nbsp;<em>end note</em>]
      </p>
    </div>

    <p>Free up the term &ldquo;erroneous&rdquo; by modifying [C.6.6, diff.dcl] paragraph 6:</p>
    <div class="modify">
      <p><strong>Rationale:</strong> This is to avoid <del>erroneous function calls
        (i.e., </del>function calls with the wrong number or type of arguments<del>)</del>.</p>
    </div>

    <h2 id="impact">Impact and implementability</h2>

    <p>
      An implementation of erroneous behaviour for the default initialization of automatic
      variables is already available today. Clang and GCC provide an example of the new
      production behaviour when given the flag <code>-ftrivial-auto-var-init=zero</code>, with
      the caveat that this never exhibits the encouraged behaviour of diagnosing an error.
      Clang provides the error-detecting behaviour through its MemorySanitizer (which currently
      detects undefined behaviour, and would have to be taught to also recognize erroneous behaviour).
    </p>
    <p>
      The proposal primarily constitutes a change of the specification tools that we have
      available in the Standard, so that we have a formal concept of incorrect code that the
      Standard itself can talk about. It should pose only a minor implementation burden.
      Generally, the impact of changing an operation&rsquo;s current undefined behaviour to
      erroneous behaviour is as follows:
    </p>
    <ul>
      <li>On correct code: none observable, but possible performance cost.</li>
      <li>On incorrect code: if the code leads to previously undefined behaviour that
        is changed to erroneous behaviour, the code is still incorrect, but the behaviour
        is now as specified in the change, not unconstrained.</li>
    </ul>

    <h2 id="bigpic">The broader picture: Erroneous behaviour in C++</h2>

    <p>
      The current C++ Standard speaks only about well-defined and well-behaved programs, and
      imposes no requirements on any other program. This results in an overly simple dichotomy
      of a program either being correct as written, with specified behaviour, or being incorrect
      and entirely outside the scope of the Standard. It is not possible for a program to be
      incorrect, yet have its behaviour constrained by the Standard.
    </p>
    <p>
      The newly proposed <em>erroneous behaviour</em> fills this gap. It is well-defined
      behaviour that is nonetheless acknowledged as being &ldquo;incorrect&rdquo;, and thus
      allows implementations to offer helpful diagnostics, while at the same time being
      constrained by the specification.
    </p>
    <p>
      Adopting erroneous behaviour for a particular operation consists of replacing current
      undefined behaviour with a (well-defined) specification of that operation&rsquo;s
      behaviour, explicitly called out as &ldquo;erroneous&rdquo;. This will in general have a
      performance cost, and we need some <em>principles</em> for when we change a particular
      construction to have erroneous behaviour. We propose the following.
    </p>
    <p style="margin: 0 2em;"><strong>Principles of erroneous behaviour:</strong></p>
    <ul style="margin: 0 2em;">
      <li>Currently well-defined behaviour should not generally be made erroneous, since that
        would effectively break existing code. We can consider this in exceptional circumstances
        if a particular behaviour turns out to be always wrong.</li>
      <li>Currently undefined behaviour whose potential for harm is low, e.g.
        as measured by CVEs or other reported exploits of vulnerabilities, should generally
        remain undefined. This requires the least amount of complexity in the standard.</li>
      <li>Currently undefined behaviour which exposes harmful failure modes and for which we
        can find a reasonable well-defined behaviour should be considered for conversion to
        erroneous behaviour.</li>
      <li>Note: some kinds of undefined behaviour, even if harmful, cannot reasonably be
        detected in every case, and thus it would be infeasible to attempt to define the
        behaviour.</li>
      <li>Caution: erroneous behaviour is somewhat more likely to be wilfully relied upon by
        programmers than undefined behaviour, so we should err on the side of retaining undefined
        behaviour unless there is some clear benefit from defining the behaviour.</li>
    </ul>
    <p>
      To present further examples, we consult Shafik Yaghmour&rsquo;s
      <a href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1705r1.html">P1705R1</a>,
      which lists occurrences of undefined behaviour in the Standard. We pick a selection of
      cases and comment on whether each one might be a candidate for conversion to erroneous
      behaviour.
    </p>

    <table>
      <col width="30%">
      <col width="20%">
      <col width="50%">
      <thead>
        <tr><th>UB in C++23</th><th>Action</th><th>Comment</th></tr>
      </thead>
      <tbody class="lined">
        <tr>
          <td>lexing splice ([lex.phases]) results in universal character name</td>
          <td>leave as is</td>
          <td>obscure, low potential for harm</td>
        </tr>
        <tr>
          <td>Modifying a const value</td>
          <td>leave as is</td>
          <td>unlikely to happen (requires explicit, suspicious code), infeasible to specify behaviour</td>
        </tr>
        <tr>
          <td>ODR violation</td>
          <td>leave as is</td>
          <td>infeasible</td>
        </tr>
        <tr>
          <td>Read of indeterminate value</td>
          <td>change to erroneous</td>
          <td>(That is this paper!)</td>
        </tr>
        <tr>
          <td>Signed integer overflow</td>
          <td>could be changed to erroneous</td>
          <td>The result of an overflowing operation could &ldquo;erroneously be [some particular
            value]&rdquo;. This is not an uncommon bug. We consider it of low importance, though,
            since it is not a major safety concern.</td>
        </tr>
        <tr>
          <td>Unrepresentable arithmetic conversions</td>
          <td>could be changed to erroneous</td>
          <td>Same as for signed integer overflow.</td>
        </tr>
        <tr>
          <td>Bad bitshifts</td>
          <td>could be changed to erroneous</td>
          <td>Same as for signed integer overflow.</td>
        </tr>
        <tr>
          <td>calling a function through the wrong function type</td>
          <td>leave as is</td>
          <td>uncommon, infeasible</td>
        </tr>
        <tr>
          <td>invalid down-cast</td>
          <td>leave as is</td>
          <td>infeasible</td>
        </tr>
        <tr>
          <td>invalid pointer arithmetic or comparison</td>
          <td>leave as is</td>
          <td>infeasible</td>
        </tr>
        <tr>
          <td>invalid cast to enum</td>
          <td>unsure</td>
          <td>This needs investigation. Perhaps the invalid value could be erroneously
            preserved. Unclear if this would be useful.</td>
        </tr>
        <tr>
          <td>various misuses of <code>delete</code></td>
          <td>leave as is</td>
          <td>infeasible</td>
        </tr>
        <tr>
          <td>type punning, union misuse, overlapping object access</td>
          <td>leave as is</td>
          <td>infeasible</td>
        </tr>
        <tr>
          <td>null pointer dereference, null pointer-to-member dereference</td>
          <td>practically, leave as is</td>
          <td>One could entertain a change to make a null pointer dereference erroneous, but the
            choice of behaviour is tricky. For scalars, the result could be some fixed
            value. Alternatively, the result could be termination. This would of course have a
            cost.</td>
        </tr>
        <tr>
          <td>division by zero</td>
          <td>could be changed to erroneous</td>
          <td>Could erroneously result in some fixed value. The impact of the status quo is
          unclear; the change would have a cost.</td>
        </tr>
        <tr>
          <td>flowing off the end of a non-<code>void</code> function;
            returning from a <code>[[noreturn]]</code> function</td>
          <td>could be changed to erroneous</td>
          <td>E.g. could erroneously call <code>std::terminate</code>. Mild additional
          cost. Unclear how valuable.</td>
        </tr>
        <tr>
          <td>recursively entering the initialization of a block-static variable</td>
          <td>unsure</td>
          <td>Seems obscure.</td>
        </tr>
        <tr>
          <td>accessing an object outside its lifetime</td>
          <td>leave as is</td>
          <td>infeasible</td>
        </tr>
        <tr>
          <td>calling a pure-virtual function in an abstract base {con,de}structor</td>
          <td>could be changed to erroneous</td>
          <td>E.g. some particular pure-virtual handler could be called erroneously. This might
          already be the case on some implementations.</td>
        </tr>
        <tr>
          <td>[class] doing things with members before construction has finished</td>
          <td>leave as is</td>
          <td>infeasible</td>
        </tr>
        <tr>
          <td>data races</td>
          <td>leave as is</td>
          <td>infeasible (also not often a cause of vulnerabilities)</td>
        </tr>
        <tr>
          <td>library undefined behaviour</td>
          <td>case by case</td>
          <td>A language-support facility such as &ldquo;<code>std::erroneous()</code>&rdquo;
          (which erroneously has no effect) could be used to allow for user-defined erroneous
          behaviour.</td>
        </tr>
        <tr>
          <td>(speculative) contract violation</td>
          <td>could be erroneous</td>
          <td>Current work on contracts comes up against the question of what should happen in
          case of a contract violation. The notion of erroneous behaviour might provide a useful
          answer.</td>
        </tr>
      </tbody>
    </table>
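    <p>
      To make the &ldquo;could be changed to erroneous&rdquo; entries more concrete, the
      following sketch models what erroneous signed integer overflow might look like: a
      diagnostic is issued and a specific value is produced. This is only an illustration
      under stated assumptions, not proposed wording. The function name
      <code>add_with_erroneous_overflow</code> and the choice of the wrapped
      two&rsquo;s-complement result as the &ldquo;particular value&rdquo; are hypothetical.
    </p>

```cpp
#include <cassert>
#include <climits>
#include <cstdio>

// Hypothetical model, not proposed wording: on overflow, report a diagnostic
// and produce the wrapped two's-complement result as the "erroneous" value.
int add_with_erroneous_overflow(int a, int b) {
    // Overflow is detected without ever performing the overflowing addition.
    const bool overflows =
        (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
    if (overflows) {
        std::fputs("diagnostic: signed integer overflow\n", stderr);
        // Unsigned arithmetic wraps; the conversion back to int is modular
        // since C++20 and implementation-defined before that.
        return static_cast<int>(static_cast<unsigned>(a) +
                                static_cast<unsigned>(b));
    }
    return a + b;
}
```

    <p>
      Unlike the status quo, the incorrect call <code>add_with_erroneous_overflow(INT_MAX, 1)</code>
      has a constrained outcome in this model, at the price of an extra check on every addition.
    </p>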

    <h2 id="tooling">Tooling</h2>

    <p>
      While we have been emphasising the importance of code readability and understandability,
      we must also consider the practicalities of actually compiling and running code. Whether
      code has meaning, and if so, which, impacts tools. There are two important, and sometimes
      opposed, use cases we would like to consider.
    </p>

    <h3>Production compilers</h3>

    <p>
      Getting code to run in production often comes with two important (and also opposed)
      expectations:
    </p>
    <ul>
      <li>Performance: code should use as few resources as possible to achieve its specified behaviour.</li>
      <li>Safety: incorrect code should not have harmful side effects.</li>
    </ul>
    <p>
      Undefined behaviour, and in particular its implications on the meaning of code, is
      increasingly exploited by compilers to optimise code generation. By assuming that
      undefined behaviour can never have been intentional, transitive assumptions can be derived
      that allow for far-reaching optimisations. This is often desirable and beneficial for
      correct code (and demonstrates the value of unambiguously understandable code: even
      compilers can use this reasoning to determine how much work does and does not have to be
      done). However, for incorrect code this can expose vulnerabilities, and thus constitute a
      considerable lack of safety.
      <a href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1093r0.pdf">P1093R0</a>
      discusses these performance implications of undefined behaviour.
    </p>
    <p>
      The proposed erroneous behaviour retains the same meaning of code as undefined behaviour
      for human readers, but the compiler has to accept that erroneous behaviour can happen.
      This constrains the compiler (as it has to ensure erroneous results are produced correctly),
      but in the event of incorrect code (which all erroneous behaviour requires), the resulting
      behaviour is constrained by the Standard and does not create a safety hazard. In other words,
      erroneous behaviour has a potential performance cost compared to undefined behaviour, but
      is safer in the presence of incorrect code.
    </p>

    <h3>Debug toolchains and sanitizers</h3>

    <p>
      The other major set of tools that software projects use are debugging tools. Those include
      extra warnings on compilers, static analysers, and runtime sanitizers. The first two are
      good at catching some localised bugs early, but do not catch every bug. Indeed one of the
      main limitations we seem to be discovering is that there is reasonable C++ code for which
      important analyses cannot be performed statically. (Note that
      <a href="https://open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2687r0.pdf">P2687R0</a>
      proposes a safety strategy in which static analysis plays a major role.) Runtime
      sanitizers like ASAN, MSAN, TSAN, UBSAN, on the other hand, have excellent abilities to
      detect undefined behaviour at runtime with virtually no false positives, but at a
      significant build and runtime cost.
    </p>
    <p>
      Both runtime sanitizers and static analysis can use the code readability signal from both
      undefined and erroneous behaviour equally well. In both cases it is clear that the code is
      incorrect. For undefined behaviour, implementations are unconstrained anyway and tools may
      reject or diagnose at runtime. The goal of erroneous behaviour is to permit the exact same
      treatment, by allowing a conforming implementation to diagnose, terminate (and also reject)
      a program that contains erroneous behaviour.
    </p>

    <p>
      In other words, erroneous behaviour retains the understandability and debuggability of
      undefined behaviour, but also constrains the implementation just like well-defined
      behaviour.
    </p>

    <h3>Usage profiles</h3>

    <p>The following toolchain deployment examples are based on real-world setups.</p>
    <ul>
      <li>
        A safety-critical production system (e.g. one that handles user data) compiles code
        treating erroneous behaviour as well-defined, not issuing diagnostics. Compared to the
        status quo, the resulting executable is safer since erroneous behaviour is well-defined.
        There may be a flag to ensure that no program is rejected because of erroneous
        behaviour, so that the build is not blocked by erroneous but dead code.
      </li>
      <li>
        A safety-noncritical, high-performance system (e.g. a scientific simulation) may use a
        toolchain that assumes that no erroneous behaviour exists (similar
        to <code>-ffast-math</code>, which among other things assumes that floating-point
        values are never NaN or infinite). This allows for best-possible performance, and corresponds to the status
        quo. Debugging and testing are performed separately.
      </li>
      <li>
        Debugging and testing deployments, including unit tests, use runtime sanitizers to
        detect undefined and erroneous behaviour. This is a continuous part of a larger
        production ecosystem, where test coverage and fuzzing tools try to expose as much of the
        production code to sanitizers as possible, and it is cultivated as part of the overall
        production effort.
      </li>
    </ul>

    <h2 id="optout-alts">Design alternatives for the opt-out mechanism</h2>

    <p>When it comes to an explicit syntax that restores the initialization behaviour prior to
      this proposed change, multiple plausible options exist. Paper P2723R1 proposed
      <a href="https://open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2723r1.html#out-out">several
      options</a>, which were polled in Issaquah. We will not repeat the details here, and only
      briefly summarise some of the options:</p>

    <ul>
      <li>an attribute (as proposed here, like &ldquo;<code>int x [[foo]];</code>&rdquo;)</li>
      <li>a keyword (like &ldquo;<code>int x = noinit;</code>&rdquo;)</li>
      <li>a magic library type (like &ldquo;<code>int x = std::uninitialized;</code>&rdquo;)</li>
    </ul>

    <p>In the Issaquah polling, the attribute (8|20|13|10|5) and keyword (3|11|10|18|11) options
      did not reach consensus, and the magic library type option (20|11|10|11|5) received only
      mild support. Nonetheless, in this proposal we are again using an attribute, since the
      alternatives are significantly more novel and lack any form of implementation experience,
      whereas we have plenty of experience with attributes. An attribute is also
      compatible with previous versions of C++ down to C++11 (i.e. it can be deployed in code
      that is compiled as one of those versions). The attribute meets the rules of ignorability: ignoring the opt-out does not change
      the meaning of a program that is correct <em>with</em> the opt-out.</p>
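    <p>The ignorability argument can be sketched as follows. This is only an illustration:
      the macro <code>OPT_OUT</code> and the function names are hypothetical helpers introduced
      here, and the sketch assumes that an implementation supporting the attribute reports it
      through <code>__has_cpp_attribute(indeterminate)</code>. Every path writes the variable
      before reading it, so the program means the same thing whether or not the attribute is
      present.</p>

```cpp
#include <cassert>

// Assumption: OPT_OUT is a hypothetical helper macro, not part of the proposal.
// It expands to [[indeterminate]] where the implementation reports support for
// the attribute, and to nothing everywhere else.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(indeterminate)
#    define OPT_OUT [[indeterminate]]
#  endif
#endif
#ifndef OPT_OUT
#  define OPT_OUT
#endif

// Correct with the opt-out: every path assigns `result` before it is read,
// so ignoring the attribute cannot change the meaning of the function.
int clamp_to_byte(int value) {
    int result OPT_OUT;
    if (value < 0) {
        result = 0;
    } else if (value > 255) {
        result = 255;
    } else {
        result = value;
    }
    return result;
}

// Usage sketch: the results are the same whether or not OPT_OUT expands to anything.
void check_clamp_to_byte() {
    assert(clamp_to_byte(-5) == 0);
    assert(clamp_to_byte(300) == 255);
    assert(clamp_to_byte(42) == 42);
}
```

    <p>The same source can therefore be compiled in older language modes, where the macro
      expands to nothing, without changing what the correct program does.</p>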

    <p>Recall the <a href="#optout-caution">word of caution</a> from the design discussion: The
      mechanism to opt out of the safety feature must not be easily mistakable for a mechanism
      to annotate that an initialization is deliberately omitted. We will keep this in mind as
      it applies to the names chosen both for the novel syntax proposals above and also for the
      attribute, and we will dub this the &ldquo;accidental self-documentation trap&rdquo;.</p>

    <p>A number of names have been brought forward for the opt-out attribute. We list only a few
      of the most plausible ones:</p>
    <ul>
      <li>&#10005; <code>uninitialized</code>: falls into the accidental self-documentation trap, as
        discussed in the design section</li>
      <li>&#10005; <code>noinit</code>: quite similarly subject to the accidental
        self-documentation trap (and shows that a <code>noinit</code> keyword would also be
        problematic for the same reasons)</li>
      <li>&#10005; <code>for_overwrite</code>: quite similarly subject to the accidental
        self-documentation trap, though it has a nice connection with existing for-overwrite facilities</li>
      <li>&#10005; <code>no_trivial_initialization</code>: this is a bit confusing, since one can read
        &ldquo;initialization&rdquo; in multiple ways</li>
      <li>&#10005; <code>not_zeroed</code>: too specific and tied to implementation details; we are
        deliberately allowing erroneous values to vary at the implementation&rsquo;s discretion,
        and they are not required to be zero</li>
      <li>&#10005; <code>possible_security_hole</code>: too cute and imprecise,
        but demonstrates how this is about disabling a safety mechanism</li>
      <li>&#10005; <code>assume_indeterminate</code>: close, but perhaps needlessly verbose</li>
      <li>&#10004; <code>indeterminate</code>: short enough, matches the core wording, and mysterious
        enough to require users to look up how it works instead of assuming</li>
    </ul>

    <h2 id="meaning">What is code?</h2>

    <p><em>Here follows a high-level position on the meaning of code, which underpins my motivation
      to preserve the ability to recognize code as incorrect even if it has well-defined
      behaviour.</em></p>

    <p>
      Code is communication. Primarily, code communicates an idea <em>among humans</em>. Humans
      work with code as an evolving and accumulating resource. Its role in software engineering
      projects is not too different from the role of traditional literature in the pursuit of
      science, technology, and engineering: literature is how individuals learn from and
      contribute to collective progress. The fact that code can also be interpreted and executed
      by computers is of course also important, but secondary. (There are many ways one can
      instruct a machine, but not all of them are suitable for building a long-term ecosystem.)
    </p>
    <p>
      The languages of code are programming languages, and the medium is source code, just like
      natural languages written in books, emails, or spoken in videos are the media of
      traditional literature. Like all media, source code is imperfect and ambiguous. The
      purpose of a text is to communicate an idea, but the entire communication has to be
      funnelled through the medium, and understood by the audience. Without the author present to
      explain what they really meant, the text is the only clue to the original idea; any act of
      reading a text is always an act of forensic reconstruction of the original idea. If the
      text is written well and &ldquo;clear&rdquo;, then readers can perform this reconstruction
      with high confidence that they &ldquo;got it right&rdquo; and feel themselves
      understanding the idea; they are &ldquo;on the same page&rdquo; as the author. On the
      other hand, poor writing leads to ambiguous text, and reading requires interpretation and
      often guess-work. This is no different in natural languages than in computer code.
    </p>
    <p>
      I would like to propose that we appreciate the value of code <em>as communication with
      humans</em>, and consider how well a programming language works for that purpose in the medium
      of source code. Source code is often shared among a large group of users, who are
      actively working with the code: code is only very rarely a complete black-box that can be
      added to a project without further thought. At the very least, interfaces and vocabulary
      have to be understood. But commonly, too, code has to be modified in order to be
      integrated into a project, and to be evolved in response to new requirements. Last but not
      least, code often contains errors, which have to be found, understood, and fixed. All of
      the above efforts may be performed by a diverse group of users, none of whom need to have
      intimate familiarity with any one piece of code. There is value in having <em>any</em>
      competent users be able to read and understand any one piece of code &mdash; not
      necessarily in all its domain depth, but well enough to work with it in the context of a
      larger project. To extend the analogy with natural language above, this is similar to how
      a competent speaker of a language should be able to understand and integrate a well-made
      argument in a discussion, even if they are not themselves an expert in the domain of the
      argument.
    </p>
    <p>
      How does all this connect to C++? Like with code in any programming language, given a
      piece of code, a user should be able to understand the idea that the code is
      communicating. Absent a separate document that says &ldquo;Here is what this code is meant
      to do:&rdquo;, the main source of information available to the user is the behaviour of
      the code itself. Note how this has nothing to do with compiling and running code. At this
      point, the code and the idea it communicates exist only in the minds of the author and the
      reader; no compilation is involved. How well the user understands the code depends on how
      ambiguous the code is, that is, how many different things it can mean. The user interprets
      the code by choosing a possible meaning to it from among the choices, where we assume that
      the code is <em>correct</em>: in C++, that means correct in the sense of the Standard,
      being both well-formed and executing with well-defined behaviour. This is critical: the
      constraint of presumed correctness serves as a dramatic aid for interpretation. If we
      assume that code is correct, then we can dismiss any interpretation that would require
      incorrect behaviour, and we have to decide among only the few remaining valid
      interpretations. The more valid interpretations a construction has, the more ambiguity a
      user has during interpretation of the entire piece of code. C++ defines only a very narrow
      set of behaviours, and everything else is left as the infamous <em>undefined
      behaviour</em>, which we could say is not C++ at all, in the sense that we assume that
      that's not what could possibly have been meant. Practically, of course, we would not
      dismiss undefined behaviour as &ldquo;not C++&rdquo;, but instead we would treat it as a
      definitive signal that the code is not communicating its idea correctly. (We could then
      either ask the author for clarification, or, if we are confident to have understood the
      correct idea anyway, we can fix the code to behave correctly. I claim that in this
      long-term perspective on code as a cultural good, buggy code with a clear intention is
      better than well-behaved, ambiguous code: if the intention is clear, then I can see if the
      code is doing the right thing and fix it if not, but without knowing the intention, I have
      no idea if the well-behaved code is doing what it is supposed to.)
    </p>

    <h2 id="relwork">Related work</h2>

    <p>
      Sean Parent&rsquo;s presentation
      <em><a href="https://sean-parent.stlab.cc/presentations/2021-11-14-domain-of-operation/2021-11-14-domain-of-operation.pdf">Reasoning
      About Software Correctness</a></em> (and also his subsequent Cpp North 2022 keynote talk)
      give a useful definition of &ldquo;safety&rdquo; adapted specifically to C++ and
      which explicitly concerns only <em>incorrect code</em>. He defines a function to be safe
      if it does not lead to undefined behaviour (that is, even when preconditions are violated).
    </p>
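    <p>One possible reading of that definition, as a minimal sketch: the two functions below
      share the precondition that the index is in range, but only the second is safe in this
      sense. The function names are hypothetical and chosen for illustration only; they are not
      taken from the presentation.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Precondition for both functions: i < v.size().

// Not safe in the sense above: violating the precondition is undefined behaviour.
int element_unchecked(const std::vector<int>& v, std::size_t i) {
    return v[i];
}

// Safe in the sense above: violating the precondition still means the calling
// code is incorrect, but the outcome is a defined one (an exception).
int element_checked(const std::vector<int>& v, std::size_t i) {
    return v.at(i);
}

// Returns true if the out-of-range call was reported via std::out_of_range.
bool violation_is_reported() {
    const std::vector<int> v{1, 2, 3};
    try {
        (void)element_checked(v, 3);   // precondition violated on purpose
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}

void check_safe_function() {
    const std::vector<int> v{1, 2, 3};
    assert(element_unchecked(v, 1) == 2);   // precondition met
    assert(element_checked(v, 1) == 2);
    assert(violation_is_reported());
}
```

    <p>Note that safety here says nothing about correctness: the out-of-range call remains a
      bug in the caller in both cases.</p>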

    <p>
      JF Bastien's paper P2723R1 proposes addressing the safety concerns around automatic variable
      initialization by just defining variables to be initialized to zero. The previous revision of
      that paper was what motivated the current proposal: The resulting behaviour is desirable,
      but the cost on code understandability is unacceptable to the present author.
    </p>

    <p>
      The papers P2687R0 by Bjarne Stroustrup and Gabriel Dos Reis and P2410R0 by Bjarne
      Stroustrup take a more general look at how to arrive at a safe language. They
      recommend a combination of static analysis and restrictions on the use of the language so
      as to make static analysis very effective. However, on the subject of automatic variable
      initialization specifically they offer no new solution: P2687R0 recommends only either
      zero-initialization or annotated non-initialization (reading of which results in UB); in
      that regard it is similar to JF Bastien's proposal. P2410R0 states that &ldquo;[s]tatic
      analysis easily prevents the creation of uninitialized objects&rdquo;, but the intended result
      of this prevention, and in particular the impact on code understandability, is left open.
    </p>

    <p>
      Tom Honermann proposed a system of &ldquo;diagnosable events&rdquo;, which is largely
      aligned with the values and goals of this proposal, and takes a quite similar approach:
      Diagnosable events have well-defined behaviour, but implementations are permitted to
      handle them in an implementation-defined way.
    </p>

    <p>
      Davis Herring's paper P1494R2 proposes a checkpointing system that would stop undefined
      behaviour from having arbitrarily far-reaching effects. That is a somewhat different
      problem area from the present safety one, and in particular, it does not control the
      effects of the undefined behaviour itself, but merely prevents it from interfering with
      other, previous behaviour. (E.g. this would not prevent the leaking of secrets via
      uninitialized variables.)
    </p>

    <p>
      The Ada programming language has a notion of bounded undefined behaviour.
    </p>

    <p>
      Paper P1093R0 by Bennieston, Coe, Gahir and Russel discusses the value of undefined
      behaviour in <em>correct</em> code and argues for
      the value of the compiler optimizations that undefined behaviour permits. This is
      essentially the tool&rsquo;s perspective of the value of undefined behaviour for the
      interpretability of code which we discussed above: both humans and compilers benefit from
      being able to understand code with fewer ambiguities. Compilers can use the absence of
      ambiguities to avoid generating unnecessary code. The paper argues that we should not
      break these optimizations lightly by making erstwhile undefined behaviour well-defined.
    </p>

    <h2 id="qna">Questions and answers</h2>

    <p>
      <strong>Do you really mean that there can never be any UB in any correct code?</strong>
      There is of course always room for nuance and detail. If a particular construction is
      known to be UB, but still appropriate on some platform or under some additional
      assumptions, it is perfectly fine to use it. It should be documented/annotated
      sufficiently, and perhaps tools that detect UB need to be informed that the construction
      is intentional.
    </p>

    <p>
      <strong>Why is static analysis not enough to solve the safety problem of UB? Why do we need sanitizers?</strong>
      Current C++ is not constrained enough to allow static analysis to accurately detect all
      cases of undefined behaviour. (For example, C++ allows initializing a variable via a call
      to a function in a separate translation unit or library.) Other languages like Rust manage
      to prevent unsafe behaviour statically, but they are more constrained (e.g. Rust does not
      allow passing an uninitialized value to a function). Better static analysis is frequently
      suggested as a way to address safety concerns in C++ (e.g. P2410R0, P2687R0), but this
      usually requires adopting a limited subset of C++ that is amenable to reliable static
      analysis. This does not help with the wealth of existing C++ code, neither with making it
      safe nor with making it correct. By contrast, runtime sanitizers can reliably point out
      when undefined behaviour is reached.
    </p>
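    <p>The parenthetical example above, initialization through a function defined elsewhere,
      might look like the following sketch. The function names are hypothetical. The definition
      of <code>try_parse_digit</code> is shown in the same file only to keep the example
      self-contained; the point is that a caller which sees just the declaration cannot tell
      statically whether <code>value</code> has been written.</p>

```cpp
#include <cassert>

// Hypothetical function: imagine that its definition lives in a separate
// translation unit or library, so the caller sees only this declaration.
bool try_parse_digit(char c, int& out);

int digit_or_minus_one(char c) {
    int value;                         // deliberately no initializer
    if (!try_parse_digit(c, value)) {
        return -1;                     // `value` is never read on this path
    }
    return value;                      // correct only if the callee wrote `value`
}

// The definition "elsewhere": writes `out` exactly when it returns true.
bool try_parse_digit(char c, int& out) {
    if (c < '0' || c > '9') {
        return false;
    }
    out = c - '0';
    return true;
}

void check_digit_or_minus_one() {
    assert(digit_or_minus_one('7') == 7);
    assert(digit_or_minus_one('x') == -1);
}
```

    <p>Whether the read of <code>value</code> is valid depends on a contract that is not
      expressed in the signature, which is the kind of case where a runtime sanitizer can
      report a bad read that a purely local static analysis cannot rule out.</p>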

    <p>
      <strong>Why is <code>int x;</code> any different from <code>std::vector&lt;int&gt;
      v;</code>?</strong>  Several reasons. One is that <code>vector</code> is a class with
      internal invariants that needs to be destructible, so a well-defined initial state already
      suggests itself. Another is that a vector is a container of elements, and if the
      initializer does not provide any elements, then a vector with no elements is an
      unsurprising result. By contrast, if there is no initial value given for
      an <code>int</code>, there is no single number that is better or more obviously right than
      any other number. Zero is a common choice in other languages, but it does not seem helpful
      in the sense of making it easy to write unambiguous code if we allow a novel spelling of
      a zero-valued <code>int</code>. If you mean zero, just say <code>int x = 0;</code>.
    </p>
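    <p>A small sketch of the contrast, with hypothetical function names: the default-constructed
      vector has an unsurprising, well-defined state, whereas the <code>int</code> states its
      value explicitly.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A vector without an initializer is well-defined and unsurprising: it is empty.
std::size_t default_vector_size() {
    std::vector<int> v;
    return v.size();
}

// For an int, no number is the obviously right default, so the intended
// value is written out in the declaration.
int explicit_zero() {
    int x = 0;
    return x;
}

void check_defaults() {
    assert(default_vector_size() == 0);
    assert(explicit_zero() == 0);
}
```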

    <p>
      <strong>Is this proposal better than defining <code>int x;</code> to be zero?</strong>
      It depends on whether you want code to use <code>int x;</code> to mean,
      deliberately, that <code>x</code> is zero. The counter-position, shared by this author, is
      that zero should have no such special treatment, and all initialization should be
      explicit, <code>int x = -1, y = 0, z = +1;</code>. All numeric constants are worth seeing
      explicitly in code, and there is no reason to allow <code>int x;</code> as a valid
      alternative for one particular case that already has a perfectly readable spelling.
      (An explicit marker for a deliberately uninitialized variable is still a good idea,
      and accessing such a variable would remain undefined behaviour, and not become
      erroneous even in this present proposal.)
    </p>

    <h2 id="ack">Acknowledgements</h2>

    <p>
      Many thanks to Loïc Joly for extensive help on restructuring and refocussing this
      document, to Richard Smith for discussion and wording, to Andrzej Krzemienski for wording
      suggestions and for pointing out a possible application to contracts, to Ville Voutilainen
      for naming suggestions, and to John Lakos for valuable feedback and support. Thanks for
      valuable discussions and encouragement also to JF Bastien and to members of SG23.
      Thanks to Jens Maurer for an in-depth wording review and reorganization. Finally,
      thanks to the members of CWG for many wording suggestions.
    </p>

    <h2 id="references">References</h2>
    <ul>
      <li>
        Andrew Bennieston, Jonathan Coe, Daven Gahir, Thomas Russel,
        <em><a href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1093r0.pdf">P1093R0</a>: Is undefined behaviour preserved?</em>
      </li>
      <li>
        Tom Honermann,
        <em>Poisoned values: A feature and specification mechanism to aid diagnosis of implicitly initialized variables</em>, private communication.
      </li>
      <li>
        Davis Herring,
        <em><a href="https://www.open-std.org/JTC1/SC22/WG21/docs/papers/2021/p1494r2.html">P1494R2</a>: Partial program correctness</em>.
      </li>
      <li>
        JF Bastien,
        <em><a href="https://open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2723r1.html">P2723R1</a>: Zero-initialize objects of automatic storage duration</em>.
      </li>
      <li>
        Sean Parent,
        <em><a href="https://sean-parent.stlab.cc/presentations/2021-11-14-domain-of-operation/2021-11-14-domain-of-operation.pdf">Reasoning About Software Correctness</a></em>.
      </li>
      <li>
        Sean Parent,
        Cpp North 2022 keynote talk, <em><a href="https://www.youtube.com/watch?v=kZCPURMH744">The Tragedy of C++</a></em>.
      </li>
      <li>
        Bjarne Stroustrup, Gabriel Dos Reis,
        <em><a href="https://open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2687r0.pdf">P2687R0</a>: Design Alternatives for Type-and-Resource Safe C++</em>.
      </li>
      <li>
        Bjarne Stroustrup,
        <em><a href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2410r0.pdf">P2410R0</a>: Type-and-resource safety in modern C++</em>.
      </li>
      <li>
        Jake Fevold,
        <em><a href="https://open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2754r0.html">P2754R0</a>: Deconstructing the Avoidance of Uninitialized Reads of Auto Variables</em>.
      </li>
      <li>
        Shafik Yaghmour,
        <em><a href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1705r1.html">P1705R1</a>: Enumerating Core Undefined Behavior</em>.
      </li>
      <li>
        Jonathan M&uuml;ller,
        <em><a href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0632r0.html">P0632R0</a>: Proposal of [[uninitialized]] attribute</em>.
      </li>
    </ul>

  </body>
</html>
