<!doctype html public "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">

<head>
<title>SG16: Unicode meeting summaries 2020-06-10 through 2020-08-26</title>
</head>

<style type="text/css">

table#header th,
table#header td
{
    text-align: left;
}

tt {
    font-family: monospace;
}

/* Thanks to Elias Kosunen for the following CSS suggestions! */

* {
    font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, "Noto Sans", sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Noto Color Emoji";
    line-height: 125%;
}

html, body {
    background-color: #eee;
}

h1, h2, h3, h4, h5, p, span, li, dt, dd {
    color: #333;
}

p, li {
    line-height: 140%;
}

body {
    padding: 1em;
    max-width: 1600px;
}

p, li {
    -moz-osx-font-smoothing: grayscale;
    -webkit-font-smoothing: antialiased !important;
    -moz-font-smoothing: antialiased !important;
    text-rendering: optimizelegibility !important;
    letter-spacing: .01em;
}

h1, h2, h3 {
    margin-bottom: 1em;
    letter-spacing: .03em;
}

blockquote.quote
{
    margin-left: 0em;
    border-style: solid;
    background-color: lemonchiffon;
    color: #000000;
    border: 1px solid black;
}

</style>

<body>

<table id="header">
  <tr>
    <th>Document Number:</th>
    <td>P2217R0</td>
  </tr>
  <tr>
    <th>Date:</th>
    <td>2020-08-29</td>
  </tr>
  <tr>
    <th>Audience:</th>
    <td>SG16</td>
  </tr>
  <tr>
    <th>Reply-to:</th>
    <td>Tom Honermann &lt;tom@honermann.net&gt;</td>
  </tr>
</table>


<h1>SG16: Unicode meeting summaries 2020-06-10 through 2020-08-26</h1>

<p>
Summaries of SG16 meetings are maintained at
<a href="https://github.com/sg16-unicode/sg16-meetings">
https://github.com/sg16-unicode/sg16-meetings</a>.  This paper contains a
snapshot of select meeting summaries from that repository.
</p>

<ul>
  <li><a href="#2020_06_10">
      June 10th, 2020</a></li>
  <li><a href="#2020_06_17">
      June 17th, 2020</a></li>
  <li><a href="#2020_07_08">
      July 8th, 2020</a></li>
  <li><a href="#2020_07_22">
      July 22nd, 2020</a></li>
  <li><a href="#2020_08_12">
      August 12th, 2020</a></li>
  <li><a href="#2020_08_26">
      August 26th, 2020</a></li>
</ul>

<p>
Previously published SG16 meeting summary papers:
</p>
<ul>
  <li><a href="https://wg21.link/p1080">P1080: SG16: Unicode meeting summaries 2018/03/28 - 2018/04/25</a></li>
  <li><a href="https://wg21.link/p1137">P1137: SG16: Unicode meeting summaries 2018/05/16 - 2018/06/20</a></li>
  <li><a href="https://wg21.link/p1237">P1237: SG16: Unicode meeting summaries 2018/07/11 - 2018/10/03</a></li>
  <li><a href="https://wg21.link/p1422">P1422: SG16: Unicode meeting summaries 2018/10/17 - 2019/01/09</a></li>
  <li><a href="https://wg21.link/p1666">P1666: SG16: Unicode meeting summaries 2019/01/23 - 2019/05/22</a></li>
  <li><a href="https://wg21.link/p1896">P1896: SG16: Unicode meeting summaries 2019/06/12 - 2019/09/25</a></li>
  <li><a href="https://wg21.link/p2009">P2009: SG16: Unicode meeting summaries 2019-10-09 through 2019-12-11</a></li>
  <li><a href="https://wg21.link/p2179">P2179: SG16: Unicode meeting summaries 2020-01-08 through 2020-05-27</a></li>
</ul>


<h1 id="2020_06_10">June 10th, 2020</h1>

<h2>Draft agenda:</h2>

<ul>
  <li>Discuss terminology updates to strive for in C++23
    <ul>
      <li><a href="https://wg21.link/p1859r0">P1859R0: Standard terminology character sets and encodings</a>.</li>
      <li>Establish priorities for terms to address.</li>
      <li>Establish a methodology for drafting wording updates.</li>
    </ul>
  </li>
</ul>

<h2>Attendees:</h2>

<ul>
  <li>Alisdair Meredith</li>
  <li>Corentin Jabot</li>
  <li>Hubert Tong</li>
  <li>Jens Maurer</li>
  <li>Marcos Bento</li>
  <li>Mark Zeren</li>
  <li>Martinho Fernandes</li>
  <li>Peter Bindels</li>
  <li>Peter Brett</li>
  <li>Steve Downey</li>
  <li>Tom Honermann</li>
  <li>Zach Laine</li>
</ul>

<h2>Meeting summary:</h2>

<ul>
  <li>A round of introductions was held for the benefit of new attendees.</li>
  <li>Zach asked for everyone to contribute to the Boost.Text review scheduled
      to start on the following day, June 11th.
    <ul>
      <li>Contributors will need to subscribe to the
          <a href="https://lists.boost.org/mailman/listinfo.cgi/boost">boost@lists.boost.org</a>
          mailing list at
          <a href="https://lists.boost.org/mailman/listinfo.cgi/boost">https://lists.boost.org/mailman/listinfo.cgi/boost</a>.</li>
      <li>An introductory invitation for SG16 members was posted to the SG16
          mailing list and is available at
          <a href="https://lists.isocpp.org/sg16/2020/06/1499.php">https://lists.isocpp.org/sg16/2020/06/1499.php</a>.</li>
    </ul>
  </li>
  <li>Tom mentioned that work has progressed on establishing a shared calendar
      for all WG21 telecons.  Official announcements are expected soon.  For
      now, BlueJeans calendar invites will continue to be sent as usual, but
      may be discontinued in the future if the shared calendar works well for
      everyone.</li>
  <li>Discuss terminology updates to strive for in C++23
    <ul>
      <li>Tom introduced the topic.
        <ul>
          <li>Per prior meetings, modernizing terminology in the standard is an
              SG16 goal for C++23.</li>
          <li>Tom expressed uncertainty with regard to the best starting point
              for discussion, but suggested starting by reviewing a set of
              existing terms used in the standard that he included in an
              <a href="https://lists.isocpp.org/sg16/2020/06/1484.php">email to the SG16 mailing list</a>
              right before the meeting.</li>
        </ul>
      </li>
      <li>Corentin expressed a desire to take a holistic approach to updating the
          wording and directed attention to his
          <a href="https://lists.isocpp.org/sg16/2020/06/1460.php">D2178R0 draft attached to a message sent to the SG16 mailing list</a>.</li>
      <li>Corentin suggested splitting the effort to focus first on core
          wording, then on library wording.</li>
      <li>PBrett opined that core wording will be difficult and would prefer a
          single paper to address it, but potentially multiple papers to address
          library wording.</li>
      <li>PBrett noted that some library components treat non-text as text.  For
          example, file names, command line arguments, stream contents, and
          environment variables.</li>
      <li>Hubert suggested inserting a third phase up front to just establish
          terminology itself.</li>
      <li>Alisdair agreed noting that commonly understood terminology provides
          the tools necessary to discuss wording.</li>
      <li>Steve expressed a desire to introduce new terms in order to facilitate
          easier communication; specifically new short terms that can substitute
          for otherwise wordy phrasing.</li>
      <li>Steve stated that we'll need to re-word with the expectation of some
          impact on existing implementations.</li>
      <li>Tom agreed noting that he ran into such situations drafting
          <a href="https://wg21.link/p2029">P2029</a>.
          This happens due to interaction with core issues and discovery of
          existing conformance issues in implementations.</li>
      <li>Corentin replied that any such impact should be minimal, and should
          effectively be bug fixes, each of which has limited impact to existing
          implementations.</li>
      <li>PBrett asked if we have general agreement for splitting the work into
          three phases as indicated.</li>
      <li>No objections were raised.</li>
      <li>Hubert stated that we may need to introduce new terms.</li>
      <li>Tom suggested that, perhaps, we should start discussion with
          <em>character</em> first.</li>
      <li>Hubert responded that
          <a href="https://wg21.link/p1859r0">P1859R0</a>
          already discussed <em>abstract character</em> and no one raised
          concerns.</li>
      <li>Discussion turned to the first item in the list of terms Tom sent to
          the mailing list, "The encoding of source files".</li>
      <li>Someone noted that the source may not be a file, or even a digital
          resource with an encoding in any traditional sense.</li>
      <li>Steve responded that Richard Smith is a conforming implementation of
          the standard.</li>
      <li>Alisdair asked if the standard should rule out source files contained
          in .zip files.</li>
      <li>Tom replied that he wasn't aware of such translation phase 1 abilities
          being challenged and that any proposed changes should strive to
          preserve such abilities.</li>
      <li>Corentin observed that, if the source input is an image, there is no
          traditional character encoding or character set, but a stream of
          characters is still available.</li>
      <li>Hubert suggested that it may be useful to introduce the notion of a
          logical source file that is distinct from any physical
          representation.</li>
      <li>Steve noted that a path through that logical representation is
          currently required to retrieve the original spelling of characters in
          raw string literals.</li>
      <li>Corentin opined that the current machinery works and that it is nice
          to be able to discard the notion of a physical source representation
          after phase 1.</li>
      <li>Hubert stated that translation phase 1 does too much for one phase
          right now.</li>
      <li>Corentin agreed and stated a preference that translation phase 1 only
          perform character mapping.</li>
      <li>Jens described how translation phase 1 could be divided into
          sub-phases.  Phase 1A would produce logical characters and phase 1B
          would map to <em>universal-character-name</em>s.</li>
      <li>Jens opined that the notion of physical source file is too limiting;
          other input forms should not be excluded.</li>
      <li>Corentin reiterated his fondness for discarding physical details
          after translation phase 1.</li>
      <li>Jens stated that the current method of reverting portions of
          translation phases 1 and 2 to retrieve the original spelling for raw
          string literals is very hacky; it would be better to preserve the
          original information in a more direct manner.</li>
      <li>Tom asked if there are additional benefits that could be had by
          addressing the raw string literal issue.</li>
      <li>Alisdair responded that, since trigraphs were removed, this scenario
          is now the tail wagging the dog.</li>
      <li>Steve noted that addressing it could solve the
          <a href="https://lists.isocpp.org/sg16/2020/06/1469.php">issue recently discussed on the SG16 mailing list</a>
          involving EBCDIC characters that get converted to
          <em>universal-character-name</em>s that are not semantically
          preserving.</li>
      <li>Hubert noted that we still have outstanding issues with raw string
          literals and new line characters.</li>
      <li>Corentin suggested that introduction of an additional character
          mapping may be heading in the wrong direction; we want to make things
          simpler and being able to focus solely on Unicode post translation
          phase 1 would help that goal.</li>
      <li>Hubert responded that there is a benefit to having the standard
          reflect the general case.</li>
      <li>Tom suggested it would be useful to give this concern a name and move
          on to other discussion.</li>
      <li>Alisdair raised the relationships between character, character set,
          and character encoding.</li>
      <li>Hubert pondered whether we need character repertoire and noted
          overuse of the term character set where character encoding is often
          meant.</li>
      <li>PBrett suggested discontinuing the use of character set.</li>
      <li>Corentin disagreed noting that the execution character set is a
          character set and that discussion of code points requires a character
          set as opposed to a repertoire.</li>
      <li>PBrett asked why a character repertoire plus an encoding doesn't
          suffice.</li>
      <li>Corentin responded that his explanation was based on Unicode
          definitions.</li>
      <li>Hubert stated that use of the Unicode definitions is fine for
          discussion purposes; the basic execution character set is sometimes
          used where an encoding is intended unless you subscribe to the belief
          that <tt>wchar_t</tt> implies a trivial encoding.</li>
      <li>Hubert continued noting that the basic execution character set is
          sometimes used as a repertoire, and at other times used as a character
          set.</li>
      <li>Tom responded that he thinks of the basic execution character set as
          defining a restriction on character sets since it places some
          constraints on code assignments; the code points for digits 0-9 must
          be in sequence, and the code point value for NUL must be 0.</li>
      <li>Hubert noted that the abstract numeric values mapped to abstract
          characters are sometimes fictitious.</li>
      <li>Corentin discussed the idea of the internal character set being a
          repertoire; that works up until translation phase 5 when conversions
          for literals produce objects with values.</li>
      <li>Tom provided a description of his understanding of character
          repertoire, character set, and character encoding.  A character
          repertoire is a set of abstract characters.  A character set is a map
          of abstract characters corresponding to some character repertoire to
          numeric code point values.  A character encoding is a specification
          for how to encode those numeric code point values as a sequence of
          code units.</li>
      <li>Tom asked if any of those definitions were surprising.</li>
      <li>PBrett expressed a little surprise with regard to the implied need for
          a character encoding to have an associated character set since an
          encoding could specify how to encode abstract characters
          directly.</li>
      <li>Steve stated that, according to Unicode, a coded character set defines
          a map of characters to numeric code point values, but that a character
          set in general need not specify such mappings.</li>
      <li>Tom asked for confirmation that we should prefer the term coded
          character set when we explicitly mean a map of characters from a
          repertoire to numeric code point values.</li>
      <li>Steve responded, yes.</li>
      <li>PBrett observed that, for ISO/IEC 8859 specifications other than
          ISO/IEC 8859-1, the specified character repertoire is a subset of the
          Unicode character repertoire, but the specified character set is not a
          subset of the Unicode character set since code point assignments
          differ for some non-ASCII cases.</li>
      <li>PBrett also observed that the basic source character set is a
          repertoire, but the compiler must also define an associated coded
          character set.</li>
      <li>Jens responded that that is true from an implementation perspective,
          but not with regard to how the standard uses it since the standard
          permits symbolic evaluation.</li>
      <li>Hubert noted that the standard may not be very consistent in how the
          existing terms are used, but the use of terms with fewer requirements
          is useful.</li>
      <li>Hubert expressed concern regarding focus on coded character sets
          because it isn't clear that abstract numeric code point values are
          helpful from a specification standpoint.</li>
      <li>Jens responded that it is convenient to be able to discuss a character
          having a numeric value, but agreed that it is not germane to the
          standard.</li>
      <li>Jens continued stating that, at the end of the day, we need to encode
          bytes for a character that was previously abstract; if the use of the
          character set term is confusing, we can replace it, but that seems
          like an editorial concern, albeit a useful one to avoid confusion or
          reduce baggage.</li>
      <li>PBrett expressed support for a new term since character set is often
          confused with encoding.</li>
      <li>Corentin provided the historical perspective that most legacy
          character encodings were trivial encodings of code points from a given
          coded character set, so the terms were almost always interchangeable
          prior to Unicode.</li>
      <li>Steve stated that numeric code point values for basic source
          characters are not observable though, per
          <a href="http://eel.is/c++draft/cpp.cond#12">[cpp.cond]p12</a>,
          different values corresponding to them may be observed at different
          phases of translation.</li>
      <li>Hubert observed that, within the standard, discussion of character
          sets usually corresponds to the Unicode definition of character
          encoding schemes.</li>
      <li>Tom summarized; it sounds like we likely have need for character
          repertoire and character encoding scheme, but perhaps not for
          character set or coded character set.</li>
      <li>Hubert responded that there may be a need for character set
          specifically when referring to Unicode.</li>
      <li>Tom pondered whether a coded character set is needed for character
          literals.  The current constraint for the value of a character literal
          <em>[ Editor's note: other than for multicharacter literals or
          literals with no representation in the execution character set. ]</em>
          is that the abstract character can be encoded in a single code
          unit.</li>
      <li>Jens stated that the only observable character values are code units
          in Unicode parlance.</li>
      <li>PBrett asked whether <em>universal-character-name</em>s fit that
          picture.</li>
      <li>Jens replied that we do associate them with Unicode code points, but
          from a standard perspective, they are basically text.</li>
      <li>Hubert suggested use of generalized terminology for these low level
          concerns with Unicode terminology reserved specifically for Unicode
          encodings.</li>
      <li>Alisdair noted that encoding matters can't assume octets.</li>
      <li>Hubert agreed, but noted that some of the ISO blessed specifications
          specify octets and provided a source in chat:
        <ul>
          <li>"(Source: RFC1866) A function whose domain is the set of sequences
              of octets, and whose range is the set of sequences of characters
              from a character repertoire; that is, a sequence of octets and a
              character encoding scheme determining a sequence of
              characters."</li>
          <li><a href="https://www.iso.org/standard/27688.html">ISO/IEC 15445:2000</a>, 4.3</li>
        </ul>
      </li>
      <li>Tom suggested we move on to some polls.</li>
      <li><b>Poll: We should move forward in three phases. 1) define
          terminology, 2) address core wording, 3) address library use of
          terms</b>
        <ul>
          <li><b>Attendees: 12</b></li>
          <li><b>No objection to unanimous consent.</b></li>
        </ul>
      </li>
      <li><b>Poll: This group generally believes that C++ lexing and parsing
          behavior through translation phase 4 can be defined in terms of
          character repertoires and without the need for coded character
          values.</b>
        <ul>
          <li><b>Attendees: 12</b></li>
          <li>
            <table>
              <tr>
                <th style="text-align:right">SF</th>
                <th style="text-align:right">F</th>
                <th style="text-align:right">N</th>
                <th style="text-align:right">A</th>
                <th style="text-align:right">SA</th>
              </tr>
              <tr>
                <th style="text-align:right">3</th>
                <th style="text-align:right">8</th>
                <th style="text-align:right">0</th>
                <th style="text-align:right">0</th>
                <th style="text-align:right">1</th>
              </tr>
            </table>
          </li>
          <li>SA: We'll have the issue that we cannot preserve byte values from
              the source stream; this loses the relation to bytes and is overly
              abstract.</li>
          <li><em>[ Editor's note: After the telecon, Hubert
              <a href="https://lists.isocpp.org/sg16/2020/06/1489.php">posted to the SG16 mailing list</a>
              to express agreement with the SA position: "I agree ... that the
              strict use of abstract characters introduces problems where a
              coded character set contains multiple values for a single abstract
              character/contains characters that are canonically the same but
              assigned different values." ]</em></li>
        </ul>
      </li>
    </ul>
  </li>
  <li>Tom discussed options for scheduling the next SG16 telecon noting that he
      will not be available the week of June 22nd which would be the next time
      we would meet following our usual cadence.  The group agreed to meet in
      one week, on June 17th, in order to maintain momentum on this topic.</li>
</ul>
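
<p>
The distinction Tom drew above between a character repertoire, a (coded)
character set, and a character encoding, together with PBrett's observation
about ISO/IEC 8859, can be sketched in a few lines.  The following is an
illustrative Python sketch and was not part of the meeting discussion.  The
names <tt>repertoire</tt>, <tt>code_point</tt>, and <tt>code_units</tt> are
hypothetical helpers introduced only for this example, and it assumes that
Python's <tt>str</tt> type and standard codecs are acceptable stand-ins for the
Unicode coded character set and for the encodings named.
</p>

```python
# Illustrative sketch; all names here are hypothetical and chosen for this example.

# Character repertoire: an unordered set of abstract characters.
repertoire = {"A", "\u00e9", "\u20ac"}  # LATIN CAPITAL LETTER A, e with acute, EURO SIGN


def code_point(ch: str) -> int:
    """Coded character set: map an abstract character to its Unicode code point."""
    return ord(ch)


def code_units(ch: str, encoding: str) -> bytes:
    """Character encoding: map the character's code point to code units (bytes)."""
    return ch.encode(encoding)


euro = "\u20ac"
assert euro in repertoire

# One abstract character, one Unicode code point, several encoded forms.
print(hex(code_point(euro)))                # 0x20ac
print(code_units(euro, "utf-8").hex())      # e282ac
print(code_units(euro, "utf-16-be").hex())  # 20ac

# ISO/IEC 8859-15 includes EURO SIGN in its repertoire but assigns it the
# value 0xA4, which differs from the Unicode code point U+20AC.
print(code_units(euro, "iso8859-15").hex())  # a4

# ISO/IEC 8859-1 assigns 0xA4 to CURRENCY SIGN, and its 256 values coincide
# with the first 256 Unicode code points.
assert b"\xa4".decode("iso8859-1") == "\u00a4"
assert all(bytes([b]).decode("iso8859-1") == chr(b) for b in range(256))
```

<p>
Keeping <tt>code_point</tt> separate from <tt>code_units</tt> mirrors the
three-level description given in the summary.  An encoding that maps abstract
characters to code units directly, as PBrett suggested is possible, would
roughly correspond to calling <tt>code_units</tt> without ever consulting
<tt>code_point</tt>.
</p>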


<h1 id="2020_06_17">June 17th, 2020</h1>

<h2>Draft agenda:</h2>

<ul>
  <li>Continue discussion of terminology updates to strive for in C++23
    <ul>
      <li>Resume discussion of relationships between (abstract) character,
          (character) repertoire, (coded) character set, and character encoding.
        <ul>
          <li>Review ISO/IEC 10646:2017 section 3 terms and definitions
            <ul>
              <li><a href="https://standards.iso.org/ittf/PubliclyAvailableStandards/c069119_ISO_IEC_10646_2017.zip">https://standards.iso.org/ittf/PubliclyAvailableStandards/c069119_ISO_IEC_10646_2017.zip</a></li>
            </ul>
          </li>
          <li>Review Unicode section 3.4 terms for characters and encodings
            <ul>
              <li><a href="https://www.unicode.org/versions/Unicode13.0.0/ch03.pdf">https://www.unicode.org/versions/Unicode13.0.0/ch03.pdf</a></li>
            </ul>
          </li>
          <li>Review the Unicode glossary
            <ul>
              <li><a href="https://www.unicode.org/glossary">https://www.unicode.org/glossary</a></li>
            </ul>
          </li>
          <li>Review Corentin's email
            <ul>
              <li><a href="https://lists.isocpp.org/sg16/2020/06/1493.php">https://lists.isocpp.org/sg16/2020/06/1493.php</a></li>
            </ul>
          </li>
          <li>Compare and contrast terms as described by the above
              resources.</li>
        </ul>
      </li>
      <li>Determine suitability of ISO/IEC 10646 terms for use in the C++
          standard.</li>
      <li>Discuss the relationship of the above terms to named entities in
          the standard.</li>
      <li>Identify possible terms to add to
          <a href="http://eel.is/c++draft/intro.defs">[intro.defs]</a>.</li>
    </ul>
  </li>
</ul>

<h2>Attendees:</h2>

<ul>
  <li>Corentin Jabot</li>
  <li>Hubert Tong</li>
  <li>Jens Maurer</li>
  <li>Marcos Bento</li>
  <li>Mark Zeren</li>
  <li>Martinho Fernandes</li>
  <li>Peter Bindels</li>
  <li>Peter Brett</li>
  <li>Steve Downey</li>
  <li>Tom Honermann</li>
</ul>

<h2>Meeting summary:</h2>

<ul>
  <li>Tom introduced the topic:
    <ul>
      <li>The intent is continuation of discussion from the prior telecon.</li>
      <li>Polls taken during the prior telecon were presented and it was noted
          that mailing list discussions following the telecon may have changed
          opinions.</li>
    </ul>
  </li>
  <li>Jens opined that we need more than just a glossary; the mailing list
      discussion raised examples of characters that cannot go through
      translation phase 1 without information loss.  This means we cannot
      convert to Unicode universally without loss.</li>
  <li>Tom raised a question that Corentin had asked him during private discussion
      following the telecon.  Corentin had asked whether, given a string literal
      and a raw string literal where both are specified with the same source
      input characters (with extended characters but without escape sequences),
      both strings must have the same encoded contents after translation
      phase 5.</li>
  <li>PBrett responded that some people assert that raw string literals should
      effectively copy the byte sequence from the source input.</li>
  <li>Corentin disagreed with such an interpretation and noted that conversions
      are required.</li>
  <li>Tom presented two possible models for the reversion of
      <em>universal-character-name</em>s (UCNs) in raw string literals during
      translation phase 5.
    <ul>
      <li>The UCN is reverted to the original source input character and that
          character is then encoded in the appropriate encoding for the kind
          of string literal.</li>
      <li>The UCN is reverted to the code point denoted by the UCN and that
          code point is then encoded in the appropriate encoding for the kind
          of string literal.</li>
    </ul>
  </li>
  <li>Corentin opined that the reversion can be accomplished via the as-if rule
      and translation phase 1 and 5 shenanigans.</li>
  <li>Tom asked Jens to comment on Corentin's interpretation of the as-if rule
      in this context from a core perspective.</li>
  <li>Jens responded that the question is whether a conforming program could
      observe the difference.</li>
  <li>Tom replied that implementation-defined behavior is unavoidable here, so
      the standard can't fully define the behavior on its own.</li>
  <li>Corentin stated that, if you have a Unicode character, conversion to
      Shift-JIS provides a choice of code point values for some characters.</li>
  <li>PBrett noted that the program can distinguish behavior here.</li>
  <li>Corentin replied that the original source file encoding can't be
      observed.</li>
  <li>Martinho noted that a program can demonstrate the behavior though.</li>
  <li>Jens stated that programmers have expectations of behavior based on their
      source file encoding; they expect what they write to be carried
      through.</li>
  <li>Tom asked if it would be conforming for an implementation, given an
      'Å' (U+00C5, LATIN CAPITAL LETTER A WITH RING ABOVE) or
      'Å' (U+212B, ANGSTROM SIGN) in the source input, to always translate
      both to one or the other in the execution character set.</li>
  <li>Corentin replied that, for Unicode input, we can require preservation of
      code points.</li>
  <li>PBrett asked if the standard currently permits such translation.</li>
  <li>Steve responded that translation phase 1 is so loose that any imaginable
      conversion is conforming and provided handling of trigraphs as an
      example.</li>
  <li>Jens agreed and elaborated; translation phase 1 states that physical
      source files are mapped in an implementation-defined manner and that
      mapping can include recognizing and mutating string literals.</li>
  <li>Martinho claimed that an implementation can even recognize every source
      input file as equivalent!</li>
  <li>Jens agreed, but noted that the implementation has to actually define
      what it does.</li>
  <li>PBrett noted the utility of such lenience; for Shift-JIS we only need
      implementation-defined behavior on the input side.</li>
  <li>Steve responded that the conversion to execution character set for
      Shift-JIS could be lossy, but for the Unicode A-with-ring vs
      Angstrom-sign case, it need not be.</li>
  <li>Martinho observed that, if a UCN isn't explicitly written in the source,
      the implementation has freedom to handle the conversion however it
      desires.</li>
  <li>Tom replied that the implementation has such freedom regardless of
      whether the UCN is explicit due to translation phase 1 leniency.</li>
  <li>Corentin stated that leaving these conversions as implementation-defined
      for now will allow us to make progress.</li>
  <li>Jens observed that, for a hypothetical future where Unicode code point
      pass through is required, the implementation-defined steps in between
      can be removed.</li>
  <li>Mark asked whether, in that world, raw string literals would still
      have to revert UCNs.</li>
  <li>Jens responded yes; translation phase 1 could simulate Unicode input.</li>
  <li>Tom observed that recognition of tokens in translation phase 3 depends on
      UCNs and asked, when a UCN is reverted, what it is reverted to.</li>
  <li>Jens responded that it is reverted to an extended character.</li>
  <li>Tom replied that extended characters are not reflected in the grammar and
      stated that this has implications for the stringize operator in the case
      where a macro name spelled with an extended character is stringized.</li>
  <li>PBrett stated that an extended character is any character in the internal
      character set that is not a member of the basic source character set.</li>
  <li>Corentin stated that the mapping from every extended character to a UCN
      is required.</li>
  <li>Hubert noted that the internal character set is effectively Unicode and
      that this differs from the model used for C.</li>
  <li>Jens agreed and observed that the requirement only exists because extended
      characters must be representable as a UCN.</li>
  <li>PBrett asked if this avoids the need to discuss the Unicode character
      set.</li>
  <li>Jens responded that that is the status quo; the question is whether we
      need to carve an exception for extended characters that don't roundtrip
      through Unicode and whether that is desirable or whether loss of some
      information is ok.</li>
  <li>Jens noted that the UCN mechanism permits translation through an
      ASCII-only preprocessor.</li>
  <li>Jens summarized; there are two reasonable positions:
    <ul>
      <li>The status quo; the standard doesn't recognize the existence of
          characters that don't roundtrip through Unicode, or</li>
      <li>The standard should be updated to recognize the possibility of such
          characters and specify behavior for them.</li>
    </ul>
  </li>
  <li>Corentin agreed with Jens' summary, but noted another possible position:
      the standard could specify conversion via Unicode, but require semantic
      preservation for extended characters.</li>
  <li>PBrett asked if the internal character set could be replaced with the
      Unicode character set since the standard requires it to be isomorphic
      anyway.</li>
  <li>Jens expressed concerns about doing so since that would require defining
      behavior for unassigned code points.</li>
  <li>Hubert stated that some implementations map characters to a limited
      internal character set that only supports the current locale; conversion
      through Unicode is a complicated process to get a simple result for
      round tripping.</li>
  <li>Hubert observed that C already adopted a model that doesn't force the
      internal character set to be Unicode.</li>
  <li>Jens noted that C supports UCNs and asked how its model avoids these
      issues.</li>
  <li>Hubert referenced the
      <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/C99RationaleV5.10.pdf">"C99 rationale" document</a>
      and explained that it documents three models for handling UCNs.  C chose
      one model and C++ chose another.</li>
  <li>Hubert noted that the document does not discuss the limitations of
      eagerly converting extended characters to UCNs in translation phase 1,
      which effectively requires all extended characters to have a
      representation in Unicode.</li>
  <li>PBrett asked if implementations that support extended characters not
      represented in Unicode would become non-conforming if the internal
      character set was defined as being Unicode.</li>
  <li>Hubert responded that no, the model adopted for C++ that permits
      observability of UCNs is defective; it seems that C++ failed to specify
      the intended behavior.</li>
  <li>[ <em>Editor's note: The referenced
      <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/C99RationaleV5.10.pdf">"C99 rationale" document</a>,
      in section 5.2.1, subsection "UCN models", states:</em>
      <div style="padding: .5em; background: #E9FBE9">
      Once this was adopted, there was still one problem, how to specify UCNs in
      the Standard.  Both the C and C++ committees studied this situation and
      the available solutions, and drafted three models:<br/>
        <div style="padding: .5em; background: #E9FBE9">
        A. Convert everything to UCNs in basic source characters as soon as
        possible, that is, in translation phase 1.<br/>
        B. Use native encodings where possible, UCNs otherwise.<br/>
        C. Convert everything to wide characters as soon as possible using an
        internal encoding that encompasses the entire source character set and
        all UCNs.<br/>
        </div>
      Furthermore, in any place where a program could tell which model was being
      used, the standard should try to label those corner cases as undefined
      behavior.
      </div>
      ]
  </li>
  <li>Jens summarized; the UCN model was chosen by C++ decades ago and it has
      issues.  C chose a different model, and Hubert suggests that use of that
      model would not require round trip through Unicode and thus may make more
      programs well-formed.</li>
  <li>PBrett asked if the C model retains the notion of an internal character
      set.</li>
  <li>Hubert responded that C's model doesn't introduce UCNs in translation
      phase 1; rather it has extended characters and wording that achieves the
      same result.  C has explicit wording to handle basic and extended
      characters.</li>
  <li>Jens asked how C avoids handling UCNs in a character literal.</li>
  <li>Hubert responded that C doesn't have to define the special property of
      what can be encoded in a character literal.</li>
  <li>Hubert noted that, if we move away from UCNs, it will be necessary to add
      wording to handle extended characters.</li>
  <li>PBrett stated that it sounds like the C model permits the internal
      character set to be a superset of Unicode.</li>
  <li>Tom noted that Corentin and Steve have both expressed a preference for
      translating extended characters to Unicode code points that are maintained
      distinctly from UCNs.</li>
  <li>Hubert responded that code point is just a term.  If we switch models,
      then we'll need to add wording to handle these scenarios; it might not be
      less wording than is needed for UCNs.</li>
  <li>Corentin agreed, but noted that it would avoid the need for the UCN
      reversion that currently happens in raw string literals and stringize
      operations.</li>
  <li>PBrett asked how the notion of an extended character differs from a code
      point; code point has an implied character set association, but extended
      character doesn't.</li>
  <li>Hubert responded that there is a distinction: extended character excludes
      basic source characters.  This distinction may not be useful.</li>
  <li>Jens expressed concern about potentially losing that distinction since
      extended characters can only appear in a limited number of contexts.</li>
  <li>Corentin expressed a preference for use of common terminology and that
      extended characters would make it difficult to discuss behavior in Unicode
      terms.</li>
  <li>Hubert noted that extended characters just provide differentiation from
      basic source characters because the latter have additional requirements
      placed on them.</li>
  <li>PBrett observed that code points require correlation with a character set,
      but that an extended character can have distinct code points in a single
      character set.</li>
  <li>Steve noted that code point values don't tend to be observable but that
      code units are.</li>
  <li>Hubert stated that the term code point is probably not correct to describe
      a character that can apply generically to multiple character sets.</li>
  <li>Steve listed some of the requirements for the members of the basic
      execution character set; each such character is encoded as a single code
      unit with a non-negative value, and the code unit values for the digits
      0-9 have consecutive values.</li>
  <li>Jens noted that the term "code point" implies an associated numeric value,
      but that such a value is not needed within the standard for the source
      input character set.  Further, on the execution side, it should not be
      assumed that code points are encoded.  A term that is more abstract than
      code point is needed here.</li>
  <li>Hubert agreed that numeric code point values are not needed, but noted
      that abstract character isn't necessarily the right term either.</li>
  <li>Corentin stated that code point could imply a numeric value, but that the
      standard need not discuss it.</li>
  <li>Tom replied that, in ISO/IEC 10646 and Unicode, code point is primarily
      defined as a numeric value.</li>
  <li>Hubert observed that, if the internal character set is specified to be
      Unicode, then there is no requirement to define what a "character" is,
      but use of a term like "extended character" will require avoiding
      discussion of details since they would be implementation-defined.</li>
  <li>Jens observed that implementations could use code point values above
      <tt>0x10FFFF</tt> for extended characters.</li>
  <li>Jens added that there is benefit to being aligned with C if we were to
      adopt the C99 model.</li>
  <li>Jens opined that there is no benefit in requiring the internal character
      set to be isomorphic to Unicode.</li>
  <li>PBrett stated that the alternative to an internal character set is
      Unicode and expressed a preference that, if the internal character set is
      effectively Unicode, that it just be made Unicode.</li>
  <li>Hubert responded that the goal was to avoid formation of UCNs in
      translation phase 1 and that doing so results in having to handle extended
      characters.  That implies that the internal character set must map to
      Unicode or to Unicode plus additional implementation-defined
      characters.</li>
  <li><b>Poll: We generally believe that the internal character set should be
      Unicode based, but that implementations can support non-Unicode
      characters.</b>
    <ul>
      <li><b>Attendees: 10</b></li>
      <li>
        <table>
          <tr>
            <th style="text-align:right">SF</th>
            <th style="text-align:right">F</th>
            <th style="text-align:right">N</th>
            <th style="text-align:right">A</th>
            <th style="text-align:right">SA</th>
          </tr>
          <tr>
            <th style="text-align:right">2</th>
            <th style="text-align:right">5</th>
            <th style="text-align:right">1</th>
            <th style="text-align:right">2</th>
            <th style="text-align:right">0</th>
          </tr>
        </table>
      </li>
      <li>A: If non-Unicode characters are allowed, then we are not encouraging
          migration to Unicode and portability.</li>
      <li>A: People with more expertise than us have been defining characters
          for all humanity and this poll states that isn't sufficient.</li>
    </ul>
  </li>
  <li>Hubert responded to the against positions stating that the intent is not
      to change the behavior of current programs and the against positions are
      therefore not consistent with the intent.</li>
  <li><b>Poll: We want to transition away from forming UCNs in phase 1 in favor
      of plumbing extended characters (perhaps as specified by C99)</b>
    <ul>
      <li><b>Attendees: 10</b></li>
      <li><b>No objection to unanimous consent.</b></li>
    </ul>
  </li>
  <li>Tom asked if anyone would be willing to volunteer to summarize the
      mechanism used in C and post it to the mailing list.</li>
  <li>Corentin volunteered.</li>
  <li>Tom confirmed that the next meeting will be on July 8th.</li>
  <li>PBindels reminded the group that EWG is scheduled to review
      <a href="https://wg21.link/p1949r4">P1949R4</a>
      the following day (Thursday, 2020-06-18).</li>
</ul>
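
<p>
As an illustration of why the model discussed above needs UCN reversion in raw
string literals: a raw string literal does not interpret the spelling of a UCN,
so if translation phase 1 replaced an extended character with a UCN, the
replacement would become visible inside a raw string unless it were reverted.
The following is a minimal sketch of that observable difference, not wording
from any paper.  The names <tt>raw_length</tt>, <tt>utf8_length</tt>, and
<tt>check_raw_string_ucn</tt> are hypothetical and chosen only for this
example.
</p>

```cpp
#include <cassert>
#include <cstddef>

// In a raw string literal, the spelling \u00E9 is not interpreted as a
// universal-character-name. The literal holds six code units: a backslash
// followed by 'u', '0', '0', 'E', '9' (sizeof also counts the final null).
constexpr std::size_t raw_length = sizeof(R"(\u00E9)") - 1;

// In a UTF-8 string literal, \u00E9 designates U+00E9, which UTF-8 encodes
// as the two code units 0xC3 0xA9.
constexpr std::size_t utf8_length = sizeof(u8"\u00E9") - 1;

static_assert(raw_length == 6, "a UCN spelling in a raw string stays as written");
static_assert(utf8_length == 2, "U+00E9 takes two UTF-8 code units");

// Hypothetical runtime check of the same fact about the raw string.
inline void check_raw_string_ucn() {
    const char raw[] = R"(\u00E9)";
    assert(raw[0] == '\\' && raw[1] == 'u');
}
```

<p>
The checks are written as <tt>static_assert</tt>s so that a conforming compiler
verifies them during translation; <tt>check_raw_string_ucn()</tt> can be called
from any test driver.
</p>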


<h1 id="2020_07_08">July 8th, 2020</h1>

<h2>Draft agenda:</h2>

<ul>
  <li>Continue discussion of terminology updates to strive for in C++23
    <ul>
      <li>Determine suitability of ISO/IEC 10646 terms for use in the C++
          standard.
        <ul>
          <li>Character</li>
          <li>Repertoire</li>
          <li>Code point</li>
          <li>Coded character</li>
          <li>Coded character set</li>
          <li>Code unit</li>
          <li>Code unit sequence</li>
          <li>Encoding form</li>
          <li>Encoding scheme</li>
          <li>UCS codespace</li>
          <li>UCS scalar value</li>
          <li>Well-formed code unit sequence</li>
          <li>Minimal well-formed code unit sequence</li>
          <li>Ill-formed code unit sequence</li>
          <li>Ill-formed code unit sequence subset</li>
        </ul>
      </li>
      <li>Identify possible terms to add to
          <a href="http://eel.is/c++draft/intro.defs">[intro.defs]</a>.</li>
    </ul>
  </li>
</ul>

<h2>Attendees:</h2>

<ul>
  <li>Hubert Tong</li>
  <li>Jens Maurer</li>
  <li>Mark Zeren</li>
  <li>Peter Brett</li>
  <li>Steve Downey</li>
  <li>Tom Honermann</li>
  <li>Walter Brown</li>
  <li>Zach Laine</li>
</ul>

<h2>Meeting summary:</h2>

<ul>
  <li>Discussion of the suitability of
      <a href="https://www.iso.org/standard/69119.html">ISO/IEC 10646:2017</a>
      terms for use in the C++ standard
    <ul>
      <li>Tom introduced the topic:
        <ul>
          <li>The intent is to focus on terminology, determine what terms from
              ISO/IEC 10646 are usable in the C++ standard and for what
              purposes, and what new terms will be needed.</li>
        </ul>
      </li>
      <li>Zach advised against introducing new terms or redefining existing
          terms with different meanings.</li>
      <li>Hubert agreed that if we try inventing terms, then we risk causing
          some of the same problems that the Unicode consortium did by making
          terms overly specific; we want generic terms.</li>
      <li>PBrett also agreed and noted that we don't want to create an N+1
          specification.</li>
      <li>Jens stated that there may not be much reason for concern; the
          proposed wording for
          <a href="https://wg21.link/p2029">P2029</a>
          illustrates that we can avoid the need for some terms.  For example,
          we may be able to get rid of execution character set completely by
          only discussing an execution encoding rather than a character
          set.</li>
      <li>PBrett asked Jens to confirm that only character encodings can be
          observed, not character sets.</li>
      <li>Jens replied, yes.</li>
      <li>The group proceeded to discuss terms from ISO/IEC 10646.
        <ul>
          <li><b>character</b>:
            <blockquote class="quote">
              member of a set of elements used for the organization, control,
              or representation of textual data<br/><br/>
              Note 1 - A graphic symbol can be represented by a sequence of one
              or several coded characters.
            </blockquote>
            <ul>
              <li>Jens commented that he used to believe that ISO/IEC 10646
                  matched the Unicode standard, but the ISO/IEC 10646 terms
                  differ from Unicode.</li>
              <li>Tom acknowledged and relayed his understanding that we are
                  required to refer to ISO standards when they exist, so we
                  need to first consider the terms from ISO/IEC 10646.</li>
              <li>Jens confirmed that understanding.</li>
              <li>Hubert asked where we envision using the "character" term from
                  ISO/IEC 10646 in the standard.</li>
              <li>Jens replied that we need a term for the members of the basic
                  source character set and for the input source.</li>
              <li>Tom added that we may need the term for the entity that is
                  designated by a simple escape sequence.</li>
              <li>Jens responded that, since simple escape sequences designate
                  an execution time value, that entity can be a code unit
                  sequence.</li>
              <li>Hubert noted that all of the characters designated by simple
                  escape sequences only require a single code unit, not even a
                  code unit sequence.</li>
              <li>Hubert noted that the designated code units do have associated
                  semantics however; like BEL for example.</li>
              <li>Jens replied that semantics can be established by referring to
                  the character name or to a Unicode code point.</li>
              <li>Hubert expressed support for the generality of that approach
                  since it is required that the mapping to execution encoding
                  can't fail.</li>
              <li>PBrett asked if there is a need for the concept of a character
                  for locale purposes.</li>
              <li>Jens replied that there may be, but that we should just focus
                  on core language for now and locale is all run-time.</li>
              <li>Mark observed that <tt>std::basic_string</tt> defines
                  character in its own way.</li>
              <li>Zach asked if "character" will be needed in order to define
                  other terms and noted that any dependencies will need to be
                  resolved in the standard.</li>
              <li>Tom replied that any dependent terms are already available via
                  the existing reference to ISO/IEC 10646.</li>
              <li>Jens stated that the list of terms in the telecon agenda are
                  ones that we should try not to add to
                  <a href="http://eel.is/c++draft/intro.defs">[intro.defs]</a>
                  as the existing terms that are there are not particularly
                  useful.</li>
              <li>Walter agreed and noted that the existing terms are somewhat
                  anemic.</li>
              <li>Hubert stated that not putting terms in
                  <a href="http://eel.is/c++draft/intro.defs">[intro.defs]</a>
                  is concerning unless wording is specific about where used
                  terms come from.</li>
              <li>Tom asked if there is a way that we can be explicit about
                  where terms come from.</li>
              <li>Hubert responded that we haven't done that previously.</li>
              <li>Walter suggested that can be investigated offline.</li>
            </ul>
          </li>
          <li><b>repertoire</b>:
            <blockquote class="quote">
              specified set of characters that are represented in a coded
              character set
            </blockquote>
            <ul>
              <li>Tom observed that the definition has an explicit dependency
                  on "coded character set".</li>
              <li>Jens stated that the dependency makes that term unusable for
                  our purposes since it isn't sufficiently abstract.</li>
              <li>Hubert agreed.</li>
              <li>Jens stated that a term is needed for the abstract entities
                  that form the source input.</li>
              <li>Tom summarized the observations by stating that this term and
                  its definition can't be used, but we recognize a need for a
                  term that doesn't have a dependency on
                  "coded character set".</li>
              <li>Steve noted that we can't adopt terms from the C standard
                  because they have a different character model; we use the same
                  terms to mean different things.  The
                  <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/C99RationaleV5.10.pdf">C99 rationale document</a>
                  exposed this.</li>
              <li>Jens agreed and commented that the current C++ model needs to
                  change towards something more like the C model, but the C
                  model wording predates Unicode and doesn't use modern
                  terminology.</li>
            </ul>
          </li>
          <li><b>code point</b>:
            <blockquote class="quote">
              value in the UCS codespace
            </blockquote>
            <ul>
              <li>Tom decreed that the definition is terrible since it requires
                  "UCS codespace".</li>
              <li>Jens read the definition of "UCS codespace".</li>
              <li>Jens noted that "UCS codespace" includes surrogate code
                  points.</li>
              <li>Zach stated that surrogate inclusion is intentional, but
                  people often use code point where scalar value is intended;
                  we'll need more precision in wording.</li>
              <li>Tom asked if an analogue of code point for non-Unicode
                  encodings is needed.</li>
              <li>Jens replied no, only code units are needed; even for
                  character literals.</li>
              <li>Hubert expressed some uncertainty and that something like code
                  point may be needed for <em>universal-character-name</em>s
                  (UCNs).</li>
              <li>Jens summarized Hubert's concern and stated that UCNs are a
                  sequence of characters that designate a scalar value and that
                  we need to be able to state that the universal character set
                  maps to Unicode code points.</li>
              <li>Steve mentioned short-identifier syntax, <tt>U+XXXX</tt>, and
                  noted that, in a UCN, the <tt>XXXX</tt> is the
                  short-identifier.</li>
              <li>Jens replied that short-identifier syntax is problematic
                  because of restrictions on leading 0s; Unicode only allows
                  leading 0s to pad to a maximum length of 6 digits, but UCNs
                  require a length of exactly 4 or 8 digits.</li>
              <li>Jens noted that the "code point" term and its definition can
                  be used, but only in a Unicode context.</li>
            </ul>
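            <p>
            The digit-count rule for UCNs mentioned above (exactly 4 hex digits
            after <tt>\u</tt>, exactly 8 after <tt>\U</tt>) can be sketched as
            follows.  <tt>parse_ucn</tt> is a hypothetical helper written for
            this illustration only; it checks the digit count and nothing else,
            and it assumes an execution character set in which the letters
            a through f are contiguous.
            </p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper (not part of any standard library): parses the
// spelling of a universal-character-name and returns the designated value.
// Only the digit-count rule is checked; the additional restrictions the C++
// standard places on UCN values are deliberately ignored.
inline std::optional<std::uint32_t> parse_ucn(std::string_view spelling) {
    if (spelling.size() < 2 || spelling[0] != '\\') return std::nullopt;

    // A lowercase u requires exactly 4 hex digits, an uppercase U exactly 8.
    std::size_t digits = 0;
    if (spelling[1] == 'u') digits = 4;
    else if (spelling[1] == 'U') digits = 8;
    else return std::nullopt;
    if (spelling.size() != 2 + digits) return std::nullopt;

    std::uint32_t value = 0;
    for (char c : spelling.substr(2)) {
        std::uint32_t digit = 0;
        if (c >= '0' && c <= '9') digit = static_cast<std::uint32_t>(c - '0');
        else if (c >= 'a' && c <= 'f') digit = static_cast<std::uint32_t>(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') digit = static_cast<std::uint32_t>(c - 'A' + 10);
        else return std::nullopt;
        value = value * 16 + digit;
    }
    return value;
}

// The compiler applies the same rule to literals: both forms designate U+00E9.
static_assert(U'\u00E9' == U'\U000000E9', "4-digit and 8-digit forms agree");
```

            <p>
            With this sketch, the spellings <tt>\u00E9</tt> and
            <tt>\U000000E9</tt> both yield <tt>0xE9</tt>, while a 2-digit or
            6-digit spelling is rejected, which is the mismatch with
            short-identifier padding that Jens described.
            </p>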
          </li>
          <li><b>coded character</b>:
            <blockquote class="quote">
              association between a character and a code point
            </blockquote>
            <ul>
              <li>Tom noted the term is Unicode specific due to the use of
                  "code point" in the definition.</li>
              <li>Jens agreed and noted the same condition for
                  "coded character set", but emphasized that neither appears to
                  be needed for the C++ standard since only code units and code
                  unit sequences are observable.</li>
              <li>PBrett agreed.</li>
            </ul>
          </li>
          <li><b>code unit</b>:
            <blockquote class="quote">
               minimal bit combination that can represent a unit of encoded text
               for processing or interchange<br/><br/>
               Note 1 - Examples of code units are octets (8-bit code units)
               used in the UTF-8 encoding form, 16-bit code units in the UTF-16
               encoding form, and 32-bit code units in the UTF-32 encoding form.
            </blockquote>
            <ul>
              <li>Tom excitedly noted that this definition is not Unicode
                  specific.</li>
              <li>Hubert agreed and added that it can be used to describe the
                  contents of strings, including wide strings.</li>
              <li>Tom asked if there are any places other than strings where
                  code unit sequence would be relevant.</li>
              <li>Jens replied that there are definitely use cases in the
                  library.</li>
              <li>PBrett asked about the requirement that the values of the
                  characters "0" through "9" in the execution character set be
                  contiguous.</li>
              <li>Hubert replied that that requirement can be defined in terms
                  of code units.</li>
              <li>Jens commented that, in other wording he is involved with,
                  an integer value suffices since <tt>char</tt>,
                  <tt>wchar_t</tt>, etc. are all integer types.</li>
              <li>PBrett recounted claims from others in outside conversations
                  that it may have been a mistake to define the character types
                  as integer types and suggested that, in a rewrite, it may be
                  beneficial to avoid that.</li>
              <li>Jens agreed, but noted that for backward compatibility, a
                  rewrite would have to allow conversions.</li>
              <li>PBrett suggested that it is useful to be able to distinguish
                  between a code unit and an integer value.</li>
              <li>Hubert noted that we would still need to discuss integer
                  values because <tt>char</tt> and <tt>wchar_t</tt> have
                  implementation-defined signedness.</li>
              <li>Jens agreed and stated that other such restrictions
                  exist.</li>
              <li>Zach stated that, in the library wording, having definitions
                  is very useful since the library environment tends to be less
                  abstract.</li>
            </ul>
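            <p>
            The points above about code units can be observed directly from
            literals.  The following minimal sketch shows that one code point
            takes a different number of code units in UTF-8, UTF-16, and
            UTF-32, that the digit requirement can be stated purely in terms of
            integer code unit values, and that the signedness of <tt>char</tt>
            is implementation-defined.  The names <tt>code_unit_count</tt>,
            <tt>digit_value</tt>, and <tt>char_is_signed</tt> are hypothetical
            and exist only for this illustration.
            </p>

```cpp
#include <cassert>  // for runtime checks by callers of these helpers
#include <cstddef>
#include <limits>

// Number of code units in a string literal, excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_unit_count(const CharT (&)[N]) {
    return N - 1;
}

// U+1F600 is a single code point, but the number of code units that
// represent it depends on the encoding form.
static_assert(code_unit_count(u8"\U0001F600") == 4, "four 8-bit UTF-8 code units");
static_assert(code_unit_count(u"\U0001F600") == 2, "two 16-bit UTF-16 code units");
static_assert(code_unit_count(U"\U0001F600") == 1, "one 32-bit UTF-32 code unit");

// The standard requires the code unit values of '0' through '9' to be
// consecutive, so a digit's value is a plain integer subtraction.
// Returns -1 for anything that is not a decimal digit.
constexpr int digit_value(char c) {
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}
static_assert(digit_value('0') == 0 && digit_value('9') == 9,
              "digit code unit values are consecutive");

// Whether char is signed is implementation-defined; this only reports it.
constexpr bool char_is_signed = std::numeric_limits<char>::is_signed;
```

            <p>
            Because <tt>char_is_signed</tt> differs between implementations,
            code that compares code units against values above
            <tt>0x7F</tt> generally converts to <tt>unsigned char</tt>
            first.
            </p>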
          </li>
          <li><b>code unit sequence</b>:
            <blockquote class="quote">
              element of interchanged information that is specified to consist
              of a sequence of code units, in accordance with one or more
              identified standards for coded character sets<br/><br/>
              Note 1 - Such sequence can contain code units associated with any
              type of code points.<br/><br/>
              Note 2 - Since its second edition: ISO/IEC 10646:2011, this
              International Standard does not use implementation levels. Its
              definition of code unit sequence corresponds to the former
              unrestricted implementation level 3. Other definitions of code
              unit sequence, previously known as level 1 and 2, are deprecated.
              To maintain compatibility with these previous editions, in the
              context of identification of coded representation in International
              Standards such as ISO/IEC 8824 and ISO/IEC 8825, the concept of
              implementation level can still be referenced as
              ‘Implementation level 3’. See Annex N
            </blockquote>
            <ul>
              <li>Tom observed that this definition appears to require an
                  association with a standard.</li>
              <li>Zach expressed a lack of concern; EBCDIC can be considered a
                  "standard" for this purpose.</li>
              <li>PBrett agreed and stated the same is true for WTF-8.</li>
              <li>Hubert noted that ISO/IEC 10646 may not have the ability to
                  declare something as "implementation-defined", hence a
                  deference to a standard.</li>
              <li>Tom asked for confirmation that this definition is ok for our
                  purposes.</li>
              <li>Jens agreed that it is.</li>
              <li>Walter expressed frustration with the discussed terms and
                  definitions being so circular and asked where terms and
                  definitions that don't depend on prior knowledge might be
                  found.</li>
              <li>Jens responded that, in a standard, definitions should
                  generally be presented at the beginning of the standard and
                  explained by later prose.</li>
              <li>Hubert noted that the quality of these definitions is such
                  that expectations of helpful prose later in the document may
                  lead to disappointment.</li>
              <li>Zach commented that people end up developing a working
                  knowledge of these terms and processes, but the ability to
                  define them well remains elusive.</li>
              <li>Tom lamented the lack of a better source of terminology and
                  noted that the reason we are discussing these is exactly
                  because a good agreed upon source of terms is not readily
                  available.</li>
              <li>Jens asserted that this is good motivation for reducing usage
                  to as few terms as possible.</li>
              <li>PBrett agreed and added that "character" should be especially
                  avoided because it probably has the most fuzzy
                  connotations.</li>
            </ul>
          </li>
          <li><b>encoding form</b>:
            <blockquote class="quote">
              form that determines how each UCS code point for a UCS character
              is to be expressed as one or more code units used by the
              encoding form<br/><br/>
              Note 1 - This International Standard specifies UTF-8, UTF-16, and
              UTF-32.
            </blockquote>
              <b>encoding scheme</b>:
            <blockquote class="quote">
              scheme that specifies the serialization of the code units from the
              encoding form into octets<br/><br/>
              Note 1 - Some of the UCS encoding schemes have the same labels as
              the UCS encoding form. However, they are used in different
              contexts. UCS encoding forms refer to in-memory and application
              interface representation of textual data. UCS encoding schemes
              refer to octet-serialized textual data.
            </blockquote>
            <ul>
              <li>Jens stated that encoding scheme is relevant for encoding of
                  octets in big-endian vs little-endian order, and that encoding
                  form is for code units.</li>
              <li>Jens added that encoding scheme is unnecessary for our
                  purposes since endian issues are not specified.</li>
              <li>Jens further added that encoding form is unnecessary since
                  encodings such as UTF-8, UTF-16, and UTF-32 can be referred to
                  by name.</li>
              <li>Mark asked if encoding form might be needed for literals.</li>
              <li>Jens replied that implementation-defined encoding or mention
                  of a specific encoding name suffices.</li>
              <li>Tom noted that specific encoding names will be needed for the
                  implementation-defined encodings for Corentin's
                  <a href="https://wg21.link/p1885">P1885</a>
                  proposal to expose the encoding used for literals and by the
                  locale, but agreed not for core language.</li>
              <li>Tom summarized; consensus seems to be that we don't need
                  encoding form, encoding scheme, or analogues for non-Unicode
                  encodings.</li>
              <li>Zach agreed and noted that "encoding" can be used without
                  intruding on "encoding form".</li>
            </ul>
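            <p>
            The distinction Jens drew between an encoding form (code units) and
            an encoding scheme (octets in a byte order) can be sketched as
            follows.  <tt>byte_order</tt> and
            <tt>serialize_utf16_code_unit</tt> are hypothetical names
            introduced for this illustration; they are not part of the
            standard library.
            </p>

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical enumeration naming the two octet orders of a UTF-16
// encoding scheme.
enum class byte_order { big_endian, little_endian };

// The encoding form yields a 16-bit code unit; the encoding scheme decides
// how that code unit is serialized into two octets.
constexpr std::array<std::uint8_t, 2> serialize_utf16_code_unit(
        char16_t code_unit, byte_order order) {
    const std::uint8_t high = static_cast<std::uint8_t>(code_unit >> 8);
    const std::uint8_t low = static_cast<std::uint8_t>(code_unit & 0xFF);
    return order == byte_order::big_endian
        ? std::array<std::uint8_t, 2>{{high, low}}
        : std::array<std::uint8_t, 2>{{low, high}};
}
```

            <p>
            The same code unit, <tt>0x00E9</tt>, serializes to the octets
            <tt>00 E9</tt> in big-endian order and <tt>E9 00</tt> in
            little-endian order.  Since the core language only observes code
            units, this serialization step is outside what it needs to
            specify.
            </p>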
          </li>
          <li><b>UCS codespace</b>:
            <blockquote class="quote">
              codespace consisting of the integers from 0 to 10FFFF
              (hexadecimal) available for assigning the repertoire of the UCS
              characters.
            </blockquote>
              <b>UCS scalar value</b>:
            <blockquote class="quote">
              any UCS code point except high-surrogate and low-surrogate code
              points
            </blockquote>
            <ul>
              <li>Tom stated that both "UCS codespace" and "UCS scalar value"
                  are available for use in Unicode contexts.</li>
              <li>Jens agreed.</li>
              <li>Mark noted that these terms start with "UCS" and that,
                  colloquially, that prefix isn't generally used, but that the
                  standard should specifically use the UCS prefixed terms.</li>
              <li>Jens agreed and added these terms don't appear frequently
                  enough to warrant a shorter term.</li>
              <li>Jens added that "scalar value" by itself is not specific
                  enough anyway.</li>
            </ul>
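            <p>
            The two definitions quoted above translate directly into
            predicates.  This is a minimal sketch; the function names are
            hypothetical and mirror the ISO/IEC 10646 terms only for
            readability.
            </p>

```cpp
#include <cassert>

// A UCS code point is any value in the UCS codespace, 0 through 0x10FFFF.
constexpr bool is_ucs_code_point(char32_t value) {
    return value <= 0x10FFFF;
}

// High-surrogate code points are 0xD800-0xDBFF and low-surrogate code
// points are 0xDC00-0xDFFF, so together they cover 0xD800-0xDFFF.
constexpr bool is_surrogate_code_point(char32_t value) {
    return value >= 0xD800 && value <= 0xDFFF;
}

// A UCS scalar value is any UCS code point that is not a surrogate.
constexpr bool is_ucs_scalar_value(char32_t value) {
    return is_ucs_code_point(value) && !is_surrogate_code_point(value);
}

static_assert(is_ucs_code_point(0xD800) && !is_ucs_scalar_value(0xD800),
              "a surrogate is a code point but not a scalar value");
static_assert(is_ucs_scalar_value(0x10FFFF), "last value in the codespace");
static_assert(!is_ucs_code_point(0x110000), "outside the UCS codespace");
```

            <p>
            The first assertion captures the imprecision Zach warned about
            earlier: using "code point" where "scalar value" is intended
            silently admits surrogates.
            </p>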
          </li>
          <li><b>well-formed code unit sequence</b>:
            <blockquote class="quote">
              UCS code unit sequence that purports to be in a UCS encoding form
              which conforms to the specification of that encoding form and
              contains no ill-formed code unit sequence subset
            </blockquote>
              <b>minimal well-formed code unit sequence</b>:
            <blockquote class="quote">
              well-formed code unit sequence that maps to a single UCS scalar
              value
            </blockquote>
            <ul>
              <li>Jens stated that neither of the "well-formed" terms is
                  interesting for core language.</li>
              <li>Tom countered that these could potentially be useful for a
                  fully specified translation phase 1 for Unicode encoded
                  source files.</li>
              <li>PBrett stated that, absent implementation defects, it is not
                  possible for literals to not be well-formed.</li>
              <li>Zach expressed uncertainty.</li>
              <li>Hubert noted that, for source input, all that exists are
                  characters and UCNs, so yes, well-formedness is assured.</li>
              <li>Steve agreed and added that we've previously agreed that
                  ill-formed code unit sequences in literals are possible due
                  to numeric escape sequences, but that the input to the literal
                  encoding is always well-formed.</li>
              <li>Mark expressed surprise that these terms are not needed in the
                  core language.</li>
              <li>Tom replied that library will eventually need these terms or
                  analogous ones.</li>
              <li>Zach agreed that we should revisit these terms for
                  library.</li>
            </ul>
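            <p>
            Steve's observation that numeric escape sequences can produce
            ill-formed code unit sequences in literals can be demonstrated with
            a small checker.  The following is a sketch of a UTF-8
            well-formedness test based on the byte ranges the Unicode Standard
            gives for well-formed UTF-8; <tt>is_well_formed_utf8</tt> is a
            hypothetical name, and nothing like it is currently provided by the
            C++ standard library.
            </p>

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns true if `s` is a well-formed UTF-8 code unit
// sequence. Overlong forms, surrogates, and values above U+10FFFF are
// rejected by narrowing the allowed range of the second code unit.
inline bool is_well_formed_utf8(std::string_view s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t trailing = 0;
        unsigned char lo = 0x80, hi = 0xBF;  // range for the second code unit
        if (lead <= 0x7F) {
            trailing = 0;
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            trailing = 1;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            trailing = 2;
            if (lead == 0xE0) lo = 0xA0;  // excludes overlong 3-unit forms
            if (lead == 0xED) hi = 0x9F;  // excludes surrogate code points
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            trailing = 3;
            if (lead == 0xF0) lo = 0x90;  // excludes overlong 4-unit forms
            if (lead == 0xF4) hi = 0x8F;  // excludes values above U+10FFFF
        } else {
            return false;  // 0x80-0xC1 and 0xF5-0xFF never start a sequence
        }
        if (s.size() - i - 1 < trailing) return false;  // truncated sequence
        for (std::size_t k = 1; k <= trailing; ++k) {
            const unsigned char unit = static_cast<unsigned char>(s[i + k]);
            if (unit < lo || unit > hi) return false;
            lo = 0x80;  // later code units use the full continuation range
            hi = 0xBF;
        }
        i += trailing + 1;
    }
    return true;
}
```

            <p>
            A literal such as <tt>"\xC3\xA9"</tt> passes this check, while
            <tt>"\xC0\x80"</tt> (an overlong form) and
            <tt>"\xED\xA0\x80"</tt> (an encoded surrogate) do not, even
            though a compiler accepts all three because numeric escape
            sequences specify code unit values directly.
            </p>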
          </li>
          <li><b>ill-formed code unit sequence</b>:
            <blockquote class="quote">
              UCS code unit sequence that purports to be in a UCS encoding form
              which does not conform to the specification of that encoding
              form<br/>
              EXAMPLE - An unpaired surrogate code unit is an ill-formed code
              unit sequence.
            </blockquote>
              <b>ill-formed code unit sequence subset</b>:
            <blockquote class="quote">
              non-empty subset of a code unit sequence X which does not contain
              any code unit which also belong to any minimal well-formed code
              unit sequence subset of X<br/><br/>
              Note 1 - An ill-formed code unit sequence subset cannot overlap
              with a minimal well-formed code unit sequence.
            </blockquote>
            <ul>
              <li>Tom stated that the situation is the same for the "ill-formed"
                  cases as for the "well-formed" ones; they can be used in
                  library, but are not needed for core language.</li>
            </ul>
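            <p><em>[ Editor's note: The following is a minimal sketch, not
            something presented at the meeting, of how a numeric escape
            sequence can place an unpaired surrogate code unit in a UTF-16
            string literal.  The helper name
            <tt>has_unpaired_surrogate</tt> is hypothetical and was chosen for
            illustration only. ]</em></p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if the UTF-16 code unit sequence contains
// a surrogate code unit that is not part of a well-formed surrogate pair.
inline bool has_unpaired_surrogate(const std::u16string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t cu = s[i];
        if (cu >= 0xD800 && cu <= 0xDBFF) {
            // Lead surrogate: must be followed by a trail surrogate.
            if (i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
                ++i;  // Skip the trail surrogate of a well-formed pair.
                continue;
            }
            return true;
        }
        if (cu >= 0xDC00 && cu <= 0xDFFF) {
            return true;  // Trail surrogate without a preceding lead.
        }
    }
    return false;
}
```

            <p><em>[ Editor's note: Under this sketch, the literal
            <tt>u"\xD800"</tt> is reported as containing an unpaired
            surrogate, while <tt>u"\U0001F600"</tt>, which the compiler encodes
            as a surrogate pair, is not. ]</em></p>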
          </li>
        </ul>
      </li>
    </ul>
  <li>Tom stated that this meeting concludes our discussion of terminology
      for now and that a paper will be needed to make more progress.</li>
  <li>Tom stated that the next meeting will be on July 22nd and will discuss
      <a href="https://wg21.link/p2178">P2178</a>.</li>
</ul>


<h1 id="2020_07_22">July 22nd, 2020</h1>

<h2>Draft agenda:</h2>

<ul>
  <li><a href="https://wg21.link/p2139r2">P2139R2: Reviewing Deprecated Facilities of C++20 for C++23</a>
    <ul>
      <li>Provide recommendations for D.20-D.23.</li>
    </ul>
  </li>
  <li><a href="https://wg21.link/p2201r0">P2201R0: Mixed string literal concatenation</a>
    <ul>
      <li>Validate consensus to encourage that this paper be forwarded
          directly to core.</li>
    </ul>
  </li>
  <li><a href="https://wg21.link/p2178r1">P2178R1: Misc lexing and string handling improvements</a>
    <ul>
      <li>Begin discussions on the various proposals.</li>
      <li>Possibly begin taking direction polls.</li>
    </ul>
  </li>
</ul>

<h2>Attendees:</h2>

<ul>
  <li>Alisdair Meredith</li>
  <li>Corentin Jabot</li>
  <li>Jens Maurer</li>
  <li>Martinho Fernandes</li>
  <li>Peter Brett</li>
  <li>Steve Downey</li>
  <li>Tom Honermann</li>
  <li>Victor Zverovich</li>
  <li>Zach Laine</li>
</ul>

<h2>Meeting summary:</h2>

<ul>
  <li>Tom provided some administrative updates:
    <ul>
      <li>Tom now has a Zoom account set up courtesy of the ISO.</li>
      <li>SG16 telecons will switch to Zoom starting with the next telecon
          on August 12th.</li>
    </ul>
  </li>
  <li><a href="https://wg21.link/p2139r2">P2139R2: Reviewing Deprecated Facilities of C++20 for C++23</a>:
    <ul>
      <li>Alisdair provided an introduction.
        <ul>
          <li>LEWG has already discussed the proposed changes.</li>
          <li>In general, LEWG is in favor of removal of the deprecated
              features since implementors can continue to provide them due
              to the zombie clause
              (<a href="http://eel.is/c++draft/library#zombie.names">[zombie.names]</a>).</li>
        </ul>
      </li>
      <li>D.20: Deprecated Standard code conversion facets [depr.locale.stdcvt]
        <ul>
          <li><em>[ Editor's note: This concerns the <tt>codecvt</tt> facets
              that convert between UCS-2, UTF-8, UTF-16, and UTF-32;
              <tt>codecvt_utf8</tt>, <tt>codecvt_utf8_utf16</tt>, and
              <tt>codecvt_utf16</tt>. ]</em></li>
          <li>Alisdair stated that these interfaces are all underspecified; the
              wording was based on Dinkumware's documentation.</li>
          <li>Alisdair indicated that the reference to UCS-2 in the wording for
              these facets is all that is preventing us from removing the
              normative reference to ISO/IEC 10646:1993.  UCS-2 has been
              deprecated for 20 years and the ISO no longer provides a standard
              with a definition for it.</li>
          <li><em>[ Editor's note: According to
              <a href="https://www.unicode.org/versions/Unicode13.0.0/appC.pdf#I1.7749">appendix C of Unicode 13</a>,
              UCS-2 was removed from ISO/IEC 10646 in ISO/IEC 10646:2011.
              ]</em></li>
          <li>Jens agreed that uses of the UCS-2 term and normative reference
              to an outdated standard should be removed.</li>
          <li>PBrett directed the group to
              <a href="https://wg21.link/p0618">P0618</a>,
              the paper that deprecated these features and noted that there were
              recent complaints by a few committee members about deprecating
              these features.  JeanHeyd is now working on a replacement.</li>
          <li><em>[ Editor's note: The paper trail for P0618 is a little
              difficult to follow.  The paper was written to address C++17 NB
              comment GB 57.  LEWG consensus for resolving GB 57 by deprecating
              the <tt>&lt;codecvt&gt;</tt> header was by unanimous consent at
              the Issaquah 2016 meeting. ]</em></li>
          <li>Zach responded that the concerns about deprecation may be
              abstract; that only features that are actively harmful should be
              removed.  Disliking a feature is not sufficient grounds for
              deprecation.</li>
          <li>PBrett noted that the referenced committee members are under the
              impression that the <tt>codecvt</tt> facets work; at least for
              basic uses.</li>
          <li>Alisdair stated that their concern was deprecation without a
              replacement.</li>
          <li>Tom noted that the discussion around those complaints was
              confusing.  Some of the code posted that worked on one platform
              but not another was using <tt>std::codecvt</tt> specializations
              that have never been guaranteed to exist by the standard.  The
              code in question wasn't using the deprecated facets at all.</li>
          <li>Steve stated that these facets are an attractive nuisance; we
              have evidence that people have a hard time using them and that
              trying to use them for UTF-16 often leads to bad bugs.</li>
          <li>Jens stated that there are differences of opinion regarding what
              deprecate means.  For example, comments have been made that
              deprecating <tt>std::regex</tt> is intended to invite alternate
              proposals.  But deprecation may lead to the addition of
              <tt>[[deprecated]]</tt> attributes which may result in warnings
              which may be elevated to errors which may cause problems for
              programmers.</li>
          <li>Jens added that we should have a migration path, but we don't
              have replacements yet.</li>
          <li>Jens asked if we can salvage these interfaces, at least the parts
              that convert between UTF-8 and UTF-16.</li>
          <li>Alisdair responded that the interfaces don't consistently convert
              to UCS-2 vs UTF-16.</li>
          <li>Jens asked if we can just remove the functionality that relates to
              UCS-2.</li>
          <li>Corentin commented that the scope of the paper is deprecation or
              removal and stated that we should not consider other options.</li>
          <li>PBrett agreed with Corentin.</li>
          <li>Alisdair replied that the intent of the paper is to find good
              direction and that he is happy to consider other options.</li>
          <li>Tom suggested that a poll on other approaches might be
              useful.</li>
          <li>PBrett stated that his primary concern with <tt>codecvt</tt> is
              that error handling is poor.</li>
          <li>Zach stated that he has only used these facets once and asked if
              they produce replacement characters for ill-formed code unit
              sequences.</li>
          <li>Alisdair responded that we don't know because the feature is so
              underspecified.</li>
          <li>Zach stated that removal is preferred if these don't conform to
              expected Unicode behavior and conformance requirements.</li>
          <li>Alisdair asked if anyone other than Jens is in favor of trying to
              remove just the UCS-2 support.</li>
          <li>Tom indicated weak support.</li>
          <li>Jens expressed concern about removal without replacement and
              pondered whether these should have been deprecated at all.</li>
          <li>PBrett indicated that he was originally surprised by the
              deprecation, but that the rationale for doing so made sense.</li>
          <li>PBrett added that people will continue to try to use these
              features if they are retained.</li>
          <li>Tom asked what the real life impact is of removal vs
              deprecation.</li>
          <li>Alisdair responded that it depends on what implementors choose to
              do.  Some may hide the interfaces behind macros while others leave
              them in place.  Similar cases in the past have led to portability
              issues.</li>
          <li>Zach noted that the interfaces might be annotated as removed at
              cppreference.com.</li>
          <li>PBrett noted some indications that some systems are built with the
              deprecated features removed.</li>
          <li>Tom responded that those may be misunderstandings; libstdc++
              limits the available <tt>std::codecvt</tt> facets to
              specializations specified by the standard such that use of unknown
              specializations leads to linker errors.</li>
          <li>Victor stated that the choice should be pretty clear here; these
              features are poorly designed and should be removed.</li>
          <li>Corentin noted that LEWG has already indicated desire to remove
              and is just looking for confirmation.</li>
          <li><b>Poll: The deprecated Standard code conversion facets specified
              in D.20 [depr.locale.stdcvt] should be removed.</b>
            <ul>
              <li><b>Attendees: 9</b></li>
              <li>
                <table>
                  <tr>
                    <th style="text-align:right">SF</th>
                    <th style="text-align:right">F</th>
                    <th style="text-align:right">N</th>
                    <th style="text-align:right">A</th>
                    <th style="text-align:right">SA</th>
                  </tr>
                  <tr>
                    <th style="text-align:right">3</th>
                    <th style="text-align:right">3</th>
                    <th style="text-align:right">1</th>
                    <th style="text-align:right">2</th>
                    <th style="text-align:right">0</th>
                  </tr>
                </table>
              </li>
              <li>Consensus is for removal.</li>
            </ul>
          </li>
        </ul>
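        <p><em>[ Editor's note: The following is a minimal sketch of the kind
        of error handling raised in the discussion above, in which each
        unpaired surrogate in UTF-16 input is replaced with U+FFFD
        {REPLACEMENT CHARACTER} when converting to UTF-8.  It does not describe
        what the deprecated facets do, since their behavior for ill-formed
        input is underspecified.  The names <tt>append_utf8</tt> and
        <tt>utf16_to_utf8_lossy</tt> are hypothetical. ]</em></p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of a Unicode scalar value.
inline void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical converter: UTF-16 to UTF-8, substituting U+FFFD for each
// unpaired surrogate instead of failing or emitting ill-formed output.
inline std::string utf16_to_utf8_lossy(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Well-formed surrogate pair: combine into one scalar value.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // Unpaired surrogate: use REPLACEMENT CHARACTER.
        }
        append_utf8(out, cp);
    }
    return out;
}
```

        <p><em>[ Editor's note: A replacement interface could equally report
        an error instead of substituting; the sketch only illustrates one
        well-defined policy. ]</em></p>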
      </li>
      <li>D.21: Deprecated convenience conversions [depr.conversions]
        <ul>
          <li><em>[ Editor's note: This concerns the <tt>wstring_convert</tt>
              and <tt>wbuffer_convert</tt> class templates. ]</em></li>
          <li>Alisdair explained that these interfaces were deprecated at the
              same time as the interfaces in D.20, that the current wording has
              a dependency on those interfaces, that the wording could be updated
              to avoid that dependency (as demonstrated in the paper in the
              proposed wording for D.20), and that the urgency to remove these
              is therefore not as strong as for D.20.</li>
          <li>PBrett observed that the motivation for deprecating these is not
              explained in the paper that proposed their deprecation,
              <a href="https://wg21.link/p0618">P0618</a>.</li>
          <li>Alisdair responded that he does not recall there being strong
              motivation for deprecation other than their association with the
              <tt>codecvt_utf8</tt> and <tt>codecvt_utf8_utf16</tt> facets.</li>
          <li>PBrett expressed some concern about removal given that they can
              still be used with the non-deprecated <tt>codecvt</tt>
              facets.</li>
          <li>Tom noted that there are some locale restrictions; these
              interfaces can't use a locale managed <tt>codecvt</tt> facet.</li>
          <li>Jens responded that it looks like it only requires no side
              effects that impact locale.</li>
          <li>Corentin agreed with Peter's concerns; these interfaces aren't
              particularly harmful or confusing.</li>
          <li>Alisdair asked if un-deprecating these should be considered.</li>
          <li>Jens replied that a suitable replacement that handles errors
              properly is likely to have a different interface, so
              un-deprecating these is probably not the right choice without
              other motivation.</li>
          <li>Zach noted that these interfaces don't appear to be an active
              problem; no one uses them accidentally.</li>
          <li>Steve asked if the question to SG16 should be whether we object
              to removal.</li>
          <li>Alisdair responded that he heard more informed discussion in the
              last five minutes than he had in LEWG.</li>
          <li>Jens opined that removal is under-motivated.</li>
          <li>Alisdair asked if there would be more support for removal if a
              replacement was available.</li>
          <li>A chorus of affirmations was heard.</li>
          <li>Alisdair responded favorably and noted that features should not
              be left in annex D perpetually.</li>
          <li><b>Poll: The deprecated convenience conversions specified in D.21
              [depr.conversions] should be removed.</b>
            <ul>
              <li><b>Attendees: 9</b></li>
              <li>
                <table>
                  <tr>
                    <th style="text-align:right">SF</th>
                    <th style="text-align:right">F</th>
                    <th style="text-align:right">N</th>
                    <th style="text-align:right">A</th>
                    <th style="text-align:right">SA</th>
                  </tr>
                  <tr>
                    <th style="text-align:right">0</th>
                    <th style="text-align:right">1</th>
                    <th style="text-align:right">6</th>
                    <th style="text-align:right">2</th>
                    <th style="text-align:right">0</th>
                  </tr>
                </table>
              </li>
              <li>Consensus is for no change to status quo.</li>
            </ul>
          </li>
          <li><b>Poll: Does SG16 object to removal of the deprecated convenience
              conversions specified in D.21 [depr.conversions]?</b>
            <ul>
              <li><b>Attendees: 9</b></li>
              <li>
                <table>
                  <tr>
                    <th style="text-align:right">Yes</th>
                    <th style="text-align:right">No</th>
                  </tr>
                  <tr>
                    <th style="text-align:right">1</th>
                    <th style="text-align:right">8</th>
                  </tr>
                </table>
              </li>
              <li>Consensus is no objection.</li>
            </ul>
          </li>
        </ul>
      </li>
      <li>D.22: Deprecated locale category facets [depr.locale.category]
        <ul>
          <li><em>[ Editor's note: This concerns the <tt>char</tt>-based UTF-8
              <tt>codecvt</tt> and <tt>codecvt_byname</tt> specializations.
              ]</em></li>
          <li>Alisdair mentioned that this deprecation came from SG16.</li>
          <li>Tom explained that these facets were deprecated with the
              introduction of <tt>char8_t</tt>; the deprecated specializations
              squat on the interfaces that would be desired for conversion
              between the locale dependent narrow encoding and either UTF-16
              or UTF-32.</li>
          <li>Tom stated that we don't know what will happen with
              <tt>char8_t</tt>, particularly in the Linux community where the
              narrow locale is dependably UTF-8; projects that build with
              <tt>char8_t</tt> support disabled may benefit from preserving
              these.</li>
          <li>Jens noted that these specializations were just deprecated in
              C++20.</li>
          <li>Tom stated that retaining these may be useful for code that needs
              to be compatible across C++17 and C++23, perhaps in projects that
              introduce a typedef as conditionally <tt>char</tt> or
              <tt>char8_t</tt>.</li>
          <li>Alisdair observed that zombification may not be a good answer in
              that case.</li>
          <li>PBrett asked how likely it is that we would want to re-use these
              specializations.</li>
          <li>Tom responded that it is not very likely; we want to move away
              from <tt>std::codecvt</tt>.</li>
          <li>Zach agreed.</li>
          <li>Steve predicted that the repurposed specializations would probably
              only be used with the <tt>wstring_convert</tt> and
              <tt>wbuffer_convert</tt> interfaces which may be removed soon.</li>
          <li>Alisdair observed that these specializations don't become zombies
              because they are just specializations, not names.</li>
          <li>PBrett asked what LEWG's inclination was.</li>
          <li>Alisdair responded that it was to remove and depend on the zombie
              clause.</li>
          <li><b>Poll: The deprecated locale category facets in D.22
              [depr.locale.category] should be removed.</b>
            <ul>
              <li><b>Attendees: 9</b></li>
              <li>
                <table>
                  <tr>
                    <th style="text-align:right">SF</th>
                    <th style="text-align:right">F</th>
                    <th style="text-align:right">N</th>
                    <th style="text-align:right">A</th>
                    <th style="text-align:right">SA</th>
                  </tr>
                  <tr>
                    <th style="text-align:right">1</th>
                    <th style="text-align:right">2</th>
                    <th style="text-align:right">2</th>
                    <th style="text-align:right">1</th>
                    <th style="text-align:right">2</th>
                  </tr>
                </table>
              </li>
              <li>Consensus is for no change to status quo.</li>
              <li>SF: I'm not empathetic towards the argument that people may
                  not use <tt>char8_t</tt> on Linux, nor do I find the typedef
                  compatibility approach compelling.</li>
              <li>SA: I'm concerned about ease of writing code that is
                  compatible across C++17 and C++23.</li>
            </ul>
          </li>
          <li><b>Poll: Does SG16 object to removal of the deprecated locale
                 category facets in D.22 [depr.locale.category]?</b>
            <ul>
              <li><b>Attendees: 9</b></li>
              <li>
                <table>
                  <tr>
                    <th style="text-align:right">Yes</th>
                    <th style="text-align:right">No</th>
                  </tr>
                  <tr>
                    <th style="text-align:right">1</th>
                    <th style="text-align:right">8</th>
                  </tr>
                </table>
              </li>
              <li>Consensus is no objection.</li>
            </ul>
          </li>
        </ul>
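        <p><em>[ Editor's note: The following is a minimal sketch of the
        conditional typedef approach mentioned in the discussion above, for
        code that must build both with and without <tt>char8_t</tt>.  The alias
        <tt>utf8_char</tt> and the function <tt>count_code_points</tt> are
        hypothetical names; the <tt>__cpp_char8_t</tt> feature test macro is
        used to detect language support. ]</em></p>

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical project-wide alias: char8_t where the compiler provides it
// (C++20 without char8_t support disabled), otherwise char.
#if defined(__cpp_char8_t)
using utf8_char = char8_t;
#else
using utf8_char = char;
#endif

// Counts code points in a null-terminated, well-formed UTF-8 string by
// counting the code units that are not continuation bytes (10xxxxxx).
inline std::size_t count_code_points(const utf8_char* s) {
    std::size_t n = 0;
    for (; *s != 0; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

        <p><em>[ Editor's note: Because the type of a <tt>u8</tt> string
        literal follows the same language setting, a call such as
        <tt>count_code_points(u8"a\u00E9")</tt> is expected to compile in
        either configuration. ]</em></p>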
      </li>
      <li>D.23: Deprecated filesystem path factory functions
          [depr.fs.path.factory]
        <ul>
          <li><em>[ Editor's note: This concerns
              <tt>std::filesystem::u8path</tt>. ]</em></li>
          <li>Alisdair explained that <tt>u8path</tt> only existed because
              <tt>char8_t</tt> wasn't available to differentiate constructor
              declarations for narrow encoding vs UTF-8; the <tt>char8_t</tt>
              constructor is now available.</li>
          <li>Alisdair added that LEWG's inclination is to remove the function
              and rely on the zombie clause for backward compatibility.</li>
          <li>Jens asked what the LEWG quorum was for the discussion.</li>
          <li>Alisdair responded that there were about 30 attendees with good
              breadth of experience but not necessarily depth.</li>
          <li>Corentin opined that this removal is not really an SG16 matter
              and is more traditional LEWG territory.</li>
          <li>PBrett agreed that this isn't really an SG16 matter.</li>
          <li>Jens noted that a replacement is available but opined that
              removal is premature since this was just deprecated in C++20.</li>
          <li>Alisdair noted that the function was just added in C++17, so
              hasn't been around much.</li>
          <li>Tom commented that the same concerns about C++17 and C++23
              compatibility discussed for the deprecated <tt>codecvt</tt>
              specializations apply here.</li>
          <li><b>Poll: Does SG16 object to removal of the deprecated filesystem
              path factory functions in D.23 [depr.fs.path.factory]?</b>
            <ul>
              <li><b>Attendees: 9</b></li>
              <li>
                <table>
                  <tr>
                    <th style="text-align:right">Yes</th>
                    <th style="text-align:right">No</th>
                  </tr>
                  <tr>
                    <th style="text-align:right">0</th>
                    <th style="text-align:right">9</th>
                  </tr>
                </table>
              </li>
              <li>Consensus is no objection.</li>
            </ul>
          </li>
        </ul>
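        <p><em>[ Editor's note: The following is a minimal sketch of the C++17
        and C++20 compatibility concern for <tt>u8path</tt>.  The function name
        <tt>make_utf8_path</tt> and the file name are hypothetical, and the
        sketch assumes that <tt>__cpp_lib_char8_t</tt> is defined whenever the
        <tt>char8_t</tt> overloads of the <tt>path</tt> constructor are
        usable. ]</em></p>

```cpp
#include <cassert>
#include <filesystem>

// Hypothetical helper: builds a path from UTF-8 encoded text in a way that
// compiles as both C++17 and C++20.
inline std::filesystem::path make_utf8_path() {
#if defined(__cpp_lib_char8_t)
    // C++20: the u8 literal has type const char8_t[], which selects the
    // path constructor that interprets its argument as UTF-8.
    return std::filesystem::path(u8"caf\u00E9");
#else
    // C++17: the u8 literal has type const char[], so u8path is needed to
    // state that the bytes are UTF-8 rather than the native narrow encoding.
    return std::filesystem::u8path(u8"caf\u00E9");
#endif
}
```

        <p><em>[ Editor's note: If <tt>u8path</tt> were removed and an
        implementation did not retain it, the second branch would need another
        spelling for pre-C++20 builds. ]</em></p>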
      </li>
    </ul>
  </li>
  <li><a href="https://wg21.link/p2201r0">P2201R0: Mixed string literal concatenation</a>:
    <ul>
      <li>Jens introduced the paper.
        <ul>
          <li>This makes mixed encoding string literal concatenation
              ill-formed.</li>
          <li>The only compiler known to implement this conditionally-supported
              implementation-defined behavior is the
              <a href="http://sdcc.sourceforge.net">SDCC</a>
              C compiler.  No C++ compilers are known to support it.</li>
        </ul>
      </li>
      <li>Tom stated that, assuming consensus, the intent is to forward this
          paper directly to the CWG, subject to agreement by the EWG chair.</li>
      <li><b>Poll: Direct Tom to recommend to the EWG chair that P2201R0 be
          forwarded directly to the CWG.</b>
        <ul>
          <li><b>Attendees: 9</b></li>
          <li>
            <table>
              <tr>
                <th style="text-align:right">SF</th>
                <th style="text-align:right">F</th>
                <th style="text-align:right">N</th>
                <th style="text-align:right">A</th>
                <th style="text-align:right">SA</th>
              </tr>
              <tr>
                <th style="text-align:right">8</th>
                <th style="text-align:right">1</th>
                <th style="text-align:right">0</th>
                <th style="text-align:right">0</th>
                <th style="text-align:right">0</th>
              </tr>
            </table>
          </li>
          <li>Consensus is to forward to the CWG.</li>
        </ul>
      </li>
    </ul>
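    <p><em>[ Editor's note: The following is a minimal sketch of the
    distinction the paper draws.  Concatenating a prefixed literal with an
    unprefixed one remains well-formed; concatenating literals with different
    encoding prefixes is the case the paper proposes to make ill-formed, so it
    appears only in a comment.  The variable names are hypothetical.
    ]</em></p>

```cpp
#include <cassert>
#include <type_traits>

// Adjacent literals where only one has an encoding prefix: the unprefixed
// literal is treated as having the same prefix, so this remains well-formed.
constexpr char16_t kGreeting[] = u"Hello, " "world";

// Adjacent literals with different encoding prefixes: conditionally-supported
// before P2201 and ill-formed if the paper is adopted, so left commented out.
// constexpr auto& kMixed = L"Hello, " u"world";

static_assert(std::is_same<decltype(kGreeting), const char16_t[13]>::value,
              "concatenation yields a single char16_t array");
```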
  </li>
  <li>Tom stated that the next telecon will be held August 12th and will discuss
      <a href="https://wg21.link/p2178r1">P2178R1</a>.</li>
</ul>


<h1 id="2020_08_12">August 12th, 2020</h1>

<h2>Draft agenda:</h2>

<ul>
  <li><a href="https://wg21.link/p2178r1">P2178R1: Misc lexing and string handling improvements</a>
    <ul>
      <li>Begin discussions on the various proposals.</li>
      <li>Possibly begin taking direction polls.</li>
    </ul>
  </li>
</ul>

<h2>Attendees:</h2>

<ul>
  <li>Corentin Jabot</li>
  <li>Hubert Tong</li>
  <li>Mark Zeren</li>
  <li>Martinho Fernandes</li>
  <li>Peter Brett</li>
  <li>Steve Downey</li>
  <li>Tom Honermann</li>
  <li>Victor Zverovich</li>
  <li>Walter Brown</li>
  <li>Zach Laine</li>
</ul>

<h2>Meeting summary:</h2>

<ul>
  <li>Tom provided an administrative update:
    <ul>
      <li>The EWG chair declined forwarding
          <a href="https://wg21.link/p2201r0">P2201R0: Mixed string literal concatenation</a>
          directly to the CWG in order to avoid any possible appearance of
          unilateral decision making.  The paper will be reviewed during the
          EWG telecon on August 19th.</li>
    </ul>
  </li>
  <li><a href="https://wg21.link/p2178r1">P2178R1: Misc lexing and string handling improvements</a>
    <ul>
      <li>Tom stated that the proposals will not be discussed in the order
          presented in the paper as proposals 1 and 9 are complicated and/or
          contentious.  The goal is to provide feedback quickly on the proposals
          that are unlikely to be contentious so that progress can be made on
          those without being held up by the others.</li>
      <li>Hubert asked if support for proposal 1, mandated support for UTF-8 as
          a source file encoding, could be handled by EWG without SG16 holding
          it up.</li>
      <li>Tom responded that there are technical details and possible points of
          contention that should be worked out in SG16 first.</li>
      <li>Corentin provided an overview of the paper.
        <ul>
          <li>The paper presents a number of proposals intended to address
              issues identified with current lexing behavior and wording.</li>
          <li>As prior discussion has revealed, lack of consistent terminology
              leads to confusion; we need to ensure the underlying model is
              commonly understood.</li>
          <li>Many of the issues address concerns that are especially
              significant for Unicode support.</li>
          <li>The proposals are bundled into a single paper due to
              interconnected concerns.</li>
        </ul>
      </li>
      <li>Proposal 2: What is a whitespace or a new-line?
        <ul>
          <li>Corentin stated that this is intended to align with Unicode
              specifications for what constitutes whitespace.</li>
          <li>Corentin added that the motivation is to move away from
              implementation-defined behavior in phase 1.</li>
          <li>PBrett asked if this proposal is separable from the others; the
              introduction argues for considering all of these proposals
              collectively.</li>
          <li>Corentin replied that he would like to have just one paper for
              wording.</li>
          <li>PBrett acknowledged that goal but repeated the question as to
              whether separation is possible.</li>
          <li>Corentin replied that separation is possible, but that the
              individual proposals have less value, and therefore little urgency
              to address, when considered individually rather than
              collectively.</li>
          <li>Corentin asked what the semantics should be for a raw string
              literal and whether the exact line termination sequence should be
              preserved.</li>
          <li>Tom replied that there is a core issue for that.</li>
          <li>Corentin acknowledged and noted that it is mentioned in the paper
              (<a href="https://wg21.link/cwg1655">CWG #1655</a>).</li>
        </ul>
      </li>
      <li>Proposal 3: Preserve Normalization forms
        <ul>
          <li>Corentin stated that the intent is to standardize existing
              practice and to persist source information through translation
              phases 1 and 5.</li>
          <li>Tom asked if this proposal is dependent on proposal 1 and then
              answered his own question in the negative.</li>
          <li>Zach noted that there is a dependence on knowing what the source
              encoding is.</li>
          <li>Corentin replied that the compiler knows what encoding is being
              used.</li>
          <li>Zach acknowledged, but noted that the compiler has to be informed,
              so stating that it knows the encoding is vacuous.</li>
          <li>Corentin stated that the intent is that, if the source is UTF-8,
              code points are preserved.</li>
          <li>Zach responded that we previously determined that we can't
              reliably determine when the encoding being used does not match;
              there needs to be a portable way to indicate the source
              encoding.</li>
          <li><em>[ Editor's note: that determination was made during
              discussions of
              <a href="https://wg21.link/p1879">P1879</a>. ]</em></li>
          <li>Hubert stated that this just requires that the implementation
              specifies the encoding that is being used for the source
              input.</li>
          <li>Tom asked if normalization form is the right concern; preservation
              of code points would address the more general concern.</li>
          <li>PBrett noted that this proposal is separable from proposal 1
              because the implementation knows the encoding that is being
              used.</li>
          <li>PBrett added that this proposal is applicable for all encodings
              since non-basic source characters are mapped to
              <em>universal-character-name</em>s (UCNs).</li>
          <li>Tom requested that the paper address the case where the execution
              encoding supports é as a combined character
              (e.g., U+00E9 {LATIN SMALL LETTER E WITH ACUTE}),
              but not as separate characters
              (e.g., U+0065 {LATIN SMALL LETTER E} followed by
              U+0301 {COMBINING ACUTE ACCENT}).</li>
          <li>Zach opined that this proposal should still be coupled with
              proposal 1.</li>
          <li>Tom replied that Peter's explanation seems sufficient to describe
              how this would work in an encoding agnostic way.</li>
          <li>Zach stated that this requires knowing what the source encoding
              is.</li>
          <li>Hubert noted that discussion of codepoint-by-codepoint translation
              is challenging without more structure around translation phase
              1.</li>
        </ul>
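        <p><em>[ Editor's note: The following is a minimal sketch of why
        preserving the normalization form matters.  The two spellings of é
        discussed above are canonically equivalent in Unicode but are distinct
        code point sequences, so a compiler that normalized one into the other
        would change the contents of a string literal.  The function names are
        hypothetical, and universal-character-names are used so that the sketch
        does not depend on the source file encoding. ]</em></p>

```cpp
#include <cassert>
#include <string>

// Precomposed form: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
inline std::u32string precomposed() { return U"\u00E9"; }

// Decomposed form: U+0065 LATIN SMALL LETTER E followed by
// U+0301 COMBINING ACUTE ACCENT.
inline std::u32string decomposed() { return U"e\u0301"; }
```

        <p><em>[ Editor's note: The first string holds one code point and the
        second holds two, so they compare unequal even though they typically
        render identically. ]</em></p>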
      </li>
      <li>Proposal 4: Making trailing whitespaces non-significant
        <ul>
          <li>Corentin stated that this is a lexing concern, but not really a
              Unicode or text concern.</li>
          <li>Corentin explained that gcc defends its removal of trailing white
              space as part of its translation phase 1 semantics.</li>
          <li>Corentin noted that Microsoft Visual C++ behavior diverges from
              gcc and clang.</li>
          <li>Corentin added that editors may implicitly remove trailing
              whitespace; semantically meaningful trailing whitespace is
              therefore fragile.</li>
          <li>Corentin summarized; the proposal is to align the standard with
              the behavior exhibited by gcc and Clang and to ignore trailing
              white space for the purposes of determining line
              continuation.</li>
          <li>Hubert observed that proposal 2 seeks to do the opposite of the
              intent for this proposal by potentially preserving the form of
              line endings, at least in raw string literals.</li>
          <li>Hubert added that the usual way this elision of trailing
              whitespace is handled is by claiming that the preceding white
              space is considered part of the line termination.</li>
          <li>Tom asked if there had been any comments from Microsoft
              implementors given that a change here would presumably require
              a change to their implementation.</li>
          <li>Corentin responded that he had reached out, but didn't hear
              back.</li>
        </ul>
      </li>
      <li>Proposal 5: Restricting multi-characters literals to members of the
          Basic Latin Block
        <ul>
          <li>Corentin noted that multi-character literals are used and this is
              not a proposal to remove them.</li>
          <li>Corentin explained that multicharacter literals that present as a
              single character are confusing, for example <tt>'é'</tt> written
              with a combining character.</li>
          <li>Corentin added that implementations diverge in their handling of
              them.</li>
          <li>Corentin stated that the intent of the proposal is to make such
              confusing cases ill-formed.</li>
          <li>PBrett expressed support for this direction.</li>
          <li>Tom asked why the restriction is to one code point.</li>
          <li>Corentin replied that the intent is that each character in the
              literal be limited to, effectively, ASCII.</li>
          <li>Mark asked why the 4th example is not ok given that the 2nd and
              3rd examples are.</li>
          <li><em>[ Editor's note: the 2nd example is <tt>'abc'</tt>, the 3rd is
              <tt>'\u0080'</tt>, and the 4th is <tt>'\u0080\u0080'</tt>.
              ]</em></li>
          <li>Corentin responded that the 3rd example is not a multicharacter
              literal, but the 4th is.  The 4th is excluded because it contains
              <em>c-char</em>s that identify characters outside the Unicode
              basic Latin block.</li>
          <li>PBrett opined that cases like the 2nd example are used, but that
              cases like the 4th are not and have no known use cases.</li>
          <li>Hubert observed that the examples are incomplete without octal and
              hex escapes.</li>
          <li>Tom expressed difficulty understanding how to distinguish
              between the basic source character and UCN examples.</li>
          <li>Hubert suggested that some presentation improvements might make
              the examples easier to understand.</li>
          <li>Hubert expressed support for allowing octal and hex escapes within
              multicharacter literals.</li>
          <li>Tom, still trying to comprehend the examples, expressed a belief
              that he was reading far too much into the use of UCNs in the
              example.</li>
          <li>PBrett stated that the use of UCNs is intended to make it more
              clear exactly which character is designated.</li>
          <li>PBrett suggested either adding or changing the examples for the
              next revision.</li>
          <li>Mark observed that broken UTF-8 is allowed in string literals, but
              that this is kind of different.</li>
          <li>Tom disagreed and noted that numeric escapes would not get
              transcoded, but would still contribute a value to the appropriate
              "slot" in the <tt>int</tt> value.</li>
          <li>Tom asked if the size of <tt>int</tt> is relevant.  For example,
              if <tt>sizeof(int)</tt> was 2, would the number of
              <em>c-char</em>s allowed in the multi-character literal be limited
              to 2?</li>
          <li>Corentin responded that no, that would still be
              implementation-defined; the intent is just to address the visual
              confusion.</li>
          <li>Mark noted that this is technically a breaking change, but that
              numeric escapes can be used as a work around.</li>
          <li>Corentin responded affirmatively, but noted the concern is mostly
              theoretical; he hasn't been able to find any examples that would
              be disallowed by these changes.</li>
          <li>Hubert noted that swapping in a numeric escape could change
              behavior and therefore should not be suggested as a compiler
              fixit hint.</li>
        </ul>
      </li>
      <li>Proposal 6: Making wide characters literals containing multiple or
          unrepresentable c-char ill-formed
        <ul>
          <li>Corentin explained that wide multicharacter and non-encodable
              character literals are inherited from C.</li>
          <li>Corentin noted that there is implementation divergence; some
              compilers produce warnings and some do not.</li>
          <li>Mark observed that the paper does not include data from code
              searches.</li>
          <li>Corentin responded that he was uncertain whether he had
              conducted code searches for this proposal.</li>
          <li>Tom recalled possibly seeing these used with Visual C++ and
              <tt>TCHAR</tt>.</li>
          <li>Corentin stated that he can't say with certainty that these are
              not used.</li>
          <li>Hubert noted that Corentin's research indicates these don't
              behave like ordinary multi-character literals.</li>
          <li>PBrett stated that the different behavior contradicts Tom's
              recollections.</li>
          <li>Tom suggested that his recollection is likely incorrect.</li>
          <li>Tom stated that the motivation for this proposal seems somewhat
              different than for the previous proposal; this proposal isn't
              just about avoiding visual confusion.</li>
          <li>PBrett replied that it is similar; the motivation for the prior
              case applies here, but is compounded by the fact that all but one
              of the <em>c-char</em>s in the literal are ignored in this
              case.</li>
          <li>Tom acknowledged this, but noted that it is similar to the
              previous case too, where excess <em>c-char</em>s are ignored.</li>
        </ul>
      </li>
      <li>Proposal 7: Making conversion of character and string literals to
          execution and wide execution encoding ill-formed for unrepresentable
          c-char
        <ul>
          <li>Corentin explained that Clang rejects such conversions and Visual
              C++ substitutes a '?'.  According to Billy O'Neal, the replacement
              with a question mark is due to the default behavior of the
              conversion functions used.</li>
          <li>Tom stated that the paper should be updated to add a reference to
              <a href="https://wg21.link/p1854">P1854</a>.</li>
          <li>Tom continued; in Belfast, an example was discussed of checking if
              a character in a literal is converted to a specific value in order
              to infer the execution encoding.</li>
          <li>Tom provided an example:
<pre>
"\u1234" == 0x1234
</pre>
          </li>
          <li>Hubert suggested an alternative syntax for fun:
<pre>
__try__("\u1234") == 0x1234 // :)
</pre>
          </li>
          <li>Corentin stated that this seems like a different issue.</li>
          <li>Tom agreed, but noted that making non-encodable characters
              ill-formed means such checks can no longer be performed.  The
              intent is to allow code to use some characters if available and
              to fall back otherwise.
<pre>
if ('\u1234' == 0x73) {
  return '\u1234';
} else {
  return 'X';
}
</pre>
          </li>
          <li>PBrett noted that this presents a trade-off between the small
              number of people who care about clever tricks like that and the
              many more programmers that might experience surprising
              behavior.</li>
          <li>Zach observed that the code presented presumably doesn't work
              for gcc and Clang.</li>
          <li>Tom replied that gcc will accept it depending on whether
              <tt>-finput-charset</tt> and/or <tt>-fexec-charset</tt> are
              specified; if gcc has to get <tt>iconv</tt> involved, then an
              error may be reported.</li>
          <li>Tom added that the trade-off is the important concern here, not
              the use case; the use case can be addressed in other ways.</li>
        </ul>
      </li>
    </ul>
  </li>
  <li><a href="https://wg21.link/p1949">P1949: C++ Identifier Syntax using Unicode Standard Annex 31</a>:
    <ul>
      <li>Tom asked Steve if he had any updates to share since the EWG
          review.</li>
      <li>Steve replied that he was without power for a while but that he would
          try to get an update into the August mailing.</li>
    </ul>
  </li>
  <li>Tom stated that the next meeting will be August 26th and that we'll
      continue discussing
      <a href="https://wg21.link/p2178r1">P2178R1</a>
      starting with proposal 8.</li>
  <li>Tom reminded the group that Jens' paper,
      <a href="https://wg21.link/p2201r0">P2201R0: Mixed string literal concatenation</a>,
      will be presented to EWG on August 19th.</li>
</ul>
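
<p>
The multicharacter literal behavior discussed under proposal 5 can be
illustrated with a minimal sketch.  This is not an example taken from the
paper; the names <tt>multichar_is_int</tt> and <tt>tag</tt> are hypothetical
and exist only for illustration.  A character literal with a single
<em>c-char</em> has type <tt>char</tt>, while a literal with several
<em>c-char</em>s has type <tt>int</tt> and an implementation-defined value.
Compilers commonly emit a warning for such literals.
</p>

```cpp
#include <type_traits>

// A character literal with a single c-char has type char.
static_assert(std::is_same<decltype('a'), char>::value,
              "a single c-char literal has type char");

// A multicharacter literal (conditionally-supported) has type int.
// Hypothetical name, used only for illustration.
constexpr bool multichar_is_int =
    std::is_same<decltype('ab'), int>::value;
static_assert(multichar_is_int, "a multicharacter literal has type int");

// The value is implementation-defined, so this sketch makes no claim
// about what 'tag' actually holds on a given compiler.
constexpr int tag = 'ab';
```

<p>
Only the type is checked here; the sketch deliberately avoids depending on how
a particular implementation packs the individual <em>c-char</em>s into the
<tt>int</tt> value.
</p>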


<h1 id="2020_08_26">August 26th, 2020</h1>

<h2>Draft agenda:</h2>

<ul>
  <li><a href="https://wg21.link/p2178r1">P2178R1: Misc lexing and string handling improvements</a>
    <ul>
      <li>Continue discussions on the various proposals in the order 8, 10-12, 1<br/>
          (discussion of proposal 9 will be deferred due to the arrival of P2194R0).</li>
      <li>Begin taking direction polls.</li>
    </ul>
  </li>
</ul>

<h2>Attendees:</h2>

<ul>
  <li>Corentin Jabot</li>
  <li>Hubert Tong</li>
  <li>JeanHeyd Meneide</li>
  <li>Jens Maurer</li>
  <li>Mark Zeren</li>
  <li>Peter Brett</li>
  <li>Steve Downey</li>
  <li>Tom Honermann</li>
  <li>Victor Zverovich</li>
  <li>Zach Laine</li>
</ul>

<h2>Meeting summary:</h2>

<ul>
  <li><a href="https://wg21.link/p2178r1">P2178R1: Misc lexing and string handling improvements</a>
    <ul>
      <li>Proposal 8: Enforcing the formation of universal escape sequences in
          phase 2 and 4
        <ul>
          <li>Corentin stated that these cases of undefined behavior are
              surprising; defining behavior would not appear to present a
              problem to implementations.</li>
          <li>Corentin added that gcc, Clang, Visual C++, and the EDG-based
              Intel C++ compiler all exhibit the same behavior.</li>
          <li>Corentin mentioned that SG12 should be consulted.</li>
          <li>Corentin asserted that, if defining portable behavior presents a
              challenge, then the standard should specify that behavior is
              implementation-defined.</li>
          <li>Hubert stated that the undefined behavior is present to
              accommodate various preprocessor models; early models recognized
              <em>universal-character-name</em>s (UCNs) in translation phase 1
              and did not check for them again after translation phase 2
              (logical line formation) or translation phase 4
              (macro expansion and token pasting); the differences are
              observable.</li>
          <li>Hubert noted that there are many C implementations, so WG14 may
              not be interested in defining this behavior.</li>
          <li>Jens stated that preprocessor undefined behavior falls under SG12,
              but he is unaware of any activity addressing this specific
              issue.</li>
          <li>Jens asserted that both SG12 and WG14 should be informed of any
              efforts here.</li>
          <li>Jens noted that defining behavior just for C++ does not impact
              compatibility with C.</li>
          <li>Tom stated that this is not an SG16 concern.</li>
          <li>Corentin agreed.</li>
        </ul>
      </li>
      <li>Proposal 10: Make L in _Pragma ill-formed
        <ul>
          <li>Corentin explained that <tt>_Pragma</tt> expressions written with
              a wide string literal are well-formed in both C and C++, but are
              semantically identical to an expression written with an ordinary
              string literal.</li>
          <li>Corentin added that C also permits the string literal to be
              written with the <tt>u8</tt>, <tt>u</tt>, and <tt>U</tt> encoding
              prefixes; C++ only allows <tt>L</tt>.</li>
          <li>Corentin stated that the intent is to make the presence of an
              encoding prefix ill-formed since it serves no semantic
              purpose.</li>
          <li>PBrett agreed with the direction and stated that an encoding
              prefix being present only leads to confusion.</li>
          <li>Tom asked if it matters that <tt>_Pragma</tt> is processed in
              translation phase 4, but that tokenization is performed in
              translation phase 3.</li>
          <li>Hubert responded that it would for raw string literals.</li>
          <li>PBrett asked if raw string literals are allowed.</li>
          <li>Hubert expressed uncertainty.</li>
          <li>Corentin stated that when WG14 adopted support for the <tt>u</tt>
              and <tt>U</tt> encoding prefixes, they systematically added them
              everywhere that the <tt>L</tt> encoding prefix was allowed; C++
              did not do likewise.</li>
          <li>Jens stated that failure to add the additional encoding prefixes
              in C++ was an oversight.</li>
          <li>Jens noted that <tt>_Pragma</tt> accepts a <em>string-literal</em>
              and that includes <em>raw-string</em>.</li>
          <li>Jens asserted that this is not SG12 territory, but is liaison
              territory with WG14.</li>
          <li>PBrett noted that this is technically evolutionary.</li>
          <li>Corentin stated that this is not really an SG16 concern.</li>
          <li>Tom agreed; there is no actual encoding here.</li>
          <li>PBrett asked for confirmation that these strings are interpreted
              directly by the compiler.</li>
          <li>Mark asked if the compiler observes the source encoded
              string.</li>
          <li>Corentin replied that the compiler observes the string in the
              internal encoding.</li>
          <li>Tom agreed and noted that the observation occurs after translation
              phase 1 (conversion to internal encoding) and before translation
              phase 5 (conversion to execution character set).</li>
          <li>Jens opined that the use of <em>string-literal</em> is a hack to
              align behavior with <tt>#pragma</tt>.</li>
          <li>JeanHeyd asked for confirmation that the goal is to prohibit an
              encoding prefix as opposed to the current behavior that ignores
              an encoding prefix.</li>
          <li>Corentin replied affirmatively.</li>
          <li>JeanHeyd noted that this does create an incompatibility with C
              then, but it probably isn't a big deal.</li>
          <li>Tom asked if Corentin's code survey accounted for string literals
              produced by macro expansion.</li>
          <li>Corentin replied that it did not.</li>
          <li>Jens noted that a macro expansion could produce a string literal
              with an encoding prefix.</li>
          <li>PBrett observed that making the presence of an encoding prefix
              ill-formed doesn't mean an implementation has to reject the code;
              it just means that a diagnostic is required.</li>
          <li>Steve stated that the intent of <tt>_Pragma</tt> is to be an
              alternative to <tt>#pragma</tt>, one that is friendly to macros,
              but there is no encoding involved.</li>
          <li>Jens agreed; no encoding involved, an encoding prefix serves no
              purpose.</li>
          <li>Jens noted that <tt>_Pragma</tt> is relatively new; it was
              introduced in C99.</li>
          <li>JeanHeyd observed that an <tt>_Pragma</tt> expression written with
              a wide string literal might show up on Windows due to use of a
              <tt>TCHAR</tt> aware macro.</li>
          <li>JeanHeyd suggested that it might be best to just follow C, but
              that either all encoding prefixes should be allowed and ignored,
              or they should all be disallowed.</li>
          <li>Corentin stated that programmers don't tend to use a macro with
              <tt>_Pragma</tt>.</li>
          <li>Tom disagreed and noted that <tt>_Pragma</tt> was introduced as a
              macro friendly alternative to <tt>#pragma</tt>.</li>
          <li>Tom then retracted his disagreement by noting that macros can be
              used with <tt>#pragma</tt> as well (so long as the
              <tt>#pragma</tt> tokens themselves are not the result of macro
              expansion).</li>
          <li>Mark asked if the grammar for <tt>_Pragma</tt> should be
              specified using <em>string-literal</em>.</li>
          <li>Jens replied that that is not an SG16 concern.</li>
        </ul>
      </li>
      <li>Proposal 11: Make character literals in preprocessor conditional
          behave like they do in C++ expression
        <ul>
          <li>Corentin explained that character literal values can be inspected
              in preprocessor conditional directives during translation phase 4,
              but the values observed then are not required to match
              observations for character literal values during translation
              phase 7.</li>
          <li>Corentin stated that the existing specification is presumably
              intended to support an external preprocessor.</li>
          <li>Corentin added that the intent is to reduce the number of
              implementation-defined encodings in the standard and to match
              existing practice and existing programmer expectations as
              determined by code surveys.</li>
          <li>Hubert noted that the example is incorrect assuming the intent was
              to compare against ASCII values; the <tt>\x65</tt> and
              <tt>0x65</tt> should presumably be <tt>\x41</tt> and <tt>0x41</tt>
              respectively.</li>
          <li>Hubert confirmed that compilers on z/OS use the same character
              encoding for character literal observations made during
              translation phase 4 and translation phase 7.</li>
          <li>Tom asked about cross compilers; a tool chain that uses an
              external preprocessor may not have support for, or be aware of,
              the character encoding observed at translation phase 7.</li>
          <li>Hubert responded that, in cross compilation scenarios, headers are
              highly likely to be consistent between a cross compilation
              environment and native environment on the target; the observed
              values therefore need to be consistent in both environments.</li>
          <li>Steve agreed; many cross compilation environments require mounting
              a remote filesystem for access to headers and libraries.</li>
          <li>Tom stated that there are two possibilities for the character
              encoding observed at translation phase 4; either the internal
              encoding or the execution encoding.</li>
          <li>PBrett noted that the internal encoding should never be
              observable.</li>
          <li>Tom stated that this is technically a breaking change.</li>
          <li>Jens agreed, but noted that we know of no implementations that
              would be broken.</li>
          <li>Jens added that it would be odd to associate a character encoding
              with the preprocessor.</li>
          <li>Jens stated that, from a wording perspective, we'll need to state
              that the preprocessor must perform the same conversion for
              character literals at translation phase 4 that is done at
              translation phase 5.</li>
          <li>PBrett stated that he had been unaware that the preprocessor was
              potentially using a distinct character encoding; that would likely
              be a surprise to many programmers.</li>
          <li>Tom noted that this potentially has implementation impact since
              compiler drivers will need to coordinate with the preprocessor and
              the compiler to ensure a matching character encoding is used.</li>
          <li>Steve noted a typo; in the third paragraph, "where" should be
              "were" in "Of the 50 usages of the pattern, all but one where in C
              libraries."</li>
        </ul>
      </li>
      <li>Proposal 12: Phase 6 needs fixing
        <ul>
          <li>Corentin expressed uncertainty regarding how to address this
              issue.</li>
          <li>Corentin opined that it is odd that the encoding would not be
              determined by the first string literal.</li>
          <li>Corentin stated that, if a time machine were to suddenly
              materialize, the standard would require the encoding-prefix to be
              present for the first string literal.  But it is likely too late
              to make such a change now.</li>
          <li>Corentin added that this issue will be less significant if Jens'
              <a href="https://wg21.link/p2201">P2201</a>
              is adopted.</li>
          <li>Jens mentioned that a
              <a href="https://wiki.edg.com/pub/Wg21summer2020/SG16/d2201r1.html">D2201R1</a>
              now exists with the EWG requested changes.</li>
          <li>Jens added that P2201 isn't fundamentally related to this issue,
              though.</li>
          <li>Jens stated that
              <a href="https://wiki.edg.com/pub/Wg21summer2020/CoreWorkingGroup/cwg_active.html#2455">core issue 2455</a>
              now tracks this issue.</li>
          <li>Jens directed the group to
              <a href="https://wiki.edg.com/pub/Wg21summer2020/SG16/charset.html">a draft paper</a>
              that demonstrates one way to address this issue.</li>
          <li>Jens opined that this issue is really just a core issue; the
              wording is defective, but the intent is clear in
              <a href="http://eel.is/c++draft/lex.string#11">[5.13.5]</a>.</li>
          <li>Tom agreed.</li>
          <li>Steve reminded the group that there is implementation
              divergence.</li>
        </ul>
      </li>
    </ul>
  </li>
  <li>Polls on
      <a href="https://wg21.link/p2178r1">P2178R1</a>
      proposals:
    <ul>
      <li>Proposal 2: What is a whitespace or a new-line?
        <ul>
          <li>Hubert stated that this proposal deals in the formation and
              replacement of newlines and therefore cannot be meaningfully
              separated from the noted core issue,
              <a href="https://wg21.link/cwg1655">core issue 1655</a>.</li>
          <li>Corentin responded that the intent is that line endings are
              preserved through translation phase 1.</li>
          <li>Tom noted that specifying that intent is difficult since
              translation phase 1 is so loose.</li>
          <li>Corentin suggested that a new grammar term for newline may be
              needed.</li>
          <li>PBrett stated that the current poll should focus on whether we
              support the proposed direction.</li>
          <li>Hubert asserted that an implementation survey should be done
              since line numbers are observable via <tt>__LINE__</tt> and
              <tt>std::source_location</tt>.</li>
          <li>Hubert added that this proposal introduces challenges for
              compilers that open source files as "text" files since doing so
              transparently mutates line endings.</li>
          <li>Jens asserted that a wording direction that would suffice as a
              proposed resolution for
              <a href="https://wg21.link/cwg1655">core issue 1655</a>
              is needed before polling.</li>
          <li>Hubert raised concerns about implementations that read source
              code from datasets with fixed length records.</li>
          <li>Tom asked if anyone had a fundamental objection to the general
              direction.</li>
          <li>No objections were raised.</li>
        </ul>
      </li>
      <li>Proposal 3: Preserve Normalization forms
        <ul>
          <li>Jens asserted that this proposal needs to address how to tunnel
              code points through translation phase 1 and translation phase
              5.</li>
          <li>Hubert noted that an implementation would have to define how it
              determines whether a source file is Unicode encoded.</li>
          <li>Hubert asked what it means to preserve normalization through
              translation phase 5 if the execution character set is not
              Unicode.</li>
          <li>Corentin replied that the intent is that code point sequences
              that contain combining characters cannot be composed during
              translation phase 5.</li>
          <li><b>Poll: Proposal 3: We agree that, for Unicode source files,
              that normalization is preserved through translation phases 1
              and 5.</b>
            <ul>
              <li><b>Attendees: 10</b></li>
              <li><b>No objection to unanimous consent.</b></li>
            </ul>
          </li>
        </ul>
      </li>
      <li>Proposal 4: Making trailing whitespaces non-significant
        <ul>
          <li>Tom declared that this is not an SG16 concern and that Corentin
              is free to take this directly to EWG.</li>
        </ul>
      </li>
      <li>Proposal 5: Restricting multi-characters literals to members of the
          Basic Latin Block
        <ul>
          <li>Tom suggested that the restriction be redefined in terms of
              characters that are encodable as a single code unit since some
              characters in this block may not be encodable or may not be
              encodable as a single code unit.</li>
          <li>Corentin expressed concern about portability.</li>
          <li>PBrett suggested changing the restriction to the basic source
              character set.</li>
          <li><b>Poll: Proposal 5: We support this direction modified in terms
              of the basic source character set.</b>
            <ul>
              <li><b>Attendees: 10</b></li>
              <li><b>No objection to unanimous consent.</b></li>
            </ul>
          </li>
        </ul>
      </li>
      <li>Proposal 6: Making wide characters literals containing multiple or
          unrepresentable c-char ill-formed
        <ul>
          <li><b>Poll: Proposal 6: We support making wide multicharacter
              literals ill-formed.</b>
            <ul>
              <li><b>Attendees: 10</b></li>
              <li><b>No objection to unanimous consent.</b></li>
            </ul>
          </li>
          <li><b>Poll: Proposal 6: We support making wide non-encodable
                 character literals ill-formed.</b>
            <ul>
              <li><b>Attendees: 10</b></li>
              <li><b>No objection to unanimous consent.</b></li>
            </ul>
          </li>
        </ul>
      </li>
      <li>Proposal 7: Making conversion of character and string literals to
          execution and wide execution encoding ill-formed for unrepresentable
          c-char  
        <ul>
          <li>Steve asked if a source file containing Unicode 13 characters
              would be ill-formed if compiled by a compiler that only supports
              Unicode 12.</li>
          <li>PBrett asked for confirmation that a sparkle emoji present in an
              ordinary string literal in a Unicode encoded source file would be
              ill-formed if the execution character set is ISO-8859-1.</li>
          <li>Corentin replied that it would be.</li>
          <li>Jens stated that this restriction could always be worked around
              by defining one's own execution character set, so this doesn't
              provide much benefit.</li>
          <li>Hubert agreed that the normative impact is dubious.</li>
          <li>Jens suggested that polling be postponed since there are concerns
              that appear to warrant additional discussion.</li>
          <li>Tom agreed.</li>
        </ul>
      </li>
      <li>Proposal 8: Enforcing the formation of universal escape sequences in
          phase 2 and 4
        <ul>
          <li>Tom declared that this is not an SG16 concern and that Corentin
              is free to take this directly to EWG.</li>
        </ul>
      </li>
      <li>Proposal 10: Make L in _Pragma ill-formed
        <ul>
          <li><b>Poll: Proposal 10: We agree to make all encoding-prefixes in
              _Pragma ill-formed.</b>
            <ul>
              <li><b>Attendees: 10</b></li>
              <li><b>No objection to unanimous consent.</b></li>
            </ul>
          </li>
        </ul>
      </li>
      <li>Proposal 11: Make character literals in preprocessor conditional
          behave like they do in C++ expression
        <ul>
          <li>Hubert asserted that opinions on this should be gathered from
              WG14.</li>
          <li><b>Poll: Proposal 11: We agree that the same character encoding
              should be used for character literal in translation phase 4 and
              7.</b>
            <ul>
              <li><b>Attendees: 10</b></li>
              <li><b>No objection to unanimous consent.</b></li>
            </ul>
          </li>
        </ul>
      </li>
      <li>Proposal 12: Improved wording for phase 6 string concatenation
        <ul>
          <li>Tom declared that this is not an SG16 concern.</li>
        </ul>
      </li>
    </ul>
  </li>
  <li>Tom stated that the next telecon will be held on September 9th.</li>
</ul>
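
<p>
The concern behind proposal 11 can be sketched as follows.  This is an
illustrative example under stated assumptions, not code from the paper: the
macro <tt>PP_SEES_ASCII_A</tt> and the variables <tt>expr_sees_ascii_a</tt>
and <tt>phases_agree</tt> are hypothetical names.  The same character literal
is inspected once in a preprocessor conditional (translation phase 4) and once
in an ordinary expression (translation phase 7); the current wording does not
require the two observations to match, although the discussion above
identified no implementation where they differ.
</p>

```cpp
// Value of 'A' as observed by the preprocessor (translation phase 4).
// PP_SEES_ASCII_A is a hypothetical macro name.
#if 'A' == 0x41
#define PP_SEES_ASCII_A 1
#else
#define PP_SEES_ASCII_A 0
#endif

// Value of 'A' as observed in an expression (translation phase 7).
constexpr bool expr_sees_ascii_a = ('A' == 0x41);

// True when both phases made the same observation.  The proposal would
// make this a guarantee rather than a property of the implementation.
constexpr bool phases_agree =
    static_cast<bool>(PP_SEES_ASCII_A) == expr_sees_ascii_a;
```

<p>
On an implementation with a non-ASCII execution character set, both
observations may be false; the point of the sketch is only that they are
expected to agree with each other.
</p>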


</body>
