<!doctype html public "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">

<head>
<title>SG16: Unicode meeting summaries 2020-12-09 through 2021-03-24</title>
</head>

<style type="text/css">

table#header th,
table#header td
{
    text-align: left;
}

tt {
    font-family: monospace;
}

/* Thanks to Elias Kosunen for the following CSS suggestions! */

* {
    font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, "Noto Sans", sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Noto Color Emoji";
    line-height: 125%;
}

html, body {
    background-color: #eee;
}

h1, h2, h3, h4, h5, p, span, li, dt, dd {
    color: #333;
}

p, li {
    line-height: 140%;
}

body {
    padding: 1em;
    max-width: 1600px;
}

p, li {
    -moz-osx-font-smoothing: grayscale;
    -webkit-font-smoothing: antialiased !important;
    -moz-font-smoothing: antialiased !important;
    text-rendering: optimizelegibility !important;
    letter-spacing: .01em;
}

h1, h2, h3 {
    margin-bottom: 1em;
    letter-spacing: .03em;
}

blockquote.quote
{
    margin-left: 0em;
    border-style: solid;
    background-color: lemonchiffon;
    color: #000000;
    border: 1px solid black;
}

</style>

<body>

<table id="header">
  <tr>
    <th>Document Number:</th>
    <td>P2352R0</td>
  </tr>
  <tr>
    <th>Date:</th>
    <td>2021-04-04</td>
  </tr>
  <tr>
    <th>Audience:</th>
    <td>SG16</td>
  </tr>
  <tr>
    <th>Reply-to:</th>
    <td>Tom Honermann &lt;tom@honermann.net&gt;</td>
  </tr>
</table>


<h1>SG16: Unicode meeting summaries 2020-12-09 through 2021-03-24</h1>

<p>
Summaries of SG16 meetings are maintained at
<a href="https://github.com/sg16-unicode/sg16-meetings">
https://github.com/sg16-unicode/sg16-meetings</a>.  This paper contains a
snapshot of select meeting summaries from that repository.
</p>

<ul>
  <li><a href="#2020_12_09">
      December 9th, 2020</a></li>
  <li><a href="#2021_01_13">
      January 13th, 2021</a></li>
  <li><a href="#2021_01_27">
      January 27th, 2021</a></li>
  <li><a href="#2021_02_10">
      February 10th, 2021</a></li>
  <li><a href="#2021_02_24">
      February 24th, 2021</a></li>
  <li><a href="#2021_03_10">
      March 10th, 2021</a></li>
  <li><a href="#2021_03_24">
      March 24th, 2021</a></li>
</ul>

<p>
Previously published SG16 meeting summary papers:
</p>
<ul>
  <li><a href="https://wg21.link/p1080">P1080: SG16: Unicode meeting summaries 2018/03/28 - 2018/04/25</a></li>
  <li><a href="https://wg21.link/p1137">P1137: SG16: Unicode meeting summaries 2018/05/16 - 2018/06/20</a></li>
  <li><a href="https://wg21.link/p1237">P1237: SG16: Unicode meeting summaries 2018/07/11 - 2018/10/03</a></li>
  <li><a href="https://wg21.link/p1422">P1422: SG16: Unicode meeting summaries 2018/10/17 - 2019/01/09</a></li>
  <li><a href="https://wg21.link/p1666">P1666: SG16: Unicode meeting summaries 2019/01/23 - 2019/05/22</a></li>
  <li><a href="https://wg21.link/p1896">P1896: SG16: Unicode meeting summaries 2019/06/12 - 2019/09/25</a></li>
  <li><a href="https://wg21.link/p2009">P2009: SG16: Unicode meeting summaries 2019-10-09 through 2019-12-11</a></li>
  <li><a href="https://wg21.link/p2179">P2179: SG16: Unicode meeting summaries 2020-01-08 through 2020-05-27</a></li>
  <li><a href="https://wg21.link/p2217">P2217: SG16: Unicode meeting summaries 2020-06-10 through 2020-08-26</a></li>
  <li><a href="https://wg21.link/p2253">P2253: SG16: Unicode meeting summaries 2020-09-09 through 2020-11-11</a></li>
</ul>


<h1 id="2020_12_09">December 9th, 2020</h1>

<h2>Draft agenda:</h2>

<ul>
  <li><a href="https://wg21.link/p2093r2">P2093R2: Formatted output</a>:
    <ul>
      <li>Continue discussion.</li>
    </ul>
  </li>
</ul>

<h2>Attendees:</h2>

<ul>
  <li>Hubert Tong</li>
  <li>Jens Maurer</li>
  <li>Peter Brett</li>
  <li>Steve Downey</li>
  <li>Tom Honermann</li>
  <li>Victor Zverovich</li>
  <li>Zach Laine</li>
</ul>

<h2>Meeting summary:</h2>

<ul>
  <li>Tom provided some administrative updates:
    <ul>
      <li>A reminder that SG16 telecons operate under the WG21 code of conduct
          as described in the
          <a href="https://isocpp.org/std/standing-documents/sd-4-wg21-practices-and-procedures#code-of-conduct">"Code of conduct" section of WG21 Standing Document #4</a>
          and, by extension, the
          <a href="https://www.iso.org/publication/PUB100397.html">ISO Code of Conduct</a>
          and
          <a href="https://basecamp.iec.ch/download/iec-code-of-conduct-for-delegates-and-experts">IEC Code of Conduct</a>.</li>
      <li>A reminder that SG16 is a public group and that minutes of SG16
          telecons are made publicly available.  Participants may request that
          sensitive information not be minuted.</li>
      <li>A draft paper calling for the creation of a WG21 managed chat service
          will be submitted for the December 15th mailing.</li>
      <li><em>[ Editor's note: the submitted paper is
          <a href="https://wg21.link/p2263r0">P2263R0: A call for a WG21 managed chat service</a>.
          An administrative telecon will be scheduled in early January to
          discuss it with the intent to have a draft revision addressing any
          feedback prepared for the February pre-meeting administrative telecon.
          ]</em></li>
      <li>The deadline for initial proposals of new features to be included in
          C2x is 2021-08-27.</li>
      <li>The next C2x meeting is scheduled for March 8-12, 2021.</li>
      <li>Tom asked for clarification of the next steps following our recent
          discussion of
          <a href="https://wg21.link/p2194r0">P2194R0</a>
          and asked who is planning to write the paper to reword translation
          phase 1 in terms of Unicode scalar values rather than basic source
          characters and universal-character-names (UCNs).</li>
      <li>Jens stated he will write that paper, but noted some complications:
        <ul>
          <li>Not all string literals should be transcoded into the execution
              encoding because literals that appear in contexts such as
              <tt>static_assert</tt> and language linkage are evaluated at
              compile-time and have different encoding implications.</li>
          <li>The issue of translation phase 5/6 confusion regarding string
              literal concatenation and conversion to the execution encoding
              remains.</li>
          <li>It may become necessary to move string literal conversions and
              concatenation to translation phase 7.</li>
          <li>Care will be required to ensure concatenation of string literals
              does not result in construction or extension of escape
              sequences.</li>
        </ul>
      </li>
    </ul>
  </li>
  <li><a href="https://wg21.link/p2093r2">P2093R2: Formatted output</a>:
    <ul>
      <li>PBrett provided an introduction:
        <ul>
          <li>It has been a month since we last reviewed.</li>
          <li>There has been some good discussion on the mailing list.</li>
          <li>Victor will continue his presentation.</li>
        </ul>
      </li>
      <li>Victor presented:
        <ul>
          <li>The intent of the proposal is to integrate formatting facilities
              with output streams.</li>
          <li>A survey of current Java, Python 3, and Rust releases was
              conducted to ascertain their behavior on a Russian Windows system
              (Active Code Page (ACP) set to Windows-1251; console encoding set
              to CP866) when a string containing Russian and Greek characters is
              written to stdout, first with stdout directed to a console and
              again with stdout redirected to a file.
            <ul>
              <li>For Java, <tt>java.lang.System.out</tt> was used.</li>
              <li>For Python, <tt>print</tt> was used.</li>
              <li>For Rust, <tt>std::print</tt> was used.</li>
              <li>For each language, redirection of stdout to a file was done
                  using the <tt>cmd.exe</tt> shell.</li>
            </ul>
          </li>
          <li>Java failed to display the correct characters to the Windows
              console; UTF-8 output was produced when stdout was redirected to
              a file.</li>
          <li><em>[ Editor's note: See further discussion below regarding the
              Java behavior. ]</em></li>
          <li>Python threw an exception when trying to convert the Greek
              characters to Windows-1251.</li>
          <li>Rust displayed the correct characters to the console and UTF-8
              output was produced when stdout was redirected to a file.</li>
        </ul>
      </li>
      <li>PBrett noted that each of these languages differs from C++ in that
          it always uses a UTF encoding for strings.</li>
      <li>PBrett added that both Python and Rust appear to behave in an
          arguably correct manner.</li>
      <li>PBrett stated that he has many tools that operate in ISO-8859
          encodings, for which generating UTF-8 output would not produce the
          expected behavior.</li>
      <li>PBrett noted that this is a new facility, so it could behave
          differently.</li>
      <li>Tom expressed surprise that the Java test produced UTF-8 when stdout
          was directed to a file as that does not match his understanding of
          Java's behavior.</li>
      <li>Steve noted that such surprising behavior can be the result of
          mismatched encoding expectations and asked if the source file
          encoding was correct.</li>
      <li>Steve added that this is an easy thing to get wrong.</li>
      <li>Victor replied that all source files were UTF-8 encoded.</li>
      <li>Tom stated that the Java compiler uses the locale to determine source
          file encoding unless invoked with the <tt>-encoding</tt> option,
          similar to how Microsoft Visual C++ behaves.</li>
      <li>Zach asked if source file encoding could influence the encoding used
          for file redirection.</li>
      <li>Victor replied that he would conduct additional tests.</li>
      <li><em>[ Editor's note: later tests revealed that the reported behavior
          for Java was incorrect.  The Java compiler had been invoked without a
          <tt>-encoding</tt> option, so the UTF-8 encoded source file was
          misinterpreted as being Windows-1251 encoded and the compiler
          converted string literals from Windows-1251 to UTF-16.  When the
          strings were then written via <tt>java.lang.System.out</tt>, the
          prior conversion was reversed when <tt>java.io.PrintStream</tt>
          converted the string to print from UTF-16 back to Windows-1251.  The
          result was that the original bytes from the source file encoding of
          the string literal were written to stdout.  This produced mojibake on
          the Windows console and gave the appearance that the program had
          (intentionally) generated UTF-8 in the file redirection scenario.
          Note that all valid UTF-8 code unit sequences are also valid
          Windows-1251 code unit sequences. ]</em></li>
      <li>Tom asked Victor if he could extend his testing to cover cases where
          stdout was redirected to a pipe that was connected to stdin of
          another process.</li>
      <li>Victor replied that he would look into that.</li>
      <li>PBrett recalled that, in prior discussion, we had started discussing
          the relationship between the execution encoding and the run-time
          encoding and how that relationship influences behavior.</li>
      <li>Victor summarized the proposed behavior and some of the prior
          discussion:
        <ul>
          <li>Decisions are based solely on execution encoding (known at
              compile time).</li>
          <li>Use of locale settings would be challenging due to the
              involvement of multiple encodings.</li>
          <li>The encoding of the format string must be consistent.</li>
          <li>Tom had raised the question of a z/OS implementation using UTF-8
              as the execution encoding, but operating in an EBCDIC
              environment.</li>
        </ul>
      </li>
      <li>PBrett observed that use of Microsoft's <tt>/utf-8</tt> option would
          cause UTF-8 output to be generated and asked what should happen when
          that option isn't used.</li>
      <li>Victor replied that the behavior depends on the system configuration
          and that the proposal specifies that bytes be passed through
          unmodified in that case.</li>
      <li>Steve relayed experience with tools that end up writing ANSI terminal
          escape sequences into a file when output is redirected and noted the
          difficulty that would be encountered when attempting to determine what
          kind of device receives the final output; examples may involve
          multiple ssh hops.</li>
      <li>Tom opined that the only case he is aware of where writing directly to
          the device instead of to the file stream makes sense is on Windows,
          where it can be known definitively that the file stream is attached
          to a console.</li>
      <li>Hubert explained that, on z/OS, an application can internally be in
          ASCII or EBCDIC mode, open file handles can be imbued with the
          property of being ASCII or EBCDIC, and the C-level I/O APIs can
          automatically translate between them for, at least, single-byte
          encodings.</li>
      <li>Jens commented that the proposed feature appears to be centered around
          a special facility for Windows and expressed some skepticism about
          driving a design around it.</li>
      <li>Jens added that, in the z/OS scenario as Hubert described it, it is
          unclear whether the facility would adequately handle variable-length
          encodings.</li>
      <li>Jens stated that, given the wide array of platforms supported by the
          C++ standard, he would prefer to craft a design around an abstract
          console stream.</li>
      <li>PBrett expressed concern that this facility might not be used much on
          Windows because use of Microsoft's <tt>/utf-8</tt> option is not
          common.</li>
      <li>Victor disagreed with the notion that the design is centered around
          Windows and a particular corner case; the goal is to fix the general
          problem of producing correct output.</li>
      <li>Victor stated that, with respect to use of the <tt>/utf-8</tt> option,
          he is open to the changes Tom suggested: transcode as necessary and
          write directly to the console when output is directed there,
          regardless of the execution encoding.</li>
      <li>Tom agreed with Victor that the concerns are more fundamental and not
          a special case; the goal is to improve support for an internal
          encoding distinct from the external encoding.</li>
      <li>Tom asked about coexistence with other formatting facilities and what
          concerns arise due to potentially divergent behavior; programs won't
          be rewritten overnight to migrate all <tt>printf()</tt> uses to
          <tt>std::print()</tt>.</li>
      <li>Zach asked what the tradeoffs in design options are, what gets broken
          based on design choices, and how much of this can be left
          implementation-defined.</li>
      <li>Zach expressed support for the approach described in the paper.</li>
      <li>Hubert stated that encompassing the console in a separate facility
          would pose challenges because it assumes the presence of a unique
          "console" in the environment.</li>
      <li>Hubert added that an important concern is encoding of string literals
          vs encoding of strings received from the environment.  For example,
          regex libraries tend to use a possibly pre-compiled pattern encoded
          in the execution encoding, but operate on strings provided by the
          environment; it is necessary to differentiate these.</li>
      <li>PBrett agreed with Hubert's concerns.</li>
      <li>Steve stated that the proposed feature is at least partly QoI and
          that, if we can specify this with sufficient latitude for Microsoft
          to make their customers happy, great; the Windows console
          capabilities are not portable.</li>
      <li>Steve added that locale information is input to the program and used
          to interpret bytes received as input.</li>
      <li>Steve raised the concern that the proposed approach may enclose
          transcoding behavior too deeply where it can't be fixed if there is a
          problem.</li>
      <li>Tom agreed with Steve's concern and noted parallels with Jens'
          previous request to separate transcoding features.</li>
      <li>PBrett asked how encoding errors should be handled.</li>
      <li>Victor replied that the current implementation throws an exception,
          but that there has been a recommendation to use U+FFFD character
          substitution instead.</li>
      <li>Zach noted that use of substitution characters matches Unicode
          recommendations.</li>
      <li>Victor noted that, when writing to the console, substitution is safe
          since it is known that the output would be incorrect anyway.</li>
      <li>PBrett acknowledged that perception; the text gets converted to
          photons.</li>
      <li>Victor added that the console font is typically limited in the
          characters it can display as well.</li>
      <li>PBrett asked if there were any objections to use of substitution
          characters in error handling scenarios.</li>
      <li>No objections were raised.</li>
      <li>Jens stated that the proposed feature is effectively a large hammer
          being aimed at more nails than is really desired.  For example, in
          safety critical scenarios, a programmer may need to be notified of
          errors; displaying a Unicode replacement character to an aircraft
          pilot is less than helpful.</li>
      <li>Jens reflected that, on the other hand, perhaps this is a tool
          intended for common users where such substitutions are not an
          issue.</li>
      <li>Jens expressed support for transcoding operations being explicit at
          program boundaries and noted that a number of encodings are
          involved: the execution encoding, locale-dependent input, and
          locale-dependent output (potentially multiple; e.g., the Windows ACP
          and the console encoding).</li>
      <li>Jens opined that we should promote a programming model that puts more
          control in the hands of the programmer.</li>
      <li>Victor replied that <tt>std::format()</tt> doesn't do any transcoding
          at present; the encodings must match, and programmers must get data
          into the right encoding first.</li>
      <li>Hubert noted the law of unintended consequences; that trying to ensure
          a behavior in one case can result in undesired behavior in
          another.</li>
      <li>Steve agreed and noted that use of <tt>isatty()</tt> tends to be
          problematic as it can be surprising when behavior changes based on
          redirection.</li>
      <li>PBrett asked Tom what polls should be conducted.</li>
      <li>Tom responded that he did not think discussion had progressed to a
          point where we all hold well-informed positions.</li>
      <li>Tom stated that the additional work Victor has agreed to do should
          help strengthen our understanding and positions; we should therefore
          wait for this additional information before conducting any
          polling.</li>
    </ul>
  </li>
  <li>Tom stated that the next meeting will be on January 13th and the agenda
      will include:
    <ul>
      <li><a href="https://wg21.link/p2246r0">P2246R0: Character encoding of diagnostic text</a></li>
      <li>SG16, SG22, and WG14 coordination; we'll discuss how to make forward
          progress on
          <a href="https://github.com/sg16-unicode/sg16/issues?q=is%3Aissue+is%3Aopen+label%3Awg14">SG16 issues labeled with the WG14 tag</a>.</li>
      <li><a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2620.htm">WG14 N2620: Restartable and Non-Restartable Functions for Efficient Character Conversions | r4</a></li>
    </ul>
  </li>
</ul>
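<p>
The U+FFFD substitution discussed during the P2093R2 review can be illustrated
with a minimal C++ sketch, assuming UTF-8 input decoded to UTF-32. The names
<tt>decode_utf8_replacing</tt> and <tt>check_examples</tt> are hypothetical.
This is not the P2093 reference implementation. The sketch replaces each
maximal subpart of an ill-formed sequence with one U+FFFD, following the
practice described in chapter 3 of the Unicode Standard, rather than throwing
an exception.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not from P2093): decode UTF-8 to UTF-32, replacing
// each maximal subpart of an ill-formed sequence with U+FFFD instead of
// throwing an exception.
std::u32string decode_utf8_replacing(std::string_view in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        if (lead < 0x80) {  // ASCII: a single-byte sequence
            out.push_back(lead);
            ++i;
            continue;
        }
        std::size_t len = 0;      // expected sequence length
        unsigned char lo = 0x80;  // valid range for the second byte;
        unsigned char hi = 0xBF;  // later bytes are always 80..BF
        char32_t cp = 0;
        if (lead >= 0xC2 && lead <= 0xDF) {
            len = 2; cp = lead & 0x1F;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            len = 3; cp = lead & 0x0F;
            if (lead == 0xE0) lo = 0xA0;  // reject overlong forms
            if (lead == 0xED) hi = 0x9F;  // reject encoded surrogates
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            len = 4; cp = lead & 0x07;
            if (lead == 0xF0) lo = 0x90;  // reject overlong forms
            if (lead == 0xF4) hi = 0x8F;  // reject values above U+10FFFF
        } else {  // 80..C1 and F5..FF can never start a sequence
            out.push_back(U'\uFFFD');
            ++i;
            continue;
        }
        std::size_t j = i + 1;
        for (; j < i + len && j < in.size(); ++j) {
            const unsigned char b = static_cast<unsigned char>(in[j]);
            const bool second = (j == i + 1);
            if (b < (second ? lo : 0x80) || b > (second ? hi : 0xBF)) break;
            cp = (cp << 6) | (b & 0x3F);
        }
        // A complete sequence yields its code point; otherwise the bytes
        // consumed so far (the maximal subpart) become a single U+FFFD.
        out.push_back(j == i + len ? cp : U'\uFFFD');
        i = j;
    }
    return out;
}

// Usage: a valid euro sign, a stray 0xFF byte, and a truncated sequence.
void check_examples() {
    assert(decode_utf8_replacing("\xE2\x82\xAC") == U"\u20AC");
    assert(decode_utf8_replacing("a\xFF" "b") == U"a\uFFFDb");
    assert(decode_utf8_replacing("\xE2\x82") == U"\uFFFD");
}
```

<p>
Under this approach, ill-formed input degrades to visible replacement
characters instead of an exception. As Jens noted, that trade-off may not suit
every application, for example safety-critical ones.
</p>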


<h1 id="2021_01_13">January 13th, 2021</h1>

<h2>Draft agenda:</h2>

<ul>
  <li><a href="https://wg21.link/p2246r0">P2246R0: Character encoding of diagnostic text</a>
    <ul>
      <li>And companion paper:
          <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2563.pdf">WG14 N2563: Character encoding of diagnostic text</a></li>
    </ul>
  </li>
  <li>SG16, SG22, and WG14 coordination.
    <ul>
      <li>Priority and owners for
          <a href="https://github.com/sg16-unicode/sg16/issues?q=is%3Aissue+is%3Aopen+label%3Awg14">SG16's open WG14 issues</a>.</li>
    </ul>
  </li>
  <li><a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2620.htm">WG14 N2620: Restartable and Non-Restartable Functions for Efficient Character Conversions | r4</a></li>
</ul>

<h2>Attendees:</h2>

<ul>
  <li>Aaron Ballman</li>
  <li>Corentin Jabot</li>
  <li>Hubert Tong</li>
  <li>JeanHeyd Meneide</li>
  <li>Jens Maurer</li>
  <li>Mark Zeren</li>
  <li>Peter Bindels</li>
  <li>Peter Brett</li>
  <li>Steve Downey</li>
  <li>Tom Honermann</li>
  <li>Victor Zverovich</li>
</ul>

<h2>Meeting summary:</h2>

<ul>
  <li>Tom stated that his paper to clarify guidance regarding use of BOMs in
      UTF-8 text was submitted to the UTC and given paper number L2/21-038.
    <ul>
      <li>The submitted paper is available at
          <a href="https://www.unicode.org/L2/L2021/21038-bom-guidance.pdf">https://www.unicode.org/L2/L2021/21038-bom-guidance.pdf</a>.</li>
    </ul>
  </li>
  <li><a href="https://wg21.link/p2246r0">P2246R0: Character encoding of diagnostic text</a>:
    <ul>
      <li>Aaron provided an introduction:
        <ul>
          <li>The basic problem is that we have features that require source
              code fragments to be reproduced in diagnostics without clear
              specification or limits regarding how that should be done.</li>
          <li>Depending on the various encodings in use, accurate
              representation of the original source code may not be
              possible.</li>
          <li>The degree to which these fragments are preserved and reproduced
              should be a QoI issue.</li>
          <li>WG14 approved the C version of this paper
              (<a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2563.pdf">WG14 N2563: Character encoding of diagnostic text</a>).</li>
          <li>The wording changes for the C standard have a little additional
              cleanup; some uses of "may" were changed to "should".</li>
          <li>A goal is to keep the C and C++ standards aligned.</li>
        </ul>
      </li>
      <li>PBrett asked if the proposed changes might be considered
          editorial.</li>
      <li>Hubert opined that considering them editorial would be
          contentious.</li>
      <li>Aaron responded that the changes should not be considered
          editorial.</li>
      <li>PBrett asked if there is anything evolutionary about the
          changes.</li>
      <li>Aaron replied that he didn't think so, but that EWG participants
          may have a different perspective (for example, a desire to define
          a character set for diagnostics), though he would be surprised if
          so.</li>
      <li>Steve noted that we have parallel work in progress to specify
          translation within the standard in terms of a Unicode encoding.</li>
      <li>Hubert observed that the standard currently specifies different
          requirements for the <tt>static_assert</tt> declaration and
          <tt>#error</tt> directive vs the <tt>[[deprecated]]</tt> and
          <tt>[[nodiscard]]</tt> attributes; the former require the message
          to reproduce the text while the latter specify normative
          encouragement.</li>
      <li>Hubert noted that the status quo is not that diagnostics are QoI;
          there are requirements on the text produced, so changes could impact
          implementations.</li>
      <li>Hubert observed that the <tt>#error</tt> directive requires
          reproducing preprocessing tokens, potentially including string
          literals, at an earlier phase of translation than, for example,
          <tt>static_assert</tt>.</li>
      <li>Hubert requested that the paper clarify how diagnostics produced at
          different translation phases should be handled.</li>
      <li>Aaron stated that he was not aware of a requirement that diagnostics
          be textual; an implementation that presented a frowny face or a
          graphical image of the source code would be conforming.</li>
      <li>PBindels chimed in from chat to state that
          <a href="https://github.com/dascandy/evoke">Evoke</a>
          emits emoji for its error state and that he felt called out.</li>
      <li>Victor also chimed in from chat to suggest to PBindels that emoji is
          still text and that Evoke should emit a gif with a relevant meme.</li>
      <li>Hubert responded that there is a requirement if the standard states
          one; the implementation must have some means to present a message.</li>
      <li>Aaron asked if a change to the prose was desired.</li>
      <li>Hubert responded affirmatively; that he would like to see a change of
          the rationale to clarify that there is no increase in requirement for
          some representation of the text in comparison to the status quo.</li>
      <li>Aaron agreed to make such a change.</li>
      <li>Corentin recalled earlier discussions regarding translation phase 1,
          string literals always being in execution encoding, and whether
          <tt>static_assert</tt> might require special handling.</li>
      <li>Corentin expressed satisfaction with the paper as presented and
          stated that the standard should not have to specify how a string
          literal is presented.</li>
      <li>Aaron agreed with one exception; <tt>#error</tt> is specified solely
          in terms of preprocessing tokens.</li>
      <li>Corentin acknowledged; a string literal is a token in that case and
          <tt>#error</tt> processing occurs before conversion to the execution
          character set in translation phase 5.</li>
      <li>Hubert noted that the status quo is that preprocessing tokens are
          limited to the basic source character set due to translation phase 1
          converting other characters to UCNs.</li>
      <li>Hubert added that we have direction that might change this, but that
          this paper should not be held up because of that; Jens' effort may
          need to consider this though.</li>
      <li>Hubert observed that the standard does not specify how an
          implementation represents text in the execution character set, but
          that it needs to acknowledge that a translation is required when
          producing a diagnostic.</li>
      <li>Aaron asked if some string literals, such as those used in attributes,
          should even undergo translation to the execution character set.</li>
      <li>PBrett replied that an implementation performs conversion to the
          execution character set and therefore has the means to undo the
          conversion.</li>
      <li>Steve noted that an implementation presumably will retain access to
          the original source code; a reverse map to the original text
          suffices in the worst case.</li>
      <li>Hubert stated that it isn't necessarily true that a compiler can
          easily map from a converted string literal back to the original text
          after translation phase 5.</li>
      <li>Mark noted that, with regard to attributes, they can include
          arbitrary expressions, not just string literals.</li>
      <li>Corentin stated that this discussion is expanding the scope of what
          is required and repeated his opinion of the paper being acceptable
          as presented.</li>
      <li>Corentin added that programmers must be able to rely on string
          literals being converted consistently; consistency is important for
          reflection and other compile-time programming facilities.</li>
      <li>Steve opined that some of this discussion is well into QoI territory;
          if a compiler can't display source code in a sensible manner to the
          programmer, then all bets are off.</li>
      <li>Hubert stated that he would be happy with changing the current uses of
          "shall" to "should" via this paper, but that it isn't strictly
          necessary at this time; we don't want to exclude any execution
          environments.</li>
      <li>Aaron agreed.</li>
      <li>Hubert added that the wording for <tt>#error</tt> should be changed to
          match.</li>
      <li><b>Poll: When diagnostic messages quote a portion of source code, the
          preservation of text semantics is merely a quality of implementation
          concern.</b>
        <ul>
          <li><b>Attendance: 11</b></li>
          <li>
            <table>
              <tr>
                <th style="text-align:right">SF</th>
                <th style="text-align:right">F</th>
                <th style="text-align:right">N</th>
                <th style="text-align:right">A</th>
                <th style="text-align:right">SA</th>
              </tr>
              <tr>
                <th style="text-align:right">5</th>
                <th style="text-align:right">2</th>
                <th style="text-align:right">2</th>
                <th style="text-align:right">0</th>
                <th style="text-align:right">0</th>
              </tr>
            </table>
          </li>
          <li><b>Consensus is in favor.</b></li>
        </ul>
      </li>
      <li><b>Poll: We agree with removing the specific character set from static_assert along the lines of P2246.</b>
        <ul>
          <li><b>Attendance: 11</b></li>
          <li><b>No objections to unanimous consent.</b></li>
        </ul>
      </li>
      <li><b>Poll: Forward P2246 to EWG</b>
        <ul>
          <li><b>Attendance: 11</b></li>
          <li><b>No objections to unanimous consent.</b></li>
        </ul>
      </li>
    </ul>
  </li>
  <li>SG16, SG22, and WG14 coordination:
    <ul>
      <li>Aaron introduced:
        <ul>
          <li>SG22 will have its first meeting soon.</li>
          <li>Any WG21 papers targeted to SG22 will be forwarded to WG14 and
              SG22 will evaluate whether the paper is appropriate for both
              groups.  Authors will then submit papers to WG14.</li>
          <li>SG22 will assist with finding people to present at WG14 meetings
              on behalf of WG21 authors that are unable to attend a WG14
              meeting.</li>
          <li>WG14 does not currently have a study group that focuses on text
              issues.</li>
          <li>WG14 participants tend to be passionate about character sets.</li>
        </ul>
      </li>
    </ul>
  </li>
  <li>Review of
      <a href="https://github.com/sg16-unicode/sg16/issues?q=is%3Aissue+is%3Aopen+label%3Awg14">SG16's open WG14 issues</a>:
    <ul>
      <li><a href="https://github.com/sg16-unicode/sg16/issues/62">Issue #62: WG14 N2594: Disallow mixed string literal concatenation</a>:
        <ul>
          <li>JeanHeyd reported that
              <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2594.htm">WG14 N2594</a>
              was adopted for C2x and that this issue can be closed.</li>
          <li>JeanHeyd noted that there was a request for a follow-up paper to
              allow mixed literals with defined semantics.</li>
          <li>Aaron added that the
              <a href="http://sdcc.sourceforge.net">sdcc</a>
              compiler implements mixed string literals with useful
              semantics.</li>
        </ul>
      </li>
      <li><a href="https://github.com/sg16-unicode/sg16/issues/56">Issue #56: WG14: Improve support for Unicode characters in identifiers</a>:
        <ul>
          <li>Steve volunteered to look at this.</li>
          <li>Tom commented that WG14 generally likes to see two implementations
              before adopting a feature and that a change to the C++ standard
              generally counts as one.</li>
          <li>Aaron confirmed and noted that additional implementations increase
              the motivation for standardizing existing behavior.</li>
          <li>Corentin asked if compatibility with C++ is considered good
              motivation.</li>
          <li>Aaron replied that it is generally seen as weak motivation.</li>
        </ul>
      </li>
      <li><a href="https://github.com/sg16-unicode/sg16/issues/55">Issue #55: WG14: Named universal character escapes</a>:
        <ul>
          <li>Tom stated that it is still on his plate to get a revision of
              <a href="https://wg21.link/p2071">P2071</a>
              submitted to WG21.</li>
          <li>PBrett suggested that SG22 be included on the next revision of
              the paper.</li>
          <li>Tom agreed.</li>
        </ul>
      </li>
      <li><a href="https://github.com/sg16-unicode/sg16/issues/54">Issue #54: WG14: Make char16_t/char32_t string literals be UTF-16/32</a>:
        <ul>
          <li>Tom noted that this is standardizing existing practice.</li>
          <li>JeanHeyd volunteered to own this.</li>
          <li>PBrett volunteered as backup if JeanHeyd becomes too busy with
              his other efforts.</li>
        </ul>
      </li>
      <li><a href="https://github.com/sg16-unicode/sg16/issues/5">Issue #5: WG14 N2231: char8_t: A type for UTF-8 characters and strings</a>:
        <ul>
          <li>Tom stated that he is actively working on this.</li>
        </ul>
      </li>
    </ul>
  <li><a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2620.htm">WG14 N2620: Restartable and Non-Restartable Functions for Efficient Character Conversions | r4</a>:
    <ul>
      <li>Tom apologized to JeanHeyd, but stated that discussion of this paper
          would have to wait until the next telecon.</li>
    </ul>
  </li>
  <li>Tom stated that the next telecon will be held January 27th.  The agenda
      will include a presentation and discussion of Jonathan Müller's Lexy
      parser combinator library and, of course, JeanHeyd's
      <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2620.htm">WG14 N2620</a>
      postponed from today's meeting.</li>
</ul>
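<p>
Regarding Issue #54 above, the "existing practice" Tom referred to can be
seen on the C++ side, where <tt>char16_t</tt> and <tt>char32_t</tt> string
literals are required to be UTF-16 and UTF-32 encoded. As we understand C11
and C17, C instead ties these literals to <tt>mbrtoc16</tt> and
<tt>mbrtoc32</tt>, and only guarantees UTF-16/UTF-32 when the implementation
defines <tt>__STDC_UTF_16__</tt> and <tt>__STDC_UTF_32__</tt>. The following
C++ sketch checks the encoding of a supplementary-plane character (U+1F600);
the variable names are illustrative only.
</p>

```cpp
// In C++, u"" literals are UTF-16 and U"" literals are UTF-32, so these
// checks hold on every conforming implementation.
constexpr char16_t utf16_grin[] = u"\U0001F600";
constexpr char32_t utf32_grin[] = U"\U0001F600";

// UTF-16: one surrogate pair plus the terminating null.
static_assert(sizeof(utf16_grin) / sizeof(utf16_grin[0]) == 3,
              "U+1F600 needs two UTF-16 code units");
static_assert(utf16_grin[0] == 0xD83D && utf16_grin[1] == 0xDE00,
              "high and low surrogates for U+1F600");

// UTF-32: a single code unit plus the terminating null.
static_assert(sizeof(utf32_grin) / sizeof(utf32_grin[0]) == 2,
              "U+1F600 needs one UTF-32 code unit");
static_assert(utf32_grin[0] == 0x1F600, "UTF-32 code unit equals the code point");
```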


<h1 id="2021_01_27">January 27th, 2021</h1>

<h2>Draft agenda:</h2>

<ul>
  <li>Presentation and discussion with Jonathan Müller regarding his
      <a href="https://github.com/foonathan/lexy">lexy parser combinator library</a>
      (<a href="https://lexy.foonathan.net/tutorial">Tutorial</a>,
      <a href="https://lexy.foonathan.net/reference">Reference</a>),
      and the text and Unicode related challenges he faced, how he solved them,
      and what C++ standard language or library features he would have
      benefitted from.</li>
  <li><a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2620.htm">WG14 N2620: Restartable and Non-Restartable Functions for Efficient Character Conversions | r4</a></li>
</ul>

<h2>Attendees:</h2>

<ul>
  <li>Corentin Jabot</li>
  <li>Daniela Engert</li>
  <li>Hubert Tong</li>
  <li>JeanHeyd Meneide</li>
  <li>Jens Maurer</li>
  <li>Jonathan Müller</li>
  <li>Mark Zeren</li>
  <li>Peter Brett</li>
  <li>Steve Downey</li>
  <li>Tomasz Kamiński</li>
  <li>Tom Honermann</li>
  <li>Victor Zverovich</li>
  <li>Zach Laine</li>
</ul>

<h2>Meeting summary:</h2>

<ul>
  <li>Presentation on lexy’s text and Unicode challenges by
      Jonathan Müller:
    <ul>
      <li>Jonathan's presentation slides are available
          <a href="https://github.com/sg16-unicode/sg16-meetings/blob/master/presentations/2021-01-27-lexy-presentation.pdf">here</a>.</li>
      <li>Part 1: Inputs and Encodings:
        <ul>
          <li>Jonathan presented:
            <ul>
              <li>Several classes are provided to handle input via ranges,
                  strings, and buffers, each of which provides a reader
                  class.</li>
              <li>The reader class provides access to the input data
                  independent of the specific input class.</li>
              <li>Encoding classes provided for each of the supported encodings
                  define traits and operations for an encoding.</li>
              <li>The encoding classes are superficially similar to
                  <tt>std::char_traits</tt>; they define a character type, an
                  integer type to be used as an EOF sentinel, conversion
                  operations from the character type to the integer type,
                  secondary types that may be used with the encoding, and
                  encode operations.</li>
              <li>The supported encodings are:
                <ul>
                  <li>Default: uses <tt>char</tt> as the character type,
                      <tt>int</tt> as the integer type, and <tt>-1</tt> as the
                      EOF value.</li>
                  <li>Raw: uses <tt>unsigned char</tt> as the character type,
                      <tt>int</tt> as the integer type, and <tt>-1</tt> as the
                      EOF value.  <tt>char</tt> and <tt>std::byte</tt> can be
                      used as secondary character types.</li>
                  <li>ASCII: uses <tt>char</tt> as the character type,
                      <tt>char</tt> as the integer type, and <tt>0xff</tt> as
                      the EOF value.</li>
                  <li>UTF-8: uses <tt>char8_t</tt> as the character type,
                      <tt>char8_t</tt> as the integer type, and <tt>0xff</tt>
                      as the EOF value.  <tt>char</tt> can be used as a
                      secondary character type.</li>
                  <li>UTF-16: uses <tt>char16_t</tt> as the character type,
                      <tt>std::int_least32_t</tt> as the integer type, and
                      <tt>-1</tt> as the EOF value.  <tt>wchar_t</tt> can be
                      used as a secondary character type if it has the same
                      size as <tt>char16_t</tt>.</li>
                  <li>UTF-32: uses <tt>char32_t</tt> as the character type,
                      <tt>char32_t</tt> as the integer type, and
                      <tt>0xffffffff</tt> as the EOF value.  <tt>wchar_t</tt>
                      can be used as a secondary character type if it has the
                      same size as <tt>char32_t</tt>.</li>
                </ul>
              </li>
              <li>Wish list:
                <ul>
                  <li>The ability to determine the encoding of ordinary string
                      literals.</li>
                  <li>The ability to convert between the original character
                      types (<tt>char</tt>, <tt>wchar_t</tt>) and the newer
                      types (<tt>char8_t</tt>, <tt>char16_t</tt>,
                      <tt>char32_t</tt>) without the spectre of undefined
                      behavior.</li>
                  <li>A library function to heuristically determine or guess
                      the encoding for some input.</li>
                </ul>
              </li>
            </ul>
          </li>
          <li><em>[ Editor's note: Right at the start of Jonathan's
              presentation, one of the editor's sons showed up bleeding from
              having fallen while climbing over a fence.  Recovery was quick,
              and the editor was appreciative of the excellent slides that
              enabled him to fill in what he missed of the presentation.
              ]</em></li>
          <li>Hubert noted that use of <tt>0xFF</tt> as the EOF value for the
              ASCII encoding could be problematic since <tt>char</tt> may be a
              signed type, but is probably ok if it is just an internal
              implementation detail.</li>
          <li>PBrett observed that, for encodings with a larger integer type,
              buffer space may not be used efficiently.</li>
          <li>Jonathan replied that buffers always store values in the character
              type, not the integer type.</li>
          <li>Victor asked how ill-formed input containing a code unit whose
              value matches the EOF value is handled.</li>
          <li>Jonathan replied that processing of such input will end
              prematurely.</li>
          <li>PBrett stated that it seems reasonable for the library to require
              well-formed input.</li>
          <li>PBrett asked if a library solution that provides a "blessed" cast
              operation for handling input in the secondary character types
              would suffice.</li>
          <li>Jonathan replied that it would.</li>
          <li>Corentin mentioned having discussed such a possibility with
              Richard Smith in the past, particularly with regard to the
              possibility of <tt>constexpr</tt> support.</li>
          <li>Jonathan confirmed that <tt>constexpr</tt> support would be
              useful.</li>
          <li>Steve stated that there is a need for that feature for historical
              interfaces that have <tt>char*</tt> parameters.</li>
          <li>Jonathan acknowledged the need and explained that these interfaces
              accept either the primary or secondary type and use
              <tt>reinterpret_cast</tt> internally.</li>
          <li>Hubert asked if these conversions are needed only in one direction
              or in both.</li>
          <li>Jonathan replied that they are one-directional; the user will not
              modify the text after handing it off and the library does not hold
              on to input beyond a call.</li>
          <li>Jonathan added that, if a buffer is provided, the data is
              copied.</li>
          <li>Jens explained that the <tt>reinterpret_cast</tt> results in
              undefined behavior because the compiler can assume that objects
              of most types are not aliased.</li>
          <li>Corentin asked if a bless function could work at
              <tt>constexpr</tt> time.</li>
          <li>Hubert replied that this is different from the <tt>std::bless</tt>
              that has been discussed in the past; to bless means to start the
              lifetime of an object.</li>
          <li>Jens stated that there are two approaches we can consider:
            <ul>
              <li>Changing the alias rules and thereby prohibiting related
                  optimizations.</li>
              <li>Extending the memory model in some way to enable such
                  magic.</li>
            </ul>
          </li>
          <li>Mark noted that the anti-aliasing features of <tt>char8_t</tt>
              were a motivator for adopting the type.</li>
          <li>Jens added that this is a research opportunity.</li>
          <li>Jonathan observed that undefined behavior seems to be ok for
              users for now.</li>
          <li>Zach stated that he has also written a parser combinator library,
              but did so in a way that did not require use of
              <tt>reinterpret_cast</tt>.</li>
          <li>Corentin noted from chat that
              <a href="https://wg21.link/p1885">P1885</a>
              exposes the encoding of ordinary string literals.</li>
        </ul>
      </li>
      <li>Part 2: Unicode-Aware Rules:
        <ul>
          <li>Jonathan presented:
            <ul>
              <li>Parsing requires converting character input to integer input
                  to check for EOF.</li>
              <li>Comparing input to literal values also requires
                  conversions.</li>
              <li>The allowed conversions depend on the encoding in use.</li>
              <li>Transcoding support is limited to conversion from ASCII to
                  other encodings and assumes that the code point value in the
                  target encoding matches the ASCII character code and is
                  representable with a single code unit.</li>
              <li>Matching of literal values is restricted to ASCII unless the
                  literal has a <tt>char8_t</tt>, <tt>char16_t</tt>, or
                  <tt>char32_t</tt> based type or if an explicit encoding is
                  specified.</li>
              <li>A rule is provided for matching a Unicode BOM.</li>
              <li>A rule is provided for reading a code point for encodings
                  other than default and raw.</li>
              <li>Input is assumed to be well formed; no validation is
                  performed.</li>
              <li>Wish list:
                <ul>
                  <li>Unicode character classification functions.</li>
                  <li>A <tt>std::code_point</tt> type.</li>
                  <li>Support for validating encoded input.</li>
                </ul>
              <li>Features not actually needed for this part:
                <ul>
                  <li>Decoding facilities.</li>
                  <li>Transcoding facilities.</li>
                </ul>
              </li>
            </ul>
          </li>
          <li><em>[ Editor's note: The editor's internet connection decided to
              take some time off just as Jonathan started presenting part 2.  It
              came back fairly quickly, but a few minutes of presentation were
              lost.  Gaps were again filled in thanks to the excellent slides.
              ]</em></li>
          <li>Hubert asked what features would be desirable in a
              <tt>std::code_point</tt> type.</li>
          <li>Jonathan presented lexy's <tt>code_point</tt> type and its
              <tt>is_valid()</tt>, <tt>is_surrogate()</tt>,
              <tt>is_scalar()</tt>, <tt>is_ascii()</tt>, and <tt>is_bmp()</tt>
              members.</li>
          <li>PBrett observed that the desire is for something like Unicode
              properties.</li>
          <li>Jens noted the much simpler requirements.</li>
          <li>PBrett asked if higher level features are provided that could
              handle Unicode normalization.</li>
          <li>Jonathan replied that there are not currently, but that adding
              such support seems feasible.</li>
          <li>Zach noted that there are normalization equivalences, but also
              equivalence for collation purposes and that a lot of effort is
              required to get that working.</li>
          <li>Corentin noted from chat that
              <a href="https://wg21.link/p1628">P1628</a>
              provides support for Unicode character classification and that
              an implementation is available at
              <a href="https://github.com/cor3ntin/ext-unicode-db/blob/master/generated_includes/cedilla/properties.hpp">https://github.com/cor3ntin/ext-unicode-db/blob/master/generated_includes/cedilla/properties.hpp</a>.</li>
        </ul>
      <li>Part 3: File Input:
        <ul>
          <li>Jonathan presented:
            <ul>
              <li>Buffers have an associated encoding.</li>
              <li>Endian concerns, including handling of BOMs, are addressed
                  when filling a buffer.</li>
              <li>A function is provided for populating a buffer from a
                  file.</li>
              <li>File reading is done in binary mode and without encoding
                  validation.</li>
              <li>Wish list:
                <ul>
                  <li><tt>std::read_file()</tt>.</li>
                  <li>Endian conversion facilities.</li>
                  <li>Facilities to heuristically identify the encoding of
                      arbitrary input.</li>
                </ul>
              </li>
            </ul>
          </li>
          <li>Tom noticed that, if a BOM is not present, big endian is assumed
              and asked if the default shouldn't be the native endian order
              for data in memory.</li>
          <li>PBrett agreed for data in memory, but noted that, for data at
              rest, big endian is the default.</li>
          <li>Jonathan explained that BOM assumptions are only used for files
              at present, so assuming big endian seems correct.</li>
          <li>PBrett agreed that a <tt>std::read_file()</tt> function would
              be useful.</li>
          <li>Jens stated in chat that support for endian conversions is in
              progress;
              <a href="https://wg21.link/p1272">P1272R3</a>.</li>
        </ul>
      </li>
      <li>Part 4: The <tt>as_string</tt> callback:
        <ul>
          <li>Jonathan presented:
            <ul>
              <li>When a production rule is matched, the parsed lexeme is
                  passed to a functor that can receive it as a string-like
                  type, a C string, or as a lexeme with associated reader
                  class.</li>
              <li>A lexeme is a range over the input.</li>
              <li>Wish list:
                <ul>
                  <li>Encoding facilities.</li>
                  <li>Transcoding facilities.</li>
                </ul>
            </ul>
        </ul>
      </li>
      <li>PBrett noted that Zach's
          <a href="https://github.com/tzlaine/text">Boost.text</a>
          library does some of this.</li>
      <li>Zach agreed and noted support for transcoding.</li>
      <li>PBrett asked if support for grapheme rules had been considered.</li>
      <li>Jonathan replied that he has considered such rules, but that such
          support requires the Unicode DB.</li>
      <li>Mark asked whether, if grapheme interfaces were available in the
          standard, they would be used.</li>
      <li>Jonathan replied yes, and stated that he intends to implement more
          Unicode rules, but that doing so takes work.</li>
      <li>Tom stated that one of his big takeaways from the presentation was
          how encodings were handled; support was limited to ASCII unless
          there was evidence in the type system for a more capable
          encoding.</li>
      <li>PBrett stated that one of his big takeaways is that a number of the
          items in the wish lists are already in the pipeline.</li>
      <li>Jens agreed and noted that these are some of the easier items to
          address.</li>
      <li>Jens noted that there would be clear benefit from access to Unicode
          DB character properties.</li>
      <li>PBrett asked if Jonathan would make the slides available.</li>
      <li>Jonathan agreed to do so.</li>
      <li><em>[ Editor's note: Jonathan did so and they are linked above.
          ]</em></li>
      <li>Corentin stated that the ability to provide conversions between
          <tt>char</tt> and <tt>char8_t</tt> is something that we might be able
          to do for C++23.</li>
      <li>PBrett suggested that one way to do that might be to take a span and
          return another span.</li>
      <li>Steve agreed that the ability to perform such conversions would be
          useful, but wondered if it can be done without losing aliasing
          benefits.</li>
      <li>Jens suggested postponing how to alias <tt>char</tt> and
          <tt>char8_t</tt> until a proposal is provided.</li>
      <li>Zach commented that he and Jonathan made many similar choices in
          their implementations, but noted one point of difference; Zach tried
          to deduce the encoding all the time.</li>
      <li>JeanHeyd observed that, with Corentin's
          <a href="https://wg21.link/p1885">P1885</a>, the literal encoding
          will be known.</li>
      <li>JeanHeyd added that gcc trunk already has support for the predefined
          macro and that Clang and MSVC are making progress.</li>
    </ul>
  </li>
  <li><a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2620.htm">WG14 N2620: Restartable and Non-Restartable Functions for Efficient Character Conversions | r4</a>:
    <ul>
      <li>JeanHeyd presented:
        <ul>
          <li>The paper was presented to WG14 in November.</li>
          <li>WG14 requested that additional conversion functions be provided
              to convert between UTF encodings.</li>
          <li>The paper proposes conversion functions from the locale sensitive
              MB/wide encodings to UTF encodings.</li>
          <li>Converters are provided that perform per-code-unit translation
              and per-string translation.</li>
          <li>These functions avoid the historical issues with variable length
              encodings.</li>
          <li>MAX macros are provided to avoid having to check for and handle
              insufficient output errors.</li>
          <li>These functions differ from <tt>iconv</tt> because C doesn't have
              a locale or encoding tag suitable for specifying an encoding.</li>
        </ul>
      </li>
      <li>PBrett asked to verify that sizes are always expressed in code units
          in the paper.</li>
      <li>JeanHeyd replied that they are.</li>
      <li>PBrett observed that the historical designs returned a positive value
          to indicate the number of code units that were written.</li>
      <li>JeanHeyd acknowledged and explained that a different design is
          proposed here because of ambiguities that arise with a return value
          of 0.</li>
      <li>Steve asked how the number of code units that were consumed and
          written are returned.</li>
      <li>JeanHeyd replied that the provided pointers are updated, so the
          caller can calculate the counts.</li>
      <li>Jens observed that the proposed interface always requires two
          modifications when some input is consumed and output is produced; an
          alternative would be a range where only the input is bumped.</li>
      <li>JeanHeyd replied that convenience functions that provide that simpler
          interface could potentially be provided.</li>
      <li>Jens asked for more explanation for the use of a pointer/size pair as
          opposed to a pointer/pointer pair.</li>
      <li>JeanHeyd replied that null can be passed for the size.</li>
      <li>Jens expressed concerns about security issues.</li>
      <li>Tom stated that a null could be passed for an end pointer to
          accomplish the same goals, with the same security concerns.</li>
      <li>PBrett suggested postponing further discussion until the next
          meeting.</li>
    </ul>
  </li>
  <li>Tom stated that the next meeting will be February 10th and that
      <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2620.htm">WG14 N2620</a>
      will be top of the list.</li>
  <li>Tom thanked Jens for his recent draft paper proposing changes to the UCN
      model and stated that he would put that paper on the agenda soon.</li>
  <li>Tom added that he is thinking about dedicating an upcoming telecon to
      discussion of what we want to try to complete for C++23.</li>
</ul>
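<p>
The EOF sentinel concerns raised during the lexy discussion (Hubert's note
about <tt>0xFF</tt>, Victor's question about ill-formed input, and Jonathan's
answer that processing of such input ends prematurely) can be illustrated
with a minimal in-band sentinel reader. This is a sketch under stated
assumptions, not lexy's implementation: <tt>narrow_sentinel_encoding</tt>,
<tt>wide_sentinel_encoding</tt>, <tt>reader</tt>, and
<tt>count_until_eof</tt> are hypothetical names, and <tt>unsigned char</tt>
stands in for <tt>char8_t</tt> so that the example builds as C++17.
</p>

```cpp
#include <cstddef>

// Hypothetical encoding traits loosely modeled on the description in the
// presentation; these are NOT lexy's actual class names or interfaces.

// UTF-8-like traits: the integer type is no wider than the character type,
// so the EOF sentinel 0xFF is also a value that a code unit can hold.
struct narrow_sentinel_encoding {
    using char_type = unsigned char;  // stand-in for char8_t
    using int_type = unsigned char;
    static constexpr int_type eof() { return 0xFF; }
    static constexpr int_type to_int_type(char_type c) { return c; }
};

// "Default"-like traits: a wider integer type with -1 as EOF. Every char
// value is mapped into 0..255, so no code unit can collide with -1.
struct wide_sentinel_encoding {
    using char_type = char;
    using int_type = int;
    static constexpr int_type eof() { return -1; }
    static constexpr int_type to_int_type(char_type c) {
        return static_cast<unsigned char>(c);
    }
};

// Minimal reader: peek() reports end of input in-band via the sentinel.
template <typename Encoding>
class reader {
public:
    using char_type = typename Encoding::char_type;
    using int_type = typename Encoding::int_type;

    reader(const char_type* first, const char_type* last)
        : cur_(first), end_(last) {}

    int_type peek() const {
        return cur_ == end_ ? Encoding::eof() : Encoding::to_int_type(*cur_);
    }
    void bump() { ++cur_; }

private:
    const char_type* cur_;
    const char_type* end_;
};

// Counts the code units consumed before the reader reports EOF.
template <typename Encoding>
std::size_t count_until_eof(const typename Encoding::char_type* first,
                            const typename Encoding::char_type* last) {
    reader<Encoding> r(first, last);
    std::size_t n = 0;
    while (r.peek() != Encoding::eof()) {
        r.bump();
        ++n;
    }
    return n;
}
```

<p>
With the narrow traits, a stray <tt>0xFF</tt> byte (never valid in UTF-8) is
indistinguishable from the end of input, so counting stops after the first
code unit of <tt>{'a', 0xFF, 'c'}</tt>; with the wider <tt>int</tt> sentinel,
the same bytes are all counted. This is why requiring well-formed input, as
PBrett suggested, is sufficient for the narrow design.
</p>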
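<p>
The N2620 calling convention discussed above (input and output described by
pointer/size pairs, sizes expressed in code units, and both pairs updated in
place so that the caller can compute how much was consumed and written) can
be sketched as follows. This is a hypothetical illustration, not the API
proposed in N2620: <tt>convert_utf32_to_utf8</tt> and <tt>conv_status</tt>
are invented for this example, and a UTF-32 to UTF-8 conversion stands in for
the locale-sensitive conversions the paper actually proposes.
</p>

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical status codes for this sketch only; N2620 defines its own
// error type and values, which are not reproduced here.
enum class conv_status { ok, invalid_input, insufficient_output };

// On return, *ibegin/*isize describe the input not yet consumed and
// *obegin/*osize describe the output space not yet used.
conv_status convert_utf32_to_utf8(const char32_t** ibegin, std::size_t* isize,
                                  unsigned char** obegin, std::size_t* osize) {
    while (*isize > 0) {
        const std::uint32_t cp = (*ibegin)[0];
        // Reject surrogate code points and values beyond U+10FFFF.
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return conv_status::invalid_input;

        unsigned char buf[4];
        std::size_t n;
        if (cp < 0x80) {
            buf[0] = static_cast<unsigned char>(cp);
            n = 1;
        } else if (cp < 0x800) {
            buf[0] = static_cast<unsigned char>(0xC0 | (cp >> 6));
            buf[1] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
            n = 2;
        } else if (cp < 0x10000) {
            buf[0] = static_cast<unsigned char>(0xE0 | (cp >> 12));
            buf[1] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
            n = 3;
        } else {
            buf[0] = static_cast<unsigned char>(0xF0 | (cp >> 18));
            buf[1] = static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F));
            buf[2] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
            buf[3] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
            n = 4;
        }

        // Leave all four values untouched if this code point does not fit.
        if (*osize < n)
            return conv_status::insufficient_output;

        for (std::size_t i = 0; i < n; ++i)
            (*obegin)[i] = buf[i];

        // Every successful step updates both the input and output pairs.
        *obegin += n;
        *osize -= n;
        *ibegin += 1;
        *isize -= 1;
    }
    return conv_status::ok;
}
```

<p>
A caller recovers the number of code units consumed as the original input
size minus the updated <tt>*isize</tt> (and likewise for output). On error,
the input pointer is left at the offending code point, so the caller can
decide how to proceed, consistent with leaving error handling policy to the
programmer.
</p>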


<h1 id="2021_02_10">February 10th, 2021</h1>

<h2>Draft agenda:</h2>

<ul>
  <li><a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2620.htm">WG14 N2620: Restartable and Non-Restartable Functions for Efficient Character Conversions | r4</a>:
    <ul>
      <li>Continue discussion started at the last telecon and in
          <a href="https://lists.isocpp.org/sg16/2021/01/2049.php">recent email discussion</a>.</li>
    </ul>
  </li>
  <li><a href="https://wg21.link/p2093r3">P2093R3: Formatted output</a>:
    <ul>
      <li>Review Victor's updates since our
          <a href="https://github.com/sg16-unicode/sg16-meetings/blob/master/README-2020.md#december-9th-2020">review of P2093R2 on 2020-12-09</a>.
    </ul>
  </li>
</ul>

<h2>Attendees:</h2>

<ul>
  <li>Corentin Jabot</li>
  <li>Hubert Tong</li>
  <li>JeanHeyd Meneide</li>
  <li>Jens Maurer</li>
  <li>Mark Zeren</li>
  <li>Peter Brett</li>
  <li>Steve Downey</li>
  <li>Tom Honermann</li>
  <li>Victor Zverovich</li>
  <li>Zach Laine</li>
</ul>

<h2>Meeting summary:</h2>

<ul>
  <li><a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2620.htm">WG14 N2620: Restartable and Non-Restartable Functions for Efficient Character Conversions | r4</a>:
    <ul>
      <li>JeanHeyd stated that he has not yet completed the benchmarks
          requested in the email discussion.</li>
      <li>JeanHeyd shared code and demonstrated three possible signatures
          for the conversion functions:
        <ul>
          <li>1) The form proposed in the paper.
            <ul>
             <li><tt>mcerr_t(const charX** ibegin, size_t* isize, charY** obegin, size_t* osize)</tt></li>
             <li>Con: Each call requires writes to <tt>*ibegin</tt>,
                 <tt>*isize</tt>, <tt>*obegin</tt>, and <tt>*osize</tt>.</li>
            </ul>
          </li>
          <li>2) Iterator pairs passed by reference.
            <ul>
             <li><tt>mcerr_t(const charX** ibegin, const charX** iend, charY** obegin, charY** oend)</tt></li>
             <li>Pro: Each call only requires writes to <tt>*ibegin</tt> and
                 <tt>*obegin</tt>.</li>
            </ul>
          </li>
          <li>3) Iterator pairs, begin passed by reference, end passed by value.
            <ul>
             <li><tt>mcerr_t(const charX** ibegin, const charX* iend, charY** obegin, charY* oend)</tt></li>
             <li>Pro: Each call only requires writes to <tt>*ibegin</tt> and
                 <tt>*obegin</tt>.</li>
             <li>Con: No support for unbounded reads and writes.</li>
            </ul>
          </li>
        </ul>
      </li>
      <li>Zach asked if, for #3, both <tt>ibegin</tt> and <tt>iend</tt> are
          required to be non-null.</li>
      <li>JeanHeyd replied that they are.</li>
      <li>Zach stated that, if null is passed for <tt>oend</tt>, that should
          allow unbounded writes, and if null is passed for <tt>obegin</tt>,
          that should allow counting code units.</li>
      <li>JeanHeyd acknowledged and added that passing null for both would
          support counting and validation.</li>
      <li>Tom noted the assumption that the count is returned via the return
          value in that case.</li>
      <li>JeanHeyd agreed and noted that would require special error return
          values like <tt>(size_t)-3</tt>.</li>
      <li>PBrett asked if named constants would be provided for such error
          values.</li>
      <li>JeanHeyd replied yes and that they already exist.</li>
      <li>Jens requested that the paper clearly illustrate the three modes of
          operation (counting, validation, and conversion) and provide examples
          of each.</li>
      <li>Jens noted that <tt>iconv</tt> does not support unbounded
          buffers.</li>
      <li>Jens added that support for unbounded buffers would be novel and
          carries significant risk.</li>
      <li>Jens observed that the arguments in the signatures presented differ
          from what is proposed in the paper.</li>
      <li>JeanHeyd acknowledged and noted there is a lack of consistency in
          existing interfaces.</li>
      <li>Jens suggested letting WG14 choose the argument order.</li>
      <li>Tom noted that returning <tt>size_t</tt> with special error values
          makes for verbose call sites since there is no simple way to test
          for an error.</li>
      <li>JeanHeyd agreed and noted that the proposed signature was designed
          to avoid that issue.</li>
      <li>JeanHeyd added that a caller could check to see if all input was
          consumed to determine if an error occurred.</li>
      <li>Tom noted that these signatures don't include an <tt>mbstate_t</tt>
          parameter and therefore cannot support processing text one byte at
          a time as might be required to advance through buffer boundaries;
          for the <tt>mbstate_t</tt> taking variants, all bytes could be
          consumed without ensuring that an error condition is not present.</li>
      <li>Tom suggested that optional parameters could be used to return
          counts of processed code units.</li>
      <li>Corentin expressed a preference for the design in the paper for
          usability reasons and noted a dislike for using a return value to
          indicate both errors and counts.</li>
      <li>Corentin stated that performance concerns should be subject to
          benchmarks indicating a problem, and that he is reluctant to
          compromise usability for small gains.</li>
      <li>JeanHeyd noted that the design in the paper avoids using the return
          value for multiple purposes.</li>
      <li>Zach wondered if the challenge with this interface is that it tries
          to do too many things, violating the single purpose guideline;
          perhaps it would be worth having separate interfaces for validation
          and counting.</li>
      <li>Zach added that examples of real world code would help to guide the
          design.</li>
      <li>Zach asked if the interface offered a mechanism to request
          replacement characters for ill-formed input.</li>
      <li>JeanHeyd replied that there are multiple possibilities for handling
          errors, but that WG14 felt the best approach is to leave error
          handling policy to the programmer.</li>
      <li>JeanHeyd added that these interfaces are intended to be basic low
          level functionality and that more complex functionality can be added
          at a higher level.</li>
      <li>Zach acknowledged that simplified wrappers could provide higher
          level error handling.</li>
      <li>Jens asserted that interface choices should be informed by measured
          performance behavior and acknowledged that a complicated return
          value is undesirable.</li>
      <li>Jens added that many C interfaces feel awkward from a C++ perspective
          due to the use of pointers and the need for manual error checking,
          and suggested that the focus be on getting the interface working and
          functional; ergonomics can be secondary.</li>
      <li>Jens requested that the paper reflect opinions regarding separate
          validation and counting functions.</li>
      <li>JeanHeyd stated that he would update the paper and bring back a
          revision for further review.</li>
      <li>Corentin asked JeanHeyd what additional feedback he would like, what
          feedback WG14 might appreciate, and whether we should expect WG14 to
          consider our feedback.</li>
      <li>JeanHeyd replied that WG14 will take WG21 feedback seriously.</li>
      <li>JeanHeyd stated that the feedback provided so far has been useful;
          examples of usage will be a good addition to the paper.</li>
      <li>JeanHeyd added that he'll take any questions or advice that he can't
          answer himself to WG14.</li>
      <li>JeanHeyd noted that WG14 is open to adding potentially many functions,
          so separate interfaces could be a possibility.</li>
      <li>Corentin expressed confidence that whatever interface is decided on
          will likely be reasonably usable and useful for C++, but that a
          dependency on WG14 is not required since WG21 can specify its own
          interfaces.</li>
      <li>JeanHeyd agreed and noted that part of his intent in working with
          WG14 is to ensure that WG21 implementors have the tools they need
          to implement future WG21 libraries.</li>
      <li>Tom asked if the intent is that these interfaces be efficiently
          implementable using existing interfaces like <tt>iconv</tt> and
          <tt>MultiByteToWideChar</tt>.</li>
      <li>JeanHeyd replied, yes.</li>
      <li>Tom noted that optimizing these interfaces to work with little or no
          overhead over those interfaces should be a goal.</li>
      <li>JeanHeyd agreed.</li>
      <li>Tom asked how to seek to the next valid lead code unit when an error
          is encountered.</li>
      <li>JeanHeyd replied that the simplest solution is to increment the input
          by one byte and to try again.</li>
      <li>Tom observed that doing so would result in many error actions taken
          for a contiguous sequence of ill-formed code units, e.g., issuing
          many replacement characters; that conforms with Unicode, but Unicode
          guidance is to substitute a single replacement character for such
          sequences.</li>
      <li>JeanHeyd replied that the caller could track consecutive errors and
          wait to apply an error policy until the ill-formed sequence has been
          fully consumed.</li>
      <li>Tom agreed that would work and noted that we don't need to optimize
          for the error case.</li>
    </ul>
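<p>The recovery strategy discussed above can be sketched as follows. This is a minimal C++ illustration under stated assumptions, not the interface proposed in the paper: <tt>decode_one</tt> and <tt>decode_with_replacement</tt> are hypothetical names, and the input is assumed to be UTF-8. When a sequence fails to decode, the caller advances by one byte, as JeanHeyd suggested. It emits a single U+FFFD only after the run of ill-formed bytes has been fully consumed.</p>

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical single-step decoder: decodes one UTF-8 sequence at p. On success
// it stores the scalar value in out and returns the number of bytes consumed;
// on an ill-formed or truncated sequence it returns 0.
std::size_t decode_one(const unsigned char* p, const unsigned char* end,
                       char32_t& out) {
    if (p == end) return 0;
    const unsigned char lead = p[0];
    if (lead < 0x80) { out = lead; return 1; }
    std::size_t len;
    char32_t cp, min_value;
    if ((lead & 0xE0) == 0xC0)      { len = 2; cp = lead & 0x1F; min_value = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min_value = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min_value = 0x10000; }
    else return 0;  // stray continuation byte or invalid lead byte
    if (static_cast<std::size_t>(end - p) < len) return 0;  // truncated sequence
    for (std::size_t i = 1; i < len; ++i) {
        if ((p[i] & 0xC0) != 0x80) return 0;  // expected a continuation byte
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    // Reject overlong forms, surrogate code points, and values above U+10FFFF.
    if (cp < min_value || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    out = cp;
    return len;
}

// Decodes UTF-8 to UTF-32. After each failure it skips one byte and retries;
// a contiguous run of failures produces exactly one U+FFFD.
std::u32string decode_with_replacement(std::string_view in) {
    const auto* p = reinterpret_cast<const unsigned char*>(in.data());
    const auto* end = p + in.size();
    std::u32string out;
    bool in_error_run = false;
    while (p != end) {
        char32_t cp;
        const std::size_t n = decode_one(p, end, cp);
        if (n == 0) {        // ill-formed: advance one byte and try again
            in_error_run = true;
            ++p;
            continue;
        }
        if (in_error_run) {  // the ill-formed run just ended; apply the policy once
            out.push_back(U'\uFFFD');
            in_error_run = false;
        }
        out.push_back(cp);
        p += n;
    }
    if (in_error_run) out.push_back(U'\uFFFD');  // input ended inside a run
    return out;
}
```

<p>For example, the bytes <tt>61 80 80 80 62</tt> decode to <tt>U"a\uFFFDb"</tt>. Collapsing an entire run is coarser than the "maximal subpart" substitution practice described in chapter 3 of the Unicode standard, which would produce one U+FFFD per lone continuation byte here. Either policy can be layered on the same single-step decoder.</p>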
  </li>
  <li><a href="https://wg21.link/p2093r3">P2093R3: Formatted output</a>:
    <ul>
      <li>Victor presented:
        <ul>
          <li>Victor's presentation slides are available
              <a href="https://github.com/sg16-unicode/sg16-meetings/blob/master/presentations/2021-02-10-p2093r3-presentation.key">here</a>.</li>
          <li>Feedback provided during the last SG16 review was incorporated:
            <ul>
              <li>Additional motivation for <tt>stdout</tt> as the default
                  output stream was added.</li>
              <li>Appendix A was added with a comparison of behavior in other
                  languages.</li>
              <li>Other fixes and minor changes.</li>
            </ul>
          </li>
          <li>Six languages have been evaluated on Linux, macOS, and Windows.
            <ul>
              <li>The interesting case is Windows due to its use of a console
                  encoding distinct from the system encoding.</li>
              <li>Go, JavaScript, Python, and Rust were all able to produce
                  correctly rendered output to the Windows console; C and Java
                  did not.</li>
              <li>When output was redirected to a file, Java and Python both
                  tried to transcode to Windows-1251.
                <ul>
                  <li>Java substituted replacement characters for
                      unrepresentable characters.</li>
                  <li>Python threw an exception when attempting to convert
                      unrepresentable characters.</li>
                </ul>
              </li>
            </ul>
          </li>
          <li>The paper proposes following the behavior exhibited by C, Go,
              JavaScript, and Rust.</li>
          <li>The only difference compared to <tt>printf()</tt> is special
              handling when writing to the console on Windows.</li>
        </ul>
      </li>
      <li>PBrett noted that, for Java, JavaScript, Python, and Rust, the
          internal encoding is always Unicode, but that is not the case for
          C and C++; this may indicate different behavior is warranted.</li>
      <li>Victor agreed that there is a difference, but the proposal is to
          only special case writing to the Windows console when the execution
          encoding is UTF-8.</li>
      <li>Corentin commented that, when writing to the console, the output is
          necessarily text, but when writing to a file, the output could be
          binary.</li>
      <li>Jens noted how JeanHeyd's transcoding facilities could be used here;
          we can transcode so long as a target encoding is known.</li>
      <li>Jens reiterated his desire for lower level functionality to be
          exposed such that general programmers would be able to implement
          similar functionality.  For example:
        <ul>
          <li>A facility to determine if output will be directed to a console;
              Tom showed how that can be done on several platforms and it is
              common for such customization to be done on POSIX systems for
              paging and color highlighting purposes.</li>
          <li>Facilities to enable transcoding.
              <a href="https://wg21.link/p1885">P1885</a>
              enables identifying the execution character set; the remaining
              missing functionality is how to write directly to the console
              on Windows.</li>
        </ul>
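<p>One common way to provide the first facility is sketched below. It assumes POSIX <tt>isatty</tt> and, on Windows, <tt>_get_osfhandle</tt> combined with <tt>GetConsoleMode</tt>. It is not necessarily what Tom demonstrated, and <tt>is_terminal</tt> is a hypothetical name.</p>

```cpp
#include <cstdio>
#ifdef _WIN32
#include <io.h>       // _get_osfhandle, _fileno
#include <windows.h>  // GetConsoleMode, HANDLE, DWORD
#else
#include <unistd.h>   // isatty (POSIX)
#endif

// Hypothetical helper: returns true if the stream refers to an interactive
// terminal (POSIX) or console (Windows), and false for files and pipes.
bool is_terminal(std::FILE* f) {
#ifdef _WIN32
    // Only console handles support GetConsoleMode; it fails for files and pipes.
    HANDLE h = reinterpret_cast<HANDLE>(_get_osfhandle(_fileno(f)));
    DWORD mode;
    return h != INVALID_HANDLE_VALUE && GetConsoleMode(h, &mode) != 0;
#else
    return isatty(fileno(f)) != 0;
#endif
}
```

<p>An output facility could use such a test to decide whether to bypass the byte stream and write UTF-16 text with <tt>WriteConsoleW</tt> on Windows. That step is not shown.</p>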
      </li>
      <li>PBrett noted that we've discussed the desire to expose the low level
          functionality before.</li>
      <li>Jens indicated that he is strongly opposed to forwarding this paper
          on without those independent lower level facilities.</li>
      <li>Victor responded that he is willing to propose those lower level
          interfaces, but would prefer to focus on the text issues within SG16;
          the other facilities can be LEWG concerns.</li>
      <li>Tom stated that functionality to write directly to a console is
          arguably an SG16 concern, though also arguably an SG13 concern.</li>
      <li>Jens expressed concern about writing to the console being library
          black magic; he would like to be able to perform such writes without
          having to use formatting facilities, if desired.</li>
      <li>Steve suggested that use of the Windows console is becoming less
          common; software developers are probably its biggest users.</li>
      <li>Steve asserted that the "don't pay for what you don't use" principle
          applies; transcoding overhead may be undesirable, especially if it
          corrupts output.</li>
      <li>Victor agreed and stated that he doesn't want to break anything; he
          just wants to do what <tt>printf()</tt> does, with the one exception
          of writing directly to the Windows console in order to avoid the
          mojibake that Windows otherwise produces.</li>
      <li>Steve suggested that direct writing could be considered QoI; the
          standard should not be specifically concerned with the Windows
          console.</li>
      <li>Steve rephrased; we want to give Windows implementors permission,
          not a mandate.</li>
      <li>Tom raised the question from the agenda email asking about future
          direction to integrate support for other character types.</li>
      <li>Victor replied that it is straightforward for <tt>char16_t</tt> and
          <tt>char32_t</tt> since they can simply be transcoded to UTF-8, but
          that support for <tt>wchar_t</tt> is more complicated due to the
          need to actually transcode.</li>
      <li>Tom replied that it isn't straightforward to determine what the
          behavior should be when the execution character set is not UTF-8;
          consider EBCDIC.</li>
      <li>PBrett opined that we don't have to solve this issue now.</li>
      <li>Tom agreed, but expressed concern that design decisions made now
          might result in missed opportunities to do better than
          <tt>printf()</tt> in the future.</li>
      <li>Zach commented that he does not want this paper delayed while
          awaiting proposals for the lower level interfaces encouraged by
          Jens.</li>
      <li>Corentin stated that LEWG is busy and urgency is required if the
          proposal is to make C++23.</li>
      <li>Corentin added that more features can be added later and that many
          desired features are not Unicode concerns.</li>
      <li>Hubert stated that a simple interface is useful and that the
          special cases needed here are not so dissimilar to those required
          for <tt>std::filesystem</tt>.</li>
      <li>Hubert observed that the proposal is both overspecified and
          underspecified; it is overspecified for Windows, but underspecified
          otherwise.</li>
      <li>Hubert noted that the literal encoding and the locale encoding are
          distinct, and although
          <a href="https://wg21.link/p1885">P1885</a>
          would allow differentiating them, we still have not standardized
          (in C or C++) facilities to transcode from the literal encoding.</li>
      <li><b>Poll: Forward P2093R3 to LEWG.</b>
        <ul>
          <li><b>Attendance: 9 (Hubert was present for discussion, but was not
              able to be present for the poll)</b></li>
          <li>
            <table>
              <tr>
                <th style="text-align:right">SF</th>
                <th style="text-align:right">F</th>
                <th style="text-align:right">N</th>
                <th style="text-align:right">A</th>
                <th style="text-align:right">SA</th>
              </tr>
              <tr>
                <th style="text-align:right">4</th>
                <th style="text-align:right">2</th>
                <th style="text-align:right">2</th>
                <th style="text-align:right">0</th>
                <th style="text-align:right">1</th>
              </tr>
            </table>
          </li>
          <li><b>Consensus is in favor.</b></li>
          <li>SA stated opposition to progression without the lower level
              facilities also being made available.</li>
          <li>Victor asked if it would still be helpful to propose those
              facilities separately.</li>
          <li>SA responded, yes.</li>
        </ul>
      </li>
    </ul>
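<p>Victor's point that <tt>char32_t</tt> text (and, after combining surrogate pairs, <tt>char16_t</tt> text) can simply be transcoded to UTF-8 before output can be illustrated with a minimal sketch of the <tt>char32_t</tt> case. <tt>utf32_to_utf8</tt> is a hypothetical helper, and replacing values that are not UCS scalar values with U+FFFD is an assumed error policy.</p>

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: encodes UTF-32 as UTF-8. Surrogate code points and
// values above U+10FFFF are not UCS scalar values; they become U+FFFD.
std::string utf32_to_utf8(std::u32string_view in) {
    std::string out;
    for (char32_t c : in) {
        if ((c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF) c = 0xFFFD;
        if (c < 0x80) {             // 1 byte: 0xxxxxxx
            out += static_cast<char>(c);
        } else if (c < 0x800) {     // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {   // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {                    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

<p>For example, <tt>U"\U0001F600"</tt> encodes to the bytes <tt>F0 9F 98 80</tt>. The <tt>wchar_t</tt> case Victor described as more complicated is not covered, since its encoding is implementation-defined.</p>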
  </li>
  <li>Tom stated that the next telecon will be on February 24th and that the
      tentative topics are Jens'
      <a href="https://wg21.link/p2314">P2314</a>
      and discussion of priorities for C++23.</li>
</ul>


<h1 id="2021_02_24">February 24th, 2021</h1>

<h2>Draft agenda:</h2>

<ul>
  <li><a href="https://wg21.link/p2314r0">P2314R0: Character sets and encodings</a></li>
  <li><a href="https://wg21.link/p2297r0">P2297R0: Wording improvements for encodings and character sets</a></li>
</ul>

<h2>Attendees:</h2>

<ul>
  <li>Corentin Jabot</li>
  <li>Hubert Tong</li>
  <li>Jens Maurer</li>
  <li>Mark Zeren</li>
  <li>Peter Brett</li>
  <li>Steve Downey</li>
  <li>Tom Honermann</li>
  <li>Victor Zverovich</li>
</ul>

<h2>Meeting summary:</h2>

<ul>
  <li>Tom provided an introduction.
    <ul>
      <li><a href="https://wg21.link/p2314r0">P2314R0</a>
          and
          <a href="https://wg21.link/p2297r0">P2297R0</a>
          overlap and compete in several ways.</li>
      <li>The authors have been communicating with each other offline to
          ensure that their respective positions and the differences between
          the papers are understood.</li>
      <li>Since the scope of
          <a href="https://wg21.link/p2314r0">P2314R0</a>
          is smaller than
          <a href="https://wg21.link/p2297r0">P2297R0</a>,
          Jens will present first, with discussion limited to clarifying
          questions; Corentin will then present, after which the floor will
          be opened to general discussion.</li>
    </ul>
  </li>
  <li><a href="https://wg21.link/p2314r0">P2314R0: Character sets and encodings</a>:
    <ul>
      <li>Jens presented.
        <ul>
          <li>C++ support for source code with characters outside the basic
              source character set uses model A from the "UCN models"
              subsection of section 5.2.1 in the
              <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/C99RationaleV5.10.pdf">C99 rationale document</a>.</li>
          <li>In model A, characters not in the basic source character set are
              implicitly converted to <i>universal-character-names</i>
              (UCNs).</li>
          <li>The (revised) paper proposes switching to a modified model C in
              which characters are converted to an internal encoding that is
              able to represent all supported characters as well as UCNs.</li>
          <li>The model A approach is not reflective of how compilers work in
              practice.</li>
          <li>ISO requires that, when an ISO standard exists and suffices
              for a particular purpose, it be used.  As a result, the
              proposed wording references ISO 10646 rather than the Unicode
              standard.</li>
          <li>The terminology used in ISO 10646 differs from the terminology
              in the Unicode standard.</li>
          <li>The C++ standard does not contain a normative reference to the
              Unicode standard today.</li>
          <li>The standard conflates character sets and character encodings
              and this makes defining multibyte encodings problematic.</li>
          <li>The wording is intended to separate the concepts of character
              sets and character encodings.</li>
          <li>Character sets are not mentioned except where normatively needed;
              their existence can be otherwise inferred from encodings.</li>
          <li>The wording introduces a new <i>literal encoding</i> term.</li>
          <li>The literal encoding may not be compatible with the locale
              dependent execution encoding; this is consistent with the
              status quo.</li>
          <li>An earlier revision of the paper removed
              <i>execution character set</i> in proposed wording, but the term
              was revived for C compatibility.</li>
          <li>Not all string literals denote the same thing; some denote an
              object, but others are used only at translation time, for
              example, header names, <tt>extern "C"</tt>, and <tt>_Pragma</tt>
              directives.</li>
          <li>Terminology changes:
            <ul>
              <li><i>Basic source character set</i> is renamed to
                  <i>basic character set</i>.</li>
              <li><i>Basic literal character set</i> describes additional
                  characters that are required to be representable in all
                  literal encodings.</li>
              <li><i>Ordinary literal encoding</i> specifies the encoding used
                  to encode ordinary string literal objects.</li>
              <li><i>Wide literal encoding</i> specifies the encoding used to
                  encode wide string literal objects.</li>
              <li><i>Translation character set</i> specifies the set of
                  abstract characters corresponding to all possible characters
                  encountered during translation; these map to UCS scalar
                  values.</li>
              <li><i>Extended character set</i> is removed.</li>
            </ul>
          </li>
          <li>The wording avoids the notion of a character being dependent
              on what Unicode version is used.</li>
          <li>There are no intended behavioral changes other than one case
              involving stringizing extended characters; in that case, the
              standard currently specifies that a UCN is stringized, but
              that doesn't match existing behavior.</li>
          <li>During translation phase 1, source characters are mapped to
              the translation character set.</li>
          <li>In translation phase 3, UCNs outside of a header name or
              literal are converted to the translation character set.</li>
          <li>There are a few questions regarding header names.
            <ul>
              <li>Implementations appear to replace UCNs in header names.</li>
              <li>Extended characters don't work portably in header names.</li>
            </ul>
          </li>
          <li>Translation phases 5 and 6 could be collapsed, but are preserved
              to avoid renumbering translation phase 7.</li>
          <li>The <i>translation character set</i> members include both named
              characters and abstract characters for unassigned UCS scalar
              values.</li>
          <li>The <i>basic character set</i> is defined in terms of UCS scalar
              values and names.</li>
          <li>The <i>basic literal character set</i> consists of the
              <i>basic character set</i> plus characters for alert, backspace,
              carriage return, and null.</li>
          <li>Questions remain regarding the relationship between the literal
              encodings and the locale dependent execution encodings.</li>
          <li>The C standard wording appears to require that characters used
              in a character or string literal must correspond to their
              encoding in the locale dependent execution encoding.</li>
        </ul>
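<p>The two sets described above can be expressed as predicates over UCS scalar values. The sketch below assumes the 96-member basic character set of the working draft at the time: five whitespace characters, 52 letters, 10 digits, and 29 punctuation characters. The function names are hypothetical.</p>

```cpp
#include <string_view>

// Hypothetical predicate: membership in the basic character set, tested on
// UCS scalar values rather than on any particular encoding.
constexpr bool in_basic_character_set(char32_t c) {
    // U+0009 TAB, U+000A LF, U+000B VT, U+000C FF, and U+0020 SPACE.
    if (c == 0x09 || c == 0x0A || c == 0x0B || c == 0x0C || c == 0x20) return true;
    if ((c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z') ||
        (c >= U'0' && c <= U'9')) return true;
    // The 29 punctuation characters; note that $, @, and ` are not members.
    constexpr std::u32string_view punct = U"_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";
    return punct.find(c) != std::u32string_view::npos;
}

// Hypothetical predicate: the basic literal character set adds U+0000 NULL,
// U+0007 ALERT, U+0008 BACKSPACE, and U+000D CARRIAGE RETURN.
constexpr bool in_basic_literal_character_set(char32_t c) {
    return in_basic_character_set(c) ||
           c == 0x00 || c == 0x07 || c == 0x08 || c == 0x0D;
}
```

<p>Enumerating U+0000 through U+007F yields 96 members of the basic character set and 100 members of the basic literal character set.</p>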
      </li>
      <li>PBrett asked if the standard would be more clear if it referred to
          the Unicode standard instead of ISO 10646.</li>
      <li>Hubert replied that the editorial standards that ISO imposes are
          salient for use in C++ and that the Unicode approach to terminology
          doesn't meet those standards.</li>
      <li>Jens added that when we reviewed Unicode terminology a while back,
          we didn't find the Unicode terms very palatable.</li>
      <li>Jens noted that references to the Unicode standards will be required
          at some point to satisfy dependencies that are not present in
          ISO standards.</li>
      <li>PBrett observed that we don't have a guarantee that requirements
          don't change with new Unicode standards because of our planned
          dependence on
          <a href="https://unicode.org/reports/tr31">UAX #31</a>.</li>
      <li>Jens agreed.</li>
      <li>PBrett commented that the noted behavioral change regarding
          stringizing a UCN appears to be a bug fix and standardizes existing
          behavior.</li>
      <li>Hubert expressed concern that the new wording for
          <i>basic literal character set</i> may strengthen requirements to
          prohibit the same code unit value from meaning different things when
          it appears as a lead byte vs a trail byte.</li>
      <li>PBrett stated that the core language is not dependent on locale, so
          discussion of execution character sets should be moved to library
          wording.</li>
    </ul>
  </li>
  <li><a href="https://wg21.link/p2297r0">P2297R0: Wording improvements for encodings and character sets</a>:
    <ul>
      <li>Corentin presented:
        <ul>
          <li>Corentin's presentation slides are available
              <a href="https://docs.google.com/presentation/d/1PlCX8-0DbBXIr4OJU2jlec0Bwmd662udPP1tuENuTlI">here</a>.</li>
          <li>Jens' paper is good, but there are some areas of
              disagreement.</li>
          <li>We agree on removing implicit production of UCNs in
              translation phase 1.</li>
          <li>We should strive to use terminology that is more compatible
              with Unicode.</li>
          <li>It isn't clear that we all have the same mental model of
              translation.</li>
          <li>There is disagreement with regard to Jens'
              <i>translation character set</i> and that translation is not
              expressed using Unicode terminology.</li>
          <li>Translation can be defined in terms of UCS scalar values;
              characters are not necessary.</li>
          <li>An abstract translation character set solves a problem that
              doesn't exist if the right terminology is used.</li>
          <li>The output of translation phase 1 should be a sequence of
              UCS scalar values.</li>
          <li>UCS scalar value may not be a great term; we can use an alias,
              but it should not denote a character or abstract character.</li>
          <li>We agree on renaming <i>basic source character set</i> to
              <i>basic character set</i>.</li>
          <li>We disagree on the need for <i>basic literal character set</i>;
              we should pursue other means of handling the special
              requirements for alert, backspace, carriage return, and null.</li>
          <li>The relationship between the literal encodings and locale
              dependent execution encodings has interesting ramifications:
            <ul>
              <li><tt>isalpha('a') // Can this ever return false?</tt></li>
              <li><tt>isalpha('é') // Can this ever return false?</tt></li>
            </ul>
          </li>
          <li>It is ok to retain the status quo, but execution encoding
              concerns should be moved to library wording.</li>
          <li>Raw string literal delimiters are restricted to the basic
              character set; perhaps that should be changed.</li>
        </ul>
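<p>The <tt>isalpha('a')</tt> question arises because the value of <tt>'a'</tt> is fixed at translation time by the ordinary literal encoding, while <tt>std::isalpha</tt> classifies its argument at run time according to the <tt>LC_CTYPE</tt> category of the current locale. The sketch below makes that split visible; <tt>literal_a_is_alpha</tt> is a hypothetical helper. The <tt>'é'</tt> case is omitted because, with a UTF-8 literal encoding, <tt>'é'</tt> does not fit in a single <tt>char</tt>.</p>

```cpp
#include <cctype>
#include <clocale>
#include <optional>
#include <string>

// Hypothetical helper: classifies the ordinary character literal 'a' under the
// named LC_CTYPE locale. Returns std::nullopt if that locale is unavailable.
std::optional<bool> literal_a_is_alpha(const char* locale_name) {
    // Save the current LC_CTYPE setting; copy it because the returned string
    // may be overwritten by later setlocale calls.
    const std::string previous = std::setlocale(LC_CTYPE, nullptr);
    if (std::setlocale(LC_CTYPE, locale_name) == nullptr) return std::nullopt;
    // 'a' was encoded at compile time; isalpha interprets it using the locale
    // selected above. The cast avoids undefined behavior for negative chars.
    const bool result = std::isalpha(static_cast<unsigned char>('a')) != 0;
    std::setlocale(LC_CTYPE, previous.c_str());  // restore the previous setting
    return result;
}
```

<p>With an ASCII-compatible literal encoding, <tt>literal_a_is_alpha("C")</tt> returns <tt>true</tt>. Nothing in the core language ties the two encodings together, which is why the question arises at all.</p>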
      </li>
      <li>Hubert stated that EBCDIC code pages can't meet a requirement that
          all members of the basic character set are encoded the same in all
          supported locale encodings.</li>
      <li>Jens noted the implication that we cannot rely on a correspondence
          between the literal encoding and the execution encoding even for
          members of the basic character set.</li>
      <li>Hubert stated that relaxing the restrictions on raw string literal
          delimiters would complicate tokenization.</li>
      <li>Jens expressed a desire to poll a preference for abstract characters
          vs UCS scalar values.</li>
      <li>Mark stated that header names effectively need to specify a sequence
          of bytes and it isn't clear that these should be called
          characters.</li>
      <li>Steve noted that there is plenty of implementation-defined behavior
          involved in processing header names.</li>
      <li>Jens stated that, with the current phrasing, a lone surrogate code
          point cannot be represented in a program since UCNs that designate
          surrogates are rejected; an implementation could allow that as an
          extension, however.</li>
      <li>Jens opined that we should allow extended characters in header
          names.</li>
      <li>Corentin noted that the status quo is maintained by both papers.</li>
    </ul>
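<p>The UCN restriction Jens mentioned can be illustrated together with the model A conversion discussed earlier. In model A, an extended character is replaced by its <i>universal-character-name</i>. The sketch below spells a UCS scalar value as a UCN and rejects surrogate code points, which is the restriction Jens referred to. It also rejects values above U+10FFFF, which are not UCS scalar values. Other restrictions on UCNs are not modeled, and the function names are hypothetical.</p>

```cpp
#include <cstdio>
#include <optional>
#include <string>

// A UCS scalar value is any code point in [U+0000, U+10FFFF] other than the
// surrogate code points U+D800..U+DFFF.
constexpr bool is_ucs_scalar_value(char32_t c) {
    return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
}

// Hypothetical helper: spells c as a UCN, using \uXXXX when the value fits in
// four hex digits and \UXXXXXXXX otherwise. Surrogates and out-of-range values
// have no valid spelling, so std::nullopt is returned for them.
std::optional<std::string> to_ucn(char32_t c) {
    if (!is_ucs_scalar_value(c)) return std::nullopt;
    char buf[11];  // "\U" + 8 hex digits + terminating null
    if (c <= 0xFFFF)
        std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(c));
    else
        std::snprintf(buf, sizeof buf, "\\U%08X", static_cast<unsigned>(c));
    return std::string(buf);
}
```

<p>For example, <tt>to_ucn(0xE9)</tt> yields <tt>\u00E9</tt>, <tt>to_ucn(0x1F600)</tt> yields <tt>\U0001F600</tt>, and <tt>to_ucn(0xD800)</tt> yields no spelling.</p>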
  </li>
  <li>Tom stated that the next telecon will be on March 10th and will continue
      this discussion.</li>
</ul>


<h1 id="2021_03_10">March 10th, 2021</h1>

<h2>Draft agenda:</h2>

<ul>
  <li>Continue discussion from the last telecon with updated draft paper revisions:
    <ul>
      <li><a href="https://wiki.edg.com/pub/Wg21virtual2021-02/SG16/d2314r1.html">D2314R1: Character sets and encodings</a></li>
      <li><a href="https://isocpp.org/files/papers/D2297R1.pdf">D2297R1: Wording improvements for encodings and character sets</a></li>
    </ul>
  </li>
</ul>

<h2>Attendees:</h2>

<ul>
  <li>Corentin Jabot</li>
  <li>Hubert Tong</li>
  <li>Jens Maurer</li>
  <li>Mark Zeren</li>
  <li>Peter Bindels</li>
  <li>Peter Brett</li>
  <li>Steve Downey</li>
  <li>Tom Honermann</li>
  <li>Zach Laine</li>
</ul>

<h2>Meeting summary:</h2>

<ul>
  <li><a href="https://wiki.edg.com/pub/Wg21virtual2021-02/SG16/d2314r1.html">D2314R1</a> vs
      <a href="https://isocpp.org/files/papers/D2297R1.pdf">D2297R1</a>:
    <ul>
      <li>Peter provided an introduction.</li>
      <li>Corentin initiated discussion:
        <ul>
          <li>The primary disagreement concerns the introduction of
              "translation character set" instead of using Unicode terminology
              directly.</li>
          <li>The proposed wording uses "abstract character" in a way that is
              explicitly prohibited by the Unicode standard.</li>
          <li><em>[ Editor's note: ISO 10646 does not define
              "abstract character"; it is only defined by the
              Unicode standard. ]</em></li>
          <li><em>[ Editor's note: Unicode 13, chapter 3.2,
              "Conformance Requirements", conformance clause C2 states:</em>
              <div style="padding: .5em; background: #E9FBE9">
              <table>
                <tr>
                  <td>C2</td>
                  <td></td>
                  <td>A process shall not interpret a noncharacter code point as an abstract character.</td>
                </tr>
                <tr>
                  <td></td>
                  <td>•</td>
                  <td>The noncharacter code points may be used internally, such as for sentinel values or delimiters, but should not be exchanged publicly.</td>
                </tr>
              </table>
              </div>
              <em>]</em></li>
          <li><em>[ Editor's note: Later revisions of
              <a href="https://wiki.edg.com/pub/Wg21virtual2021-02/SG16/d2314r1.html">D2314R1</a>
              replaced "abstract character" with "character" in the proposed wording. ]</em></li>
          <li>We should prefer well-known terminology and, in this case, not
              diverge from the Unicode standard.</li>
          <li>We should avoid using ambiguous terminology like "character".</li>
          <li>The technically correct term is UCS scalar value.</li>
          <li>The distinction between a character and a scalar value is not
              relevant for a lexer; lexing can be described using either
              form.</li>
          <li>Thinking of lexing in terms of code units or scalar values may
              feel unintuitive, but that reflects how implementations actually
              work.</li>
        </ul>
      </li>
      <li>Hubert stated that his impression of Jens' intent in using character
          terminology is that source code is considered text.</li>
      <li>Hubert continued; we tend to think of text as a sequence of
          characters, so it is not clear if a sequence of UCS scalar values
          constitutes text.</li>
      <li>PBrett noted prior discussions that questioned whether C++ source
          code constitutes text since preservation of unassigned UCS scalar
          values, which do not correspond to characters, is required.</li>
      <li>PBrett stated that the desired model is one that describes
          translation in terms of UCS scalar values.</li>
      <li>Corentin agreed with Peter.</li>
      <li>Steve also agreed with Peter and added that even a pedantic reading
          of lexing in the C++ standard should not depend on Unicode
          version.</li>
      <li>Jens agreed with regard to avoiding dependence on Unicode version
          with the exception of recognizing identifiers since that requires
          <a href="https://unicode.org/reports/tr31">UAX #31</a>; translation
          must not otherwise be affected by Unicode version.</li>
      <li>Jens disagreed that it is otherwise beneficial to expose UCS scalar
          values more widely in the standard.</li>
      <li>Jens acknowledged the perspective that, due to the possibility of
          unassigned scalar values, C++ source may not be text, but opined that
          such cases will largely be introduced by
          <i>universal-character-name</i>s (UCNs).</li>
      <li>Jens added that concerns regarding unassigned scalar values can be
          addressed in a footnote.</li>
      <li>PBrett asked if there is a technical concern that motivates
          introducing "translation character set".</li>
      <li>Jens replied that he did not think there is a technical requirement
          and acknowledged that translation could be described in the standard
          using UCS scalar values.</li>
      <li>PBrett asked if Jens considers <i>universal-character-name</i> an
          unattractive term and, if not, why UCS scalar value should be
          disfavored.</li>
      <li>Jens replied that UCNs are used for a limited purpose and do not
          appear frequently in the standard; use of UCS scalar value for
          translation would require many more uses of that term.</li>
      <li>Jens added that UCS scalar values denote an integer value and that
          is inconsistent with how lexing is described elsewhere.</li>
      <li>Jens noted an example: proliferation of UCS scalar value leads to
          specifying <tt>reinterpret_cast</tt> as a sequence of numeric
          values.</li>
      <li>Jens concluded that this debate concerns a presentation issue in
          the C++ standard.</li>
      <li>Tom noted that it is not possible to distinguish between assigned
          and unassigned UCS scalar values without consulting a specific
          version of the Unicode standard.</li>
      <li>Tom added that there are cases where a single abstract character is
          mapped to more than one UCS scalar value but where the standard must
          define behavior based on the UCS scalar value, not the character.</li>
      <li><em>[ Editor's note: Examples include compatibility characters such as
          the angstrom character (Å, mapped to U+00C5 and U+212B) and the ohm
          character (Ω, mapped to U+03A9 and U+2126). ]</em></li>
      <li>Corentin summarized some of the concerns; Tom and Jens are concerned
          about describing translation in terms of UCS scalar values because
          those denote an integer.</li>
      <li>Zach expressed uncertainty with regard to how use of UCS scalar value
          leads to having to describe lexing in terms of integers;
          implementations always represent characters using numbers.</li>
      <li>Tom replied that source code read from a napkin or a blackboard
          doesn't go through a numeric translation; we think about lexing in
          terms of characters, not numbers.</li>
      <li>Hubert stated that the goal is to require a mapping for translation
          without having to explicitly specify it.</li>
      <li>Hubert added that there is less friction if UCS scalar values are
          used; that means that improving presentation in the standard should
          be the goal.</li>
      <li>Zach asked to clarify how presentation would be confusing.</li>
      <li>Hubert replied that translation is described assuming textual
          representation in the standard.</li>
      <li>Zach asked if that could be addressed in some front matter text.</li>
      <li>Hubert replied that he thinks it could be.</li>
      <li>Corentin asked Jens if Hubert's suggestion would alleviate his
          concerns.</li>
      <li>Hubert stated that, in terms of front matter, context matters; what
          would be needed is to state that the glyphs used in the specification
          are a proxy for UCS scalar values.</li>
      <li>Zach suggested including such a statement with the description of
          the basic character set.</li>
      <li>Jens replied that the basic character set concerns character sets
          rather than tokens and lexing.</li>
      <li>Jens suggested that
          <a href="http://eel.is/c++draft/lex.pptoken#2">[lex.pptoken]p2</a>
          may be a more appropriate location.</li>
      <li>Jens acknowledged that there should be a blanket statement somewhere
          noting how the glyphs used in the standard are to be interpreted.</li>
      <li>Tom returned discussion to Corentin's question regarding whether
          Hubert's suggestion would alleviate his concerns.</li>
      <li>Jens replied that he finds it to be a useful clarification, but that
          he continues to believe that general readers are better served by use
          of an abstraction.</li>
      <li>Jens added that use of UCS scalar value raises the question of
          assigned vs unassigned characters where we don't want to make a
          distinction and just state that some character is denoted here.</li>
      <li>Jens again acknowledged that this is purely a presentation concern,
          not a semantic one.</li>
      <li>Corentin stated that use of UCS scalar value may cause confusion for
          readers not familiar with the term, but noted that is exactly why
          character is not a good term; people have different ideas of what a
          character is, so use of UCS scalar value will force such readers to
          look up the definition in order to understand the intent.</li>
      <li>PBrett noted that he knows of colleagues that think of code point as
          a character.</li>
      <li>Hubert stated that his impression of Jens' perspective is that we
          don't want people paying attention to something that might be
          distracting if we can gloss over it.</li>
      <li>Hubert expressed support for Corentin's perspective; implementors
          can't afford to gloss over such specification and must dive deep to
          determine the intended behavior, so use of UCS scalar values may be
          simpler.</li>
      <li>Jens replied that he defined translation character set as precisely
          as he could.</li>
      <li>Jens provided
          <a href="http://eel.is/c++draft/lex.pptoken#2">[lex.pptoken]p2</a>
          as an example of wording that mentions white-space characters; that
          wording would need to be rewritten in terms of UCS scalar values,
          but no such specification of white-space characters currently
          exists.</li>
      <li>Jens noted that he has been replacing use of colloquial names of
          characters with ISO 10646 U+XXXX short names.</li>
      <li>PBrett suggested this paragraph requires an update regardless
          then.</li>
      <li>Jens agreed and added that a definition of white-space should
          precede it.</li>
      <li>Jens provided
          <a href="http://eel.is/c++draft/lex.pptoken#3">[lex.pptoken]p3</a>,
          as an additional example where there are many uses of
          "character".</li>
      <li>PBrett asked for confirmation that the concern is the many
          pre-existing uses of "character" in these clauses.</li>
      <li>Jens replied that migrating away from "character" in this section
          would damage presentation.</li>
      <li>Jens added that it feels like a category error to say, "if the next
          three UCS scalar values are ..."</li>
      <li>Zach asked if colloquial terms could continue to be used with the
          previously suggested front matter.</li>
      <li>Jens replied that doing so is ok normatively, but still feels like
          a category error.</li>
      <li>Zach noted that in mathematical descriptions, an alternate notation
          is sometimes used because it is thought to be more convenient.</li>
      <li>Zach noted that the proposed wording doesn't replace newline.</li>
      <li>Jens replied that newline is special since it does not necessarily
          denote <tt>\n</tt>, it could be <tt>\r\n</tt>.</li>
      <li>Tom added that it could also represent the end of a record in a z/OS
          data set.</li>
      <li>Corentin suggested that the pre-existing concerns regarding newlines
          not be addressed in this paper.</li>
      <li>Jens agreed and stated that he did not intend to address those.</li>
      <li>Zach expressed a preference for retaining glyphs rather than
          swapping them out for U+XXXX short names and noted that
          "U+005C REVERSE SOLIDUS" is unambiguously less clear than
          "backslash <tt>\</tt>".</li>
      <li>Tom suggested that both could be presented;
          "U+005C REVERSE SOLIDUS (\)".</li>
      <li>Corentin agreed that "REVERSE SOLIDUS" is not a great name, but that
          including the U+XXXX short name removes ambiguity.</li>
      <li>Steve expressed sympathy for Zach's point; people don't read this
          part of the standard because they think they know what it means and
          then end up surprised; you have to know what it means before you can
          understand the wording.</li>
      <li>Steve agreed that, in terms of presentation, and as is seen in other
          standards, the redundancy of glyph, the U+XXXX short name, and the
          full name is helpful; if the glyph is known, the names can be skipped
          and if the glyph is unknown or ambiguous, then the names
          unambiguously identify the character.</li>
      <li>Tom noted that we've discussed this question for an hour and
          suggested we poll it and then proceed to another topic.</li>
      <li>Mark asked Jens if his concerns have been addressed.</li>
      <li>Jens replied that he is still concerned about presentation.</li>
      <li>Tom wondered if we might get inspiration from other language
          standards that describe lexing in Unicode terms.</li>
      <li>PBrett responded that they all describe it in terms of characters,
          but that they are able to define character in a reasonable way.</li>
      <li>Zach suggested adding a definition of "character".</li>
      <li>Jens indicated no intention of renaming "character literal" and noted
          that, outside of lexing, translation is only concerned with
          tokens.</li>
      <li>Zach asked what the ramifications would be if core language used
          "character" differently than library.</li>
      <li>Jens opined that defining "character" and then not using it
          consistently would be worse than the status quo.</li>
      <li>Hubert stated that defining "character" to suit this specific purpose
          is not a good idea and that it would be a drafting violation to use
          an inconsistent definition.</li>
      <li>Steve observed that use of "character" in the library is just as
          bad as in the core language.</li>
      <li><b>Poll: Introduce the concept of a 'translation character set' which
          synthesizes characters for unassigned UCS scalar values.</b>
        <ul>
          <li><b>Attendance: 9</b></li>
          <li>
            <table>
              <tr>
                <th style="text-align:right">SF</th>
                <th style="text-align:right">F</th>
                <th style="text-align:right">N</th>
                <th style="text-align:right">A</th>
                <th style="text-align:right">SA</th>
              </tr>
              <tr>
                <th style="text-align:right">1</th>
                <th style="text-align:right">4</th>
                <th style="text-align:right">0</th>
                <th style="text-align:right">2</th>
                <th style="text-align:right">2</th>
              </tr>
            </table>
          </li>
          <li><b>No consensus.</b></li>
        </ul>
      </li>
      <li>Corentin asked for clarification regarding the desire for
          "translation character set"; specifically, whether the term is
          considered useful in its own right or is preferred because
          "UCS scalar value" is an unattractive term.</li>
      <li>Mark replied that "translation character set" allows existing
          implementations to remain conforming.</li>
      <li>Hubert responded that both approaches have no impact on conformance
          and that this concerns a presentation issue.</li>
      <li>Corentin agreed with Hubert.</li>
      <li>Tom attempted to clarify Mark's point, that the abstraction
          acknowledges the possibility of multiple implementation
          techniques.</li>
      <li>Tom expressed appreciation for that level of abstraction.</li>
      <li>Hubert stated that whether implementations operate on scalar values
          or an "in-band transport" is observable; the C99 rationale was
          flawed.</li>
      <li>Hubert added that it took us a long time to recognize the
          C99 model A limitations.</li>
      <li>Tom asked what might lead people to change their vote.</li>
      <li>PBrett replied that the sticking point for him is the materialization
          of characters.</li>
      <li>Jens asked if that concern is more about the name
          "translation character set", or the use of "character" in the
          definition.</li>
      <li>Jens asked if substituting "element" or another abstract term for
          "character" would help.</li>
      <li>PBrett asked for an example of another character set that doesn't
          contain characters.</li>
      <li>Jens replied that most character sets contain something like bell
          that isn't really a character.</li>
      <li>Corentin opined that "element" would be an improvement, but that,
          since the elements would be equivalent to UCS scalar values, doing
          so makes "translation character set" an unnecessary
          abstraction.</li>
      <li>Tom asked if there is a way to prevent UCS scalar values that
          represent unassigned characters from coming into play.</li>
      <li>Corentin replied that he didn't think that was necessary; the
          mechanism described in Jens' paper is correct.</li>
      <li>Hubert observed that discussion so far has been concentrated on
          objections to the proposed abstraction and asked what might help
          improve the case for UCS scalar values.</li>
      <li>PBrett stated that he would rather have this paper with the
          proposed wording than not have it at all.</li>
      <li>Jens suggested a poll to forward the paper with direction that core
          resolve the wording concerns and noted that, between himself and
          Hubert, they would be able to represent both sides of the issue.</li>
      <li>Tom expressed support for that idea.</li>
      <li>Mark agreed and noted that it may just be necessary for more people
          to weigh in on the matter.</li>
      <li>Corentin stated that it seems somewhat unfair to put that burden on
          core.</li>
      <li>Tom asked if Corentin would be willing to research what additional
          wording changes would be necessary for Jens' paper to switch to use
          of UCS scalar values.</li>
      <li>Corentin agreed to look into it.</li>
      <li><em>[ Editor's note: Corentin posted
          <a href="https://lists.isocpp.org/sg16/2021/03/2182.php">suggested changes to the SG16 mailing list</a>.
          ]</em></li>
    </ul>
  </li>
  <li>Tom announced that the next telecon will be held March 24th.</li>
  <li>Tom noted that, due to daylight saving time changes, the next telecon
      will start an hour later for those in North American time zones.</li>
</ul>


<h1 id="2021_03_24">March 24th, 2021</h1>

<h2>Draft agenda:</h2>

<ul>
  <li>Continue discussion from the last telecon concerning:
    <ul>
      <li><a href="https://wiki.edg.com/pub/Wg21telecons2021/SG16/d2314r2.html">D2314R2: Character sets and encodings</a></li>
      <li><a href="https://isocpp.org/files/papers/D2297R1.pdf">D2297R1: Wording improvements for encodings and character sets</a></li>
    </ul>
  </li>
  <li>Discuss priorities and goals for C++23.</li>
</ul>

<h2>Attendees:</h2>

<ul>
  <li>Corentin Jabot</li>
  <li>Hubert Tong</li>
  <li>JeanHeyd Meneide</li>
  <li>Jens Maurer</li>
  <li>Mark Zeren</li>
  <li>Peter Bindels</li>
  <li>Peter Brett</li>
  <li>Steve Downey</li>
  <li>Tom Honermann</li>
</ul>

<h2>Meeting summary:</h2>

<ul>
  <li><a href="https://wiki.edg.com/pub/Wg21telecons2021/SG16/d2314r2.html">D2314R2</a>
      vs
      <a href="https://isocpp.org/files/papers/D2297R1.pdf">D2297R1</a>:
    <ul>
      <li>PBrett introduced the topic:
        <ul>
          <li>We are continuing discussion from the last telecon, but limiting
              discussion to new information.</li>
          <li>Corentin posted
              <a href="https://lists.isocpp.org/sg16/2021/03/2182.php">an email</a>
              that proposed a wording approach he hoped might bring
              consensus.</li>
        </ul>
      </li>
      <li>Corentin summarized the email.</li>
      <li>Jens noted that the email proposed using "universal character set",
          but that term is not defined in ISO 10646, so a different term or a
          definition would be needed.</li>
      <li>Jens added that a later reply suggested use of
          "Universal Coded Character Set" instead; that term is defined in
          ISO 10646 and appears to be isomorphic to the proposed
          "translation character set".</li>
      <li>PBrett asked if there is consensus for this direction.</li>
      <li>Corentin replied that there is an additional proposed change to
          replace use of "abstract character" with "element" in the definition
          of <i>universal-character-name</i> (UCN).</li>
      <li>Jens replied that he had already discontinued use of
          "abstract character" and that the members of the
          "translation character set" are denoted as "elements".</li>
      <li>Tom asked for confirmation that the universal coded character set
          includes members corresponding to unassigned code points and was
          informed that it does.</li>
      <li>Hubert noted that ISO 10646 uses "UCS" as an abbreviation for
          "Universal Coded Character Set".</li>
      <li>Hubert reported an issue with the ISO 10646 specification of the
          UCS; it doesn't actually state what the UCS elements are.</li>
      <li>Hubert added that the closest it comes to stating what those
          elements are is in "coded character".</li>
      <li><em>[ Editor's note: ISO 10646:2020 defines "coded character" in 3.8:</em>
          <div style="padding: .5em; background: #E9FBE9">
          <b>coded character</b><br/>
          association between a character and a code point
          </div>
          <em>]</em></li>
      <li>Steve stated in chat:
          "code points seem to be elements of the UCS codespace.
          Coded characters would be elements of the UCS, but there aren't
          'characters' that correspond to unassigned codepoints. IIUC."</li>
      <li>Jens shared ISO 10646:2020, chapter 6, "General Structure of the UCS"
          and noted that the description includes a "canonical form" of the UCS
          that <em>uses</em> the UCS codespace.</li>
      <li><em>[ Editor's note: ISO 10646:2020, chapter 6, "General Structure of the UCS" states:</em>
          <div style="padding: .5em; background: #E9FBE9">
          ...<br/>
          The canonical form of this coded character set &mdash; the way in
          which it is to be conceived &mdash; uses the UCS codespace which
          consists of the integers from 0 to 10FFFF.<br/>
          ...<br/>
          </div>
          <em>]</em></li>
      <li>PBrett observed that we don't seem to have consensus that the UCS can
          be used as Corentin proposed.</li>
      <li>Hubert confirmed and stated that, if we went forward with this, CWG
          would likely have to change it; probably to what Jens has
          proposed.</li>
      <li>Jens reiterated that, due to ISO rules, we are required to work with
          ISO 10646.</li>
      <li>Corentin asked if Jens' latest draft still uses "character" in the
          definition of translation character set or if that had been switched
          to "element".</li>
      <li>Hubert confirmed that "character" is still used and shared the
          definition from the latest revision.</li>
      <li><em>[ Editor's note: that definition is:</em>
          <div style="padding: .5em; background: #E9FBE9">
          The translation character set consists of the following elements:
          <ul>
            <li>each character named by ISO/IEC 10646, as identified by its
                unique UCS scalar value, and</li>
            <li>a distinct character for each UCS scalar value where no named
                character is assigned.</li>
          </ul>
          </div>
          <em>]</em></li>
      <li>Jens stated that "character" had to be preserved for compatibility
          with existing wording that uses "character".</li>
      <li>Hubert stated that we can't expect Jens to publish a paper that
          glosses over inconsistencies in the use of "character".</li>
      <li>PBrett and Jens both agreed.</li>
      <li><b>Poll: Introduce the concept of a 'translation character set'
          which synthesizes characters for unassigned UCS scalar values.</b>
        <ul>
          <li><b>Attendance: 9</b></li>
          <li>
            <table>
              <tr>
                <th style="text-align:right">SF</th>
                <th style="text-align:right">F</th>
                <th style="text-align:right">N</th>
                <th style="text-align:right">A</th>
                <th style="text-align:right">SA</th>
              </tr>
              <tr>
                <th style="text-align:right">2</th>
                <th style="text-align:right">4</th>
                <th style="text-align:right">1</th>
                <th style="text-align:right">0</th>
                <th style="text-align:right">1</th>
              </tr>
            </table>
          </li>
          <li><b>Consensus is in favor.</b></li>
          <li>SA: The "translation character set" abstraction is unnecessary
              and the definition uses terminology incorrectly.</li>
        </ul>
      </li>
      <li><b>Poll: Forward D2314R2 as presented on 2021-03-24 to EWG for
          inclusion in C++23.</b>
        <ul>
          <li><b>Attendance: 9</b></li>
          <li>
            <table>
              <tr>
                <th style="text-align:right">SF</th>
                <th style="text-align:right">F</th>
                <th style="text-align:right">N</th>
                <th style="text-align:right">A</th>
                <th style="text-align:right">SA</th>
              </tr>
              <tr>
                <th style="text-align:right">3</th>
                <th style="text-align:right">5</th>
                <th style="text-align:right">0</th>
                <th style="text-align:right">0</th>
                <th style="text-align:right">0</th>
              </tr>
            </table>
          </li>
          <li><b>Consensus is in favor.</b></li>
        </ul>
      </li>
      <li>Jens stated that he will include poll results, note the objection,
          populate the revision section, and then submit the paper for the
          next mailing.</li>
      <li>Corentin asked if a request for escalation to quickly progress
          this paper could be made to the EWG chair since other papers will
          depend on it.</li>
      <li>Tom replied that more motivation is needed to do so since no
          dependent papers have been forwarded from SG16 yet.</li>
    </ul>
  </li>
  <li>Priorities and goals for C++23:
    <ul>
      <li>PBrett introduced the topic:
        <ul>
          <li>A number of items were shared in the
              <a href="https://lists.isocpp.org/sg16/2021/03/2200.php">agenda reminder email</a>
              that are candidates for prioritization for C++23.</li>
          <li>We'll briefly review each; those that are nominated as a target
              for C++23 and for which a champion is identified will be added
              to the prioritization poll.</li>
        </ul>
      </li>
      <li>PBindels asked if we have a discrete vision for C++23.</li>
      <li>Tom replied that that question is part of the motivation for this exercise.</li>
      <li><a href="https://wg21.link/p1628">P1628: Unicode character properties</a>:
        <ul>
          <li>Corentin stated that he does not plan to progress this paper
              for C++23.</li>
          <li>Not nominated.</li>
        </ul>
      </li>
      <li><a href="https://wg21.link/p1629">P1629: Standard Text Encoding</a>:
        <ul>
          <li>JeanHeyd stated that the scope could be reduced for C++23.</li>
          <li>Steve reported a good experience experimenting with the
              reference implementation and nominated it as a strong candidate
              for C++23.</li>
          <li>Steve added that he would want range support to be
              included.</li>
          <li>JeanHeyd replied that range support would probably be pursued
              via a different paper in order to avoid impeding progress on
              the core components.</li>
          <li>Corentin stated that this paper would be ambitious for C++23.</li>
          <li>Nominated; champion is JeanHeyd.</li>
        </ul>
      </li>
      <li><a href="https://wg21.link/p1729">P1729: Text Parsing</a>:
        <ul>
          <li>Tom stated that he does not know Victor's plans for this
              paper.</li>
          <li>Corentin opined that this paper isn't really an SG16 concern
              until encodings are involved.</li>
          <li>PBrett responded that text processing in general is in the
              scope of SG16.</li>
          <li>Not nominated; Victor was not present.</li>
        </ul>
      </li>
      <li><a href="https://wg21.link/p1854">P1854: Source to Execution encoding conversion should not lead to loss of information</a>:
        <ul>
          <li>Corentin reported a request to include the wide literal encoding
              in the paper.</li>
          <li>Corentin added that we voted to make this contingent on
              <a href="https://wg21.link/p1855">P1855</a>,
              but that paper is still making its way through LEWG.</li>
          <li>Tom stated that we don't have to wait on
              <a href="https://wg21.link/p1855">P1855</a>
              to be accepted to make progress on this.</li>
          <li>Nominated; champion is Corentin.</li>
        </ul>
      </li>
      <li><a href="https://wg21.link/p1859">P1859: Standard terminology for execution character set encodings</a>:
        <ul>
          <li>Steve stated that this paper has been mostly subsumed by other
              work.</li>
          <li>PBrett asked if we should continue tracking this paper.</li>
          <li>Steve replied that we should; at least until things in flight
              land.</li>
          <li>Jens asked that the paper be reviewed to determine when/if it
              can be closed.</li>
          <li>Not nominated.</li>
        </ul>
      </li>
      <li><a href="https://wg21.link/p1953">P1953: Unicode Identifiers And Reflection</a>:
        <ul>
          <li>Corentin stated that priority of this paper depends on what
              SG7 does for C++23.</li>
          <li>Not nominated.</li>
        </ul>
      </li>
      <li><a href="https://wg21.link/p2071">P2071: Named universal character escapes</a>:
        <ul>
          <li>Jens noted that a paper revision is needed.</li>
          <li>Tom acknowledged and reported plans to complete a revision
              soon.</li>
          <li>Not nominated since this paper has already been forwarded out
              of SG16.</li>
        </ul>
      </li>
      <li><a href="https://wg21.link/p2295">P2295: Correct UTF-8 handling during phase 1 of translation</a>:
        <ul>
          <li>Corentin nominated.</li>
          <li>Nominated; champion is Corentin.</li>
        </ul>
      </li>
      <li><a href="https://github.com/sg16-unicode/sg16/issues/9">Requiring wchar_t to represent all members of the execution wide character set does not match existing practice</a>:
        <ul>
          <li>Hubert noted that this will require corresponding changes
              for WG14.</li>
          <li>JeanHeyd stated that other work he is doing in WG14 will
              help with this.</li>
          <li>Nominated; champions are Tom and Corentin.</li>
        </ul>
      </li>
      <li><a href="https://github.com/sg16-unicode/sg16/issues/38">std::to_chars/std::from_chars overloads for char8_t</a>:
        <ul>
          <li>Tom reported that this issue was raised from someone outside
              the committee.</li>
          <li>Corentin stated that the concept of a number is complicated in
              Unicode.</li>
          <li>PBindels noted that there are many different numbering
              systems.</li>
          <li>PBrett suggested that the scope of <tt>from_chars()</tt> should
              be restricted to only parsing text that could be produced by
              <tt>to_chars()</tt>.</li>
          <li>Jens stated that he had not considered other numbering systems,
              but had considered exposing the same functionality as for
              <tt>char</tt> for <tt>char8_t</tt> and UTF-8; this would suffice
              to provide portable ASCII support.</li>
          <li>Nominated; champion is Peter Bindels.</li>
        </ul>
      </li>
      <li><a href="https://github.com/sg16-unicode/sg16/issues/53">Publish an SG16 library design guidelines paper</a>:
        <ul>
          <li>PBrett suggested removing this from the candidate list since it
              doesn't target the standard.</li>
          <li>Not nominated.</li>
        </ul>
      </li>
      <li><a href="https://github.com/sg16-unicode/sg16/issues/57">Deprecate std::regex</a>:
        <ul>
          <li>PBrett noted that some NBs may object to such deprecation.</li>
          <li>Nominated; champions are Peter Bindels and Peter Brett.</li>
        </ul>
      </li>
      <li><a href="https://github.com/sg16-unicode/sg16/issues/65">Make wide multicharacter character literals ill-formed</a>:
        <ul>
          <li>Nominated; champion is Peter Brett.</li>
        </ul>
      </li>
      <li><a href="https://github.com/sg16-unicode/sg16/issues/66">Improve portable ingestion of command-line arguments</a>:
        <ul>
          <li>JeanHeyd recalled discussion of
              <a href="https://wg21.link/p1275">P1275</a>
              in San Diego.</li>
          <li>Tom asked if anyone knew if Isabella is planning a revision.</li>
          <li>JeanHeyd replied that further work is semi-dependent on his own
              transcoding work.</li>
          <li>Tom noted that an alternate design sketch is present in the
              issue comments.</li>
          <li>PBindels suggested a C++26 target due to lack of underlying
              existing functionality.</li>
          <li>Not nominated.</li>
        </ul>
      </li>
      <li><a href="https://github.com/sg16-unicode/sg16/issues/67">Alias barriers; a replacement for the ICU hack</a>:
        <ul>
          <li>Tom volunteered and summarized recent off-list discussion;
              <a href="https://wg21.link/p0593">P0593</a> describes a
              <tt>start_lifetime_as()</tt> function that looks suitable for
              this, but it wasn't formally proposed.</li>
          <li>Nominated; champions are Tom and Mark.</li>
        </ul>
      </li>
      <li><a href="https://github.com/sg16-unicode/sg16/issues/68">Support for UTF encodings in std::format() and std::print()</a>:
        <ul>
          <li>PBrett proposed only addressing the easy cases; the ones that
              don't require implicit transcoding.</li>
          <li>Corentin agreed and noted that Victor recently indicated
              intention to write such a paper.</li>
          <li>Nominated; champions are Victor and Peter Brett.</li>
        </ul>
      </li>
      <li><a href="https://github.com/sg16-unicode/sg16/issues/69">Specify what constitutes white-space characters</a>:
        <ul>
          <li>Tom stated that
              <a href="https://wg21.link/p2295">P2295</a>
              intends to address this.</li>
          <li>Corentin reported that he removed that section of the paper in
              a yet-to-be published revision to ensure it wouldn't delay
              progress on the core portion of the paper.</li>
          <li>PBindels offered a comparison with
              <a href="https://wg21.link/p1949">P1949</a>
              and noted this shouldn't be a difficult paper, though it may
              not be very important.</li>
          <li>Hubert stated that there may be consensus issues with this since
              deferring to a Unicode definition of white-space character would
              introduce a lexing dependency on Unicode version.</li>
          <li>Steve opined that this isn't terribly important until we have
              portable Unicode encoded source files.</li>
          <li>Steve stated that it might help to clean up the specification
              though;
              <a href="https://wg21.link/p1949">P1949</a>
              may have helped by stabilizing identifiers.</li>
          <li>Corentin offered to send a message to the SG16 mailing list
              arguing for why this shouldn't be a priority.</li>
          <li><em>[ Editor's note: Corentin did follow up with
              <a href="https://lists.isocpp.org/sg16/2021/03/2207.php">an email</a>.
              ]</em></li>
          <li>Not nominated.</li>
        </ul>
      </li>
      <li><a href="https://github.com/sg16-unicode/sg16/issues/70">Specify what constitutes a new-line</a>:
        <ul>
          <li>Not nominated for the same reasons as the previous item.</li>
        </ul>
      </li>
      <li><a href="https://github.com/sg16-unicode/sg16/issues/71">A portable mechanism to specify source file encoding</a>:
        <ul>
          <li>Tom expressed intent to work on this.</li>
          <li>Jens asked that Tom prioritize
              <a href="https://wg21.link/p2071">P2071</a>
              first.</li>
          <li>Tom agreed to do so.</li>
          <li>Corentin opined that this need not be a priority.</li>
          <li>Nominated; champion is Tom.</li>
        </ul>
      </li>
      <li>Clarify the relationship between the literal and execution encodings:
        <ul>
          <li>Corentin proposed this additional candidate.</li>
          <li><em>[ Editor's note: An SG16 issue was
              <a href="https://github.com/sg16-unicode/sg16/issues/72">filed</a>
              to track this addition. ]</em></li>
          <li>Nominated; champions are Corentin and Jens.</li>
        </ul>
      </li>
      <li>PBrett summarized the polling method Tom described in the agenda
          email.</li>
      <li>Tom suggested a simpler polling method; that everyone raise their
          hand for items that they felt most feasible and desirable for
          C++23.</li>
      <li><em>[ Editor's note: the poll results shown below have been
          reordered such that those that received the most votes are listed
          first. ]</em></li>
      <li><b>Poll: C++23 priorities.</b>
        <ul>
          <li><b>Attendance: 9</b></li>
          <li>
            <table>
              <tr>
                <th style="text-align:left">Votes</th>
                <th style="text-align:left">Candidate</th>
              </tr>
              <tr>
                <td style="text-align:right">6</td>
                <td style="text-align:left"><a href="https://wg21.link/p1854">P1854: Source to Execution encoding conversion should not lead to loss of information</a></td>
              </tr>
              <tr>
                <td style="text-align:right">6</td>
                <td style="text-align:left"><a href="https://wg21.link/p2295">P2295: Correct UTF-8 handling during phase 1 of translation</a></td>
              </tr>
              <tr>
                <td style="text-align:right">6</td>
                <td style="text-align:left"><a href="https://github.com/sg16-unicode/sg16/issues/38">std::to_chars/std::from_chars overloads for char8_t</a></td>
              </tr>
              <tr>
                <td style="text-align:right">6</td>
                <td style="text-align:left"><a href="https://github.com/sg16-unicode/sg16/issues/65">Make wide multicharacter character literals ill-formed</a></td>
              </tr>
              <tr>
                <td style="text-align:right">5</td>
                <td style="text-align:left"><a href="https://wg21.link/p1629">P1629: Standard Text Encoding</a></td>
              </tr>
              <tr>
                <td style="text-align:right">5</td>
                <td style="text-align:left"><a href="https://github.com/sg16-unicode/sg16/issues/9">Requiring wchar_t to represent all members of the execution wide character set does not match existing practice</a></td>
              </tr>
              <tr>
                <td style="text-align:right">5</td>
                <td style="text-align:left"><a href="https://github.com/sg16-unicode/sg16/issues/68">Support for UTF encodings in std::format() and std::print()</a></td>
              </tr>
              <tr>
                <td style="text-align:right">4</td>
                <td style="text-align:left"><a href="https://github.com/sg16-unicode/sg16/issues/57">Deprecate std::regex</a></td>
              </tr>
              <tr>
                <td style="text-align:right">4</td>
                <td style="text-align:left"><a href="https://github.com/sg16-unicode/sg16/issues/67">Alias barriers; a replacement for the ICU hack</a></td>
              </tr>
              <tr>
                <td style="text-align:right">4</td>
                <td style="text-align:left"><a href="https://github.com/sg16-unicode/sg16/issues/71">A portable mechanism to specify source file encoding</a></td>
              </tr>
              <tr>
                <td style="text-align:right">4</td>
                <td style="text-align:left"><a href="https://github.com/sg16-unicode/sg16/issues/72">Clarify the relationship between the literal and execution encodings</a></td>
              </tr>
            </table>
          </li>
        </ul>
      </li>
    </ul>
  </li>
  <li>Tom announced that the next telecon will be April 14th and the agenda
      will include
      <a href="https://wg21.link/p2295">P2295: Correct UTF-8 handling during phase 1 of translation</a>.</li>
  <li>Tom reminded participants from Europe of the pending switch to summer
      time and that the next telecon will be held an hour later than this
      one.</li>
</ul>


</body>
