<!doctype html public "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">

<head>
<title>SG16: Unicode meeting summaries 2022-10-12 through 2022-12-14</title>
</head>

<style type="text/css">

table#header th,
table#header td
{
    text-align: left;
}

tt {
    font-family: monospace;
}

/* Thanks to Elias Kosunen for the following CSS suggestions! */

* {
    font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, "Noto Sans", sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Noto Color Emoji";
    line-height: 125%;
}

html, body {
    background-color: #eee;
}

h1, h2, h3, h4, h5, p, span, li, dt, dd {
    color: #333;
}

p, li {
    line-height: 140%;
}

body {
    padding: 1em;
    max-width: 1600px;
}

p, li {
    -moz-osx-font-smoothing: grayscale;
    -webkit-font-smoothing: antialiased !important;
    -moz-font-smoothing: antialiased !important;
    text-rendering: optimizelegibility !important;
    letter-spacing: .01em;
}

h1, h2, h3 {
    margin-bottom: 1em;
    letter-spacing: .03em;
}

blockquote.quote
{
    margin-left: 0em;
    border-style: solid;
    background-color: lemonchiffon;
    color: #000000;
    border: 1px solid black;
}

</style>

<body style="max-width: 8.5in">

<table id="header">
  <tr>
    <th>Document Number:</th>
    <td>P2766R0</td>
  </tr>
  <tr>
    <th>Date:</th>
    <td>2023-01-14</td>
  </tr>
  <tr>
    <th>Audience:</th>
    <td>SG16</td>
  </tr>
  <tr>
    <th>Reply-to:</th>
    <td>Tom Honermann &lt;tom@honermann.net&gt;</td>
  </tr>
</table>


<h1>SG16: Unicode meeting summaries 2022-10-12 through 2022-12-14</h1>

<p>
Summaries of SG16 meetings are maintained at
<a href="https://github.com/sg16-unicode/sg16-meetings">
https://github.com/sg16-unicode/sg16-meetings</a>.  This paper contains a
snapshot of select meeting summaries from that repository.
</p>

<ul>
  <li><a href="#2022_10_12">
      October 12th, 2022</a></li>
  <li><a href="#2022_10_19">
      October 19th, 2022</a></li>
  <li><a href="#2022_11_02">
      November 2nd, 2022</a></li>
  <li><a href="#2022_11_30">
      November 30th, 2022</a></li>
  <li><a href="#2022_12_14">
      December 14th, 2022</a></li>
</ul>

<p>
Previously published SG16 meeting summary papers:
</p>
<ul>
  <li><a href="https://wg21.link/p1080">P1080: SG16: Unicode meeting summaries 2018/03/28 - 2018/04/25</a></li>
  <li><a href="https://wg21.link/p1137">P1137: SG16: Unicode meeting summaries 2018/05/16 - 2018/06/20</a></li>
  <li><a href="https://wg21.link/p1237">P1237: SG16: Unicode meeting summaries 2018/07/11 - 2018/10/03</a></li>
  <li><a href="https://wg21.link/p1422">P1422: SG16: Unicode meeting summaries 2018/10/17 - 2019/01/09</a></li>
  <li><a href="https://wg21.link/p1666">P1666: SG16: Unicode meeting summaries 2019/01/23 - 2019/05/22</a></li>
  <li><a href="https://wg21.link/p1896">P1896: SG16: Unicode meeting summaries 2019/06/12 - 2019/09/25</a></li>
  <li><a href="https://wg21.link/p2009">P2009: SG16: Unicode meeting summaries 2019-10-09 through 2019-12-11</a></li>
  <li><a href="https://wg21.link/p2179">P2179: SG16: Unicode meeting summaries 2020-01-08 through 2020-05-27</a></li>
  <li><a href="https://wg21.link/p2217">P2217: SG16: Unicode meeting summaries 2020-06-10 through 2020-08-26</a></li>
  <li><a href="https://wg21.link/p2253">P2253: SG16: Unicode meeting summaries 2020-09-09 through 2020-11-11</a></li>
  <li><a href="https://wg21.link/p2352">P2352: SG16: Unicode meeting summaries 2020-12-09 through 2021-03-24</a></li>
  <li><a href="https://wg21.link/p2397">P2397: SG16: Unicode meeting summaries 2021-04-14 through 2021-05-26</a></li>
  <li><a href="https://wg21.link/p2512">P2512: SG16: Unicode meeting summaries 2021-06-09 through 2021-12-15</a></li>
  <li><a href="https://wg21.link/p2605">P2605: SG16: Unicode meeting summaries 2022-01-12 through 2022-06-08</a></li>
  <li><a href="https://wg21.link/p2678">P2678: SG16: Unicode meeting summaries 2022-06-22 through 2022-09-28</a></li>
</ul>


<h1 id="2022_10_12">October 12th, 2022</h1>

<h2>Draft agenda:</h2>

<ul>
  <li>Michael Kuperstein: Internationalization From the Perspective of Defect Analysis</li>
  <li>NB comment processing.</li>
</ul>

<h2>Attendees:</h2>

<ul>
  <li>Charles Barto</li>
  <li>Corentin Jabot</li>
  <li>Hubert Tong</li>
  <li>Jens Maurer</li>
  <li>Mark de Wever</li>
  <li>Mark Zeren</li>
  <li>Michael Kuperstein</li>
  <li>Nevin Liber</li>
  <li>Peter Brett</li>
  <li>Steve Downey</li>
  <li>Tom Honermann</li>
  <li>Tomasz Kamiński</li>
  <li>Victor Zverovich</li>
</ul>

<h2>Meeting summary:</h2>

<ul>
  <li>Michael Kuperstein: Internationalization From the Perspective of Defect Analysis
    <ul>
      <li><em>[ Editor's note: Michael's slides are available at
          <a href="https://github.com/sg16-unicode/sg16-meetings/blob/master/presentations/2022-10-12-i18n-presentation.pptx">https://github.com/sg16-unicode/sg16-meetings/blob/master/presentations/2022-10-12-i18n-presentation.pptx</a>.
          ]</em>
      </li>
      <li>Michael provided a brief introduction:
        <ul>
          <li>He has been working for Intel since 1996.</li>
          <li>He has been working in Intel's localization group since 2001.</li>
        </ul>
      </li>
      <li>Slide 1: Internationalization From the Perspective of Defect Analysis</li>
      <li>Slide 2: Venn Diagram</li>
      <li>Slide 3: Defects in Localized Software
        <ul>
          <li>The defect breakdown presented is from an analysis performed in
              2011.</li>
          <li>Internationalization and localization defects are usually found
              by the localization team.</li>
          <li>Localization defects can often be fixed by the localization team;
              as a result, localization teams tend to maintain their own defect
              database.</li>
          <li>Localization defects that require a fix by a development team tend
              to first be reported in a defect database maintained by the
              localization team and then migrated to another team's defect
              database.</li>
        </ul>
      </li>
      <li>Slide 4: World-Readiness Defect Types
        <ul>
          <li>Most localization defects are due to UI, layout, or formatting
              issues.</li>
          <li>The next largest category of defects is due to translation
              issues.</li>
          <li>Defects due to non-translated and embedded strings make up the
              next two largest categories.</li>
          <li>Defects due to encoding issues make up the smallest defect
              category, but are very important.</li>
          <li>For software developers, internationalization and localization
              support is a small part of their total effort, but an important
              part.</li>
        </ul>
      </li>
      <li>Slide 5: Code Scans: I18N Issues by Volume
        <ul>
          <li>The top two categories of issues found by code scans are
              hard-coded strings and hard-coded formatting.</li>
        </ul>
      </li>
      <li>Slide 6: I18N Issues by Volume – Honorable Mentions
        <ul>
          <li>A consistent, locale insensitive internal representation of dates
              is necessary to prevent failures.</li>
          <li>Steve confirmed that the general shape of relative error counts
              presented matches his experience.</li>
          <li>Steve reported that products he has worked on avoid localized
              formatting of dates so as to avoid confusion; likewise, "." is
              consistently used for decimal point.</li>
        </ul>
      </li>
      <li>Slide 7: More than 150 string formatting functions in C/C++ on Windows
        <ul>
          <li>Charlie noted that most of those 150 functions wrap a common
              underlying formatting function.</li>
          <li>Corentin suggested bumping the number to 151 now that
              <tt>std::format()</tt> has been standardized.</li>
        </ul>
      </li>
      <li>Slide 8: Defaults: Fall into the pit of success
        <ul>
          <li>Use of UTF-16 made it easier to produce the right results on
              Windows.</li>
          <li>A string class that basically does the right thing makes it easier
              to get the right result.</li>
          <li>The goal is to guide developers towards doing the right
              thing.</li>
          <li>Many programmers like string interpolation.</li>
          <li>ICU discussion:
            <ul>
              <li>Charlie reported that the ICU included in Windows doesn't
                  expose the C++ interface.</li>
              <li>Michael noted that, in .NET languages, programmers can choose
                  either ICU or the native Windows NLS subsystem for
                  localization, but programmers generally use the default.</li>
              <li>Charlie asked if ICU is mostly present for transcoding
                  purposes.</li>
              <li>Michael replied that he doesn't believe that to be the case
                  since .NET interfaces can defer to ICU for a broader range of
                  localization purposes.</li>
              <li>Michael expressed a belief that ICU is more deeply integrated
                  on Apple systems.</li>
              <li>PBrett asked what defect category would best be associated
                  with cases where programmers incorrectly attempt to produce
                  translated strings via concatenation.</li>
              <li>Michael expressed uncertainty, suggested "other", and noted
                  that such issues are very common but not called out
                  specifically in the slides.</li>
              <li>Michael acknowledged that, for some applications, issues due
                  to concatenation are one of the most common problems, but
                  that doesn't happen to be the case for Intel.</li>
              <li>Michael reiterated that making sure programmers fall into the
                  pit of success is important.</li>
            </ul>
          </li>
        </ul>
      </li>
      <li>Slide 9: Quick Intro to BCP 47 Language Tags and Fallback
        <ul>
          <li>Spoken language is not relevant for text presentation; written
              language, or script, is.</li>
          <li>Chinese has two written forms: Simplified and
              Traditional.</li>
          <li>It is important to specify fallback locales; otherwise, a request
              for zh-SG when it is not available may result in a default
              language like English rather than zh-CN.</li>
          <li>Specifying a hierarchy of fallbacks such as zh-Hans and zh-Hant
              is recommended.</li>
          <li>Since C++ locales don't appear to provide locale fallbacks, it
              may be necessary to supply support for all of them, perhaps by
              providing the same locale data for, e.g., zh-CN and zh-SG.</li>
          <li>Steve noted that English is a better fallback than blank strings
              or the "tofu" character.</li>
        </ul>
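      <p>The fallback advice above can be sketched in C++ as follows. This is
      a minimal illustration under stated assumptions, not part of Michael's
      presentation: the <tt>explicit_parents</tt> table, the
      <tt>fallback_chain()</tt> and <tt>select_locale()</tt> helpers, and the
      choice of "en" as the final default are all hypothetical. Tags such as
      zh-SG are mapped explicitly to a script-level parent (zh-Hans) because
      simply dropping the region subtag would lose the script; other tags
      drop their last subtag.</p>

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical explicit parents for tags where truncation alone would lose
// the script (e.g. zh-SG is written in Simplified Chinese, like zh-CN).
const std::map<std::string, std::string> explicit_parents = {
    {"zh-CN", "zh-Hans"}, {"zh-SG", "zh-Hans"},
    {"zh-TW", "zh-Hant"}, {"zh-HK", "zh-Hant"},
};

// Builds a fallback chain for a BCP 47 tag: use the explicit parent if one
// is known, otherwise drop the last subtag; finish with a final default.
std::vector<std::string> fallback_chain(std::string tag,
                                        const std::string& final_default = "en") {
    std::vector<std::string> chain;
    while (!tag.empty()) {
        chain.push_back(tag);
        if (auto it = explicit_parents.find(tag); it != explicit_parents.end()) {
            tag = it->second;
        } else if (auto pos = tag.rfind('-'); pos != std::string::npos) {
            tag.erase(pos);
        } else {
            tag.clear();
        }
    }
    if (chain.empty() || chain.back() != final_default)
        chain.push_back(final_default);
    return chain;
}

// Returns the first tag in the fallback chain for which data is available,
// or an empty string if none is.
std::string select_locale(const std::string& requested,
                          const std::set<std::string>& available) {
    for (const auto& candidate : fallback_chain(requested))
        if (available.count(candidate) != 0) return candidate;
    return {};
}
```

      <p>With data available only for "en", "zh-Hans", and "zh-Hant", a
      request for zh-SG resolves to zh-Hans instead of falling straight
      through to English. Production systems generally rely on much richer
      data, such as CLDR's parent locale information.</p>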
      </li>
      <li>Slide 10: User Language Selection Choices
        <ul>
          <li>The .NET languages wrap locale info in a <tt>CultureInfo</tt>
              type.</li>
          <li>They also allow various components of a locale to be selected from
              different locales.</li>
          <li>Programmers can create their own custom cultural definitions.</li>
          <li>Thread-specific locale selection is infrequently used; it is more
              common to supply a locale object locally when constructing a
              string for presentation.</li>
          <li>Browsers have multiple language settings; one for the browser UI
              itself and another for the requested page language.</li>
        </ul>
      </li>
      <li>Slide 11: Date formatting
        <ul>
          <li>Use ISO 8601 for date formatting and store times relative to UTC
              internally.</li>
          <li>Convert dates to the appropriate locale for presentation.</li>
          <li>Likewise, use one encoding internally and convert for presentation
              and at program boundaries.</li>
          <li>Hubert asked if Michael had any opinions on the use of ISO week
              days and numbers.</li>
          <li>Michael responded that he has no opinion on that.</li>
        </ul>
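      <p>As a minimal sketch of the "store UTC internally, format with ISO
      8601" guidance, the hypothetical <tt>to_iso8601_utc()</tt> helper below
      converts a <tt>std::time_t</tt> (seconds since the epoch) to an ISO 8601
      UTC timestamp using only standard C library facilities. The purely
      numeric conversion specifiers it uses do not depend on the current
      locale; localized presentation would be a separate step performed at
      the program boundary.</p>

```cpp
#include <array>
#include <cstddef>
#include <ctime>
#include <string>

// Formats a UTC point in time as an ISO 8601 timestamp such as
// "2001-09-09T01:46:40Z". Returns an empty string on failure.
std::string to_iso8601_utc(std::time_t t) {
    // std::gmtime returns a pointer to shared static storage, so copy the
    // result out immediately (gmtime_r/gmtime_s are thread-safe platform
    // alternatives).
    const std::tm* p = std::gmtime(&t);
    if (p == nullptr) return {};
    std::tm utc = *p;

    std::array<char, 32> buf{};
    std::size_t n = std::strftime(buf.data(), buf.size(),
                                  "%Y-%m-%dT%H:%M:%SZ", &utc);
    return std::string(buf.data(), n);
}
```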
      </li>
      <li>Slide 12: The Famous Turkish “İ” Problem
        <ul>
          <li>Locale sensitive uppercasing may translate "i" to "İ"
              (dot retained on uppercase I).</li>
          <li>Locale sensitive lowercasing may translate "I" to "ı"
              (dot omitted on lowercase i).</li>
          <li>This is why it is important to test with Turkish locales!</li>
          <li>Various languages offer locale invariant case conversion and case
              insensitive comparison operations.</li>
          <li>ICU collation solves many of these problems when used
              correctly.</li>
          <li>Some form of collation should be used for file name matching.</li>
          <li>Hubert asked if it would generally be expected for a file with an
              uppercase dotted I like "FILE.GİF" to match a request for files
              named with a ".gif" extension.</li>
          <li>Michael responded affirmatively; that would generally be
              desired.</li>
          <li>Tom observed that such use cases may be more aligned with a form
              of transliteration.</li>
          <li>Corentin responded that Unicode case folding as defined in
              <a href="https://unicode.org/reports/tr35">UTS #35</a>
              handles that case, but that standard C++ doesn't provide an
              interface.</li>
        </ul>
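      <p>The following sketch illustrates why Hubert's "FILE.GİF" example is
      hard to get right with standard C++ alone. It assumes UTF-8 encoded
      strings, and the <tt>ascii_lower()</tt> and
      <tt>ends_with_ascii_icase()</tt> helpers are hypothetical. Folding only
      ASCII letters makes the comparison independent of the current locale
      (unlike <tt>std::tolower()</tt>), so the Turkish mappings cannot cause
      surprises; however, it also means that "FILE.GİF" does not match
      ".gif", because U+0130 (İ) is encoded as the two non-ASCII bytes 0xC4
      0xB0 and is left untouched. Producing the match that Michael described
      requires Unicode-aware processing such as that provided by ICU.</p>

```cpp
#include <cstddef>
#include <string_view>

// Locale-independent, ASCII-only lowercasing: only 'A'..'Z' are changed, so
// the result never depends on the current C or C++ locale.
constexpr char ascii_lower(char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Case-insensitive suffix test (e.g. for file extensions) using ASCII
// folding only; non-ASCII bytes must match exactly.
constexpr bool ends_with_ascii_icase(std::string_view s, std::string_view suffix) {
    if (suffix.size() > s.size()) return false;
    std::string_view tail = s.substr(s.size() - suffix.size());
    for (std::size_t i = 0; i < suffix.size(); ++i)
        if (ascii_lower(tail[i]) != ascii_lower(suffix[i])) return false;
    return true;
}
```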
      </li>
      <li>Slide 13: Formats (numbers, dates, etc.) are not as straightforward as they appear
        <ul>
          <li>ICU's message formatting abilities handle all of these.</li>
          <li>Corentin noted that currency symbols should not be locale
              dependent and that C++ got this wrong.</li>
        </ul>
      </li>
      <li>Slide 14: Many other things can go wrong when dealing with international users
        <ul>
          <li>Handling plural forms is important; the .NET languages do not
              handle plural forms or gendering.</li>
        </ul>
      </li>
      <li>Slide 15: JavaScript i18n Objects and Namespaces
        <ul>
          <li>JavaScript only provides a small number of builtins; i18n is a
              separate package.</li>
          <li>Current browser versions provide the JavaScript i18n namespace;
              a polyfill is required for older browser versions.</li>
          <li>Since the language doesn't provide it as a builtin, there are
              thousands of i18n packages available.</li>
        </ul>
      </li>
      <li>Slide 16: .NET Culture Aware Classes and Namespaces
        <ul>
          <li>The .NET languages provide a relatively complete solution that
              is improving each year.</li>
          <li>The .NET fundamentals documentation is extensive.</li>
          <li>Resource files are easy to use in .NET languages and can be
              provided in a number of formats.</li>
          <li>The .NET languages support gettext-like methods for retrieving
              translated strings.</li>
        </ul>
      </li>
      <li>Slide 17: Resource File Formats
        <ul>
          <li>Some resource file formats are differentiated by encoding.</li>
        </ul>
      </li>
      <li>Slide 18: Read All Lines From a File
        <ul>
          <li>Some languages provide more ergonomic interfaces.</li>
        </ul>
      </li>
      <li>Slide 19: Byte Order Mark (BOM) and Endian descriptions
        <ul>
          <li>On Windows, the default encoding used to be a locale dependent
              "ANSI" encoding, but modern editors are more likely to default
              to UTF-8.</li>
          <li>C and C++ don't provide interfaces for file encoding detection,
              and it isn't easy to implement well.</li>
        </ul>
      </li>
      <li>Slide 20: Character Count vs Byte Count
        <ul>
          <li>For text encoded in UTF-16, character counts tend to be close to
              code unit counts for many languages.</li>
          <li>It is not easy to obtain a count of characters.</li>
          <li>Corentin asked when it is useful to count characters.</li>
          <li>Michael responded that a number of cases exist and provided an
              example of a buffer for which the user is told how many more
              characters they can expect to type; Twitter is an example for
              which both characters and bytes are counted.</li>
        </ul>
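      <p>The difference between byte counts and character counts can be
      illustrated with a small sketch, assuming well-formed UTF-8 input; the
      <tt>count_code_points()</tt> helper is hypothetical. Even counting code
      points is not the same as counting user-perceived characters: "e"
      followed by U+0301 COMBINING ACUTE ACCENT is displayed as one character
      but consists of two code points and three bytes. Counting user-perceived
      characters requires the grapheme cluster rules of UAX #29, which
      standard C++ does not expose.</p>

```cpp
#include <cstddef>
#include <string_view>

// Counts Unicode code points in UTF-8 text by counting every byte that is
// not a continuation byte (bit pattern 10xxxxxx). No validation is done, so
// the result is only meaningful for well-formed UTF-8.
constexpr std::size_t count_code_points(std::string_view utf8) {
    std::size_t n = 0;
    for (char c : utf8)
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) ++n;
    return n;
}
```

      <p>For example, the precomposed "\xC3\xA9" (é) is two bytes but one
      code point.</p>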
      </li>
      <li>Slide 21: Character Encodings (Incomplete List)
        <ul>
          <li>In C and C++, <tt>char</tt> doesn't have a strongly associated
              encoding.</li>
          <li>PBrett asked how often the lack of a strongly associated encoding
              leads to defects.</li>
          <li>Michael responded that it is not as much of a problem as it used
              to be, but that there are still many locale dependent "ANSI"
              encoded files to be found.</li>
        </ul>
      </li>
      <li>Slide 22: RTL Text Detection</li>
      <li>Tom asked the group what stood out to them from the presentation.
        <ul>
          <li>PBrett noted that C++ doesn't make it easy to write programs that
              are locale insensitive internally but locale sensitive at program
              boundaries.</li>
          <li>Michael noted that <tt>gettext()</tt> provides an example of how
              plural forms can be handled.</li>
          <li>Jens observed that, with <tt>std::format()</tt>, we're still far
              away from providing proper localization support; it doesn't yet
              lead to the pit of success.</li>
          <li>Tom noted that the possibility of extending <tt>std::format()</tt>
              creates opportunities.</li>
          <li>Michael noted that formatting is often used for internal uses
              that don't require localization or translation.</li>
          <li>Steve stated that the experiences reported closely match his
              experience at Bloomberg.</li>
        </ul>
      </li>
    </ul>
  </li>
  <li>NB comment processing.
    <ul>
      <li>NB comment processing was postponed due to lack of time.</li>
    </ul>
  </li>
  <li>Tom reported that he would not be available for the previously scheduled
      2022-10-26 meeting and suggested rescheduling meetings for 2022-10-19 and
      2022-11-02 with the intent to focus on addressing NB comments in advance
      of the Kona meeting; there were no objections.</li>
</ul>


<h1 id="2022_10_19">October 19th, 2022</h1>

<h2>Draft agenda:</h2>

<ul>
  <li>NB comment processing.</li>
</ul>

<h2>Attendees:</h2>

<ul>
  <li>Corentin Jabot</li>
  <li>Hubert Tong</li>
  <li>Jens Maurer</li>
  <li>Mark Zeren</li>
  <li>Peter Bindels</li>
  <li>Steve Downey</li>
  <li>Tom Honermann</li>
  <li>Victor Zverovich</li>
</ul>

<h2>Meeting summary:</h2>

<ul>
  <li><a href="https://github.com/cplusplus/nbballot/issues/474">US 2-029 3.35 [defns.multibyte] Give context for "execution character set"</a>:
    <ul>
      <li>Steve presented the concern:
        <ul>
          <li>The definition of multibyte character refers to the locale
              dependent execution character set.</li>
          <li>Changing this might be difficult, but removing the reference to
              "execution character set" might help.</li>
        </ul>
      </li>
      <li>Hubert stated that the use of "multibyte character" in the library
          wording is consistent with the definition.</li>
      <li>Tom asked if Hubert is suggesting that this is not a defect.</li>
      <li>Hubert responded affirmatively.</li>
      <li>Steve stated that he would agree if the definition were in the library
          section.</li>
      <li>Jens explained that there used to be a terms and definitions section
          in the library wording but that ISO required it to be merged with the
          section in the core wording back in the C++17 time frame.</li>
      <li>Hubert noted that the only use of "multibyte character" is in the
          library wording.</li>
      <li>Corentin responded that there are indirect uses of it via
          "multibyte string" and "NTMBS" in the definition of the <tt>main</tt>
          function in
          <a href="http://eel.is/c++draft/basic.start.main">[basic.start.main]</a>.</li>
      <li>Corentin noted that all uses of it are intended to refer to the
          locale encoding.</li>
      <li>Tom asked if it would make sense to strike the term so that it is
          inherited from the C standard.</li>
      <li>Jens expressed a preference not to do so.</li>
      <li>Steve stated that doing so might have unintended consequences.</li>
      <li>Tom summarized the sentiment expressed so far: the group is leaning
          towards this not being a defect, but there are opportunities for
          improvement via editorial changes.</li>
      <li>Tom suggested that any such editorial changes be left up to the
          CWG.</li>
      <li>Jens replied that the CWG is likely to decline to make any changes
          without a proposed change.</li>
      <li>Corentin asked if an editorial pull request could be submitted.</li>
      <li>Jens replied affirmatively.</li>
      <li>Hubert stated that the concern that Corentin raised regarding use of
          "multibyte character" with <tt>main</tt> is an issue.</li>
      <li>Jens asserted that would be a different core issue.</li>
      <li><b>Poll 1: [US 2-029] SG16 suggests to consider this issue as
          "not a defect", but to improve the presentation by editorially moving
          the definition of "multibyte character" to
          <a href="http://eel.is/c++draft/multibyte.strings">[multibyte.strings]</a>.</b>
        <ul>
          <li><b>Attendees: 8</b></li>
          <li><b>No objection to unanimous consent.</b></li>
        </ul>
      </li>
      <li><em>[ Editor's note: Corentin submitted a pull request that
          implements the polled direction at
          <a href="https://github.com/cplusplus/draft/pull/5910">https://github.com/cplusplus/draft/pull/5910</a>.
          ]</em></li>
    </ul>
  </li>
  <li><a href="https://github.com/cplusplus/nbballot/issues/515">US 38-098 22.14.6.4p1 [format.string.escaped] Escaping for debugging and logging</a>:
    <ul>
      <li>Hubert presented the concern:
        <ul>
          <li>The feature description claims to provide a larger scope than it
              serves; the design doesn't suffice to address all logging
              scenarios.</li>
          <li>It is not clear that the escaped string is required to be usable
              as a string literal.</li>
        </ul>
      </li>
      <li>Victor opined that the proposed change to replace "logging" with
          "technical logging" makes sense.</li>
      <li>Victor expressed a preference against the second bullet regarding
          visually distinguishing equivalent text that is differently
          encoded.</li>
      <li>Victor stated that the primary motivation for the feature was to
          produce a character sequence that would not interfere with the
          formatting of ranges.</li>
      <li>Victor noted that there is existing implementation experience with
          the feature in both Python and Rust and that the chosen design is
          modeled after Rust.</li>
      <li>Victor asserted that the proposed change to allow for future addition
          of alternative escaping methods is unnecessary since other extension
          methods are already available.</li>
      <li>Jens stated that the concern seems mostly related to the first
          sentence of
          <a href="https://eel.is/c++draft/format.string.escaped">[format.string.escaped]</a>:
        <blockquote class="quote">
        A character or string can be formatted as <i>escaped</i> to make it more
        suitable for debugging or for logging.
        </blockquote>
      </li>
      <li>Jens continued that the request is to make it clear that the escaped
          result shall be valid for interpretation as a string literal and that
          "logging" be replaced with "technical logging".</li>
      <li>Hubert agreed, but noted there is still a question of whether visually
          distinct output is desired.</li>
      <li>Hubert reiterated; the first priority is that the escaped result is a
          valid string literal, and a secondary priority is that text that might
          not be visually distinct be made so.</li>
      <li>Jens stated that the minimal change would be to change that first
          sentence.</li>
      <li>Jens noted that no actual defect has been identified.</li>
      <li>Hubert stated that SG16 may not be the best place to fully resolve the
          comment; the question of extension remains and is more of a LEWG
          consideration.</li>
      <li>Jens suggested that, for LWG's benefit, SG16 should propose a change
          to that first sentence.</li>
      <li>Corentin stated that NB comment FR-005-134 similarly states that the
          intent of the feature is not clear.</li>
      <li>Corentin asserted there are further questions regarding the escaping
          of grapheme clusters and that it is not clear what is intended to be
          escaped and for what purpose.</li>
      <li>Corentin expressed concern that the currently specified behavior of
          escaping all combining characters disadvantages some languages more
          than others and provided Korean as an example.</li>
      <li>Victor acknowledged that US 38-098 and FR-005-134 both state that the
          intent is not clear, but noted that their proposed resolutions are not
          in agreement.</li>
      <li>Victor agreed with Corentin that users of scripts that require more
          use of combining characters should not be penalized.</li>
      <li>Victor stated that the Python form of the feature does not escape
          combining characters and that can result in interference with range
          separators.</li>
      <li>Victor noted that the original proposal only escaped lone combining
          characters and acknowledged that the switch to the Rust approach might
          have gone too far.</li>
      <li>Hubert disagreed with the notion that a failure to escape does not
          harm the technical debugging use case.</li>
      <li>Mark reported experience with use cases where text content is only
          available via an image; perhaps a screen shot captured with a
          phone.</li>
      <li>Mark stated that he has only experienced a need for escaped
          characters in cases where the text was not correctly encoded.</li>
      <li>Mark noted it is a valid question as to whether the standard library
          should default to producing visually indistinct text.</li>
      <li>Tom stated that a goal of maximizing visual distinction would require
          escaping all characters not in the basic character set.</li>
      <li>Corentin replied that it would be terrible to escape all non-ASCII
          characters but that doing so would not be worse than escaping all
          combining characters.</li>
      <li>Corentin expressed a preference towards either maximizing escaping or
          minimizing it.</li>
      <li>Hubert stated that scripts like Korean provide strong motivation
          for a minimally escaped form.</li>
      <li>Hubert noted that there are still valid reasons for wanting a
          visually distinct form via an easy opt-in.</li>
      <li>Victor reiterated that the primary goal of the escaped form was to
          avoid interference with the formatted range output.</li>
      <li>Victor suggested that desires for other use cases be pursued via new
          papers rather than NB comments.</li>
      <li>Victor asserted that it is useful to have ill-formed code units
          escaped.</li>
      <li>Mark expressed a preference towards not escaping combining characters
          due to the readability harm it would impose on scripts like
          Korean.</li>
      <li>Corentin asserted that not all use cases can be satisfied with this
          single facility, but that extensions can satisfy more of them
          post-C++23.</li>
      <li>Corentin expressed a preference towards a default that maintains
          readability for more languages and that more extensive escaping can be
          pursued separately.</li>
      <li>Hubert opined that a sequence of combining characters that immediately
          follow an escaped character is sufficient evidence of an error to
          justify escaping them.</li>
      <li>Corentin repeated that it is useful to escape non-printable
          characters.</li>
      <li>Corentin stated that the grapheme breaking algorithm is potentially
          expensive, but then backtracked with an observation that it is
          sufficient to check for the <tt>Grapheme_Extend=Yes</tt> character
          property to identify combining characters that may need to be
          escaped.</li>
      <li><b>Poll 2.1: [US 38-098] SG16 agrees that the formatted code units in
          the escaped string are intended to be usable as a string literal that
          reproduces the input.</b>
        <ul>
          <li><b>Attendees: 8</b></li>
          <li><b>No objection to unanimous consent.</b></li>
        </ul>
      </li>
      <li><b>Poll 2.2: [US 38-098] SG16 agrees that the escaped string is intended
          to be readable for its textual content in any Unicode script.</b>
        <ul>
          <li><b>Attendees: 8</b></li>
          <li><b>No objection to unanimous consent.</b></li>
        </ul>
      </li>
      <li><b>Poll 2.3: [US 38-098] SG16 agrees that separators and non-printable
          characters
          (<a href="https://eel.is/c++draft/format.string.escaped">[format.string.escaped]p(2.2.1.2)</a>)
          shall be escaped in the escaped string.</b>
        <ul>
          <li><b>Attendees: 8</b></li>
          <li><b>No objection to unanimous consent.</b></li>
        </ul>
      </li>
      <li><b>Poll 2.4: [US 38-098] SG16 agrees that combining code points shall
          not be escaped unless there is no leading code point or the previous
          character was escaped.</b>
        <ul>
          <li><b>Attendees: 8</b></li>
          <li><b>No objection to unanimous consent.</b></li>
        </ul>
      </li>
      <li>Tom stated that he would provide examples for each of the polls when
          reporting the SG16 consensus once the NB comment GitHub repository
          is populated.</li>
      <li>Tom suggested that anyone who works on proposed wording include
          examples.</li>
    </ul>
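      <p>The direction captured in polls 2.1 through 2.4 can be illustrated
      with a simplified sketch. It is not the wording that was eventually
      adopted, and several assumptions are made to keep it short: the input
      is a sequence of Unicode scalar values; the hypothetical
      <tt>is_combining()</tt> predicate treats only the Combining Diacritical
      Marks block (U+0300..U+036F) as combining, standing in for a check of
      the <tt>Grapheme_Extend=Yes</tt> property; and the hypothetical
      <tt>is_nonprintable()</tt> predicate treats only C0 and C1 control
      characters as non-printable, standing in for the separator and
      non-printable categories of [format.string.escaped]. The
      <tt>escape_sketch()</tt>, <tt>append_hex_escape()</tt>, and
      <tt>append_utf8()</tt> names are likewise hypothetical.</p>

```cpp
#include <string>
#include <string_view>

// Hypothetical stand-in for Grapheme_Extend=Yes: only U+0300..U+036F.
constexpr bool is_combining(char32_t cp) { return cp >= 0x300 && cp <= 0x36F; }

// Hypothetical stand-in for separators and non-printable characters:
// C0 and C1 control characters only.
constexpr bool is_nonprintable(char32_t cp) {
    return cp < 0x20 || (cp >= 0x7F && cp <= 0x9F);
}

// Appends cp as \u{hex} using lowercase hex digits without leading zeros.
void append_hex_escape(std::string& out, char32_t cp) {
    const char* digits = "0123456789abcdef";
    std::string hex;
    do {
        hex.insert(hex.begin(), digits[cp & 0xF]);
        cp >>= 4;
    } while (cp != 0);
    out += "\\u{" + hex + "}";
}

// Appends cp encoded as UTF-8 (cp is assumed to be a valid scalar value).
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Produces a quoted, escaped form of text usable as a string literal:
// controls are always escaped (poll 2.3); combining code points are left
// readable (poll 2.2) unless nothing precedes them or the previous code
// point was escaped (poll 2.4).
std::string escape_sketch(std::u32string_view text) {
    std::string out = "\"";
    bool prev_escaped = true;  // At the start there is no leading code point.
    for (char32_t cp : text) {
        bool escaped = true;
        switch (cp) {
            case U'\t': out += "\\t"; break;
            case U'\n': out += "\\n"; break;
            case U'\r': out += "\\r"; break;
            case U'"':  out += "\\\""; break;
            case U'\\': out += "\\\\"; break;
            default:
                if (is_nonprintable(cp) || (is_combining(cp) && prev_escaped)) {
                    append_hex_escape(out, cp);
                } else {
                    append_utf8(out, cp);
                    escaped = false;
                }
        }
        prev_escaped = escaped;
    }
    out += "\"";
    return out;
}
```

      <p>With this sketch, "e" followed by U+0301 is written out unchanged and
      remains readable, while a U+0301 at the start of the string or
      immediately after an escaped newline is written as
      <tt>\u{301}</tt>.</p>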
  </li>
  <li><a href="https://github.com/cplusplus/nbballot/issues/542">US 64-132 Annex E.4 Whitespace and pattern rules</a>:
    <ul>
      <li>Tom noted that FR-009-024, if adopted, will make this NB comment
          moot.</li>
      <li>Corentin explained the motivation for the FR-009-024 comment; that the
          annex is light on information, that many of the requirements don't
          apply to C++, and that the ones that do could be noted in
          <a href="http://eel.is/c++draft/lex.name">[lex.name]</a>.</li>
      <li>Steve responded that an explicit record of a negative answer to a
          question is useful.</li>
      <li>Steve explained that it would be difficult to identify Unicode
          requirement conformance information if it was spread throughout the
          standard wording.</li>
      <li>Tom observed that differing opinions are clearly present with regard
          to the utility of the annex and stated that, due to time constraints,
          discussion will be limited to US 64-132 for now; discussion of
          FR-009-024 will be scheduled for a future meeting.</li>
      <li>Steve expressed an expectation of agreement that UAX #31 is intended
          to apply to general purpose programming languages.</li>
      <li>Hubert expressed a desire for more details and noted that conformance
          is not currently claimed.</li>
      <li>Tom provided a link to Unicode document
          <a href="https://www.unicode.org/L2/L2022/22179-uax31-36-draft-6-post-pri450.pdf">L2/22-179</a>;
          it contains highlighted markup of the changes that were accepted for
          Unicode 15.</li>
      <li>Tom noted the changes added to the beginning of chapter 4,
          "Pattern Syntax":
          <blockquote class="quote">
          Most programming languages have a concept of whitespace as part of
          their lexical structure, as well as some set of characters that are
          disallowed in identifiers but have syntactic use, such as arithmetic
          operators. Beyond general programming languages, there are also ...
          </blockquote>
          and the changes to the "Modifications" section at the end of the
          document:
          <blockquote class="quote">
          <ul>
            <li>Section 4, Pattern Syntax
              <ul>
                <li>Clarified that this section is applicable to programming
                    languages.</li>
              </ul>
            </li>
          </ul>
          </blockquote>
      </li>
      <li>Jens observed that the NB comment is missing a reference to the
          updated Unicode document that clarifies applicability to general
          purpose programming languages.</li>
      <li>Jens suggested that the annex could state that
          <a href="http://eel.is/c++draft/lex.name">[lex.name]</a>
          defines a profile.</li>
      <li>Hubert expressed a preference to continue claiming non-conformance
          pending a clear specification of a conforming profile.</li>
      <li>Steve expressed satisfaction with a change to simply claim
          non-conformance.</li>
      <li>Jens observed that consensus appeared to be aligning with the
          proposed change from the NB comment as opposed to the one proposed in
          <a href="https://wg21.link/p2653r0">P2653R0 (Update Annex E based on Unicode 15.0 UAX 31)</a>.</li>
      <li>Steve asked if such a change could be applied editorially.</li>
      <li>Tom opined that it could be.</li>
      <li>Jens expressed a desire for CWG to review first and stated that, if a
          paper revision can be made available quickly, he would schedule
          it for CWG review later in the week.</li>
      <li>Steve agreed to prepare a revision.</li>
      <li><b>Poll 3: [US 64-132] SG16 agrees with resolving the issue in the
          direction presented in the comment.</b>
        <ul>
          <li><b>Attendees: 8</b></li>
          <li><b>No objection to unanimous consent.</b></li>
        </ul>
      </li>
    </ul>
  </li>
  <li>Tom discussed plans for the next SG16 meeting:
    <ul>
      <li>Review of the GB and FR draft NB comments identified 7 comments for
          SG16 to review.</li>
      <li>It is not yet known if additional NB comments from other NBs will
          require review.</li>
      <li>The next meeting is scheduled for 2022-11-02.</li>
      <li>Once an agenda is sent, please discuss via email in advance of the
          meeting in order to reduce review time during the meeting.</li>
    </ul>
  </li>
</ul>


<h1 id="2022_11_02">November 2nd, 2022</h1>

<h2>Draft agenda:</h2>

<ul>
  <li> NB comment processing.</li>
</ul>

<h2>Attendees:</h2>

<ul>
  <li>Charles Barto</li>
  <li>Corentin Jabot</li>
  <li>Hubert Tong</li>
  <li>Jens Maurer</li>
  <li>Mark Zeren</li>
  <li>Mark de Wever</li>
  <li>Steve Downey</li>
  <li>Tom Honermann</li>
  <li>Victor Zverovich</li>
</ul>

<h2>Meeting summary:</h2>

<ul>
  <li><a href="https://github.com/cplusplus/nbballot/issues/408">FR 005-134 22.14.6.4 [format.string.escaped] Aggressive escaping</a>:
    <ul>
      <li>Corentin explained that the direction polled for
          <a href="https://github.com/cplusplus/nbballot/issues/515">US 38-098</a>
          during the
          <a href="#2022_10_19">October 19th, 2022 SG16 meeting</a>
          suffices to resolve this issue.</li>
      <li>Victor agreed that the prior poll result is consistent with the first
          option of the proposed change.</li>
      <li><b>Poll 1: [FR 005-134]: SG16 recommends accepting the comment in the
          direction presented in the first bullet of the proposed change and as
          recommended in the polls for US 38-098.</b>
        <ul>
          <li><b>Attendees: 8</b></li>
          <li><b>Unanimous consent.</b></li>
        </ul>
      </li>
      <li>Corentin asked if anyone is preparing wording for US 38-098.</li>
      <li>Hubert replied that no wording was provided with the comment.</li>
      <li>Victor noted that the proposed change for US 38-098 did have the
          suggestion to replace "logging" with "technical logging".</li>
      <li>Hubert replied that the direction polled didn't include that
          change.</li>
      <li>Tom stated that, absent a volunteer, wording will be left to
          LWG.</li>
    </ul>
  </li>
  <li><a href="https://github.com/cplusplus/nbballot/issues/432">GB-031 5.2 [lex.phases] Clarification of wording on new-line and whitespace</a>:
    <ul>
      <li>Tom lamented Peter Brett's absence.</li>
      <li>Tom recalled recent
          <a href="https://lists.isocpp.org/sg16/2022/10/3495.php">discussion on the SG16 mailing list</a>
          that suggested a possible misunderstanding regarding feedback provided
          during the
          <a href="https://wiki.edg.com/bin/view/Wg21telecons2022/Teleconference2022-09-09">2022-09-09 CWG review</a>
          of a draft of
          <a href="https://wg21.link/p2348r3">P2348R3</a>.</li>
      <li>Corentin explained that CWG was dissatisfied with the amount of churn
          involved in the paper and preferred an approach that addresses
          whitespace issues during translation phase 1.</li>
      <li>Corentin expressed disagreement with that approach and stated that he
          doesn't plan to pursue it.</li>
      <li>Corentin acknowledged that an issue exists.</li>
      <li>Steve expressed support for eventually fixing the issue but stated
          that he is weakly against doing so via an NB comment since, though
          the risk is low, late fixes can have unintended consequences.</li>
      <li>Jens disagreed with Corentin's summary of the CWG review, specifically
          with the claim that CWG wanted all whitespace issues to be addressed
          in translation phase 1.</li>
      <li>Jens explained that what CWG requested was for translation phase 1 to
          translate all accepted new-line forms to a single new-line character
          in the translation character set.</li>
      <li>Jens reported that CWG determined that the form of a new-line
          expressed in an input file is not observable by a program, not even
          in a raw string literal.</li>
      <li>Jens agreed with Corentin's claim that CWG recommended against the
          churn proposed in the paper.</li>
      <li>Jens explained the status quo: translation phase 1 does not
          currently allow a UTF-8 encoded input file to have a new-line sequence
          other than U+000A (LINE FEED); the wording prohibits the use of
          U+000D (CARRIAGE RETURN) followed by U+000A (LINE FEED) as a new-line
          indicator.</li>
      <li>Jens noted that
          <a href="https://github.com/cplusplus/nbballot/issues/475">US 3-030</a>
          requests a change that matches the CWG feedback.</li>
      <li>Tom asked Corentin if Jens' explanation was helpful.</li>
      <li>Corentin replied that he doesn't want to object to progress but that
          he lacks bandwidth to work on the issue himself.</li>
      <li>Corentin offered to share the source to
          <a href="https://wg21.link/p2348">P2348</a>
          to anyone interested in working on a revision.</li>
      <li>Steve stated that, based on the intended scope, he would not object to
          CWG's preferred direction.</li>
      <li>Steve volunteered to look into producing a revision of P2348 if
          Corentin makes the source available.</li>
      <li>Tom stated that Jens' comments suggest a path forward of rejecting
          this comment in favor of pursuing US 3-030.</li>
      <li>Jens suggested that further action await a revision of P2348 and that
          this NB comment be handled procedurally as not having consensus for a
          change.</li>
      <li>Corentin noted that P2348 went through the committee pipeline and
          doesn't need to be rushed.</li>
    </ul>
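    <p>The direction Jens described, in which translation phase 1 maps every
    accepted new-line form to a single new-line character, can be sketched as
    a normalization pass over the input bytes. This is an illustration under
    assumptions, not proposed wording: the set of accepted forms (LF and
    CR LF here) and the treatment of a lone CR (passed through unchanged here)
    are choices a paper would need to specify, and the function name is
    hypothetical.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical phase-1 step: CR LF becomes LF; LF and a lone CR are copied
// unchanged.
std::string normalize_new_lines(const std::string& input) {
  std::string out;
  out.reserve(input.size());
  for (std::string::size_type i = 0; i < input.size(); ++i) {
    // Skip a CR that begins a CR LF pair; the LF is copied on the next pass.
    if (input[i] == '\r' && i + 1 < input.size() && input[i + 1] == '\n')
      continue;
    out.push_back(input[i]);
  }
  return out;
}
```

    <p>In this model, a raw string literal spanning a line break has the same
    contents whether the file was saved with CR LF or LF line endings, because
    the literal is formed only after this step; that matches the CWG
    observation that the new-line form is not observable by a program.</p>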
  </li>
  <li><a href="https://github.com/cplusplus/nbballot/issues/411">FR-009-024 Annex E [uaxid] Shorten contents and integrate with [lex.name]</a>:
    <ul>
      <li>Tom mentioned that this issue was briefly discussed along with
          <a href="https://github.com/cplusplus/nbballot/issues/542">US 64-132</a>
          during the
          <a href="#2022_10_19">October 19th, 2022 SG16 meeting</a>.</li>
      <li>Corentin stated that it is not yet clear that we understand
          <a href="https://unicode.org/reports/tr31">UAX #31</a>
          sufficiently well to declare conformance.</li>
      <li>Corentin asserted that, assuming retaining annex E is desirable,
          additional work is needed to evaluate conformance against a specific
          version of UAX #31, but it isn't clear which version that evaluation
          should be performed against.</li>
      <li>Corentin claimed that it is not clear that annex E is necessary or
          useful.</li>
      <li>Corentin noted that it would be useful to mention some of the
          associations in
          <a href="http://eel.is/c++draft/lex.name">[lex.name]</a>.</li>
      <li>Steve replied that the burden of conformance is the same regardless
          of where it is stated.</li>
      <li>Steve added that he is disinclined to abandon attempting to state
          conformance.</li>
      <li>Steve asserted that, similarly to undefined behavior, it is hard to
          find answers for things that are not explicitly stated in the
          standard.</li>
      <li>Steve claimed that statements regarding what is and is not intended
          to be conforming are useful.</li>
      <li>Steve noted that the placement of the conformance statements in an
          annex avoids interactions with normative wording.</li>
      <li>Jens reported that the reference to UAX #31 in the
          <a href="http://eel.is/c++draft/bibliography">bibliography</a>
          specifically refers to revision 33 and Unicode 13.</li>
      <li>Jens asserted that it is preferable that the C++ standard specify the
          syntax of identifiers itself rather than by deference to Unicode.</li>
      <li>Jens expressed support for expanding annex E to include statements of
          conformance for other Unicode requirements in a future standard.</li>
      <li>Jens noted that some of the clarifications made to UAX #31 for
          Unicode 15 were directly inspired by the initial attempt to state
          conformance in annex E and that such a feedback cycle is a valuable
          result.</li>
      <li>Jens stated that annex E doesn't require significant maintenance and
          that, because it is non-normative and has no implementation impact,
          a failure to update it would not be highly consequential.</li>
      <li>Corentin stated that the Unicode standard is defined as a complete set
          and is not intended or designed to support cherry picking different
          versions of its parts.</li>
      <li>Corentin provided normalization as an example of Unicode specification
          that is defined across multiple parts of the Unicode Standard.</li>
      <li><b>Poll 2: [FR-009-024]: SG16 recommends rejecting the comment on the
          basis that explicit indication of Unicode requirement conformance,
          non-conformance, or inapplicability is useful.</b>
        <ul>
          <li><b>Attendees: 9 (1 abstention)</b></li>
          <li>
            <table>
              <tr>
                <th style="text-align:right">SF</th>
                <th style="text-align:right">F</th>
                <th style="text-align:right">N</th>
                <th style="text-align:right">A</th>
                <th style="text-align:right">SA</th>
              </tr>
              <tr>
                <th style="text-align:right">3</th>
                <th style="text-align:right">3</th>
                <th style="text-align:right">1</th>
                <th style="text-align:right">0</th>
                <th style="text-align:right">1</th>
              </tr>
            </table>
          </li>
          <li><b>Consensus.</b></li>
        </ul>
      </li>
    </ul>
  </li>
  <li><a href="https://github.com/cplusplus/nbballot/issues/412">FR-010-133 [Bibliography] Unify references to Unicode</a>
      and <br/>
      <a href="https://github.com/cplusplus/nbballot/issues/423">FR-021-013 5.3p5.2 [lex.charset] Codepoint names in identifiers</a>:
    <ul>
      <li>Corentin explained that the C++ standard currently references four
          distinct Unicode versions for various purposes but that
          implementations, Clang specifically, intend to adopt behaviors from
          newer Unicode versions as releases occur.</li>
      <li>Corentin described a technical inconsistency that results from the
          disjoint version references:
        <ul>
          <li>The range of UCS scalar values that can be expressed in a
              <i>universal-character-name</i> (UCN) is determined by the
              ISO/IEC 10646 version.</li>
          <li>The set of character names recognized for a
              <i>named-universal-character</i> (NUC) is likewise determined
              by the ISO/IEC 10646 version.</li>
          <li>The set of UCS scalar values allowed in an <i>identifier</i> is
              determined by the <tt>XID_Start</tt> and <tt>XID_Continue</tt>
              properties defined in the referenced
              <a href="https://unicode.org/reports/tr44">UAX #44</a>
              version.</li>
          <li>If the version of UAX #44 referenced corresponds to a newer
              version of the Unicode Standard than the associated version for
              the referenced version of ISO/IEC 10646, then there will exist
              some identifiers that can be spelled as, for example,
              <tt>x\u1234</tt> but not as <tt>x\N{NAME_FOR_1234}</tt>.</li>
        </ul>
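        <p>The inconsistency can be reduced to a small model in which each
        code point is tagged with the Unicode version that assigned it and each
        referenced document is represented by a Unicode version (for
        ISO/IEC 10646, the aligned Unicode version). Everything here is
        hypothetical: the type, the functions, and the version encoding
        (major * 100 + minor) are assumptions of this sketch, and the code
        point is assumed to appear after the first character of the
        identifier (as in <tt>x\u1234</tt>), so only <tt>XID_Continue</tt>
        matters.</p>

```cpp
#include <cassert>

// Hypothetical model; versions are encoded as major * 100 + minor
// (1500 == Unicode 15.0).
struct CodePoint {
  int assigned_in;    // Unicode version in which the code point was assigned
  bool xid_continue;  // its XID_Continue value once assigned
};

// \uXXXX in an identifier: governed only by the referenced UAX #44 version.
bool ucn_ok(const CodePoint& cp, int uax44_version) {
  return cp.assigned_in <= uax44_version && cp.xid_continue;
}

// \N{...} in an identifier: the code point must additionally have a name in
// the referenced ISO/IEC 10646 edition (modeled by its aligned version).
bool nuc_ok(const CodePoint& cp, int uax44_version, int iso10646_version) {
  return ucn_ok(cp, uax44_version) && cp.assigned_in <= iso10646_version;
}
```

        <p>For a hypothetical code point assigned in Unicode 14.0,
        <tt>ucn_ok(cp, 1500)</tt> is true while <tt>nuc_ok(cp, 1500, 1300)</tt>
        is false; sourcing both from the same version
        (<tt>nuc_ok(cp, 1500, 1500)</tt>) removes the mismatch.</p>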
      </li>
      <li>Steve expressed concern that updating the referenced versions might
          break section references.</li>
      <li>Corentin replied that he checked all references and only found one
          section reference; the reference for the Unicode replacement
          character in
          <a href="http://eel.is/c++draft/ostream.formatted.print">[ostream.formatted.print]</a>
          specifically references chapter 3.9 of the core specification for
          Unicode 14.</li>
      <li>Steve stated that the bibliography is intended to reflect what the
          author was reading when writing the C++ standard.</li>
      <li>Corentin agreed and noted that normative changes should be made as
          necessary when the versions referenced in the bibliography are
          updated.</li>
      <li>Steve noted that such concerns will be more important for future
          library features that have a deeper dependence on the Unicode
          Standard.</li>
      <li>Tom noted that the ISO requires references to other ISO standards to
          reference the most recent version and asked if that applies to non-ISO
          standards as well.</li>
      <li>Jens replied that the ISO prefers undated references.</li>
      <li>Jens explained that outdated ISO versions don't really exist from the
          ISO perspective since a newer version is intended to replace a
          previous version; references to previous releases are somewhat like
          dangling pointers.</li>
      <li>Jens noted that, practically speaking, older versions do exist and
          that we do refer to older versions when necessary, as we must
          for UCS-2.</li>
      <li>Jens further explained that a dated reference is used for C since
          normative changes are very likely required to accommodate a newer
          version.</li>
      <li>Corentin explained that, at present, there is a mix of specific
          version references and floating references and that some are normative
          and some are non-normative.</li>
      <li>Corentin stated that the only change that would have a normative
          impact is for named character sequences.</li>
      <li>Jens stated that the Unicode Standard is referenced for cases where
          the needed subject matter is not present in an ISO standard.</li>
      <li>Jens noted that ISO prefers referencing ISO standards when
          possible.</li>
      <li>Jens suggested that the project editor should have more insight into
          the rules provided by ISO regarding references to ISO standards vs the
          Unicode Standard.</li>
      <li>Corentin clarified that the NB comment is not asking to only refer to
          the Unicode Standard; it is asking that named character sequences be
          made consistent with other uses of Unicode functionality.</li>
      <li>Jens noted that the character names are present in ISO/IEC 10646, but
          that the properties needed for identifiers are not.</li>
      <li>Hubert suggested that, when a reference is needed to the Unicode
          Standard, that the version aligned with ISO/IEC 10646 be
          referenced.</li>
      <li>Hubert stated that implementations can then veto that in favor of
          newer versions and that no one would complain.</li>
      <li>Hubert raised the option of asking the project editor to make a
          request to the ISO that the scope of ISO/IEC 10646 be expanded to
          include the additional Unicode features that we need.</li>
      <li>Hubert expressed a preference towards referencing ISO/IEC 10646 for
          terms and definitions because the ISO's practice tends to be more
          stringent than the Unicode Consortium's.</li>
      <li>Corentin restated his goal of improving consistency: that the
          references be updated so that the character names and XID properties
          are sourced from the same reference.</li>
      <li>Hubert asked why the reference for extended grapheme cluster is
          non-normative.</li>
      <li>Jens replied that he thinks
          <a href="https://unicode.org/reports/tr29">UAX #29</a>
          is only referenced to satisfy normative encouragement for an
          implementation direction.</li>
      <li>Charles expressed agreement with Jens' recollection.</li>
      <li>Hubert stated that normative encouragement should require a normative
          reference.</li>
      <li>Jens agreed that is probably true.</li>
      <li>Corentin asserted that, as more support for Unicode is added to C++,
          there will be more need for references to the Unicode Standard that
          can't be satisfied by ISO/IEC 10646.</li>
      <li>Jens admitted he was surprised when he first joined SG16 to learn
          that ISO/IEC 10646 specifies a subset of the features present in the
          Unicode Standard.</li>
      <li>Tom asked if changes to reference the Unicode Standard version that
          is aligned with the referenced ISO/IEC 10646 version would resolve
          the concern.</li>
      <li>Jens noted that the current reference to ISO/IEC 10646 is
          undated.</li>
      <li>MarkZ suggested the right approach would be to just reference the
          Unicode Standard.</li>
      <li>Corentin suggested that the next action be to coordinate with the
          project editor to better understand our options.</li>
      <li>Steve suggested it might be best to state that implementations should
          use the Unicode Standard version that aligns with their version of
          ISO/IEC 10646.</li>
      <li>Tom stated that the
          <a href="https://unicode.org/faq/unicode_iso.html">Unicode FAQ</a>
          explicitly states which Unicode Standard version is aligned with each
          ISO/IEC 10646 version and asked if ISO/IEC 10646 is similarly
          explicit.</li>
      <li>Jens checked and reported that it is not, but that it embeds links
          that are version specific.</li>
      <li>Corentin stated that the highest priority is to provide consistent
          references and that we can rely on forward compatibility
          guarantees.</li>
      <li>Jens noted that, though we do understand and appreciate the Unicode
          stability guarantees, we are obligated to verify that those
          commitments are honored.</li>
      <li><b>Poll 3: [FR-010-133][FR-021-013]: SG16 requests that the project
          editor discuss with the ISO the option of eschewing references to
          ISO/IEC 10646 in favor of the Unicode Standard both for technical
          consistency and release frequency.</b>
        <ul>
          <li><b>Attendees: 9 (1 abstention)</b></li>
          <li><b>Objection to unanimous consent.</b></li>
          <li>
            <table>
              <tr>
                <th style="text-align:right">SF</th>
                <th style="text-align:right">F</th>
                <th style="text-align:right">N</th>
                <th style="text-align:right">A</th>
                <th style="text-align:right">SA</th>
              </tr>
              <tr>
                <th style="text-align:right">3</th>
                <th style="text-align:right">3</th>
                <th style="text-align:right">0</th>
                <th style="text-align:right">1</th>
                <th style="text-align:right">1</th>
              </tr>
            </table>
          </li>
          <li><b>Weak consensus.</b></li>
          <li><b>SA: Use of the ISO/IEC 10646 document benefits from
              ISO governance.</b></li>
          <li><b>SA: Would prefer to explore expansion of ISO/IEC 10646 to
              include more components of Unicode.</b></li>
        </ul>
      </li>
      <li>Hubert indicated he might work with his NB to raise comments on the
          next ballot of ISO/IEC 10646 to request that it expand its scope.</li>
      <li>MarkZ suggested that quality issues could also be reported to the
          Unicode Consortium.</li>
      <li>MarkZ noted that interoperation with other languages and runtimes
          might be improved by aligning with the Unicode Standard.</li>
      <li><b>Poll 4: [FR-010-133][FR-021-013]: SG16 recommends resolving these
          comments by restricting all references to the Unicode Standard to the
          version that corresponds to the referenced version of
          ISO/IEC 10646.</b>
        <ul>
          <li><b>Attendees: 9 (1 abstention)</b></li>
          <li>
            <table>
              <tr>
                <th style="text-align:right">SF</th>
                <th style="text-align:right">F</th>
                <th style="text-align:right">N</th>
                <th style="text-align:right">A</th>
                <th style="text-align:right">SA</th>
              </tr>
              <tr>
                <th style="text-align:right">2</th>
                <th style="text-align:right">3</th>
                <th style="text-align:right">0</th>
                <th style="text-align:right">3</th>
                <th style="text-align:right">0</th>
              </tr>
            </table>
          </li>
          <li><b>No consensus.</b></li>
          <li><b>A: It doesn't benefit the community to reference a Unicode
              version that is outdated by the time the standard is
              published.</b></li>
        </ul>
      </li>
      <li>Steve suggested that it might be helpful to explore different
          guarantees for core language vs the standard library.</li>
      <li>Hubert agreed that it is conceivable that use of different Unicode
          Standard versions for the core language and the standard library
          would be ok.</li>
    </ul>
  </li>
  <li>Tom reported that the next meeting will take place on November 30th.</li>
</ul>


<h1 id="2022_11_30">November 30th, 2022</h1>

<h2>Draft agenda:</h2>

<ul>
  <li><a href="https://wg21.link/p2713r0">P2713R0: Escaping improvements in std::format</a>
    <ul>
      <li><a href="https://github.com/cplusplus/nbballot/issues/515">US 38-098 22.14.6.4p1 [format.string.escaped] Escaping for debugging and logging</a></li>
      <li><a href="https://github.com/cplusplus/nbballot/issues/408">FR 005-134 22.14.6.4 [format.string.escaped] Aggressive escaping</a></li>
    </ul>
  </li>
  <li><a href="https://wg21.link/p2693r0">P2693R0: Formatting thread::id and stacktrace</a>
    <ul>
      <li><a href="https://github.com/cplusplus/nbballot/issues/410">FR-008-011 22.14 [format] Support formatting of thread::id</a></li>
    </ul>
  </li>
  <li><a href="https://github.com/cplusplus/nbballot/issues/412">FR-010-133 [Bibliography] Unify references to Unicode</a>
      and<br/>
      <a href="https://github.com/cplusplus/nbballot/issues/423">FR-021-013 5.3p5.2 [lex.charset] Codepoint names in identifiers</a></li>
  <li><a href="https://wg21.link/p2675r0">P2675R0: LWG3780: The Paper (format's width estimation is too approximate and not forward compatible)</a>
    <ul>
      <li><a href="https://cplusplus.github.io/LWG/issue3780">LWG #3780: format's width estimation is too approximate and not forward compatible</a></li>
      <li><a href="https://github.com/cplusplus/nbballot/issues/409">FR-007-012 22.14.2.2 [format.string.std] codepoints with width 2</a></li>
    </ul>
  </li>
  <li><a href="https://github.com/cplusplus/nbballot/issues/422">FR-020-014 5.3 [lex.charset] Replace "translation character set" by "Unicode"</a></li>
</ul>

<h2>Attendees:</h2>

<ul>
  <li>Charles Barto</li>
  <li>Corentin Jabot</li>
  <li>Jens Maurer</li>
  <li>Mark de Wever</li>
  <li>Mark Zeren</li>
  <li>Nathan Owen</li>
  <li>Peter Brett</li>
  <li>Tom Honermann</li>
  <li>Victor Zverovich</li>
  <li>Zach Laine</li>
</ul>

<h2>Meeting summary:</h2>

<ul>
  <li><a href="https://wg21.link/p2713r0">P2713R0: Escaping improvements in std::format</a>:
    <ul>
      <li>Tom reported that the paper implements the previous guidance
          provided for
          <a href="https://github.com/cplusplus/nbballot/issues/515">US 38-098</a>
          during the
          <a href="https://github.com/sg16-unicode/sg16-meetings#october-19th-2022">2022-10-19 SG16 telecon</a>
          and for
          <a href="https://github.com/cplusplus/nbballot/issues/408">FR 005-134</a>
          during the
          <a href="https://github.com/sg16-unicode/sg16-meetings#november-2nd-2022">2022-11-02 SG16 telecon</a>
          so all that should be needed is to confirm the paper via a poll.</li>
      <li>Tom noted that some minor wording feedback was provided in a
          <a href="https://lists.isocpp.org/sg16/2022/11/3586.php">post to the SG16 mailing list</a>.</li>
      <li>Victor presented the paper and further wording review commenced.</li>
      <li><b>Poll 1: P2713R0: Forward to LEWG as the recommended resolution of
          US 38-098 and FR 005-134 amended with discussed wording changes.</b>
        <ul>
          <li><b>Attendees: 10</b></li>
          <li><b>No objection to unanimous consent.</b></li>
        </ul>
      </li>
    </ul>
  </li>
  <li><a href="https://wg21.link/p2693r0">P2693R0: Formatting thread::id and stacktrace</a>:
    <ul>
      <li>Corentin provided an introduction.</li>
      <li>Victor reported Bryce's rationale for SG16 review; there were
          questions about wide string support.</li>
      <li>Victor noted that the <tt>ostream</tt> inserters for
          <tt>stacktrace_entry</tt> and <tt>basic_stacktrace</tt> do not
          support wide ostreams, so the lack of support for
          <tt>std::format</tt> is consistent.</li>
      <li>Corentin stated that there is no guarantee that
          <tt>std::thread::id</tt> will be formatted consistently for
          <tt>char</tt> and <tt>wchar_t</tt>.</li>
      <li>Jens, referring to the proposed [stacktrace.format] wording, noted
          that "must" is not allowed in normative wording.</li>
      <li>Victor asked what should be used instead.</li>
      <li>Tom suggested "mandates" or "requires".</li>
      <li>Victor explained that the wording intent is that a non-empty
          <i>format-spec</i> render the program ill-formed when the format
          string is checked at compile time and result in a
          <tt>format_error</tt> exception when it is evaluated at
          run time.</li>
      <li>Jens suggested wording the requirements in terms of format string
          validity.</li>
      <li>Charles noted that a thread ID is a handle on Windows.</li>
      <li>Charles stated that his only concern is whether additional header
          inclusions might be required, and that the proposal otherwise looks
          fine.</li>
      <li>Jens suggested dropping the
          "The syntax of format specifications is as follows"
          sentence in the wording for
          <tt>formatter&lt;thread::id, charT&gt;</tt>.</li>
      <li>Tom stated that any changes to require wide character support for
          stacktrace or consistent text representation for
          <tt>std::thread::id</tt> would be out of scope.</li>

      <li><b>Poll 2: P2693R0: Forward to LEWG as the recommended resolution of
          FR-008-011.</b>
        <ul>
          <li><b>Attendees: 10</b></li>
          <li><b>No objection to unanimous consent.</b></li>
        </ul>
      </li>
    </ul>
  </li>
  <li><a href="https://github.com/cplusplus/nbballot/issues/412">FR-010-133 [Bibliography] Unify references to Unicode</a>
      and<br/>
      <a href="https://github.com/cplusplus/nbballot/issues/423">FR-021-013 5.3p5.2 [lex.charset] Codepoint names in identifiers</a>:
    <ul>
      <li>Corentin explained that authoring a paper to address these NB
          comments is on his to-do list.</li>
      <li>Corentin invited offers to help with a paper.</li>
      <li>Jens stated that it will be important to understand how the change
          to the normative reference impacts how wording is interpreted
          throughout the standard.</li>
    </ul>
  </li>
  <li><a href="https://wg21.link/p2675r0">P2675R0: LWG3780: The Paper (format's width estimation is too approximate and not forward compatible)</a>:
    <ul>
      <li>Corentin provided an introduction.
        <ul>
          <li>Victor had initially identified a set of code point ranges
              whose characters are considered to have an estimated width of
              two.</li>
          <li>Those code point ranges correspond to Unicode 13 and have not
              been updated for more recent Unicode Standard versions.</li>
          <li>Analysis of source code and behavior in existing terminals
              inspired the current proposal to derive the code point ranges
              from the Unicode character property database.</li>
        </ul>
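        <p>For reference, the approach that P2675R0 proposes to replace can be
        sketched as a fixed table lookup. The ranges below are transcribed from
        our reading of the working draft wording and should be checked against
        [format.string.std]; the sketch also simplifies by summing every code
        point rather than the first code point of each extended grapheme
        cluster, and the function names are hypothetical. U+2E9A, Victor's
        example, falls inside the U+2E80-U+303E range and is therefore
        estimated as wide.</p>

```cpp
#include <cassert>
#include <string>

struct WideRange { char32_t first, last; };

// Code point ranges estimated as width 2 (verify against the draft).
const WideRange wide_ranges[] = {
    {0x1100, 0x115F},   {0x2329, 0x232A},   {0x2E80, 0x303E},
    {0x3040, 0xA4CF},   {0xAC00, 0xD7A3},   {0xF900, 0xFAFF},
    {0xFE10, 0xFE19},   {0xFE30, 0xFE6F},   {0xFF00, 0xFF60},
    {0xFFE0, 0xFFE6},   {0x1F300, 0x1F64F}, {0x1F900, 0x1F9FF},
    {0x20000, 0x2FFFD}, {0x30000, 0x3FFFD},
};

// Estimated width of one code point: 2 inside a listed range, otherwise 1.
int code_point_width(char32_t c) {
  for (const WideRange& r : wide_ranges)
    if (c >= r.first && c <= r.last) return 2;
  return 1;
}

// Simplified: sums every code point instead of only the first code point
// of each extended grapheme cluster.
int estimated_width(const std::u32string& s) {
  int width = 0;
  for (char32_t c : s) width += code_point_width(c);
  return width;
}
```

        <p>P2675R0 instead derives the ranges from the Unicode character
        property database, which is why the <tt>East_Asian_Width</tt> value of
        an unassigned code point such as U+2E9A becomes relevant.</p>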
      </li>
      <li>Victor expressed mixed feelings regarding the proposal; though the
          idea is favorable, consulted sources indicate that the Unicode
          properties don't predict how characters are displayed particularly
          well.</li>
      <li>Victor indicated support for considering the Unicode width
          property, but stated that code point ranges that are ambiguous
          should be retained.</li>
      <li>Victor stated that all of the code points that change from an
          estimated width of two to an estimated width of one are rendered
          with a width of two in his environment, so those cases appear to
          constitute a regression.</li>
      <li>Victor acknowledged that the proposal looks like a good step in the
          right direction.</li>
      <li>Victor raised U+2E9A as an example; it is an unassigned character in
          a block for which characters are assumed to have a width of two and
          it is rendered as a wide unassigned character.</li>
      <li><em>[ Editor's note: In Unicode 15.0,
          <a href="https://util.unicode.org/UnicodeJsps/character.jsp?a=%E2%BA%9A&B1=Show">U+2E9A</a>
          is a reserved unassigned character in the
          <a href="https://www.unicode.org/charts/PDF/U2E80.pdf">CJK Radicals Supplement block</a>
          and its <tt>East_Asian_Width</tt> property value is <tt>N</tt>
          (Neutral). ]</em></li>
      <li>Corentin replied that terminals that display such characters as wide
          characters are non-conforming.</li>
      <li>Corentin argued that use of the Unicode character database is
          justified by the lack of anything obviously better.</li>
      <li>Corentin asserted that estimated width is necessarily an approximation
          at present.</li>
      <li>Corentin stated his goal with the proposal is to prioritize a
          principled solution with predictability.</li>
      <li>Zach observed that there appear to be some contradictions and pondered
          how they might be resolved.
        <ul>
          <li>There is a desire to be forward compatible and to defer to the
              Unicode Standard.</li>
          <li>There is a desire to consider certain unassigned code points as
              wide until they are assigned a width by the Unicode Standard.</li>
        </ul>
      </li>
      <li>Corentin stated that existing behavior should be evaluated before
          choosing to deviate from Unicode.</li>
      <li>PBrett observed that behavior diverges widely between different
          rasterizers and stated that he does not relish the idea of
          identifying the subset of behavior that is exhibited in the
          wild.</li>
      <li>Victor expressed skepticism regarding the feasibility of relying only
          on Unicode.</li>
      <li>Victor stated that Unicode conformance doesn't apply to this
          situation.</li>
      <li>Victor cautioned that the traditional Windows console behavior should
          not be used as a reference as it exhibits notoriously poor
          behavior.</li>
      <li>Corentin indicated an intent to update the paper with references to
          the scripts used to collect data and evaluate behavior.</li>
    </ul>
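    <p>The kind of fixed range table discussed above can be illustrated with
    a minimal C++ sketch. It assumes the ranges listed in the C++20
    [format.string.std] wording (verify them against the working draft before
    relying on them); the names <tt>cp_range</tt>, <tt>wide_ranges</tt>, and
    <tt>estimated_width</tt> are hypothetical. Under this table, U+2E9A falls
    in U+2E80 through U+303E and is estimated as wide even though its
    <tt>East_Asian_Width</tt> value is <tt>N</tt>.</p>

```cpp
// Hypothetical sketch of a fixed-range width estimate. The ranges are
// assumed to match the C++20 [format.string.std] list; a code point in one
// of them has estimated width 2, and every other code point has width 1.
struct cp_range {
  char32_t first;
  char32_t last;  // inclusive upper bound
};

constexpr cp_range wide_ranges[] = {
    {0x1100, 0x115F},   {0x2329, 0x232A},   {0x2E80, 0x303E},
    {0x3040, 0xA4CF},   {0xAC00, 0xD7A3},   {0xF900, 0xFAFF},
    {0xFE10, 0xFE19},   {0xFE30, 0xFE6F},   {0xFF00, 0xFF60},
    {0xFFE0, 0xFFE6},   {0x1F300, 0x1F64F}, {0x1F900, 0x1F9FF},
    {0x20000, 0x2FFFD}, {0x30000, 0x3FFFD},
};

// Linear scan over the table; returns 2 for a listed code point, else 1.
constexpr int estimated_width(char32_t cp) {
  for (const cp_range& r : wide_ranges) {
    if (cp >= r.first && cp <= r.last) {
      return 2;
    }
  }
  return 1;
}
```

    <p>The approach proposed in the paper would instead derive these ranges
    from the Unicode Character Database, so that they track the Unicode
    version in use rather than remaining fixed at Unicode 13.</p>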
  </li>
  <li><a href="https://github.com/cplusplus/nbballot/issues/422">FR-020-014 5.3 [lex.charset] Replace "translation character set" by "Unicode"</a>:
    <ul>
      <li>Discussion was postponed due to lack of time.</li>
    </ul>
  </li>
  <li>Tom reported that the next meeting will take place on
      December 14th, 2022.</li>
</ul>


<h1 id="2022_12_14">December 14th, 2022</h1>

<h2>Draft agenda:</h2>

<ul>
  <li><a href="https://wg21.link/p2675r1">D2675R1: LWG3780: The Paper (format's width estimation is too approximate and not forward compatible)</a>
    <ul>
      <li><a href="https://cplusplus.github.io/LWG/issue3780">LWG #3780: format's width estimation is too approximate and not forward compatible</a></li>
      <li><a href="https://github.com/cplusplus/nbballot/issues/409">FR-007-012 22.14.2.2 [format.string.std] codepoints with width 2</a></li>
    </ul>
  </li>
  <li><a href="https://github.com/cplusplus/nbballot/issues/422">FR-020-014 5.3 [lex.charset] Replace "translation character set" by "Unicode"</a></li>
</ul>

<h2>Attendees:</h2>

<ul>
  <li>Charlie Barto</li>
  <li>Corentin Jabot</li>
  <li>Jens Maurer</li>
  <li>Mark de Wever</li>
  <li>Peter Brett</li>
  <li>Tom Honermann</li>
  <li>Victor Zverovich</li>
</ul>

<h2>Meeting summary:</h2>

<ul>
  <li><a href="https://wg21.link/p2675r1">D2675R1: LWG3780: The Paper (format's width estimation is too approximate and not forward compatible)</a>:
    <ul>
      <li><em>[ Editor's note: D2675R1 was the active paper under discussion at
          the telecon.
          The agenda and links used here reference P2675R1 since the links to
          the draft paper were ephemeral.
          The published document may differ from the reviewed draft revision.
          ]</em></li>
      <li>PBrett summarized the changes in the draft R1 revision.</li>
      <li>Corentin summarized an
          <a href="https://lists.isocpp.org/sg16/2022/12/3623.php">email sent by Victor</a>
          that demonstrated behavior in which a wide character was rendered such
          that it overlapped an adjacent character because the terminal treated
          the character as a narrow one but the font in use rendered it as a
          wide character.</li>
      <li>Corentin pointed out that the demonstrated behavior implies that
          character width cannot be determined by looking at a rendered
          character in isolation since the character rendering may exceed the
          bounds of a terminal cell.</li>
      <li>Victor acknowledged it was a mistake to categorize the relevant
          characters as having a width of 2; the initial error was due to
          observing the rendered character without an adjacent character.</li>
      <li>Victor expressed appreciation for the systematic approach proposed in
          the paper and noted that it appears to improve behavior.</li>
      <li>Victor stated that it is difficult to interpret the screenshots
          currently in the paper.</li>
      <li>PBrett suggested that it might be helpful to provide more constructive
          feedback to paper authors regarding how presentation can be
          improved.</li>
      <li>Corentin explained that he had asked for contributions of screenshots
          from others since he did not have convenient access to the wide range
          of terminals that are used in practice.</li>
      <li>Corentin reported that rendering issues that occur with just one or a
          small subset of terminals are common and asserted that we should not
          concern ourselves with such cases.</li>
      <li>Corentin stated that he has not found cases that are contrary to the
          proposal and that have consistent behavior across the sampled
          terminals.</li>
      <li>Victor, referring to an
          <a href="https://lists.isocpp.org/sg16/2022/12/3621.php">email that Tom sent to the SG16 mailing list</a>,
          reported having performed some further analysis with the attached
          source code and provided some constructive feedback.</li>
      <li><em>[ Editor's note: The mailing list software appears to have
          ignored, misplaced, or otherwise omitted the source code that was
          attached to that email. ]</em></li>
      <li>Tom stated that we could spend additional time discussing the pros
          and cons of the screenshots but that doing so might not be a good use
          of our time.</li>
      <li>Corentin opined that it would not be a good use of our time and
          agreed to remove most of the screenshots.</li>
      <li>Jens summarized his understanding of the paper: the standard
          currently specifies explicit code point ranges, and the paper
          proposes changes to better align behavior with various
          terminals.</li>
      <li>Tom voiced agreement.</li>
      <li>Jens expressed concern that it is late in the release cycle for such
          changes.</li>
      <li>PBrett replied that this addresses a defect.</li>
      <li>Corentin noted that
          <a href="https://cplusplus.github.io/LWG/issue3780">LWG issue 3780</a>
          already exists.</li>
      <li>Tom explained that we can choose between recommending this as a
          change for C++23 or as a DR to be addressed in C++26.</li>
      <li>PBrett expressed a preference for addressing this in C++23.</li>
      <li>Victor noted that there already is consensus that width estimation is
          best effort and likely to change in the future.</li>
      <li>Victor stated that there is not an urgent need to rush this into
          C++23 but that we might as well add it now if we agree the paper is
          ready.</li>
      <li>Corentin explained that his motivation for targeting C++23 is to
          ensure that behavior varies as expected with whatever Unicode version
          is in use by an implementation.</li>
      <li>Corentin noted that the situation will grow worse over time as the
          explicit code point ranges in the standard deviate further from
          existing practice as that practice changes with new Unicode
          releases.</li>
      <li><b>Poll 0.1: Forward D2675R1
          "format's width estimation is too approximate and not forward compatible",
          with improved presentation, to LEWG as the recommended resolution of
          LWG3780 and NB comment FR-007-012.</b>
        <ul>
          <li><b>Attendees: 6</b></li>
          <li>
            <table>
              <tr>
                <th style="text-align:right">SF</th>
                <th style="text-align:right">F</th>
                <th style="text-align:right">N</th>
                <th style="text-align:right">A</th>
                <th style="text-align:right">SA</th>
              </tr>
              <tr>
                <th style="text-align:right">3</th>
                <th style="text-align:right">3</th>
                <th style="text-align:right">0</th>
                <th style="text-align:right">0</th>
                <th style="text-align:right">0</th>
              </tr>
            </table>
          </li>
          <li><b>Unanimous consent.</b></li>
        </ul>
      </li>
      <li><b>Poll 0.2: Recommend that D2675R1 be applied to the C++23 working
          paper.</b>
        <ul>
          <li><b>Attendees: 6</b></li>
          <li>
            <table>
              <tr>
                <th style="text-align:right">SF</th>
                <th style="text-align:right">F</th>
                <th style="text-align:right">N</th>
                <th style="text-align:right">A</th>
                <th style="text-align:right">SA</th>
              </tr>
              <tr>
                <th style="text-align:right">2</th>
                <th style="text-align:right">4</th>
                <th style="text-align:right">0</th>
                <th style="text-align:right">0</th>
                <th style="text-align:right">0</th>
              </tr>
            </table>
          </li>
          <li><b>Unanimous consent.</b></li>
        </ul>
      </li>
    </ul>
  </li>
  <li><a href="https://github.com/cplusplus/nbballot/issues/422">FR-020-014 5.3 [lex.charset] Replace "translation character set" by "Unicode"</a>:
    <ul>
      <li>Tom asked what new information has become available since we last
          discussed and polled this topic during the
          <a href="https://github.com/sg16-unicode/sg16-meetings/blob/master/README-2021.md#march-24th-2021">2021-03-24 SG16 meeting</a>.</li>
      <li>PBrett responded that the existence of an NB comment may constitute
          new information.</li>
      <li>Corentin stated that removal of the "translation character set" term
          will require addressing the imprecise use of the term
          "character".</li>
      <li>Corentin reported that the Unicode Standard states that an unassigned
          character must not be treated as a character and that treating one as
          such could be a Unicode conformance concern.</li>
      <li>Corentin requested an indication of support for this direction before
          devoting the considerable time that drafting a paper would
          require.</li>
      <li>Jens noted that we don't claim conformance with the Unicode Standard;
          we only use it as a reference.</li>
      <li>Tom opined that the current use of "character" does not constitute a
          Unicode conformance concern.</li>
      <li>Tom asserted that a paper to address the imprecise use of "character"
          would be quite valuable regardless of any changes with respect to
          "translation character set".</li>
      <li>PBrett expressed support for making changes with regard to
          "translation character set" either in C++23 or sometime after the use
          of "character" is addressed.</li>
      <li>Corentin noted that the Unicode Standard intentionally does not
          define "character".</li>
      <li>Corentin indicated that the paper he would write would address the
          core language, but not the standard library, since addressing both
          would require a very significant effort.</li>
      <li>PBrett asked if these changes could be done editorially.</li>
      <li>Jens replied that there is potential for friction with the C standard
          since it also uses the term "character".</li>
      <li>Tom reported that Ken Whistler recommended reviewing
          <a href="https://www.unicode.org/reports/tr17">UTR #17 (Unicode Character Encoding Model)</a>
          for terminology to use.</li>
      <li>Corentin replied that he would review it.</li>
      <li>Corentin noted that, after translation phase 1, the elements of the
          translation character set are all Unicode scalar values because
          surrogate code points are not allowed and asked what terminology
          should be used.</li>
      <li>Tom replied that, in an offline discussion, he had suggested to
          Corentin that we prefer "code point" in general discussion and
          reserve "scalar value" for use as a qualifier that restricts which
          code points are allowed.</li>
      <li>Jens requested a paper that describes the desired end state before
          considerable effort is put into producing wording.</li>
      <li>PBrett replied that doing so implies rejecting the NB comment.</li>
      <li>Jens replied that, without a paper, rejection is the only option as
          there can be no consensus for a specific change.</li>
      <li>Tom noted that there is very little time left for making changes to
          C++23.</li>
      <li><b>Poll 1.1: Encourage further work on expressing the semantics of
          C++ lexing in terms of the terminology defined in the Unicode
          Standard.</b>
        <ul>
          <li><b>Attendees: 6</b></li>
          <li>
            <table>
              <tr>
                <th style="text-align:right">SF</th>
                <th style="text-align:right">F</th>
                <th style="text-align:right">N</th>
                <th style="text-align:right">A</th>
                <th style="text-align:right">SA</th>
              </tr>
              <tr>
                <th style="text-align:right">4</th>
                <th style="text-align:right">1</th>
                <th style="text-align:right">0</th>
                <th style="text-align:right">1</th>
                <th style="text-align:right">0</th>
              </tr>
            </table>
          </li>
          <li><b>Strong consensus.</b></li>
          <li><b>A: I'm concerned about interaction with the C standard and
              introducing inconsistency between core wording and library
              wording.</b></li>
        </ul>
      </li>
    </ul>
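    <p>The distinction between code points and scalar values raised above can
    be stated precisely with a minimal C++ sketch. The helper names are
    hypothetical; the sketch treats a code point as any value from U+0000
    through U+10FFFF and a Unicode scalar value as a code point that is not a
    surrogate (U+D800 through U+DFFF), which is how "scalar value" would act
    as a qualifier that narrows the set of allowed code points.</p>

```cpp
// Hypothetical helpers illustrating the terminology discussed above.

// Any value in the Unicode codespace, U+0000..U+10FFFF.
constexpr bool is_code_point(char32_t c) { return c <= 0x10FFFF; }

// Surrogate code points, U+D800..U+DFFF, reserved for UTF-16.
constexpr bool is_surrogate(char32_t c) {
  return c >= 0xD800 && c <= 0xDFFF;
}

// A scalar value is a code point that is not a surrogate.
constexpr bool is_scalar_value(char32_t c) {
  return is_code_point(c) && !is_surrogate(c);
}
```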
  </li>
  <li><a href="https://wg21.link/p2736r0">D2736R0: Referencing the Unicode Standard</a>:
    <ul>
      <li><em>[ Editor's note: D2736R0 was the active paper under discussion at
          the telecon.
          The agenda and links used here reference P2736R0 since the links to
          the draft paper were ephemeral.
          The published document may differ from the reviewed draft revision.
          ]</em></li>
      <li>Corentin noted that the previous feedback was to try to ensure that
          the change of reference would have no normative impact on
          behavior.</li>
      <li>Corentin explained that there is a design question regarding the
          <tt>__STDC_ISO_10646__</tt> predefined macro; the macro is specified
          by the C standard as having a value that reflects the date of an
          ISO/IEC 10646 standard.</li>
      <li>Corentin reported that there are known issues with the macro;
          compilers can't predefine it because the value to define it to is
          determined by the C standard library.</li>
      <li>Corentin stated that the macro is only useful to distinguish between
          old 16-bit Unicode and modern 21-bit Unicode.</li>
      <li>Corentin suggested that the C++ standard could specify it to have an
          implementation-defined value like it does for
          <tt>__STDC_VERSION__</tt>.</li>
      <li>Corentin suggested another alternative would be to specify it as
          having a Unicode version date instead.</li>
      <li>PBrett suggested specifying it to have a value that matters.</li>
      <li>Corentin explained that implementations that use a 16-bit
          <tt>wchar_t</tt> can't define this macro to any relevant Unicode or
          ISO/IEC 10646 standard.</li>
      <li>Jens replied that in those cases, he would expect the macro to be
          defined for the last ISO/IEC 10646 standard that had a 16-bit code
          point space.</li>
      <li>Jens suggested the value should just reflect the size of
          <tt>wchar_t</tt>.</li>
      <li>Corentin noted that the macro also reflects whether values of
          <tt>wchar_t</tt> correspond to a Unicode encoding, which could be
          locale dependent.</li>
      <li>Tom summarized three possibilities:
        <ul>
          <li><tt>wchar_t</tt> has an associated encoding that is not a
              Unicode encoding; the macro is not defined.</li>
          <li><tt>wchar_t</tt> is 16-bit and the associated encoding is
              UCS-2; the macro is defined to reflect an obsolete
              ISO/IEC 10646 standard.</li>
          <li><tt>wchar_t</tt> is 32-bit and the associated encoding is
              UTF-32; the macro is defined to reflect a relatively current
              ISO/IEC 10646 standard.</li>
        </ul>
      </li>
      <li>Jens opined that this requires coordination with WG14.</li>
      <li>PBrett asked if we can deprecate the macro.</li>
      <li>Jens replied that we can choose to deviate from the C standard but
          noted that the macro can be useful.</li>
      <li>PBrett asked about Corentin's previous suggestion to just state that
          the macro has an implementation-defined value.</li>
      <li>Jens opined that the macro has some value.</li>
      <li>Jens noted that the C++ standard has library wording that states that
          all elements of the wide character set are representable as values of
          <tt>wchar_t</tt> and that the presence of the macro definition in
          core wording is suggestive of applicability to wide character and
          string literals.</li>
      <li>Tom suggested some compare and contrast analysis with the C
          standard.</li>
      <li>Corentin stated that it isn't clear to him that WG14 knows what this
          macro is intended for.</li>
      <li>Corentin pondered deprecation, but not as a part of this paper.</li>
      <li>Corentin reported that code searches revealed few references to the
          macro that are sensitive to the macro value; most code just checks
          if the macro is defined.</li>
    </ul>
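    <p>Tom's three cases can be sketched in C++ as follows. This is an
    illustration under stated assumptions, not proposed wording: the
    enumeration and function names are hypothetical, the classification looks
    only at whether <tt>__STDC_ISO_10646__</tt> is defined and at the width of
    <tt>wchar_t</tt> (mirroring the observation that most code only checks
    whether the macro is defined), and it cannot capture the locale dependence
    Corentin mentioned.</p>

```cpp
#include <climits>  // CHAR_BIT
#include <cstddef>  // std::size_t

// Hypothetical classification of the three cases summarized above.
enum class wchar_encoding_kind {
  not_unicode,  // __STDC_ISO_10646__ not defined: no Unicode guarantee
  ucs2,         // defined, but wchar_t is too narrow for every code point
  utf32,        // defined, and wchar_t can hold every code point
};

// A wchar_t narrower than 21 bits cannot represent every code point
// (the codespace ends at U+10FFFF).
constexpr wchar_encoding_kind classify(bool macro_defined,
                                       std::size_t wchar_bits) {
  return !macro_defined    ? wchar_encoding_kind::not_unicode
         : wchar_bits < 21 ? wchar_encoding_kind::ucs2
                           : wchar_encoding_kind::utf32;
}

// Only the presence of the macro is examined, not its yyyymmL value.
#if defined(__STDC_ISO_10646__)
constexpr bool stdc_iso_10646_defined = true;
#else
constexpr bool stdc_iso_10646_defined = false;
#endif

// Applies the classification to the implementation compiling this code.
constexpr wchar_encoding_kind this_implementation() {
  return classify(stdc_iso_10646_defined, sizeof(wchar_t) * CHAR_BIT);
}
```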
  </li>
  <li>Tom announced that the next two telecons are scheduled for 2023-01-11
      and 2023-01-25 and will be followed by the WG21 meeting in Issaquah in
      early February.</li>
</ul>


</body>
