<!doctype html public "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">

<head>

<title>char8_t: A type for UTF-8 characters and strings (Revision 1)</title>

<link rel="stylesheet"
      href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/default.min.css"/>
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
<script>hljs.initHighlightingOnLoad();</script>

<style type="text/css">
pre {
    display: inline;
}

table#header th,
table#header td
{
    text-align: left;
}
table#references th,
table#references td
{
    vertical-align: top;
}

ins, ins * { text-decoration:none; font-weight:bold; background-color:#A0FFA0 }
del, del * { text-decoration:line-through; background-color:#FFA0A0 }
#hidedel:checked ~ * del, #hidedel:checked ~ * del * { display:none; visibility:hidden }

blockquote
{
    color: #000000;
    background-color: #F1F1F1;
    border: 1px solid #D1D1D1;
    padding-left: 0.5em;
    padding-right: 0.5em;
}
blockquote.stdins
{
    text-decoration: underline;
    color: #000000;
    background-color: #C8FFC8;
    border: 1px solid #B3EBB3;
    padding: 0.5em;
}
blockquote.stddel
{
    text-decoration: line-through;
    color: #000000;
    background-color: #FFEBFF;
    border: 1px solid #ECD7EC;
    padding-left: 0.5em;
    padding-right: 0.5em;
}
</style>

</head>


<body>

<table id="header">
  <tr>
    <th>Document Number:</th>
    <td>P0482R1</td>
  </tr>
  <tr>
    <th>Date:</th>
    <td>2018-02-12</td>
  </tr>
  <tr>
    <th>Audience:</th>
    <td>Evolution Working Group<br/>
        Library Evolution Working Group</td>
  </tr>
  <tr>
    <th>Reply-to:</th>
    <td>Tom Honermann &lt;tom@honermann.net&gt;</td>
  </tr>
</table>

<h1>char8_t: A type for UTF-8 characters and strings (Revision 1)</h1>

<ul>
  <li><a href="#changes_since_P0482R0">
      Changes since P0482R0</a></li>
  <li><a href="#introduction">
      Introduction</a></li>
  <li><a href="#motivation">
      Motivation</a></li>
  <li><a href="#proposal">
      Proposal</a></li>
  <li><a href="#design">
      Design Considerations</a>
    <ul>
      <li><a href="#design_compat">
          Backward compatibility
          </a>
        <ul>
          <li><a href="#design_compat_core">
              Core language backward compatibility
              </a>
            <ul>
              <li><a href="#design_compat_core_init">
                  Initialization
                  </a></li>
              <li><a href="#design_compat_core_implicit_conversion">
                  Implicit conversions
                  </a></li>
              <li><a href="#design_compat_core_type_deduction">
                  Type deduction
                  </a></li>
              <li><a href="#design_compat_core_overload_resolution">
                  Overload resolution
                  </a></li>
              <li><a href="#design_compat_core_template_specialization">
                  Template specialization
                  </a></li>
            </ul>
          </li>
          <li><a href="#design_compat_library">
              Library backward compatibility
              </a>
            <ul>
              <li><a href="#design_compat_library_u8string">
                  Return type of <tt>path::u8string</tt> and <tt>path::generic_u8string</tt>
                  </a></li>
              <li><a href="#design_compat_library_literal_operators">
                  Return type of <tt>operator ""s</tt> and <tt>operator ""sv</tt>
                  </a></li>
            </ul>
          </li>
        </ul>
      </li>
      <li><a href="#design_narrow_utf8">
          Should UTF-8 literals continue to be referred to as narrow literals?
          </a></li>
      <li><a href="#design_char8_t_underlying_type">
          What should be the underlying type of char8_t?
          </a></li>
    </ul>
  </li>
  <li><a href="#implementation_exp">
      Implementation Experience</a></li>
  <li><a href="#wording">
      Formal Wording</a>
    <ul>
      <li><a href="#core_wording">
          Core Wording</a></li>
      <li><a href="#library_wording">
          Library Wording</a></li>
      <li><a href="#feature_testing">
          Wording for P0096: Feature-testing recommendations for C++</a></li>
    </ul>
  </li>
  <li><a href="#acknowledgements">
      Acknowledgements</a></li>
  <li><a href="#references">
      References</a></li>
</ul>

<h1 id="changes_since_P0482R0">Changes since P0482R0</h1>

<ul>
  <li>Added the <a href="#proposal">Proposal</a> section summarizing the
      proposed changes.</li>
  <li>Rewrote most of the <a href="#design">Design Considerations</a> section.</li>
  <li>Updated the <a href="#implementation_exp">Implementation Experience</a>
      section; an implementation is now available in a fork of gcc.</li>
  <li>Added <a href="#feature_testing">wording for feature-test macros</a>.</li>
  <li>Rebased the proposed wording on
      <a title="Working Draft, Standard for Programming Language C++"
         href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4713.pdf">
      N4713</a>
      <sup><a title="Working Draft, Standard for Programming Language C++"
              href="#ref_n4713">
      [N4713]</a></sup>.</li>
  <li>Updated core wording for 5.13.5 to define <em>UTF-8 string literal</em>
      before referring to it.
  </li>
</ul>

<h1 id="introduction">Introduction</h1>

<p>C++11 introduced support for UTF-8, UTF-16, and UTF-32 encoded string
literals via
<a title="N2249: New Character Types in C++"
   href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2249.html">
N2249</a>
<sup><a title="N2249: New Character Types in C++"
        href="#ref_n2249">
[N2249]</a></sup>.
New <tt>char16_t</tt> and <tt>char32_t</tt> types were added to hold values of
code units for the UTF-16 and UTF-32 variants, but a new type was not added for
the UTF-8 variants.  Instead, UTF-8 character literals (added in C++17 via
<a title="N4197: Adding u8 character literals"
   href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4197.html">
N4197</a>
<sup><a title="N4197: Adding u8 character literals"
        href="#ref_n4197">
[N4197]</a></sup>)
and UTF-8 string literals were defined in terms of the <tt>char</tt> type used
for the code unit type of ordinary character and string literals.  UTF-8 is the
only text encoding mandated to be supported by the C++ standard for which there
is no distinct code unit type.  Lack of a distinct type for UTF-8 encoded
character and string literals prevents the use of overloading and template
specialization in interfaces designed for interoperability with encoded text.
The inability to infer an encoding for narrow characters and strings limits
design possibilities and hinders the production of elegant interfaces that work
seamlessly in generic code.  Library authors must choose to limit encoding
support, design interfaces that require users to explicitly specify encodings,
or provide distinct interfaces for, at least, the implementation defined
execution and UTF-8 encodings.</p>

<p>Whether <tt>char</tt> is a signed or unsigned type is implementation defined.
Implementations that use an 8-bit signed <tt>char</tt> are at a disadvantage
when working with UTF-8 encoded text because code units must be converted to an
unsigned type in order to correctly process the leading and continuation code
units of multi-byte encoded code points.</p>

<p>The lack of a distinct type and the use of a code unit type with a range that
does not portably include the full unsigned range of UTF-8 code units presents
challenges for working with UTF-8 encoded text that are not present when working
with UTF-16 or UTF-32 encoded text.  Enclosed is a proposal for a new
<tt>char8_t</tt> fundamental type and related library enhancements intended to
remove barriers to working with UTF-8 encoded text and to enable generic
interfaces that work with all five of the standard mandated text encodings in a
consistent manner.</p>

<h1 id="motivation">Motivation</h1>

<p>Consider the following string literal expressions, all of which encode
<tt>U+0123</tt>, <tt>LATIN SMALL LETTER G WITH CEDILLA</tt>:

<fieldset>
<pre><code class="c++">u8"\u0123" // UTF-8:  const char[]:     0xC4 0xA3 0x00
 u"\u0123" // UTF-16: const char16_t[]: 0x0123 0x0000
 U"\u0123" // UTF-32: const char32_t[]: 0x00000123 0x00000000
  "\u0123" // ???:    const char[]:     ???
 L"\u0123" // ???:    const wchar_t[]:  ???
</code></pre>
</fieldset>
</p>
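<p>The code unit values shown in the comments above can be checked at compile
time.  The following is a minimal sketch; the name <tt>utf8_literal</tt> is
introduced only for this illustration, and the conversions to
<tt>unsigned char</tt> are there so that the checks hold whether or not
<tt>char</tt> is signed:

<fieldset>
<pre><code class="c++">// Bind to the literal so its elements can be inspected; the element type is
// char in C++17 and would be char8_t under this proposal.
constexpr auto&amp; utf8_literal = u8"\u0123";

// UTF-8: two code units followed by the terminating null.
static_assert(sizeof(utf8_literal) == 3, "two code units plus the terminator");
static_assert(static_cast&lt;unsigned char&gt;(utf8_literal[0]) == 0xC4, "");
static_assert(static_cast&lt;unsigned char&gt;(utf8_literal[1]) == 0xA3, "");
static_assert(utf8_literal[2] == 0, "");

// UTF-16 and UTF-32: a single code unit each for U+0123.
static_assert(u"\u0123"[0] == 0x0123, "UTF-16 needs a single code unit");
static_assert(U"\u0123"[0] == 0x00000123, "UTF-32 needs a single code unit");
</code></pre>
</fieldset>
</p>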

<p>The UTF-8, UTF-16, and UTF-32 string literals have well-defined and portable
sequences of code unit values.  The ordinary and wide string literal code unit
sequences depend on the implementation defined execution and execution wide
encodings respectively.  Code that is designed to work with text encodings must
be able to differentiate these strings.  This is straightforward for wide,
UTF-16, and UTF-32 string literals since they each have a distinct code unit
type suitable for differentiation via function overloading or template
specialization.  But for ordinary and UTF-8 string literals, differentiating
between them requires additional information since they have the same code unit
type.  That additional information might be provided implicitly via differently
named functions, or explicitly via additional function or template
arguments.  For example:

<fieldset>
<pre><code class="c++">// Differentiation by function name:
void do_x(const char *);
void do_x_utf8(const char *);
void do_x(const wchar_t *);
void do_x(const char16_t *);
void do_x(const char32_t *);

// Differentiation by suffix for user-defined literals:
int operator ""_udl(const char *s, std::size_t);
int operator ""_udl_utf8(const char *s, std::size_t);
int operator ""_udl(const wchar_t *s, std::size_t);
int operator ""_udl(const char16_t *s, std::size_t);
int operator ""_udl(const char32_t *s, std::size_t);

// Differentiation by function parameter:
void do_x2(const char *, bool is_utf8);
void do_x2(const wchar_t *);
void do_x2(const char16_t *);
void do_x2(const char32_t *);

// Differentiation by template parameter:
template&lt;bool IsUTF8&gt;
void do_x3(const char *);
</code></pre>
</fieldset>
</p>

<p>The requirement to specify the text encoding in some way other than through
the type of the string limits the ability to provide elegant encoding sensitive
interfaces.  Consider the following invocations of the
<tt>make_text_view</tt> function proposed in
<a title="P0244R2: Text_view: A C++ concepts and range based character encoding
         and code point enumeration library"
   href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0244r2.html">
P0244R2</a>
<sup><a title="P0244R2: Text_view: A C++ concepts and range based character
               encoding and code point enumeration library"
        href="#ref_p0244r2">
[P0244R2]</a></sup>:

<fieldset>
<pre><code class="c++">make_text_view&lt;execution_character_encoding&gt;("text")
make_text_view&lt;execution_wide_character_encoding&gt;(L"text")
make_text_view&lt;utf8_encoding&gt;(u8"text")
make_text_view&lt;utf16_encoding&gt;(u"text")
make_text_view&lt;utf32_encoding&gt;(U"text")
</code></pre>
</fieldset>
</p>

<p>For each invocation, the encoding of the string literal is known at compile
time, so having to explicitly specify the encoding tag is redundant.  If
UTF-8 string literals had a distinct type, then the encoding type could be
inferred, while still allowing an overriding tag to be supplied:

<fieldset>
<pre><code class="c++">make_text_view("text")   // defaults to execution_character_encoding.
make_text_view(L"text")  // defaults to execution_wide_character_encoding.
make_text_view(u8"text") // defaults to utf8_encoding.
make_text_view(u"text")  // defaults to utf16_encoding.
make_text_view(U"text")  // defaults to utf32_encoding.
make_text_view&lt;utf16be_encoding&gt;("\0t\0e\0x\0t\0")  // Default overridden to select UTF-16BE.
</code></pre>
</fieldset>
</p>
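<p>As a rough illustration of how a distinct code unit type enables this kind
of inference, the sketch below selects a default encoding through ordinary
function overloading.  The <tt>encoding</tt> enumeration and the
<tt>default_encoding</tt> function are hypothetical names invented for this
example and are not part of P0244R2.  The <tt>char8_t</tt> overload is guarded
by the <tt>__cpp_char8_t</tt> feature-test macro, which is assumed here to be
defined only when the new type is available:

<fieldset>
<pre><code class="c++">// Hypothetical encoding tags; illustrative only.
enum class encoding { execution, wide_execution, utf8, utf16, utf32 };

// One overload per code unit type; the encoding follows from the argument type.
constexpr encoding default_encoding(const char*)     { return encoding::execution; }
constexpr encoding default_encoding(const wchar_t*)  { return encoding::wide_execution; }
constexpr encoding default_encoding(const char16_t*) { return encoding::utf16; }
constexpr encoding default_encoding(const char32_t*) { return encoding::utf32; }
#if defined(__cpp_char8_t)
// Only expressible once UTF-8 literals have their own code unit type.
constexpr encoding default_encoding(const char8_t*)  { return encoding::utf8; }
#endif

static_assert(default_encoding("text")  == encoding::execution, "");
static_assert(default_encoding(L"text") == encoding::wide_execution, "");
static_assert(default_encoding(u"text") == encoding::utf16, "");
static_assert(default_encoding(U"text") == encoding::utf32, "");
#if defined(__cpp_char8_t)
static_assert(default_encoding(u8"text") == encoding::utf8, "");
#else
// Without char8_t, a UTF-8 literal is indistinguishable from an ordinary one.
static_assert(default_encoding(u8"text") == encoding::execution, "");
#endif
</code></pre>
</fieldset>
</p>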

<p>The inability to infer an encoding for narrow strings doesn't just limit the
interfaces of new features under consideration.  Compromised interfaces are
already present in the standard library.</p>

<p>Consider the design of the <tt>codecvt</tt> class template.  The standard
specifies that the following specializations of <tt>codecvt</tt> be provided to
enable transcoding text from one encoding to another.

<fieldset>
<pre><code class="c++">codecvt&lt;char, char, mbstate_t&gt;     <em>// #1</em>
codecvt&lt;wchar_t, char, mbstate_t&gt;  <em>// #2</em>
codecvt&lt;char16_t, char, mbstate_t&gt; <em>// #3</em>
codecvt&lt;char32_t, char, mbstate_t&gt; <em>// #4</em>
</code></pre>
</fieldset>
</p>

<p>#1 performs no conversions.  #2 converts between strings encoded in the
implementation defined wide and narrow encodings.  #3 and #4 convert between
either the UTF-16 or UTF-32 encoding and the UTF-8 encoding.  Specializations
are not currently specified for conversion between the implementation defined
narrow and wide encodings and any of the UTF-8, UTF-16, or UTF-32 encodings.
However, if support for such conversions were to be added, the desired
interfaces are already taken by #1, #3 and #4.</p>

<p>The file system interface adopted for C++17 via
<a title="P0218R1: Adopt the File System TS for C++17"
   href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0218r1.html">
P0218R1</a>
<sup><a title="P0218R1: Adopt the File System TS for C++17"
        href="#ref_p0218r1">
[P0218R1]</a></sup>
provides an example of a feature that supports all five of the standard mandated
encodings, but does so with an asymmetric interface due to the inability to
overload functions for UTF-8 encoded strings.  Class
<tt>std::filesystem::path</tt> provides the following constructors to initialize
a <tt>path</tt> object based on a range of code unit values where the encoding
is inferred based on the value type of the range.

<fieldset>
<pre><code class="c++">template &lt;class Source&gt;
path(const Source&amp; source);
template &lt;class InputIterator&gt;
path(InputIterator first, InputIterator last);
</code></pre>
</fieldset>

<p>§ 30.11.7.2.2 [fs.path.type.cvt] describes how the source encoding is
determined based on whether the source range value type is <tt>char</tt>,
<tt>wchar_t</tt>, <tt>char16_t</tt>, or <tt>char32_t</tt>.  A range with value
type <tt>char</tt> is interpreted using the implementation defined execution
encoding.  It is not possible to construct a path object from UTF-8
encoded text using these constructors.

<p>To accommodate UTF-8 encoded text, the file system library specifies the
following factory functions.  Matching factory functions are not provided for
other encodings.

<fieldset>
<pre><code class="c++">template &lt;class Source&gt;
path u8path(const Source&amp; source);
template &lt;class InputIterator&gt;
path u8path(InputIterator first, InputIterator last);
</code></pre>
</fieldset>

<p>The requirement to construct <tt>path</tt> objects using one interface for
UTF-8 strings vs another interface for all other supported encodings creates
unnecessary difficulties for portable code.  Consider an application that uses
UTF-8 as its internal encoding on POSIX systems, but uses UTF-16 on Windows.
Conditional compilation or other abstractions must be implemented and used
in otherwise platform neutral code to construct <tt>path</tt> objects.</p>

<p>The inability to infer an encoding based on string type is not the only
challenge posed by use of <tt>char</tt> as the UTF-8 code unit type.  The
following code exhibits implementation defined behavior.

<fieldset>
<pre><code class="c++">bool is_utf8_multibyte_code_unit(char c) {
  return c &gt;= 0x80;
}
</code></pre>
</fieldset>
</p>

<p>UTF-8 leading and continuation code units have values in the range 128
(0x80) to 255 (0xFF).  In the common case where <tt>char</tt> is implemented
as a signed 8-bit type with a two's complement representation and a range of
-128 (-0x80) to 127 (0x7F), these values exceed the maximum value of the
<tt>char</tt> type.  Such implementations typically encode such code units as
unsigned values which are then reinterpreted as signed values when read.  In
the code above, integral promotion rules result in <tt>c</tt> being promoted to
type <tt>int</tt> for comparison to the <tt>0x80</tt> operand.  If <tt>c</tt>
holds a value corresponding to a leading or continuation code unit value, then
its value will be interpreted as negative and the promoted value of type
<tt>int</tt> will likewise be negative.  The result is that the comparison
is always false for these implementations.</p>

<p>To correct the code above, explicit conversions are required.  For example:

<fieldset>
<pre><code class="c++">bool is_utf8_multibyte_code_unit(char c) {
  return static_cast&lt;unsigned char&gt;(c) &gt;= 0x80;
}
</code></pre>
</fieldset>
</p>
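<p>The difference between the two versions can be checked at compile time.
The sketch below models an implementation with a signed <tt>char</tt> by using
<tt>signed char</tt> explicitly so that the result does not depend on the
compiler in use.  The names <tt>naive_check</tt> and <tt>converted_check</tt>
are hypothetical, and the value -60 is what the code unit <tt>0xC4</tt> becomes
when reinterpreted as a signed 8-bit two's complement value:

<fieldset>
<pre><code class="c++">#include &lt;climits&gt;

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

// Mirrors the first version above on an implementation where char is signed.
constexpr bool naive_check(signed char c) {
  return c &gt;= 0x80;  // c is promoted to int; a negative value stays negative
}

// Mirrors the corrected version: convert to unsigned char before comparing.
constexpr bool converted_check(signed char c) {
  return static_cast&lt;unsigned char&gt;(c) &gt;= 0x80;
}

// 0xC4 (196) is the leading code unit of U+0123; as a signed 8-bit value it is -60.
static_assert(!naive_check(-60), "the naive comparison misses the code unit");
static_assert(converted_check(-60), "the converted comparison detects it");
static_assert(!converted_check(0x61), "an ASCII code unit is not multi-byte");
</code></pre>
</fieldset>
</p>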

<p>Finally, processing of UTF-8 strings is currently subject to missed
optimizations because glvalue expressions of type <tt>char</tt> may alias
objects of other types.  Use of a distinct type that does not share
this aliasing behavior may allow for further compiler optimizations.</p>

<p>As of November 2017,
<a title="Usage of UTF-8 for websites"
   href="https://w3techs.com/technologies/details/en-utf8/all/all">
UTF-8 is now used by more than 90% of all websites</a>
<sup><a title="Usage of UTF-8 for websites"
        href="#ref_w3techs">
[W3Techs]</a></sup>.
The C++ standard must improve support for UTF-8 by removing the existing
barriers that result in redundant tagging of character encodings, non-generic
UTF-8 specific workarounds like <tt>u8path</tt>, and the need for static
casts to examine UTF-8 code unit values.
</p>

<h1 id="proposal">Proposal</h1>

<p>The proposed changes are intended to bring the standard to the state the
author believes it would likely be in had <tt>char8_t</tt> been added at the
same time that <tt>char16_t</tt> and <tt>char32_t</tt> were added.  This
includes the ability to differentiate ordinary and UTF-8 literals in function
overloading, template specializations, and user-defined literal operator
signatures.  The following core language changes are proposed in order to
facilitate these capabilities:
<ul>
  <li>A new fundamental type named <tt>char8_t</tt>.  This integral type has
      the same signedness, size, alignment, and integer conversion rank as
      <tt>unsigned char</tt>, but does not alias with any other type
      (e.g., this proposal does not add <tt>char8_t</tt> to the list of
      aliasing types in § 8.2.1 [basic.lval] paragraph 11 (11.8)).</li>
  <li>The type of UTF-8 string literals is changed from array of
      <tt>const char</tt> to array of <tt>const char8_t</tt>.</li>
  <li>The type of UTF-8 character literals is changed from <tt>char</tt>
      to <tt>char8_t</tt>.</li>
  <li>New <tt>char8_t</tt> based signatures for user-defined literal
      operators.</li>
</ul></p>
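<p>The properties listed above can be expressed as compile-time checks.  The
following is a sketch only: it assumes that the <tt>__cpp_char8_t</tt>
feature-test macro is defined when the proposed core language changes are
enabled, and the <tt>char8_t_enabled</tt> constant is a hypothetical name used
for illustration:

<fieldset>
<pre><code class="c++">#include &lt;type_traits&gt;

#if defined(__cpp_char8_t)
constexpr bool char8_t_enabled = true;

// A distinct type, not an alias of an existing character type.
static_assert(!std::is_same&lt;char8_t, char&gt;::value, "");
static_assert(!std::is_same&lt;char8_t, unsigned char&gt;::value, "");

// Same signedness, size, and alignment as unsigned char.
static_assert(std::is_unsigned&lt;char8_t&gt;::value, "");
static_assert(sizeof(char8_t) == sizeof(unsigned char), "");
static_assert(alignof(char8_t) == alignof(unsigned char), "");

// UTF-8 character and string literals use the new type.
static_assert(std::is_same&lt;decltype(u8'c'), char8_t&gt;::value, "");
static_assert(std::is_same&lt;decltype(u8"text"), const char8_t (&amp;)[5]&gt;::value, "");
#else
constexpr bool char8_t_enabled = false;

// Without the proposal, UTF-8 string literals are arrays of const char.
static_assert(std::is_same&lt;decltype(u8"text"), const char (&amp;)[5]&gt;::value, "");
#endif
</code></pre>
</fieldset>
</p>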

<p>The following library changes are proposed to address concerns like those
   raised in the motivation section above, and to take advantage of the new
   core features:
<ul>
  <li>New <tt>char8_t</tt> based specializations of <tt>atomic</tt>,
      <tt>numeric_limits</tt>, <tt>hash</tt>, <tt>char_traits</tt>,
      <tt>basic_string</tt>, and <tt>basic_string_view</tt>.</li>
  <li>New <tt>u8streampos</tt>, <tt>u8string</tt>, <tt>u8string_view</tt>
      type aliases.</li>
  <li>New <tt>operator ""s</tt> and <tt>operator ""sv</tt> <tt>char8_t</tt>
      based overloads for UTF-8 literals.</li>
  <li>New <tt>char8_t</tt> based specializations of <tt>codecvt</tt> and
      <tt>codecvt_byname</tt> for converting between UTF-16, UTF-32, and
      UTF-8.  The existing <tt>char</tt> based specializations are deprecated.
      The new specializations are functionally identical to the deprecated
      ones.</li>
  <li>The return type of the <tt>u8string</tt> and <tt>generic_u8string</tt>
      member functions of the filesystem <tt>path</tt> class are changed
      from <tt>string</tt> to <tt>u8string</tt>.</li>
  <li>Filesystem <tt>path</tt> objects may now be constructed with UTF-8
      strings using the existing <tt>path</tt> constructors used for
      construction with other encodings.  The existing <tt>u8path</tt>
      factory functions are deprecated.</li>
</ul></p>

<p>These changes necessarily impact backward compatibility as described in
the <a href="#design_compat">Backward compatibility</a> section.</p>

<h1 id="design">Design Considerations</h1>

<h2 id="design_compat">Backward compatibility</h2>

<p>This proposal does not specify any backward compatibility features other than
to retain interfaces that it deprecates.  The author believes such features are
necessary, but that a single set of such features would unnecessarily compromise
the goals of this proposal.  Rather, the expectation is that implementations
will provide options to enable more fine grained compatibility features.</p>

<p>The following sections discuss backward compatibility impact.</p>

<h3 id="design_compat_core">Core language backward compatibility</h3>

<h4 id="design_compat_core_init">Initialization</h4>

<p>Declarations of arrays of <tt>char</tt> may currently be initialized with
UTF-8 string literals.  Under this proposal, such initializations would
become ill-formed.  This is intended to maintain consistency with
initialization of arrays of <tt>wchar_t</tt>, <tt>char16_t</tt>, and
<tt>char32_t</tt>, all of which require the initializing string literal to
have a matching element type as specified in § 11.6.2 [dcl.init.string].

<fieldset>
<pre><code class="c++">char ca[] = u8"text";   // C++17: Ok.
                        // This proposal: Ill-formed.

char8_t c8a[] = "text"; // C++17: N/A (char8_t is not a type specifier).
                        // This proposal: Ill-formed.
</code></pre>
</fieldset>
</p>

<p>Implementations are encouraged to add options to allow the above
initializations (with a warning) to assist users in migrating their code.</p>

<p>Declarations of variables of type <tt>char</tt> initialized with a UTF-8
character literal remain well-formed and are initialized following the
standard conversion rules.

<fieldset>
<pre><code class="c++">char c = u8'c';         // C++17: Ok.
                        // This proposal: Ok (no change from C++17).

char8_t c8 = 'c';       // C++17: N/A (char8_t is not a type specifier).
                        // This proposal: Ok; c8 is initialized with the value of the 'c'
                        //                character in the execution character set.
</code></pre>
</fieldset>
</p>

<h4 id="design_compat_core_implicit_conversion">Implicit conversions</h4>

<p>Under this proposal, UTF-8 string literals no longer bind to references
to array of type <tt>const char</tt> nor do they implicitly convert to pointer
to <tt>const char</tt>.  The following code is currently well-formed, but would
become ill-formed under this proposal:

<fieldset>
<pre><code class="c++">const char (&amp;u8r)[] = u8"text"; // C++17: Ok.
                                // This proposal: Ill-formed.

const char *u8p = u8"text";     // C++17: Ok.
                                // This proposal: Ill-formed.
</code></pre>
</fieldset>
</p>

<p>Implementations are encouraged to add options to allow the above
conversions (with a warning) to assist users in migrating their code.
Such options would require allowing aliasing of <tt>char</tt> and
<tt>char8_t</tt>.  Note that it may be useful to permit these conversions
only for UTF-8 string literals and not for general expressions of array
of <tt>char8_t</tt> type.</p>
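<p>Where no such option is available, code that must compile both with and
without <tt>char8_t</tt> could route UTF-8 literals through a small helper.
The sketch below shows one hypothetical approach; <tt>as_char_ptr</tt> is an
invented name and is not part of this proposal.  It relies on the existing rule
that an object of any type may be accessed through a glvalue of type
<tt>char</tt>, and it assumes that <tt>__cpp_char8_t</tt> is defined when the
new type is available:

<fieldset>
<pre><code class="c++">// Hypothetical migration helper; not part of this proposal.
inline const char* as_char_ptr(const char* s) { return s; }

#if defined(__cpp_char8_t)
// Accessing char8_t objects through char is permitted by the existing
// aliasing rules for char, so no implementation extension is needed here.
inline const char* as_char_ptr(const char8_t* s) {
  return reinterpret_cast&lt;const char*&gt;(s);
}
#endif

// Well-formed whether u8"text" has element type char or char8_t.
const char* u8p = as_char_ptr(u8"text");
</code></pre>
</fieldset>
</p>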

<h4 id="design_compat_core_type_deduction">Type deduction</h4>

<p>Under this proposal, UTF-8 string and character literals have type array of
<tt>const char8_t</tt> and <tt>char8_t</tt> respectively.  This affects the
types deduced for placeholder types and template parameter types.

<fieldset>
<pre><code class="c++">template&lt;typename T1, typename T2&gt;
void ft(T1, T2);

ft(u8"text", u8'c'); // C++17: T1 deduced to const char*, T2 deduced to char.
                     // This proposal: T1 deduced to const char8_t*, T2 deduced to char8_t.

auto u8p = u8"text"; // C++17: Type deduced to const char*.
                     // This proposal: Type deduced to const char8_t*.

auto u8c = u8'c';    // C++17: Type deduced to char.
                     // This proposal: Type deduced to char8_t.
</code></pre>
</fieldset>
</p>

<p>This change in behavior is a primary objective of this proposal.
Implementations are encouraged to add options to disable <tt>char8_t</tt>
support entirely when necessary to preserve compatibility with C++17.</p>

<h4 id="design_compat_core_overload_resolution">Overload resolution</h4>

<p>The following code is currently well-formed, and would remain well-formed
under this proposal, but would behave differently:

<fieldset>
<pre><code class="c++">template&lt;typename T&gt; void f(const T*);
void f(const char*);
f(u8"text");                    // C++17: Calls f(const char*).
                                // This proposal: Calls f&lt;char8_t&gt;(const char8_t*).
</code></pre>
</fieldset>
</p>

<p>The following code is currently well-formed, but would become ill-formed
under this proposal:

<fieldset>
<pre><code class="c++">void f(const char*);
f(u8"text");                    // C++17: Ok.
                                // This proposal: Ill-formed; no matching function found.

int operator ""_udl(const char*, std::size_t);
auto x = u8"text"_udl;          // C++17: Ok.
                                // This proposal: Ill-formed; no matching literal operator found.
</code></pre>
</fieldset>
</p>

<p>These changes in behavior are a primary objective of this proposal.
Implementations are encouraged to add options to disable <tt>char8_t</tt>
support entirely when necessary to preserve compatibility with C++17.</p>

<h4 id="design_compat_core_template_specialization">Template specialization</h4>

<p>The following code is currently well-formed, and would remain well-formed
under this proposal, but would behave differently:

<fieldset>
<pre><code class="c++">template&lt;typename T&gt; struct ct { static constexpr bool value = false; };
template&lt;&gt; struct ct&lt;char&gt; { static constexpr bool value = true; };
template&lt;typename T&gt; bool ft(const T*) { return ct&lt;T&gt;::value; }
ft(u8"text");                   // C++17: returns true.
                                // This proposal: returns false.
</code></pre>
</fieldset>
</p>

<p>This change in behavior is a primary objective of this proposal.
Implementations are encouraged to add options to disable <tt>char8_t</tt>
support entirely when necessary to preserve compatibility with C++17.</p>

<h3 id="design_compat_library">Library backward compatibility</h3>

<h4 id="design_compat_library_u8string">
    Return type of <tt>path::u8string</tt> and <tt>path::generic_u8string</tt></h4>

<p>This proposal includes a new specialization of <tt>std::basic_string</tt>
for the new <tt>char8_t</tt> type, a new <tt>std::u8string</tt> type alias,
and changes to the <tt>u8string</tt> and <tt>generic_u8string</tt> member
functions of <tt>filesystem::path</tt> to return <tt>std::u8string</tt>
instead of <tt>std::string</tt>.  This change renders ill-formed the following
code that is currently well-formed.

<fieldset>
<pre><code class="c++">void f(std::filesystem::path p) {
  std::string s;

  s = p.u8string(); // C++17: Ok.
                    // This proposal: ill-formed.
}
</code></pre>
</fieldset>
</p>

<p>Implementations are encouraged to add an option that allows implicit
conversion of <tt>std::u8string</tt> to <tt>std::string</tt> to assist in
a gradual migration of code that calls these functions.</p>
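<p>In the absence of such an option, callers can convert explicitly.  The
following sketch copies code units one for one into a <tt>std::string</tt>;
<tt>to_byte_string</tt> is a hypothetical helper introduced for this
illustration.  The copy performs no transcoding, so the result still holds
UTF-8 encoded bytes:

<fieldset>
<pre><code class="c++">#include &lt;string&gt;

// Hypothetical helper: copies single-byte code units into a std::string
// without transcoding.  Accepts std::string and, where char8_t is available,
// std::basic_string&lt;char8_t&gt;.
template &lt;class CharT, class Traits, class Alloc&gt;
std::string to_byte_string(const std::basic_string&lt;CharT, Traits, Alloc&gt;&amp; s) {
  static_assert(sizeof(CharT) == 1, "only single-byte code units are supported");
  return std::string(s.begin(), s.end());
}

// Possible use with the example above:
//   s = to_byte_string(p.u8string());
</code></pre>
</fieldset>
</p>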

<h4 id="design_compat_library_literal_operators">
    Return type of <tt>operator ""s</tt> and <tt>operator ""sv</tt></h4>

<p>This proposal includes new overloads of <tt>operator ""s</tt> and
<tt>operator ""sv</tt> that return <tt>char8_t</tt> specializations of
<tt>std::basic_string</tt> and <tt>std::basic_string_view</tt> respectively.
This change renders ill-formed the following code that is currently well-formed.

<fieldset>
<pre><code class="c++">std::string s;

s = u8"text"s;    // C++17: Ok.
                  // This proposal: ill-formed.

s = u8"text"sv;   // C++17: Ok.
                  // This proposal: ill-formed.
</code></pre>
</fieldset>
</p>

<p>Implementations are encouraged to add an option that allows implicit
conversion of <tt>std::u8string</tt> to <tt>std::string</tt> to assist in
a gradual migration of code that calls these functions.</p>

<h2 id="design_narrow_utf8">
    Should UTF-8 literals continue to be referred to as narrow literals?</h2>

<p>UTF-8 literals are maintained as narrow literals in this proposal.</p>

<h2 id="design_char8_t_underlying_type">
    What should be the underlying type of char8_t?</h2>

<p>There are several choices for the underlying type of <tt>char8_t</tt>.
Use of <tt>unsigned char</tt> closely aligns with historical use.  Use of
<tt>uint_least8_t</tt> would maintain consistency with how the underlying
types of <tt>char16_t</tt> and <tt>char32_t</tt> are specified.</p>

<p>This proposal specifies <tt>unsigned char</tt> as the underlying type as
noted in the changes to § 6.7.1 <tt>[basic.fundamental]</tt> paragraph 5.</p>

<h1 id="implementation_exp">Implementation Experience</h1>

<p>An implementation is available in the <tt>char8_t</tt> branch of a gcc
fork hosted on GitHub at
<a href="https://github.com/tahonermann/gcc/tree/char8_t">
https://github.com/tahonermann/gcc/tree/char8_t</a>.  This implementation is
believed to be complete for both the proposed core language and library
features.  New <tt>-fchar8_t</tt> and <tt>-fno-char8_t</tt> compiler options
support enabling and disabling the new features.  No backward compatibility
features are currently implemented.</p>

<h1 id="wording">Formal Wording</h1>

<input type="checkbox" id="hidedel"><label for="hidedel">Hide deleted text</label>

<p>These changes are relative to
<a title="Working Draft, Standard for Programming Language C++"
   href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4713.pdf">
N4713</a>
<sup><a title="Working Draft, Standard for Programming Language C++"
        href="#ref_n4713">
[N4713]</a></sup>.</p>

<h2 id="core_wording">Core Wording</h2>

<p>Add <tt>char8_t</tt> to the list of keywords in table 5 in 5.11 [lex.key]
paragraph 1. </p>

<p>Change in 5.13.3 [lex.ccon] paragraph 3:
<blockquote>
A character literal that begins with <tt>u8</tt>, such as <tt>u8'w'</tt>, is a
character literal of type <del><tt>char</tt></del><ins><tt>char8_t</tt></ins>,
known as a <em>UTF-8 character literal</em>.[&hellip;]
</blockquote>
</p>

<p>Change in 5.13.5 [lex.string] paragraph 6:
<blockquote>
After translation phase 6, a <em>string-literal</em> that does not begin with
an <em>encoding-prefix</em> is an <em>ordinary string literal</em><ins>.  An
<em>ordinary string literal</em> has type "<em>array</em> of <em>n</em>
<tt>const char</tt>" where <em>n</em> is the size of the string as defined
below, has static storage duration (6.6.4)</ins>, and is initialized with the
given characters.
</blockquote>
</p>

<p>Change in 5.13.5 [lex.string] paragraph 7:
<blockquote>
A <em>string-literal</em> that begins with <tt>u8</tt>, such as
<tt>u8"asdf"</tt>, is a <em>UTF-8 string literal</em><ins>, also referred to as
a <tt>char8_t</tt> string literal.  A <tt>char8_t</tt> string literal has type
"<em>array</em> of <em>n</em> <tt>const char8_t</tt>", where <em>n</em> is
the size of the string as defined below; each successive element of the object
representation (6.7) has the value of the corresponding code unit of the UTF-8
encoding of the string.</ins>
</blockquote>
</p>

<p>Change in 5.13.5 [lex.string] paragraph 8:
<blockquote>
Ordinary string literals and UTF-8 string literals are also referred to as
narrow string literals. <del>A narrow string literal has type "<em>array</em>
of <em>n</em> <tt>const char</tt>", where <em>n</em> is the size of the string
as defined below, and has static storage duration (6.6.4).</del>
</blockquote>
</p>

<p>Remove 5.13.5 [lex.string] paragraph 9:
<blockquote class=stddel>
For a UTF-8 string literal, each successive element of the object
representation (6.7) has the value of the corresponding code unit of the UTF-8
encoding of the string.
</blockquote>
</p>

<p><em>Drafting note: The paragraph 9 content was incorporated in the changes
to paragraph 7.</em></p>

<p>Change in 5.13.5 [lex.string] paragraph 15:
<blockquote>
[&hellip;] In a narrow string literal, a <em>universal-character-name</em>
may map to more than one <tt>char</tt> <ins>or <tt>char8_t</tt></ins> element
due to <em>multibyte encoding</em>. [&hellip;]
</blockquote>
</p>

<!-- FIXME: Does 6.7.4.3p1.3 need updates for "narrow"? -->

<p>Change in 6.7.1 [basic.fundamental] paragraph 1:
<blockquote>
Objects declared <del>as characters</del><ins>with type </ins>
<del>(</del><tt>char</tt><del>)</del> shall be large enough to store any member
of the implementation’s basic character set.  If a character from this set is
stored in a character object, the integral value of that character object is
equal to the value of the single character literal form of that character. It is
implementation-defined whether a <tt>char</tt> object can hold negative values.
Characters <ins>declared with type <tt>char</tt> </ins>can be explicitly
declared <tt>unsigned</tt> or <tt>signed</tt>.  Plain <tt>char</tt>,
<tt>signed char</tt>, and <tt>unsigned char</tt> are three distinct types,
collectively called <em><del>narrow</del><ins>ordinary</ins> character
types</em>.  <ins>The <em>ordinary character types</em> and <tt>char8_t</tt> are
collectively called <em>narrow character types</em>.</ins>  A <tt>char</tt>, a
<tt>signed char</tt>, <del>and </del>an <tt>unsigned char</tt><ins>, and a
<tt>char8_t</tt></ins> occupy the same amount of storage and have the same
alignment requirements (6.6.5); that is, they have the same object
representation. For narrow character types, all bits of the object
representation participate in the value representation. [ <em>Note</em>: A
bit-field of narrow character type whose length is larger than the number of
bits in the object representation of that type has padding bits; see 6.7.
&mdash; <em>end note</em> ] For unsigned narrow character types<ins>, including
<tt>char8_t</tt></ins>, each possible bit pattern of the value representation
represents a distinct number. These requirements do not hold for other types.
In any particular implementation, a plain <tt>char</tt> object <del>can</del>
<ins>shall</ins> take on either the same values as a <tt>signed char</tt> or an
<tt>unsigned char</tt>; which one is implementation-defined. For each value
<em>i</em> of type <tt>unsigned char</tt><ins> or <tt>char8_t</tt></ins> in the
range 0 to 255 inclusive, there exists a value <em>j</em> of type <tt>char</tt>
such that the result of an integral conversion (7.8) from <em>i</em> to
<tt>char</tt> is <em>j</em>, and the result of an integral conversion from
<em>j</em> to <tt>unsigned char</tt><ins> or <tt>char8_t</tt></ins> is
<em>i</em>.
</blockquote>
</p>

<p>Change in 6.7.1 [basic.fundamental] paragraph 5:
<blockquote>
[&hellip;] Type <tt>wchar_t</tt> shall have the same size, signedness, and
alignment requirements (6.6.5) as one of the other integral types, called its
underlying type.  <ins>Type <tt>char8_t</tt> denotes a distinct type with the
same size, signedness, and alignment as <tt>unsigned char</tt>, called its
underlying type.</ins>  Types <tt>char16_t</tt> and <tt>char32_t</tt> denote
distinct types with the same size, signedness, and alignment as
<tt>uint_least16_t</tt> and <tt>uint_least32_t</tt>, respectively, in
<tt>&lt;cstdint&gt;</tt>, called the underlying types. [&hellip;]
</blockquote>
</p>
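<p><em>Illustration only, not proposed wording.</em> Because <tt>char8_t</tt>
is a distinct type, it can take part in overload resolution separately from
<tt>char</tt> and <tt>unsigned char</tt> while sharing the size, signedness, and
alignment of its underlying type. The sketch below assumes C++17 or later and
an assumed feature-test macro <tt>__cpp_char8_t</tt>; the function name
<tt>selected_overload</tt> is hypothetical.</p>

```cpp
#include <type_traits>

// Overload set used to observe which type a UTF-8 character literal has.
constexpr int selected_overload(char)          { return 0; }
constexpr int selected_overload(unsigned char) { return 1; }

#if defined(__cpp_char8_t)
constexpr int selected_overload(char8_t)       { return 2; }
constexpr int expected_overload = 2;

// Same size, signedness, and alignment as the underlying type ...
static_assert(sizeof(char8_t) == sizeof(unsigned char), "size differs");
static_assert(alignof(char8_t) == alignof(unsigned char), "alignment differs");
static_assert(std::is_unsigned_v<char8_t>, "char8_t should be unsigned");
// ... yet a distinct type.
static_assert(!std::is_same_v<char8_t, unsigned char>, "not distinct");
static_assert(!std::is_same_v<char8_t, char>, "not distinct");
#else
constexpr int expected_overload = 0;  // u8'x' still has type char
#endif

static_assert(selected_overload(u8'x') == expected_overload,
              "unexpected overload selected for u8'x'");
```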

<p>Change in 6.7.1 [basic.fundamental] paragraph 7:
<blockquote>
Types <tt>bool</tt>, <tt>char</tt>, <ins><tt>char8_t</tt>, </ins>
<tt>char16_t</tt>, <tt>char32_t</tt>, <tt>wchar_t</tt>, and the signed and
unsigned integer types are collectively called integral types. [&hellip;]
</blockquote>
</p>

<!-- FIXME: Update 6.11 [basic.align] p6 to s/narrow/ordinary/ -->

<p>Change in 6.7.4 [conv.rank] paragraph 1:
<blockquote>
[&hellip;]<br/>
(1.8) &mdash; The ranks of <ins><tt>char8_t</tt>, </ins><tt>char16_t</tt>,
<tt>char32_t</tt>, and <tt>wchar_t</tt> shall equal the ranks of their
underlying types (6.7.1).
<br/>[&hellip;]
</blockquote>
</p>

<p>Change to footnote 64 associated with 8.3 [expr.arith.conv] paragraph 1 (1.5):
<blockquote>
As a consequence, operands of type <tt>bool</tt>, <ins><tt>char8_t</tt>, </ins>
<tt>char16_t</tt>, <tt>char32_t</tt>, <tt>wchar_t</tt>, or an enumerated type
are converted to some integral type.
</blockquote>
</p>

<p>Change in 8.5.2.3 [expr.sizeof] paragraph 1:
<blockquote>
[&hellip;] <tt>sizeof(char)</tt>, <tt>sizeof(signed char)</tt><ins>,</ins>
<del>and</del> <tt>sizeof(unsigned char)</tt><ins>, and <tt>sizeof(char8_t)</tt></ins>
are 1. [&hellip;]
</blockquote>
</p>

<p>Change in 10.1.7.2 [dcl.type.simple] paragraph 1:
<blockquote>
The simple type specifiers are<br/>
<div style="margin-left: 1em;">
  <em>simple-type-specifier</em>:<br/>
  <div style="margin-left: 1em;">
    [&hellip;]<br/>
    <tt>char</tt><br/>
    <ins><tt>char8_t</tt></ins><br/>
    <tt>char16_t</tt><br/>
    <tt>char32_t</tt><br/>
    [&hellip;]<br/>
  </div>
</div>
</blockquote>
</p>

<p>Change in table 11 of 10.1.7.2 [dcl.type.simple] paragraph 2:
<blockquote>
[&hellip;]<br/>
<div style="margin-left: 1em;">
<table>
  <tr>
    <td align="center">
      Table 11 &mdash; <em>simple-type-specifiers</em> and the types they specify
    </td>
  </tr>
  <tr>
    <td align="center">
      <table border="1">
        <tr>
          <th>Specifier(s)</th>
          <th>Type</th>
        </tr>
        <tr>
          <td>[&hellip;]</td>
          <td>[&hellip;]</td>
        </tr>
        <tr>
          <td><tt>char</tt></td>
          <td><tt>“char”</tt></td>
        </tr>
        <tr>
          <td><tt>unsigned char</tt></td>
          <td><tt>“unsigned char”</tt></td>
        </tr>
        <tr>
          <td><tt>signed char</tt></td>
          <td><tt>“signed char”</tt></td>
        </tr>
        <tr>
          <td><ins><tt>char8_t</tt></ins></td>
          <td><ins><tt>“char8_t”</tt></ins></td>
        </tr>
        <tr>
          <td><tt>char16_t</tt></td>
          <td><tt>“char16_t”</tt></td>
        </tr>
        <tr>
          <td><tt>char32_t</tt></td>
          <td><tt>“char32_t”</tt></td>
        </tr>
        <tr>
          <td>[&hellip;]</td>
          <td>[&hellip;]</td>
        </tr>
      </table>
    </td>
  </tr>
</table>
</div>
<br/>[&hellip;]
</blockquote>
</p>

<p>Change in 11.6 [dcl.init] paragraph 17:
<blockquote>
[&hellip;]<br/>
(17.3) &mdash; If the destination type is an array of characters, <ins>an
array of <tt>char8_t</tt>, </ins>an array of <tt>char16_t</tt>, an array of
<tt>char32_t</tt>, or an array of <tt>wchar_t</tt>, and the initializer is a
string literal, see 11.6.2.
<br/>[&hellip;]
</blockquote>
</p>

<p>Change in 11.6.2 [dcl.init.string] paragraph 1:
<blockquote>
An array of <del>narrow</del><ins>ordinary</ins> character type (6.7.1),
<ins><tt>char8_t</tt> array, </ins><tt>char16_t</tt> array, <tt>char32_t</tt>
array, or <tt>wchar_t</tt> array can be initialized by <del>a narrow</del>
<ins>an ordinary</ins> string literal, <ins>char8_t string literal,
</ins>char16_t string literal, char32_t string literal, or wide string
literal, respectively, [&hellip;]
</blockquote>
</p>
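<p><em>Illustration only, not proposed wording.</em> The sketch below
initializes an array from a UTF-8 string literal. It assumes C++17 or later and
an assumed feature-test macro <tt>__cpp_char8_t</tt>; <tt>utf8_unit</tt>,
<tt>word</tt>, and <tt>code_unit_count</tt> are hypothetical names. Given the
"respectively" pairing above, a declaration such as
<tt>char legacy[] = u8"x";</tt> would presumably no longer be accepted once
<tt>u8</tt> literals are <tt>char8_t</tt> string literals, so it is deliberately
not part of the sketch.</p>

```cpp
#include <cstddef>

#if defined(__cpp_char8_t)
using utf8_unit = char8_t;  // array element type with this proposal
#else
using utf8_unit = char;     // array element type without it
#endif

// "caf" is three code units, U+00E9 is two, plus the terminating null.
constexpr utf8_unit word[] = u8"caf\u00E9";
static_assert(sizeof(word) / sizeof(word[0]) == 6, "unexpected array bound");

// Hypothetical helper: number of code units before the terminating null.
constexpr std::size_t code_unit_count(const utf8_unit* s) {
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}

static_assert(code_unit_count(word) == 5, "unexpected code unit count");
```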

<p>Change in 16.5.8 [over.literal] paragraph 3:
<blockquote>
The declaration of a literal operator shall have a
<em>parameter-declaration-clause</em> equivalent to one of the following:
<div style="margin-left: 1em;">
[&hellip;]<br/>
<tt>char</tt><br/>
<tt>wchar_t</tt><br/>
<ins><tt>char8_t</tt></ins><br/>
<tt>char16_t</tt><br/>
<tt>char32_t</tt><br/>
<tt>const char*</tt>, <tt>std::size_t</tt><br/>
<tt>const wchar_t*</tt>, <tt>std::size_t</tt><br/>
<ins><tt>const char8_t*</tt>, <tt>std::size_t</tt></ins><br/>
<tt>const char16_t*</tt>, <tt>std::size_t</tt><br/>
<tt>const char32_t*</tt>, <tt>std::size_t</tt><br/>
[&hellip;]<br/>
</div>
</blockquote>
</p>
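<p><em>Illustration only, not proposed wording.</em> With the added
<tt>const char8_t*</tt>, <tt>std::size_t</tt> form, a literal operator can be
written specifically for UTF-8 string literals. The sketch below assumes C++17
or later and an assumed feature-test macro <tt>__cpp_char8_t</tt>; the suffix
<tt>_cp</tt> and the alias <tt>utf8_unit</tt> are hypothetical. The operator
counts code points by skipping continuation code units and does not validate
its input.</p>

```cpp
#include <cstddef>

#if defined(__cpp_char8_t)
using utf8_unit = char8_t;  // parameter type enabled by the change above
#else
using utf8_unit = char;     // pre-proposal stand-in so the sketch compiles
#endif

// Hypothetical literal operator: counts the code units that are not
// continuation code units (bit pattern 10xxxxxx), which equals the number
// of code points in well-formed UTF-8.
constexpr std::size_t operator""_cp(const utf8_unit* str, std::size_t len) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if ((static_cast<unsigned char>(str[i]) & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// 'a' (1 code unit), U+00E9 (2 code units), U+20AC (3 code units).
static_assert(u8"a\u00E9\u20AC"_cp == 3, "unexpected code point count");
```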

<h2 id="library_wording">Library Wording</h2>

<p>Change in 20.1 [library.general] paragraph 7:
<blockquote>
The strings library (Clause 24) provides support for manipulating text
represented as sequences of type <tt>char</tt>,
<ins>sequences of type <tt>char8_t</tt>, </ins>
sequences of type <tt>char16_t</tt>,
sequences of type <tt>char32_t</tt>,
sequences of type <tt>wchar_t</tt>,
and sequences of any other character-like type.
</blockquote>
</p>

<p>Change in 20.3.2 [defns.character]:
<blockquote>
[&hellip;]<br/>
[ <em>Note 1 to entry:</em> The term does not mean only <tt>char</tt>,
<ins><tt>char8_t</tt>, </ins><tt>char16_t</tt>, <tt>char32_t</tt>, and
<tt>wchar_t</tt> objects, but any value that can be represented by a type
that provides the definitions specified in these Clauses.  &mdash;
<em>end note</em> ]
</blockquote>
</p>

<p>Change in 21.3.2 [limits.syn]:
<blockquote>
<div style="margin-left: 1em;">
<tt>
[&hellip;]<br/>
&nbsp;&nbsp;template&lt;&gt; class numeric_limits&lt;char&gt;;<br/>
&nbsp;&nbsp;template&lt;&gt; class numeric_limits&lt;signed char&gt;;<br/>
&nbsp;&nbsp;template&lt;&gt; class numeric_limits&lt;unsigned char&gt;;<br/>
&nbsp;&nbsp;<ins>template&lt;&gt; class numeric_limits&lt;char8_t&gt;;</ins><br/>
&nbsp;&nbsp;template&lt;&gt; class numeric_limits&lt;char16_t&gt;;<br/>
&nbsp;&nbsp;template&lt;&gt; class numeric_limits&lt;char32_t&gt;;<br/>
&nbsp;&nbsp;template&lt;&gt; class numeric_limits&lt;wchar_t&gt;;<br/>
[&hellip;]<br/>
</tt>
</div>
</blockquote>
</p>
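<p><em>Illustration only, not proposed wording.</em> Since <tt>char8_t</tt>
shares the size and signedness of <tt>unsigned char</tt>, the added
<tt>numeric_limits</tt> specialization can be expected to report the same
range. The sketch below assumes an assumed feature-test macro
<tt>__cpp_char8_t</tt> and falls back to <tt>unsigned char</tt> as a stand-in
otherwise; the aliases <tt>utf8_unit</tt> and <tt>utf8_limits</tt> are
hypothetical.</p>

```cpp
#include <climits>
#include <limits>

#if defined(__cpp_char8_t)
using utf8_unit = char8_t;
#else
using utf8_unit = unsigned char;  // stand-in when char8_t is unavailable
#endif

using utf8_limits = std::numeric_limits<utf8_unit>;

static_assert(utf8_limits::is_specialized, "no specialization found");
static_assert(utf8_limits::is_integer && !utf8_limits::is_signed,
              "expected an unsigned integral type");
static_assert(utf8_limits::min() == 0, "unexpected minimum");
static_assert(utf8_limits::max() == UCHAR_MAX, "unexpected maximum");
static_assert(utf8_limits::digits == CHAR_BIT, "unexpected number of value bits");
```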

<p>Change in 24.2 [char.traits] paragraph 1:
<blockquote>
This subclause defines requirements on classes representing <em>character
traits</em>, and defines a class template <tt>char_traits&lt;charT&gt;</tt>,
along with <del>four</del><ins>five</ins> specializations,
<tt>char_traits&lt;char&gt;</tt>,
<ins><tt>char_traits&lt;char8_t&gt;</tt>,</ins>
<tt>char_traits&lt;char16_t&gt;</tt>,
<tt>char_traits&lt;char32_t&gt;</tt>,
and <tt>char_traits&lt;wchar_t&gt;</tt>,
that satisfy those requirements.
</blockquote>
</p>

<p>Change in 24.2 [char.traits] paragraph 4:
<blockquote>
This subclause specifies a class template, <tt>char_traits&lt;charT&gt;</tt>,
and <del>four</del><ins>five</ins> explicit specializations of it,
<tt>char_traits&lt;char&gt;</tt>,
<ins><tt>char_traits&lt;char8_t&gt;</tt>,</ins>
<tt>char_traits&lt;char16_t&gt;</tt>,
<tt>char_traits&lt;char32_t&gt;</tt>, and
<tt>char_traits&lt;wchar_t&gt;</tt>, all of which appear in the header
<tt>&lt;string&gt;</tt> and satisfy the requirements below.
</blockquote>
</p>

<p><em>Drafting note: 24.2p4 appears to unnecessarily duplicate information
previously presented in 24.2p1.</em></p>

<p>Change in 24.2.3 [char.traits.specializations]:
<blockquote>
<div style="margin-left: 1em;">
<tt>namespace std {</tt><br/>
&nbsp;&nbsp;<tt>template&lt;&gt; struct char_traits&lt;char&gt;;</tt><br/>
&nbsp;&nbsp;<ins><tt>template&lt;&gt; struct char_traits&lt;char8_t&gt;;</tt></ins><br/>
&nbsp;&nbsp;<tt>template&lt;&gt; struct char_traits&lt;char16_t&gt;;</tt><br/>
&nbsp;&nbsp;<tt>template&lt;&gt; struct char_traits&lt;char32_t&gt;;</tt><br/>
&nbsp;&nbsp;<tt>template&lt;&gt; struct char_traits&lt;wchar_t&gt;;</tt><br/>
<tt>}</tt><br/>
</div>
</blockquote>
</p>

<p>Change in 24.2.3 [char.traits.specializations] paragraph 1:
<blockquote>
The header <tt>&lt;string&gt;</tt> shall define <del>four</del><ins>five</ins>
specializations of the class template <tt>char_traits</tt>:
<tt>char_traits&lt;char&gt;</tt>,
<ins><tt>char_traits&lt;char8_t&gt;</tt>,</ins>
<tt>char_traits&lt;char16_t&gt;</tt>,
<tt>char_traits&lt;char32_t&gt;</tt>, and
<tt>char_traits&lt;wchar_t&gt;</tt>.
</blockquote>
</p>

<p>Add a new subclause after 24.2.3.1 [char.traits.specializations.char]:
<blockquote class=stdins>
<table>
  <tr>
    <td>24.2.3.?</td>
    <td><tt>struct char_traits&lt;char8_t&gt;</tt></td>
    <td>[char.traits.specializations.char8_t]</td>
  </tr>
</table>
<tt>
namespace std {<br/>
&nbsp;&nbsp;template&lt;&gt; struct char_traits&lt;char8_t&gt; {<br/>
&nbsp;&nbsp;&nbsp;&nbsp;using char_type  = char8_t;<br/>
&nbsp;&nbsp;&nbsp;&nbsp;using int_type   = unsigned int;<br/>
&nbsp;&nbsp;&nbsp;&nbsp;using off_type   = streamoff;<br/>
&nbsp;&nbsp;&nbsp;&nbsp;using pos_type   = u8streampos;<br/>
&nbsp;&nbsp;&nbsp;&nbsp;using state_type = mbstate_t;<br/>
<br/>
&nbsp;&nbsp;&nbsp;&nbsp;static constexpr void assign(char_type&amp; c1, const char_type&amp; c2) noexcept;<br/>
&nbsp;&nbsp;&nbsp;&nbsp;static constexpr bool eq(char_type c1, char_type c2) noexcept;<br/>
&nbsp;&nbsp;&nbsp;&nbsp;static constexpr bool lt(char_type c1, char_type c2) noexcept;<br/>
<br/>
&nbsp;&nbsp;&nbsp;&nbsp;static constexpr int compare(const char_type* s1, const char_type* s2, size_t n);<br/>
&nbsp;&nbsp;&nbsp;&nbsp;static constexpr size_t length(const char_type* s);<br/>
&nbsp;&nbsp;&nbsp;&nbsp;static constexpr const char_type* find(const char_type* s, size_t n,<br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;const char_type&amp; a);<br/>
&nbsp;&nbsp;&nbsp;&nbsp;static char_type* move(char_type* s1, const char_type* s2, size_t n);<br/>
&nbsp;&nbsp;&nbsp;&nbsp;static char_type* copy(char_type* s1, const char_type* s2, size_t n);<br/>
&nbsp;&nbsp;&nbsp;&nbsp;static char_type* assign(char_type* s, size_t n, char_type a);<br/>
&nbsp;&nbsp;&nbsp;&nbsp;static constexpr int_type not_eof(int_type c) noexcept;<br/>
&nbsp;&nbsp;&nbsp;&nbsp;static constexpr char_type to_char_type(int_type c) noexcept;<br/>
&nbsp;&nbsp;&nbsp;&nbsp;static constexpr int_type to_int_type(char_type c) noexcept;<br/>
&nbsp;&nbsp;&nbsp;&nbsp;static constexpr bool eq_int_type(int_type c1, int_type c2) noexcept;<br/>
&nbsp;&nbsp;&nbsp;&nbsp;static constexpr int_type eof() noexcept;<br/>
&nbsp;&nbsp;};<br/>
}<br/>
</tt>
</blockquote>
</p>

<p>Add paragraph 1:
<blockquote class=stdins>
The type <tt>u8streampos</tt> shall be an implementation-defined type that
satisfies the requirements for <tt>pos_type</tt> in 30.2.2 and 30.3.
</blockquote>
</p>

<p>Add paragraph 2:
<blockquote class=stdins>
The two-argument members <tt>assign</tt>, <tt>eq</tt>, and <tt>lt</tt> shall be
defined identically to the built-in operators <tt>=</tt>, <tt>==</tt>, and
<tt>&lt;</tt> respectively.
</blockquote>
</p>

<p>Add paragraph 3:
<blockquote class=stdins>
The member <tt>eof()</tt> shall return an implementation-defined constant that
cannot appear as a valid UTF-8 code unit.
</blockquote>
</p>
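<p><em>Illustration only, not proposed wording.</em> The sketch below exercises
a few members of the new specialization. It assumes C++17 or later, a library
that provides <tt>char_traits&lt;char8_t&gt;</tt> whenever the assumed
feature-test macro <tt>__cpp_char8_t</tt> is defined, and falls back to
<tt>char</tt> otherwise. The names <tt>utf8_traits</tt>,
<tt>can_appear_in_utf8</tt>, and <tt>eof_is_not_a_code_unit</tt> are
hypothetical.</p>

```cpp
#include <string>

#if defined(__cpp_char8_t)
using utf8_unit = char8_t;
#else
using utf8_unit = char;  // stand-in when char8_t is unavailable
#endif
using utf8_traits = std::char_traits<utf8_unit>;

// length() counts code units: U+00E9 is encoded as two of them.
static_assert(utf8_traits::length(u8"\u00E9") == 2, "unexpected length");
// compare() orders by code unit value.
static_assert(utf8_traits::compare(u8"abc", u8"abd", 3) < 0, "unexpected order");

// Byte values that can occur in well-formed UTF-8: 0xC0, 0xC1, and
// 0xF5 through 0xFF never do.
constexpr bool can_appear_in_utf8(unsigned v) {
    return v <= 0xF4 && v != 0xC0 && v != 0xC1;
}

// Hypothetical check of paragraph 3: eof() differs from every value that
// can appear as a UTF-8 code unit.
constexpr bool eof_is_not_a_code_unit() {
    for (unsigned v = 0; v <= 0xFF; ++v) {
        if (can_appear_in_utf8(v) &&
            utf8_traits::eq_int_type(
                utf8_traits::eof(),
                utf8_traits::to_int_type(static_cast<utf8_unit>(v)))) {
            return false;
        }
    }
    return true;
}

static_assert(eof_is_not_a_code_unit(), "eof() collides with a code unit");
```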

<p>Change in 24.3 [string.classes] paragraph 1:
<blockquote>
The header <tt>&lt;string&gt;</tt> defines the <tt>basic_string</tt> class
template for manipulating varying-length sequences of char-like objects and
<del>four</del><ins>five</ins> <em>typedef-name</em>s, <tt>string</tt>,
<ins><tt>u8string</tt>, </ins><tt>u16string</tt>, <tt>u32string</tt>, and
<tt>wstring</tt>, that name the specializations
<tt>basic_string&lt;char&gt;</tt>,
<ins><tt>basic_string&lt;char8_t&gt;</tt>,</ins>
<tt>basic_string&lt;char16_t&gt;</tt>,
<tt>basic_string&lt;char32_t&gt;</tt>, and
<tt>basic_string&lt;wchar_t&gt;</tt>, respectively.<br/>
</blockquote>
</p>

<p>Change in 24.3.1 [string.syn]:
<blockquote>
<h4>Header <tt>&lt;string&gt;</tt> synopsis</h4>
<div style="margin-left: 1em;">
<tt>
#include &lt;initializer_list&gt;<br/>
<br/>
namespace std {<br/>
&nbsp;&nbsp;// 24.2, <em>character traits</em>:<br/>
&nbsp;&nbsp;template&lt;class charT&gt; struct char_traits;<br/>
&nbsp;&nbsp;template&lt;&gt; struct char_traits&lt;char&gt;;<br/>
&nbsp;&nbsp;<ins>template&lt;&gt; struct char_traits&lt;char8_t&gt;;</ins><br/>
&nbsp;&nbsp;template&lt;&gt; struct char_traits&lt;char16_t&gt;;<br/>
&nbsp;&nbsp;template&lt;&gt; struct char_traits&lt;char32_t&gt;;<br/>
&nbsp;&nbsp;template&lt;&gt; struct char_traits&lt;wchar_t&gt;;<br/>
[&hellip;]<br/>
&nbsp;&nbsp;// basic_string <em>typedef names</em><br/>
&nbsp;&nbsp;using string    = basic_string&lt;char&gt;;<br/>
&nbsp;&nbsp;<ins>using u8string = basic_string&lt;char8_t&gt;;</ins><br/>
&nbsp;&nbsp;using u16string = basic_string&lt;char16_t&gt;;<br/>
&nbsp;&nbsp;using u32string = basic_string&lt;char32_t&gt;;<br/>
&nbsp;&nbsp;using wstring   = basic_string&lt;wchar_t&gt;;<br/>
[&hellip;]<br/>
&nbsp;&nbsp;// 24.3.5, <em>hash support</em>:<br/>
&nbsp;&nbsp;template&lt;class T&gt; struct hash;<br/>
&nbsp;&nbsp;template&lt;&gt; struct hash&lt;string&gt;;<br/>
&nbsp;&nbsp;<ins>template&lt;&gt; struct hash&lt;u8string&gt;;</ins><br/>
&nbsp;&nbsp;template&lt;&gt; struct hash&lt;u16string&gt;;<br/>
&nbsp;&nbsp;template&lt;&gt; struct hash&lt;u32string&gt;;<br/>
&nbsp;&nbsp;template&lt;&gt; struct hash&lt;wstring&gt;;<br/>
<br/>
&nbsp;&nbsp;namespace pmr {<br/>
&nbsp;&nbsp;&nbsp;&nbsp;template &lt;class charT, class traits = char_traits&lt;charT&gt;&gt;<br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;using basic_string = std::basic_string&lt;charT, traits, polymorphic_allocator&lt;charT&gt;&gt;;<br/>
<br/>
&nbsp;&nbsp;&nbsp;&nbsp;using string    = basic_string&lt;char&gt;;<br/>
&nbsp;&nbsp;&nbsp;&nbsp;<ins>using u8string = basic_string&lt;char8_t&gt;;</ins><br/>
&nbsp;&nbsp;&nbsp;&nbsp;using u16string = basic_string&lt;char16_t&gt;;<br/>
&nbsp;&nbsp;&nbsp;&nbsp;using u32string = basic_string&lt;char32_t&gt;;<br/>
&nbsp;&nbsp;&nbsp;&nbsp;using wstring   = basic_string&lt;wchar_t&gt;;<br/>
&nbsp;&nbsp;}<br/>
<br/>
&nbsp;&nbsp;inline namespace literals {<br/>
&nbsp;&nbsp;inline namespace string_literals {<br/>
&nbsp;&nbsp;&nbsp;&nbsp;// 24.3.6, suffix for basic_string literals:<br/>
&nbsp;&nbsp;&nbsp;&nbsp;string    operator "" s(const char* str, size_t len);<br/>
&nbsp;&nbsp;&nbsp;&nbsp;<ins>u8string operator "" s(const char8_t* str, size_t len);</ins><br/>
&nbsp;&nbsp;&nbsp;&nbsp;u16string operator "" s(const char16_t* str, size_t len);<br/>
&nbsp;&nbsp;&nbsp;&nbsp;u32string operator "" s(const char32_t* str, size_t len);<br/>
&nbsp;&nbsp;&nbsp;&nbsp;wstring   operator "" s(const wchar_t* str, size_t len);<br/>
&nbsp;&nbsp;}<br/>
&nbsp;&nbsp;}<br/>
}<br/>
</tt>
</div>
</blockquote>
</p>

<p>Change in 24.3.5 [basic.string.hash]:
<blockquote>
<tt>
template&lt;&gt; struct hash&lt;string&gt;;<br/>
<ins>template&lt;&gt; struct hash&lt;u8string&gt;;</ins><br/>
template&lt;&gt; struct hash&lt;u16string&gt;;<br/>
template&lt;&gt; struct hash&lt;u32string&gt;;<br/>
template&lt;&gt; struct hash&lt;wstring&gt;;<br/>
</tt>
</blockquote>
</p>

<p>Add a new paragraph after 24.3.6 [basic.string.literals] paragraph 1:
<blockquote class="stdins">
<tt>
u8string operator "" s(const char8_t* str, size_t len);
</tt>
<div style="margin-left: 1em;">
 <em>Returns</em>: <tt>u8string{str, len}</tt>.
</div>
</blockquote>
</p>
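<p><em>Illustration only, not proposed wording.</em> The sketch below shows
which string type the <tt>s</tt> suffix yields for a UTF-8 string literal. It
assumes C++17 or later and a library that provides <tt>std::u8string</tt>
whenever the assumed feature-test macro <tt>__cpp_char8_t</tt> is defined;
<tt>expected_string</tt> and <tt>sample_size</tt> are hypothetical names.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

using namespace std::string_literals;

#if defined(__cpp_char8_t)
using expected_string = std::u8string;  // with the overload added above
#else
using expected_string = std::string;    // existing behavior
#endif

static_assert(std::is_same_v<decltype(u8"text"s), expected_string>,
              "unexpected result type for the s suffix");

// Hypothetical helper: size, in code units, of a sample literal.
inline std::size_t sample_size() {
    const auto sample = u8"\u20AC!"s;  // U+20AC is three code units, '!' is one
    assert(sample.size() == 4);
    return sample.size();
}
```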

<p>Change in 24.4.1 [string.view.synop]:
<blockquote>
<div style="margin-left: 1em;">
<tt>
[&hellip;]<br/>
&nbsp;&nbsp;// basic_string_view <em>typedef names</em><br/>
&nbsp;&nbsp;using string_view = basic_string_view&lt;char&gt;;<br/>
&nbsp;&nbsp;<ins>using u8string_view = basic_string_view&lt;char8_t&gt;;</ins><br/>
&nbsp;&nbsp;using u16string_view = basic_string_view&lt;char16_t&gt;;<br/>
&nbsp;&nbsp;using u32string_view = basic_string_view&lt;char32_t&gt;;<br/>
&nbsp;&nbsp;using wstring_view = basic_string_view&lt;wchar_t&gt;;<br/>
<br/>
&nbsp;&nbsp;// 24.4.5, hash support<br/>
&nbsp;&nbsp;template&lt;class T&gt; struct hash;<br/>
&nbsp;&nbsp;template&lt;&gt; struct hash&lt;string_view&gt;;<br/>
&nbsp;&nbsp;<ins>template&lt;&gt; struct hash&lt;u8string_view&gt;;</ins><br/>
&nbsp;&nbsp;template&lt;&gt; struct hash&lt;u16string_view&gt;;<br/>
&nbsp;&nbsp;template&lt;&gt; struct hash&lt;u32string_view&gt;;<br/>
&nbsp;&nbsp;template&lt;&gt; struct hash&lt;wstring_view&gt;;<br/>
<br/>
&nbsp;&nbsp;inline namespace literals {<br/>
&nbsp;&nbsp;inline namespace string_view_literals {<br/>
&nbsp;&nbsp;&nbsp;&nbsp;// 24.4.6, suffix for basic_string_view literals<br/>
&nbsp;&nbsp;&nbsp;&nbsp;constexpr string_view    operator""sv(const char* str, size_t len) noexcept;<br/>
&nbsp;&nbsp;&nbsp;&nbsp;<ins>constexpr u8string_view operator""sv(const char8_t* str, size_t len) noexcept;</ins><br/>
&nbsp;&nbsp;&nbsp;&nbsp;constexpr u16string_view operator""sv(const char16_t* str, size_t len) noexcept;<br/>
&nbsp;&nbsp;&nbsp;&nbsp;constexpr u32string_view operator""sv(const char32_t* str, size_t len) noexcept;<br/>
&nbsp;&nbsp;&nbsp;&nbsp;constexpr wstring_view   operator""sv(const wchar_t* str, size_t len) noexcept;<br/>
&nbsp;&nbsp;}<br/>
&nbsp;&nbsp;}<br/>
[&hellip;]<br/>
</tt>
</div>
</blockquote>
</p>

<p>Change in 24.4.5 [string.view.hash]:
<blockquote>
<tt>
template&lt;&gt; struct hash&lt;string_view&gt;;<br/>
<ins>template&lt;&gt; struct hash&lt;u8string_view&gt;;</ins><br/>
template&lt;&gt; struct hash&lt;u16string_view&gt;;<br/>
template&lt;&gt; struct hash&lt;u32string_view&gt;;<br/>
template&lt;&gt; struct hash&lt;wstring_view&gt;;<br/>
</tt>
</blockquote>
</p>

<p>Add a new paragraph after 24.4.6 [string.view.literals] paragraph 1:
<blockquote class="stdins">
<tt>
constexpr u8string_view operator""sv(const char8_t* str, size_t len) noexcept;
</tt>
<div style="margin-left: 1em;">
 <em>Returns</em>: <tt>u8string_view{str, len}</tt>.
</div>
</blockquote>
</p>
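<p><em>Illustration only, not proposed wording.</em> The corresponding
<tt>sv</tt> suffix can be checked entirely at compile time. The sketch below
assumes C++17 or later and a library that provides <tt>std::u8string_view</tt>
whenever the assumed feature-test macro <tt>__cpp_char8_t</tt> is defined;
<tt>expected_view</tt> and <tt>greeting</tt> are hypothetical names.</p>

```cpp
#include <string_view>
#include <type_traits>

using namespace std::string_view_literals;

#if defined(__cpp_char8_t)
using expected_view = std::u8string_view;  // with the overload added above
#else
using expected_view = std::string_view;    // existing behavior
#endif

// "gr" is two code units; U+00FC and U+00DF are two code units each.
constexpr auto greeting = u8"gr\u00FC\u00DF"sv;

static_assert(std::is_same_v<decltype(greeting), const expected_view>,
              "unexpected result type for the sv suffix");
static_assert(greeting.size() == 6, "size() counts code units");
// U+00FC is encoded as 0xC3 0xBC.
static_assert(static_cast<unsigned char>(greeting[2]) == 0xC3, "lead code unit");
static_assert(static_cast<unsigned char>(greeting[3]) == 0xBC, "continuation code unit");
```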

<p>Change in table 69 of 25.3.1.1.1 [locale.category]:
<blockquote>
<div style="margin-left: 1em;">
<table>
  <tr>
    <td align="center">
      Table 69 &mdash; Locale category facets
    </td>
  </tr>
  <tr>
    <td align="center">
      <table border="1">
        <tr>
          <th>Category</th>
          <th align="center">Includes facets</th>
        </tr>
        <tr>
          <td>[&hellip;]</td>
          <td>[&hellip;]</td>
        </tr>
        <tr>
          <td valign="top">ctype</td>
          <td><tt>
            ctype&lt;char&gt;, ctype&lt;wchar_t&gt;<br/>
            codecvt&lt;char,char,mbstate_t&gt;<br/>
            codecvt&lt;char16_t,char,mbstate_t&gt;<ins> (deprecated)</ins><br/>
            codecvt&lt;char32_t,char,mbstate_t&gt;<ins> (deprecated)</ins><br/>
            <ins>codecvt&lt;char16_t,char8_t,mbstate_t&gt;</ins><br/>
            <ins>codecvt&lt;char32_t,char8_t,mbstate_t&gt;</ins><br/>
            codecvt&lt;wchar_t,char,mbstate_t&gt;<br/>
          </tt></td>
        </tr>
        <tr>
          <td>[&hellip;]</td>
          <td>[&hellip;]</td>
        </tr>
      </table>
    </td>
  </tr>
</table>
</div>
</blockquote>
</p>

<p>Change in table 70 of 25.3.1.1.1 [locale.category]:
<blockquote>
<div style="margin-left: 1em;">
<table>
  <tr>
    <td align="center">
      Table 70 &mdash; Required specializations
    </td>
  </tr>
  <tr>
    <td align="center">
      <table border="1">
        <tr>
          <th>Category</th>
          <th align="center">Includes facets</th>
        </tr>
        <tr>
          <td>[&hellip;]</td>
          <td>[&hellip;]</td>
        </tr>
        <tr>
          <td valign="top">ctype</td>
          <td><tt>
            ctype_byname&lt;char&gt;, ctype_byname&lt;wchar_t&gt;<br/>
            codecvt_byname&lt;char,char,mbstate_t&gt;<br/>
            codecvt_byname&lt;char16_t,char,mbstate_t&gt;<ins> (deprecated)</ins><br/>
            codecvt_byname&lt;char32_t,char,mbstate_t&gt;<ins> (deprecated)</ins><br/>
            <ins>codecvt_byname&lt;char16_t,char8_t,mbstate_t&gt;</ins><br/>
            <ins>codecvt_byname&lt;char32_t,char8_t,mbstate_t&gt;</ins><br/>
            codecvt_byname&lt;wchar_t,char,mbstate_t&gt;<br/>
          </tt></td>
        </tr>
        <tr>
          <td>[&hellip;]</td>
          <td>[&hellip;]</td>
        </tr>
      </table>
    </td>
  </tr>
</table>
</div>
</blockquote>
</p>

<p>Change in 25.4.1.4 [locale.codecvt] paragraph 3:
<blockquote>
The specializations required in Table 69 (25.3.1.1.1) convert the
implementation-defined native character set.
<tt>codecvt&lt;char, char, mbstate_t&gt;</tt> implements a degenerate
conversion; it does not convert at all. The specialization<ins>s</ins>
<tt>codecvt&lt;char16_t, char, mbstate_t&gt;</tt> <ins>(deprecated) and
<tt>codecvt&lt;char16_t, char8_t, mbstate_t&gt;</tt></ins>
convert<del>s</del> between the UTF-16 and UTF-8 encoding
forms, and the specialization<ins>s</ins>
<tt>codecvt&lt;char32_t, char, mbstate_t&gt;</tt> <ins>(deprecated) and
<tt>codecvt&lt;char32_t, char8_t, mbstate_t&gt;</tt></ins>
convert<del>s</del> between the UTF-32 and UTF-8 encoding forms.
<tt>codecvt&lt;wchar_t,char,mbstate_t&gt;</tt> converts between the native
character sets for <del>narrow</del><ins>ordinary</ins> and wide characters.
Specializations on <tt>mbstate_t</tt> perform conversion between encodings
known to the library implementer. Other encodings can be converted by
specializing on a user-defined <tt>stateT</tt> type. Objects of type
<tt>stateT</tt> can contain any state that is useful to communicate to or
from the specialized <tt>do_in</tt> or <tt>do_out</tt> members.
</blockquote>
</p>


<p>Change in 30.3.1 [iosfwd.syn]:
<blockquote>
<div style="margin-left: 1em;">
<tt>
[&hellip;]<br/>
&nbsp;&nbsp;template&lt;class charT&gt; class char_traits;<br/>
&nbsp;&nbsp;template&lt;&gt; class char_traits&lt;char&gt;;<br/>
&nbsp;&nbsp;<ins>template&lt;&gt; class char_traits&lt;char8_t&gt;;</ins><br/>
&nbsp;&nbsp;template&lt;&gt; class char_traits&lt;char16_t&gt;;<br/>
&nbsp;&nbsp;template&lt;&gt; class char_traits&lt;char32_t&gt;;<br/>
&nbsp;&nbsp;template&lt;&gt; class char_traits&lt;wchar_t&gt;;<br/>
[&hellip;]<br/>
</tt>
</div>
</blockquote>
</p>

<p>Change in 30.11.4 [fs.req] paragraph 1:
<blockquote>
Throughout this subclause, <tt>char</tt>, <tt>wchar_t</tt>,
<ins><tt>char8_t</tt>, </ins><tt>char16_t</tt>, and <tt>char32_t</tt> are collectively
called <em>encoded character types</em>.
</blockquote>
</p>

<p>Change in 30.11.5 [fs.filesystem.syn]:
<blockquote>
<div style="margin-left: 1em;">
<tt>
&nbsp;&nbsp;<em>// <del>30.11.7.6.2</del><ins>D.??</ins>, path factory functions <ins>(deprecated)</ins>:</em><br/>
&nbsp;&nbsp;template &lt;class Source&gt;<br/>
&nbsp;&nbsp;&nbsp;&nbsp;path u8path(const Source&amp; source);<br/>
&nbsp;&nbsp;template &lt;class InputIterator&gt;<br/>
&nbsp;&nbsp;&nbsp;&nbsp;path u8path(InputIterator first, InputIterator last);<br/>
</tt>
</div>
</blockquote>
</p>

<p>Change in 30.11.7 [fs.class.path] paragraph 6:
<blockquote>
<div style="margin-left: 1em;">
<tt>
[&hellip;]<br/>
&nbsp;&nbsp;std::string string() const;<br/>
&nbsp;&nbsp;std::wstring wstring() const;<br/>
&nbsp;&nbsp;std::<del>string</del><ins>u8string</ins> u8string() const;<br/>
&nbsp;&nbsp;std::u16string u16string() const;<br/>
&nbsp;&nbsp;std::u32string u32string() const;<br/>
[&hellip;]<br/>
&nbsp;&nbsp;std::string generic_string() const;<br/>
&nbsp;&nbsp;std::wstring generic_wstring() const;<br/>
&nbsp;&nbsp;std::<del>string</del><ins>u8string</ins> generic_u8string() const;<br/>
&nbsp;&nbsp;std::u16string generic_u16string() const;<br/>
&nbsp;&nbsp;std::u32string generic_u32string() const;<br/>
[&hellip;]<br/>
</tt>
</div>
</blockquote>
</p>
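<p><em>Illustration only, not proposed wording.</em> The changed return types
can be observed without constructing a path. The sketch below assumes C++17 or
later, a library with <tt>&lt;filesystem&gt;</tt>, and that the library switches
the return type whenever the assumed feature-test macro <tt>__cpp_char8_t</tt>
is defined; the alias names are hypothetical.</p>

```cpp
#include <filesystem>
#include <string>
#include <type_traits>
#include <utility>

namespace fs = std::filesystem;

#if defined(__cpp_char8_t)
using expected_string = std::u8string;  // return type with the change above
#else
using expected_string = std::string;    // return type without it
#endif

// Result types of the two UTF-8 observers, obtained in an unevaluated context.
using u8string_result =
    decltype(std::declval<const fs::path&>().u8string());
using generic_u8string_result =
    decltype(std::declval<const fs::path&>().generic_u8string());

static_assert(std::is_same_v<u8string_result, expected_string>,
              "unexpected return type for path::u8string()");
static_assert(std::is_same_v<generic_u8string_result, expected_string>,
              "unexpected return type for path::generic_u8string()");
```

<p>Code that stores the result of <tt>u8string()</tt> in a
<tt>std::string</tt> would need to be adjusted under this change, which is the
kind of source incompatibility the alias above is meant to make visible.</p>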

<p>Change in 30.11.7.2.2 [fs.path.type.cvt] paragraph 1:
<blockquote>
The <em>native encoding</em> of <del>a narrow</del><ins>an ordinary</ins>
character string is the operating system dependent current encoding for
pathnames (30.11.7). The <em>native encoding</em> for wide character strings
is the implementation-defined execution wide-character set encoding (5.3).
</blockquote>
</p>

<p>Change in 30.11.7.2.2 [fs.path.type.cvt] subparagraph (2.1):
<blockquote>
(2.1) &mdash; <tt>char</tt>: The encoding is the native
<del>narrow</del><ins>ordinary</ins> encoding. The method of conversion, if
any, is operating system dependent. [ <em>Note</em>: For POSIX-based
operating systems <tt>path::value_type</tt> is <tt>char</tt> so no conversion
from <tt>char</tt> value type arguments or to <tt>char</tt> value type return
values is performed. For Windows-based operating systems, the native
<del>narrow</del><ins>ordinary</ins> encoding is determined by calling a
Windows API function. &mdash; <em>end note</em> ] [ <em>Note</em>: This results
in behavior identical to other C and C++ standard library functions that
perform file operations using <del>narrow</del><ins>ordinary</ins> character
strings to identify paths. Changing this behavior would be surprising and error
prone. &mdash; <em>end note</em> ]
</blockquote>
</p>

<p>Add a new subparagraph after 30.11.7.2.2 [fs.path.type.cvt] subparagraph (2.2):
<blockquote class="stdins">
(2.?) &mdash; <tt>char8_t</tt>: The encoding is UTF-8. The method of conversion is
unspecified.
</blockquote>
</p>

<p>Change in 30.11.7.4.1 [fs.path.construct] subparagraph (7.2):
<blockquote>
&mdash; Otherwise a conversion is performed using the
<tt>codecvt&lt;wchar_t, char, mbstate_t&gt;</tt> facet of <tt>loc</tt>, and
then a second conversion to the current <del>narrow</del><ins>ordinary</ins>
encoding.
</blockquote>
</p>

<p><em>Drafting note: Is the requirement for a second conversion stated above
correct?  <tt>codecvt&lt;wchar_t, char, mbstate_t&gt;</tt> already converts to
the ordinary character encoding.</em></p>

<p>Change in 30.11.7.4.1 [fs.path.construct] paragraph 8:
<blockquote>
[&hellip;]<br/>
For POSIX-based operating systems, the path is constructed by first using
<tt>latin1_facet</tt> to convert ISO/IEC 8859-1 encoded <tt>latin1_string</tt>
to a wide character string in the native wide encoding (30.11.7.2.2). The
resulting wide string is then converted to
<del>a narrow</del><ins>an ordinary</ins> character pathname string in the
current native <del>narrow</del><ins>ordinary</ins> encoding. If the native
wide encoding is UTF-16 or UTF-32, and the current native
<del>narrow</del><ins>ordinary</ins> encoding is UTF-8, all of the characters
in the ISO/IEC 8859-1 character set will be converted to their Unicode
representation, but for other native <del>narrow</del><ins>ordinary</ins>
encodings some characters may have no representation.
[&hellip;]<br/>
</blockquote>
</p>

<p>Change in 30.11.7.4.6 [fs.path.native.obs] paragraph 8:
<blockquote>
<div style="margin-left: 1em;">
<tt>
std::string string() const;<br/>
std::wstring wstring() const;<br/>
std::<del>string</del><ins>u8string</ins> u8string() const;<br/>
std::u16string u16string() const;<br/>
std::u32string u32string() const;<br/>
</tt>
</div>
<br/>
<em>Returns</em>: <tt>native()</tt>.
</blockquote>
</p>

<p>Change in 30.11.7.4.6 [fs.path.native.obs] paragraph 9:
<blockquote>
<em>Remarks</em>: Conversion, if any, is performed as specified by 30.11.7.2.
<del>The encoding of the string returned by <tt>u8string()</tt> is always
UTF-8.</del>
</blockquote>
</p>

<p>Change in 30.11.7.4.7 [fs.path.generic.obs] paragraph 5:
<blockquote>
<div style="margin-left: 1em;">
<tt>
std::string generic_string() const;<br/>
std::wstring generic_wstring() const;<br/>
std::<del>string</del><ins>u8string</ins> generic_u8string() const;<br/>
std::u16string generic_u16string() const;<br/>
std::u32string generic_u32string() const;<br/>
</tt>
</div>
<br/>
<em>Returns</em>: The pathname in the generic format.
</blockquote>
</p>

<p>Change in 30.11.7.4.7 [fs.path.generic.obs] paragraph 6:
<blockquote>
<em>Remarks</em>: Conversion, if any, is specified by 30.11.7.2. <del>The
encoding of the string returned by <tt>generic_u8string()</tt> is always UTF-8.</del>
</blockquote>
</p>

<p>Move subclause 30.11.7.6.2 [fs.path.factory] to Annex D
and rename it to [depr.fs.path.factory].</p>

<p><em>Drafting note: The <tt>u8path</tt> factory function templates are
deprecated.</em></p>

<p>Change in 32.2 [atomics.syn]:
<blockquote>
<div style="margin-left: 1em;">
<tt>
[&hellip;]<br/>
&nbsp;&nbsp;<em>// 32.5, lock-free property</em><br/>
&nbsp;&nbsp;#define ATOMIC_BOOL_LOCK_FREE <em>unspecified</em><br/>
&nbsp;&nbsp;#define ATOMIC_CHAR_LOCK_FREE <em>unspecified</em><br/>
&nbsp;&nbsp;<ins>#define ATOMIC_CHAR8_T_LOCK_FREE <em>unspecified</em></ins><br/>
&nbsp;&nbsp;#define ATOMIC_CHAR16_T_LOCK_FREE <em>unspecified</em><br/>
&nbsp;&nbsp;#define ATOMIC_CHAR32_T_LOCK_FREE <em>unspecified</em><br/>
&nbsp;&nbsp;#define ATOMIC_WCHAR_T_LOCK_FREE <em>unspecified</em><br/>
[&hellip;]<br/>
&nbsp;&nbsp;using atomic_ullong&nbsp;&nbsp;&nbsp;= atomic&lt;unsigned long long&gt;;<br/>
&nbsp;&nbsp;<ins>using atomic_char8_t&nbsp;&nbsp;= atomic&lt;char8_t&gt;;</ins><br/>
&nbsp;&nbsp;using atomic_char16_t&nbsp;= atomic&lt;char16_t&gt;;<br/>
&nbsp;&nbsp;using atomic_char32_t&nbsp;= atomic&lt;char32_t&gt;;<br/>
&nbsp;&nbsp;using atomic_wchar_t&nbsp;&nbsp;= atomic&lt;wchar_t&gt;;<br/>
</tt>
</div>
</blockquote>
</p>

<p>Change in 32.5 [atomics.lockfree]:
<blockquote>
<div style="margin-left: 1em;">
<tt>
&nbsp;&nbsp;#define ATOMIC_BOOL_LOCK_FREE <em>unspecified</em><br/>
&nbsp;&nbsp;#define ATOMIC_CHAR_LOCK_FREE <em>unspecified</em><br/>
&nbsp;&nbsp;<ins>#define ATOMIC_CHAR8_T_LOCK_FREE <em>unspecified</em></ins><br/>
&nbsp;&nbsp;#define ATOMIC_CHAR16_T_LOCK_FREE <em>unspecified</em><br/>
&nbsp;&nbsp;#define ATOMIC_CHAR32_T_LOCK_FREE <em>unspecified</em><br/>
&nbsp;&nbsp;#define ATOMIC_WCHAR_T_LOCK_FREE <em>unspecified</em><br/>
&nbsp;&nbsp;[&hellip;]<br/>
</tt>
</div>
</blockquote>
</p>

<p>Change in 32.6.2 [atomics.types.int] paragraph 1:
<blockquote>
There are specializations of the <tt>atomic</tt> template for the
integral types <tt>char</tt>, <tt>signed char</tt>, <tt>unsigned char</tt>,
<tt>short</tt>, <tt>unsigned short</tt>, <tt>int</tt>, <tt>unsigned int</tt>,
<tt>long</tt>, <tt>unsigned long</tt>, <tt>long long</tt>,
<tt>unsigned long long</tt>, <ins><tt>char8_t</tt>, </ins><tt>char16_t</tt>,
<tt>char32_t</tt>, <tt>wchar_t</tt>, and any other types needed by the typedefs
in the header &lt;cstdint&gt;. [&hellip;]<br/>
[&hellip;]
</blockquote>
</p>
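<p>As an informal illustration (not part of the proposed wording), the
following sketch shows how code might use the integral <tt>atomic</tt>
specialization for <tt>char8_t</tt> where it is available. The names
<tt>code_unit</tt> and <tt>post_increment</tt> are hypothetical, and the sketch
assumes that the implementation defines <tt>__cpp_char8_t</tt> and
<tt>__cpp_lib_char8_t</tt> as recommended later in this paper; otherwise it
falls back to <tt>unsigned char</tt>.</p>

```cpp
#include <atomic>

// Assumption: both feature-test macros are defined when the language and
// library support for char8_t is enabled. Otherwise fall back to
// unsigned char, which is the underlying type proposed for char8_t.
#if defined(__cpp_char8_t) && defined(__cpp_lib_char8_t)
using code_unit = char8_t;        // hypothetical alias
#else
using code_unit = unsigned char;  // hypothetical alias
#endif

// Hypothetical helper: atomically adds 1 to the stored code unit and
// returns the value held before the addition. fetch_add is provided by
// the integral specializations of std::atomic.
inline code_unit post_increment(std::atomic<code_unit>& slot) {
    return slot.fetch_add(1);
}
```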

<p>Change in A.6 [gram.dcl]:
<blockquote>
<div style="margin-left: 1em;">
<tt>
[&hellip;]<br/>
<em>simple-type-specifier</em>:<br/>
&nbsp;&nbsp;&nbsp;[&hellip;]<br/>
&nbsp;&nbsp;&nbsp;<tt>char</tt><br/>
&nbsp;&nbsp;&nbsp;<ins><tt>char8_t</tt></ins><br/>
&nbsp;&nbsp;&nbsp;<tt>char16_t</tt><br/>
&nbsp;&nbsp;&nbsp;<tt>char32_t</tt><br/>
&nbsp;&nbsp;&nbsp;<tt>wchar_t</tt><br/>
&nbsp;&nbsp;&nbsp;[&hellip;]<br/>
[&hellip;]<br/>
</tt>
</div>
</blockquote>
</p>

<p>Change in C.1.1 [diff.lex]:
<blockquote>
[&hellip;]<br/>
<strong>Change</strong>: String literals made const.<br/>
The type of a string literal is changed from "array of <tt>char</tt>" to
"array of <tt>const char</tt>". <ins>The type of a <tt>char8_t</tt> string
literal is changed from "array of <em>some-integer-type</em>" to "array of
<tt>const char8_t</tt>". </ins>The type of a <tt>char16_t</tt> string literal
is changed from "array of <em>some-integer-type</em>" to "array of
<tt>const char16_t</tt>". The type of a <tt>char32_t</tt> string literal is
changed from "array of <em>some-integer-type</em>" to "array of
<tt>const char32_t</tt>". The type of a wide string literal is changed from
"array of <tt>wchar_t</tt>" to "array of <tt>const wchar_t</tt>".<br/>
[&hellip;]<br/>
</blockquote>
</p>

<p>Change in C.5.1 [diff.cpp17.lex] paragraph 1:
<blockquote>
<strong>Affected subclause:</strong> 5.11<br/>
<strong>Change</strong>: New keywords<br/>
<strong>Rationale</strong>: Required for new features. The <tt>requires</tt>
keyword is added to introduce constraints through a <em>requires-clause</em>
or a <em>requires-expression</em>. The <tt>concept</tt> keyword is added to
enable the definition of <em>concepts</em> (17.6.8).  <ins>The <tt>char8_t</tt>
keyword is added to differentiate the types of ordinary and UTF-8
literals (5.13.5).</ins><br/>
<strong>Effect on original feature</strong>: Valid ISO C++ 2017 code using
<tt>concept</tt><ins>,</ins><del> or</del> <tt>requires</tt><ins>, or
<tt>char8_t</tt></ins> as an identifier is not valid in this International
Standard.<br/>
</blockquote>
</p>

<p>Add a new paragraph to C.5.1 [diff.cpp17.lex]:
<blockquote class=stdins>
<strong>Affected subclause:</strong> 5.13<br/>
<strong>Change</strong>: Type of UTF-8 string and character literals.<br/>
<strong>Rationale</strong>: Required for new features.  The changed types
enable function overloading, template specialization, and type deduction
to distinguish ordinary and UTF-8 string and character literals.<br/>
<strong>Effect on original feature</strong>: Valid ISO C++ 2017 code that
depends on UTF-8 string literals having type "array of <tt>const char</tt>"
and UTF-8 character literals having type "<tt>char</tt>" is not valid in
this International Standard.<br/>
<br/>
<div style="margin-left: 1em;">
<tt>
<pre>const auto *u8s = u8"text";   <em>// <tt>u8s</tt> previously deduced as <tt>const char *</tt>; now deduced as <tt>const char8_t *</tt>.</em>
const char *ps = u8s;         <em>// ill-formed; previously well-formed.</em>

auto u8c = u8'c';             <em>// <tt>u8c</tt> previously deduced as <tt>char</tt>; now deduced as <tt>char8_t</tt>.</em>
char *pc = &amp;u8c;              <em>// ill-formed; previously well-formed.</em>

std::string s = u8"text";     <em>// ill-formed; previously well-formed.</em>

void f(const char *s);
f(u8"text");                  <em>// ill-formed; previously well-formed.</em>

template&lt;typename&gt; struct ct;
template&lt;&gt; struct ct&lt;char&gt; {
  using type = char;
};
ct&lt;decltype(u8'c')&gt;::type x;  <em>// ill-formed; previously well-formed.</em>
</pre>
</tt>
</div>
</blockquote>
</p>
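<p>As an informal illustration (not part of the proposed wording), the
following sketch shows one way existing code that passes UTF-8 literals to
<tt>const char *</tt> interfaces might be adapted. The helper
<tt>as_char_ptr</tt> and the function <tt>legacy_length</tt> are hypothetical
names introduced here, and the sketch assumes that the implementation
predefines <tt>__cpp_char8_t</tt> when the new literal types are in effect.</p>

```cpp
#include <cstddef>
#include <cstring>

#if defined(__cpp_char8_t)
// Hypothetical helper: view UTF-8 code units as ordinary chars.
// Reading any object through a char glvalue is permitted by the
// aliasing rules, so this cast does not introduce undefined behavior.
inline const char* as_char_ptr(const char8_t* s) {
    return reinterpret_cast<const char*>(s);
}
#endif

// Overload used when UTF-8 literals still have type const char[N],
// and for ordinary strings in either mode.
inline const char* as_char_ptr(const char* s) {
    return s;
}

// Hypothetical legacy interface that expects ordinary char data.
inline std::size_t legacy_length(const char* s) {
    return std::strlen(s);
}
```

<p>With these overloads, a call such as
<tt>legacy_length(as_char_ptr(u8"text"))</tt> is well-formed whether or not
UTF-8 literals have <tt>char8_t</tt> type.</p>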

<p>Add a new subclause after C.5.4 [diff.cpp17.library]:
<blockquote class=stdins>
C.5.? Clause 30: Input/Output library [diff.cpp17.input.output]<br/>
<br/>
<strong>Affected subclause:</strong> 30.11.7<br/>
<strong>Change</strong>: Return type of filesystem path format
observer member functions.<br/>
<strong>Rationale</strong>: Required for new features.<br/>
<strong>Effect on original feature</strong>: Valid ISO C++ 2017 code that
depends on the <tt>u8string()</tt> and <tt>generic_u8string()</tt> member
functions of <tt>std::filesystem::path</tt> returning <tt>std::string</tt>
is not valid in this International Standard.<br/>
<br/>
<div style="margin-left: 1em;">
<tt>
<pre>std::filesystem::path p;
std::string s1 = p.u8string();          <em>// ill-formed; previously well-formed.</em>
std::string s2 = p.generic_u8string();  <em>// ill-formed; previously well-formed.</em>
</pre>
</tt>
</div>
</blockquote>
</p>
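<p>As an informal illustration (not part of the proposed wording), code that
needs the UTF-8 form of a path as a <tt>std::string</tt> could copy the code
units explicitly. The helper name <tt>u8string_as_string</tt> is hypothetical.
The sketch is intended to compile whether <tt>u8string()</tt> returns
<tt>std::string</tt> or <tt>std::u8string</tt>.</p>

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: returns the UTF-8 encoded pathname as std::string.
// The iterator-range constructor converts each code unit to char, so the
// same code works for a std::string or a std::u8string result.
inline std::string u8string_as_string(const std::filesystem::path& p) {
    const auto u8 = p.u8string();
    return std::string(u8.begin(), u8.end());
}
```

<p>The resulting <tt>std::string</tt> holds UTF-8 code units, which may differ
from the native ordinary encoding, so callers remain responsible for tracking
the encoding.</p>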

<h2 id="feature_testing">Wording for P0096: Feature-testing recommendations
        for C++</h2>

<p>These changes are relative to
<a title="P0096R5: Feature-testing recommendations for C++"
   href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0096r5.html">
P0096R5 (Feature-testing recommendations for C++)</a>
<sup><a title="P0096R5: Feature-testing recommendations for C++"
        href="#ref_p0096r5">
[P0096R5]</a></sup>.
</p>

<p>Add a new subclause before 3.4, "C++17 features":
<blockquote class=stdins>
<strong>[3.X] C++Maybe features</strong><br/>
<br/>
[1] The following table itemizes changes in consideration for a future WG21
working draft. (Changes that were made as specified in a core or library issue
are not generally included.)<br/>
<br/>
[2] The table is sorted by the section of the standard primarily affected. The
"Doc. No." column links to the paper itself on the committee web site. The
"Macro Name" column links to the relevant portion of the "Detailed explanation
and rationale" section of this document.  When the recommendation is to change
the value of a macro previously recommended to be defined, the "Value" column
links to the table entry for the previous recommendation.<br/>
<br/>
[3] For library features, the "Header" column identifies the header that is expected
to define the macro, although the macro may also be predefined. For language
features, the macro must be predefined.<br/>
<br/>
<table>
  <tr>
    <td align="center">
      Significant changes under consideration
    </td>
  </tr>
  <tr>
    <td align="center">
      <table border="1">
        <tr>
          <th align="center">Doc. No.</th>
          <th align="center">Title</th>
          <th align="center">Primary<br/>Section</th>
          <th align="center">Macro Name</th>
          <th align="center">Value</th>
          <th align="center">Header</th>
        </tr>
        <tr>
          <td><a href="http://wg21.link/p0482r1">P0482R1</a></td>
          <td>char8_t: A type for UTF-8 characters and strings</td>
          <td>5.13, 6.7.1</td>
          <td>__cpp_char8_t</td>
          <td>1</td>
          <td><em>predefined</em></td>
        </tr>
        <tr>
          <td><a href="http://wg21.link/p0482r1">P0482R1</a></td>
          <td>char8_t: A type for UTF-8 characters and strings</td>
          <td>21.3.2, 24.3.1, 24.4.1, 25.2, 30.10.5, 32.2</td>
          <td>__cpp_lib_char8_t</td>
          <td>1</td>
          <td>&lt;atomic&gt; &lt;filesystem&gt;
              &lt;limits&gt; &lt;locale&gt;
              &lt;string&gt; &lt;string_view&gt;
          </td>
        </tr>
      </table>
    </td>
  </tr>
</table>
</blockquote>
</p>
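<p>As an informal illustration of how the recommended macros might be used,
the following sketch selects a string type for UTF-8 data. The alias
<tt>utf8_string</tt> and the function <tt>make_utf8_greeting</tt> are
hypothetical names. The sketch assumes that <tt>__cpp_char8_t</tt> is
predefined, that <tt>__cpp_lib_char8_t</tt> is defined by
<tt>&lt;string&gt;</tt>, and that language and library support are enabled
together.</p>

```cpp
#include <string>

// Assumption: when both macros are defined, u8 literals have char8_t
// type and std::u8string is available. Otherwise u8 literals have
// ordinary char type and std::string is used.
#if defined(__cpp_char8_t) && defined(__cpp_lib_char8_t)
using utf8_string = std::u8string;  // hypothetical alias
#else
using utf8_string = std::string;    // hypothetical alias
#endif

// Hypothetical function: builds a UTF-8 string from a UTF-8 literal.
// The literal converts to utf8_string in either configuration.
inline utf8_string make_utf8_greeting() {
    return u8"hello";
}
```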

<h1 id="acknowledgements">Acknowledgements</h1>

<p>Michael Spencer and Davide C. C. Italiano first proposed adding a new
<tt>char8_t</tt> fundamental type in 
<a title="P0372R0: A type for utf-8 data"
   href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0372r0.html">
P0372R0</a>
<sup><a title="P0372R0: A type for utf-8 data"
        href="#ref_p0372r0">
[P0372R0]</a></sup>.</p>

<h1 id="references">References</h1>

<table id="references">
  <tr>
    <td id="ref_w3techs"><sup>[W3Techs]</sup></td>
    <td>
      "Usage of UTF-8 for websites", W3Techs, 2017.<br/>
      <a href="https://w3techs.com/technologies/details/en-utf8/all/all">
      https://w3techs.com/technologies/details/en-utf8/all/all</a></td>
  </tr>
  <tr>
    <td id="ref_n2249"><sup>[N2249]</sup></td>
    <td>
      Lawrence Crowl,
      "New Character Types in C++", N2249, 2007.<br/>
      <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2249.html">
      http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2249.html</a></td>
  </tr>
  <tr>
    <td id="ref_n4197"><sup>[N4197]</sup></td>
    <td>
      Richard Smith,
      "Adding u8 character literals", N4197, 2014.<br/>
      <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4197.html">
      http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4197.html</a></td>
  </tr>
  <tr>
    <td id="ref_n4713"><sup>[N4713]</sup></td>
    <td>
      "Working Draft, Standard for Programming Language C++", N4713, 2017.<br/>
      <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4713.pdf">
      http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4713.pdf</a></td>
  </tr>
  <tr>
    <td id="ref_p0096r5"><sup>[P0096R5]</sup></td>
    <td>
      Clark Nelson,
      "Feature-testing recommendations for C++", P0096R5, 2017.<br/>
      <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0096r5.html">
      http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0096r5.html</a></td>
  </tr>
  <tr>
    <td id="ref_p0372r0"><sup>[P0372R0]</sup></td>
    <td>
      Michael Spencer and Davide C. C. Italiano,
      "A type for utf-8 data", P0372R0, 2016.<br/>
      <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0372r0.html">
      http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0372r0.html</a></td>
  </tr>
  <tr>
    <td id="ref_p0244r2"><sup>[P0244R2]</sup></td>
    <td>
      Tom Honermann,
      "Text_view: A C++ concepts and range based character encoding and code
       point enumeration library", P0244R2, 2017.<br/>
      <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0244r2.html">
      http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0244r2.html</a></td>
  </tr>
  <tr>
    <td id="ref_p0218r1"><sup>[P0218R1]</sup></td>
    <td>
      Beman Dawes,
      "Adopt the File System TS for C++17", P0218R1, 2016.<br/>
      <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0218r1.html">
      http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0218r1.html</a></td>
  </tr>
</table>

</body>
