<html>

<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style>
body
{
  font-family: arial, sans-serif;
  max-width: 6.75in;
  margin: 0px auto;
  font-size: 85%;
}
 ins  {background-color: #CCFFCC; text-decoration: none;}
 del  {background-color: #FFCACA; text-decoration: none;}
 pre  {background-color: #D7EEFF; font-size: 95%; font-family: "courier new", courier, serif;}
 code {font-family: "courier new", courier, serif;}
 table {font-size: 90%;}
</style>
<title>Unicode Encoding conversions</title>
</head>

<body>
<table>
<tr>
  <td align="left">Doc. no.:</td>
  <td align="left">P0353R0</td>
</tr>
<tr>
  <td align="left">Date:</td>
  <td align="left">
  <!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y-%m-%d" startspan -->2016-05-30<!--webbot bot="Timestamp" endspan i-checksum="12329" --></td>
</tr>
<tr>
  <td align="left">Reply to:</td>
  <td align="left">Beman Dawes &lt;bdawes at acm dot org&gt;
</tr>
<tr>
  <td align="left">Audience:</td>
  <td align="left">Library Evolution</td>
</tr>
</table>



<h1 align="center">Unicode Encoding Conversions for the Standard Library</h1>

<p  style="margin-left:20%; margin-right:20%;">Proposes Unicode Transformation 
Form (UTF) encoding conversion functions to ease interoperability between the 
strings of <code>char</code>, <code>char16_t</code>, 
<code>char32_t</code>, and <code>wchar_t</code> character types. Pure addition to the standard library. 
No changes to the 
core language or existing standard library components. Breaks no existing code 
or ABI. Specified in accordance with the Unicode Standard. Proposed wording 
provided. Has been implemented. Suitable for 
either a library TS or the standard itself.</p>



<p  style="margin-left:20%; margin-right:20%;">This is a preliminary proposal to 
gain feedback from the LEWG.</p>



<h2>Motivation</h2>



<p>Modern C++ character types <code>char</code>, <code>char16_t</code>, 
and
<code>char32_t</code> support 
Unicode Transformation Forms UTF-8, UTF-16, and UTF-32 respectively. Character 
type <code>wchar_t</code> also supports one of these encodings. 
Character and string literals and several forms of strings are supported for 
these character types. Use of more than one UTF encoding may appear in the same 
application, or even the same function. Yet neither the language nor the 
standard library provides a modern, convenient way to convert between these encodings. 
There is no equivalent to the ease with which the <code>std::to_string</code> 
family of functions can convert an arithmetic value to a string. This proposal 
solves the problems encountered by users due to the lack of convenient Unicode 
encoding conversions in the standard library. It does so in a way that meets the 
error handling requirements of the Unicode standard and the error handling needs 
requested by Unicode experts.</p>



<h3>Example</h3>



  <p><b>Problem: </b>Given a third-party function <code>f()</code> that returns a UTF-8 
  encoded <code>std::string</code> from a database, and a function <code>g()</code> 
  from a different third-party that expects a UTF-16 encoded <code>std::u16string</code> as an argument, <code>
  call g()</code> with <code>f()</code> in a way that converts the string types 
  and encodings, and handles errors according to the best practices documented in 
  the Unicode standard.</p>
  <p><b>Using the proposal:</b></p>
  <blockquote>
  <p dir="ltr">
  <code>string u8str(f());</code>
  <code>
  &nbsp;// get a string that happens to be UTF-8 encoded<br>
  ...<br>
  g(<a href="#uni.conv_enc_cvt">to_u16string</a>(u8str)); // call a function 
  that requires UTF-16</code></p>
  </blockquote>
<p><b>Without the proposal, using only the standard library:</b> This might not 
be too difficult using a third-party library, but is surprisingly difficult 
using only the standard library. Unless the developer had enough Unicode 
experience to focus on error detection and to test against one of the existing 
UTF-8 test data sets, a roll-your-own solution would probably be very 
error-prone.</p>



<h2>Prior proposal</h2>



<p><a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3398.html">
N3398</a>, <i>String Interoperation Library</i>, proposed a complete overhaul of 
the standard library&#39;s mechanisms for character encoding conversion. The 
proposal was discussed at the Portland meeting in 2012. Some aspects of the 
proposal drew strong support, such as improving Unicode string interoperability. 
Other aspects drew strong opposition, such as new low level functionality to 
replace <code>std::codecvt</code>. Clearly participants did not want N3398 - 
they wanted a different proposal, less overreaching and more focused on Unicode 
encoding conversions. Bill Plauger summed it up when he said something like 
&quot;Don&#39;t reinvent <code>codecvt</code>. That said, we should pick  a 
winner &mdash; Unicode.&quot;</p>



<p>The current proposal is completely new and not a revision of N3398.</p>



<h2>Implementation</h2>



<p>A Boost licensed preliminary implementation is available at
<a href="https://github.com/beman/unicode/tree/std-proposal">github.com/beman/unicode/tree/std-proposal</a>.</p>



<h2>Acknowledgements</h2>



<p>Alisdair Meredith, Eric Niebler, Howard Hinnant, Jeffrey Yasskin, 
Marshall Clow, PJ Plauger, and Stephan T. Lavavej participated in the Portland 
discussion of N3398. Many of the design decisions that have gone into the current 
proposal flow directly from the Portland discussion.</p>



<p>After the Portland meeting, Matt Austern sat down with Google&#39;s &quot;Unicode 
people&quot; to &quot;clarify things&quot;. His summary of that discussion was very helpful. 
Its guidance on error handling is  reflected in current  
proposal.</p>



<h2>Design decisions</h2>



<p><b>Limit this proposal on UTF encoding conversion</b></p>



  <ul>
    <li>Getting UTF encoding conversion right is hard enough without also coping 
    with other Unicode needs.</li>
    <li>No all-encompassing Unicode proposal is on the horizon.</li>
    <li>Other needs requested by domain experts, such as code point iterators, 
    are important enough to deserve their own proposals.</li>
    <li>Guidance from both the committee and domain experts clearly favors a 
    proposal focusing on UTF conversions.</li>
    <li>UTF encoding conversions can use character type alone to determine 
    encoding, and this results in simpler conversion interfaces and no need to 
    add a <code>char8_t</code> character type to the core language.</li>
  </ul>



<p><b>Provide three levels of functionality</b></p>
<blockquote>



<p>Differing user needs is the primary motivation for providing three levels of 
functionality. Meshing well with existing standard library components such as 
STL algorithms and the <code>to_string</code> family of functions is an additional 
benefit.</p>



<ul>
  <li>Provide high-level convenience encoding conversion functions that handle 
everyday string interoperability needs. Keep the interfaces simple. Example 
  of use:</li>
</ul>



<blockquote>



<blockquote>



<p><code>string u8str = u16str;</code></p>



</blockquote>



</blockquote>



<ul>
  <li>Provide mid-level generic string conversion functions to support those 
  requiring generic string interoperability needs. Example of use:</li>
</ul>



<blockquote>



<blockquote>



<p><i><code>To Be supplied</code></i></p>



</blockquote>



</blockquote>



<ul>
  <li>Provide a low-level encoding conversion algorithm patterned after existing standard 
library algorithms. Useful for users needing to perform encoding conversions on 
sequences of characters. Provides the underlying UTF encoding conversions for 
  the other functions. Example of use:</li>
</ul>
<blockquote>



<blockquote>



<p><i><code>To Be supplied</code></i></p>
</blockquote>
</blockquote>



</blockquote>



<p><b>Provide a coherent error detection and handling policy</b></p>



<ul>
  <li>Follow the Unicode standard&#39;s requirement that &quot;A conformant encoding form 
  conversion will treat any ill-formed code unit
  sequence as an error condition.&quot; (Unicode 3.9 D93 and C10).</li>
  <li>Provide a note to inform users of as to when they do and do not have to 
  explicitly check for errors themselves.</li>
  <li>Provide error checking functions so that users can also explicitly check for errors 
  to meet application needs.</li>
  <li>Handle errors via function objects to support the varied user needs 
  described by domain experts.</li>
  <li>Default error handler function object follows the Unicode standard&#39;s 
  recommendation of U+FFFD as a replacement character. Their rationale is that 
  the other common approaches, including throwing an exception, can be and have 
  been used as attack vectors.</li>
</ul>



<p><b>Provide encoding conversions as explicitly called non-member functions</b></p>



<ul>
  <li>Avoids possibly expensive hidden automatic encoding conversions in 
  unexpected places.</li>
  <li>Avoids the need to change existing standard library components.</li>
  <li>Meshes well with the other <code>to_*string</code> functions already in the 
  standard library.</li>
</ul>



<p><b>Support <code>wchar_t</code> as well as <code>char</code>, <code>char16_t</code>, 
and <code>char32_t</code></b></p>



<ul>
  <li>
  <p><code>wchar_t</code> strings are the bridge to and from non-UTF encoded
  <code>char</code> strings, via existing standard library components using 
  codecvt facets. This requires that <code>wchar_t</code> strings are UTF 
  encoded, just as the proposal requires <code>char16_t</code> and <code>char32_t</code> 
  strings be UTF encoded.</li>
</ul>



<p><b>Place the proposed components in a <code>unicode</code> namespace</b></p>



<ul>
  <li>Emphasizes that these functions assume <code>
  char</code> strings use a Unicode encoding.</li>
  <li>Signals that the committee cares about Unicode, but doesn&#39;t want to force 
  it on users who prefer other encodings.</li>
  <li>Provides a home for these and future Unicode specific functions.</li>
</ul>



<p><b>Keep interfaces neutral as to which character type or UTF encoding is 
&quot;best&quot;</b></p>



<ul>
  <li>



<p>Each of these encodings have uses where it is preferred or required, and all 
of these needs may appear in the same application. For example:</p>



  <ul>
    <li>UTF-8 is required by the API&#39;s for some operating systems.</li>
    <li>UTF-16 is required to interfacing with existing databases that use UTF-16 
    encoding.</li>
    <li>UTF-32 is preferred in code where every 
    Unicode code point being encoded as a single code unit is advantageous.</li>
  </ul>



  </li>
</ul>
<p><b>Base conformance and definitions on the Unicode standard</b></p>
<ul>
  <li>Repeating specifications that are already covered by the Unicode standard, 
  and then trying to stay in sync as the Unicode standard evolves, is the path 
  to insanity.</li>
  <li>In an &quot;informative&quot; (i.e. non-normative) section do describe some of the 
  key Unicode definitions for the convenience of readers who have not memorized 
  them.</li>
</ul>
<h2>Questions for the Library Evolution Working Group</h2>
<ul>
  <li>The specification for the <code>to_*string</code> convenience functions 
  could be reduced from 16 signatures to four signatures by changing the 
  argument types to SOURCE, and then specifying SOURCE as being any one of the 
  four current argument types. The implementors could comply by 
  supplying the full 16 signatures or by clever template metaprogramming. In other words, a less signatures versus 
  more complex wording tradeoff. Does the LEWG/LWG have a strong preference 
  either way?<br>
&nbsp;</li>
  <li>Should a second error handler that throws an exception be provided? 
  Although the <code>ufffd</code> error handler is clearly the best default,  applications that work with supposedly well-formed UTF encodings 
  may want an exception thrown if an ill-formed encoding is encountered.<br>
&nbsp;</li>
  <li>Should a new exception type such as <code>encoding_error</code> be provided?<br>
&nbsp;</li>
  <li>The Boost version will supply stream inserters and perhaps 
  extractors that perform encoding conversion. Would LEWG/LWG like to see a 
  similar proposal?</li>
</ul>
<blockquote>
  <blockquote>
<pre>#include &lt;boost/unicode/stream.hpp&gt;
u16string str16(u&quot;☺☺☺&quot;);
...
cout << str16 << '\n';</pre>
  </blockquote>
</blockquote>
<ul>
  <li>The Boost version will supply codecvt-based encoding conversion 
  between <code>char</code> and <code>wchar_t</code> strings. Would LEWG/LWG 
  like to see a similar proposal?</li>
</ul>
<blockquote>
  <blockquote>
<pre>#include &lt;boost/unicode/codecvt_conversion.hpp&gt;
string big5buf;  // big-5 encoded
wstring wbuf;    // UTF encoded
...
wbuf = codecvt_to_wstring(big5buf, big5_codecvt_facet);</pre>
  </blockquote>
</blockquote>
<h2>To do</h2>
<ul>
  <li>
<p>If P0254, <i>Integrating <code>std::string_view</code> and <code>std::string</code></i>, 
is accepted, then the proposed wording below needs to be reviewed to accommodate 
changes mandated by P0254. Such changes, if any, are expected to be minor.<br>
&nbsp;</p>
  </li>
  <li>
<p>Add non-modifying sequence and string error-checking functions that detect ill 
formed encodings.<br>
&nbsp;</p>
  </li>
  <li>Add arguments to error handler function objects:<ul>
    <li>The location of the error in the input sequence. Probably the iterator of 
  the point where the error was detected.</li>
    <li>The specifics of the error. Probably an error type enum for the specific 
  encoding form errors called out by the Unicode Standard.</li>
  </ul>
  </li>
</ul>

<h1>Proposed wording</h1>

<h2>Unicode library [unicode]</h2>

<p>This clause describes components that C++ programs may use to perform 
operations on sequences and strings encoded in the Unicode character encoding 
forms UTF-32, UTF-16, and UTF-8.</p>

<h3>Normative references [uni.refs]</h3>

<p>The <i><a href="http://www.unicode.org/versions/latest/">Unicode Standard</a></i> is indispensable for the application of 
this document.<sup><font face="Arial">[footnote]</font></sup> The latest edition
(including any amendments) applies. A reference to the Unicode Standard  
written in the form &quot;(Unicode 3.4 D10)&quot; refers to the Unicode Standard, Core 
Specification, chapter 3, section 4, clause D10.</p>

<p><sup><font face="Arial">[Footnote]</font></sup> Unicode® is a registered 
trademark of Unicode, Inc. This information is given for the convenience of 
users of this document and does not constitute an endorsement by ISO or IEC of 
this product.</p>

<h3>Conformance [uni.conf]</h3>

<p>Any conflict between this Technical Specification&#39;s Unicode section ([unicode]) 
and the Unicode Standard, Chapter 3, C (conformance) and D (definitions) clauses 
is unintentional and should be resolved by reference to the Unicode Standard.</p>

<p>The normative definitions for the terms described informally in [uni.defs] 
are included in this Technical Specification by reference from the indicated 
D-clause definitions of the Unicode Standard.</p>

<h3>Definitions (Informative) [uni.defs]</h3>

<p>For convenience, informal summaries of definitions used in [unicode] are 
given here as quotes from the Unicode Standard. </p>

<h4>Code point (Unicode 3.4 D10)</h4>
<p>&quot;Any value in the Unicode codespace. Informally, a code point can be thought 
of as a Unicode character.&quot; </p>
<blockquote>
  <p>(Unicode Appendix A - Notational Conventions): </p>
  <p>&quot;In running text, an individual Unicode code point is expressed as U+n, 
  where n is four to six hexadecimal digits, using the digits 0–9 and uppercase 
  letters A–F (for 10 through 15, respectively). Leading zeros are omitted, 
  unless the code point would have fewer than four hexadecimal digits—for 
  example, U+0001, U+0012, U+0123, U+1234, U+12345, U+102345. </p>
  <p>[e.g.] U+0416 is the Unicode code point for the character named CYRILLIC 
  CAPITAL LETTER ZHE.&quot; </p>
</blockquote>

<h4>Code unit (Unicode 3.9 D77)</h4>
<p>&quot;The minimal bit combination that can represent a unit of encoded text for 
processing or interchange. Code units are particular units of computer storage. 
... The Unicode Standard uses 8-bit code units in the UTF-8 encoding form, 
16-bit code units in the UTF-16 encoding form, and 32-bit code units in the 
UTF-32 encoding form.&quot;</p>
<blockquote>
  <p>[<i>Note:</i> In C++ one <code>char</code>, <code>wchar_t</code>, <code>char16_t</code>, or 
  <code>char32_t</code> character holds one 
  code unit.
  One to four code units (type <code>char</code>) are required to hold a UTF-8 
  encoded code 
  point.
  One or two code units (type <code>char16_t</code>) are required to hold a UTF-16 encoded 
  code point.
  One code unit (type <code>char32_t</code>) is required to hold a UTF-32 code point. 
  Type <code>wchar_t</code> may use 8, 16, or 32-bit code units, encoded as 
  UTF-8, UTF-16, or UTF-32, respectively, so will  require 4, 2, or 1 code units
  to hold a code point depending on the encoding.&mdash;<i>end note</i>]</p>
</blockquote>

  <h4>Unicode encoding forms (Unicode 3.9)</h4>

  <p>&quot;The Unicode Standard supports three character encoding forms: UTF-32, 
  UTF-16, and UTF-8. Each encoding form maps the Unicode code points 
  U+0000..U+D7FF and U+E000..U+10FFFF to unique code unit sequences. The size of 
  the code unit is specified for each encoding form. This section (Unicode 3.9) 
  presents the formal definition of each of these encoding forms.&quot;</p>
  <p>For formal definitions of UTF-32, UTF-16, and UTF-8, see
  <a href="http://www.unicode.org/versions/Unicode7.0.0/ch03.pdf#G7404">Section 
  3.9, Unicode Encoding Forms</a> in <em>The Unicode Standard</em>. </p>

<p>[Note: For general questions related to Unicode transformation 
form (UTF), UTF-8, UTF-16, UTF-32, or byte order marks (BOM), see
<a href="http://unicode.org/faq/utf_bom.html">unicode.org/faq/utf_bom.html</a>.&mdash;<i>end note</i>]</p>

<h4>Well-formed (Unicode 3.9 D85)</h4>

<p>&quot;A Unicode code unit sequence that purports to be in a Unicode encoding form 
is called well-formed if and only if it does follow the specification of that 
Unicode encoding form.&quot;</p>

<h4>Minimal well-formed code unit subsequence (Unicode 3.9 D85a)</h4>

<p>&quot;A well-formed Unicode code unit sequence that maps to a single Unicode 
scalar value.</p>


<blockquote>
  <ul>
    <li>For UTF-8, see the specification in Unicode 3.9 D92 and Table 3-7.</li>
    <li>For UTF-16, see the specification in Unicode 3.9 D91.</li>
    <li>For UTF-32, see the specification in Unicode 3.9 D90.&quot;</li>
  </ul>
</blockquote>

<h3>Header &lt;experimental/unicode&gt; synopsis</h3>

<pre>namespace std {
namespace experimental {
inline namespace fundamentals_v2 {
namespace unicode {

  //  Error function objects are called with no arguments and either throw an
  //  exception or return a const pointer to a possibly empty C-style string.

  //  default error handler: function object returns a C-string of type
  //  ToCharT with a UTF encoded value of U+FFFD.

  // [uni.err], error handling
  template &lt;class CharT&gt; struct ufffd;
  template &lt;&gt; struct ufffd&lt;char&gt;;
  template &lt;&gt; struct ufffd&lt;char16_t&gt;;
  template &lt;&gt; struct ufffd&lt;char32_t&gt;;
  template &lt;&gt; struct ufffd&lt;wchar_t&gt;;

  //  [uni.enc_cvt_alg], string encoding conversion algorithm
  template &lt;class ToCharT, class InputIterator, class OutputIterator,
    class Error = typename ufffd&lt;ToCharT&gt;&gt;
  OutputIterator convert_utf(InputIterator first, InputIterator last, 
                             OutputIterator result, Error eh = Error());

  //  [uni.gen_enc_cvt], string encoding generic conversions
  template &lt;class ToCharT, class FromCharT,
    class FromTraits = typename char_traits&lt;FromCharT&gt;,
    class View = basic_string_view&lt;FromCharT, FromTraits&gt;,
    class Error = ufffd&lt;ToCharT&gt;,
    class ToTraits = char_traits&lt;ToCharT&gt;,
    class ToAlloc = allocator&lt;ToCharT&gt;&gt;
  basic_string&lt;ToCharT, ToTraits, ToAlloc&gt;
    to_utf_string(View v, Error eh = Error(), const ToAlloc&amp; a = ToAlloc());
  
  //  [uni.conv_enc_cvt], string encoding convenience conversions
  template &lt;class Error = ufffd&lt;char&gt;&gt;
    string to_u8string(string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;char&gt;&gt;
    string to_u8string(u16string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;char&gt;&gt;
    string to_u8string(u32string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;char&gt;&gt;
    string to_u8string(wstring_view v, Error eh = Error());

  template &lt;class Error = ufffd&lt;char16_t&gt;&gt;
    u16string to_u16string(string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;char16_t&gt;&gt;
    u16string to_u16string(u16string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;char16_t&gt;&gt;
    u16string to_u16string(u32string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;char16_t&gt;&gt;
    u16string to_u16string(wstring_view v, Error eh = Error());

  template &lt;class Error = ufffd&lt;char32_t&gt;&gt;
    u32string to_u32string(string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;char32_t&gt;&gt;
    u32string to_u32string(u16string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;char32_t&gt;&gt;
    u32string to_u32string(u32string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;char32_t&gt;&gt;
    u32string to_u32string(wstring_view v, Error eh = Error());

  template &lt;class Error = ufffd&lt;wchar_t&gt;&gt;
    wstring to_wstring(string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;wchar_t&gt;&gt;
    wstring to_wstring(u16string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;wchar_t&gt;&gt;
    wstring to_wstring(u32string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;wchar_t&gt;&gt;
    wstring to_wstring(wstring_view v, Error eh = Error());

}  // namespace unicode
}  // namespace fundamentals_v2
}  // namespace experimental
}  // namespace std</pre>



<h3>UTF encoding conversion functions [uni.enc_cvt]</h3>

<p>UTF conversion functions determine encoding based on character type. The 
relationship between character type and encoding is specified by the following 
table:</p>

<blockquote>
<table border="1" cellpadding="5" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111">
  <tr>
    <td colspan="2">
    <p align="center"><b><i>UTF Conversions</i></b></td>
      </tr>
  <tr>
    <td><b>Character Type</b></td>
    <td><b>Encoding</b></td>
  </tr>
  <tr>
    <td><code>char</code></td>
    <td>UTF-8</td>
  </tr>
  <tr>
    <td><code>char16_t</code></td>
    <td>UTF-16</td>
  </tr>
  <tr>
    <td><code>char32_t</code></td>
    <td>UFT-32</td>
  </tr>
  <tr>
    <td><code>wchar_t</code></td>
    <td>UTF-8, 16, or 32</td>
  </tr>
</table>
</blockquote>



<h4>Error handling [uni.err]</h4>



<p>When an ill-formed code unit subsequence is detected during execution of a 
conversion function, an <i>error handler</i> function object is invoked. Unless 
the error handler throws an exception, the string returned by the error handler 
is added to the output sequence and the ill-formed input code unit subsequence 
is not added to the output sequence. Detection of ill-formed code unit 
subsequences is required even when the input and output encodings are the same. 
[<i>Note:</i> If the error handler function object always returns a 
well-formed UTF character sequence, the conversions function&#39;s entire output 
sequence is a well-formed UTF sequence. &mdash; <i>end note</i>]</p>

<pre>template &lt;class CharT&gt; struct ufffd;
template &lt;&gt; struct ufffd&lt;char&gt;;
template &lt;&gt; struct ufffd&lt;char16_t&gt;;
template &lt;&gt; struct ufffd&lt;char32_t&gt;;
template &lt;&gt; struct ufffd&lt;wchar_t&gt;;</pre>


<p><code>struct ufffd</code> provides the default error handler function object for conversion functions. 
The default error 
handling function object returns U+FFFD REPLACEMENT CHARACTER as a single code point error marker. Each specialization shall provide a member function with the signature:</p>

<blockquote>


<p> <code>constexpr CharT* operator()() const noexcept;</code></p>

</blockquote>


<p>that returns the value indicated in the Specializations table: </p>

<blockquote>
<table border="1" cellpadding="5" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111">
<tr><td colspan="2"><p align="center"><i><b>Specializations</b></i></td></tr>
<tr><td><i><b><code>CharT</code></b></i></td><td><i><b>Returns</b></i></td></tr>
<tr><td><code>char</code></td><td><code>u8&quot;\uFFFD&quot;</code></td></tr>
<tr><td><code>char16_t</code></td><td><code>u&quot;\uFFFD&quot;</code></td></tr>
<tr><td><code>char32_t</code></td><td><code>U&quot;\uFFFD&quot;</code></td></tr>
<tr><td><code>wchar_t</code></td><td><code>L&quot;\uFFFD&quot;</code></td></tr>
</table>
</blockquote>

<blockquote>
  <p>[<i>Note:</i> U+FFFD REPLACEMENT CHARACTER is returned as the default 
  single code point error marker in accordance with the recommendations of the Unicode Standard. 
  The rationale given by the Unicode standard is essentially that other commonly used approaches, including 
  throwing exceptions, can be and have been used as security attack vectors.
  &mdash;<i>end note</i>]</p>
</blockquote>

<h4>Encoding conversion algorithm [uni.enc_cvt_alg]</h4>

<pre>template &lt;class ToCharT, class InputIterator, class OutputIterator,
          class Error = typename ufffd&lt;ToCharT&gt;&gt;
  OutputIterator convert_utf(InputIterator first, InputIterator last, 
                             OutputIterator result, Error eh = Error());</pre>
  <blockquote>
     <p><i>Effects:</i> For each minimal well-formed or ill-formed code unit 
     subsequence in the range [<code>first, last</code>):</p>
     <ul>
       <li>
       <p>If the code unit subsequence is well-formed, copies the 
       subsequence&#39;s Unicode scalar value by performing <code>*result++ = *u++</code> 
       where u is a <code>ToCharT*</code> pointing to the code units required to represent the 
       subsequence&#39;s Unicode 
       scalar value in the encoding form of <code>result</code>.</li>
       <li>
       <p>Otherwise, copies the null-terminated string returned by the <code>eh</code> function 
       object by performing <code>*result++ = *p++</code> for each successive 
       value of a pointer <code>p</code> to the returned string.</li>
     </ul>
     <p><i>Returns: </i><code>result</code>.</p>
     <p><i>Remarks:</i>&nbsp; The Unicode encoding form for the range [<code>first, 
     last</code>) is determined by <code>InputIterator</code> value type ([uni.enc_cvt]). The 
     Unicode encoding form for <code>result</code> is determined by <code>
     ToCharT</code> ([uni.enc_cvt]).</p>
  </blockquote>                           

<h4>Generic string encoding conversion functions [uni.gen_enc_cvt]</h4>

<pre> template &lt;class ToCharT, class FromCharT,
    class FromTraits = typename char_traits&lt;FromCharT&gt;,
    class View = basic_string_view&lt;FromCharT, FromTraits&gt;,
    class Error = ufffd&lt;ToCharT&gt;,
    class ToTraits = char_traits&lt;ToCharT&gt;,
    class ToAlloc = allocator&lt;ToCharT&gt;&gt;
  basic_string&lt;ToCharT, ToTraits, ToAlloc&gt;
    to_utf_string(View v, Error eh = Error(), const ToAlloc&amp; a = ToAlloc());</pre>

  <blockquote>
     <p><i>Returns:</i> Equivalent to:</p>
     <blockquote>
     <p><code>basic_string&lt;ToCharT, ToTraits, ToAlloc&gt; tmp(a);<br>
     convert_utf&lt;ToCharT&gt;(v.cbegin(), v.cend(), back_inserter(tmp), eh);<br>
     return tmp;</code></p>
     </blockquote>
  </blockquote>                           

<h4>Convenience string encoding conversion functions [<a name="uni.conv_enc_cvt">uni.conv_enc_cvt</a>]</h4>

<pre>template &lt;class Error = ufffd&lt;char&gt;&gt;
    string to_u8string(string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;char&gt;&gt;
    string to_u8string(u16string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;char&gt;&gt;
    string to_u8string(u32string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;char&gt;&gt;
    string to_u8string(wstring_view v, Error eh = Error());

  template &lt;class Error = ufffd&lt;char16_t&gt;&gt;
    u16string to_u16string(string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;char16_t&gt;&gt;
    u16string to_u16string(u16string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;char16_t&gt;&gt;
    u16string to_u16string(u32string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;char16_t&gt;&gt;
    u16string to_u16string(wstring_view v, Error eh = Error());

  template &lt;class Error = ufffd&lt;char32_t&gt;&gt;
    u32string to_u32string(string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;char32_t&gt;&gt;
    u32string to_u32string(u16string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;char32_t&gt;&gt;
    u32string to_u32string(u32string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;char32_t&gt;&gt;
    u32string to_u32string(wstring_view v, Error eh = Error());

  template &lt;class Error = ufffd&lt;wchar_t&gt;&gt;
    wstring to_wstring(string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;wchar_t&gt;&gt;
    wstring to_wstring(u16string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;wchar_t&gt;&gt;
    wstring to_wstring(u32string_view v, Error eh = Error());
  template &lt;class Error = ufffd&lt;wchar_t&gt;&gt;
    wstring to_wstring(wstring_view v, Error eh = Error());</pre>

  <blockquote>
     <p><i>Returns:</i> Equivalent to: </p>
     <blockquote>
     <p> <code>to_utf_string&lt;<i>r_value_type</i>,
     <i>v_value_type</i>, Error&gt;(v, eh)</code> where <i><code>r_value_type</code></i> 
     is the <code>value_type</code> of the <code>basic_string</code> to be 
     returned and <i>
     <code>v_value_type</code></i> is the <code>value_type</code> of <code>v</code>.</p>
     </blockquote>
  </blockquote>                           

<hr>

</body>

</html>