<html>

<head>
<meta http-equiv="Content-Language" content="en-us">
<meta name="GENERATOR" content="Microsoft FrontPage 5.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>String Interoperation</title>
<style type="text/css">
body  {
        font-family: sans-serif;
        margin: 1em;
        max-width : 7.5in;
      }

table { margin: 0.5em; }

pre   { background-color:#D7EEFF }

ins   { background-color:#A0FFA0 }
del   { background-color:#FFA0A0 }

</style>
</head>

<body>

  <table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="579">
    <tr>
      <td width="153" align="left" valign="top">Document number:</td>
      <td width="426">N3398=12-0088</td>
    </tr>
    <tr>
      <td width="153" align="left" valign="top">Date:</td>
      <td width="426">
      2012-09-19</td>
    </tr>
    <tr>
      <td width="153" align="left" valign="top">Project:</td>
      <td width="426">Programming Language C++, Library Working Group</td>
    </tr>
    <tr>
      <td width="153" align="left" valign="top">Reply-to:</td>
      <td width="426">Beman Dawes &lt;bdawes at acm dot org&gt;</td>
    </tr>
  </table>
  

<h1>String Interoperation Library<br>
<font size="5">Adapting Standard Library Strings and I/O to a Unicode World</font></h1>

<p><a name="Introduction"></a>This paper proposes library components to ease string 
interoperability problems for Unicode and other string 
encodings. These problems occur with the current C++11 standard library. Read 
the <a href="#components">Components...</a> section for a full description of 
problems or look at some simple examples <a href="#comp.to_string">here</a>.</p>

  <blockquote>

<p>I first encountered the 
C++03 version of string interoperability problems while 
providing Unicode support for the internationalization of commercial GIS software. 
These problems appeared again while working on the 
Boost Filesystem Library. They have become more apparent as compiler 
support for C++11's additional Unicode support has made it easier to write programs 
that run up against current limitations.</p>

<p>Work began on the proposal when the Library Working Group requested string encoding 
conversion arguments be removed from class <code>path</code> in the initial 
C++11 proposal for a Filesystem library. That sparked this proposal as a far 
more general solution to string encoding conversion problems than a Filesystem 
specific proposal.</p>

  </blockquote>

<p>The proposed components are separable. Any of the components except codecs and codec 
helpers could be removed, although ease-of-use would suffer as a result.</p>

<p>The proposed components are suitable for a C++ standard library Technical 
Specification (TS), either standalone or as part of a larger TS.</p>

<p>The proposed components are pure additions. No C++03 
or C++11 headers are changed and no current user or 
standard library code is broken, subject only to the usual namespace discipline 
caveats.</p>

<p><a href="#Wording">Proposed wording</a> is provided. The proposed wording relies only 
on C++11 
features. Should a <code>basic_string</code>-reference library TS be accepted, 
it might be used to reduce the number of signatures in this proposal.</p>

<p>A &quot;proof-of-concept&quot; implementation of the proposals (and more) is 
available at <a href="http://github.com/Beman/string-interoperability">
github.com/beman/string-interoperability</a>.</p>

<h2>Table of contents</h2>

<p><a href="#Introduction">Introduction</a><br>
<a href="#Revision-history">Revision history</a><br>
<a href="#components">Proposed components and their motivation</a><br>
&nbsp;&nbsp;&nbsp;<a href="#comp-Codecs">Codecs and their helpers</a><br>
&nbsp;&nbsp;&nbsp;<a href="#comp-conversion_iterator"><code>conversion_iterator</code> class 
template</a><br>
&nbsp;&nbsp;&nbsp;<a href="#comp-copy_string"><code>copy_string</code> algorithm</a><br>
&nbsp;&nbsp;&nbsp;<a href="#comp.make_string"><code>make_string</code> function template</a><br>
&nbsp;&nbsp;&nbsp;<a href="#comp.to_string"><code>to_<i>string</i></code> conversion functions, 
converting stream inserters and extractors</a><br>
&nbsp;&nbsp;&nbsp;<a href="#comp-UTF-8">Explicit UTF-8 encoded types <code>char8_t</code> and
<code>u8string</code></a><br>
<a href="#Design">Design</a><br>
&nbsp;&nbsp;&nbsp;<a href="#Design-paths-not-taken">Design paths not taken</a><br>
<a href="#Existing-interop">Existing practice with string interoperability</a><br>
<a href="#Existing-iterator">Existing practice with conversion iterators</a><br>
<a href="#Acknowledgements">Acknowledgements</a><br>
<a href="#TODO">TODO List</a><br>
<a href="#Wording">Proposed Wording</a><br>
<a href="#str-x">String interoperation&nbsp; </a><br>
<a href="#str-x.synopsis">Header &lt;string_interop.hpp&gt; synopsis&nbsp;
</a><br>
<a href="#str-x.codec">Codecs&nbsp; </a><br>
&nbsp;&nbsp;&nbsp;<a href="#str-x.codec.default">Class default_codec&nbsp; </a><br>
&nbsp;&nbsp;&nbsp;<a href="#str-x.codec.req">Requirements on codec classes&nbsp;
</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#str-x.codec.req.eos">end-of-sequence iterator requirements&nbsp;
</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#str-x.codec.req.from"><code>from_iterator</code> requirements&nbsp;
</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#str-x.codec.req.to"><code>to_iterator</code> requirements&nbsp;
</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#str-x.codec.req.ctors">Constructor requirements&nbsp;
</a><br>
&nbsp;&nbsp;&nbsp;<a href="#str-x.codec.select">select_codec&nbsp; </a><br>
<a href="#str-x.utf8-typedefs">UTF-8 typedefs (Informative)&nbsp;
</a><br>
<a href="#str-x.cvt-iter">Class template <code>conversion_iterator</code>&nbsp;
</a><br>
&nbsp;&nbsp;&nbsp;<a href="#str-x.cvt-iter.synop">Synopsis&nbsp; </a><br>
&nbsp;&nbsp;&nbsp;<a href="#str-x.cvt-iter.ctors">Constructors&nbsp; </a><br>
<a href="#str-x.copy_string">Algorithm <code>copy_string</code>&nbsp;
</a><br>
<a href="#str-x.make_string"><code>make_string</code> function templates&nbsp;
</a><br>
<a href="#str-x.to_string"><code>to_</code><i><code>string</code></i> function 
templates&nbsp; </a><br>
<a href="#str-x.utf8">UTF-8 string support&nbsp; </a><br>
<a href="#str-x.ins">Stream inserters&nbsp; </a><br>
<a href="#str-x.ext">Stream extractors&nbsp; </a><br>
  </p>

<h2><a name="Revision-history">Revision history</a></h2>

  <p>This paper is a complete rewrite of 
  <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3336.html">N3336</a>, Adapting Standard Library Strings and I/O to a 
  Unicode World. It reflects C++ committee feedback from the LWG&#39;s review of 
  N3336 and further analysis and experimentation.</p>

<h2>Proposed <a name="components">components</a> and their motivation</h2>

  <h3><a name="comp-Codecs">Codecs</a> and their helpers</h3>
  <p>Provide an iterator-based composable solution to string encoding and type conversion that works 
  well in generic code and does not require heap allocated temporary strings or 
  buffers.</p>
  <blockquote>
  <p>These low-level components provide the foundation for most of the higher 
  level components. They provide an abstraction of string encoding and type 
  conversion that frees higher-level components from details.</p>
  <ul>
    <li>The library provides codecs for the native <code>char</code> and <code>
    wchar_t</code> 
    encodings, plus UTF-8, UTF-16, and UTF-32.</li>
    <li>Implementers and users may supply 
    additional codecs.</li>
  </ul>

  <p>Specific motivations include:</p>
  <ul>
    <li>Iterator-based because iterators have been far more productive than the 
    painful and problematic string based codecvt facet interface.</li>
    <li>Iterator-based codecs allocate no heap memory.</li>
    <li>Codecs are composable, so that for all possible conversions between <b>n</b> encodings, the number of 
    codecs 
    required is <b>n</b> rather than <b>n<sup>2</sup></b>, yet there are no 
    temporary strings even for composed conversions.</li>
    <li>The native narrow codec, which can be a problem because its encoding is 
    runtime dependent, is built on top of <code>&lt;cuchar&gt;</code>, easing 
    implementation and ensuring consistency.</li>
    <li>The codec interface uses no virtual functions and is simpler than the 
    codecvt facet interface, a constant source of irritations and mistakes.</li>
  </ul>

  </blockquote>
  <h3><code><a name="comp-conversion_iterator">conversion_iterator</a></code> class template</h3>
  <p>Provides an iterator adapter that performs character type and encoding conversion 
  on-the-fly.</p>
  <blockquote>
    <p>While codecs (see above) offer worthwhile benefits, they essentially 
    provide low-level, encoding-specific iterators. The <code>
    conversion_iterator</code> class template provides a simple iterator adaptor that 
    composes two codecs, regardless of encoding, into a single, easy-to-use 
    iterator.</p>
    <p>With <code>conversion_iterator</code>, implementation of many mid- 
    and high-level character type and encoding conversions becomes trivial. It 
    is useful to standard, user, and third-party library implementers, as it 
    provides a vocabulary iterator type that is far easier to use than 
    roll-your-own conversions based on codecvt facets. </p>
  </blockquote>

<h3><code><a name="comp-copy_string">copy_string</a></code> algorithm</h3>

<p>Provides an algorithm like <code>std::copy</code>, except performing type 
and encoding conversion as it copies.</p>

  <blockquote>

<p>Solves many end user problems.</p>

<p>Provides a simple way to both specify and implement other high-level 
convenience functions.</p>

  </blockquote>

  <h3><code><a name="comp.make_string">make_string</a></code> function template</h3>
  <p>Provides a generic string type and encoding conversion factory function.</p>

<h3><a name="comp.to_string"><code>to_</code><code><i>string</i></code></a> conversion functions,  converting stream inserters and extractors</h3>

<p>Provide easy-to-use (automatic, 
in the case of inserters and extractors) solutions to&nbsp;irritating string 
interoperability problems, in the style of similar standard library 
functionality.</p>

  <blockquote>

<p>With the C++11 standard library:</p>

    <blockquote>
      <pre>int i = 50;                      // OK
long j = i;                      // OK
cout &lt;&lt; j;                       // OK
string s = to_string(i);         // OK, C++11 provides this overload
wstring t = to_wstring(s);       // error!
u8string u = to_u8string(t);     // error!
u16string v = to_u16string(s);   // error!
u32string w = to_u32string(v);   // error!
string x = to_string(v.c_str()); // error!
string y = to_string(U&quot;50&quot;);     // error!
std::cout &lt;&lt; t;                  // error!</pre>
    </blockquote>
    <p>With the proposal (and the unmodified C++11 standard library):</p>
    <blockquote>
      <pre>int i = 50;                      // OK
long j = i;                      // OK
cout &lt;&lt; j;                       // OK
string s = to_string(i);         // OK
wstring t = to_wstring(s);       // OK
u8string u = to_u8string(t);     // OK
u16string v = to_u16string(s);   // OK
u32string w = to_u32string(v);   // OK
string x = to_string(v.c_str()); // OK
string y = to_string(U&quot;50&quot;);     // OK
std::cout &lt;&lt; t;                  // OK</pre>
    </blockquote>
  </blockquote>
  <h3><a name="comp-UTF-8">Explicit UTF-8</a> encoded types <code>char8_t</code> and <code>u8string</code></h3>
  <p>Specifies a character type and a string type that are unambiguously UTF-8 
  encoded.</p>
  <blockquote>
  <p>UTF-8 is the most important, and often the only, byte-sized character 
  encoding required by many internationalized applications. Yet it is the only 
  one of the critical Unicode encodings (UTF-8, UTF-16, UTF-32) that does not 
  have its own C++ character type. This causes endless technical problems, such 
  as the inability to overload on a UTF-8 character type, for those who want to 
  write portable code. It causes developers who otherwise think highly of C++ to 
  believe the standards committee is stuck in the distant past when dinosaurs 
  roamed the earth.</p>
  </blockquote>
  <blockquote>

<p>The proposed string interoperability facilities run afoul of the lack of a 
UTF-8 character type because they use generic programming techniques that depend 
on a one-to-one relationship between character value types and their encodings.</p>

  <table border="1" cellpadding="10" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="80%">
    <tr>
      <td width="100%" bgcolor="#FFFFCC">This feature is far more speculative 
      than the rest of the proposal. It has been implemented and has been used 
      in an experimental branch of the Filesystem library. But there is no user 
      experience whatsoever. It leaves <code>u8</code> string literals twisting 
      in the wind, and that&#39;s a serious problem. It needs much further study and 
      discussion before moving forward.</td>
    </tr>
</table>

  </blockquote>

<h2><a name="Design">Design</a></h2>

<p>The <code>copy_string</code> algorithm was a starting point for the design. 
The algorithm was 
arrived at by analyzing numerous real-world string conversion problems 
encountered by Boost Filesystem and while internationalizing various industrial 
applications. During that analysis, it was observed that the <code>std::copy</code> algorithm would be a 
common solution to those problems if it could be given generic versions of John 
Maddock&#39;s Unicode conversion iterator adaptors used in his Boost Regex 
implementation. The <code>conversion_iterator</code> and <code>codec</code> 
designs evolved as the underlying conversion abstractions 
needed to support <code>copy_string</code>.</p>

<p>The key design for composition of codecs is the use of UTF-32 as a common 
intermediate encoding that works without an intermediate temporary string when 
applied at the iterator level. This is the same approach, albeit at compile time 
rather than run time, taken by the <a href="http://icu-project.org">International 
Components for Unicode (ICU)</a> library.</p>
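
<p>The following is a minimal sketch of the UTF-32 pivot idea, not the proposed 
codec interface. The helper names <code>decode_utf8</code>, <code>encode_utf16</code>, 
and <code>utf8_to_utf16</code> are hypothetical, and the code assumes well-formed 
UTF-8 input with no error handling. Each code point passes through a single 
<code>char32_t</code> value, so no intermediate string is built.</p>

```cpp
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical helper: decode one UTF-8 sequence starting at *it and advance it.
// Assumption: the input is well-formed UTF-8; malformed input is not detected.
template <class InputIterator>
char32_t decode_utf8(InputIterator& it)
{
  unsigned char b0 = static_cast<unsigned char>(*it++);
  if (b0 < 0x80) return b0;                                // single-byte (ASCII)
  int extra = (b0 >= 0xF0) ? 3 : (b0 >= 0xE0) ? 2 : 1;     // continuation bytes
  char32_t cp = b0 & (0x3F >> extra);                      // payload bits of lead byte
  while (extra-- > 0)
    cp = (cp << 6) | (static_cast<unsigned char>(*it++) & 0x3F);
  return cp;
}

// Hypothetical helper: encode one UTF-32 code point as one or two UTF-16 units.
template <class OutputIterator>
OutputIterator encode_utf16(char32_t cp, OutputIterator out)
{
  if (cp < 0x10000) {
    *out++ = static_cast<char16_t>(cp);
  } else {
    cp -= 0x10000;
    *out++ = static_cast<char16_t>(0xD800 + (cp >> 10));   // high surrogate
    *out++ = static_cast<char16_t>(0xDC00 + (cp & 0x3FF)); // low surrogate
  }
  return out;
}

// Composition: UTF-8 -> char32_t -> UTF-16, one code point at a time.
template <class InputIterator, class OutputIterator>
OutputIterator utf8_to_utf16(InputIterator first, InputIterator last,
                             OutputIterator out)
{
  while (first != last)
    out = encode_utf16(decode_utf8(first), out);
  return out;
}
```

<p>Under this scheme, supporting a further encoding needs one more decode 
function and one more encode function, rather than a dedicated conversion to and 
from every existing encoding.</p>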

<h3><a name="Design-paths-not-taken">Design paths not taken</a></h3>

<p>This proposal deals with C++11 <code>std::basic_string</code>, standard character types, and their encodings. The deeper attributes of Unicode 
characters are not addressed. See Mathias Gaunard's <a href="http://mathias.gaunard.com/unicode/doc/html/">
  Unicode project</a> for an example of deeper Unicode support.</p>

<p>This proposal provides compile-time solutions. It does not provide runtime 
solutions such as provided by the ICU library.</p>

<p>This proposal provides work-arounds for C++11&#39;s lack of UTF-8 strings. 
Several users have argued that instead of work-arounds, the C++ standard should 
require UTF-8 encoding for both C-style <code>char</code> strings and <code>
std::string</code>. This proposal assumes that is too great a leap forward at this 
time.</p>

<h2><a name="Existing-interop">Existing</a> practice with string interoperability</h2>

  <p>Boost Filesystem Version 3&#39;s class <code>path</code> solves some of the string 
  interoperability problems, albeit in a limited context. A function that is 
  declared like this:</p>
  <blockquote>
    <pre>void f(const path&amp;);</pre>
</blockquote>
<p>Can be called like this:</p>
<blockquote>
  <pre>f(&quot;Meow&quot;);
f(L&quot;Meow&quot;);
f(u8&quot;Meow&quot;);
f(u&quot;Meow&quot;);
f(U&quot;Meow&quot;);
// ... many additional variations such as basic_strings and iterators</pre>
</blockquote>
<p>This string interoperability support has been a success. It does, however, 
raise the question of why <code>std::basic_string</code> isn't providing the 
interoperability support. <b>Users are misusing paths as general string containers 
because they provide interoperability.</b> The string interoperability cat is out of the bag. 
The toothpaste is out of the tube.</p>
<p>See Boost.Filesystem V3 class path for 
an example of how such interoperability might be achieved.</p>
<p>Experience with Boost.Filesystem V3 class path has demonstrated that string 
interoperability brings a considerable simplification and improvement to 
internationalized user code, but that having to provide interoperability without 
the resolution of the issues presented here is a band-aid.</p>
  <h2><a name="Existing-iterator">Existing</a> practice with conversion iterators</h2>

<p>Boost Regex for many years has included a set of Unicode conversion 
iterators as an implementation detail. Although these do not provide composition, they do demonstrate the 
technique of using encoding conversion iterators to avoid creation of temporary 
strings.</p>

<h2><a name="Acknowledgements">Acknowledgements</a></h2>

<p>Peter Dimov inspired the idea of string interoperability by arguing that the 
Boost Filesystem library should treat a path as a single 
  type (i.e. not a template) regardless of character size 
  and encoding. The experience gained with that approach led to a much clearer 
understanding of where to draw the line between functionality provided by a 
library such as Filesystem, and the standard library (or a TS) itself.</p>

<p>John Maddock's Unicode conversion iterators demonstrated an 
  easy-to-use, more efficient, and STL friendly way to perform character type 
and encoding conversions.</p>

<p>Yakov Galka 
suggested attacking string interoperability with free functions to reduce or 
eliminate changes to <code>basic_string</code>. </p>

<p>The C++11 standard deserves acknowledgement as it provides the underlying language and library features that allow 
Unicode string 
interoperability:</p>

<ul>
  <li><code>char16_t</code> and <code>char32_t</code> provide Unicode 
  character types and null-terminated character strings with guaranteed 
  encodings.</li>
  <li><code>std::u16string</code> and <code>std::u32string</code> provide 
  library support for Unicode character types and encodings.</li>
  <li><code>u8</code>, <code>u</code>, and <code>U</code> character and string literals ease 
  programming with Unicode character types and encodings.</li>
</ul>

<h2><a name="TODO">TODO</a> List</h2>

  <div align="center">
    <center>
    <table border="1" cellpadding="5" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="80%" bgcolor="#EEFFEE">
      <tr>
        <td width="100%">
        <p align="center"><b>To Do</b></p>
        <ul>
          <li>
          <p align="left">Add error handling argument where appropriate.</li>
          <li>
          <p align="left">Add three pointer case signatures for basic_ostream&lt;wchar_t&gt;&amp;
          </li>
          <li>
          <p align="left">Add stream extractors.</li>
          <li>
          <p align="left">Add usage examples to Proposed Wording.</li>
          <li>
          <p align="left">Add an example of how this would apply to Filesystem class 
          path.</li>
        </ul>
        </td>
      </tr>
    </table>
    </center>
  </div>

  <h1>Proposed <a name="Wording">Wording</a></h1>

  <p><i><span style="background-color: #FFFF99">Italic text highlighted in 
  yellow is commentary and not part of the proposal.</span></i></p>

  <p>The wording assumes the whole of the ISO C++ Standard Library 
  introduction [lib.library] is included by reference.</p>
  <h1>String interoperation&nbsp;&nbsp;&nbsp; [<a name="str-x">str-x</a>]</h1>
  <p>This library provides facilities that allow interoperation between strings of differing 
  types and encodings, and ease the use of strings with UTF-8 encoding. The following encodings are supported:</p>
  <ul>
    <li>Native narrow character and wide character encodings.</li>
    <li>Unicode UTF-8, UTF-16, and UTF-32 encodings.</li>
    <li>Implementation-defined additional encodings.</li>
    <li>User-defined additional encodings.</li>
  </ul>
  <h2>Header &lt;string_interop.hpp&gt; synopsis&nbsp;&nbsp;&nbsp; [<a name="str-x.synopsis">str-x.synopsis</a>]</h2>
  <div>
  <pre>namespace std {

  template &lt;&gt; struct char_traits&lt;unsigned char&gt;;

namespace tbd {  // tbd is to be decided

  //  UTF-8 typedefs [str-x.utf8-typedefs]
  typedef unsigned char           char8_t;
  typedef basic_string&lt;char8_t&gt;   u8string;
 
  //  codecs [str-x.codec]
  class narrow;
  class wide;     
  class utf8;     
  class utf16;    
  class utf32;    
  class default_codec;  // See [str-x.codec.default]

  //  select_codec [str-x.codec.select]
  template &lt;class charT&gt; struct select_codec;
  template &lt;&gt; struct select_codec&lt;char&gt;       { typedef narrow type; };
  template &lt;&gt; struct select_codec&lt;wchar_t&gt;    { typedef wide   type; };
  template &lt;&gt; struct select_codec&lt;char8_t&gt;    { typedef utf8   type; };
  template &lt;&gt; struct select_codec&lt;char16_t&gt;   { typedef utf16  type; };
  template &lt;&gt; struct select_codec&lt;char32_t&gt;   { typedef utf32  type; };
 
  //  conversion_iterator [str-x.cvt-iter]
  template &lt;class ToCodec, class FromCodec, class InputIterator&gt;
    class conversion_iterator;

  //  copy_string algorithm [str-x.copy_string]
  template&lt;class InputIterator, class FromCodec,
           class OutputIterator, class ToCodec&gt;
  OutputIterator copy_string(InputIterator first, InputIterator last,
    OutputIterator result);

  //  make_string function templates [str-x.make_string]
  template &lt;class ToCodec,
            class FromCodec = default_codec,
            class ToString = std::basic_string&lt;typename ToCodec::value_type&gt;,
            class FromString&gt;
  ToString make_string(const FromString&amp; ctr);

  template &lt;class ToCodec,
            class FromCodec = default_codec,
            class ToString = std::basic_string&lt;typename ToCodec::value_type&gt;,
            class InputIterator&gt;
  ToString make_string(InputIterator begin);

  template &lt;class ToCodec,
            class FromCodec = default_codec,
            class ToString = std::basic_string&lt;typename ToCodec::value_type&gt;,
            class InputIterator&gt;
  ToString make_string(InputIterator begin, std::size_t sz);

  template &lt;class ToCodec,
            class FromCodec = default_codec,
            class ToString = std::basic_string&lt;typename ToCodec::value_type&gt;,
            class InputIterator,
            class InputIterator2&gt;
  ToString make_string(InputIterator begin, InputIterator2 end);

  //  to_<i>string</i> function templates [str-x.to_string]
  template &lt;class FromCodec = default_codec,
    class ToString = std::basic_string&lt;char&gt;, class FromString&gt;
      ToString to_string(const FromString&amp; s);
  template &lt;class FromCodec = default_codec,
    class ToString = std::basic_string&lt;char&gt;, class InputIterator&gt;
      ToString to_string(InputIterator begin);
  template &lt;class FromCodec = default_codec,
    class ToString = std::basic_string&lt;char&gt;, class InputIterator&gt;
      ToString to_string(InputIterator begin, std::size_t sz);
  template &lt;class FromCodec = default_codec,
    class ToString = std::basic_string&lt;char&gt;, class InputIterator&gt;
      ToString to_string(InputIterator begin, InputIterator end);
  <i><b>Repeat pattern for to_wstring, to_u8string, to_u16string, to_u32string</b></i>

  //  UTF-8 string support [str-x.utf8]
  inline const char8_t* u8(const char* s) noexcept;
  inline const char8_t* u8(const string&amp; s) noexcept;
  inline const char*    u8(const char8_t* s) noexcept;
  inline const char*    u8(const u8string&amp; s) noexcept;

}  // namespace tbd

  // stream inserters [str-x.ins]
  template &lt;class Ostream, class charT, class Traits, class Allocator&gt;
  Ostream&amp; operator&lt;&lt;(Ostream&amp; os, const basic_string&lt;charT, Traits, Allocator&gt;&amp; str);
  basic_ostream&lt;char&gt;&amp; operator&lt;&lt;(basic_ostream&lt;char&gt;&amp; os, const wchar_t* p);
  basic_ostream&lt;char&gt;&amp; operator&lt;&lt;(basic_ostream&lt;char&gt;&amp; os, const char16_t* p);
  basic_ostream&lt;char&gt;&amp; operator&lt;&lt;(basic_ostream&lt;char&gt;&amp; os, const char32_t* p);
  
}  // namespace std</pre>
  </div>
  <h2>Codecs&nbsp;&nbsp;&nbsp; [<a name="str-x.codec">str-x.codec</a>]</h2>
  <p>Codecs are classes that package one typedef and three class templates. They 
  contain no data or function members and never need to be instantiated. Codec 
  classes may be predefined or user defined. All codec classes except <code>
  default_codec</code> shall meet the codec requirements [str-x.codec.req].</p>
  <p align="left"><b>Table: Predefined codec classes</b></p>
  <table border="1" cellpadding="5" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111">
    <tr>
      <td><i><b>Class</b></i></td>
      <td><i><b><code>value_type</code></b></i></td>
      <td><i><b>Encoding</b></i></td>
    </tr>
    <tr>
      <td><code>narrow</code></td>
      <td><code>char</code></td>
      <td>Default locale&#39;s char encoding.</td>
    </tr>
    <tr>
      <td><code>wide</code></td>
      <td><code>wchar_t</code></td>
      <td>Implementation specific wchar_t encoding.</td>
    </tr>
    <tr>
      <td><code>utf8</code></td>
      <td><code>char8_t</code></td>
      <td>UTF-8</td>
    </tr>
    <tr>
      <td><code>utf16</code></td>
      <td><code>char16_t</code></td>
      <td>UTF-16</td>
    </tr>
    <tr>
      <td><code>utf32</code></td>
      <td><code>char32_t</code></td>
      <td>UTF-32</td>
    </tr>
    <tr>
      <td><code>default_codec</code></td>
      <td>
      <p align="center">N/A</td>
      <td>N/A</td>
    </tr>
  </table>
  <h3>Class default_codec&nbsp;&nbsp;&nbsp; [<a name="str-x.codec.default">str-x.codec.default</a>]</h3>
  <p>Class <code>default_codec</code> is a pseudo-codec that provides 
  lazy <code>select_codec</code> selection. It is for use as a default for codec 
  template parameters that appear before the template parameter that determines
  <code>charT</code>. Class <code>default_codec</code> is not required to meet the 
  codec class 
  requirements.</p>
  <pre>class default_codec
{
public:
  template &lt;class charT&gt;
  struct codec
  { 
    typedef typename select_codec&lt;charT&gt;::type type; 
  };
};
</pre>
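
<p>A minimal sketch of how the lazy selection might play out in a function 
template follows. The placeholder codecs, their <code>name()</code> members, and 
the function <code>source_codec_name</code> are hypothetical and exist only for 
illustration; only the nested <code>codec&lt;charT&gt;</code> template and 
<code>select_codec</code> follow the shapes given in this paper.</p>

```cpp
#include <cassert>
#include <cstring>
#include <string>

namespace sketch {

  // Placeholder codecs. Only the nested codec<charT> template follows the
  // codec requirements; name() is a hypothetical addition for demonstration.
  struct narrow {
    template <class charT> struct codec { typedef narrow type; };
    static const char* name() { return "narrow"; }
  };
  struct wide {
    template <class charT> struct codec { typedef wide type; };
    static const char* name() { return "wide"; }
  };
  struct utf8 {
    template <class charT> struct codec { typedef utf8 type; };
    static const char* name() { return "utf8"; }
  };

  template <class charT> struct select_codec;
  template <> struct select_codec<char>    { typedef narrow type; };
  template <> struct select_codec<wchar_t> { typedef wide   type; };

  // Pseudo-codec: defers the choice until charT is known.
  class default_codec {
  public:
    template <class charT>
    struct codec { typedef typename select_codec<charT>::type type; };
  };

  // FromCodec appears before FromString, so it cannot default to a codec
  // computed from FromString directly; default_codec fills that gap.
  template <class FromCodec = default_codec, class FromString>
  const char* source_codec_name(const FromString&)
  {
    typedef typename FromString::value_type charT;
    typedef typename FromCodec::template codec<charT>::type codec_type;
    return codec_type::name();
  }

}  // namespace sketch
```

<p>With no explicit codec, a <code>std::wstring</code> argument resolves to 
<code>wide</code>; passing <code>sketch::utf8</code> explicitly overrides the 
selection for a <code>std::string</code> argument, because a real codec names 
itself regardless of <code>charT</code>.</p>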
  <h3>Requirements on codec classes&nbsp;&nbsp;&nbsp; [<a name="str-x.codec.req">str-x.codec.req</a>]</h3>
  <p>Codecs are required to contain the following:</p>
  <pre>  typedef <b><i>implementation-defined</i></b> value_type;

  template &lt;class charT&gt;
  struct codec { typedef <b><i>codec-class-name</i></b> type; };

  template &lt;class InputIterator&gt;  
  class from_iterator
  {
  public:
    
    from_iterator();
    from_iterator(InputIterator begin);
    from_iterator(InputIterator begin, size_t sz);
    template &lt;class InputIterator2&gt;
      from_iterator(InputIterator begin, InputIterator2 end);
  };

  template &lt;class InputIterator&gt;  
  class to_iterator
  {
  public:
    to_iterator();
    to_iterator(InputIterator begin);
  };</pre>
  <h4>end-of-sequence iterator requirements&nbsp;&nbsp;&nbsp; [<a name="str-x.codec.req.eos">str-x.codec.req.eos</a>]</h4>
  <p>An <i>end-of-sequence</i> iterator becomes equal to the 
  end-of-sequence value upon reaching the end of the sequence being iterated 
  over. An end-of-sequence iterator constructor with no arguments constructs the 
  end-of-sequence value, which is the only legitimate iterator value to be used 
  for the end condition. The behavior of <code>operator*</code> on an iterator 
  with the end-of-sequence value is undefined. For any other iterator value a
  <code>const T&amp;</code> is returned. The behavior of <code>operator-&gt;</code> for 
  an iterator with the end-of-sequence value is undefined. For any other 
  iterator value a <code>const T*</code> is returned. The behavior of <code>
  operator++()</code> for an iterator with the end-of-sequence value is 
  undefined.</p>
  <p>Two iterators with the end-of-sequence value are equal. An iterator with 
  the end-of-sequence value is not equal to an iterator that does not have the 
  end-of-sequence value. Two iterators that do not have the end-of-sequence 
  value are equal iff they point to the same element of the sequence.</p>
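
<p>The requirements above can be illustrated with a small sketch of an 
end-of-sequence iterator over a null-terminated character sequence. The class 
name <code>cstr_iterator</code> and its representation (a null pointer as the 
end-of-sequence value) are assumptions made for this example and are not part of 
the proposed wording.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical end-of-sequence iterator over a null-terminated sequence.
// Assumption: a null pointer represents the end-of-sequence value.
template <class charT>
class cstr_iterator
{
public:
  typedef std::input_iterator_tag iterator_category;
  typedef charT                   value_type;
  typedef std::ptrdiff_t          difference_type;
  typedef const charT*            pointer;
  typedef const charT&            reference;

  cstr_iterator() : p_(nullptr) {}                       // end-of-sequence value
  explicit cstr_iterator(const charT* p) : p_(p) { normalize(); }

  reference operator*() const { return *p_; }            // undefined at end-of-sequence
  pointer   operator->() const { return p_; }            // undefined at end-of-sequence

  cstr_iterator& operator++() { ++p_; normalize(); return *this; }
  cstr_iterator  operator++(int) { cstr_iterator tmp(*this); ++*this; return tmp; }

  friend bool operator==(const cstr_iterator& a, const cstr_iterator& b)
  { return a.p_ == b.p_; }
  friend bool operator!=(const cstr_iterator& a, const cstr_iterator& b)
  { return !(a == b); }

private:
  // On reaching the terminator, become equal to the end-of-sequence value.
  void normalize() { if (p_ && *p_ == charT()) p_ = nullptr; }

  const charT* p_;
};
```

<p>Because the default-constructed value serves as the end condition, a range 
can be written as <code>cstr_iterator&lt;char&gt;(p)</code> paired with 
<code>cstr_iterator&lt;char&gt;()</code>, without computing the length first.</p>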
  <h4><code>from_iterator</code> requirements&nbsp;&nbsp;&nbsp; [<a name="str-x.codec.req.from">str-x.codec.req.from</a>]</h4>
  <p>The class template <code>from_iterator</code> is an input 
  iterator that is an adaptation of an <code>InputIterator</code> template 
  parameter whose <code>value_type</code> is the same as the parent codec class
  <code>value_type</code>. It has a <code>value_type</code> of <code>
  char32_t</code> and meets the input iterator requirements of the C++ 
  standard and the end-of-sequence iterator requirements ([str-x.codec.req.eos]).</p>
  <h4><code>to_iterator</code> requirements&nbsp;&nbsp;&nbsp; [<a name="str-x.codec.req.to">str-x.codec.req.to</a>]</h4>
  <p>The class template <code>to_iterator</code> is an input iterator that is 
  an adaptation of an <code>InputIterator</code> template parameter whose <code>
  value_type</code> is <code>char32_t</code>. It has a <code>
  value_type</code> that is the same as the parent codec class <code>value_type</code>. 
  It meets the input iterator requirements of the C++ standard and the 
  end-of-sequence iterator requirements ([str-x.codec.req.eos]).</p>
  <h4>Constructor requirements&nbsp;&nbsp;&nbsp; [<a name="str-x.codec.req.ctors">str-x.codec.req.ctors</a>]</h4>
  <pre><code>from_iterator();</code></pre>
  <blockquote>
    <p><i>Effects: </i>Constructs an iterator with the end-of-sequence 
    iterator value ([str-x.codec.req.eos]).</p>
  </blockquote>
  <pre><code>from_iterator(InputIterator begin);</code></pre>
  <blockquote>
    <p><i>Effects: </i>Constructs an iterator for the half-open range that 
    begins at <code>begin</code> and ends at the first element with a value of
    <code>value_type()</code>.</p>
  </blockquote>
  <pre><code>from_iterator(InputIterator begin, size_t sz);</code></pre>
  <blockquote>
    <p align="left"><i>Effects: </i>Constructs an iterator for the 
    half-open range that begins at <code>begin</code> and ends at <code>begin + 
    sz</code>.</p>
  </blockquote>
  <pre><code>template &lt;class InputIterator2&gt;
from_iterator(InputIterator begin, InputIterator2 end);</code></pre>
  <blockquote>
    <p><i>Effects: </i>Constructs an iterator for the half-open range that 
    begins at <code>begin</code> and ends at <code>end</code>.</p>
    <p><i>Remarks: </i>Shall not participate in overload resolution unless <code>
    InputIterator</code> and <code>InputIterator2</code> are the same type.</p>
  </blockquote>
  <pre><code>to_iterator();</code></pre>
  <blockquote>
    <p><i>Effects: </i>Constructs an object with the end-of-sequence 
    iterator value ([str-x.codec.req.eos]).</p>
  </blockquote>
  <pre><code>to_iterator(InputIterator begin);</code></pre>
  <blockquote>
    <p><code>InputIterator</code> is required to meet the end-of-sequence 
    iterator requirements ([str-x.codec.req.eos]).</p>
    <p><i>Effects: </i>Constructs an iterator for the half-open range that 
    begins at <code>begin</code> and ends when the end-of-sequence iterator 
    value is reached.</p>
  </blockquote>
  <h3>select_codec&nbsp;&nbsp; [<a name="str-x.codec.select">str-x.codec.select</a>]</h3>
  <p>
  <span style="font-style: italic; background-color: #FFFF99">
  To be supplied.</span></p>
  <h2>UTF-8 typedefs (Informative)&nbsp;&nbsp;&nbsp; [<a name="str-x.utf8-typedefs">str-x.utf8-typedefs</a>]</h2>
  <p>In portable internationalized applications, UTF-8 encoded C-style 
  <code>char</code> arrays and <code>std::string</code> objects are 
  problematic as arguments to functions that assume the native narrow 
  character encoding. Examples include arguments representing filenames for 
  I/O functions and arguments representing content for web sites. Disciplined 
  conversion of all narrow character strings to UTF-8 encoding within an 
  application is a partial solution, but it is not enforceable via the C++ 
  language type system and does not help with third-party or standard library 
  functions that assume <code>char</code> strings use the native narrow 
  encoding.</p>
  <p>The <code>char8_t</code> and <code>u8string</code> typedefs allow 
  the C++ type system to distinguish between native encoded and UTF-8 encoded 
  character strings. The actual type used for <code>char8_t</code> is 
  <code>unsigned char</code> because the C++ language rules require that the 
  representation of the underlying bytes for <code>char</code> and 
  <code>unsigned char</code> is the same (C++ standard: [basic.types]). This 
  allows conversion by compile-time casts with no runtime cost.</p>
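  <p>As a small illustration of the type distinction, the sketch below 
  overloads a function on the two pointer types. The names <code>sketch</code>, 
  <code>char8</code>, <code>encoding_of</code>, and <code>demo</code> are 
  hypothetical; <code>char8</code> stands in for the proposed 
  <code>char8_t</code> typedef so that the sketch does not depend on that name 
  being available.</p>
  <pre>#include &lt;cassert&gt;
#include &lt;string&gt;

namespace sketch   // hypothetical namespace, not part of the proposal
{
  typedef unsigned char char8;   // stand-in for the proposed char8_t typedef

  // Overload resolution picks the function from the element type alone.
  inline std::string encoding_of(const char*)  { return "native narrow"; }
  inline std::string encoding_of(const char8*) { return "UTF-8"; }

  inline void demo()
  {
    const char  native_text[] = "abc";
    const char8 utf8_text[]   = { 0xC3, 0xA9, 0 };   // U+00E9 encoded as UTF-8
    assert(encoding_of(native_text) == "native narrow");
    assert(encoding_of(utf8_text) == "UTF-8");
  }
}</pre>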
  <h2>Class template <code>conversion_iterator</code>&nbsp;&nbsp;&nbsp; [<a name="str-x.cvt-iter">str-x.cvt-iter</a>]</h2>
  <p>Class template <code>conversion_iterator</code> composes an input iterator from a 
  codec <code>to_iterator</code>, a codec <code>from_iterator</code>, and an 
  input iterator. It adapts the input iterator to behave as an iterator to
  <code>ToCodec::value_type</code>. The type <code>iterator_traits&lt;InputIterator&gt;::value_type</code> 
  is required to be the same as <code>FromCodec::value_type</code>.&nbsp; <code>
  conversion_iterator</code> meets the standard library input iterator 
  requirements and the end-of-sequence iterator requirements ([str-x.codec.req.eos]).</p>
  <h3>Synopsis&nbsp;&nbsp;&nbsp; [<a name="str-x.cvt-iter.synop">str-x.cvt-iter.synop</a>]</h3>
  <pre>template &lt;class ToCodec, class FromCodec, class InputIterator&gt;
  class conversion_iterator
    : public ToCodec::template to_iterator&lt;
        typename FromCodec::template from_iterator&lt;InputIterator&gt;&gt;
{
public:
  typedef typename FromCodec::template from_iterator&lt;InputIterator&gt;
    from_iterator_type;
  typedef typename ToCodec::template to_iterator&lt;from_iterator_type&gt;
    to_iterator_type;

  conversion_iterator();
  conversion_iterator(InputIterator begin);
  conversion_iterator(InputIterator begin, std::size_t sz);
  template &lt;class InputIterator2&gt;
    conversion_iterator(InputIterator begin, InputIterator2 end);

<i>  // other functions as needed to meet standard library requirements
  // for input iterators [input.iterators]
  ...
</i>};</pre>
  <h3>Constructors&nbsp;&nbsp;&nbsp; [<a name="str-x.cvt-iter.ctors">str-x.cvt-iter.ctors</a>]</h3>
  <pre><code>conversion_iterator();</code></pre>
  <blockquote>
    <p><i>Effects: </i>Constructs an iterator with the end-of-sequence 
    iterator value ([str-x.codec.req.eos]).</p>
  </blockquote>
  <pre><code>conversion_iterator(InputIterator begin);</code></pre>
  <blockquote>
    <p><i>Effects: </i>Constructs an iterator for the half-open range that 
    begins at <code>begin</code> and ends at the first element with a value of
    <code>value_type()</code>.</p>
  </blockquote>
  <pre><code>conversion_iterator(InputIterator begin, size_t sz);</code></pre>
  <blockquote>
    <p align="left"><i>Effects: </i>Constructs an iterator for the 
    half-open range that begins at <code>begin</code> and ends at <code>begin + 
    sz</code>.</p>
  </blockquote>
  <pre><code>template &lt;class InputIterator2&gt;
conversion_iterator(InputIterator begin, InputIterator2 end);</code></pre>
  <blockquote>
    <p><i>Effects: </i>Constructs an iterator for the half-open range that 
    begins at <code>begin</code> and ends at <code>end</code>.</p>
    <p><i>Remarks: </i>Shall not participate in overload resolution unless <code>
    InputIterator</code> and <code>InputIterator2</code> are the same type.</p>
  </blockquote>
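  <p>The composition can be pictured as an encoder layered over a decoder, with 
  UTF-32 code points passing between them. The sketch below hard-wires one 
  pairing, UTF-16 to UTF-8, and is not the proposed class: 
  <code>utf16_from</code>, <code>utf8_to</code>, 
  <code>utf16_to_utf8_iterator</code>, <code>utf16_to_utf8</code>, and 
  <code>demo</code> are hypothetical names. It assumes well-formed input, 
  supports only the begin/end and end-of-sequence constructors, and requires a 
  multi-pass underlying iterator.</p>
  <pre>#include &lt;algorithm&gt;
#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;iterator&gt;
#include &lt;string&gt;

// Hypothetical decoder: presents a UTF-16 range as UTF-32 code points.
template &lt;class It&gt;
class utf16_from
{
public:
  utf16_from() : cur_(), end_(), eos_(true) {}
  utf16_from(It b, It e) : cur_(b), end_(e), eos_(b == e) {}
  char32_t operator*() const
  {
    const char32_t u = *cur_;
    if (!is_lead(u)) return u;
    It next = cur_; ++next;                   // the low surrogate follows
    return 0x10000 + ((u - 0xD800) &lt;&lt; 10) + (char32_t(*next) - 0xDC00);
  }
  utf16_from&amp; operator++()
  {
    if (is_lead(*cur_)) ++cur_;               // skip both halves of a pair
    ++cur_;
    eos_ = (cur_ == end_);
    return *this;
  }
  bool at_end() const { return eos_; }
private:
  static bool is_lead(char32_t u) { return u &gt;= 0xD800 &amp;&amp; u &lt;= 0xDBFF; }
  It cur_, end_;
  bool eos_;
};

// Hypothetical encoder: presents a code point source as UTF-8 code units.
template &lt;class From&gt;
class utf8_to
{
public:
  typedef char                     value_type;
  typedef char                     reference;
  typedef const char*              pointer;
  typedef std::ptrdiff_t           difference_type;
  typedef std::input_iterator_tag  iterator_category;

  utf8_to() : src_(), buf_(), len_(0), pos_(0) {}        // end-of-sequence
  explicit utf8_to(From src) : src_(src), buf_(), len_(0), pos_(0) { fill(); }

  char operator*() const { return static_cast&lt;char&gt;(buf_[pos_]); }
  utf8_to&amp; operator++() { if (++pos_ == len_) fill(); return *this; }
  utf8_to operator++(int) { utf8_to tmp(*this); ++*this; return tmp; }
  // Simplification: only end-of-sequence iterators compare equal.
  friend bool operator==(const utf8_to&amp; a, const utf8_to&amp; b)
  { return a.len_ == 0 &amp;&amp; b.len_ == 0; }
  friend bool operator!=(const utf8_to&amp; a, const utf8_to&amp; b)
  { return !(a == b); }
private:
  void fill()   // encode the next code point, or become end-of-sequence
  {
    pos_ = len_ = 0;
    if (src_.at_end()) return;
    const char32_t c = *src_; ++src_;
    if (c &lt; 0x80)         { put(c); }
    else if (c &lt; 0x800)   { put(0xC0 | (c &gt;&gt; 6)); tail(c, 0); }
    else if (c &lt; 0x10000) { put(0xE0 | (c &gt;&gt; 12)); tail(c, 6); tail(c, 0); }
    else { put(0xF0 | (c &gt;&gt; 18)); tail(c, 12); tail(c, 6); tail(c, 0); }
  }
  void put(char32_t b) { buf_[len_++] = static_cast&lt;unsigned char&gt;(b); }
  void tail(char32_t c, int shift) { put(0x80 | ((c &gt;&gt; shift) &amp; 0x3F)); }
  From src_;
  unsigned char buf_[4];
  int len_, pos_;
};

// Composition in the manner of conversion_iterator: encoder over decoder.
template &lt;class It&gt;
class utf16_to_utf8_iterator : public utf8_to&lt;utf16_from&lt;It&gt; &gt;
{
  typedef utf8_to&lt;utf16_from&lt;It&gt; &gt; base;
public:
  utf16_to_utf8_iterator() {}
  utf16_to_utf8_iterator(It b, It e) : base(utf16_from&lt;It&gt;(b, e)) {}
};

inline std::string utf16_to_utf8(const std::u16string&amp; in)
{
  typedef utf16_to_utf8_iterator&lt;std::u16string::const_iterator&gt; iter;
  std::string out;
  std::copy(iter(in.begin(), in.end()), iter(), std::back_inserter(out));
  return out;
}

inline void demo()
{
  assert(utf16_to_utf8(u"h\u00E9") == "h\xC3\xA9");   // U+00E9 needs two bytes
}</pre>
  <p>Routing every conversion through UTF-32 is what lets any 
  <code>from_iterator</code> be paired with any <code>to_iterator</code> 
  without a dedicated converter for each pair of encodings.</p>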
  <h2>Algorithm <code>copy_string</code>&nbsp;&nbsp;&nbsp; [<a name="str-x.copy_string">str-x.copy_string</a>]</h2>
  <pre>template&lt;class InputIterator, class FromCodec,
         class OutputIterator, class ToCodec&gt;
OutputIterator copy_string(InputIterator first, InputIterator last,
                           OutputIterator result);</pre>
  <blockquote>
    <p><i>Requires:</i> <code>result</code> shall not be in the range 
    [<code>first,last</code>).</p>
    <p><i>Effects:</i></p>
    <blockquote>
      <pre>typedef conversion_iterator&lt;ToCodec,
  typename FromCodec::template
    codec&lt;typename std::iterator_traits&lt;InputIterator&gt;::value_type&gt;::type,
  InputIterator&gt;
iter_type;</pre>
      <p><i>Returns:</i> <code>std::copy(iter_type(first, last), 
      iter_type(), result)</code>.</p>
    </blockquote>
  </blockquote>
  <h2><code>make_string</code> function templates&nbsp;&nbsp;&nbsp; [<a name="str-x.make_string">str-x.make_string</a>]</h2>
  <p>The <code>make_string</code> functions create a string from a source sequence of 
  characters. The conversion of the type and encoding of the characters in the 
  source sequence of characters to the type and encoding of characters in the 
  created string is performed by <code>conversion_iterator&lt;ToCodec, typename 
  FromCodec::template codec&lt;typename FromString::value_type&gt;::type, typename 
  FromString::const_iterator&gt;</code>, where <code>ToCodec</code>, <code>
  FromCodec</code>, and <code>FromString</code> are template parameters, as is
  <code>ToString</code>, the type of the resulting string.</p>
  <pre>template &lt;class ToCodec,
          class FromCodec = default_codec,
          class ToString = std::basic_string&lt;typename ToCodec::value_type&gt;,
          class FromString&gt;
ToString make_string(const FromString&amp; s);</pre>
  <blockquote>
  <p><i>Returns: </i>&nbsp;A string containing the characters of the 
  sequence [<code>s.cbegin(), s.cend()</code>).</p>
  
  <p>[<i>Example:</i> A conforming implementation would be:</p>
  
    <pre>  typedef conversion_iterator&lt;ToCodec,
    typename FromCodec::template codec&lt;typename FromString::value_type&gt;::type,
    typename FromString::const_iterator&gt;
      iter_type;

  ToString tmp;
  std::copy(iter_type(s.cbegin(), s.cend()), iter_type(),
            std::back_insert_iterator&lt;ToString&gt;(tmp));
  return tmp;</pre>
    <p><i>--end example</i>]</p>
  </blockquote>
  
  <pre>template &lt;class ToCodec,
          class FromCodec = default_codec,
          class ToString = std::basic_string&lt;typename ToCodec::value_type&gt;,
          class InputIterator&gt;
ToString make_string(InputIterator begin);</pre>
  <blockquote>
  <p><i>Returns: </i>&nbsp;A string containing the characters of the 
  sequence [<code>begin, begin+<i>dist</i></code>) where <code><i>dist</i></code> 
  is the distance from <code>begin</code> to the first instance of character
  <code>iterator_traits&lt;InputIterator&gt;::value_type()</code>.</p>
  
  <p><i>Complexity: </i>O(<code><i>dist</i></code>)</p>
  
  </blockquote>
  <pre>template &lt;class ToCodec,
          class FromCodec = default_codec,
          class ToString = std::basic_string&lt;typename ToCodec::value_type&gt;,
          class InputIterator&gt;
ToString make_string(InputIterator begin, std::size_t sz);</pre>
  <blockquote>
  <p><i>Returns: </i>&nbsp;A string containing the characters of the 
  sequence [<code>begin, begin+sz</code>).</p>
  
  </blockquote>
  <pre>template &lt;class ToCodec,
          class FromCodec = default_codec,
          class ToString = std::basic_string&lt;typename ToCodec::value_type&gt;,
          class InputIterator,
          class InputIterator2&gt;
ToString make_string(InputIterator begin, InputIterator2 end);</pre>
  <blockquote>
  <p><i>Returns: </i>&nbsp;A string containing the characters of the 
  sequence [<code>begin, end</code>).</p>
  
  </blockquote>
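  <p>The template parameter order is chosen so that callers normally name only 
  the target codec, with the remaining parameters defaulted or deduced. The 
  sketch below shows just that calling convention. Everything in namespace 
  <code>sketch</code> is hypothetical, and the function body is a placeholder 
  that copies element values, so it is only correct for ASCII input; a real 
  implementation would perform the conversion with 
  <code>conversion_iterator</code>.</p>
  <pre>#include &lt;cassert&gt;
#include &lt;string&gt;

namespace sketch   // hypothetical namespace, not part of the proposal
{
  // Hypothetical codec tags; only value_type is modeled here.
  struct narrow { typedef char     value_type; };
  struct utf32  { typedef char32_t value_type; };
  struct default_codec {};

  // Same parameter order as the proposed make_string. Placeholder body:
  // copies element values, which is only correct for ASCII input.
  template &lt;class ToCodec,
            class FromCodec = default_codec,
            class ToString = std::basic_string&lt;typename ToCodec::value_type&gt;,
            class FromString&gt;
  ToString make_string(const FromString&amp; s)
  {
    ToString tmp;
    for (typename FromString::const_iterator it = s.begin(); it != s.end(); ++it)
      tmp.push_back(static_cast&lt;typename ToString::value_type&gt;(*it));
    return tmp;
  }

  inline void demo()
  {
    const std::string src("abc");
    std::u32string a = make_string&lt;utf32&gt;(src);           // only ToCodec named
    std::u32string b = make_string&lt;utf32, narrow&gt;(src);   // FromCodec named too
    assert(a == U"abc" &amp;&amp; a == b);
  }
}</pre>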
  <h2><code>to_</code><i><code>string</code></i> function templates&nbsp;&nbsp;&nbsp; [<a name="str-x.to_string">str-x.to_string</a>]</h2>
  
  <pre>template &lt;class FromCodec = default_codec,
&nbsp; class ToString = std::basic_string&lt;char&gt;, class FromString&gt;
&nbsp;&nbsp;&nbsp; ToString to_string(const FromString&amp; s);
template &lt;class FromCodec = default_codec,
&nbsp; class ToString = std::basic_string&lt;char&gt;, class InputIterator&gt;
&nbsp;&nbsp;&nbsp; ToString to_string(InputIterator begin);
template &lt;class FromCodec = default_codec,
&nbsp; class ToString = std::basic_string&lt;char&gt;, class InputIterator&gt;
&nbsp;&nbsp;&nbsp; ToString to_string(InputIterator begin, std::size_t sz);
template &lt;class FromCodec = default_codec,
&nbsp; class ToString = std::basic_string&lt;char&gt;, class InputIterator&gt;
&nbsp;&nbsp;&nbsp; ToString to_string(InputIterator begin, InputIterator end);
<i><b>Repeat pattern for to_wstring, to_u8string, to_u16string, to_u32string</b></i></pre>
  <blockquote>
    <p><i>Returns:</i> <code>make_string&lt;<i><b>codec</b></i>, FromCodec, 
    ToString&gt;(<i><b>arguments</b></i>)</code>, where <code><i><b>codec</b></i></code> 
    is <code>narrow</code>, <code>wide</code>, <code>utf8</code>, <code>utf16</code>, 
    or <code>utf32</code> for <code>to_string</code>, <code>to_wstring</code>, 
    <code>to_u8string</code>, <code>to_u16string</code>, or 
    <code>to_u32string</code> respectively, and <code><i><b>arguments</b></i></code> 
    is <code>s</code>, <code>begin</code>, <code>begin,sz</code>, or 
    <code>begin,end</code> according to the overload.</p>
  </blockquote>
  <h2>UTF-8 string support&nbsp;&nbsp;&nbsp; [<a name="str-x.utf8">str-x.utf8</a>]</h2>
  
  <p>These functions provide copy-less type conversion for use with 
  narrow character strings when no encoding conversion is required. Their 
  semantics take advantage of C++ language rules that ensure the 
  representation of the underlying bytes for <code>char</code> and 
  <code>unsigned char</code> is the same (C++ standard: [basic.types]).</p>
  
  <pre>inline const char8_t* u8(const char* s) noexcept;</pre>
  <blockquote>
    <p><i>Returns: </i><code>static_cast&lt;const char8_t*&gt;(static_cast&lt;const 
    void*&gt;(s))</code>.</p>
  </blockquote>
  <pre>inline const char8_t* u8(const string&amp; s) noexcept;</pre>
  <blockquote>
    <p><i>Returns: </i><code>static_cast&lt;const char8_t*&gt;(static_cast&lt;const 
    void*&gt;(s.c_str()))</code>.</p>
  </blockquote>
  <pre>inline const char* u8(const char8_t* s) noexcept;</pre>
  <blockquote>
    <p><i>Returns: </i><code>static_cast&lt;const char*&gt;(static_cast&lt;const 
    void*&gt;(s))</code>.</p>
  </blockquote>
  <pre>inline const char* u8(const u8string&amp; s) noexcept;</pre>
  <blockquote>
    <p><i>Returns: </i><code>static_cast&lt;const char*&gt;(static_cast&lt;const 
    void*&gt;(s.c_str()))</code>.</p>
  </blockquote>
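  <p>A minimal sketch of three of these overloads follows, mainly to show that 
  the result refers to the same bytes as the argument. The names 
  <code>sketch</code>, <code>char8</code>, <code>round_trip_is_same_address</code>, 
  and <code>demo</code> are hypothetical; <code>char8</code> stands in for the 
  proposed <code>char8_t</code> typedef, and the <code>u8string</code> overload 
  is omitted.</p>
  <pre>#include &lt;cassert&gt;
#include &lt;string&gt;

namespace sketch   // hypothetical namespace, not part of the proposal
{
  typedef unsigned char char8;   // stand-in for the proposed char8_t typedef

  // Native narrow to UTF-8 typed pointer; no bytes are copied or changed.
  inline const char8* u8(const char* s) noexcept
  { return static_cast&lt;const char8*&gt;(static_cast&lt;const void*&gt;(s)); }

  inline const char8* u8(const std::string&amp; s) noexcept
  { return u8(s.c_str()); }

  // UTF-8 typed pointer back to a plain char pointer.
  inline const char* u8(const char8* s) noexcept
  { return static_cast&lt;const char*&gt;(static_cast&lt;const void*&gt;(s)); }

  // Converting there and back yields the original address.
  inline bool round_trip_is_same_address(const char* s)
  { return u8(u8(s)) == s; }

  inline void demo()
  {
    const std::string text("abc");
    assert(round_trip_is_same_address(text.c_str()));
    assert(u8(text)[0] == static_cast&lt;char8&gt;('a'));
  }
}</pre>
  <p>The pointer returned for a <code>std::string</code> argument is only valid 
  for as long as the pointer returned by <code>c_str()</code> would be.</p>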
  <h2>Stream inserters&nbsp;&nbsp;&nbsp; [<a name="str-x.ins">str-x.ins</a>]</h2>
  
  <p>The stream inserter functions perform stream insertion of an 
  insertion character sequence converted from a source character sequence. The 
  conversion of the type and encoding of the source sequence to the type and 
  encoding of the insertion sequence is performed by a <code>conversion_iterator</code>.</p>
  
  <pre>template &lt;class Ostream, class charT, class traits, class Allocator&gt;
Ostream&amp; operator&lt;&lt;(Ostream&amp; os, const basic_string&lt;charT, traits, Allocator&gt;&amp; str);</pre>
  <blockquote>
  <p><i>Effects:</i> For each value of an iterator of type <code>
  conversion_iterator&lt;typename select_codec&lt;typename Ostream::char_type&gt;::type, 
  typename select_codec&lt;charT&gt;::type, typename basic_string&lt;charT, traits, 
  Allocator&gt;::const_iterator&gt;</code> 
  initialized with the source sequence [<code>str.cbegin(), str.cend()</code>), 
  iterate until the end-of-sequence value ([str-x.codec.req.eos]) is reached, 
  inserting the dereferenced value of the iterator into <code>os</code>.</p>
  
  <p><i>Returns: </i><code>os</code>.</p>
  
  <p><i>Remarks: </i>Does not participate in overload resolution if
  <code>charT</code> and <code>Ostream::char_type</code> are the same type. </p>
  
  </blockquote>
  <div>
    <pre>basic_ostream&lt;char&gt;&amp; operator&lt;&lt;(basic_ostream&lt;char&gt;&amp; os, const wchar_t* p);
basic_ostream&lt;char&gt;&amp; operator&lt;&lt;(basic_ostream&lt;char&gt;&amp; os, const char16_t* p);
basic_ostream&lt;char&gt;&amp; operator&lt;&lt;(basic_ostream&lt;char&gt;&amp; os, const char32_t* p);</pre>
  </div>
  <blockquote>
  <p><i>Effects:</i> For each value of an iterator of type <code>
  conversion_iterator&lt;typename select_codec&lt;char&gt;::type, typename select_codec&lt;<i>p&#39;s 
  value_type</i>&gt;::type, <i>p&#39;s type</i>&gt;</code> initialized with <code>p</code>, 
  iterate until the end-of-sequence value ([str-x.codec.req.eos]) is reached, 
  inserting the dereferenced value of the iterator into <code>os</code>.</p>
  
  <p><i>Returns: </i><code>os</code>.</p>
  
  <p>[<i>Note</i>: The existing <code>basic_ostream&lt;charT,traits&gt;&amp; 
  operator&lt;&lt;(const void* p)</code> prevents use of a template to abstract away 
  the differences between the pointer types covered by the above signatures. <i>
  --end note</i>]</p>
  
  </blockquote>
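  <p>For illustration, one of the pointer inserters might look like the sketch 
  below. It is not the proposed implementation: the conversion is written out 
  by hand rather than through <code>conversion_iterator</code>, and it assumes 
  that the narrow encoding of the stream is UTF-8, whereas the proposal would 
  choose the encoding via <code>select_codec&lt;char&gt;</code>. The function 
  <code>demo</code> is a hypothetical name.</p>
  <pre>#include &lt;cassert&gt;
#include &lt;ostream&gt;
#include &lt;sstream&gt;
#include &lt;string&gt;

// Sketch of the char32_t pointer inserter.
// Assumption: the stream's narrow encoding is UTF-8.
inline std::ostream&amp; operator&lt;&lt;(std::ostream&amp; os, const char32_t* p)
{
  for (; *p != char32_t(); ++p)   // stop at the first char32_t()
  {
    const char32_t c = *p;
    // Leading byte; its form depends on how many bytes the code point needs.
    if (c &lt; 0x80)
      os.put(static_cast&lt;char&gt;(c));
    else if (c &lt; 0x800)
      os.put(static_cast&lt;char&gt;(0xC0 | (c &gt;&gt; 6)));
    else if (c &lt; 0x10000)
      os.put(static_cast&lt;char&gt;(0xE0 | (c &gt;&gt; 12)));
    else
    {
      os.put(static_cast&lt;char&gt;(0xF0 | (c &gt;&gt; 18)));
      os.put(static_cast&lt;char&gt;(0x80 | ((c &gt;&gt; 12) &amp; 0x3F)));
    }
    // Remaining continuation bytes, six payload bits each.
    if (c &gt;= 0x800)
      os.put(static_cast&lt;char&gt;(0x80 | ((c &gt;&gt; 6) &amp; 0x3F)));
    if (c &gt;= 0x80)
      os.put(static_cast&lt;char&gt;(0x80 | (c &amp; 0x3F)));
  }
  return os;
}

inline std::string demo()
{
  std::ostringstream oss;
  oss &lt;&lt; U"h\u00E9";   // without the overload this would print an address
  return oss.str();
}</pre>
  <p>A non-template overload of this kind is an exact match for the pointer 
  argument, so it is preferred over the <code>const void*</code> inserter, 
  which requires a pointer conversion.</p>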
  
  <h2>Stream extractors&nbsp;&nbsp;&nbsp; [<a name="str-x.ext">str-x.ext</a>]</h2>
  
  <p><i><span style="background-color: #FFFF99">To be supplied.</span></i></p>
  
  <hr>

</body>

</html>