<HTML><HEAD><TITLE>N2007 (WG21) 06-0077 (J16)</TITLE></HEAD><BODY>

<H2>N2007 (WG21) 06-0077 (J16)<BR>
PROPOSED LIBRARY ADDITIONS FOR CODE CONVERSION</H2>

<PRE>P.J. Plauger
Dinkumware, Ltd.
pjp@dinkumware.com

2006-04-15</PRE>

<P>Dinkumware has been marketing several character code conversion
aids for a number of years, as part of a package of supplemental
features we call CoreX. They have now been merged into our latest
comprehensive product, the Dinkum Compleat Library. Based on the
success of that package, we now feel confident in proposing
two template classes from it for inclusion in a future standard
C++ library.</P>

<P>We submitted these template classes earlier, first as N1683
and then as N1957, and met with several criticisms. Here are
our responses to the most explicit remarks about
<CODE>wstring_convert</CODE>:</P>

<UL>
<LI>1) The <CODE>Elem</CODE> template parameter is redundant,
since the <CODE>Codecvt</CODE>
parameter specifies an element type. We've found it convenient
to keep these types separate, so that (for example) you can
use the large existing corpus of char codecvt facets with
signed/unsigned char conversions as well.</LI>

<LI>2) The external character type should be a template parameter
too, not hardwired as char. We find the overwhelming use for
codecvt facets is to translate to and from a byte (char)
stream. Making the implementation sensible for other types
adds considerable complexity for a dubious payoff.</LI>

<LI>3) There are typedefs not always used in the rest of the
external interface. We now use <CODE>state_type</CODE> (see below).
The others have their uses in user code, and generally
parallel the ones in library classes.</LI>

<LI>4) You can't supply a <CODE>Codecvt</CODE> object at construction time.
While this is only occasionally necessary, it's a useful
functional enhancement that we've now incorporated in our
implementation.</LI>

<LI>5) Appending error strings is a weak way of reporting
errors. Yes, but it's often enough, and you can also have
an encoding error throw an exception. We've also now added
the ability to get the count of successfully converted
input elements.</LI>

<LI>6) You can't set the initial shift state or get the final
shift state. We've now added this capability.</LI>
</UL>

<P>The descriptions that follow are taken primarily from our
documentation for our latest library.</P>

<HR>

<H2><A NAME="wstring_convert"><CODE>wstring_convert</CODE></A></H2>

<P>Template class <CODE>wstring_convert</CODE> performs conversions between a
wide string and a byte string. It lets you specify a code conversion
facet (like template class <CODE>codecvt</CODE>) to perform the conversions,
without affecting any streams or locales. Say, for example, you have
a code conversion facet called <CODE>codecvt_utf8</CODE> that you want
to use to output to <CODE>cout</CODE> a UTF-8 multibyte sequence corresponding
to a wide string, but you don't want to alter the locale for <CODE>cout</CODE>.
You can write something like:</P>

<PRE>    wstring_convert&lt;codecvt_utf8&lt;wchar_t&gt; &gt;
        myconv;  // no parentheses: "myconv()" would declare a function
    std::string mbstring = myconv.to_bytes(L"Hello\n");
    cout &lt;&lt; mbstring;</PRE>

<P>Note that the Standard C++ library currently uses code conversion facets
only within template class <CODE>basic_filebuf</CODE>, for converting from
multibyte sequences when reading from a file and for converting to
multibyte sequences when writing to a file. Something like template class
<CODE>wstring_convert</CODE> is needed to perform similar conversions
between string objects, without involving file I/O.</P>
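
<P>As a minimal sketch of such string-to-string conversion, the helper
functions below round-trip text through UTF-8. The sketch assumes an
implementation that supplies <CODE>wstring_convert</CODE> in
<CODE>&lt;locale&gt;</CODE> and a <CODE>codecvt_utf8</CODE> facet in
<CODE>&lt;codecvt&gt;</CODE>, as C++11 later did. This proposal specifies
neither header, and the names <CODE>Utf8Conv</CODE>, <CODE>to_utf8</CODE>,
<CODE>from_utf8</CODE>, and <CODE>check_round_trip</CODE> are hypothetical.</P>

```cpp
#include <cassert>
#include <codecvt>   // std::codecvt_utf8 (assumed available; not part of this proposal)
#include <locale>    // std::wstring_convert (assumed header)
#include <string>

typedef std::wstring_convert<std::codecvt_utf8<wchar_t> > Utf8Conv;

// Wide string -> UTF-8 byte string, without touching any stream or locale.
std::string to_utf8(const std::wstring& wide) {
    Utf8Conv conv;
    return conv.to_bytes(wide);
}

// UTF-8 byte string -> wide string.
std::wstring from_utf8(const std::string& bytes) {
    Utf8Conv conv;
    return conv.from_bytes(bytes);
}

// Round trip: converting to bytes and back should give the original text.
void check_round_trip(const std::wstring& wide) {
    assert(from_utf8(to_utf8(wide)) == wide);
}
```

<P>Each helper builds its own converter object, so no conversion state is
shared between calls.</P>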

<PRE>namespace std {
template&lt;class Codecvt,
    class Elem = wchar_t&gt;
    class wstring_convert
    {
public:
    typedef std::basic_string&lt;char&gt; <B><A HREF="#wstring_convert::byte_string">byte_string</A></B>;
    typedef std::basic_string&lt;Elem&gt; <B><A HREF="#wstring_convert::wide_string">wide_string</A></B>;
    typedef typename Codecvt::state_type <B><A HREF="#wstring_convert::state_type">state_type</A></B>;
    typedef typename wide_string::traits_type::int_type <B><A HREF="#wstring_convert::int_type">int_type</A></B>;

    <B><A HREF="#wstring_convert::wstring_convert">wstring_convert</A></B>(Codecvt *pcvt = new Codecvt);
    <B><A HREF="#wstring_convert::wstring_convert">wstring_convert</A></B>(Codecvt *pcvt, state_type state);
    <B><A HREF="#wstring_convert::wstring_convert">wstring_convert</A></B>(const byte_string&amp; byte_err,
        const wide_string&amp; wide_err = wide_string());

    wide_string <B><A HREF="#wstring_convert::from_bytes">from_bytes</A></B>(char byte);
    wide_string <B><A HREF="#wstring_convert::from_bytes">from_bytes</A></B>(const char *ptr);
    wide_string <B><A HREF="#wstring_convert::from_bytes">from_bytes</A></B>(const byte_string&amp; str);
    wide_string <B><A HREF="#wstring_convert::from_bytes">from_bytes</A></B>(const char *first, const char *last);

    byte_string <B><A HREF="#wstring_convert::to_bytes">to_bytes</A></B>(Elem wchar);
    byte_string <B><A HREF="#wstring_convert::to_bytes">to_bytes</A></B>(const Elem *wptr);
    byte_string <B><A HREF="#wstring_convert::to_bytes">to_bytes</A></B>(const wide_string&amp; wstr);
    byte_string <B><A HREF="#wstring_convert::to_bytes">to_bytes</A></B>(const Elem *first, const Elem *last);

    size_t <B><A HREF="#wstring_convert::converted">converted</A></B>() const;
    state_type <B><A HREF="#wstring_convert::state">state</A></B>() const;

<I>    // exposition only
private:
    byte_string <B>byte_err_string</B>;
    wide_string <B>wide_err_string</B>;
    Codecvt *<B>cvtptr</B>;
    state_type <B>cvtstate</B>;
    size_t <B>cvtcount</B>;</I>
    };
}  // namespace std</PRE>

<P>The template class describes an object that controls conversions
between wide string objects of class <CODE>std::basic_string&lt;Elem&gt;</CODE>
and byte string objects of class <CODE>std::basic_string&lt;char&gt;</CODE>
(also known as <CODE>std::string</CODE>). The template class defines the
types <CODE>wide_string</CODE> and <CODE>byte_string</CODE> as synonyms for
these two types. Conversion between a sequence
of <CODE>Elem</CODE> values (stored in a <CODE>wide_string</CODE> object)
and multibyte sequences (stored in a <CODE>byte_string</CODE> object)
is performed by an object of class
<CODE>Codecvt</CODE>,
which meets the requirements of the standard code-conversion facet
<CODE>std::codecvt&lt;Elem, char, std::mbstate_t&gt;</CODE>.</P>

<P>An object of this template class stores:</P>

<UL>
<LI><B><A NAME="byte_err_string"><CODE>byte_err_string</CODE></A></B> --
a byte string to display on errors</LI>

<LI><B><A NAME="wide_err_string"><CODE>wide_err_string</CODE></A></B> --
a wide string to display on errors</LI>

<LI><B><A NAME="cvtptr"><CODE>cvtptr</CODE></A></B> --
a pointer to the allocated conversion object (which is freed
when the <CODE>wstring_convert</CODE> object is destroyed)</LI>

<LI><B><A NAME="cvtstate"><CODE>cvtstate</CODE></A></B> --
a conversion state object</LI>

<LI><B><A NAME="cvtcount"><CODE>cvtcount</CODE></A></B> --
a conversion count</LI>
</UL>

<H3><CODE><A NAME="wstring_convert::byte_string">wstring_convert::byte_string</A></CODE></H3>

<PRE>typedef std::basic_string&lt;char&gt; <B>byte_string</B>;</PRE>

<P>The type is a synonym for <CODE>std::basic_string&lt;char&gt;</CODE>.</P>

<H3><CODE><A NAME="wstring_convert::converted">wstring_convert::converted</A></CODE></H3>

<PRE>size_t <B>converted</B>() const;</PRE>

<P>The member function returns <CODE>cvtcount</CODE>.</P>

<H3><CODE><A NAME="wstring_convert::from_bytes">wstring_convert::from_bytes</A></CODE></H3>

<PRE>wide_string <B>from_bytes</B>(char byte);
wide_string <B>from_bytes</B>(const char *ptr);
wide_string <B>from_bytes</B>(const byte_string&amp; str);
wide_string <B>from_bytes</B>(const char *first, const char *last);</PRE>

<P>The first member function converts the single-element sequence <CODE>byte</CODE>
to a wide string.
The second member function converts the nul-terminated sequence beginning
at <CODE>ptr</CODE> to a wide string.
The third member function converts the sequence stored in <CODE>str</CODE>
to a wide string.
The fourth member function converts the sequence defined by the range
<CODE>[first, last)</CODE> to a wide string.</P>

<P>In all cases:</P>

<UL>
<LI>If the <CODE>cvtstate</CODE> object was <I>not</I> constructed with an
explicit value, it is set to its default value (the initial conversion
state) before the conversion begins. Otherwise it is left unchanged.</LI>

<LI>The number of input elements successfully converted is stored in
<CODE>cvtcount</CODE>.</LI>

<LI>If no conversion error occurs, the member function returns the
converted wide string.</LI>

<LI>Otherwise, if the object was constructed with a
<A HREF="#wide_err_string">wide-error string</A>,
the member function returns the wide-error string.</LI>

<LI>Otherwise, the member function throws an object of class
<CODE>std::range_error</CODE>.</LI>
</UL>
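
<P>The two error-reporting paths listed above can be sketched as follows.
This is an illustration only: it assumes <CODE>wstring_convert</CODE> is
available from <CODE>&lt;locale&gt;</CODE> and a UTF-8 facet
<CODE>codecvt_utf8</CODE> from <CODE>&lt;codecvt&gt;</CODE>, neither of which
this proposal specifies, and the function names are hypothetical.</P>

```cpp
#include <cassert>
#include <codecvt>    // std::codecvt_utf8 (assumed available)
#include <locale>     // std::wstring_convert (assumed header)
#include <stdexcept>  // std::range_error
#include <string>

typedef std::wstring_convert<std::codecvt_utf8<wchar_t> > Utf8Conv;

// Constructed with error strings: a failed from_bytes returns the
// wide-error string instead of throwing.
std::wstring decode_or_marker(const std::string& bytes) {
    Utf8Conv conv("<bad bytes>", L"<bad wide>");
    return conv.from_bytes(bytes);
}

// Constructed without error strings: a failed from_bytes throws
// std::range_error. Returns true if that happened.
bool decode_throws(const std::string& bytes) {
    Utf8Conv conv;
    try {
        conv.from_bytes(bytes);
    } catch (const std::range_error&) {
        return true;
    }
    return false;
}

void check_error_reporting() {
    const std::string invalid("\xFF");  // 0xFF never appears in valid UTF-8
    assert(decode_or_marker("abc") == L"abc");
    assert(decode_or_marker(invalid) == L"<bad wide>");
    assert(decode_throws(invalid));
    assert(!decode_throws("abc"));
}
```

<P>Returning a marker string suits display code that should keep going,
while the exception suits callers that must not mistake a marker for
converted data.</P>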

<H3><CODE><A NAME="wstring_convert::int_type">wstring_convert::int_type</A></CODE></H3>

<PRE>typedef typename wide_string::traits_type::int_type <B>int_type</B>;</PRE>

<P>The type is a synonym for <CODE>wide_string::traits_type::int_type</CODE>.</P>

<H3><CODE><A NAME="wstring_convert::state">wstring_convert::state</A></CODE></H3>

<PRE>state_type <B>state</B>() const;</PRE>

<P>The member function returns <CODE>cvtstate</CODE>.</P>

<H3><CODE><A NAME="wstring_convert::state_type">wstring_convert::state_type</A></CODE></H3>

<PRE>typedef typename Codecvt::state_type <B>state_type</B>;</PRE>

<P>The type is a synonym for <CODE>Codecvt::state_type</CODE>.</P>

<H3><CODE><A NAME="wstring_convert::to_bytes">wstring_convert::to_bytes</A></CODE></H3>

<PRE>byte_string <B>to_bytes</B>(Elem wchar);
byte_string <B>to_bytes</B>(const Elem *wptr);
byte_string <B>to_bytes</B>(const wide_string&amp; wstr);
byte_string <B>to_bytes</B>(const Elem *first, const Elem *last);</PRE>

<P>The first member function converts the single-element sequence <CODE>wchar</CODE>
to a byte string.
The second member function converts the nul-terminated sequence beginning
at <CODE>wptr</CODE> to a byte string.
The third member function converts the sequence stored in <CODE>wstr</CODE>
to a byte string.
The fourth member function converts the sequence defined by the range
<CODE>[first, last)</CODE> to a byte string.</P>

<P>In all cases:</P>

<UL>
<LI>If the <CODE>cvtstate</CODE> object was <I>not</I> constructed with an
explicit value, it is set to its default value (the initial conversion
state) before the conversion begins. Otherwise it is left unchanged.</LI>

<LI>The number of input elements successfully converted is stored in
<CODE>cvtcount</CODE>.</LI>

<LI>If no conversion error occurs, the member function returns the
converted byte string.</LI>

<LI>Otherwise, if the object was constructed with a
<A HREF="#byte_err_string">byte-error string</A>,
the member function returns the byte-error string.</LI>

<LI>Otherwise, the member function throws an object of class
<CODE>std::range_error</CODE>.</LI>
</UL>
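
<P>As a small worked case of <CODE>to_bytes</CODE> together with
<CODE>converted</CODE>, the sketch below encodes three wide elements as UTF-8
and checks the reported count. It assumes a <CODE>codecvt_utf8</CODE> facet
from <CODE>&lt;codecvt&gt;</CODE> and <CODE>wstring_convert</CODE> from
<CODE>&lt;locale&gt;</CODE>; the helper names are hypothetical.</P>

```cpp
#include <cassert>
#include <codecvt>   // std::codecvt_utf8 (assumed available)
#include <cstddef>   // std::size_t
#include <locale>    // std::wstring_convert (assumed header)
#include <string>

typedef std::wstring_convert<std::codecvt_utf8<wchar_t> > Utf8Conv;

// Encodes `wide` as UTF-8 and reports, through `consumed`, how many
// input elements converted() says were converted.
std::string encode_counting(const std::wstring& wide, std::size_t& consumed) {
    Utf8Conv conv;
    std::string bytes = conv.to_bytes(wide);
    consumed = conv.converted();
    return bytes;
}

void check_encode_counting() {
    // 'a', U+00E9 and U+20AC occupy 1, 2 and 3 bytes in UTF-8.
    const wchar_t input[] = { L'a', wchar_t(0xE9), wchar_t(0x20AC) };
    std::size_t consumed = 0;
    const std::string bytes = encode_counting(std::wstring(input, 3), consumed);
    assert(bytes == "a\xC3\xA9\xE2\x82\xAC");
    assert(consumed == 3);  // three wide elements in, six bytes out
}
```

<P>The count refers to input elements, not to bytes produced, so it can be
used to locate the first element that failed to convert.</P>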

<H3><CODE><A NAME="wstring_convert::wide_string">wstring_convert::wide_string</A></CODE></H3>

<PRE>typedef std::basic_string&lt;Elem&gt; <B>wide_string</B>;</PRE>

<P>The type is a synonym for <CODE>std::basic_string&lt;Elem&gt;</CODE>.</P>

<H3><CODE><A NAME="wstring_convert::wstring_convert">wstring_convert::wstring_convert</A></CODE></H3>

<PRE><B>wstring_convert</B>(Codecvt *pcvt = new Codecvt);
<B>wstring_convert</B>(Codecvt *pcvt, state_type state);
<B>wstring_convert</B>(const byte_string&amp; byte_err,
    const wide_string&amp; wide_err = wide_string());</PRE>

<P>The first constructor stores <CODE>pcvt</CODE> in <CODE>cvtptr</CODE> and
default values in <CODE>cvtstate</CODE>, <CODE>byte_err_string</CODE>,
and <CODE>wide_err_string</CODE>.
The second constructor stores <CODE>pcvt</CODE> in <CODE>cvtptr</CODE>,
<CODE>state</CODE> in <CODE>cvtstate</CODE>, and default values in
<CODE>byte_err_string</CODE> and <CODE>wide_err_string</CODE>;
moreover the stored state is retained between calls to
<CODE><A HREF="#wstring_convert::from_bytes">from_bytes</A></CODE> and
<CODE><A HREF="#wstring_convert::to_bytes">to_bytes</A></CODE>.
The third constructor stores <CODE>new Codecvt</CODE> in <CODE>cvtptr</CODE>,
<CODE>state_type()</CODE> in <CODE>cvtstate</CODE>,
<CODE>byte_err</CODE> in <CODE>byte_err_string</CODE>,
and <CODE>wide_err</CODE> in <CODE>wide_err_string</CODE>.</P>
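
<P>The three constructor forms might be exercised as in the sketch below.
It assumes a UTF-8 facet <CODE>codecvt_utf8</CODE> from
<CODE>&lt;codecvt&gt;</CODE> and <CODE>wstring_convert</CODE> from
<CODE>&lt;locale&gt;</CODE>; the typedef and function names are hypothetical.
Because UTF-8 has no shift states, the retained state in the second form has
no visible effect here.</P>

```cpp
#include <cassert>
#include <codecvt>   // std::codecvt_utf8 (assumed available)
#include <cwchar>    // std::mbstate_t
#include <locale>    // std::wstring_convert (assumed header)
#include <string>

typedef std::codecvt_utf8<wchar_t> Utf8Facet;
typedef std::wstring_convert<Utf8Facet> Utf8Conv;

// First form: the caller supplies the facet. The converter frees it on
// destruction, so it must have been allocated with `new`.
std::string encode_with_facet(const std::wstring& wide) {
    Utf8Conv conv(new Utf8Facet);
    return conv.to_bytes(wide);
}

// Second form: facet plus an explicit initial state, which is then
// retained from one call to the next rather than reset.
std::string encode_with_state(const std::wstring& a, const std::wstring& b) {
    Utf8Conv conv(new Utf8Facet, std::mbstate_t());
    return conv.to_bytes(a) + conv.to_bytes(b);
}

// Third form: default facet, with strings to return on conversion failure.
std::string encode_with_error_strings(const std::wstring& wide) {
    Utf8Conv conv("?", L"?");
    return conv.to_bytes(wide);
}

void check_constructors() {
    assert(encode_with_facet(L"hi") == "hi");
    assert(encode_with_state(L"hi", L"there") == "hithere");
    assert(encode_with_error_strings(L"hi") == "hi");
}
```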

<HR>

<H2><A NAME="wbuffer_convert"><CODE>wbuffer_convert</CODE></A></H2>

<P>Template class <CODE>wbuffer_convert</CODE> looks like a wide stream
buffer, but performs all its I/O through an underlying byte stream buffer
that you specify when you construct it. Like template class
<CODE>wstring_convert</CODE>, it lets you specify a code conversion
facet to perform the conversions, without affecting any streams or locales.
The previous example can also be written with <CODE>wbuffer_convert</CODE>,
which is declared as:</P>

<PRE>namespace std {
template&lt;class Codecvt,
    class Elem = wchar_t,
    class Tr = std::char_traits&lt;Elem&gt; &gt;
    class wbuffer_convert
        : public std::basic_streambuf&lt;Elem, Tr&gt;
    {
public:
    typedef typename Codecvt::state_type <B><A HREF="#wbuffer_convert::state_type">state_type</A></B>;

    <B><A HREF="#wbuffer_convert::wbuffer_convert">wbuffer_convert</A></B>(std::streambuf *bytebuf = 0,
        Codecvt *pcvt = new Codecvt,
        state_type state = state_type());

    std::streambuf *<B><A HREF="#wbuffer_convert::rdbuf">rdbuf</A></B>() const;
    std::streambuf *<B><A HREF="#wbuffer_convert::rdbuf">rdbuf</A></B>(std::streambuf *bytebuf);

    state_type <B><A HREF="#wbuffer_convert::state">state</A></B>() const;

<I>    // exposition only
private:
    std::streambuf *<B>bufptr</B>;
    Codecvt *<B>cvtptr</B>;
    state_type <B>cvtstate</B>;</I>
    };
}  // namespace std</PRE>

<P>The template class describes a stream buffer that controls the
transmission of elements of type <CODE>Elem</CODE>, whose character traits
are described by the class <CODE>Tr</CODE>, to and from a byte stream
buffer of type <CODE>std::streambuf</CODE>. Conversion between a sequence
of <CODE>Elem</CODE> values and multibyte sequences is performed by an
object of class <CODE>Codecvt</CODE>,
which meets the requirements of the standard code-conversion facet
<CODE>std::codecvt&lt;Elem, char, std::mbstate_t&gt;</CODE>.</P>

<P>An object of this template class stores:</P>

<UL>
<LI><B><A NAME="bufptr"><CODE>bufptr</CODE></A></B> --
a pointer to its underlying byte stream buffer</LI>

<LI><B><A NAME="cvtptr"><CODE>cvtptr</CODE></A></B> --
a pointer to the allocated conversion object (which is freed
when the <CODE>wbuffer_convert</CODE> object is destroyed)</LI>

<LI><B><A NAME="cvtstate"><CODE>cvtstate</CODE></A></B> --
a conversion state object</LI>
</UL>
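
<P>The earlier UTF-8 output example might be recast with
<CODE>wbuffer_convert</CODE> roughly as follows. This is a sketch under
stated assumptions: it writes to a <CODE>std::stringbuf</CODE> rather than
<CODE>cout.rdbuf()</CODE> so that the bytes can be inspected, it takes
<CODE>wbuffer_convert</CODE> from <CODE>&lt;locale&gt;</CODE> and
<CODE>codecvt_utf8</CODE> from <CODE>&lt;codecvt&gt;</CODE>, and the function
names are hypothetical.</P>

```cpp
#include <cassert>
#include <codecvt>   // std::codecvt_utf8 (assumed available)
#include <locale>    // std::wbuffer_convert (assumed header)
#include <ostream>
#include <sstream>
#include <string>

// Writes `text` through a wide stream whose buffer converts to UTF-8
// and forwards the bytes to an ordinary byte stream buffer.
std::string write_wide_as_utf8(const std::wstring& text) {
    std::stringbuf bytes;  // underlying byte stream buffer
    std::wbuffer_convert<std::codecvt_utf8<wchar_t> > widebuf(&bytes);
    std::wostream out(&widebuf);
    out << text;
    out.flush();  // push any pending wide characters through the converter
    return bytes.str();
}

void check_write_wide_as_utf8() {
    assert(write_wide_as_utf8(L"Hello\n") == "Hello\n");
    assert(write_wide_as_utf8(std::wstring(1, wchar_t(0xE9))) == "\xC3\xA9");
}
```

<P>The locale imbued in the byte stream buffer is never consulted for the
conversion, which is performed entirely by the supplied facet.</P>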

<H3><CODE><A NAME="wbuffer_convert::state">wbuffer_convert::state</A></CODE></H3>

<PRE>state_type <B>state</B>() const;</PRE>

<P>The member function returns <CODE>cvtstate</CODE>.</P>

<H3><CODE><A NAME="wbuffer_convert::rdbuf">wbuffer_convert::rdbuf</A></CODE></H3>

<PRE>std::streambuf *<B>rdbuf</B>() const;
std::streambuf *<B>rdbuf</B>(std::streambuf *bytebuf);</PRE>

<P>The first member function returns <CODE>bufptr</CODE>.
The second member function stores <CODE>bytebuf</CODE> in <CODE>bufptr</CODE>.</P>
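
<P>One use for the second form of <CODE>rdbuf</CODE> is to redirect an
existing wide stream to a different byte stream buffer. The sketch below
assumes <CODE>wbuffer_convert</CODE> from <CODE>&lt;locale&gt;</CODE> and a
<CODE>codecvt_utf8</CODE> facet from <CODE>&lt;codecvt&gt;</CODE>;
<CODE>TwoSinks</CODE> and <CODE>write_then_retarget</CODE> are hypothetical
names. It flushes before switching so that no pending output is left behind,
and it ignores the return value of <CODE>rdbuf</CODE>, which the description
above does not specify.</P>

```cpp
#include <cassert>
#include <codecvt>   // std::codecvt_utf8 (assumed available)
#include <locale>    // std::wbuffer_convert (assumed header)
#include <ostream>
#include <sstream>
#include <string>

struct TwoSinks {
    std::string first;
    std::string second;
};

// Writes one word to each of two byte buffers through the same wide stream.
TwoSinks write_then_retarget() {
    std::stringbuf first, second;
    std::wbuffer_convert<std::codecvt_utf8<wchar_t> > widebuf(&first);
    std::wostream out(&widebuf);

    out << L"one";
    out.flush();             // drain pending wide characters into `first`

    widebuf.rdbuf(&second);  // later output goes to `second`
    assert(widebuf.rdbuf() == &second);

    out << L"two";
    out.flush();

    TwoSinks result = { first.str(), second.str() };
    return result;
}
```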

<H3><CODE><A NAME="wbuffer_convert::wbuffer_convert">wbuffer_convert::wbuffer_convert</A></CODE></H3>

<PRE><B>wbuffer_convert</B>(std::streambuf *bytebuf = 0,
    Codecvt *pcvt = new Codecvt,
    state_type state = state_type());</PRE>

<P>The constructor constructs a stream buffer object, initializes
<CODE>bufptr</CODE> to <CODE>bytebuf</CODE>, initializes
<CODE>cvtptr</CODE> to <CODE>pcvt</CODE>, and initializes
<CODE>cvtstate</CODE> to <CODE>state</CODE>.</P>
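
<P>A sketch of the constructor with all three arguments supplied, this time
reading rather than writing. It assumes <CODE>wbuffer_convert</CODE> from
<CODE>&lt;locale&gt;</CODE> and a <CODE>codecvt_utf8</CODE> facet from
<CODE>&lt;codecvt&gt;</CODE>; <CODE>Utf8Facet</CODE> and
<CODE>read_utf8_line</CODE> are hypothetical names.</P>

```cpp
#include <cassert>
#include <codecvt>   // std::codecvt_utf8 (assumed available)
#include <cwchar>    // std::mbstate_t
#include <istream>
#include <locale>    // std::wbuffer_convert (assumed header)
#include <sstream>
#include <string>

typedef std::codecvt_utf8<wchar_t> Utf8Facet;

// Reads one line of UTF-8 bytes and returns it as a wide string.
std::wstring read_utf8_line(const std::string& utf8) {
    std::stringbuf bytes(utf8);  // underlying byte stream buffer
    // Byte buffer, facet (freed by the wbuffer_convert object), initial state.
    std::wbuffer_convert<Utf8Facet> widebuf(&bytes, new Utf8Facet, std::mbstate_t());
    std::wistream in(&widebuf);
    std::wstring line;
    std::getline(in, line);
    return line;
}

void check_read_utf8_line() {
    std::wstring expected;
    expected += L'h';
    expected += wchar_t(0xE9);  // U+00E9, encoded as C3 A9 in UTF-8
    assert(read_utf8_line("h\xC3\xA9") == expected);
    assert(read_utf8_line("plain") == L"plain");
}
```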

<H3><CODE><A NAME="wbuffer_convert::state_type">wbuffer_convert::state_type</A></CODE></H3>

<PRE>typedef typename Codecvt::state_type <B>state_type</B>;</PRE>

<P>The type is a synonym for <CODE>Codecvt::state_type</CODE>.</P>

<HR>
<P><I>Copyright &#169; 2002-2006
by Dinkumware, Ltd. All rights reserved.</I></P>
</BODY></HTML>
