<HTML><HEAD><TITLE>N1683 (WG21) 04-0123 (J16)</TITLE></HEAD><BODY>

<H2>N1683 (WG21) 04-0123 (J16)<BR>
PROPOSED LIBRARY ADDITIONS FOR CODE CONVERSION</H2>

<PRE>P.J. Plauger
Dinkumware, Ltd.
pjp@dinkumware.com</PRE>

<P>Dinkumware has been marketing a package of character code conversion
aids for the past couple of years. Based on the success of that package
(which we call CoreX), we now feel confident in proposing
two template classes from it for inclusion in a future standard
C++ library. The descriptions that follow are taken primarily from our
documentation.</P>

<HR>

<P>Template class <CODE>wstring_convert</CODE> performs conversions between a
wide string and a byte string. It lets you specify a code conversion
facet (like template class <CODE>codecvt</CODE>) to perform the conversions,
without affecting any streams or locales. Say, for example, you have
a code conversion facet called <CODE>codecvt_utf8</CODE> that you want
to use to output to <CODE>cout</CODE> a UTF-8 multibyte sequence corresponding
to a wide string, but you don't want to alter the locale for <CODE>cout</CODE>.
You can write something like:</P>

<PRE>    wstring_convert&lt;codecvt_utf8&lt;wchar_t&gt; &gt;
        myconv;  // no parentheses: myconv() would declare a function
    std::string mbstring = myconv.to_bytes(L"Hello\n");
    cout &lt;&lt; mbstring;</PRE>

<P>Note that the Standard C++ library currently uses code conversion facets
only within template class <CODE>basic_filebuf</CODE>, for converting from
multibyte sequences when reading from a file and for converting to
multibyte sequences when writing to a file. Something like template class
<CODE>wstring_convert</CODE> is needed to perform similar conversions
between string objects, without involving file I/O.</P>

<H2><A NAME="wstring_convert"><CODE>wstring_convert</CODE></A></H2>

<PRE>template&lt;class Codecvt,
    class Elem = wchar_t&gt;
    class wstring_convert
    {
public:
    typedef std::basic_string&lt;char&gt; <B><A HREF="#wstring_convert::byte_string">byte_string</A></B>;
    typedef std::basic_string&lt;Elem&gt; <B><A HREF="#wstring_convert::wide_string">wide_string</A></B>;
    typedef typename Codecvt::state_type <B><A HREF="#wstring_convert::state_type">state_type</A></B>;
    typedef typename wide_string::traits_type::int_type <B><A HREF="#wstring_convert::int_type">int_type</A></B>;

    <B><A HREF="#wstring_convert::wstring_convert">wstring_convert</A></B>();
    <B><A HREF="#wstring_convert::wstring_convert">wstring_convert</A></B>(const byte_string&amp; byte_err);
    <B><A HREF="#wstring_convert::wstring_convert">wstring_convert</A></B>(const byte_string&amp; byte_err,
        const wide_string&amp; wide_err);

    wide_string <B><A HREF="#wstring_convert::from_bytes">from_bytes</A></B>(char byte) const;
    wide_string <B><A HREF="#wstring_convert::from_bytes">from_bytes</A></B>(const char *ptr) const;
    wide_string <B><A HREF="#wstring_convert::from_bytes">from_bytes</A></B>(const byte_string&amp; str) const;
    wide_string <B><A HREF="#wstring_convert::from_bytes">from_bytes</A></B>(const char *first, const char *last) const;

    byte_string <B><A HREF="#wstring_convert::to_bytes">to_bytes</A></B>(Elem wchar) const;
    byte_string <B><A HREF="#wstring_convert::to_bytes">to_bytes</A></B>(const Elem *wptr) const;
    byte_string <B><A HREF="#wstring_convert::to_bytes">to_bytes</A></B>(const wide_string&amp; wstr) const;
    byte_string <B><A HREF="#wstring_convert::to_bytes">to_bytes</A></B>(const Elem *first, const Elem *last) const;

<I>    // exposition only
private:
    byte_string <B>byte_err_string</B>;
    wide_string <B>wide_err_string</B>;</I>
    };</PRE>

<P>The template class describes an object that controls conversions
between wide string objects of class <CODE>std::basic_string&lt;Elem&gt;</CODE>
and byte string objects of class <CODE>std::basic_string&lt;char&gt;</CODE>
(also known as <CODE>std::string</CODE>). The template class defines the
types <CODE>wide_string</CODE> and <CODE>byte_string</CODE> as synonyms for
these two types. Conversion between a sequence
of <CODE>Elem</CODE> values (stored in a <CODE>wide_string</CODE> object)
and multibyte sequences (stored in a <CODE>byte_string</CODE> object)
is performed by an object of class <CODE>Codecvt</CODE>,
which meets the requirements of the standard code-conversion facet
<CODE>std::codecvt&lt;Elem, char, std::mbstate_t&gt;</CODE>.</P>
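<P>To make this division of labor concrete, here is a minimal sketch, assuming only the standard <CODE>std::codecvt</CODE> interface, of how a conversion object might drive a facet's <CODE>out</CODE> and <CODE>unshift</CODE> members to turn a wide string into a byte string. The function name <CODE>convert_out</CODE>, the 32-byte chunk buffer, and the choice to throw <CODE>std::range_error</CODE> on failure are illustrative assumptions, not part of this proposal. The usage example uses the facet from the classic <CODE>"C"</CODE> locale, so it is only meaningful for plain ASCII text.</P>

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Sketch: converts a wide string to a byte string by driving any facet that
// meets the std::codecvt<wchar_t, char, std::mbstate_t> requirements.
// The name and the chunk size are illustrative only.
template <class Codecvt>
std::string convert_out(const Codecvt& cvt, const std::wstring& wstr)
{
    std::mbstate_t state = std::mbstate_t();    // initial conversion state
    std::string bytes;
    char buf[32];                               // output is produced in chunks
    const wchar_t* from = wstr.data();
    const wchar_t* const from_end = from + wstr.size();

    while (from != from_end) {
        const wchar_t* from_next = from;
        char* to_next = buf;
        std::codecvt_base::result r = cvt.out(state, from, from_end, from_next,
                                              buf, buf + sizeof buf, to_next);
        // noconv is not expected when converting wchar_t to char
        if (r == std::codecvt_base::error || r == std::codecvt_base::noconv)
            throw std::range_error("convert_out: conversion failed");
        bytes.append(buf, static_cast<std::string::size_type>(to_next - buf));
        if (from_next == from && to_next == buf)
            throw std::range_error("convert_out: no progress");
        from = from_next;
    }

    // Return to the initial shift state; stateless encodings report noconv.
    char* to_next = buf;
    if (cvt.unshift(state, buf, buf + sizeof buf, to_next) == std::codecvt_base::ok)
        bytes.append(buf, static_cast<std::string::size_type>(to_next - buf));
    return bytes;
}

// Usage example with the facet of the classic "C" locale (plain ASCII only).
void convert_out_example()
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(std::locale::classic());
    assert(convert_out(cvt, L"Hello\n") == "Hello\n");
}
```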

<P>An object of this template class stores a
<B><A NAME="wide-error string">wide-error string</A></B>, called
<B><A NAME="wide_err_string"><CODE>wide_err_string</CODE></A></B>
here for the sake of exposition. It also stores a
<B><A NAME="byte-error string">byte-error string</A></B>, called
<B><A NAME="byte_err_string"><CODE>byte_err_string</CODE></A></B>
here for the sake of exposition.</P>

<H3><CODE><A NAME="wstring_convert::byte_string">wstring_convert::byte_string</A></CODE></H3>

<PRE>typedef std::basic_string&lt;char&gt; <B>byte_string</B>;</PRE>

<P>The type is a synonym for <CODE>std::basic_string&lt;char&gt;</CODE>.</P>

<H3><CODE><A NAME="wstring_convert::from_bytes">wstring_convert::from_bytes</A></CODE></H3>

<PRE>wide_string <B>from_bytes</B>(char byte) const;
wide_string <B>from_bytes</B>(const char *ptr) const;
wide_string <B>from_bytes</B>(const byte_string&amp; str) const;
wide_string <B>from_bytes</B>(const char *first, const char *last) const;</PRE>

<P>The first member function converts the single-element sequence <CODE>byte</CODE>
to a wide string.
The second member function converts the nul-terminated sequence beginning
at <CODE>ptr</CODE> to a wide string.
The third member function converts the sequence stored in <CODE>str</CODE>
to a wide string.
The fourth member function converts the sequence defined by the range
<CODE>[first, last)</CODE> to a wide string.</P>
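<P>Since the first three forms merely describe the same input differently, a natural arrangement is for them to forward to the range form. The following sketch shows that arrangement on a hypothetical class <CODE>toy_converter</CODE>; its range form just widens each byte instead of calling a code conversion facet, so it stands in for a real conversion only for plain ASCII.</P>

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical toy converter (not the proposed class). The first three
// from_bytes overloads forward to the range form [first, last), where a real
// implementation would call the code conversion facet. Here the "conversion"
// simply widens each byte, so only plain ASCII input is meaningful.
class toy_converter {
public:
    std::wstring from_bytes(char byte) const
    {   // single-element sequence
        return from_bytes(&byte, &byte + 1);
    }
    std::wstring from_bytes(const char* ptr) const
    {   // nul-terminated sequence; the terminating nul is not converted
        return from_bytes(ptr, ptr + std::strlen(ptr));
    }
    std::wstring from_bytes(const std::string& str) const
    {   // entire string object, including any embedded nul bytes
        return from_bytes(str.data(), str.data() + str.size());
    }
    std::wstring from_bytes(const char* first, const char* last) const
    {   // the range form does all the work
        std::wstring out;
        for (; first != last; ++first)
            out += static_cast<wchar_t>(static_cast<unsigned char>(*first));
        return out;
    }
};

void toy_converter_example()
{
    toy_converter cv;
    assert(cv.from_bytes('A') == L"A");
    assert(cv.from_bytes("Hi") == L"Hi");
    assert(cv.from_bytes(std::string("a\0b", 3)) == std::wstring(L"a\0b", 3));
    const char text[] = "xyz";
    assert(cv.from_bytes(text, text + 2) == L"xy");
}
```

<P>Note that the <CODE>const char *</CODE> form stops at the first nul byte, while the string-object form converts embedded nul bytes as well.</P>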

<P>In all cases:</P>

<UL>
<LI>If no conversion error occurs, the member function returns the
converted wide string.</LI>

<LI>Otherwise, if the object was constructed with a
<A HREF="#wide-error string">wide-error string</A>,
the member function appends the wide-error
string to the wide string generated up to the point of the conversion
error, then returns the resultant wide string.</LI>

<LI>Otherwise, the member function throws an object of class
<CODE>std::range_error</CODE>.</LI>
</UL>
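<P>The three cases can be sketched as a single function. In this hypothetical <CODE>from_bytes_sketch</CODE>, a null <CODE>wide_err</CODE> pointer models an object constructed without a wide-error string, and a toy ASCII-only decoder (any byte above 0x7F is a conversion error) stands in for the code conversion facet. It illustrates the policy only; it is not the proposed implementation.</P>

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical sketch of the from_bytes error policy. The "conversion" is a
// toy ASCII-only decoder (any byte above 0x7F counts as a conversion error)
// standing in for a real code conversion facet. A null wide_err means the
// object was constructed without a wide-error string.
std::wstring from_bytes_sketch(const std::string& bytes, const std::wstring* wide_err)
{
    std::wstring out;
    for (std::string::size_type i = 0; i != bytes.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(bytes[i]);
        if (c > 0x7F) {
            if (wide_err != nullptr)
                return out + *wide_err;   // converted prefix + wide-error string
            throw std::range_error("from_bytes_sketch: conversion error");
        }
        out += static_cast<wchar_t>(c);
    }
    return out;                           // no error: the full conversion
}

void from_bytes_sketch_example()
{
    const std::wstring err = L"?";
    assert(from_bytes_sketch("abc", nullptr) == L"abc");
    assert(from_bytes_sketch("ab\xC3z", &err) == L"ab?");
    bool threw = false;
    try {
        from_bytes_sketch("ab\xC3z", nullptr);
    } catch (const std::range_error&) {
        threw = true;
    }
    assert(threw);
}
```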

<H3><CODE><A NAME="wstring_convert::int_type">wstring_convert::int_type</A></CODE></H3>

<PRE>typedef typename wide_string::traits_type::int_type <B>int_type</B>;</PRE>

<P>The type is a synonym for <CODE>wide_string::traits_type::int_type</CODE>.</P>

<H3><CODE><A NAME="wstring_convert::state_type">wstring_convert::state_type</A></CODE></H3>

<PRE>typedef typename Codecvt::state_type <B>state_type</B>;</PRE>

<P>The type is a synonym for <CODE>Codecvt::state_type</CODE>.</P>

<H3><CODE><A NAME="wstring_convert::to_bytes">wstring_convert::to_bytes</A></CODE></H3>

<PRE>byte_string <B>to_bytes</B>(Elem wchar) const;
byte_string <B>to_bytes</B>(const Elem *wptr) const;
byte_string <B>to_bytes</B>(const wide_string&amp; wstr) const;
byte_string <B>to_bytes</B>(const Elem *first, const Elem *last) const;</PRE>

<P>The first member function converts the single-element sequence <CODE>wchar</CODE>
to a byte string.
The second member function converts the nul-terminated sequence beginning
at <CODE>wptr</CODE> to a byte string.
The third member function converts the sequence stored in <CODE>wstr</CODE>
to a byte string.
The fourth member function converts the sequence defined by the range
<CODE>[first, last)</CODE> to a byte string.</P>

<P>In all cases:</P>

<UL>
<LI>If no conversion error occurs, the member function returns the
converted byte string.</LI>

<LI>Otherwise, if the object was constructed with a
<A HREF="#byte-error string">byte-error string</A>,
the member function appends the byte-error
string to the byte string generated up to the point of the conversion
error, then returns the resultant byte string.</LI>

<LI>Otherwise, the member function throws an object of class
<CODE>std::range_error</CODE>.</LI>
</UL>
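<P>As a sketch of the same policy in this direction, the hypothetical function <CODE>to_bytes_sketch</CODE> encodes to UTF-8, the encoding a facet such as <CODE>codecvt_utf8</CODE> is meant to produce. It assumes the wide elements hold UTF-32 code points (<CODE>char32_t</CODE>, a C++11 type, in place of <CODE>Elem</CODE>) and treats surrogates and values above 0x10FFFF as conversion errors; a null <CODE>byte_err</CODE> pointer models an object constructed without a byte-error string.</P>

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical sketch of the to_bytes error policy, using UTF-8 as the byte
// encoding. Assumption: the wide elements are UTF-32 code points (char32_t);
// surrogates and values above 0x10FFFF count as conversion errors. A null
// byte_err means the object was constructed without a byte-error string.
std::string to_bytes_sketch(const std::u32string& wstr, const std::string* byte_err)
{
    std::string out;
    for (char32_t c : wstr) {
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
            if (byte_err != nullptr)
                return out + *byte_err;   // bytes so far + byte-error string
            throw std::range_error("to_bytes_sketch: invalid code point");
        }
        if (c < 0x80) {                   // 1 byte:  0xxxxxxx
            out += static_cast<char>(c);
        } else if (c < 0x800) {           // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {         // 3 bytes: 1110xxxx + 2 continuation
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {                          // 4 bytes: 11110xxx + 3 continuation
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

void to_bytes_sketch_example()
{
    const std::string err = "?";
    assert(to_bytes_sketch(U"h\u00E9", nullptr) == "h\xC3\xA9");
    std::u32string bad = U"ab";
    bad += static_cast<char32_t>(0xD800);  // lone surrogate: not encodable
    bad += U'c';
    assert(to_bytes_sketch(bad, &err) == "ab?");
    bool threw = false;
    try {
        to_bytes_sketch(bad, nullptr);
    } catch (const std::range_error&) {
        threw = true;
    }
    assert(threw);
}
```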

<H3><CODE><A NAME="wstring_convert::wide_string">wstring_convert::wide_string</A></CODE></H3>

<PRE>typedef std::basic_string&lt;Elem&gt; <B>wide_string</B>;</PRE>

<P>The type is a synonym for <CODE>std::basic_string&lt;Elem&gt;</CODE>.</P>

<H3><CODE><A NAME="wstring_convert::wstring_convert">wstring_convert::wstring_convert</A></CODE></H3>

<PRE><B>wstring_convert</B>();
<B>wstring_convert</B>(const byte_string&amp; byte_err);
<B>wstring_convert</B>(const byte_string&amp; byte_err,
    const wide_string&amp; wide_err);</PRE>

<P>The first constructor constructs a conversion object with no stored
<A HREF="#wide-error string">wide-error string</A> or
<A HREF="#byte-error string">byte-error string</A>.
The second constructor stores a copy of <CODE>byte_err</CODE> in the stored
byte-error string.
The third constructor stores a copy of <CODE>byte_err</CODE> in the stored
byte-error string and stores a copy of <CODE>wide_err</CODE> in the stored
wide-error string.</P>
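<P>The error policies of <CODE>from_bytes</CODE> and <CODE>to_bytes</CODE> depend on whether an error string was supplied at construction, not on what it contains. One way to record that, sketched here with a hypothetical class <CODE>error_policy</CODE>, is to keep a flag alongside each stored string, so that an explicitly supplied empty error string still counts. Using flags this way is an assumption of the sketch, not something the proposal specifies.</P>

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of how the three constructors might record which
// error strings were supplied. A separate flag per string (an assumption,
// not part of the proposal) lets an empty error string still count as
// "constructed with" one.
class error_policy {
public:
    error_policy()
        : has_byte_err_(false), has_wide_err_(false) {}
    error_policy(const std::string& byte_err)
        : byte_err_string_(byte_err), has_byte_err_(true), has_wide_err_(false) {}
    error_policy(const std::string& byte_err, const std::wstring& wide_err)
        : byte_err_string_(byte_err), wide_err_string_(wide_err),
          has_byte_err_(true), has_wide_err_(true) {}

    bool has_byte_err() const { return has_byte_err_; }
    bool has_wide_err() const { return has_wide_err_; }
    const std::string& byte_err() const { return byte_err_string_; }
    const std::wstring& wide_err() const { return wide_err_string_; }

private:
    std::string byte_err_string_;
    std::wstring wide_err_string_;
    bool has_byte_err_;
    bool has_wide_err_;
};

void error_policy_example()
{
    error_policy none;
    assert(!none.has_byte_err() && !none.has_wide_err());
    error_policy bytes_only("");           // empty, but still supplied
    assert(bytes_only.has_byte_err() && !bytes_only.has_wide_err());
    error_policy both("?", L"?");
    assert(both.has_byte_err() && both.has_wide_err());
    assert(both.byte_err() == "?" && both.wide_err() == L"?");
}
```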

<HR>

<P>Template class <CODE>wbuffer_convert</CODE> looks like a wide stream
buffer, but performs all its I/O through an underlying byte stream buffer
that you specify when you construct it. Like template class
<CODE>wstring_convert</CODE>, it lets you specify a code conversion
facet to perform the conversions, without affecting any streams or locales.
The previous example can also be written as:</P>

<PRE>    wbuffer_convert&lt;codecvt_utf8&lt;wchar_t&gt; &gt;
        mybuf(cout.rdbuf());  // construct wide stream buffer object
    std::wostream mystr(&amp;mybuf);  // construct wide ostream object
    mystr &lt;&lt; L"Hello\n";  // written to cout as UTF-8</PRE>

<P>Something like template class <CODE>wbuffer_convert</CODE> is needed
to perform code conversions when reading from or writing to streams other than files.</P>
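<P>The essential output mechanism can be sketched as a stream buffer that converts each wide element in <CODE>overflow</CODE>, forwards the resulting byte to the stored byte stream buffer, and passes <CODE>sync</CODE> through to it. The class <CODE>toy_wbuffer</CODE> is hypothetical: it handles no input, does no internal buffering, and uses a toy conversion (wide values 0 through 127 become one byte; anything else is an error) in place of a code conversion facet. Having <CODE>rdbuf(bytebuf)</CODE> return the previous pointer is likewise a choice made for the sketch.</P>

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical, output-only sketch of the idea behind wbuffer_convert: a wide
// stream buffer that converts each element and forwards the bytes to an
// underlying byte stream buffer. The "conversion" is a toy (values 0-127
// become one byte; anything else is an error), not a code conversion facet.
class toy_wbuffer : public std::wstreambuf {
public:
    explicit toy_wbuffer(std::streambuf* bytebuf = nullptr) : bufptr_(bytebuf) {}

    std::streambuf* rdbuf() const { return bufptr_; }
    std::streambuf* rdbuf(std::streambuf* bytebuf)
    {   // returning the old pointer is a choice made for this sketch
        std::streambuf* old = bufptr_;
        bufptr_ = bytebuf;
        return old;
    }

protected:
    // There is no put area, so every element written arrives here.
    int_type overflow(int_type wc) override
    {
        if (traits_type::eq_int_type(wc, traits_type::eof()))
            return traits_type::not_eof(wc);   // nothing to write
        unsigned long code =
            static_cast<unsigned long>(traits_type::to_char_type(wc));
        if (bufptr_ == nullptr || code > 0x7F)
            return traits_type::eof();         // no target, or conversion error
        if (bufptr_->sputc(static_cast<char>(code)) == std::char_traits<char>::eof())
            return traits_type::eof();
        return wc;
    }

    // Flushing the wide stream flushes the underlying byte stream buffer.
    int sync() override
    {
        return bufptr_ != nullptr ? bufptr_->pubsync() : -1;
    }

private:
    std::streambuf* bufptr_;   // the underlying byte stream buffer
};

void toy_wbuffer_example()
{
    std::ostringstream bytes;                  // the underlying byte stream
    toy_wbuffer wbuf(bytes.rdbuf());
    std::wostream wout(&wbuf);
    wout << L"Hello" << std::flush;
    assert(bytes.str() == "Hello");
    assert(wbuf.rdbuf() == bytes.rdbuf());
}
```

<P>A full implementation would also have to carry the conversion state between calls and support input through <CODE>underflow</CODE>.</P>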

<H2><A NAME="wbuffer_convert"><CODE>wbuffer_convert</CODE></A></H2>

<PRE>template&lt;class Codecvt,
    class Elem = wchar_t,
    class Tr = std::char_traits&lt;Elem&gt; &gt;
    class wbuffer_convert
        : public std::basic_streambuf&lt;Elem, Tr&gt;
    {
public:
    <B><A HREF="#wbuffer_convert::wbuffer_convert">wbuffer_convert</A></B>(std::streambuf *bytebuf = 0);
    std::streambuf *<B><A HREF="#wbuffer_convert::rdbuf">rdbuf</A></B>();
    std::streambuf *<B><A HREF="#wbuffer_convert::rdbuf">rdbuf</A></B>(std::streambuf *bytebuf);

<I>    // exposition only
private:
    std::streambuf *<B>bufptr</B>;</I>
    };</PRE>

<P>The template class describes a stream buffer that controls the
transmission of elements of type <CODE>Elem</CODE>, whose character traits
are described by the class <CODE>Tr</CODE>, to and from a byte stream
buffer of type <CODE>std::streambuf</CODE>. Conversion between a sequence
of <CODE>Elem</CODE> values and multibyte sequences is performed by an
object of class <CODE>Codecvt</CODE>,
which meets the requirements of the standard code-conversion facet
<CODE>std::codecvt&lt;Elem, char, std::mbstate_t&gt;</CODE>.</P>

<P>An object of this template class stores a pointer to its underlying
byte stream buffer, called
<B><A NAME="bufptr"><CODE>bufptr</CODE></A></B> here for the sake of
exposition.</P>

<H3><CODE><A NAME="wbuffer_convert::wbuffer_convert">wbuffer_convert::wbuffer_convert</A></CODE></H3>

<PRE><B>wbuffer_convert</B>(std::streambuf *bytebuf = 0);</PRE>

<P>The constructor constructs a stream buffer object and initializes its
stored byte stream buffer pointer to <CODE>bytebuf</CODE>.</P>

<H3><CODE><A NAME="wbuffer_convert::rdbuf">wbuffer_convert::rdbuf</A></CODE></H3>

<PRE>std::streambuf *<B>rdbuf</B>();
std::streambuf *<B>rdbuf</B>(std::streambuf *bytebuf);</PRE>

<P>The first member function returns the stored byte stream buffer pointer.
The second member function stores <CODE>bytebuf</CODE> in the stored byte
stream buffer pointer.</P>

<HR>
<P><I>Copyright &#169; 2002-2004
by Dinkumware, Ltd. All rights reserved.</I></P>
