<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>Critique of Code Conversion Proposal</TITLE>
<META http-equiv=Content-Language content=en-us>
<META content="Microsoft FrontPage 5.0" name=GENERATOR>
<META content=FrontPage.Editor.Document name=ProgId>
<META http-equiv=Content-Type content="text/html; charset=windows-1252"></HEAD>
<BODY>
<p>Doc No:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; N1750=05-0010<br>
Project:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Programming Language C++<br>
Date:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;  2005-01-13<br>
Author:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Beman Dawes<br>
Email:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; bdawes@acm.org</p>
<H1>Critique of Code Conversion Proposal (N1683)</H1>
<P>N1683=04-0123, <I><B>Proposed Library Additions for Code Conversion</B></I>, 
proposes sorely needed code conversion facilities for the standard library. (See 
<A 
href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1683.html">http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1683.html</A>) 
Without these facilities, programmers concerned with internationalization are 
forced to reinvent the wheel; Boost has run into that problem two or three times in existing 
libraries, and will hit it again in libraries currently in the Boost pipeline. 
The LWG should accept the proposal as addressing a useful, high-priority need.</P>
<P>That said, several concerns suggest that the proposal can be refined and 
improved further.</P>
<H2>1. Hard-wired types in <I>wstring_convert</I></H2>
<P>The underlying <I>wstring_convert </I>design seems flexible enough to cope 
with conversion between any two character types which meet 
<I>std::basic_string charT</I> requirements. Conversion is actually performed by 
<I>std::codecvt</I>, which is already parameterized by both <I>internalT</I> and 
<I>externalT charT</I> types. It seems artificial to restrict 
<I>byte_string</I> to <I>std::basic_string&lt;char&gt;</I> and restrict 
<i>wide_string</i> to <i>std::basic_string</i>s which use the default traits 
and allocator. Other character types, including the proposed <I>char16_t</I> and <I>char32_t</I>, will 
need string conversions to and from other wide string types, yet with the current restrictions 
<I>wstring_convert</I> could not be used for that purpose.</P>
<P>Discussions on the Boost list focused on two possible generalizations:</P>
<ol>
  <li>Specify the two string types as template parameters, so that any types may 
  be used which meet the 
<I>std::basic_string charT</I> requirements.</li>
  <li>Specify the conversions as algorithms operating on iterators or iterator 
  ranges, using non-member functions.</li>
</ol>
<P>(2) is not proposed here because I believe it to be an over-generalization 
for functionality which has little use outside of strings.</P>
<P>Suggested change:</P>
<P>&nbsp;&nbsp;&nbsp; <code>template&lt; class Codecvt, class Elem = wchar_t &gt;</code></P>
<P>becomes:</P>
<P>&nbsp;&nbsp;&nbsp; <code>template&lt; class Codecvt, class WideS = 
std::wstring, class NarrowS = std::string &gt;</code></P>
<P>and change types within the class accordingly. See the <a href="#synopsis">
modified synopsis</a> below.</P>
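<P>As a rough illustration of the suggested change, the following minimal sketch 
parameterizes a converter on the two string types and performs a narrow-to-wide 
conversion through <I>Codecvt::in</I>. The names <I>string_converter_sketch</I> 
and <I>deletable_facet</I> are hypothetical and appear in neither N1683 nor this 
critique. The sketch also assumes that each narrow element produces at most one 
wide element, and that conversion failure is reported by throwing.</P>

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of N1683): std::codecvt has a protected
// destructor, so this wrapper lets the sketch own a facet object directly.
template <class Facet>
struct deletable_facet : Facet {
    deletable_facet() : Facet() {}
    ~deletable_facet() {}
};

// Sketch only: the string types are template parameters instead of being
// hard-wired to std::string and std::basic_string<Elem>.
template <class Codecvt,
          class WideS = std::wstring,
          class NarrowS = std::string>
class string_converter_sketch {
public:
    typedef NarrowS narrow_string_type;
    typedef WideS wide_string_type;
    typedef typename NarrowS::value_type narrow_char_type;
    typedef typename WideS::value_type wide_char_type;
    typedef typename Codecvt::state_type state_type;

    wide_string_type from_narrow(const narrow_string_type& str) const {
        wide_string_type result;
        if (str.empty()) return result;

        // Assumption: each narrow element yields at most one wide element,
        // so a buffer of str.size() elements is large enough.
        std::vector<wide_char_type> buf(str.size());
        state_type state = state_type();
        const narrow_char_type* from = str.data();
        const narrow_char_type* from_next = from;
        wide_char_type* to_next = &buf[0];

        std::codecvt_base::result r = cvt_.in(
            state, from, from + str.size(), from_next,
            &buf[0], &buf[0] + buf.size(), to_next);

        // Assumption: any result other than ok is treated as a failure.
        if (r != std::codecvt_base::ok)
            throw std::range_error("from_narrow: conversion failed");

        result.assign(&buf[0], to_next);
        return result;
    }

private:
    deletable_facet<Codecvt> cvt_;
};
```

<P>With the default arguments the sketch behaves like a 
<I>std::string</I> to <I>std::wstring</I> converter; supplying other string types 
requires only a <I>Codecvt</I> whose internal and external character types match them.</P>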
<H2>2. Need target-argument form for <I>wstring_convert</I> conversion 
functions</H2>
<P><i>wstring_convert</i>'s conversion functions have the form:</P>
<P>&nbsp;&nbsp;&nbsp; <I>byte_string to_bytes(const wide_string&amp; wstr) 
const;</I></P>
<P>While this form is often useful and should be retained, it may imply an extra 
copy of the result if a compiler is not smart enough to optimize the copy 
away.</P>
<P>Suggested change is to add additional functions in the form:</P>
<P>&nbsp;&nbsp;&nbsp; <I>void to_bytes(const wide_string&amp; wstr, byte_string 
&amp; target) const;</I></P>
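<P>The two forms can coexist, with the value-returning form written in terms of 
the target-argument form. The sketch below is one possible arrangement, not the 
proposal's implementation: the class name <I>to_bytes_sketch</I> is hypothetical, 
the conversion uses the classic locale's 
<I>std::codecvt&lt;wchar_t, char, std::mbstate_t&gt;</I> facet purely for 
illustration, and failure is assumed to be reported by throwing.</P>

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Sketch only: a hypothetical class offering both forms of to_bytes.
class to_bytes_sketch {
public:
    typedef std::string byte_string;
    typedef std::wstring wide_string;

    // Target-argument form: the caller supplies the string that receives
    // the result, so no result copy is needed.
    void to_bytes(const wide_string& wstr, byte_string& target) const {
        target.clear();
        if (wstr.empty()) return;

        // Assumption: use the classic ("C") locale's codecvt facet.
        const facet_type& cvt =
            std::use_facet<facet_type>(std::locale::classic());

        // max_length() bounds the external characters per wide character.
        std::vector<char> buf(wstr.size() * cvt.max_length());
        std::mbstate_t state = std::mbstate_t();
        const wchar_t* from = wstr.data();
        const wchar_t* from_next = from;
        char* to_next = &buf[0];

        std::codecvt_base::result r = cvt.out(
            state, from, from + wstr.size(), from_next,
            &buf[0], &buf[0] + buf.size(), to_next);
        if (r != std::codecvt_base::ok)
            throw std::range_error("to_bytes: conversion failed");

        target.assign(&buf[0], to_next);
    }

    // Value-returning form, retained and implemented via the form above.
    byte_string to_bytes(const wide_string& wstr) const {
        byte_string result;
        to_bytes(wstr, result);
        return result;
    }

private:
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
};
```

<P>A caller converting many strings in a loop could reuse one target string 
across iterations, which may also let the target keep its allocated capacity.</P>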
<H2>3. Need way to access error strings</H2>
<p>Member functions should be added to access the two error string prefixes. See the <a href="#synopsis">
modified synopsis</a> below.</p>
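<P>A minimal sketch of such accessors follows. It shows only the storage and 
retrieval of the two strings, not how a conversion would use them. The class name 
<I>error_strings_sketch</I> is hypothetical; the accessor and data member names 
follow the modified synopsis.</P>

```cpp
#include <string>

// Sketch only: stores the two error strings passed at construction and
// exposes them through const accessors.
template <class WideS = std::wstring, class NarrowS = std::string>
class error_strings_sketch {
public:
    error_strings_sketch() {}

    explicit error_strings_sketch(const NarrowS& narrow_err)
        : narrow_err_string(narrow_err) {}

    error_strings_sketch(const NarrowS& narrow_err, const WideS& wide_err)
        : narrow_err_string(narrow_err), wide_err_string(wide_err) {}

    // Returning by const reference avoids copying the stored strings.
    const NarrowS& narrow_error() const { return narrow_err_string; }
    const WideS& wide_error() const { return wide_err_string; }

private:
    NarrowS narrow_err_string;
    WideS wide_err_string;
};
```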
<H2>4. More explicit name for <I>wstring_convert</I></H2>
<P>"wstring" might be misleading, depending on the actual types involved. 
"convert" is a verb, yet nouns make better class names.</P>
<P>Suggested change:</P>
<P>&nbsp;&nbsp;&nbsp; <I>wstring_convert</I></P>
<P>to:</P>
<P>&nbsp;&nbsp;&nbsp; <I>string_converter</I></P>
<h2>5. Improved member names</h2>
<p>The proposal uses the name &quot;byte&quot; to identify the narrow case in member 
names. That will be misleading if the actual type is something other than char. 
However, it isn't clear what a better set of member names would be. The modified 
synopsis below uses &quot;wide&quot; and &quot;narrow&quot; in names, even though they may also be 
misleading in the case where the sizes are the same. Perhaps a better set of 
names will surface as the proposal moves forward.</p>
<h2>Modified <a name="synopsis">synopsis</a></h2>
<p>This modified synopsis applies all of the suggestions above to make their 
impact easier to visualize.</p>
<p><code>template&lt;class Codecvt,<br>
&nbsp;&nbsp;&nbsp; class WideS = wstring,<br>
&nbsp;&nbsp;&nbsp; class NarrowS = string&gt;<br>
class string_converter<br>
{<br>
public:<br>
&nbsp;&nbsp;&nbsp; typedef NarrowS narrow_string_type;<br>
&nbsp;&nbsp;&nbsp; typedef typename NarrowS::value_type narrow_char_type;<br>
&nbsp;&nbsp;&nbsp; typedef WideS wide_string_type;<br>
&nbsp;&nbsp;&nbsp; typedef typename WideS::value_type wide_char_type;<br>
&nbsp;&nbsp;&nbsp; typedef typename Codecvt::state_type state_type;<br>
&nbsp;&nbsp;&nbsp; typedef typename wide_string_type::traits_type::int_type 
int_type;<br>
<br>
&nbsp;&nbsp;&nbsp; string_converter();<br>
&nbsp;&nbsp;&nbsp; string_converter(const narrow_string_type&amp; narrow_err);<br>
&nbsp;&nbsp;&nbsp; string_converter(const narrow_string_type&amp; narrow_err,<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; const wide_string_type&amp; wide_err);<br>
<br>
&nbsp;&nbsp;&nbsp; wide_string_type from_narrow(narrow_char_type value) const;<br>
&nbsp;&nbsp;&nbsp; wide_string_type from_narrow(const narrow_char_type *ptr) 
const;<br>
&nbsp;&nbsp;&nbsp; wide_string_type from_narrow(const narrow_string_type&amp; str) 
const;<br>
&nbsp;&nbsp;&nbsp; wide_string_type from_narrow(const narrow_char_type *first,<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; const narrow_char_type *last) const;<br>
<br>
&nbsp;&nbsp;&nbsp; void from_narrow(narrow_char_type value, wide_string_type &amp; 
target) const;<br>
&nbsp;&nbsp;&nbsp; void from_narrow(const narrow_char_type *ptr, 
wide_string_type &amp; target) const;<br>
&nbsp;&nbsp;&nbsp; void from_narrow(const narrow_string_type&amp; str, 
wide_string_type &amp; target) const;<br>
&nbsp;&nbsp;&nbsp; void from_narrow(const narrow_char_type *first,<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; const narrow_char_type *last, 
wide_string_type &amp; target) const;<br>
<br>
&nbsp;&nbsp;&nbsp; narrow_string_type to_narrow(wide_char_type wchar) const;<br>
&nbsp;&nbsp;&nbsp; narrow_string_type to_narrow(const wide_char_type *wptr) 
const;<br>
&nbsp;&nbsp;&nbsp; narrow_string_type to_narrow(const wide_string_type&amp; wstr) 
const;<br>
&nbsp;&nbsp;&nbsp; narrow_string_type to_narrow(const wide_char_type *first,<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; const wide_char_type *last) const;<br>
<br>
&nbsp;&nbsp;&nbsp; void to_narrow(wide_char_type wchar, narrow_string_type &amp; 
target) const;<br>
&nbsp;&nbsp;&nbsp; void to_narrow(const wide_char_type *wptr, narrow_string_type 
&amp; target) const;<br>
&nbsp;&nbsp;&nbsp; void to_narrow(const wide_string_type&amp; wstr, 
narrow_string_type &amp; target) const;<br>
&nbsp;&nbsp;&nbsp; void to_narrow(const wide_char_type *first,<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; const wide_char_type *last, 
narrow_string_type &amp; target) const;<br>
<br>
&nbsp;&nbsp;&nbsp; const narrow_string_type &amp; narrow_error() const;<br>
&nbsp;&nbsp;&nbsp; const wide_string_type &amp; wide_error() const;<br>
<br>
&nbsp;&nbsp;&nbsp; // exposition only<br>
private:<br>
&nbsp;&nbsp;&nbsp; narrow_string_type narrow_err_string;<br>
&nbsp;&nbsp;&nbsp; wide_string_type wide_err_string;<br>
};</code></p>
<h2>Acknowledgements</h2>
<P>This critique is based on discussions with Thorsten Ottosen, Stefan Slapeta, 
Rob Stewart, 
and Jonathan Turkanis.</P>
<HR>

<P>Revised: 13 January 2005</P>
<P>&nbsp;</P></BODY></HTML>