<html>
<head>
<title>N4412: Shortcomings of iostreams</title>

<style type="text/css">
  ins { text-decoration:none; font-weight:bold; background-color:#A0FFA0 }
  .new { text-decoration:none; font-weight:bold; background-color:#D0FFD0 }
  del { text-decoration:line-through; background-color:#FFA0A0 }  
  strong { font-weight: inherit; color: #2020ff }
</style>
</head>

<body>
ISO/IEC JTC 1/SC 22/WG 21 N4412<br/>
Jens Maurer &lt;Jens.Maurer@gmx.net&gt;<br/>
2015-04-09<br/>

<h1>N4412: Shortcomings of iostreams</h1>

This paper collects the edited notes from the 2015-02-24 evening
session on iostreams held during the LWG meeting in Cologne.  I would
like to thank all participants, and Dietmar K&uuml;hl in
particular, for their input.

<h2>Use cases</h2>

There are a number of communications protocol frameworks in use that
employ text-based representations of data, for example XML and JSON.
The text is machine-generated and machine-read and should not
depend on or consider the locales at either end.


<h2>Low-level facilities</h2>

Low-level, locale-independent conversion functions for integer/string
and floating-point/string conversions should be exposed.  Currently,
they are assumed to be present at the core of <code>printf</code> and
<code>std::num_put</code> / <code>std::num_get</code> (22.4.2
category.numeric), but are not available to be called directly from
user code. Examples of such functions are <code>ecvt</code>,
<code>fcvt</code>, and <code>gcvt</code>.
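
<p>

To illustrate the gap, the following sketch contrasts integer output
through an <code>std::ostream</code> whose locale groups digits with a
hand-written conversion that ignores locales entirely. The names
<code>grouping_punct</code>, <code>via_stream</code> and
<code>to_decimal</code> are hypothetical and exist only for this
example; they are not proposed interfaces.

```cpp
#include <algorithm>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: groups digits in threes, separated by ','.
struct grouping_punct : std::numpunct<char> {
    char do_thousands_sep() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
};

// Formats through iostreams; the result depends on the imbued locale.
std::string via_stream(long long value) {
    std::ostringstream os;
    os.imbue(std::locale(std::locale::classic(), new grouping_punct));
    os << value;
    return os.str();
}

// Hypothetical low-level conversion: no locale involved at all.
std::string to_decimal(long long value) {
    // Work on the unsigned magnitude so the most negative value is safe.
    unsigned long long magnitude =
        value < 0 ? 0ULL - static_cast<unsigned long long>(value)
                  : static_cast<unsigned long long>(value);
    std::string digits;
    do {
        digits.push_back(static_cast<char>('0' + magnitude % 10));
        magnitude /= 10;
    } while (magnitude != 0);
    if (value < 0) digits.push_back('-');
    std::reverse(digits.begin(), digits.end());
    return digits;
}
```

With these definitions, <code>via_stream(1234567)</code> yields
<code>1,234,567</code>, whereas <code>to_decimal(1234567)</code> yields
<code>1234567</code>, which is what a JSON or XML producer needs.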

<p>

Further, there are currently no functions to reliably round-trip
floating-point values between binary representation (with fixed size)
and decimal representation (with minimal number of digits used).  Some
environments such as
<a href="http://www.w3.org/TR/xmlschema-2/#float">XML Schema</a>
require this.  See the following papers:

<ul>
<li>"How to Print Floating-Point Numbers Accurately" (Guy L. Steele Jr., Jon L. White),<br/> <a href="https://lists.nongnu.org/archive/html/gcl-devel/2012-10/pdfkieTlklRzN.pdf">https://lists.nongnu.org/archive/html/gcl-devel/2012-10/pdfkieTlklRzN.pdf</a></li>

<li>"Printing Floating-Point Numbers Quickly and Accurately with Integers" (Florian Loitsch),<br/> <a href="http://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf">http://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf</a></li>

<li>"How to Read Floating-Point Numbers Accurately" (William D. Clinger),<br/>  <a href="http://www.cesura17.net/~will/Professional/Research/Papers/howtoread.pdf">http://www.cesura17.net/~will/Professional/Research/Papers/howtoread.pdf</a></li>

</ul>
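
<p>

As a rough illustration of the round-trip requirement, the sketch below
searches by brute force for the smallest precision at which
<code>snprintf</code> output parses back to the same
<code>double</code>. This is not one of the algorithms from the papers
above. It assumes that the C library's <code>snprintf</code> and
<code>strtod</code> are correctly rounded and that the "C" locale is in
effect (both functions are locale-dependent, which is part of the
problem). The name <code>shortest_round_trip</code> is hypothetical.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns a decimal string with the fewest
// significant digits (at most 17) that converts back to `value`.
std::string shortest_round_trip(double value) {
    char buf[64];
    for (int precision = 1; precision <= 17; ++precision) {
        std::snprintf(buf, sizeof buf, "%.*g", precision, value);
        if (std::strtod(buf, nullptr) == value) {
            break;  // this many digits already identify the value
        }
    }
    // For values that never compare equal (NaN), the 17-digit form is kept.
    return buf;
}
```

Up to seventeen formatting and parsing passes per value is clearly too
expensive for a library primitive, which is why dedicated algorithms
such as those cited above matter.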

<p>

Overall, a <code>std::streambuf</code> seems to provide the right
interface for an I/O buffer of unspecified size.  However, that
interface does not seem sufficiently minimal for byte stream
processing (e.g. Base64 encoding or gzip compression).

<p>

Text processing should (optionally) be agnostic to different
conventions regarding line endings.  The existing iostreams offer
"text mode" for that.

<p>

The equivalent of a <code>std::filebuf</code> should not attempt to
perform code conversions.  Instead, there should be a filtering
streambuf that performs code conversions based on the codecvt
facilities.
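
<p>

C++11 already contains one such filter,
<code>std::wbuffer_convert</code>, which can serve as a sketch of the
intended layering: a plain byte buffer at the bottom and a separate
converting stream buffer on top. The helper name
<code>wide_to_utf8</code> is hypothetical. Note that
<code>std::wbuffer_convert</code> and <code>std::codecvt_utf8</code>
were deprecated in later revisions of the standard, so this shows the
structure rather than a recommended interface.

```cpp
#include <codecvt>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: converts wide characters to UTF-8 bytes by
// stacking a converting stream buffer on a plain byte buffer.
std::string wide_to_utf8(const std::wstring& text) {
    std::stringbuf bytes;  // plain byte sink, performs no conversion
    std::wbuffer_convert<std::codecvt_utf8<wchar_t>> filter(&bytes);
    std::wostream out(&filter);
    out << text;
    out.flush();  // push converted bytes into the byte sink
    return bytes.str();
}
```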

<p>

Details for operating system errors (e.g. POSIX errno) leading to I/O
failure should be exposed, not concealed.
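
<p>

One possible shape for exposing such details, sketched with the C++11
<code>std::error_code</code> facilities: the hypothetical function
<code>probe_open</code> below reports why a file could not be opened.
It assumes that <code>std::fopen</code> sets <code>errno</code> on
failure, which POSIX specifies but ISO C does not guarantee.

```cpp
#include <cerrno>
#include <cstdio>
#include <system_error>

// Hypothetical helper: tries to open `path` for reading and returns
// the operating system's reason on failure, or a default-constructed
// (zero) error_code on success.
std::error_code probe_open(const char* path) {
    errno = 0;
    if (std::FILE* file = std::fopen(path, "rb")) {
        std::fclose(file);
        return std::error_code();
    }
    const int saved = errno;
    if (saved == 0) {
        // The library gave no reason; fall back to a generic error.
        return std::make_error_code(std::errc::io_error);
    }
    return std::error_code(saved, std::generic_category());
}
```

By contrast, a failed <code>std::ifstream</code> open only sets
<code>failbit</code>, leaving the caller to guess at the cause.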



<h2>iostreams interface</h2>

Formatting parameters (such as uppercase/lowercase and radix) are
specified by setting flags, which mostly persist for an arbitrary
number of subsequent low-level formatting operations, until explicitly
changed.  This approach inhibits compile-time checks and compile-time
choice of formatting, and potentially establishes state shared between
threads (which requires synchronization for access).
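
<p>

A short sketch of the persistence described above, using a
hypothetical function <code>persistent_flags_demo</code>: the radix
set by <code>std::hex</code> stays in effect for later insertions,
while the field width set by <code>std::setw</code> applies to the
next item only.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical demonstration of sticky and non-sticky formatting state.
std::string persistent_flags_demo() {
    std::ostringstream os;
    os << std::hex << 255;                // radix becomes hexadecimal
    os << ' ' << 16;                      // still hexadecimal
    os << ' ' << std::setw(4) << 7 << 7;  // width affects the first 7 only
    return os.str();
}
```

Nothing in the types involved tells the compiler, or the reader of the
second statement, that <code>16</code> will be printed in hexadecimal.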

<p>

Iostreams are templatized on the character type and the
<code>char_traits</code>. The latter degree of freedom is rarely
exercised, possibly allowing repurposing for an incremental extension
of the current iostreams design.

<p>

Chaining in the form of "s &lt;&lt; a &lt;&lt; b &lt;&lt; c &lt;&lt;
d" is a successful interface technique, because it allows outputting an
arbitrary number of items in a type-safe manner. Overload resolution
on operator&lt;&lt; tends to get expensive for larger projects with
hundreds or thousands of candidates in the overload set. This seems
hard to resolve, since choosing a different name for operator&lt;&lt;
simply shifts the expense of overload resolution to a
differently-named function.

<p>

With C++11, a typesafe "printf"-style interface using variadic
templates is possible, also allowing reordering of arguments for
output, depending on the format string.
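
<p>

A minimal sketch of such an interface, assuming a hypothetical
placeholder syntax <code>{N}</code> with a single-digit argument index
<code>N</code>. The names <code>collect</code> and
<code>format_positional</code> are invented for this example, and the
arguments are rendered through <code>operator&lt;&lt;</code> purely for
brevity.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// End of recursion: no arguments left to render.
inline void collect(std::vector<std::string>&) {}

// Renders each argument to text, in order, in a type-safe way.
template <typename T, typename... Rest>
void collect(std::vector<std::string>& out, const T& first,
             const Rest&... rest) {
    std::ostringstream os;
    os << first;
    out.push_back(os.str());
    collect(out, rest...);
}

// Hypothetical formatter: replaces "{N}" (N a single digit) with the
// N-th argument; anything else is copied through unchanged.
template <typename... Args>
std::string format_positional(const std::string& fmt, const Args&... args) {
    std::vector<std::string> rendered;
    collect(rendered, args...);
    std::string result;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] == '{' && i + 2 < fmt.size() && fmt[i + 2] == '}' &&
            fmt[i + 1] >= '0' && fmt[i + 1] <= '9') {
            const std::size_t index =
                static_cast<std::size_t>(fmt[i + 1] - '0');
            if (index < rendered.size()) {
                result += rendered[index];
                i += 2;  // skip the rest of the placeholder
                continue;
            }
        }
        result += fmt[i];
    }
    return result;
}
```

For example, <code>format_positional("{1}, {0}!", "world", "Hello")</code>
yields <code>Hello, world!</code>; a translated format string can
reorder the arguments without any change to the call site.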

<p>

The API of Matt Wilson's FastFormat library at
<a href="http://sourceforge.net/projects/fastformat/">http://sourceforge.net/projects/fastformat/</a>
should be considered, as should the various specializations of Boost's
<code>lexical_cast</code>.

<p>

The ability to extend the system to provide input/output primitives
for user-defined types is a mandatory feature of a C++ I/O library.

<p>

The iostreams framework currently makes no use of the money (22.4.6
category.monetary) or time (22.4.5 category.time) formatting facets.

<p>

It is possible to construct a second i/ostream object using the same
<code>std::streambuf</code> as an existing i/ostream object.  This
way, user-defined operator&lt;&lt; functions can quickly obtain an
i/ostream object with a well-defined state of the flags.


<h2>Internationalization and locales</h2>

Locales (see <code>std::locale</code>) are integrated at both the
<code>streambuf</code> and the <code>iostream</code> levels.  Locale
acquisition requires two synchronizations per elementary output call,
because locales are a global resource.

<p>

Optional internationalization is a mandatory feature of a C++ I/O
library, based on the low-level facilities instead of integrated
therein.

<p>

C++ locales are an incomplete solution for common internationalization
requirements; see
<a href="http://www.boost.org/doc/libs/1_57_0/libs/locale/doc/html/rationale.html#rationale_why">http://www.boost.org/doc/libs/1_57_0/libs/locale/doc/html/rationale.html#rationale_why</a>.
ICU (International Components for Unicode) offers comprehensive
internationalization support, albeit with a sub-standard C++
interface. Offering a usable C++ interface based on the ICU feature
set (and implementation) would benefit from the decades of experience
that went into ICU.

<p>

For a program, there should be one canonical internal format for
internationalized text processing (e.g. UTF-8 or UTF-32), with
conversions at the input and output boundaries of the program.  An
internal representation based on <code>char32_t</code> (i.e. UTF-32)
has the benefits of a fixed-length encoding, but uses more memory than
UTF-8 encoding for most texts.
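
<p>

The trade-off can be made concrete with a small sketch of a conversion
at the input boundary. The function <code>utf8_to_utf32</code> is
hypothetical and assumes well-formed UTF-8; it performs no validation
of continuation bytes, overlong forms or surrogates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical boundary conversion: decodes well-formed UTF-8 into
// UTF-32 code points. Truncated trailing sequences are dropped.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        // Number of continuation bytes implied by the lead byte.
        const std::size_t extra =
            lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        if (i + extra >= in.size()) {
            break;  // sequence runs past the end of the input
        }
        // Payload bits of the lead byte: 7, 5, 4 or 3 of them.
        char32_t cp = extra == 0 ? lead : (lead & (0x3F >> extra));
        for (std::size_t k = 1; k <= extra; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

For the five-character text "h&eacute;llo", the UTF-8 form occupies 6
bytes, while the UTF-32 form occupies 5 code units of 4 bytes each,
i.e. 20 bytes, but allows indexing by code point.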
