<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC
    "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
    "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xml:lang='en' xmlns:svg='http://www.w3.org/2000/svg' xmlns='http://www.w3.org/1999/xhtml'>
<head><meta content='application/xhtml+xml;charset=utf-8' http-equiv='Content-type' /><title>A printf-like Interface for the Streams Library</title></head>
<body><!-- maruku -o printf.html printf.md -->
<table><tr><th>Doc. no.:</th>	<td>N3506</td></tr>
<tr><th>Date:</th> 			<td>2012-12-26</td></tr>
<tr><th>Project:</th>		<td>Programming Language C++, Library Working Group</td></tr>
<tr><th>Reply-to:</th>		<td>Zhihao Yuan &lt;lichray at gmail dot com&gt;</td></tr></table>

<h1 id='a_printflike_interface_for_the_streams_library'>A printf-like Interface for the Streams Library</h1>
<div class='maruku_toc'><ul style='list-style: none;'><li><a href='#overview'>Overview</a></li><li><a href='#impact_on_the_standard'>Impact on the Standard</a></li><li><a href='#design_decisions'>Design Decisions</a><ul style='list-style: none;'><li><a href='#syntax'>Syntax</a></li><li><a href='#extensibility'>Extensibility</a></li></ul></li><li><a href='#technical_specifications'>Technical Specifications</a><ul style='list-style: none;'><li><a href='#header_'>Header <code>&lt;ioformat&gt;</code></a></li><li><a href='#error_handling'>Error handling</a></li><li><a href='#formatting'>Formatting</a></li><li><a href='#wording'>Wording</a></li></ul></li><li><a href='#sample_implementation'>Sample Implementation</a><ul style='list-style: none;'><li><a href='#performance_notes'>Performance notes</a></li></ul></li><li><a href='#future_issues'>Future Issues</a></li><li><a href='#acknowledgments'>Acknowledgments</a></li><li><a href='#references'>References</a></li></ul></div>
<h2 id='overview'>Overview</h2>

<pre><code>  cout &lt;&lt; putf(&quot;hello, %s\n&quot;, &quot;world&quot;);</code></pre>

<p>Printf defines the most widely used syntax for formatting text output. It exists in C, Perl, Python and even Java&#8482;, and is available in libraries from Qt to Boost.Format<code>[1]</code>, but not in the C++ standard library. This proposal defines such an interface for the C++ I/O streams library, based on the <code>printf</code> function defined by C<code>[2]</code>, with the error handling policy and type safety of the streams library taken into account.</p>

<h2 id='impact_on_the_standard'>Impact on the Standard</h2>

<p>The proposed new header <code>&lt;ioformat&gt;</code> makes no changes to the existing interface of the streams library, other than an <code>operator&lt;&lt;(basic_ostream)</code> overload to print the unspecified return value of a new <code>std::putf</code> function. However, the proposed formatting features do not map one-to-one onto those provided by the existing streams library. In short, the I/O manipulators can be fully replaced by the member functions of <code>ios_base</code>, while <code>std::putf</code> cannot.</p>

<p>The additional formatting features supported by <code>std::putf</code> are:</p>

<ul>
<li>Empty sign <code>&quot;% d&quot;</code>.</li>

<li>Hexfloat with precision <code>&quot;%.4a&quot;</code>.</li>

<li>Integer with precision (minimal digits) <code>&quot;%#5.2x&quot;</code>.</li>

<li>String with precision (truncation, only for C-style strings) <code>&quot;%.4s&quot;</code>.</li>
</ul>
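
<p>As a rough illustration of these four features, the sketch below formats values with the <code>snprintf</code> function of the C library, whose output <code>std::putf</code> is meant to match for such arguments. The helper <code>c_format</code> and the four wrapper functions are hypothetical names introduced here; they are not part of the proposal.</p>

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of the proposal): formats with the C
// library's snprintf and returns the result as a std::string.
template <typename... Args>
std::string c_format(const char* fmt, Args... args) {
    // First pass: measure the required length (terminator excluded).
    int n = std::snprintf(nullptr, 0, fmt, args...);
    if (n < 0) return std::string();
    std::string out(static_cast<std::size_t>(n) + 1, '\0');
    // Second pass: write the characters plus the terminator.
    std::snprintf(&out[0], out.size(), fmt, args...);
    out.resize(static_cast<std::size_t>(n));
    return out;
}

// Each function exercises one feature from the list above.
std::string empty_sign(int v)            { return c_format("% d", v); }
std::string hexfloat_precision(double v) { return c_format("%.4a", v); }
std::string int_precision(unsigned v)    { return c_format("%#5.2x", v); }
std::string str_truncate(const char* s)  { return c_format("%.4s", s); }
```

<p>For instance, <code>empty_sign(42)</code> reserves a blank where a sign would go, and <code>str_truncate(&quot;streams&quot;)</code> keeps only the first four characters. The exact digits produced by <code>%.4a</code> may vary between C libraries.</p>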

<h2 id='design_decisions'>Design Decisions</h2>

<p>The idea is to define a portable and readable syntax to enable the extensible formatting of the streams library, while allowing an implementation to perform any formatting without any extra buffering compared to the <code>&lt;&lt;</code> operator.</p>

<h3 id='syntax'>Syntax</h3>

<p>The syntax from printf in C is preserved as much as possible. Such a syntax is:</p>

<ul>
<li>Compatible with C; works as a drop-in replacement of <code>printf</code> (except <code>%n</code>).</li>

<li>Compatible with the legacy syntax supported by Boost.Format.</li>
</ul>

<p>For example, both of the following</p>

<pre><code>  cout &lt;&lt; format(&quot;The answer:%5d\n&quot;) % 42;  // boost.format
  cout &lt;&lt; putf(&quot;The answer:%5d\n&quot;, 42);     // std::experimental::putf</code></pre>

<p>print</p>

<pre><code>  The answer:   42</code></pre>

<p>The <em>width</em> <code>5</code> can be parameterized:</p>

<pre><code>  cout &lt;&lt; putf(&quot;The answer:%*d\n&quot;, 5, 42);  // same effect</code></pre>

<p>This mechanism is supported by both C and POSIX, but not Boost.Format.</p>
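
<p>The parameterized width can be compared with its closest streams counterpart, <code>std::setw</code>. The sketch below uses the C library for the <code>*</code> form, since <code>std::putf</code> is only proposed here; both function names are hypothetical.</p>

```cpp
#include <cstdio>
#include <iomanip>
#include <sstream>
#include <string>

// printf style: the field width is taken from the argument list via '*'.
// The fixed buffer is enough for the small widths used in this sketch.
std::string with_star_width(int width, int value) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "The answer:%*d", width, value);
    return buf;
}

// Streams style: the same width is supplied through std::setw.
std::string with_setw(int width, int value) {
    std::ostringstream os;
    os << "The answer:" << std::setw(width) << value;
    return os.str();
}
```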

<p>POSIX<code>[4]</code> style positional arguments are added because they are necessary for i18n.</p>

<p>So the example above can be rewritten as:</p>

<pre><code>  cout &lt;&lt; putf(&quot;The answer:%2$*1$d\n&quot;, 5, 42);  // same effect</code></pre>

<p>The <code>%n</code> specification is dropped because of the security problem (and its weird semantics); no known printf fork (in Java&#8482;, Python, Boost.Format, etc.) supports it.</p>

<p>C++ streams style error handling policy and type safety requirements are satisfied with the highest priority. However, that makes the <em>length modifiers</em> (<code>hh</code>, <code>h</code>, <code>l</code>, <code>ll</code>, <code>j</code>, <code>z</code>, <code>t</code>, <code>L</code>) unnecessary. The proposed solution is to ignore them, as Boost.Format and Python<code>[3]</code> do; the only difference is that this proposal ignores every length modifier defined by the C standard, not just a subset.</p>

<h3 id='extensibility'>Extensibility</h3>

<p>A subset of the printf format specification can be translated into a combination of the formatting properties (<code>flags()</code>, <code>width()</code>, <code>precision()</code> and <code>fill()</code>) of an output stream. To balance the standard compliance and the extensibility, this proposal distinguishes the arguments to be printed into:</p>

<ul>
<li><em>internally formattable</em>, which have the same formatting as if they were formatted by <code>snprintf</code> or a wide character equivalent given the same format specifications with a fitted length modifier, and</li>

<li><em>potentially formattable</em>, which are output by the <code>&lt;&lt;</code> operator with the translated formatting properties set up on the output stream.</li>
</ul>

<p>If an argument is internally formattable by a format specification, then C&#8217;s formatting is fully supported. For example, the following</p>

<pre><code>  cout &lt;&lt; putf(&quot;The answer:% -.4d\n&quot;, 42);  // empty sign, left alignment, 4 minimal digits</code></pre>

<p>has the same printing result as</p>

<pre><code>  printf(&quot;The answer:% -.4d\n&quot;, 42);</code></pre>

<p>which gives</p>

<pre><code>  The answer: 0042</code></pre>

<p>while Boost.Format gives</p>

<pre><code>  The answer: 42</code></pre>

<p>because it lacks integer precision support.</p>

<p>But if an argument is potentially formattable by a specification, the following</p>

<pre><code>  cout &lt;&lt; putf(&quot;The answer:% -.4f\n&quot;, 42);  // expects a floating point</code></pre>

<p>has the same printing result as</p>

<pre><code>  cout &lt;&lt; &quot;The answer:&quot; &lt;&lt; left &lt;&lt; setprecision(4) &lt;&lt; 42 &lt;&lt; &quot;\n&quot;;</code></pre>

<p>which gives</p>

<pre><code>  The answer:42</code></pre>

<p>since there is no &#8220;empty sign&#8221; support in the streams library.</p>
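
<p>The two cases can be placed side by side in a small sketch. It assumes that the internally formattable path matches <code>snprintf</code> and that the potentially formattable path reduces to the stream expression shown above; the function names are hypothetical.</p>

```cpp
#include <cstdio>
#include <iomanip>
#include <sstream>
#include <string>

// Internally formattable: "% -.4d" applied to an int, as C formats it.
std::string internal_path(int value) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "The answer:% -.4d", value);
    return buf;
}

// Potentially formattable: "% -.4f" applied to an int falls back to the
// stream, where the space flag has no counterpart and the precision does
// not affect integer output.
std::string potential_path(int value) {
    std::ostringstream os;
    os << "The answer:" << std::left << std::setprecision(4) << value;
    return os.str();
}
```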

<p>A detailed description is available in <a href='#formatting'>Formatting</a>.</p>

<h2 id='technical_specifications'>Technical Specifications</h2>

<p><em>The description below is based on POSIX<code>[4]</code>.</em></p>

<p><code>std::putf</code> takes a format string, followed by zero or more arguments. A format string is composed of zero or more directives: <em>ordinary characters</em>, which are copied unchanged to the output stream, and <em>format specifications</em>, each of which expects zero or more arguments.</p>

<p>An empty format specification <code>%%</code> matches no argument; a <code>&#39;%&#39;</code> character is printed without formatting.</p>

<p>A numbered format specification introduced by <code>&quot;%</code><em><code>n</code></em><code>$&quot;</code> matches the <em>n</em>th argument in the argument list, where <em>n</em> is a decimal integer.</p>

<p>An unnumbered format specification introduced by <code>&#39;%&#39;</code> matches the first unmatched argument in the argument list.</p>

<p>Matching an out-of-range argument in a format string results in an error described in <a href='#error_handling'>Error handling</a>, while unmatched arguments are ignored. A format string that uses numbered format specifications can match the same argument multiple times.</p>

<p>The character sequence <code>&quot;%</code><em><code>n</code></em><code>$&quot;</code> or the <code>&#39;%&#39;</code> character, introducing a format specification, has the following appear in sequence:</p>

<ul>
<li>Zero or more <em>flags</em> (in any order).</li>

<li>An optional minimum <em>field width</em>, which takes either a parameterized length ( <code>&#39;*&#39;</code> or <code>&quot;*</code><em><code>n</code></em><code>$&quot;</code>), described below, or a decimal integer.</li>

<li>An optional <em>precision</em>, which takes the form of a period ( <code>&#39;.&#39;</code> ) followed either by a parameterized length ( <code>&#39;*&#39;</code> or <code>&quot;*</code><em><code>n</code></em><code>$&quot;</code> ), described below, or an optional decimal digit string, where a null digit string is treated as zero.</li>

<li>An optional length modifier (ignored).</li>

<li>A <em>type hint</em> character that indicates the type of the matched argument.</li>
</ul>

<p>A field width, or precision, or both, may be indicated by a numbered parameterized length ( <code>&quot;*</code><em><code>n</code></em><code>$&quot;</code> ), which is allowed within a numbered format specification, or an unnumbered parameterized length ( <code>&#39;*&#39;</code> ), which is allowed within an unnumbered format specification. In such cases an argument of type <code>streamsize</code> supplies the field width or precision. A numbered parameterized length matches the <em>n</em>th argument in the argument list, where <em>n</em> is a decimal integer. The unnumbered parameterized lengths, in their order of appearance, match the unmatched arguments in the argument list, before the format specification they belong to. A negative field width is taken as a <code>&#39;-&#39;</code> flag followed by a positive field width. A negative precision is taken as if the precision were omitted.</p>
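
<p>The rules for negative parameterized lengths follow C, so they can be observed with <code>snprintf</code>. In the sketch below the lengths are passed as <code>int</code>, as C requires, whereas the proposal uses <code>streamsize</code>; the function names are hypothetical.</p>

```cpp
#include <cstdio>
#include <string>

// Field width supplied by '*': a negative value acts as the '-' flag
// followed by the corresponding positive width.
std::string star_width(int width, int value) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "[%*d]", width, value);
    return buf;
}

// Precision supplied by ".*": a negative value is treated as if the
// precision were omitted, so "%f" falls back to its default of 6 digits.
std::string star_precision(int precision, double value) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "[%.*f]", precision, value);
    return buf;
}
```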

<p>A format string can contain either numbered format specifications, or unnumbered format specifications, but not both. Mixing numbered and unnumbered specifications or parameterized lengths results in an error described in <a href='#error_handling'>Error handling</a>. The empty format specification <code>%%</code> can be mixed with any specifications.</p>

<h3 id='header_'>Header <code>&lt;ioformat&gt;</code></h3>

<pre><code>  namespace std {
  namespace experimental {

    // types _Ts1_ and _Ts2_ are sets of implementation types which are distinguishable for different T...

    template &lt;typename CharT, typename... T&gt;
    _Ts1_ putf(CharT const *fmt, T const&amp;... t);

    template &lt;typename CharT, typename Traits, typename Allocator, typename... T&gt;
    _Ts2_ putf(basic_string&lt;CharT, Traits, Allocator&gt; const&amp; fmt, T const&amp;... t);

    template &lt;typename CharT, typename Traits, typename... T&gt;
    auto operator&lt;&lt;(basic_ostream&lt;CharT, Traits&gt;&amp; os, _Ts1_or_Ts2_ bundle)
        -&gt; decltype(os);

  }}</code></pre>

<p>The output functions of the return values of <code>std::putf</code> do formatted output, but behave like the <em>unformatted output functions</em>. Specifically, <code>flags()</code>, <code>width()</code>, <code>precision()</code> and <code>fill()</code> of the output stream are preserved when the flow of control leaves these functions, but may be changed during the execution. Changing the values returned by these members before the execution has no effect on the printing, except:</p>

<ul>
<li><code>flags() &amp; ios_base::unitbuf</code> may change the buffering behavior.</li>

<li><code>fill()</code> works as the default padding character.</li>
</ul>
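
<p>One way to preserve the stream state is a small scope guard. The following is a minimal sketch for narrow streams; <code>format_guard</code> and <code>print_hex8</code> are hypothetical names and are not taken from the sample implementation.</p>

```cpp
#include <iomanip>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical RAII guard: saves flags(), width(), precision() and fill()
// on construction and restores them on destruction, even if an exception
// propagates through the formatting code.
class format_guard {
public:
    explicit format_guard(std::ios& s)
        : s_(s), flags_(s.flags()), width_(s.width()),
          precision_(s.precision()), fill_(s.fill()) {}
    ~format_guard() {
        s_.flags(flags_);
        s_.width(width_);
        s_.precision(precision_);
        s_.fill(fill_);
    }
    format_guard(const format_guard&) = delete;
    format_guard& operator=(const format_guard&) = delete;

private:
    std::ios& s_;
    std::ios_base::fmtflags flags_;
    std::streamsize width_;
    std::streamsize precision_;
    char fill_;
};

// Prints value as eight zero-padded hex digits, then leaves the stream
// formatting state exactly as it was found.
void print_hex8(std::ostream& os, unsigned value) {
    format_guard guard(os);
    os << std::hex << std::setfill('0') << std::setw(8) << value;
}
```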

<h3 id='error_handling'>Error handling</h3>

<p>An output function of a return value of <code>std::putf</code> may encounter the following kinds of errors found in the return value:</p>

<ul>
<li>A format specification is syntactically invalid.</li>

<li>A format specification expects an argument that does not appear in the argument list.</li>

<li>Mixing numbered and unnumbered format specifications or parameterized lengths.</li>

<li>The argument matched by a parameterized length is not convertible to <code>streamsize</code>.</li>
</ul>

<p>The output function sets <code>ios_base::failbit</code> on the output stream when one of the errors is encountered, and may then return. The well matched format specifications, as well as the ordinary characters, if any, before the format specification that fails, must be formatted and written to the output stream before the function returns.</p>
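
<p>This policy can be sketched with a heavily simplified formatter. The function below is hypothetical: it understands only <code>%%</code> and numbered string specifications of the form <code>%n$s</code>, and it takes its arguments as a vector of strings instead of a parameter pack.</p>

```cpp
#include <cstddef>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: writes fmt to os, replacing "%n$s" with args[n - 1].
// On a malformed specification or an out-of-range n, everything before the
// failing specification has already been written and failbit is set.
void put_numbered(std::ostream& os, const std::string& fmt,
                  const std::vector<std::string>& args) {
    std::size_t i = 0;
    while (i < fmt.size()) {
        if (fmt[i] != '%') { os.put(fmt[i++]); continue; }
        ++i;  // skip the introducing '%'
        if (i < fmt.size() && fmt[i] == '%') { os.put('%'); ++i; continue; }
        // Parse the decimal argument number n.
        std::size_t n = 0;
        bool has_digits = false;
        while (i < fmt.size() && fmt[i] >= '0' && fmt[i] <= '9') {
            n = n * 10 + static_cast<std::size_t>(fmt[i] - '0');
            has_digits = true;
            ++i;
        }
        // The number must be followed by "$s" in this sketch.
        bool well_formed = has_digits && i + 1 < fmt.size() &&
                           fmt[i] == '$' && fmt[i + 1] == 's';
        if (!well_formed || n == 0 || n > args.size()) {
            os.setstate(std::ios_base::failbit);
            return;
        }
        os << args[n - 1];
        i += 2;  // skip "$s"
    }
}
```

<p>Because the arguments are addressed by number, a translated format string can reorder or repeat them without any change to the calling code.</p>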

<h3 id='formatting'>Formatting</h3>

<p>For a <code>basic_ostream&lt;CharT, Traits&gt;</code> and a given format description, the matched argument is <em>internally formattable</em> if:</p>

<ul>
<li>the type hint is <code>d</code>, <code>i</code>, <code>o</code>, <code>u</code>, <code>x</code>, or <code>X</code>, and the argument is an integer, or</li>

<li>the type hint is <code>a</code>, <code>A</code>, <code>e</code>, <code>E</code>, <code>f</code>, <code>F</code>, <code>g</code>, or <code>G</code>, and the argument is a floating-point number, or</li>

<li>the type hint is <code>p</code>, and the argument is a pointer, or</li>

<li>the type hint is <code>c</code>, and the argument is <code>char</code>, <code>CharT</code>, or <code>Traits::int_type</code>, or <code>signed char</code>/<code>unsigned char</code> if <code>CharT</code> is <code>char</code>, or</li>

<li>the type hint is <code>s</code>, and the argument is <code>const char*</code>, <code>const CharT*</code>, or <code>const signed char*</code>/<code>const unsigned char*</code> if <code>CharT</code> is <code>char</code>.</li>
</ul>
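
<p>The classification above could be approximated with type traits. The sketch below is a guess at one possible shape, restricted to <code>CharT</code> being <code>char</code> with <code>std::char_traits&lt;char&gt;</code> (so <code>Traits::int_type</code> is <code>int</code>). It assumes that <code>std::is_integral</code> is an acceptable reading of &#8220;an integer&#8221;, which also admits <code>bool</code> and the character types. All names are hypothetical.</p>

```cpp
#include <type_traits>

// True if U is a (possibly cv-qualified) pointer to C.
template <typename U, typename C>
struct is_pointer_to : std::integral_constant<bool,
    std::is_pointer<U>::value &&
    std::is_same<typename std::remove_cv<
        typename std::remove_pointer<U>::type>::type, C>::value> {};

// Hypothetical classifier for the narrow-character case: returns true if
// an argument of type T is internally formattable under the type hint.
template <typename T>
bool internally_formattable(char hint) {
    typedef typename std::decay<T>::type U;
    switch (hint) {
    case 'd': case 'i': case 'o': case 'u': case 'x': case 'X':
        return std::is_integral<U>::value;
    case 'a': case 'A': case 'e': case 'E':
    case 'f': case 'F': case 'g': case 'G':
        return std::is_floating_point<U>::value;
    case 'p':
        return std::is_pointer<U>::value;
    case 'c':
        return std::is_same<U, char>::value ||
               std::is_same<U, signed char>::value ||
               std::is_same<U, unsigned char>::value ||
               std::is_same<U, int>::value;  // Traits::int_type for char
    case 's':
        return is_pointer_to<U, char>::value ||
               is_pointer_to<U, signed char>::value ||
               is_pointer_to<U, unsigned char>::value;
    default:
        return false;
    }
}
```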

<p><em>[Note: An internally formattable argument has an <code>operator&lt;&lt;</code> overload, member or non-member, in the <code>&lt;ostream&gt;</code> header, and can be printed by <code>printf</code> without a type-unsafe conversion. This note also applies to <code>Traits::int_type</code>, considering its underlying type. &#8211;end note]</em></p>

<p>Otherwise, the argument is <em>potentially formattable</em>.</p>

<p>If an <em>internally formattable</em> argument is an unsigned integer and the type hint is <code>d</code> or <code>i</code>, the argument is printed as if it were formatted by <code>snprintf</code> or a wide character equivalent, which conceptually uses a default padding character of <code>os.fill()</code>, given the same <em>flags</em>, <em>field-width</em>, and <em>precision</em>, if any, respectively, followed by a fitted length modifier, if needed, and a <em>type hint</em> of <code>u</code>. Otherwise, the argument is printed as if it were formatted by <code>snprintf</code> or a wide character equivalent, which conceptually uses a default padding character of <code>os.fill()</code>, given the same <em>flags</em>, <em>field-width</em>, and <em>precision</em>, if any, respectively, followed by a fitted length modifier, if needed, and the same <em>type hint</em>. <em>[Note: <code>u</code>, <code>o</code>, <code>x</code>, <code>X</code> convert a signed argument to unsigned, while <code>d</code> and <code>i</code> do not convert an unsigned argument to signed. &#8211;end note]</em></p>
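
<p>The rule for unsigned arguments can be sketched as a choice of conversion made from the static type of the argument. The function <code>format_decimal</code> below is hypothetical and handles only a plain <code>d</code> hint without flags, width or precision; it uses <code>ll</code> as the fitted length modifier for every integer type.</p>

```cpp
#include <cstdio>
#include <string>
#include <type_traits>

// Hypothetical sketch of the "d on an unsigned argument behaves like u"
// rule: the conversion is picked from the argument type, so an unsigned
// value is never reinterpreted as signed.
template <typename T>
std::string format_decimal(T value) {
    static_assert(std::is_integral<T>::value, "integer argument expected");
    char buf[32];
    if (std::is_unsigned<T>::value) {
        // Fitted length modifier "ll" with a type hint of 'u'.
        std::snprintf(buf, sizeof buf, "%llu",
                      static_cast<unsigned long long>(value));
    } else {
        // Fitted length modifier "ll" with the original type hint 'd'.
        std::snprintf(buf, sizeof buf, "%lld",
                      static_cast<long long>(value));
    }
    return buf;
}
```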

<p>If the argument is <em>potentially formattable</em>, <code>width()</code> and <code>precision()</code> of the output stream are defaulted to <code>0</code> and <code>-1</code>, respectively. The <code>flags()</code> member is defaulted to <code>os.flags() &amp; ios_base::unitbuf</code>, and the <code>fill()</code> member is defaulted to the saved fill character of the output stream before entering the current output function.</p>

<p>For a given format description, if the argument is <em>potentially formattable</em>, the <em>flag</em> characters and their effects on the output stream are:</p>

<ul>
<li><strong><code>-</code></strong> sets <code>ios_base::left</code>.</li>

<li><strong><code>+</code></strong> sets <code>ios_base::showpos</code>.</li>

<li><em>space</em> has no effect.</li>

<li><strong><code>#</code></strong> sets <code>ios_base::showbase</code> and <code>ios_base::showpoint</code>.</li>

<li><strong><code>0</code></strong> sets <code>fill()</code> to <code>&#39;0&#39;</code> and sets <code>ios_base::internal</code>, only if the <code>&#39;-&#39;</code> flag does not appear in the flags, and a precision is not specified if the type hint is <code>d</code>, <code>i</code>, <code>o</code>, <code>u</code>, <code>x</code>, or <code>X</code>.</li>
</ul>

<p>Under the same preconditions, the <em>field-width</em> field, if any, sets the <code>width()</code> member of the output stream; and the <em>precision</em> field, if any, sets the <code>precision()</code> member of the output stream. <em>[Note: The cases of a negative <em>field-width</em> or <em>precision</em> are described in <a href='#technical_specifications'>Technical Specifications</a>. &#8211;end note]</em></p>

<p>Under the same preconditions, the <em>type hint</em> characters and their effects on the output stream are:</p>

<ul>
<li><strong><code>d</code></strong> sets <code>ios_base::dec</code>.</li>

<li><strong><code>i</code></strong> has no effect (<code>(os.flags() &amp; ios_base::basefield) == 0</code>).</li>

<li><strong><code>u</code></strong> sets <code>ios_base::dec</code>.</li>

<li><strong><code>o</code></strong> sets <code>ios_base::oct</code>.</li>

<li><strong><code>x</code></strong> sets <code>ios_base::hex</code>.</li>

<li><strong><code>X</code></strong> sets <code>ios_base::hex | ios_base::uppercase</code>.</li>

<li><strong><code>f</code></strong> sets <code>ios_base::fixed</code>.</li>

<li><strong><code>F</code></strong> sets <code>ios_base::fixed | ios_base::uppercase</code>.</li>

<li><strong><code>e</code></strong> sets <code>ios_base::scientific</code>.</li>

<li><strong><code>E</code></strong> sets <code>ios_base::scientific | ios_base::uppercase</code>.</li>

<li><strong><code>g</code></strong> has no effect (<code>(os.flags() &amp; ios_base::floatfield) == 0</code>).</li>

<li><strong><code>G</code></strong> sets <code>ios_base::uppercase</code>.</li>

<li><strong><code>a</code></strong> sets <code>ios_base::fixed | ios_base::scientific</code>.</li>

<li><strong><code>A</code></strong> sets <code>ios_base::fixed | ios_base::scientific | ios_base::uppercase</code>.</li>

<li><strong><code>c</code></strong> has no effect.</li>

<li><strong><code>s</code></strong> sets <code>ios_base::boolalpha</code>.</li>

<li><strong><code>p</code></strong> has no effect.</li>
</ul>
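
<p>The table of type hints translates directly into a lookup function. The sketch below assumes a fresh narrow stream without <code>unitbuf</code>, so the defaulted <code>flags()</code> value is empty, and it ignores the flag characters. Both function names are hypothetical.</p>

```cpp
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: maps a type hint to the fmtflags listed above.
std::ios_base::fmtflags flags_for_hint(char hint) {
    typedef std::ios_base ib;
    switch (hint) {
    case 'd': case 'u': return ib::dec;
    case 'o': return ib::oct;
    case 'x': return ib::hex;
    case 'X': return ib::hex | ib::uppercase;
    case 'f': return ib::fixed;
    case 'F': return ib::fixed | ib::uppercase;
    case 'e': return ib::scientific;
    case 'E': return ib::scientific | ib::uppercase;
    case 'G': return ib::uppercase;
    case 'a': return ib::fixed | ib::scientific;
    case 'A': return ib::fixed | ib::scientific | ib::uppercase;
    case 's': return ib::boolalpha;
    default:  return ib::fmtflags();  // 'i', 'g', 'c', 'p': no effect
    }
}

// Prints one potentially formattable argument through operator<< after
// installing the translated flags and the given precision.
template <typename T>
std::string print_with_hint(char hint, std::streamsize precision,
                            const T& value) {
    std::ostringstream os;
    os.flags(flags_for_hint(hint));
    os.precision(precision);
    os << value;
    return os.str();
}
```

<p>With this mapping, a <code>bool</code> printed under the <code>s</code> hint appears as text, which is the purpose of setting <code>ios_base::boolalpha</code>.</p>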

<p>And then, the <em>potentially formattable</em> argument, namely <code>t</code>, is printed by calling <code>os &lt;&lt; t</code>.</p>

<h3 id='wording'>Wording</h3>

<p>This is an initial report; wording can be prepared after further discussion.</p>

<h2 id='sample_implementation'>Sample Implementation</h2>

<p>A sample implementation is available at <a href='https://github.com/lichray/formatxx/tree/proposal'>https://github.com/lichray/formatxx/tree/proposal</a></p>

<p>One known defect in this implementation is that the <code>%a</code> and <code>%A</code> format specifications ignore the precision when printing a floating point argument.</p>

<h3 id='performance_notes'>Performance notes</h3>

<p>The additional runtime performance costs compared with the streams library are caused by parsing the format string and creating the formatting guards (to restore the flags, precision, etc., after formatting each specification, exception-safely). In addition, to access a positional argument numbered <em>N</em>, <em>N - 1</em> empty recursions are required to locate the correct template instantiation.</p>
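
<p>The recursion cost can be pictured with a small variadic template. This is only a sketch of the general technique, not the code of the sample implementation; <code>print_nth</code> is a hypothetical name and it counts positions from zero.</p>

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Base case: the argument list is exhausted, so the position is out of range.
inline bool print_nth(std::ostream&, std::size_t) { return false; }

// Recursive case: peel off one argument per call. Reaching the argument at
// zero-based position n takes n pass-through calls before the call that
// actually prints, each one a distinct template instantiation.
template <typename T, typename... Rest>
bool print_nth(std::ostream& os, std::size_t n,
               const T& first, const Rest&... rest) {
    if (n == 0) {
        os << first;
        return true;
    }
    return print_nth(os, n - 1, rest...);
}
```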

<p>In the sample implementation, some extra copying is involved to emulate <code>printf</code>&#8217;s formatting features using streams. However, the <em>internally formattable</em> arguments are internally supported by the streams library, so a standard library implementation should be able to avoid these costs. For example, to print a string with precision, the sample implementation has to copy the string, while <code>libstdc++</code> already has an internal interface <code>__ostream_insert()</code> which takes a size parameter. These costs are not shown by the benchmark below, and Boost.Format incurs the same costs.</p>

<p>Here is a benchmark using Boost.Format&#8217;s test code, release mode:</p>

<p>Non-positional arguments/normal:</p>

<pre><code>  printf time         :0.367188
  ostream time        :0.59375,  = 1.61702 * printf 
  format time         :2.125,  = 5.78723 * printf ,  = 3.57895 * nullStream 
  std::putf time      :0.90625,  = 2.46809 * printf ,  = 1.52632 * nullStream </code></pre>

<p>Positional arguments/normal:</p>

<pre><code>  printf time         :0.414062
  ostream time        :0.59375,  = 1.43396 * printf 
  format time         :2.11719,  = 5.11321 * printf ,  = 3.56579 * nullStream 
  std::putf time      :1.00781,  = 2.43396 * printf ,  = 1.69737 * nullStream </code></pre>

<p>Environment:</p>

<pre><code>  FreeBSD 8.3-STABLE amd64
  g++ 4.8.0 20121209
  Boost 1.48.0</code></pre>

<p><em>Explanations</em>:</p>

<p><em>The two test cases take the same number of arguments, and have the same formatting results. The streams library has no such &#8220;positional arguments&#8221;, so I reordered the arguments by hand.</em></p>

<p><em>&#8220;normal&#8221; means the locale is turned on. However, I did not see a stable difference between <code>normal</code> and <code>no_locale</code>.</em></p>

<p><em>The format object of Boost can be reused, which brings a performance increase of around 17%. Such a &#8220;feature&#8221; is not applicable to <code>printf</code> or <code>std::putf</code>, so I did not include it.</em></p>

<h2 id='future_issues'>Future Issues</h2>

<p>Do we need <code>vprintf</code>-like interfaces, for example, ones that take tuples of arguments? If so, should they use a special flag or just more function names? For reference, check <code>std::experimental::vputf</code> in the sample implementation.</p>

<p>Is a <code>scanf</code> equivalent, e.g., <code>std::getf</code>, worth adding?</p>

<p>If the <code>string_ref</code> proposal gets approved, the overloads to take a <code>string_ref</code> as the format string will also be proposed here.</p>

<h2 id='acknowledgments'>Acknowledgments</h2>

<p>Andrew Sandoval, who gave me some suggestions on standard-compliance and error handling.</p>

<p>Herb Sutter, who encouraged me to prepare the proposal, suggested that I add the positional arguments, and even provided many suggestions and corrections on the proposal.</p>

<p>Many people in the &#8220;std-proposals&#8221; mailing list: Jeffrey Yasskin, who insisted that I add the positional arguments; Martin Desharnais, who gave me the link about how to implement one; and many others.</p>

<h2 id='references'>References</h2>

<p><code>[1]</code> The Boost Format library. <a href='http://www.boost.org/doc/libs/1_52_0/libs/format/doc/format.html'>http://www.boost.org/doc/libs/1_52_0/libs/format/doc/format.html</a></p>

<p><code>[2]</code> The <code>fprintf</code> function. <em>ISO/IEC 9899:2011</em>. 7.21.6.1.</p>

<p><code>[3]</code> String Formatting Operations. <em>The Python Standard Library</em>. 5.6.2. <a href='http://docs.python.org/2/library/stdtypes.html#string-formatting'>http://docs.python.org/2/library/stdtypes.html#string-formatting</a></p>

<p><code>[4]</code> dprintf, fprintf, printf, snprintf, sprintf - print formatted output. <em>IEEE Std 1003.1-2008</em>. <a href='http://pubs.opengroup.org/onlinepubs/9699919799/functions/printf.html'>http://pubs.opengroup.org/onlinepubs/9699919799/functions/printf.html</a></p>
</body></html>
