<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Issue 4044: Confusing requirements for std::print on POSIX platforms</title>
<meta property="og:title" content="Issue 4044: Confusing requirements for std::print on POSIX platforms">
<meta property="og:description" content="C++ library issue. Status: WP">
<meta property="og:url" content="https://cplusplus.github.io/LWG/issue4044.html">
<meta property="og:type" content="website">
<meta property="og:image" content="http://cplusplus.github.io/LWG/images/cpp_logo.png">
<meta property="og:image:alt" content="C++ logo">
<style>
  p {text-align:justify}
  li {text-align:justify}
  pre code.backtick::before { content: "`" }
  pre code.backtick::after { content: "`" }
  blockquote.note
  {
    background-color:#E0E0E0;
    padding-left: 15px;
    padding-right: 15px;
    padding-top: 1px;
    padding-bottom: 1px;
  }
  ins {background-color:#A0FFA0}
  del {background-color:#FFA0A0}
  table.issues-index { border: 1px solid; border-collapse: collapse; }
  table.issues-index th { text-align: center; padding: 4px; border: 1px solid; }
  table.issues-index td { padding: 4px; border: 1px solid; }
  table.issues-index td:nth-child(1) { text-align: right; }
  table.issues-index td:nth-child(2) { text-align: left; }
  table.issues-index td:nth-child(3) { text-align: left; }
  table.issues-index td:nth-child(4) { text-align: left; }
  table.issues-index td:nth-child(5) { text-align: center; }
  table.issues-index td:nth-child(6) { text-align: center; }
  table.issues-index td:nth-child(7) { text-align: left; }
  table.issues-index td:nth-child(5) span.no-pr { color: red; }
  @media (prefers-color-scheme: dark) {
     html {
        color: #ddd;
        background-color: black;
     }
     ins {
        background-color: #225522
     }
     del {
        background-color: #662222
     }
     a {
        color: #6af
     }
     a:visited {
        color: #6af
     }
     blockquote.note
     {
        background-color: rgba(255, 255, 255, .10)
     }
  }
</style>
</head>
<body>
<hr>
<p><em>This page is a snapshot from the LWG issues list, see the <a href="lwg-active.html">Library Active Issues List</a> for more information and the meaning of <a href="lwg-active.html#WP">WP</a> status.</em></p>
<h3 id="4044"><a href="lwg-defects.html#4044">4044</a>. Confusing requirements for <code>std::print</code> on POSIX platforms</h3>
<p><b>Section:</b> 31.7.10 <a href="https://wg21.link/print.fun">[print.fun]</a> <b>Status:</b> <a href="lwg-active.html#WP">WP</a>
 <b>Submitter:</b> Jonathan Wakely <b>Opened:</b> 2024-01-24 <b>Last modified:</b> 2024-11-28</p>
<p><b>Priority: </b>3
</p>
<p><b>View all other</b> <a href="lwg-index.html#print.fun">issues</a> in [print.fun].</p>
<p><b>View all issues with</b> <a href="lwg-status.html#WP">WP</a> status.</p>
<p><b>Discussion:</b></p>
<p>
The effects for <code>vprintf_unicode</code> say:
</p>

<blockquote>
<p>
If <code>stream</code> refers to a terminal capable of displaying Unicode,
writes <code>out</code> to the terminal using the native Unicode API;
if <code>out</code> contains invalid code units, the behavior is undefined
and implementations are encouraged to diagnose it.
Otherwise writes <code>out</code> to <code>stream</code> unchanged.
If the native Unicode API is used, the function flushes <code>stream</code>
before writing <code>out</code>.
</p>
<p>
[<i>Note 1</i>:
On POSIX and Windows, <code>stream</code> referring to a terminal means that,
respectively, <code>isatty(fileno(stream))</code> and
<code>GetConsoleMode(_get_osfhandle(_fileno(stream)), ...)</code>
return nonzero.
&mdash; <i>end note</i>]
</p>
<p>
[<i>Note 2</i>:
On Windows, the native Unicode API is <code>WriteConsoleW</code>.
&mdash; <i>end note</i>]
</p>
<p>-8-
<i>Throws</i>:  [...]
</p>
<p>-9-
<i>Recommended practice</i>:
If invoking the native Unicode API requires transcoding, implementations
should substitute invalid code units with
<span style="font-variant: small-caps">u+fffd replacement character</span>
per the Unicode Standard, Chapter 3.9
<span style="font-variant: small-caps">u+fffd</span> Substitution in Conversion.
</p>
</blockquote>

<p>
The very explicit mention of <code>isatty</code> for POSIX platforms has
confused at least two implementers into thinking that we're supposed to
use <code>isatty</code>, and supposed to do something differently based
on what it returns. That seems consistent with the nearly identical wording
in 28.5.2.2 <a href="https://wg21.link/format.string.std">[format.string.std]</a> paragraph 12, which says
"Implementations should use either UTF-8, UTF-16, or UTF-32,
on platforms <u>capable of displaying Unicode text in a terminal</u>"
and then has a note explicitly saying this is the case for Windows-based and
many POSIX-based operating systems. So it seems clear that POSIX platforms
are supposed to be considered to have "a terminal capable of displaying
Unicode text", and so <code>std::print</code> should use <code>isatty</code>
and then use a native Unicode API, and diagnose invalid code units.
</p>
<p>
This is a problem however, because <code>isatty</code> needs
to make a system call on Linux, adding 500ns to every <code>std::print</code>
call. This results in a 10x slowdown on Linux, where <code>std::print</code>
can take just 60ns without the <code>isatty</code> check.
</p>
<p>
From discussions with Tom Honermann I learned that the "native Unicode API"
wording is only relevant on Windows. This makes sense, because for POSIX
platforms, writing to a terminal is done using the usual stdio functions,
so there's no need to treat a terminal differently to any other file stream.
And substitution of invalid code units with
<span style="font-variant: small-caps">u+fffd</span>
is recommended for Windows because that's what typical modern terminals do on
POSIX platforms, so requiring the implementation to do that on Windows gives
consistent behaviour. But the implementation doesn't need to do anything to
make that happen with a POSIX terminal, it happens anyway.
So the <code>isatty</code> check is unnecessary for POSIX platforms,
and the note mentioning it just causes confusion and has no benefit.
</p>

<p>
Secondly, there initially seems to be a contradiction between the 
"implementations are encouraged to diagnose it" wording and the later
<i>Recommended practice</i>. In fact, there's no contradiction because
the native Unicode API might accept UTF-8 and therefore require no
transcoding, and so the <i>Recommended practice</i> wouldn't apply.
The intention is that diagnosing invalid UTF-8 is still desirable in this case,
but how should it be diagnosed? By writing an error to the terminal alongside
the formatted string?
Or by substituting <span style="font-variant: small-caps">u+fffd</span> maybe?
If the latter is the intention, why is one suggestion in the middle of the
<i>Effects</i>, and one given as <i>Recommended practice</i>?
</p>

<p>
The proposed resolution attempts to clarify that a "native Unicode API"
is only needed if that's how you display Unicode on the terminal.
It also moves the flushing requirement to be adjacent to the other
requirements for systems using a native Unicode API instead of on its own
later in the paragraph.
And the suggestion to diagnose invalid code units is moved into the
<i>Recommended practice</i> and clarified that it's only relevant if
using a native Unicode API. I'm still not entirely happy with encouragement
to diagnose invalid code units without giving any clue as to how that should
be done. What does it mean to diagnose something at runtime? That's novel
for the C++ standard. The way it's currently phrased seems to imply something
other than <span style="font-variant: small-caps">u+fffd</span> substitution
should be done, although that seems the most obvious implementation to me.
</p>


<p><i>[2024-03-12; Reflector poll]</i></p>

<p>
Set priority to 3 after reflector poll and send to SG16.
</p>

<p><strong>Previous resolution [SUPERSEDED]:</strong></p>
<blockquote class="note">

<p>
This wording is relative to <a href="https://wg21.link/N4971" title=" Working Draft, Programming Languages — C++">N4971</a>.
</p>

<ol>
<li><p>Modify 31.7.6.3.5 <a href="https://wg21.link/ostream.formatted.print">[ostream.formatted.print]</a> as indicated:</p>
<blockquote>
<pre>
void vprint_unicode(ostream&amp; os, string_view fmt, format_args args);
void vprint_nonunicode(ostream&amp; os, string_view fmt, format_args args);
</pre>
<p>-3-
<i>Effects</i>:
Behaves as a formatted output function
(31.7.6.3.1 <a href="https://wg21.link/ostream.formatted.reqmts">[ostream.formatted.reqmts]</a>)
of <code>os</code>, except that:
<ol style="list-style-type: none">
<li>(3.1) &ndash;
failure to generate output is reported as specified below, and
</li>
<li>(3.2) &ndash;
any exception thrown by the call to <code>vformat</code> is propagated without
regard to the value of <code>os.exceptions()</code> and without turning on
<code>ios_base::badbit</code> in the error state of <code>os</code>.
</li>
</ol>
</p>
<p>
After constructing a <code>sentry</code> object,
the function initializes an automatic variable via
<pre><code>  string out = vformat(os.getloc(), fmt, args); </code></pre>
If the function is <code>vprint_unicode</code>
and <code>os</code> is a stream that
refers to a terminal capable of displaying Unicode
<ins>via a native Unicode API,</ins>
which is determined in an implementation-defined manner,
<ins>flushes <code>os</code> and then</ins>
writes <code>out</code> to the terminal using the native Unicode API;
if <code>out</code> contains invalid code units, the behavior is undefined
<del>and implementations are encouraged to diagnose it</del>.
<del>
If the native Unicode API is used, the function flushes <code>os</code>
before writing <code>out</code>.
</del>
Otherwise, (if <code>os</code> is not such a stream or the function is
<code>vprint_nonunicode</code>), inserts the character sequence
[<code>out.begin()</code>,<code>out.end()</code>) into <code>os</code>.
If writing to the terminal or inserting into <code>os</code> fails, calls
<code>os.setstate(ios_base::badbit)</code>
(which may throw <code>ios_base::failure</code>).
</p>
<p>-4-
<i>Recommended practice</i>:
For <code>vprint_unicode</code>,
if invoking the native Unicode API requires transcoding, implementations
should substitute invalid code units with
<span style="font-variant: small-caps">u+fffd replacement character</span>
per the Unicode Standard, Chapter 3.9
<span style="font-variant: small-caps">u+fffd</span> Substitution in Conversion.
<ins>
If invoking the native Unicode API  does not require transcoding,
implementations are encouraged to diagnose invalid code units.
</ins>
</p>
</blockquote>
</li>

<li><p>Modify 31.7.10 <a href="https://wg21.link/print.fun">[print.fun]</a> as indicated:</p>

<blockquote>
<pre>
void vprint_unicode(FILE* stream, string_view fmt, format_args args);
</pre>
<p>-6-
<i>Preconditions</i>:
<code>stream</code> is a valid pointer to an output C stream.
</p>
<p>-7-
<i>Effects</i>:
The function initializes an automatic variable via
<pre><code>  string out = vformat(fmt, args); </code></pre>
If <code>stream</code> refers to a terminal capable of displaying Unicode
<ins>via a native Unicode API</ins>,
<ins>flushes <code>stream</code> and then</ins>
writes <code>out</code> to the terminal using the native Unicode API;
if <code>out</code> contains invalid code units, the behavior is undefined
<del>and implementations are encouraged to diagnose it</del>.
Otherwise writes <code>out</code> to <code>stream</code> unchanged.
<del>
If the native Unicode API is used, the function flushes <code>stream</code>
before writing <code>out</code>.
</del>
</p>
<p>
[<i>Note 1</i>:
On <del>POSIX and</del> Windows<del>,</del>
<ins>the native Unicode API is <code>WriteConsoleW</code> and</ins>
<code>stream</code> referring to a terminal means that<del>,
respectively, <code>isatty(fileno(stream))</code> and</del>
<code>GetConsoleMode(_get_osfhandle(_fileno(stream)), ...)</code>
return nonzero.
&mdash; <i>end note</i>]
</p>
<p>
<del>
[<i>Note 2</i>:
On Windows, the native Unicode API is <code>WriteConsoleW</code>.
&mdash; <i>end note</i>]
</del>
</p>
<p>-8-
<i>Throws</i>:  [...]
</p>
<p>-9-
<i>Recommended practice</i>:
If invoking the native Unicode API requires transcoding, implementations
should substitute invalid code units with
<span style="font-variant: small-caps">u+fffd replacement character</span>
per the Unicode Standard, Chapter 3.9
<span style="font-variant: small-caps">u+fffd</span> Substitution in Conversion.
<ins>
If invoking the native Unicode API  does not require transcoding,
implementations are encouraged to diagnose invalid code units.
</ins>
</p>
</blockquote>
</li>
</ol>
</blockquote>

<p><i>[2024-03-12; Jonathan updates wording based on SG16 feedback]</i></p>

<p>
SG16 reviewed the issue and approved the proposed resolution with
the wording about diagnosing invalid code units removed.
</p>
<p>
SG16 favors removing the following text (both occurrences) from the proposed
wording. This is motivated by a lack of understanding regarding what it means
to diagnose such invalid code unit sequences given that the input is likely
provided at run-time.

<blockquote>
If invoking the native Unicode API does not require transcoding, implementations are encouraged to diagnose invalid code units.
</blockquote>
</p>

<p>
Some concern was expressed regarding how the current wording is structured.
At present, the wording leads with a Windows centric perspective;
if the stream refers to a terminal ... use the native Unicode API ...
otherwise write code units to the stream.
It might be an improvement to structure the wording such that use of the native
Unicode API is presented as a fallback for implementations that require its use
when writing directly to the stream is not sufficient to produce desired
results. In other words, the wording should permit direct writing to the stream
even when the stream is directed to a terminal and a native Unicode API is
available when the implementation has reason to believe that doing so will
produce the correct results. For example, Microsoft's HoloLens has a Windows
based operating system, but it only supports use of UTF-8 as the system code
page and therefore would not require the native Unicode API bypass;
implementations for it could avoid the overhead of checking to see if the
stream is directed to a console.
</p>

<p><strong>Previous resolution [SUPERSEDED]:</strong></p>
<blockquote class="note">


<p>
This wording is relative to <a href="https://wg21.link/N4971" title=" Working Draft, Programming Languages — C++">N4971</a>.
</p>

<ol>
<li><p>Modify 31.7.6.3.5 <a href="https://wg21.link/ostream.formatted.print">[ostream.formatted.print]</a> as indicated:</p>
<blockquote>
<pre>
void vprint_unicode(ostream&amp; os, string_view fmt, format_args args);
void vprint_nonunicode(ostream&amp; os, string_view fmt, format_args args);
</pre>
<p>-3-
<i>Effects</i>:
Behaves as a formatted output function
(31.7.6.3.1 <a href="https://wg21.link/ostream.formatted.reqmts">[ostream.formatted.reqmts]</a>)
of <code>os</code>, except that:
<ol style="list-style-type: none">
<li>(3.1) &ndash;
failure to generate output is reported as specified below, and
</li>
<li>(3.2) &ndash;
any exception thrown by the call to <code>vformat</code> is propagated without
regard to the value of <code>os.exceptions()</code> and without turning on
<code>ios_base::badbit</code> in the error state of <code>os</code>.
</li>
</ol>
</p>
<p>
After constructing a <code>sentry</code> object,
the function initializes an automatic variable via
<pre><code>  string out = vformat(os.getloc(), fmt, args); </code></pre>
If the function is <code>vprint_unicode</code>
and <code>os</code> is a stream that
refers to a terminal <ins>that is only</ins> capable of displaying Unicode
<ins>via a native Unicode API,</ins>
which is determined in an implementation-defined manner,
<ins>flushes <code>os</code> and then</ins>
writes <code>out</code> to the terminal using the native Unicode API;
if <code>out</code> contains invalid code units, the behavior is undefined
<del>and implementations are encouraged to diagnose it</del>.
<del>
If the native Unicode API is used, the function flushes <code>os</code>
before writing <code>out</code>.
</del>
Otherwise, (if <code>os</code> is not such a stream or the function is
<code>vprint_nonunicode</code>), inserts the character sequence
[<code>out.begin()</code>,<code>out.end()</code>) into <code>os</code>.
If writing to the terminal or inserting into <code>os</code> fails, calls
<code>os.setstate(ios_base::badbit)</code>
(which may throw <code>ios_base::failure</code>).
</p>
<p>-4-
<i>Recommended practice</i>:
For <code>vprint_unicode</code>,
if invoking the native Unicode API requires transcoding, implementations
should substitute invalid code units with
<span style="font-variant: small-caps">u+fffd replacement character</span>
per the Unicode Standard, Chapter 3.9
<span style="font-variant: small-caps">u+fffd</span> Substitution in Conversion.
</p>
</blockquote>
</li>

<li><p>Modify 31.7.10 <a href="https://wg21.link/print.fun">[print.fun]</a> as indicated:</p>

<blockquote>
<pre>
void vprint_unicode(FILE* stream, string_view fmt, format_args args);
</pre>
<p>-6-
<i>Preconditions</i>:
<code>stream</code> is a valid pointer to an output C stream.
</p>
<p>-7-
<i>Effects</i>:
The function initializes an automatic variable via
<pre><code>  string out = vformat(fmt, args); </code></pre>
If <code>stream</code> refers to a terminal <ins>that is only</ins>
capable of displaying Unicode
<ins>via a native Unicode API</ins>,
<ins>flushes <code>stream</code> and then</ins>
writes <code>out</code> to the terminal using the native Unicode API;
if <code>out</code> contains invalid code units, the behavior is undefined
<del>and implementations are encouraged to diagnose it</del>.
Otherwise writes <code>out</code> to <code>stream</code> unchanged.
<del>
If the native Unicode API is used, the function flushes <code>stream</code>
before writing <code>out</code>.
</del>
</p>
<p>
[<i>Note 1</i>:
On <del>POSIX and</del> Windows<del>,</del>
<ins>the native Unicode API is <code>WriteConsoleW</code> and</ins>
<code>stream</code> referring to a terminal means that<del>,
respectively, <code>isatty(fileno(stream))</code> and</del>
<code>GetConsoleMode(_get_osfhandle(_fileno(stream)), ...)</code>
return nonzero.
&mdash; <i>end note</i>]
</p>
<p>
<del>
[<i>Note 2</i>:
On Windows, the native Unicode API is <code>WriteConsoleW</code>.
&mdash; <i>end note</i>]
</del>
</p>
<p>-8-
<i>Throws</i>:  [...]
</p>
<p>-9-
<i>Recommended practice</i>:
If invoking the native Unicode API requires transcoding, implementations
should substitute invalid code units with
<span style="font-variant: small-caps">u+fffd replacement character</span>
per the Unicode Standard, Chapter 3.9
<span style="font-variant: small-caps">u+fffd</span> Substitution in Conversion.
</p>
</blockquote>
</li>
</ol>
</blockquote>

<p><i>[2024-03-19; Tokyo: Jonathan updates wording after LWG review]</i></p>

<p>
Split the <em>Effects</em>: into separate bullets for the "native Unicode API"
and "otherwise" cases. Remove the now-redundant "if <code class='backtick'>os</code> is not such a stream"
parenthesis.
</p>

<p><i>[St. Louis 2024-06-24; move to Ready.]</i></p>

<p><i>[Wrocław 2024-11-23; Status changed: Voting &rarr; WP.]</i></p>



<p id="res-4044"><b>Proposed resolution:</b></p>
<p>
This wording is relative to <a href="https://wg21.link/N4971" title=" Working Draft, Programming Languages — C++">N4971</a>.
</p>

<ol>
<li><p>Modify 31.7.6.3.5 <a href="https://wg21.link/ostream.formatted.print">[ostream.formatted.print]</a> as indicated:</p>
<blockquote>
<pre>
void vprint_unicode(ostream&amp; os, string_view fmt, format_args args);
void vprint_nonunicode(ostream&amp; os, string_view fmt, format_args args);
</pre>
<p>-3-
<i>Effects</i>:
Behaves as a formatted output function
(31.7.6.3.1 <a href="https://wg21.link/ostream.formatted.reqmts">[ostream.formatted.reqmts]</a>)
of <code>os</code>, except that:
<ol style="list-style-type: none">
<li>(3.1) &ndash;
failure to generate output is reported as specified below, and
</li>
<li>(3.2) &ndash;
any exception thrown by the call to <code>vformat</code> is propagated without
regard to the value of <code>os.exceptions()</code> and without turning on
<code>ios_base::badbit</code> in the error state of <code>os</code>.
</li>
</ol>
</p>

<p>
<ins>-?-</ins>
After constructing a <code>sentry</code> object,
the function initializes an automatic variable via
<pre><code>  string out = vformat(os.getloc(), fmt, args); </code></pre>
<ol style="list-style-type: none">
<li><ins>(?.1) &ndash; </ins>
If the function is <code>vprint_unicode</code>
and <code>os</code> is a stream that
refers to a terminal <ins>that is only</ins> capable of displaying Unicode
<ins>via a native Unicode API,</ins>
which is determined in an implementation-defined manner,
<ins>flushes <code>os</code> and then</ins>
writes <code>out</code> to the terminal using the native Unicode API;
if <code>out</code> contains invalid code units, the behavior is undefined
<del>and implementations are encouraged to diagnose it.</del>
<del>
If the native Unicode API is used, the function flushes <code>os</code>
before writing <code>out</code></del>.
</li>
<li><ins>(?.2) &ndash; </ins>
Otherwise,
<del>
(if <code>os</code> is not such a stream or the function is
<code>vprint_nonunicode</code>),
</del>
inserts the character sequence
[<code>out.begin()</code>,<code>out.end()</code>) into <code>os</code>.
</li>
</ol>
</p>
<p>
<ins>-?-</ins>
If writing to the terminal or inserting into <code>os</code> fails, calls
<code>os.setstate(ios_base::badbit)</code>
(which may throw <code>ios_base::failure</code>).
</p>
<p>-4-
<i>Recommended practice</i>:
For <code>vprint_unicode</code>,
if invoking the native Unicode API requires transcoding, implementations
should substitute invalid code units with
<span style="font-variant: small-caps">u+fffd replacement character</span>
per the Unicode Standard, Chapter 3.9
<span style="font-variant: small-caps">u+fffd</span> Substitution in Conversion.
</p>
</blockquote>
</li>

<li><p>Modify 31.7.10 <a href="https://wg21.link/print.fun">[print.fun]</a> as indicated:</p>

<blockquote>
<pre>
void vprint_unicode(FILE* stream, string_view fmt, format_args args);
</pre>
<p>-6-
<i>Preconditions</i>:
<code>stream</code> is a valid pointer to an output C stream.
</p>
<p>-7-
<i>Effects</i>:
The function initializes an automatic variable via
<pre><code>  string out = vformat(fmt, args); </code></pre>
<ol style="list-style-type: none">
<li><ins>(7.1) &ndash; </ins>
If <code>stream</code> refers to a terminal <ins>that is only</ins>
capable of displaying Unicode
<ins>via a native Unicode API</ins>,
<ins>flushes <code>stream</code> and then</ins>
writes <code>out</code> to the terminal using the native Unicode API;
if <code>out</code> contains invalid code units, the behavior is undefined
<del>and implementations are encouraged to diagnose it</del>.
</li>
<li><ins>(7.2) &ndash; </ins>
Otherwise writes <code>out</code> to <code>stream</code> unchanged.
</li>
</ol>
</p>
<p>
<del>
If the native Unicode API is used, the function flushes <code>stream</code>
before writing <code>out</code>.
</del>
</p>
<p>
[<i>Note 1</i>:
On <del>POSIX and</del> Windows<del>,</del>
<ins>the native Unicode API is <code>WriteConsoleW</code> and</ins>
<code>stream</code> referring to a terminal means that<del>,
respectively, <code>isatty(fileno(stream))</code> and</del>
<code>GetConsoleMode(_get_osfhandle(_fileno(stream)), ...)</code>
return<ins>s</ins> nonzero.
&mdash; <i>end note</i>]
</p>
<p>
<del>
[<i>Note 2</i>:
On Windows, the native Unicode API is <code>WriteConsoleW</code>.
&mdash; <i>end note</i>]
</del>
</p>
<p>-8-
<i>Throws</i>:  [...]
</p>
<p>-9-
<i>Recommended practice</i>:
If invoking the native Unicode API requires transcoding, implementations
should substitute invalid code units with
<span style="font-variant: small-caps">u+fffd replacement character</span>
per the Unicode Standard, Chapter 3.9
<span style="font-variant: small-caps">u+fffd</span> Substitution in Conversion.
</p>
</blockquote>
</li>
</ol>






</body>
</html>
