<HTML>
<HEAD>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<TITLE>
    CWG Issue 2639</TITLE>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<STYLE TYPE="text/css">
  INS { text-decoration:none; font-weight:bold; background-color:#A0FFA0 }
  .INS { text-decoration:none; background-color:#D0FFD0 }
  DEL { text-decoration:line-through; background-color:#FFA0A0 }
  .DEL { text-decoration:line-through; background-color: #FFD0D0 }
  @media (prefers-color-scheme: dark) {
    HTML { background-color:#202020; color:#f0f0f0; }
    A { color:#5bc0ff; }
    A:visited { color:#c6a8ff; }
    A:hover, a:focus { color:#afd7ff; }
    INS { background-color:#033a16; color:#aff5b4; }
    .INS { background-color: #033a16; }
    DEL { background-color:#67060c; color:#ffdcd7; }
    .DEL { background-color:#67060c; }
  }
  SPAN.cmnt { font-family:Times; font-style:italic }
</STYLE>
</HEAD>
<BODY>
<P><EM>This is an unofficial snapshot of the ISO/IEC JTC1 SC22 WG21
  Core Issues List revision 118b.
  See http://www.open-std.org/jtc1/sc22/wg21/ for the official
  list.</EM></P>
<P>2025-09-28</P>
<HR>
<A NAME="2639"></A><H4>2639.
  
new-lines after phase 1
</H4>
<B>Section: </B>5.2&#160; [<A href="https://wg21.link/lex.phases">lex.phases</A>]
 &#160;&#160;&#160;

 <B>Status: </B>C++23
 &#160;&#160;&#160;

 <B>Submitter: </B>US
 &#160;&#160;&#160;

 <B>Date: </B>2022-11-03<BR><BR>


<A href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2720r0.pdf#US3-030">P2720R0 comment
  US&#160;3-030</A><BR>

<P>[Accepted at the November, 2022 meeting.]</P>

<P>Translation phases 2 and 3 assume that lines are terminated by
"new-line characters". However, the current specification of phase 1
does not guarantee that to be true. In particular, for a UTF-8 file
the verbatim sequence of source file characters forms the input for
phase 2, even on systems where the line terminator is a carriage
return. The non-UTF-8 specification is also defective in that it
speaks of "introducing" new-line characters, even for encodings like
Latin-1 where new-lines might already be present and no "introduction"
is needed or appropriate.</P>

<P><B>Proposed resolution [SUPERSEDED]:</B></P>

<P>Change in 5.2 [<A href="https://wg21.link/lex.phases#1.1">lex.phases</A>] paragraph 1.1 as follows:</P>

<BLOCKQUOTE>

<P>
...  If an input file is determined to be a UTF-8 file, then it shall
be a well-formed UTF-8 code unit sequence and it is decoded to produce
a sequence of UCS scalar values that constitutes the sequence of
elements of the translation character set<INS>, representing each
line-termination character or character sequence as a new-line
character</INS>.
</P>

<P>
For any other kind of input file supported by the implementation,
characters are mapped, in an implementation-defined manner, to a
sequence of translation character set elements
(5.3.1 [<A href="https://wg21.link/lex.charset">lex.charset</A>]) (<DEL>introducing new-line characters
for</DEL> <INS>representing</INS> end-of-line indicators <INS>as
new-line characters</INS>).
</P>

</BLOCKQUOTE>

<P><B>Proposed resolution (approved by CWG 2022-11-08):</B></P>

<P>Change in 5.2 [<A href="https://wg21.link/lex.phases#1.1">lex.phases</A>] paragraph 1.1 as follows:</P>

<BLOCKQUOTE>

<P>
...  If an input file is determined to be a UTF-8 file, then it shall
be a well-formed UTF-8 code unit sequence and it is decoded to produce
a sequence of UCS scalar values that constitutes the sequence of
elements of the translation character set. <INS>In the resulting
sequence, each pair of characters in the input sequence consisting of U+000D
CARRIAGE RETURN followed by U+000A LINE FEED, as well as each U+000D
CARRIAGE RETURN not immediately followed by a U+000A LINE FEED, is
replaced by a single new-line character.</INS>
</P>

<P>
For any other kind of input file supported by the implementation,
characters are mapped, in an implementation-defined manner, to a
sequence of translation character set elements
(5.3.1 [<A href="https://wg21.link/lex.charset">lex.charset</A>]) <DEL>(introducing new-line characters
for</DEL> <INS>, representing</INS> end-of-line indicators <INS>as
new-line characters</INS> <DEL>)</DEL>.
</P>

</BLOCKQUOTE>
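
<P>The replacement rule added for UTF-8 files can be illustrated with a
minimal sketch. It assumes the input has already been decoded into UCS
scalar values and that the new-line character is represented as U+000A.
The function name <CODE>normalize_line_endings</CODE> is hypothetical; it
is not part of the proposed wording or of any particular
implementation.</P>

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper illustrating the approved wording; not taken from
// the standard or from any compiler.
// Input:  decoded UCS scalar values of a UTF-8 source file.
// Output: the same sequence, with each CR LF pair and each CR not
//         followed by LF replaced by a single new-line (assumed U+000A).
std::u32string normalize_line_endings(const std::u32string& input) {
    std::u32string out;
    out.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] == U'\r') {
            // A CR LF pair collapses to one new-line: skip the LF.
            if (i + 1 < input.size() && input[i + 1] == U'\n') {
                ++i;
            }
            out.push_back(U'\n');
        } else {
            // Every other character, including a lone LF, is kept as-is.
            out.push_back(input[i]);
        }
    }
    return out;
}
```

<P>Under these assumptions, a file that uses CR LF, lone CR, or lone LF
as its line terminator presents the same sequence of new-line
characters to phases 2 and 3.</P>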

<BR><BR>
</BODY>
</HTML>
