﻿<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">

<style type="text/css">

body { color: #000000; background-color: #FFFFFF; }
del { text-decoration: line-through; color: #8B0040; }
ins { text-decoration: underline; color: #005100; }

p.example { margin-left: 2em; }
pre.example { margin-left: 2em; }
div.example { margin-left: 2em; }

code.extract { background-color: #F5F6A2; }
pre.extract { margin-left: 2em; background-color: #F5F6A2;
              border: 1px solid #E1E28E; }

p.function { }
.attribute { margin-left: 2em; }
.attribute dt { float: left; font-style: italic;
                padding-right: 1ex; }
.attribute dd { margin-left: 0em; }

blockquote.std { color: #000000; background-color: #F1F1F1;
                 border: 1px solid #D1D1D1;
                 padding-left: 0.5em; padding-right: 0.5em; }
blockquote.stddel { text-decoration: line-through;
                    color: #000000; background-color: #FFEBFF;
                    border: 1px solid #ECD7EC;
                    padding-left: 0.5em; padding-right: 0.5em; }

blockquote.stdins { text-decoration: underline;
                    color: #000000; background-color: #C8FFC8;
                    border: 1px solid #B3EBB3; padding: 0.5em; }

table { border: 1px solid black; border-spacing: 0px;
        margin-left: auto; margin-right: auto; }
th { text-align: left; vertical-align: top;
     padding-left: 0.8em; border: none; }
td { text-align: left; vertical-align: top;
     padding-left: 0.8em; border: none; }

p.grammarlhs { margin-bottom: 0 }
p.grammarrhs { margin-left:8em; margin-top:0; text-indent:-4em }

</style>

<title>Removing trigraphs??!</title>
</head>

<body>

<div style="text-align: right; float: right">
<p>ISO/IEC JTC1 SC22 WG21<br>N3981<br>Richard Smith<br>2014-05-06</p>
</div>

<h1>Removing trigraphs??!</h1>

<h2>Case study</h2>

<p>We examined the uses of trigraph-like constructs in one large codebase and
found:</p>

<ul>
<li><b>923</b> instances of an escaped <tt>?</tt> in a string literal to avoid
trigraph replacement:
  <pre>string pattern() const { return "foo-????\?-of-?????"; }</pre>
<li><b>4</b> instances of trigraphs being used deliberately in test code: two
in the test suite for a compiler, the other two in a test suite for Boost's
preprocessor library.
<li><b>0</b> instances of trigraphs being deliberately used in production code.
</ul>
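<p>As a minimal sketch of the first item (the function names
<tt>escaped_pattern</tt> and <tt>expected_pattern</tt> are illustrative, not
taken from the codebase examined): the escape <tt>\?</tt> denotes an ordinary
question mark, so the escaped literal has the intended value whether or not the
implementation replaces trigraphs.</p>

```cpp
#include <cassert>
#include <string>

// Escaped form, as in the case study. The backslash breaks up the run of
// question marks so that no two of them are directly followed by '-'.
inline std::string escaped_pattern() { return "foo-????\?-of-?????"; }

// The same text built without writing adjacent question marks next to a
// character that could complete a trigraph.
inline std::string expected_pattern() {
    return "foo-" + std::string(5, '?') + "-of-" + std::string(5, '?');
}
```

<p>Without the backslash, an implementation that replaces trigraphs would turn
the last two question marks of the first run, together with the following
<tt>-</tt>, into <tt>~</tt>.</p>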

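<p>Trigraphs continue to pose a burden on users of C++: in this codebase they
were worked around hundreds of times and never deliberately used in production
code.</p>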

<h2>Proposal</h2>

<p>Trigraphs are handled in the first phase of translation:</p>

<blockquote class="std">
Physical source file characters are mapped, in an implementation-defined
manner, to the basic source character set (introducing new-line characters for
end-of-line indicators) if necessary. The set of physical source file
characters accepted is implementation-defined. Trigraph sequences (2.4) are
replaced by corresponding single-character internal representations.
</blockquote>

<p>
Note that the mapping from physical source file characters to the basic
source character set is implementation-defined.
If trigraphs are removed from the language entirely, an implementation that
wishes to support them can continue to do so: its implementation-defined
mapping from physical source file characters to the basic source character set
can include trigraph translation (and can even avoid doing so within raw string
literals).
<b>We do not need trigraphs in the standard for backwards
compatibility.</b>
</p>
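<p>To illustrate how small such a mapping is, the following is a rough sketch of
trigraph replacement written as a standalone function. The names
<tt>trigraph_replacement</tt> and <tt>replace_trigraphs</tt> are hypothetical,
and the sketch assumes the whole source file is already available as a string.
It does not attempt to skip raw string literals, which a real implementation
might choose to do.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the replacement for the trigraph whose third
// character is `c`, or '\0' if `c` does not complete a trigraph.
inline char trigraph_replacement(char c) {
    // The two tables are index-aligned: third[i] maps to mapped[i].
    static const char third[]  = "=/'()!<>-";
    static const char mapped[] = "#\\^[]|{}~";
    for (std::size_t i = 0; third[i] != '\0'; ++i) {
        if (third[i] == c) return mapped[i];
    }
    return '\0';
}

// Hypothetical phase-1 style mapping: scans left to right and replaces each
// trigraph sequence with its single-character equivalent.
inline std::string replace_trigraphs(const std::string& source) {
    std::string out;
    out.reserve(source.size());
    std::size_t i = 0;
    while (i < source.size()) {
        if (i + 2 < source.size() && source[i] == '?' && source[i + 1] == '?') {
            const char r = trigraph_replacement(source[i + 2]);
            if (r != '\0') {
                out.push_back(r);
                i += 3;  // consume all three characters of the trigraph
                continue;
            }
        }
        out.push_back(source[i]);
        ++i;
    }
    return out;
}
```

<p>The lookup tables are keyed on the third character only, so the sketch
itself contains no trigraph sequences and compiles the same way with or without
trigraph support.</p>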

<p>This paper proposes that trigraphs be removed entirely.</p>

<h2>Proposed wording</h2>

<p>Change in 2.2 (lex.phases) paragraph 1 bullet 1:</p>

<blockquote class="std">
Physical source file characters are mapped, in an implementation-defined
manner, to the basic source character set (introducing new-line characters for
end-of-line indicators) if necessary. The set of physical source file
characters accepted is implementation-defined. <del>Trigraph sequences (2.4)
are replaced by corresponding single-character internal representations.</del>
Any source file character not in the basic source character set (2.3) is
replaced by [&hellip;]
</blockquote>

<p>Delete subclause 2.4 (lex.trigraph) "Trigraph sequences".</p>

<p>Change in 2.5 (lex.pptoken) paragraph 3 bullet 1:</p>

<blockquote class="std">
If the next character begins a sequence of characters that could be the prefix and initial double quote of
a raw string literal, such as <tt>R"</tt>, the next preprocessing token shall be a raw string literal. Between the
initial and final double quote characters of the raw string, any transformations performed in phases 1
and 2 (<del>trigraphs,</del> universal-character-names<del>,</del> and line splicing) are reverted; this reversion shall apply
before any d-char, r-char, or delimiting parenthesis is identified. [&hellip;]
</blockquote>

<p>Change footnote 24 in 2.14.3 (lex.ccon) paragraph 3:</p>

<blockquote class="std">
Using an escape sequence for a question mark <del>can avoid accidentally
creating a trigraph</del> <ins>is supported for compatibility with ISO
C++14 and ISO C</ins>.
</blockquote>

<p>Add a subclause to Annex C:</p>

<blockquote class="stdins">
<h3>Clause 2: lexical conventions [diff.cpp14.lex]</h3>
<b>Change:</b> Removal of trigraphs.<br>
<b>Rationale:</b> Undesirable feature that prevents some uses of <tt>??</tt> in
non-raw string literals and comments.<br>
<b>Effect on original feature:</b>
Valid C++2014 code that uses trigraphs may not be valid or may have different
semantics. Implementations may choose to provide trigraphs as part of the
implementation-defined mapping from physical source file characters to the
basic source character set, but are encouraged not to do so.
</blockquote>

<h2>Feature test macro</h2>

<p>No feature test macro is provided for this feature. Code that wishes to be
portable between implementations that provide trigraphs and those that do not
should avoid using basic source character sequences containing trigraphs.</p>
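<p>One way to follow this advice is to check source text for trigraph sequences
before it reaches the compiler. The function below is only a sketch of such a
check; <tt>find_trigraph</tt> is a hypothetical name, and the function looks at
raw text without regard to comments or raw string literals.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical lint-style helper: returns the offset of the first trigraph
// sequence in `text`, or std::string::npos if there is none.
inline std::size_t find_trigraph(const std::string& text) {
    // Characters that complete a trigraph when they follow two question marks.
    static const std::string third = "=/'()!<>-";
    for (std::size_t i = 0; i + 2 < text.size(); ++i) {
        if (text[i] == '?' && text[i + 1] == '?' &&
            third.find(text[i + 2]) != std::string::npos) {
            return i;
        }
    }
    return std::string::npos;
}
```

<p>Source text that escapes a question mark, as in the case study, passes this
check because the backslash separates the question marks from the character
that would otherwise complete the trigraph.</p>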

</body>
</html>
