<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
	"http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

<head>
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
  <title>Synchronizing the C++ preprocessor with C99</title>
</head>

<body>

<table summary="This table provides information identifying this document and its author.">
<tr>
<th>Doc. No.:</th>
<td>WG21/N1566, J16/04-0006</td>
</tr>
<tr>
<th>Date:</th>
<td>2004-02-05</td>
</tr>
<tr>
<th>Reply to:</th>
<td>Clark Nelson</td>
</tr>
<tr>
<th>Phone:</th>
<td>+1-503-712-8433</td>
</tr>
<tr>
<th>Email:</th>
<td><a href="mailto:clark.nelson@intel.com">clark.nelson@intel.com</a></td>
</tr>
</table>
<h1>Synchronizing the C++ preprocessor with C99</h1>

<p>This document is intended as a basis for discussion of the details of adopting 
  text from C99 to describe the C++ preprocessor. This was proposed at the Kona 
  meeting, and was supported almost unanimously by the Evolution working group.</p>

<p>This paper summarizes every change that was made to the preprocessor section 
  of either the C++ standard (as of 2003) or the C standard (as of 2001), taking 
  the 1989 C standard as the base. The descriptions of the phases of translation 
  and of trigraphs are also covered; they were explicitly mentioned in the original 
  straw vote from the first Nashua meeting (1991-03) on the subject of incorporating 
  text from the C standard.</p>

<p>Differences in the area of universal character names are also mentioned, as they 
  affect the phases of translation. UCNs were introduced concurrently into both 
  standards by their respective committees; nevertheless, there is considerable 
  variance in the way they are described. Unfortunately, UCNs were not specifically 
  mentioned at Kona, and therefore it is not yet clear that there is consensus to 
  synchronize with C, nor which direction may be favored for resolving discrepancies.</p>

<p>References are to the C++ standard, with the corresponding C standard section 
  number in parentheses. A complete C reference is used when a paragraph or section 
  has been added to the C standard.</p>

<p>Changes fall into four major categories:</p>

<dl>
<dt>Universal character name differences</dt>
<dd>These are treated separately, because they are a new topic.</dd>
<dt>Technical changes</dt>
<dd>These correspond or relate to language features and other substantive committee 
decisions.</dd>
<dt>Terminology changes</dt>
<dd>Where different committees and editors have at different times used different 
terms of reference.</dd>
<dt>Editorial changes</dt>
<dd>Clarifying changes, large and small. (Some small editorial changes to non-normative 
text, and to text also modified more significantly, are ignored in this paper.)</dd>
</dl>
<h2>Universal character names</h2>

<p>I expect that everyone would agree that C and C++ should be synchronized with 
  respect to universal character names. I am less certain that everyone will agree 
  where changes should be made to effect synchronization. Personally, I believe 
  that the model described in the C standard is at least as good as that in C++. 
  Therefore, I recommend that the C++ standard be changed to match C: the changes 
  from C99 should be adopted, and the changes from C++ should be abandoned.</p>

<p>It should be noted that this list of changes is not complete. Universal character 
  names are also mentioned elsewhere in both standards: where they are defined, 
  in the descriptions of identifiers and string and character literals, and 
  in annexes specifying which characters are permitted in identifiers. More work 
  in this area will be needed, especially if the committee prefers close 
  synchronization.</p>

<h3>Changes made to C99</h3>

<p>16.3.2&#182;2 (6.10.3.2): Added a statement that stringizing a string literal containing 
  a UCN is implementation-defined.</p>

<h3>Changes made to C++</h3>

<p>2.1 phase 1 (5.1.1.2): Processing of characters not in the basic source character 
  set is described in terms of universal character names.</p>

<p>2.1 phase 2 (5.1.1.2): A universal character name may not be split by an escaped 
  new-line.</p>

<p>2.1 phase 5 (5.1.1.2): Universal character names are mapped onto the execution 
  character set. <em>[In C, no change is needed here because of a terminology difference: 
  a universal character name is described as an escape sequence, which is already 
  mentioned.]</em></p>

<h2>Technical changes</h2>

<p>The technical changes are presented roughly in order of decreasing controversy 
  (in my best guess).</p>

<h3>Standard pragmas</h3>

<p>This represents quite a lot of technical work, in both specification and implementation. 
  I am not prepared to make a recommendation at this time.</p>

<h4>Changes made to C99</h4>

<p>16.6&#182;1 (6.10.6): Added statements distinguishing non-standard pragmas from 
  standard pragmas.</p>

<p>16.6 -- new paragraph (6.10.6&#182;2): Added introductions of the standard pragmas:</p>

<ul>
<li><code>#pragma STDC FP_CONTRACT</code></li>
<li><code>#pragma STDC FENV_ACCESS</code></li>
<li><code>#pragma STDC CX_LIMITED_RANGE</code></li>
</ul>
<p>Their semantics are specified elsewhere.</p>

<h3>Predefined macros</h3>

<p>Although the conditionally-defined macros added to C99 represent a fair amount 
  of work in specification and/or coordination, my recommendation would be to adopt 
  them into C++. <code>__STDC_HOSTED__</code> is comparatively easy, and makes as much 
  sense for C++ as it does for C.</p>

<p>Clearly the description of <code>__cplusplus</code> in the C++ standard should 
  not be synchronized with C. <code>__STDC_VERSION__</code> might be trivially (and 
  usefully?) defined to have the same value as <code>__cplusplus</code>. With respect 
  to <code>__STDC__</code>, perhaps existing practice should be surveyed.</p>

<h4>Changes made to C99</h4>

<p>16.8&#182;1 (6.10.8): New macros were added:</p>

<dl>
<dt><code>__STDC_HOSTED__</code></dt>
<dd>Indicates whether or not the implementation is hosted.</dd>
<dt><code>__STDC_VERSION__</code></dt>
<dd>A year-month standard version number.</dd>
</dl>
<p>Several editorial clarifications are also applied.</p>

<p>16.8 -- new paragraph (6.10.8&#182;2): New conditionally-defined macros were added:</p>

<dl>
<dt><code>__STDC_IEC_559__</code></dt>
<dd>Indicates whether or not floating-point arithmetic conforms to IEC 60559 (a.k.a. 
IEEE 754).</dd>
<dt><code>__STDC_IEC_559_COMPLEX__</code></dt>
<dd>Indicates whether or not complex arithmetic conforms to IEC 60559.</dd>
<dt><code>__STDC_ISO_10646__</code></dt>
<dd>A year-month number of the version of ISO 10646 encoded by <code>wchar_t</code>.</dd>
</dl>
<p>16.8 -- new paragraph (6.10.8&#182;5): Added a prohibition against predefining or 
  defining <code>__cplusplus</code>. <em>[This was added more or less as a courtesy, 
  to ensure that <code>__cplusplus</code> could be used to distinguish reliably 
  between C and C++.]</em></p>
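<p>As a rough illustration of how a program might probe these macros, the following 
  sketch assumes a C++11 or later compiler. The names <code>HOSTED_STATE</code> and
  <code>CPLUSPLUS_DEFINED</code> are invented for this example and appear in neither 
  standard.</p>

<pre><code>// Sketch only: HOSTED_STATE and CPLUSPLUS_DEFINED are illustrative names.
#if defined(__STDC_HOSTED__)
#define HOSTED_STATE __STDC_HOSTED__   // 1 if hosted, 0 if freestanding
#else
#define HOSTED_STATE (-1)              // macro not provided by this implementation
#endif

// __cplusplus may not be defined by a conforming C implementation,
// so its presence distinguishes a C++ translation from a C one.
#if defined(__cplusplus)
#define CPLUSPLUS_DEFINED 1
#else
#define CPLUSPLUS_DEFINED 0
#endif

static_assert(HOSTED_STATE >= -1, "HOSTED_STATE is -1, 0 or 1");
</code></pre>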

<h4>Changes made to C++</h4>

<p>16.8&#182;1 (6.10.8):</p>

<dl>
<dt><code>__STDC__</code></dt>
<dd>The state was changed to be implementation-defined.</dd>
<dt><code>__cplusplus</code></dt>
<dd>Added as a year-month version number.</dd>
</dl>
<p>In addition, a restriction on the spellings of any other predefined macros (i.e. 
  that they must begin either with two underscores or an underscore and a capital 
  letter) was deleted. <em>[I believe this was removed due to a general reluctance 
  to state restrictions on implementations using the word &quot;shall&quot;. Other such instances 
  were rephrased, not deleted. It is not clear to me that this particular change 
  is worth preserving.]</em></p>

<h3>Extended integer types</h3>

<p>Probably every hosted C++ implementation already supports 64-bit integers, most 
  by the name <code>long long</code>. So adopting it, along with the other <code>
  &lt;stdint.h&gt;</code> changes, would amount to codification of existing practice. 
  I recommend it.</p>

<h4>Changes made to C99</h4>

<p>16.1&#182;4 (6.10.1): <code>long</code> and <code>unsigned long</code> were replaced 
  by <code>intmax_t</code> and <code>uintmax_t</code>, respectively. Also, integer 
  literals can have other widths than <code>int</code> or <code>long</code>.</p>
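<p>A minimal sketch of the practical effect, assuming a preprocessor that follows the 
  C99 rule (as C++11 and later do): a condition whose operands do not fit in a 32-bit
  <code>long</code> is still evaluated exactly, because <code>intmax_t</code> has at 
  least 64 bits. <code>WIDE_IF_ARITHMETIC</code> is an invented name.</p>

<pre><code>// Sketch only: WIDE_IF_ARITHMETIC is an illustrative name.
// 4294967296 is 2^32, which a 32-bit long cannot represent.
#if 4294967296 > 2147483647
#define WIDE_IF_ARITHMETIC 1
#else
#define WIDE_IF_ARITHMETIC 0
#endif
</code></pre>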

<h3>Pragma operator</h3>

<p>This is a very simple change; there is no interaction with the rest of the language. 
  It should be adopted.</p>

<h4>Changes made to C99</h4>

<p>16.3.4&#182;3 (6.10.3.4): Added a statement that pragma operators are processed 
  after macro expansion.</p>

<p>16.9 -- new section (6.10.9): Added description of pragma operator:</p>

<dl>
<dd><code>_Pragma (</code> <var>string-literal</var> <code>)</code></dd>
</dl>
<p>2.1 phase 4 (5.1.1.2): Added a statement that pragma operators are interpreted.</p>
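<p>The main use of the operator is that, unlike a <code>#pragma</code> directive, it 
  can be produced by a macro. The sketch below assumes a C++11 or later compiler; 
  the macro names are invented, and the <code>GCC diagnostic</code> pragmas are an 
  assumption that holds only for GCC-compatible compilers, which is why they are 
  guarded.</p>

<pre><code>// Sketch only: PRAGMA, DIAG_PUSH, DIAG_POP and guarded_value are illustrative names.
#define PRAGMA(x) _Pragma(#x)          // stringize the argument, then apply _Pragma

#if defined(__GNUC__)
#define DIAG_PUSH PRAGMA(GCC diagnostic push)
#define DIAG_POP  PRAGMA(GCC diagnostic pop)
#else
#define DIAG_PUSH                      // expands to nothing on other compilers
#define DIAG_POP
#endif

DIAG_PUSH
constexpr int guarded_value() { return 42; }
DIAG_POP
</code></pre>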

<h3>Variadic macros and empty macro arguments</h3>

<p>Paul Mensonides made this proposal in isolation at the Kona meeting. I trumped 
  it by suggesting this grander unification before many people had a chance to comment 
  on this aspect specifically. This is unquestionably the largest change under consideration. 
  Along with Paul, I recommend it.</p>

<h4>Changes made to C99</h4>

<p>16 <var>control-line</var> grammar rule (6.10): Alternatives were added with 
  an ellipsis before the close parenthesis.</p>

<p>16.3&#182;4 (6.10.3): A variadic macro may be invoked with more arguments than the 
  definition has parameters.</p>

<p>16.3 -- new paragraph (6.10.3&#182;5): <code>__VA_ARGS__</code> may be used only 
  in the definition of a variadic macro.</p>

<p>16.3&#182;9 (6.10.3): Alternatives were added with an ellipsis before the close 
  parenthesis.</p>

<p>16.3&#182;10 (6.10.3): Removed statement that empty macro arguments yield undefined 
  behavior.</p>

<p>16.3 -- new paragraph (6.10.3&#182;12): Added description of argument collection 
  for variadic macros.</p>

<p>16.3.1 -- new paragraph (6.10.3.1&#182;2): <code>__VA_ARGS__</code> is an implicit 
  parameter of a variadic macro.</p>

<p>16.3.2&#182;2 (6.10.3.2): Added definition of the result of stringizing an empty 
  macro argument.</p>

<p>16.3.3&#182;2-3 (6.10.3.3): Added definition of token-pasting with an empty macro 
  argument.</p>

<p>16.3.3 -- new paragraph (6.10.3.3&#182;4): A token-pasting example was added.</p>

<p>16.3.5&#182;5 (6.10.3.5): Examples of token-pasting and stringizing with empty macro 
  arguments were added.</p>

<p>16.3.5 -- new paragraph (6.10.3.5&#182;7): More examples of token-pasting with empty 
  macro arguments.</p>

<p>16.3.5 -- new paragraph (6.10.3.5&#182;9): Examples using variadic macros.</p>
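<p>The changes listed above can be sketched together as follows, assuming a compiler 
  that implements the C99 preprocessor rules (C++11 or later). All macro and function 
  names are invented for illustration; they are not the examples from the C standard.</p>

<pre><code>// Sketch only: CALL, SHOW, STR, CAT and add3 are illustrative names.
#define CALL(f, ...) f(__VA_ARGS__)    // forwards the trailing arguments to f
#define SHOW(...) #__VA_ARGS__         // stringizes the whole argument list
#define STR(x) #x                      // stringizes a single argument
#define CAT(a, b) a ## b               // pastes two arguments together

constexpr int add3(int a, int b, int c) { return a + b + c; }

static_assert(CALL(add3, 1, 2, 3) == 6, "variadic arguments are forwarded");
static_assert(sizeof(SHOW(a, b)) == 5, "\"a, b\" plus the terminating null");
static_assert(sizeof(STR()) == 1, "an empty argument stringizes to \"\"");
static_assert(CAT(, 5) == 5, "pasting with an empty argument keeps the other operand");
</code></pre>

<p>The checks run at compile time, so the translation unit either compiles or reports 
  which assumption about the preprocessor failed.</p>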

<h3>String literal concatenation</h3>

<p>This change should be adopted. Note that, since the Technical Report on extensions 
  for new character data types (<a href="http://anubis.dkuug.dk/JTC1/SC22/WG14/www/docs/n1040.pdf">WG14/N1040</a>) 
  has new kinds of string literals, its rules are slightly different, although analogous.</p>

<h4>Changes made to C99</h4>

<p>2.1 phase 6 (5.1.1.2): If adjacent string literals are of different types, 
  the result of concatenation is a wide string literal.</p>
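<p>A small sketch of the rule, assuming a compiler that follows it (C++11 and later 
  do): concatenating a narrow and a wide literal yields a wide literal, so its size 
  is a multiple of <code>sizeof(wchar_t)</code>. <code>mixed_length</code> is an 
  invented name.</p>

<pre><code>// Sketch only: mixed_length is an illustrative name.
// "ab" L"cd" is treated as L"abcd": four wide characters plus the terminator.
constexpr auto mixed_length = sizeof("ab" L"cd") / sizeof(wchar_t);

static_assert(mixed_length == 5, "mixed concatenation produces a wide string literal");
</code></pre>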

<h3>Header and include file names</h3>

<p>It is interesting to note that C89 explicitly allowed only letters in header 
  and include file names. C++ added underscores, and C99 added digits. Probably 
  both standards should allow both.</p>

<p>I have no idea why C99 dropped the requirement that the implementation document 
  the mapping to external file names. But there is probably no practical impact, 
  so by default C++ should probably drop it as well.</p>

<h4>Changes made to C99</h4>

<p>16.2&#182;5 (6.10.2): The mapping from header or source file name syntax to external 
  source file names is no longer implementation-defined.</p>

<p>16.2&#182;5 (6.10.2): Non-initial digits are now allowed in include syntax.</p>

<h4>Changes made to C++</h4>

<p>16.2&#182;5 (6.10.2): Underscores are allowed.</p>

<h3>Translation limit changes</h3>

<p>There is probably no support for adopting the lower limit on the significance 
  of a header or include file name from C, even though it has now been increased.</p>

<p>On the other hand, I imagine it was only by oversight that the limitation to 
  15-bit numbers in a <code>#line</code> directive survived into C++. There is certainly 
  no need to preserve it.</p>

<h4>Changes made to C99</h4>

<p>16.2&#182;5 (6.10.2): The lower limit on the significant characters of an include 
  file or header name was raised to eight.</p>

<p>16.4&#182;2 (6.10.4): The lower limit on the number that can be specified in a
  <code>#line</code> directive was raised to 2147483647.</p>
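<p>For illustration, the sketch below uses a line number beyond the old 15-bit range, 
  assuming a compiler that accepts the C99 limit (C++11 and later do).
  <code>line_after_directive</code> is an invented name.</p>

<pre><code>// Sketch only: line_after_directive is an illustrative name.
// The directive renumbers the line that follows it, so __LINE__ there is 100000,
// which exceeds the former maximum of 32767. Later lines are renumbered as well.
#line 100000
constexpr long line_after_directive = __LINE__;
</code></pre>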

<h4>Changes made to C++</h4>

<p>16.2&#182;5 (6.10.2): The standard does not explicitly grant license to limit the 
  number of significant characters in the name of an included file or header.</p>

<h3>Alternative tokens</h3>

<p>This is a considered difference from C, in which these identifier-like alternative 
  token spellings are explicitly implemented as macros. It should be preserved.</p>

<h4>Changes made to C++</h4>

<p>16.1&#182;4 (6.10.1): Added a footnote clarifying that an identifier-like spelling 
  of an alternative token is not replaced by zero in a condition directive.</p>
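<p>A brief sketch of what the footnote means in practice, assuming a conforming C++ 
  compiler (some compilers need a conformance option before they recognize these 
  spellings). <code>ALT_TOKENS_ARE_OPERATORS</code> is an invented name.</p>

<pre><code>// Sketch only: ALT_TOKENS_ARE_OPERATORS is an illustrative name.
// In C++, "and" and "not" are operators here, so the condition is 1 and (not 0).
// If they were ordinary identifiers they would be replaced by zero instead.
#if 1 and not 0
#define ALT_TOKENS_ARE_OPERATORS 1
#else
#define ALT_TOKENS_ARE_OPERATORS 0
#endif
</code></pre>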

<h3><code>bool</code> data type</h3>

<p>Although C now has a Boolean type, Boolean-valued operators are still specified 
  as having <code>int</code> results, unlike in C++. Also, in C++ <code>true</code> 
  and <code>false</code> are not defined as macros. So this difference is still 
  justified.</p>

<h4>Changes made to C++</h4>

<p>16.1&#182;4 (6.10.1): In a condition directive, <code>true</code> and <code>false</code> 
  are not replaced by zero, and <code>bool</code>-typed subexpressions are immediately 
  integral-promoted.</p>
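<p>The difference can be observed directly; the following is a sketch assuming a 
  conforming C++ compiler, with invented macro names. In C99 without
  <code>&lt;stdbool.h&gt;</code>, both conditions would instead see an ordinary 
  identifier and evaluate it as zero.</p>

<pre><code>// Sketch only: TRUE_IN_IF and FALSE_IN_IF are illustrative names.
#if true
#define TRUE_IN_IF 1                   // taken in C++: true has the value 1
#else
#define TRUE_IN_IF 0
#endif

#if false
#define FALSE_IN_IF 1
#else
#define FALSE_IN_IF 0                  // taken in C++: false has the value 0
#endif
</code></pre>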

<h3>Template instantiation</h3>

<p>This change is obviously still justified.</p>

<h4>Changes made to C++</h4>

<p>2.1 phase 8 -- new phase (5.1.1.2): Template instantiation was inserted between 
  parsing/translation and linking.</p>

<h2>Terminology changes</h2>

<p>Unless someone would like to convince either committee to adopt terms from the 
  other, these are simply areas where the committees have agreed to disagree. I 
  recommend no changes.</p>

<h3>Changes made to C99</h3>

<p>&quot;integral constant expression&quot; was changed to &quot;integer constant expression&quot;.</p>

<p>&quot;comprise&quot; was changed to &quot;compose&quot;.</p>

<p>&quot;preprocessing translation unit&quot; was added, referring to a translation unit before 
  macro expansion. </p>

<h3>Changes made to C++</h3>

<p>&quot;character constant&quot; was changed to &quot;character literal&quot;.</p>

<p>The implication of &quot;shall&quot; in a Semantics paragraph of the C standard is spelled 
  out as &quot;undefined behavior&quot;.</p>

<p>When &quot;shall&quot; was used to express a requirement on an implementation, the requirement 
  was rewritten.</p>

<p>16.3&#182;2-3 (6.10.3): Constraints on macro redefinition were made explicit using 
  &quot;ill-formed&quot;.</p>

<h2>Editorial changes</h2>

<p>Although I frankly do not see the point of a few of the changes made to C99, 
  for simplicity I recommend that they all be adopted, including the small edits.</p>

<p>The changes made to C++ should be forwarded to the C committee for their consideration.</p>

<h3>Changes made to C99</h3>

<p>16&#182;1 (6.10): Clarifications were added with respect to translation phases (specifically, 
  processing of comments and expansion of macros). An accompanying example was added 
  as a new paragraph immediately before 16.1.</p>

<p>16 grammar rules (6.10): New rules were added for <var>text-line</var> and
  <var>non-directive</var>, and <var>group-part</var> was changed to use them, to 
  clarify (for example) that any line beginning with <code>#</code> is interpreted 
  as a directive (even though it also matches the grammar of a non-directive line). 
  Two new accompanying text paragraphs were also added before 16&#182;2.</p>

<p>16.3 -- new paragraph (6.10.3&#182;3): Added a requirement for white-space after 
  the macro name in an object-like macro definition.</p>

<p>16.3.4&#182;1 (6.10.3.4): Added clarification that token-pasting and stringizing 
  precede rescanning. Also minor editorial changes.</p>

<p>16.3.5&#182;1 (6.10.3.5): Added clarification that macros are not used after translation 
  phase 4.</p>

<p>16.6&#182;1 (6.10.6): Added clarification that (non-standard) pragmas may cause 
  translation failure or non-conforming behavior.</p>

<p>2.1 phase 1 (5.1.1.2): Clarified that source may contain multibyte characters.</p>

<p>2.1 phase 2 (5.1.1.2): Clarified that a line that ends with two backslashes 
  cannot result in two line-splices.</p>

<p>2.1 phase 4 (5.1.1.2): Clarified that preprocessing directives do not survive 
  past phase 4.</p>

<p>2.1 phase 5 (5.1.1.2): The mapping to the execution character set was clarified: 
  a character not in the execution set must not be mapped to a null character, but 
  different missing characters may be mapped to different execution characters.</p>

<p>2.1 phase 7 (5.1.1.2): Added clarification that the results of preprocessing 
  are translated &quot;as a translation unit&quot;.</p>

<p>Several examples were changed to include &quot;C++-style&quot; comments.</p>

<p>16 grammar rules (6.10): The definition of <var>lparen</var> was tweaked.</p>

<p>16.3&#182;2-3 (6.10.3): Definitions of <dfn>object-like</dfn> and <dfn>function-like</dfn> 
  macro were moved down, and forward-referenced from here. Constraints were made 
  explicit using &quot;shall&quot;. Paragraphs were joined into one.</p>

<p>16.3.1&#182;1 (6.10.3.1): &quot;translation unit&quot; changed to &quot;preprocessing file&quot;.</p>

<p>16.3.3&#182;2 (6.10.3.3): Clarify that special case for parameters in token-pasting 
  applies only in function-like macros.</p>

<p>16.3.4&#182;2 (6.10.3.4): Change &quot;Further&quot; to &quot;Furthermore&quot;.</p>

<p>16.3.5&#182;6 (6.10.3.5): A comment referring (misleadingly) to a previous example 
  was deleted.</p>

<p>2.1 phase 2 (5.1.1.2): The description of an escaped new-line was rearranged.</p>

<p>2.1 phase 3 (5.1.1.2): Added &quot;in a&quot; in &quot;or in a partial comment.&quot;</p>

<h3>Changes made to C++</h3>

<p>16.1&#182;2 (6.10.1): Added a restriction that only valid tokens may appear in a 
  condition directive.</p>

<p>16.1&#182;4 (6.10.1): Added clarification that (most) keywords are replaced by zero 
  in a condition directive.</p>

<p>16.3&#182;8 (6.10.3): Added clarification that object-like macros are rescanned.</p>

<p>16.5&#182;1 (6.10.5): Added a statement that <code>#error</code> causes a program 
  to be ill-formed.</p>

<p>2.1 phase 3 (5.1.1.2): The footnote pointing out the context-dependent nature 
  of tokenization (specifically with respect to header names) was made normative.</p>

<p>2.1 phase 7 (5.1.1.2): A note was added clarifying that there need not be a 
  one-to-one correspondence between (for example) source files and external file 
  system files.</p>

<p>2.3&#182;1 (5.2.1.1): Added clarification that trigraphs are recognized before preprocessing.</p>

<p>2.3&#182;1 (5.2.1.1): Added an example using several trigraphs. Deleted the example 
  demonstrating a boundary condition (<code>???/</code>).</p>

<p>16&#182;1 (6.10): Modified to break up a very long sentence.</p>

<p>16.1&#182;1 (6.10.1): Spelled out &quot;0&quot; as &quot;zero&quot;.</p>

<p>16.3&#182;9 (6.10.3): &quot;arguments&quot; was replaced with &quot;parameters&quot;.</p>

<p>2.3&#182;1 (5.2.1.1): Description of trigraph processing changed from plural (collective) 
  to singular (distributive). Also, trigraph sequences were formatted into a table.</p>

</body>

</html>