<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Issue 303: Bitset input operator underspecified</title>
<meta property="og:title" content="Issue 303: Bitset input operator underspecified">
<meta property="og:description" content="C++ library issue. Status: CD1">
<meta property="og:url" content="https://cplusplus.github.io/LWG/issue303.html">
<meta property="og:type" content="website">
<meta property="og:image" content="http://cplusplus.github.io/LWG/images/cpp_logo.png">
<meta property="og:image:alt" content="C++ logo">
<style>
  p {text-align:justify}
  li {text-align:justify}
  pre code.backtick::before { content: "`" }
  pre code.backtick::after { content: "`" }
  blockquote.note
  {
    background-color:#E0E0E0;
    padding-left: 15px;
    padding-right: 15px;
    padding-top: 1px;
    padding-bottom: 1px;
  }
  ins {background-color:#A0FFA0}
  del {background-color:#FFA0A0}
  table.issues-index { border: 1px solid; border-collapse: collapse; }
  table.issues-index th { text-align: center; padding: 4px; border: 1px solid; }
  table.issues-index td { padding: 4px; border: 1px solid; }
  table.issues-index td:nth-child(1) { text-align: right; }
  table.issues-index td:nth-child(2) { text-align: left; }
  table.issues-index td:nth-child(3) { text-align: left; }
  table.issues-index td:nth-child(4) { text-align: left; }
  table.issues-index td:nth-child(5) { text-align: center; }
  table.issues-index td:nth-child(6) { text-align: center; }
  table.issues-index td:nth-child(7) { text-align: left; }
  table.issues-index td:nth-child(5) span.no-pr { color: red; }
  @media (prefers-color-scheme: dark) {
     html {
        color: #ddd;
        background-color: black;
     }
     ins {
        background-color: #225522
     }
     del {
        background-color: #662222
     }
     a {
        color: #6af
     }
     a:visited {
        color: #6af
     }
     blockquote.note
     {
        background-color: rgba(255, 255, 255, .10)
     }
  }
</style>
</head>
<body>
<hr>
<p><em>This page is a snapshot from the LWG issues list, see the <a href="lwg-active.html">Library Active Issues List</a> for more information and the meaning of <a href="lwg-active.html#CD1">CD1</a> status.</em></p>
<h3 id="303"><a href="lwg-defects.html#303">303</a>. Bitset input operator underspecified</h3>
<p><b>Section:</b> 22.9.4 <a href="https://wg21.link/bitset.operators">[bitset.operators]</a> <b>Status:</b> <a href="lwg-active.html#CD1">CD1</a>
 <b>Submitter:</b> Matt Austern <b>Opened:</b> 2001-02-05 <b>Last modified:</b> 2016-01-28</p>
<p><b>Priority: </b>Not Prioritized
</p>
<p><b>View all other</b> <a href="lwg-index.html#bitset.operators">issues</a> in [bitset.operators].</p>
<p><b>View all issues with</b> <a href="lwg-status.html#CD1">CD1</a> status.</p>
<p><b>Discussion:</b></p>
<p>
In 23.3.5.3, we are told that <code>bitset</code>'s input operator
&quot;Extracts up to <i>N</i> (single-byte) characters from
<i>is</i>.&quot;, where <i>is</i> is a stream of type
<code>basic_istream&lt;charT, traits&gt;</code>.
</p>

<p>
The standard does not say what it means to extract single byte
characters from a stream whose character type, <code>charT</code>, is in
general not a single-byte character type.  Existing implementations
differ.
</p>

<p>
A reasonable solution will probably involve <code>widen()</code> and/or
<code>narrow()</code>, since they are the supplied mechanism for
converting a single character between <code>char</code> and 
arbitrary <code>charT</code>.
</p>

<p>Narrowing the input characters is not the same as widening the
literals <code>'0'</code> and <code>'1'</code>, because there may be some
locales in which more than one wide character maps to the narrow
character <code>'0'</code>.  Narrowing means that alternate
representations may be used for bitset input, widening means that
they may not be.</p>

<p>Note that for numeric input, <code>num_get&lt;&gt;</code>
(22.2.2.1.2/8) compares input characters to widened version of narrow
character literals.</p>

<p>From Pete Becker, in c++std-lib-8224:</p>
<blockquote>
<p>
Different writing systems can have different representations for the
digits that represent 0 and 1. For example, in the Unicode representation
of the Devanagari script (used in many of the Indic languages) the digit 0
is 0x0966, and the digit 1 is 0x0967. Calling narrow would translate those
into '0' and '1'. But Unicode also provides the ASCII values 0x0030 and
0x0031 for for the Latin representations of '0' and '1', as well as code
points for the same numeric values in several other scripts (Tamil has no
character for 0, but does have the digits 1-9), and any of these values
would also be narrowed to '0' and '1'.
</p>

<p>...</p>

<p>
It's fairly common to intermix both native and Latin
representations of numbers in a document. So I think the rule has to be
that if a wide character represents a digit whose value is 0 then the bit
should be cleared; if it represents a digit whose value is 1 then the bit
should be set; otherwise throw an exception. So in a Devanagari locale,
both 0x0966 and 0x0030 would clear the bit, and both 0x0967 and 0x0031
would set it. Widen can't do that. It would pick one of those two values,
and exclude the other one.
</p>

</blockquote>

<p>From Jens Maurer, in c++std-lib-8233:</p>

<blockquote>
<p>
Whatever we decide, I would find it most surprising if
bitset conversion worked differently from int conversion
with regard to alternate local representations of
numbers.
</p>

<p>Thus, I think the options are:</p>
<ul>
 <li> Have a new defect issue for 22.2.2.1.2/8 so that it will
require the use of narrow().</li>

 <li> Have a defect issue for bitset() which describes clearly
that widen() is to be used.</li>
</ul>
</blockquote>



<p id="res-303"><b>Proposed resolution:</b></p>

    <p>Replace the first two sentences of paragraph 5 with:</p>

    <blockquote><p>
    Extracts up to <i>N</i> characters from <i>is</i>. Stores these
    characters in a temporary object <i>str</i> of type
    <code>basic_string&lt;charT, traits&gt;</code>, then evaluates the
    expression <code><i>x</i> = bitset&lt;N&gt;(<i>str</i>)</code>.
    </p></blockquote>

    <p>Replace the third bullet item in paragraph 5 with:</p>
    <ul><li>
    the next input character is neither <code><i>is</i>.widen(0)</code>
    nor <code><i>is</i>.widen(1)</code> (in which case the input character
    is not extracted).
    </li></ul>



<p><b>Rationale:</b></p>
<p>Input for <code>bitset</code> should work the same way as numeric
input.  Using <code>widen</code> does mean that alternative digit
representations will not be recognized, but this was a known 
consequence of the design choice.</p>





</body>
</html>
