<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Issue 356: Meaning of ctype_base::mask enumerators</title>
<meta property="og:title" content="Issue 356: Meaning of ctype_base::mask enumerators">
<meta property="og:description" content="C++ library issue. Status: NAD">
<meta property="og:url" content="https://cplusplus.github.io/LWG/issue356.html">
<meta property="og:type" content="website">
<meta property="og:image" content="http://cplusplus.github.io/LWG/images/cpp_logo.png">
<meta property="og:image:alt" content="C++ logo">
<style>
  p {text-align:justify}
  li {text-align:justify}
  pre code.backtick::before { content: "`" }
  pre code.backtick::after { content: "`" }
  blockquote.note
  {
    background-color:#E0E0E0;
    padding-left: 15px;
    padding-right: 15px;
    padding-top: 1px;
    padding-bottom: 1px;
  }
  ins {background-color:#A0FFA0}
  del {background-color:#FFA0A0}
  table.issues-index { border: 1px solid; border-collapse: collapse; }
  table.issues-index th { text-align: center; padding: 4px; border: 1px solid; }
  table.issues-index td { padding: 4px; border: 1px solid; }
  table.issues-index td:nth-child(1) { text-align: right; }
  table.issues-index td:nth-child(2) { text-align: left; }
  table.issues-index td:nth-child(3) { text-align: left; }
  table.issues-index td:nth-child(4) { text-align: left; }
  table.issues-index td:nth-child(5) { text-align: center; }
  table.issues-index td:nth-child(6) { text-align: center; }
  table.issues-index td:nth-child(7) { text-align: left; }
  table.issues-index td:nth-child(5) span.no-pr { color: red; }
  @media (prefers-color-scheme: dark) {
     html {
        color: #ddd;
        background-color: black;
     }
     ins {
        background-color: #225522
     }
     del {
        background-color: #662222
     }
     a {
        color: #6af
     }
     a:visited {
        color: #6af
     }
     blockquote.note
     {
        background-color: rgba(255, 255, 255, .10)
     }
  }
</style>
</head>
<body>
<hr>
<p><em>This page is a snapshot from the LWG issues list, see the <a href="lwg-active.html">Library Active Issues List</a> for more information and the meaning of <a href="lwg-active.html#NAD">NAD</a> status.</em></p>
<h3 id="356"><a href="lwg-closed.html#356">356</a>. Meaning of ctype_base::mask enumerators</h3>
<p><b>Section:</b> 28.3.4.2 <a href="https://wg21.link/category.ctype">[category.ctype]</a> <b>Status:</b> <a href="lwg-active.html#NAD">NAD</a>
 <b>Submitter:</b> Matt Austern <b>Opened:</b> 2002-01-23 <b>Last modified:</b> 2016-01-28</p>
<p><b>Priority: </b>Not Prioritized
</p>
<p><b>View all other</b> <a href="lwg-index.html#category.ctype">issues</a> in [category.ctype].</p>
<p><b>View all issues with</b> <a href="lwg-status.html#NAD">NAD</a> status.</p>
<p><b>Discussion:</b></p>

<p>What should the following program print?</p>

<pre>
  #include &lt;locale&gt;
  #include &lt;iostream&gt;

  class my_ctype : public std::ctype&lt;char&gt;
  {
    typedef std::ctype&lt;char&gt; base;
  public:
    my_ctype(std::size_t refs = 0) : base(my_table, false, refs)
    {
      std::copy(base::classic_table(), base::classic_table() + base::table_size,
                my_table);
      my_table[(unsigned char) '_'] = (base::mask) (base::print | base::space);
    }
  private:
    mask my_table[base::table_size];
  };

  int main()
  {
    my_ctype ct;
    std::cout &lt;&lt; "isspace: " &lt;&lt; ct.is(std::ctype_base::space, '_') &lt;&lt; "    "
              &lt;&lt; "isalpha: " &lt;&lt; ct.is(std::ctype_base::alpha, '_') &lt;&lt; std::endl;
  }
</pre>

<p>The goal is to create a facet where '_' is treated as whitespace.</p>

<p>On gcc 3.0, this program prints "isspace: 1 isalpha: 0".  On
Microsoft C++ it prints "isspace: 1 isalpha: 1".</p>

<p>
I believe that both implementations are legal, and the standard does not
give enough guidance for users to be able to use std::ctype's
protected interface portably.</p>

<p>
The above program assumes that ctype_base::mask enumerators like
<code>space</code> and <code>print</code> are disjoint, and that the way to
say that a character is both a space and a printing character is to or
those two enumerators together.  This is suggested by the "exposition
only" values in 28.3.4.2 <a href="https://wg21.link/category.ctype">[category.ctype]</a>, but it is nowhere specified in
normative text.  An alternative interpretation is that the more
specific categories subsume the less specific.  The above program
gives the results it does on the Microsoft compiler because, on that
compiler, <code>print</code> has all the bits set for each specific
printing character class.
</p>

<p>From the point of view of std::ctype's public interface, there's no
important difference between these two techniques.  From the point of
view of the protected interface, there is.  If I'm defining a facet
that inherits from std::ctype&lt;char&gt;, I'm the one who defines the
value that table()['a'] returns.  I need to know what combination of
mask values I should use.  This isn't so very esoteric: it's exactly
why std::ctype has a protected interface.  If we care about users
being able to write their own ctype facets, we have to give them a
portable way to do it.
</p>

<p>
Related reflector messages:
lib-9224, lib-9226, lib-9229, lib-9270, lib-9272, lib-9273, lib-9274,
lib-9277, lib-9279.
</p>

<p>Issue <a href="lwg-defects.html#339" title="definition of bitmask type restricted to clause 27 (Status: CD1)">339</a><sup><a href="https://cplusplus.github.io/LWG/issue339" title="Latest snapshot">(i)</a></sup> is related, but not identical.  The
proposed resolution if issue <a href="lwg-defects.html#339" title="definition of bitmask type restricted to clause 27 (Status: CD1)">339</a><sup><a href="https://cplusplus.github.io/LWG/issue339" title="Latest snapshot">(i)</a></sup> says that
ctype_base::mask must be a bitmask type. It does not say that the
ctype_base::mask elements are bitmask elements, so it doesn't
directly affect this issue.</p>

<p>More comments from Benjamin Kosnik, who believes that 
that C99 compatibility essentially requires what we're
calling option 1 below.</p>

<blockquote>
<pre>
I think the C99 standard is clear, that isspace -&gt; !isalpha.
--------

#include &lt;locale&gt;
#include &lt;iostream&gt;

class my_ctype : public std::ctype&lt;char&gt;
{
private:
  typedef std::ctype&lt;char&gt; base;
  mask my_table[base::table_size];

public:
  my_ctype(std::size_t refs = 0) : base(my_table, false, refs)
  {
    std::copy(base::classic_table(), base::classic_table() + base::table_size,
              my_table);
    mask both = base::print | base::space;
    my_table[static_cast&lt;mask&gt;('_')] = both;
  }
};

int main()
{
  using namespace std;
  my_ctype ct;
  cout &lt;&lt; "isspace: " &lt;&lt; ct.is(ctype_base::space, '_') &lt;&lt; endl;
  cout &lt;&lt; "isprint: " &lt;&lt; ct.is(ctype_base::print, '_') &lt;&lt; endl;

  // ISO C99, isalpha iff upper | lower set, and !space.
  // 7.5, p 193
  // -&gt; looks like g++ behavior is correct.
  // 356 -&gt; bitmask elements are required for ctype_base
  // 339 -&gt; bitmask type required for mask
  cout &lt;&lt; "isalpha: " &lt;&lt; ct.is(ctype_base::alpha, '_') &lt;&lt; endl;
}
</pre>
</blockquote>



<p id="res-356"><b>Proposed resolution:</b></p>
<p>Informally, we have three choices:</p> 
<ol>
<li>Require that the enumerators are disjoint (except for alnum and
graph)</li>
<li>Require that the enumerators are not disjoint, and specify which
of them subsume which others.  (e.g. mandate that lower includes alpha
and print)</li>
<li>Explicitly leave this unspecified, which the result that the above
program is not portable.</li>
</ol>

<p>Either of the first two options is just as good from the standpoint
of portability.  Either one will require some implementations to
change.</p>


<p><b>Rationale:</b></p>
<p>The LWG agrees that this is a real ambiguity, and that both
interpretations are conforming under the existing standard. However,
there's no evidence that it's causing problems for real users. Users
who want to define ctype facets portably can test the ctype_base masks
to see which interpretation is being used.</p>





</body>
</html>
