<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Issue 2018: [CD] regex_traits::isctype Returns clause is wrong</title>
<meta property="og:title" content="Issue 2018: [CD] regex_traits::isctype Returns clause is wrong">
<meta property="og:description" content="C++ library issue. Status: C++14">
<meta property="og:url" content="https://cplusplus.github.io/LWG/issue2018.html">
<meta property="og:type" content="website">
<meta property="og:image" content="http://cplusplus.github.io/LWG/images/cpp_logo.png">
<meta property="og:image:alt" content="C++ logo">
<style>
  p {text-align:justify}
  li {text-align:justify}
  pre code.backtick::before { content: "`" }
  pre code.backtick::after { content: "`" }
  blockquote.note
  {
    background-color:#E0E0E0;
    padding-left: 15px;
    padding-right: 15px;
    padding-top: 1px;
    padding-bottom: 1px;
  }
  ins {background-color:#A0FFA0}
  del {background-color:#FFA0A0}
  table.issues-index { border: 1px solid; border-collapse: collapse; }
  table.issues-index th { text-align: center; padding: 4px; border: 1px solid; }
  table.issues-index td { padding: 4px; border: 1px solid; }
  table.issues-index td:nth-child(1) { text-align: right; }
  table.issues-index td:nth-child(2) { text-align: left; }
  table.issues-index td:nth-child(3) { text-align: left; }
  table.issues-index td:nth-child(4) { text-align: left; }
  table.issues-index td:nth-child(5) { text-align: center; }
  table.issues-index td:nth-child(6) { text-align: center; }
  table.issues-index td:nth-child(7) { text-align: left; }
  table.issues-index td:nth-child(5) span.no-pr { color: red; }
  @media (prefers-color-scheme: dark) {
     html {
        color: #ddd;
        background-color: black;
     }
     ins {
        background-color: #225522
     }
     del {
        background-color: #662222
     }
     a {
        color: #6af
     }
     a:visited {
        color: #6af
     }
     blockquote.note
     {
        background-color: rgba(255, 255, 255, .10)
     }
  }
</style>
</head>
<body>
<hr>
<p><em>This page is a snapshot from the LWG issues list, see the <a href="lwg-active.html">Library Active Issues List</a> for more information and the meaning of <a href="lwg-active.html#C++14">C++14</a> status.</em></p>
<h3 id="2018"><a href="lwg-defects.html#2018">2018</a>. [CD] <code>regex_traits::isctype</code> Returns clause is wrong</h3>
<p><b>Section:</b> 28.6.6 <a href="https://wg21.link/re.traits">[re.traits]</a> <b>Status:</b> <a href="lwg-active.html#C++14">C++14</a>
 <b>Submitter:</b> Jonathan Wakely <b>Opened:</b> 2010-11-16 <b>Last modified:</b> 2016-01-28</p>
<p><b>Priority: </b>Not Prioritized
</p>
<p><b>View all other</b> <a href="lwg-index.html#re.traits">issues</a> in [re.traits].</p>
<p><b>View all issues with</b> <a href="lwg-status.html#C++14">C++14</a> status.</p>
<p><b>Discussion:</b></p>

<p><b>Addresses GB 10</b></p>

<p>28.6.6 <a href="https://wg21.link/re.traits">[re.traits]</a> p. 12 says:</p>

<blockquote><p>
returns true if <code>f</code> bitwise or&#39;ed with the result of calling
<code>lookup_classname</code> with an iterator pair that designates the character
sequence &quot;w&quot; is not equal to <code>0</code> and <code>c == '_'</code>
</p></blockquote>

<p>
If the bitmask value corresponding to &quot;w&quot; has a non-zero value (which
it must do) then the bitwise or with any value is also non-zero, and
so <code>isctype('_', f)</code> returns true for any <code>f</code>. Obviously this is wrong,
since <code>'_'</code> is not in every <code>ctype</code> category.
</p>
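<p>A minimal sketch of the problem, assuming the FDIS wording is read literally as <code>(f | w) != 0</code>. The names <code>class_mask</code>, <code>lookup</code>, <code>fdis_underscore_condition</code> and <code>check_fdis_condition</code> are hypothetical helpers introduced only for this illustration:</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

using traits = std::regex_traits<char>;
using class_mask = traits::char_class_type;

// Hypothetical helper: look up a character class by its narrow name.
class_mask lookup(const traits& t, const std::string& name) {
    return t.lookup_classname(name.begin(), name.end());
}

// The '_' special case as literally worded in the FDIS:
// "f bitwise or'ed with the result of lookup_classname("w") is not equal to 0".
bool fdis_underscore_condition(const traits& t, class_mask f) {
    class_mask w = lookup(t, "w");
    return (f | w) != class_mask();
}

// Checks the claim made in the discussion; aborts if it does not hold.
void check_fdis_condition() {
    traits t;
    // "w" must be recognized, so its mask is non-zero.
    assert(lookup(t, "w") != class_mask());
    // Therefore the condition holds for any f, even "digit" or an empty mask.
    assert(fdis_underscore_condition(t, lookup(t, "digit")));
    assert(fdis_underscore_condition(t, class_mask()));
}
```

<p>Because the mask for "w" is non-zero, the condition cannot distinguish between classes that contain <code>'_'</code> and classes that do not.</p>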

<p>
There&#39;s a similar problem with the phrases that follow in the same paragraph, which discuss the
&quot;blank&quot; character class.
</p>

<p><i>[2011-05-06: Jonathan Wakely comments and provides suggested wording]</i></p>


<p>
DR <a href="lwg-defects.html#2019" title="isblank not supported by std::locale (Status: C++11)">2019</a><sup><a href="https://cplusplus.github.io/LWG/issue2019" title="Latest snapshot">(i)</a></sup> added <code>isblank</code> support to <code>&lt;locale&gt;</code>, which simplifies the
definition of <code>regex_traits::isctype</code> by removing the special case for the "blank" class.
<p/>
My suggestion for 2018 is to add a new table replacing the lists of
recognized names in the Remarks clause of <code>regex_traits::lookup_classname</code>. 
I then refer to that table in the Returns clause of <code>regex_traits::isctype</code> 
to expand on the "in an unspecified manner" wording, which is too vague. The conversion 
can now be described using the "is set" term defined by 16.3.3.3.3 <a href="https://wg21.link/bitmask.types">[bitmask.types]</a> and
the new table to convey the intended relationship between e.g.
[[:digit:]] and <code>ctype_base::digit</code>, which is not actually stated in the
FDIS.
<p/>
The effects of <code>isctype</code> can then most easily be described in code,
given an "exposition only" function prototype to do the not-quite-so-unspecified conversion 
from <code>char_class_type</code> to <code>ctype_base::mask</code>.
<p/>
The core of LWG 2018 is the "bitwise or'ed" wording, which gives the
wrong result: it evaluates to true for all values of <code>f</code>. That is
replaced by the condition <code>(f&amp;x) == x</code> where <code>x</code> is the result of calling
<code>lookup_classname</code> with "w".  I believe that's necessary, because the
"w" class could be implemented by an internal "underscore" class, i.e.
<code>x = _Alnum|_Underscore</code>, in which case <code>(f&amp;x) != 0</code> would give the wrong
result when <code>f==_Alnum</code>: the two masks share the <code>_Alnum</code> bit, so <code>isctype('_', f)</code> would return true for a class that does not contain <code>'_'</code>.
<p/>
The proposed resolution also makes use of <code>ctype::widen</code>, which addresses
the problem that the current wording only talks about "w" and '_' and so assumes that 
<code>charT</code> is <code>char</code>.  There's still room for improvement here:
the regex grammar in 28.6.12 <a href="https://wg21.link/re.grammar">[re.grammar]</a> says that the class names in the
table should always be recognized, implying that e.g. U"digit" should
be recognized by <code>regex_traits&lt;char32_t&gt;</code>, but the specification of
<code>regex_traits::lookup_classname</code> doesn't cover that, only mentioning
<code>char</code> and <code>wchar_t</code>.  Maybe the table should not distinguish narrow and
wide strings, but should just have one column and add wording to say
that <code>regex_traits</code> widens the name as if by using <code>use_facet&lt;ctype&lt;charT&gt;&gt;::widen()</code>.
<p/>
Another possible improvement would be to allow additional
implementation-defined extensions in <code>isctype</code>. An implementation is
allowed to support additional class names in <code>lookup_classname</code>, e.g.
[[:octdigit:]] for [0-7] or [[:bindigit:]] for [01], but the current
definition of <code>isctype</code> provides no way to use them unless <code>ctype_base::mask</code> 
also supports them.
</p>
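<p>The choice of <code>(f&amp;x) == x</code> over <code>(f&amp;x) != 0</code> discussed above can be illustrated with plain bitmasks. This is a sketch under an assumed bit layout: the constants <code>kAlnum</code>, <code>kDigit</code>, <code>kUnderscore</code> and <code>kWord</code> and the helper functions are hypothetical and do not come from any real library.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical internal bit layout, mirroring x = _Alnum|_Underscore in the text.
constexpr std::uint32_t kAlnum      = 1u << 0;
constexpr std::uint32_t kDigit      = 1u << 1;
constexpr std::uint32_t kUnderscore = 1u << 2;
constexpr std::uint32_t kWord       = kAlnum | kUnderscore;  // the "w" class

// True only if every bit of x is set in f (the test used by the resolution).
constexpr bool contains_all(std::uint32_t f, std::uint32_t x) {
    return (f & x) == x;
}

// True if f and x share at least one bit (the weaker test that was rejected).
constexpr bool contains_any(std::uint32_t f, std::uint32_t x) {
    return (f & x) != 0;
}

// Checks the cases discussed in the text; aborts if any fails.
void check_subset_test() {
    // f == _Alnum: the weaker test wrongly treats f as including "w".
    assert(contains_any(kAlnum, kWord));
    assert(!contains_all(kAlnum, kWord));
    // f includes the whole "w" class: both tests agree.
    assert(contains_all(kWord | kDigit, kWord));
    assert(contains_any(kWord | kDigit, kWord));
}
```

<p>With this layout, only the subset test keeps <code>'_'</code> out of a plain alphanumeric class while still accepting it when the whole "w" class is part of <code>f</code>.</p>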

<p><i>[2011-05-10: Alberto and Daniel perform minor fixes in the P&#47;R]</i></p>


<p><i>[
2011 Bloomington
]</i></p>


<p>
Consensus that this looks to be a correct solution, and the presentation as a table is a big improvement.
</p>

<p>
Concern that the middle section wording is a little muddled and confusing; Stefanus volunteered to reword.
</p>

<p><i>[
2013-09 Chicago
]</i></p>


<p>
Stefanus provides improved wording (replaced below).
</p>

<p><i>[
2013-09 Chicago
]</i></p>


<p>
Move as Immediate after reviewing Stefanus's revised wording; apply the new wording to the Working Paper.
</p>



<p id="res-2018"><b>Proposed resolution:</b></p>
<p>This wording is relative to the FDIS.</p>

<ol>
<li><p>Modify 28.6.6 <a href="https://wg21.link/re.traits">[re.traits]</a> p. 10 as indicated:</p>
<blockquote><pre>
template &lt;class ForwardIterator&gt;
  char_class_type lookup_classname(
    ForwardIterator first, ForwardIterator last, bool icase = false) const;
</pre><blockquote><p>
-9- <i>Returns</i>: an unspecified value that represents the character classification named by the character
sequence designated by the iterator range [<code>first</code>,<code>last</code>). If the parameter <code>icase</code> is true then the
returned mask identifies the character classification without regard to the case of the characters being
matched, otherwise it does honor the case of the characters being matched.(footnote 335) The value returned shall
be independent of the case of the characters in the character sequence. If the name is not recognized
then returns a value that compares equal to <code>0</code>.
<p/>
-10- <i>Remarks</i>: For <code>regex_traits&lt;char&gt;</code>, at least the <del>names "d", "w", "s", "alnum", "alpha", "blank",
"cntrl", "digit", "graph", "lower", "print", "punct", "space", "upper" and "xdigit"</del><ins>narrow character
names in Table X</ins> shall be recognized. For <code>regex_traits&lt;wchar_t&gt;</code>, at least the <del>names L"d", L"w", 
L"s", L"alnum", L"alpha", L"blank", L"cntrl", L"digit", L"graph", L"lower", L"print", L"punct", L"space", L"upper" and 
L"xdigit"</del><ins>wide character names in Table X</ins> shall be recognized.
</p></blockquote></blockquote>
</li>

<li><p>Modify 28.6.6 <a href="https://wg21.link/re.traits">[re.traits]</a> p. 12 as indicated:</p>
<blockquote><pre>
bool isctype(charT c, char_class_type f) const;
</pre><blockquote><p>
-11- <i>Effects</i>: Determines if the character <code>c</code> is a member of the character classification represented by <code>f</code>.
<p/>
-12- <i>Returns</i>: <del>Converts <code>f</code> into a value <code>m</code> of type <code>std::ctype_base::mask</code> in an 
 unspecified manner, and returns true if <code>use_facet&lt;ctype&lt;charT&gt; &gt;(getloc()).is(m, c)</code> is true. Otherwise 
 returns true if <code>f</code> bitwise or'ed with the result of calling <code>lookup_classname</code> with an iterator pair that 
 designates the character sequence "w" is not equal to <code>0</code> and <code>c == '_'</code>, or if <code>f</code> bitwise or'ed 
 with the result of calling <code>lookup_classname</code> with an iterator pair that designates the character sequence "blank" 
 is not equal to <code>0</code> and <code>c</code> is one of an implementation-defined subset of the characters for 
 which <code>isspace(c, getloc())</code> returns true, otherwise returns false.</del>
<ins>Given an exposition-only function prototype</ins></p>
<blockquote><pre>
<ins style="text-decoration: none">
  template&lt;class C&gt;
   ctype_base::mask convert(typename regex_traits&lt;C&gt;::char_class_type f);
</ins>
</pre></blockquote>
<p>
<ins>that returns a value in which each <code>ctype_base::mask</code> value corresponding to a value in <code>f</code> named in Table <i>X</i> is set,
then the result is determined as if by:</ins>
</p>
<blockquote><pre>
<ins style="text-decoration: none">
ctype_base::mask m = convert&lt;charT&gt;(f);
const ctype&lt;charT&gt;&amp; ct = use_facet&lt;ctype&lt;charT&gt;&gt;(getloc());
if (ct.is(m, c)) {
  return true;
} else if (c == ct.widen('_')) {
  charT w[1] = { ct.widen('w') };
  char_class_type x = lookup_classname(w, w+1);
  
  return (f&amp;x) == x;
} else {
  return false;
} 
</ins>
</pre></blockquote>
<p><ins>[<i>Example</i>:</ins></p>

<blockquote><pre><ins style="text-decoration: none">
regex_traits&lt;char&gt; t;
string d("d");
string u("upper");
regex_traits&lt;char&gt;::char_class_type f;
f = t.lookup_classname(d.begin(), d.end());
f |= t.lookup_classname(u.begin(), u.end());
ctype_base::mask m = convert&lt;char&gt;(f); // m == ctype_base::digit|ctype_base::upper
</ins></pre></blockquote>
<p><ins>&mdash; <i>end example</i>]</ins></p>
<p><ins>[<i>Example</i>:</ins></p>
<blockquote><pre><ins style="text-decoration: none">
regex_traits&lt;char&gt; t;
string w("w");
regex_traits&lt;char&gt;::char_class_type f;
f = t.lookup_classname(w.begin(), w.end());
t.isctype('A', f); // returns true
t.isctype('_', f); // returns true
t.isctype(' ', f); // returns false
</ins></pre></blockquote>
<p><ins>&mdash; <i>end example</i>]</ins></p>
</blockquote></blockquote>
</li>

<li><p>At the end of 28.6.6 <a href="https://wg21.link/re.traits">[re.traits]</a> add a new "Table X &mdash; Character class names and corresponding ctype masks":</p>

<blockquote>
<table border="1">
<caption>Table X &mdash; Character class names and corresponding ctype masks</caption>

<tr>
<th>Narrow character name</th>
<th>Wide character name</th>
<th>Corresponding <code>ctype_base::mask</code> value</th>
</tr>
 
<tr>
<td><code>"alnum"</code></td>
<td><code>L"alnum"</code></td>
<td><code>ctype_base::alnum</code></td>
</tr>

<tr>
<td><code>"alpha"</code></td>
<td><code>L"alpha"</code></td>
<td><code>ctype_base::alpha</code></td>
</tr>

<tr>
<td><code>"blank"</code></td>
<td><code>L"blank"</code></td>
<td><code>ctype_base::blank</code></td>
</tr>

<tr>
<td><code>"cntrl"</code></td>
<td><code>L"cntrl"</code></td>
<td><code>ctype_base::cntrl</code></td>
</tr>

<tr>
<td><code>"digit"</code></td>
<td><code>L"digit"</code></td>
<td><code>ctype_base::digit</code></td>
</tr>

<tr>
<td><code>"d"</code></td>
<td><code>L"d"</code></td>
<td><code>ctype_base::digit</code></td>
</tr>

<tr>
<td><code>"graph"</code></td>
<td><code>L"graph"</code></td>
<td><code>ctype_base::graph</code></td>
</tr>

<tr>
<td><code>"lower"</code></td>
<td><code>L"lower"</code></td>
<td><code>ctype_base::lower</code></td>
</tr>

<tr>
<td><code>"print"</code></td>
<td><code>L"print"</code></td>
<td><code>ctype_base::print</code></td>
</tr>

<tr>
<td><code>"punct"</code></td>
<td><code>L"punct"</code></td>
<td><code>ctype_base::punct</code></td>
</tr>

<tr>
<td><code>"space"</code></td>
<td><code>L"space"</code></td>
<td><code>ctype_base::space</code></td>
</tr>

<tr>
<td><code>"s"</code></td>
<td><code>L"s"</code></td>
<td><code>ctype_base::space</code></td>
</tr>

<tr>
<td><code>"upper"</code></td>
<td><code>L"upper"</code></td>
<td><code>ctype_base::upper</code></td>
</tr>

<tr>
<td><code>"w"</code></td>
<td><code>L"w"</code></td>
<td><code>ctype_base::alnum</code></td>
</tr>

<tr>
<td><code>"xdigit"</code></td>
<td><code>L"xdigit"</code></td>
<td><code>ctype_base::xdigit</code></td>
</tr>

</table>
</blockquote> 
</li>
</ol>
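<p>The behaviour described by the two examples in the proposed resolution, together with the case that motivated the issue, can be exercised through the public interface. This is only a sketch: it assumes an implementation that has adopted this resolution and a global locale that is still the default "C" locale. The helpers <code>classname</code>, <code>in_class</code>, <code>in_digit_or_upper</code> and <code>check_resolution_examples</code> are hypothetical names, and the exposition-only <code>convert</code> is not used because it cannot be called.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

using traits = std::regex_traits<char>;

// Hypothetical helper: look up a character class by its narrow name.
traits::char_class_type classname(const traits& t, const std::string& name) {
    return t.lookup_classname(name.begin(), name.end());
}

// Hypothetical helper: is c a member of the single named class?
bool in_class(char c, const std::string& name) {
    traits t;  // default-constructed traits use the global locale
    return t.isctype(c, classname(t, name));
}

// Mirrors the first example: f is the union of "d" and "upper".
bool in_digit_or_upper(char c) {
    traits t;
    traits::char_class_type f = classname(t, "d");
    f |= classname(t, "upper");
    return t.isctype(c, f);
}

// Checks the outcomes stated in the resolution; aborts if any fails.
void check_resolution_examples() {
    // Second example: the "w" class.
    assert(in_class('A', "w"));
    assert(in_class('_', "w"));
    assert(!in_class(' ', "w"));
    // The defect itself: '_' must not match a class that lacks "w".
    assert(!in_class('_', "digit"));
    // First example: digits and uppercase letters only.
    assert(in_digit_or_upper('7'));
    assert(in_digit_or_upper('Q'));
    assert(!in_digit_or_upper('q'));
    assert(!in_digit_or_upper('_'));
}
```

<p>Under the original FDIS wording, read literally, the checks that expect <code>'_'</code> to be rejected would not be guaranteed to hold.</p>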






</body>
</html>
