<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Issue 524: regex named character classes and case-insensitivity don't mix</title>
<meta property="og:title" content="Issue 524: regex named character classes and case-insensitivity don't mix">
<meta property="og:description" content="C++ library issue. Status: CD1">
<meta property="og:url" content="https://cplusplus.github.io/LWG/issue524.html">
<meta property="og:type" content="website">
<meta property="og:image" content="http://cplusplus.github.io/LWG/images/cpp_logo.png">
<meta property="og:image:alt" content="C++ logo">
<style>
  p {text-align:justify}
  li {text-align:justify}
  pre code.backtick::before { content: "`" }
  pre code.backtick::after { content: "`" }
  blockquote.note
  {
    background-color:#E0E0E0;
    padding-left: 15px;
    padding-right: 15px;
    padding-top: 1px;
    padding-bottom: 1px;
  }
  ins {background-color:#A0FFA0}
  del {background-color:#FFA0A0}
  table.issues-index { border: 1px solid; border-collapse: collapse; }
  table.issues-index th { text-align: center; padding: 4px; border: 1px solid; }
  table.issues-index td { padding: 4px; border: 1px solid; }
  table.issues-index td:nth-child(1) { text-align: right; }
  table.issues-index td:nth-child(2) { text-align: left; }
  table.issues-index td:nth-child(3) { text-align: left; }
  table.issues-index td:nth-child(4) { text-align: left; }
  table.issues-index td:nth-child(5) { text-align: center; }
  table.issues-index td:nth-child(6) { text-align: center; }
  table.issues-index td:nth-child(7) { text-align: left; }
  table.issues-index td:nth-child(5) span.no-pr { color: red; }
  @media (prefers-color-scheme: dark) {
     html {
        color: #ddd;
        background-color: black;
     }
     ins {
        background-color: #225522
     }
     del {
        background-color: #662222
     }
     a {
        color: #6af
     }
     a:visited {
        color: #6af
     }
     blockquote.note
     {
        background-color: rgba(255, 255, 255, .10)
     }
  }
</style>
</head>
<body>
<hr>
<p><em>This page is a snapshot from the LWG issues list, see the <a href="lwg-active.html">Library Active Issues List</a> for more information and the meaning of <a href="lwg-active.html#CD1">CD1</a> status.</em></p>
<h3 id="524"><a href="lwg-defects.html#524">524</a>. regex named character classes and case-insensitivity don't mix</h3>
<p><b>Section:</b> 28.6 <a href="https://wg21.link/re">[re]</a> <b>Status:</b> <a href="lwg-active.html#CD1">CD1</a>
 <b>Submitter:</b> Eric Niebler <b>Opened:</b> 2005-07-01 <b>Last modified:</b> 2016-01-28</p>
<p><b>Priority: </b>Not Prioritized
</p>
<p><b>View other</b> <a href="lwg-index-open.html#re">active issues</a> in [re].</p>
<p><b>View all other</b> <a href="lwg-index.html#re">issues</a> in [re].</p>
<p><b>View all issues with</b> <a href="lwg-status.html#CD1">CD1</a> status.</p>
<p><b>Discussion:</b></p>
<p>
This defect is also being discussed on the Boost developers list. The 
full discussion can be found here:
http://lists.boost.org/boost/2005/07/29546.php
</p>
<p>
-- Begin original message --
</p>
<p>
Also, I may have found another issue, closely related to the one under
discussion. It regards case-insensitive matching of named character
classes. The regex_traits&lt;&gt; provides two functions for working with
named char classes: lookup_classname and isctype. To match a char class
such as [[:alpha:]], you pass "alpha" to lookup_classname and get a
bitmask. Later, you pass a char and the bitmask to isctype and get a
bool yes/no answer.
</p>
<p>
But how does case-insensitivity work in this scenario? Suppose we're
doing a case-insensitive match on [[:lower:]]. It should behave as if it
were [[:lower:][:upper:]], right? But there doesn't seem to be enough
smarts in the regex_traits interface to do this.
</p>
<p>
Imagine I write a traits class which recognizes [[:fubar:]], and the
"fubar" char class happens to be case-sensitive. How is the regex engine
to know that? And how should it do a case-insensitive match of a
character against the [[:fubar:]] char class? John, can you confirm this
is a legitimate problem?
</p>
<p>
I see two options:
</p>
<p>
1) Add a bool icase parameter to lookup_classname. Then,
lookup_classname( "upper", true ) will know to return lower|upper
instead of just upper.
</p>
<p>
2) Add a isctype_nocase function
</p>
<p>
I prefer (1) because the extra computation happens at the time the
pattern is compiled rather than when it is executed.
</p>
<p>
-- End original message --
</p>

<p>
For what it's worth, John has also expressed his preference for option 
(1) above.
</p>


<p id="res-524"><b>Proposed resolution:</b></p>
<p>
Adopt the proposed resolution in
<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2409.pdf">N2409</a>.
</p>


<p><i>[
Kona (2007): The LWG adopted the proposed resolution of N2409 for this issue.
The LWG voted to accelerate this issue to Ready status to be voted into the WP at Kona.
]</i></p>





</body>
</html>
