<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Issue 2986: Handling of multi-character collating elements by the regex FSM is underspecified</title>
<meta property="og:title" content="Issue 2986: Handling of multi-character collating elements by the regex FSM is underspecified">
<meta property="og:description" content="C++ library issue. Status: New">
<meta property="og:url" content="https://cplusplus.github.io/LWG/issue2986.html">
<meta property="og:type" content="website">
<meta property="og:image" content="http://cplusplus.github.io/LWG/images/cpp_logo.png">
<meta property="og:image:alt" content="C++ logo">
<style>
  p {text-align:justify}
  li {text-align:justify}
  pre code.backtick::before { content: "`" }
  pre code.backtick::after { content: "`" }
  blockquote.note
  {
    background-color:#E0E0E0;
    padding-left: 15px;
    padding-right: 15px;
    padding-top: 1px;
    padding-bottom: 1px;
  }
  ins {background-color:#A0FFA0}
  del {background-color:#FFA0A0}
  table.issues-index { border: 1px solid; border-collapse: collapse; }
  table.issues-index th { text-align: center; padding: 4px; border: 1px solid; }
  table.issues-index td { padding: 4px; border: 1px solid; }
  table.issues-index td:nth-child(1) { text-align: right; }
  table.issues-index td:nth-child(2) { text-align: left; }
  table.issues-index td:nth-child(3) { text-align: left; }
  table.issues-index td:nth-child(4) { text-align: left; }
  table.issues-index td:nth-child(5) { text-align: center; }
  table.issues-index td:nth-child(6) { text-align: center; }
  table.issues-index td:nth-child(7) { text-align: left; }
  table.issues-index td:nth-child(5) span.no-pr { color: red; }
  @media (prefers-color-scheme: dark) {
     html {
        color: #ddd;
        background-color: black;
     }
     ins {
        background-color: #225522
     }
     del {
        background-color: #662222
     }
     a {
        color: #6af
     }
     a:visited {
        color: #6af
     }
     blockquote.note
     {
        background-color: rgba(255, 255, 255, .10)
     }
  }
</style>
</head>
<body>
<hr>
<p><em>This page is a snapshot from the LWG issues list; see the <a href="lwg-active.html">Library Active Issues List</a> for more information and the meaning of <a href="lwg-active.html#New">New</a> status.</em></p>
<h3 id="2986"><a href="lwg-active.html#2986">2986</a>. Handling of multi-character collating elements by the <code>regex</code> FSM is underspecified</h3>
<p><b>Section:</b> 28.6.12 <a href="https://wg21.link/re.grammar">[re.grammar]</a> <b>Status:</b> <a href="lwg-active.html#New">New</a>
 <b>Submitter:</b> Hubert Tong <b>Opened:</b> 2017-06-25 <b>Last modified:</b> 2017-07-12</p>
<p><b>Priority: </b>4
</p>
<p><b>View other</b> <a href="lwg-index-open.html#re.grammar">active issues</a> in [re.grammar].</p>
<p><b>View all other</b> <a href="lwg-index.html#re.grammar">issues</a> in [re.grammar].</p>
<p><b>View all issues with</b> <a href="lwg-status.html#New">New</a> status.</p>
<p><b>Discussion:</b></p>
<p>
In <a href="https://wg21.link/n4660">N4660</a> subclause 31.13 [re.grammar] paragraph 5:
</p>
<blockquote><p>
The productions <code>ClassAtomExClass</code>, <code>ClassAtomCollatingElement</code> and <code>ClassAtomEquivalence</code> provide
functionality equivalent to that of the same features in regular expressions in POSIX.
</p></blockquote>
<p>
The above statement is broad enough to read as a mere statement of intent; however, it appears to be normatively 
necessary, because it is what identifies the general semantics associated with these syntactic forms. In any case, 
if <code>ClassAtomCollatingElement</code> is meant to provide functionality equivalent to a collating symbol in a 
POSIX bracket expression, multi-character collating elements need to be considered.
<p/>
In [re.grammar] paragraph 14:
</p>
<blockquote><p>
The behavior of the internal finite state machine representation when used to match a sequence of characters is 
as described in ECMA-262. The behavior is modified according to any <code>match_flag_type</code> flags specified when 
using the regular expression object in one of the regular expression algorithms. The behavior is also localized 
by interaction with the traits class template parameter as follows: [bullets 14.1 to 14.4]
</p></blockquote>
<p>
None of the bullets clearly handles multi-character collating elements:
</p>
<ul>
<li><p>14.1 deals in characters.</p></li>
<li><p>14.2 deals in characters (<code>traits_inst.translate</code> accepts only a single character).</p></li>
<li><p>14.3 might handle a multi-character collating element; however, there is no specification of how 
such a collating element is to be identified from the sequence of characters. Additionally, the definition 
of primary equivalence class specifies that it is a set of characters (not of collating elements).</p></li>
<li><p>14.4 deals in characters.</p></li>
</ul>
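<p>
The mismatch described in the bullets above can be seen in the interface of <code>std::regex_traits</code> itself. 
The following is a minimal sketch, not proposed wording: <code>Traits</code> and <code>collating_element</code> are 
names introduced here purely for illustration. It only checks, at compile time, that <code>translate</code> maps a 
single character to a single character while <code>lookup_collatename</code> yields a string, which may in principle 
hold more than one character.
</p>

```cpp
#include <regex>
#include <string>
#include <type_traits>
#include <utility>

// Illustrative alias (not part of the issue text).
using Traits = std::regex_traits<char>;

// translate() takes one character and returns one character, so it has
// no way to receive a multi-character collating element as a unit.
static_assert(
    std::is_same<decltype(std::declval<const Traits&>().translate('a')),
                 char>::value,
    "translate yields a single character");

// lookup_collatename() returns a string_type, so the traits interface can
// name a collating element that is longer than one character.
static_assert(
    std::is_same<decltype(std::declval<const Traits&>().lookup_collatename(
                     std::declval<const char*>(),
                     std::declval<const char*>())),
                 std::string>::value,
    "lookup_collatename yields a string");

// Hypothetical helper: look up the collating element with the given name.
// Which names are recognized depends on the implementation and the locale.
inline std::string collating_element(const Traits& traits,
                                     const std::string& name) {
    return traits.lookup_collatename(name.begin(), name.end());
}
```

<p>
Whatever string <code>collating_element</code> returns, the sketch shows no operation in bullets 14.1, 14.2 or 14.4 
that could consume it as a unit when it has more than one character.
</p>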
<p>
The ECMA-262 specification for <em>ClassRanges</em> also deals in characters.
</p>
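<p>
Because the behavior is underspecified, what a given implementation does with a collating symbol in a bracket 
expression can only be observed, not predicted from the wording. The following hedged sketch shows one way to probe 
it; <code>Outcome</code> and <code>probe</code> are hypothetical names introduced for this illustration, and no 
particular result is implied for multi-character names.
</p>

```cpp
#include <regex>
#include <string>

// Illustrative result type (not part of the issue text).
enum class Outcome { Matched, NotMatched, Rejected };

// Hypothetical probe: compile `pattern` with the default (ECMAScript)
// grammar and report whether it matches the whole of `subject`, or whether
// the implementation rejected the pattern by throwing std::regex_error.
inline Outcome probe(const std::string& pattern, const std::string& subject) {
    try {
        const std::regex re(pattern);
        return std::regex_match(subject, re) ? Outcome::Matched
                                             : Outcome::NotMatched;
    } catch (const std::regex_error&) {
        return Outcome::Rejected;
    }
}
```

<p>
A call such as <code>probe("[[.ch.]]", "ch")</code> exercises the case this issue is about; the outcome may differ 
between implementations and locales.
</p>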

<p><i>[2017-07 Toronto Monday issue prioritization]</i></p>

<p>Priority 4</p>


<p id="res-2986"><b>Proposed resolution:</b></p>




</body>
</html>
