<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Issue 2273: regex_match ambiguity</title>
<meta property="og:title" content="Issue 2273: regex_match ambiguity">
<meta property="og:description" content="C++ library issue. Status: C++17">
<meta property="og:url" content="https://cplusplus.github.io/LWG/issue2273.html">
<meta property="og:type" content="website">
<meta property="og:image" content="http://cplusplus.github.io/LWG/images/cpp_logo.png">
<meta property="og:image:alt" content="C++ logo">
<style>
  p {text-align:justify}
  li {text-align:justify}
  pre code.backtick::before { content: "`" }
  pre code.backtick::after { content: "`" }
  blockquote.note
  {
    background-color:#E0E0E0;
    padding-left: 15px;
    padding-right: 15px;
    padding-top: 1px;
    padding-bottom: 1px;
  }
  ins {background-color:#A0FFA0}
  del {background-color:#FFA0A0}
  table.issues-index { border: 1px solid; border-collapse: collapse; }
  table.issues-index th { text-align: center; padding: 4px; border: 1px solid; }
  table.issues-index td { padding: 4px; border: 1px solid; }
  table.issues-index td:nth-child(1) { text-align: right; }
  table.issues-index td:nth-child(2) { text-align: left; }
  table.issues-index td:nth-child(3) { text-align: left; }
  table.issues-index td:nth-child(4) { text-align: left; }
  table.issues-index td:nth-child(5) { text-align: center; }
  table.issues-index td:nth-child(6) { text-align: center; }
  table.issues-index td:nth-child(7) { text-align: left; }
  table.issues-index td:nth-child(5) span.no-pr { color: red; }
  @media (prefers-color-scheme: dark) {
     html {
        color: #ddd;
        background-color: black;
     }
     ins {
        background-color: #225522
     }
     del {
        background-color: #662222
     }
     a {
        color: #6af
     }
     a:visited {
        color: #6af
     }
     blockquote.note
     {
        background-color: rgba(255, 255, 255, .10)
     }
  }
</style>
</head>
<body>
<hr>
<p><em>This page is a snapshot from the LWG issues list; see the <a href="lwg-active.html">Library Active Issues List</a> for more information and the meaning of <a href="lwg-active.html#C++17">C++17</a> status.</em></p>
<h3 id="2273"><a href="lwg-defects.html#2273">2273</a>. <code>regex_match</code> ambiguity</h3>
<p><b>Section:</b> 28.6.10.2 <a href="https://wg21.link/re.alg.match">[re.alg.match]</a> <b>Status:</b> <a href="lwg-active.html#C++17">C++17</a>
 <b>Submitter:</b> Howard Hinnant <b>Opened:</b> 2013-07-14 <b>Last modified:</b> 2017-07-30</p>
<p><b>Priority: </b>2
</p>
<p><b>View all other</b> <a href="lwg-index.html#re.alg.match">issues</a> in [re.alg.match].</p>
<p><b>View all issues with</b> <a href="lwg-status.html#C++17">C++17</a> status.</p>
<p><b>Discussion:</b></p>
<p>
28.6.10.2 <a href="https://wg21.link/re.alg.match">[re.alg.match]</a> p2, in describing <code>regex_match</code>, says:
</p>
<blockquote>
<p>
-2-  <i>Effects:</i> Determines whether there is a match between the regular expression <code>e</code>, and all of 
the character sequence <code>[first,last)</code>. The parameter <code>flags</code> is used to control how the expression 
is matched against the character sequence. Returns true if such a match exists, false otherwise.
</p>
</blockquote>

<p>
It has come to my attention that different people are interpreting the first sentence of p2 in different ways:
</p>

<ol>
<li><p>
If a search of the input string using the regular expression <code>e</code> matches the entire input string, 
<code>regex_match</code> should return true.
</p></li>
<li><p>
Search the input string using the regular expression <code>e</code>. Reject all matches that do not match the 
entire input string. If such a match is found, return true.
</p></li>
</ol>

<p>
The difference between these two subtly different interpretations is found using the following ECMAScript example:
</p>

<blockquote><pre>
std::regex re("Get|GetValue");
</pre></blockquote>

<p>
Using <code>regex_search</code>, this <code>re</code> can never match the whole of the input string <code>"GetValue"</code>, because ECMAScript 
specifies that alternations are ordered, not greedy. As soon as <code>"Get"</code> is matched in the left alternation, 
the matching algorithm stops.
<p/>
Using definition 1, <code>regex_match</code> would return false for an input string of <code>"GetValue"</code>.
<p/>
However, definition 2 alters the grammar and appears equivalent to augmenting the regex with a trailing <code>'$'</code>, 
an anchor that rejects any match which does not end at the end of the input sequence.
So, using definition 2, <code>regex_match</code> would return true for an input string of <code>"GetValue"</code>.
<p/>
My opinion is that it would be strange to have <code>regex_match</code> return true for a <code>string/regex</code> 
pair that <code>regex_search</code> could never find. I.e. I favor definition 1.
<p/>
John Maddock writes:
<p/>
The intention was always that <code>regex_match</code> would reject any match candidate which didn't match the entire 
input string. So it would find <code>GetValue</code> in this case because the <code>"Get"</code> alternative had already 
been rejected as not matching. Note that the comparison with ECMAScript is somewhat moot, as ECMAScript defines 
the regex grammar (the bit we've imported), it does not define anything like <code>regex_match</code>, nor do we import 
from ECMAScript the behaviour of that function. So IMO the function should behave consistently regardless of the 
regex dialect chosen. Saying "use awk regexes" doesn't cut it, because that changes the grammar in other ways.
<p/>
(John favors definition 2).
<p/>
We need to clarify 28.6.10.2 <a href="https://wg21.link/re.alg.match">[re.alg.match]</a>/p2 in one of these two directions.
</p>
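<p>
The two interpretations can be written out as code. The following is a minimal sketch, not wording from the issue: the helper names <code>match_by_definition_1</code>, <code>match_by_definition_2</code> and <code>compare_definitions</code> are hypothetical, and the second helper assumes a library whose <code>regex_match</code> follows John Maddock's reading (the one eventually adopted).
</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Definition 1 (hypothetical helper): run an ordinary search and accept only
// if the match that search happens to find spans the whole input.
bool match_by_definition_1(const std::string& s, const std::regex& re) {
    std::smatch m;
    return std::regex_search(s, m, re)
        && m.position(0) == 0
        && m.length(0) == static_cast<std::smatch::difference_type>(s.size());
}

// Definition 2 (hypothetical helper): candidates that stop short of the end
// of the input are rejected, so the matcher keeps trying other alternatives.
// Assumption: std::regex_match implements this reading.
bool match_by_definition_2(const std::string& s, const std::regex& re) {
    return std::regex_match(s, re);
}

// Exercises both helpers on the pattern from the discussion.
void compare_definitions() {
    const std::regex re("Get|GetValue");  // ECMAScript is the default grammar

    // The search stops at "Get", which does not cover "GetValue".
    assert(!match_by_definition_1("GetValue", re));
    // "Get" is rejected as a candidate, so "GetValue" is found instead.
    assert(match_by_definition_2("GetValue", re));

    // Both readings agree when the first alternative already covers the input.
    assert(match_by_definition_1("Get", re));
    assert(match_by_definition_2("Get", re));
}
```

<p>
Calling <code>compare_definitions()</code> from <code>main</code> should run without tripping an assertion on such a library. The two helpers disagree only when an earlier alternative matches a proper prefix of the input.
</p>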

<p><i>[2014-06-21, Rapperswil]</i></p>

<p>
AM: I think there's a clear direction and consensus that we agree with John Maddock's position, and if no one else 
thinks we need the other function I won't ask for it.
<p/>
Marshall Clow and STL to draft.
</p>

<p><i>[2015-06-10, Marshall suggests concrete wording]</i></p>


<p><i>[2015-01-11, Telecon]</i></p>

<p>Move to Tentatively Ready</p>


<p id="res-2273"><b>Proposed resolution:</b></p>
<p>
This wording is relative to N4527.
</p>

<ol>
<li><p>Change 28.6.10.2 <a href="https://wg21.link/re.alg.match">[re.alg.match]</a>/2, as follows:</p>

<blockquote><pre>
template &lt;class BidirectionalIterator, class Allocator, class charT, class traits&gt;
  bool regex_match(BidirectionalIterator first, BidirectionalIterator last,
                   match_results&lt;BidirectionalIterator, Allocator&gt;&amp; m,
                   const basic_regex&lt;charT, traits&gt;&amp; e,
                   regex_constants::match_flag_type flags =
                     regex_constants::match_default);
</pre>
<blockquote>
<p>
-1- <i>Requires</i>: The type <code>BidirectionalIterator</code> shall satisfy the requirements of a Bidirectional Iterator
(24.2.6).
<p/>
-2- <i>Effects</i>: Determines whether there is a match between the regular expression <code>e</code>, and all of the character
sequence <code>[first,last)</code>. The parameter <code>flags</code> is used to control how the expression is matched against
the character sequence. <ins>When determining if there is a match, only potential matches that match the entire character 
sequence are considered.</ins> Returns <code>true</code> if such a match exists, <code>false</code> otherwise. <ins>[<i>Example</i>:</ins>
</p>
<blockquote>
<pre>
<ins>std::regex re("Get|GetValue");
std::cmatch m;
regex_search("GetValue", m, re);	// returns true, and m[0] contains "Get"
regex_match ("GetValue", m, re);	// returns true, and m[0] contains "GetValue"
regex_search("GetValues", m, re);	// returns true, and m[0] contains "Get"
regex_match ("GetValues", m, re);	// returns false</ins>
</pre>
</blockquote>
<p>
<ins>&mdash; <i>end example</i>]</ins>
<p/>
[&hellip;]
</p>
</blockquote>
</blockquote>
</li>
</ol>
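<p>
The example in the proposed wording can be turned into a self-checking program. This is a sketch under stated assumptions: <code>searched_text</code>, <code>matched_text</code> and <code>check_example</code> are hypothetical helpers, and the final two assertions illustrate the observation from the discussion that the resolved <code>regex_match</code> behaves like a search with an anchored pattern. The anchored pattern itself is not part of the wording.
</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: text of m[0] after regex_search, or "" if no match.
std::string searched_text(const char* input, const std::regex& re) {
    std::cmatch m;
    return std::regex_search(input, m, re) ? m.str(0) : std::string();
}

// Hypothetical helper: text of m[0] after regex_match, or "" if no match.
std::string matched_text(const char* input, const std::regex& re) {
    std::cmatch m;
    return std::regex_match(input, m, re) ? m.str(0) : std::string();
}

void check_example() {
    const std::regex re("Get|GetValue");

    // The four cases from the example in the proposed wording.
    assert(searched_text("GetValue", re) == "Get");
    assert(matched_text("GetValue", re) == "GetValue");
    assert(searched_text("GetValues", re) == "Get");
    assert(matched_text("GetValues", re) == "");

    // Illustration only: anchoring both ends makes regex_search agree with
    // regex_match for these inputs.
    const std::regex anchored("^(?:Get|GetValue)$");
    assert(searched_text("GetValue", anchored) == "GetValue");
    assert(searched_text("GetValues", anchored) == "");
}
```

<p>
The non-capturing group <code>(?:...)</code> keeps the anchors outside the alternation, so both alternatives are constrained to cover the whole input.
</p>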





</body>
</html>
