<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Issue 4017: Behavior of std::views::split on an empty range</title>
<meta property="og:title" content="Issue 4017: Behavior of std::views::split on an empty range">
<meta property="og:description" content="C++ library issue. Status: New">
<meta property="og:url" content="https://cplusplus.github.io/LWG/issue4017.html">
<meta property="og:type" content="website">
<meta property="og:image" content="http://cplusplus.github.io/LWG/images/cpp_logo.png">
<meta property="og:image:alt" content="C++ logo">
<style>
  p {text-align:justify}
  li {text-align:justify}
  pre code.backtick::before { content: "`" }
  pre code.backtick::after { content: "`" }
  blockquote.note
  {
    background-color:#E0E0E0;
    padding-left: 15px;
    padding-right: 15px;
    padding-top: 1px;
    padding-bottom: 1px;
  }
  ins {background-color:#A0FFA0}
  del {background-color:#FFA0A0}
  table.issues-index { border: 1px solid; border-collapse: collapse; }
  table.issues-index th { text-align: center; padding: 4px; border: 1px solid; }
  table.issues-index td { padding: 4px; border: 1px solid; }
  table.issues-index td:nth-child(1) { text-align: right; }
  table.issues-index td:nth-child(2) { text-align: left; }
  table.issues-index td:nth-child(3) { text-align: left; }
  table.issues-index td:nth-child(4) { text-align: left; }
  table.issues-index td:nth-child(5) { text-align: center; }
  table.issues-index td:nth-child(6) { text-align: center; }
  table.issues-index td:nth-child(7) { text-align: left; }
  table.issues-index td:nth-child(5) span.no-pr { color: red; }
  @media (prefers-color-scheme: dark) {
     html {
        color: #ddd;
        background-color: black;
     }
     ins {
        background-color: #225522
     }
     del {
        background-color: #662222
     }
     a {
        color: #6af
     }
     a:visited {
        color: #6af
     }
     blockquote.note
     {
        background-color: rgba(255, 255, 255, .10)
     }
  }
</style>
</head>
<body>
<hr>
<p><em>This page is a snapshot from the LWG issues list; see the <a href="lwg-active.html">Library Active Issues List</a> for more information and the meaning of <a href="lwg-active.html#New">New</a> status.</em></p>
<h3 id="4017"><a href="lwg-active.html#4017">4017</a>. Behavior of <code>std::views::split</code> on an empty range</h3>
<p><b>Section:</b> 25.7.17.3 <a href="https://wg21.link/range.split.iterator">[range.split.iterator]</a>, 25.7.16.3 <a href="https://wg21.link/range.lazy.split.outer">[range.lazy.split.outer]</a> <b>Status:</b> <a href="lwg-active.html#New">New</a>
 <b>Submitter:</b> David Stone <b>Opened:</b> 2023-11-19 <b>Last modified:</b> 2024-06-24</p>
<p><b>Priority: </b>3
</p>
<p><b>View all issues with</b> <a href="lwg-status.html#New">New</a> status.</p>
<p><b>Discussion:</b></p>
<p>
Consider the following example. It uses <code>fmt::println</code> instead of <code>std::println</code>, 
but the two produce the same output for this program in C++23:
</p>
<blockquote><pre>
#include &lt;string&gt;
#include &lt;ranges&gt;
#include &lt;fmt/ranges.h&gt;

int main()
{
  fmt::println("{}", std::views::split(std::string(" x "), ' '));
  fmt::println("{}", std::views::split(std::string(" "), ' '));
  fmt::println("{}", std::views::split(std::string("x"), ' '));
  fmt::println("{}", std::views::split(std::string(""), ' '));
}
</pre></blockquote>
<p>
The output of this program (as specified today) is 
</p>
<blockquote><pre>
[[], ['x'], []]
[[], []]
[['x']]
[]
</pre></blockquote>
<p>
The principle set out in LWG <a href="lwg-defects.html#3478" title="views::split drops trailing empty range (Status: Resolved)">3478</a><sup><a href="https://cplusplus.github.io/LWG/issue3478" title="Latest snapshot">(i)</a></sup> is that splitting a sequence containing <code>N</code> 
delimiters should lead to <code>N+1</code> subranges. That principle was broken if the <code>N</code>-th 
delimiter was at the end of the sequence, which was fixed by <a href="https://wg21.link/P2210" title=" Superior String Splitting">P2210</a>. 
<p/>
However, the principle is still broken if the sequence contains zero delimiters. A non-empty sequence 
will split into one range, but an empty sequence will split into zero ranges. The last line of the output above 
is therefore incorrect: splitting an empty range on a delimiter should yield a range containing one empty range, not 
simply an empty range.
<p/>
Proposed Resolution: Currently, <code>split_view::iterator</code>'s constructor unconditionally initializes 
<code><i>trailing_empty_</i></code> to <code>false</code>. Instead, change 25.7.17.3 <a href="https://wg21.link/range.split.iterator">[range.split.iterator]</a>/1 
to initialize <code><i>trailing_empty_</i></code> to <code><i>cur_</i> == <i>next_</i>.begin()</code> (i.e. 
<code><i>trailing_empty_</i></code> is typically <code>false</code>, but if the range is empty on initialization then 
there must be a trailing empty range).
<p/>
The following demo shows Barry Revzin's implementation from <a href="https://wg21.link/P2210" title=" Superior String Splitting">P2210</a>, adjusted to fix this: 
<a href="https://godbolt.org/z/axWb64j9f">godbolt.org/z/axWb64j9f</a>
</p>

<p><i>[2024-03-11; Reflector poll]</i></p>

<p>
Set priority to 3 after reflector poll.
</p>

<p><i>[2024-03; Reflector comments]</i></p>

<p>
"For <code class='backtick'>split</code>, we need to adjust the definition of <code class='backtick'>end()</code> for the
<code class='backtick'>common_range</code> case
(which may require introducing a new constructor to the iterator);
right now it would compare <code class='backtick'>ranges::end(base_)</code> against a value-initialized
iterator, which is not in the domain of <code class='backtick'>==</code>.
For <code class='backtick'>lazy_split</code>, we need to also change the non-forward overload."
</p>
<p>
"What should splitting an empty range on an empty pattern produce?
Right now the behavior is that splitting a range of N &gt; 0 elements
with an empty pattern produces a range of N single-element ranges.
I suppose you can argue that an empty pattern matches between adjacent elements
but not at the start or end, so that an empty range, like a single-element range,
contains 0 delimiters so should produce a range of one empty range.
But it's also at least arguable that this should produce an empty range instead,
so that we maintain the N element &lt;-&gt; N subrange
and 1 element per subrange invariant."
</p>



<p id="res-4017"><b>Proposed resolution:</b></p>
<p>
This wording is relative to <a href="https://wg21.link/N4964" title=" Working Draft, Programming Languages — C++">N4964</a>.
</p>

<ol>

<li><p>Modify 25.7.17.3 <a href="https://wg21.link/range.split.iterator">[range.split.iterator]</a> as indicated:</p>

<blockquote>
<pre>
constexpr <i>iterator</i>(split_view&amp; parent, iterator_t&lt;V&gt; current, subrange&lt;iterator_t&lt;V&gt;&gt; next);
</pre>
<blockquote>
<p>
-1- <i>Effects</i>: Initializes <code><i>parent_</i></code> with <code>addressof(parent)</code>, <code><i>cur_</i></code> with 
<code>std::move(current)</code>, <del>and</del> <code><i>next_</i></code> with <code>std::move(next)</code><ins>, and
<code><i>trailing_empty_</i></code> with <code><i>cur_</i> == <i>next_</i>.begin()</code></ins>.
</p>
</blockquote>
</blockquote>
</li>

<li><p>Modify 25.7.16.3 <a href="https://wg21.link/range.lazy.split.outer">[range.lazy.split.outer]</a> as indicated:</p>

<blockquote>
<pre>
constexpr <i>outer-iterator</i>(<i>Parent</i>&amp; parent, iterator_t&lt;<i>Base</i>&gt; current)
  requires forward_range&lt;<i>Base</i>&gt;;
</pre>
<blockquote>
<p>
-3- <i>Effects</i>: Initializes <code><i>parent_</i></code> with <code>addressof(parent)</code><ins>,</ins> <del>and</del> 
<code><i>current_</i></code> with <code>std::move(current)</code><ins>, and <code><i>trailing_empty_</i></code> with 
<code><i>current_</i> == ranges::end(parent.<i>base_</i>)</code></ins>.
</p>
</blockquote>
</blockquote>
</li>

</ol>
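<p>
The effect of the changed initialization in [range.split.iterator] can be traced with a simplified model. This is a sketch under 
stated assumptions, not standard wording and not the implementation from the linked demo: <code>split_model</code> is a 
hypothetical function that replays the iterator's state machine for a <code>std::string_view</code> base and a one-character 
delimiter, using indices in place of iterators. It terminates with the sentinel-style comparison, so it does not address the 
<code>common_range</code> <code>end()</code> concern raised in the reflector comments, and it does not model <code>lazy_split</code>.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical model, not standard wording: replays the state machine of
// split_view::iterator for a string_view base and a one-character delimiter,
// using indices in place of iterators.
std::vector<std::string_view> split_model(std::string_view base, char delim,
                                          bool proposed_init)
{
  const std::size_t end = base.size();

  // Models find-next: the index range of the next delimiter at or after pos,
  // or [end, end) if there is none.
  auto find_next = [&](std::size_t pos) -> std::pair<std::size_t, std::size_t> {
    const std::size_t found = base.find(delim, pos);
    if (found == std::string_view::npos) return {end, end};
    return {found, found + 1};
  };

  std::size_t cur = 0;         // models cur_
  auto next = find_next(cur);  // models next_ as (begin, end)
  // Today: always false. Proposed: cur_ == next_.begin().
  bool trailing_empty = proposed_init ? (cur == next.first) : false;

  std::vector<std::string_view> pieces;
  // Sentinel comparison: stop once cur_ is at the end and no trailing empty
  // subrange is pending.
  while (!(cur == end && !trailing_empty)) {
    assert(next.first >= cur);
    pieces.push_back(base.substr(cur, next.first - cur));  // models operator*

    // Models operator++.
    cur = next.first;
    if (cur != end) {
      cur = next.second;
      if (cur == end) {
        trailing_empty = true;
        next = {cur, cur};
      } else {
        next = find_next(cur);
      }
    } else {
      trailing_empty = false;
    }
  }
  return pieces;
}
```

<p>
In this model, <code>split_model("", ' ', false)</code> yields no subranges, while <code>split_model("", ' ', true)</code> yields 
a single empty subrange. For the other three inputs from the discussion, both settings yield the same subranges.
</p>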





</body>
</html>
