<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Issue 3328: Clarify that std::string is not good for UTF-8</title>
<meta property="og:title" content="Issue 3328: Clarify that std::string is not good for UTF-8">
<meta property="og:description" content="C++ library issue. Status: C++20">
<meta property="og:url" content="https://cplusplus.github.io/LWG/issue3328.html">
<meta property="og:type" content="website">
<meta property="og:image" content="http://cplusplus.github.io/LWG/images/cpp_logo.png">
<meta property="og:image:alt" content="C++ logo">
<style>
  p {text-align:justify}
  li {text-align:justify}
  pre code.backtick::before { content: "`" }
  pre code.backtick::after { content: "`" }
  blockquote.note
  {
    background-color:#E0E0E0;
    padding-left: 15px;
    padding-right: 15px;
    padding-top: 1px;
    padding-bottom: 1px;
  }
  ins {background-color:#A0FFA0}
  del {background-color:#FFA0A0}
  table.issues-index { border: 1px solid; border-collapse: collapse; }
  table.issues-index th { text-align: center; padding: 4px; border: 1px solid; }
  table.issues-index td { padding: 4px; border: 1px solid; }
  table.issues-index td:nth-child(1) { text-align: right; }
  table.issues-index td:nth-child(2) { text-align: left; }
  table.issues-index td:nth-child(3) { text-align: left; }
  table.issues-index td:nth-child(4) { text-align: left; }
  table.issues-index td:nth-child(5) { text-align: center; }
  table.issues-index td:nth-child(6) { text-align: center; }
  table.issues-index td:nth-child(7) { text-align: left; }
  table.issues-index td:nth-child(5) span.no-pr { color: red; }
  @media (prefers-color-scheme: dark) {
     html {
        color: #ddd;
        background-color: black;
     }
     ins {
        background-color: #225522
     }
     del {
        background-color: #662222
     }
     a {
        color: #6af
     }
     a:visited {
        color: #6af
     }
     blockquote.note
     {
        background-color: rgba(255, 255, 255, .10)
     }
  }
</style>
</head>
<body>
<hr>
<p><em>This page is a snapshot from the LWG issues list; see the <a href="lwg-active.html">Library Active Issues List</a> for more information and the meaning of <a href="lwg-active.html#C++20">C++20</a> status.</em></p>
<h3 id="3328"><a href="lwg-defects.html#3328">3328</a>. Clarify that <code>std::string</code> is not good for UTF-8</h3>
<p><b>Section:</b> D.22.1 <a href="https://wg21.link/depr.fs.path.factory">[depr.fs.path.factory]</a> <b>Status:</b> <a href="lwg-active.html#C++20">C++20</a>
 <b>Submitter:</b> The Netherlands <b>Opened:</b> 2019-11-07 <b>Last modified:</b> 2021-02-25</p>
<p><b>Priority: </b>0
</p>
<p><b>View all other</b> <a href="lwg-index.html#depr.fs.path.factory">issues</a> in [depr.fs.path.factory].</p>
<p><b>View all issues with</b> <a href="lwg-status.html#C++20">C++20</a> status.</p>
<p><b>Discussion:</b></p>
<p><b>Addresses <a href="https://github.com/cplusplus/nbballot/issues/371">NL 375</a></b></p>

<p>
The example in the deprecated section implies that <code>std::string</code> is the type to use for UTF-8 strings.
</p>
<blockquote><p>
[<i>Example:</i> A string is to be read from a database that is encoded in UTF-8, and used
to create a directory using the native encoding for filenames:
<pre>
namespace fs = std::filesystem;
std::string utf8_string = read_utf8_data();
fs::create_directory(fs::u8path(utf8_string));
</pre>
</p></blockquote>
<p>
Proposed change:
</p>
<p>
Add a clarification that <code>std::string</code> is the wrong type for UTF-8 strings.
</p>
<p>
<b>Jeff Garland:</b>
<p/>
SG16 in Belfast: Recommend to accept with a modification to update the example in
D.22.1 <a href="https://wg21.link/depr.fs.path.factory">[depr.fs.path.factory]</a> p4 to state that <code>std::u8string</code> should
be preferred for UTF-8 data.
<p/>
Rationale: The example code is representative of historic use of <code>std::filesystem::u8path</code>
and should not be changed to use <code>std::u8string</code>. The recommended change is to a
non-normative example and may therefore be considered editorial.
</p>

<p><strong>Previous resolution [SUPERSEDED]:</strong></p>
<blockquote class="note">
<p>This wording is relative to <a href="https://wg21.link/n4835">N4835</a>.</p>

<ol>
<li><p>Modify D.22.1 <a href="https://wg21.link/depr.fs.path.factory">[depr.fs.path.factory]</a> as indicated:</p>

<blockquote>
<p>
-4- [<i>Example:</i> A string is to be read from a database that is encoded in UTF-8,
and used to create a directory using the native encoding for filenames:
<blockquote><pre>
namespace fs = std::filesystem;
std::string utf8_string = read_utf8_data();
fs::create_directory(fs::u8path(utf8_string));
</pre></blockquote>
For POSIX-based operating systems with the native narrow encoding set to UTF-8,
no encoding or type conversion occurs.
<p/>
For POSIX-based operating systems with the native narrow encoding not set to UTF-8,
a conversion to UTF-32 occurs, followed by a conversion to the current native narrow
encoding. Some Unicode characters may have no native character set representation.
<p/>
For Windows-based operating systems a conversion from UTF-8 to UTF-16 occurs. &mdash;
<i>end example</i>]
<p/>
<ins>[<i>Note:</i> The example above is representative of historic use of
<code>filesystem</code> <code>u8path</code>. New code should use <code>std::u8string</code>
in place of <code>std::string</code>. &mdash; <i>end note</i>]</ins>
</p>
</blockquote>
</li>

</ol>
</blockquote>

<p><em>LWG Belfast Friday Morning</em></p>
<p>
Requested changes:
<ul>
<li>Historic =&gt; historical.</li>
<li>Add the missing <code>::</code> before <code>u8path</code>.</li>
<li>Remove the 'should' from the note; ISO rules forbid it there.</li>
<li>Use language describing why new code should use the u8string constructor
rather than preaching that new code should do something.</li>
</ul>
Billy O'Neal provides updated wording.
</p>

<p><i>[2020-02 Moved to Immediate on Tuesday in Prague.]</i></p>



<p id="res-3328"><b>Proposed resolution:</b></p>
<p>This wording is relative to <a href="https://wg21.link/n4835">N4835</a>.</p>

<ol>
<li><p>Modify D.22.1 <a href="https://wg21.link/depr.fs.path.factory">[depr.fs.path.factory]</a> as indicated:</p>

<blockquote>
<p>
-4- [<i>Example:</i> A string is to be read from a database that is encoded in UTF-8,
and used to create a directory using the native encoding for filenames:
<blockquote><pre>
namespace fs = std::filesystem;
std::string utf8_string = read_utf8_data();
fs::create_directory(fs::u8path(utf8_string));
</pre></blockquote>
For POSIX-based operating systems with the native narrow encoding set to UTF-8,
no encoding or type conversion occurs.
<p/>
For POSIX-based operating systems with the native narrow encoding not set to UTF-8,
a conversion to UTF-32 occurs, followed by a conversion to the current native narrow
encoding. Some Unicode characters may have no native character set representation.
<p/>
For Windows-based operating systems a conversion from UTF-8 to UTF-16 occurs. &mdash;
<i>end example</i>]
<p/>
<ins>[<i>Note:</i> The example above is representative of a historical use of
<code>filesystem::u8path</code>. Passing a <code>std::u8string</code> to <code>path</code>'s
constructor is preferred for an indication of UTF-8 encoding more consistent with
<code>path</code>'s handling of other encodings.  &mdash; <i>end note</i>]</ins>
</p>
</blockquote>
</li>

</ol>





</body>
</html>
