<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Issue 3290: Are std::format field widths code units, code points, or something else?</title>
<meta property="og:title" content="Issue 3290: Are std::format field widths code units, code points, or something else?">
<meta property="og:description" content="C++ library issue. Status: C++20">
<meta property="og:url" content="https://cplusplus.github.io/LWG/issue3290.html">
<meta property="og:type" content="website">
<meta property="og:image" content="http://cplusplus.github.io/LWG/images/cpp_logo.png">
<meta property="og:image:alt" content="C++ logo">
<style>
  p {text-align:justify}
  li {text-align:justify}
  pre code.backtick::before { content: "`" }
  pre code.backtick::after { content: "`" }
  blockquote.note
  {
    background-color:#E0E0E0;
    padding-left: 15px;
    padding-right: 15px;
    padding-top: 1px;
    padding-bottom: 1px;
  }
  ins {background-color:#A0FFA0}
  del {background-color:#FFA0A0}
  table.issues-index { border: 1px solid; border-collapse: collapse; }
  table.issues-index th { text-align: center; padding: 4px; border: 1px solid; }
  table.issues-index td { padding: 4px; border: 1px solid; }
  table.issues-index td:nth-child(1) { text-align: right; }
  table.issues-index td:nth-child(2) { text-align: left; }
  table.issues-index td:nth-child(3) { text-align: left; }
  table.issues-index td:nth-child(4) { text-align: left; }
  table.issues-index td:nth-child(5) { text-align: center; }
  table.issues-index td:nth-child(6) { text-align: center; }
  table.issues-index td:nth-child(7) { text-align: left; }
  table.issues-index td:nth-child(5) span.no-pr { color: red; }
  @media (prefers-color-scheme: dark) {
     html {
        color: #ddd;
        background-color: black;
     }
     ins {
        background-color: #225522
     }
     del {
        background-color: #662222
     }
     a {
        color: #6af
     }
     a:visited {
        color: #6af
     }
     blockquote.note
     {
        background-color: rgba(255, 255, 255, .10)
     }
  }
</style>
</head>
<body>
<hr>
<p><em>This page is a snapshot from the LWG issues list. See the <a href="lwg-active.html">Library Active Issues List</a> for more information and the meaning of <a href="lwg-active.html#C++20">C++20</a> status.</em></p>
<h3 id="3290"><a href="lwg-defects.html#3290">3290</a>. Are <code>std::format</code> field widths code units, code points, or something else?</h3>
<p><b>Section:</b> 28.5.2.2 <a href="https://wg21.link/format.string.std">[format.string.std]</a> <b>Status:</b> <a href="lwg-active.html#C++20">C++20</a>
 <b>Submitter:</b> Tom Honermann <b>Opened:</b> 2019-09-08 <b>Last modified:</b> 2021-02-25</p>
<p><b>Priority: </b>Not Prioritized
</p>
<p><b>View other</b> <a href="lwg-index-open.html#format.string.std">active issues</a> in [format.string.std].</p>
<p><b>View all other</b> <a href="lwg-index.html#format.string.std">issues</a> in [format.string.std].</p>
<p><b>View all issues with</b> <a href="lwg-status.html#C++20">C++20</a> status.</p>
<p><b>Discussion:</b></p>
<p>
28.5.2.2 <a href="https://wg21.link/format.string.std">[format.string.std]</a> p7 states:
</p>
<blockquote style="border-left: 3px solid #ccc;padding-left: 15px;">
<p>
The <i>positive-integer</i> in <i>width</i> is a decimal integer defining the minimum field width. If <i>width</i> 
is not specified, there is no minimum field width, and the field width is determined based on the content of the field.
</p>
</blockquote>
<p>
Is field width measured in code units, code points, or something else?
</p>
<p>
Consider the following example assuming a UTF-8 locale:
</p>
<blockquote><pre>
std::format("{}", "\xC3\x81");     // U+00C1        { LATIN CAPITAL LETTER A WITH ACUTE }
std::format("{}", "\x41\xCC\x81"); // U+0041 U+0301 { LATIN CAPITAL LETTER A } { COMBINING ACUTE ACCENT }
</pre></blockquote>
<p>
In both cases, the arguments encode the same user-perceived character (&#xc1;). The first uses two UTF-8 
code units to encode a single code point that represents a single glyph using a composed Unicode 
normalization form. The second uses three code units to encode two code points that represent the same 
glyph using a decomposed Unicode normalization form.
</p>
<p>
How is the field width determined? If measured in code units, the first has a width of 2 and the second of 
3. If measured in code points, the first has a width of 1 and the second of 2. If measured in grapheme 
clusters, both have a width of 1. Is the determination locale dependent?
</p>
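<p>
The code unit and code point counts quoted above can be checked with a minimal sketch like the following. 
The helper names <code>count_code_units</code> and <code>count_code_points</code> are hypothetical and are not 
part of the standard library, and the code point count assumes well-formed UTF-8 input. Counting grapheme 
clusters is not shown because it requires Unicode segmentation data.
</p>

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: the number of UTF-8 code units is simply the byte length.
constexpr std::size_t count_code_units(std::string_view s) {
    return s.size();
}

// Hypothetical helper: the number of code points, ASSUMING well-formed UTF-8.
// Every code point starts with exactly one byte that is not a
// continuation byte (continuation bytes have the bit pattern 10xxxxxx).
constexpr std::size_t count_code_points(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// The two arguments from the example in the issue text.
constexpr std::string_view composed   = "\xC3\x81";     // U+00C1
constexpr std::string_view decomposed = "\x41\xCC\x81"; // U+0041 U+0301

static_assert(count_code_units(composed) == 2);
static_assert(count_code_units(decomposed) == 3);
static_assert(count_code_points(composed) == 1);
static_assert(count_code_points(decomposed) == 2);
```

<p>
Neither count matches what a reader sees, since both strings display as a single user-perceived character.
</p>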

<strong>Previous resolution [SUPERSEDED]:</strong>
<blockquote class="note">
<p>This wording is relative to <a href="https://wg21.link/n4830">N4830</a>.</p>

<ol>
<li><p>Modify 28.5.2.2 <a href="https://wg21.link/format.string.std">[format.string.std]</a> as indicated:</p>

<blockquote>
<p>
-7- The <i>positive-integer</i> in <i>width</i> is a decimal integer defining the minimum field width. If 
<i>width</i> is not specified, there is no minimum field width, and the field width is determined based 
on the content of the field. <ins>Field width is measured in code units. Each byte of a multibyte 
character contributes to the field width.</ins>
</p>
</blockquote>
</li>

</ol>
</blockquote>
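<p>
To illustrate what the superseded wording above would have meant in practice, here is a rough sketch of 
padding to a minimum field width measured in code units. The function <code>pad_by_code_units</code> is a 
hypothetical helper written for this illustration. It is not how <code>std::format</code> is specified or 
implemented, and it only mimics left alignment with a fill character.
</p>

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: left-aligns `content` in a field of at least
// `min_width`, where width is measured in code units (bytes), as the
// superseded wording proposed.
std::string pad_by_code_units(std::string_view content,
                              std::size_t min_width,
                              char fill = ' ') {
    std::string out(content);
    if (out.size() < min_width) {
        // Each byte of a multibyte character counts toward the width,
        // so only the remaining bytes are filled.
        out.append(min_width - out.size(), fill);
    }
    return out;
}
```

<p>
With a minimum width of 4, the composed form <code>"\xC3\x81"</code> receives two fill characters and the 
decomposed form <code>"\x41\xCC\x81"</code> receives only one, even though both display as the same 
user-perceived character. Text padded this way would not line up in columns.
</p>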

<p><i>[2020-02-13, Prague]</i></p>

<p>
Resolved by <a href="https://wg21.link/p1868r2">P1868R2</a>
</p>

<p><i>[2020-04-07 Voted into the WP in Prague. Status changed: New &rarr; WP.]</i></p>



<p id="res-3290"><b>Proposed resolution:</b></p>
<p>
</p>




</body>
</html>
