<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">

<style type="text/css">

body { color: #000000; background-color: #FFFFFF; }
del { text-decoration: line-through; color: #8B0040; }
ins { text-decoration: underline; color: #005100; }

p.example { margin-left: 2em; }
pre.example { margin-left: 2em; }
div.example { margin-left: 2em; }

code.extract { background-color: #F5F6A2; }
pre.extract { margin-left: 2em; background-color: #F5F6A2;
              border: 1px solid #E1E28E; }

p.function { }
.attribute { margin-left: 2em; }
.attribute dt { float: left; font-style: italic;
                padding-right: 1ex; }
.attribute dd { margin-left: 0em; }

blockquote.std { color: #000000; background-color: #F1F1F1;
                 border: 1px solid #D1D1D1;
                 padding-left: 0.5em; padding-right: 0.5em; }
blockquote.stddel { text-decoration: line-through;
                    color: #000000; background-color: #FFEBFF;
                    border: 1px solid #ECD7EC;
                    padding-left: 0.5em; padding-right: 0.5em; }

blockquote.stdins { text-decoration: underline;
                    color: #000000; background-color: #C8FFC8;
                    border: 1px solid #B3EBB3; padding: 0.5em; }

table { border: 1px solid black; border-spacing: 0px;
        margin-left: auto; margin-right: auto; }
th { text-align: left; vertical-align: top;
     padding-left: 0.8em; border: none; }
td { text-align: left; vertical-align: top;
     padding-left: 0.8em; border: none; }

p.grammarlhs { margin-bottom: 0 }
p.grammarrhs { margin-left:8em; margin-top:0; text-indent:-4em }

</style>

<title>Literal operator templates for strings</title>
</head>

<body>

<div style="text-align: right; float: right">
<p>ISO/IEC JTC1 SC22 WG21<br>N3599<br>Richard Smith<br>2013-03-13</p>
</div>

<h1>Literal operator templates for strings</h1>

<h2>Overview</h2>

<p><a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2765.pdf">
N2765</a> added the ability for users to define their own literal suffixes.
Several forms of literal operators are available, with one notable omission:
there is no template form of literal operator for character and string
literals. <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2750.pdf">
N2750</a> justifies this restriction based on two factors:</p>

<ul>
  <li>there may be demand for a <q>raw</q> form of string literal, in which
  <pre class="extract"><code>"Hello, " L"Worl\u0044!"</code></pre>
  is distinguishable from
  <pre class="extract"><code>L"Hello, World!"</code></pre>
  but this interacted badly with phases of translation, and
  </li>
  <li>no compelling use cases for this feature were known.</li>
</ul>

<p>Neither of these is still true, and we now have evidence that a literal
operator template for string literals would be valuable; indeed, in one codebase
where literal operators are not yet permitted, this form of literal operator has
been requested more frequently than any of the forms which C++11 permits.</p>

<h2>Examples</h2>

<h3>Type-safe printf</h3>

<p>With literal operator templates, it is possible to write a type-safe
<code>printf</code> facility:</p>

<pre class="extract"><code>// A tuple of types.
template&lt;typename ...Ts&gt; struct types {
  template&lt;typename T&gt; using push_front = types&lt;T, Ts...&gt;;
  template&lt;template&lt;typename...&gt; class F&gt; using apply = F&lt;Ts...&gt;;
};

// Select a type from a format character.
template&lt;char K&gt; struct format_type_impl;
template&lt;&gt; struct format_type_impl&lt;'d'&gt; { using type = int; };
template&lt;&gt; struct format_type_impl&lt;'f'&gt; { using type = double; };
template&lt;&gt; struct format_type_impl&lt;'s'&gt; { using type = const char *; };
// ...
template&lt;char K&gt; using format_type = typename format_type_impl&lt;K&gt;::type;

// Build a tuple of types from a format string.
template&lt;char ...String&gt;
struct format_types;
template&lt;&gt;
struct format_types&lt;&gt; : types&lt;&gt; {};
template&lt;char Char, char ...String&gt;
struct format_types&lt;Char, String...&gt; : format_types&lt;String...&gt; {};
template&lt;char ...String&gt;
struct format_types&lt;'%', '%', String...&gt; : format_types&lt;String...&gt; {};
template&lt;char Fmt, char ...String&gt;
struct format_types&lt;'%', Fmt, String...&gt; :
  format_types&lt;String...&gt;::template push_front&lt;format_type&lt;Fmt&gt;&gt; {};

// Typed printf-style formatter.
template&lt;typename ...Args&gt; struct formatter {
  int operator()(Args ...a) {
    return std::printf(str, a...);
  }
  const char *str;
};

template&lt;typename CharT, CharT ...String&gt;
typename format_types&lt;String...&gt;::template apply&lt;formatter&gt;
operator""_printf() {
  static_assert(std::is_same&lt;CharT, char&gt;(), "can only use printf on narrow strings");
  static const CharT data[] = { String..., 0 };
  return { data };
}

void log_bad_guess(const char *name, int guess, int actual) {
  "Hello %s, you guessed %d which is too %s\n"_printf(
    name, guess, guess &lt; actual ? "low" : "high");
}
</code></pre>

<p>This is not possible with the existing support for string literal operators,
because the type of the literal cannot depend on the contents of the string.</p>

<h3>Compiler-validated string literals</h3>

<p>By a similar mechanism to the type-safe printf, literal operator templates
allow the user to validate that a string literal conforms to a specific syntax
or structure during translation.</p>

<pre class="extract"><code>class SpecialString {
public:
  constexpr static bool IsValidString(const char *str) { /* ... */ }
  explicit SpecialString(const char *str) : str(str) { assert(IsValidString(str); }
  const char *get() { return str; }

private:
  struct Checked {};
  SpecialString(Checked, const char *str) : str(str) {}
  template&lt;typename CharT, CharT ...&gt; friend SpecialString operator""_special();
  const char *str;
};

template&lt;typename CharT, CharT ...String&gt; SpecialString operator""_special() {
  constexpr static CharT data[] = { String..., 0 };
  static_assert(SpecialString::IsValidString(data), "not a valid string");
  return SpecialString(SpecialString::Checked(), data);
}
</code></pre>

<p>Again, this is not possible with the existing support for string literal
operators, because the literal's value is not available in constant
expressions within the literal operator.</p>

<h3>String obfuscation</h3>

<p>Some commerical applications desire to obfuscate some of their string
literals, so that (for instance) running the Unix <code>strings</code> command
on their binary does not reveal potentially-sensitive information, such as
features the customer has not paid for, or diagnostic messages which are
specific to another customer. With a literal operator template, this is possible
without disrupting the flow or readability of the client code.</p>

<pre class="extract"><code>template&lt;typename CharT&gt; struct encoded_string {
  operator std::basic_string&lt;CharT&gt;() { /* ... decode ... */ }
  // ...
}
namespace {
  template&lt;typename CharT, CharT ...String&gt; struct encode {
    static constexpr CharT data[] = { String ^ 0xa3 ..., 0 };
  };
  template&lt;typename CharT, CharT ...String&gt; const CharT encode::data[];
  template&lt;typename CharT, CharT ...String&gt; static encoded_string&lt;CharT&gt; operator""_hidden() {
    return encode&lt;CharT, String...&gt;::data;
  }
}

void report_secret_thing() {
  my_ostream &lt;&lt; "secret thing happened"_hidden &lt;&lt; std::endl;
}
</code></pre>

<h3>String interning</h3>

<p>Access to the contents of a string literal as a template parameter pack
allows string data to be deduplicated during translation, which in turn
permits value comparisons to be performed rapidly by comparing the addresses
of strings. Without literal operator templates, this requires either runtime
overhead to perform the interning, or for the programmer to explicitly construct
an object to hold the canonical value of a string. These costs can be avoided
with a literal operator template:</p>

<pre class="extract"><code>std::map&lt;std::string, const char*&gt; intern_map;
template&lt;char ...String&gt; struct register_intern {
  static constexpr char intern[] = { String..., 0 };
  static register_intern register_;
  register_intern() { intern_map[intern] = intern; }
};
template&lt;char ...String&gt; register_intern&lt;String...&gt; register_intern&lt;String...&gt;::register_;

template&lt;typename CharT, CharT ...String&gt; constexpr const char *operator""_intern() {
  static_assert(std::is_same&lt;CharT, char&gt;(), "can only intern narrow strings");
  return (&amp;register_intern&lt;String...&gt;::register_,
          register_intern&lt;String...&gt;::intern);
}

static_assert("foo"_intern == "foo"_intern, "");
</code></pre>

<h3>String literal canonicalization</h3>

<p>Qt defines the macros <code>SIGNAL</code> and <code>SLOT</code>, which
encode a method signature in order to allow it to be dynamically invoked:</p>

<pre class="extract"><code>#define SIGNAL(x) "1" #x
#define SLOT(x) "2" #x
// ...
  QObject::connect(sender, SIGNAL(thingHappened(int)),
                   receiver, SLOT(onThingHappened(int)));
</code></pre>

<p>Before the results of the <code>SIGNAL</code> and <code>SLOT</code> macro can
be used, they must first be canonicalized (by removing spaces, canonicalizing
the location of the <code>const</code> keyword, and so on). With a literal
operator template, this canonicalization can be performed during
translation.</p>

<pre class="extract"><code>#define SIGNAL(x) #x ## _qt_signal
#define SLOT(x) #x ## _qt_slot
</code></pre>

<h2>Proposal</h2>

<p>Add a new form of literal operator template for a cooked string literal:</p>

<pre class="extract"><code>template&lt;typename CharT, CharT ...String&gt;</code></pre>

<p>This form will be used if a non-template literal operator for the string
literal is not available. The first template argument will be the element type
of the string, and the remaining arguments are the code units in the string
literal (excluding its terminating null character).</p>

<h3>Raw forms</h3>

<p>N2750 expressed a concern that users may wish to use a <q>raw</q> form of
string literal. The form proposed herein is a <q>cooked</q> literal operator; no
raw form is proposed.  If users wish to capture the contents of a string literal
as written, a literal operator template can be combined with a raw string
literal:</p>

<pre class="extract"><code>R"(.*\.\(org\|com\|net\))"_regexp</code></pre>

<h3>Character literals</h3>

<p>No literal operator template is proposed for character literals. The author
does not wish to encourage the use of multi-byte character literals, and for
single-byte character literals, the feature would have extremely limited
utility. Indeed, no use cases are known for this feature, and any possible
cases could be addressed by using a string literal instead of a character
literal.</p>

<h2>Proposed wording</h2>

<p>The term of art <i>literal operator template</i> is split into two terms,
<i>numeric literal operator template</i> and <i>string literal operator
template</i>. The term <i>literal operator template</i> is retained and refers
to either form.</p>

<p>Replace <q>literal operator template</q> with <q>numeric literal operator
template</q> in [lex.ext] (2.14.8)/3 and [lex.ext] (2.14.8)/4:</p>

<blockquote class="std">
  [...] Otherwise, <i>S</i> shall contain a raw literal operator or a
  <ins>numeric</ins> literal operator template (13.5.8) but not both. [...]
  Otherwise (<i>S</i> contains a <ins>numeric</ins> literal operator template),
  <i>L</i> is treated as a call of the form [...]
</blockquote>

<p>Change in [lex.ext] (2.14.8)/5:</p>

<blockquote class="std">
  If <i>L</i> is a <i>user-defined-string-literal</i>, <ins>let <i>C</i> be the
  element type of the string literal as determined by its
  <i>encoding-prefix</i>,</ins> let <i>str</i> be the literal without its
  <i>ud-suffix</i><ins>,</ins> and let <i>len</i> be the number of code units in
  <i>str</i> (i.e., its length excluding the terminating null character).
  <ins>If <i>S</i> contains a literal operator with parameter types <code>const
  <i>C</i> *</code> and <code>std::size_t</code>, the</ins> <del>The</del> literal
  <i>L</i> is treated as a call of the form</p>

  <code>&nbsp; operator "" <i>X</i>(<i>str</i>, <i>len</i>)</code>

  <ins>
  <p>Otherwise, <i>S</i> shall contain a string literal operator template
  (13.5.8), and <i>L</i> is treated as a call of the form</p>

  <code>&nbsp; operator "" <i>X</i>&lt;<i>C</i>,
    <i>e</i>'<i>s<sub>1</sub></i>',
    <i>e</i>'<i>s<sub>2</sub></i>', ...
    <i>e</i>'<i>s<sub>k</sub></i>'&gt;()</code></p>

  where <i>e</i> is empty when the <i>encoding-prefix</i> is <code>u8</code>
  and is otherwise the <i>encoding-prefix</i> of the string literal, and
  <i>str</i> contains the sequence of code units
  <i>s<sub>1</sub>s<sub>2</sub>...s<sub>k</i> (excluding the terminating null
  character).
  </ins>
</blockquote>

<p>Change in [over.literal] (13.5.8)/5:</p>

<blockquote class="std">
  <del>
  The declaration of a literal operator template shall have an empty
  <i>parameter-declaration-clause</i> and its <i>template-parameter-list</i>
  shall have
  </del>

  <ins>A <i>numeric literal operator template</i> is a literal operator template
  whose <i>template-parameter-list</i> has</ins>

  a single <i>template-parameter</i> that is a non-type template parameter pack
  (14.5.3) with element type <code>char</code>.

  <ins>A <i>string literal operator template</i> is a literal operator template
  whose <i>template-parameter-list</i> comprises
  a type <i>template-parameter</i> <i>C</i> followed by a non-type template
  parameter pack with element type <i>C</i>.

  The declaration of a literal operator template shall have an empty
  <i>parameter-declaration-clause</i> and shall declare either a numeric literal
  operator template or a string literal operator template.
  </ins>
</blockquote>

</body>
</html>
