<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en"><head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
<title>A Modern C++ Signature for Main</title>
<style type="text/css">
blockquote.p {
  border-style: solid;
  border-width: thin;
  padding-left: 1em;
  padding-right: 1em;
}

.q {
  font-weight: bold;
  text-decoration: underline;
  font-size: 150%;
}
</style>
</head>
<body>
<table border="1">
<tbody>
<tr>
<th>Doc. no:</th>
<td>P0781R0</td>
</tr>
<tr>
<th>Date:</th>
<td>2017-09-25</td>
</tr>
<tr>
<th>Reply to:</th>
<td><a href="mailto:erich.keane@intel.com">Erich Keane</a></td>
</tr>
</tbody>
</table>
<h1>A Modern C++ Signature for <tt>main</tt></h1>
<p class="q">Introduction</p>
<p>One of the greatest accomplishments of the ISO C++ Committee over the past decade was to provide easy-to-use, powerful, zero-cost abstractions over painful C-isms in the language. A very useful benefit of these abstractions is that they make common operations easy and consistent.
</p>
<p>
However, one of the last vestiges of <tt>C</tt> left in <tt>C++</tt> is both one of the most difficult things to teach to new programmers and an incredibly error-prone one even for knowledgeable programmers. This, of course, is the set of available signatures of the application entry function, <tt>main</tt>. The meaningful signature of <tt>main</tt> dates back to the earliest versions of <tt>C</tt>. This paper proposes adding an additional signature for the <tt>main</tt> function, starting with some guidelines that should be used to select a replacement, followed by a few potential options.
</p>
<p class="q">Justification</p>
<p>
First, let's consider a somewhat common usage of the useful signature of <tt>main</tt>:
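<blockquote><pre>#include &lt;cstddef&gt;  // std::size_t
#include &lt;cstring&gt;  // std::strlen

int main(int argc, char** argv){
  // argc is an int, so the index is an int as well
  for (int i = 0; i &lt; argc; ++i) {
    char *Arg = argv[i];
    std::size_t ArgSize = std::strlen(Arg);
    // some usage of this character array...
  }
}
</pre></blockquote>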
One thing that you may take from this pessimized and contrived example is the sheer number of <tt>C</tt>-isms and otherwise terrible functions that the programmer is immediately exposed to. One can imagine how terrifying this is to a student of <tt>Modern C++</tt> for what is otherwise a simple operation. It requires familiarity with C arrays, pointer decay, C-string functions, and even traditional for-loops!</p>
<p>Any experienced <tt>C++</tt> programmer would likely be disgusted by the same issues, and would immediately wrap this in some other abstraction, such as <tt>boost::program_options</tt>. Even so, that is a sizable dependency for a small application, and perhaps significantly more complex than required.</p>
<p class="q">Goals for a Solution</p>
<p>
The author of this paper has two main goals for this proposal:
<ol>
<li>Make the program command line arguments easy to use for experienced programmers in a way that embraces <tt>Modern C++</tt> features.</li>
<li>Make the program command line arguments consistent enough with the remainder of <tt>Modern C++</tt> that students of the language will have an immediate understanding of how to accept them.</li>
</ol>
Removing all mention of traditional arrays, pointers, C-string handling, and traditional for-loops is an immediately useful goal. This paper proposes an alternative syntax that would look more like the following example:
<blockquote><pre>int main(const some_container&lt;const some_string_type&gt; args){
  for (auto Arg : args) {
    // some usage of this string...
  }
}
</pre></blockquote>
The primary benefit of this version is that a container of string types is likely the most familiar kind of structure to experts and beginners alike,
providing a simple-to-use, self-explanatory structure for programmers of all levels. Gone are all of the <tt>C</tt>-isms, replaced with memory-safe alternatives that are significantly more terse and self-explanatory.</p>
<p>
In addition to the above form, this paper proposes a number of options for <tt>some_container</tt> and <tt>some_string_type</tt>.  The goals of these are listed below:
</p>
<p>
<b><tt>some_container</tt></b>
<ol>
<li>Well ordered: The container should maintain the actual order of the parameters listed on the command line.</li>
<li>Iterable: Range-for and iterators are incredibly useful and consistent in <tt>Modern C++</tt>, so looping through the parameters is necessary.</li>
<li>Contiguous Storage: This minimizes storage, and provides the best chance for operating system support as a zero-cost abstraction.</li>
</ol>
<b><tt>some_string_type</tt></b>
<ol>
<li>Constructible from a <tt>char*</tt>: Currently the majority of OSs provide an array of character pointers, so something that can easily be constructed from one of those is likely to provide the best performance.</li>
<li>Convertible to a standard <tt>C++</tt> string type: A vast majority of operations and algorithms in current <tt>C++</tt> code accept <tt>std::string</tt>. A type that either is a <tt>std::string</tt>, or is easily convertible to one, is optimal.</li>
</ol>
<p class="q">Suggested Solution</p>
<p>
The author of this paper suggests <b><tt>std::initializer_list&lt;std::string_view&gt;</tt></b> for a number of reasons, in addition to the thoughts above. Firstly, <tt>std::initializer_list</tt> is already recognized and specially constructed by the compiler. It is a lightweight type that can be trivially mapped to an existing section of memory, so it leaves the most flexibility in the implementation details. Additionally, <tt>std::string_view</tt> is also incredibly lightweight, and can be mapped easily onto an existing section of memory. It has the further advantages of being trivially copyable and simply convertible to <tt>std::string</tt>.
</p>
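<p>
To make the intended usage concrete, the following is a minimal sketch of how the suggested signature could be approximated in today's language. The names <tt>arg_list</tt>, <tt>make_args</tt> and <tt>modern_main</tt> are hypothetical and not part of this proposal. Because user code cannot build a <tt>std::initializer_list</tt> from a run-time array, the sketch assumes a <tt>std::vector</tt> of <tt>std::string_view</tt> as a stand-in, which allocates where the proposed form would not have to.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the proposed parameter type. The paper suggests
// std::initializer_list<std::string_view>; std::vector is used here only
// because it can be filled at run time.
using arg_list = std::vector<std::string_view>;

// Hypothetical helper: wrap each argv[i] in a string_view. No characters are
// copied; each view refers to the original argument storage.
arg_list make_args(int argc, char** argv) {
  assert(argc >= 0 && argv != nullptr);
  arg_list args;
  args.reserve(static_cast<std::size_t>(argc));
  for (int i = 0; i < argc; ++i) {
    args.emplace_back(argv[i]);  // the length of argv[i] is computed once here
  }
  return args;
}

// Hypothetical "modern" entry point: no raw pointers, no strlen, no index loop.
// Returns the total number of argument characters, purely for illustration.
int modern_main(const arg_list& args) {
  std::size_t total = 0;
  for (std::string_view arg : args) {
    std::string owned{arg};  // explicit conversion when a std::string is needed
    total += owned.size();
  }
  return static_cast<int>(total);
}
```

<p>
Under these assumptions, a conventional <tt>main(int argc, char** argv)</tt> would only need to forward with <tt>return modern_main(make_args(argc, argv));</tt>.
</p>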
<p>
This solution, however, is not quite perfect.
</p>
<p>
First, on most operating systems this signature requires an allocation of <tt>argc</tt> elements, since the current <tt>char**</tt> array cannot be trivially reinterpreted as a <tt>std::initializer_list</tt>. However, the author would like to point out that the OS already knows the length of each argument, so it would need no additional calculation to provide the process with a list of pointer/integer pairs rather than just a list of pointers.
  </p>
<p>
Secondly, this signature currently requires an increased startup cost, since the length of each parameter must be calculated. However, the programmer is likely to require this calculation anyway. There is a slight risk that an individual executing a program that uses this signature could pass a very large number of command-line characters. The author of this paper believes that this is an acceptable risk (since at worst, it doubles the launch cost in this situation). If the programmer believes this is a real concern, they are still welcome to use one of the previous signatures.
</p>
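<p>
The two costs discussed above can be sketched as follows. The <tt>raw_arg</tt> struct is a hypothetical pointer/length pair of the kind an operating system could hand over, and both function names are illustrative only. Constructing a <tt>std::string_view</tt> from a bare <tt>char*</tt> has to scan for the terminator, whereas constructing it from a pointer and a known length does not.
</p>

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical pointer/length pair, as a loader might provide it.
struct raw_arg {
  const char* data;
  std::size_t size;
};

// Today's situation: only a pointer is available, so the constructor must
// walk the characters up to the null terminator (linear in the length).
std::string_view from_c_string(const char* s) {
  return std::string_view{s};
}

// With a length supplied alongside the pointer, no scan is needed.
std::string_view from_pair(raw_arg a) {
  return std::string_view{a.data, a.size};
}
```

<p>
Either way the resulting view is identical; only the work done at startup differs.
</p>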
<p>
Finally, <tt>initializer_list</tt> has two minor issues. First, it does not provide random-access indexing (it has no <tt>operator[]</tt>). This is believed to be fairly minor, since the ordering of parameters is typically more important than their ordinal location. Again, if indexing is completely mandatory, the existing signature is still available. Secondly, <tt>initializer_list</tt> does not have a constructor that would work in this situation. However, it is already a type that is magically created by the compiler, so one more condition where this happens seems acceptable.
  </p>
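<p>
As a side note on positional access: the standard library defines <tt>std::initializer_list&lt;E&gt;::iterator</tt> as <tt>const E*</tt>, so an element can still be reached by offsetting from <tt>begin()</tt>. The helper below, <tt>nth_arg</tt>, is a hypothetical illustration and not part of this proposal.
</p>

```cpp
#include <cstddef>
#include <initializer_list>
#include <string_view>

// Hypothetical helper: positional access without operator[].
// begin() returns a const std::string_view*, so pointer indexing is valid
// for any i < size(). Out-of-range requests yield an empty view.
std::string_view nth_arg(std::initializer_list<std::string_view> args,
                         std::size_t i) {
  return i < args.size() ? args.begin()[i] : std::string_view{};
}
```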
<p class="q">Past/Potential Criticisms</p>
<p><b>What about <tt>array_view</tt>?</b></p>
<p>This type actually has a number of the advantages of <tt>initializer_list</tt> and would add random-access indexing; however, it currently does not exist in the standard. Not proposed here, but perhaps possible for a future paper, would be to add random-access indexing to <tt>initializer_list</tt>.</p>
<p><b>What about <tt>std::string/std::vector</tt>?</b></p>
<p>These types have two big costs that are likely not acceptable. First, they require an allocator, which could potentially carry state, and that state would require a guaranteed initialization order relative to global initialization. Secondly, they would likely prevent an intelligent operating system designer from changing the entry-function format of the OS to better match the language entry function.</p>
<p><b>What happens if the user didn't include initializer_list/string_view headers/modules?</b></p>
<p> The author believes that this should be an error condition.  However, others have argued that the compiler should materialize these includes/imports if necessary.  The author has no issue with this behavior.</p>
<p><b>Why is it a <tt>const string_view</tt>?</b> </p>
<p>As this functionality is meant for typical usage and to prevent common errors, this paper proposes disallowing modification of the command line arguments by default. The existing form of the entry function can be used by those wishing to modify their arguments.</p>
<p><b>What about <tt>argv[argc]</tt>?</b> </p>
<p>
Both the C and C++ standards state that this should contain 0. This paper proposes that this value not be part of the <tt>initializer_list</tt>, as it is not terribly useful for programmers, and is confusing at best for beginners.
</p>

</body></html>
