<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta name="Author" content="Roger Orr">
<title>ABI breakage - summary of initial comments</title>

<style type="text/css">
  body { font-family: sans-serif }
</style>

</head>

<body>
ISO/IEC JTC1 SC22 WG21 P1654R0<br/>
Roger Orr &lt;rogero@howzatt.demon.co.uk&gt;<br/>
Target audience: All of WG21<br/>
2019-06-13<br/>

<h1 align="center">ABI breakage - summary of initial comments</h1>

<h2>Abstract</h2>

The Direction Group has been asked to look at ABI (Application Binary Interface) breakage. Note that this is different from, but related to, the API (Application Programming Interface).
<p>
The first step was to solicit input from EWG/LEWG to try and capture the variety of issues
covered by the single term "ABI breakage".
<br>
This document is an informal summary of comments, including many of those made on the two evolution reflectors.
<p>
Thank you to all who have provided input; errors and omissions remain, as usual, the fault of the author.

<h2>Table of Contents</h2>
<ol>
<li><a href="#original"> Original statement of the concern </a>
<li><a href="#meta"> Meta Comments </a>
<li><a href="#past"> Past ABI breaks </a>
<li><a href="#implementation"> Implementation related </a>
<li><a href="#deprecation"> Changes with 'deprecation' </a>
<li><a href="#rejected"> Changes rejected because of ABI breaks </a>
<li><a href="#reverted"> Changes reverted because of ABI breaks </a>
<li><a href="#future"> Future ABI breaks </a>
<li><a href="#reference"> Additional resources </a>
</ol>

<h1 id="original"> 1. Original statement of the concern </h1>
<p>
From Michael Wong's original email to EWG/LEWG:
<p>
There is a concern that we need to be clear on when we can make clear ABI breakage.
There seems to be several cases on a spectrum of Performance vs Stability.
<p>
Cases 1-4:
<ol>
<li>NEVER break ABI : this is the hope and the most stable.
<li>Break ABI on a case by case basis e.g. SSO (small string optimisation)
<li>Break ABI at key declared boundary releases, e.g.  allow break every 12 years, or every 4 releases
<li>Break ABI at will, e.g. every release, most Performance
</ol><p>
Case 1: NEVER Break ABI leads to slower and slower performance but the best stability<br/>
Case 2: we have done that before, but it is unpredictable to users<br/>
Case 3: We have never tried this before, the question is what is the appropriate time frame between breakage<br/>
Case 4: this is the fastest moving, most performance, and the least stability.

<h1 id="meta"> 2. Meta Comments </h1>

<h2>What is meant by "the C++ ABI" ?</h2>

There is no <i>ABI</i> specified in the C++ standard; implementation details are left to each implementor.<br>
However, each implementation will have made a variety of decisions about the ABI.
<p>
Some implementations have a published ABI - for example on x64 many implementations conform to the Itanium specification (which applies to more than just the Intel Itanium chipset.)
<p>
Some implementations publish guarantees about stability of their ABI. For example, Red Hat publish an 
<a href="https://access.redhat.com/articles/rhel8-abi-compatibility"> "Application Compatibility GUIDE" </a>; Appendix A guarantees compatability for three releases for various components, including <code>libstdc++</code>.
<p>
Such documents and guarantees will not, in general, be under the control of WG21.
<p>
<h3>An ABI covers two separable concerns:</h3>
<ol>
<li>The first, foundational, concern is how C++ types and functions map into the underlying architecture. This includes such things as data structure layout, primitive types, calling convention, polymorphism internal data structures, and register usage.
<li>The second concern is the ABI of the standard library. This will depend heavily on specific implementation details of library types and classes.
</ol>

Changes to the first part generally make it extremely difficult to safely combine binaries with old and new ABI into a single executable.
<p>
Changes to the library can sometimes be accommodated in the library ABI by using sufficiently clever programming to provide backwards compatibility with an existing binary library.
<p>
However, the degree to which this is possible, in any given case, can vary between implementors depending on their library implementation. It can be hard to know, without specialist knowledge, how feasible this might be in a specific case.

<h2>Impact on proposals</h2>

One thing someone found particularly noteworthy: in the San Diego meeting with its
preponderance of new-attendees and papers from new attendees, we saw a
significant uptick on papers that were dismissed because of ABI concerns.
While that isn't <b>proof</b> of anything, I <b>suspect</b> that there are many ideas
that experienced committee members are filtering early because we have
internalized "that's an ABI break and thus a non-starter."
<p>
They didn't have anything on my list that individually felt (even to them) like
"we should break ABI for this" - the most impactful bit would probably be improvements to hashing. But they <b>suspect</b> that if we plan for it
with enough lead time we'll come up with a lot of quality-of-life and minor
performance improvements that add up to a lot.
<p>
Or, if we are just going to let "that's an ABI
break" be an automatic veto, we should probably update our published priorities.
They don't think "ABI stability" is listed anywhere as a "you can rely on this" feature for C++.

<h1 id="past"> 3. Past ABI breaks </h1>

The std::string class was changed for C++11, in response to the addition of threads, to make more of its operations safely concurrently executable (which invalidated copy-on-write implementations.) This included changing <code>data()</code> to require NUL termination.
<p>
Additonally, the change was designed to support the 'small string optimisation', which improves performance for strings short enough to take advantage of it.
<p>
Implementing this requires an ABI change in general as the size of the class changes, as does the layout and meaning of its members.
<p>
gcc provided a dual ABI to support both pre- and post- C++11 code, but it is still the case that by default many installations of gcc still use the pre C++11 implementation.

<p>
When <code>::operator new()</code> started throwing <code>std::bad_alloc</code>, two binary incompatibilities were introduced:
<ol>
<li> Existing binaries that were not designed to have exceptions propagating through them suddenly had exceptions propagating
   through them 
<li> Existing binaries that checked the return value of <code>new</code> no longer handled out of memory conditions
</ol>
Compilers and runtimes incorporated some devious tricks to mitigate the potentially harmful effects of this necessary change.
<p>
The changes to the definition of triviality could potentially have changed calling conventions,
 but to avoid that the Itanium ABI uses the C++98 definition of POD, not the current definition
 of trivial and standard layout, because that's evolved over time.
<p>
Adding the <code>std::system_error</code> base class to <code>std::ios_failure</code> for C++11 was a particularly nasty one for one implementor.
<div style="margin-left: 3em">
(The reason that one's so troublesome is that changing the type of exception thrown by a library doesn't produce any linkage
 changes. You can still link to it as before, but suddenly an exception that used to get caught now passes straight through
 your catch handlers. When a function's return type or parameter type changes, that can be turned into a linker error, so
 the user knows to recompile. Changes to the type of an exception thrown by the standard library (where the throw site is
 not in a header, so the precise type thrown is out of your control) is a silent change in runtime behaviour, and only on
 the exceptional path.
<p>
One implementor remarked: "Buy me a beer some time and I'll tell you the story of Schr&#246;dinger's Catch, which allows a single catch handler to work for 
two distinct types of <code>std::ios_failure</code>."

</div>
<p>
In C++11 <code>std::char_traits</code> changed the parameter types of its members from <code>const char_type&</code> to passing <code>char_type</code> by value
 (It is believeed this is still not actually implemented in libstdc++).
<p>
I think the LWG issues list records quite a few breaks between C++98 and C++03 that probably wouldn't be acceptable today,
but back then almost nobody actually implemented the full standard anyway, so making breaking changes was just part of
finishing the implementation!
<p>
P0482 (char8_t) changed the return type of the <code>u8string</code> and <code>generic_u8string</code> member functions of
 <code>std::filesystem::path</code> for C++20.

<h1 id="implementation"> 4. Implementation related </h1>

Whether move constructors should affect whether a type is passed in a register or on the stack
<p>
Whether empty class types as function arguments take up a slot in the argument list or not.

<h1 id="deprecation"> 5. Changes with 'deprecation' </h1>

The removal of <code>uncaught_exception</code> wasn't really an ABI break due to zombie names.
<p>
Same for get_unexpected/set_unexpected, etc.

<h1 id="rejected"> 6. Changes rejected because of ABI breaks </h1>

A selected list of papers that have been to LEWG or LEWG-I and rejected (sometimes
without further discussion) because their perceived value <b>individually</b>
didn't measure up to the perceived cost of an ABI break:
<ul>
<li> <a href="http://wg21.link/LEWG1053">LEWG1053</a> (Unify algorithms with operator and function object variants)
<li> D.7 - remove uncaught_exception?
<li> system_error should return string_view not std::string
<li> heterogenous lookup could be smoother
<li> hashing salt / std::hash optimization (which means standard unordered
   containers are forever vulnerable to hash flooding)
<li> push_back returns T&
<li> int128 / uint128 can't be added (because maxint_t is part of the ABI)
<li> mark bitset trivial?
<li> <a href="http://wg21.link/P1196">P1196</a> (Value-based std::error_category comparison)
<li> <a href="http://wg21.link/P1197">P1197</a> (A non-allocating overload of error_category::message())
<li> <a href="http://wg21.link/P1198">P1198</a> (Adding error_category::failed())
<li> <a href="http://wg21.link/P1249">P1249</a> (Allow initializer_list to be of non-const T)
<li> <a href="http://wg21.link/LWG3211">LWG3211</a> (std::tuple&lt;&gt; should be trivially constructible)
</ul>
<p>
ABI was the reason why we didn't make destructors implicitly virtual in polymorphic classes.  "If we can take an ABI break we can fix that."<br>
Note: the ABI was not the sole reason; and the impact of this change would be massive at this stage in the life of the language.
<p>
Adding new virtual functions to <code>std::num_get</code> and <code>std::num_put</code> was proposed for short float,
but it is believed has now been dropped from the proposal.
<p>

<h1 id="reverted"> 7. Changes reverted because of ABI breaks </h1>

The addition of <code>std::default_order</code> to associative containers was reverted because it was an ABI break.
<p>
The change from <code>lock_guard&lt;T&gt;</code> to <code>lock_guard<T...></code> was reverted because it was an ABI break.

<h1 id="future"> 8. Future ABI breaks </h1>

When we tried to add monadic optionals, we were concerned that we cannot pass overload sets to callables. This (passing overload sets to callables) would require a future planned ABI breakage.
<p>

(Not an ABI break taken, but one that should have been (or should be) taken)
<p>
make <code>std::unique_ptr&lt;T&gt;</code> be passed as efficiently as <code>T*</code>. Currently there is
a significant performance and optimization hit from using
<code>std::unique_ptr&lt;T&gt;</code> due to the ABI & calling convention required.
<div style="margin-left: 3em">
Which had the following reply: This is slightly different from, say, <code>list::size</code> and CoW string; there
was no change in the specification that would've caused or prevented
such a passing convention. There's fairly little we could've done in
the standard to impact this.
</div>
<p>
Numerous aspects of <code>std::unordered_map</code>'s API and ABI force seriously suboptimal implementation strategies.
(See "SwissTable" talks.)
<p>
Same for <code>std::map</code>. (For example btree-based sorted containers.)
<p>
The most frustrating for one person is <code>std::vector</code>, which cannot support small-size-optimization due to stability
of pointers & iterators across move.
<p>
We changed the return type of *<code>::emplace_back</code> from <code>void</code> to return a reference to the new element.
We <i>didn't</i> do the same for <code>push_back</code> because that would have broken ABI.
If we could break ABI we could make them consistent and remove one reason to (ab)use <code>emplace_back</code>.
<div style="margin-left: 3em">
<i>Further discussion</i>
<p>
Is that really a problematic ABI break for some compilers? In gcc-land we might stick an abi_tag on it so
the new version gets a different mangling, but I believe the ABI is not
broken. Unless of course you start introspecting and use
<code>decltype(c.push_back(e))</code>, but that's indirect and seems acceptable. 
<p>
<i>In reply:</i>
<p>
Without the abi-tag the old and new versions of the function have the same
mangled name.
<p>
One translation unit instantiates the old definition, and in that TU
nothing uses the return value (because it's void).
<p>
Another translation unit instantiates the new definition, and the caller of
the function uses the non-void return values.
<p>
You have two instantiations, with the same symbol name. The linker picks
one. Because it's a Thursday the linker picks the old definition of the
symbol, which doesn't actually return anything. The new TU calls the old
symbol, and there is junk on the stack where it expects to find a return
value. 
<p>
<i>Further reply:</i>
<p>
Thanks. Sorry, my message was not clear enough. I know all that. My point
was that some annotation like abi-tag easily avoids this issue. And you
only need a very basic version of abi-tag for that, which should be easy
to implement for any compiler that cares about binary compatibility. So I
don't think we should refrain from making such changes for ABI reasons.
<p>
Since this is a member function, its exact signature is not mandated by
the standard, so an implementation could also add an extra argument with a
default value, as allowed by [member.functions], to give it a different
mangling. But a vendor-specific annotation is more convenient.
</div>
<p>
Library Fundamentals defines <code>std::packaged_task</code> and <code>std::promise</code> with polymorphic allocator members,
which adds a pointer member to the class. That was originally proposed as a change to the standard types when LFTS was enabled,
which would have been an ABI break. Instead the types in LFTS are distinct types in a distinct namespace.
<p>
<a href="http://wg21.link/LWG2503">LWG2503</a> (multiline option should be added to syntax_option_type) is an ABI break.

<h1 id="reference"> 9. Additional resources </h1>

See these links for other discussions on binary compatability issues:
<ul>
    <li><a href="https://community.kde.org/Policies/Binary_Compatibility_Issues_With_C%2B%2B">https://community.kde.org/Policies/Binary_Compatibility_Issues_With_C++</a>
    <li><a href="https://community.kde.org/Policies/Binary_Compatibility_Examples">https://community.kde.org/Policies/Binary_Compatibility_Examples</a>
    <li><A href="https://wiki.qt.io/Qt-Version-Compatibility">https://wiki.qt.io/Qt-Version-Compatibility</a> Qt pledged ABI compatibility for major releases (almost 10 years)
</ul>

There's some tooling around managing the ABI:
<ul>
    <li><a href="https://abi-laboratory.pro/">https://abi-laboratory.pro/</a>
    <li>a href="https://sourceware.org/libabigail/">https://sourceware.org/libabigail/</a>
    <li><a href="http://lists.llvm.org/pipermail/llvm-dev/2018-September/126472.html">Proposal: llvm-tapi, adding YAML/stub generation for ELF linking support, which morphed into the new llvm-elfabi tool</a>
</ul>
(Thanks to Morris Hafner for providing these links)
</body>