<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=US-ASCII">
<title>C++ Dynamic Library Support</title>
</head>
<body>
<h1>C++ Dynamic Library Support</h1>

<p>
ISO/IEC JTC1 SC22 WG21 N2407 = 07-0267 - 2007-09-10
</p>

<p>
Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org
</p>

<ul>

<li><a href="#Introduction">Introduction</a>
<ul>
<li><a href="#Benefits">Benefits</a></li>
<li><a href="#Terminology">Terminology</a></li>
</ul>
</li>

<li><a href="#Practice">Practice</a>
<ul>
<li><a href="#PracticeIsolation">Isolation</a></li>
<li><a href="#PracticeResolution">Resolution</a></li>
</ul>
</li>

<li><a href="#Proposal">Proposal</a>
<ul>
<li><a href="#ProposalBinding">Late Binding</a></li>
<li><a href="#ProposalIsolation">Isolation</a></li>
<li><a href="#ProposalResolution">Resolution</a>
    <ul>
    <li><a href="#ProposalSingle">Single Definition</a></li>
    <li><a href="#ProposalMultiple">Multiple Definitions</a></li>
    </ul>
</li>
<li><a href="#ProposalLoading">Conditional Loading</a></li>
<li><a href="#ProposalRemoval">Removal</a></li>
</ul>
</li>

<li><a href="#Changes">Changes</a></li>

</ul>

<h2><a name="Introduction">Introduction</a></h2>

<p>
The construction and use of dynamic libraries
has become a significant requirement on modern software development.
Unfortunately, their interaction with C++
varies between implementations
and is often underspecified on any given implementation.
</p>

<p>
The problem with dynamic libraries in C++ is that
the benefits they provide introduce another layer of visibility.
This additional layer of visibility
is intended to provide for additional isolation,
but is in direct contradiction to the one-definition rule.
</p>

<p>
See the following papers for more complete discussion of the issues.
The latter paper has an extensive set of references.
</p>

<ul>

<li>
<cite>Matt Austern,
<a href="http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2002/n1400.html">
N1400</a> Toward standardization of dynamic libraries</cite>
</li>

<li>
<cite>Pete Becker,
<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2002/n1418.html">
N1418</a>
Dynamic Libraries in C++,
Notes from the Technical Session in Santa Cruz, Oct. 21, 2002</cite>
</li>

<li>
<cite>Benjamin Kosnik,
<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n1976.html">
N1976</a>
Dynamic Shared Objects: Survey and Issues</cite>
</li>

</ul>

<p>
In practice,
programmers are able to work around the contradiction
and produce well-formed and reliable programs.
Changing the standard to recognize and guide existing practice
will markedly improve program construction.
Unfortunately, a coherent change to the standard
may well require changes to some of the C++ ABIs,
and hence should be done as part of the standard
rather than as a Technical Report.
</p>


<h3><a name="Benefits">Benefits</a></h3>

<p>
The primary feature of dynamic libraries
is the means to defer the binding of a library interface
to an implementation of that interface
until program execution.
This deferred binding provides a number of benefits to a program.
</p>

<ul>

<li>
The program may bind to an implementation of a library
tuned to a particular hardware platform.
For example, a numeric library vendor
may provide different implementations of a library
tuned to specific cache sizes.
</li>

<li>
The program may be updated in parts.
Newer versions of components
may be distributed independently of each other.
This independent update
may be both within a vendor's components
and between components of different vendors.
For example,
a database vendor may update its query interpreter library
independently of the application using the interpreter
and independently of the C run-time library used by the interpreter.
</li>

<li>
The program may gain functionality after distribution
by binding to libraries with functionality not originally anticipated.
The libraries are often called "plug-in" libraries.
One example is a PDF display plug-in for a web browser.
</li>

</ul>

<p>
The second feature of dynamic libraries is isolation.
Isolation means that
accidents of implementation are not exposed to the users of the library.
That is, the set of bindable symbols provided by the library
is exactly the set of symbols in its interface;
none of the implementation-specific symbols are bindable.
</p>

<p>
The third feature of dynamic libraries is resolution.
Resolution means that
the system can resolve multiple definitions of a symbol.
There are two general strategies for resolution,
dependence and interposition.
More colloquially, these are "the Windows way" and "the Unix way",
respectively.
</p>

<p>
The fourth feature of dynamic libraries is conditional loading.
Conditional loading means that
the name of a dynamic library can be computed at run-time
and then brought into the load set.
This feature is also known as "plug-in".
</p>

<p>
The fifth feature of dynamic libraries is removal.
Removal means that
a dynamic library can be taken out of the load set,
or alternatively, that
the load set is not monotonically increasing.
</p>


<h3><a name="Terminology">Terminology</a></h3>

<p>
We adopt the terminology of
<cite>Matt Austern,
<a href="http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2002/n1400.html">
N1400</a> Toward standardization of dynamic libraries</cite>:
</p>

<dl>

<dt>executable</dt>
<dd>
A program that the user runs.
</dd>

<dt>dynamic library</dt>
<dd>
A library that is bound to the executable at run time.
</dd>

<dt>load unit</dt>
<dd>
Either an executable or a dynamic library.
</dd>

<dt>load set</dt>
<dd>
The executable together with all dynamic libraries
loaded in the execution of the program.
</dd>

<dt>direct dependences</dt>
<dd>
The load units available to the static linker
to satisfy symbols undefined by the load unit.
</dd>

</dl>

<p>
In addition,
we introduce additional terminology
that is necessary to clarify the constraints of dynamic libraries.
</p>

<dl>

<dt>symbol</dt>
<dd>
A named function, type, or variable.
(Typedefs are not symbols.)
</dd>

<dt>visibility</dt>
<dd>
The visibility of a symbol is whether or not it is isolated.
</dd>

<dt>exclusive object definition</dt>
<dd>
An object definition that may appear in only one translation unit.
Regular functions have these definitions.
Regular initialized variables have these definitions.
</dd>

<dt>replicable object definition</dt>
<dd>
An object definition that may appear in multiple translation units,
provided the definitions are the same.
Inline functions and template functions have these definitions.
Uninitialized variables sometimes have these definitions,
which are also known as tentative definitions.
</dd>

<dt>exclusive class definition</dt>
<dd>
A class definition in which
at least one of its function or static data members
has an exclusive definition.
An exclusive class definition is effective
if and only if the compiler emits
class meta data along with exactly one exclusive member definition.
</dd>

<dt>replicable class definition</dt>
<dd>
A class definition 
that does not meet the criteria of an exclusive class definition.
</dd>

</dl>


<h2><a name="Practice">Practice</a></h2>

<p>
This section describes some existing practice.
It is not a complete description;
<cite>Benjamin Kosnik,
<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n1976.html">
N1976</a>
Dynamic Shared Objects: Survey and Issues</cite>
provides more details.
</p>

<h3><a name="PracticeIsolation">Isolation</a></h3>

<p>
There are several approaches
to the syntax for specifying or retracting isolation for a symbol.
</p>

<dl>

<dt>Microsoft</dt>
<dd>
Symbols are isolated by default.
The declaration specifier
<code>__declspec(dllexport)</code>
specifies that a symbol definition is not isolated.
The declaration specifier
<code>__declspec(dllimport)</code>
specifies that a symbol declaration is satisfied by a non-isolated symbol.
</dd>

<dt>GNU on Unix</dt>
<dd>
Symbols are not isolated by default.
A declaration attribute specifies that a symbol is isolated,
e.g. <code>__attribute__((visibility("hidden")))</code>.
</dd>

<dt>Sun</dt>
<dd>
Default symbol isolation is defined by a command-line option,
with the default of the option being that symbols are not isolated by default.
For a given symbol,
the visibility is specified with a storage class,
e.g. <code>__global</code> or <code>__hidden</code>.
</dd>

<dt>
<cite>Pete Becker,
<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1428.html">
N1428</a>
/<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1496.html">
N1496</a>
Draft Proposal for Dynamic Libraries in C++</cite>
</dt>
<dd>
The syntax is only notional, not a formal proposal.
Symbols are isolated by default.
For a given symbol,
the visibility is specified with a storage class,
e.g. <code>shared</code>.
</dd>

<dt>
<cite>Lawrence Crowl,
<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2117.html">
N2117</a>
Minimal Dynamic Library Support</cite>
</dt>
<dd>
The syntax is hinted as a storage class.
</dd>

</dl>
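
<p>
As a rough illustration of how the Microsoft and GNU spellings above
are often reconciled in portable code,
the following sketch hides them behind macros.
The macro names <code>DEMO_EXPORT</code> and <code>DEMO_LOCAL</code>
are hypothetical and are not part of any proposal;
only the compiler-specific specifiers themselves come from the vendors.
</p>

<blockquote><pre><code>
// Hypothetical macro names; only the vendor spellings are real.
#if defined(_WIN32)
#  define DEMO_EXPORT __declspec(dllexport)  // Microsoft: not isolated
#  define DEMO_LOCAL                         // Microsoft: isolated by default
#elif defined(__GNUC__)
#  define DEMO_EXPORT __attribute__((visibility("default")))
#  define DEMO_LOCAL  __attribute__((visibility("hidden")))
#else
#  define DEMO_EXPORT                        // unknown compiler: no annotation
#  define DEMO_LOCAL
#endif

// Implementation detail: isolated to the load unit where supported.
DEMO_LOCAL int my_helper( int a ) { return a+1; }

// Interface: bindable from other load units.
DEMO_EXPORT int give_me_more( int a ) { return my_helper( a+1 ); }
</code></pre></blockquote>

<p>
A client in another load unit can bind to <code>give_me_more</code>
but, where the annotation is honored, not to <code>my_helper</code>.
</p>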

<p>
In addition to specifying (non-)isolation for a single symbol,
it is convenient to have a syntax for specifying (non-)isolation
for a region of code,
particularly in header files.
There are fewer examples of such syntax.
</p>
<dl>

<dt>GNU on Unix</dt>
<dd>
A pragma can push and pop default visibility.<br>
<code>#pragma GCC visibility push(hidden)<br>
#pragma GCC visibility pop</code>
</dd>

<dt>
<cite>Pete Becker,
<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1428.html">
N1428</a>
/<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1496.html">
N1496</a>
Draft Proposal for Dynamic Libraries in C++</cite>
</dt>
<dd>
The syntax is only notional, but the <code>shared</code> storage class
can be placed before a brace-enclosed region,
much like <code>extern&nbsp;"C"</code>.
</dd>

</dl>
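
<p>
A minimal sketch of the GNU region syntax in use follows.
The namespace <code>detail</code> and the function names are hypothetical;
the pragmas are guarded so that
compilers that do not define <code>__GNUC__</code> simply skip them.
</p>

<blockquote><pre><code>
// Everything between push and pop defaults to hidden (isolated).
#if defined(__GNUC__)
#pragma GCC visibility push(hidden)
#endif

namespace detail {
    int scale( int x ) { return 2 * x; }
}

// Restore whatever default was in effect before the push.
#if defined(__GNUC__)
#pragma GCC visibility pop
#endif

// Declared after the pop, so it keeps the prevailing default visibility.
int scaled_sum( int a, int b ) {
    return detail::scale( a ) + detail::scale( b );
}
</code></pre></blockquote>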

<h3><a name="PracticeResolution">Resolution</a></h3>

<p>
There are two primary approaches to resolution of multiple symbol definitions.
</p>

<dl>

<dt>Windows</dt>
<dd>
A reference is bound to the
definition of a symbol in a statically dependent library.
Thus a library may not have a function replaced by the application.
A consequence is that
a library may not offer replicable functions without substantial work.
This work is necessary to meet the
application-replaceable semantics of the global allocation operators.
</dd>

<dt>Unix</dt>
<dd>
The first definition of a symbol in the ordered load set
is chosen for all references.
That is, the first definition interposes on other definitions.
A consequence is that
a library may have any function replaced by the application.
</dd>
</dl>

<p>
As always, there are complications.
Modern Unix systems provide for "protected" resolution,
in which a reference to a protected symbol
defined within the same load unit
will bind to that definition
irrespective of any prior definitions in the ordered load set.
</p>

<p>
Furthermore,
some Unix systems, e.g. Sun and GNU/Linux,
provide the ability to resolve a symbol to a dependent library
in preference to normal interposition resolution.
</p>


<h2><a name="Proposal">Proposal</a></h2>

<p>
We propose C++ dynamic library support
that exploits existing operating system facilities
for dynamic libraries.
Furthermore,
we structure that support
so that complexity rises with benefits.
The Committee can choose the features that it needs.
Finally, we specifically avoid trying to solve the whole problem,
concentrating instead on those portions of the problem
that affect large amounts of code.
If an aspect of the problem generally only affects a few lines of code,
we leave it to programmers to write platform-specific code.
</p>


<h3><a name="ProposalBinding">Late Binding</a></h3>

<p>
The first feature of dynamic library support is late binding.
Late binding is entirely consistent with the current standard,
and no change is necessary for this feature.
</p>


<h3><a name="ProposalIsolation">Isolation</a></h3>

<p>
The second feature of dynamic library support is isolation.
To enable isolation,
the standard must recognize the load unit
as an intermediate layer of visibility
between a translation unit and the program.
</p>

<p>
Once load units are present,
the standard must provide a mechanism
that specifies whether a symbol is isolated to a load unit
or visible to all load units.
</p>

<p>
The primary mechanism for isolation is and should remain namespaces.
Namespaces provide the best foundation for preventing symbol clashes.
However, namespaces are insufficient for two reasons.
First, they are transparent to functions with C linkage.
Second, they are not robust to an adversarial use of implementation details.
As a consequence, an additional mechanism is necessary.
</p>

<p>
Given a mechanism for isolation,
the standard must admit multiple definitions for the same symbol,
provided that those definitions are isolated from each other.
</p>

<p>
For the isolation syntax,
we propose to avoid introducing a new keyword
and extend the <code>public</code>, <code>protected</code>,
and <code>private</code> labels
to load unit visibility for namespace-scoped symbols.
Symbols with <code>public</code> or <code>protected</code> labels
are <em>not</em> isolated.
(The distinction between <code>public</code> and <code>protected</code>
appears later.)
Symbols with a <code>private</code> label
<em>are</em> isolated to a load unit
and are distinct from any non-isolated symbol
or any isolated symbol declared in another load unit.
Specifically, functions and variables have distinct addresses
while types have distinct typeids.
</p>

<p>
For class definitions, any meta-data must be isolated as well.
Achieving distinct typeids for isolated types
is most likely to require an implementation
to change the ABI of the language.
</p>

<p>
The member function and static member variable symbols
associated with a class
have the visibility of their containing class.
That is, within class definitions,
the labels have their existing <dfn>access-specifier</dfn> meaning.
Furthermore, class member definitions outside of a class definition
ignore the prevailing visibility,
and instead use the visibility of the class definition.
</p>

<p>
A label within a declarative region
extends to the next label or to the end of the region,
whichever comes first.
Any label in effect immediately before a declarative region
will be in effect immediately after that region.
There are two kinds of declarative regions,
namespace and language linkage.
Programmers can limit the scope of such labels at global scope,
or within a namespace region,
by enclosing them
in language linkage (<code>extern&nbsp;"C++"&nbsp;{&nbsp;}</code>) regions.
For example:
</p>

<blockquote><pre><code>
extern "C++" {
private:
    int my_helper( int a ) { return a+1; }
public:
    int give_me_more( int a ) { return my_helper( a+1 ); }
}
</code></pre></blockquote>

<p>
To assist in migration of existing code,
the visibility in effect at the beginning of a translation unit
is implementation-defined.
Within headers,
programmers should place all labels
within a declarative region
so as to preserve the implementation default.
</p>

<p>
We considered using the proposed annotation facility,
<cite>Jens Maurer, Michael Wong,
<a href="http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2007/n2379.pdf">
N2379</a>
Towards support for attributes in C++</cite>,
but decided against using it
because the isolation specification
does not meet the "ignorable" criteria for attributes.
That is, removing the isolation indication
would produce ill-formed programs.
</p>


<h3><a name="ProposalResolution">Resolution</a></h3>

<p>
The third feature of dynamic library support is
resolution of symbol references to multiple definitions.
This topic is somewhat complicated,
and we approach it in two ways.
</p>


<h4><a name="ProposalSingle">Single Definition</a></h4>

<p>
The simplest proposal is to treat multiple definitions
of non-isolated symbols as an error.
</p>

<p>
Because existing dynamic linker technology
has only one category of definition,
any replicable definition appears as though
there were multiple exclusive definitions.
Therefore, the simplest standard
would simply prohibit non-isolated replicable definitions.
A consequence is that the standard library 
would need careful thought
as to which parts were applicable to a shared dynamic library
and which parts were applicable to a replicated static library.
</p>

<p>
A more usable standard would support non-isolated replicable definitions
provided that the definitions are identical.
Doing so is not conceptually difficult;
the primary problem is choosing a unique address/typeid.
The dynamic linker can simply choose one of the definition artifacts.
The existing Unix interposition resolution approach
meets these semantics exactly.
The existing Windows dependence resolution approach poses a problem,
normally yielding different addresses within different load units.
Potential solutions are to require that
each library obtain addresses from a shared table
or to simply live with different addresses
for what is conceptually the same function.
Programmers rarely rely on inline functions having identical addresses.
</p>
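
<p>
For concreteness, the kind of code that does rely on a unique address
compares function pointers obtained in different places.
The sketch below is a single-translation-unit illustration only,
with hypothetical names throughout;
the two accessor functions stand in for code
that would live in two different load units.
</p>

<blockquote><pre><code>
// A replicable definition: it may appear in many translation units.
inline int twice( int x ) { return 2 * x; }

typedef int (*unary_fn)( int );

// Stand-ins for code in two different load units taking the address.
unary_fn address_seen_by_a() { return twice; }
unary_fn address_seen_by_b() { return twice; }

// Code like this assumes that one function has one address.
bool same_function( unary_fn f, unary_fn g ) { return f == g; }
</code></pre></blockquote>

<p>
Within one load unit the comparison holds.
Under dependence resolution without a shared table,
the same comparison across load units could fail.
</p>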


<h4><a name="ProposalMultiple">Multiple Definitions</a></h4>

<p>
When multiple definitions are available for exclusive definitions,
the implementation must resolve references to definitions.
Unfortunately,
neither the Unix approach nor the Windows approach
appears to fully solve the problem.
The Unix interposition approach
leaves programs vulnerable to inconsistent definitions
when functions are inlined and interposed.
The Windows dependence approach prevents interposition,
as required for the global allocation operators.
To resolve this issue,
we propose to "do both".
</p>

<p>
Syntactically, we refine the label syntax introduced above for isolation.
Semantically, we leave much implementation-defined
because detailed specification of compile and link commands
is beyond the scope of the standard.
</p>

<ul>

<li>
A symbol declared with the label <code>public</code>
has interposition semantics.
All references to a <code>public</code> symbol
will resolve to a single definition within the program.
The selection of definition is otherwise implementation-defined.
</li>

<li>
A symbol declared with the label <code>protected</code>
has dependence semantics.
A load unit's reference to a <code>protected</code> symbol
will resolve to a definition
either in the current load unit
or, failing that,
in one of its direct dependences.
The selection of definition is otherwise implementation-defined.
</li>

<li>
For type definitions, all non-isolated types
may only differ in their exclusive function and variable definitions.
</li>

</ul>

<p>
For example, and by way of illustration,
the standard library would have the following declarations.
</p>

<blockquote><pre><code>
namespace std {
    typedef void (*new_handler)();
protected:
    new_handler set_new_handler( new_handler ) throw();
}
extern "C++" {
public:
    void * operator new( std::size_t ) throw( std::bad_alloc );
}
</code></pre></blockquote>
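
<p>
The <code>public</code> label on <code>operator new</code>
reflects the fact that an application may replace it.
A minimal sketch of such a replacement follows,
written in current C++ rather than in the proposed syntax.
The counter <code>allocation_count</code> is hypothetical
and exists only to make the replacement observable.
</p>

<blockquote><pre><code>
#include &lt;cstddef&gt;
#include &lt;cstdlib&gt;
#include &lt;new&gt;

// Hypothetical counter, used only to observe the replacement.
unsigned long allocation_count = 0;

// Application-supplied replacement for the global allocation function.
void * operator new( std::size_t size ) {
    ++allocation_count;
    if ( size == 0 ) size = 1;          // malloc(0) may return null
    void * p = std::malloc( size );
    if ( p == nullptr ) throw std::bad_alloc();
    return p;
}

// Matching replacements, so that malloc'ed memory is released by free.
void operator delete( void * p ) noexcept { std::free( p ); }
void operator delete( void * p, std::size_t ) noexcept { std::free( p ); }
</code></pre></blockquote>

<p>
With interposition semantics, every load unit in the program
would reach this one definition.
</p>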


<p>
The primary problem with different replicable definitions
is that current linker technology
is unable to determine that two definitions are replicants of each other.
Furthermore, replicants are often involved in inlining,
and a non-inline call with different semantics from an inline expansion
is bound to cause inconsistency and potentially failure.
Therefore, we propose to prohibit <code>public</code> replicable definitions.
</p>


<h3><a name="ProposalLoading">Conditional Loading</a></h3>

<p>
The fourth feature of dynamic library support is conditional loading.
In terms of isolation and resolution,
conditional loading introduces no new issues.
The two new issues are initialization and destruction order
for static-duration variables
and finding a root symbol for the library.
</p>

<p>
We believe that the order of initialization and destruction
as defined in
<cite>Lawrence Crowl,
<a href="http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2007/n2382.html">
N2382</a> Dynamic Initialization and Destruction with Concurrency</cite>
provides for sufficiently late execution of initializers
to admit conditional loading.
</p>

<p>
Finding the root symbol on a library
generally involves converting a string
containing some form of the symbol name
into an address.
As this code has low static frequency,
we choose to not standardize it.
Programmers will need to specialize their code
for each supported platform.
</p>
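
<p>
As an example of such platform-specific code,
the sketch below uses the POSIX
<code>dlopen</code> and <code>dlsym</code> calls.
It assumes a POSIX system
(older systems may also need to link against <code>libdl</code>);
the helper name <code>find_root_symbol</code> is hypothetical.
</p>

<blockquote><pre><code>
#include &lt;dlfcn.h&gt;   // POSIX only

// Hypothetical helper: loads library_path and returns the address of
// symbol_name within it, or a null pointer if either step fails.
void * find_root_symbol( const char * library_path,
                         const char * symbol_name ) {
    void * handle = dlopen( library_path, RTLD_NOW | RTLD_LOCAL );
    if ( handle == nullptr ) return nullptr;
    void * address = dlsym( handle, symbol_name );
    if ( address == nullptr ) dlclose( handle );  // nothing usable: unload
    return address;      // on success the library stays loaded
}
</code></pre></blockquote>

<p>
Because the lookup is by string,
root symbols are typically declared <code>extern&nbsp;"C"</code>
so that the name is not mangled.
</p>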


<h3><a name="ProposalRemoval">Removal</a></h3>

<p>
The fifth feature of dynamic library support is library removal.
This feature is also known as closing a dynamic library.
The implications on order of destruction
of static-duration and thread-duration variables
could be severe.
So, rather than try to define a precise meaning,
we intend to provide advice to programmers
on how to avoid the problems.
In particular,
</p>

<ul>

<li>
Programmers shall terminate all threads
that reference a thread-duration variable defined within a load unit
before removing that load unit from the load set.
In practice, this means that a library
intended to be conditionally loadable
should only use thread-duration variables
in threads that it creates
and then terminates before removal.
</li>

<li>
Programmers shall ensure that no static-duration variable
is referenced from outside the removable load unit.
In practice, this means that
all variables in removable libraries
have <code>private</code> visibility
and that the library does not pass their addresses
outside of the library.
</li>

</ul>
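
<p>
The second piece of advice can be sketched as follows.
The names are hypothetical,
and the unnamed namespace is only a stand-in
for the proposed <code>private</code> visibility,
since it isolates to a translation unit rather than to a load unit.
</p>

<blockquote><pre><code>
namespace {
    // Static-duration state that never leaves the library.
    int call_count = 0;
}

// The interface returns the value, never the address, of call_count,
// so no reference to the variable can outlive the library.
int record_call() { return ++call_count; }
</code></pre></blockquote>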

<p>
As code to remove a dynamic library also has low static frequency,
we choose to not standardize it.
Programmers will need to specialize their code
for each supported platform.
</p>


<h2><a name="Changes">Changes</a></h2>

<p>
The changes to text of the standard will go here.
The extent of those changes depends on
which features the committee chooses to support.
</p>

<p>
The intent is to cover core language changes only,
leaving standard library changes to a separate paper.
</p>

</body>
</html>
