<HTML><HEAD><TITLE>N1496=03-0079, Draft Proposal for Dynamic Libraries in C++ (Revision 1)</TITLE></HEAD><BODY>

<CENTER>
<H1><A NAME="N1496=03-0079, Draft Proposal for Dynamic Libraries in C++">Draft Proposal for Dynamic Libraries in C++ (Revision 1)</A></H1>
</CENTER>

<TABLE ALIGN="RIGHT" CELLSPACING="0" CELLPADDING="0">
<TR>
<TD ALIGN="RIGHT"><B><I>Document number:</I></B></TD>
<TD>&nbsp; N1496&nbsp;=&nbsp;03-0079</TD>
</TR>
<TR>
<TD ALIGN="RIGHT"><B><I>Date:</I></B></TD>
<TD>&nbsp; September 22, 2003</TD>
</TR>
<TR>
<TD ALIGN="RIGHT"><B><I>Project:</I></B></TD>
<TD>&nbsp; Programming Language C++</TD>
</TR>
<TR>
<TD ALIGN="RIGHT"><B><I>Reference:</I></B></TD>
<TD>&nbsp; ISO/IEC IS 14882:1998(E)</TD>
</TR>
<TR>
<TD ALIGN="RIGHT"><B><I>Reply to:</I></B></TD>
<TD>&nbsp; Pete Becker</TD>
</TR>
<TR>
<TD></TD>
<TD>&nbsp; Dinkumware, Ltd.</TD>
</TR>
<TR>
<TD></TD>
<TD>&nbsp; petebecker@acm.org</TD>
</TR>
</TABLE>
<BR CLEAR="ALL">

<HR>

<P>Changes from N1428:</P>

<UL>
<LI>Added new section, <A HREF="#Use Cases">Use Cases</A></LI>
<LI>Reworded discussion of <A HREF="#efficient">efficient</A> to talk about
efficiency compared to native use of dynamic libraries, rather than
to static linking, which on some systems can be much faster</LI>
<LI>Added <A HREF="#transition">transition</A>, as a reminder that
existing code ought to work pretty much the same way when it is
converted to whatever we standardize</LI>
<LI>Added two notes labeled &quot;Open issue&quot;, to highlight
the big issues that we need to discuss in Kona</LI>
</UL>

<HR>

<P><B><A HREF="#Overview">Overview</A>
&#183; <A HREF="#Dynamic Libraries and Application Architecture">Dynamic Libraries and Application Architecture</A>
&#183; <A HREF="#Use Cases">Use Cases</A>
&#183; <A HREF="#Design Principles">Design Principles</A>
&#183; <A HREF="#Impact on the Standard">Impact on the Standard</A>
&#183; <A HREF="#Proposed Text">Proposed Text</A>
&#183; <A HREF="#Bibliography">Bibliography</A>
</B></P>

<HR>

<H2><A NAME="Overview">Overview</A></H2>

<P>Many operating systems today support separating an application into
multiple components, referred to in this paper as <B>dynamic libraries</B>,
that are gathered together by the operating system when the application runs.
Deferring the final assembly of the application until runtime provides
more flexibility for application updates and simplifies addition of code
modules that were not part of the application at the time it was compiled.
Use of dynamic libraries can also reduce system memory requirements when
multiple applications use the same code, by only requiring one copy of the
common code to be held in memory instead of requiring a separate copy for each
application.</P>

<P>The C++ Standard has no explicit provisions for creating applications that
use dynamic libraries. There are a few requirements that were deliberately written
to avoid precluding use of dynamic libraries, but in general an application that
uses dynamic libraries cannot be written entirely in standard C++.</P>

<P>The terminology, the compiler and linker mechanisms, and the semantic rules
for dynamic libraries vary widely from system to system. In particular, it is
difficult to write applications that use dynamic libraries and are portable
between Windows and UNIX, even if those applications use only standard C++ except
where non-standard code is required for dynamic libraries.</P>

<P>This paper discusses the issues presented in adding support for dynamic
libraries to standard C++ and it makes specific recommendations for changes
to the C++ standard for that support. These recommendations are at present
incomplete, and the parts that are presented will undoubtedly be changed
extensively as a result of future discussions. They are intended to provide
a starting point and a framework for changes needed to support dynamic libraries
in standard C++.</P>

<H2><A NAME="Dynamic Libraries and Application Architecture">Dynamic Libraries and Application Architecture</A></H2>

<P>Dynamic libraries are an implementation technique that supports two somewhat different
application architecture features. First, by deferring linking of some source identifiers
until runtime, they can make distribution of updated application code simpler, since
only the affected dynamic libraries need to be replaced and not the entire application.
Second, by supporting loading of libraries identified at runtime (e.g. through
<CODE>dlopen</CODE>, <CODE>LoadLibrary</CODE>, etc.) they make it easier for
an application to support user-supplied extensions, through user-written or third-party add-on
libraries, commonly known as &quot;plug-ins&quot;. This paper currently only addresses
deferred linking; loadable libraries are important, however, and their omission is
temporary.</P>
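
<P>As a rough illustration of the second feature, the following sketch wraps the
native loading calls in a small RAII class. <CODE>LoadedLibrary</CODE> and its members
are hypothetical names introduced for this example and are not part of the proposal;
the only real platform functions used are <CODE>dlopen</CODE>, <CODE>dlsym</CODE> and
<CODE>dlclose</CODE> on UNIX and <CODE>LoadLibraryA</CODE>, <CODE>GetProcAddress</CODE>
and <CODE>FreeLibrary</CODE> on Windows. On some UNIX systems the program may also need
to be linked against the <CODE>dl</CODE> library.</P>

```cpp
#include <cassert>
#include <string>
#if defined(_WIN32)
#  include <windows.h>
#else
#  include <dlfcn.h>
#endif

// Hypothetical RAII wrapper around the native "load a library by name" calls.
class LoadedLibrary {
public:
    explicit LoadedLibrary(const std::string& path)
#if defined(_WIN32)
        : handle_(static_cast<void*>(::LoadLibraryA(path.c_str())))
#else
        : handle_(::dlopen(path.c_str(), RTLD_NOW))
#endif
    {}

    ~LoadedLibrary() {
        if (!handle_) return;
#if defined(_WIN32)
        ::FreeLibrary(static_cast<HMODULE>(handle_));
#else
        ::dlclose(handle_);
#endif
    }

    // The handle is owned by exactly one object.
    LoadedLibrary(const LoadedLibrary&) = delete;
    LoadedLibrary& operator=(const LoadedLibrary&) = delete;

    bool is_open() const { return handle_ != nullptr; }

    // Returns the address of `name`, or a null pointer if the library was not
    // loaded or does not define that symbol.
    void* symbol(const char* name) const {
        assert(name != nullptr);
        if (!handle_) return nullptr;
#if defined(_WIN32)
        return reinterpret_cast<void*>(
            ::GetProcAddress(static_cast<HMODULE>(handle_), name));
#else
        return ::dlsym(handle_, name);
#endif
    }

private:
    void* handle_;  // native library handle, or null if loading failed
};
```

<P>A caller would construct a <CODE>LoadedLibrary</CODE> from a plug-in's file name,
check <CODE>is_open()</CODE>, and convert the result of <CODE>symbol()</CODE> to the
expected function type. The amount of conditional compilation needed even for this
small wrapper suggests why portable support is desirable.</P>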

<H2><A NAME="Use Cases">Use Cases</A></H2>

<P>There are four situations that we need to consider in
designing support for dynamic libraries in C++:</P>

<UL>
<LI>an implementor of the <B><A NAME="standard library">standard library</A></B>
puts the library code in one or more dynamic libraries</LI>
<LI>the implementor of a <B><A NAME="third-party library">third-party library</A></B>
puts the library code in one or more dynamic libraries</LI>
<LI>an application is <B><A NAME="partitioned">partitioned</A></B>
to use one or more dynamic libraries, all known at static link time</LI>
<LI>an application supports <B><A NAME="explicit loading">explicit loading and unloading</A></B>
of dynamic libraries identified at runtime</LI>
</UL>

<H2><A NAME="Design Principles">Design Principles</A></H2>

<P>Several principles should be considered in designing C++ language support
for programs that use dynamic libraries. These principles sometimes conflict.</P>

<P>Language support for dynamic libraries should be <B><A NAME="easy to use">easy to use</A></B>.
The syntax should be simple and intuitive; converting existing applications should, in
most cases, be straightforward; the constraints imposed on applications that use
dynamic libraries should be easy to understand.</P>

<P>Applications that use dynamic libraries should be <B><A NAME="portable">portable</A></B>.
This doesn't mean that code can simply be recompiled and expected to work. It means that
most platform differences should be masked by the language support, that most differences
that aren't masked can be identified when recompiling, and that differences that aren't
identified when recompiling can be fairly easily diagnosed, either by source code analysis or
by debugging.</P>

<P>Portable C++ applications that use dynamic libraries should be as
<B><A NAME="efficient">efficient</A></B> as applications written in
other languages and as applications that use native support for dynamic
libraries directly. Minor differences in speed and size are acceptable; major ones are not.</P>

<P>The use of dynamic libraries should be <B><A NAME="transparent">transparent</A></B>.
Making the obvious syntactic changes needed to convert a statically linked application to
a dynamically linked one should not introduce subtle semantic differences. (For example,
this means that changing a declaration of an external name to a declaration of a shared name
should not affect overloading.)</P>

<P>Dynamic libraries should support <B><A NAME="information hiding">information hiding</A></B>.
Developers of dynamic libraries should be able to control which parts of a dynamic library are
available to users and which parts are not. This makes it easier to replace the implementation
of a dynamic library without affecting applications that use it.</P>
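
<P>One common way to get this kind of information hiding in existing code, shown here
only as a sketch, is to expose an opaque type together with a few functions and to keep
everything else private to the library. The names <CODE>Counter</CODE>,
<CODE>counter_create</CODE> and so on are hypothetical; nothing in this example is
proposed syntax.</P>

```cpp
#include <cassert>

// ---- Public interface: what a header shipped with the library would declare ----
struct Counter;                        // opaque: the layout is private to the library
Counter* counter_create();
void counter_increment(Counter* c);
int counter_value(const Counter* c);
void counter_destroy(Counter* c);

// ---- Private implementation: stays inside the library ----
struct Counter { int value; };

namespace {
// Internal linkage: this helper cannot be named from outside this translation unit.
int next_value(int v) { return v + 1; }
}

Counter* counter_create() { return new Counter{0}; }

void counter_increment(Counter* c) {
    assert(c != nullptr);
    c->value = next_value(c->value);
}

int counter_value(const Counter* c) {
    assert(c != nullptr);
    return c->value;
}

void counter_destroy(Counter* c) { delete c; }
```

<P>Because users see only the declarations in the first half, the definition of
<CODE>Counter</CODE> and the helper <CODE>next_value</CODE> can change in a replacement
library without requiring users to recompile.</P>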

<P>The <B><A NAME="transition">transition</A></B> to portable C++ should
be easy. The semantics of dynamic library use should be as close to the semantics
of native implementations as possible.</P>

<H2><A NAME="Impact on the Standard">Impact on the Standard</A></H2>

 <H3>Definition of a Program</H3>

<P>A standard C++ program today is a set of translation units, constrained by
the one definition rule, compiled separately and linked together. To allow for
dynamic libraries an additional layer is needed between translation units and
programs. This paper uses the term <B>linkage unit</B><SUP><A HREF="#fn1">1</A></SUP>
for this layer. Thus, a program is a set of linkage units that are compiled and
linked separately, constrained by the one definition rule, and linked together
when the program is run. A linkage unit is a set of translation units, constrained
by the one definition rule, compiled separately and linked together.</P>

<P>This expanded definition of a program is reflected in the recommendation to add
a clause entitled <A HREF="#Program Model">The C++ Program Model</A> to the standard's
<I>General</I> clause, and a change to the current
<A HREF="#Phases of Translation">Phases of Translation</A>.</P>

<P>The program model also introduces the notion of <B>tentative resolution</B>, requiring
that all shared references be resolved at the time that a linkage unit is statically
linked. This is done by linking against a library or libraries that provide
definitions for all of these symbols<SUP><A HREF="#fn2">2</A></SUP>.</P>

<P><I><B>Open issue.</B> We need to discuss this in Kona. The two extremes are:</I></P>

<UL><I>
<LI>all names should be tentatively resolved at static link time in order to
ensure reliability and portability</LI>
<LI>the possibility of deferring resolution of dynamically linked names
provides flexibility for applications that provide reduced functionality
when some functions are not present, and avoids overspecifying how applications
are linked and loaded</LI></I></UL>

<P>Further, the program model introduces the notion of a <B>linkage unit
identifier</B><SUP><A HREF="#fn3">3</A></SUP>. Linkage unit identifiers identify the
linkage units that were used to tentatively resolve shared references when a
linkage unit was statically linked. If a shared identifier that is used in the
application is present at runtime in a linkage unit with a different identifier
than the one to which it was tentatively resolved the behavior of the program
is undefined. This permits replacement of the linkage unit that was used during
the static link with a different implementation (typically a newer version) at
runtime, but does not permit moving functions and data objects to different
linkage units<SUP><A HREF="#fn4">4</A></SUP>.</P>
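
<P>The interaction between tentative resolution and linkage unit identifiers can be
modelled in a few lines. The sketch below is a toy model, not an implementation:
<CODE>ModelLinkageUnit</CODE>, <CODE>Bindings</CODE>, <CODE>tentatively_resolve</CODE>
and <CODE>finally_resolve</CODE> are hypothetical names, and a mismatch that this paper
would leave undefined is simply reported as failure.</P>

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a linkage unit: its identifier and the shared names it defines.
struct ModelLinkageUnit {
    std::string identifier;
    std::set<std::string> shared_names;
};

// Maps each shared name to the identifier of the unit it was resolved against.
using Bindings = std::map<std::string, std::string>;

// Tentative resolution (static link time): every shared reference must be
// defined by one of the units supplied to the linker. Returns false if any
// reference is left unresolved.
bool tentatively_resolve(const std::set<std::string>& references,
                         const std::vector<ModelLinkageUnit>& link_inputs,
                         Bindings& bindings) {
    for (const std::string& name : references) {
        bool found = false;
        for (const ModelLinkageUnit& unit : link_inputs) {
            assert(!unit.identifier.empty());  // every unit has an identifier
            if (unit.shared_names.count(name) != 0) {
                bindings[name] = unit.identifier;
                found = true;
                break;
            }
        }
        if (!found) return false;
    }
    return true;
}

// Final resolution (run time): each name must be defined by a loaded unit
// whose identifier matches the one recorded at static link time.
bool finally_resolve(const Bindings& bindings,
                     const std::vector<ModelLinkageUnit>& loaded) {
    for (const auto& binding : bindings) {
        bool matched = false;
        for (const ModelLinkageUnit& unit : loaded) {
            if (unit.shared_names.count(binding.first) == 0) continue;
            if (unit.identifier != binding.second) return false;  // moved elsewhere
            matched = true;
        }
        if (!matched) return false;  // not defined by any loaded unit
    }
    return true;
}
```

<P>In this model a newer library that keeps the identifier and still defines every
bound name passes final resolution, while a library with a different identifier that
defines the same names does not.</P>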

<P><I><B>Open issue.</B> Depending on what we decide about the preceding
issue, this may be moot. The purpose of this circumlocution was to permit replacement
of </I><CODE>mylibrary.1</CODE><I> with </I><CODE>mylibrary.2</CODE><I>. The
linkage unit identifier for both libraries would have to be the same. That
amounts to an assertion by the library provider that these libraries do
&quot;the same thing&quot;.</I></P>

 <H3>Shared Linkage</H3>

<P>Entities within a linkage unit whose names can be referred to from another linkage
unit have <B>shared linkage</B><SUP><A HREF="#fn5">5</A></SUP>. This requires a
straightforward change to the standard's discussion of <A HREF="#Linkage">Linkage</A>.
This change also means that names with external linkage can only be referred to by other
code in the same linkage unit.</P>

 <H3>Extension of the One Definition Rule</H3>

<P>The recommended changes to the <A HREF="#One Definition Rule">One Definition Rule</A>
are that multiple definitions of the same function or object that would currently be allowed
in a program be allowed within a linkage unit, and that shared functions and objects may not
be defined more than once<SUP><A HREF="#fn6">6</A></SUP>.</P>
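
<P>The program-wide half of this rule can be sketched as a simple check over a toy
model of a program. The types <CODE>Linkage</CODE>, <CODE>Definition</CODE> and
<CODE>UnitDefinitions</CODE> and the function <CODE>multiply_defined_shared_names</CODE>
are hypothetical names introduced for illustration; the model deliberately ignores the
per-linkage-unit half of the rule.</P>

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

enum class Linkage { External, Shared };

// One definition of a function or object, tagged with its linkage.
struct Definition {
    std::string name;
    Linkage linkage;
};

// The definitions contributed by one linkage unit (hypothetical model type).
struct UnitDefinitions {
    std::string identifier;
    std::vector<Definition> definitions;
};

// Returns the shared names that are defined in more than one place in the
// program. Names with external linkage are not counted across linkage units,
// because they are visible only inside the unit that defines them.
std::set<std::string> multiply_defined_shared_names(
        const std::vector<UnitDefinitions>& program) {
    std::map<std::string, int> counts;
    for (const UnitDefinitions& unit : program) {
        for (const Definition& def : unit.definitions) {
            assert(!def.name.empty());
            if (def.linkage == Linkage::Shared) ++counts[def.name];
        }
    }
    std::set<std::string> result;
    for (const auto& entry : counts) {
        if (entry.second > 1) result.insert(entry.first);
    }
    return result;
}
```

<P>In this model, an external name such as a private helper may be defined once in
each of several linkage units without conflict, whereas a shared name defined in two
linkage units is reported.</P>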

 <H3>Declaration Syntax</H3>

<P>There are three possible approaches to specifying which names in a translation
unit refer to entities with shared linkage. First, names with external linkage
can be non-shared by default, and the programmer would have to explicitly identify
names that are to have shared linkage. This is the model that Windows programmers
are familiar with. Second, names with external linkage can be shared by default,
and the programmer would have to explicitly identify names that are not to have
shared linkage. This is similar to the model that UNIX programmers are familiar
with. Third, it can be implementation-defined which of the two preceding models
applies. This minimizes the required changes to existing code.</P>

<P>This paper does not yet recommend syntax for specifying which names have shared
linkage. However, it seems clear that the syntax for specifying which names have shared linkage
should have the following properties<SUP><A HREF="#fn7">7</A></SUP>:</P>

<UL>
<LI>it should be applicable to individual names:</LI>
<PRE>    <CODE>shared int i;        // shared linkage</CODE></PRE>
<LI>it should be applicable to a block at file scope, affecting all names declared
within the block that would otherwise have external linkage:</LI>
<PRE>    <CODE>shared {
         int j;          // shared linkage
         static int k;   // internal linkage
         }</CODE></PRE>
<LI>it should be applicable to a class, giving all member functions and all static data
members shared linkage:</LI>
<PRE>    <CODE>shared class C {
         int c0;         // no linkage
         static int c1;  // shared linkage
         void f();       // shared linkage
         };</CODE></PRE>
<LI>it should be applicable to an explicit template specialization (but not to a mere
template):</LI>
<PRE>    <CODE>template &lt;class T&gt; class D {
         int d0;         // no linkage
         static int d1;  // no linkage
         void f();       // no linkage
         };
    shared template &lt;&gt; D&lt;int&gt;;
                         // D&lt;int&gt;::d1 and D&lt;int&gt;::f have shared linkage</CODE></PRE>
</UL>
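
<P>For comparison, the forms listed above can be roughly approximated today with
vendor extensions hidden behind a macro. This is only a sketch of existing practice,
not proposed syntax: <CODE>SHARED_API</CODE> is a hypothetical macro name, the
attributes it expands to are compiler-specific, and an explicit instantiation is used
as the nearest existing counterpart to the last form, which is an assumption about
what that form would mean.</P>

```cpp
#include <cassert>  // used only by the checks mentioned in the note below

// SHARED_API is a hypothetical name; its expansions are vendor extensions.
#if defined(_WIN32)
#  define SHARED_API __declspec(dllexport)
#elif defined(__GNUC__)
#  define SHARED_API __attribute__((visibility("default")))
#else
#  define SHARED_API  /* no known extension: plain external linkage */
#endif

// Individual name (compare: shared int i;)
SHARED_API int i = 0;

// Class (compare: shared class C { ... };)
class SHARED_API C {
public:
    static int c1;
    int f() const { return c0; }
private:
    int c0 = 7;
};
int C::c1 = 1;

// Template: the primary template itself is not marked.
template <class T> class D {
public:
    static int d1;
    T f() const { return T(); }
};
template <class T> int D<T>::d1 = 2;

// Explicit instantiation marked for export
// (compare: shared template <> D<int>;)
template class SHARED_API D<int>;
```

<P>With these definitions, checks such as <CODE>assert(C::c1 == 1)</CODE> and
<CODE>assert(D&lt;int&gt;::d1 == 2)</CODE> hold in any translation unit that sees the
declarations. The macro follows the first model, in which names are non-shared unless
marked.</P>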

 <H3>Type Identification and Other Compiler-Generated Meta-Data</H3>

<P>The changes suggested so far do not address the issue of
type identity across linkage units. For example, in order for code in one
linkage unit to catch an exception that was thrown by code in a different
linkage unit, the type of the thrown object must be recognized in the code
that handles the catch clause. This means that the two linkage units must
have the same notion of the exception's type. On the other hand, two linkage
units that were developed independently may unintentionally use the
same type name. Such independent use of the same name for two distinct types
should not lead to program failure.</P>
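
<P>The following sketch shows, within a single linkage unit, the two run-time checks
that depend on type identity: comparison of <CODE>std::type_info</CODE> objects and
matching of a thrown object against a handler. The helper names
<CODE>same_dynamic_type</CODE> and <CODE>caught_as_runtime_error</CODE> are
hypothetical. The open question is what is needed for these checks to give the same
answers when the throwing and catching code are in different linkage units.</P>

```cpp
#include <cassert>
#include <stdexcept>
#include <typeinfo>

// Reports whether two exception objects have the same dynamic type.
bool same_dynamic_type(const std::exception& a, const std::exception& b) {
    return typeid(a) == typeid(b);
}

// Calls `thrower` and reports whether what it throws is caught by a handler
// for std::runtime_error (that is, whether the type matches or derives from it).
template <class F>
bool caught_as_runtime_error(F thrower) {
    try {
        thrower();
    } catch (const std::runtime_error&) {
        return true;
    } catch (...) {
        return false;
    }
    return false;  // nothing was thrown
}
```

<P>For example, <CODE>same_dynamic_type</CODE> distinguishes a
<CODE>std::runtime_error</CODE> from a <CODE>std::logic_error</CODE>, and a thrown
<CODE>std::range_error</CODE> is caught as a <CODE>std::runtime_error</CODE> because it
derives from it.</P>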

<P>In C++ today, a class type with external linkage
must have the &quot;same&quot; definition in every translation unit that uses it.
It's clear that this rule should apply to type definitions within a linkage unit.
However, in a dynamically linked application the actual components are determined at runtime,
so the application developer cannot control what types are defined within different
linkage units. In this setting it's not clear what the rule should be. I'm uncomfortable
with the notion of shared and non-shared types, but maybe that's what we need.</P>

 <H3>Construction and Destruction of Static Objects, atexit</H3>

<P>In the absence of loadable libraries (see below) the current rules appear
adequate.</P>

 <H3>Support for Loadable Libraries</H3>

<P>Next draft?</P>

<H2><A NAME="Proposed Text">Proposed Text</A></H2>

 <H3><A NAME="Program Model">Program Model</A></H3>

<P>Remove paragraph 1 of [basic.link] (3.5/1), which currently reads:</P>

<BLOCKQUOTE>A <I>program</I> consists of one or more <I>translation
units</I> (clause 2) linked together. A translation unit consists
of a sequence of declarations.

<PRE>    <I>translation-unit:
        declaration-seq<SUB>opt</SUB></I></PRE></BLOCKQUOTE>


<P>Add new clause [intro.program.model] immediately preceding [intro.execution] (1.9):</P>

<BLOCKQUOTE><B>The C++ Program Model</B></BLOCKQUOTE>

<BLOCKQUOTE>A <I>program</I> consists of one or more <I>linkage units</I>
linked together at runtime. A linkage unit consists of one or more <I>translation
units</I> (clause 2) linked together when the application is built. A translation
unit consists of a sequence of declarations.

<PRE>    <I>translation-unit:
        declaration-seq<SUB>opt</SUB></I></PRE></BLOCKQUOTE>

<BLOCKQUOTE>When translation units are linked to form a linkage unit
they may contain references to shared entities that are not defined by any translation
unit in the linkage unit being linked. Such references shall be <I>tentatively resolved</I> by
linking to a file or files with implementation-defined type specifying which linkage
unit or units define each of those shared entities.</BLOCKQUOTE>

<BLOCKQUOTE>Every linkage unit has a <I>linkage unit identifier</I> whose form is
implementation-defined.</BLOCKQUOTE>

<BLOCKQUOTE>A program begins execution in the linkage unit that defines the <CODE>main</CODE>
function. Prior to calling a shared function or accessing a shared object that is not
defined in the currently-executing linkage unit the program loader must
<I>finally resolve</I> references to that function or object by loading the linkage
unit that defines it<SUP><A HREF="#fn8">8</A></SUP>. The linkage unit that defines each
such reference must have the same linkage unit identifier as the linkage unit to which
the reference was tentatively resolved. [Note: this does not require that the linkage unit
to which the reference is finally resolved be identical to the one to which it was
tentatively resolved. This allows use of replacement dynamic libraries provided that all
symbols tentatively resolved to a particular dynamic library are also finally resolved
to the corresponding replacement library. ]</BLOCKQUOTE>

<BLOCKQUOTE>[Note: the two steps of tentative resolution and final resolution can be
thought of as jointly defining <I>deferred linking</I>. This term is preferable to the
more common term &quot;dynamic linking&quot; because it more strongly suggests the
similarity between the name resolution done at final resolution and the name resolution
done in static linking.
]</BLOCKQUOTE>

 <H3><A NAME="Phases of Translation">Phases of Translation</A></H3>

<P>Change the second paragraph of [lex] (2/2) from:</P>

<BLOCKQUOTE>[Note: previously translated translation units and instantiation
units can be preserved individually or in libraries. The separate translation
units of a program communicate (3.5) by (for example) calls to functions
whose identifiers have external linkage, manipulation of objects whose identifiers
have external linkage, or manipulation of data files. Translation units can be
separately translated and then later linked to produce an executable program. (3.5). ]</BLOCKQUOTE>

<P>to:</P>

<BLOCKQUOTE>[Note: previously translated translation units and instantiation
units can be preserved individually, in libraries, or in linkage units. The separate translation
units of a linkage unit communicate (3.5) by (for example) calls to functions
whose identifiers have shared or external linkage, manipulation of objects whose
identifiers have shared or external linkage, or manipulation of data files. The
separate linkage units of a program communicate (3.5) by (for example) calls to
functions whose identifiers have shared linkage, manipulation of objects whose
identifiers have shared linkage, or manipulation of data files. Translation units can be
separately translated and then later linked to produce a linkage unit. ]</BLOCKQUOTE>

<P>Change paragraph nine of [lex.phases] (2.1/9) from:</P>

<BLOCKQUOTE>All external object and function references are resolved. Library
components are linked to satisfy external references to functions and objects
not defined in the current translation. All such translator output is collected
into a program image which contains information needed for execution in its execution
environment.</BLOCKQUOTE>

<P>to:</P>

<BLOCKQUOTE>All external object and function references are resolved and all
shared object and function references are resolved or tentatively resolved. Library
components are linked to satisfy external references to functions and objects
not defined in the current translation unit and to satisfy shared references
to functions and objects not defined in the current linkage unit. All such
translator output is collected into a partial executable image which contains
information needed for execution in its execution environment.</BLOCKQUOTE>

<P>Add new final paragraph to [lex.phases] (2.1/10):</P>

<BLOCKQUOTE>All shared object and function references that were tentatively
resolved when each linkage unit used by the program was linked are finally resolved.
[Note: this final resolution will typically occur each time the program is
executed. ]</BLOCKQUOTE>

 <H3><A NAME="One Definition Rule">One Definition Rule</A></H3>

<P>Change paragraph three of [basic.def.odr] (3.2/3) from:</P>

<BLOCKQUOTE>Every program shall contain exactly one definition of every non-inline
function or object that is used in that program; no diagnostic required. The
definition can appear explicitly in the program, it can be found in the standard
or a user-defined library, or (when appropriate) it is implicitly defined (see 12.1,
12.4 and 12.8). An inline function shall be defined in every translation unit in
which it is used.</BLOCKQUOTE>

<P>to:</P>

<BLOCKQUOTE>Every linkage unit shall contain exactly one definition of every non-inline
function or object that is used in that linkage unit; no diagnostic required. The
definition can appear explicitly in the linkage unit, it can be found in the standard
or a user-defined library, or (when appropriate) it is implicitly defined (see 12.1,
12.4 and 12.8). An inline function shall be defined in every translation unit in
which it is used.</BLOCKQUOTE>

<BLOCKQUOTE>Every program shall contain exactly one definition of every shared
function or object that is used in that program; no diagnostic required. The
definition must appear explicitly in one of the linkage units that make up
the program. [Note: this implies that inline functions and implicit template
instantiations cannot, in general, be shared. ]</BLOCKQUOTE>

 <H3><A NAME="Linkage">Linkage</A></H3>

<P>Replace the first bullet item of the second paragraph of [basic.link] (3.5/2):</P>

<BLOCKQUOTE>When a name has <I>external linkage</I>, the entity it denotes can be
referred to by names from scopes of other translation units or from other
scopes of the same translation unit.</BLOCKQUOTE>

<P>with two bullet items:</P>

<BLOCKQUOTE>When a name has <I>shared linkage</I>, the entity it denotes can be
referred to by names from scopes of other linkage units, from scopes of other
translation units of the same linkage unit, or from other scopes of the same
translation unit.</BLOCKQUOTE>

<BLOCKQUOTE>When a name has <I>external linkage</I>, the entity it denotes can be
referred to by names from scopes of other translation units of the same linkage
unit or from other scopes of the same translation unit.</BLOCKQUOTE>

<H2><A NAME="Bibliography">Bibliography</A></H2>

<UL>
<LI>Austern, Matthew, "Toward Standardization of Dynamic Libraries", N1400=02-0058, 2002.</LI>
<LI>Becker, Pete, "Dynamic Libraries in C++", N1418=02-0076, 2002.</LI>
</UL>

<HR>

<P><A NAME="fn1">1</A>. The term &quot;linkage unit&quot; was chosen to suggest an analogy to a
translation unit and to emphasize that the use of dynamic libraries depends on
runtime linking. However, &quot;linkage&quot; has a specific meaning
in the C++ standard, so &quot;linkage unit&quot; might confuse people. Will this
be a problem? If so, suggestions for a different term are welcome.</P>

<P><A NAME="fn2">2</A>. For example, under UNIX this typically means linking
to the shared objects that define those symbols; under Windows this typically means
linking to import libraries associated with the DLLs that define those symbols.</P>

<P><A NAME="fn3">3</A>. Under UNIX this would typically be the name of the
shared object or the name of a link to the shared object; under Windows this is
typically the name of the DLL.</P>

<P><A NAME="fn4">4</A>. This is the approach that Windows takes. Another possibility
is the UNIX approach, where the linkage unit that defines a function or data object
at runtime can be distinct from the one that defined it at the time the linkage unit
that uses it was linked. One possible drawback of this approach, if I remember correctly,
is that it might not be implementable under Windows. Need input from someone with more
current knowledge of the Windows application loader.</P>

<P><A NAME="fn5">5</A>. I originally used &quot;global linkage&quot; throughout
this paper, but didn't like it, because &quot;global&quot;
has significantly different connotations for most programmers. I also found
that &quot;exported&quot;, which I have been using, is awkward because
its implied direction makes phrases like &quot;exported object and function references&quot;
confusing.</P>

<P><A NAME="fn6">6</A>. UNIX and Windows both permit multiple definitions,
but with rather different semantics; this rule restricts portable code to
common ground.</P>

<P><A NAME="fn7">7</A>. The syntax in the examples assumes the first model.</P>

<P><A NAME="fn8">8</A>. This implies that linkage units can be loaded on call.
That may be too hard to specify in conjunction with construction and destruction
of static objects, so it may be better to require loading at application startup.</P>

</BODY></HTML>
