Doc. no. 02-0014=N1356<br>
Date: 12 March 2002<br>
Project: Programming Language C++<br>
Reply to: <a href="mailto:RWGrosse-Kunstleve@lbl.gov"
>RWGrosse-Kunstleve@lbl.gov</a>

<h2>Predictable data layout for certain non-POD types</h2>

By R.W. Grosse-Kunstleve & D. Abrahams

<p>
<cite>It would be nice if every kind of numeric software could
be written in C++ without loss of efficiency, but unless
something can be found that achieves this without compromising
the C++ type system it may be preferable to rely on Fortran,
assembler or architecture-specific extensions (Bjarne
Stroustrup).</cite>

<h3>Problem</h3>
When combining multiple programming languages as suggested
by B. Stroustrup, it is essential that the data layout of
the types used from two or more languages be precisely
defined. We will also show that a predictable layout is a
necessary prerequisite for important optimizations within
C++. According to ISO/IEC 14882:1998, the data layout of a
user-defined type in C++ is predictable only if the type is
POD. The conditions that make a type non-POD are very
broad. In particular, a POD struct may not have any
user-declared constructor.
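<p>
The following sketch makes the distinction concrete. The struct names are hypothetical, and because ISO/IEC 14882:1998 offers no way to query POD-ness, the check uses the type traits of C++11, which defines POD as "trivial and standard-layout". The helper <code>is_pod_like</code> is our own approximation of that definition, not a standard trait.

```cpp
#include <type_traits>

// Aggregate with no constructors: a POD struct under ISO/IEC 14882:1998.
struct PlainComplex {
    double re;
    double im;
};

// Same data members plus a constructor: no longer POD, so the 1998
// standard makes no promise about where re and im are stored.
struct CtorComplex {
    CtorComplex(double r = 0.0, double i = 0.0) : re(r), im(i) {}
    double re;
    double im;
};

// Approximates the C++11 notion of POD (trivial and standard-layout)
// with finer-grained traits.
template <class T>
constexpr bool is_pod_like()
{
    return std::is_trivially_default_constructible<T>::value
        && std::is_trivially_copyable<T>::value
        && std::is_standard_layout<T>::value;
}
```

The two types have identical members; the user-provided constructor alone removes <code>CtorComplex</code> from the POD category, even though its layout is still standard.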

<p>
<b>Example 1: <code>std::complex&lt;T&gt;</code></b>

<p>
<i>1(a): Language Interoperability</i>

<p>
Since <code>std::complex&lt;T&gt;</code> includes several
constructors, it is not POD and the standard doesn't define how its
real and imaginary parts are stored. This inhibits portable use of a
large number of FORTRAN and C libraries which manipulate complex
numbers (e.g. FFTW, FFTPACK, BLAS), both as object libraries and as
source code ported to C++. In the latter case a complete rewrite would
be required for many libraries to achieve full portability
(e.g. FFTPACK), and in some cases the C++ version could not portably
achieve similar performance. Internally to these libraries, arrays of
complex numbers are commonly treated as arrays of real numbers
(reference: <a href="http://cctbx.sf.net/">cctbx.sf.net</a>, module fftbx).
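<p>
A minimal sketch of this aliasing follows. It assumes that <code>std::complex&lt;double&gt;</code> is stored as two adjacent <code>double</code>s with the real part first, which is exactly the assumption ISO/IEC 14882:1998 does not support. The function name <code>scale_interleaved</code> is hypothetical; loops of this shape, for instance when normalizing the output of a transform, are typical of such libraries.

```cpp
#include <complex>
#include <cstddef>

// Multiply every real and imaginary part of z[0..n-1] by s, treating the
// complex array as an array of 2*n doubles.
// ASSUMPTION: std::complex<double> is laid out as {real, imag} with no
// padding (the C99/FORTRAN layout); ISO/IEC 14882:1998 does not promise it.
void scale_interleaved(std::complex<double>* z, std::size_t n, double s)
{
    double* d = reinterpret_cast<double*>(z);
    for (std::size_t i = 0; i != 2 * n; ++i)
        d[i] *= s;
}
```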

<p>
The C99 standard includes a reserved keyword <code>_Complex</code> for
a family of types implementing complex numbers. The data layout is
precisely defined in 6.2.5/13 of ISO/IEC 9899:1999:

<blockquote>
  Each complex type has the same representation and alignment
  requirements as an array type containing exactly two
  elements of the corresponding real type; the first element
  is equal to the real part, and the second element to the
  imaginary part, of the complex number.
</blockquote>

The FORTRAN standard defines the same data layout for the
<code>COMPLEX</code> type.
In contrast, the definition of ISO/IEC 14882:1998 is much
less specific (26.2.2/1):

<blockquote>
  The class <code>complex</code> describes an object that can store
  the Cartesian components, <code>real()</code> and
  <code>imag()</code>, of a complex number.
</blockquote>

<p>
The current standard does not define storage and
alignment requirements. Some have claimed that
the internal representation of complex values can be
arbitrarily transformed. For example, some people interpret
the standard as saying a polar internal representation
might be legal.
To our knowledge, the data layout of every current
implementation of <code>std::complex&lt;T&gt;</code> is actually
compatible with C99 and FORTRAN. However, as it stands, C++
and C99 or C++ and FORTRAN programs cannot be interfaced
portably because of the liberal definition in 26.2.2/1 of
ISO/IEC 14882:1998.
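<p>
Whether a given implementation actually uses the C99 layout can be tested at run time. The sketch below (the function name <code>has_c99_layout</code> is hypothetical) copies the bytes of a complex value into a two-element array and compares the parts. A passing result only describes the compiler at hand; it does not make the layout portable.

```cpp
#include <complex>
#include <cstring>

// Hypothetical check: does this implementation store std::complex<T> as
// T[2] = {real, imag}, i.e. the layout required by C99 6.2.5/13?
template <class T>
bool has_c99_layout()
{
    if (sizeof(std::complex<T>) != 2 * sizeof(T))
        return false;
    std::complex<T> z(T(1), T(2));
    T parts[2];
    std::memcpy(parts, &z, sizeof parts);   // copy the object's bytes
    return parts[0] == T(1) && parts[1] == T(2);
}
```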

<p>
<i>1(b): Optimization considerations</i>

<p>
To facilitate the discussion, we will use a highly
simplified outline of one of the most important algorithms
in numerical applications: an in-place real-to-complex Fast
Fourier Transform (FFT).

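<pre>
#include &lt;complex&gt;
#include &lt;cstddef&gt;
#include &lt;vector&gt;

std::complex&lt;double&gt;*
fft_real_to_complex(double* seq, std::size_t n)
{
  // Reinterpret the storage of the real input as complex values.
  // This is meaningful only if std::complex&lt;double&gt; is laid out
  // as two adjacent doubles (real part first).
  std::complex&lt;double&gt;*
  result = reinterpret_cast&lt;std::complex&lt;double&gt;*&gt;(seq);
  // Do the transform. In the process the array of real
  // values will become an array of complex values.
  return result;
}

std::vector&lt;double&gt; vec;
// fill vec
std::complex&lt;double&gt;*
result = fft_real_to_complex(&amp;*vec.begin(), vec.size());
</pre>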

<p>
To be able to do the transform truly in place (i.e.,
without copying an entire array at some point in the
algorithm) it is essential that either (a) the data layout
of <code>std::complex&lt;T&gt;</code> is predictable or (b) the real
and imaginary parts of the complex values are directly
accessible, such as through references. ISO/IEC 14882:1998
provides neither of these prerequisites.
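<p>
Under the current standard, option (b) could only be emulated by relying on option (a). In the sketch below, the helpers <code>real_ref</code> and <code>imag_ref</code> are hypothetical, and they assume the C99 array-of-two layout.

```cpp
#include <complex>

// Hypothetical helpers returning writable references to the parts of a
// std::complex<T>. ASSUMPTION: the C99 layout T[2] = {real, imag}, which
// ISO/IEC 14882:1998 does not guarantee.
template <class T>
T& real_ref(std::complex<T>& z) { return reinterpret_cast<T*>(&z)[0]; }

template <class T>
T& imag_ref(std::complex<T>& z) { return reinterpret_cast<T*>(&z)[1]; }
```

With such references an algorithm can update one part in place, rather than assembling a whole new value with <code>z = std::complex&lt;double&gt;(re, im)</code>.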

<p>
A predictable data layout or direct access through
references is also a prerequisite for essential speed
optimizations, even for complex-to-complex transforms.
For example, see any of the automatically generated
codelets in FFTW, such as ftw_4.c.
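<p>
The following hypothetical 2-point complex-to-complex transform illustrates the style of such generated code. It is not taken from FFTW; it only shows that the kernel reads and writes real and imaginary parts directly in interleaved storage (re0, im0, re1, im1, ...), with <code>stride</code> counted in complex elements. Such a kernel can be applied to an array of <code>std::complex&lt;double&gt;</code> only if that array has the same interleaved layout.

```cpp
#include <cstddef>

// Hypothetical 2-point DFT kernel on interleaved complex data:
//   out[0] = in[0] + in[1],  out[1] = in[0] - in[1]
// Real and imaginary parts are handled as separate doubles.
void dft2_interleaved(const double* in, double* out, std::size_t stride)
{
    // Load all inputs first so that the kernel also works with in == out.
    const double re0 = in[0],          im0 = in[1];
    const double re1 = in[2 * stride], im1 = in[2 * stride + 1];
    out[0]              = re0 + re1;
    out[1]              = im0 + im1;
    out[2 * stride]     = re0 - re1;
    out[2 * stride + 1] = im0 - im1;
}
```

Because all inputs are loaded before any output is stored, the kernel can also be called in place, with the same pointer for <code>in</code> and <code>out</code>.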

<p>
For the algorithm above to work it is essential that
<ul>
<li>
<code>std::complex&lt;T&gt;</code> has a trivial assignment operator to
avoid undefined behavior when the complex values are replaced by the
real values;

<li>
<code>std::complex&lt;T&gt;</code> has a trivial destructor to avoid
undefined behavior when <tt>vec</tt> in the example goes out of scope.

</ul>

<p>
<b>Example 2: Interfacing Python and C++</b>
<p>
Python is a dynamically typed, object-oriented, interpreted
language and therefore a powerful complement for the
statically-typed, compiled C++ language. The most popular
implementation of the Python programming language is
written in ANSI C89. David Abrahams has been implementing a
system for the integration of C-Python and C++ (reference:
<a href="http://www.boost.org/">www.boost.org</a>,
module
<a href="http://www.boost.org/libs/python/doc/index.html">Boost.Python</a>).

<p>
In the Python 'C' API, all objects are manipulated through pointers to a
"base" <code>struct PyObject</code>. The layout of every Python object which
participates in its cycle garbage-collection begins with the layout of a
<code>PyObject</code>. The <code>PyObject</code> contains a reference count and what is for all
intents and purposes a vtable. This arrangement provides a crude form of
object-orientation in 'C' and the basic idioms have been repeated in the
implementations of countless languages and systems.
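<p>
Stripped of Python specifics, the idiom can be sketched as follows. All names are hypothetical stand-ins: <code>ObjectHeader</code> for <code>PyObject</code>, <code>TypeObject</code> for <code>PyTypeObject</code>, and <code>IntObject</code> for an extension type. The cast from header to object is valid only because <code>IntObject</code> is a POD struct whose first member is the header.

```cpp
#include <string>

struct TypeObject;                 // plays the role of PyTypeObject

struct ObjectHeader {              // plays the role of PyObject
    long refcount;
    const TypeObject* type;        // the "vtable" pointer
};

struct TypeObject {
    const char* name;
    std::string (*repr)(ObjectHeader*);   // one "virtual function"
};

// A concrete object type: the header MUST be the first member.
struct IntObject {
    ObjectHeader head;
    long value;
};

std::string int_repr(ObjectHeader* o)
{
    IntObject* self = reinterpret_cast<IntObject*>(o);   // the "downcast"
    return std::to_string(self->value);
}

const TypeObject IntType = { "int", int_repr };

// Generic code sees only the header and dispatches through the type.
std::string repr(ObjectHeader* o) { return o->type->repr(o); }
```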

<p>
The 'C' programmer wishing to implement a new object type in Python has
the opportunity to employ two of the language's most-beloved features,
macros and 'C'-style casts:

<pre>
struct MyObject
{
    PyObject_HEAD   // MACRO providing the members of PyObject
    T1 additional_data_1;
    T2 additional_data_2;
};

// Return a Python string representing MyObject
PyObject* MyObject_print(PyObject* o)
{
    MyObject* x = (MyObject*)o; // downcast
    ...
}

// "vtbl"
PyTypeObject MyType = {
    ...
    MyObject_print,
    ...
};

// Creation function
PyObject* MyObject_new()
{
    // MACRO invocation which allocates memory and initializes the PyObject header
    MyObject* result = PyObject_New(MyObject, &amp;MyType);
    ...more initialization...
    return (PyObject*)result;
}
</pre>

In keeping with the design intention that C++ is "a better C", consider
how we might solve this problem in C++. Obviously, we'd use inheritance
to eliminate macros and casting as much as possible. We'd add
constructors for <code>MyObject</code> and <code>PyObject</code> to eliminate the need for
initialization in <code>MyObject_new()</code>. We'd use real virtual functions
instead of an ad-hoc PyTypeObject filled with functions using the 'C'
calling convention.

<p>
Unfortunately, the rest of Python is still written in 'C', so we really
can't expect to replace the <code>PyTypeObject</code> with real virtual functions
here. However, we are tantalizingly close to being able to do very much
better than shown above in C++:

<pre>
// Base object for all Python extension types
struct PyBaseObject : PyObject
{
    // initializes refcount and vtbl
    PyBaseObject(PyTypeObject const&amp;);
    // allocates in Python's special GC area
    void* operator new(std::size_t n);
};

extern PyTypeObject MyType;  // the "vtbl", defined below

struct MyObject : PyBaseObject
{
    MyObject() : PyBaseObject(MyType) {...}
};

extern "C" PyObject* MyObject_print(PyObject* o) {
    // No C-style cast: the compiler checks that MyObject derives from PyObject
    MyObject* x = static_cast&lt;MyObject*&gt;(o);
    ...
}

PyTypeObject MyType = {
    ...
    MyObject_print,
    ...
};

// Just use operator new for allocation
</pre>

Though the above works on every C++ implementation we know
of, it relies on an assumption which is technically
non-portable: that base classes in non-virtual
inheritance hierarchies are laid out as though they were the
first data members of a class. The standard does not
guarantee this, because the classes involved are non-POD:
they have both base classes and constructors. In the absence of such a
guarantee, or a way to achieve it, the C++ programmer is
exposed to most of the same dangers as the 'C' programmer
when interfacing to many 'C' systems, and to Python in
particular.
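<p>
The assumption can at least be checked for a particular compiler. In the sketch below, <code>Header</code>, <code>Base</code> and <code>Derived</code> are hypothetical miniatures of <code>PyObject</code>, <code>PyBaseObject</code> and <code>MyObject</code>; the function reports whether the <code>Header</code> subobject starts at the address of the complete object. We expect it to on mainstream implementations, but the standard makes no such promise for these non-POD classes.

```cpp
// Hypothetical C++ rendering of a Python-style object with inheritance and
// constructors. Base and Derived have base classes and constructors, so
// under ISO/IEC 14882:1998 they are not POD and nothing guarantees that the
// Header subobject starts at the address of the complete object.
struct Header {
    long refcount;
    int kind;          // stands in for the type ("vtbl") pointer
};

struct Base : Header {
    explicit Base(int k) { refcount = 1; kind = k; }
};

struct Derived : Base {
    Derived() : Base(7), value(42) {}
    long value;
};

// True if converting to Header* does not adjust the address on this
// implementation, i.e. 'C' code could treat a Derived* as a Header*.
bool header_is_at_offset_zero(Derived& d)
{
    Header* h = &d;    // implicit derived-to-base conversion
    return static_cast<void*>(h) == static_cast<void*>(&d);
}
```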

<p>
<h3>Proposed resolution</h3>

<p>
The original considerations about POD focused strictly on
being able to interoperate with types defined in 'C', but
not on being able to leverage the power of C++ for
interfacing with 'C' systems. The examples above illustrate
the importance of a predictable data layout for this
and other purposes. Therefore:

<ol>

<li>To make <code>std::complex</code> usable in
multi-language projects, we propose to adopt the
definition in 6.2.5/13 of ISO/IEC 9899:1999 for
<code>std::complex&lt;T&gt;</code>.

<p>

<li>To ease reinterpretation of an array of
<code>std::complex&lt;T&gt;</code> as an array of <code>T</code> and
vice versa, we propose that the data access member functions
<code>real()</code> and <code>imag()</code> return references instead of copies.

<p>

<li>We propose that the standard includes a new concept of "Enhanced
POD" which allows the use of certain C++ language features such as
constructors and inheritance as a notational convenience while
providing POD-like guarantees for data layout. The exact definition of
Enhanced POD (e.g. no virtual functions, single or multiple
inheritance, etc.) is open to discussion.

<p>
By allowing constructors, and thus ensuring initialization to a valid
state, the Enhanced POD concept encourages safer programming practices.
Right now certain classes (endian arithmetic, for example) are often
designed without constructors so that they can be used in contexts
requiring POD types. This is neither as safe nor as convenient as
giving these classes constructors.

<p>
Presumably many of the contexts now requiring POD types will be relaxed
to require only Enhanced POD types. In particular, it would be very
helpful if the requirements on implementations for POD types in 3.9
paragraphs 2-4 could also apply to Enhanced POD types.

</ol>
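<p>
As an illustration of the third proposal, here is a hypothetical big-endian integer class of the kind mentioned above; its name and interface are our own assumptions. Its constructors ensure it always holds a defined value, but they also make it non-POD under ISO/IEC 14882:1998, even though its only data member is a byte array whose layout is exactly what a file format or network protocol expects.

```cpp
#include <cstdint>

// Hypothetical big-endian 32-bit unsigned integer stored as 4 bytes,
// most significant byte first.
class big_endian_u32 {
public:
    big_endian_u32() { set(0); }                    // never uninitialized
    big_endian_u32(std::uint32_t v) { set(v); }
    operator std::uint32_t() const {
        return (std::uint32_t(bytes_[0]) << 24) | (std::uint32_t(bytes_[1]) << 16)
             | (std::uint32_t(bytes_[2]) << 8)  |  std::uint32_t(bytes_[3]);
    }
    const unsigned char* data() const { return bytes_; }
private:
    void set(std::uint32_t v) {
        bytes_[0] = static_cast<unsigned char>(v >> 24);
        bytes_[1] = static_cast<unsigned char>(v >> 16);
        bytes_[2] = static_cast<unsigned char>(v >> 8);
        bytes_[3] = static_cast<unsigned char>(v);
    }
    unsigned char bytes_[4];
};
```

Under an Enhanced POD rule, such a class could keep its constructors and still be used wherever a POD with a guaranteed layout is required.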

<p>
We encourage the committee to consider the Enhanced POD
proposal separately from the others.

<p>
The proposals would make it possible to (a) build arrays of
<code>std::complex&lt;T&gt;</code> with a predictable data layout and
(b) portably pass <code>T*</code> pointers to these arrays to other
languages, e.g.:

<pre>
void foo(double *data, long n); // C library function
std::vector&lt;std::complex&lt;double&gt; &gt; vec;
// fill vec
foo(&amp;vec[0].real(), vec.size());
</pre>
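<p>
The first two proposals together can be sketched with a stand-in type. The class <code>proposed_complex</code> and the function <code>sum_parts</code> (playing the role of <code>foo</code>) are hypothetical. The sketch fixes the layout as <code>T[2]</code> and lets <code>real()</code> and <code>imag()</code> return references, so that <code>&amp;z.real()</code> points into interleaved storage. Walking that pointer across array elements assumes there is no padding between elements, which is the guarantee the proposal asks for.

```cpp
// Hypothetical complex type with a fixed T[2] = {real, imag} layout and
// reference-returning accessors.
template <class T>
class proposed_complex {
public:
    proposed_complex(T re = T(), T im = T()) { v_[0] = re; v_[1] = im; }
    T&       real()       { return v_[0]; }
    const T& real() const { return v_[0]; }
    T&       imag()       { return v_[1]; }
    const T& imag() const { return v_[1]; }
private:
    T v_[2];
};

// Hypothetical stand-in for a C library routine that receives n complex
// values as 2*n interleaved doubles and sums all parts.
double sum_parts(const double* data, long n)
{
    double s = 0.0;
    for (long i = 0; i != 2 * n; ++i)
        s += data[i];
    return s;
}
```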


<h3>Acknowledgments</h3>

John Spicer's &quot;advice and consent&quot; was invaluable in
formulating this proposal. We thank Beman Dawes for contributing
substantive additional motivation, and Robert Stewart for careful
proofreading.
