<HTML><HEAD><TITLE>N1499=03-0082, Simplifying Interfaces in basic_regex</TITLE></HEAD><BODY>

<CENTER>
<H1><A NAME="N1499=03-0082, Simplifying Interfaces in basic_regex">Simplifying Interfaces in <CODE>basic_regex</CODE></A></H1>
</CENTER>

<TABLE ALIGN="RIGHT" CELLSPACING="0" CELLPADDING="0">
<TR>
<TD ALIGN="RIGHT"><B><I>Document number:</I></B></TD>
<TD>&nbsp; N1499&nbsp;=&nbsp;03-0082</TD>
</TR>
<TR>
<TD ALIGN="RIGHT"><B><I>Date:</I></B></TD>
<TD>&nbsp; September 22, 2003</TD>
</TR>
<TR>
<TD ALIGN="RIGHT"><B><I>Project:</I></B></TD>
<TD>&nbsp; Programming Language C++</TD>
</TR>
<TR>
<TD ALIGN="RIGHT"><B><I>Reference:</I></B></TD>
<TD>&nbsp; ISO/IEC IS 14882:1998(E)</TD>
</TR>
<TR>
<TD ALIGN="RIGHT"><B><I>Reply to:</I></B></TD>
<TD>&nbsp; Pete Becker</TD>
</TR>
<TR>
<TD></TD>
<TD>&nbsp; Dinkumware, Ltd.</TD>
</TR>
<TR>
<TD></TD>
<TD>&nbsp; petebecker@acm.org</TD>
</TR>
</TABLE>
<BR CLEAR="ALL">

<HR>

<H2><CODE>basic_regex</CODE> Should Not Keep a Copy of its Initializer</H2>

<P>The <CODE>basic_regex</CODE> template has a member function <CODE>str</CODE> which
returns a string object that holds the text used to initialize the <CODE>basic_regex</CODE>
object. It also provides a container-like interface to this text through the
member functions <CODE>begin</CODE> and <CODE>end</CODE>, which return
<CODE>const_iterator</CODE> objects that allow inspection of the initializer text.
While it might occasionally be useful to look at the initializer string, we ought
to apply the rule that you don't pay for it if you don't use it. Just as <CODE>fstream</CODE>
objects don't carry around the file name that they were opened with, <CODE>basic_regex</CODE>
objects should not carry around their initializer text. If someone needs to keep
track of that text they can write a class that holds the text and the <CODE>basic_regex</CODE>
object.</P>

<P><B>Recommended changes:</B> remove the member functions <CODE>str</CODE>,
<CODE>begin</CODE>, and <CODE>end</CODE>.</P>
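
<P>The wrapper class suggested above might look like the following minimal sketch.
It assumes a library that provides <CODE>std::regex</CODE> (the C++11 form of
<CODE>basic_regex&lt;char&gt;</CODE>); the class name <CODE>regex_with_source</CODE>
and its member names are hypothetical and chosen only for illustration.</P>

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical wrapper: pairs a compiled expression with the text it was
// built from, so only code that wants the text pays for storing it.
class regex_with_source {
public:
    explicit regex_with_source(std::string text)
        : text_(std::move(text)), re_(text_) {}

    // The initializer text, kept by the wrapper rather than by the regex.
    const std::string& str() const { return text_; }

    // The compiled expression, usable with regex_match, regex_search, etc.
    const std::regex& expr() const { return re_; }

private:
    std::string text_;  // declared first so it is initialized before re_
    std::regex re_;
};
```

<P>A caller that needs the text uses <CODE>regex_with_source</CODE>; everyone else
uses the regular expression type directly and carries no extra string.</P>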

<H2><CODE>basic_regex</CODE> Should Not Have an Allocator</H2>

<P>The <CODE>basic_regex</CODE> template takes an argument that defines a type for
an allocator object. The template also has several member typedefs and one member
function to provide information about the allocator type and the allocator object.
This is because a <CODE>basic_regex</CODE> object &quot;is in effect both a
container of characters, and a container of states, as such an allocator
parameter is appropriate.&quot; Calling it a container doesn't make it one.
The allocator in <CODE>basic_regex</CODE> is not very useful, and it
unduly complicates the implementation.</P>

<P>The cost of using an allocator is high. Every type that the <CODE>basic_regex</CODE>
object uses internally must have its own allocator type and its own allocator object.
A node-based implementation might have a dozen or more node types, requiring a dozen
or more allocator objects. Allocator objects can be created as local objects
when needed, which effectively precludes allocators with internal state; they can be
ordinary members of the <CODE>basic_regex</CODE> object, inflating its size; or
they can be implemented as a chain of base classes (to take advantage of the
zero-size base optimization), with a high cost in readability and maintainability.
None of these options is attractive.</P>
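
<P>The member and base-class options can be sketched as follows. This is an
illustration only: the node types and the two <CODE>engine_*</CODE> structs are
hypothetical, only two node types are shown instead of a dozen, and
<CODE>std::allocator</CODE> stands in for a user-supplied allocator.</P>

```cpp
#include <cstddef>
#include <memory>

// Two hypothetical node types standing in for the many a real
// node-based implementation might need.
struct literal_node { char ch; };
struct repeat_node  { std::size_t min, max; };

// Option: one allocator object per node type, held as ordinary members.
// Even empty allocators occupy storage here.
struct engine_with_members {
    std::allocator<literal_node> literal_alloc;
    std::allocator<repeat_node>  repeat_alloc;
    void* root;
};

// Option: inherit from the allocators so that empty ones may take no
// space, at the price of a base-class list that grows with every node type.
struct engine_with_bases
    : std::allocator<literal_node>, std::allocator<repeat_node> {
    void* root;
};
```

<P>On many implementations the second struct is no larger than a single pointer,
but that is not guaranteed, and an allocator with internal state adds to the size
in either layout.</P>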

<P>Further, it's not at all clear how a user can determine that a substitute allocator
is appropriate or what characteristics such an allocator should have. The STL containers
have clearly spelled out requirements for their memory usage; <CODE>basic_regex</CODE>
objects have no such requirements (nor should they). The implementor of the
<CODE>basic_regex</CODE> template knows best what its memory requirements are.</P>

<P><B>Recommended changes:</B> remove the <CODE>Allocator</CODE> argument from
<CODE>basic_regex</CODE> and remove the members <CODE>reference</CODE>,
<CODE>const_reference</CODE>, <CODE>difference_type</CODE>, <CODE>size_type</CODE>,
<CODE>allocator_type</CODE>, <CODE>get_allocator</CODE>, and <CODE>max_size</CODE>.</P>

<H2>The Interface to <CODE>regex_traits</CODE> Should Use Iterators, Not Strings</H2>

<P>The member functions of the <CODE>regex_traits</CODE> template support customization
and internationalization for regular expressions. Of these, the member functions
<CODE>transform</CODE>, <CODE>transform_primary</CODE>, <CODE>lookup_collatename</CODE>,
and <CODE>lookup_classname</CODE> take <CODE>string</CODE> as input.</P>

<P>This interface is inherently inefficient -- it requires creating a string object
from a sequence in order to pass that string to the function. Further, in the
case of <CODE>transform</CODE>, the function typically extracts iterators from
the string object. Passing the text as a pair of iterators avoids introducing
unnecessary string objects.</P>
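
<P>As a sketch of what an iterator-based call site looks like, the example below
uses <CODE>std::regex_traits&lt;char&gt;</CODE> from a C++11 library, whose
<CODE>transform</CODE> takes a pair of forward iterators. The helper
<CODE>sort_key_of</CODE> and its parameters are hypothetical; a regular expression
parser is assumed to know the position and length of the text it wants a key for.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: computes the sort key for the characters at
// [pos, pos + len) of a pattern. The caller passes a sub-range of the
// pattern directly instead of first copying it into a separate string.
inline std::string sort_key_of(const std::regex_traits<char>& traits,
                               const std::string& pattern,
                               std::size_t pos, std::size_t len) {
    std::string::const_iterator first = pattern.begin() + pos;
    return traits.transform(first, first + len);
}
```

<P>With a string-based interface the same helper would have to construct
<CODE>std::string(first, first + len)</CODE> before every call.</P>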

<P><B>Recommended changes:</B></P>

<OL>
<LI>Change the signature of <CODE>regex_traits::transform</CODE> to
<PRE>    <CODE>template &lt;class InIt&gt;
    string_type transform(InIt first, InIt last) const;</CODE></PRE>

<P>and change the Effects clause to:</P>

<P><B>Effects:</B> returns <CODE>use_facet&lt;collate&lt;charT&gt; &gt;(getloc()).transform(first, last)</CODE>.</P>
</LI>

<LI>Change the signature of <CODE>regex_traits::transform_primary</CODE> to
<PRE>    <CODE>template &lt;class InIt&gt;
    string_type transform_primary(InIt first, InIt last) const;</CODE></PRE>

<P>and change the Effects clause to:</P>

<P><B>Effects:</B> if <CODE>typeid(use_facet&lt;collate&lt;charT&gt; &gt;)
== typeid(collate_byname&lt;charT&gt;)</CODE> and the form of the sort key
returned by <CODE>collate_byname&lt;charT&gt;::transform(first, last)</CODE>
is known and can be converted into a primary sort key, then returns that key,
otherwise returns an empty string.</P>
</LI>

<LI>Change the signature of <CODE>regex_traits::lookup_collatename</CODE> to
<PRE>    <CODE>template &lt;class InIt&gt;
    string_type lookup_collatename(InIt first, InIt last) const;</CODE></PRE>

<P>and change the Effects clause to:</P>

<P><B>Effects:</B> returns the sequence of characters that represents the
collation element named by the characters in the half-open range
<CODE>[first, last)</CODE> if that sequence names a valid collation element
under the imbued locale, otherwise returns an empty string.</P>

<P>Note that in addition to the iterator language, this change to the Effects clause
removes the requirement that <CODE>lookup_collatename</CODE> recognize the names
of characters in the POSIX Portable Character Set. This requirement seems to be the
result of a misunderstanding of what constitutes a collation element.</P>

</LI>

<LI>Change the signature of <CODE>regex_traits::lookup_classname</CODE> to
<PRE>    <CODE>template &lt;class InIt&gt;
    char_class_type lookup_classname(InIt first, InIt last) const;</CODE></PRE>

<P>and change the Effects clause to:</P>

<P><B>Effects:</B> returns an implementation-specific value that represents
a character classification named without regard to case by the characters in the half-open
range <CODE>[first, last)</CODE> if such a character classification exists, otherwise
returns 0. The implementation shall provide character classes with the following names:
<CODE>&quot;d&quot;</CODE>,
<CODE>&quot;w&quot;</CODE>,
<CODE>&quot;s&quot;</CODE>,
<CODE>&quot;alnum&quot;</CODE>,
<CODE>&quot;alpha&quot;</CODE>,
<CODE>&quot;blank&quot;</CODE>,
<CODE>&quot;cntrl&quot;</CODE>,
<CODE>&quot;digit&quot;</CODE>,
<CODE>&quot;graph&quot;</CODE>,
<CODE>&quot;lower&quot;</CODE>,
<CODE>&quot;print&quot;</CODE>,
<CODE>&quot;punct&quot;</CODE>,
<CODE>&quot;space&quot;</CODE>,
<CODE>&quot;upper&quot;</CODE>,
and <CODE>&quot;xdigit&quot;</CODE>.</P>
</LI>

</OL>
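
<P>The class-name lookup recommended above can be exercised with a short sketch.
It assumes a C++11 library whose <CODE>std::regex_traits&lt;char&gt;</CODE> provides
<CODE>lookup_classname</CODE> taking an iterator pair, together with
<CODE>isctype</CODE>; the helper <CODE>in_named_class</CODE> is hypothetical.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: reports whether ch belongs to the character class
// named by the characters in [first, last), for example the "digit"
// inside "[[:digit:]]". No temporary string is built for the name.
template <class FwdIt>
bool in_named_class(const std::regex_traits<char>& traits,
                    FwdIt first, FwdIt last, char ch) {
    std::regex_traits<char>::char_class_type cls =
        traits.lookup_classname(first, last);
    // An unrecognized name yields an empty class, which no character matches.
    return traits.isctype(ch, cls);
}
```

<P>A parser that has located the name inside a bracket expression can pass the two
iterators that delimit it, as in
<CODE>in_named_class(traits, pattern.begin() + 3, pattern.begin() + 8, ch)</CODE>
for the pattern <CODE>"[[:digit:]]"</CODE>.</P>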

</BODY></HTML>
