<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
        <title>
            Enhancing the time_get facet for POSIX compatibility, Revision 2
        </title>
    </head>

    <body>
        <table>
            <tr>
                <td align="left">Doc. no.</td>
                <td align="left">N2321=07-0181</td>
            </tr>
            <tr>
                <td align="left">Obsoletes:</td>
                <td align="left"><a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2211.html">N2211=07-0071</a></td>
            </tr>
            <tr>
                <td align="left">Date:</td>
                <td align="left">2007-06-22</td>
            </tr>
            <tr>
                <td align="left">Project:</td>
                <td align="left">Programming Language C++</td>
            </tr>
            <tr>
                <td align="left">Reply to:</td>
                <td align="left">
                    <a href="mailto:sebor@roguewave.com">Martin Sebor</a>
                </td>
            </tr>
        </table>

        <hr><!-------------------------------------------------------->
        <h1>

            Enhancing the <code>time_get</code> facet for POSIX&reg;
            compatibility, Revision 2
        </h1>
        <!------------------------------------------------------------>
        <h2>
            Index
        </h2>
        <ul>

            <li>
                <a href="#changes-r1">Changes in This Revision</a>
            </li>
            <li>
                <a href="#motivation">Motivation</a>
            </li>
            <li>
                <a href="#description">Description</a>

            </li>
            <li>
                <a href="#changes">Proposed Changes</a>
            </li>
            <li>
                <a href="#implementation">Implementation</a>
            </li>
            <li>
                <a href="#impact">Impact on Programs</a>
            </li>
            <li>
                <a href="#compatibility">Compatibility</a>
            </li>
        </ul>
        <!------------------------------------------------------------>
        <h2>
            <a name="changes-r1">Changes in This Revision</a>
        </h2>
        <p>

            This is a minor revision of the proposal. <a
            href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2211.html">Revision
            1</a> of the document permitted implementations to fail to
            parse input sequences that use complex conversion
            directives such as <code>%c</code>, <code>%x</code>, and
            <code>%X</code>. This revision clarifies that the same
            permission extends to those sequences when they involve
            the optional modifiers <code>E</code> and <code>O</code>.

        </p>
        <p>

            In addition, this revision adds a <a
            href="#compatibility">Compatibility</a> paragraph.

        </p>
        <p>

            Note that cases where the function cannot correctly parse
            even complex sequences should be quite rare, especially on
            POSIX platforms, where the function <code><a
            href="http://www.opengroup.org/onlinepubs/009695399/functions/nl_langinfo.html">nl_langinfo</a></code>
            may be used to retrieve the broken-down string consisting
            of a sequence of simple conversion directives
            corresponding to each of the complex ones. For example, in
            the C locale, the  broken-down string corresponding to the
            <code>%c</code>   directive   is   <code>"%a  %b   %e   %T
            %Y"</code>.  The  <code>nl_langinfo</code>  function  also
            makes it possible to retrieve the alternative symbols used
            instead  of ordinary  digits in  directives  involving the
            <code>E</code> and <code>O</code> modifiers.

        </p>
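        <p>

            The lookup described above can be sketched as follows.
            This is a minimal illustration, assuming a POSIX platform
            that provides <code>&lt;langinfo.h&gt;</code>; the helper
            name <code>expansion_of</code> is hypothetical and not
            part of the proposal.  <code>nl_langinfo</code> consults
            the global C locale, which is the C locale at program
            startup unless <code>setlocale</code> has been called.

        </p>

```cpp
#include <langinfo.h>   // POSIX: nl_langinfo, D_T_FMT, D_FMT, T_FMT
#include <string>

// Hypothetical helper: return the sequence of simple conversion
// directives that the current global C locale associates with one of
// the complex directives %c, %x, or %X.  An empty string is returned
// for any other directive character.
std::string expansion_of(char directive)
{
    switch (directive) {
    case 'c': return nl_langinfo(D_T_FMT);  // date and time format
    case 'x': return nl_langinfo(D_FMT);    // date format
    case 'X': return nl_langinfo(T_FMT);    // time format
    default:  return std::string();
    }
}
```

        <p>

            An implementation of <code>time_get</code> could, under
            these assumptions, parse a complex directive by parsing
            each simple directive of the returned string in turn.

        </p>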
        <!------------------------------------------------------------>

        <h2>
            <a name="motivation">Motivation</a>
        </h2>
        <p>

            The <code>time_get</code> and <code>time_put</code> facets
            provide a  low-level asymmetric interface  for the parsing
            and  formatting  of   time  values.   The  interfaces  are
            asymmetric  because  the  <code>time_put</code>  facet  is
            capable of  producing a much larger set  of sequences than
            the  <code>time_get</code> facet  is  capable of  parsing.
            The  <code>time_put</code>   interface  can  also  readily
            expose   useful   implementation-defined   extensions   by
            recognizing additional formatting specifiers and modifiers
            while the <code>time_get</code> interface provides no such
            flexibility.   The behavior  of  the <code>time_put</code>

            facet  is specified  in terms  of the  C  standard library
            function  <code>strftime</code> and the  facet's interface
            allows programs to take advantage of the rich set of
            60 or so <code>strftime</code> conversion specifiers
            (including their optional modifiers).  In contrast, the
            behavior of <code>time_get</code> is restricted to parsing
            a limited  set of  time and date  sequences produced  by a
            handful    of    formatting    specifiers,   namely    the
            locale-independent  and trivial <code>%T</code>  (which is
            the  same  as <code>"%H:%M:%S"</code>,  the  24-hour  time
            representation),  the  locale-specific  and  less  trivial
            <code>%x</code> (the locale's date representation), and to
            parsing   simple   weekday   names  (<code>%a</code>   and
            <code>%A</code>)   and  the   names  of   calendar  months
            (<code>%b</code>  and <code>%B</code>).   Presumably, this
            restriction  exists only  because the  C  standard library
            provides no  function for parsing time  sequences.  Such a
            function    is,    however,    specified   by    the    <a
            href="http://www.unix.org/version3/iso_std.html">ISO/IEC
            9945</a> standard  (also known  as POSIX) --  see <code><a
            href="http://www.opengroup.org/onlinepubs/009695399/functions/strptime.html">strptime</a></code>.
            Thus,  C++ programs  that need  to process  date  and time
            sequences produced by any of the other 56 or so formatting
            specifiers  are  unable  to   do  so  by  relying  on  the
            <code>time_get</code> facet's  parsing functionality, even
            though  much of  it often  exists in  implementations that
            parse non-trivial date sequences but is not exposed in the
            interface  of the  facet.  For  instance, even  the simple
            task of  parsing a 12-hour  time representation  is beyond
            the ability of  the facet, as is the  often-needed ability
            to recognize and interpret time zones.

        </p>

        <!------------------------------------------------------------>
        <h2>
            <a name="description">Description</a>
        </h2>
        <p>

            This paper proposes to extend the <code>time_get</code>
            facet interface in a way that permits parsing most of the
            date and time sequences that <code>time_put</code> can
            produce, thus providing a subset of the functionality of
            the POSIX function <code>strptime</code>. Specifically, we
            propose to add two <code>get</code> member functions and
            one <code>do_get</code> member function to class template
            <code>time_get</code> to parallel those declared by
            <code>time_put</code>.

        </p>

        <!------------------------------------------------------------>
        <h2>
            <a name="changes">Proposed Changes</a>
        </h2>
        <p>

            Add to the declaration of class <code>time_get</code> in
            [lib.locale.time.get], immediately below the declaration
            of the member function <code>get_year</code>, the
            following declarations:

        </p>

        <blockquote>
            <code>

            iter_type
            get (iter_type <i>s</i>, iter_type <i>end</i>,
                 ios_base& <i>f</i>, ios_base::iostate& <i>err</i>,
                 tm* <i>t</i>, char <i>format</i>,
                 char <i>modifier</i> = 0) const;

            </code>

        </blockquote>
        <blockquote>
            <code>

            iter_type
            get (iter_type <i>s</i>, iter_type <i>end</i>,
                 ios_base& <i>f</i>, ios_base::iostate& <i>err</i>,
                 tm* <i>t</i>, const char_type* <i>fmt</i>,
                 const char_type *<i>end</i>) const;

            </code>

        </blockquote>
        <p>

            Add  to the  declaration  of class  <code>time_get</code>,
            immediately  below the declaration  of the  virtual member
            function     <code>do_get_year</code>,    the    following
            declaration:


        </p>
        <blockquote>
            <code>

            virtual iter_type
            do_get (iter_type <i>s</i>, iter_type <i>end</i>,
                    ios_base& <i>f</i>, ios_base::iostate& <i>err</i>,
                    tm* <i>t</i>, char <i>format</i>,
                    char <i>modifier</i>) const;

            </code>

        </blockquote>
        <p>

            Add to the end of [lib.locale.time.get.members] the
            following text:

        </p>
        <blockquote>
            <code>

            iter_type
            get (iter_type <i>s</i>, iter_type <i>end</i>,
                 ios_base& <i>f</i>, ios_base::iostate& <i>err</i>,
                 tm* <i>t</i>, char <i>format</i>,
                 char <i>modifier</i> = 0) const;

            </code>

            <p>
                <i>Returns:</i>   <code>do_get(s,  end,  f,   err,  t,
                 format, modifier)</code>

            </p>
        </blockquote>
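        <p>

            The single-directive overload can be exercised as shown
            here.  This is a minimal sketch, assuming a library that
            implements the proposed <code>get</code> member (as C++11
            and later standard libraries do); the helper name
            <code>parse_directive</code> is hypothetical and not part
            of the proposal.

        </p>

```cpp
#include <ctime>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: parse the single conversion directive formed
// from '%' and `format` (for example 'Y' for "%Y") out of `text`,
// using the time_get facet of the classic "C" locale.
// Returns true unless failbit was set.
bool parse_directive(const std::string& text, char format, std::tm& out)
{
    std::istringstream in(text);
    in.imbue(std::locale::classic());

    typedef std::istreambuf_iterator<char> iter;
    const std::time_get<char>& tg =
        std::use_facet<std::time_get<char> >(in.getloc());

    std::ios_base::iostate err = std::ios_base::goodbit;
    out = std::tm();  // zero the object before parsing into it

    // No modifier is passed, so the default argument 0 is used.
    tg.get(iter(in), iter(), in, err, &out, format);

    // eofbit on its own only reports that the input was exhausted.
    return (err & std::ios_base::failbit) == 0;
}
```

        <p>

            Under these assumptions, <code>parse_directive("2007",
            'Y', t)</code> stores 107 in <code>t.tm_year</code>,
            because <code>struct tm</code> counts years from 1900.

        </p>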
        <blockquote>
            <code>

            iter_type
            get (iter_type <i>s</i>, iter_type <i>end</i>,
                 ios_base& <i>f</i>, ios_base::iostate& <i>err</i>,
                 tm* <i>t</i>, const char_type* <i>fmt</i>,
                 const char_type *<i>end</i>) const;

            </code>

            <p>

                <i>Requires:</i>  <code>[fmt, end)</code>  is  a valid
                range.

            </p>
            <p>

                <i>Effects:</i>  The  function  starts  by  evaluating
                <code>err =  ios_base::goodbit</code>.  It then enters
                a   loop,  reading  zero  or   more   characters  from
                <code>s</code>  at  each  iteration. Unless  otherwise
                specified below, the loop terminates when the first of
                the following conditions holds:

            </p>

            <ul>
                <li>

                    The expression <code>(fmt == end)</code> evaluates
                    to <code>true</code>.

                </li>
                <li>

                    The         expression        <code>(err        ==
                    ios_base::goodbit)</code>       evaluates       to
                    <code>false</code>.

                </li>

                <li>

                    The  expression <code>(s ==  end)</code> evaluates
                    to <code>true</code>,  in which case  the function
                    evaluates    <code>err   =    ios_base::eofbit   |
                    ios_base::failbit</code>.

                </li>
                <li>

                    The next  element of <code>fmt</code>  is equal to
                    <code>'%'</code>,   optionally   followed   by   a
                    <code>modifier</code>  character,  followed  by  a
                    conversion           specifier          character,
                    <code>format</code>, together forming a conversion
                    specification valid for  the ISO/IEC 9945 function
                    <code>strptime</code>.  If  the number of elements
                    in  the  range   <code>[fmt,  end)</code>  is  not
                    sufficient to  unambiguously determine whether the
                    conversion  specification is  complete  and valid,
                    the     function     evaluates     <code>err     =
                    ios_base::failbit</code>.  Otherwise, the function
                    evaluates  <code> s  = do_get(s,  end, f,  err, t,
                    format,  modifier)</code>,   where  the  value  of
                    <code>modifier</code>  is  <code>'\0'</code>  when
                    the   optional  modifier   is   absent  from   the
                    conversion   specification.    If  <code>(err   ==
                    ios_base::goodbit)</code>    holds    after    the
                    evaluation   of  the   expression,   the  function
                    increments <code>fmt</code> to point just past the
                    end of the  conversion specification and continues
                    looping.

                </li>

                <li>

                    The expression <code>isspace(*fmt,
                    f.getloc())</code> evaluates to <code>true</code>,
                    in which case the function first increments
                    <code>fmt</code> until <code>(fmt == end ||
                    !isspace(*fmt, f.getloc()))</code> evaluates to
                    <code>true</code>, then advances <code>s</code>
                    until <code>(s == end || !isspace(*s,
                    f.getloc()))</code> is <code>true</code>, and
                    finally resumes looping.

                </li>
                <li>

                    The   next  character  read   from  <code>s</code>
                    matches the element pointed to by <code>fmt</code>

                    in  a case-insensitive  comparison, in  which case
                    the function evaluates <code>++fmt, ++s</code> and
                    continues   looping.    Otherwise,  the   function
                    evaluates <code>err = ios_base::failbit</code>.

                </li>
            </ul>
            <p>

                <i>Note:</i>      The      function      uses      the
                <code>ctype&lt;charT&gt;</code>   facet  installed  in
                <code>f</code>'s locale  to determine valid whitespace
                characters.  It  is  unspecified  by  what  means  the
                function   performs  case-insensitive   comparison  or
                whether multi-character sequences are considered while
                doing so.

            </p>

            <p>
                <i>Returns:</i> <code>s</code>.
            </p>
        </blockquote>
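        <p>

            The format-range overload specified above can be
            exercised as shown here.  This is a minimal sketch,
            assuming a library that implements the proposed
            <code>get</code> member (as C++11 and later standard
            libraries do); the helper name <code>parse_time</code> is
            hypothetical and not part of the proposal.

        </p>

```cpp
#include <ctime>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: parse `text` according to the strptime-style
// format string `fmt`, using the time_get facet of the classic "C"
// locale.  Returns true unless failbit was set.
bool parse_time(const std::string& text, const std::string& fmt,
                std::tm& out)
{
    std::istringstream in(text);
    in.imbue(std::locale::classic());

    typedef std::istreambuf_iterator<char> iter;
    const std::time_get<char>& tg =
        std::use_facet<std::time_get<char> >(in.getloc());

    std::ios_base::iostate err = std::ios_base::goodbit;
    out = std::tm();  // zero the object before parsing into it

    // The format is passed as a half-open range of characters.
    // Directives are parsed, whitespace in the format skips
    // whitespace in the input, and other characters must match.
    tg.get(iter(in), iter(), in, err, &out,
           fmt.data(), fmt.data() + fmt.size());

    // eofbit on its own only reports that the input was exhausted.
    return (err & std::ios_base::failbit) == 0;
}
```

        <p>

            For example, the input <code>"2007-06-22 14:30:05"</code>
            parsed with the format <code>"%Y-%m-%d %H:%M:%S"</code>
            yields <code>tm_year == 107</code>, <code>tm_mon ==
            5</code>, and <code>tm_mday == 22</code>, together with
            the corresponding time members.

        </p>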
        <p>

            Add the following paragraphs to the end of
            [lib.locale.time.get.virtuals]:

        </p>
        <blockquote>

            <code>

            virtual iter_type
            do_get (iter_type <i>s</i>, iter_type <i>end</i>,
                    ios_base& <i>f</i>, ios_base::iostate& <i>err</i>,
                    tm* <i>t</i>, char <i>format</i>,
                    char <i>modifier</i>) const;

            </code>

            <p>

                <i>Requires:</i>  <code>[s, end)</code>  is  a valid
                range and <code>t</code> is dereferenceable.

            </p>
            <p>

                <i>Effects:</i>  The  function  starts  by  evaluating
                <code>err  = ios_base::goodbit</code>.  It  then reads
                characters   starting  at   <code>s</code>   until  it
                encounters  an error,  or until  it has  extracted and
                assigned those <code>struct tm</code> members, and any
                remaining   format  characters,  corresponding   to  a
                conversion directive appropriate  for the ISO/IEC 9945
                function      <code>strptime</code>,     formed     by
                concatenating           <code>'%'</code>,          the
                <code>modifier</code> character, when non-NUL, and the
                <code>format</code> character.  When the concatenation
                fails to yield a complete valid directive the function
                leaves the object pointed to by <code>t</code>

                unchanged      and     evaluates      <code>err     |=
                ios_base::failbit</code>. When <code>(s == end)</code>
                evaluates   to  <code>true</code>   after   reading  a
                character   the   function   evaluates  <code>err   |=
                ios_base::eofbit</code>.

            </p>
            <p>

                For    complex   conversion    directives    such   as
                <code>%c</code>,  <code>%x</code>, or <code>%X</code>,
                or  directives  that  involve the  optional  modifiers
                <code>E</code> or <code>O</code>, when the function is
                unable   to  unambiguously   determine  some   or  all
                <code>struct tm</code> members from the input sequence
                <code>[<i>s</i>,   <i>end</i>)</code>,  it   evaluates
                <code>err  |= ios_base::eofbit</code>.  In  such cases
                the values of those <code>struct tm</code> members are
                unspecified and may be outside their valid range.

            </p>

            <p>

                <i>Note:</i> It is  unspecified whether multiple calls
                to <code>do_get()</code> with  the address of the same
                <code>struct tm</code> object  will update the current
                contents  of  the   object  or  simply  overwrite  its
                members.  Portable  programs must zero  out the object
                before invoking the function. 

            </p>
            <p>

                <i>Returns:</i>   An  iterator   pointing  immediately
                beyond the last  character recognized as possibly part
                of   a   valid    input   sequence   for   the   given
                <code>format</code> and <code>modifier</code>.

            </p>
        </blockquote>
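        <p>

            Because <code>do_get</code> is virtual, a derived facet
            can observe or customize the parsing of each directive.
            The sketch below assumes a library that implements the
            proposed members (as C++11 and later standard libraries
            do); the names <code>counting_time_get</code> and
            <code>count_dispatches</code> are hypothetical and serve
            only to show that the format-range <code>get</code>
            reaches <code>do_get</code> once per conversion
            specification.

        </p>

```cpp
#include <cstddef>
#include <ctime>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet for illustration: forwards every directive to
// the base class and counts how often do_get is invoked.
class counting_time_get : public std::time_get<char>
{
public:
    explicit counting_time_get(std::size_t refs = 0)
        : std::time_get<char>(refs), calls(0) {}

    mutable int calls;  // updated from the const member do_get

protected:
    iter_type do_get(iter_type s, iter_type end, std::ios_base& f,
                     std::ios_base::iostate& err, std::tm* t,
                     char format, char modifier) const
    {
        ++calls;
        return std::time_get<char>::do_get(s, end, f, err, t,
                                           format, modifier);
    }
};

// Hypothetical helper: parse `text` against `fmt` and report how many
// conversion specifications were dispatched to do_get.
int count_dispatches(const std::string& text, const std::string& fmt,
                     std::tm& out)
{
    counting_time_get facet(1);  // refs == 1: the caller owns the facet

    std::istringstream in(text);
    in.imbue(std::locale::classic());

    typedef std::istreambuf_iterator<char> iter;
    std::ios_base::iostate err = std::ios_base::goodbit;
    out = std::tm();  // zero the object, as the Note above requires

    facet.get(iter(in), iter(), in, err, &out,
              fmt.data(), fmt.data() + fmt.size());
    return facet.calls;
}
```

        <p>

            With the input <code>"14:30:05"</code> and the format
            <code>"%H:%M:%S"</code>, the function reports three
            dispatches, one for each conversion specification.

        </p>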
        <!------------------------------------------------------------>

        <h2>

            <a name="implementation">Implementation</a>
        </h2>
        <p>

            A reference implementation  of this extension is available
            for     review      in     the     Open      Source     <a
            href="http://incubator.apache.org/stdcxx/">Apache       C++
            Standard   Library</a>.  The   same  extension   has  been
            implemented  in the Rogue  Wave&reg; C++  Standard Library
            and       shipped      since      2001.        See      <a
            href="http://www.roguewave.com/support/docs/sourcepro/edition9-update1/html/stdlibref/time-get.html#idx1241">

            this page</a> for the latest documentation of the feature.

        </p>

        <!------------------------------------------------------------>
        <h2>
            <a name="impact">Impact on Programs</a>
        </h2>
        <p>

            The proposed extensions are largely source-compatible with
            the existing interface of the <code>time_get</code> facet
            (there is a very small chance that the introduction of a
            new base class member function might affect the
            well-formedness or even the behavior of a program that
            calls a function with the same name in a class derived
            from the base).

        </p>

        <!------------------------------------------------------------>
        <h2>
            <a name="compatibility">Compatibility</a>
        </h2>
        <p>

            Adding a new virtual member function is a
            binary-incompatible change.  During the discussion of this
            proposal at the Oxford meeting in April 2007, a number of
            attendees expressed concern about introducing such a
            change in a Technical Report (such as TR2) and felt that a
            change of this nature would be more appropriate for the
            upcoming revision of the C++ standard.

        </p>
    </body>
</html>
