<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
        <title>
            Enhancing the time_get facet for POSIX compatibility
        </title>
    </head>

    <body>
        <table>
            <tr>
                <td align="left">Doc. no.</td>
                <td align="left">N2070=06-0140</td>
            </tr>
            <tr>
                <td align="left">Date:</td>
                <td align="left">2006-09-08</td>
            </tr>
            <tr>
                <td align="left">Project:</td>
                <td align="left">Programming Language C++</td>
            </tr>
            <tr>
                <td align="left">Reply to:</td>
                <td align="left">
                    <a href="mailto:sebor@roguewave.com">Martin Sebor</a>
                </td>
            </tr>
        </table>
        <hr><!-------------------------------------------------------->
        <h1>
            Enhancing the <code>time_get</code> facet for POSIX&reg;
            compatibility
        </h1>
        <!------------------------------------------------------------>
        <h2>
            Index
        </h2>
        <ul>
            <li>
                <a href="#motivation">Motivation</a>
            </li>
            <li>
                <a href="#description">Description</a>
            </li>
            <li>
                <a href="#changes">Proposed Changes</a>
            </li>
            <li>
                <a href="#implementation">Implementation</a>
            </li>
            <li>
                <a href="#impact">Impact on Programs</a>
            </li>
        </ul>
        <!------------------------------------------------------------>
        <h2>
            <a name="motivation">Motivation</a>
        </h2>
        <p>

            The <code>time_get</code> and <code>time_put</code> facets
            provide a  low-level asymmetric interface  for the parsing
            and  formatting  of   time  values.   The  interfaces  are
            asymmetric  because  the  <code>time_put</code>  facet  is
            capable of  producing a much larger set  of sequences than
            the  <code>time_get</code> facet  is  capable of  parsing.
            The  <code>time_put</code>   interface  can  also  readily
            expose   useful   implementation-defined   extensions   by
            recognizing additional formatting specifiers and modifiers
            while the <code>time_get</code> interface provides no such
            flexibility.   The behavior  of  the <code>time_put</code>
            facet  is specified  in terms  of the  C  standard library
            function  <code>strftime</code> and the  facet's interface
            allows programs to take advantage of the rich set of
            60 or so <code>strftime</code> conversion specifiers
            (including their optional modifiers).  In contrast, the
            behavior of <code>time_get</code> is restricted to parsing
            a limited set of time and date sequences produced by a
            handful    of    formatting    specifiers,   namely    the
            locale-independent  and trivial <code>%T</code>  (which is
            the  same  as <code>"%H:%M:%S"</code>,  the  24 hour  time
            representation),  the  locale-specific  and  less  trivial
            <code>%x</code> (the locale's date representation), and to
            parsing   simple   weekday   names  (<code>%a</code>   and
            <code>%A</code>)   and  the   names  of   calendar  months
            (<code>%b</code>  and <code>%B</code>).   Presumably, this
            restriction  exists only  because the  C  standard library
            provides no  function for  parsing time sequences.  Such a
            function    is,    however,    specified   by    the    <a
            href="http://www.unix.org/version3/iso_std.html">ISO/IEC
            9945</a> standard  (also known  as POSIX) --  see <code><a
            href="http://www.opengroup.org/onlinepubs/009695399/functions/strptime.html">strptime</a></code>.
            Thus,  C++ programs  that need  to process  date  and time
            sequences produced by any of the other 56 or so
            formatting specifiers are unable to do so by relying on
            the <code>time_get</code> facet's parsing functionality, even
            though  much of  it often  exists in  implementations that
            parse non-trivial date sequences but is not exposed in the
            interface  of the  facet.  For  instance, even  the simple
            task of  parsing a 12  hour time representation  is beyond
            the ability of  the facet, as is the  often needed ability
            to recognize and interpret time zones.

        </p>
        <!------------------------------------------------------------>
        <h2>
            <a name="description">Description</a>
        </h2>
        <p>

            This paper proposes to extend the <code>time_get</code>
            facet interface so that it can parse most of the date and
            time sequences that <code>time_put</code> can produce,
            thus providing a subset of the functionality of POSIX
            <code>strptime</code>. Specifically, we propose to add two
            <code>get</code> member functions and one virtual
            <code>do_get</code> member function to class
            <code>time_get</code> to parallel those declared by
            <code>time_put</code>.

        </p>
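        <p>
            As a rough illustration of how the proposed pattern-based
            <code>get</code> might be called, the following sketch parses
            a date and time in a single call. It assumes a library that
            already provides this overload (the same signature was later
            standardized as <code>std::time_get::get</code> in C++11).
            The wrapper <code>parse_time</code> and its parameter names
            are hypothetical and not part of the proposal.
        </p>

```cpp
#include <cassert>
#include <ctime>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical wrapper, not part of the proposal: parses `input` according
// to `pattern` in the classic "C" locale and returns the resulting state.
std::ios_base::iostate parse_time(const std::string& input,
                                  const std::string& pattern,
                                  std::tm* t)
{
    typedef std::time_get<char> facet_type;
    typedef facet_type::iter_type iter_type;  // istreambuf_iterator<char>

    assert(t != nullptr);
    *t = std::tm();  // zero out the object before parsing

    std::istringstream in(input);
    in.imbue(std::locale::classic());
    const std::locale loc = in.getloc();
    const facet_type& tg = std::use_facet<facet_type>(loc);

    std::ios_base::iostate err = std::ios_base::goodbit;
    const char* fmt = pattern.data();

    // The pattern-based overload: [fmt, fmt + size) is the format range.
    tg.get(iter_type(in), iter_type(), in, err, t,
           fmt, fmt + pattern.size());
    return err;
}
```

        <p>
            Under these assumptions, calling <code>parse_time</code> with
            the input <code>"2006-09-08 14:30:05"</code> and the pattern
            <code>"%Y-%m-%d %H:%M:%S"</code> should fill in the date and
            time members of the <code>struct tm</code> object, while a
            mismatched literal character in the input should set
            <code>ios_base::failbit</code>.
        </p>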
        <!------------------------------------------------------------>
        <h2>
            <a name="changes">Proposed Changes</a>
        </h2>
        <p>

            Add to the declaration of class <code>time_get</code> in
            [lib.locale.time.get], immediately below the declaration
            of the member function <code>get_year</code>, the
            following:

        </p>
        <blockquote>
            <code>

            iter_type get (iter_type s, iter_type end, ios_base& f,
                           ios_base::iostate& err, tm* t,
                           char format, char modifier = 0) const;

            </code>
        </blockquote>
        <blockquote>
            <code>

            iter_type get (iter_type s, iter_type end, ios_base& f,
                           ios_base::iostate& err, tm* t,
                           const char_type* fmt, const char_type *end) const;

            </code>
        </blockquote>
        <p>

            Add to the declaration of class <code>time_get</code>,
            immediately below the declaration of the virtual member
            function <code>do_get_year</code>, the following:


        </p>
        <blockquote>
            <code>

            virtual iter_type do_get (iter_type s, iter_type end,
                                      ios_base& f,
                                      ios_base::iostate& err, tm* t,
                                      char format, char modifier) const;

            </code>
        </blockquote>
        <p>

            Add to the end of [lib.locale.time.get.members] the
            following text:

        </p>
        <blockquote>
            <code>

            iter_type get (iter_type s, iter_type end, ios_base& f,
                           ios_base::iostate& err, tm* t,
                           char format, char modifier = 0) const;
            </code>
            <p>
                <i>Returns:</i>   <code>do_get(s,  end,  f,   err,  t,
                 format, modifier)</code>
            </p>
        </blockquote>
        <blockquote>
            <code>

            iter_type get (iter_type s, iter_type end, ios_base& f,
                           ios_base::iostate& err, tm* t,
                           const char_type* fmt, const char_type* end) const;
            </code>
            <p>

                <i>Requires:</i>  <code>[fmt, end)</code>  is  a valid
                range.

            </p>
            <p>

                <i>Effects:</i>  The  function  starts  by  evaluating
                <code>err =  ios_base::goodbit</code>.  It then enters
                a   loop,  reading  zero  or   more   characters  from
                <code>s</code>  at  each  iteration. Unless  otherwise
                specified below, the loop terminates when the first of
                the following conditions holds:

            </p>
            <ul>
                <li>

                    The expression <code>(fmt == end)</code> evaluates
                    to true.

                </li>
                <li>

                    The         expression        <code>(err        ==
                    ios_base::goodbit)</code> evaluates to false.

                </li>
                <li>

                    The  expression <code>(s ==  end)</code> evaluates
                    to  true,  in which  case  the function  evaluates
                    <code>err        =        ios_base::eofbit       |
                    ios_base::failbit</code>.

                </li>
                <li>

                    The next  element of <code>fmt</code>  is equal to
                    <code>'%'</code>,   optionally   followed   by   a
                    <code>modifier</code>  character,  followed  by  a
                    conversion           specifier          character,
                    <code>format</code>, together forming a conversion
                    specification valid for  the ISO/IEC 9945 function
                    <code>strptime</code>.  If  the number of elements
                    in  the  range   <code>[fmt,  end)</code>  is  not
                    sufficient to  unambiguously determine whether the
                    conversion specification is complete and valid the
                    function        evaluates        <code>err       =
                    ios_base::failbit</code>.  Otherwise, the function
                    evaluates  <code> s  = do_get(s,  end, f,  err, t,
                    format,  modifier)</code>,   where  the  value  of
                    <code>modifier</code>  is  <code>'\0'</code>  when
                    the   optional  modifier   is   absent  from   the
                    conversion specification.  If <code>(err ==
                    ios_base::goodbit)</code> holds after the
                    evaluation of the expression, the function
                    increments <code>fmt</code> to point just past the
                    end of the conversion specification and continues
                    looping.

                </li>
                <li>

                    The expression <code>isspace(*fmt,
                    f.getloc())</code> evaluates to true, in which
                    case the function first increments
                    <code>fmt</code> until <code>(fmt == end ||
                    !isspace(*fmt, f.getloc()))</code> evaluates to
                    true, then advances <code>s</code> until <code>(s ==
                    end || !isspace(*s, f.getloc()))</code> is true,
                    and then resumes looping.

                </li>
                <li>

                    The   next  character  read   from  <code>s</code>
                    matches the element pointed to by <code>fmt</code>
                    in  a case-insensitive  comparison, in  which case
                    the function evaluates <code>++fmt, ++s</code> and
                    continues   looping.    Otherwise,  the   function
                    evaluates <code>err = ios_base::failbit</code>.

                </li>
            </ul>
            <p>

                <i>Note:</i>      The      function      uses      the
                <code>ctype&lt;charT&gt;</code>   facet  installed  in
                <code>f</code>'s locale  to determine valid whitespace
                characters.  It  is  unspecified  by  what  means  the
                function   performs  case-insensitive   comparison  or
                whether multi-character sequences are considered while
                doing so.

            </p>
            <p>

                <i>Returns:</i> <code>s</code>.

            </p>
        </blockquote>
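        <p>
            The loop described above can be approximated by the
            following sketch, written as a free function on top of the
            single-specifier <code>get</code>. It is an illustration
            only, not the reference implementation. The names
            <code>get_by_pattern</code>, <code>parse_pattern</code> and
            <code>fmtend</code> are hypothetical, the sketch assumes
            that <code>E</code> and <code>O</code> are the only
            modifiers, and it assumes a library in which the
            single-specifier overload is available (as in C++11).
        </p>

```cpp
#include <cassert>
#include <ctime>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

typedef std::time_get<char> facet_type;
typedef facet_type::iter_type iter_type;  // istreambuf_iterator<char>

// Hypothetical free function mirroring the Effects loop. The end of the
// format range is named fmtend to keep it distinct from the input end.
iter_type get_by_pattern(const facet_type& tg, iter_type s, iter_type end,
                         std::ios_base& f, std::ios_base::iostate& err,
                         std::tm* t, const char* fmt, const char* fmtend)
{
    assert(t != nullptr);
    const std::locale loc = f.getloc();
    err = std::ios_base::goodbit;

    while (fmt != fmtend && err == std::ios_base::goodbit) {
        if (s == end) {
            // Input exhausted while format characters remain.
            err = std::ios_base::eofbit | std::ios_base::failbit;
        } else if (*fmt == '%') {
            // '%', an optional modifier, then the conversion specifier.
            // Assumption: only the modifiers 'E' and 'O' are recognized.
            const char* p = fmt + 1;
            char modifier = '\0';
            if (p != fmtend && (*p == 'E' || *p == 'O'))
                modifier = *p++;
            if (p == fmtend) {
                err = std::ios_base::failbit;  // incomplete specification
            } else {
                // The public get() forwards to the virtual do_get().
                s = tg.get(s, end, f, err, t, *p, modifier);
                if (err == std::ios_base::goodbit)
                    fmt = p + 1;  // step past the conversion specification
            }
        } else if (std::isspace(*fmt, loc)) {
            // Whitespace in the format matches any run of input whitespace.
            while (fmt != fmtend && std::isspace(*fmt, loc)) ++fmt;
            while (s != end && std::isspace(*s, loc)) ++s;
        } else if (std::tolower(*s, loc) == std::tolower(*fmt, loc)) {
            ++fmt;  // ordinary character matched, ignoring case
            ++s;
        } else {
            err = std::ios_base::failbit;
        }
    }
    return s;
}

// Hypothetical driver: runs the loop over a string in the "C" locale.
std::ios_base::iostate parse_pattern(const std::string& input,
                                     const std::string& pattern, std::tm* t)
{
    assert(t != nullptr);
    *t = std::tm();
    std::istringstream in(input);
    in.imbue(std::locale::classic());
    const std::locale loc = in.getloc();
    const facet_type& tg = std::use_facet<facet_type>(loc);
    std::ios_base::iostate err = std::ios_base::goodbit;
    const char* fmt = pattern.data();
    get_by_pattern(tg, iter_type(in), iter_type(), in, err, t,
                   fmt, fmt + pattern.size());
    return err;
}
```

        <p>
            Note that, read literally, the loop stops as soon as a
            conversion sets <code>ios_base::eofbit</code>, so a caller
            of this sketch would test for <code>failbit</code> rather
            than compare the state against <code>goodbit</code>.
        </p>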
        <p>

            Add the following paragraphs to the end of
            [lib.locale.time.get.virtuals]:

        </p>
        <blockquote>
            <code>

            iter_type do_get (iter_type s, iter_type end, ios_base& f,
                              ios_base::iostate& err, tm* t,
                              char format, char modifier) const;
            </code>
            <p>

                <i>Requires:</i> <code>t</code> is a valid pointer.

            </p>
            <p>

                <i>Effects:</i>  The  function  starts  by  evaluating
                <code>err  = ios_base::goodbit</code>.  It  then reads
                characters   starting  at   <code>s</code>   until  it
                encounters an  error, or until it  has extracted those
                <code>struct  tm</code>  members,  and  any  remaining
                format  characters,   corresponding  to  a  conversion
                directive   appropriate for the  ISO/IEC 9945 function
                <code>strptime</code>    formed    by    concatenating
                <code>'%'</code>, the <code>modifier</code> character,
                when  non-NUL, and the  <code>format</code> character.
                When the concatenation fails to yield a valid complete
                directive the function leaves the object pointed to by
                <code>t</code>  unchanged and  evaluates  <code>err |=
                ios_base::failbit</code>. When <code>(s == end)</code>
                evaluates  to  true  after  reading  a  character  the
                function              evaluates              <code>err
                |= ios_base::eofbit</code>. 

            </p>
            <p>

                <i>Note:</i> It is  unspecified whether multiple calls
                to <code>do_get()</code> with  the address of the same
                <code>struct tm</code> object  will update the current
                contents  of  the   object  or  simply  overwrite  its
                members.  Portable  programs must zero  out the object
                before invoking the function. 

            </p>
            <p>

                <i>Returns:</i>   An  iterator   pointing  immediately
                beyond the last  character recognized as possibly part
                of   a   valid    input   sequence   for   the   given
                <code>format</code> and <code>modifier</code>.

            </p>
        </blockquote>
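        <p>
            A minimal sketch of the single-specifier interface follows.
            It reaches <code>do_get</code> through the public
            <code>get</code> overload and zeroes the
            <code>struct tm</code> object first, as the Note above
            recommends. The wrapper <code>parse_field</code> is
            hypothetical, and the sketch assumes a library that already
            provides this overload (as C++11 libraries do).
        </p>

```cpp
#include <cassert>
#include <ctime>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical wrapper: extracts the field(s) selected by one conversion
// specifier, such as 'd' or 'T', from `input` in the classic "C" locale.
std::ios_base::iostate parse_field(const std::string& input, char format,
                                   std::tm* t)
{
    typedef std::time_get<char> facet_type;
    typedef facet_type::iter_type iter_type;  // istreambuf_iterator<char>

    assert(t != nullptr);  // Requires: t is a valid pointer
    *t = std::tm();        // portable programs zero out the object first

    std::istringstream in(input);
    in.imbue(std::locale::classic());
    const std::locale loc = in.getloc();
    const facet_type& tg = std::use_facet<facet_type>(loc);

    std::ios_base::iostate err = std::ios_base::goodbit;

    // The modifier argument is omitted, so it defaults to 0 (no modifier).
    tg.get(iter_type(in), iter_type(), in, err, t, format);
    return err;
}
```

        <p>
            For example, <code>parse_field("25", 'd', &amp;t)</code>
            should store 25 in <code>t.tm_mday</code>, whereas
            non-numeric input for the same specifier should set
            <code>ios_base::failbit</code>.
        </p>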
        <!------------------------------------------------------------>
        <h2>
            <a name="implementation">Implementation</a>
        </h2>
        <p>

            A reference implementation  of this extension is available
            for     review      in     the     Open      Source     <a
            href="http://incubator.apache.org/stdcxx/">Apache       C++
            Standard   Library</a>.  The   same  extension   has  been
            implemented  in the Rogue  Wave&reg; C++  Standard Library
            and       shipped      since      2001.        See      <a
            href="http://www.roguewave.com/support/docs/sourcepro/edition9-update1/html/stdlibref/time-get.html#idx1241">
            this page</a> for the latest documentation of the feature.

        </p>
        <!------------------------------------------------------------>
        <h2>
            <a name="impact">Impact on Programs</a>
        </h2>
        <p>

            The proposed extensions are largely source compatible with
            the existing interface of the <code>time_get</code> facet
            (there is a very small chance that the introduction of a
            new base class member function might affect the
            well-formedness or even the behavior of a program that
            calls a function with the same name in a class derived
            from the base). Adding a new virtual member function is a
            binary incompatible change.

        </p>
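        <p>
            The source compatibility caveat can be illustrated with a
            small, self-contained sketch. The classes below are
            hypothetical stand-ins and do not involve
            <code>time_get</code> itself: <code>old_base</code> plays
            the role of the facet before the change and
            <code>new_base</code> the role of the facet after a member
            named <code>get</code> has been added.
        </p>

```cpp
#include <cassert>

namespace hiding_demo {

// Namespace-scope function that a derived class calls unqualified.
inline int get(int) { return 1; }

struct old_base { };              // base class without a member named get

struct new_base {                 // base class after adding a member get
    int get(long) const { return 2; }
};

struct derived_from_old : old_base {
    // No member named get exists, so lookup finds hiding_demo::get(int).
    int call() const { return get(0); }
};

struct derived_from_new : new_base {
    // Class-scope lookup now finds new_base::get first, which hides the
    // namespace-scope function and silently changes the behavior.
    int call() const { return get(0); }
};

// Checks that the same call expression resolves differently.
inline void demonstrate()
{
    assert(derived_from_old().call() == 1);
    assert(derived_from_new().call() == 2);
}

}  // namespace hiding_demo
```

        <p>
            In this sketch the program stays well-formed but changes
            meaning. Had the arguments not been convertible to the
            parameters of the new member, the call would have become
            ill-formed instead.
        </p>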
    </body>
</html>
