<html>
<head><title>N2842 - Another numeric facet</title></head>
<body>
<table border=0>
<tr>
  <td><b>Doc No:</b></td>
  <td>WG21 N2842 = 09-0032</td>
</tr>
<tr>
  <td><b>Date:</b></td>
  <td>2009-04-01</td>
</tr>
<tr>
  <td><b>Reply to:</b>&nbsp;</td>
  <td>Bill Seymour &lt;stdbill.h@pobox.com&gt;</td>
</tr>
</table>

<center>

<h1>Another numeric facet</h1>
<h3>
  Bill Seymour<p>the first of April, two thousand nine
</h3>
</center>

<p><hr size=5>

<h2>Abstract</h2>

Wouldn&rsquo;t it be nice if there were a standard way to write natural numbers
as locale-specific text?  This paper proposes a facet to do that.

<p><hr size=5>

<h2>Yes, it has been implemented.</h2>

So far, <tt>numtext&lt;char&gt;</tt> has been implemented in the &ldquo;C,&rdquo;
Danish, German and French locales; and <tt>numtext&lt;wchar_t&gt;</tt> has been
implemented in the Hindi and Russian locales.  There&rsquo;s a demo at
<a href="http://www.stdbill.com/cgi-bin/try_numtext">http://www.stdbill.com/cgi-bin/try_numtext</a>.
(The Hindi ordinals don&rsquo;t work as of this writing;
but it&rsquo;s hoped that they will by the time this
paper is published.)

<p>Because many current C++ implementations don&rsquo;t have <nobr><tt>&lt;cstdint&gt;</tt></nobr>
yet, instead of <tt>uintmax_t</tt>, the demo uses <tt>unsigned</tt>&nbsp;<tt>long</tt>&nbsp;<tt>long</tt>
on implementations that have that type, or just <tt>unsigned</tt>&nbsp;<tt>long</tt>
on ones that don&rsquo;t.

<p>To show that generating text for very large numbers is also possible,
the demo has a page that will generate <nobr>&ldquo;C&rdquo;-locale</nobr>
group names for values up to
<nobr>1000<sup><tt>UINT_MAX</tt></sup>.</nobr>
The author uses
<a href="http://www.isthe.com/chongo/tech/math/number/howhigh.html"><cite>How high can you count?</cite></a>
by Landon Curt Noll as his lexical authority for the <nobr>&ldquo;C&rdquo;&nbsp;locale.</nobr>

<p>The author would like to thank (in <nobr><tt>lexicographical_compare&lt;&gt;</tt></nobr>
order) Soumitra Chatterjee, Ilya Kofman, Jens Maurer,
Dhaivat Parikh, Bjarne Stroustrup, and Willem Wakker
for their help with the current implementation.

<p><hr size=5>

<h2>The basic design</h2>

<pre>
    class numtext_base {
    public:
        enum inflection {
            none     = 0x0000,
            // ...
            cardinal = 0x0000,
            ordinal  = 0x8000,
        };
    };
</pre>

We&rsquo;ll have a non-template base class with bitmasks for the
various inflections that we might need.
At a minimum, even in the &ldquo;C&rdquo; locale, we&rsquo;d
want to produce both cardinal and ordinal numbers; and other
locales will have additional requirements.  In Danish, for example,
both the cardinal 1 and the ordinal 2 have gender
(<i>en</i>-<i>et</i>, <i>anden</i>-<i>andet</i>).  In German,
the ordinals are considered adjectives with number, gender,
and case; there are <a href="http://www.apronus.com/learngerman/adj.htm">three
complete sets of adjective declensions</a>; and although there is a word for the cardinal 1
(<i>eins</i>), it is often replaced by the indefinite article
(<i>ein</i>-<i>eines</i>-<i>einem</i>-&hellip;).
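<p>As a rough illustration of how such bitmask fields might be combined and
taken apart, here is a minimal sketch.  It copies a few of the enumerator values
from the fuller listing later in this paper; the namespace <tt>sketch</tt> and the
helpers <tt>operator|</tt>, <tt>field()</tt> and <tt>with_field()</tt> are
assumptions made for the example and are not part of the proposal.  (Without some
such operator, <tt>ordinal | neuter</tt> would have type <tt>int</tt>.)

```cpp
#include <cassert>

namespace sketch {  // hypothetical namespace, not proposed wording

struct numtext_base {
    enum inflection {
        none       = 0x0000,
        gender     = 0x000C,  // mask selecting the gender field
        common     = 0x0000,
        feminine   = 0x0008,
        neuter     = 0x000C,
        lexcase    = 0x00F0,  // mask selecting the case field
        nominative = 0x0000,
        dative     = 0x0020,
        cardinal   = 0x0000,
        ordinal    = 0x8000
    };
};

typedef numtext_base::inflection inflection;

// Combines two inflection values and keeps the enumeration type.
inline inflection operator|(inflection a, inflection b)
{
    return static_cast<inflection>(static_cast<int>(a) | static_cast<int>(b));
}

// Extracts the field selected by mask, e.g. field(i, numtext_base::gender).
inline inflection field(inflection i, inflection mask)
{
    return static_cast<inflection>(i & mask);
}

// Returns base with the field selected by mask replaced by value.
inline inflection with_field(inflection base, inflection mask, inflection value)
{
    assert((value & ~mask) == 0);  // value must lie inside the field
    return static_cast<inflection>((base & ~mask) | value);
}

} // namespace sketch
```

<p>With these helpers, <tt>ordinal | neuter | dative</tt> describes a neuter dative
ordinal, and <tt>field(i, numtext_base::gender)</tt> recovers the gender alone.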

<pre>
    template&lt;class charT&gt;
    class numtext : public locale::facet, public numtext_base {
    public:
        // ...
        void convert(basic_string&lt;charT&gt;&amp;, uintmax_t, inflection = none) const;
    };
</pre>

Like <tt>codecvt&lt;&gt;</tt> and similar facets, <tt>numtext&lt;&gt;</tt>
doesn&rsquo;t actually do I/O.  Instead, it just returns the correct
text in a <tt>basic_string&lt;&gt;</tt> passed by <nobr>non-<tt>const</tt></nobr>
reference.  Alternatively, we could return a <tt>basic_string&lt;&gt;</tt>
by value; but the strings can be fairly long.  For example, in the &ldquo;C&rdquo;
locale, a <nobr>64-bit</nobr> <tt>ULLONG_MAX</tt> would be &ldquo;eighteen
quintillion four hundred forty-six quadrillion seven hundred forty-four
trillion seventy-three billion seven hundred nine million five hundred
fifty-one thousand six hundred fifteen&rdquo;; and we&rsquo;re only
up to about 10<small><sup>19</sup></small>.
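<p>To make the by-reference interface concrete, here is a minimal sketch of an
uninflected English cardinal converter that writes into a caller-supplied string.
It is not the author&rsquo;s implementation: the namespace <tt>sketch</tt> and the
functions <tt>append_below_thousand()</tt> and <tt>convert_cardinal()</tt> are
hypothetical, the sketch assumes that <tt>uintmax_t</tt> has at most 64 bits, and
spelling 0 as &ldquo;zero&rdquo; is an assumption as well.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

namespace sketch {  // hypothetical namespace, not proposed wording

// Appends the words for n (1 through 999) to dest.
inline void append_below_thousand(std::string& dest, unsigned n)
{
    static const char* const ones[] = {
        "", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen" };
    static const char* const tens[] = {
        "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety" };

    assert(n >= 1 && n <= 999);
    if (n >= 100) {
        dest += ones[n / 100];
        dest += " hundred";
        n %= 100;
        if (n != 0) dest += ' ';
    }
    if (n >= 20) {
        dest += tens[n / 10];
        if (n % 10 != 0) { dest += '-'; dest += ones[n % 10]; }
    } else if (n != 0) {
        dest += ones[n];
    }
}

// Replaces the contents of dest with the English cardinal for val.
inline void convert_cardinal(std::string& dest, std::uintmax_t val)
{
    // Assumption of this sketch: seven base-1000 groups are enough.
    static_assert(std::numeric_limits<std::uintmax_t>::digits <= 64,
                  "sketch only names groups up to quintillion");
    static const char* const groups[] = {
        "", " thousand", " million", " billion", " trillion",
        " quadrillion", " quintillion" };

    dest.clear();
    if (val == 0) { dest = "zero"; return; }

    // Split val into base-1000 groups, least significant first.
    unsigned parts[7] = { 0 };
    int count = 0;
    while (val != 0) {
        parts[count++] = static_cast<unsigned>(val % 1000);
        val /= 1000;
    }

    // Emit the nonzero groups, most significant first.
    for (int i = count - 1; i >= 0; --i) {
        if (parts[i] == 0) continue;
        if (!dest.empty()) dest += ' ';
        append_below_thousand(dest, parts[i]);
        dest += groups[i];
    }
}

} // namespace sketch
```

<p>A caller can reuse one string across many conversions, which is the point of
passing the destination by reference.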

<p>The second argument, the number to convert, can be any unsigned integer
that the C++ implementation can handle.  Overloading on the integer type
would gain nothing noticeable in efficiency, since the time it takes to
generate the text would certainly swamp whatever it takes to promote the
value to <tt>uintmax_t</tt>.

<p>The optional third argument specifies the inflection.  It defaults to
producing uninflected cardinal numbers.

<p>Two additional overloads on the second argument would serve to extend
the range: one taking some user-defined type with integer semantics
(presumably a bignum of some sort), and one taking a
<tt>basic_string&lt;&gt;</tt> of digits.  Neither overload is proposed
by this paper, but one or both could be considered for a TR.
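<p>For the string-of-digits idea, the first step would presumably be to cut the
digits into groups of three.  The sketch below shows one way to do that; the
namespace <tt>sketch</tt> and the function <tt>split_groups()</tt> are hypothetical
and are offered only as an illustration, not as a proposed interface.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {  // hypothetical namespace, not proposed wording

// Splits a non-empty string of decimal digits into base-1000 groups,
// most significant group first.
inline std::vector<unsigned> split_groups(const std::string& digits)
{
    assert(!digits.empty());
    std::vector<unsigned> groups;

    // The leading group holds the one to three digits left over after
    // the rest of the string is divided into complete triples.
    std::size_t len = digits.size() % 3;
    if (len == 0) len = 3;

    for (std::size_t pos = 0; pos < digits.size(); pos += len, len = 3) {
        unsigned value = 0;
        for (std::size_t k = 0; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(digits[pos + k]);
            assert(std::isdigit(c));  // only decimal digits are accepted
            value = value * 10 + static_cast<unsigned>(c - '0');
        }
        groups.push_back(value);
    }
    return groups;
}

} // namespace sketch
```

<p>Each group could then be spelled out like any value below 1000 and followed by
its group name, so the range would be limited only by the supply of group names.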

<p><hr size=5>

<h2>More detail on the current design</h2>

<pre>
namespace std {

class numtext_base {
public:
    enum inflection {
        none          = 0x0000,

        number        = 0x0003,
        singular      = 0x0000,
        dual          = 0x0001,
        plural        = 0x0002,
     //               = 0x0003,

        gender        = 0x000C,
        common        = 0x0000,
        masculine     = 0x0004,
        feminine      = 0x0008,
        neuter        = 0x000C,

        lexcase       = 0x00F0,
        nominative    = 0x0000,
        genitive      = 0x0010,
        dative        = 0x0020,
        accusative    = 0x0030,
        ablative      = 0x0040,
        vocative      = 0x0050,
        locative      = 0x0060,
        ergative      = 0x0070,
        absolutive    = 0x0080,
        direct        = 0x0090,
        instrumental  = 0x00A0,
        prepositional = 0x00B0,
     //               = 0x00C0,
     //               = 0x00D0,
     //               = 0x00E0,
     //               = 0x00F0,

        strength      = 0x0300,  // Sometimes, adjectives can be declined
        strong        = 0x0000,  // more weakly if there&rsquo;s another word in
        mixed         = 0x0100,  // the phrase, like a definite article,
        weak          = 0x0200,  // that provides the information.
     //               = 0x0300,

        scale         = 0x0C00,  // In English, is 10**9
        amer          = 0x0000,  // a billion,
        euro          = 0x0400,  // a milliard,
        olduk         = 0x0800,  // or a thousand million?
     //               = 0x0C00,

     // bits 0x7000 unused so far

        cardinal      = 0x0000,
        ordinal       = 0x8000,
    };
};

//
// The facet itself is unsurprising:
//
template&lt;class charT&gt;
class numtext : public locale::facet, public numtext_base {
public:
    typedef charT char_type;
    typedef basic_string&lt;charT&gt; string_type;

    explicit numtext(size_t refs = 0) : locale::facet(refs) { }

    void convert(string_type&amp; dest, uintmax_t val, inflection i = none) const
    {
        do_convert(dest, val, i);
    }

    static locale::id id;

protected:
    ~numtext() { }
    virtual void do_convert(string_type&amp;, uintmax_t, inflection) const = 0;
};

} // namespace std
</pre>

(You can download the current implementation&rsquo;s source code
as <a href="http://www.stdbill.com/numtext/numtext_demo.tar.gz">.tar.gz</a>
or <a href="http://www.stdbill.com/numtext/numtext_demo.zip">.zip</a>.
For the purposes of the demo, the leaf classes have <tt>public</tt>,
<tt>static</tt> <nobr><tt>do_do_convert()</tt></nobr> functions that
the CGI can call directly instead of having to first set a locale
and then use the facet.)
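<p>For comparison, the ordinary route through a locale might look like the
following sketch.  The facet is restated in a hypothetical namespace
<tt>sketch</tt> so that the example stands alone; the leaf class
<tt>digit_numtext</tt>, which handles only 0 through 9, and the helper
<tt>spell()</tt> are assumptions made for the example and are not taken from the
demo source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <locale>
#include <string>

namespace sketch {  // hypothetical namespace, not proposed wording

class numtext_base {
public:
    enum inflection { none = 0x0000, cardinal = 0x0000, ordinal = 0x8000 };
};

template<class charT>
class numtext : public std::locale::facet, public numtext_base {
public:
    typedef std::basic_string<charT> string_type;

    explicit numtext(std::size_t refs = 0) : std::locale::facet(refs) { }

    void convert(string_type& dest, std::uintmax_t val, inflection i = none) const
    {
        do_convert(dest, val, i);
    }

    static std::locale::id id;

protected:
    ~numtext() { }
    virtual void do_convert(string_type&, std::uintmax_t, inflection) const = 0;
};

// A static data member of a class template needs its own definition.
template<class charT> std::locale::id numtext<charT>::id;

// Hypothetical leaf class that knows only the values 0 through 9.
class digit_numtext : public numtext<char> {
public:
    explicit digit_numtext(std::size_t refs = 0) : numtext<char>(refs) { }

protected:
    virtual void do_convert(string_type& dest, std::uintmax_t val, inflection i) const
    {
        static const char* const card[] = {
            "zero", "one", "two", "three", "four",
            "five", "six", "seven", "eight", "nine" };
        static const char* const ord[] = {
            "zeroth", "first", "second", "third", "fourth",
            "fifth", "sixth", "seventh", "eighth", "ninth" };
        assert(val <= 9);  // limitation of this sketch only
        dest = (i & ordinal) ? ord[val] : card[val];
    }
};

// Installs the facet in a copy of the classic locale, then converts val.
inline std::string spell(std::uintmax_t val,
                         numtext_base::inflection i = numtext_base::none)
{
    // The locale takes ownership of the facet because refs is 0.
    const std::locale loc(std::locale::classic(), new digit_numtext);
    std::string text;
    std::use_facet<numtext<char> >(loc).convert(text, val, i);
    return text;
}

} // namespace sketch
```

<p>Because <tt>digit_numtext</tt> inherits <tt>numtext&lt;char&gt;::id</tt>, the
locale files it under the base facet, and
<tt>use_facet&lt;numtext&lt;char&gt;&nbsp;&gt;</tt> finds it.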

<p><hr size=5>
All suggestions and corrections will be welcome; all flames will be amusing.
<br>Mail to stdbill.h@pobox.com

</body>
</html>
