Numerics

Like chlit, strlit and the other primitives, the numeric parsers are also primitives. They are given a section of their own to provide a better focus on this important building block. The framework includes a couple of predefined objects for parsing signed and unsigned integers and real numbers. These parsers are fully parametric: most of the important aspects of numeric parsing can be finely adjusted, including the radix base, the minimum and maximum number of allowable digits, the exponent and the fraction. Policies control the real number parsers' behavior. There are some predefined policies covering the most common real number formats, but the user can supply their own when needed.

uint_parser

This class is the simplest among the members of the numerics package. The uint_parser can parse unsigned integers of arbitrary length and size. It can be used to parse ordinary primitive C/C++ integers or even user defined scalars such as bigints (unlimited precision integers). Like most of the classes in Spirit, the uint_parser is a template class whose template parameters fine-tune its behavior. The uint_parser is so flexible that the other numeric parsers are implemented using it as the backbone.

    template 
    <
        typename T = unsigned,
        int Radix = 10,
        unsigned MinDigits = 1,
        int MaxDigits = -1
    >
    struct uint_parser { /*...*/ };

uint_parser template parameters
T	The underlying type of the numeric parser. Defaults to `unsigned int`
Radix	The radix base. This can be one of 2 (binary), 8 (octal), 10 (decimal) or 16 (hexadecimal). Defaults to 10 (decimal)
MinDigits	The minimum number of digits allowable
MaxDigits	The maximum number of digits allowable. If this is -1, then the maximum limit becomes unbounded

Predefined uint_parsers
bin_p	`uint_parser<unsigned, 2, 1, -1> const`
oct_p	`uint_parser<unsigned, 8, 1, -1> const`
uint_p	`uint_parser<unsigned, 10, 1, -1> const`
hex_p	`uint_parser<unsigned, 16, 1, -1> const`
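To make the roles of Radix, MinDigits and MaxDigits concrete, here is a minimal standalone sketch that does not use Spirit at all. The function name sketch_parse_uint and its string-plus-position interface are assumptions made for illustration; the real uint_parser works on a scanner and returns a match object. Overflow is not checked here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for uint_parser (not Spirit code). Starting at
// in[pos], consumes between MinDigits and MaxDigits digits in base Radix.
// MaxDigits == -1 means there is no upper bound on the digit count.
// On success, stores the value in out, advances pos and returns true.
template <typename T = unsigned, int Radix = 10,
          unsigned MinDigits = 1, int MaxDigits = -1>
bool sketch_parse_uint(const std::string& in, std::size_t& pos, T& out)
{
    static_assert(Radix == 2 || Radix == 8 || Radix == 10 || Radix == 16,
                  "unsupported radix");
    T value = 0;
    std::size_t count = 0;
    while (pos + count < in.size() &&
           (MaxDigits < 0 || count < static_cast<std::size_t>(MaxDigits)))
    {
        const char c = in[pos + count];
        int digit;
        if (c >= '0' && c <= '9')      digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else break;
        if (digit >= Radix) break;          // not a digit in this base
        value = static_cast<T>(value * Radix + digit);
        ++count;
    }
    if (count < MinDigits) return false;    // too few digits: no match
    pos += count;                           // consume the matched digits
    out = value;
    return true;
}
```

With these parameters, sketch_parse_uint<unsigned, 16> plays the role of hex_p, while sketch_parse_uint<unsigned, 10, 3, 3> accepts exactly three decimal digits and leaves any further input unconsumed.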

The following example shows how the uint_parser can be used to parse thousand separated numbers. The example can correctly parse numbers such as 1,234,567,890.

    uint_parser<unsigned, 10, 1, 3> uint3_p;        //  1..3 digits
    uint_parser<unsigned, 10, 3, 3> uint3_3_p;      //  exactly 3 digits
    ts_num_p = (uint3_p >> *(',' >> uint3_3_p));    //  our thousand separated number parser
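For readers who want to see which inputs this expression accepts, the following standalone sketch hand-codes the same shape: one to three digits, followed by any number of groups made of a comma and exactly three digits. The names count_digits and match_ts_num are hypothetical, and the function only reports the matched length rather than producing a Spirit match object.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Counts consecutive decimal digits starting at s[pos], up to max_digits.
static std::size_t count_digits(const std::string& s, std::size_t pos,
                                std::size_t max_digits)
{
    std::size_t n = 0;
    while (n < max_digits && pos + n < s.size() &&
           std::isdigit(static_cast<unsigned char>(s[pos + n])))
        ++n;
    return n;
}

// Hand-written counterpart of  uint3_p >> *(',' >> uint3_3_p).
// Returns the number of characters matched from the start of s,
// or 0 if there is no match at all.
std::size_t match_ts_num(const std::string& s)
{
    std::size_t pos = count_digits(s, 0, 3);    // uint3_p: 1..3 digits
    if (pos == 0) return 0;
    // *(',' >> uint3_3_p): each repetition is a comma plus exactly 3 digits
    while (pos < s.size() && s[pos] == ',' &&
           count_digits(s, pos + 1, 3) == 3)
        pos += 4;
    return pos;
}
```

As with the Spirit expression, a trailing group that is not exactly three digits long is simply left unconsumed: for the input 1,23 only the leading 1 is matched.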

bin_p, oct_p, uint_p and hex_p are parser generator objects designed to be used within expressions. Here's an example of a rule that parses a comma-delimited list of numbers (we've seen this before):

    list_of_numbers = real_p >> *(',' >> real_p);

Later, we shall see how we can extract the actual numbers parsed by the numeric parsers. We shall deal with this when we get to the section on specialized actions.

int_parser

The int_parser can parse signed integers of arbitrary length and size. This is almost the same as the uint_parser. The only difference is the additional task of parsing the '+' or '-' sign preceding the number. The class interface is the same as that of the uint_parser.

A predefined int_parser
int_p	`int_parser<int, 10, 1, -1> const`

real_parser

The real_parser can parse real numbers of arbitrary length and size limited by its underlying parametric type T. The real_parser is a template class with 2 template parameters. Here's the real_parser template interface:

    template
    <
        typename T = double,
        typename RealPoliciesT = ureal_parser_policies<T>
    >
    struct real_parser;

The first template parameter is its underlying type T. This defaults to double.

Parsing special numeric types

Notice that T can be specified by the user. This is the underlying data type of the parser. This implies that we can use the numeric parsers to parse user defined numeric types such as fixed_point (fixed point reals) and bigint (unlimited precision integers).

The second template parameter is a class that groups the policies; it defaults to ureal_parser_policies<T>. As already mentioned, policies control the real number parsers' behavior. The default policies take care of the most common case (there are many ways to represent, and hence parse, real numbers). In most cases, the default setting of the real_parser is sufficient and can be used straight out of the box. In fact, there are two real_parsers predefined for immediate use:

Predefined real_parsers
ureal_p	`real_parser<double, ureal_parser_policies<double> > const`
real_p	`real_parser<double, real_parser_policies<double> > const`

We've seen real_p before. ureal_p is its unsigned variant.

The default policies provided are designed to parse C/C++ style real numbers of the form nnn.fffEeee where nnn is the whole number part, fff is the fractional part, E is 'e' or 'E' and eee is the exponent, optionally preceded by '-' or '+'. This corresponds to the following grammar:

    floatingliteral
        =   fractionalconstant >> !exponentpart
        |  +digit_p >> exponentpart
        ;

    fractionalconstant
        =  *digit_p >> '.' >> +digit_p
        |  +digit_p >> '.'
        ;

    exponentpart
        =   ('e' | 'E') >> !('+' | '-') >> +digit_p
        ;
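The grammar above can be checked by hand with a small recursive-descent recognizer. The sketch below is standalone C++ that follows the three rules literally; the function names are hypothetical and nothing here is Spirit code. Note that it recognizes only what the grammar states, so a plain integer such as 42 (no dot, no exponent) is rejected.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Advances pos past a run of decimal digits; returns how many were consumed.
std::size_t sketch_digits(const std::string& s, std::size_t& pos)
{
    const std::size_t start = pos;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos])))
        ++pos;
    return pos - start;
}

// exponentpart = ('e' | 'E') >> !('+' | '-') >> +digit_p
bool sketch_exponent_part(const std::string& s, std::size_t& pos)
{
    std::size_t p = pos;
    if (p >= s.size() || (s[p] != 'e' && s[p] != 'E')) return false;
    ++p;
    if (p < s.size() && (s[p] == '+' || s[p] == '-')) ++p;  // optional sign
    if (sketch_digits(s, p) == 0) return false;
    pos = p;                                 // commit only on success
    return true;
}

// fractionalconstant = *digit_p >> '.' >> +digit_p  |  +digit_p >> '.'
bool sketch_fractional_constant(const std::string& s, std::size_t& pos)
{
    std::size_t p = pos;
    const std::size_t whole = sketch_digits(s, p);
    if (p >= s.size() || s[p] != '.') return false;
    ++p;
    const std::size_t frac = sketch_digits(s, p);
    if (whole == 0 && frac == 0) return false;  // a lone "." is not a number
    pos = p;
    return true;
}

// floatingliteral = fractionalconstant >> !exponentpart
//                 | +digit_p >> exponentpart
// Returns true only if the whole string is a floating literal.
bool sketch_is_floating_literal(const std::string& s)
{
    std::size_t pos = 0;
    if (sketch_fractional_constant(s, pos))
    {
        sketch_exponent_part(s, pos);        // optional, result ignored
        return pos == s.size();
    }
    pos = 0;                                 // backtrack to second alternative
    return sketch_digits(s, pos) > 0 && sketch_exponent_part(s, pos)
        && pos == s.size();
}
```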

Advanced: real_parser policies

The parser policies break down real number parsing into 6 steps:

1	parse_sign	Parse the prefix sign
2	parse_n	Parse the integer at the left of the decimal point
3	parse_dot	Parse the decimal point
4	parse_frac_n	Parse the fraction after the decimal point
5	parse_exp	Parse the exponent prefix (e.g. 'e')
6	parse_exp_n	Parse the actual exponent

[ From here on, required reading: The Scanner, In-depth The Parser and In-depth The Scanner ]

sign_parser [ sign_p ]

Before we move on, a small utility parser is included here to ease the parsing of the '-' or '+' sign. While it is easy to write one:

    sign_p = (ch_p('+') | '-');

it is not possible to extract the actual sign (positive or negative) without resorting to semantic actions. The sign_p parser has a bool attribute, returned to the caller through the match object, which after parsing is set to true if the parsed sign is negative. This attribute can be used to detect whether a negative sign has been parsed. Examples:

    bool is_negative;
    r = sign_p[assign(is_negative)];

or simply...

    // directly extract the result from the match result's value
    bool is_negative = sign_p.parse(scan).value();

The sign_p parser expects attached semantic actions to have a signature (see Specialized Actions for further detail) compatible with:

Signature for functions:

    void func(bool is_negative);

Signature for functors:

    struct ftor
    {
        void operator()(bool is_negative) const;
    };

ureal_parser_policies

    template <typename T>
    struct ureal_parser_policies
    {
        typedef uint_parser<T, 10, 1, -1>   uint_parser_t;
        typedef int_parser<T, 10, 1, -1>    int_parser_t;

        template <typename ScannerT>
        static typename match_result<ScannerT, nil_t>::type
        parse_sign(ScannerT& scan)
        { return scan.no_match(); }

        template <typename ScannerT>
        static typename parser_result<uint_parser_t, ScannerT>::type
        parse_n(ScannerT& scan)
        { return uint_parser_t().parse(scan); }

        template <typename ScannerT>
        static typename parser_result<chlit<>, ScannerT>::type
        parse_dot(ScannerT& scan)
        { return ch_p('.').parse(scan); }

        template <typename ScannerT>
        static typename parser_result<uint_parser_t, ScannerT>::type
        parse_frac_n(ScannerT& scan)
        { return uint_parser_t().parse(scan); }

        template <typename ScannerT>
        static typename parser_result<chlit<>, ScannerT>::type
        parse_exp(ScannerT& scan)
        { return nocase_d['e'].parse(scan); }

        template <typename ScannerT>
        static typename parser_result<int_parser_t, ScannerT>::type
        parse_exp_n(ScannerT& scan)
        { return int_parser_t().parse(scan); }
    };

The default ureal_parser_policies uses the lower level integer numeric parsers to do its job.

real_parser_policies

    template <typename T>
    struct real_parser_policies : public ureal_parser_policies<T>
    {
        template <typename ScannerT>
        static typename parser_result<sign_parser, ScannerT>::type
        parse_sign(ScannerT& scan)
        { return sign_p.parse(scan); }
    };

Notice how the real_parser_policies has replaced parse_sign of the ureal_parser_policies from which it is derived. The default real_parser_policies simply uses a sign_p instead of scan.no_match() in the parse_sign step.
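The mechanism at work is ordinary C++ name hiding: a static member function declared in the derived policy class hides the one inherited from the base, and the parser template picks whichever the supplied policy class exposes, at compile time. Here is a minimal standalone sketch of that pattern; all names prefixed with sketch_ are hypothetical, and the string-based interface is an assumption that replaces Spirit's scanner and match types.

```cpp
#include <cstddef>
#include <string>

// Base policies: unsigned numbers never consume a sign.
// parse_sign returns the number of characters consumed.
struct sketch_unsigned_policies
{
    static std::size_t parse_sign(const std::string&, std::size_t,
                                  bool& negative)
    { negative = false; return 0; }
};

// Derived policies: hide the inherited parse_sign with one that
// accepts a single optional '+' or '-'.
struct sketch_signed_policies : sketch_unsigned_policies
{
    static std::size_t parse_sign(const std::string& s, std::size_t pos,
                                  bool& negative)
    {
        negative = false;
        if (pos < s.size() && (s[pos] == '+' || s[pos] == '-'))
        {
            negative = (s[pos] == '-');
            return 1;
        }
        return 0;
    }
};

// The parser only ever calls Policies::parse_sign; which version runs
// is decided by the policy class passed as the template argument.
template <typename Policies>
bool sketch_parse_int(const std::string& s, long& out)
{
    bool negative = false;
    std::size_t pos = Policies::parse_sign(s, 0, negative);
    const std::size_t start = pos;
    long value = 0;
    while (pos < s.size() && s[pos] >= '0' && s[pos] <= '9')
        value = value * 10 + (s[pos++] - '0');
    if (pos == start || pos != s.size()) return false;  // digits, whole input
    out = negative ? -value : value;
    return true;
}
```

With the signed policies the input -42 parses to -42; with the unsigned policies the same input is rejected, because the sign is never consumed and the digit loop stops immediately.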

Other "specialized" real parser policies can reuse these defaults. One or more of these policies may be replaced by the client. For example, here's a real number parser that parses thousands separated numbers with at most two decimal places and no exponent:

    template <typename T>
    struct ts_real_parser_policies : public ureal_parser_policies<T> 
    {
        //  These policies can be used to parse thousand separated
        //  numbers with at most 2 decimal digits after the decimal
        //  point. e.g. 123,456,789.01

        typedef uint_parser<int, 10, 1, 2>  uint2_t;
        typedef uint_parser<T, 10, 1, -1>   uint_parser_t;
        typedef int_parser<int, 10, 1, -1>  int_parser_t;

        //////////////////////////////////  2 decimal places Max
        template <typename ScannerT>
        static typename parser_result<uint2_t, ScannerT>::type
        parse_frac_n(ScannerT& scan)
        { return uint2_t().parse(scan); }

        //////////////////////////////////
        template <typename ScannerT>
        static typename parser_result<chlit<>, ScannerT>::type
        parse_exp(ScannerT& scan)
        { return scan.no_match(); }

        //////////////////////////////////
        template <typename ScannerT>
        static typename parser_result<int_parser_t, ScannerT>::type
        parse_exp_n(ScannerT& scan)
        { return scan.no_match(); }

        //////////////////////////////////  Thousands separated numbers
        template <typename ScannerT>
        static typename parser_result<uint_parser_t, ScannerT>::type
        parse_n(ScannerT& scan)
        {
            typedef typename parser_result<uint_parser_t, ScannerT>::type RT;
            uint_parser<unsigned, 10, 1, 3> uint3_p;
            uint_parser<unsigned, 10, 3, 3> uint3_3_p;

            if (RT hit = uint3_p.parse(scan))
            {
                T n;
                while (match<> next = (',' >> uint3_3_p[assign(n)]).parse(scan))
                {
                    hit.value() *= 1000;
                    hit.value() += n;
                    scan.concat_match(hit, next);
                }
                return hit;
            }
            return scan.no_match();
        }
    };

    // ts_real_p, our thousand separated numeric parser
    real_parser<double, ts_real_parser_policies<double> > const
        ts_real_p = real_parser<double, ts_real_parser_policies<double> >();

Copyright © 1998-2002 Joel de Guzman

Permission to copy, use, modify, sell and distribute this document is granted provided this copyright notice appears in all copies. This document is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.