Re: [PATCH] util/cutils: Expand do_strtosz parsing precision to 64 bits

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Markus Armbruster <armbru@redhat.com>
To: Tao Xu <tao3.xu@intel.com>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"ehabkost@redhat.com" <ehabkost@redhat.com>,
	"mdroth@linux.vnet.ibm.com" <mdroth@linux.vnet.ibm.com>
Subject: Re: [PATCH] util/cutils: Expand do_strtosz parsing precision to 64 bits
Date: Tue, 17 Dec 2019 11:25:39 +0100	[thread overview]
Message-ID: <87y2vbgsf0.fsf@dusky.pond.sub.org> (raw)
In-Reply-To: <b7c442e3-cc7e-155e-5370-db9a371928a6@intel.com> (Tao Xu's message of "Mon, 9 Dec 2019 13:38:39 +0800")

Tao Xu <tao3.xu@intel.com> writes:

> On 12/5/19 11:29 PM, Markus Armbruster wrote:
>> Tao Xu <tao3.xu@intel.com> writes:
>>
>>> Parse input string both as a double and as a uint64_t, then use the
>>> method which consumes more characters. Update the related test cases.
>>>
>>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>>> ---
>> [...]
>>> diff --git a/util/cutils.c b/util/cutils.c
>>> index 77acadc70a..b08058c57c 100644
>>> --- a/util/cutils.c
>>> +++ b/util/cutils.c
>>> @@ -212,24 +212,43 @@ static int do_strtosz(const char *nptr, const char **end,
>>>                         const char default_suffix, int64_t unit,
>>>                         uint64_t *result)
>>>   {
>>> -    int retval;
>>> -    const char *endptr;
>>> +    int retval, retd, retu;
>>> +    const char *suffix, *suffixd, *suffixu;
>>>       unsigned char c;
>>>       int mul_required = 0;
>>> -    double val, mul, integral, fraction;
>>> +    bool use_strtod;
>>> +    uint64_t valu;
>>> +    double vald, mul, integral, fraction;
>>
>> Note for later: @mul is double.
>>
>>> +
>>> +    retd = qemu_strtod_finite(nptr, &suffixd, &vald);
>>> +    retu = qemu_strtou64(nptr, &suffixu, 0, &valu);

Note for later: passing 0 to base accepts octal and hexadecimal
integers.

>>> +    use_strtod = strlen(suffixd) < strlen(suffixu);
>>> +
>>> +    /*
>>> +     * Parse @nptr both as a double and as a uint64_t, then use the method
>>> +     * which consumes more characters.
>>> +     */
>>
>> The comment is in a funny place.  I'd put it right before the
>> qemu_strtod_finite() line.
>>
>>> +    if (use_strtod) {
>>> +        suffix = suffixd;
>>> +        retval = retd;
>>> +    } else {
>>> +        suffix = suffixu;
>>> +        retval = retu;
>>> +    }
>>>   -    retval = qemu_strtod_finite(nptr, &endptr, &val);
>>>       if (retval) {
>>>           goto out;
>>>       }
>>
>> This is even more subtle than it looks.
>>
>> A close reading of the function contracts leads to three cases for each
>> conversion:
>>
>> * parse error (including infinity and NaN)
>>
>>    @retu / @retd is -EINVAL
>>    @valu / @vald is uninitialized
>>    @suffixu / @suffixd is @nptr
>>
>> * range error
>>
>>    @retu / @retd is -ERANGE
>>    @valu / @vald is our best approximation of the conversion result
>>    @suffixu / @suffixd points to the first character not consumed by the
>>    conversion.
>>
>>    Sub-cases:
>>
>>    - uint64_t overflow
>>
>>      We know the conversion result exceeds UINT64_MAX.
>>
>>    - double overflow
>>
>>      we know the conversion result's magnitude exceeds the largest
>>      representable finite double DBL_MAX.
>>
>>    - double underflow
>>
>>      we know the conversion result is close to zero (closer than DBL_MIN,
>>      the smallest normalized positive double).
>>
>> * success
>>
>>    @retu / @retd is 0
>>    @valu / @vald is the conversion result
>>    @suffixu / @suffixd points to the first character not consumed by the
>>    conversion.
>>
>> This leads to a matrix (parse error, uint64_t overflow, success) x
>> (parse error, double overflow, double underflow, success).  We need to
>> check the code does what we want for each element of this matrix, and
>> document any behavior that's not perfectly obvious.
>>
>> (success, success): we pick uint64_t if qemu_strtou64() consumed more
>> characters than qemu_strtod_finite(), else double.  "More" is important
>> here; when they consume the same characters, we *need* to use the
>> uint64_t result.  Example: for "18446744073709551615", we need to use
>> uint64_t 18446744073709551615, not double 18446744073709551616.0.  But
>> for "18446744073709551616.", we need to use the double.  Good.

Also fun: for "0123", we use uint64_t 83, not double 123.0.  But for
"0123.", we use 123.0, not 83.

Do we really want to accept octal and hexadecimal integers?

>> (success, parse error) and (parse error, success): we pick the one that
>> succeeds, because success consumes characters, and failure to parse does
>> not.  Good.
>>
>> (parse error, parse error): neither consumes characters, so we pick
>> uint64_t.  Good.
>>
>> (parse error, double overflow), (parse error, double underflow) and
>> (uint64_t overflow, parse error): we pick the range error, because it
>> consumes characters.  Good.
>>
>> These are the simple combinations.  The remainder are hairier: (success,
>> double overflow), (success, double underflow), (uint64_t overflow,
>> success).  I lack the time to analyze them today.  Must be done before
>> we take this patch.  Any takers?
>
> (success, double overflow), (success, double underflow), pick double
> overflow error, return -ERANGE. Because it consumes
> characters. Example: for "1.79769e+309", qemu_strtou64 consumes "1",
> and prases as uint64_t; but qemu_strtod_finite return -ERANGE and
> consumes all characters. It is OK.

The only way to have double overflow when uint64_t succeeds is an
exponent.  Double consumes the characters making up the exponent,
uint64_t does not.  We use double.

The only way to have double underflow is with an exponent or a decimal
point.  Double consumes their characters, uint64_t does not.  We use
double.

Okay.

> (uint64_t overflow, success), consume the same characters, use the
> uint64_t return -ERANGE. Note that even if qemu_strtod_finite can
> parse these cases such as "18446744073709551617", but the result is
> uint64_t so we also need to return -ERANGE. It is OK.

That's just one of two cases, I think.  The other one is when the
overflowing integer is followed by an exponent or decimal point.  We use
double then.  Converting the double to uint64_t overflows, except when a
negative exponent brings the number into range.

Examples: "18446744073709551617" picks uint64_t overflow,
"18446744073709551617.0" picks double success (but converting it to
uint64_t below overflows), and "18446744073709551617e-10" picks double
success (converted to 1844674407 below).

Okay.

> Thank you for your analysis and suggestion. I will add more test cases
> to cover some of these analysis.

Good move.

next prev parent reply	other threads:[~2019-12-17 10:26 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-05  2:14 [PATCH] util/cutils: Expand do_strtosz parsing precision to 64 bits Tao Xu
2019-12-05 15:29 ` Markus Armbruster
2019-12-09  5:38   ` Tao Xu
2019-12-17 10:25     ` Markus Armbruster [this message]
2019-12-18  1:33       ` Tao Xu
2019-12-18  5:26         ` Tao Xu
2019-12-18 18:26           ` Markus Armbruster
2019-12-19  7:43             ` Tao Xu
2019-12-19 10:15               ` Markus Armbruster
2019-12-18 21:49         ` Eric Blake
2019-12-17 12:04   ` Christophe de Dinechin
2019-12-17 14:08     ` Markus Armbruster
2019-12-17 14:12       ` Christophe de Dinechin
2019-12-17 15:01         ` Markus Armbruster
2019-12-18  2:29           ` Tao Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y2vbgsf0.fsf@dusky.pond.sub.org \
    --to=armbru@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=mdroth@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=tao3.xu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.