Re: [PATCH 1/3] utils: Improve qemu_strtosz() to have 64 bits of precision

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Eric Blake <eblake@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	vsementsov@virtuozzo.com, qemu-block@nongnu.org,
	rjones@redhat.com, tao3.xu@intel.com, armbru@redhat.com,
	qemu-devel@nongnu.org, Max Reitz <mreitz@redhat.com>
Subject: Re: [PATCH 1/3] utils: Improve qemu_strtosz() to have 64 bits of precision
Date: Fri, 5 Feb 2021 11:34:29 +0000	[thread overview]
Message-ID: <20210205113429.GG908621@redhat.com> (raw)
In-Reply-To: <20210204190708.1306296-2-eblake@redhat.com>

On Thu, Feb 04, 2021 at 01:07:06PM -0600, Eric Blake wrote:
> We have multiple clients of qemu_strtosz (qemu-io, the opts visitor,
> the keyval visitor), and it gets annoying that edge-case testing is
> impacted by implicit rounding to 53 bits of precision due to parsing
> with strtod().  As an example posted by Rich Jones:
>  $ nbdkit memory $(( 2**63 - 2**30 )) --run \
>    'build/qemu-io -f raw "$uri" -c "w -P 3 $(( 2**63 - 2**30 - 512 )) 512" '
>  write failed: Input/output error
> 
> because 9223372035781033472 got rounded to 0x7fffffffc0000000 which is
> out of bounds.
> 
> It is also worth noting that our existing parser, by virtue of using
> strtod(), accepts decimal AND hex numbers, even though test-cutils
> previously lacked any coverage of the latter.  We do have existing
> clients that expect a hex parse to work (for example, iotest 33 using
> qemu-io -c "write -P 0xa 0x200 0x400"), but strtod() parses "08" as 8
> rather than as an invalid octal number, so we know there are no
> clients that depend on octal.  Our use of strtod() also means that
> "0x1.8k" would actually parse as 1536 (the fraction is 8/16), rather
> than 1843 (if the fraction were 8/10); but as this was not covered in
> the testsuite, I have no qualms forbidding hex fractions as invalid,
> so this patch declares that the use of fractions is only supported
> with decimal input, and enhances the testsuite to document that.
> 
> Our previous use of strtod() meant that -1 parsed as a negative; now
> that we parse with strtoull(), negative values can wrap around module
> 2^64, so we have to explicitly check whether the user passed in a '-'.
> 
> We also had no testsuite coverage of "1.1e0k", which happened to parse
> under strtod() but is unlikely to occur in practice; as long as we are
> making things more robust, it is easy enough to reject the use of
> exponents in a strtod parse.
> 
> The fix is done by breaking the parse into an integer prefix (no loss
> in precision), rejecting negative values (since we can no longer rely
> on strtod() to do that), determining if a decimal or hexadecimal parse
> was intended (with the new restriction that a fractional hex parse is
> not allowed), and where appropriate, using a floating point fractional
> parse (where we also scan to reject use of exponents in the fraction).
> The bulk of the patch is then updates to the testsuite to match our
> new precision, as well as adding new cases we reject (whether they
> were rejected or inadvertenly accepted before).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> 
> Note that this approach differs from what has been attempted in the
> past; see the thread starting at
> https://lists.gnu.org/archive/html/qemu-devel/2019-12/msg00852.html
> That approach tried to parse both as strtoull and strtod and take
> whichever was longer, but that was harder to document.
> ---
>  tests/test-cutils.c        | 79 ++++++++++++++++++++++++++++++--------
>  tests/test-keyval.c        | 24 +++++++++---
>  tests/test-qemu-opts.c     | 20 +++++++---
>  util/cutils.c              | 77 +++++++++++++++++++++++++++----------
>  tests/qemu-iotests/049.out |  7 +++-
>  5 files changed, 156 insertions(+), 51 deletions(-)
> 
> diff --git a/tests/test-cutils.c b/tests/test-cutils.c
> index 1aa8351520ae..0c2c89d6f113 100644
> --- a/tests/test-cutils.c
> +++ b/tests/test-cutils.c
> @@ -1960,6 +1960,13 @@ static void test_qemu_strtosz_simple(void)
>      g_assert_cmpint(res, ==, 0);
>      g_assert(endptr == str + 1);
> 
> +    /* Leading 0 gives decimal results, not octal */
> +    str = "08";
> +    err = qemu_strtosz(str, &endptr, &res);
> +    g_assert_cmpint(err, ==, 0);
> +    g_assert_cmpint(res, ==, 8);
> +    g_assert(endptr == str + 2);
> +
>      str = "12345";
>      err = qemu_strtosz(str, &endptr, &res);
>      g_assert_cmpint(err, ==, 0);
> @@ -1970,7 +1977,7 @@ static void test_qemu_strtosz_simple(void)
>      g_assert_cmpint(err, ==, 0);
>      g_assert_cmpint(res, ==, 12345);
> 
> -    /* Note: precision is 53 bits since we're parsing with strtod() */
> +    /* Note: If there is no '.', we get full 64 bits of precision */

IIUC, our goal is that we should never loose precision for the
non-fractional part. 

> 
>      str = "9007199254740991"; /* 2^53-1 */
>      err = qemu_strtosz(str, &endptr, &res);
> @@ -1987,7 +1994,7 @@ static void test_qemu_strtosz_simple(void)
>      str = "9007199254740993"; /* 2^53+1 */
>      err = qemu_strtosz(str, &endptr, &res);
>      g_assert_cmpint(err, ==, 0);
> -    g_assert_cmpint(res, ==, 0x20000000000000); /* rounded to 53 bits */
> +    g_assert_cmpint(res, ==, 0x20000000000001);
>      g_assert(endptr == str + 16);
> 
>      str = "18446744073709549568"; /* 0xfffffffffffff800 (53 msbs set) */
> @@ -1999,11 +2006,35 @@ static void test_qemu_strtosz_simple(void)
>      str = "18446744073709550591"; /* 0xfffffffffffffbff */
>      err = qemu_strtosz(str, &endptr, &res);
>      g_assert_cmpint(err, ==, 0);
> -    g_assert_cmpint(res, ==, 0xfffffffffffff800); /* rounded to 53 bits */
> +    g_assert_cmpint(res, ==, 0xfffffffffffffbff);
>      g_assert(endptr == str + 20);
> 
> -    /* 0x7ffffffffffffe00..0x7fffffffffffffff get rounded to
> -     * 0x8000000000000000, thus -ERANGE; see test_qemu_strtosz_erange() */
> +    str = "18446744073709551615"; /* 0xffffffffffffffff */
> +    err = qemu_strtosz(str, &endptr, &res);
> +    g_assert_cmpint(err, ==, 0);
> +    g_assert_cmpint(res, ==, 0xffffffffffffffff);
> +    g_assert(endptr == str + 20);
> +
> +}
> +
> +static void test_qemu_strtosz_hex(void)
> +{
> +    const char *str;
> +    const char *endptr;
> +    int err;
> +    uint64_t res = 0xbaadf00d;
> +
> +    str = "0x0";
> +    err = qemu_strtosz(str, &endptr, &res);
> +    g_assert_cmpint(err, ==, 0);
> +    g_assert_cmpint(res, ==, 0);
> +    g_assert(endptr == str + 3);
> +
> +    str = "0xa";
> +    err = qemu_strtosz(str, &endptr, &res);
> +    g_assert_cmpint(err, ==, 0);
> +    g_assert_cmpint(res, ==, 10);
> +    g_assert(endptr == str + 3);

I'd stick a  '0xab' or "0xae" in there, to demonstrate that we're
not accidentally applyin the scaling units for "b" and "e" in hex.

>  }
> 
>  static void test_qemu_strtosz_units(void)
> @@ -2106,6 +2137,21 @@ static void test_qemu_strtosz_invalid(void)
>      err = qemu_strtosz(str, &endptr, &res);
>      g_assert_cmpint(err, ==, -EINVAL);
>      g_assert(endptr == str);
> +
> +    str = "1.1e5";
> +    err = qemu_strtosz(str, &endptr, &res);
> +    g_assert_cmpint(err, ==, -EINVAL);
> +    g_assert(endptr == str);
> +
> +    str = "1.1B";
> +    err = qemu_strtosz(str, &endptr, &res);
> +    g_assert_cmpint(err, ==, -EINVAL);
> +    g_assert(endptr == str);
> +
> +    str = "0x1.8k";
> +    err = qemu_strtosz(str, &endptr, &res);
> +    g_assert_cmpint(err, ==, -EINVAL);
> +    g_assert(endptr == str);
>  }

A test case to reject negative numers ?

> @@ -253,40 +264,66 @@ static int do_strtosz(const char *nptr, const char **end,
>      int retval;
>      const char *endptr;
>      unsigned char c;
> -    int mul_required = 0;
> -    double val, mul, integral, fraction;
> +    bool mul_required = false;
> +    uint64_t val;
> +    int64_t mul;
> +    double fraction = 0.0;
> 
> -    retval = qemu_strtod_finite(nptr, &endptr, &val);
> +    /* Parse integral portion as decimal. */
> +    retval = qemu_strtou64(nptr, &endptr, 10, &val);
>      if (retval) {
>          goto out;
>      }
> -    fraction = modf(val, &integral);
> -    if (fraction != 0) {
> -        mul_required = 1;
> +    if (strchr(nptr, '-') != NULL) {
> +        retval = -ERANGE;
> +        goto out;
> +    }
> +    if (val == 0 && (*endptr == 'x' || *endptr == 'X')) {

This works as an approach but it might be more clear to
just check for hex upfront.

  if (g_str_has_prefix("0x", val) ||
      g_str_has_prefix("0X", val)) {
      ... do hex parse..
  } else {
      ... do dec parse..
  }
      

> +        /* Input looks like hex, reparse, and insist on no fraction. */
> +        retval = qemu_strtou64(nptr, &endptr, 16, &val);
> +        if (retval) {
> +            goto out;
> +        }
> +        if (*endptr == '.') {
> +            endptr = nptr;
> +            retval = -EINVAL;
> +            goto out;
> +        }
> +    } else if (*endptr == '.') {
> +        /* Input is fractional, insist on 0 <= fraction < 1, with no exponent */
> +        retval = qemu_strtod_finite(endptr, &endptr, &fraction);
> +        if (retval) {
> +            endptr = nptr;
> +            goto out;
> +        }
> +        if (fraction >= 1.0 || memchr(nptr, 'e', endptr - nptr)
> +            || memchr(nptr, 'E', endptr - nptr)) {

Can this be simplified ?  Fraction can be > 1, if-and only-if exponent
syntax is used IIUC.

Shouldn't we be passing 'endptr+1' as the first arg to memchr ?

nptr points to the leading decimal part, and we only need to
scan the fractional part.

Also what happens with   "1.1e" - that needs to be treated
as a exabyte suffix, and not rejected as an exponent. We
ought to test that if we don't already.

> +            endptr = nptr;
> +            retval = -EINVAL;
> +            goto out;
> +        }
> +        if (fraction != 0) {
> +            mul_required = true;
> +        }
>      }
>      c = *endptr;
>      mul = suffix_mul(c, unit);
> -    if (mul >= 0) {
> +    if (mul > 0) {
>          endptr++;
>      } else {
>          mul = suffix_mul(default_suffix, unit);
> -        assert(mul >= 0);
> +        assert(mul > 0);
>      }
>      if (mul == 1 && mul_required) {
> +        endptr = nptr;
>          retval = -EINVAL;
>          goto out;
>      }
> -    /*
> -     * Values near UINT64_MAX overflow to 2**64 when converting to double
> -     * precision.  Compare against the maximum representable double precision
> -     * value below 2**64, computed as "the next value after 2**64 (0x1p64) in
> -     * the direction of 0".
> -     */
> -    if ((val * mul > nextafter(0x1p64, 0)) || val < 0) {
> +    if (val > UINT64_MAX / mul) {
>          retval = -ERANGE;
>          goto out;
>      }
> -    *result = val * mul;
> +    *result = val * mul + (uint64_t) (fraction * mul);
>      retval = 0;
> 
>  out:

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

next prev parent reply	other threads:[~2021-02-05 11:36 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-04 19:07 [PATCH 0/3] Improve do_strtosz precision Eric Blake
2021-02-04 19:07 ` [PATCH 1/3] utils: Improve qemu_strtosz() to have 64 bits of precision Eric Blake
2021-02-04 20:12   ` Eric Blake
2021-02-05 10:06     ` Vladimir Sementsov-Ogievskiy
2021-02-05 10:18       ` Vladimir Sementsov-Ogievskiy
2021-02-05 14:06       ` Eric Blake
2021-02-05 14:10         ` Daniel P. Berrangé
2021-02-05 10:07   ` Vladimir Sementsov-Ogievskiy
2021-02-05 14:12     ` Eric Blake
2021-02-05 10:28   ` Richard W.M. Jones
2021-02-05 14:15     ` Eric Blake
2021-02-05 11:02   ` Daniel P. Berrangé
2021-02-05 14:27     ` Eric Blake
2021-02-05 14:36       ` Daniel P. Berrangé
2021-02-05 11:34   ` Daniel P. Berrangé [this message]
2021-02-05 14:36     ` Eric Blake
2021-02-04 19:07 ` [PATCH 2/3] utils: Deprecate hex-with-suffix sizes Eric Blake
2021-02-05 10:25   ` Vladimir Sementsov-Ogievskiy
2021-02-05 10:31     ` Richard W.M. Jones
2021-02-05 13:38     ` Eric Blake
2021-02-05 11:13   ` Daniel P. Berrangé
2021-02-05 13:40     ` Eric Blake
2021-02-05 14:02       ` Daniel P. Berrangé
2021-02-04 19:07 ` [PATCH 3/3] utils: Deprecate inexact fractional suffix sizes Eric Blake
2021-02-04 20:02   ` Eric Blake
2021-02-05 10:34   ` Richard W.M. Jones
2021-02-05 14:19     ` Eric Blake
2021-02-05 10:38   ` Vladimir Sementsov-Ogievskiy
2021-02-05 11:10   ` Daniel P. Berrangé
2021-02-05 14:28     ` Eric Blake
2021-02-05 14:40       ` Daniel P. Berrangé

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210205113429.GG908621@redhat.com \
    --to=berrange@redhat.com \
    --cc=armbru@redhat.com \
    --cc=eblake@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=rjones@redhat.com \
    --cc=tao3.xu@intel.com \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.