Re: [Qemu-devel] [PATCH 1/8] cutils: unsigned int parsing functions

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Markus Armbruster <armbru@redhat.com>
To: Eduardo Habkost <ehabkost@redhat.com>
Cc: Chegu Vinod <chegu_vinod@hp.com>,
	Laszlo Ersek <lersek@redhat.com>,
	qemu-devel@nongnu.org, Anthony Liguori <anthony@codemonkey.ws>
Subject: Re: [Qemu-devel] [PATCH 1/8] cutils: unsigned int parsing functions
Date: Fri, 18 Jan 2013 19:06:28 +0100	[thread overview]
Message-ID: <874nieqsbf.fsf@blackfin.pond.sub.org> (raw)
In-Reply-To: <20130118132646.GW10683@otherpad.lan.raisama.net> (Eduardo Habkost's message of "Fri, 18 Jan 2013 11:26:46 -0200")

Eduardo Habkost <ehabkost@redhat.com> writes:

> On Fri, Jan 18, 2013 at 11:01:14AM +0100, Markus Armbruster wrote:
>> Eduardo Habkost <ehabkost@redhat.com> writes:
>> 
>> > There are lots of duplicate parsing code using strto*() in QEMU, and
>> > most of that code is broken in one way or another. Even the visitors
>> > code have duplicate integer parsing code[1]. This introduces functions
>> > to help parsing unsigned int values: parse_uint() and parse_uint_full().
>> >
>> > Parsing functions for signed ints and floats will be submitted later.
>> >
>> > parse_uint_full() has all the checks made by opts_type_uint64() at
>> > opts-visitor.c:
>> >
>> >  - Check for NULL (returns -EINVAL)
>> >  - Check for negative numbers (returns -ERANGE)
>> >  - Check for empty string (returns -EINVAL)
>> >  - Check for overflow or other errno values set by strtoll() (returns
>> >    -errno)
>> >  - Check for end of string (reject invalid characters after number)
>> >    (returns -EINVAL)
>> >
>> > parse_uint() does everything above except checking for the end of the
>> > string, so callers can continue parsing the remainder of string after
>> > the number.
>> >
>> > Unit tests included.
>> >
>> > [1] string-input-visitor.c:parse_int() could use the same parsing code
>> >     used by opts-visitor.c:opts_type_int(), instead of duplicating that
>> >     logic.
>> [...]
>> > diff --git a/util/cutils.c b/util/cutils.c
>> > index 80bb1dc..7f09251 100644
>> > --- a/util/cutils.c
>> > +++ b/util/cutils.c
>> > @@ -270,6 +270,85 @@ int64_t strtosz(const char *nptr, char **end)
>> >      return strtosz_suffix(nptr, end, STRTOSZ_DEFSUFFIX_MB);
>> >  }
>> >  
>> > +/* Try to parse an unsigned integer
>> > + *
>> > + * Error checks done by the function:
>> > + * - NULL pointer will return -EINVAL.
>> > + * - Empty strings will return -EINVAL.
>> 
>> Same for strings containing only whitespace.
>
> Oh, you're right.
>
>> 
>> > + * - Overflow errors or other errno values  set by strtoull() will
>> 
>> Extra space before 'set'.
>
> Oops.
>
>> 
>> > + *   return -errno (-ERANGE in case of overflow).
>> > + * - Differently from strtoull(), values starting with a minus sign are
>> > + *   rejected (returning -ERANGE).
>> > + *
>> > + * Sets endptr to point to the first invalid character.
>> 
>> Mention that unlike strtol() & friends, endptr must not be null?
>> Probably not necessary.
>
> I believe it is implicit if I document it as "Sets *endptr". But worth
> documenting as it is different from strtoull() behavior.
>
>> 
>> The two ERANGE cases treat endptr differently: it's either set to point
>> just behind the out-of-range number, or to point to the beginning of the
>> out-of-range number.  Ugh.
>
> This should be fixed in the newer version I sent.
>
>> 
>> > +                                                        Callers may rely
>> > + * on *value and *endptr to be always set by the function, even in case of
>> > + * errors.
>> 
>> You neglect to specify what gets assigned to *value in error cases.
>> 
>
> Undefined.  :-)
>
> Anyway, I agree it not very useful to document that "*value and *endptr
> will be always set" if it is undefined. I will try to reword it.

Yes, please.

I'm not fond of undefined behavior, unless there's a good reason.

If you don't want to assign a defined value, consider not assigning
anything.

>> Your list of error checks isn't quite complete.  Here's my try:
>> 
>> /**
>>  * Parse unsigned integer
>>  *
>>  * @s: String to parse
>>  * @value: Destination for parsed integer value
>>  * @endptr: Destination for pointer to first character not consumed
>>  * @base: integer base, between 2 and 36 inclusive, or 0
>>  *
>>  * Parsed syntax is just like strol()'s: arbitrary whitespace, a single
>>  * optional '+' or '-', an optional "0x" if @base is 0 or 16, one or
>>  * more digits.
>
> The newer version has '-' as _not_ part of the syntax.
>
>>  *
>>  * If @s is null, or @base is invalid, or @s doesn't start with an
>>  * integer in the syntax above, set *@value to 0, *@endptr to @s, and
>>  * return -EINVAL.
>
> This matches the behavior of the newer version. But I would like to keep
> *value documented as undefined in case of errors.
>
>>  *
>>  * Set @endptr to point right beyond the parsed integer.
>>  *
>>  * If the integer is negative, set *@value to 0, and return -ERANGE.
>
> This isn't the case in the newer version.
>
>>  * If the integer overflows unsigned long long, set *@value to
>>  * ULLONG_MAX, and return -ERANGE.
>>  * Else, set *@value to the parsed integer, and return 0.
>>  */
>
> Thanks for taking the time to write this. I hate having to look up
> what's the right syntax to use in docstrings, so I often just write
> plain comments before functions.  :-)

I don't particularly like GLib-style doc-strings, but it's what we use
when we use anything, which is rarely.

> I have a minor problem with the documentation above: as a developer, the
> most important question I have is: what's the difference between using
> parse_uint() and using strtoull() directly? But maybe it is a good
> thing: instead of referencing the strtoull() specification (and an
> implementation detail), now we have a well-defined and well-documented
> behavior.

I understand why you're asking for the difference to stroull().  Trouble
is strtoull() is complex, and often misunderstood.

>> > + *
>> > + * The 'base' parameter has the same meaning of 'base' on strtoull().
>> > + *
>> > + * Returns 0 on success, negative errno value on error.
>> > + */
>> > +int parse_uint(const char *s, unsigned long long *value, char **endptr,
>> > +               int base)
>> > +{
>> > +    int r = 0;
>> > +    char *endp = (char *)s;
>> > +    unsigned long long val = 0;
>> > +
>> > +    if (!s) {
>> > +        r = -EINVAL;
>> > +        goto out;
>> > +    }
>> > +
>> > +    /* make sure we reject negative numbers: */
>> > +    while (isspace((unsigned char)*s)) {
>> > +        ++s; endp++;
>> 
>> Mixing pre- and post-increment like that is ugly.  Recommend to stick to
>> post-increment.
>
> Agreed. I blame the copy & paste I made from opts-visitor. Later I added
> the postfix increment without noticing how ugly it looked like
>
> Anyway, this was changed in the newer patch version.
>
>> 
>> I'd set endp after the loop.  Right before goto.
>
> Fixed in the newer version.
>
>> 
>> Or simply change the specification to set *endptr to point beyond the
>> integer instead of to the '-'.  I took the liberty to do exactly that in
>> my proposed function comment.
>
> The newer version was changed to return -EINVAL and set endptr to the
> beginning of the string.
>
>> 
>> > +    }
>> > +    if (*s == '-') {
>> > +        r = -ERANGE;
>> > +        goto out;
>> > +    }
>> 
>> This creates the special case that made me go "ugh" above.  Suggest to
>> drop the if here, and check right after strtoull() instead, as shown
>> below.  That way, we get the same endptr behavior for all out-of-range
>> numbers.
>
> Is returning -EINVAL acceptable for you, as well? In your proposal we
> consider "-1234" part of the language but out-of-range. Returning
> -EINVAL, we consider "-1234" not part of the language, and invalid
> input. The newer version returns -EINVAL.

I prefer -ERANGE on underflow, for symmetry with parsing *signed*
integers.  A similar parsing function for signed integers would surely
assign LLONG_MIN and return -ERANGE then.

A secondary, weak argument: caller can more easily distinguish the
failure modes:

* -EINVAL: @s invalid, @base invalid (both programming errors), or @s
  doesn't start with an integer.  I use "integer" in the mathematical
  sense here.

* -ERANGE, *value == 0: integer underflows *value

* -ERANGE, *value == ULLONG_MAX: integer overflows *value.

If we lump underflow into the -EINVAL case, you have to check *endptr ==
'-' to find it.

The argument is weak, because I don't have a caller ready that actually
wants to find it.

>> > +
>> > +    errno = 0;
>> > +    val = strtoull(s, &endp, base);
>> 
>>        if (*s = '-') {
>>            r = -ERANGE;
>>            val = 0;
>>            goto out;
>>        }
>
> This has another bug: makes endptr point to the middle of the string in
> case we are parsing "   xxx" or "   -1234".

It makes endptr point right behind the integer, doesn't it?

When parsing "   xxx", it points to the first 'x' (we skipped the spaces
already).

When parsing "   -1234", it points behind '4'.

>> > +    if (errno) {
>> > +        r = -errno;
>> > +        goto out;
>> > +    }
>> > +
>> > +    if (endp == s) {
>> > +        r = -EINVAL;
>> > +        goto out;
>> > +    }
>> > +
>> > +out:
>> > +    *value = val;
>> > +    *endptr = endp;
>> > +    return r;
>> > +}
>> > +
>> > +/* Try to parse an unsigned integer, making sure the whole string is parsed
>> > + *
>> > + * Have the same behavior of parse_uint(), but with an additional check
>> > + * for additional data after the parsed number (in that case, the function
>> > + * will return -EINVAL).
>> 
>> What's assigned to *value then?
>
> Undefined.  :-)
>
>
>> 
>> > + */
>> 
>> Alternatively, you could make into parse_uint() do that check when
>> endptr is null, and drop this function.  Matter of taste.
>
> I considered doing that, but I believe it would be too subtle.
>
>> 
>> > +int parse_uint_full(const char *s, unsigned long long *value, int base)
>> > +{
>> > +    char *endp;
>> > +    int r;
>> > +
>> > +    r = parse_uint(s, value, &endp, base);
>> > +    if (r < 0) {
>> > +        return r;
>> > +    }
>> > +    if (*endp) {
>> 
>> Answering my own question: the parsed integer is assigned to *value then.
>
> I prefer to document it as undefined. If you want to use the parsed
> integer even if there's additional data after it, you should use
> parse_int() instead.

Maybe.

What I want is a proper function contract.  That includes how *value is
changed.  *Especially* when it's an unspecified change.

>> > +        return -EINVAL;
>> > +    }
>> > +
>> > +    return 0;
>> > +}
>> > +
>> >  int qemu_parse_fd(const char *param)
>> >  {
>> >      int fd;

next prev parent reply	other threads:[~2013-01-18 18:06 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-16 18:28 [Qemu-devel] [PATCH 0/8] -numa option parsing fixes (v3) Eduardo Habkost
2013-01-16 18:28 ` [Qemu-devel] [PATCH 1/8] cutils: unsigned int parsing functions Eduardo Habkost
2013-01-17 18:29   ` Laszlo Ersek
2013-01-17 18:50     ` Eduardo Habkost
2013-01-17 19:06       ` [Qemu-devel] [PATCH 1/8 v4] " Eduardo Habkost
2013-01-17 19:25         ` Eduardo Habkost
2013-01-17 19:31         ` Laszlo Ersek
2013-01-17 19:55         ` Eric Blake
2013-01-17 21:04         ` Blue Swirl
2013-01-18 10:01   ` [Qemu-devel] [PATCH 1/8] " Markus Armbruster
2013-01-18 13:26     ` Eduardo Habkost
2013-01-18 13:32       ` Andreas Färber
2013-01-18 17:57       ` [Qemu-devel] [PATCH 1/8 v5] " Eduardo Habkost
2013-01-18 18:11         ` Eric Blake
2013-01-18 19:41           ` [Qemu-devel] [PATCH 1/8 v6] " Eduardo Habkost
2013-01-18 20:20             ` Eric Blake
2013-01-18 18:06       ` Markus Armbruster [this message]
2013-01-16 18:28 ` [Qemu-devel] [PATCH 2/8] vl.c: Fix off-by-one bug when handling "-numa node" argument Eduardo Habkost
2013-01-16 18:28 ` [Qemu-devel] [PATCH 3/8] vl.c: Abort on unknown -numa option type Eduardo Habkost
2013-01-16 18:28 ` [Qemu-devel] [PATCH 4/8] vl.c: Check for NUMA node limit inside numa_add() Eduardo Habkost
2013-01-16 18:28 ` [Qemu-devel] [PATCH 5/8] vl.c: numa_add(): Validate nodeid before using it Eduardo Habkost
2013-01-16 18:28 ` [Qemu-devel] [PATCH 6/8] vl.c: Use parse_uint_full() for NUMA nodeid Eduardo Habkost
2013-01-16 18:28 ` [Qemu-devel] [PATCH 7/8] vl.c: Extract -numa "cpus" parsing to separate function Eduardo Habkost
2013-01-16 18:28 ` [Qemu-devel] [PATCH 8/8] vl.c: validate -numa "cpus" parameter properly Eduardo Habkost
2013-01-16 20:07 ` [Qemu-devel] [PATCH 0/8] -numa option parsing fixes (v3) Eric Blake
     [not found] ` <20130128165559.GA6849@otherpad.lan.raisama.net>
     [not found]   ` <20130131154227.GH6849@otherpad.lan.raisama.net>
2013-02-01 20:45     ` [Qemu-devel] [PATCH for-1.4 " Anthony Liguori
  -- strict thread matches above, loose matches on Subject: below --
2013-01-16 15:24 [Qemu-devel] [PATCH 0/8] -numa option parsing fixes (v2) Eduardo Habkost
2013-01-16 15:24 ` [Qemu-devel] [PATCH 1/8] cutils: unsigned int parsing functions Eduardo Habkost
2013-01-16 17:10   ` Eric Blake
2013-01-16 17:33     ` Eduardo Habkost
2013-01-16 17:50       ` Eric Blake
2013-01-16 17:56         ` Eduardo Habkost

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=874nieqsbf.fsf@blackfin.pond.sub.org \
    --to=armbru@redhat.com \
    --cc=anthony@codemonkey.ws \
    --cc=chegu_vinod@hp.com \
    --cc=ehabkost@redhat.com \
    --cc=lersek@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.