Re: [PATCH 00/14] numparse module: systematically tighten up integer parsing

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Michael Haggerty <mhagger@alum.mit.edu>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.org>,
	git@vger.kernel.org
Subject: Re: [PATCH 00/14] numparse module: systematically tighten up integer parsing
Date: Tue, 24 Mar 2015 16:05:56 +0100	[thread overview]
Message-ID: <55117D54.103@alum.mit.edu> (raw)
In-Reply-To: <20150319052620.GA30645@peff.net>

On 03/19/2015 06:26 AM, Jeff King wrote:
> On Tue, Mar 17, 2015 at 05:00:02PM +0100, Michael Haggerty wrote:
> 
>> My main questions:
>>
>> * Do people like the API? My main goal was to make these functions as
>>   painless as possible to use correctly, because there are so many
>>   call sites.
>>
>> * Is it too gimmicky to encode the base together with other options in
>>   `flags`? (I think it is worth it to avoid the need for another
>>   parameter, which callers could easily put in the wrong order.)
> 
> I definitely like the overall direction of this. My first thought was
> that it does seem like there are a lot of possible options to the
> functions (and OR-ing the flags with the base does seem weird, though I
> can't think of a plausible case where it would actually cause errors).
> Many of those options don't seem used in the example conversions (I'm
> not clear who would want NUM_SATURATE, for example).

There are a lot of options, but so far only few of them have been used.
As the call sites are rewritten we will see which of these features are
useful, and which can be dispensed with. If groups of options tend to be
used together, we can define constants for them like I've done with
NUM_SLOPPY and NUM_SIGN.

Regarding NUM_SATURATE: I'm not sure who would want it either, but I
thought there might be places where the user wants to specify
(effectively) "infinity", and it might be convenient to let him specify
something easy to type like "--max=999999999999" rather than
"--max=2147483647".

> I wondered if we could do away with the radix entirely. Wouldn't we be
> asking for base 10 most of the time? Of course, your first few patches
> show octal parsing, but I wonder if we should actually have a separate
> parse_mode() for that, since that seems to be the general reason for
> doing octal parsing. 100000644 does not overflow an int, but it is
> hardly a reasonable mode.

Again, as a first pass I wanted to just have a really flexible API so
that call sites can be rewritten without a lot of extra thought. If
somebody wants to add a parse_mode() function, it will be easy to build
on top of convert_ui(). But that change can be done after this one.

> I also wondered if we could get rid of NUM_SIGN in favor of just having
> the type imply it (e.g., convert_l would always allow negative numbers,
> whereas convert_ul would not). But I suppose there are times when we end
> up using an "int" to store an unsigned value for a good reason (e.g.,
> "-1" is a sentinel value, but we expect only positive values from the
> user). So that might be a bad idea.

Yes, as I was rewriting call sites, I found many that used the unsigned
variants of the parsing functions but stored the result in an int.
Probably some of these use -1 to denote "unset"; it might be that there
are other cases where the variable could actually be declared to be
"unsigned int".

Prohibiting signs when parsing signed quantities isn't really elegant
from an API purity point of view, but it sure is handy!

> I notice that you go up to "unsigned long" here for sizes. If we want to
> use this everywhere, we'll need something larger for parsing off_t
> values on 32-bit machines (though I would not at all be surprised with
> the current code if 32-bit machines have a hard time configuring a
> pack.packSizeLimit above 4G).

Yes, probably. I haven't run into such call sites yet.

> I wonder how much of the boilerplate in the parse_* functions could be
> factored out to use a uintmax_t, with the caller just providing the
> range. That would make it easier to add new types like off_t, and
> possibly even constrained types (e.g., an integer from 0 to 100). On the
> other hand, you mentioned to me elsewhere that there may be some bugs in
> the range-checks of config.c's integer parsing. I suspect they are
> related to exactly this kind of refactoring, so perhaps writing things
> out is best.

It's not a lot of code yet. If we find out we need variants for size_t
and off_t and uintmax_t and intmax_t then such a refactoring would
definitely be worth considering.

>> * Am I making callers too strict? In many cases where a positive
>>   integer is expected (e.g., "--abbrev=<num>"), I have been replacing
>>   code like
>> [...]
> 
> IMHO most of the tightening happening here is a good thing, and means we
> are more likely to notice mistakes rather than silently doing something
> stupid.
> 
> For sites that currently allow it, I could imagine people using hex
> notation for some values, though, depending on the context. It looks
> there aren't many of them ((it is just when the radix is "0", right?).
> Some of them look to be accidental (does anybody really ask for
> --threads=0x10?), but others might not be (the pack index-version
> contains an offset field that might be quite big).

Yes, we can even debate whether we want to implement a general policy
that user-entered integers can be specified in any radix. Probably
nobody will ever specify "--threads=0x10", but is there harm in allowing it?

Against the gain in flexibility, I see the following potential
disadvantages:

* Added cognitive burden for rarely-used flexibility.
* Other Git clients might feel obliged to be just as flexible, causing
their implementers extra work.
* Users might see commands written in unfamiliar ways (e.g. in scripts
or stackoverflow) and get confused.
* Octal is ambiguous with decimal. What to make of "--count=010"? People
who expect octal numbers to be allowed will think it means 8, whereas
people who don't expect octal numbers to be allowed (or don't even know
what an octal number is!) will think that it means 10. Such uses might
even already exist and have their meaning changed if we start allowing
octal.

All in all, I thought it is less error-prone to default to allowing only
decimal, except in selected situations where hex and octal are
traditionally used (e.g., file modes, memory limits).

Most of the call sites so far have explicitly specified decimal parsing,
and I have left them unchanged.

> Feel free to ignore any or all of that. It is not so much criticism as
> a dump of thoughts I had while reading the patches. Perhaps you can pick
> something useful out of that, and perhaps not. :)

Thanks very much for the feedback!

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu

next prev parent reply	other threads:[~2015-03-24 15:06 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-17 16:00 [PATCH 00/14] numparse module: systematically tighten up integer parsing Michael Haggerty
2015-03-17 16:00 ` [PATCH 01/14] numparse: new module for parsing integral numbers Michael Haggerty
2015-03-18 18:27   ` Eric Sunshine
2015-03-18 22:47     ` Michael Haggerty
2015-03-20  8:54       ` Eric Sunshine
2015-03-20 17:51   ` Junio C Hamano
2015-03-17 16:00 ` [PATCH 02/14] cacheinfo_callback(): use convert_ui() when handling "--cacheinfo" Michael Haggerty
2015-03-17 16:00 ` [PATCH 03/14] write_subdirectory(): use convert_ui() for parsing mode Michael Haggerty
2015-03-17 16:00 ` [PATCH 04/14] handle_revision_opt(): use skip_prefix() in many places Michael Haggerty
2015-03-17 16:00 ` [PATCH 05/14] handle_revision_opt(): use convert_i() when handling "-<digit>" Michael Haggerty
2015-03-19  6:34   ` Junio C Hamano
2015-03-17 16:00 ` [PATCH 06/14] strtoul_ui(), strtol_i(): remove functions Michael Haggerty
2015-03-17 16:00 ` [PATCH 07/14] handle_revision_opt(): use convert_ui() when handling "--abbrev=" Michael Haggerty
2015-03-17 16:00 ` [PATCH 08/14] builtin_diff(): detect errors when parsing --unified argument Michael Haggerty
2015-03-17 16:00 ` [PATCH 09/14] opt_arg(): val is always non-NULL Michael Haggerty
2015-03-17 16:00 ` [PATCH 10/14] opt_arg(): use convert_i() in implementation Michael Haggerty
2015-03-17 16:00 ` [PATCH 11/14] opt_arg(): report errors parsing option values Michael Haggerty
2015-03-17 16:00 ` [PATCH 12/14] opt_arg(): simplify pointer handling Michael Haggerty
2015-03-17 16:00 ` [PATCH 13/14] diff_opt_parse(): use convert_i() when handling "-l<num>" Michael Haggerty
2015-03-17 16:00 ` [PATCH 14/14] diff_opt_parse(): use convert_i() when handling --abbrev=<num> Michael Haggerty
2015-03-19  6:37   ` Junio C Hamano
2015-03-17 18:48 ` [PATCH 00/14] numparse module: systematically tighten up integer parsing Junio C Hamano
2015-03-17 19:46   ` Michael Haggerty
2015-03-19  6:31     ` Junio C Hamano
2015-03-17 23:05 ` Duy Nguyen
2015-03-18  9:47   ` Michael Haggerty
2015-03-18  9:58     ` Duy Nguyen
2015-03-18 10:03     ` Jeff King
2015-03-18 10:20       ` Michael Haggerty
2015-03-19  5:26 ` Jeff King
2015-03-19  6:41   ` Junio C Hamano
2015-03-19  7:32   ` Junio C Hamano
2015-03-24 16:06     ` Michael Haggerty
2015-03-24 16:49       ` René Scharfe
2015-03-25 21:14         ` Michael Haggerty
2015-03-25 21:59           ` Junio C Hamano
2015-03-24 15:05   ` Michael Haggerty [this message]
2015-03-19  6:22 ` Junio C Hamano
2015-03-24 15:42   ` Michael Haggerty
2015-03-24 15:58     ` Junio C Hamano
2015-03-24 16:09       ` Junio C Hamano
2015-03-24 17:39       ` Michael Haggerty
2015-03-24 18:08         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55117D54.103@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=peff@peff.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).