From: Alejandro Colomar <alx@kernel.org>
To: Matthew House <mattlloydhouse@gmail.com>
Cc: linux-man <linux-man@vger.kernel.org>,
Zack Weinberg <zack@owlfolio.org>,
Lee Griffiths <poddster@gmail.com>
Subject: Re: [PATCH] sscanf.3: Remove term 'deprecated', and expand BUGS
Date: Wed, 6 Dec 2023 21:17:57 +0100
Message-ID: <ZXDXBngCYG11NsMZ@debian>
In-Reply-To: <20231206183351.749567-1-mattlloydhouse@gmail.com>
Hi Matthew,
On Wed, Dec 06, 2023 at 01:33:50PM -0500, Matthew House wrote:
> On Wed, Dec 6, 2023 at 11:36 AM Alejandro Colomar <alx@kernel.org> wrote:
> > Also, I was going to ask for strtoi(3bsd) in glibc, since strtol(3)
> > isn't easy to use portably (since POSIX allows EINVAL on no conversion,
> > how to differentiate strtoi(3bsd)'s ECANCELED from EINVAL in strtol(3)?).
>
> I feel like this is rather overstating the difficulty. In practice, the
> no-conversion condition is very commonly detected by checking whether
> *endptr == nptr after the call. The usual idiom I see is something like:
>
> char *end;
> errno = 0;
> value = strtol(ptr, &end, 10);
> if (end == ptr || *end != '\0' || errno == ERANGE)
That test could trigger UB if you passed an unsupported base. Of
course, in this case you pass 10, but what if the base were a
user-controlled variable? In that case, nothing specifies what happens to
'end' (experimentally, I see that glibc doesn't modify it, so it would be
left uninitialized); so dereferencing it, or even comparing it, would be UB.
> goto err;
Yeah, if you just don't care and want to handle all errors in the same
way, and you know the base is supported, this is correct.
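A minimal sketch of the quoted idiom with the base problem closed off,
assuming one failure path is enough: reject any base other than 0 or 2
through 36 (the values C defines for strtol(3)) before the call, so 'end'
is always written by strtol(3). The name parse_long_strict() is
hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse all of 's' as a long in 'base'.
 * Returns true and stores the value in '*out' on success; every
 * failure (bad base, no digits, trailing junk, overflow) is lumped
 * together, as in the idiom quoted above.
 */
static bool
parse_long_strict(const char *s, int base, long *out)
{
	char  *end = NULL;  /* initialized anyway, never read uninitialized */
	long  value;

	/* Validate the base ourselves, so strtol(3) always sets 'end'. */
	if (base != 0 && (base < 2 || base > 36))
		return false;

	errno = 0;
	value = strtol(s, &end, base);
	if (end == s || *end != '\0' || errno == ERANGE)
		return false;

	*out = value;
	return true;
}
```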
But what happens when you want to differentiate the different errors?
Let's list the possible errors, as per strtoi(3bsd):
ERRORS
[ECANCELED] The string did not contain any characters that
were converted.
[EINVAL] The base is not between 2 and 36 and does not
contain the special value 0.
[ENOTSUP] The string contained non‐numeric characters
that did not get converted. In this case,
endptr points to the first unconverted
character.
[ERANGE] The given string was out of range; the value
converted has been clamped; or the range given
was invalid, i.e. lo > hi.
Let's see how strtol(3) handles these:
ECANCELED:
strtol(3) has `end == ptr`. But POSIX also allows EINVAL here. And make
sure you pass a supported base.
EINVAL:
strtol(3) has EINVAL. But what happens to end? It could be left
unmodified (current glibc behavior); or could be set to ptr, since none
of the string has been read. If the former, it's easy to trigger UB.
If the latter, it is indistinguishable from ECANCELED.
ENOTSUP:
strtol(3) has `*end != '\0'`. But make sure you pass a supported base,
or buy a protector for nasal demons.
ERANGE:
strtol(3) has ERANGE; same as strtoi().
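A sketch of that mapping in code, assuming the caller wants
strtoi(3bsd)-style codes but only has strtol(3). The function
strtol_status() is hypothetical; the base is validated up front instead
of trusting strtol(3)'s EINVAL, and the precedence (ECANCELED, then
ERANGE, then ENOTSUP) is a choice made for this sketch, not necessarily
what strtoi(3bsd) does.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: call strtol(3) and report the outcome with the
 * error codes strtoi(3bsd) documents. Returns 0 on full success.
 * errno is restored to its previous value before returning.
 */
static int
strtol_status(const char *s, char **endp, int base, long *out)
{
	char  *end;
	int   saved = errno;
	int   err;

	/* EINVAL: checked here, since 'end' may not be set for a bad base. */
	if (base != 0 && (base < 2 || base > 36)) {
		if (endp != NULL)
			*endp = (char *) s;
		*out = 0;
		return EINVAL;
	}

	errno = 0;
	*out = strtol(s, &end, base);
	err = errno;
	errno = saved;

	if (endp != NULL)
		*endp = end;

	if (end == s)           /* nothing converted; errno may be EINVAL */
		return ECANCELED;
	if (err == ERANGE)      /* value clamped to LONG_MIN or LONG_MAX */
		return ERANGE;
	if (*end != '\0')       /* trailing unconverted characters */
		return ENOTSUP;
	return 0;
}
```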
In the end, it amounts to saying: "the behavior of strtol(3) is
undefined if the base is unsupported; don't bother testing for EINVAL;
just don't trigger it". That's fine, but we need to clarify that, because
if someone actually needs to use a non-standard base, they should be very
careful and set end=NULL before the call (but there's no guarantee that
end is left unmodified either, so...). Or better, provide strtoi(3)
and compare (err != 0), or (err != 0 && err != E***) if you explicitly
allow some error.
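To illustrate the (err != 0 && err != E***) pattern, a sketch of a
strtoi(3bsd)-like interface built on strtoimax(3). It is NOT the NetBSD
or libbsd implementation: strtoi_sketch() and parse_port_prefix() are
hypothetical names, and the caller explicitly allows ENOTSUP so that a
leading number such as "8080/tcp" is accepted.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical strtoi(3bsd)-like function: converts with strtoimax(3),
 * clamps the result to [lo, hi], and reports the outcome in '*status'
 * (0, ECANCELED, EINVAL, ENOTSUP, or ERANGE). errno is preserved.
 */
static intmax_t
strtoi_sketch(const char *s, char **endp, int base,
              intmax_t lo, intmax_t hi, int *status)
{
	char      *end = (char *) s;  /* stays at 's' if nothing is parsed */
	intmax_t  n = 0;
	int       saved = errno;

	if (base != 0 && (base < 2 || base > 36)) {
		*status = EINVAL;
		goto done;
	}
	if (lo > hi) {                /* invalid range */
		*status = ERANGE;
		goto done;
	}

	errno = 0;
	n = strtoimax(s, &end, base);
	*status = (errno == ERANGE) ? ERANGE : 0;

	if (*status == 0 && end == s)
		*status = ECANCELED;
	else if (*status == 0 && *end != '\0')
		*status = ENOTSUP;

	/* Clamp; keep an earlier, more specific status if there is one. */
	if (n < lo) {
		n = lo;
		if (*status == 0)
			*status = ERANGE;
	} else if (n > hi) {
		n = hi;
		if (*status == 0)
			*status = ERANGE;
	}
done:
	errno = saved;
	if (endp != NULL)
		*endp = end;
	return n;
}

/* Hypothetical caller: a leading port number, trailing text allowed. */
static int
parse_port_prefix(const char *s, int *port)
{
	int       status;
	intmax_t  n = strtoi_sketch(s, NULL, 10, 1, 65535, &status);

	if (status != 0 && status != ENOTSUP)
		return -1;
	*port = (int) n;
	return 0;
}
```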
>
> Of course, the *end != '\0' condition can be omitted or adapted as
> necessary. Alternatively, one can avoid checking errno at all, by just
> checking whether the value is in the permitted range, since the saturating
> behavior will make such a check reject on overflow. And even without an
> explicit permitted range, one can just reject on value == LONG_MIN ||
> value == LONG_MAX, or just on value == ULONG_MAX for strtoul(3); rejecting
> a value that's almost an overflow isn't going to harm anything, except for
> the rare scenarios where a printed integer can actually reach the minimum
> or maximum, but needs to be round-tripped unconditionally.
>
> In general, I don't think most programmers are in the habit of carefully
> distinguishing errno values for <string.h> functions. They'd rather check
> for self-explanatory conditions, such as *endptr == nptr, that readers
> don't have to refer to the man page to decipher. There's a reason that most
> high-level language bindings return errno values for file I/O but not for
> anything else.
>
> Thank you,
> Matthew House
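For comparison, a sketch of the errno-free check described in the quoted
text, assuming base 10 (always supported) and that losing the exact
values LONG_MIN and LONG_MAX is acceptable. parse_long_noerrno() is a
hypothetical name.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: strtol(3) saturates to LONG_MIN or LONG_MAX on
 * overflow, so rejecting those two values catches every overflow
 * without looking at errno (at the cost of rejecting them when exact).
 */
static bool
parse_long_noerrno(const char *s, long *out)
{
	char  *end;
	long  value = strtol(s, &end, 10);  /* base 10: always supported */

	if (end == s || *end != '\0')       /* no digits, or trailing junk */
		return false;
	if (value == LONG_MIN || value == LONG_MAX)
		return false;
	*out = value;
	return true;
}
```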
--
<https://www.alejandro-colomar.es/>