From: Alejandro Colomar <alx@kernel.org>
To: thomas@habets.se
Cc: linux-man@vger.kernel.org
Subject: Re: [patch] atoi.3: Document return value on under/overflow as undefined
Date: Sun, 10 Dec 2023 21:35:15 +0100
Message-ID: <ZXYhCo6s-usIn-9d@debian>
In-Reply-To: <CA+kHd+cpgbREUpfm+xBJkhUNc52n1juM3gF_M+8_Wo3AU6wdEw@mail.gmail.com>
Hello Thomas,
On Sun, Dec 10, 2023 at 06:08:48AM -0800, thomas@habets.se wrote:
> See patch below.
>
> --
> typedef struct me_s {
> char name[] = { "Thomas Habets" };
> char email[] = { "thomas@habets.se" };
> char kernel[] = { "Linux" };
> char *pgpKey[] = { "http://www.habets.pp.se/pubkey.txt" };
> char pgp[] = { "9907 8698 8A24 F52F 1C2E 87F6 39A4 9EEA 460A 0169" };
> char coolcmd[] = { "echo '. ./_&. ./_'>_;. ./_" };
> } me_t;
>
>
> commit 095cc630082ea389d5f6657ce497e02d3dde0b21
> Author: Thomas Habets <thomas@habets.se>
> Date: Sun Dec 10 13:44:47 2023 +0000
>
> atoi.3: Document return value on under/overflow as undefined
>
> Before this change, the manpage is clear enough:
>
> ```
> RETURN VALUE
> The converted value or 0 on error.
For extra fun, you could have quoted this together :)
```
except that atoi() does not detect errors.
```
> […]
> No checks for overflow or underflow are done.
> ```
>
> This is not really true. atoi() uses strtol() to convert from string
> to long, and the results may under or overflow a long, in which
> case strtol() returns LONG_MIN and LONG_MAX, respectively.
>
> LONG_MIN cast to int is 0, which lives up to the manpage just fine
> ("0 on error"), assuming underflow should be seen as an error.
>
> LONG_MAX cast to int is -1.
>
> POSIX says "The atoi() function shall return the converted value if
> the value can be represented", the current behavior doesn't violate
> POSIX.
>
> But is surprising. And arguably is incorrectly documented for Linux
> manpages. There is, in fact, a range check, but against long, not
> int.
We could say it's just an accident, and not an intentional check.
Something similar happens in sscanf(3). Since something between INT_MAX
and LONG_MAX won't be covered by that range check, let's say there's
none, for simplicity.
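To make that gap concrete, here is a minimal sketch. atoi_like() is a
made-up name that only models the "strtol(3), then narrow to int" behavior
described in this thread; it is not the glibc source. It assumes an LP64
platform and a compiler that narrows out-of-range values by wrapping (GCC
documents this as its implementation-defined choice). Note that the
explicit cast in this helper is implementation-defined, which is a weaker
statement than what ISO C says about atoi(3) itself.

```c
#include <stdlib.h>

/*
 * Hypothetical model of the behavior discussed above: convert with
 * strtol(3) and narrow the long to int without any range check.
 */
static int
atoi_like(const char *nptr)
{
	return (int) strtol(nptr, NULL, 10);
}

/*
 * Assuming LP64 and wrapping conversion:
 *
 *   atoi_like("123")                    yields 123
 *   atoi_like("4294967296")             yields 0   (2^32 fits in long,
 *                                                   so strtol(3) sees no
 *                                                   error at all)
 *   atoi_like("99999999999999999999")   yields -1  (strtol(3) saturates
 *                                                   to LONG_MAX)
 *   atoi_like("-99999999999999999999")  yields 0   (strtol(3) saturates
 *                                                   to LONG_MIN)
 */
```

The second case is the one that matters here: nothing in the range
(INT_MAX, LONG_MAX] is caught by anything.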
> "Error" is not defined in the manpage. Is over/underflow an
> error?
>
> It's kinda handled, kinda not, with the effect that over and underflow
> have different return values for atoi(), and for atol() proper range
> checking is in fact being done by the implementation.
>
> It would be possible to document atol(3) to say that it actually does
> range checking, but that seems like a bigger commitment than this
> clarification.
>
> More thoughts from me on parsing and handling integers:
>
> https://blog.habets.se/2022/10/No-way-to-parse-integers-in-C.html
> https://blog.habets.se/2022/11/Integer-handling-is-broken.html
Very interesting!
>
> Previously (incorrectly) filed as a bug here:
> https://sourceware.org/bugzilla/show_bug.cgi?id=29753
>
> Signed-off-by: Thomas Habets <thomas@habets.se>
>
> diff --git a/man3/atoi.3 b/man3/atoi.3
> index f5fb5d0e1..7c005fc15 100644
> --- a/man3/atoi.3
> +++ b/man3/atoi.3
> @@ -111,7 +111,9 @@ only.
> .I errno
> is not set on error so there is no way to distinguish between 0 as an
> error and as the converted value.
> -No checks for overflow or underflow are done.
> +The return value in case of under/overflow is undefined, but currently
> +atol() and atoll() return LONG_MIN/LONG_MAX and LLONG_MIN/LLONG_MAX,
> +respectively.
I don't want to document current behavior, since that behavior is
completely bogus, and better described as undefined. Let curious
programmers find out how much undefined it is.
Also, it's not only the return value that is undefined; the entire
program behavior is undefined. We're lucky that the compiler is
(likely) unable to see the UB, and so it can't freak out.
So, a patch should say the behavior is undefined if the value is not
representable in an int.
However, maybe we should instead try to fix glibc to do the right thing.
	int
	atoi(const char *nptr)
	{
		int  i, err;

		i = strtoi(nptr, NULL, 10, INT_MIN, INT_MAX, &err);
		if (err)
			errno = err;
		return i;
	}

This is compatible with ISO C, since it behaves like

	(int) strtol(nptr, NULL, 10);

"Except for the behavior on error", in which this atoi(3) implementation
sets errno, but nothing forbids that (ISO C only says "need not affect
the value of the integer expression errno on an error", which allows
affecting errno). POSIX also allows this implementation: "except that
the handling of errors may differ".
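For readers who want to try the idea without strtoi(3) (which, as far as I
know, comes from the BSDs and is available on Linux through libbsd rather
than glibc), here is a rough sketch using only strtol(3). atoi_checked() is
a hypothetical name, and this is deliberately narrower than the strtoi(3)
version: it reports only range errors (ERANGE) and does not diagnose
missing digits or trailing garbage.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: like atoi(3), but saturates to INT_MIN/INT_MAX
 * and sets errno to ERANGE when the value does not fit in an int.
 * On success, errno is left as the caller had it.
 */
static int
atoi_checked(const char *nptr)
{
	int   saved_errno = errno;
	long  l;

	errno = 0;
	l = strtol(nptr, NULL, 10);

	/*
	 * Either strtol(3) itself overflowed a long (it then returns
	 * LONG_MIN or LONG_MAX), or the long is outside the int range.
	 */
	if (errno == ERANGE || l < INT_MIN || l > INT_MAX) {
		errno = ERANGE;
		return (l < 0) ? INT_MIN : INT_MAX;
	}

	errno = saved_errno;
	return (int) l;
}
```

A caller that cares about overflow would set errno to 0 before the call
and check it for ERANGE afterwards, the same way one does with strtol(3).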
Have a lovely night,
Alex
> Only base-10 input can be converted.
> It is recommended to instead use the
> .BR strtol ()
>
--
<https://www.alejandro-colomar.es/>