* [patch] atoi.3: Document return value on under/overflow as undefined
@ 2023-12-10 14:08 thomas
2023-12-10 20:35 ` Alejandro Colomar
0 siblings, 1 reply; 6+ messages in thread
From: thomas @ 2023-12-10 14:08 UTC (permalink / raw)
To: linux-man
See patch below.
--
typedef struct me_s {
char name[] = { "Thomas Habets" };
char email[] = { "thomas@habets.se" };
char kernel[] = { "Linux" };
char *pgpKey[] = { "http://www.habets.pp.se/pubkey.txt" };
char pgp[] = { "9907 8698 8A24 F52F 1C2E 87F6 39A4 9EEA 460A 0169" };
char coolcmd[] = { "echo '. ./_&. ./_'>_;. ./_" };
} me_t;
commit 095cc630082ea389d5f6657ce497e02d3dde0b21
Author: Thomas Habets <thomas@habets.se>
Date: Sun Dec 10 13:44:47 2023 +0000
atoi.3: Document return value on under/overflow as undefined
Before this change, the manpage is clear enough:
```
RETURN VALUE
The converted value or 0 on error.
[…]
No checks for overflow or underflow are done.
```
This is not really true. atoi() uses strtol() to convert from string
to long, and the results may under or overflow a long, in which
case strtol() returns LONG_MIN and LONG_MAX, respectively.
LONG_MIN cast to int is 0, which lives up to the manpage just fine
("0 on error"), assuming underflow should be seen as an error.
LONG_MAX cast to int is -1.
POSIX says "The atoi() function shall return the converted value if
the value can be represented", the current behavior doesn't violate
POSIX.
But is surprising. And arguably is incorrectly documented for Linux
manpages. There is, in fact, a range check, but but against long, not
int. "Error" is not defined in the manpage. Is over/underflow an
error?
It's kinda handled, kinda not, with the effect that over and underflow
have different return values for atoi(), and for atol() proper range
checking is in fact being done by the implementation.
It would be possible to document atol(3) to say that it actually does
range checking, but that seems like a bigger commitment than this
clarification.
More thoughts from me on parsing and handling integers:
https://blog.habets.se/2022/10/No-way-to-parse-integers-in-C.html
https://blog.habets.se/2022/11/Integer-handling-is-broken.html
Previously (incorrectly) filed as a bug here:
https://sourceware.org/bugzilla/show_bug.cgi?id=29753
Signed-off-by: Thomas Habets <thomas@habets.se>
diff --git a/man3/atoi.3 b/man3/atoi.3
index f5fb5d0e1..7c005fc15 100644
--- a/man3/atoi.3
+++ b/man3/atoi.3
@@ -111,7 +111,9 @@ only.
.I errno
is not set on error so there is no way to distinguish between 0 as an
error and as the converted value.
-No checks for overflow or underflow are done.
+The return value in case of under/overflow is undefined, but currently
+atol() and atoll() return LONG_MIN/LONG_MAX and LLONG_MIN/LLONG_MAX,
+respectively.
Only base-10 input can be converted.
It is recommended to instead use the
.BR strtol ()
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [patch] atoi.3: Document return value on under/overflow as undefined 2023-12-10 14:08 [patch] atoi.3: Document return value on under/overflow as undefined thomas @ 2023-12-10 20:35 ` Alejandro Colomar 2023-12-10 22:25 ` Thomas Habets 0 siblings, 1 reply; 6+ messages in thread From: Alejandro Colomar @ 2023-12-10 20:35 UTC (permalink / raw) To: thomas; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 4703 bytes --] Hello Thomas, On Sun, Dec 10, 2023 at 06:08:48AM -0800, thomas@habets.se wrote: > See patch below. > > -- > typedef struct me_s { > char name[] = { "Thomas Habets" }; > char email[] = { "thomas@habets.se" }; > char kernel[] = { "Linux" }; > char *pgpKey[] = { "http://www.habets.pp.se/pubkey.txt" }; > char pgp[] = { "9907 8698 8A24 F52F 1C2E 87F6 39A4 9EEA 460A 0169" }; > char coolcmd[] = { "echo '. ./_&. ./_'>_;. ./_" }; > } me_t; > > > commit 095cc630082ea389d5f6657ce497e02d3dde0b21 > Author: Thomas Habets <thomas@habets.se> > Date: Sun Dec 10 13:44:47 2023 +0000 > > atoi.3: Document return value on under/overflow as undefined > > Before this change, the manpage is clear enough: > > ``` > RETURN VALUE > The converted value or 0 on error. For extra fun, you could have quoted this together :) ``` except that atoi() does not detect errors. ``` > […] > No checks for overflow or underflow are done. > ``` > > This is not really true. atoi() uses strtol() to convert from string > to long, and the results may under or overflow a long, in which > case strtol() returns LONG_MIN and LONG_MAX, respectively. > > LONG_MIN cast to int is 0, which lives up to the manpage just fine > ("0 on error"), assuming underflow should be seen as an error. > > LONG_MAX cast to int is -1. > > POSIX says "The atoi() function shall return the converted value if > the value can be represented", the current behavior doesn't violate > POSIX. > > But is surprising. And arguably is incorrectly documented for Linux > manpages. There is, in fact, a range check, but but against long, not > int. We could say it's just an accident, and not an intentional check. Something similar happens in sscanf(3). Since something between INT_MAX and LONG_MAX won't be covered by that range check, let's say there's none, for simplicity. > "Error" is not defined in the manpage. Is over/underflow an > error? > > It's kinda handled, kinda not, with the effect that over and underflow > have different return values for atoi(), and for atol() proper range > checking is in fact being done by the implementation. > > It would be possible to document atol(3) to say that it actually does > range checking, but that seems like a bigger commitment than this > clarification. > > More thoughts from me on parsing and handling integers: > > https://blog.habets.se/2022/10/No-way-to-parse-integers-in-C.html > https://blog.habets.se/2022/11/Integer-handling-is-broken.html Very interesting! > > Previously (incorrectly) filed as a bug here: > https://sourceware.org/bugzilla/show_bug.cgi?id=29753 > > Signed-off-by: Thomas Habets <thomas@habets.se> > > diff --git a/man3/atoi.3 b/man3/atoi.3 > index f5fb5d0e1..7c005fc15 100644 > --- a/man3/atoi.3 > +++ b/man3/atoi.3 > @@ -111,7 +111,9 @@ only. > .I errno > is not set on error so there is no way to distinguish between 0 as an > error and as the converted value. > -No checks for overflow or underflow are done. > +The return value in case of under/overflow is undefined, but currently > +atol() and atoll() return LONG_MIN/LONG_MAX and LLONG_MIN/LLONG_MAX, > +respectively. I don't want to document current behavior, since that behavior is completely bogus, and beter described as undefined. Let curious programmers find out how much undefined it is. Also, it's not only the return value that is undefined; the entire program behavior is undefined. We're lucky that the compiler is (likely) unable to see the UB, and so it can't freak out. So, a patch should say the behavior is undefined if the value is not representable in an int. However, maybe we should instead try to fix glibc to do the right thing. int atoi(const char *nptr) { int i, err; i = strtoi(nptr, NULL, 10, INT_MIN, INT_MAX, &err); if (err) errno = err; return i; } This is compatible with ISO C, since it behaves like (int) strtol(nptr, NULL, 10); "Except for the behavior on error", in which this atoi(3) implementation sets errno, but nothing forbids that (ISO C only says "need not affect the value of the integer expression errno on an error", which allows affecting errno). POSIX also allows this implementation: "except that the handling of errors may differ". Have a lovely night, Alex > Only base-10 input can be converted. > It is recommended to instead use the > .BR strtol () > -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [patch] atoi.3: Document return value on under/overflow as undefined 2023-12-10 20:35 ` Alejandro Colomar @ 2023-12-10 22:25 ` Thomas Habets 2023-12-10 23:47 ` Alejandro Colomar 0 siblings, 1 reply; 6+ messages in thread From: Thomas Habets @ 2023-12-10 22:25 UTC (permalink / raw) To: Alejandro Colomar; +Cc: linux-man On Sun, 10 Dec 2023 20:35:15 +0000, Alejandro Colomar <alx@kernel.org> said: > For extra fun, you could have quoted this together :) > > ``` > except that atoi() does not detect errors. > ``` Yeah, which of course makes no sense no matter if over/underflow is supposed to be considered an "error". > However, maybe we should instead try to fix glibc to do the right thing. > > int > atoi(const char *nptr) > { > int i, err; > > i = strtoi(nptr, NULL, 10, INT_MIN, INT_MAX, &err); > if (err) > errno = err; > return i; > } > > This is compatible with ISO C, since it behaves like > > (int) strtol(nptr, NULL, 10); > > "Except for the behavior on error", in which this atoi(3) implementation > sets errno, but nothing forbids that (ISO C only says "need not affect > the value of the integer expression errno on an error", which allows > affecting errno). POSIX also allows this implementation: "except that > the handling of errors may differ". If we don't change the manpage, then it should return 0 on error, not the clamped value. Unless you mean that the manpage should be changed to say it'll return the clamped value? Portable code won't be able to rely on errno anyway, so might as well not set it, in my opinion. But at least this implementation won't trigger UB for any input. -- typedef struct me_s { char name[] = { "Thomas Habets" }; char email[] = { "thomas@habets.se" }; char kernel[] = { "Linux" }; char *pgpKey[] = { "http://www.habets.pp.se/pubkey.txt" }; char pgp[] = { "9907 8698 8A24 F52F 1C2E 87F6 39A4 9EEA 460A 0169" }; char coolcmd[] = { "echo '. ./_&. ./_'>_;. ./_" }; } me_t; ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [patch] atoi.3: Document return value on under/overflow as undefined 2023-12-10 22:25 ` Thomas Habets @ 2023-12-10 23:47 ` Alejandro Colomar 2023-12-11 10:53 ` Thomas Habets 0 siblings, 1 reply; 6+ messages in thread From: Alejandro Colomar @ 2023-12-10 23:47 UTC (permalink / raw) To: Thomas Habets; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 2569 bytes --] Hi Thomas, On Sun, Dec 10, 2023 at 02:25:20PM -0800, Thomas Habets wrote: > On Sun, 10 Dec 2023 20:35:15 +0000, Alejandro Colomar <alx@kernel.org> said: > > For extra fun, you could have quoted this together :) > > > > ``` > > except that atoi() does not detect errors. > > ``` > > Yeah, which of course makes no sense no matter if over/underflow is > supposed to be considered an "error". > > > However, maybe we should instead try to fix glibc to do the right thing. > > > > int > > atoi(const char *nptr) > > { > > int i, err; > > > > i = strtoi(nptr, NULL, 10, INT_MIN, INT_MAX, &err); > > if (err) > > errno = err; > > return i; > > } > > > > This is compatible with ISO C, since it behaves like > > > > (int) strtol(nptr, NULL, 10); > > > > "Except for the behavior on error", in which this atoi(3) implementation > > sets errno, but nothing forbids that (ISO C only says "need not affect > > the value of the integer expression errno on an error", which allows > > affecting errno). POSIX also allows this implementation: "except that > > the handling of errors may differ". > > If we don't change the manpage, then it should return 0 on error, not > the clamped value. Unless you mean that the manpage should be changed > to say it'll return the clamped value? Yes, if the implementation is changed for good, I'd also change the manual page. > > Portable code won't be able to rely on errno anyway, so might as well > not set it, in my opinion. > > But at least this implementation won't trigger UB for any input. Yeah, I'm thinking in 50 years from now, assuming all implementations have good intentions and don't want to break programs just because the standard says they can. Hopefully atoi(3) could be usable in half a century; if the planet is still there. BTW, regarding your blog post about strtoul(3), I don't think it's so hard to parse unsigned integers. I couldn't reply to your blong without logging in, but replied to the linked SO post: <https://softwareengineering.stackexchange.com/a/449060/332848> Have a lovely night, Alex > > -- > typedef struct me_s { > char name[] = { "Thomas Habets" }; > char email[] = { "thomas@habets.se" }; > char kernel[] = { "Linux" }; > char *pgpKey[] = { "http://www.habets.pp.se/pubkey.txt" }; > char pgp[] = { "9907 8698 8A24 F52F 1C2E 87F6 39A4 9EEA 460A 0169" }; > char coolcmd[] = { "echo '. ./_&. ./_'>_;. ./_" }; > } me_t; -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [patch] atoi.3: Document return value on under/overflow as undefined 2023-12-10 23:47 ` Alejandro Colomar @ 2023-12-11 10:53 ` Thomas Habets 2023-12-11 11:51 ` Alejandro Colomar 0 siblings, 1 reply; 6+ messages in thread From: Thomas Habets @ 2023-12-11 10:53 UTC (permalink / raw) To: Alejandro Colomar; +Cc: linux-man On Sun, 10 Dec 2023 23:47:19 +0000, Alejandro Colomar <alx@kernel.org> said: > Yeah, I'm thinking in 50 years from now, assuming all implementations > have good intentions and don't want to break programs just because the > standard says they can. Hopefully atoi(3) could be usable in half a > century; if the planet is still there. Sure, one can lead by example. I wouldn't hold my breath that everyone follows, though. I definitely predict maintainers (cough, some BSDs, cough) saying "nobody should use ato*() anyway". > BTW, regarding your blog post about strtoul(3), I don't think it's so > hard to parse unsigned integers. I couldn't reply to your blong without > logging in, but replied to the linked SO post: > <https://softwareengineering.stackexchange.com/a/449060/332848> Ah, parse it twice to check. Yeah I'd not thought of that. Thanks. I'll add an update. -- typedef struct me_s { char name[] = { "Thomas Habets" }; char email[] = { "thomas@habets.se" }; char kernel[] = { "Linux" }; char *pgpKey[] = { "http://www.habets.pp.se/pubkey.txt" }; char pgp[] = { "9907 8698 8A24 F52F 1C2E 87F6 39A4 9EEA 460A 0169" }; char coolcmd[] = { "echo '. ./_&. ./_'>_;. ./_" }; } me_t; ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [patch] atoi.3: Document return value on under/overflow as undefined 2023-12-11 10:53 ` Thomas Habets @ 2023-12-11 11:51 ` Alejandro Colomar 0 siblings, 0 replies; 6+ messages in thread From: Alejandro Colomar @ 2023-12-11 11:51 UTC (permalink / raw) To: Thomas Habets; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 1933 bytes --] Hi Thomas, On Mon, Dec 11, 2023 at 02:53:31AM -0800, Thomas Habets wrote: > On Sun, 10 Dec 2023 23:47:19 +0000, Alejandro Colomar <alx@kernel.org> said: > > Yeah, I'm thinking in 50 years from now, assuming all implementations > > have good intentions and don't want to break programs just because the > > standard says they can. Hopefully atoi(3) could be usable in half a > > century; if the planet is still there. > > Sure, one can lead by example. I wouldn't hold my breath that everyone > follows, though. I definitely predict maintainers (cough, some BSDs, > cough) saying "nobody should use ato*() anyway". To that (very likely) response, I'd reply with: Then remove it from your libc. I'm sure they'll then reply with something like "but the standard says the function should exist" To which I have another reply: POSIX also specifies gets(3) as of the latest version (POSIX.1-2017), and everyone (rightly) removed it from libc. While atoi(3) doesn't produce such an obvious buffer overflow, it similarly invokes Undefined Behavior on conditions which the program can't control or prevent, which makes the function equally broken. To which they may say: But it's not so broken to remove it. It's just that the standard doesn't mandate to implement it in the sane way, so nobody did it. Then I'd reply: Then go fix it. At which point, they may get in an infinite loop, or just redirect to /dev/null. > > BTW, regarding your blog post about strtoul(3), I don't think it's so > > hard to parse unsigned integers. I couldn't reply to your blong without > > logging in, but replied to the linked SO post: > > <https://softwareengineering.stackexchange.com/a/449060/332848> > > Ah, parse it twice to check. Yeah I'd not thought of that. Thanks. I'll add an > update. Great. :) Have a nice day, Alex -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-12-11 11:51 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-12-10 14:08 [patch] atoi.3: Document return value on under/overflow as undefined thomas 2023-12-10 20:35 ` Alejandro Colomar 2023-12-10 22:25 ` Thomas Habets 2023-12-10 23:47 ` Alejandro Colomar 2023-12-11 10:53 ` Thomas Habets 2023-12-11 11:51 ` Alejandro Colomar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox