* [PATCH] sscanf.3: Remove term 'deprecated', and expand BUGS
From: Alejandro Colomar @ 2023-12-06 14:52 UTC (permalink / raw)
To: linux-man; +Cc: Alejandro Colomar, Lee Griffiths, Zack Weinberg
Several programmers have been confused about this use of 'deprecated'.
Also, a maximum field width can be used with these conversions to mitigate the
problem. Still, it's only a mitigation: it limits the number of
characters read, but an input of LONG_MAX+1 (which takes up
the same number of characters as LONG_MAX) would still cause UB; or
one can limit the width to well below the limit of UB, but then you
artificially invalidate valid input. There's no good way to avoid UB with
sscanf(3), but it's not necessarily bad with trusted input (and
strtol(3) isn't the panacea either; strtoi(3bsd) is good, though not
standard).
Try to be more convincing in BUGS instead.
Link: <https://stackoverflow.com/questions/77601832/man-sscanf-d-is-deprecated-in-c-or-glibc/>
Cc: Lee Griffiths <poddster@gmail.com>
Cc: Zack Weinberg <zack@owlfolio.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
Hi Lee!
Thanks for the report. After seeing how much frustration it has caused,
I propose this change. Does it look good to you?
Thanks,
Alex
man3/sscanf.3 | 15 ++-------------
1 file changed, 2 insertions(+), 13 deletions(-)
diff --git a/man3/sscanf.3 b/man3/sscanf.3
index 2211cab7d..4c0bdc318 100644
--- a/man3/sscanf.3
+++ b/man3/sscanf.3
@@ -359,7 +359,6 @@ .SS Conversions
and assignment does not occur.
.TP
.B d
-.IR Deprecated .
Matches an optionally signed decimal integer;
the next pointer must be a pointer to
.IR int .
@@ -374,7 +373,6 @@ .SS Conversions
.\" is silently ignored, causing old programs to fail mysteriously.)
.TP
.B i
-.IR Deprecated .
Matches an optionally signed integer; the next pointer must be a pointer to
.IR int .
The integer is read in base 16 if it begins with
@@ -387,18 +385,15 @@ .SS Conversions
Only characters that correspond to the base are used.
.TP
.B o
-.IR Deprecated .
Matches an unsigned octal integer; the next pointer must be a pointer to
.IR "unsigned int" .
.TP
.B u
-.IR Deprecated .
Matches an unsigned decimal integer; the next pointer must be a
pointer to
.IR "unsigned int" .
.TP
.B x
-.IR Deprecated .
Matches an unsigned hexadecimal integer
(that may optionally begin with a prefix of
.I 0x
@@ -409,33 +404,27 @@ .SS Conversions
.IR "unsigned int" .
.TP
.B X
-.IR Deprecated .
Equivalent to
.BR x .
.TP
.B f
-.IR Deprecated .
Matches an optionally signed floating-point number; the next pointer must
be a pointer to
.IR float .
.TP
.B e
-.IR Deprecated .
Equivalent to
.BR f .
.TP
.B g
-.IR Deprecated .
Equivalent to
.BR f .
.TP
.B E
-.IR Deprecated .
Equivalent to
.BR f .
.TP
.B a
-.IR Deprecated .
(C99) Equivalent to
.BR f .
.TP
@@ -661,8 +650,8 @@ .SS Numeric conversion specifiers
programs should use functions such as
.BR strtol (3)
to parse numeric input.
-This manual page deprecates use of the numeric conversion specifiers
-until they are fixed by ISO C.
+Alternatively,
+mitigate it by specifying a maximum field width.
.SS Nonstandard modifiers
These functions are fully C99 conformant, but provide the
additional modifiers
--
2.42.0
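For illustration only (this is not part of the patch), the field-width mitigation described in the commit message could look like the sketch below. `parse_int9()` is a hypothetical name, and the sketch assumes `int` is at least 32 bits wide, so that no 9-character field can exceed `INT_MAX`. The `%n` directive records how many characters were consumed, which lets the caller notice that the field was cut short.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): parse a decimal int using a
 * maximum field width of 9 characters (sign included).  Assuming int is
 * at least 32 bits, a 9-character field can never overflow, so the
 * undefined behavior described in BUGS is not reached.
 *
 * Returns 0 on success, or -1 if nothing was converted or if input
 * remains after the field (e.g., a 10th digit).
 */
static int
parse_int9(const char *s, int *out)
{
	int  n = 0;

	/* %n stores the number of characters consumed so far; it does
	   not count towards the return value of sscanf(3). */
	if (sscanf(s, "%9d%n", out, &n) != 1)
		return -1;

	/* A leftover character means the field was truncated or was
	   followed by garbage.  This also rejects valid values such as
	   2000000000, which is the cost of this mitigation. */
	if (s[n] != '\0')
		return -1;

	return 0;
}
```

As the commit message says, this trades undefined behavior for false rejections: a perfectly valid value such as 2000000000 no longer fits in the field.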
* Re: [PATCH] sscanf.3: Remove term 'deprecated', and expand BUGS
From: Alejandro Colomar @ 2023-12-06 16:36 UTC (permalink / raw)
To: linux-man, Zack Weinberg; +Cc: Lee Griffiths
Hi Zack,
On Wed, Dec 06, 2023 at 03:52:34PM +0100, Alejandro Colomar wrote:
> Several programmers have been confused about this use of 'deprecated'.
>
> Also, maximum field width can be used with these fields to mitigate the
> problem. Still, it's only a mitigation, since it limits the number of
> characters read, but that means an input of LONG_MAX+1 --which takes up
> the same number of characters as LONG_MAX-- would still cause UB; or
> one can limit that to well below the limit of UB, but then you
> artificially invalidate valid input. No good way to avoid UB with
> sscanf(3), but it's not necessarily bad with trusted input (and
> strtol(3) isn't the panacea either; strtoi(3) is good, though, but not
> standard).
>
> Try to be more convincing in BUGS instead.
>
> Link: <https://stackoverflow.com/questions/77601832/man-sscanf-d-is-deprecated-in-c-or-glibc/>
> Cc: Lee Griffiths <poddster@gmail.com>
> Cc: Zack Weinberg <zack@owlfolio.org>
> Signed-off-by: Alejandro Colomar <alx@kernel.org>
> ---
>
> Hi Lee!
>
> Thanks for the report. After seeing how much frustration it has caused,
> I propose this change. Does it look good to you?
>
> Thanks,
> Alex
Formatted page:
BUGS
Numeric conversion specifiers
Use of the numeric conversion specifiers produces Undefined Be‐
havior for invalid input. See C11 7.21.6.2/10. This is a bug in
the ISO C standard, and not an inherent design issue with the
API. However, current implementations are not safe from that
bug, so it is not recommended to use them. Instead, programs
should use functions such as strtol(3) to parse numeric input.
This manual page deprecates use of the numeric conversion speci‐
fiers until they are fixed by ISO C.
I think it would be good if glibc made promises about sscanf(3)
on untrusted input. How about guaranteeing a value of -1 and ERANGE if
the integer would overflow?
The current implementation, AFAIK, uses strtol(3), so it has the
following behavior:
- For %d, if the value is >INT_MAX but <=LONG_MAX, the wrap-around
value is stored, and errno is not set.
- For %d, if the value is >LONG_MAX, -1 is stored, and errno is set.
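To make the mechanism concrete, the following is a model of the behavior listed above, not glibc's actual code. It assumes an LP64 platform (64-bit `long`, 32-bit `int`) and GCC's implementation-defined narrowing conversion, which keeps the low-order bits; `model_percent_d()` is a hypothetical name.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Model (hypothetical; not glibc's source) of the %d behavior described
 * above: convert with strtol(3), then narrow the long to int.  Assumes
 * LP64 and GCC's narrowing, which keeps the low 32 bits.
 *
 *  - Values in (INT_MAX, LONG_MAX] wrap around; errno stays 0.
 *  - Values above LONG_MAX make strtol(3) return LONG_MAX and set
 *    ERANGE; the low 32 bits of LONG_MAX are all ones, i.e., -1.
 */
static int
model_percent_d(const char *s)
{
	long  l;

	errno = 0;
	l = strtol(s, NULL, 10);
	return (int) l;  /* implementation-defined if out of range */
}
```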
$ cat sscanf.c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#define wrap(s) do \
{ \
	int i, ret; \
	\
	errno = 0; \
	ret = sscanf(s, "%d", &i); \
	printf("%s: ret= %d, val= %d, errno= %s\n", #s , ret, i, strerrorname_np(errno)); \
} while (0)

int
main(void)
{
	char str_a[] = "9223372036854775828"; // 2^63 + 20
	char str_s[] = "8589934599"; // 2^33 + 7
	char str_d[] = "4294967290"; // 2^32 - 6
	char str_f[] = "2147483678"; // 2^31 + 30
	char str_g[] = "2147483638"; // 2^31 - 10

	wrap(str_a);
	wrap(str_s);
	wrap(str_d);
	wrap(str_f);
	wrap(str_g);
}
$ cc -Wall -Wextra sscanf.c
$ ./a.out
str_a: ret= 1, val= -1, errno= ERANGE
str_s: ret= 1, val= 7, errno= 0
str_d: ret= 1, val= -6, errno= 0
str_f: ret= 1, val= -2147483618, errno= 0
str_g: ret= 1, val= 2147483638, errno= 0
The suggested change would be to act as if
	strtoi(str, NULL, 10, INT_MIN, INT_MAX, &err);
had been called (with base 0 instead of 10 for %i). Does that make sense to you?
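A rough sketch of the proposed semantics (clamping to the `int` range and reporting `ERANGE`, the way strtoi(3bsd) with bounds `INT_MIN` and `INT_MAX` would) can be written portably on top of strtol(3). This is a hypothetical illustration of the proposal, not current glibc behavior, and `scan_int_clamped()` is an invented name. With it, every out-of-range input from the program above would yield `INT_MAX` and `ERANGE` instead of a wrapped value.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical sketch of the proposed %d semantics: parse a base-10
 * integer and clamp it to [INT_MIN, INT_MAX].  On overflow, *err is set
 * to ERANGE and the clamped value is returned; otherwise *err is 0.
 * (Detection of "no digits" is omitted to keep the sketch short.)
 */
static int
scan_int_clamped(const char *s, int *err)
{
	long  l;

	errno = 0;
	l = strtol(s, NULL, 10);
	*err = (errno == ERANGE) ? ERANGE : 0;  /* overflow of long itself */

	if (l > INT_MAX) {
		*err = ERANGE;
		return INT_MAX;
	}
	if (l < INT_MIN) {
		*err = ERANGE;
		return INT_MIN;
	}
	return (int) l;
}
```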
Also, I was going to ask for strtoi(3bsd) in glibc, since strtol(3)
isn't easy to use portably (since POSIX allows EINVAL on no conversion,
how to differentiate strtoi(3bsd)'s ECANCELED from EINVAL in strtol(3)?).
Thanks,
Alex
--
<https://www.alejandro-colomar.es/>
* Re: [PATCH] sscanf.3: Remove term 'deprecated', and expand BUGS
From: Matthew House @ 2023-12-06 18:33 UTC (permalink / raw)
To: Alejandro Colomar; +Cc: linux-man, Zack Weinberg, Lee Griffiths
On Wed, Dec 6, 2023 at 11:36 AM Alejandro Colomar <alx@kernel.org> wrote:
> Also, I was going to ask for strtoi(3bsd) in glibc, since strtol(3)
> isn't easy to use portably (since POSIX allows EINVAL on no conversion,
> how to differentiate strtoi(3bsd)'s ECANCELED from EINVAL in strtol(3)?).
I feel like this is rather overstating the difficulty. In practice, the
no-conversion condition is very commonly detected by checking whether
*endptr == nptr after the call. The usual idiom I see is something like:
	char *end;

	errno = 0;
	value = strtol(ptr, &end, 10);
	if (end == ptr || *end != '\0' || errno == ERANGE)
		goto err;
Of course, the *end != '\0' condition can be omitted or adapted as
necessary. Alternatively, one can avoid checking errno at all, by just
checking whether the value is in the permitted range, since the saturating
behavior will make such a check reject on overflow. And even without an
explicit permitted range, one can just reject on value == LONG_MIN ||
value == LONG_MAX, or just on value == ULONG_MAX for strtoul(3); rejecting
a value that's almost an overflow isn't going to harm anything, except for
the rare scenarios where a printed integer can actually reach the minimum
or maximum, but needs to be round-tripped unconditionally.
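A self-contained version of the idiom above, together with the errno-free variant that relies on saturation, might look like this. Both function names are hypothetical, and the base is fixed at 10, as in the message.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: the common idiom.  Returns 0 on success and
   -1 on no digits, trailing garbage, or overflow. */
static int
parse_long_idiom(const char *s, long *out)
{
	char  *end;
	long  value;

	errno = 0;
	value = strtol(s, &end, 10);
	if (end == s || *end != '\0' || errno == ERANGE)
		return -1;

	*out = value;
	return 0;
}

/* Hypothetical helper: no errno check.  Because strtol(3) saturates to
   LONG_MIN/LONG_MAX on overflow, rejecting those two values rejects
   every overflow, at the cost of the two extreme values themselves. */
static int
parse_long_no_errno(const char *s, long *out)
{
	char  *end;
	long  value;

	value = strtol(s, &end, 10);
	if (end == s || *end != '\0')
		return -1;
	if (value == LONG_MIN || value == LONG_MAX)
		return -1;

	*out = value;
	return 0;
}
```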
In general, I don't think most programmers are in the habit of carefully
distinguishing errno values for <stdlib.h> functions. They'd rather check
for self-explanatory conditions, such as *endptr == nptr, that readers
don't have to refer to the man page to decipher. There's a reason that most
high-level language bindings return errno values for file I/O but not for
anything else.
Thank you,
Matthew House
* Re: [PATCH] sscanf.3: Remove term 'deprecated', and expand BUGS
From: Alejandro Colomar @ 2023-12-06 20:17 UTC (permalink / raw)
To: Matthew House; +Cc: linux-man, Zack Weinberg, Lee Griffiths
Hi Matthew,
On Wed, Dec 06, 2023 at 01:33:50PM -0500, Matthew House wrote:
> On Wed, Dec 6, 2023 at 11:36 AM Alejandro Colomar <alx@kernel.org> wrote:
> > Also, I was going to ask for strtoi(3bsd) in glibc, since strtol(3)
> > isn't easy to use portably (since POSIX allows EINVAL on no conversion,
> > how to differentiate strtoi(3bsd)'s ECANCELED from EINVAL in strtol(3)?).
>
> I feel like this is rather overstating the difficulty. In practice, the
> no-conversion condition is very commonly detected by checking whether
> *endptr == nptr after the call. The usual idiom I see is something like:
>
> char *end;
> errno = 0;
> value = strtol(ptr, &end, 10);
> if (end == ptr || *end != '\0' || errno == ERANGE)
That test could trigger UB if you passed an unsupported base. Of
course, in this case you pass 10, but what if the base was a
user-controlled variable? In such a case, nothing says what happens to
'end' (experimentally, I see it is not modified, so it would be left
uninitialized); so dereferencing it, or even comparing it, would be UB.
> goto err;
Yeah, if you just don't care and want to handle all errors in the same
way, and you know the base is supported, this is correct.
But what happens when you want to differentiate the different errors?
Let's list the possible errors, as per strtoi(3bsd):
ERRORS
[ECANCELED] The string did not contain any characters that
were converted.
[EINVAL] The base is not between 2 and 36 and does not
contain the special value 0.
[ENOTSUP] The string contained non‐numeric characters
that did not get converted. In this case,
endptr points to the first unconverted charac‐
ter.
[ERANGE] The given string was out of range; the value
converted has been clamped; or the range given
was invalid, i.e. lo > hi.
Let's see how strtol(3) handles these:
ECANCELED:
strtol(3) has `end == ptr`, though POSIX also allows EINVAL. And make sure you
pass a supported base.
EINVAL:
strtol(3) has EINVAL. But what happens to end? It could be left
unmodified (current glibc behavior); or could be set to ptr, since none
of the string has been read. If the former, it's easy to trigger UB.
If the latter, it is indistinguishable from ECANCELED.
ENOTSUP:
strtol(3) has `*end != '\0'`. But make sure you pass a supported base,
or buy a protector for nasal demons.
ERANGE:
strtol(3) has ERANGE; same as strtoi().
In the end, it amounts to saying: "the behavior of strtol(3) is
undefined if the base is unsupported; don't bother to test EINVAL: don't
trigger it". Which is fine, but we need to clarify that, because if
someone actually needs to use a non-standard base, they should be very
careful, and set end=NULL before the call (but there are no guarantees
that end is not modified either, so...). Or better, provide strtoi(3)
and compare (err != 0), or (err != 0 && err != E***) if you explicitly
allow some error.
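Until something like strtoi(3bsd) is available in glibc, a wrapper with similar error reporting can be layered on strtol(3). The sketch below is hypothetical: `strtoi_sketch()` is an invented name, it uses `long` rather than `intmax_t`, and it is not the NetBSD or libbsd implementation (in particular, the precedence between errors is this sketch's own choice). It checks the base before calling strtol(3), so `end` is never inspected after an unsupported base, and it reports the error codes listed above through `*rstatus`.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical strtoi(3bsd)-like wrapper over strtol(3).  *rstatus
 * receives 0 or the first applicable of:
 *   EINVAL     unsupported base (checked before calling strtol(3))
 *   ERANGE     lo > hi
 *   ECANCELED  no characters were converted
 *   ERANGE     value outside [lo, hi] (the result is clamped)
 *   ENOTSUP    trailing characters after the number
 */
static long
strtoi_sketch(const char *nptr, char **endptr, int base,
              long lo, long hi, int *rstatus)
{
	char  *end;
	long  val;
	int   saved_errno = errno;

	if (endptr == NULL)
		endptr = &end;

	if (base != 0 && (base < 2 || base > 36)) {
		*endptr = (char *) nptr;  /* nothing was read */
		*rstatus = EINVAL;
		return 0;
	}
	if (lo > hi) {
		*endptr = (char *) nptr;
		*rstatus = ERANGE;
		return 0;
	}

	errno = 0;
	val = strtol(nptr, endptr, base);

	if (*endptr == nptr)
		*rstatus = ECANCELED;
	else if (errno == ERANGE || val < lo || val > hi)
		*rstatus = ERANGE;
	else if (**endptr != '\0')
		*rstatus = ENOTSUP;
	else
		*rstatus = 0;

	errno = saved_errno;  /* callers only need to look at *rstatus */

	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}
```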
>
> Of course, the *end != '\0' condition can be omitted or adapted as
> necessary. Alternatively, one can avoid checking errno at all, by just
> checking whether the value is in the permitted range, since the saturating
> behavior will make such a check reject on overflow. And even without an
> explicit permitted range, one can just reject on value == LONG_MIN ||
> value == LONG_MAX, or just on value == ULONG_MAX for strtoul(3); rejecting
> a value that's almost an overflow isn't going to harm anything, except for
> the rare scenarios where a printed integer can actually reach the minimum
> or maximum, but needs to be round-tripped unconditionally.
>
> In general, I don't think most programmers are in the habit of carefully
> distinguishing errno values for <stdlib.h> functions. They'd rather check
> for self-explanatory conditions, such as *endptr == nptr, that readers
> don't have to refer to the man page to decipher. There's a reason that most
> high-level language bindings return errno values for file I/O but not for
> anything else.
>
> Thank you,
> Matthew House
--
<https://www.alejandro-colomar.es/>
* Re: [PATCH] sscanf.3: Remove term 'deprecated', and expand BUGS
From: Matthew House @ 2023-12-06 20:45 UTC (permalink / raw)
To: Alejandro Colomar; +Cc: linux-man, Zack Weinberg, Lee Griffiths
On Wed, Dec 6, 2023 at 3:18 PM Alejandro Colomar <alx@kernel.org> wrote:
> Hi Matthew,
>
> On Wed, Dec 06, 2023 at 01:33:50PM -0500, Matthew House wrote:
> > I feel like this is rather overstating the difficulty. In practice, the
> > no-conversion condition is very commonly detected by checking whether
> > *endptr == nptr after the call. The usual idiom I see is something like:
> >
> > char *end;
> > errno = 0;
> > value = strtol(ptr, &end, 10);
> > if (end == ptr || *end != '\0' || errno == ERANGE)
>
> That test could trigger UB, if you passed an unsupported base. Of
> course, in this case you pass 10, but what if the base was a
> user-controlled variable? In such a case, nothing says what happens to
> 'end' (experimentally, I see it is not modified, so it would be left
> uninitialized); so dereferencing it, or even comparing it, would be UB.
>
> > goto err;
>
> Yeah, if you just don't care and want to handle all errors in the same
> way, and you know the base is supported, this is correct.
The practical answer is that the base is never ultimately a user-controlled
variable. Sometimes people define wrapper functions with a variable base,
but that base is still ultimately fixed by all its callers. If you disagree
with this, I challenge you to name a single example.
The theoretical answer is that you can just replace (errno == ERANGE) with
(errno != 0), or just (errno), if you still don't care about distinguishing
a base error. If you do care about distinguishing a base error, you can
just check its value directly, which, as I said, most people prefer over
trying to decode different funnily-named values of errno in my experience.
	if (!(base == 0 || (base >= 2 && base <= 36)))
		goto bad_base;

	char *end;

	errno = 0;
	value = strtol(ptr, &end, base);
	if (end == ptr)
		goto not_a_number;
	if (*end != '\0')
		goto trailing_garbage;
	if (errno == ERANGE)
		goto overflow_error;
	/* the last could also be, e.g., if (value < 0 || value > MAX_VALUE) */
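Filled in as a complete function, the approach above might look like the following sketch. The enum and the function name are hypothetical; the checks and their order follow the message, with `goto` targets turned into distinct return codes.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical result codes, one per failure mode in the message. */
enum parse_result {
	PARSE_OK,
	PARSE_BAD_BASE,
	PARSE_NOT_A_NUMBER,
	PARSE_TRAILING_GARBAGE,
	PARSE_OVERFLOW,
};

static enum parse_result
parse_long_base(const char *ptr, int base, long *out)
{
	char  *end;
	long  value;

	/* Validate the base ourselves, so strtol(3) never sees an
	   unsupported one and 'end' is always set by the call. */
	if (!(base == 0 || (base >= 2 && base <= 36)))
		return PARSE_BAD_BASE;

	errno = 0;
	value = strtol(ptr, &end, base);
	if (end == ptr)
		return PARSE_NOT_A_NUMBER;
	if (*end != '\0')
		return PARSE_TRAILING_GARBAGE;
	if (errno == ERANGE)
		return PARSE_OVERFLOW;  /* or an explicit range check */

	*out = value;
	return PARSE_OK;
}
```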
Thank you,
Matthew House
* Re: [PATCH] sscanf.3: Remove term 'deprecated', and expand BUGS
From: Matthew House @ 2023-12-06 20:54 UTC (permalink / raw)
To: Alejandro Colomar; +Cc: linux-man, Zack Weinberg, Lee Griffiths
On Wed, Dec 6, 2023 at 3:45 PM Matthew House <mattlloydhouse@gmail.com> wrote:
> The theoretical answer is that you can just replace (errno == ERANGE) with
> (errno != 0), or just (errno), if you still don't care about distinguishing
> a base error.
(And move the errno check up front, of course.)
* Re: [PATCH] sscanf.3: Remove term 'deprecated', and expand BUGS
From: Alejandro Colomar @ 2023-12-06 21:12 UTC (permalink / raw)
To: Matthew House; +Cc: linux-man, Zack Weinberg, Lee Griffiths
Hi Matthew,
On Wed, Dec 06, 2023 at 03:45:19PM -0500, Matthew House wrote:
> On Wed, Dec 6, 2023 at 3:18 PM Alejandro Colomar <alx@kernel.org> wrote:
> > Hi Matthew,
> >
> > On Wed, Dec 06, 2023 at 01:33:50PM -0500, Matthew House wrote:
> > > I feel like this is rather overstating the difficulty. In practice, the
> > > no-conversion condition is very commonly detected by checking whether
> > > *endptr == nptr after the call. The usual idiom I see is something like:
> > >
> > > char *end;
> > > errno = 0;
> > > value = strtol(ptr, &end, 10);
> > > if (end == ptr || *end != '\0' || errno == ERANGE)
> >
> > That test could trigger UB, if you passed an unsupported base. Of
> > course, in this case you pass 10, but what if the base was a
> > user-controlled variable? In such a case, nothing says what happens to
> > 'end' (experimentally, I see it is not modified, so it would be left
> > uninitialized); so dereferencing it, or even comparing it, would be UB.
> >
> > > goto err;
> >
> > Yeah, if you just don't care and want to handle all errors in the same
> > way, and you know the base is supported, this is correct.
>
> The practical answer is that the base is never ultimately a user-controlled
> variable. Sometimes people define wrapper functions with a variable base,
> but that base is still ultimately fixed by all its callers. If you disagree
> with this, I challenge you to name a single example.
Agreed. But then the manual shouldn't suggest that it's fine to test for
EINVAL. It would be fine to test beforehand, though:
	errno = 0;
	strtol("0", NULL, base);
	if (errno == EINVAL)
		goto bad;
	// Now we can work with that base.
	...
	errno = 0;
	val = strtol(str, &end, base);
	if (end == str)
		goto nan;
	if (errno == ERANGE || val < min || val > max)
		goto bignum;
	if (*end != '\0')
		goto garbage;
I think this example would be an improvement over the current page.
Still, strtoi() is simpler to use in the general case:
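	errno = 0;
	// Allow trailing text (ENOTSUP); reject any other error.
	val1 = strtoi(str, &end, base, min, max, &err);
	if (err != 0 && err != ENOTSUP)
		goto err;
	// Reject every error, including trailing text.
	val2 = strtoi(str, &end, base, min, max, &err);
	if (err != 0)
		goto err;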
But yeah, this is something you can pull from libbsd, or write your own,
after taking into consideration the thing about EINVAL from above.
Cheers,
Alex
--
<https://www.alejandro-colomar.es/>
* Fwd: [PATCH] sscanf.3: Remove term 'deprecated', and expand BUGS
From: Lee Griffiths @ 2023-12-07 21:50 UTC (permalink / raw)
To: linux-man
(repost to mailing list, as my previous message attempt looked like
plain-text but was actually html)
> Hi Lee!
> Thanks for the report. After seeing how much frustration it has caused,
> I propose this change. Does it look good to you?
I don't wish to bike-shed this (as the current man-page is fine by me)
and I have no idea on the style guide used by the man-pages, but if I
was making the change I would replace the 'deprecated' on every
integer specifier with "CAVEAT: SEE BUGS". That way the inexperienced
reader is still frightened into using the function carefully. But if
that kind of thing isn't allowed then the proposed patch looks good to
me.
As a general point: A _lot_ of inexperienced users use this function
to parse user input. At the start of every semester you see an influx
of "why is my use of scanf broken?" posts on the various C and
learn-programming based subreddits, as well as Stackoverflow. I have
no idea why but it seems there's a large body of professors out there
teaching people to use scanf() instead of getc() or fgets() etc, so
I'm of the opinion that the scanf() page needs to be as scary as
possible :)
Again, I know nothing about how man pages are written, but if it was
documentation for legacy code I'd inherited, I'd make sure to stress
the following somewhere on the page:
1. scanf() is intended to parse FORMATTED input, i.e. it consumes the
kind of strings produced by printf(), and NOT user input. (I'm not
100% sure if K&R had that as their rationale, but that's the way it's
designed now. Though this might confuse people into thinking they can
use their similar, but not identical, format strings between printf
and scanf!). Currently the word "format" or "formatted" barely
appears. But it's this feature that distinguishes it from the other
parsing functions.
2. Things like fgets() are much better for consuming user input, which
you can then parse with all the other functions.
Thanks,
Lee Griffiths
* Re: Fwd: [PATCH] sscanf.3: Remove term 'deprecated', and expand BUGS
From: Alejandro Colomar @ 2023-12-09 11:55 UTC (permalink / raw)
To: Lee Griffiths; +Cc: linux-man
On Thu, Dec 07, 2023 at 09:50:35PM +0000, Lee Griffiths wrote:
> (repost to mailing list, as my previous message attempt looked like
> plain-text but was actually html)
>
>
>
> > Hi Lee!
>
> > Thanks for the report. After seeing how much frustration it has caused,
> > I propose this change. Does it look good to you?
>
> I don't wish to bike-shed this (as the current man-page is fine by me)
> and I have no idea on the style guide used by the man-pages, but if I
> was making the change I would replace the 'deprecated' on every
> integer specifier with "CAVEAT: SEE BUGS". That way the inexperienced
> reader is still frightened into using the function carefully. But if
> that kind of thing isn't allowed then the proposed patch looks good to
> me.
We could do that kind of thing. There are pages where the first line in
the DESCRIPTION is something like 'Never use this function.' (that
exact text appears in gets(3)).
>
> As a general point: A _lot_ of inexperienced users use this function
> to parse user input. At the start of every semester you see an influx
> of "why is my use of scanf broken?" posts on the various C and
> learn-programming based subreddits, as well as Stackoverflow.
Not exactly. This page is only about sscanf(3), which is not as bad as
scanf(3).
For scanf(3), I've re-read the page after these discussions, and have
added some more text, documenting some of the problems:
- <https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=4ea602c6ab2716c00d189d28199a9236180d2145>
- <https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=8c3bd620bca7de41c9d3e28d73f09ec88fd52a86>
> I have
> no idea why but it seems there's a large body of professors out there
> teaching people to use scanf() instead of getc() or fgets() etc, so
> I'm of the opinion that the scanf() page needs to be as scary as
> possible :)
My guess is that the old manual page wasn't scary enough (if at all).
I've done a few steps to try to prevent that.
I split [f]scanf(3) from sscanf(3). The latter is not so bad, since it
doesn't need to differentiate newlines from other white space, and it
doesn't leave the unrecognized text in the input stream.
So, the new page for sscanf(3) is what documents the conversions and
all that, and the new page for scanf(3) (and fscanf(3)) is shorter
and just recommending avoiding these functions at all (but still
referring to sscanf(3) for documentation of the conversions).
>
> Again, I know nothing about how man pages are written, but if it was
> documentation for legacy code I'd inherited, I'd make sure to stress
> the following somewhere on the page:
We have man-pages(7) with a small style guide.
> 1. scanf() is intended to parse FORMATTED input, i.e. it consumes the
> kind of strings produced by printf(), and NOT user input. (I'm not
> 100% sure if K&R had that as their rationale, but that's the way it's
> designed now. Though this might confuse people into thinking they can
> use their similar, but not identical, format strings between printf
> and scanf!). Currently the word "format" or "formatted" barely
> appears. But it's this feature that distinguishes it from the other
> parsing functions.
Agreed. I've added this commit:
<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=bb4dbdb82f141f6394984aced67d65810ec7f747>
> 2. Things like fgets() are much better for consuming user input, which
> you can then parse with all the other functions.
That's already specified in scanf(3), in the first paragraph:
DESCRIPTION
The scanf() family of functions scans input like sscanf(3), but
reads from a FILE. It is very difficult to use these functions
correctly, and it is preferable to read entire lines with
fgets(3) or getline(3) and parse them later with sscanf(3) or
more specialized functions such as strtol(3).
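A minimal sketch of that recommendation, reading one line with fgets(3) and then parsing it with strtol(3), is shown below. The name `read_long_line()` and the 64-byte buffer are assumptions made for illustration; a line that does not fit in the buffer is rejected, and the rest of it is left in the stream.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: read one line from fp and parse it as a base-10
 * long.  Returns 0 on success, or -1 on EOF, read error, over-long
 * line, or malformed number.
 */
static int
read_long_line(FILE *fp, long *out)
{
	char  buf[64];
	char  *nl, *end;
	long  value;

	if (fgets(buf, sizeof(buf), fp) == NULL)
		return -1;  /* EOF or read error */

	/* fgets(3) keeps the newline.  If there is none, either the line
	   did not fit in buf (reject; the rest stays unread), or it was
	   the last line and had no trailing newline (accept). */
	nl = strchr(buf, '\n');
	if (nl != NULL)
		*nl = '\0';
	else if (!feof(fp))
		return -1;

	errno = 0;
	value = strtol(buf, &end, 10);
	if (end == buf || *end != '\0' || errno == ERANGE)
		return -1;

	*out = value;
	return 0;
}
```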
Thanks,
Alex
--
<https://www.alejandro-colomar.es/>