Hi Zack,
On Wed, Dec 06, 2023 at 03:52:34PM +0100, Alejandro Colomar wrote:
> Several programmers have been confused about this use of 'deprecated'.
>
> Also, maximum field width can be used with these fields to mitigate the
> problem. Still, it's only a mitigation, since it limits the number of
> characters read, but that means an input of LONG_MAX+1 --which takes up
> the same number of characters as LONG_MAX-- would still cause UB; or
> one can limit that to well below the limit of UB, but then you
> artificially invalidate valid input. No good way to avoid UB with
> sscanf(3), but it's not necessarily bad with trusted input (and
> strtol(3) isn't the panacea either; strtoi(3) is good, though, but not
> standard).
>
> Try to be more convincing in BUGS instead.
>
> Link:
> Cc: Lee Griffiths
> Cc: Zack Weinberg
> Signed-off-by: Alejandro Colomar
> ---
>
> Hi Lee!
>
> Thanks for the report. After seeing how much frustration it has caused,
> I propose this change. Does it look good to you?
>
> Thanks,
> Alex
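For reference, the maximum-field-width mitigation described in the commit message could be sketched as follows. This is only an illustration, assuming a 32-bit int; scan_int_w9() is a hypothetical helper name, not part of any library. With "%9d" at most 9 characters are consumed, so the conversion cannot overflow, but a valid 10-digit value is cut short; "%n" is used to notice the leftover digit.

```c
#include <ctype.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse an int while reading at most 9 characters.
 * Assumes a 32-bit int, so any 9-character field fits without overflow.
 * Returns 1 on success, 0 if nothing was converted, and -1 if digits
 * were left over because the field width cut the number short.
 */
static int
scan_int_w9(const char *s, int *out)
{
    int  n = 0;

    if (sscanf(s, "%9d%n", out, &n) != 1)
        return 0;
    if (isdigit((unsigned char) s[n]))
        return -1;  /* the number was longer than the field */
    return 1;
}
```

With this helper, "2147483647" (INT_MAX on such a system) is reported as -1 even though it is a valid int, which is the "artificially invalidate valid input" trade-off.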
Formatted page:
BUGS
Numeric conversion specifiers
Use of the numeric conversion specifiers produces Undefined Be‐
havior for invalid input. See C11 7.21.6.2/10. This is a bug in
the ISO C standard, and not an inherent design issue with the
API. However, current implementations are not safe from that
bug, so it is not recommended to use them. Instead, programs
should use functions such as strtol(3) to parse numeric input.
This manual page deprecates use of the numeric conversion speci‐
fiers until they are fixed by ISO C.
I think it would be good if glibc made guarantees about the behavior of
sscanf(3) on untrusted input. How about guaranteeing a value of -1 and
ERANGE if the integer would overflow?
The current implementation, AFAIK, uses strtol(3), so it has the
following behavior:
- For %d, if the value is >INT_MAX but <=LONG_MAX, the wrap-around
  value is stored, and errno is not set.
- For %d, if the value is >LONG_MAX, -1 is stored, and errno is set to
  ERANGE.
$ cat sscanf.c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define wrap(s)  do \
{ \
    int  i, ret; \
 \
    errno = 0; \
    ret = sscanf(s, "%d", &i); \
    printf("%s: ret= %d, val= %d, errno= %s\n", #s , ret, i, strerrorname_np(errno)); \
} while (0)

int
main(void)
{
    char  str_a[] = "9223372036854775828";  // 2^63 + 20
    char  str_s[] = "8589934599";           // 2^33 + 7
    char  str_d[] = "4294967290";           // 2^32 - 6
    char  str_f[] = "2147483678";           // 2^31 + 30
    char  str_g[] = "2147483638";           // 2^31 - 10

    wrap(str_a);
    wrap(str_s);
    wrap(str_d);
    wrap(str_f);
    wrap(str_g);
}
$ cc -Wall -Wextra sscanf.c
$ ./a.out
str_a: ret= 1, val= -1, errno= ERANGE
str_s: ret= 1, val= 7, errno= 0
str_d: ret= 1, val= -6, errno= 0
str_f: ret= 1, val= -2147483618, errno= 0
str_g: ret= 1, val= 2147483638, errno= 0
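These results are consistent with a 64-bit long being truncated to a 32-bit int. A minimal sketch of that arithmetic, assuming LP64 and two's complement; low32() is a hypothetical name used only for illustration, not something glibc calls internally:

```c
#include <stdint.h>

/*
 * Hypothetical helper: keep the low 32 bits of a 64-bit value and
 * reinterpret them as a signed 32-bit integer.
 */
static int32_t
low32(int64_t v)
{
    uint32_t  u = (uint32_t) v;  /* reduce modulo 2^32 */

    if (u > INT32_MAX)
        return (int32_t) ((int64_t) u - INT64_C(4294967296));
    return (int32_t) u;
}
```

For str_a, strtol(3) saturates at LONG_MAX (2^63 - 1) and sets ERANGE; the low 32 bits of that value are all ones, hence -1. For str_s, (2^33 + 7) modulo 2^32 is 7.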
The suggested change would be to act as if

    strtoi(str, NULL, 0, INT_MIN, INT_MAX, &err);

had been called. Does that make sense to you?
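A rough sketch of those semantics, written on top of strtol(3) so that it builds without libbsd. scan_int() is a hypothetical name. Base 10 is assumed here because %d is a decimal conversion. Out-of-range values are clamped, as strtoi(3bsd) does, rather than stored as -1. This is not glibc's implementation.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a decimal int from s.
 * Stores the value, clamped to [INT_MIN, INT_MAX], in *out and returns
 * a status: 0 on success, ECANCELED if no digits were converted, or
 * ERANGE if the value did not fit in an int.
 * Trailing characters are ignored, as sscanf(3) would leave them unread.
 */
static int
scan_int(const char *s, int *out)
{
    char  *end;
    long  l;
    int   saved_errno, status;

    saved_errno = errno;
    errno = 0;
    l = strtol(s, &end, 10);
    status = (errno == ERANGE) ? ERANGE : 0;
    errno = saved_errno;

    if (end == s)
        return ECANCELED;  /* judged by endptr, not by EINVAL */

    if (l > INT_MAX) {
        l = INT_MAX;
        status = ERANGE;
    } else if (l < INT_MIN) {
        l = INT_MIN;
        status = ERANGE;
    }
    *out = (int) l;
    return status;
}
```

With the five test strings above, the first four would yield INT_MAX with ERANGE, and only str_g would be stored unchanged with a status of 0.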
Also, I was going to ask for strtoi(3bsd) in glibc, since strtol(3)
isn't easy to use portably (since POSIX allows EINVAL on no conversion,
how would one differentiate strtoi(3bsd)'s ECANCELED from EINVAL in
strtol(3)?).
Thanks,
Alex
--