From: Bruno Haible <bruno@clisp.org>
To: Alejandro Colomar <alx@kernel.org>
Cc: liba2i@lists.linux.dev, sc22wg14@open-std.org,
libbsd@lists.freedesktop.org, tech-misc@netbsd.org,
christos <christos@netbsd.org>,
"Đoàn Trần Công Danh" <congdanhqx@gmail.com>,
"Paul Eggert" <eggert@cs.ucla.edu>,
"Eli Schwartz" <eschwartz93@gmail.com>,
"Guillem Jover" <guillem@hadrons.org>,
"Iker Pedrosa" <ipedrosa@redhat.com>,
"Michael Vetter" <jubalh@iodoru.org>,
"Robert Elz" <kre@netbsd.org>,
riastradh@netbsd.org, "Sam James" <sam@gentoo.org>,
"Serge E. Hallyn" <serge@hallyn.com>
Subject: Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
Date: Thu, 20 Mar 2025 15:26:46 +0100 [thread overview]
Message-ID: <6580350.j6PcuT4dK6@nimes> (raw)
In-Reply-To: <x34jdlyb6gf6asojjhzwpol5schyz4llk2ebrtzpb7ryje52u6@smbvzpoqcf6v>
Alejandro Colomar wrote:
> > 2) nproc.c, omp_init.c
> >
> > > > gnulib/lib/nproc.c:402
> > >
> > > lib/nproc.c-383-/* Parse OMP environment variables without dependence on OMP.
> > > lib/nproc.c-384- Return 0 for invalid values. */
> > > lib/nproc.c-385-static unsigned long int
> > > lib/nproc.c:386:parse_omp_threads (char const* threads)
> > > lib/nproc.c-387-{
> > >
> > > ...
> > >
> > > lib/nproc.c-398- /* Convert it from positive decimal to 'unsigned long'. */
> > > lib/nproc.c-399- if (c_isdigit (*threads))
> > > lib/nproc.c-400- {
> > > lib/nproc.c-401- char *endptr = NULL;
> > > lib/nproc.c:402: unsigned long int value = strtoul (threads, &endptr, 10);
> > > lib/nproc.c-403-
> > > lib/nproc.c-404- if (endptr != NULL)
> > > lib/nproc.c-405- {
> > > lib/nproc.c-406- while (*endptr != '\0' && c_isspace (*endptr))
> > > lib/nproc.c-407- endptr++;
> > > lib/nproc.c-408- if (*endptr == '\0')
> > > lib/nproc.c-409- return value;
> > > lib/nproc.c-410- /* Also accept the first value in a nesting level,
> > > lib/nproc.c-411- since we can't determine the nesting level from env vars. */
> > > lib/nproc.c-412- else if (*endptr == ',')
> > > lib/nproc.c-413- return value;
> > > lib/nproc.c-414- }
> > > lib/nproc.c-415- }
> > > ...
> > > This is one case where you seem to silently ignore saturation.
> >
> > This is on purpose. A CPU never has more than 1000 processors. Therefore
> > all values > 1000 should be treated the same way, whether they are
> > <= ULONG_MAX or > ULONG_MAX. Clamping to the number of actually available
> > processors occurs later in the code.
>
> I guess you refer to the MIN() calls within num_processors(), right?
> Why do those clampings not result in diagnostics? Couldn't you
> calculate the limits before parsing the actual number, and so use them
> to perform the range checks during the strtou(3) call?
The essential code is like this:
unsigned long int omp_env_limit = ULONG_MAX;
if (query == NPROC_CURRENT_OVERRIDABLE)
{
omp_env_limit = parse_omp_threads (getenv ("OMP_THREAD_LIMIT"));
if (! omp_env_limit)
omp_env_limit = ULONG_MAX;
...
query = NPROC_CURRENT;
}
unsigned long nprocs = num_processors_ignoring_omp (query);
return MIN (nprocs, omp_env_limit);
and num_processors_ignoring_omp (query) is expensive to compute, since
it makes system calls. For this reason, we do the parsing of environment
variables first; this gives us the opportunity to optimize away the
system calls in particular circumstances.[1]
> > 3) msgfmt.c
> >
> > > > gettext/gettext-tools/src/msgfmt.c:287
> > >
> > > gettext-tools/src/msgfmt.c-286- char *endp;
> > > gettext-tools/src/msgfmt.c:287: size_t new_align = strtoul (optarg, &endp, 0);
> > > gettext-tools/src/msgfmt.c-288-
> > > gettext-tools/src/msgfmt.c-289- if (endp != optarg)
> > > gettext-tools/src/msgfmt.c-290- alignment = new_align;
> > > ...
> > > Why don't you have a diagnostic message for invalid input?
> >
> > Because this option is meant for users who know what an "alignment" is.
>
> But still, you don't need to actively ignore ERANGE, especially when it
> results in more complex code than actually checking. Once the API gives
> you the check for free, you should probably use it.
>
> In this case, I think I'd do
>
> int status;
> size_t new_align;
>
> new_align = strtou(optarg, NULL, 0, 1, 16, &status);
> if (status == 0)
> alignment = new_align;
>
> Don't you think this is more robust and just as simple?
> ...
> strtoul(3) makes it difficult for you to do the check,
> not because you actively want to not check.
Yes and no. Yes, it would be nice to have a range check at essentially
zero cost. No, clamping is not the right behaviour: an alignment of 5 or
17 should be rejected, not mapped to something valid.
> > 4) msgl-check.c
> >
> > > > gettext/gettext-tools/src/msgl-check.c:379
> > >
> > > gettext-tools/src/msgl-check.c-374- while (*nplurals != '\0' && c_isspace ((unsigned char) *nplurals))
> > > gettext-tools/src/msgl-check.c-375- ++nplurals;
> > > gettext-tools/src/msgl-check.c-376- endp = nplurals;
> > > gettext-tools/src/msgl-check.c-377- nplurals_value = 0;
> > > gettext-tools/src/msgl-check.c-378- if (*nplurals >= '0' && *nplurals <= '9')
> > > gettext-tools/src/msgl-check.c:379: nplurals_value = strtoul (nplurals, (char **) &endp, 10);
> > > gettext-tools/src/msgl-check.c-380- if (nplurals == endp)
> > > gettext-tools/src/msgl-check.c-381- {
> > > gettext-tools/src/msgl-check.c-382- const char *msg = _("invalid nplurals value");
> > > gettext-tools/src/msgl-check.c-383- char *help = plural_help (nullentry);
> > > gettext-tools/src/msgl-check.c-384-
> > > gettext-tools/src/msgl-check.c-385- if (help != NULL)
> > > gettext-tools/src/msgl-check.c-386- {
> > > gettext-tools/src/msgl-check.c-387- char *msgext = xasprintf ("%s\n%s", msg, help);
> > > gettext-tools/src/msgl-check.c-388- xeh->xerror (CAT_SEVERITY_ERROR, header, NULL, 0, 0, true,
> > > gettext-tools/src/msgl-check.c-389- msgext);
> > > gettext-tools/src/msgl-check.c-390- free (msgext);
> > > gettext-tools/src/msgl-check.c-391- free (help);
> > > gettext-tools/src/msgl-check.c-392- }
> > > gettext-tools/src/msgl-check.c-393- else
> > > gettext-tools/src/msgl-check.c-394- xeh->xerror (CAT_SEVERITY_ERROR, header, NULL, 0, 0, false,
> > > gettext-tools/src/msgl-check.c-395- msg);
> > > gettext-tools/src/msgl-check.c-396-
> > > gettext-tools/src/msgl-check.c-397- seen_errors++;
> > > gettext-tools/src/msgl-check.c-398- }
> >
> > > On the other hand, I also wonder why you don't diagnose invalid input.
> > > Why is -10 an "invalid nplurals value"
> >
> > "-10" designates a negative number. For the number of plural forms, that's
> > invalid.
> >
> > > but ULONG_MAX+10 is a valid (albeit clamped) one?
> >
> > Again, as mentioned above, values like 1000 and 10000000000000000 should
> > be treated the same way. Since ULONG_MAX+10 gets clamped to ULONG_MAX, that's
> > just fine.
>
> I see in line 433 a check
>
> if (min_nplurals < nplurals_value)
>
> And in line 449 a check
>
> else if (max_nplurals > nplurals_value)
>
> nplurals_value was parsed via strtoul(3) in line 379.
>
> And min_plurals and max_plurals were assigned in lines 293 and lines 298.
> Which makes me wonder: why don't you move the tests from lines 4** into
> the strtou(3) call?
>
> That is, why not call this?
>
> nplurals_value = strtou(nplurals, NULL, 10, min_nplurals, max_nplurals, &status);
>
> And then have the ERANGE handling there?
For two reasons:
- The program's logic is to parse simple things first and complicated things
afterwards. The user is likely to make a mistake in the simple things with
a lower probability than in the complicated things. When there is a
mismatch, the likely culprit is a mistake in the complicated things; the
diagnostics need to be directed at this scenario.
- It feels strange to move complicated validation logic into a library
function.
In summary, while the built-in range checks are welcome for things like
port numbers (0..65536), in this case it's better to keep the program's logic
explicit and situation-aware.
> In cases 2 and 4, I think you'd benefit from refactoring the code to use
> the limits in the strtou(3) call. Especially in case 4 where you
> already know the limits.
No. As the author of that code, I disagree.
Bruno
[1] https://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=205078f891d87e0e966ad8616fdf0306437f0370
next prev parent reply other threads:[~2025-03-20 14:27 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20250318142555.09A86356820@www.open-std.org>
2025-03-18 13:54 ` alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD Alejandro Colomar
2025-03-18 21:16 ` Alejandro Colomar
2025-03-18 21:53 ` Bruno Haible
2025-03-18 22:43 ` Alejandro Colomar
2025-03-19 0:15 ` Bruno Haible
2025-03-19 15:26 ` Alejandro Colomar
2025-03-19 18:48 ` Alejandro Colomar
2025-03-19 18:56 ` Alejandro Colomar
2025-03-19 21:59 ` Bruno Haible
2025-03-19 23:12 ` Alejandro Colomar
2025-03-19 23:30 ` strtou(3) handling of negative input Alejandro Colomar
2025-03-19 23:52 ` alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD Thorsten Glaser
2025-03-20 0:19 ` Alejandro Colomar
2025-03-20 0:31 ` Thorsten Glaser
2025-03-20 0:36 ` Alejandro Colomar
2025-03-19 23:52 ` nullability of status parameter in strtoi/u(3) Alejandro Colomar
2025-03-20 12:44 ` alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD Bruno Haible
2025-03-20 12:55 ` Alejandro Colomar
2025-03-20 17:18 ` Thorsten Glaser
2025-03-20 14:26 ` Bruno Haible [this message]
2025-03-20 14:54 ` Alejandro Colomar
2025-03-19 19:27 ` Paul Eggert
2025-03-19 20:05 ` Alejandro Colomar
2025-03-19 20:39 ` Paul Eggert
2025-03-19 21:23 ` Alejandro Colomar
2025-03-20 0:39 ` Paul Eggert
2025-03-20 1:15 ` Alejandro Colomar
2025-03-20 7:03 ` Paul Eggert
2025-03-20 10:32 ` Alejandro Colomar
2025-03-19 15:56 ` Thorsten Glaser
2025-03-19 16:25 ` Alejandro Colomar
2025-03-19 16:36 ` Thorsten Glaser
2025-03-19 16:53 ` Alejandro Colomar
2025-03-19 17:35 ` Bruno Haible
2025-03-19 18:01 ` Alejandro Colomar
2025-03-20 16:13 ` alx-0008r2 " Alejandro Colomar
2025-03-18 17:20 ` [SC22WG14.29900] alx-0008 " Joseph Myers
2025-03-18 20:18 ` Alejandro Colomar
[not found] ` <20250318201854.66AB5356895@www.open-std.org>
2025-03-18 21:11 ` [SC22WG14.29912] " Joseph Myers
2025-03-18 21:35 ` Alejandro Colomar
2025-03-18 21:40 ` Alejandro Colomar
2025-03-18 22:14 ` Joseph Myers
2025-03-18 22:49 ` Alejandro Colomar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6580350.j6PcuT4dK6@nimes \
--to=bruno@clisp.org \
--cc=alx@kernel.org \
--cc=christos@netbsd.org \
--cc=congdanhqx@gmail.com \
--cc=eggert@cs.ucla.edu \
--cc=eschwartz93@gmail.com \
--cc=guillem@hadrons.org \
--cc=ipedrosa@redhat.com \
--cc=jubalh@iodoru.org \
--cc=kre@netbsd.org \
--cc=liba2i@lists.linux.dev \
--cc=libbsd@lists.freedesktop.org \
--cc=riastradh@netbsd.org \
--cc=sam@gentoo.org \
--cc=sc22wg14@open-std.org \
--cc=serge@hallyn.com \
--cc=tech-misc@netbsd.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.