All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bruno Haible <bruno@clisp.org>
To: Alejandro Colomar <alx@kernel.org>
Cc: liba2i@lists.linux.dev, sc22wg14@open-std.org,
	libbsd@lists.freedesktop.org, tech-misc@netbsd.org,
	christos <christos@netbsd.org>,
	"Đoàn Trần Công Danh" <congdanhqx@gmail.com>,
	"Paul Eggert" <eggert@cs.ucla.edu>,
	"Eli Schwartz" <eschwartz93@gmail.com>,
	"Guillem Jover" <guillem@hadrons.org>,
	"Iker Pedrosa" <ipedrosa@redhat.com>,
	"Michael Vetter" <jubalh@iodoru.org>,
	"Robert Elz" <kre@netbsd.org>,
	riastradh@netbsd.org, "Sam James" <sam@gentoo.org>,
	"Serge E. Hallyn" <serge@hallyn.com>
Subject: Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
Date: Wed, 19 Mar 2025 22:59:26 +0100	[thread overview]
Message-ID: <4085563.2iPT33SAM4@nimes> (raw)
In-Reply-To: <jx4664ishtl34eg2npdrv5fkfdiczqnlq3vjuacjrupjvh377x@gddcftzgwmfq>

Hi Alejandro,

> > > > It would be useful to show how a success test looks like, after
> > > >     strtoi (s, &end, base, min, max, &status)
> > > > for each of the four frequent use-cases:
> > > >   -a. expect to parse the initial portion of the string, no coercion,
> > > >   -b. expect to parse the initial portion of the string, silent coercion,
> > > >   -c. expect to parse the entire string, no coercion,
> > > >   -d. expect to parse the entire string, silent coercion.
> > > > 
> > > > AFAICS, the success tests are:
> > > >   -a. status == 0 || status == ENOTSUP
> > > 
> > > Correct.
> > > 
> > > >   -b. status == 0 || status == ENOTSUP || status == ERANGE
> > > 
> > > Correct (but most likely a bug).
> 
> Actually, now I remember that status can be NULL, in which case it's not
> reported.  This is a case where you could check for errors with a
> simpler expression:
> 
> 	end != str
> 
> but (status == 0 || status == ENOTSUP || status == ERANGE) is still a
> reasnoable one.

Unfortunately, with a comment like this, you make things more complicated,
not simpler. I was hoping for an API where success can be determined by
looking at 'status' in all four cases, and the "simple" solution that you
are recommending now is:
  -a. look at status
  -b. look at end
  -c. look at status
  -d. look at status AND end.

> I need to update the specification to mention that status can be NULL.

Why do so? This adds text to the specification, making the specification
more complex (=> longer to understand, harder to remember). The ability
to pass NULL for rstatus is not a useful feature.

> > > >   -d. status == 0 || (status == ERANGE && end > s && *end == '\0')
> > > 
> > > You don't need end>s, because that would preclude ERANGE.
> > > 
> > > 	status == 0 || (status == ERANGE && end == '\0')
> > > 
> > > Aaand, most likely a bug.
> > 
> > Cases b. and d. are not bugs. Often, the programmer knows that treating
> > a value > ULONG_MAX is equivalent to treating the value ULONG_MAX. These
> > are *normal* uses of strto[u]l[l]. Often it is the programmer's intent
> > that the values "4294967297" and "4294967295" produce the same behaviour
> > (the same error message, for example).
> 
> If you want ULONG_MAX + 1 to be treated like ULONG_MAX, and both
> result in an error, then you should probably clamp at ULONG_MAX - 1,
> and consider anything above an error.

I said it in the paragraph above: Often the programmer wants a smooth
transition between "unreasonably large but not overflowing" values and
"overflow". So that only one diagnostic is needed for both cases.

1) getaddrinfo.c

> > Any use of strtoul that does not test errno wants overflow
> > to be mapped to ULONG_MAX, that is, is in case b. or d.
> > Just looking in gnulib and gettext, I find already 6 occurrences:
> >   gnulib/lib/getaddrinfo.c:299
> 
> lib/getaddrinfo.c-297-          if (!(*servname >= '0' && *servname <= '9'))
> lib/getaddrinfo.c-298-            return EAI_NONAME;
> lib/getaddrinfo.c:299:          port = strtoul (servname, &c, 10);
> lib/getaddrinfo.c-300-          if (*c || port > 0xffff)
> lib/getaddrinfo.c-301-            return EAI_NONAME;
> lib/getaddrinfo.c-302-          port = htons (port);
> 
> You could remove the preceding conditional if you don't want to avoid
> leading whitespace.

We don't want to allow leading whitespace here. We need to follow the
getaddrinfo() spec [1]. At the same time, disallowing a leading '-' sign
is a benefit as well. I consider it a misfeature that strtoul() parses
"-3" successfully and returns ULONG_MAX-2, which was most certainly
not intended by the user.

> You could merge that into the strtou(3) call, which
> would report ECANCELED for non-numeric input).  Except that a negative
> number is silently converted to a positive large value.  This is why I
> use a wrapper function strtou_noneg() that rejects negative numbers.

Ouch. Yet another problem with strtoul(). That would be worth another
error code in the specification of strtou().

> I think this is not one case where you want silent saturation.  You're
> indeed doing range checks [0, UINT16_MAX].

Yes, in this particular case, the ability to pass an arbitrary min and max
to strtou() is a practical feature.

2) nproc.c, omp_init.c

> >   gnulib/lib/nproc.c:402
> 
> lib/nproc.c-383-/* Parse OMP environment variables without dependence on OMP.
> lib/nproc.c-384-   Return 0 for invalid values.  */
> lib/nproc.c-385-static unsigned long int
> lib/nproc.c:386:parse_omp_threads (char const* threads)
> lib/nproc.c-387-{
> 
> ...
> 
> lib/nproc.c-398-  /* Convert it from positive decimal to 'unsigned long'.  */
> lib/nproc.c-399-  if (c_isdigit (*threads))
> lib/nproc.c-400-    {
> lib/nproc.c-401-      char *endptr = NULL;
> lib/nproc.c:402:      unsigned long int value = strtoul (threads, &endptr, 10);
> lib/nproc.c-403-
> lib/nproc.c-404-      if (endptr != NULL)
> lib/nproc.c-405-        {
> lib/nproc.c-406-          while (*endptr != '\0' && c_isspace (*endptr))
> lib/nproc.c-407-            endptr++;
> lib/nproc.c-408-          if (*endptr == '\0')
> lib/nproc.c-409-            return value;
> lib/nproc.c-410-          /* Also accept the first value in a nesting level,
> lib/nproc.c-411-             since we can't determine the nesting level from env vars.  */
> lib/nproc.c-412-          else if (*endptr == ',')
> lib/nproc.c-413-            return value;
> lib/nproc.c-414-        }
> lib/nproc.c-415-    }
> 
> First of all, the endptr!=NULL test seems misplaced.

I wasn't sure that all implementations store an endptr in all cases.

> And you could probably remove the isdigit test by calling
> strtou_noneg().

This would allow leading whitespace in the value of the environment
variable, which is not needed for OpenMP 6.0 [2] § 4.1.3 and therefore
better avoided.

> This could be something like this (fixing the bugs reported above):
> 
> 	char    *end;
> 	u_long  value;
> 
> 	value = strtou_noneg(threads, &end, 10, 0, ULONG_MAX, NULL);
> 	if (end != threads) {
> 		end += strspn(end, " \t\n");
> 		if (streq(end, "")
> 			return value;
> 		if (strprefix(end, ","))
> 			return value;
> 	}
> 
> 
> This is one case where you seem to silently ignore saturation.

This is on purpose. A CPU never has more than 1000 processors. Therefore
all values > 1000 should be treated the same way, whether they are
<= ULONG_MAX or > ULONG_MAX. Clamping to the number of actually available
processors occurs later in the code.

3) msgfmt.c

> >   gettext/gettext-tools/src/msgfmt.c:287
> 
> gettext-tools/src/msgfmt.c-286-          char *endp;
> gettext-tools/src/msgfmt.c:287:          size_t new_align = strtoul (optarg, &endp, 0);
> gettext-tools/src/msgfmt.c-288-
> gettext-tools/src/msgfmt.c-289-          if (endp != optarg)
> gettext-tools/src/msgfmt.c-290-            alignment = new_align;
> 
> This code will misbehave badly on platforms where size_t is narrower
> than u_long.

Such platforms (namely Windows 3.1) are not among the portability targets
of this code.

> You also don't reject negative numbers, which I expect to be a bug,
> connected with the one from above.

That could be a problem. Fortunately, the documentation makes it clear
that the argument is an "alignment", and reasonable users will therefore
only pass the values 1, 2, 4, 8, 16.

> Why don't you have a diagnostic message for invalid input?

Because this option is meant for users who know what an "alignment" is.

4) msgl-check.c

> >   gettext/gettext-tools/src/msgl-check.c:379
> 
> gettext-tools/src/msgl-check.c-374-          while (*nplurals != '\0' && c_isspace ((unsigned char) *nplurals))
> gettext-tools/src/msgl-check.c-375-            ++nplurals;
> gettext-tools/src/msgl-check.c-376-          endp = nplurals;
> gettext-tools/src/msgl-check.c-377-          nplurals_value = 0;
> gettext-tools/src/msgl-check.c-378-          if (*nplurals >= '0' && *nplurals <= '9')
> gettext-tools/src/msgl-check.c:379:            nplurals_value = strtoul (nplurals, (char **) &endp, 10);
> gettext-tools/src/msgl-check.c-380-          if (nplurals == endp)
> gettext-tools/src/msgl-check.c-381-            {
> gettext-tools/src/msgl-check.c-382-              const char *msg = _("invalid nplurals value");
> gettext-tools/src/msgl-check.c-383-              char *help = plural_help (nullentry);
> gettext-tools/src/msgl-check.c-384-
> gettext-tools/src/msgl-check.c-385-              if (help != NULL)
> gettext-tools/src/msgl-check.c-386-                {
> gettext-tools/src/msgl-check.c-387-                  char *msgext = xasprintf ("%s\n%s", msg, help);
> gettext-tools/src/msgl-check.c-388-                  xeh->xerror (CAT_SEVERITY_ERROR, header, NULL, 0, 0, true,
> gettext-tools/src/msgl-check.c-389-                               msgext);
> gettext-tools/src/msgl-check.c-390-                  free (msgext);
> gettext-tools/src/msgl-check.c-391-                  free (help);
> gettext-tools/src/msgl-check.c-392-                }
> gettext-tools/src/msgl-check.c-393-              else
> gettext-tools/src/msgl-check.c-394-                xeh->xerror (CAT_SEVERITY_ERROR, header, NULL, 0, 0, false,
> gettext-tools/src/msgl-check.c-395-                             msg);
> gettext-tools/src/msgl-check.c-396-
> gettext-tools/src/msgl-check.c-397-              seen_errors++;
> gettext-tools/src/msgl-check.c-398-            }
> 
> You could get rid of a lot of code preceding the strtoul(3) call by
> calling strtou_noneg() instead.

This would allow whitespace. Allowing whitespace here (as in "nplurals= 3")
is not a worthy feature.

> On the other hand, I also wonder why you don't diagnose invalid input.
> Why is -10 an "invalid nplurals value"

"-10" designates a negative number. For the number of plural forms, that's
invalid.

> but ULONG_MAX+10 is a valid (albeit clamped) one?

Again, as mentioned above, values like 1000 and 10000000000000000 should
be treated the same way. Since ULONG_MAX+10 gets clamped to ULONG_MAX, that's
just fine.

> All of these cases look like missing error handling, IMO.

No. In this case, inspecting errno would be additional code with no benefit.

5) read-stringtable.c

> >   gettext/gettext-tools/src/read-stringtable.c:561
> 
> gettext-tools/src/read-stringtable.c-553-    {
> gettext-tools/src/read-stringtable.c-554-      char *last_colon;
> gettext-tools/src/read-stringtable.c-555-      unsigned long number;
> gettext-tools/src/read-stringtable.c-556-      char *endp;
> gettext-tools/src/read-stringtable.c-557-
> gettext-tools/src/read-stringtable.c-558-      if (strlen (line) >= 6 && memcmp (line, "File: ", 6) == 0
> gettext-tools/src/read-stringtable.c-559-          && (last_colon = strrchr (line + 6, ':')) != NULL
> gettext-tools/src/read-stringtable.c-560-          && *(last_colon + 1) != '\0'
> gettext-tools/src/read-stringtable.c:561:          && (number = strtoul (last_colon + 1, &endp, 10), *endp == '\0'))
> gettext-tools/src/read-stringtable.c-562-        {
> gettext-tools/src/read-stringtable.c-563-          /* A "File: <filename>:<number>" type comment.  */
> gettext-tools/src/read-stringtable.c-564-          *last_colon = '\0';
> gettext-tools/src/read-stringtable.c-565-          catalog_reader_seen_comment_filepos (catr, line + 6, number);
> gettext-tools/src/read-stringtable.c-566-        }
> gettext-tools/src/read-stringtable.c-567-      else
> gettext-tools/src/read-stringtable.c-568-        catalog_reader_seen_comment (catr, line);
> gettext-tools/src/read-stringtable.c-569-    }
> 
> You're forgetting about negative numbers?

Indeed, I didn't realize that strtoul() would accept "-3" as valid. As said
above, strtou() can improve on it, by reserving an error code for it.

> How about huge values?

In this code, we assume that line numbers are <= ULONG_MAX. Not an
unreasonable assumption. I have yet to see source code files that are
4 GiB large...

> But again, I wonder why you don't do range checks.

Range checks are not important here: The line numbers are not used as
indices; they are merely reproduced in the output.

> > If you don't want to do that, I can only repeat what I said in the previous
> > mail: The proposal *does not achieve the goal* of avoiding the most common
> > programmer mistakes. For a robust API, the success test should *only* involve
> > testing the returned 'status', nothing else.
> 
> Let's discuss this after your responses to the above.

In the cases 1), 2), 4), use-case d. was chosen on purpose.

Bruno

[1] https://pubs.opengroup.org/onlinepubs/9799919799/functions/getaddrinfo.html
[2] https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-6-0.pdf




  parent reply	other threads:[~2025-03-19 22:00 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20250318142555.09A86356820@www.open-std.org>
2025-03-18 13:54 ` alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD Alejandro Colomar
2025-03-18 21:16   ` Alejandro Colomar
2025-03-18 21:53   ` Bruno Haible
2025-03-18 22:43     ` Alejandro Colomar
2025-03-19  0:15       ` Bruno Haible
2025-03-19 15:26         ` Alejandro Colomar
2025-03-19 18:48           ` Alejandro Colomar
2025-03-19 18:56             ` Alejandro Colomar
2025-03-19 21:59           ` Bruno Haible [this message]
2025-03-19 23:12             ` Alejandro Colomar
2025-03-19 23:30               ` strtou(3) handling of negative input Alejandro Colomar
2025-03-19 23:52               ` alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD Thorsten Glaser
2025-03-20  0:19                 ` Alejandro Colomar
2025-03-20  0:31                   ` Thorsten Glaser
2025-03-20  0:36                     ` Alejandro Colomar
2025-03-19 23:52               ` nullability of status parameter in strtoi/u(3) Alejandro Colomar
2025-03-20 12:44               ` alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD Bruno Haible
2025-03-20 12:55                 ` Alejandro Colomar
2025-03-20 17:18                   ` Thorsten Glaser
2025-03-20 14:26               ` Bruno Haible
2025-03-20 14:54                 ` Alejandro Colomar
2025-03-19 19:27         ` Paul Eggert
2025-03-19 20:05           ` Alejandro Colomar
2025-03-19 20:39             ` Paul Eggert
2025-03-19 21:23               ` Alejandro Colomar
2025-03-20  0:39                 ` Paul Eggert
2025-03-20  1:15                   ` Alejandro Colomar
2025-03-20  7:03                     ` Paul Eggert
2025-03-20 10:32                       ` Alejandro Colomar
2025-03-19 15:56       ` Thorsten Glaser
2025-03-19 16:25         ` Alejandro Colomar
2025-03-19 16:36           ` Thorsten Glaser
2025-03-19 16:53             ` Alejandro Colomar
2025-03-19 17:35           ` Bruno Haible
2025-03-19 18:01             ` Alejandro Colomar
2025-03-20 16:13   ` alx-0008r2 " Alejandro Colomar
2025-03-18 17:20 ` [SC22WG14.29900] alx-0008 " Joseph Myers
2025-03-18 20:18   ` Alejandro Colomar
     [not found]   ` <20250318201854.66AB5356895@www.open-std.org>
2025-03-18 21:11     ` [SC22WG14.29912] " Joseph Myers
2025-03-18 21:35       ` Alejandro Colomar
2025-03-18 21:40         ` Alejandro Colomar
2025-03-18 22:14         ` Joseph Myers
2025-03-18 22:49           ` Alejandro Colomar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4085563.2iPT33SAM4@nimes \
    --to=bruno@clisp.org \
    --cc=alx@kernel.org \
    --cc=christos@netbsd.org \
    --cc=congdanhqx@gmail.com \
    --cc=eggert@cs.ucla.edu \
    --cc=eschwartz93@gmail.com \
    --cc=guillem@hadrons.org \
    --cc=ipedrosa@redhat.com \
    --cc=jubalh@iodoru.org \
    --cc=kre@netbsd.org \
    --cc=liba2i@lists.linux.dev \
    --cc=libbsd@lists.freedesktop.org \
    --cc=riastradh@netbsd.org \
    --cc=sam@gentoo.org \
    --cc=sc22wg14@open-std.org \
    --cc=serge@hallyn.com \
    --cc=tech-misc@netbsd.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.