* alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
From: Alejandro Colomar @ 2025-03-18 13:54 UTC (permalink / raw)
  To: liba2i, sc22wg14
  Cc: libbsd, tech-misc, Bruno Haible, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn


Hi all,

Below is a draft of a proposal for standardization of strtoi/u(3) from
NetBSD in ISO C2y.  Please review.

I've CCed everyone who was CCed in the discussions earlier this year
about these APIs and alternative APIs.  Please add anyone who is
interested, or tell me if you want to be removed.

I've kept several mailing lists in CC, since some of them are private.
The <liba2i@lists.linux.dev> is public, and its archives can be found
here: <https://lore.kernel.org/liba2i/>.


Have a lovely day!
Alex

---
Name
	alx-0008r0 - Standardize strtoi(3) and strtou(3) from NetBSD

Principles
	-  Codify existing practice to address evident deficiencies.
	-  Enable secure programming

Category
	Standardize existing libc APIs

Author
	Alejandro Colomar <alx@kernel.org>

	Cc: <liba2i@lists.linux.dev>
	Cc: <libbsd@lists.freedesktop.org>
	Cc: <sc22wg14@open-std.org>
	Cc: <tech-misc@netbsd.org>
	Cc: Bruno Haible <bruno@clisp.org>
	Cc: christos <christos@netbsd.org>
	Cc: Đoàn Trần Công Danh <congdanhqx@gmail.com>
	Cc: Paul Eggert <eggert@cs.ucla.edu>
	Cc: Eli Schwartz <eschwartz93@gmail.com>
	Cc: Guillem Jover <guillem@hadrons.org>
	Cc: Iker Pedrosa <ipedrosa@redhat.com>
	Cc: Michael Vetter <jubalh@iodoru.org>
	Cc: Robert Elz <kre@netbsd.org>
	Cc: <riastradh@NetBSD.org>
	Cc: Sam James <sam@gentoo.org>
	Cc: "Serge E. Hallyn" <serge@hallyn.com>

History
	<https://www.alejandro-colomar.es/src/alx/alx/wg14/alx-0008.git/>

	r0 (2025-03-18):
	-  Initial draft.

Description
	The strtol(3) family of functions is so damn hard to use
	correctly.  Only a handful of programmers in the world really
	know how to use it correctly in all the corner cases, and even
	those need to be really careful not to make mistakes.

	Several projects have tried to develop successor APIs; of
	those, the only one generic enough to supersede the strtol(3)
	family is strtoi/u(3) from NetBSD.

	Other APIs include OpenBSD's strtonum(3), but that API isn't
	generic, and cannot replace every use of strtol(3).  gnulib has
	also made some attempts to improve the situation, but those are
	not suitable for standardization either.

	strtoi/u(3) originally had a bug, which shows how difficult it
	is to correctly wrap strto{i,u}max(3) (from the strtol(3)
	family).  That bug has been fixed, and after two years of
	research into string-to-numeric APIs, I conclude that
	strtoi/u(3) is a net improvement over the existing APIs, and
	doesn't have any significant flaws.

	It is still not the ideal API in terms of type safety, and I'm
	working on a library that provides safer wrappers.  However,
	such a library would still benefit from having strtoi/u(3) in
	the standard library, since it could wrap them.  And user
	programs would immediately benefit from being able to replace
	strtol(3) et al. with strtoi/u(3).

	I have audited several projects which use strtol(3) et al., and
	they're full of bugs.  It's an API that we should really
	deprecate some day.
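	For illustration only, here is roughly what a careful caller has
	to write today to parse an entire string as a base-10 long within
	a range using strtol(3).  parse_long_checked() is a hypothetical
	helper invented for this sketch, not part of any library.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from any library): parse all of s as a
 * base-10 long in [min, max].  Returns true and stores the result in
 * *out only if every check passes.
 */
static bool
parse_long_checked(const char *s, long min, long max, long *out)
{
	char  *end;
	long  n;

	errno = 0;                  /* library functions never clear errno */
	n = strtol(s, &end, 10);

	if (end == s)               /* no digits: "" or "abc" */
		return false;
	if (errno == ERANGE)        /* overflow; n is LONG_MIN or LONG_MAX */
		return false;
	if (*end != '\0')           /* trailing characters: "12abc" */
		return false;
	if (n < min || n > max)     /* outside the caller's range */
		return false;

	*out = n;
	return true;
}
```

	Each check guards a separate corner case: dropping the errno
	reset lets a stale ERANGE from an earlier call reject valid input,
	and dropping the end == s test can silently accept "" as 0.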

Prior art
	NetBSD provides strto{i,u}(3), which were introduced in
	NetBSD 7.

	libbsd ports these APIs to other POSIX systems.

	shadow-utils has its own implementation for internal use.

See also
	<https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=57828>

Proposed wording
	Based on N3467.

    7.24.2  Numeric conversion functions
	New section _before_ 7.24.2.2 (The atof function).

	While all this section is new, some text is pasted verbatim from
	7.24.2.8.  I'll write that text as if it was already existing
	in the diff below.

	I also renamed the parameters of strtol(3):
	nptr => s	Because it's a string, not a pointer to a number.
	endptr => endp	It's shorter and just as readable (if not more).

	@@
	+7.24.2.*  The <b>strtoi</b> and <b>strtou</b> functions
	+
	+Synopsis
	+1	#include <stdlib.h>
	+	intmax_t strtoi(const char *restrict s, char **restrict endp, int base,
	+	    intmax_t min, intmax_t max, int *rstatus);
	+	uintmax_t strtou(const char *restrict s, char **restrict endp, int base,
	+	    uintmax_t min, uintmax_t max, int *rstatus);
	+
	+Description
	+2	The <b>strtoi</b> and <b>strtou</b> functions
		convert the initial portion of
		the string pointed to by <tt>s</tt>
	+	to <b>intmax_t</b> and <b>uintmax_t</b>,
		respectively.
		First,
		they decompose the input string into three parts:
		an initial, possibly empty, sequence of white-space characters,
		a subject sequence resembling an integer
		represented in some radix determined by the value of <tt>base</tt>,
		and a final string of one or more unrecognized characters,
		including the terminating null character of the input string.
	+	Then,
		they attempt to convert the subject sequence to an integer.
	+	Then,
	+	they coerce the integer into the range [min, max].
	+	Finally,
		they return the result.

	Paste p3, p4, p5, p6 from 7.24.2.8, replacing the function and
	type names as appropriate.

	@@
	+7	If the value of <tt>base</tt> is different from
	+	the values specified in the preceding paragraphs,
	+	the behavior is implementation defined.

	The above paragraph ensures that these functions have no
	input-controlled UB.  strtol(s, NULL, base) with a
	user-controlled base can result in UB, and thus vulnerabilities.
	It is trivial to report an error, so let's do it.  These
	functions are heavy enough that skipping this check is not
	worth it as an optimization.  Even POSIX does this for strtol(3).
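	As a sketch of how cheap the check is, a caller of today's
	strtol(3) can validate the base before the call.  The wrapper
	strtol_checked_base() and its status parameter are hypothetical,
	made up for this example; it covers only the base check, and the
	usual errno, end-pointer, and range checks remain the caller's job.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: reject unsupported bases before calling
 * strtol(3).  ISO C only specifies base 0 (auto-detect) and 2..36.
 * *status is EINVAL for a bad base, else 0.
 */
static long
strtol_checked_base(const char *s, char **endp, int base, int *status)
{
	if (base != 0 && (base < 2 || base > 36)) {
		if (endp != NULL)
			*endp = (char *) s;   /* report "no conversion" */
		*status = EINVAL;
		return 0;
	}

	*status = 0;
	return strtol(s, endp, base);
}
```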

	@@
	 8	If the subject sequence is empty
		or does not have the expected form,
	+	or the value of <tt>base</tt> is not supported,
		no conversion is performed;
		the value of <tt>s</tt>
		is stored in the object pointed to by <tt>endp</tt>,
		provided that <tt>endp</tt> is not a null pointer.

	The above paragraph ensures that *endp can be read after a call
	to these functions.  strtol(3) doesn't provide enough guarantees
	to be able to reliably read it, even in POSIX, and it's hard to
	portably write code that calls it and can inspect *endp after
	the call without UB.

	@@
	 Returns
	+10	The <b>strtoi</b> and <b>strtou</b> functions
		return the converted and coerced value, if any.
		If no conversion could be performed,
	+	zero is coerced into the range,
	+	and then returned.

	The paragraph above doesn't mention the range of representable
	values (unlike 7.24.2.8) because that's already covered by the
	range coercion specified in p2 above.

	+Returns
	+10	The <b>strtoi</b> and <b>strtou</b> functions
	+	return the converted value, if any.
	+	If no conversion is returned,
	+	these functions return the value in the range [min, max]
	+	that is closer to 0.
	+
	+Errors
	+11	These functions don't set <b>errno</b>.
	+	Instead, they set the object pointed to by <tt>rstatus</tt>
	+	to an error code,
	+	or to zero on success.
	+
	+12	-- EINVAL	The value in <tt>base</tt> is not supported.
	+	-- ECANCELED	The given string did not contain
	+			any characters that were converted.
	+	-- ERANGE	The converted value was out of range
	+			and has been coerced,
	+			or the range was invalid (e.g., min > max).
	+	-- ENOTSUP	The given string contained characters
	+			that did not get converted.
	+
	+13	If various errors happen in the same call,
	+	the first one listed here is reported.

	The paragraph above is important to differentiate the following:
	strtoi("7z", &end, 0, 3, 7, &status);
	strtoi("42z", &end, 0, 3, 7, &status);
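	To make the difference concrete, here is a sketch modeled on the
	reference implementation given in r1 later in this thread.
	sketch_strtoi() and clamp_imax() are hypothetical names, and
	ECANCELED and ENOTSUP are taken from POSIX <errno.h>.  In both
	calls the returned value is 7 and end points to the 'z'; only the
	status tells them apart.

```c
#include <errno.h>
#include <inttypes.h>
#include <stddef.h>

/* Hypothetical helper: saturate n into [min, max]. */
static intmax_t
clamp_imax(intmax_t n, intmax_t min, intmax_t max)
{
	if (n > max)
		n = max;
	if (n < min)
		n = min;
	return n;
}

/* Hypothetical sketch of strtoi(3); not the NetBSD source. */
static intmax_t
sketch_strtoi(const char *s, char **endp, int base,
    intmax_t min, intmax_t max, int *status)
{
	int       saved_errno;
	char      *end;
	intmax_t  n;

	if (endp == NULL)
		endp = &end;

	if (base != 0 && (base < 2 || base > 36)) {
		*endp = (char *) s;
		*status = EINVAL;
		return clamp_imax(0, min, max);
	}

	saved_errno = errno;
	errno = 0;
	n = strtoimax(s, endp, base);

	/* Checked in the order p12 lists them: the first match wins. */
	if (*endp == s)
		*status = ECANCELED;
	else if (errno == ERANGE || n < min || n > max)
		*status = ERANGE;
	else if (**endp != '\0')
		*status = ENOTSUP;
	else
		*status = 0;

	errno = saved_errno;
	return clamp_imax(n, min, max);
}
```

	With "7z", the value is within [3, 7] and only the trailing 'z'
	is rejected, so the status is ENOTSUP.  With "42z", both ERANGE
	and ENOTSUP apply, and p13 makes ERANGE win because it is listed
	first.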

-- 
<https://www.alejandro-colomar.es/>


* Re: [SC22WG14.29900] alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
From: Joseph Myers @ 2025-03-18 17:20 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: liba2i, sc22wg14, alx, libbsd, tech-misc, Bruno Haible, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

On Tue, 18 Mar 2025, Alejandro Colomar wrote:

>     7.24.2  Numeric conversion functions
> 	New section _before_ 7.24.2.2 (The atof function).

You're missing corresponding <wchar.h> functions.

Maybe there should also be a reference to N3183 (discussed in Strasbourg) 
- which dealt with UB for numeric conversions in scanf rather than strto*, 
but still seems related to this proposal.

> 	While all this section is new, some text is pasted verbatim from
> 	7.24.2.8.  I'll write that text as if it was already existing
> 	in the diff below.
> 
> 	I also renamed the parameters of strtol(3):
> 	nptr => s	Because it's a string, not a pointer to a number.
> 	endptr => endp	It's shorter and just as readable (if not more).
> 
> 	@@
> 	+7.24.2.*  The <b>strtoi</b> and <b>strtou</b> functions
> 	+
> 	+Synopsis
> 	+1	#include <stdlib.h>
> 	+	intmax_t strtoi(const char *restrict s, char **restrict endp, int base,
> 	+	    intmax_t min, intmax_t max, int *rstatus);
> 	+	uintmax_t strtou(const char *restrict s, char **restrict endp, int base,
> 	+	    uintmax_t min, uintmax_t max, int *rstatus);

intmax_t and uintmax_t are not declared in <stdlib.h>.  Either the 
synopsis should mention <stdint.h> as well, or those types should be added 
to the ones declared by that header.

I'm also concerned that the names sound like int / unsigned int analogues 
of strtol, but aren't.

> 	+Description
> 	+2	The <b>strtoi</b> and <b>strtou</b> functions
> 		convert the initial portion of
> 		the string pointed to by <tt>s</tt>
> 	+	to <b>intmax_t</b> and <b>uintmax_t</b>,
> 		respectively.
> 		First,
> 		they decompose the input string into three parts:
> 		an initial, possibly empty, sequence of white-space characters,
> 		a subject sequence resembling an integer
> 		represented in some radix determined by the value of <tt>base</tt>,
> 		and a final string of one or more unrecognized characters,
> 		including the terminating null character of the input string.
> 	+	Then,
> 		they attempt to convert the subject sequence to an integer.
> 	+	Then,
> 	+	they coerce the integer into the range [min, max].
> 	+	Finally,
> 		they return the result.
> 
> 	Paste p3, p4, p5, p6 from 7.24.2.8, replacing the function and
> 	type names as appropriate.

So the conversion is still locale-specific (p6).  One thing that can be 
useful for numeric conversions, and isn't covered well by the standard at 
present, is ones that are guaranteed to be in the C locale.  (That would 
require a flags argument or similar to configure the functions.)

> 	@@
> 	+7	If the value of <tt>base</tt> is different from
> 	+	the values specified in the preceding paragraphs,
> 	+	the behavior is implementation defined.

It's "implementation-defined", with a hyphen.  And for that to be useful, 
you need clear bounds on what is permitted (that is, an 
implementation-defined set of sequences is accepted, and interpreted as 
having implementation-defined numeric values).

> 	@@
> 	 Returns
> 	+10	The <b>strtoi</b> and <b>strtou</b> functions
> 		return the converted and coerced value, if any.
> 		If no conversion could be performed,
> 	+	zero is coerced into the range,
> 	+	and then returned.
> 
> 	The paragraph above doesn't mention the range of representable
> 	values (unlike 7.24.2.8) because that's already covered by the
> 	range coercion specified in p2 above.

You don't seem to define how the coercion works.  Modulo?  Saturation?  
Something else?  ("Coerce" is not a term defined in the C standard, nor in 
ISO 2382.  So it has no semantics without them being explicitly defined 
for these functions.)

What happens if min > max?  You say below that there is an ERANGE error 
for this case, but don't say what the return value is when it can't be in 
the range.

> 	+Returns
> 	+10	The <b>strtoi</b> and <b>strtou</b> functions
> 	+	return the converted value, if any.
> 	+	If no conversion is returned,
> 	+	these functions return the value in the range [min, max]
> 	+	that is closer to 0.

What if both are equally close to 0?

> 	+Errors
> 	+11	These functions don't set <b>errno</b>.

The standard does not use the abbreviation "don't", but says "do not".

> 	+	Instead, they set the object pointed to by <tt>rstatus</tt>
> 	+	to an error code,
> 	+	or to zero on success.
> 	+
> 	+12	-- EINVAL	The value in <tt>base</tt> is not supported.
> 	+	-- ECANCELED	The given string did not contain
> 	+			any characters that were converted.
> 	+	-- ERANGE	The converted value was out of range
> 	+			and has been coerced,
> 	+			or the range was invalid (e.g., min > max).
> 	+	-- ENOTSUP	The given string contained characters
> 	+			that did not get converted.

Of these names, only ERANGE is actually defined in the C standard.  You 
don't have any updates to <errno.h> to add the others.

These functions would clearly also need several examples added to the 
standard to illustrate their functionality, which are missing from this 
proposal.

-- 
Joseph S. Myers
josmyers@redhat.com



* Re: [SC22WG14.29900] alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
From: Alejandro Colomar @ 2025-03-18 20:18 UTC (permalink / raw)
  To: Joseph Myers
  Cc: liba2i, sc22wg14, alx, libbsd, tech-misc, Bruno Haible, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn


Hi Joseph,

Thanks for the feedback!

On Tue, Mar 18, 2025 at 05:20:19PM +0000, Joseph Myers wrote:
> On Tue, 18 Mar 2025, Alejandro Colomar wrote:
> 
> >     7.24.2  Numeric conversion functions
> > 	New section _before_ 7.24.2.2 (The atof function).
> 
> You're missing corresponding <wchar.h> functions.

As with other proposals, I prefer leaving it for a different paper.
I'm not an expert in wchar stuff.

> Maybe there should also be a reference to N3183 (discussed in Strasbourg) 
> - which dealt with UB for numeric conversions in scanf rather than strto*, 
> but still seems related to this proposal.

I have something in mind about it.  My idea was to change the definition
of atoi(3) et al. to be in terms of strtoi(3):

	int
	atoi(const char *s)
	{
		int  n, e;

		n = strtoi(s, NULL, 10, INT_MIN, INT_MAX, &e);
		if (e != 0)
			errno = e;

		return n;
	}

That would make atoi(3) behave just as one would expect.
Then, scanf(3)'s %d could be defined in terms of atoi(3).

I'll add a 'Future directions' section mentioning that.

> > 	While all this section is new, some text is pasted verbatim from
> > 	7.24.2.8.  I'll write that text as if it was already existing
> > 	in the diff below.
> > 
> > 	I also renamed the parameters of strtol(3):
> > 	nptr => s	Because it's a string, not a pointer to a number.
> > 	endptr => endp	It's shorter and just as readable (if not more).
> > 
> > 	@@
> > 	+7.24.2.*  The <b>strtoi</b> and <b>strtou</b> functions
> > 	+
> > 	+Synopsis
> > 	+1	#include <stdlib.h>
> > 	+	intmax_t strtoi(const char *restrict s, char **restrict endp, int base,
> > 	+	    intmax_t min, intmax_t max, int *rstatus);
> > 	+	uintmax_t strtou(const char *restrict s, char **restrict endp, int base,
> > 	+	    uintmax_t min, uintmax_t max, int *rstatus);
> 
> intmax_t and uintmax_t are not declared in <stdlib.h>.  Either the 
> synopsis should mention <stdint.h> as well, or those types should be added 
> to the ones declared by that header.

Hmmm, my bad.  This function is from <inttypes.h>.  I should move it.

> I'm also concerned that the names sound like int / unsigned int analogues 
> of strtol, but aren't.

I don't get to choose the name.  Anyway, my plan is to eradicate
strtol(3) from history, eventually.

I'm not especially concerned, because the number and types of arguments
are so different that mistakes are unlikely to happen; and I also
don't have a better name for it.

> > 	+Description
> > 	+2	The <b>strtoi</b> and <b>strtou</b> functions
> > 		convert the initial portion of
> > 		the string pointed to by <tt>s</tt>
> > 	+	to <b>intmax_t</b> and <b>uintmax_t</b>,
> > 		respectively.
> > 		First,
> > 		they decompose the input string into three parts:
> > 		an initial, possibly empty, sequence of white-space characters,
> > 		a subject sequence resembling an integer
> > 		represented in some radix determined by the value of <tt>base</tt>,
> > 		and a final string of one or more unrecognized characters,
> > 		including the terminating null character of the input string.
> > 	+	Then,
> > 		they attempt to convert the subject sequence to an integer.
> > 	+	Then,
> > 	+	they coerce the integer into the range [min, max].
> > 	+	Finally,
> > 		they return the result.
> > 
> > 	Paste p3, p4, p5, p6 from 7.24.2.8, replacing the function and
> > 	type names as appropriate.
> 
> So the conversion is still locale-specific (p6).  One thing that can be 
> useful for numeric conversions, and isn't covered well by the standard at 
> present, is ones that are guaranteed to be in the C locale.  (That would 
> require a flags argument or similar to configure the functions.)

NetBSD has strtoi_l(3), which has an extra parameter in which you can
specify the locale.

That should have a dedicated paper, though, just like the wchar variant.

<https://man.netbsd.org/strtoi_l.3>

I'll add this to 'Future directions'.

> > 	@@
> > 	+7	If the value of <tt>base</tt> is different from
> > 	+	the values specified in the preceding paragraphs,
> > 	+	the behavior is implementation defined.
> 
> It's "implementation-defined", with a hyphen.

True.

>  And for that to be useful, 
> you need clear bounds on what is permitted (that is, an 
> implementation-defined set of sequences is accepted, and interpreted as 
> having implementation-defined numeric values).

The choices should be:

	-  Report an error.
	-  Convert in an implementation-defined manner.

> > 	@@
> > 	 Returns
> > 	+10	The <b>strtoi</b> and <b>strtou</b> functions
> > 		return the converted and coerced value, if any.
> > 		If no conversion could be performed,
> > 	+	zero is coerced into the range,
> > 	+	and then returned.
> > 
> > 	The paragraph above doesn't mention the range of representable
> > 	values (unlike 7.24.2.8) because that's already covered by the
> > 	range coercion specified in p2 above.
> 
> You don't seem to define how the coercion works.  Modulo?  Saturation?  
> Something else?  ("Coerce" is not a term defined in the C standard, nor in 
> ISO 2382.  So it has no semantics without them being explicitly defined 
> for these functions.)

I have some wording in p2, but I should improve it.  It is saturation.

> What happens if min > max?  You say below that there is an ERANGE error 
> for this case, but don't say what the return value is when it can't be in 
> the range.

I don't have much to say.  To be honest, when implementing it I just
left it to chance.  I do

	return MAX(min, MIN(max, n));

NetBSD has a slightly different algorithm which may or may not return
the same value.  We should say it returns an unspecified value.
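For reference, here is how the saturating expression quoted above
behaves.  MIN() and MAX() are not standard C, so the usual
function-like macro definitions are assumed, and coerce() is a
hypothetical name.  When min > max, this particular formula always
yields min; other implementations (such as NetBSD's) may compute
something else, hence leaving the value unspecified.

```c
#include <stdint.h>

/* MIN() and MAX() are not ISO C; these are the usual definitions. */
#define MIN(a, b)  ((a) < (b) ? (a) : (b))
#define MAX(a, b)  ((a) > (b) ? (a) : (b))

/* Hypothetical name for the saturating expression quoted above. */
static intmax_t
coerce(intmax_t n, intmax_t min, intmax_t max)
{
	return MAX(min, MIN(max, n));
}
```

With min = 7 and max = 3, MIN(max, n) is at most 3, so MAX(min, ...)
always returns 7.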

> > 	+Returns
> > 	+10	The <b>strtoi</b> and <b>strtou</b> functions
> > 	+	return the converted value, if any.
> > 	+	If no conversion is returned,
> > 	+	these functions return the value in the range [min, max]
> > 	+	that is closer to 0.
> 
> What if both are equally close to 0?

"both" refers to min or max, but the paragraph specifies the entire
range.  Assuming that min<=max,
-  if 0<min, then min is the closest value
-  if min<0<max, then 0 is the closest value
-  if max<0, then max is the closest value.

And if min>max, then it would be covered by the suggestion above of
saying it returns an unspecified value.
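A quick check of the three cases above, as a sketch:
no_conversion_value() is a hypothetical helper that saturates 0 into
[min, max] the same way a converted value would be, assuming min <= max.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the value returned when no conversion is
 * performed, i.e., 0 saturated into [min, max] (assumes min <= max).
 */
static intmax_t
no_conversion_value(intmax_t min, intmax_t max)
{
	intmax_t  n = 0;

	if (n > max)        /* whole range is negative: max is closest */
		n = max;
	if (n < min)        /* whole range is positive: min is closest */
		n = min;
	return n;           /* otherwise 0 itself is in range */
}
```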

However, this duplication of p10 was an accident.  I first wrote the
second one, then the first one but forgot to remove the second one.
I like the wording of the first better (with some tweaks I'll do).

> > 	+Errors
> > 	+11	These functions don't set <b>errno</b>.
> 
> The standard does not use the abbreviation "don't", but says "do not".

Ok.

> > 	+	Instead, they set the object pointed to by <tt>rstatus</tt>
> > 	+	to an error code,
> > 	+	or to zero on success.
> > 	+
> > 	+12	-- EINVAL	The value in <tt>base</tt> is not supported.
> > 	+	-- ECANCELED	The given string did not contain
> > 	+			any characters that were converted.
> > 	+	-- ERANGE	The converted value was out of range
> > 	+			and has been coerced,
> > 	+			or the range was invalid (e.g., min > max).
> > 	+	-- ENOTSUP	The given string contained characters
> > 	+			that did not get converted.
> 
> Of these names, only ERANGE is actually defined in the C standard.  You 
> don't have any updates to <errno.h> to add the others.

Ok.

> These functions would clearly also need several examples added to the 
> standard to illustrate their functionality, which are missing from this 
> proposal.

Ok.

I'll post r1 soon.


Have a lovely night!
Alex

-- 
<https://www.alejandro-colomar.es/>


* Re: [SC22WG14.29912] alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
From: Joseph Myers @ 2025-03-18 21:11 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: liba2i, sc22wg14, alx, libbsd, tech-misc, Bruno Haible, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

On Tue, 18 Mar 2025, Alejandro Colomar wrote:

> Hi Joseph,
> 
> Thanks for the feedback!
> 
> On Tue, Mar 18, 2025 at 05:20:19PM +0000, Joseph Myers wrote:
> > On Tue, 18 Mar 2025, Alejandro Colomar wrote:
> > 
> > >     7.24.2  Numeric conversion functions
> > > 	New section _before_ 7.24.2.2 (The atof function).
> > 
> > You're missing corresponding <wchar.h> functions.
> 
> As with other proposals, I prefer leaving it for a different paper.
> I'm not an expert in wchar stuff.

I strongly disapprove of this approach to making standard proposals; if 
everyone does this, it's a recipe for turning the standard into an 
inconsistent, non-orthogonal mess, where each feature only has sensible 
interactions with the subset of other features the proposer of the new 
feature was interested in at the time they added it.

As far as I'm concerned, it's the responsibility of the person making a 
proposal to produce a complete proposal with properly orthogonal 
interaction with other features.  I objected to the unsuccessful attempt 
to define complex literals that didn't allow for Annex H types, and I 
object likewise to randomly proposing functions, for a family that has 
corresponding <wchar.h> functions, without the corresponding <wchar.h> 
versions.  If a proposal is to add some non-orthogonal feature, there 
needs to be a good and clearly stated reason why, *as part of the overall 
language and library design*, it makes sense that way.  That is, not 
something relating to your interest or expertise in <wchar.h> functions, 
but something about the technical content of the standard that makes 
having some strto* functions with corresponding wcsto* functions and these 
ones without corresponding wcsto* functions into a logically coherent 
design.

(I don't care about Annex K myself - but I still made sure that my recent 
report of issue 1012 included the relevant Annex K edits.)

> > I'm also concerned that the names sound like int / unsigned int analogues 
> > of strtol, but aren't.
> 
> I don't get to choose the name.  Anyway, my plans are to eradicate

You do get to choose the name when making a new proposal.  If an existing 
name is defective through suggesting an incorrect analogy, that would be a 
reasonable basis to choose a new one.

-- 
Joseph S. Myers
josmyers@redhat.com



* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
From: Alejandro Colomar @ 2025-03-18 21:16 UTC (permalink / raw)
  To: liba2i
  Cc: alx, libbsd, tech-misc, Bruno Haible, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn


Hi,

Here's v2 (alx-0008r1), updated after Joseph's feedback.

The C Committee mailing list is a mess.  Please include the following
header in your response if you're reading this email from the C Committee
mailing list:

In-Reply-To: <ovyhifkfxvrulde33vara5qb3zerletmxrtfiur4z3c2xnlksz@k4m7xt5kd62l>


Cheers,
Alex

---
Name
	alx-0008r1 - Standardize strtoi(3) and strtou(3) from NetBSD

Principles
	-  Codify existing practice to address evident deficiencies.
	-  Enable secure programming

Category
	Standardize existing libc APIs

Author
	Alejandro Colomar <alx@kernel.org>

	Cc: <liba2i@lists.linux.dev>
	Cc: <libbsd@lists.freedesktop.org>
	Cc: <sc22wg14@open-std.org>
	Cc: <tech-misc@netbsd.org>
	Cc: Bruno Haible <bruno@clisp.org>
	Cc: christos <christos@netbsd.org>
	Cc: Đoàn Trần Công Danh <congdanhqx@gmail.com>
	Cc: Paul Eggert <eggert@cs.ucla.edu>
	Cc: Eli Schwartz <eschwartz93@gmail.com>
	Cc: Guillem Jover <guillem@hadrons.org>
	Cc: Iker Pedrosa <ipedrosa@redhat.com>
	Cc: Joseph Myers <josmyers@redhat.com>
	Cc: Michael Vetter <jubalh@iodoru.org>
	Cc: Robert Elz <kre@netbsd.org>
	Cc: <riastradh@NetBSD.org>
	Cc: Sam James <sam@gentoo.org>
	Cc: "Serge E. Hallyn" <serge@hallyn.com>

History
	<https://www.alejandro-colomar.es/src/alx/alx/wg14/alx-0008.git/>

	r0 (2025-03-18):
	-  Initial draft.

	r1 (2025-03-18):
	-  Add 'Future directions' section.
	-  Fix typos.
	-  Move to <inttypes.h> (7.8 instead of 7.24).
	-  Add links to more NetBSD bug reports in 'See also'.
	-  Add link to n3183 (discussed in Strasbourg) in 'See also'.
	-  Specify the possible implementation-defined behaviors when
	   the base is a value not specified here.
	-  Specify that the range coercion is done with saturation.
	-  Specify that if min>max, these functions return an
	   unspecified value.
	-  Add ECANCELED, EINVAL, ENOTSUP to <errno.h> (7.5).
	-  Note that in the future we'll want to make this
	   const-generic.
	-  Add example.
	-  Add implementation.

Description
	The strtol(3) family of functions is so damn hard to use
	correctly.  Only a handful of programmers in the world really
	know how to use it correctly in all the corner cases, and even
	those need to be really careful not to make mistakes.

	Several projects have tried to develop successor APIs; of
	those, the only one generic enough to supersede the strtol(3)
	family is strtoi/u(3) from NetBSD.

	Other APIs include OpenBSD's strtonum(3), but that API isn't
	generic, and cannot replace every use of strtol(3).  gnulib has
	also made some attempts to improve the situation, but those are
	not suitable for standardization either.

	strtoi/u(3) originally had a bug, which shows how difficult it
	is to correctly wrap strto{i,u}max(3) (from the strtol(3)
	family).  That bug has been fixed, and after two years of
	research into string-to-numeric APIs, I conclude that
	strtoi/u(3) is a net improvement over the existing APIs, and
	doesn't have any significant flaws.

	It is still not the ideal API in terms of type safety, and I'm
	working on a library that provides safer wrappers.  However,
	such a library would still benefit from having strtoi/u(3) in
	the standard library, since it could wrap them.  And user
	programs would immediately benefit from being able to replace
	strtol(3) et al. with strtoi/u(3).

	I have audited several projects which use strtol(3) et al., and
	they're full of bugs.  It's an API that we should really
	deprecate some day.

Prior art
	NetBSD provides strto{i,u}(3), which were introduced in
	NetBSD 7.

	libbsd ports these APIs to other POSIX systems.

	shadow-utils has its own implementation for internal use.

	Here's a possible implementation of strtoi(3):

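		#include <errno.h>
		#include <inttypes.h>
		#include <stddef.h>

		/* MIN() and MAX() are not in ISO C; usual definitions. */
		#define MIN(a, b)  ((a) < (b) ? (a) : (b))
		#define MAX(a, b)  ((a) > (b) ? (a) : (b))

		intmax_t
		strtoi(const char *s, char **restrict endp, int base,
		    intmax_t min, intmax_t max, int *restrict status)
		{
			int        e, st;
			char       *end;
			intmax_t   n;

			/* Let callers pass NULL for endp and status. */
			if (endp == NULL)
				endp = &end;
			if (status == NULL)
				status = &st;

			/* Reject unsupported bases instead of invoking UB. */
			if (base != 0 && (base < 2 || base > 36)) {
				*endp = (char *) s;
				*status = EINVAL;
				return MAX(min, MIN(max, 0));
			}

			/* Preserve the caller's errno. */
			e = errno;
			errno = 0;

			n = strtoimax(s, endp, base);

			/* First match wins, in the order the Errors list uses. */
			if (*endp == s)
				*status = ECANCELED;
			else if (errno == ERANGE || n < min || n > max)
				*status = ERANGE;
			else if (**endp != '\0')
				*status = ENOTSUP;
			else
				*status = 0;

			errno = e;

			/* Saturate into [min, max]. */
			return MAX(min, MIN(max, n));
		}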

	strtou(3) can be implemented with exactly the same code,
	replacing s/intmax_t/uintmax_t/ and s/strtoimax/strtoumax/.
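	As a usage sketch, the strtou(3) variant obtained by that
	substitution can validate a TCP port number in a single call.
	sketch_strtou() and parse_port() are hypothetical names; the
	body of sketch_strtou() is the strtoi(3) implementation above
	with the two substitutions applied, and the error macros are
	taken from POSIX <errno.h>.

```c
#include <errno.h>
#include <inttypes.h>
#include <stddef.h>

#define MIN(a, b)  ((a) < (b) ? (a) : (b))
#define MAX(a, b)  ((a) > (b) ? (a) : (b))

/* Hypothetical: the strtoi(3) code above, with the substitutions. */
static uintmax_t
sketch_strtou(const char *s, char **restrict endp, int base,
    uintmax_t min, uintmax_t max, int *restrict status)
{
	int        e, st;
	char       *end;
	uintmax_t  n;

	if (endp == NULL)
		endp = &end;
	if (status == NULL)
		status = &st;

	if (base != 0 && (base < 2 || base > 36)) {
		*endp = (char *) s;
		*status = EINVAL;
		return MAX(min, MIN(max, 0));
	}

	e = errno;
	errno = 0;

	n = strtoumax(s, endp, base);

	if (*endp == s)
		*status = ECANCELED;
	else if (errno == ERANGE || n < min || n > max)
		*status = ERANGE;
	else if (**endp != '\0')
		*status = ENOTSUP;
	else
		*status = 0;

	errno = e;

	return MAX(min, MIN(max, n));
}

/* Hypothetical caller: accept only a whole decimal port in [1, 65535]. */
static int
parse_port(const char *s, unsigned *port)
{
	int        status;
	uintmax_t  n;

	n = sketch_strtou(s, NULL, 10, 1, 65535, &status);
	if (status != 0)
		return status;   /* EINVAL, ECANCELED, ERANGE or ENOTSUP */

	*port = (unsigned) n;
	return 0;
}
```

	One call replaces the errno reset, the end-pointer test, the
	trailing-character test, and the range test that strtol(3)
	callers must write by hand.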


Future directions
    atoi(3), scanf(3)
	The atoi(3) family of functions has unnecessary UB.  That UB
	could be removed by redefining the family in terms of this API:

		int
		atoi(const char *s)
		{
			int  n, e;

			n = strtoi(s, NULL, 10, _Minof(n), _Maxof(n), &e);
			if (e != 0)
				errno = e;

			return n;
		}

	That would make atoi(3) behave just as one would expect.
	Then we could define scanf(3)'s %d et al. in terms of atoi(3).

    wchar_t
	It could be interesting to add a wchar-based variant of these
	APIs.

    locale_t
	It could be interesting to add a variant of these APIs that
	accepts a locale_t parameter instead of using the current
	locale.  Those APIs exist in NetBSD as strtoi_l(), strtou_l().

    _Generic
	Once something like Chris's n3510 (2025-02-27, "Enhanced type
	variance (v2)") is accepted into C2y, we could change these
	functions to use QChar, thus making them const-generic
	functions, as with the strtol(3) family of functions.

See also
	<https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=57828>
	<https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=58453>
	<https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=58461>
	<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3183.pdf>

Proposed wording
	Based on N3467.

    7.5  Errors <errno.h>
	@@ p2
	 The macros are
	+	ECANCELED
		EDOM
	+	EINVAL
		EILSEQ
	+	ENOTSUP
		ERANGE

    7.8.3  Functions for greatest-width integer types
	New section _before_ 7.8.3.3 (The strtoimax and strtoumax functions).

	While this entire section is new, some text is pasted verbatim
	from 7.24.2.8.  In the diff below, I write that text as if it
	already existed.

	I also renamed the parameters with respect to strtol(3):
	nptr => s	Because it's a string, not a pointer to a number.
	endptr => endp	It's shorter and just as readable (if not more).

	@@
	+7.8.2.*  The <b>strtoi</b> and <b>strtou</b> functions
	+
	+Synopsis
	+1	#include <inttypes.h>
	+	intmax_t strtoi(const char *restrict s, char **restrict endp, int base,
	+	    intmax_t min, intmax_t max, int *rstatus);
	+	uintmax_t strtou(const char *restrict s, char **restrict endp, int base,
	+	    uintmax_t min, uintmax_t max, int *rstatus);
	+
	+Description
	+2	The <b>strtoi</b> and <b>strtou</b> functions
		convert the initial portion of
		the string pointed to by <tt>s</tt>
	+	to <b>intmax_t</b> and <b>uintmax_t</b>,
		respectively.
		First,
		they decompose the input string into three parts:
		an initial, possibly empty, sequence of white-space characters,
		a subject sequence resembling an integer
		represented in some radix determined by the value of <tt>base</tt>,
		and a final string of one or more unrecognized characters,
		including the terminating null character of the input string.
	+	Then,
		they attempt to convert the subject sequence to an integer.
	+	Then,
	+	they coerce with saturation
	+	the integer into the range [min, max].
	+	Finally,
		they return the result.

	Paste p3, p4, p5, p6 from 7.24.2.8, replacing the function and
	type names as appropriate.

	@@
	+7	If the value of <tt>base</tt> is different from
	+	the values specified in the preceding paragraphs,
	+	it is implementation-defined
	+	whether these functions successfully convert the value
	+	and in which manner.

	The above paragraph ensures that this function has no
	input-controlled UB.  strtol(s, NULL, base) with a
	user-controlled base can result in UB, and thus vulnerabilities.
	It is trivial to report an error, so let's do it.  This function
	is heavy enough that optimizing this is not worth it.  Even POSIX
	does this for strtol(3).

	@@
	 8	If the subject sequence is empty
		or does not have the expected form,
	+	or the value of <tt>base</tt> is not supported,
		no conversion is performed;
		the value of <tt>s</tt>
		is stored in the object pointed to by <tt>endp</tt>,
		provided that <tt>endp</tt> is not a null pointer.

	The above paragraph ensures that *endp can be read after a call
	to these functions.  strtol(3) doesn't provide enough guarantees
	to be able to reliably read it, even in POSIX, and it's hard to
	portably write code that calls it and can inspect *endp after
	the call without UB.

	@@
	 Returns
	+10	The <b>strtoi</b> and <b>strtou</b> functions
		return the converted value, if any.
		If no conversion could be performed,
	+	zero is coerced with saturation into the range,
	+	and then returned.

	The paragraph above doesn't mention the range of representable
	values (unlike 7.24.2.8) because that's already covered by the
	range coercion specified in p2 above.

	@@
	+11	If <tt>min > max</tt>,
	+	these functions return an unspecified value.

	The above paragraph covers the case where min>max, where the
	conversion with saturation into the range cannot do anything
	meaningful.  The error is still specified as ERANGE.

	@@
	+Errors
	+12	These functions do not set <b>errno</b>.
	+	Instead, they set the object pointed to by <tt>rstatus</tt>
	+	to an error code,
	+	or to zero on success.
	+
	+13	-- EINVAL	The value in <tt>base</tt> is not supported.
	+	-- ECANCELED	The given string did not contain
	+			any characters that were converted.
	+	-- ERANGE	The converted value was out of range
	+			and has been coerced,
	+			or the range was invalid (e.g., min > max).
	+	-- ENOTSUP	The given string contained characters
	+			that did not get converted.
	+
	+14	If various errors happen in the same call,
	+	the first one listed here is reported.

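	The paragraph above is important to differentiate the following:

		strtoi("7z", &end, 0, 3, 7, &status);
		strtoi("42z", &end, 0, 3, 7, &status);

	Both calls return 7.  The first reports ENOTSUP, because 7 is in
	range and "z" was not converted.  The second reports ERANGE,
	because 42 was coerced to 7.  ERANGE is listed before ENOTSUP, so
	the caller can tell the two cases apart.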
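	The two calls above can be checked against a sketch that reports
	errors in the order of p13 and p14.  strtoi_sketch and
	starts_in_range are hypothetical names.  The body only illustrates
	the ordering of the status checks and is not NetBSD's code.

```c
#include <errno.h>
#include <inttypes.h>
#include <stddef.h>

/* Hypothetical stand-in for the proposed strtoi(). */
static intmax_t
strtoi_sketch(const char *restrict s, char **restrict endp, int base,
    intmax_t min, intmax_t max, int *restrict status)
{
	int       saved_errno, st;
	char      *e;
	intmax_t  n;

	if (endp == NULL)
		endp = &e;
	if (status == NULL)
		status = &st;

	saved_errno = errno;

	if (base != 0 && (base < 2 || base > 36)) {
		n = 0;
		*endp = (char *) s;
		*status = EINVAL;
	} else {
		errno = 0;
		n = strtoimax(s, endp, base);

		/* The first matching error wins: ERANGE hides ENOTSUP. */
		if (*endp == s)
			*status = ECANCELED;
		else if (errno == ERANGE || n < min || n > max)
			*status = ERANGE;
		else if (**endp != '\0')
			*status = ENOTSUP;
		else
			*status = 0;
	}

	errno = saved_errno;

	if (n > max)
		n = max;
	if (n < min)
		n = min;
	return n;
}

/*
 * Hypothetical caller: does s start with a number in [3, 7]?
 * Trailing text is allowed, but coercion is not.
 */
static int
starts_in_range(const char *s)
{
	int  st;

	strtoi_sketch(s, NULL, 0, 3, 7, &st);
	return st == 0 || st == ENOTSUP;
}
```

	Both strings convert to 7.  starts_in_range() accepts "7z" but
	rejects "42z", because only the latter reports ERANGE.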

	@@
	+15	EXAMPLE 1
	+	The following is an example of
	+	using these functions to parse a number
	+	and the string that follows.
	+
	+		int       err;
	+		char      *end;
	+		intmax_t  n, min = 5, max = 50;
	+
	+		n = strtoi(" 42 kg", &end, 10, min, max, &err);
	+		if (err != 0) {
	+			if (err == EINVAL || err == ECANCELED) {
	+				fprintf(stderr, "%s\n", strerror(err));
	+				exit(EXIT_FAILURE);
	+			}
	+			if (err == ERANGE && n == min)
	+				puts("Too light");
	+			if (err == ERANGE && n == max)
	+				puts("Too heavy");
	+		}
	+		printf("Quantity: %jd\n", n);
	+		if (err == ENOTSUP)
	+			printf("Units: %s\n", end + strspn(end, " \t"));
	+		else
	+			puts("Unitless?");

-- 
<https://www.alejandro-colomar.es/>


* Re: [SC22WG14.29912] alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-18 21:11     ` [SC22WG14.29912] " Joseph Myers
@ 2025-03-18 21:35       ` Alejandro Colomar
  2025-03-18 21:40         ` Alejandro Colomar
  2025-03-18 22:14         ` Joseph Myers
  0 siblings, 2 replies; 43+ messages in thread
From: Alejandro Colomar @ 2025-03-18 21:35 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Alejandro Colomar, liba2i, sc22wg14, libbsd, tech-misc,
	Bruno Haible, christos, Đoàn Trần Công Danh,
	Paul Eggert, Eli Schwartz, Guillem Jover, Iker Pedrosa,
	Michael Vetter, Robert Elz, riastradh, Sam James, Serge E. Hallyn

[-- Attachment #1: Type: text/plain, Size: 3488 bytes --]

Hi Joseph,

On Tue, Mar 18, 2025 at 09:11:58PM +0000, Joseph Myers wrote:
> > > >     7.24.2  Numeric conversion functions
> > > > 	New section _before_ 7.24.2.2 (The atof function).
> > > 
> > > You're missing corresponding <wchar.h> functions.
> > 
> > As with other proposals, I prefer leaving it for a different paper.
> > I'm not an expert in wchar stuff.
> 
> I strongly disapprove of this approach to making standard proposals; if 
> everyone does this, it's a recipe for turning the standard into an 
> inconsistent, non-orthogonal mess, where each feature only has sensible 
> interactions with the subset of other features the proposer of the new 
> feature was interested in at the time they added it.
> 
> As far as I'm concerned, it's the responsibility of the person making a 
> proposal to produce a complete proposal with properly orthogonal 
> interaction with other features.  I objected to the unsuccessful attempt 
> to define complex literals that didn't allow for Annex H types, and I 
> object likewise to randomly proposing functions, for a family that has 
> corresponding <wchar.h> functions, without the corresponding <wchar.h> 
> versions.  If a proposal is to add some non-orthogonal feature, there 
> needs to be a good and clearly stated reason why, *as part of the overall 
> language and library design*, it makes sense that way.  That is, not 
> something relating to your interest or expertise in <wchar.h> functions, 
> but something about the technical content of the standard that makes 
> having some strto* functions with corresponding wcsto* functions and these 
> ones without corresponding wcsto* functions into a logically coherent 
> design.

Okay, let's try with some more rationale:

NetBSD has strtoi/u(3), but not wcstoi/u(3), AFAICS.  This is prior art.

I don't feel qualified to propose a function for a family (wchar.h)
which I have never used myself, nor implemented.

I think it would make sense to present two papers at the same time, one
proposing the wchar.h variant, and one presenting the normal variant.  I
just don't feel qualified to decide whether we want a wchar_t variant,
nor to specify it myself.

On the other hand, I feel qualified to propose strtoi/u(3), and think it
is quite necessary in the standard, regardless of what happens to the
wchar_t variant.

If someone who is an expert in wide strings wants to work with me on the
specification of a wide variant, I'm happy to help.  But I can't do that
myself alone.

> (I don't care about Annex K myself - but I still made sure that my recent 
> report of issue 1012 included the relevant Annex K edits.)
> 
> > > I'm also concerned that the names sound like int / unsigned int analogues 
> > > of strtol, but aren't.
> > 
> > I don't get to choose the name.  Anyway, my plans are to eradicate
> 
> You do get to choose the name when making a new proposal.  If an existing 
> name is defective through suggesting an incorrect analogy, that would be a 
> reasonable basis to choose a new one.

Yeah, I could choose it, if I had a better one.  I did that for the
case of Plan9's seprint(2), which I called aprintf().  However, in this
case, I don't have an idea for a significantly better name, and
considering that the number of arguments makes accidents impossible,
there aren't strong reasons to deviate from prior art.


Have a lovely night!
Alex

-- 
<https://www.alejandro-colomar.es/>


* Re: [SC22WG14.29912] alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-18 21:35       ` Alejandro Colomar
@ 2025-03-18 21:40         ` Alejandro Colomar
  2025-03-18 22:14         ` Joseph Myers
  1 sibling, 0 replies; 43+ messages in thread
From: Alejandro Colomar @ 2025-03-18 21:40 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Alejandro Colomar, liba2i, sc22wg14, libbsd, tech-misc,
	Bruno Haible, christos, Đoàn Trần Công Danh,
	Paul Eggert, Eli Schwartz, Guillem Jover, Iker Pedrosa,
	Michael Vetter, Robert Elz, riastradh, Sam James, Serge E. Hallyn

[-- Attachment #1: Type: text/plain, Size: 3780 bytes --]

On Tue, Mar 18, 2025 at 10:35:53PM +0100, Alejandro Colomar wrote:
> Yeah, I could choose it, if I had a better one.  I did that for the
> case of Plan9's seprint(2), which I called aprintf().  However, in this

s/seprint/smprint/

> case, I don't have an idea for a significantly better name, and
> considering that the number of arguments makes accidents impossible,
> there aren't strong reasons to deviate from prior art.



-- 
<https://www.alejandro-colomar.es/>


* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-18 13:54 ` alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD Alejandro Colomar
  2025-03-18 21:16   ` Alejandro Colomar
@ 2025-03-18 21:53   ` Bruno Haible
  2025-03-18 22:43     ` Alejandro Colomar
  2025-03-20 16:13   ` alx-0008r2 " Alejandro Colomar
  2 siblings, 1 reply; 43+ messages in thread
From: Bruno Haible @ 2025-03-18 21:53 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

Hi Alejandro,

> Below is a draft of a proposal for standardization of strtoi/u(3) from
> NetBSD in ISO C2y.

First of all: I like your initiative, and I moderately like this proposal.

> 	The strtol(3) family of functions is so damn hard to use
> 	correctly.  Only a handful of programmers in the world really
> 	know how to use it correctly in all the corner cases, and even
> 	those need to be really careful to not make mistakes.

It would be useful to list the mistakes that are being made most frequently;
so as to verify that the proposed strtoi / strtou functions don't tend
to provoke the same mistakes. (I'd guess that one of the frequent mistakes
is that when the number is not expected to occupy the entire string,
the success test after (errno = 0, strtol (...)) is
    endptr > nptr && errno == 0
and programmers tend to forget one of the two conditions.)

> 	+Synopsis
> 	+1	#include <stdlib.h>
> 	+	intmax_t strtoi(const char *restrict s, char **restrict endp, int base,
> 	+	    intmax_t min, intmax_t max, int *rstatus);
> 	+	uintmax_t strtou(const char *restrict s, char **restrict endp, int base,
> 	+	    uintmax_t min, uintmax_t max, int *rstatus);

Probably it will be an impediment to adoption that these functions work
on [u]intmax_t (64-bit or 128-bit integers), which seems overkill
when people want to parse, say, a port number in the range 0..65535.

To address this adoption problem, how about changing these functions to
generic functions (in the sense of <tgmath.h>)? In such a way that
    strtoi (n, &end, base, LONG_MIN, LONG_MAX, &status)
is known to return a 'long' rather than 'intmax_t', and
    strtoi (n, &end, base, INT_MIN, INT_MAX, &status)
is known to return an 'int' rather than 'intmax_t'.

If the standard does NOT say that these functions are generic, it would
be harder for an implementation to optimize invocations of these
functions for narrower types: I don't see how it could be done without
explicit compiler support.

> 	+	Instead, they set the object pointed to by <tt>rstatus</tt>
> 	+	to an error code,
> 	+	or to zero on success.
> 	+
> 	+12	-- EINVAL	The value in <tt>base</tt> is not supported.
> 	+	-- ECANCELED	The given string did not contain
> 	+			any characters that were converted.
> 	+	-- ERANGE	The converted value was out of range
> 	+			and has been coerced,
> 	+			or the range was invalid (e.g., min > max).
> 	+	-- ENOTSUP	The given string contained characters
> 	+			that did not get converted.
> 	+
> 	+13	If various errors happen in the same call,
> 	+	the first one listed here is reported.

It would be useful to show what a success test looks like, after
    strtoi (s, &end, base, min, max, &status)
for each of the four frequent use-cases:
  -a. expect to parse the initial portion of the string, no coercion,
  -b. expect to parse the initial portion of the string, silent coercion,
  -c. expect to parse the entire string, no coercion,
  -d. expect to parse the entire string, silent coercion.

AFAICS, the success tests are:
  -a. status == 0 || status == ENOTSUP
  -b. status == 0 || status == ENOTSUP || status == ERANGE
  -c. status == 0
  -d. status == 0 || (status == ERANGE && end > s && *end == '\0')

The success test in case d. is so complicated that, in my view, the goal
to avoid programmer mistakes is not being met.

I would therefore propose to change the status value to a bit mask, so that
the error conditions "The converted value was out of range and has been
coerced" and "The given string contains characters that did not get converted"
can be both returned together, without conflicting.

And, while at it, the error condition "min > max" is an error that is
independent of the given string contents; I would rather see it mapped to
EINVAL rather than ERANGE.

Bruno





* Re: [SC22WG14.29912] alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-18 21:35       ` Alejandro Colomar
  2025-03-18 21:40         ` Alejandro Colomar
@ 2025-03-18 22:14         ` Joseph Myers
  2025-03-18 22:49           ` Alejandro Colomar
  1 sibling, 1 reply; 43+ messages in thread
From: Joseph Myers @ 2025-03-18 22:14 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Alejandro Colomar, liba2i, sc22wg14, libbsd, tech-misc,
	Bruno Haible, christos, Đoàn Trần Công Danh,
	Paul Eggert, Eli Schwartz, Guillem Jover, Iker Pedrosa,
	Michael Vetter, Robert Elz, riastradh, Sam James, Serge E. Hallyn

On Tue, 18 Mar 2025, Alejandro Colomar wrote:

> Okay, let's try with some more rationale:
> 
> NetBSD has strtoi/u(3), but not wcstoi/u(3), AFAICS.  This is prior art.

Prior art is something to learn from, but if there are issues with it (and 
not being properly orthogonal when considered together with the existing 
APIs in the standard is such an issue) then they provide a basis for not 
adopting it exactly as-is.

> I don't feel qualified to propose a function for a family (wchar.h)
> which I have never used myself, nor implemented.
> 
> I think it would make sense to present two papers at the same time, one
> proposing the wchar.h variant, and one presenting the normal variant.  I
> just don't feel qualified to decide whether we want a wchar_t variant,
> nor to specify it myself.

This really doesn't need much expertise.  Just look at how strtol and 
wcstol differ (such as references to "wide character" for wcstol) and 
follow that.  I think that by the time the proposal reaches an actual 
document submitted to the committee, it should include both wide and 
narrow versions (even if earlier drafts just state that the wide version 
will be included in the full proposal, but omit it to reduce the extent to 
which changes need applying to both halves of the proposal, while it's 
still changing rapidly).

It's *excluding* the wchar_t version that would need more expertise to 
justify any such exclusion, because the default for these interfaces is to 
have both versions.

-- 
Joseph S. Myers
josmyers@redhat.com



* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-18 21:53   ` Bruno Haible
@ 2025-03-18 22:43     ` Alejandro Colomar
  2025-03-19  0:15       ` Bruno Haible
  2025-03-19 15:56       ` Thorsten Glaser
  0 siblings, 2 replies; 43+ messages in thread
From: Alejandro Colomar @ 2025-03-18 22:43 UTC (permalink / raw)
  To: Bruno Haible
  Cc: liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

[-- Attachment #1: Type: text/plain, Size: 8374 bytes --]

Hi Bruno,

On Tue, Mar 18, 2025 at 10:53:09PM +0100, Bruno Haible wrote:
> > Below is a draft of a proposal for standardization of strtoi/u(3) from
> > NetBSD in ISO C2y.
> 
> First of all: I like your initiative, and I moderately like this proposal.

Thanks!

> > 	The strtol(3) family of functions is so damn hard to use
> > 	correctly.  Only a handful of programmers in the world really
> > 	know how to use it correctly in all the corner cases, and even
> > 	those need to be really careful to not make mistakes.
> 
> It would be useful to list the mistakes that are being made most frequently;
> so as to verify that the proposed strtoi / strtou functions don't tend
> to provoke the same mistakes.

Ughhh, I don't think I can come up with a comprehensive list; I will
probably forget some.  They are spread on the mailing lists, for anyone
very interested.  I noted some of them intermingled in the proposal, as
comments below the proposed wording paragraphs.

The groff@ mailing list also has a large number of bugs I found in calls
to strtol(3), some of which probably still remain there, because it
still uses strtol(3).

I think covering the API in "Police, do not trespass" bands would be
easier.  :-)

> (I'd guess that one of the frequent mistakes
> is that when the number is not expected to occupy the entire string,

Do you mean "not expected to occupy the entire string", or "expected to
not occupy the entire string"?  They are different cases.

> the success test after (errno = 0, strtol (...)) is
>     endptr > nptr && errno == 0

You need to dereference endptr (that is, *endptr > nptr).  And the
test actually depends on your response to the above.

> and programmers tend to forget one of the two conditions.)

Assuming that you know that the base is valid beforehand, and that your
response to my question is the former, and that you dereference endptr,
yes that's the correct check, AFAICS (it is painful to validate these
things; I may make mistakes now).  And yes, that's one of the common
mistakes.

Another is forgetting to validate the base, which makes your test UB,
because *endptr might be uninitialized.  It ain't fun.  :|
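For comparison, here is a sketch of the checks a strtol(3) caller needs when the number may be followed by other text.  parse_long_prefix is a hypothetical helper, not an existing API.  It assumes the supported bases are 0 and 2 to 36, as in ISO C.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * parse_long_prefix: hypothetical helper.  Parses a long at the start
 * of s, allowing trailing text.  Returns true on success.
 */
static bool
parse_long_prefix(const char *s, int base, long *n, char **end)
{
	/*
	 * ISO C only specifies bases 0 and 2..36; with any other base,
	 * *end might not be written, so reading it afterwards is unsafe.
	 */
	if (base != 0 && (base < 2 || base > 36)) {
		*end = (char *) s;
		return false;
	}

	errno = 0;		/* strtol() only sets errno on failure */
	*n = strtol(s, end, base);

	/* Both conditions are needed: digits were consumed, no overflow. */
	return *end != s && errno == 0;
}
```

The value stored in *n is meaningful only on success.  On overflow, strtol() returns LONG_MIN or LONG_MAX.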

> > 	+Synopsis
> > 	+1	#include <stdlib.h>
> > 	+	intmax_t strtoi(const char *restrict s, char **restrict endp, int base,
> > 	+	    intmax_t min, intmax_t max, int *rstatus);
> > 	+	uintmax_t strtou(const char *restrict s, char **restrict endp, int base,
> > 	+	    uintmax_t min, uintmax_t max, int *rstatus);
> 
> Probably it will be an impediment to adoption that these functions work
> on [u]intmax_t (64-bit or 128-bit integers), which seems overkill
> when people want to parse, say, a port number in the range 0..65535.
> 
> To address this adoption problem, how about changing these functions to
> generic functions (in the sense of <tgmath.h>)? In such a way that
>     strtoi (n, &end, base, LONG_MIN, LONG_MAX, &status)
> is known to return a 'long' rather than 'intmax_t', and
>     strtoi (n, &end, base, INT_MIN, INT_MAX, &status)

The first parameter should be a string, not a number.  I think the name
nptr is a historic mistake.  The number is returned.

> is known to return an 'int' rather than 'intmax_t'.

How would you decide the return type when min and max have different
types?  The problem with this API is that it doesn't take the number
as an argument.

I have designed a better API, through a type-generic macro:

	int
	a2i(typename T, T *restrict n, QChar *s,
	    QChar *_Optional *restrict endp, int base,
	    T min, T max);

to be used as:

	if (a2i(int, &n, "42 c", &end, 0, -1000, 1000) == -1 && errno != ENOTSUP)
		err(1, "a2i");

However, I was wondering lately about changing it slightly:

	QChar *
	funcname(typename T, T *restrict n, QChar *restrict s, int base,
	    T min, T max);

to be used as:

	errno = 0;
	end = funcname(int, &n, "42 c", 0, -1000, 1000);
	if (errno != 0 && errno != ENOTSUP)
		err(1, "funcname");

It has the benefit of being simpler regarding the 'restrict' pseudo-
qualifier, and it also means that the implementation is probably
simpler.  This one would need to guarantee not clobbering errno, though,
since there's no error code.  I'm still undecided.  For now, these APIs
are under experimentation in shadow-utils.  (But feedback is welcome!)

> If the standard does NOT say that these functions are generic, it would
> be harder for an implementation to optimize invocations of these
> functions for narrower types: I don't see how it could be done without
> explicit compiler support.

I think a function that parses numbers is going to be slow enough that
these micro-optimizations should be unimportant.

When/if we have a2i() in the future, we could get better optimizations.

> > 	+	Instead, they set the object pointed to by <tt>rstatus</tt>
> > 	+	to an error code,
> > 	+	or to zero on success.
> > 	+
> > 	+12	-- EINVAL	The value in <tt>base</tt> is not supported.
> > 	+	-- ECANCELED	The given string did not contain
> > 	+			any characters that were converted.
> > 	+	-- ERANGE	The converted value was out of range
> > 	+			and has been coerced,
> > 	+			or the range was invalid (e.g., min > max).
> > 	+	-- ENOTSUP	The given string contained characters
> > 	+			that did not get converted.
> > 	+
> > 	+13	If various errors happen in the same call,
> > 	+	the first one listed here is reported.
> 
> It would be useful to show what a success test looks like, after
>     strtoi (s, &end, base, min, max, &status)
> for each of the four frequent use-cases:
>   -a. expect to parse the initial portion of the string, no coercion,
>   -b. expect to parse the initial portion of the string, silent coercion,
>   -c. expect to parse the entire string, no coercion,
>   -d. expect to parse the entire string, silent coercion.
> 
> AFAICS, the success tests are:
>   -a. status == 0 || status == ENOTSUP

Correct.

>   -b. status == 0 || status == ENOTSUP || status == ERANGE

Correct (but most likely a bug).

>   -c. status == 0

Correct.

>   -d. status == 0 || (status == ERANGE && end > s && *end == '\0')

You don't need end>s, because that would preclude ERANGE.

	status == 0 || (status == ERANGE && *end == '\0')

Aaand, most likely a bug.

BTW, remember that 'endp' is the name of the parameter because you pass
strtoi(..., &end, ...).  The name of your local variable should be end.
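The four success tests can be written as predicates over the status and the end pointer.  Test d. omits the redundant end > s check discussed above.  strtoi_sketch and the ok_* helpers are hypothetical names, used only for illustration; the strtoi body follows the proposed error ordering and is not NetBSD's code.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the proposed strtoi(). */
static intmax_t
strtoi_sketch(const char *restrict s, char **restrict endp, int base,
    intmax_t min, intmax_t max, int *restrict status)
{
	int       saved_errno, st;
	char      *e;
	intmax_t  n;

	if (endp == NULL)
		endp = &e;
	if (status == NULL)
		status = &st;

	saved_errno = errno;

	if (base != 0 && (base < 2 || base > 36)) {
		n = 0;
		*endp = (char *) s;
		*status = EINVAL;
	} else {
		errno = 0;
		n = strtoimax(s, endp, base);

		if (*endp == s)
			*status = ECANCELED;
		else if (errno == ERANGE || n < min || n > max)
			*status = ERANGE;
		else if (**endp != '\0')
			*status = ENOTSUP;
		else
			*status = 0;
	}

	errno = saved_errno;

	if (n > max)
		n = max;
	if (n < min)
		n = min;
	return n;
}

/* a. Initial portion of the string, no coercion. */
static bool
ok_a(int st)
{
	return st == 0 || st == ENOTSUP;
}

/* b. Initial portion of the string, silent coercion. */
static bool
ok_b(int st)
{
	return st == 0 || st == ENOTSUP || st == ERANGE;
}

/* c. Entire string, no coercion. */
static bool
ok_c(int st)
{
	return st == 0;
}

/*
 * d. Entire string, silent coercion.  No "end > s" test is needed:
 * if nothing was converted, the status is ECANCELED, not ERANGE.
 */
static bool
ok_d(int st, const char *end)
{
	return st == 0 || (st == ERANGE && *end == '\0');
}
```

The "42z" case shows why d. needs *end.  With a range of [0, 10], ERANGE hides the trailing 'z', which would otherwise be reported as ENOTSUP.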

> 
> The success test in case d. is so complicated that, in my view, the goal
> to avoid programmer mistakes is not being met.

Cases b and d are not real, IMO.  I have never seen code where that is
wanted, AFAIR, and I analyzed the entire Debian and NetBSD code bases
looking precisely for that usage.  That's what allowed us to fix the bug
in strtoi/u(3) back then.

> I would therefore propose to change the status value to a bit mask, so that
> the error conditions "The converted value was out of range and has been
> coerced" and "The given string contains characters that did not get converted"
> can be both returned together, without conflicting.

Since these are theoretical conditions that real programs never want,
let's not do that.

> And, while at it, the error condition "min > max" is an error that is
> independent of the given string contents; I would rather see it mapped to
> EINVAL rather than ERANGE.

I think it is ERANGE because it should never happen in good programs.
The base is something that depends on the system supporting a certain
base or not, so it makes sense to query it.  But the range is something
specified by the caller, and it would never make sense to pass it
swapped like that, so the API just does the least harmful thing.

Consider the following call:

	strtoi(s, NULL, 42, min, max, &err)

which will report EINVAL.  Does the system support the base or not?  If
the only condition that will trigger EINVAL is an invalid base, then I
have the answer.  Otherwise, I need to check for the validity of
min,max.  I think it's better if the function reports ERANGE for an
invalid range, and then it's my fault if I don't validate my range.

TL;DR:  The base can only be validated by the system, while the range
can (and should) be validated by the caller.


Have a lovely day!
Alex

-- 
<https://www.alejandro-colomar.es/>


* Re: [SC22WG14.29912] alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-18 22:14         ` Joseph Myers
@ 2025-03-18 22:49           ` Alejandro Colomar
  0 siblings, 0 replies; 43+ messages in thread
From: Alejandro Colomar @ 2025-03-18 22:49 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Alejandro Colomar, liba2i, sc22wg14, libbsd, tech-misc,
	Bruno Haible, christos, Đoàn Trần Công Danh,
	Paul Eggert, Eli Schwartz, Guillem Jover, Iker Pedrosa,
	Michael Vetter, Robert Elz, riastradh, Sam James, Serge E. Hallyn

[-- Attachment #1: Type: text/plain, Size: 2168 bytes --]

Hi Joseph,

On Tue, Mar 18, 2025 at 10:14:03PM +0000, Joseph Myers wrote:
> On Tue, 18 Mar 2025, Alejandro Colomar wrote:
> 
> > Okay, let's try with some more rationale:
> > 
> > NetBSD has strtoi/u(3), but not wcstoi/u(3), AFAICS.  This is prior art.
> 
> Prior art is something to learn from, but if there are issues with it (and 
> not being properly orthogonal when considered together with the existing 
> APIs in the standard is such an issue) then they provide a basis for not 
> adopting it exactly as-is.
> 
> > I don't feel qualified to propose a function for a family (wchar.h)
> > which I have never used myself, nor implemented.
> > 
> > I think it would make sense to present two papers at the same time, one
> > proposing the wchar.h variant, and one presenting the normal variant.  I
> > just don't feel qualified to decide whether we want a wchar_t variant,
> > nor to specify it myself.
> 
> This really doesn't need much expertise.  Just look at how strtol and 
> wcstol differ (such as references to "wide character" for wcstol) and 
> follow that.  I think that by the time the proposal reaches an actual 
> document submitted to the committee, it should include both wide and 
> narrow versions (even if earlier drafts just state that the wide version 
> will be included in the full proposal, but omit it to reduce the extent to 
> which changes need applying to both halves of the proposal, while it's 
> still changing rapidly).

Okay, I'll include a paragraph saying "a wide variant will be added to
the final proposal".

I still want to submit an N document without it, to allow reviewers to
concentrate on just the API concepts.  Then, well before Pittsburgh/Brno
I'll add the other one.

I hereby disclaim any mistakes in that part, and encourage anyone
interested in having (or not having) it to verify it thoroughly, and
express it as soon as possible.

> It's *excluding* the wchar_t version that would need more expertise to 
> justify any such exclusion, because the default for these interfaces is to 
> have both versions.


Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>


* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-18 22:43     ` Alejandro Colomar
@ 2025-03-19  0:15       ` Bruno Haible
  2025-03-19 15:26         ` Alejandro Colomar
  2025-03-19 19:27         ` Paul Eggert
  2025-03-19 15:56       ` Thorsten Glaser
  1 sibling, 2 replies; 43+ messages in thread
From: Bruno Haible @ 2025-03-19  0:15 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

Alejandro Colomar wrote:
> > It would be useful to show what a success test looks like, after
> >     strtoi (s, &end, base, min, max, &status)
> > for each of the four frequent use-cases:
> >   -a. expect to parse the initial portion of the string, no coercion,
> >   -b. expect to parse the initial portion of the string, silent coercion,
> >   -c. expect to parse the entire string, no coercion,
> >   -d. expect to parse the entire string, silent coercion.
> > 
> > AFAICS, the success tests are:
> >   -a. status == 0 || status == ENOTSUP
> 
> Correct.
> 
> >   -b. status == 0 || status == ENOTSUP || status == ERANGE
> 
> Correct (but most likely a bug).
> 
> >   -c. status == 0
> 
> Correct.
> 
> >   -d. status == 0 || (status == ERANGE && end > s && *end == '\0')
> 
> You don't need end>s, because that would preclude ERANGE.
> 
> 	status == 0 || (status == ERANGE && *end == '\0')
> 
> Aaand, most likely a bug.

Cases b. and d. are not bugs. Often, the programmer knows that treating
a value > ULONG_MAX is equivalent to treating the value ULONG_MAX. These
are *normal* uses of strto[u]l[l]. Often it is the programmer's intent
that the values "4294967297" and "4294967295" produce the same behaviour
(the same error message, for example).

It is for these cases that your specification contains the clamping /
coercion behaviour.

Now, when you look at the table of success tests:

   -a. status == 0 || status == ENOTSUP
   -b. status == 0 || status == ENOTSUP || status == ERANGE
   -c. status == 0
   -d. status == 0 || (status == ERANGE && *end == '\0')

it is immediately clear that the status return convention is ill-designed,
because the returned 'status' is not the only thing a programmer has to test
after calling the function.
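To make the table above concrete, here is a minimal sketch: strtoi_sketch() is a hypothetical stand-in written from the behaviour described in this thread (status 0, ECANCELED, ENOTSUP, EINVAL, ERANGE), not NetBSD's actual code, and ok_a() to ok_d() encode the four success tests. It assumes POSIX <errno.h> for ECANCELED and ENOTSUP, and in this sketch ENOTSUP takes precedence over ERANGE; NetBSD's implementation should be checked for its exact precedence.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for NetBSD's strtoi(3), NOT the real thing.
 * status: 0, ECANCELED (nothing converted), ENOTSUP (trailing
 * characters), EINVAL (unsupported base) or ERANGE (value clamped to
 * [min, max]).  In this sketch ENOTSUP takes precedence over ERANGE.
 */
intmax_t
strtoi_sketch(const char *s, char **endp, int base,
    intmax_t min, intmax_t max, int *status)
{
	char      *e;
	int       st, saved_errno = errno;
	intmax_t  n = 0;

	if (endp == NULL)
		endp = &e;        /* endp may be NULL ... */
	if (status == NULL)
		status = &st;     /* ... and so may status */

	if (base != 0 && (base < 2 || base > 36)) {
		*endp = (char *) s;
		*status = EINVAL;
	} else {
		errno = 0;
		n = strtoimax(s, endp, base);
		if (*endp == s)
			*status = ECANCELED;
		else if (**endp != '\0')
			*status = ENOTSUP;
		else
			*status = (errno == ERANGE) ? ERANGE : 0;
		errno = saved_errno;
	}

	if (n < min || n > max) {
		if (*status == 0)
			*status = ERANGE;
		return (n < min) ? min : max;
	}
	return n;
}

/* Success tests for the four use cases a to d. */
bool ok_a(int st) { return st == 0 || st == ENOTSUP; }
bool ok_b(int st) { return st == 0 || st == ENOTSUP || st == ERANGE; }
bool ok_c(int st) { return st == 0; }
bool ok_d(int st, const char *end)
{
	return st == 0 || (st == ERANGE && *end == '\0');
}
```

With this precedence, "99x" parsed with range [0, 10] yields 10 and ENOTSUP, so test a silently accepts a clamped value: a single status code cannot report both conditions at once, which is the conflict the bit-mask suggestion is meant to resolve. It also means that, for this sketch, ERANGE already implies *end == '\0'.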

> Cases b and d are not real, IMO.  I have never seen code where that is
> wanted, AFAIR, and I analyzed the entire Debian and NetBSD code bases
> looking precisely for that usage.

I disagree. Any use of strtoul that does not test errno wants overflow
to be mapped to ULONG_MAX, that is, is in case b. or d.
Just looking in gnulib and gettext, I find already 6 occurrences:
  gnulib/lib/getaddrinfo.c:299
  gnulib/lib/nproc.c:402
  gnulib/lib/omp-init.c:48
  gettext/gettext-tools/src/msgfmt.c:287
  gettext/gettext-tools/src/msgl-check.c:379
  gettext/gettext-tools/src/read-stringtable.c:561

> > I would therefore propose to change the status value to a bit mask, so that
> > the error conditions "The converted value was out of range and has been
> > coerced" and "The given string contains characters that did not get converted"
> > can be both returned together, without conflicting.
> 
> Because these are theoretical conditions that a real program never wants,
> let's not do that.

If you don't want to do that, I can only repeat what I said in the previous
mail: The proposal *does not achieve the goal* of avoiding the most common
programmer mistakes. For a robust API, the success test should *only* involve
testing the returned 'status', nothing else.

Bruno




* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-19  0:15       ` Bruno Haible
@ 2025-03-19 15:26         ` Alejandro Colomar
  2025-03-19 18:48           ` Alejandro Colomar
  2025-03-19 21:59           ` Bruno Haible
  2025-03-19 19:27         ` Paul Eggert
  1 sibling, 2 replies; 43+ messages in thread
From: Alejandro Colomar @ 2025-03-19 15:26 UTC (permalink / raw)
  To: Bruno Haible
  Cc: liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

Hi Bruno,

On Wed, Mar 19, 2025 at 01:15:30AM +0100, Bruno Haible wrote:
> Alejandro Colomar wrote:
> > > It would be useful to show what a success test looks like, after
> > >     strtoi (s, &end, base, min, max, &status)
> > > for each of the four frequent use-cases:
> > >   -a. expect to parse the initial portion of the string, no coercion,
> > >   -b. expect to parse the initial portion of the string, silent coercion,
> > >   -c. expect to parse the entire string, no coercion,
> > >   -d. expect to parse the entire string, silent coercion.
> > > 
> > > AFAICS, the success tests are:
> > >   -a. status == 0 || status == ENOTSUP
> > 
> > Correct.
> > 
> > >   -b. status == 0 || status == ENOTSUP || status == ERANGE
> > 
> > Correct (but most likely a bug).

Actually, now I remember that status can be NULL, in which case it's not
reported.  This is a case where you could check for errors with a
simpler expression:

	end != str

like you can do with strtol(3), but with portability guarantees
regarding EINVAL, because strtoi(3bsd) always writes *endp (if nonnull).
Still, (status == 0 || status == ENOTSUP || status == ERANGE) is a
reasonable test too.

I need to update the specification to mention that status can be NULL.

> > 
> > >   -c. status == 0
> > 
> > Correct.
> > 
> > >   -d. status == 0 || (status == ERANGE && end > s && *end == '\0')
> > 
> > You don't need end>s, because that would preclude ERANGE.
> > 
> > 	status == 0 || (status == ERANGE && *end == '\0')
> > 
> > Aaand, most likely a bug.
> 
> Cases b. and d. are not bugs. Often, the programmer knows that treating
> a value > ULONG_MAX is equivalent to treating the value ULONG_MAX. These
> are *normal* uses of strto[u]l[l]. Often it is the programmer's intent
> that the values "4294967297" and "4294967295" produce the same behaviour
> (the same error message, for example).

If you want ULONG_MAX + 1 to be treated like ULONG_MAX, and both
result in an error, then you should probably clamp at ULONG_MAX - 1,
and consider anything above an error.
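A sketch of that suggestion using only ISO C strtoul(3): because strtoul(3) saturates on overflow, ULONG_MAX and every larger input come back as the same value, so rejecting ULONG_MAX rejects them all alike. With the proposed strtou(3), the equivalent would presumably be max = ULONG_MAX - 1 plus treating ERANGE as an error. The helper name is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: accept a full decimal string whose value is in
 * [0, ULONG_MAX - 1].  strtoul(3) returns ULONG_MAX both for the exact
 * value and on overflow, so one comparison rejects them alike.
 */
bool
parse_below_ulong_max(const char *s, unsigned long *out)
{
	char           *end;
	unsigned long  n;

	if (*s < '0' || *s > '9')   /* reject whitespace, signs, empty input */
		return false;
	n = strtoul(s, &end, 10);
	if (*end != '\0' || n == ULONG_MAX)
		return false;
	*out = n;
	return true;
}
```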

> It is for these cases that your specification contains the clamping /
> coercion behaviour.
> 
> Now, when you look at the table of success tests:
> 
>    -a. status == 0 || status == ENOTSUP
>    -b. status == 0 || status == ENOTSUP || status == ERANGE
>    -c. status == 0
>    -d. status == 0 || (status == ERANGE && *end == '\0')
> 
> it is immediately clear that the status return convention is ill-designed,
> because the returned 'status' is not the only thing a programmer has to test
> after calling the function.
> 
> > Cases b and d are not real, IMO.  I have never seen code where that is
> > wanted, AFAIR, and I analyzed the entire Debian and NetBSD code bases
> > looking precisely for that usage.
> 
> I disagree.

I didn't find any occurrence of 'd' in calls to strtoi(3)/strtou(3).
I didn't analyze calls to strtol(3) et al.

> Any use of strtoul that does not test errno wants overflow
> to be mapped to ULONG_MAX, that is, is in case b. or d.
> Just looking in gnulib and gettext, I find already 6 occurrences:
>   gnulib/lib/getaddrinfo.c:299

lib/getaddrinfo.c-297-          if (!(*servname >= '0' && *servname <= '9'))
lib/getaddrinfo.c-298-            return EAI_NONAME;
lib/getaddrinfo.c:299:          port = strtoul (servname, &c, 10);
lib/getaddrinfo.c-300-          if (*c || port > 0xffff)
lib/getaddrinfo.c-301-            return EAI_NONAME;
lib/getaddrinfo.c-302-          port = htons (port);

You could remove the preceding conditional if you don't need to reject
leading whitespace, and merge the check into the strtou(3) call, which
would report ECANCELED for non-numeric input.  Except that a negative
number would be silently converted to a large positive value.  This is
why I use a wrapper function, strtou_noneg(), that rejects negative
numbers.

You could rewrite it as:

	port = strtou_noneg(servname, NULL, 10, 0, UINT16_MAX, &status);
	if (status != 0)
		return EAI_NONAME;
	port = htons(port);

where strtou_noneg() is:

	uintmax_t
	strtou_noneg(const char *s, char **restrict endp, int base,
	    uintmax_t min, uintmax_t max, int *restrict status)
	{
		int  st;

		if (status == NULL)
			status = &st;
		if (strtoi(s, endp, base, 0, 1, status) == 0 && *status == ERANGE)
			return min;

		return strtou(s, endp, base, min, max, status);
	}

I don't think this is a case where you want silent saturation.  You
are in fact doing a range check, [0, UINT16_MAX].
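To try the rewrite above, the sketch below pairs strtou_noneg() with hypothetical stand-ins for strtoi(3) and strtou(3) (strtoi_sketch() and strtou_sketch(), built on strtoimax(3) and strtoumax(3), assuming a valid base and POSIX errno values); it is not the NetBSD or shadow-utils code. parse_port() is a hypothetical helper that returns -1 where the original returns EAI_NONAME, and it omits htons().

```c
#include <errno.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>

/* Status shared by the sketches below; base is assumed valid. */
static int
conv_status(const char *s, const char *end, int err)
{
	if (end == s)
		return ECANCELED;            /* nothing converted */
	if (*end != '\0')
		return ENOTSUP;              /* trailing characters */
	return (err == ERANGE) ? ERANGE : 0;
}

/* Hypothetical stand-in for strtoi(3) (EINVAL handling omitted). */
static intmax_t
strtoi_sketch(const char *s, char **endp, int base,
    intmax_t min, intmax_t max, int *status)
{
	char      *e;
	int       st, saved_errno = errno;
	intmax_t  n;

	if (endp == NULL)
		endp = &e;
	if (status == NULL)
		status = &st;
	errno = 0;
	n = strtoimax(s, endp, base);
	*status = conv_status(s, *endp, errno);
	errno = saved_errno;
	if (n < min || n > max) {
		if (*status == 0)
			*status = ERANGE;
		return (n < min) ? min : max;
	}
	return n;
}

/* Hypothetical stand-in for strtou(3); same logic, unsigned. */
static uintmax_t
strtou_sketch(const char *s, char **endp, int base,
    uintmax_t min, uintmax_t max, int *status)
{
	char       *e;
	int        st, saved_errno = errno;
	uintmax_t  n;

	if (endp == NULL)
		endp = &e;
	if (status == NULL)
		status = &st;
	errno = 0;
	n = strtoumax(s, endp, base);
	*status = conv_status(s, *endp, errno);
	errno = saved_errno;
	if (n < min || n > max) {
		if (*status == 0)
			*status = ERANGE;
		return (n < min) ? min : max;
	}
	return n;
}

/* strtou_noneg() as above: a negative input is clamped to 0 with
 * ERANGE by the signed parse, and reported as out of range. */
static uintmax_t
strtou_noneg(const char *s, char **endp, int base,
    uintmax_t min, uintmax_t max, int *status)
{
	int  st;

	if (status == NULL)
		status = &st;
	if (strtoi_sketch(s, endp, base, 0, 1, status) == 0 && *status == ERANGE)
		return min;
	return strtou_sketch(s, endp, base, min, max, status);
}

/* The getaddrinfo.c port check rewritten; -1 stands for EAI_NONAME. */
static long
parse_port(const char *servname)
{
	int        status;
	uintmax_t  port;

	port = strtou_noneg(servname, NULL, 10, 0, UINT16_MAX, &status);
	if (status != 0)
		return -1;
	return (long) port;
}
```

Unlike the original, which first checks for a leading digit, this version accepts leading whitespace (" 80" parses as 80).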

>   gnulib/lib/nproc.c:402

lib/nproc.c-383-/* Parse OMP environment variables without dependence on OMP.
lib/nproc.c-384-   Return 0 for invalid values.  */
lib/nproc.c-385-static unsigned long int
lib/nproc.c:386:parse_omp_threads (char const* threads)
lib/nproc.c-387-{

...

lib/nproc.c-398-  /* Convert it from positive decimal to 'unsigned long'.  */
lib/nproc.c-399-  if (c_isdigit (*threads))
lib/nproc.c-400-    {
lib/nproc.c-401-      char *endptr = NULL;
lib/nproc.c:402:      unsigned long int value = strtoul (threads, &endptr, 10);
lib/nproc.c-403-
lib/nproc.c-404-      if (endptr != NULL)
lib/nproc.c-405-        {
lib/nproc.c-406-          while (*endptr != '\0' && c_isspace (*endptr))
lib/nproc.c-407-            endptr++;
lib/nproc.c-408-          if (*endptr == '\0')
lib/nproc.c-409-            return value;
lib/nproc.c-410-          /* Also accept the first value in a nesting level,
lib/nproc.c-411-             since we can't determine the nesting level from env vars.  */
lib/nproc.c-412-          else if (*endptr == ',')
lib/nproc.c-413-            return value;
lib/nproc.c-414-        }
lib/nproc.c-415-    }

First of all, the endptr!=NULL test seems misplaced.  The only way it
could be false is if the base were unsupported, and 10 is necessarily
supported.  You should remove the initialization '= NULL' and the
check, since both are dead code, IIRC.  That's one of the things you
don't need to care about with strtoi(3), because it _always_ sets *endp.
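The dead-code remark rests on a guarantee in ISO C: for a valid base, strtoul(3) stores the end of the parsed sequence in *endptr, or the start of the string when no conversion is performed. A minimal sketch (the helper name is hypothetical) showing that comparing against the input is enough:

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * For a valid base, strtoul(3) always writes *endptr: past the digits
 * on success, or back to s when nothing could be converted.  So no
 * NULL initializer or NULL check is needed; compare with s instead.
 */
bool
parsed_something(const char *s, unsigned long *out)
{
	char  *end;   /* always written by strtoul() for base 10 */

	*out = strtoul(s, &end, 10);
	return end != s;
}
```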

And you could probably remove the isdigit test by calling
strtou_noneg().

This could be something like this (fixing the bugs reported above):

	char    *end;
	u_long  value;

	value = strtou_noneg(threads, &end, 10, 0, ULONG_MAX, NULL);
	if (end != threads) {
		end += strspn(end, " \t\n");
		if (streq(end, ""))
			return value;
		if (strprefix(end, ","))
			return value;
	}


This is one case where you seem to silently ignore saturation.  Why
don't you have any diagnostic message?

>   gnulib/lib/omp-init.c:48

lib/omp-init.c-47-      char *endptr = NULL;
lib/omp-init.c:48:      unsigned long int value = strtoul (threads, &endptr, 10);
lib/omp-init.c-49-
lib/omp-init.c-50-      if (endptr != NULL)
lib/omp-init.c-51-        {
lib/omp-init.c-52-          while (*endptr != '\0' && c_isspace (*endptr))
lib/omp-init.c-53-            endptr++;
lib/omp-init.c-54-          if (*endptr == '\0')
lib/omp-init.c-55-            return value;
lib/omp-init.c-56-          /* Also accept the first value in a nesting level,
lib/omp-init.c-57-             since we can't determine the nesting level from env vars.  */
lib/omp-init.c-58-          else if (*endptr == ',')
lib/omp-init.c-59-            return value;
lib/omp-init.c-60-        }

This seems identical to the previous case.

>   gettext/gettext-tools/src/msgfmt.c:287

gettext-tools/src/msgfmt.c-286-          char *endp;
gettext-tools/src/msgfmt.c:287:          size_t new_align = strtoul (optarg, &endp, 0);
gettext-tools/src/msgfmt.c-288-
gettext-tools/src/msgfmt.c-289-          if (endp != optarg)
gettext-tools/src/msgfmt.c-290-            alignment = new_align;

This code will misbehave badly on platforms where size_t is narrower
than u_long.  Consider the case where you parse a high u_long, let's say
SIZE_MAX + 1ul.  It will be converted to 0.  There's a bug due to a
missing range check (but orthogonal to saturation).

You also don't reject negative numbers, which I expect to be a bug,
connected with the one from above.
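The wraparound is easy to demonstrate on any host by using uint16_t as a stand-in for a size_t narrower than unsigned long; the sketch also shows a range-checked parse into size_t. Both helper names are hypothetical, and parse_alignment() is stricter than the original in that it also rejects trailing characters.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Conversion to a narrower unsigned type keeps the value modulo 2^N.
 * uint16_t stands in for a size_t narrower than unsigned long:
 * 65536 (UINT16_MAX + 1) silently becomes 0.
 */
uint16_t
narrow_to_u16(unsigned long v)
{
	return (uint16_t) v;
}

/*
 * Hypothetical range-checked replacement for the msgfmt.c snippet:
 * rejects empty input, negative numbers, trailing characters, and
 * values that do not fit in size_t.
 */
bool
parse_alignment(const char *s, size_t *out)
{
	char                *end;
	unsigned long long  v;

	if (*s < '0' || *s > '9')
		return false;
	errno = 0;
	v = strtoull(s, &end, 0);
	if (*end != '\0' || errno == ERANGE || v > SIZE_MAX)
		return false;
	*out = (size_t) v;
	return true;
}
```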

This could be rewritten to (fixing the bugs reported above):

	char    *end;
	size_t  new_align;

	new_align = strtou_noneg(optarg, &end, 0, 0, SIZE_MAX, NULL);
	if (optarg != end)
		alignment = new_align;

This seems to be another case where you silently saturate.  Why don't
you have a diagnostic message for invalid input?

>   gettext/gettext-tools/src/msgl-check.c:379

gettext-tools/src/msgl-check.c-374-          while (*nplurals != '\0' && c_isspace ((unsigned char) *nplurals))
gettext-tools/src/msgl-check.c-375-            ++nplurals;
gettext-tools/src/msgl-check.c-376-          endp = nplurals;
gettext-tools/src/msgl-check.c-377-          nplurals_value = 0;
gettext-tools/src/msgl-check.c-378-          if (*nplurals >= '0' && *nplurals <= '9')
gettext-tools/src/msgl-check.c:379:            nplurals_value = strtoul (nplurals, (char **) &endp, 10);
gettext-tools/src/msgl-check.c-380-          if (nplurals == endp)
gettext-tools/src/msgl-check.c-381-            {
gettext-tools/src/msgl-check.c-382-              const char *msg = _("invalid nplurals value");
gettext-tools/src/msgl-check.c-383-              char *help = plural_help (nullentry);
gettext-tools/src/msgl-check.c-384-
gettext-tools/src/msgl-check.c-385-              if (help != NULL)
gettext-tools/src/msgl-check.c-386-                {
gettext-tools/src/msgl-check.c-387-                  char *msgext = xasprintf ("%s\n%s", msg, help);
gettext-tools/src/msgl-check.c-388-                  xeh->xerror (CAT_SEVERITY_ERROR, header, NULL, 0, 0, true,
gettext-tools/src/msgl-check.c-389-                               msgext);
gettext-tools/src/msgl-check.c-390-                  free (msgext);
gettext-tools/src/msgl-check.c-391-                  free (help);
gettext-tools/src/msgl-check.c-392-                }
gettext-tools/src/msgl-check.c-393-              else
gettext-tools/src/msgl-check.c-394-                xeh->xerror (CAT_SEVERITY_ERROR, header, NULL, 0, 0, false,
gettext-tools/src/msgl-check.c-395-                             msg);
gettext-tools/src/msgl-check.c-396-
gettext-tools/src/msgl-check.c-397-              seen_errors++;
gettext-tools/src/msgl-check.c-398-            }

You could get rid of a lot of code preceding the strtoul(3) call by
calling strtou_noneg() instead.  And strspn(3) would also help.

	nplurals += strspn(nplurals, " \t\n");
	nplurals_value = strtou_noneg(nplurals, (char **) &end, 10, 0, ULONG_MAX, NULL);
	if (nplurals == end)
		...

On the other hand, I also wonder why you don't diagnose invalid input.
Why is -10 an "invalid nplurals value", but ULONG_MAX+10 is a valid
(albeit clamped) one?
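The asymmetry comes from strtoul(3) itself, as specified by ISO C: a leading '-' is accepted and the result negated in unsigned arithmetic, while an out-of-range input saturates to ULONG_MAX. In the msgl-check.c snippet, "-10" is rejected only because of the explicit digit test before the call. A trivial wrapper (hypothetical name) shows both behaviours:

```c
#include <limits.h>
#include <stdlib.h>

/*
 * Plain strtoul(3) in base 10, ignoring where parsing stopped.
 * "-10" yields ULONG_MAX - 9 (negated in unsigned arithmetic) rather
 * than an error; an overflowing input saturates to ULONG_MAX.
 */
unsigned long
strtoul10(const char *s)
{
	return strtoul(s, NULL, 10);
}
```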

All of these cases look like missing error handling, IMO.

>   gettext/gettext-tools/src/read-stringtable.c:561

gettext-tools/src/read-stringtable.c-553-    {
gettext-tools/src/read-stringtable.c-554-      char *last_colon;
gettext-tools/src/read-stringtable.c-555-      unsigned long number;
gettext-tools/src/read-stringtable.c-556-      char *endp;
gettext-tools/src/read-stringtable.c-557-
gettext-tools/src/read-stringtable.c-558-      if (strlen (line) >= 6 && memcmp (line, "File: ", 6) == 0
gettext-tools/src/read-stringtable.c-559-          && (last_colon = strrchr (line + 6, ':')) != NULL
gettext-tools/src/read-stringtable.c-560-          && *(last_colon + 1) != '\0'
gettext-tools/src/read-stringtable.c:561:          && (number = strtoul (last_colon + 1, &endp, 10), *endp == '\0'))
gettext-tools/src/read-stringtable.c-562-        {
gettext-tools/src/read-stringtable.c-563-          /* A "File: <filename>:<number>" type comment.  */
gettext-tools/src/read-stringtable.c-564-          *last_colon = '\0';
gettext-tools/src/read-stringtable.c-565-          catalog_reader_seen_comment_filepos (catr, line + 6, number);
gettext-tools/src/read-stringtable.c-566-        }
gettext-tools/src/read-stringtable.c-567-      else
gettext-tools/src/read-stringtable.c-568-        catalog_reader_seen_comment (catr, line);
gettext-tools/src/read-stringtable.c-569-    }

Let me try to rewrite it for readability first.

	char        *filepos, *last_colon;
	u_long      n;
	const char  *numstr, *end;

	filepos = strprefix(line, "File: ") ?: (char []){""};
	last_colon = strrchr(filepos, ':') ?: (char []){":"};
	numstr = strprefix(last_colon, ":");

	n = strtoul(numstr, (char **) &end, 10);
	if (numstr != end && streq(end, "")) {
		/* A "File: <filename>:<number>" type comment.  */
		strcpy(last_colon, "");
		catalog_reader_seen_comment_filepos(catr, filepos, n);

	} else {
		catalog_reader_seen_comment(catr, line);
	}

You're forgetting about negative numbers?  Or are you certain that they
can't happen?  How about huge values?  Assuming you want compatible code
calling strtou():

	n = strtou(numstr, (char **) &end, 10, 0, ULONG_MAX, &status);
	if (status == 0 || status == ENOTSUP || status == ERANGE) {
		...
	} else {
		...
	}

But again, I wonder why you don't do range checks.

> > > I would therefore propose to change the status value to a bit mask, so that
> > > the error conditions "The converted value was out of range and has been
> > > coerced" and "The given string contains characters that did not get converted"
> > > can be both returned together, without conflicting.
> > 
> > Because these are theoretical conditions that a real program never wants,
> > let's not do that.
> 
> If you don't want to do that, I can only repeat what I said in the previous
> mail: The proposal *does not achieve the goal* of avoiding the most common
> programmer mistakes. For a robust API, the success test should *only* involve
> testing the returned 'status', nothing else.

Let's discuss this after your responses to the above.

> 
> Bruno
> 
> 
> 
> 

-- 
<https://www.alejandro-colomar.es/>

* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-18 22:43     ` Alejandro Colomar
  2025-03-19  0:15       ` Bruno Haible
@ 2025-03-19 15:56       ` Thorsten Glaser
  2025-03-19 16:25         ` Alejandro Colomar
  1 sibling, 1 reply; 43+ messages in thread
From: Thorsten Glaser @ 2025-03-19 15:56 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Bruno Haible, liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

On Tue, 18 Mar 2025, Alejandro Colomar wrote:

>> To address this adoption problem, how about changing these functions to
>> generic functions (in the sense of <tgmath.h>)? In such a way that
>>     strtoi (n, &end, base, LONG_MIN, LONG_MAX, &status)
>> is known to return a 'long' rather than 'intmax_t', and
>>     strtoi (n, &end, base, INT_MIN, INT_MAX, &status)

That, and especially…

>I have designed a better API, through a type-generic macro:
>
>	int
>	a2i(typename T, T *restrict n, QChar *s,
>	    QChar *_Optional *restrict endp, int base,
>	    T min, T max);
[…]

… this is a nightmare. This effectively will prevent people from
adding that to systems that do not use C2y as primary/only target
yet, and mixing code.

(Besides, a2i is too generic a name.)

bye,
//mirabilos
PS: Please don’t Cc me explicitly on this thread.
-- 
13:28⎜«neurodamage:#cvs» you're a handy guy to have around for systems stuff ☺
16:06⎜<Draget:#cvs> Thank god I found you =)   20:03│«bioe007:#cvs» mira2k: ty
17:14⎜<ldiain:#cvs> Thanks big help you are :-)   <bioe007> mira|nwt: ty again
18:36⎜«ThunderChicken:#cvs» mirabilos FTW!  23:03⎜«mithraic:#cvs» aaah. thanks

* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-19 15:56       ` Thorsten Glaser
@ 2025-03-19 16:25         ` Alejandro Colomar
  2025-03-19 16:36           ` Thorsten Glaser
  2025-03-19 17:35           ` Bruno Haible
  0 siblings, 2 replies; 43+ messages in thread
From: Alejandro Colomar @ 2025-03-19 16:25 UTC (permalink / raw)
  To: liba2i
  Cc: Bruno Haible, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

Hi Thorsten,

On Wed, Mar 19, 2025 at 04:56:32PM +0100, Thorsten Glaser wrote:
> On Tue, 18 Mar 2025, Alejandro Colomar wrote:
> 
> >> To address this adoption problem, how about changing these functions to
> >> generic functions (in the sense of <tgmath.h>)? In such a way that
> >>     strtoi (n, &end, base, LONG_MIN, LONG_MAX, &status)
> >> is known to return a 'long' rather than 'intmax_t', and
> >>     strtoi (n, &end, base, INT_MIN, INT_MAX, &status)
> 
> That, and especially…

Please propose an implementation of the overload-selecting macro, and
clarify how this should work:

	n = strto*(s, NULL, 0, SHRT_MIN, UINT_MAX, &status);

Do we use typeof(min + max)?  Which is the type to use?

Also, aren't you worried about being inventive in the standard?  I feel
safer with this old API than with one invented now.

> >I have designed a better API, through a type-generic macro:
> >
> >	int
> >	a2i(typename T, T *restrict n, QChar *s,
> >	    QChar *_Optional *restrict endp, int base,
> >	    T min, T max);
> […]
> 
> … this is a nightmare. This effectively will prevent people from
> adding that to systems that do not use C2y as primary/only target
> yet, and mixing code.

That macro can be implemented with C11.  Here's the implementation I
wrote for shadow-utils, which we've been using and distributing for a
year already:

	#define a2i(TYPE, n, s, ...)                                                  \
	(                                                                             \
		_Generic((void (*)(TYPE, typeof(s))) 0,                               \
			void (*)(short,              const char *):  a2sh_c,          \
			void (*)(short,              const void *):  a2sh_c,          \
			void (*)(short,              char *):        a2sh_nc,         \
			void (*)(short,              void *):        a2sh_nc,         \
			void (*)(int,                const char *):  a2si_c,          \
			void (*)(int,                const void *):  a2si_c,          \
			void (*)(int,                char *):        a2si_nc,         \
			void (*)(int,                void *):        a2si_nc,         \
			void (*)(long,               const char *):  a2sl_c,          \
			void (*)(long,               const void *):  a2sl_c,          \
			void (*)(long,               char *):        a2sl_nc,         \
			void (*)(long,               void *):        a2sl_nc,         \
			void (*)(long long,          const char *):  a2sll_c,         \
			void (*)(long long,          const void *):  a2sll_c,         \
			void (*)(long long,          char *):        a2sll_nc,        \
			void (*)(long long,          void *):        a2sll_nc,        \
			void (*)(unsigned short,     const char *):  a2uh_c,          \
			void (*)(unsigned short,     const void *):  a2uh_c,          \
			void (*)(unsigned short,     char *):        a2uh_nc,         \
			void (*)(unsigned short,     void *):        a2uh_nc,         \
			void (*)(unsigned int,       const char *):  a2ui_c,          \
			void (*)(unsigned int,       const void *):  a2ui_c,          \
			void (*)(unsigned int,       char *):        a2ui_nc,         \
			void (*)(unsigned int,       void *):        a2ui_nc,         \
			void (*)(unsigned long,      const char *):  a2ul_c,          \
			void (*)(unsigned long,      const void *):  a2ul_c,          \
			void (*)(unsigned long,      char *):        a2ul_nc,         \
			void (*)(unsigned long,      void *):        a2ul_nc,         \
			void (*)(unsigned long long, const char *):  a2ull_c,         \
			void (*)(unsigned long long, const void *):  a2ull_c,         \
			void (*)(unsigned long long, char *):        a2ull_nc,        \
			void (*)(unsigned long long, void *):        a2ull_nc         \
		)(n, s, __VA_ARGS__)                                                  \
	)

And here's the definition of one of the overloads:

	int
	a2sl_nc(long *restrict n, char *s,
	    char **restrict endp, int base, long min, long max)
	{
		int  status;

		*n = strtoi(s, endp, base, min, max, &status);
		if (status != 0) {
			errno = status;
			return -1;
		}
		return 0;
	}
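Note that the macro above also uses typeof, which ISO C only standardized in C23 (before that it was a GNU extension). As a reduced sketch showing that the dispatch itself needs nothing beyond C11 _Generic, here is a version that selects on the pointee type of the output pointer. It covers only long and unsigned long, drops the const/non-const overloads that motivate the typeof(s) part, and its function and macro names are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical per-type parsers: return 0 on success, -1 on error. */
static int
parse_l(long *n, const char *s)
{
	char  *end;

	errno = 0;
	*n = strtol(s, &end, 10);
	return (end == s || *end != '\0' || errno == ERANGE) ? -1 : 0;
}

static int
parse_ul(unsigned long *n, const char *s)
{
	char  *end;

	errno = 0;
	*n = strtoul(s, &end, 10);
	return (end == s || *end != '\0' || errno == ERANGE) ? -1 : 0;
}

/*
 * Dispatch on the type of *n.  Only ISO C11 _Generic is used (no
 * typeof), so this builds with -std=c11.
 */
#define parse_num(n, s)  _Generic(*(n),  \
	long:           parse_l,          \
	unsigned long:  parse_ul          \
)(n, s)
```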

> (Besides, a2i is too generic a name.)

I named it like atoi(3), just replacing s/to/2/.  It is just as generic
as the name of the APIs it intends to supersede.

> bye,
> //mirabilos
> PS: Please don’t Cc me explicitly on this thread.

Ok.  Have a lovely day!
Alex

-- 
<https://www.alejandro-colomar.es/>

* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-19 16:25         ` Alejandro Colomar
@ 2025-03-19 16:36           ` Thorsten Glaser
  2025-03-19 16:53             ` Alejandro Colomar
  2025-03-19 17:35           ` Bruno Haible
  1 sibling, 1 reply; 43+ messages in thread
From: Thorsten Glaser @ 2025-03-19 16:36 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: liba2i, Bruno Haible, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

On Wed, 19 Mar 2025, Alejandro Colomar wrote:

>Please propose an implementation of the overload-selecting macro

Well yes, no.

>That macro can be implemented with C11.  Here's the implementation I

But not with C89.

bye,
//mirabilos
PS: Please don’t Cc me explicitly on this thread.

>Ok.  Have a lovely day!

(That’s because I get it via a mailing list, to avoid duplicates,
which is always a hassle to sort out in my INBOX.)

* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-19 16:36           ` Thorsten Glaser
@ 2025-03-19 16:53             ` Alejandro Colomar
  0 siblings, 0 replies; 43+ messages in thread
From: Alejandro Colomar @ 2025-03-19 16:53 UTC (permalink / raw)
  To: Thorsten Glaser
  Cc: liba2i, Bruno Haible, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

Hi Thorsten,

On Wed, Mar 19, 2025 at 05:36:32PM +0100, Thorsten Glaser wrote:
> On Wed, 19 Mar 2025, Alejandro Colomar wrote:
> 
> >Please propose an implementation of the overload-selecting macro
> 
> Well yes, no.

Could you please clarify?  I don't understand what you mean.  If you
think strtoi(3) should be replaced by a type-generic function, you
should clarify the semantics, at least to some degree (ideally with
some code, but even a description in words would be better than
nothing).

> >That macro can be implemented with C11.  Here's the implementation I
> 
> But not with C89.

A type-generic macro cannot be implemented in C89 at all.  C11 is the
minimum for anything type-generic (except for compiler magic).  You can
probably use the same compiler magic to backport my a2i() macro to
dialects older than C11.

> bye,
> //mirabilos
> PS: Please don’t Cc me explicitly on this thread.
> 
> >Ok.  Have a lovely day!
> 
> (That’s because I get it via a mailing list, to avoid duplicates,
> which is always a hassle to sort out in my INBOX.)

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>

* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-19 16:25         ` Alejandro Colomar
  2025-03-19 16:36           ` Thorsten Glaser
@ 2025-03-19 17:35           ` Bruno Haible
  2025-03-19 18:01             ` Alejandro Colomar
  1 sibling, 1 reply; 43+ messages in thread
From: Bruno Haible @ 2025-03-19 17:35 UTC (permalink / raw)
  To: liba2i, Alejandro Colomar
  Cc: sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

Alejandro Colomar wrote:
> > >> To address this adoption problem, how about changing these functions to
> > >> generic functions (in the sense of <tgmath.h>)? In such a way that
> > >>     strtoi (n, &end, base, LONG_MIN, LONG_MAX, &status)
> > >> is known to return a 'long' rather than 'intmax_t', and
> > >>     strtoi (n, &end, base, INT_MIN, INT_MAX, &status)
> > 
> > That, and especially…
> 
> Please propose an implementation of the overload-selecting macro, and
> clarify how this should work:
> 
> 	n = strto*(s, NULL, 0, SHRT_MIN, UINT_MAX, &status);

Indeed, the "usual arithmetic conversions" (ISO C 23 § 6.3.1.8, § 7.27.(7))
would not work well in this case. Instead, one needs to distinguish strtoi
and strtou:
  - For strtoi, the first of the types 'signed char', 'short', 'int', 'long',
    'long long', 'intmax_t' that contains both the min and the max value.
  - For strtou, the first of the types 'unsigned char', 'unsigned short',
    'unsigned int', 'unsigned long', 'unsigned long long', 'uintmax_t' that
    contains both the min and the max value.
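The rule above selects a type at compile time; as an illustration only, here is a run-time model of it (hypothetical function names) that returns the name of the chosen type. Taking max as uintmax_t lets it represent a bound that no signed type can hold, for which this sketch returns NULL; that is one possible answer for such inputs, not part of the rule as stated.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Run-time model (hypothetical) of the rule for strtoi: the first of
 * signed char, short, int, long, long long, intmax_t holding both
 * bounds.  Returns NULL when max exceeds every signed type.
 */
const char *
narrowest_signed(intmax_t min, uintmax_t max)
{
	if (max > (uintmax_t) INTMAX_MAX)
		return NULL;
	if (min >= SCHAR_MIN && max <= (uintmax_t) SCHAR_MAX)
		return "signed char";
	if (min >= SHRT_MIN && max <= (uintmax_t) SHRT_MAX)
		return "short";
	if (min >= INT_MIN && max <= (uintmax_t) INT_MAX)
		return "int";
	if (min >= LONG_MIN && max <= (uintmax_t) LONG_MAX)
		return "long";
	if (min >= LLONG_MIN && max <= (uintmax_t) LLONG_MAX)
		return "long long";
	return "intmax_t";
}

/* Same rule for strtou: an unsigned min is never negative, so only
 * max decides. */
const char *
narrowest_unsigned(uintmax_t max)
{
	if (max <= (uintmax_t) UCHAR_MAX)
		return "unsigned char";
	if (max <= (uintmax_t) USHRT_MAX)
		return "unsigned short";
	if (max <= UINT_MAX)
		return "unsigned int";
	if (max <= ULONG_MAX)
		return "unsigned long";
	if (max <= ULLONG_MAX)
		return "unsigned long long";
	return "uintmax_t";
}
```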

Bruno




* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-19 17:35           ` Bruno Haible
@ 2025-03-19 18:01             ` Alejandro Colomar
  0 siblings, 0 replies; 43+ messages in thread
From: Alejandro Colomar @ 2025-03-19 18:01 UTC (permalink / raw)
  To: Bruno Haible
  Cc: liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

Hi Bruno,

On Wed, Mar 19, 2025 at 06:35:29PM +0100, Bruno Haible wrote:
> Alejandro Colomar wrote:
> > > >> To address this adoption problem, how about changing these functions to
> > > >> generic functions (in the sense of <tgmath.h>)? In such a way that
> > > >>     strtoi (n, &end, base, LONG_MIN, LONG_MAX, &status)
> > > >> is known to return a 'long' rather than 'intmax_t', and
> > > >>     strtoi (n, &end, base, INT_MIN, INT_MAX, &status)
> > > 
> > > That, and especially…
> > 
> > Please propose an implementation of the overload-selecting macro, and
> > clarify how this should work:
> > 
> > 	n = strto*(s, NULL, 0, SHRT_MIN, UINT_MAX, &status);
> 
> Indeed, the "usual arithmetic conversions" (ISO C 23 § 6.3.1.8, § 7.27.(7))
> would not work well in this case. Instead, one needs to distinguish strtoi
> and strtou:
>   - For strtoi, the first of the types 'signed char', 'short', 'int', 'long',
>     'long long', 'intmax_t' that contains both the min and the max value.
>   - For strtou, the first of the types 'unsigned char', 'unsigned short',
>     'unsigned int', 'unsigned long', 'unsigned long long', 'uintmax_t' that
>     contains both the min and the max value.

What if none contain the value?

E.g.:

	n = strtoi(s, NULL, 0, SHRT_MIN, ULLONG_MAX, &status);
	n = strtou(s, NULL, 0, 3, -1, &status);

The first one is a bug, but we should still decide what to do with it.
The second one could be a shorthand for UINTMAX_MAX.

Anyway, for this, I'm working on a better API.  I don't think we should
make strtoi(3) type-generic.  a2i() is better for that, accepting the
type as a parameter.


Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>

* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-19 15:26         ` Alejandro Colomar
@ 2025-03-19 18:48           ` Alejandro Colomar
  2025-03-19 18:56             ` Alejandro Colomar
  2025-03-19 21:59           ` Bruno Haible
  1 sibling, 1 reply; 43+ messages in thread
From: Alejandro Colomar @ 2025-03-19 18:48 UTC (permalink / raw)
  To: Bruno Haible
  Cc: liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

On Wed, Mar 19, 2025 at 04:26:11PM +0100, Alejandro Colomar wrote:
> Hi Bruno,
> 
> On Wed, Mar 19, 2025 at 01:15:30AM +0100, Bruno Haible wrote:
> > Alejandro Colomar wrote:
> > > > It would be useful to show how a success test looks like, after
> > > >     strtoi (s, &end, base, min, max, &status)
> > > > for each of the four frequent use-cases:
> > > >   -a. expect to parse the initial portion of the string, no coercion,
> > > >   -b. expect to parse the initial portion of the string, silent coercion,
> > > >   -c. expect to parse the entire string, no coercion,
> > > >   -d. expect to parse the entire string, silent coercion.
> > > > 
> > > > AFAICS, the success tests are:
> > > >   -a. status == 0 || status == ENOTSUP
> > > 
> > > Correct.
> > > 
> > > >   -b. status == 0 || status == ENOTSUP || status == ERANGE
> > > 
> > > Correct (but most likely a bug).
> 
> Actually, now I remember that status can be NULL, in which case it's not
> reported.  This is a case where you could test for success with a
> simpler expression:
> 
> 	end != s
> 
> like you can do with strtol(3), but with portability guarantees
> regarding EINVAL, because strtoi(3bsd) always writes *endp (if nonnull).
> Still, (status == 0 || status == ENOTSUP || status == ERANGE) is a
> reasonable test too.
> 
> I need to update the specification to mention that status can be NULL.
> 
> > > 
> > > >   -c. status == 0
> > > 
> > > Correct.
> > > 
> > > >   -d. status == 0 || (status == ERANGE && end > s && *end == '\0')
> > > 
> > > You don't need end>s, because end==s would preclude ERANGE.
> > > 
> > > 	status == 0 || (status == ERANGE && *end == '\0')
> > > 
> > > Aaand, most likely a bug.
> > 
> > Cases b. and d. are not bugs. Often, the programmer knows that treating
> > a value > ULONG_MAX is equivalent to treating the value ULONG_MAX. These
> > are *normal* uses of strto[u]l[l]. Often it is the programmer's intent
> > that the values "4294967297" and "4294967295" produce the same behaviour
> > (the same error message, for example).
> 
> If you want ULONG_MAX + 1 to be treated like ULONG_MAX, and both
> result in an error, then you should probably clamp at ULONG_MAX - 1,
> and consider anything above an error.
> 
> > It is for these cases that your specification contains the clamping /
> > coercion behaviour.
> > 
> > Now, when you look at the table of success tests:
> > 
> >    -a. status == 0 || status == ENOTSUP
> >    -b. status == 0 || status == ENOTSUP || status == ERANGE
> >    -c. status == 0
> >    -d. status == 0 || (status == ERANGE && *end == '\0')
> > 
> > it is immediately clear that the status return convention is ill-designed,
> > because the returned 'status' is not the only thing a programmer has to test
> > after calling the function.
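The four success tests in the table can be spelled out as C predicates, which makes it visible that only case d needs more than status. This is a sketch: the function names are hypothetical, and the status values are taken from the table, not from a check against NetBSD's sources.

```c
#include <errno.h>
#include <stdbool.h>

/* Success tests after
 *     n = strtoi(s, &end, base, min, max, &status);
 * ENOTSUP and ECANCELED are POSIX errno values; ERANGE is ISO C. */

/* a. Initial portion of the string, no coercion. */
static bool
ok_a(int status)
{
	return status == 0 || status == ENOTSUP;
}

/* b. Initial portion of the string, silent coercion. */
static bool
ok_b(int status)
{
	return status == 0 || status == ENOTSUP || status == ERANGE;
}

/* c. Entire string, no coercion. */
static bool
ok_c(int status)
{
	return status == 0;
}

/* d. Entire string, silent coercion.  As in the table, this test
 * also inspects end, not just status. */
static bool
ok_d(int status, const char *end)
{
	return status == 0 || (status == ERANGE && *end == '\0');
}
```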
> > 
> > > Cases b and d are not real, IMO.  I have never seen code where that is
> > > wanted, AFAIR, and I analyzed the entire Debian and NetBSD code bases
> > > looking precisely for that usage.
> > 
> > I disagree.
> 
> I didn't find any occurrence of 'd' in calls to strtoi(3)/strtou(3).
> I didn't analyze calls to strtol(3) et al.
> 
> > Any use of strtoul that does not test errno wants overflow
> > to be mapped to ULONG_MAX, that is, is in case b. or d.
> > Just looking in gnulib and gettext, I find already 6 occurrences:
> >   gnulib/lib/getaddrinfo.c:299
> 
> lib/getaddrinfo.c-297-          if (!(*servname >= '0' && *servname <= '9'))
> lib/getaddrinfo.c-298-            return EAI_NONAME;
> lib/getaddrinfo.c:299:          port = strtoul (servname, &c, 10);
> lib/getaddrinfo.c-300-          if (*c || port > 0xffff)
> lib/getaddrinfo.c-301-            return EAI_NONAME;
> lib/getaddrinfo.c-302-          port = htons (port);
> 
> You could remove the preceding conditional if you don't mind accepting
> leading whitespace, and merge that check into the strtou(3) call, which
> would report ECANCELED for non-numeric input.  The catch is that a
> negative number is silently converted to a large positive value.  This
> is why I use a wrapper function, strtou_noneg(), that rejects negative
> numbers.
> 
> You could rewrite it as:
> 
> 	port = strtou_noneg(servname, NULL, 10, 0, UINT16_MAX, &status);
> 	if (status != 0)
> 		return EAI_NONAME;
> 	port = htons(port);
> 
> where strtou_noneg() is:
> 
> 	uintmax_t
> 	strtou_noneg(const char *s, char **restrict endp, int base,
> 	    uintmax_t min, uintmax_t max, int *restrict status)
> 	{
> 		int  st;
> 
> 		if (status == NULL)
> 			status = &st;
> 		if (strtoi(s, endp, base, 0, 1, status) == 0 && *status == ERANGE)
> 			return min;
> 
> 		return strtou(s, endp, base, min, max, status);
> 	}
> 
> I don't think this is a case where you want silent saturation.  You're
> in fact doing range checks [0, UINT16_MAX].
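For comparison, the checks that the strtou_noneg() rewrite is meant to perform (no digits, trailing characters, negative input, and the [0, UINT16_MAX] range) can be sketched with ISO C strtoul(3) alone. parse_port() is a hypothetical name, and unlike the gnulib code, which requires a leading digit, this sketch accepts leading white space and a '+' sign.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse all of s as a decimal port number in
 * [0, UINT16_MAX].  Returns false on empty input, trailing characters,
 * a minus sign, or an out-of-range value. */
static bool
parse_port(const char *s, uint16_t *port)
{
	const char     *p = s;
	char           *end;
	unsigned long  n;

	/* strtoul(3) skips white space and accepts a sign; reject "-1"
	 * here, since strtoul(3) would negate it into a huge value. */
	while (isspace((unsigned char) *p))
		p++;
	if (*p == '-')
		return false;

	errno = 0;
	n = strtoul(s, &end, 10);
	if (end == s || *end != '\0')   /* no digits, or trailing junk */
		return false;
	if (errno == ERANGE || n > UINT16_MAX)
		return false;

	*port = (uint16_t) n;
	return true;
}
```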
> 
> >   gnulib/lib/nproc.c:402
> 
> lib/nproc.c-383-/* Parse OMP environment variables without dependence on OMP.
> lib/nproc.c-384-   Return 0 for invalid values.  */
> lib/nproc.c-385-static unsigned long int
> lib/nproc.c:386:parse_omp_threads (char const* threads)
> lib/nproc.c-387-{
> 
> ...
> 
> lib/nproc.c-398-  /* Convert it from positive decimal to 'unsigned long'.  */
> lib/nproc.c-399-  if (c_isdigit (*threads))
> lib/nproc.c-400-    {
> lib/nproc.c-401-      char *endptr = NULL;
> lib/nproc.c:402:      unsigned long int value = strtoul (threads, &endptr, 10);
> lib/nproc.c-403-
> lib/nproc.c-404-      if (endptr != NULL)
> lib/nproc.c-405-        {
> lib/nproc.c-406-          while (*endptr != '\0' && c_isspace (*endptr))
> lib/nproc.c-407-            endptr++;
> lib/nproc.c-408-          if (*endptr == '\0')
> lib/nproc.c-409-            return value;
> lib/nproc.c-410-          /* Also accept the first value in a nesting level,
> lib/nproc.c-411-             since we can't determine the nesting level from env vars.  */
> lib/nproc.c-412-          else if (*endptr == ',')
> lib/nproc.c-413-            return value;
> lib/nproc.c-414-        }
> lib/nproc.c-415-    }
> 
> First of all, the endptr!=NULL test seems misplaced.  The only way that
> could be true is if the base is unsupported, and 10 is necessarily

s/true/equal/

> supported.  You should remove the initialization '= NULL', and the
> check, since both are dead code, IIRC.  That's one of the things you
> don't need to worry about with strtoi(3), because it _always_ sets *endp.
> 
> And you could probably remove the isdigit test by calling
> strtou_noneg().
> 
> This could be something like this (fixing the bugs reported above):
> 
> 	char    *end;
> 	u_long  value;
> 
> 	value = strtou_noneg(threads, &end, 10, 0, ULONG_MAX, NULL);
> 	if (end != threads) {
> 		end += strspn(end, " \t\n");
> 		if (streq(end, ""))
> 			return value;
> 		if (strprefix(end, ","))
> 			return value;
> 	}
> 
> 
> This is one case where you seem to silently ignore saturation.  Why
> don't you have any diagnostic message?
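The rewrites in this message use streq() and strprefix(), which are not ISO C functions. A minimal sketch, assuming the semantics implied by how they are used here (strprefix() returns a pointer just past the prefix, or NULL); these are not necessarily the definitions Alex has in mind.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed semantics: true iff the two strings are equal. */
static bool
streq(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}

/* Assumed semantics: if s starts with prefix, return a pointer just
 * past the prefix; otherwise return NULL.  Like strchr(3), it drops
 * const so it works on both const and non-const strings. */
static char *
strprefix(const char *s, const char *prefix)
{
	size_t  len = strlen(prefix);

	if (strncmp(s, prefix, len) != 0)
		return NULL;
	return (char *) s + len;
}
```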
> 
> >   gnulib/lib/omp-init.c:48
> 
> lib/omp-init.c-47-      char *endptr = NULL;
> lib/omp-init.c:48:      unsigned long int value = strtoul (threads, &endptr, 10);
> lib/omp-init.c-49-
> lib/omp-init.c-50-      if (endptr != NULL)
> lib/omp-init.c-51-        {
> lib/omp-init.c-52-          while (*endptr != '\0' && c_isspace (*endptr))
> lib/omp-init.c-53-            endptr++;
> lib/omp-init.c-54-          if (*endptr == '\0')
> lib/omp-init.c-55-            return value;
> lib/omp-init.c-56-          /* Also accept the first value in a nesting level,
> lib/omp-init.c-57-             since we can't determine the nesting level from env vars.  */
> lib/omp-init.c-58-          else if (*endptr == ',')
> lib/omp-init.c-59-            return value;
> lib/omp-init.c-60-        }
> 
> This seems identical to the previous case.
> 
> >   gettext/gettext-tools/src/msgfmt.c:287
> 
> gettext-tools/src/msgfmt.c-286-          char *endp;
> gettext-tools/src/msgfmt.c:287:          size_t new_align = strtoul (optarg, &endp, 0);
> gettext-tools/src/msgfmt.c-288-
> gettext-tools/src/msgfmt.c-289-          if (endp != optarg)
> gettext-tools/src/msgfmt.c-290-            alignment = new_align;
> 
> This code will misbehave badly on platforms where size_t is narrower
> than u_long.  Consider the case where you parse a high u_long, let's say
> SIZE_MAX + 1ul.  It will be converted to 0.  There's a bug due to a
> missing range check (but orthogonal to saturation).
> 
> You also don't reject negative numbers, which I expect to be a bug,
> connected with the one from above.
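The wrap-around follows from ISO C's rule for converting to an unsigned type (6.3.1.3): the value is reduced modulo 2^N, so one past the maximum becomes 0 and two past becomes 1. Because size_t and unsigned long have the same width on the usual ILP32 and LP64 platforms, this sketch uses uint16_t as a stand-in for a narrower size_t; narrow_u16() is a hypothetical name.

```c
#include <stdint.h>

/* Converting to an unsigned type keeps the value modulo 2^N, so
 * UINT16_MAX + 1 becomes 0 and UINT16_MAX + 2 becomes 1.
 * uint16_t stands in for a size_t narrower than unsigned long. */
static uint16_t
narrow_u16(unsigned long v)
{
	return (uint16_t) v;
}
```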
> 
> This could be rewritten to (fixing the bugs reported above):
> 
> 	char    *end;
> 	size_t  new_align;
> 
> 	new_align = strtou_noneg(optarg, &end, 0, 0, SIZE_MAX, NULL);
> 	if (optarg != end)
> 		alignment = new_align;
> 
> This seems another case where you silently saturate.  Why don't you have
> a diagnostic message for invalid input?
> 
> >   gettext/gettext-tools/src/msgl-check.c:379
> 
> gettext-tools/src/msgl-check.c-374-          while (*nplurals != '\0' && c_isspace ((unsigned char) *nplurals))
> gettext-tools/src/msgl-check.c-375-            ++nplurals;
> gettext-tools/src/msgl-check.c-376-          endp = nplurals;
> gettext-tools/src/msgl-check.c-377-          nplurals_value = 0;
> gettext-tools/src/msgl-check.c-378-          if (*nplurals >= '0' && *nplurals <= '9')
> gettext-tools/src/msgl-check.c:379:            nplurals_value = strtoul (nplurals, (char **) &endp, 10);
> gettext-tools/src/msgl-check.c-380-          if (nplurals == endp)
> gettext-tools/src/msgl-check.c-381-            {
> gettext-tools/src/msgl-check.c-382-              const char *msg = _("invalid nplurals value");
> gettext-tools/src/msgl-check.c-383-              char *help = plural_help (nullentry);
> gettext-tools/src/msgl-check.c-384-
> gettext-tools/src/msgl-check.c-385-              if (help != NULL)
> gettext-tools/src/msgl-check.c-386-                {
> gettext-tools/src/msgl-check.c-387-                  char *msgext = xasprintf ("%s\n%s", msg, help);
> gettext-tools/src/msgl-check.c-388-                  xeh->xerror (CAT_SEVERITY_ERROR, header, NULL, 0, 0, true,
> gettext-tools/src/msgl-check.c-389-                               msgext);
> gettext-tools/src/msgl-check.c-390-                  free (msgext);
> gettext-tools/src/msgl-check.c-391-                  free (help);
> gettext-tools/src/msgl-check.c-392-                }
> gettext-tools/src/msgl-check.c-393-              else
> gettext-tools/src/msgl-check.c-394-                xeh->xerror (CAT_SEVERITY_ERROR, header, NULL, 0, 0, false,
> gettext-tools/src/msgl-check.c-395-                             msg);
> gettext-tools/src/msgl-check.c-396-
> gettext-tools/src/msgl-check.c-397-              seen_errors++;
> gettext-tools/src/msgl-check.c-398-            }
> 
> You could get rid of a lot of code preceding the strtoul(3) call by
> calling strtou_noneg() instead.  And strspn(3) would also help.
> 
> 	nplurals += strspn(nplurals, " \t\n");
> 	nplurals_value = strtou_noneg(nplurals, (char **) &end, 10, 0, ULONG_MAX, NULL);
> 	if (nplurals == end)
> 		...
> 
> On the other hand, I also wonder why you don't diagnose invalid input.
> Why is -10 an "invalid nplurals value", but ULONG_MAX+10 is a valid
> (albeit clamped) one?
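In ISO C, strtoul(3) accepts a leading minus sign and negates the result in unsigned long, while out-of-range input saturates at ULONG_MAX and stores ERANGE in errno. So "-10" is rejected above only because of the explicit digit check before the call. Both behaviors can be observed with a small wrapper; strtoul_errno() is a hypothetical name.

```c
#include <errno.h>
#include <limits.h>   /* ULONG_MAX, for checking the results */
#include <stdlib.h>

/* Hypothetical helper: strtoul(3) in base 10, also reporting errno. */
static unsigned long
strtoul_errno(const char *s, int *err)
{
	unsigned long  n;

	errno = 0;
	n = strtoul(s, NULL, 10);
	*err = errno;
	return n;
}
```

"-10" yields ULONG_MAX - 9 with errno left at 0, whereas a sufficiently long string of nines yields ULONG_MAX with ERANGE.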
> 
> All of these cases look like missing error handling, IMO.
> 
> >   gettext/gettext-tools/src/read-stringtable.c:561
> 
> gettext-tools/src/read-stringtable.c-553-    {
> gettext-tools/src/read-stringtable.c-554-      char *last_colon;
> gettext-tools/src/read-stringtable.c-555-      unsigned long number;
> gettext-tools/src/read-stringtable.c-556-      char *endp;
> gettext-tools/src/read-stringtable.c-557-
> gettext-tools/src/read-stringtable.c-558-      if (strlen (line) >= 6 && memcmp (line, "File: ", 6) == 0
> gettext-tools/src/read-stringtable.c-559-          && (last_colon = strrchr (line + 6, ':')) != NULL
> gettext-tools/src/read-stringtable.c-560-          && *(last_colon + 1) != '\0'
> gettext-tools/src/read-stringtable.c:561:          && (number = strtoul (last_colon + 1, &endp, 10), *endp == '\0'))
> gettext-tools/src/read-stringtable.c-562-        {
> gettext-tools/src/read-stringtable.c-563-          /* A "File: <filename>:<number>" type comment.  */
> gettext-tools/src/read-stringtable.c-564-          *last_colon = '\0';
> gettext-tools/src/read-stringtable.c-565-          catalog_reader_seen_comment_filepos (catr, line + 6, number);
> gettext-tools/src/read-stringtable.c-566-        }
> gettext-tools/src/read-stringtable.c-567-      else
> gettext-tools/src/read-stringtable.c-568-        catalog_reader_seen_comment (catr, line);
> gettext-tools/src/read-stringtable.c-569-    }
> 
> Let me try to rewrite it for readability first.
> 
> 	char        *filepos, *last_colon;
> 	u_long      n;
> 	const char  *numstr, *end;
> 
> 	filepos = strprefix(line, "File: ") ?: (char []){""};
> 	last_colon = strrchr(filepos, ':') ?: (char []){":"};
> 	numstr = strprefix(last_colon, ":");
> 
> 	n = strtoul(numstr, (char **) &end, 10);
> 	if (numstr != end && streq(end, "")) {
> 		/* A "File: <filename>:<number>" type comment.  */
> 		strcpy(last_colon, "");
> 		catalog_reader_seen_comment_filepos(catr, filepos, n);
> 
> 	} else {
> 		catalog_reader_seen_comment(catr, line);
> 	}
> 
> Are you forgetting about negative numbers?  Or are you certain that they
> can't happen?  How about huge values?  Assuming you want code compatible
> with the current behavior, calling strtou():
> 
> 	int  status;
> 
> 	n = strtou(numstr, (char **) &end, 10, 0, ULONG_MAX, &status);
> 	if (status == 0 || (status == ERANGE && *end == '\0')) {
> 		...
> 	} else {
> 		...
> 	}
> 
> But again, I wonder why you don't do range checks.
> 
> > > > I would therefore propose to change the status value to a bit mask, so that
> > > > the error conditions "The converted value was out of range and has been
> > > > coerced" and "The given string contains characters that did not get converted"
> > > > can be both returned together, without conflicting.
> > > 
> > > Because these are theoretical conditions that a real program never
> > > wants, let's not do that.
> > 
> > If you don't want to do that, I can only repeat what I said in the previous
> > mail: The proposal *does not achieve the goal* of avoiding the most common
> > programmer mistakes. For a robust API, the success test should *only* involve
> > testing the returned 'status', nothing else.
> 
> Let's discuss this after your responses to the above.
> 
> > 
> > Bruno

-- 
<https://www.alejandro-colomar.es/>

* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-19 18:48           ` Alejandro Colomar
@ 2025-03-19 18:56             ` Alejandro Colomar
  0 siblings, 0 replies; 43+ messages in thread
From: Alejandro Colomar @ 2025-03-19 18:56 UTC (permalink / raw)
  To: Bruno Haible
  Cc: liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

On Wed, Mar 19, 2025 at 07:48:59PM +0100, Alejandro Colomar wrote:
> On Wed, Mar 19, 2025 at 04:26:11PM +0100, Alejandro Colomar wrote:
> > You could get rid of a lot of code preceding the strtoul(3) call by
> > calling strtou_noneg() instead.  And strspn(3) would also help.
> > 
> > 	nplurals += strspn(nplurals, " \t\n");

(Actually, strtou(3) skips leading white space, so we don't need this
 line at all.)

> > 	nplurals_value = strtou_noneg(nplurals, (char **) &end, 10, 0, ULONG_MAX);
> > 	if (nplurals == end)
> > 		...

-- 
<https://www.alejandro-colomar.es/>

* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-19  0:15       ` Bruno Haible
  2025-03-19 15:26         ` Alejandro Colomar
@ 2025-03-19 19:27         ` Paul Eggert
  2025-03-19 20:05           ` Alejandro Colomar
  1 sibling, 1 reply; 43+ messages in thread
From: Paul Eggert @ 2025-03-19 19:27 UTC (permalink / raw)
  To: Bruno Haible, Alejandro Colomar
  Cc: liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Eli Schwartz,
	Guillem Jover, Iker Pedrosa, Michael Vetter, Robert Elz,
	riastradh, Sam James, Serge E. Hallyn

On 2025-03-18 17:15, Bruno Haible wrote:
> If you don't want to do that, I can only repeat what I said in the previous
> mail: The proposal *does not achieve the goal* of avoiding the most common
> programmer mistakes. For a robust API, the success test should *only* involve
> testing the returned 'status', nothing else.

This was my initial reaction as well. Although strtol has real problems, 
the proposed interface is even more complicated and confusing and I 
suspect that in practice it'd be misused even more often than strtol is.

I suggest starting from scratch. In particular, use a functional style, 
with no side effects (no pointers-to-results). Just return the result 
you want, as a struct, and keep the struct simple. Two struct components 
should suffice: the scanned numeric value and a success/error indicator.
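For concreteness, here is a minimal sketch of that functional style, assuming a hypothetical `parse_ul()` that requires the whole string to be a number in [min, max] and reports errno-style codes (ECANCELED, ERANGE, ENOTSUP, borrowed from strtou(3) as an assumption) in the struct rather than through pointers or errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical result type: the scanned value plus an error indicator. */
struct parse_ul_result {
	unsigned long  value;  /* meaningful only when err == 0 */
	int            err;    /* 0 on success, else an errno-style code */
};

/*
 * Hypothetical parser in the suggested functional style: the whole string
 * must be a number in [min, max].  The caller's errno is preserved, so the
 * returned struct is the only output.
 */
static struct parse_ul_result
parse_ul(const char *s, int base, unsigned long min, unsigned long max)
{
	struct parse_ul_result  r = {0, 0};
	char                    *end;
	int                     saved_errno = errno;

	assert(min <= max);  /* an empty range is a caller bug */

	errno = 0;
	r.value = strtoul(s, &end, base);
	if (end == s)
		r.err = ECANCELED;  /* no digits at all */
	else if (errno == ERANGE || r.value < min || r.value > max)
		r.err = ERANGE;     /* overflow, or outside [min, max] */
	else if (*end != '\0')
		r.err = ENOTSUP;    /* trailing characters */

	errno = saved_errno;
	return r;
}
```

A call then reads `r = parse_ul(s, 10, 1, 100); if (r.err != 0) ...`.  Being built on strtoul(), this sketch still accepts a leading '-' and wraps it around.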



* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-19 19:27         ` Paul Eggert
@ 2025-03-19 20:05           ` Alejandro Colomar
  2025-03-19 20:39             ` Paul Eggert
  0 siblings, 1 reply; 43+ messages in thread
From: Alejandro Colomar @ 2025-03-19 20:05 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Bruno Haible, liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Eli Schwartz,
	Guillem Jover, Iker Pedrosa, Michael Vetter, Robert Elz,
	riastradh, Sam James, Serge E. Hallyn

[-- Attachment #1: Type: text/plain, Size: 3228 bytes --]

Hi Paul,

On Wed, Mar 19, 2025 at 12:27:07PM -0700, Paul Eggert wrote:
> On 2025-03-18 17:15, Bruno Haible wrote:
> > If you don't want to do that, I can only repeat what I said in the previous
> > mail: The proposal *does not achieve the goal* of avoiding the most common
> > programmer mistakes. For a robust API, the success test should *only* involve
> > testing the returned 'status', nothing else.
> 
> This was my initial reaction as well. Although strtol has real problems, the
> proposed interface is even more complicated and confusing and I suspect that
> in practice it'd be misused even more often than strtol is.

Please comment on the subthread where Bruno mentioned a number of places
in gnulib and gettext where you use strtoul(3).  I found there a few
bugs, plus some ways to just simplify with strtou(3).

The concerns about the case where one doesn't want to check ERANGE but
wants to check ENOTSUP need justification.  I suspect it's rather
missing error handling in that code.

I've never seen code calling strtoi/u(3) that wants that.

> I suggest starting from scratch.

I'm doing that in shadow-utils and liba2i, but I don't feel ready for
standardization yet.  In particular, I have two competing APIs in my
head.

> In particular, use a functional style, with
> no side effects (no pointers-to-results). Just return the result you want,
> as a struct, and keep the struct simple. Two struct components should
> suffice: the scanned numeric value and a success/error indicator.

That's going to complicate usage significantly.

If I had invented strtoi/u(3) myself, I would have used errno instead of
the *status parameter, but I don't feel strongly enough about it to scrap
that API.

Regarding my APIs that are under development, here are the two
alternatives:

	int
	alt_1(typename T,
	    T *n, QChar *s, QChar **_Nullable endp, int base, T min, T max);


	if (alt_1(time_t, &time, s, NULL, 0, now, later) == -1)
		err(1, "alt_1");

which returns 0 on success, or -1 and sets errno on error, as is
usual in libc.
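A minimal sketch of the alt_1 calling convention, written for `unsigned long` only instead of a type parameter.  The name `a2ul_sketch()`, the strtou(3)-style error codes, the rejection of negative input, and the rule that a NULL `endp` means "the whole string must be consumed" are all assumptions for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical alt_1-shaped parser for unsigned long.  Returns 0 on
 * success, or -1 with errno set on error.  *n receives the (clamped)
 * value either way; *endp, if non-NULL, receives the end pointer.
 */
static int
a2ul_sketch(unsigned long *n, const char *s, char **endp,
            int base, unsigned long min, unsigned long max)
{
	const char     *p = s;
	char           *end;
	int            err = 0;
	int            saved_errno = errno;
	unsigned long  v;

	assert(min <= max);

	while (isspace((unsigned char) *p))  /* strtoul() skips these too */
		p++;

	errno = 0;
	v = strtoul(s, &end, base);
	if (end == s) {
		err = ECANCELED;                 /* no number at all */
	} else if (*p == '-' || errno == ERANGE || v < min || v > max) {
		err = ERANGE;                    /* negative, overflow, out of range */
		v = (*p == '-' || v < min) ? min : max;
	} else if (endp == NULL && *end != '\0') {
		err = ENOTSUP;                   /* trailing characters */
	}

	*n = v;
	if (endp != NULL)
		*endp = end;
	if (err != 0) {
		errno = err;
		return -1;
	}
	errno = saved_errno;                     /* success leaves errno alone */
	return 0;
}
```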

	QChar *
	alt_2(typename T,
	    T *n, QChar *s, int base, T min, T max);


	errno = 0;
	alt_2(time_t, &time, s, 0, now, later);
	if (errno != 0)
		err(1, "alt_2");

which guarantees not setting errno on success, and sets it on error.
It always returns 'end' (instead of having the output parameter *endp).
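A minimal sketch of the alt_2 convention for `long`, assuming a hypothetical `a2l_end_sketch()`: the end pointer is the return value, errno keeps the caller's value on success, and ECANCELED or ERANGE (assumed codes) is stored only on error.  Trailing characters are not treated as an error here, since the caller gets the end pointer back.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical alt_2-shaped parser for long; see the lead-in for the rules. */
static char *
a2l_end_sketch(long *n, const char *s, int base, long min, long max)
{
	char  *end;
	int   saved_errno = errno;
	long  v;

	assert(min <= max);

	errno = 0;
	v = strtol(s, &end, base);
	if (end == s) {
		errno = ECANCELED;               /* no number */
	} else if (errno == ERANGE || v < min || v > max) {
		/* Clamp toward the side the input fell off. */
		v = (v == LONG_MIN || v < min) ? min : max;
		errno = ERANGE;
	} else {
		errno = saved_errno;             /* success: errno untouched */
	}
	*n = v;
	return end;
}
```

This keeps the quoted idiom working: set errno to 0, call, then test errno.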

Each one has benefits and drawbacks.  But we do want the number as an
output parameter, since it gives us type safety: if you pass something
of a type other than the one specified in the first parameter, you get a
compiler error.
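One way to get that compile-time check in C11 is a `_Generic` front end.  This is only a sketch under assumptions: `PARSE()`, `parse_long()` and `parse_ulong()` are hypothetical names, only `long` and `unsigned long` are wired up, and the workers simply return -1 on any failure (errno is not meaningful here).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical per-type workers: whole string, value in [min, max]. */
static int
parse_long(long *n, const char *s, int base, long min, long max)
{
	char  *end;
	long  v;

	assert(min <= max);
	errno = 0;
	v = strtol(s, &end, base);
	if (end == s || *end != '\0' || errno == ERANGE || v < min || v > max)
		return -1;
	*n = v;
	return 0;
}

static int
parse_ulong(unsigned long *n, const char *s, int base,
            unsigned long min, unsigned long max)
{
	char           *end;
	unsigned long  v;

	assert(min <= max);
	errno = 0;
	v = strtoul(s, &end, base);
	if (end == s || *end != '\0' || errno == ERANGE || v < min || v > max)
		return -1;
	*n = v;
	return 0;
}

/*
 * The outer _Generic has only the association T *, so an output pointer
 * of any other type has no match and fails to compile.  The inner
 * _Generic selects the worker for T.
 */
#define PARSE(T, n, s, base, min, max)                      \
	_Generic((n), T *: _Generic((T){0},                 \
	        long:          parse_long,                  \
	        unsigned long: parse_ulong))((n), (s), (base), (min), (max))
```

With `int i;`, a call such as `PARSE(long, &i, "1", 10, 0, 9)` is rejected at compile time.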

I'm still undecided which one I prefer.  So far, we're using alt_1 in
shadow-utils.  Feel free to comment about them.  If you have any idea
that will simplify usage or improve type safety, please present it, and
show what the API would look like, with an example of use.

But these APIs are implemented in terms of strtoi/u(3), so even if we
eventually want these APIs, we'd benefit from standardizing the
NetBSD ones now, which would allow easier deployment of my wrappers.


Have a lovely night!
Alex

-- 
<https://www.alejandro-colomar.es/>



* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-19 20:05           ` Alejandro Colomar
@ 2025-03-19 20:39             ` Paul Eggert
  2025-03-19 21:23               ` Alejandro Colomar
  0 siblings, 1 reply; 43+ messages in thread
From: Paul Eggert @ 2025-03-19 20:39 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Bruno Haible, liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Eli Schwartz,
	Guillem Jover, Iker Pedrosa, Michael Vetter, Robert Elz,
	riastradh, Sam James, Serge E. Hallyn

On 2025-03-19 13:05, Alejandro Colomar wrote:
> Please comment on the subthread where Bruno mentioned a number of places
> in gnulib and gettext where you use strtoul(3).  I found there a few
> bugs, plus some ways to just simplify with strtou(3).

I looked at the Gnulib commentary in 
<https://lore.kernel.org/liba2i/jx4664ishtl34eg2npdrv5fkfdiczqnlq3vjuacjrupjvh377x@gddcftzgwmfq/>, 
as I assume that's what you're talking about. (I don't hack on gettext 
and will leave Bruno to comment on that.)

For Gnulib, I didn't see any bugs in the three areas mentioned.

The patch suggested to lib/getaddrinfo.c doesn't fix any bugs that I can 
see, and needs an additional wrapper to work anyway, which is 
introducing complexity.

The patch suggested to lib/nproc.c is merely a minor clarity / 
performance improvement (it removes three instructions), and does not 
fix any bugs. Likewise for the patch to lib/omp-init.c. And these 
improvements (where the code mistakenly worried about endptr == NULL) 
fix a mistake that one could make with the proposed strtoi API, so I 
don't see strtoi helping there.

But thanks for the clarity / speedup idea; I installed a patch into 
Gnulib here:

https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=2835ca01722fcd41761383ef289d19797b13b2e8




>> > In particular, use a functional style, with
>> > no side effects (no pointers-to-results). Just return the result you want,
>> > as a struct, and keep the struct simple. Two struct components should
>> > suffice: the scanned numeric value and a success/error indicator.
> 
> That's going to complicate usage significantly.

Please try it and see. You might be surprised at how clean and efficient 
functional programming can be, if done right. Admittedly C doesn't 
always make it easy.


* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-19 20:39             ` Paul Eggert
@ 2025-03-19 21:23               ` Alejandro Colomar
  2025-03-20  0:39                 ` Paul Eggert
  0 siblings, 1 reply; 43+ messages in thread
From: Alejandro Colomar @ 2025-03-19 21:23 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Bruno Haible, liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Eli Schwartz,
	Guillem Jover, Iker Pedrosa, Michael Vetter, Robert Elz,
	riastradh, Sam James, Serge E. Hallyn

[-- Attachment #1: Type: text/plain, Size: 3164 bytes --]

Hi Paul,

On Wed, Mar 19, 2025 at 01:39:00PM -0700, Paul Eggert wrote:
> On 2025-03-19 13:05, Alejandro Colomar wrote:
> > Please comment on the subthread where Bruno mentioned a number of places
> > in gnulib and gettext where you use strtoul(3).  I found there a few
> > bugs, plus some ways to just simplify with strtou(3).
> 
> I looked at the Gnulib commentary in <https://lore.kernel.org/liba2i/jx4664ishtl34eg2npdrv5fkfdiczqnlq3vjuacjrupjvh377x@gddcftzgwmfq/>,
> as I assume that's what you're talking about. (I don't hack on gettext and
> will leave Bruno to comment on that.)

Yes, I was referring to those.

> For Gnulib, I didn't see any bugs in the three areas mentioned.
> 
> The patch suggested to lib/getaddrinfo.c doesn't fix any bugs that I can
> see, and needs an additional wrapper to work anyway, which is introducing
> complexity.

Agree.  gnulib had dead code and not-very-readable code, but no
misbehavior.

Although, I think not reporting errors or warnings on saturation needs
justification.

> The patch suggested to lib/nproc.c is merely a minor clarity / performance
> improvement (it removes three instructions), and does not fix any bugs.
> Likewise for the patch to lib/omp-init.c. And these improvements (where the
> code mistakenly worried about endptr == NULL) fix a mistake that one could
> make with the proposed strtoi API, so I don't see strtoi helping there.

The test == NULL would never make sense with strtoi(3), because it
guarantees setting *endp even on EINVAL.  Some programmers are paranoid
with strtol(3) and check for NULL, because on some systems it may leave
*endp unset on EINVAL, but that behavior is not portable at all.  So yes,
strtoi(3) should remove that issue.

> 
> But thanks for the clarity / speedup idea; I installed a patch into Gnulib
> here:
> 
> https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=2835ca01722fcd41761383ef289d19797b13b2e8

Thanks, those changes look good.  BTW, what do you think of using
strspn(3) to simplify the c_isspace loop?

However, would you mind clarifying why you don't diagnose huge values in
the two places that you have updated?

> > > > In particular, use a functional style, with
> > > > no side effects (no pointers-to-results). Just return the result you want,
> > > > as a struct, and keep the struct simple. Two struct components should
> > > > suffice: the scanned numeric value and a success/error indicator.
> > 
> > That's going to complicate usage significantly.
> 
> Please try it and see. You might be surprised at how clean and efficient
> functional programming can be, if done right. Admittedly C doesn't always
> make it easy.

Here's an example of parsing a time_t:

	time_t  t;

	if (a2i(time_t, &t, ...) == -1)
		err(1, "a2i");

If you need to store it in a struct, you need to know what a time_t is
in the first place:

	struct foo {
		long  val;
		int   err;
	};

	struct foo  ret;

	ret = f(time_t, ...);
	if (ret.err != 0)
		err(1, "f");

How do I know which variant of struct foo I need?


Have a lovely night!
Alex

-- 
<https://www.alejandro-colomar.es/>



* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-19 15:26         ` Alejandro Colomar
  2025-03-19 18:48           ` Alejandro Colomar
@ 2025-03-19 21:59           ` Bruno Haible
  2025-03-19 23:12             ` Alejandro Colomar
  1 sibling, 1 reply; 43+ messages in thread
From: Bruno Haible @ 2025-03-19 21:59 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

Hi Alejandro,

> > > > It would be useful to show what a success test looks like, after
> > > >     strtoi (s, &end, base, min, max, &status)
> > > > for each of the four frequent use-cases:
> > > >   -a. expect to parse the initial portion of the string, no coercion,
> > > >   -b. expect to parse the initial portion of the string, silent coercion,
> > > >   -c. expect to parse the entire string, no coercion,
> > > >   -d. expect to parse the entire string, silent coercion.
> > > > 
> > > > AFAICS, the success tests are:
> > > >   -a. status == 0 || status == ENOTSUP
> > > 
> > > Correct.
> > > 
> > > >   -b. status == 0 || status == ENOTSUP || status == ERANGE
> > > 
> > > Correct (but most likely a bug).
> 
> Actually, now I remember that status can be NULL, in which case it's not
> reported.  This is a case where you could check for errors with a
> simpler expression:
> 
> 	end != str
> 
> but (status == 0 || status == ENOTSUP || status == ERANGE) is still a
> reasonable one.

Unfortunately, with a comment like this, you make things more complicated,
not simpler. I was hoping for an API where success can be determined by
looking at 'status' in all four cases, and the "simple" solution that you
are recommending now is:
  -a. look at status
  -b. look at end
  -c. look at status
  -d. look at status AND end.

> I need to update the specification to mention that status can be NULL.

Why do so? This adds text to the specification, making the specification
more complex (=> longer to understand, harder to remember). The ability
to pass NULL for rstatus is not a useful feature.

> > > >   -d. status == 0 || (status == ERANGE && end > s && *end == '\0')
> > > 
> > > You don't need end>s, because that would preclude ERANGE.
> > > 
> > > 	status == 0 || (status == ERANGE && *end == '\0')
> > > 
> > > Aaand, most likely a bug.
> > 
> > Cases b. and d. are not bugs. Often, the programmer knows that treating
> > a value > ULONG_MAX is equivalent to treating the value ULONG_MAX. These
> > are *normal* uses of strto[u]l[l]. Often it is the programmer's intent
> > that the values "4294967297" and "4294967295" produce the same behaviour
> > (the same error message, for example).
> 
> If you want ULONG_MAX + 1 to be treated like ULONG_MAX, and both
> result in an error, then you should probably clamp at ULONG_MAX - 1,
> and consider anything above an error.

I said it in the paragraph above: Often the programmer wants a smooth
transition between "unreasonably large but not overflowing" values and
"overflow". So that only one diagnostic is needed for both cases.

1) getaddrinfo.c

> > Any use of strtoul that does not test errno wants overflow
> > to be mapped to ULONG_MAX, that is, is in case b. or d.
> > Just looking in gnulib and gettext, I find already 6 occurrences:
> >   gnulib/lib/getaddrinfo.c:299
> 
> lib/getaddrinfo.c-297-          if (!(*servname >= '0' && *servname <= '9'))
> lib/getaddrinfo.c-298-            return EAI_NONAME;
> lib/getaddrinfo.c:299:          port = strtoul (servname, &c, 10);
> lib/getaddrinfo.c-300-          if (*c || port > 0xffff)
> lib/getaddrinfo.c-301-            return EAI_NONAME;
> lib/getaddrinfo.c-302-          port = htons (port);
> 
> You could remove the preceding conditional if you don't want to avoid
> leading whitespace.

We don't want to allow leading whitespace here. We need to follow the
getaddrinfo() spec [1]. At the same time, disallowing a leading '-' sign
is a benefit as well. I consider it a misfeature that strtoul() parses
"-3" successfully and returns ULONG_MAX-2, which was most certainly
not intended by the user.

> You could merge that into the strtou(3) call (which
> would report ECANCELED for non-numeric input).  Except that a negative
> number is silently converted to a positive large value.  This is why I
> use a wrapper function strtou_noneg() that rejects negative numbers.

Ouch. Yet another problem with strtoul(). That would be worth another
error code in the specification of strtou().

> I think this is not one case where you want silent saturation.  You're
> indeed doing range checks [0, UINT16_MAX].

Yes, in this particular case, the ability to pass an arbitrary min and max
to strtou() is a practical feature.
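The reason no errno test is needed in that getaddrinfo.c code can be seen in a standalone form.  This is a sketch mirroring the quoted logic under the hypothetical name `parse_port()`: on overflow strtoul() returns ULONG_MAX, which already fails the `> 0xffff` test, and the leading-digit check rejects white space and both signs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: a decimal port in [0, 0xffff], nothing else. */
static bool
parse_port(const char *servname, unsigned short *port)
{
	char           *end;
	unsigned long  v;

	assert(servname != NULL && port != NULL);

	if (!(*servname >= '0' && *servname <= '9'))
		return false;          /* rejects "", " 80", "+80", "-80" */
	v = strtoul(servname, &end, 10);
	if (*end != '\0' || v > 0xffff)
		return false;          /* trailing junk, or too large (incl. overflow) */
	*port = (unsigned short) v;
	return true;
}
```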

2) nproc.c, omp_init.c

> >   gnulib/lib/nproc.c:402
> 
> lib/nproc.c-383-/* Parse OMP environment variables without dependence on OMP.
> lib/nproc.c-384-   Return 0 for invalid values.  */
> lib/nproc.c-385-static unsigned long int
> lib/nproc.c:386:parse_omp_threads (char const* threads)
> lib/nproc.c-387-{
> 
> ...
> 
> lib/nproc.c-398-  /* Convert it from positive decimal to 'unsigned long'.  */
> lib/nproc.c-399-  if (c_isdigit (*threads))
> lib/nproc.c-400-    {
> lib/nproc.c-401-      char *endptr = NULL;
> lib/nproc.c:402:      unsigned long int value = strtoul (threads, &endptr, 10);
> lib/nproc.c-403-
> lib/nproc.c-404-      if (endptr != NULL)
> lib/nproc.c-405-        {
> lib/nproc.c-406-          while (*endptr != '\0' && c_isspace (*endptr))
> lib/nproc.c-407-            endptr++;
> lib/nproc.c-408-          if (*endptr == '\0')
> lib/nproc.c-409-            return value;
> lib/nproc.c-410-          /* Also accept the first value in a nesting level,
> lib/nproc.c-411-             since we can't determine the nesting level from env vars.  */
> lib/nproc.c-412-          else if (*endptr == ',')
> lib/nproc.c-413-            return value;
> lib/nproc.c-414-        }
> lib/nproc.c-415-    }
> 
> First of all, the endptr!=NULL test seems misplaced.

I wasn't sure that all implementations store an endptr in all cases.

> And you could probably remove the isdigit test by calling
> strtou_noneg().

This would allow leading whitespace in the value of the environment
variable, which is not needed for OpenMP 6.0 [2] § 4.1.3 and therefore
better avoided.

> This could be something like this (fixing the bugs reported above):
> 
> 	char    *end;
> 	u_long  value;
> 
> 	value = strtou_noneg(threads, &end, 10, 0, ULONG_MAX, NULL);
> 	if (end != threads) {
> 		end += strspn(end, " \t\n");
> 		if (streq(end, ""))
> 			return value;
> 		if (strprefix(end, ","))
> 			return value;
> 	}
> 
> 
> This is one case where you seem to silently ignore saturation.

This is on purpose. A CPU never has more than 1000 processors. Therefore
all values > 1000 should be treated the same way, whether they are
<= ULONG_MAX or > ULONG_MAX. Clamping to the number of actually available
processors occurs later in the code.
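A runnable approximation of the rewrite suggested above, substituting ISO C calls for the helpers it uses (strtou_noneg(), streq() and strprefix() are not standard): a leading-digit check plus strtoul(), strcmp(), and a first-character test.  `parse_threads_sketch()` is a hypothetical name.  Overflow deliberately saturates to ULONG_MAX, matching the intent described here.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Returns the parsed thread count, or 0 for invalid values. */
static unsigned long
parse_threads_sketch(const char *threads)
{
	char           *end;
	unsigned long  value;

	assert(threads != NULL);

	/* A leading digit rules out white space and signs. */
	if (!(*threads >= '0' && *threads <= '9'))
		return 0;

	/* Overflow saturates to ULONG_MAX on purpose (use-case d.). */
	value = strtoul(threads, &end, 10);

	end += strspn(end, " \t\n");     /* trailing white space is OK */
	if (strcmp(end, "") == 0)
		return value;
	if (*end == ',')                 /* first value of a nesting list */
		return value;
	return 0;
}
```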

3) msgfmt.c

> >   gettext/gettext-tools/src/msgfmt.c:287
> 
> gettext-tools/src/msgfmt.c-286-          char *endp;
> gettext-tools/src/msgfmt.c:287:          size_t new_align = strtoul (optarg, &endp, 0);
> gettext-tools/src/msgfmt.c-288-
> gettext-tools/src/msgfmt.c-289-          if (endp != optarg)
> gettext-tools/src/msgfmt.c-290-            alignment = new_align;
> 
> This code will misbehave badly on platforms where size_t is narrower
> than u_long.

Such platforms (namely Windows 3.1) are not among the portability targets
of this code.

> You also don't reject negative numbers, which I expect to be a bug,
> connected with the one from above.

That could be a problem. Fortunately, the documentation makes it clear
that the argument is an "alignment", and reasonable users will therefore
only pass the values 1, 2, 4, 8, 16.

> Why don't you have a diagnostic message for invalid input?

Because this option is meant for users who know what an "alignment" is.

4) msgl-check.c

> >   gettext/gettext-tools/src/msgl-check.c:379
> 
> gettext-tools/src/msgl-check.c-374-          while (*nplurals != '\0' && c_isspace ((unsigned char) *nplurals))
> gettext-tools/src/msgl-check.c-375-            ++nplurals;
> gettext-tools/src/msgl-check.c-376-          endp = nplurals;
> gettext-tools/src/msgl-check.c-377-          nplurals_value = 0;
> gettext-tools/src/msgl-check.c-378-          if (*nplurals >= '0' && *nplurals <= '9')
> gettext-tools/src/msgl-check.c:379:            nplurals_value = strtoul (nplurals, (char **) &endp, 10);
> gettext-tools/src/msgl-check.c-380-          if (nplurals == endp)
> gettext-tools/src/msgl-check.c-381-            {
> gettext-tools/src/msgl-check.c-382-              const char *msg = _("invalid nplurals value");
> gettext-tools/src/msgl-check.c-383-              char *help = plural_help (nullentry);
> gettext-tools/src/msgl-check.c-384-
> gettext-tools/src/msgl-check.c-385-              if (help != NULL)
> gettext-tools/src/msgl-check.c-386-                {
> gettext-tools/src/msgl-check.c-387-                  char *msgext = xasprintf ("%s\n%s", msg, help);
> gettext-tools/src/msgl-check.c-388-                  xeh->xerror (CAT_SEVERITY_ERROR, header, NULL, 0, 0, true,
> gettext-tools/src/msgl-check.c-389-                               msgext);
> gettext-tools/src/msgl-check.c-390-                  free (msgext);
> gettext-tools/src/msgl-check.c-391-                  free (help);
> gettext-tools/src/msgl-check.c-392-                }
> gettext-tools/src/msgl-check.c-393-              else
> gettext-tools/src/msgl-check.c-394-                xeh->xerror (CAT_SEVERITY_ERROR, header, NULL, 0, 0, false,
> gettext-tools/src/msgl-check.c-395-                             msg);
> gettext-tools/src/msgl-check.c-396-
> gettext-tools/src/msgl-check.c-397-              seen_errors++;
> gettext-tools/src/msgl-check.c-398-            }
> 
> You could get rid of a lot of code preceding the strtoul(3) call by
> calling strtou_noneg() instead.

This would allow whitespace. Allowing whitespace here (as in "nplurals= 3")
is not a worthy feature.

> On the other hand, I also wonder why you don't diagnose invalid input.
> Why is -10 an "invalid nplurals value"

"-10" designates a negative number. For the number of plural forms, that's
invalid.

> but ULONG_MAX+10 is a valid (albeit clamped) one?

Again, as mentioned above, values like 1000 and 10000000000000000 should
be treated the same way. Since ULONG_MAX+10 gets clamped to ULONG_MAX, that's
just fine.

> All of these cases look like missing error handling, IMO.

No. In this case, inspecting errno would be additional code with no benefit.

5) read-stringtable.c

> >   gettext/gettext-tools/src/read-stringtable.c:561
> 
> gettext-tools/src/read-stringtable.c-553-    {
> gettext-tools/src/read-stringtable.c-554-      char *last_colon;
> gettext-tools/src/read-stringtable.c-555-      unsigned long number;
> gettext-tools/src/read-stringtable.c-556-      char *endp;
> gettext-tools/src/read-stringtable.c-557-
> gettext-tools/src/read-stringtable.c-558-      if (strlen (line) >= 6 && memcmp (line, "File: ", 6) == 0
> gettext-tools/src/read-stringtable.c-559-          && (last_colon = strrchr (line + 6, ':')) != NULL
> gettext-tools/src/read-stringtable.c-560-          && *(last_colon + 1) != '\0'
> gettext-tools/src/read-stringtable.c:561:          && (number = strtoul (last_colon + 1, &endp, 10), *endp == '\0'))
> gettext-tools/src/read-stringtable.c-562-        {
> gettext-tools/src/read-stringtable.c-563-          /* A "File: <filename>:<number>" type comment.  */
> gettext-tools/src/read-stringtable.c-564-          *last_colon = '\0';
> gettext-tools/src/read-stringtable.c-565-          catalog_reader_seen_comment_filepos (catr, line + 6, number);
> gettext-tools/src/read-stringtable.c-566-        }
> gettext-tools/src/read-stringtable.c-567-      else
> gettext-tools/src/read-stringtable.c-568-        catalog_reader_seen_comment (catr, line);
> gettext-tools/src/read-stringtable.c-569-    }
> 
> You're forgetting about negative numbers?

Indeed, I didn't realize that strtoul() would accept "-3" as valid. As said
above, strtou() can improve on it, by reserving an error code for it.

> How about huge values?

In this code, we assume that line numbers are <= ULONG_MAX. Not an
unreasonable assumption. I have yet to see source code files that are
4 GiB large...

> But again, I wonder why you don't do range checks.

Range checks are not important here: The line numbers are not used as
indices; they are merely reproduced in the output.

> > If you don't want to do that, I can only repeat what I said in the previous
> > mail: The proposal *does not achieve the goal* of avoiding the most common
> > programmer mistakes. For a robust API, the success test should *only* involve
> > testing the returned 'status', nothing else.
> 
> Let's discuss this after your responses to the above.

In the cases 1), 2), 4), use-case d. was chosen on purpose.

Bruno

[1] https://pubs.opengroup.org/onlinepubs/9799919799/functions/getaddrinfo.html
[2] https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-6-0.pdf





* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-19 21:59           ` Bruno Haible
@ 2025-03-19 23:12             ` Alejandro Colomar
  2025-03-19 23:30               ` strtou(3) handling of negative input Alejandro Colomar
                                 ` (4 more replies)
  0 siblings, 5 replies; 43+ messages in thread
From: Alejandro Colomar @ 2025-03-19 23:12 UTC (permalink / raw)
  To: Bruno Haible
  Cc: liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

[-- Attachment #1: Type: text/plain, Size: 16891 bytes --]

Hi Bruno,

On Wed, Mar 19, 2025 at 10:59:26PM +0100, Bruno Haible wrote:
> Hi Alejandro,
> 
> > > > > It would be useful to show what a success test looks like, after
> > > > >     strtoi (s, &end, base, min, max, &status)
> > > > > for each of the four frequent use-cases:
> > > > >   -a. expect to parse the initial portion of the string, no coercion,
> > > > >   -b. expect to parse the initial portion of the string, silent coercion,
> > > > >   -c. expect to parse the entire string, no coercion,
> > > > >   -d. expect to parse the entire string, silent coercion.
> > > > > 
> > > > > AFAICS, the success tests are:
> > > > >   -a. status == 0 || status == ENOTSUP
> > > > 
> > > > Correct.
> > > > 
> > > > >   -b. status == 0 || status == ENOTSUP || status == ERANGE
> > > > 
> > > > Correct (but most likely a bug).
> > 
> > Actually, now I remember that status can be NULL, in which case it's not
> > reported.  This is a case where you could check for errors with a
> > simpler expression:
> > 
> > 	end != str
> > 
> > but (status == 0 || status == ENOTSUP || status == ERANGE) is still a
> > reasonable one.
> 
> Unfortunately, with a comment like this, you make things more complicated,
> not simpler. I was hoping for an API where success can be determined by
> looking at 'status' in all four cases, and the "simple" solution that you
> are recommending now is:
>   -a. look at status
>   -b. look at end
>   -c. look at status
>   -d. look at status AND end.
> 
> > I need to update the specification to mention that status can be NULL.
> 
> Why do so? This adds text to the specification, making the specification
> more complex (=> longer to understand, harder to remember). The ability
> to pass NULL for rstatus is not a useful feature.

Hmmm, in Debian, there's exactly one place where this happens, and it's
a test case, so I guess it's fine if we break it.

<https://sources.debian.org/src/mk-configure/0.37.0-2/tests/mkc_features/tool/test_features1.cxx/?hl=118#L118>

And there are 0 such calls in NetBSD trunk:

alx@devuan:~/src/bsd/netbsd/trunk$ find -type f \
	| grep '\.[ch]$' \
	| xargs grep -l '\<strto[iu]\>' \
	| xargs pcre2grep -Mn '(?s)\bstrto[iu] *\([^;]*(NULL|0)\)';

So we could tighten the specification to require a non-null pointer.
I would be okay with that.

> > > > >   -d. status == 0 || (status == ERANGE && end > s && *end == '\0')
> > > > 
> > > > You don't need end>s, because that would preclude ERANGE.
> > > > 
> > > > 	status == 0 || (status == ERANGE && *end == '\0')
> > > > 
> > > > Aaand, most likely a bug.
> > > 
> > > Cases b. and d. are not bugs. Often, the programmer knows that treating
> > > a value > ULONG_MAX is equivalent to treating the value ULONG_MAX. These
> > > are *normal* uses of strto[u]l[l]. Often it is the programmer's intent
> > > that the values "4294967297" and "4294967295" produce the same behaviour
> > > (the same error message, for example).
> > 
> > If you want ULONG_MAX + 1 to be treated like ULONG_MAX, and both
> > result in an error, then you should probably clamp at ULONG_MAX - 1,
> > and consider anything above an error.
> 
> I said it in the paragraph above: Often the programmer wants a smooth
> transition between "unreasonably large but not overflowing" values and
> "overflow". So that only one diagnostic is needed for both cases.
> 
> 1) getaddrinfo.c
> 
> > > Any use of strtoul that does not test errno wants overflow
> > > to be mapped to ULONG_MAX, that is, is in case b. or d.
> > > Just looking in gnulib and gettext, I find already 6 occurrences:
> > >   gnulib/lib/getaddrinfo.c:299
> > 
> > lib/getaddrinfo.c-297-          if (!(*servname >= '0' && *servname <= '9'))
> > lib/getaddrinfo.c-298-            return EAI_NONAME;
> > lib/getaddrinfo.c:299:          port = strtoul (servname, &c, 10);
> > lib/getaddrinfo.c-300-          if (*c || port > 0xffff)
> > lib/getaddrinfo.c-301-            return EAI_NONAME;
> > lib/getaddrinfo.c-302-          port = htons (port);
> > 
> > You could remove the preceding conditional if you don't want to avoid
> > leading whitespace.
> 
> We don't want to allow leading whitespace here. We need to follow the
> getaddrinfo() spec [1].

Ok.

> At the same time, disallowing a leading '-' sign
> is a benefit as well. I consider it a misfeature that strtoul() parses
> "-3" successfully and returns ULONG_MAX-2, which was most certainly
> not intended by the user.

Agree; it is a misfeature.  In my API a2i(), when the type passed in the
first parameter is an unsigned type, negative values are rejected.

I wonder if there's any legitimate user of that misfeature.  I didn't
want to rule it out from a fundamental API just because I can't think of
a good use of it.

Since we have people from many systems here, anyone who has ever
seen a good use of strtoul(3) parsing negative values into an unsigned
type could comment.  If we don't hear of any, we could consider the
behavior useless and tighten it, especially for an API that has explicit
range checks.

Would NetBSD be open to changing the implementation of strtou(3) to
reject negative input?

> > You could merge that into the strtou(3) call (which
> > would report ECANCELED for non-numeric input).  Except that a negative
> > number is silently converted to a positive large value.  This is why I
> > use a wrapper function strtou_noneg() that rejects negative numbers.
> 
> Ouch. Yet another problem with strtoul(). That would be worth another
> error code in the specification of strtou().

No, in strtou(3) I think that should be part of ERANGE.  If I specify
min=0, I really want -1 to trigger an ERANGE, just like 1 would trigger
an ERANGE if I pass min=2.
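A minimal sketch of that behavior in terms of ISO C strtoul(), under the hypothetical name `strtoul_noneg_sketch()` (it is not the shadow-utils strtou_noneg(), whose code is not shown here).  A leading '-' is reported as ERANGE and clamped to `min`, instead of wrapping around as plain strtoul() does, where "-3" becomes ULONG_MAX - 2.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

static unsigned long
strtoul_noneg_sketch(const char *s, char **endp, int base,
                     unsigned long min, unsigned long max, int *status)
{
	const char     *p = s;
	char           *end;
	int            saved_errno = errno;
	unsigned long  v;

	assert(min <= max && status != NULL);

	while (isspace((unsigned char) *p))  /* strtoul() skips these too */
		p++;

	errno = 0;
	v = strtoul(s, &end, base);
	*status = errno;
	errno = saved_errno;
	if (endp != NULL)
		*endp = end;

	if (end == s) {                      /* no digits */
		*status = ECANCELED;
		return min;
	}
	if (*p == '-' || v < min) {          /* negative (even "-0"), or too small */
		*status = ERANGE;
		return min;
	}
	if (*status == ERANGE || v > max) {  /* overflow, or too large */
		*status = ERANGE;
		return max;
	}
	*status = (*end != '\0') ? ENOTSUP : 0;
	return v;
}
```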

But yes, another issue in the list of "strtol(3) is really broken,
really.".  :-)

> > I think this is not one case where you want silent saturation.  You're
> > indeed doing range checks [0, UINT16_MAX].
> 
> Yes, in this particular case, the ability to pass an arbitrary min and max
> to strtou() is a practical feature.

:)

> 
> 2) nproc.c, omp_init.c
> 
> > >   gnulib/lib/nproc.c:402
> > 
> > lib/nproc.c-383-/* Parse OMP environment variables without dependence on OMP.
> > lib/nproc.c-384-   Return 0 for invalid values.  */
> > lib/nproc.c-385-static unsigned long int
> > lib/nproc.c:386:parse_omp_threads (char const* threads)
> > lib/nproc.c-387-{
> > 
> > ...
> > 
> > lib/nproc.c-398-  /* Convert it from positive decimal to 'unsigned long'.  */
> > lib/nproc.c-399-  if (c_isdigit (*threads))
> > lib/nproc.c-400-    {
> > lib/nproc.c-401-      char *endptr = NULL;
> > lib/nproc.c:402:      unsigned long int value = strtoul (threads, &endptr, 10);
> > lib/nproc.c-403-
> > lib/nproc.c-404-      if (endptr != NULL)
> > lib/nproc.c-405-        {
> > lib/nproc.c-406-          while (*endptr != '\0' && c_isspace (*endptr))
> > lib/nproc.c-407-            endptr++;
> > lib/nproc.c-408-          if (*endptr == '\0')
> > lib/nproc.c-409-            return value;
> > lib/nproc.c-410-          /* Also accept the first value in a nesting level,
> > lib/nproc.c-411-             since we can't determine the nesting level from env vars.  */
> > lib/nproc.c-412-          else if (*endptr == ',')
> > lib/nproc.c-413-            return value;
> > lib/nproc.c-414-        }
> > lib/nproc.c-415-    }
> > 
> > First of all, the endptr!=NULL test seems misplaced.
> 
> I wasn't sure that all implementations store an endptr in all cases.

One of the many issues with strtol(3) to note in the list.  This one is
rather obscure, because different platforms do different things.  The
conclusion we reached when we discussed it back then is: never rely on
strtol(3) reporting EINVAL.  If you've reached that point, you're really
deep into UB.  Check the base before the actual call.
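
Here is a minimal sketch of "check the base before the actual call". It assumes the ISO C rule that strtoul(3) handles base 0 or 2 through 36. POSIX only says these functions *may* fail with EINVAL for an unsupported base, so the check is done up front. `checked_strtoul()` is a hypothetical wrapper, and it ignores trailing characters to stay short.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Bases that strtoul(3) handles in ISO C: 0 (auto-detect) or 2..36. */
static bool
base_is_valid(int base)
{
	return base == 0 || (base >= 2 && base <= 36);
}

/*
 * Hypothetical wrapper: validate the base before calling strtoul(3)
 * instead of relying on it to report EINVAL.
 * Returns 0 on success, or EINVAL, ECANCELED (no digits), or ERANGE.
 * Trailing characters are not checked here.
 */
int
checked_strtoul(const char *s, int base, unsigned long *val)
{
	char  *end;

	if (!base_is_valid(base))
		return EINVAL;

	errno = 0;
	*val = strtoul(s, &end, base);
	if (end == s)
		return ECANCELED;
	if (errno == ERANGE)
		return ERANGE;
	return 0;
}
```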

> > This could be something like this (fixing the bugs reported above):
> > 
> > 	char    *end;
> > 	u_long  value;
> > 
> > 	value = strtou_noneg(threads, &end, 10, 0, ULONG_MAX, NULL);
> > 	if (end != threads) {
> > 		end += strspn(end, " \t\n");
> > 		if (streq(end, ""))
> > 			return value;
> > 		if (strprefix(end, ","))
> > 			return value;
> > 	}
> > 
> > 
> > This is one case where you seem to silently ignore saturation.
> 
> This is on purpose. A CPU never has more than 1000 processors. Therefore
> all values > 1000 should be treated the same way, whether they are
> <= ULONG_MAX or > ULONG_MAX. Clamping to the number of actually available
> processors occurs later in the code.

I guess you refer to the MIN() calls within num_processors(), right?
Why do those clampings not result in diagnostics?  Couldn't you
calculate the limits before parsing the actual number, and so use them
to perform the range checks during the strtou(3) call?

> 3) msgfmt.c
> 
> > >   gettext/gettext-tools/src/msgfmt.c:287
> > 
> > gettext-tools/src/msgfmt.c-286-          char *endp;
> > gettext-tools/src/msgfmt.c:287:          size_t new_align = strtoul (optarg, &endp, 0);
> > gettext-tools/src/msgfmt.c-288-
> > gettext-tools/src/msgfmt.c-289-          if (endp != optarg)
> > gettext-tools/src/msgfmt.c-290-            alignment = new_align;
> > 
> > This code will misbehave badly on platforms where size_t is narrower
> > than u_long.
> 
> Such platforms (namely Windows 3.1) are not among the portability targets
> of this code.

Ok.  Still, strtou(3) would make that a non-issue.

> > You also don't reject negative numbers, which I expect to be a bug,
> > connected with the one from above.
> 
> That could be a problem. Fortunately, the documentation makes it clear
> that the argument is an "alignment", and reasonable users will therefore
> only pass the values 1, 2, 4, 8, 16.

Nice, then a call to strtou(3) would have the limits 1 and 16, which
would make it more robust.  No need to actively ignore ERANGE.

> > Why don't you have a diagnostic message for invalid input?
> 
> Because this option is meant for users who know what an "alignment" is.

But still, you don't need to actively ignore ERANGE, especially when it
results in more complex code than actually checking.  Once the API gives
you the check for free, you should probably use it.

In this case, I think I'd do

	int     status;
	size_t  new_align;

	new_align = strtou(optarg, NULL, 0, 1, 16, &status);
	if (status == 0)
		alignment = new_align;

Don't you think this is more robust and just as simple?  I think it is a
'd' case because strtoul(3) makes it difficult for you to do the check,
not because you actively want to not check.

> 4) msgl-check.c
> 
> > >   gettext/gettext-tools/src/msgl-check.c:379
> > 
> > gettext-tools/src/msgl-check.c-374-          while (*nplurals != '\0' && c_isspace ((unsigned char) *nplurals))
> > gettext-tools/src/msgl-check.c-375-            ++nplurals;
> > gettext-tools/src/msgl-check.c-376-          endp = nplurals;
> > gettext-tools/src/msgl-check.c-377-          nplurals_value = 0;
> > gettext-tools/src/msgl-check.c-378-          if (*nplurals >= '0' && *nplurals <= '9')
> > gettext-tools/src/msgl-check.c:379:            nplurals_value = strtoul (nplurals, (char **) &endp, 10);
> > gettext-tools/src/msgl-check.c-380-          if (nplurals == endp)
> > gettext-tools/src/msgl-check.c-381-            {
> > gettext-tools/src/msgl-check.c-382-              const char *msg = _("invalid nplurals value");
> > gettext-tools/src/msgl-check.c-383-              char *help = plural_help (nullentry);
> > gettext-tools/src/msgl-check.c-384-
> > gettext-tools/src/msgl-check.c-385-              if (help != NULL)
> > gettext-tools/src/msgl-check.c-386-                {
> > gettext-tools/src/msgl-check.c-387-                  char *msgext = xasprintf ("%s\n%s", msg, help);
> > gettext-tools/src/msgl-check.c-388-                  xeh->xerror (CAT_SEVERITY_ERROR, header, NULL, 0, 0, true,
> > gettext-tools/src/msgl-check.c-389-                               msgext);
> > gettext-tools/src/msgl-check.c-390-                  free (msgext);
> > gettext-tools/src/msgl-check.c-391-                  free (help);
> > gettext-tools/src/msgl-check.c-392-                }
> > gettext-tools/src/msgl-check.c-393-              else
> > gettext-tools/src/msgl-check.c-394-                xeh->xerror (CAT_SEVERITY_ERROR, header, NULL, 0, 0, false,
> > gettext-tools/src/msgl-check.c-395-                             msg);
> > gettext-tools/src/msgl-check.c-396-
> > gettext-tools/src/msgl-check.c-397-              seen_errors++;
> > gettext-tools/src/msgl-check.c-398-            }
> 
> > On the other hand, I also wonder why you don't diagnose invalid input.
> > Why is -10 an "invalid nplurals value"
> 
> "-10" designates a negative number. For the number of plural forms, that's
> invalid.
> 
> > but ULONG_MAX+10 is a valid (albeit clamped) one?
> 
> Again, as mentioned above, values like 1000 and 10000000000000000 should
> be treated the same way. Since ULONG_MAX+10 gets clamped to ULONG_MAX, that's
> just fine.

I see in line 433 a check

	if (min_nplurals < nplurals_value)

And in line 449 a check

	else if (max_nplurals > nplurals_value)

nplurals_value was parsed via strtoul(3) in line 379.

And min_nplurals and max_nplurals were assigned in lines 293 and 298.
Which makes me wonder: why don't you move the tests from lines 4** into
the strtou(3) call?

That is, why not call this?

	nplurals_value = strtou(nplurals, NULL, 10, min_nplurals, max_nplurals, &status);

And then have the ERANGE handling there?

> 5) read-stringtable.c
> 
> > >   gettext/gettext-tools/src/read-stringtable.c:561
> > 
> > gettext-tools/src/read-stringtable.c-553-    {
> > gettext-tools/src/read-stringtable.c-554-      char *last_colon;
> > gettext-tools/src/read-stringtable.c-555-      unsigned long number;
> > gettext-tools/src/read-stringtable.c-556-      char *endp;
> > gettext-tools/src/read-stringtable.c-557-
> > gettext-tools/src/read-stringtable.c-558-      if (strlen (line) >= 6 && memcmp (line, "File: ", 6) == 0
> > gettext-tools/src/read-stringtable.c-559-          && (last_colon = strrchr (line + 6, ':')) != NULL
> > gettext-tools/src/read-stringtable.c-560-          && *(last_colon + 1) != '\0'
> > gettext-tools/src/read-stringtable.c:561:          && (number = strtoul (last_colon + 1, &endp, 10), *endp == '\0'))
> > gettext-tools/src/read-stringtable.c-562-        {
> > gettext-tools/src/read-stringtable.c-563-          /* A "File: <filename>:<number>" type comment.  */
> > gettext-tools/src/read-stringtable.c-564-          *last_colon = '\0';
> > gettext-tools/src/read-stringtable.c-565-          catalog_reader_seen_comment_filepos (catr, line + 6, number);
> > gettext-tools/src/read-stringtable.c-566-        }
> > gettext-tools/src/read-stringtable.c-567-      else
> > gettext-tools/src/read-stringtable.c-568-        catalog_reader_seen_comment (catr, line);
> > gettext-tools/src/read-stringtable.c-569-    }
> > 
> > You're forgetting about negative numbers?
> 
> Indeed, I didn't realize that strtoul() would accept "-3" as valid. As said
> above, strtou() can improve on it, by reserving an error code for it.

Yeah, if I had designed it, it would have ERANGE for those cases.
Maybe we're in time to reform it.  I'll first open a NetBSD bug, and see
if we can convince them.

> > How about huge values?
> 
> In this code, we assume that line numbers are <= ULONG_MAX. Not an
> unreasonable assumption. I have yet to see source code files that are
> 4 GiB large...
> 
> > But again, I wonder why you don't do range checks.
> 
> Range checks are not important here: The line numbers are not used as
> indices; they are merely reproduced in the output.

They're not important, but if the API gives you a range check for free,
I think you should take it, and consider a line number of ULONG_MAX + 1
to be an error.  It will make the code more robust, I think.
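
Standard strtoul(3) plus an errno check is enough to treat a line number of ULONG_MAX + 1 as an error. In the minimal sketch below, `parse_line_number()` is a hypothetical helper. It is stricter than the quoted gettext code: it also rejects leading blanks and any sign.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper for "File: <filename>:<number>" comments:
 * accept only a complete, unsigned decimal number that fits in an
 * unsigned long.  A value above ULONG_MAX is an error instead of being
 * silently clamped, and "-3" is rejected instead of becoming
 * ULONG_MAX - 2.
 */
bool
parse_line_number(const char *s, unsigned long *out)
{
	char           *end;
	unsigned long  n;
	bool           overflow;
	int            saved_errno = errno;

	/* Require a digit first: rejects "", leading blanks, and any sign. */
	if (!isdigit((unsigned char) *s))
		return false;

	errno = 0;
	n = strtoul(s, &end, 10);
	overflow = (errno == ERANGE);
	errno = saved_errno;

	if (overflow || *end != '\0')
		return false;
	*out = n;
	return true;
}
```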

> > > If you don't want to do that, I can only repeat what I said in the previous
> > > mail: The proposal *does not achieve the goal* of avoiding the most common
> > > programmer mistakes. For a robust API, the success test should *only* involve
> > > testing the returned 'status', nothing else.
> > 
> > Let's discuss this after your responses to the above.
> 
> In the cases 1), 2), 4), use-case d. was chosen on purpose.

Case 1 was not d.  Case 1 does check the range.
In cases 2 and 4, I think you'd benefit from refactoring the code to use
the limits in the strtou(3) call.  Especially in case 4 where you
already know the limits.


Have a lovely night!
Alex

-- 
<https://www.alejandro-colomar.es/>


* strtou(3) handling of negative input
From: Alejandro Colomar @ 2025-03-19 23:30 UTC (permalink / raw)
  To: Bruno Haible
  Cc: liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

Hi Bruno,

On Thu, Mar 20, 2025 at 12:12:10AM +0100, Alejandro Colomar wrote:
> > At the same time, disallowing a leading '-' sign
> > is a benefit as well. I consider it a misfeature that strtoul() parses
> > "-3" successfully and returns ULONG_MAX-2, which was most certainly
> > not intended by the user.
> 
> Agree; it is a misfeature.  In my API a2i(), when the type passed in the
> first parameter is an unsigned type, negative values are rejected.
> 
> I wonder if there's any legitimate user of that misfeature.  I didn't
> want to rule it out from a fundamental API just because I can't think of
> a good use of it.
> 
> Maybe since we have people from many systems here, anyone who has ever
> seen a good use of strtoul(3) parsing negative values into an unsigned
> type can comment.  Maybe if we don't hear about it, we could consider it
> useless and tighten it?  Especially for an API that has explicit range
> checks.
> 
> Would NetBSD be open to changing the implementation of strtou(3) to
> reject negative input?

I have filed a bug in NetBSD for reforming strtou(3):
<https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=59198>

Let's see what they think about it.


Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>


* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
From: Thorsten Glaser @ 2025-03-19 23:52 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Bruno Haible, liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

On Thu, 20 Mar 2025, Alejandro Colomar wrote:

>> At the same time, disallowing a leading '-' sign
>> is a benefit as well. I consider it a misfeature that strtoul() parses
>> "-3" successfully and returns ULONG_MAX-2, which was most certainly
>> not intended by the user.
>
>Agree; it is a misfeature.

What?

From a user’s PoV, this is hugely useful, and many other
read-unsigned-integer-value routines handle this similarly
(of course using whatever range they have) and C also defines
this, so from an implementor’s PoV this is no trouble.

The other user’s PoV thing would be to allow 0x prefixing,
but that needs an entire duplication of the inner loop, so
I can see why people would want to exclude that.

bye,
//mirabilos
-- 
22:20⎜<asarch> The crazy that persists in his craziness becomes a master
22:21⎜<asarch> And the distance between the craziness and geniality is
only measured by the success
18:35⎜<asarch> "Psychotics are consistently inconsistent. The essence
of sanity is to be inconsistently inconsistent


* nullability of status parameter in strtoi/u(3)
From: Alejandro Colomar @ 2025-03-19 23:52 UTC (permalink / raw)
  To: Bruno Haible
  Cc: liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

Hi Bruno,

On Thu, Mar 20, 2025 at 12:12:10AM +0100, Alejandro Colomar wrote:
> > > > > >   -b. status == 0 || status == ENOTSUP || status == ERANGE
> > > > > 
> > > > > Correct (but most likely a bug).
> > > 
> > > Actually, now I remember that status can be NULL, in which case it's not
> > > reported.  This is a case where you could check for errors with a
> > > simpler expression:
> > > 
> > > 	end != str
> > > 
> > > but (status == 0 || status == ENOTSUP || status == ERANGE) is still a
> > > reasonable one.
> > 
> > Unfortunately, with a comment like this, you make things more complicated,
> > not simpler. I was hoping for an API where success can be determined by
> > looking at 'status' in all four cases, and the "simple" solution that you
> > are recommending now is:
> >   -a. look at status
> >   -b. look at end
> >   -c. look at status
> >   -d. look at status AND end.
> > 
> > > I need to update the specification to mention that status can be NULL.
> > 
> > Why do so? This adds text to the specification, making the specification
> > more complex (=> longer to understand, harder to remember). The ability
> > to pass NULL for rstatus is not a useful feature.
> 
> Hmmm, in Debian, there's exactly one place where this happens, and it's
> a test case, so I guess it's fine if we break it.
> 
> <https://sources.debian.org/src/mk-configure/0.37.0-2/tests/mkc_features/tool/test_features1.cxx/?hl=118#L118>
> 
> And there are 0 such calls in NetBSD trunk:
> 
> alx@devuan:~/src/bsd/netbsd/trunk$ find -type f \
> 	| grep '\.[ch]$' \
> 	| xargs grep -l '\<strto[iu]\>' \
> 	| xargs pcre2grep -Mn '(?s)\bstrto[iu] *\([^;]*(NULL|0)\)';
> 
> So we could tighten the specification to require a non-null pointer.
> I would be okay with that.

I've reported this to NetBSD.  Let's see what they think about it.
<https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=59199>


Have a lovely night!
Alex

-- 
<https://www.alejandro-colomar.es/>


* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
From: Alejandro Colomar @ 2025-03-20  0:19 UTC (permalink / raw)
  To: Thorsten Glaser
  Cc: Bruno Haible, liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

Hi Thorsten,

On Thu, Mar 20, 2025 at 12:52:07AM +0100, Thorsten Glaser wrote:
> On Thu, 20 Mar 2025, Alejandro Colomar wrote:
> 
> >> At the same time, disallowing a leading '-' sign
> >> is a benefit as well. I consider it a misfeature that strtoul() parses
> >> "-3" successfully and returns ULONG_MAX-2, which was most certainly
> >> not intended by the user.
> >
> >Agree; it is a misfeature.
> 
> What?
> 
> From a user’s PoV, this is hugely useful, and many other
> read-unsigned-integer-value routines handle this similarly
> (of course using whatever range they have) and C also defines
> this, so from an implementor’s PoV this is no trouble.

Can you clarify how this is useful as a programmer?  I have replaced
*all* calls to strtoul(3) et al. in shadow-utils by my strtou_noneg(),
and never ever saw a valid use case of that feature.

> The other user’s PoV thing would be to allow 0x prefixing,
> but that needs an entire duplication of the inner loop, so
> I can see why people would want to exclude that.

I don't understand what you mean.  strtou(3) supports 0x strings as long
as you specify 0 or 16 as the base.

	alx@devuan:~/tmp$ cat hex.c 
	#include <bsd/inttypes.h>
	#include <stdio.h>

	int
	main(void)
	{
		int   status, n;
		char  *end;

		n = strtou("0xF", &end, 0, 0, 1000, &status);

		printf("%d\n", n);
		printf("%s\n", end);
		printf("%d\n", status);

		n = strtou("0xF", &end, 16, 0, 1000, &status);

		printf("%d\n", n);
		printf("%s\n", end);
		printf("%d\n", status);
	}
	alx@devuan:~/tmp$ gcc -Wall -Wextra hex.c -lbsd
	alx@devuan:~/tmp$ ./a.out 
	15

	0
	15

	0


Have a lovely night!
Alex

-- 
<https://www.alejandro-colomar.es/>


* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
From: Thorsten Glaser @ 2025-03-20  0:31 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Bruno Haible, liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

On Thu, 20 Mar 2025, Alejandro Colomar wrote:

>> From a user’s PoV, this is hugely useful, and many other
>> read-unsigned-integer-value routines handle this similarly
>> (of course using whatever range they have) and C also defines
>> this, so from an implementor’s PoV this is no trouble.
>
>Can you clarify how this is useful as a programmer?  I have replaced
>*all* calls to strtoul(3) et al. in shadow-utils by my strtou_noneg(),
>and never ever saw a valid use case of that feature.

I said useful for users (and no bother to programmers).

This way, users can do things like './a.out -x $((something))'
and have it work even if that something ends up negative in
the shell (not all shells have a way for unsigned arithmetic
output).
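
The shell use case depends on the wrap-around that ISO C requires of strtoul(3). A leading '-' negates the converted magnitude in unsigned long arithmetic, and no error is reported. Here is a minimal demonstration; `parse_wrapping()` is only an illustrative name.

```c
#include <limits.h>
#include <stdlib.h>

/*
 * ISO C: for a subject sequence starting with '-', strtoul(3) negates
 * the converted value in unsigned long arithmetic.  So "-1" yields
 * ULONG_MAX and "-3" yields ULONG_MAX - 2, without setting errno.
 */
unsigned long
parse_wrapping(const char *s)
{
	return strtoul(s, NULL, 10);
}
```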

>> The other user’s PoV thing would be to allow 0x prefixing,

>I don't understand what you mean.  strtou(3) supports 0x strings as

Oh, right, it’s one of those where you specify the base.
I might have been confused by too many things being discussed at
the same time.

Or just need sleep. Good night to you as well,
//mirabilos
-- 
As long as you don't pull dirty tricks, and I mean *really*
dirty tricks like XORing both pointers of a doubly linked list
and storing them in a single word, Boehm works just
splendidly.		-- Andreas Bogk on boehm-gc in d.a.s.r


* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
From: Alejandro Colomar @ 2025-03-20  0:36 UTC (permalink / raw)
  To: Thorsten Glaser
  Cc: Bruno Haible, liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

Hi Thorsten,

On Thu, Mar 20, 2025 at 01:31:37AM +0100, Thorsten Glaser wrote:
> >Can you clarify how this is useful as a programmer?  I have replaced
> >*all* calls to strtoul(3) et al. in shadow-utils by my strtou_noneg(),
> >and never ever saw a valid use case of that feature.
> 
> I said useful for users (and no bother to programmers).
> 
> This way, users can do things like './a.out -x $((something))'
> and have it work even if that something ends up negative in
> the shell (not all shells have a way for unsigned arithmetic
> output).

Ahh, I understand now.  Hmmm, I think I prefer keeping my sanity, and
let the user do the hard work.  :)

> >> The other user’s PoV thing would be to allow 0x prefixing,
> 
> >I don't understand what you mean.  strtou(3) supports 0x strings as
> 
> Oh, right, it’s one of those where you specify the base.
> I might have been confused by too many things being discussed at
> the same time.

Ahhh, probably with OpenBSD's strtonum(3).

     long long
     strtonum(const char *nptr, long long minval, long long maxval,
         const char **errstr);

It seems it doesn't have a base.
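
For comparison, here is a sketch of the semantics documented for OpenBSD strtonum(3). It always uses base 10, the whole string must be numeric, and range errors are reported as "too small" or "too large". `sketch_strtonum()` is a hypothetical re-implementation in standard C. Unlike the real function, it requires a non-null errstr and does not restore or set errno the way OpenBSD does.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Illustrative re-implementation of OpenBSD strtonum(3) behaviour.
 * Always base 10, so "0x10" is "invalid".  On error it returns 0 and
 * points *errstr at "invalid", "too small" or "too large"; on success
 * *errstr is NULL.  errstr must not be NULL in this sketch.
 */
long long
sketch_strtonum(const char *s, long long minval, long long maxval,
                const char **errstr)
{
	char       *end;
	long long  n;

	*errstr = NULL;
	if (minval > maxval) {
		*errstr = "invalid";
		return 0;
	}

	errno = 0;
	n = strtoll(s, &end, 10);
	if (end == s || *end != '\0')
		*errstr = "invalid";
	else if ((n == LLONG_MIN && errno == ERANGE) || n < minval)
		*errstr = "too small";
	else if ((n == LLONG_MAX && errno == ERANGE) || n > maxval)
		*errstr = "too large";

	return (*errstr == NULL) ? n : 0;
}
```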

> Or just need sleep. Good night to you as well,
> //mirabilos

Cheers,
Alex  :)

-- 
<https://www.alejandro-colomar.es/>


* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
From: Paul Eggert @ 2025-03-20  0:39 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Bruno Haible, liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Eli Schwartz,
	Guillem Jover, Iker Pedrosa, Michael Vetter, Robert Elz,
	riastradh, Sam James, Serge E. Hallyn

On 3/19/25 14:23, Alejandro Colomar wrote:

> I think not reporting errors or warnings on saturation needs
> justification.

I don't know what "justification" means, but if it means a comment in 
the code I'm not sure I agree. Code where saturation is ordinarily 
what's wanted shouldn't need a comment on each nontrivial line saying 
"Saturation is OK here."


> Thanks, those changes look good.  BTW, what do you think of using
> strspn(3) to simplify the c_isspace loop?

Not worth the trouble. The loop is easier to read and debug than the 
strspn call, which got some minor details wrong and fixing that would 
complicate the strspn code even further. (The loop is typically more 
efficient too, not that this matters much here.)


> However, would you mind clarifying why you don't diagnose huge values in
> the two places that you have updated?

For this particular resource, a limit of ULONG_MAX has the same 
practical effect as a limit of ULONG_MAX + 1. Since the user can't tell 
the difference in behavior, it's fine to implement the larger limit as 
the smaller one, with no diagnostic.

A reasonable amount of GNU code works that way.


> 	struct foo {
> 		long  val;
> 		int   err;
> 	};
> 
> 	struct foo  ret;
> 
> 	ret = f(time_t, ...);
> 	if (ret.err != 0)
> 		err(1, "f");
> 
> How do I know which variant of struct foo I need?

I don't understand the question. There's no variant here; "variant" to 
me implies something like a union.

But to fill in the details: C doesn't have a convenient notation for 
returning multiple values, you do need a struct. One convention is to 
use a struct whose tag is the same as the function. So, something like 
this in a header file somewhere:

     struct a2i { intmax_t val; ptrdiff_t len; }
     a2i (char const *str, int base);

where LEN is negative for errors, and callers look like this:

     struct a2i r = a2i(stringval, 10);
     if (r.len < 0 || stringval[r.len])
       err("a2i", stringval, r.len);

the "|| stringval[r.len]" is needed only for callers that consider 
nonnumeric suffixes to be an error.

This is simpler than the pointers and "restrict"s in the proposed API.
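
The interface is described above, but its body is not. Here is a minimal sketch that uses strtoimax(3) underneath. The error encoding was not specified, so this sketch assumes one: len == -1 for no digits, and len == -2 for out-of-range, with val saturated. With this encoding the caller cannot tell how far parsing got on error, which is the point raised in the reply.

```c
#include <errno.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>

struct a2i {
	intmax_t   val;
	ptrdiff_t  len;   /* characters parsed, or negative on error */
};

/* Sketch only: the negative len values below are assumptions. */
struct a2i
a2i(char const *str, int base)
{
	struct a2i  r;
	char        *end;
	int         saved_errno = errno;

	errno = 0;
	r.val = strtoimax(str, &end, base);
	if (end == str)
		r.len = -1;              /* no digits */
	else if (errno == ERANGE)
		r.len = -2;              /* out of range; val is saturated */
	else
		r.len = end - str;
	errno = saved_errno;
	return r;
}
```

A caller that rejects nonnumeric suffixes would then test `r.len < 0 || str[r.len] != '\0'`.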


* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
From: Alejandro Colomar @ 2025-03-20  1:15 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Bruno Haible, liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Eli Schwartz,
	Guillem Jover, Iker Pedrosa, Michael Vetter, Robert Elz,
	riastradh, Sam James, Serge E. Hallyn

Hi Paul,

On Wed, Mar 19, 2025 at 05:39:33PM -0700, Paul Eggert wrote:
> On 3/19/25 14:23, Alejandro Colomar wrote:
> 
> > I think not reporting errors or warnings on saturation needs
> > justification.
> 
> I don't know what "justification" means, but if it means a comment in the
> code I'm not sure I agree. Code where saturation is ordinarily what's wanted
> shouldn't need a comment on each nontrivial line saying "Saturation is OK
> here."

Nah, not a source-code comment.  I think comments are usually evil.

More like you telling me now why you do it that way.  Actually, Bruno
sent detailed responses in his last email, and I think that you'd
benefit from range checks, actually.  (See my response to his email.)
<https://lore.kernel.org/liba2i/6oyljvsenypqnrmgjbcwskqpdsag677h2dzay6hvfoosju4224@3j7iczm4d7nw/T/#m38066e6eec63a8906e3cbfea275c9d7940d8df98>

> > Thanks, those changes look good.  BTW, what do you think of using
> > strspn(3) to simplify the c_isspace loop?
> 
> Not worth the trouble. The loop is easier to read and debug than the strspn
> call,

I guess I've gotten so used to the niceties of strspn(3) that I find it easier to
read.  It's a matter of taste, so ok.  :)

> which got some minor details wrong and fixing that would complicate
> the strspn code even further.

Do you mean that the implementation of strspn(3) was temporarily broken?
Or that the specification is bad?  I'm curious about it; could you
please clarify?

> > However, would you mind clarifying why you don't diagnose huge values in
> > the two places that you have updated?
> 
> For this particular resource, a limit of ULONG_MAX has the same practical
> effect as a limit of ULONG_MAX + 1. Since the user can't tell the difference
> in behavior, it's fine to implement the larger limit as the smaller one,
> with no diagnostic.

According to Bruno, that limit is later clamped at a much lower value,
so I think that clamping could be moved up to the strtou(3) call.

Of course, that would mean having to implement strtou(3) for now, since
it's non-standard, so keeping it as is is simpler.  I was just trying to
say that if strtou(3) was standard in libc, then you could just use it
and simplify code, while making it more robust.

> A reasonable amount of GNU code works that way.

Ok.

> > 	struct foo {
> > 		long  val;
> > 		int   err;
> > 	};
> > 
> > 	struct foo  ret;
> > 
> > 	ret = f(time_t, ...);
> > 	if (ret.err != 0)
> > 		err(1, "f");
> > 
> > How do I know which variant of struct foo I need?
> 
> I don't understand the question. There's no variant here; "variant" to me
> implies something like a union.
> 
> But to fill in the details: C doesn't have a convenient notation for
> returning multiple values, you do need a struct. One convention is to use a
> struct whose tag is the same as the function. So, something like this in a
> header file somewhere:
> 
>     struct a2i { intmax_t val; ptrdiff_t len; }
>     a2i (char const *str, int base);

How do you get a uintmax_t?  Let's say I'm parsing an unsigned variable.

Also, how do I perform range checks in that call?  I need to specify
min and max limits.

> where LEN is negative for errors, and callers look like this:

How do you know how much has been parsed on error?  That's something
useful from strtoi/u(3).

>     struct a2i r = a2i(stringval, 10);
>     if (r.len < 0 || stringval[r.len])
>       err("a2i", stringval, r.len);
> 
> the "|| stringval[r.len]" is needed only for callers that consider
> nonnumeric suffixes to be an error.

How do you perform range checks with this API?

> This is simpler than the pointers and "restrict"s in the proposed API.

Compare to

	QChar *alt_2(typename T,
	             T *restrict n, QChar *s, int base, T min, T max);


which can be called

	time_t  t;
	char    *end;

	errno = 0;
	end = alt_2(time_t, &t, s, 0, past, future);
	if (errno == ERANGE && t == past)
		goto too_old;
	if (errno == ERANGE && t == future)
		goto too_new;
	if (errno == ENOTSUP)
		goto trailing_text;
	if (errno != 0)
		goto hard_error;

	// All's good here.  Can use 't'.

	...
	return;

trailing_text:
	printf("Trailing text: %s", end);

which gives me for free checks that t is between past and future, and
of course saturation.  It also gives me for free type validation that t
is of type time_t.  It calls strtoi(3) if time_t is a signed type, and
strtou(3) if time_t is an unsigned type.

I can perform all the checks to errno that I want, or I can omit them if
I want.  This is the API I'm working on at the moment, and I don't think
a struct has anything more compelling than that.


Have a lovely night!
Alex

-- 
<https://www.alejandro-colomar.es/>


* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
From: Paul Eggert @ 2025-03-20  7:03 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Bruno Haible, liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Eli Schwartz,
	Guillem Jover, Iker Pedrosa, Michael Vetter, Robert Elz,
	riastradh, Sam James, Serge E. Hallyn

On 2025-03-19 18:15, Alejandro Colomar wrote:

> I think that you'd
> benefit from range checks, actually.  (See my response to his email.)
> <https://lore.kernel.org/liba2i/6oyljvsenypqnrmgjbcwskqpdsag677h2dzay6hvfoosju4224@3j7iczm4d7nw/T/#m38066e6eec63a8906e3cbfea275c9d7940d8df98>

That's a long email and I'm not sure I'm looking at the right place in 
it, but if I understand correctly it says that Gnulib code like this:

   port = strtoul (servname, &c, 10);
   if (port > 0xffff)
     return EAI_NONAME;

would be clearer if worded this way:

   port = strtou (servname, &c, 10, 0, 0xffff, &err);
   if (err == ERANGE)
     return EAI_NONAME;

If so, I disagree. The strtou API is more complicated, and the reader is 
likely to forget what each of its arguments means, e.g., that the 0 is a 
minimum and the 0xffff is a maximum. The strtoul API is simpler and it's 
more obvious what is intended.
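The caller-side check works in this particular case because the limit, 0xffff, is below ULONG_MAX: an overflowing input saturates to ULONG_MAX and still fails port > 0xffff, so errno need not be consulted.  A minimal sketch of that pattern; parse_port() is a hypothetical name, and rejecting trailing text is an assumption, not necessarily Gnulib's policy:

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical parse_port(): parse a decimal port number, with the
 * range check done by the caller of strtoul().  Overflow needs no
 * errno check here: strtoul() saturates to ULONG_MAX, which is
 * already greater than 0xffff.
 */
static bool
parse_port(const char *s, unsigned long *port)
{
	char           *end;
	unsigned long  n;

	n = strtoul(s, &end, 10);
	if (end == s || *end != '\0')
		return false;   /* nothing parsed, or trailing text */
	if (n > 0xffff)
		return false;   /* out of range, including saturated input */
	*port = n;
	return true;
}
```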


> Do you mean that the implementation of strspn(3) was temporarily broken?
> Or that the specification is bad?

No, strspn itself is fine. It's that call to strspn that is broken. The 
call assumes that only ' ', '\t', and '\n' satisfy c_isspace, which is 
incorrect. This is an area where c_isspace is simpler and easier to 
follow than strspn.
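In the "C" locale, isspace() accepts six characters: space, '\t', '\n', '\v', '\f' and '\r', and gnulib's c_isspace() tests the same set.  A strspn(3) call that is equivalent to a c_isspace() loop therefore has to list all six.  A minimal sketch; skip_c_space() and C_SPACE_CHARS are hypothetical names:

```c
#include <string.h>

/* The whitespace set of isspace() in the "C" locale, which is also the
   set that gnulib's c_isspace() recognizes. */
#define C_SPACE_CHARS  " \t\n\v\f\r"

/* Return a pointer to the first character of s that is not C-locale
   whitespace; equivalent to advancing while c_isspace(*s). */
static const char *
skip_c_space(const char *s)
{
	return s + strspn(s, C_SPACE_CHARS);
}
```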


>> For this particular resource, a limit of ULONG_MAX has the same practical
>> effect as a limit of ULONG_MAX + 1. Since the user can't tell the difference
>> in behavior, it's fine to implement the larger limit as the smaller one,
>> with no diagnostic.
> 
> According to Bruno, that limit is later clamped at a much lower value,
> so I think that clamping could be moved up to the strtou(3) call.

Yes, it could be moved into strtou, but there's a cost in simplicity and 
ease of understanding.


> How do you get a uintmax_t?

With a different function a2u that comes with a different struct (just 
like strtoi/strtou).

> Also, how do I perform range checks in that call?  I need to specify
> min and max limits.

The caller should do the range checks, as this makes C code easier to 
follow, both by the human programmer and the compiler, which can more 
easily use the range information to generate better code.

For cases like these having a function do the range checks is often a 
mistake - it doesn't significantly increase reliability (on the 
contrary, it can reduce it). But if you prefer a function with range 
checks, it's easy to add bounds arguments.

> How do you know how much has been parsed on error?

Good point. I guess we'll need three elements to the structure, then.

At this point we're bikeshedding but you can have the last word if you like.


* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-20  7:03                     ` Paul Eggert
@ 2025-03-20 10:32                       ` Alejandro Colomar
  0 siblings, 0 replies; 43+ messages in thread
From: Alejandro Colomar @ 2025-03-20 10:32 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Bruno Haible, liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Eli Schwartz,
	Guillem Jover, Iker Pedrosa, Michael Vetter, Robert Elz,
	riastradh, Sam James, Serge E. Hallyn

Hi Paul,

On Thu, Mar 20, 2025 at 12:03:24AM -0700, Paul Eggert wrote:
> > How do you get a uintmax_t?
> 
> With a different function a2u that comes with a different struct (just like
> strtoi/strtou).

That's a problem.  For fundamental types it's clear what to use, but
say I'm parsing a time_t; I need a macro that calls the signed or the
unsigned variant as needed.

And if I need to declare a variable of a structure type where the
output will be written, I need to know which type it will be.  A union
could work, but then I need to know which member of the union I should
read.  It's not that easy.

> > Also, how do I perform range checks in that call?  I need to specify
> > min and max limits.
> 
> The caller should do the range checks, as this makes C code easier to
> follow, both by the human programmer and the compiler, which can more easily
> use the range information to generate better code.
> 
> For cases like these having a function do the range checks is often a
> mistake - it doesn't significantly increase reliability (on the contrary, it
> can reduce it). But if you prefer a function with range checks, it's easy to
> add bounds arguments.

That's the point I wanted to get to, and it's not true in this case.
I found several bugs in groff's range checks after strtol(3), because
those checks are really hard to get right.  Consider the following code,
and focus only on the range checks (I'll ignore handling other errors).

	int   min, max;
	long  n;

	...
	max = cond ? some_value : INT_MAX;
	...

	errno = 0;
	n = strtol(s, NULL, 0);
	if (n > max)
		goto too_high;

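It looks reasonable, right?  Well, if max == INT_MAX == LONG_MAX (as on
platforms where int and long have the same width), strtol(3) saturates
any larger input to LONG_MAX, and the test above can never trigger.  The
correct check is the following: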


	errno = 0;
	n = strtol(s, NULL, 0);
	if ((errno == ERANGE && n == LONG_MAX) || n > max)
		goto too_high;
	if ((errno == ERANGE && n == LONG_MIN) || n < min)
		goto too_low;

Not many programmers write that (I've seen this bug in several
projects).  Since this API already performs range checks and clamping
internally, it should do them fully:

	n = strtoi(s, NULL, 0, min, max, &status);
	if (status == ERANGE && n == max)
		goto too_high;
	if (status == ERANGE && n == min)
		goto too_low;

which is objectively simpler and less error-prone than with strtol(3),
and IMO, more readable.
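The failure mode can be reproduced directly with max == LONG_MAX, the value the code above ends up comparing against when INT_MAX == LONG_MAX.  A minimal sketch; naive_too_high() and careful_too_high() are hypothetical names for the two checks shown above:

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* The check as commonly written.  It misses overflow when
   max == LONG_MAX, because strtol() saturates the result to LONG_MAX. */
static bool
naive_too_high(const char *s, long max)
{
	return strtol(s, NULL, 0) > max;
}

/* The complete check: a saturated result is also too high. */
static bool
careful_too_high(const char *s, long max)
{
	long  n;

	errno = 0;
	n = strtol(s, NULL, 0);
	return (errno == ERANGE && n == LONG_MAX) || n > max;
}
```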

> > How do you know how much has been parsed on error?
> 
> Good point. I guess we'll need three elements to the structure, then.
> 
> At this point we're bikeshedding but you can have the last word if you like.

Not really.  My point is that an old and time-tested API is better than
a new invention, unless there are significant flaws in it.  strtoi(3)
has passed the test of time, and is free of the many flaws of strtol(3).
I would have designed it differently, but I don't think our concerns are
enough to justify an invention that might turn out worse in retrospect.

And considering that strtol(3) is really broken, I'd go for strtoi(3).


Have a lovely day!
Alex

-- 
<https://www.alejandro-colomar.es/>

* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-19 23:12             ` Alejandro Colomar
                                 ` (2 preceding siblings ...)
  2025-03-19 23:52               ` nullability of status parameter in strtoi/u(3) Alejandro Colomar
@ 2025-03-20 12:44               ` Bruno Haible
  2025-03-20 12:55                 ` Alejandro Colomar
  2025-03-20 14:26               ` Bruno Haible
  4 siblings, 1 reply; 43+ messages in thread
From: Bruno Haible @ 2025-03-20 12:44 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

Alejandro Colomar wrote:
> > At the same time, disallowing a leading '-' sign
> > is a benefit as well. I consider it a misfeature that strtoul() parses
> > "-3" successfully and returns ULONG_MAX-2, which was most certainly
> > not intended by the user.
> 
> Agree; it is a misfeature.  ...
> 
> I wonder if there's any legitimate user of that misfeature.

I don't think there is. Callers who wish to accept a leading '-' sign
can call strtol() and cast the result to 'unsigned long'.
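The misfeature and the alternative suggested above can be compared directly: strtoul() negates "-3" in unsigned arithmetic and returns ULONG_MAX - 2, and strtol() followed by an explicit conversion produces the same value, while values above LONG_MAX stay out of reach.  A minimal sketch; via_strtoul() and via_strtol() are hypothetical names:

```c
#include <limits.h>
#include <stdlib.h>

/* strtoul() accepts a leading '-' and negates the result modulo
   ULONG_MAX + 1, so "-3" is returned as ULONG_MAX - 2. */
static unsigned long
via_strtoul(const char *s)
{
	return strtoul(s, NULL, 10);
}

/* The alternative: parse as signed, then convert explicitly.  The
   conversion wraps the same way, but inputs above LONG_MAX saturate
   to LONG_MAX. */
static unsigned long
via_strtol(const char *s)
{
	return (unsigned long) strtol(s, NULL, 10);
}
```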

Bruno




* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-20 12:44               ` alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD Bruno Haible
@ 2025-03-20 12:55                 ` Alejandro Colomar
  2025-03-20 17:18                   ` Thorsten Glaser
  0 siblings, 1 reply; 43+ messages in thread
From: Alejandro Colomar @ 2025-03-20 12:55 UTC (permalink / raw)
  To: Bruno Haible
  Cc: liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

Hi Bruno,

On Thu, Mar 20, 2025 at 01:44:49PM +0100, Bruno Haible wrote:
> > > At the same time, disallowing a leading '-' sign
> > > is a benefit as well. I consider it a misfeature that strtoul() parses
> > > "-3" successfully and returns ULONG_MAX-2, which was most certainly
> > > not intended by the user.
> > 
> > Agree; it is a misfeature.  ...
> > 
> > I wonder if there's any legitimate user of that misfeature.
> 
> I don't think there is. Callers who wish to accept a leading '-' sign
> can call strtol() and cast the result to 'unsigned long'.

Yeah.  The only issue with that is not having the full range of
uintmax_t.  But I still think it's a misfeature.


Have a lovely day!
Alex

-- 
<https://www.alejandro-colomar.es/>

* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-19 23:12             ` Alejandro Colomar
                                 ` (3 preceding siblings ...)
  2025-03-20 12:44               ` alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD Bruno Haible
@ 2025-03-20 14:26               ` Bruno Haible
  2025-03-20 14:54                 ` Alejandro Colomar
  4 siblings, 1 reply; 43+ messages in thread
From: Bruno Haible @ 2025-03-20 14:26 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

Alejandro Colomar wrote:
> > 2) nproc.c, omp_init.c
> > 
> > > >   gnulib/lib/nproc.c:402
> > > 
> > > lib/nproc.c-383-/* Parse OMP environment variables without dependence on OMP.
> > > lib/nproc.c-384-   Return 0 for invalid values.  */
> > > lib/nproc.c-385-static unsigned long int
> > > lib/nproc.c:386:parse_omp_threads (char const* threads)
> > > lib/nproc.c-387-{
> > > 
> > > ...
> > > 
> > > lib/nproc.c-398-  /* Convert it from positive decimal to 'unsigned long'.  */
> > > lib/nproc.c-399-  if (c_isdigit (*threads))
> > > lib/nproc.c-400-    {
> > > lib/nproc.c-401-      char *endptr = NULL;
> > > lib/nproc.c:402:      unsigned long int value = strtoul (threads, &endptr, 10);
> > > lib/nproc.c-403-
> > > lib/nproc.c-404-      if (endptr != NULL)
> > > lib/nproc.c-405-        {
> > > lib/nproc.c-406-          while (*endptr != '\0' && c_isspace (*endptr))
> > > lib/nproc.c-407-            endptr++;
> > > lib/nproc.c-408-          if (*endptr == '\0')
> > > lib/nproc.c-409-            return value;
> > > lib/nproc.c-410-          /* Also accept the first value in a nesting level,
> > > lib/nproc.c-411-             since we can't determine the nesting level from env vars.  */
> > > lib/nproc.c-412-          else if (*endptr == ',')
> > > lib/nproc.c-413-            return value;
> > > lib/nproc.c-414-        }
> > > lib/nproc.c-415-    }
> > > ...
> > > This is one case where you seem to silently ignore saturation.
> > 
> > This is on purpose. A CPU never has more than 1000 processors. Therefore
> > all values > 1000 should be treated the same way, whether they are
> > <= ULONG_MAX or > ULONG_MAX. Clamping to the number of actually available
> > processors occurs later in the code.
> 
> I guess you refer to the MIN() calls within num_processors(), right?
> Why do those clampings not result in diagnostics?  Couldn't you
> calculate the limits before parsing the actual number, and so use them
> to perform the range checks during the strtou(3) call?

The essential code is like this:

  unsigned long int omp_env_limit = ULONG_MAX;

  if (query == NPROC_CURRENT_OVERRIDABLE)
    {
      omp_env_limit = parse_omp_threads (getenv ("OMP_THREAD_LIMIT"));
      if (! omp_env_limit)
        omp_env_limit = ULONG_MAX;
      ...
      query = NPROC_CURRENT;
    }
  unsigned long nprocs = num_processors_ignoring_omp (query);
  return MIN (nprocs, omp_env_limit);

and num_processors_ignoring_omp (query) is expensive to compute, since
it makes system calls. For this reason, we do the parsing of environment
variables first; this gives us the opportunity to optimize away the
system calls in particular circumstances.[1]

> > 3) msgfmt.c
> > 
> > > >   gettext/gettext-tools/src/msgfmt.c:287
> > > 
> > > gettext-tools/src/msgfmt.c-286-          char *endp;
> > > gettext-tools/src/msgfmt.c:287:          size_t new_align = strtoul (optarg, &endp, 0);
> > > gettext-tools/src/msgfmt.c-288-
> > > gettext-tools/src/msgfmt.c-289-          if (endp != optarg)
> > > gettext-tools/src/msgfmt.c-290-            alignment = new_align;
> > > ...
> > > Why don't you have a diagnostic message for invalid input?
> > 
> > Because this option is meant for users who know what an "alignment" is.
> 
> But still, you don't need to actively ignore ERANGE, especially when it
> results in more complex code than actually checking.  Once the API gives
> you the check for free, you should probably use it.
> 
> In this case, I think I'd do
> 
> 	int     status;
> 	size_t  new_align;
> 
> 	new_align = strtou(optarg, NULL, 0, 1, 16, &status);
> 	if (status == 0)
> 		alignment = new_align;
> 
> Don't you think this is more robust and just as simple?
> ...
> strtoul(3) makes it difficult for you to do the check,
> not because you actively want to not check.

Yes and no. Yes, it would be nice to have a range check at essentially
zero cost. No, clamping is not the right behaviour: an alignment of 5 or
17 should be rejected, not mapped to something valid.
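The two rejected values above need different checks: 17 lies outside [1, 16], while 5 lies inside it and must still be rejected, which a bounds-only call such as strtou(optarg, NULL, 0, 1, 16, &status) cannot express.  Assuming, for illustration only, that the valid alignments are the powers of two from 1 to 16 (an assumption, not gettext's documented rule), both checks could be written with strtoul(3) as follows; parse_alignment() is a hypothetical name:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical parse_alignment(): accept only powers of two in
   [1, 16], rejecting everything else instead of clamping it. */
static bool
parse_alignment(const char *s, size_t *align)
{
	char           *end;
	unsigned long  n;

	n = strtoul(s, &end, 0);
	if (end == s || *end != '\0')
		return false;   /* nothing parsed, or trailing text */
	if (n < 1 || n > 16)
		return false;   /* out of range; saturated input lands here too */
	if ((n & (n - 1)) != 0)
		return false;   /* in range but not a power of two, e.g. 5 */
	*align = n;
	return true;
}
```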

> > 4) msgl-check.c
> > 
> > > >   gettext/gettext-tools/src/msgl-check.c:379
> > > 
> > > gettext-tools/src/msgl-check.c-374-          while (*nplurals != '\0' && c_isspace ((unsigned char) *nplurals))
> > > gettext-tools/src/msgl-check.c-375-            ++nplurals;
> > > gettext-tools/src/msgl-check.c-376-          endp = nplurals;
> > > gettext-tools/src/msgl-check.c-377-          nplurals_value = 0;
> > > gettext-tools/src/msgl-check.c-378-          if (*nplurals >= '0' && *nplurals <= '9')
> > > gettext-tools/src/msgl-check.c:379:            nplurals_value = strtoul (nplurals, (char **) &endp, 10);
> > > gettext-tools/src/msgl-check.c-380-          if (nplurals == endp)
> > > gettext-tools/src/msgl-check.c-381-            {
> > > gettext-tools/src/msgl-check.c-382-              const char *msg = _("invalid nplurals value");
> > > gettext-tools/src/msgl-check.c-383-              char *help = plural_help (nullentry);
> > > gettext-tools/src/msgl-check.c-384-
> > > gettext-tools/src/msgl-check.c-385-              if (help != NULL)
> > > gettext-tools/src/msgl-check.c-386-                {
> > > gettext-tools/src/msgl-check.c-387-                  char *msgext = xasprintf ("%s\n%s", msg, help);
> > > gettext-tools/src/msgl-check.c-388-                  xeh->xerror (CAT_SEVERITY_ERROR, header, NULL, 0, 0, true,
> > > gettext-tools/src/msgl-check.c-389-                               msgext);
> > > gettext-tools/src/msgl-check.c-390-                  free (msgext);
> > > gettext-tools/src/msgl-check.c-391-                  free (help);
> > > gettext-tools/src/msgl-check.c-392-                }
> > > gettext-tools/src/msgl-check.c-393-              else
> > > gettext-tools/src/msgl-check.c-394-                xeh->xerror (CAT_SEVERITY_ERROR, header, NULL, 0, 0, false,
> > > gettext-tools/src/msgl-check.c-395-                             msg);
> > > gettext-tools/src/msgl-check.c-396-
> > > gettext-tools/src/msgl-check.c-397-              seen_errors++;
> > > gettext-tools/src/msgl-check.c-398-            }
> > 
> > > On the other hand, I also wonder why you don't diagnose invalid input.
> > > Why is -10 an "invalid nplurals value"
> > 
> > "-10" designates a negative number. For the number of plural forms, that's
> > invalid.
> > 
> > > but ULONG_MAX+10 is a valid (albeit clamped) one?
> > 
> > Again, as mentioned above, values like 1000 and 10000000000000000 should
> > be treated the same way. Since ULONG_MAX+10 gets clamped to ULONG_MAX, that's
> > just fine.
> 
> I see in line 433 a check
> 
> 	if (min_nplurals < nplurals_value)
> 
> And in line 449 a check
> 
> 	else if (max_nplurals > nplurals_value)
> 
> nplurals_value was parsed via strtoul(3) in line 379.
> 
> And min_plurals and max_plurals were assigned in lines 293 and lines 298.
> Which makes me wonder: why don't you move the tests from lines 4** into
> the strtou(3) call?
> 
> That is, why not call this?
> 
> 	nplurals_value = strtou(nplurals, NULL, 10, min_nplurals, max_nplurals, &status);
> 
> And then have the ERANGE handling there?

For two reasons:

  - The program's logic is to parse simple things first and complicated things
    afterwards. The user is likely to make a mistake in the simple things with
    a lower probability than in the complicated things. When there is a
    mismatch, the likely culprit is a mistake in the complicated things; the
    diagnostics need to be directed at this scenario.

  - It feels strange to move complicated validation logic into a library
    function.

In summary, while the built-in range checks are welcome for things like
port numbers (0..65535), in this case it's better to keep the program's logic
explicit and situation-aware.

> In cases 2 and 4, I think you'd benefit from refactoring the code to use
> the limits in the strtou(3) call.  Especially in case 4 where you
> already know the limits.

No. As the author of that code, I disagree.

Bruno

[1] https://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=205078f891d87e0e966ad8616fdf0306437f0370




* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-20 14:26               ` Bruno Haible
@ 2025-03-20 14:54                 ` Alejandro Colomar
  0 siblings, 0 replies; 43+ messages in thread
From: Alejandro Colomar @ 2025-03-20 14:54 UTC (permalink / raw)
  To: Bruno Haible
  Cc: liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

Hi Bruno,

On Thu, Mar 20, 2025 at 03:26:46PM +0100, Bruno Haible wrote:
> > > 2) nproc.c, omp_init.c
> > I guess you refer to the MIN() calls within num_processors(), right?
> > Why do those clampings not result in diagnostics?  Couldn't you
> > calculate the limits before parsing the actual number, and so use them
> > to perform the range checks during the strtou(3) call?
> 
> The essential code is like this:
> 
>   unsigned long int omp_env_limit = ULONG_MAX;
> 
>   if (query == NPROC_CURRENT_OVERRIDABLE)
>     {
>       omp_env_limit = parse_omp_threads (getenv ("OMP_THREAD_LIMIT"));
>       if (! omp_env_limit)
>         omp_env_limit = ULONG_MAX;
>       ...
>       query = NPROC_CURRENT;
>     }
>   unsigned long nprocs = num_processors_ignoring_omp (query);
>   return MIN (nprocs, omp_env_limit);
> 
> and num_processors_ignoring_omp (query) is expensive to compute, since
> it makes system calls. For this reason, we do the parsing of environment
> variables first; this gives us the opportunity to optimize away the
> system calls in particular circumstances.[1]

Okay, I guess for such an exceptional optimization, you could live with

        value = strtou_noneg(threads, &end, 10, 0, ULONG_MAX, NULL);
        if (end != threads) {
                end += strspn(end, " \t\n");
                if (streq(end, ""))
                        return value;
                if (strprefix(end, ","))
                        return value;
        }

since 'end != threads' means "something was parsed".

It deviates from testing 'status' for error handling, but you're
optimizing, so it's expected that you'll not do the obvious thing.  That
keeps usage for most people reasonable.  And remember that I have seen
zero cases of such calls in existing code calling strtoi(3)/strtou(3).

> > > 3) msgfmt.c
> > > 
> > > > >   gettext/gettext-tools/src/msgfmt.c:287
> > > > 
> > > > gettext-tools/src/msgfmt.c-286-          char *endp;
> > > > gettext-tools/src/msgfmt.c:287:          size_t new_align = strtoul (optarg, &endp, 0);
> > > > gettext-tools/src/msgfmt.c-288-
> > > > gettext-tools/src/msgfmt.c-289-          if (endp != optarg)
> > > > gettext-tools/src/msgfmt.c-290-            alignment = new_align;
> > > > ...
> > > > Why don't you have a diagnostic message for invalid input?
> > > 
> > > Because this option is meant for users who know what an "alignment" is.
> > 
> > But still, you don't need to actively ignore ERANGE, especially when it
> > results in more complex code than actually checking.  Once the API gives
> > you the check for free, you should probably use it.
> > 
> > In this case, I think I'd do
> > 
> > 	int     status;
> > 	size_t  new_align;
> > 
> > 	new_align = strtou(optarg, NULL, 0, 1, 16, &status);
> > 	if (status == 0)
> > 		alignment = new_align;
> > 
> > Don't you think this is more robust and just as simple?
> > ...
> > strtoul(3) makes it difficult for you to do the check,
> > not because you actively want to not check.
> 
> Yes and no. Yes, it would be nice to have a range check at essentially
> zero cost.

Good.

> No, clamping is not the right behaviour: an alignment of 5 or
> 17 should be rejected, not mapped to something valid.

I'm not clamping there; I'm rejecting the value.  See that I'm only
executing 'alignment = ...' under 'if (status == 0)'.

Or, rather than rejecting it, I'm ignoring it, but that was already
being done for invalid input (e.g., -1).  What I've done is to treat 17
just like -1.

Compare the current code:

	case 'a':
	  {
	    char *endp;
	    size_t new_align = strtoul (optarg, &endp, 0);

	    if (endp != optarg)
	      alignment = new_align;
	  }
	  break;


with my suggested code:

	case 'a':
		{
			int     status;
			size_t  new_align;

			new_align = strtou(optarg, NULL, 0, 1, 16, &status);
			if (status == 0)
				alignment = new_align;
		}
		break;

> > > 4) msgl-check.c
> > > 
> > > > >   gettext/gettext-tools/src/msgl-check.c:379
> > 
> > I see in line 433 a check
> > 
> > 	if (min_nplurals < nplurals_value)
> > 
> > And in line 449 a check
> > 
> > 	else if (max_nplurals > nplurals_value)
> > 
> > nplurals_value was parsed via strtoul(3) in line 379.
> > 
> > And min_plurals and max_plurals were assigned in lines 293 and lines 298.
> > Which makes me wonder: why don't you move the tests from lines 4** into
> > the strtou(3) call?
> > 
> > That is, why not call this?
> > 
> > 	nplurals_value = strtou(nplurals, NULL, 10, min_nplurals, max_nplurals, &status);
> > 
> > And then have the ERANGE handling there?
> 
> For two reasons:
> 
>   - The program's logic is to parse simple things first and complicated things
>     afterwards. The user is likely to make a mistake in the simple things with
>     a lower probability than in the complicated things. When there is a
>     mismatch, the likely culprit is a mistake in the complicated things; the
>     diagnostics need to be directed at this scenario.
> 
>   - It feels strange to move complicated validation logic into a library
>     function.

Then it's your choice.  You can again test (nplurals == end) to make
sure that "something was parsed", if you want to delay the actual range
checks.

BTW, manual checks like n>max are brittle.  Consider a user writing
ULONG_MAX + 1 in the string: strtoul(3) clamps it to ULONG_MAX, and if
for some reason your max value is also ULONG_MAX, the out-of-range input
is silently accepted as ULONG_MAX.  I've seen that bug in several places
in groff, and I don't remember whether I've seen it elsewhere.

In general, not checking ERANGE right after a strtoul(3) call is prone
to this bug.  The fact that strtou(3) forces you to specify a range is a
good reminder that you should not delay your checks unless you know what
you're doing.

> In summary, while the built-in range checks are welcome for things like
> port numbers (0..65535), in this case it's better to keep the program's logic
> explicit and situation-aware.
> 
> > In cases 2 and 4, I think you'd benefit from refactoring the code to use
> > the limits in the strtou(3) call.  Especially in case 4 where you
> > already know the limits.
> 
> No. As the author of that code, I disagree.
> 
> Bruno
> 
> [1] https://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=205078f891d87e0e966ad8616fdf0306437f0370

Have a lovely day!
Alex

-- 
<https://www.alejandro-colomar.es/>

* alx-0008r2 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-18 13:54 ` alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD Alejandro Colomar
  2025-03-18 21:16   ` Alejandro Colomar
  2025-03-18 21:53   ` Bruno Haible
@ 2025-03-20 16:13   ` Alejandro Colomar
  2 siblings, 0 replies; 43+ messages in thread
From: Alejandro Colomar @ 2025-03-20 16:13 UTC (permalink / raw)
  To: liba2i, sc22wg14
  Cc: libbsd, tech-misc, Bruno Haible, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn,
	наб

Hi!

Message-ID: <ywcuccsuwvssekftw6hlsv6umxbd6qoxdiwxb54vcyeb65vkfk@mpxoyhjuri22>

Here's revision r2.  The main changes are the addition of the wchar_t
variants and allowing status to be NULL.  I've also added mentions of
the two bug reports currently open in NetBSD against these APIs.

To address comments saying that the error handling of this API is
suboptimal: I disagree.  I've already replied to those comments with
specific details.  An innovative error handling system would be worse
than this.  And the existing strtol(3) is certainly worse.


Cheers,
Alex

---
Name
	alx-0008r2 - Standardize strtoi(3) and strtou(3) from NetBSD

Principles
	-  Codify existing practice to address evident deficiencies.
	-  Enable secure programming

Category
	Standardize existing libc APIs

Author
	Alejandro Colomar <alx@kernel.org>

	Cc: <liba2i@lists.linux.dev>
	Cc: <libbsd@lists.freedesktop.org>
	Cc: <sc22wg14@open-std.org>
	Cc: <tech-misc@netbsd.org>
	Cc: Bruno Haible <bruno@clisp.org>
	Cc: christos <christos@netbsd.org>
	Cc: Đoàn Trần Công Danh <congdanhqx@gmail.com>
	Cc: Paul Eggert <eggert@cs.ucla.edu>
	Cc: Eli Schwartz <eschwartz93@gmail.com>
	Cc: Guillem Jover <guillem@hadrons.org>
	Cc: Iker Pedrosa <ipedrosa@redhat.com>
	Cc: Joseph Myers <josmyers@redhat.com>
	Cc: Michael Vetter <jubalh@iodoru.org>
	Cc: Robert Elz <kre@netbsd.org>
	Cc: <riastradh@NetBSD.org>
	Cc: Sam James <sam@gentoo.org>
	Cc: "Serge E. Hallyn" <serge@hallyn.com>
	Cc: наб <nabijaczleweli@nabijaczleweli.xyz>

History
	<https://www.alejandro-colomar.es/src/alx/alx/wg14/alx-0008.git/>

	r0 (2025-03-18):
	-  Initial draft.

	r1 (2025-03-18):
	-  Add 'Future directions' section.
	-  Fix typos.
	-  Move to <inttypes.h> (7.8 instead of 7.24).
	-  Add links to more NetBSD bug reports in 'See also'.
	-  Add link to n3183 (discussed in Strasbourg) in 'See also'.
	-  Specify the possible implementation-defined behaviors when
	   the base is a value not specified here.
	-  Specify that the range coercion is done with saturation.
	-  Specify that if min>max, these functions return an
	   unspecified value.
	-  Add ECANCELED, EINVAL, ENOTSUP to <errno.h> (7.5).
	-  Note that in the future we'll want to make this
	   const-generic.
	-  Add example.
	-  Add implementation.

	r2 (2025-03-20):
	-  Add Caveats section.
	-  Rename rstatus => status.
	-  Allow 'status' to be NULL.
	-  Add links to 'See also'.
	-  Add wchar_t variants.

Description
	The strtol(3) family of functions is so damn hard to use
	correctly.  Only a handful of programmers in the world really
	know how to use it correctly in all the corner cases, and even
	those need to be really careful to not make mistakes.

	Several projects have tried to develop successor APIs, of
	which the only one generic enough to supersede strtol(3) et al.
	is strtoi/u(3) from NetBSD.

	Other APIs include OpenBSD's strtonum(3), but that API isn't
	generic, and cannot replace every use of strtol(3).  gnulib also
	has some attempts to improve the situation, but they're not
	suitable for standardization either.

	strtoi/u(3) originally had a bug, which shows how difficult it
	is to correctly wrap strto{i,u}max(3) (from the strtol(3)
	family).  That bug has been fixed, and after two years of
	research into string-to-numeric APIs, I can conclude that it is
	a net improvement over the existing APIs, and doesn't have any
	significant flaws.

	It is still not the ideal API in terms of type safety, and I'm
	working on a library that provides safer wrappers.  However,
	such a library would still benefit from having strtoi/u(3) in
	the standard library, by being able to wrap around it.  And user
	programs would immediately benefit from being able to replace
	strtol(3) et al. by strtoi/u(3).

	I have audited several projects which use strtol(3) et al., and
	they're full of bugs.  It's an API that we should really
	deprecate some day.

Prior art
	NetBSD provides strto{i,u}(3), which were introduced in
	NetBSD 7.

	libbsd ports these APIs to other POSIX systems.

	shadow-utils has its own implementation for internal use.

	Here's a possible implementation of strtoi(3):

		intmax_t
		strtoi(const char *s, char **restrict endp, int base,
		    intmax_t min, intmax_t max, int *restrict status)
		{
			int        e, st;
			char       *end;
			intmax_t   n;

			if (endp == NULL)
				endp = &end;
			if (status == NULL)
				status = &st;

			if (base != 0 && (base < 2 || base > 36)) {
				*endp = (char *) s;
				*status = EINVAL;
				return MAX(min, MIN(max, 0));
			}

			e = errno;
			errno = 0;

			n = strtoimax(s, endp, base);

			if (*endp == s)
				*status = ECANCELED;
			else if (errno == ERANGE || n < min || n > max)
				*status = ERANGE;
			else if (**endp != '\0')
				*status = ENOTSUP;
			else
				*status = 0;

			errno = e;

			return MAX(min, MIN(max, n));
		}

	strtou(3) can be implemented with the same exact code, replacing
	s/intmax_t/uintmax_t/, and s/strtoimax/strtoumax/.
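	The implementation above uses MIN() and MAX(), which ISO C does
	not define, and describes strtou(3) only as a substitution.  A
	compilable sketch with both filled in follows.  The sketch_
	prefix is hypothetical, chosen to avoid clashing with the NetBSD
	and libbsd declarations, and ECANCELED and ENOTSUP are taken
	from POSIX <errno.h> until the wording below adds them to ISO C.

```c
#include <errno.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>

/* ISO C provides no MIN()/MAX(); these evaluate their arguments twice. */
#define MIN(a, b)  ((a) < (b) ? (a) : (b))
#define MAX(a, b)  ((a) > (b) ? (a) : (b))

static intmax_t
sketch_strtoi(const char *s, char **restrict endp, int base,
    intmax_t min, intmax_t max, int *restrict status)
{
	int       e, st;
	char      *end;
	intmax_t  n;

	if (endp == NULL)
		endp = &end;
	if (status == NULL)
		status = &st;

	if (base != 0 && (base < 2 || base > 36)) {
		*endp = (char *) s;
		*status = EINVAL;
		return MAX(min, MIN(max, 0));
	}

	e = errno;              /* preserve the caller's errno */
	errno = 0;
	n = strtoimax(s, endp, base);

	if (*endp == s)
		*status = ECANCELED;    /* no digits parsed */
	else if (errno == ERANGE || n < min || n > max)
		*status = ERANGE;       /* result is clamped below */
	else if (**endp != '\0')
		*status = ENOTSUP;      /* trailing text */
	else
		*status = 0;

	errno = e;
	return MAX(min, MIN(max, n));
}

/* The same code with s/intmax_t/uintmax_t/ and s/strtoimax/strtoumax/. */
static uintmax_t
sketch_strtou(const char *s, char **restrict endp, int base,
    uintmax_t min, uintmax_t max, int *restrict status)
{
	int        e, st;
	char       *end;
	uintmax_t  n;

	if (endp == NULL)
		endp = &end;
	if (status == NULL)
		status = &st;

	if (base != 0 && (base < 2 || base > 36)) {
		*endp = (char *) s;
		*status = EINVAL;
		return MAX(min, MIN(max, 0));
	}

	e = errno;
	errno = 0;
	n = strtoumax(s, endp, base);

	if (*endp == s)
		*status = ECANCELED;
	else if (errno == ERANGE || n < min || n > max)
		*status = ERANGE;
	else if (**endp != '\0')
		*status = ENOTSUP;
	else
		*status = 0;

	errno = e;
	return MAX(min, MIN(max, n));
}
```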

    wchar_t
	NetBSD doesn't provide a wchar_t variant of these functions.

Caveats
    strtou_nn()
	strtou(3) leaves one issue of strtoul(3) unfixed: negative
	values are converted to huge positive values by modulo
	arithmetic, before performing range checks.

	For this, I personally use a wrapper, strtou_noneg(), which
	rejects any negative values.  It might be interesting to add it
	to the standard too, since most callers of strtou() really want
	to avoid negative values.  We could call it strtou_nn().  Here's
	a possible implementation:

		uintmax_t
		strtou_nn(const char *s, char **restrict endp, int base,
		    uintmax_t min, uintmax_t max, int *restrict status)
		{
			int  st;

			if (status == NULL)
				status = &st;
			if (strtoi(s, endp, base, 0, 1, status) == 0
			    && *status == ERANGE)
			{
				return min;
			}

			return strtou(s, endp, base, min, max, status);
		}

	Another possibility is to change strtou(3) to reject negative
	numbers.  I've proposed this possibility to NetBSD:
	<https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=59198>
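	To make the first caveat concrete: strtoumax(3), which strtou(3)
	wraps, converts "-1" to UINTMAX_MAX, so for any max below
	UINTMAX_MAX strtou(3) reports it as ERANGE and clamps it to max
	instead of flagging a negative number.  Worse, "-" followed by
	the digits of UINTMAX_MAX wraps around to 1 without ERANGE, so
	with the implementation under 'Prior art', a range such as
	[0, 10] accepts it with status 0.  A minimal sketch reproducing
	both cases with plain strtoumax(); parse_umax() and
	format_minus_umax() are hypothetical helpers.

```c
#include <errno.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Parse s in base 10 with strtoumax(), the conversion that strtou(3)
   wraps, and report the resulting errno through *err. */
static uintmax_t
parse_umax(const char *s, int *err)
{
	uintmax_t  n;

	errno = 0;
	n = strtoumax(s, NULL, 10);
	*err = errno;
	return n;
}

/* Write "-" followed by the decimal digits of UINTMAX_MAX into buf. */
static void
format_minus_umax(char *buf, size_t size)
{
	snprintf(buf, size, "-%ju", UINTMAX_MAX);
}
```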

    status
	The status parameter can be NULL in the NetBSD implementation,
	but from what I could find, there are no users of this feature.
	It would make sense to have a narrower contract where it cannot
	be NULL.  I've proposed this to NetBSD:
	<https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=59199>

Future directions
    atoi(3), scanf(3)
	The atoi(3) family of functions has unnecessary UB.  It could be
	removed by redefining it in terms of this API:

		int
		atoi(const char *s)
		{
			int  n, e;

			n = strtoi(s, NULL, 10, _Minof(n), _Maxof(n), &e);
			errno = e ?: errno;

			return n;
		}

	This would make atoi(3) behave just as one would expect.
	Then we could define scanf(3)'s %d et al. in terms of atoi(3).
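	As a compilable sketch of that idea: the function below is named
	atoi_sketch() (hypothetical) so that it doesn't clash with the C
	library's atoi(), it uses INT_MIN and INT_MAX in place of the
	proposed _Minof/_Maxof operators, and it replaces the GNU ?:
	extension with a plain if.  strtoi() is the possible
	implementation given above, with assumed MIN()/MAX() helpers.

```c
#include <errno.h>
#include <inttypes.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed helpers; not part of ISO C. */
#define MIN(a, b)  ((a) < (b) ? (a) : (b))
#define MAX(a, b)  ((a) > (b) ? (a) : (b))

intmax_t
strtoi(const char *s, char **restrict endp, int base,
    intmax_t min, intmax_t max, int *restrict status)
{
	int       e, st;
	char      *end;
	intmax_t  n;

	if (endp == NULL)
		endp = &end;
	if (status == NULL)
		status = &st;
	if (base != 0 && (base < 2 || base > 36)) {
		*endp = (char *) s;
		*status = EINVAL;
		return MAX(min, MIN(max, 0));
	}
	e = errno;
	errno = 0;
	n = strtoimax(s, endp, base);
	if (*endp == s)
		*status = ECANCELED;
	else if (errno == ERANGE || n < min || n > max)
		*status = ERANGE;
	else if (**endp != '\0')
		*status = ENOTSUP;
	else
		*status = 0;
	errno = e;
	return MAX(min, MIN(max, n));
}

/* Hypothetical stand-in for a redefined atoi(3). */
int
atoi_sketch(const char *s)
{
	int  n, e;

	/* The result is saturated into [INT_MIN, INT_MAX], so no UB. */
	n = strtoi(s, NULL, 10, INT_MIN, INT_MAX, &e);
	if (e != 0)
		errno = e;  /* e.g., ERANGE or ENOTSUP */

	return n;
}
```

	For example, atoi_sketch("99999999999999999999") returns INT_MAX
	and sets errno to ERANGE, where the current atoi(3) has undefined
	behavior, and atoi_sketch("12abc") returns 12 and sets errno to
	ENOTSUP.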

    wchar_t
	It could be interesting to add a wchar-based variant of these
	APIs.

    locale_t
	It could be interesting to add a variant of these APIs that
	accepts a locale_t parameter instead of using the current
	locale.  Those APIs exist in NetBSD as strtoi_l(), strtou_l().

    _Generic
	Once something like Chris's n3510 (2025-02-27, "Enhanced type
	variance (v2)") is accepted into C2y, we could transform these
	functions to use QChar, thus transforming them into
	const-generic functions, as with the strtol(3) family of
	functions.

See also
	<https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=57828>
	<https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=58453>
	<https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=58461>
	<https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=59198>
	<https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=59199>
	<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3183.pdf>

Proposed wording
	Based on N3467.

    7.5  Errors <errno.h>
	@@ p2
	 The macros are
	+	ECANCELED
		EDOM
	+	EINVAL
		EILSEQ
	+	ENOTSUP
		ERANGE

    7.8.3  Functions for greatest-width integer types
	New section _before_ 7.8.3.3 (The strtoimax and strtoumax functions).

	While all of this section is new, some text is pasted verbatim
	from 7.24.2.8.  In the diff below, I'll write that text as if it
	already existed.

	I also renamed the parameters with respect to strtol(3):
	nptr => s	Because it's a string, not a pointer to a number.
	endptr => endp	It's shorter and just as readable (if not more).

	@@
	+7.8.2.*  The <b>strtoi</b> and <b>strtou</b> functions
	+
	+Synopsis
	+1	#include <inttypes.h>
	+	intmax_t strtoi(const char *restrict s, char **restrict endp,
	+	    int base, intmax_t min, intmax_t max, int *status);
	+	uintmax_t strtou(const char *restrict s, char **restrict endp,
	+	    int base, uintmax_t min, uintmax_t max, int *status);
	+
	+Description
	+2	The <b>strtoi</b> and <b>strtou</b> functions
		convert the initial portion of
		the string pointed to by <tt>s</tt>
	+	to <b>intmax_t</b> and <b>uintmax_t</b>,
		respectively.
		First,
		they decompose the input string into three parts:
		an initial, possibly empty, sequence of white-space characters,
		a subject sequence resembling an integer
		represented in some radix determined by the value of <tt>base</tt>,
		and a final string of one or more unrecognized characters,
		including the terminating null character of the input string.
	+	Then,
		they attempt to convert the subject sequence to an integer.
	+	Then,
	+	they coerce with saturation
	+	the integer into the range [min, max].
	+	Finally,
		they return the result.
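	To make "coerce with saturation" concrete, here is a sketch of the
	clamping step on its own, using a hypothetical helper name.  It
	evaluates the same MAX(min, MIN(max, n)) expression as the
	possible implementation earlier in this proposal.

```c
#include <stdint.h>

/* Hypothetical helper: coerce n into [min, max] with saturation. */
static inline intmax_t
saturate_imax(intmax_t n, intmax_t min, intmax_t max)
{
	intmax_t  m = (n < max) ? n : max;  /* MIN(max, n) */

	return (m > min) ? m : min;          /* MAX(min, m) */
}
```

	For example, saturating 42 into [3, 7] gives 7, -5 gives 3, and 5
	is returned unchanged.  If min > max, this expression yields min,
	but p11 below leaves the returned value unspecified, so callers
	shouldn't rely on that.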

	Paste p3, p4, p5, p6 from 7.24.2.8, replacing the function and
	type names as appropriate.

	@@
	+7	If the value of <tt>base</tt> is different from
	+	the values specified in the preceding paragraphs,
	+	it is implementation-defined
	+	whether these functions successfully convert the value
	+	and in which manner.

	The above paragraph ensures that these functions have no
	input-controlled UB.  strtol(s, NULL, base) with a
	user-controlled base can result in UB, and thus vulnerabilities.
	It is trivial to report an error, so let's do it.  These
	functions are heavy enough that optimizing away this check is not
	worth it.  Even POSIX does this for strtol(3).
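	To show how cheap the check is, here is a sketch of a hypothetical
	wrapper, strtol_checked(), that rejects an unsupported base before
	it can reach strtol(3).  Reporting EINVAL through errno is a
	choice of this sketch, not something ISO C specifies for
	strtol(3).

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper: reject unsupported bases before strtol(3). */
long
strtol_checked(const char *restrict s, char **restrict endp, int base)
{
	if (base != 0 && (base < 2 || base > 36)) {
		if (endp != NULL)
			*endp = (char *) s;  /* no conversion performed */
		errno = EINVAL;
		return 0;
	}
	return strtol(s, endp, base);
}
```

	For example, strtol_checked("42", &end, 1) returns 0, sets errno
	to EINVAL, and leaves end pointing at the start of the string,
	which matches what p8 specifies for strtoi() and strtou() with an
	unsupported base.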

	@@
	 8	If the subject sequence is empty
		or does not have the expected form,
	+	or the value of <tt>base</tt> is not supported,
		no conversion is performed;
		the value of <tt>s</tt>
		is stored in the object pointed to by <tt>endp</tt>,
		provided that <tt>endp</tt> is not a null pointer.

	The above paragraph ensures that *endp can be read after a call
	to these functions.  strtol(3) doesn't provide enough guarantees
	to be able to reliably read it, even in POSIX, and it's hard to
	portably write code that calls it and can inspect *endp after
	the call without UB.

	@@
	 Returns
	+10	The <b>strtoi</b> and <b>strtou</b> functions
		return the converted value, if any.
		If no conversion could be performed,
	+	zero is coerced with saturation into the range,
	+	and then returned.

	The paragraph above doesn't mention the range of representable
	values (unlike 7.24.2.8) because that's already covered by the
	range coercion specified in p2 above.

	@@
	+11	If <tt>min > max</tt>,
	+	these functions return an unspecified value.

	The above paragraph covers the case where min>max, where the
	conversion with saturation into the range cannot do anything
	meaningful.  The error is still specified as ERANGE.

	@@
	+Errors
	+12	These functions do not set <b>errno</b>.
	+	Instead,
	+	and provided that <tt>status</tt> is not a null pointer,
	+	they set the object pointed to by <tt>status</tt>
	+	to an error code,
	+	or to zero on success.
	+
	+13	-- EINVAL	The value in <tt>base</tt> is not supported.
	+	-- ECANCELED	The given string did not contain
	+			any characters that were converted.
	+	-- ERANGE	The converted value was out of range
	+			and has been coerced,
	+			or the range was invalid (e.g., min > max).
	+	-- ENOTSUP	The given string contained characters
	+			that did not get converted.
	+
	+14	If several errors occur in the same call,
	+	the first one listed here is reported.

	The paragraph above is important to differentiate the following:
	strtoi("7z", &end, 0, 3, 7, &status);
	strtoi("42z", &end, 0, 3, 7, &status);
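	With the possible implementation given earlier (reproduced here so
	that the sketch compiles on its own, with assumed MIN()/MAX()
	helpers), both calls return 7 and leave end pointing at "z".  They
	differ only in status: ERANGE is checked before ENOTSUP, so "7z"
	reports ENOTSUP while "42z" reports ERANGE.

```c
#include <errno.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed helpers; not part of ISO C. */
#define MIN(a, b)  ((a) < (b) ? (a) : (b))
#define MAX(a, b)  ((a) > (b) ? (a) : (b))

intmax_t
strtoi(const char *s, char **restrict endp, int base,
    intmax_t min, intmax_t max, int *restrict status)
{
	int       e, st;
	char      *end;
	intmax_t  n;

	if (endp == NULL)
		endp = &end;
	if (status == NULL)
		status = &st;

	/* EINVAL is checked first, before any conversion. */
	if (base != 0 && (base < 2 || base > 36)) {
		*endp = (char *) s;
		*status = EINVAL;
		return MAX(min, MIN(max, 0));
	}

	e = errno;
	errno = 0;
	n = strtoimax(s, endp, base);

	/* The if-else chain reports only the first applicable error,
	   in the order ECANCELED, ERANGE, ENOTSUP. */
	if (*endp == s)
		*status = ECANCELED;
	else if (errno == ERANGE || n < min || n > max)
		*status = ERANGE;   /* "42z" stops here */
	else if (**endp != '\0')
		*status = ENOTSUP;  /* "7z" gets here */
	else
		*status = 0;

	errno = e;
	return MAX(min, MIN(max, n));
}
```

	Without p14, an implementation could report ENOTSUP for both
	calls, and the caller couldn't tell that 42 had been saturated
	to 7.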

	@@
	+15	EXAMPLE 1
	+	The following is an example of
	+	using these functions to parse a number
	+	and the string that follows.
	+
	+		int       err;
	+		char      *end;
	+		intmax_t  n, min = 5, max = 50;
	+
	+		n = strtoi(" 42 kg", &end, 10, min, max, &err);
	+		if (err != 0) {
	+			if (err == EINVAL || err == ECANCELED) {
	+				fprintf(stderr, "%s\n", strerror(err));
	+				exit(EXIT_FAILURE);
	+			}
	+			if (err == ERANGE && n == min)
	+				puts("Too light");
	+			if (err == ERANGE && n == max)
	+				puts("Too heavy");
	+		}
	+		printf("Quantity: %jd\n", n);
	+		if (err == ENOTSUP)
	+			printf("Units: %s\n", end + strspn(end, " \t"));
	+		else
	+			puts("Unitless?");

    7.32.4.2.1  Wide string numeric conversion functions :: General
	@@ p1
	 This subclause describes
	 wide string analogs of
	-the <b>strtod</b> family of functions (...).
	+the <b>strtoi</b> and <b>strtod</b> families of functions (...).

	Note to the editor: make sure to update the (...) correctly in
	the text above.

    7.32.4.2  Wide string numeric conversion functions
	New section after 7.32.4.2.1 (General).

	+7.32.4.2.1  The <b>wcstoi</b> and <b>wcstou</b> functions
	+
	+Synopsis
	+1	#include <wchar.h>
	+	intmax_t wcstoi(const wchar_t *restrict s, wchar_t **restrict endp,
	+	    int base, intmax_t min, intmax_t max, int *status);
	+	uintmax_t wcstou(const wchar_t *restrict s, wchar_t **restrict endp,
	+	    int base, uintmax_t min, uintmax_t max, int *status);
	+
	+Description
	+2	The <b>wcstoi</b> and <b>wcstou</b> functions
	+	are equivalent to
	+	<b>strtoi</b> and <b>strtou</b>,
	+	except that these handle wide strings.
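	Assuming p2's equivalence is meant literally, wcstoi() could be
	sketched by swapping strtoimax(3) for wcstoimax(3) (declared in
	<inttypes.h>) and comparing against L'\0'; wcstou() would follow
	the same pattern with wcstoumax(3).  MIN() and MAX() are again
	assumed helpers.

```c
#include <errno.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Assumed helpers; not part of ISO C. */
#define MIN(a, b)  ((a) < (b) ? (a) : (b))
#define MAX(a, b)  ((a) > (b) ? (a) : (b))

intmax_t
wcstoi(const wchar_t *restrict s, wchar_t **restrict endp, int base,
    intmax_t min, intmax_t max, int *restrict status)
{
	int       e, st;
	wchar_t   *end;
	intmax_t  n;

	if (endp == NULL)
		endp = &end;
	if (status == NULL)
		status = &st;

	if (base != 0 && (base < 2 || base > 36)) {
		*endp = (wchar_t *) s;
		*status = EINVAL;
		return MAX(min, MIN(max, 0));
	}

	e = errno;
	errno = 0;

	n = wcstoimax(s, endp, base);  /* wide counterpart of strtoimax */

	if (*endp == s)
		*status = ECANCELED;
	else if (errno == ERANGE || n < min || n > max)
		*status = ERANGE;
	else if (**endp != L'\0')
		*status = ENOTSUP;
	else
		*status = 0;

	errno = e;

	return MAX(min, MIN(max, n));
}
```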


-- 
<https://www.alejandro-colomar.es/>


* Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
  2025-03-20 12:55                 ` Alejandro Colomar
@ 2025-03-20 17:18                   ` Thorsten Glaser
  0 siblings, 0 replies; 43+ messages in thread
From: Thorsten Glaser @ 2025-03-20 17:18 UTC
  To: Alejandro Colomar
  Cc: Bruno Haible, liba2i, sc22wg14, libbsd, tech-misc, christos,
	Đoàn Trần Công Danh, Paul Eggert,
	Eli Schwartz, Guillem Jover, Iker Pedrosa, Michael Vetter,
	Robert Elz, riastradh, Sam James, Serge E. Hallyn

On Thu, 20 Mar 2025, Alejandro Colomar wrote:

>> I don't think there is. Callers who wish to accept a leading '-' sign
>> can call strtol() and cast the result to 'unsigned long'.
>
>Yeah.  The only issue with that is not having the full range of
>uintmax_t.  But I still think it's a misfeature.

The other issue is that -LONG_MAX-1 may not exist in the
signed data type (in ISO C, even with two’s complement now
prescribed).

Unlikely to matter in the POSIX case, of course.

bye,
//mirabilos
-- 
22:59⎜<Vutral> i think termkit is complicated | don't believe you work
any faster with it | sensory overload │ like windows │ all evil, too
many pictures │ like a game | 23:00⎜<Vutral> most people don't get
much more out of windows either | 23:01⎜<Vutral> picture books aren't
really widespread as adult literature either	‣ who needs GUIs thus?

Thread overview: 43+ messages
     [not found] <20250318142555.09A86356820@www.open-std.org>
2025-03-18 13:54 ` alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD Alejandro Colomar
2025-03-18 21:16   ` Alejandro Colomar
2025-03-18 21:53   ` Bruno Haible
2025-03-18 22:43     ` Alejandro Colomar
2025-03-19  0:15       ` Bruno Haible
2025-03-19 15:26         ` Alejandro Colomar
2025-03-19 18:48           ` Alejandro Colomar
2025-03-19 18:56             ` Alejandro Colomar
2025-03-19 21:59           ` Bruno Haible
2025-03-19 23:12             ` Alejandro Colomar
2025-03-19 23:30               ` strtou(3) handling of negative input Alejandro Colomar
2025-03-19 23:52               ` alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD Thorsten Glaser
2025-03-20  0:19                 ` Alejandro Colomar
2025-03-20  0:31                   ` Thorsten Glaser
2025-03-20  0:36                     ` Alejandro Colomar
2025-03-19 23:52               ` nullability of status parameter in strtoi/u(3) Alejandro Colomar
2025-03-20 12:44               ` alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD Bruno Haible
2025-03-20 12:55                 ` Alejandro Colomar
2025-03-20 17:18                   ` Thorsten Glaser
2025-03-20 14:26               ` Bruno Haible
2025-03-20 14:54                 ` Alejandro Colomar
2025-03-19 19:27         ` Paul Eggert
2025-03-19 20:05           ` Alejandro Colomar
2025-03-19 20:39             ` Paul Eggert
2025-03-19 21:23               ` Alejandro Colomar
2025-03-20  0:39                 ` Paul Eggert
2025-03-20  1:15                   ` Alejandro Colomar
2025-03-20  7:03                     ` Paul Eggert
2025-03-20 10:32                       ` Alejandro Colomar
2025-03-19 15:56       ` Thorsten Glaser
2025-03-19 16:25         ` Alejandro Colomar
2025-03-19 16:36           ` Thorsten Glaser
2025-03-19 16:53             ` Alejandro Colomar
2025-03-19 17:35           ` Bruno Haible
2025-03-19 18:01             ` Alejandro Colomar
2025-03-20 16:13   ` alx-0008r2 " Alejandro Colomar
2025-03-18 17:20 ` [SC22WG14.29900] alx-0008 " Joseph Myers
2025-03-18 20:18   ` Alejandro Colomar
     [not found]   ` <20250318201854.66AB5356895@www.open-std.org>
2025-03-18 21:11     ` [SC22WG14.29912] " Joseph Myers
2025-03-18 21:35       ` Alejandro Colomar
2025-03-18 21:40         ` Alejandro Colomar
2025-03-18 22:14         ` Joseph Myers
2025-03-18 22:49           ` Alejandro Colomar
