From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mo4-p01-ob.smtp.rzone.de (mo4-p01-ob.smtp.rzone.de [81.169.146.165]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 374621CAA81 for ; Wed, 19 Mar 2025 22:00:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=pass smtp.client-ip=81.169.146.165 ARC-Seal:i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742421630; cv=pass; b=XQYok7ZFIeyIsAQ2E07Jz9h29BO2OyYqEhJoNRjiIkn3eF9YwzlmzsSsZlLMu6r8FDVMOkM4zsBZDMobqnjDs9ij+FC85Zlz59c8X9MwBTOYH/EWXMmiGVbB3Z5+4+PAIm1eadkPDV8u3Tw9/aMK+Z48LTYrKPdeRcBsw6HxR8A= ARC-Message-Signature:i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742421630; c=relaxed/simple; bh=cjKHibTmxKKwoI8FFbuT+1z8efnzDtHGVFw3HLaPAIE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Q7TLf0C9KNEGhSXk+7Qf5WuWvnLSw2DfM2f8plkqSHDSmtuJBSjH+gZ52SdA+/FvfCKR+L4FFr56g502ntDAma8ex2e7VZGGc9vc1I+FQUS0VINTnRwfqgaH4GerZLZuspo6D5Th9kn0hzCJD6wm9c6sD1whWTrvFd6ziymJ8vU= ARC-Authentication-Results:i=2; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=clisp.org; spf=none smtp.mailfrom=clisp.org; dkim=pass (2048-bit key) header.d=clisp.org header.i=@clisp.org header.b=V8YV995o; dkim=permerror (0-bit key) header.d=clisp.org header.i=@clisp.org header.b=QuV3mH3M; arc=pass smtp.client-ip=81.169.146.165 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=clisp.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=clisp.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=clisp.org header.i=@clisp.org header.b="V8YV995o"; dkim=permerror (0-bit key) header.d=clisp.org header.i=@clisp.org header.b="QuV3mH3M" ARC-Seal: i=1; a=rsa-sha256; t=1742421568; cv=none; d=strato.com; s=strato-dkim-0002; b=CPzAy2sR2BC6x7EFHQVibuKEek/biS7iczxzEybLGPLrBYvLP2+zY401mvqW2C8zOy vkSPxd5pE8OII/uSsX4CvftF2g0cbm6Jk5M+ttz8lAyGRd/c39D6cNLvFTj+z78nM1ZB P2iM/RoGhHo9cA/QibTkhs2jTAIlkw73ulTUI+J/BuluEgJEl2UUEJCAMXlqtVZW4Br0 cOdPMBcdC3J+GzYk4qdONe7qBdNlPfYubc9z6YRhwXQR89G2vEXSt8741ArKN+GKnMAm wFsgftz6Y4he88vlIuEchz+l7V2V8qsMyjgeYgmJkXHcqYdeq30/ciSOkXJOzu69pvWc klGw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; t=1742421568; s=strato-dkim-0002; d=strato.com; h=References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Cc:Date: From:Subject:Sender; bh=LNiDarcTy411anrxlTz+d28ALYnNZuuc+PeZTjlmBTI=; b=eNE9QMu3sCBeTyDOyHMt/lpfhTCnpG3i+HULa8up0LYCum3aiEE52KvVjK8oR5L234 3Kw9v8yH0XvtJQOihu3vhO5kQ1/E/QoMFzqkaZdoGfEzkv4XF0GsXKNNAbGGLphzQ78H ec7O36lIw405nZtYZmQ5xtkD8kAijyyoV7SfRHX2FbbjMq8zIdVkFqVurhBRyVEq4JHs Pb543/S/8RcbjwMhimtayRj2D5E/4C7N24o7X+a38bEX3a2EO6D9T0toM0rQz7W1pI+F roQ/viL3IDAtJXlQrZ5rNQEfzAbJv5xnUHrde6UO+dwpjwJOeCpNR/26I0sFMooQhE2+ WfJA== ARC-Authentication-Results: i=1; strato.com; arc=none; dkim=none X-RZG-CLASS-ID: mo01 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; t=1742421568; s=strato-dkim-0002; d=clisp.org; h=References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Cc:Date: From:Subject:Sender; bh=LNiDarcTy411anrxlTz+d28ALYnNZuuc+PeZTjlmBTI=; b=V8YV995oFIpJBKaqaUJ+t91LqnNgNuNCzmhqRs1111VZgkzicd45d2qC/dX7lTiu+C VjXa1V6CAQS72yHF0nzqxqAJDrXE+B6zdUux1dBANPL+kvqq0cA3ra8GMSr/ODVGIVs8 JDKIehN07Xs6f3TQDaEZFvik4RUuHwquX2Ym4sjH7EUfhMN03c32iTFo5T/GIwfR4KFA fmYhNmBHt6fwPsc4yiBY1sxqdaO2hcTc3hjfPBTsxCBQk+UjcIG0MlhANFZWcMqIycxJ ACE4ZyHrOTpHPXqQWaAav/zJxv/px9dcxeDhJWUU9xAl5coqVvmFryWi7hR5GCZDpVVI vCRA== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; t=1742421568; s=strato-dkim-0003; d=clisp.org; h=References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Cc:Date: From:Subject:Sender; bh=LNiDarcTy411anrxlTz+d28ALYnNZuuc+PeZTjlmBTI=; b=QuV3mH3MNxhdP5LgDUKTaaxHjzzHXgPp3C/yth56LyMZ3SY+UsFuNRkE9xYJZ0sTLR GwEAeM4JnXsm37XBwfAw== X-RZG-AUTH: ":Ln4Re0+Ic/6oZXR1YgKryK8brlshOcZlLnY4jECd2hdUURIbZgL8PX2QiTuZ3cdB8X/nqjiQEm7gzaGYG3gikYmFw+Yd+lvO" Received: from nimes.localnet by smtp.strato.de (RZmta 51.3.0 AUTH) with ESMTPSA id N7dcf812JLxQ8NX (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256 bits)) (Client did not present a certificate); Wed, 19 Mar 2025 22:59:26 +0100 (CET) From: Bruno Haible To: Alejandro Colomar Cc: liba2i@lists.linux.dev, sc22wg14@open-std.org, libbsd@lists.freedesktop.org, tech-misc@netbsd.org, christos , =?utf-8?B?xJBvw6BuIFRy4bqnbiBDw7RuZw==?= Danh , Paul Eggert , Eli Schwartz , Guillem Jover , Iker Pedrosa , Michael Vetter , Robert Elz , riastradh@netbsd.org, Sam James , "Serge E. Hallyn" Subject: Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD Date: Wed, 19 Mar 2025 22:59:26 +0100 Message-ID: <4085563.2iPT33SAM4@nimes> Organization: GNU In-Reply-To: References: <3237498.fEcJ0Lxnt5@nimes> Precedence: bulk X-Mailing-List: liba2i@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="iso-8859-1" Hi Alejandro, > > > > It would be useful to show how a success test looks like, after > > > > strtoi (s, &end, base, min, max, &status) > > > > for each of the four frequent use-cases: > > > > -a. expect to parse the initial portion of the string, no coercio= n, > > > > -b. expect to parse the initial portion of the string, silent coe= rcion, > > > > -c. expect to parse the entire string, no coercion, > > > > -d. expect to parse the entire string, silent coercion. > > > >=20 > > > > AFAICS, the success tests are: > > > > -a. status =3D=3D 0 || status =3D=3D ENOTSUP > > >=20 > > > Correct. > > >=20 > > > > -b. status =3D=3D 0 || status =3D=3D ENOTSUP || status =3D=3D ERA= NGE > > >=20 > > > Correct (but most likely a bug). >=20 > Actually, now I remember that status can be NULL, in which case it's not > reported. This is a case where you could check for errors with a > simpler expression: >=20 > end !=3D str >=20 > but (status =3D=3D 0 || status =3D=3D ENOTSUP || status =3D=3D ERANGE) is= still a > reasnoable one. Unfortunately, with a comment like this, you make things more complicated, not simpler. I was hoping for an API where success can be determined by looking at 'status' in all four cases, and the "simple" solution that you are recommending now is: -a. look at status -b. look at end -c. look at status -d. look at status AND end. > I need to update the specification to mention that status can be NULL. Why do so? This adds text to the specification, making the specification more complex (=3D> longer to understand, harder to remember). The ability to pass NULL for rstatus is not a useful feature. > > > > -d. status =3D=3D 0 || (status =3D=3D ERANGE && end > s && *end = =3D=3D '\0') > > >=20 > > > You don't need end>s, because that would preclude ERANGE. > > >=20 > > > status =3D=3D 0 || (status =3D=3D ERANGE && end =3D=3D '\0') > > >=20 > > > Aaand, most likely a bug. > >=20 > > Cases b. and d. are not bugs. Often, the programmer knows that treating > > a value > ULONG_MAX is equivalent to treating the value ULONG_MAX. These > > are *normal* uses of strto[u]l[l]. Often it is the programmer's intent > > that the values "4294967297" and "4294967295" produce the same behaviour > > (the same error message, for example). >=20 > If you want ULONG_MAX + 1 to be treated like ULONG_MAX, and both > result in an error, then you should probably clamp at ULONG_MAX - 1, > and consider anything above an error. I said it in the paragraph above: Often the programmer wants a smooth transition between "unreasonably large but not overflowing" values and "overflow". So that only one diagnostic is needed for both cases. 1) getaddrinfo.c > > Any use of strtoul that does not test errno wants overflow > > to be mapped to ULONG_MAX, that is, is in case b. or d. > > Just looking in gnulib and gettext, I find already 6 occurrences: > > gnulib/lib/getaddrinfo.c:299 >=20 > lib/getaddrinfo.c-297- if (!(*servname >=3D '0' && *servname <= =3D '9')) > lib/getaddrinfo.c-298- return EAI_NONAME; > lib/getaddrinfo.c:299: port =3D strtoul (servname, &c, 10); > lib/getaddrinfo.c-300- if (*c || port > 0xffff) > lib/getaddrinfo.c-301- return EAI_NONAME; > lib/getaddrinfo.c-302- port =3D htons (port); >=20 > You could remove the preceding conditional if you don't want to avoid > leading whitespace. We don't want to allow leading whitespace here. We need to follow the getaddrinfo() spec [1]. At the same time, disallowing a leading '-' sign is a benefit as well. I consider it a misfeature that strtoul() parses "-3" successfully and returns ULONG_MAX-2, which was most certainly not intended by the user. > You could merge that into the strtou(3) call, which > would report ECANCELED for non-numeric input). Except that a negative > number is silently converted to a positive large value. This is why I > use a wrapper function strtou_noneg() that rejects negative numbers. Ouch. Yet another problem with strtoul(). That would be worth another error code in the specification of strtou(). > I think this is not one case where you want silent saturation. You're > indeed doing range checks [0, UINT16_MAX]. Yes, in this particular case, the ability to pass an arbitrary min and max to strtou() is a practical feature. 2) nproc.c, omp_init.c > > gnulib/lib/nproc.c:402 >=20 > lib/nproc.c-383-/* Parse OMP environment variables without dependence on = OMP. > lib/nproc.c-384- Return 0 for invalid values. */ > lib/nproc.c-385-static unsigned long int > lib/nproc.c:386:parse_omp_threads (char const* threads) > lib/nproc.c-387-{ >=20 > ... >=20 > lib/nproc.c-398- /* Convert it from positive decimal to 'unsigned long'.= */ > lib/nproc.c-399- if (c_isdigit (*threads)) > lib/nproc.c-400- { > lib/nproc.c-401- char *endptr =3D NULL; > lib/nproc.c:402: unsigned long int value =3D strtoul (threads, &endp= tr, 10); > lib/nproc.c-403- > lib/nproc.c-404- if (endptr !=3D NULL) > lib/nproc.c-405- { > lib/nproc.c-406- while (*endptr !=3D '\0' && c_isspace (*endptr)) > lib/nproc.c-407- endptr++; > lib/nproc.c-408- if (*endptr =3D=3D '\0') > lib/nproc.c-409- return value; > lib/nproc.c-410- /* Also accept the first value in a nesting lev= el, > lib/nproc.c-411- since we can't determine the nesting level f= rom env vars. */ > lib/nproc.c-412- else if (*endptr =3D=3D ',') > lib/nproc.c-413- return value; > lib/nproc.c-414- } > lib/nproc.c-415- } >=20 > First of all, the endptr!=3DNULL test seems misplaced. I wasn't sure that all implementations store an endptr in all cases. > And you could probably remove the isdigit test by calling > strtou_noneg(). This would allow leading whitespace in the value of the environment variable, which is not needed for OpenMP 6.0 [2] =A7 4.1.3 and therefore better avoided. > This could be something like this (fixing the bugs reported above): >=20 > char *end; > u_long value; >=20 > value =3D strtou_noneg(threads, &end, 10, 0, ULONG_MAX, NULL); > if (end !=3D threads) { > end +=3D strspn(end, " \t\n"); > if (streq(end, "") > return value; > if (strprefix(end, ",")) > return value; > } >=20 >=20 > This is one case where you seem to silently ignore saturation. This is on purpose. A CPU never has more than 1000 processors. Therefore all values > 1000 should be treated the same way, whether they are <=3D ULONG_MAX or > ULONG_MAX. Clamping to the number of actually available processors occurs later in the code. 3) msgfmt.c > > gettext/gettext-tools/src/msgfmt.c:287 >=20 > gettext-tools/src/msgfmt.c-286- char *endp; > gettext-tools/src/msgfmt.c:287: size_t new_align =3D strtoul (op= targ, &endp, 0); > gettext-tools/src/msgfmt.c-288- > gettext-tools/src/msgfmt.c-289- if (endp !=3D optarg) > gettext-tools/src/msgfmt.c-290- alignment =3D new_align; >=20 > This code will misbehave badly on platforms where size_t is narrower > than u_long. Such platforms (namely Windows 3.1) are not among the portability targets of this code. > You also don't reject negative numbers, which I expect to be a bug, > connected with the one from above. That could be a problem. Fortunately, the documentation makes it clear that the argument is an "alignment", and reasonable users will therefore only pass the values 1, 2, 4, 8, 16. > Why don't you have a diagnostic message for invalid input? Because this option is meant for users who know what an "alignment" is. 4) msgl-check.c > > gettext/gettext-tools/src/msgl-check.c:379 >=20 > gettext-tools/src/msgl-check.c-374- while (*nplurals !=3D '\0' &= & c_isspace ((unsigned char) *nplurals)) > gettext-tools/src/msgl-check.c-375- ++nplurals; > gettext-tools/src/msgl-check.c-376- endp =3D nplurals; > gettext-tools/src/msgl-check.c-377- nplurals_value =3D 0; > gettext-tools/src/msgl-check.c-378- if (*nplurals >=3D '0' && *n= plurals <=3D '9') > gettext-tools/src/msgl-check.c:379: nplurals_value =3D strtoul= (nplurals, (char **) &endp, 10); > gettext-tools/src/msgl-check.c-380- if (nplurals =3D=3D endp) > gettext-tools/src/msgl-check.c-381- { > gettext-tools/src/msgl-check.c-382- const char *msg =3D _("i= nvalid nplurals value"); > gettext-tools/src/msgl-check.c-383- char *help =3D plural_he= lp (nullentry); > gettext-tools/src/msgl-check.c-384- > gettext-tools/src/msgl-check.c-385- if (help !=3D NULL) > gettext-tools/src/msgl-check.c-386- { > gettext-tools/src/msgl-check.c-387- char *msgext =3D xas= printf ("%s\n%s", msg, help); > gettext-tools/src/msgl-check.c-388- xeh->xerror (CAT_SEV= ERITY_ERROR, header, NULL, 0, 0, true, > gettext-tools/src/msgl-check.c-389- msgext); > gettext-tools/src/msgl-check.c-390- free (msgext); > gettext-tools/src/msgl-check.c-391- free (help); > gettext-tools/src/msgl-check.c-392- } > gettext-tools/src/msgl-check.c-393- else > gettext-tools/src/msgl-check.c-394- xeh->xerror (CAT_SEVER= ITY_ERROR, header, NULL, 0, 0, false, > gettext-tools/src/msgl-check.c-395- msg); > gettext-tools/src/msgl-check.c-396- > gettext-tools/src/msgl-check.c-397- seen_errors++; > gettext-tools/src/msgl-check.c-398- } >=20 > You could get rid of a lot of code preceding the strtoul(3) call by > calling strtou_noneg() instead. This would allow whitespace. Allowing whitespace here (as in "nplurals=3D 3= ") is not a worthy feature. > On the other hand, I also wonder why you don't diagnose invalid input. > Why is -10 an "invalid nplurals value" "-10" designates a negative number. For the number of plural forms, that's invalid. > but ULONG_MAX+10 is a valid (albeit clamped) one? Again, as mentioned above, values like 1000 and 10000000000000000 should be treated the same way. Since ULONG_MAX+10 gets clamped to ULONG_MAX, that= 's just fine. > All of these cases look like missing error handling, IMO. No. In this case, inspecting errno would be additional code with no benefit. 5) read-stringtable.c > > gettext/gettext-tools/src/read-stringtable.c:561 >=20 > gettext-tools/src/read-stringtable.c-553- { > gettext-tools/src/read-stringtable.c-554- char *last_colon; > gettext-tools/src/read-stringtable.c-555- unsigned long number; > gettext-tools/src/read-stringtable.c-556- char *endp; > gettext-tools/src/read-stringtable.c-557- > gettext-tools/src/read-stringtable.c-558- if (strlen (line) >=3D 6 &= & memcmp (line, "File: ", 6) =3D=3D 0 > gettext-tools/src/read-stringtable.c-559- && (last_colon =3D str= rchr (line + 6, ':')) !=3D NULL > gettext-tools/src/read-stringtable.c-560- && *(last_colon + 1) != =3D '\0' > gettext-tools/src/read-stringtable.c:561: && (number =3D strtoul= (last_colon + 1, &endp, 10), *endp =3D=3D '\0')) > gettext-tools/src/read-stringtable.c-562- { > gettext-tools/src/read-stringtable.c-563- /* A "File: = :" type comment. */ > gettext-tools/src/read-stringtable.c-564- *last_colon =3D '\0'; > gettext-tools/src/read-stringtable.c-565- catalog_reader_seen_co= mment_filepos (catr, line + 6, number); > gettext-tools/src/read-stringtable.c-566- } > gettext-tools/src/read-stringtable.c-567- else > gettext-tools/src/read-stringtable.c-568- catalog_reader_seen_comm= ent (catr, line); > gettext-tools/src/read-stringtable.c-569- } >=20 > You're forgetting about negative numbers? Indeed, I didn't realize that strtoul() would accept "-3" as valid. As said above, strtou() can improve on it, by reserving an error code for it. > How about huge values? In this code, we assume that line numbers are <=3D ULONG_MAX. Not an unreasonable assumption. I have yet to see source code files that are 4 GiB large... > But again, I wonder why you don't do range checks. Range checks are not important here: The line numbers are not used as indices; they are merely reproduced in the output. > > If you don't want to do that, I can only repeat what I said in the prev= ious > > mail: The proposal *does not achieve the goal* of avoiding the most com= mon > > programmer mistakes. For a robust API, the success test should *only* i= nvolve > > testing the returned 'status', nothing else. >=20 > Let's discuss this after your responses to the above. In the cases 1), 2), 4), use-case d. was chosen on purpose. Bruno [1] https://pubs.opengroup.org/onlinepubs/9799919799/functions/getaddrinfo.= html [2] https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-6-0.= pdf