From: Alejandro Colomar <alx.manpages@gmail.com>
To: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>,
Ian Abbott <abbotti@mev.co.uk>, Zack Weinberg <zack@owlfolio.org>,
linux-man@vger.kernel.org,
Michael Kerrisk <mtk.manpages@gmail.com>,
Alejandro Colomar <alx@kernel.org>
Subject: Re: [PATCH] scanf.3: Do not mention the ERANGE error
Date: Fri, 20 Jan 2023 23:02:22 +0100
Message-ID: <8f997c92-e82d-08ed-5e01-3f54efa03dcb@gmail.com>
In-Reply-To: <20230120175552.ri5odhrf56bapuxj@illithid>
Hi Branden!
On 1/20/23 18:55, G. Branden Robinson wrote:
> [re-ordering the mail I'm quoting]
>
> Hi Alex,
>
> I have some observations on your deprecation initiative and people's
> reactions to it.
Sure :)
>
> At 2023-01-20T14:12:07+0100, Alejandro Colomar wrote:
>> All implementations of sscanf(3) produce Undefined Behavior (UB),
>> AFAIK. How much you consider UB to be a real-world issue differs for
>> each programmer, but I tend to consider all UB to be as bad as nasal
>> demons. I'm not saying UB shouldn't exist, just that you shouldn't
>> invoke it. And a function that is used for scanning user input is one
>> of those places where you really want to avoid invoking UB.
>
> If there are common idioms that result in UB, it might be worth
> documenting this in the man page, with a citation to the relevant
> clause of the standard that declares it thus.
Okay. See the proposed diff below.
> I agree that UB is
> something to be avoided and I think most other programmers do too. The
> advantage to this approach is that if they disagree, they can take their
> argument to the standards body instead of litigating it with you.
:)
>
>> This is similar but different to bzero(3). bzero(3) was broken or
>> slow in some implementations. That's probably why it was never added
>> to ISO C, and why POSIX later removed it. The API wasn't bad, and in
>> fact it's great, I prefer it over memset(3). The difference between
>> bzero(3) and sscanf(3) is that bzero(3) has now been fixed,
>
> I still don't share your preference here. The exposure of a more
> general interface (memset) by a general-purpose library when the
> implementation otherwise has no additional implementation cost is the
> correct choice.
While I share your preference for general-purpose over specialized interfaces,
which is the essence of Unix, I also believe that encapsulation is necessary
for writing readable code.
Your (and many others') proposal of having a project-specific macro for bzero(3)
seems reasonable in the absence of a standard name for it. However, given a
POSIX-blessed (until recently) name for such an interface, I'd prefer sticking
to it. Otherwise, we risk having bzero(), memzero(), zerobytes(), zero(), ...
which is not crazy, but hey, I prefer fewer moving parts when reading code :)
As for removing a function from POSIX just because it's not generic... I have in
mind a long list of such features that are equally trivial and unnecessary (and
in some cases, unlike bzero(3), they hurt, IMO), yet they haven't died. As a
representative example, let me present our friend:
printf(3)
Oh boy, tell me it hurts your fingers writing fprintf(stdout, ) but not memset(,
0, ). At least with fprintf(3) the ordering of the parameters is obvious and
I don't need to check the man page.
> If a given programmer's use cases are restricted such
It's not a single given programmer. memset(3) is likely to be the most obvious
case where the thin wrapper is what you want to call. There are many uses for
fprintf(3), and there are many uses for other such functions that have a thin
wrapper in the same libc, but memset(3)? How often have you (or any code you
know) used it with something other than the constant expression 0?
> that one of the arguments to a general-purpose function is constant,
> then that is exactly the time for them to write a macro or function
> specific to their project to hide the complexity.
>
> If you tilt your head right, this is similar to one of the ways closures
> are used in other languages.
I'm fine with the function being implemented as a macro, although it would be
better to have it as an inline function, so that -Os can produce smaller code if
needed. In general, I don't like macros unless there's a need to avoid type
conversions; for example for keeping arrays as arrays.
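To make the inline-function-versus-macro distinction concrete, here is a minimal sketch. The names memzero() and MEMZEROA() are hypothetical, chosen only for illustration; they are not taken from any libc or standard. The inline function covers the common case, and the macro exists only because an array passed to a function would decay to a pointer and lose its size.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical project-local wrapper (the name is an assumption).
 * An inline function lets the compiler choose between inlining the
 * call and emitting a single out-of-line copy (e.g. under -Os).
 */
static inline void *
memzero(void *p, size_t n)
{
	return memset(p, 0, n);
}

/*
 * Hypothetical macro form.  It must be a macro so that sizeof() still
 * sees the array type; inside a function the argument would already
 * have decayed to a pointer.  Only pass real arrays, never pointers.
 */
#define MEMZEROA(arr)  memzero((arr), sizeof(arr))
```

With these, zeroing a local `int a[4]` is just `MEMZEROA(a);`, while zeroing through a pointer still needs an explicit size, as in `memzero(p, n);`.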
>
>> I could change the "deprecated" statements by "see bugs",
>
> I think you've hit upon one of the core drivers of resistance here. A
> problem with calling something "deprecated" is that it's often unstated
> _who_ is doing the deprecation. Traditionally, I think the Linux
> man-pages have tended only to use this term in reference to one of the
> standards bodies (WG14 or the Austin Group) formally employing it.
There are some pages which have single-handedly deprecated features with no
standard or group doing so. I remember having seen a few pages do that, but
they are all from prehistoric times, when standards didn't mean so much (or
maybe there weren't such standards).
>
> (Maybe I'm wrong, and Linux man-pages _has_ deprecated things in its own
> authorial voice...but if other people also don't know that, it doesn't
> matter, and confusion remains.)
Yes, they did. Well, confusion always happens when things change. I expect
that to settle down. However, I'll try to improve my methods for deprecating
broken stuff as much as I can so we can reduce the confusion.
>
> So I suggest you adopt a new phrase, like "discouraged by Linux
> man-pages", to characterize the authorial voice here. Some people will
> ignore your advice either way, but at least they'll know who they're
> ignoring.[1]
I like deprecating. I want such a strong term. I'll try to clarify that it's
the man-pages that do the deprecation, and not a standards body.
>
>> However, if somebody really wants to use that function, and would like
>> to fix it, I encourage that effort. If the function is fixed, which
>> shouldn't be that hard, I'm fine removing the messages against its
>> usage in the manual.
>>
>> While that doesn't happen, I prefer strongly recommending against
>> their usage in the manual. And dict(1) seems to say that the verb for
>> that is "to deprecate" :)
>
> Your dictionary is correct but social knowledge, a.k.a. tradition and
> folklore, impose a context on the discussion. Sometimes dumb things
> become tradition (like calculating factorials or Fibonacci numbers with
> recursive functions[2])--we don't have to acquiesce to that, but we will
> have to document and sometimes defend our rejection of them.
>
>> Right. memcpy(3) has a bug in the standard. However, implementations
>> do the Right Thing (tm). If implementations did the right thing for
>> sscanf(3), that would be enough to remove the recommendation against
>> it. But my understanding is that the sscanf(3) implementation is not
>> free of that problem.
>
> This is a good opportunity to say so in these terms. "Linux man-pages
> discourages use of sscanf [under the conditions XXX] until
> implementations are corrected to avoid undefined behavior [cite URL
> here]."[3]
>
> Regards,
> Branden
>
> [1] In groff_man(7), I admit I have not taken my own advice, and use the
> term "deprecated" in a subsection heading. I have two defenses for
> this. (1) I reorganized the man page along those lines 5-6 years
> ago, when I had less practice at writing technical documentation,
> and (2) the man(7) macros are not formally standardized anywhere
> anyway. There is no "official" body with which to conflict, or with
> whom groff can be confused by the reader.
>
> After groff 1.23 is released (good news, I heard from Bertrand last
> weekend)
Nice :)
> I hope to add the SunOS extension "SB" to the deprecation
> list now that Solaris's death seems irreversible.
>
> [2] https://sleeplessafternoon.wordpress.com/2013/03/26/examples-of-recursion-the-good-the-bad-and-the-silly/
>
> For the mathematically or algorithmically inclined, I also
> recommend "The Genuine Sieve of Eratosthenes", by Melissa O'Neill.
>
> https://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf
>
> [3] groff_man(7) gives you UR/UE, so use them! >:-)
How about the following?
Cheers,
Alex
---
diff --git a/man3/sscanf.3 b/man3/sscanf.3
index 26a02521b..870c6f54b 100644
--- a/man3/sscanf.3
+++ b/man3/sscanf.3
@@ -653,6 +653,25 @@ .SS The 'a' assignment-allocation modifier
.I gcc\~\-std=c99
etc.).
.SH BUGS
+.SS Numeric conversion specifiers
+Use of the numeric conversion specifiers produces Undefined Behavior
+for invalid input.
+See
+.UR https://port70.net/\:%7Ensz/\:c/\:c11/\:n1570.html\:#7.21.6.2p10
+C11 7.21.6.2/10
+.UE .
+This is a bug in the ISO C standard,
+and not an inherent design issue with the API.
+However,
+current implementations are not safe from that bug,
+so it is not recommended to use them.
+Instead,
+programs should use functions such as
+.BR strtol (3)
+to parse numeric input.
+This manual page deprecates use of the numeric conversion specifiers
+until they are fixed by ISO C.
+.SS Nonstandard modifiers
These functions are fully C99 conformant, but provide the
additional modifiers
.B q
--
<http://www.alejandro-colomar.es/>
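For readers who want to see what the strtol(3) recommendation in the proposed text could look like in practice, here is a minimal sketch. parse_long() is a hypothetical helper written for this illustration, not part of any libc and not part of the patch; it assumes base 10 and treats any trailing characters as an error. Unlike the numeric conversion specifiers of sscanf(3), strtol(3) has defined behavior on out-of-range input: it returns LONG_MIN or LONG_MAX and sets errno to ERANGE.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (name and interface are assumptions).
 * On success, stores the parsed value in *out and returns 0.
 * On failure, returns -1 and sets errno to:
 *   EINVAL  if s has no digits or has trailing characters,
 *   ERANGE  if the value does not fit in a long.
 */
static int
parse_long(const char *s, long *out)
{
	char  *end;
	long  val;

	errno = 0;  /* strtol() reports overflow only through errno */
	val = strtol(s, &end, 10);

	if (end == s || *end != '\0') {
		errno = EINVAL;
		return -1;
	}
	if (errno == ERANGE)
		return -1;

	*out = val;
	return 0;
}
```

A caller would check the return value before using the result, for example `if (parse_long(arg, &n) == -1) perror("parse_long");`.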