Hi Rich, Zack,

On Tue, Jan 20, 2026 at 12:46:59PM -0500, Rich Felker wrote:
> On Tue, Jan 20, 2026 at 12:05:52PM -0500, Zack Weinberg wrote:
> > > On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote:

[...]

> > Now, the abstract correct behavior is secondary to the fact that we
> > know there are both systems where close should not be retried after
> > EINTR (Linux) and systems where the fd is still open after EINTR
> > (HP-UX). But it is my position that *portable code* should assume the
> > Linux behavior, because that is the safest option. If you assume the
> > HP-UX behavior on a machine that implements the Linux behavior, you
> > might close some unrelated file out from under yourself (probably but
> > not necessarily a different thread). If you assume the Linux behavior
> > on a machine that implements the HP-UX behavior, you have leaked a
> > file descriptor; the worst things that can do are much less severe.
>
> Unfortunately, regardless of what happens, code portable to old
> systems needs to avoid getting in the situation to begin with. By
> either not installing interrupting signal handlers or blocking EINTR
> around close.

[...]

> > > While I agree with all of this, I think the tone is way too
> > > proscriptive. The man pages are to document the behaviors, not tell
> > > people how to program.
> >
> > I could be persuaded to tone it down a little but in this case I think
> > the man page's job *is* to tell people how to program. We know lots of
> > existing code has gotten the fine details of close() wrong and we are
> > trying to document how to do it right.
>
> No, the job of the man pages absolutely is not "to tell people how to
> program". It's to document behaviors. They are not a programming
> tutorial. They are not polemic diatribes. They are unbiased statements
> of facts.
> Facts of what the standards say and what implementations do,
> that equip programmers with the knowledge they need to make their own
> informed decisions, rather than blindly following what someone who
> thinks they know better told them to do.

This reminds me a little bit of the realloc(p,0) fiasco of C89 and
glibc.

In most cases, I agree with you that manual pages are and should be
aseptic, but there are cases where I think the manual page needs to be
tutorial.  Especially when there's such a mess, we need to explain all
the possible behaviors (or at least mention them to some degree).

But for example, there's the case of realloc(p,0), where we have a
fiasco that was pushed by a compounding of wrong decisions by the C
Committee, and prior to that from System V.  We're a bit lucky that C17
accidentally broke it so badly that we now have it as UB, and that gives
us the opportunity to fix it now (which BTW might also be the case for
close(2)).

In the case of realloc(3), I went and documented in the manual page that
glibc is broken, and that ISO C is also broken.

	STANDARDS
	     malloc()
	     free()
	     calloc()
	     realloc()
	            C23, POSIX.1‐2024.

	     reallocarray()
	            POSIX.1‐2024.

	   realloc(p, 0)
	     The behavior of realloc(p, 0) in glibc doesn’t conform to
	     any of C99, C11, POSIX.1‐2001, POSIX.1‐2004, POSIX.1‐2008,
	     POSIX.1‐2013, POSIX.1‐2017, or POSIX.1‐2024.  The C17
	     specification was changed to make it conforming, but that
	     specification made it impossible to write code that
	     reliably determines if the input pointer is freed after
	     realloc(p, 0), and C23 changed it again to make this
	     undefined behavior, acknowledging that the C17
	     specification was broad enough, so that undefined behavior
	     wasn’t worse than that.

	     reallocarray() suffers the same issues in glibc.

	     musl libc and the BSDs conform to all versions of ISO C and
	     POSIX.1.

	     gnulib provides the realloc‐posix module, which provides
	     wrappers realloc() and reallocarray() that conform to all
	     versions of ISO C and POSIX.1.
	     There’s a proposal to standardize the BSD behavior:
	     https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3621.txt.

	HISTORY
	     malloc()
	     free()
	     calloc()
	     realloc()
	            POSIX.1‐2001, C89.

	     reallocarray()
	            glibc 2.26.  OpenBSD 5.6, FreeBSD 11.0.

	     malloc() and related functions rejected sizes greater than
	     PTRDIFF_MAX starting in glibc 2.30.

	     free() preserved errno starting in glibc 2.33.

	   realloc(p, 0)
	     C89 was ambiguous in its specification of realloc(p, 0).
	     C99 partially fixed this.

	     The original implementation in glibc would have been
	     conforming to C99.  However, and ironically, trying to
	     comply with C99 before the standard was released, glibc
	     changed its behavior in glibc 2.1.1 into something that
	     ended up not conforming to the final C99 specification (but
	     this is debated, as the wording of the standard seems
	     self‐contradicting).

	...

	BUGS
	     Programmers would naturally expect by induction that
	     realloc(p, size) is consistent with free(p) and
	     malloc(size), as that is the behavior in the general case.
	     This is not explicitly required by POSIX.1‐2024 or C11, but
	     all conforming implementations are consistent with that.

	     The glibc implementation of realloc() is not consistent
	     with that, and as a consequence, it is dangerous to call
	     realloc(p, 0) in glibc.

	     A trivial workaround for glibc is calling it as
	     realloc(p, size?size:1).

	     The workaround for reallocarray() in glibc ——which shares
	     the same bug—— would be
	     reallocarray(p, n?n:1, size?size:1).

Apart from documenting that glibc and ISO C are broken, we document how
to best deal with it (see the last paragraph in BUGS).  This is
necessary because I fear that just by documenting the different
behaviors, programmers would still not know what to do with that.
Just take into account that even several members of the committee don't
know how to deal with it.

I'd be willing to have something similar for close(2).


Have a lovely night!
Alex

P.S.:  I have great news about realloc(p,0)!  Microsoft is on-board with
the change.
They told me they like the proposal, and are willing to fix
their realloc(3) implementation.  They'll now conduct tests to make sure
it doesn't break anything too badly, and will come back to me with any
feedback they have from those tests.  I'll put the standards proposal
for realloc(3) on hold, waiting for Microsoft's feedback.

> > > Aside: the reason EINTR *has to* be specified this way is that pthread
> > > cancellation is aligned with EINTR. If EINTR were defined to have
> > > closed the fd, then acting on cancellation during close would also
> > > have closed the fd, but the cancellation handler would have no way to
> > > distinguish this, leading to a situation where you're forced to either
> > > leak fds or introduce a double-close vuln.
> >
> > The correct way to address this would be to make close() not be a
> > cancellation point.
>
> This would also be a desirable change, one I would support if other
> implementors are on-board with pushing for it.
>
> > > An outline of what I'd like to see instead:
> > >
> > > - Clear explanation of why double-close is a serious bug that must
> > >   always be avoided. (I think we all agree on this.)
> > >
> > > - Statement that the historical Linux/glibc behavior and current POSIX
> > >   requirement differ, without language that tries to paint the POSIX
> > >   behavior as a HP-UX bug/quirk. Possibly citing real sources/history
> > >   of the issue (Austin Group tracker items 529, 614; maybe others).
> > >
> > > - Consequence of just assuming the Linux behavior (fd leaks on
> > >   conforming systems).
> > >
> > > - Consequences of assuming the POSIX behavior (double-close vulns on
> > >   GNU/Linux, maybe others).
> > >
> > > - Survey of methods for avoiding the problem (ways to preclude EINTR,
> > >   possibly ways to infer behavior, etc).
> >
> > This outline seems more or less reasonable to me but, if it's me
> > writing the text, I _will_ characterize what POSIX currently says
> > about EINTR returns from close() as a bug in POSIX. As far as I'm
> > concerned, that is a fact, not polemic.
> >
> > I have found that arguing with you in particular, Rich, is generally
> > not worth the effort. Therefore, unless you reply and _accept_ that
> > the final version of the close manpage will say that POSIX is buggy,
> > I am not going to write another version of this text, nor will I be
> > drawn into further debate.
>
> I will not accept that because it's a gross violation of the
> responsibility of document writing.
>
> Rich

--