* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Florian Weimer @ 2026-01-20 20:05 UTC
To: Rich Felker
Cc: Zack Weinberg, Alejandro Colomar, Vincent Lefevre, Jan Kara,
    Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
    GNU libc development

* Rich Felker:

> On Tue, Jan 20, 2026 at 07:39:48PM +0100, Florian Weimer wrote:
>> * Rich Felker:
>>
>> > On Tue, Jan 20, 2026 at 12:05:52PM -0500, Zack Weinberg wrote:
>> >> > On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote:
>> >> >> close() always succeeds. That is, after it returns, _fd_ has
>> >> >> always been disconnected from the open file it formerly referred
>> >> >> to, and its number can be recycled to refer to some other file.
>> >> >> Furthermore, if _fd_ was the last reference to the underlying
>> >> >> open file description, the resources associated with the open file
>> >> >> description will always have been scheduled to be released.
>> >> ...
>> >> >> EINPROGRESS
>> >> >> EINTR
>> >> >>        There are no delayed errors to report, but the kernel is
>> >> >>        still doing some clean-up work in the background. This
>> >> >>        situation should be treated the same as if close() had
>> >> >>        returned zero. Do not retry the close(), and do not report
>> >> >>        an error to the user.
>> >> >
>> >> > Since this behavior for EINTR is non-conforming (and even prior to the
>> >> > POSIX 2024 update, it was contrary to the general semantics for EINTR,
>> >> > that no non-ignorable side-effects have taken place), it should be
>> >> > noted that it's Linux/glibc-specific.
>> >>
>> >> I am prepared to take your word for it that POSIX says this is
>> >> non-conforming, but in that case, POSIX is wrong, and I will not be
>> >> convinced otherwise by any argument. Operations that release a
>> >> resource must always succeed.
>> >
>> > There are two conflicting requirements here:
>> >
>> > 1. Operations that release a resource must always succeed.
>> > 2. Failure with EINTR must not have side effects.
>> >
>> > The right conclusion is that operations that release resources must
>> > not be able to fail with EINTR. And that's how POSIX should have
>> > resolved the situation -- by getting rid of support for the silly
>> > legacy synchronous-tape-drive-rewinding behavior of close on some
>> > systems, and requiring close to succeed immediately with no waiting
>> > for anything.
>>
>> What about SO_LINGER? Isn't this relevant in context?
>
> shutdown should be used for this, not close. So that the acts of
> waiting for the operation to finish, and releasing the resource handle
> needed to observe if it's finished, are separate.

I think shutdown on TCP sockets is non-blocking under Linux. It doesn't
wait until the peer has acknowledged the FIN segment, as far as I
understand it. Other systems may behave differently.

>> As far as I know, there is no other way besides SO_LINGER to get
>> notification if the packet buffers are actually gone. If you don't use
>> it, memory can pile up in the kernel without the application's
>> knowledge.
>
> The way Linux's EINTR behaves, using close can't ensure this memory
> doesn't pile up, because on EINTR you lose the ability to wait for it.

Can't the application reliably avoid EINTR by blocking signals?

Thanks,
Florian
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Paul Eggert @ 2026-01-20 20:11 UTC
To: Rich Felker
Cc: Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro,
    Christian Brauner, linux-fsdevel, linux-api, GNU libc development,
    Zack Weinberg

On 2026-01-20 09:46, Rich Felker wrote:
> the job of the man pages absolutely is not "to tell people how to
> program". It's to document behaviors.

In practice man pages do both. When I type "man close" on GNU/Linux I see
text like the text quoted below, and as a C programmer I appreciate
getting advice like this when the situation is sufficiently tricky.

----

Any record locks (see fcntl(2)) held on the file it was associated with,
and owned by the process, are removed regardless of the file descriptor
that was used to obtain the lock. This has some unfortunate consequences
and one should be extra careful when using advisory record locking. See
fcntl(2) for discussion of the risks and consequences as well as for the
(probably preferred) open file description locks.
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Alejandro Colomar @ 2026-01-20 20:35 UTC
To: Rich Felker
Cc: Zack Weinberg, Vincent Lefevre, Jan Kara, Alexander Viro,
    Christian Brauner, linux-fsdevel, linux-api, GNU libc development

Hi Rich, Zack,

On Tue, Jan 20, 2026 at 12:46:59PM -0500, Rich Felker wrote:
> On Tue, Jan 20, 2026 at 12:05:52PM -0500, Zack Weinberg wrote:
> > > On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote:

[...]

> > Now, the abstract correct behavior is secondary to the fact that we
> > know there are both systems where close should not be retried after
> > EINTR (Linux) and systems where the fd is still open after EINTR
> > (HP-UX). But it is my position that *portable code* should assume the
> > Linux behavior, because that is the safest option. If you assume the
> > HP-UX behavior on a machine that implements the Linux behavior, you
> > might close some unrelated file out from under yourself (probably but
> > not necessarily a different thread). If you assume the Linux behavior
> > on a machine that implements the HP-UX behavior, you have leaked a
> > file descriptor; the worst things that can do are much less severe.
>
> Unfortunately, regardless of what happens, code portable to old
> systems needs to avoid getting in the situation to begin with. By
> either not installing interrupting signal handlers or blocking signals
> around close.

[...]

> > > While I agree with all of this, I think the tone is way too
> > > proscriptive. The man pages are to document the behaviors, not tell
> > > people how to program.
> >
> > I could be persuaded to tone it down a little but in this case I think
> > the man page's job *is* to tell people how to program. We know lots of
> > existing code has gotten the fine details of close() wrong and we are
> > trying to document how to do it right.
>
> No, the job of the man pages absolutely is not "to tell people how to
> program". It's to document behaviors. They are not a programming
> tutorial. They are not polemic diatribes. They are unbiased statements
> of facts. Facts of what the standards say and what implementations do,
> that equip programmers with the knowledge they need to make their own
> informed decisions, rather than blindly following what someone who
> thinks they know better told them to do.

This reminds me a little bit of the realloc(p,0) fiasco of C89 and
glibc.

In most cases, I agree with you that manual pages are and should be
aseptic, but there are cases where I think the manual page needs to be
tutorial. Especially when there's such a mess, we need to both explain
all the possible behaviors (or at least mention them to some degree).

But for example, there's the case of realloc(p,0), where we have
a fiasco that was pushed by a compoundment of wrong decisions by the
C Committee, and prior to that from System V. We're a bit lucky that
C17 accidentally broke it so badly that we now have it as UB, and that
gives us the opportunity to fix it now (which BTW might also be the case
for close(2)).

In the case of realloc(3), I went and documented in the manual page that
glibc is broken, and that ISO C is also broken.

	STANDARDS
	     malloc()
	     free()
	     calloc()
	     realloc()
	            C23, POSIX.1‐2024.

	     reallocarray()
	            POSIX.1‐2024.

	   realloc(p, 0)
	     The behavior of realloc(p, 0) in glibc doesn’t conform to
	     any of C99, C11, POSIX.1‐2001, POSIX.1‐2004, POSIX.1‐2008,
	     POSIX.1‐2013, POSIX.1‐2017, or POSIX.1‐2024. The C17
	     specification was changed to make it conforming, but that
	     specification made it impossible to write code that
	     reliably determines if the input pointer is freed after
	     realloc(p, 0), and C23 changed it again to make this
	     undefined behavior, acknowledging that the C17
	     specification was broad enough, so that undefined behavior
	     wasn’t worse than that.

	     reallocarray() suffers the same issues in glibc.

	     musl libc and the BSDs conform to all versions of ISO C
	     and POSIX.1.

	     gnulib provides the realloc‐posix module, which provides
	     wrappers realloc() and reallocarray() that conform to all
	     versions of ISO C and POSIX.1.

	     There’s a proposal to standardize the BSD behavior:
	     https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3621.txt.

	HISTORY
	     malloc()
	     free()
	     calloc()
	     realloc()
	            POSIX.1‐2001, C89.

	     reallocarray()
	            glibc 2.26. OpenBSD 5.6, FreeBSD 11.0.

	     malloc() and related functions rejected sizes greater than
	     PTRDIFF_MAX starting in glibc 2.30.

	     free() preserved errno starting in glibc 2.33.

	   realloc(p, 0)
	     C89 was ambiguous in its specification of realloc(p, 0).
	     C99 partially fixed this.

	     The original implementation in glibc would have been
	     conforming to C99. However, and ironically, trying to comply
	     with C99 before the standard was released, glibc changed
	     its behavior in glibc 2.1.1 into something that ended up
	     not conforming to the final C99 specification (but this is
	     debated, as the wording of the standard seems
	     self‐contradicting).

	...

	BUGS
	     Programmers would naturally expect by induction that
	     realloc(p, size) is consistent with free(p) and
	     malloc(size), as that is the behavior in the general case.
	     This is not explicitly required by POSIX.1‐2024 or C11,
	     but all conforming implementations are consistent with
	     that.

	     The glibc implementation of realloc() is not consistent
	     with that, and as a consequence, it is dangerous to call
	     realloc(p, 0) in glibc.

	     A trivial workaround for glibc is calling it as
	     realloc(p, size?size:1).

	     The workaround for reallocarray() in glibc, which shares
	     the same bug, would be
	     reallocarray(p, n?n:1, size?size:1).

Apart from documenting that glibc and ISO C are broken, we document how
to best deal with it (see the last paragraph in BUGS). This is
necessary because I fear that just by documenting the different
behaviors, programmers would still not know what to do with that.
Just take into account that even several members of the committee don't
know how to deal with it.

I'd be willing to have something similar for close(2).

Have a lovely night!
Alex

P.S.: I have great news about realloc(p,0)! Microsoft is on-board with
the change. They told me they like the proposal, and are willing to
fix their realloc(3) implementation. They'll now conduct tests to make
sure it doesn't break anything too badly, and will come back to me with
any feedback they have from those tests.

I'll put the standards proposal for realloc(3) on hold, waiting for
Microsoft's feedback.

> > > Aside: the reason EINTR *has to* be specified this way is that pthread
> > > cancellation is aligned with EINTR. If EINTR were defined to have
> > > closed the fd, then acting on cancellation during close would also
> > > have closed the fd, but the cancellation handler would have no way to
> > > distinguish this, leading to a situation where you're forced to either
> > > leak fds or introduce a double-close vuln.
> >
> > The correct way to address this would be to make close() not be a
> > cancellation point.
>
> This would also be a desirable change, one I would support if other
> implementors are on-board with pushing for it.
>
> > > An outline of what I'd like to see instead:
> > >
> > > - Clear explanation of why double-close is a serious bug that must
> > >   always be avoided. (I think we all agree on this.)
> > >
> > > - Statement that the historical Linux/glibc behavior and current POSIX
> > >   requirement differ, without language that tries to paint the POSIX
> > >   behavior as a HP-UX bug/quirk.
> > >   Possibly citing real sources/history
> > >   of the issue (Austin Group tracker items 529, 614; maybe others).
> > >
> > > - Consequence of just assuming the Linux behavior (fd leaks on
> > >   conforming systems).
> > >
> > > - Consequences of assuming the POSIX behavior (double-close vulns on
> > >   GNU/Linux, maybe others).
> > >
> > > - Survey of methods for avoiding the problem (ways to preclude EINTR,
> > >   possibly ways to infer behavior, etc).
> >
> > This outline seems more or less reasonable to me but, if it's me
> > writing the text, I _will_ characterize what POSIX currently says
> > about EINTR returns from close() as a bug in POSIX. As far as I'm
> > concerned, that is a fact, not polemic.
> >
> > I have found that arguing with you in particular, Rich, is generally
> > not worth the effort. Therefore, unless you reply and _accept_ that
> > the final version of the close manpage will say that POSIX is buggy,
> > I am not going to write another version of this text, nor will I be
> > drawn into further debate.
>
> I will not accept that because it's a gross violation of the
> responsibility of document writing.
>
> Rich

--
<https://www.alejandro-colomar.es>
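[Editorial illustration, not part of the original message.] The glibc workaround quoted from the realloc(3) BUGS section in this message, realloc(p, size?size:1), could be wrapped roughly like this. It is a minimal sketch; `realloc_nonzero` is a hypothetical name, not a glibc or gnulib function.

```c
#include <stdlib.h>

/* Hypothetical wrapper, not from the thread: never passes a size of 0 to
 * realloc(), so the result follows the ordinary rules everywhere: on
 * success a valid pointer is returned and p is no longer valid; on
 * failure NULL is returned and p is left untouched. */
static void *realloc_nonzero(void *p, size_t size)
{
    return realloc(p, size ? size : 1);
}
```

With the wrapper, a caller shrinking a buffer to "empty" still owns a one-byte allocation that must eventually be passed to free().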
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Alejandro Colomar @ 2026-01-20 20:42 UTC
To: Rich Felker
Cc: Zack Weinberg, Vincent Lefevre, Jan Kara, Alexander Viro,
    Christian Brauner, linux-fsdevel, linux-api, GNU libc development

On Tue, Jan 20, 2026 at 09:35:43PM +0100, Alejandro Colomar wrote:

[...]

> In most cases, I agree with you that manual pages are and should be
> aseptic, but there are cases where I think the manual page needs to be
> tutorial. Especially when there's such a mess, we need to both explain
> all the possible behaviors (or at least mention them to some degree).

... and guide programmers about how to best use the API. I forgot to
finish the sentence.

[...]

--
<https://www.alejandro-colomar.es>
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Zack Weinberg @ 2026-01-23 0:33 UTC
To: Alejandro Colomar
Cc: Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner,
    Rich Felker, linux-fsdevel, linux-api, GNU libc development

Alright, since it actually seems possible we might be having a
reasonable conversation about the close manpage now, I've done
another draft. I *think* this covers all the concerns expressed
so far. I am feeling somewhat more charitable toward the Austin
Group after close-reading the current POSIX spec for close, so there
is no BUGS section after all. In their shoes I would still have
disallowed EINTR returns from close altogether, but I can see why
they felt that was a step too far.

This is a full top-to-bottom rewrite of the manpage; please speak
up if you don't like any of my changes to any of it, not just the
new stuff about delayed errors. It's written in freeform text for
ease of reading; I'll do proper troff markup after the text is
finalized. (Alejandro, do you have a preference between -man
and -mdoc markup?)

Please note the [QUERY:] sections sprinkled throughout NOTES.
I would like to have answers to those questions for the final draft.

zw

NAME
    close - close a file descriptor

LIBRARY
    Standard C library (libc, -lc)

SYNOPSIS
    #include <unistd.h>

    int close(int fd);

DESCRIPTION
    close() closes a file descriptor, so that it no longer refers to
    any file and may be reused.

    When the last file descriptor referring to an underlying open file
    description (see open(2)) is closed, the resources associated with
    the open file description are freed. If that open file description
    is the last reference to a file which has been removed using
    unlink(2), the file is deleted.

    When *any* file descriptor is closed, all record locks held by the
    *process*, on the file formerly referred to by that file
    descriptor, are released. This happens even if the file is still
    open in the process via a different file descriptor. See fcntl(2)
    for discussion of the consequences, and for alternatives with less
    surprising semantics.

    close() may report a *delayed error* from previous I/O operations
    on a file. When it does this, the file descriptor has still been
    closed, but the error needs to be handled. See RETURN VALUE,
    ERRORS, and NOTES for further discussion of what the errors
    reported by close mean, and how to handle them.

    Despite the possibility of delayed errors, a successful close()
    does *not* guarantee that all data written to the file has been
    successfully saved to persistent storage. If you need such a
    guarantee, use fsync(2); see that page for details.

    The close-on-exec file descriptor flag can be used to ensure that
    a file descriptor is automatically closed upon a successful
    execve(2); see fcntl(2) for details.

RETURN VALUE
    close() returns zero if the descriptor has been closed and there
    were no delayed errors to report. It returns -1 if there was an
    error that prevented the file descriptor from being closed, *or*
    if the descriptor has successfully been closed but there was a
    delayed error to report. The errno code can be used to distinguish
    them; see ERRORS and NOTES.

ERRORS
    EBADF  The fd argument was not a valid, open file descriptor.

    EINTR  The close() call was interrupted by a signal.
           The file descriptor *may or may not* have been closed,
           depending on the operating system. See “Signals and
           close(),” below.

    EINPROGRESS
           [POSIX.1-2024 only] The close() call was interrupted by a
           signal, after the file descriptor number was released for
           reuse, but before all clean-up work had been completed.
           The file descriptor has been closed, and a delayed error
           may have been lost. See “Signals and close(),” below.

    EIO
    ESTALE
    EDQUOT
    EFBIG
    ENOSPC
           These error codes indicate a delayed error from a previous
           write(2) operation. The file descriptor has been closed,
           but the error needs to be handled. See “Delayed errors
           reported by close()”, below.

    Depending on the underlying file and/or file system, close() may
    return with other errno codes besides those listed. All such codes
    also indicate delayed errors.

NOTES
  Multithreaded processes and close()
    In a multithreaded program, each thread must take care not to
    accidentally close file descriptors that are in use by other
    threads. Because system calls that *open* files, sockets, etc.
    always allocate the lowest file descriptor number that’s not in
    use, file descriptor numbers are rapidly reused. Closing an fd
    that another thread is still using is therefore likely to cause
    data to be read or written to the wrong place.

    Sometimes programs *deliberately* close a file descriptor that is
    in use by another thread, intending to cancel any blocking I/O
    operation that the other thread is performing. Whether this works
    depends on the operating system. On Linux, it doesn’t work; a
    blocking I/O system call holds a direct reference to the
    underlying open file description that is the target of the I/O,
    and is unaffected by the program closing the file descriptor that
    was used to initiate the I/O operation. (See open(2) for a
    discussion of open file descriptions.)

  Delayed errors reported by close()
    In a variety of situations, most notably when writing to a file
    that is hosted on a network file server, write(2) operations may
    “optimistically” return successfully as soon as the write has been
    queued for processing. close(2) waits for confirmation that *most*
    of the processing for previous writes to a file has been
    completed, and reports any errors that the earlier write() calls
    *would have* reported, if they hadn’t returned optimistically.
    Especially, close() will report “disk full” (ENOSPC) and “disk
    quota exceeded” (EDQUOT) errors that write() didn’t wait for.

    (To wait for *all* processing to complete, it is necessary to use
    fsync(2) as well.)

    Because of these delayed errors, it’s important to check the
    return value of close() and handle any errors it reports.
    Ignoring delayed errors can cause silent loss of data.

    However, when handling delayed errors, keep in mind that the
    close() call should *not* be repeated. When close() has a delayed
    error to report, it still closes the file before returning. The
    file descriptor number might already have been reused for some
    other file, especially in multithreaded programs. To make another
    attempt at the failed writes, it’s necessary to reopen the file
    and start all over again.

    [QUERY: Do delayed errors ever happen in any of these situations?

       - The fd is not the last reference to the open file description

       - The OFD was opened with O_RDONLY

       - The OFD was opened with O_RDWR but has never actually
         been written to

       - No data has been written to the OFD since the last call to
         fsync() for that OFD

       - No data has been written to the OFD since the last call to
         fdatasync() for that OFD

     If we can give some guidance about when people don’t need to
     worry about delayed errors, it would be helpful.]

  Signals and close()
    close() waits for various I/O operations to complete; it is a
    blocking system call, which can be interrupted by signals and
    thread cancellation. As usual, when close() is interrupted by a
    signal, it returns -1 and sets errno to EINTR. Unlike most system
    calls that can be interrupted by signals, it is not safe to repeat
    an interrupted call to close().

    Prior to POSIX.1-2024, when a close() was interrupted by a signal,
    it was *unspecified* whether the file descriptor was still open
    afterward. The authors of this manpage are aware of both systems
    where the file descriptor is guaranteed to still be open after an
    interrupted close(), e.g. HP-UX, and systems where it is
    guaranteed to be *closed* after an interrupted close(), e.g. Linux
    and FreeBSD.

    POSIX.1-2024 makes stricter requirements; operating systems should
    now return EINPROGRESS, rather than EINTR, when close() is
    interrupted before it’s completely done, but after the file
    descriptor number is released for reuse. As usual, though, it will
    be a long time before portable code can safely assume all
    supported systems are compliant with this new requirement.

    Regardless of the error code, on systems where an interrupted
    close() cannot be retried, an interruption means that delayed
    errors may be lost, and in turn *that* means data might silently
    be lost. Therefore, we strongly recommend that programmers avoid
    allowing close() to be interrupted by signals in the first place.
    This can be done in all the usual ways: use only signal handlers
    installed by sigaction(2) with the SA_RESTART flag, keep signals
    blocked at all times except during calls to ppoll(2), dedicate a
    thread to signal handling, etc.

    [QUERY: Do we know if close() is allowed to block or report
     delayed errors when no data has been written to the OFD since the
     last completed fsync() or fdatasync() on that OFD? If it isn’t
     allowed to block or report delayed errors in that case, another
     good recommendation would be to always use at least fdatasync()
     and let *that* be the thing that gets interrupted by signals.
     The POSIX.1-2024 RATIONALE section makes a very similar
     recommendation, but doesn’t appear to back that up with normative
     requirements on close().]

STANDARDS
    POSIX.1-2024.

HISTORY
    The close() system call was present in Unix V7.

    POSIX.1-2024 clarified the semantics of delayed errors; prior
    to that revision, it was unspecified whether a close() call
    that returned a delayed error would close the file descriptor.
    However, we are not aware of any systems where it didn’t.
SEE ALSO close_range(2), fcntl(2), fsync(2), fdatasync(2), shutdown(2), unlink(2), open(2), read(2), write(2), fopen(3), fclose(3) ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-23 0:33 ` Zack Weinberg @ 2026-01-23 1:02 ` Alejandro Colomar 2026-01-23 1:38 ` Al Viro 2026-01-23 14:05 ` Zack Weinberg 2026-01-24 19:34 ` The 8472 1 sibling, 2 replies; 23+ messages in thread From: Alejandro Colomar @ 2026-01-23 1:02 UTC (permalink / raw) To: Zack Weinberg Cc: Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, Rich Felker, linux-fsdevel, linux-api, GNU libc development [-- Attachment #1: Type: text/plain, Size: 1764 bytes --] Hi Zack, On Thu, Jan 22, 2026 at 07:33:58PM -0500, Zack Weinberg wrote: [...] > This is a full top-to-bottom rewrite of the manpage; please speak > up if you don't like any of my changes to any of it, not just the > new stuff about delayed errors. It's written in freeform text for > ease of reading; I'll do proper troff markup after the text is > finalized. (Alejandro, do you have a preference between -man > and -mdoc markup?) Strong preference for man(7). [...] > ERRORS > EBADF The fd argument was not a valid, open file descriptor. > > EINTR The close() call was interrupted by a signal. > The file descriptor *may or may not* have been closed, > depending on the operating system. See “Signals and > close(),” below. Punctuation like commas should go outside of the quotes (yes, I know some styles do that, but we don't). [...] > STANDARDS > POSIX.1-2024. > > HISTORY > The close() system call was present in Unix V7. That would be simply stated as: V7. We could also document the first POSIX standard, as not all Unix APIs were standardized at the same time. Thus: V7, POSIX.1-1988. Thanks! Have a lovely night! Alex > > POSIX.1-2024 clarified the semantics of delayed errors; prior > to that revision, it was unspecified whether a close() call > that returned a delayed error would close the file descriptor. > However, we are not aware of any systems where it didn’t. 
> > SEE ALSO > close_range(2), fcntl(2), fsync(2), fdatasync(2), shutdown(2), > unlink(2), open(2), read(2), write(2), fopen(3), fclose(3) -- <https://www.alejandro-colomar.es> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-23 1:02 ` Alejandro Colomar @ 2026-01-23 1:38 ` Al Viro 2026-01-23 14:44 ` Alejandro Colomar 2026-01-23 14:05 ` Zack Weinberg 1 sibling, 1 reply; 23+ messages in thread From: Al Viro @ 2026-01-23 1:38 UTC (permalink / raw) To: Alejandro Colomar Cc: Zack Weinberg, Vincent Lefevre, Jan Kara, Christian Brauner, Rich Felker, linux-fsdevel, linux-api, GNU libc development On Fri, Jan 23, 2026 at 02:02:53AM +0100, Alejandro Colomar wrote: > > HISTORY > > The close() system call was present in Unix V7. > > That would be simply stated as: > > V7. > > We could also document the first POSIX standard, as not all Unix APIs > were standardized at the same time. Thus: > > V7, POSIX.1-1988. > > Thanks! 11/3/71 SYS CLOSE (II) NAME close -- close a file SYNOPSIS (file descriptor in r0) sys close / close = 6. DESCRIPTION Given a file descriptor such as returned from an open or creat call, close closes the associated file. A close of all files is automatic on exit, but since processes are limited to 10 simultaneously open files, close is necessary to programs which deal with many files. FILES SEE ALSO creat, open DIAGNOSTICS The error bit (c—bit) is set for an unknown file descriptor. BUGS OWNER ken, dmr That's V1 manual. In V3 we already get EBADF on unopened descriptor; in _all_ cases there close(N) ends up with descriptor N not opened. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-23 1:38 ` Al Viro @ 2026-01-23 14:44 ` Alejandro Colomar 0 siblings, 0 replies; 23+ messages in thread From: Alejandro Colomar @ 2026-01-23 14:44 UTC (permalink / raw) To: Al Viro Cc: Zack Weinberg, Vincent Lefevre, Jan Kara, Christian Brauner, Rich Felker, linux-fsdevel, linux-api, GNU libc development [-- Attachment #1: Type: text/plain, Size: 1455 bytes --] Hi Al, On Fri, Jan 23, 2026 at 01:38:59AM +0000, Al Viro wrote: > On Fri, Jan 23, 2026 at 02:02:53AM +0100, Alejandro Colomar wrote: > > > HISTORY > > > The close() system call was present in Unix V7. > > > > That would be simply stated as: > > > > V7. > > > > We could also document the first POSIX standard, as not all Unix APIs > > were standardized at the same time. Thus: > > > > V7, POSIX.1-1988. > > > > Thanks! > > 11/3/71 SYS CLOSE (II) > NAME close -- close a file > SYNOPSIS (file descriptor in r0) > sys close / close = 6. > DESCRIPTION Given a file descriptor such as returned from an open or > creat call, close closes the associated file. A close of > all files is automatic on exit, but since processes are > limited to 10 simultaneously open files, close is > necessary to programs which deal with many files. > FILES > SEE ALSO creat, open > DIAGNOSTICS The error bit (c—bit) is set for an unknown file > descriptor. > BUGS > OWNER ken, dmr > > That's V1 manual. In V3 we already get EBADF on unopened descriptor; > in _all_ cases there close(N) ends up with descriptor N not opened. Thanks! Then it should actually be V1, POSIX.1-1988. Let's not document the history change from V3, as those details are better documented as part of the V3 manual and reading the sources. Have a lovely day! Alex -- <https://www.alejandro-colomar.es> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-23 1:02 ` Alejandro Colomar 2026-01-23 1:38 ` Al Viro @ 2026-01-23 14:05 ` Zack Weinberg 1 sibling, 0 replies; 23+ messages in thread From: Zack Weinberg @ 2026-01-23 14:05 UTC (permalink / raw) To: Alejandro Colomar Cc: Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, Rich Felker, linux-fsdevel, linux-api, GNU libc development On Thu, Jan 22, 2026, at 8:02 PM, Alejandro Colomar wrote: > On Thu, Jan 22, 2026 at 07:33:58PM -0500, Zack Weinberg wrote: > [...] > >> (Alejandro, do you have a preference between -man >> and -mdoc markup?) > > Strong preference for man(7). OK. >> close(),” below. > > Punctuation like commas should go outside of the quotes (yes, I know > some styles do that, but we don't). Will correct. >> HISTORY >> The close() system call was present in Unix V7. > > That would be simply stated as: > > V7. Looking at other really old system calls (fork(), open(), read(), _exit(), link()), they all say "SVr4, 4.3BSD, POSIX.1-2001" and that's what this one said too, before I changed it. I think I'll put it back the way it was. zw ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-23 0:33 ` Zack Weinberg 2026-01-23 1:02 ` Alejandro Colomar @ 2026-01-24 19:34 ` The 8472 2026-01-24 21:39 ` Rich Felker 1 sibling, 1 reply; 23+ messages in thread From: The 8472 @ 2026-01-24 19:34 UTC (permalink / raw) To: Zack Weinberg, Alejandro Colomar Cc: Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, Rich Felker, linux-fsdevel, linux-api, GNU libc development On 23/01/2026 01:33, Zack Weinberg wrote: [...] > ERRORS > EBADF The fd argument was not a valid, open file descriptor. Unfortunately EBADF from FUSE is passed through unfiltered by the kernel on close[0], that makes it more difficult to reliably detect bugs relating to double-closes of file descriptors. [...] > Delayed errors reported by close() > > In a variety of situations, most notably when writing to a file > that is hosted on a network file server, write(2) operations may > “optimistically” return successfully as soon as the write has > been queued for processing. > > close(2) waits for confirmation that *most* of the processing > for previous writes to a file has been completed, and reports > any errors that the earlier write() calls *would have* reported, > if they hadn’t returned optimistically. Especially, close() > will report “disk full” (ENOSPC) and “disk quota exceeded” > (EDQUOT) errors that write() didn’t wait for. > > (To wait for *all* processing to complete, it is necessary to > use fsync(2) as well.) > > Because of these delayed errors, it’s important to check the > return value of close() and handle any errors it reports. > Ignoring delayed errors can cause silent loss of data. > > However, when handling delayed errors, keep in mind that the > close() call should *not* be repeated. When close() has a > delayed error to report, it still closes the file before > returning. 
The file descriptor number might already have been > reused for some other file, especially in multithreaded > programs. To make another attempt at the failed writes, it’s > necessary to reopen the file and start all over again. > > [QUERY: Do delayed errors ever happen in any of these situations? > > - The fd is not the last reference to the open file description > > - The OFD was opened with O_RDONLY > > - The OFD was opened with O_RDWR but has never actually > been written to > > - No data has been written to the OFD since the last call to > fsync() for that OFD > > - No data has been written to the OFD since the last call to > fdatasync() for that OFD > > If we can give some guidance about when people don’t need to > worry about delayed errors, it would be helpful.] > The Rust standard library team is also interested in this topic, there is lively discussion[1] whether it makes sense to surface errors from close at all. Our current default is to ignore them. It is my understanding that errors may not have happened yet at the time of close due to delayed writeback or additional descriptors pointing to the description, e.g. in a forked child, and thus close() is not a reliable mechanism for error detection and fsync() is the only available option. Some users do care specifically about the unusual behavior on NFS, and don't want to use a heavy hammer like fsync. It's unfortunate that there's no middle ground to get errors on an open file descriptor or initiate the NFS flush behavior without a full fsync. [0] https://lore.kernel.org/linux-fsdevel/1b946a20-5e8a-497e-96ef-f7b1e037edcb@infinite-source.de/ [1] https://github.com/rust-lang/libs-team/issues/705 ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-24 19:34 ` The 8472 @ 2026-01-24 21:39 ` Rich Felker 2026-01-24 21:57 ` The 8472 0 siblings, 1 reply; 23+ messages in thread From: Rich Felker @ 2026-01-24 21:39 UTC (permalink / raw) To: The 8472 Cc: Zack Weinberg, Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development On Sat, Jan 24, 2026 at 08:34:01PM +0100, The 8472 wrote: > On 23/01/2026 01:33, Zack Weinberg wrote: > > [...] > > > ERRORS > > EBADF The fd argument was not a valid, open file descriptor. > > Unfortunately EBADF from FUSE is passed through unfiltered by the kernel > on close[0], that makes it more difficult to reliably detect bugs relating > to double-closes of file descriptors. Wow, that's a nasty bug. Are the kernel folks not amenable to fixing it? I wonder if that could even have security implications. I think you could detect these fraudulent EBADFs (albeit not under conditions where there's a race bug) by performing fcntl/F_GETFD before close and knowing the EBADF from close is fake if fcntl didn't EBADF, but that seems like an unreasonable cost to work around FUSE behaving badly. Rich ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-24 21:39 ` Rich Felker @ 2026-01-24 21:57 ` The 8472 2026-01-25 15:37 ` Zack Weinberg 0 siblings, 1 reply; 23+ messages in thread From: The 8472 @ 2026-01-24 21:57 UTC (permalink / raw) To: Rich Felker Cc: Zack Weinberg, Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development On 24/01/2026 22:39, Rich Felker wrote: > On Sat, Jan 24, 2026 at 08:34:01PM +0100, The 8472 wrote: >> On 23/01/2026 01:33, Zack Weinberg wrote: >> >> [...] >> >>> ERRORS >>> EBADF The fd argument was not a valid, open file descriptor. >> >> Unfortunately EBADF from FUSE is passed through unfiltered by the kernel >> on close[0], that makes it more difficult to reliably detect bugs relating >> to double-closes of file descriptors. > > Wow, that's a nasty bug. Are the kernel folks not amenable to fixing > it? Not when I brought it up last time, no[0] > I wonder if that could even have security implications. I think > you could detect these fraudulent EBADFs (albeit not under conditions > where there's a race bug) by performing fcntl/F_GETFD before close and > knowing the EBADF from close is fake if fcntl didn't EBADF, but that > seems like an unreasonable cost to work around FUSE behaving badly. > > Rich That's pretty much the workaround[1] we use, but due to the extra syscall it's only done in debug builds. [0] https://lore.kernel.org/linux-fsdevel/1b946a20-5e8a-497e-96ef-f7b1e037edcb@infinite-source.de/ [1] https://github.com/rust-lang/rust/blob/021fc25b7a48f6051bee1e1f06c7a277e4de1cc9/library/std/src/sys/fs/unix.rs#L981-L999 ^ permalink raw reply [flat|nested] 23+ messages in thread
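The fcntl/F_GETFD workaround discussed in the two messages above might look roughly like this in C. It is a sketch, not the code linked at [1]: the enum and the function name checked_close() are hypothetical, and, as noted in the thread, the check is not reliable if another thread can open or close descriptors between the two system calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical result type for the sketch below. */
enum close_check {
    CLOSE_OK,                /* close() returned 0                          */
    CLOSE_FD_WAS_INVALID,    /* fd was not open: likely a double close      */
    CLOSE_ERROR_ON_VALID_FD  /* fd was open, but close() reported an error  */
};

/*
 * Probe fd with fcntl(F_GETFD) before closing it. That call fails with
 * EBADF only when fd is not an open descriptor, so an EBADF from close()
 * on a descriptor that passed the probe did not come from the descriptor
 * table (for example, it was passed through from a FUSE server).
 * close() is called exactly once; on CLOSE_ERROR_ON_VALID_FD, errno holds
 * the value set by close().
 */
enum close_check checked_close(int fd)
{
    int was_open = !(fcntl(fd, F_GETFD) == -1 && errno == EBADF);

    if (close(fd) == 0)
        return CLOSE_OK;
    if (errno == EBADF && !was_open)
        return CLOSE_FD_WAS_INVALID;
    return CLOSE_ERROR_ON_VALID_FD;
}
```

The probe costs one extra system call per close, which is the overhead both messages object to; a program could reasonably enable it only in debug builds.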
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-24 21:57 ` The 8472 @ 2026-01-25 15:37 ` Zack Weinberg 2026-01-26 8:51 ` Florian Weimer 2026-01-26 12:15 ` Jan Kara 0 siblings, 2 replies; 23+ messages in thread From: Zack Weinberg @ 2026-01-25 15:37 UTC (permalink / raw) To: The 8472, Rich Felker Cc: Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote: > On 24/01/2026 22:39, Rich Felker wrote: >> On Sat, Jan 24, 2026 at 08:34:01PM +0100, The 8472 wrote: >>> On 23/01/2026 01:33, Zack Weinberg wrote: >>> >>> [...] >>> >>>> ERRORS >>>> EBADF The fd argument was not a valid, open file descriptor. >>> >>> Unfortunately EBADF from FUSE is passed through unfiltered by the kernel >>> on close[0], that makes it more difficult to reliably detect bugs relating >>> to double-closes of file descriptors. >> >> Wow, that's a nasty bug. Are the kernel folks not amenable to fixing >> it? > > Not when I brought it up last time, no[0] > > [0] https://lore.kernel.org/linux-fsdevel/1b946a20-5e8a-497e-96ef-f7b1e037edcb@infinite-source.de/ It seems to me that Antonio Muscemi’s point is valid for *most* errno codes. Like, a whole lot of them exist just to give more information *to a human user* about the cause of an unrecoverable error. Take the list of “error codes that indicate a delayed error from a previous write(2) operation,” from a little later in the draft, for instance: there’s no plausible way for a *program* to react differently to EFBIG, EDQUOT, and ENOSPC, but we expect that the *user* will want to react differently, so we want different error messages for each, so they’re different error codes. It’s not a problem if the kernel produces an error code of this type that wasn’t in the official documented list, because the program doesn’t need to treat it specially. 
But EBADF is different; it has the very specific meaning “user space passed an invalid file descriptor to a system call,” which almost always indicates a *bug in the program*, and allowing that meaning to be diluted is not OK. It’s getting off topic for this conversation, but there’s a short list of other errno codes that indicate a specific situation that the *program* should respond to in a specific way (EAGAIN, EINTR, EINPROGRESS, EFAULT, and EPIPE are the only ones I can think of) and maybe it would spark a more constructive conversation on the kernel side if we presented a *comprehensive* list of errno codes that FUSE servers shouldn’t be allowed to produce with a specific rationale for each. >> Delayed errors reported by close() >> >> In a variety of situations, most notably when writing to a file >> that is hosted on a network file server, write(2) operations may >> “optimistically” return successfully as soon as the write has >> been queued for processing. >> >> close(2) waits for confirmation that *most* of the processing >> for previous writes to a file has been completed, and reports >> any errors that the earlier write() calls *would have* reported, >> if they hadn’t returned optimistically. Especially, close() >> will report “disk full” (ENOSPC) and “disk quota exceeded” >> (EDQUOT) errors that write() didn’t wait for. > > The Rust standard library team is also interested in this topic, there > is lively discussion[1] whether it makes sense to surface errors from > close at all. Our current default is to ignore them. > It is my understanding that errors may not have happened yet at > the time of close due to delayed writeback or additional descriptors > pointing to the description, e.g. in a forked child, and thus > close() is not a reliable mechanism for error detection and > fsync() is the only available option. 
> > [1] https://github.com/rust-lang/libs-team/issues/705 This is something I care about a lot as well, but I currently don’t have an *opinion*. To form an informed opinion, I need the answers to these questions: >> [QUERY: Do delayed errors ever happen in any of these situations? >> >> - The fd is not the last reference to the open file description >> >> - The OFD was opened with O_RDONLY >> >> - The OFD was opened with O_RDWR but has never actually >> been written to >> >> - No data has been written to the OFD since the last call to >> fsync() for that OFD >> >> - No data has been written to the OFD since the last call to >> fdatasync() for that OFD >> >> If we can give some guidance about when people don’t need to >> worry about delayed errors, it would be helpful.] In particular, I really hope delayed errors *aren’t* ever reported when you close a file descriptor that *isn’t* the last reference to its open file description, because the thread-safe way to close stdout without losing write errors[2] depends on that not happening. And whether the Rust stdlib can legitimately say “leaving aside the additional cost of calling fsync(), you do not *need* the error return from close() because you can call fsync() first,” depends on whether it’s actually true that you *won’t* ever get a delayed error from close() if you called fsync() first and didn’t do any more output in between (assume the fd has no duplicates here). I would not be surprised at all if those FUSE guys insisted on their right to make char msg[] = "soon I will be invincible\n"; int fd = open("/test-fuse-fs/test.txt", O_WRONLY, 0666); write(fd, msg, sizeof(msg) - 1); fsync(fd); close(fd); return an error *only* from the close, not the write or the fsync. And I also wouldn’t be surprised at all to find production NFS or SMB servers that did that. [2] https://stackoverflow.com/a/50865617 (third code block) zw ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-25 15:37 ` Zack Weinberg @ 2026-01-26 8:51 ` Florian Weimer 2026-01-26 12:15 ` Jan Kara 1 sibling, 0 replies; 23+ messages in thread From: Florian Weimer @ 2026-01-26 8:51 UTC (permalink / raw) To: Zack Weinberg Cc: The 8472, Rich Felker, Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development * Zack Weinberg: > In particular, I really hope delayed errors *aren’t* ever reported > when you close a file descriptor that *isn’t* the last reference > to its open file description, because the thread-safe way to close > stdout without losing write errors[2] depends on that not happening. > [2] https://stackoverflow.com/a/50865617 (third code block) Are you sure about that? It means that errors are never reported if a shell script redirects standard output over multiple commands. Thanks, Florian ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-25 15:37 ` Zack Weinberg 2026-01-26 8:51 ` Florian Weimer @ 2026-01-26 12:15 ` Jan Kara 2026-01-26 13:53 ` The 8472 1 sibling, 1 reply; 23+ messages in thread From: Jan Kara @ 2026-01-26 12:15 UTC (permalink / raw) To: Zack Weinberg Cc: The 8472, Rich Felker, Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development On Sun 25-01-26 10:37:01, Zack Weinberg wrote: > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote: > >> Delayed errors reported by close() > >> > >> In a variety of situations, most notably when writing to a file > >> that is hosted on a network file server, write(2) operations may > >> “optimistically” return successfully as soon as the write has > >> been queued for processing. > >> > >> close(2) waits for confirmation that *most* of the processing > >> for previous writes to a file has been completed, and reports > >> any errors that the earlier write() calls *would have* reported, > >> if they hadn’t returned optimistically. Especially, close() > >> will report “disk full” (ENOSPC) and “disk quota exceeded” > >> (EDQUOT) errors that write() didn’t wait for. > > > > The Rust standard library team is also interested in this topic, there > > is lively discussion[1] whether it makes sense to surface errors from > > close at all. Our current default is to ignore them. > > It is my understanding that errors may not have happened yet at > > the time of close due to delayed writeback or additional descriptors > > pointing to the description, e.g. in a forked child, and thus > > close() is not a reliable mechanism for error detection and > > fsync() is the only available option. > > > > [1] https://github.com/rust-lang/libs-team/issues/705 > > This is something I care about a lot as well, but I currently don’t > have an *opinion*. 
To form an informed opinion, I need the answers > to these questions: > > >> [QUERY: Do delayed errors ever happen in any of these situations? > >> > >> - The fd is not the last reference to the open file description > >> > >> - The OFD was opened with O_RDONLY > >> > >> - The OFD was opened with O_RDWR but has never actually > >> been written to > >> > >> - No data has been written to the OFD since the last call to > >> fsync() for that OFD > >> > >> - No data has been written to the OFD since the last call to > >> fdatasync() for that OFD > >> > >> If we can give some guidance about when people don’t need to > >> worry about delayed errors, it would be helpful.] > > In particular, I really hope delayed errors *aren’t* ever reported > when you close a file descriptor that *isn’t* the last reference > to its open file description, because the thread-safe way to close > stdout without losing write errors[2] depends on that not happening. So I've checked and in Linux ->flush callback for the file is called whenever you close a file descriptor (regardless whether there are other file descriptors pointing to the same file description) so it's upto filesystem implementation what it decides to do and which error it will return... Checking the implementations e.g. FUSE and NFS *will* return delayed writeback errors on *first* descriptor close even if there are other still open descriptors for the description AFAICS. > And whether the Rust stdlib can legitimately say “leaving aside the > additional cost of calling fsync(), you do not *need* the error return > from close() because you can call fsync() first,” depends on whether > it’s actually true that you *won’t* ever get a delayed error from > close() if you called fsync() first and didn’t do any more output in > between (assume the fd has no duplicates here). 
I would not be > surprised at all if those FUSE guys insisted on their right to make > > char msg[] = "soon I will be invincible\n"; > int fd = open("/test-fuse-fs/test.txt", O_WRONLY, 0666); > write(fd, msg, sizeof(msg) - 1); > fsync(fd); > close(fd); > > return an error *only* from the close, not the write or the fsync. So fsync(2) must make sure data is persistently stored and return error if it was not. Thus as a VFS person I'd consider it a filesystem bug if an error preventing reading data later was not returned from fsync(2). OTOH that doesn't necessarily mean that later close doesn't return an error - e.g. FUSE does communicate with the server on close that can fail and error can be returned. With this in mind let me now try to answer your remaining questions: > >> - The OFD was opened with O_RDONLY If the filesystem supports atime, close can in principle report that atime update failed. > >> - The OFD was opened with O_RDWR but has never actually > >> been written to The same as above but with inode mtime updates. > >> - No data has been written to the OFD since the last call to > >> fsync() for that OFD No writeback errors should happen in this case. As I wrote above I'd consider this a filesystem bug. > >> > >> - No data has been written to the OFD since the last call to > >> fdatasync() for that OFD Errors can happen because some inode metadata (in practice probably only inode time stamps) may still need to be written out. So in the cases described above (except for fsync()) you may get delayed errors on close. But since in all those cases no data is lost, I don't think 99.9% of applications care at all... Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-26 12:15 ` Jan Kara @ 2026-01-26 13:53 ` The 8472 2026-01-26 15:56 ` Jan Kara 0 siblings, 1 reply; 23+ messages in thread From: The 8472 @ 2026-01-26 13:53 UTC (permalink / raw) To: Jan Kara, Zack Weinberg Cc: The 8472, Rich Felker, Alejandro Colomar, Vincent Lefevre, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development On 26/01/2026 13:15, Jan Kara wrote: > On Sun 25-01-26 10:37:01, Zack Weinberg wrote: >> On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote: >> >>>> [QUERY: Do delayed errors ever happen in any of these situations? >>>> >>>> - The fd is not the last reference to the open file description >>>> >>>> - The OFD was opened with O_RDONLY >>>> >>>> - The OFD was opened with O_RDWR but has never actually >>>> been written to >>>> >>>> - No data has been written to the OFD since the last call to >>>> fsync() for that OFD >>>> >>>> - No data has been written to the OFD since the last call to >>>> fdatasync() for that OFD >>>> >>>> If we can give some guidance about when people don’t need to >>>> worry about delayed errors, it would be helpful.] >> >> In particular, I really hope delayed errors *aren’t* ever reported >> when you close a file descriptor that *isn’t* the last reference >> to its open file description, because the thread-safe way to close >> stdout without losing write errors[2] depends on that not happening. > > So I've checked and in Linux ->flush callback for the file is called > whenever you close a file descriptor (regardless whether there are other > file descriptors pointing to the same file description) so it's upto > filesystem implementation what it decides to do and which error it will > return... Checking the implementations e.g. FUSE and NFS *will* return > delayed writeback errors on *first* descriptor close even if there are > other still open descriptors for the description AFAICS. 
Regarding the "first", does that mean the errors only get delivered once? I.e. if a concurrent fork/exec happens for process spawning and the fork-child closes the file descriptors then this closing may basically receive the errors and the parent will not see them (unless additional errors happen)? Or if _any_ part of the program dups the descriptor and then closes it without reporting errors then all uses of those descriptor must consider error delivery on close to be unreliable? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-26 13:53 ` The 8472 @ 2026-01-26 15:56 ` Jan Kara 2026-01-26 16:43 ` Jeff Layton 0 siblings, 1 reply; 23+ messages in thread From: Jan Kara @ 2026-01-26 15:56 UTC (permalink / raw) To: The 8472 Cc: Jan Kara, Zack Weinberg, Rich Felker, Alejandro Colomar, Vincent Lefevre, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development, Jeff Layton On Mon 26-01-26 14:53:12, The 8472 wrote: > On 26/01/2026 13:15, Jan Kara wrote: > > On Sun 25-01-26 10:37:01, Zack Weinberg wrote: > > > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote: > > > > > [QUERY: Do delayed errors ever happen in any of these situations? > > > > > > > > > > - The fd is not the last reference to the open file description > > > > > > > > > > - The OFD was opened with O_RDONLY > > > > > > > > > > - The OFD was opened with O_RDWR but has never actually > > > > > been written to > > > > > > > > > > - No data has been written to the OFD since the last call to > > > > > fsync() for that OFD > > > > > > > > > > - No data has been written to the OFD since the last call to > > > > > fdatasync() for that OFD > > > > > > > > > > If we can give some guidance about when people don’t need to > > > > > worry about delayed errors, it would be helpful.] > > > > > > In particular, I really hope delayed errors *aren’t* ever reported > > > when you close a file descriptor that *isn’t* the last reference > > > to its open file description, because the thread-safe way to close > > > stdout without losing write errors[2] depends on that not happening. > > > > So I've checked and in Linux ->flush callback for the file is called > > whenever you close a file descriptor (regardless whether there are other > > file descriptors pointing to the same file description) so it's upto > > filesystem implementation what it decides to do and which error it will > > return... Checking the implementations e.g. 
FUSE and NFS *will* return > > delayed writeback errors on *first* descriptor close even if there are > > other still open descriptors for the description AFAICS. > Regarding the "first", does that mean the errors only get delivered once? I've added Jeff to CC who should be able to provide you with a more authoritative answer but AFAIK the answer is yes. E.g. NFS does: static int nfs_file_flush(struct file *file, fl_owner_t id) { ... /* Flush writes to the server and return any errors */ since = filemap_sample_wb_err(file->f_mapping); nfs_wb_all(inode); return filemap_check_wb_err(file->f_mapping, since); } which will writeback all outstanding data on the first close and report error if it happened. Following close has nothing to flush and thus no error to report. That being said if you call fsync(2) you'll still get the error back again because fsync uses a separate writeback error counter in the file description. But again only the first fsync(2) will return the error. Following fsyncs will report no error. > I.e. if a concurrent fork/exec happens for process spawning and the > fork-child closes the file descriptors then this closing may basically > receive the errors and the parent will not see them (unless additional > errors happen)? Correct AFAICT. > Or if _any_ part of the program dups the descriptor and then closes it > without reporting errors then all uses of those descriptor must consider > error delivery on close to be unreliable? Correct as well AFAICT. I should probably also add that traditional filesystems (classical local disk based filesystems) don't bother with reporting delayed errors on close(2) *at all*. So unless you call fsync(2) you will never learn there was any writeback error. After all for these filesystems there are good chances writeback didn't even start by the time you are calling close(2). So overall I'd say that error reporting from close(2) is so random and filesystem dependent that the errors are not worth paying attention to. 
If you really care about data integrity (and thus writeback errors) you must call fsync(2), in which case the kernel provides an at least somewhat consistent error-reporting story. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR
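[Editorial sketch, not part of the original message.] The advice above (call fsync(2) if you care about writeback errors, and do not rely on what close(2) returns) could look roughly like this in C. The helper names write_all() and save_and_close() are hypothetical, and ignoring the return value of close() is a design choice taken from this discussion rather than something any standard mandates.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Write the whole buffer, retrying after short writes and EINTR. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: write, fsync, then close.
 * Errors are taken from write() and fsync() only. The descriptor is
 * released by close() whatever it returns, so close() is never retried
 * and its result is deliberately not used for error reporting.
 * Returns 0 on success, -1 with errno set otherwise.
 */
int save_and_close(int fd, const char *buf, size_t len)
{
    int rc = 0;
    int saved_errno = 0;

    if (write_all(fd, buf, len) != 0 || fsync(fd) != 0) {
        rc = -1;
        saved_errno = errno;
    }
    (void)close(fd);
    if (rc != 0)
        errno = saved_errno;
    return rc;
}
```

A caller would treat a -1 return as "the data may not have reached stable storage"; a stricter variant might additionally assert that close() never fails with EBADF, which usually indicates a bookkeeping bug in the program.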
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-26 15:56 ` Jan Kara @ 2026-01-26 16:43 ` Jeff Layton 2026-01-26 23:01 ` Trevor Gross 0 siblings, 1 reply; 23+ messages in thread From: Jeff Layton @ 2026-01-26 16:43 UTC (permalink / raw) To: Jan Kara, The 8472 Cc: Zack Weinberg, Rich Felker, Alejandro Colomar, Vincent Lefevre, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development On Mon, 2026-01-26 at 16:56 +0100, Jan Kara wrote: > On Mon 26-01-26 14:53:12, The 8472 wrote: > > On 26/01/2026 13:15, Jan Kara wrote: > > > On Sun 25-01-26 10:37:01, Zack Weinberg wrote: > > > > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote: > > > > > > [QUERY: Do delayed errors ever happen in any of these situations? > > > > > > > > > > > > - The fd is not the last reference to the open file description > > > > > > > > > > > > - The OFD was opened with O_RDONLY > > > > > > > > > > > > - The OFD was opened with O_RDWR but has never actually > > > > > > been written to > > > > > > > > > > > > - No data has been written to the OFD since the last call to > > > > > > fsync() for that OFD > > > > > > > > > > > > - No data has been written to the OFD since the last call to > > > > > > fdatasync() for that OFD > > > > > > > > > > > > If we can give some guidance about when people don’t need to > > > > > > worry about delayed errors, it would be helpful.] > > > > > > > > In particular, I really hope delayed errors *aren’t* ever reported > > > > when you close a file descriptor that *isn’t* the last reference > > > > to its open file description, because the thread-safe way to close > > > > stdout without losing write errors[2] depends on that not happening. 
> > > > > > So I've checked and in Linux ->flush callback for the file is called > > > whenever you close a file descriptor (regardless whether there are other > > > file descriptors pointing to the same file description) so it's upto > > > filesystem implementation what it decides to do and which error it will > > > return... Checking the implementations e.g. FUSE and NFS *will* return > > > delayed writeback errors on *first* descriptor close even if there are > > > other still open descriptors for the description AFAICS. ...and I really wish they _didn't_. Reporting a writeback error on close is not particularly useful. Most filesystems don't require you to write back all data on a close(). A successful close() on those just means that no error has happened yet. Any application that cares about writeback errors needs to fsync(), full stop. > > Regarding the "first", does that mean the errors only get delivered once? > > I've added Jeff to CC who should be able to provide you with a more > authoritative answer but AFAIK the answer is yes. > > E.g. NFS does: > > static int > nfs_file_flush(struct file *file, fl_owner_t id) > { > ... > /* Flush writes to the server and return any errors */ > since = filemap_sample_wb_err(file->f_mapping); > nfs_wb_all(inode); > return filemap_check_wb_err(file->f_mapping, since); > } > > which will writeback all outstanding data on the first close and report > error if it happened. Following close has nothing to flush and thus no > error to report. > > That being said if you call fsync(2) you'll still get the error back again > because fsync uses a separate writeback error counter in the file > description. But again only the first fsync(2) will return the error. > Following fsyncs will report no error. > Note that NFS is "special" in that it will flush data on close() in order to maintain close-to-open cache consistency. 
Technically, what NFS is doing above is sampling the errseq_t in the mapping, then writing back any dirty data, and then checking for errors that happened since the sample. close() will only report writeback errors that happened within that window. If a preexisting writeback error occurred before "since" was sampled, then it won't report that here... which is weird, and another good argument for not reporting or checking for writeback errors at close(). > > I.e. if a concurrent fork/exec happens for process spawning and the > > fork-child closes the file descriptors then this closing may basically > > receive the errors and the parent will not see them (unless additional > > errors happen)? > > Correct AFAICT. > It will see them if it calls fsync(). Reporting on close() is iffy. > > Or if _any_ part of the program dups the descriptor and then closes it > > without reporting errors then all uses of those descriptor must consider > > error delivery on close to be unreliable? > > Correct as well AFAICT. > > I should probably also add that traditional filesystems (classical local > disk based filesystems) don't bother with reporting delayed errors on > close(2) *at all*. So unless you call fsync(2) you will never learn there > was any writeback error. After all for these filesystems there are good > chances writeback didn't even start by the time you are calling close(2). > So overall I'd say that error reporting from close(2) is so random and > filesystem dependent that the errors are not worth paying attention to. If > you really care about data integrity (and thus writeback errors) you must > call fsync(2) in which case the kernel provides at least somewhat > consistent error reporting story. > +1. tl;dr: the only useful error from close() is EBADF. -- Jeff Layton <jlayton@kernel.org>
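[Editorial sketch, not part of the original message.] The sample/write back/check window described above can be illustrated with a deliberately simplified userspace model. This is not the kernel's errseq_t implementation; struct toy_mapping and every toy_* function are hypothetical names invented for the illustration, and the model only keeps a counter plus the most recent error.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for a per-mapping writeback error cursor (assumption:
 * a plain counter is enough to show the windowing behaviour). */
struct toy_mapping {
    unsigned int seq;  /* bumped each time a writeback error is recorded */
    int last_err;      /* most recent error, stored as a negative errno */
};

void toy_record_error(struct toy_mapping *m, int err)
{
    m->seq++;
    m->last_err = err;
}

/* Analogue of sampling the cursor before starting writeback. */
unsigned int toy_sample(const struct toy_mapping *m)
{
    return m->seq;
}

/* Analogue of the check: report an error only if one was recorded
 * after the sample was taken. */
int toy_check(const struct toy_mapping *m, unsigned int since)
{
    return m->seq != since ? m->last_err : 0;
}

/* Model of a flush-on-close: sample, write back, check.
 * writeback_result is 0 or a negative errno from the simulated writeback. */
int toy_flush(struct toy_mapping *m, int writeback_result)
{
    unsigned int since = toy_sample(m);
    if (writeback_result != 0)
        toy_record_error(m, writeback_result);
    return toy_check(m, since);
}
```

In this model an error recorded before toy_flush() takes its sample is invisible to that flush, and an error reported by one flush is not reported again by the next, which matches the "only within that window" behaviour described in the message.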
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-26 16:43 ` Jeff Layton @ 2026-01-26 23:01 ` Trevor Gross 2026-01-27 0:49 ` Jeff Layton 0 siblings, 1 reply; 23+ messages in thread From: Trevor Gross @ 2026-01-26 23:01 UTC (permalink / raw) To: Jeff Layton, Jan Kara, The 8472 Cc: Zack Weinberg, Rich Felker, Alejandro Colomar, Vincent Lefevre, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development On Mon Jan 26, 2026 at 10:43 AM CST, Jeff Layton wrote: > On Mon, 2026-01-26 at 16:56 +0100, Jan Kara wrote: >> On Mon 26-01-26 14:53:12, The 8472 wrote: >> > On 26/01/2026 13:15, Jan Kara wrote: >> > > On Sun 25-01-26 10:37:01, Zack Weinberg wrote: >> > > > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote: >> > > > > > [QUERY: Do delayed errors ever happen in any of these situations? >> > > > > > >> > > > > > - The fd is not the last reference to the open file description >> > > > > > >> > > > > > - The OFD was opened with O_RDONLY >> > > > > > >> > > > > > - The OFD was opened with O_RDWR but has never actually >> > > > > > been written to >> > > > > > >> > > > > > - No data has been written to the OFD since the last call to >> > > > > > fsync() for that OFD >> > > > > > >> > > > > > - No data has been written to the OFD since the last call to >> > > > > > fdatasync() for that OFD >> > > > > > >> > > > > > If we can give some guidance about when people don’t need to >> > > > > > worry about delayed errors, it would be helpful.] >> > > > >> > > > In particular, I really hope delayed errors *aren’t* ever reported >> > > > when you close a file descriptor that *isn’t* the last reference >> > > > to its open file description, because the thread-safe way to close >> > > > stdout without losing write errors[2] depends on that not happening. 
>> > > >> > > So I've checked and in Linux ->flush callback for the file is called >> > > whenever you close a file descriptor (regardless whether there are other >> > > file descriptors pointing to the same file description) so it's upto >> > > filesystem implementation what it decides to do and which error it will >> > > return... Checking the implementations e.g. FUSE and NFS *will* return >> > > delayed writeback errors on *first* descriptor close even if there are >> > > other still open descriptors for the description AFAICS. > > ...and I really wish they _didn't_. > > Reporting a writeback error on close is not particularly useful. Most > filesystems don't require you to write back all data on a close(). A > successful close() on those just means that no error has happened yet. > > Any application that cares about writeback errors needs to fsync(), > full stop. Is there a good middle-ground solution here? It seems reasonable that an application may want to have different handling for errors expected during normal operation, such as a temporary network failure with NFS, compared to more catastrophic things like failure to write to disk. The reason cited around [1] for avoiding fsync is that it comes with a cost that, for many applications, may not be worth it unless you are dealing with NFS. I was wondering if it could be worth a new fcntl that provides this kind of "best effort" error checking behavior without having the strict requirements of fsync. In effect, to report the errors that you might currently get at close() before actually calling close() and losing the fd. Alternatively, it would be interesting to have a deferred fsync() that schedules a nonblocking sync event that can be polled for completion/errors, with flags to indicate immediate sync or allow automatic syncing as needed. But there is probably a better alternative to this complexity.
- Trevor [1]: https://github.com/rust-lang/libs-team/issues/705
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-26 23:01 ` Trevor Gross @ 2026-01-27 0:49 ` Jeff Layton 2026-01-28 16:58 ` Zack Weinberg 0 siblings, 1 reply; 23+ messages in thread From: Jeff Layton @ 2026-01-27 0:49 UTC (permalink / raw) To: Trevor Gross, Jan Kara, The 8472 Cc: Zack Weinberg, Rich Felker, Alejandro Colomar, Vincent Lefevre, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development On Mon, 2026-01-26 at 17:01 -0600, Trevor Gross wrote: > On Mon Jan 26, 2026 at 10:43 AM CST, Jeff Layton wrote: > > On Mon, 2026-01-26 at 16:56 +0100, Jan Kara wrote: > > > On Mon 26-01-26 14:53:12, The 8472 wrote: > > > > On 26/01/2026 13:15, Jan Kara wrote: > > > > > On Sun 25-01-26 10:37:01, Zack Weinberg wrote: > > > > > > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote: > > > > > > > > [QUERY: Do delayed errors ever happen in any of these situations? > > > > > > > > > > > > > > > > - The fd is not the last reference to the open file description > > > > > > > > > > > > > > > > - The OFD was opened with O_RDONLY > > > > > > > > > > > > > > > > - The OFD was opened with O_RDWR but has never actually > > > > > > > > been written to > > > > > > > > > > > > > > > > - No data has been written to the OFD since the last call to > > > > > > > > fsync() for that OFD > > > > > > > > > > > > > > > > - No data has been written to the OFD since the last call to > > > > > > > > fdatasync() for that OFD > > > > > > > > > > > > > > > > If we can give some guidance about when people don’t need to > > > > > > > > worry about delayed errors, it would be helpful.] > > > > > > > > > > > > In particular, I really hope delayed errors *aren’t* ever reported > > > > > > when you close a file descriptor that *isn’t* the last reference > > > > > > to its open file description, because the thread-safe way to close > > > > > > stdout without losing write errors[2] depends on that not happening. 
> > > > > > > > > > So I've checked and in Linux ->flush callback for the file is called > > > > > whenever you close a file descriptor (regardless whether there are other > > > > > file descriptors pointing to the same file description) so it's upto > > > > > filesystem implementation what it decides to do and which error it will > > > > > return... Checking the implementations e.g. FUSE and NFS *will* return > > > > > delayed writeback errors on *first* descriptor close even if there are > > > > > other still open descriptors for the description AFAICS. > > > > ...and I really wish they _didn't_. > > > > Reporting a writeback error on close is not particularly useful. Most > > filesystems don't require you to write back all data on a close(). A > > successful close() on those just means that no error has happened yet. > > > > Any application that cares about writeback errors needs to fsync(), > > full stop. > > Is there a good middle ground solution here? > > It seems reasonable that an application may want to have different > handling for errors expected during normal operation, such as temporary > network failure with NFS, compared to more catastrophic things like > failure to write to disk. The reason cited around [1] for avoiding fsync > is that it comes with a cost that, for many applications, may not be > worth it unless you are dealing with NFS. > > I was wondering if it could be worth a new fnctl that provides this kind > of "best effort" error checking behavior without having the strict > requirements of fsync. In effect, to report the errors that you might > currently get at close() before actually calling close() and losing the > fd. > For a long-held fd, I can see the appeal: spray writes at it and just check occasionally (without blocking) that nothing has gone wrong. Maybe when things are idle, you fsync(). A new fcntl(..., F_CHECKERR, ...) 
command that does a file_check_and_advance_wb_err() on the fd and reports the result would be pretty straightforward. Would that be helpful for your use-case? This would be like a non-blocking fsync that just reports whether an error has occurred since the last F_CHECKERR or fsync(). > Alternatively, it would be interesting to have a deferred fsync() that > schedules a nonblocking sync event that can be polled for completion/ > errors, with flags to indicate immediate sync or allow automatic syncing > as needed. But there is probably a better alternative to this > complexity. > > [1]: https://github.com/rust-lang/libs-team/issues/705 Aside from the polling, I suppose you could effectively do this with io_uring. I'm pretty sure you can issue an fsync() or sync_file_range() that way, but I think it just ends up blocking a kernel thread until writeback is done. We've had people ask for a non-blocking fsync before. Maybe it's time to get serious about adding one. What would such a thing look like? It would be pretty simple to add a new fcntl(..., F_DATAWRITE) command that kicks off writeback à la filemap_fdatawrite(). Then add fcntl(..., F_WB_CHECK), which could do a non-blocking version of filemap_fdatawait() and return whether any folios are still under writeback. If there is a writeback error, it can return that instead. The catch of course is that a polling mechanism like this could easily livelock. If there is a lot of memory pressure, it might always return that something is still under writeback, no matter how often you hammer F_CHECKERR. Maybe that's ok? You can always issue a blocking fsync() if you really need to draw a line in the sand. -- Jeff Layton <jlayton@kernel.org>
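[Editorial sketch, not part of the original message.] F_CHECKERR, F_DATAWRITE and F_WB_CHECK are proposals in this thread and do not exist, so they cannot be demonstrated. The closest split available today on Linux is probably to start writeback early with sync_file_range(2) and collect errors later with a blocking fsync(2). The helper names start_writeback() and finish_writeback() are hypothetical. Note that sync_file_range() is Linux-specific, gives no durability guarantee on its own, and may still block in some situations; only the final fsync() is relied on for error reporting here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to begin writing back the file's
 * dirty pages without waiting for them to finish.
 * offset 0 with nbytes 0 means "from the start of the file to its end".
 * This is treated as a hint, so "not supported here" is not an error.
 */
int start_writeback(int fd)
{
    if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE) == 0)
        return 0;
    if (errno == ENOSYS || errno == EINVAL || errno == ESPIPE)
        return 0;  /* no early start available; fsync() will do the work */
    return -1;
}

/*
 * Hypothetical helper: the blocking "line in the sand". Waits for
 * writeback and reports any writeback error through fsync().
 */
int finish_writeback(int fd)
{
    return fsync(fd);
}
```

A long-running writer might call start_writeback() after each batch of writes and finish_writeback() when idle or before closing. Unlike the proposed fcntl commands, this gives no way to poll for errors without blocking.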
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-27 0:49 ` Jeff Layton @ 2026-01-28 16:58 ` Zack Weinberg 2026-02-05 9:34 ` Jan Kara 0 siblings, 1 reply; 23+ messages in thread From: Zack Weinberg @ 2026-01-28 16:58 UTC (permalink / raw) To: Jeff Layton, Trevor Gross, Jan Kara, The 8472 Cc: Rich Felker, Alejandro Colomar, Vincent Lefevre, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development On Mon, Jan 26, 2026, at 7:49 PM, Jeff Layton wrote: > On Mon, 2026-01-26 at 17:01 -0600, Trevor Gross wrote: >> On Mon Jan 26, 2026 at 10:43 AM CST, Jeff Layton wrote: >> > On Mon, 2026-01-26 at 16:56 +0100, Jan Kara wrote: >> > > On Mon 26-01-26 14:53:12, The 8472 wrote: >> > > > On 26/01/2026 13:15, Jan Kara wrote: >> > > > > On Sun 25-01-26 10:37:01, Zack Weinberg wrote: >> > > > > > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote: ... >> > > > > > In particular, I really hope delayed errors *aren’t* ever reported >> > > > > > when you close a file descriptor that *isn’t* the last reference >> > > > > > to its open file description, because the thread-safe way to close >> > > > > > stdout without losing write errors[2] depends on that not happening. >> > > > > >> > > > > So I've checked and in Linux ->flush callback for the file is called >> > > > > whenever you close a file descriptor (regardless whether there are other >> > > > > file descriptors pointing to the same file description) so it's upto >> > > > > filesystem implementation what it decides to do and which error it will >> > > > > return... Checking the implementations e.g. FUSE and NFS *will* return >> > > > > delayed writeback errors on *first* descriptor close even if there are >> > > > > other still open descriptors for the description AFAICS. >> > >> > ...and I really wish they _didn't_. >> > >> > Reporting a writeback error on close is not particularly useful. Most >> > filesystems don't require you to write back all data on a close(). 
A >> > successful close() on those just means that no error has happened yet. >> > >> > Any application that cares about writeback errors needs to fsync(), >> > full stop. >> >> Is there a good middle ground solution here? ... >> I was wondering if it could be worth a new fnctl that provides this kind >> of "best effort" error checking behavior without having the strict >> requirements of fsync. In effect, to report the errors that you might >> currently get at close() before actually calling close() and losing the >> fd. ... > A new fcntl(..., F_CHECKERR, ...) command that does a > file_check_and_advance_wb_err() on the fd and reports the result would > be pretty straightforward. > > Would that be helpful for your use-case? This would be like a non- > blocking fsync that just reports whether an error has occurred since > the last F_CHECKERR or fsync(). I feel I need to point out that “should the kernel report errors on close()” and “should the kernel add a new API to make life better for programs that currently expect close() to report [some] errors” and “should the Rust standard library propagate errors produced by close() back up to the application” and “what should the close(2) manpage say about errors” are four different conversation topics. I am all in favor of moving toward a world where close() never fails and there’s _something_ that reports write errors like fsync() without also kicking your application off a performance cliff. But that’s not the world we live in today, and this thread started as a conversation about revising the close(2) manpage, and I’d kinda like to *finish* revising the manpage in, like, the next couple weeks, not several years from now :-) So I’d like to refocus on that topic. Given what Jan Kara said earlier... > Checking the implementations e.g. FUSE and NFS *will* return delayed > writeback errors on *first* descriptor close even if there are other > still open descriptors for the description AFAICS. ... 
> fsync(2) must make sure data is persistently stored and return error if > it was not. Thus as a VFS person I'd consider it a filesystem bug if an > error preveting reading data later was not returned from fsync(2). OTOH > that doesn't necessarily mean that later close doesn't return an error - > e.g. FUSE does communicate with the server on close that can fail and > error can be returned. > > With this in mind let me now try to answer your remaining questions: > >> >> - The OFD was opened with O_RDONLY > > If the filesystem supports atime, close can in principle report that atime > update failed. > >> >> - The OFD was opened with O_RDWR but has never actually >> >> been written to > > The same as above but with inode mtime updates. > >> >> - No data has been written to the OFD since the last call to >> >> fsync() for that OFD > > No writeback errors should happen in this case. As I wrote above I'd > consider this a filesystem bug. > >> >> >> >> - No data has been written to the OFD since the last call to >> >> fdatasync() for that OFD > > Errors can happen because some inode metadata (in practice probably only > inode time stamps) may still need to be written out. > > So in the cases described above (except for fsync()) you may get delayed > errors on close. But since in all those cases no data is lost, I don't > think 99.9% of applications care at all... ... regrettably I think this does mean the close(2) manpage still needs to tell people to watch out for errors, and should probably say that errors _can_ happen even if the file wasn’t written to, but are much less likely to be important in that case. And my “how to close stdout in a thread-safe manner” sample code is wrong, because I was wrong to think that the error reporting only happened on the _final_ close, when the OFD is destroyed. ... What happens if the close is implicit in a dup2() operation?
Here’s that erroneous “how to close stdout” fragment, with comments indicating what I thought could and could not fail at the time I wrote it:

    // These allocate new fds, which can always fail, e.g. because
    // the program already has too many files open.
    int new_stdout = open("/dev/null", O_WRONLY);
    if (new_stdout == -1) perror_exit("/dev/null");
    int old_stdout = dup(1);
    if (old_stdout == -1) perror_exit("dup(1)");

    flockfile(stdout);
    if (fflush(stdout)) perror_exit("stdout: write error");
    dup2(new_stdout, 1); // cannot fail, atomically replaces fd 1
    funlockfile(stdout);

    // this close may receive delayed write errors from previous writes
    // to stdout
    if (close(old_stdout)) perror_exit("stdout: write error");

    // this close cannot fail, because it only drops an alternative
    // reference to the open file description now installed as fd 1
    close(new_stdout);

Note in particular that the first close _operation_ on fd 1 is in consequence of dup2(new_stdout, 1). The dup2() manpage specifically says “the close is performed silently (i.e. any errors during the close are not reported by dup())” but, if stdout points to a file on an NFS mount, are those errors _lost_, or will they actually be reported by the subsequent close(old_stdout)?

Incidentally, the dup2() manpage has a very similar example in its NOTES section, also presuming that close only reports errors on the _final_ close, not when it “merely” drops reference >=2 to an OFD.

(I’m starting to think we need dup3(old, new, O_SWAP_FDS). Or is that already a thing somehow?)

zw
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-28 16:58 ` Zack Weinberg @ 2026-02-05 9:34 ` Jan Kara 0 siblings, 0 replies; 23+ messages in thread From: Jan Kara @ 2026-02-05 9:34 UTC (permalink / raw) To: Zack Weinberg Cc: Jeff Layton, Trevor Gross, Jan Kara, The 8472, Rich Felker, Alejandro Colomar, Vincent Lefevre, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development I've noticed we didn't reply to one question here: On Wed 28-01-26 11:58:07, Zack Weinberg wrote: > On Mon, Jan 26, 2026, at 7:49 PM, Jeff Layton wrote: > > Checking the implementations e.g. FUSE and NFS *will* return delayed > > writeback errors on *first* descriptor close even if there are other > > still open descriptors for the description AFAICS. > ... > > fsync(2) must make sure data is persistently stored and return error if > > it was not. Thus as a VFS person I'd consider it a filesystem bug if an > > error preveting reading data later was not returned from fsync(2). OTOH > > that doesn't necessarily mean that later close doesn't return an error - > > e.g. FUSE does communicate with the server on close that can fail and > > error can be returned. > > > > With this in mind let me now try to answer your remaining questions: > > > >> >> - The OFD was opened with O_RDONLY > > > > If the filesystem supports atime, close can in principle report that atime > > update failed. > > > >> >> - The OFD was opened with O_RDWR but has never actually > >> >> been written to > > > > The same as above but with inode mtime updates. > > > >> >> - No data has been written to the OFD since the last call to > >> >> fsync() for that OFD > > > > No writeback errors should happen in this case. As I wrote above I'd > > consider this a filesystem bug. 
> > > >> >> > >> >> - No data has been written to the OFD since the last call to > >> >> fdatasync() for that OFD > > > > Errors can happen because some inode metadata (in practice probably only > > inode time stamps) may still need to be written out. > > > > So in the cases described above (except for fsync()) you may get delayed > > errors on close. But since in all those cases no data is lost, I don't > > think 99.9% of applications care at all... > > ... regrettably I think this does mean the close(3) manpage still needs > to tell people to watch out for errors, and should probably say that > errors _can_ happen even if the file wasn’t written to, but are much > less likely to be important in that case. > > And my “how to close stdout in a thread-safe manner” sample code is > wrong, because I was wrong to think that the error reporting only > happened on the _final_ close, when the OFD is destroyed. > > ... What happens if the close is implicit in a dup2() operation? Here’s > that erroneous “how to close stdout” fragment, with comments > indicating what I thought could and could not fail at the time I wrote > it: > > // These allocate new fds, which can always fail, e.g. because > // the program already has too many files open. 
> int new_stdout = open("/dev/null", O_WRONLY); > if (new_stdout == -1) perror_exit("/dev/null"); > int old_stdout = dup(1); > if (old_stdout == -1) perror_exit("dup(1)"); > > flockfile(stdout); > if (fflush(stdout)) perror_exit("stdout: write error"); > dup2(new_stdout, 1); // cannot fail, atomically replaces fd 1 > funlockfile(stdout); > > // this close may receive delayed write errors from previous writes > // to stdout > if (close(old_stdout)) perror_exit("stdout: write error"); > > // this close cannot fail, because it only drops an alternative > // reference to the open file description now installed as fd 1 > close(new_stdout); > > Note in particular that the first close _operation_ on fd 1 is in > consequence of dup2(new_stdout, 1). The dup2() manpage specifically > says “the close is performed silently (i.e. any errors during the > close are not reported by dup()” but, if stdout points to a file on > an NFS mount, are those errors _lost_, or will they actually be > reported by the subsequent close(old_stdout)? It is simply lost (the error is propagated from the filesystem to VFS which just ignores it). > Incidentally, the dup2() manpage has a very similar example in its > NOTES section, also presuming that close only reports errors on the > _final_ close, not when it “merely” drops reference >=2 to an OFD. > > (I’m starting to think we need dup3(old, new, O_SWAP_FDS). Or is that > already a thing somehow?) I don't think a functionality like this currently exists. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 23+ messages in thread
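[Editorial sketch, not part of the original message.] Since errors from the implicit close inside dup2() are dropped, one way to avoid depending on them is to collect writeback errors with fsync(2) before replacing the descriptor. The following is a minimal sketch under that assumption; retire_fd() and retire_stdout() are hypothetical helper names, and treating EINVAL and EROFS from fsync() as "nothing to sync" (pipes, terminals, read-only filesystems) is a judgment call rather than established practice.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: collect pending writeback errors on fd with
 * fsync(), then atomically repoint fd at /dev/null using dup2().
 * Returns 0 on success, -1 with errno set otherwise. A writeback error
 * from fsync() is reported even though fd is still replaced.
 */
int retire_fd(int fd)
{
    int sync_err = 0;

    if (fsync(fd) != 0) {
        if (errno == EBADF)
            return -1;          /* not an open descriptor at all */
        if (errno != EINVAL && errno != EROFS)
            sync_err = errno;   /* a real writeback error */
    }

    int devnull = open("/dev/null", O_WRONLY);
    if (devnull == -1) {
        if (sync_err != 0)
            errno = sync_err;
        return -1;
    }
    if (dup2(devnull, fd) == -1) {
        int e = errno;
        (void)close(devnull);
        errno = e;
        return -1;
    }
    (void)close(devnull);       /* result not used for error reporting */

    if (sync_err != 0) {
        errno = sync_err;
        return -1;
    }
    return 0;
}

/* Hypothetical stdio wrapper: flush and retire fd 1 under the stream lock. */
int retire_stdout(void)
{
    int rc = 0;
    int saved_errno = 0;

    flockfile(stdout);
    if (fflush(stdout) != 0) {
        rc = -1;
        saved_errno = errno;
    }
    if (retire_fd(STDOUT_FILENO) != 0 && rc == 0) {
        rc = -1;
        saved_errno = errno;
    }
    funlockfile(stdout);

    if (rc != 0)
        errno = saved_errno;
    return rc;
}
```

The stream lock only excludes other stdio users. A thread calling write(1, ...) directly between the fsync() and the dup2() could still dirty data whose errors go unreported, and this sketch pays the full cost of fsync(), which is the trade-off discussed elsewhere in the thread.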
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 [not found] ` <20250517133251.GY1509@brightrain.aerifal.cx> [not found] ` <5jm7pblkwkhh4frqjptrw4ll4nwncn22ep2v7sli6kz5wxg5ik@pbnj6wfv66af> @ 2026-02-06 15:13 ` Vincent Lefevre 1 sibling, 0 replies; 23+ messages in thread From: Vincent Lefevre @ 2026-02-06 15:13 UTC (permalink / raw) To: Rich Felker Cc: Alejandro Colomar, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, libc-alpha On 2025-05-17 09:32:52 -0400, Rich Felker wrote: > On Fri, May 16, 2025 at 04:39:57PM +0200, Vincent Lefevre wrote: > > On 2025-05-16 09:05:47 -0400, Rich Felker wrote: > > > FWIW musl adopted the EINPROGRESS as soon as we were made aware of the > > > issue, and later changed it to returning 0 since applications > > > (particularly, any written prior to this interpretation) are prone to > > > interpret EINPROGRESS as an error condition rather than success and > > > possibly misinterpret it as meaning the fd is still open and valid to > > > pass to close again. > > > > If I understand correctly, this is a poor choice. POSIX.1-2024 says: > > > > ERRORS > > The close() and posix_close() functions shall fail if: > > [...] > > [EINPROGRESS] > > The function was interrupted by a signal and fildes was closed > > but the close operation is continuing asynchronously. > > > > But this does not mean that the asynchronous close operation will > > succeed. > > There are no asynchronous behaviors specified for there to be a > conformance distinction here. The only observable behaviors happen > instantly, mainly the release of the file descriptor and the process's > handle on the underlying resource. Abstractly, there is no async > operation that could succeed or fail. Sorry, this is old. But a consequence may be memory leak if something unexpected occurred during what was done asynchronously. There is no guarantee that *every* resource has been released. 
-- Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/> 100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/> Work: CR INRIA - computer arithmetic / Pascaline project (LIP, ENS-Lyon)