* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
[not found] ` <20260120190010.GF6263@brightrain.aerifal.cx>
@ 2026-01-20 20:05 ` Florian Weimer
0 siblings, 0 replies; 23+ messages in thread
From: Florian Weimer @ 2026-01-20 20:05 UTC
To: Rich Felker
Cc: Zack Weinberg, Alejandro Colomar, Vincent Lefevre, Jan Kara,
Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
GNU libc development
* Rich Felker:
> On Tue, Jan 20, 2026 at 07:39:48PM +0100, Florian Weimer wrote:
>> * Rich Felker:
>>
>> > On Tue, Jan 20, 2026 at 12:05:52PM -0500, Zack Weinberg wrote:
>> >> > On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote:
>> >> >> close() always succeeds. That is, after it returns, _fd_ has
>> >> >> always been disconnected from the open file it formerly referred
>> >> >> to, and its number can be recycled to refer to some other file.
>> >> >> Furthermore, if _fd_ was the last reference to the underlying
>> >> >> open file description, the resources associated with the open file
>> >> >> description will always have been scheduled to be released.
>> >> ...
>> >> >> EINPROGRESS
>> >> >> EINTR
>> >> >> There are no delayed errors to report, but the kernel is
>> >> >> still doing some clean-up work in the background. This
>> >> >> situation should be treated the same as if close() had
>> >> >> returned zero. Do not retry the close(), and do not report
>> >> >> an error to the user.
>> >> >
>> >> > Since this behavior for EINTR is non-conforming (and even prior to the
>> >> > POSIX 2024 update, it was contrary to the general semantics for EINTR,
>> >> > that no non-ignoreable side-effects have taken place), it should be
>> >> > noted that it's Linux/glibc-specific.
>> >>
>> >> I am prepared to take your word for it that POSIX says this is
>> >> non-conforming, but in that case, POSIX is wrong, and I will not be
>> >> convinced otherwise by any argument. Operations that release a
>> >> resource must always succeed.
>> >
>> > There are two conflicting requirements here:
>> >
>> > 1. Operations that release a resource must always succeed.
>> > 2. Failure with EINTR must not have side effects.
>> >
>> > The right conclusion is that operations that release resources must
>> > not be able to fail with EINTR. And that's how POSIX should have
>> > resolved the situation -- by getting rid of support for the silly
>> > legacy synchronous-tape-drive-rewinding behavior of close on some
>> > systems, and requiring close to succeed immediately with no waiting
>> > for anything.
>>
>> What about SO_LINGER? Isn't this relevant in context?
>
> shutdown should be used for this, not close. So that the acts of
> waiting for the operation to finish, and releasing the resource handle
> needed to observe if it's finished, are separate.
I think shutdown on TCP sockets is non-blocking under Linux. It doesn't
wait until the peer has acknowledged the FIN segment, as far as I
understand it. Other systems may behave differently.
>> As far as I know, there is no other way besides SO_LINGER to get
>> notification if the packet buffers are actually gone. If you don't use
>> it, memory can pile up in the kernel without the application's
>> knowledge.
>
> The way Linux's EINTR behaves, using close can't ensure this memory
> doesn't pile up, because on EINTR you lose the ability to wait for it.
Can't the application reliably avoid EINTR by blocking signals?
Thanks,
Florian
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
[not found] ` <20260120174659.GE6263@brightrain.aerifal.cx>
[not found] ` <lhubjio5dsb.fsf@oldenburg.str.redhat.com>
@ 2026-01-20 20:11 ` Paul Eggert
2026-01-20 20:35 ` Alejandro Colomar
2 siblings, 0 replies; 23+ messages in thread
From: Paul Eggert @ 2026-01-20 20:11 UTC
To: Rich Felker
Cc: Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development,
Zack Weinberg
On 2026-01-20 09:46, Rich Felker wrote:
> the job of the man pages absolutely is not "to tell people how to
> program". It's to document behaviors.
In practice man pages do both. When I type "man close" on GNU/Linux I
see text like the text quoted below, and as a C programmer I appreciate
getting advice like this when the situation is sufficiently tricky.
----
Any record locks (see fcntl(2)) held on the file it was associated with,
and owned by the process, are removed regardless of the file descriptor
that was used to obtain the lock. This has some unfortunate consequences
and one should be extra careful when using advisory record locking. See
fcntl(2) for discussion of the risks and consequences as well as for the
(probably preferred) open file description locks.
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
[not found] ` <20260120174659.GE6263@brightrain.aerifal.cx>
[not found] ` <lhubjio5dsb.fsf@oldenburg.str.redhat.com>
2026-01-20 20:11 ` Paul Eggert
@ 2026-01-20 20:35 ` Alejandro Colomar
2026-01-20 20:42 ` Alejandro Colomar
2 siblings, 1 reply; 23+ messages in thread
From: Alejandro Colomar @ 2026-01-20 20:35 UTC
To: Rich Felker
Cc: Zack Weinberg, Vincent Lefevre, Jan Kara, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development
Hi Rich, Zack,
On Tue, Jan 20, 2026 at 12:46:59PM -0500, Rich Felker wrote:
> On Tue, Jan 20, 2026 at 12:05:52PM -0500, Zack Weinberg wrote:
> > > On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote:
[...]
> > Now, the abstract correct behavior is secondary to the fact that we
> > know there are both systems where close should not be retried after
> > EINTR (Linux) and systems where the fd is still open after EINTR
> > (HP-UX). But it is my position that *portable code* should assume the
> > Linux behavior, because that is the safest option. If you assume the
> > HP-UX behavior on a machine that implements the Linux behavior, you
> > might close some unrelated file out from under yourself (probably but
> > not necessarily a different thread). If you assume the Linux behavior
> > on a machine that implements the HP-UX behavior, you have leaked a
> > file descriptor; the worst things that can do are much less severe.
>
> Unfortunately, regardless of what happens, code portable to old
> systems needs to avoid getting in the situation to begin with. By
> either not installing interrupting signal handlers or blocking EINTR
> around close.
[...]
> > > While I agree with all of this, I think the tone is way too
> > > proscriptive. The man pages are to document the behaviors, not tell
> > > people how to program.
> >
> > I could be persuaded to tone it down a little but in this case I think
> > the man page's job *is* to tell people how to program. We know lots of
> > existing code has gotten the fine details of close() wrong and we are
> > trying to document how to do it right.
>
> No, the job of the man pages absolutely is not "to tell people how to
> program". It's to document behaviors. They are not a programming
> tutorial. They are not polemic diatribes. They are unbiased statements
> of facts. Facts of what the standards say and what implementations do,
> that equip programmers with the knowledge they need to make their own
> informed decisions, rather than blindly following what someone who
> thinks they know better told them to do.
This reminds me a little bit of the realloc(p,0) fiasco of C89 and
glibc.
In most cases, I agree with you that manual pages are and should be
aseptic, but there are cases where I think the manual page needs to be
tutorial. Especially when there's such a mess, we need to both explain
all the possible behaviors (or at least mention them to some degree).
But for example, there's the case of realloc(p,0), where we have
a fiasco that was pushed by a compoundment of wrong decisions by the
C Committee, and prior to that from System V. We're a bit lucky that
C17 accidentally broke it so badly that we now have it as UB, and that
gives us the opportunity to fix it now (which BTW might also be the case
for close(2)).
In the case of realloc(3), I went and documented in the manual page that
glibc is broken, and that ISO C is also broken.
STANDARDS
malloc()
free()
calloc()
realloc()
C23, POSIX.1‐2024.
reallocarray()
POSIX.1‐2024.
realloc(p, 0)
The behavior of realloc(p, 0) in glibc doesn’t conform to
any of C99, C11, POSIX.1‐2001, POSIX.1‐2004, POSIX.1‐2008,
POSIX.1‐2013, POSIX.1‐2017, or POSIX.1‐2024. The C17
specification was changed to make it conforming, but that
specification made it impossible to write code that reli‐
ably determines if the input pointer is freed after real‐
loc(p, 0), and C23 changed it again to make this undefined
behavior, acknowledging that the C17 specification was
broad enough, so that undefined behavior wasn’t worse than
that.
reallocarray() suffers the same issues in glibc.
musl libc and the BSDs conform to all versions of ISO C
and POSIX.1.
gnulib provides the realloc‐posix module, which provides
wrappers realloc() and reallocarray() that conform to all
versions of ISO C and POSIX.1.
There’s a proposal to standardize the BSD behavior: https:
//www.open-std.org/jtc1/sc22/wg14/www/docs/n3621.txt.
HISTORY
malloc()
free()
calloc()
realloc()
POSIX.1‐2001, C89.
reallocarray()
glibc 2.26. OpenBSD 5.6, FreeBSD 11.0.
malloc() and related functions rejected sizes greater than
PTRDIFF_MAX starting in glibc 2.30.
free() preserved errno starting in glibc 2.33.
realloc(p, 0)
C89 was ambiguous in its specification of realloc(p, 0).
C99 partially fixed this.
The original implementation in glibc would have been con‐
forming to C99. However, and ironically, trying to comply
with C99 before the standard was released, glibc changed
its behavior in glibc 2.1.1 into something that ended up
not conforming to the final C99 specification (but this is
debated, as the wording of the standard seems self‐contra‐
dicting).
...
BUGS
Programmers would naturally expect by induction that
realloc(p, size) is consistent with free(p) and mal‐
loc(size), as that is the behavior in the general case.
This is not explicitly required by POSIX.1‐2024 or C11,
but all conforming implementations are consistent with
that.
The glibc implementation of realloc() is not consistent
with that, and as a consequence, it is dangerous to call
realloc(p, 0) in glibc.
A trivial workaround for glibc is calling it as
realloc(p, size?size:1).
The workaround for reallocarray() in glibc ——which shares
the same bug—— would be
reallocarray(p, n?n:1, size?size:1).
Apart from documenting that glibc and ISO C are broken, we document how
to best deal with it (see the last paragraph in BUGS). This is
necessary because I fear that just by documenting the different
behaviors, programmers would still not know what to do with that.
Just take into account that even several members of the committee don't
know how to deal with it.
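As a minimal sketch of the workaround quoted from BUGS above, a wrapper can
keep the size?size:1 idiom in one place. The name realloc_nonzero() is
hypothetical and does not exist in any libc; it only illustrates the idea
and assumes the caller is fine with getting a 1-byte allocation back when
it asks for 0 bytes.

```c
#include <stdlib.h>

/*
 * Hypothetical wrapper (illustration only, not a libc function).
 * Never passes a size of 0 to realloc(), so the result does not depend
 * on how the implementation treats realloc(p, 0).
 * On failure it returns NULL and p is still valid, as with realloc().
 */
static void *realloc_nonzero(void *p, size_t size)
{
    return realloc(p, size ? size : 1);
}
```

With this, realloc_nonzero(p, 0) behaves like realloc(p, 1): on success the
caller gets a non-null pointer that it later passes to free().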
I'd be willing to have something similar for close(2).
Have a lovely night!
Alex
P.S.: I have great news about realloc(p,0)! Microsoft is on-board with
the change. They told me they like the proposal, and are willing to
fix their realloc(3) implementation. They'll now conduct tests to make
sure it doesn't break anything too badly, and will come back to me with
any feedback they have from those tests.
I'll put the standards proposal for realloc(3) on hold, waiting for
Microsoft's feedback.
> > > Aside: the reason EINTR *has to* be specified this way is that pthread
> > > cancellation is aligned with EINTR. If EINTR were defined to have
> > > closed the fd, then acting on cancellation during close would also
> > > have closed the fd, but the cancellation handler would have no way to
> > > distinguish this, leading to a situation where you're forced to either
> > > leak fds or introduce a double-close vuln.
> >
> > The correct way to address this would be to make close() not be a
> > cancellation point.
>
> This would also be a desirable change, one I would support if other
> implementors are on-board with pushing for it.
>
> > > An outline of what I'd like to see instead:
> > >
> > > - Clear explanation of why double-close is a serious bug that must
> > > always be avoided. (I think we all agree on this.)
> > >
> > > - Statement that the historical Linux/glibc behavior and current POSIX
> > > requirement differ, without language that tries to paint the POSIX
> > > behavior as a HP-UX bug/quirk. Possibly citing real sources/history
> > > of the issue (Austin Group tracker items 529, 614; maybe others).
> > >
> > > - Consequence of just assuming the Linux behavior (fd leaks on
> > > conforming systems).
> > >
> > > - Consequences of assuming the POSIX behavior (double-close vulns on
> > > GNU/Linux, maybe others).
> > >
> > > - Survey of methods for avoiding the problem (ways to preclude EINTR,
> > > possibly ways to infer behavior, etc).
> >
> > This outline seems more or less reasonable to me but, if it's me
> > writing the text, I _will_ characterize what POSIX currently says
> > about EINTR returns from close() as a bug in POSIX. As far as I'm
> > concerned, that is a fact, not polemic.
> >
> > I have found that arguing with you in particular, Rich, is generally
> > not worth the effort. Therefore, unless you reply and _accept_ that
> > the final version of the close manpage will say that POSIX is buggy,
> > I am not going to write another version of this text, nor will I be
> > drawn into further debate.
>
> I will not accept that because it's a gross violation of the
> responsibility of document writing.
>
> Rich
--
<https://www.alejandro-colomar.es>
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-20 20:35 ` Alejandro Colomar
@ 2026-01-20 20:42 ` Alejandro Colomar
2026-01-23 0:33 ` Zack Weinberg
0 siblings, 1 reply; 23+ messages in thread
From: Alejandro Colomar @ 2026-01-20 20:42 UTC
To: Rich Felker
Cc: Zack Weinberg, Vincent Lefevre, Jan Kara, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development
On Tue, Jan 20, 2026 at 09:35:43PM +0100, Alejandro Colomar wrote:
> Hi Rich, Zack,
>
> On Tue, Jan 20, 2026 at 12:46:59PM -0500, Rich Felker wrote:
> > On Tue, Jan 20, 2026 at 12:05:52PM -0500, Zack Weinberg wrote:
> > > > On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote:
>
> [...]
>
> > > Now, the abstract correct behavior is secondary to the fact that we
> > > know there are both systems where close should not be retried after
> > > EINTR (Linux) and systems where the fd is still open after EINTR
> > > (HP-UX). But it is my position that *portable code* should assume the
> > > Linux behavior, because that is the safest option. If you assume the
> > > HP-UX behavior on a machine that implements the Linux behavior, you
> > > might close some unrelated file out from under yourself (probably but
> > > not necessarily a different thread). If you assume the Linux behavior
> > > on a machine that implements the HP-UX behavior, you have leaked a
> > > file descriptor; the worst things that can do are much less severe.
> >
> > Unfortunately, regardless of what happens, code portable to old
> > systems needs to avoid getting in the situation to begin with. By
> > either not installing interrupting signal handlers or blocking EINTR
> > around close.
>
> [...]
>
> > > > While I agree with all of this, I think the tone is way too
> > > > proscriptive. The man pages are to document the behaviors, not tell
> > > > people how to program.
> > >
> > > I could be persuaded to tone it down a little but in this case I think
> > > the man page's job *is* to tell people how to program. We know lots of
> > > existing code has gotten the fine details of close() wrong and we are
> > > trying to document how to do it right.
> >
> > No, the job of the man pages absolutely is not "to tell people how to
> > program". It's to document behaviors. They are not a programming
> > tutorial. They are not polemic diatribes. They are unbiased statements
> > of facts. Facts of what the standards say and what implementations do,
> > that equip programmers with the knowledge they need to make their own
> > informed decisions, rather than blindly following what someone who
> > thinks they know better told them to do.
>
> This reminds me a little bit of the realloc(p,0) fiasco of C89 and
> glibc.
>
> In most cases, I agree with you that manual pages are and should be
> aseptic, but there are cases where I think the manual page needs to be
> tutorial. Especially when there's such a mess, we need to both explain
> all the possible behaviors (or at least mention them to some degree).
... and guide programmers about how to best use the API.
I forgot to finish the sentence.
>
> But for example, there's the case of realloc(p,0), where we have
> a fiasco that was pushed by a compoundment of wrong decisions by the
> C Committee, and prior to that from System V. We're a bit lucky that
> C17 accidentally broke it so badly that we now have it as UB, and that
> gives us the opportunity to fix it now (which BTW might also be the case
> for close(2)).
>
> In the case of realloc(3), I went and documented in the manual page that
> glibc is broken, and that ISO C is also broken.
>
> STANDARDS
> malloc()
> free()
> calloc()
> realloc()
> C23, POSIX.1‐2024.
>
> reallocarray()
> POSIX.1‐2024.
>
> realloc(p, 0)
> The behavior of realloc(p, 0) in glibc doesn’t conform to
> any of C99, C11, POSIX.1‐2001, POSIX.1‐2004, POSIX.1‐2008,
> POSIX.1‐2013, POSIX.1‐2017, or POSIX.1‐2024. The C17
> specification was changed to make it conforming, but that
> specification made it impossible to write code that reli‐
> ably determines if the input pointer is freed after real‐
> loc(p, 0), and C23 changed it again to make this undefined
> behavior, acknowledging that the C17 specification was
> broad enough, so that undefined behavior wasn’t worse than
> that.
>
> reallocarray() suffers the same issues in glibc.
>
> musl libc and the BSDs conform to all versions of ISO C
> and POSIX.1.
>
> gnulib provides the realloc‐posix module, which provides
> wrappers realloc() and reallocarray() that conform to all
> versions of ISO C and POSIX.1.
>
> There’s a proposal to standardize the BSD behavior: https:
> //www.open-std.org/jtc1/sc22/wg14/www/docs/n3621.txt.
>
> HISTORY
> malloc()
> free()
> calloc()
> realloc()
> POSIX.1‐2001, C89.
>
> reallocarray()
> glibc 2.26. OpenBSD 5.6, FreeBSD 11.0.
>
> malloc() and related functions rejected sizes greater than
> PTRDIFF_MAX starting in glibc 2.30.
>
> free() preserved errno starting in glibc 2.33.
>
> realloc(p, 0)
> C89 was ambiguous in its specification of realloc(p, 0).
> C99 partially fixed this.
>
> The original implementation in glibc would have been con‐
> forming to C99. However, and ironically, trying to comply
> with C99 before the standard was released, glibc changed
> its behavior in glibc 2.1.1 into something that ended up
> not conforming to the final C99 specification (but this is
> debated, as the wording of the standard seems self‐contra‐
> dicting).
>
> ...
>
> BUGS
> Programmers would naturally expect by induction that
> realloc(p, size) is consistent with free(p) and mal‐
> loc(size), as that is the behavior in the general case.
> This is not explicitly required by POSIX.1‐2024 or C11,
> but all conforming implementations are consistent with
> that.
>
> The glibc implementation of realloc() is not consistent
> with that, and as a consequence, it is dangerous to call
> realloc(p, 0) in glibc.
>
> A trivial workaround for glibc is calling it as
> realloc(p, size?size:1).
>
> The workaround for reallocarray() in glibc ——which shares
> the same bug—— would be
> reallocarray(p, n?n:1, size?size:1).
>
>
> Apart from documenting that glibc and ISO C are broken, we document how
> to best deal with it (see the last paragraph in BUGS). This is
> necessary because I fear that just by documenting the different
> behaviors, programmers would still not know what to do with that.
> Just take into account that even several members of the committee don't
> know how to deal with it.
>
> I'd be willing to have something similar for close(2).
>
>
> Have a lovely night!
> Alex
>
> P.S.: I have great news about realloc(p,0)! Microsoft is on-board with
> the change. They told me they like the proposal, and are willing to
> fix their realloc(3) implementation. They'll now conduct tests to make
> sure it doesn't break anything too badly, and will come back to me with
> any feedback they have from those tests.
>
> I'll put the standards proposal for realloc(3) on hold, waiting for
> Microsoft's feedback.
>
> > > > Aside: the reason EINTR *has to* be specified this way is that pthread
> > > > cancellation is aligned with EINTR. If EINTR were defined to have
> > > > closed the fd, then acting on cancellation during close would also
> > > > have closed the fd, but the cancellation handler would have no way to
> > > > distinguish this, leading to a situation where you're forced to either
> > > > leak fds or introduce a double-close vuln.
> > >
> > > The correct way to address this would be to make close() not be a
> > > cancellation point.
> >
> > This would also be a desirable change, one I would support if other
> > implementors are on-board with pushing for it.
> >
> > > > An outline of what I'd like to see instead:
> > > >
> > > > - Clear explanation of why double-close is a serious bug that must
> > > > always be avoided. (I think we all agree on this.)
> > > >
> > > > - Statement that the historical Linux/glibc behavior and current POSIX
> > > > requirement differ, without language that tries to paint the POSIX
> > > > behavior as a HP-UX bug/quirk. Possibly citing real sources/history
> > > > of the issue (Austin Group tracker items 529, 614; maybe others).
> > > >
> > > > - Consequence of just assuming the Linux behavior (fd leaks on
> > > > conforming systems).
> > > >
> > > > - Consequences of assuming the POSIX behavior (double-close vulns on
> > > > GNU/Linux, maybe others).
> > > >
> > > > - Survey of methods for avoiding the problem (ways to preclude EINTR,
> > > > possibly ways to infer behavior, etc).
> > >
> > > This outline seems more or less reasonable to me but, if it's me
> > > writing the text, I _will_ characterize what POSIX currently says
> > > about EINTR returns from close() as a bug in POSIX. As far as I'm
> > > concerned, that is a fact, not polemic.
> > >
> > > I have found that arguing with you in particular, Rich, is generally
> > > not worth the effort. Therefore, unless you reply and _accept_ that
> > > the final version of the close manpage will say that POSIX is buggy,
> > > I am not going to write another version of this text, nor will I be
> > > drawn into further debate.
> >
> > I will not accept that because it's a gross violation of the
> > responsibility of document writing.
> >
> > Rich
>
> --
> <https://www.alejandro-colomar.es>
--
<https://www.alejandro-colomar.es>
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-20 20:42 ` Alejandro Colomar
@ 2026-01-23 0:33 ` Zack Weinberg
2026-01-23 1:02 ` Alejandro Colomar
2026-01-24 19:34 ` The 8472
0 siblings, 2 replies; 23+ messages in thread
From: Zack Weinberg @ 2026-01-23 0:33 UTC
To: Alejandro Colomar
Cc: Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner,
Rich Felker, linux-fsdevel, linux-api, GNU libc development
Alright, since it actually seems possible we might be having a
reasonable conversation about the close manpage now, I've done
another draft. I *think* this covers all the concerns expressed
so far. I am feeling somewhat more charitable toward the Austin
Group after close-reading the current POSIX spec for close,
so there is no BUGS section after all. In their shoes I would
still have disallowed EINTR returns from close altogether, but
I can see why they felt that was a step too far.
This is a full top-to-bottom rewrite of the manpage; please speak
up if you don't like any of my changes to any of it, not just the
new stuff about delayed errors. It's written in freeform text for
ease of reading; I'll do proper troff markup after the text is
finalized. (Alejandro, do you have a preference between -man
and -mdoc markup?)
Please note the [QUERY:] sections sprinkled throughout NOTES.
I would like to have answers to those questions for the final draft.
zw
NAME
close - close a file descriptor
LIBRARY
Standard C library (libc, -lc)
SYNOPSIS
#include <unistd.h>
int close(int fd);
DESCRIPTION
close() closes a file descriptor, so that it no longer refers
to any file and may be reused.
When the last file descriptor referring to an underlying open
file description (see open(2)) is closed, the resources
associated with the open file description are freed. If that
open file description is the last reference to a file which has
been removed using unlink(2), the file is deleted.
When *any* file descriptor is closed, all record locks held by
the *process*, on the file formerly referred to by that file
descriptor, are released. This happens even if the file is
still open in the process via a different file descriptor.
See fcntl(2) for discussion of the consequences, and for
alternatives with less surprising semantics.
close() may report a *delayed error* from previous I/O
operations on a file. When it does this, the file descriptor
has still been closed, but the error needs to be handled.
See RETURN VALUE, ERRORS, and NOTES for further discussion of
what the errors reported by close mean, and how to handle them.
Despite the possibility of delayed errors, a successful close()
does *not* guarantee that all data written to the file has been
successfully saved to persistent storage. If you need such a
guarantee, use fsync(2); see that page for details.
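The following is a minimal sketch of that recommendation, not a definitive
recipe. The helper name sync_and_close() is hypothetical. It assumes fd
refers to a regular file on a file system that supports fsync(2), and it
does not retry an fsync() that fails with EINTR.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only).
 * Flushes fd to storage with fsync(), then closes it exactly once.
 * Returns 0 on success, otherwise the first errno value encountered.
 * The descriptor is never closed a second time, whatever close() says.
 */
static int sync_and_close(int fd)
{
    int err = 0;

    if (fsync(fd) != 0)
        err = errno;            /* data may not have reached storage */

    if (close(fd) != 0 && err == 0)
        err = errno;            /* possibly a delayed error */

    return err;
}
```

The point of the ordering is that write errors surface from fsync() while
the descriptor is still open, so the caller can still act on them.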
The close-on-exec file descriptor flag can be used to ensure
that a file descriptor is automatically closed upon a
successful execve(2); see fcntl(2) for details.
RETURN VALUE
close() returns zero if the descriptor has been closed and
there were no delayed errors to report.
It returns -1 if there was an error that prevented the
file descriptor from being closed, *or* if the descriptor
has successfully been closed but there was a delayed error
to report. The errno code can be used to distinguish them;
see ERRORS and NOTES.
ERRORS
EBADF The fd argument was not a valid, open file descriptor.
EINTR The close() call was interrupted by a signal.
The file descriptor *may or may not* have been closed,
depending on the operating system. See “Signals and
close(),” below.
EINPROGRESS
[POSIX.1-2024 only] The close() call was interrupted by
a signal, after the file descriptor number was released
for reuse, but before all clean-up work had been
completed. The file descriptor has been closed,
and a delayed error may have been lost. See “Signals
and close(),” below.
EIO
ESTALE
EDQUOT
EFBIG
ENOSPC These error codes indicate a delayed error from a
previous write(2) operation. The file descriptor has
been closed, but the error needs to be handled.
See “Delayed errors reported by close()”, below.
Depending on the underlying file and/or file system, close()
may return with other errno codes besides those listed.
All such codes also indicate delayed errors.
NOTES
Multithreaded processes and close()
In a multithreaded program, each thread must take care not to
accidentally close file descriptors that are in use by other
threads. Because system calls that *open* files, sockets,
etc. always allocate the lowest file descriptor number that’s
not in use, file descriptor numbers are rapidly reused.
Closing an fd that another thread is still using is therefore
likely to cause data to be read or written to the wrong place.
Sometimes programs *deliberately* close a file descriptor that
is in use by another thread, intending to cancel any blocking
I/O operation that the other thread is performing. Whether
this works depends on the operating system. On Linux, it
doesn’t work; a blocking I/O system call holds a direct
reference to the underlying open file description that is the
target of the I/O, and is unaffected by the program closing the
file descriptor that was used to initiate the I/O operation.
(See open(2) for a discussion of open file descriptions.)
Delayed errors reported by close()
In a variety of situations, most notably when writing to a file
that is hosted on a network file server, write(2) operations may
“optimistically” return successfully as soon as the write has
been queued for processing.
close(2) waits for confirmation that *most* of the processing
for previous writes to a file has been completed, and reports
any errors that the earlier write() calls *would have* reported,
if they hadn’t returned optimistically. Especially, close()
will report “disk full” (ENOSPC) and “disk quota exceeded”
(EDQUOT) errors that write() didn’t wait for.
(To wait for *all* processing to complete, it is necessary to
use fsync(2) as well.)
Because of these delayed errors, it’s important to check the
return value of close() and handle any errors it reports.
Ignoring delayed errors can cause silent loss of data.
However, when handling delayed errors, keep in mind that the
close() call should *not* be repeated. When close() has a
delayed error to report, it still closes the file before
returning. The file descriptor number might already have been
reused for some other file, especially in multithreaded
programs. To make another attempt at the failed writes, it’s
necessary to reopen the file and start all over again.
[QUERY: Do delayed errors ever happen in any of these situations?
- The fd is not the last reference to the open file description
- The OFD was opened with O_RDONLY
- The OFD was opened with O_RDWR but has never actually
been written to
- No data has been written to the OFD since the last call to
fsync() for that OFD
- No data has been written to the OFD since the last call to
fdatasync() for that OFD
If we can give some guidance about when people don’t need to
worry about delayed errors, it would be helpful.]
Signals and close()
close() waits for various I/O operations to complete; it is a
blocking system call, which can be interrupted by signals and
thread cancellation. As usual, when close() is interrupted
by a signal, it returns -1 and sets errno to EINTR.
Unlike most system calls that can be interrupted by signals,
it is not safe to repeat an interrupted call to close().
Prior to POSIX.1-2024, when a close() was interrupted by a
signal, it was *unspecified* whether the file descriptor was
still open afterward. The authors of this manpage are aware
of both systems where the file descriptor is guaranteed to
still be open after an interrupted close(), e.g. HP-UX, and
systems where it is guaranteed to be *closed* after an
interrupted close(), e.g. Linux and FreeBSD.
POSIX.1-2024 makes stricter requirements; operating systems
should now return EINPROGRESS, rather than EINTR, when close()
is interrupted before it’s completely done, but after the file
descriptor number is released for reuse. As usual, though, it
will be a long time before portable code can safely assume
all supported systems are compliant with this new requirement.
Regardless of the error code, on systems where an interrupted
close() cannot be retried, an interruption means that delayed
errors may be lost, and in turn *that* means data might silently
be lost. Therefore, we strongly recommend that programmers
avoid allowing close() to be interrupted by signals in the
first place. This can be done in all the usual ways—use only
signal handlers installed by sigaction(2) with the SA_RESTART
flag, keep signals blocked at all times except during calls
to ppoll(2), dedicate a thread to signal handling, etc.
[QUERY: Do we know if close() is allowed to block or report delayed
errors when no data has been written to the OFD since the last
completed fsync() or fdatasync() on that OFD? If it isn’t
allowed to block or report delayed errors in that case, another
good recommendation would be to always use at least fdatasync()
and let *that* be the thing that gets interrupted by signals.
The POSIX.1-2024 RATIONALE section makes a very similar
recommendation, but doesn’t appear to back that up with
normative requirements on close().]
STANDARDS
POSIX.1-2024.
HISTORY
The close() system call was present in Unix V7.
POSIX.1-2024 clarified the semantics of delayed errors; prior
to that revision, it was unspecified whether a close() call
that returned a delayed error would close the file descriptor.
However, we are not aware of any systems where it didn’t.
SEE ALSO
close_range(2), fcntl(2), fsync(2), fdatasync(2), shutdown(2),
unlink(2), open(2), read(2), write(2), fopen(3), fclose(3)
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-23 0:33 ` Zack Weinberg
@ 2026-01-23 1:02 ` Alejandro Colomar
2026-01-23 1:38 ` Al Viro
2026-01-23 14:05 ` Zack Weinberg
2026-01-24 19:34 ` The 8472
1 sibling, 2 replies; 23+ messages in thread
From: Alejandro Colomar @ 2026-01-23 1:02 UTC (permalink / raw)
To: Zack Weinberg
Cc: Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner,
Rich Felker, linux-fsdevel, linux-api, GNU libc development
Hi Zack,
On Thu, Jan 22, 2026 at 07:33:58PM -0500, Zack Weinberg wrote:
[...]
> This is a full top-to-bottom rewrite of the manpage; please speak
> up if you don't like any of my changes to any of it, not just the
> new stuff about delayed errors. It's written in freeform text for
> ease of reading; I'll do proper troff markup after the text is
> finalized. (Alejandro, do you have a preference between -man
> and -mdoc markup?)
Strong preference for man(7).
[...]
> ERRORS
> EBADF The fd argument was not a valid, open file descriptor.
>
> EINTR The close() call was interrupted by a signal.
> The file descriptor *may or may not* have been closed,
> depending on the operating system. See “Signals and
> close(),” below.
Punctuation like commas should go outside of the quotes (yes, I know
some styles do that, but we don't).
[...]
> STANDARDS
> POSIX.1-2024.
>
> HISTORY
> The close() system call was present in Unix V7.
That would be simply stated as:
V7.
We could also document the first POSIX standard, as not all Unix APIs
were standardized at the same time. Thus:
V7, POSIX.1-1988.
Thanks!
Have a lovely night!
Alex
>
> POSIX.1-2024 clarified the semantics of delayed errors; prior
> to that revision, it was unspecified whether a close() call
> that returned a delayed error would close the file descriptor.
> However, we are not aware of any systems where it didn’t.
>
> SEE ALSO
> close_range(2), fcntl(2), fsync(2), fdatasync(2), shutdown(2),
> unlink(2), open(2), read(2), write(2), fopen(3), fclose(3)
--
<https://www.alejandro-colomar.es>
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-23 1:02 ` Alejandro Colomar
@ 2026-01-23 1:38 ` Al Viro
2026-01-23 14:44 ` Alejandro Colomar
2026-01-23 14:05 ` Zack Weinberg
1 sibling, 1 reply; 23+ messages in thread
From: Al Viro @ 2026-01-23 1:38 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Zack Weinberg, Vincent Lefevre, Jan Kara, Christian Brauner,
Rich Felker, linux-fsdevel, linux-api, GNU libc development
On Fri, Jan 23, 2026 at 02:02:53AM +0100, Alejandro Colomar wrote:
> > HISTORY
> > The close() system call was present in Unix V7.
>
> That would be simply stated as:
>
> V7.
>
> We could also document the first POSIX standard, as not all Unix APIs
> were standardized at the same time. Thus:
>
> V7, POSIX.1-1988.
>
> Thanks!
11/3/71 SYS CLOSE (II)
NAME close -- close a file
SYNOPSIS (file descriptor in r0)
sys close / close = 6.
DESCRIPTION Given a file descriptor such as returned from an open or
creat call, close closes the associated file. A close of
all files is automatic on exit, but since processes are
limited to 10 simultaneously open files, close is
necessary to programs which deal with many files.
FILES
SEE ALSO creat, open
DIAGNOSTICS The error bit (c-bit) is set for an unknown file
descriptor.
BUGS
OWNER ken, dmr
That's V1 manual. In V3 we already get EBADF on unopened descriptor;
in _all_ cases there close(N) ends up with descriptor N not opened.
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-23 1:02 ` Alejandro Colomar
2026-01-23 1:38 ` Al Viro
@ 2026-01-23 14:05 ` Zack Weinberg
1 sibling, 0 replies; 23+ messages in thread
From: Zack Weinberg @ 2026-01-23 14:05 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner,
Rich Felker, linux-fsdevel, linux-api, GNU libc development
On Thu, Jan 22, 2026, at 8:02 PM, Alejandro Colomar wrote:
> On Thu, Jan 22, 2026 at 07:33:58PM -0500, Zack Weinberg wrote:
> [...]
>
>> (Alejandro, do you have a preference between -man
>> and -mdoc markup?)
>
> Strong preference for man(7).
OK.
>> close(),” below.
>
> Punctuation like commas should go outside of the quotes (yes, I know
> some styles do that, but we don't).
Will correct.
>> HISTORY
>> The close() system call was present in Unix V7.
>
> That would be simply stated as:
>
> V7.
Looking at other really old system calls (fork(), open(), read(), _exit(), link()),
they all say "SVr4, 4.3BSD, POSIX.1-2001" and that's what this one said too,
before I changed it. I think I'll put it back the way it was.
zw
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-23 1:38 ` Al Viro
@ 2026-01-23 14:44 ` Alejandro Colomar
0 siblings, 0 replies; 23+ messages in thread
From: Alejandro Colomar @ 2026-01-23 14:44 UTC (permalink / raw)
To: Al Viro
Cc: Zack Weinberg, Vincent Lefevre, Jan Kara, Christian Brauner,
Rich Felker, linux-fsdevel, linux-api, GNU libc development
Hi Al,
On Fri, Jan 23, 2026 at 01:38:59AM +0000, Al Viro wrote:
> On Fri, Jan 23, 2026 at 02:02:53AM +0100, Alejandro Colomar wrote:
> > > HISTORY
> > > The close() system call was present in Unix V7.
> >
> > That would be simply stated as:
> >
> > V7.
> >
> > We could also document the first POSIX standard, as not all Unix APIs
> > were standardized at the same time. Thus:
> >
> > V7, POSIX.1-1988.
> >
> > Thanks!
>
> 11/3/71 SYS CLOSE (II)
> NAME close -- close a file
> SYNOPSIS (file descriptor in r0)
> sys close / close = 6.
> DESCRIPTION Given a file descriptor such as returned from an open or
> creat call, close closes the associated file. A close of
> all files is automatic on exit, but since processes are
> limited to 10 simultaneously open files, close is
> necessary to programs which deal with many files.
> FILES
> SEE ALSO creat, open
> DIAGNOSTICS The error bit (c-bit) is set for an unknown file
> descriptor.
> BUGS
> OWNER ken, dmr
>
> That's V1 manual. In V3 we already get EBADF on unopened descriptor;
> in _all_ cases there close(N) ends up with descriptor N not opened.
Thanks! Then it should actually be
V1, POSIX.1-1988.
Let's not document the history change from V3, as those details are
better documented as part of the V3 manual and reading the sources.
Have a lovely day!
Alex
--
<https://www.alejandro-colomar.es>
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-23 0:33 ` Zack Weinberg
2026-01-23 1:02 ` Alejandro Colomar
@ 2026-01-24 19:34 ` The 8472
2026-01-24 21:39 ` Rich Felker
1 sibling, 1 reply; 23+ messages in thread
From: The 8472 @ 2026-01-24 19:34 UTC (permalink / raw)
To: Zack Weinberg, Alejandro Colomar
Cc: Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner,
Rich Felker, linux-fsdevel, linux-api, GNU libc development
On 23/01/2026 01:33, Zack Weinberg wrote:
[...]
> ERRORS
> EBADF The fd argument was not a valid, open file descriptor.
Unfortunately EBADF from FUSE is passed through unfiltered by the kernel
on close[0], that makes it more difficult to reliably detect bugs relating
to double-closes of file descriptors.
[...]
> Delayed errors reported by close()
>
> In a variety of situations, most notably when writing to a file
> that is hosted on a network file server, write(2) operations may
> “optimistically” return successfully as soon as the write has
> been queued for processing.
>
> close(2) waits for confirmation that *most* of the processing
> for previous writes to a file has been completed, and reports
> any errors that the earlier write() calls *would have* reported,
> if they hadn’t returned optimistically. Especially, close()
> will report “disk full” (ENOSPC) and “disk quota exceeded”
> (EDQUOT) errors that write() didn’t wait for.
>
> (To wait for *all* processing to complete, it is necessary to
> use fsync(2) as well.)
>
> Because of these delayed errors, it’s important to check the
> return value of close() and handle any errors it reports.
> Ignoring delayed errors can cause silent loss of data.
>
> However, when handling delayed errors, keep in mind that the
> close() call should *not* be repeated. When close() has a
> delayed error to report, it still closes the file before
> returning. The file descriptor number might already have been
> reused for some other file, especially in multithreaded
> programs. To make another attempt at the failed writes, it’s
> necessary to reopen the file and start all over again.
>
> [QUERY: Do delayed errors ever happen in any of these situations?
>
> - The fd is not the last reference to the open file description
>
> - The OFD was opened with O_RDONLY
>
> - The OFD was opened with O_RDWR but has never actually
> been written to
>
> - No data has been written to the OFD since the last call to
> fsync() for that OFD
>
> - No data has been written to the OFD since the last call to
> fdatasync() for that OFD
>
> If we can give some guidance about when people don’t need to
> worry about delayed errors, it would be helpful.]
>
The Rust standard library team is also interested in this topic, there
is lively discussion[1] whether it makes sense to surface errors from
close at all. Our current default is to ignore them.
It is my understanding that errors may not have happened yet at
the time of close due to delayed writeback or additional descriptors
pointing to the description, e.g. in a forked child, and thus
close() is not a reliable mechanism for error detection and
fsync() is the only available option.
Some users do care specifically about the unusual behavior
on NFS, and don't want to use a heavy hammer like fsync. It's unfortunate
that there's no middle ground to get errors on an open file descriptor
or initiate the NFS flush behavior without a full fsync.
[0] https://lore.kernel.org/linux-fsdevel/1b946a20-5e8a-497e-96ef-f7b1e037edcb@infinite-source.de/
[1] https://github.com/rust-lang/libs-team/issues/705
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-24 19:34 ` The 8472
@ 2026-01-24 21:39 ` Rich Felker
2026-01-24 21:57 ` The 8472
0 siblings, 1 reply; 23+ messages in thread
From: Rich Felker @ 2026-01-24 21:39 UTC (permalink / raw)
To: The 8472
Cc: Zack Weinberg, Alejandro Colomar, Vincent Lefevre, Jan Kara,
Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
GNU libc development
On Sat, Jan 24, 2026 at 08:34:01PM +0100, The 8472 wrote:
> On 23/01/2026 01:33, Zack Weinberg wrote:
>
> [...]
>
> > ERRORS
> > EBADF The fd argument was not a valid, open file descriptor.
>
> Unfortunately EBADF from FUSE is passed through unfiltered by the kernel
> on close[0], that makes it more difficult to reliably detect bugs relating
> to double-closes of file descriptors.
Wow, that's a nasty bug. Are the kernel folks not amenable to fixing
it? I wonder if that could even have security implications. I think
you could detect these fraudulent EBADFs (albeit not under conditions
where there's a race bug) by performing fcntl/F_GETFD before close and
knowing the EBADF from close is fake if fcntl didn't return EBADF, but that
seems like an unreasonable cost to work around FUSE behaving badly.
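A minimal sketch of that check, for illustration only. The names
checked_close() and enum close_result are hypothetical. It assumes that
F_GETFD only consults the descriptor table and so cannot itself pick up an
error from the filesystem. As noted, it cannot help if another thread
closes fd between the two calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical result type for this sketch. */
enum close_result { CLOSE_OK, CLOSE_BAD_FD, CLOSE_FS_ERROR };

/* Hypothetical helper: tell a genuine "fd was not open" apart from an
 * error code handed back by the filesystem at close time. */
enum close_result checked_close(int fd)
{
    /* Probe the descriptor first.  EBADF here means fd really is not
     * open in this process, which usually indicates a double close. */
    if (fcntl(fd, F_GETFD) == -1 && errno == EBADF)
        return CLOSE_BAD_FD;

    if (close(fd) == 0)
        return CLOSE_OK;

    /* fd was valid a moment ago, so whatever close() reported, EBADF
     * included, came from the filesystem.  errno is left as close()
     * set it.  Do not retry. */
    return CLOSE_FS_ERROR;
}
```

The cost is one extra system call per close.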
Rich
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-24 21:39 ` Rich Felker
@ 2026-01-24 21:57 ` The 8472
2026-01-25 15:37 ` Zack Weinberg
0 siblings, 1 reply; 23+ messages in thread
From: The 8472 @ 2026-01-24 21:57 UTC (permalink / raw)
To: Rich Felker
Cc: Zack Weinberg, Alejandro Colomar, Vincent Lefevre, Jan Kara,
Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
GNU libc development
On 24/01/2026 22:39, Rich Felker wrote:
> On Sat, Jan 24, 2026 at 08:34:01PM +0100, The 8472 wrote:
>> On 23/01/2026 01:33, Zack Weinberg wrote:
>>
>> [...]
>>
>>> ERRORS
>>> EBADF The fd argument was not a valid, open file descriptor.
>>
>> Unfortunately EBADF from FUSE is passed through unfiltered by the kernel
>> on close[0], that makes it more difficult to reliably detect bugs relating
>> to double-closes of file descriptors.
>
> Wow, that's a nasty bug. Are the kernel folks not amenable to fixing
> it?
Not when I brought it up last time, no[0]
> I wonder if that could even have security implications. I think
> you could detect these fraudulent EBADFs (albeit not under conditions
> where there's a race bug) by performing fcntl/F_GETFD before close and
> knowing the EBADF from close is fake if fcntl didn't return EBADF, but that
> seems like an unreasonable cost to work around FUSE behaving badly.
>
> Rich
That's pretty much the workaround[1] we use, but due to the extra syscall it's
only done in debug builds.
[0] https://lore.kernel.org/linux-fsdevel/1b946a20-5e8a-497e-96ef-f7b1e037edcb@infinite-source.de/
[1] https://github.com/rust-lang/rust/blob/021fc25b7a48f6051bee1e1f06c7a277e4de1cc9/library/std/src/sys/fs/unix.rs#L981-L999
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-24 21:57 ` The 8472
@ 2026-01-25 15:37 ` Zack Weinberg
2026-01-26 8:51 ` Florian Weimer
2026-01-26 12:15 ` Jan Kara
0 siblings, 2 replies; 23+ messages in thread
From: Zack Weinberg @ 2026-01-25 15:37 UTC (permalink / raw)
To: The 8472, Rich Felker
Cc: Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development
On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote:
> On 24/01/2026 22:39, Rich Felker wrote:
>> On Sat, Jan 24, 2026 at 08:34:01PM +0100, The 8472 wrote:
>>> On 23/01/2026 01:33, Zack Weinberg wrote:
>>>
>>> [...]
>>>
>>>> ERRORS
>>>> EBADF The fd argument was not a valid, open file descriptor.
>>>
>>> Unfortunately EBADF from FUSE is passed through unfiltered by the kernel
>>> on close[0], that makes it more difficult to reliably detect bugs relating
>>> to double-closes of file descriptors.
>>
>> Wow, that's a nasty bug. Are the kernel folks not amenable to fixing
>> it?
>
> Not when I brought it up last time, no[0]
>
> [0] https://lore.kernel.org/linux-fsdevel/1b946a20-5e8a-497e-96ef-f7b1e037edcb@infinite-source.de/
It seems to me that Antonio Muscemi’s point is valid for *most* errno
codes. Like, a whole lot of them exist just to give more information
*to a human user* about the cause of an unrecoverable error. Take
the list of “error codes that indicate a delayed error from a previous
write(2) operation,” from a little later in the draft, for instance:
there’s no plausible way for a *program* to react differently to
EFBIG, EDQUOT, and ENOSPC, but we expect that the *user* will want
to react differently, so we want different error messages for each,
so they’re different error codes. It’s not a problem if the kernel
produces an error code of this type that wasn’t in the official
documented list, because the program doesn’t need to treat it specially.
But EBADF is different; it has the very specific meaning “user space
passed an invalid file descriptor to a system call,” which almost
always indicates a *bug in the program*, and allowing that meaning to
be diluted is not OK. It’s getting off topic for this conversation,
but there’s a short list of other errno codes that indicate a specific
situation that the *program* should respond to in a specific way
(EAGAIN, EINTR, EINPROGRESS, EFAULT, and EPIPE are the only ones
I can think of) and maybe it would spark a more constructive
conversation on the kernel side if we presented a *comprehensive*
list of errno codes that FUSE servers shouldn’t be allowed to produce
with a specific rationale for each.
>> Delayed errors reported by close()
>>
>> In a variety of situations, most notably when writing to a file
>> that is hosted on a network file server, write(2) operations may
>> “optimistically” return successfully as soon as the write has
>> been queued for processing.
>>
>> close(2) waits for confirmation that *most* of the processing
>> for previous writes to a file has been completed, and reports
>> any errors that the earlier write() calls *would have* reported,
>> if they hadn’t returned optimistically. Especially, close()
>> will report “disk full” (ENOSPC) and “disk quota exceeded”
>> (EDQUOT) errors that write() didn’t wait for.
>
> The Rust standard library team is also interested in this topic, there
> is lively discussion[1] whether it makes sense to surface errors from
> close at all. Our current default is to ignore them.
> It is my understanding that errors may not have happened yet at
> the time of close due to delayed writeback or additional descriptors
> pointing to the description, e.g. in a forked child, and thus
> close() is not a reliable mechanism for error detection and
> fsync() is the only available option.
>
> [1] https://github.com/rust-lang/libs-team/issues/705
This is something I care about a lot as well, but I currently don’t
have an *opinion*. To form an informed opinion, I need the answers
to these questions:
>> [QUERY: Do delayed errors ever happen in any of these situations?
>>
>> - The fd is not the last reference to the open file description
>>
>> - The OFD was opened with O_RDONLY
>>
>> - The OFD was opened with O_RDWR but has never actually
>> been written to
>>
>> - No data has been written to the OFD since the last call to
>> fsync() for that OFD
>>
>> - No data has been written to the OFD since the last call to
>> fdatasync() for that OFD
>>
>> If we can give some guidance about when people don’t need to
>> worry about delayed errors, it would be helpful.]
In particular, I really hope delayed errors *aren’t* ever reported
when you close a file descriptor that *isn’t* the last reference
to its open file description, because the thread-safe way to close
stdout without losing write errors[2] depends on that not happening.
And whether the Rust stdlib can legitimately say “leaving aside the
additional cost of calling fsync(), you do not *need* the error return
from close() because you can call fsync() first,” depends on whether
it’s actually true that you *won’t* ever get a delayed error from
close() if you called fsync() first and didn’t do any more output in
between (assume the fd has no duplicates here). I would not be
surprised at all if those FUSE guys insisted on their right to make
char msg[] = "soon I will be invincible\n";
int fd = open("/test-fuse-fs/test.txt", O_WRONLY, 0666);
write(fd, msg, sizeof(msg) - 1);
fsync(fd);
close(fd);
return an error *only* from the close, not the write or the fsync.
And I also wouldn’t be surprised at all to find production NFS or
SMB servers that did that.
[2] https://stackoverflow.com/a/50865617 (third code block)
zw
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-25 15:37 ` Zack Weinberg
@ 2026-01-26 8:51 ` Florian Weimer
2026-01-26 12:15 ` Jan Kara
1 sibling, 0 replies; 23+ messages in thread
From: Florian Weimer @ 2026-01-26 8:51 UTC (permalink / raw)
To: Zack Weinberg
Cc: The 8472, Rich Felker, Alejandro Colomar, Vincent Lefevre,
Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel,
linux-api, GNU libc development
* Zack Weinberg:
> In particular, I really hope delayed errors *aren’t* ever reported
> when you close a file descriptor that *isn’t* the last reference
> to its open file description, because the thread-safe way to close
> stdout without losing write errors[2] depends on that not happening.
> [2] https://stackoverflow.com/a/50865617 (third code block)
Are you sure about that? It means that errors are never reported if a
shell script redirects standard output over multiple commands.
Thanks,
Florian
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-25 15:37 ` Zack Weinberg
2026-01-26 8:51 ` Florian Weimer
@ 2026-01-26 12:15 ` Jan Kara
2026-01-26 13:53 ` The 8472
1 sibling, 1 reply; 23+ messages in thread
From: Jan Kara @ 2026-01-26 12:15 UTC (permalink / raw)
To: Zack Weinberg
Cc: The 8472, Rich Felker, Alejandro Colomar, Vincent Lefevre,
Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel,
linux-api, GNU libc development
On Sun 25-01-26 10:37:01, Zack Weinberg wrote:
> On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote:
> >> Delayed errors reported by close()
> >>
> >> In a variety of situations, most notably when writing to a file
> >> that is hosted on a network file server, write(2) operations may
> >> “optimistically” return successfully as soon as the write has
> >> been queued for processing.
> >>
> >> close(2) waits for confirmation that *most* of the processing
> >> for previous writes to a file has been completed, and reports
> >> any errors that the earlier write() calls *would have* reported,
> >> if they hadn’t returned optimistically. Especially, close()
> >> will report “disk full” (ENOSPC) and “disk quota exceeded”
> >> (EDQUOT) errors that write() didn’t wait for.
> >
> > The Rust standard library team is also interested in this topic, there
> > is lively discussion[1] whether it makes sense to surface errors from
> > close at all. Our current default is to ignore them.
> > It is my understanding that errors may not have happened yet at
> > the time of close due to delayed writeback or additional descriptors
> > pointing to the description, e.g. in a forked child, and thus
> > close() is not a reliable mechanism for error detection and
> > fsync() is the only available option.
> >
> > [1] https://github.com/rust-lang/libs-team/issues/705
>
> This is something I care about a lot as well, but I currently don’t
> have an *opinion*. To form an informed opinion, I need the answers
> to these questions:
>
> >> [QUERY: Do delayed errors ever happen in any of these situations?
> >>
> >> - The fd is not the last reference to the open file description
> >>
> >> - The OFD was opened with O_RDONLY
> >>
> >> - The OFD was opened with O_RDWR but has never actually
> >> been written to
> >>
> >> - No data has been written to the OFD since the last call to
> >> fsync() for that OFD
> >>
> >> - No data has been written to the OFD since the last call to
> >> fdatasync() for that OFD
> >>
> >> If we can give some guidance about when people don’t need to
> >> worry about delayed errors, it would be helpful.]
>
> In particular, I really hope delayed errors *aren’t* ever reported
> when you close a file descriptor that *isn’t* the last reference
> to its open file description, because the thread-safe way to close
> stdout without losing write errors[2] depends on that not happening.
So I've checked and in Linux ->flush callback for the file is called
whenever you close a file descriptor (regardless of whether there are other
file descriptors pointing to the same file description) so it's up to
filesystem implementation what it decides to do and which error it will
return... Checking the implementations e.g. FUSE and NFS *will* return
delayed writeback errors on *first* descriptor close even if there are
other still open descriptors for the description AFAICS.
> And whether the Rust stdlib can legitimately say “leaving aside the
> additional cost of calling fsync(), you do not *need* the error return
> from close() because you can call fsync() first,” depends on whether
> it’s actually true that you *won’t* ever get a delayed error from
> close() if you called fsync() first and didn’t do any more output in
> between (assume the fd has no duplicates here). I would not be
> surprised at all if those FUSE guys insisted on their right to make
>
> char msg[] = "soon I will be invincible\n";
> int fd = open("/test-fuse-fs/test.txt", O_WRONLY, 0666);
> write(fd, msg, sizeof(msg) - 1);
> fsync(fd);
> close(fd);
>
> return an error *only* from the close, not the write or the fsync.
So fsync(2) must make sure data is persistently stored and return error if
it was not. Thus as a VFS person I'd consider it a filesystem bug if an
error preventing the data from being read later was not returned from
fsync(2). OTOH that doesn't necessarily mean that a later close doesn't
return an error: e.g. FUSE does communicate with the server on close, and
that can fail and return an error.
With this in mind let me now try to answer your remaining questions:
> >> - The OFD was opened with O_RDONLY
If the filesystem supports atime, close can in principle report that atime
update failed.
> >> - The OFD was opened with O_RDWR but has never actually
> >> been written to
The same as above but with inode mtime updates.
> >> - No data has been written to the OFD since the last call to
> >> fsync() for that OFD
No writeback errors should happen in this case. As I wrote above I'd
consider this a filesystem bug.
> >>
> >> - No data has been written to the OFD since the last call to
> >> fdatasync() for that OFD
Errors can happen because some inode metadata (in practice probably only
inode time stamps) may still need to be written out.
So in the cases described above (except for fsync()) you may get delayed
errors on close. But since in all those cases no data is lost, I don't
think 99.9% of applications care at all...
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-26 12:15 ` Jan Kara
@ 2026-01-26 13:53 ` The 8472
2026-01-26 15:56 ` Jan Kara
0 siblings, 1 reply; 23+ messages in thread
From: The 8472 @ 2026-01-26 13:53 UTC (permalink / raw)
To: Jan Kara, Zack Weinberg
Cc: The 8472, Rich Felker, Alejandro Colomar, Vincent Lefevre,
Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
GNU libc development
On 26/01/2026 13:15, Jan Kara wrote:
> On Sun 25-01-26 10:37:01, Zack Weinberg wrote:
>> On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote:
>>
>>>> [QUERY: Do delayed errors ever happen in any of these situations?
>>>>
>>>> - The fd is not the last reference to the open file description
>>>>
>>>> - The OFD was opened with O_RDONLY
>>>>
>>>> - The OFD was opened with O_RDWR but has never actually
>>>> been written to
>>>>
>>>> - No data has been written to the OFD since the last call to
>>>> fsync() for that OFD
>>>>
>>>> - No data has been written to the OFD since the last call to
>>>> fdatasync() for that OFD
>>>>
>>>> If we can give some guidance about when people don’t need to
>>>> worry about delayed errors, it would be helpful.]
>>
>> In particular, I really hope delayed errors *aren’t* ever reported
>> when you close a file descriptor that *isn’t* the last reference
>> to its open file description, because the thread-safe way to close
>> stdout without losing write errors[2] depends on that not happening.
>
> So I've checked and in Linux ->flush callback for the file is called
> whenever you close a file descriptor (regardless of whether there are other
> file descriptors pointing to the same file description) so it's up to
> filesystem implementation what it decides to do and which error it will
> return... Checking the implementations e.g. FUSE and NFS *will* return
> delayed writeback errors on *first* descriptor close even if there are
> other still open descriptors for the description AFAICS.
Regarding the "first", does that mean the errors only get delivered once?
I.e. if a concurrent fork/exec happens for process spawning and the fork-child
closes the file descriptors then this closing may basically receive the errors
and the parent will not see them (unless additional errors happen)?
Or if _any_ part of the program dups the descriptor and then closes it without
reporting errors then all uses of that descriptor must consider error delivery
on close to be unreliable?
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-26 13:53 ` The 8472
@ 2026-01-26 15:56 ` Jan Kara
2026-01-26 16:43 ` Jeff Layton
0 siblings, 1 reply; 23+ messages in thread
From: Jan Kara @ 2026-01-26 15:56 UTC (permalink / raw)
To: The 8472
Cc: Jan Kara, Zack Weinberg, Rich Felker, Alejandro Colomar,
Vincent Lefevre, Alexander Viro, Christian Brauner, linux-fsdevel,
linux-api, GNU libc development, Jeff Layton
On Mon 26-01-26 14:53:12, The 8472 wrote:
> On 26/01/2026 13:15, Jan Kara wrote:
> > On Sun 25-01-26 10:37:01, Zack Weinberg wrote:
> > > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote:
> > > > > [QUERY: Do delayed errors ever happen in any of these situations?
> > > > >
> > > > > - The fd is not the last reference to the open file description
> > > > >
> > > > > - The OFD was opened with O_RDONLY
> > > > >
> > > > > - The OFD was opened with O_RDWR but has never actually
> > > > > been written to
> > > > >
> > > > > - No data has been written to the OFD since the last call to
> > > > > fsync() for that OFD
> > > > >
> > > > > - No data has been written to the OFD since the last call to
> > > > > fdatasync() for that OFD
> > > > >
> > > > > If we can give some guidance about when people don’t need to
> > > > > worry about delayed errors, it would be helpful.]
> > >
> > > In particular, I really hope delayed errors *aren’t* ever reported
> > > when you close a file descriptor that *isn’t* the last reference
> > > to its open file description, because the thread-safe way to close
> > > stdout without losing write errors[2] depends on that not happening.
> >
> > So I've checked and in Linux ->flush callback for the file is called
> > > whenever you close a file descriptor (regardless of whether there are other
> > > file descriptors pointing to the same file description) so it's up to
> > filesystem implementation what it decides to do and which error it will
> > return... Checking the implementations e.g. FUSE and NFS *will* return
> > delayed writeback errors on *first* descriptor close even if there are
> > other still open descriptors for the description AFAICS.
> Regarding the "first", does that mean the errors only get delivered once?
I've added Jeff to CC who should be able to provide you with a more
authoritative answer but AFAIK the answer is yes.
E.g. NFS does:
    static int
    nfs_file_flush(struct file *file, fl_owner_t id)
    {
        ...
        /* Flush writes to the server and return any errors */
        since = filemap_sample_wb_err(file->f_mapping);
        nfs_wb_all(inode);
        return filemap_check_wb_err(file->f_mapping, since);
    }
which will write back all outstanding data on the first close and report
an error if one happened. A following close has nothing to flush and thus
no error to report.
That being said, if you call fsync(2) you'll still get the error back again
because fsync uses a separate writeback error counter in the file
description. But again, only the first fsync(2) will return the error.
Following fsyncs will report no error.
> I.e. if a concurrent fork/exec happens for process spawning and the
> fork-child closes the file descriptors then this closing may basically
> receive the errors and the parent will not see them (unless additional
> errors happen)?
Correct AFAICT.
> Or if _any_ part of the program dups the descriptor and then closes it
> without reporting errors then all uses of those descriptor must consider
> error delivery on close to be unreliable?
Correct as well AFAICT.
I should probably also add that traditional filesystems (classical local
disk based filesystems) don't bother with reporting delayed errors on
close(2) *at all*. So unless you call fsync(2) you will never learn there
was any writeback error. After all for these filesystems there are good
chances writeback didn't even start by the time you are calling close(2).
So overall I'd say that error reporting from close(2) is so random and
filesystem dependent that the errors are not worth paying attention to. If
you really care about data integrity (and thus writeback errors) you must
call fsync(2), in which case the kernel provides an at least somewhat
consistent error reporting story.
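A minimal sketch of that advice, assuming a program that writes a whole
buffer to a new file and wants to learn about writeback errors. The
helper name save_file_durably() is hypothetical and not from this
thread; the point is only that the error check sits on fsync(2), not on
close(2):

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write buf to path and make sure it reached
 * stable storage. Returns 0 on success, -1 with errno set on failure. */
int save_file_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int saved;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd == -1)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            goto fail;
        }
        p += n;
        len -= (size_t)n;
    }

    /* This is the call whose result decides whether the data is safe. */
    if (fsync(fd) == -1)
        goto fail;

    /* Close exactly once and never retry. After a successful fsync()
     * the return value is deliberately ignored in this sketch. */
    (void)close(fd);
    return 0;

fail:
    saved = errno;
    (void)close(fd);
    errno = saved;
    return -1;
}
```

A caller would use it as `if (save_file_durably("out.dat", data, n) == -1) perror("out.dat");`.
The cost is the one discussed elsewhere in this thread: fsync(2) blocks
until writeback has finished.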
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-26 15:56 ` Jan Kara
@ 2026-01-26 16:43 ` Jeff Layton
2026-01-26 23:01 ` Trevor Gross
0 siblings, 1 reply; 23+ messages in thread
From: Jeff Layton @ 2026-01-26 16:43 UTC (permalink / raw)
To: Jan Kara, The 8472
Cc: Zack Weinberg, Rich Felker, Alejandro Colomar, Vincent Lefevre,
Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
GNU libc development
On Mon, 2026-01-26 at 16:56 +0100, Jan Kara wrote:
> On Mon 26-01-26 14:53:12, The 8472 wrote:
> > On 26/01/2026 13:15, Jan Kara wrote:
> > > On Sun 25-01-26 10:37:01, Zack Weinberg wrote:
> > > > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote:
> > > > > > [QUERY: Do delayed errors ever happen in any of these situations?
> > > > > >
> > > > > > - The fd is not the last reference to the open file description
> > > > > >
> > > > > > - The OFD was opened with O_RDONLY
> > > > > >
> > > > > > - The OFD was opened with O_RDWR but has never actually
> > > > > > been written to
> > > > > >
> > > > > > - No data has been written to the OFD since the last call to
> > > > > > fsync() for that OFD
> > > > > >
> > > > > > - No data has been written to the OFD since the last call to
> > > > > > fdatasync() for that OFD
> > > > > >
> > > > > > If we can give some guidance about when people don’t need to
> > > > > > worry about delayed errors, it would be helpful.]
> > > >
> > > > In particular, I really hope delayed errors *aren’t* ever reported
> > > > when you close a file descriptor that *isn’t* the last reference
> > > > to its open file description, because the thread-safe way to close
> > > > stdout without losing write errors[2] depends on that not happening.
> > >
> > > So I've checked and in Linux ->flush callback for the file is called
> > > whenever you close a file descriptor (regardless whether there are other
> > > > file descriptors pointing to the same file description) so it's up to
> > > filesystem implementation what it decides to do and which error it will
> > > return... Checking the implementations e.g. FUSE and NFS *will* return
> > > delayed writeback errors on *first* descriptor close even if there are
> > > other still open descriptors for the description AFAICS.
...and I really wish they _didn't_.
Reporting a writeback error on close is not particularly useful. Most
filesystems don't require you to write back all data on a close(). A
successful close() on those just means that no error has happened yet.
Any application that cares about writeback errors needs to fsync(),
full stop.
> > Regarding the "first", does that mean the errors only get delivered once?
>
> I've added Jeff to CC who should be able to provide you with a more
> authoritative answer but AFAIK the answer is yes.
>
> E.g. NFS does:
>
> static int
> nfs_file_flush(struct file *file, fl_owner_t id)
> {
> ...
> /* Flush writes to the server and return any errors */
> since = filemap_sample_wb_err(file->f_mapping);
> nfs_wb_all(inode);
> return filemap_check_wb_err(file->f_mapping, since);
> }
>
> which will writeback all outstanding data on the first close and report
> error if it happened. Following close has nothing to flush and thus no
> error to report.
>
> That being said if you call fsync(2) you'll still get the error back again
> because fsync uses a separate writeback error counter in the file
> description. But again only the first fsync(2) will return the error.
> Following fsyncs will report no error.
>
Note that NFS is "special" in that it will flush data on close() in
order to maintain close-to-open cache consistency.
Technically, what nfs is doing above is sampling the errseq_t in the
mapping, and then writing back any dirty data, and then checking for
errors that happened since the sample. close() will only report
writeback errors that happened within that window. If a preexisting
writeback error occurred before "since" was sampled, then it won't
report that here...which is weird, and another good argument for not
reporting or checking for writeback errors at close().
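The window behaviour described above can be illustrated with a toy
model. This is a deliberately simplified sketch: struct toy_mapping and
the toy_*() functions are hypothetical stand-ins, and the kernel's real
errseq_t carries more state than a plain counter.

```c
/* Toy stand-in for the per-mapping writeback error state. */
struct toy_mapping {
    unsigned seq;   /* bumped every time a writeback error is recorded */
    int err;        /* most recent error, as a negative errno value */
};

/* Record a writeback error in the mapping. */
void toy_record_error(struct toy_mapping *m, int err)
{
    m->err = err;
    m->seq++;
}

/* Remember the current position in the error stream. */
unsigned toy_sample(const struct toy_mapping *m)
{
    return m->seq;
}

/* Report an error only if one was recorded after `since` was sampled. */
int toy_check(const struct toy_mapping *m, unsigned since)
{
    return m->seq != since ? m->err : 0;
}

/* Models the flush-on-close sequence: sample, write back, check.
 * writeback_err is 0 if the simulated writeback succeeds. */
int toy_flush(struct toy_mapping *m, int writeback_err)
{
    unsigned since = toy_sample(m);

    if (writeback_err)
        toy_record_error(m, writeback_err);
    return toy_check(m, since);
}
```

In this model an error recorded before toy_flush() samples the counter
is not reported by that flush, while an error raised during the
simulated writeback is.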
> > I.e. if a concurrent fork/exec happens for process spawning and the
> > fork-child closes the file descriptors then this closing may basically
> > receive the errors and the parent will not see them (unless additional
> > errors happen)?
>
> Correct AFAICT.
>
It will see them if it calls fsync(). Reporting on close() is iffy.
> > Or if _any_ part of the program dups the descriptor and then closes it
> > without reporting errors then all uses of those descriptor must consider
> > error delivery on close to be unreliable?
>
> Correct as well AFAICT.
>
> I should probably also add that traditional filesystems (classical local
> disk based filesystems) don't bother with reporting delayed errors on
> close(2) *at all*. So unless you call fsync(2) you will never learn there
> was any writeback error. After all for these filesystems there are good
> chances writeback didn't even start by the time you are calling close(2).
> So overall I'd say that error reporting from close(2) is so random and
> filesystem dependent that the errors are not worth paying attention to. If
> you really care about data integrity (and thus writeback errors) you must
> call fsync(2) in which case the kernel provides at least somewhat
> consistent error reporting story.
>
+1.
tl;dr: the only useful error from close() is EBADF.
--
Jeff Layton <jlayton@kernel.org>
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-26 16:43 ` Jeff Layton
@ 2026-01-26 23:01 ` Trevor Gross
2026-01-27 0:49 ` Jeff Layton
0 siblings, 1 reply; 23+ messages in thread
From: Trevor Gross @ 2026-01-26 23:01 UTC (permalink / raw)
To: Jeff Layton, Jan Kara, The 8472
Cc: Zack Weinberg, Rich Felker, Alejandro Colomar, Vincent Lefevre,
Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
GNU libc development
On Mon Jan 26, 2026 at 10:43 AM CST, Jeff Layton wrote:
> On Mon, 2026-01-26 at 16:56 +0100, Jan Kara wrote:
>> On Mon 26-01-26 14:53:12, The 8472 wrote:
>> > On 26/01/2026 13:15, Jan Kara wrote:
>> > > On Sun 25-01-26 10:37:01, Zack Weinberg wrote:
>> > > > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote:
>> > > > > > [QUERY: Do delayed errors ever happen in any of these situations?
>> > > > > >
>> > > > > > - The fd is not the last reference to the open file description
>> > > > > >
>> > > > > > - The OFD was opened with O_RDONLY
>> > > > > >
>> > > > > > - The OFD was opened with O_RDWR but has never actually
>> > > > > > been written to
>> > > > > >
>> > > > > > - No data has been written to the OFD since the last call to
>> > > > > > fsync() for that OFD
>> > > > > >
>> > > > > > - No data has been written to the OFD since the last call to
>> > > > > > fdatasync() for that OFD
>> > > > > >
>> > > > > > If we can give some guidance about when people don’t need to
>> > > > > > worry about delayed errors, it would be helpful.]
>> > > >
>> > > > In particular, I really hope delayed errors *aren’t* ever reported
>> > > > when you close a file descriptor that *isn’t* the last reference
>> > > > to its open file description, because the thread-safe way to close
>> > > > stdout without losing write errors[2] depends on that not happening.
>> > >
>> > > So I've checked and in Linux ->flush callback for the file is called
>> > > whenever you close a file descriptor (regardless whether there are other
>> > > file descriptors pointing to the same file description) so it's up to
>> > > filesystem implementation what it decides to do and which error it will
>> > > return... Checking the implementations e.g. FUSE and NFS *will* return
>> > > delayed writeback errors on *first* descriptor close even if there are
>> > > other still open descriptors for the description AFAICS.
>
> ...and I really wish they _didn't_.
>
> Reporting a writeback error on close is not particularly useful. Most
> filesystems don't require you to write back all data on a close(). A
> successful close() on those just means that no error has happened yet.
>
> Any application that cares about writeback errors needs to fsync(),
> full stop.
Is there a good middle ground solution here?
It seems reasonable that an application may want to have different
handling for errors expected during normal operation, such as temporary
network failure with NFS, compared to more catastrophic things like
failure to write to disk. The reason cited around [1] for avoiding fsync
is that it comes with a cost that, for many applications, may not be
worth it unless you are dealing with NFS.
I was wondering if it could be worth a new fcntl that provides this kind
of "best effort" error checking behavior without having the strict
requirements of fsync. In effect, to report the errors that you might
currently get at close() before actually calling close() and losing the
fd.
Alternatively, it would be interesting to have a deferred fsync() that
schedules a nonblocking sync event that can be polled for completion/
errors, with flags to indicate immediate sync or allow automatic syncing
as needed. But there is probably a better alternative to this
complexity.
- Trevor
[1]: https://github.com/rust-lang/libs-team/issues/705
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-26 23:01 ` Trevor Gross
@ 2026-01-27 0:49 ` Jeff Layton
2026-01-28 16:58 ` Zack Weinberg
0 siblings, 1 reply; 23+ messages in thread
From: Jeff Layton @ 2026-01-27 0:49 UTC (permalink / raw)
To: Trevor Gross, Jan Kara, The 8472
Cc: Zack Weinberg, Rich Felker, Alejandro Colomar, Vincent Lefevre,
Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
GNU libc development
On Mon, 2026-01-26 at 17:01 -0600, Trevor Gross wrote:
> On Mon Jan 26, 2026 at 10:43 AM CST, Jeff Layton wrote:
> > On Mon, 2026-01-26 at 16:56 +0100, Jan Kara wrote:
> > > On Mon 26-01-26 14:53:12, The 8472 wrote:
> > > > On 26/01/2026 13:15, Jan Kara wrote:
> > > > > On Sun 25-01-26 10:37:01, Zack Weinberg wrote:
> > > > > > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote:
> > > > > > > > [QUERY: Do delayed errors ever happen in any of these situations?
> > > > > > > >
> > > > > > > > - The fd is not the last reference to the open file description
> > > > > > > >
> > > > > > > > - The OFD was opened with O_RDONLY
> > > > > > > >
> > > > > > > > - The OFD was opened with O_RDWR but has never actually
> > > > > > > > been written to
> > > > > > > >
> > > > > > > > - No data has been written to the OFD since the last call to
> > > > > > > > fsync() for that OFD
> > > > > > > >
> > > > > > > > - No data has been written to the OFD since the last call to
> > > > > > > > fdatasync() for that OFD
> > > > > > > >
> > > > > > > > If we can give some guidance about when people don’t need to
> > > > > > > > worry about delayed errors, it would be helpful.]
> > > > > >
> > > > > > In particular, I really hope delayed errors *aren’t* ever reported
> > > > > > when you close a file descriptor that *isn’t* the last reference
> > > > > > to its open file description, because the thread-safe way to close
> > > > > > stdout without losing write errors[2] depends on that not happening.
> > > > >
> > > > > So I've checked and in Linux ->flush callback for the file is called
> > > > > whenever you close a file descriptor (regardless whether there are other
> > > > > > file descriptors pointing to the same file description) so it's up to
> > > > > filesystem implementation what it decides to do and which error it will
> > > > > return... Checking the implementations e.g. FUSE and NFS *will* return
> > > > > delayed writeback errors on *first* descriptor close even if there are
> > > > > other still open descriptors for the description AFAICS.
> >
> > ...and I really wish they _didn't_.
> >
> > Reporting a writeback error on close is not particularly useful. Most
> > filesystems don't require you to write back all data on a close(). A
> > successful close() on those just means that no error has happened yet.
> >
> > Any application that cares about writeback errors needs to fsync(),
> > full stop.
>
> Is there a good middle ground solution here?
>
> It seems reasonable that an application may want to have different
> handling for errors expected during normal operation, such as temporary
> network failure with NFS, compared to more catastrophic things like
> failure to write to disk. The reason cited around [1] for avoiding fsync
> is that it comes with a cost that, for many applications, may not be
> worth it unless you are dealing with NFS.
>
> I was wondering if it could be worth a new fcntl that provides this kind
> of "best effort" error checking behavior without having the strict
> requirements of fsync. In effect, to report the errors that you might
> currently get at close() before actually calling close() and losing the
> fd.
>
For a long-held fd, I can see the appeal: spray writes at it and just
check occasionally (without blocking) that nothing has gone wrong.
Maybe when things are idle, you fsync().
A new fcntl(..., F_CHECKERR, ...) command that does a
file_check_and_advance_wb_err() on the fd and reports the result would
be pretty straightforward.
Would that be helpful for your use-case? This would be like a non-
blocking fsync that just reports whether an error has occurred since
the last F_CHECKERR or fsync().
> Alternatively, it would be interesting to have a deferred fsync() that
> schedules a nonblocking sync event that can be polled for completion/
> errors, with flags to indicate immediate sync or allow automatic syncing
> as needed. But there is probably a better alternative to this
> complexity.
>
> [1]: https://github.com/rust-lang/libs-team/issues/705
Aside from the polling, I suppose you could effectively do this with
io_uring. I'm pretty sure you can issue an fsync() or sync_file_range()
that way, but I think it just ends up blocking a kernel thread until
writeback is done.
We've had people ask for a non-blocking fsync before. Maybe it's time
to get serious about adding one. What would such a thing look like?
It would be pretty simple to add a new fcntl(..., F_DATAWRITE) command
that kicks off writeback a'la filemap_fdatawrite().
Then add fcntl(..., F_WB_CHECK):
That could do a non-blocking version of filemap_fdatawait(), and return
whether any folios are still under writeback. If there is a writeback
error, it can return that instead.
The catch of course is that a polling mechanism like this could easily
livelock. If there is a lot of memory pressure, it might always return
that something is still under writeback, no matter how often you hammer
F_WB_CHECK.
Maybe that's ok? You can always issue a blocking fsync() if you really
need to draw a line in the sand.
--
Jeff Layton <jlayton@kernel.org>
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-27 0:49 ` Jeff Layton
@ 2026-01-28 16:58 ` Zack Weinberg
2026-02-05 9:34 ` Jan Kara
0 siblings, 1 reply; 23+ messages in thread
From: Zack Weinberg @ 2026-01-28 16:58 UTC (permalink / raw)
To: Jeff Layton, Trevor Gross, Jan Kara, The 8472
Cc: Rich Felker, Alejandro Colomar, Vincent Lefevre, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development
On Mon, Jan 26, 2026, at 7:49 PM, Jeff Layton wrote:
> On Mon, 2026-01-26 at 17:01 -0600, Trevor Gross wrote:
>> On Mon Jan 26, 2026 at 10:43 AM CST, Jeff Layton wrote:
>> > On Mon, 2026-01-26 at 16:56 +0100, Jan Kara wrote:
>> > > On Mon 26-01-26 14:53:12, The 8472 wrote:
>> > > > On 26/01/2026 13:15, Jan Kara wrote:
>> > > > > On Sun 25-01-26 10:37:01, Zack Weinberg wrote:
>> > > > > > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote:
...
>> > > > > > In particular, I really hope delayed errors *aren’t* ever reported
>> > > > > > when you close a file descriptor that *isn’t* the last reference
>> > > > > > to its open file description, because the thread-safe way to close
>> > > > > > stdout without losing write errors[2] depends on that not happening.
>> > > > >
>> > > > > So I've checked and in Linux ->flush callback for the file is called
>> > > > > whenever you close a file descriptor (regardless whether there are other
>> > > > > file descriptors pointing to the same file description) so it's up to
>> > > > > filesystem implementation what it decides to do and which error it will
>> > > > > return... Checking the implementations e.g. FUSE and NFS *will* return
>> > > > > delayed writeback errors on *first* descriptor close even if there are
>> > > > > other still open descriptors for the description AFAICS.
>> >
>> > ...and I really wish they _didn't_.
>> >
>> > Reporting a writeback error on close is not particularly useful. Most
>> > filesystems don't require you to write back all data on a close(). A
>> > successful close() on those just means that no error has happened yet.
>> >
>> > Any application that cares about writeback errors needs to fsync(),
>> > full stop.
>>
>> Is there a good middle ground solution here?
...
>> I was wondering if it could be worth a new fcntl that provides this kind
>> of "best effort" error checking behavior without having the strict
>> requirements of fsync. In effect, to report the errors that you might
>> currently get at close() before actually calling close() and losing the
>> fd.
...
> A new fcntl(..., F_CHECKERR, ...) command that does a
> file_check_and_advance_wb_err() on the fd and reports the result would
> be pretty straightforward.
>
> Would that be helpful for your use-case? This would be like a non-
> blocking fsync that just reports whether an error has occurred since
> the last F_CHECKERR or fsync().
I feel I need to point out that “should the kernel report errors on
close()” and “should the kernel add a new API to make life better for
programs that currently expect close() to report [some] errors” and
“should the Rust standard library propagate errors produced by close()
back up to the application” and “what should the close(2) manpage say
about errors” are four different conversation topics.
I am all in favor of moving toward a world where close() never fails
and there’s _something_ that reports write errors like fsync() without
also kicking your application off a performance cliff. But that’s not
the world we live in today, and this thread started as a conversation
about revising the close(2) manpage, and I’d kinda like to *finish*
revising the manpage in, like, the next couple weeks, not several
years from now :-) So I’d like to refocus on that topic.
Given what Jan Kara said earlier...
> Checking the implementations e.g. FUSE and NFS *will* return delayed
> writeback errors on *first* descriptor close even if there are other
> still open descriptors for the description AFAICS.
...
> fsync(2) must make sure data is persistently stored and return error if
> it was not. Thus as a VFS person I'd consider it a filesystem bug if an
> error preventing reading data later was not returned from fsync(2). OTOH
> that doesn't necessarily mean that later close doesn't return an error -
> e.g. FUSE does communicate with the server on close that can fail and
> error can be returned.
>
> With this in mind let me now try to answer your remaining questions:
>
>> >> - The OFD was opened with O_RDONLY
>
> If the filesystem supports atime, close can in principle report that atime
> update failed.
>
>> >> - The OFD was opened with O_RDWR but has never actually
>> >> been written to
>
> The same as above but with inode mtime updates.
>
>> >> - No data has been written to the OFD since the last call to
>> >> fsync() for that OFD
>
> No writeback errors should happen in this case. As I wrote above I'd
> consider this a filesystem bug.
>
>> >>
>> >> - No data has been written to the OFD since the last call to
>> >> fdatasync() for that OFD
>
> Errors can happen because some inode metadata (in practice probably only
> inode time stamps) may still need to be written out.
>
> So in the cases described above (except for fsync()) you may get delayed
> errors on close. But since in all those cases no data is lost, I don't
> think 99.9% of applications care at all...
... regrettably I think this does mean the close(2) manpage still needs
to tell people to watch out for errors, and should probably say that
errors _can_ happen even if the file wasn’t written to, but are much
less likely to be important in that case.
And my “how to close stdout in a thread-safe manner” sample code is
wrong, because I was wrong to think that the error reporting only
happened on the _final_ close, when the OFD is destroyed.
... What happens if the close is implicit in a dup2() operation? Here’s
that erroneous “how to close stdout” fragment, with comments
indicating what I thought could and could not fail at the time I wrote
it:
    // These allocate new fds, which can always fail, e.g. because
    // the program already has too many files open.
    int new_stdout = open("/dev/null", O_WRONLY);
    if (new_stdout == -1) perror_exit("/dev/null");
    int old_stdout = dup(1);
    if (old_stdout == -1) perror_exit("dup(1)");

    flockfile(stdout);
    if (fflush(stdout)) perror_exit("stdout: write error");
    dup2(new_stdout, 1); // cannot fail, atomically replaces fd 1
    funlockfile(stdout);

    // this close may receive delayed write errors from previous writes
    // to stdout
    if (close(old_stdout)) perror_exit("stdout: write error");

    // this close cannot fail, because it only drops an alternative
    // reference to the open file description now installed as fd 1
    close(new_stdout);
Note in particular that the first close _operation_ on fd 1 is in
consequence of dup2(new_stdout, 1). The dup2() manpage specifically
says “the close is performed silently (i.e. any errors during the
close are not reported by dup2())” but, if stdout points to a file on
an NFS mount, are those errors _lost_, or will they actually be
reported by the subsequent close(old_stdout)?
Incidentally, the dup2() manpage has a very similar example in its
NOTES section, also presuming that close only reports errors on the
_final_ close, not when it “merely” drops reference >=2 to an OFD.
(I’m starting to think we need dup3(old, new, O_SWAP_FDS). Or is that
already a thing somehow?)
zw
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-28 16:58 ` Zack Weinberg
@ 2026-02-05 9:34 ` Jan Kara
0 siblings, 0 replies; 23+ messages in thread
From: Jan Kara @ 2026-02-05 9:34 UTC (permalink / raw)
To: Zack Weinberg
Cc: Jeff Layton, Trevor Gross, Jan Kara, The 8472, Rich Felker,
Alejandro Colomar, Vincent Lefevre, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development
I've noticed we didn't reply to one question here:
On Wed 28-01-26 11:58:07, Zack Weinberg wrote:
> On Mon, Jan 26, 2026, at 7:49 PM, Jeff Layton wrote:
> > Checking the implementations e.g. FUSE and NFS *will* return delayed
> > writeback errors on *first* descriptor close even if there are other
> > still open descriptors for the description AFAICS.
> ...
> > fsync(2) must make sure data is persistently stored and return error if
> > it was not. Thus as a VFS person I'd consider it a filesystem bug if an
> > error preventing reading data later was not returned from fsync(2). OTOH
> > that doesn't necessarily mean that later close doesn't return an error -
> > e.g. FUSE does communicate with the server on close that can fail and
> > error can be returned.
> >
> > With this in mind let me now try to answer your remaining questions:
> >
> >> >> - The OFD was opened with O_RDONLY
> >
> > If the filesystem supports atime, close can in principle report that atime
> > update failed.
> >
> >> >> - The OFD was opened with O_RDWR but has never actually
> >> >> been written to
> >
> > The same as above but with inode mtime updates.
> >
> >> >> - No data has been written to the OFD since the last call to
> >> >> fsync() for that OFD
> >
> > No writeback errors should happen in this case. As I wrote above I'd
> > consider this a filesystem bug.
> >
> >> >>
> >> >> - No data has been written to the OFD since the last call to
> >> >> fdatasync() for that OFD
> >
> > Errors can happen because some inode metadata (in practice probably only
> > inode time stamps) may still need to be written out.
> >
> > So in the cases described above (except for fsync()) you may get delayed
> > errors on close. But since in all those cases no data is lost, I don't
> > think 99.9% of applications care at all...
>
> ... regrettably I think this does mean the close(2) manpage still needs
> to tell people to watch out for errors, and should probably say that
> errors _can_ happen even if the file wasn’t written to, but are much
> less likely to be important in that case.
>
> And my “how to close stdout in a thread-safe manner” sample code is
> wrong, because I was wrong to think that the error reporting only
> happened on the _final_ close, when the OFD is destroyed.
>
> ... What happens if the close is implicit in a dup2() operation? Here’s
> that erroneous “how to close stdout” fragment, with comments
> indicating what I thought could and could not fail at the time I wrote
> it:
>
> // These allocate new fds, which can always fail, e.g. because
> // the program already has too many files open.
> int new_stdout = open("/dev/null", O_WRONLY);
> if (new_stdout == -1) perror_exit("/dev/null");
> int old_stdout = dup(1);
> if (old_stdout == -1) perror_exit("dup(1)");
>
> flockfile(stdout);
> if (fflush(stdout)) perror_exit("stdout: write error");
> dup2(new_stdout, 1); // cannot fail, atomically replaces fd 1
> funlockfile(stdout);
>
> // this close may receive delayed write errors from previous writes
> // to stdout
> if (close(old_stdout)) perror_exit("stdout: write error");
>
> // this close cannot fail, because it only drops an alternative
> // reference to the open file description now installed as fd 1
> close(new_stdout);
>
> Note in particular that the first close _operation_ on fd 1 is in
> consequence of dup2(new_stdout, 1). The dup2() manpage specifically
> says “the close is performed silently (i.e. any errors during the
> close are not reported by dup2())” but, if stdout points to a file on
> an NFS mount, are those errors _lost_, or will they actually be
> reported by the subsequent close(old_stdout)?
It is simply lost (the error is propagated from the filesystem to VFS which
just ignores it).
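Since the implicit close inside dup2() drops the error, one way to avoid
depending on close() at all is to ask for errors with fsync(2) before
the descriptor is replaced. The following is only a sketch under that
assumption: replace_fd_checked() is a hypothetical name, it takes the
descriptor as a parameter rather than hard-coding fd 1, and it leaves
out the stdio locking and fflush() from the fragment quoted above.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: make `fd` refer to new_path (e.g. "/dev/null"),
 * collecting pending writeback errors first. Returns 0 on success,
 * -1 with errno set on failure. */
int replace_fd_checked(int fd, const char *new_path)
{
    int saved;
    int new_fd = open(new_path, O_WRONLY);

    if (new_fd == -1)
        return -1;

    /* Ask for writeback errors while we still hold the old file.
     * fsync() fails with EINVAL or EROFS on descriptors that do not
     * support synchronization (pipes, sockets, some special files);
     * this sketch treats that as "nothing to report". */
    if (fsync(fd) == -1 && errno != EINVAL && errno != EROFS)
        goto fail;

    /* Atomically replace fd; any error from the implicit close of the
     * old file is discarded by the kernel, so we no longer rely on it. */
    if (dup2(new_fd, fd) == -1)
        goto fail;

    if (new_fd != fd)
        (void)close(new_fd);
    return 0;

fail:
    saved = errno;
    if (new_fd != fd)
        (void)close(new_fd);
    errno = saved;
    return -1;
}
```

For stdout the call would be `replace_fd_checked(STDOUT_FILENO, "/dev/null")`
after flushing the stdio buffer. The trade-off is the blocking cost of
fsync(2) that this thread has already pointed out.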
> Incidentally, the dup2() manpage has a very similar example in its
> NOTES section, also presuming that close only reports errors on the
> _final_ close, not when it “merely” drops reference >=2 to an OFD.
>
> (I’m starting to think we need dup3(old, new, O_SWAP_FDS). Or is that
> already a thing somehow?)
I don't think a functionality like this currently exists.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
[not found] ` <20250517133251.GY1509@brightrain.aerifal.cx>
[not found] ` <5jm7pblkwkhh4frqjptrw4ll4nwncn22ep2v7sli6kz5wxg5ik@pbnj6wfv66af>
@ 2026-02-06 15:13 ` Vincent Lefevre
1 sibling, 0 replies; 23+ messages in thread
From: Vincent Lefevre @ 2026-02-06 15:13 UTC (permalink / raw)
To: Rich Felker
Cc: Alejandro Colomar, Jan Kara, Alexander Viro, Christian Brauner,
linux-fsdevel, linux-api, libc-alpha
On 2025-05-17 09:32:52 -0400, Rich Felker wrote:
> On Fri, May 16, 2025 at 04:39:57PM +0200, Vincent Lefevre wrote:
> > On 2025-05-16 09:05:47 -0400, Rich Felker wrote:
> > > FWIW musl adopted the EINPROGRESS as soon as we were made aware of the
> > > issue, and later changed it to returning 0 since applications
> > > (particularly, any written prior to this interpretation) are prone to
> > > interpret EINPROGRESS as an error condition rather than success and
> > > possibly misinterpret it as meaning the fd is still open and valid to
> > > pass to close again.
> >
> > If I understand correctly, this is a poor choice. POSIX.1-2024 says:
> >
> > ERRORS
> > The close() and posix_close() functions shall fail if:
> > [...]
> > [EINPROGRESS]
> > The function was interrupted by a signal and fildes was closed
> > but the close operation is continuing asynchronously.
> >
> > But this does not mean that the asynchronous close operation will
> > succeed.
>
> There are no asynchronous behaviors specified for there to be a
> conformance distinction here. The only observable behaviors happen
> instantly, mainly the release of the file descriptor and the process's
> handle on the underlying resource. Abstractly, there is no async
> operation that could succeed or fail.
Sorry, this is old. But a consequence may be a memory leak if something
unexpected occurred during what was done asynchronously. There is no
guarantee that *every* resource has been released.
--
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Pascaline project (LIP, ENS-Lyon)
Thread overview: 23+ messages
[not found] <a5tirrssh3t66q4vpwpgmxgxaumhqukw5nyxd4x6bevh7mtuvy@wtwdsb4oloh4>
[not found] ` <efaffc5a404cf104f225c26dbc96e0001cede8f9.1747399542.git.alx@kernel.org>
[not found] ` <20250516130547.GV1509@brightrain.aerifal.cx>
[not found] ` <20250516143957.GB5388@qaa.vinc17.org>
[not found] ` <20250517133251.GY1509@brightrain.aerifal.cx>
[not found] ` <5jm7pblkwkhh4frqjptrw4ll4nwncn22ep2v7sli6kz5wxg5ik@pbnj6wfv66af>
[not found] ` <8c47e10a-be82-4d5b-a45e-2526f6e95123@app.fastmail.com>
[not found] ` <20250524022416.GB6263@brightrain.aerifal.cx>
[not found] ` <1571b14d-1077-4e81-ab97-36e39099761e@app.fastmail.com>
[not found] ` <20260120174659.GE6263@brightrain.aerifal.cx>
[not found] ` <lhubjio5dsb.fsf@oldenburg.str.redhat.com>
[not found] ` <20260120190010.GF6263@brightrain.aerifal.cx>
2026-01-20 20:05 ` [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 Florian Weimer
2026-01-20 20:11 ` Paul Eggert
2026-01-20 20:35 ` Alejandro Colomar
2026-01-20 20:42 ` Alejandro Colomar
2026-01-23 0:33 ` Zack Weinberg
2026-01-23 1:02 ` Alejandro Colomar
2026-01-23 1:38 ` Al Viro
2026-01-23 14:44 ` Alejandro Colomar
2026-01-23 14:05 ` Zack Weinberg
2026-01-24 19:34 ` The 8472
2026-01-24 21:39 ` Rich Felker
2026-01-24 21:57 ` The 8472
2026-01-25 15:37 ` Zack Weinberg
2026-01-26 8:51 ` Florian Weimer
2026-01-26 12:15 ` Jan Kara
2026-01-26 13:53 ` The 8472
2026-01-26 15:56 ` Jan Kara
2026-01-26 16:43 ` Jeff Layton
2026-01-26 23:01 ` Trevor Gross
2026-01-27 0:49 ` Jeff Layton
2026-01-28 16:58 ` Zack Weinberg
2026-02-05 9:34 ` Jan Kara
2026-02-06 15:13 ` Vincent Lefevre