* close(2) with EINTR has been changed by POSIX.1-2024
@ 2025-05-15 21:33 Alejandro Colomar
2025-05-16 10:48 ` Jan Kara
0 siblings, 1 reply; 56+ messages in thread
From: Alejandro Colomar @ 2025-05-15 21:33 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
linux-api
Cc: linux-man
Hi,
I'm updating the manual pages for POSIX.1-2024, and have some doubts
about close(2). The manual page for close(2) says (conforming to
POSIX.1-2008):
The EINTR error is a somewhat special case. Regarding the EINTR
error, POSIX.1‐2008 says:
If close() is interrupted by a signal that is to be
caught, it shall return -1 with errno set to EINTR and
the state of fildes is unspecified.
This permits the behavior that occurs on Linux and many other
implementations, where, as with other errors that may be
reported by close(), the file descriptor is guaranteed to be
closed. However, it also permits another possibility: that the
implementation returns an EINTR error and keeps the file
descriptor open. (According to its documentation, HP‐UX’s close()
does this.) The caller must then once more use close() to close
the file descriptor, to avoid file descriptor leaks. This
divergence in implementation behaviors provides a difficult hurdle
for portable applications, since on many implementations,
close() must not be called again after an EINTR error, and on at
least one, close() must be called again. There are plans to
address this conundrum for the next major release of the POSIX.1
standard.
TL;DR: close(2) with EINTR is allowed to either leave the fd open or
closed, and Linux leaves it closed, while others (HP-UX only?) leave it
open.
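A minimal sketch of what this means for a caller, assuming a Linux-like
system where the descriptor is already released when close() fails with
EINTR. The helper name close_fd() and the build-time switch
CLOSE_EINTR_LEAVES_FD_OPEN are hypothetical, introduced only for
illustration; neither is a real system interface.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical build-time switch, not a real system macro: define it only
 * when building for a platform documented to keep the descriptor open
 * after close() fails with EINTR. */
#ifdef CLOSE_EINTR_LEAVES_FD_OPEN
# define RETRY_CLOSE_ON_EINTR 1
#else
# define RETRY_CLOSE_ON_EINTR 0
#endif

/* close_fd() is an illustrative helper, not a standard function.
 * By default it calls close() exactly once: where the descriptor is
 * released even on EINTR, a retry could close an unrelated descriptor
 * that another thread has opened in the meantime. */
int close_fd(int fd)
{
    int ret;

    do {
        ret = close(fd);
    } while (RETRY_CLOSE_ON_EINTR && ret == -1 && errno == EINTR);

    return ret;
}
```

The retry loop is compiled in only when the hypothetical switch is
defined, so the default build never calls close() twice on the same
descriptor number.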
Now, POSIX.1-2024 says:
If close() is interrupted by a signal that is to be caught, then
it is unspecified whether it returns -1 with errno set to
[EINTR] and fildes remaining open, or returns -1 with errno set
to [EINPROGRESS] and fildes being closed, or returns 0 to
indicate successful completion; [...]
<https://pubs.opengroup.org/onlinepubs/9799919799/functions/close.html>
Which seems to bless HP-UX and screw all the others, requiring them to
report EINPROGRESS.
Was there any discussion about what to do in the Linux kernel?
Have a lovely night!
Alex
--
<https://www.alejandro-colomar.es/>
* Re: close(2) with EINTR has been changed by POSIX.1-2024
2025-05-15 21:33 close(2) with EINTR has been changed by POSIX.1-2024 Alejandro Colomar
@ 2025-05-16 10:48 ` Jan Kara
2025-05-16 12:11 ` Alejandro Colomar
` (4 more replies)
0 siblings, 5 replies; 56+ messages in thread
From: Jan Kara @ 2025-05-16 10:48 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
linux-api, linux-man
Hi!
On Thu 15-05-25 23:33:22, Alejandro Colomar wrote:
> I'm updating the manual pages for POSIX.1-2024, and have some doubts
> about close(2). The manual page for close(2) says (conforming to
> POSIX.1-2008):
>
> The EINTR error is a somewhat special case. Regarding the EINTR
> error, POSIX.1‐2008 says:
>
> If close() is interrupted by a signal that is to be
> caught, it shall return -1 with errno set to EINTR and
> the state of fildes is unspecified.
>
> This permits the behavior that occurs on Linux and many other
> implementations, where, as with other errors that may be
> reported by close(), the file descriptor is guaranteed to be
> closed. However, it also permits another possibility: that the
> implementation returns an EINTR error and keeps the file
> descriptor open. (According to its documentation, HP‐UX’s close()
> does this.) The caller must then once more use close() to close
> the file descriptor, to avoid file descriptor leaks. This
> divergence in implementation behaviors provides a difficult hurdle
> for portable applications, since on many implementations,
> close() must not be called again after an EINTR error, and on at
> least one, close() must be called again. There are plans to
> address this conundrum for the next major release of the POSIX.1
> standard.
>
> TL;DR: close(2) with EINTR is allowed to either leave the fd open or
> closed, and Linux leaves it closed, while others (HP-UX only?) leave it
> open.
>
> Now, POSIX.1-2024 says:
>
> If close() is interrupted by a signal that is to be caught, then
> it is unspecified whether it returns -1 with errno set to
> [EINTR] and fildes remaining open, or returns -1 with errno set
> to [EINPROGRESS] and fildes being closed, or returns 0 to
> indicate successful completion; [...]
>
> <https://pubs.opengroup.org/onlinepubs/9799919799/functions/close.html>
>
> Which seems to bless HP-UX and screw all the others, requiring them to
> report EINPROGRESS.
>
> Was there any discussion about what to do in the Linux kernel?
I'm not aware of any discussions but indeed we are returning EINTR while
closing the fd. Frankly, changing the error code we return in that case is
really asking for userspace regressions so I'm of the opinion we just
ignore the standard as in my opinion it goes against a long established
reality.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: close(2) with EINTR has been changed by POSIX.1-2024
2025-05-16 10:48 ` Jan Kara
@ 2025-05-16 12:11 ` Alejandro Colomar
2025-05-16 12:52 ` [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 Alejandro Colomar
2025-05-16 12:41 ` close(2) with EINTR has been changed by POSIX.1-2024 Mateusz Guzik
` (3 subsequent siblings)
4 siblings, 1 reply; 56+ messages in thread
From: Alejandro Colomar @ 2025-05-16 12:11 UTC (permalink / raw)
To: Jan Kara
Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
linux-man
Hi Jan!
On Fri, May 16, 2025 at 12:48:56PM +0200, Jan Kara wrote:
> > <https://pubs.opengroup.org/onlinepubs/9799919799/functions/close.html>
> >
> > Which seems to bless HP-UX and screw all the others, requiring them to
> > report EINPROGRESS.
> >
> > Was there any discussion about what to do in the Linux kernel?
>
> I'm not aware of any discussions but indeed we are returning EINTR while
> closing the fd. Frankly, changing the error code we return in that case is
> really asking for userspace regressions so I'm of the opinion we just
> ignore the standard as in my opinion it goes against a long established
> reality.
Yep, sounds like what I was expecting. I'll document that we'll ignore
the new POSIX for close(2) on purpose. Thanks!
Have a lovely day!
Alex
--
<https://www.alejandro-colomar.es/>
* Re: close(2) with EINTR has been changed by POSIX.1-2024
2025-05-16 10:48 ` Jan Kara
2025-05-16 12:11 ` Alejandro Colomar
@ 2025-05-16 12:41 ` Mateusz Guzik
2025-05-16 12:41 ` Theodore Ts'o
` (2 subsequent siblings)
4 siblings, 0 replies; 56+ messages in thread
From: Mateusz Guzik @ 2025-05-16 12:41 UTC (permalink / raw)
To: Jan Kara
Cc: Alejandro Colomar, Alexander Viro, Christian Brauner,
linux-fsdevel, linux-api, linux-man
On Fri, May 16, 2025 at 12:48:56PM +0200, Jan Kara wrote:
> Hi!
>
> On Thu 15-05-25 23:33:22, Alejandro Colomar wrote:
> > I'm updating the manual pages for POSIX.1-2024, and have some doubts
> > about close(2). The manual page for close(2) says (conforming to
> > POSIX.1-2008):
> >
> > The EINTR error is a somewhat special case. Regarding the EINTR
> > error, POSIX.1‐2008 says:
> >
> > If close() is interrupted by a signal that is to be
> > caught, it shall return -1 with errno set to EINTR and
> > the state of fildes is unspecified.
> >
> > This permits the behavior that occurs on Linux and many other
> > implementations, where, as with other errors that may be
> > reported by close(), the file descriptor is guaranteed to be
> > closed. However, it also permits another possibility: that the
> > implementation returns an EINTR error and keeps the file
> > descriptor open. (According to its documentation, HP‐UX’s close()
> > does this.) The caller must then once more use close() to close
> > the file descriptor, to avoid file descriptor leaks. This
> > divergence in implementation behaviors provides a difficult hurdle
> > for portable applications, since on many implementations,
> > close() must not be called again after an EINTR error, and on at
> > least one, close() must be called again. There are plans to
> > address this conundrum for the next major release of the POSIX.1
> > standard.
> >
> > TL;DR: close(2) with EINTR is allowed to either leave the fd open or
> > closed, and Linux leaves it closed, while others (HP-UX only?) leave it
> > open.
> >
> > Now, POSIX.1-2024 says:
> >
> > If close() is interrupted by a signal that is to be caught, then
> > it is unspecified whether it returns -1 with errno set to
> > [EINTR] and fildes remaining open, or returns -1 with errno set
> > to [EINPROGRESS] and fildes being closed, or returns 0 to
> > indicate successful completion; [...]
> >
> > <https://pubs.opengroup.org/onlinepubs/9799919799/functions/close.html>
> >
> > Which seems to bless HP-UX and screw all the others, requiring them to
> > report EINPROGRESS.
> >
> > Was there any discussion about what to do in the Linux kernel?
>
> I'm not aware of any discussions but indeed we are returning EINTR while
> closing the fd. Frankly, changing the error code we return in that case is
> really asking for userspace regressions so I'm of the opinion we just
> ignore the standard as in my opinion it goes against a long established
> reality.
I wonder what they are thinking there.
Any program which even bothers to check for EINTR assumes the fd is
already closed, so one has to assume augmenting behavior to support this
would result in fd leaks.
But that crappery aside, I do wonder if a close() variant which can fail
and leaves the fd intact would be warranted.
For example, one of the error modes is ENOSPC (or at least the manpage
claims as much). As is, the error is not actionable, as the fd is gone.
If instead a magic flag was passed down to indicate what to do (e.g.,
leave the fd in place), the program could try to do some recovery (for
example, unlinking temp files it knows it stores there).
Similar deal with EINTR, albeit this error for close() would preferably get
eradicated instead.
Just some meh rambling.
* Re: close(2) with EINTR has been changed by POSIX.1-2024
2025-05-16 10:48 ` Jan Kara
2025-05-16 12:11 ` Alejandro Colomar
2025-05-16 12:41 ` close(2) with EINTR has been changed by POSIX.1-2024 Mateusz Guzik
@ 2025-05-16 12:41 ` Theodore Ts'o
2025-05-19 23:19 ` Steffen Nurpmeso
2025-05-16 19:13 ` Al Viro
2025-05-19 9:48 ` Christian Brauner
4 siblings, 1 reply; 56+ messages in thread
From: Theodore Ts'o @ 2025-05-16 12:41 UTC (permalink / raw)
To: Jan Kara
Cc: Alejandro Colomar, Alexander Viro, Christian Brauner,
linux-fsdevel, linux-api, linux-man
On Fri, May 16, 2025 at 12:48:56PM +0200, Jan Kara wrote:
> > Now, POSIX.1-2024 says:
> >
> > If close() is interrupted by a signal that is to be caught, then
> > it is unspecified whether it returns -1 with errno set to
> > [EINTR] and fildes remaining open, or returns -1 with errno set
> > to [EINPROGRESS] and fildes being closed, or returns 0 to
> > indicate successful completion; [...]
> >
> > <https://pubs.opengroup.org/onlinepubs/9799919799/functions/close.html>
> >
> > Which seems to bless HP-UX and screw all the others, requiring them to
> > report EINPROGRESS.
> >
> > Was there any discussion about what to do in the Linux kernel?
>
> I'm not aware of any discussions but indeed we are returning EINTR while
> closing the fd. Frankly, changing the error code we return in that case is
> really asking for userspace regressions so I'm of the opinion we just
> ignore the standard as in my opinion it goes against a long established
> reality.
Yeah, it appears that the Austin Group has lost all connection with
reality, and we should treat POSIX 2024 accordingly. Not breaking
userspace applications is way more important than POSIX 2024
compliance. Which is sad, because I used to really care about POSIX.1
standard as being very useful. But that seems to be no longer the
case...
- Ted
* [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2025-05-16 12:11 ` Alejandro Colomar
@ 2025-05-16 12:52 ` Alejandro Colomar
2025-05-16 13:05 ` Rich Felker
0 siblings, 1 reply; 56+ messages in thread
From: Alejandro Colomar @ 2025-05-16 12:52 UTC (permalink / raw)
Cc: Alejandro Colomar, Jan Kara, Alexander Viro, Christian Brauner,
Rich Felker, linux-fsdevel, linux-api, libc-alpha
POSIX.1-2024 now mandates a behavior different from what Linux (and many
other implementations) does. It requires that we report EINPROGRESS for
what now is EINTR.
There are no plans to conform to POSIX.1-2024 within the Linux kernel,
so document this divergence. Keep POSIX.1-2008 as the standard to
which we conform in STANDARDS.
Link: <https://sourceware.org/bugzilla/show_bug.cgi?id=14627>
Link: <https://pubs.opengroup.org/onlinepubs/9799919799/functions/close.html>
Cc: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Rich Felker <dalias@libc.org>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: <linux-api@vger.kernel.org>
Cc: <libc-alpha@sourceware.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
Hi,
I've prepared this draft for discussion. While doing so, I've noticed
the glibc bug ticket, which sounds possibly reasonable: returning 0
instead of reporting an error on EINTR. That would be an option that
would make us conforming to POSIX.1-2024. And given that a user can
(and must) do nothing after seeing EINTR, returning 0 wouldn't change
things.
So, I'll leave this patch open for discussion.
Have a lovely day!
Alex
man/man2/close.2 | 21 ++++++---------------
1 file changed, 6 insertions(+), 15 deletions(-)
diff --git a/man/man2/close.2 b/man/man2/close.2
index b25ea4de9..9d5e26eed 100644
--- a/man/man2/close.2
+++ b/man/man2/close.2
@@ -191,10 +191,7 @@ .SS Dealing with error returns from close()
meaning that the file descriptor was invalid)
even if they subsequently report an error on return from
.BR close ().
-POSIX.1 is currently silent on this point,
-but there are plans to mandate this behavior in the next major release
-.\" Issue 8
-of the standard.
+POSIX.1-2008 was silent on this point.
.P
A careful programmer who wants to know about I/O errors may precede
.BR close ()
@@ -206,7 +203,7 @@ .SS Dealing with error returns from close()
error is a somewhat special case.
Regarding the
.B EINTR
-error, POSIX.1-2008 says:
+error, POSIX.1-2008 said:
.P
.RS
If
@@ -243,16 +240,10 @@ .SS Dealing with error returns from close()
error, and on at least one,
.BR close ()
must be called again.
-There are plans to address this conundrum for
-the next major release of the POSIX.1 standard.
-.\" FIXME . for later review when Issue 8 is one day released...
-.\" POSIX proposes further changes for EINTR
-.\" http://austingroupbugs.net/tag_view_page.php?tag_id=8
-.\" http://austingroupbugs.net/view.php?id=529
-.\"
-.\" FIXME .
-.\" Review the following glibc bug later
-.\" https://sourceware.org/bugzilla/show_bug.cgi?id=14627
+.P
+POSIX.1-2024 standardized the behavior of HP-UX,
+making Linux and many other implementations non-conforming.
+There are no plans to change the behavior on Linux.
.SH SEE ALSO
.BR close_range (2),
.BR fcntl (2),
Range-diff against v0:
-: --------- > 1: efaffc5a4 man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
base-commit: 978b017d93e4e32b752b33877e44a8365644630c
--
2.49.0
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2025-05-16 12:52 ` [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 Alejandro Colomar
@ 2025-05-16 13:05 ` Rich Felker
2025-05-16 14:20 ` Theodore Ts'o
2025-05-16 14:39 ` Vincent Lefevre
0 siblings, 2 replies; 56+ messages in thread
From: Rich Felker @ 2025-05-16 13:05 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel,
linux-api, libc-alpha
On Fri, May 16, 2025 at 02:52:05PM +0200, Alejandro Colomar wrote:
> POSIX.1-2024 now mandates a behavior different from what Linux (and many
> other implementations) does. It requires that we report EINPROGRESS for
> what now is EINTR.
>
> There are no plans to conform to POSIX.1-2024 within the Linux kernel,
> so document this divergence. Keep POSIX.1-2008 as the standard to
> which we conform in STANDARDS.
>
> Link: <https://sourceware.org/bugzilla/show_bug.cgi?id=14627>
> Link: <https://pubs.opengroup.org/onlinepubs/9799919799/functions/close.html>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Rich Felker <dalias@libc.org>
> Cc: <linux-fsdevel@vger.kernel.org>
> Cc: <linux-api@vger.kernel.org>
> Cc: <libc-alpha@sourceware.org>
> Signed-off-by: Alejandro Colomar <alx@kernel.org>
> ---
>
> Hi,
>
> I've prepared this draft for discussion. While doing so, I've noticed
> the glibc bug ticket, which sounds possibly reasonable: returning 0
> instead of reporting an error on EINTR. That would be an option that
> would make us conforming to POSIX.1-2024. And given that a user can
> (and must) do nothing after seeing EINTR, returning 0 wouldn't change
> things.
>
> So, I'll leave this patch open for discussion.
FWIW musl adopted the EINPROGRESS as soon as we were made aware of the
issue, and later changed it to returning 0 since applications
(particularly, any written prior to this interpretation) are prone to
interpret EINPROGRESS as an error condition rather than success and
possibly misinterpret it as meaning the fd is still open and valid to
pass to close again.
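The mapping described above can be sketched as a thin userspace wrapper.
This is not musl's source code; it is only an illustration of the idea,
assuming a Linux-like kernel that has already released the descriptor
when close() fails with EINTR. The name close_eintr_as_success() is
hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* close_eintr_as_success() is a hypothetical name used for illustration.
 * Assumption: the descriptor is already closed when close() returns -1
 * with errno set to EINTR, so there is nothing left for the caller to do
 * and the call can be reported as successful. */
int close_eintr_as_success(int fd)
{
    int ret = close(fd);

    if (ret == -1 && errno == EINTR)
        return 0;   /* descriptor is gone; report success */

    return ret;     /* 0, or -1 with errno from close() (e.g. EBADF, EIO) */
}
```

Other errors are passed through unchanged, so a caller that checks the
return value still sees EBADF or EIO.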
In general, raw kernel interfaces do not conform to any version of
POSIX; they're just a low-impedance-mismatch set of interfaces that
facilitate implementing POSIX at the userspace libc layer. So I don't
think this should be documented as "Linux doesn't conform" but
(hopefully, once glibc fixes this) "old versions of glibc did not
conform".
Rich
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2025-05-16 13:05 ` Rich Felker
@ 2025-05-16 14:20 ` Theodore Ts'o
2025-05-17 5:46 ` Alejandro Colomar
2025-05-16 14:39 ` Vincent Lefevre
1 sibling, 1 reply; 56+ messages in thread
From: Theodore Ts'o @ 2025-05-16 14:20 UTC (permalink / raw)
To: Rich Felker
Cc: Alejandro Colomar, Jan Kara, Alexander Viro, Christian Brauner,
linux-fsdevel, linux-api, libc-alpha
On Fri, May 16, 2025 at 09:05:47AM -0400, Rich Felker wrote:
> In general, raw kernel interfaces do not conform to any version of
> POSIX; they're just a low-impedance-mismatch set of interfaces that
> facilitate implementing POSIX at the userspace libc layer. So I don't
> think this should be documented as "Linux doesn't conform" but
> (hopefully, once glibc fixes this) "old versions of glibc did not
> conform".
If glibc maintainers want to deal with breaking userspace, then as a
kernel developer, I'm happy to let them deal with the
angry/disappointed users and application programmers. :-)
- Ted
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2025-05-16 13:05 ` Rich Felker
2025-05-16 14:20 ` Theodore Ts'o
@ 2025-05-16 14:39 ` Vincent Lefevre
2025-05-16 14:52 ` Florian Weimer
` (2 more replies)
1 sibling, 3 replies; 56+ messages in thread
From: Vincent Lefevre @ 2025-05-16 14:39 UTC (permalink / raw)
To: Rich Felker
Cc: Alejandro Colomar, Jan Kara, Alexander Viro, Christian Brauner,
linux-fsdevel, linux-api, libc-alpha
On 2025-05-16 09:05:47 -0400, Rich Felker wrote:
> FWIW musl adopted the EINPROGRESS as soon as we were made aware of the
> issue, and later changed it to returning 0 since applications
> (particularly, any written prior to this interpretation) are prone to
> interpret EINPROGRESS as an error condition rather than success and
> possibly misinterpret it as meaning the fd is still open and valid to
> pass to close again.
If I understand correctly, this is a poor choice. POSIX.1-2024 says:
ERRORS
The close() and posix_close() functions shall fail if:
[...]
[EINPROGRESS]
The function was interrupted by a signal and fildes was closed
but the close operation is continuing asynchronously.
But this does not mean that the asynchronous close operation will
succeed.
So the application could incorrectly deduce that the close operation
was done without any error.
--
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Pascaline project (LIP, ENS-Lyon)
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2025-05-16 14:39 ` Vincent Lefevre
@ 2025-05-16 14:52 ` Florian Weimer
2025-05-16 15:28 ` Vincent Lefevre
2025-05-16 15:28 ` Rich Felker
2025-05-17 13:32 ` Rich Felker
2 siblings, 1 reply; 56+ messages in thread
From: Florian Weimer @ 2025-05-16 14:52 UTC (permalink / raw)
To: Vincent Lefevre
Cc: Rich Felker, Alejandro Colomar, Jan Kara, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, libc-alpha
* Vincent Lefevre:
> On 2025-05-16 09:05:47 -0400, Rich Felker wrote:
>> FWIW musl adopted the EINPROGRESS as soon as we were made aware of the
>> issue, and later changed it to returning 0 since applications
>> (particularly, any written prior to this interpretation) are prone to
>> interpret EINPROGRESS as an error condition rather than success and
>> possibly misinterpret it as meaning the fd is still open and valid to
>> pass to close again.
>
> If I understand correctly, this is a poor choice. POSIX.1-2024 says:
>
> ERRORS
> The close() and posix_close() functions shall fail if:
> [...]
> [EINPROGRESS]
> The function was interrupted by a signal and fildes was closed
> but the close operation is continuing asynchronously.
>
> But this does not mean that the asynchronous close operation will
> succeed.
>
> So the application could incorrectly deduce that the close operation
> was done without any error.
But on Linux, close traditionally has poor error reporting anyway. You
have to fsync (or equivalent) before calling close if you want error
checking. On other systems, the fsync is more or less implied by the
close, leading to rather poor performance.
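A minimal sketch of the fsync-before-close pattern, assuming the
descriptor refers to a regular file on a Linux-like system. The helper
name flush_and_close() is hypothetical. Treating EINTR from close() as
harmless is an assumption based on the Linux behavior discussed in this
thread, not something guaranteed elsewhere.

```c
#include <errno.h>
#include <unistd.h>

/* flush_and_close() is an illustrative helper, not a standard function.
 * It returns 0 on success, or -1 with errno set to the first failure. */
int flush_and_close(int fd)
{
    int err = 0;

    /* Ask for write-back errors while the descriptor is still valid,
     * so the caller could still react to them. */
    if (fsync(fd) == -1)
        err = errno;

    /* close() is called exactly once. Assumption: on EINTR the
     * descriptor is already released, so that case is not reported. */
    if (close(fd) == -1 && err == 0 && errno != EINTR)
        err = errno;

    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}
```

For descriptors that do not support synchronization, such as pipes and
sockets, fsync() fails with EINVAL, and this sketch would report that
as an error.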
Thanks,
Florian
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2025-05-16 14:39 ` Vincent Lefevre
2025-05-16 14:52 ` Florian Weimer
@ 2025-05-16 15:28 ` Rich Felker
2025-05-17 13:32 ` Rich Felker
2 siblings, 0 replies; 56+ messages in thread
From: Rich Felker @ 2025-05-16 15:28 UTC (permalink / raw)
To: Vincent Lefevre, Alejandro Colomar, Jan Kara, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, libc-alpha
On Fri, May 16, 2025 at 04:39:57PM +0200, Vincent Lefevre wrote:
> On 2025-05-16 09:05:47 -0400, Rich Felker wrote:
> > FWIW musl adopted the EINPROGRESS as soon as we were made aware of the
> > issue, and later changed it to returning 0 since applications
> > (particularly, any written prior to this interpretation) are prone to
> > interpret EINPROGRESS as an error condition rather than success and
> > possibly misinterpret it as meaning the fd is still open and valid to
> > pass to close again.
>
> If I understand correctly, this is a poor choice. POSIX.1-2024 says:
>
> ERRORS
> The close() and posix_close() functions shall fail if:
> [...]
> [EINPROGRESS]
> The function was interrupted by a signal and fildes was closed
> but the close operation is continuing asynchronously.
>
> But this does not mean that the asynchronous close operation will
> succeed.
It always succeeds in the way that's important: the file descriptor is
freed and the process no longer has this reference to the open file
description.
What might or might not succeed is:
(1) other ancient legacy behaviors coupled to close(), like rewinding
a tape drive. If the application cares how that behaves, it needs to
be performing an explicit rewind *before* calling close, when it still
has a handle on the open file so that it can respond to exceptional
conditions, not relying on a legacy behavior like "close also rewinds"
that's device-specific and outside the scope of any modern
cross-platform standard.
(2) deferred operations in unsafe async NFS setups. This is a huge
mess with no real reliable solution except "don't configure your NFS
to have unsafe and nonconforming behaviors in the pursuit of
performance".
Rich
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2025-05-16 14:52 ` Florian Weimer
@ 2025-05-16 15:28 ` Vincent Lefevre
0 siblings, 0 replies; 56+ messages in thread
From: Vincent Lefevre @ 2025-05-16 15:28 UTC (permalink / raw)
To: Florian Weimer
Cc: Rich Felker, Alejandro Colomar, Jan Kara, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, libc-alpha
On 2025-05-16 16:52:48 +0200, Florian Weimer wrote:
> * Vincent Lefevre:
>
> > On 2025-05-16 09:05:47 -0400, Rich Felker wrote:
> >> FWIW musl adopted the EINPROGRESS as soon as we were made aware of the
> >> issue, and later changed it to returning 0 since applications
> >> (particularly, any written prior to this interpretation) are prone to
> >> interpret EINPROGRESS as an error condition rather than success and
> >> possibly misinterpret it as meaning the fd is still open and valid to
> >> pass to close again.
> >
> > If I understand correctly, this is a poor choice. POSIX.1-2024 says:
> >
> > ERRORS
> > The close() and posix_close() functions shall fail if:
> > [...]
> > [EINPROGRESS]
> > The function was interrupted by a signal and fildes was closed
> > but the close operation is continuing asynchronously.
> >
> > But this does not mean that the asynchronous close operation will
> > succeed.
> >
> > So the application could incorrectly deduce that the close operation
> > was done without any error.
>
> But on Linux, close traditionally has poor error reporting anyway. You
> have to fsync (or equivalent) before calling close if you want error
> checking. On other systems, the fsync is more or less implied by the
> close, leading to rather poor performance.
According to its documentation, fsync is only for storage devices,
while not all file descriptors are associated with storage devices.
So I'm wondering about the consequences in the other cases.
--
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Pascaline project (LIP, ENS-Lyon)
* Re: close(2) with EINTR has been changed by POSIX.1-2024
2025-05-16 10:48 ` Jan Kara
` (2 preceding siblings ...)
2025-05-16 12:41 ` Theodore Ts'o
@ 2025-05-16 19:13 ` Al Viro
2025-05-19 9:48 ` Christian Brauner
4 siblings, 0 replies; 56+ messages in thread
From: Al Viro @ 2025-05-16 19:13 UTC (permalink / raw)
To: Jan Kara
Cc: Alejandro Colomar, Christian Brauner, linux-fsdevel, linux-api,
linux-man
On Fri, May 16, 2025 at 12:48:56PM +0200, Jan Kara wrote:
> I'm not aware of any discussions but indeed we are returning EINTR while
> closing the fd. Frankly, changing the error code we return in that case is
> really asking for userspace regressions so I'm of the opinion we just
> ignore the standard as in my opinion it goes against a long established
> reality.
AFAICS what happens is that relevance of Austin Group has dropped so low
that they stopped caring about any BS filters they used to have. What
we are seeing now is assorted pet idiocies that used to sit in their
system, periodically getting shot down while there had been anyone who
cared to do that.
Sad, of course, but what can we do, other than politely ignoring the... output?
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2025-05-16 14:20 ` Theodore Ts'o
@ 2025-05-17 5:46 ` Alejandro Colomar
2025-05-17 13:03 ` Alejandro Colomar
0 siblings, 1 reply; 56+ messages in thread
From: Alejandro Colomar @ 2025-05-17 5:46 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Rich Felker, Jan Kara, Alexander Viro, Christian Brauner,
linux-fsdevel, linux-api, libc-alpha
Hi Ted, Rich,
On Fri, May 16, 2025 at 09:05:47AM -0400, Rich Felker wrote:
> FWIW musl adopted the EINPROGRESS as soon as we were made aware of the
> issue, and later changed it to returning 0 since applications
> (particularly, any written prior to this interpretation) are prone to
> interpret EINPROGRESS as an error condition rather than success and
> possibly misinterpret it as meaning the fd is still open and valid to
> pass to close again.
Hmmm, this page will need a kernel/libc differences section where I
should explain this.
On Fri, May 16, 2025 at 10:20:24AM -0400, Theodore Ts'o wrote:
> On Fri, May 16, 2025 at 09:05:47AM -0400, Rich Felker wrote:
>
> > In general, raw kernel interfaces do not conform to any version of
> > POSIX; they're just a low-impedance-mismatch set of interfaces that
> > facilitate implementing POSIX at the userspace libc layer. So I don't
> > think this should be documented as "Linux doesn't conform" but
> > (hopefully, once glibc fixes this) "old versions of glibc did not
> > conform".
>
> If glibc maintainers want to deal with breaking userspace, then as a
> kernel developer, I'm happy to let them deal with the
> angry/disappointed users and application programmers. :-)
Which breakage do you expect from the behavior that musl has chosen?
I agree that the POSIX invention of EINPROGRESS is something that would
break users. However, in removing the error completely and making it a
success, I don't see the same problem. That is, if a program calls
close(2) and sees a return of 0, or sees a return of -1 with EINTR on
Linux, both mean "the file descriptor has been closed, and the contents
of the file will *eventually* reach the file".
In which cases do you expect any existing Linux program to behave
differently on 0 and on EINTR?
Have a lovely day!
Alex
--
<https://www.alejandro-colomar.es/>
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2025-05-17 5:46 ` Alejandro Colomar
@ 2025-05-17 13:03 ` Alejandro Colomar
2025-05-17 13:43 ` Rich Felker
0 siblings, 1 reply; 56+ messages in thread
From: Alejandro Colomar @ 2025-05-17 13:03 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Rich Felker, Jan Kara, Alexander Viro, Christian Brauner,
linux-fsdevel, linux-api, libc-alpha
Hi,
On Sat, May 17, 2025 at 07:46:48AM +0200, Alejandro Colomar wrote:
> Hi Ted, Rich,
>
> On Fri, May 16, 2025 at 09:05:47AM -0400, Rich Felker wrote:
> > FWIW musl adopted the EINPROGRESS as soon as we were made aware of the
> > issue, and later changed it to returning 0 since applications
> > (particularly, any written prior to this interpretation) are prone to
> > interpret EINPROGRESS as an error condition rather than success and
> > possibly misinterpret it as meaning the fd is still open and valid to
> > pass to close again.
BTW, I don't think that's a correct interpretation. The manual page
clearly says after close(2), even on error, the fd is closed and not
usable. The issue I see is a program thinking it failed and trying to
copy the file again or reporting an error.
On the other hand, as Vincent said, maybe this is not so bad. For
certain files, fsync(2) is only described for storage devices, so in
some cases there's no clear way to make sure close(2) won't fail after
EINTR (maybe calling sync(2)?). So, maybe considering it an error
wouldn't be a terrible idea.
I don't know.
Cheers,
Alex
>
> Hmmm, this page will need a kernel/libc differences section where I
> should explain this.
>
> On Fri, May 16, 2025 at 10:20:24AM -0400, Theodore Ts'o wrote:
> > On Fri, May 16, 2025 at 09:05:47AM -0400, Rich Felker wrote:
> >
> > > In general, raw kernel interfaces do not conform to any version of
> > > POSIX; they're just a low-impedance-mismatch set of interfaces that
> > > facilitate implementing POSIX at the userspace libc layer. So I don't
> > > think this should be documented as "Linux doesn't conform" but
> > > (hopefully, once glibc fixes this) "old versions of glibc did not
> > > conform".
> >
> > If glibc maintainers want to deal with breaking userspace, then as a
> > kernel developer, I'm happy to let them deal with the
> > angry/disappointed users and application programmers. :-)
>
> Which breakage do you expect from the behavior that musl has chosen?
>
> I agree that the POSIX invention of EINPROGRESS is something that would
> break users. However, in removing the error completely and making it a
> success, I don't see the same problem. That is, if a program calls
> close(2) and sees a return of 0, or sees a return of -1 with EINTR on
> Linux, both mean "the file descriptor has been closed, and the contents
> of the file will *eventually* reach the file".
>
> In which cases do you expect any existing Linux program to behave
> differently on 0 and on EINTR?
>
>
> Have a lovely day!
> Alex
>
> --
> <https://www.alejandro-colomar.es/>
--
<https://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2025-05-16 14:39 ` Vincent Lefevre
2025-05-16 14:52 ` Florian Weimer
2025-05-16 15:28 ` Rich Felker
@ 2025-05-17 13:32 ` Rich Felker
2025-05-17 13:46 ` Alejandro Colomar
2026-02-06 15:13 ` Vincent Lefevre
2 siblings, 2 replies; 56+ messages in thread
From: Rich Felker @ 2025-05-17 13:32 UTC (permalink / raw)
To: Vincent Lefevre, Alejandro Colomar, Jan Kara, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, libc-alpha
On Fri, May 16, 2025 at 04:39:57PM +0200, Vincent Lefevre wrote:
> On 2025-05-16 09:05:47 -0400, Rich Felker wrote:
> > FWIW musl adopted the EINPROGRESS as soon as we were made aware of the
> > issue, and later changed it to returning 0 since applications
> > (particularly, any written prior to this interpretation) are prone to
> > interpret EINPROGRESS as an error condition rather than success and
> > possibly misinterpret it as meaning the fd is still open and valid to
> > pass to close again.
>
> If I understand correctly, this is a poor choice. POSIX.1-2024 says:
>
> ERRORS
> The close() and posix_close() functions shall fail if:
> [...]
> [EINPROGRESS]
> The function was interrupted by a signal and fildes was closed
> but the close operation is continuing asynchronously.
>
> But this does not mean that the asynchronous close operation will
> succeed.
There are no asynchronous behaviors specified for there to be a
conformance distinction here. The only observable behaviors happen
instantly, mainly the release of the file descriptor and the process's
handle on the underlying resource. Abstractly, there is no async
operation that could succeed or fail.
> So the application could incorrectly deduce that the close operation
> was done without any error.
This deduction is correct, not incorrect. Rather, failing with
EINPROGRESS would make the application incorrectly deduce that there
might be some error it missed (even if it's aware of the new error
code), and absolutely does make all existing applications written
prior to the new text in POSIX 2024 unable to determine if the fd was
even released and needs to be passed to close again or not.
Rich
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2025-05-17 13:03 ` Alejandro Colomar
@ 2025-05-17 13:43 ` Rich Felker
0 siblings, 0 replies; 56+ messages in thread
From: Rich Felker @ 2025-05-17 13:43 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Theodore Ts'o, Jan Kara, Alexander Viro, Christian Brauner,
linux-fsdevel, linux-api, libc-alpha
On Sat, May 17, 2025 at 03:03:52PM +0200, Alejandro Colomar wrote:
> Hi,
>
> On Sat, May 17, 2025 at 07:46:48AM +0200, Alejandro Colomar wrote:
> > Hi Ted, Rich,
> >
> > On Fri, May 16, 2025 at 09:05:47AM -0400, Rich Felker wrote:
> > > FWIW musl adopted the EINPROGRESS as soon as we were made aware of the
> > > issue, and later changed it to returning 0 since applications
> > > (particularly, any written prior to this interpretation) are prone to
> > > interpret EINPROGRESS as an error condition rather than success and
> > > possibly misinterpret it as meaning the fd is still open and valid to
> > > pass to close again.
>
> BTW, I don't think that's a correct interpretation. The manual page
> clearly says after close(2), even on error, the fd is closed and not
> usable. The issue I see is a program thinking it failed and trying to
> copy the file again or reporting an error.
The authoritative source here is POSIX not the man page, assuming
you're writing a portable application and not a "Linux application".
Until the latest issue (POSIX 2024/Issue 8), the state of the fd
after EINTR was explicitly unspecified, and after other errors was
unspecified by omission. So there is no way for a program written to
prior versions of the standard to have known how to safely handle
getting EINPROGRESS -- or any error from close for that matter.
Really, the only safe error for close to return, *ever*, is EBADF. On
valid input, it *must succeed*. This is a general principle for
"deallocation/destruction functions". Not an explicit requirement of
this or any standard; just a logical requirement for forward progress
to be possible.
> On the other hand, as Vincent said, maybe this is not so bad. For
> certain files, fsync(2) is only described for storage devices, so in
> some cases there's no clear way to make sure close(2) won't fail after
> EINTR (maybe calling sync(2)?). So, maybe considering it an error
> wouldn't be a terrible idea.
Whether data is committed to physical storage in a way that's robust
against machine faults is a completely separate issue from whether
it's committed to the abstract storage. The latter happens at the
moment of write, not close.
If an application is trying to ensure that kind of robustness, the
return value of close is not the tool. It needs the Synchronized IO
interfaces (fsync, etc.) or something specific to whatever it's
writing to.
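
The separation described above can be sketched in C. This is only a minimal illustration, assuming Linux semantics where close() always releases the descriptor; the name write_durably() is hypothetical and does not come from any library. The point it shows is that fsync(), not close(), is the call whose result answers "did the data reach storage?".

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and ask the kernel to
 * commit them to storage.  Returns 0 on success, -1 with errno set. */
int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* nothing was written; retrying write() is safe */
            int saved = errno;
            close(fd);             /* release the fd; its result is irrelevant now */
            errno = saved;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Durability is decided here, not by close(). */
    if (fsync(fd) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    /* Assumption: Linux behavior, where the fd is gone even if close()
     * reports EINTR.  Never call close() a second time. */
    if (close(fd) < 0 && errno != EINTR && errno != EINPROGRESS)
        return -1;
    return 0;
}
```

With this shape, a caller that needs robustness against machine faults looks at the fsync() result; the close() result is only consulted for leftover diagnostics.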
Rich
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2025-05-17 13:32 ` Rich Felker
@ 2025-05-17 13:46 ` Alejandro Colomar
2025-05-23 18:10 ` Zack Weinberg
2026-02-06 15:13 ` Vincent Lefevre
1 sibling, 1 reply; 56+ messages in thread
From: Alejandro Colomar @ 2025-05-17 13:46 UTC (permalink / raw)
To: Rich Felker
Cc: Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner,
linux-fsdevel, linux-api, libc-alpha
[-- Attachment #1: Type: text/plain, Size: 3465 bytes --]
On Sat, May 17, 2025 at 09:32:52AM -0400, Rich Felker wrote:
> On Fri, May 16, 2025 at 04:39:57PM +0200, Vincent Lefevre wrote:
> > On 2025-05-16 09:05:47 -0400, Rich Felker wrote:
> > > FWIW musl adopted the EINPROGRESS as soon as we were made aware of the
> > > issue, and later changed it to returning 0 since applications
> > > (particularly, any written prior to this interpretation) are prone to
> > > interpret EINPROGRESS as an error condition rather than success and
> > > possibly misinterpret it as meaning the fd is still open and valid to
> > > pass to close again.
> >
> > If I understand correctly, this is a poor choice. POSIX.1-2024 says:
> >
> > ERRORS
> > The close() and posix_close() functions shall fail if:
> > [...]
> > [EINPROGRESS]
> > The function was interrupted by a signal and fildes was closed
> > but the close operation is continuing asynchronously.
> >
> > But this does not mean that the asynchronous close operation will
> > succeed.
>
> There are no asynchronous behaviors specified for there to be a
> conformance distinction here. The only observable behaviors happen
> instantly, mainly the release of the file descriptor and the process's
> handle on the underlying resource. Abstractly, there is no async
> operation that could succeed or fail.
>
> > So the application could incorrectly deduce that the close operation
> > was done without any error.
>
> This deduction is correct, not incorrect. Rather, failing with
> EINPROGRESS would make the application incorrectly deduce that there
> might be some error it missed (even if it's aware of the new error
> code), and absolutely does make all existing applications written
> prior to the new text in POSIX 2024 unable to determine if the fd was
> even released and needs to be passed to close again or not.
Hi Rich,
I think this is not correct; at least on Linux. The manual page is very
clear that close(2) should not be retried on error:
Dealing with error returns from close()
A careful programmer will check the return value of close(),
since it is quite possible that errors on a previous write(2)
operation are reported only on the final close() that releases
the open file description. Failing to check the return value
when closing a file may lead to silent loss of data. This can
especially be observed with NFS and with disk quota.
Note, however, that a failure return should be used only for di‐
agnostic purposes (i.e., a warning to the application that there
may still be I/O pending or there may have been failed I/O) or
remedial purposes (e.g., writing the file once more or creating
a backup).
Retrying the close() after a failure return is the wrong thing
to do, since this may cause a reused file descriptor from an‐
other thread to be closed. This can occur because the Linux
kernel always releases the file descriptor early in the close
operation, freeing it for reuse; the steps that may return an
error, such as flushing data to the filesystem or device, occur
only later in the close operation.
...
A careful programmer who wants to know about I/O errors may pre‐
cede close() with a call to fsync(2).
Cheers,
Alex
--
<https://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: close(2) with EINTR has been changed by POSIX.1-2024
2025-05-16 10:48 ` Jan Kara
` (3 preceding siblings ...)
2025-05-16 19:13 ` Al Viro
@ 2025-05-19 9:48 ` Christian Brauner
4 siblings, 0 replies; 56+ messages in thread
From: Christian Brauner @ 2025-05-19 9:48 UTC (permalink / raw)
To: Jan Kara
Cc: Alejandro Colomar, Alexander Viro, linux-fsdevel, linux-api,
linux-man
On Fri, May 16, 2025 at 12:48:56PM +0200, Jan Kara wrote:
> Hi!
>
> On Thu 15-05-25 23:33:22, Alejandro Colomar wrote:
> > I'm updating the manual pages for POSIX.1-2024, and have some doubts
> > about close(2). The manual page for close(2) says (conforming to
> > POSIX.1-2008):
> >
> > The EINTR error is a somewhat special case. Regarding the EINTR
> > error, POSIX.1‐2008 says:
> >
> > If close() is interrupted by a signal that is to be
> > caught, it shall return -1 with errno set to EINTR and
> > the state of fildes is unspecified.
> >
> > This permits the behavior that occurs on Linux and many other
> > implementations, where, as with other errors that may be re‐
> > ported by close(), the file descriptor is guaranteed to be
> > closed. However, it also permits another possibility: that the
> > implementation returns an EINTR error and keeps the file de‐
> > scriptor open. (According to its documentation, HP‐UX’s close()
> > does this.) The caller must then once more use close() to close
> > the file descriptor, to avoid file descriptor leaks. This di‐
> > vergence in implementation behaviors provides a difficult hurdle
> > for portable applications, since on many implementations,
> > close() must not be called again after an EINTR error, and on at
> > least one, close() must be called again. There are plans to ad‐
> > dress this conundrum for the next major release of the POSIX.1
> > standard.
> >
> > TL;DR: close(2) with EINTR is allowed to either leave the fd open or
> > closed, and Linux leaves it closed, while others (HP-UX only?) leaves it
> > open.
> >
> > Now, POSIX.1-2024 says:
> >
> > If close() is interrupted by a signal that is to be caught, then
> > it is unspecified whether it returns -1 with errno set to
> > [EINTR] and fildes remaining open, or returns -1 with errno set
> > to [EINPROGRESS] and fildes being closed, or returns 0 to
> > indicate successful completion; [...]
> >
> > <https://pubs.opengroup.org/onlinepubs/9799919799/functions/close.html>
> >
> > Which seems to bless HP-UX and screw all the others, requiring them to
> > report EINPROGRESS.
> >
> > Was there any discussion about what to do in the Linux kernel?
>
> I'm not aware of any discussions but indeed we are returning EINTR while
> closing the fd. Frankly, changing the error code we return in that case is
> really asking for userspace regressions so I'm of the opinion we just
> ignore the standard as in my opinion it goes against a long established
> reality.
Ignore. We've long since stopped designing apis with input from that
standard in mind. And I think that was a very wise decision.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: close(2) with EINTR has been changed by POSIX.1-2024
2025-05-16 12:41 ` Theodore Ts'o
@ 2025-05-19 23:19 ` Steffen Nurpmeso
2025-05-20 13:37 ` Theodore Ts'o
0 siblings, 1 reply; 56+ messages in thread
From: Steffen Nurpmeso @ 2025-05-19 23:19 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Jan Kara, Alejandro Colomar, Alexander Viro, Christian Brauner,
linux-fsdevel, linux-api, linux-man, Steffen Nurpmeso
Theodore Ts'o wrote in
<20250516124147.GB7158@mit.edu>:
|On Fri, May 16, 2025 at 12:48:56PM +0200, Jan Kara wrote:
|>> Now, POSIX.1-2024 says:
|>>
|>> If close() is interrupted by a signal that is to be caught, then
|>> it is unspecified whether it returns -1 with errno set to
|>> [EINTR] and fildes remaining open, or returns -1 with errno set
|>> to [EINPROGRESS] and fildes being closed, or returns 0 to
|>> indicate successful completion; [...]
|>>
|>> <https://pubs.opengroup.org/onlinepubs/9799919799/functions/close.html>
|>>
|>> Which seems to bless HP-UX and screw all the others, requiring them to
|>> report EINPROGRESS.
|>>
|>> Was there any discussion about what to do in the Linux kernel?
|>
|> I'm not aware of any discussions but indeed we are returning EINTR while
|> closing the fd. Frankly, changing the error code we return in that \
|> case is
|> really asking for userspace regressions so I'm of the opinion we just
|> ignore the standard as in my opinion it goes against a long established
|> reality.
|
|Yeah, it appears that the Austin Group has lost all connection with
|reality, and we should treat POSIX 2024 accordingly. Not breaking
|userspace applications is way more important than POSIX 2024
|compliance. Which is sad, because I used to really care about POSIX.1
|standard as being very useful. But that seems to be no longer the
|case...
They could not do otherwise than talking the status quo, i think.
They have explicitly added posix_close() which overcomes the
problem (for those operating systems which actually act like
that). There is a long RATIONALE on this, it starts on page 747 :)
--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: close(2) with EINTR has been changed by POSIX.1-2024
2025-05-19 23:19 ` Steffen Nurpmeso
@ 2025-05-20 13:37 ` Theodore Ts'o
2025-05-20 23:16 ` Steffen Nurpmeso
0 siblings, 1 reply; 56+ messages in thread
From: Theodore Ts'o @ 2025-05-20 13:37 UTC (permalink / raw)
To: Jan Kara, Alejandro Colomar, Alexander Viro, Christian Brauner,
linux-fsdevel, linux-api, linux-man, Steffen Nurpmeso
On Tue, May 20, 2025 at 01:19:19AM +0200, Steffen Nurpmeso wrote:
>
> They could not do otherwise than talking the status quo, i think.
> They have explicitly added posix_close() which overcomes the
> problem (for those operating systems which actually act like
> that). There is a long RATIONALE on this, it starts on page 747 :)
They could have just added posix_close() which provided well-defined
semantics without demanding that existing implementations make
non-backwards compatible changes to close(2). Personally, while they
were adding posix_close(2) they could have also fixed the disaster
which is the semantics around close(2) and how advisory locks get
released that were held by other file descriptors, and added a profound
apology for the insane semantics demanded by POSIX[1].
[1] "POSIX advisory locks are broken by design."
https://www.sqlite.org/src/artifact/c230a7a24?ln=994-1081
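
For readers unfamiliar with the lock problem mentioned above, here is a small sketch of it in C. The function names are hypothetical; the only assumption is classic fcntl() record locks on a local filesystem. A process takes a write lock through one descriptor, then opens and closes a second, unrelated descriptor for the same file, and the lock is gone.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: from a child process, ask whether a write lock on
 * path would conflict with a lock held by another process (the parent).
 * Returns 1 if a conflicting lock exists, 0 if not, -1 on error. */
static int conflicting_lock_exists(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* fcntl() record locks are not inherited across fork(), so the
         * child sees the parent's lock as foreign. */
        int fd = open(path, O_RDWR);
        struct flock probe = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        if (fd < 0 || fcntl(fd, F_GETLK, &probe) < 0)
            _exit(2);
        _exit(probe.l_type == F_UNLCK ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)
        || WEXITSTATUS(status) > 1)
        return -1;
    return WEXITSTATUS(status);
}

/* Returns 1 if the lock taken through fd1 vanished when an unrelated
 * descriptor for the same file was closed, 0 if it survived, -1 on error. */
int posix_lock_lost_on_close(const char *path)
{
    int fd1 = open(path, O_RDWR | O_CREAT, 0644);
    if (fd1 < 0)
        return -1;

    /* l_start = l_len = 0 locks the whole file. */
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    if (fcntl(fd1, F_SETLK, &fl) < 0) {
        close(fd1);
        return -1;
    }
    int held_before = conflicting_lock_exists(path);

    /* Open and close a second descriptor; fd1 itself is never touched. */
    int fd2 = open(path, O_RDONLY);
    if (fd2 >= 0)
        close(fd2);

    int held_after = conflicting_lock_exists(path);
    close(fd1);

    if (fd2 < 0 || held_before < 0 || held_after < 0)
        return -1;
    return held_before == 1 && held_after == 0;
}
```

The second open()/close() pair could just as well be buried inside a library the application does not control, which is what makes these semantics hard to live with.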
- Ted
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: close(2) with EINTR has been changed by POSIX.1-2024
2025-05-20 13:37 ` Theodore Ts'o
@ 2025-05-20 23:16 ` Steffen Nurpmeso
0 siblings, 0 replies; 56+ messages in thread
From: Steffen Nurpmeso @ 2025-05-20 23:16 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Jan Kara, Alejandro Colomar, Alexander Viro, Christian Brauner,
linux-fsdevel, linux-api, linux-man, Steffen Nurpmeso
Theodore Ts'o wrote in
<20250520133705.GE38098@mit.edu>:
|On Tue, May 20, 2025 at 01:19:19AM +0200, Steffen Nurpmeso wrote:
|> They could not do otherwise than talking the status quo, i think.
|> They have explicitly added posix_close() which overcomes the
|> problem (for those operating systems which actually act like
|> that). There is a long RATIONALE on this, it starts on page 747 :)
|
|They could have just added posix_close() which provided well-defined
|semantics without demanding that existing implementations make
|non-backwards compatible changes to close(2). Personally, while they
|were adding posix_close(2) they could have also fixed the disaster
|which is the semantics around close(2) [.]
Well it was a lot of trouble, not only in bug 529[1], with
follow-ups like a thread started by Michael Kerrisk, with an
interesting response by Rich Felker of Musl[2].
In [1] Eric Blake of Red Hat/libvirt said, for example:
The Linux kernel currently always frees the file descriptor (no
chance for a retry; the filedes can immediately be reused by
another open()), for both EINTR and EIO. Maybe it is safer to
state that the fd is _always_ closed, even if failure is
reported?
etc, but Geoff Clare then (this also was in 2012, where one
possibly could have hoped that more operating systems survive /
continue with money/manpower backing by serious companies; just
in case that mattered) came via
HP got it right with HP-UX; AIX and Linux do the wrong thing.
and he has quite some reasoning for descriptors like ttys etc,
where close can linger, which resulted in Eric Blake quoting
Let me make it very, very clear - no matter how many times these
guys assert HP-UX insane behaviour correct, no "fixes" to Linux
one are going to be accepted. Consider it vetoed. By me, in
role of Linux VFS maintainer. And I'm _very_ certain that
getting Linus to agree will be a matter of minutes.
[1] https://www.austingroupbugs.net/view.php?id=529
[2] https://www.mail-archive.com/austin-group-l@opengroup.org/msg00579.html
|[.] and how advisory locks get
|released that were held by other file descriptors, and added a profound
|apology for the insane semantics demanded by POSIX[1].
The new standard added the Linux-style F_OFD_* fcntl(2) locks!
They are yet Linux-only, but NetBSD at least has an issue by
a major contributor (bug 59241):
NetBSD seems to lack the following:
3.237 OFD-Owned File Lock
...
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_237
>How-To-Repeat:
standards inspection
>Fix:
Yes, please! (That or write down a reason why we eschew it.)
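
As a rough sketch of what the OFD-owned locks mentioned above change, the following C fragment takes a lock with F_OFD_SETLK, opens and closes an unrelated descriptor for the same file, and then probes through a third open file description. It assumes Linux 3.15 or later with glibc, where the F_OFD_* constants require _GNU_SOURCE; the function name is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* assumption: glibc, which exposes F_OFD_* only with this */
#endif
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if an OFD lock taken through fd1 is still held after an
 * unrelated descriptor for the same file was closed, 0 if it was lost,
 * -1 on error. */
int ofd_lock_survives_close(const char *path)
{
    int fd1 = open(path, O_RDWR | O_CREAT, 0644);
    if (fd1 < 0)
        return -1;

    /* For the OFD commands l_pid must be 0; the initializer zeroes it. */
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    if (fcntl(fd1, F_OFD_SETLK, &fl) < 0) {
        close(fd1);
        return -1;
    }

    /* With classic fcntl() locks this close would drop the lock. */
    int tmp = open(path, O_RDONLY);
    if (tmp >= 0)
        close(tmp);

    /* A separate open file description conflicts with the lock owned by
     * fd1's description, even inside the same process. */
    int fd2 = open(path, O_RDWR);
    struct flock probe = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    int survived = -1;
    if (tmp >= 0 && fd2 >= 0 && fcntl(fd2, F_OFD_GETLK, &probe) == 0)
        survived = (probe.l_type != F_UNLCK);

    if (fd2 >= 0)
        close(fd2);
    close(fd1);   /* the lock goes away when its own description is closed */
    return survived;
}
```

The lock belongs to the open file description rather than to the process, so only closing the last descriptor that refers to that description releases it.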
|[1] "POSIX advisory locks are broken by design."
| https://www.sqlite.org/src/artifact/c230a7a24?ln=994-1081
|
| - Ted
--End of <20250520133705.GE38098@mit.edu>
--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2025-05-17 13:46 ` Alejandro Colomar
@ 2025-05-23 18:10 ` Zack Weinberg
2025-05-24 2:24 ` Rich Felker
` (2 more replies)
0 siblings, 3 replies; 56+ messages in thread
From: Zack Weinberg @ 2025-05-23 18:10 UTC (permalink / raw)
To: Alejandro Colomar, Rich Felker
Cc: Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner,
linux-fsdevel, linux-api, GNU libc development
Taking everything said in this thread into account, I have attempted to
wordsmith new language for the close(2) manpage. Please let me know
what you think, and please help me with the bits marked in square
brackets. I can make this into a proper patch for the manpages
when everyone is happy with it.
zw
---
DESCRIPTION
... existing text ...
close() always succeeds. That is, after it returns, _fd_ has
always been disconnected from the open file it formerly referred
to, and its number can be recycled to refer to some other file.
Furthermore, if _fd_ was the last reference to the underlying
open file description, the resources associated with the open file
description will always have been scheduled to be released.
However, close() may report _delayed errors_ from a previous I/O
operation. Therefore, its return value should not be ignored.
RETURN VALUE
close() returns zero if there are no delayed errors to report,
or -1 if there _might_ be delayed errors.
When close() returns -1, check _errno_ to see what the situation
actually is. Most, but not all, _errno_ codes indicate a delayed
I/O error that should be reported to the user. See ERRORS and
NOTES for more detail.
[QUERY: Is it ever possible to get delayed errors on close() from
a file that was opened with O_RDONLY? What about a file that was
opened with O_RDWR but never actually written to? If people only
have to worry about delayed errors if the file was actually
written to, we should say so at this point.
It would also be good to mention whether it is possible to get a
delayed error on close() even if a previous call to fsync() or
fdatasync() succeeded and there haven’t been any more writes to
that file *description* (not necessarily via the fd being closed)
since.]
ERRORS
EBADF _fd_ wasn’t open in the first place, or is outside the
valid numeric range for file descriptors.
EINPROGRESS
EINTR
There are no delayed errors to report, but the kernel is
still doing some clean-up work in the background. This
situation should be treated the same as if close() had
returned zero. Do not retry the close(), and do not report
an error to the user.
EDQUOT
EFBIG
EIO
ENOSPC
These are the most common errno codes associated with
delayed I/O errors. They should be treated as a hard
failure to write to the file that was formerly associated
with _fd_, the same as if an earlier write(2) had failed
with one of these codes. The file has still been closed!
Do not retry the close(). But do report an error to the user.
Depending on the underlying file, close() may return other errno
codes; these should generally also be treated as delayed I/O errors.
NOTES
Dealing with error returns from close()
As discussed above, close() always closes the file. Except when
errno is set to EBADF, EINPROGRESS, or EINTR, an error return from
close() reports a _delayed I/O error_ from a previous write()
operation.
It is vital to report delayed I/O errors to the user; failing to
check the return value of close() can cause _silent_ loss of data.
The most common situations where this actually happens involve
networked filesystems, where, in the name of throughput, write()
often returns success before the server has actually confirmed a
successful write.
However, it is also vital to understand that _no matter what_
close() returns, and _no matter what_ it sets errno to, when it
returns, _the file descriptor passed to close() has been closed_,
and its number is _immediately_ available for reuse by open(2),
dup(2), etc. Therefore, one should never retry a close(), not
even if it set errno to a value that normally indicates the
operation needs to be retried (e.g. EINTR). Retrying a close()
is a serious bug, particularly in a multithreaded program; if
the file descriptor number has already been reused, _that file_
will get closed out from under whatever other thread opened it.
[Possibly something about fsync/fdatasync here?]
BUGS
Prior to POSIX.1-2024, there was no official guarantee that
close() would always close the file descriptor, even on error.
Linux has always closed the file descriptor, even on error,
but other implementations might not have.
The only such implementation we have heard of is HP-UX; at least
some versions of HP-UX’s man page for close() said it should be
retried if it returned -1 with errno set to EINTR. (If you know
exactly which versions of HP-UX are affected, or of any other
Unix where close() doesn’t always close the file descriptor,
please contact us about it.)
Portable code should nonetheless never retry a failed close(); the
consequences of a file descriptor leak are far less dangerous than
the consequences of closing a file out from under another thread.
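
A minimal sketch of the error handling this draft describes, in C. It assumes the Linux behavior stated in the draft (the descriptor is always released, whatever close() returns); on an implementation that keeps the descriptor open after EINTR it would leak the fd instead. The name close_once() is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: close fd exactly once.
 * Returns 0 if there is nothing to report, or -1 with errno set to a
 * delayed I/O error (EIO, ENOSPC, EDQUOT, ...) or to EBADF.
 * Assumption: Linux semantics, where the fd is released in every case. */
int close_once(int fd)
{
    if (close(fd) == 0)
        return 0;

    /* Interrupted, but the descriptor is already gone: treat as success
     * and, above all, do not call close() again. */
    if (errno == EINTR || errno == EINPROGRESS)
        return 0;

    return -1;   /* caller reports errno to the user; still no retry */
}
```

A caller would typically use it as `if (close_once(fd) < 0) perror("close");` and then forget the descriptor number either way.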
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2025-05-23 18:10 ` Zack Weinberg
@ 2025-05-24 2:24 ` Rich Felker
2026-01-20 17:05 ` Zack Weinberg
2025-05-24 19:25 ` Florian Weimer
2026-01-18 22:23 ` Alejandro Colomar
2 siblings, 1 reply; 56+ messages in thread
From: Rich Felker @ 2025-05-24 2:24 UTC (permalink / raw)
To: Zack Weinberg
Cc: Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development
On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote:
> Taking everything said in this thread into account, I have attempted to
> wordsmith new language for the close(2) manpage. Please let me know
> what you think, and please help me with the bits marked in square
> brackets. I can make this into a proper patch for the manpages
> when everyone is happy with it.
>
> zw
>
> ---
>
> DESCRIPTION
> ... existing text ...
>
> close() always succeeds. That is, after it returns, _fd_ has
> always been disconnected from the open file it formerly referred
> to, and its number can be recycled to refer to some other file.
> Furthermore, if _fd_ was the last reference to the underlying
> open file description, the resources associated with the open file
> description will always have been scheduled to be released.
>
> However, close() may report _delayed errors_ from a previous I/O
> operation. Therefore, its return value should not be ignored.
>
> RETURN VALUE
> close() returns zero if there are no delayed errors to report,
> or -1 if there _might_ be delayed errors.
>
> When close() returns -1, check _errno_ to see what the situation
> actually is. Most, but not all, _errno_ codes indicate a delayed
> I/O error that should be reported to the user. See ERRORS and
> NOTES for more detail.
>
> [QUERY: Is it ever possible to get delayed errors on close() from
> a file that was opened with O_RDONLY? What about a file that was
> opened with O_RDWR but never actually written to? If people only
> have to worry about delayed errors if the file was actually
> written to, we should say so at this point.
>
> It would also be good to mention whether it is possible to get a
> delayed error on close() even if a previous call to fsync() or
> fdatasync() succeeded and there haven’t been any more writes to
> that file *description* (not necessarily via the fd being closed)
> since.]
>
> ERRORS
> EBADF _fd_ wasn’t open in the first place, or is outside the
> valid numeric range for file descriptors.
>
> EINPROGRESS
> EINTR
> There are no delayed errors to report, but the kernel is
> still doing some clean-up work in the background. This
> situation should be treated the same as if close() had
> returned zero. Do not retry the close(), and do not report
> an error to the user.
Since this behavior for EINTR is non-conforming (and even prior to the
POSIX 2024 update, it was contrary to the general semantics for EINTR,
that no non-ignorable side-effects have taken place), it should be
noted that it's Linux/glibc-specific.
> EDQUOT
> EFBIG
> EIO
> ENOSPC
> These are the most common errno codes associated with
> delayed I/O errors. They should be treated as a hard
> failure to write to the file that was formerly associated
> with _fd_, the same as if an earlier write(2) had failed
> with one of these codes. The file has still been closed!
> Do not retry the close(). But do report an error to the user.
>
> Depending on the underlying file, close() may return other errno
> codes; these should generally also be treated as delayed I/O errors.
>
> NOTES
> Dealing with error returns from close()
>
> As discussed above, close() always closes the file. Except when
> errno is set to EBADF, EINPROGRESS, or EINTR, an error return from
> close() reports a _delayed I/O error_ from a previous write()
> operation.
>
> It is vital to report delayed I/O errors to the user; failing to
> check the return value of close() can cause _silent_ loss of data.
> The most common situations where this actually happens involve
> networked filesystems, where, in the name of throughput, write()
> often returns success before the server has actually confirmed a
> successful write.
>
> However, it is also vital to understand that _no matter what_
> close() returns, and _no matter what_ it sets errno to, when it
> returns, _the file descriptor passed to close() has been closed_,
> and its number is _immediately_ available for reuse by open(2),
> dup(2), etc. Therefore, one should never retry a close(), not
> even if it set errno to a value that normally indicates the
> operation needs to be retried (e.g. EINTR). Retrying a close()
> is a serious bug, particularly in a multithreaded program; if
> the file descriptor number has already been reused, _that file_
> will get closed out from under whatever other thread opened it.
>
> [Possibly something about fsync/fdatasync here?]
While I agree with all of this, I think the tone is way too
proscriptive. The man pages are to document the behaviors, not tell
people how to program. And again, it should be noted that the standard
behavior is that you *do* have to retry on EINTR, or arrange to ensure
it never happens (e.g. by not installing interrupting signal handlers,
or blocking signals across calls to close), and that treating EINTR as
"fd has been closed" is something you should only do on
known-nonconforming systems.
Aside: the reason EINTR *has to* be specified this way is that pthread
cancellation is aligned with EINTR. If EINTR were defined to have
closed the fd, then acting on cancellation during close would also
have closed the fd, but the cancellation handler would have no way to
distinguish this, leading to a situation where you're forced to either
leak fds or introduce a double-close vuln.
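One way an application can sidestep the cancellation half of this
problem is to disable cancellation around the call. The following is
only a sketch under that assumption; close_cancel_disabled is a
hypothetical name, and the approach says nothing about what an
implementation does on EINTR.

```c
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical helper (an assumption for illustration): call close()
 * with cancellation disabled, so the thread cannot be cancelled while
 * the state of fd is ambiguous. A pending cancellation request is acted
 * on at a later cancellation point, after the previous state has been
 * restored. */
static int close_cancel_disabled(int fd)
{
    int oldstate, ignored, rc, err;

    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);
    rc = close(fd);
    err = errno;              /* save errno across the restore call */
    pthread_setcancelstate(oldstate, &ignored);
    errno = err;
    return rc;
}
```

With this wrapper a cleanup handler never has to guess whether the
descriptor was released, because cancellation is never acted upon
inside close() itself.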
> BUGS
> Prior to POSIX.1-2024, there was no official guarantee that
> close() would always close the file descriptor, even on error.
> Linux has always closed the file descriptor, even on error,
> but other implementations might not have.
>
> The only such implementation we have heard of is HP-UX; at least
> some versions of HP-UX’s man page for close() said it should be
> retried if it returned -1 with errno set to EINTR. (If you know
> exactly which versions of HP-UX are affected, or of any other
> Unix where close() doesn’t always close the file descriptor,
> please contact us about it.)
>
> Portable code should nonetheless never retry a failed close(); the
> consequences of a file descriptor leak are far less dangerous than
> the consequences of closing a file out from under another thread.
This is explicitly the opposite of what's specified for portable code.
It sounds like you are intentionally omitting that POSIX says the
opposite of what you want it to, and treating the standard behavior as
a historical HP-UX quirk/bug. This is polemic, not the sort of
documentation that belongs in a man page.
An outline of what I'd like to see instead:
- Clear explanation of why double-close is a serious bug that must
always be avoided. (I think we all agree on this.)
- Statement that the historical Linux/glibc behavior and current POSIX
requirement differ, without language that tries to paint the POSIX
behavior as a HP-UX bug/quirk. Possibly citing real sources/history
of the issue (Austin Group tracker items 529, 614; maybe others).
- Consequence of just assuming the Linux behavior (fd leaks on
conforming systems).
- Consequences of assuming the POSIX behavior (double-close vulns on
GNU/Linux, maybe others).
- Survey of methods for avoiding the problem (ways to preclude EINTR,
possibly ways to infer behavior, etc).
Rich
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2025-05-23 18:10 ` Zack Weinberg
2025-05-24 2:24 ` Rich Felker
@ 2025-05-24 19:25 ` Florian Weimer
2026-01-18 22:23 ` Alejandro Colomar
2 siblings, 0 replies; 56+ messages in thread
From: Florian Weimer @ 2025-05-24 19:25 UTC (permalink / raw)
To: Zack Weinberg
Cc: Alejandro Colomar, Rich Felker, Vincent Lefevre, Jan Kara,
Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
GNU libc development
* Zack Weinberg:
> BUGS
> Prior to POSIX.1-2024, there was no official guarantee that
> close() would always close the file descriptor, even on error.
> Linux has always closed the file descriptor, even on error,
> but other implementations might not have.
>
> The only such implementation we have heard of is HP-UX; at least
> some versions of HP-UX’s man page for close() said it should be
> retried if it returned -1 with errno set to EINTR. (If you know
> exactly which versions of HP-UX are affected, or of any other
> Unix where close() doesn’t always close the file descriptor,
> please contact us about it.)
The AIX documentation also says this:
| The success of the close subroutine is undetermined if the following
| is true:
|
| EINTR The state of the FileDescriptor is undetermined. Retry the
| close routine to ensure that the FileDescriptor is closed.
<https://www.ibm.com/docs/en/aix/7.2.0?topic=c-close-subroutine>
So it's not just HP-UX.
For z/OS, it looks like some other errors leave the descriptor open:
| EAGAIN
|
| The call did not complete because the specified socket descriptor
| is currently being used by another thread in the same process.
|
| For example, in a multithreaded environment, close() fails and
| returns EAGAIN when the following sequence of events occurs (1)
| thread is blocked in a read() or select() call on a given file or
| socket descriptor and (2) another thread issues a simultaneous
| close() call for the same descriptor.
| […]
| EBADF
| fildes is not a valid open file descriptor, or the socket
| parameter is not a valid socket descriptor.
<https://www.ibm.com/docs/en/zos/2.1.0?topic=functions-close-close-file>
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2025-05-23 18:10 ` Zack Weinberg
2025-05-24 2:24 ` Rich Felker
2025-05-24 19:25 ` Florian Weimer
@ 2026-01-18 22:23 ` Alejandro Colomar
2026-01-20 16:15 ` Zack Weinberg
2 siblings, 1 reply; 56+ messages in thread
From: Alejandro Colomar @ 2026-01-18 22:23 UTC (permalink / raw)
To: Zack Weinberg
Cc: Rich Felker, Vincent Lefevre, Jan Kara, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development
Hi Zack and others,
Just a gentle ping. It would be nice to have an agreement for some
patch.
Have a lovely night!
Alex
On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote:
> Taking everything said in this thread into account, I have attempted to
> wordsmith new language for the close(2) manpage. Please let me know
> what you think, and please help me with the bits marked in square
> brackets. I can make this into a proper patch for the manpages
> when everyone is happy with it.
>
> zw
>
> ---
>
> DESCRIPTION
> ... existing text ...
>
> close() always succeeds. That is, after it returns, _fd_ has
> always been disconnected from the open file it formerly referred
> to, and its number can be recycled to refer to some other file.
> Furthermore, if _fd_ was the last reference to the underlying
> open file description, the resources associated with the open file
> description will always have been scheduled to be released.
>
> However, close may report _delayed errors_ from a previous I/O
> operation. Therefore, its return value should not be ignored.
>
> RETURN VALUE
> close() returns zero if there are no delayed errors to report,
> or -1 if there _might_ be delayed errors.
>
> When close() returns -1, check _errno_ to see what the situation
> actually is. Most, but not all, _errno_ codes indicate a delayed
> I/O error that should be reported to the user. See ERRORS and
> NOTES for more detail.
>
> [QUERY: Is it ever possible to get delayed errors on close() from
> a file that was opened with O_RDONLY? What about a file that was
> opened with O_RDWR but never actually written to? If people only
> have to worry about delayed errors if the file was actually
> written to, we should say so at this point.
>
> It would also be good to mention whether it is possible to get a
> delayed error on close() even if a previous call to fsync() or
> fdatasync() succeeded and there haven’t been any more writes to
> that file *description* (not necessarily via the fd being closed)
> since.]
>
> ERRORS
> EBADF _fd_ wasn’t open in the first place, or is outside the
> valid numeric range for file descriptors.
>
> EINPROGRESS
> EINTR
> There are no delayed errors to report, but the kernel is
> still doing some clean-up work in the background. This
> situation should be treated the same as if close() had
> returned zero. Do not retry the close(), and do not report
> an error to the user.
>
> EDQUOT
> EFBIG
> EIO
> ENOSPC
> These are the most common errno codes associated with
> delayed I/O errors. They should be treated as a hard
> failure to write to the file that was formerly associated
> with _fd_, the same as if an earlier write(2) had failed
> with one of these codes. The file has still been closed!
> Do not retry the close(). But do report an error to the user.
>
> Depending on the underlying file, close() may return other errno
> codes; these should generally also be treated as delayed I/O errors.
>
> NOTES
> Dealing with error returns from close()
>
> As discussed above, close() always closes the file. Except when
> errno is set to EBADF, EINPROGRESS, or EINTR, an error return from
> close() reports a _delayed I/O error_ from a previous write()
> operation.
>
> It is vital to report delayed I/O errors to the user; failing to
> check the return value of close() can cause _silent_ loss of data.
> The most common situations where this actually happens involve
> networked filesystems, where, in the name of throughput, write()
> often returns success before the server has actually confirmed a
> successful write.
>
> However, it is also vital to understand that _no matter what_
> close() returns, and _no matter what_ it sets errno to, when it
> returns, _the file descriptor passed to close() has been closed_,
> and its number is _immediately_ available for reuse by open(2),
> dup(2), etc. Therefore, one should never retry a close(), not
> even if it set errno to a value that normally indicates the
> operation needs to be retried (e.g. EINTR). Retrying a close()
> is a serious bug, particularly in a multithreaded program; if
> the file descriptor number has already been reused, _that file_
> will get closed out from under whatever other thread opened it.
>
> [Possibly something about fsync/fdatasync here?]
>
> BUGS
> Prior to POSIX.1-2024, there was no official guarantee that
> close() would always close the file descriptor, even on error.
> Linux has always closed the file descriptor, even on error,
> but other implementations might not have.
>
> The only such implementation we have heard of is HP-UX; at least
> some versions of HP-UX’s man page for close() said it should be
> retried if it returned -1 with errno set to EINTR. (If you know
> exactly which versions of HP-UX are affected, or of any other
> Unix where close() doesn’t always close the file descriptor,
> please contact us about it.)
>
> Portable code should nonetheless never retry a failed close(); the
> consequences of a file descriptor leak are far less dangerous than
> the consequences of closing a file out from under another thread.
--
<https://www.alejandro-colomar.es>
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-18 22:23 ` Alejandro Colomar
@ 2026-01-20 16:15 ` Zack Weinberg
2026-01-20 16:36 ` Rich Felker
0 siblings, 1 reply; 56+ messages in thread
From: Zack Weinberg @ 2026-01-20 16:15 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Rich Felker, Vincent Lefevre, Jan Kara, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development
Rich and I have an irreconcilable disagreement on what the semantics of close
_should_ be. I'm not going to do any more work on this until/unless he
changes his mind.
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-20 16:15 ` Zack Weinberg
@ 2026-01-20 16:36 ` Rich Felker
2026-01-20 19:17 ` Al Viro
0 siblings, 1 reply; 56+ messages in thread
From: Rich Felker @ 2026-01-20 16:36 UTC (permalink / raw)
To: Zack Weinberg
Cc: Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development
On Tue, Jan 20, 2026 at 11:15:15AM -0500, Zack Weinberg wrote:
> Rich and I have an irreconcilable disagreement on what the semantics of close
> _should_ be. I'm not going to do any more work on this until/unless he
> changes his mind.
It's been way too long since I read this thread to recall what our
point of disagreement is or what point glibc might be at in
reconciling the Linux kernel disagreement with POSIX.
I believe my position is basically this:
1. Documentation should reflect that the EINTR behavior on raw Linux
syscall and traditional glibc is non-conforming to POSIX, but make
applications aware of it and that it's unsafe to retry close on
these systems.
2. Documentation should be descriptive not polemic or proscriptive of
coding style or practices. When there is a disagreement like this
it should document that and faithfully represent the different
positions, not represent the author's views on which one is
correct.
3. It may be helpful to have further information on what types of
errors can actually be expected from close on Linux, and under what
conditions, but only if these behaviors can actually be guaranteed.
If it's just documenting what Linux currently happens to do, but
without any existing promise to preserve that for new file types
etc., then this is stepping out of line of the role of
documentation into defining the specification, and that requires
input from other folks.
4. If musl behavior is being documented, it should be noted that we do
not have the non-conforming EINTR issue. If the kernel produces
EINTR, we return 0. From 0.9.7 to 1.1.6 we produced EINPROGRESS,
but this was changed in 1.1.7 as it was found that applications
would treat EINPROGRESS as an error condition.
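
As an illustration of the "if the kernel produces EINTR, we return 0"
behavior, a wrapper along these lines would do it. This is a sketch
only, not musl's actual source; the name close_eintr_as_success is
hypothetical, and the code is correct only under the assumption that
the underlying close() has already released the descriptor when it
reports EINTR, as the raw Linux syscall does.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical wrapper (an assumption for illustration, not musl code).
 * ASSUMPTION: the underlying close() has Linux semantics, i.e. the
 * descriptor is already released when EINTR is reported. On a system
 * where EINTR leaves the descriptor open, this would leak it. */
static int close_eintr_as_success(int fd)
{
    int rc = close(fd);

    if (rc == -1 && errno == EINTR)
        return 0;             /* fd is gone; nothing left to retry */
    return rc;                /* 0, or -1 with errno set by close() */
}
```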
Rich
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2025-05-24 2:24 ` Rich Felker
@ 2026-01-20 17:05 ` Zack Weinberg
2026-01-20 17:46 ` Rich Felker
0 siblings, 1 reply; 56+ messages in thread
From: Zack Weinberg @ 2026-01-20 17:05 UTC (permalink / raw)
To: Rich Felker
Cc: Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development
> On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote:
>> close() always succeeds. That is, after it returns, _fd_ has
>> always been disconnected from the open file it formerly referred
>> to, and its number can be recycled to refer to some other file.
>> Furthermore, if _fd_ was the last reference to the underlying
>> open file description, the resources associated with the open file
>> description will always have been scheduled to be released.
...
>> EINPROGRESS
>> EINTR
>> There are no delayed errors to report, but the kernel is
>> still doing some clean-up work in the background. This
>> situation should be treated the same as if close() had
>> returned zero. Do not retry the close(), and do not report
>> an error to the user.
>
> Since this behavior for EINTR is non-conforming (and even prior to the
> POSIX 2024 update, it was contrary to the general semantics for EINTR,
> that no non-ignorable side-effects have taken place), it should be
> noted that it's Linux/glibc-specific.
I am prepared to take your word for it that POSIX says this is
non-conforming, but in that case, POSIX is wrong, and I will not be
convinced otherwise by any argument. Operations that release a
resource must always succeed.
Now, the abstract correct behavior is secondary to the fact that we
know there are both systems where close should not be retried after
EINTR (Linux) and systems where the fd is still open after EINTR
(HP-UX). But it is my position that *portable code* should assume the
Linux behavior, because that is the safest option. If you assume the
HP-UX behavior on a machine that implements the Linux behavior, you
might close some unrelated file out from under yourself (probably but
not necessarily a different thread). If you assume the Linux behavior
on a machine that implements the HP-UX behavior, you have leaked a
file descriptor; the worst things that can do are much less severe.
The only way to get it right all the time is to have a big long list
of #ifdefs for every Unix under the sun, and we don't even have the
data we would need to write that list.
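
A minimal sketch of the "assume the Linux behavior" policy described
above, for illustration only. The helper name close_once is
hypothetical. It never retries, treats EINTR and EINPROGRESS as
"nothing to report", and hands every other errno value back to the
caller as a possible delayed I/O error. As noted above, on a system
where EINTR leaves the descriptor open this policy leaks the
descriptor.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper (an assumption for illustration): call close()
 * exactly once and never retry.
 * Returns 0 if there is nothing to report, otherwise the errno value
 * from close(), which the caller should report (for example EIO or
 * ENOSPC as a delayed write error, or EBADF as a caller bug). */
static int close_once(int fd)
{
    if (close(fd) == 0)
        return 0;

    switch (errno) {
    case EINTR:
    case EINPROGRESS:
        /* Treated as success under the assumed Linux behavior. */
        return 0;
    default:
        return errno;
    }
}
```

A caller would typically do something like
`int e = close_once(fd); if (e) report(path, e);` where report is
whatever error-reporting routine the program already has.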
> While I agree with all of this, I think the tone is way too
> proscriptive. The man pages are to document the behaviors, not tell
> people how to program.
I could be persuaded to tone it down a little but in this case I think
the man page's job *is* to tell people how to program. We know lots of
existing code has gotten the fine details of close() wrong and we are
trying to document how to do it right.
> Aside: the reason EINTR *has to* be specified this way is that pthread
> cancellation is aligned with EINTR. If EINTR were defined to have
> closed the fd, then acting on cancellation during close would also
> have closed the fd, but the cancellation handler would have no way to
> distinguish this, leading to a situation where you're forced to either
> leak fds or introduce a double-close vuln.
The correct way to address this would be to make close() not be a
cancellation point.
> It sounds like you are intentionally omitting that POSIX says the
> opposite of what you want it to, and treating the standard behavior
> as a historical HP-UX quirk/bug. This is polemic, not the sort of
> documentation that belongs in a man page.
To be clear, when I wrote all this I thought the POSIX.1-2024 change
did in fact make the semantics be that close() closes the descriptor
no matter what it returns.
However, I insist that the correct behavior is in fact for close to
close the descriptor no matter what it returns, and to the extent
POSIX says anything else, POSIX is wrong. Again, you cannot change
my mind about this.
N.B. I have skimmed the current text of
https://pubs.opengroup.org/onlinepubs/9799919799/functions/close.html
and it appears to me that the committee more or less agrees with me,
but wishes to avoid declaring HP-UX (and any other systems with the
same behavior) nonconformant. So instead of just saying the fd is
closed no matter what, they've invented a new variant on close that
they have more scope to modify the behavior of, and they're nudging
implementations to not return EINTR from (posix_)close at all.
I don't think we (authors of this particular set of manpages) need to
care about the Austin Group's reluctance to declare existing legacy
systems nonconformant.
> An outline of what I'd like to see instead:
>
> - Clear explanation of why double-close is a serious bug that must
> always be avoided. (I think we all agree on this.)
>
> - Statement that the historical Linux/glibc behavior and current POSIX
> requirement differ, without language that tries to paint the POSIX
> behavior as a HP-UX bug/quirk. Possibly citing real sources/history
> of the issue (Austin Group tracker items 529, 614; maybe others).
>
> - Consequence of just assuming the Linux behavior (fd leaks on
> conforming systems).
>
> - Consequences of assuming the POSIX behavior (double-close vulns on
> GNU/Linux, maybe others).
>
> - Survey of methods for avoiding the problem (ways to preclude EINTR,
> possibly ways to infer behavior, etc).
This outline seems more or less reasonable to me but, if it's me
writing the text, I _will_ characterize what POSIX currently says
about EINTR returns from close() as a bug in POSIX. As far as I'm
concerned, that is a fact, not polemic.
I have found that arguing with you in particular, Rich, is generally
not worth the effort. Therefore, unless you reply and _accept_ that
the final version of the close manpage will say that POSIX is buggy,
I am not going to write another version of this text, nor will I be
drawn into further debate.
zw
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-20 17:05 ` Zack Weinberg
@ 2026-01-20 17:46 ` Rich Felker
2026-01-20 18:39 ` Florian Weimer
` (2 more replies)
0 siblings, 3 replies; 56+ messages in thread
From: Rich Felker @ 2026-01-20 17:46 UTC (permalink / raw)
To: Zack Weinberg
Cc: Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development
On Tue, Jan 20, 2026 at 12:05:52PM -0500, Zack Weinberg wrote:
> > On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote:
> >> close() always succeeds. That is, after it returns, _fd_ has
> >> always been disconnected from the open file it formerly referred
> >> to, and its number can be recycled to refer to some other file.
> >> Furthermore, if _fd_ was the last reference to the underlying
> >> open file description, the resources associated with the open file
> >> description will always have been scheduled to be released.
> ...
> >> EINPROGRESS
> >> EINTR
> >> There are no delayed errors to report, but the kernel is
> >> still doing some clean-up work in the background. This
> >> situation should be treated the same as if close() had
> >> returned zero. Do not retry the close(), and do not report
> >> an error to the user.
> >
> > Since this behavior for EINTR is non-conforming (and even prior to the
> > POSIX 2024 update, it was contrary to the general semantics for EINTR,
> > that no non-ignorable side-effects have taken place), it should be
> > noted that it's Linux/glibc-specific.
>
> I am prepared to take your word for it that POSIX says this is
> non-conforming, but in that case, POSIX is wrong, and I will not be
> convinced otherwise by any argument. Operations that release a
> resource must always succeed.
There are two conflicting requirements here:
1. Operations that release a resource must always succeed.
2. Failure with EINTR must not have side effects.
The right conclusion is that operations that release resources must
not be able to fail with EINTR. And that's how POSIX should have
resolved the situation -- by getting rid of support for the silly
legacy synchronous-tape-drive-rewinding behavior of close on some
systems, and requiring close to succeed immediately with no waiting
for anything. But abandoning requirement 2 is not an option,
especially in light of the relationship between EINTR and thread
cancellation with regard to the contract about side effects.
It's perfectly reasonable for implementations (as musl does, and as I
think glibc either does or intends to do) to just go all the way and
satisfy both 1 and 2 by having close translate the kernel EINTR into
0.
> Now, the abstract correct behavior is secondary to the fact that we
> know there are both systems where close should not be retried after
> EINTR (Linux) and systems where the fd is still open after EINTR
> (HP-UX). But it is my position that *portable code* should assume the
> Linux behavior, because that is the safest option. If you assume the
> HP-UX behavior on a machine that implements the Linux behavior, you
> might close some unrelated file out from under yourself (probably but
> not necessarily a different thread). If you assume the Linux behavior
> on a machine that implements the HP-UX behavior, you have leaked a
> file descriptor; the worst things that can do are much less severe.
Unfortunately, regardless of what happens, code portable to old
systems needs to avoid getting into the situation to begin with, by
either not installing interrupting signal handlers or blocking
signals around close.
> The only way to get it right all the time is to have a big long list
> of #ifdefs for every Unix under the sun, and we don't even have the
> data we would need to write that list.
>
> > While I agree with all of this, I think the tone is way too
> > proscriptive. The man pages are to document the behaviors, not tell
> > people how to program.
>
> I could be persuaded to tone it down a little but in this case I think
> the man page's job *is* to tell people how to program. We know lots of
> existing code has gotten the fine details of close() wrong and we are
> trying to document how to do it right.
No, the job of the man pages absolutely is not "to tell people how to
program". It's to document behaviors. They are not a programming
tutorial. They are not polemic diatribes. They are unbiased statements
of facts. Facts of what the standards say and what implementations do,
that equip programmers with the knowledge they need to make their own
informed decisions, rather than blindly following what someone who
thinks they know better told them to do.
> > Aside: the reason EINTR *has to* be specified this way is that pthread
> > cancellation is aligned with EINTR. If EINTR were defined to have
> > closed the fd, then acting on cancellation during close would also
> > have closed the fd, but the cancellation handler would have no way to
> > distinguish this, leading to a situation where you're forced to either
> > leak fds or introduce a double-close vuln.
>
> The correct way to address this would be to make close() not be a
> cancellation point.
This would also be a desirable change, one I would support if other
implementors are on-board with pushing for it.
> > An outline of what I'd like to see instead:
> >
> > - Clear explanation of why double-close is a serious bug that must
> > always be avoided. (I think we all agree on this.)
> >
> > - Statement that the historical Linux/glibc behavior and current POSIX
> > requirement differ, without language that tries to paint the POSIX
> > behavior as a HP-UX bug/quirk. Possibly citing real sources/history
> > of the issue (Austin Group tracker items 529, 614; maybe others).
> >
> > - Consequence of just assuming the Linux behavior (fd leaks on
> > conforming systems).
> >
> > - Consequences of assuming the POSIX behavior (double-close vulns on
> > GNU/Linux, maybe others).
> >
> > - Survey of methods for avoiding the problem (ways to preclude EINTR,
> > possibly ways to infer behavior, etc).
>
> This outline seems more or less reasonable to me but, if it's me
> writing the text, I _will_ characterize what POSIX currently says
> about EINTR returns from close() as a bug in POSIX. As far as I'm
> concerned, that is a fact, not polemic.
>
> I have found that arguing with you in particular, Rich, is generally
> not worth the effort. Therefore, unless you reply and _accept_ that
> the final version of the close manpage will say that POSIX is buggy,
> I am not going to write another version of this text, nor will I be
> drawn into further debate.
I will not accept that because it's a gross violation of the
responsibility of document writing.
Rich
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-20 17:46 ` Rich Felker
@ 2026-01-20 18:39 ` Florian Weimer
2026-01-20 19:00 ` Rich Felker
2026-01-20 20:11 ` Paul Eggert
2026-01-20 20:35 ` Alejandro Colomar
2 siblings, 1 reply; 56+ messages in thread
From: Florian Weimer @ 2026-01-20 18:39 UTC (permalink / raw)
To: Rich Felker
Cc: Zack Weinberg, Alejandro Colomar, Vincent Lefevre, Jan Kara,
Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
GNU libc development
* Rich Felker:
> On Tue, Jan 20, 2026 at 12:05:52PM -0500, Zack Weinberg wrote:
>> > On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote:
>> >> close() always succeeds. That is, after it returns, _fd_ has
>> >> always been disconnected from the open file it formerly referred
>> >> to, and its number can be recycled to refer to some other file.
>> >> Furthermore, if _fd_ was the last reference to the underlying
>> >> open file description, the resources associated with the open file
>> >> description will always have been scheduled to be released.
>> ...
>> >> EINPROGRESS
>> >> EINTR
>> >> There are no delayed errors to report, but the kernel is
>> >> still doing some clean-up work in the background. This
>> >> situation should be treated the same as if close() had
>> >> returned zero. Do not retry the close(), and do not report
>> >> an error to the user.
>> >
>> > Since this behavior for EINTR is non-conforming (and even prior to the
>> > POSIX 2024 update, it was contrary to the general semantics for EINTR,
>> > that no non-ignoreable side-effects have taken place), it should be
>> > noted that it's Linux/glibc-specific.
>>
>> I am prepared to take your word for it that POSIX says this is
>> non-conforming, but in that case, POSIX is wrong, and I will not be
>> convinced otherwise by any argument. Operations that release a
>> resource must always succeed.
>
> There are two conflicting requirements here:
>
> 1. Operations that release a resource must always succeed.
> 2. Failure with EINTR must not have side effects.
>
> The right conclusion is that operations that release resources must
> not be able to fail with EINTR. And that's how POSIX should have
> resolved the situation -- by getting rid of support for the silly
> legacy synchronous-tape-drive-rewinding behavior of close on some
> systems, and requiring close to succeed immediately with no waiting
> for anything.
What about SO_LINGER? Isn't this relevant in context?
As far as I know, there is no other way besides SO_LINGER to get
notification if the packet buffers are actually gone. If you don't use
it, memory can pile up in the kernel without the application's
knowledge.
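
For reference, a minimal sketch of the SO_LINGER usage I have in mind.
The helper name close_with_linger and its timeout parameter are made up
for illustration. With l_onoff set, close() on a blocking stream socket
may wait up to l_linger seconds for queued data to go out; the exact
semantics differ between systems.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: enable SO_LINGER, then close the socket.
 * Returns the result of close(), or -1 if setsockopt() fails
 * (in which case the socket is left open). */
int close_with_linger(int sockfd, int seconds)
{
    struct linger lg = { .l_onoff = 1, .l_linger = seconds };

    if (setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) == -1)
        return -1;

    /* Deliberately not retried on failure: on Linux the descriptor is
     * gone once close() returns, whatever the error. */
    return close(sockfd);
}
```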
Thanks,
Florian
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-20 18:39 ` Florian Weimer
@ 2026-01-20 19:00 ` Rich Felker
2026-01-20 20:05 ` Florian Weimer
0 siblings, 1 reply; 56+ messages in thread
From: Rich Felker @ 2026-01-20 19:00 UTC (permalink / raw)
To: Florian Weimer
Cc: Zack Weinberg, Alejandro Colomar, Vincent Lefevre, Jan Kara,
Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
GNU libc development
On Tue, Jan 20, 2026 at 07:39:48PM +0100, Florian Weimer wrote:
> * Rich Felker:
>
> > On Tue, Jan 20, 2026 at 12:05:52PM -0500, Zack Weinberg wrote:
> >> > On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote:
> >> >> close() always succeeds. That is, after it returns, _fd_ has
> >> >> always been disconnected from the open file it formerly referred
> >> >> to, and its number can be recycled to refer to some other file.
> >> >> Furthermore, if _fd_ was the last reference to the underlying
> >> >> open file description, the resources associated with the open file
> >> >> description will always have been scheduled to be released.
> >> ...
> >> >> EINPROGRESS
> >> >> EINTR
> >> >> There are no delayed errors to report, but the kernel is
> >> >> still doing some clean-up work in the background. This
> >> >> situation should be treated the same as if close() had
> >> >> returned zero. Do not retry the close(), and do not report
> >> >> an error to the user.
> >> >
> >> > Since this behavior for EINTR is non-conforming (and even prior to the
> >> > POSIX 2024 update, it was contrary to the general semantics for EINTR,
> >> > that no non-ignoreable side-effects have taken place), it should be
> >> > noted that it's Linux/glibc-specific.
> >>
> >> I am prepared to take your word for it that POSIX says this is
> >> non-conforming, but in that case, POSIX is wrong, and I will not be
> >> convinced otherwise by any argument. Operations that release a
> >> resource must always succeed.
> >
> > There are two conflicting requirements here:
> >
> > 1. Operations that release a resource must always succeed.
> > 2. Failure with EINTR must not have side effects.
> >
> > The right conclusion is that operations that release resources must
> > not be able to fail with EINTR. And that's how POSIX should have
> > resolved the situation -- by getting rid of support for the silly
> > legacy synchronous-tape-drive-rewinding behavior of close on some
> > systems, and requiring close to succeed immediately with no waiting
> > for anything.
>
> What about SO_LINGER? Isn't this relevant in context?
shutdown should be used for this, not close. So that the acts of
waiting for the operation to finish, and releasing the resource handle
needed to observe if it's finished, are separate.
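
A rough sketch of that separation, under some assumptions: the helper
name shutdown_then_close is hypothetical, and waiting for the peer's EOF
with read() is only one application-level way to observe completion.
The point is that the wait happens while the descriptor is still valid,
so EINTR there can be retried safely.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: half-close, wait for the peer's EOF, then close.
 * Returns 0 on success, -1 if any step failed.  close() is never
 * retried (this assumes the Linux behavior discussed in this thread). */
int shutdown_then_close(int sockfd)
{
    char buf[256];
    ssize_t n = 0;
    int ret = 0;

    if (shutdown(sockfd, SHUT_WR) == -1)
        ret = -1;

    if (ret == 0) {
        /* Discard incoming data until EOF; retry if interrupted. */
        do {
            n = read(sockfd, buf, sizeof buf);
        } while (n > 0 || (n == -1 && errno == EINTR));
        if (n == -1)
            ret = -1;
    }

    if (close(sockfd) == -1)
        ret = -1;
    return ret;
}
```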
> As far as I know, there is no other way besides SO_LINGER to get
> notification if the packet buffers are actually gone. If you don't use
> it, memory can pile up in the kernel without the application's
> knowledge.
The way Linux's EINTR behaves, using close can't ensure this memory
doesn't pile up, because on EINTR you lose the ability to wait for it.
Rich
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-20 16:36 ` Rich Felker
@ 2026-01-20 19:17 ` Al Viro
0 siblings, 0 replies; 56+ messages in thread
From: Al Viro @ 2026-01-20 19:17 UTC (permalink / raw)
To: Rich Felker
Cc: Zack Weinberg, Alejandro Colomar, Vincent Lefevre, Jan Kara,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development
On Tue, Jan 20, 2026 at 11:36:34AM -0500, Rich Felker wrote:
> On Tue, Jan 20, 2026 at 11:15:15AM -0500, Zack Weinberg wrote:
> > Rich and I have an irreconcilable disagreement on what the semantics of close
> > _should_ be. I'm not going to do any more work on this until/unless he
> > changes his mind.
>
> It's been way too long since I read this thread to recall what our
> point of disagreement is or what point glibc might be at in
> reconciling the Linux kernel disagreement with POSIX.
It's not so much a disagreement as a breakdown of the internal POSIX
decision process that has led to POSIX irrelevance in this particular area.
POSIX authority derives from agreement with the actual behaviour of
Unices. It always has; witness the number of underspecified
areas where various vendor implementations had different semantics,
for exactly that reason.
They (or anybody else, really) can argue that such-and-such behaviour
ought to change. In quite a few cases that has succeeded. What they
can't do is to force such change by fiat. Especially not when Linux
and *BSD happen to agree on behaviour that differs from what they
wish it to be.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-20 19:00 ` Rich Felker
@ 2026-01-20 20:05 ` Florian Weimer
0 siblings, 0 replies; 56+ messages in thread
From: Florian Weimer @ 2026-01-20 20:05 UTC (permalink / raw)
To: Rich Felker
Cc: Zack Weinberg, Alejandro Colomar, Vincent Lefevre, Jan Kara,
Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
GNU libc development
* Rich Felker:
> On Tue, Jan 20, 2026 at 07:39:48PM +0100, Florian Weimer wrote:
>> * Rich Felker:
>>
>> > On Tue, Jan 20, 2026 at 12:05:52PM -0500, Zack Weinberg wrote:
>> >> > On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote:
>> >> >> close() always succeeds. That is, after it returns, _fd_ has
>> >> >> always been disconnected from the open file it formerly referred
>> >> >> to, and its number can be recycled to refer to some other file.
>> >> >> Furthermore, if _fd_ was the last reference to the underlying
>> >> >> open file description, the resources associated with the open file
>> >> >> description will always have been scheduled to be released.
>> >> ...
>> >> >> EINPROGRESS
>> >> >> EINTR
>> >> >> There are no delayed errors to report, but the kernel is
>> >> >> still doing some clean-up work in the background. This
>> >> >> situation should be treated the same as if close() had
>> >> >> returned zero. Do not retry the close(), and do not report
>> >> >> an error to the user.
>> >> >
>> >> > Since this behavior for EINTR is non-conforming (and even prior to the
>> >> > POSIX 2024 update, it was contrary to the general semantics for EINTR,
>> >> > that no non-ignoreable side-effects have taken place), it should be
>> >> > noted that it's Linux/glibc-specific.
>> >>
>> >> I am prepared to take your word for it that POSIX says this is
>> >> non-conforming, but in that case, POSIX is wrong, and I will not be
>> >> convinced otherwise by any argument. Operations that release a
>> >> resource must always succeed.
>> >
>> > There are two conflicting requirements here:
>> >
>> > 1. Operations that release a resource must always succeed.
>> > 2. Failure with EINTR must not have side effects.
>> >
>> > The right conclusion is that operations that release resources must
>> > not be able to fail with EINTR. And that's how POSIX should have
>> > resolved the situation -- by getting rid of support for the silly
>> > legacy synchronous-tape-drive-rewinding behavior of close on some
>> > systems, and requiring close to succeed immediately with no waiting
>> > for anything.
>>
>> What about SO_LINGER? Isn't this relevant in context?
>
> shutdown should be used for this, not close. So that the acts of
> waiting for the operation to finish, and releasing the resource handle
> needed to observe if it's finished, are separate.
I think shutdown on TCP sockets is non-blocking under Linux. It doesn't
wait until the peer has acknowledged the FIN segment, as far as I
understand it. Other systems may behave differently.
>> As far as I know, there is no other way besides SO_LINGER to get
>> notification if the packet buffers are actually gone. If you don't use
>> it, memory can pile up in the kernel without the application's
>> knowledge.
>
> The way Linux's EINTR behaves, using close can't ensure this memory
> doesn't pile up, because on EINTR you lose the ability to wait for it.
Can't the application reliably avoid EINTR by blocking signals?
Thanks,
Florian
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-20 17:46 ` Rich Felker
2026-01-20 18:39 ` Florian Weimer
@ 2026-01-20 20:11 ` Paul Eggert
2026-01-20 20:35 ` Alejandro Colomar
2 siblings, 0 replies; 56+ messages in thread
From: Paul Eggert @ 2026-01-20 20:11 UTC (permalink / raw)
To: Rich Felker
Cc: Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development,
Zack Weinberg
On 2026-01-20 09:46, Rich Felker wrote:
> the job of the man pages absolutely is not "to tell people how to
> program". It's to document behaviors.
In practice man pages do both. When I type "man close" on GNU/Linux I
see text like the text quoted below, and as a C programmer I appreciate
getting advice like this when the situation is sufficiently tricky.
----
Any record locks (see fcntl(2)) held on the file it was associated with,
and owned by the process, are removed regardless of the file descriptor
that was used to obtain the lock. This has some unfortunate consequences
and one should be extra careful when using advisory record locking. See
fcntl(2) for discussion of the risks and consequences as well as for the
(probably preferred) open file description locks.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-20 17:46 ` Rich Felker
2026-01-20 18:39 ` Florian Weimer
2026-01-20 20:11 ` Paul Eggert
@ 2026-01-20 20:35 ` Alejandro Colomar
2026-01-20 20:42 ` Alejandro Colomar
2 siblings, 1 reply; 56+ messages in thread
From: Alejandro Colomar @ 2026-01-20 20:35 UTC (permalink / raw)
To: Rich Felker
Cc: Zack Weinberg, Vincent Lefevre, Jan Kara, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development
[-- Attachment #1: Type: text/plain, Size: 9114 bytes --]
Hi Rich, Zack,
On Tue, Jan 20, 2026 at 12:46:59PM -0500, Rich Felker wrote:
> On Tue, Jan 20, 2026 at 12:05:52PM -0500, Zack Weinberg wrote:
> > > On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote:
[...]
> > Now, the abstract correct behavior is secondary to the fact that we
> > know there are both systems where close should not be retried after
> > EINTR (Linux) and systems where the fd is still open after EINTR
> > (HP-UX). But it is my position that *portable code* should assume the
> > Linux behavior, because that is the safest option. If you assume the
> > HP-UX behavior on a machine that implements the Linux behavior, you
> > might close some unrelated file out from under yourself (probably but
> > not necessarily a different thread). If you assume the Linux behavior
> > on a machine that implements the HP-UX behavior, you have leaked a
> > file descriptor; the worst things that can do are much less severe.
>
> Unfortunately, regardless of what happens, code portable to old
> systems needs to avoid getting in the situation to begin with. By
> either not installing interrupting signal handlers or blocking signals
> around close.
[...]
> > > While I agree with all of this, I think the tone is way too
> > > proscriptive. The man pages are to document the behaviors, not tell
> > > people how to program.
> >
> > I could be persuaded to tone it down a little but in this case I think
> > the man page's job *is* to tell people how to program. We know lots of
> > existing code has gotten the fine details of close() wrong and we are
> > trying to document how to do it right.
>
> No, the job of the man pages absolutely is not "to tell people how to
> program". It's to document behaviors. They are not a programming
> tutorial. They are not polemic diatribes. They are unbiased statements
> of facts. Facts of what the standards say and what implementations do,
> that equip programmers with the knowledge they need to make their own
> informed decisions, rather than blindly following what someone who
> thinks they know better told them to do.
This reminds me a little bit of the realloc(p,0) fiasco of C89 and
glibc.
In most cases, I agree with you that manual pages are and should be
aseptic, but there are cases where I think the manual page needs to be
a tutorial. Especially when there's such a mess, we need to both explain
all the possible behaviors (or at least mention them to some degree).
But for example, there's the case of realloc(p,0), where we have
a fiasco that was caused by an accumulation of wrong decisions by the
C Committee, and prior to that by System V. We're a bit lucky that
C17 accidentally broke it so badly that we now have it as UB, and that
gives us the opportunity to fix it now (which BTW might also be the case
for close(2)).
In the case of realloc(3), I went and documented in the manual page that
glibc is broken, and that ISO C is also broken.
STANDARDS
malloc()
free()
calloc()
realloc()
C23, POSIX.1‐2024.
reallocarray()
POSIX.1‐2024.
realloc(p, 0)
The behavior of realloc(p, 0) in glibc doesn’t conform to
any of C99, C11, POSIX.1‐2001, POSIX.1‐2004, POSIX.1‐2008,
POSIX.1‐2013, POSIX.1‐2017, or POSIX.1‐2024. The C17
specification was changed to make it conforming, but that
specification made it impossible to write code that reli‐
ably determines if the input pointer is freed after real‐
loc(p, 0), and C23 changed it again to make this undefined
behavior, acknowledging that the C17 specification was
broad enough, so that undefined behavior wasn’t worse than
that.
reallocarray() suffers the same issues in glibc.
musl libc and the BSDs conform to all versions of ISO C
and POSIX.1.
gnulib provides the realloc‐posix module, which provides
wrappers realloc() and reallocarray() that conform to all
versions of ISO C and POSIX.1.
There’s a proposal to standardize the BSD behavior: https:
//www.open-std.org/jtc1/sc22/wg14/www/docs/n3621.txt.
HISTORY
malloc()
free()
calloc()
realloc()
POSIX.1‐2001, C89.
reallocarray()
glibc 2.26. OpenBSD 5.6, FreeBSD 11.0.
malloc() and related functions rejected sizes greater than
PTRDIFF_MAX starting in glibc 2.30.
free() preserved errno starting in glibc 2.33.
realloc(p, 0)
C89 was ambiguous in its specification of realloc(p, 0).
C99 partially fixed this.
The original implementation in glibc would have been con‐
forming to C99. However, and ironically, trying to comply
with C99 before the standard was released, glibc changed
its behavior in glibc 2.1.1 into something that ended up
not conforming to the final C99 specification (but this is
debated, as the wording of the standard seems self‐contra‐
dicting).
...
BUGS
Programmers would naturally expect by induction that
realloc(p, size) is consistent with free(p) and mal‐
loc(size), as that is the behavior in the general case.
This is not explicitly required by POSIX.1‐2024 or C11,
but all conforming implementations are consistent with
that.
The glibc implementation of realloc() is not consistent
with that, and as a consequence, it is dangerous to call
realloc(p, 0) in glibc.
A trivial workaround for glibc is calling it as
realloc(p, size?size:1).
The workaround for reallocarray() in glibc ——which shares
the same bug—— would be
reallocarray(p, n?n:1, size?size:1).
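
The realloc() workaround quoted above can be wrapped in a small helper.
This is only a sketch, and the name realloc_nonzero is hypothetical; it
is not part of the manual page.

```c
#include <stdlib.h>

/* Hypothetical wrapper: never passes a size of 0 to realloc(), so a
 * NULL return always means allocation failure and p is still valid. */
void *realloc_nonzero(void *p, size_t size)
{
    return realloc(p, size ? size : 1);
}
```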
Apart from documenting that glibc and ISO C are broken, we document how
to best deal with it (see the last paragraph in BUGS). This is
necessary because I fear that just by documenting the different
behaviors, programmers would still not know what to do with that.
Just take into account that even several members of the committee don't
know how to deal with it.
I'd be willing to have something similar for close(2).
Have a lovely night!
Alex
P.S.: I have great news about realloc(p,0)! Microsoft is on-board with
the change. They told me they like the proposal, and are willing to
fix their realloc(3) implementation. They'll now conduct tests to make
sure it doesn't break anything too badly, and will come back to me with
any feedback they have from those tests.
I'll put the standards proposal for realloc(3) on hold, waiting for
Microsoft's feedback.
> > > Aside: the reason EINTR *has to* be specified this way is that pthread
> > > cancellation is aligned with EINTR. If EINTR were defined to have
> > > closed the fd, then acting on cancellation during close would also
> > > have closed the fd, but the cancellation handler would have no way to
> > > distinguish this, leading to a situation where you're forced to either
> > > leak fds or introduce a double-close vuln.
> >
> > The correct way to address this would be to make close() not be a
> > cancellation point.
>
> This would also be a desirable change, one I would support if other
> implementors are on-board with pushing for it.
>
> > > An outline of what I'd like to see instead:
> > >
> > > - Clear explanation of why double-close is a serious bug that must
> > > always be avoided. (I think we all agree on this.)
> > >
> > > - Statement that the historical Linux/glibc behavior and current POSIX
> > > requirement differ, without language that tries to paint the POSIX
> > > behavior as a HP-UX bug/quirk. Possibly citing real sources/history
> > > of the issue (Austin Group tracker items 529, 614; maybe others).
> > >
> > > - Consequence of just assuming the Linux behavior (fd leaks on
> > > conforming systems).
> > >
> > > - Consequences of assuming the POSIX behavior (double-close vulns on
> > > GNU/Linux, maybe others).
> > >
> > > - Survey of methods for avoiding the problem (ways to preclude EINTR,
> > > possibly ways to infer behavior, etc).
> >
> > This outline seems more or less reasonable to me but, if it's me
> > writing the text, I _will_ characterize what POSIX currently says
> > about EINTR returns from close() as a bug in POSIX. As far as I'm
> > concerned, that is a fact, not polemic.
> >
> > I have found that arguing with you in particular, Rich, is generally
> > not worth the effort. Therefore, unless you reply and _accept_ that
> > the final version of the close manpage will say that POSIX is buggy,
> > I am not going to write another version of this text, nor will I be
> > drawn into further debate.
>
> I will not accept that because it's a gross violation of the
> responsibility of document writing.
>
> Rich
--
<https://www.alejandro-colomar.es>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-20 20:35 ` Alejandro Colomar
@ 2026-01-20 20:42 ` Alejandro Colomar
2026-01-23 0:33 ` Zack Weinberg
0 siblings, 1 reply; 56+ messages in thread
From: Alejandro Colomar @ 2026-01-20 20:42 UTC (permalink / raw)
To: Rich Felker
Cc: Zack Weinberg, Vincent Lefevre, Jan Kara, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development
[-- Attachment #1: Type: text/plain, Size: 9759 bytes --]
On Tue, Jan 20, 2026 at 09:35:43PM +0100, Alejandro Colomar wrote:
> Hi Rich, Zack,
>
> On Tue, Jan 20, 2026 at 12:46:59PM -0500, Rich Felker wrote:
> > On Tue, Jan 20, 2026 at 12:05:52PM -0500, Zack Weinberg wrote:
> > > > On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote:
>
> [...]
>
> > > Now, the abstract correct behavior is secondary to the fact that we
> > > know there are both systems where close should not be retried after
> > > EINTR (Linux) and systems where the fd is still open after EINTR
> > > (HP-UX). But it is my position that *portable code* should assume the
> > > Linux behavior, because that is the safest option. If you assume the
> > > HP-UX behavior on a machine that implements the Linux behavior, you
> > > might close some unrelated file out from under yourself (probably but
> > > not necessarily a different thread). If you assume the Linux behavior
> > > on a machine that implements the HP-UX behavior, you have leaked a
> > > file descriptor; the worst things that can do are much less severe.
> >
> > Unfortunately, regardless of what happens, code portable to old
> > systems needs to avoid getting in the situation to begin with. By
> > either not installing interrupting signal handlers or blocking signals
> > around close.
>
> [...]
>
> > > > While I agree with all of this, I think the tone is way too
> > > > proscriptive. The man pages are to document the behaviors, not tell
> > > > people how to program.
> > >
> > > I could be persuaded to tone it down a little but in this case I think
> > > the man page's job *is* to tell people how to program. We know lots of
> > > existing code has gotten the fine details of close() wrong and we are
> > > trying to document how to do it right.
> >
> > No, the job of the man pages absolutely is not "to tell people how to
> > program". It's to document behaviors. They are not a programming
> > tutorial. They are not polemic diatribes. They are unbiased statements
> > of facts. Facts of what the standards say and what implementations do,
> > that equip programmers with the knowledge they need to make their own
> > informed decisions, rather than blindly following what someone who
> > thinks they know better told them to do.
>
> This reminds me a little bit of the realloc(p,0) fiasco of C89 and
> glibc.
>
> In most cases, I agree with you that manual pages are and should be
> aseptic, there are cases where I think the manual page needs to be
> tutorial. Especially when there's such a mess, we need to both explain
> all the possible behaviors (or at least mention them to some degree).
... and guide programmers about how to best use the API.
I forgot to finish the sentence.
--
<https://www.alejandro-colomar.es>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-20 20:42 ` Alejandro Colomar
@ 2026-01-23 0:33 ` Zack Weinberg
2026-01-23 1:02 ` Alejandro Colomar
2026-01-24 19:34 ` The 8472
0 siblings, 2 replies; 56+ messages in thread
From: Zack Weinberg @ 2026-01-23 0:33 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner,
Rich Felker, linux-fsdevel, linux-api, GNU libc development
Alright, since it actually seems possible we might be having a
reasonable conversation about the close manpage now, I've done
another draft. I *think* this covers all the concerns expressed
so far. I am feeling somewhat more charitable toward the Austin
Group after close-reading the current POSIX spec for close,
so there is no BUGS section after all. In their shoes I would
still have disallowed EINTR returns from close altogether, but
I can see why they felt that was a step too far.
This is a full top-to-bottom rewrite of the manpage; please speak
up if you don't like any of my changes to any of it, not just the
new stuff about delayed errors. It's written in freeform text for
ease of reading; I'll do proper troff markup after the text is
finalized. (Alejandro, do you have a preference between -man
and -mdoc markup?)
Please note the [QUERY:] sections sprinkled throughout NOTES.
I would like to have answers to those questions for the final draft.
zw
NAME
close - close a file descriptor
LIBRARY
Standard C library (libc, -lc)
SYNOPSIS
#include <unistd.h>
int close(int fd);
DESCRIPTION
close() closes a file descriptor, so that it no longer refers
to any file and may be reused.
When the last file descriptor referring to an underlying open
file description (see open(2)) is closed, the resources
associated with the open file description are freed. If that
open file description is the last reference to a file which has
been removed using unlink(2), the file is deleted.
When *any* file descriptor is closed, all record locks held by
the *process*, on the file formerly referred to by that file
descriptor, are released. This happens even if the file is
still open in the process via a different file descriptor.
See fcntl(2) for discussion of the consequences, and for
alternatives with less surprising semantics.
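A minimal sketch of the record-lock behaviour described above, assuming Linux and a scratch regular file that no other process is locking. The name lock_survives_other_close() is made up for this illustration. F_GETLK does not report the caller's own locks, so the probe runs in a forked child.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo helper (not a library function).
 * Returns 1 if a write lock taken through one descriptor is still held
 * after a *different* descriptor for the same file has been closed,
 * 0 if the lock has been released, -1 on error. */
int lock_survives_other_close(const char *path)
{
    int fd1 = open(path, O_RDWR);
    int fd2 = open(path, O_RDONLY);
    if (fd1 == -1 || fd2 == -1) {
        if (fd1 != -1) close(fd1);
        if (fd2 != -1) close(fd2);
        return -1;
    }

    /* Traditional (process-associated) write lock on the whole file. */
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 0 };
    if (fcntl(fd1, F_SETLK, &fl) == -1) {
        close(fd1);
        close(fd2);
        return -1;
    }

    close(fd2);     /* unrelated descriptor, same underlying file */

    /* Ask a child process whether a conflicting lock still exists. */
    pid_t pid = fork();
    if (pid == -1) {
        close(fd1);
        return -1;
    }
    if (pid == 0) {
        struct flock probe = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                               .l_start = 0, .l_len = 0 };
        int fd = open(path, O_RDWR);
        if (fd == -1 || fcntl(fd, F_GETLK, &probe) == -1)
            _exit(2);
        _exit(probe.l_type == F_UNLCK ? 0 : 1);
    }

    int status;
    int result = -1;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status)
        && WEXITSTATUS(status) < 2)
        result = WEXITSTATUS(status);

    close(fd1);
    return result;
}
```

If the description above holds, the helper should report that the lock is gone even though fd1 was never closed or unlocked.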
close() may report a *delayed error* from previous I/O
operations on a file. When it does this, the file descriptor
has still been closed, but the error needs to be handled.
See RETURN VALUE, ERRORS, and NOTES for further discussion of
what the errors reported by close mean, and how to handle them.
Despite the possibility of delayed errors, a successful close()
does *not* guarantee that all data written to the file has been
successfully saved to persistent storage. If you need such a
guarantee, use fsync(2); see that page for details.
The close-on-exec file descriptor flag can be used to ensure
that a file descriptor is automatically closed upon a
successful execve(2); see fcntl(2) for details.
RETURN VALUE
close() returns zero if the descriptor has been closed and
there were no delayed errors to report.
It returns -1 if there was an error that prevented the
file descriptor from being closed, *or* if the descriptor
has successfully been closed but there was a delayed error
to report. The errno code can be used to distinguish them;
see ERRORS and NOTES.
ERRORS
EBADF The fd argument was not a valid, open file descriptor.
EINTR The close() call was interrupted by a signal.
The file descriptor *may or may not* have been closed,
depending on the operating system. See “Signals and
close(),” below.
EINPROGRESS
[POSIX.1-2024 only] The close() call was interrupted by
a signal, after the file descriptor number was released
for reuse, but before all clean-up work had been
completed. The file descriptor has been closed,
and a delayed error may have been lost. See “Signals
and close(),” below.
EIO
ESTALE
EDQUOT
EFBIG
ENOSPC These error codes indicate a delayed error from a
previous write(2) operation. The file descriptor has
been closed, but the error needs to be handled.
See “Delayed errors reported by close()”, below.
Depending on the underlying file and/or file system, close()
may return with other errno codes besides those listed.
All such codes also indicate delayed errors.
NOTES
Multithreaded processes and close()
In a multithreaded program, each thread must take care not to
accidentally close file descriptors that are in use by other
threads. Because system calls that *open* files, sockets,
etc. always allocate the lowest file descriptor number that’s
not in use, file descriptor numbers are rapidly reused.
Closing an fd that another thread is still using is therefore
likely to cause data to be read or written to the wrong place.
Sometimes programs *deliberately* close a file descriptor that
is in use by another thread, intending to cancel any blocking
I/O operation that the other thread is performing. Whether
this works depends on the operating system. On Linux, it
doesn’t work; a blocking I/O system call holds a direct
reference to the underlying open file description that is the
target of the I/O, and is unaffected by the program closing the
file descriptor that was used to initiate the I/O operation.
(See open(2) for a discussion of open file descriptions.)
Delayed errors reported by close()
In a variety of situations, most notably when writing to a file
that is hosted on a network file server, write(2) operations may
“optimistically” return successfully as soon as the write has
been queued for processing.
close(2) waits for confirmation that *most* of the processing
for previous writes to a file has been completed, and reports
any errors that the earlier write() calls *would have* reported,
if they hadn’t returned optimistically. Especially, close()
will report “disk full” (ENOSPC) and “disk quota exceeded”
(EDQUOT) errors that write() didn’t wait for.
(To wait for *all* processing to complete, it is necessary to
use fsync(2) as well.)
Because of these delayed errors, it’s important to check the
return value of close() and handle any errors it reports.
Ignoring delayed errors can cause silent loss of data.
However, when handling delayed errors, keep in mind that the
close() call should *not* be repeated. When close() has a
delayed error to report, it still closes the file before
returning. The file descriptor number might already have been
reused for some other file, especially in multithreaded
programs. To make another attempt at the failed writes, it’s
necessary to reopen the file and start all over again.
[QUERY: Do delayed errors ever happen in any of these situations?
- The fd is not the last reference to the open file description
- The OFD was opened with O_RDONLY
- The OFD was opened with O_RDWR but has never actually
been written to
- No data has been written to the OFD since the last call to
fsync() for that OFD
- No data has been written to the OFD since the last call to
fdatasync() for that OFD
If we can give some guidance about when people don’t need to
worry about delayed errors, it would be helpful.]
Signals and close()
close() waits for various I/O operations to complete; it is a
blocking system call, which can be interrupted by signals and
thread cancellation. As usual, when close() is interrupted
by a signal, it returns -1 and sets errno to EINTR.
Unlike most system calls that can be interrupted by signals,
it is not safe to repeat an interrupted call to close().
Prior to POSIX.1-2024, when a close() was interrupted by a
signal, it was *unspecified* whether the file descriptor was
still open afterward. The authors of this manpage are aware
of both systems where the file descriptor is guaranteed to
still be open after an interrupted close(), e.g. HP-UX, and
systems where it is guaranteed to be *closed* after an
interrupted close(), e.g. Linux and FreeBSD.
POSIX.1-2024 makes stricter requirements; operating systems
should now return EINPROGRESS, rather than EINTR, when close()
is interrupted before it’s completely done, but after the file
descriptor number is released for reuse. As usual, though, it
will be a long time before portable code can safely assume
all supported systems are compliant with this new requirement.
Regardless of the error code, on systems where an interrupted
close() cannot be retried, an interruption means that delayed
errors may be lost, and in turn *that* means data might silently
be lost. Therefore, we strongly recommend that programmers
avoid allowing close() to be interrupted by signals in the
first place. This can be done in all the usual ways—use only
signal handlers installed by sigaction(2) with the SA_RESTART
flag, keep signals blocked at all times except during calls
to ppoll(2), dedicate a thread to signal handling, etc.
[QUERY: Do we know if close() is allowed to block or report delayed
errors when no data has been written to the OFD since the last
completed fsync() or fdatasync() on that OFD? If it isn’t
allowed to block or report delayed errors in that case, another
good recommendation would be to always use at least fdatasync()
and let *that* be the thing that gets interrupted by signals.
The POSIX.1-2024 RATIONALE section makes a very similar
recommendation, but doesn’t appear to back that up with
normative requirements on close().]
STANDARDS
POSIX.1-2024.
HISTORY
The close() system call was present in Unix V7.
POSIX.1-2024 clarified the semantics of delayed errors; prior
to that revision, it was unspecified whether a close() call
that returned a delayed error would close the file descriptor.
However, we are not aware of any systems where it didn’t.
SEE ALSO
close_range(2), fcntl(2), fsync(2), fdatasync(2), shutdown(2),
unlink(2), open(2), read(2), write(2), fopen(3), fclose(3)
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-23 0:33 ` Zack Weinberg
@ 2026-01-23 1:02 ` Alejandro Colomar
2026-01-23 1:38 ` Al Viro
2026-01-23 14:05 ` Zack Weinberg
2026-01-24 19:34 ` The 8472
1 sibling, 2 replies; 56+ messages in thread
From: Alejandro Colomar @ 2026-01-23 1:02 UTC (permalink / raw)
To: Zack Weinberg
Cc: Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner,
Rich Felker, linux-fsdevel, linux-api, GNU libc development
Hi Zack,
On Thu, Jan 22, 2026 at 07:33:58PM -0500, Zack Weinberg wrote:
[...]
> This is a full top-to-bottom rewrite of the manpage; please speak
> up if you don't like any of my changes to any of it, not just the
> new stuff about delayed errors. It's written in freeform text for
> ease of reading; I'll do proper troff markup after the text is
> finalized. (Alejandro, do you have a preference between -man
> and -mdoc markup?)
Strong preference for man(7).
[...]
> ERRORS
> EBADF The fd argument was not a valid, open file descriptor.
>
> EINTR The close() call was interrupted by a signal.
> The file descriptor *may or may not* have been closed,
> depending on the operating system. See “Signals and
> close(),” below.
Punctuation like commas should go outside of the quotes (yes, I know
some styles do that, but we don't).
[...]
> STANDARDS
> POSIX.1-2024.
>
> HISTORY
> The close() system call was present in Unix V7.
That would be simply stated as:
V7.
We could also document the first POSIX standard, as not all Unix APIs
were standardized at the same time. Thus:
V7, POSIX.1-1988.
Thanks!
Have a lovely night!
Alex
>
> POSIX.1-2024 clarified the semantics of delayed errors; prior
> to that revision, it was unspecified whether a close() call
> that returned a delayed error would close the file descriptor.
> However, we are not aware of any systems where it didn’t.
>
> SEE ALSO
> close_range(2), fcntl(2), fsync(2), fdatasync(2), shutdown(2),
> unlink(2), open(2), read(2), write(2), fopen(3), fclose(3)
--
<https://www.alejandro-colomar.es>
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-23 1:02 ` Alejandro Colomar
@ 2026-01-23 1:38 ` Al Viro
2026-01-23 14:44 ` Alejandro Colomar
2026-01-23 14:05 ` Zack Weinberg
1 sibling, 1 reply; 56+ messages in thread
From: Al Viro @ 2026-01-23 1:38 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Zack Weinberg, Vincent Lefevre, Jan Kara, Christian Brauner,
Rich Felker, linux-fsdevel, linux-api, GNU libc development
On Fri, Jan 23, 2026 at 02:02:53AM +0100, Alejandro Colomar wrote:
> > HISTORY
> > The close() system call was present in Unix V7.
>
> That would be simply stated as:
>
> V7.
>
> We could also document the first POSIX standard, as not all Unix APIs
> were standardized at the same time. Thus:
>
> V7, POSIX.1-1988.
>
> Thanks!
11/3/71 SYS CLOSE (II)
NAME close -- close a file
SYNOPSIS (file descriptor in r0)
sys close / close = 6.
DESCRIPTION Given a file descriptor such as returned from an open or
creat call, close closes the associated file. A close of
all files is automatic on exit, but since processes are
limited to 10 simultaneously open files, close is
necessary to programs which deal with many files.
FILES
SEE ALSO creat, open
DIAGNOSTICS The error bit (c—bit) is set for an unknown file
descriptor.
BUGS
OWNER ken, dmr
That's V1 manual. In V3 we already get EBADF on unopened descriptor;
in _all_ cases there close(N) ends up with descriptor N not opened.
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-23 1:02 ` Alejandro Colomar
2026-01-23 1:38 ` Al Viro
@ 2026-01-23 14:05 ` Zack Weinberg
1 sibling, 0 replies; 56+ messages in thread
From: Zack Weinberg @ 2026-01-23 14:05 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner,
Rich Felker, linux-fsdevel, linux-api, GNU libc development
On Thu, Jan 22, 2026, at 8:02 PM, Alejandro Colomar wrote:
> On Thu, Jan 22, 2026 at 07:33:58PM -0500, Zack Weinberg wrote:
> [...]
>
>> (Alejandro, do you have a preference between -man
>> and -mdoc markup?)
>
> Strong preference for man(7).
OK.
>> close(),” below.
>
> Punctuation like commas should go outside of the quotes (yes, I know
> some styles do that, but we don't).
Will correct.
>> HISTORY
>> The close() system call was present in Unix V7.
>
> That would be simply stated as:
>
> V7.
Looking at other really old system calls (fork(), open(), read(), _exit(), link()),
they all say "SVr4, 4.3BSD, POSIX.1-2001" and that's what this one said too,
before I changed it. I think I'll put it back the way it was.
zw
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-23 1:38 ` Al Viro
@ 2026-01-23 14:44 ` Alejandro Colomar
0 siblings, 0 replies; 56+ messages in thread
From: Alejandro Colomar @ 2026-01-23 14:44 UTC (permalink / raw)
To: Al Viro
Cc: Zack Weinberg, Vincent Lefevre, Jan Kara, Christian Brauner,
Rich Felker, linux-fsdevel, linux-api, GNU libc development
Hi Al,
On Fri, Jan 23, 2026 at 01:38:59AM +0000, Al Viro wrote:
> On Fri, Jan 23, 2026 at 02:02:53AM +0100, Alejandro Colomar wrote:
> > > HISTORY
> > > The close() system call was present in Unix V7.
> >
> > That would be simply stated as:
> >
> > V7.
> >
> > We could also document the first POSIX standard, as not all Unix APIs
> > were standardized at the same time. Thus:
> >
> > V7, POSIX.1-1988.
> >
> > Thanks!
>
> 11/3/71 SYS CLOSE (II)
> NAME close -- close a file
> SYNOPSIS (file descriptor in r0)
> sys close / close = 6.
> DESCRIPTION Given a file descriptor such as returned from an open or
> creat call, close closes the associated file. A close of
> all files is automatic on exit, but since processes are
> limited to 10 simultaneously open files, close is
> necessary to programs which deal with many files.
> FILES
> SEE ALSO creat, open
> DIAGNOSTICS The error bit (c—bit) is set for an unknown file
> descriptor.
> BUGS
> OWNER ken, dmr
>
> That's V1 manual. In V3 we already get EBADF on unopened descriptor;
> in _all_ cases there close(N) ends up with descriptor N not opened.
Thanks! Then it should actually be
V1, POSIX.1-1988.
Let's not document the history change from V3, as those details are
better documented as part of the V3 manual and reading the sources.
Have a lovely day!
Alex
--
<https://www.alejandro-colomar.es>
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-23 0:33 ` Zack Weinberg
2026-01-23 1:02 ` Alejandro Colomar
@ 2026-01-24 19:34 ` The 8472
2026-01-24 21:39 ` Rich Felker
1 sibling, 1 reply; 56+ messages in thread
From: The 8472 @ 2026-01-24 19:34 UTC (permalink / raw)
To: Zack Weinberg, Alejandro Colomar
Cc: Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner,
Rich Felker, linux-fsdevel, linux-api, GNU libc development
On 23/01/2026 01:33, Zack Weinberg wrote:
[...]
> ERRORS
> EBADF The fd argument was not a valid, open file descriptor.
Unfortunately EBADF from FUSE is passed through unfiltered by the kernel
on close[0], which makes it more difficult to reliably detect bugs relating
to double-closes of file descriptors.
[...]
> Delayed errors reported by close()
>
> In a variety of situations, most notably when writing to a file
> that is hosted on a network file server, write(2) operations may
> “optimistically” return successfully as soon as the write has
> been queued for processing.
>
> close(2) waits for confirmation that *most* of the processing
> for previous writes to a file has been completed, and reports
> any errors that the earlier write() calls *would have* reported,
> if they hadn’t returned optimistically. Especially, close()
> will report “disk full” (ENOSPC) and “disk quota exceeded”
> (EDQUOT) errors that write() didn’t wait for.
>
> (To wait for *all* processing to complete, it is necessary to
> use fsync(2) as well.)
>
> Because of these delayed errors, it’s important to check the
> return value of close() and handle any errors it reports.
> Ignoring delayed errors can cause silent loss of data.
>
> However, when handling delayed errors, keep in mind that the
> close() call should *not* be repeated. When close() has a
> delayed error to report, it still closes the file before
> returning. The file descriptor number might already have been
> reused for some other file, especially in multithreaded
> programs. To make another attempt at the failed writes, it’s
> necessary to reopen the file and start all over again.
>
> [QUERY: Do delayed errors ever happen in any of these situations?
>
> - The fd is not the last reference to the open file description
>
> - The OFD was opened with O_RDONLY
>
> - The OFD was opened with O_RDWR but has never actually
> been written to
>
> - No data has been written to the OFD since the last call to
> fsync() for that OFD
>
> - No data has been written to the OFD since the last call to
> fdatasync() for that OFD
>
> If we can give some guidance about when people don’t need to
> worry about delayed errors, it would be helpful.]
>
The Rust standard library team is also interested in this topic; there
is lively discussion[1] about whether it makes sense to surface errors from
close at all. Our current default is to ignore them.
It is my understanding that errors may not have happened yet at
the time of close due to delayed writeback or additional descriptors
pointing to the description, e.g. in a forked child, and thus
close() is not a reliable mechanism for error detection and
fsync() is the only available option.
Some users do care specifically about the unusual behavior
on NFS, and don't want to use a heavy hammer like fsync. It's unfortunate
that there's no middle ground to get errors on an open file descriptor
or initiate the NFS flush behavior without a full fsync.
[0] https://lore.kernel.org/linux-fsdevel/1b946a20-5e8a-497e-96ef-f7b1e037edcb@infinite-source.de/
[1] https://github.com/rust-lang/libs-team/issues/705
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-24 19:34 ` The 8472
@ 2026-01-24 21:39 ` Rich Felker
2026-01-24 21:57 ` The 8472
0 siblings, 1 reply; 56+ messages in thread
From: Rich Felker @ 2026-01-24 21:39 UTC (permalink / raw)
To: The 8472
Cc: Zack Weinberg, Alejandro Colomar, Vincent Lefevre, Jan Kara,
Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
GNU libc development
On Sat, Jan 24, 2026 at 08:34:01PM +0100, The 8472 wrote:
> On 23/01/2026 01:33, Zack Weinberg wrote:
>
> [...]
>
> > ERRORS
> > EBADF The fd argument was not a valid, open file descriptor.
>
> Unfortunately EBADF from FUSE is passed through unfiltered by the kernel
> on close[0], that makes it more difficult to reliably detect bugs relating
> to double-closes of file descriptors.
Wow, that's a nasty bug. Are the kernel folks not amenable to fixing
it? I wonder if that could even have security implications. I think
you could detect these fraudulent EBADFs (albeit not under conditions
where there's a race bug) by performing fcntl/F_GETFD before close and
knowing the EBADF from close is fake if fcntl didn't EBADF, but that
seems like an unreasonable cost to work around FUSE behaving badly.
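A rough sketch of that fcntl/F_GETFD probe follows. The enum and the name close_probe() are invented for this illustration, and, as noted above, the check is not reliable if another thread can close or reuse the descriptor between the two calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum close_verdict { CLOSE_OK, CLOSE_BAD_FD, CLOSE_DELAYED_ERROR };

/* Hypothetical helper (not a library function).
 * Probes fd with fcntl(F_GETFD) before closing it, so that an EBADF
 * coming from close() itself can be told apart from an EBADF caused by
 * passing an invalid descriptor.  If errp is non-null, the errno value
 * from close() (0 on success) is stored there. */
enum close_verdict close_probe(int fd, int *errp)
{
    /* The descriptor is considered open unless fcntl() says EBADF. */
    int was_open = (fcntl(fd, F_GETFD) != -1 || errno != EBADF);

    int err = (close(fd) == 0) ? 0 : errno;    /* single attempt */
    if (errp)
        *errp = err;

    if (err == 0)
        return CLOSE_OK;
    if (err == EBADF && !was_open)
        return CLOSE_BAD_FD;        /* genuine program bug */
    return CLOSE_DELAYED_ERROR;     /* includes EBADF passed through on close */
}
```

The extra system call on every close is the cost mentioned above, which is why one might enable such a check only in debug builds.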
Rich
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-24 21:39 ` Rich Felker
@ 2026-01-24 21:57 ` The 8472
2026-01-25 15:37 ` Zack Weinberg
0 siblings, 1 reply; 56+ messages in thread
From: The 8472 @ 2026-01-24 21:57 UTC (permalink / raw)
To: Rich Felker
Cc: Zack Weinberg, Alejandro Colomar, Vincent Lefevre, Jan Kara,
Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
GNU libc development
On 24/01/2026 22:39, Rich Felker wrote:
> On Sat, Jan 24, 2026 at 08:34:01PM +0100, The 8472 wrote:
>> On 23/01/2026 01:33, Zack Weinberg wrote:
>>
>> [...]
>>
>>> ERRORS
>>> EBADF The fd argument was not a valid, open file descriptor.
>>
>> Unfortunately EBADF from FUSE is passed through unfiltered by the kernel
>> on close[0], that makes it more difficult to reliably detect bugs relating
>> to double-closes of file descriptors.
>
> Wow, that's a nasty bug. Are the kernel folks not amenable to fixing
> it?
Not when I brought it up last time, no[0]
> I wonder if that could even have security implications. I think
> you could detect these fraudulent EBADFs (albeit not under conditions
> where there's a race bug) by performing fcntl/F_GETFD before close and
> knowing the EBADF from close is fake if fcntl didn't EBADF, but that
> seems like an unreasonable cost to work around FUSE behaving badly.
>
> Rich
That's pretty much the workaround[1] we use, but due to the extra syscall it's
only done in debug builds.
[0] https://lore.kernel.org/linux-fsdevel/1b946a20-5e8a-497e-96ef-f7b1e037edcb@infinite-source.de/
[1] https://github.com/rust-lang/rust/blob/021fc25b7a48f6051bee1e1f06c7a277e4de1cc9/library/std/src/sys/fs/unix.rs#L981-L999
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-24 21:57 ` The 8472
@ 2026-01-25 15:37 ` Zack Weinberg
2026-01-26 8:51 ` Florian Weimer
2026-01-26 12:15 ` Jan Kara
0 siblings, 2 replies; 56+ messages in thread
From: Zack Weinberg @ 2026-01-25 15:37 UTC (permalink / raw)
To: The 8472, Rich Felker
Cc: Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development
On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote:
> On 24/01/2026 22:39, Rich Felker wrote:
>> On Sat, Jan 24, 2026 at 08:34:01PM +0100, The 8472 wrote:
>>> On 23/01/2026 01:33, Zack Weinberg wrote:
>>>
>>> [...]
>>>
>>>> ERRORS
>>>> EBADF The fd argument was not a valid, open file descriptor.
>>>
>>> Unfortunately EBADF from FUSE is passed through unfiltered by the kernel
>>> on close[0], that makes it more difficult to reliably detect bugs relating
>>> to double-closes of file descriptors.
>>
>> Wow, that's a nasty bug. Are the kernel folks not amenable to fixing
>> it?
>
> Not when I brought it up last time, no[0]
>
> [0] https://lore.kernel.org/linux-fsdevel/1b946a20-5e8a-497e-96ef-f7b1e037edcb@infinite-source.de/
It seems to me that Antonio Muscemi’s point is valid for *most* errno
codes. Like, a whole lot of them exist just to give more information
*to a human user* about the cause of an unrecoverable error. Take
the list of “error codes that indicate a delayed error from a previous
write(2) operation,” from a little later in the draft, for instance:
there’s no plausible way for a *program* to react differently to
EFBIG, EDQUOT, and ENOSPC, but we expect that the *user* will want
to react differently, so we want different error messages for each,
so they’re different error codes. It’s not a problem if the kernel
produces an error code of this type that wasn’t in the official
documented list, because the program doesn’t need to treat it specially.
But EBADF is different; it has the very specific meaning “user space
passed an invalid file descriptor to a system call,” which almost
always indicates a *bug in the program*, and allowing that meaning to
be diluted is not OK. It’s getting off topic for this conversation,
but there’s a short list of other errno codes that indicate a specific
situation that the *program* should respond to in a specific way
(EAGAIN, EINTR, EINPROGRESS, EFAULT, and EPIPE are the only ones
I can think of) and maybe it would spark a more constructive
conversation on the kernel side if we presented a *comprehensive*
list of errno codes that FUSE servers shouldn’t be allowed to produce,
with a specific rationale for each.
>> Delayed errors reported by close()
>>
>> In a variety of situations, most notably when writing to a file
>> that is hosted on a network file server, write(2) operations may
>> “optimistically” return successfully as soon as the write has
>> been queued for processing.
>>
>> close(2) waits for confirmation that *most* of the processing
>> for previous writes to a file has been completed, and reports
>> any errors that the earlier write() calls *would have* reported,
>> if they hadn’t returned optimistically. Especially, close()
>> will report “disk full” (ENOSPC) and “disk quota exceeded”
>> (EDQUOT) errors that write() didn’t wait for.
>
> The Rust standard library team is also interested in this topic, there
> is lively discussion[1] whether it makes sense to surface errors from
> close at all. Our current default is to ignore them.
> It is my understanding that errors may not have happened yet at
> the time of close due to delayed writeback or additional descriptors
> pointing to the description, e.g. in a forked child, and thus
> close() is not a reliable mechanism for error detection and
> fsync() is the only available option.
>
> [1] https://github.com/rust-lang/libs-team/issues/705
This is something I care about a lot as well, but I currently don’t
have an *opinion*. To form an informed opinion, I need the answers
to these questions:
>> [QUERY: Do delayed errors ever happen in any of these situations?
>>
>> - The fd is not the last reference to the open file description
>>
>> - The OFD was opened with O_RDONLY
>>
>> - The OFD was opened with O_RDWR but has never actually
>> been written to
>>
>> - No data has been written to the OFD since the last call to
>> fsync() for that OFD
>>
>> - No data has been written to the OFD since the last call to
>> fdatasync() for that OFD
>>
>> If we can give some guidance about when people don’t need to
>> worry about delayed errors, it would be helpful.]
In particular, I really hope delayed errors *aren’t* ever reported
when you close a file descriptor that *isn’t* the last reference
to its open file description, because the thread-safe way to close
stdout without losing write errors[2] depends on that not happening.
And whether the Rust stdlib can legitimately say “leaving aside the
additional cost of calling fsync(), you do not *need* the error return
from close() because you can call fsync() first,” depends on whether
it’s actually true that you *won’t* ever get a delayed error from
close() if you called fsync() first and didn’t do any more output in
between (assume the fd has no duplicates here). I would not be
surprised at all if those FUSE guys insisted on their right to make
char msg[] = "soon I will be invincible\n";
int fd = open("/test-fuse-fs/test.txt", O_WRONLY, 0666);
write(fd, msg, sizeof(msg) - 1);
fsync(fd);
close(fd);
return an error *only* from the close, not the write or the fsync.
And I also wouldn’t be surprised at all to find production NFS or
SMB servers that did that.
[2] https://stackoverflow.com/a/50865617 (third code block)
zw
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-25 15:37 ` Zack Weinberg
@ 2026-01-26 8:51 ` Florian Weimer
2026-01-26 12:15 ` Jan Kara
1 sibling, 0 replies; 56+ messages in thread
From: Florian Weimer @ 2026-01-26 8:51 UTC (permalink / raw)
To: Zack Weinberg
Cc: The 8472, Rich Felker, Alejandro Colomar, Vincent Lefevre,
Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel,
linux-api, GNU libc development
* Zack Weinberg:
> In particular, I really hope delayed errors *aren’t* ever reported
> when you close a file descriptor that *isn’t* the last reference
> to its open file description, because the thread-safe way to close
> stdout without losing write errors[2] depends on that not happening.
> [2] https://stackoverflow.com/a/50865617 (third code block)
Are you sure about that? It means that errors are never reported if a
shell script redirects standard output over multiple commands.
Thanks,
Florian
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-25 15:37 ` Zack Weinberg
2026-01-26 8:51 ` Florian Weimer
@ 2026-01-26 12:15 ` Jan Kara
2026-01-26 13:53 ` The 8472
1 sibling, 1 reply; 56+ messages in thread
From: Jan Kara @ 2026-01-26 12:15 UTC (permalink / raw)
To: Zack Weinberg
Cc: The 8472, Rich Felker, Alejandro Colomar, Vincent Lefevre,
Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel,
linux-api, GNU libc development
On Sun 25-01-26 10:37:01, Zack Weinberg wrote:
> On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote:
> >> Delayed errors reported by close()
> >>
> >> In a variety of situations, most notably when writing to a file
> >> that is hosted on a network file server, write(2) operations may
> >> “optimistically” return successfully as soon as the write has
> >> been queued for processing.
> >>
> >> close(2) waits for confirmation that *most* of the processing
> >> for previous writes to a file has been completed, and reports
> >> any errors that the earlier write() calls *would have* reported,
> >> if they hadn’t returned optimistically. Especially, close()
> >> will report “disk full” (ENOSPC) and “disk quota exceeded”
> >> (EDQUOT) errors that write() didn’t wait for.
> >
> > The Rust standard library team is also interested in this topic, there
> > is lively discussion[1] whether it makes sense to surface errors from
> > close at all. Our current default is to ignore them.
> > It is my understanding that errors may not have happened yet at
> > the time of close due to delayed writeback or additional descriptors
> > pointing to the description, e.g. in a forked child, and thus
> > close() is not a reliable mechanism for error detection and
> > fsync() is the only available option.
> >
> > [1] https://github.com/rust-lang/libs-team/issues/705
>
> This is something I care about a lot as well, but I currently don’t
> have an *opinion*. To form an informed opinion, I need the answers
> to these questions:
>
> >> [QUERY: Do delayed errors ever happen in any of these situations?
> >>
> >> - The fd is not the last reference to the open file description
> >>
> >> - The OFD was opened with O_RDONLY
> >>
> >> - The OFD was opened with O_RDWR but has never actually
> >> been written to
> >>
> >> - No data has been written to the OFD since the last call to
> >> fsync() for that OFD
> >>
> >> - No data has been written to the OFD since the last call to
> >> fdatasync() for that OFD
> >>
> >> If we can give some guidance about when people don’t need to
> >> worry about delayed errors, it would be helpful.]
>
> In particular, I really hope delayed errors *aren’t* ever reported
> when you close a file descriptor that *isn’t* the last reference
> to its open file description, because the thread-safe way to close
> stdout without losing write errors[2] depends on that not happening.
So I've checked, and in Linux the ->flush callback for the file is called
whenever you close a file descriptor (regardless of whether there are other
file descriptors pointing to the same file description), so it's up to the
filesystem implementation what it decides to do and which error it will
return... Checking the implementations, e.g. FUSE and NFS *will* return
delayed writeback errors on *first* descriptor close even if there are
other still open descriptors for the description AFAICS.
> And whether the Rust stdlib can legitimately say “leaving aside the
> additional cost of calling fsync(), you do not *need* the error return
> from close() because you can call fsync() first,” depends on whether
> it’s actually true that you *won’t* ever get a delayed error from
> close() if you called fsync() first and didn’t do any more output in
> between (assume the fd has no duplicates here). I would not be
> surprised at all if those FUSE guys insisted on their right to make
>
> char msg[] = "soon I will be invincible\n";
> int fd = open("/test-fuse-fs/test.txt", O_WRONLY, 0666);
> write(fd, msg, sizeof(msg) - 1);
> fsync(fd);
> close(fd);
>
> return an error *only* from the close, not the write or the fsync.
So fsync(2) must make sure data is persistently stored and return error if
it was not. Thus as a VFS person I'd consider it a filesystem bug if an
error preventing reading data later was not returned from fsync(2). OTOH
that doesn't necessarily mean that later close doesn't return an error -
e.g. FUSE does communicate with the server on close that can fail and
error can be returned.
With this in mind let me now try to answer your remaining questions:
> >> - The OFD was opened with O_RDONLY
If the filesystem supports atime, close can in principle report that atime
update failed.
> >> - The OFD was opened with O_RDWR but has never actually
> >> been written to
The same as above but with inode mtime updates.
> >> - No data has been written to the OFD since the last call to
> >> fsync() for that OFD
No writeback errors should happen in this case. As I wrote above I'd
consider this a filesystem bug.
> >>
> >> - No data has been written to the OFD since the last call to
> >> fdatasync() for that OFD
Errors can happen because some inode metadata (in practice probably only
inode time stamps) may still need to be written out.
So in the cases described above (except for fsync()) you may get delayed
errors on close. But since in all those cases no data is lost, I don't
think 99.9% of applications care at all...
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-26 12:15 ` Jan Kara
@ 2026-01-26 13:53 ` The 8472
2026-01-26 15:56 ` Jan Kara
0 siblings, 1 reply; 56+ messages in thread
From: The 8472 @ 2026-01-26 13:53 UTC (permalink / raw)
To: Jan Kara, Zack Weinberg
Cc: The 8472, Rich Felker, Alejandro Colomar, Vincent Lefevre,
Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
GNU libc development
On 26/01/2026 13:15, Jan Kara wrote:
> On Sun 25-01-26 10:37:01, Zack Weinberg wrote:
>> On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote:
>>
>>>> [QUERY: Do delayed errors ever happen in any of these situations?
>>>>
>>>> - The fd is not the last reference to the open file description
>>>>
>>>> - The OFD was opened with O_RDONLY
>>>>
>>>> - The OFD was opened with O_RDWR but has never actually
>>>> been written to
>>>>
>>>> - No data has been written to the OFD since the last call to
>>>> fsync() for that OFD
>>>>
>>>> - No data has been written to the OFD since the last call to
>>>> fdatasync() for that OFD
>>>>
>>>> If we can give some guidance about when people don’t need to
>>>> worry about delayed errors, it would be helpful.]
>>
>> In particular, I really hope delayed errors *aren’t* ever reported
>> when you close a file descriptor that *isn’t* the last reference
>> to its open file description, because the thread-safe way to close
>> stdout without losing write errors[2] depends on that not happening.
>
> So I've checked and in Linux ->flush callback for the file is called
> whenever you close a file descriptor (regardless whether there are other
> file descriptors pointing to the same file description) so it's up to
> filesystem implementation what it decides to do and which error it will
> return... Checking the implementations e.g. FUSE and NFS *will* return
> delayed writeback errors on *first* descriptor close even if there are
> other still open descriptors for the description AFAICS.
Regarding the "first", does that mean the errors only get delivered once?
I.e. if a concurrent fork/exec happens for process spawning and the fork-child
closes the file descriptors then this closing may basically receive the errors
and the parent will not see them (unless additional errors happen)?
Or if _any_ part of the program dups the descriptor and then closes it without
reporting errors then all uses of those descriptor must consider error delivery
on close to be unreliable?
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-26 13:53 ` The 8472
@ 2026-01-26 15:56 ` Jan Kara
2026-01-26 16:43 ` Jeff Layton
0 siblings, 1 reply; 56+ messages in thread
From: Jan Kara @ 2026-01-26 15:56 UTC (permalink / raw)
To: The 8472
Cc: Jan Kara, Zack Weinberg, Rich Felker, Alejandro Colomar,
Vincent Lefevre, Alexander Viro, Christian Brauner, linux-fsdevel,
linux-api, GNU libc development, Jeff Layton
On Mon 26-01-26 14:53:12, The 8472 wrote:
> On 26/01/2026 13:15, Jan Kara wrote:
> > On Sun 25-01-26 10:37:01, Zack Weinberg wrote:
> > > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote:
> > > > > [QUERY: Do delayed errors ever happen in any of these situations?
> > > > >
> > > > > - The fd is not the last reference to the open file description
> > > > >
> > > > > - The OFD was opened with O_RDONLY
> > > > >
> > > > > - The OFD was opened with O_RDWR but has never actually
> > > > > been written to
> > > > >
> > > > > - No data has been written to the OFD since the last call to
> > > > > fsync() for that OFD
> > > > >
> > > > > - No data has been written to the OFD since the last call to
> > > > > fdatasync() for that OFD
> > > > >
> > > > > If we can give some guidance about when people don’t need to
> > > > > worry about delayed errors, it would be helpful.]
> > >
> > > In particular, I really hope delayed errors *aren’t* ever reported
> > > when you close a file descriptor that *isn’t* the last reference
> > > to its open file description, because the thread-safe way to close
> > > stdout without losing write errors[2] depends on that not happening.
> >
> > So I've checked and in Linux ->flush callback for the file is called
> > whenever you close a file descriptor (regardless whether there are other
> > file descriptors pointing to the same file description) so it's up to
> > filesystem implementation what it decides to do and which error it will
> > return... Checking the implementations e.g. FUSE and NFS *will* return
> > delayed writeback errors on *first* descriptor close even if there are
> > other still open descriptors for the description AFAICS.
> Regarding the "first", does that mean the errors only get delivered once?
I've added Jeff to CC who should be able to provide you with a more
authoritative answer but AFAIK the answer is yes.
E.g. NFS does:
    static int
    nfs_file_flush(struct file *file, fl_owner_t id)
    {
        ...
        /* Flush writes to the server and return any errors */
        since = filemap_sample_wb_err(file->f_mapping);
        nfs_wb_all(inode);
        return filemap_check_wb_err(file->f_mapping, since);
    }
which will writeback all outstanding data on the first close and report
error if it happened. Following close has nothing to flush and thus no
error to report.
That being said if you call fsync(2) you'll still get the error back again
because fsync uses a separate writeback error counter in the file
description. But again only the first fsync(2) will return the error.
Following fsyncs will report no error.
> I.e. if a concurrent fork/exec happens for process spawning and the
> fork-child closes the file descriptors then this closing may basically
> receive the errors and the parent will not see them (unless additional
> errors happen)?
Correct AFAICT.
> Or if _any_ part of the program dups the descriptor and then closes it
> without reporting errors then all uses of those descriptor must consider
> error delivery on close to be unreliable?
Correct as well AFAICT.
I should probably also add that traditional filesystems (classical local
disk based filesystems) don't bother with reporting delayed errors on
close(2) *at all*. So unless you call fsync(2) you will never learn there
was any writeback error. After all for these filesystems there are good
chances writeback didn't even start by the time you are calling close(2).
So overall I'd say that error reporting from close(2) is so random and
filesystem dependent that the errors are not worth paying attention to. If
you really care about data integrity (and thus writeback errors) you must
call fsync(2) in which case the kernel provides at least somewhat
consistent error reporting story.
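
To make that advice concrete, here is a minimal userspace sketch of the
"fsync first, then close" pattern. The helper names write_all() and
save_and_close() are made up for illustration; the only assumption about
the kernel is the Linux behaviour that the descriptor is released even
when close() fails.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR.
 * Returns 0 on success, -1 with errno set on failure. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Hypothetical helper: returns 0 only if the data was written and
 * fsync() reported no error. The fd is always closed on return. */
static int save_and_close(int fd, const char *buf, size_t len)
{
	int rc = 0;
	int saved_errno = 0;

	if (write_all(fd, buf, len) < 0 || fsync(fd) < 0) {
		rc = -1;
		saved_errno = errno;
	}

	/* The result of close() is deliberately ignored here: the
	 * integrity decision was already made from fsync(). On Linux
	 * the fd is gone either way, so close() is not retried. */
	(void)close(fd);

	if (rc < 0)
		errno = saved_errno;
	return rc;
}
```

The caller decides from the return value alone whether the data reached
stable storage; whatever close() reports afterwards is treated as noise.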
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-26 15:56 ` Jan Kara
@ 2026-01-26 16:43 ` Jeff Layton
2026-01-26 23:01 ` Trevor Gross
0 siblings, 1 reply; 56+ messages in thread
From: Jeff Layton @ 2026-01-26 16:43 UTC (permalink / raw)
To: Jan Kara, The 8472
Cc: Zack Weinberg, Rich Felker, Alejandro Colomar, Vincent Lefevre,
Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
GNU libc development
On Mon, 2026-01-26 at 16:56 +0100, Jan Kara wrote:
> On Mon 26-01-26 14:53:12, The 8472 wrote:
> > On 26/01/2026 13:15, Jan Kara wrote:
> > > On Sun 25-01-26 10:37:01, Zack Weinberg wrote:
> > > > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote:
> > > > > > [QUERY: Do delayed errors ever happen in any of these situations?
> > > > > >
> > > > > > - The fd is not the last reference to the open file description
> > > > > >
> > > > > > - The OFD was opened with O_RDONLY
> > > > > >
> > > > > > - The OFD was opened with O_RDWR but has never actually
> > > > > > been written to
> > > > > >
> > > > > > - No data has been written to the OFD since the last call to
> > > > > > fsync() for that OFD
> > > > > >
> > > > > > - No data has been written to the OFD since the last call to
> > > > > > fdatasync() for that OFD
> > > > > >
> > > > > > If we can give some guidance about when people don’t need to
> > > > > > worry about delayed errors, it would be helpful.]
> > > >
> > > > In particular, I really hope delayed errors *aren’t* ever reported
> > > > when you close a file descriptor that *isn’t* the last reference
> > > > to its open file description, because the thread-safe way to close
> > > > stdout without losing write errors[2] depends on that not happening.
> > >
> > > So I've checked and in Linux ->flush callback for the file is called
> > > whenever you close a file descriptor (regardless whether there are other
> > > file descriptors pointing to the same file description) so it's up to
> > > filesystem implementation what it decides to do and which error it will
> > > return... Checking the implementations e.g. FUSE and NFS *will* return
> > > delayed writeback errors on *first* descriptor close even if there are
> > > other still open descriptors for the description AFAICS.
...and I really wish they _didn't_.
Reporting a writeback error on close is not particularly useful. Most
filesystems don't require you to write back all data on a close(). A
successful close() on those just means that no error has happened yet.
Any application that cares about writeback errors needs to fsync(),
full stop.
> > Regarding the "first", does that mean the errors only get delivered once?
>
> I've added Jeff to CC who should be able to provide you with a more
> authoritative answer but AFAIK the answer is yes.
>
> E.g. NFS does:
>
> static int
> nfs_file_flush(struct file *file, fl_owner_t id)
> {
> ...
> /* Flush writes to the server and return any errors */
> since = filemap_sample_wb_err(file->f_mapping);
> nfs_wb_all(inode);
> return filemap_check_wb_err(file->f_mapping, since);
> }
>
> which will writeback all outstanding data on the first close and report
> error if it happened. Following close has nothing to flush and thus no
> error to report.
>
> That being said if you call fsync(2) you'll still get the error back again
> because fsync uses a separate writeback error counter in the file
> description. But again only the first fsync(2) will return the error.
> Following fsyncs will report no error.
>
Note that NFS is "special" in that it will flush data on close() in
order to maintain close-to-open cache consistency.
Technically, what nfs is doing above is sampling the errseq_t in the
mapping, and then writing back any dirty data, and then checking for
errors that happened since the sample. close() will only report
writeback errors that happened within that window. If a preexisting
writeback error occurred before "since" was sampled, then it won't
report that here...which is weird, and another good argument for not
reporting or checking for writeback errors at close().
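
The window can be illustrated with a toy userspace model. This is NOT
the kernel's errseq_t: it is a plain sequence counter that ignores the
real "seen" flag handling, and every toy_* name below is made up for the
example. It only shows the shape of "sample, write back, check" versus
an fsync-style cursor that reports once and then advances.

```c
#include <errno.h>

/* Toy stand-in for the per-mapping error state (not errseq_t). */
struct toy_mapping {
	unsigned int seq;	/* bumped each time an error is recorded */
	int err;		/* latest error as a negative errno, 0 if none */
};

static void toy_record_err(struct toy_mapping *m, int err)
{
	m->err = err;
	m->seq++;
}

static unsigned int toy_sample(const struct toy_mapping *m)
{
	return m->seq;
}

/* Report the latest error only if one was recorded after 'since'. */
static int toy_check(const struct toy_mapping *m, unsigned int since)
{
	return m->seq != since ? m->err : 0;
}

/* Shape of the flush path: sample, write back, check the window.
 * 'writeback_result' simulates the outcome of the writeback step. */
static int toy_flush(struct toy_mapping *m, int writeback_result)
{
	unsigned int since = toy_sample(m);

	if (writeback_result != 0)
		toy_record_err(m, writeback_result);
	return toy_check(m, since);
}

/* Shape of fsync-style reporting: a per-open-file cursor that
 * reports an error once and then advances past it. */
static int toy_check_and_advance(const struct toy_mapping *m,
				 unsigned int *cursor)
{
	int err = toy_check(m, *cursor);

	*cursor = m->seq;
	return err;
}
```

In this model an error recorded before toy_flush() samples is invisible
to that flush, while a cursor that predates the error still reports it
exactly once.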
> > I.e. if a concurrent fork/exec happens for process spawning and the
> > fork-child closes the file descriptors then this closing may basically
> > receive the errors and the parent will not see them (unless additional
> > errors happen)?
>
> Correct AFAICT.
>
It will see them if it calls fsync(). Reporting on close() is iffy.
> > Or if _any_ part of the program dups the descriptor and then closes it
> > without reporting errors then all uses of those descriptor must consider
> > error delivery on close to be unreliable?
>
> Correct as well AFAICT.
>
> I should probably also add that traditional filesystems (classical local
> disk based filesystems) don't bother with reporting delayed errors on
> close(2) *at all*. So unless you call fsync(2) you will never learn there
> was any writeback error. After all for these filesystems there are good
> chances writeback didn't even start by the time you are calling close(2).
> So overall I'd say that error reporting from close(2) is so random and
> filesystem dependent that the errors are not worth paying attention to. If
> you really care about data integrity (and thus writeback errors) you must
> call fsync(2) in which case the kernel provides at least somewhat
> consistent error reporting story.
>
+1.
tl;dr: the only useful error from close() is EBADF.
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-26 16:43 ` Jeff Layton
@ 2026-01-26 23:01 ` Trevor Gross
2026-01-27 0:49 ` Jeff Layton
0 siblings, 1 reply; 56+ messages in thread
From: Trevor Gross @ 2026-01-26 23:01 UTC (permalink / raw)
To: Jeff Layton, Jan Kara, The 8472
Cc: Zack Weinberg, Rich Felker, Alejandro Colomar, Vincent Lefevre,
Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
GNU libc development
On Mon Jan 26, 2026 at 10:43 AM CST, Jeff Layton wrote:
> On Mon, 2026-01-26 at 16:56 +0100, Jan Kara wrote:
>> On Mon 26-01-26 14:53:12, The 8472 wrote:
>> > On 26/01/2026 13:15, Jan Kara wrote:
>> > > On Sun 25-01-26 10:37:01, Zack Weinberg wrote:
>> > > > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote:
>> > > > > > [QUERY: Do delayed errors ever happen in any of these situations?
>> > > > > >
>> > > > > > - The fd is not the last reference to the open file description
>> > > > > >
>> > > > > > - The OFD was opened with O_RDONLY
>> > > > > >
>> > > > > > - The OFD was opened with O_RDWR but has never actually
>> > > > > > been written to
>> > > > > >
>> > > > > > - No data has been written to the OFD since the last call to
>> > > > > > fsync() for that OFD
>> > > > > >
>> > > > > > - No data has been written to the OFD since the last call to
>> > > > > > fdatasync() for that OFD
>> > > > > >
>> > > > > > If we can give some guidance about when people don’t need to
>> > > > > > worry about delayed errors, it would be helpful.]
>> > > >
>> > > > In particular, I really hope delayed errors *aren’t* ever reported
>> > > > when you close a file descriptor that *isn’t* the last reference
>> > > > to its open file description, because the thread-safe way to close
>> > > > stdout without losing write errors[2] depends on that not happening.
>> > >
>> > > So I've checked and in Linux ->flush callback for the file is called
>> > > whenever you close a file descriptor (regardless whether there are other
>> > > file descriptors pointing to the same file description) so it's up to
>> > > filesystem implementation what it decides to do and which error it will
>> > > return... Checking the implementations e.g. FUSE and NFS *will* return
>> > > delayed writeback errors on *first* descriptor close even if there are
>> > > other still open descriptors for the description AFAICS.
>
> ...and I really wish they _didn't_.
>
> Reporting a writeback error on close is not particularly useful. Most
> filesystems don't require you to write back all data on a close(). A
> successful close() on those just means that no error has happened yet.
>
> Any application that cares about writeback errors needs to fsync(),
> full stop.
Is there a good middle ground solution here?
It seems reasonable that an application may want to have different
handling for errors expected during normal operation, such as temporary
network failure with NFS, compared to more catastrophic things like
failure to write to disk. The reason cited around [1] for avoiding fsync
is that it comes with a cost that, for many applications, may not be
worth it unless you are dealing with NFS.
I was wondering if it could be worth a new fcntl that provides this kind
of "best effort" error checking behavior without having the strict
requirements of fsync. In effect, to report the errors that you might
currently get at close() before actually calling close() and losing the
fd.
Alternatively, it would be interesting to have a deferred fsync() that
schedules a nonblocking sync event that can be polled for completion/
errors, with flags to indicate immediate sync or allow automatic syncing
as needed. But there is probably a better alternative to this
complexity.
- Trevor
[1]: https://github.com/rust-lang/libs-team/issues/705
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-26 23:01 ` Trevor Gross
@ 2026-01-27 0:49 ` Jeff Layton
2026-01-28 16:58 ` Zack Weinberg
0 siblings, 1 reply; 56+ messages in thread
From: Jeff Layton @ 2026-01-27 0:49 UTC (permalink / raw)
To: Trevor Gross, Jan Kara, The 8472
Cc: Zack Weinberg, Rich Felker, Alejandro Colomar, Vincent Lefevre,
Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
GNU libc development
On Mon, 2026-01-26 at 17:01 -0600, Trevor Gross wrote:
> On Mon Jan 26, 2026 at 10:43 AM CST, Jeff Layton wrote:
> > On Mon, 2026-01-26 at 16:56 +0100, Jan Kara wrote:
> > > On Mon 26-01-26 14:53:12, The 8472 wrote:
> > > > On 26/01/2026 13:15, Jan Kara wrote:
> > > > > On Sun 25-01-26 10:37:01, Zack Weinberg wrote:
> > > > > > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote:
> > > > > > > > [QUERY: Do delayed errors ever happen in any of these situations?
> > > > > > > >
> > > > > > > > - The fd is not the last reference to the open file description
> > > > > > > >
> > > > > > > > - The OFD was opened with O_RDONLY
> > > > > > > >
> > > > > > > > - The OFD was opened with O_RDWR but has never actually
> > > > > > > > been written to
> > > > > > > >
> > > > > > > > - No data has been written to the OFD since the last call to
> > > > > > > > fsync() for that OFD
> > > > > > > >
> > > > > > > > - No data has been written to the OFD since the last call to
> > > > > > > > fdatasync() for that OFD
> > > > > > > >
> > > > > > > > If we can give some guidance about when people don’t need to
> > > > > > > > worry about delayed errors, it would be helpful.]
> > > > > >
> > > > > > In particular, I really hope delayed errors *aren’t* ever reported
> > > > > > when you close a file descriptor that *isn’t* the last reference
> > > > > > to its open file description, because the thread-safe way to close
> > > > > > stdout without losing write errors[2] depends on that not happening.
> > > > >
> > > > > So I've checked and in Linux ->flush callback for the file is called
> > > > > whenever you close a file descriptor (regardless whether there are other
> > > > > file descriptors pointing to the same file description) so it's up to
> > > > > filesystem implementation what it decides to do and which error it will
> > > > > return... Checking the implementations e.g. FUSE and NFS *will* return
> > > > > delayed writeback errors on *first* descriptor close even if there are
> > > > > other still open descriptors for the description AFAICS.
> >
> > ...and I really wish they _didn't_.
> >
> > Reporting a writeback error on close is not particularly useful. Most
> > filesystems don't require you to write back all data on a close(). A
> > successful close() on those just means that no error has happened yet.
> >
> > Any application that cares about writeback errors needs to fsync(),
> > full stop.
>
> Is there a good middle ground solution here?
>
> It seems reasonable that an application may want to have different
> handling for errors expected during normal operation, such as temporary
> network failure with NFS, compared to more catastrophic things like
> failure to write to disk. The reason cited around [1] for avoiding fsync
> is that it comes with a cost that, for many applications, may not be
> worth it unless you are dealing with NFS.
>
> I was wondering if it could be worth a new fcntl that provides this kind
> of "best effort" error checking behavior without having the strict
> requirements of fsync. In effect, to report the errors that you might
> currently get at close() before actually calling close() and losing the
> fd.
>
For a long-held fd, I can see the appeal: spray writes at it and just
check occasionally (without blocking) that nothing has gone wrong.
Maybe when things are idle, you fsync().
A new fcntl(..., F_CHECKERR, ...) command that does a
file_check_and_advance_wb_err() on the fd and reports the result would
be pretty straightforward.
Would that be helpful for your use-case? This would be like a non-
blocking fsync that just reports whether an error has occurred since
the last F_CHECKERR or fsync().
> Alternatively, it would be interesting to have a deferred fsync() that
> schedules a nonblocking sync event that can be polled for completion/
> errors, with flags to indicate immediate sync or allow automatic syncing
> as needed. But there is probably a better alternative to this
> complexity.
>
> [1]: https://github.com/rust-lang/libs-team/issues/705
Aside from the polling, I suppose you could effectively do this with
io_uring. I'm pretty sure you can issue an fsync() or sync_file_range()
that way, but I think it just ends up blocking a kernel thread until
writeback is done.
We've had people ask for a non-blocking fsync before. Maybe it's time
to get serious about adding one. What would such a thing look like?
It would be pretty simple to add a new fcntl(..., F_DATAWRITE) command
that kicks off writeback à la filemap_fdatawrite().
Then add fcntl(..., F_WB_CHECK):
That could do a non-blocking version of filemap_fdatawait(), and return
whether any folios are still under writeback. If there is a writeback
error, it can return that instead.
The catch of course is that a polling mechanism like this could easily
livelock. If there is a lot of memory pressure, it might always return
that something is still under writeback, no matter how often you hammer
F_WB_CHECK.
Maybe that's ok? You can always issue a blocking fsync() if you really
need to draw a line in the sand.
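
For comparison, the closest thing available from userspace today is
probably something like the sketch below: sync_file_range() with
SYNC_FILE_RANGE_WRITE to start writeback, followed later by a blocking
fsync() as the checkpoint. The helper names are made up, this is
Linux-specific, and the kickoff is only best-effort (the call may still
block under load and gives no integrity guarantee by itself).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Ask the kernel to start writing back dirty pages of fd without
 * waiting for completion. offset 0 with nbytes 0 means "through to
 * the end of the file". Returns 0, or -1 with errno set. */
static int start_writeback(int fd)
{
	return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
}

/* Blocking checkpoint: the line in the sand. Any writeback error
 * not yet reported on this open file should surface here. */
static int checkpoint(int fd)
{
	return fsync(fd);
}
```

A caller would invoke start_writeback() opportunistically after a burst
of writes, ignore its result, and rely on checkpoint() alone for error
reporting.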
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-27 0:49 ` Jeff Layton
@ 2026-01-28 16:58 ` Zack Weinberg
2026-02-05 9:34 ` Jan Kara
0 siblings, 1 reply; 56+ messages in thread
From: Zack Weinberg @ 2026-01-28 16:58 UTC (permalink / raw)
To: Jeff Layton, Trevor Gross, Jan Kara, The 8472
Cc: Rich Felker, Alejandro Colomar, Vincent Lefevre, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development
On Mon, Jan 26, 2026, at 7:49 PM, Jeff Layton wrote:
> On Mon, 2026-01-26 at 17:01 -0600, Trevor Gross wrote:
>> On Mon Jan 26, 2026 at 10:43 AM CST, Jeff Layton wrote:
>> > On Mon, 2026-01-26 at 16:56 +0100, Jan Kara wrote:
>> > > On Mon 26-01-26 14:53:12, The 8472 wrote:
>> > > > On 26/01/2026 13:15, Jan Kara wrote:
>> > > > > On Sun 25-01-26 10:37:01, Zack Weinberg wrote:
>> > > > > > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote:
...
>> > > > > > In particular, I really hope delayed errors *aren’t* ever reported
>> > > > > > when you close a file descriptor that *isn’t* the last reference
>> > > > > > to its open file description, because the thread-safe way to close
>> > > > > > stdout without losing write errors[2] depends on that not happening.
>> > > > >
>> > > > > So I've checked and in Linux ->flush callback for the file is called
>> > > > > whenever you close a file descriptor (regardless whether there are other
>> > > > > file descriptors pointing to the same file description) so it's up to
>> > > > > filesystem implementation what it decides to do and which error it will
>> > > > > return... Checking the implementations e.g. FUSE and NFS *will* return
>> > > > > delayed writeback errors on *first* descriptor close even if there are
>> > > > > other still open descriptors for the description AFAICS.
>> >
>> > ...and I really wish they _didn't_.
>> >
>> > Reporting a writeback error on close is not particularly useful. Most
>> > filesystems don't require you to write back all data on a close(). A
>> > successful close() on those just means that no error has happened yet.
>> >
>> > Any application that cares about writeback errors needs to fsync(),
>> > full stop.
>>
>> Is there a good middle ground solution here?
...
>> I was wondering if it could be worth a new fcntl that provides this kind
>> of "best effort" error checking behavior without having the strict
>> requirements of fsync. In effect, to report the errors that you might
>> currently get at close() before actually calling close() and losing the
>> fd.
...
> A new fcntl(..., F_CHECKERR, ...) command that does a
> file_check_and_advance_wb_err() on the fd and reports the result would
> be pretty straightforward.
>
> Would that be helpful for your use-case? This would be like a non-
> blocking fsync that just reports whether an error has occurred since
> the last F_CHECKERR or fsync().
I feel I need to point out that “should the kernel report errors on
close()” and “should the kernel add a new API to make life better for
programs that currently expect close() to report [some] errors” and
“should the Rust standard library propagate errors produced by close()
back up to the application” and “what should the close(2) manpage say
about errors” are four different conversation topics.
I am all in favor of moving toward a world where close() never fails
and there’s _something_ that reports write errors like fsync() without
also kicking your application off a performance cliff. But that’s not
the world we live in today, and this thread started as a conversation
about revising the close(2) manpage, and I’d kinda like to *finish*
revising the manpage in, like, the next couple weeks, not several
years from now :-) So I’d like to refocus on that topic.
Given what Jan Kara said earlier...
> Checking the implementations e.g. FUSE and NFS *will* return delayed
> writeback errors on *first* descriptor close even if there are other
> still open descriptors for the description AFAICS.
...
> fsync(2) must make sure data is persistently stored and return error if
> it was not. Thus as a VFS person I'd consider it a filesystem bug if an
> error preventing reading data later was not returned from fsync(2). OTOH
> that doesn't necessarily mean that later close doesn't return an error -
> e.g. FUSE does communicate with the server on close that can fail and
> error can be returned.
>
> With this in mind let me now try to answer your remaining questions:
>
>> >> - The OFD was opened with O_RDONLY
>
> If the filesystem supports atime, close can in principle report that atime
> update failed.
>
>> >> - The OFD was opened with O_RDWR but has never actually
>> >> been written to
>
> The same as above but with inode mtime updates.
>
>> >> - No data has been written to the OFD since the last call to
>> >> fsync() for that OFD
>
> No writeback errors should happen in this case. As I wrote above I'd
> consider this a filesystem bug.
>
>> >>
>> >> - No data has been written to the OFD since the last call to
>> >> fdatasync() for that OFD
>
> Errors can happen because some inode metadata (in practice probably only
> inode time stamps) may still need to be written out.
>
> So in the cases described above (except for fsync()) you may get delayed
> errors on close. But since in all those cases no data is lost, I don't
> think 99.9% of applications care at all...
... regrettably I think this does mean the close(2) manpage still needs
to tell people to watch out for errors, and should probably say that
errors _can_ happen even if the file wasn’t written to, but are much
less likely to be important in that case.
And my “how to close stdout in a thread-safe manner” sample code is
wrong, because I was wrong to think that the error reporting only
happened on the _final_ close, when the OFD is destroyed.
... What happens if the close is implicit in a dup2() operation? Here’s
that erroneous “how to close stdout” fragment, with comments
indicating what I thought could and could not fail at the time I wrote
it:
    // These allocate new fds, which can always fail, e.g. because
    // the program already has too many files open.
    int new_stdout = open("/dev/null", O_WRONLY);
    if (new_stdout == -1) perror_exit("/dev/null");
    int old_stdout = dup(1);
    if (old_stdout == -1) perror_exit("dup(1)");

    flockfile(stdout);
    if (fflush(stdout)) perror_exit("stdout: write error");
    dup2(new_stdout, 1); // cannot fail, atomically replaces fd 1
    funlockfile(stdout);

    // this close may receive delayed write errors from previous writes
    // to stdout
    if (close(old_stdout)) perror_exit("stdout: write error");

    // this close cannot fail, because it only drops an alternative
    // reference to the open file description now installed as fd 1
    close(new_stdout);
Note in particular that the first close _operation_ on fd 1 is in
consequence of dup2(new_stdout, 1). The dup2() manpage specifically
says “the close is performed silently (i.e. any errors during the
close are not reported by dup2())” but, if stdout points to a file on
an NFS mount, are those errors _lost_, or will they actually be
reported by the subsequent close(old_stdout)?
Incidentally, the dup2() manpage has a very similar example in its
NOTES section, also presuming that close only reports errors on the
_final_ close, not when it “merely” drops reference >=2 to an OFD.
(I’m starting to think we need dup3(old, new, O_SWAP_FDS). Or is that
already a thing somehow?)
zw
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-28 16:58 ` Zack Weinberg
@ 2026-02-05 9:34 ` Jan Kara
0 siblings, 0 replies; 56+ messages in thread
From: Jan Kara @ 2026-02-05 9:34 UTC (permalink / raw)
To: Zack Weinberg
Cc: Jeff Layton, Trevor Gross, Jan Kara, The 8472, Rich Felker,
Alejandro Colomar, Vincent Lefevre, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development
I've noticed we didn't reply to one question here:
On Wed 28-01-26 11:58:07, Zack Weinberg wrote:
> On Mon, Jan 26, 2026, at 7:49 PM, Jeff Layton wrote:
> > Checking the implementations e.g. FUSE and NFS *will* return delayed
> > writeback errors on *first* descriptor close even if there are other
> > still open descriptors for the description AFAICS.
> ...
> > fsync(2) must make sure data is persistently stored and return error if
> > it was not. Thus as a VFS person I'd consider it a filesystem bug if an
> > error preventing reading data later was not returned from fsync(2). OTOH
> > that doesn't necessarily mean that later close doesn't return an error -
> > e.g. FUSE does communicate with the server on close that can fail and
> > error can be returned.
> >
> > With this in mind let me now try to answer your remaining questions:
> >
> >> >> - The OFD was opened with O_RDONLY
> >
> > If the filesystem supports atime, close can in principle report that atime
> > update failed.
> >
> >> >> - The OFD was opened with O_RDWR but has never actually
> >> >> been written to
> >
> > The same as above but with inode mtime updates.
> >
> >> >> - No data has been written to the OFD since the last call to
> >> >> fsync() for that OFD
> >
> > No writeback errors should happen in this case. As I wrote above I'd
> > consider this a filesystem bug.
> >
> >> >>
> >> >> - No data has been written to the OFD since the last call to
> >> >> fdatasync() for that OFD
> >
> > Errors can happen because some inode metadata (in practice probably only
> > inode time stamps) may still need to be written out.
> >
> > So in the cases described above (except for fsync()) you may get delayed
> > errors on close. But since in all those cases no data is lost, I don't
> > think 99.9% of applications care at all...
>
> ... regrettably I think this does mean the close(2) manpage still needs
> to tell people to watch out for errors, and should probably say that
> errors _can_ happen even if the file wasn’t written to, but are much
> less likely to be important in that case.
>
> And my “how to close stdout in a thread-safe manner” sample code is
> wrong, because I was wrong to think that the error reporting only
> happened on the _final_ close, when the OFD is destroyed.
>
> ... What happens if the close is implicit in a dup2() operation? Here’s
> that erroneous “how to close stdout” fragment, with comments
> indicating what I thought could and could not fail at the time I wrote
> it:
>
> // These allocate new fds, which can always fail, e.g. because
> // the program already has too many files open.
> int new_stdout = open("/dev/null", O_WRONLY);
> if (new_stdout == -1) perror_exit("/dev/null");
> int old_stdout = dup(1);
> if (old_stdout == -1) perror_exit("dup(1)");
>
> flockfile(stdout);
> if (fflush(stdout)) perror_exit("stdout: write error");
> dup2(new_stdout, 1); // cannot fail, atomically replaces fd 1
> funlockfile(stdout);
>
> // this close may receive delayed write errors from previous writes
> // to stdout
> if (close(old_stdout)) perror_exit("stdout: write error");
>
> // this close cannot fail, because it only drops an alternative
> // reference to the open file description now installed as fd 1
> close(new_stdout);
>
> Note in particular that the first close _operation_ on fd 1 is in
> consequence of dup2(new_stdout, 1). The dup2() manpage specifically
> says “the close is performed silently (i.e. any errors during the
> close are not reported by dup())” but, if stdout points to a file on
> an NFS mount, are those errors _lost_, or will they actually be
> reported by the subsequent close(old_stdout)?
It is simply lost (the error is propagated from the filesystem to the VFS,
which just ignores it).
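Given that, one possible reordering of the fragment above is to close the
duplicate descriptor explicitly *before* the dup2(), so that the close which
can report a delayed write error is one whose return value is checked. This
is only a minimal sketch: redirect_stream() and its error convention are
hypothetical, and it assumes (as described earlier in the thread for NFS and
FUSE) that the filesystem reports pending errors on every close of a
descriptor, not just the final one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: point the descriptor behind `stream` at `path`
 * while still collecting errors from the old file.
 * Returns 0 on success, -1 with errno set to the first error seen.
 */
static int redirect_stream(FILE *stream, const char *path)
{
    assert(stream != NULL && path != NULL);

    /* Both calls allocate new fds and can fail (e.g. EMFILE). */
    int new_fd = open(path, O_WRONLY);
    if (new_fd == -1)
        return -1;

    int fd = fileno(stream);
    int old_fd = dup(fd);
    if (old_fd == -1) {
        int saved = errno;
        close(new_fd);
        errno = saved;
        return -1;
    }

    int err = 0;
    flockfile(stream);
    if (fflush(stream) != 0)
        err = errno;
    /* Explicit close of the duplicate: its return value is checked,
     * so an error reported by the filesystem at close is not lost. */
    if (close(old_fd) != 0 && err == 0)
        err = errno;
    /* Implicit close of the old `fd`: any error here is discarded. */
    if (dup2(new_fd, fd) == -1 && err == 0)
        err = errno;
    funlockfile(stream);

    /* Drops the extra reference to the file now installed as `fd`. */
    close(new_fd);

    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}
```

A caller would use redirect_stream(stdout, "/dev/null"). Note the remaining
gap: flockfile() only excludes other stdio users, so a raw write(2) to the
descriptor from another thread between the close() and the dup2() could still
fail without anyone seeing the error.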
> Incidentally, the dup2() manpage has a very similar example in its
> NOTES section, also presuming that close only reports errors on the
> _final_ close, not when it “merely” drops reference >=2 to an OFD.
>
> (I’m starting to think we need dup3(old, new, O_SWAP_FDS). Or is that
> already a thing somehow?)
I don't think a functionality like this currently exists.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2025-05-17 13:32 ` Rich Felker
2025-05-17 13:46 ` Alejandro Colomar
@ 2026-02-06 15:13 ` Vincent Lefevre
1 sibling, 0 replies; 56+ messages in thread
From: Vincent Lefevre @ 2026-02-06 15:13 UTC (permalink / raw)
To: Rich Felker
Cc: Alejandro Colomar, Jan Kara, Alexander Viro, Christian Brauner,
linux-fsdevel, linux-api, libc-alpha
On 2025-05-17 09:32:52 -0400, Rich Felker wrote:
> On Fri, May 16, 2025 at 04:39:57PM +0200, Vincent Lefevre wrote:
> > On 2025-05-16 09:05:47 -0400, Rich Felker wrote:
> > > FWIW musl adopted the EINPROGRESS as soon as we were made aware of the
> > > issue, and later changed it to returning 0 since applications
> > > (particularly, any written prior to this interpretation) are prone to
> > > interpret EINPROGRESS as an error condition rather than success and
> > > possibly misinterpret it as meaning the fd is still open and valid to
> > > pass to close again.
> >
> > If I understand correctly, this is a poor choice. POSIX.1-2024 says:
> >
> > ERRORS
> > The close() and posix_close() functions shall fail if:
> > [...]
> > [EINPROGRESS]
> > The function was interrupted by a signal and fildes was closed
> > but the close operation is continuing asynchronously.
> >
> > But this does not mean that the asynchronous close operation will
> > succeed.
>
> There are no asynchronous behaviors specified for there to be a
> conformance distinction here. The only observable behaviors happen
> instantly, mainly the release of the file descriptor and the process's
> handle on the underlying resource. Abstractly, there is no async
> operation that could succeed or fail.
Sorry, this is old. But one consequence may be a memory leak if something
unexpected occurs during whatever is done asynchronously. There is no
guarantee that *every* resource has been released.
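For reference, the caller-side policy discussed in the quoted text (treat
EINPROGRESS, and on Linux also EINTR, as "the descriptor is gone, do not call
close() again") can be sketched as below. close_once() is a hypothetical name,
and the sketch assumes Linux behaviour, where the descriptor is released even
when close() fails with EINTR. It would be wrong on a system that keeps the
descriptor open on EINTR, and it deliberately hides the distinction I am
raising above.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical wrapper, assuming Linux semantics: the descriptor is
 * released by the time close() returns, whatever the return value.
 * Returns 0 if the descriptor was closed without a reportable error,
 * -1 with errno set otherwise. Never retries.
 */
static int close_once(int fd)
{
    assert(fd >= 0);

    if (close(fd) == 0)
        return 0;

    /* Interrupted, but the descriptor is already closed: map to
     * success so callers are not tempted to close it a second time. */
    if (errno == EINTR || errno == EINPROGRESS)
        return 0;

    /* EBADF, EIO, ENOSPC, ...: a real error. On Linux the descriptor
     * is still closed (unless it was never valid), so do not retry. */
    return -1;
}
```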
--
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Pascaline project (LIP, ENS-Lyon)
Thread overview: 56+ messages
2025-05-15 21:33 close(2) with EINTR has been changed by POSIX.1-2024 Alejandro Colomar
2025-05-16 10:48 ` Jan Kara
2025-05-16 12:11 ` Alejandro Colomar
2025-05-16 12:52 ` [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 Alejandro Colomar
2025-05-16 13:05 ` Rich Felker
2025-05-16 14:20 ` Theodore Ts'o
2025-05-17 5:46 ` Alejandro Colomar
2025-05-17 13:03 ` Alejandro Colomar
2025-05-17 13:43 ` Rich Felker
2025-05-16 14:39 ` Vincent Lefevre
2025-05-16 14:52 ` Florian Weimer
2025-05-16 15:28 ` Vincent Lefevre
2025-05-16 15:28 ` Rich Felker
2025-05-17 13:32 ` Rich Felker
2025-05-17 13:46 ` Alejandro Colomar
2025-05-23 18:10 ` Zack Weinberg
2025-05-24 2:24 ` Rich Felker
2026-01-20 17:05 ` Zack Weinberg
2026-01-20 17:46 ` Rich Felker
2026-01-20 18:39 ` Florian Weimer
2026-01-20 19:00 ` Rich Felker
2026-01-20 20:05 ` Florian Weimer
2026-01-20 20:11 ` Paul Eggert
2026-01-20 20:35 ` Alejandro Colomar
2026-01-20 20:42 ` Alejandro Colomar
2026-01-23 0:33 ` Zack Weinberg
2026-01-23 1:02 ` Alejandro Colomar
2026-01-23 1:38 ` Al Viro
2026-01-23 14:44 ` Alejandro Colomar
2026-01-23 14:05 ` Zack Weinberg
2026-01-24 19:34 ` The 8472
2026-01-24 21:39 ` Rich Felker
2026-01-24 21:57 ` The 8472
2026-01-25 15:37 ` Zack Weinberg
2026-01-26 8:51 ` Florian Weimer
2026-01-26 12:15 ` Jan Kara
2026-01-26 13:53 ` The 8472
2026-01-26 15:56 ` Jan Kara
2026-01-26 16:43 ` Jeff Layton
2026-01-26 23:01 ` Trevor Gross
2026-01-27 0:49 ` Jeff Layton
2026-01-28 16:58 ` Zack Weinberg
2026-02-05 9:34 ` Jan Kara
2025-05-24 19:25 ` Florian Weimer
2026-01-18 22:23 ` Alejandro Colomar
2026-01-20 16:15 ` Zack Weinberg
2026-01-20 16:36 ` Rich Felker
2026-01-20 19:17 ` Al Viro
2026-02-06 15:13 ` Vincent Lefevre
2025-05-16 12:41 ` close(2) with EINTR has been changed by POSIX.1-2024 Mateusz Guzik
2025-05-16 12:41 ` Theodore Ts'o
2025-05-19 23:19 ` Steffen Nurpmeso
2025-05-20 13:37 ` Theodore Ts'o
2025-05-20 23:16 ` Steffen Nurpmeso
2025-05-16 19:13 ` Al Viro
2025-05-19 9:48 ` Christian Brauner