* close(2) with EINTR has been changed by POSIX.1-2024
From: Alejandro Colomar @ 2025-05-15 21:33 UTC
To: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
linux-api
Cc: linux-man
Hi,
I'm updating the manual pages for POSIX.1-2024, and have some doubts
about close(2). The manual page for close(2) says (conforming to
POSIX.1-2008):

    The EINTR error is a somewhat special case. Regarding the EINTR
    error, POSIX.1-2008 says:

        If close() is interrupted by a signal that is to be
        caught, it shall return -1 with errno set to EINTR and
        the state of fildes is unspecified.

    This permits the behavior that occurs on Linux and many other
    implementations, where, as with other errors that may be
    reported by close(), the file descriptor is guaranteed to be
    closed. However, it also permits another possibility: that the
    implementation returns an EINTR error and keeps the file
    descriptor open. (According to its documentation, HP-UX's close()
    does this.) The caller must then once more use close() to close
    the file descriptor, to avoid file descriptor leaks. This
    divergence in implementation behaviors provides a difficult hurdle
    for portable applications, since on many implementations,
    close() must not be called again after an EINTR error, and on at
    least one, close() must be called again. There are plans to
    address this conundrum for the next major release of the POSIX.1
    standard.

TL;DR: close(2) with EINTR is allowed to either leave the fd open or
closed, and Linux leaves it closed, while others (HP-UX only?) leave it
open.
Now, POSIX.1-2024 says:

    If close() is interrupted by a signal that is to be caught, then
    it is unspecified whether it returns -1 with errno set to
    [EINTR] and fildes remaining open, or returns -1 with errno set
    to [EINPROGRESS] and fildes being closed, or returns 0 to
    indicate successful completion; [...]

<https://pubs.opengroup.org/onlinepubs/9799919799/functions/close.html>

Which seems to bless HP-UX and screw all the others, requiring them to
report EINPROGRESS.
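
[Editorial illustration, not part of the original message.] The gap between the two readings can be made concrete with a small sketch. The helper names and the enum below are made up for this example. The first function follows only the POSIX.1-2024 passage quoted above; the second assumes the Linux behavior described in the close(2) excerpt, where the descriptor is no longer open after any return from close().

```c
#include <errno.h>

/* State of the descriptor after close() returns, as seen by the caller. */
enum fd_state { FD_CLOSED, FD_STILL_OPEN, FD_UNKNOWN };

/*
 * Hypothetical helper: classify a close() result under the POSIX.1-2024
 * wording quoted above. `ret` is the return value of close() and `err`
 * is the value of errno right after the call.
 */
enum fd_state fd_state_posix2024(int ret, int err)
{
	if (ret == 0)
		return FD_CLOSED;	/* successful completion */
	if (err == EINPROGRESS)
		return FD_CLOSED;	/* interrupted, but fildes was closed */
	if (err == EINTR)
		return FD_STILL_OPEN;	/* interrupted, fildes remains open */
	return FD_UNKNOWN;		/* other errors: not covered by the quote */
}

/*
 * Hypothetical helper: the same question under the Linux behavior
 * described in close(2). The fd is not open afterwards, whatever
 * close() returned, so the caller must never retry.
 */
enum fd_state fd_state_linux(int ret, int err)
{
	(void)ret;
	(void)err;
	return FD_CLOSED;
}
```

The two functions disagree only on -1/EINTR, which is exactly the case a portable program cannot resolve without knowing which behavior its platform implements.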
Was there any discussion about what to do in the Linux kernel?
Have a lovely night!
Alex
--
<https://www.alejandro-colomar.es/>

* Re: close(2) with EINTR has been changed by POSIX.1-2024
From: Jan Kara @ 2025-05-16 10:48 UTC
To: Alejandro Colomar
Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
 linux-api, linux-man

Hi!

On Thu 15-05-25 23:33:22, Alejandro Colomar wrote:
> I'm updating the manual pages for POSIX.1-2024, and have some doubts
> about close(2). The manual page for close(2) says (conforming to
> POSIX.1-2008):
>
>     The EINTR error is a somewhat special case. Regarding the EINTR
>     error, POSIX.1-2008 says:
>
>         If close() is interrupted by a signal that is to be
>         caught, it shall return -1 with errno set to EINTR and
>         the state of fildes is unspecified.
>
>     This permits the behavior that occurs on Linux and many other
>     implementations, where, as with other errors that may be
>     reported by close(), the file descriptor is guaranteed to be
>     closed. However, it also permits another possibility: that the
>     implementation returns an EINTR error and keeps the file
>     descriptor open. (According to its documentation, HP-UX's close()
>     does this.) The caller must then once more use close() to close
>     the file descriptor, to avoid file descriptor leaks. This
>     divergence in implementation behaviors provides a difficult hurdle
>     for portable applications, since on many implementations,
>     close() must not be called again after an EINTR error, and on at
>     least one, close() must be called again. There are plans to
>     address this conundrum for the next major release of the POSIX.1
>     standard.
>
> TL;DR: close(2) with EINTR is allowed to either leave the fd open or
> closed, and Linux leaves it closed, while others (HP-UX only?) leave it
> open.
>
> Now, POSIX.1-2024 says:
>
>     If close() is interrupted by a signal that is to be caught, then
>     it is unspecified whether it returns -1 with errno set to
>     [EINTR] and fildes remaining open, or returns -1 with errno set
>     to [EINPROGRESS] and fildes being closed, or returns 0 to
>     indicate successful completion; [...]
>
> <https://pubs.opengroup.org/onlinepubs/9799919799/functions/close.html>
>
> Which seems to bless HP-UX and screw all the others, requiring them to
> report EINPROGRESS.
>
> Was there any discussion about what to do in the Linux kernel?

I'm not aware of any discussions but indeed we are returning EINTR while
closing the fd. Frankly, changing the error code we return in that case is
really asking for userspace regressions so I'm of the opinion we just
ignore the standard as in my opinion it goes against a long established
reality.

Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: close(2) with EINTR has been changed by POSIX.1-2024
From: Alejandro Colomar @ 2025-05-16 12:11 UTC
To: Jan Kara
Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
 linux-man

Hi Jan!

On Fri, May 16, 2025 at 12:48:56PM +0200, Jan Kara wrote:
> > <https://pubs.opengroup.org/onlinepubs/9799919799/functions/close.html>
> >
> > Which seems to bless HP-UX and screw all the others, requiring them to
> > report EINPROGRESS.
> >
> > Was there any discussion about what to do in the Linux kernel?
>
> I'm not aware of any discussions but indeed we are returning EINTR while
> closing the fd. Frankly, changing the error code we return in that case is
> really asking for userspace regressions so I'm of the opinion we just
> ignore the standard as in my opinion it goes against a long established
> reality.

Yep, sounds like what I was expecting. I'll document that we'll ignore
the new POSIX for close(2) on purpose. Thanks!

Have a lovely day!
Alex

--
<https://www.alejandro-colomar.es/>

* [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Alejandro Colomar @ 2025-05-16 12:52 UTC
Cc: Alejandro Colomar, Jan Kara, Alexander Viro, Christian Brauner,
 Rich Felker, linux-fsdevel, linux-api, libc-alpha

POSIX.1-2024 now mandates a behavior different from what Linux (and many
other implementations) does. It requires that we report EINPROGRESS for
what now is EINTR.

There are no plans to conform to POSIX.1-2024 within the Linux kernel,
so document this divergence. Keep POSIX.1-2008 as the standard to
which we conform in STANDARDS.

Link: <https://sourceware.org/bugzilla/show_bug.cgi?id=14627>
Link: <https://pubs.opengroup.org/onlinepubs/9799919799/functions/close.html>
Cc: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Rich Felker <dalias@libc.org>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: <linux-api@vger.kernel.org>
Cc: <libc-alpha@sourceware.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---

Hi,

I've prepared this draft for discussion. While doing so, I've noticed
the glibc bug ticket, which sounds possibly reasonable: returning 0
instead of reporting an error on EINTR. That would be an option that
would make us conforming to POSIX.1-2024. And given that a user can
(and must) do nothing after seeing EINTR, returning 0 wouldn't change
things.

So, I'll leave this patch open for discussion.

Have a lovely day!
Alex

 man/man2/close.2 | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/man/man2/close.2 b/man/man2/close.2
index b25ea4de9..9d5e26eed 100644
--- a/man/man2/close.2
+++ b/man/man2/close.2
@@ -191,10 +191,7 @@ .SS Dealing with error returns from close()
 meaning that the file descriptor was invalid)
 even if they subsequently report an error on return from
 .BR close ().
-POSIX.1 is currently silent on this point,
-but there are plans to mandate this behavior in the next major release
-.\" Issue 8
-of the standard.
+POSIX.1-2008 was silent on this point.
 .P
 A careful programmer who wants to know about I/O errors may precede
 .BR close ()
@@ -206,7 +203,7 @@ .SS Dealing with error returns from close()
 error is a somewhat special case.
 Regarding the
 .B EINTR
-error, POSIX.1-2008 says:
+error, POSIX.1-2008 said:
 .P
 .RS
 If
@@ -243,16 +240,10 @@ .SS Dealing with error returns from close()
 error, and on at least one,
 .BR close ()
 must be called again.
-There are plans to address this conundrum for
-the next major release of the POSIX.1 standard.
-.\" FIXME . for later review when Issue 8 is one day released...
-.\" POSIX proposes further changes for EINTR
-.\" http://austingroupbugs.net/tag_view_page.php?tag_id=8
-.\" http://austingroupbugs.net/view.php?id=529
-.\"
-.\" FIXME .
-.\" Review the following glibc bug later
-.\" https://sourceware.org/bugzilla/show_bug.cgi?id=14627
+.P
+POSIX.1-2024 standardized the behavior of HP-UX,
+making Linux and many other implementations non-conforming.
+There are no plans to change the behavior on Linux.
 .SH SEE ALSO
 .BR close_range (2),
 .BR fcntl (2),

Range-diff against v0:
-:  --------- > 1:  efaffc5a4 man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024

base-commit: 978b017d93e4e32b752b33877e44a8365644630c
--
2.49.0

* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Rich Felker @ 2025-05-16 13:05 UTC
To: Alejandro Colomar
Cc: Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel,
 linux-api, libc-alpha

On Fri, May 16, 2025 at 02:52:05PM +0200, Alejandro Colomar wrote:
> POSIX.1-2024 now mandates a behavior different from what Linux (and many
> other implementations) does. It requires that we report EINPROGRESS for
> what now is EINTR.
>
> There are no plans to conform to POSIX.1-2024 within the Linux kernel,
> so document this divergence. Keep POSIX.1-2008 as the standard to
> which we conform in STANDARDS.
>
> Link: <https://sourceware.org/bugzilla/show_bug.cgi?id=14627>
> Link: <https://pubs.opengroup.org/onlinepubs/9799919799/functions/close.html>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Rich Felker <dalias@libc.org>
> Cc: <linux-fsdevel@vger.kernel.org>
> Cc: <linux-api@vger.kernel.org>
> Cc: <libc-alpha@sourceware.org>
> Signed-off-by: Alejandro Colomar <alx@kernel.org>
> ---
>
> Hi,
>
> I've prepared this draft for discussion. While doing so, I've noticed
> the glibc bug ticket, which sounds possibly reasonable: returning 0
> instead of reporting an error on EINTR. That would be an option that
> would make us conforming to POSIX.1-2024. And given that a user can
> (and must) do nothing after seeing EINTR, returning 0 wouldn't change
> things.
>
> So, I'll leave this patch open for discussion.

FWIW musl adopted the EINPROGRESS as soon as we were made aware of the
issue, and later changed it to returning 0 since applications
(particularly, any written prior to this interpretation) are prone to
interpret EINPROGRESS as an error condition rather than success and
possibly misinterpret it as meaning the fd is still open and valid to
pass to close again.

In general, raw kernel interfaces do not conform to any version of
POSIX; they're just a low-impedance-mismatch set of interfaces that
facilitate implementing POSIX at the userspace libc layer. So I don't
think this should be documented as "Linux doesn't conform" but
(hopefully, once glibc fixes this) "old versions of glibc did not
conform".

Rich
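
[Editorial illustration, not part of the original message.] The libc-level approach described above, reporting success when the kernel says EINTR but has already released the descriptor, could look roughly like the sketch below. The function name is hypothetical and this is not musl's actual source; it only assumes the Linux behavior discussed in this thread, where the fd is closed even when the kernel reports EINTR.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical wrapper, NOT musl's actual implementation: on a kernel
 * that has already released the fd when it reports EINTR (as Linux is
 * described to do in this thread), report success to the caller.
 */
int close_eintr_as_success(int fd)
{
	int r = close(fd);

	if (r == -1 && errno == EINTR)
		return 0;	/* fd is gone; nothing left for the caller to do */
	return r;		/* 0 on success, -1 with errno set otherwise */
}
```

A caller written against older standards then sees only 0 or a real error such as EBADF, and is never tempted to call close() a second time on a descriptor number that may already have been reused.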

* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Theodore Ts'o @ 2025-05-16 14:20 UTC
To: Rich Felker
Cc: Alejandro Colomar, Jan Kara, Alexander Viro, Christian Brauner,
 linux-fsdevel, linux-api, libc-alpha

On Fri, May 16, 2025 at 09:05:47AM -0400, Rich Felker wrote:
> In general, raw kernel interfaces do not conform to any version of
> POSIX; they're just a low-impedance-mismatch set of interfaces that
> facilitate implementing POSIX at the userspace libc layer. So I don't
> think this should be documented as "Linux doesn't conform" but
> (hopefully, once glibc fixes this) "old versions of glibc did not
> conform".

If glibc maintainers want to deal with breaking userspace, then as a
kernel developer, I'm happy to let them deal with the
angry/disappointed users and application programmers. :-)

- Ted

* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Alejandro Colomar @ 2025-05-17 5:46 UTC
To: Theodore Ts'o
Cc: Rich Felker, Jan Kara, Alexander Viro, Christian Brauner,
 linux-fsdevel, linux-api, libc-alpha

Hi Ted, Rich,

On Fri, May 16, 2025 at 09:05:47AM -0400, Rich Felker wrote:
> FWIW musl adopted the EINPROGRESS as soon as we were made aware of the
> issue, and later changed it to returning 0 since applications
> (particularly, any written prior to this interpretation) are prone to
> interpret EINPROGRESS as an error condition rather than success and
> possibly misinterpret it as meaning the fd is still open and valid to
> pass to close again.

Hmmm, this page will need a kernel/libc differences section where I
should explain this.

On Fri, May 16, 2025 at 10:20:24AM -0400, Theodore Ts'o wrote:
> On Fri, May 16, 2025 at 09:05:47AM -0400, Rich Felker wrote:
>
> > In general, raw kernel interfaces do not conform to any version of
> > POSIX; they're just a low-impedance-mismatch set of interfaces that
> > facilitate implementing POSIX at the userspace libc layer. So I don't
> > think this should be documented as "Linux doesn't conform" but
> > (hopefully, once glibc fixes this) "old versions of glibc did not
> > conform".
>
> If glibc maintainers want to deal with breaking userspace, then as a
> kernel developer, I'm happy to let them deal with the
> angry/disappointed users and application programmers. :-)

Which breakage do you expect from the behavior that musl has chosen?

I agree that the POSIX invention of EINPROGRESS is something that would
break users. However, in removing the error completely and making it a
success, I don't see the same problem. That is, if a program calls
close(2) and sees a return of 0, or sees a return of -1 with EINTR on
Linux, both mean "the file descriptor has been closed, and the contents
of the file will *eventually* reach the file".

In which cases do you expect any existing Linux program to behave
differently on 0 and on EINTR?

Have a lovely day!
Alex

--
<https://www.alejandro-colomar.es/>

* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Alejandro Colomar @ 2025-05-17 13:03 UTC
To: Theodore Ts'o
Cc: Rich Felker, Jan Kara, Alexander Viro, Christian Brauner,
 linux-fsdevel, linux-api, libc-alpha

Hi,

On Sat, May 17, 2025 at 07:46:48AM +0200, Alejandro Colomar wrote:
> Hi Ted, Rich,
>
> On Fri, May 16, 2025 at 09:05:47AM -0400, Rich Felker wrote:
> > FWIW musl adopted the EINPROGRESS as soon as we were made aware of the
> > issue, and later changed it to returning 0 since applications
> > (particularly, any written prior to this interpretation) are prone to
> > interpret EINPROGRESS as an error condition rather than success and
> > possibly misinterpret it as meaning the fd is still open and valid to
> > pass to close again.

BTW, I don't think that's a correct interpretation. The manual page
clearly says after close(2), even on error, the fd is closed and not
usable. The issue I see is a program thinking it failed and trying to
copy the file again or reporting an error.

On the other hand, as Vincent said, maybe this is not so bad. For
certain files, fsync(2) is only described for storage devices, so in
some cases there's no clear way to make sure close(2) won't fail after
EINTR (maybe calling sync(2)?). So, maybe considering it an error
wouldn't be a terrible idea.

I don't know.

Cheers,
Alex

>
> Hmmm, this page will need a kernel/libc differences section where I
> should explain this.
>
> On Fri, May 16, 2025 at 10:20:24AM -0400, Theodore Ts'o wrote:
> > On Fri, May 16, 2025 at 09:05:47AM -0400, Rich Felker wrote:
> >
> > > In general, raw kernel interfaces do not conform to any version of
> > > POSIX; they're just a low-impedance-mismatch set of interfaces that
> > > facilitate implementing POSIX at the userspace libc layer. So I don't
> > > think this should be documented as "Linux doesn't conform" but
> > > (hopefully, once glibc fixes this) "old versions of glibc did not
> > > conform".
> >
> > If glibc maintainers want to deal with breaking userspace, then as a
> > kernel developer, I'm happy to let them deal with the
> > angry/disappointed users and application programmers. :-)
>
> Which breakage do you expect from the behavior that musl has chosen?
>
> I agree that the POSIX invention of EINPROGRESS is something that would
> break users. However, in removing the error completely and making it a
> success, I don't see the same problem. That is, if a program calls
> close(2) and sees a return of 0, or sees a return of -1 with EINTR on
> Linux, both mean "the file descriptor has been closed, and the contents
> of the file will *eventually* reach the file".
>
> In which cases do you expect any existing Linux program to behave
> differently on 0 and on EINTR?
>
>
> Have a lovely day!
> Alex
>
> --
> <https://www.alejandro-colomar.es/>

--
<https://www.alejandro-colomar.es/>

* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Rich Felker @ 2025-05-17 13:43 UTC
To: Alejandro Colomar
Cc: Theodore Ts'o, Jan Kara, Alexander Viro, Christian Brauner,
 linux-fsdevel, linux-api, libc-alpha

On Sat, May 17, 2025 at 03:03:52PM +0200, Alejandro Colomar wrote:
> Hi,
>
> On Sat, May 17, 2025 at 07:46:48AM +0200, Alejandro Colomar wrote:
> > Hi Ted, Rich,
> >
> > On Fri, May 16, 2025 at 09:05:47AM -0400, Rich Felker wrote:
> > > FWIW musl adopted the EINPROGRESS as soon as we were made aware of the
> > > issue, and later changed it to returning 0 since applications
> > > (particularly, any written prior to this interpretation) are prone to
> > > interpret EINPROGRESS as an error condition rather than success and
> > > possibly misinterpret it as meaning the fd is still open and valid to
> > > pass to close again.
>
> BTW, I don't think that's a correct interpretation. The manual page
> clearly says after close(2), even on error, the fd is closed and not
> usable. The issue I see is a program thinking it failed and trying to
> copy the file again or reporting an error.

The authoritative source here is POSIX not the man page, assuming
you're writing a portable application and not a "Linux application".

Until the latest issue (POSIX 2024/Issue 8), the state of the fd after
EINTR was explicitly unspecified, and after other errors was
unspecified by omission. So there is no way for a program written to
prior versions of the standard to have known how to safely handle
getting EINPROGRESS -- or any error from close for that matter.

Really, the only safe error for close to return, *ever*, is EBADF. On
valid input, it *must succeed*. This is a general principle for
"deallocation/destruction functions". Not an explicit requirement of
this or any standard; just a logical requirement for forward progress
to be possible.

> On the other hand, as Vincent said, maybe this is not so bad. For
> certain files, fsync(2) is only described for storage devices, so in
> some cases there's no clear way to make sure close(2) won't fail after
> EINTR (maybe calling sync(2)?). So, maybe considering it an error
> wouldn't be a terrible idea.

Whether data is committed to physical storage in a way that's robust
against machine faults is a completely separate issue from whether
it's committed to the abstract storage. The latter happens at the
moment of write, not close. If an application is trying to ensure that
kind of robustness, the return value of close is not the tool. It
needs the Synchronized IO interfaces (fsync, etc.) or something
specific to whatever it's writing to.

Rich

* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Vincent Lefevre @ 2025-05-16 14:39 UTC
To: Rich Felker
Cc: Alejandro Colomar, Jan Kara, Alexander Viro, Christian Brauner,
 linux-fsdevel, linux-api, libc-alpha

On 2025-05-16 09:05:47 -0400, Rich Felker wrote:
> FWIW musl adopted the EINPROGRESS as soon as we were made aware of the
> issue, and later changed it to returning 0 since applications
> (particularly, any written prior to this interpretation) are prone to
> interpret EINPROGRESS as an error condition rather than success and
> possibly misinterpret it as meaning the fd is still open and valid to
> pass to close again.

If I understand correctly, this is a poor choice. POSIX.1-2024 says:

ERRORS
    The close() and posix_close() functions shall fail if:
    [...]
    [EINPROGRESS]
        The function was interrupted by a signal and fildes was closed
        but the close operation is continuing asynchronously.

But this does not mean that the asynchronous close operation will
succeed.

So the application could incorrectly deduce that the close operation
was done without any error.

--
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Pascaline project (LIP, ENS-Lyon)

* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Florian Weimer @ 2025-05-16 14:52 UTC
To: Vincent Lefevre
Cc: Rich Felker, Alejandro Colomar, Jan Kara, Alexander Viro,
 Christian Brauner, linux-fsdevel, linux-api, libc-alpha

* Vincent Lefevre:

> On 2025-05-16 09:05:47 -0400, Rich Felker wrote:
>> FWIW musl adopted the EINPROGRESS as soon as we were made aware of the
>> issue, and later changed it to returning 0 since applications
>> (particularly, any written prior to this interpretation) are prone to
>> interpret EINPROGRESS as an error condition rather than success and
>> possibly misinterpret it as meaning the fd is still open and valid to
>> pass to close again.
>
> If I understand correctly, this is a poor choice. POSIX.1-2024 says:
>
> ERRORS
>     The close() and posix_close() functions shall fail if:
>     [...]
>     [EINPROGRESS]
>         The function was interrupted by a signal and fildes was closed
>         but the close operation is continuing asynchronously.
>
> But this does not mean that the asynchronous close operation will
> succeed.
>
> So the application could incorrectly deduce that the close operation
> was done without any error.

But on Linux, close traditionally has poor error reporting anyway. You
have to fsync (or equivalent) before calling close if you want error
checking. On other systems, the fsync is more or less implied by the
close, leading to rather poor performance.

Thanks,
Florian
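
[Editorial illustration, not part of the original message.] The advice above, to call fsync() before close() when errors matter, can be sketched as follows. The helper name is made up for this example, and it assumes a descriptor that supports fsync(), such as a regular file; on other descriptor types fsync() may itself fail (for instance with EINVAL), which this sketch simply reports as an error.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a buffer, flush it with fsync(), then close.
 * Returns 0 if the data was written and flushed, -1 otherwise, with errno
 * set by the failing write() or fsync(). The fd is closed exactly once
 * in either case, and the result of close() is not used for error
 * checking.
 */
int write_sync_close(int fd, const char *buf, size_t len)
{
	int saved = 0;

	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n == -1) {
			if (errno == EINTR)
				continue;	/* nothing was written; retrying write() is safe */
			saved = errno;
			break;
		}
		buf += n;
		len -= (size_t)n;
	}
	if (saved == 0 && fsync(fd) == -1)
		saved = errno;		/* I/O errors are expected to surface here */

	close(fd);			/* result ignored on purpose; never retried */

	if (saved != 0) {
		errno = saved;
		return -1;
	}
	return 0;
}
```

With this shape, the caller's error handling does not depend on whether close() reports EINTR, EINPROGRESS, or 0.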

* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Vincent Lefevre @ 2025-05-16 15:28 UTC
To: Florian Weimer
Cc: Rich Felker, Alejandro Colomar, Jan Kara, Alexander Viro,
 Christian Brauner, linux-fsdevel, linux-api, libc-alpha

On 2025-05-16 16:52:48 +0200, Florian Weimer wrote:
> * Vincent Lefevre:
>
> > On 2025-05-16 09:05:47 -0400, Rich Felker wrote:
> >> FWIW musl adopted the EINPROGRESS as soon as we were made aware of the
> >> issue, and later changed it to returning 0 since applications
> >> (particularly, any written prior to this interpretation) are prone to
> >> interpret EINPROGRESS as an error condition rather than success and
> >> possibly misinterpret it as meaning the fd is still open and valid to
> >> pass to close again.
> >
> > If I understand correctly, this is a poor choice. POSIX.1-2024 says:
> >
> > ERRORS
> >     The close() and posix_close() functions shall fail if:
> >     [...]
> >     [EINPROGRESS]
> >         The function was interrupted by a signal and fildes was closed
> >         but the close operation is continuing asynchronously.
> >
> > But this does not mean that the asynchronous close operation will
> > succeed.
> >
> > So the application could incorrectly deduce that the close operation
> > was done without any error.
>
> But on Linux, close traditionally has poor error reporting anyway. You
> have to fsync (or equivalent) before calling close if you want error
> checking. On other systems, the fsync is more or less implied by the
> close, leading to rather poor performance.

According to its documentation, fsync is only for storage devices,
while not all file descriptors are associated with storage devices.
So I'm wondering about the consequences in the other cases.

--
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Pascaline project (LIP, ENS-Lyon)

* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Rich Felker @ 2025-05-16 15:28 UTC
To: Vincent Lefevre, Alejandro Colomar, Jan Kara, Alexander Viro,
 Christian Brauner, linux-fsdevel, linux-api, libc-alpha

On Fri, May 16, 2025 at 04:39:57PM +0200, Vincent Lefevre wrote:
> On 2025-05-16 09:05:47 -0400, Rich Felker wrote:
> > FWIW musl adopted the EINPROGRESS as soon as we were made aware of the
> > issue, and later changed it to returning 0 since applications
> > (particularly, any written prior to this interpretation) are prone to
> > interpret EINPROGRESS as an error condition rather than success and
> > possibly misinterpret it as meaning the fd is still open and valid to
> > pass to close again.
>
> If I understand correctly, this is a poor choice. POSIX.1-2024 says:
>
> ERRORS
>     The close() and posix_close() functions shall fail if:
>     [...]
>     [EINPROGRESS]
>         The function was interrupted by a signal and fildes was closed
>         but the close operation is continuing asynchronously.
>
> But this does not mean that the asynchronous close operation will
> succeed.

It always succeeds in the way that's important: the file descriptor is
freed and the process no longer has this reference to the open file
description.

What might or might not succeed is:

(1) other ancient legacy behaviors coupled to close(), like rewinding
a tape drive. If the application cares how that behaves, it needs to
be performing an explicit rewind *before* calling close, when it still
has a handle on the open file so that it can respond to exceptional
conditions, not relying on a legacy behavior like "close also rewinds"
that's device-specific and outside the scope of any modern
cross-platform standard.

(2) deferred operations in unsafe async NFS setups. This is a huge
mess with no real reliable solution except "don't configure your NFS
to have unsafe and nonconforming behaviors in the pursuit of
performance".

Rich
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2025-05-16 14:39 ` Vincent Lefevre 2025-05-16 14:52 ` Florian Weimer 2025-05-16 15:28 ` Rich Felker @ 2025-05-17 13:32 ` Rich Felker 2025-05-17 13:46 ` Alejandro Colomar 2026-02-06 15:13 ` Vincent Lefevre 2 siblings, 2 replies; 56+ messages in thread From: Rich Felker @ 2025-05-17 13:32 UTC (permalink / raw) To: Vincent Lefevre, Alejandro Colomar, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, libc-alpha On Fri, May 16, 2025 at 04:39:57PM +0200, Vincent Lefevre wrote: > On 2025-05-16 09:05:47 -0400, Rich Felker wrote: > > FWIW musl adopted the EINPROGRESS as soon as we were made aware of the > > issue, and later changed it to returning 0 since applications > > (particularly, any written prior to this interpretation) are prone to > > interpret EINPROGRESS as an error condition rather than success and > > possibly misinterpret it as meaning the fd is still open and valid to > > pass to close again. > > If I understand correctly, this is a poor choice. POSIX.1-2024 says: > > ERRORS > The close() and posix_close() functions shall fail if: > [...] > [EINPROGRESS] > The function was interrupted by a signal and fildes was closed > but the close operation is continuing asynchronously. > > But this does not mean that the asynchronous close operation will > succeed. There are no asynchronous behaviors specified for there to be a conformance distinction here. The only observable behaviors happen instantly, mainly the release of the file descriptor and the process's handle on the underlying resource. Abstractly, there is no async operation that could succeed or fail. > So the application could incorrectly deduce that the close operation > was done without any error. This deduction is correct, not incorrect. 
Rather, failing with EINPROGRESS would make the application incorrectly
deduce that there might be some error it missed (even if it's aware of
the new error code), and absolutely does make all existing applications
written prior to the new text in POSIX 2024 unable to determine if the
fd was even released and needs to be passed to close again or not.

Rich
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2025-05-17 13:32 ` Rich Felker @ 2025-05-17 13:46 ` Alejandro Colomar 2025-05-23 18:10 ` Zack Weinberg 2026-02-06 15:13 ` Vincent Lefevre 1 sibling, 1 reply; 56+ messages in thread From: Alejandro Colomar @ 2025-05-17 13:46 UTC (permalink / raw) To: Rich Felker Cc: Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, libc-alpha [-- Attachment #1: Type: text/plain, Size: 3465 bytes --] On Sat, May 17, 2025 at 09:32:52AM -0400, Rich Felker wrote: > On Fri, May 16, 2025 at 04:39:57PM +0200, Vincent Lefevre wrote: > > On 2025-05-16 09:05:47 -0400, Rich Felker wrote: > > > FWIW musl adopted the EINPROGRESS as soon as we were made aware of the > > > issue, and later changed it to returning 0 since applications > > > (particularly, any written prior to this interpretation) are prone to > > > interpret EINPROGRESS as an error condition rather than success and > > > possibly misinterpret it as meaning the fd is still open and valid to > > > pass to close again. > > > > If I understand correctly, this is a poor choice. POSIX.1-2024 says: > > > > ERRORS > > The close() and posix_close() functions shall fail if: > > [...] > > [EINPROGRESS] > > The function was interrupted by a signal and fildes was closed > > but the close operation is continuing asynchronously. > > > > But this does not mean that the asynchronous close operation will > > succeed. > > There are no asynchronous behaviors specified for there to be a > conformance distinction here. The only observable behaviors happen > instantly, mainly the release of the file descriptor and the process's > handle on the underlying resource. Abstractly, there is no async > operation that could succeed or fail. > > > So the application could incorrectly deduce that the close operation > > was done without any error. > > This deduction is correct, not incorrect. 
> Rather, failing with
> EINPROGRESS would make the application incorrectly deduce that there
> might be some error it missed (even if it's aware of the new error
> code), and absolutely does make all existing applications written
> prior to the new text in POSIX 2024 unable to determine if the fd was
> even released and needs to be passed to close again or not.

Hi Rich,

I think this is not correct; at least on Linux. The manual page is very
clear that close(2) should not be retried on error:

Dealing with error returns from close()
A careful programmer will check the return value of close(),
since it is quite possible that errors on a previous write(2)
operation are reported only on the final close() that releases
the open file description. Failing to check the return value
when closing a file may lead to silent loss of data. This can
especially be observed with NFS and with disk quota.

Note, however, that a failure return should be used only for
diagnostic purposes (i.e., a warning to the application that
there may still be I/O pending or there may have been failed
I/O) or remedial purposes (e.g., writing the file once more or
creating a backup).

Retrying the close() after a failure return is the wrong thing
to do, since this may cause a reused file descriptor from
another thread to be closed. This can occur because the Linux
kernel always releases the file descriptor early in the close
operation, freeing it for reuse; the steps that may return an
error, such as flushing data to the filesystem or device, occur
only later in the close operation.

...

A careful programmer who wants to know about I/O errors may
precede close() with a call to fsync(2).

Cheers,
Alex

--
<https://www.alejandro-colomar.es/>
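The manual-page advice quoted above (check the return value, never retry, call fsync(2) first if I/O errors matter) could be sketched in C roughly as follows. This is only an illustration under the assumption of Linux semantics, where the descriptor is released whatever close() returns; `flush_and_close` is a hypothetical helper name, not a library function.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not a library function: flush, then close once.
 * Returns 0 on success, or -1 with errno set to the first error seen.
 * Assumes Linux semantics: fd is released whatever close() returns. */
static int flush_and_close(int fd)
{
    int saved = 0;

    /* Ask for pending write errors while fd is still valid. EINVAL means
     * the object (pipe, socket, tty) does not support syncing. */
    if (fsync(fd) == -1 && errno != EINVAL)
        saved = errno;

    /* Exactly one close() call; a failure is recorded, never retried. */
    if (close(fd) == -1 && saved == 0)
        saved = errno;

    if (saved != 0) {
        /* Diagnostic use only, as the manual page recommends. */
        fprintf(stderr, "flush_and_close(%d): %s\n", fd, strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}
```

A caller would treat a -1 return as "the data may not have reached storage" and react at the application level (warn, rewrite, back up), not by calling close() again.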
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2025-05-17 13:46 ` Alejandro Colomar @ 2025-05-23 18:10 ` Zack Weinberg 2025-05-24 2:24 ` Rich Felker ` (2 more replies) 0 siblings, 3 replies; 56+ messages in thread From: Zack Weinberg @ 2025-05-23 18:10 UTC (permalink / raw) To: Alejandro Colomar, Rich Felker Cc: Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development Taking everything said in this thread into account, I have attempted to wordsmith new language for the close(2) manpage. Please let me know what you think, and please help me with the bits marked in square brackets. I can make this into a proper patch for the manpages when everyone is happy with it. zw --- DESCRIPTION ... existing text ... close() always succeeds. That is, after it returns, _fd_ has always been disconnected from the open file it formerly referred to, and its number can be recycled to refer to some other file. Furthermore, if _fd_ was the last reference to the underlying open file description, the resources associated with the open file description will always have been scheduled to be released. However, close may report _delayed errors_ from a previous I/O operation. Therefore, its return value should not be ignored. RETURN VALUE close() returns zero if there are no delayed errors to report, or -1 if there _might_ be delayed errors. When close() returns -1, check _errno_ to see what the situation actually is. Most, but not all, _errno_ codes indicate a delayed I/O error that should be reported to the user. See ERRORS and NOTES for more detail. [QUERY: Is it ever possible to get delayed errors on close() from a file that was opened with O_RDONLY? What about a file that was opened with O_RDWR but never actually written to? If people only have to worry about delayed errors if the file was actually written to, we should say so at this point. 
It would also be good to mention whether it is possible to get a delayed error on close() even if a previous call to fsync() or fdatasync() succeeded and there haven’t been any more writes to that file *description* (not necessarily via the fd being closed) since.] ERRORS EBADF _fd_ wasn’t open in the first place, or is outside the valid numeric range for file descriptors. EINPROGRESS EINTR There are no delayed errors to report, but the kernel is still doing some clean-up work in the background. This situation should be treated the same as if close() had returned zero. Do not retry the close(), and do not report an error to the user. EDQUOT EFBIG EIO ENOSPC These are the most common errno codes associated with delayed I/O errors. They should be treated as a hard failure to write to the file that was formerly associated with _fd_, the same as if an earlier write(2) had failed with one of these codes. The file has still been closed! Do not retry the close(). But do report an error to the user. Depending on the underlying file, close() may return other errno codes; these should generally also be treated as delayed I/O errors. NOTES Dealing with error returns from close() As discussed above, close() always closes the file. Except when errno is set to EBADF, EINPROGRESS, or EINTR, an error return from close() reports a _delayed I/O error_ from a previous write() operation. It is vital to report delayed I/O errors to the user; failing to check the return value of close() can cause _silent_ loss of data. The most common situations where this actually happens involve networked filesystems, where, in the name of throughput, write() often returns success before the server has actually confirmed a successful write. 
However, it is also vital to understand that _no matter what_
close() returns, and _no matter what_ it sets errno to, when it
returns, _the file descriptor passed to close() has been closed_,
and its number is _immediately_ available for reuse by open(2),
dup(2), etc. Therefore, one should never retry a close(), not
even if it set errno to a value that normally indicates the
operation needs to be retried (e.g. EINTR). Retrying a close()
is a serious bug, particularly in a multithreaded program; if
the file descriptor number has already been reused, _that file_
will get closed out from under whatever other thread opened it.

[Possibly something about fsync/fdatasync here?]

BUGS
Prior to POSIX.1-2024, there was no official guarantee that
close() would always close the file descriptor, even on error.
Linux has always closed the file descriptor, even on error,
but other implementations might not have.

The only such implementation we have heard of is HP-UX; at least
some versions of HP-UX’s man page for close() said it should be
retried if it returned -1 with errno set to EINTR. (If you know
exactly which versions of HP-UX are affected, or of any other
Unix where close() doesn’t always close the file descriptor,
please contact us about it.)

Portable code should nonetheless never retry a failed close(); the
consequences of a file descriptor leak are far less dangerous than
the consequences of closing a file out from under another thread.
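The error classification proposed in the draft ERRORS section above might look like this in C. It is a sketch only, and it bakes in the Linux assumption that the draft makes (EINTR and EINPROGRESS both mean "closed, nothing to report"); the enum and the `close_once` name are hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical result type mirroring the draft ERRORS text. */
enum close_result {
    CLOSE_OK,       /* closed, nothing to report */
    CLOSE_BAD_FD,   /* EBADF: fd was not open (a caller bug) */
    CLOSE_IO_ERROR  /* closed, but a delayed I/O error was reported */
};

/* Calls close() exactly once and classifies the outcome.
 * *err receives the errno value to report, or 0 if there is none.
 * Assumption: Linux semantics, i.e. fd is gone even after EINTR. */
static enum close_result close_once(int fd, int *err)
{
    *err = 0;
    if (close(fd) == 0)
        return CLOSE_OK;

    *err = errno;
    switch (*err) {
    case EINTR:        /* treated as success under the draft wording */
    case EINPROGRESS:
        *err = 0;
        return CLOSE_OK;
    case EBADF:
        return CLOSE_BAD_FD;
    default:           /* EIO, ENOSPC, EDQUOT, EFBIG, ... */
        return CLOSE_IO_ERROR;
    }
}
```

On a system where EINTR leaves the descriptor open, this classification would leak the descriptor, which is the trade-off debated in the replies.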
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2025-05-23 18:10 ` Zack Weinberg @ 2025-05-24 2:24 ` Rich Felker 2026-01-20 17:05 ` Zack Weinberg 2025-05-24 19:25 ` Florian Weimer 2026-01-18 22:23 ` Alejandro Colomar 2 siblings, 1 reply; 56+ messages in thread From: Rich Felker @ 2025-05-24 2:24 UTC (permalink / raw) To: Zack Weinberg Cc: Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote: > Taking everything said in this thread into account, I have attempted to > wordsmith new language for the close(2) manpage. Please let me know > what you think, and please help me with the bits marked in square > brackets. I can make this into a proper patch for the manpages > when everyone is happy with it. > > zw > > --- > > DESCRIPTION > ... existing text ... > > close() always succeeds. That is, after it returns, _fd_ has > always been disconnected from the open file it formerly referred > to, and its number can be recycled to refer to some other file. > Furthermore, if _fd_ was the last reference to the underlying > open file description, the resources associated with the open file > description will always have been scheduled to be released. > > However, close may report _delayed errors_ from a previous I/O > operation. Therefore, its return value should not be ignored. > > RETURN VALUE > close() returns zero if there are no delayed errors to report, > or -1 if there _might_ be delayed errors. > > When close() returns -1, check _errno_ to see what the situation > actually is. Most, but not all, _errno_ codes indicate a delayed > I/O error that should be reported to the user. See ERRORS and > NOTES for more detail. > > [QUERY: Is it ever possible to get delayed errors on close() from > a file that was opened with O_RDONLY? What about a file that was > opened with O_RDWR but never actually written to? 
If people only > have to worry about delayed errors if the file was actually > written to, we should say so at this point. > > It would also be good to mention whether it is possible to get a > delayed error on close() even if a previous call to fsync() or > fdatasync() succeeded and there haven’t been any more writes to > that file *description* (not necessarily via the fd being closed) > since.] > > ERRORS > EBADF _fd_ wasn’t open in the first place, or is outside the > valid numeric range for file descriptors. > > EINPROGRESS > EINTR > There are no delayed errors to report, but the kernel is > still doing some clean-up work in the background. This > situation should be treated the same as if close() had > returned zero. Do not retry the close(), and do not report > an error to the user. Since this behavior for EINTR is non-conforming (and even prior to the POSIX 2024 update, it was contrary to the general semantics for EINTR, that no non-ignoreable side-effects have taken place), it should be noted that it's Linux/glibc-specific. > EDQUOT > EFBIG > EIO > ENOSPC > These are the most common errno codes associated with > delayed I/O errors. They should be treated as a hard > failure to write to the file that was formerly associated > with _fd_, the same as if an earlier write(2) had failed > with one of these codes. The file has still been closed! > Do not retry the close(). But do report an error to the user. > > Depending on the underlying file, close() may return other errno > codes; these should generally also be treated as delayed I/O errors. > > NOTES > Dealing with error returns from close() > > As discussed above, close() always closes the file. Except when > errno is set to EBADF, EINPROGRESS, or EINTR, an error return from > close() reports a _delayed I/O error_ from a previous write() > operation. > > It is vital to report delayed I/O errors to the user; failing to > check the return value of close() can cause _silent_ loss of data. 
> The most common situations where this actually happens involve > networked filesystems, where, in the name of throughput, write() > often returns success before the server has actually confirmed a > successful write. > > However, it is also vital to understand that _no matter what_ > close() returns, and _no matter what_ it sets errno to, when it > returns, _the file descriptor passed to close() has been closed_, > and its number is _immediately_ available for reuse by open(2), > dup(2), etc. Therefore, one should never retry a close(), not > even if it set errno to a value that normally indicates the > operation needs to be retried (e.g. EINTR). Retrying a close() > is a serious bug, particularly in a multithreaded program; if > the file descriptor number has already been reused, _that file_ > will get closed out from under whatever other thread opened it. > > [Possibly something about fsync/fdatasync here?] While I agree with all of this, I think the tone is way too proscriptive. The man pages are to document the behaviors, not tell people how to program. And again, it should be noted that the standard behavior is that you *do* have to retry on EINTR, or arrange to ensure it never happens (e.g. by not installing interrupting signal handlers, or blocking signals across calls to close), and that treating EINTR as "fd has been closed" is something you should only do on known-nonconforming systems. Aside: the reason EINTR *has to* be specified this way is that pthread cancellation is aligned with EINTR. If EINTR were defined to have closed the fd, then acting on cancellation during close would also have closed the fd, but the cancellation handler would have no way to distinguish this, leading to a situation where you're forced to either leak fds or introduce a double-close vuln. > BUGS > Prior to POSIX.1-2024, there was no official guarantee that > close() would always close the file descriptor, even on error. 
> Linux has always closed the file descriptor, even on error,
> but other implementations might not have.
>
> The only such implementation we have heard of is HP-UX; at least
> some versions of HP-UX’s man page for close() said it should be
> retried if it returned -1 with errno set to EINTR. (If you know
> exactly which versions of HP-UX are affected, or of any other
> Unix where close() doesn’t always close the file descriptor,
> please contact us about it.)
>
> Portable code should nonetheless never retry a failed close(); the
> consequences of a file descriptor leak are far less dangerous than
> the consequences of closing a file out from under another thread.

This is explicitly the opposite of what's specified for portable code.

It sounds like you are intentionally omitting that POSIX says the
opposite of what you want it to, and treating the standard behavior
as a historical HP-UX quirk/bug. This is polemic, not the sort of
documentation that belongs in a man page.

An outline of what I'd like to see instead:

- Clear explanation of why double-close is a serious bug that must
  always be avoided. (I think we all agree on this.)

- Statement that the historical Linux/glibc behavior and current POSIX
  requirement differ, without language that tries to paint the POSIX
  behavior as a HP-UX bug/quirk. Possibly citing real sources/history
  of the issue (Austin Group tracker items 529, 614; maybe others).

- Consequence of just assuming the Linux behavior (fd leaks on
  conforming systems).

- Consequences of assuming the POSIX behavior (double-close vulns on
  GNU/Linux, maybe others).

- Survey of methods for avoiding the problem (ways to preclude EINTR,
  possibly ways to infer behavior, etc).

Rich
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2025-05-24 2:24 ` Rich Felker @ 2026-01-20 17:05 ` Zack Weinberg 2026-01-20 17:46 ` Rich Felker 0 siblings, 1 reply; 56+ messages in thread From: Zack Weinberg @ 2026-01-20 17:05 UTC (permalink / raw) To: Rich Felker Cc: Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development > On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote: >> close() always succeeds. That is, after it returns, _fd_ has >> always been disconnected from the open file it formerly referred >> to, and its number can be recycled to refer to some other file. >> Furthermore, if _fd_ was the last reference to the underlying >> open file description, the resources associated with the open file >> description will always have been scheduled to be released. ... >> EINPROGRESS >> EINTR >> There are no delayed errors to report, but the kernel is >> still doing some clean-up work in the background. This >> situation should be treated the same as if close() had >> returned zero. Do not retry the close(), and do not report >> an error to the user. > > Since this behavior for EINTR is non-conforming (and even prior to the > POSIX 2024 update, it was contrary to the general semantics for EINTR, > that no non-ignoreable side-effects have taken place), it should be > noted that it's Linux/glibc-specific. I am prepared to take your word for it that POSIX says this is non-conforming, but in that case, POSIX is wrong, and I will not be convinced otherwise by any argument. Operations that release a resource must always succeed. Now, the abstract correct behavior is secondary to the fact that we know there are both systems where close should not be retried after EINTR (Linux) and systems where the fd is still open after EINTR (HP-UX). But it is my position that *portable code* should assume the Linux behavior, because that is the safest option. 
If you assume the HP-UX behavior on a machine that implements the Linux behavior, you might close some unrelated file out from under yourself (probably but not necessarily a different thread). If you assume the Linux behavior on a machine that implements the HP-UX behavior, you have leaked a file descriptor; the worst things that can do are much less severe. The only way to get it right all the time is to have a big long list of #ifdefs for every Unix under the sun, and we don't even have the data we would need to write that list. > While I agree with all of this, I think the tone is way too > proscriptive. The man pages are to document the behaviors, not tell > people how to program. I could be persuaded to tone it down a little but in this case I think the man page's job *is* to tell people how to program. We know lots of existing code has gotten the fine details of close() wrong and we are trying to document how to do it right. > Aside: the reason EINTR *has to* be specified this way is that pthread > cancellation is aligned with EINTR. If EINTR were defined to have > closed the fd, then acting on cancellation during close would also > have closed the fd, but the cancellation handler would have no way to > distinguish this, leading to a situation where you're forced to either > leak fds or introduce a double-close vuln. The correct way to address this would be to make close() not be a cancellation point. > It sounds like you are intentionally omitting that POSIX says the > opposite of what you want it to, and treating the standard behavior > as a historical HP-UX quirk/bug. This is polemic, not the sort of > documentation that belongs in a man page. To be clear, when I wrote all this I thought the POSIX.1-2024 change did in fact make the semantics be that close() closes the descriptor no matter what it returns. 
However, I insist that the correct behavior is in fact for close to close the descriptor no matter what it returns, and to the extent POSIX says anything else, POSIX is wrong. Again, you cannot change my mind about this. N.B. I have skimmed the current text of https://pubs.opengroup.org/onlinepubs/9799919799/functions/close.html and it appears to me that the committee more or less agrees with me, but wishes to avoid declaring HP-UX (and any other systems with the same behavior) nonconformant. So instead of just saying the fd is closed no matter what, they've invented a new variant on close that they have more scope to modify the behavior of, and they're nudging implementations to not return EINTR from (posix_)close at all. I don't think we (authors of this particular set of manpages) need to care about the Austin Group's reluctance to declare existing legacy systems nonconformant. > An outline of what I'd like to see instead: > > - Clear explanation of why double-close is a serious bug that must > always be avoided. (I think we all agree on this.) > > - Statement that the historical Linux/glibc behavior and current POSIX > requirement differ, without language that tries to paint the POSIX > behavior as a HP-UX bug/quirk. Possibly citing real sources/history > of the issue (Austin Group tracker items 529, 614; maybe others). > > - Consequence of just assuming the Linux behavior (fd leaks on > conforming systems). > > - Consequences of assuming the POSIX behavior (double-close vulns on > GNU/Linux, maybe others). > > - Survey of methods for avoiding the problem (ways to preclude EINTR, > possibly ways to infer behavior, etc). This outline seems more or less reasonable to me but, if it's me writing the text, I _will_ characterize what POSIX currently says about EINTR returns from close() as a bug in POSIX. As far as I'm concerned, that is a fact, not polemic. I have found that arguing with you in particular, Rich, is generally not worth the effort. 
Therefore, unless you reply and _accept_ that the final version of the
close manpage will say that POSIX is buggy, I am not going to write
another version of this text, nor will I be drawn into further debate.

zw
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-20 17:05 ` Zack Weinberg
@ 2026-01-20 17:46 ` Rich Felker
2026-01-20 18:39 ` Florian Weimer
` (2 more replies)
0 siblings, 3 replies; 56+ messages in thread
From: Rich Felker @ 2026-01-20 17:46 UTC (permalink / raw)
To: Zack Weinberg
Cc: Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development

On Tue, Jan 20, 2026 at 12:05:52PM -0500, Zack Weinberg wrote:
> > On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote:
> >> close() always succeeds. That is, after it returns, _fd_ has
> >> always been disconnected from the open file it formerly referred
> >> to, and its number can be recycled to refer to some other file.
> >> Furthermore, if _fd_ was the last reference to the underlying
> >> open file description, the resources associated with the open file
> >> description will always have been scheduled to be released.
> ...
> >> EINPROGRESS
> >> EINTR
> >> There are no delayed errors to report, but the kernel is
> >> still doing some clean-up work in the background. This
> >> situation should be treated the same as if close() had
> >> returned zero. Do not retry the close(), and do not report
> >> an error to the user.
> >
> > Since this behavior for EINTR is non-conforming (and even prior to the
> > POSIX 2024 update, it was contrary to the general semantics for EINTR,
> > that no non-ignoreable side-effects have taken place), it should be
> > noted that it's Linux/glibc-specific.
>
> I am prepared to take your word for it that POSIX says this is
> non-conforming, but in that case, POSIX is wrong, and I will not be
> convinced otherwise by any argument. Operations that release a
> resource must always succeed.

There are two conflicting requirements here:

1. Operations that release a resource must always succeed.
2. Failure with EINTR must not have side effects.
The right conclusion is that operations that release resources must
not be able to fail with EINTR. And that's how POSIX should have
resolved the situation -- by getting rid of support for the silly
legacy synchronous-tape-drive-rewinding behavior of close on some
systems, and requiring close to succeed immediately with no waiting
for anything.

But abandoning requirement 2 is not an option, especially in light of
the relationship between EINTR and thread cancellation with regard to
the contract about side effects.

It's perfectly reasonable for implementations (as musl does, and as I
think glibc either does or intends to do) to just go all the way and
satisfy both 1 and 2 by having close translate the kernel EINTR into 0.

> Now, the abstract correct behavior is secondary to the fact that we
> know there are both systems where close should not be retried after
> EINTR (Linux) and systems where the fd is still open after EINTR
> (HP-UX). But it is my position that *portable code* should assume the
> Linux behavior, because that is the safest option. If you assume the
> HP-UX behavior on a machine that implements the Linux behavior, you
> might close some unrelated file out from under yourself (probably but
> not necessarily a different thread). If you assume the Linux behavior
> on a machine that implements the HP-UX behavior, you have leaked a
> file descriptor; the worst things that can do are much less severe.

Unfortunately, regardless of what happens, code portable to old
systems needs to avoid getting into the situation to begin with, by
either not installing interrupting signal handlers or blocking signals
around close.

> The only way to get it right all the time is to have a big long list
> of #ifdefs for every Unix under the sun, and we don't even have the
> data we would need to write that list.
>
> > While I agree with all of this, I think the tone is way too
> > proscriptive. The man pages are to document the behaviors, not tell
> > people how to program.
> > I could be persuaded to tone it down a little but in this case I think > the man page's job *is* to tell people how to program. We know lots of > existing code has gotten the fine details of close() wrong and we are > trying to document how to do it right. No, the job of the man pages absolutely is not "to tell people how to program". It's to document behaviors. They are not a programming tutorial. They are not polemic diatribes. They are unbiased statements of facts. Facts of what the standards say and what implementations do, that equip programmers with the knowledge they need to make their own informed decisions, rather than blindly following what someone who thinks they know better told them to do. > > Aside: the reason EINTR *has to* be specified this way is that pthread > > cancellation is aligned with EINTR. If EINTR were defined to have > > closed the fd, then acting on cancellation during close would also > > have closed the fd, but the cancellation handler would have no way to > > distinguish this, leading to a situation where you're forced to either > > leak fds or introduce a double-close vuln. > > The correct way to address this would be to make close() not be a > cancellation point. This would also be a desirable change, one I would support if other implementors are on-board with pushing for it. > > An outline of what I'd like to see instead: > > > > - Clear explanation of why double-close is a serious bug that must > > always be avoided. (I think we all agree on this.) > > > > - Statement that the historical Linux/glibc behavior and current POSIX > > requirement differ, without language that tries to paint the POSIX > > behavior as a HP-UX bug/quirk. Possibly citing real sources/history > > of the issue (Austin Group tracker items 529, 614; maybe others). > > > > - Consequence of just assuming the Linux behavior (fd leaks on > > conforming systems). 
> >
> > - Consequences of assuming the POSIX behavior (double-close vulns on
> >   GNU/Linux, maybe others).
> >
> > - Survey of methods for avoiding the problem (ways to preclude EINTR,
> >   possibly ways to infer behavior, etc).
>
> This outline seems more or less reasonable to me but, if it's me
> writing the text, I _will_ characterize what POSIX currently says
> about EINTR returns from close() as a bug in POSIX. As far as I'm
> concerned, that is a fact, not polemic.
>
> I have found that arguing with you in particular, Rich, is generally
> not worth the effort. Therefore, unless you reply and _accept_ that
> the final version of the close manpage will say that POSIX is buggy,
> I am not going to write another version of this text, nor will I be
> drawn into further debate.

I will not accept that because it's a gross violation of the
responsibility of document writing.

Rich
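Until close() stops being a cancellation point, an application can get a similar effect locally by disabling cancellation around the call. The following is a minimal sketch of that idea; `close_cancel_disabled` is a hypothetical name, and the sketch addresses only cancellation, not signal-induced EINTR.

```c
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical helper: call close() with thread cancellation disabled,
 * so a pending cancel cannot be acted upon in the middle of close() and
 * leave the cleanup handler unsure whether fd was released. */
static int close_cancel_disabled(int fd)
{
    int oldstate, ignored, rc, saved;

    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);

    rc = close(fd);
    saved = errno;          /* keep close()'s errno across the restore */

    /* Restore the caller's previous cancelability state. */
    pthread_setcancelstate(oldstate, &ignored);

    errno = saved;
    return rc;
}
```

A cancellation request that arrives during the call remains pending and is acted upon at the thread's next cancellation point after the state is restored.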
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-20 17:46 ` Rich Felker @ 2026-01-20 18:39 ` Florian Weimer 2026-01-20 19:00 ` Rich Felker 2026-01-20 20:11 ` Paul Eggert 2026-01-20 20:35 ` Alejandro Colomar 2 siblings, 1 reply; 56+ messages in thread From: Florian Weimer @ 2026-01-20 18:39 UTC (permalink / raw) To: Rich Felker Cc: Zack Weinberg, Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development * Rich Felker: > On Tue, Jan 20, 2026 at 12:05:52PM -0500, Zack Weinberg wrote: >> > On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote: >> >> close() always succeeds. That is, after it returns, _fd_ has >> >> always been disconnected from the open file it formerly referred >> >> to, and its number can be recycled to refer to some other file. >> >> Furthermore, if _fd_ was the last reference to the underlying >> >> open file description, the resources associated with the open file >> >> description will always have been scheduled to be released. >> ... >> >> EINPROGRESS >> >> EINTR >> >> There are no delayed errors to report, but the kernel is >> >> still doing some clean-up work in the background. This >> >> situation should be treated the same as if close() had >> >> returned zero. Do not retry the close(), and do not report >> >> an error to the user. >> > >> > Since this behavior for EINTR is non-conforming (and even prior to the >> > POSIX 2024 update, it was contrary to the general semantics for EINTR, >> > that no non-ignoreable side-effects have taken place), it should be >> > noted that it's Linux/glibc-specific. >> >> I am prepared to take your word for it that POSIX says this is >> non-conforming, but in that case, POSIX is wrong, and I will not be >> convinced otherwise by any argument. Operations that release a >> resource must always succeed. > > There are two conflicting requirements here: > > 1. 
Operations that release a resource must always succeed.
> 2. Failure with EINTR must not have side effects.
>
> The right conclusion is that operations that release resources must
> not be able to fail with EINTR. And that's how POSIX should have
> resolved the situation -- by getting rid of support for the silly
> legacy synchronous-tape-drive-rewinding behavior of close on some
> systems, and requiring close to succeed immediately with no waiting
> for anything.

What about SO_LINGER? Isn't this relevant in context?

As far as I know, there is no other way besides SO_LINGER to get
notification if the packet buffers are actually gone. If you don't use
it, memory can pile up in the kernel without the application's
knowledge.

Thanks,
Florian
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-20 18:39 ` Florian Weimer @ 2026-01-20 19:00 ` Rich Felker 2026-01-20 20:05 ` Florian Weimer 0 siblings, 1 reply; 56+ messages in thread From: Rich Felker @ 2026-01-20 19:00 UTC (permalink / raw) To: Florian Weimer Cc: Zack Weinberg, Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development On Tue, Jan 20, 2026 at 07:39:48PM +0100, Florian Weimer wrote: > * Rich Felker: > > > On Tue, Jan 20, 2026 at 12:05:52PM -0500, Zack Weinberg wrote: > >> > On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote: > >> >> close() always succeeds. That is, after it returns, _fd_ has > >> >> always been disconnected from the open file it formerly referred > >> >> to, and its number can be recycled to refer to some other file. > >> >> Furthermore, if _fd_ was the last reference to the underlying > >> >> open file description, the resources associated with the open file > >> >> description will always have been scheduled to be released. > >> ... > >> >> EINPROGRESS > >> >> EINTR > >> >> There are no delayed errors to report, but the kernel is > >> >> still doing some clean-up work in the background. This > >> >> situation should be treated the same as if close() had > >> >> returned zero. Do not retry the close(), and do not report > >> >> an error to the user. > >> > > >> > Since this behavior for EINTR is non-conforming (and even prior to the > >> > POSIX 2024 update, it was contrary to the general semantics for EINTR, > >> > that no non-ignoreable side-effects have taken place), it should be > >> > noted that it's Linux/glibc-specific. > >> > >> I am prepared to take your word for it that POSIX says this is > >> non-conforming, but in that case, POSIX is wrong, and I will not be > >> convinced otherwise by any argument. Operations that release a > >> resource must always succeed. 
> >
> > There are two conflicting requirements here:
> >
> > 1. Operations that release a resource must always succeed.
> > 2. Failure with EINTR must not have side effects.
> >
> > The right conclusion is that operations that release resources must
> > not be able to fail with EINTR. And that's how POSIX should have
> > resolved the situation -- by getting rid of support for the silly
> > legacy synchronous-tape-drive-rewinding behavior of close on some
> > systems, and requiring close to succeed immediately with no waiting
> > for anything.
>
> What about SO_LINGER? Isn't this relevant in context?

shutdown should be used for this, not close. So that the acts of
waiting for the operation to finish, and releasing the resource handle
needed to observe if it's finished, are separate.

> As far as I know, there is no other way besides SO_LINGER to get
> notification if the packet buffers are actually gone. If you don't use
> it, memory can pile up in the kernel without the application's
> knowledge.

The way Linux's EINTR behaves, using close can't ensure this memory
doesn't pile up, because on EINTR you lose the ability to wait for it.

Rich

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-20 19:00 ` Rich Felker @ 2026-01-20 20:05 ` Florian Weimer 0 siblings, 0 replies; 56+ messages in thread From: Florian Weimer @ 2026-01-20 20:05 UTC (permalink / raw) To: Rich Felker Cc: Zack Weinberg, Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development * Rich Felker: > On Tue, Jan 20, 2026 at 07:39:48PM +0100, Florian Weimer wrote: >> * Rich Felker: >> >> > On Tue, Jan 20, 2026 at 12:05:52PM -0500, Zack Weinberg wrote: >> >> > On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote: >> >> >> close() always succeeds. That is, after it returns, _fd_ has >> >> >> always been disconnected from the open file it formerly referred >> >> >> to, and its number can be recycled to refer to some other file. >> >> >> Furthermore, if _fd_ was the last reference to the underlying >> >> >> open file description, the resources associated with the open file >> >> >> description will always have been scheduled to be released. >> >> ... >> >> >> EINPROGRESS >> >> >> EINTR >> >> >> There are no delayed errors to report, but the kernel is >> >> >> still doing some clean-up work in the background. This >> >> >> situation should be treated the same as if close() had >> >> >> returned zero. Do not retry the close(), and do not report >> >> >> an error to the user. >> >> > >> >> > Since this behavior for EINTR is non-conforming (and even prior to the >> >> > POSIX 2024 update, it was contrary to the general semantics for EINTR, >> >> > that no non-ignoreable side-effects have taken place), it should be >> >> > noted that it's Linux/glibc-specific. >> >> >> >> I am prepared to take your word for it that POSIX says this is >> >> non-conforming, but in that case, POSIX is wrong, and I will not be >> >> convinced otherwise by any argument. Operations that release a >> >> resource must always succeed. 
>> >
>> > There are two conflicting requirements here:
>> >
>> > 1. Operations that release a resource must always succeed.
>> > 2. Failure with EINTR must not have side effects.
>> >
>> > The right conclusion is that operations that release resources must
>> > not be able to fail with EINTR. And that's how POSIX should have
>> > resolved the situation -- by getting rid of support for the silly
>> > legacy synchronous-tape-drive-rewinding behavior of close on some
>> > systems, and requiring close to succeed immediately with no waiting
>> > for anything.
>>
>> What about SO_LINGER? Isn't this relevant in context?
>
> shutdown should be used for this, not close. So that the acts of
> waiting for the operation to finish, and releasing the resource handle
> needed to observe if it's finished, are separate.

I think shutdown on TCP sockets is non-blocking under Linux. It doesn't
wait until the peer has acknowledged the FIN segment, as far as I
understand it. Other systems may behave differently.

>> As far as I know, there is no other way besides SO_LINGER to get
>> notification if the packet buffers are actually gone. If you don't use
>> it, memory can pile up in the kernel without the application's
>> knowledge.
>
> The way Linux's EINTR behaves, using close can't ensure this memory
> doesn't pile up, because on EINTR you lose the ability to wait for it.

Can't the application reliably avoid EINTR by blocking signals?

Thanks,
Florian

^ permalink raw reply [flat|nested] 56+ messages in thread
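The signal-blocking approach asked about in the message above could look roughly like the following. This is a sketch under stated assumptions, not code from the thread: close_signals_blocked() is a hypothetical helper, and it relies on the fact that EINTR is only produced when a signal handler actually runs, so masking every blockable signal for the duration of the call should leave nothing that can interrupt it. sigprocmask() is used for brevity; POSIX specifies it only for single-threaded processes, and a threaded program would call pthread_sigmask() in the same way.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): call close() exactly once
 * while every blockable signal is masked, so that no signal handler
 * can run and interrupt the call. Returns the result of close() with
 * errno preserved.
 */
int close_signals_blocked(int fd)
{
    sigset_t all, saved;
    int rc, ret, saved_errno;

    sigfillset(&all);   /* SIGKILL and SIGSTOP are silently left unblocked */

    rc = sigprocmask(SIG_BLOCK, &all, &saved);
    assert(rc == 0);    /* not expected to fail with a valid `how` and valid pointers */
    (void)rc;           /* avoid an unused-variable warning under NDEBUG */

    ret = close(fd);    /* one attempt only; never retried */
    saved_errno = errno;

    /* Restore the caller's mask; pending signals are delivered here. */
    sigprocmask(SIG_SETMASK, &saved, NULL);
    errno = saved_errno;
    return ret;
}
```

The cost is two extra system calls per close(), and signals that arrive during a slow close() are delayed until the mask is restored.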
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-20 17:46 ` Rich Felker 2026-01-20 18:39 ` Florian Weimer @ 2026-01-20 20:11 ` Paul Eggert 2026-01-20 20:35 ` Alejandro Colomar 2 siblings, 0 replies; 56+ messages in thread From: Paul Eggert @ 2026-01-20 20:11 UTC (permalink / raw) To: Rich Felker Cc: Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development, Zack Weinberg On 2026-01-20 09:46, Rich Felker wrote: > the job of the man pages absolutely is not "to tell people how to > program". It's to document behaviors. In practice man pages do both. When I type "man close" on GNU/Linux I see text like the text quoted below, and as a C programmer I appreciate getting advice like this when the situation is sufficiently tricky. ---- Any record locks (see fcntl(2)) held on the file it was associated with, and owned by the process, are removed regardless of the file descriptor that was used to obtain the lock. This has some unfortunate consequences and one should be extra careful when using advisory record locking. See fcntl(2) for discussion of the risks and consequences as well as for the (probably preferred) open file description locks. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-20 17:46 ` Rich Felker 2026-01-20 18:39 ` Florian Weimer 2026-01-20 20:11 ` Paul Eggert @ 2026-01-20 20:35 ` Alejandro Colomar 2026-01-20 20:42 ` Alejandro Colomar 2 siblings, 1 reply; 56+ messages in thread From: Alejandro Colomar @ 2026-01-20 20:35 UTC (permalink / raw) To: Rich Felker Cc: Zack Weinberg, Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development [-- Attachment #1: Type: text/plain, Size: 9114 bytes --] Hi Rich, Zack, On Tue, Jan 20, 2026 at 12:46:59PM -0500, Rich Felker wrote: > On Tue, Jan 20, 2026 at 12:05:52PM -0500, Zack Weinberg wrote: > > > On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote: [...] > > Now, the abstract correct behavior is secondary to the fact that we > > know there are both systems where close should not be retried after > > EINTR (Linux) and systems where the fd is still open after EINTR > > (HP-UX). But it is my position that *portable code* should assume the > > Linux behavior, because that is the safest option. If you assume the > > HP-UX behavior on a machine that implements the Linux behavior, you > > might close some unrelated file out from under yourself (probably but > > not necessarily a different thread). If you assume the Linux behavior > > on a machine that implements the HP-UX behavior, you have leaked a > > file descriptor; the worst things that can do are much less severe. > > Unfortunately, regardless of what happens, code portable to old > systems needs to avoid getting in the situation to begin with. By > either not installing interrupting signal handlers or blocking EINTR > around close. [...] > > > While I agree with all of this, I think the tone is way too > > > proscriptive. The man pages are to document the behaviors, not tell > > > people how to program. 
> > > > I could be persuaded to tone it down a little but in this case I think > > the man page's job *is* to tell people how to program. We know lots of > > existing code has gotten the fine details of close() wrong and we are > > trying to document how to do it right. > > No, the job of the man pages absolutely is not "to tell people how to > program". It's to document behaviors. They are not a programming > tutorial. They are not polemic diatribes. They are unbiased statements > of facts. Facts of what the standards say and what implementations do, > that equip programmers with the knowledge they need to make their own > informed decisions, rather than blindly following what someone who > thinks they know better told them to do. This reminds me a little bit of the realloc(p,0) fiasco of C89 and glibc. In most cases, I agree with you that manual pages are and should be aseptic, there are cases where I think the manual page needs to be tutorial. Especially when there's such a mess, we need to both explain all the possible behaviors (or at least mention them to some degree). But for example, there's the case of realloc(p,0), where we have a fiasco that was pushed by a compoundment of wrong decisions by the C Committee, and prior to that from System V. We're a bit lucky that C17 accidentally broke it so badly that we now have it as UB, and that gives us the opportunity to fix it now (which BTW might also be the case for close(2)). In the case of realloc(3), I went and documented in the manual page that glibc is broken, and that ISO C is also broken. STANDARDS malloc() free() calloc() realloc() C23, POSIX.1‐2024. reallocarray() POSIX.1‐2024. realloc(p, 0) The behavior of realloc(p, 0) in glibc doesn’t conform to any of C99, C11, POSIX.1‐2001, POSIX.1‐2004, POSIX.1‐2008, POSIX.1‐2013, POSIX.1‐2017, or POSIX.1‐2024. 
The C17 specification was changed to make it conforming, but that specification made it impossible to write code that reli‐ ably determines if the input pointer is freed after real‐ loc(p, 0), and C23 changed it again to make this undefined behavior, acknowledging that the C17 specification was broad enough, so that undefined behavior wasn’t worse than that. reallocarray() suffers the same issues in glibc. musl libc and the BSDs conform to all versions of ISO C and POSIX.1. gnulib provides the realloc‐posix module, which provides wrappers realloc() and reallocarray() that conform to all versions of ISO C and POSIX.1. There’s a proposal to standardize the BSD behavior: https: //www.open-std.org/jtc1/sc22/wg14/www/docs/n3621.txt. HISTORY malloc() free() calloc() realloc() POSIX.1‐2001, C89. reallocarray() glibc 2.26. OpenBSD 5.6, FreeBSD 11.0. malloc() and related functions rejected sizes greater than PTRDIFF_MAX starting in glibc 2.30. free() preserved errno starting in glibc 2.33. realloc(p, 0) C89 was ambiguous in its specification of realloc(p, 0). C99 partially fixed this. The original implementation in glibc would have been con‐ forming to C99. However, and ironically, trying to comply with C99 before the standard was released, glibc changed its behavior in glibc 2.1.1 into something that ended up not conforming to the final C99 specification (but this is debated, as the wording of the standard seems self‐contra‐ dicting). ... BUGS Programmers would naturally expect by induction that realloc(p, size) is consistent with free(p) and mal‐ loc(size), as that is the behavior in the general case. This is not explicitly required by POSIX.1‐2024 or C11, but all conforming implementations are consistent with that. The glibc implementation of realloc() is not consistent with that, and as a consequence, it is dangerous to call realloc(p, 0) in glibc. A trivial workaround for glibc is calling it as realloc(p, size?size:1). 
The workaround for reallocarray() in glibc ——which shares the same bug—— would be reallocarray(p, n?n:1, size?size:1). Apart from documenting that glibc and ISO C are broken, we document how to best deal with it (see the last paragraph in BUGS). This is necessary because I fear that just by documenting the different behaviors, programmers would still not know what to do with that. Just take into account that even several members of the committee don't know how to deal with it. I'd be willing to have something similar for close(2). Have a lovely night! Alex P.S.: I have great news about realloc(p,0)! Microsoft is on-board with the change. They told me they like the proposal, and are willing to fix their realloc(3) implementation. They'll now conduct tests to make sure it doesn't break anything too badly, and will come back to me with any feedback they have from those tests. I'll put the standards proposal for realloc(3) on hold, waiting for Microsoft's feedback. > > > Aside: the reason EINTR *has to* be specified this way is that pthread > > > cancellation is aligned with EINTR. If EINTR were defined to have > > > closed the fd, then acting on cancellation during close would also > > > have closed the fd, but the cancellation handler would have no way to > > > distinguish this, leading to a situation where you're forced to either > > > leak fds or introduce a double-close vuln. > > > > The correct way to address this would be to make close() not be a > > cancellation point. > > This would also be a desirable change, one I would support if other > implementors are on-board with pushing for it. > > > > An outline of what I'd like to see instead: > > > > > > - Clear explanation of why double-close is a serious bug that must > > > always be avoided. (I think we all agree on this.) > > > > > > - Statement that the historical Linux/glibc behavior and current POSIX > > > requirement differ, without language that tries to paint the POSIX > > > behavior as a HP-UX bug/quirk. 
Possibly citing real sources/history > > > of the issue (Austin Group tracker items 529, 614; maybe others). > > > > > > - Consequence of just assuming the Linux behavior (fd leaks on > > > conforming systems). > > > > > > - Consequences of assuming the POSIX behavior (double-close vulns on > > > GNU/Linux, maybe others). > > > > > > - Survey of methods for avoiding the problem (ways to preclude EINTR, > > > possibly ways to infer behavior, etc). > > > > This outline seems more or less reasonable to me but, if it's me > > writing the text, I _will_ characterize what POSIX currently says > > about EINTR returns from close() as a bug in POSIX. As far as I'm > > concerned, that is a fact, not polemic. > > > > I have found that arguing with you in particular, Rich, is generally > > not worth the effort. Therefore, unless you reply and _accept_ that > > the final version of the close manpage will say that POSIX is buggy, > > I am not going to write another version of this text, nor will I be > > drawn into further debate. > > I will not accept that because it's a gross violation of the > responsibility of document writing. > > Rich -- <https://www.alejandro-colomar.es> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-20 20:35 ` Alejandro Colomar @ 2026-01-20 20:42 ` Alejandro Colomar 2026-01-23 0:33 ` Zack Weinberg 0 siblings, 1 reply; 56+ messages in thread From: Alejandro Colomar @ 2026-01-20 20:42 UTC (permalink / raw) To: Rich Felker Cc: Zack Weinberg, Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development [-- Attachment #1: Type: text/plain, Size: 9759 bytes --] On Tue, Jan 20, 2026 at 09:35:43PM +0100, Alejandro Colomar wrote: > Hi Rich, Zack, > > On Tue, Jan 20, 2026 at 12:46:59PM -0500, Rich Felker wrote: > > On Tue, Jan 20, 2026 at 12:05:52PM -0500, Zack Weinberg wrote: > > > > On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote: > > [...] > > > > Now, the abstract correct behavior is secondary to the fact that we > > > know there are both systems where close should not be retried after > > > EINTR (Linux) and systems where the fd is still open after EINTR > > > (HP-UX). But it is my position that *portable code* should assume the > > > Linux behavior, because that is the safest option. If you assume the > > > HP-UX behavior on a machine that implements the Linux behavior, you > > > might close some unrelated file out from under yourself (probably but > > > not necessarily a different thread). If you assume the Linux behavior > > > on a machine that implements the HP-UX behavior, you have leaked a > > > file descriptor; the worst things that can do are much less severe. > > > > Unfortunately, regardless of what happens, code portable to old > > systems needs to avoid getting in the situation to begin with. By > > either not installing interrupting signal handlers or blocking EINTR > > around close. > > [...] > > > > > While I agree with all of this, I think the tone is way too > > > > proscriptive. The man pages are to document the behaviors, not tell > > > > people how to program. 
> > > > > > I could be persuaded to tone it down a little but in this case I think > > > the man page's job *is* to tell people how to program. We know lots of > > > existing code has gotten the fine details of close() wrong and we are > > > trying to document how to do it right. > > > > No, the job of the man pages absolutely is not "to tell people how to > > program". It's to document behaviors. They are not a programming > > tutorial. They are not polemic diatribes. They are unbiased statements > > of facts. Facts of what the standards say and what implementations do, > > that equip programmers with the knowledge they need to make their own > > informed decisions, rather than blindly following what someone who > > thinks they know better told them to do. > > This reminds me a little bit of the realloc(p,0) fiasco of C89 and > glibc. > > In most cases, I agree with you that manual pages are and should be > aseptic, there are cases where I think the manual page needs to be > tutorial. Especially when there's such a mess, we need to both explain > all the possible behaviors (or at least mention them to some degree). ... and guide programmers about how to best use the API. I forgot to finish the sentence. > > But for example, there's the case of realloc(p,0), where we have > a fiasco that was pushed by a compoundment of wrong decisions by the > C Committee, and prior to that from System V. We're a bit lucky that > C17 accidentally broke it so badly that we now have it as UB, and that > gives us the opportunity to fix it now (which BTW might also be the case > for close(2)). > > In the case of realloc(3), I went and documented in the manual page that > glibc is broken, and that ISO C is also broken. > > STANDARDS > malloc() > free() > calloc() > realloc() > C23, POSIX.1‐2024. > > reallocarray() > POSIX.1‐2024. 
> > realloc(p, 0) > The behavior of realloc(p, 0) in glibc doesn’t conform to > any of C99, C11, POSIX.1‐2001, POSIX.1‐2004, POSIX.1‐2008, > POSIX.1‐2013, POSIX.1‐2017, or POSIX.1‐2024. The C17 > specification was changed to make it conforming, but that > specification made it impossible to write code that reli‐ > ably determines if the input pointer is freed after real‐ > loc(p, 0), and C23 changed it again to make this undefined > behavior, acknowledging that the C17 specification was > broad enough, so that undefined behavior wasn’t worse than > that. > > reallocarray() suffers the same issues in glibc. > > musl libc and the BSDs conform to all versions of ISO C > and POSIX.1. > > gnulib provides the realloc‐posix module, which provides > wrappers realloc() and reallocarray() that conform to all > versions of ISO C and POSIX.1. > > There’s a proposal to standardize the BSD behavior: https: > //www.open-std.org/jtc1/sc22/wg14/www/docs/n3621.txt. > > HISTORY > malloc() > free() > calloc() > realloc() > POSIX.1‐2001, C89. > > reallocarray() > glibc 2.26. OpenBSD 5.6, FreeBSD 11.0. > > malloc() and related functions rejected sizes greater than > PTRDIFF_MAX starting in glibc 2.30. > > free() preserved errno starting in glibc 2.33. > > realloc(p, 0) > C89 was ambiguous in its specification of realloc(p, 0). > C99 partially fixed this. > > The original implementation in glibc would have been con‐ > forming to C99. However, and ironically, trying to comply > with C99 before the standard was released, glibc changed > its behavior in glibc 2.1.1 into something that ended up > not conforming to the final C99 specification (but this is > debated, as the wording of the standard seems self‐contra‐ > dicting). > > ... > > BUGS > Programmers would naturally expect by induction that > realloc(p, size) is consistent with free(p) and mal‐ > loc(size), as that is the behavior in the general case. 
> This is not explicitly required by POSIX.1‐2024 or C11, > but all conforming implementations are consistent with > that. > > The glibc implementation of realloc() is not consistent > with that, and as a consequence, it is dangerous to call > realloc(p, 0) in glibc. > > A trivial workaround for glibc is calling it as > realloc(p, size?size:1). > > The workaround for reallocarray() in glibc ——which shares > the same bug—— would be > reallocarray(p, n?n:1, size?size:1). > > > Apart from documenting that glibc and ISO C are broken, we document how > to best deal with it (see the last paragraph in BUGS). This is > necessary because I fear that just by documenting the different > behaviors, programmers would still not know what to do with that. > Just take into account that even several members of the committee don't > know how to deal with it. > > I'd be willing to have something similar for close(2). > > > Have a lovely night! > Alex > > P.S.: I have great news about realloc(p,0)! Microsoft is on-board with > the change. They told me they like the proposal, and are willing to > fix their realloc(3) implementation. They'll now conduct tests to make > sure it doesn't break anything too badly, and will come back to me with > any feedback they have from those tests. > > I'll put the standards proposal for realloc(3) on hold, waiting for > Microsoft's feedback. > > > > > Aside: the reason EINTR *has to* be specified this way is that pthread > > > > cancellation is aligned with EINTR. If EINTR were defined to have > > > > closed the fd, then acting on cancellation during close would also > > > > have closed the fd, but the cancellation handler would have no way to > > > > distinguish this, leading to a situation where you're forced to either > > > > leak fds or introduce a double-close vuln. > > > > > > The correct way to address this would be to make close() not be a > > > cancellation point. 
> > > > This would also be a desirable change, one I would support if other > > implementors are on-board with pushing for it. > > > > > > An outline of what I'd like to see instead: > > > > > > > > - Clear explanation of why double-close is a serious bug that must > > > > always be avoided. (I think we all agree on this.) > > > > > > > > - Statement that the historical Linux/glibc behavior and current POSIX > > > > requirement differ, without language that tries to paint the POSIX > > > > behavior as a HP-UX bug/quirk. Possibly citing real sources/history > > > > of the issue (Austin Group tracker items 529, 614; maybe others). > > > > > > > > - Consequence of just assuming the Linux behavior (fd leaks on > > > > conforming systems). > > > > > > > > - Consequences of assuming the POSIX behavior (double-close vulns on > > > > GNU/Linux, maybe others). > > > > > > > > - Survey of methods for avoiding the problem (ways to preclude EINTR, > > > > possibly ways to infer behavior, etc). > > > > > > This outline seems more or less reasonable to me but, if it's me > > > writing the text, I _will_ characterize what POSIX currently says > > > about EINTR returns from close() as a bug in POSIX. As far as I'm > > > concerned, that is a fact, not polemic. > > > > > > I have found that arguing with you in particular, Rich, is generally > > > not worth the effort. Therefore, unless you reply and _accept_ that > > > the final version of the close manpage will say that POSIX is buggy, > > > I am not going to write another version of this text, nor will I be > > > drawn into further debate. > > > > I will not accept that because it's a gross violation of the > > responsibility of document writing. > > > > Rich > > -- > <https://www.alejandro-colomar.es> -- <https://www.alejandro-colomar.es> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-20 20:42 ` Alejandro Colomar
@ 2026-01-23 0:33 ` Zack Weinberg
2026-01-23 1:02 ` Alejandro Colomar
2026-01-24 19:34 ` The 8472
0 siblings, 2 replies; 56+ messages in thread
From: Zack Weinberg @ 2026-01-23 0:33 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner,
Rich Felker, linux-fsdevel, linux-api, GNU libc development

Alright, since it actually seems possible we might be having a
reasonable conversation about the close manpage now, I've done another
draft. I *think* this covers all the concerns expressed so far.
I am feeling somewhat more charitable toward the Austin Group after
close-reading the current POSIX spec for close, so there is no BUGS
section after all. In their shoes I would still have disallowed EINTR
returns from close altogether, but I can see why they felt that was
a step too far.

This is a full top-to-bottom rewrite of the manpage; please speak up
if you don't like any of my changes to any of it, not just the new
stuff about delayed errors. It's written in freeform text for ease of
reading; I'll do proper troff markup after the text is finalized.
(Alejandro, do you have a preference between -man and -mdoc markup?)

Please note the [QUERY:] sections sprinkled throughout NOTES. I would
like to have answers to those questions for the final draft.

zw

NAME
       close - close a file descriptor

LIBRARY
       Standard C library (libc, -lc)

SYNOPSIS
       #include <unistd.h>

       int close(int fd);

DESCRIPTION
       close() closes a file descriptor, so that it no longer refers to
       any file and may be reused.

       When the last file descriptor referring to an underlying open
       file description (see open(2)) is closed, the resources
       associated with the open file description are freed. If that
       open file description is the last reference to a file which has
       been removed using unlink(2), the file is deleted.

       When *any* file descriptor is closed, all record locks held by
       the *process*, on the file formerly referred to by that file
       descriptor, are released. This happens even if the file is still
       open in the process via a different file descriptor. See
       fcntl(2) for discussion of the consequences, and for
       alternatives with less surprising semantics.

       close() may report a *delayed error* from previous I/O
       operations on a file. When it does this, the file descriptor has
       still been closed, but the error needs to be handled. See RETURN
       VALUE, ERRORS, and NOTES for further discussion of what the
       errors reported by close mean, and how to handle them.

       Despite the possibility of delayed errors, a successful close()
       does *not* guarantee that all data written to the file has been
       successfully saved to persistent storage. If you need such
       a guarantee, use fsync(2); see that page for details.

       The close-on-exec file descriptor flag can be used to ensure
       that a file descriptor is automatically closed upon a successful
       execve(2); see fcntl(2) for details.

RETURN VALUE
       close() returns zero if the descriptor has been closed and there
       were no delayed errors to report. It returns -1 if there was an
       error that prevented the file descriptor from being closed, *or*
       if the descriptor has successfully been closed but there was
       a delayed error to report. The errno code can be used to
       distinguish them; see ERRORS and NOTES.

ERRORS
       EBADF  The fd argument was not a valid, open file descriptor.

       EINTR  The close() call was interrupted by a signal. The file
              descriptor *may or may not* have been closed, depending
              on the operating system. See “Signals and close(),”
              below.

       EINPROGRESS
              [POSIX.1-2024 only] The close() call was interrupted by
              a signal, after the file descriptor number was released
              for reuse, but before all clean-up work had been
              completed. The file descriptor has been closed, and
              a delayed error may have been lost. See “Signals and
              close(),” below.

       EIO
       ESTALE
       EDQUOT
       EFBIG
       ENOSPC
              These error codes indicate a delayed error from
              a previous write(2) operation. The file descriptor has
              been closed, but the error needs to be handled. See
              “Delayed errors reported by close()”, below.

       Depending on the underlying file and/or file system, close() may
       return with other errno codes besides those listed. All such
       codes also indicate delayed errors.

NOTES
   Multithreaded processes and close()
       In a multithreaded program, each thread must take care not to
       accidentally close file descriptors that are in use by other
       threads. Because system calls that *open* files, sockets, etc.
       always allocate the lowest file descriptor number that’s not in
       use, file descriptor numbers are rapidly reused. Closing an fd
       that another thread is still using is therefore likely to cause
       data to be read or written to the wrong place.

       Sometimes programs *deliberately* close a file descriptor that
       is in use by another thread, intending to cancel any blocking
       I/O operation that the other thread is performing. Whether this
       works depends on the operating system. On Linux, it doesn’t
       work; a blocking I/O system call holds a direct reference to the
       underlying open file description that is the target of the I/O,
       and is unaffected by the program closing the file descriptor
       that was used to initiate the I/O operation. (See open(2) for
       a discussion of open file descriptions.)

   Delayed errors reported by close()
       In a variety of situations, most notably when writing to a file
       that is hosted on a network file server, write(2) operations may
       “optimistically” return successfully as soon as the write has
       been queued for processing. close(2) waits for confirmation that
       *most* of the processing for previous writes to a file has been
       completed, and reports any errors that the earlier write() calls
       *would have* reported, if they hadn’t returned optimistically.
Especially, close() will report “disk full” (ENOSPC) and “disk quota exceeded” (EDQUOT) errors that write() didn’t wait for. (To wait for *all* processing to complete, it is necessary to use fsync(2) as well.) Because of these delayed errors, it’s important to check the return value of close() and handle any errors it reports. Ignoring delayed errors can cause silent loss of data. However, when handling delayed errors, keep in mind that the close() call should *not* be repeated. When close() has a delayed error to report, it still closes the file before returning. The file descriptor number might already have been reused for some other file, especially in multithreaded programs. To make another attempt at the failed writes, it’s necessary to reopen the file and start all over again. [QUERY: Do delayed errors ever happen in any of these situations? - The fd is not the last reference to the open file description - The OFD was opened with O_RDONLY - The OFD was opened with O_RDWR but has never actually been written to - No data has been written to the OFD since the last call to fsync() for that OFD - No data has been written to the OFD since the last call to fdatasync() for that OFD If we can give some guidance about when people don’t need to worry about delayed errors, it would be helpful.] Signals and close() close() waits for various I/O operations to complete; it is a blocking system call, which can be interrupted by signals and thread cancellation. As usual, when close() is interrupted by a signal, it returns -1 and sets errno to EINTR. Unlike most system calls that can be interrupted by signals, it is not safe to repeat an interrupted call to close(). Prior to POSIX.1-2024, when a close() was interrupted by a signal, it was *unspecified* whether the file descriptor was still open afterward. The authors of this manpage are aware of both systems where the file descriptor is guaranteed to still be open after an interrupted close(), e.g. 
HP-UX, and systems where it is guaranteed to be *closed* after an interrupted close(), e.g. Linux and FreeBSD. POSIX.1-2024 makes stricter requirements; operating systems should now return EINPROGRESS, rather than EINTR, when close() is interrupted before it’s completely done, but after the file descriptor number is released for reuse. As usual, though, it will be a long time before portable code can safely assume all supported systems are compliant with this new requirement. Regardless of the error code, on systems where an interrupted close() cannot be retried, an interruption means that delayed errors may be lost, and in turn *that* means data might silently be lost. Therefore, we strongly recommend that programmers avoid allowing close() to be interrupted by signals in the first place. This can be done in all the usual ways: use only signal handlers installed by sigaction(2) with the SA_RESTART flag, keep signals blocked at all times except during calls to ppoll(2), dedicate a thread to signal handling, etc. [QUERY: Do we know if close() is allowed to block or report delayed errors when no data has been written to the OFD since the last completed fsync() or fdatasync() on that OFD? If it isn’t allowed to block or report delayed errors in that case, another good recommendation would be to always use at least fdatasync() and let *that* be the thing that gets interrupted by signals. The POSIX.1-2024 RATIONALE section makes a very similar recommendation, but doesn’t appear to back that up with normative requirements on close().] STANDARDS POSIX.1-2024. HISTORY The close() system call was present in Unix V7. POSIX.1-2024 clarified the semantics of delayed errors; prior to that revision, it was unspecified whether a close() call that returned a delayed error would close the file descriptor. However, we are not aware of any systems where it didn’t.
SEE ALSO close_range(2), fcntl(2), fsync(2), fdatasync(2), shutdown(2), unlink(2), open(2), read(2), write(2), fopen(3), fclose(3) ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-23 0:33 ` Zack Weinberg @ 2026-01-23 1:02 ` Alejandro Colomar 2026-01-23 1:38 ` Al Viro 2026-01-23 14:05 ` Zack Weinberg 2026-01-24 19:34 ` The 8472 1 sibling, 2 replies; 56+ messages in thread From: Alejandro Colomar @ 2026-01-23 1:02 UTC (permalink / raw) To: Zack Weinberg Cc: Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, Rich Felker, linux-fsdevel, linux-api, GNU libc development [-- Attachment #1: Type: text/plain, Size: 1764 bytes --] Hi Zack, On Thu, Jan 22, 2026 at 07:33:58PM -0500, Zack Weinberg wrote: [...] > This is a full top-to-bottom rewrite of the manpage; please speak > up if you don't like any of my changes to any of it, not just the > new stuff about delayed errors. It's written in freeform text for > ease of reading; I'll do proper troff markup after the text is > finalized. (Alejandro, do you have a preference between -man > and -mdoc markup?) Strong preference for man(7). [...] > ERRORS > EBADF The fd argument was not a valid, open file descriptor. > > EINTR The close() call was interrupted by a signal. > The file descriptor *may or may not* have been closed, > depending on the operating system. See “Signals and > close(),” below. Punctuation like commas should go outside of the quotes (yes, I know some styles do that, but we don't). [...] > STANDARDS > POSIX.1-2024. > > HISTORY > The close() system call was present in Unix V7. That would be simply stated as: V7. We could also document the first POSIX standard, as not all Unix APIs were standardized at the same time. Thus: V7, POSIX.1-1988. Thanks! Have a lovely night! Alex > > POSIX.1-2024 clarified the semantics of delayed errors; prior > to that revision, it was unspecified whether a close() call > that returned a delayed error would close the file descriptor. > However, we are not aware of any systems where it didn’t. 
> > SEE ALSO > close_range(2), fcntl(2), fsync(2), fdatasync(2), shutdown(2), > unlink(2), open(2), read(2), write(2), fopen(3), fclose(3) -- <https://www.alejandro-colomar.es> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-23 1:02 ` Alejandro Colomar @ 2026-01-23 1:38 ` Al Viro 2026-01-23 14:44 ` Alejandro Colomar 2026-01-23 14:05 ` Zack Weinberg 1 sibling, 1 reply; 56+ messages in thread From: Al Viro @ 2026-01-23 1:38 UTC (permalink / raw) To: Alejandro Colomar Cc: Zack Weinberg, Vincent Lefevre, Jan Kara, Christian Brauner, Rich Felker, linux-fsdevel, linux-api, GNU libc development On Fri, Jan 23, 2026 at 02:02:53AM +0100, Alejandro Colomar wrote: > > HISTORY > > The close() system call was present in Unix V7. > > That would be simply stated as: > > V7. > > We could also document the first POSIX standard, as not all Unix APIs > were standardized at the same time. Thus: > > V7, POSIX.1-1988. > > Thanks! 11/3/71 SYS CLOSE (II) NAME close -- close a file SYNOPSIS (file descriptor in r0) sys close / close = 6. DESCRIPTION Given a file descriptor such as returned from an open or creat call, close closes the associated file. A close of all files is automatic on exit, but since processes are limited to 10 simultaneously open files, close is necessary to programs which deal with many files. FILES SEE ALSO creat, open DIAGNOSTICS The error bit (c—bit) is set for an unknown file descriptor. BUGS OWNER ken, dmr That's V1 manual. In V3 we already get EBADF on unopened descriptor; in _all_ cases there close(N) ends up with descriptor N not opened. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-23 1:38 ` Al Viro @ 2026-01-23 14:44 ` Alejandro Colomar 0 siblings, 0 replies; 56+ messages in thread From: Alejandro Colomar @ 2026-01-23 14:44 UTC (permalink / raw) To: Al Viro Cc: Zack Weinberg, Vincent Lefevre, Jan Kara, Christian Brauner, Rich Felker, linux-fsdevel, linux-api, GNU libc development [-- Attachment #1: Type: text/plain, Size: 1455 bytes --] Hi Al, On Fri, Jan 23, 2026 at 01:38:59AM +0000, Al Viro wrote: > On Fri, Jan 23, 2026 at 02:02:53AM +0100, Alejandro Colomar wrote: > > > HISTORY > > > The close() system call was present in Unix V7. > > > > That would be simply stated as: > > > > V7. > > > > We could also document the first POSIX standard, as not all Unix APIs > > were standardized at the same time. Thus: > > > > V7, POSIX.1-1988. > > > > Thanks! > > 11/3/71 SYS CLOSE (II) > NAME close -- close a file > SYNOPSIS (file descriptor in r0) > sys close / close = 6. > DESCRIPTION Given a file descriptor such as returned from an open or > creat call, close closes the associated file. A close of > all files is automatic on exit, but since processes are > limited to 10 simultaneously open files, close is > necessary to programs which deal with many files. > FILES > SEE ALSO creat, open > DIAGNOSTICS The error bit (c—bit) is set for an unknown file > descriptor. > BUGS > OWNER ken, dmr > > That's V1 manual. In V3 we already get EBADF on unopened descriptor; > in _all_ cases there close(N) ends up with descriptor N not opened. Thanks! Then it should actually be V1, POSIX.1-1988. Let's not document the history change from V3, as those details are better documented as part of the V3 manual and reading the sources. Have a lovely day! Alex -- <https://www.alejandro-colomar.es> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-23 1:02 ` Alejandro Colomar 2026-01-23 1:38 ` Al Viro @ 2026-01-23 14:05 ` Zack Weinberg 1 sibling, 0 replies; 56+ messages in thread From: Zack Weinberg @ 2026-01-23 14:05 UTC (permalink / raw) To: Alejandro Colomar Cc: Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, Rich Felker, linux-fsdevel, linux-api, GNU libc development On Thu, Jan 22, 2026, at 8:02 PM, Alejandro Colomar wrote: > On Thu, Jan 22, 2026 at 07:33:58PM -0500, Zack Weinberg wrote: > [...] > >> (Alejandro, do you have a preference between -man >> and -mdoc markup?) > > Strong preference for man(7). OK. >> close(),” below. > > Punctuation like commas should go outside of the quotes (yes, I know > some styles do that, but we don't). Will correct. >> HISTORY >> The close() system call was present in Unix V7. > > That would be simply stated as: > > V7. Looking at other really old system calls (fork(), open(), read(), _exit(), link()), they all say "SVr4, 4.3BSD, POSIX.1-2001" and that's what this one said too, before I changed it. I think I'll put it back the way it was. zw ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-23 0:33 ` Zack Weinberg 2026-01-23 1:02 ` Alejandro Colomar @ 2026-01-24 19:34 ` The 8472 2026-01-24 21:39 ` Rich Felker 1 sibling, 1 reply; 56+ messages in thread From: The 8472 @ 2026-01-24 19:34 UTC (permalink / raw) To: Zack Weinberg, Alejandro Colomar Cc: Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, Rich Felker, linux-fsdevel, linux-api, GNU libc development On 23/01/2026 01:33, Zack Weinberg wrote: [...] > ERRORS > EBADF The fd argument was not a valid, open file descriptor. Unfortunately EBADF from FUSE is passed through unfiltered by the kernel on close[0], that makes it more difficult to reliably detect bugs relating to double-closes of file descriptors. [...] > Delayed errors reported by close() > > In a variety of situations, most notably when writing to a file > that is hosted on a network file server, write(2) operations may > “optimistically” return successfully as soon as the write has > been queued for processing. > > close(2) waits for confirmation that *most* of the processing > for previous writes to a file has been completed, and reports > any errors that the earlier write() calls *would have* reported, > if they hadn’t returned optimistically. Especially, close() > will report “disk full” (ENOSPC) and “disk quota exceeded” > (EDQUOT) errors that write() didn’t wait for. > > (To wait for *all* processing to complete, it is necessary to > use fsync(2) as well.) > > Because of these delayed errors, it’s important to check the > return value of close() and handle any errors it reports. > Ignoring delayed errors can cause silent loss of data. > > However, when handling delayed errors, keep in mind that the > close() call should *not* be repeated. When close() has a > delayed error to report, it still closes the file before > returning. 
The file descriptor number might already have been > reused for some other file, especially in multithreaded > programs. To make another attempt at the failed writes, it’s > necessary to reopen the file and start all over again. > > [QUERY: Do delayed errors ever happen in any of these situations? > > - The fd is not the last reference to the open file description > > - The OFD was opened with O_RDONLY > > - The OFD was opened with O_RDWR but has never actually > been written to > > - No data has been written to the OFD since the last call to > fsync() for that OFD > > - No data has been written to the OFD since the last call to > fdatasync() for that OFD > > If we can give some guidance about when people don’t need to > worry about delayed errors, it would be helpful.] > The Rust standard library team is also interested in this topic, there is lively discussion[1] whether it makes sense to surface errors from close at all. Our current default is to ignore them. It is my understanding that errors may not have happened yet at the time of close due to delayed writeback or additional descriptors pointing to the description, e.g. in a forked child, and thus close() is not a reliable mechanism for error detection and fsync() is the only available option. Some users do care specifically about the unusual behavior on NFS, and don't want to use a heavy hammer like fsync. It's unfortunate that there's no middle ground to get errors on an open file descriptor or initiate the NFS flush behavior without a full fsync. [0] https://lore.kernel.org/linux-fsdevel/1b946a20-5e8a-497e-96ef-f7b1e037edcb@infinite-source.de/ [1] https://github.com/rust-lang/libs-team/issues/705 ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-24 19:34 ` The 8472 @ 2026-01-24 21:39 ` Rich Felker 2026-01-24 21:57 ` The 8472 0 siblings, 1 reply; 56+ messages in thread From: Rich Felker @ 2026-01-24 21:39 UTC (permalink / raw) To: The 8472 Cc: Zack Weinberg, Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development On Sat, Jan 24, 2026 at 08:34:01PM +0100, The 8472 wrote: > On 23/01/2026 01:33, Zack Weinberg wrote: > > [...] > > > ERRORS > > EBADF The fd argument was not a valid, open file descriptor. > > Unfortunately EBADF from FUSE is passed through unfiltered by the kernel > on close[0], that makes it more difficult to reliably detect bugs relating > to double-closes of file descriptors. Wow, that's a nasty bug. Are the kernel folks not amenable to fixing it? I wonder if that could even have security implications. I think you could detect these fraudulent EBADFs (albeit not under conditions where there's a race bug) by performing fcntl/F_GETFD before close and knowing the EBADF from close is fake if fcntl didn't return EBADF, but that seems like an unreasonable cost to work around FUSE behaving badly. Rich ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-24 21:39 ` Rich Felker @ 2026-01-24 21:57 ` The 8472 2026-01-25 15:37 ` Zack Weinberg 0 siblings, 1 reply; 56+ messages in thread From: The 8472 @ 2026-01-24 21:57 UTC (permalink / raw) To: Rich Felker Cc: Zack Weinberg, Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development On 24/01/2026 22:39, Rich Felker wrote: > On Sat, Jan 24, 2026 at 08:34:01PM +0100, The 8472 wrote: >> On 23/01/2026 01:33, Zack Weinberg wrote: >> >> [...] >> >>> ERRORS >>> EBADF The fd argument was not a valid, open file descriptor. >> >> Unfortunately EBADF from FUSE is passed through unfiltered by the kernel >> on close[0], that makes it more difficult to reliably detect bugs relating >> to double-closes of file descriptors. > > Wow, that's a nasty bug. Are the kernel folks not amenable to fixing > it? Not when I brought it up last time, no.[0] > I wonder if that could even have security implications. I think > you could detect these fraudulent EBADFs (albeit not under conditions > where there's a race bug) by performing fcntl/F_GETFD before close and > knowing the EBADF from close is fake if fcntl didn't return EBADF, but that > seems like an unreasonable cost to work around FUSE behaving badly. > > Rich That's pretty much the workaround[1] we use, but due to the extra syscall it's only done in debug builds. [0] https://lore.kernel.org/linux-fsdevel/1b946a20-5e8a-497e-96ef-f7b1e037edcb@infinite-source.de/ [1] https://github.com/rust-lang/rust/blob/021fc25b7a48f6051bee1e1f06c7a277e4de1cc9/library/std/src/sys/fs/unix.rs#L981-L999 ^ permalink raw reply [flat|nested] 56+ messages in thread
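The fcntl/F_GETFD workaround discussed above could look roughly like the following sketch in C. The names close_checked() and enum close_status are hypothetical, chosen only for illustration; this is not the Rust standard library's code. As noted in the thread, the check is not race-free: another thread could close or reuse the descriptor between the two system calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical result type for this sketch. */
enum close_status {
    CLOSE_OK = 0,     /* close() returned 0 */
    CLOSE_BAD_FD,     /* fd was not open beforehand: likely a caller bug */
    CLOSE_FS_ERROR    /* fd was open; close() failed, errno says why */
};

/*
 * Probe the descriptor with fcntl(F_GETFD) before closing it.  If the
 * probe says the descriptor is open, an EBADF returned by close()
 * afterwards did not come from the descriptor table, so it is reported
 * as an ordinary error from the filesystem (CLOSE_FS_ERROR).
 */
static enum close_status close_checked(int fd)
{
    if (fcntl(fd, F_GETFD) == -1 && errno == EBADF)
        return CLOSE_BAD_FD;

    if (close(fd) == 0)
        return CLOSE_OK;

    /* errno is left as set by close(); the call is not retried. */
    return CLOSE_FS_ERROR;
}
```

The extra system call per close is the cost mentioned in the thread, which is why such a check might be limited to debug builds.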
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-24 21:57 ` The 8472 @ 2026-01-25 15:37 ` Zack Weinberg 2026-01-26 8:51 ` Florian Weimer 2026-01-26 12:15 ` Jan Kara 0 siblings, 2 replies; 56+ messages in thread From: Zack Weinberg @ 2026-01-25 15:37 UTC (permalink / raw) To: The 8472, Rich Felker Cc: Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote: > On 24/01/2026 22:39, Rich Felker wrote: >> On Sat, Jan 24, 2026 at 08:34:01PM +0100, The 8472 wrote: >>> On 23/01/2026 01:33, Zack Weinberg wrote: >>> >>> [...] >>> >>>> ERRORS >>>> EBADF The fd argument was not a valid, open file descriptor. >>> >>> Unfortunately EBADF from FUSE is passed through unfiltered by the kernel >>> on close[0], that makes it more difficult to reliably detect bugs relating >>> to double-closes of file descriptors. >> >> Wow, that's a nasty bug. Are the kernel folks not amenable to fixing >> it? > > Not when I brought it up last time, no[0] > > [0] https://lore.kernel.org/linux-fsdevel/1b946a20-5e8a-497e-96ef-f7b1e037edcb@infinite-source.de/ It seems to me that Antonio Muscemi’s point is valid for *most* errno codes. Like, a whole lot of them exist just to give more information *to a human user* about the cause of an unrecoverable error. Take the list of “error codes that indicate a delayed error from a previous write(2) operation,” from a little later in the draft, for instance: there’s no plausible way for a *program* to react differently to EFBIG, EDQUOT, and ENOSPC, but we expect that the *user* will want to react differently, so we want different error messages for each, so they’re different error codes. It’s not a problem if the kernel produces an error code of this type that wasn’t in the official documented list, because the program doesn’t need to treat it specially. 
But EBADF is different; it has the very specific meaning “user space passed an invalid file descriptor to a system call,” which almost always indicates a *bug in the program*, and allowing that meaning to be diluted is not OK. It’s getting off topic for this conversation, but there’s a short list of other errno codes that indicate a specific situation that the *program* should respond to in a specific way (EAGAIN, EINTR, EINPROGRESS, EFAULT, and EPIPE are the only ones I can think of) and maybe it would spark a more constructive conversation on the kernel side if we presented a *comprehensive* list of errno codes that FUSE servers shouldn’t be allowed to produce with a specific rationale for each. >> Delayed errors reported by close() >> >> In a variety of situations, most notably when writing to a file >> that is hosted on a network file server, write(2) operations may >> “optimistically” return successfully as soon as the write has >> been queued for processing. >> >> close(2) waits for confirmation that *most* of the processing >> for previous writes to a file has been completed, and reports >> any errors that the earlier write() calls *would have* reported, >> if they hadn’t returned optimistically. Especially, close() >> will report “disk full” (ENOSPC) and “disk quota exceeded” >> (EDQUOT) errors that write() didn’t wait for. > > The Rust standard library team is also interested in this topic, there > is lively discussion[1] whether it makes sense to surface errors from > close at all. Our current default is to ignore them. > It is my understanding that errors may not have happened yet at > the time of close due to delayed writeback or additional descriptors > pointing to the description, e.g. in a forked child, and thus > close() is not a reliable mechanism for error detection and > fsync() is the only available option. 
>
> [1] https://github.com/rust-lang/libs-team/issues/705

This is something I care about a lot as well, but I currently don’t have an *opinion*. To form an informed opinion, I need the answers to these questions:

>> [QUERY: Do delayed errors ever happen in any of these situations?
>>
>> - The fd is not the last reference to the open file description
>>
>> - The OFD was opened with O_RDONLY
>>
>> - The OFD was opened with O_RDWR but has never actually
>> been written to
>>
>> - No data has been written to the OFD since the last call to
>> fsync() for that OFD
>>
>> - No data has been written to the OFD since the last call to
>> fdatasync() for that OFD
>>
>> If we can give some guidance about when people don’t need to
>> worry about delayed errors, it would be helpful.]

In particular, I really hope delayed errors *aren’t* ever reported when you close a file descriptor that *isn’t* the last reference to its open file description, because the thread-safe way to close stdout without losing write errors[2] depends on that not happening.

And whether the Rust stdlib can legitimately say “leaving aside the additional cost of calling fsync(), you do not *need* the error return from close() because you can call fsync() first,” depends on whether it’s actually true that you *won’t* ever get a delayed error from close() if you called fsync() first and didn’t do any more output in between (assume the fd has no duplicates here). I would not be surprised at all if those FUSE guys insisted on their right to make

    char msg[] = "soon I will be invincible\n";
    int fd = open("/test-fuse-fs/test.txt", O_WRONLY, 0666);
    write(fd, msg, sizeof(msg) - 1);
    fsync(fd);
    close(fd);

return an error *only* from the close, not the write or the fsync. And I also wouldn’t be surprised at all to find production NFS or SMB servers that did that.

[2] https://stackoverflow.com/a/50865617 (third code block)

zw

^ permalink raw reply [flat|nested] 56+ messages in thread
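For reference, the write/fsync/close sequence in the message above can be written with every return value checked, roughly as in the following sketch. The helper name write_sync_close() is hypothetical, and the O_CREAT | O_TRUNC flags are an assumption added so the sketch works on a path that does not exist yet (the original snippet used O_WRONLY only). It reports the first error seen and calls close() exactly once.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write msg to path, fsync(), then close().
 * Returns 0 if every step succeeded; otherwise returns -1 with errno
 * set to the error of the first step that failed.
 */
static int write_sync_close(const char *path, const char *msg)
{
    size_t len = strlen(msg), done = 0;
    int err = 0;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);

    if (fd == -1)
        return -1;

    while (done < len) {
        ssize_t n = write(fd, msg + done, len - done);
        if (n == -1) {
            if (errno == EINTR)
                continue;   /* write() may be retried; close() may not */
            err = errno;
            break;
        }
        done += (size_t)n;
    }

    if (err == 0 && fsync(fd) == -1)
        err = errno;

    /* close() is called once; a failure here is kept only if nothing
     * failed earlier, and the descriptor is not touched again. */
    if (close(fd) == -1 && err == 0)
        err = errno;

    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}
```

Whether the final close() can still report an error after a successful fsync() is exactly the open question in this part of the thread; the sketch only makes sure such an error would not be dropped.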
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-25 15:37 ` Zack Weinberg @ 2026-01-26 8:51 ` Florian Weimer 2026-01-26 12:15 ` Jan Kara 1 sibling, 0 replies; 56+ messages in thread From: Florian Weimer @ 2026-01-26 8:51 UTC (permalink / raw) To: Zack Weinberg Cc: The 8472, Rich Felker, Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development * Zack Weinberg: > In particular, I really hope delayed errors *aren’t* ever reported > when you close a file descriptor that *isn’t* the last reference > to its open file description, because the thread-safe way to close > stdout without losing write errors[2] depends on that not happening. > [2] https://stackoverflow.com/a/50865617 (third code block) Are you sure about that? It means that errors are never reported if a shell script redirects standard output over multiple commands. Thanks, Florian ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-25 15:37 ` Zack Weinberg 2026-01-26 8:51 ` Florian Weimer @ 2026-01-26 12:15 ` Jan Kara 2026-01-26 13:53 ` The 8472 1 sibling, 1 reply; 56+ messages in thread From: Jan Kara @ 2026-01-26 12:15 UTC (permalink / raw) To: Zack Weinberg Cc: The 8472, Rich Felker, Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development On Sun 25-01-26 10:37:01, Zack Weinberg wrote: > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote: > >> Delayed errors reported by close() > >> > >> In a variety of situations, most notably when writing to a file > >> that is hosted on a network file server, write(2) operations may > >> “optimistically” return successfully as soon as the write has > >> been queued for processing. > >> > >> close(2) waits for confirmation that *most* of the processing > >> for previous writes to a file has been completed, and reports > >> any errors that the earlier write() calls *would have* reported, > >> if they hadn’t returned optimistically. Especially, close() > >> will report “disk full” (ENOSPC) and “disk quota exceeded” > >> (EDQUOT) errors that write() didn’t wait for. > > > > The Rust standard library team is also interested in this topic, there > > is lively discussion[1] whether it makes sense to surface errors from > > close at all. Our current default is to ignore them. > > It is my understanding that errors may not have happened yet at > > the time of close due to delayed writeback or additional descriptors > > pointing to the description, e.g. in a forked child, and thus > > close() is not a reliable mechanism for error detection and > > fsync() is the only available option. > > > > [1] https://github.com/rust-lang/libs-team/issues/705 > > This is something I care about a lot as well, but I currently don’t > have an *opinion*. 
To form an informed opinion, I need the answers > to these questions: > > >> [QUERY: Do delayed errors ever happen in any of these situations? > >> > >> - The fd is not the last reference to the open file description > >> > >> - The OFD was opened with O_RDONLY > >> > >> - The OFD was opened with O_RDWR but has never actually > >> been written to > >> > >> - No data has been written to the OFD since the last call to > >> fsync() for that OFD > >> > >> - No data has been written to the OFD since the last call to > >> fdatasync() for that OFD > >> > >> If we can give some guidance about when people don’t need to > >> worry about delayed errors, it would be helpful.] > > In particular, I really hope delayed errors *aren’t* ever reported > when you close a file descriptor that *isn’t* the last reference > to its open file description, because the thread-safe way to close > stdout without losing write errors[2] depends on that not happening. So I've checked, and in Linux the ->flush callback for the file is called whenever you close a file descriptor (regardless of whether there are other file descriptors pointing to the same file description), so it's up to the filesystem implementation what it decides to do and which error it will return... Checking the implementations, e.g. FUSE and NFS *will* return delayed writeback errors on the *first* descriptor close even if there are other still-open descriptors for the description, AFAICS. > And whether the Rust stdlib can legitimately say “leaving aside the > additional cost of calling fsync(), you do not *need* the error return > from close() because you can call fsync() first,” depends on whether > it’s actually true that you *won’t* ever get a delayed error from > close() if you called fsync() first and didn’t do any more output in > between (assume the fd has no duplicates here).
I would not be > surprised at all if those FUSE guys insisted on their right to make
>
> char msg[] = "soon I will be invincible\n";
> int fd = open("/test-fuse-fs/test.txt", O_WRONLY, 0666);
> write(fd, msg, sizeof(msg) - 1);
> fsync(fd);
> close(fd);
>
> return an error *only* from the close, not the write or the fsync.

So fsync(2) must make sure data is persistently stored and return an error if it was not. Thus, as a VFS person, I'd consider it a filesystem bug if an error preventing reading data later was not returned from fsync(2). OTOH that doesn't necessarily mean that a later close doesn't return an error, e.g. FUSE does communicate with the server on close, which can fail, and an error can be returned. With this in mind let me now try to answer your remaining questions:

> >> - The OFD was opened with O_RDONLY

If the filesystem supports atime, close can in principle report that an atime update failed.

> >> - The OFD was opened with O_RDWR but has never actually
> >> been written to

The same as above but with inode mtime updates.

> >> - No data has been written to the OFD since the last call to
> >> fsync() for that OFD

No writeback errors should happen in this case. As I wrote above, I'd consider this a filesystem bug.

> >>
> >> - No data has been written to the OFD since the last call to
> >> fdatasync() for that OFD

Errors can happen because some inode metadata (in practice probably only inode time stamps) may still need to be written out.

So in the cases described above (except for fsync()) you may get delayed errors on close. But since in all those cases no data is lost, I don't think 99.9% of applications care at all...

Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-26 12:15 ` Jan Kara @ 2026-01-26 13:53 ` The 8472 2026-01-26 15:56 ` Jan Kara 0 siblings, 1 reply; 56+ messages in thread From: The 8472 @ 2026-01-26 13:53 UTC (permalink / raw) To: Jan Kara, Zack Weinberg Cc: The 8472, Rich Felker, Alejandro Colomar, Vincent Lefevre, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development On 26/01/2026 13:15, Jan Kara wrote: > On Sun 25-01-26 10:37:01, Zack Weinberg wrote: >> On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote: >> >>>> [QUERY: Do delayed errors ever happen in any of these situations? >>>> >>>> - The fd is not the last reference to the open file description >>>> >>>> - The OFD was opened with O_RDONLY >>>> >>>> - The OFD was opened with O_RDWR but has never actually >>>> been written to >>>> >>>> - No data has been written to the OFD since the last call to >>>> fsync() for that OFD >>>> >>>> - No data has been written to the OFD since the last call to >>>> fdatasync() for that OFD >>>> >>>> If we can give some guidance about when people don’t need to >>>> worry about delayed errors, it would be helpful.] >> >> In particular, I really hope delayed errors *aren’t* ever reported >> when you close a file descriptor that *isn’t* the last reference >> to its open file description, because the thread-safe way to close >> stdout without losing write errors[2] depends on that not happening. > > So I've checked and in Linux ->flush callback for the file is called > whenever you close a file descriptor (regardless whether there are other > file descriptors pointing to the same file description) so it's upto > filesystem implementation what it decides to do and which error it will > return... Checking the implementations e.g. FUSE and NFS *will* return > delayed writeback errors on *first* descriptor close even if there are > other still open descriptors for the description AFAICS. 
Regarding the "first", does that mean the errors only get delivered once? I.e. if a concurrent fork/exec happens for process spawning and the fork-child closes the file descriptors then this closing may basically receive the errors and the parent will not see them (unless additional errors happen)? Or if _any_ part of the program dups the descriptor and then closes it without reporting errors then all uses of those descriptor must consider error delivery on close to be unreliable? ^ permalink raw reply [flat|nested] 56+ messages in thread
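The scenario asked about above, where some part of a program dups a descriptor and closes the duplicate, can be sketched as follows. This only illustrates the descriptor mechanics (two descriptors, one open file description with a shared offset, and a separate close() return value for each descriptor); it does not reproduce a writeback error, which would need a filesystem such as NFS or FUSE. The helper name dup_shares_description() and the scratch path passed to it are hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical demonstration.  Returns 1 if the duplicate shares the
 * file offset with the original and the original stays usable after
 * the duplicate is closed, 0 if not, and -1 if setup failed.
 */
static int dup_shares_description(const char *path)
{
    int fd, fd2, ok;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    fd2 = dup(fd);
    if (fd2 == -1) {
        close(fd);
        unlink(path);
        return -1;
    }

    /* A write through the duplicate moves the offset seen via fd,
     * because both descriptors refer to one open file description. */
    ok = write(fd2, "abc", 3) == 3 && lseek(fd, 0, SEEK_CUR) == 3;

    /* This close() has its own return value.  Code that ignores it
     * drops whatever the filesystem chose to report at this point. */
    if (close(fd2) == -1)
        ok = 0;

    /* The original descriptor is still open and usable. */
    ok = ok && write(fd, "d", 1) == 1;

    if (close(fd) == -1)
        ok = 0;
    unlink(path);
    return ok;
}
```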
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-26 13:53 ` The 8472 @ 2026-01-26 15:56 ` Jan Kara 2026-01-26 16:43 ` Jeff Layton 0 siblings, 1 reply; 56+ messages in thread From: Jan Kara @ 2026-01-26 15:56 UTC (permalink / raw) To: The 8472 Cc: Jan Kara, Zack Weinberg, Rich Felker, Alejandro Colomar, Vincent Lefevre, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development, Jeff Layton On Mon 26-01-26 14:53:12, The 8472 wrote: > On 26/01/2026 13:15, Jan Kara wrote: > > On Sun 25-01-26 10:37:01, Zack Weinberg wrote: > > > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote: > > > > > [QUERY: Do delayed errors ever happen in any of these situations? > > > > > > > > > > - The fd is not the last reference to the open file description > > > > > > > > > > - The OFD was opened with O_RDONLY > > > > > > > > > > - The OFD was opened with O_RDWR but has never actually > > > > > been written to > > > > > > > > > > - No data has been written to the OFD since the last call to > > > > > fsync() for that OFD > > > > > > > > > > - No data has been written to the OFD since the last call to > > > > > fdatasync() for that OFD > > > > > > > > > > If we can give some guidance about when people don’t need to > > > > > worry about delayed errors, it would be helpful.] > > > > > > In particular, I really hope delayed errors *aren’t* ever reported > > > when you close a file descriptor that *isn’t* the last reference > > > to its open file description, because the thread-safe way to close > > > stdout without losing write errors[2] depends on that not happening. > > > > So I've checked and in Linux ->flush callback for the file is called > > whenever you close a file descriptor (regardless whether there are other > > file descriptors pointing to the same file description) so it's upto > > filesystem implementation what it decides to do and which error it will > > return... Checking the implementations e.g. 
FUSE and NFS *will* return
> > delayed writeback errors on *first* descriptor close even if there are
> > other still open descriptors for the description AFAICS.
> Regarding the "first", does that mean the errors only get delivered once?

I've added Jeff to CC who should be able to provide you with a more
authoritative answer but AFAIK the answer is yes.

E.g. NFS does:

static int
nfs_file_flush(struct file *file, fl_owner_t id)
{
...
        /* Flush writes to the server and return any errors */
        since = filemap_sample_wb_err(file->f_mapping);
        nfs_wb_all(inode);
        return filemap_check_wb_err(file->f_mapping, since);
}

which will writeback all outstanding data on the first close and report
error if it happened. Following close has nothing to flush and thus no
error to report.

That being said if you call fsync(2) you'll still get the error back again
because fsync uses a separate writeback error counter in the file
description. But again only the first fsync(2) will return the error.
Following fsyncs will report no error.

> I.e. if a concurrent fork/exec happens for process spawning and the
> fork-child closes the file descriptors then this closing may basically
> receive the errors and the parent will not see them (unless additional
> errors happen)?

Correct AFAICT.

> Or if _any_ part of the program dups the descriptor and then closes it
> without reporting errors then all uses of those descriptor must consider
> error delivery on close to be unreliable?

Correct as well AFAICT.

I should probably also add that traditional filesystems (classical local
disk based filesystems) don't bother with reporting delayed errors on
close(2) *at all*. So unless you call fsync(2) you will never learn there
was any writeback error. After all for these filesystems there are good
chances writeback didn't even start by the time you are calling close(2).
So overall I'd say that error reporting from close(2) is so random and
filesystem dependent that the errors are not worth paying attention to.
If you really care about data integrity (and thus writeback errors), you
must call fsync(2), in which case the kernel provides an at least somewhat
consistent error reporting story.

								Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
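The advice above (call fsync(2) if you care about writeback errors, and do not rely on close(2) for them) can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name `sync_and_close` is hypothetical, and the policy of reporting the fsync() error in preference to any close() error is an assumption made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): try to make the data durable
 * with fsync(), then close the descriptor exactly once.
 * Returns 0 on success, -1 with errno set otherwise.
 */
int sync_and_close(int fd)
{
    int saved = 0;

    /* fsync() is where writeback errors are reported consistently. */
    if (fsync(fd) == -1)
        saved = errno;

    /*
     * Call close() only once. On Linux the descriptor is released even
     * when close() returns an error, so a retry could hit an unrelated
     * descriptor that another thread has opened in the meantime.
     */
    if (close(fd) == -1 && saved == 0)
        saved = errno;

    if (saved != 0) {
        errno = saved;
        return -1;
    }
    return 0;
}
```

Note that fsync() fails with EINVAL or EROFS on objects that cannot be synchronized (pipes, sockets, some special files); a caller that may be handed such descriptors has to decide whether to treat that as an error.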
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-26 15:56 ` Jan Kara @ 2026-01-26 16:43 ` Jeff Layton 2026-01-26 23:01 ` Trevor Gross 0 siblings, 1 reply; 56+ messages in thread From: Jeff Layton @ 2026-01-26 16:43 UTC (permalink / raw) To: Jan Kara, The 8472 Cc: Zack Weinberg, Rich Felker, Alejandro Colomar, Vincent Lefevre, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development On Mon, 2026-01-26 at 16:56 +0100, Jan Kara wrote: > On Mon 26-01-26 14:53:12, The 8472 wrote: > > On 26/01/2026 13:15, Jan Kara wrote: > > > On Sun 25-01-26 10:37:01, Zack Weinberg wrote: > > > > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote: > > > > > > [QUERY: Do delayed errors ever happen in any of these situations? > > > > > > > > > > > > - The fd is not the last reference to the open file description > > > > > > > > > > > > - The OFD was opened with O_RDONLY > > > > > > > > > > > > - The OFD was opened with O_RDWR but has never actually > > > > > > been written to > > > > > > > > > > > > - No data has been written to the OFD since the last call to > > > > > > fsync() for that OFD > > > > > > > > > > > > - No data has been written to the OFD since the last call to > > > > > > fdatasync() for that OFD > > > > > > > > > > > > If we can give some guidance about when people don’t need to > > > > > > worry about delayed errors, it would be helpful.] > > > > > > > > In particular, I really hope delayed errors *aren’t* ever reported > > > > when you close a file descriptor that *isn’t* the last reference > > > > to its open file description, because the thread-safe way to close > > > > stdout without losing write errors[2] depends on that not happening. 
> > > > > > So I've checked and in Linux ->flush callback for the file is called > > > whenever you close a file descriptor (regardless whether there are other > > > file descriptors pointing to the same file description) so it's upto > > > filesystem implementation what it decides to do and which error it will > > > return... Checking the implementations e.g. FUSE and NFS *will* return > > > delayed writeback errors on *first* descriptor close even if there are > > > other still open descriptors for the description AFAICS. ...and I really wish they _didn't_. Reporting a writeback error on close is not particularly useful. Most filesystems don't require you to write back all data on a close(). A successful close() on those just means that no error has happened yet. Any application that cares about writeback errors needs to fsync(), full stop. > > Regarding the "first", does that mean the errors only get delivered once? > > I've added Jeff to CC who should be able to provide you with a more > authoritative answer but AFAIK the answer is yes. > > E.g. NFS does: > > static int > nfs_file_flush(struct file *file, fl_owner_t id) > { > ... > /* Flush writes to the server and return any errors */ > since = filemap_sample_wb_err(file->f_mapping); > nfs_wb_all(inode); > return filemap_check_wb_err(file->f_mapping, since); > } > > which will writeback all outstanding data on the first close and report > error if it happened. Following close has nothing to flush and thus no > error to report. > > That being said if you call fsync(2) you'll still get the error back again > because fsync uses a separate writeback error counter in the file > description. But again only the first fsync(2) will return the error. > Following fsyncs will report no error. > Note that NFS is "special" in that it will flush data on close() in order to maintain close-to-open cache consistency. 
Technically, what nfs is doing above is sampling the errseq_t in the
mapping, and then writing back any dirty data, and then checking for
errors that happened since the sample. close() will only report
writeback errors that happened within that window. If a preexisting
writeback error occurred before "since" was sampled, then it won't
report that here...which is weird, and another good argument for not
reporting or checking for writeback errors at close().

> > I.e. if a concurrent fork/exec happens for process spawning and the
> > fork-child closes the file descriptors then this closing may basically
> > receive the errors and the parent will not see them (unless additional
> > errors happen)?
>
> Correct AFAICT.
>

It will see them if it calls fsync(). Reporting on close() is iffy.

> > Or if _any_ part of the program dups the descriptor and then closes it
> > without reporting errors then all uses of those descriptor must consider
> > error delivery on close to be unreliable?
>
> Correct as well AFAICT.
>
> I should probably also add that traditional filesystems (classical local
> disk based filesystems) don't bother with reporting delayed errors on
> close(2) *at all*. So unless you call fsync(2) you will never learn there
> was any writeback error. After all for these filesystems there are good
> chances writeback didn't even start by the time you are calling close(2).
> So overall I'd say that error reporting from close(2) is so random and
> filesystem dependent that the errors are not worth paying attention to. If
> you really care about data integrity (and thus writeback errors) you must
> call fsync(2) in which case the kernel provides at least somewhat
> consistent error reporting story.
>

+1.

tl;dr: the only useful error from close() is EBADF.
--
Jeff Layton <jlayton@kernel.org>
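The sample-then-check window described above, and the separate per-description cursor that fsync() uses, can be illustrated with a small userspace toy model. This is a deliberately simplified sketch and not the kernel's errseq_t implementation: all `toy_*` names are hypothetical, the real errseq_t packs the error code, a counter and a "seen" flag into one value, and the real helpers return negative errno values.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for the per-mapping error state; NOT the kernel's errseq_t. */
struct toy_mapping {
    unsigned seq;   /* bumped every time a writeback error is recorded */
    int err;        /* most recent error, as a positive errno value */
};

/* Toy stand-in for the per-open-file-description cursor used by fsync(). */
struct toy_file {
    unsigned seen;  /* value of seq this description has already reported */
};

/* Writeback failed: remember the error and advance the sequence. */
void toy_record_error(struct toy_mapping *m, int err)
{
    m->err = err;
    m->seq++;
}

/* What the flush path does first: take a snapshot of the sequence. */
unsigned toy_sample(const struct toy_mapping *m)
{
    return m->seq;
}

/* What the flush path does last: report only errors newer than the sample. */
int toy_check_since(const struct toy_mapping *m, unsigned since)
{
    return m->seq != since ? m->err : 0;
}

/* What fsync() does: report errors newer than the cursor, then advance it. */
int toy_check_and_advance(const struct toy_mapping *m, struct toy_file *f)
{
    int err = toy_check_since(m, f->seen);
    f->seen = m->seq;
    return err;
}
```

In this model, an error recorded before `toy_sample()` is invisible to `toy_check_since()` (the close-time window) but is still reported once by `toy_check_and_advance()` (the fsync-style cursor), which mirrors the behaviour described in the two messages above.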
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-26 16:43 ` Jeff Layton @ 2026-01-26 23:01 ` Trevor Gross 2026-01-27 0:49 ` Jeff Layton 0 siblings, 1 reply; 56+ messages in thread From: Trevor Gross @ 2026-01-26 23:01 UTC (permalink / raw) To: Jeff Layton, Jan Kara, The 8472 Cc: Zack Weinberg, Rich Felker, Alejandro Colomar, Vincent Lefevre, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development On Mon Jan 26, 2026 at 10:43 AM CST, Jeff Layton wrote: > On Mon, 2026-01-26 at 16:56 +0100, Jan Kara wrote: >> On Mon 26-01-26 14:53:12, The 8472 wrote: >> > On 26/01/2026 13:15, Jan Kara wrote: >> > > On Sun 25-01-26 10:37:01, Zack Weinberg wrote: >> > > > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote: >> > > > > > [QUERY: Do delayed errors ever happen in any of these situations? >> > > > > > >> > > > > > - The fd is not the last reference to the open file description >> > > > > > >> > > > > > - The OFD was opened with O_RDONLY >> > > > > > >> > > > > > - The OFD was opened with O_RDWR but has never actually >> > > > > > been written to >> > > > > > >> > > > > > - No data has been written to the OFD since the last call to >> > > > > > fsync() for that OFD >> > > > > > >> > > > > > - No data has been written to the OFD since the last call to >> > > > > > fdatasync() for that OFD >> > > > > > >> > > > > > If we can give some guidance about when people don’t need to >> > > > > > worry about delayed errors, it would be helpful.] >> > > > >> > > > In particular, I really hope delayed errors *aren’t* ever reported >> > > > when you close a file descriptor that *isn’t* the last reference >> > > > to its open file description, because the thread-safe way to close >> > > > stdout without losing write errors[2] depends on that not happening. 
>> > >
>> > > So I've checked and in Linux ->flush callback for the file is called
>> > > whenever you close a file descriptor (regardless whether there are other
>> > > file descriptors pointing to the same file description) so it's upto
>> > > filesystem implementation what it decides to do and which error it will
>> > > return... Checking the implementations e.g. FUSE and NFS *will* return
>> > > delayed writeback errors on *first* descriptor close even if there are
>> > > other still open descriptors for the description AFAICS.
>
> ...and I really wish they _didn't_.
>
> Reporting a writeback error on close is not particularly useful. Most
> filesystems don't require you to write back all data on a close(). A
> successful close() on those just means that no error has happened yet.
>
> Any application that cares about writeback errors needs to fsync(),
> full stop.

Is there a good middle ground solution here?

It seems reasonable that an application may want to have different
handling for errors expected during normal operation, such as temporary
network failure with NFS, compared to more catastrophic things like
failure to write to disk. The reason cited around [1] for avoiding fsync
is that it comes with a cost that, for many applications, may not be
worth it unless you are dealing with NFS.

I was wondering if it could be worth a new fcntl that provides this kind
of "best effort" error checking behavior without having the strict
requirements of fsync. In effect, to report the errors that you might
currently get at close() before actually calling close() and losing the
fd.

Alternatively, it would be interesting to have a deferred fsync() that
schedules a nonblocking sync event that can be polled for completion/
errors, with flags to indicate immediate sync or allow automatic syncing
as needed. But there is probably a better alternative to this
complexity.

- Trevor

[1]: https://github.com/rust-lang/libs-team/issues/705
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-26 23:01 ` Trevor Gross @ 2026-01-27 0:49 ` Jeff Layton 2026-01-28 16:58 ` Zack Weinberg 0 siblings, 1 reply; 56+ messages in thread From: Jeff Layton @ 2026-01-27 0:49 UTC (permalink / raw) To: Trevor Gross, Jan Kara, The 8472 Cc: Zack Weinberg, Rich Felker, Alejandro Colomar, Vincent Lefevre, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development On Mon, 2026-01-26 at 17:01 -0600, Trevor Gross wrote: > On Mon Jan 26, 2026 at 10:43 AM CST, Jeff Layton wrote: > > On Mon, 2026-01-26 at 16:56 +0100, Jan Kara wrote: > > > On Mon 26-01-26 14:53:12, The 8472 wrote: > > > > On 26/01/2026 13:15, Jan Kara wrote: > > > > > On Sun 25-01-26 10:37:01, Zack Weinberg wrote: > > > > > > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote: > > > > > > > > [QUERY: Do delayed errors ever happen in any of these situations? > > > > > > > > > > > > > > > > - The fd is not the last reference to the open file description > > > > > > > > > > > > > > > > - The OFD was opened with O_RDONLY > > > > > > > > > > > > > > > > - The OFD was opened with O_RDWR but has never actually > > > > > > > > been written to > > > > > > > > > > > > > > > > - No data has been written to the OFD since the last call to > > > > > > > > fsync() for that OFD > > > > > > > > > > > > > > > > - No data has been written to the OFD since the last call to > > > > > > > > fdatasync() for that OFD > > > > > > > > > > > > > > > > If we can give some guidance about when people don’t need to > > > > > > > > worry about delayed errors, it would be helpful.] > > > > > > > > > > > > In particular, I really hope delayed errors *aren’t* ever reported > > > > > > when you close a file descriptor that *isn’t* the last reference > > > > > > to its open file description, because the thread-safe way to close > > > > > > stdout without losing write errors[2] depends on that not happening. 
> > > > > > > > > > So I've checked and in Linux ->flush callback for the file is called > > > > > whenever you close a file descriptor (regardless whether there are other > > > > > file descriptors pointing to the same file description) so it's upto > > > > > filesystem implementation what it decides to do and which error it will > > > > > return... Checking the implementations e.g. FUSE and NFS *will* return > > > > > delayed writeback errors on *first* descriptor close even if there are > > > > > other still open descriptors for the description AFAICS. > > > > ...and I really wish they _didn't_. > > > > Reporting a writeback error on close is not particularly useful. Most > > filesystems don't require you to write back all data on a close(). A > > successful close() on those just means that no error has happened yet. > > > > Any application that cares about writeback errors needs to fsync(), > > full stop. > > Is there a good middle ground solution here? > > It seems reasonable that an application may want to have different > handling for errors expected during normal operation, such as temporary > network failure with NFS, compared to more catastrophic things like > failure to write to disk. The reason cited around [1] for avoiding fsync > is that it comes with a cost that, for many applications, may not be > worth it unless you are dealing with NFS. > > I was wondering if it could be worth a new fnctl that provides this kind > of "best effort" error checking behavior without having the strict > requirements of fsync. In effect, to report the errors that you might > currently get at close() before actually calling close() and losing the > fd. > For a long-held fd, I can see the appeal: spray writes at it and just check occasionally (without blocking) that nothing has gone wrong. Maybe when things are idle, you fsync(). A new fcntl(..., F_CHECKERR, ...) 
command that does a file_check_and_advance_wb_err() on the fd and
reports the result would be pretty straightforward.

Would that be helpful for your use-case? This would be like a
non-blocking fsync that just reports whether an error has occurred since
the last F_CHECKERR or fsync().

> Alternatively, it would be interesting to have a deferred fsync() that
> schedules a nonblocking sync event that can be polled for completion/
> errors, with flags to indicate immediate sync or allow automatic syncing
> as needed. But there is probably a better alternative to this
> complexity.
>
> [1]: https://github.com/rust-lang/libs-team/issues/705

Aside from the polling, I suppose you could effectively do this with
io_uring. I'm pretty sure you can issue an fsync() or sync_file_range()
that way, but I think it just ends up blocking a kernel thread until
writeback is done.

We've had people ask for a non-blocking fsync before. Maybe it's time
to get serious about adding one. What would such a thing look like?

It would be pretty simple to add a new fcntl(..., F_DATAWRITE) command
that kicks off writeback à la filemap_fdatawrite(). Then add
fcntl(..., F_WB_CHECK): That could do a non-blocking version of
filemap_fdatawait(), and return whether any folios are still under
writeback. If there is a writeback error, it can return that instead.

The catch of course is that a polling mechanism like this could easily
livelock. If there is a lot of memory pressure, it might always return
that something is still under writeback, no matter how often you hammer
F_CHECKERR.

Maybe that's ok? You can always issue a blocking fsync() if you really
need to draw a line in the sand.
--
Jeff Layton <jlayton@kernel.org>
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-27 0:49 ` Jeff Layton @ 2026-01-28 16:58 ` Zack Weinberg 2026-02-05 9:34 ` Jan Kara 0 siblings, 1 reply; 56+ messages in thread From: Zack Weinberg @ 2026-01-28 16:58 UTC (permalink / raw) To: Jeff Layton, Trevor Gross, Jan Kara, The 8472 Cc: Rich Felker, Alejandro Colomar, Vincent Lefevre, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development On Mon, Jan 26, 2026, at 7:49 PM, Jeff Layton wrote: > On Mon, 2026-01-26 at 17:01 -0600, Trevor Gross wrote: >> On Mon Jan 26, 2026 at 10:43 AM CST, Jeff Layton wrote: >> > On Mon, 2026-01-26 at 16:56 +0100, Jan Kara wrote: >> > > On Mon 26-01-26 14:53:12, The 8472 wrote: >> > > > On 26/01/2026 13:15, Jan Kara wrote: >> > > > > On Sun 25-01-26 10:37:01, Zack Weinberg wrote: >> > > > > > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote: ... >> > > > > > In particular, I really hope delayed errors *aren’t* ever reported >> > > > > > when you close a file descriptor that *isn’t* the last reference >> > > > > > to its open file description, because the thread-safe way to close >> > > > > > stdout without losing write errors[2] depends on that not happening. >> > > > > >> > > > > So I've checked and in Linux ->flush callback for the file is called >> > > > > whenever you close a file descriptor (regardless whether there are other >> > > > > file descriptors pointing to the same file description) so it's upto >> > > > > filesystem implementation what it decides to do and which error it will >> > > > > return... Checking the implementations e.g. FUSE and NFS *will* return >> > > > > delayed writeback errors on *first* descriptor close even if there are >> > > > > other still open descriptors for the description AFAICS. >> > >> > ...and I really wish they _didn't_. >> > >> > Reporting a writeback error on close is not particularly useful. Most >> > filesystems don't require you to write back all data on a close(). 
A >> > successful close() on those just means that no error has happened yet. >> > >> > Any application that cares about writeback errors needs to fsync(), >> > full stop. >> >> Is there a good middle ground solution here? ... >> I was wondering if it could be worth a new fnctl that provides this kind >> of "best effort" error checking behavior without having the strict >> requirements of fsync. In effect, to report the errors that you might >> currently get at close() before actually calling close() and losing the >> fd. ... > A new fcntl(..., F_CHECKERR, ...) command that does a > file_check_and_advance_wb_err() on the fd and reports the result would > be pretty straightforward. > > Would that be helpful for your use-case? This would be like a non- > blocking fsync that just reports whether an error has occurred since > the last F_CHECKERR or fsync(). I feel I need to point out that “should the kernel report errors on close()” and “should the kernel add a new API to make life better for programs that currently expect close() to report [some] errors” and “should the Rust standard library propagate errors produced by close() back up to the application” and “what should the close(2) manpage say about errors” are four different conversation topics. I am all in favor of moving toward a world where close() never fails and there’s _something_ that reports write errors like fsync() without also kicking your application off a performance cliff. But that’s not the world we live in today, and this thread started as a conversation about revising the close(2) manpage, and I’d kinda like to *finish* revising the manpage in, like, the next couple weeks, not several years from now :-) So I’d like to refocus on that topic. Given what Jan Kara said earlier... > Checking the implementations e.g. FUSE and NFS *will* return delayed > writeback errors on *first* descriptor close even if there are other > still open descriptors for the description AFAICS. ... 
> fsync(2) must make sure data is persistently stored and return error if
> it was not. Thus as a VFS person I'd consider it a filesystem bug if an
> error preventing reading data later was not returned from fsync(2). OTOH
> that doesn't necessarily mean that later close doesn't return an error -
> e.g. FUSE does communicate with the server on close that can fail and
> error can be returned.
>
> With this in mind let me now try to answer your remaining questions:
>
>> >> - The OFD was opened with O_RDONLY
>
> If the filesystem supports atime, close can in principle report that atime
> update failed.
>
>> >> - The OFD was opened with O_RDWR but has never actually
>> >>   been written to
>
> The same as above but with inode mtime updates.
>
>> >> - No data has been written to the OFD since the last call to
>> >>   fsync() for that OFD
>
> No writeback errors should happen in this case. As I wrote above I'd
> consider this a filesystem bug.
>
>> >>
>> >> - No data has been written to the OFD since the last call to
>> >>   fdatasync() for that OFD
>
> Errors can happen because some inode metadata (in practice probably only
> inode time stamps) may still need to be written out.
>
> So in the cases described above (except for fsync()) you may get delayed
> errors on close. But since in all those cases no data is lost, I don't
> think 99.9% of applications care at all...

... regrettably I think this does mean the close(2) manpage still needs
to tell people to watch out for errors, and should probably say that
errors _can_ happen even if the file wasn’t written to, but are much
less likely to be important in that case.

And my “how to close stdout in a thread-safe manner” sample code is
wrong, because I was wrong to think that the error reporting only
happened on the _final_ close, when the OFD is destroyed.

... What happens if the close is implicit in a dup2() operation? Here’s
that erroneous “how to close stdout” fragment, with comments
indicating what I thought could and could not fail at the time I wrote
it:

    // These allocate new fds, which can always fail, e.g. because
    // the program already has too many files open.
    int new_stdout = open("/dev/null", O_WRONLY);
    if (new_stdout == -1) perror_exit("/dev/null");
    int old_stdout = dup(1);
    if (old_stdout == -1) perror_exit("dup(1)");

    flockfile(stdout);
    if (fflush(stdout)) perror_exit("stdout: write error");
    dup2(new_stdout, 1); // cannot fail, atomically replaces fd 1
    funlockfile(stdout);

    // this close may receive delayed write errors from previous writes
    // to stdout
    if (close(old_stdout)) perror_exit("stdout: write error");

    // this close cannot fail, because it only drops an alternative
    // reference to the open file description now installed as fd 1
    close(new_stdout);

Note in particular that the first close _operation_ on fd 1 is in
consequence of dup2(new_stdout, 1). The dup2() manpage specifically
says “the close is performed silently (i.e. any errors during the
close are not reported by dup2())” but, if stdout points to a file on
an NFS mount, are those errors _lost_, or will they actually be
reported by the subsequent close(old_stdout)?

Incidentally, the dup2() manpage has a very similar example in its
NOTES section, also presuming that close only reports errors on the
_final_ close, not when it “merely” drops reference >=2 to an OFD.

(I’m starting to think we need dup3(old, new, O_SWAP_FDS). Or is that
already a thing somehow?)

zw
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2026-01-28 16:58 ` Zack Weinberg @ 2026-02-05 9:34 ` Jan Kara 0 siblings, 0 replies; 56+ messages in thread From: Jan Kara @ 2026-02-05 9:34 UTC (permalink / raw) To: Zack Weinberg Cc: Jeff Layton, Trevor Gross, Jan Kara, The 8472, Rich Felker, Alejandro Colomar, Vincent Lefevre, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development I've noticed we didn't reply to one question here: On Wed 28-01-26 11:58:07, Zack Weinberg wrote: > On Mon, Jan 26, 2026, at 7:49 PM, Jeff Layton wrote: > > Checking the implementations e.g. FUSE and NFS *will* return delayed > > writeback errors on *first* descriptor close even if there are other > > still open descriptors for the description AFAICS. > ... > > fsync(2) must make sure data is persistently stored and return error if > > it was not. Thus as a VFS person I'd consider it a filesystem bug if an > > error preveting reading data later was not returned from fsync(2). OTOH > > that doesn't necessarily mean that later close doesn't return an error - > > e.g. FUSE does communicate with the server on close that can fail and > > error can be returned. > > > > With this in mind let me now try to answer your remaining questions: > > > >> >> - The OFD was opened with O_RDONLY > > > > If the filesystem supports atime, close can in principle report that atime > > update failed. > > > >> >> - The OFD was opened with O_RDWR but has never actually > >> >> been written to > > > > The same as above but with inode mtime updates. > > > >> >> - No data has been written to the OFD since the last call to > >> >> fsync() for that OFD > > > > No writeback errors should happen in this case. As I wrote above I'd > > consider this a filesystem bug. 
> > > >> >> > >> >> - No data has been written to the OFD since the last call to > >> >> fdatasync() for that OFD > > > > Errors can happen because some inode metadata (in practice probably only > > inode time stamps) may still need to be written out. > > > > So in the cases described above (except for fsync()) you may get delayed > > errors on close. But since in all those cases no data is lost, I don't > > think 99.9% of applications care at all... > > ... regrettably I think this does mean the close(3) manpage still needs > to tell people to watch out for errors, and should probably say that > errors _can_ happen even if the file wasn’t written to, but are much > less likely to be important in that case. > > And my “how to close stdout in a thread-safe manner” sample code is > wrong, because I was wrong to think that the error reporting only > happened on the _final_ close, when the OFD is destroyed. > > ... What happens if the close is implicit in a dup2() operation? Here’s > that erroneous “how to close stdout” fragment, with comments > indicating what I thought could and could not fail at the time I wrote > it: > > // These allocate new fds, which can always fail, e.g. because > // the program already has too many files open. 
>     int new_stdout = open("/dev/null", O_WRONLY);
>     if (new_stdout == -1) perror_exit("/dev/null");
>     int old_stdout = dup(1);
>     if (old_stdout == -1) perror_exit("dup(1)");
>
>     flockfile(stdout);
>     if (fflush(stdout)) perror_exit("stdout: write error");
>     dup2(new_stdout, 1); // cannot fail, atomically replaces fd 1
>     funlockfile(stdout);
>
>     // this close may receive delayed write errors from previous writes
>     // to stdout
>     if (close(old_stdout)) perror_exit("stdout: write error");
>
>     // this close cannot fail, because it only drops an alternative
>     // reference to the open file description now installed as fd 1
>     close(new_stdout);
>
> Note in particular that the first close _operation_ on fd 1 is in
> consequence of dup2(new_stdout, 1). The dup2() manpage specifically
> says “the close is performed silently (i.e. any errors during the
> close are not reported by dup()” but, if stdout points to a file on
> an NFS mount, are those errors _lost_, or will they actually be
> reported by the subsequent close(old_stdout)?

It is simply lost (the error is propagated from the filesystem to VFS which
just ignores it).

> Incidentally, the dup2() manpage has a very similar example in its
> NOTES section, also presuming that close only reports errors on the
> _final_ close, not when it “merely” drops reference >=2 to an OFD.
>
> (I’m starting to think we need dup3(old, new, O_SWAP_FDS). Or is that
> already a thing somehow?)

I don't think a functionality like this currently exists.

								Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
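Since the error from the implicit close inside dup2() is discarded, one possible way to avoid depending on it is to collect writeback errors with fsync() before replacing the descriptor. The following is only a sketch of that idea under stated assumptions: the helper name `replace_stream_fd` is hypothetical, it is not the corrected example from the manual page, and it pays the fsync() cost that parts of this thread are trying to avoid.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): make the descriptor behind
 * `stream` refer to the same open file description as `new_fd`, reporting
 * pending write errors *before* dup2() silently closes the old target.
 * Returns 0 on success, -1 with errno set on failure.
 */
int replace_stream_fd(FILE *stream, int new_fd)
{
    int rc = -1;
    int saved;
    int fd;

    flockfile(stream);              /* keep other stdio users out */

    fd = fileno(stream);
    if (fd == -1)
        goto out;
    if (fflush(stream) != 0)        /* push stdio's buffer to the kernel */
        goto out;

    /*
     * Ask for writeback errors explicitly. EINVAL and EROFS mean the
     * object cannot be synchronized (pipe, socket, some special files);
     * this sketch assumes that is acceptable and carries on.
     */
    if (fsync(fd) == -1 && errno != EINVAL && errno != EROFS)
        goto out;

    if (dup2(new_fd, fd) == -1)     /* atomically replaces fd */
        goto out;
    rc = 0;
out:
    saved = errno;
    funlockfile(stream);
    errno = saved;
    return rc;
}
```

A caller would use it along the lines of `replace_stream_fd(stdout, null_fd)`, where `null_fd` is a descriptor it opened on /dev/null beforehand. The stream lock only excludes other stdio users; a thread that calls write() on the raw descriptor between the fsync() and the dup2() is not covered by this sketch.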
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2025-05-23 18:10 ` Zack Weinberg 2025-05-24 2:24 ` Rich Felker @ 2025-05-24 19:25 ` Florian Weimer 2026-01-18 22:23 ` Alejandro Colomar 2 siblings, 0 replies; 56+ messages in thread From: Florian Weimer @ 2025-05-24 19:25 UTC (permalink / raw) To: Zack Weinberg Cc: Alejandro Colomar, Rich Felker, Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development * Zack Weinberg: > BUGS > Prior to POSIX.1-2024, there was no official guarantee that > close() would always close the file descriptor, even on error. > Linux has always closed the file descriptor, even on error, > but other implementations might not have. > > The only such implementation we have heard of is HP-UX; at least > some versions of HP-UX’s man page for close() said it should be > retried if it returned -1 with errno set to EINTR. (If you know > exactly which versions of HP-UX are affected, or of any other > Unix where close() doesn’t always close the file descriptor, > please contact us about it.) The AIX documentation also says this: | The success of the close subroutine is undetermined if the following | is true: | | EINTR The state of the FileDescriptor is undetermined. Retry the | close routine to ensure that the FileDescriptor is closed. <https://www.ibm.com/docs/en/aix/7.2.0?topic=c-close-subroutine> So it's not just HP-UX. For z/OS, it looks like some other errors leave the descriptor open: | EAGAIN | | The call did not complete because the specified socket descriptor | is currently being used by another thread in the same process. | | For example, in a multithreaded environment, close() fails and | returns EAGAIN when the following sequence of events occurs (1) | thread is blocked in a read() or select() call on a given file or | socket descriptor and (2) another thread issues a simultaneous | close() call for the same descriptor. 
| […] | EBADF | fildes is not a valid open file descriptor, or the socket | parameter is not a valid socket descriptor. <https://www.ibm.com/docs/en/zos/2.1.0?topic=functions-close-close-file> ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024 2025-05-23 18:10 ` Zack Weinberg 2025-05-24 2:24 ` Rich Felker 2025-05-24 19:25 ` Florian Weimer @ 2026-01-18 22:23 ` Alejandro Colomar 2026-01-20 16:15 ` Zack Weinberg 2 siblings, 1 reply; 56+ messages in thread From: Alejandro Colomar @ 2026-01-18 22:23 UTC (permalink / raw) To: Zack Weinberg Cc: Rich Felker, Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, GNU libc development [-- Attachment #1: Type: text/plain, Size: 5703 bytes --] Hi Zack and others, Just a gentle ping. It would be nice to have an agreement for some patch. Have a lovely night! Alex On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote: > Taking everything said in this thread into account, I have attempted to > wordsmith new language for the close(2) manpage. Please let me know > what you think, and please help me with the bits marked in square > brackets. I can make this into a proper patch for the manpages > when everyone is happy with it. > > zw > > --- > > DESCRIPTION > ... existing text ... > > close() always succeeds. That is, after it returns, _fd_ has > always been disconnected from the open file it formerly referred > to, and its number can be recycled to refer to some other file. > Furthermore, if _fd_ was the last reference to the underlying > open file description, the resources associated with the open file > description will always have been scheduled to be released. > > However, close may report _delayed errors_ from a previous I/O > operation. Therefore, its return value should not be ignored. > > RETURN VALUE > close() returns zero if there are no delayed errors to report, > or -1 if there _might_ be delayed errors. > > When close() returns -1, check _errno_ to see what the situation > actually is. Most, but not all, _errno_ codes indicate a delayed > I/O error that should be reported to the user. See ERRORS and > NOTES for more detail. 
> > [QUERY: Is it ever possible to get delayed errors on close() from > a file that was opened with O_RDONLY? What about a file that was > opened with O_RDWR but never actually written to? If people only > have to worry about delayed errors if the file was actually > written to, we should say so at this point. > > It would also be good to mention whether it is possible to get a > delayed error on close() even if a previous call to fsync() or > fdatasync() succeeded and there haven’t been any more writes to > that file *description* (not necessarily via the fd being closed) > since.] > > ERRORS > EBADF _fd_ wasn’t open in the first place, or is outside the > valid numeric range for file descriptors. > > EINPROGRESS > EINTR > There are no delayed errors to report, but the kernel is > still doing some clean-up work in the background. This > situation should be treated the same as if close() had > returned zero. Do not retry the close(), and do not report > an error to the user. > > EDQUOT > EFBIG > EIO > ENOSPC > These are the most common errno codes associated with > delayed I/O errors. They should be treated as a hard > failure to write to the file that was formerly associated > with _fd_, the same as if an earlier write(2) had failed > with one of these codes. The file has still been closed! > Do not retry the close(). But do report an error to the user. > > Depending on the underlying file, close() may return other errno > codes; these should generally also be treated as delayed I/O errors. > > NOTES > Dealing with error returns from close() > > As discussed above, close() always closes the file. Except when > errno is set to EBADF, EINPROGRESS, or EINTR, an error return from > close() reports a _delayed I/O error_ from a previous write() > operation. > > It is vital to report delayed I/O errors to the user; failing to > check the return value of close() can cause _silent_ loss of data. 
> The most common situations where this actually happens involve > networked filesystems, where, in the name of throughput, write() > often returns success before the server has actually confirmed a > successful write. > > However, it is also vital to understand that _no matter what_ > close() returns, and _no matter what_ it sets errno to, when it > returns, _the file descriptor passed to close() has been closed_, > and its number is _immediately_ available for reuse by open(2), > dup(2), etc. Therefore, one should never retry a close(), not > even if it set errno to a value that normally indicates the > operation needs to be retried (e.g. EINTR). Retrying a close() > is a serious bug, particularly in a multithreaded program; if > the file descriptor number has already been reused, _that file_ > will get closed out from under whatever other thread opened it. > > [Possibly something about fsync/fdatasync here?] > > BUGS > Prior to POSIX.1-2024, there was no official guarantee that > close() would always close the file descriptor, even on error. > Linux has always closed the file descriptor, even on error, > but other implementations might not have. > > The only such implementation we have heard of is HP-UX; at least > some versions of HP-UX’s man page for close() said it should be > retried if it returned -1 with errno set to EINTR. (If you know > exactly which versions of HP-UX are affected, or of any other > Unix where close() doesn’t always close the file descriptor, > please contact us about it.) > > Portable code should nonetheless never retry a failed close(); the > consequences of a file descriptor leak are far less dangerous than > the consequences of closing a file out from under another thread. -- <https://www.alejandro-colomar.es> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-18 22:23 ` Alejandro Colomar
@ 2026-01-20 16:15 ` Zack Weinberg
2026-01-20 16:36 ` Rich Felker
0 siblings, 1 reply; 56+ messages in thread
From: Zack Weinberg @ 2026-01-20 16:15 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Rich Felker, Vincent Lefevre, Jan Kara, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development

Rich and I have an irreconcilable disagreement on what the semantics of close
_should_ be. I'm not going to do any more work on this until/unless he
changes his mind.

On Sun, Jan 18, 2026, at 5:23 PM, Alejandro Colomar wrote:
> Hi Zack and others,
>
> Just a gentle ping. It would be nice to have an agreement for some
> patch.
>
>
> Have a lovely night!
> Alex
>
> On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote:
>> Taking everything said in this thread into account, I have attempted to
>> wordsmith new language for the close(2) manpage. Please let me know
>> what you think, and please help me with the bits marked in square
>> brackets. I can make this into a proper patch for the manpages
>> when everyone is happy with it.
>>
>> zw
>>
>> ---
>> [...]
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-20 16:15 ` Zack Weinberg
@ 2026-01-20 16:36 ` Rich Felker
2026-01-20 19:17 ` Al Viro
0 siblings, 1 reply; 56+ messages in thread
From: Rich Felker @ 2026-01-20 16:36 UTC (permalink / raw)
To: Zack Weinberg
Cc: Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development

On Tue, Jan 20, 2026 at 11:15:15AM -0500, Zack Weinberg wrote:
> Rich and I have an irreconcilable disagreement on what the semantics of close
> _should_ be. I'm not going to do any more work on this until/unless he
> changes his mind.

It's been way too long since I read this thread to recall what our
point of disagreement is or what point glibc might be at in
reconciling the Linux kernel disagreement with POSIX.

I believe my position is basically this:

1. Documentation should reflect that the EINTR behavior on raw Linux
   syscall and traditional glibc is non-conforming to POSIX, but make
   applications aware of it and that it's unsafe to retry close on
   these systems.

2. Documentation should be descriptive not polemic or proscriptive of
   coding style or practices. When there is a disagreement like this
   it should document that and faithfully represent the different
   positions, not represent the author's views on which one is
   correct.

3. It may be helpful to have further information on what types of
   errors can actually be expected from close on Linux, and under what
   conditions, but only if these behaviors can actually be guaranteed.
   If it's just documenting what Linux currently happens to do, but
   without any existing promise to preserve that for new file types
   etc., then this is stepping out of line of the role of
   documentation into defining the specification, and that requires
   input from other folks.

4. If musl behavior is being documented, it should be noted that we do
   not have the non-conforming EINTR issue. If the kernel produces
   EINTR, we return 0. From 0.9.7 to 1.1.6 we produced EINPROGRESS,
   but this was changed in 1.1.7 as it was found that applications
   would treat EINPROGRESS as an error condition.

Rich
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2026-01-20 16:36 ` Rich Felker
@ 2026-01-20 19:17 ` Al Viro
0 siblings, 0 replies; 56+ messages in thread
From: Al Viro @ 2026-01-20 19:17 UTC (permalink / raw)
To: Rich Felker
Cc: Zack Weinberg, Alejandro Colomar, Vincent Lefevre, Jan Kara,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development

On Tue, Jan 20, 2026 at 11:36:34AM -0500, Rich Felker wrote:
> On Tue, Jan 20, 2026 at 11:15:15AM -0500, Zack Weinberg wrote:
> > Rich and I have an irreconcilable disagreement on what the semantics of close
> > _should_ be. I'm not going to do any more work on this until/unless he
> > changes his mind.
>
> It's been way too long since I read this thread to recall what our
> point of disagreement is or what point glibc might be at in
> reconciling the Linux kernel disagreement with POSIX.

It's not so much disagreement as breakage of the internal POSIX decision
process that has led to POSIX irrelevance in this particular area.

POSIX authority derives from the agreement of actual behaviour of Unices.
It always has; witness the number of underspecified areas where various
vendor implementations had different semantics, for exactly that reason.

They (or anybody else, really) can argue that such-and-such behaviour
ought to change. In quite a few cases that has succeeded. What they
can't do is to force such change by fiat. Especially not when Linux and
*BSD happen to agree on behaviour that differs from what they wish it
to be.
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
2025-05-17 13:32 ` Rich Felker
2025-05-17 13:46 ` Alejandro Colomar
@ 2026-02-06 15:13 ` Vincent Lefevre
1 sibling, 0 replies; 56+ messages in thread
From: Vincent Lefevre @ 2026-02-06 15:13 UTC (permalink / raw)
To: Rich Felker
Cc: Alejandro Colomar, Jan Kara, Alexander Viro, Christian Brauner,
linux-fsdevel, linux-api, libc-alpha

On 2025-05-17 09:32:52 -0400, Rich Felker wrote:
> On Fri, May 16, 2025 at 04:39:57PM +0200, Vincent Lefevre wrote:
> > On 2025-05-16 09:05:47 -0400, Rich Felker wrote:
> > > FWIW musl adopted the EINPROGRESS as soon as we were made aware of the
> > > issue, and later changed it to returning 0 since applications
> > > (particularly, any written prior to this interpretation) are prone to
> > > interpret EINPROGRESS as an error condition rather than success and
> > > possibly misinterpret it as meaning the fd is still open and valid to
> > > pass to close again.
> >
> > If I understand correctly, this is a poor choice. POSIX.1-2024 says:
> >
> > ERRORS
> > The close() and posix_close() functions shall fail if:
> > [...]
> > [EINPROGRESS]
> > The function was interrupted by a signal and fildes was closed
> > but the close operation is continuing asynchronously.
> >
> > But this does not mean that the asynchronous close operation will
> > succeed.
>
> There are no asynchronous behaviors specified for there to be a
> conformance distinction here. The only observable behaviors happen
> instantly, mainly the release of the file descriptor and the process's
> handle on the underlying resource. Abstractly, there is no async
> operation that could succeed or fail.

Sorry, this is old. But a consequence may be a memory leak if something
unexpected occurred during what was done asynchronously. There is no
guarantee that *every* resource has been released.

--
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Pascaline project (LIP, ENS-Lyon)
* Re: close(2) with EINTR has been changed by POSIX.1-2024
2025-05-16 10:48 ` Jan Kara
2025-05-16 12:11 ` Alejandro Colomar
@ 2025-05-16 12:41 ` Mateusz Guzik
2025-05-16 12:41 ` Theodore Ts'o
` (2 subsequent siblings)
4 siblings, 0 replies; 56+ messages in thread
From: Mateusz Guzik @ 2025-05-16 12:41 UTC (permalink / raw)
To: Jan Kara
Cc: Alejandro Colomar, Alexander Viro, Christian Brauner,
linux-fsdevel, linux-api, linux-man

On Fri, May 16, 2025 at 12:48:56PM +0200, Jan Kara wrote:
> Hi!
>
> On Thu 15-05-25 23:33:22, Alejandro Colomar wrote:
> > I'm updating the manual pages for POSIX.1-2024, and have some doubts
> > about close(2). The manual page for close(2) says (conforming to
> > POSIX.1-2008):
> >
> > The EINTR error is a somewhat special case. Regarding the EINTR
> > error, POSIX.1‐2008 says:
> >
> > If close() is interrupted by a signal that is to be
> > caught, it shall return -1 with errno set to EINTR and
> > the state of fildes is unspecified.
> >
> > This permits the behavior that occurs on Linux and many other
> > implementations, where, as with other errors that may be re‐
> > ported by close(), the file descriptor is guaranteed to be
> > closed. However, it also permits another possibility: that the
> > implementation returns an EINTR error and keeps the file de‐
> > scriptor open. (According to its documentation, HP‐UX’s close()
> > does this.) The caller must then once more use close() to close
> > the file descriptor, to avoid file descriptor leaks. This di‐
> > vergence in implementation behaviors provides a difficult hurdle
> > for portable applications, since on many implementations,
> > close() must not be called again after an EINTR error, and on at
> > least one, close() must be called again. There are plans to ad‐
> > dress this conundrum for the next major release of the POSIX.1
> > standard.
> >
> > TL;DR: close(2) with EINTR is allowed to either leave the fd open or
> > closed, and Linux leaves it closed, while others (HP-UX only?) leaves it
> > open.
> >
> > Now, POSIX.1-2024 says:
> >
> > If close() is interrupted by a signal that is to be caught, then
> > it is unspecified whether it returns -1 with errno set to
> > [EINTR] and fildes remaining open, or returns -1 with errno set
> > to [EINPROGRESS] and fildes being closed, or returns 0 to
> > indicate successful completion; [...]
> >
> > <https://pubs.opengroup.org/onlinepubs/9799919799/functions/close.html>
> >
> > Which seems to bless HP-UX and screw all the others, requiring them to
> > report EINPROGRESS.
> >
> > Was there any discussion about what to do in the Linux kernel?
>
> I'm not aware of any discussions but indeed we are returning EINTR while
> closing the fd. Frankly, changing the error code we return in that case is
> really asking for userspace regressions so I'm of the opinion we just
> ignore the standard as in my opinion it goes against a long established
> reality.

I wonder what they are thinking there.

Any program which even bothers to check for EINTR assumes the fd is
already closed, so one has to assume augmenting behavior to support this
would result in fd leaks.

But that crappery aside, I do wonder if a close() variant which can fail
and leaves the fd intact would be warranted.

For example one of the error modes is ENOSPC (or at least the manpage
claims as much). As is, the error is not actionable as the fd is gone.
If instead a magic flag was passed down to indicate what to do (e.g.,
leave the fd in place), the program could try to do some recovery (for
example unlinking temp files it knows it stores there).

Similar deal with EINTR, albeit this error for close() would preferably
get eradicated instead.

Just some meh rambling.
* Re: close(2) with EINTR has been changed by POSIX.1-2024
2025-05-16 10:48 ` Jan Kara
2025-05-16 12:11 ` Alejandro Colomar
2025-05-16 12:41 ` close(2) with EINTR has been changed by POSIX.1-2024 Mateusz Guzik
@ 2025-05-16 12:41 ` Theodore Ts'o
2025-05-19 23:19 ` Steffen Nurpmeso
2025-05-16 19:13 ` Al Viro
2025-05-19 9:48 ` Christian Brauner
4 siblings, 1 reply; 56+ messages in thread
From: Theodore Ts'o @ 2025-05-16 12:41 UTC (permalink / raw)
To: Jan Kara
Cc: Alejandro Colomar, Alexander Viro, Christian Brauner,
linux-fsdevel, linux-api, linux-man

On Fri, May 16, 2025 at 12:48:56PM +0200, Jan Kara wrote:
> > Now, POSIX.1-2024 says:
> >
> > If close() is interrupted by a signal that is to be caught, then
> > it is unspecified whether it returns -1 with errno set to
> > [EINTR] and fildes remaining open, or returns -1 with errno set
> > to [EINPROGRESS] and fildes being closed, or returns 0 to
> > indicate successful completion; [...]
> >
> > <https://pubs.opengroup.org/onlinepubs/9799919799/functions/close.html>
> >
> > Which seems to bless HP-UX and screw all the others, requiring them to
> > report EINPROGRESS.
> >
> > Was there any discussion about what to do in the Linux kernel?
>
> I'm not aware of any discussions but indeed we are returning EINTR while
> closing the fd. Frankly, changing the error code we return in that case is
> really asking for userspace regressions so I'm of the opinion we just
> ignore the standard as in my opinion it goes against a long established
> reality.

Yeah, it appears that the Austin Group has lost all connection with
reality, and we should treat POSIX 2024 accordingly. Not breaking
userspace applications is way more important than POSIX 2024
compliance. Which is sad, because I used to really care about the
POSIX.1 standard as being very useful. But that seems to be no longer
the case...

- Ted
* Re: close(2) with EINTR has been changed by POSIX.1-2024 2025-05-16 12:41 ` Theodore Ts'o @ 2025-05-19 23:19 ` Steffen Nurpmeso 2025-05-20 13:37 ` Theodore Ts'o 0 siblings, 1 reply; 56+ messages in thread From: Steffen Nurpmeso @ 2025-05-19 23:19 UTC (permalink / raw) To: Theodore Ts'o Cc: Jan Kara, Alejandro Colomar, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, linux-man, Steffen Nurpmeso Theodore Ts'o wrote in <20250516124147.GB7158@mit.edu>: |On Fri, May 16, 2025 at 12:48:56PM +0200, Jan Kara wrote: |>> Now, POSIX.1-2024 says: |>> |>> If close() is interrupted by a signal that is to be caught, then |>> it is unspecified whether it returns -1 with errno set to |>> [EINTR] and fildes remaining open, or returns -1 with errno set |>> to [EINPROGRESS] and fildes being closed, or returns 0 to |>> indicate successful completion; [...] |>> |>> <https://pubs.opengroup.org/onlinepubs/9799919799/functions/close.html> |>> |>> Which seems to bless HP-UX and screw all the others, requiring them to |>> report EINPROGRESS. |>> |>> Was there any discussion about what to do in the Linux kernel? |> |> I'm not aware of any discussions but indeed we are returning EINTR while |> closing the fd. Frankly, changing the error code we return in that \ |> case is |> really asking for userspace regressions so I'm of the opinion we just |> ignore the standard as in my opinion it goes against a long established |> reality. | |Yeah, it appears that the Austin Group has lost all connection with |reality, and we should treat POSIX 2024 accordingly. Not breaking |userspace applications is way more important that POSIX 2024 |compliance. Which is sad, because I used to really care about POSIX.1 |standard as being very useful. But that seems to be no longer the |case... They could not do otherwise than talking the status quo, i think. They have explicitly added posix_close() which overcomes the problem (for those operating systems which actually act like that). 
There is a long RATIONALE on this, it starts on page 747 :) --steffen | |Der Kragenbaer, The moon bear, |der holt sich munter he cheerfully and one by one |einen nach dem anderen runter wa.ks himself off |(By Robert Gernhardt) ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: close(2) with EINTR has been changed by POSIX.1-2024
2025-05-19 23:19 ` Steffen Nurpmeso
@ 2025-05-20 13:37 ` Theodore Ts'o
2025-05-20 23:16 ` Steffen Nurpmeso
0 siblings, 1 reply; 56+ messages in thread
From: Theodore Ts'o @ 2025-05-20 13:37 UTC (permalink / raw)
To: Jan Kara, Alejandro Colomar, Alexander Viro, Christian Brauner,
linux-fsdevel, linux-api, linux-man, Steffen Nurpmeso

On Tue, May 20, 2025 at 01:19:19AM +0200, Steffen Nurpmeso wrote:
>
> They could not do otherwise than talking the status quo, i think.
> They have explicitly added posix_close() which overcomes the
> problem (for those operating systems which actually act like
> that). There is a long RATIONALE on this, it starts on page 747 :)

They could have just added posix_close() which provided well-defined
semantics without demanding that existing implementations make
non-backwards compatible changes to close(2). Personally, while they
were adding posix_close(2) they could have also fixed the disaster
which is the semantics around close(2) and how advisory locks get
released that were held by other file descriptors, and added a profound
apology for the insane semantics demanded by POSIX[1].

[1] "POSIX advisory locks are broken by design."
    https://www.sqlite.org/src/artifact/c230a7a24?ln=994-1081

- Ted
* Re: close(2) with EINTR has been changed by POSIX.1-2024 2025-05-20 13:37 ` Theodore Ts'o @ 2025-05-20 23:16 ` Steffen Nurpmeso 0 siblings, 0 replies; 56+ messages in thread From: Steffen Nurpmeso @ 2025-05-20 23:16 UTC (permalink / raw) To: Theodore Ts'o Cc: Jan Kara, Alejandro Colomar, Alexander Viro, Christian Brauner, linux-fsdevel, linux-api, linux-man, Steffen Nurpmeso Theodore Ts'o wrote in <20250520133705.GE38098@mit.edu>: |On Tue, May 20, 2025 at 01:19:19AM +0200, Steffen Nurpmeso wrote: |> They could not do otherwise than talking the status quo, i think. |> They have explicitly added posix_close() which overcomes the |> problem (for those operating systems which actually act like |> that). There is a long RATIONALE on this, it starts on page 747 :) | |They could have just added posix_close() which provided well-defined |semantics without demanding that existing implementations make |non-backwards compatible changes to close(2). Personally, while they |were adding posix_close(2) they could have also fixed the disaster |which is the semantics around close(2) [.] Well it was a lot of trouble, not only in bug 529[1], with follow-ups like a thread started by Michael Kerrisk, with an interesting response by Rich Felker of Musl[2]. In [1] Erik Blake of RedHat/libvirt said for example The Linux kernel currently always frees the file descriptor (no chance for a retry; the filedes can immediately be reused by another open()), for both EINTR and EIO. Maybe it is safer to state that the fd is _always_ closed, even if failure is reported? etc, but Geoff Clare then (this also was in 2012, where one possibly could have hoped that more operating systems survive / continue with money/manpower backing by serious companies; just in case that mattered) came via HP got it right with HP-UX; AIX and Linux do the wrong thing. 
and he has quite some reasoning for descriptors like ttys etc, where close can linger, which resulted in Erik Blake quoting Let me make it very, very clear - no matter how many times these guys assert HP-UX insane behaviour correct, no "fixes" to Linux one are going to be accepted. Consider it vetoed. By me, in role of Linux VFS maintainer. And I'm _very_ certain that getting Linus to agree will be a matter of minutes. [1] https://www.austingroupbugs.net/view.php?id=529 [2] https://www.mail-archive.com/austin-group-l@opengroup.org/msg00579.html |[.] and how advisory locks get |released that were held by other file descriptors and add a profound |apologies over the insane semantics demanded by POSIX[1]. The new standard added the Linux-style F_OFD_* fcntl(2) locks! They are yet Linux-only, but NetBSD at least has an issue by a major contributor (bug 59241): NetBSD seems to lack the following: 3.237 OFD-Owned File Lock ... https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_237 >How-To-Repeat: standards inspection >Fix: Yes, please! (That or write down a reason why we eschew it.) |[1] "POSIX advisory locks are broken by design." | https://www.sqlite.org/src/artifact/c230a7a24?ln=994-1081 | | - Ted --End of <20250520133705.GE38098@mit.edu> --steffen | |Der Kragenbaer, The moon bear, |der holt sich munter he cheerfully and one by one |einen nach dem anderen runter wa.ks himself off |(By Robert Gernhardt) ^ permalink raw reply [flat|nested] 56+ messages in thread
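The F_OFD_* locks Steffen mentions tie the lock to the open file description, so closing some unrelated descriptor for the same file does not release it. A minimal sketch, assuming Linux with glibc: the helper names are hypothetical, and the numeric fallbacks are the Linux ABI values, used only in case _GNU_SOURCE did not take effect early enough for <fcntl.h> to define the F_OFD_* constants.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* glibc exposes F_OFD_* under this macro */
#endif
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Assumption: Linux ABI values, only used if <fcntl.h> did not provide them. */
#ifndef F_OFD_GETLK
#define F_OFD_GETLK  36
#define F_OFD_SETLK  37
#define F_OFD_SETLKW 38
#endif

/*
 * Hypothetical helper: take a non-blocking whole-file write lock owned by
 * the open file description behind fd.  Returns 0, or -1 with errno set.
 */
int ofd_lock_whole_file(int fd)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);   /* l_pid must be 0 for OFD commands */
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                /* 0 means "through end of file" */
    return fcntl(fd, F_OFD_SETLK, &fl);
}

/*
 * Hypothetical helper: returns 1 if another open file description holds a
 * lock that would conflict with a write lock taken via fd, 0 if none does,
 * -1 on error.
 */
int ofd_is_locked_by_other(int fd)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    if (fcntl(fd, F_OFD_GETLK, &fl) == -1)
        return -1;
    return fl.l_type != F_UNLCK;
}
```

With the classic F_SETLK locks, closing any descriptor the process has for the file drops all of that process's locks on it, which is the behaviour the SQLite comment cited by Ted complains about. An OFD lock stays until the descriptors referring to its own open file description are closed or the lock is explicitly released.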
* Re: close(2) with EINTR has been changed by POSIX.1-2024 2025-05-16 10:48 ` Jan Kara ` (2 preceding siblings ...) 2025-05-16 12:41 ` Theodore Ts'o @ 2025-05-16 19:13 ` Al Viro 2025-05-19 9:48 ` Christian Brauner 4 siblings, 0 replies; 56+ messages in thread From: Al Viro @ 2025-05-16 19:13 UTC (permalink / raw) To: Jan Kara Cc: Alejandro Colomar, Christian Brauner, linux-fsdevel, linux-api, linux-man On Fri, May 16, 2025 at 12:48:56PM +0200, Jan Kara wrote: > I'm not aware of any discussions but indeed we are returning EINTR while > closing the fd. Frankly, changing the error code we return in that case is > really asking for userspace regressions so I'm of the opinion we just > ignore the standard as in my opinion it goes against a long established > reality. AFAICS what happens is that relevance of Austin Group has dropped so low that they stopped caring about any BS filters they used to have. What we are seeing now is assorted pet idiocies that used to sit in their system, periodically getting shot down while there had been anyone who cared to do that. Sad, of course, but what can we do, other than politely ignoring the... output? ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: close(2) with EINTR has been changed by POSIX.1-2024
2025-05-16 10:48 ` Jan Kara
` (3 preceding siblings ...)
2025-05-16 19:13 ` Al Viro
@ 2025-05-19 9:48 ` Christian Brauner
4 siblings, 0 replies; 56+ messages in thread
From: Christian Brauner @ 2025-05-19 9:48 UTC (permalink / raw)
To: Jan Kara
Cc: Alejandro Colomar, Alexander Viro, linux-fsdevel, linux-api,
linux-man

On Fri, May 16, 2025 at 12:48:56PM +0200, Jan Kara wrote:
> Hi!
>
> On Thu 15-05-25 23:33:22, Alejandro Colomar wrote:
> > I'm updating the manual pages for POSIX.1-2024, and have some doubts
> > about close(2). The manual page for close(2) says (conforming to
> > POSIX.1-2008):
> >
> > The EINTR error is a somewhat special case. Regarding the EINTR
> > error, POSIX.1‐2008 says:
> >
> > If close() is interrupted by a signal that is to be
> > caught, it shall return -1 with errno set to EINTR and
> > the state of fildes is unspecified.
> >
> > This permits the behavior that occurs on Linux and many other
> > implementations, where, as with other errors that may be re‐
> > ported by close(), the file descriptor is guaranteed to be
> > closed. However, it also permits another possibility: that the
> > implementation returns an EINTR error and keeps the file de‐
> > scriptor open. (According to its documentation, HP‐UX’s close()
> > does this.) The caller must then once more use close() to close
> > the file descriptor, to avoid file descriptor leaks. This di‐
> > vergence in implementation behaviors provides a difficult hurdle
> > for portable applications, since on many implementations,
> > close() must not be called again after an EINTR error, and on at
> > least one, close() must be called again. There are plans to ad‐
> > dress this conundrum for the next major release of the POSIX.1
> > standard.
> >
> > TL;DR: close(2) with EINTR is allowed to either leave the fd open or
> > closed, and Linux leaves it closed, while others (HP-UX only?) leaves it
> > open.
> >
> > Now, POSIX.1-2024 says:
> >
> > If close() is interrupted by a signal that is to be caught, then
> > it is unspecified whether it returns -1 with errno set to
> > [EINTR] and fildes remaining open, or returns -1 with errno set
> > to [EINPROGRESS] and fildes being closed, or returns 0 to
> > indicate successful completion; [...]
> >
> > <https://pubs.opengroup.org/onlinepubs/9799919799/functions/close.html>
> >
> > Which seems to bless HP-UX and screw all the others, requiring them to
> > report EINPROGRESS.
> >
> > Was there any discussion about what to do in the Linux kernel?
>
> I'm not aware of any discussions but indeed we are returning EINTR while
> closing the fd. Frankly, changing the error code we return in that case is
> really asking for userspace regressions so I'm of the opinion we just
> ignore the standard as in my opinion it goes against a long established
> reality.

Ignore. We've long since stopped designing apis with input from that
standard in mind. And I think that was a very wise decision.
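For readers who want the practical upshot of the Linux behavior discussed above (the descriptor is released even when close() fails with EINTR), here is a minimal C sketch. It is not taken from any message in the thread: the helper name close_once() is hypothetical, and the code assumes Linux semantics only. On a system that keeps the descriptor open after EINTR (HP-UX, according to the quoted manual page), this approach would leak it.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): call close() exactly once.
 *
 * Assumption: Linux semantics, where the descriptor is released even if
 * close() fails with EINTR. Retrying there could close an unrelated
 * descriptor that another thread has just opened under the same number.
 *
 * Returns 0 when the descriptor is gone and no other error was reported,
 * -1 otherwise (errno is left as set by close()).
 */
static int close_once(int fd)
{
    if (close(fd) == 0)
        return 0;

    /*
     * EINTR: on Linux the descriptor is already closed.
     * EINPROGRESS: the code POSIX.1-2024 names for "interrupted, but
     * fildes is closed"; handled the same way in this sketch.
     */
    if (errno == EINTR || errno == EINPROGRESS)
        return 0;

    /* EBADF, EIO, and so on: report the failure, but never retry. */
    return -1;
}
```

The design choice is simply "never loop on close()": a caller that cares about write-back errors would still inspect errno when -1 is returned, but would not call close() on the same number again.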