* O_CLOEXEC use for OPEN_TREE_CLOEXEC
From: Florian Weimer @ 2026-01-13 22:40 UTC
To: linux-fsdevel
Cc: linux-api, linux-kernel, Al Viro, David Howells, DJ Delorie

In <linux/mount.h>, we have this:

#define OPEN_TREE_CLOEXEC      O_CLOEXEC       /* Close the file on execve() */

This causes a few pain points for us on the glibc side when we mirror
this into <sys/mount.h> because O_CLOEXEC is defined in <fcntl.h>,
which is one of the headers that's completely incompatible with the UAPI
headers.

The reason why this is painful is that O_CLOEXEC has at least three
different values across architectures: 0x80000, 0x200000, 0x400000

Even for the UAPI this isn't ideal because it effectively burns three
open_tree flags, unless the flags are made architecture-specific, too.

Thanks,
Florian
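
To make the "burns three flags" point concrete, here is a minimal sketch that ORs together the three O_CLOEXEC values quoted in the message. The macro and function names are made up for illustration, and the values are taken from the message rather than from any header.

```c
#include <assert.h>
#include <stddef.h>

/* The three O_CLOEXEC values quoted in the message (names are hypothetical). */
#define CLOEXEC_VALUE_A 0x80000u
#define CLOEXEC_VALUE_B 0x200000u
#define CLOEXEC_VALUE_C 0x400000u

/* The values are distinct bits, which is why each one costs a separate slot. */
static_assert((CLOEXEC_VALUE_A & CLOEXEC_VALUE_B) == 0 &&
	      (CLOEXEC_VALUE_A & CLOEXEC_VALUE_C) == 0 &&
	      (CLOEXEC_VALUE_B & CLOEXEC_VALUE_C) == 0,
	      "the quoted O_CLOEXEC values are pairwise disjoint");

static const unsigned int cloexec_values[] = {
	CLOEXEC_VALUE_A, CLOEXEC_VALUE_B, CLOEXEC_VALUE_C,
};

/* Union of every value O_CLOEXEC may take: an architecture-independent
 * flag namespace cannot reuse any of these bits for something else. */
static unsigned int cloexec_reserved_mask(void)
{
	unsigned int mask = 0;

	for (size_t i = 0; i < sizeof(cloexec_values) / sizeof(cloexec_values[0]); i++)
		mask |= cloexec_values[i];
	return mask;
}

/* Number of set bits, i.e. how many flag slots are taken. */
static int count_bits(unsigned int v)
{
	int n = 0;

	for (; v; v >>= 1)
		n += v & 1;
	return n;
}
```

Calling `cloexec_reserved_mask()` from a small `main()` gives 0x680000, and `count_bits()` on that mask gives 3.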

* Re: O_CLOEXEC use for OPEN_TREE_CLOEXEC
From: Christian Brauner @ 2026-01-14 16:03 UTC
To: Florian Weimer
Cc: linux-fsdevel, linux-api, linux-kernel, Al Viro, David Howells, DJ Delorie

On Tue, Jan 13, 2026 at 11:40:55PM +0100, Florian Weimer wrote:
> In <linux/mount.h>, we have this:
>
> #define OPEN_TREE_CLOEXEC      O_CLOEXEC       /* Close the file on execve() */
>
> This causes a few pain points for us on the glibc side when we mirror
> this into <sys/mount.h> because O_CLOEXEC is defined in <fcntl.h>,
> which is one of the headers that's completely incompatible with the UAPI
> headers.
>
> The reason why this is painful is that O_CLOEXEC has at least three
> different values across architectures: 0x80000, 0x200000, 0x400000
>
> Even for the UAPI this isn't ideal because it effectively burns three
> open_tree flags, unless the flags are made architecture-specific, too.

I think that just got cargo-culted... A long time ago some API defined
its flag as O_CLOEXEC and now a lot of APIs have done the same. I'm
pretty sure we can't change that now but we can document that this
shouldn't be #defined to O_CLOEXEC and should instead be a separate
per-syscall bit. But I think that's the best we can do right now.
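
A "separate per-syscall bit" could look roughly like the sketch below. Everything named `EXAMPLE_*` is hypothetical and does not correspond to any real syscall. The idea is only that the user-visible flag has one fixed value on every architecture and is translated to O_CLOEXEC internally.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>

/* Hypothetical flags for an imaginary syscall; not real kernel names. */
#define EXAMPLE_CLONE   (1u << 0)
#define EXAMPLE_CLOEXEC (1u << 1) /* fixed bit, same value on every architecture */
#define EXAMPLE_VALID   (EXAMPLE_CLONE | EXAMPLE_CLOEXEC)

static_assert((EXAMPLE_CLONE & EXAMPLE_CLOEXEC) == 0,
	      "per-syscall flags must not share bits");

/*
 * Translate the per-syscall bit into the O_* flag that would be used when
 * the descriptor is allocated. Returns -1 if unknown bits are set.
 */
static int example_fd_flags(unsigned int flags)
{
	if (flags & ~EXAMPLE_VALID)
		return -1;
	return (flags & EXAMPLE_CLOEXEC) ? O_CLOEXEC : 0;
}
```

With this shape the architecture-dependent value of O_CLOEXEC never appears in the syscall's own flag namespace, so a header describing the syscall would not need anything from <fcntl.h>.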

* Re: O_CLOEXEC use for OPEN_TREE_CLOEXEC
From: Andy Lutomirski @ 2026-01-14 19:42 UTC
To: Christian Brauner
Cc: Florian Weimer, linux-fsdevel, linux-api, linux-kernel, Al Viro, David Howells, DJ Delorie

On Wed, Jan 14, 2026 at 8:09 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Tue, Jan 13, 2026 at 11:40:55PM +0100, Florian Weimer wrote:
> > In <linux/mount.h>, we have this:
> >
> > #define OPEN_TREE_CLOEXEC      O_CLOEXEC       /* Close the file on execve() */
> >
> > This causes a few pain points for us on the glibc side when we mirror
> > this into <sys/mount.h> because O_CLOEXEC is defined in <fcntl.h>,
> > which is one of the headers that's completely incompatible with the UAPI
> > headers.
> >
> > The reason why this is painful is that O_CLOEXEC has at least three
> > different values across architectures: 0x80000, 0x200000, 0x400000
> >
> > Even for the UAPI this isn't ideal because it effectively burns three
> > open_tree flags, unless the flags are made architecture-specific, too.
>
> I think that just got cargo-culted... A long time ago some API defined
> its flag as O_CLOEXEC and now a lot of APIs have done the same. I'm
> pretty sure we can't change that now but we can document that this
> shouldn't be #defined to O_CLOEXEC and should instead be a separate
> per-syscall bit. But I think that's the best we can do right now.
>

How about, for future syscalls, we make CLOEXEC unconditional?  If
anyone wants an ofd to get inherited across exec, they can F_SETFD it
themselves.

--Andy
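
The "F_SETFD it themselves" step is short in practice. The sketch below uses a plain open(2) with O_CLOEXEC as a stand-in for a hypothetical syscall that always returns a close-on-exec descriptor. Both helper names are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Stand-in for a hypothetical syscall that always hands back a
 * close-on-exec descriptor; here it is just open(2) with O_CLOEXEC.
 */
static int open_always_cloexec(const char *path)
{
	return open(path, O_RDONLY | O_CLOEXEC);
}

/*
 * Clear FD_CLOEXEC so the descriptor is inherited across execve().
 * Returns 0 on success, -1 on error (errno set by fcntl).
 */
static int make_inheritable(int fd)
{
	int fdflags;

	assert(fd >= 0); /* passing an invalid fd is a caller bug */

	fdflags = fcntl(fd, F_GETFD);
	if (fdflags < 0)
		return -1;
	return fcntl(fd, F_SETFD, fdflags & ~FD_CLOEXEC);
}
```

The cost for callers who want inheritance is two fcntl(2) calls. Callers who forget to ask for close-on-exec get the safe behaviour by default.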

* Re: O_CLOEXEC use for OPEN_TREE_CLOEXEC
From: Aleksa Sarai @ 2026-01-14 21:18 UTC
To: Andy Lutomirski
Cc: Christian Brauner, Florian Weimer, linux-fsdevel, linux-api, linux-kernel, Al Viro, David Howells, DJ Delorie

On 2026-01-14, Andy Lutomirski <luto@amacapital.net> wrote:
> On Wed, Jan 14, 2026 at 8:09 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Tue, Jan 13, 2026 at 11:40:55PM +0100, Florian Weimer wrote:
> > > In <linux/mount.h>, we have this:
> > >
> > > #define OPEN_TREE_CLOEXEC      O_CLOEXEC       /* Close the file on execve() */
> > >
> > > This causes a few pain points for us on the glibc side when we mirror
> > > this into <sys/mount.h> because O_CLOEXEC is defined in <fcntl.h>,
> > > which is one of the headers that's completely incompatible with the UAPI
> > > headers.
> > >
> > > The reason why this is painful is that O_CLOEXEC has at least three
> > > different values across architectures: 0x80000, 0x200000, 0x400000
> > >
> > > Even for the UAPI this isn't ideal because it effectively burns three
> > > open_tree flags, unless the flags are made architecture-specific, too.
> >
> > I think that just got cargo-culted... A long time ago some API defined
> > its flag as O_CLOEXEC and now a lot of APIs have done the same. I'm
> > pretty sure we can't change that now but we can document that this
> > shouldn't be #defined to O_CLOEXEC and should instead be a separate
> > per-syscall bit. But I think that's the best we can do right now.
> >
>
> How about, for future syscalls, we make CLOEXEC unconditional?  If
> anyone wants an ofd to get inherited across exec, they can F_SETFD it
> themselves.

I believe newer interfaces have already started doing that (e.g., all of
the pidfd stuff is O_CLOEXEC by default) but we should definitely update
the documentation in Documentation/process/adding-syscalls.rst to stop
recommending the inclusion of the O_CLOEXEC flag.

The funniest thing about open_tree(2) is that it actually borrows flag
bits from three distinct namespaces! It has an OPEN_TREE_* namespace,
the AT_* namespace (which now has a concept of "per-syscall flags"), and
O_CLOEXEC. What a fun interface!

-- 
Aleksa Sarai
https://www.cyphar.com/
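
The three namespaces can be seen by splitting a single open_tree(2) flags word into its parts. This is only a sketch: the `EX_`-prefixed constants are local copies of what I understand the UAPI values to be, defined here so the example builds without <linux/mount.h>, and the struct and function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>

/* Local copies (EX_ prefix is hypothetical) of values assumed from the UAPI headers. */
#define EX_OPEN_TREE_CLONE     1u
#define EX_AT_SYMLINK_NOFOLLOW 0x100u
#define EX_AT_NO_AUTOMOUNT     0x800u
#define EX_AT_EMPTY_PATH       0x1000u
#define EX_AT_RECURSIVE        0x8000u

struct open_tree_flag_parts {
	unsigned int open_tree; /* OPEN_TREE_* namespace */
	unsigned int at;        /* AT_* namespace */
	unsigned int cloexec;   /* O_CLOEXEC, value depends on the architecture */
	unsigned int unknown;   /* bits outside all three namespaces */
};

/* Sort each bit of an open_tree(2)-style flags word into its namespace. */
static struct open_tree_flag_parts split_open_tree_flags(unsigned int flags)
{
	const unsigned int at_mask = EX_AT_SYMLINK_NOFOLLOW | EX_AT_NO_AUTOMOUNT |
				     EX_AT_EMPTY_PATH | EX_AT_RECURSIVE;
	struct open_tree_flag_parts p = {
		.open_tree = flags & EX_OPEN_TREE_CLONE,
		.at = flags & at_mask,
		.cloexec = flags & (unsigned int)O_CLOEXEC,
	};

	p.unknown = flags & ~(p.open_tree | p.at | p.cloexec);

	/* Every input bit must land in exactly one bucket. */
	assert((p.open_tree | p.at | p.cloexec | p.unknown) == flags);
	return p;
}
```

Only the `cloexec` bucket changes position from one architecture to the next, which is the part that makes a portable header awkward.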

* Re: O_CLOEXEC use for OPEN_TREE_CLOEXEC
From: Florian Weimer @ 2026-01-15 8:55 UTC
To: Christian Brauner
Cc: linux-fsdevel, linux-api, linux-kernel, Al Viro, David Howells, DJ Delorie

* Christian Brauner:

> On Tue, Jan 13, 2026 at 11:40:55PM +0100, Florian Weimer wrote:
>> In <linux/mount.h>, we have this:
>>
>> #define OPEN_TREE_CLOEXEC      O_CLOEXEC       /* Close the file on execve() */
>>
>> This causes a few pain points for us on the glibc side when we mirror
>> this into <sys/mount.h> because O_CLOEXEC is defined in <fcntl.h>,
>> which is one of the headers that's completely incompatible with the UAPI
>> headers.
>>
>> The reason why this is painful is that O_CLOEXEC has at least three
>> different values across architectures: 0x80000, 0x200000, 0x400000
>>
>> Even for the UAPI this isn't ideal because it effectively burns three
>> open_tree flags, unless the flags are made architecture-specific, too.
>
> I think that just got cargo-culted... A long time ago some API defined
> its flag as O_CLOEXEC and now a lot of APIs have done the same.

Yes, it looks like inotify is in the same boat.

> I'm pretty sure we can't change that now but we can document that this
> shouldn't be #defined to O_CLOEXEC and should instead be a separate
> per-syscall bit. But I think that's the best we can do right now.

Maybe add something like this as a safety measure, to ensure that the
flags don't overlap?

diff --git a/fs/namespace.c b/fs/namespace.c
index c58674a20cad..5bbfd379ec44 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3069,6 +3069,9 @@ static struct file *vfs_open_tree(int dfd, const char __user *filename, unsigned
 	bool detached = flags & OPEN_TREE_CLONE;
 
 	BUILD_BUG_ON(OPEN_TREE_CLOEXEC != O_CLOEXEC);
+	BUILD_BUG_ON(O_CLOEXEC & OPEN_TREE_CLONE);
+	BUILD_BUG_ON((AT_EMPTY_PATH | AT_NO_AUTOMOUNT | AT_RECURSIVE | AT_SYMLINK_NOFOLLOW) &
+		     (O_CLOEXEC | OPEN_TREE_CLONE));
 
 	if (flags & ~(AT_EMPTY_PATH | AT_NO_AUTOMOUNT | AT_RECURSIVE |
 		      AT_SYMLINK_NOFOLLOW | OPEN_TREE_CLONE |
@@ -3100,7 +3103,7 @@ static struct file *vfs_open_tree(int dfd, const char __user *filename, unsigned
 
 SYSCALL_DEFINE3(open_tree, int, dfd, const char __user *, filename, unsigned, flags)
 {
-	return FD_ADD(flags, vfs_open_tree(dfd, filename, flags));
+	return FD_ADD(flags & O_CLOEXEC, vfs_open_tree(dfd, filename, flags));
 }
 
 /*

(Completely untested.)

Passing the mix of flags to FD_ADD isn't really future-proof if FD_ADD
ever recognizes more than just O_CLOEXEC.

Thanks,
Florian
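
The same two ideas can be tried in userspace, with C11 `static_assert` playing the role of `BUILD_BUG_ON`. This is a sketch only: the `EX_` constants are local stand-ins with values assumed from the UAPI headers, and `fd_alloc_flags()` is a hypothetical helper, not the kernel's `FD_ADD`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>

/* Local stand-ins (EX_ prefix is hypothetical) for the kernel constants. */
#define EX_OPEN_TREE_CLONE   1u
#define EX_OPEN_TREE_CLOEXEC ((unsigned int)O_CLOEXEC)
/* AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT | AT_EMPTY_PATH | AT_RECURSIVE (assumed values) */
#define EX_AT_MASK           (0x100u | 0x800u | 0x1000u | 0x8000u)

/* Compile-time checks: the build fails if any two namespaces share a bit. */
static_assert((EX_OPEN_TREE_CLOEXEC & EX_OPEN_TREE_CLONE) == 0,
	      "O_CLOEXEC overlaps OPEN_TREE_CLONE");
static_assert((EX_AT_MASK & (EX_OPEN_TREE_CLOEXEC | EX_OPEN_TREE_CLONE)) == 0,
	      "AT_* flags overlap O_CLOEXEC or OPEN_TREE_CLONE");

/*
 * Forward only the close-on-exec bit to the descriptor allocator, so
 * unrelated flag bits can never be misread as fd-allocation flags.
 */
static unsigned int fd_alloc_flags(unsigned int flags)
{
	return flags & EX_OPEN_TREE_CLOEXEC;
}
```

Note the polarity: the assertion passes when the masks are disjoint, which corresponds to `BUILD_BUG_ON()` firing when the AND of the two masks is non-zero.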

* Re: O_CLOEXEC use for OPEN_TREE_CLOEXEC
From: Christian Brauner @ 2026-01-16 10:00 UTC
To: Florian Weimer
Cc: linux-fsdevel, linux-api, linux-kernel, Al Viro, David Howells, DJ Delorie

On Thu, Jan 15, 2026 at 09:55:10AM +0100, Florian Weimer wrote:
> * Christian Brauner:
>
> > On Tue, Jan 13, 2026 at 11:40:55PM +0100, Florian Weimer wrote:
> >> In <linux/mount.h>, we have this:
> >>
> >> #define OPEN_TREE_CLOEXEC      O_CLOEXEC       /* Close the file on execve() */
> >>
> >> This causes a few pain points for us on the glibc side when we mirror
> >> this into <sys/mount.h> because O_CLOEXEC is defined in <fcntl.h>,
> >> which is one of the headers that's completely incompatible with the UAPI
> >> headers.
> >>
> >> The reason why this is painful is that O_CLOEXEC has at least three
> >> different values across architectures: 0x80000, 0x200000, 0x400000
> >>
> >> Even for the UAPI this isn't ideal because it effectively burns three
> >> open_tree flags, unless the flags are made architecture-specific, too.
> >
> > I think that just got cargo-culted... A long time ago some API defined
> > its flag as O_CLOEXEC and now a lot of APIs have done the same.
>
> Yes, it looks like inotify is in the same boat.

It's unfortunately not just inotify...:

include/linux/net.h:#define SOCK_CLOEXEC	O_CLOEXEC
include/uapi/drm/drm.h:#define DRM_CLOEXEC O_CLOEXEC
include/uapi/linux/eventfd.h:#define EFD_CLOEXEC O_CLOEXEC
include/uapi/linux/eventpoll.h:#define EPOLL_CLOEXEC O_CLOEXEC
include/uapi/linux/inotify.h:#define IN_CLOEXEC O_CLOEXEC
include/uapi/linux/signalfd.h:#define SFD_CLOEXEC O_CLOEXEC
include/uapi/linux/timerfd.h:#define TFD_CLOEXEC O_CLOEXEC

>
> > I'm pretty sure we can't change that now but we can document that this
> > shouldn't be #defined to O_CLOEXEC and should instead be a separate
> > per-syscall bit. But I think that's the best we can do right now.
>
> Maybe add something like this as a safety measure, to ensure that the
> flags don't overlap?
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index c58674a20cad..5bbfd379ec44 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3069,6 +3069,9 @@ static struct file *vfs_open_tree(int dfd, const char __user *filename, unsigned
>  	bool detached = flags & OPEN_TREE_CLONE;
>
>  	BUILD_BUG_ON(OPEN_TREE_CLOEXEC != O_CLOEXEC);
> +	BUILD_BUG_ON(O_CLOEXEC & OPEN_TREE_CLONE);
> +	BUILD_BUG_ON((AT_EMPTY_PATH | AT_NO_AUTOMOUNT | AT_RECURSIVE | AT_SYMLINK_NOFOLLOW) &
> +		     (O_CLOEXEC | OPEN_TREE_CLONE));

Yeah, we can do something like that!
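
The list of `*_CLOEXEC` aliases in this message can be cross-checked from userspace against whatever libc headers are installed. The sketch below assumes a Linux system whose libc provides these six headers; `DRM_CLOEXEC` is left out because it needs the DRM headers. The struct, table and function names are hypothetical. I would expect no mismatches, but that is an expectation, not something verified on every architecture.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/inotify.h>
#include <sys/signalfd.h>
#include <sys/socket.h>
#include <sys/timerfd.h>

/* A zero O_CLOEXEC would make the comparison below meaningless. */
static_assert(O_CLOEXEC != 0, "O_CLOEXEC must be a real flag bit");

struct cloexec_alias {
	const char *name;
	unsigned int value;
};

/* Userspace-visible counterparts of the kernel definitions listed above. */
static const struct cloexec_alias cloexec_aliases[] = {
	{ "SOCK_CLOEXEC",  SOCK_CLOEXEC },
	{ "EFD_CLOEXEC",   EFD_CLOEXEC },
	{ "EPOLL_CLOEXEC", EPOLL_CLOEXEC },
	{ "IN_CLOEXEC",    IN_CLOEXEC },
	{ "SFD_CLOEXEC",   SFD_CLOEXEC },
	{ "TFD_CLOEXEC",   TFD_CLOEXEC },
};

/* Count the aliases whose value differs from O_CLOEXEC on this platform. */
static size_t count_mismatched_aliases(void)
{
	size_t mismatches = 0;

	for (size_t i = 0; i < sizeof(cloexec_aliases) / sizeof(cloexec_aliases[0]); i++)
		if (cloexec_aliases[i].value != (unsigned int)O_CLOEXEC)
			mismatches++;
	return mismatches;
}
```

Each entry in the table is one more interface whose flag value is tied to the architecture-specific O_CLOEXEC.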