* O_CLOEXEC use for OPEN_TREE_CLOEXEC
From: Florian Weimer @ 2026-01-13 22:40 UTC
To: linux-fsdevel; +Cc: linux-api, linux-kernel, Al Viro, David Howells, DJ Delorie
In <linux/mount.h>, we have this:
#define OPEN_TREE_CLOEXEC O_CLOEXEC /* Close the file on execve() */
This causes a few pain points for us on the glibc side when we mirror
this into <linux/mount.h> because O_CLOEXEC is defined in <fcntl.h>,
which is one of the headers that's completely incompatible with the UAPI
headers.
This is painful because O_CLOEXEC has at least three different values
across architectures: 0x80000, 0x200000 and 0x400000.
Even for the UAPI this isn't ideal because it effectively burns three
open_tree flags, unless the flags are made architecture-specific, too.
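To make the "burns three flags" point concrete, here is a minimal C sketch. It assumes only the three O_CLOEXEC values listed above; `cloexec_values`, `burned_mask()` and `flag_is_portable()` are hypothetical names used for illustration. If a syscall accepts O_CLOEXEC as one of its own flags, a new architecture-independent flag bit is only safe when it avoids every value O_CLOEXEC takes on any architecture:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The O_CLOEXEC values quoted above. The message says "at least three";
 * this sketch assumes these three are the whole set. */
static const unsigned int cloexec_values[] = { 0x80000u, 0x200000u, 0x400000u };

/* Union of every bit that O_CLOEXEC occupies on some architecture. */
unsigned int burned_mask(void)
{
	unsigned int mask = 0;

	for (size_t i = 0; i < sizeof(cloexec_values) / sizeof(cloexec_values[0]); i++)
		mask |= cloexec_values[i];
	return mask;
}

/* A flag shared by all architectures must not collide with any O_CLOEXEC value. */
bool flag_is_portable(unsigned int candidate)
{
	return (candidate & burned_mask()) == 0;
}
```

With these values the reserved mask is 0x680000: a candidate such as 0x200000 is unusable everywhere, even though it only means O_CLOEXEC on some architectures.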
Thanks,
Florian
* Re: O_CLOEXEC use for OPEN_TREE_CLOEXEC
From: Christian Brauner @ 2026-01-14 16:03 UTC
To: Florian Weimer
Cc: linux-fsdevel, linux-api, linux-kernel, Al Viro, David Howells,
DJ Delorie
On Tue, Jan 13, 2026 at 11:40:55PM +0100, Florian Weimer wrote:
> In <linux/mount.h>, we have this:
>
> #define OPEN_TREE_CLOEXEC O_CLOEXEC /* Close the file on execve() */
>
> This causes a few pain points for us on the glibc side when we mirror
> this into <linux/mount.h> because O_CLOEXEC is defined in <fcntl.h>,
> which is one of the headers that's completely incompatible with the UAPI
> headers.
>
> This is painful because O_CLOEXEC has at least three different values
> across architectures: 0x80000, 0x200000 and 0x400000.
>
> Even for the UAPI this isn't ideal because it effectively burns three
> open_tree flags, unless the flags are made architecture-specific, too.
I think that just got cargo-culted... A long time ago some API defined its
flag as O_CLOEXEC and now a lot of APIs have done the same. I'm pretty sure we
can't change that now but we can document that this shouldn't be ifdefed
and instead be a separate per-syscall bit. But I think that's the best
we can do right now.
* Re: O_CLOEXEC use for OPEN_TREE_CLOEXEC
From: Andy Lutomirski @ 2026-01-14 19:42 UTC
To: Christian Brauner
Cc: Florian Weimer, linux-fsdevel, linux-api, linux-kernel, Al Viro,
David Howells, DJ Delorie
On Wed, Jan 14, 2026 at 8:09 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Tue, Jan 13, 2026 at 11:40:55PM +0100, Florian Weimer wrote:
> > In <linux/mount.h>, we have this:
> >
> > #define OPEN_TREE_CLOEXEC O_CLOEXEC /* Close the file on execve() */
> >
> > This causes a few pain points for us on the glibc side when we mirror
> > this into <linux/mount.h> because O_CLOEXEC is defined in <fcntl.h>,
> > which is one of the headers that's completely incompatible with the UAPI
> > headers.
> >
> > This is painful because O_CLOEXEC has at least three different values
> > across architectures: 0x80000, 0x200000 and 0x400000.
> >
> > Even for the UAPI this isn't ideal because it effectively burns three
> > open_tree flags, unless the flags are made architecture-specific, too.
>
> I think that just got cargo-culted... A long time ago some API defined its
> flag as O_CLOEXEC and now a lot of APIs have done the same. I'm pretty sure we
> can't change that now but we can document that this shouldn't be ifdefed
> and instead be a separate per-syscall bit. But I think that's the best
> we can do right now.
>
How about, for future syscalls, we make CLOEXEC unconditional? If
anyone wants an ofd to get inherited across exec, they can F_SETFD it
themselves.
--Andy
* Re: O_CLOEXEC use for OPEN_TREE_CLOEXEC
From: Aleksa Sarai @ 2026-01-14 21:18 UTC
To: Andy Lutomirski
Cc: Christian Brauner, Florian Weimer, linux-fsdevel, linux-api,
linux-kernel, Al Viro, David Howells, DJ Delorie
On 2026-01-14, Andy Lutomirski <luto@amacapital.net> wrote:
> On Wed, Jan 14, 2026 at 8:09 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Tue, Jan 13, 2026 at 11:40:55PM +0100, Florian Weimer wrote:
> > > In <linux/mount.h>, we have this:
> > >
> > > #define OPEN_TREE_CLOEXEC O_CLOEXEC /* Close the file on execve() */
> > >
> > > This causes a few pain points for us on the glibc side when we mirror
> > > this into <linux/mount.h> because O_CLOEXEC is defined in <fcntl.h>,
> > > which is one of the headers that's completely incompatible with the UAPI
> > > headers.
> > >
> > > This is painful because O_CLOEXEC has at least three different values
> > > across architectures: 0x80000, 0x200000 and 0x400000.
> > >
> > > Even for the UAPI this isn't ideal because it effectively burns three
> > > open_tree flags, unless the flags are made architecture-specific, too.
> >
> > I think that just got cargo-culted... A long time ago some API defined its
> > flag as O_CLOEXEC and now a lot of APIs have done the same. I'm pretty sure we
> > can't change that now but we can document that this shouldn't be ifdefed
> > and instead be a separate per-syscall bit. But I think that's the best
> > we can do right now.
> >
>
> How about, for future syscalls, we make CLOEXEC unconditional? If
> anyone wants an ofd to get inherited across exec, they can F_SETFD it
> themselves.
I believe newer interfaces have already started doing that (e.g., all of
the pidfd stuff is O_CLOEXEC by default) but we should definitely update
the documentation in Documentation/process/adding-syscalls.rst to stop
recommending the inclusion of the O_CLOEXEC flag.
The funniest thing about open_tree(2) is that it actually borrows flag
bits from three distinct namespaces! It has an OPEN_TREE_* namespace,
the AT_* namespace (which now has a concept of "per-syscall flags"), and
O_CLOEXEC. What a fun interface!
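A minimal sketch of what borrowing from three namespaces means for flag validation. It covers only the flags visible in the vfs_open_tree() check shown in this thread, and the numeric values are copied by hand as an assumption (OPEN_TREE_CLONE and the AT_* values from the Linux UAPI headers, O_CLOEXEC as on x86-64); the `EX_` names and `ex_check_open_tree_flags()` are hypothetical:

```c
#include <assert.h>
#include <errno.h>

/* Hand-copied values (assumption); prefixed so they cannot clash with
 * the real definitions in system headers. */
#define EX_OPEN_TREE_CLONE     0x1u      /* OPEN_TREE_* namespace */
#define EX_AT_SYMLINK_NOFOLLOW 0x100u    /* AT_* namespace */
#define EX_AT_NO_AUTOMOUNT     0x800u
#define EX_AT_EMPTY_PATH       0x1000u
#define EX_AT_RECURSIVE        0x8000u
#define EX_O_CLOEXEC           0x80000u  /* O_* namespace, x86-64 value only */

/* Union of the accepted bits from all three namespaces. On an architecture
 * with a different O_CLOEXEC, this mask would change as well. */
#define EX_OPEN_TREE_VALID (EX_OPEN_TREE_CLONE | EX_AT_SYMLINK_NOFOLLOW | \
			    EX_AT_NO_AUTOMOUNT | EX_AT_EMPTY_PATH |     \
			    EX_AT_RECURSIVE | EX_O_CLOEXEC)

/* Reject any bit outside that union. Returns 0 or -EINVAL. */
int ex_check_open_tree_flags(unsigned int flags)
{
	return (flags & ~EX_OPEN_TREE_VALID) ? -EINVAL : 0;
}
```

The single mask hides the fact that each group of bits is owned by a different set of headers, which is why an overlap between them is easy to introduce and hard to notice.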
--
Aleksa Sarai
https://www.cyphar.com/
* Re: O_CLOEXEC use for OPEN_TREE_CLOEXEC
From: Florian Weimer @ 2026-01-15 8:55 UTC
To: Christian Brauner
Cc: linux-fsdevel, linux-api, linux-kernel, Al Viro, David Howells,
DJ Delorie
* Christian Brauner:
> On Tue, Jan 13, 2026 at 11:40:55PM +0100, Florian Weimer wrote:
>> In <linux/mount.h>, we have this:
>>
>> #define OPEN_TREE_CLOEXEC O_CLOEXEC /* Close the file on execve() */
>>
>> This causes a few pain points for us on the glibc side when we mirror
>> this into <linux/mount.h> because O_CLOEXEC is defined in <fcntl.h>,
>> which is one of the headers that's completely incompatible with the UAPI
>> headers.
>>
>> This is painful because O_CLOEXEC has at least three different values
>> across architectures: 0x80000, 0x200000 and 0x400000.
>>
>> Even for the UAPI this isn't ideal because it effectively burns three
>> open_tree flags, unless the flags are made architecture-specific, too.
>
> I think that just got cargo-culted... A long time ago some API defined its
> flag as O_CLOEXEC and now a lot of APIs have done the same.
Yes, it looks like inotify is in the same boat.
> I'm pretty sure we can't change that now but we can document that this
> shouldn't be ifdefed and instead be a separate per-syscall bit. But I
> think that's the best we can do right now.
Maybe add something like this as a safety measure, to ensure that the
flags don't overlap?
diff --git a/fs/namespace.c b/fs/namespace.c
index c58674a20cad..5bbfd379ec44 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3069,6 +3069,9 @@ static struct file *vfs_open_tree(int dfd, const char __user *filename, unsigned
 	bool detached = flags & OPEN_TREE_CLONE;
 
 	BUILD_BUG_ON(OPEN_TREE_CLOEXEC != O_CLOEXEC);
+	BUILD_BUG_ON(O_CLOEXEC & OPEN_TREE_CLONE);
+	BUILD_BUG_ON((AT_EMPTY_PATH | AT_NO_AUTOMOUNT | AT_RECURSIVE | AT_SYMLINK_NOFOLLOW) &
+		     (O_CLOEXEC | OPEN_TREE_CLONE));
 
 	if (flags & ~(AT_EMPTY_PATH | AT_NO_AUTOMOUNT | AT_RECURSIVE |
 		      AT_SYMLINK_NOFOLLOW | OPEN_TREE_CLONE |
@@ -3100,7 +3103,7 @@ static struct file *vfs_open_tree(int dfd, const char __user *filename, unsigned
 
 SYSCALL_DEFINE3(open_tree, int, dfd, const char __user *, filename, unsigned, flags)
 {
-	return FD_ADD(flags, vfs_open_tree(dfd, filename, flags));
+	return FD_ADD(flags & O_CLOEXEC, vfs_open_tree(dfd, filename, flags));
 }
 
 /*
(Completely untested.)
Passing the mix of flags to FD_ADD isn't really future-proof if FD_ADD
ever recognizes more than just O_CLOEXEC.
Thanks,
Florian
* Re: O_CLOEXEC use for OPEN_TREE_CLOEXEC
From: Christian Brauner @ 2026-01-16 10:00 UTC
To: Florian Weimer
Cc: linux-fsdevel, linux-api, linux-kernel, Al Viro, David Howells,
DJ Delorie
On Thu, Jan 15, 2026 at 09:55:10AM +0100, Florian Weimer wrote:
> * Christian Brauner:
>
> > On Tue, Jan 13, 2026 at 11:40:55PM +0100, Florian Weimer wrote:
> >> In <linux/mount.h>, we have this:
> >>
> >> #define OPEN_TREE_CLOEXEC O_CLOEXEC /* Close the file on execve() */
> >>
> >> This causes a few pain points for us on the glibc side when we mirror
> >> this into <linux/mount.h> because O_CLOEXEC is defined in <fcntl.h>,
> >> which is one of the headers that's completely incompatible with the UAPI
> >> headers.
> >>
> >> This is painful because O_CLOEXEC has at least three different values
> >> across architectures: 0x80000, 0x200000 and 0x400000.
> >>
> >> Even for the UAPI this isn't ideal because it effectively burns three
> >> open_tree flags, unless the flags are made architecture-specific, too.
> >
> > I think that just got cargo-culted... A long time ago some API defined its
> > flag as O_CLOEXEC and now a lot of APIs have done the same.
>
> Yes, it looks like inotify is in the same boat.
Unfortunately, it's not just inotify:
include/linux/net.h:#define SOCK_CLOEXEC O_CLOEXEC
include/uapi/drm/drm.h:#define DRM_CLOEXEC O_CLOEXEC
include/uapi/linux/eventfd.h:#define EFD_CLOEXEC O_CLOEXEC
include/uapi/linux/eventpoll.h:#define EPOLL_CLOEXEC O_CLOEXEC
include/uapi/linux/inotify.h:#define IN_CLOEXEC O_CLOEXEC
include/uapi/linux/signalfd.h:#define SFD_CLOEXEC O_CLOEXEC
include/uapi/linux/timerfd.h:#define TFD_CLOEXEC O_CLOEXEC
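The aliasing listed above can be checked from userspace. A minimal sketch, assuming a Linux system with glibc-style headers (<sys/eventfd.h>, <sys/epoll.h> and so on); DRM_CLOEXEC is left out because it needs the DRM UAPI headers, and the helper `eventfd_cloexec_works()` is hypothetical:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/inotify.h>
#include <sys/signalfd.h>
#include <sys/socket.h>
#include <sys/timerfd.h>
#include <unistd.h>

/* Each per-API flag is O_CLOEXEC under another name, so it inherits the
 * per-architecture value (and the burned bits) of O_CLOEXEC. */
_Static_assert(SOCK_CLOEXEC == O_CLOEXEC, "SOCK_CLOEXEC differs");
_Static_assert(EFD_CLOEXEC == O_CLOEXEC, "EFD_CLOEXEC differs");
_Static_assert(EPOLL_CLOEXEC == O_CLOEXEC, "EPOLL_CLOEXEC differs");
_Static_assert(IN_CLOEXEC == O_CLOEXEC, "IN_CLOEXEC differs");
_Static_assert(SFD_CLOEXEC == O_CLOEXEC, "SFD_CLOEXEC differs");
_Static_assert(TFD_CLOEXEC == O_CLOEXEC, "TFD_CLOEXEC differs");

/* Runtime spot check: an eventfd created with EFD_CLOEXEC should carry
 * FD_CLOEXEC. Returns 1 if it does, 0 if not, -1 if eventfd() fails. */
int eventfd_cloexec_works(void)
{
	int fd = eventfd(0, EFD_CLOEXEC);
	int fdflags;

	if (fd == -1)
		return -1;
	fdflags = fcntl(fd, F_GETFD);
	close(fd);
	return fdflags != -1 && (fdflags & FD_CLOEXEC) != 0;
}
```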
>
> > I'm pretty sure we can't change that now but we can document that this
> > shouldn't be ifdefed and instead be a separate per-syscall bit. But I
> > think that's the best we can do right now.
>
> Maybe add something like this as a safety measure, to ensure that the
> flags don't overlap?
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index c58674a20cad..5bbfd379ec44 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3069,6 +3069,9 @@ static struct file *vfs_open_tree(int dfd, const char __user *filename, unsigned
>  	bool detached = flags & OPEN_TREE_CLONE;
>
>  	BUILD_BUG_ON(OPEN_TREE_CLOEXEC != O_CLOEXEC);
> +	BUILD_BUG_ON(O_CLOEXEC & OPEN_TREE_CLONE);
> +	BUILD_BUG_ON((AT_EMPTY_PATH | AT_NO_AUTOMOUNT | AT_RECURSIVE | AT_SYMLINK_NOFOLLOW) &
> +		     (O_CLOEXEC | OPEN_TREE_CLONE));
Yeah, we can do something like that!
Thread overview: 6+ messages
2026-01-13 22:40 O_CLOEXEC use for OPEN_TREE_CLOEXEC Florian Weimer
2026-01-14 16:03 ` Christian Brauner
2026-01-14 19:42 ` Andy Lutomirski
2026-01-14 21:18 ` Aleksa Sarai
2026-01-15 8:55 ` Florian Weimer
2026-01-16 10:00 ` Christian Brauner