public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
* O_CLOEXEC use for OPEN_TREE_CLOEXEC
@ 2026-01-13 22:40 Florian Weimer
  2026-01-14 16:03 ` Christian Brauner
  0 siblings, 1 reply; 6+ messages in thread
From: Florian Weimer @ 2026-01-13 22:40 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-api, linux-kernel, Al Viro, David Howells, DJ Delorie

In <linux/mount.h>, we have this:

#define OPEN_TREE_CLOEXEC      O_CLOEXEC       /* Close the file on execve() */

This causes a few pain points for us to on the glibc side when we mirror
this into <linux/mount.h> becuse O_CLOEXEC is defined in <fcntl.h>,
which is one of the headers that's completely incompatible with the UAPI
headers.

The reason why this is painful is because O_CLOEXEC has at least three
different values across architectures: 0x80000, 0x200000, 0x400000

Even for the UAPI this isn't ideal because it effectively burns three
open_tree flags, unless the flags are made architecture-specific, too.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: O_CLOEXEC use for OPEN_TREE_CLOEXEC
  2026-01-13 22:40 O_CLOEXEC use for OPEN_TREE_CLOEXEC Florian Weimer
@ 2026-01-14 16:03 ` Christian Brauner
  2026-01-14 19:42   ` Andy Lutomirski
  2026-01-15  8:55   ` Florian Weimer
  0 siblings, 2 replies; 6+ messages in thread
From: Christian Brauner @ 2026-01-14 16:03 UTC (permalink / raw)
  To: Florian Weimer
  Cc: linux-fsdevel, linux-api, linux-kernel, Al Viro, David Howells,
	DJ Delorie

On Tue, Jan 13, 2026 at 11:40:55PM +0100, Florian Weimer wrote:
> In <linux/mount.h>, we have this:
> 
> #define OPEN_TREE_CLOEXEC      O_CLOEXEC       /* Close the file on execve() */
> 
> This causes a few pain points for us to on the glibc side when we mirror
> this into <linux/mount.h> becuse O_CLOEXEC is defined in <fcntl.h>,
> which is one of the headers that's completely incompatible with the UAPI
> headers.
> 
> The reason why this is painful is because O_CLOEXEC has at least three
> different values across architectures: 0x80000, 0x200000, 0x400000
> 
> Even for the UAPI this isn't ideal because it effectively burns three
> open_tree flags, unless the flags are made architecture-specific, too.

I think that just got cargo-culted... A long time ago some API define as
O_CLOEXEC and now a lot of APIs have done the same. I'm pretty sure we
can't change that now but we can document that this shouldn't be ifdefed
and instead be a separate per-syscall bit. But I think that's the best
we can do right now.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: O_CLOEXEC use for OPEN_TREE_CLOEXEC
  2026-01-14 16:03 ` Christian Brauner
@ 2026-01-14 19:42   ` Andy Lutomirski
  2026-01-14 21:18     ` Aleksa Sarai
  2026-01-15  8:55   ` Florian Weimer
  1 sibling, 1 reply; 6+ messages in thread
From: Andy Lutomirski @ 2026-01-14 19:42 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Florian Weimer, linux-fsdevel, linux-api, linux-kernel, Al Viro,
	David Howells, DJ Delorie

On Wed, Jan 14, 2026 at 8:09 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Tue, Jan 13, 2026 at 11:40:55PM +0100, Florian Weimer wrote:
> > In <linux/mount.h>, we have this:
> >
> > #define OPEN_TREE_CLOEXEC      O_CLOEXEC       /* Close the file on execve() */
> >
> > This causes a few pain points for us to on the glibc side when we mirror
> > this into <linux/mount.h> becuse O_CLOEXEC is defined in <fcntl.h>,
> > which is one of the headers that's completely incompatible with the UAPI
> > headers.
> >
> > The reason why this is painful is because O_CLOEXEC has at least three
> > different values across architectures: 0x80000, 0x200000, 0x400000
> >
> > Even for the UAPI this isn't ideal because it effectively burns three
> > open_tree flags, unless the flags are made architecture-specific, too.
>
> I think that just got cargo-culted... A long time ago some API define as
> O_CLOEXEC and now a lot of APIs have done the same. I'm pretty sure we
> can't change that now but we can document that this shouldn't be ifdefed
> and instead be a separate per-syscall bit. But I think that's the best
> we can do right now.
>

How about, for future syscalls, we make CLOEXEC unconditional?  If
anyone wants an ofd to get inherited across exec, they can F_SETFD it
themselves.

--Andy

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: O_CLOEXEC use for OPEN_TREE_CLOEXEC
  2026-01-14 19:42   ` Andy Lutomirski
@ 2026-01-14 21:18     ` Aleksa Sarai
  0 siblings, 0 replies; 6+ messages in thread
From: Aleksa Sarai @ 2026-01-14 21:18 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Christian Brauner, Florian Weimer, linux-fsdevel, linux-api,
	linux-kernel, Al Viro, David Howells, DJ Delorie

[-- Attachment #1: Type: text/plain, Size: 2010 bytes --]

On 2026-01-14, Andy Lutomirski <luto@amacapital.net> wrote:
> On Wed, Jan 14, 2026 at 8:09 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Tue, Jan 13, 2026 at 11:40:55PM +0100, Florian Weimer wrote:
> > > In <linux/mount.h>, we have this:
> > >
> > > #define OPEN_TREE_CLOEXEC      O_CLOEXEC       /* Close the file on execve() */
> > >
> > > This causes a few pain points for us to on the glibc side when we mirror
> > > this into <linux/mount.h> becuse O_CLOEXEC is defined in <fcntl.h>,
> > > which is one of the headers that's completely incompatible with the UAPI
> > > headers.
> > >
> > > The reason why this is painful is because O_CLOEXEC has at least three
> > > different values across architectures: 0x80000, 0x200000, 0x400000
> > >
> > > Even for the UAPI this isn't ideal because it effectively burns three
> > > open_tree flags, unless the flags are made architecture-specific, too.
> >
> > I think that just got cargo-culted... A long time ago some API define as
> > O_CLOEXEC and now a lot of APIs have done the same. I'm pretty sure we
> > can't change that now but we can document that this shouldn't be ifdefed
> > and instead be a separate per-syscall bit. But I think that's the best
> > we can do right now.
> >
> 
> How about, for future syscalls, we make CLOEXEC unconditional?  If
> anyone wants an ofd to get inherited across exec, they can F_SETFD it
> themselves.

I believe newer interfaces have already started doing that (e.g., all of
the pidfd stuff is O_CLOEXEC by default) but we should definitely update
the documentation in Documentation/process/adding-syscalls.rst to stop
recommending the inclusion of the O_CLOEXEC flag.

The funniest thing about open_tree(2) is that it actually borrows flag
bits from three distinct namespaces! It has an OPEN_TREE_* namespace,
the AT_* namespace (which now has a concept of "per-syscall flags"), and
O_CLOEXEC. What a fun interface!

-- 
Aleksa Sarai
https://www.cyphar.com/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 265 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: O_CLOEXEC use for OPEN_TREE_CLOEXEC
  2026-01-14 16:03 ` Christian Brauner
  2026-01-14 19:42   ` Andy Lutomirski
@ 2026-01-15  8:55   ` Florian Weimer
  2026-01-16 10:00     ` Christian Brauner
  1 sibling, 1 reply; 6+ messages in thread
From: Florian Weimer @ 2026-01-15  8:55 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, linux-api, linux-kernel, Al Viro, David Howells,
	DJ Delorie

* Christian Brauner:

> On Tue, Jan 13, 2026 at 11:40:55PM +0100, Florian Weimer wrote:
>> In <linux/mount.h>, we have this:
>> 
>> #define OPEN_TREE_CLOEXEC      O_CLOEXEC       /* Close the file on execve() */
>> 
>> This causes a few pain points for us to on the glibc side when we mirror
>> this into <linux/mount.h> becuse O_CLOEXEC is defined in <fcntl.h>,
>> which is one of the headers that's completely incompatible with the UAPI
>> headers.
>> 
>> The reason why this is painful is because O_CLOEXEC has at least three
>> different values across architectures: 0x80000, 0x200000, 0x400000
>> 
>> Even for the UAPI this isn't ideal because it effectively burns three
>> open_tree flags, unless the flags are made architecture-specific, too.
>
> I think that just got cargo-culted... A long time ago some API define as
> O_CLOEXEC and now a lot of APIs have done the same.

Yes, it looks like inotify is in the same boat.

> I'm pretty sure we can't change that now but we can document that this
> shouldn't be ifdefed and instead be a separate per-syscall bit. But I
> think that's the best we can do right now.

Maybe add something like this as a safety measure, to ensure that the
flags don't overlap?

diff --git a/fs/namespace.c b/fs/namespace.c
index c58674a20cad..5bbfd379ec44 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3069,6 +3069,9 @@ static struct file *vfs_open_tree(int dfd, const char __user *filename, unsigned
 	bool detached = flags & OPEN_TREE_CLONE;
 
 	BUILD_BUG_ON(OPEN_TREE_CLOEXEC != O_CLOEXEC);
+	BUILD_BUG_IN(!(O_CLOEXEC & OPEN_TREE_CLONE));
+	BUILD_BUG_ON(!((AT_EMPTY_PATH | AT_NO_AUTOMOUNT | AT_RECURSIVE | AT_SYMLINK_NOFOLLOW) &
+		       (O_CLOEXEC | OPEN_TREE_CLONE)));
 
 	if (flags & ~(AT_EMPTY_PATH | AT_NO_AUTOMOUNT | AT_RECURSIVE |
 		      AT_SYMLINK_NOFOLLOW | OPEN_TREE_CLONE |
@@ -3100,7 +3103,7 @@ static struct file *vfs_open_tree(int dfd, const char __user *filename, unsigned
 
 SYSCALL_DEFINE3(open_tree, int, dfd, const char __user *, filename, unsigned, flags)
 {
-	return FD_ADD(flags, vfs_open_tree(dfd, filename, flags));
+	return FD_ADD(flags & O_CLOEXEC, vfs_open_tree(dfd, filename, flags));
 }
 
 /*

(Completely untested.)

Passing the mix of flags to FD_ADD isn't really future-proof if FD_ADD
ever recognizes more than just O_CLOEXEC.

Thanks,
Florian


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: O_CLOEXEC use for OPEN_TREE_CLOEXEC
  2026-01-15  8:55   ` Florian Weimer
@ 2026-01-16 10:00     ` Christian Brauner
  0 siblings, 0 replies; 6+ messages in thread
From: Christian Brauner @ 2026-01-16 10:00 UTC (permalink / raw)
  To: Florian Weimer
  Cc: linux-fsdevel, linux-api, linux-kernel, Al Viro, David Howells,
	DJ Delorie

On Thu, Jan 15, 2026 at 09:55:10AM +0100, Florian Weimer wrote:
> * Christian Brauner:
> 
> > On Tue, Jan 13, 2026 at 11:40:55PM +0100, Florian Weimer wrote:
> >> In <linux/mount.h>, we have this:
> >> 
> >> #define OPEN_TREE_CLOEXEC      O_CLOEXEC       /* Close the file on execve() */
> >> 
> >> This causes a few pain points for us to on the glibc side when we mirror
> >> this into <linux/mount.h> becuse O_CLOEXEC is defined in <fcntl.h>,
> >> which is one of the headers that's completely incompatible with the UAPI
> >> headers.
> >> 
> >> The reason why this is painful is because O_CLOEXEC has at least three
> >> different values across architectures: 0x80000, 0x200000, 0x400000
> >> 
> >> Even for the UAPI this isn't ideal because it effectively burns three
> >> open_tree flags, unless the flags are made architecture-specific, too.
> >
> > I think that just got cargo-culted... A long time ago some API define as
> > O_CLOEXEC and now a lot of APIs have done the same.
> 
> Yes, it looks like inotify is in the same boat.

It's unfortunately nost just inotify...:

include/linux/net.h:#define SOCK_CLOEXEC        O_CLOEXEC
include/uapi/drm/drm.h:#define DRM_CLOEXEC O_CLOEXEC
include/uapi/linux/eventfd.h:#define EFD_CLOEXEC O_CLOEXEC
include/uapi/linux/eventpoll.h:#define EPOLL_CLOEXEC O_CLOEXEC
include/uapi/linux/inotify.h:#define IN_CLOEXEC O_CLOEXEC
include/uapi/linux/signalfd.h:#define SFD_CLOEXEC O_CLOEXEC
include/uapi/linux/timerfd.h:#define TFD_CLOEXEC O_CLOEXEC

> 
> > I'm pretty sure we can't change that now but we can document that this
> > shouldn't be ifdefed and instead be a separate per-syscall bit. But I
> > think that's the best we can do right now.
> 
> Maybe add something like this as a safety measure, to ensure that the
> flags don't overlap?
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index c58674a20cad..5bbfd379ec44 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3069,6 +3069,9 @@ static struct file *vfs_open_tree(int dfd, const char __user *filename, unsigned
>  	bool detached = flags & OPEN_TREE_CLONE;
>  
>  	BUILD_BUG_ON(OPEN_TREE_CLOEXEC != O_CLOEXEC);
> +	BUILD_BUG_IN(!(O_CLOEXEC & OPEN_TREE_CLONE));
> +	BUILD_BUG_ON(!((AT_EMPTY_PATH | AT_NO_AUTOMOUNT | AT_RECURSIVE | AT_SYMLINK_NOFOLLOW) &
> +		       (O_CLOEXEC | OPEN_TREE_CLONE)));

Yeah, we can do something like that!

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-01-16 10:00 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-13 22:40 O_CLOEXEC use for OPEN_TREE_CLOEXEC Florian Weimer
2026-01-14 16:03 ` Christian Brauner
2026-01-14 19:42   ` Andy Lutomirski
2026-01-14 21:18     ` Aleksa Sarai
2026-01-15  8:55   ` Florian Weimer
2026-01-16 10:00     ` Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox