public inbox for linux-api@vger.kernel.org
From: Christian Brauner <brauner@kernel.org>
To: Jori Koolstra <jkoolstra@xs4all.nl>, Jeff Layton <jlayton@kernel.org>
Cc: "Andy Lutomirski" <luto@kernel.org>,
	"Thomas Gleixner" <tglx@kernel.org>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	x86@kernel.org, "Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"H . Peter Anvin" <hpa@zytor.com>, "Jan Kara" <jack@suse.cz>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Andrey Albershteyn" <aalbersh@redhat.com>,
	"Masami Hiramatsu" <mhiramat@kernel.org>,
	"Jiri Olsa" <jolsa@kernel.org>,
	"Thomas Weißschuh" <thomas.weissschuh@linutronix.de>,
	"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	"Aleksa Sarai" <cyphar@cyphar.com>,
	cmirabil@redhat.com,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-api@vger.kernel.org, linux-arch@vger.kernel.org
Subject: Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
Date: Mon, 27 Apr 2026 17:48:43 +0200
Message-ID: <20260427-umlegen-aufbau-ee3a97f1528a@brauner>
In-Reply-To: <20260412135434.3095416-2-jkoolstra@xs4all.nl>

On Sun, Apr 12, 2026 at 03:54:33PM +0200, Jori Koolstra wrote:
> Currently there is no way to race-freely create and open a directory.
> For regular files we have open(O_CREAT) for creating a new file inode,
> and returning a pinning fd to it. The lack of such functionality for
> directories means that when populating a directory tree there's always
> a race involved: the inodes first need to be created, and then opened
> to adjust their permissions/ownership/labels/timestamps/acls/xattrs/...,
> but in the time window between the creation and the opening they might
> be replaced by something else.
> 
> Addressing this race without proper APIs is possible (by immediately
> fstat()ing what was opened, to verify that it has the right inode type),
> but difficult to get right. Hence, mkdirat2() that creates a directory
> and returns an O_DIRECTORY fd is useful.
> 
> This feature idea (and description) is taken from the UAPI group:
> https://github.com/uapi-group/kernel-features?tab=readme-ov-file#race-free-creation-and-opening-of-non-file-inodes
> 
> Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
> ---
>  arch/x86/entry/syscalls/syscall_64.tbl |  1 +
>  fs/internal.h                          |  2 ++
>  fs/namei.c                             | 44 +++++++++++++++++++++++---
>  include/linux/syscalls.h               |  2 ++
>  include/uapi/asm-generic/unistd.h      |  5 ++-
>  scripts/syscall.tbl                    |  1 +
>  6 files changed, 50 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index 524155d655da..e200ca2067a4 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -396,6 +396,7 @@
>  469	common	file_setattr		sys_file_setattr
>  470	common	listns			sys_listns
>  471	common	rseq_slice_yield	sys_rseq_slice_yield
> +472	common	mkdirat2		sys_mkdirat2
>  
>  #
>  # Due to a historical design error, certain syscalls are numbered differently
> diff --git a/fs/internal.h b/fs/internal.h
> index cbc384a1aa09..c6a79afadacf 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -59,6 +59,8 @@ int may_linkat(struct mnt_idmap *idmap, const struct path *link);
>  int filename_renameat2(int olddfd, struct filename *oldname, int newdfd,
>  		 struct filename *newname, unsigned int flags);
>  int filename_mkdirat(int dfd, struct filename *name, umode_t mode);
> +struct file *do_file_mkdirat(int dfd, struct filename *name, umode_t mode,
> +		unsigned int flags, bool open);
>  int filename_mknodat(int dfd, struct filename *name, umode_t mode, unsigned int dev);
>  int filename_symlinkat(struct filename *from, int newdfd, struct filename *to);
>  int filename_linkat(int olddfd, struct filename *old, int newdfd,
> diff --git a/fs/namei.c b/fs/namei.c
> index a880454a6415..6451e96dc225 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -5255,18 +5255,36 @@ struct dentry *vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
>  }
>  EXPORT_SYMBOL(vfs_mkdir);
>  
> -int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
> +static int mkdirat_lookup_flags(unsigned int flags)
> +{
> +	int lookup_flags = LOOKUP_DIRECTORY;
> +
> +	if (!(flags & AT_SYMLINK_NOFOLLOW))
> +		lookup_flags |= LOOKUP_FOLLOW;
> +	if (!(flags & AT_NO_AUTOMOUNT))
> +		lookup_flags |= LOOKUP_AUTOMOUNT;
> +
> +	return lookup_flags;
> +}
> +
> +int filename_mkdirat(int dfd, struct filename *name, umode_t mode) {
> +	return PTR_ERR_OR_ZERO(do_file_mkdirat(dfd, name, mode, 0, false));
> +}
> +
> +struct file *do_file_mkdirat(int dfd, struct filename *name, umode_t mode,
> +		unsigned int flags, bool open)
>  {
>  	struct dentry *dentry;
>  	struct path path;
>  	int error;
> -	unsigned int lookup_flags = LOOKUP_DIRECTORY;
> +	struct file *filp = NULL;
> +	unsigned int lookup_flags = mkdirat_lookup_flags(flags);
>  	struct delegated_inode delegated_inode = { };
>  
>  retry:
>  	dentry = filename_create(dfd, name, &path, lookup_flags);
>  	if (IS_ERR(dentry))
> -		return PTR_ERR(dentry);
> +		return ERR_CAST(dentry);
>  
>  	error = security_path_mkdir(&path, dentry,
>  			mode_strip_umask(path.dentry->d_inode, mode));
> @@ -5276,6 +5294,10 @@ int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
>  		if (IS_ERR(dentry))
>  			error = PTR_ERR(dentry);
>  	}
> +	if (open && !error && !is_delegated(&delegated_inode)) {
> +		const struct path new_path = { .mnt = path.mnt, .dentry = dentry };
> +		filp = dentry_open(&new_path, O_DIRECTORY, current_cred());
> +	}

So this is definitely a patchset worth doing, but it will be hairy. And
Mateusz is right: as written this doesn't work. The canonical pattern,
which e.g. dentry_open() follows, is to preallocate the file.

I do wonder, though, whether we shouldn't just make O_CREAT | O_DIRECTORY
work. I remember leaving a vague comment about this in [1] a few years
ago. It might even be less hairy to get that one right, as all the
thinking for O_CREAT is already there.
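
For illustration, a minimal userspace probe of what O_CREAT | O_DIRECTORY
does today. The helper name probe_creat_directory() is hypothetical and not
part of the patch. Going by the subject of [1], kernels carrying that
commit are expected to return EINVAL; older kernels behaved differently
and may leave a regular file behind, so this is best run against a scratch
path.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): ask openat() to create and
 * open a directory in one call. Returns the new fd on success, or a
 * negative errno value on failure.
 */
static int probe_creat_directory(int dfd, const char *name)
{
	int fd = openat(dfd, name,
			O_CREAT | O_DIRECTORY | O_RDONLY | O_CLOEXEC, 0700);

	if (fd < 0)
		return -errno;
	return fd;
}
```

A caller would treat -EINVAL as "this flag combination is rejected" and
must close any fd that is returned.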

What was the rationale for mkdirat2() instead of threading this through
openat()/openat2() with O_CREAT?

And a side question: @Jeff, can NFS atomic open deal with O_CREAT |
O_DIRECTORY?

[1]: 43b450632676 ("open: return EINVAL for O_DIRECTORY | O_CREAT")
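
For reference, the userspace workaround described in the quoted commit
message (create the directory, open it, then fstat() the result) can be
sketched roughly as follows. The helper name mkdir_then_open() is
hypothetical, and this is only a sketch of the racy pattern, not a
recommended implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a directory, open it, and check that the
 * opened object is a directory. Returns the fd or a negative errno.
 * This is NOT race-free; see the comment between the two syscalls.
 */
static int mkdir_then_open(int dfd, const char *name, mode_t mode)
{
	struct stat st;
	int fd, err;

	if (mkdirat(dfd, name, mode) < 0)
		return -errno;

	/* Race window: 'name' can be replaced before openat() runs. */
	fd = openat(dfd, name,
		    O_RDONLY | O_DIRECTORY | O_NOFOLLOW | O_CLOEXEC);
	if (fd < 0)
		return -errno;

	if (fstat(fd, &st) < 0) {
		err = errno;
		close(fd);
		return -err;
	}
	if (!S_ISDIR(st.st_mode)) {
		close(fd);
		return -ENOTDIR;
	}
	return fd;
}
```

Note that the fstat() check only confirms the inode type. It cannot tell
whether the directory that was opened is the one mkdirat() just created,
which is the gap a combined create-and-open operation would close.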


Thread overview: 15+ messages
2026-04-12 13:54 [RFC PATCH v2 0/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd Jori Koolstra
2026-04-12 13:54 ` [RFC PATCH v2 1/2] " Jori Koolstra
2026-04-24 10:09   ` Mateusz Guzik
2026-04-27 15:14     ` Christian Brauner
2026-04-27 16:30       ` Mateusz Guzik
2026-04-28  8:55         ` Christian Brauner
2026-04-28 14:39           ` Mateusz Guzik
2026-04-27 15:48   ` Christian Brauner [this message]
2026-04-28  1:14     ` Aleksa Sarai
2026-04-28  6:39     ` Jeff Layton
2026-04-28  7:01       ` Jeff Layton
2026-04-28 13:39     ` Stefan Metzmacher
2026-04-28 13:49       ` Stefan Metzmacher
2026-04-28 14:01       ` Paulo Alcantara
2026-04-12 13:54 ` [RFC PATCH v2 2/2] selftest: add tests for mkdirat2() Jori Koolstra
