* [RFC PATCH 0/2] vfs: mkdirat_fd() syscall
@ 2026-03-31 17:19 Jori Koolstra
  2026-03-31 17:19 ` [RFC PATCH 1/2] vfs: syscalls: add mkdirat_fd() Jori Koolstra
  2026-03-31 17:19 ` [RFC PATCH 2/2] selftest: add tests for mkdirat_fd() Jori Koolstra
  0 siblings, 2 replies; 13+ messages in thread
From: Jori Koolstra @ 2026-03-31 17:19 UTC
To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    Dave Hansen, x86, Alexander Viro, Christian Brauner, Jeff Layton,
    Chuck Lever, Arnd Bergmann, Shuah Khan, Greg Kroah-Hartman
Cc: H. Peter Anvin, Jan Kara, Alexander Aring, Peter Zijlstra,
    Oleg Nesterov, Andrey Albershteyn, Jiri Olsa, Mathieu Desnoyers,
    Thomas Weißschuh, Namhyung Kim, Arnaldo Carvalho de Melo,
    Aleksa Sarai, linux-kernel, linux-fsdevel, linux-api, linux-arch,
    linux-kselftest, cmirabil, Jori Koolstra

This series implements the mkdirat_fd() syscall that was suggested on
the UAPI group kernel features page [1], along with some tests.
Obviously, if we want this we should also implement mknodeat_fd() and
symlinkat_fd(), but I believe they can be implemented in much the same
way.

I have added an unsigned int flags argument, as [2] suggests, and an
example flag that we may want to remove (right now it mainly serves an
internal purpose). It does mark where I would want to place the
definitions.

This has been compiled and tested on x86 only. [2] is a bit confusing
here and there, so I hope I have added the proper syscall definitions
everywhere they need to be added.

[1]: https://github.com/uapi-group/kernel-features?tab=readme-ov-file#race-free-creation-and-opening-of-non-file-inodes
[2]: https://www.kernel.org/doc/html/latest/process/adding-syscalls.html

Jori Koolstra (2):
  vfs: syscalls: add mkdirat_fd()
  selftest: add tests for mkdirat_fd()

 arch/x86/entry/syscalls/syscall_64.tbl        |   1 +
 fs/internal.h                                 |   1 +
 fs/namei.c                                    |  26 +++-
 include/linux/fcntl.h                         |   2 +
 include/linux/syscalls.h                      |   2 +
 include/uapi/asm-generic/fcntl.h              |   3 +
 include/uapi/asm-generic/unistd.h             |   5 +-
 scripts/syscall.tbl                           |   1 +
 tools/include/uapi/asm-generic/unistd.h       |   5 +-
 tools/testing/selftests/filesystems/Makefile  |   4 +-
 .../selftests/filesystems/mkdirat_fd_test.c   | 139 ++++++++++++++++++
 11 files changed, 183 insertions(+), 6 deletions(-)
 create mode 100644 tools/testing/selftests/filesystems/mkdirat_fd_test.c

-- 
2.53.0

* [RFC PATCH 1/2] vfs: syscalls: add mkdirat_fd()
  2026-03-31 17:19 [RFC PATCH 0/2] vfs: mkdirat_fd() syscall Jori Koolstra
@ 2026-03-31 17:19 ` Jori Koolstra
  2026-03-31 19:13   ` Arnd Bergmann
                     ` (2 more replies)
  2026-03-31 17:19 ` [RFC PATCH 2/2] selftest: add tests for mkdirat_fd() Jori Koolstra
  1 sibling, 3 replies; 13+ messages in thread
From: Jori Koolstra @ 2026-03-31 17:19 UTC
To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    Dave Hansen, x86, Alexander Viro, Christian Brauner, Jeff Layton,
    Chuck Lever, Arnd Bergmann, Shuah Khan, Greg Kroah-Hartman,
    H. Peter Anvin, Jan Kara, Alexander Aring
Cc: Peter Zijlstra, Oleg Nesterov, Andrey Albershteyn, Jiri Olsa,
    Mathieu Desnoyers, Thomas Weißschuh, Namhyung Kim,
    Arnaldo Carvalho de Melo, Aleksa Sarai, linux-kernel, linux-fsdevel,
    linux-api, linux-arch, linux-kselftest, cmirabil, Jori Koolstra,
    Masami Hiramatsu (Google)

Currently there is no way to race-freely create and open a directory.
For regular files we have open(O_CREAT) for creating a new file inode,
and returning a pinning fd to it. The lack of such functionality for
directories means that when populating a directory tree there's always
a race involved: the inodes first need to be created, and then opened
to adjust their permissions/ownership/labels/timestamps/acls/xattrs/...,
but in the time window between the creation and the opening they might
be replaced by something else.

Addressing this race without proper APIs is possible (by immediately
fstat()ing what was opened, to verify that it has the right inode type),
but difficult to get right. Hence, mkdirat_fd() that creates a directory
and returns an O_DIRECTORY fd is useful.

This feature idea (and description) is taken from the UAPI group:
https://github.com/uapi-group/kernel-features?tab=readme-ov-file#race-free-creation-and-opening-of-non-file-inodes

Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
---
 arch/x86/entry/syscalls/syscall_64.tbl |  1 +
 fs/internal.h                          |  1 +
 fs/namei.c                             | 26 ++++++++++++++++++++++++--
 include/linux/fcntl.h                  |  2 ++
 include/linux/syscalls.h               |  2 ++
 include/uapi/asm-generic/fcntl.h       |  3 +++
 include/uapi/asm-generic/unistd.h      |  5 ++++-
 scripts/syscall.tbl                    |  1 +
 8 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 524155d655da..dda920c26941 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -396,6 +396,7 @@
 469     common  file_setattr            sys_file_setattr
 470     common  listns                  sys_listns
 471     common  rseq_slice_yield        sys_rseq_slice_yield
+472     common  mkdirat_fd              sys_mkdirat_fd
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
diff --git a/fs/internal.h b/fs/internal.h
index cbc384a1aa09..2885a3e4ebdd 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -58,6 +58,7 @@ int filename_unlinkat(int dfd, struct filename *name);
 int may_linkat(struct mnt_idmap *idmap, const struct path *link);
 int filename_renameat2(int olddfd, struct filename *oldname, int newdfd,
                  struct filename *newname, unsigned int flags);
+int filename_mkdirat_fd(int dfd, struct filename *name, umode_t mode, unsigned int flags);
 int filename_mkdirat(int dfd, struct filename *name, umode_t mode);
 int filename_mknodat(int dfd, struct filename *name, umode_t mode, unsigned int dev);
 int filename_symlinkat(struct filename *from, int newdfd, struct filename *to);
diff --git a/fs/namei.c b/fs/namei.c
index 1eb9db055292..93252937983e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -5256,6 +5256,11 @@ struct dentry *vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
 EXPORT_SYMBOL(vfs_mkdir);
 
 int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
+{
+        return filename_mkdirat_fd(dfd, name, mode, 0);
+}
+
+int filename_mkdirat_fd(int dfd, struct filename *name, umode_t mode, unsigned int flags)
 {
         struct dentry *dentry;
         struct path path;
@@ -5263,7 +5268,7 @@ int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
         unsigned int lookup_flags = LOOKUP_DIRECTORY;
         struct delegated_inode delegated_inode = { };
 
-retry:
+start:
         dentry = filename_create(dfd, name, &path, lookup_flags);
         if (IS_ERR(dentry))
                 return PTR_ERR(dentry);
@@ -5276,7 +5281,6 @@ int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
                 if (IS_ERR(dentry))
                         error = PTR_ERR(dentry);
         }
-        end_creating_path(&path, dentry);
         if (is_delegated(&delegated_inode)) {
                 error = break_deleg_wait(&delegated_inode);
                 if (!error)
@@ -5286,7 +5290,25 @@ int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
                 lookup_flags |= LOOKUP_REVAL;
                 goto retry;
         }
+
+        if (!error && (flags & MKDIRAT_FD_NEED_FD)) {
+                struct path new_path = { .mnt = path.mnt, .dentry = dentry };
+                error = FD_ADD(0, dentry_open(&new_path, O_DIRECTORY, current_cred()));
+        }
+        end_creating_path(&path, dentry);
         return error;
+retry:
+        end_creating_path(&path, dentry);
+        goto start;
+}
+
+SYSCALL_DEFINE4(mkdirat_fd, int, dfd, const char __user *, pathname, umode_t, mode,
+                unsigned int, flags)
+{
+        CLASS(filename, name)(pathname);
+        if (flags & ~VALID_MKDIRAT_FD_FLAGS)
+                return -EINVAL;
+        return filename_mkdirat_fd(dfd, name, mode, flags | MKDIRAT_FD_NEED_FD);
 }
 
 SYSCALL_DEFINE3(mkdirat, int, dfd, const char __user *, pathname, umode_t, mode)
diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
index a332e79b3207..d2f0fdb82847 100644
--- a/include/linux/fcntl.h
+++ b/include/linux/fcntl.h
@@ -25,6 +25,8 @@
 #define force_o_largefile() (!IS_ENABLED(CONFIG_ARCH_32BIT_OFF_T))
 #endif
 
+#define VALID_MKDIRAT_FD_FLAGS (MKDIRAT_FD_NEED_FD)
+
 #if BITS_PER_LONG == 32
 #define IS_GETLK32(cmd)         ((cmd) == F_GETLK)
 #define IS_SETLK32(cmd)         ((cmd) == F_SETLK)
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 02bd6ddb6278..52e7f09d5525 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -999,6 +999,8 @@ asmlinkage long sys_lsm_get_self_attr(unsigned int attr, struct lsm_ctx __user *
 asmlinkage long sys_lsm_set_self_attr(unsigned int attr, struct lsm_ctx __user *ctx,
                                       u32 size, u32 flags);
 asmlinkage long sys_lsm_list_modules(u64 __user *ids, u32 __user *size, u32 flags);
+asmlinkage long sys_mkdirat_fd(int dfd, const char __user *pathname, umode_t mode,
+                               unsigned int flags);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
index 613475285643..621458bf1fbf 100644
--- a/include/uapi/asm-generic/fcntl.h
+++ b/include/uapi/asm-generic/fcntl.h
@@ -95,6 +95,9 @@
 #define O_NDELAY        O_NONBLOCK
 #endif
 
+/* Flags for mkdirat_fd */
+#define MKDIRAT_FD_NEED_FD      0x01
+
 #define F_DUPFD         0       /* dup */
 #define F_GETFD         1       /* get close_on_exec */
 #define F_SETFD         2       /* set/clear close_on_exec */
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index a627acc8fb5f..5bae1029f5d9 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -863,8 +863,11 @@ __SYSCALL(__NR_listns, sys_listns)
 #define __NR_rseq_slice_yield 471
 __SYSCALL(__NR_rseq_slice_yield, sys_rseq_slice_yield)
 
+#define __NR_mkdirat_fd 472
+__SYSCALL(__NR_mkdirat_fd, sys_mkdirat_fd)
+
 #undef __NR_syscalls
-#define __NR_syscalls 472
+#define __NR_syscalls 473
 
 /*
  * 32 bit systems traditionally used different
diff --git a/scripts/syscall.tbl b/scripts/syscall.tbl
index 7a42b32b6577..db3bd97d4a1a 100644
--- a/scripts/syscall.tbl
+++ b/scripts/syscall.tbl
@@ -412,3 +412,4 @@
 469     common  file_setattr            sys_file_setattr
 470     common  listns                  sys_listns
 471     common  rseq_slice_yield        sys_rseq_slice_yield
+472     common  mkdirat_fd              sys_mkdirat_fd
-- 
2.53.0

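For reference, the userspace workaround that the commit message calls "possible, but difficult to get right" might look roughly like the sketch below. It is only an illustration, not part of the patch: create_and_open_dir() is a hypothetical helper name, and the choice of O_NOFOLLOW and O_CLOEXEC is an assumption about what a careful caller would want.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an existing API): create "name" under dfd and
 * return an O_DIRECTORY fd to it, or -1 with errno set.
 */
int create_and_open_dir(int dfd, const char *name, mode_t mode)
{
        struct stat st;
        int fd;

        if (mkdirat(dfd, name, mode) < 0)
                return -1;

        /* Race window: "name" can be replaced before the open below. */
        fd = openat(dfd, name, O_RDONLY | O_DIRECTORY | O_NOFOLLOW | O_CLOEXEC);
        if (fd < 0)
                return -1;

        /* Verify the inode type of what was actually opened. */
        if (fstat(fd, &st) < 0) {
                int saved = errno;

                close(fd);
                errno = saved;
                return -1;
        }
        if (!S_ISDIR(st.st_mode)) {
                close(fd);
                errno = ENOTDIR;
                return -1;
        }
        return fd;
}
```

Note that the fstat() check only confirms that a directory was opened. It does not prove that it is the directory mkdirat() just created, which is part of why the commit message argues for a single syscall.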
* Re: [RFC PATCH 1/2] vfs: syscalls: add mkdirat_fd()
  2026-03-31 17:19 ` [RFC PATCH 1/2] vfs: syscalls: add mkdirat_fd() Jori Koolstra
@ 2026-03-31 19:13   ` Arnd Bergmann
  2026-04-01 14:09     ` David Laight
  2026-03-31 20:25   ` Yann Droneaud
  2026-04-01  4:19   ` Mateusz Guzik
  2 siblings, 1 reply; 13+ messages in thread
From: Arnd Bergmann @ 2026-03-31 19:13 UTC
To: Jori Koolstra, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, Dave Hansen, x86, Alexander Viro, Christian Brauner,
    Jeff Layton, Chuck Lever, shuah, Greg Kroah-Hartman, H. Peter Anvin,
    Jan Kara, Alexander Aring
Cc: Peter Zijlstra, Oleg Nesterov, Andrey Albershteyn, Jiri Olsa,
    Mathieu Desnoyers, Thomas Weißschuh, Namhyung Kim,
    Arnaldo Carvalho de Melo, Aleksa Sarai, linux-kernel, linux-fsdevel,
    linux-api, Linux-Arch, linux-kselftest, cmirabil, Masami Hiramatsu

On Tue, Mar 31, 2026, at 19:19, Jori Koolstra wrote:
> Currently there is no way to race-freely create and open a directory.
> For regular files we have open(O_CREAT) for creating a new file inode,
> and returning a pinning fd to it. The lack of such functionality for
> directories means that when populating a directory tree there's always
> a race involved: the inodes first need to be created, and then opened
> to adjust their permissions/ownership/labels/timestamps/acls/xattrs/...,
> but in the time window between the creation and the opening they might
> be replaced by something else.
>
> Addressing this race without proper APIs is possible (by immediately
> fstat()ing what was opened, to verify that it has the right inode type),
> but difficult to get right. Hence, mkdirat_fd() that creates a directory
> and returns an O_DIRECTORY fd is useful.
>
> This feature idea (and description) is taken from the UAPI group:
> https://github.com/uapi-group/kernel-features?tab=readme-ov-file#race-free-creation-and-opening-of-non-file-inodes
>
> Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>

I checked that the calling conventions are fine, i.e. this will work
as expected across all architectures. I assume you are also aware
that the non-RFC patch will need to add the syscall number to all
.tbl files.

The hardest problem here does seem to be the naming of the
new syscall, and I'm sorry to not be able to offer any solution
either, just two observations:

- mkdirat/mkdirat_fd sounds similar to the existing
  quotactl/quotactl_fd pair, but quotactl_fd() takes a file
  descriptor argument rather than returning it, which makes
  this addition quite confusing.

- the nicest interface IMO would have been a variation of
  openat(dfd, filename, O_CREAT | O_DIRECTORY, mode)
  but that is a minefield of incompatible implementations[1],
  so we can't do that without changing the behavior for
  existing callers that currently run into an error.

      Arnd

[1] https://lwn.net/Articles/926782/

* Re: [RFC PATCH 1/2] vfs: syscalls: add mkdirat_fd()
  2026-03-31 19:13   ` Arnd Bergmann
@ 2026-04-01 14:09     ` David Laight
  0 siblings, 0 replies; 13+ messages in thread
From: David Laight @ 2026-04-01 14:09 UTC
To: Arnd Bergmann
Cc: Jori Koolstra, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, Dave Hansen, x86, Alexander Viro, Christian Brauner,
    Jeff Layton, Chuck Lever, shuah, Greg Kroah-Hartman, H. Peter Anvin,
    Jan Kara, Alexander Aring, Peter Zijlstra, Oleg Nesterov,
    Andrey Albershteyn, Jiri Olsa, Mathieu Desnoyers, Thomas Weißschuh,
    Namhyung Kim, Arnaldo Carvalho de Melo, Aleksa Sarai, linux-kernel,
    linux-fsdevel, linux-api, Linux-Arch, linux-kselftest, cmirabil,
    Masami Hiramatsu

On Tue, 31 Mar 2026 21:13:34 +0200
"Arnd Bergmann" <arnd@arndb.de> wrote:

> On Tue, Mar 31, 2026, at 19:19, Jori Koolstra wrote:
> > Currently there is no way to race-freely create and open a directory.
> > For regular files we have open(O_CREAT) for creating a new file inode,
> > and returning a pinning fd to it. The lack of such functionality for
> > directories means that when populating a directory tree there's always
> > a race involved: the inodes first need to be created, and then opened
> > to adjust their permissions/ownership/labels/timestamps/acls/xattrs/...,
> > but in the time window between the creation and the opening they might
> > be replaced by something else.
> >
> > Addressing this race without proper APIs is possible (by immediately
> > fstat()ing what was opened, to verify that it has the right inode type),
> > but difficult to get right. Hence, mkdirat_fd() that creates a directory
> > and returns an O_DIRECTORY fd is useful.
> >
> > This feature idea (and description) is taken from the UAPI group:
> > https://github.com/uapi-group/kernel-features?tab=readme-ov-file#race-free-creation-and-opening-of-non-file-inodes
> >
> > Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
>
> I checked that the calling conventions are fine, i.e. this will work
> as expected across all architectures. I assume you are also aware
> that the non-RFC patch will need to add the syscall number to all
> .tbl files.
>
> The hardest problem here does seem to be the naming of the
> new syscall, and I'm sorry to not be able to offer any solution
> either, just two observations:
>
> - mkdirat/mkdirat_fd sounds similar to the existing
>   quotactl/quotactl_fd pair, but quotactl_fd() takes a file
>   descriptor argument rather than returning it, which makes
>   this addition quite confusing.
>
> - the nicest interface IMO would have been a variation of
>   openat(dfd, filename, O_CREAT | O_DIRECTORY, mode)
>   but that is a minefield of incompatible implementations[1],
>   so we can't do that without changing the behavior for
>   existing callers that currently run into an error.

Just require O_TMPFILE to be set as well :-)
You know you'll never regret it once Apr-1 is over.

Can something be done with the flags to openat2()?
That might save allocating an extra system call.

        David

>
>       Arnd
>
> [1] https://lwn.net/Articles/926782/
>

* Re: [RFC PATCH 1/2] vfs: syscalls: add mkdirat_fd()
  2026-03-31 17:19 ` [RFC PATCH 1/2] vfs: syscalls: add mkdirat_fd() Jori Koolstra
  2026-03-31 19:13   ` Arnd Bergmann
@ 2026-03-31 20:25   ` Yann Droneaud
  2026-03-31 20:42     ` H. Peter Anvin
  2026-04-01  4:19   ` Mateusz Guzik
  2 siblings, 1 reply; 13+ messages in thread
From: Yann Droneaud @ 2026-03-31 20:25 UTC
To: Jori Koolstra, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, Dave Hansen, x86, Alexander Viro, Christian Brauner,
    Jeff Layton, Chuck Lever, Arnd Bergmann, Shuah Khan,
    Greg Kroah-Hartman, H. Peter Anvin, Jan Kara, Alexander Aring
Cc: Peter Zijlstra, Oleg Nesterov, Andrey Albershteyn, Jiri Olsa,
    Mathieu Desnoyers, Thomas Weißschuh, Namhyung Kim,
    Arnaldo Carvalho de Melo, Aleksa Sarai, linux-kernel, linux-fsdevel,
    linux-api, linux-arch, linux-kselftest, cmirabil,
    Masami Hiramatsu (Google)

Hi,

On 31/03/2026 at 19:19, Jori Koolstra wrote:
> Currently there is no way to race-freely create and open a directory.
> For regular files we have open(O_CREAT) for creating a new file inode,
> and returning a pinning fd to it. The lack of such functionality for
> directories means that when populating a directory tree there's always
> a race involved: the inodes first need to be created, and then opened
> to adjust their permissions/ownership/labels/timestamps/acls/xattrs/...,
> but in the time window between the creation and the opening they might
> be replaced by something else.
>
> Addressing this race without proper APIs is possible (by immediately
> fstat()ing what was opened, to verify that it has the right inode type),
> but difficult to get right. Hence, mkdirat_fd() that creates a directory
> and returns an O_DIRECTORY fd is useful.
>
> This feature idea (and description) is taken from the UAPI group:
> https://github.com/uapi-group/kernel-features?tab=readme-ov-file#race-free-creation-and-opening-of-non-file-inodes
>
> Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
> ---
>  arch/x86/entry/syscalls/syscall_64.tbl |  1 +
>  fs/internal.h                          |  1 +
>  fs/namei.c                             | 26 ++++++++++++++++++++++++--
>  include/linux/fcntl.h                  |  2 ++
>  include/linux/syscalls.h               |  2 ++
>  include/uapi/asm-generic/fcntl.h       |  3 +++
>  include/uapi/asm-generic/unistd.h      |  5 ++++-
>  scripts/syscall.tbl                    |  1 +
>  8 files changed, 38 insertions(+), 3 deletions(-)
> diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
> index a332e79b3207..d2f0fdb82847 100644
> --- a/include/linux/fcntl.h
> +++ b/include/linux/fcntl.h
> @@ -25,6 +25,8 @@
>  #define force_o_largefile() (!IS_ENABLED(CONFIG_ARCH_32BIT_OFF_T))
>  #endif
>
> +#define VALID_MKDIRAT_FD_FLAGS (MKDIRAT_FD_NEED_FD)
> +

I don't see support for an O_CLOEXEC-ish flag. Is the file descriptor
in close-on-exec mode by default? If so, that should be mentioned.

> diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
> index 613475285643..621458bf1fbf 100644
> --- a/include/uapi/asm-generic/fcntl.h
> +++ b/include/uapi/asm-generic/fcntl.h
> @@ -95,6 +95,9 @@
>  #define O_NDELAY        O_NONBLOCK
>  #endif
>
> +/* Flags for mkdirat_fd */
> +#define MKDIRAT_FD_NEED_FD      0x01
> +

Regards.

* Re: [RFC PATCH 1/2] vfs: syscalls: add mkdirat_fd()
  2026-03-31 20:25   ` Yann Droneaud
@ 2026-03-31 20:42     ` H. Peter Anvin
  0 siblings, 0 replies; 13+ messages in thread
From: H. Peter Anvin @ 2026-03-31 20:42 UTC
To: Yann Droneaud, Jori Koolstra, Andy Lutomirski, Thomas Gleixner,
    Ingo Molnar, Borislav Petkov, Dave Hansen, x86, Alexander Viro,
    Christian Brauner, Jeff Layton, Chuck Lever, Arnd Bergmann,
    Shuah Khan, Greg Kroah-Hartman, Jan Kara, Alexander Aring
Cc: Peter Zijlstra, Oleg Nesterov, Andrey Albershteyn, Jiri Olsa,
    Mathieu Desnoyers, Thomas Weißschuh, Namhyung Kim,
    Arnaldo Carvalho de Melo, Aleksa Sarai, linux-kernel, linux-fsdevel,
    linux-api, linux-arch, linux-kselftest, cmirabil,
    Masami Hiramatsu (Google)

On March 31, 2026 1:25:03 PM PDT, Yann Droneaud <yann@droneaud.fr> wrote:
>Hi,
>
>On 31/03/2026 at 19:19, Jori Koolstra wrote:
>> Currently there is no way to race-freely create and open a directory.
>> For regular files we have open(O_CREAT) for creating a new file inode,
>> and returning a pinning fd to it. The lack of such functionality for
>> directories means that when populating a directory tree there's always
>> a race involved: the inodes first need to be created, and then opened
>> to adjust their permissions/ownership/labels/timestamps/acls/xattrs/...,
>> but in the time window between the creation and the opening they might
>> be replaced by something else.
>>
>> Addressing this race without proper APIs is possible (by immediately
>> fstat()ing what was opened, to verify that it has the right inode type),
>> but difficult to get right. Hence, mkdirat_fd() that creates a directory
>> and returns an O_DIRECTORY fd is useful.
>>
>> This feature idea (and description) is taken from the UAPI group:
>> https://github.com/uapi-group/kernel-features?tab=readme-ov-file#race-free-creation-and-opening-of-non-file-inodes
>>
>> Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
>> ---
>>  arch/x86/entry/syscalls/syscall_64.tbl |  1 +
>>  fs/internal.h                          |  1 +
>>  fs/namei.c                             | 26 ++++++++++++++++++++++++--
>>  include/linux/fcntl.h                  |  2 ++
>>  include/linux/syscalls.h               |  2 ++
>>  include/uapi/asm-generic/fcntl.h       |  3 +++
>>  include/uapi/asm-generic/unistd.h      |  5 ++++-
>>  scripts/syscall.tbl                    |  1 +
>>  8 files changed, 38 insertions(+), 3 deletions(-)
>
>> diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
>> index a332e79b3207..d2f0fdb82847 100644
>> --- a/include/linux/fcntl.h
>> +++ b/include/linux/fcntl.h
>> @@ -25,6 +25,8 @@
>>  #define force_o_largefile() (!IS_ENABLED(CONFIG_ARCH_32BIT_OFF_T))
>>  #endif
>> +#define VALID_MKDIRAT_FD_FLAGS (MKDIRAT_FD_NEED_FD)
>> +
>
>I don't see support for O_CLOEXEC-ish flag, is the file descriptor in close-on-exec mode by default ? If yes, it should be mentioned.
>
>
>> diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
>> index 613475285643..621458bf1fbf 100644
>> --- a/include/uapi/asm-generic/fcntl.h
>> +++ b/include/uapi/asm-generic/fcntl.h
>> @@ -95,6 +95,9 @@
>>  #define O_NDELAY        O_NONBLOCK
>>  #endif
>> +/* Flags for mkdirat_fd */
>> +#define MKDIRAT_FD_NEED_FD      0x01
>> +
>
>
>Regards.
>
>

And even if it is, POSIX already has O_CLOFORK and we should expect that
that will be needed, too.

* Re: [RFC PATCH 1/2] vfs: syscalls: add mkdirat_fd()
  2026-03-31 17:19 ` [RFC PATCH 1/2] vfs: syscalls: add mkdirat_fd() Jori Koolstra
  2026-03-31 19:13   ` Arnd Bergmann
  2026-03-31 20:25   ` Yann Droneaud
@ 2026-04-01  4:19   ` Mateusz Guzik
  2026-04-01  9:44     ` Cyril Hrubis
                       ` (2 more replies)
  2 siblings, 3 replies; 13+ messages in thread
From: Mateusz Guzik @ 2026-04-01 4:19 UTC
To: Jori Koolstra
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    Dave Hansen, x86, Alexander Viro, Christian Brauner, Jeff Layton,
    Chuck Lever, Arnd Bergmann, Shuah Khan, Greg Kroah-Hartman,
    H. Peter Anvin, Jan Kara, Alexander Aring, Peter Zijlstra,
    Oleg Nesterov, Andrey Albershteyn, Jiri Olsa, Mathieu Desnoyers,
    Thomas Weißschuh, Namhyung Kim, Arnaldo Carvalho de Melo,
    Aleksa Sarai, linux-kernel, linux-fsdevel, linux-api, linux-arch,
    linux-kselftest, cmirabil, Masami Hiramatsu (Google)

On Tue, Mar 31, 2026 at 07:19:58PM +0200, Jori Koolstra wrote:
> @@ -5286,7 +5290,25 @@ int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
>                  lookup_flags |= LOOKUP_REVAL;
>                  goto retry;
>          }
> +
> +        if (!error && (flags & MKDIRAT_FD_NEED_FD)) {
> +                struct path new_path = { .mnt = path.mnt, .dentry = dentry };
> +                error = FD_ADD(0, dentry_open(&new_path, O_DIRECTORY, current_cred()));
> +        }
> +        end_creating_path(&path, dentry);
>          return error;

You can't do it like this. Should it turn out no fd can be allocated,
the entire thing is going to error out while leaving the newly created
directory behind. You need to allocate the fd first, then do the hard
work, and only then fd_install() or free the fd. The FD_ADD machinery
can probably still be used, provided the real new mkdir is wrapped
properly.

It should be perfectly feasible to de facto wrap existing mkdir
functionality by this syscall.

On top of that, similarly to what other people mentioned, the new
syscall will definitely want to support O_CLOEXEC and probably other
flags down the line.

Trying to handle this in open() is a no-go. openat2 is rather
problematic.

I tend to agree mkdirat_fd is not a good name for the syscall either,
but I don't have a suggestion I'm happy with. I think the least bad name
would follow the existing stuff and be mkdirat2 or similar.

The routine would have to start with validating the passed O_ flags, for
now only allowing O_CLOEXEC and EINVAL-ing otherwise.

* Re: [RFC PATCH 1/2] vfs: syscalls: add mkdirat_fd()
  2026-04-01  4:19   ` Mateusz Guzik
@ 2026-04-01  9:44     ` Cyril Hrubis
  2026-04-01 10:25     ` Jori Koolstra
  2026-04-02  2:52     ` Aleksa Sarai
  2 siblings, 0 replies; 13+ messages in thread
From: Cyril Hrubis @ 2026-04-01 9:44 UTC
To: Mateusz Guzik
Cc: Jori Koolstra, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, Dave Hansen, x86, Alexander Viro, Christian Brauner,
    Jeff Layton, Chuck Lever, Arnd Bergmann, Shuah Khan,
    Greg Kroah-Hartman, H. Peter Anvin, Jan Kara, Alexander Aring,
    Peter Zijlstra, Oleg Nesterov, Andrey Albershteyn, Jiri Olsa,
    Mathieu Desnoyers, Thomas Weißschuh, Namhyung Kim,
    Arnaldo Carvalho de Melo, Aleksa Sarai, linux-kernel, linux-fsdevel,
    linux-api, linux-arch, linux-kselftest, cmirabil,
    Masami Hiramatsu (Google)

Hi!
> I tend to agree mkdirat_fd is not a good name for the syscall either,
> but I don't have a suggestion I'm happy with. I think least bad name
> would follow the existing stuff and be mkdirat2 or similar.

Why not mkdirat_open() as it does combine these two syscalls into one?

-- 
Cyril Hrubis
chrubis@suse.cz

* Re: [RFC PATCH 1/2] vfs: syscalls: add mkdirat_fd()
  2026-04-01  4:19   ` Mateusz Guzik
  2026-04-01  9:44     ` Cyril Hrubis
@ 2026-04-01 10:25     ` Jori Koolstra
  2026-04-07  9:00       ` Mateusz Guzik
  2026-04-02  2:52     ` Aleksa Sarai
  2 siblings, 1 reply; 13+ messages in thread
From: Jori Koolstra @ 2026-04-01 10:25 UTC
To: Mateusz Guzik
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    Dave Hansen, x86, Alexander Viro, Christian Brauner, Jeff Layton,
    Chuck Lever, Arnd Bergmann, Shuah Khan, Greg Kroah-Hartman,
    H. Peter Anvin, Jan Kara, Alexander Aring, Peter Zijlstra,
    Oleg Nesterov, Andrey Albershteyn, Jiri Olsa, Mathieu Desnoyers,
    Thomas Weißschuh, Namhyung Kim, Arnaldo Carvalho de Melo,
    Aleksa Sarai, linux-kernel, linux-fsdevel, linux-api, linux-arch,
    linux-kselftest, cmirabil, Masami Hiramatsu (Google)

> On 01-04-2026 06:19 CEST, Mateusz Guzik <mjguzik@gmail.com> wrote:
>
>
> On Tue, Mar 31, 2026 at 07:19:58PM +0200, Jori Koolstra wrote:
> > @@ -5286,7 +5290,25 @@ int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
> >                  lookup_flags |= LOOKUP_REVAL;
> >                  goto retry;
> >          }
> > +
> > +        if (!error && (flags & MKDIRAT_FD_NEED_FD)) {
> > +                struct path new_path = { .mnt = path.mnt, .dentry = dentry };
> > +                error = FD_ADD(0, dentry_open(&new_path, O_DIRECTORY, current_cred()));
> > +        }
> > +        end_creating_path(&path, dentry);
> >          return error;
>
>
> You can't do it like this. Should it turn out no fd can be allocated,
> the entire thing is going to error out while keeping the newly created
> directory behind. You need to allocate the fd first, then do the hard
> work, and only then fd_install and or free the fd. The FD_ADD machinery
> can probably still be used provided proper wrapping of the real new
> mkdir.

But isn't this exactly what happens in open(O_CREAT) too? Eventually we
call
        error = dir_inode->i_op->create(idmap, dir_inode, dentry,
                                        mode, open_flag & O_EXCL);

and only then do we assign and install the fd. AFAIK there is no cleanup
happening there either if the FD_ADD step fails. You will just have a
regular file and no descriptor. But I would have to test this to be sure.

>
> On top of that similarly to what other people mentioned the new syscall
> will definitely want to support O_CLOEXEC and probably other flags down
> the line.
>

I agree, and perhaps O_PATH too. Maybe just all open flags relevant to
directories?

> Trying to handle this in open() is a no-go. openat2 is rather
> problematic.

I don't think that is necessarily true. It turned out O_CREAT | O_DIRECTORY
was bugged for a very long time. Christian Brauner fixed it eventually, and
that combination now returns EINVAL. But I think there is nothing really
stopping us from implementing that combination in the expected way, apart
from whatever reasons there were for not allowing this in the first place,
which I don't know about (maybe mixing semantics?)

>
> I tend to agree mkdirat_fd is not a good name for the syscall either,
> but I don't have a suggestion I'm happy with. I think least bad name
> would follow the existing stuff and be mkdirat2 or similar.
>
> The routine would have to start with validating the passed O_ flags, for
> now only allowing O_CLOEXEC and EINVAL-ing otherwise.

Thanks,
Jori

* Re: [RFC PATCH 1/2] vfs: syscalls: add mkdirat_fd()
  2026-04-01 10:25     ` Jori Koolstra
@ 2026-04-07  9:00       ` Mateusz Guzik
  0 siblings, 0 replies; 13+ messages in thread
From: Mateusz Guzik @ 2026-04-07 9:00 UTC
To: Jori Koolstra
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    Dave Hansen, x86, Alexander Viro, Christian Brauner, Jeff Layton,
    Chuck Lever, Arnd Bergmann, Shuah Khan, Greg Kroah-Hartman,
    H. Peter Anvin, Jan Kara, Alexander Aring, Peter Zijlstra,
    Oleg Nesterov, Andrey Albershteyn, Jiri Olsa, Mathieu Desnoyers,
    Thomas Weißschuh, Namhyung Kim, Arnaldo Carvalho de Melo,
    Aleksa Sarai, linux-kernel, linux-fsdevel, linux-api, linux-arch,
    linux-kselftest, cmirabil, Masami Hiramatsu (Google)

On Wed, Apr 1, 2026 at 12:25 PM Jori Koolstra <jkoolstra@xs4all.nl> wrote:
>
>
> > On 01-04-2026 06:19 CEST, Mateusz Guzik <mjguzik@gmail.com> wrote:
> >
> >
> > On Tue, Mar 31, 2026 at 07:19:58PM +0200, Jori Koolstra wrote:
> > > @@ -5286,7 +5290,25 @@ int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
> > >                  lookup_flags |= LOOKUP_REVAL;
> > >                  goto retry;
> > >          }
> > > +
> > > +        if (!error && (flags & MKDIRAT_FD_NEED_FD)) {
> > > +                struct path new_path = { .mnt = path.mnt, .dentry = dentry };
> > > +                error = FD_ADD(0, dentry_open(&new_path, O_DIRECTORY, current_cred()));
> > > +        }
> > > +        end_creating_path(&path, dentry);
> > >          return error;
> >
> >
> > You can't do it like this. Should it turn out no fd can be allocated,
> > the entire thing is going to error out while keeping the newly created
> > directory behind. You need to allocate the fd first, then do the hard
> > work, and only then fd_install and or free the fd. The FD_ADD machinery
> > can probably still be used provided proper wrapping of the real new
> > mkdir.
>
> But isn't this exactly what happens in open(O_CREAT) too? Eventually we
> call
>         error = dir_inode->i_op->create(idmap, dir_inode, dentry,
>                                         mode, open_flag & O_EXCL);
>
> and only then do we assign and install the fd. AFAIK there is no cleanup
> happening there either if the FD_ADD step fails. You will just have a
> regular file and no descriptor. But I would have to test this to be sure.
>

FD_ADD(how->flags, do_file_open(dfd, name, &op)) means the fd itself is
allocated upfront and only then does file creation happen, which is how
I'm saying it should be done.

With your patch the directory is created first and the possibly failing
fd allocation happens later.

> >
> > On top of that similarly to what other people mentioned the new syscall
> > will definitely want to support O_CLOEXEC and probably other flags down
> > the line.
> >
>
> I agree, and perhaps O_PATH too. Maybe just all open flags relevant to
> directories?
>

I don't know about O_PATH as is, but certainly the syscall needs to be
able to grab more flags in the future.

> > Trying to handle this in open() is a no-go. openat2 is rather
> > problematic.
>
> I don't think that is necessarily true. It turned out O_CREAT | O_DIRECTORY
> was bugged for a very long time. Christian Brauner fixed it eventually, and
> that combination now returns EINVAL. But I think there is nothing really
> stopping us from implementing that combination in the expected way, apart
> from whatever reasons there were for not allowing this in the first place,
> which I don't know about (maybe mixing semantics?)
>

I am not saying it's impossible. I am saying mkdir was always a separate
codepath and in order to change that you would need to add a branchfest
to open. I don't see any reason to go that route.

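The ordering argued for above (reserve the descriptor, do the irreversible work, then install or release) can be illustrated with a small userspace model. This is not kernel code and not the author's proposal: the slot table and every name in it (fd_table, reserve_slot, release_slot, model_mkdir, mkdir_and_open) are hypothetical stand-ins, loosely mirroring the roles that get_unused_fd_flags(), put_unused_fd() and fd_install() play in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

#define NSLOTS 2                        /* tiny table so exhaustion is easy to hit */

/* Hypothetical model of a descriptor table plus a side-effect counter. */
struct fd_table {
        bool reserved[NSLOTS];          /* slot handed out, maybe not installed yet */
        const char *installed[NSLOTS];  /* object visible through the slot */
        int dirs_created;               /* counts the irreversible step */
};

/* Reserve a free slot, or return -1 if the table is full (EMFILE analogue). */
static int reserve_slot(struct fd_table *t)
{
        for (int i = 0; i < NSLOTS; i++) {
                if (!t->reserved[i]) {
                        t->reserved[i] = true;
                        return i;
                }
        }
        return -1;
}

/* Give a slot back, whether or not anything was installed in it. */
static void release_slot(struct fd_table *t, int slot)
{
        t->reserved[slot] = false;
        t->installed[slot] = NULL;
}

/* Stand-in for the real mkdir; a NULL name models a creation failure. */
static int model_mkdir(struct fd_table *t, const char *name)
{
        if (!name)
                return -1;
        t->dirs_created++;
        return 0;
}

/* Reserve first, create second, install last. */
int mkdir_and_open(struct fd_table *t, const char *name)
{
        int slot = reserve_slot(t);

        if (slot < 0)
                return -1;              /* nothing was created */

        if (model_mkdir(t, name) < 0) {
                release_slot(t, slot);  /* undo the cheap reservation */
                return -1;
        }

        t->installed[slot] = name;      /* publish only after success */
        return slot;
}
```

With this ordering, running out of slots is reported before any directory exists, so no cleanup of a half-finished creation is needed. The model deliberately ignores that the open step itself can also fail after creation.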
* Re: [RFC PATCH 1/2] vfs: syscalls: add mkdirat_fd()
  2026-04-01  4:19   ` Mateusz Guzik
  2026-04-01  9:44     ` Cyril Hrubis
  2026-04-01 10:25     ` Jori Koolstra
@ 2026-04-02  2:52     ` Aleksa Sarai
  2026-04-07  8:52       ` Mateusz Guzik
  2 siblings, 1 reply; 13+ messages in thread
From: Aleksa Sarai @ 2026-04-02  2:52 UTC
  To: Mateusz Guzik
  Cc: Jori Koolstra, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Alexander Viro,
	Christian Brauner, Jeff Layton, Chuck Lever, Arnd Bergmann,
	Shuah Khan, Greg Kroah-Hartman, H. Peter Anvin, Jan Kara,
	Alexander Aring, Peter Zijlstra, Oleg Nesterov,
	Andrey Albershteyn, Jiri Olsa, Mathieu Desnoyers,
	Thomas Weißschuh, Namhyung Kim, Arnaldo Carvalho de Melo,
	linux-kernel, linux-fsdevel, linux-api, linux-arch,
	linux-kselftest, cmirabil, Masami Hiramatsu (Google)

On 2026-04-01, Mateusz Guzik <mjguzik@gmail.com> wrote:
> On Tue, Mar 31, 2026 at 07:19:58PM +0200, Jori Koolstra wrote:
> > @@ -5286,7 +5290,25 @@ int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
> >  		lookup_flags |= LOOKUP_REVAL;
> >  		goto retry;
> >  	}
> > +
> > +	if (!error && (flags & MKDIRAT_FD_NEED_FD)) {
> > +		struct path new_path = { .mnt = path.mnt, .dentry = dentry };
> > +		error = FD_ADD(0, dentry_open(&new_path, O_DIRECTORY, current_cred()));
> > +	}
> > +	end_creating_path(&path, dentry);
> >  	return error;
>
>
> You can't do it like this. Should it turn out no fd can be allocated,
> the entire thing is going to error out while keeping the newly created
> directory behind. You need to allocate the fd first, then do the hard
> work, and only then fd_install and/or free the fd. The FD_ADD machinery
> can probably still be used provided proper wrapping of the real new
> mkdir.
>
> It should be perfectly feasible to de facto wrap existing mkdir
> functionality by this syscall.
>
> On top of that similarly to what other people mentioned the new syscall
> will definitely want to support O_CLOEXEC and probably other flags down
> the line.
>
> Trying to handle this in open() is a no-go. openat2 is rather
> problematic.

I'm interested in what makes you say that. It would be very nice to be
able to do mkdir + RESOLVE_IN_ROOT and get an fd back all in one
syscall. :D

To be fair, build_open_how() will need some more magic to keep openat()
working, and that won't be particularly pretty. If we went with
O_CREAT|O_DIRECTORY we would need to be quite careful to make sure
O_TMPFILE continues to work for both openat() and openat2()...

> I tend to agree mkdirat_fd is not a good name for the syscall either,
> but I don't have a suggestion I'm happy with. I think least bad name
> would follow the existing stuff and be mkdirat2 or similar.
>
> The routine would have to start with validating the passed O_ flags, for
> now only allowing O_CLOEXEC and EINVAL-ing otherwise.

Please do not use O_* flags! O_CLOEXEC takes up 3 flag bits on different
architectures which makes adding new flags a nightmare.

I think this should take AT_* flags and (like most newer syscalls)
O_CLOEXEC should be automatically set. Userspace can unset it with
fcntl(F_SETFD) in the relatively rare case where they don't want
O_CLOEXEC. Alternatively, we could just bite the bullet and make
AT_NO_CLOEXEC a thing...

But yes, new syscalls *absolutely* need to take some kind of flag
argument. I'd hoped we finally learned our lesson on that one...

--
Aleksa Sarai
https://www.cyphar.com/
* Re: [RFC PATCH 1/2] vfs: syscalls: add mkdirat_fd()
  2026-04-02  2:52     ` Aleksa Sarai
@ 2026-04-07  8:52       ` Mateusz Guzik
  0 siblings, 0 replies; 13+ messages in thread
From: Mateusz Guzik @ 2026-04-07  8:52 UTC
  To: Aleksa Sarai
  Cc: Jori Koolstra, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Alexander Viro,
	Christian Brauner, Jeff Layton, Chuck Lever, Arnd Bergmann,
	Shuah Khan, Greg Kroah-Hartman, H. Peter Anvin, Jan Kara,
	Alexander Aring, Peter Zijlstra, Oleg Nesterov,
	Andrey Albershteyn, Jiri Olsa, Mathieu Desnoyers,
	Thomas Weißschuh, Namhyung Kim, Arnaldo Carvalho de Melo,
	linux-kernel, linux-fsdevel, linux-api, linux-arch,
	linux-kselftest, cmirabil, Masami Hiramatsu (Google)

On Thu, Apr 2, 2026 at 4:52 AM Aleksa Sarai <cyphar@cyphar.com> wrote:
>
> On 2026-04-01, Mateusz Guzik <mjguzik@gmail.com> wrote:
> > Trying to handle this in open() is a no-go. openat2 is rather
> > problematic.
>
> I'm interested in what makes you say that. It would be very nice to be able
> to do mkdir + RESOLVE_IN_ROOT and get an fd back all in one syscall. :D
>

Keeping this out of both open and openat2 does not preclude a way to
do mkdir + RESOLVE_IN_ROOT and get an fd in one go from existing.

Creating a directory was always a different syscall than creating a
file. I don't see any benefit to squeezing it into open. I do see a
downside because of an extra branchfest to differentiate the cases.

> > The routine would have to start with validating the passed O_ flags, for
> > now only allowing O_CLOEXEC and EINVAL-ing otherwise.
>
> Please do not use O_* flags! O_CLOEXEC takes up 3 flag bits on different
> architectures which makes adding new flags a nightmare.
>

With my proposal there are no new flags added so I don't think that's
relevant.

> I think this should take AT_* flags and (like most newer syscalls)
> O_CLOEXEC should be automatically set. Userspace can unset it with
> fcntl(F_SETFD) in the relatively rare case where they don't want
> O_CLOEXEC. Alternatively, we could just bite the bullet and make
> AT_NO_CLOEXEC a thing...
>

I would say that's a pretty weird discrepancy vs what normally happens
with other syscalls, but perhaps it would be fine.
* [RFC PATCH 2/2] selftest: add tests for mkdirat_fd()
  2026-03-31 17:19 [RFC PATCH 0/2] vfs: mkdirat_fd() syscall Jori Koolstra
  2026-03-31 17:19 ` [RFC PATCH 1/2] vfs: syscalls: add mkdirat_fd() Jori Koolstra
@ 2026-03-31 17:19 ` Jori Koolstra
  1 sibling, 0 replies; 13+ messages in thread
From: Jori Koolstra @ 2026-03-31 17:19 UTC
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Alexander Viro, Christian Brauner, Jeff Layton,
	Chuck Lever, Arnd Bergmann, Shuah Khan, Greg Kroah-Hartman
  Cc: H . Peter Anvin, Jan Kara, Alexander Aring, Peter Zijlstra,
	Oleg Nesterov, Andrey Albershteyn, Jiri Olsa, Mathieu Desnoyers,
	Thomas Weißschuh, Namhyung Kim, Arnaldo Carvalho de Melo,
	Aleksa Sarai, linux-kernel, linux-fsdevel, linux-api, linux-arch,
	linux-kselftest, cmirabil, Jori Koolstra, Ingo Molnar

Add some tests for the new mkdirat_fd() syscall to test compliance and
to showcase its behaviour.

Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
---
 tools/include/uapi/asm-generic/unistd.h       |   5 +-
 tools/testing/selftests/filesystems/Makefile  |   4 +-
 .../selftests/filesystems/mkdirat_fd_test.c   | 139 ++++++++++++++++++
 3 files changed, 145 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/filesystems/mkdirat_fd_test.c

diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index a627acc8fb5f..5bae1029f5d9 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -863,8 +863,11 @@ __SYSCALL(__NR_listns, sys_listns)
 #define __NR_rseq_slice_yield 471
 __SYSCALL(__NR_rseq_slice_yield, sys_rseq_slice_yield)
 
+#define __NR_mkdirat_fd 472
+__SYSCALL(__NR_mkdirat_fd, sys_mkdirat_fd)
+
 #undef __NR_syscalls
-#define __NR_syscalls 472
+#define __NR_syscalls 473
 
 /*
  * 32 bit systems traditionally used different
diff --git a/tools/testing/selftests/filesystems/Makefile b/tools/testing/selftests/filesystems/Makefile
index 85427d7f19b9..7357769db57a 100644
--- a/tools/testing/selftests/filesystems/Makefile
+++ b/tools/testing/selftests/filesystems/Makefile
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 
-CFLAGS += $(KHDR_INCLUDES)
-TEST_GEN_PROGS := devpts_pts file_stressor anon_inode_test kernfs_test fclog
+CFLAGS += $(KHDR_INCLUDES) $(TOOLS_INCLUDES)
+TEST_GEN_PROGS := devpts_pts file_stressor anon_inode_test kernfs_test fclog mkdirat_fd_test
 TEST_GEN_PROGS_EXTENDED := dnotify_test
 
 include ../lib.mk
diff --git a/tools/testing/selftests/filesystems/mkdirat_fd_test.c b/tools/testing/selftests/filesystems/mkdirat_fd_test.c
new file mode 100644
index 000000000000..9058be49dc7b
--- /dev/null
+++ b/tools/testing/selftests/filesystems/mkdirat_fd_test.c
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <errno.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <sys/stat.h>
+
+#include <asm-generic/unistd.h>
+
+#include "kselftest_harness.h"
+
+#ifndef MKDIRAT_FD_NEED_FD
+#define MKDIRAT_FD_NEED_FD 0x01
+#endif
+
+#define mkdirat_fd_checked(dfd, pathname) ({ \
+	struct stat __st; \
+	int __fd = sys_mkdirat_fd(dfd, pathname, S_IRWXU, MKDIRAT_FD_NEED_FD); \
+	ASSERT_GE(__fd, 0); \
+	EXPECT_EQ(fstat(__fd, &__st), 0); \
+	EXPECT_TRUE(S_ISDIR(__st.st_mode)); \
+	__fd; \
+})
+
+static inline int sys_mkdirat_fd(int dfd, const char *pathname, mode_t mode,
+				 unsigned int flags)
+{
+	return syscall(__NR_mkdirat_fd, dfd, pathname, mode, flags);
+}
+
+FIXTURE(mkdirat_fd) {
+	char dirpath[PATH_MAX];
+	int dfd;
+};
+
+FIXTURE_SETUP(mkdirat_fd)
+{
+	snprintf(self->dirpath, sizeof(self->dirpath),
+		 "/tmp/mkdirat_fd_test.%d", getpid());
+	ASSERT_EQ(mkdir(self->dirpath, S_IRWXU), 0);
+
+	self->dfd = open(self->dirpath, O_DIRECTORY);
+	ASSERT_GE(self->dfd, 0);
+}
+
+FIXTURE_TEARDOWN(mkdirat_fd)
+{
+	close(self->dfd);
+	rmdir(self->dirpath);
+}
+
+/* Does mkdirat_fd return a fd at all */
+TEST_F(mkdirat_fd, returns_fd)
+{
+	int fd = mkdirat_fd_checked(self->dfd, "newdir");
+	EXPECT_EQ(close(fd), 0);
+	EXPECT_EQ(unlinkat(self->dfd, "newdir", AT_REMOVEDIR), 0);
+}
+
+/* The fd must refer to the directory that was just created. */
+TEST_F(mkdirat_fd, fd_is_created_dir)
+{
+	int fd;
+	struct stat st_via_fd, st_via_path;
+	char path[PATH_MAX];
+
+	fd = mkdirat_fd_checked(self->dfd, "checkdir");
+
+	ASSERT_EQ(fstat(fd, &st_via_fd), 0);
+
+	snprintf(path, sizeof(path), "%s/checkdir", self->dirpath);
+	ASSERT_EQ(stat(path, &st_via_path), 0);
+
+	EXPECT_EQ(st_via_fd.st_ino, st_via_path.st_ino);
+	EXPECT_EQ(st_via_fd.st_dev, st_via_path.st_dev);
+
+	EXPECT_EQ(close(fd), 0);
+	EXPECT_EQ(rmdir(path), 0);
+}
+
+
+/* Missing parent component must fail with ENOENT. */
+TEST_F(mkdirat_fd, enoent_missing_parent)
+{
+	EXPECT_EQ(sys_mkdirat_fd(self->dfd, "nonexistent/child", S_IRWXU, MKDIRAT_FD_NEED_FD), -1);
+	EXPECT_EQ(errno, ENOENT);
+}
+
+/* An invalid dfd must fail with EBADF. */
+TEST_F(mkdirat_fd, ebadf)
+{
+	EXPECT_EQ(sys_mkdirat_fd(-42, "badfdir", S_IRWXU, MKDIRAT_FD_NEED_FD), -1);
+	EXPECT_EQ(errno, EBADF);
+}
+
+/* A dfd that points to a file (not a directory) must fail with ENOTDIR. */
+TEST_F(mkdirat_fd, enotdir_dfd)
+{
+	int file_fd;
+
+	file_fd = openat(self->dfd, "file",
+			 O_CREAT | O_WRONLY, S_IRWXU);
+	ASSERT_GE(file_fd, 0);
+
+	EXPECT_EQ(sys_mkdirat_fd(file_fd, "subdir", S_IRWXU, MKDIRAT_FD_NEED_FD), -1);
+	EXPECT_EQ(errno, ENOTDIR);
+
+	EXPECT_EQ(close(file_fd), 0);
+	EXPECT_EQ(unlinkat(self->dfd, "file", 0), 0);
+}
+
+/*
+ * The returned fd must be usable as a dfd for further *at() calls.
+ */
+TEST_F(mkdirat_fd, fd_usable_as_dfd)
+{
+	int parent_fd, child_fd;
+
+	parent_fd = mkdirat_fd_checked(self->dfd, "parent");
+	child_fd = mkdirat_fd_checked(parent_fd, "child");
+
+	EXPECT_EQ(close(child_fd), 0);
+	EXPECT_EQ(close(parent_fd), 0);
+
+	char path[PATH_MAX];
+	snprintf(path, sizeof(path), "%s/parent/child", self->dirpath);
+	EXPECT_EQ(rmdir(path), 0);
+	snprintf(path, sizeof(path), "%s/parent", self->dirpath);
+	EXPECT_EQ(rmdir(path), 0);
+}
+
+/* Unknown flags must be rejected with EINVAL. */
+TEST_F(mkdirat_fd, einval_unknown_flags)
+{
+	EXPECT_EQ(sys_mkdirat_fd(self->dfd, "flagsdir", S_IRWXU, ~MKDIRAT_FD_NEED_FD), -1);
+	EXPECT_EQ(errno, EINVAL);
+}
+
+TEST_HARNESS_MAIN
--
2.53.0
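The einval_unknown_flags test expects any bit outside the supported set to be rejected. Here is a minimal userspace sketch of that kind of mask check, assuming MKDIRAT_FD_NEED_FD (0x01, the fallback value defined in the test file) is the only valid flag. The helper name and MKDIRAT_FD_VALID_FLAGS are hypothetical and are not taken from patch 1/2.

```c
#include <errno.h>

#ifndef MKDIRAT_FD_NEED_FD
#define MKDIRAT_FD_NEED_FD 0x01	/* fallback value used by the selftest */
#endif

/* Hypothetical: the complete set of flags this sketch accepts. */
#define MKDIRAT_FD_VALID_FLAGS ((unsigned int)MKDIRAT_FD_NEED_FD)

/* Return 0 if every set bit is known, -EINVAL if any unknown bit is set. */
static int mkdirat_fd_check_flags(unsigned int flags)
{
	if (flags & ~MKDIRAT_FD_VALID_FLAGS)
		return -EINVAL;
	return 0;
}
```

Rejecting unknown bits from the start is what allows new flags to be added later without changing the behaviour of existing callers.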