Linux userland API discussions
* [PATCH v12 04/15] exfat: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-04-29 18:07 UTC
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Roland Mainz
In-Reply-To: <20260429-case-sensitivity-v12-0-8057123bebe0@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

Report exFAT's case sensitivity behavior via the FS_XFLAG_CASEFOLD
flag. exFAT is always case-insensitive (using an upcase table for
comparison) and always preserves case at rest.

Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/exfat/exfat_fs.h |  2 ++
 fs/exfat/file.c     | 18 ++++++++++++++++--
 fs/exfat/namei.c    |  1 +
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/fs/exfat/exfat_fs.h b/fs/exfat/exfat_fs.h
index 89ef5368277f..aff4dcd4e75a 100644
--- a/fs/exfat/exfat_fs.h
+++ b/fs/exfat/exfat_fs.h
@@ -496,6 +496,8 @@ int exfat_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 int exfat_getattr(struct mnt_idmap *idmap, const struct path *path,
 		  struct kstat *stat, unsigned int request_mask,
 		  unsigned int query_flags);
+struct file_kattr;
+int exfat_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
 int exfat_file_fsync(struct file *file, loff_t start, loff_t end, int datasync);
 long exfat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
 long exfat_compat_ioctl(struct file *filp, unsigned int cmd,
diff --git a/fs/exfat/file.c b/fs/exfat/file.c
index 354bdcfe4abc..91e5511945d1 100644
--- a/fs/exfat/file.c
+++ b/fs/exfat/file.c
@@ -14,6 +14,7 @@
 #include <linux/writeback.h>
 #include <linux/filelock.h>
 #include <linux/falloc.h>
+#include <linux/fileattr.h>
 
 #include "exfat_raw.h"
 #include "exfat_fs.h"
@@ -323,6 +324,18 @@ int exfat_getattr(struct mnt_idmap *idmap, const struct path *path,
 	return 0;
 }
 
+int exfat_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+	/*
+	 * exFAT compares filenames through an upcase table, so lookup
+	 * is always case-insensitive. Long names are stored in UTF-16
+	 * with case intact; CASENONPRESERVING stays clear.
+	 */
+	fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+	fa->flags |= FS_CASEFOLD_FL;
+	return 0;
+}
+
 int exfat_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 		  struct iattr *attr)
 {
@@ -817,6 +830,7 @@ const struct file_operations exfat_file_operations = {
 };
 
 const struct inode_operations exfat_file_inode_operations = {
-	.setattr     = exfat_setattr,
-	.getattr     = exfat_getattr,
+	.setattr	= exfat_setattr,
+	.getattr	= exfat_getattr,
+	.fileattr_get	= exfat_fileattr_get,
 };
diff --git a/fs/exfat/namei.c b/fs/exfat/namei.c
index 2c5636634b4a..94002e43db08 100644
--- a/fs/exfat/namei.c
+++ b/fs/exfat/namei.c
@@ -1311,4 +1311,5 @@ const struct inode_operations exfat_dir_inode_operations = {
 	.rename		= exfat_rename,
 	.setattr	= exfat_setattr,
 	.getattr	= exfat_getattr,
+	.fileattr_get	= exfat_fileattr_get,
 };

-- 
2.53.0

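The exFAT patch above relies on the volume's upcase table for name comparison. The following is a minimal userspace sketch of upcase-table comparison, not the kernel's exfat code. The helper names are hypothetical, and the toy table folds only ASCII a-z, whereas a real exFAT volume stores a full UTF-16 upcase table on disk. Both names are folded only for the comparison, and the stored code units are left untouched. That yields case-insensitive lookup with case preserved at rest, the combination the patch reports: FS_XFLAG_CASEFOLD set, FS_XFLAG_CASENONPRESERVING clear.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Map one UTF-16 code unit through the table; units past the end of this
 * (toy) table map to themselves, which is an assumption of the sketch. */
static uint16_t upcase_unit(const uint16_t *table, size_t table_len, uint16_t c)
{
	return c < table_len ? table[c] : c;
}

/* Hypothetical helper: compare two UTF-16 names through the upcase table.
 * The stored names are never modified, so case is preserved at rest. */
bool upcase_names_equal(const uint16_t *a, size_t alen,
			const uint16_t *b, size_t blen,
			const uint16_t *table, size_t table_len)
{
	if (alen != blen)
		return false;
	for (size_t i = 0; i < alen; i++)
		if (upcase_unit(table, table_len, a[i]) !=
		    upcase_unit(table, table_len, b[i]))
			return false;
	return true;
}

/* Build a toy ASCII-only upcase table (a real exFAT table is far larger). */
void build_ascii_upcase_table(uint16_t *table, size_t table_len)
{
	for (size_t i = 0; i < table_len; i++)
		table[i] = (i >= 'a' && i <= 'z') ? (uint16_t)(i - 'a' + 'A')
						  : (uint16_t)i;
}
```

With a 128-entry table, "ReadMe" and "README" compare equal, while "README" and "READMX" do not.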


* [PATCH v12 03/15] fat: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-04-29 18:07 UTC
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Roland Mainz
In-Reply-To: <20260429-case-sensitivity-v12-0-8057123bebe0@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

Report FAT's case sensitivity behavior via the FS_XFLAG_CASEFOLD
and FS_XFLAG_CASENONPRESERVING flags. FAT filesystems are
case-insensitive by default.

MSDOS supports a 'nocase' mount option that enables case-sensitive
behavior, and VFAT becomes case-sensitive when mounted with
'check=strict'; check these options when reporting case sensitivity.

VFAT long filename entries preserve case; without VFAT, only
uppercased 8.3 short names are stored. MSDOS with 'nocase' also
preserves case since the name-formatting code skips upcasing when
'nocase' is set. Check both options when reporting case preservation.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/fat/fat.h         |  3 +++
 fs/fat/file.c        | 36 ++++++++++++++++++++++++++++++++++++
 fs/fat/namei_msdos.c |  1 +
 fs/fat/namei_vfat.c  |  1 +
 4 files changed, 41 insertions(+)

diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index 5a58f0bf8ce8..99ed9228a677 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -10,6 +10,8 @@
 #include <linux/fs_context.h>
 #include <linux/fs_parser.h>
 
+struct file_kattr;
+
 /*
  * vfat shortname flags
  */
@@ -408,6 +410,7 @@ extern void fat_truncate_blocks(struct inode *inode, loff_t offset);
 extern int fat_getattr(struct mnt_idmap *idmap,
 		       const struct path *path, struct kstat *stat,
 		       u32 request_mask, unsigned int flags);
+int fat_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
 extern int fat_file_fsync(struct file *file, loff_t start, loff_t end,
 			  int datasync);
 
diff --git a/fs/fat/file.c b/fs/fat/file.c
index becccdd2e501..37e7049b4c8c 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -17,6 +17,7 @@
 #include <linux/fsnotify.h>
 #include <linux/security.h>
 #include <linux/falloc.h>
+#include <linux/fileattr.h>
 #include "fat.h"
 
 static long fat_fallocate(struct file *file, int mode,
@@ -398,6 +399,40 @@ void fat_truncate_blocks(struct inode *inode, loff_t offset)
 	fat_flush_inodes(inode->i_sb, inode, NULL);
 }
 
+int fat_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+	struct msdos_sb_info *sbi = MSDOS_SB(dentry->d_sb);
+	bool case_sensitive;
+
+	/*
+	 * FAT filesystems are case-insensitive by default. VFAT
+	 * becomes case-sensitive when mounted with 'check=strict',
+	 * which installs vfat_dentry_ops. MSDOS has no such option;
+	 * its 'nocase' mount option selects case-sensitive matching.
+	 *
+	 * VFAT long filename entries preserve case. Without VFAT, only
+	 * uppercased 8.3 short names are stored. MSDOS with 'nocase'
+	 * also preserves case.
+	 */
+	if (sbi->options.isvfat)
+		case_sensitive = sbi->options.name_check == 's';
+	else
+		case_sensitive = sbi->options.nocase;
+
+	if (!case_sensitive) {
+		fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+		fa->flags |= FS_CASEFOLD_FL;
+		if (!sbi->options.isvfat)
+			fa->fsx_xflags |= FS_XFLAG_CASENONPRESERVING;
+	}
+	if (d_inode(dentry)->i_flags & S_IMMUTABLE) {
+		fa->fsx_xflags |= FS_XFLAG_IMMUTABLE;
+		fa->flags |= FS_IMMUTABLE_FL;
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(fat_fileattr_get);
+
 int fat_getattr(struct mnt_idmap *idmap, const struct path *path,
 		struct kstat *stat, u32 request_mask, unsigned int flags)
 {
@@ -575,5 +610,6 @@ EXPORT_SYMBOL_GPL(fat_setattr);
 const struct inode_operations fat_file_inode_operations = {
 	.setattr	= fat_setattr,
 	.getattr	= fat_getattr,
+	.fileattr_get	= fat_fileattr_get,
 	.update_time	= fat_update_time,
 };
diff --git a/fs/fat/namei_msdos.c b/fs/fat/namei_msdos.c
index 4cc65f330fb7..0fd2971ad4b1 100644
--- a/fs/fat/namei_msdos.c
+++ b/fs/fat/namei_msdos.c
@@ -644,6 +644,7 @@ static const struct inode_operations msdos_dir_inode_operations = {
 	.rename		= msdos_rename,
 	.setattr	= fat_setattr,
 	.getattr	= fat_getattr,
+	.fileattr_get	= fat_fileattr_get,
 	.update_time	= fat_update_time,
 };
 
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 918b3756674c..e909447873e3 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -1185,6 +1185,7 @@ static const struct inode_operations vfat_dir_inode_operations = {
 	.rename		= vfat_rename2,
 	.setattr	= fat_setattr,
 	.getattr	= fat_getattr,
+	.fileattr_get	= fat_fileattr_get,
 	.update_time	= fat_update_time,
 };
 

-- 
2.53.0

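The decision in fat_fileattr_get() above can be restated as a pure function, which makes the four mount configurations easy to check. This is a userspace model, not kernel code. struct fat_opts_model and fat_case_xflags() are hypothetical stand-ins for the relevant msdos_sb_info option fields, and the flag values are copied from the uapi additions in patch 02 of this series. The immutable-bit handling is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values from the uapi additions in this series. */
#ifndef FS_XFLAG_CASEFOLD
#define FS_XFLAG_CASEFOLD		0x00040000u
#endif
#ifndef FS_XFLAG_CASENONPRESERVING
#define FS_XFLAG_CASENONPRESERVING	0x00080000u
#endif

/* Hypothetical model of the option fields fat_fileattr_get() consults. */
struct fat_opts_model {
	bool isvfat;		/* mounted as vfat rather than msdos */
	char name_check;	/* 'r', 'n' or 's' for check=relaxed/normal/strict */
	bool nocase;		/* msdos 'nocase' mount option */
};

/* Return the case-related xflags the patch would report for these options. */
uint32_t fat_case_xflags(const struct fat_opts_model *o)
{
	bool case_sensitive = o->isvfat ? (o->name_check == 's') : o->nocase;
	uint32_t xflags = 0;

	if (!case_sensitive) {
		xflags |= FS_XFLAG_CASEFOLD;
		/* Without VFAT long names, only uppercased 8.3 names are stored. */
		if (!o->isvfat)
			xflags |= FS_XFLAG_CASENONPRESERVING;
	}
	return xflags;
}
```

Under this model, a default vfat mount reports FS_XFLAG_CASEFOLD only, a default msdos mount reports both flags, and vfat with check=strict or msdos with nocase reports neither (POSIX semantics).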


* [PATCH v12 02/15] fs: Add case sensitivity flags to file_kattr
From: Chuck Lever @ 2026-04-29 18:07 UTC
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Darrick J. Wong, Roland Mainz
In-Reply-To: <20260429-case-sensitivity-v12-0-8057123bebe0@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

Enable upper layers such as NFSD to retrieve case sensitivity
information from file systems by adding FS_XFLAG_CASEFOLD and
FS_XFLAG_CASENONPRESERVING flags.

Filesystems report case-insensitive or case-nonpreserving behavior
by setting these flags directly in fa->fsx_xflags. The default
(flags unset) indicates POSIX semantics: case-sensitive and
case-preserving. Both flags are added to FS_XFLAG_RDONLY_MASK so
FS_IOC_FSSETXATTR silently strips them, keeping the new xflags
strictly a reporting interface. Callers that want to toggle
casefolding continue to use FS_IOC_SETFLAGS with FS_CASEFOLD_FL,
the established UAPI on filesystems that support the operation
(ext4 and f2fs on empty directories).

Case sensitivity information is exported to userspace via the
fsx_xflags field returned by the FS_IOC_FSGETXATTR ioctl and the
fa_xflags field returned by the file_getattr() system call.

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/file_attr.c           | 4 ++++
 include/linux/fileattr.h | 3 ++-
 include/uapi/linux/fs.h  | 7 +++++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/file_attr.c b/fs/file_attr.c
index f429da66a317..bfb00d256dd5 100644
--- a/fs/file_attr.c
+++ b/fs/file_attr.c
@@ -37,6 +37,8 @@ void fileattr_fill_xflags(struct file_kattr *fa, u32 xflags)
 		fa->flags |= FS_PROJINHERIT_FL;
 	if (fa->fsx_xflags & FS_XFLAG_VERITY)
 		fa->flags |= FS_VERITY_FL;
+	if (fa->fsx_xflags & FS_XFLAG_CASEFOLD)
+		fa->flags |= FS_CASEFOLD_FL;
 }
 EXPORT_SYMBOL(fileattr_fill_xflags);
 
@@ -67,6 +69,8 @@ void fileattr_fill_flags(struct file_kattr *fa, u32 flags)
 		fa->fsx_xflags |= FS_XFLAG_PROJINHERIT;
 	if (fa->flags & FS_VERITY_FL)
 		fa->fsx_xflags |= FS_XFLAG_VERITY;
+	if (fa->flags & FS_CASEFOLD_FL)
+		fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
 }
 EXPORT_SYMBOL(fileattr_fill_flags);
 
diff --git a/include/linux/fileattr.h b/include/linux/fileattr.h
index 3780904a63a6..58044b598016 100644
--- a/include/linux/fileattr.h
+++ b/include/linux/fileattr.h
@@ -16,7 +16,8 @@
 
 /* Read-only inode flags */
 #define FS_XFLAG_RDONLY_MASK \
-	(FS_XFLAG_PREALLOC | FS_XFLAG_HASATTR | FS_XFLAG_VERITY)
+	(FS_XFLAG_PREALLOC | FS_XFLAG_HASATTR | FS_XFLAG_VERITY | \
+	 FS_XFLAG_CASEFOLD | FS_XFLAG_CASENONPRESERVING)
 
 /* Flags to indicate valid value of fsx_ fields */
 #define FS_XFLAG_VALUES_MASK \
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 13f71202845e..2ea4c81df08f 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -254,6 +254,13 @@ struct file_attr {
 #define FS_XFLAG_DAX		0x00008000	/* use DAX for IO */
 #define FS_XFLAG_COWEXTSIZE	0x00010000	/* CoW extent size allocator hint */
 #define FS_XFLAG_VERITY		0x00020000	/* fs-verity enabled */
+/*
+ * Case handling flags (read-only, cannot be set via ioctl).
+ * Default (neither set) indicates POSIX semantics: case-sensitive
+ * lookups and case-preserving storage.
+ */
+#define FS_XFLAG_CASEFOLD	0x00040000	/* case-insensitive lookups */
+#define FS_XFLAG_CASENONPRESERVING 0x00080000	/* case not preserved */
 #define FS_XFLAG_HASATTR	0x80000000	/* no DIFLAG for this	*/
 
 /* the read-only stuff doesn't really belong here, but any other place is

-- 
2.53.0

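The patch above relies on two mechanisms: the translation between FS_CASEFOLD_FL and FS_XFLAG_CASEFOLD added to the fill helpers, and the read-only mask that keeps FS_IOC_FSSETXATTR from setting the new bits. Below is a small userspace model of both. The helper names are hypothetical, and only the casefold-related bits are modeled. This patch does not show where the kernel applies FS_XFLAG_RDONLY_MASK on the set path, so strip_rdonly_xflags() only sketches the effect the commit message describes.

```c
#include <stdint.h>

/* Flag values from include/uapi/linux/fs.h, plus the two new ones. */
#define FS_XFLAG_PREALLOC		0x00000002u
#define FS_XFLAG_IMMUTABLE		0x00000008u
#define FS_XFLAG_VERITY			0x00020000u
#define FS_XFLAG_CASEFOLD		0x00040000u
#define FS_XFLAG_CASENONPRESERVING	0x00080000u
#define FS_XFLAG_HASATTR		0x80000000u
#define FS_CASEFOLD_FL			0x40000000u

/* Mirrors FS_XFLAG_RDONLY_MASK as extended by this patch. */
#define MODEL_XFLAG_RDONLY_MASK \
	(FS_XFLAG_PREALLOC | FS_XFLAG_HASATTR | FS_XFLAG_VERITY | \
	 FS_XFLAG_CASEFOLD | FS_XFLAG_CASENONPRESERVING)

/* Casefold part of fileattr_fill_flags(): legacy flag to xflag. */
uint32_t casefold_fl_to_xflags(uint32_t flags)
{
	return (flags & FS_CASEFOLD_FL) ? FS_XFLAG_CASEFOLD : 0;
}

/* Casefold part of fileattr_fill_xflags(): xflag to legacy flag. */
uint32_t casefold_xflags_to_fl(uint32_t xflags)
{
	return (xflags & FS_XFLAG_CASEFOLD) ? FS_CASEFOLD_FL : 0;
}

/* Sketch: xflags a set request keeps once read-only bits are dropped. */
uint32_t strip_rdonly_xflags(uint32_t requested)
{
	return requested & ~MODEL_XFLAG_RDONLY_MASK;
}
```

As the commit message notes, FS_XFLAG_CASENONPRESERVING has no FS_*_FL counterpart, and toggling casefolding still goes through FS_IOC_SETFLAGS with FS_CASEFOLD_FL.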


* [PATCH v12 01/15] fs: Move file_kattr initialization to callers
From: Chuck Lever @ 2026-04-29 18:07 UTC
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Darrick J. Wong, Roland Mainz
In-Reply-To: <20260429-case-sensitivity-v12-0-8057123bebe0@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

fileattr_fill_xflags() and fileattr_fill_flags() memset the
entire file_kattr struct before populating select fields, so
callers cannot pre-set fields in fa->fsx_xflags without having
their values clobbered. Darrick Wong noted that a function
named "fill_xflags" touching more than xflags forces callers
to know implementation details beyond its apparent scope.

Drop the memset from both fill functions and initialize at the
entry points instead: ioctl_setflags(), ioctl_fssetxattr(),
the file_setattr() syscall, and xfs_ioc_fsgetxattra() now
declare fa with an aggregate initializer. ioctl_getflags(),
ioctl_fsgetxattr(), and the file_getattr() syscall already
aggregate-initialize fa to pass flags_valid/fsx_valid hints
into vfs_fileattr_get().

Subsequent patches rely on this so that ->fileattr_get()
handlers can set case-sensitivity flags (FS_XFLAG_CASEFOLD,
FS_XFLAG_CASENONPRESERVING) in fa->fsx_xflags before the fill
functions run.

Suggested-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/file_attr.c     | 12 ++++--------
 fs/xfs/xfs_ioctl.c |  2 +-
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/file_attr.c b/fs/file_attr.c
index da983e105d70..f429da66a317 100644
--- a/fs/file_attr.c
+++ b/fs/file_attr.c
@@ -15,12 +15,10 @@
  * @fa:		fileattr pointer
  * @xflags:	FS_XFLAG_* flags
  *
- * Set ->fsx_xflags, ->fsx_valid and ->flags (translated xflags).  All
- * other fields are zeroed.
+ * Set ->fsx_xflags, ->fsx_valid and ->flags (translated xflags).
  */
 void fileattr_fill_xflags(struct file_kattr *fa, u32 xflags)
 {
-	memset(fa, 0, sizeof(*fa));
 	fa->fsx_valid = true;
 	fa->fsx_xflags = xflags;
 	if (fa->fsx_xflags & FS_XFLAG_IMMUTABLE)
@@ -48,11 +46,9 @@ EXPORT_SYMBOL(fileattr_fill_xflags);
  * @flags:	FS_*_FL flags
  *
  * Set ->flags, ->flags_valid and ->fsx_xflags (translated flags).
- * All other fields are zeroed.
  */
 void fileattr_fill_flags(struct file_kattr *fa, u32 flags)
 {
-	memset(fa, 0, sizeof(*fa));
 	fa->flags_valid = true;
 	fa->flags = flags;
 	if (fa->flags & FS_SYNC_FL)
@@ -325,7 +321,7 @@ int ioctl_setflags(struct file *file, unsigned int __user *argp)
 {
 	struct mnt_idmap *idmap = file_mnt_idmap(file);
 	struct dentry *dentry = file->f_path.dentry;
-	struct file_kattr fa;
+	struct file_kattr fa = {};
 	unsigned int flags;
 	int err;
 
@@ -357,7 +353,7 @@ int ioctl_fssetxattr(struct file *file, void __user *argp)
 {
 	struct mnt_idmap *idmap = file_mnt_idmap(file);
 	struct dentry *dentry = file->f_path.dentry;
-	struct file_kattr fa;
+	struct file_kattr fa = {};
 	int err;
 
 	err = copy_fsxattr_from_user(&fa, argp);
@@ -431,7 +427,7 @@ SYSCALL_DEFINE5(file_setattr, int, dfd, const char __user *, filename,
 	struct path filepath __free(path_put) = {};
 	unsigned int lookup_flags = 0;
 	struct file_attr fattr;
-	struct file_kattr fa;
+	struct file_kattr fa = {};
 	int error;
 
 	BUILD_BUG_ON(sizeof(struct file_attr) < FILE_ATTR_SIZE_VER0);
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 46e234863644..ed9b4846c05f 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -517,7 +517,7 @@ xfs_ioc_fsgetxattra(
 	xfs_inode_t		*ip,
 	void			__user *arg)
 {
-	struct file_kattr	fa;
+	struct file_kattr	fa = {};
 
 	xfs_ilock(ip, XFS_ILOCK_SHARED);
 	xfs_fill_fsxattr(ip, XFS_ATTR_FORK, &fa);

-- 
2.53.0

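This userspace model of fileattr_fill_flags(), before and after the patch above, shows why the memset had to go. struct kattr_model is a hypothetical, trimmed stand-in for struct file_kattr, and only the FS_CASEFOLD_FL translation from patch 02 is modeled. A ->fileattr_get() handler that places FS_XFLAG_CASENONPRESERVING in fsx_xflags before the fill helper runs loses that bit if the helper starts with memset().

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FS_CASEFOLD_FL			0x40000000u
#define FS_XFLAG_CASEFOLD		0x00040000u
#define FS_XFLAG_CASENONPRESERVING	0x00080000u

/* Hypothetical, trimmed stand-in for struct file_kattr. */
struct kattr_model {
	uint32_t flags;		/* FS_*_FL view */
	uint32_t fsx_xflags;	/* FS_XFLAG_* view */
	bool flags_valid;
	bool fsx_valid;
};

/* Before the patch: the helper wiped the whole struct first. */
void fill_flags_with_memset(struct kattr_model *fa, uint32_t flags)
{
	memset(fa, 0, sizeof(*fa));
	fa->flags_valid = true;
	fa->flags = flags;
	if (fa->flags & FS_CASEFOLD_FL)
		fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
}

/* After the patch: callers zero-initialize fa themselves (the kernel
 * uses "= {}"), so bits already in fsx_xflags survive the fill. */
void fill_flags_without_memset(struct kattr_model *fa, uint32_t flags)
{
	fa->flags_valid = true;
	fa->flags = flags;
	if (fa->flags & FS_CASEFOLD_FL)
		fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
}
```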


* [PATCH v12 00/15] Exposing case folding behavior
From: Chuck Lever @ 2026-04-29 18:07 UTC
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Darrick J. Wong, Roland Mainz, Steve French

Following on from:

https://lore.kernel.org/linux-nfs/20251021-zypressen-bazillus-545a44af57fd@brauner/T/#m0ba197d75b7921d994cf284f3cef3a62abb11aaa

I'm attempting to implement enough support in the Linux VFS to
enable file services like NFSD and ksmbd (and user space
equivalents) to provide the actual status of case folding support
in local file systems. The default behavior for local file systems
not explicitly supported in this series is to reflect the usual
POSIX behaviors:

  case-insensitive = false
  case-nonpreserving = false

The case-insensitivity and case-nonpreserving booleans can be
consumed immediately by NFSD. These two attributes have been part of
the NFSv3 and NFSv4 protocols for decades, in order to support NFS
client implementations on non-POSIX systems.
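This sketch shows how a server might turn the reported xflags into the two protocol booleans (NFSv3 PATHCONF case_insensitive/case_preserving, NFSv4 FATTR4_CASE_INSENSITIVE/FATTR4_CASE_PRESERVING). The second flag is inverted: the kernel reports case-nonpreserving, while the protocols ask about case-preserving. The struct and function names are hypothetical, and this is not the nfsd implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef FS_XFLAG_CASEFOLD
#define FS_XFLAG_CASEFOLD		0x00040000u
#endif
#ifndef FS_XFLAG_CASENONPRESERVING
#define FS_XFLAG_CASENONPRESERVING	0x00080000u
#endif

/* Hypothetical container for the two protocol-level booleans. */
struct case_attrs {
	bool case_insensitive;
	bool case_preserving;
};

/* Neither flag set means POSIX: case-sensitive and case-preserving. */
struct case_attrs case_attrs_from_xflags(uint32_t xflags)
{
	struct case_attrs a = {
		.case_insensitive = (xflags & FS_XFLAG_CASEFOLD) != 0,
		.case_preserving = (xflags & FS_XFLAG_CASENONPRESERVING) == 0,
	};
	return a;
}
```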

Support for user space file servers is why this series exposes case
folding information via a user-space API. I don't know of any other
category of user-space application that requires access to case
folding info.

The Linux NFS community has a growing interest in supporting NFS
clients on Windows and macOS platforms, where file name behavior does
not align with traditional POSIX semantics.

One example of a Windows-based NFS client is [1]. This client
requires servers to report FATTR4_WORD0_CASE_INSENSITIVE = TRUE for
proper operation; that is a hard requirement for Windows client
interoperability because Windows applications expect case-insensitive
behavior. When an NFS client knows the server is case-insensitive, it
can avoid issuing multiple LOOKUP/READDIR requests to search for case
variants, and Win32 applications work correctly without manual
workarounds or code changes.

Even the Linux client can take advantage of this information. Trond
merged patches 4 years ago [2] that introduced case-insensitivity
support for the Hammerspace NFS server. In
particular, when a client detects a case-insensitive NFS share,
negative dentry caching must be disabled (a lookup for "FILE.TXT"
failing shouldn't cache a negative entry when "file.txt" exists)
and directory change invalidation must clear all cached case-folded
file name variants.

Hammerspace servers and several other NFS server implementations
operate in multi-protocol environments, where a single file service
instance caters to both NFS and SMB clients. In those cases, things
work more smoothly for everyone when the NFS client can see and adapt
to the case folding behavior that SMB users rely on and expect. NFSD
needs to support the case-insensitivity and case-nonpreserving
booleans properly in order to participate as a first-class citizen
in such environments.

[1] https://github.com/kofemann/ms-nfs41-client

[2] https://patchwork.kernel.org/project/linux-nfs/cover/20211217203658.439352-1-trondmy@kernel.org/

---
Changes since v11:
- isofs: Wire .fileattr_get only on directory inodes, since
  NFSD and ksmbd query casefolding on directories (Jan Kara)
- xfs, hfsplus: Drop the FS_CASEFOLD_FL fileattr_get mask;
  admit the bit through fileattr_set's allowlist instead
- Address findings from sashiko(gemini-3) and gpt-5.5:
  - cifs: Wire .fileattr_get on cifs_namespace_inode_operations
    so DFS referral / automount directories report case handling
  - fat, ntfs3: Fill FS_IMMUTABLE_FL in fileattr_get
  - hfsplus: Hide FS_CASEFOLD_FL from the legacy flags view so
    chattr round-trips do not hit the setflags whitelist
  - nfs: Clear NFS_CAP_CASE_INSENSITIVE and
    NFS_CAP_CASE_NONPRESERVING before re-OR'ing in the v3 and
    v4 probe paths so re-probe / TSM does not retain stale caps
  - nfsd: Switch nfsd_get_case_info() to errno return so
    v3 PATHCONF and v4 GETATTR can apply version-appropriate
    policy on failure
  - nfsd: Use dget_parent() in v4 case-attr probe to keep
    the parent dentry referenced across the query
  - isofs: Report FS_XFLAG_CASENONPRESERVING for map=n/map=a

Changes since v10:
- cifs: Source case-handling flags from the server's cached
  FS_ATTRIBUTE_INFORMATION reply instead of the nocase mount
  option, with a nocase fallback when the reply is absent
- Address findings from sashiko(gemini-3) and gpt-5.5:
  - nfs: Skip pathconf case bits on NFSv4 (set via FATTR4_CASE_*
    instead)
  - xfs: Hide FS_CASEFOLD_FL from the legacy flags view so
    chattr round-trips do not hit the setflags whitelist
  - ext4, f2fs: Drop redundant fileattr_get patches; the
    FS_CASEFOLD_FL translation in fileattr_fill_flags() already
    reports FS_XFLAG_CASEFOLD for casefolded directories
  - nfsd: Report FATTR4_HOMOGENEOUS = FALSE when the exported
    filesystem has a Unicode encoding, since per-directory
    casefold makes the fs-scoped case attributes inhomogeneous
  - nfsd: Document in nfsd_get_case_info() why -ENOIOCTLCMD and
    -ENOTTY are swallowed while other errors propagate
  - fat: Honor vfat 'check=strict' when reporting FS_XFLAG_CASEFOLD
  - Set FS_CASEFOLD_FL so FS_IOC_GETFLAGS reflects case-insensitive
    mount
  - isofs: Register fileattr_get on regular file and symlink inodes,
    not just directories
  - nfsd: Query NFSv4 FATTR4_CASE_* from the parent directory for
    non-directory objects, since casefold lives on the directory

Changes since v9:
- nfs: always probe PATHCONF for case caps. Default to case-
  preserving when the server does not report case_preserving
- nfsd, ksmbd: tolerate -ENOTTY from vfs_fileattr_get() so
  overlayfs exports on backing filesystems without fileattr_get
  do not fail the RPC
- xfs: map FS_XFLAG_CASEFOLD inside xfs_ip2xflags() so BULKSTAT
  and FS_IOC_FSGETXATTR report the flag consistently
- vboxsf: reject a short host reply to SHFL_INFO_VOLUME before
  trusting volinfo.properties.case_sensitive

Changes since v8:
- Rebase on v7.0-rc1

Changes since v7:
- Split file_attr initialization changes into a separate patch

Changes since v6:
- Remove the memset from vfs_fileattr_get

Changes since v5:
- Finish the conversion to FS_XFLAGs
- NFSv4 GETATTR now clears the attr mask bit if nfsd_get_case_info()
  fails

Changes since v4:
- Observe the MSDOS "nocase" mount option
- Define new FS_XFLAGs for the user API

Changes since v3:
- Change fa->case_preserving to fa_case_nonpreserving
- VFAT is case preserving
- Make new fields available to user space

Changes since v2:
- Remove unicode labels
- Replace vfs_get_case_info
- Add support for several more local file system implementations
- Add support for in-kernel SMB server

Changes since RFC:
- Use file_getattr instead of statx
- Postpone exposing Unicode version until later
- Support NTFS and ext4 in addition to FAT
- Support NFSv4 fattr4 in addition to NFSv3 PATHCONF

---
Chuck Lever (15):
      fs: Move file_kattr initialization to callers
      fs: Add case sensitivity flags to file_kattr
      fat: Implement fileattr_get for case sensitivity
      exfat: Implement fileattr_get for case sensitivity
      ntfs3: Implement fileattr_get for case sensitivity
      hfs: Implement fileattr_get for case sensitivity
      hfsplus: Report case sensitivity in fileattr_get
      xfs: Report case sensitivity in fileattr_get
      cifs: Implement fileattr_get for case sensitivity
      nfs: Implement fileattr_get for case sensitivity
      vboxsf: Implement fileattr_get for case sensitivity
      isofs: Implement fileattr_get for case sensitivity
      nfsd: Report export case-folding via NFSv3 PATHCONF
      nfsd: Implement NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING
      ksmbd: Report filesystem case sensitivity via FS_ATTRIBUTE_INFORMATION

 fs/exfat/exfat_fs.h            |  2 ++
 fs/exfat/file.c                | 18 +++++++++--
 fs/exfat/namei.c               |  1 +
 fs/fat/fat.h                   |  3 ++
 fs/fat/file.c                  | 36 +++++++++++++++++++++
 fs/fat/namei_msdos.c           |  1 +
 fs/fat/namei_vfat.c            |  1 +
 fs/file_attr.c                 | 16 +++++-----
 fs/hfs/dir.c                   |  1 +
 fs/hfs/hfs_fs.h                |  2 ++
 fs/hfs/inode.c                 | 14 ++++++++
 fs/hfsplus/inode.c             | 16 +++++++++-
 fs/isofs/dir.c                 | 16 ++++++++++
 fs/isofs/isofs.h               |  3 ++
 fs/nfs/client.c                | 25 +++++++++++----
 fs/nfs/inode.c                 | 15 +++++++++
 fs/nfs/internal.h              |  3 ++
 fs/nfs/namespace.c             |  2 ++
 fs/nfs/nfs3proc.c              |  2 ++
 fs/nfs/nfs3xdr.c               |  7 ++--
 fs/nfs/nfs4proc.c              | 10 ++++--
 fs/nfs/proc.c                  |  3 ++
 fs/nfs/symlink.c               |  3 ++
 fs/nfsd/nfs3proc.c             | 36 ++++++++++++++++-----
 fs/nfsd/nfs4xdr.c              | 52 ++++++++++++++++++++++++++++--
 fs/nfsd/vfs.c                  | 72 ++++++++++++++++++++++++++++++++++++++++++
 fs/nfsd/vfs.h                  |  3 ++
 fs/nfsd/xdr3.h                 |  4 +--
 fs/ntfs3/file.c                | 29 +++++++++++++++++
 fs/ntfs3/inode.c               |  1 +
 fs/ntfs3/namei.c               |  2 ++
 fs/ntfs3/ntfs_fs.h             |  1 +
 fs/smb/client/cifsfs.c         | 41 ++++++++++++++++++++++++
 fs/smb/client/cifsfs.h         |  3 ++
 fs/smb/client/namespace.c      |  1 +
 fs/smb/server/smb2pdu.c        | 30 ++++++++++++++----
 fs/vboxsf/dir.c                |  1 +
 fs/vboxsf/file.c               |  6 ++--
 fs/vboxsf/super.c              |  7 ++++
 fs/vboxsf/utils.c              | 30 ++++++++++++++++++
 fs/vboxsf/vfsmod.h             |  6 ++++
 fs/xfs/libxfs/xfs_inode_util.c |  2 ++
 fs/xfs/xfs_ioctl.c             | 22 ++++++++++---
 include/linux/fileattr.h       |  3 +-
 include/linux/nfs_fs_sb.h      |  2 +-
 include/linux/nfs_xdr.h        |  2 ++
 include/uapi/linux/fs.h        |  7 ++++
 47 files changed, 513 insertions(+), 50 deletions(-)
---
base-commit: 6596a02b207886e9e00bb0161c7fd59fea53c081
change-id: 20260422-case-sensitivity-5cbffc8f1558

Best regards,
--  
Chuck Lever



* Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Mateusz Guzik @ 2026-04-28 14:39 UTC
  To: Christian Brauner
  Cc: Jori Koolstra, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Alexander Viro, Arnd Bergmann,
	H . Peter Anvin, Jan Kara, Peter Zijlstra, Andrey Albershteyn,
	Masami Hiramatsu, Jiri Olsa, Thomas Weißschuh,
	Mathieu Desnoyers, Jeff Layton, Aleksa Sarai, cmirabil,
	Greg Kroah-Hartman, linux-kernel, linux-fsdevel, linux-api,
	linux-arch
In-Reply-To: <20260428-zoodirektor-latten-e412db97141d@brauner>

On Tue, Apr 28, 2026 at 10:55 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Mon, Apr 27, 2026 at 06:30:42PM +0200, Mateusz Guzik wrote:
> > On Mon, Apr 27, 2026 at 5:14 PM Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > > Things proceed to handle_truncate:
> > > >       int error = get_write_access(inode);
> > > >       if (error)
> > > >               return error;
> > > >
> > > >       error = security_file_truncate(filp);
> > > >       if (!error) {
> > > >               error = do_truncate(idmap, path->dentry, 0,
> > > >                                   ATTR_MTIME|ATTR_CTIME|ATTR_OPEN,
> > > >                                   filp);
> > > >       }
> > > >
> > > > I'm going to ignore the LSM situation and do_truncate failure modes in this one.
> > > >
> > > > AFAICS nothing prevents the same user from racing against file creation to
> > > > execve it, which starts with exe_file_deny_write_access. Should the
> > > > other thread win the race, get_write_access will fail and the WARN_ON
> > > > splat will be generated. That is definitely a problem.
> > >
> > > That can't happen:
> > >
> > > static inline int get_write_access(struct inode *inode)
> > > {
> > >         return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
> > > }
> > >
> > > and the check is:
> > >
> > > error = handle_truncate(idmap, file);
> > > if (unlikely(error > 0)) {
> > >
> > > This was a catch all for broken LSM hook or ->open() instance.
> > >
> >
> > So with this prog:
> > #include <fcntl.h>
> >
> > int main(void)
> > {
> >     open("test", O_TRUNC);
> > }
> >
> > I verified writecount is 0 on entry to handle_truncate like so:
> >
> > bpftrace -e 'kprobe:security_file_truncate { @[comm, (int64)((struct
> > file *)arg0)->f_path.dentry->d_inode->i_writecount.counter] = count();
> > }'
> >
> > @[a.out, 1]: 1
> >
> > i.e., get_write_access in handle_truncate transitioned the count 0 -> 1
> >
> > but then what prevents the following race:
> >
> > CPU0                    CPU1
> > open("test")            execve("test")
> >   handle_truncate         do_open_execat
> >                             exe_file_deny_write_access # should succeed as count is 0?
> >   get_write_access # should fail as the count is now -1?
>
> I'm not arguing that get_write_access() cannot fail. I'm arguing that it
> cannot hit that WARN_ON() as you said above because get_write_access()
> returns either 0 or -ETXTBSY.

Oops, right:
	error = handle_truncate(idmap, file);
	if (unlikely(error > 0)) {
		WARN_ON(1);
		error = -EINVAL;
	}

I mentally had it warn on any error.
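The exchange above turns on the semantics of inode->i_writecount. Below is a userspace C11 model of the race Mateusz sketched. model_get_write_access() follows the atomic_inc_unless_negative() definition of get_write_access() that Christian quoted. The execve side is assumed to use the matching decrement-unless-positive pattern (as deny_write_access() does); treat that as an assumption about exe_file_deny_write_access(). All function names here are hypothetical. Whichever side loses gets -ETXTBSY, a negative error, so the WARN_ON guarded by error > 0 is never reached.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Model of atomic_inc_unless_negative(): add a writer unless denied. */
static bool inc_unless_negative(atomic_int *v)
{
	int old = atomic_load(v);

	do {
		if (old < 0)
			return false;
	} while (!atomic_compare_exchange_weak(v, &old, old + 1));
	return true;
}

/* Model of atomic_dec_unless_positive(): deny writers unless one exists. */
static bool dec_unless_positive(atomic_int *v)
{
	int old = atomic_load(v);

	do {
		if (old > 0)
			return false;
	} while (!atomic_compare_exchange_weak(v, &old, old - 1));
	return true;
}

/* Same contract as the quoted get_write_access(): 0 or -ETXTBSY. */
int model_get_write_access(atomic_int *writecount)
{
	return inc_unless_negative(writecount) ? 0 : -ETXTBSY;
}

/* Assumed execve-side counterpart: 0 or -ETXTBSY. */
int model_deny_write_access(atomic_int *writecount)
{
	return dec_unless_positive(writecount) ? 0 : -ETXTBSY;
}
```

If execve wins, the count goes from 0 to -1 and the truncating open fails with -ETXTBSY. If the open wins, the count goes to 1 and execve fails instead.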


* Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Paulo Alcantara @ 2026-04-28 14:01 UTC
  To: Stefan Metzmacher, Christian Brauner, Jori Koolstra, Jeff Layton
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Alexander Viro, Arnd Bergmann, H . Peter Anvin,
	Jan Kara, Peter Zijlstra, Andrey Albershteyn, Masami Hiramatsu,
	Jiri Olsa, Thomas Weißschuh, Mathieu Desnoyers, Aleksa Sarai,
	cmirabil, Greg Kroah-Hartman, linux-kernel, linux-fsdevel,
	linux-api, linux-arch
In-Reply-To: <b97874a9-9fef-4f7c-8505-cc23e4b45355@samba.org>

Stefan Metzmacher <metze@samba.org> writes:

> Am 27.04.26 um 17:48 schrieb Christian Brauner:
>> On Sun, Apr 12, 2026 at 03:54:33PM +0200, Jori Koolstra wrote:
>>> [...]
>> 
>> So this is definitely a patchset worth doing, but it will be hairy. And
>> Mateusz is right: as written, this doesn't work. The canonical pattern,
>> as e.g. dentry_open() does it, is to preallocate the file.
>> 
>> I do wonder though whether we shouldn't just make O_CREAT | O_DIRECTORY
>> work. I remember making a vague comment about this a few years ago
>> (cf. [1]). It might even be less hairy to get that one right, as all
>> the thinking for O_CREAT is already there.
>> 
>> What was the rationale for mkdirat2() instead of threading this through
>> openat()/openat2() with O_CREAT?
>> 
>> And side-question: @Jeff, can nfs atomic open deal with O_CREAT |
>> O_DIRECTORY?
>
> If it helps: the SMB2/3 protocol has only a single SMB2 Create operation,
> which uses FILE_CREATE+FILE_NON_DIRECTORY_FILE or FILE_CREATE+FILE_DIRECTORY_FILE.

Yes. However, cifs.ko handles atomic open of regular files only.

IIRC, NFS doesn't handle atomic opens of directories either. Jeff
can confirm that.

* Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Stefan Metzmacher @ 2026-04-28 13:49 UTC (permalink / raw)
  To: Christian Brauner, Jori Koolstra, Jeff Layton
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Alexander Viro, Arnd Bergmann, H . Peter Anvin,
	Jan Kara, Peter Zijlstra, Andrey Albershteyn, Masami Hiramatsu,
	Jiri Olsa, Thomas Weißschuh, Mathieu Desnoyers, Aleksa Sarai,
	cmirabil, Greg Kroah-Hartman, linux-kernel, linux-fsdevel,
	linux-api, linux-arch
In-Reply-To: <b97874a9-9fef-4f7c-8505-cc23e4b45355@samba.org>

On 28.04.26 at 15:39, Stefan Metzmacher wrote:
> On 27.04.26 at 17:48, Christian Brauner wrote:
>> On Sun, Apr 12, 2026 at 03:54:33PM +0200, Jori Koolstra wrote:
>>> [...]
>>
>> So this is definitely a patchset worth doing, but it will be hairy. And
>> Mateusz is right: as written, this doesn't work. The canonical pattern,
>> as e.g. dentry_open() does it, is to preallocate the file.
>>
>> I do wonder though whether we shouldn't just make O_CREAT | O_DIRECTORY
>> work. I remember making a vague comment about this a few years ago
>> (cf. [1]). It might even be less hairy to get that one right, as all
>> the thinking for O_CREAT is already there.
>>
>> What was the rationale for mkdirat2() instead of threading this through
>> openat()/openat2() with O_CREAT?
>>
>> And side-question: @Jeff, can nfs atomic open deal with O_CREAT |
>> O_DIRECTORY?
> 
> If it helps: the SMB2/3 protocol has only a single SMB2 Create operation,
> which uses FILE_CREATE+FILE_NON_DIRECTORY_FILE or FILE_CREATE+FILE_DIRECTORY_FILE.
> 
> Given that openat() ignores unknown flags and flag combinations, maybe this
> should be openat2()-only, perhaps even with a new flag (at least for the
> userspace interface), or else do_sys_open() would have to reject it for
> open() and openat().

I just found the interaction of __O_TMPFILE and O_DIRECTORY. Given that,
there should be an O_MKDIR or something similar that's openat2()-only.
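For context: O_TMPFILE is not a single bit. It is defined as __O_TMPFILE | O_DIRECTORY, so a kernel that predates O_TMPFILE sees an O_DIRECTORY open requesting write access to a directory and fails it with EISDIR instead of doing something unexpected. A new O_MKDIR-style flag would need a similarly safe failure mode on the legacy open()/openat() path, which silently drops unknown bits. The sketch below only checks that bit relationship and probes whether a directory supports O_TMPFILE; probe_o_tmpfile() is a hypothetical helper, and EOPNOTSUPP from a filesystem without O_TMPFILE support is a legitimate result.

```c
#define _GNU_SOURCE          /* O_TMPFILE is a GNU extension in glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* O_TMPFILE deliberately carries the O_DIRECTORY bit. */
static bool tmpfile_includes_o_directory(void)
{
	return (O_TMPFILE & O_DIRECTORY) == O_DIRECTORY;
}

/*
 * Hypothetical probe: returns 0 if an unnamed temporary file can be
 * created in dir, otherwise -errno. EISDIR suggests a kernel that
 * ignored __O_TMPFILE and only saw O_DIRECTORY | O_RDWR; EOPNOTSUPP
 * means the filesystem does not support O_TMPFILE.
 */
static int probe_o_tmpfile(const char *dir)
{
	int fd = open(dir, O_TMPFILE | O_RDWR | O_CLOEXEC, 0600);

	if (fd < 0)
		return -errno;
	close(fd);
	return 0;
}
```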

> While we're at it, an O_TMPDIR would also be wonderful to have.
> Currently Samba works around its absence by using a hidden directory name,
> which is invisible to SMB clients, but NFS and local users can see it.

That should also be openat2()-only, if added.

metze


* Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Stefan Metzmacher @ 2026-04-28 13:39 UTC (permalink / raw)
  To: Christian Brauner, Jori Koolstra, Jeff Layton
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Alexander Viro, Arnd Bergmann, H . Peter Anvin,
	Jan Kara, Peter Zijlstra, Andrey Albershteyn, Masami Hiramatsu,
	Jiri Olsa, Thomas Weißschuh, Mathieu Desnoyers, Aleksa Sarai,
	cmirabil, Greg Kroah-Hartman, linux-kernel, linux-fsdevel,
	linux-api, linux-arch
In-Reply-To: <20260427-umlegen-aufbau-ee3a97f1528a@brauner>

On 27.04.26 at 17:48, Christian Brauner wrote:
> On Sun, Apr 12, 2026 at 03:54:33PM +0200, Jori Koolstra wrote:
>> Currently there is no way to race-freely create and open a directory.
>> For regular files we have open(O_CREAT) for creating a new file inode,
>> and returning a pinning fd to it. The lack of such functionality for
>> directories means that when populating a directory tree there's always
>> a race involved: the inodes first need to be created, and then opened
>> to adjust their permissions/ownership/labels/timestamps/acls/xattrs/...,
>> but in the time window between the creation and the opening they might
>> be replaced by something else.
>>
>> Addressing this race without proper APIs is possible (by immediately
>> fstat()ing what was opened, to verify that it has the right inode type),
>> but difficult to get right. Hence, mkdirat2() that creates a directory
>> and returns an O_DIRECTORY fd is useful.
>>
>> This feature idea (and description) is taken from the UAPI group:
>> https://github.com/uapi-group/kernel-features?tab=readme-ov-file#race-free-creation-and-opening-of-non-file-inodes
>>
>> Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
>> ---
>>   arch/x86/entry/syscalls/syscall_64.tbl |  1 +
>>   fs/internal.h                          |  2 ++
>>   fs/namei.c                             | 44 +++++++++++++++++++++++---
>>   include/linux/syscalls.h               |  2 ++
>>   include/uapi/asm-generic/unistd.h      |  5 ++-
>>   scripts/syscall.tbl                    |  1 +
>>   6 files changed, 50 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
>> index 524155d655da..e200ca2067a4 100644
>> --- a/arch/x86/entry/syscalls/syscall_64.tbl
>> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
>> @@ -396,6 +396,7 @@
>>   469	common	file_setattr		sys_file_setattr
>>   470	common	listns			sys_listns
>>   471	common	rseq_slice_yield	sys_rseq_slice_yield
>> +472	common	mkdirat2		sys_mkdirat2
>>   
>>   #
>>   # Due to a historical design error, certain syscalls are numbered differently
>> diff --git a/fs/internal.h b/fs/internal.h
>> index cbc384a1aa09..c6a79afadacf 100644
>> --- a/fs/internal.h
>> +++ b/fs/internal.h
>> @@ -59,6 +59,8 @@ int may_linkat(struct mnt_idmap *idmap, const struct path *link);
>>   int filename_renameat2(int olddfd, struct filename *oldname, int newdfd,
>>   		 struct filename *newname, unsigned int flags);
>>   int filename_mkdirat(int dfd, struct filename *name, umode_t mode);
>> +struct file *do_file_mkdirat(int dfd, struct filename *name, umode_t mode,
>> +		unsigned int flags, bool open);
>>   int filename_mknodat(int dfd, struct filename *name, umode_t mode, unsigned int dev);
>>   int filename_symlinkat(struct filename *from, int newdfd, struct filename *to);
>>   int filename_linkat(int olddfd, struct filename *old, int newdfd,
>> diff --git a/fs/namei.c b/fs/namei.c
>> index a880454a6415..6451e96dc225 100644
>> --- a/fs/namei.c
>> +++ b/fs/namei.c
>> @@ -5255,18 +5255,36 @@ struct dentry *vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
>>   }
>>   EXPORT_SYMBOL(vfs_mkdir);
>>   
>> -int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
>> +static int mkdirat_lookup_flags(unsigned int flags)
>> +{
>> +	int lookup_flags = LOOKUP_DIRECTORY;
>> +
>> +	if (!(flags & AT_SYMLINK_NOFOLLOW))
>> +		lookup_flags |= LOOKUP_FOLLOW;
>> +	if (!(flags & AT_NO_AUTOMOUNT))
>> +		lookup_flags |= LOOKUP_AUTOMOUNT;
>> +
>> +	return lookup_flags;
>> +}
>> +
>> +int filename_mkdirat(int dfd, struct filename *name, umode_t mode) {
>> +	return PTR_ERR_OR_ZERO(do_file_mkdirat(dfd, name, mode, 0, false));
>> +}
>> +
>> +struct file *do_file_mkdirat(int dfd, struct filename *name, umode_t mode,
>> +		unsigned int flags, bool open)
>>   {
>>   	struct dentry *dentry;
>>   	struct path path;
>>   	int error;
>> -	unsigned int lookup_flags = LOOKUP_DIRECTORY;
>> +	struct file *filp = NULL;
>> +	unsigned int lookup_flags = mkdirat_lookup_flags(flags);
>>   	struct delegated_inode delegated_inode = { };
>>   
>>   retry:
>>   	dentry = filename_create(dfd, name, &path, lookup_flags);
>>   	if (IS_ERR(dentry))
>> -		return PTR_ERR(dentry);
>> +		return ERR_CAST(dentry);
>>   
>>   	error = security_path_mkdir(&path, dentry,
>>   			mode_strip_umask(path.dentry->d_inode, mode));
>> @@ -5276,6 +5294,10 @@ int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
>>   		if (IS_ERR(dentry))
>>   			error = PTR_ERR(dentry);
>>   	}
>> +	if (open && !error && !is_delegated(&delegated_inode)) {
>> +		const struct path new_path = { .mnt = path.mnt, .dentry = dentry };
>> +		filp = dentry_open(&new_path, O_DIRECTORY, current_cred());
>> +	}
> 
> So this is definitely a patchset worth doing, but it will be hairy. And
> Mateusz is right: as written, this doesn't work. The canonical pattern,
> as e.g. dentry_open() does it, is to preallocate the file.
> 
> I do wonder though whether we shouldn't just make O_CREAT | O_DIRECTORY
> work. I remember making a vague comment about this a few years ago
> (cf. [1]). It might even be less hairy to get that one right, as all
> the thinking for O_CREAT is already there.
> 
> What was the rationale for mkdirat2() instead of threading this through
> openat()/openat2() with O_CREAT?
> 
> And side-question: @Jeff, can nfs atomic open deal with O_CREAT |
> O_DIRECTORY?

If it helps: the SMB2/3 protocol has only a single SMB2 Create operation,
which uses FILE_CREATE+FILE_NON_DIRECTORY_FILE or FILE_CREATE+FILE_DIRECTORY_FILE.

Given that openat() ignores unknown flags and flag combinations, maybe this
should be openat2()-only, perhaps even with a new flag (at least for the
userspace interface), or else do_sys_open() would have to reject it for
open() and openat().
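A minimal sketch of the asymmetry described here, assuming Linux 5.6 or later headers for <linux/openat2.h> and SYS_openat2: the legacy openat() path masks unknown bits away and succeeds, whereas openat2() rejects them with EINVAL. Treating bit 30 as an unused legacy flag is an assumption that holds for x86-64 and the generic flag layout, not necessarily for every architecture, and seccomp filters in some container runtimes may turn openat2() into ENOSYS or EPERM.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/openat2.h>   /* struct open_how */
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: bit 30 is not a defined open flag on this architecture. */
#define UNUSED_LEGACY_FLAG 0x40000000

/* Bit 40 lies outside the 32-bit open flag space, so openat2() rejects it. */
#define UNUSED_OPENAT2_FLAG (1ULL << 40)

/* Legacy path: unknown bits are stripped, so this still opens path. */
static int openat_with_unknown_flag(const char *path)
{
	int fd = openat(AT_FDCWD, path, O_RDONLY | O_DIRECTORY | UNUSED_LEGACY_FLAG);

	return fd < 0 ? -errno : fd;
}

/* openat2() may lack a glibc wrapper, so call it via syscall(). */
static int openat2_with_unknown_flag(const char *path)
{
	struct open_how how = {
		.flags = O_RDONLY | O_DIRECTORY | UNUSED_OPENAT2_FLAG,
	};
	long fd = syscall(SYS_openat2, AT_FDCWD, path, &how, sizeof(how));

	return fd < 0 ? -errno : (int)fd;
}
```

This is why a new directory-creating flag is safer as openat2()-only: an older kernel reached through openat() would silently ignore it and open an existing directory instead of creating one.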

While we're at it, an O_TMPDIR would also be wonderful to have.
Currently Samba works around its absence by using a hidden directory name,
which is invisible to SMB clients, but NFS and local users can see it.
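The file-side counterpart of what an O_TMPDIR would provide already exists: a file can be created unnamed with O_TMPFILE, prepared through its fd while nothing can see it, and then published under a name with linkat(), following the /proc/self/fd pattern documented in open(2). The sketch assumes /proc is mounted and that the filesystem supports O_TMPFILE; create_then_publish() is a hypothetical helper. There is no equivalent for directories today, which is the gap described above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create an unnamed file in dir, adjust its mode while it is still
 * invisible, then link it in as dir/name. Returns 0 or -errno.
 */
static int create_then_publish(const char *dir, const char *name)
{
	char proc_path[64];
	char final_path[4096];
	int fd, err = 0;

	fd = open(dir, O_TMPFILE | O_RDWR | O_CLOEXEC, 0600);
	if (fd < 0)
		return -errno;

	/* Metadata changes happen before anyone can find the file by name. */
	if (fchmod(fd, 0640) < 0) {
		err = -errno;
		goto out;
	}

	snprintf(proc_path, sizeof(proc_path), "/proc/self/fd/%d", fd);
	snprintf(final_path, sizeof(final_path), "%s/%s", dir, name);
	if (linkat(AT_FDCWD, proc_path, AT_FDCWD, final_path, AT_SYMLINK_FOLLOW) < 0)
		err = -errno;
out:
	close(fd);
	return err;
}
```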

metze

* Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Christian Brauner @ 2026-04-28  8:55 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: Jori Koolstra, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Alexander Viro, Arnd Bergmann,
	H . Peter Anvin, Jan Kara, Peter Zijlstra, Andrey Albershteyn,
	Masami Hiramatsu, Jiri Olsa, Thomas Weißschuh,
	Mathieu Desnoyers, Jeff Layton, Aleksa Sarai, cmirabil,
	Greg Kroah-Hartman, linux-kernel, linux-fsdevel, linux-api,
	linux-arch
In-Reply-To: <CAGudoHFLSHhDZoC6maLsn234dMQVnG4ZbpKXoVrueGujArNF-A@mail.gmail.com>

On Mon, Apr 27, 2026 at 06:30:42PM +0200, Mateusz Guzik wrote:
> On Mon, Apr 27, 2026 at 5:14 PM Christian Brauner <brauner@kernel.org> wrote:
> >
> > > Things proceed to handle_truncate:
> > >       int error = get_write_access(inode);
> > >       if (error)
> > >               return error;
> > >
> > >       error = security_file_truncate(filp);
> > >       if (!error) {
> > >               error = do_truncate(idmap, path->dentry, 0,
> > >                                   ATTR_MTIME|ATTR_CTIME|ATTR_OPEN,
> > >                                   filp);
> > >       }
> > >
> > > I'm going to ignore the LSM situation and do_truncate failure modes in this one.
> > >
> > > AFAICS nothing prevents the same user from racing against file creation to
> > > execve it, which starts with exe_file_deny_write_access. Should the
> > > other thread win the race, get_write_access will fail and the WARN_ON
> > > splat will be generated. That is definitely a problem.
> >
> > That can't happen:
> >
> > static inline int get_write_access(struct inode *inode)
> > {
> >         return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
> > }
> >
> > and the check is:
> >
> > error = handle_truncate(idmap, file);
> > if (unlikely(error > 0)) {
> >
> > This was a catch-all for a broken LSM hook or ->open() instance.
> >
> 
> So with this prog:
> #include <fcntl.h>
> 
> int main(void)
> {
>     open("test", O_TRUNC);
> }
> 
> I verified writecount is 0 on entry to handle_truncate like so:
> 
> bpftrace -e 'kprobe:security_file_truncate {
>     @[comm, (int64)((struct file *)arg0)->f_path.dentry->d_inode->i_writecount.counter] = count();
> }'
> 
> @[a.out, 1]: 1
> 
> i.e., get_write_access in handle_truncate transitioned the count 0 -> 1
> 
> but then what prevents the following race:
> 
> CPU0                    CPU1
> open("test")            execve("test")
>   handle_truncate         do_open_execat
>                             exe_file_deny_write_access # should succeed as count is 0?
>   get_write_access # should fail as the count is now -1?

I'm not arguing that get_write_access() cannot fail. I'm arguing that it
cannot trigger the WARN_ON(), as you claimed above, because get_write_access()
returns either 0 or -ETXTBSY, and the check only warns on error > 0.
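To make the disagreement concrete, here is a userspace model (C11 atomics, not kernel code) of the i_writecount protocol quoted above. model_get_write_access() mirrors the quoted get_write_access(); model_deny_write_access() is an assumed counterpart for the exec side, modeled on atomic_dec_unless_positive(). The model shows both points: with Mateusz's ordering the open side does fail, but only with -ETXTBSY, never with a positive value that would satisfy `error > 0`.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Model of inode->i_writecount: > 0 counts writers,
 * < 0 counts "deny write" holders such as a running executable.
 */
struct model_inode {
	atomic_int i_writecount;
};

/* Increment unless negative, like atomic_inc_unless_negative(). */
static bool inc_unless_negative(atomic_int *v)
{
	int old = atomic_load(v);

	do {
		if (old < 0)
			return false;
	} while (!atomic_compare_exchange_weak(v, &old, old + 1));
	return true;
}

/* Decrement unless positive, like atomic_dec_unless_positive(). */
static bool dec_unless_positive(atomic_int *v)
{
	int old = atomic_load(v);

	do {
		if (old > 0)
			return false;
	} while (!atomic_compare_exchange_weak(v, &old, old - 1));
	return true;
}

/* Mirrors the quoted get_write_access(): returns only 0 or -ETXTBSY. */
static int model_get_write_access(struct model_inode *inode)
{
	return inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
}

/* Assumed exec-side counterpart: also returns only 0 or -ETXTBSY. */
static int model_deny_write_access(struct model_inode *inode)
{
	return dec_unless_positive(&inode->i_writecount) ? 0 : -ETXTBSY;
}
```

If the exec side wins (0 -> -1), the later model_get_write_access() returns -ETXTBSY, which the open path propagates as an ordinary error rather than hitting the WARN_ON().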

* Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Jeff Layton @ 2026-04-28  7:01 UTC (permalink / raw)
  To: Christian Brauner, Jori Koolstra
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Alexander Viro, Arnd Bergmann, H . Peter Anvin,
	Jan Kara, Peter Zijlstra, Andrey Albershteyn, Masami Hiramatsu,
	Jiri Olsa, Thomas Weißschuh, Mathieu Desnoyers, Aleksa Sarai,
	cmirabil, Greg Kroah-Hartman, linux-kernel, linux-fsdevel,
	linux-api, linux-arch
In-Reply-To: <0028f90b0d06cdfcf6b306941fdc3dcbe6c6ab0d.camel@kernel.org>

On Tue, 2026-04-28 at 07:39 +0100, Jeff Layton wrote:
> On Mon, 2026-04-27 at 17:48 +0200, Christian Brauner wrote:
> > 
> > 
> > And side-question: @Jeff, can nfs atomic open deal with O_CREAT |
> > O_DIRECTORY?
> > 
> 
> No, it can't. OPEN calls only work on regular files. This is why
> O_DIRECTORY works on NFS. If we end up issuing an OPEN against a
> directory, it'll fail, which is what we want in that situation.

To be clear, we could make that work by sending a second RPC:

    PUTFH+OPEN+....   (OPEN fails with NFS4ERR_ISDIR)

...and then send:

    PUTFH+CREATE...

...for a directory (which is how mkdir works in v4). If the calls race
with something else being created in its place, we could just open it
if it's a directory, or fail.
-- 
Jeff Layton <jlayton@kernel.org>

* Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Jeff Layton @ 2026-04-28  6:39 UTC (permalink / raw)
  To: Christian Brauner, Jori Koolstra
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Alexander Viro, Arnd Bergmann, H . Peter Anvin,
	Jan Kara, Peter Zijlstra, Andrey Albershteyn, Masami Hiramatsu,
	Jiri Olsa, Thomas Weißschuh, Mathieu Desnoyers, Aleksa Sarai,
	cmirabil, Greg Kroah-Hartman, linux-kernel, linux-fsdevel,
	linux-api, linux-arch
In-Reply-To: <20260427-umlegen-aufbau-ee3a97f1528a@brauner>

On Mon, 2026-04-27 at 17:48 +0200, Christian Brauner wrote:
> 
> 
> And side-question: @Jeff, can nfs atomic open deal with O_CREAT |
> O_DIRECTORY?
> 

No, it can't. OPEN calls only work on regular files. This is why
O_DIRECTORY works on NFS. If we end up issuing an OPEN against a
directory, it'll fail, which is what we want in that situation.
-- 
Jeff Layton <jlayton@kernel.org>

* Re: [PATCH v11 08/15] xfs: Report case sensitivity in fileattr_get
From: Chuck Lever @ 2026-04-28  1:32 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
	linux-ext4, linux-xfs, linux-cifs, linux-nfs, linux-api,
	linux-f2fs-devel, OGAWA Hirofumi, Namjae Jeon, Sungjong Seo,
	Yuezhang Mo, almaz.alexandrovich, Viacheslav Dubeyko,
	John Paul Adrian Glaubitz, frank.li, Theodore Tso, adilger.kernel,
	Carlos Maiolino, Steve French, Paulo Alcantara, Ronnie Sahlberg,
	Shyam Prasad N, Trond Myklebust, Anna Schumaker, Jaegeuk Kim,
	Chao Yu, Hans de Goede, senozhatsky, Chuck Lever, Roland Mainz
In-Reply-To: <20260427155636.GC7751@frogsfrogsfrogs>



On Mon, Apr 27, 2026, at 11:56 AM, Darrick J. Wong wrote:
> On Fri, Apr 24, 2026 at 09:53:10PM -0400, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>> 
>> Upper layers such as NFSD need to query whether a filesystem
>> is case-sensitive. Add FS_XFLAG_CASEFOLD to xfs_ip2xflags()
>> when the filesystem is formatted with the ASCIICI feature
>> flag. This serves both FS_IOC_FSGETXATTR (via xfs_fill_fsxattr() in
>> xfs_fileattr_get()) and XFS_IOC_BULKSTAT (which populates bs_xflags
>> directly from xfs_ip2xflags()), so bulkstat consumers and per-inode
>> queries see a consistent view of the filesystem's case-folding
>> behavior.
>> 
>> XFS always preserves case. XFS is case-sensitive by default, but
>> supports ASCII case-insensitive lookups when formatted with the
>> ASCIICI feature flag.
>> 
>> Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---

> I don't understand this at all.  Yes, FS_XFLAG_CASEFOLD is readonly,
> but how does clearing FS_CASEFOLD_FL from the fileattr_get output
> (without clearing XFLAG_CASEFOLD!) solve anything?  This makes the
> reported output inconsistent between fsgetxattr and getflags -- one
> reports case folding, the other reports no casefolding.

The masking is a misplaced reaction to a sashiko review on the
v9 predecessor of this patch [1], which pointed out that v9 set
FS_XFLAG_CASEFOLD in fa->fsx_xflags after xfs_fill_fsxattr() had
already synced fa->flags, leaving the two views inconsistent in
the other direction, and that bulkstat would miss the flag for
the same reason. Moving the injection into xfs_ip2xflags() fixed
both gaps -- but it also surfaced FS_CASEFOLD_FL on the legacy
view, so chattr's RMW through FS_IOC_SETFLAGS hits the EOPNOTSUPP
gate at the top of xfs_fileattr_set(). Hiding it from getflags
was the wrong place to address that.

> If you want to avoid fileattr_set returning EINVAL when setting
> attributes due to the casefold flag, then don't you want to check
> the flag state vs. xfs_has_asciici() in the *fileattr_set* path?

Yep. For v12 I’ll drop the fa->flags mask and add FS_CASEFOLD_FL
to the allowlist in xfs_fileattr_set(), gated on xfs_has_asciici(mp).
xfs_flags2diflags() already has no clause for CASEFOLD, so the
FSSETXATTR path silently no-ops it the same way it does for
FS_XFLAG_HASATTR, and FS_XFLAG_CASEFOLD is in FS_XFLAG_RDONLY_MASK
so FSSETXATTR strips it centrally. Both views then agree, and a
chattr round-trip is accepted as a no-op.
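The v12 plan can be captured in a toy model. This is userspace C, not XFS code: toy_fileattr_set() stands in for the allowlist check at the top of xfs_fileattr_set(), the asciici argument stands in for xfs_has_asciici(mp), and the TOY_* values are copied from FS_IMMUTABLE_FL, FS_APPEND_FL and FS_CASEFOLD_FL in <linux/fs.h>. It shows a chattr-style read-modify-write round trip being accepted once the casefold bit is allowlisted on ASCIICI filesystems, and the bit staying a no-op because nothing stores it per inode.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as FS_IMMUTABLE_FL, FS_APPEND_FL, FS_CASEFOLD_FL. */
#define TOY_IMMUTABLE_FL 0x00000010u
#define TOY_APPEND_FL    0x00000020u
#define TOY_CASEFOLD_FL  0x40000000u

/* Toy inode: holds only the flags that are actually stored per inode. */
struct toy_inode {
	uint32_t stored_flags;
};

/* Toy "get": stored flags plus the filesystem-wide casefold bit. */
static uint32_t toy_fileattr_get(const struct toy_inode *ip, bool asciici)
{
	return ip->stored_flags | (asciici ? TOY_CASEFOLD_FL : 0);
}

/* Toy "set": reject bits outside the allowlist; never store casefold. */
static int toy_fileattr_set(struct toy_inode *ip, bool asciici, uint32_t flags)
{
	uint32_t allowed = TOY_IMMUTABLE_FL | TOY_APPEND_FL;

	if (asciici)
		allowed |= TOY_CASEFOLD_FL;
	if (flags & ~allowed)
		return -EOPNOTSUPP;

	/* No translation for casefold, as with xfs_flags2diflags(). */
	ip->stored_flags = flags & (TOY_IMMUTABLE_FL | TOY_APPEND_FL);
	return 0;
}
```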

The hfsplus patch in this series carries the same pattern --
FS_XFLAG_CASEFOLD is set after fileattr_fill_flags() so that
FS_CASEFOLD_FL stays out of fa->flags and dodges the EOPNOTSUPP
gate in hfsplus_fileattr_set(). I will fix it the same way.

Thanks for the catch!

[1] https://sashiko.dev/#/patchset/20260422-case-sensitivity-v9-0-be023cc070e2@oracle.com?part=9


-- 
Chuck Lever

* Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Aleksa Sarai @ 2026-04-28  1:14 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jori Koolstra, Jeff Layton, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, Alexander Viro,
	Arnd Bergmann, H . Peter Anvin, Jan Kara, Peter Zijlstra,
	Andrey Albershteyn, Masami Hiramatsu, Jiri Olsa,
	Thomas Weißschuh, Mathieu Desnoyers, cmirabil,
	Greg Kroah-Hartman, linux-kernel, linux-fsdevel, linux-api,
	linux-arch
In-Reply-To: <20260427-umlegen-aufbau-ee3a97f1528a@brauner>

On 2026-04-27, Christian Brauner <brauner@kernel.org> wrote:
> On Sun, Apr 12, 2026 at 03:54:33PM +0200, Jori Koolstra wrote:
> > +	if (open && !error && !is_delegated(&delegated_inode)) {
> > +		const struct path new_path = { .mnt = path.mnt, .dentry = dentry };
> > +		filp = dentry_open(&new_path, O_DIRECTORY, current_cred());
> > +	}
> 
> So this is definitely a patchset worth doing, but it will be hairy. And
> Mateusz is right: as written, this doesn't work. The canonical pattern,
> as e.g. dentry_open() does it, is to preallocate the file.
> 
> I do wonder though whether we shouldn't just make O_CREAT | O_DIRECTORY
> work. I remember making a vague comment about this a few years ago
> (cf. [1]). It might even be less hairy to get that one right, as all
> the thinking for O_CREAT is already there.

That would be my preference, as it would also allow us to use RESOLVE_*
flags nicely.
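For illustration, this is what the RESOLVE_* flags already do for opening existing directories through openat2(); routing directory creation through the same path would inherit them. The sketch assumes Linux 5.6 or later headers and calls the raw syscall. It creates nothing, and open_dir_beneath() is a hypothetical helper name. RESOLVE_BENEATH makes a path that escapes the starting directory fail with EXDEV.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/openat2.h>   /* struct open_how, RESOLVE_* */
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Open path as a directory relative to dfd, refusing to resolve
 * outside dfd and refusing symlinks in any component. Returns fd or -errno.
 */
static int open_dir_beneath(int dfd, const char *path)
{
	struct open_how how = {
		.flags = O_RDONLY | O_DIRECTORY | O_CLOEXEC,
		.resolve = RESOLVE_BENEATH | RESOLVE_NO_SYMLINKS,
	};
	long fd = syscall(SYS_openat2, dfd, path, &how, sizeof(how));

	return fd < 0 ? -errno : (int)fd;
}
```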

> What was the rationale for mkdirat2() instead of threading this through
> openat()/openat2() with O_CREAT?

Mateusz said that he didn't like the idea of having more branches in
the open() paths. I think that ship has long since sailed, tbh.

> And side-question: @Jeff, can nfs atomic open deal with O_CREAT |
> O_DIRECTORY?
> 
> [1]: 43b450632676 ("open: return EINVAL for O_DIRECTORY | O_CREAT")

-- 
Aleksa Sarai
https://www.cyphar.com/

* Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Mateusz Guzik @ 2026-04-27 16:30 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jori Koolstra, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Alexander Viro, Arnd Bergmann,
	H . Peter Anvin, Jan Kara, Peter Zijlstra, Andrey Albershteyn,
	Masami Hiramatsu, Jiri Olsa, Thomas Weißschuh,
	Mathieu Desnoyers, Jeff Layton, Aleksa Sarai, cmirabil,
	Greg Kroah-Hartman, linux-kernel, linux-fsdevel, linux-api,
	linux-arch
In-Reply-To: <20260427-rudel-gipsabdruck-a7884db4ecea@brauner>

On Mon, Apr 27, 2026 at 5:14 PM Christian Brauner <brauner@kernel.org> wrote:
>
> > Things proceed to handle_truncate:
> >       int error = get_write_access(inode);
> >       if (error)
> >               return error;
> >
> >       error = security_file_truncate(filp);
> >       if (!error) {
> >               error = do_truncate(idmap, path->dentry, 0,
> >                                   ATTR_MTIME|ATTR_CTIME|ATTR_OPEN,
> >                                   filp);
> >       }
> >
> > I'm going to ignore the LSM situation and do_truncate failure modes in this one.
> >
> > AFAICS nothing prevents the same user from racing against file creation to
> > execve it, which starts with exe_file_deny_write_access. Should the
> > other thread win the race, get_write_access will fail and the WARN_ON
> > splat will be generated. That is definitely a problem.
>
> That can't happen:
>
> static inline int get_write_access(struct inode *inode)
> {
>         return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
> }
>
> and the check is:
>
> error = handle_truncate(idmap, file);
> if (unlikely(error > 0)) {
>
> This was a catch-all for a broken LSM hook or ->open() instance.
>

So with this prog:
#include <fcntl.h>

int main(void)
{
    open("test", O_TRUNC);
}

I verified writecount is 0 on entry to handle_truncate like so:

bpftrace -e 'kprobe:security_file_truncate {
    @[comm, (int64)((struct file *)arg0)->f_path.dentry->d_inode->i_writecount.counter] = count();
}'

@[a.out, 1]: 1

i.e., get_write_access in handle_truncate transitioned the count 0 -> 1

but then what prevents the following race:

CPU0                    CPU1
open("test")            execve("test")
  handle_truncate         do_open_execat
                            exe_file_deny_write_access # should succeed as count is 0?
  get_write_access # should fail as the count is now -1?

^ permalink raw reply

* Re: [PATCH v11 08/15] xfs: Report case sensitivity in fileattr_get
From: Darrick J. Wong @ 2026-04-27 15:56 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Al Viro, Christian Brauner, Jan Kara, linux-fsdevel, linux-ext4,
	linux-xfs, linux-cifs, linux-nfs, linux-api, linux-f2fs-devel,
	hirofumi, linkinjeon, sj1557.seo, yuezhang.mo,
	almaz.alexandrovich, slava, glaubitz, frank.li, tytso,
	adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-8-de5619beddaf@oracle.com>

On Fri, Apr 24, 2026 at 09:53:10PM -0400, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> Upper layers such as NFSD need to query whether a filesystem
> is case-sensitive. Add FS_XFLAG_CASEFOLD to xfs_ip2xflags()
> when the filesystem is formatted with the ASCIICI feature
> flag. This serves both FS_IOC_FSGETXATTR (via xfs_fill_fsxattr() in
> xfs_fileattr_get()) and XFS_IOC_BULKSTAT (which populates bs_xflags
> directly from xfs_ip2xflags()), so bulkstat consumers and per-inode
> queries see a consistent view of the filesystem's case-folding
> behavior.
> 
> XFS always preserves case. XFS is case-sensitive by default, but
> supports ASCII case-insensitive lookups when formatted with the
> ASCIICI feature flag.
> 
> Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_inode_util.c | 2 ++
>  fs/xfs/xfs_ioctl.c             | 7 +++++++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_inode_util.c b/fs/xfs/libxfs/xfs_inode_util.c
> index 551fa51befb6..82be54b6f8d3 100644
> --- a/fs/xfs/libxfs/xfs_inode_util.c
> +++ b/fs/xfs/libxfs/xfs_inode_util.c
> @@ -130,6 +130,8 @@ xfs_ip2xflags(
>  
>  	if (xfs_inode_has_attr_fork(ip))
>  		flags |= FS_XFLAG_HASATTR;
> +	if (xfs_has_asciici(ip->i_mount))
> +		flags |= FS_XFLAG_CASEFOLD;
>  	return flags;
>  }
>  
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index ed9b4846c05f..5a58fb0bad2b 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -472,6 +472,13 @@ xfs_fill_fsxattr(
>  
>  	fileattr_fill_xflags(fa, xfs_ip2xflags(ip));
>  
> +	/*
> +	 * FS_XFLAG_CASEFOLD is read-only; hide it from the legacy
> +	 * flags view so chattr's RMW cycle does not pass it back to
> +	 * xfs_fileattr_set().
> +	 */
> +	fa->flags &= ~FS_CASEFOLD_FL;

I don't understand this at all.  Yes, FS_XFLAG_CASEFOLD is readonly, but
how does clearing FS_CASEFOLD_FL from the fileattr_get output (without
clearing XFLAG_CASEFOLD!) solve anything?  This makes the reported
output inconsistent between fsgetxattr and getflags -- one reports case
folding, the other reports no casefolding.

If you want to avoid fileattr_set returning EINVAL when setting
attributes due to the casefold flag, then don't you want to check the
flag state vs. xfs_has_asciici() in the *fileattr_set* path?

--D

> +
>  	if (ip->i_diflags & XFS_DIFLAG_EXTSIZE) {
>  		fa->fsx_extsize = XFS_FSB_TO_B(mp, ip->i_extsize);
>  	} else if (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) {
> 
> -- 
> 2.53.0
> 
> 
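Darrick's alternative can be sketched as a small model. The get side reports the flag whenever the filesystem was formatted with ASCIICI. The set side accepts the read-only bit only when it agrees with the filesystem state, so chattr's read-modify-write round trip passes unchanged, while a genuine attempt to toggle case folding fails with EINVAL. DEMO_XFLAG_CASEFOLD and the demo_* functions are hypothetical stand-ins, not XFS code, and the placeholder bit is not the real FS_XFLAG_CASEFOLD value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical placeholder for FS_XFLAG_CASEFOLD (real value not assumed). */
#define DEMO_XFLAG_CASEFOLD	(1u << 20)

/* get side: the flag mirrors a filesystem-wide property (xfs_has_asciici()). */
static unsigned int demo_get_xflags(bool fs_has_asciici, unsigned int other_xflags)
{
	unsigned int xflags = other_xflags & ~DEMO_XFLAG_CASEFOLD;

	if (fs_has_asciici)
		xflags |= DEMO_XFLAG_CASEFOLD;
	return xflags;
}

/*
 * set side: the read-only bit may be passed back only if it matches the
 * filesystem, so get -> modify other bits -> set round-trips cleanly.
 */
static int demo_check_set_xflags(bool fs_has_asciici, unsigned int requested)
{
	bool want_casefold = (requested & DEMO_XFLAG_CASEFOLD) != 0;

	if (want_casefold != fs_has_asciici)
		return -EINVAL;
	return 0;
}
```

With this split, both the xflags and the legacy flags views can keep reporting case folding consistently.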

^ permalink raw reply

* Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Christian Brauner @ 2026-04-27 15:48 UTC (permalink / raw)
  To: Jori Koolstra, Jeff Layton
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Alexander Viro, Arnd Bergmann, H . Peter Anvin,
	Jan Kara, Peter Zijlstra, Andrey Albershteyn, Masami Hiramatsu,
	Jiri Olsa, Thomas Weißschuh, Mathieu Desnoyers, Aleksa Sarai,
	cmirabil, Greg Kroah-Hartman, linux-kernel, linux-fsdevel,
	linux-api, linux-arch
In-Reply-To: <20260412135434.3095416-2-jkoolstra@xs4all.nl>

On Sun, Apr 12, 2026 at 03:54:33PM +0200, Jori Koolstra wrote:
> Currently there is no way to race-freely create and open a directory.
> For regular files we have open(O_CREAT) for creating a new file inode,
> and returning a pinning fd to it. The lack of such functionality for
> directories means that when populating a directory tree there's always
> a race involved: the inodes first need to be created, and then opened
> to adjust their permissions/ownership/labels/timestamps/acls/xattrs/...,
> but in the time window between the creation and the opening they might
> be replaced by something else.
> 
> Addressing this race without proper APIs is possible (by immediately
> fstat()ing what was opened, to verify that it has the right inode type),
> but difficult to get right. Hence, mkdirat2() that creates a directory
> and returns an O_DIRECTORY fd is useful.
> 
> This feature idea (and description) is taken from the UAPI group:
> https://github.com/uapi-group/kernel-features?tab=readme-ov-file#race-free-creation-and-opening-of-non-file-inodes
> 
> Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
> ---
>  arch/x86/entry/syscalls/syscall_64.tbl |  1 +
>  fs/internal.h                          |  2 ++
>  fs/namei.c                             | 44 +++++++++++++++++++++++---
>  include/linux/syscalls.h               |  2 ++
>  include/uapi/asm-generic/unistd.h      |  5 ++-
>  scripts/syscall.tbl                    |  1 +
>  6 files changed, 50 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index 524155d655da..e200ca2067a4 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -396,6 +396,7 @@
>  469	common	file_setattr		sys_file_setattr
>  470	common	listns			sys_listns
>  471	common	rseq_slice_yield	sys_rseq_slice_yield
> +472	common	mkdirat2		sys_mkdirat2
>  
>  #
>  # Due to a historical design error, certain syscalls are numbered differently
> diff --git a/fs/internal.h b/fs/internal.h
> index cbc384a1aa09..c6a79afadacf 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -59,6 +59,8 @@ int may_linkat(struct mnt_idmap *idmap, const struct path *link);
>  int filename_renameat2(int olddfd, struct filename *oldname, int newdfd,
>  		 struct filename *newname, unsigned int flags);
>  int filename_mkdirat(int dfd, struct filename *name, umode_t mode);
> +struct file *do_file_mkdirat(int dfd, struct filename *name, umode_t mode,
> +		unsigned int flags, bool open);
>  int filename_mknodat(int dfd, struct filename *name, umode_t mode, unsigned int dev);
>  int filename_symlinkat(struct filename *from, int newdfd, struct filename *to);
>  int filename_linkat(int olddfd, struct filename *old, int newdfd,
> diff --git a/fs/namei.c b/fs/namei.c
> index a880454a6415..6451e96dc225 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -5255,18 +5255,36 @@ struct dentry *vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
>  }
>  EXPORT_SYMBOL(vfs_mkdir);
>  
> -int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
> +static int mkdirat_lookup_flags(unsigned int flags)
> +{
> +	int lookup_flags = LOOKUP_DIRECTORY;
> +
> +	if (!(flags & AT_SYMLINK_NOFOLLOW))
> +		lookup_flags |= LOOKUP_FOLLOW;
> +	if (!(flags & AT_NO_AUTOMOUNT))
> +		lookup_flags |= LOOKUP_AUTOMOUNT;
> +
> +	return lookup_flags;
> +}
> +
> +int filename_mkdirat(int dfd, struct filename *name, umode_t mode) {
> +	return PTR_ERR_OR_ZERO(do_file_mkdirat(dfd, name, mode, 0, false));
> +}
> +
> +struct file *do_file_mkdirat(int dfd, struct filename *name, umode_t mode,
> +		unsigned int flags, bool open)
>  {
>  	struct dentry *dentry;
>  	struct path path;
>  	int error;
> -	unsigned int lookup_flags = LOOKUP_DIRECTORY;
> +	struct file *filp = NULL;
> +	unsigned int lookup_flags = mkdirat_lookup_flags(flags);
>  	struct delegated_inode delegated_inode = { };
>  
>  retry:
>  	dentry = filename_create(dfd, name, &path, lookup_flags);
>  	if (IS_ERR(dentry))
> -		return PTR_ERR(dentry);
> +		return ERR_CAST(dentry);
>  
>  	error = security_path_mkdir(&path, dentry,
>  			mode_strip_umask(path.dentry->d_inode, mode));
> @@ -5276,6 +5294,10 @@ int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
>  		if (IS_ERR(dentry))
>  			error = PTR_ERR(dentry);
>  	}
> +	if (open && !error && !is_delegated(&delegated_inode)) {
> +		const struct path new_path = { .mnt = path.mnt, .dentry = dentry };
> +		filp = dentry_open(&new_path, O_DIRECTORY, current_cred());
> +	}

So this is definitely a patchset worth doing, but it will be hairy. And
Mateusz is right: as written, this doesn't work. The canonical pattern,
e.g. how dentry_open() does it, is to preallocate the file.

I do wonder, though, whether we shouldn't just make O_CREAT | O_DIRECTORY
work. I remember leaving a vague comment about this a few years ago
(cf. [1]). It might even be less hairy to get that one right, as all
the thinking for O_CREAT is already there.

What was the rationale for mkdirat2() instead of threading this through
openat()/openat2() with O_CREAT?

And side-question: @Jeff, can nfs atomic open deal with O_CREAT |
O_DIRECTORY?

[1]: 43b450632676 ("open: return EINVAL for O_DIRECTORY | O_CREAT")
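For comparison, the workaround with existing APIs that the commit message describes (create, open, then fstat() the result) can be sketched with mkdirat() and openat(). The helper name mkdir_then_open() is hypothetical. It narrows the window but cannot prove that the opened directory is the one just created, which is the gap a mkdirat2() or a working O_CREAT | O_DIRECTORY would close.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a directory, then open it. O_DIRECTORY fails with ENOTDIR if the
 * name was replaced by a non-directory, and O_NOFOLLOW refuses a swapped-in
 * symlink. Another directory renamed into place in between would still be
 * accepted, so this remains racy.
 */
static int mkdir_then_open(int dfd, const char *name, mode_t mode)
{
	struct stat st;
	int fd;

	if (mkdirat(dfd, name, mode) < 0)
		return -1;

	fd = openat(dfd, name, O_RDONLY | O_DIRECTORY | O_NOFOLLOW | O_CLOEXEC);
	if (fd < 0)
		return -1;

	/* Belt and braces: verify the inode type of what was opened. */
	if (fstat(fd, &st) < 0) {
		int saved = errno;

		close(fd);
		errno = saved;
		return -1;
	}
	if (!S_ISDIR(st.st_mode)) {
		close(fd);
		errno = ENOTDIR;
		return -1;
	}
	return fd;
}
```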

^ permalink raw reply

* Re: [PATCH v11 00/15] Exposing case folding behavior
From: Jan Kara @ 2026-04-27 15:30 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel,
	linux-ext4, linux-xfs, linux-cifs, linux-nfs, linux-api,
	linux-f2fs-devel, OGAWA Hirofumi, Namjae Jeon, Sungjong Seo,
	Yuezhang Mo, almaz.alexandrovich, Viacheslav Dubeyko,
	John Paul Adrian Glaubitz, frank.li, Theodore Tso, adilger.kernel,
	Carlos Maiolino, Steve French, Paulo Alcantara, Ronnie Sahlberg,
	Shyam Prasad N, Trond Myklebust, Anna Schumaker, Jaegeuk Kim,
	Chao Yu, Hans de Goede, senozhatsky, Chuck Lever, Darrick J. Wong,
	Roland Mainz, Steve French
In-Reply-To: <af3f7518-7501-4c25-9bbc-a8fc8cdb4e29@app.fastmail.com>

On Mon 27-04-26 09:30:28, Chuck Lever wrote:
> 
> On Mon, Apr 27, 2026, at 6:55 AM, Jan Kara wrote:
> > On Fri 24-04-26 21:53:02, Chuck Lever wrote:
> >> Changes since v10:
> >> - cifs: Source case-handling flags from the server's cached
> >>   FS_ATTRIBUTE_INFORMATION reply instead of the nocase mount
> >>   option, with a nocase fallback when the reply is absent
> >> - Address findings from sashiko(gemini-3) and gpt-5.5:
> >>   - nfs: Skip pathconf case bits on NFSv4 (set via FATTR4_CASE_*
> >>     instead)
> >>   - xfs: Hide FS_CASEFOLD_FL from the legacy flags view so
> >>     chattr round-trips do not hit the setflags whitelist
> >>   - ext4, f2fs: Drop redundant fileattr_get patches; the
> >>     FS_CASEFOLD_FL translation in fileattr_fill_flags() already
> >>     reports FS_XFLAG_CASEFOLD for casefolded directories
> >
> > Err, how is this supposed to work? I wasn't able to find any code
> > transforming S_CASEFOLD inode flag into FS_CASEFOLD_FL on fileattr_get
> > path. Sure, fileattr_fill_flags() takes care of setting FS_XFLAG_CASEFOLD
> > once FS_CASEFOLD_FL is set. What am I missing?
> 
> Agreed, that is a little surprising.
> 
> The path does not go through S_CASEFOLD.  Both filesystems
> report FS_CASEFOLD_FL straight from their on-disk flag word.
> 
> For ext4, EXT4_CASEFOLD_FL is 0x40000000, the same bit value
> as FS_CASEFOLD_FL, and it is included in EXT4_FL_USER_VISIBLE.
> ext4_iget() loads it into ei->i_flags directly from
> raw_inode->i_flags (fs/ext4/inode.c:5358). ext4_fileattr_get()
> then masks with EXT4_FL_USER_VISIBLE and hands the result to
> fileattr_fill_flags(), which translates FS_CASEFOLD_FL into
> FS_XFLAG_CASEFOLD on the way out.

Oh, right. I missed how EXT4_CASEFOLD_FL propagates into FS_CASEFOLD_FL.
Thanks for clearing that up.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply

* Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Christian Brauner @ 2026-04-27 15:14 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: Jori Koolstra, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Alexander Viro, Arnd Bergmann,
	H . Peter Anvin, Jan Kara, Peter Zijlstra, Andrey Albershteyn,
	Masami Hiramatsu, Jiri Olsa, Thomas Weißschuh,
	Mathieu Desnoyers, Jeff Layton, Aleksa Sarai, cmirabil,
	Greg Kroah-Hartman, linux-kernel, linux-fsdevel, linux-api,
	linux-arch
In-Reply-To: <5xexygc3rvvlir4smdfn7gndwjgbuijqfummwwumivsnosijux@ygqs3iqxmovh>

> Things proceed to handle_truncate:
> 	int error = get_write_access(inode);
> 	if (error)
> 		return error;
> 
> 	error = security_file_truncate(filp);
> 	if (!error) {
> 		error = do_truncate(idmap, path->dentry, 0,
> 				    ATTR_MTIME|ATTR_CTIME|ATTR_OPEN,
> 				    filp);
> 	}
> 
> I'm going to ignore the LSM situation and do_truncate failure modes in this one.
> 
> AFAICS nothing prevents the same user from racing against file creation to
> execve it, which starts with exe_file_deny_write_access. Should the
> other thread win the race, get_write_access will fail and the WARN_ON
> splat will be generated. That is definitely a problem.

That can't happen:

static inline int get_write_access(struct inode *inode)
{
        return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
}

and the check is:

error = handle_truncate(idmap, file);
if (unlikely(error > 0)) {

This was a catch all for broken LSM hook or ->open() instance.


^ permalink raw reply

* Re: [PATCH v6 1/4] openat2: new OPENAT2_REGULAR flag support
From: Christian Brauner @ 2026-04-27 14:29 UTC (permalink / raw)
  To: Dorjoy Chowdhury, Florian Weimer
  Cc: Alejandro Colomar, linux-fsdevel, linux-kernel, linux-api,
	ceph-devel, gfs2, linux-nfs, linux-cifs, v9fs, linux-kselftest,
	viro, jack, jlayton, chuck.lever, alex.aring, arnd, adilger,
	mjguzik, smfrench, richard.henderson, mattst88, linmag7, tsbogend,
	James.Bottomley, deller, davem, andreas, idryomov, amarkuze,
	slava, agruenba, trondmy, anna, sfrench, pc, ronniesahlberg,
	sprasad, tom, bharathsm, shuah, miklos, hansg
In-Reply-To: <CAFfO_h5B=Ox9S=Xc=az2vQwowffohch-mkvSggYAfNXaVuv5GA@mail.gmail.com>

On Mon, Apr 27, 2026 at 08:17:43PM +0600, Dorjoy Chowdhury wrote:
> On Mon, Apr 27, 2026 at 7:28 PM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * Dorjoy Chowdhury:
> >
> > > diff --git a/include/uapi/asm-generic/errno.h b/include/uapi/asm-generic/errno.h
> > > index 92e7ae493ee3..bd78e69e0a43 100644
> > > --- a/include/uapi/asm-generic/errno.h
> > > +++ b/include/uapi/asm-generic/errno.h
> > > @@ -122,4 +122,6 @@
> > >
> > >  #define EHWPOISON    133     /* Memory page has hardware error */
> > >
> > > +#define EFTYPE               134     /* Wrong file type for the intended operation */
> > > +
> > >  #endif
> >
> > This is what POSIX says about EFTYPE, in the Rationale for System
> > Interfaces:
> >
> > “
> > [EFTYPE]
> >     This error code was proposed in earlier proposals as "Inappropriate
> >     operation for file type", meaning that the operation requested is
> >     not appropriate for the file specified in the function call. This
> >     code was proposed, although the same idea was covered by [ENOTTY],
> >     because the connotations of the name would be misleading. It was
> >     pointed out that the fcntl() function uses the error code [EINVAL]
> >     for this notion, and hence all instances of [EFTYPE] were changed to
> >     this code.
> > ”
> >
> > So I'm not sure if reusing this name is a good idea.
> >
> 
> Thanks for pointing this out. I had started out the patch series with
> ENOTREGULAR and it was discussed that EFTYPE was a better and more
> generic error code which is also used in BSD systems like FreeBSD[1]
> and MacOS[2]. I also agree that EFTYPE makes sense. We can of course
> change to something else if everyone agrees.
> 
> cc Christian Brauner who originally suggested EFTYPE for input on this.
> 
> [1]: https://man.freebsd.org/cgi/man.cgi?errno(2)
> [2]: https://developer.apple.com/documentation/foundation/posixerror/eftype

Given that both the BSDs and macOS already use it, is there a good
reason to return ENOTTY for this, other than a standard we ignore most
of the time anyway? I'm honestly asking.

^ permalink raw reply

* Re: [PATCH v6 1/4] openat2: new OPENAT2_REGULAR flag support
From: Dorjoy Chowdhury @ 2026-04-27 14:17 UTC (permalink / raw)
  To: Florian Weimer, brauner, Alejandro Colomar
  Cc: linux-fsdevel, linux-kernel, linux-api, ceph-devel, gfs2,
	linux-nfs, linux-cifs, v9fs, linux-kselftest, viro, jack, jlayton,
	chuck.lever, alex.aring, arnd, adilger, mjguzik, smfrench,
	richard.henderson, mattst88, linmag7, tsbogend, James.Bottomley,
	deller, davem, andreas, idryomov, amarkuze, slava, agruenba,
	trondmy, anna, sfrench, pc, ronniesahlberg, sprasad, tom,
	bharathsm, shuah, miklos, hansg
In-Reply-To: <lhuzf2oy1me.fsf@oldenburg.str.redhat.com>

On Mon, Apr 27, 2026 at 7:28 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Dorjoy Chowdhury:
>
> > diff --git a/include/uapi/asm-generic/errno.h b/include/uapi/asm-generic/errno.h
> > index 92e7ae493ee3..bd78e69e0a43 100644
> > --- a/include/uapi/asm-generic/errno.h
> > +++ b/include/uapi/asm-generic/errno.h
> > @@ -122,4 +122,6 @@
> >
> >  #define EHWPOISON    133     /* Memory page has hardware error */
> >
> > +#define EFTYPE               134     /* Wrong file type for the intended operation */
> > +
> >  #endif
>
> This is what POSIX says about EFTYPE, in the Rationale for System
> Interfaces:
>
> “
> [EFTYPE]
>     This error code was proposed in earlier proposals as "Inappropriate
>     operation for file type", meaning that the operation requested is
>     not appropriate for the file specified in the function call. This
>     code was proposed, although the same idea was covered by [ENOTTY],
>     because the connotations of the name would be misleading. It was
>     pointed out that the fcntl() function uses the error code [EINVAL]
>     for this notion, and hence all instances of [EFTYPE] were changed to
>     this code.
> ”
>
> So I'm not sure if reusing this name is a good idea.
>

Thanks for pointing this out. I had started out the patch series with
ENOTREGULAR and it was discussed that EFTYPE was a better and more
generic error code which is also used in BSD systems like FreeBSD[1]
and MacOS[2]. I also agree that EFTYPE makes sense. We can of course
change to something else if everyone agrees.

cc Christian Brauner who originally suggested EFTYPE for input on this.

[1]: https://man.freebsd.org/cgi/man.cgi?errno(2)
[2]: https://developer.apple.com/documentation/foundation/posixerror/eftype

Regards,
Dorjoy

^ permalink raw reply

* Re: [PATCH v2 1/2] man/man3/errno.3: Document EFTYPE error code
From: Alejandro Colomar @ 2026-04-27 13:33 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Dorjoy Chowdhury, linux-man, brauner, jlayton, libc-alpha,
	linux-api
In-Reply-To: <lhuv7dcy1j9.fsf@oldenburg.str.redhat.com>


Hi Florian,

On 2026-04-27T15:29:30+0200, Florian Weimer wrote:
> * Alejandro Colomar:
> 
> > Hi Florian,
> >
> > On 2026-04-27T12:34:30+0200, Florian Weimer wrote:
> >> * Alejandro Colomar:
> >> 
> >> > [CC += libc-alpha]
> >> >
> >> > Hi Dorjoy,
> >> >
> >> > On 2026-04-26T17:14:25+0600, Dorjoy Chowdhury wrote:
> >> >> Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
> >> >
> >> > Thanks!
> >> >
> >> > 	Reviewed-by: Alejandro Colomar <alx@kernel.org>
> >> >
> >> > I will wait until glibc adds this error code to their <errno.h> before
> >> > applying the patch.  This means either you should write and send a patch
> >> > to glibc (if so, please CC me), or you should ask them to add it
> >> > themselves (if you're not comfortable writing glibc code).
> >> 
> >> I'm not sure where this is coming from.
> >
> > Here's a link to the thread:
> > <https://lore.kernel.org/linux-man/20260426111707.36541-1-dorjoychy111@gmail.com/T/>
> >
> >> POSIX says EFTYPE was rejected
> >> in favor of ENOTTY.
> >
> > Could you please share a link to that?
> >
> > Anyway, I guess ENOTTY would be inappropriate in this case.  Although
> > maybe a better error code could be devised; I don't know.  This is why
> > I wanted glibc involved in this discussion before this lands in a
> > Linux release.  Thanks for the quick feedback!
> 
> It's in the Rationale for System Interfaces:
> 
> “
> [EFTYPE]
>     This error code was proposed in earlier proposals as "Inappropriate
>     operation for file type", meaning that the operation requested is
>     not appropriate for the file specified in the function call. This
>     code was proposed, although the same idea was covered by [ENOTTY],
>     because the connotations of the name would be misleading. It was
>     pointed out that the fcntl() function uses the error code [EINVAL]
>     for this notion, and hence all instances of [EFTYPE] were changed to
>     this code.
> ”
> 
> I replied on linux-fsdevel, too.

Thanks!

> 
> (It would be nice to submit patches introducing new error codes to
> linux-api with a subject mentioning the error code.)

Thanks!  I'll remember this advice for when receiving patches that add
error codes.

> 
> Thanks,
> Florian

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es>


^ permalink raw reply

* Re: [PATCH v11 00/15] Exposing case folding behavior
From: Chuck Lever @ 2026-04-27 13:30 UTC (permalink / raw)
  To: Jan Kara
  Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-ext4,
	linux-xfs, linux-cifs, linux-nfs, linux-api, linux-f2fs-devel,
	OGAWA Hirofumi, Namjae Jeon, Sungjong Seo, Yuezhang Mo,
	almaz.alexandrovich, Viacheslav Dubeyko,
	John Paul Adrian Glaubitz, frank.li, Theodore Tso, adilger.kernel,
	Carlos Maiolino, Steve French, Paulo Alcantara, Ronnie Sahlberg,
	Shyam Prasad N, Trond Myklebust, Anna Schumaker, Jaegeuk Kim,
	Chao Yu, Hans de Goede, senozhatsky, Chuck Lever, Darrick J. Wong,
	Roland Mainz, Steve French
In-Reply-To: <yc7ygk6w6zvf46arzzvmxnuoqjrni2dtlhmywaivzmvfxnilf3@xv7tthtrowns>


On Mon, Apr 27, 2026, at 6:55 AM, Jan Kara wrote:
> On Fri 24-04-26 21:53:02, Chuck Lever wrote:
>> Changes since v10:
>> - cifs: Source case-handling flags from the server's cached
>>   FS_ATTRIBUTE_INFORMATION reply instead of the nocase mount
>>   option, with a nocase fallback when the reply is absent
>> - Address findings from sashiko(gemini-3) and gpt-5.5:
>>   - nfs: Skip pathconf case bits on NFSv4 (set via FATTR4_CASE_*
>>     instead)
>>   - xfs: Hide FS_CASEFOLD_FL from the legacy flags view so
>>     chattr round-trips do not hit the setflags whitelist
>>   - ext4, f2fs: Drop redundant fileattr_get patches; the
>>     FS_CASEFOLD_FL translation in fileattr_fill_flags() already
>>     reports FS_XFLAG_CASEFOLD for casefolded directories
>
> Err, how is this supposed to work? I wasn't able to find any code
> transforming S_CASEFOLD inode flag into FS_CASEFOLD_FL on fileattr_get
> path. Sure, fileattr_fill_flags() takes care of setting FS_XFLAG_CASEFOLD
> once FS_CASEFOLD_FL is set. What am I missing?

Agreed, that is a little surprising.

The path does not go through S_CASEFOLD.  Both filesystems
report FS_CASEFOLD_FL straight from their on-disk flag word.

For ext4, EXT4_CASEFOLD_FL is 0x40000000, the same bit value
as FS_CASEFOLD_FL, and it is included in EXT4_FL_USER_VISIBLE.
ext4_iget() loads it into ei->i_flags directly from
raw_inode->i_flags (fs/ext4/inode.c:5358). ext4_fileattr_get()
then masks with EXT4_FL_USER_VISIBLE and hands the result to
fileattr_fill_flags(), which translates FS_CASEFOLD_FL into
FS_XFLAG_CASEFOLD on the way out.

For f2fs, f2fs_fileattr_get() runs fi->i_flags through
f2fs_iflags_to_fsflags(), whose mapping table has an explicit
{ F2FS_CASEFOLD_FL, FS_CASEFOLD_FL } entry (fs/f2fs/file.c:2205).
F2FS_GETTABLE_FS_FL includes FS_CASEFOLD_FL, so
fileattr_fill_flags() again lights up FS_XFLAG_CASEFOLD.

S_CASEFOLD is a separate VFS-level cache that
ext4_set_inode_flags() and f2fs's equivalent set at iget
time; nothing on the fileattr_get path consults it.
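The two translation paths described above can be modeled in a few lines. FS_CASEFOLD_FL (0x40000000) is the real uapi value, and ext4's bit equals it per the explanation above. The visibility mask, the f2fs in-memory bit, and the FS_XFLAG_CASEFOLD placeholder are hypothetical. They are chosen only to show that ext4 relies on identical bit values while f2fs relies on an explicit mapping table, and that both then reach the generic step modeled on fileattr_fill_flags().

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_FS_CASEFOLD_FL	0x40000000u	/* FS_CASEFOLD_FL, <linux/fs.h> */
#define DEMO_EXT4_CASEFOLD_FL	0x40000000u	/* same value, per the thread */
/* Hypothetical mask; the real EXT4_FL_USER_VISIBLE differs. */
#define DEMO_EXT4_FL_USER_VISIBLE (DEMO_EXT4_CASEFOLD_FL | 0x000000ffu)
/* Hypothetical f2fs bit, deliberately different so the table matters. */
#define DEMO_F2FS_CASEFOLD_FL	0x00010000u
/* Hypothetical placeholder for FS_XFLAG_CASEFOLD. */
#define DEMO_XFLAG_CASEFOLD	(1u << 20)

struct demo_fileattr {
	unsigned int flags;	/* FS_IOC_GETFLAGS view */
	unsigned int xflags;	/* FS_IOC_FSGETXATTR view */
};

/* Generic step, modeled on fileattr_fill_flags(). */
static void demo_fill_flags(struct demo_fileattr *fa, unsigned int flags)
{
	fa->flags = flags;
	fa->xflags = 0;
	if (flags & DEMO_FS_CASEFOLD_FL)
		fa->xflags |= DEMO_XFLAG_CASEFOLD;
}

/* ext4: bits already match FS_*_FL, so masking is the only step. */
static void demo_ext4_fileattr_get(unsigned int i_flags, struct demo_fileattr *fa)
{
	demo_fill_flags(fa, i_flags & DEMO_EXT4_FL_USER_VISIBLE);
}

/* f2fs: an explicit table maps in-memory bits to FS_*_FL bits. */
static const struct {
	unsigned int iflag;
	unsigned int fsflag;
} demo_f2fs_map[] = {
	{ DEMO_F2FS_CASEFOLD_FL, DEMO_FS_CASEFOLD_FL },
};

static void demo_f2fs_fileattr_get(unsigned int i_flags, struct demo_fileattr *fa)
{
	unsigned int fsflags = 0;
	size_t i;

	for (i = 0; i < sizeof(demo_f2fs_map) / sizeof(demo_f2fs_map[0]); i++)
		if (i_flags & demo_f2fs_map[i].iflag)
			fsflags |= demo_f2fs_map[i].fsflag;
	demo_fill_flags(fa, fsflags);
}
```

In neither path is a VFS-level S_CASEFOLD consulted; the xflag follows from the flag word alone.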

For reference, the original observation about the manual
assignment being redundant came from sashiko's review of v10:

  https://sashiko.dev/#/patchset/20260423-case-sensitivity-v10-0-c385d674a6cf%40oracle.com?part=8
  https://sashiko.dev/#/patchset/20260423-case-sensitivity-v10-0-c385d674a6cf%40oracle.com?part=12


-- 
Chuck Lever

^ permalink raw reply

* Re: [PATCH v11 12/15] isofs: Implement fileattr_get for case sensitivity
From: Jan Kara @ 2026-04-27 13:30 UTC (permalink / raw)
  To: Lionel Cons
  Cc: Jan Kara, Chuck Lever, Al Viro, Christian Brauner, linux-fsdevel,
	linux-ext4, linux-xfs, linux-cifs, linux-nfs, linux-api,
	linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo, yuezhang.mo,
	almaz.alexandrovich, slava, glaubitz, frank.li, tytso,
	adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Roland Mainz
In-Reply-To: <CAPJSo4WmRu_64TxBsaimWOqz3VAU0TZ1H-_hw36HSqzQULm39w@mail.gmail.com>

On Mon 27-04-26 14:02:00, Lionel Cons wrote:
> On Mon, 27 Apr 2026 at 12:47, Jan Kara <jack@suse.cz> wrote:
> >
> > On Fri 24-04-26 21:53:14, Chuck Lever wrote:
> > > From: Chuck Lever <chuck.lever@oracle.com>
> > >
> > > Upper layers such as NFSD need a way to query whether a
> > > filesystem handles filenames in a case-sensitive manner so
> > > they can provide correct semantics to remote clients. Without
> > > this information, NFS exports of ISO 9660 filesystems cannot
> > > advertise their filename case behavior.
> > >
> > > Implement isofs_fileattr_get() to report ISO 9660 case handling
> > > behavior via the FS_XFLAG_CASEFOLD flag. The 'check=r' (relaxed)
> > > mount option enables case-insensitive lookups, and this setting
> > > determines the value reported. By default, Joliet extensions
> > > operate in relaxed mode while plain ISO 9660 uses strict
> > > (case-sensitive) mode. All ISO 9660 variants are case-preserving,
> > > meaning filenames are stored exactly as they appear on the disc.
> > >
> > > Case handling is a superblock-wide property, so the callback
> > > must report the same value for every inode type. Regular files
> > > previously had no inode_operations; introduce
> > > isofs_file_inode_operations to carry the callback. Symlinks
> > > previously shared page_symlink_inode_operations; introduce
> > > isofs_symlink_inode_operations, which wires page_get_link
> > > alongside the callback, so that fileattr queries on a symlink
> > > reach the isofs implementation instead of returning
> > > -ENOIOCTLCMD. The flag is set in both fa->fsx_xflags and
> > > fa->flags so FS_IOC_FSGETXATTR and FS_IOC_GETFLAGS agree.
> > >
> > > Reviewed-by: Jan Kara <jack@suse.cz>
> > > Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
> > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >
> > ...
> >
> > > @@ -281,6 +293,18 @@ const struct file_operations isofs_dir_operations =
> > >  const struct inode_operations isofs_dir_inode_operations =
> > >  {
> > >       .lookup = isofs_lookup,
> > > +     .fileattr_get = isofs_fileattr_get,
> > > +};
> > > +
> > > +const struct inode_operations isofs_file_inode_operations =
> > > +{
> > > +     .fileattr_get = isofs_fileattr_get,
> > > +};
> > > +
> > > +const struct inode_operations isofs_symlink_inode_operations =
> > > +{
> > > +     .get_link = page_get_link,
> > > +     .fileattr_get = isofs_fileattr_get,
> > >  };
> >
> > Hum, I thought casefolding is a directory attribute. At least I don't see
> > a big point in reporting it for regular files or symlinks (and then why not
> > report it for device nodes or named pipes?). So why did you decide for this
> > change?
> 
> Where do you see this being a per-directory attribute in
> https://web.archive.org/web/20170404043745/http://www.ymi.com/ymi/sites/default/files/pdf/Rockridge.pdf

I wasn't referring to the Rock Ridge standard but rather to how the VFS
generally tracks (and reports) casefolding.
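The difference between the patch's policy and a directory-only report can be sketched as follows. struct demo_sb and its check_relaxed field (standing in for the check=r mount option), the file-type enum, and the placeholder xflag value are all hypothetical. Only FS_CASEFOLD_FL's value comes from <linux/fs.h>.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_FS_CASEFOLD_FL	0x40000000u	/* FS_CASEFOLD_FL, <linux/fs.h> */
#define DEMO_XFLAG_CASEFOLD	(1u << 20)	/* hypothetical FS_XFLAG_CASEFOLD */

enum demo_ftype { DEMO_DIR, DEMO_REG, DEMO_LNK, DEMO_FIFO };

/* Hypothetical superblock state: true when mounted with check=r. */
struct demo_sb {
	bool check_relaxed;
};

struct demo_fileattr {
	unsigned int flags;	/* FS_IOC_GETFLAGS view */
	unsigned int xflags;	/* FS_IOC_FSGETXATTR view */
};

/* Set or clear the flag in both views so the two ioctls agree. */
static void demo_set_casefold(struct demo_fileattr *fa, bool casefold)
{
	fa->flags = casefold ? DEMO_FS_CASEFOLD_FL : 0;
	fa->xflags = casefold ? DEMO_XFLAG_CASEFOLD : 0;
}

/* The patch's policy: one superblock-wide answer for every inode type. */
static void demo_fileattr_get_sb_wide(const struct demo_sb *sb,
				      enum demo_ftype type,
				      struct demo_fileattr *fa)
{
	(void)type;	/* the inode type is deliberately ignored */
	demo_set_casefold(fa, sb->check_relaxed);
}

/* Directory-only variant, closer to how the VFS tracks casefolding. */
static void demo_fileattr_get_dir_only(const struct demo_sb *sb,
				       enum demo_ftype type,
				       struct demo_fileattr *fa)
{
	demo_set_casefold(fa, type == DEMO_DIR && sb->check_relaxed);
}
```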

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply

* Re: [PATCH v2 1/2] man/man3/errno.3: Document EFTYPE error code
From: Florian Weimer @ 2026-04-27 13:29 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Dorjoy Chowdhury, linux-man, brauner, jlayton, libc-alpha,
	linux-api
In-Reply-To: <ae9gDtEo6OxHTYBt@devuan>

* Alejandro Colomar:

> Hi Florian,
>
> On 2026-04-27T12:34:30+0200, Florian Weimer wrote:
>> * Alejandro Colomar:
>> 
>> > [CC += libc-alpha]
>> >
>> > Hi Dorjoy,
>> >
>> > On 2026-04-26T17:14:25+0600, Dorjoy Chowdhury wrote:
>> >> Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
>> >
>> > Thanks!
>> >
>> > 	Reviewed-by: Alejandro Colomar <alx@kernel.org>
>> >
>> > I will wait until glibc adds this error code to their <errno.h> before
>> > applying the patch.  This means either you should write and send a patch
>> > to glibc (if so, please CC me), or you should ask them to add it
>> > themselves (if you're not comfortable writing glibc code).
>> 
>> I'm not sure where this is coming from.
>
> Here's a link to the thread:
> <https://lore.kernel.org/linux-man/20260426111707.36541-1-dorjoychy111@gmail.com/T/>
>
>> POSIX says EFTYPE was rejected
>> in favor of ENOTTY.
>
> Could you please share a link to that?
>
> Anyway, I guess ENOTTY would be inappropriate in this case.  Although
> maybe a better error code could be devised; I don't know.  This is why
> I wanted glibc involved in this discussion before this lands in a
> Linux release.  Thanks for the quick feedback!

It's in the Rationale for System Interfaces:

“
[EFTYPE]
    This error code was proposed in earlier proposals as "Inappropriate
    operation for file type", meaning that the operation requested is
    not appropriate for the file specified in the function call. This
    code was proposed, although the same idea was covered by [ENOTTY],
    because the connotations of the name would be misleading. It was
    pointed out that the fcntl() function uses the error code [EINVAL]
    for this notion, and hence all instances of [EFTYPE] were changed to
    this code.
”

I replied on linux-fsdevel, too.

(It would be nice to submit patches introducing new error codes to
linux-api with a subject mentioning the error code.)

Thanks,
Florian


^ permalink raw reply

