Linux EXT4 FS development
* [PATCH v14 12/15] isofs: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-05-07  8:53 UTC
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Roland Mainz
In-Reply-To: <20260507-case-sensitivity-v14-0-e62cc8200435@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

Upper layers such as NFSD need a way to query whether a
filesystem handles filenames in a case-sensitive manner so
they can provide correct semantics to remote clients. Without
this information, NFS exports of ISO 9660 filesystems cannot
advertise their filename case behavior.

Implement isofs_fileattr_get() to report ISO 9660 case handling
behavior. The 'check=r' (relaxed) mount option enables
case-insensitive lookups and is reported via FS_XFLAG_CASEFOLD.
By default, Joliet extensions operate in relaxed mode while
plain ISO 9660 uses strict (case-sensitive) mode.

Plain ISO 9660 names on the medium are uppercase. When neither
Rock Ridge nor Joliet is in effect, the default 'map=n' option
(and 'map=a') routes lookup and readdir through
isofs_name_translate(), which forces A-Z to a-z. The names
visible to userspace then differ in case from the on-disc form,
so report FS_XFLAG_CASENONPRESERVING in that configuration. Rock
Ridge and Joliet both deliver names as authored, and 'map=o'
emits the raw on-disc name unchanged, so those configurations
remain case-preserving.
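
The reporting rules described above can be modeled as a small pure
function. This is a minimal userspace sketch, not the kernel code:
struct iso_opts, iso_case_flags(), and the XF_* bit values are
hypothetical stand-ins for struct isofs_sb_info and the FS_XFLAG_*
constants.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the real FS_XFLAG_* values come from the UAPI headers. */
#define XF_CASEFOLD          (1u << 0)
#define XF_CASENONPRESERVING (1u << 1)

/* Hypothetical mirror of the mount state consulted by isofs_fileattr_get(). */
struct iso_opts {
	char check;         /* 'r' (relaxed) or 's' (strict) */
	char mapping;       /* 'n', 'a', or 'o' */
	bool rock;          /* Rock Ridge in effect */
	int  joliet_level;  /* 0 when Joliet is not in effect */
};

static unsigned int iso_case_flags(const struct iso_opts *o)
{
	unsigned int flags = 0;

	/* Relaxed checking means lookups ignore case. */
	if (o->check == 'r')
		flags |= XF_CASEFOLD;

	/* Only plain ISO 9660 with map=n or map=a lowercases names. */
	if (!o->joliet_level && !o->rock &&
	    (o->mapping == 'n' || o->mapping == 'a'))
		flags |= XF_CASENONPRESERVING;

	return flags;
}
```

With these assumptions, a plain ISO 9660 mount with check=s and map=n
yields only XF_CASENONPRESERVING, and a Joliet mount with check=r
yields only XF_CASEFOLD.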

Casefolding is a directory property, and the in-tree consumers
(NFSD, ksmbd) issue the query against a directory: NFSD walks
to the parent for non-directory dentries before calling
vfs_fileattr_get(), and ksmbd reports per-share attributes from
the share root. Wire .fileattr_get only on
isofs_dir_inode_operations. The CASEFOLD flag is set in both
fa->fsx_xflags and fa->flags so FS_IOC_FSGETXATTR and
FS_IOC_GETFLAGS agree.
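
Because the flag is mirrored into fa->flags, userspace can observe it
through the long-standing FS_IOC_GETFLAGS ioctl. The following is a
minimal sketch of such a query; dir_reports_casefold() is a
hypothetical helper name, and the result only reflects the behavior
described here on a kernel that carries this series.

```c
#include <fcntl.h>
#include <linux/fs.h>   /* FS_IOC_GETFLAGS, FS_CASEFOLD_FL */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Returns 1 if the directory reports FS_CASEFOLD_FL, 0 if it does not,
 * and -1 if the path cannot be opened or the ioctl fails.
 */
static int dir_reports_casefold(const char *path)
{
	int flags = 0;	/* the kernel reads and writes an int here */
	int fd, ret;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	ret = ioctl(fd, FS_IOC_GETFLAGS, &flags);
	close(fd);
	if (ret < 0)
		return -1;

	return (flags & FS_CASEFOLD_FL) ? 1 : 0;
}
```

Since .fileattr_get is wired only on the directory inode operations,
the path passed in should name a directory on the isofs mount.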

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/isofs/dir.c   | 16 ++++++++++++++++
 fs/isofs/isofs.h |  3 +++
 2 files changed, 19 insertions(+)

diff --git a/fs/isofs/dir.c b/fs/isofs/dir.c
index 2fd9948d606e..55385a72a4ce 100644
--- a/fs/isofs/dir.c
+++ b/fs/isofs/dir.c
@@ -14,6 +14,7 @@
 #include <linux/gfp.h>
 #include <linux/filelock.h>
 #include "isofs.h"
+#include <linux/fileattr.h>
 
 int isofs_name_translate(struct iso_directory_record *de, char *new, struct inode *inode)
 {
@@ -267,6 +268,20 @@ static int isofs_readdir(struct file *file, struct dir_context *ctx)
 	return result;
 }
 
+int isofs_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+	struct isofs_sb_info *sbi = ISOFS_SB(dentry->d_sb);
+
+	if (sbi->s_check == 'r') {
+		fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+		fa->flags |= FS_CASEFOLD_FL;
+	}
+	if (!sbi->s_joliet_level && !sbi->s_rock &&
+	    (sbi->s_mapping == 'n' || sbi->s_mapping == 'a'))
+		fa->fsx_xflags |= FS_XFLAG_CASENONPRESERVING;
+	return 0;
+}
+
 const struct file_operations isofs_dir_operations =
 {
 	.llseek = generic_file_llseek,
@@ -281,6 +296,7 @@ const struct file_operations isofs_dir_operations =
 const struct inode_operations isofs_dir_inode_operations =
 {
 	.lookup = isofs_lookup,
+	.fileattr_get = isofs_fileattr_get,
 };
 
 
diff --git a/fs/isofs/isofs.h b/fs/isofs/isofs.h
index 506555837533..0ec8b24a42ed 100644
--- a/fs/isofs/isofs.h
+++ b/fs/isofs/isofs.h
@@ -197,6 +197,9 @@ isofs_normalize_block_and_offset(struct iso_directory_record* de,
 	}
 }
 
+struct file_kattr;
+int isofs_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
+
 extern const struct inode_operations isofs_dir_inode_operations;
 extern const struct file_operations isofs_dir_operations;
 extern const struct address_space_operations isofs_symlink_aops;

-- 
2.53.0



* [PATCH v14 11/15] vboxsf: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-05-07  8:53 UTC
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Roland Mainz
In-Reply-To: <20260507-case-sensitivity-v14-0-e62cc8200435@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

Upper layers such as NFSD need a way to query whether a
filesystem handles filenames in a case-sensitive manner. Report
VirtualBox shared folder case handling behavior via the
FS_XFLAG_CASEFOLD flag.

The case sensitivity property is queried from the VirtualBox host
service at mount time and cached in struct vboxsf_sbi. The host
determines case sensitivity based on the underlying host filesystem
(for example, Windows NTFS is case-insensitive while Linux ext4 is
case-sensitive).

VirtualBox shared folders always preserve filename case exactly
as provided by the guest. The host interface does not expose a
separate case-preserving property; leaving
FS_XFLAG_CASENONPRESERVING unset reports the POSIX-default
case-preserving behavior, which matches vboxsf semantics.

The callback is registered in all three inode_operations
structures (directory, file, and symlink) to ensure consistent
reporting across all inode types.
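
The mount-time caching and its failure handling can be sketched as
follows. This is an illustrative model under stated assumptions:
struct host_reply and cache_case_insensitive() are hypothetical and do
not reflect the real struct shfl_volinfo layout or the host call.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a host reply; not the real shfl_volinfo layout. */
struct host_reply {
	int    err;             /* nonzero: the query itself failed */
	size_t len;             /* bytes the host actually returned */
	size_t expected_len;    /* size of the full volume-info structure */
	bool   case_sensitive;  /* meaningful only when len >= expected_len */
};

/* Returns the value that would be cached as "case_insensitive". */
static bool cache_case_insensitive(const struct host_reply *r)
{
	/* A failed or short reply keeps the zero-initialized default. */
	if (r->err || r->len < r->expected_len)
		return false;

	return !r->case_sensitive;
}
```

The design choice modeled here is that an advisory attribute never
fails the mount: any reply that cannot be trusted collapses to the
case-sensitive default.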

Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/vboxsf/dir.c    |  1 +
 fs/vboxsf/file.c   |  6 ++++--
 fs/vboxsf/super.c  |  7 +++++++
 fs/vboxsf/utils.c  | 30 ++++++++++++++++++++++++++++++
 fs/vboxsf/vfsmod.h |  6 ++++++
 5 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/fs/vboxsf/dir.c b/fs/vboxsf/dir.c
index 42bedc4ec7af..c5bd3271aa96 100644
--- a/fs/vboxsf/dir.c
+++ b/fs/vboxsf/dir.c
@@ -477,4 +477,5 @@ const struct inode_operations vboxsf_dir_iops = {
 	.symlink = vboxsf_dir_symlink,
 	.getattr = vboxsf_getattr,
 	.setattr = vboxsf_setattr,
+	.fileattr_get = vboxsf_fileattr_get,
 };
diff --git a/fs/vboxsf/file.c b/fs/vboxsf/file.c
index 7a7a3fbb2651..943953867e18 100644
--- a/fs/vboxsf/file.c
+++ b/fs/vboxsf/file.c
@@ -222,7 +222,8 @@ const struct file_operations vboxsf_reg_fops = {
 
 const struct inode_operations vboxsf_reg_iops = {
 	.getattr = vboxsf_getattr,
-	.setattr = vboxsf_setattr
+	.setattr = vboxsf_setattr,
+	.fileattr_get = vboxsf_fileattr_get,
 };
 
 static int vboxsf_read_folio(struct file *file, struct folio *folio)
@@ -389,5 +390,6 @@ static const char *vboxsf_get_link(struct dentry *dentry, struct inode *inode,
 }
 
 const struct inode_operations vboxsf_lnk_iops = {
-	.get_link = vboxsf_get_link
+	.get_link = vboxsf_get_link,
+	.fileattr_get = vboxsf_fileattr_get,
 };
diff --git a/fs/vboxsf/super.c b/fs/vboxsf/super.c
index a618cb093e00..a61fbab51d37 100644
--- a/fs/vboxsf/super.c
+++ b/fs/vboxsf/super.c
@@ -185,6 +185,13 @@ static int vboxsf_fill_super(struct super_block *sb, struct fs_context *fc)
 	if (err)
 		goto fail_unmap;
 
+	/*
+	 * A failed query leaves sbi->case_insensitive false, so the
+	 * mount defaults to reporting case-sensitive behavior. Do not
+	 * fail the mount over an advisory attribute.
+	 */
+	vboxsf_query_case_sensitive(sbi);
+
 	sb->s_magic = VBOXSF_SUPER_MAGIC;
 	sb->s_blocksize = 1024;
 	sb->s_maxbytes = MAX_LFS_FILESIZE;
diff --git a/fs/vboxsf/utils.c b/fs/vboxsf/utils.c
index 440e8c50629d..298bfc93255c 100644
--- a/fs/vboxsf/utils.c
+++ b/fs/vboxsf/utils.c
@@ -11,6 +11,7 @@
 #include <linux/sizes.h>
 #include <linux/pagemap.h>
 #include <linux/vfs.h>
+#include <linux/fileattr.h>
 #include "vfsmod.h"
 
 struct inode *vboxsf_new_inode(struct super_block *sb)
@@ -567,3 +568,32 @@ int vboxsf_dir_read_all(struct vboxsf_sbi *sbi, struct vboxsf_dir_info *sf_d,
 
 	return err;
 }
+
+int vboxsf_query_case_sensitive(struct vboxsf_sbi *sbi)
+{
+	struct shfl_volinfo volinfo = {};
+	u32 buf_len;
+	int err;
+
+	buf_len = sizeof(volinfo);
+	err = vboxsf_fsinfo(sbi->root, 0, SHFL_INFO_GET | SHFL_INFO_VOLUME,
+			    &buf_len, &volinfo);
+	if (err)
+		return err;
+	if (buf_len < sizeof(volinfo))
+		return 0;
+
+	sbi->case_insensitive = !volinfo.properties.case_sensitive;
+	return 0;
+}
+
+int vboxsf_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+	struct vboxsf_sbi *sbi = VBOXSF_SBI(dentry->d_sb);
+
+	if (sbi->case_insensitive) {
+		fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+		fa->flags |= FS_CASEFOLD_FL;
+	}
+	return 0;
+}
diff --git a/fs/vboxsf/vfsmod.h b/fs/vboxsf/vfsmod.h
index 05973eb89d52..b61afd0ce842 100644
--- a/fs/vboxsf/vfsmod.h
+++ b/fs/vboxsf/vfsmod.h
@@ -47,6 +47,7 @@ struct vboxsf_sbi {
 	u32 next_generation;
 	u32 root;
 	int bdi_id;
+	bool case_insensitive;
 };
 
 /* per-inode information */
@@ -111,6 +112,11 @@ void vboxsf_dir_info_free(struct vboxsf_dir_info *p);
 int vboxsf_dir_read_all(struct vboxsf_sbi *sbi, struct vboxsf_dir_info *sf_d,
 			u64 handle);
 
+int vboxsf_query_case_sensitive(struct vboxsf_sbi *sbi);
+
+struct file_kattr;
+int vboxsf_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
+
 /* from vboxsf_wrappers.c */
 int vboxsf_connect(void);
 void vboxsf_disconnect(void);

-- 
2.53.0



* [PATCH v14 10/15] nfs: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-05-07  8:53 UTC
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Roland Mainz
In-Reply-To: <20260507-case-sensitivity-v14-0-e62cc8200435@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

An NFS server re-exporting an NFS mount point needs to report
the case sensitivity behavior of the underlying filesystem to
its clients. NFSD's attribute encoder obtains that information
by calling vfs_fileattr_get() on the lower filesystem, so the
NFS client must implement fileattr_get to surface what it
learned from its own server.

The NFS client already retrieves case sensitivity information
from servers during mount via PATHCONF (NFSv3) or the
FATTR4_CASE_INSENSITIVE/FATTR4_CASE_PRESERVING attributes
(NFSv4). Expose this information through fileattr_get by
reporting the FS_XFLAG_CASEFOLD and FS_XFLAG_CASENONPRESERVING
flags. NFSv2 lacks PATHCONF support, so mounts using that protocol
version default to standard POSIX behavior: case-sensitive and
case-preserving.
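
For NFSv3, the two case fields are the last of six 32-bit big-endian
words in the PATHCONF3resok body (linkmax, name_max, no_trunc,
chown_restricted, case_insensitive, case_preserving), following the
post-op attributes. The sketch below decodes those six words from a
raw buffer in userspace; struct pathconf3 and decode_pathconf3_tail()
are hypothetical names, and the kernel uses its xdr_stream helpers
instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result structure, loosely modeled on struct nfs_pathconf. */
struct pathconf3 {
	uint32_t max_link;
	uint32_t max_namelen;
	bool     case_insensitive;
	bool     case_preserving;
};

/* Read one big-endian 32-bit word. */
static uint32_t be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* buf points at the six words that follow the post-op attributes. */
static int decode_pathconf3_tail(const unsigned char *buf, size_t len,
				 struct pathconf3 *out)
{
	if (len < 6 * 4)
		return -1;

	out->max_link    = be32(buf + 0);
	out->max_namelen = be32(buf + 4);
	/* buf + 8 is no_trunc and buf + 12 is chown_restricted; both ignored. */
	out->case_insensitive = be32(buf + 16) != 0;
	out->case_preserving  = be32(buf + 20) != 0;
	return 0;
}
```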

PATHCONF is now invoked unconditionally for NFSv2 and NFSv3 mounts
so the case-sensitivity capabilities are established even when the
user pins server->namelen with the namlen= mount option. That option
is orthogonal to case handling, and skipping PATHCONF because
namelen was already known would leave the caps unset.

The two capability bits have opposite polarity relative to the
attributes the server reports, because the POSIX defaults for those
attributes differ. Most servers are case-sensitive and case-preserving,
which matches "neither xflag set." NFS_CAP_CASE_INSENSITIVE is set only
when the server affirms case insensitivity, so "server said no" and
"server did not answer" both collapse to the case-sensitive default.
NFS_CAP_CASE_NONPRESERVING applies the same rule with the sense
inverted: it is set only when the server affirms that it does not
preserve case, so silence or a missing attribute lands on the
case-preserving default. The NFSv4 probe checks res.attr_bitmask[0]
to distinguish "server said false" from "server omitted the
attribute" before setting the bit.

Both capability bits are cleared before each probe so that stale
capabilities from a prior probe do not survive a remount, an NFSv4
transparent state migration to a server with different case
semantics, or a probe whose reply never arrives.
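
The polarity and clear-before-probe rules can be illustrated with a
small model. This is a simplified sketch, not the client code: struct
probe_result and update_case_caps() are hypothetical, and only the bit
positions are taken from the patch. For NFSv3 a successful PATHCONF
always carries case_preserving, so preserving_present would simply be
true there.

```c
#include <stdbool.h>

/* Same bit positions as NFS_CAP_CASE_* in the patch. */
#define CAP_CASE_INSENSITIVE   (1u << 6)
#define CAP_CASE_NONPRESERVING (1u << 7)

/* Hypothetical summary of one probe of the server. */
struct probe_result {
	bool replied;             /* the probe completed successfully */
	bool preserving_present;  /* server returned a case_preserving value */
	bool case_insensitive;
	bool case_preserving;
};

static unsigned int update_case_caps(unsigned int caps,
				     const struct probe_result *r)
{
	/* Clear first so a failed probe cannot retain stale bits. */
	caps &= ~(CAP_CASE_INSENSITIVE | CAP_CASE_NONPRESERVING);

	if (!r->replied)
		return caps;

	/* Set only on an affirmative "case-insensitive" answer. */
	if (r->case_insensitive)
		caps |= CAP_CASE_INSENSITIVE;

	/* Set only on an explicit "does not preserve case" answer. */
	if (r->preserving_present && !r->case_preserving)
		caps |= CAP_CASE_NONPRESERVING;

	return caps;
}
```

In this model, a missing reply and an omitted attribute both leave
the caps at the POSIX defaults, which is the outcome the commit
message describes.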

Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfs/client.c           | 21 +++++++++++++++------
 fs/nfs/inode.c            | 15 +++++++++++++++
 fs/nfs/internal.h         |  3 +++
 fs/nfs/namespace.c        |  2 ++
 fs/nfs/nfs3proc.c         |  2 ++
 fs/nfs/nfs3xdr.c          |  7 +++++--
 fs/nfs/nfs4proc.c         | 10 +++++++---
 fs/nfs/proc.c             |  3 +++
 fs/nfs/symlink.c          |  3 +++
 include/linux/nfs_fs_sb.h |  2 +-
 include/linux/nfs_xdr.h   |  2 ++
 11 files changed, 58 insertions(+), 12 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index be02bb227741..3db2f18315b8 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -914,6 +914,7 @@ static void nfs_server_set_fsinfo(struct nfs_server *server,
  */
 static int nfs_probe_fsinfo(struct nfs_server *server, struct nfs_fh *mntfh, struct nfs_fattr *fattr)
 {
+	struct nfs_pathconf pathinfo = { };
 	struct nfs_fsinfo fsinfo;
 	struct nfs_client *clp = server->nfs_client;
 	int error;
@@ -933,15 +934,23 @@ static int nfs_probe_fsinfo(struct nfs_server *server, struct nfs_fh *mntfh, str
 
 	nfs_server_set_fsinfo(server, &fsinfo);
 
-	/* Get some general file system info */
-	if (server->namelen == 0) {
-		struct nfs_pathconf pathinfo;
+	pathinfo.fattr = fattr;
+	nfs_fattr_init(fattr);
 
-		pathinfo.fattr = fattr;
-		nfs_fattr_init(fattr);
+	/* Clear before probing so a failed RPC does not retain stale bits. */
+	if (clp->rpc_ops->version < 4)
+		server->caps &= ~(NFS_CAP_CASE_INSENSITIVE |
+				  NFS_CAP_CASE_NONPRESERVING);
 
-		if (clp->rpc_ops->pathconf(server, mntfh, &pathinfo) >= 0)
+	if (clp->rpc_ops->pathconf(server, mntfh, &pathinfo) >= 0) {
+		if (server->namelen == 0)
 			server->namelen = pathinfo.max_namelen;
+		if (clp->rpc_ops->version < 4) {
+			if (pathinfo.case_insensitive)
+				server->caps |= NFS_CAP_CASE_INSENSITIVE;
+			if (!pathinfo.case_preserving)
+				server->caps |= NFS_CAP_CASE_NONPRESERVING;
+		}
 	}
 
 	if (clp->rpc_ops->discover_trunking != NULL &&
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 98a8f0de1199..fdcbe6f2052c 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -41,6 +41,7 @@
 #include <linux/freezer.h>
 #include <linux/uaccess.h>
 #include <linux/iversion.h>
+#include <linux/fileattr.h>
 
 #include "nfs4_fs.h"
 #include "callback.h"
@@ -1101,6 +1102,20 @@ int nfs_getattr(struct mnt_idmap *idmap, const struct path *path,
 }
 EXPORT_SYMBOL_GPL(nfs_getattr);
 
+int nfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+	struct inode *inode = d_inode(dentry);
+
+	if (nfs_server_capable(inode, NFS_CAP_CASE_INSENSITIVE)) {
+		fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+		fa->flags |= FS_CASEFOLD_FL;
+	}
+	if (nfs_server_capable(inode, NFS_CAP_CASE_NONPRESERVING))
+		fa->fsx_xflags |= FS_XFLAG_CASENONPRESERVING;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nfs_fileattr_get);
+
 static void nfs_init_lock_context(struct nfs_lock_context *l_ctx)
 {
 	refcount_set(&l_ctx->count, 1);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index fc5456377160..309d3f679bb3 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -449,6 +449,9 @@ extern void nfs_set_cache_invalid(struct inode *inode, unsigned long flags);
 extern bool nfs_check_cache_invalid(struct inode *, unsigned long);
 extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode);
 
+struct file_kattr;
+int nfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
+
 #if IS_ENABLED(CONFIG_NFS_LOCALIO)
 /* localio.c */
 struct nfs_local_dio {
diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
index af9be0c5f516..6d0073c24771 100644
--- a/fs/nfs/namespace.c
+++ b/fs/nfs/namespace.c
@@ -246,11 +246,13 @@ nfs_namespace_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 const struct inode_operations nfs_mountpoint_inode_operations = {
 	.getattr	= nfs_getattr,
 	.setattr	= nfs_setattr,
+	.fileattr_get	= nfs_fileattr_get,
 };
 
 const struct inode_operations nfs_referral_inode_operations = {
 	.getattr	= nfs_namespace_getattr,
 	.setattr	= nfs_namespace_setattr,
+	.fileattr_get	= nfs_fileattr_get,
 };
 
 static void nfs_expire_automounts(struct work_struct *work)
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index 95d7cd564b74..b80d0c5efc27 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -1053,6 +1053,7 @@ static const struct inode_operations nfs3_dir_inode_operations = {
 	.permission	= nfs_permission,
 	.getattr	= nfs_getattr,
 	.setattr	= nfs_setattr,
+	.fileattr_get	= nfs_fileattr_get,
 #ifdef CONFIG_NFS_V3_ACL
 	.listxattr	= nfs3_listxattr,
 	.get_inode_acl	= nfs3_get_acl,
@@ -1064,6 +1065,7 @@ static const struct inode_operations nfs3_file_inode_operations = {
 	.permission	= nfs_permission,
 	.getattr	= nfs_getattr,
 	.setattr	= nfs_setattr,
+	.fileattr_get	= nfs_fileattr_get,
 #ifdef CONFIG_NFS_V3_ACL
 	.listxattr	= nfs3_listxattr,
 	.get_inode_acl	= nfs3_get_acl,
diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index e17d72908412..e745e78faab0 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -2276,8 +2276,11 @@ static int decode_pathconf3resok(struct xdr_stream *xdr,
 	if (unlikely(!p))
 		return -EIO;
 	result->max_link = be32_to_cpup(p++);
-	result->max_namelen = be32_to_cpup(p);
-	/* ignore remaining fields */
+	result->max_namelen = be32_to_cpup(p++);
+	p++;	/* ignore no_trunc */
+	p++;	/* ignore chown_restricted */
+	result->case_insensitive = be32_to_cpup(p++) != 0;
+	result->case_preserving = be32_to_cpup(p) != 0;
 	return 0;
 }
 
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index d839a97df822..62f66684fbc8 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3933,7 +3933,8 @@ static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *f
 		server->caps &=
 			~(NFS_CAP_ACLS | NFS_CAP_HARDLINKS | NFS_CAP_SYMLINKS |
 			  NFS_CAP_SECURITY_LABEL | NFS_CAP_FS_LOCATIONS |
-			  NFS_CAP_OPEN_XOR | NFS_CAP_DELEGTIME);
+			  NFS_CAP_OPEN_XOR | NFS_CAP_DELEGTIME |
+			  NFS_CAP_CASE_INSENSITIVE | NFS_CAP_CASE_NONPRESERVING);
 		server->fattr_valid = NFS_ATTR_FATTR_V4;
 		if (res.attr_bitmask[0] & FATTR4_WORD0_ACL &&
 				res.acl_bitmask & ACL4_SUPPORT_ALLOW_ACL)
@@ -3944,8 +3945,9 @@ static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *f
 			server->caps |= NFS_CAP_SYMLINKS;
 		if (res.case_insensitive)
 			server->caps |= NFS_CAP_CASE_INSENSITIVE;
-		if (res.case_preserving)
-			server->caps |= NFS_CAP_CASE_PRESERVING;
+		if ((res.attr_bitmask[0] & FATTR4_WORD0_CASE_PRESERVING) &&
+		    !res.case_preserving)
+			server->caps |= NFS_CAP_CASE_NONPRESERVING;
 #ifdef CONFIG_NFS_V4_SECURITY_LABEL
 		if (res.attr_bitmask[2] & FATTR4_WORD2_SECURITY_LABEL)
 			server->caps |= NFS_CAP_SECURITY_LABEL;
@@ -10598,6 +10600,7 @@ static const struct inode_operations nfs4_dir_inode_operations = {
 	.getattr	= nfs_getattr,
 	.setattr	= nfs_setattr,
 	.listxattr	= nfs4_listxattr,
+	.fileattr_get	= nfs_fileattr_get,
 };
 
 static const struct inode_operations nfs4_file_inode_operations = {
@@ -10605,6 +10608,7 @@ static const struct inode_operations nfs4_file_inode_operations = {
 	.getattr	= nfs_getattr,
 	.setattr	= nfs_setattr,
 	.listxattr	= nfs4_listxattr,
+	.fileattr_get	= nfs_fileattr_get,
 };
 
 static struct nfs_server *nfs4_clone_server(struct nfs_server *source,
diff --git a/fs/nfs/proc.c b/fs/nfs/proc.c
index 70795684b8e8..03c2c1f31be9 100644
--- a/fs/nfs/proc.c
+++ b/fs/nfs/proc.c
@@ -598,6 +598,7 @@ nfs_proc_pathconf(struct nfs_server *server, struct nfs_fh *fhandle,
 {
 	info->max_link = 0;
 	info->max_namelen = NFS2_MAXNAMLEN;
+	info->case_preserving = true;
 	return 0;
 }
 
@@ -718,12 +719,14 @@ static const struct inode_operations nfs_dir_inode_operations = {
 	.permission	= nfs_permission,
 	.getattr	= nfs_getattr,
 	.setattr	= nfs_setattr,
+	.fileattr_get	= nfs_fileattr_get,
 };
 
 static const struct inode_operations nfs_file_inode_operations = {
 	.permission	= nfs_permission,
 	.getattr	= nfs_getattr,
 	.setattr	= nfs_setattr,
+	.fileattr_get	= nfs_fileattr_get,
 };
 
 const struct nfs_rpc_ops nfs_v2_clientops = {
diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
index 58146e935402..74a072896f8d 100644
--- a/fs/nfs/symlink.c
+++ b/fs/nfs/symlink.c
@@ -22,6 +22,8 @@
 #include <linux/mm.h>
 #include <linux/string.h>
 
+#include "internal.h"
+
 /* Symlink caching in the page cache is even more simplistic
  * and straight-forward than readdir caching.
  */
@@ -74,4 +76,5 @@ const struct inode_operations nfs_symlink_inode_operations = {
 	.get_link	= nfs_get_link,
 	.getattr	= nfs_getattr,
 	.setattr	= nfs_setattr,
+	.fileattr_get	= nfs_fileattr_get,
 };
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 4daee27fa5eb..34d294774f8c 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -306,7 +306,7 @@ struct nfs_server {
 #define NFS_CAP_ATOMIC_OPEN	(1U << 4)
 #define NFS_CAP_LGOPEN		(1U << 5)
 #define NFS_CAP_CASE_INSENSITIVE	(1U << 6)
-#define NFS_CAP_CASE_PRESERVING	(1U << 7)
+#define NFS_CAP_CASE_NONPRESERVING	(1U << 7)
 #define NFS_CAP_REBOOT_LAYOUTRETURN	(1U << 8)
 #define NFS_CAP_OFFLOAD_STATUS	(1U << 9)
 #define NFS_CAP_ZERO_RANGE	(1U << 10)
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index ff1f12aa73d2..7c2057e40f99 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -182,6 +182,8 @@ struct nfs_pathconf {
 	struct nfs_fattr	*fattr; /* Post-op attributes */
 	__u32			max_link; /* max # of hard links */
 	__u32			max_namelen; /* max name length */
+	bool			case_insensitive;
+	bool			case_preserving;
 };
 
 struct nfs4_change_info {

-- 
2.53.0



* [PATCH v14 09/15] cifs: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-05-07  8:53 UTC
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Steve French, Roland Mainz
In-Reply-To: <20260507-case-sensitivity-v14-0-e62cc8200435@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

Upper layers such as NFSD need a way to query whether a filesystem
handles filenames in a case-sensitive manner. Report CIFS/SMB case
handling behavior via FS_XFLAG_CASEFOLD and
FS_XFLAG_CASENONPRESERVING.

The authoritative source is the server itself: at mount time CIFS
issues QueryFSInfo(FS_ATTRIBUTE_INFORMATION) and caches the reply
on the tcon. That reply carries FILE_CASE_SENSITIVE_SEARCH and
FILE_CASE_PRESERVED_NAMES, which reflect whatever case handling
the share actually implements after SMB3.1.1 POSIX extensions
negotiation. Translating those two bits into the VFS flags lets
cifs_fileattr_get report what the server advertises rather than
what the client was asked to pretend.

QueryFSInfo is best-effort; the mount completes even if the server
does not answer. MaxPathNameComponentLength is zero in that case
and is used as the "no reply received" sentinel. When no reply is
available, fall back to the nocase mount option so that the reported
behavior agrees with the dentry comparison operations installed on
the superblock.
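
The translation and its fallback can be sketched as a pure function.
This is an illustrative model: smb_case_flags() and the XF_* bit
values are hypothetical, while the two FILE_* values are the
FileFsAttributeInformation bits as defined in MS-FSCC.

```c
#include <stdbool.h>
#include <stdint.h>

/* FileFsAttributeInformation attribute bits (MS-FSCC). */
#define FILE_CASE_SENSITIVE_SEARCH 0x00000001u
#define FILE_CASE_PRESERVED_NAMES  0x00000002u

/* Hypothetical stand-ins for the FS_XFLAG_* constants. */
#define XF_CASEFOLD          (1u << 0)
#define XF_CASENONPRESERVING (1u << 1)

static unsigned int smb_case_flags(uint32_t max_component_len,
				   uint32_t attrs, bool nocase)
{
	unsigned int flags = 0;

	/* Zero length is the "no reply received" sentinel: use nocase. */
	if (max_component_len == 0)
		return nocase ? XF_CASEFOLD : 0;

	/* Otherwise the server's answer wins over the mount option. */
	if (!(attrs & FILE_CASE_SENSITIVE_SEARCH))
		flags |= XF_CASEFOLD;
	if (!(attrs & FILE_CASE_PRESERVED_NAMES))
		flags |= XF_CASENONPRESERVING;

	return flags;
}
```

Note that once a reply is cached, the nocase argument is ignored
entirely, mirroring the preference for what the server advertises.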

The callback is registered on cifs_dir_inode_ops so that NFSD,
ksmbd, and other consumers querying case handling against a
directory get a definitive answer, and on cifs_file_inode_ops to
preserve FS_COMPR_FL reporting on regular files. cifs_set_ops()
also installs cifs_namespace_inode_operations on DFS referral
directories that carry IS_AUTOMOUNT; register the same callback
there so the answer does not depend on whether the directory is
a referral point.

Registering fileattr_get routes FS_IOC_GETFLAGS through
vfs_fileattr_get() and short-circuits the syscall's fallback to
cifs_ioctl(). That fallback invoked CIFSGetExtAttr() under
CONFIG_CIFS_POSIX and CONFIG_CIFS_ALLOW_INSECURE_LEGACY on servers
advertising CIFS_UNIX_EXTATTR_CAP, surfacing the SMB1 Unix-extension
immutable, append, and nodump bits. cifs_fileattr_get carries over
only FS_COMPR_FL from cached cifsAttrs; the SMB1 extattr fetch is
not reproduced. SMB1 is deprecated, and acquiring a netfid from
within a dentry-only callback is not worth preserving a path tied
to an insecure legacy dialect.

Acked-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/smb/client/cifsfs.c    | 53 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/smb/client/cifsfs.h    |  3 +++
 fs/smb/client/namespace.c |  1 +
 3 files changed, 57 insertions(+)

diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index 2025739f070a..6c113ae7fdd3 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -30,6 +30,7 @@
 #include <linux/xattr.h>
 #include <linux/mm.h>
 #include <linux/key-type.h>
+#include <linux/fileattr.h>
 #include <uapi/linux/magic.h>
 #include <net/ipv6.h>
 #include "cifsfs.h"
@@ -1199,6 +1200,56 @@ struct file_system_type smb3_fs_type = {
 MODULE_ALIAS_FS("smb3");
 MODULE_ALIAS("smb3");
 
+int cifs_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+	struct cifs_sb_info *cifs_sb = CIFS_SB(dentry->d_sb);
+	struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);
+	struct inode *inode = d_inode(dentry);
+	u32 attrs;
+
+	/* Preserve FS_COMPR_FL previously reported by cifs_ioctl(). */
+	if (CIFS_I(inode)->cifsAttrs & ATTR_COMPRESSED)
+		fa->flags |= FS_COMPR_FL;
+
+	/*
+	 * FS_CASEFOLD_FL is defined by UAPI as a folder attribute,
+	 * and userspace tools (e.g., lsattr) display it only on
+	 * directories. Confine the case-handling bits to directories
+	 * to match that convention; for non-directories the share's
+	 * case semantics are still discoverable through the parent.
+	 */
+	if (!S_ISDIR(inode->i_mode))
+		return 0;
+
+	/*
+	 * The server's FS_ATTRIBUTE_INFORMATION response, cached on
+	 * the tcon at mount, reflects the share's case-handling
+	 * semantics after any POSIX extensions negotiation. Prefer
+	 * it over the client-local nocase mount option, which only
+	 * governs dentry comparison on this superblock.
+	 *
+	 * QueryFSInfo is best-effort at mount; when it did not
+	 * populate fsAttrInfo, MaxPathNameComponentLength remains
+	 * zero. In that case fall back to nocase so the reporting
+	 * matches the comparison behavior installed on the sb.
+	 */
+	if (le32_to_cpu(tcon->fsAttrInfo.MaxPathNameComponentLength) == 0) {
+		if (tcon->nocase) {
+			fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+			fa->flags |= FS_CASEFOLD_FL;
+		}
+		return 0;
+	}
+	attrs = le32_to_cpu(tcon->fsAttrInfo.Attributes);
+	if (!(attrs & FILE_CASE_SENSITIVE_SEARCH)) {
+		fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+		fa->flags |= FS_CASEFOLD_FL;
+	}
+	if (!(attrs & FILE_CASE_PRESERVED_NAMES))
+		fa->fsx_xflags |= FS_XFLAG_CASENONPRESERVING;
+	return 0;
+}
+
 const struct inode_operations cifs_dir_inode_ops = {
 	.create = cifs_create,
 	.atomic_open = cifs_atomic_open,
@@ -1217,6 +1268,7 @@ const struct inode_operations cifs_dir_inode_ops = {
 	.listxattr = cifs_listxattr,
 	.get_acl = cifs_get_acl,
 	.set_acl = cifs_set_acl,
+	.fileattr_get = cifs_fileattr_get,
 };
 
 const struct inode_operations cifs_file_inode_ops = {
@@ -1227,6 +1279,7 @@ const struct inode_operations cifs_file_inode_ops = {
 	.fiemap = cifs_fiemap,
 	.get_acl = cifs_get_acl,
 	.set_acl = cifs_set_acl,
+	.fileattr_get = cifs_fileattr_get,
 };
 
 const char *cifs_get_link(struct dentry *dentry, struct inode *inode,
diff --git a/fs/smb/client/cifsfs.h b/fs/smb/client/cifsfs.h
index 7370b38da938..5f0d459d1a89 100644
--- a/fs/smb/client/cifsfs.h
+++ b/fs/smb/client/cifsfs.h
@@ -89,6 +89,9 @@ extern const struct inode_operations cifs_file_inode_ops;
 extern const struct inode_operations cifs_symlink_inode_ops;
 extern const struct inode_operations cifs_namespace_inode_operations;
 
+struct file_kattr;
+int cifs_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
+
 
 /* Functions related to files and directories */
 extern const struct netfs_request_ops cifs_req_ops;
diff --git a/fs/smb/client/namespace.c b/fs/smb/client/namespace.c
index 52a520349cb7..52a51b032fae 100644
--- a/fs/smb/client/namespace.c
+++ b/fs/smb/client/namespace.c
@@ -294,4 +294,5 @@ struct vfsmount *cifs_d_automount(struct path *path)
 }
 
 const struct inode_operations cifs_namespace_inode_operations = {
+	.fileattr_get	= cifs_fileattr_get,
 };

-- 
2.53.0



* [PATCH v14 08/15] xfs: Report case sensitivity in fileattr_get
From: Chuck Lever @ 2026-05-07  8:53 UTC
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Roland Mainz
In-Reply-To: <20260507-case-sensitivity-v14-0-e62cc8200435@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

Upper layers such as NFSD need to query whether a filesystem
is case-sensitive. Add FS_XFLAG_CASEFOLD to xfs_ip2xflags()
when the filesystem is formatted with the ASCIICI feature
flag. This serves both FS_IOC_FSGETXATTR (via xfs_fill_fsxattr()
in xfs_fileattr_get()) and XFS_IOC_BULKSTAT (which populates
bs_xflags directly from xfs_ip2xflags()), so bulkstat consumers
and per-inode queries see a consistent view of the filesystem's
case-folding behavior.

FS_XFLAG_CASEFOLD is read-only: FS_XFLAG_RDONLY_MASK ensures
FS_IOC_FSSETXATTR strips it, and xfs_flags2diflags() has no
clause for CASEFOLD so the on-disk diflags are unaffected.
The legacy FS_IOC_SETFLAGS path in xfs_fileattr_set() also
allows FS_CASEFOLD_FL through its allowlist on ASCIICI
filesystems so that a chattr read-modify-write cycle does
not fail with EOPNOTSUPP.
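
The allowlist behavior for the legacy path can be modeled in
userspace as below. This is a sketch under stated assumptions:
check_setflags() is a hypothetical name, has_asciici stands in for
xfs_has_asciici(mp), and the FS_*_FL constants come from the UAPI
header <linux/fs.h>.

```c
#include <errno.h>
#include <stdbool.h>
#include <linux/fs.h>   /* FS_*_FL flag constants */

/* Returns 0 if every requested flag is acceptable, -EOPNOTSUPP otherwise. */
static int check_setflags(unsigned int flags, bool has_asciici)
{
	unsigned int allowed = FS_IMMUTABLE_FL | FS_APPEND_FL |
			       FS_NOATIME_FL | FS_NODUMP_FL |
			       FS_SYNC_FL | FS_DAX_FL |
			       FS_PROJINHERIT_FL;

	/* Accept CASEFOLD as a no-op only where it is already reported. */
	if (has_asciici)
		allowed |= FS_CASEFOLD_FL;

	if (flags & ~allowed)
		return -EOPNOTSUPP;
	return 0;
}
```

A chattr-style round trip reads the flags, changes one bit, and
writes the whole word back, so on an ASCIICI filesystem the word
always contains FS_CASEFOLD_FL; without the extra allowlist entry
every such write would be rejected.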

XFS always preserves case. It is case-sensitive by default but
supports ASCII-only case-insensitive lookups when formatted with
the ASCIICI feature flag.

Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/xfs/libxfs/xfs_inode_util.c |  2 ++
 fs/xfs/xfs_ioctl.c             | 20 +++++++++++++++++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_inode_util.c b/fs/xfs/libxfs/xfs_inode_util.c
index 551fa51befb6..82be54b6f8d3 100644
--- a/fs/xfs/libxfs/xfs_inode_util.c
+++ b/fs/xfs/libxfs/xfs_inode_util.c
@@ -130,6 +130,8 @@ xfs_ip2xflags(
 
 	if (xfs_inode_has_attr_fork(ip))
 		flags |= FS_XFLAG_HASATTR;
+	if (xfs_has_asciici(ip->i_mount))
+		flags |= FS_XFLAG_CASEFOLD;
 	return flags;
 }
 
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index ed9b4846c05f..f8216f74679f 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -755,9 +755,23 @@ xfs_fileattr_set(
 	trace_xfs_ioctl_setattr(ip);
 
 	if (!fa->fsx_valid) {
-		if (fa->flags & ~(FS_IMMUTABLE_FL | FS_APPEND_FL |
-				  FS_NOATIME_FL | FS_NODUMP_FL |
-				  FS_SYNC_FL | FS_DAX_FL | FS_PROJINHERIT_FL))
+		unsigned int allowed = FS_IMMUTABLE_FL | FS_APPEND_FL |
+				       FS_NOATIME_FL | FS_NODUMP_FL |
+				       FS_SYNC_FL | FS_DAX_FL |
+				       FS_PROJINHERIT_FL;
+
+		/*
+		 * FS_CASEFOLD_FL reflects the ASCIICI superblock feature,
+		 * a read-only property. Accept it as a no-op so chattr's
+		 * RMW round-trip succeeds; reject any attempt to enable
+		 * it on a non-ASCIICI filesystem. xfs_flags2diflags()
+		 * has no clause for CASEFOLD, so the bit is dropped from
+		 * the on-disk diflags regardless.
+		 */
+		if (xfs_has_asciici(mp))
+			allowed |= FS_CASEFOLD_FL;
+
+		if (fa->flags & ~allowed)
 			return -EOPNOTSUPP;
 	}
 

-- 
2.53.0



* [PATCH v14 07/15] hfsplus: Report case sensitivity in fileattr_get
From: Chuck Lever @ 2026-05-07  8:53 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Roland Mainz
In-Reply-To: <20260507-case-sensitivity-v14-0-e62cc8200435@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

Add case sensitivity reporting to the existing hfsplus_fileattr_get()
function via the FS_XFLAG_CASEFOLD flag. HFS+ always preserves case
at rest.

Case sensitivity depends on how the volume was formatted: HFSX
volumes may be either case-sensitive or case-insensitive, indicated
by the HFSPLUS_SB_CASEFOLD superblock flag.

FS_XFLAG_CASEFOLD is read-only: FS_XFLAG_RDONLY_MASK ensures
FS_IOC_FSSETXATTR strips it. The legacy FS_IOC_SETFLAGS path in
hfsplus_fileattr_set() also allows FS_CASEFOLD_FL through its
allowlist on case-insensitive volumes so that a chattr
read-modify-write cycle does not fail with EOPNOTSUPP.
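
A minimal userspace model of that allowlist check may help make the
round-trip concrete. It is only a sketch: toy_check_setflags() and
toy_chattr_add() are hypothetical names, and the sb_casefold parameter
stands in for the HFSPLUS_SB_CASEFOLD test in the real handler.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <linux/fs.h>

/* Fallback for very old UAPI headers. */
#ifndef FS_CASEFOLD_FL
#define FS_CASEFOLD_FL 0x40000000
#endif

/*
 * Hypothetical model of the allowlist check: FS_CASEFOLD_FL is only
 * admitted when the volume itself is case-insensitive.
 */
int toy_check_setflags(uint32_t flags, bool sb_casefold)
{
	uint32_t allowed = FS_IMMUTABLE_FL | FS_APPEND_FL | FS_NODUMP_FL;

	if (sb_casefold)
		allowed |= FS_CASEFOLD_FL;
	return (flags & ~allowed) ? -EOPNOTSUPP : 0;
}

/*
 * chattr-style read-modify-write: take the flags that GETFLAGS
 * returned, add one bit, and hand the whole word back to SETFLAGS.
 */
int toy_chattr_add(uint32_t current, uint32_t add, bool sb_casefold)
{
	/* This model only adds bits other than CASEFOLD. */
	assert(!(add & FS_CASEFOLD_FL));
	return toy_check_setflags(current | add, sb_casefold);
}
```

On a case-insensitive volume the flags word read back already carries
FS_CASEFOLD_FL, so adding FS_NODUMP_FL only succeeds because the bit
is admitted; the same word is rejected when sb_casefold is false.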

Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/hfsplus/inode.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
index d05891ec492e..5565c14b4bf6 100644
--- a/fs/hfsplus/inode.c
+++ b/fs/hfsplus/inode.c
@@ -740,6 +740,7 @@ int hfsplus_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
 {
 	struct inode *inode = d_inode(dentry);
 	struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
+	struct hfsplus_sb_info *sbi = HFSPLUS_SB(inode->i_sb);
 	unsigned int flags = 0;
 
 	if (inode->i_flags & S_IMMUTABLE)
@@ -748,6 +749,8 @@ int hfsplus_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
 		flags |= FS_APPEND_FL;
 	if (hip->userflags & HFSPLUS_FLG_NODUMP)
 		flags |= FS_NODUMP_FL;
+	if (test_bit(HFSPLUS_SB_CASEFOLD, &sbi->flags))
+		flags |= FS_CASEFOLD_FL;
 
 	fileattr_fill_flags(fa, flags);
 
@@ -759,13 +762,24 @@ int hfsplus_fileattr_set(struct mnt_idmap *idmap,
 {
 	struct inode *inode = d_inode(dentry);
 	struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
+	struct hfsplus_sb_info *sbi = HFSPLUS_SB(inode->i_sb);
+	unsigned int allowed = FS_IMMUTABLE_FL | FS_APPEND_FL | FS_NODUMP_FL;
 	unsigned int new_fl = 0;
 
 	if (fileattr_has_fsx(fa))
 		return -EOPNOTSUPP;
 
+	/*
+	 * FS_CASEFOLD_FL reflects HFSPLUS_SB_CASEFOLD, a mount-time
+	 * property. Accept it as a no-op so chattr's RMW round-trip
+	 * succeeds; reject any attempt to enable it on a volume that
+	 * was not formatted case-insensitive.
+	 */
+	if (test_bit(HFSPLUS_SB_CASEFOLD, &sbi->flags))
+		allowed |= FS_CASEFOLD_FL;
+
 	/* don't silently ignore unsupported ext2 flags */
-	if (fa->flags & ~(FS_IMMUTABLE_FL|FS_APPEND_FL|FS_NODUMP_FL))
+	if (fa->flags & ~allowed)
 		return -EOPNOTSUPP;
 
 	if (fa->flags & FS_IMMUTABLE_FL)

-- 
2.53.0



* [PATCH v14 06/15] hfs: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-05-07  8:52 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Roland Mainz
In-Reply-To: <20260507-case-sensitivity-v14-0-e62cc8200435@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

Report HFS case sensitivity behavior via the FS_XFLAG_CASEFOLD
flag. HFS is always case-insensitive (using Mac OS Roman case
folding) and always preserves case at rest.

Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/hfs/dir.c    |  1 +
 fs/hfs/hfs_fs.h |  2 ++
 fs/hfs/inode.c  | 14 ++++++++++++++
 3 files changed, 17 insertions(+)

diff --git a/fs/hfs/dir.c b/fs/hfs/dir.c
index f5e7efe924e7..c4c6e1623f55 100644
--- a/fs/hfs/dir.c
+++ b/fs/hfs/dir.c
@@ -328,4 +328,5 @@ const struct inode_operations hfs_dir_inode_operations = {
 	.rmdir		= hfs_remove,
 	.rename		= hfs_rename,
 	.setattr	= hfs_inode_setattr,
+	.fileattr_get	= hfs_fileattr_get,
 };
diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
index ac0e83f77a0f..1b23448c9a48 100644
--- a/fs/hfs/hfs_fs.h
+++ b/fs/hfs/hfs_fs.h
@@ -177,6 +177,8 @@ extern int hfs_get_block(struct inode *inode, sector_t block,
 extern const struct address_space_operations hfs_aops;
 extern const struct address_space_operations hfs_btree_aops;
 
+struct file_kattr;
+int hfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
 int hfs_write_begin(const struct kiocb *iocb, struct address_space *mapping,
 		    loff_t pos, unsigned int len, struct folio **foliop,
 		    void **fsdata);
diff --git a/fs/hfs/inode.c b/fs/hfs/inode.c
index 89b33a9d46d5..f41cc261684d 100644
--- a/fs/hfs/inode.c
+++ b/fs/hfs/inode.c
@@ -18,6 +18,7 @@
 #include <linux/uio.h>
 #include <linux/xattr.h>
 #include <linux/blkdev.h>
+#include <linux/fileattr.h>
 
 #include "hfs_fs.h"
 #include "btree.h"
@@ -699,6 +700,18 @@ static int hfs_file_fsync(struct file *filp, loff_t start, loff_t end,
 	return ret;
 }
 
+int hfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+	/*
+	 * HFS compares filenames using Mac OS Roman case folding, so
+	 * lookup is always case-insensitive. Names are stored on disk
+	 * with case intact; CASENONPRESERVING stays clear.
+	 */
+	fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+	fa->flags |= FS_CASEFOLD_FL;
+	return 0;
+}
+
 static const struct file_operations hfs_file_operations = {
 	.llseek		= generic_file_llseek,
 	.read_iter	= generic_file_read_iter,
@@ -715,4 +728,5 @@ static const struct inode_operations hfs_file_inode_operations = {
 	.lookup		= hfs_file_lookup,
 	.setattr	= hfs_inode_setattr,
 	.listxattr	= generic_listxattr,
+	.fileattr_get	= hfs_fileattr_get,
 };

-- 
2.53.0



* [PATCH v14 05/15] ntfs3: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-05-07  8:52 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Roland Mainz
In-Reply-To: <20260507-case-sensitivity-v14-0-e62cc8200435@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

Report NTFS case sensitivity behavior via the FS_XFLAG_CASEFOLD
flag, which is set when the volume is mounted with the 'nocase'
option. NTFS always preserves case at rest.

Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/ntfs3/file.c    | 29 +++++++++++++++++++++++++++++
 fs/ntfs3/namei.c   |  1 +
 fs/ntfs3/ntfs_fs.h |  1 +
 3 files changed, 31 insertions(+)

diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
index b041639ab406..ad9350d7fc3f 100644
--- a/fs/ntfs3/file.c
+++ b/fs/ntfs3/file.c
@@ -180,6 +180,34 @@ long ntfs_compat_ioctl(struct file *filp, u32 cmd, unsigned long arg)
 }
 #endif
 
+/*
+ * ntfs_fileattr_get - inode_operations::fileattr_get
+ */
+int ntfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+	struct inode *inode = d_inode(dentry);
+	struct ntfs_sb_info *sbi = inode->i_sb->s_fs_info;
+
+	/* Avoid any operation if inode is bad. */
+	if (unlikely(is_bad_ni(ntfs_i(inode))))
+		return -EINVAL;
+
+	/*
+	 * NTFS preserves case (the default). Case sensitivity depends on
+	 * mount options: with "nocase", NTFS is case-insensitive;
+	 * otherwise it is case-sensitive.
+	 */
+	if (sbi->options->nocase) {
+		fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+		fa->flags |= FS_CASEFOLD_FL;
+	}
+	if (inode->i_flags & S_IMMUTABLE) {
+		fa->fsx_xflags |= FS_XFLAG_IMMUTABLE;
+		fa->flags |= FS_IMMUTABLE_FL;
+	}
+	return 0;
+}
+
 /*
  * ntfs_getattr - inode_operations::getattr
  */
@@ -1547,6 +1575,7 @@ const struct inode_operations ntfs_file_inode_operations = {
 	.get_acl	= ntfs_get_acl,
 	.set_acl	= ntfs_set_acl,
 	.fiemap		= ntfs_fiemap,
+	.fileattr_get	= ntfs_fileattr_get,
 };
 
 const struct file_operations ntfs_file_operations = {
diff --git a/fs/ntfs3/namei.c b/fs/ntfs3/namei.c
index b2af8f695e60..e159ba66a34a 100644
--- a/fs/ntfs3/namei.c
+++ b/fs/ntfs3/namei.c
@@ -518,6 +518,7 @@ const struct inode_operations ntfs_dir_inode_operations = {
 	.getattr	= ntfs_getattr,
 	.listxattr	= ntfs_listxattr,
 	.fiemap		= ntfs_fiemap,
+	.fileattr_get	= ntfs_fileattr_get,
 };
 
 const struct inode_operations ntfs_special_inode_operations = {
diff --git a/fs/ntfs3/ntfs_fs.h b/fs/ntfs3/ntfs_fs.h
index bbf3b6a1dcbe..41db22d652c4 100644
--- a/fs/ntfs3/ntfs_fs.h
+++ b/fs/ntfs3/ntfs_fs.h
@@ -529,6 +529,7 @@ bool dir_is_empty(struct inode *dir);
 extern const struct file_operations ntfs_dir_operations;
 
 /* Globals from file.c */
+int ntfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
 int ntfs_getattr(struct mnt_idmap *idmap, const struct path *path,
 		 struct kstat *stat, u32 request_mask, u32 flags);
 int ntfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,

-- 
2.53.0



* [PATCH v14 04/15] exfat: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-05-07  8:52 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Roland Mainz
In-Reply-To: <20260507-case-sensitivity-v14-0-e62cc8200435@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

Report exFAT's case sensitivity behavior via the FS_XFLAG_CASEFOLD
flag. exFAT compares names through the volume's upcase table; in
practice that table folds case, and case is preserved at rest.

Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/exfat/exfat_fs.h |  2 ++
 fs/exfat/file.c     | 18 ++++++++++++++++--
 fs/exfat/namei.c    |  1 +
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/fs/exfat/exfat_fs.h b/fs/exfat/exfat_fs.h
index 89ef5368277f..aff4dcd4e75a 100644
--- a/fs/exfat/exfat_fs.h
+++ b/fs/exfat/exfat_fs.h
@@ -496,6 +496,8 @@ int exfat_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 int exfat_getattr(struct mnt_idmap *idmap, const struct path *path,
 		  struct kstat *stat, unsigned int request_mask,
 		  unsigned int query_flags);
+struct file_kattr;
+int exfat_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
 int exfat_file_fsync(struct file *file, loff_t start, loff_t end, int datasync);
 long exfat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
 long exfat_compat_ioctl(struct file *filp, unsigned int cmd,
diff --git a/fs/exfat/file.c b/fs/exfat/file.c
index 354bdcfe4abc..91e5511945d1 100644
--- a/fs/exfat/file.c
+++ b/fs/exfat/file.c
@@ -14,6 +14,7 @@
 #include <linux/writeback.h>
 #include <linux/filelock.h>
 #include <linux/falloc.h>
+#include <linux/fileattr.h>
 
 #include "exfat_raw.h"
 #include "exfat_fs.h"
@@ -323,6 +324,18 @@ int exfat_getattr(struct mnt_idmap *idmap, const struct path *path,
 	return 0;
 }
 
+int exfat_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+	/*
+	 * exFAT compares filenames through an upcase table, so lookup
+	 * is always case-insensitive. Long names are stored in UTF-16
+	 * with case intact; CASENONPRESERVING stays clear.
+	 */
+	fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+	fa->flags |= FS_CASEFOLD_FL;
+	return 0;
+}
+
 int exfat_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 		  struct iattr *attr)
 {
@@ -817,6 +830,7 @@ const struct file_operations exfat_file_operations = {
 };
 
 const struct inode_operations exfat_file_inode_operations = {
-	.setattr     = exfat_setattr,
-	.getattr     = exfat_getattr,
+	.setattr	= exfat_setattr,
+	.getattr	= exfat_getattr,
+	.fileattr_get	= exfat_fileattr_get,
 };
diff --git a/fs/exfat/namei.c b/fs/exfat/namei.c
index 2c5636634b4a..94002e43db08 100644
--- a/fs/exfat/namei.c
+++ b/fs/exfat/namei.c
@@ -1311,4 +1311,5 @@ const struct inode_operations exfat_dir_inode_operations = {
 	.rename		= exfat_rename,
 	.setattr	= exfat_setattr,
 	.getattr	= exfat_getattr,
+	.fileattr_get	= exfat_fileattr_get,
 };

-- 
2.53.0



* [PATCH v14 03/15] fat: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-05-07  8:52 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Roland Mainz
In-Reply-To: <20260507-case-sensitivity-v14-0-e62cc8200435@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

Report FAT's case sensitivity behavior via the FS_XFLAG_CASEFOLD
and FS_XFLAG_CASENONPRESERVING flags. FAT filesystems are
case-insensitive by default.

MSDOS supports a 'nocase' mount option that enables case-sensitive
behavior; check this option when reporting case sensitivity.

VFAT long filename entries preserve case; without VFAT, only
uppercased 8.3 short names are stored. MSDOS with 'nocase' also
preserves case since the name-formatting code skips upcasing when
'nocase' is set. Check both options when reporting case preservation.
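
The decision table can be sketched as a small userspace model. This is
an illustration only: struct toy_fat_opts and toy_fat_case_xflags() are
hypothetical names, the flag values are copied from patch 02 of this
series, and the vfat 'check=strict' branch follows fat_fileattr_get()
in the diff below rather than the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from patch 02 of this series. */
#define XFLAG_CASEFOLD			0x00040000u
#define XFLAG_CASENONPRESERVING		0x00080000u

/* Hypothetical subset of the FAT mount options the handler consults. */
struct toy_fat_opts {
	bool isvfat;		/* vfat rather than msdos */
	char name_check;	/* 'r'elaxed, 'n'ormal or 's'trict */
	bool nocase;		/* msdos 'nocase' mount option */
};

uint32_t toy_fat_case_xflags(const struct toy_fat_opts *o)
{
	bool case_sensitive;
	uint32_t xflags = 0;

	assert(o);
	if (o->isvfat)
		case_sensitive = o->name_check == 's';
	else
		case_sensitive = o->nocase;

	if (!case_sensitive) {
		xflags |= XFLAG_CASEFOLD;
		/* Only plain msdos stores uppercased 8.3 names. */
		if (!o->isvfat)
			xflags |= XFLAG_CASENONPRESERVING;
	}
	return xflags;
}
```

In this model a default vfat mount yields CASEFOLD alone, a default
msdos mount yields both flags, and msdos with 'nocase' or vfat with
'check=strict' yields neither.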

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/fat/fat.h         |  3 +++
 fs/fat/file.c        | 36 ++++++++++++++++++++++++++++++++++++
 fs/fat/namei_msdos.c |  1 +
 fs/fat/namei_vfat.c  |  1 +
 4 files changed, 41 insertions(+)

diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index 5a58f0bf8ce8..99ed9228a677 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -10,6 +10,8 @@
 #include <linux/fs_context.h>
 #include <linux/fs_parser.h>
 
+struct file_kattr;
+
 /*
  * vfat shortname flags
  */
@@ -408,6 +410,7 @@ extern void fat_truncate_blocks(struct inode *inode, loff_t offset);
 extern int fat_getattr(struct mnt_idmap *idmap,
 		       const struct path *path, struct kstat *stat,
 		       u32 request_mask, unsigned int flags);
+int fat_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
 extern int fat_file_fsync(struct file *file, loff_t start, loff_t end,
 			  int datasync);
 
diff --git a/fs/fat/file.c b/fs/fat/file.c
index becccdd2e501..37e7049b4c8c 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -17,6 +17,7 @@
 #include <linux/fsnotify.h>
 #include <linux/security.h>
 #include <linux/falloc.h>
+#include <linux/fileattr.h>
 #include "fat.h"
 
 static long fat_fallocate(struct file *file, int mode,
@@ -398,6 +399,40 @@ void fat_truncate_blocks(struct inode *inode, loff_t offset)
 	fat_flush_inodes(inode->i_sb, inode, NULL);
 }
 
+int fat_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+	struct msdos_sb_info *sbi = MSDOS_SB(dentry->d_sb);
+	bool case_sensitive;
+
+	/*
+	 * FAT filesystems are case-insensitive by default. VFAT
+	 * becomes case-sensitive when mounted with 'check=strict',
+	 * which installs vfat_dentry_ops. MSDOS has no such option;
+	 * its 'nocase' mount option selects case-sensitive matching.
+	 *
+	 * VFAT long filename entries preserve case. Without VFAT, only
+	 * uppercased 8.3 short names are stored. MSDOS with 'nocase'
+	 * also preserves case.
+	 */
+	if (sbi->options.isvfat)
+		case_sensitive = sbi->options.name_check == 's';
+	else
+		case_sensitive = sbi->options.nocase;
+
+	if (!case_sensitive) {
+		fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+		fa->flags |= FS_CASEFOLD_FL;
+		if (!sbi->options.isvfat)
+			fa->fsx_xflags |= FS_XFLAG_CASENONPRESERVING;
+	}
+	if (d_inode(dentry)->i_flags & S_IMMUTABLE) {
+		fa->fsx_xflags |= FS_XFLAG_IMMUTABLE;
+		fa->flags |= FS_IMMUTABLE_FL;
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(fat_fileattr_get);
+
 int fat_getattr(struct mnt_idmap *idmap, const struct path *path,
 		struct kstat *stat, u32 request_mask, unsigned int flags)
 {
@@ -575,5 +610,6 @@ EXPORT_SYMBOL_GPL(fat_setattr);
 const struct inode_operations fat_file_inode_operations = {
 	.setattr	= fat_setattr,
 	.getattr	= fat_getattr,
+	.fileattr_get	= fat_fileattr_get,
 	.update_time	= fat_update_time,
 };
diff --git a/fs/fat/namei_msdos.c b/fs/fat/namei_msdos.c
index 4cc65f330fb7..0fd2971ad4b1 100644
--- a/fs/fat/namei_msdos.c
+++ b/fs/fat/namei_msdos.c
@@ -644,6 +644,7 @@ static const struct inode_operations msdos_dir_inode_operations = {
 	.rename		= msdos_rename,
 	.setattr	= fat_setattr,
 	.getattr	= fat_getattr,
+	.fileattr_get	= fat_fileattr_get,
 	.update_time	= fat_update_time,
 };
 
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 918b3756674c..e909447873e3 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -1185,6 +1185,7 @@ static const struct inode_operations vfat_dir_inode_operations = {
 	.rename		= vfat_rename2,
 	.setattr	= fat_setattr,
 	.getattr	= fat_getattr,
+	.fileattr_get	= fat_fileattr_get,
 	.update_time	= fat_update_time,
 };
 

-- 
2.53.0



* [PATCH v14 02/15] fs: Add case sensitivity flags to file_kattr
From: Chuck Lever @ 2026-05-07  8:52 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Darrick J. Wong, Roland Mainz
In-Reply-To: <20260507-case-sensitivity-v14-0-e62cc8200435@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

Enable upper layers such as NFSD to retrieve case sensitivity
information from file systems by adding FS_XFLAG_CASEFOLD and
FS_XFLAG_CASENONPRESERVING flags.

Filesystems report case-insensitive or case-nonpreserving behavior
by setting these flags directly in fa->fsx_xflags. The default
(flags unset) indicates POSIX semantics: case-sensitive and
case-preserving. Both flags are added to FS_XFLAG_RDONLY_MASK so
FS_IOC_FSSETXATTR silently strips them, keeping the new xflags
strictly a reporting interface. Callers that want to toggle
casefolding continue to use FS_IOC_SETFLAGS with FS_CASEFOLD_FL,
the established UAPI on filesystems that support the operation
(ext4 and f2fs on empty directories).

Case sensitivity information is exported to userspace via the
fsx_xflags field of the FS_IOC_FSGETXATTR ioctl and the fa_xflags
field of the file_getattr() system call.

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/file_attr.c           | 4 ++++
 include/linux/fileattr.h | 3 ++-
 include/uapi/linux/fs.h  | 7 +++++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/file_attr.c b/fs/file_attr.c
index f429da66a317..bfb00d256dd5 100644
--- a/fs/file_attr.c
+++ b/fs/file_attr.c
@@ -37,6 +37,8 @@ void fileattr_fill_xflags(struct file_kattr *fa, u32 xflags)
 		fa->flags |= FS_PROJINHERIT_FL;
 	if (fa->fsx_xflags & FS_XFLAG_VERITY)
 		fa->flags |= FS_VERITY_FL;
+	if (fa->fsx_xflags & FS_XFLAG_CASEFOLD)
+		fa->flags |= FS_CASEFOLD_FL;
 }
 EXPORT_SYMBOL(fileattr_fill_xflags);
 
@@ -67,6 +69,8 @@ void fileattr_fill_flags(struct file_kattr *fa, u32 flags)
 		fa->fsx_xflags |= FS_XFLAG_PROJINHERIT;
 	if (fa->flags & FS_VERITY_FL)
 		fa->fsx_xflags |= FS_XFLAG_VERITY;
+	if (fa->flags & FS_CASEFOLD_FL)
+		fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
 }
 EXPORT_SYMBOL(fileattr_fill_flags);
 
diff --git a/include/linux/fileattr.h b/include/linux/fileattr.h
index 3780904a63a6..58044b598016 100644
--- a/include/linux/fileattr.h
+++ b/include/linux/fileattr.h
@@ -16,7 +16,8 @@
 
 /* Read-only inode flags */
 #define FS_XFLAG_RDONLY_MASK \
-	(FS_XFLAG_PREALLOC | FS_XFLAG_HASATTR | FS_XFLAG_VERITY)
+	(FS_XFLAG_PREALLOC | FS_XFLAG_HASATTR | FS_XFLAG_VERITY | \
+	 FS_XFLAG_CASEFOLD | FS_XFLAG_CASENONPRESERVING)
 
 /* Flags to indicate valid value of fsx_ fields */
 #define FS_XFLAG_VALUES_MASK \
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 13f71202845e..2ea4c81df08f 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -254,6 +254,13 @@ struct file_attr {
 #define FS_XFLAG_DAX		0x00008000	/* use DAX for IO */
 #define FS_XFLAG_COWEXTSIZE	0x00010000	/* CoW extent size allocator hint */
 #define FS_XFLAG_VERITY		0x00020000	/* fs-verity enabled */
+/*
+ * Case handling flags (read-only, cannot be set via ioctl).
+ * Default (neither set) indicates POSIX semantics: case-sensitive
+ * lookups and case-preserving storage.
+ */
+#define FS_XFLAG_CASEFOLD	0x00040000	/* case-insensitive lookups */
+#define FS_XFLAG_CASENONPRESERVING 0x00080000	/* case not preserved */
 #define FS_XFLAG_HASATTR	0x80000000	/* no DIFLAG for this	*/
 
 /* the read-only stuff doesn't really belong here, but any other place is

-- 
2.53.0



* [PATCH v14 01/15] fs: Move file_kattr initialization to callers
From: Chuck Lever @ 2026-05-07  8:52 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Darrick J. Wong, Roland Mainz
In-Reply-To: <20260507-case-sensitivity-v14-0-e62cc8200435@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

fileattr_fill_xflags() and fileattr_fill_flags() memset the
entire file_kattr struct before populating select fields, so
callers cannot pre-set fields in fa->fsx_xflags without having
their values clobbered. Darrick Wong noted that a function
named "fill_xflags" touching more than xflags forces callers
to know implementation details beyond its apparent scope.

Drop the memset from both fill functions and initialize at the
entry points instead: ioctl_setflags(), ioctl_fssetxattr(),
the file_setattr() syscall, and xfs_ioc_fsgetxattra() now
declare fa with an aggregate initializer. ioctl_getflags(),
ioctl_fsgetxattr(), and the file_getattr() syscall already
aggregate-initialize fa to pass flags_valid/fsx_valid hints
into vfs_fileattr_get().

Subsequent patches rely on this so that ->fileattr_get()
handlers can set case-sensitivity flags (FS_XFLAG_CASEFOLD,
FS_XFLAG_CASENONPRESERVING) in fa->fsx_xflags before the fill
functions run.
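
The clobbering can be reproduced with a toy userspace model. This is a
sketch under stated assumptions: struct toy_kattr is a cut-down
stand-in for struct file_kattr, all toy_* names are hypothetical, and
the model follows fileattr_fill_flags(), which ORs translated bits
into fsx_xflags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy constants; the struct below is simplified for illustration. */
#define TOY_FL_NODUMP			0x00000040u
#define TOY_XFLAG_NODUMP		0x00000080u
#define TOY_XFLAG_CASENONPRESERVING	0x00080000u

struct toy_kattr {
	uint32_t flags;
	uint32_t fsx_xflags;
	bool flags_valid;
};

/* Old behaviour: zero the whole struct, then populate it. */
void toy_fill_flags_old(struct toy_kattr *fa, uint32_t flags)
{
	memset(fa, 0, sizeof(*fa));
	fa->flags_valid = true;
	fa->flags = flags;
	if (fa->flags & TOY_FL_NODUMP)
		fa->fsx_xflags |= TOY_XFLAG_NODUMP;
}

/* New behaviour: the caller zero-initializes, fill only adds bits. */
void toy_fill_flags_new(struct toy_kattr *fa, uint32_t flags)
{
	fa->flags_valid = true;
	fa->flags = flags;
	if (fa->flags & TOY_FL_NODUMP)
		fa->fsx_xflags |= TOY_XFLAG_NODUMP;
}

/* A handler that pre-sets a case flag, then calls the fill function. */
uint32_t toy_handler(void (*fill)(struct toy_kattr *, uint32_t))
{
	struct toy_kattr fa = { 0 };

	assert(fill);
	fa.fsx_xflags |= TOY_XFLAG_CASENONPRESERVING;
	fill(&fa, TOY_FL_NODUMP);
	return fa.fsx_xflags;
}
```

With the old fill function the pre-set case bit is lost and only the
translated NODUMP bit remains; with the new one both bits survive.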

Suggested-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/file_attr.c     | 12 ++++--------
 fs/xfs/xfs_ioctl.c |  2 +-
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/file_attr.c b/fs/file_attr.c
index da983e105d70..f429da66a317 100644
--- a/fs/file_attr.c
+++ b/fs/file_attr.c
@@ -15,12 +15,10 @@
  * @fa:		fileattr pointer
  * @xflags:	FS_XFLAG_* flags
  *
- * Set ->fsx_xflags, ->fsx_valid and ->flags (translated xflags).  All
- * other fields are zeroed.
+ * Set ->fsx_xflags, ->fsx_valid and ->flags (translated xflags).
  */
 void fileattr_fill_xflags(struct file_kattr *fa, u32 xflags)
 {
-	memset(fa, 0, sizeof(*fa));
 	fa->fsx_valid = true;
 	fa->fsx_xflags = xflags;
 	if (fa->fsx_xflags & FS_XFLAG_IMMUTABLE)
@@ -48,11 +46,9 @@ EXPORT_SYMBOL(fileattr_fill_xflags);
  * @flags:	FS_*_FL flags
  *
  * Set ->flags, ->flags_valid and ->fsx_xflags (translated flags).
- * All other fields are zeroed.
  */
 void fileattr_fill_flags(struct file_kattr *fa, u32 flags)
 {
-	memset(fa, 0, sizeof(*fa));
 	fa->flags_valid = true;
 	fa->flags = flags;
 	if (fa->flags & FS_SYNC_FL)
@@ -325,7 +321,7 @@ int ioctl_setflags(struct file *file, unsigned int __user *argp)
 {
 	struct mnt_idmap *idmap = file_mnt_idmap(file);
 	struct dentry *dentry = file->f_path.dentry;
-	struct file_kattr fa;
+	struct file_kattr fa = {};
 	unsigned int flags;
 	int err;
 
@@ -357,7 +353,7 @@ int ioctl_fssetxattr(struct file *file, void __user *argp)
 {
 	struct mnt_idmap *idmap = file_mnt_idmap(file);
 	struct dentry *dentry = file->f_path.dentry;
-	struct file_kattr fa;
+	struct file_kattr fa = {};
 	int err;
 
 	err = copy_fsxattr_from_user(&fa, argp);
@@ -431,7 +427,7 @@ SYSCALL_DEFINE5(file_setattr, int, dfd, const char __user *, filename,
 	struct path filepath __free(path_put) = {};
 	unsigned int lookup_flags = 0;
 	struct file_attr fattr;
-	struct file_kattr fa;
+	struct file_kattr fa = {};
 	int error;
 
 	BUILD_BUG_ON(sizeof(struct file_attr) < FILE_ATTR_SIZE_VER0);
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 46e234863644..ed9b4846c05f 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -517,7 +517,7 @@ xfs_ioc_fsgetxattra(
 	xfs_inode_t		*ip,
 	void			__user *arg)
 {
-	struct file_kattr	fa;
+	struct file_kattr	fa = {};
 
 	xfs_ilock(ip, XFS_ILOCK_SHARED);
 	xfs_fill_fsxattr(ip, XFS_ATTR_FORK, &fa);

-- 
2.53.0



* [PATCH v14 00/15] Exposing case folding behavior
From: Chuck Lever @ 2026-05-07  8:52 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Darrick J. Wong, Roland Mainz, Steve French

Christian, let's lock this one in. I will post subsequent changes
as delta patches.

Following on from:

https://lore.kernel.org/linux-nfs/20251021-zypressen-bazillus-545a44af57fd@brauner/T/#m0ba197d75b7921d994cf284f3cef3a62abb11aaa

I'm attempting to implement enough support in the Linux VFS to
enable file services like NFSD and ksmbd (and user space
equivalents) to provide the actual status of case folding support
in local file systems. The default behavior for local file systems
not explicitly supported in this series is to reflect the usual
POSIX behaviors:

  case-insensitive = false
  case-nonpreserving = false

The case-insensitivity and case-nonpreserving booleans can be
consumed immediately by NFSD. These two attributes have been part of
the NFSv3 and NFSv4 protocols for decades, in order to support NFS
client implementations on non-POSIX systems.

Support for user space file servers is why this series exposes case
folding information via a user-space API. I don't know of any other
category of user-space application that requires access to case
folding info.

The Linux NFS community has a growing interest in supporting NFS
clients on Windows and macOS platforms, where file name behavior does
not align with traditional POSIX semantics.

One example of a Windows-based NFS client is [1]. This client
implementation explicitly requires servers to report
FATTR4_WORD0_CASE_INSENSITIVE = TRUE for proper operation, a hard
requirement for Windows client interoperability because Windows
applications expect case-insensitive behavior. When an NFS client
knows the server is case-insensitive, it can avoid issuing multiple
LOOKUP/READDIR requests to search for case variants, and applications
like Win32 programs work correctly without manual workarounds or
code changes.

Even the Linux client can take advantage of this information. Trond
merged patches 4 years ago [2] that introduce support for case
insensitivity, in support of the Hammerspace NFS server. In
particular, when a client detects a case-insensitive NFS share,
negative dentry caching must be disabled (a lookup for "FILE.TXT"
failing shouldn't cache a negative entry when "file.txt" exists)
and directory change invalidation must clear all cached case-folded
file name variants.

Hammerspace servers and several other NFS server implementations
operate in multi-protocol environments, where a single file service
instance caters to both NFS and SMB clients. In those cases, things
work more smoothly for everyone when the NFS client can see and adapt
to the case folding behavior that SMB users rely on and expect. NFSD
needs to support the case-insensitivity and case-nonpreserving
booleans properly in order to participate as a first-class citizen
in such environments.

[1] https://github.com/kofemann/ms-nfs41-client

[2] https://patchwork.kernel.org/project/linux-nfs/cover/20211217203658.439352-1-trondmy@kernel.org/

---
Changes since v13:
- Address findings from sashiko (gemini-3.1):
  - ntfs3: Drop fileattr_get from symlink and special inode ops
  - nfsd: Probe nfsd_get_case_info() under kernel creds to avoid
    spurious NFS4ERR_ACCESS from per-client MAC policy

Changes since v12:
- Address findings from sashiko (gemini-3.1):
  - cifs: Restrict case-handling flags to directories per UAPI
  - nfs: Clear case caps before PATHCONF so a failed reply
    does not retain stale bits from the prior probe
  - nfsd: Document the parent-resolution corner cases of
    nfsd_get_case_info() (single-file exports, disconnected
    dentries, hardlinks) in the v3 and v4 commit messages

Changes since v11:
- isofs: Wire .fileattr_get only on directory inodes, since
  NFSD and ksmbd query casefolding on directories (Jan Kara)
- xfs, hfsplus: Drop the FS_CASEFOLD_FL fileattr_get mask;
  admit the bit through fileattr_set's allowlist instead
- Address findings from sashiko (gemini-3) and gpt-5.5:
  - cifs: Wire .fileattr_get on cifs_namespace_inode_operations
    so DFS referral / automount directories report case handling
  - fat, ntfs3: Fill FS_IMMUTABLE_FL in fileattr_get
  - hfsplus: Hide FS_CASEFOLD_FL from the legacy flags view so
    chattr round-trips do not hit the setflags whitelist
  - nfs: Clear NFS_CAP_CASE_INSENSITIVE and
    NFS_CAP_CASE_NONPRESERVING before re-OR'ing in the v3 and
    v4 probe paths so re-probe / TSM does not retain stale caps
  - nfsd: Switch nfsd_get_case_info() to errno return so
    v3 PATHCONF and v4 GETATTR can apply version-appropriate
    policy on failure
  - nfsd: Use dget_parent() in v4 case-attr probe to keep
    the parent dentry referenced across the query
  - isofs: Report FS_XFLAG_CASENONPRESERVING for map=n/map=a

Changes since v10:
- cifs: Source case-handling flags from the server's cached
  FS_ATTRIBUTE_INFORMATION reply instead of the nocase mount
  option, with a nocase fallback when the reply is absent
- Address findings from sashiko (gemini-3) and gpt-5.5:
  - nfs: Skip pathconf case bits on NFSv4 (set via FATTR4_CASE_*
    instead)
  - xfs: Hide FS_CASEFOLD_FL from the legacy flags view so
    chattr round-trips do not hit the setflags whitelist
  - ext4, f2fs: Drop redundant fileattr_get patches; the
    FS_CASEFOLD_FL translation in fileattr_fill_flags() already
    reports FS_XFLAG_CASEFOLD for casefolded directories
  - nfsd: Report FATTR4_HOMOGENEOUS = FALSE when the exported
    filesystem has a Unicode encoding, since per-directory
    casefold makes the fs-scoped case attributes inhomogeneous
  - nfsd: Document in nfsd_get_case_info() why -ENOIOCTLCMD and
    -ENOTTY are swallowed while other errors propagate
  - fat: Honor vfat 'check=strict' when reporting FS_XFLAG_CASEFOLD
  - Set FS_CASEFOLD_FL so FS_IOC_GETFLAGS reflects case-insensitive
    mount
  - isofs: Register fileattr_get on regular file and symlink inodes,
    not just directories
  - nfsd: Query NFSv4 FATTR4_CASE_* from the parent directory for
    non-directory objects, since casefold lives on the directory

Changes since v9:
- nfs: always probe PATHCONF for case caps. Default to case-
  preserving when the server does not report case_preserving
- nfsd, ksmbd: tolerate -ENOTTY from vfs_fileattr_get() so
  overlayfs exports on backing filesystems without fileattr_get
  do not fail the RPC
- xfs: map FS_XFLAG_CASEFOLD inside xfs_ip2xflags() so BULKSTAT
  and FS_IOC_FSGETXATTR report the flag consistently
- vboxsf: reject a short host reply to SHFL_INFO_VOLUME before
  trusting volinfo.properties.case_sensitive

Changes since v8:
- Rebase on v7.0-rc1

Changes since v7:
- Split file_attr initialization changes into a separate patch

Changes since v6:
- Remove the memset from vfs_fileattr_get

Changes since v5:
- Finish the conversion to FS_XFLAGs
- NFSv4 GETATTR now clears the attr mask bit if nfsd_get_case_info()
  fails

Changes since v4:
- Observe the MSDOS "nocase" mount option
- Define new FS_XFLAGs for the user API

Changes since v3:
- Change fa->case_preserving to fa_case_nonpreserving
- VFAT is case preserving
- Make new fields available to user space

Changes since v2:
- Remove unicode labels
- Replace vfs_get_case_info
- Add support for several more local file system implementations
- Add support for in-kernel SMB server

Changes since RFC:
- Use file_getattr instead of statx
- Postpone exposing Unicode version until later
- Support NTFS and ext4 in addition to FAT
- Support NFSv4 fattr4 in addition to NFSv3 PATHCONF

---
Chuck Lever (15):
      fs: Move file_kattr initialization to callers
      fs: Add case sensitivity flags to file_kattr
      fat: Implement fileattr_get for case sensitivity
      exfat: Implement fileattr_get for case sensitivity
      ntfs3: Implement fileattr_get for case sensitivity
      hfs: Implement fileattr_get for case sensitivity
      hfsplus: Report case sensitivity in fileattr_get
      xfs: Report case sensitivity in fileattr_get
      cifs: Implement fileattr_get for case sensitivity
      nfs: Implement fileattr_get for case sensitivity
      vboxsf: Implement fileattr_get for case sensitivity
      isofs: Implement fileattr_get for case sensitivity
      nfsd: Report export case-folding via NFSv3 PATHCONF
      nfsd: Implement NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING
      ksmbd: Report filesystem case sensitivity via FS_ATTRIBUTE_INFORMATION

 fs/exfat/exfat_fs.h            |  2 +
 fs/exfat/file.c                | 18 ++++++++-
 fs/exfat/namei.c               |  1 +
 fs/fat/fat.h                   |  3 ++
 fs/fat/file.c                  | 36 +++++++++++++++++
 fs/fat/namei_msdos.c           |  1 +
 fs/fat/namei_vfat.c            |  1 +
 fs/file_attr.c                 | 16 ++++----
 fs/hfs/dir.c                   |  1 +
 fs/hfs/hfs_fs.h                |  2 +
 fs/hfs/inode.c                 | 14 +++++++
 fs/hfsplus/inode.c             | 16 +++++++-
 fs/isofs/dir.c                 | 16 ++++++++
 fs/isofs/isofs.h               |  3 ++
 fs/nfs/client.c                | 21 +++++++---
 fs/nfs/inode.c                 | 15 +++++++
 fs/nfs/internal.h              |  3 ++
 fs/nfs/namespace.c             |  2 +
 fs/nfs/nfs3proc.c              |  2 +
 fs/nfs/nfs3xdr.c               |  7 +++-
 fs/nfs/nfs4proc.c              | 10 +++--
 fs/nfs/proc.c                  |  3 ++
 fs/nfs/symlink.c               |  3 ++
 fs/nfsd/nfs3proc.c             | 36 +++++++++++++----
 fs/nfsd/nfs4xdr.c              | 52 +++++++++++++++++++++++--
 fs/nfsd/vfs.c                  | 88 ++++++++++++++++++++++++++++++++++++++++++
 fs/nfsd/vfs.h                  |  3 ++
 fs/nfsd/xdr3.h                 |  4 +-
 fs/ntfs3/file.c                | 29 ++++++++++++++
 fs/ntfs3/namei.c               |  1 +
 fs/ntfs3/ntfs_fs.h             |  1 +
 fs/smb/client/cifsfs.c         | 53 +++++++++++++++++++++++++
 fs/smb/client/cifsfs.h         |  3 ++
 fs/smb/client/namespace.c      |  1 +
 fs/smb/server/smb2pdu.c        | 30 +++++++++++---
 fs/vboxsf/dir.c                |  1 +
 fs/vboxsf/file.c               |  6 ++-
 fs/vboxsf/super.c              |  7 ++++
 fs/vboxsf/utils.c              | 30 ++++++++++++++
 fs/vboxsf/vfsmod.h             |  6 +++
 fs/xfs/libxfs/xfs_inode_util.c |  2 +
 fs/xfs/xfs_ioctl.c             | 22 +++++++++--
 include/linux/fileattr.h       |  3 +-
 include/linux/nfs_fs_sb.h      |  2 +-
 include/linux/nfs_xdr.h        |  2 +
 include/uapi/linux/fs.h        |  7 ++++
 46 files changed, 536 insertions(+), 49 deletions(-)
---
base-commit: 6596a02b207886e9e00bb0161c7fd59fea53c081
change-id: 20260422-case-sensitivity-5cbffc8f1558

Best regards,
--  
Chuck Lever <chuck.lever@oracle.com>
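
For readers skimming the changelog, the relationship between the two
new xflags and the NFSv3 PATHCONF booleans can be sketched in plain
userspace C. This is only an illustration under stated assumptions: the
SKETCH_* bit values and the sketch_pathconf_case struct are hypothetical
stand-ins, not the definitions the series adds to include/uapi/linux/fs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins. The real FS_XFLAG_CASEFOLD and
 * FS_XFLAG_CASENONPRESERVING bits are defined by the series and are
 * deliberately not reproduced here.
 */
#define SKETCH_XFLAG_CASEFOLD          (UINT32_C(1) << 0)
#define SKETCH_XFLAG_CASENONPRESERVING (UINT32_C(1) << 1)

/* Hypothetical container for the two PATHCONF-style booleans. */
struct sketch_pathconf_case {
	bool case_insensitive;
	bool case_preserving;
};

/*
 * Case-insensitive when the casefold bit is set; case-preserving
 * unless the non-preserving bit is set.
 */
static struct sketch_pathconf_case sketch_case_from_xflags(uint32_t xflags)
{
	struct sketch_pathconf_case pc;

	pc.case_insensitive = (xflags & SKETCH_XFLAG_CASEFOLD) != 0;
	pc.case_preserving = (xflags & SKETCH_XFLAG_CASENONPRESERVING) == 0;
	return pc;
}
```

With no bits set, the sketch yields the case-sensitive, case-preserving
result. That appears to match the defaults described in the changelog,
though the actual policy lives in the nfs and nfsd patches.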


^ permalink raw reply

* Re: [PATCH] ext4: enable mballoc kunit tests for blocksize > PAGE_SIZE
From: Baokun Li @ 2026-05-07  8:51 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, ojaswin, ritesh.list
In-Reply-To: <20260506075900.3649944-1-libaokun@linux.alibaba.com>


Please ignore this patch.

During review, Sashiko noticed a potential out-of-bounds write when
CONFIG_TRANSPARENT_HUGEPAGE is disabled.

This has been fixed in v2 by adding a CONFIG_TRANSPARENT_HUGEPAGE check:

v2:
https://patch.msgid.link/20260507083754.1646636-1-libaokun@linux.alibaba.com


Regards,
Baokun

On 2026/5/6 15:59, Baokun Li wrote:
> With Large Block Size (LBS) support, ext4 can now use block sizes larger
> than PAGE_SIZE. The mballoc kunit tests previously skipped three test
> cases (test_mb_mark_used, test_mb_free_blocks, test_mb_mark_used_cost)
> under this configuration because the buddy cache inode's folio mapping
> order was never initialized in the test harness.
>
> The real mount path configures s_min_folio_order and s_max_folio_order
> in ext4_fill_super(), which allows ext4_set_inode_mapping_order() to
> set up the correct folio order for the buddy cache inode. The kunit
> test bypasses ext4_fill_super(), so the mapping order stayed at zero
> and __filemap_get_folio() allocated order-0 folios too small for LBS.
>
> Initialize s_min_folio_order and s_max_folio_order in mbt_init_sb_layout()
> to mirror ext4_fill_super() behavior, enabling properly sized folio
> allocations and removing the three blocksize > PAGE_SIZE skips.
>
> Signed-off-by: Baokun Li <libaokun@linux.alibaba.com>
> ---
>  fs/ext4/mballoc-test.c | 14 ++------------
>  1 file changed, 2 insertions(+), 12 deletions(-)
>
> diff --git a/fs/ext4/mballoc-test.c b/fs/ext4/mballoc-test.c
> index 90ed505fa4b1..04bc9f773d63 100644
> --- a/fs/ext4/mballoc-test.c
> +++ b/fs/ext4/mballoc-test.c
> @@ -206,6 +206,8 @@ static void mbt_init_sb_layout(struct super_block *sb,
>  	sbi->s_desc_per_block_bits =
>  		sb->s_blocksize_bits - (fls(layout->desc_size) - 1);
>  	sbi->s_desc_per_block = 1 << sbi->s_desc_per_block_bits;
> +	sbi->s_min_folio_order = get_order(sb->s_blocksize);
> +	sbi->s_max_folio_order = sbi->s_min_folio_order;
>  
>  	es->s_first_data_block = cpu_to_le32(0);
>  	es->s_blocks_count_lo = cpu_to_le32(layout->blocks_per_group *
> @@ -791,10 +793,6 @@ static void test_mb_mark_used(struct kunit *test)
>  	struct test_range ranges[TEST_RANGE_COUNT];
>  	int i;
>  
> -	/* buddy cache assumes that each page contains at least one block */
> -	if (sb->s_blocksize > PAGE_SIZE)
> -		kunit_skip(test, "blocksize exceeds pagesize");
> -
>  	bitmap = kunit_kzalloc(test, sb->s_blocksize, GFP_KERNEL);
>  	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, bitmap);
>  	buddy = kunit_kzalloc(test, sb->s_blocksize, GFP_KERNEL);
> @@ -858,10 +856,6 @@ static void test_mb_free_blocks(struct kunit *test)
>  	int i;
>  	struct test_range ranges[TEST_RANGE_COUNT];
>  
> -	/* buddy cache assumes that each page contains at least one block */
> -	if (sb->s_blocksize > PAGE_SIZE)
> -		kunit_skip(test, "blocksize exceeds pagesize");
> -
>  	bitmap = kunit_kzalloc(test, sb->s_blocksize, GFP_KERNEL);
>  	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, bitmap);
>  	buddy = kunit_kzalloc(test, sb->s_blocksize, GFP_KERNEL);
> @@ -905,10 +899,6 @@ static void test_mb_mark_used_cost(struct kunit *test)
>  	int i, j;
>  	unsigned long start, end, all = 0;
>  
> -	/* buddy cache assumes that each page contains at least one block */
> -	if (sb->s_blocksize > PAGE_SIZE)
> -		kunit_skip(test, "blocksize exceeds pagesize");
> -
>  	ret = ext4_mb_load_buddy_test(sb, TEST_GOAL_GROUP, &e4b);
>  	KUNIT_ASSERT_EQ(test, ret, 0);
>  



^ permalink raw reply

* [PATCH v2] ext4: enable mballoc kunit tests for blocksize > PAGE_SIZE
From: Baokun Li @ 2026-05-07  8:37 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, adilger.kernel, jack, yi.zhang, ojaswin, ritesh.list,
	libaokun

With Large Block Size (LBS) support, ext4 can now use block sizes larger
than PAGE_SIZE. The mballoc kunit tests previously skipped three test
cases (test_mb_mark_used, test_mb_free_blocks, test_mb_mark_used_cost)
under this configuration because the buddy cache inode's folio mapping
order was never initialized in the test harness.

The real mount path configures s_min_folio_order and s_max_folio_order
in ext4_fill_super(), which allows ext4_set_inode_mapping_order() to
set up the correct folio order for the buddy cache inode. The kunit
test bypasses ext4_fill_super(), so the mapping order stayed at zero
and __filemap_get_folio() allocated order-0 folios too small for LBS.

Initialize s_min_folio_order and s_max_folio_order in mbt_init_sb_layout()
to mirror ext4_fill_super() behavior, enabling properly sized folio
allocations and removing the three blocksize > PAGE_SIZE skips.

Add mbt_check_lbs_support() to skip tests that use
ext4_mb_load_buddy_test() when blocksize > PAGE_SIZE without
CONFIG_TRANSPARENT_HUGEPAGE.  Without THP, mapping_set_folio_order_range()
is a no-op and __filemap_get_folio() allocates order-0 folios (PAGE_SIZE),
which are too small for the buddy cache bitmap/buddy data.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun@linux.alibaba.com>
---
Changes since v1:
 * Add mbt_check_lbs_support() to skip tests when blocksize > PAGE_SIZE
   without CONFIG_TRANSPARENT_HUGEPAGE. (Reported by sashiko.)

v1: https://patch.msgid.link/20260506075900.3649944-1-libaokun@linux.alibaba.com

 fs/ext4/mballoc-test.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/mballoc-test.c b/fs/ext4/mballoc-test.c
index 90ed505fa4b1..72820bb2dcd4 100644
--- a/fs/ext4/mballoc-test.c
+++ b/fs/ext4/mballoc-test.c
@@ -206,6 +206,8 @@ static void mbt_init_sb_layout(struct super_block *sb,
 	sbi->s_desc_per_block_bits =
 		sb->s_blocksize_bits - (fls(layout->desc_size) - 1);
 	sbi->s_desc_per_block = 1 << sbi->s_desc_per_block_bits;
+	sbi->s_min_folio_order = get_order(sb->s_blocksize);
+	sbi->s_max_folio_order = sbi->s_min_folio_order;
 
 	es->s_first_data_block = cpu_to_le32(0);
 	es->s_blocks_count_lo = cpu_to_le32(layout->blocks_per_group *
@@ -781,6 +783,16 @@ test_mb_mark_used_range(struct kunit *test, struct ext4_buddy *e4b,
 	mbt_validate_group_info(test, grp, e4b->bd_info);
 }
 
+/*
+ * Skip if blocksize > PAGE_SIZE without THP.  The buddy cache folio
+ * allocation requires CONFIG_TRANSPARENT_HUGEPAGE for large blocks.
+ */
+static void mbt_check_lbs_support(struct kunit *test, struct super_block *sb)
+{
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && sb->s_blocksize > PAGE_SIZE)
+		kunit_skip(test, "blocksize > PAGE_SIZE requires CONFIG_TRANSPARENT_HUGEPAGE");
+}
+
 static void test_mb_mark_used(struct kunit *test)
 {
 	struct ext4_buddy e4b;
@@ -791,9 +803,7 @@ static void test_mb_mark_used(struct kunit *test)
 	struct test_range ranges[TEST_RANGE_COUNT];
 	int i;
 
-	/* buddy cache assumes that each page contains at least one block */
-	if (sb->s_blocksize > PAGE_SIZE)
-		kunit_skip(test, "blocksize exceeds pagesize");
+	mbt_check_lbs_support(test, sb);
 
 	bitmap = kunit_kzalloc(test, sb->s_blocksize, GFP_KERNEL);
 	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, bitmap);
@@ -858,9 +868,7 @@ static void test_mb_free_blocks(struct kunit *test)
 	int i;
 	struct test_range ranges[TEST_RANGE_COUNT];
 
-	/* buddy cache assumes that each page contains at least one block */
-	if (sb->s_blocksize > PAGE_SIZE)
-		kunit_skip(test, "blocksize exceeds pagesize");
+	mbt_check_lbs_support(test, sb);
 
 	bitmap = kunit_kzalloc(test, sb->s_blocksize, GFP_KERNEL);
 	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, bitmap);
@@ -905,9 +913,7 @@ static void test_mb_mark_used_cost(struct kunit *test)
 	int i, j;
 	unsigned long start, end, all = 0;
 
-	/* buddy cache assumes that each page contains at least one block */
-	if (sb->s_blocksize > PAGE_SIZE)
-		kunit_skip(test, "blocksize exceeds pagesize");
+	mbt_check_lbs_support(test, sb);
 
 	ret = ext4_mb_load_buddy_test(sb, TEST_GOAL_GROUP, &e4b);
 	KUNIT_ASSERT_EQ(test, ret, 0);
-- 
2.43.7
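
The folio-order arithmetic behind this patch can be sketched in
userspace C. This is a rough model, not the kernel code: the sketch_*
names are hypothetical, a 4 KiB page size is assumed, and
sketch_get_order() only imitates what get_order() computes (the smallest
order whose folio covers the given size).

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12u	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)

/* Smallest order such that (PAGE_SIZE << order) >= size, for size > 0. */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;

	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Same condition as mbt_check_lbs_support(): skip only for LBS without THP. */
static bool sketch_should_skip(size_t blocksize, bool thp_enabled)
{
	return !thp_enabled && blocksize > SKETCH_PAGE_SIZE;
}
```

Under these assumptions a 16 KiB block size gives order 2, so both
s_min_folio_order and s_max_folio_order would be 2 and the buddy cache
folio would span four pages.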


^ permalink raw reply related

* Re: [PATCH] jbd2: check for aborted handle in jbd2_journal_dirty_metadata()
From: Andreas Dilger @ 2026-05-07  7:47 UTC (permalink / raw)
  To: Deepanshu Kartikey
  Cc: tytso, jack, linux-ext4, linux-kernel,
	syzbot+98f651460e558a21baae
In-Reply-To: <20260507050605.50081-1-kartikey406@gmail.com>

On May 6, 2026, at 23:06, Deepanshu Kartikey <kartikey406@gmail.com> wrote:
> 
> jbd2_journal_dirty_metadata() unconditionally dereferences
> handle->h_transaction at function entry to obtain the journal pointer:
> 
> transaction_t *transaction = handle->h_transaction;
> journal_t *journal = transaction->t_journal;
> 
> However, h_transaction may legitimately be NULL for an aborted handle.
> The is_handle_aborted() helper in include/linux/jbd2.h explicitly
> treats !h_transaction as one of the aborted states:
> 
> if (handle->h_aborted || !handle->h_transaction)
> return 1;
> 
> Every other entry point in fs/jbd2/transaction.c
> (jbd2_journal_get_{write,undo,create}_access, jbd2_journal_extend,
> jbd2_journal_restart, jbd2_journal_stop, etc.) guards against this
> with an is_handle_aborted() check before any dereference of
> h_transaction. jbd2_journal_dirty_metadata() was missing this guard.
> 
> This is reachable from ocfs2's xattr code. ocfs2_xa_set() intentionally
> falls through to ocfs2_xa_journal_dirty() even after
> ocfs2_xa_prepare_entry() fails, on the assumption that the buffer
> needs to be journaled to record any partial modifications (see the
> comment above the out_dirty label in fs/ocfs2/xattr.c). If the failure
> was caused by the journal being aborted -- e.g. an underlying I/O
> error during a sub-operation such as __ocfs2_remove_xattr_range() --
> the handle's h_transaction has been cleared by the abort path, and
> the unconditional deref in jbd2_journal_dirty_metadata() becomes a
> NULL deref.
> 
> Reproduced by syzbot with a crafted ocfs2 image where I/O against the
> loop device backing the mount is sabotaged via LOOP_SET_STATUS64
> between two setxattr() calls, causing the second setxattr (which
> truncates an external xattr value) to abort the journal mid-flight:
> 
>  Oops: general protection fault, probably for non-canonical
>        address 0xdffffc0000000000
>  KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
>  RIP: jbd2_journal_dirty_metadata+0x4a/0xd30 fs/jbd2/transaction.c:1520
>  Call Trace:
>   ocfs2_journal_dirty+0x130/0x700 fs/ocfs2/journal.c:831
>   ocfs2_xa_journal_dirty fs/ocfs2/xattr.c:1483 [inline]
>   ocfs2_xa_set+0x15e3/0x2ec0 fs/ocfs2/xattr.c:2294
>   ocfs2_xattr_block_set+0x3e0/0x33c0 fs/ocfs2/xattr.c:3016
>   __ocfs2_xattr_set_handle+0x6b3/0xf50 fs/ocfs2/xattr.c:3418
>   ocfs2_xattr_set+0xf3f/0x13e0 fs/ocfs2/xattr.c:3681
>   __vfs_setxattr+0x43c/0x480 fs/xattr.c:218
>   ...
> 
> Fix by adding the standard is_handle_aborted() guard at the top of
> jbd2_journal_dirty_metadata() and returning -EROFS, matching the
> pattern used by every other entry point in this file.
> ocfs2_journal_dirty() already handles a non-zero return from
> jbd2_journal_dirty_metadata() correctly.
> 
> Reported-by: syzbot+98f651460e558a21baae@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=98f651460e558a21baae
> Tested-by: syzbot+98f651460e558a21baae@syzkaller.appspotmail.com
> Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>

LGTM.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>

> ---
> fs/jbd2/transaction.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index 4885903bbd10..aa0be9e9c876 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -1516,14 +1516,19 @@ void jbd2_buffer_abort_trigger(struct journal_head *jh,
>  */
> int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh)
> {
> - transaction_t *transaction = handle->h_transaction;
> - journal_t *journal = transaction->t_journal;
> + transaction_t *transaction;
> + journal_t *journal;
> struct journal_head *jh;
> int ret = 0;
> 
> + if (is_handle_aborted(handle))
> + return -EROFS;
> if (!buffer_jbd(bh))
> return -EUCLEAN;
> 
> + transaction = handle->h_transaction;
> + journal = transaction->t_journal;
> +
> /*
> * We don't grab jh reference here since the buffer must be part
> * of the running transaction.
> -- 
> 2.43.0
> 
> 


Cheers, Andreas






^ permalink raw reply

* Re: [PATCH v9 00/22] fs-verity support for XFS with post EOF merkle tree
From: Christoph Hellwig @ 2026-05-07  5:52 UTC (permalink / raw)
  To: Andrey Albershteyn
  Cc: linux-xfs, fsverity, linux-fsdevel, ebiggers, hch, linux-ext4,
	linux-f2fs-devel, linux-btrfs, linux-unionfs, djwong, david
In-Reply-To: <20260428083332.768693-1-aalbersh@kernel.org>

On Tue, Apr 28, 2026 at 10:33:06AM +0200, Andrey Albershteyn wrote:
> This series is based on v7.0 with Christoph's read ioends patchset [1].

That's not a good baseline.  We'll need it on the -rc that has everything
from the current merge window at least.  It might also interact with
the fsverity fix that just went in.

This might also be time to come up with a merge plan to figure out through
what tree(s) to merge it as there don't seem to be any maintainer
objections.


^ permalink raw reply

* [PATCH] jbd2: check for aborted handle in jbd2_journal_dirty_metadata()
From: Deepanshu Kartikey @ 2026-05-07  5:06 UTC (permalink / raw)
  To: tytso, jack
  Cc: linux-ext4, linux-kernel, Deepanshu Kartikey,
	syzbot+98f651460e558a21baae

jbd2_journal_dirty_metadata() unconditionally dereferences
handle->h_transaction at function entry to obtain the journal pointer:

	transaction_t *transaction = handle->h_transaction;
	journal_t *journal = transaction->t_journal;

However, h_transaction may legitimately be NULL for an aborted handle.
The is_handle_aborted() helper in include/linux/jbd2.h explicitly
treats !h_transaction as one of the aborted states:

	if (handle->h_aborted || !handle->h_transaction)
		return 1;

Every other entry point in fs/jbd2/transaction.c
(jbd2_journal_get_{write,undo,create}_access, jbd2_journal_extend,
jbd2_journal_restart, jbd2_journal_stop, etc.) guards against this
with an is_handle_aborted() check before any dereference of
h_transaction. jbd2_journal_dirty_metadata() was missing this guard.

This is reachable from ocfs2's xattr code. ocfs2_xa_set() intentionally
falls through to ocfs2_xa_journal_dirty() even after
ocfs2_xa_prepare_entry() fails, on the assumption that the buffer
needs to be journaled to record any partial modifications (see the
comment above the out_dirty label in fs/ocfs2/xattr.c). If the failure
was caused by the journal being aborted -- e.g. an underlying I/O
error during a sub-operation such as __ocfs2_remove_xattr_range() --
the handle's h_transaction has been cleared by the abort path, and
the unconditional deref in jbd2_journal_dirty_metadata() becomes a
NULL deref.

Reproduced by syzbot with a crafted ocfs2 image where I/O against the
loop device backing the mount is sabotaged via LOOP_SET_STATUS64
between two setxattr() calls, causing the second setxattr (which
truncates an external xattr value) to abort the journal mid-flight:

  Oops: general protection fault, probably for non-canonical
        address 0xdffffc0000000000
  KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
  RIP: jbd2_journal_dirty_metadata+0x4a/0xd30 fs/jbd2/transaction.c:1520
  Call Trace:
   ocfs2_journal_dirty+0x130/0x700 fs/ocfs2/journal.c:831
   ocfs2_xa_journal_dirty fs/ocfs2/xattr.c:1483 [inline]
   ocfs2_xa_set+0x15e3/0x2ec0 fs/ocfs2/xattr.c:2294
   ocfs2_xattr_block_set+0x3e0/0x33c0 fs/ocfs2/xattr.c:3016
   __ocfs2_xattr_set_handle+0x6b3/0xf50 fs/ocfs2/xattr.c:3418
   ocfs2_xattr_set+0xf3f/0x13e0 fs/ocfs2/xattr.c:3681
   __vfs_setxattr+0x43c/0x480 fs/xattr.c:218
   ...

Fix by adding the standard is_handle_aborted() guard at the top of
jbd2_journal_dirty_metadata() and returning -EROFS, matching the
pattern used by every other entry point in this file.
ocfs2_journal_dirty() already handles a non-zero return from
jbd2_journal_dirty_metadata() correctly.

Reported-by: syzbot+98f651460e558a21baae@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=98f651460e558a21baae
Tested-by: syzbot+98f651460e558a21baae@syzkaller.appspotmail.com
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
 fs/jbd2/transaction.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 4885903bbd10..aa0be9e9c876 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1516,14 +1516,19 @@ void jbd2_buffer_abort_trigger(struct journal_head *jh,
  */
 int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh)
 {
-	transaction_t *transaction = handle->h_transaction;
-	journal_t *journal = transaction->t_journal;
+	transaction_t *transaction;
+	journal_t *journal;
 	struct journal_head *jh;
 	int ret = 0;
 
+	if (is_handle_aborted(handle))
+		return -EROFS;
 	if (!buffer_jbd(bh))
 		return -EUCLEAN;
 
+	transaction = handle->h_transaction;
+	journal = transaction->t_journal;
+
 	/*
 	 * We don't grab jh reference here since the buffer must be part
 	 * of the running transaction.
-- 
2.43.0
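
The guard-before-dereference ordering in this patch can be modelled in
userspace C. Everything named sketch_* is a hypothetical, heavily
reduced stand-in for the jbd2 types, and sketch_is_handle_aborted()
covers only the two conditions quoted in the commit message.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for journal_t, transaction_t, handle_t. */
struct sketch_journal { int j_id; };
struct sketch_transaction { struct sketch_journal *t_journal; };
struct sketch_handle {
	struct sketch_transaction *h_transaction;
	unsigned int h_aborted;
};

/* Only the two checks quoted in the commit message are modelled. */
static int sketch_is_handle_aborted(const struct sketch_handle *handle)
{
	if (handle->h_aborted || !handle->h_transaction)
		return 1;
	return 0;
}

/* Returns 0 and stores the journal, or -EROFS for an aborted handle. */
static int sketch_dirty_metadata(const struct sketch_handle *handle,
				 struct sketch_journal **journal)
{
	if (sketch_is_handle_aborted(handle))
		return -EROFS;

	/* Safe: h_transaction is known to be non-NULL at this point. */
	*journal = handle->h_transaction->t_journal;
	return 0;
}

/* Usage helper: build a handle on the stack and run the guarded entry. */
static int sketch_probe(int aborted, int has_transaction)
{
	struct sketch_journal journal = { 1 };
	struct sketch_transaction transaction = { &journal };
	struct sketch_handle handle = {
		has_transaction ? &transaction : NULL,
		aborted ? 1u : 0u,
	};
	struct sketch_journal *out = NULL;

	return sketch_dirty_metadata(&handle, &out);
}
```

In this model, sketch_probe(0, 0) corresponds to the syzbot case: a
handle whose h_transaction is already NULL gets -EROFS instead of a
NULL dereference.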


^ permalink raw reply related

* Re: [PATCH 0/7] fix up issues from djwong/fuse4fs-fork
From: Darrick J. Wong @ 2026-05-06 16:39 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Ext4 Developers List, fuse-devel
In-Reply-To: <20260506150833.GD49070@macsyma.local>

On Wed, May 06, 2026 at 05:08:33PM +0200, Theodore Tso wrote:
> On Wed, May 06, 2026 at 07:34:13AM -0700, Darrick J. Wong wrote:
> > [cc fuse-devel]
> > 
> > TLDR for the fuse developers: Ted and I discovered a collision between
> > the upstream libfuse feature bits and the MacFUSE feature bits, which
> > causes macfuse to do the wrong thing if you try to enable symlink
> > pagecache.
> 
> This is the patch for fuse2fs and fuse4fs in e2fsprogs which works
> around the problem (tested on MacOS using macfuse 5.2.0_1 from
> MacPorts).  More details about why it was needed are in the commit
> description.
> 
> 					- Ted
> 
> From 67f1ec55a1309abead16cad883e38b798a567191 Mon Sep 17 00:00:00 2001
> From: Theodore Ts'o <tytso@mit.edu>
> Date: Wed, 6 May 2026 10:51:56 -0400
> Subject: [PATCH] fuse2fs, fuse4fs: Fix MacFuse compatibility issue
> 
> Unfortunately, MacFuse is overloading the top bits of the flags field
> in struct fuse_init_{out} for MacFuse-specific capability extensions.
> As a result, an attempt to use FUSE_CAP_CACHE_SYMLINKS when linking
> with the libfuse in MacPorts ends up enabling
> FUSE_DARWIN_CAP_ACCESS_EXT with MacFuse.  Hilarity then ensues with
> all non-privileged access failing with permission denied.
> 
> The change which is needed in MacFuse is described in a TODO(bf)
> statement:
> 
> https://github.com/macfuse/library/blob/ddb630db8327a50b6670ef5e4f5e6da82a549e99/lib/fuse_lowlevel.c#L3415
> 
> I plan to submit a bug report to MacFuse, but in the meantime, work around
> the problem by disabling the overloaded capability flags on MacOS.
> 
> Link: https://lore.kernel.org/r/20260505225635.GT7765@frogsfrogsfrogs
> Link: https://lore.kernel.org/r/20260506092858.GC49070@macsyma.local
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Looks fine for now...
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  fuse4fs/fuse4fs.c | 18 ++++++++++++++++++
>  misc/fuse2fs.c    | 18 ++++++++++++++++++
>  2 files changed, 36 insertions(+)
> 
> diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
> index 92847326..2739da92 100644
> --- a/fuse4fs/fuse4fs.c
> +++ b/fuse4fs/fuse4fs.c
> @@ -120,6 +120,24 @@
>  #endif
>  #endif /* !defined(ENODATA) */
>  
> +#ifdef __APPLE__
> +/*
> + * Sigh.... MacFuse is overloading the top bits of the flags field in
> + * struct fuse_init_{out} for MacFuse-specific capability extensions.
> + * Avoid using these fuse3 capability flags until this gets fixed in
> + * MacFUSE
> + */
> +#undef FUSE_CAP_CACHE_SYMLINKS
> +#undef FUSE_CAP_NO_OPENDIR_SUPPORT
> +#undef FUSE_CAP_EXPLICIT_INVAL_DATA
> +#undef FUSE_CAP_EXPIRE_ONLY
> +#undef FUSE_CAP_SETXATTR_EXT
> +#undef FUSE_CAP_DIRECT_IO_ALLOW_MMAP
> +#undef FUSE_CAP_PASSTHROUGH
> +#undef FUSE_CAP_NO_EXPORT_SUPPORT
> +#undef FUSE_CAP_OVER_IO_URING
> +#endif
> +
>  #define FUSE4FS_ATTR_TIMEOUT	(0.0)
>  
>  static inline uint64_t round_up(uint64_t b, unsigned int align)
> diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
> index c46cfc23..0f2a3c35 100644
> --- a/misc/fuse2fs.c
> +++ b/misc/fuse2fs.c
> @@ -116,6 +116,24 @@
>  #endif
>  #endif /* !defined(ENODATA) */
>  
> +#ifdef __APPLE__
> +/*
> + * Sigh.... MacFuse is overloading the top bits of the flags field in
> + * struct fuse_init_{out} for MacFuse-specific capability extensions.
> + * Avoid using these fuse3 capability flags until this gets fixed in
> + * MacFUSE
> + */
> +#undef FUSE_CAP_CACHE_SYMLINKS
> +#undef FUSE_CAP_NO_OPENDIR_SUPPORT
> +#undef FUSE_CAP_EXPLICIT_INVAL_DATA
> +#undef FUSE_CAP_EXPIRE_ONLY
> +#undef FUSE_CAP_SETXATTR_EXT
> +#undef FUSE_CAP_DIRECT_IO_ALLOW_MMAP
> +#undef FUSE_CAP_PASSTHROUGH
> +#undef FUSE_CAP_NO_EXPORT_SUPPORT
> +#undef FUSE_CAP_OVER_IO_URING
> +#endif
> +
>  static inline uint64_t round_up(uint64_t b, unsigned int align)
>  {
>  	unsigned int m;
> -- 
> 2.50.1 (Apple Git-155)
> 

^ permalink raw reply

* Re: [PATCH 0/7] fix up issues from djwong/fuse4fs-fork
From: Theodore Tso @ 2026-05-06 15:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Ext4 Developers List, fuse-devel
In-Reply-To: <20260506143413.GA2241589@frogsfrogsfrogs>

On Wed, May 06, 2026 at 07:34:13AM -0700, Darrick J. Wong wrote:
> [cc fuse-devel]
> 
> TLDR for the fuse developers: Ted and I discovered a collision between
> the upstream libfuse feature bits and the MacFUSE feature bits, which
> causes macfuse to do the wrong thing if you try to enable symlink
> pagecache.

This is the patch for fuse2fs and fuse4fs in e2fsprogs which works
around the problem (tested on MacOS using macfuse 5.2.0_1 from
MacPorts).  More details about why it was needed are in the commit
description.

					- Ted

From 67f1ec55a1309abead16cad883e38b798a567191 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso@mit.edu>
Date: Wed, 6 May 2026 10:51:56 -0400
Subject: [PATCH] fuse2fs, fuse4fs: Fix MacFuse compatibility issue

Unfortunately, MacFuse is overloading the top bits of the flags field
in struct fuse_init_{out} for MacFuse-specific capability extensions.
As a result, an attempt to use FUSE_CAP_CACHE_SYMLINKS when linking
with the libfuse in MacPorts ends up enabling
FUSE_DARWIN_CAP_ACCESS_EXT with MacFuse.  Hilarity then ensues with
all non-privileged access failing with permission denied.

The change which is needed in MacFuse is described in a TODO(bf)
statement:

https://github.com/macfuse/library/blob/ddb630db8327a50b6670ef5e4f5e6da82a549e99/lib/fuse_lowlevel.c#L3415

I plan to submit a bug report to MacFuse, but in the meantime, work around
the problem by disabling the overloaded capability flags on MacOS.

Link: https://lore.kernel.org/r/20260505225635.GT7765@frogsfrogsfrogs
Link: https://lore.kernel.org/r/20260506092858.GC49070@macsyma.local
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
 fuse4fs/fuse4fs.c | 18 ++++++++++++++++++
 misc/fuse2fs.c    | 18 ++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/fuse4fs/fuse4fs.c b/fuse4fs/fuse4fs.c
index 92847326..2739da92 100644
--- a/fuse4fs/fuse4fs.c
+++ b/fuse4fs/fuse4fs.c
@@ -120,6 +120,24 @@
 #endif
 #endif /* !defined(ENODATA) */
 
+#ifdef __APPLE__
+/*
+ * Sigh.... MacFuse is overloading the top bits of the flags field in
+ * struct fuse_init_{out} for MacFuse-specific capability extensions.
+ * Avoid using these fuse3 capability flags until this gets fixed in
+ * MacFUSE
+ */
+#undef FUSE_CAP_CACHE_SYMLINKS
+#undef FUSE_CAP_NO_OPENDIR_SUPPORT
+#undef FUSE_CAP_EXPLICIT_INVAL_DATA
+#undef FUSE_CAP_EXPIRE_ONLY
+#undef FUSE_CAP_SETXATTR_EXT
+#undef FUSE_CAP_DIRECT_IO_ALLOW_MMAP
+#undef FUSE_CAP_PASSTHROUGH
+#undef FUSE_CAP_NO_EXPORT_SUPPORT
+#undef FUSE_CAP_OVER_IO_URING
+#endif
+
 #define FUSE4FS_ATTR_TIMEOUT	(0.0)
 
 static inline uint64_t round_up(uint64_t b, unsigned int align)
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index c46cfc23..0f2a3c35 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -116,6 +116,24 @@
 #endif
 #endif /* !defined(ENODATA) */
 
+#ifdef __APPLE__
+/*
+ * Sigh.... MacFuse is overloading the top bits of the flags field in
+ * struct fuse_init_{out} for MacFuse-specific capability extensions.
+ * Avoid using these fuse3 capability flags until this gets fixed in
+ * MacFUSE
+ */
+#undef FUSE_CAP_CACHE_SYMLINKS
+#undef FUSE_CAP_NO_OPENDIR_SUPPORT
+#undef FUSE_CAP_EXPLICIT_INVAL_DATA
+#undef FUSE_CAP_EXPIRE_ONLY
+#undef FUSE_CAP_SETXATTR_EXT
+#undef FUSE_CAP_DIRECT_IO_ALLOW_MMAP
+#undef FUSE_CAP_PASSTHROUGH
+#undef FUSE_CAP_NO_EXPORT_SUPPORT
+#undef FUSE_CAP_OVER_IO_URING
+#endif
+
 static inline uint64_t round_up(uint64_t b, unsigned int align)
 {
 	unsigned int m;
-- 
2.50.1 (Apple Git-155)
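
Why a bare #undef is enough can be shown with a small stand-alone
sketch. It assumes that each use of a capability flag in fuse2fs and
fuse4fs is wrapped in an #ifdef on that flag. The SKETCH_* macros and
their values are hypothetical, and SKETCH_HIGH_BITS_OVERLOADED plays
the role that __APPLE__ plays in the patch.

```c
#include <stdint.h>

/* Stand-ins for libfuse capability bits; the values are illustrative. */
#define SKETCH_CAP_ASYNC_READ     (UINT32_C(1) << 0)
#define SKETCH_CAP_CACHE_SYMLINKS (UINT32_C(1) << 23)

/* Hypothetical switch standing in for __APPLE__. */
#define SKETCH_HIGH_BITS_OVERLOADED 1

#if SKETCH_HIGH_BITS_OVERLOADED
/* Pretend the flag was never provided by the headers. */
#undef SKETCH_CAP_CACHE_SYMLINKS
#endif

/* Pick the capabilities to request from those the kernel side offers. */
static uint32_t sketch_wanted_caps(uint32_t capable)
{
	uint32_t want = 0;

	if (capable & SKETCH_CAP_ASYNC_READ)
		want |= SKETCH_CAP_ASYNC_READ;
#ifdef SKETCH_CAP_CACHE_SYMLINKS
	/* Compiled out once the macro has been undefined above. */
	if (capable & SKETCH_CAP_CACHE_SYMLINKS)
		want |= SKETCH_CAP_CACHE_SYMLINKS;
#endif
	return want;
}
```

Even when every bit is offered, the sketch never requests bit 23, so the
overloaded bit cannot be switched on by accident.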


^ permalink raw reply related

* Re: [PATCH 0/7] fix up issues from djwong/fuse4fs-fork
From: Darrick J. Wong @ 2026-05-06 14:34 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Ext4 Developers List, fuse-devel
In-Reply-To: <20260506092858.GC49070@macsyma.local>

[cc fuse-devel]

TLDR for the fuse developers: Ted and I discovered a collision between
the upstream libfuse feature bits and the MacFUSE feature bits, which
causes macfuse to do the wrong thing if you try to enable symlink
pagecache.

https://lore.kernel.org/linux-ext4/20260505225635.GT7765@frogsfrogsfrogs/

On Wed, May 06, 2026 at 11:28:58AM +0200, Theodore Tso wrote:
> On Tue, May 05, 2026 at 03:56:35PM -0700, Darrick J. Wong wrote:
> > FUSE kABI 7.20 added FUSE_AUTO_INVAL_DATA, which is bit 12.  It looks to
> > me as though they decided to add their own MacOS-specific feature flags
> > at the end of the u32 want field.  Then Linux FUSE added 11 more feature
> > flags, at which point they unthinkingly ported over FUSE_CACHE_SYMLINKS,
> > which collides with FUSE_DARWIN_ACCESS_EXT.  Apparently nobody on the
> > macfuse end noticed, so on your machine you're getting whatever
> > "ACCESS_EXT" does.
> 
> Yeah....
> 
> What MacFuse needs to do is to steal some extra fields from struct
> fuse_init_in and fuse_init_out for the darwin-specific capabilities.
> It turns out it already has conn->{want,capable}_darwin, but there's
> no way to pass it in and out of op_init....
> 
> #ifdef __APPLE__
> 	/*
> 	 * TODO(bf)
> 	 *
> 	 * Resolve conflict with vanilla API. We need a separate field flags for
> 	 * Darwin-only flags. As long as we don't support anything beyond ABI
> 	 * version 7.19 on the kernel-side this should not be an issue, though.
> 	 * We need to clean this up when moving to 7.20 or later.
> 	 */
> 	if (se->conn.want_darwin & FUSE_DARWIN_CAP_ACCESS_EXT)
> 		outargflags |= FUSE_DARWIN_ACCESS_EXT;
> 
> So I *guess* what MacFuse needs to do is to do something like:
> 
> struct fuse_init_in {
> 	uint32_t	major;
> 	uint32_t	minor;
> 	uint32_t	max_readahead;
> 	uint32_t	flags;
> 	uint32_t	flags2;
> 	uint32_t	unused[9];
> 	uint32_t	darwin_flags;
> 	uint32_t	darwin_flags2;
> };
> 
> am I right in understanding that fuse_*_{in,out} is private between
> the OS's libfuse and OS's fuse driver or kernel extension, so
> it's not disastrous for fuse_kernel.h for Mac and Linux to drift?

Yeah, the easiest method would be to carve out the darwin_flags field.

If they still want to use flags/flags2, then they could probably still
fix it by redefining whichever feature was added later (probably
FUSE_CACHE_SYMLINKS). That feature is already broken on MacFUSE today,
so redefining the symbol would not be an ABI break.

Either way someone will have to talk to the MacFUSE people about this.
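
To make the collision concrete, here is a minimal sketch. The Linux
value (bit 23 for FUSE_CACHE_SYMLINKS) comes from the upstream
fuse_kernel.h; the Darwin value is an assumption inferred from this
thread, not read out of the MacFUSE headers. struct init_flags, the
DARWIN_FLAG_ACCESS_EXT bit, and both helper names are hypothetical and
only illustrate Ted's "separate darwin_flags word" idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Linux uapi value: FUSE_CACHE_SYMLINKS is bit 23 of the init flags. */
#define LINUX_FUSE_CACHE_SYMLINKS	(UINT32_C(1) << 23)

/*
 * ASSUMPTION: the thread says the Darwin flag collides with
 * FUSE_CACHE_SYMLINKS, so it is modelled here as the same bit.
 */
#define DARWIN_FUSE_ACCESS_EXT		(UINT32_C(1) << 23)

/* Hypothetical: the same capability moved into its own flags word. */
#define DARWIN_FLAG_ACCESS_EXT		(UINT32_C(1) << 0)

/* The collision itself: two names, one bit in the shared word. */
static_assert(LINUX_FUSE_CACHE_SYMLINKS == DARWIN_FUSE_ACCESS_EXT,
	      "both names decode to the same bit");

/* Hypothetical, trimmed-down init message with a separate Darwin field. */
struct init_flags {
	uint32_t flags;		/* shared with upstream libfuse */
	uint32_t darwin_flags;	/* Darwin-only capabilities */
};

/* Shared word: a receiver decoding flags with the Darwin definition. */
static inline bool shared_word_means_access_ext(uint32_t flags)
{
	return (flags & DARWIN_FUSE_ACCESS_EXT) != 0;
}

/* Separate word: only the Darwin field can request ACCESS_EXT. */
static inline bool split_word_means_access_ext(const struct init_flags *f)
{
	return (f->darwin_flags & DARWIN_FLAG_ACCESS_EXT) != 0;
}
```

With the shared word, a caller that only asked for symlink caching is
decoded as asking for ACCESS_EXT. With the split layout, the same
request leaves darwin_flags at zero and nothing is misread.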

> > Does this work?
> >
> > /* MacFUSE overlays feature bits with LinuxFUSE, this is fcked up */
> > #if defined(FUSE_CAP_CACHE_SYMLINKS) && !defined(FUSE_CACHE_SYMLINKS)
> > 	fuse_set_feature_flag(conn, FUSE_CAP_CACHE_SYMLINKS);
> > #endif
> 
> What I'm thinking about doing is adding at the beginning of
> fuse[24]fs.c:
> 
> #ifdef __APPLE__
> /*
>  * Sigh.... MacFuse is overloading the top bits of the flags field of
>  * struct fuse_init_{out} so we have to avoid using these capability
>  * flags until this gets fixed in MacFUSE
>  */
> #undef FUSE_CACHE_SYMLINKS
> #undef FUSE_NO_OPENDIR_SUPPORT
> #undef FUSE_EXPLICIT_INVAL_DATA
> #undef FUSE_MAP_ALIGNMENT
> #undef FUSE_SUBMOUNTS
> #undef FUSE_HANDLE_KILLPRIV_V2
> #undef FUSE_SETXATTR_EXT
> #undef FUSE_INIT_EXT
> #endif

That works for now.  If MacFUSE fixes this on their side, then I guess
we could turn that into a configure check.
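
For anyone following along, the effect of the #undef block can be
sketched like this. It is a standalone illustration, not code from
fuse2fs: PRETEND_MACFUSE is a hypothetical stand-in for __APPLE__ so
the sketch builds anywhere, wanted_caps() is a made-up helper, and the
capability value is only there to make the example compile.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a libfuse capability bit; the value is illustrative. */
#define FUSE_CAP_CACHE_SYMLINKS	(UINT64_C(1) << 23)

/* Hypothetical switch standing in for __APPLE__. */
#define PRETEND_MACFUSE 1

#if PRETEND_MACFUSE
/* Hide the capability so every #ifdef user below compiles out. */
#undef FUSE_CAP_CACHE_SYMLINKS
#endif

/* Returns the capability bits this build would ask the kernel for. */
static inline uint64_t wanted_caps(uint64_t capable)
{
	uint64_t want = 0;

#ifdef FUSE_CAP_CACHE_SYMLINKS
	if (capable & FUSE_CAP_CACHE_SYMLINKS)
		want |= FUSE_CAP_CACHE_SYMLINKS;
#else
	(void)capable;	/* nothing optional left to request */
#endif
	return want;
}
```

Because existing callers already guard each feature with #ifdef, hiding
the macro is enough to keep the colliding bit from ever being requested.
Setting PRETEND_MACFUSE to 0 restores the normal behavior.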

--D

> 
> 						- Ted


* [RFC v1 1/1] buffer_head: fail fast on repeated reads after I/O errors
From: Diangang Li @ 2026-05-06 13:50 UTC
  To: axboe, viro, brauner
  Cc: linux-block, linux-ext4, linux-fsdevel, changfengnan, Diangang Li
In-Reply-To: <20260506135047.2670453-1-diangangli@gmail.com>

From: Diangang Li <lidiangang@bytedance.com>

A failed buffer_head read leaves the buffer !Uptodate. If multiple
threads hit that same buffer_head, they serialize on BH_Lock and each
one re-submits the same read after the previous owner drops the lock.
If the device is slow to return the error, this can turn one bad block
into long stalls and repeated slow I/O.

Trying to remember bad LBAs in the block layer or in drivers would need a
generic per-device table with lookup, eviction, and lifetime rules. For
buffer_head users, keep the failure state with the cached buffer_head
instead.

Track non-readahead read I/O errors in buffer_head with a dedicated bit
and a failure timestamp. Update this state from the bio completion path.
Add an optional per-bdev retry window: within the window, non-readahead
submit_bh() reads complete immediately with failure for a buffer_head
that recently saw a non-readahead read error. A successful read or
rewrite clears the state.

The timestamp is recorded on the first error only, so repeated failures do
not extend the window. Once the window expires, the next read is submitted
normally and can discover that the device or media has recovered.

Configure per block device via sysfs:

  /sys/block/<disk>/read_err_retry_sec
  /sys/block/<disk>/<part>/read_err_retry_sec

Default is 0, preserving existing behavior. Disk and partition values are
independent, and values larger than MAX_JIFFY_OFFSET / HZ are rejected to
avoid jiffies overflow.

Link: https://lore.kernel.org/linux-ext4/20260325093349.630193-1-diangangli@gmail.com/
Signed-off-by: Diangang Li <lidiangang@bytedance.com>
---
 Documentation/ABI/stable/sysfs-block | 26 +++++++++++
 block/genhd.c                        | 24 ++++++++++
 block/partitions/core.c              | 24 ++++++++++
 fs/buffer.c                          | 65 ++++++++++++++++++++++++++++
 include/linux/blk_types.h            |  3 ++
 include/linux/buffer_head.h          | 10 +++++
 6 files changed, 152 insertions(+)

diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index 900b3fc4c72d0..b850f96fa048e 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -185,6 +185,32 @@ Description:
 		unsigned integer, but only "0" and "1" are valid values.
 
 
+What:		/sys/block/<disk>/read_err_retry_sec
+What:		/sys/block/<disk>/<partition>/read_err_retry_sec
+Date:		May 2026
+Contact:	linux-block@vger.kernel.org
+Description:
+		(RW) Configure the fail-fast window, in seconds, for repeated
+		buffer_head reads after read I/O errors.
+
+		The default value is 0, which disables the fail-fast behavior and
+		preserves the existing retry behavior. When this value is non-zero,
+		a buffer_head that has recently seen a non-readahead read I/O error
+		can fail another read immediately within the configured window,
+		instead of submitting another bio for the same buffer_head.
+
+		This only applies to buffer_head reads submitted through submit_bh().
+		It is not a generic block layer read retry policy, and it does not
+		affect direct I/O or non-buffer_head bio submissions.
+
+		Disk and partition attributes are independent. Setting the disk
+		attribute does not change the value for existing or future
+		partition block devices.
+
+		The maximum accepted value is MAX_JIFFY_OFFSET / HZ. Larger values
+		are rejected with -ERANGE.
+
+
 What:		/sys/block/<disk>/<partition>/alignment_offset
 Date:		April 2009
 Contact:	Martin K. Petersen <martin.petersen@oracle.com>
diff --git a/block/genhd.c b/block/genhd.c
index 7d6854fd28e95..302dce67d685c 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1159,6 +1159,28 @@ static ssize_t partscan_show(struct device *dev,
 	return sysfs_emit(buf, "%u\n", disk_has_partscan(dev_to_disk(dev)));
 }
 
+static ssize_t read_err_retry_sec_show(struct device *dev,
+				       struct device_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%lu\n",
+			  READ_ONCE(dev_to_bdev(dev)->bd_read_err_retry_sec));
+}
+
+static ssize_t read_err_retry_sec_store(struct device *dev,
+					struct device_attribute *attr,
+					const char *buf, size_t count)
+{
+	unsigned long sec;
+
+	if (kstrtoul(buf, 0, &sec))
+		return -EINVAL;
+	if (sec > MAX_JIFFY_OFFSET / HZ)
+		return -ERANGE;
+
+	WRITE_ONCE(dev_to_bdev(dev)->bd_read_err_retry_sec, sec);
+	return count;
+}
+
 static DEVICE_ATTR(range, 0444, disk_range_show, NULL);
 static DEVICE_ATTR(ext_range, 0444, disk_ext_range_show, NULL);
 static DEVICE_ATTR(removable, 0444, disk_removable_show, NULL);
@@ -1173,6 +1195,7 @@ static DEVICE_ATTR(inflight, 0444, part_inflight_show, NULL);
 static DEVICE_ATTR(badblocks, 0644, disk_badblocks_show, disk_badblocks_store);
 static DEVICE_ATTR(diskseq, 0444, diskseq_show, NULL);
 static DEVICE_ATTR(partscan, 0444, partscan_show, NULL);
+static DEVICE_ATTR_RW(read_err_retry_sec);
 
 #ifdef CONFIG_FAIL_MAKE_REQUEST
 ssize_t part_fail_show(struct device *dev,
@@ -1224,6 +1247,7 @@ static struct attribute *disk_attrs[] = {
 	&dev_attr_events_poll_msecs.attr,
 	&dev_attr_diskseq.attr,
 	&dev_attr_partscan.attr,
+	&dev_attr_read_err_retry_sec.attr,
 #ifdef CONFIG_FAIL_MAKE_REQUEST
 	&dev_attr_fail.attr,
 #endif
diff --git a/block/partitions/core.c b/block/partitions/core.c
index 5d5332ce586b6..62b4c2f70709f 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -205,6 +205,28 @@ static ssize_t part_discard_alignment_show(struct device *dev,
 	return sysfs_emit(buf, "%u\n", bdev_discard_alignment(dev_to_bdev(dev)));
 }
 
+static ssize_t read_err_retry_sec_show(struct device *dev,
+				       struct device_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%lu\n",
+			  READ_ONCE(dev_to_bdev(dev)->bd_read_err_retry_sec));
+}
+
+static ssize_t read_err_retry_sec_store(struct device *dev,
+					struct device_attribute *attr,
+					const char *buf, size_t count)
+{
+	unsigned long sec;
+
+	if (kstrtoul(buf, 0, &sec))
+		return -EINVAL;
+	if (sec > MAX_JIFFY_OFFSET / HZ)
+		return -ERANGE;
+
+	WRITE_ONCE(dev_to_bdev(dev)->bd_read_err_retry_sec, sec);
+	return count;
+}
+
 static DEVICE_ATTR(partition, 0444, part_partition_show, NULL);
 static DEVICE_ATTR(start, 0444, part_start_show, NULL);
 static DEVICE_ATTR(size, 0444, part_size_show, NULL);
@@ -213,6 +235,7 @@ static DEVICE_ATTR(alignment_offset, 0444, part_alignment_offset_show, NULL);
 static DEVICE_ATTR(discard_alignment, 0444, part_discard_alignment_show, NULL);
 static DEVICE_ATTR(stat, 0444, part_stat_show, NULL);
 static DEVICE_ATTR(inflight, 0444, part_inflight_show, NULL);
+static DEVICE_ATTR_RW(read_err_retry_sec);
 #ifdef CONFIG_FAIL_MAKE_REQUEST
 static struct device_attribute dev_attr_fail =
 	__ATTR(make-it-fail, 0644, part_fail_show, part_fail_store);
@@ -227,6 +250,7 @@ static struct attribute *part_attrs[] = {
 	&dev_attr_discard_alignment.attr,
 	&dev_attr_stat.attr,
 	&dev_attr_inflight.attr,
+	&dev_attr_read_err_retry_sec.attr,
 #ifdef CONFIG_FAIL_MAKE_REQUEST
 	&dev_attr_fail.attr,
 #endif
diff --git a/fs/buffer.c b/fs/buffer.c
index b0b3792b1496e..2a28ab6a51f0e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -920,6 +920,7 @@ static sector_t folio_init_buffers(struct folio *folio,
 			bh->b_private = NULL;
 			bh->b_bdev = bdev;
 			bh->b_blocknr = block;
+			clear_buffer_read_io_error_state(bh);
 			if (uptodate)
 				set_buffer_uptodate(bh);
 			if (block < end_block)
@@ -1503,6 +1504,7 @@ static void discard_buffer(struct buffer_head * bh)
 	lock_buffer(bh);
 	clear_buffer_dirty(bh);
 	bh->b_bdev = NULL;
+	clear_buffer_read_io_error_state(bh);
 	b_state = READ_ONCE(bh->b_state);
 	do {
 	} while (!try_cmpxchg_relaxed(&bh->b_state, &b_state,
@@ -1997,6 +1999,7 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
 		bh->b_blocknr = (iomap->addr + offset - iomap->offset) >>
 				inode->i_blkbits;
 		set_buffer_mapped(bh);
+		clear_buffer_read_io_error_state(bh);
 		return 0;
 	default:
 		WARN_ON_ONCE(1);
@@ -2663,6 +2666,33 @@ sector_t generic_block_bmap(struct address_space *mapping, sector_t block,
 }
 EXPORT_SYMBOL(generic_block_bmap);
 
+static void bh_update_io_error_state(struct buffer_head *bh, const struct bio *bio)
+{
+	const enum req_op op = bio_op(bio);
+
+	if (op != REQ_OP_READ && op != REQ_OP_WRITE)
+		return;
+
+	/*
+	 * Track non-readahead read failures (timestamped) so submit_bh() can
+	 * fail repeated reads fast. A successful read or rewrite clears the
+	 * state.
+	 */
+	if (!bio->bi_status) {
+		clear_buffer_read_io_error(bh);
+		bh->b_err_timestamp = 0;
+		return;
+	}
+
+	/* Record the first failure; don't extend the window on repeats. */
+	if (op != REQ_OP_READ || (bio->bi_opf & REQ_RAHEAD) ||
+	    buffer_read_io_error(bh))
+		return;
+
+	set_buffer_read_io_error(bh);
+	bh->b_err_timestamp = jiffies;
+}
+
 static void end_bio_bh_io_sync(struct bio *bio)
 {
 	struct buffer_head *bh = bio->bi_private;
@@ -2670,10 +2700,37 @@ static void end_bio_bh_io_sync(struct bio *bio)
 	if (unlikely(bio_flagged(bio, BIO_QUIET)))
 		set_bit(BH_Quiet, &bh->b_state);
 
+	bh_update_io_error_state(bh, bio);
+
 	bh->b_end_io(bh, !bio->bi_status);
 	bio_put(bio);
 }
 
+static bool bh_failfast_read(struct buffer_head *bh)
+{
+	unsigned long retry_sec = READ_ONCE(bh->b_bdev->bd_read_err_retry_sec);
+
+	if (!retry_sec || !buffer_read_io_error(bh))
+		return false;
+
+	/* No timestamp: treat as stale state and re-arm on the next failure. */
+	if (!bh->b_err_timestamp) {
+		clear_buffer_read_io_error(bh);
+		return false;
+	}
+
+	if (time_before(jiffies,
+			bh->b_err_timestamp + secs_to_jiffies(retry_sec))) {
+		test_set_buffer_req(bh);
+		bh->b_end_io(bh, 0);
+		return true;
+	}
+
+	clear_buffer_read_io_error(bh);
+	bh->b_err_timestamp = 0;
+	return false;
+}
+
 static void buffer_set_crypto_ctx(struct bio *bio, const struct buffer_head *bh,
 				  gfp_t gfp_mask)
 {
@@ -2702,6 +2759,14 @@ static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
 	BUG_ON(buffer_delay(bh));
 	BUG_ON(buffer_unwritten(bh));
 
+	/*
+	 * Fail fast for repeated non-readahead buffer_head reads after a recent
+	 * I/O error. This avoids serializing many callers on BH_Lock while
+	 * re-submitting the same failing read.
+	 */
+	if (op == REQ_OP_READ && !(opf & REQ_RAHEAD) && bh_failfast_read(bh))
+		return;
+
 	/*
 	 * Only clear out a write error when rewriting
 	 */
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 8808ee76e73c0..9437c471ee7d7 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -69,6 +69,9 @@ struct block_device {
 	atomic_t		bd_fsfreeze_count; /* number of freeze requests */
 	struct mutex		bd_fsfreeze_mutex; /* serialize freeze/thaw */
 
+	/* Seconds; 0 disables read fail-fast window for submit_bh(READ). */
+	unsigned long		bd_read_err_retry_sec;
+
 	struct partition_meta_info *bd_meta_info;
 	int			bd_writers;
 #ifdef CONFIG_SECURITY
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index e4939e33b4b51..3ab36429f8f38 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -29,6 +29,7 @@ enum bh_state_bits {
 	BH_Delay,	/* Buffer is not yet allocated on disk */
 	BH_Boundary,	/* Block is followed by a discontiguity */
 	BH_Write_EIO,	/* I/O error on write */
+	BH_Read_EIO,	/* I/O error on read */
 	BH_Unwritten,	/* Buffer is allocated on disk but not written */
 	BH_Quiet,	/* Buffer Error Prinks to be quiet */
 	BH_Meta,	/* Buffer contains metadata */
@@ -79,6 +80,7 @@ struct buffer_head {
 	spinlock_t b_uptodate_lock;	/* Used by the first bh in a page, to
 					 * serialise IO completion of other
 					 * buffers in the page */
+	unsigned long b_err_timestamp;	/* timestamp of last I/O error */
 };
 
 /*
@@ -132,11 +134,18 @@ BUFFER_FNS(Async_Write, async_write)
 BUFFER_FNS(Delay, delay)
 BUFFER_FNS(Boundary, boundary)
 BUFFER_FNS(Write_EIO, write_io_error)
+BUFFER_FNS(Read_EIO, read_io_error)
 BUFFER_FNS(Unwritten, unwritten)
 BUFFER_FNS(Meta, meta)
 BUFFER_FNS(Prio, prio)
 BUFFER_FNS(Defer_Completion, defer_completion)
 
+static inline void clear_buffer_read_io_error_state(struct buffer_head *bh)
+{
+	clear_buffer_read_io_error(bh);
+	bh->b_err_timestamp = 0;
+}
+
 static __always_inline void set_buffer_uptodate(struct buffer_head *bh)
 {
 	/*
@@ -411,6 +420,7 @@ map_bh(struct buffer_head *bh, struct super_block *sb, sector_t block)
 	bh->b_bdev = sb->s_bdev;
 	bh->b_blocknr = block;
 	bh->b_size = sb->s_blocksize;
+	clear_buffer_read_io_error_state(bh);
 }
 
 static inline void wait_on_buffer(struct buffer_head *bh)
-- 
2.39.5


* [RFC v1 0/1] buffer_head: fail fast on repeated reads after I/O errors
From: Diangang Li @ 2026-05-06 13:50 UTC
  To: axboe, viro, brauner
  Cc: linux-block, linux-ext4, linux-fsdevel, changfengnan, Diangang Li

From: Diangang Li <lidiangang@bytedance.com>

A production system reported hung tasks blocked for 300s+ in ext4
buffer_head paths. Hung task reports were accompanied by disk I/O errors,
but profiling showed that most individual reads completed (or failed)
within 10s, with the worst case around 60s.

At the same time, we observed a high rate of repeated reads to the same
disk LBAs. The repeated reads frequently showed latencies of several
seconds and ended with I/O errors, e.g.:

  [Tue Mar 24 14:16:24 2026] blk_update_request: I/O error, dev sdi,
      sector 10704150288 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
  [Tue Mar 24 14:16:25 2026] blk_update_request: I/O error, dev sdi,
      sector 10704488160 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
  [Tue Mar 24 14:16:26 2026] blk_update_request: I/O error, dev sdi,
      sector 10704382912 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0

We also sampled repeated-LBA latency histograms on /dev/sdi and saw that
the same error-prone LBAs were re-submitted many times with ~1-4s latency:

  LBA 10704488160 (count=22): 1-2s: 20, 2-4s: 2
  LBA 10704382912 (count=21): 1-2s: 20, 2-4s: 1
  LBA 10704150288 (count=21): 1-2s: 19, 2-4s: 2

Root cause
==========

buffer_head reads serialize I/O via BH_Lock. When one read fails, the
buffer remains !Uptodate. With multiple threads concurrently accessing
the same buffer_head, each waiter wakes up after the previous owner drops
BH_Lock, then submits the same read again and waits again. This makes the
latency grow linearly with the number of contending threads, leading to
300s+ hung tasks.

The failing I/Os are repeatedly issued to the same LBA. The observed 1s+
per-I/O latency is likely from device-side retry/error recovery. On SCSI
the driver typically retries reads several times (e.g. 5 retries in our
environment), so a single filesystem submission can easily accumulate 5s+
delay before failing. When multiple threads then re-submit the same
failing read and serialize on BH_Lock, the delay is amplified into 300s+
hung tasks.

Similar behavior exists for other devices (e.g. NVMe with multiple
internal retries).
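
The amplification described above can be written down as simple
arithmetic. This is a simplified model, not a measurement: it assumes
every waiter re-submits the failing read exactly once and that the
submissions run strictly one after another behind BH_Lock. Both helper
names are hypothetical.

```c
#include <assert.h>

/*
 * Delay seen by the waiter at a given 1-based position in the queue,
 * when each serialized submission takes per_read_sec to fail.
 */
static inline unsigned int nth_waiter_delay_sec(unsigned int position,
						unsigned int per_read_sec)
{
	return position * per_read_sec;
}

/* Number of serialized waiters needed to reach a given stall. */
static inline unsigned int waiters_for_stall(unsigned int stall_sec,
					     unsigned int per_read_sec)
{
	assert(per_read_sec > 0);
	/* Round up: a partial read still has to complete. */
	return (stall_sec + per_read_sec - 1) / per_read_sec;
}
```

Under this model, with roughly 5s per failing submission, about 60
threads contending on one buffer_head are enough for the last waiter to
cross the 300s mark.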

Example hung stacks:

  INFO: task toutiao.infra.t:3760933 blocked for more than 327 seconds.
  Call Trace:
   __schedule
   io_schedule
   __wait_on_bit_lock
   bh_uptodate_or_lock
   __read_extent_tree_block
   ext4_find_extent
   ext4_ext_map_blocks
   ext4_map_blocks
   ext4_getblk
   ext4_bread
   __ext4_read_dirblock
   dx_probe
   ext4_htree_fill_tree
   ext4_readdir
   iterate_dir
   ksys_getdents64

  INFO: task toutiao.infra.t:2724456 blocked for more than 327 seconds.
  Call Trace:
   __schedule
   io_schedule
   __wait_on_bit_lock
   ext4_read_bh_lock
   ext4_bread
   __ext4_read_dirblock
   htree_dirblock_to_tree
   ext4_htree_fill_tree
   ext4_readdir
   iterate_dir
   ksys_getdents64

This series follows an earlier ext4-only RFC and moves the policy to the
generic buffer_head path so other buffer_head users can opt in with the
same per-block-device knob.

Approach
========

Record non-readahead read failures on buffer_head (BH_Read_EIO +
b_err_timestamp). When a per-bdev retry window is configured, submit_bh()
will skip submitting another non-readahead read for a buffer_head that
already failed within the window and complete it immediately with failure.
Clear the state on successful read or rewrite so the buffer can recover
if the error is transient.

The timestamp is recorded on the first failure only, so repeated failures
do not extend the retry window. After the window expires, the next
non-readahead read is submitted normally and can discover that the device
or media has recovered.

The retry window is configured per block device:

  /sys/block/<disk>/read_err_retry_sec
  /sys/block/<disk>/<part>/read_err_retry_sec

The default value is 0, which keeps the current behavior: after a read
error, callers may keep retrying the same read. Set it to a non-zero
value to fail repeated non-readahead reads fast within the window.
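
The window logic can be modelled in userspace as follows. This is a
sketch of the idea, not the kernel code: struct bh_model, MODEL_HZ, and
the model_*() helpers are hypothetical names, time is passed in as a
plain tick counter, and the patch's extra handling of a zero timestamp
is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_HZ 1000UL	/* hypothetical tick rate for the model */

/* Userspace stand-in for the per-buffer state added by the patch. */
struct bh_model {
	bool read_io_error;		/* models BH_Read_EIO */
	unsigned long err_timestamp;	/* models b_err_timestamp, in ticks */
};

/* Wrap-safe "a is earlier than b", same idea as time_before(). */
static inline bool ticks_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Completion side: only the first failure stamps the buffer. */
static inline void model_read_done(struct bh_model *bh, bool ok,
				   unsigned long now)
{
	if (ok) {
		bh->read_io_error = false;
		bh->err_timestamp = 0;
	} else if (!bh->read_io_error) {
		bh->read_io_error = true;
		bh->err_timestamp = now;
	}
}

/* Submission side: true means "fail immediately, do not issue I/O". */
static inline bool model_fail_fast(struct bh_model *bh,
				   unsigned long retry_sec, unsigned long now)
{
	if (!retry_sec || !bh->read_io_error)
		return false;
	if (ticks_before(now, bh->err_timestamp + retry_sec * MODEL_HZ))
		return true;
	/* Window expired: forget the error and let the read go out. */
	bh->read_io_error = false;
	bh->err_timestamp = 0;
	return false;
}
```

A repeated failure inside the window leaves err_timestamp untouched, so
the window always ends retry_sec after the first error, and a retry_sec
of 0 never short-circuits a read.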

Patch summary
=============

  1) Add BH_Read_EIO and b_err_timestamp to buffer_head.
  2) Track non-readahead read failures in the submit_bh() bio completion
     path.
  3) Add per-bdev read_err_retry_sec sysfs knobs for disks and partitions.
  4) Fail repeated non-readahead submit_bh() reads fast within the
     configured window, while leaving readahead and other bio users
     unchanged.

Diangang Li (1):
  buffer_head: fail fast on repeated reads after I/O errors

 Documentation/ABI/stable/sysfs-block | 26 +++++++++++
 block/genhd.c                        | 24 ++++++++++
 block/partitions/core.c              | 24 ++++++++++
 fs/buffer.c                          | 65 ++++++++++++++++++++++++++++
 include/linux/blk_types.h            |  3 ++
 include/linux/buffer_head.h          | 10 +++++
 6 files changed, 152 insertions(+)

-- 
2.39.5


* Re: [PATCH] ext4: enable mballoc kunit tests for blocksize > PAGE_SIZE
From: Jan Kara @ 2026-05-06  9:35 UTC
  To: Baokun Li
  Cc: linux-ext4, tytso, adilger.kernel, jack, yi.zhang, ojaswin,
	ritesh.list
In-Reply-To: <20260506075900.3649944-1-libaokun@linux.alibaba.com>

On Wed 06-05-26 15:59:00, Baokun Li wrote:
> With Large Block Size (LBS) support, ext4 can now use block sizes larger
> than PAGE_SIZE. The mballoc kunit tests previously skipped three test
> cases (test_mb_mark_used, test_mb_free_blocks, test_mb_mark_used_cost)
> under this configuration because the buddy cache inode's folio mapping
> order was never initialized in the test harness.
> 
> The real mount path configures s_min_folio_order and s_max_folio_order
> in ext4_fill_super(), which allows ext4_set_inode_mapping_order() to
> set up the correct folio order for the buddy cache inode. The kunit
> test bypasses ext4_fill_super(), so the mapping order stayed at zero
> and __filemap_get_folio() allocated order-0 folios too small for LBS.
> 
> Initialize s_min_folio_order and s_max_folio_order in mbt_init_sb_layout()
> to mirror ext4_fill_super() behavior, enabling properly sized folio
> allocations and removing the three blocksize > PAGE_SIZE skips.
> 
> Signed-off-by: Baokun Li <libaokun@linux.alibaba.com>

Looks sensible. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/mballoc-test.c | 14 ++------------
>  1 file changed, 2 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/ext4/mballoc-test.c b/fs/ext4/mballoc-test.c
> index 90ed505fa4b1..04bc9f773d63 100644
> --- a/fs/ext4/mballoc-test.c
> +++ b/fs/ext4/mballoc-test.c
> @@ -206,6 +206,8 @@ static void mbt_init_sb_layout(struct super_block *sb,
>  	sbi->s_desc_per_block_bits =
>  		sb->s_blocksize_bits - (fls(layout->desc_size) - 1);
>  	sbi->s_desc_per_block = 1 << sbi->s_desc_per_block_bits;
> +	sbi->s_min_folio_order = get_order(sb->s_blocksize);
> +	sbi->s_max_folio_order = sbi->s_min_folio_order;
>  
>  	es->s_first_data_block = cpu_to_le32(0);
>  	es->s_blocks_count_lo = cpu_to_le32(layout->blocks_per_group *
> @@ -791,10 +793,6 @@ static void test_mb_mark_used(struct kunit *test)
>  	struct test_range ranges[TEST_RANGE_COUNT];
>  	int i;
>  
> -	/* buddy cache assumes that each page contains at least one block */
> -	if (sb->s_blocksize > PAGE_SIZE)
> -		kunit_skip(test, "blocksize exceeds pagesize");
> -
>  	bitmap = kunit_kzalloc(test, sb->s_blocksize, GFP_KERNEL);
>  	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, bitmap);
>  	buddy = kunit_kzalloc(test, sb->s_blocksize, GFP_KERNEL);
> @@ -858,10 +856,6 @@ static void test_mb_free_blocks(struct kunit *test)
>  	int i;
>  	struct test_range ranges[TEST_RANGE_COUNT];
>  
> -	/* buddy cache assumes that each page contains at least one block */
> -	if (sb->s_blocksize > PAGE_SIZE)
> -		kunit_skip(test, "blocksize exceeds pagesize");
> -
>  	bitmap = kunit_kzalloc(test, sb->s_blocksize, GFP_KERNEL);
>  	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, bitmap);
>  	buddy = kunit_kzalloc(test, sb->s_blocksize, GFP_KERNEL);
> @@ -905,10 +899,6 @@ static void test_mb_mark_used_cost(struct kunit *test)
>  	int i, j;
>  	unsigned long start, end, all = 0;
>  
> -	/* buddy cache assumes that each page contains at least one block */
> -	if (sb->s_blocksize > PAGE_SIZE)
> -		kunit_skip(test, "blocksize exceeds pagesize");
> -
>  	ret = ext4_mb_load_buddy_test(sb, TEST_GOAL_GROUP, &e4b);
>  	KUNIT_ASSERT_EQ(test, ret, 0);
>  
> -- 
> 2.43.7
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 0/7] fix up issues from djwong/fuse4fs-fork
From: Theodore Tso @ 2026-05-06  9:28 UTC
  To: Darrick J. Wong; +Cc: Ext4 Developers List
In-Reply-To: <20260505225635.GT7765@frogsfrogsfrogs>

On Tue, May 05, 2026 at 03:56:35PM -0700, Darrick J. Wong wrote:
> FUSE kABI 7.20 added FUSE_AUTO_INVAL_DATA, which is bit 12.  It looks to
> me as though they decided to add their own MacOS-specific feature flags
> at the end of the u32 want field.  Then Linux FUSE added 11 more feature
> flags, at which point they unthinkingly ported over FUSE_CACHE_SYMLINKS,
> which collides with FUSE_DARWIN_ACCESS_EXT.  Apparently nobody on the
> macfuse end noticed, so on your machine you're getting whatever
> "ACCESS_EXT" does.

Yeah....

What MacFuse needs to do is to steal some extra fields from struct
fuse_init_in and fuse_init_out for the darwin-specific capabilities.
It turns out it already has conn->{want,capable}_darwin, but there's
no way to pass it in and out of op_init....

#ifdef __APPLE__
	/*
	 * TODO(bf)
	 *
	 * Resolve conflict with vanilla API. We need a separate field flags for
	 * Darwin-only flags. As long as we don't support anything beyond ABI
	 * version 7.19 on the kernel-side this should not be an issue, though.
	 * We need to clean this up when moving to 7.20 or later.
	 */
	if (se->conn.want_darwin & FUSE_DARWIN_CAP_ACCESS_EXT)
		outargflags |= FUSE_DARWIN_ACCESS_EXT;

So I *guess* what MacFuse needs to do is to do something like:

struct fuse_init_in {
	uint32_t	major;
	uint32_t	minor;
	uint32_t	max_readahead;
	uint32_t	flags;
	uint32_t	flags2;
	uint32_t	unused[9];
	uint32_t	darwin_flags;
	uint32_t	darwin_flags2;
};

am I right in understanding that fuse_*_{in,out} is private between
the OS's libfuse and OS's fuse driver or kernel extension, so
it's not disastrous for fuse_kernel.h for Mac and Linux to drift?

> Does this work?
>
> /* MacFUSE overlays feature bits with LinuxFUSE, this is fcked up */
> #if defined(FUSE_CAP_CACHE_SYMLINKS) && !defined(FUSE_CACHE_SYMLINKS)
> 	fuse_set_feature_flag(conn, FUSE_CAP_CACHE_SYMLINKS);
> #endif

What I'm thinking about doing is adding at the beginning of
fuse[24]fs.c:

#ifdef __APPLE__
/*
 * Sigh.... MacFuse is overloading the top bits of the flags field of
 * struct fuse_init_{out} so we have to avoid using these capability
 * flags until this gets fixed in MacFUSE
 */
#undef FUSE_CACHE_SYMLINKS
#undef FUSE_NO_OPENDIR_SUPPORT
#undef FUSE_EXPLICIT_INVAL_DATA
#undef FUSE_MAP_ALIGNMENT
#undef FUSE_SUBMOUNTS
#undef FUSE_HANDLE_KILLPRIV_V2
#undef FUSE_SETXATTR_EXT
#undef FUSE_INIT_EXT
#endif

						- Ted


