* [PATCH v12 10/15] nfs: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-04-29 18:07 UTC
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Roland Mainz
In-Reply-To: <20260429-case-sensitivity-v12-0-8057123bebe0@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
An NFS server re-exporting an NFS mount point needs to report
the case sensitivity behavior of the underlying filesystem to
its clients. NFSD's attribute encoder obtains that information
by calling vfs_fileattr_get() on the lower filesystem, so the
NFS client must implement fileattr_get to surface what it
learned from its own server.
The NFS client already retrieves case sensitivity information
from servers during mount via PATHCONF (NFSv3) or the
FATTR4_CASE_INSENSITIVE/FATTR4_CASE_PRESERVING attributes
(NFSv4). Expose this information through fileattr_get by
reporting the FS_XFLAG_CASEFOLD and FS_XFLAG_CASENONPRESERVING
flags. NFSv2 lacks PATHCONF support, so mounts using that protocol
version default to standard POSIX behavior: case-sensitive and
case-preserving.
The pathconf operation is now invoked unconditionally for NFSv2 and
NFSv3 mounts (for NFSv2 it is a local stub that fills in defaults),
so the case-sensitivity capabilities are established even when the
user pins server->namelen with the namlen= mount option. That option
is orthogonal to case handling, and skipping PATHCONF because
namelen was already known would leave the caps unset.
The two capability bits are defined so that a clear bit always means
POSIX behavior, which inverts the polarity of the case-preserving
attribute. Most servers are case-sensitive and case-preserving,
matching "neither xflag set." NFS_CAP_CASE_INSENSITIVE
is set only when the server affirms case insensitivity, so "server
said no" and "server did not answer" both collapse to the case-
sensitive default. NFS_CAP_CASE_NONPRESERVING follows the same
pattern in the opposite direction: set only when the server affirms
that it does not preserve case, so that silence or a missing
attribute lands on the case-preserving default. The NFSv4 probe
checks res.attr_bitmask[0] to distinguish "server said false" from
"server omitted the attribute" before setting the bit.
Both capability bits are cleared at the start of each successful
probe so a remount or NFSv4 transparent state migration to a server
with different case semantics does not retain stale capabilities
from the prior probe.
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfs/client.c | 25 ++++++++++++++++++-------
fs/nfs/inode.c | 15 +++++++++++++++
fs/nfs/internal.h | 3 +++
fs/nfs/namespace.c | 2 ++
fs/nfs/nfs3proc.c | 2 ++
fs/nfs/nfs3xdr.c | 7 +++++--
fs/nfs/nfs4proc.c | 10 +++++++---
fs/nfs/proc.c | 3 +++
fs/nfs/symlink.c | 3 +++
include/linux/nfs_fs_sb.h | 2 +-
include/linux/nfs_xdr.h | 2 ++
11 files changed, 61 insertions(+), 13 deletions(-)
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index be02bb227741..7ca16fc72689 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -914,6 +914,7 @@ static void nfs_server_set_fsinfo(struct nfs_server *server,
*/
static int nfs_probe_fsinfo(struct nfs_server *server, struct nfs_fh *mntfh, struct nfs_fattr *fattr)
{
+ struct nfs_pathconf pathinfo = { };
struct nfs_fsinfo fsinfo;
struct nfs_client *clp = server->nfs_client;
int error;
@@ -933,15 +934,25 @@ static int nfs_probe_fsinfo(struct nfs_server *server, struct nfs_fh *mntfh, str
nfs_server_set_fsinfo(server, &fsinfo);
- /* Get some general file system info */
- if (server->namelen == 0) {
- struct nfs_pathconf pathinfo;
+ pathinfo.fattr = fattr;
+ nfs_fattr_init(fattr);
- pathinfo.fattr = fattr;
- nfs_fattr_init(fattr);
-
- if (clp->rpc_ops->pathconf(server, mntfh, &pathinfo) >= 0)
+ if (clp->rpc_ops->pathconf(server, mntfh, &pathinfo) >= 0) {
+ if (server->namelen == 0)
server->namelen = pathinfo.max_namelen;
+ /*
+ * Clear the bits before re-OR'ing so a remount
+ * against a server with different case semantics
+ * does not retain stale caps.
+ */
+ if (clp->rpc_ops->version < 4) {
+ server->caps &= ~(NFS_CAP_CASE_INSENSITIVE |
+ NFS_CAP_CASE_NONPRESERVING);
+ if (pathinfo.case_insensitive)
+ server->caps |= NFS_CAP_CASE_INSENSITIVE;
+ if (!pathinfo.case_preserving)
+ server->caps |= NFS_CAP_CASE_NONPRESERVING;
+ }
}
if (clp->rpc_ops->discover_trunking != NULL &&
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 98a8f0de1199..fdcbe6f2052c 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -41,6 +41,7 @@
#include <linux/freezer.h>
#include <linux/uaccess.h>
#include <linux/iversion.h>
+#include <linux/fileattr.h>
#include "nfs4_fs.h"
#include "callback.h"
@@ -1101,6 +1102,20 @@ int nfs_getattr(struct mnt_idmap *idmap, const struct path *path,
}
EXPORT_SYMBOL_GPL(nfs_getattr);
+int nfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+ struct inode *inode = d_inode(dentry);
+
+ if (nfs_server_capable(inode, NFS_CAP_CASE_INSENSITIVE)) {
+ fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+ fa->flags |= FS_CASEFOLD_FL;
+ }
+ if (nfs_server_capable(inode, NFS_CAP_CASE_NONPRESERVING))
+ fa->fsx_xflags |= FS_XFLAG_CASENONPRESERVING;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(nfs_fileattr_get);
+
static void nfs_init_lock_context(struct nfs_lock_context *l_ctx)
{
refcount_set(&l_ctx->count, 1);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index fc5456377160..309d3f679bb3 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -449,6 +449,9 @@ extern void nfs_set_cache_invalid(struct inode *inode, unsigned long flags);
extern bool nfs_check_cache_invalid(struct inode *, unsigned long);
extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode);
+struct file_kattr;
+int nfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
+
#if IS_ENABLED(CONFIG_NFS_LOCALIO)
/* localio.c */
struct nfs_local_dio {
diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
index af9be0c5f516..6d0073c24771 100644
--- a/fs/nfs/namespace.c
+++ b/fs/nfs/namespace.c
@@ -246,11 +246,13 @@ nfs_namespace_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
const struct inode_operations nfs_mountpoint_inode_operations = {
.getattr = nfs_getattr,
.setattr = nfs_setattr,
+ .fileattr_get = nfs_fileattr_get,
};
const struct inode_operations nfs_referral_inode_operations = {
.getattr = nfs_namespace_getattr,
.setattr = nfs_namespace_setattr,
+ .fileattr_get = nfs_fileattr_get,
};
static void nfs_expire_automounts(struct work_struct *work)
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index 95d7cd564b74..b80d0c5efc27 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -1053,6 +1053,7 @@ static const struct inode_operations nfs3_dir_inode_operations = {
.permission = nfs_permission,
.getattr = nfs_getattr,
.setattr = nfs_setattr,
+ .fileattr_get = nfs_fileattr_get,
#ifdef CONFIG_NFS_V3_ACL
.listxattr = nfs3_listxattr,
.get_inode_acl = nfs3_get_acl,
@@ -1064,6 +1065,7 @@ static const struct inode_operations nfs3_file_inode_operations = {
.permission = nfs_permission,
.getattr = nfs_getattr,
.setattr = nfs_setattr,
+ .fileattr_get = nfs_fileattr_get,
#ifdef CONFIG_NFS_V3_ACL
.listxattr = nfs3_listxattr,
.get_inode_acl = nfs3_get_acl,
diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index e17d72908412..e745e78faab0 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -2276,8 +2276,11 @@ static int decode_pathconf3resok(struct xdr_stream *xdr,
if (unlikely(!p))
return -EIO;
result->max_link = be32_to_cpup(p++);
- result->max_namelen = be32_to_cpup(p);
- /* ignore remaining fields */
+ result->max_namelen = be32_to_cpup(p++);
+ p++; /* ignore no_trunc */
+ p++; /* ignore chown_restricted */
+ result->case_insensitive = be32_to_cpup(p++) != 0;
+ result->case_preserving = be32_to_cpup(p) != 0;
return 0;
}
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index d839a97df822..62f66684fbc8 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3933,7 +3933,8 @@ static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *f
server->caps &=
~(NFS_CAP_ACLS | NFS_CAP_HARDLINKS | NFS_CAP_SYMLINKS |
NFS_CAP_SECURITY_LABEL | NFS_CAP_FS_LOCATIONS |
- NFS_CAP_OPEN_XOR | NFS_CAP_DELEGTIME);
+ NFS_CAP_OPEN_XOR | NFS_CAP_DELEGTIME |
+ NFS_CAP_CASE_INSENSITIVE | NFS_CAP_CASE_NONPRESERVING);
server->fattr_valid = NFS_ATTR_FATTR_V4;
if (res.attr_bitmask[0] & FATTR4_WORD0_ACL &&
res.acl_bitmask & ACL4_SUPPORT_ALLOW_ACL)
@@ -3944,8 +3945,9 @@ static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *f
server->caps |= NFS_CAP_SYMLINKS;
if (res.case_insensitive)
server->caps |= NFS_CAP_CASE_INSENSITIVE;
- if (res.case_preserving)
- server->caps |= NFS_CAP_CASE_PRESERVING;
+ if ((res.attr_bitmask[0] & FATTR4_WORD0_CASE_PRESERVING) &&
+ !res.case_preserving)
+ server->caps |= NFS_CAP_CASE_NONPRESERVING;
#ifdef CONFIG_NFS_V4_SECURITY_LABEL
if (res.attr_bitmask[2] & FATTR4_WORD2_SECURITY_LABEL)
server->caps |= NFS_CAP_SECURITY_LABEL;
@@ -10598,6 +10600,7 @@ static const struct inode_operations nfs4_dir_inode_operations = {
.getattr = nfs_getattr,
.setattr = nfs_setattr,
.listxattr = nfs4_listxattr,
+ .fileattr_get = nfs_fileattr_get,
};
static const struct inode_operations nfs4_file_inode_operations = {
@@ -10605,6 +10608,7 @@ static const struct inode_operations nfs4_file_inode_operations = {
.getattr = nfs_getattr,
.setattr = nfs_setattr,
.listxattr = nfs4_listxattr,
+ .fileattr_get = nfs_fileattr_get,
};
static struct nfs_server *nfs4_clone_server(struct nfs_server *source,
diff --git a/fs/nfs/proc.c b/fs/nfs/proc.c
index 70795684b8e8..03c2c1f31be9 100644
--- a/fs/nfs/proc.c
+++ b/fs/nfs/proc.c
@@ -598,6 +598,7 @@ nfs_proc_pathconf(struct nfs_server *server, struct nfs_fh *fhandle,
{
info->max_link = 0;
info->max_namelen = NFS2_MAXNAMLEN;
+ info->case_preserving = true;
return 0;
}
@@ -718,12 +719,14 @@ static const struct inode_operations nfs_dir_inode_operations = {
.permission = nfs_permission,
.getattr = nfs_getattr,
.setattr = nfs_setattr,
+ .fileattr_get = nfs_fileattr_get,
};
static const struct inode_operations nfs_file_inode_operations = {
.permission = nfs_permission,
.getattr = nfs_getattr,
.setattr = nfs_setattr,
+ .fileattr_get = nfs_fileattr_get,
};
const struct nfs_rpc_ops nfs_v2_clientops = {
diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
index 58146e935402..74a072896f8d 100644
--- a/fs/nfs/symlink.c
+++ b/fs/nfs/symlink.c
@@ -22,6 +22,8 @@
#include <linux/mm.h>
#include <linux/string.h>
+#include "internal.h"
+
/* Symlink caching in the page cache is even more simplistic
* and straight-forward than readdir caching.
*/
@@ -74,4 +76,5 @@ const struct inode_operations nfs_symlink_inode_operations = {
.get_link = nfs_get_link,
.getattr = nfs_getattr,
.setattr = nfs_setattr,
+ .fileattr_get = nfs_fileattr_get,
};
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 4daee27fa5eb..34d294774f8c 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -306,7 +306,7 @@ struct nfs_server {
#define NFS_CAP_ATOMIC_OPEN (1U << 4)
#define NFS_CAP_LGOPEN (1U << 5)
#define NFS_CAP_CASE_INSENSITIVE (1U << 6)
-#define NFS_CAP_CASE_PRESERVING (1U << 7)
+#define NFS_CAP_CASE_NONPRESERVING (1U << 7)
#define NFS_CAP_REBOOT_LAYOUTRETURN (1U << 8)
#define NFS_CAP_OFFLOAD_STATUS (1U << 9)
#define NFS_CAP_ZERO_RANGE (1U << 10)
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index ff1f12aa73d2..7c2057e40f99 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -182,6 +182,8 @@ struct nfs_pathconf {
struct nfs_fattr *fattr; /* Post-op attributes */
__u32 max_link; /* max # of hard links */
__u32 max_namelen; /* max name length */
+ bool case_insensitive;
+ bool case_preserving;
};
struct nfs4_change_info {
--
2.53.0
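
The NFSv3 decode and capability polarity described in the patch above
can be sketched in standalone C. This is a minimal userspace
illustration, not code from the series: struct pathconf_sketch,
decode_pathconf_sketch() and apply_case_caps() are hypothetical names,
the buffer is assumed to hold only the six 32-bit words that follow the
post-op attributes in a PATHCONF3resok, and only the two bit positions
are taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions mirror the patch; all other names are illustrative. */
#define CAP_CASE_INSENSITIVE   (1U << 6)
#define CAP_CASE_NONPRESERVING (1U << 7)

struct pathconf_sketch {
	uint32_t max_link;
	uint32_t max_namelen;
	bool case_insensitive;
	bool case_preserving;
};

/* Read one big-endian 32-bit XDR word. */
static uint32_t xdr_word(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Decode six consecutive XDR words in RFC 1813 order: linkmax,
 * name_max, no_trunc, chown_restricted, case_insensitive,
 * case_preserving. The two middle words are skipped.
 */
void decode_pathconf_sketch(const unsigned char *buf,
			    struct pathconf_sketch *out)
{
	assert(buf != NULL && out != NULL);
	out->max_link = xdr_word(buf);
	out->max_namelen = xdr_word(buf + 4);
	out->case_insensitive = xdr_word(buf + 16) != 0;
	out->case_preserving = xdr_word(buf + 20) != 0;
}

/*
 * Clear both bits first so stale state never survives a re-probe,
 * then set each bit only when the server deviates from POSIX.
 */
unsigned int apply_case_caps(unsigned int caps,
			     const struct pathconf_sketch *pc)
{
	caps &= ~(CAP_CASE_INSENSITIVE | CAP_CASE_NONPRESERVING);
	if (pc->case_insensitive)
		caps |= CAP_CASE_INSENSITIVE;
	if (!pc->case_preserving)
		caps |= CAP_CASE_NONPRESERVING;
	return caps;
}
```

With this shape, a server that answers "case-sensitive, case-preserving"
and a server that gives no usable answer both end up with neither bit
set, which is the POSIX default the commit message describes.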
* [PATCH v12 09/15] cifs: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-04-29 18:07 UTC
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Steve French, Roland Mainz
In-Reply-To: <20260429-case-sensitivity-v12-0-8057123bebe0@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
Upper layers such as NFSD need a way to query whether a filesystem
handles filenames in a case-sensitive manner. Report CIFS/SMB case
handling behavior via FS_XFLAG_CASEFOLD and
FS_XFLAG_CASENONPRESERVING.
The authoritative source is the server itself: at mount time CIFS
issues QueryFSInfo(FS_ATTRIBUTE_INFORMATION) and caches the reply
on the tcon. That reply carries FILE_CASE_SENSITIVE_SEARCH and
FILE_CASE_PRESERVED_NAMES, which reflect whatever case handling
the share actually implements after SMB3.1.1 POSIX extensions
negotiation. Translating those two bits into the VFS flags lets
cifs_fileattr_get report what the server advertises rather than
what the client was asked to pretend.
QueryFSInfo is best-effort; the mount completes even if the server
does not answer. MaxPathNameComponentLength is zero in that case
and is used as the "no reply received" sentinel. When no reply is
available, fall back to the nocase mount option so that the reported
behavior agrees with the dentry comparison operations installed on
the superblock.
The callback is registered on cifs_dir_inode_ops so that NFSD,
ksmbd, and other consumers querying case handling against a
directory get a definitive answer, and on cifs_file_inode_ops to
preserve FS_COMPR_FL reporting on regular files. cifs_set_ops()
also installs cifs_namespace_inode_operations on DFS referral
directories that carry IS_AUTOMOUNT; register the same callback
there so the answer does not depend on whether the directory is
a referral point.
Registering fileattr_get routes FS_IOC_GETFLAGS through
vfs_fileattr_get() and short-circuits the syscall's fallback to
cifs_ioctl(). That fallback invoked CIFSGetExtAttr() under
CONFIG_CIFS_POSIX and CONFIG_CIFS_ALLOW_INSECURE_LEGACY on servers
advertising CIFS_UNIX_EXTATTR_CAP, surfacing the SMB1 Unix-extension
immutable, append, and nodump bits. cifs_fileattr_get carries over
only FS_COMPR_FL from cached cifsAttrs; the SMB1 extattr fetch is
not reproduced. SMB1 is deprecated, and acquiring a netfid from
within a dentry-only callback is not worth preserving a path tied
to an insecure legacy dialect.
Acked-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/smb/client/cifsfs.c | 41 +++++++++++++++++++++++++++++++++++++++++
fs/smb/client/cifsfs.h | 3 +++
fs/smb/client/namespace.c | 1 +
3 files changed, 45 insertions(+)
diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index 2025739f070a..0912d74e32de 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -30,6 +30,7 @@
#include <linux/xattr.h>
#include <linux/mm.h>
#include <linux/key-type.h>
+#include <linux/fileattr.h>
#include <uapi/linux/magic.h>
#include <net/ipv6.h>
#include "cifsfs.h"
@@ -1199,6 +1200,44 @@ struct file_system_type smb3_fs_type = {
MODULE_ALIAS_FS("smb3");
MODULE_ALIAS("smb3");
+int cifs_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+ struct cifs_sb_info *cifs_sb = CIFS_SB(dentry->d_sb);
+ struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);
+ u32 attrs = le32_to_cpu(tcon->fsAttrInfo.Attributes);
+
+ /* Preserve FS_COMPR_FL previously reported by cifs_ioctl(). */
+ if (CIFS_I(d_inode(dentry))->cifsAttrs & ATTR_COMPRESSED)
+ fa->flags |= FS_COMPR_FL;
+
+ /*
+ * The server's FS_ATTRIBUTE_INFORMATION response, cached on
+ * the tcon at mount, reflects the share's case-handling
+ * semantics after any POSIX extensions negotiation. Prefer
+ * it over the client-local nocase mount option, which only
+ * governs dentry comparison on this superblock.
+ *
+ * QueryFSInfo is best-effort at mount; when it did not
+ * populate fsAttrInfo, MaxPathNameComponentLength remains
+ * zero. In that case fall back to nocase so the reporting
+ * matches the comparison behavior installed on the sb.
+ */
+ if (le32_to_cpu(tcon->fsAttrInfo.MaxPathNameComponentLength) == 0) {
+ if (tcon->nocase) {
+ fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+ fa->flags |= FS_CASEFOLD_FL;
+ }
+ return 0;
+ }
+ if (!(attrs & FILE_CASE_SENSITIVE_SEARCH)) {
+ fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+ fa->flags |= FS_CASEFOLD_FL;
+ }
+ if (!(attrs & FILE_CASE_PRESERVED_NAMES))
+ fa->fsx_xflags |= FS_XFLAG_CASENONPRESERVING;
+ return 0;
+}
+
const struct inode_operations cifs_dir_inode_ops = {
.create = cifs_create,
.atomic_open = cifs_atomic_open,
@@ -1217,6 +1256,7 @@ const struct inode_operations cifs_dir_inode_ops = {
.listxattr = cifs_listxattr,
.get_acl = cifs_get_acl,
.set_acl = cifs_set_acl,
+ .fileattr_get = cifs_fileattr_get,
};
const struct inode_operations cifs_file_inode_ops = {
@@ -1227,6 +1267,7 @@ const struct inode_operations cifs_file_inode_ops = {
.fiemap = cifs_fiemap,
.get_acl = cifs_get_acl,
.set_acl = cifs_set_acl,
+ .fileattr_get = cifs_fileattr_get,
};
const char *cifs_get_link(struct dentry *dentry, struct inode *inode,
diff --git a/fs/smb/client/cifsfs.h b/fs/smb/client/cifsfs.h
index 7370b38da938..5f0d459d1a89 100644
--- a/fs/smb/client/cifsfs.h
+++ b/fs/smb/client/cifsfs.h
@@ -89,6 +89,9 @@ extern const struct inode_operations cifs_file_inode_ops;
extern const struct inode_operations cifs_symlink_inode_ops;
extern const struct inode_operations cifs_namespace_inode_operations;
+struct file_kattr;
+int cifs_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
+
/* Functions related to files and directories */
extern const struct netfs_request_ops cifs_req_ops;
diff --git a/fs/smb/client/namespace.c b/fs/smb/client/namespace.c
index 52a520349cb7..52a51b032fae 100644
--- a/fs/smb/client/namespace.c
+++ b/fs/smb/client/namespace.c
@@ -294,4 +294,5 @@ struct vfsmount *cifs_d_automount(struct path *path)
}
const struct inode_operations cifs_namespace_inode_operations = {
+ .fileattr_get = cifs_fileattr_get,
};
--
2.53.0
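
The decision order in cifs_fileattr_get() above (server reply first,
nocase mount option only as a fallback) can be sketched as a pure
function. This is a hedged userspace illustration: the function name
and the SKETCH_* output bits are hypothetical placeholders rather than
the real FS_XFLAG_* values, while the two FILE_* input bits use the
values defined by MS-FSCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* FileFsAttributeInformation bits as defined by MS-FSCC. */
#define FILE_CASE_SENSITIVE_SEARCH 0x00000001u
#define FILE_CASE_PRESERVED_NAMES  0x00000002u

/* Placeholder result bits; NOT the kernel's FS_XFLAG_* values. */
#define SKETCH_CASEFOLD          0x1u
#define SKETCH_CASENONPRESERVING 0x2u

/*
 * attrs and max_component_len stand in for the cached
 * fsAttrInfo.Attributes and MaxPathNameComponentLength fields,
 * already converted to host byte order.
 */
unsigned int cifs_case_flags_sketch(uint32_t attrs,
				    uint32_t max_component_len,
				    bool nocase)
{
	unsigned int out = 0;

	if (max_component_len == 0) {
		/* No QueryFSInfo reply: mirror the nocase mount option. */
		if (nocase)
			out |= SKETCH_CASEFOLD;
		return out;
	}
	/* A reply exists: the server's bits win over the mount option. */
	if (!(attrs & FILE_CASE_SENSITIVE_SEARCH))
		out |= SKETCH_CASEFOLD;
	if (!(attrs & FILE_CASE_PRESERVED_NAMES))
		out |= SKETCH_CASENONPRESERVING;
	return out;
}
```

Note that the fallback branch can never report non-preserving
behavior, because the nocase option says nothing about how names are
stored.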
* [PATCH v12 08/15] xfs: Report case sensitivity in fileattr_get
From: Chuck Lever @ 2026-04-29 18:07 UTC
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Roland Mainz
In-Reply-To: <20260429-case-sensitivity-v12-0-8057123bebe0@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
Upper layers such as NFSD need to query whether a filesystem
is case-sensitive. Add FS_XFLAG_CASEFOLD to xfs_ip2xflags()
when the filesystem is formatted with the ASCIICI feature
flag. This serves both FS_IOC_FSGETXATTR (via xfs_fill_fsxattr()
in xfs_fileattr_get()) and XFS_IOC_BULKSTAT (which populates
bs_xflags directly from xfs_ip2xflags()), so bulkstat consumers
and per-inode queries see a consistent view of the filesystem's
case-folding behavior.
FS_XFLAG_CASEFOLD is read-only: FS_XFLAG_RDONLY_MASK ensures
FS_IOC_FSSETXATTR strips it, and xfs_flags2diflags() has no
clause for CASEFOLD so the on-disk diflags are unaffected.
The legacy FS_IOC_SETFLAGS path in xfs_fileattr_set() also
allows FS_CASEFOLD_FL through its allowlist on ASCIICI
filesystems so that a chattr read-modify-write cycle does
not fail with EOPNOTSUPP.
XFS always preserves case. XFS is case-sensitive by default,
but supports ASCII case-insensitive lookups when formatted
with the ASCIICI feature flag.
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/xfs/libxfs/xfs_inode_util.c | 2 ++
fs/xfs/xfs_ioctl.c | 20 +++++++++++++++++---
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_inode_util.c b/fs/xfs/libxfs/xfs_inode_util.c
index 551fa51befb6..82be54b6f8d3 100644
--- a/fs/xfs/libxfs/xfs_inode_util.c
+++ b/fs/xfs/libxfs/xfs_inode_util.c
@@ -130,6 +130,8 @@ xfs_ip2xflags(
if (xfs_inode_has_attr_fork(ip))
flags |= FS_XFLAG_HASATTR;
+ if (xfs_has_asciici(ip->i_mount))
+ flags |= FS_XFLAG_CASEFOLD;
return flags;
}
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index ed9b4846c05f..f8216f74679f 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -755,9 +755,23 @@ xfs_fileattr_set(
trace_xfs_ioctl_setattr(ip);
if (!fa->fsx_valid) {
- if (fa->flags & ~(FS_IMMUTABLE_FL | FS_APPEND_FL |
- FS_NOATIME_FL | FS_NODUMP_FL |
- FS_SYNC_FL | FS_DAX_FL | FS_PROJINHERIT_FL))
+ unsigned int allowed = FS_IMMUTABLE_FL | FS_APPEND_FL |
+ FS_NOATIME_FL | FS_NODUMP_FL |
+ FS_SYNC_FL | FS_DAX_FL |
+ FS_PROJINHERIT_FL;
+
+ /*
+ * FS_CASEFOLD_FL reflects the ASCIICI superblock feature,
+ * a read-only property. Accept it as a no-op so chattr's
+ * RMW round-trip succeeds; reject any attempt to enable
+ * it on a non-ASCIICI filesystem. xfs_flags2diflags()
+ * has no clause for CASEFOLD, so the bit is dropped from
+ * the on-disk diflags regardless.
+ */
+ if (xfs_has_asciici(mp))
+ allowed |= FS_CASEFOLD_FL;
+
+ if (fa->flags & ~allowed)
return -EOPNOTSUPP;
}
--
2.53.0
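
The allowlist change in xfs_fileattr_set() above exists so that a
chattr-style read-modify-write cycle keeps working once GETFLAGS starts
reporting FS_CASEFOLD_FL. A minimal userspace sketch of that check
follows. The SK_* names and setflags_check_sketch() are hypothetical;
the numeric values are intended to match the FS_*_FL definitions in
include/uapi/linux/fs.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Intended to match the FS_*_FL values in include/uapi/linux/fs.h. */
#define SK_FS_COMPR_FL       0x00000004u
#define SK_FS_SYNC_FL        0x00000008u
#define SK_FS_IMMUTABLE_FL   0x00000010u
#define SK_FS_APPEND_FL      0x00000020u
#define SK_FS_NODUMP_FL      0x00000040u
#define SK_FS_NOATIME_FL     0x00000080u
#define SK_FS_DAX_FL         0x02000000u
#define SK_FS_PROJINHERIT_FL 0x20000000u
#define SK_FS_CASEFOLD_FL    0x40000000u

/*
 * Return 0 if every requested flag is acceptable, -EOPNOTSUPP
 * otherwise. FS_CASEFOLD_FL is tolerated only when the filesystem
 * already has the feature, so it can be written back unchanged but
 * never newly enabled.
 */
int setflags_check_sketch(unsigned int flags, bool has_asciici)
{
	unsigned int allowed = SK_FS_IMMUTABLE_FL | SK_FS_APPEND_FL |
			       SK_FS_NOATIME_FL | SK_FS_NODUMP_FL |
			       SK_FS_SYNC_FL | SK_FS_DAX_FL |
			       SK_FS_PROJINHERIT_FL;

	assert(SK_FS_CASEFOLD_FL != 0);
	if (has_asciici)
		allowed |= SK_FS_CASEFOLD_FL;
	if (flags & ~allowed)
		return -EOPNOTSUPP;
	return 0;
}
```

For example, a caller that read SK_FS_CASEFOLD_FL | SK_FS_NODUMP_FL on
an ASCIICI filesystem and writes the same word back with
SK_FS_NOATIME_FL added passes the check, while the same word on a
non-ASCIICI filesystem is rejected.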
* [PATCH v12 07/15] hfsplus: Report case sensitivity in fileattr_get
From: Chuck Lever @ 2026-04-29 18:07 UTC
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Roland Mainz
In-Reply-To: <20260429-case-sensitivity-v12-0-8057123bebe0@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
Add case sensitivity reporting to the existing hfsplus_fileattr_get()
function via the FS_XFLAG_CASEFOLD flag. HFS+ always preserves case
at rest.
Case sensitivity depends on how the volume was formatted: HFSX
volumes may be either case-sensitive or case-insensitive, indicated
by the HFSPLUS_SB_CASEFOLD superblock flag.
FS_XFLAG_CASEFOLD is read-only: FS_XFLAG_RDONLY_MASK ensures
FS_IOC_FSSETXATTR strips it. The legacy FS_IOC_SETFLAGS path in
hfsplus_fileattr_set() also allows FS_CASEFOLD_FL through its
allowlist on case-insensitive volumes so that a chattr
read-modify-write cycle does not fail with EOPNOTSUPP.
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/hfsplus/inode.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
index d05891ec492e..5565c14b4bf6 100644
--- a/fs/hfsplus/inode.c
+++ b/fs/hfsplus/inode.c
@@ -740,6 +740,7 @@ int hfsplus_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
{
struct inode *inode = d_inode(dentry);
struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(inode->i_sb);
unsigned int flags = 0;
if (inode->i_flags & S_IMMUTABLE)
@@ -748,6 +749,8 @@ int hfsplus_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
flags |= FS_APPEND_FL;
if (hip->userflags & HFSPLUS_FLG_NODUMP)
flags |= FS_NODUMP_FL;
+ if (test_bit(HFSPLUS_SB_CASEFOLD, &sbi->flags))
+ flags |= FS_CASEFOLD_FL;
fileattr_fill_flags(fa, flags);
@@ -759,13 +762,24 @@ int hfsplus_fileattr_set(struct mnt_idmap *idmap,
{
struct inode *inode = d_inode(dentry);
struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(inode->i_sb);
+ unsigned int allowed = FS_IMMUTABLE_FL | FS_APPEND_FL | FS_NODUMP_FL;
unsigned int new_fl = 0;
if (fileattr_has_fsx(fa))
return -EOPNOTSUPP;
+ /*
+ * FS_CASEFOLD_FL reflects HFSPLUS_SB_CASEFOLD, a mount-time
+ * property. Accept it as a no-op so chattr's RMW round-trip
+ * succeeds; reject any attempt to enable it on a volume that
+ * was not formatted case-insensitive.
+ */
+ if (test_bit(HFSPLUS_SB_CASEFOLD, &sbi->flags))
+ allowed |= FS_CASEFOLD_FL;
+
/* don't silently ignore unsupported ext2 flags */
- if (fa->flags & ~(FS_IMMUTABLE_FL|FS_APPEND_FL|FS_NODUMP_FL))
+ if (fa->flags & ~allowed)
return -EOPNOTSUPP;
if (fa->flags & FS_IMMUTABLE_FL)
--
2.53.0
* [PATCH v12 06/15] hfs: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-04-29 18:07 UTC
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Roland Mainz
In-Reply-To: <20260429-case-sensitivity-v12-0-8057123bebe0@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
Report HFS case sensitivity behavior via the FS_XFLAG_CASEFOLD
flag. HFS is always case-insensitive (using Mac OS Roman case
folding) and always preserves case at rest.
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/hfs/dir.c | 1 +
fs/hfs/hfs_fs.h | 2 ++
fs/hfs/inode.c | 14 ++++++++++++++
3 files changed, 17 insertions(+)
diff --git a/fs/hfs/dir.c b/fs/hfs/dir.c
index f5e7efe924e7..c4c6e1623f55 100644
--- a/fs/hfs/dir.c
+++ b/fs/hfs/dir.c
@@ -328,4 +328,5 @@ const struct inode_operations hfs_dir_inode_operations = {
.rmdir = hfs_remove,
.rename = hfs_rename,
.setattr = hfs_inode_setattr,
+ .fileattr_get = hfs_fileattr_get,
};
diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
index ac0e83f77a0f..1b23448c9a48 100644
--- a/fs/hfs/hfs_fs.h
+++ b/fs/hfs/hfs_fs.h
@@ -177,6 +177,8 @@ extern int hfs_get_block(struct inode *inode, sector_t block,
extern const struct address_space_operations hfs_aops;
extern const struct address_space_operations hfs_btree_aops;
+struct file_kattr;
+int hfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
int hfs_write_begin(const struct kiocb *iocb, struct address_space *mapping,
loff_t pos, unsigned int len, struct folio **foliop,
void **fsdata);
diff --git a/fs/hfs/inode.c b/fs/hfs/inode.c
index 89b33a9d46d5..f41cc261684d 100644
--- a/fs/hfs/inode.c
+++ b/fs/hfs/inode.c
@@ -18,6 +18,7 @@
#include <linux/uio.h>
#include <linux/xattr.h>
#include <linux/blkdev.h>
+#include <linux/fileattr.h>
#include "hfs_fs.h"
#include "btree.h"
@@ -699,6 +700,18 @@ static int hfs_file_fsync(struct file *filp, loff_t start, loff_t end,
return ret;
}
+int hfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+ /*
+ * HFS compares filenames using Mac OS Roman case folding, so
+ * lookup is always case-insensitive. Names are stored on disk
+ * with case intact; CASENONPRESERVING stays clear.
+ */
+ fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+ fa->flags |= FS_CASEFOLD_FL;
+ return 0;
+}
+
static const struct file_operations hfs_file_operations = {
.llseek = generic_file_llseek,
.read_iter = generic_file_read_iter,
@@ -715,4 +728,5 @@ static const struct inode_operations hfs_file_inode_operations = {
.lookup = hfs_file_lookup,
.setattr = hfs_inode_setattr,
.listxattr = generic_listxattr,
+ .fileattr_get = hfs_fileattr_get,
};
--
2.53.0
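
Because hfs_fileattr_get() above sets FS_CASEFOLD_FL in fa->flags, the
result should be observable from userspace through the existing
FS_IOC_GETFLAGS ioctl. The sketch below assumes a kernel with this
series applied and UAPI headers that define FS_CASEFOLD_FL. The two
helper names are hypothetical, and error handling is reduced to a
single -1 return.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Return 1 if a GETFLAGS word has FS_CASEFOLD_FL set, else 0. */
int flags_report_casefold(unsigned int flags)
{
	return (flags & FS_CASEFOLD_FL) ? 1 : 0;
}

/*
 * Return 1 or 0 for the object at path, or -1 if it cannot be
 * opened or the filesystem does not answer FS_IOC_GETFLAGS.
 */
int path_reports_casefold(const char *path)
{
	unsigned int flags = 0;
	int fd, ret;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	/* The kernel reads and writes a 32-bit value for this ioctl. */
	ret = ioctl(fd, FS_IOC_GETFLAGS, &flags);
	close(fd);
	if (ret < 0)
		return -1;
	return flags_report_casefold(flags);
}
```

On an HFS mount, path_reports_casefold() on any file or directory would
be expected to return 1, since the callback sets the flag
unconditionally. On filesystems without a fileattr_get callback the
ioctl typically fails and the helper returns -1.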
* [PATCH v12 05/15] ntfs3: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-04-29 18:07 UTC
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Roland Mainz
In-Reply-To: <20260429-case-sensitivity-v12-0-8057123bebe0@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
Report NTFS case sensitivity behavior via the FS_XFLAG_CASEFOLD
flag, which is set when the volume is mounted with the "nocase"
option. NTFS always preserves case at rest.
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/ntfs3/file.c | 29 +++++++++++++++++++++++++++++
fs/ntfs3/inode.c | 1 +
fs/ntfs3/namei.c | 2 ++
fs/ntfs3/ntfs_fs.h | 1 +
4 files changed, 33 insertions(+)
diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
index b041639ab406..ad9350d7fc3f 100644
--- a/fs/ntfs3/file.c
+++ b/fs/ntfs3/file.c
@@ -180,6 +180,34 @@ long ntfs_compat_ioctl(struct file *filp, u32 cmd, unsigned long arg)
}
#endif
+/*
+ * ntfs_fileattr_get - inode_operations::fileattr_get
+ */
+int ntfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+ struct inode *inode = d_inode(dentry);
+ struct ntfs_sb_info *sbi = inode->i_sb->s_fs_info;
+
+ /* Avoid any operation if inode is bad. */
+ if (unlikely(is_bad_ni(ntfs_i(inode))))
+ return -EINVAL;
+
+ /*
+ * NTFS preserves case (the default). Case sensitivity depends on
+ * mount options: with "nocase", NTFS is case-insensitive;
+ * otherwise it is case-sensitive.
+ */
+ if (sbi->options->nocase) {
+ fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+ fa->flags |= FS_CASEFOLD_FL;
+ }
+ if (inode->i_flags & S_IMMUTABLE) {
+ fa->fsx_xflags |= FS_XFLAG_IMMUTABLE;
+ fa->flags |= FS_IMMUTABLE_FL;
+ }
+ return 0;
+}
+
/*
* ntfs_getattr - inode_operations::getattr
*/
@@ -1547,6 +1575,7 @@ const struct inode_operations ntfs_file_inode_operations = {
.get_acl = ntfs_get_acl,
.set_acl = ntfs_set_acl,
.fiemap = ntfs_fiemap,
+ .fileattr_get = ntfs_fileattr_get,
};
const struct file_operations ntfs_file_operations = {
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c
index 42af1abe17f8..a5ff04c2efd3 100644
--- a/fs/ntfs3/inode.c
+++ b/fs/ntfs3/inode.c
@@ -2095,6 +2095,7 @@ const struct inode_operations ntfs_link_inode_operations = {
.get_link = ntfs_get_link,
.setattr = ntfs_setattr,
.listxattr = ntfs_listxattr,
+ .fileattr_get = ntfs_fileattr_get,
};
const struct address_space_operations ntfs_aops = {
diff --git a/fs/ntfs3/namei.c b/fs/ntfs3/namei.c
index b2af8f695e60..eb241d7796ba 100644
--- a/fs/ntfs3/namei.c
+++ b/fs/ntfs3/namei.c
@@ -518,6 +518,7 @@ const struct inode_operations ntfs_dir_inode_operations = {
.getattr = ntfs_getattr,
.listxattr = ntfs_listxattr,
.fiemap = ntfs_fiemap,
+ .fileattr_get = ntfs_fileattr_get,
};
const struct inode_operations ntfs_special_inode_operations = {
@@ -526,6 +527,7 @@ const struct inode_operations ntfs_special_inode_operations = {
.listxattr = ntfs_listxattr,
.get_acl = ntfs_get_acl,
.set_acl = ntfs_set_acl,
+ .fileattr_get = ntfs_fileattr_get,
};
const struct dentry_operations ntfs_dentry_ops = {
diff --git a/fs/ntfs3/ntfs_fs.h b/fs/ntfs3/ntfs_fs.h
index bbf3b6a1dcbe..41db22d652c4 100644
--- a/fs/ntfs3/ntfs_fs.h
+++ b/fs/ntfs3/ntfs_fs.h
@@ -529,6 +529,7 @@ bool dir_is_empty(struct inode *dir);
extern const struct file_operations ntfs_dir_operations;
/* Globals from file.c */
+int ntfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
int ntfs_getattr(struct mnt_idmap *idmap, const struct path *path,
struct kstat *stat, u32 request_mask, u32 flags);
int ntfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
--
2.53.0
* [PATCH v12 04/15] exfat: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-04-29 18:07 UTC
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Roland Mainz
In-Reply-To: <20260429-case-sensitivity-v12-0-8057123bebe0@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
Report exFAT's case sensitivity behavior via the FS_XFLAG_CASEFOLD
flag. exFAT is always case-insensitive (using an upcase table for
comparison) and always preserves case at rest.
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/exfat/exfat_fs.h | 2 ++
fs/exfat/file.c | 18 ++++++++++++++++--
fs/exfat/namei.c | 1 +
3 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/fs/exfat/exfat_fs.h b/fs/exfat/exfat_fs.h
index 89ef5368277f..aff4dcd4e75a 100644
--- a/fs/exfat/exfat_fs.h
+++ b/fs/exfat/exfat_fs.h
@@ -496,6 +496,8 @@ int exfat_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
int exfat_getattr(struct mnt_idmap *idmap, const struct path *path,
struct kstat *stat, unsigned int request_mask,
unsigned int query_flags);
+struct file_kattr;
+int exfat_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
int exfat_file_fsync(struct file *file, loff_t start, loff_t end, int datasync);
long exfat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
long exfat_compat_ioctl(struct file *filp, unsigned int cmd,
diff --git a/fs/exfat/file.c b/fs/exfat/file.c
index 354bdcfe4abc..91e5511945d1 100644
--- a/fs/exfat/file.c
+++ b/fs/exfat/file.c
@@ -14,6 +14,7 @@
#include <linux/writeback.h>
#include <linux/filelock.h>
#include <linux/falloc.h>
+#include <linux/fileattr.h>
#include "exfat_raw.h"
#include "exfat_fs.h"
@@ -323,6 +324,18 @@ int exfat_getattr(struct mnt_idmap *idmap, const struct path *path,
return 0;
}
+int exfat_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+ /*
+ * exFAT compares filenames through an upcase table, so lookup
+ * is always case-insensitive. Long names are stored in UTF-16
+ * with case intact; CASENONPRESERVING stays clear.
+ */
+ fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+ fa->flags |= FS_CASEFOLD_FL;
+ return 0;
+}
+
int exfat_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
struct iattr *attr)
{
@@ -817,6 +830,7 @@ const struct file_operations exfat_file_operations = {
};
const struct inode_operations exfat_file_inode_operations = {
- .setattr = exfat_setattr,
- .getattr = exfat_getattr,
+ .setattr = exfat_setattr,
+ .getattr = exfat_getattr,
+ .fileattr_get = exfat_fileattr_get,
};
diff --git a/fs/exfat/namei.c b/fs/exfat/namei.c
index 2c5636634b4a..94002e43db08 100644
--- a/fs/exfat/namei.c
+++ b/fs/exfat/namei.c
@@ -1311,4 +1311,5 @@ const struct inode_operations exfat_dir_inode_operations = {
.rename = exfat_rename,
.setattr = exfat_setattr,
.getattr = exfat_getattr,
+ .fileattr_get = exfat_fileattr_get,
};
--
2.53.0
* [PATCH v12 03/15] fat: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-04-29 18:07 UTC (permalink / raw)
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Roland Mainz
In-Reply-To: <20260429-case-sensitivity-v12-0-8057123bebe0@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
Report FAT's case sensitivity behavior via the FS_XFLAG_CASEFOLD
and FS_XFLAG_CASENONPRESERVING flags. FAT filesystems are
case-insensitive by default.
MSDOS supports a 'nocase' mount option that enables case-sensitive
behavior; check this option when reporting case sensitivity.
VFAT long filename entries preserve case; without VFAT, only
uppercased 8.3 short names are stored. MSDOS with 'nocase' also
preserves case since the name-formatting code skips upcasing when
'nocase' is set. Check both options when reporting case preservation.
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/fat/fat.h | 3 +++
fs/fat/file.c | 36 ++++++++++++++++++++++++++++++++++++
fs/fat/namei_msdos.c | 1 +
fs/fat/namei_vfat.c | 1 +
4 files changed, 41 insertions(+)
diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index 5a58f0bf8ce8..99ed9228a677 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -10,6 +10,8 @@
#include <linux/fs_context.h>
#include <linux/fs_parser.h>
+struct file_kattr;
+
/*
* vfat shortname flags
*/
@@ -408,6 +410,7 @@ extern void fat_truncate_blocks(struct inode *inode, loff_t offset);
extern int fat_getattr(struct mnt_idmap *idmap,
const struct path *path, struct kstat *stat,
u32 request_mask, unsigned int flags);
+int fat_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
extern int fat_file_fsync(struct file *file, loff_t start, loff_t end,
int datasync);
diff --git a/fs/fat/file.c b/fs/fat/file.c
index becccdd2e501..37e7049b4c8c 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -17,6 +17,7 @@
#include <linux/fsnotify.h>
#include <linux/security.h>
#include <linux/falloc.h>
+#include <linux/fileattr.h>
#include "fat.h"
static long fat_fallocate(struct file *file, int mode,
@@ -398,6 +399,40 @@ void fat_truncate_blocks(struct inode *inode, loff_t offset)
fat_flush_inodes(inode->i_sb, inode, NULL);
}
+int fat_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+ struct msdos_sb_info *sbi = MSDOS_SB(dentry->d_sb);
+ bool case_sensitive;
+
+ /*
+ * FAT filesystems are case-insensitive by default. VFAT
+ * becomes case-sensitive when mounted with 'check=strict',
+ * which installs vfat_dentry_ops. MSDOS has no such option;
+ * its 'nocase' mount option selects case-sensitive matching.
+ *
+ * VFAT long filename entries preserve case. Without VFAT, only
+ * uppercased 8.3 short names are stored. MSDOS with 'nocase'
+ * also preserves case.
+ */
+ if (sbi->options.isvfat)
+ case_sensitive = sbi->options.name_check == 's';
+ else
+ case_sensitive = sbi->options.nocase;
+
+ if (!case_sensitive) {
+ fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+ fa->flags |= FS_CASEFOLD_FL;
+ if (!sbi->options.isvfat)
+ fa->fsx_xflags |= FS_XFLAG_CASENONPRESERVING;
+ }
+ if (d_inode(dentry)->i_flags & S_IMMUTABLE) {
+ fa->fsx_xflags |= FS_XFLAG_IMMUTABLE;
+ fa->flags |= FS_IMMUTABLE_FL;
+ }
+ return 0;
+}
+EXPORT_SYMBOL_GPL(fat_fileattr_get);
+
int fat_getattr(struct mnt_idmap *idmap, const struct path *path,
struct kstat *stat, u32 request_mask, unsigned int flags)
{
@@ -575,5 +610,6 @@ EXPORT_SYMBOL_GPL(fat_setattr);
const struct inode_operations fat_file_inode_operations = {
.setattr = fat_setattr,
.getattr = fat_getattr,
+ .fileattr_get = fat_fileattr_get,
.update_time = fat_update_time,
};
diff --git a/fs/fat/namei_msdos.c b/fs/fat/namei_msdos.c
index 4cc65f330fb7..0fd2971ad4b1 100644
--- a/fs/fat/namei_msdos.c
+++ b/fs/fat/namei_msdos.c
@@ -644,6 +644,7 @@ static const struct inode_operations msdos_dir_inode_operations = {
.rename = msdos_rename,
.setattr = fat_setattr,
.getattr = fat_getattr,
+ .fileattr_get = fat_fileattr_get,
.update_time = fat_update_time,
};
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 918b3756674c..e909447873e3 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -1185,6 +1185,7 @@ static const struct inode_operations vfat_dir_inode_operations = {
.rename = vfat_rename2,
.setattr = fat_setattr,
.getattr = fat_getattr,
+ .fileattr_get = fat_fileattr_get,
.update_time = fat_update_time,
};
--
2.53.0
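The decision table in fat_fileattr_get() above can be exercised outside the kernel. What follows is a minimal user-space sketch, not kernel code: struct fat_opts_model and fat_case_xflags() are hypothetical stand-ins for the mount options in msdos_sb_info and for the case-handling branch of the handler, and the MODEL_XFLAG_* values are the ones proposed in patch 02/15 of this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as proposed in patch 02/15 of this series. */
#define MODEL_XFLAG_CASEFOLD          0x00040000u
#define MODEL_XFLAG_CASENONPRESERVING 0x00080000u

/* Hypothetical stand-in for the mount options the handler consults. */
struct fat_opts_model {
	bool isvfat;     /* mounted as vfat rather than msdos */
	char name_check; /* 's' corresponds to check=strict */
	bool nocase;     /* msdos 'nocase' mount option */
};

/* Mirrors the case-handling branch of fat_fileattr_get(). */
static uint32_t fat_case_xflags(const struct fat_opts_model *o)
{
	bool case_sensitive;
	uint32_t xflags = 0;

	if (o->isvfat)
		case_sensitive = (o->name_check == 's');
	else
		case_sensitive = o->nocase;

	if (!case_sensitive) {
		xflags |= MODEL_XFLAG_CASEFOLD;
		/* Only msdos stores names upcased, so only it loses case. */
		if (!o->isvfat)
			xflags |= MODEL_XFLAG_CASENONPRESERVING;
	}
	return xflags;
}

/* Usage: the four mount configurations the commit message describes. */
static void fat_case_demo(void)
{
	struct fat_opts_model vfat_default = { true, 'n', false };
	struct fat_opts_model msdos_nocase = { false, 'n', true };

	assert(fat_case_xflags(&vfat_default) == MODEL_XFLAG_CASEFOLD);
	assert(fat_case_xflags(&msdos_nocase) == 0);
}
```

In this model a case-sensitive mount reports neither flag, which matches the POSIX default the series assigns to "both flags clear".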
* [PATCH v12 02/15] fs: Add case sensitivity flags to file_kattr
From: Chuck Lever @ 2026-04-29 18:07 UTC (permalink / raw)
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Darrick J. Wong, Roland Mainz
In-Reply-To: <20260429-case-sensitivity-v12-0-8057123bebe0@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
Enable upper layers such as NFSD to retrieve case sensitivity
information from file systems by adding FS_XFLAG_CASEFOLD and
FS_XFLAG_CASENONPRESERVING flags.
Filesystems report case-insensitive or case-nonpreserving behavior
by setting these flags directly in fa->fsx_xflags. The default
(flags unset) indicates POSIX semantics: case-sensitive and
case-preserving. Both flags are added to FS_XFLAG_RDONLY_MASK so
FS_IOC_FSSETXATTR silently strips them, keeping the new xflags
strictly a reporting interface. Callers that want to toggle
casefolding continue to use FS_IOC_SETFLAGS with FS_CASEFOLD_FL,
the established UAPI on filesystems that support the operation
(ext4 and f2fs on empty directories).
Case sensitivity information is exported to userspace via the
fsx_xflags field of the FS_IOC_FSGETXATTR ioctl and the fa_xflags
field of the file_getattr() system call.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/file_attr.c | 4 ++++
include/linux/fileattr.h | 3 ++-
include/uapi/linux/fs.h | 7 +++++++
3 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/fs/file_attr.c b/fs/file_attr.c
index f429da66a317..bfb00d256dd5 100644
--- a/fs/file_attr.c
+++ b/fs/file_attr.c
@@ -37,6 +37,8 @@ void fileattr_fill_xflags(struct file_kattr *fa, u32 xflags)
fa->flags |= FS_PROJINHERIT_FL;
if (fa->fsx_xflags & FS_XFLAG_VERITY)
fa->flags |= FS_VERITY_FL;
+ if (fa->fsx_xflags & FS_XFLAG_CASEFOLD)
+ fa->flags |= FS_CASEFOLD_FL;
}
EXPORT_SYMBOL(fileattr_fill_xflags);
@@ -67,6 +69,8 @@ void fileattr_fill_flags(struct file_kattr *fa, u32 flags)
fa->fsx_xflags |= FS_XFLAG_PROJINHERIT;
if (fa->flags & FS_VERITY_FL)
fa->fsx_xflags |= FS_XFLAG_VERITY;
+ if (fa->flags & FS_CASEFOLD_FL)
+ fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
}
EXPORT_SYMBOL(fileattr_fill_flags);
diff --git a/include/linux/fileattr.h b/include/linux/fileattr.h
index 3780904a63a6..58044b598016 100644
--- a/include/linux/fileattr.h
+++ b/include/linux/fileattr.h
@@ -16,7 +16,8 @@
/* Read-only inode flags */
#define FS_XFLAG_RDONLY_MASK \
- (FS_XFLAG_PREALLOC | FS_XFLAG_HASATTR | FS_XFLAG_VERITY)
+ (FS_XFLAG_PREALLOC | FS_XFLAG_HASATTR | FS_XFLAG_VERITY | \
+ FS_XFLAG_CASEFOLD | FS_XFLAG_CASENONPRESERVING)
/* Flags to indicate valid value of fsx_ fields */
#define FS_XFLAG_VALUES_MASK \
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 13f71202845e..2ea4c81df08f 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -254,6 +254,13 @@ struct file_attr {
#define FS_XFLAG_DAX 0x00008000 /* use DAX for IO */
#define FS_XFLAG_COWEXTSIZE 0x00010000 /* CoW extent size allocator hint */
#define FS_XFLAG_VERITY 0x00020000 /* fs-verity enabled */
+/*
+ * Case handling flags (read-only, cannot be set via ioctl).
+ * Default (neither set) indicates POSIX semantics: case-sensitive
+ * lookups and case-preserving storage.
+ */
+#define FS_XFLAG_CASEFOLD 0x00040000 /* case-insensitive lookups */
+#define FS_XFLAG_CASENONPRESERVING 0x00080000 /* case not preserved */
#define FS_XFLAG_HASATTR 0x80000000 /* no DIFLAG for this */
/* the read-only stuff doesn't really belong here, but any other place is
--
2.53.0
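For a user-space file server, consuming the two new flags could look roughly like the sketch below. It assumes the flags are read through the existing FS_IOC_FSGETXATTR ioctl and struct fsxattr from <linux/fs.h>. The #ifndef fallbacks carry the values proposed in this patch, since released headers may not define them yet. The helper names (xflags_case_insensitive(), xflags_case_preserving(), query_xflags()) are hypothetical and not part of the series.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

/* Fallbacks for headers that predate this series (values from the patch). */
#ifndef FS_XFLAG_CASEFOLD
#define FS_XFLAG_CASEFOLD 0x00040000
#endif
#ifndef FS_XFLAG_CASENONPRESERVING
#define FS_XFLAG_CASENONPRESERVING 0x00080000
#endif

/* Lookups fold case when FS_XFLAG_CASEFOLD is set. */
static bool xflags_case_insensitive(uint32_t xflags)
{
	return (xflags & FS_XFLAG_CASEFOLD) != 0;
}

/* Note the inverted polarity: the flag reports NON-preservation. */
static bool xflags_case_preserving(uint32_t xflags)
{
	return (xflags & FS_XFLAG_CASENONPRESERVING) == 0;
}

/* Returns 0 and fills *xflags on success, -1 with errno set on failure. */
static int query_xflags(const char *path, uint32_t *xflags)
{
	struct fsxattr fsx;
	int fd, ret, saved_errno;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	ret = ioctl(fd, FS_IOC_FSGETXATTR, &fsx);
	saved_errno = errno;
	if (ret == 0)
		*xflags = fsx.fsx_xflags;
	close(fd);
	errno = saved_errno; /* report the ioctl's error, not close()'s */
	return ret;
}

/* Usage: with neither flag set, POSIX semantics apply. */
static void xflags_default_demo(void)
{
	assert(!xflags_case_insensitive(0));
	assert(xflags_case_preserving(0));
}
```

A caller would pass a directory path to query_xflags() and feed the result to the two predicates. On kernels without this series the bits are simply never set, so the predicates fall back to the POSIX default.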
* [PATCH v12 01/15] fs: Move file_kattr initialization to callers
From: Chuck Lever @ 2026-04-29 18:07 UTC (permalink / raw)
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Darrick J. Wong, Roland Mainz
In-Reply-To: <20260429-case-sensitivity-v12-0-8057123bebe0@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
fileattr_fill_xflags() and fileattr_fill_flags() memset the
entire file_kattr struct before populating select fields, so
callers cannot pre-set fields in fa->fsx_xflags without having
their values clobbered. Darrick Wong noted that a function
named "fill_xflags" touching more than xflags forces callers
to know implementation details beyond its apparent scope.
Drop the memset from both fill functions and initialize at the
entry points instead: ioctl_setflags(), ioctl_fssetxattr(),
the file_setattr() syscall, and xfs_ioc_fsgetxattra() now
declare fa with an aggregate initializer. ioctl_getflags(),
ioctl_fsgetxattr(), and the file_getattr() syscall already
aggregate-initialize fa to pass flags_valid/fsx_valid hints
into vfs_fileattr_get().
Subsequent patches rely on this so that ->fileattr_get()
handlers can set case-sensitivity flags (FS_XFLAG_CASEFOLD,
FS_XFLAG_CASENONPRESERVING) in fa->fsx_xflags before the fill
functions run.
Suggested-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/file_attr.c | 12 ++++--------
fs/xfs/xfs_ioctl.c | 2 +-
2 files changed, 5 insertions(+), 9 deletions(-)
diff --git a/fs/file_attr.c b/fs/file_attr.c
index da983e105d70..f429da66a317 100644
--- a/fs/file_attr.c
+++ b/fs/file_attr.c
@@ -15,12 +15,10 @@
* @fa: fileattr pointer
* @xflags: FS_XFLAG_* flags
*
- * Set ->fsx_xflags, ->fsx_valid and ->flags (translated xflags). All
- * other fields are zeroed.
+ * Set ->fsx_xflags, ->fsx_valid and ->flags (translated xflags).
*/
void fileattr_fill_xflags(struct file_kattr *fa, u32 xflags)
{
- memset(fa, 0, sizeof(*fa));
fa->fsx_valid = true;
fa->fsx_xflags = xflags;
if (fa->fsx_xflags & FS_XFLAG_IMMUTABLE)
@@ -48,11 +46,9 @@ EXPORT_SYMBOL(fileattr_fill_xflags);
* @flags: FS_*_FL flags
*
* Set ->flags, ->flags_valid and ->fsx_xflags (translated flags).
- * All other fields are zeroed.
*/
void fileattr_fill_flags(struct file_kattr *fa, u32 flags)
{
- memset(fa, 0, sizeof(*fa));
fa->flags_valid = true;
fa->flags = flags;
if (fa->flags & FS_SYNC_FL)
@@ -325,7 +321,7 @@ int ioctl_setflags(struct file *file, unsigned int __user *argp)
{
struct mnt_idmap *idmap = file_mnt_idmap(file);
struct dentry *dentry = file->f_path.dentry;
- struct file_kattr fa;
+ struct file_kattr fa = {};
unsigned int flags;
int err;
@@ -357,7 +353,7 @@ int ioctl_fssetxattr(struct file *file, void __user *argp)
{
struct mnt_idmap *idmap = file_mnt_idmap(file);
struct dentry *dentry = file->f_path.dentry;
- struct file_kattr fa;
+ struct file_kattr fa = {};
int err;
err = copy_fsxattr_from_user(&fa, argp);
@@ -431,7 +427,7 @@ SYSCALL_DEFINE5(file_setattr, int, dfd, const char __user *, filename,
struct path filepath __free(path_put) = {};
unsigned int lookup_flags = 0;
struct file_attr fattr;
- struct file_kattr fa;
+ struct file_kattr fa = {};
int error;
BUILD_BUG_ON(sizeof(struct file_attr) < FILE_ATTR_SIZE_VER0);
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 46e234863644..ed9b4846c05f 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -517,7 +517,7 @@ xfs_ioc_fsgetxattra(
xfs_inode_t *ip,
void __user *arg)
{
- struct file_kattr fa;
+ struct file_kattr fa = {};
xfs_ilock(ip, XFS_ILOCK_SHARED);
xfs_fill_fsxattr(ip, XFS_ATTR_FORK, &fa);
--
2.53.0
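The effect of dropping the memset can be illustrated with a reduced user-space model. This is a sketch under stated assumptions, not the kernel code: struct kattr_model is a hypothetical three-field stand-in for struct file_kattr, the MODEL_* constants are illustrative, and only one flag translation is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct file_kattr. */
struct kattr_model {
	uint32_t flags;      /* FS_*_FL view */
	uint32_t fsx_xflags; /* FS_XFLAG_* view */
	bool flags_valid;
};

#define MODEL_IMMUTABLE_FL    0x00000010u
#define MODEL_XFLAG_IMMUTABLE 0x00000008u
#define MODEL_XFLAG_CASEFOLD  0x00040000u

/* Pre-patch shape: zero the whole struct, then translate flags. */
static void fill_flags_old(struct kattr_model *fa, uint32_t flags)
{
	memset(fa, 0, sizeof(*fa));
	fa->flags_valid = true;
	fa->flags = flags;
	if (fa->flags & MODEL_IMMUTABLE_FL)
		fa->fsx_xflags |= MODEL_XFLAG_IMMUTABLE;
}

/* Post-patch shape: same translation, the caller owns initialization. */
static void fill_flags_new(struct kattr_model *fa, uint32_t flags)
{
	fa->flags_valid = true;
	fa->flags = flags;
	if (fa->flags & MODEL_IMMUTABLE_FL)
		fa->fsx_xflags |= MODEL_XFLAG_IMMUTABLE;
}

/* A handler that pre-sets a case flag, then calls a fill function. */
static uint32_t handler_xflags(void (*fill)(struct kattr_model *, uint32_t))
{
	struct kattr_model fa = { 0 }; /* initialized at the entry point */

	fa.fsx_xflags |= MODEL_XFLAG_CASEFOLD;
	fill(&fa, MODEL_IMMUTABLE_FL);
	return fa.fsx_xflags;
}

/* Usage: the old shape loses the pre-set bit, the new shape keeps it. */
static void fill_demo(void)
{
	assert(handler_xflags(fill_flags_old) == MODEL_XFLAG_IMMUTABLE);
	assert(handler_xflags(fill_flags_new) ==
	       (MODEL_XFLAG_CASEFOLD | MODEL_XFLAG_IMMUTABLE));
}
```

The model only covers the fill-flags direction, where translated bits are OR'ed into fsx_xflags. That is the path on which a pre-set case flag can survive once the memset is gone.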
* [PATCH v12 00/15] Exposing case folding behavior
From: Chuck Lever @ 2026-04-29 18:07 UTC (permalink / raw)
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Darrick J. Wong, Roland Mainz, Steve French
Following on from:
https://lore.kernel.org/linux-nfs/20251021-zypressen-bazillus-545a44af57fd@brauner/T/#m0ba197d75b7921d994cf284f3cef3a62abb11aaa
I'm attempting to implement enough support in the Linux VFS to
enable file services like NFSD and ksmbd (and user space
equivalents) to provide the actual status of case folding support
in local file systems. The default behavior for local file systems
not explicitly supported in this series is to reflect the usual
POSIX behaviors:
case-insensitive = false
case-nonpreserving = false
The case-insensitivity and case-nonpreserving booleans can be
consumed immediately by NFSD. These two attributes have been part of
the NFSv3 and NFSv4 protocols for decades, in order to support NFS
client implementations on non-POSIX systems.
Support for user space file servers is why this series exposes case
folding information via a user-space API. I don't know of any other
category of user-space application that requires access to case
folding info.
The Linux NFS community has a growing interest in supporting NFS
clients on Windows and macOS platforms, where file name behavior does
not align with traditional POSIX semantics.
One example of a Windows-based NFS client is [1]. This client
implementation explicitly requires servers to report
FATTR4_WORD0_CASE_INSENSITIVE = TRUE for proper operation, a hard
requirement for Windows client interoperability because Windows
applications expect case-insensitive behavior. When an NFS client
knows the server is case-insensitive, it can avoid issuing multiple
LOOKUP/READDIR requests to search for case variants, and applications
like Win32 programs work correctly without manual workarounds or
code changes.
Even the Linux client can take advantage of this information. Trond
merged patches 4 years ago [2] that introduce support for case
insensitivity, in support of the Hammerspace NFS server. In
particular, when a client detects a case-insensitive NFS share,
negative dentry caching must be disabled (a lookup for "FILE.TXT"
failing shouldn't cache a negative entry when "file.txt" exists)
and directory change invalidation must clear all cached case-folded
file name variants.
Hammerspace servers and several other NFS server implementations
operate in multi-protocol environments, where a single file service
instance caters to both NFS and SMB clients. In those cases, things
work more smoothly for everyone when the NFS client can see and adapt
to the case folding behavior that SMB users rely on and expect. NFSD
needs to support the case-insensitivity and case-nonpreserving
booleans properly in order to participate as a first-class citizen
in such environments.
[1] https://github.com/kofemann/ms-nfs41-client
[2] https://patchwork.kernel.org/project/linux-nfs/cover/20211217203658.439352-1-trondmy@kernel.org/
---
Changes since v11:
- isofs: Wire .fileattr_get only on directory inodes, since
NFSD and ksmbd query casefolding on directories (Jan Kara)
- xfs, hfsplus: Drop the FS_CASEFOLD_FL fileattr_get mask;
admit the bit through fileattr_set's allowlist instead
- Address findings from sashiko(gemini-3) and gpt-5.5:
- cifs: Wire .fileattr_get on cifs_namespace_inode_operations
so DFS referral / automount directories report case handling
- fat, ntfs3: Fill FS_IMMUTABLE_FL in fileattr_get
- hfsplus: Hide FS_CASEFOLD_FL from the legacy flags view so
chattr round-trips do not hit the setflags whitelist
- nfs: Clear NFS_CAP_CASE_INSENSITIVE and
NFS_CAP_CASE_NONPRESERVING before re-OR'ing in the v3 and
v4 probe paths so re-probe / TSM does not retain stale caps
- nfsd: Switch nfsd_get_case_info() to errno return so
v3 PATHCONF and v4 GETATTR can apply version-appropriate
policy on failure
- nfsd: Use dget_parent() in v4 case-attr probe to keep
the parent dentry referenced across the query
- isofs: Report FS_XFLAG_CASENONPRESERVING for map=n/map=a
Changes since v10:
- cifs: Source case-handling flags from the server's cached
FS_ATTRIBUTE_INFORMATION reply instead of the nocase mount
option, with a nocase fallback when the reply is absent
- Address findings from sashiko(gemini-3) and gpt-5.5:
- nfs: Skip pathconf case bits on NFSv4 (set via FATTR4_CASE_*
instead)
- xfs: Hide FS_CASEFOLD_FL from the legacy flags view so
chattr round-trips do not hit the setflags whitelist
- ext4, f2fs: Drop redundant fileattr_get patches; the
FS_CASEFOLD_FL translation in fileattr_fill_flags() already
reports FS_XFLAG_CASEFOLD for casefolded directories
- nfsd: Report FATTR4_HOMOGENEOUS = FALSE when the exported
filesystem has a Unicode encoding, since per-directory
casefold makes the fs-scoped case attributes inhomogeneous
- nfsd: Document in nfsd_get_case_info() why -ENOIOCTLCMD and
-ENOTTY are swallowed while other errors propagate
- fat: Honor vfat 'check=strict' when reporting FS_XFLAG_CASEFOLD
- Set FS_CASEFOLD_FL so FS_IOC_GETFLAGS reflects case-insensitive
mount
- isofs: Register fileattr_get on regular file and symlink inodes,
not just directories
- nfsd: Query NFSv4 FATTR4_CASE_* from the parent directory for
non-directory objects, since casefold lives on the directory
Changes since v9:
- nfs: always probe PATHCONF for case caps. Default to case-
preserving when the server does not report case_preserving
- nfsd, ksmbd: tolerate -ENOTTY from vfs_fileattr_get() so
overlayfs exports on backing filesystems without fileattr_get
do not fail the RPC
- xfs: map FS_XFLAG_CASEFOLD inside xfs_ip2xflags() so BULKSTAT
and FS_IOC_FSGETXATTR report the flag consistently
- vboxsf: reject a short host reply to SHFL_INFO_VOLUME before
trusting volinfo.properties.case_sensitive
Changes since v8:
- Rebase on v7.0-rc1
Changes since v7:
- Split file_attr initialization changes into a separate patch
Changes since v6:
- Remove the memset from vfs_fileattr_get
Changes since v5:
- Finish the conversion to FS_XFLAGs
- NFSv4 GETATTR now clears the attr mask bit if nfsd_get_case_info()
fails
Changes since v4:
- Observe the MSDOS "nocase" mount option
- Define new FS_XFLAGs for the user API
Changes since v3:
- Change fa->case_preserving to fa_case_nonpreserving
- VFAT is case preserving
- Make new fields available to user space
Changes since v2:
- Remove unicode labels
- Replace vfs_get_case_info
- Add support for several more local file system implementations
- Add support for in-kernel SMB server
Changes since RFC:
- Use file_getattr instead of statx
- Postpone exposing Unicode version until later
- Support NTFS and ext4 in addition to FAT
- Support NFSv4 fattr4 in addition to NFSv3 PATHCONF
---
Chuck Lever (15):
fs: Move file_kattr initialization to callers
fs: Add case sensitivity flags to file_kattr
fat: Implement fileattr_get for case sensitivity
exfat: Implement fileattr_get for case sensitivity
ntfs3: Implement fileattr_get for case sensitivity
hfs: Implement fileattr_get for case sensitivity
hfsplus: Report case sensitivity in fileattr_get
xfs: Report case sensitivity in fileattr_get
cifs: Implement fileattr_get for case sensitivity
nfs: Implement fileattr_get for case sensitivity
vboxsf: Implement fileattr_get for case sensitivity
isofs: Implement fileattr_get for case sensitivity
nfsd: Report export case-folding via NFSv3 PATHCONF
nfsd: Implement NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING
ksmbd: Report filesystem case sensitivity via FS_ATTRIBUTE_INFORMATION
fs/exfat/exfat_fs.h | 2 ++
fs/exfat/file.c | 18 +++++++++--
fs/exfat/namei.c | 1 +
fs/fat/fat.h | 3 ++
fs/fat/file.c | 36 +++++++++++++++++++++
fs/fat/namei_msdos.c | 1 +
fs/fat/namei_vfat.c | 1 +
fs/file_attr.c | 16 +++++-----
fs/hfs/dir.c | 1 +
fs/hfs/hfs_fs.h | 2 ++
fs/hfs/inode.c | 14 ++++++++
fs/hfsplus/inode.c | 16 +++++++++-
fs/isofs/dir.c | 16 ++++++++++
fs/isofs/isofs.h | 3 ++
fs/nfs/client.c | 25 +++++++++++----
fs/nfs/inode.c | 15 +++++++++
fs/nfs/internal.h | 3 ++
fs/nfs/namespace.c | 2 ++
fs/nfs/nfs3proc.c | 2 ++
fs/nfs/nfs3xdr.c | 7 ++--
fs/nfs/nfs4proc.c | 10 ++++--
fs/nfs/proc.c | 3 ++
fs/nfs/symlink.c | 3 ++
fs/nfsd/nfs3proc.c | 36 ++++++++++++++++-----
fs/nfsd/nfs4xdr.c | 52 ++++++++++++++++++++++++++++--
fs/nfsd/vfs.c | 72 ++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/vfs.h | 3 ++
fs/nfsd/xdr3.h | 4 +--
fs/ntfs3/file.c | 29 +++++++++++++++++
fs/ntfs3/inode.c | 1 +
fs/ntfs3/namei.c | 2 ++
fs/ntfs3/ntfs_fs.h | 1 +
fs/smb/client/cifsfs.c | 41 ++++++++++++++++++++++++
fs/smb/client/cifsfs.h | 3 ++
fs/smb/client/namespace.c | 1 +
fs/smb/server/smb2pdu.c | 30 ++++++++++++++----
fs/vboxsf/dir.c | 1 +
fs/vboxsf/file.c | 6 ++--
fs/vboxsf/super.c | 7 ++++
fs/vboxsf/utils.c | 30 ++++++++++++++++++
fs/vboxsf/vfsmod.h | 6 ++++
fs/xfs/libxfs/xfs_inode_util.c | 2 ++
fs/xfs/xfs_ioctl.c | 22 ++++++++++---
include/linux/fileattr.h | 3 +-
include/linux/nfs_fs_sb.h | 2 +-
include/linux/nfs_xdr.h | 2 ++
include/uapi/linux/fs.h | 7 ++++
47 files changed, 513 insertions(+), 50 deletions(-)
---
base-commit: 6596a02b207886e9e00bb0161c7fd59fea53c081
change-id: 20260422-case-sensitivity-5cbffc8f1558
Best regards,
--
Chuck Lever
* Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Mateusz Guzik @ 2026-04-28 14:39 UTC (permalink / raw)
To: Christian Brauner
Cc: Jori Koolstra, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, Alexander Viro, Arnd Bergmann,
H . Peter Anvin, Jan Kara, Peter Zijlstra, Andrey Albershteyn,
Masami Hiramatsu, Jiri Olsa, Thomas Weißschuh,
Mathieu Desnoyers, Jeff Layton, Aleksa Sarai, cmirabil,
Greg Kroah-Hartman, linux-kernel, linux-fsdevel, linux-api,
linux-arch
In-Reply-To: <20260428-zoodirektor-latten-e412db97141d@brauner>
On Tue, Apr 28, 2026 at 10:55 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Mon, Apr 27, 2026 at 06:30:42PM +0200, Mateusz Guzik wrote:
> > On Mon, Apr 27, 2026 at 5:14 PM Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > > Things proceed to handle_truncate:
> > > > int error = get_write_access(inode);
> > > > if (error)
> > > > return error;
> > > >
> > > > error = security_file_truncate(filp);
> > > > if (!error) {
> > > > error = do_truncate(idmap, path->dentry, 0,
> > > > ATTR_MTIME|ATTR_CTIME|ATTR_OPEN,
> > > > filp);
> > > > }
> > > >
> > > > I'm going to ignore the LSM situation and do_truncate failure modes in this one.
> > > >
> > > > AFAICS nothing prevents the same user from racing against file creation to
> > > > execve it, which starts with exe_file_deny_write_access. Should the
> > > > other thread win the race, get_write_access will fail and the WARN_ON
> > > > splat will be generated. That is definitely a problem.
> > >
> > > That can't happen:
> > >
> > > static inline int get_write_access(struct inode *inode)
> > > {
> > > return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
> > > }
> > >
> > > and the check is:
> > >
> > > error = handle_truncate(idmap, file);
> > > if (unlikely(error > 0)) {
> > >
> > > This was a catch all for broken LSM hook or ->open() instance.
> > >
> >
> > So with this prog:
> > #include <fcntl.h>
> >
> > int main(void)
> > {
> > open("test", O_TRUNC);
> > }
> >
> > I verified writecount is 0 on entry to handle_truncate like so:
> >
> > bpftrace -e 'kprobe:security_file_truncate { @[comm, (int64)((struct
> > file *)arg0)->f_path.dentry->d_inode->i_writecount.counter] = count();
> > }'
> >
> > @[a.out, 1]: 1
> >
> > i.e., get_write_access in handle_truncate transitioned the count 0 -> 1
> >
> > but then what prevents the following race:
> >
> > CPU0                        CPU1
> > open("test")                execve("test")
> > handle_truncate             do_open_execat
> >                             exe_file_deny_write_access # should succeed as count is 0?
> > get_write_access # should fail as the count is now -1?
>
> I'm not arguing that get_write_access() cannot fail. I'm arguing that it
> cannot hit that WARN_ON() as you said above because get_write_access()
> returns either 0 or -ETXTBSY.
Oops, right:
    error = handle_truncate(idmap, file);
    if (unlikely(error > 0)) {
        WARN_ON(1);
        error = -EINVAL;
    }
I mentally had it warn on any error.
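The point settled above can be checked with a small user-space model of the write-count protocol. This is a sketch built on C11 atomics, not the kernel implementation: inc_unless_negative() and dec_unless_positive() are hypothetical re-creations of the kernel helpers, and model_deny_write_access() is an assumption modeled on the "count is now -1" step in the race diagram quoted earlier.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Increment *v unless it is negative; returns true if incremented. */
static bool inc_unless_negative(atomic_int *v)
{
	int old = atomic_load(v);

	do {
		if (old < 0)
			return false;
		/* On CAS failure 'old' is reloaded and rechecked. */
	} while (!atomic_compare_exchange_weak(v, &old, old + 1));
	return true;
}

/* Decrement *v unless it is positive; returns true if decremented. */
static bool dec_unless_positive(atomic_int *v)
{
	int old = atomic_load(v);

	do {
		if (old > 0)
			return false;
	} while (!atomic_compare_exchange_weak(v, &old, old - 1));
	return true;
}

/* Same shape as the get_write_access() quoted in the thread. */
static int model_get_write_access(atomic_int *writecount)
{
	return inc_unless_negative(writecount) ? 0 : -ETXTBSY;
}

/* Assumed counterpart taken on the execve() side. */
static int model_deny_write_access(atomic_int *writecount)
{
	return dec_unless_positive(writecount) ? 0 : -ETXTBSY;
}

/* Usage: execve() wins the race, so the truncating open fails. */
static void writecount_race_demo(void)
{
	atomic_int writecount = 0;

	assert(model_deny_write_access(&writecount) == 0);       /* 0 -> -1 */
	assert(model_get_write_access(&writecount) == -ETXTBSY); /* loses */
}
```

Whichever side wins, the loser sees a negative errno and never a positive value, so a check of the form `error > 0` cannot fire from this path.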
* Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Paulo Alcantara @ 2026-04-28 14:01 UTC (permalink / raw)
To: Stefan Metzmacher, Christian Brauner, Jori Koolstra, Jeff Layton
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, Alexander Viro, Arnd Bergmann, H . Peter Anvin,
Jan Kara, Peter Zijlstra, Andrey Albershteyn, Masami Hiramatsu,
Jiri Olsa, Thomas Weißschuh, Mathieu Desnoyers, Aleksa Sarai,
cmirabil, Greg Kroah-Hartman, linux-kernel, linux-fsdevel,
linux-api, linux-arch
In-Reply-To: <b97874a9-9fef-4f7c-8505-cc23e4b45355@samba.org>
Stefan Metzmacher <metze@samba.org> writes:
> Am 27.04.26 um 17:48 schrieb Christian Brauner:
>> On Sun, Apr 12, 2026 at 03:54:33PM +0200, Jori Koolstra wrote:
>>> Currently there is no way to race-freely create and open a directory.
>>> For regular files we have open(O_CREAT) for creating a new file inode,
>>> and returning a pinning fd to it. The lack of such functionality for
>>> directories means that when populating a directory tree there's always
>>> a race involved: the inodes first need to be created, and then opened
>>> to adjust their permissions/ownership/labels/timestamps/acls/xattrs/...,
>>> but in the time window between the creation and the opening they might
>>> be replaced by something else.
>>>
>>> Addressing this race without proper APIs is possible (by immediately
>>> fstat()ing what was opened, to verify that it has the right inode type),
>>> but difficult to get right. Hence, mkdirat2() that creates a directory
>>> and returns an O_DIRECTORY fd is useful.
>>>
>>> This feature idea (and description) is taken from the UAPI group:
>>> https://github.com/uapi-group/kernel-features?tab=readme-ov-file#race-free-creation-and-opening-of-non-file-inodes
>>>
>>> Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
>>> ---
>>> arch/x86/entry/syscalls/syscall_64.tbl | 1 +
>>> fs/internal.h | 2 ++
>>> fs/namei.c | 44 +++++++++++++++++++++++---
>>> include/linux/syscalls.h | 2 ++
>>> include/uapi/asm-generic/unistd.h | 5 ++-
>>> scripts/syscall.tbl | 1 +
>>> 6 files changed, 50 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
>>> index 524155d655da..e200ca2067a4 100644
>>> --- a/arch/x86/entry/syscalls/syscall_64.tbl
>>> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
>>> @@ -396,6 +396,7 @@
>>> 469 common file_setattr sys_file_setattr
>>> 470 common listns sys_listns
>>> 471 common rseq_slice_yield sys_rseq_slice_yield
>>> +472 common mkdirat2 sys_mkdirat2
>>>
>>> #
>>> # Due to a historical design error, certain syscalls are numbered differently
>>> diff --git a/fs/internal.h b/fs/internal.h
>>> index cbc384a1aa09..c6a79afadacf 100644
>>> --- a/fs/internal.h
>>> +++ b/fs/internal.h
>>> @@ -59,6 +59,8 @@ int may_linkat(struct mnt_idmap *idmap, const struct path *link);
>>> int filename_renameat2(int olddfd, struct filename *oldname, int newdfd,
>>> struct filename *newname, unsigned int flags);
>>> int filename_mkdirat(int dfd, struct filename *name, umode_t mode);
>>> +struct file *do_file_mkdirat(int dfd, struct filename *name, umode_t mode,
>>> + unsigned int flags, bool open);
>>> int filename_mknodat(int dfd, struct filename *name, umode_t mode, unsigned int dev);
>>> int filename_symlinkat(struct filename *from, int newdfd, struct filename *to);
>>> int filename_linkat(int olddfd, struct filename *old, int newdfd,
>>> diff --git a/fs/namei.c b/fs/namei.c
>>> index a880454a6415..6451e96dc225 100644
>>> --- a/fs/namei.c
>>> +++ b/fs/namei.c
>>> @@ -5255,18 +5255,36 @@ struct dentry *vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
>>> }
>>> EXPORT_SYMBOL(vfs_mkdir);
>>>
>>> -int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
>>> +static int mkdirat_lookup_flags(unsigned int flags)
>>> +{
>>> + int lookup_flags = LOOKUP_DIRECTORY;
>>> +
>>> + if (!(flags & AT_SYMLINK_NOFOLLOW))
>>> + lookup_flags |= LOOKUP_FOLLOW;
>>> + if (!(flags & AT_NO_AUTOMOUNT))
>>> + lookup_flags |= LOOKUP_AUTOMOUNT;
>>> +
>>> + return lookup_flags;
>>> +}
>>> +
>>> +int filename_mkdirat(int dfd, struct filename *name, umode_t mode) {
>>> + return PTR_ERR_OR_ZERO(do_file_mkdirat(dfd, name, mode, 0, false));
>>> +}
>>> +
>>> +struct file *do_file_mkdirat(int dfd, struct filename *name, umode_t mode,
>>> + unsigned int flags, bool open)
>>> {
>>> struct dentry *dentry;
>>> struct path path;
>>> int error;
>>> - unsigned int lookup_flags = LOOKUP_DIRECTORY;
>>> + struct file *filp = NULL;
>>> + unsigned int lookup_flags = mkdirat_lookup_flags(flags);
>>> struct delegated_inode delegated_inode = { };
>>>
>>> retry:
>>> dentry = filename_create(dfd, name, &path, lookup_flags);
>>> if (IS_ERR(dentry))
>>> - return PTR_ERR(dentry);
>>> + return ERR_CAST(dentry);
>>>
>>> error = security_path_mkdir(&path, dentry,
>>> mode_strip_umask(path.dentry->d_inode, mode));
>>> @@ -5276,6 +5294,10 @@ int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
>>> if (IS_ERR(dentry))
>>> error = PTR_ERR(dentry);
>>> }
>>> + if (open && !error && !is_delegated(&delegated_inode)) {
>>> + const struct path new_path = { .mnt = path.mnt, .dentry = dentry };
>>> + filp = dentry_open(&new_path, O_DIRECTORY, current_cred());
>>> + }
>>
>> So definitely a patchset worth doing but this will be hairy. And
>> Mateusz is right. As written this doesn't work. The canonical pattern
>> how e.g., dentry_open() does it is to preallocate the file.
>>
>> I do wonder though whether we shouldn't just make O_CREAT | O_DIRECTORY
>> work. I remember that I had a vague comment about this in [1] a few
>> years ago (cf. [1]). It might even be less hairy to get that one right
>> as all the thinking for O_CREAT is already there.
>>
>> What was the rationale for mkdirat2() instead of threading this through
>> openat()/openat2() with O_CREAT?
>>
>> And side-question: @Jeff, can nfs atomic open deal with O_CREAT |
>> O_DIRECTORY?
>
> If it helps the SMB2/3 protocol only has a single SMB2 Create operation
> that uses FILE_CREATE+FILE_NON_DIRECTORY_FILE or FILE_CREATE+FILE_DIRECTORY_FILE.
Yes. However cifs.ko will handle atomic open of regular files only.
IIRC, NFS doesn't handle atomic opens of directories either. Jeff
could confirm that.
* Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Stefan Metzmacher @ 2026-04-28 13:49 UTC (permalink / raw)
To: Christian Brauner, Jori Koolstra, Jeff Layton
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, Alexander Viro, Arnd Bergmann, H . Peter Anvin,
Jan Kara, Peter Zijlstra, Andrey Albershteyn, Masami Hiramatsu,
Jiri Olsa, Thomas Weißschuh, Mathieu Desnoyers, Aleksa Sarai,
cmirabil, Greg Kroah-Hartman, linux-kernel, linux-fsdevel,
linux-api, linux-arch
In-Reply-To: <b97874a9-9fef-4f7c-8505-cc23e4b45355@samba.org>
Am 28.04.26 um 15:39 schrieb Stefan Metzmacher:
> Am 27.04.26 um 17:48 schrieb Christian Brauner:
>> On Sun, Apr 12, 2026 at 03:54:33PM +0200, Jori Koolstra wrote:
>>> Currently there is no way to race-freely create and open a directory.
>>> For regular files we have open(O_CREAT) for creating a new file inode,
>>> and returning a pinning fd to it. The lack of such functionality for
>>> directories means that when populating a directory tree there's always
>>> a race involved: the inodes first need to be created, and then opened
>>> to adjust their permissions/ownership/labels/timestamps/acls/xattrs/...,
>>> but in the time window between the creation and the opening they might
>>> be replaced by something else.
>>>
>>> Addressing this race without proper APIs is possible (by immediately
>>> fstat()ing what was opened, to verify that it has the right inode type),
>>> but difficult to get right. Hence, mkdirat2() that creates a directory
>>> and returns an O_DIRECTORY fd is useful.
>>>
>>> This feature idea (and description) is taken from the UAPI group:
>>> https://github.com/uapi-group/kernel-features?tab=readme-ov-file#race-free-creation-and-opening-of-non-file-inodes
>>>
>>> Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
>>> ---
>>> arch/x86/entry/syscalls/syscall_64.tbl | 1 +
>>> fs/internal.h | 2 ++
>>> fs/namei.c | 44 +++++++++++++++++++++++---
>>> include/linux/syscalls.h | 2 ++
>>> include/uapi/asm-generic/unistd.h | 5 ++-
>>> scripts/syscall.tbl | 1 +
>>> 6 files changed, 50 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
>>> index 524155d655da..e200ca2067a4 100644
>>> --- a/arch/x86/entry/syscalls/syscall_64.tbl
>>> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
>>> @@ -396,6 +396,7 @@
>>> 469 common file_setattr sys_file_setattr
>>> 470 common listns sys_listns
>>> 471 common rseq_slice_yield sys_rseq_slice_yield
>>> +472 common mkdirat2 sys_mkdirat2
>>> #
>>> # Due to a historical design error, certain syscalls are numbered differently
>>> diff --git a/fs/internal.h b/fs/internal.h
>>> index cbc384a1aa09..c6a79afadacf 100644
>>> --- a/fs/internal.h
>>> +++ b/fs/internal.h
>>> @@ -59,6 +59,8 @@ int may_linkat(struct mnt_idmap *idmap, const struct path *link);
>>> int filename_renameat2(int olddfd, struct filename *oldname, int newdfd,
>>> struct filename *newname, unsigned int flags);
>>> int filename_mkdirat(int dfd, struct filename *name, umode_t mode);
>>> +struct file *do_file_mkdirat(int dfd, struct filename *name, umode_t mode,
>>> + unsigned int flags, bool open);
>>> int filename_mknodat(int dfd, struct filename *name, umode_t mode, unsigned int dev);
>>> int filename_symlinkat(struct filename *from, int newdfd, struct filename *to);
>>> int filename_linkat(int olddfd, struct filename *old, int newdfd,
>>> diff --git a/fs/namei.c b/fs/namei.c
>>> index a880454a6415..6451e96dc225 100644
>>> --- a/fs/namei.c
>>> +++ b/fs/namei.c
>>> @@ -5255,18 +5255,36 @@ struct dentry *vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
>>> }
>>> EXPORT_SYMBOL(vfs_mkdir);
>>> -int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
>>> +static int mkdirat_lookup_flags(unsigned int flags)
>>> +{
>>> + int lookup_flags = LOOKUP_DIRECTORY;
>>> +
>>> + if (!(flags & AT_SYMLINK_NOFOLLOW))
>>> + lookup_flags |= LOOKUP_FOLLOW;
>>> + if (!(flags & AT_NO_AUTOMOUNT))
>>> + lookup_flags |= LOOKUP_AUTOMOUNT;
>>> +
>>> + return lookup_flags;
>>> +}
>>> +
>>> +int filename_mkdirat(int dfd, struct filename *name, umode_t mode) {
>>> + return PTR_ERR_OR_ZERO(do_file_mkdirat(dfd, name, mode, 0, false));
>>> +}
>>> +
>>> +struct file *do_file_mkdirat(int dfd, struct filename *name, umode_t mode,
>>> + unsigned int flags, bool open)
>>> {
>>> struct dentry *dentry;
>>> struct path path;
>>> int error;
>>> - unsigned int lookup_flags = LOOKUP_DIRECTORY;
>>> + struct file *filp = NULL;
>>> + unsigned int lookup_flags = mkdirat_lookup_flags(flags);
>>> struct delegated_inode delegated_inode = { };
>>> retry:
>>> dentry = filename_create(dfd, name, &path, lookup_flags);
>>> if (IS_ERR(dentry))
>>> - return PTR_ERR(dentry);
>>> + return ERR_CAST(dentry);
>>> error = security_path_mkdir(&path, dentry,
>>> mode_strip_umask(path.dentry->d_inode, mode));
>>> @@ -5276,6 +5294,10 @@ int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
>>> if (IS_ERR(dentry))
>>> error = PTR_ERR(dentry);
>>> }
>>> + if (open && !error && !is_delegated(&delegated_inode)) {
>>> + const struct path new_path = { .mnt = path.mnt, .dentry = dentry };
>>> + filp = dentry_open(&new_path, O_DIRECTORY, current_cred());
>>> + }
>>
>> So definitely a patchset worth doing but this will be hairy. And
>> Mateusz is right. As written this doesn't work. The canonical pattern
>> how e.g., dentry_open() does it is to preallocate the file.
>>
>> I do wonder though whether we shouldn't just make O_CREAT | O_DIRECTORY
>> work. I remember that I had a vague comment about this in [1] a few
>> years ago (cf. [1]). It might even be less hairy to get that one right
>> as all the thinking for O_CREAT is already there.
>>
>> What was the rationale for mkdirat2() instead of threading this through
>> openat()/openat2() with O_CREAT?
>>
>> And side-question: @Jeff, can nfs atomic open deal with O_CREAT |
>> O_DIRECTORY?
>
> If it helps the SMB2/3 protocol only has a single SMB2 Create operation
> that uses FILE_CREATE+FILE_NON_DIRECTORY_FILE or FILE_CREATE+FILE_DIRECTORY_FILE.
>
> Given that openat() ignores unknown flags or combinations, maybe this
> should be openat2 only and even a new flag (at least for the userspace
> interface), or do_sys_open() will reject it for open and openat.
I just found the interaction of __O_TMPFILE and O_DIRECTORY;
there should be an O_MKDIR or something similar that's openat2 only.
> While we're there an O_TMPDIR would also be wonderful to have.
> Currently samba works around it by using a hidden directory name, invisible
> for SMB clients, but nfs and local users see it.
That should also be openat2 only if added.
metze
* Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Stefan Metzmacher @ 2026-04-28 13:39 UTC (permalink / raw)
To: Christian Brauner, Jori Koolstra, Jeff Layton
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, Alexander Viro, Arnd Bergmann, H . Peter Anvin,
Jan Kara, Peter Zijlstra, Andrey Albershteyn, Masami Hiramatsu,
Jiri Olsa, Thomas Weißschuh, Mathieu Desnoyers, Aleksa Sarai,
cmirabil, Greg Kroah-Hartman, linux-kernel, linux-fsdevel,
linux-api, linux-arch
In-Reply-To: <20260427-umlegen-aufbau-ee3a97f1528a@brauner>
Am 27.04.26 um 17:48 schrieb Christian Brauner:
> On Sun, Apr 12, 2026 at 03:54:33PM +0200, Jori Koolstra wrote:
>> Currently there is no way to race-freely create and open a directory.
>> For regular files we have open(O_CREAT) for creating a new file inode,
>> and returning a pinning fd to it. The lack of such functionality for
>> directories means that when populating a directory tree there's always
>> a race involved: the inodes first need to be created, and then opened
>> to adjust their permissions/ownership/labels/timestamps/acls/xattrs/...,
>> but in the time window between the creation and the opening they might
>> be replaced by something else.
>>
>> Addressing this race without proper APIs is possible (by immediately
>> fstat()ing what was opened, to verify that it has the right inode type),
>> but difficult to get right. Hence, mkdirat2() that creates a directory
>> and returns an O_DIRECTORY fd is useful.
>>
>> This feature idea (and description) is taken from the UAPI group:
>> https://github.com/uapi-group/kernel-features?tab=readme-ov-file#race-free-creation-and-opening-of-non-file-inodes
>>
>> Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
>> ---
>> arch/x86/entry/syscalls/syscall_64.tbl | 1 +
>> fs/internal.h | 2 ++
>> fs/namei.c | 44 +++++++++++++++++++++++---
>> include/linux/syscalls.h | 2 ++
>> include/uapi/asm-generic/unistd.h | 5 ++-
>> scripts/syscall.tbl | 1 +
>> 6 files changed, 50 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
>> index 524155d655da..e200ca2067a4 100644
>> --- a/arch/x86/entry/syscalls/syscall_64.tbl
>> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
>> @@ -396,6 +396,7 @@
>> 469 common file_setattr sys_file_setattr
>> 470 common listns sys_listns
>> 471 common rseq_slice_yield sys_rseq_slice_yield
>> +472 common mkdirat2 sys_mkdirat2
>>
>> #
>> # Due to a historical design error, certain syscalls are numbered differently
>> diff --git a/fs/internal.h b/fs/internal.h
>> index cbc384a1aa09..c6a79afadacf 100644
>> --- a/fs/internal.h
>> +++ b/fs/internal.h
>> @@ -59,6 +59,8 @@ int may_linkat(struct mnt_idmap *idmap, const struct path *link);
>> int filename_renameat2(int olddfd, struct filename *oldname, int newdfd,
>> struct filename *newname, unsigned int flags);
>> int filename_mkdirat(int dfd, struct filename *name, umode_t mode);
>> +struct file *do_file_mkdirat(int dfd, struct filename *name, umode_t mode,
>> + unsigned int flags, bool open);
>> int filename_mknodat(int dfd, struct filename *name, umode_t mode, unsigned int dev);
>> int filename_symlinkat(struct filename *from, int newdfd, struct filename *to);
>> int filename_linkat(int olddfd, struct filename *old, int newdfd,
>> diff --git a/fs/namei.c b/fs/namei.c
>> index a880454a6415..6451e96dc225 100644
>> --- a/fs/namei.c
>> +++ b/fs/namei.c
>> @@ -5255,18 +5255,36 @@ struct dentry *vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
>> }
>> EXPORT_SYMBOL(vfs_mkdir);
>>
>> -int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
>> +static int mkdirat_lookup_flags(unsigned int flags)
>> +{
>> + int lookup_flags = LOOKUP_DIRECTORY;
>> +
>> + if (!(flags & AT_SYMLINK_NOFOLLOW))
>> + lookup_flags |= LOOKUP_FOLLOW;
>> + if (!(flags & AT_NO_AUTOMOUNT))
>> + lookup_flags |= LOOKUP_AUTOMOUNT;
>> +
>> + return lookup_flags;
>> +}
>> +
>> +int filename_mkdirat(int dfd, struct filename *name, umode_t mode) {
>> + return PTR_ERR_OR_ZERO(do_file_mkdirat(dfd, name, mode, 0, false));
>> +}
>> +
>> +struct file *do_file_mkdirat(int dfd, struct filename *name, umode_t mode,
>> + unsigned int flags, bool open)
>> {
>> struct dentry *dentry;
>> struct path path;
>> int error;
>> - unsigned int lookup_flags = LOOKUP_DIRECTORY;
>> + struct file *filp = NULL;
>> + unsigned int lookup_flags = mkdirat_lookup_flags(flags);
>> struct delegated_inode delegated_inode = { };
>>
>> retry:
>> dentry = filename_create(dfd, name, &path, lookup_flags);
>> if (IS_ERR(dentry))
>> - return PTR_ERR(dentry);
>> + return ERR_CAST(dentry);
>>
>> error = security_path_mkdir(&path, dentry,
>> mode_strip_umask(path.dentry->d_inode, mode));
>> @@ -5276,6 +5294,10 @@ int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
>> if (IS_ERR(dentry))
>> error = PTR_ERR(dentry);
>> }
>> + if (open && !error && !is_delegated(&delegated_inode)) {
>> + const struct path new_path = { .mnt = path.mnt, .dentry = dentry };
>> + filp = dentry_open(&new_path, O_DIRECTORY, current_cred());
>> + }
>
> So definitely a patchset worth doing but this will be hairy. And
> Mateusz is right. As written this doesn't work. The canonical pattern
> how e.g., dentry_open() does it is to preallocate the file.
>
> I do wonder though whether we shouldn't just make O_CREAT | O_DIRECTORY
> work. I remember that I had a vague comment about this in [1] a few
> years ago (cf. [1]). It might even be less hairy to get that one right
> as all the thinking for O_CREAT is already there.
>
> What was the rationale for mkdirat2() instead of threading this through
> openat()/openat2() with O_CREAT?
>
> And side-question: @Jeff, can nfs atomic open deal with O_CREAT |
> O_DIRECTORY?
If it helps the SMB2/3 protocol only has a single SMB2 Create operation
that uses FILE_CREATE+FILE_NON_DIRECTORY_FILE or FILE_CREATE+FILE_DIRECTORY_FILE.
Given that openat() ignores unknown flags or combinations, maybe this
should be openat2 only and even a new flag (at least for the userspace
interface), or do_sys_open() will reject it for open and openat.
While we're there an O_TMPDIR would also be wonderful to have.
Currently samba works around it by using a hidden directory name, invisible
for SMB clients, but nfs and local users see it.
metze
* Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Christian Brauner @ 2026-04-28 8:55 UTC (permalink / raw)
To: Mateusz Guzik
Cc: Jori Koolstra, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, Alexander Viro, Arnd Bergmann,
H . Peter Anvin, Jan Kara, Peter Zijlstra, Andrey Albershteyn,
Masami Hiramatsu, Jiri Olsa, Thomas Weißschuh,
Mathieu Desnoyers, Jeff Layton, Aleksa Sarai, cmirabil,
Greg Kroah-Hartman, linux-kernel, linux-fsdevel, linux-api,
linux-arch
In-Reply-To: <CAGudoHFLSHhDZoC6maLsn234dMQVnG4ZbpKXoVrueGujArNF-A@mail.gmail.com>
On Mon, Apr 27, 2026 at 06:30:42PM +0200, Mateusz Guzik wrote:
> On Mon, Apr 27, 2026 at 5:14 PM Christian Brauner <brauner@kernel.org> wrote:
> >
> > > Things proceed to handle_truncate:
> > > int error = get_write_access(inode);
> > > if (error)
> > > return error;
> > >
> > > error = security_file_truncate(filp);
> > > if (!error) {
> > > error = do_truncate(idmap, path->dentry, 0,
> > > ATTR_MTIME|ATTR_CTIME|ATTR_OPEN,
> > > filp);
> > > }
> > >
> > > I'm going to ignore the LSM situation and do_truncate failure modes in this one.
> > >
> > > AFAICS nothing prevents the same user from racing against file creation to
> > > execve it, which starts with exe_file_deny_write_access. Should the
> > > other thread win the race, get_write_access will fail and the WARN_ON
> > > splat will be generated. That is definitely a problem.
> >
> > That can't happen:
> >
> > static inline int get_write_access(struct inode *inode)
> > {
> > return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
> > }
> >
> > and the check is:
> >
> > error = handle_truncate(idmap, file);
> > if (unlikely(error > 0)) {
> >
> > This was a catch all for broken LSM hook or ->open() instance.
> >
>
> So with this prog:
> #include <fcntl.h>
>
> int main(void)
> {
> open("test", O_TRUNC);
> }
>
> I verified writecount is 0 on entry to handle_truncate like so:
>
> bpftrace -e 'kprobe:security_file_truncate { @[comm, (int64)((struct
> file *)arg0)->f_path.dentry->d_inode->i_writecount.counter] = count();
> }'
>
> @[a.out, 1]: 1
>
> i.e., get_write_access in handle_truncate transitioned the count 0 -> 1
>
> but then what prevents the following race:
>
> CPU0                          CPU1
> open("test")                  execve("test")
> handle_truncate               do_open_execat
>                               exe_file_deny_write_access # should succeed as count is 0?
> get_write_access # should fail as the count is now -1?
I'm not arguing that get_write_access() cannot fail. I'm arguing that it
cannot hit that WARN_ON() as you said above because get_write_access()
returns either 0 or -ETXTBSY.
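The return-value argument can be checked with a small userspace model. This is only a sketch and not kernel code: the model_* names are hypothetical, C11 atomics stand in for the kernel's atomic_t helpers, and the deny side is assumed to be the mirror image of get_write_access() (decrement unless positive).

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for inode->i_writecount (hypothetical model type). */
struct model_inode {
	atomic_int writecount;
};

/* Increment only while the counter is >= 0, like atomic_inc_unless_negative(). */
static bool model_inc_unless_negative(atomic_int *v)
{
	int old = atomic_load(v);

	while (old >= 0) {
		/* On failure the CAS reloads 'old', so the sign is re-checked. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Decrement only while the counter is <= 0 (assumed shape of the deny side). */
static bool model_dec_unless_positive(atomic_int *v)
{
	int old = atomic_load(v);

	while (old <= 0) {
		if (atomic_compare_exchange_weak(v, &old, old - 1))
			return true;
	}
	return false;
}

/* Writer side: the only possible results are 0 and -ETXTBSY. */
static int model_get_write_access(struct model_inode *inode)
{
	assert(inode != NULL);
	return model_inc_unless_negative(&inode->writecount) ? 0 : -ETXTBSY;
}

/* Exec side: succeeds only when there are no writers. */
static int model_deny_write_access(struct model_inode *inode)
{
	assert(inode != NULL);
	return model_dec_unless_positive(&inode->writecount) ? 0 : -ETXTBSY;
}
```

In this model, if the exec side wins the race the counter drops to -1 and the writer gets -ETXTBSY. That value is negative, so a check of the form "error > 0" stays false.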
* Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Jeff Layton @ 2026-04-28 7:01 UTC (permalink / raw)
To: Christian Brauner, Jori Koolstra
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, Alexander Viro, Arnd Bergmann, H . Peter Anvin,
Jan Kara, Peter Zijlstra, Andrey Albershteyn, Masami Hiramatsu,
Jiri Olsa, Thomas Weißschuh, Mathieu Desnoyers, Aleksa Sarai,
cmirabil, Greg Kroah-Hartman, linux-kernel, linux-fsdevel,
linux-api, linux-arch
In-Reply-To: <0028f90b0d06cdfcf6b306941fdc3dcbe6c6ab0d.camel@kernel.org>
On Tue, 2026-04-28 at 07:39 +0100, Jeff Layton wrote:
> On Mon, 2026-04-27 at 17:48 +0200, Christian Brauner wrote:
> >
> >
> > And side-question: @Jeff, can nfs atomic open deal with O_CREAT |
> > O_DIRECTORY?
> >
>
> No, it can't. OPEN calls only work on regular files. This is why
> O_DIRECTORY works on NFS. If we end up issuing an OPEN against a
> directory, it'll fail, which is what we want in that situation.
To be clear, we could make that work by sending a second RPC:
PUTFH+OPEN+.... (OPEN fails with NFS4ERR_ISDIR)
...and then send:
PUTFH+CREATE...
...for a directory (which is how mkdir works in v4). If the calls race
with something else being created in its place, we could just open it
if it's a directory, or fail.
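As a rough userspace analogy of that fallback (not NFS client code, and not atomic), the "create, then open it if it is a directory, or fail" logic could look like the sketch below. The helper name mkdir_then_open() is hypothetical; it only uses mkdirat(), openat() and fstat(), and it still has the window between creation and open that this thread is about.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create 'name' under 'dfd' and return an fd for it.
 * If something already exists at that name, open it only if it is a
 * directory. Returns -1 with errno set on failure.
 */
static int mkdir_then_open(int dfd, const char *name, mode_t mode)
{
	struct stat st;
	int fd, saved;

	assert(name != NULL);

	/* Losing a creation race shows up as EEXIST; fall through to open. */
	if (mkdirat(dfd, name, mode) < 0 && errno != EEXIST)
		return -1;

	/* O_DIRECTORY rejects non-directories, O_NOFOLLOW rejects symlinks. */
	fd = openat(dfd, name, O_RDONLY | O_DIRECTORY | O_NOFOLLOW | O_CLOEXEC);
	if (fd < 0)
		return -1;

	/* Belt and braces: verify the type of what was actually opened. */
	if (fstat(fd, &st) < 0) {
		saved = errno;
		close(fd);
		errno = saved;
		return -1;
	}
	if (!S_ISDIR(st.st_mode)) {
		close(fd);
		errno = ENOTDIR;
		return -1;
	}
	return fd;
}
```

The returned fd pins whatever directory was opened, but nothing here proves it is the directory this caller created, which is the gap an atomic create-and-open would close.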
--
Jeff Layton <jlayton@kernel.org>
* Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Jeff Layton @ 2026-04-28 6:39 UTC (permalink / raw)
To: Christian Brauner, Jori Koolstra
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, Alexander Viro, Arnd Bergmann, H . Peter Anvin,
Jan Kara, Peter Zijlstra, Andrey Albershteyn, Masami Hiramatsu,
Jiri Olsa, Thomas Weißschuh, Mathieu Desnoyers, Aleksa Sarai,
cmirabil, Greg Kroah-Hartman, linux-kernel, linux-fsdevel,
linux-api, linux-arch
In-Reply-To: <20260427-umlegen-aufbau-ee3a97f1528a@brauner>
On Mon, 2026-04-27 at 17:48 +0200, Christian Brauner wrote:
>
>
> And side-question: @Jeff, can nfs atomic open deal with O_CREAT |
> O_DIRECTORY?
>
No, it can't. OPEN calls only work on regular files. This is why
O_DIRECTORY works on NFS. If we end up issuing an OPEN against a
directory, it'll fail, which is what we want in that situation.
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH v11 08/15] xfs: Report case sensitivity in fileattr_get
From: Chuck Lever @ 2026-04-28 1:32 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
linux-ext4, linux-xfs, linux-cifs, linux-nfs, linux-api,
linux-f2fs-devel, OGAWA Hirofumi, Namjae Jeon, Sungjong Seo,
Yuezhang Mo, almaz.alexandrovich, Viacheslav Dubeyko,
John Paul Adrian Glaubitz, frank.li, Theodore Tso, adilger.kernel,
Carlos Maiolino, Steve French, Paulo Alcantara, Ronnie Sahlberg,
Shyam Prasad N, Trond Myklebust, Anna Schumaker, Jaegeuk Kim,
Chao Yu, Hans de Goede, senozhatsky, Chuck Lever, Roland Mainz
In-Reply-To: <20260427155636.GC7751@frogsfrogsfrogs>
On Mon, Apr 27, 2026, at 11:56 AM, Darrick J. Wong wrote:
> On Fri, Apr 24, 2026 at 09:53:10PM -0400, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> Upper layers such as NFSD need to query whether a filesystem
>> is case-sensitive. Add FS_XFLAG_CASEFOLD to xfs_ip2xflags()
>> when the filesystem is formatted with the ASCIICI feature
>> flag. This serves both FS_IOC_FSGETXATTR (via xfs_fill_fsxattr() in
>> xfs_fileattr_get()) and XFS_IOC_BULKSTAT (which populates bs_xflags
>> directly from xfs_ip2xflags()), so bulkstat consumers and per-inode
>> queries see a consistent view of the filesystem's case-folding
>> behavior.
>>
>> XFS always preserves case. XFS is case-sensitive by default, but
>> supports ASCII case-insensitive lookups when formatted with the
>> ASCIICI feature flag.
>>
>> Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
> I don't understand this at all. Yes, FS_XFLAG_CASEFOLD is readonly,
> but how does clearing FS_CASEFOLD_FL from the fileattr_get output
> (without clearing XFLAG_CASEFOLD!) solve anything? This makes the
> reported output inconsistent between fsgetxattr and getflags -- one
> reports case folding, the other reports no casefolding.
The masking is a misplaced reaction to a sashiko review on the
v9 predecessor of this patch [1], which pointed out that v9 set
FS_XFLAG_CASEFOLD in fa->fsx_xflags after xfs_fill_fsxattr() had
already synced fa->flags, leaving the two views inconsistent in
the other direction, and that bulkstat would miss the flag for
the same reason. Moving the injection into xfs_ip2xflags() fixed
both gaps -- but it also surfaced FS_CASEFOLD_FL on the legacy
view, so chattr's RMW through FS_IOC_SETFLAGS hits the EOPNOTSUPP
gate at the top of xfs_fileattr_set(). Hiding it from getflags
was the wrong place to address that.
> If you want to avoid fileattr_set returning EINVAL when setting
> attributes due to the casefold flag, then don't you want to check
> the flag state vs. xfs_has_asciici() in the *fileattr_set* path?
Yep. For v12 I’ll drop the fa->flags mask and add FS_CASEFOLD_FL
to the allowlist in xfs_fileattr_set(), gated on xfs_has_asciici(mp).
xfs_flags2diflags() already has no clause for CASEFOLD, so the
FSSETXATTR path silently no-ops it the same way it does for
FS_XFLAG_HASATTR, and FS_XFLAG_CASEFOLD is in FS_XFLAG_RDONLY_MASK
so FSSETXATTR strips it centrally. Both views then agree, and a
chattr round-trip is accepted as a no-op.
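The intended gate can be illustrated with a self-contained userspace model. This is a sketch under stated assumptions, not the v12 code: the M_* bit values, struct model_mount and model_fileattr_set_check() are all hypothetical names invented for the example, and the real allowlist in xfs_fileattr_set() contains more flags.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model flag bits; values are made up for the sketch, not uapi values. */
#define M_IMMUTABLE_FL	(1u << 0)
#define M_APPEND_FL	(1u << 1)
#define M_NOATIME_FL	(1u << 2)
#define M_CASEFOLD_FL	(1u << 3)
#define M_COMPR_FL	(1u << 4)	/* stands in for any unsupported flag */

/* Hypothetical stand-in for the mount's ASCIICI feature state. */
struct model_mount {
	bool asciici;
};

/* Returns 0 if 'flags' may be passed to the set path, else -EOPNOTSUPP. */
static int model_fileattr_set_check(const struct model_mount *mp,
				    uint32_t flags)
{
	uint32_t allowed = M_IMMUTABLE_FL | M_APPEND_FL | M_NOATIME_FL;

	assert(mp != NULL);

	/* Accept CASEFOLD only where the get path would have reported it. */
	if (mp->asciici)
		allowed |= M_CASEFOLD_FL;

	if (flags & ~allowed)
		return -EOPNOTSUPP;

	/* CASEFOLD changes no on-disk state here: the round-trip is a no-op. */
	return 0;
}
```

With this shape, a read-modify-write of the flags on an ASCIICI filesystem passes the gate, while an attempt to set the flag on a case-sensitive filesystem is still rejected.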
The hfsplus patch in this series carries the same pattern --
FS_XFLAG_CASEFOLD is set after fileattr_fill_flags() so that
FS_CASEFOLD_FL stays out of fa->flags and dodges the EOPNOTSUPP
gate in hfsplus_fileattr_set(). I will fix it the same way.
Thanks for the catch!
[1] https://sashiko.dev/#/patchset/20260422-case-sensitivity-v9-0-be023cc070e2@oracle.com?part=9
--
Chuck Lever
* Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Aleksa Sarai @ 2026-04-28 1:14 UTC (permalink / raw)
To: Christian Brauner
Cc: Jori Koolstra, Jeff Layton, Andy Lutomirski, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, Alexander Viro,
Arnd Bergmann, H . Peter Anvin, Jan Kara, Peter Zijlstra,
Andrey Albershteyn, Masami Hiramatsu, Jiri Olsa,
Thomas Weißschuh, Mathieu Desnoyers, cmirabil,
Greg Kroah-Hartman, linux-kernel, linux-fsdevel, linux-api,
linux-arch
In-Reply-To: <20260427-umlegen-aufbau-ee3a97f1528a@brauner>
On 2026-04-27, Christian Brauner <brauner@kernel.org> wrote:
> On Sun, Apr 12, 2026 at 03:54:33PM +0200, Jori Koolstra wrote:
> > + if (open && !error && !is_delegated(&delegated_inode)) {
> > + const struct path new_path = { .mnt = path.mnt, .dentry = dentry };
> > + filp = dentry_open(&new_path, O_DIRECTORY, current_cred());
> > + }
>
> So definitely a patchset worth doing but this will be hairy. And
> Mateusz is right. As written this doesn't work. The canonical pattern
> how e.g., dentry_open() does it is to preallocate the file.
>
> I do wonder though whether we shouldn't just make O_CREAT | O_DIRECTORY
> work. I remember that I had a vague comment about this in [1] a few
> years ago (cf. [1]). It might even be less hairy to get that one right
> as all the thinking for O_CREAT is already there.
That would be my preference, as it would also allow us to use RESOLVE_*
flags nicely.
> What was the rationale for mkdirat2() instead of threading this through
> openat()/openat2() with O_CREAT?
Mateusz said that he didn't like the idea of having more branches in
the open() paths; I think that ship has long since sailed tbh.
> And side-question: @Jeff, can nfs atomic open deal with O_CREAT |
> O_DIRECTORY?
>
> [1]: 43b450632676 ("open: return EINVAL for O_DIRECTORY | O_CREAT")
--
Aleksa Sarai
https://www.cyphar.com/
* Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Mateusz Guzik @ 2026-04-27 16:30 UTC (permalink / raw)
To: Christian Brauner
Cc: Jori Koolstra, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, Alexander Viro, Arnd Bergmann,
H . Peter Anvin, Jan Kara, Peter Zijlstra, Andrey Albershteyn,
Masami Hiramatsu, Jiri Olsa, Thomas Weißschuh,
Mathieu Desnoyers, Jeff Layton, Aleksa Sarai, cmirabil,
Greg Kroah-Hartman, linux-kernel, linux-fsdevel, linux-api,
linux-arch
In-Reply-To: <20260427-rudel-gipsabdruck-a7884db4ecea@brauner>
On Mon, Apr 27, 2026 at 5:14 PM Christian Brauner <brauner@kernel.org> wrote:
>
> > Things proceed to handle_truncate:
> > int error = get_write_access(inode);
> > if (error)
> > return error;
> >
> > error = security_file_truncate(filp);
> > if (!error) {
> > error = do_truncate(idmap, path->dentry, 0,
> > ATTR_MTIME|ATTR_CTIME|ATTR_OPEN,
> > filp);
> > }
> >
> > I'm going to ignore the LSM situation and do_truncate failure modes in this one.
> >
> > AFAICS nothing prevents the same user from racing against file creation to
> > execve it, which starts with exe_file_deny_write_access. Should the
> > other thread win the race, get_write_access will fail and the WARN_ON
> > splat will be generated. That is definitely a problem.
>
> That can't happen:
>
> static inline int get_write_access(struct inode *inode)
> {
> return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
> }
>
> and the check is:
>
> error = handle_truncate(idmap, file);
> if (unlikely(error > 0)) {
>
> This was a catch all for broken LSM hook or ->open() instance.
>
So with this prog:
#include <fcntl.h>
int main(void)
{
open("test", O_TRUNC);
}
I verified writecount is 0 on entry to handle_truncate like so:
bpftrace -e 'kprobe:security_file_truncate { @[comm, (int64)((struct
file *)arg0)->f_path.dentry->d_inode->i_writecount.counter] = count();
}'
@[a.out, 1]: 1
i.e., get_write_access in handle_truncate transitioned the count 0 -> 1
but then what prevents the following race:
CPU0                          CPU1
open("test")                  execve("test")
handle_truncate               do_open_execat
                              exe_file_deny_write_access # should succeed as count is 0?
get_write_access # should fail as the count is now -1?
* Re: [PATCH v11 08/15] xfs: Report case sensitivity in fileattr_get
From: Darrick J. Wong @ 2026-04-27 15:56 UTC (permalink / raw)
To: Chuck Lever
Cc: Al Viro, Christian Brauner, Jan Kara, linux-fsdevel, linux-ext4,
linux-xfs, linux-cifs, linux-nfs, linux-api, linux-f2fs-devel,
hirofumi, linkinjeon, sj1557.seo, yuezhang.mo,
almaz.alexandrovich, slava, glaubitz, frank.li, tytso,
adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-8-de5619beddaf@oracle.com>
On Fri, Apr 24, 2026 at 09:53:10PM -0400, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> Upper layers such as NFSD need to query whether a filesystem
> is case-sensitive. Add FS_XFLAG_CASEFOLD to xfs_ip2xflags()
> when the filesystem is formatted with the ASCIICI feature
> flag. This serves both FS_IOC_FSGETXATTR (via xfs_fill_fsxattr() in
> xfs_fileattr_get()) and XFS_IOC_BULKSTAT (which populates bs_xflags
> directly from xfs_ip2xflags()), so bulkstat consumers and per-inode
> queries see a consistent view of the filesystem's case-folding
> behavior.
>
> XFS always preserves case. XFS is case-sensitive by default, but
> supports ASCII case-insensitive lookups when formatted with the
> ASCIICI feature flag.
>
> Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> fs/xfs/libxfs/xfs_inode_util.c | 2 ++
> fs/xfs/xfs_ioctl.c | 7 +++++++
> 2 files changed, 9 insertions(+)
>
> diff --git a/fs/xfs/libxfs/xfs_inode_util.c b/fs/xfs/libxfs/xfs_inode_util.c
> index 551fa51befb6..82be54b6f8d3 100644
> --- a/fs/xfs/libxfs/xfs_inode_util.c
> +++ b/fs/xfs/libxfs/xfs_inode_util.c
> @@ -130,6 +130,8 @@ xfs_ip2xflags(
>
> if (xfs_inode_has_attr_fork(ip))
> flags |= FS_XFLAG_HASATTR;
> + if (xfs_has_asciici(ip->i_mount))
> + flags |= FS_XFLAG_CASEFOLD;
> return flags;
> }
>
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index ed9b4846c05f..5a58fb0bad2b 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -472,6 +472,13 @@ xfs_fill_fsxattr(
>
> fileattr_fill_xflags(fa, xfs_ip2xflags(ip));
>
> + /*
> + * FS_XFLAG_CASEFOLD is read-only; hide it from the legacy
> + * flags view so chattr's RMW cycle does not pass it back to
> + * xfs_fileattr_set().
> + */
> + fa->flags &= ~FS_CASEFOLD_FL;
I don't understand this at all. Yes, FS_XFLAG_CASEFOLD is readonly, but
how does clearing FS_CASEFOLD_FL from the fileattr_get output (without
clearing XFLAG_CASEFOLD!) solve anything? This makes the reported
output inconsistent between fsgetxattr and getflags -- one reports case
folding, the other reports no casefolding.
If you want to avoid fileattr_set returning EINVAL when setting
attributes due to the casefold flag, then don't you want to check the
flag state vs. xfs_has_asciici() in the *fileattr_set* path?
--D
> +
> if (ip->i_diflags & XFS_DIFLAG_EXTSIZE) {
> fa->fsx_extsize = XFS_FSB_TO_B(mp, ip->i_extsize);
> } else if (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) {
>
> --
> 2.53.0
>
>
* Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Christian Brauner @ 2026-04-27 15:48 UTC (permalink / raw)
To: Jori Koolstra, Jeff Layton
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, Alexander Viro, Arnd Bergmann, H . Peter Anvin,
Jan Kara, Peter Zijlstra, Andrey Albershteyn, Masami Hiramatsu,
Jiri Olsa, Thomas Weißschuh, Mathieu Desnoyers, Aleksa Sarai,
cmirabil, Greg Kroah-Hartman, linux-kernel, linux-fsdevel,
linux-api, linux-arch
In-Reply-To: <20260412135434.3095416-2-jkoolstra@xs4all.nl>
On Sun, Apr 12, 2026 at 03:54:33PM +0200, Jori Koolstra wrote:
> Currently there is no way to race-freely create and open a directory.
> For regular files we have open(O_CREAT) for creating a new file inode,
> and returning a pinning fd to it. The lack of such functionality for
> directories means that when populating a directory tree there's always
> a race involved: the inodes first need to be created, and then opened
> to adjust their permissions/ownership/labels/timestamps/acls/xattrs/...,
> but in the time window between the creation and the opening they might
> be replaced by something else.
>
> Addressing this race without proper APIs is possible (by immediately
> fstat()ing what was opened, to verify that it has the right inode type),
> but difficult to get right. Hence, mkdirat2() that creates a directory
> and returns an O_DIRECTORY fd is useful.
>
> This feature idea (and description) is taken from the UAPI group:
> https://github.com/uapi-group/kernel-features?tab=readme-ov-file#race-free-creation-and-opening-of-non-file-inodes
>
> Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
> ---
> arch/x86/entry/syscalls/syscall_64.tbl | 1 +
> fs/internal.h | 2 ++
> fs/namei.c | 44 +++++++++++++++++++++++---
> include/linux/syscalls.h | 2 ++
> include/uapi/asm-generic/unistd.h | 5 ++-
> scripts/syscall.tbl | 1 +
> 6 files changed, 50 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index 524155d655da..e200ca2067a4 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -396,6 +396,7 @@
> 469 common file_setattr sys_file_setattr
> 470 common listns sys_listns
> 471 common rseq_slice_yield sys_rseq_slice_yield
> +472 common mkdirat2 sys_mkdirat2
>
> #
> # Due to a historical design error, certain syscalls are numbered differently
> diff --git a/fs/internal.h b/fs/internal.h
> index cbc384a1aa09..c6a79afadacf 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -59,6 +59,8 @@ int may_linkat(struct mnt_idmap *idmap, const struct path *link);
> int filename_renameat2(int olddfd, struct filename *oldname, int newdfd,
> struct filename *newname, unsigned int flags);
> int filename_mkdirat(int dfd, struct filename *name, umode_t mode);
> +struct file *do_file_mkdirat(int dfd, struct filename *name, umode_t mode,
> + unsigned int flags, bool open);
> int filename_mknodat(int dfd, struct filename *name, umode_t mode, unsigned int dev);
> int filename_symlinkat(struct filename *from, int newdfd, struct filename *to);
> int filename_linkat(int olddfd, struct filename *old, int newdfd,
> diff --git a/fs/namei.c b/fs/namei.c
> index a880454a6415..6451e96dc225 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -5255,18 +5255,36 @@ struct dentry *vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
> }
> EXPORT_SYMBOL(vfs_mkdir);
>
> -int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
> +static int mkdirat_lookup_flags(unsigned int flags)
> +{
> + int lookup_flags = LOOKUP_DIRECTORY;
> +
> + if (!(flags & AT_SYMLINK_NOFOLLOW))
> + lookup_flags |= LOOKUP_FOLLOW;
> + if (!(flags & AT_NO_AUTOMOUNT))
> + lookup_flags |= LOOKUP_AUTOMOUNT;
> +
> + return lookup_flags;
> +}
> +
> +int filename_mkdirat(int dfd, struct filename *name, umode_t mode) {
> + return PTR_ERR_OR_ZERO(do_file_mkdirat(dfd, name, mode, 0, false));
> +}
> +
> +struct file *do_file_mkdirat(int dfd, struct filename *name, umode_t mode,
> + unsigned int flags, bool open)
> {
> struct dentry *dentry;
> struct path path;
> int error;
> - unsigned int lookup_flags = LOOKUP_DIRECTORY;
> + struct file *filp = NULL;
> + unsigned int lookup_flags = mkdirat_lookup_flags(flags);
> struct delegated_inode delegated_inode = { };
>
> retry:
> dentry = filename_create(dfd, name, &path, lookup_flags);
> if (IS_ERR(dentry))
> - return PTR_ERR(dentry);
> + return ERR_CAST(dentry);
>
> error = security_path_mkdir(&path, dentry,
> mode_strip_umask(path.dentry->d_inode, mode));
> @@ -5276,6 +5294,10 @@ int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
> if (IS_ERR(dentry))
> error = PTR_ERR(dentry);
> }
> + if (open && !error && !is_delegated(&delegated_inode)) {
> + const struct path new_path = { .mnt = path.mnt, .dentry = dentry };
> + filp = dentry_open(&new_path, O_DIRECTORY, current_cred());
> + }
So this is definitely a patchset worth doing, but it will be hairy. And
Mateusz is right: as written this doesn't work. The canonical pattern,
as used by e.g. dentry_open(), is to preallocate the file.
I do wonder though whether we shouldn't just make O_CREAT | O_DIRECTORY
work. I remember leaving a vague comment about this in [1] a few
years ago. It might even be less hairy to get that one right,
as all the thinking for O_CREAT is already there.
What was the rationale for mkdirat2() instead of threading this through
openat()/openat2() with O_CREAT?
And side-question: @Jeff, can nfs atomic open deal with O_CREAT |
O_DIRECTORY?
[1]: 43b450632676 ("open: return EINVAL for O_DIRECTORY | O_CREAT")
* Re: [PATCH v11 00/15] Exposing case folding behavior
From: Jan Kara @ 2026-04-27 15:30 UTC (permalink / raw)
To: Chuck Lever
Cc: Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel,
linux-ext4, linux-xfs, linux-cifs, linux-nfs, linux-api,
linux-f2fs-devel, OGAWA Hirofumi, Namjae Jeon, Sungjong Seo,
Yuezhang Mo, almaz.alexandrovich, Viacheslav Dubeyko,
John Paul Adrian Glaubitz, frank.li, Theodore Tso, adilger.kernel,
Carlos Maiolino, Steve French, Paulo Alcantara, Ronnie Sahlberg,
Shyam Prasad N, Trond Myklebust, Anna Schumaker, Jaegeuk Kim,
Chao Yu, Hans de Goede, senozhatsky, Chuck Lever, Darrick J. Wong,
Roland Mainz, Steve French
In-Reply-To: <af3f7518-7501-4c25-9bbc-a8fc8cdb4e29@app.fastmail.com>
On Mon 27-04-26 09:30:28, Chuck Lever wrote:
>
> On Mon, Apr 27, 2026, at 6:55 AM, Jan Kara wrote:
> > On Fri 24-04-26 21:53:02, Chuck Lever wrote:
> >> Changes since v10:
> >> - cifs: Source case-handling flags from the server's cached
> >> FS_ATTRIBUTE_INFORMATION reply instead of the nocase mount
> >> option, with a nocase fallback when the reply is absent
> >> - Address findings from sashiko(gemini-3) and gpt-5.5:
> >> - nfs: Skip pathconf case bits on NFSv4 (set via FATTR4_CASE_*
> >> instead)
> >> - xfs: Hide FS_CASEFOLD_FL from the legacy flags view so
> >> chattr round-trips do not hit the setflags whitelist
> >> - ext4, f2fs: Drop redundant fileattr_get patches; the
> >> FS_CASEFOLD_FL translation in fileattr_fill_flags() already
> >> reports FS_XFLAG_CASEFOLD for casefolded directories
> >
> > Err, how is this supposed to work? I wasn't able to find any code
> > transforming S_CASEFOLD inode flag into FS_CASEFOLD_FL on fileattr_get
> > path. Sure, fileattr_fill_flags() takes care of setting FS_XFLAG_CASEFOLD
> > once FS_CASEFOLD_FL is set. What am I missing?
>
> Agreed, that is a little surprising.
>
> The path does not go through S_CASEFOLD. Both filesystems
> report FS_CASEFOLD_FL straight from their on-disk flag word.
>
> For ext4, EXT4_CASEFOLD_FL is 0x40000000, the same bit value
> as FS_CASEFOLD_FL, and it is included in EXT4_FL_USER_VISIBLE.
> ext4_iget() loads it into ei->i_flags directly from
> raw_inode->i_flags (fs/ext4/inode.c:5358). ext4_fileattr_get()
> then masks with EXT4_FL_USER_VISIBLE and hands the result to
> fileattr_fill_flags(), which translates FS_CASEFOLD_FL into
> FS_XFLAG_CASEFOLD on the way out.
Oh, right. I missed how EXT4_CASEFOLD_FL propagates into FS_CASEFOLD_FL.
Thanks for clearing that up.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
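The propagation discussed in this message can be modelled in a few lines of userspace C. This is only a sketch of the data flow, not the kernel implementation: the 0x40000000 value is the one quoted in the thread, while `MODEL_XFLAG_CASEFOLD`, `MODEL_USER_VISIBLE_MASK`, the struct, and both functions are hypothetical stand-ins.

```c
#include <stdint.h>

#define MODEL_CASEFOLD_FL	0x40000000u	/* bit value quoted in the thread */
#define MODEL_XFLAG_CASEFOLD	0x00000001u	/* placeholder, not the real xflag value */
/* Illustrative mask only; the real EXT4_FL_USER_VISIBLE covers more bits. */
#define MODEL_USER_VISIBLE_MASK	(MODEL_CASEFOLD_FL | 0x000000ffu)

struct model_fileattr {
	uint32_t flags;		/* legacy FS_*_FL view */
	uint32_t fsx_xflags;	/* FS_XFLAG_* view */
};

/* Stand-in for the flags-to-xflags translation step. */
void model_fill_flags(struct model_fileattr *fa, uint32_t flags)
{
	fa->flags = flags;
	fa->fsx_xflags = 0;
	if (flags & MODEL_CASEFOLD_FL)
		fa->fsx_xflags |= MODEL_XFLAG_CASEFOLD;
}

/* Stand-in for the get path: mask the on-disk word, then translate. */
void model_ext4_fileattr_get(uint32_t raw_i_flags, struct model_fileattr *fa)
{
	model_fill_flags(fa, raw_i_flags & MODEL_USER_VISIBLE_MASK);
}
```

Because the on-disk bit and the legacy flag share a value, no S_CASEFOLD lookup is needed in this model: masking followed by translation is enough.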
* Re: [RFC PATCH v2 1/2] vfs: syscalls: add mkdirat2() that returns an O_DIRECTORY fd
From: Christian Brauner @ 2026-04-27 15:14 UTC (permalink / raw)
To: Mateusz Guzik
Cc: Jori Koolstra, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, Alexander Viro, Arnd Bergmann,
H . Peter Anvin, Jan Kara, Peter Zijlstra, Andrey Albershteyn,
Masami Hiramatsu, Jiri Olsa, Thomas Weißschuh,
Mathieu Desnoyers, Jeff Layton, Aleksa Sarai, cmirabil,
Greg Kroah-Hartman, linux-kernel, linux-fsdevel, linux-api,
linux-arch
In-Reply-To: <5xexygc3rvvlir4smdfn7gndwjgbuijqfummwwumivsnosijux@ygqs3iqxmovh>
> Things proceed to handle_truncate:
> int error = get_write_access(inode);
> if (error)
> return error;
>
> error = security_file_truncate(filp);
> if (!error) {
> error = do_truncate(idmap, path->dentry, 0,
> ATTR_MTIME|ATTR_CTIME|ATTR_OPEN,
> filp);
> }
>
> I'm going to ignore the LSM situation and do_truncate failure modes in this one.
>
> AFAICS nothing prevents the same user from racing against file creation to
> execve it, which starts with exe_file_deny_write_access. Should the
> other thread win the race, get_write_access will fail and the WARN_ON
> splat will be generated. That is definitely a problem.
That can't happen. get_write_access() only ever returns 0 or -ETXTBSY:
static inline int get_write_access(struct inode *inode)
{
	return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
}
and the check is:
	error = handle_truncate(idmap, file);
	if (unlikely(error > 0)) {
This was a catch-all for a broken LSM hook or ->open() instance.
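The argument can be checked with a small userspace model built on C11 atomics. This is a sketch, not kernel code: `model_inc_unless_negative()` and `model_get_write_access()` are hypothetical names, and a negative counter stands in for an inode whose write access has been denied.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Increment *v unless it is negative; report whether the increment happened. */
bool model_inc_unless_negative(atomic_int *v)
{
	int old = atomic_load(v);

	do {
		if (old < 0)
			return false;
		/* On failure, old is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(v, &old, old + 1));

	return true;
}

/* Mirrors the shape of the quoted helper: the result is 0 or -ETXTBSY. */
int model_get_write_access(atomic_int *writecount)
{
	return model_inc_unless_negative(writecount) ? 0 : -ETXTBSY;
}
```

Neither outcome is positive, so losing the race yields a negative errno and a check of the form `error > 0` is not reached through this path.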