Linux filesystem development
 help / color / mirror / Atom feed
From: Chuck Lever <cel@kernel.org>
To: Christian Brauner <brauner@kernel.org>
Cc: <linux-fsdevel@vger.kernel.org>,
	Chuck Lever <chuck.lever@oracle.com>,
	sashiko-bot <sashiko-bot@kernel.org>
Subject: [PATCH 7/7] nfsd: Cap case-folding probe cost across READDIR entries
Date: Fri, 15 May 2026 11:35:15 -0400	[thread overview]
Message-ID: <20260515153515.362266-8-cel@kernel.org> (raw)
In-Reply-To: <20260515153515.362266-1-cel@kernel.org>

From: Chuck Lever <chuck.lever@oracle.com>

NFSv4 READDIR carries a per-entry attrmask. When the attrmask
includes FATTR4_CASE_INSENSITIVE or FATTR4_CASE_PRESERVING,
nfsd4_encode_fattr4() resolves each non-directory child's case
attributes by calling nfsd_get_case_info(), which dget_parent()s
back to the directory being read and re-runs the cred swap and LSM
probe per child. The encoder amplifies a single answer into one
prepare_kernel_cred() allocation, two LSM hooks, and one put_cred()
RCU callback for every non-directory entry.

No mainstream NFSv4 client has been observed to populate a READDIR
attrmask with these attributes; the Linux client queries them only
via SERVER_CAPS at mount time. The exposure is therefore to test
clients exploring corner cases and to hostile clients that submit
an attrmask designed to multiply server work by rd_dircount.

Probe the directory being read once and cache the result on
struct nfsd4_readdir for use by every non-directory child. The
probe targets the readdir filehandle's dentry, which is held for
the duration of the request, rather than dget_parent() of a
child's locklessly-acquired dentry; the latter could be moved out
of the directory by a concurrent rename and report attributes
from an unrelated parent. Directory entries continue to be
queried individually, because casefold-capable filesystems (ext4,
f2fs) report case state per directory. The other callers of
nfsd4_encode_fattr4() (single GETATTR, the buffer wrapper) pass
NULL for the cache pointer and behave as before.

Reported-by: sashiko-bot <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260507-case-sensitivity-v14-0-e62cc8200435@oracle.com?part=14
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs4xdr.c | 55 +++++++++++++++++++++++++++++++++++++++--------
 fs/nfsd/xdr4.h    | 14 ++++++++++++
 2 files changed, 60 insertions(+), 9 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 319007b79d49..20355dc3f1d1 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3883,13 +3883,16 @@ static const nfsd4_enc_attr nfsd4_enc_fattr4_encode_ops[] = {
 
 /*
  * Note: @fhp can be NULL; in this case, we might have to compose the filehandle
- * ourselves.
+ * ourselves. @case_cache is NULL for callers that encode a single dentry
+ * (GETATTR, the buffer wrapper); READDIR passes a per-request cache so
+ * non-directory children share the parent's case-folding probe result.
  */
 static __be32
 nfsd4_encode_fattr4(struct svc_rqst *rqstp, struct xdr_stream *xdr,
 		    struct svc_fh *fhp, struct svc_export *exp,
 		    struct dentry *dentry, const u32 *bmval,
-		    int ignore_crossmnt)
+		    int ignore_crossmnt,
+		    struct nfsd_case_attrs_cache *case_cache)
 {
 	DECLARE_BITMAP(attr_bitmap, ARRAY_SIZE(nfsd4_enc_fattr4_encode_ops));
 	struct nfs4_delegation *dp = NULL;
@@ -3999,9 +4002,17 @@ nfsd4_encode_fattr4(struct svc_rqst *rqstp, struct xdr_stream *xdr,
 		args.fhp = fhp;
 	if (attrmask[0] & (FATTR4_WORD0_CASE_INSENSITIVE |
 			   FATTR4_WORD0_CASE_PRESERVING)) {
-		err = nfsd_get_case_info(dentry, &args.case_insensitive,
-					 &args.case_preserving);
 		/*
+		 * In a batched encoder (READDIR) every non-directory
+		 * child shares the same case-folding answer, so the
+		 * directory being read is probed once and the result is
+		 * cached. The probe targets case_cache->dir, the held
+		 * readdir filehandle's dentry, instead of the child's
+		 * locklessly-acquired dentry, which a concurrent rename
+		 * could move under an unrelated parent. Directory
+		 * entries are queried directly because casefold-capable
+		 * filesystems answer per directory.
+		 *
 		 * Per RFC 8881 Section 18.7.3, an attribute advertised
 		 * in SUPPORTED_ATTRS must come back with a value or the
 		 * GETATTR must fail. nfsd_get_case_info() fills POSIX
@@ -4011,8 +4022,24 @@ nfsd4_encode_fattr4(struct svc_rqst *rqstp, struct xdr_stream *xdr,
 		 * advertises. Other errors fail the operation as the
 		 * spec requires.
 		 */
-		if (err && err != -EOPNOTSUPP)
-			goto out_nfserr;
+		if (case_cache && !d_is_dir(dentry)) {
+			if (!case_cache->valid) {
+				err = nfsd_get_case_info(case_cache->dir,
+							 &case_cache->insensitive,
+							 &case_cache->preserving);
+				if (err && err != -EOPNOTSUPP)
+					goto out_nfserr;
+				case_cache->valid = true;
+			}
+			args.case_insensitive = case_cache->insensitive;
+			args.case_preserving = case_cache->preserving;
+		} else {
+			err = nfsd_get_case_info(dentry,
+						 &args.case_insensitive,
+						 &args.case_preserving);
+			if (err && err != -EOPNOTSUPP)
+				goto out_nfserr;
+		}
 	}
 
 	if (attrmask[0] & FATTR4_WORD0_ACL) {
@@ -4170,7 +4197,7 @@ __be32 nfsd4_encode_fattr_to_buf(__be32 **p, int words,
 
 	svcxdr_init_encode_from_buffer(&xdr, &dummy, *p, words << 2);
 	ret = nfsd4_encode_fattr4(rqstp, &xdr, fhp, exp, dentry, bmval,
-				  ignore_crossmnt);
+				  ignore_crossmnt, NULL);
 	*p = xdr.p;
 	return ret;
 }
@@ -4208,6 +4235,7 @@ nfsd4_encode_entry4_fattr(struct nfsd4_readdir *cd, const char *name,
 	struct dentry *dentry;
 	__be32 nfserr;
 	int ignore_crossmnt = 0;
+	bool crossed = false;
 
 	dentry = lookup_one_positive_unlocked(&nop_mnt_idmap,
 					      &QSTR_LEN(name, namlen),
@@ -4244,11 +4272,18 @@ nfsd4_encode_entry4_fattr(struct nfsd4_readdir *cd, const char *name,
 		nfserr = check_nfsd_access(exp, cd->rd_rqstp, false);
 		if (nfserr)
 			goto out_put;
+		crossed = true;
 
 	}
 out_encode:
+	/*
+	 * A crossed entry no longer shares a parent with the directory
+	 * being read, so it must neither consume nor populate the
+	 * per-readdir case-folding cache.
+	 */
 	nfserr = nfsd4_encode_fattr4(cd->rd_rqstp, cd->xdr, NULL, exp, dentry,
-				     cd->rd_bmval, ignore_crossmnt);
+				     cd->rd_bmval, ignore_crossmnt,
+				     crossed ? NULL : &cd->rd_case_cache);
 out_put:
 	dput(dentry);
 	exp_put(exp);
@@ -4495,7 +4530,7 @@ nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr,
 
 	/* obj_attributes */
 	return nfsd4_encode_fattr4(resp->rqstp, xdr, fhp, fhp->fh_export,
-				   fhp->fh_dentry, getattr->ga_bmval, 0);
+				   fhp->fh_dentry, getattr->ga_bmval, 0, NULL);
 }
 
 static __be32
@@ -5022,6 +5057,8 @@ static __be32 nfsd4_encode_dirlist4(struct xdr_stream *xdr,
 	readdir->rd_maxcount = maxcount;
 	readdir->common.err = 0;
 	readdir->cookie_offset = 0;
+	readdir->rd_case_cache.dir = readdir->rd_fhp->fh_dentry;
+	readdir->rd_case_cache.valid = false;
 	offset = readdir->rd_cookie;
 	status = nfsd_readdir(readdir->rd_rqstp, readdir->rd_fhp, &offset,
 			      &readdir->common, nfsd4_encode_entry4);
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 417e9ad9fbb3..615797df218f 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -432,6 +432,19 @@ struct nfsd4_read {
 	u32			rd_eof;             /* response */
 };
 
+/*
+ * Cache the case-folding properties of @dir so a batched encoder
+ * (e.g., READDIR) does not re-probe per child. @dir is the
+ * directory being read, held by the request, so it is stable
+ * against rename for the duration of the cache's lifetime.
+ */
+struct nfsd_case_attrs_cache {
+	struct dentry	*dir;
+	bool		valid;
+	bool		insensitive;
+	bool		preserving;
+};
+
 struct nfsd4_readdir {
 	u64		rd_cookie;          /* request */
 	nfs4_verifier	rd_verf;            /* request */
@@ -444,6 +457,7 @@ struct nfsd4_readdir {
 	struct readdir_cd	common;
 	struct xdr_stream	*xdr;
 	int			cookie_offset;
+	struct nfsd_case_attrs_cache rd_case_cache;
 };
 
 struct nfsd4_release_lockowner {
-- 
2.54.0


      parent reply	other threads:[~2026-05-15 15:35 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-15 15:35 [PATCH 0/7] Fixes for vfs/vfs-7.2.casefold Chuck Lever
2026-05-15 15:35 ` [PATCH 1/7] tools headers UAPI: Sync case-sensitivity flags from linux/fs.h Chuck Lever
2026-05-15 15:35 ` [PATCH 2/7] nfs: Avoid transient zeroed case capability bits during probe Chuck Lever
2026-05-15 15:35 ` [PATCH 3/7] nfs: Skip pathconf probe when neither field is consumed Chuck Lever
2026-05-15 15:35 ` [PATCH 4/7] fs: Clarify FS_CASEFOLD_FL semantics in UAPI header Chuck Lever
2026-05-15 15:35 ` [PATCH 5/7] nfsd: Use kernel credentials for case-info probe Chuck Lever
2026-05-15 15:35 ` [PATCH 6/7] nfsd: Map -ESTALE from case probe to NFS3ERR_STALE Chuck Lever
2026-05-15 15:35 ` Chuck Lever [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260515153515.362266-8-cel@kernel.org \
    --to=cel@kernel.org \
    --cc=brauner@kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=sashiko-bot@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox