Linux CIFS filesystem development
* [PATCH v4 01/19] cifs: change_conf needs to be called for session setup
@ 2026-05-01 11:20 nspmangalore
  2026-05-01 11:20 ` [PATCH v4 02/19] cifs: abort open_cached_dir if we don't request leases nspmangalore
                   ` (17 more replies)
  0 siblings, 18 replies; 22+ messages in thread
From: nspmangalore @ 2026-05-01 11:20 UTC (permalink / raw)
  To: linux-cifs, smfrench, pc, bharathsm, dhowells, henrique.carvalho,
	ematsumiya
  Cc: Shyam Prasad N, stable

From: Shyam Prasad N <sprasad@microsoft.com>

Today we skip calling change_conf for negotiate and session setup
requests. This can be a problem for multichannel (mchan), since the
request immediately following session setup on a new channel could be
an I/O issued on the mount point. For a single channel this is not a
problem, as several more requests follow session setup.

This change calls change_conf during session setup once the total
accumulated credits are enough to cover the reservations for echoes
and oplocks. We expect this to happen on the last session setup
response. This way, echoes and oplocks are not left disabled before
the first request to the server, so if that first request is an open,
it does not need to disable requesting leases.
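
For illustration, the new rebalancing condition reduces to roughly the
following simplified sketch (not the exact code; see the diff below):

	/* hypothetical helper, for illustration only */
	static bool should_rebalance(int optype, int in_flight, int total_credits)
	{
		if (in_flight != 0)
			return false;
		if ((optype & CIFS_OP_MASK) == CIFS_NEG_OP)
			return false;
		if ((optype & CIFS_OP_MASK) == CIFS_SESS_OP)
			return total_credits > 2; /* room to reserve echo/oplock credits */
		return true;
	}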

Cc: <stable@vger.kernel.org>
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
---
 fs/smb/client/smb2ops.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/smb/client/smb2ops.c b/fs/smb/client/smb2ops.c
index 509fcea28a429..a9d68e5fcea91 100644
--- a/fs/smb/client/smb2ops.c
+++ b/fs/smb/client/smb2ops.c
@@ -111,10 +111,21 @@ smb2_add_credits(struct TCP_Server_Info *server,
 				      cifs_trace_rw_credits_zero_in_flight);
 	}
 	server->in_flight--;
+
+	/*
+	 * Rebalance credits when an op drains in_flight. For session setup,
+	 * do this only when the total accumulated credits are high enough (>2)
+	 * so that a newly established secondary channel can reserve credits for
+	 * echoes and oplocks. We expect this to happen at the end of the final
+	 * session setup response.
+	 */
 	if (server->in_flight == 0 &&
 	   ((optype & CIFS_OP_MASK) != CIFS_NEG_OP) &&
 	   ((optype & CIFS_OP_MASK) != CIFS_SESS_OP))
 		rc = change_conf(server);
+	else if (server->in_flight == 0 &&
+		 ((optype & CIFS_OP_MASK) == CIFS_SESS_OP) && *val > 2)
+		rc = change_conf(server);
 	/*
 	 * Sometimes server returns 0 credits on oplock break ack - we need to
 	 * rebalance credits in this case.
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 02/19] cifs: abort open_cached_dir if we don't request leases
  2026-05-01 11:20 [PATCH v4 01/19] cifs: change_conf needs to be called for session setup nspmangalore
@ 2026-05-01 11:20 ` nspmangalore
  2026-05-06 14:16   ` Bharath SM
  2026-05-01 11:20 ` [PATCH v4 03/19] cifs: invalidate cfid on unlink/rename/rmdir nspmangalore
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 22+ messages in thread
From: nspmangalore @ 2026-05-01 11:20 UTC (permalink / raw)
  To: linux-cifs, smfrench, pc, bharathsm, dhowells, henrique.carvalho,
	ematsumiya
  Cc: Shyam Prasad N, stable

From: Shyam Prasad N <sprasad@microsoft.com>

SMB2_open_init may not set a lease context, depending on the
requested oplock level. This can happen when leases have been
temporarily or permanently disabled. When that happens,
open_cached_dir sends an open without a lease context, and the
response is rejected by open_cached_dir anyway (thereby forcing a
close to discard this open). That is two unnecessary round-trips to
the server.

This change adds a check before the open request is sent to the
server, to make sure that SMB2_open_init did add the expected lease
context to the open in open_cached_dir.

Cc: <stable@vger.kernel.org>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
---
 fs/smb/client/cached_dir.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
index 04bb95091f498..64e22c064fa0a 100644
--- a/fs/smb/client/cached_dir.c
+++ b/fs/smb/client/cached_dir.c
@@ -286,6 +286,14 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
 			    &rqst[0], &oplock, &oparms, utf16_path);
 	if (rc)
 		goto oshr_free;
+
+	if (oplock != SMB2_OPLOCK_LEVEL_II) {
+		rc = -EINVAL;
+		cifs_dbg(FYI, "%s: Oplock level %d not suitable for cached directory\n",
+			 __func__, oplock);
+		goto oshr_free;
+	}
+
 	smb2_set_next_command(tcon, &rqst[0]);
 
 	memset(&qi_iov, 0, sizeof(qi_iov));
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 03/19] cifs: invalidate cfid on unlink/rename/rmdir
  2026-05-01 11:20 [PATCH v4 01/19] cifs: change_conf needs to be called for session setup nspmangalore
  2026-05-01 11:20 ` [PATCH v4 02/19] cifs: abort open_cached_dir if we don't request leases nspmangalore
@ 2026-05-01 11:20 ` nspmangalore
  2026-05-01 11:20 ` [PATCH v4 04/19] cifs: define variable sized buffer for querydir responses nspmangalore
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: nspmangalore @ 2026-05-01 11:20 UTC (permalink / raw)
  To: linux-cifs, smfrench, pc, bharathsm, dhowells, henrique.carvalho,
	ematsumiya
  Cc: Shyam Prasad N, stable

From: Shyam Prasad N <sprasad@microsoft.com>

Today we do not invalidate the cached dirents or the entire parent
cfid when a dentry in a directory has been removed or moved.

This change invalidates the parent cfid so that we don't serve stale
directory contents from the cache.

Cc: <stable@vger.kernel.org>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
---
 fs/smb/client/inode.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/fs/smb/client/inode.c b/fs/smb/client/inode.c
index 888f9e35f14b8..f0b76670b0921 100644
--- a/fs/smb/client/inode.c
+++ b/fs/smb/client/inode.c
@@ -28,6 +28,23 @@
 #include "cached_dir.h"
 #include "reparse.h"
 
+static void cifs_invalidate_cached_dir(struct cifs_tcon *tcon,
+				       struct dentry *parent)
+{
+	struct cached_fid *parent_cfid = NULL;
+
+	if (!tcon || !parent)
+		return;
+
+	if (!open_cached_dir_by_dentry(tcon, parent, &parent_cfid)) {
+		mutex_lock(&parent_cfid->dirents.de_mutex);
+		parent_cfid->dirents.is_valid = false;
+		parent_cfid->dirents.is_failed = true;
+		mutex_unlock(&parent_cfid->dirents.de_mutex);
+		close_cached_dir(parent_cfid);
+	}
+}
+
 /*
  * Set parameters for the netfs library
  */
@@ -2067,6 +2084,9 @@ static int __cifs_unlink(struct inode *dir, struct dentry *dentry, bool sillyren
 		cifs_set_file_info(inode, attrs, xid, full_path, origattr);
 
 out_reval:
+	if (!rc && dentry->d_parent)
+		cifs_invalidate_cached_dir(tcon, dentry->d_parent);
+
 	if (inode) {
 		cifs_inode = CIFS_I(inode);
 		cifs_inode->time = 0;	/* will force revalidate to get info
@@ -2378,7 +2398,6 @@ int cifs_rmdir(struct inode *inode, struct dentry *direntry)
 	}
 
 	rc = server->ops->rmdir(xid, tcon, full_path, cifs_sb);
-	cifs_put_tlink(tlink);
 
 	cifsInode = CIFS_I(d_inode(direntry));
 
@@ -2388,6 +2407,8 @@ int cifs_rmdir(struct inode *inode, struct dentry *direntry)
 		i_size_write(d_inode(direntry), 0);
 		clear_nlink(d_inode(direntry));
 		spin_unlock(&d_inode(direntry)->i_lock);
+		if (direntry->d_parent)
+			cifs_invalidate_cached_dir(tcon, direntry->d_parent);
 	}
 
 	/* force revalidate to go get info when needed */
@@ -2402,6 +2423,7 @@ int cifs_rmdir(struct inode *inode, struct dentry *direntry)
 
 	inode_set_ctime_current(d_inode(direntry));
 	inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
+	cifs_put_tlink(tlink);
 
 rmdir_exit:
 	free_dentry_path(page);
@@ -2668,6 +2690,12 @@ cifs_rename2(struct mnt_idmap *idmap, struct inode *source_dir,
 	}
 
 	/* force revalidate to go get info when needed */
+	if (!rc) {
+		cifs_invalidate_cached_dir(tcon, source_dentry->d_parent);
+		if (target_dentry->d_parent != source_dentry->d_parent)
+			cifs_invalidate_cached_dir(tcon, target_dentry->d_parent);
+	}
+
 	CIFS_I(source_dir)->time = CIFS_I(target_dir)->time = 0;
 
 cifs_rename_exit:
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 04/19] cifs: define variable sized buffer for querydir responses
  2026-05-01 11:20 [PATCH v4 01/19] cifs: change_conf needs to be called for session setup nspmangalore
  2026-05-01 11:20 ` [PATCH v4 02/19] cifs: abort open_cached_dir if we don't request leases nspmangalore
  2026-05-01 11:20 ` [PATCH v4 03/19] cifs: invalidate cfid on unlink/rename/rmdir nspmangalore
@ 2026-05-01 11:20 ` nspmangalore
  2026-05-01 11:20 ` [PATCH v4 05/19] cifs: optimize readdir for small directories nspmangalore
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: nspmangalore @ 2026-05-01 11:20 UTC (permalink / raw)
  To: linux-cifs, smfrench, pc, bharathsm, dhowells, henrique.carvalho,
	ematsumiya
  Cc: Shyam Prasad N

From: Shyam Prasad N <sprasad@microsoft.com>

QueryDirectory responses are today stored in one of two fixed-size
buffers: smallbuf (448 bytes) or bigbuf (16KB). These are borrowed
from the server struct and are not sufficient for large query
directory operations.

This change defines a new buffer type, specific to cifs_search_info,
to hold variable-sized responses. It is allocated with kmalloc and
freed with kfree.
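
As an illustration, the release logic that the new flag enables looks
roughly like the sketch below (release_search_buf is a hypothetical
helper used only for illustration; the actual hunks touch
cifs_closedir, find_cifs_entry and smb2_parse_query_directory):

	static void release_search_buf(struct cifs_search_info *srch_inf, char *buf)
	{
		if (srch_inf->smallBuf)
			cifs_small_buf_release(buf);	/* 448-byte smallbuf */
		else if (srch_inf->is_dynamic_buf)
			kfree(buf);			/* kmalloc'd, variable size */
		else
			cifs_buf_release(buf);		/* 16KB bigbuf */
	}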

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
---
 fs/smb/client/cifsglob.h |  2 ++
 fs/smb/client/file.c     |  2 ++
 fs/smb/client/readdir.c  |  2 ++
 fs/smb/client/smb2pdu.c  | 14 +++++++++++---
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h
index 709e96e077916..8d089ba08e3e5 100644
--- a/fs/smb/client/cifsglob.h
+++ b/fs/smb/client/cifsglob.h
@@ -1394,6 +1394,7 @@ struct cifs_search_info {
 	bool emptyDir:1;
 	bool unicode:1;
 	bool smallBuf:1; /* so we know which buf_release function to call */
+	bool is_dynamic_buf:1; /* dynamically allocated buffer - can be variable size */
 };
 
 #define ACL_NO_MODE	((umode_t)(-1))
@@ -1907,6 +1908,7 @@ enum cifs_find_flags {
 #define   CIFS_NO_BUFFER        0    /* Response buffer not returned */
 #define   CIFS_SMALL_BUFFER     1
 #define   CIFS_LARGE_BUFFER     2
+#define   CIFS_DYNAMIC_BUFFER   3    /* Dynamically allocated buffer */
 #define   CIFS_IOVEC            4    /* array of response buffers */
 
 /* Type of Request to SendReceive2 */
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index a69e05f86d7e2..6a1419d59ed5a 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -1546,6 +1546,8 @@ int cifs_closedir(struct inode *inode, struct file *file)
 		cfile->srch_inf.ntwrk_buf_start = NULL;
 		if (cfile->srch_inf.smallBuf)
 			cifs_small_buf_release(buf);
+		else if (cfile->srch_inf.is_dynamic_buf)
+			kfree(buf);
 		else
 			cifs_buf_release(buf);
 	}
diff --git a/fs/smb/client/readdir.c b/fs/smb/client/readdir.c
index be22bbc4a65a0..b50efd9b9e1d2 100644
--- a/fs/smb/client/readdir.c
+++ b/fs/smb/client/readdir.c
@@ -732,6 +732,8 @@ find_cifs_entry(const unsigned int xid, struct cifs_tcon *tcon, loff_t pos,
 			if (cfile->srch_inf.smallBuf)
 				cifs_small_buf_release(cfile->srch_inf.
 						ntwrk_buf_start);
+			else if (cfile->srch_inf.is_dynamic_buf)
+				kfree(cfile->srch_inf.ntwrk_buf_start);
 			else
 				cifs_buf_release(cfile->srch_inf.
 						ntwrk_buf_start);
diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c
index 5188218c25be4..49dca84b169e6 100644
--- a/fs/smb/client/smb2pdu.c
+++ b/fs/smb/client/smb2pdu.c
@@ -5625,6 +5625,8 @@ smb2_parse_query_directory(struct cifs_tcon *tcon,
 	if (srch_inf->ntwrk_buf_start) {
 		if (srch_inf->smallBuf)
 			cifs_small_buf_release(srch_inf->ntwrk_buf_start);
+		else if (srch_inf->is_dynamic_buf)
+			kfree(srch_inf->ntwrk_buf_start);
 		else
 			cifs_buf_release(srch_inf->ntwrk_buf_start);
 	}
@@ -5644,12 +5646,18 @@ smb2_parse_query_directory(struct cifs_tcon *tcon,
 	cifs_dbg(FYI, "num entries %d last_index %lld srch start %p srch end %p\n",
 		 srch_inf->entries_in_buffer, srch_inf->index_of_last_entry,
 		 srch_inf->srch_entries_start, srch_inf->last_entry);
-	if (resp_buftype == CIFS_LARGE_BUFFER)
+	if (resp_buftype == CIFS_LARGE_BUFFER) {
 		srch_inf->smallBuf = false;
-	else if (resp_buftype == CIFS_SMALL_BUFFER)
+		srch_inf->is_dynamic_buf = false;
+	} else if (resp_buftype == CIFS_SMALL_BUFFER) {
 		srch_inf->smallBuf = true;
-	else
+		srch_inf->is_dynamic_buf = false;
+	} else if (resp_buftype == CIFS_DYNAMIC_BUFFER) {
+		srch_inf->smallBuf = false;
+		srch_inf->is_dynamic_buf = true;
+	} else {
 		cifs_tcon_dbg(VFS, "Invalid search buffer type\n");
+	}
 
 	return 0;
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 05/19] cifs: optimize readdir for small directories
  2026-05-01 11:20 [PATCH v4 01/19] cifs: change_conf needs to be called for session setup nspmangalore
                   ` (2 preceding siblings ...)
  2026-05-01 11:20 ` [PATCH v4 04/19] cifs: define variable sized buffer for querydir responses nspmangalore
@ 2026-05-01 11:20 ` nspmangalore
  2026-05-01 11:20 ` [PATCH v4 06/19] cifs: optimize readdir for larger directories nspmangalore
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: nspmangalore @ 2026-05-01 11:20 UTC (permalink / raw)
  To: linux-cifs, smfrench, pc, bharathsm, dhowells, henrique.carvalho,
	ematsumiya
  Cc: Shyam Prasad N

From: Shyam Prasad N <sprasad@microsoft.com>

For small directories (where the entire directory contents can be
read in a single QueryDir request), we currently do an extra
round-trip just to get a STATUS_NO_MORE_FILES back from the server.

This change avoids that by adding a second QueryDir to the first
compound sent to the server for readdir, i.e. the first readdir
request becomes a compound of (OPEN + QD1 + QD2). QD2 requests a
smaller output size (in anticipation of STATUS_NO_MORE_FILES).
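
When QD2 also returns entries, the two output buffers are stitched
together by walking QD1's entries (chained via NextEntryOffset) and
pointing the last one at the first QD2 entry. A simplified sketch of
that walk (last_dirent is a hypothetical helper for illustration only;
the real code is in smb2_query_dir_first below):

	static FILE_DIRECTORY_INFO *last_dirent(char *start, size_t len)
	{
		FILE_DIRECTORY_INFO *e = (FILE_DIRECTORY_INFO *)start;
		u32 next;

		while ((next = le32_to_cpu(e->NextEntryOffset)) != 0) {
			if ((char *)e + next >= start + len)
				return NULL;	/* malformed chain */
			e = (FILE_DIRECTORY_INFO *)((char *)e + next);
		}
		return e;
	}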

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
---
 fs/smb/client/smb2ops.c   | 156 ++++++++++++++++++++++++++++++++++----
 fs/smb/client/smb2pdu.c   |  19 +++--
 fs/smb/client/smb2pdu.h   |  11 +++
 fs/smb/client/smb2proto.h |   3 +-
 4 files changed, 168 insertions(+), 21 deletions(-)

diff --git a/fs/smb/client/smb2ops.c b/fs/smb/client/smb2ops.c
index a9d68e5fcea91..f075330f88598 100644
--- a/fs/smb/client/smb2ops.c
+++ b/fs/smb/client/smb2ops.c
@@ -2450,18 +2450,21 @@ smb2_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,
 		     struct cifs_search_info *srch_inf)
 {
 	__le16 *utf16_path;
-	struct smb_rqst rqst[2];
-	struct kvec rsp_iov[2];
-	int resp_buftype[2];
+	struct smb_rqst rqst[3];
+	struct kvec rsp_iov[3];
+	int resp_buftype[3];
 	struct kvec open_iov[SMB2_CREATE_IOV_SIZE];
-	struct kvec qd_iov[SMB2_QUERY_DIRECTORY_IOV_SIZE];
+	struct kvec qd_iov[SMB2_QUERY_DIRECTORY_IOV_SIZE + 1]; /* +1 for padding */
+	struct kvec qd2_iov[SMB2_QUERY_DIRECTORY_IOV_SIZE + 1]; /* +1 for padding */
 	int rc, flags = 0;
 	u8 oplock = SMB2_OPLOCK_LEVEL_NONE;
 	struct cifs_open_parms oparms;
 	struct smb2_query_directory_rsp *qd_rsp = NULL;
+	struct smb2_query_directory_rsp *qd2_rsp = NULL;
 	struct smb2_create_rsp *op_rsp = NULL;
 	struct TCP_Server_Info *server;
 	int retries = 0, cur_sleep = 0;
+	unsigned int compound_resp_bufsize;
 
 replay_again:
 	/* reinitialize for possible replay */
@@ -2476,8 +2479,15 @@ smb2_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,
 	if (smb3_encryption_required(tcon))
 		flags |= CIFS_TRANSFORM_REQ;
 
+	/*
+	 * Clamp the compound Create+QD1+QD2 response sizing to a size suited
+	 * for one credit, even if CIFSMaxBufSize is tuned larger.
+	 */
+	compound_resp_bufsize = min_t(unsigned int, CIFSMaxBufSize,
+				      SMB2_MAX_BUFFER_SIZE);
+
 	memset(rqst, 0, sizeof(rqst));
-	resp_buftype[0] = resp_buftype[1] = CIFS_NO_BUFFER;
+	resp_buftype[0] = resp_buftype[1] = resp_buftype[2] = CIFS_NO_BUFFER;
 	memset(rsp_iov, 0, sizeof(rsp_iov));
 
 	/* Open */
@@ -2501,7 +2511,7 @@ smb2_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,
 		goto qdf_free;
 	smb2_set_next_command(tcon, &rqst[0]);
 
-	/* Query directory */
+	/* First Query directory */
 	srch_inf->entries_in_buffer = 0;
 	srch_inf->index_of_last_entry = 2;
 
@@ -2512,11 +2522,27 @@ smb2_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,
 	rc = SMB2_query_directory_init(xid, tcon, server,
 				       &rqst[1],
 				       COMPOUND_FID, COMPOUND_FID,
-				       0, srch_inf->info_level);
+				       0, srch_inf->info_level,
+				       SMB2_QD1_OUTPUT_SIZE(compound_resp_bufsize));
 	if (rc)
 		goto qdf_free;
 
 	smb2_set_related(&rqst[1]);
+	smb2_set_next_command(tcon, &rqst[1]);
+
+	/* Second Query directory - minimal size to check if more data exists */
+	memset(&qd2_iov, 0, sizeof(qd2_iov));
+	rqst[2].rq_iov = qd2_iov;
+	rqst[2].rq_nvec = SMB2_QUERY_DIRECTORY_IOV_SIZE;
+
+	rc = SMB2_query_directory_init(xid, tcon, server,
+				       &rqst[2],
+				       COMPOUND_FID, COMPOUND_FID,
+				       0, srch_inf->info_level, SMB2_QD2_RESPONSE_SIZE);
+	if (rc)
+		goto qdf_free;
+
+	smb2_set_related(&rqst[2]);
 
 	if (retries) {
 		/* Back-off before retry */
@@ -2524,10 +2550,11 @@ smb2_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,
 			msleep(cur_sleep);
 		smb2_set_replay(server, &rqst[0]);
 		smb2_set_replay(server, &rqst[1]);
+		smb2_set_replay(server, &rqst[2]);
 	}
 
 	rc = compound_send_recv(xid, tcon->ses, server,
-				flags, 2, rqst,
+				flags, 3, rqst,
 				resp_buftype, rsp_iov);
 
 	/* If the open failed there is nothing to do */
@@ -2559,14 +2586,111 @@ smb2_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,
 		goto qdf_free;
 	}
 
-	rc = smb2_parse_query_directory(tcon, &rsp_iov[1], resp_buftype[1],
-					srch_inf);
-	if (rc) {
-		trace_smb3_query_dir_err(xid, fid->persistent_fid, tcon->tid,
-			tcon->ses->Suid, 0, 0, rc);
-		goto qdf_free;
+	qd2_rsp = (struct smb2_query_directory_rsp *)rsp_iov[2].iov_base;
+
+	/*
+	 * If QD2 has data, combine QD1 and QD2 responses before parsing.
+	 * The server cursor advances past both responses, so we can't discard QD2.
+	 */
+	if (qd2_rsp && qd2_rsp->hdr.Status == STATUS_SUCCESS &&
+	    le32_to_cpu(qd2_rsp->OutputBufferLength) > 0) {
+		char *combined_buf;
+		size_t qd1_data_len, qd2_data_len, combined_len;
+		u16 qd1_offset, qd2_offset;
+		struct smb2_query_directory_rsp *combined_rsp;
+		struct kvec combined_iov;
+		FILE_DIRECTORY_INFO *last_entry_in_qd1;
+		char *qd1_entries_start, *qd2_entries_start;
+		unsigned int next_offset;
+
+		qd1_offset = le16_to_cpu(qd_rsp->OutputBufferOffset);
+		qd2_offset = le16_to_cpu(qd2_rsp->OutputBufferOffset);
+		qd1_data_len = le32_to_cpu(qd_rsp->OutputBufferLength);
+		qd2_data_len = le32_to_cpu(qd2_rsp->OutputBufferLength);
+
+		/* Allocate buffer for: QD1 header + QD1 data + QD2 data */
+		combined_len = qd1_offset + qd1_data_len + qd2_data_len;
+		combined_buf = kmalloc(combined_len, GFP_KERNEL);
+		if (!combined_buf) {
+			rc = -ENOMEM;
+			goto qdf_free;
+		}
+
+		/* Copy QD1 header and data */
+		memcpy(combined_buf, qd_rsp, qd1_offset + qd1_data_len);
+
+		/* Append QD2 data (directory entries only, not the header) */
+		memcpy(combined_buf + qd1_offset + qd1_data_len,
+		       (char *)qd2_rsp + qd2_offset, qd2_data_len);
+
+		/* Update OutputBufferLength to reflect combined data */
+		combined_rsp = (struct smb2_query_directory_rsp *)combined_buf;
+		combined_rsp->OutputBufferLength = cpu_to_le32(qd1_data_len + qd2_data_len);
+
+		/*
+		 * Chain QD1 and QD2 entries: find the last entry in QD1 and update
+		 * its NextEntryOffset to point to the first entry in QD2.
+		 */
+		if (qd1_data_len > 0) {
+			qd1_entries_start = combined_buf + qd1_offset;
+			qd2_entries_start = combined_buf + qd1_offset + qd1_data_len;
+			last_entry_in_qd1 = (FILE_DIRECTORY_INFO *)qd1_entries_start;
+
+			/* Walk QD1 entries to find the last one with bounds checking */
+			while (1) {
+				char *end_of_qd1 = qd1_entries_start + qd1_data_len;
+
+				next_offset = le32_to_cpu(last_entry_in_qd1->NextEntryOffset);
+				if (next_offset == 0)
+					break;  /* Found last entry */
+
+				/* Bounds check before advancing */
+				if ((char *)last_entry_in_qd1 + next_offset >= end_of_qd1) {
+					cifs_dbg(VFS, "query_dir_first: invalid NextEntryOffset in QD1\n");
+					kfree(combined_buf);
+					rc = -EIO;
+					goto qdf_free;
+				}
+
+				last_entry_in_qd1 = (FILE_DIRECTORY_INFO *)((char *)last_entry_in_qd1 + next_offset);
+			}
+
+			/* Chain last QD1 entry to first QD2 entry */
+			last_entry_in_qd1->NextEntryOffset = cpu_to_le32(qd2_entries_start - (char *)last_entry_in_qd1);
+		}
+
+		/* Parse the combined buffer */
+		combined_iov.iov_base = combined_buf;
+		combined_iov.iov_len = combined_len;
+		rc = smb2_parse_query_directory(tcon, &combined_iov, CIFS_DYNAMIC_BUFFER,
+						srch_inf);
+		if (rc) {
+			kfree(combined_buf);
+			trace_smb3_query_dir_err(xid, fid->persistent_fid, tcon->tid,
+						 tcon->ses->Suid, 0, 0, rc);
+			goto qdf_free;
+		}
+		/* Ownership of combined_buf transferred to srch_inf->ntwrk_buf_start */
+		srch_inf->endOfSearch = false;
+		cifs_dbg(FYI, "query_dir_first: combined QD1 and QD2, %d entries\n",
+			 srch_inf->entries_in_buffer);
+	} else {
+		/* No data in QD2, just parse QD1 */
+		rc = smb2_parse_query_directory(tcon, &rsp_iov[1], resp_buftype[1],
+						srch_inf);
+		if (rc) {
+			trace_smb3_query_dir_err(xid, fid->persistent_fid, tcon->tid,
+						 tcon->ses->Suid, 0, 0, rc);
+			goto qdf_free;
+		}
+		resp_buftype[1] = CIFS_NO_BUFFER;
+
+		/* Check if QD2 indicates end of directory */
+		if (qd2_rsp && qd2_rsp->hdr.Status == STATUS_NO_MORE_FILES) {
+			srch_inf->endOfSearch = true;
+			cifs_dbg(FYI, "query_dir_first: small directory, all entries read\n");
+		}
 	}
-	resp_buftype[1] = CIFS_NO_BUFFER;
 
 	trace_smb3_query_dir_done(xid, fid->persistent_fid, tcon->tid,
 			tcon->ses->Suid, 0, srch_inf->entries_in_buffer);
@@ -2575,8 +2699,10 @@ smb2_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,
 	kfree(utf16_path);
 	SMB2_open_free(&rqst[0]);
 	SMB2_query_directory_free(&rqst[1]);
+	SMB2_query_directory_free(&rqst[2]);
 	free_rsp_buf(resp_buftype[0], rsp_iov[0].iov_base);
 	free_rsp_buf(resp_buftype[1], rsp_iov[1].iov_base);
+	free_rsp_buf(resp_buftype[2], rsp_iov[2].iov_base);
 
 	if (is_replayable_error(rc) &&
 	    smb2_should_replay(tcon, &retries, &cur_sleep))
diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c
index 49dca84b169e6..2d55246d2851b 100644
--- a/fs/smb/client/smb2pdu.c
+++ b/fs/smb/client/smb2pdu.c
@@ -5504,18 +5504,27 @@ int SMB2_query_directory_init(const unsigned int xid,
 			      struct TCP_Server_Info *server,
 			      struct smb_rqst *rqst,
 			      u64 persistent_fid, u64 volatile_fid,
-			      int index, int info_level)
+			      int index, int info_level,
+			      unsigned int output_size)
 {
 	struct smb2_query_directory_req *req;
 	unsigned char *bufptr;
 	__le16 asteriks = cpu_to_le16('*');
-	unsigned int output_size = CIFSMaxBufSize -
-		MAX_SMB2_CREATE_RESPONSE_SIZE -
-		MAX_SMB2_CLOSE_RESPONSE_SIZE;
 	unsigned int total_len;
 	struct kvec *iov = rqst->rq_iov;
 	int len, rc;
 
+	/*
+	 * Use provided output_size, or default to CIFSMaxBufSize calculation.
+	 * The default is for standalone QueryDir (smb2_query_dir_next).
+	 * For compounds, the caller should pass explicit output_size.
+	 */
+	if (output_size == 0) {
+		output_size = CIFSMaxBufSize -
+			MAX_SMB2_CREATE_RESPONSE_SIZE -
+			MAX_SMB2_CLOSE_RESPONSE_SIZE;
+	}
+
 	rc = smb2_plain_req_init(SMB2_QUERY_DIRECTORY, tcon, server,
 				 (void **) &req, &total_len);
 	if (rc)
@@ -5697,7 +5706,7 @@ SMB2_query_directory(const unsigned int xid, struct cifs_tcon *tcon,
 	rc = SMB2_query_directory_init(xid, tcon, server,
 				       &rqst, persistent_fid,
 				       volatile_fid, index,
-				       srch_inf->info_level);
+				       srch_inf->info_level, 0);
 	if (rc)
 		goto qdir_exit;
 
diff --git a/fs/smb/client/smb2pdu.h b/fs/smb/client/smb2pdu.h
index 30d70097fe2fa..7b7a864520c68 100644
--- a/fs/smb/client/smb2pdu.h
+++ b/fs/smb/client/smb2pdu.h
@@ -129,6 +129,17 @@ struct share_redirect_error_context_rsp {
  */
 #define MAX_SMB2_CREATE_RESPONSE_SIZE 880
 
+/* Size of the minimal QueryDir response for checking if more data exists */
+#define SMB2_QD2_RESPONSE_SIZE 4096
+
+/*
+ * Output buffer size for first QueryDir in Create+QD1+QD2 compound.
+ * Accounts for shared buffer space needed for all three responses.
+ */
+#define SMB2_QD1_OUTPUT_SIZE(bufsize) \
+	((bufsize) - MAX_SMB2_CREATE_RESPONSE_SIZE - \
+	 sizeof(struct smb2_hdr) - SMB2_QD2_RESPONSE_SIZE)
+
 #define SMB2_LEASE_READ_CACHING_HE	0x01
 #define SMB2_LEASE_HANDLE_CACHING_HE	0x02
 #define SMB2_LEASE_WRITE_CACHING_HE	0x04
diff --git a/fs/smb/client/smb2proto.h b/fs/smb/client/smb2proto.h
index 230bb1e9f4e19..9de7d2fe8d466 100644
--- a/fs/smb/client/smb2proto.h
+++ b/fs/smb/client/smb2proto.h
@@ -194,7 +194,8 @@ int SMB2_query_directory(const unsigned int xid, struct cifs_tcon *tcon,
 int SMB2_query_directory_init(const unsigned int xid, struct cifs_tcon *tcon,
 			      struct TCP_Server_Info *server,
 			      struct smb_rqst *rqst, u64 persistent_fid,
-			      u64 volatile_fid, int index, int info_level);
+			      u64 volatile_fid, int index, int info_level,
+			      unsigned int output_size);
 void SMB2_query_directory_free(struct smb_rqst *rqst);
 int SMB2_set_eof(const unsigned int xid, struct cifs_tcon *tcon,
 		 u64 persistent_fid, u64 volatile_fid, u32 pid,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 06/19] cifs: optimize readdir for larger directories
  2026-05-01 11:20 [PATCH v4 01/19] cifs: change_conf needs to be called for session setup nspmangalore
                   ` (3 preceding siblings ...)
  2026-05-01 11:20 ` [PATCH v4 05/19] cifs: optimize readdir for small directories nspmangalore
@ 2026-05-01 11:20 ` nspmangalore
  2026-05-01 11:20 ` [PATCH v4 07/19] cifs: reorganize cached dir helpers nspmangalore
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: nspmangalore @ 2026-05-01 11:20 UTC (permalink / raw)
  To: linux-cifs, smfrench, pc, bharathsm, dhowells, henrique.carvalho,
	ematsumiya
  Cc: Shyam Prasad N

From: Shyam Prasad N <sprasad@microsoft.com>

Today QueryDirectory uses the compound_send_recv infrastructure,
which limits responses to 16KB. As a result, readdir of large
directories generally takes several round trips.

With this change, if the readdir needs a QueryDir after the first
round-trip (meaning that there are more dirents to read), then the
following QueryDirs will now switch to using larger buffers with
MTU credits.

Until now, the only command type that used this flow was SMB2_READ.
For encrypted responses, it becomes challenging to decide whether the
response is for SMB2_READ or SMB2_QUERY_DIRECTORY. This change reuses
receive_encrypted_read and, after decrypting the response, picks the
handling function based on the command in the response header. That
way, the modifications to the read code path on account of this
change are kept to a minimum.

The change also renames the discard function since both read and
query_dir paths will now reuse the same function to discard remaining
data in the socket and dequeue the mid.
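
For reference, a request on this path pays one credit per 64KB
(SMB2_MAX_BUFFER_SIZE) of requested payload; a rough sketch of the
sizing (qd_credit_charge is a hypothetical helper for illustration;
the actual computation is in SMB2_query_directory_large below):

	static u16 qd_credit_charge(unsigned int buf_size)
	{
		/* e.g. a 2MB buffer (SMB2_MAX_QD_DATABUF_SIZE) charges 32 credits */
		return DIV_ROUND_UP(buf_size, SMB2_MAX_BUFFER_SIZE);
	}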

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
---
 fs/smb/client/cifsglob.h  |  21 +-
 fs/smb/client/cifsproto.h |   3 +
 fs/smb/client/readdir.c   |   2 +-
 fs/smb/client/smb1ops.c   |   4 +-
 fs/smb/client/smb2misc.c  |   7 +-
 fs/smb/client/smb2ops.c   | 458 ++++++++++++++++++++++++++++++++++++--
 fs/smb/client/smb2pdu.c   | 185 ++++++++++++++-
 fs/smb/client/smb2pdu.h   |   3 +
 fs/smb/client/smb2proto.h |   3 +
 fs/smb/client/trace.h     |   1 +
 fs/smb/client/transport.c | 169 +++++++++++++-
 11 files changed, 811 insertions(+), 45 deletions(-)

diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h
index 8d089ba08e3e5..38d5600efe2c8 100644
--- a/fs/smb/client/cifsglob.h
+++ b/fs/smb/client/cifsglob.h
@@ -504,7 +504,7 @@ struct smb_version_operations {
 			       struct cifs_search_info *);
 	/* continue readdir */
 	int (*query_dir_next)(const unsigned int, struct cifs_tcon *,
-			      struct cifs_fid *,
+			      struct cifs_sb_info *, struct cifs_fid *,
 			      __u16, struct cifs_search_info *srch_inf);
 	/* close dir */
 	int (*close_dir)(const unsigned int, struct cifs_tcon *,
@@ -1397,6 +1397,25 @@ struct cifs_search_info {
 	bool is_dynamic_buf:1; /* dynamically allocated buffer - can be variable size */
 };
 
+/* Structure for QueryDirectory with multi-credit support */
+struct cifs_query_dir_io {
+	struct cifs_tcon *tcon;
+	struct TCP_Server_Info *server;
+	struct cifs_search_info *srch_inf;
+	unsigned int xid;
+	u64 persistent_fid;
+	u64 volatile_fid;
+	int index;
+	struct kvec combined_iov;	/* Pre-allocated buffer to hold resp */
+	struct completion done;
+	int result;
+	struct cifs_credits credits;
+	bool replay;
+	unsigned int retries;
+	unsigned int cur_sleep;
+	struct kvec iov[2];		/* For response handling */
+};
+
 #define ACL_NO_MODE	((umode_t)(-1))
 struct cifs_open_parms {
 	struct cifs_tcon *tcon;
diff --git a/fs/smb/client/cifsproto.h b/fs/smb/client/cifsproto.h
index 884bfa1cf0b42..bbbee0ef09443 100644
--- a/fs/smb/client/cifsproto.h
+++ b/fs/smb/client/cifsproto.h
@@ -336,6 +336,9 @@ struct cifs_ses *cifs_get_smb_ses(struct TCP_Server_Info *server,
 int cifs_readv_receive(struct TCP_Server_Info *server,
 		       struct mid_q_entry *mid);
 
+int cifs_query_dir_receive(struct TCP_Server_Info *server,
+			    struct mid_q_entry *mid);
+
 int cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon,
 			  struct cifs_sb_info *cifs_sb,
 			  const unsigned char *path, char *pbuf,
diff --git a/fs/smb/client/readdir.c b/fs/smb/client/readdir.c
index b50efd9b9e1d2..8a444f97e0ae9 100644
--- a/fs/smb/client/readdir.c
+++ b/fs/smb/client/readdir.c
@@ -760,7 +760,7 @@ find_cifs_entry(const unsigned int xid, struct cifs_tcon *tcon, loff_t pos,
 	while ((index_to_find >= cfile->srch_inf.index_of_last_entry) &&
 	       (rc == 0) && !cfile->srch_inf.endOfSearch) {
 		cifs_dbg(FYI, "calling findnext2\n");
-		rc = server->ops->query_dir_next(xid, tcon, &cfile->fid,
+		rc = server->ops->query_dir_next(xid, tcon, cifs_sb, &cfile->fid,
 						 search_flags,
 						 &cfile->srch_inf);
 		if (rc)
diff --git a/fs/smb/client/smb1ops.c b/fs/smb/client/smb1ops.c
index 9694117050a6c..860a9b23a2f8d 100644
--- a/fs/smb/client/smb1ops.c
+++ b/fs/smb/client/smb1ops.c
@@ -1135,8 +1135,8 @@ cifs_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,
 
 static int
 cifs_query_dir_next(const unsigned int xid, struct cifs_tcon *tcon,
-		    struct cifs_fid *fid, __u16 search_flags,
-		    struct cifs_search_info *srch_inf)
+		    struct cifs_sb_info *cifs_sb, struct cifs_fid *fid,
+		    __u16 search_flags, struct cifs_search_info *srch_inf)
 {
 	return CIFSFindNext(xid, tcon, fid->netfid, search_flags, srch_inf);
 }
diff --git a/fs/smb/client/smb2misc.c b/fs/smb/client/smb2misc.c
index 973fce3c959c4..b7b6ecd5fdaee 100644
--- a/fs/smb/client/smb2misc.c
+++ b/fs/smb/client/smb2misc.c
@@ -12,6 +12,7 @@
 #include "cifsglob.h"
 #include "cifsproto.h"
 #include "smb2proto.h"
+#include "smb2pdu.h"
 #include "cifs_debug.h"
 #include "cifs_unicode.h"
 #include "../common/smb2status.h"
@@ -316,7 +317,7 @@ char *
 smb2_get_data_area_len(int *off, int *len, struct smb2_hdr *shdr)
 {
 	const int max_off = 4096;
-	const int max_len = 128 * 1024;
+	int max_len = 128 * 1024;
 
 	*off = 0;
 	*len = 0;
@@ -367,6 +368,10 @@ smb2_get_data_area_len(int *off, int *len, struct smb2_hdr *shdr)
 		  ((struct smb2_query_directory_rsp *)shdr)->OutputBufferOffset);
 		*len = le32_to_cpu(
 		  ((struct smb2_query_directory_rsp *)shdr)->OutputBufferLength);
+		/* Allow larger buffers for query directory (up to 2MB).
+		 * The actual data is handled separately in cifs_query_dir_receive().
+		 */
+		max_len = SMB2_MAX_QD_DATABUF_SIZE;
 		break;
 	case SMB2_IOCTL:
 		*off = le32_to_cpu(
diff --git a/fs/smb/client/smb2ops.c b/fs/smb/client/smb2ops.c
index f075330f88598..2df4d080e95f0 100644
--- a/fs/smb/client/smb2ops.c
+++ b/fs/smb/client/smb2ops.c
@@ -2713,11 +2713,131 @@ smb2_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,
 
 static int
 smb2_query_dir_next(const unsigned int xid, struct cifs_tcon *tcon,
-		    struct cifs_fid *fid, __u16 search_flags,
-		    struct cifs_search_info *srch_inf)
+		    struct cifs_sb_info *cifs_sb, struct cifs_fid *fid,
+		    __u16 search_flags, struct cifs_search_info *srch_inf)
 {
-	return SMB2_query_directory(xid, tcon, fid->persistent_fid,
-				    fid->volatile_fid, 0, srch_inf);
+	struct cifs_query_dir_io qd_io;
+	struct TCP_Server_Info *server;
+	struct cifs_ses *ses = tcon->ses;
+	size_t buf_size;
+	int rc;
+	int resp_buftype = CIFS_DYNAMIC_BUFFER;
+
+	/* Pick server and determine buffer size based on negotiated rsize */
+	server = cifs_pick_channel(ses);
+	if (!server)
+		return smb_EIO(smb_eio_trace_null_pointers);
+
+	/* Negotiate rsize if not already set */
+	if (cifs_sb->ctx->rsize == 0)
+		cifs_negotiate_rsize(server, cifs_sb->ctx, tcon);
+
+	/* Use negotiated rsize for buffer size, with reasonable limits */
+	buf_size = cifs_sb->ctx->rsize;
+
+	cifs_dbg(FYI, "%s: using buffer size %zu (rsize=%u, encrypted=%d)\n",
+		 __func__, buf_size, cifs_sb->ctx->rsize, smb3_encryption_required(tcon));
+
+	/* Initialize qd_io structure */
+	memset(&qd_io, 0, sizeof(qd_io));
+	qd_io.tcon = tcon;
+	qd_io.server = server;
+	qd_io.srch_inf = srch_inf;
+	qd_io.xid = xid;
+	qd_io.persistent_fid = fid->persistent_fid;
+	qd_io.volatile_fid = fid->volatile_fid;
+	qd_io.index = 0;
+	qd_io.result = 0;
+	qd_io.replay = false;
+	qd_io.retries = 0;
+	qd_io.cur_sleep = 0;
+
+	/* Allocate credits for the buffer size */
+	rc = server->ops->wait_mtu_credits(server, buf_size, &buf_size,
+					   &qd_io.credits);
+	if (rc) {
+		cifs_dbg(VFS, "%s: failed to get credits: %d\n", __func__, rc);
+		return rc;
+	}
+
+	cifs_dbg(FYI, "%s: allocated %u credits for %zu bytes\n",
+		 __func__, qd_io.credits.value, buf_size);
+
+	/* Send query directory with large buffer and wait for completion */
+	rc = SMB2_query_directory_large(&qd_io, buf_size);
+	if (rc) {
+		if (rc == -ENODATA) {
+			const struct smb2_hdr *hdr = NULL;
+
+			if (qd_io.combined_iov.iov_base)
+				hdr = (const struct smb2_hdr *)qd_io.combined_iov.iov_base;
+			else if (qd_io.iov[0].iov_base)
+				hdr = (const struct smb2_hdr *)qd_io.iov[0].iov_base;
+
+			/*
+			 * ENODATA from QUERY_DIRECTORY generally means the enumeration
+			 * reached the end. Only treat it as end-of-search once the
+			 * response header confirms STATUS_NO_MORE_FILES.
+			 */
+			if (!hdr) {
+				cifs_dbg(FYI, "%s: ENODATA but hdr is NULL\n", __func__);
+			} else {
+				cifs_dbg(FYI, "%s: ENODATA with hdr->Status=0x%x (STATUS_NO_MORE_FILES=0x%x)\n",
+					 __func__, le32_to_cpu(hdr->Status), le32_to_cpu(STATUS_NO_MORE_FILES));
+			}
+
+			if (hdr && hdr->Status == STATUS_NO_MORE_FILES) {
+				trace_smb3_query_dir_done(xid, fid->persistent_fid,
+					tcon->tid, tcon->ses->Suid, 0, 0);
+				srch_inf->endOfSearch = true;
+				rc = 0;
+			} else {
+				cifs_dbg(FYI, "%s: ENODATA but Status mismatch - not treating as end-of-search\n",
+					__func__);
+				trace_smb3_query_dir_err(xid, fid->persistent_fid,
+					tcon->tid, tcon->ses->Suid, 0, 0, rc);
+			}
+		} else {
+			trace_smb3_query_dir_err(xid, fid->persistent_fid,
+				tcon->tid, tcon->ses->Suid, 0, 0, rc);
+		}
+		goto qdir_next_exit;
+	}
+
+	/* Parse the response using the combined buffer built in receive handler */
+	if (qd_io.combined_iov.iov_len > 0) {
+		rc = smb2_parse_query_directory(tcon, &qd_io.combined_iov, resp_buftype,
+						srch_inf);
+		if (rc) {
+			trace_smb3_query_dir_err(xid, fid->persistent_fid,
+				tcon->tid, tcon->ses->Suid, 0, 0, rc);
+			kfree(qd_io.combined_iov.iov_base);
+			qd_io.combined_iov.iov_base = NULL;
+			goto qdir_next_exit;
+		}
+
+		/* combined_iov.iov_base ownership transferred to srch_inf->ntwrk_buf_start */
+		qd_io.combined_iov.iov_base = NULL;
+
+		trace_smb3_query_dir_done(xid, fid->persistent_fid,
+			tcon->tid, tcon->ses->Suid, 0,
+			srch_inf->entries_in_buffer);
+	}
+
+qdir_next_exit:
+	/* Free the data buffer if not transferred to srch_inf */
+	kfree(qd_io.combined_iov.iov_base);
+
+	/* Return credits if we still have them (they should have been cleared in callback) */
+	if (qd_io.credits.value != 0) {
+		trace_smb3_rw_credits(0, 0, 0,
+				      server->credits, server->in_flight,
+				      qd_io.credits.value,
+				      cifs_trace_rw_credits_query_dir_done);
+		add_credits(server, &qd_io.credits, 0);
+	}
+
+	return rc;
 }
 
 static int
@@ -4834,6 +4954,252 @@ cifs_copy_folioq_to_iter(struct folio_queue *folioq, size_t data_size,
 	return 0;
 }
 
+static int
+cifs_copy_folioq_to_buf(struct folio_queue *folioq, size_t total_size,
+			size_t skip, char *buf, size_t buf_len)
+{
+	size_t copied = 0;
+
+	if (buf_len > total_size - skip)
+		buf_len = total_size - skip;
+
+	for (; folioq; folioq = folioq->next) {
+		for (int s = 0; s < folioq_count(folioq); s++) {
+			struct folio *folio = folioq_folio(folioq, s);
+			size_t fsize = folio_size(folio);
+			size_t len = umin(fsize - skip, buf_len);
+
+			if (len == 0)
+				break;
+
+			memcpy_from_folio(buf + copied, folio, skip, len);
+			copied += len;
+			buf_len -= len;
+			skip = 0;
+
+			if (buf_len == 0)
+				return 0;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * Handle encrypted QueryDirectory response data.
+ * Called only for encrypted responses where mid->decrypted == true.
+ * For unencrypted responses, cifs_query_dir_receive handles everything.
+ *
+ * This is written in such a way that handle_read_data does not need modification.
+ *
+ * buf: contains read_rsp_size bytes of decrypted response
+ * buffer: contains (total_len - read_rsp_size) bytes of decrypted data
+ *
+ * Since sizeof(struct smb2_query_directory_rsp) < read_rsp_size,
+ * the response header is always fully in buf. Data may be split between
+ * buf and buffer depending on data_offset.
+ */
+static int
+handle_query_dir_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
+		      char *buf, unsigned int buf_len, struct folio_queue *buffer,
+		      unsigned int buffer_len, bool is_offloaded)
+{
+	struct cifs_query_dir_io *qd_io = mid->callback_data;
+	struct smb2_query_directory_rsp *rsp;
+	struct smb2_hdr *shdr = (struct smb2_hdr *)buf;
+	unsigned int data_offset, data_len;
+	unsigned int hdr_len;
+
+	cifs_dbg(FYI, "%s: processing encrypted QueryDirectory response\n", __func__);
+
+	if (shdr->Command != SMB2_QUERY_DIRECTORY) {
+		cifs_server_dbg(VFS, "only QueryDirectory responses are supported\n");
+		return -EOPNOTSUPP;
+	}
+
+	if (server->ops->is_session_expired &&
+	    server->ops->is_session_expired(buf)) {
+		if (!is_offloaded)
+			cifs_reconnect(server, true);
+		return -1;
+	}
+
+	if (server->ops->is_status_pending &&
+			server->ops->is_status_pending(buf, server))
+		return -1;
+
+	rsp = (struct smb2_query_directory_rsp *)buf;
+	hdr_len = min_t(unsigned int, buf_len,
+			 sizeof(struct smb2_query_directory_rsp));
+
+	/* Map error code first */
+	qd_io->result = server->ops->map_error(buf, false);
+
+	/* Get data_offset early to set up iov properly */
+	data_offset = le16_to_cpu(rsp->OutputBufferOffset);
+	data_len = le32_to_cpu(rsp->OutputBufferLength);
+
+	/* Set up first iov to point to header portion (needed for credits/signature) */
+	qd_io->iov[0].iov_base = buf;
+	qd_io->iov[0].iov_len = qd_io->result ? hdr_len : data_offset;
+
+	if (qd_io->result != 0) {
+		cifs_dbg(FYI, "%s: server returned error %d (Status=0x%x)\n",
+			 __func__, qd_io->result, le32_to_cpu(rsp->hdr.Status));
+
+		/*
+		 * Copy header to persistent combined_iov buffer so status
+		 * remains accessible after receive handler returns. buf is temporary
+		 * and will be freed/reused, so we can't leave iov[0] pointing to it
+		 */
+		if (qd_io->combined_iov.iov_base && hdr_len > 0 &&
+		    hdr_len <= qd_io->combined_iov.iov_len) {
+			memcpy(qd_io->combined_iov.iov_base, buf, hdr_len);
+			qd_io->iov[0].iov_base = qd_io->combined_iov.iov_base;
+			qd_io->iov[0].iov_len = hdr_len;
+			cifs_dbg(FYI, "%s: copied error response header to combined_iov\n",
+				__func__);
+		}
+
+		/*
+		 * Normal error on query_directory response - response received successfully,
+		 * but the command failed. Store error in qd_io->result for callback
+		 */
+		if (is_offloaded)
+			mid->mid_state = MID_RESPONSE_RECEIVED;
+		else
+			dequeue_mid(server, mid, false);
+		return 0;
+	}
+
+	/* Success - parse the response data */
+	cifs_dbg(FYI, "%s: data_offset=%u data_len=%u buf_len=%u buffer_len=%u\n",
+		 __func__, data_offset, data_len, buf_len, buffer_len);
+
+	/* Validate data_offset */
+	if (data_offset < sizeof(struct smb2_query_directory_rsp)) {
+		cifs_dbg(FYI, "%s: data offset (%u) inside response header\n",
+			 __func__, data_offset);
+		data_offset = sizeof(struct smb2_query_directory_rsp);
+	} else if (data_offset > MAX_CIFS_SMALL_BUFFER_SIZE) {
+		cifs_dbg(VFS, "%s: data offset (%u) beyond end of smallbuf\n",
+			 __func__, data_offset);
+		qd_io->result = -EIO;
+		dequeue_mid(server, mid, qd_io->result);
+		return qd_io->result;
+	}
+
+	/* Validate data_offset is within buf_len + buffer_len */
+	if (data_offset > buf_len + buffer_len) {
+		cifs_dbg(VFS, "%s: data offset (%u) beyond response length (%u)\n",
+			 __func__, data_offset, buf_len + buffer_len);
+		qd_io->result = -EIO;
+		dequeue_mid(server, mid, qd_io->result);
+		return qd_io->result;
+	}
+
+	/* Validate response fits in pre-allocated combined buffer */
+	if ((size_t)data_offset + data_len > qd_io->combined_iov.iov_len) {
+		cifs_dbg(VFS, "%s: response (%u + %u) exceeds buffer capacity (%zu)\n",
+			 __func__, data_offset, data_len, qd_io->combined_iov.iov_len);
+		qd_io->result = -EIO;
+		dequeue_mid(server, mid, qd_io->result);
+		return qd_io->result;
+	}
+
+	/* Copy the prefix present in buf into combined_iov, preserving wire layout */
+	memcpy(qd_io->combined_iov.iov_base, buf, min(data_offset, buf_len));
+
+	if (data_offset < buf_len) {
+		/* Data starts in buf, may continue into buffer */
+		unsigned int data_in_buf = buf_len - data_offset;
+		unsigned int data_in_buffer;
+
+		if (data_len <= data_in_buf) {
+			/* All data is in buf */
+			memcpy(qd_io->combined_iov.iov_base + data_offset,
+			       buf + data_offset, data_len);
+		} else {
+			/* Copy from buf first */
+			memcpy(qd_io->combined_iov.iov_base + data_offset,
+			       buf + data_offset, data_in_buf);
+
+			/* Copy remainder from buffer at offset 0 */
+			data_in_buffer = data_len - data_in_buf;
+			if (data_in_buffer > buffer_len) {
+				cifs_dbg(VFS, "%s: data_in_buffer (%u) > buffer_len (%u)\n",
+					 __func__, data_in_buffer, buffer_len);
+				qd_io->result = -EIO;
+				dequeue_mid(server, mid, qd_io->result);
+				return qd_io->result;
+			}
+
+			qd_io->result = cifs_copy_folioq_to_buf(buffer, buffer_len, 0,
+								qd_io->combined_iov.iov_base +
+								data_offset + data_in_buf,
+								data_in_buffer);
+			if (qd_io->result != 0) {
+				cifs_dbg(VFS, "%s: failed to copy from folio_queue: %d\n",
+					 __func__, qd_io->result);
+				dequeue_mid(server, mid, qd_io->result);
+				return qd_io->result;
+			}
+		}
+	} else {
+		/* Padding and data are in buffer starting at offset 0 */
+		unsigned int bytes_in_buffer = data_offset - buf_len + data_len;
+
+		if (bytes_in_buffer > buffer_len) {
+			cifs_dbg(VFS, "%s: data beyond buffer: prefix+len=%u buffer_len=%u\n",
+				 __func__, bytes_in_buffer, buffer_len);
+			qd_io->result = -EIO;
+			dequeue_mid(server, mid, qd_io->result);
+			return qd_io->result;
+		}
+
+		qd_io->result = cifs_copy_folioq_to_buf(buffer, buffer_len, 0,
+							qd_io->combined_iov.iov_base + buf_len,
+							bytes_in_buffer);
+		if (qd_io->result != 0) {
+			cifs_dbg(VFS, "%s: failed to copy from folio_queue: %d\n",
+				 __func__, qd_io->result);
+			dequeue_mid(server, mid, qd_io->result);
+			return qd_io->result;
+		}
+	}
+
+	/* Set up iov[1] pointing into combined buffer, finalize valid length */
+	qd_io->iov[1].iov_base = qd_io->combined_iov.iov_base + data_offset;
+	qd_io->iov[1].iov_len = data_len;
+	qd_io->combined_iov.iov_len = data_offset + data_len;
+
+	dequeue_mid(server, mid, false);
+	return 0;
+}
+
+/*
+ * Handle callback for async QueryDirectory with multi-credit support.
+ * For encrypted responses, extracts decrypted data.
+ * For unencrypted responses, cifs_query_dir_receive already processed everything.
+ */
+int
+smb2_query_dir_handle_data(struct TCP_Server_Info *server, struct mid_q_entry *mid)
+{
+	char *buf = server->large_buf ? server->bigbuf : server->smallbuf;
+
+	/* For unencrypted responses, data already processed in cifs_query_dir_receive */
+	if (!mid->decrypted)
+		return 0;
+
+	/*
+	 * For small encrypted responses (< CIFSMaxBufSize), all data is in buf.
+	 * For large encrypted responses, this callback is not used - instead,
+	 * receive_encrypted_read/smb2_decrypt_offload call handle_query_dir_data directly.
+	 */
+	return handle_query_dir_data(server, mid, buf, server->pdu_size,
+				      NULL, 0, false);
+}
+
 static int
 handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
 		 char *buf, unsigned int buf_len, struct folio_queue *buffer,
@@ -4997,25 +5363,46 @@ static void smb2_decrypt_offload(struct work_struct *work)
 	int rc;
 	struct mid_q_entry *mid;
 	struct iov_iter iter;
+	struct smb2_hdr *shdr;
+	unsigned int read_rsp_size = dw->server->vals->read_rsp_size;
 
+	/* decrypt read_rsp_size in buf + remainder in folio_queue */
 	iov_iter_folio_queue(&iter, ITER_DEST, dw->buffer, 0, 0, dw->len);
-	rc = decrypt_raw_data(dw->server, dw->buf, dw->server->vals->read_rsp_size,
-			      &iter, true);
+	rc = decrypt_raw_data(dw->server, dw->buf, read_rsp_size, &iter, true);
 	if (rc) {
 		cifs_dbg(VFS, "error decrypting rc=%d\n", rc);
 		goto free_pages;
 	}
 
 	dw->server->lstrp = jiffies;
+
+	shdr = (struct smb2_hdr *)(dw->buf + sizeof(struct smb2_transform_hdr));
+
+	/*
+	 * buf now contains read_rsp_size bytes after transform_hdr.
+	 * folio_queue contains (total_len - read_rsp_size) bytes.
+	 * The handle functions will determine where data actually starts based on data_offset.
+	 */
+
 	mid = smb2_find_dequeue_mid(dw->server, dw->buf);
 	if (mid == NULL)
 		cifs_dbg(FYI, "mid not found\n");
 	else {
 		mid->decrypted = true;
-		rc = handle_read_data(dw->server, mid, dw->buf,
-				      dw->server->vals->read_rsp_size,
-				      dw->buffer, dw->len,
-				      true);
+
+		/* Handle based on command type */
+		if (shdr->Command == SMB2_READ) {
+			rc = handle_read_data(dw->server, mid, dw->buf, read_rsp_size,
+					      dw->buffer, dw->len, true);
+		} else if (shdr->Command == SMB2_QUERY_DIRECTORY) {
+			rc = handle_query_dir_data(dw->server, mid, dw->buf, read_rsp_size,
+						   dw->buffer, dw->len, true);
+		} else {
+			cifs_dbg(VFS, "Unexpected command %u in decrypt offload\n",
+				 le16_to_cpu(shdr->Command));
+			rc = -EOPNOTSUPP;
+		}
+
 		if (rc >= 0) {
 #ifdef CONFIG_CIFS_STATS2
 			mid->when_received = jiffies;
@@ -5059,9 +5446,10 @@ receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid,
 {
 	char *buf = server->smallbuf;
 	struct smb2_transform_hdr *tr_hdr = (struct smb2_transform_hdr *)buf;
+	struct smb2_hdr *shdr;
 	struct iov_iter iter;
-	unsigned int len;
-	unsigned int buflen = server->pdu_size;
+	unsigned int len, total_len, buflen = server->pdu_size;
+	unsigned int read_rsp_size = server->vals->read_rsp_size;
 	int rc;
 	struct smb2_decrypt_work *dw;
 
@@ -5072,7 +5460,10 @@ receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid,
 	dw->server = server;
 
 	*num_mids = 1;
-	len = min_t(unsigned int, buflen, server->vals->read_rsp_size +
+	total_len = le32_to_cpu(tr_hdr->OriginalMessageSize);
+
+	/* Read transform_hdr + read_rsp_size into buf */
+	len = min_t(unsigned int, buflen, read_rsp_size +
 		sizeof(struct smb2_transform_hdr)) - HEADER_SIZE(server) + 1;
 
 	rc = cifs_read_from_socket(server, buf + HEADER_SIZE(server) - 1, len);
@@ -5080,9 +5471,8 @@ receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid,
 		goto free_dw;
 	server->total_read += rc;
 
-	len = le32_to_cpu(tr_hdr->OriginalMessageSize) -
-		server->vals->read_rsp_size;
-	dw->len = len;
+	/* Read remaining data into folio_queue */
+	dw->len = total_len - read_rsp_size;
 	len = round_up(dw->len, PAGE_SIZE);
 
 	size_t cur_size = 0;
@@ -5111,7 +5501,7 @@ receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid,
 		goto free_pages;
 
 	/*
-	 * For large reads, offload to different thread for better performance,
+	 * For large responses, offload to different thread for better performance,
 	 * use more cores decrypting which can be expensive
 	 */
 
@@ -5125,20 +5515,41 @@ receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid,
 		return -1;
 	}
 
-	rc = decrypt_raw_data(server, buf, server->vals->read_rsp_size,
-			      &iter, false);
+	/* Decrypt: read_rsp_size in buf + remainder in folio_queue */
+	rc = decrypt_raw_data(server, buf, read_rsp_size, &iter, false);
 	if (rc)
 		goto free_pages;
 
+	shdr = (struct smb2_hdr *) buf;
+
+	/*
+	 * buf now contains the complete response header (read_rsp_size bytes).
+	 * folio_queue contains (total_len - read_rsp_size) bytes.
+	 * The handle functions will determine where data actually starts based on data_offset.
+	 */
+
 	*mid = smb2_find_mid(server, buf);
 	if (*mid == NULL) {
 		cifs_dbg(FYI, "mid not found\n");
 	} else {
-		cifs_dbg(FYI, "mid found\n");
+		cifs_dbg(FYI, "mid found, command=%u\n", le16_to_cpu(shdr->Command));
 		(*mid)->decrypted = true;
-		rc = handle_read_data(server, *mid, buf,
-				      server->vals->read_rsp_size,
-				      dw->buffer, dw->len, false);
+
+		/* Handle based on command type */
+		if (shdr->Command == SMB2_READ) {
+			rc = handle_read_data(server, *mid, buf, read_rsp_size,
+					      dw->buffer, dw->len, false);
+		} else if (shdr->Command == SMB2_QUERY_DIRECTORY) {
+			rc = handle_query_dir_data(server, *mid, buf, read_rsp_size,
+						   dw->buffer, dw->len, false);
+		} else {
+			/* For now, other commands not supported in large encrypted path */
+			cifs_server_dbg(VFS,
+					"Large encrypted responses only supported for SMB2_READ and SMB2_QUERY_DIRECTORY (got %u)\n",
+					le16_to_cpu(shdr->Command));
+			rc = -EOPNOTSUPP;
+		}
+
 		if (rc >= 0) {
 			if (server->ops->is_network_name_deleted) {
 				server->ops->is_network_name_deleted(buf,
@@ -5279,7 +5690,6 @@ smb3_receive_transform(struct TCP_Server_Info *server,
 		return -ECONNABORTED;
 	}
 
-	/* TODO: add support for compounds containing READ. */
 	if (pdu_length > CIFSMaxBufSize + MAX_HEADER_SIZE(server)) {
 		return receive_encrypted_read(server, &mids[0], num_mids);
 	}
diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c
index 2d55246d2851b..92724cfb5b3f5 100644
--- a/fs/smb/client/smb2pdu.c
+++ b/fs/smb/client/smb2pdu.c
@@ -5496,6 +5496,185 @@ num_entries(int infotype, char *bufstart, char *end_of_buf, char **lastentry,
 	return entrycount;
 }
 
+/*
+ * Callback for async QueryDirectory with multi-credit support
+ */
+static void
+smb2_query_dir_callback(struct TCP_Server_Info *server, struct mid_q_entry *mid)
+{
+	struct cifs_query_dir_io *qd_io = mid->callback_data;
+	struct cifs_tcon *tcon = qd_io->tcon;
+	struct smb2_hdr *shdr = (struct smb2_hdr *)qd_io->iov[0].iov_base;
+	struct cifs_credits credits = {
+		.value = 0,
+		.instance = 0,
+	};
+
+	WARN_ONCE(qd_io->server != server,
+		  "qd_io server %p != mid server %p",
+		  qd_io->server, server);
+
+	cifs_dbg(FYI, "%s: mid=%llu state=%d result=%d\n",
+		 __func__, mid->mid, mid->mid_state, qd_io->result);
+
+	switch (mid->mid_state) {
+	case MID_RESPONSE_RECEIVED:
+		credits.value = le16_to_cpu(shdr->CreditRequest);
+		credits.instance = server->reconnect_instance;
+		/* result already set, check signature if needed */
+		if (server->sign && !mid->decrypted) {
+			int rc;
+			struct smb_rqst rqst = {
+				.rq_iov = &qd_io->iov[0],
+				.rq_nvec = qd_io->iov[1].iov_len ? 2 : 1,
+			};
+
+			rc = smb2_verify_signature(&rqst, server);
+			if (rc) {
+				cifs_tcon_dbg(VFS, "QueryDir signature verification returned error = %d\n",
+					      rc);
+				qd_io->result = rc;
+			}
+		}
+		break;
+	case MID_REQUEST_SUBMITTED:
+	case MID_RETRY_NEEDED:
+		qd_io->result = -EAGAIN;
+		break;
+	case MID_RESPONSE_MALFORMED:
+		credits.value = le16_to_cpu(shdr->CreditRequest);
+		credits.instance = server->reconnect_instance;
+		qd_io->result = smb_EIO(smb_eio_trace_read_rsp_malformed);
+		break;
+	default:
+		qd_io->result = smb_EIO1(smb_eio_trace_read_mid_state_unknown,
+					 mid->mid_state);
+		break;
+	}
+
+	if (qd_io->result && qd_io->result != -ENODATA)
+		cifs_stats_fail_inc(tcon, SMB2_QUERY_DIRECTORY_HE);
+
+	trace_smb3_rw_credits(0, 0, qd_io->credits.value,
+			      server->credits, server->in_flight,
+			      0, cifs_trace_rw_credits_read_response_clear);
+	qd_io->credits.value = 0;
+	release_mid(server, mid);
+	trace_smb3_rw_credits(0, 0, 0,
+			      server->credits, server->in_flight,
+			      credits.value, cifs_trace_rw_credits_read_response_add);
+	add_credits(server, &credits, 0);
+
+	complete(&qd_io->done);
+}
+
+/*
+ * QueryDirectory with large buffer and multi-credit support.
+ * Uses async infrastructure but waits for completion synchronously.
+ */
+int
+SMB2_query_directory_large(struct cifs_query_dir_io *qd_io, unsigned int buf_size)
+{
+	int rc, flags = 0;
+	char *buf;
+	struct smb2_hdr *shdr;
+	struct smb_rqst rqst = { .rq_iov = &qd_io->iov[0],
+				 .rq_nvec = SMB2_QUERY_DIRECTORY_IOV_SIZE };
+	struct TCP_Server_Info *server = qd_io->server;
+	struct cifs_tcon *tcon = qd_io->tcon;
+	unsigned int total_len;
+	int credit_request;
+
+	cifs_dbg(FYI, "%s: buf_size=%u\n", __func__, buf_size);
+
+	/* Cap buffer size to avoid kmalloc failures for very large allocations.
+	 * SMB2_MAX_QD_DATABUF_SIZE is a safe limit that stays well below typical
+	 * kmalloc constraints while still allowing large directory listings.
+	 */
+	if (buf_size > SMB2_MAX_QD_DATABUF_SIZE)
+		buf_size = SMB2_MAX_QD_DATABUF_SIZE;
+
+	/* Allocate response buffer. Since we'll build a combined header+data buffer,
+	 * we need space for both. We'll request a slightly smaller OutputBufferLength
+	 * from the server to ensure the total response fits.
+	 */
+	qd_io->combined_iov.iov_base = kmalloc(buf_size, GFP_KERNEL);
+	if (!qd_io->combined_iov.iov_base)
+		return -ENOMEM;
+	/* Store total capacity in iov_len; updated to actual data length by receive handler */
+	qd_io->combined_iov.iov_len = buf_size;
+
+	/* Initialize completion */
+	init_completion(&qd_io->done);
+
+	/* Request less data from server to leave room for the response header.
+	 * Use MAX_CIFS_SMALL_BUFFER_SIZE as a safety margin.
+	 */
+	rc = SMB2_query_directory_init(qd_io->xid, tcon, server,
+				       &rqst, qd_io->persistent_fid,
+				       qd_io->volatile_fid, qd_io->index,
+				       qd_io->srch_inf->info_level,
+				       buf_size - MAX_CIFS_SMALL_BUFFER_SIZE);
+	if (rc) {
+		kfree(qd_io->combined_iov.iov_base);
+		qd_io->combined_iov.iov_base = NULL;
+		return rc;
+	}
+
+	if (smb3_encryption_required(tcon))
+		flags |= CIFS_TRANSFORM_REQ;
+
+	buf = rqst.rq_iov[0].iov_base;
+	total_len = rqst.rq_iov[0].iov_len;
+
+	shdr = (struct smb2_hdr *)buf;
+
+	if (qd_io->replay) {
+		/* Back-off before retry */
+		if (qd_io->cur_sleep)
+			msleep(qd_io->cur_sleep);
+		smb2_set_replay(server, &rqst);
+	}
+
+	/* Set credit charge based on buffer size */
+	if (qd_io->credits.value > 0) {
+		shdr->CreditCharge = cpu_to_le16(DIV_ROUND_UP(buf_size,
+						SMB2_MAX_BUFFER_SIZE));
+		credit_request = le16_to_cpu(shdr->CreditCharge) + 8;
+		if (server->credits >= server->max_credits)
+			shdr->CreditRequest = cpu_to_le16(0);
+		else
+			shdr->CreditRequest = cpu_to_le16(
+				min_t(int, server->max_credits -
+						server->credits, credit_request));
+
+		flags |= CIFS_HAS_CREDITS;
+	}
+
+	rc = cifs_call_async(server, &rqst,
+			     cifs_query_dir_receive, smb2_query_dir_callback,
+			     smb2_query_dir_handle_data, qd_io, flags,
+			     &qd_io->credits);
+	if (rc) {
+		cifs_stats_fail_inc(tcon, SMB2_QUERY_DIRECTORY_HE);
+		trace_smb3_query_dir_err(qd_io->xid, qd_io->persistent_fid,
+					 tcon->tid, tcon->ses->Suid,
+					 qd_io->index, 0, rc);
+		kfree(qd_io->combined_iov.iov_base);
+		qd_io->combined_iov.iov_base = NULL;
+	}
+
+	/* Free request buffer immediately after async call */
+	cifs_small_buf_release(buf);
+
+	if (rc)
+		return rc;
+
+	/* Wait for the async operation to complete */
+	wait_for_completion(&qd_io->done);
+	return qd_io->result;
+}
+
 /*
  * Readdir/FindFirst
  */
@@ -5560,12 +5739,6 @@ int SMB2_query_directory_init(const unsigned int xid,
 	req->FileNameOffset =
 		cpu_to_le16(sizeof(struct smb2_query_directory_req));
 	req->FileNameLength = cpu_to_le16(len);
-	/*
-	 * BB could be 30 bytes or so longer if we used SMB2 specific
-	 * buffer lengths, but this is safe and close enough.
-	 */
-	output_size = min_t(unsigned int, output_size, server->maxBuf);
-	output_size = min_t(unsigned int, output_size, 2 << 15);
 	req->OutputBufferLength = cpu_to_le32(output_size);
 
 	iov[0].iov_base = (char *)req;
diff --git a/fs/smb/client/smb2pdu.h b/fs/smb/client/smb2pdu.h
index 7b7a864520c68..843e30c1ecc61 100644
--- a/fs/smb/client/smb2pdu.h
+++ b/fs/smb/client/smb2pdu.h
@@ -132,6 +132,9 @@ struct share_redirect_error_context_rsp {
 /* Size of the minimal QueryDir response for checking if more data exists */
 #define SMB2_QD2_RESPONSE_SIZE 4096
 
+/* max query directory data buffer size */
+#define SMB2_MAX_QD_DATABUF_SIZE (2 * 1024 * 1024)
+
 /*
  * Output buffer size for first QueryDir in Create+QD1+QD2 compound.
  * Accounts for shared buffer space needed for all three responses.
diff --git a/fs/smb/client/smb2proto.h b/fs/smb/client/smb2proto.h
index 9de7d2fe8d466..9607e8899f7ff 100644
--- a/fs/smb/client/smb2proto.h
+++ b/fs/smb/client/smb2proto.h
@@ -46,6 +46,8 @@ __le32 smb2_get_lease_state(struct cifsInodeInfo *cinode, unsigned int oplock);
 bool smb2_is_valid_oplock_break(char *buffer, struct TCP_Server_Info *server);
 int smb3_handle_read_data(struct TCP_Server_Info *server,
 			  struct mid_q_entry *mid);
+int smb2_query_dir_handle_data(struct TCP_Server_Info *server,
+			       struct mid_q_entry *mid);
 struct inode *smb2_create_reparse_inode(struct cifs_open_info_data *data,
 					struct super_block *sb,
 					const unsigned int xid,
@@ -197,6 +199,7 @@ int SMB2_query_directory_init(const unsigned int xid, struct cifs_tcon *tcon,
 			      u64 volatile_fid, int index, int info_level,
 			      unsigned int output_size);
 void SMB2_query_directory_free(struct smb_rqst *rqst);
+int SMB2_query_directory_large(struct cifs_query_dir_io *qd_io, unsigned int buf_size);
 int SMB2_set_eof(const unsigned int xid, struct cifs_tcon *tcon,
 		 u64 persistent_fid, u64 volatile_fid, u32 pid,
 		 loff_t new_eof);
diff --git a/fs/smb/client/trace.h b/fs/smb/client/trace.h
index acfbb63086ea2..54ee1317c5b12 100644
--- a/fs/smb/client/trace.h
+++ b/fs/smb/client/trace.h
@@ -165,6 +165,7 @@
 	EM(cifs_trace_rw_credits_write_prepare,		"wr-prepare ") \
 	EM(cifs_trace_rw_credits_write_response_add,	"wr-resp-add") \
 	EM(cifs_trace_rw_credits_write_response_clear,	"wr-resp-clr") \
+	EM(cifs_trace_rw_credits_query_dir_done,	"qd-done    ") \
 	E_(cifs_trace_rw_credits_zero_in_flight,	"ZERO-IN-FLT")
 
 #define smb3_tcon_ref_traces					      \
diff --git a/fs/smb/client/transport.c b/fs/smb/client/transport.c
index 05f8099047e1a..24ccadb00f568 100644
--- a/fs/smb/client/transport.c
+++ b/fs/smb/client/transport.c
@@ -1134,7 +1134,7 @@ cifs_discard_remaining_data(struct TCP_Server_Info *server)
 }
 
 static int
-__cifs_readv_discard(struct TCP_Server_Info *server, struct mid_q_entry *mid,
+__cifs_discard_and_dequeue(struct TCP_Server_Info *server, struct mid_q_entry *mid,
 		     bool malformed)
 {
 	int length;
@@ -1146,12 +1146,161 @@ __cifs_readv_discard(struct TCP_Server_Info *server, struct mid_q_entry *mid,
 	return length;
 }
 
-static int
-cifs_readv_discard(struct TCP_Server_Info *server, struct mid_q_entry *mid)
+/*
+ * Receive handler for async QueryDirectory with multi-credit support
+ */
+int
+cifs_query_dir_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid)
 {
-	struct cifs_io_subrequest *rdata = mid->callback_data;
+	int length, len;
+	unsigned int data_offset, data_len, resp_size;
+	struct cifs_query_dir_io *qd_io = mid->callback_data;
+	char *buf = server->smallbuf;
+	unsigned int buflen = server->pdu_size;
+	struct smb2_query_directory_rsp *rsp;
+
+	cifs_dbg(FYI, "%s: mid=%llu buf_capacity=%zu\n",
+		 __func__, mid->mid, qd_io->combined_iov.iov_len);
+
+	/*
+	 * Read the rest of QUERY_DIRECTORY_RSP header (sans Data array).
+	 * QueryDirectory response structure is 64 bytes (SMB2 header) + 8 bytes (fixed part).
+	 */
+	resp_size = sizeof(struct smb2_query_directory_rsp);
+	len = min_t(unsigned int, buflen, resp_size) - HEADER_SIZE(server) + 1;
+
+	length = cifs_read_from_socket(server,
+				       buf + HEADER_SIZE(server) - 1, len);
+	if (length < 0) {
+		qd_io->result = length;
+		return __cifs_discard_and_dequeue(server, mid, false);
+	}
+	server->total_read += length;
+
+	if (server->ops->is_session_expired &&
+	    server->ops->is_session_expired(buf)) {
+		cifs_reconnect(server, true);
+		return -1;
+	}
+
+	if (server->ops->is_status_pending &&
+	    server->ops->is_status_pending(buf, server)) {
+		cifs_discard_remaining_data(server);
+		return -1;
+	}
+
+	/* Is there enough to get to the rest of the QUERY_DIRECTORY_RSP header? */
+	if (server->total_read < resp_size) {
+		cifs_dbg(FYI, "%s: server returned short header. got=%u expected=%u\n",
+			 __func__, server->total_read, resp_size);
+		qd_io->result = smb_EIO2(smb_eio_trace_read_rsp_short,
+					 server->total_read, resp_size);
+		return __cifs_discard_and_dequeue(server, mid, true);
+	}
+
+	/* Set up first iov for signature check and to get credits */
+	qd_io->iov[0].iov_base = buf;
+	qd_io->iov[0].iov_len = server->total_read;
+	qd_io->iov[1].iov_base = NULL;
+	qd_io->iov[1].iov_len = 0;
+	cifs_dbg(FYI, "0: iov_base=%p iov_len=%zu\n",
+		 qd_io->iov[0].iov_base, qd_io->iov[0].iov_len);
+
+	/* Parse header early to access status before map_error converts it */
+	rsp = (struct smb2_query_directory_rsp *)buf;
+
+	/* Was the SMB query_directory successful? */
+	qd_io->result = server->ops->map_error(buf, false);
+	if (qd_io->result != 0) {
+		if (qd_io->combined_iov.iov_base && qd_io->iov[0].iov_len > 0 &&
+		    qd_io->iov[0].iov_len <= qd_io->combined_iov.iov_len) {
+			memcpy(qd_io->combined_iov.iov_base, qd_io->iov[0].iov_base,
+			       qd_io->iov[0].iov_len);
+			qd_io->iov[0].iov_base = qd_io->combined_iov.iov_base;
+		}
+		cifs_dbg(FYI, "%s: server returned error %d (Status was 0x%x)\n",
+			 __func__, qd_io->result, le32_to_cpu(rsp->hdr.Status));
+		return __cifs_discard_and_dequeue(server, mid, false);
+	}
+
+	data_offset = le16_to_cpu(rsp->OutputBufferOffset);
+	data_len = le32_to_cpu(rsp->OutputBufferLength);
+
+	cifs_dbg(FYI, "%s: total_read=%u data_offset=%u data_len=%u\n",
+		 __func__, server->total_read, data_offset, data_len);
+
+	/* Validate data_offset and data_len */
+	if (data_offset < server->total_read) {
+		cifs_dbg(FYI, "%s: data offset (%u) inside response header\n",
+			 __func__, data_offset);
+		data_offset = server->total_read;
+		data_len = 0;
+	} else if (data_offset > MAX_CIFS_SMALL_BUFFER_SIZE) {
+		cifs_dbg(FYI, "%s: data offset (%u) beyond end of smallbuf\n",
+			 __func__, data_offset);
+		qd_io->result = smb_EIO1(smb_eio_trace_read_overlarge,
+					 data_offset);
+		return __cifs_discard_and_dequeue(server, mid, true);
+	}
+
+	/* Read any padding between header and data */
+	len = data_offset - server->total_read;
+	if (len > 0) {
+		length = cifs_read_from_socket(server,
+					       buf + server->total_read, len);
+		if (length < 0) {
+			qd_io->result = length;
+			return __cifs_discard_and_dequeue(server, mid, false);
+		}
+		server->total_read += length;
+		qd_io->iov[0].iov_len = server->total_read;
+	}
+
+	/* Check if data fits in the pre-allocated combined buffer */
+	if (qd_io->iov[0].iov_len + data_len > qd_io->combined_iov.iov_len) {
+		cifs_dbg(VFS, "%s: response (%zu + %u) exceeds buffer capacity (%zu)\n",
+			 __func__, qd_io->iov[0].iov_len, data_len, qd_io->combined_iov.iov_len);
+		qd_io->result = smb_EIO2(smb_eio_trace_read_rsp_malformed,
+					 data_len, (unsigned int)qd_io->combined_iov.iov_len);
+		return __cifs_discard_and_dequeue(server, mid, true);
+	}
+
+	/*
+	 * Build combined header+data in qd_io->combined_iov.iov_base,
+	 * preserving the wire layout so SMB2 signing can be verified against
+	 * the exact on-the-wire bytes.  hdr_len already accounts for the
+	 * header and any padding up to data_offset; data is read at that
+	 * same offset.  OutputBufferOffset/Length are left untouched.
+	 */
+	if (qd_io->iov[0].iov_len > 0 && qd_io->result == 0) {
+		size_t hdr_len = qd_io->iov[0].iov_len;
+
+		/* Copy header+padding to combined buffer, preserving wire layout */
+		memcpy(qd_io->combined_iov.iov_base, qd_io->iov[0].iov_base, hdr_len);
+
+		/* Read directory entries at the wire data_offset position */
+		if (data_len > 0) {
+			length = cifs_read_from_socket(server,
+						       qd_io->combined_iov.iov_base + hdr_len,
+						       data_len);
+			if (length < 0) {
+				qd_io->result = length;
+				return __cifs_discard_and_dequeue(server, mid, false);
+			}
+			server->total_read += length;
+
+			/* Set up second iov pointing to the directory data within combined buffer */
+			qd_io->iov[1].iov_base = qd_io->combined_iov.iov_base + hdr_len;
+			qd_io->iov[1].iov_len = length;
+		}
+
+		qd_io->combined_iov.iov_len = hdr_len + (data_len > 0 ? length : 0);
+
+		cifs_dbg(FYI, "total_read=%u buflen=%u data_len=%u hdr_len=%zu combined_len=%zu\n",
+			 server->total_read, buflen, data_len, hdr_len, qd_io->combined_iov.iov_len);
+	}
 
-	return  __cifs_readv_discard(server, mid, rdata->result);
+	return __cifs_discard_and_dequeue(server, mid, false);
 }
 
 int
@@ -1205,7 +1354,7 @@ cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid)
 		cifs_dbg(FYI, "%s: server returned error %d\n",
 			 __func__, rdata->result);
 		/* normal error on read response */
-		return __cifs_readv_discard(server, mid, false);
+		return __cifs_discard_and_dequeue(server, mid, false);
 	}
 
 	/* Is there enough to get to the rest of the READ_RSP header? */
@@ -1215,7 +1364,7 @@ cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid)
 			 server->vals->read_rsp_size);
 		rdata->result = smb_EIO2(smb_eio_trace_read_rsp_short,
 					 server->total_read, server->vals->read_rsp_size);
-		return cifs_readv_discard(server, mid);
+		return __cifs_discard_and_dequeue(server, mid, rdata->result);
 	}
 
 	data_offset = server->ops->read_data_offset(buf);
@@ -1234,7 +1383,7 @@ cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid)
 			 __func__, data_offset);
 		rdata->result = smb_EIO1(smb_eio_trace_read_overlarge,
 					 data_offset);
-		return cifs_readv_discard(server, mid);
+		return __cifs_discard_and_dequeue(server, mid, rdata->result);
 	}
 
 	cifs_dbg(FYI, "%s: total_read=%u data_offset=%u\n",
@@ -1260,7 +1409,7 @@ cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid)
 		/* data_len is corrupt -- discard frame */
 		rdata->result = smb_EIO2(smb_eio_trace_read_rsp_malformed,
 					 data_offset + data_len, buflen);
-		return cifs_readv_discard(server, mid);
+		return __cifs_discard_and_dequeue(server, mid, rdata->result);
 	}
 
 #ifdef CONFIG_CIFS_SMB_DIRECT
@@ -1279,7 +1428,7 @@ cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid)
 
 	/* discard anything left over */
 	if (server->total_read < buflen)
-		return cifs_readv_discard(server, mid);
+		return __cifs_discard_and_dequeue(server, mid, rdata->result);
 
 	dequeue_mid(server, mid, false);
 	mid->resp_buf = server->smallbuf;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 07/19] cifs: reorganize cached dir helpers
  2026-05-01 11:20 [PATCH v4 01/19] cifs: change_conf needs to be called for session setup nspmangalore
                   ` (4 preceding siblings ...)
  2026-05-01 11:20 ` [PATCH v4 06/19] cifs: optimize readdir for larger directories nspmangalore
@ 2026-05-01 11:20 ` nspmangalore
  2026-05-01 11:20 ` [PATCH v4 08/19] cifs: make cfid locks more granular nspmangalore
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: nspmangalore @ 2026-05-01 11:20 UTC (permalink / raw)
  To: linux-cifs, smfrench, pc, bharathsm, dhowells, henrique.carvalho,
	ematsumiya
  Cc: Shyam Prasad N

From: Shyam Prasad N <sprasad@microsoft.com>

Currently, the helper functions for cfid and dirent caching are spread
across cached_dir.c and readdir.c, and the de_mutex locking is done by
their callers. This change wraps that locking inside helper functions
and keeps all such helpers in cached_dir.c.

This change also splits the logic for emitting dirents into the cache
and emitting them to the VFS into separate functions.
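
For illustration, a rough sketch of how the readdir path is expected to
use the new helpers once they own the de_mutex locking (the names below
are the helpers added by this patch; the exact call sites are in the
readdir.c hunks further down):

	/* in cifs_readdir(), roughly: */
	if (emit_cached_dir_if_valid(cfid, file, ctx))
		goto rddir2_exit;	/* whole listing served from cache */

	/* in cifs_filldir(), roughly: */
	add_to_cached_dir(cfid, ctx, name.name, name.len, &fattr, file);
	cifs_prime_dcache(file_dentry(file), &name, &fattr);
	return !cifs_dir_emit(ctx, name.name, name.len, &fattr);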

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
---
 fs/smb/client/cached_dir.c | 207 +++++++++++++++++++++++++++++++++++++
 fs/smb/client/cached_dir.h |  18 +++-
 fs/smb/client/readdir.c    | 167 ++----------------------------
 3 files changed, 231 insertions(+), 161 deletions(-)

diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
index 64e22c064fa0a..a9bc0c81868c8 100644
--- a/fs/smb/client/cached_dir.c
+++ b/fs/smb/client/cached_dir.c
@@ -23,6 +23,213 @@ struct cached_dir_dentry {
 	struct dentry *dentry;
 };
 
+static bool emit_cached_dirents(struct cached_dirents *cde,
+				struct dir_context *ctx)
+{
+	struct cached_dirent *dirent;
+	bool rc;
+
+	lockdep_assert_held(&cde->de_mutex);
+
+	list_for_each_entry(dirent, &cde->entries, entry) {
+		/*
+		 * Skip all early entries prior to the current lseek()
+		 * position.
+		 */
+		if (ctx->pos > dirent->pos)
+			continue;
+		/*
+		 * We recorded the current ->pos value for the dirent
+		 * when we stored it in the cache.
+		 * However, this sequence of ->pos values may have holes
+		 * in it, for example dot-dirs returned from the server
+		 * are suppressed.
+		 * Handle this by forcing ctx->pos to be the same as the
+		 * ->pos of the current dirent we emit from the cache.
+		 * This means that when we emit these entries from the cache
+		 * we now emit them with the same ->pos value as in the
+		 * initial scan.
+		 */
+		ctx->pos = dirent->pos;
+		rc = dir_emit(ctx, dirent->name, dirent->namelen,
+			      dirent->fattr.cf_uniqueid,
+			      dirent->fattr.cf_dtype);
+		if (!rc)
+			return rc;
+		ctx->pos++;
+	}
+	return true;
+}
+
+static bool add_cached_dirent(struct cached_dirents *cde,
+			      struct dir_context *ctx, const char *name,
+			      int namelen, struct cifs_fattr *fattr,
+			      struct file *file)
+{
+	struct cached_dirent *de;
+
+	lockdep_assert_held(&cde->de_mutex);
+
+	if (cde->file != file)
+		return false;
+	if (cde->is_valid || cde->is_failed)
+		return false;
+	if (ctx->pos != cde->pos) {
+		cde->is_failed = 1;
+		return false;
+	}
+	de = kzalloc_obj(*de, GFP_KERNEL);
+	if (de == NULL) {
+		cde->is_failed = 1;
+		return false;
+	}
+	de->namelen = namelen;
+	de->name = kstrndup(name, namelen, GFP_KERNEL);
+	if (de->name == NULL) {
+		kfree(de);
+		cde->is_failed = 1;
+		return false;
+	}
+	de->pos = ctx->pos;
+
+	memcpy(&de->fattr, fattr, sizeof(struct cifs_fattr));
+
+	list_add_tail(&de->entry, &cde->entries);
+	/* update accounting */
+	cde->entries_count++;
+	cde->bytes_used += sizeof(*de) + (size_t)namelen + 1;
+	return true;
+}
+
+bool emit_cached_dir_if_valid(struct cached_fid *cfid,
+			      struct file *file,
+			      struct dir_context *ctx)
+{
+	if (!cfid)
+		return false;
+
+	mutex_lock(&cfid->dirents.de_mutex);
+	/*
+	 * If this was reading from the start of the directory
+	 * we need to initialize scanning and storing the
+	 * directory content.
+	 */
+	if (ctx->pos == 0 && cfid->dirents.file == NULL) {
+		cfid->dirents.file = file;
+		cfid->dirents.pos = 2;
+	}
+
+	if (!cfid->dirents.is_valid) {
+		mutex_unlock(&cfid->dirents.de_mutex);
+		return false;
+	}
+
+	if (dir_emit_dots(file, ctx))
+		emit_cached_dirents(&cfid->dirents, ctx);
+
+	mutex_unlock(&cfid->dirents.de_mutex);
+	return true;
+}
+
+bool add_to_cached_dir(struct cached_fid *cfid,
+		       struct dir_context *ctx,
+		       const char *name,
+		       int namelen,
+		       struct cifs_fattr *fattr,
+		       struct file *file)
+{
+	size_t delta_bytes;
+	bool added = false;
+
+	if (!cfid)
+		return false;
+
+	/* Cost of this entry */
+	delta_bytes = sizeof(struct cached_dirent) + (size_t)namelen + 1;
+
+	mutex_lock(&cfid->dirents.de_mutex);
+	added = add_cached_dirent(&cfid->dirents, ctx, name, namelen,
+				  fattr, file);
+	mutex_unlock(&cfid->dirents.de_mutex);
+
+	if (added) {
+		/* per-tcon then global for consistency with free path */
+		atomic64_add((long long)delta_bytes, &cfid->cfids->total_dirents_bytes);
+		atomic_long_inc(&cfid->cfids->total_dirents_entries);
+		atomic64_add((long long)delta_bytes, &cifs_dircache_bytes_used);
+	}
+
+	return added;
+}
+
+static void update_cached_dirents_count(struct cached_dirents *cde,
+					struct file *file)
+{
+	if (cde->file != file)
+		return;
+	if (cde->is_valid || cde->is_failed)
+		return;
+
+	cde->pos++;
+}
+
+static void finished_cached_dirents_count(struct cached_dirents *cde,
+					  struct dir_context *ctx,
+					  struct file *file)
+{
+	if (cde->file != file)
+		return;
+	if (cde->is_valid || cde->is_failed)
+		return;
+	if (ctx->pos != cde->pos)
+		return;
+
+	cde->is_valid = 1;
+}
+
+void update_pos_cached_dir(struct cached_fid *cfid,
+				      struct file *file)
+{
+	if (!cfid)
+		return;
+
+	mutex_lock(&cfid->dirents.de_mutex);
+	update_cached_dirents_count(&cfid->dirents, file);
+	mutex_unlock(&cfid->dirents.de_mutex);
+}
+
+void complete_cached_dir(struct cached_fid *cfid,
+					struct dir_context *ctx,
+					struct file *file)
+{
+	if (!cfid)
+		return;
+
+	mutex_lock(&cfid->dirents.de_mutex);
+	finished_cached_dirents_count(&cfid->dirents, ctx, file);
+	mutex_unlock(&cfid->dirents.de_mutex);
+}
+
+struct cached_dirent *lookup_cached_dirent(struct cached_dirents *cde,
+				   const char *name,
+				   unsigned int namelen)
+{
+	struct cached_dirent *entry;
+
+	if (!cde)
+		return NULL;
+
+	lockdep_assert_held(&cde->de_mutex);
+
+	list_for_each_entry(entry, &cde->entries, entry) {
+		if (entry->namelen == namelen &&
+		    memcmp(entry->name, name, namelen) == 0)
+			return entry;
+	}
+
+	return NULL;
+}
+
 static struct cached_fid *find_or_create_cached_dir(struct cached_fids *cfids,
 						    const char *path,
 						    bool lookup_only,
diff --git a/fs/smb/client/cached_dir.h b/fs/smb/client/cached_dir.h
index 19d5592512e4b..09f1f488059c9 100644
--- a/fs/smb/client/cached_dir.h
+++ b/fs/smb/client/cached_dir.h
@@ -8,7 +8,6 @@
 #ifndef _CACHED_DIR_H
 #define _CACHED_DIR_H
 
-
 struct cached_dirent {
 	struct list_head entry;
 	char *name;
@@ -87,6 +86,23 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon, const char *path,
 int open_cached_dir_by_dentry(struct cifs_tcon *tcon, struct dentry *dentry,
 			      struct cached_fid **ret_cfid);
 void close_cached_dir(struct cached_fid *cfid);
+bool emit_cached_dir_if_valid(struct cached_fid *cfid,
+			      struct file *file,
+			      struct dir_context *ctx);
+bool add_to_cached_dir(struct cached_fid *cfid,
+		       struct dir_context *ctx,
+		       const char *name,
+		       int namelen,
+		       struct cifs_fattr *fattr,
+		       struct file *file);
+void update_pos_cached_dir(struct cached_fid *cfid,
+				      struct file *file);
+void complete_cached_dir(struct cached_fid *cfid,
+					struct dir_context *ctx,
+					struct file *file);
+struct cached_dirent *lookup_cached_dirent(struct cached_dirents *cde,
+				   const char *name,
+				   unsigned int namelen);
 void drop_cached_dir_by_name(const unsigned int xid, struct cifs_tcon *tcon,
 			     const char *name, struct cifs_sb_info *cifs_sb);
 void close_all_cached_dirs(struct cifs_sb_info *cifs_sb);
diff --git a/fs/smb/client/readdir.c b/fs/smb/client/readdir.c
index 8a444f97e0ae9..907e235ad1b8f 100644
--- a/fs/smb/client/readdir.c
+++ b/fs/smb/client/readdir.c
@@ -817,136 +817,13 @@ find_cifs_entry(const unsigned int xid, struct cifs_tcon *tcon, loff_t pos,
 	return rc;
 }
 
-static bool emit_cached_dirents(struct cached_dirents *cde,
-				struct dir_context *ctx)
-{
-	struct cached_dirent *dirent;
-	bool rc;
-
-	list_for_each_entry(dirent, &cde->entries, entry) {
-		/*
-		 * Skip all early entries prior to the current lseek()
-		 * position.
-		 */
-		if (ctx->pos > dirent->pos)
-			continue;
-		/*
-		 * We recorded the current ->pos value for the dirent
-		 * when we stored it in the cache.
-		 * However, this sequence of ->pos values may have holes
-		 * in it, for example dot-dirs returned from the server
-		 * are suppressed.
-		 * Handle this by forcing ctx->pos to be the same as the
-		 * ->pos of the current dirent we emit from the cache.
-		 * This means that when we emit these entries from the cache
-		 * we now emit them with the same ->pos value as in the
-		 * initial scan.
-		 */
-		ctx->pos = dirent->pos;
-		rc = dir_emit(ctx, dirent->name, dirent->namelen,
-			      dirent->fattr.cf_uniqueid,
-			      dirent->fattr.cf_dtype);
-		if (!rc)
-			return rc;
-		ctx->pos++;
-	}
-	return true;
-}
-
-static void update_cached_dirents_count(struct cached_dirents *cde,
-					struct file *file)
-{
-	if (cde->file != file)
-		return;
-	if (cde->is_valid || cde->is_failed)
-		return;
-
-	cde->pos++;
-}
-
-static void finished_cached_dirents_count(struct cached_dirents *cde,
-					struct dir_context *ctx, struct file *file)
-{
-	if (cde->file != file)
-		return;
-	if (cde->is_valid || cde->is_failed)
-		return;
-	if (ctx->pos != cde->pos)
-		return;
-
-	cde->is_valid = 1;
-}
-
-static bool add_cached_dirent(struct cached_dirents *cde,
-			      struct dir_context *ctx, const char *name,
-			      int namelen, struct cifs_fattr *fattr,
-			      struct file *file)
-{
-	struct cached_dirent *de;
-
-	if (cde->file != file)
-		return false;
-	if (cde->is_valid || cde->is_failed)
-		return false;
-	if (ctx->pos != cde->pos) {
-		cde->is_failed = 1;
-		return false;
-	}
-	de = kzalloc_obj(*de, GFP_ATOMIC);
-	if (de == NULL) {
-		cde->is_failed = 1;
-		return false;
-	}
-	de->namelen = namelen;
-	de->name = kstrndup(name, namelen, GFP_ATOMIC);
-	if (de->name == NULL) {
-		kfree(de);
-		cde->is_failed = 1;
-		return false;
-	}
-	de->pos = ctx->pos;
-
-	memcpy(&de->fattr, fattr, sizeof(struct cifs_fattr));
-
-	list_add_tail(&de->entry, &cde->entries);
-	/* update accounting */
-	cde->entries_count++;
-	cde->bytes_used += sizeof(*de) + (size_t)namelen + 1;
-	return true;
-}
-
 static bool cifs_dir_emit(struct dir_context *ctx,
 			  const char *name, int namelen,
-			  struct cifs_fattr *fattr,
-			  struct cached_fid *cfid,
-			  struct file *file)
+			  struct cifs_fattr *fattr)
 {
-	size_t delta_bytes = 0;
-	bool rc, added = false;
 	ino_t ino = cifs_uniqueid_to_ino_t(fattr->cf_uniqueid);
 
-	rc = dir_emit(ctx, name, namelen, ino, fattr->cf_dtype);
-	if (!rc)
-		return rc;
-
-	if (cfid) {
-		/* Cost of this entry */
-		delta_bytes = sizeof(struct cached_dirent) + (size_t)namelen + 1;
-
-		mutex_lock(&cfid->dirents.de_mutex);
-		added = add_cached_dirent(&cfid->dirents, ctx, name, namelen,
-					  fattr, file);
-		mutex_unlock(&cfid->dirents.de_mutex);
-
-		if (added) {
-			/* per-tcon then global for consistency with free path */
-			atomic64_add((long long)delta_bytes, &cfid->cfids->total_dirents_bytes);
-			atomic_long_inc(&cfid->cfids->total_dirents_entries);
-			atomic64_add((long long)delta_bytes, &cifs_dircache_bytes_used);
-		}
-	}
-
-	return rc;
+	return dir_emit(ctx, name, namelen, ino, fattr->cf_dtype);
 }
 
 static int cifs_filldir(char *find_entry, struct file *file,
@@ -1040,10 +917,10 @@ static int cifs_filldir(char *find_entry, struct file *file,
 		 */
 		fattr.cf_flags |= CIFS_FATTR_NEED_REVAL;
 
+	add_to_cached_dir(cfid, ctx, name.name, name.len, &fattr, file);
 	cifs_prime_dcache(file_dentry(file), &name, &fattr);
 
-	return !cifs_dir_emit(ctx, name.name, name.len,
-			      &fattr, cfid, file);
+	return !cifs_dir_emit(ctx, name.name, name.len, &fattr);
 }
 
 
@@ -1088,30 +965,8 @@ int cifs_readdir(struct file *file, struct dir_context *ctx)
 	if (rc)
 		goto cache_not_found;
 
-	mutex_lock(&cfid->dirents.de_mutex);
-	/*
-	 * If this was reading from the start of the directory
-	 * we need to initialize scanning and storing the
-	 * directory content.
-	 */
-	if (ctx->pos == 0 && cfid->dirents.file == NULL) {
-		cfid->dirents.file = file;
-		cfid->dirents.pos = 2;
-	}
-	/*
-	 * If we already have the entire directory cached then
-	 * we can just serve the cache.
-	 */
-	if (cfid->dirents.is_valid) {
-		if (!dir_emit_dots(file, ctx)) {
-			mutex_unlock(&cfid->dirents.de_mutex);
-			goto rddir2_exit;
-		}
-		emit_cached_dirents(&cfid->dirents, ctx);
-		mutex_unlock(&cfid->dirents.de_mutex);
+	if (emit_cached_dir_if_valid(cfid, file, ctx))
 		goto rddir2_exit;
-	}
-	mutex_unlock(&cfid->dirents.de_mutex);
 
 	/* Drop the cache while calling initiate_cifs_search and
 	 * find_cifs_entry in case there will be reconnects during
@@ -1161,11 +1016,7 @@ int cifs_readdir(struct file *file, struct dir_context *ctx)
 	} else if (current_entry != NULL) {
 		cifs_dbg(FYI, "entry %lld found\n", ctx->pos);
 	} else {
-		if (cfid) {
-			mutex_lock(&cfid->dirents.de_mutex);
-			finished_cached_dirents_count(&cfid->dirents, ctx, file);
-			mutex_unlock(&cfid->dirents.de_mutex);
-		}
+		complete_cached_dir(cfid, ctx, file);
 		cifs_dbg(FYI, "Could not find entry\n");
 		goto rddir2_exit;
 	}
@@ -1202,11 +1053,7 @@ int cifs_readdir(struct file *file, struct dir_context *ctx)
 		}
 
 		ctx->pos++;
-		if (cfid) {
-			mutex_lock(&cfid->dirents.de_mutex);
-			update_cached_dirents_count(&cfid->dirents, file);
-			mutex_unlock(&cfid->dirents.de_mutex);
-		}
+		update_pos_cached_dir(cfid, file);
 
 		if (ctx->pos ==
 			cifsFile->srch_inf.index_of_last_entry) {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 08/19] cifs: make cfid locks more granular
  2026-05-01 11:20 [PATCH v4 01/19] cifs: change_conf needs to be called for session setup nspmangalore
                   ` (5 preceding siblings ...)
  2026-05-01 11:20 ` [PATCH v4 07/19] cifs: reorganize cached dir helpers nspmangalore
@ 2026-05-01 11:20 ` nspmangalore
  2026-05-01 11:20 ` [PATCH v4 09/19] cifs: query dir should reuse cfid even if not fully cached nspmangalore
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: nspmangalore @ 2026-05-01 11:20 UTC (permalink / raw)
  To: linux-cifs, smfrench, pc, bharathsm, dhowells, henrique.carvalho,
	ematsumiya
  Cc: Shyam Prasad N

From: Shyam Prasad N <sprasad@microsoft.com>

Today all synchronization of cfid-related data structures is done
using cfid_list_lock. This can unnecessarily serialize caching of
different dirs.

This change introduces two new locks to provide finer-grained locking.
Every cfid will now have a cfid_lock, which protects everything inside
a cfid that is not related to list operations.

Every cfid will now also have a cfid_open_mutex, which serializes
parallel open calls to the same dir.

Additionally, this change makes accesses to cfid->dirents stricter by
taking de_mutex around them.
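
As a reference for reviewers, a minimal sketch of the intended nesting
(taken from the lock-ordering comment added to cached_dir.h below:
cfid_list_lock is the outer lock, the per-cfid cfid_lock the inner one):

	spin_lock(&cfids->cfid_list_lock);
	list_for_each_entry(cfid, &cfids->entries, entry) {
		spin_lock(&cfid->cfid_lock);	/* inner: per-cfid state */
		if (!is_valid_cached_dir(cfid)) {
			spin_unlock(&cfid->cfid_lock);
			continue;
		}
		kref_get(&cfid->refcount);
		spin_unlock(&cfid->cfid_lock);
		/* ... use the reference after dropping the locks ... */
	}
	spin_unlock(&cfids->cfid_list_lock);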

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
---
 fs/smb/client/cached_dir.c | 155 +++++++++++++++++++++++++------------
 fs/smb/client/cached_dir.h |  13 +++-
 fs/smb/client/cifs_debug.c |   7 +-
 fs/smb/client/cifsglob.h   |   2 +
 fs/smb/client/dir.c        |  34 ++++++--
 5 files changed, 149 insertions(+), 62 deletions(-)

diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
index a9bc0c81868c8..ad2439856a1fe 100644
--- a/fs/smb/client/cached_dir.c
+++ b/fs/smb/client/cached_dir.c
@@ -16,13 +16,29 @@ static struct cached_fid *init_cached_dir(const char *path);
 static void free_cached_dir(struct cached_fid *cfid);
 static void smb2_close_cached_fid(struct kref *ref);
 static void cfids_laundromat_worker(struct work_struct *work);
-static void close_cached_dir_locked(struct cached_fid *cfid);
 
 struct cached_dir_dentry {
 	struct list_head entry;
 	struct dentry *dentry;
 };
 
+bool cached_dir_copy_lease_key(struct cached_fid *cfid,
+			      __u8 lease_key[SMB2_LEASE_KEY_SIZE])
+{
+	bool valid;
+
+	if (!cfid)
+		return false;
+
+	spin_lock(&cfid->cfid_lock);
+	valid = is_valid_cached_dir(cfid);
+	if (valid)
+		memcpy(lease_key, cfid->fid.lease_key, SMB2_LEASE_KEY_SIZE);
+	spin_unlock(&cfid->cfid_lock);
+
+	return valid;
+}
+
 static bool emit_cached_dirents(struct cached_dirents *cde,
 				struct dir_context *ctx)
 {
@@ -244,9 +260,13 @@ static struct cached_fid *find_or_create_cached_dir(struct cached_fids *cfids,
 			 * fully cached or it may be in the process of
 			 * being deleted due to a lease break.
 			 */
-			if (!is_valid_cached_dir(cfid))
+			spin_lock(&cfid->cfid_lock);
+			if (!is_valid_cached_dir(cfid)) {
+				spin_unlock(&cfid->cfid_lock);
 				return NULL;
+			}
 			kref_get(&cfid->refcount);
+			spin_unlock(&cfid->cfid_lock);
 			return cfid;
 		}
 	}
@@ -273,7 +293,9 @@ static struct cached_fid *find_or_create_cached_dir(struct cached_fids *cfids,
 	 * Concurrent processes won't be to use it yet due to @cfid->time being
 	 * zero.
 	 */
+	spin_lock(&cfid->cfid_lock);
 	cfid->has_lease = true;
+	spin_unlock(&cfid->cfid_lock);
 
 	return cfid;
 }
@@ -396,19 +418,23 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
 		kfree(utf16_path);
 		return -ENOENT;
 	}
+	spin_unlock(&cfids->cfid_list_lock);
+
 	/*
 	 * Return cached fid if it is valid (has a lease and has a time).
 	 * Otherwise, it is either a new entry or laundromat worker removed it
 	 * from @cfids->entries.  Caller will put last reference if the latter.
 	 */
+
+	spin_lock(&cfid->cfid_lock);
 	if (is_valid_cached_dir(cfid)) {
 		cfid->last_access_time = jiffies;
-		spin_unlock(&cfids->cfid_list_lock);
+		spin_unlock(&cfid->cfid_lock);
 		*ret_cfid = cfid;
 		kfree(utf16_path);
 		return 0;
 	}
-	spin_unlock(&cfids->cfid_list_lock);
+	spin_unlock(&cfid->cfid_lock);
 
 	pfid = &cfid->fid;
 
@@ -438,6 +464,7 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
 
 			spin_lock(&cfids->cfid_list_lock);
 			list_for_each_entry(parent_cfid, &cfids->entries, entry) {
+				spin_lock(&parent_cfid->cfid_lock);
 				if (parent_cfid->dentry == dentry->d_parent) {
 					cifs_dbg(FYI, "found a parent cached file handle\n");
 					if (is_valid_cached_dir(parent_cfid)) {
@@ -447,8 +474,10 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
 						       parent_cfid->fid.lease_key,
 						       SMB2_LEASE_KEY_SIZE);
 					}
+					spin_unlock(&parent_cfid->cfid_lock);
 					break;
 				}
+				spin_unlock(&parent_cfid->cfid_lock);
 			}
 			spin_unlock(&cfids->cfid_list_lock);
 		}
@@ -527,10 +556,13 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
 		smb2_set_replay(server, &rqst[1]);
 	}
 
+	mutex_lock(&cfid->cfid_open_mutex);
+
 	rc = compound_send_recv(xid, ses, server,
 				flags, 2, rqst,
 				resp_buftype, rsp_iov);
 	if (rc) {
+		mutex_unlock(&cfid->cfid_open_mutex);
 		if (rc == -EREMCHG) {
 			tcon->need_reconnect = true;
 			pr_warn_once("server share %s deleted\n",
@@ -538,10 +570,9 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
 		}
 		goto oshr_free;
 	}
+	spin_lock(&cfid->cfid_lock);
 	cfid->is_open = true;
 
-	spin_lock(&cfids->cfid_list_lock);
-
 	o_rsp = (struct smb2_create_rsp *)rsp_iov[0].iov_base;
 	oparms.fid->persistent_fid = o_rsp->PersistentFileId;
 	oparms.fid->volatile_fid = o_rsp->VolatileFileId;
@@ -551,8 +582,9 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
 
 
 	if (o_rsp->OplockLevel != SMB2_OPLOCK_LEVEL_LEASE) {
-		spin_unlock(&cfids->cfid_list_lock);
 		rc = -EINVAL;
+		spin_unlock(&cfid->cfid_lock);
+		mutex_unlock(&cfid->cfid_open_mutex);
 		goto oshr_free;
 	}
 
@@ -561,18 +593,21 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
 				 oparms.fid->lease_key,
 				 &oplock, NULL, NULL);
 	if (rc) {
-		spin_unlock(&cfids->cfid_list_lock);
+		spin_unlock(&cfid->cfid_lock);
+		mutex_unlock(&cfid->cfid_open_mutex);
 		goto oshr_free;
 	}
 
 	rc = -EINVAL;
 	if (!(oplock & SMB2_LEASE_READ_CACHING_HE)) {
-		spin_unlock(&cfids->cfid_list_lock);
+		spin_unlock(&cfid->cfid_lock);
+		mutex_unlock(&cfid->cfid_open_mutex);
 		goto oshr_free;
 	}
 	qi_rsp = (struct smb2_query_info_rsp *)rsp_iov[1].iov_base;
 	if (le32_to_cpu(qi_rsp->OutputBufferLength) < sizeof(struct smb2_file_all_info)) {
-		spin_unlock(&cfids->cfid_list_lock);
+		spin_unlock(&cfid->cfid_lock);
+		mutex_unlock(&cfid->cfid_open_mutex);
 		goto oshr_free;
 	}
 	if (!smb2_validate_and_copy_iov(
@@ -584,7 +619,8 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
 
 	cfid->time = jiffies;
 	cfid->last_access_time = jiffies;
-	spin_unlock(&cfids->cfid_list_lock);
+	spin_unlock(&cfid->cfid_lock);
+	mutex_unlock(&cfid->cfid_open_mutex);
 	/* At this point the directory handle is fully cached */
 	rc = 0;
 
@@ -595,23 +631,24 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
 	free_rsp_buf(resp_buftype[1], rsp_iov[1].iov_base);
 out:
 	if (rc) {
+		bool drop_lease_ref = false;
+
 		spin_lock(&cfids->cfid_list_lock);
 		if (cfid->on_list) {
 			list_del(&cfid->entry);
 			cfid->on_list = false;
 			cfids->num_entries--;
 		}
+		spin_lock(&cfid->cfid_lock);
 		if (cfid->has_lease) {
-			/*
-			 * We are guaranteed to have two references at this
-			 * point. One for the caller and one for a potential
-			 * lease. Release one here, and the second below.
-			 */
 			cfid->has_lease = false;
-			close_cached_dir_locked(cfid);
+			drop_lease_ref = true;
 		}
+		spin_unlock(&cfid->cfid_lock);
 		spin_unlock(&cfids->cfid_list_lock);
 
+		if (drop_lease_ref)
+			close_cached_dir(cfid);
 		close_cached_dir(cfid);
 	} else {
 		*ret_cfid = cfid;
@@ -642,12 +679,16 @@ int open_cached_dir_by_dentry(struct cifs_tcon *tcon,
 	spin_lock(&cfids->cfid_list_lock);
 	list_for_each_entry(cfid, &cfids->entries, entry) {
 		if (cfid->dentry == dentry) {
-			if (!is_valid_cached_dir(cfid))
+			spin_lock(&cfid->cfid_lock);
+			if (!is_valid_cached_dir(cfid)) {
+				spin_unlock(&cfid->cfid_lock);
 				break;
+			}
 			cifs_dbg(FYI, "found a cached file handle by dentry\n");
 			kref_get(&cfid->refcount);
 			*ret_cfid = cfid;
 			cfid->last_access_time = jiffies;
+			spin_unlock(&cfid->cfid_lock);
 			spin_unlock(&cfids->cfid_list_lock);
 			return 0;
 		}
@@ -662,6 +703,8 @@ __releases(&cfid->cfids->cfid_list_lock)
 {
 	struct cached_fid *cfid = container_of(ref, struct cached_fid,
 					       refcount);
+	u64 persistent_fid = 0, volatile_fid = 0;
+	bool is_open;
 	int rc;
 
 	lockdep_assert_held(&cfid->cfids->cfid_list_lock);
@@ -676,9 +719,17 @@ __releases(&cfid->cfids->cfid_list_lock)
 	dput(cfid->dentry);
 	cfid->dentry = NULL;
 
-	if (cfid->is_open) {
-		rc = SMB2_close(0, cfid->tcon, cfid->fid.persistent_fid,
-			   cfid->fid.volatile_fid);
+	spin_lock(&cfid->cfid_lock);
+	is_open = cfid->is_open;
+	if (is_open) {
+		persistent_fid = cfid->fid.persistent_fid;
+		volatile_fid = cfid->fid.volatile_fid;
+		cfid->is_open = false;
+	}
+	spin_unlock(&cfid->cfid_lock);
+
+	if (is_open) {
+		rc = SMB2_close(0, cfid->tcon, persistent_fid, volatile_fid);
 		if (rc) /* should we retry on -EBUSY or -EAGAIN? */
 			cifs_dbg(VFS, "close cached dir rc %d\n", rc);
 	}
@@ -691,6 +742,7 @@ void drop_cached_dir_by_name(const unsigned int xid, struct cifs_tcon *tcon,
 {
 	struct cached_fid *cfid = NULL;
 	int rc;
+	bool drop_lease_ref = false;
 
 	rc = open_cached_dir(xid, tcon, name, cifs_sb, true, &cfid);
 	if (rc) {
@@ -698,11 +750,16 @@ void drop_cached_dir_by_name(const unsigned int xid, struct cifs_tcon *tcon,
 		return;
 	}
 	spin_lock(&cfid->cfids->cfid_list_lock);
+	spin_lock(&cfid->cfid_lock);
 	if (cfid->has_lease) {
 		cfid->has_lease = false;
-		close_cached_dir_locked(cfid);
+		drop_lease_ref = true;
 	}
+	spin_unlock(&cfid->cfid_lock);
 	spin_unlock(&cfid->cfids->cfid_list_lock);
+
+	if (drop_lease_ref)
+		close_cached_dir(cfid);
 	close_cached_dir(cfid);
 }
 
@@ -711,8 +768,7 @@ void drop_cached_dir_by_name(const unsigned int xid, struct cifs_tcon *tcon,
  *
  * The release function will be called with cfid_list_lock held to remove the
  * cached dirs from the list before any other thread can take another @cfid
- * ref. Must not be called with cfid_list_lock held; use
- * close_cached_dir_locked() called instead.
+ * ref. Must not be called with cfid_list_lock held.
  *
  * @cfid: cached dir
  */
@@ -722,30 +778,6 @@ void close_cached_dir(struct cached_fid *cfid)
 	kref_put_lock(&cfid->refcount, smb2_close_cached_fid, &cfid->cfids->cfid_list_lock);
 }
 
-/**
- * close_cached_dir_locked - put a reference of a cached dir with
- * cfid_list_lock held
- *
- * Calling close_cached_dir() with cfid_list_lock held has the potential effect
- * of causing a deadlock if the invariant of refcount >= 2 is false.
- *
- * This function is used in paths that hold cfid_list_lock and expect at least
- * two references. If that invariant is violated, WARNs and returns without
- * dropping a reference; the final put must still go through
- * close_cached_dir().
- *
- * @cfid: cached dir
- */
-static void close_cached_dir_locked(struct cached_fid *cfid)
-{
-	lockdep_assert_held(&cfid->cfids->cfid_list_lock);
-
-	if (WARN_ON(kref_read(&cfid->refcount) < 2))
-		return;
-
-	kref_put(&cfid->refcount, smb2_close_cached_fid);
-}
-
 /*
  * Called from cifs_kill_sb when we unmount a share
  */
@@ -784,8 +816,10 @@ void close_all_cached_dirs(struct cifs_sb_info *cifs_sb)
 				goto done;
 			}
 
+			spin_lock(&cfid->cfid_lock);
 			tmp_list->dentry = cfid->dentry;
 			cfid->dentry = NULL;
+			spin_unlock(&cfid->cfid_lock);
 
 			list_add_tail(&tmp_list->entry, &entry);
 		}
@@ -825,16 +859,20 @@ void invalidate_all_cached_dirs(struct cifs_tcon *tcon)
 	list_for_each_entry_safe(cfid, q, &cfids->entries, entry) {
 		list_move(&cfid->entry, &cfids->dying);
 		cfids->num_entries--;
+		spin_lock(&cfid->cfid_lock);
 		cfid->is_open = false;
-		cfid->on_list = false;
 		if (cfid->has_lease) {
 			/*
 			 * The lease was never cancelled from the server,
 			 * so steal that reference.
 			 */
 			cfid->has_lease = false;
-		} else
+			spin_unlock(&cfid->cfid_lock);
+		} else {
+			spin_unlock(&cfid->cfid_lock);
 			kref_get(&cfid->refcount);
+		}
+		cfid->on_list = false;
 	}
 	spin_unlock(&cfids->cfid_list_lock);
 
@@ -883,12 +921,14 @@ bool cached_dir_lease_break(struct cifs_tcon *tcon, __u8 lease_key[16])
 
 	spin_lock(&cfids->cfid_list_lock);
 	list_for_each_entry(cfid, &cfids->entries, entry) {
+		spin_lock(&cfid->cfid_lock);
 		if (cfid->has_lease &&
 		    !memcmp(lease_key,
 			    cfid->fid.lease_key,
 			    SMB2_LEASE_KEY_SIZE)) {
 			cfid->has_lease = false;
 			cfid->time = 0;
+			spin_unlock(&cfid->cfid_lock);
 			/*
 			 * We found a lease remove it from the list
 			 * so no threads can access it.
@@ -904,6 +944,7 @@ bool cached_dir_lease_break(struct cifs_tcon *tcon, __u8 lease_key[16])
 			spin_unlock(&cfids->cfid_list_lock);
 			return true;
 		}
+		spin_unlock(&cfid->cfid_lock);
 	}
 	spin_unlock(&cfids->cfid_list_lock);
 	return false;
@@ -927,6 +968,8 @@ static struct cached_fid *init_cached_dir(const char *path)
 	INIT_LIST_HEAD(&cfid->entry);
 	INIT_LIST_HEAD(&cfid->dirents.entries);
 	mutex_init(&cfid->dirents.de_mutex);
+	mutex_init(&cfid->cfid_open_mutex);
+	spin_lock_init(&cfid->cfid_lock);
 	kref_init(&cfid->refcount);
 	return cfid;
 }
@@ -983,6 +1026,7 @@ static void cfids_laundromat_worker(struct work_struct *work)
 	list_cut_before(&entry, &cfids->dying, &cfids->dying);
 
 	list_for_each_entry_safe(cfid, q, &cfids->entries, entry) {
+		spin_lock(&cfid->cfid_lock);
 		if (cfid->last_access_time &&
 		    time_after(jiffies, cfid->last_access_time + HZ * dir_cache_timeout)) {
 			cfid->on_list = false;
@@ -994,8 +1038,13 @@ static void cfids_laundromat_worker(struct work_struct *work)
 				 * server. Steal that reference.
 				 */
 				cfid->has_lease = false;
-			} else
+				spin_unlock(&cfid->cfid_lock);
+			} else {
+				spin_unlock(&cfid->cfid_lock);
 				kref_get(&cfid->refcount);
+			}
+		} else {
+			spin_unlock(&cfid->cfid_lock);
 		}
 	}
 	spin_unlock(&cfids->cfid_list_lock);
@@ -1062,12 +1111,16 @@ void free_cached_dirs(struct cached_fids *cfids)
 	spin_lock(&cfids->cfid_list_lock);
 	list_for_each_entry_safe(cfid, q, &cfids->entries, entry) {
 		cfid->on_list = false;
+		spin_lock(&cfid->cfid_lock);
 		cfid->is_open = false;
+		spin_unlock(&cfid->cfid_lock);
 		list_move(&cfid->entry, &entry);
 	}
 	list_for_each_entry_safe(cfid, q, &cfids->dying, entry) {
 		cfid->on_list = false;
+		spin_lock(&cfid->cfid_lock);
 		cfid->is_open = false;
+		spin_unlock(&cfid->cfid_lock);
 		list_move(&cfid->entry, &entry);
 	}
 	spin_unlock(&cfids->cfid_list_lock);
diff --git a/fs/smb/client/cached_dir.h b/fs/smb/client/cached_dir.h
index 09f1f488059c9..f82db6a7ca5b0 100644
--- a/fs/smb/client/cached_dir.h
+++ b/fs/smb/client/cached_dir.h
@@ -48,6 +48,8 @@ struct cached_fid {
 	struct work_struct put_work;
 	struct work_struct close_work;
 	struct cached_dirents dirents;
+	struct mutex cfid_open_mutex; /* Serializes OPEN response processing and lease key population */
+	spinlock_t cfid_lock; /* Protects: has_lease, time, is_open, file_all_info_is_valid, last_access_time, fid.lease_key reads */
 
 	/* Must be last as it ends in a flexible-array member. */
 	struct smb2_file_all_info file_all_info;
@@ -56,8 +58,12 @@ struct cached_fid {
 /* default MAX_CACHED_FIDS is 16 */
 struct cached_fids {
 	/* Must be held when:
-	 * - accessing the cfids->entries list
-	 * - accessing the cfids->dying list
+	 * - modifying cfids->entries list (add/remove entries)
+	 * - modifying cfids->dying list
+	 * - modifying cfid->on_list or cfids->num_entries
+	 *
+	 * Lock ordering: if you need both cfid_list_lock and cfid_lock,
+	 * acquire cfid_list_lock FIRST, then cfid_lock to avoid deadlock.
 	 */
 	spinlock_t cfid_list_lock;
 	int num_entries;
@@ -78,6 +84,9 @@ is_valid_cached_dir(struct cached_fid *cfid)
 	return cfid->time && cfid->has_lease;
 }
 
+bool cached_dir_copy_lease_key(struct cached_fid *cfid,
+			      __u8 lease_key[SMB2_LEASE_KEY_SIZE]);
+
 struct cached_fids *init_cached_dirs(void);
 void free_cached_dirs(struct cached_fids *cfids);
 int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon, const char *path,
diff --git a/fs/smb/client/cifs_debug.c b/fs/smb/client/cifs_debug.c
index 217444e3e6d01..cc7d26a3917c5 100644
--- a/fs/smb/client/cifs_debug.c
+++ b/fs/smb/client/cifs_debug.c
@@ -327,6 +327,7 @@ static int cifs_debug_dirs_proc_show(struct seq_file *m, void *v)
 						(unsigned long)atomic_long_read(&cfids->total_dirents_entries),
 						(unsigned long long)atomic64_read(&cfids->total_dirents_bytes));
 				list_for_each_entry(cfid, &cfids->entries, entry) {
+					spin_lock(&cfid->cfid_lock);
 					seq_printf(m, "0x%x 0x%llx 0x%llx ",
 						tcon->tid,
 						ses->Suid,
@@ -338,11 +339,13 @@ static int cifs_debug_dirs_proc_show(struct seq_file *m, void *v)
 					seq_printf(m, "%s", cfid->path);
 					if (cfid->file_all_info_is_valid)
 						seq_printf(m, "\tvalid file info");
+					spin_unlock(&cfid->cfid_lock);
 					if (cfid->dirents.is_valid)
 						seq_printf(m, ", valid dirents");
-					if (!list_empty(&cfid->dirents.entries))
+					if (READ_ONCE(cfid->dirents.entries_count))
 						seq_printf(m, ", dirents: %lu entries, %lu bytes",
-						cfid->dirents.entries_count, cfid->dirents.bytes_used);
+						READ_ONCE(cfid->dirents.entries_count),
+						READ_ONCE(cfid->dirents.bytes_used));
 					seq_printf(m, "\n");
 				}
 				spin_unlock(&cfids->cfid_list_lock);
diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h
index 38d5600efe2c8..a15971ffeee58 100644
--- a/fs/smb/client/cifsglob.h
+++ b/fs/smb/client/cifsglob.h
@@ -2068,6 +2068,8 @@ require use of the stronger protocol */
  *				->can_cache_brlcks
  * cifsInodeInfo->deferred_lock	cifsInodeInfo->deferred_closes	cifsInodeInfo_alloc
  * cached_fids->cfid_list_lock	cifs_tcon->cfids->entries	init_cached_dirs
+ * cached_fid->cfid_open_mutex	cached_fid OPEN/lease serialization	alloc_cached_dir
+ * cached_fid->cfid_lock	cached_fid state		alloc_cached_dir
  * cached_fid->dirents.de_mutex	cached_fid->dirents		alloc_cached_dir
  * cifsFileInfo->fh_mutex	cifsFileInfo			cifs_new_fileinfo
  * cifsFileInfo->file_info_lock	cifsFileInfo->count		cifs_new_fileinfo
diff --git a/fs/smb/client/dir.c b/fs/smb/client/dir.c
index 6d2378eeb7f68..4e5c580e4de0a 100644
--- a/fs/smb/client/dir.c
+++ b/fs/smb/client/dir.c
@@ -194,6 +194,7 @@ static int cifs_do_create(struct inode *inode, struct dentry *direntry, unsigned
 	struct cached_fid *parent_cfid = NULL;
 	int rdwr_for_fscache = 0;
 	__le32 lease_flags = 0;
+	bool found_parent_cfid;
 
 	*oplock = 0;
 	if (tcon->ses->server->oplocks)
@@ -319,24 +320,33 @@ static int cifs_do_create(struct inode *inode, struct dentry *direntry, unsigned
 
 retry_open:
 	if (tcon->cfids && direntry->d_parent && server->dialect >= SMB30_PROT_ID) {
+		found_parent_cfid = false;
 		parent_cfid = NULL;
 		spin_lock(&tcon->cfids->cfid_list_lock);
 		list_for_each_entry(parent_cfid, &tcon->cfids->entries, entry) {
+			spin_lock(&parent_cfid->cfid_lock);
 			if (parent_cfid->dentry == direntry->d_parent) {
+				kref_get(&parent_cfid->refcount);
+				spin_unlock(&parent_cfid->cfid_lock);
+				spin_unlock(&tcon->cfids->cfid_list_lock);
+				found_parent_cfid = true;
 				cifs_dbg(FYI, "found a parent cached file handle\n");
-				if (is_valid_cached_dir(parent_cfid)) {
+				if (cached_dir_copy_lease_key(parent_cfid,
+						      fid->parent_lease_key)) {
 					lease_flags
 						|= SMB2_LEASE_FLAG_PARENT_LEASE_KEY_SET_LE;
-					memcpy(fid->parent_lease_key,
-					       parent_cfid->fid.lease_key,
-					       SMB2_LEASE_KEY_SIZE);
+					mutex_lock(&parent_cfid->dirents.de_mutex);
 					parent_cfid->dirents.is_valid = false;
 					parent_cfid->dirents.is_failed = true;
+					mutex_unlock(&parent_cfid->dirents.de_mutex);
 				}
+				close_cached_dir(parent_cfid);
 				break;
 			}
+			spin_unlock(&parent_cfid->cfid_lock);
 		}
-		spin_unlock(&tcon->cfids->cfid_list_lock);
+		if (!found_parent_cfid)
+			spin_unlock(&tcon->cfids->cfid_list_lock);
 	}
 
 	oparms = (struct cifs_open_parms) {
@@ -737,7 +747,12 @@ cifs_lookup(struct inode *parent_dir_inode, struct dentry *direntry,
 			 * dentry is negative and parent is fully cached:
 			 * we can assume file does not exist
 			 */
-			if (cfid->dirents.is_valid) {
+			bool dirents_valid;
+
+			mutex_lock(&cfid->dirents.de_mutex);
+			dirents_valid = cfid->dirents.is_valid;
+			mutex_unlock(&cfid->dirents.de_mutex);
+			if (dirents_valid) {
 				close_cached_dir(cfid);
 				goto out;
 			}
@@ -848,7 +863,12 @@ cifs_d_revalidate(struct inode *dir, const struct qstr *name,
 			 * dentry is negative and parent is fully cached:
 			 * we can assume file does not exist
 			 */
-			if (cfid->dirents.is_valid) {
+			bool dirents_valid;
+
+			mutex_lock(&cfid->dirents.de_mutex);
+			dirents_valid = cfid->dirents.is_valid;
+			mutex_unlock(&cfid->dirents.de_mutex);
+			if (dirents_valid) {
 				close_cached_dir(cfid);
 				return 1;
 			}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 09/19] cifs: query dir should reuse cfid even if not fully cached
  2026-05-01 11:20 [PATCH v4 01/19] cifs: change_conf needs to be called for session setup nspmangalore
                   ` (6 preceding siblings ...)
  2026-05-01 11:20 ` [PATCH v4 08/19] cifs: make cfid locks more granular nspmangalore
@ 2026-05-01 11:20 ` nspmangalore
  2026-05-01 11:20 ` [PATCH v4 10/19] cifs: back cached_dirents with page cache nspmangalore
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: nspmangalore @ 2026-05-01 11:20 UTC (permalink / raw)
  To: linux-cifs, smfrench, pc, bharathsm, dhowells, henrique.carvalho,
	ematsumiya
  Cc: Shyam Prasad N

From: Shyam Prasad N <sprasad@microsoft.com>

When population of cached_dirents is underway but not yet complete,
cifs_readdir does not rely on the local cache; instead it issues a
parallel stream of QueryDir calls to the server. However, these calls
are made without a lease key, and that ends up breaking our dir lease.

This change reuses the existing lease key for the parallel QueryDir
calls made in this scenario.
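
A short sketch of the idea, with names taken from the smb2ops.c hunk
below: the search info now pins the cfid, and smb2_query_dir_first
reuses its lease key when building the compound open (the mutex is
dropped again right after compound_send_recv):

	/* in smb2_query_dir_first(), roughly: */
	if (cfid) {
		mutex_lock(&cfid->cfid_open_mutex);
		use_cfid_lease = cached_dir_copy_lease_key(cfid,
							   fid->lease_key);
		oplock = use_cfid_lease ? SMB2_OPLOCK_LEVEL_II
					: SMB2_OPLOCK_LEVEL_NONE;
	}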

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
---
 fs/smb/client/cached_dir.c | 38 ++++++++++++++++++++++++++++++++++++
 fs/smb/client/cached_dir.h |  6 ++++++
 fs/smb/client/cifsglob.h   |  2 ++
 fs/smb/client/file.c       |  1 +
 fs/smb/client/readdir.c    | 40 ++++++++++++++++++++++++++------------
 fs/smb/client/smb2ops.c    | 19 ++++++++++++++++++
 6 files changed, 94 insertions(+), 12 deletions(-)

diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
index ad2439856a1fe..614a241393b59 100644
--- a/fs/smb/client/cached_dir.c
+++ b/fs/smb/client/cached_dir.c
@@ -22,6 +22,20 @@ struct cached_dir_dentry {
 	struct dentry *dentry;
 };
 
+bool cached_dir_is_valid(struct cached_fid *cfid)
+{
+	bool valid;
+
+	if (!cfid)
+		return false;
+
+	spin_lock(&cfid->cfid_lock);
+	valid = is_valid_cached_dir(cfid);
+	spin_unlock(&cfid->cfid_lock);
+
+	return valid;
+}
+
 bool cached_dir_copy_lease_key(struct cached_fid *cfid,
 			      __u8 lease_key[SMB2_LEASE_KEY_SIZE])
 {
@@ -1132,3 +1146,27 @@ void free_cached_dirs(struct cached_fids *cfids)
 
 	kfree(cfids);
 }
+
+void cifs_set_srch_inf_cfid(struct cifs_search_info *srch_inf,
+			   struct cached_fid *cfid)
+{
+	if (srch_inf->cfid == cfid)
+		return;
+
+	if (cfid)
+		kref_get(&cfid->refcount);
+
+	if (srch_inf->cfid)
+		close_cached_dir(srch_inf->cfid);
+
+	srch_inf->cfid = cfid;
+}
+
+void cifs_put_srch_inf_cfid(struct cifs_search_info *srch_inf)
+{
+	if (!srch_inf->cfid)
+		return;
+
+	close_cached_dir(srch_inf->cfid);
+	srch_inf->cfid = NULL;
+}
diff --git a/fs/smb/client/cached_dir.h b/fs/smb/client/cached_dir.h
index f82db6a7ca5b0..0767350b40fba 100644
--- a/fs/smb/client/cached_dir.h
+++ b/fs/smb/client/cached_dir.h
@@ -8,6 +8,8 @@
 #ifndef _CACHED_DIR_H
 #define _CACHED_DIR_H
 
+struct cifs_search_info;
+
 struct cached_dirent {
 	struct list_head entry;
 	char *name;
@@ -84,6 +86,7 @@ is_valid_cached_dir(struct cached_fid *cfid)
 	return cfid->time && cfid->has_lease;
 }
 
+bool cached_dir_is_valid(struct cached_fid *cfid);
 bool cached_dir_copy_lease_key(struct cached_fid *cfid,
 			      __u8 lease_key[SMB2_LEASE_KEY_SIZE]);
 
@@ -95,6 +98,9 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon, const char *path,
 int open_cached_dir_by_dentry(struct cifs_tcon *tcon, struct dentry *dentry,
 			      struct cached_fid **ret_cfid);
 void close_cached_dir(struct cached_fid *cfid);
+void cifs_set_srch_inf_cfid(struct cifs_search_info *srch_inf,
+			   struct cached_fid *cfid);
+void cifs_put_srch_inf_cfid(struct cifs_search_info *srch_inf);
 bool emit_cached_dir_if_valid(struct cached_fid *cfid,
 			      struct file *file,
 			      struct dir_context *ctx);
diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h
index a15971ffeee58..2a3fad071564a 100644
--- a/fs/smb/client/cifsglob.h
+++ b/fs/smb/client/cifsglob.h
@@ -309,6 +309,7 @@ struct cifs_search_info;
 struct cifsInodeInfo;
 struct cifs_open_parms;
 struct cifs_credits;
+struct cached_fid;
 
 struct smb_version_operations {
 	int (*send_cancel)(struct cifs_ses *ses, struct TCP_Server_Info *server,
@@ -1395,6 +1396,7 @@ struct cifs_search_info {
 	bool unicode:1;
 	bool smallBuf:1; /* so we know which buf_release function to call */
 	bool is_dynamic_buf:1; /* dynamically allocated buffer - can be variable size */
+	struct cached_fid *cfid; /* Reference to cached file id for directory enumeration */
 };
 
 /* Structure for QueryDirectory with multi-credit support */
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 6a1419d59ed5a..9e3c07006f4f2 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -1552,6 +1552,7 @@ int cifs_closedir(struct inode *inode, struct file *file)
 			cifs_buf_release(buf);
 	}
 
+	cifs_put_srch_inf_cfid(&cfile->srch_inf);
 	cifs_put_tlink(cfile->tlink);
 	kfree(file->private_data);
 	file->private_data = NULL;
diff --git a/fs/smb/client/readdir.c b/fs/smb/client/readdir.c
index 907e235ad1b8f..ef81fdb503c0a 100644
--- a/fs/smb/client/readdir.c
+++ b/fs/smb/client/readdir.c
@@ -344,7 +344,7 @@ cifs_std_info_to_fattr(struct cifs_fattr *fattr, FIND_FILE_STANDARD_INFO *info,
 
 static int
 _initiate_cifs_search(const unsigned int xid, struct file *file,
-		     const char *full_path)
+		     const char *full_path, struct cached_fid *cfid)
 {
 	struct cifs_sb_info *cifs_sb = CIFS_SB(file);
 	struct tcon_link *tlink = NULL;
@@ -368,9 +368,11 @@ _initiate_cifs_search(const unsigned int xid, struct file *file,
 		spin_lock_init(&cifsFile->file_info_lock);
 		file->private_data = cifsFile;
 		cifsFile->tlink = cifs_get_tlink(tlink);
+		cifs_set_srch_inf_cfid(&cifsFile->srch_inf, cfid);
 		tcon = tlink_tcon(tlink);
 	} else {
 		cifsFile = file->private_data;
+		cifs_set_srch_inf_cfid(&cifsFile->srch_inf, cfid);
 		tcon = tlink_tcon(cifsFile->tlink);
 	}
 
@@ -425,12 +427,12 @@ _initiate_cifs_search(const unsigned int xid, struct file *file,
 
 static int
 initiate_cifs_search(const unsigned int xid, struct file *file,
-		     const char *full_path)
+		     const char *full_path, struct cached_fid *cfid)
 {
 	int rc, retry_count = 0;
 
 	do {
-		rc = _initiate_cifs_search(xid, file, full_path);
+		rc = _initiate_cifs_search(xid, file, full_path, cfid);
 		/*
 		 * If we don't have enough credits to start reading the
 		 * directory just try again after short wait.
@@ -742,7 +744,11 @@ find_cifs_entry(const unsigned int xid, struct cifs_tcon *tcon, loff_t pos,
 			cfile->srch_inf.srch_entries_start = NULL;
 			cfile->srch_inf.last_entry = NULL;
 		}
-		rc = initiate_cifs_search(xid, file, full_path);
+		/* Pass cfid only if still valid; srch_inf owns the reference. */
+		struct cached_fid *rewind_cfid =
+			cached_dir_is_valid(cfile->srch_inf.cfid) ?
+			cfile->srch_inf.cfid : NULL;
+		rc = initiate_cifs_search(xid, file, full_path, rewind_cfid);
 		if (rc) {
 			cifs_dbg(FYI, "error %d reinitiating a search on rewind\n",
 				 rc);
@@ -968,20 +974,31 @@ int cifs_readdir(struct file *file, struct dir_context *ctx)
 	if (emit_cached_dir_if_valid(cfid, file, ctx))
 		goto rddir2_exit;
 
-	/* Drop the cache while calling initiate_cifs_search and
-	 * find_cifs_entry in case there will be reconnects during
-	 * query_directory.
+	/*
+	 * If cfid is valid but cache is invalid and not failed,
+	 * keep cfid and pass it to initiate_cifs_search to populate.
+	 * Otherwise (no cfid or cache is failed), close cfid and
+	 * proceed without cache for this session.
 	 */
-	close_cached_dir(cfid);
-	cfid = NULL;
+	if (cfid) {
+		bool cache_pending;
+
+		mutex_lock(&cfid->dirents.de_mutex);
+		cache_pending = !cfid->dirents.is_valid && !cfid->dirents.is_failed;
+		mutex_unlock(&cfid->dirents.de_mutex);
+		if (!cache_pending) {
+			close_cached_dir(cfid);
+			cfid = NULL;
+		}
+	}
 
- cache_not_found:
+cache_not_found:
 	/*
 	 * Ensure FindFirst doesn't fail before doing filldir() for '.' and
 	 * '..'. Otherwise we won't be able to notify VFS in case of failure.
 	 */
 	if (file->private_data == NULL) {
-		rc = initiate_cifs_search(xid, file, full_path);
+		rc = initiate_cifs_search(xid, file, full_path, cfid);
 		cifs_dbg(FYI, "initiate cifs search rc %d\n", rc);
 		if (rc)
 			goto rddir2_exit;
@@ -1009,7 +1026,6 @@ int cifs_readdir(struct file *file, struct dir_context *ctx)
 	tcon = tlink_tcon(cifsFile->tlink);
 	rc = find_cifs_entry(xid, tcon, ctx->pos, file, full_path,
 			     &current_entry, &num_to_fill);
-	open_cached_dir(xid, tcon, full_path, cifs_sb, false, &cfid);
 	if (rc) {
 		cifs_dbg(FYI, "fce error %d\n", rc);
 		goto rddir2_exit;
diff --git a/fs/smb/client/smb2ops.c b/fs/smb/client/smb2ops.c
index 2df4d080e95f0..05b636fbb20a6 100644
--- a/fs/smb/client/smb2ops.c
+++ b/fs/smb/client/smb2ops.c
@@ -2463,12 +2463,16 @@ smb2_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,
 	struct smb2_query_directory_rsp *qd2_rsp = NULL;
 	struct smb2_create_rsp *op_rsp = NULL;
 	struct TCP_Server_Info *server;
+	struct cached_fid *cfid = srch_inf ? srch_inf->cfid : NULL;
 	int retries = 0, cur_sleep = 0;
 	unsigned int compound_resp_bufsize;
+	bool use_cfid_lease = false;
+	bool cfid_open_locked = false;
 
 replay_again:
 	/* reinitialize for possible replay */
 	flags = 0;
+	use_cfid_lease = false;
 	oplock = SMB2_OPLOCK_LEVEL_NONE;
 	server = cifs_pick_channel(tcon->ses);
 
@@ -2476,6 +2480,15 @@ smb2_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,
 	if (!utf16_path)
 		return -ENOMEM;
 
+	if (cfid) {
+		mutex_lock(&cfid->cfid_open_mutex);
+		cfid_open_locked = true;
+		use_cfid_lease = cached_dir_copy_lease_key(cfid,
+						      fid->lease_key);
+		oplock = use_cfid_lease ?
+			SMB2_OPLOCK_LEVEL_II : SMB2_OPLOCK_LEVEL_NONE;
+	}
+
 	if (smb3_encryption_required(tcon))
 		flags |= CIFS_TRANSFORM_REQ;
 
@@ -2556,6 +2569,10 @@ smb2_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,
 	rc = compound_send_recv(xid, tcon->ses, server,
 				flags, 3, rqst,
 				resp_buftype, rsp_iov);
+	if (cfid_open_locked) {
+		mutex_unlock(&cfid->cfid_open_mutex);
+		cfid_open_locked = false;
+	}
 
 	/* If the open failed there is nothing to do */
 	op_rsp = (struct smb2_create_rsp *)rsp_iov[0].iov_base;
@@ -2696,6 +2713,8 @@ smb2_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,
 			tcon->ses->Suid, 0, srch_inf->entries_in_buffer);
 
  qdf_free:
+	if (cfid_open_locked)
+		mutex_unlock(&cfid->cfid_open_mutex);
 	kfree(utf16_path);
 	SMB2_open_free(&rqst[0]);
 	SMB2_query_directory_free(&rqst[1]);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 10/19] cifs: back cached_dirents with page cache
  2026-05-01 11:20 [PATCH v4 01/19] cifs: change_conf needs to be called for session setup nspmangalore
                   ` (7 preceding siblings ...)
  2026-05-01 11:20 ` [PATCH v4 09/19] cifs: query dir should reuse cfid even if not fully cached nspmangalore
@ 2026-05-01 11:20 ` nspmangalore
  2026-05-01 11:20 ` [PATCH v4 11/19] cifs: in place changes to cached_dirents when dir lease is held nspmangalore
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: nspmangalore @ 2026-05-01 11:20 UTC (permalink / raw)
  To: linux-cifs, smfrench, pc, bharathsm, dhowells, henrique.carvalho,
	ematsumiya
  Cc: Shyam Prasad N

From: Shyam Prasad N <sprasad@microsoft.com>

Today cached_dirents is a linked list with one entry per dirent. This
is inefficient in terms of both memory allocation and memory
management.

This change introduces a hybrid structure. cached_dirents starts out
as a linked list for small directories. Once the number of dirents in
the directory exceeds a threshold (64), cached_dirents switches over
to a folioq structure for storing the entries.

The idea is to significantly reduce the number of memory allocations
for large directories. Additionally, this change stores short names
(up to 64 bytes) in the folio itself, further reducing allocation
calls. If a name is longer than 64 bytes, or if the folio has no
space left for more names, it falls back to kmalloc.
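
To illustrate the packing scheme, here is a minimal userspace sketch.
It is not the kernel implementation: it assumes a fixed 4096-byte
"folio" and made-up struct and field names, and it only models the
large-directory mode, where dirent records grow upward from the folio
header while short names are packed downward from the folio tail.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#include <stdint.h>

#define FOLIO_SIZE	4096
#define INLINE_NAME_MAX	64	/* longer names would fall back to malloc */

struct entry {
	uint32_t name_off;	/* offset of the name within the folio */
	uint32_t name_len;
};

struct folio_map {
	uint32_t entries_count;
	uint32_t name_tail;	/* inline names are packed from here downward */
	struct entry entries[];
};

static struct folio_map *folio_new(void)
{
	struct folio_map *m = calloc(1, FOLIO_SIZE);

	if (m)
		m->name_tail = FOLIO_SIZE;
	return m;
}

/* Append one short name; false means no inline space (or name too long). */
static bool folio_add(struct folio_map *m, const char *name)
{
	size_t len = strlen(name);
	size_t array_end = sizeof(*m) +
			   (m->entries_count + 1) * sizeof(struct entry);

	if (len > INLINE_NAME_MAX || array_end + len > m->name_tail)
		return false;

	m->name_tail -= len;
	memcpy((char *)m + m->name_tail, name, len);
	m->entries[m->entries_count].name_off = m->name_tail;
	m->entries[m->entries_count].name_len = (uint32_t)len;
	m->entries_count++;
	return true;
}

int main(void)
{
	const char *names[] = { "alpha", "beta", "gamma" };
	struct folio_map *m = folio_new();

	if (!m)
		return 1;
	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (!folio_add(m, names[i]))
			printf("would fall back to malloc for %s\n", names[i]);
	for (uint32_t i = 0; i < m->entries_count; i++)
		printf("entry %u: %.*s\n", i, (int)m->entries[i].name_len,
		       (char *)m + m->entries[i].name_off);
	free(m);
	return 0;
}

The actual patch additionally keeps a per-name hash table for lookups
and only converts from the small-directory list to this layout once
the 64-entry threshold is crossed.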

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
---
 fs/smb/client/cached_dir.c | 1219 ++++++++++++++++++++++++++++++++----
 fs/smb/client/cached_dir.h |  141 ++++-
 fs/smb/client/cifsproto.h  |    1 +
 3 files changed, 1236 insertions(+), 125 deletions(-)

diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
index 614a241393b59..7cfbe50db66f5 100644
--- a/fs/smb/client/cached_dir.c
+++ b/fs/smb/client/cached_dir.c
@@ -6,22 +6,29 @@
  */
 
 #include <linux/namei.h>
+#include <linux/completion.h>
+#include <linux/kmemleak.h>
+#include <linux/hash.h>
 #include "cifsglob.h"
 #include "cifsproto.h"
 #include "cifs_debug.h"
 #include "smb2proto.h"
 #include "cached_dir.h"
+#include "trace.h"
 
 static struct cached_fid *init_cached_dir(const char *path);
 static void free_cached_dir(struct cached_fid *cfid);
 static void smb2_close_cached_fid(struct kref *ref);
 static void cfids_laundromat_worker(struct work_struct *work);
 
+#define CACHED_DIRENT_HASH_BITS	7
+
 struct cached_dir_dentry {
 	struct list_head entry;
 	struct dentry *dentry;
 };
 
+/* Generic helpers */
 bool cached_dir_is_valid(struct cached_fid *cfid)
 {
 	bool valid;
@@ -53,50 +60,689 @@ bool cached_dir_copy_lease_key(struct cached_fid *cfid,
 	return valid;
 }
 
+/* Cached mapping helpers */
+static inline const char *cached_dirent_name(const struct cifs_cached_dir_mapping *cached_mapping,
+					     const struct cached_dirent *de)
+{
+	if (de->external_name)
+		return de->name;
+
+	return ((const char *)cached_mapping) + de->inline_name_off;
+}
+
+static inline struct cifs_cached_dir_mapping *cached_dir_mapping(struct folio *folio)
+{
+	return folio_address(folio);
+}
+
+static inline size_t cached_dirent_array_bytes(unsigned int entries)
+{
+	return struct_size((struct cifs_cached_dir_mapping *)NULL, entries, entries);
+}
+
+static inline bool cached_dirent_has_space_for_record(const struct cifs_cached_dir_mapping *cached_mapping,
+						      size_t record_bytes)
+{
+	return cached_dirent_array_bytes(cached_mapping->entries_count + 1) + record_bytes <=
+		cached_mapping->name_tail_offset;
+}
+
+/* for short names, try to place them inside the folio */
+static bool cached_dirent_try_inline_name(struct folio *folio,
+					  struct cifs_cached_dir_mapping *cached_mapping,
+					  struct cached_dirent *de,
+					  const char *name,
+					  unsigned int namelen,
+					  const char **stored_name)
+{
+	char *base;
+	u32 tail;
+
+	if (namelen > CIFS_CACHED_INLINE_NAME_LEN)
+		return false;
+
+	/* try to fit cached_dirent+name in the same folio (inline) */
+	if (!cached_dirent_has_space_for_record(cached_mapping, namelen))
+		return false;
+
+	base = folio_address(folio);
+	if (!base)
+		return false;
+
+	tail = cached_mapping->name_tail_offset - namelen;
+	memcpy(base + tail, name, namelen);
+	de->external_name = false;
+	de->inline_name_off = tail;
+	de->name = NULL;
+	cached_mapping->name_tail_offset = tail;
+	*stored_name = base + tail;
+	return true;
+}
+
+static unsigned int cached_dir_folio_count(struct cached_dirents *cde)
+{
+	struct folio_queue *fq;
+	unsigned int count = 0;
+
+	for (fq = cde->folioq; fq; fq = fq->next) {
+		count += folioq_count(fq);
+	}
+
+	return count;
+}
+
+/* insert cursor helpers to aid fast appends to cached_dir */
+static void cached_dir_reset_insert_cursor_locked(struct cached_dirents *cde)
+{
+	cde->insert_cursor_fq = cde->folioq;
+	cde->insert_cursor_slot = 0;
+	cde->insert_cursor_folio_index = 0;
+}
+
+static void cached_dir_set_insert_cursor_locked(struct cached_dirents *cde,
+						struct folio_queue *fq,
+						unsigned int slot,
+						unsigned int folio_index)
+{
+	cde->insert_cursor_fq = fq;
+	cde->insert_cursor_slot = slot;
+	cde->insert_cursor_folio_index = folio_index;
+}
+
+static bool cached_dirents_use_folioq_locked(struct cached_dirents *cde)
+{
+	return cde->folioq != NULL;
+}
+
+static void cached_dir_init_new_folios(struct cached_dirents *cde,
+				       unsigned int old_folio_count)
+{
+	struct folio_queue *fq;
+	unsigned int folio_index = 0;
+
+	for (fq = cde->folioq; fq; fq = fq->next) {
+		for (int s = 0; s < folioq_count(fq); s++, folio_index++) {
+			struct folio *folio = folioq_folio(fq, s);
+			void *base;
+
+			if (folio_index < old_folio_count)
+				continue;
+
+			base = folio_address(folio);
+			if (base) {
+				memset(base, 0, folio_size(folio));
+				cached_dir_mapping(folio)->name_tail_offset = folio_size(folio);
+			}
+		}
+	}
+}
+
+/*
+ * Expand the folioq backing store for a cached directory by one PAGE_SIZE.
+ * Called by add_cached_dirent_folioq_locked() when no free slot is found in
+ * the existing folios, and by convert_cached_dirents_list_to_folioq_locked()
+ * when initializing folioq mode for the first time.
+ *
+ * After growing, newly added folios are zeroed and their name_tail_offset is
+ * set to folio_size so that inline name packing starts from the tail.
+ * The insert cursor must be reset by the caller after this returns.
+ */
+static int grow_cached_dirents_folioq_locked(struct cached_dirents *cde)
+{
+	unsigned int old_folio_count;
+	size_t old_size, target_size;
+	int rc;
+
+	old_folio_count = cached_dir_folio_count(cde);
+	old_size = cde->folioq_size;
+	target_size = old_size + PAGE_SIZE;
+
+	cifs_dbg(FYI,
+		 "cached_dir folioq alloc: old_size=%zu target_size=%zu\n",
+		 old_size, target_size);
+
+	rc = netfs_alloc_folioq_buffer(NULL, &cde->folioq,
+				      &cde->folioq_size,
+				      target_size, GFP_NOFS);
+	if (rc < 0)
+		return rc;
+
+	cached_dir_init_new_folios(cde, old_folio_count);
+
+	return 0;
+}
+
+/* lookup cached_dirent by traversing the list */
+static struct cached_dir_lookup_entry *lookup_cached_dirent_list_locked(struct cached_dirents *cde,
+							 const char *name,
+							 unsigned int namelen)
+{
+	struct cached_dir_lookup_entry *entry;
+	u32 name_hash;
+
+	name_hash = full_name_hash(NULL, name, namelen);
+
+	list_for_each_entry(entry, &cde->entry_list, list_node) {
+		if (entry->name_hash == name_hash &&
+		    entry->dirent &&
+		    entry->dirent->name_len == namelen &&
+		    memcmp(entry->dirent->name, name, namelen) == 0)
+			return entry;
+	}
+
+	return NULL;
+}
+
+/* lookup cached_dirent in folioq by using the hash table */
+static struct cached_dir_lookup_entry *lookup_cached_dirent_locked(struct cached_dirents *cde,
+								   const char *name,
+								   unsigned int namelen)
+{
+	struct cached_dir_lookup_entry *entry;
+	struct hlist_head *bucket;
+	u32 name_hash;
+
+	if (!cde->lookup_ht)
+		return NULL;
+
+	name_hash = full_name_hash(NULL, name, namelen);
+	bucket = &cde->lookup_ht[hash_32(name_hash, CACHED_DIRENT_HASH_BITS)];
+
+	hlist_for_each_entry(entry, bucket, hash_node) {
+		if (entry->name_hash == name_hash &&
+		    entry->dirent &&
+		    entry->dirent->name_len == namelen &&
+		    memcmp(entry->dirent->name, name, namelen) == 0)
+			return entry;
+	}
+
+	return NULL;
+}
+
+/* lookup wrapper to decide if the entry is in list or folioq */
+static struct cached_dir_lookup_entry *lookup_cached_dirent_entry_locked(struct cached_dirents *cde,
+								  const char *name,
+								  unsigned int namelen)
+{
+	if (cached_dirents_use_folioq_locked(cde))
+		return lookup_cached_dirent_locked(cde, name, namelen);
+
+	return lookup_cached_dirent_list_locked(cde, name, namelen);
+}
+
+/* lookup the last cached_dir_mapping in the folioq */
+static struct cifs_cached_dir_mapping *last_cached_dir_mapping_locked(struct cached_dirents *cde)
+{
+	struct folio_queue *fq;
+	unsigned int slot;
+	struct cifs_cached_dir_mapping *last = NULL;
+
+	lockdep_assert_held(&cde->de_mutex);
+
+	if (!cde->folioq)
+		return NULL;
+
+	/* Fast path: the insert cursor tracks the most recent append location. */
+	if (cde->insert_cursor_fq) {
+		slot = cde->insert_cursor_slot;
+		if (slot < folioq_count(cde->insert_cursor_fq)) {
+			last = cached_dir_mapping(folioq_folio(cde->insert_cursor_fq, slot));
+			if (last && last->entries_count)
+				return last;
+		}
+	}
+
+	for (fq = cde->folioq; fq; fq = fq->next) {
+		for (int s = 0; s < folioq_count(fq); s++) {
+			struct cifs_cached_dir_mapping *cached_mapping;
+
+			cached_mapping = cached_dir_mapping(folioq_folio(fq, s));
+			if (cached_mapping && cached_mapping->entries_count)
+				last = cached_mapping;
+		}
+	}
+
+	return last;
+}
+
+/* emit dirents from the cache, starting with the current position of ctx */
 static bool emit_cached_dirents(struct cached_dirents *cde,
 				struct dir_context *ctx)
 {
-	struct cached_dirent *dirent;
+	struct folio_queue *fq;
 	bool rc;
 
 	lockdep_assert_held(&cde->de_mutex);
 
-	list_for_each_entry(dirent, &cde->entries, entry) {
-		/*
-		 * Skip all early entries prior to the current lseek()
-		 * position.
-		 */
-		if (ctx->pos > dirent->pos)
-			continue;
-		/*
-		 * We recorded the current ->pos value for the dirent
-		 * when we stored it in the cache.
-		 * However, this sequence of ->pos values may have holes
-		 * in it, for example dot-dirs returned from the server
-		 * are suppressed.
-		 * Handle this by forcing ctx->pos to be the same as the
-		 * ->pos of the current dirent we emit from the cache.
-		 * This means that when we emit these entries from the cache
-		 * we now emit them with the same ->pos value as in the
-		 * initial scan.
-		 */
-		ctx->pos = dirent->pos;
-		rc = dir_emit(ctx, dirent->name, dirent->namelen,
-			      dirent->fattr.cf_uniqueid,
-			      dirent->fattr.cf_dtype);
-		if (!rc)
-			return rc;
-		ctx->pos++;
+	/* if folioq is empty, this is a small dir; dirents will be found in list */
+	if (!cde->folioq) {
+		struct cached_dir_lookup_entry *entry;
+
+		list_for_each_entry(entry, &cde->entry_list, list_node) {
+			struct cached_dirent *dirent = entry->dirent;
+
+			if (dirent->tombstone)
+				continue;
+			if (ctx->pos > dirent->ctx_pos)
+				continue;
+
+			ctx->pos = dirent->ctx_pos;
+			rc = dir_emit(ctx, dirent->name, dirent->name_len,
+				      dirent->fattr.cf_uniqueid,
+				      dirent->fattr.cf_dtype);
+			if (!rc)
+				return rc;
+			ctx->pos++;
+		}
+
+		return cde->is_valid;
 	}
+
+	/* large dir; emit from folioq */
+	for (fq = cde->folioq; fq; fq = fq->next) {
+		for (int s = 0; s < folioq_count(fq); s++) {
+			struct folio *folio = folioq_folio(fq, s);
+			struct cifs_cached_dir_mapping *cached_mapping;
+
+			cached_mapping = cached_dir_mapping(folio);
+			if (!cached_mapping)
+				return false;
+
+			for (u32 i = 0; i < cached_mapping->entries_count; i++) {
+				struct cached_dirent *dirent = &cached_mapping->entries[i];
+				const char *name;
+
+				if (dirent->tombstone)
+					continue;
+
+				name = cached_dirent_name(cached_mapping, dirent);
+
+				/*
+				 * Skip all early entries prior to the current lseek()
+				 * position.
+				 */
+				if (ctx->pos > dirent->ctx_pos)
+					continue;
+				/*
+				 * We recorded the current ->pos value for the dirent
+				 * when we stored it in the cache.
+				 * However, this sequence of ->pos values may have holes
+				 * in it, for example dot-dirs returned from the server
+				 * are suppressed.
+				 * Handle this by forcing ctx->pos to be the same as the
+				 * ->pos of the current dirent we emit from the cache.
+				 * This means that when we emit these entries from the cache
+				 * we now emit them with the same ->pos value as in the
+				 * initial scan.
+				 */
+				ctx->pos = dirent->ctx_pos;
+				rc = dir_emit(ctx, name, dirent->name_len,
+					      dirent->fattr.cf_uniqueid,
+					      dirent->fattr.cf_dtype);
+				if (!rc)
+					return rc;
+				ctx->pos++;
+			}
+
+			if (cached_mapping->folio_is_eof)
+				return true;
+		}
+	}
+	return true;
+}
+
+/* release the lookup hashtable */
+static void release_lookup_table_locked(struct cached_dirents *cde)
+{
+	int bucket;
+
+	if (!cde->lookup_ht)
+		return;
+
+	for (bucket = 0; bucket < (1 << CACHED_DIRENT_HASH_BITS); bucket++) {
+		struct cached_dir_lookup_entry *entry;
+		struct hlist_node *tmp;
+
+		hlist_for_each_entry_safe(entry, tmp, &cde->lookup_ht[bucket], hash_node) {
+			hlist_del(&entry->hash_node);
+			kfree(entry);
+		}
+	}
+
+	kfree(cde->lookup_ht);
+	cde->lookup_ht = NULL;
+	cde->lookup_bytes = 0;
+}
+
+/* release all cached_dirents in list */
+static void release_cached_dirents_list_locked(struct cached_dirents *cde)
+{
+	struct cached_dir_lookup_entry *entry;
+	struct cached_dir_lookup_entry *tmp;
+
+	list_for_each_entry_safe(entry, tmp, &cde->entry_list, list_node) {
+		list_del(&entry->list_node);
+		if (entry->dirent) {
+			if (entry->dirent->external_name)
+				kfree((void *)entry->dirent->name);
+			kfree(entry->dirent);
+		}
+		kfree(entry);
+	}
+
+	cde->entry_list_count = 0;
+}
+
+/* release all cached_dirents in folioq */
+static void release_cached_dirents_folioq_locked(struct cached_dirents *cde)
+{
+	struct folio_queue *fq;
+
+	lockdep_assert_held(&cde->de_mutex);
+
+	for (fq = cde->folioq; fq; fq = fq->next) {
+		for (int s = 0; s < folioq_count(fq); s++) {
+			struct folio *folio = folioq_folio(fq, s);
+			struct cifs_cached_dir_mapping *cached_mapping;
+
+			cached_mapping = cached_dir_mapping(folio);
+			if (!cached_mapping)
+				continue;
+
+			for (u32 i = 0; i < cached_mapping->entries_count; i++)
+				if (cached_mapping->entries[i].external_name)
+					kfree((void *)cached_mapping->entries[i].name);
+		}
+	}
+
+	if (cde->folioq) {
+		cifs_dbg(FYI, "cached_dir folioq free: old_size=%zu target_size=%d\n",
+			 cde->folioq_size, 0);
+		netfs_free_folioq_buffer(cde->folioq);
+		cde->folioq = NULL;
+	}
+
+	cde->folioq_size = 0;
+}
+
+/* release wrapper for cached_dirents */
+static void release_cached_dirents_locked(struct cached_dirents *cde)
+{
+	lockdep_assert_held(&cde->de_mutex);
+
+	if (cached_dirents_use_folioq_locked(cde))
+		release_cached_dirents_folioq_locked(cde);
+	else
+		release_cached_dirents_list_locked(cde);
+
+	release_lookup_table_locked(cde);
+
+	cde->entries_count = 0;
+	cde->external_name_bytes = 0;
+	cde->lookup_bytes = 0;
+	cde->bytes_used = 0;
+	cde->dir_inode = NULL;
+	cached_dir_reset_insert_cursor_locked(cde);
+}
+
+/* invalidate cached_dirents and release resources, but keep the cache structure for reuse */
+static void fail_cached_dir_locked(struct cached_dirents *cde)
+{
+	cde->is_failed = 1;
+	release_cached_dirents_locked(cde);
+	/*
+	 * Reset the file pointer so the next cifs_readdir from position 0
+	 * can claim this slot and repopulate the cache.
+	 */
+	cde->file = NULL;
+}
+
+/* insert cached_dirent into lookup hashtable */
+static int insert_cached_dir_lookup_locked(struct cached_dirents *cde,
+					   const char *name,
+					   unsigned int namelen,
+					   struct cached_dirent *dirent,
+					   bool pending_dcache)
+{
+	struct cached_dir_lookup_entry *entry;
+	struct hlist_head *bucket;
+
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		return -ENOMEM;
+
+	entry->name_hash = full_name_hash(NULL, name, namelen);
+	entry->dirent = dirent;
+	entry->pending_dcache = pending_dcache;
+	init_completion(&entry->dcache_complete);
+
+	bucket = &cde->lookup_ht[hash_32(entry->name_hash, CACHED_DIRENT_HASH_BITS)];
+	hlist_add_head(&entry->hash_node, bucket);
+	cde->lookup_bytes += sizeof(*entry);
+	return 0;
+}
+
+/* add cached_dirent to folioq */
+static bool add_cached_dirent_folioq_locked(struct cached_dirents *cde,
+					    loff_t ctx_pos,
+					    const char *name,
+					    unsigned int namelen,
+					    const struct cifs_fattr *fattr,
+					    bool pending_dcache)
+{
+	struct cached_dirent *de;
+	struct cifs_cached_dir_mapping *cached_mapping = NULL;
+	const char *stored_name;
+	struct folio *target_folio = NULL;
+	struct folio_queue *fq;
+	unsigned int cur_folio;
+	unsigned int start_slot;
+	int rc;
+	bool grew = false;
+
+	if (!cde->lookup_ht) {
+		cde->lookup_ht = kcalloc(1 << CACHED_DIRENT_HASH_BITS,
+					 sizeof(*cde->lookup_ht), GFP_KERNEL);
+		if (!cde->lookup_ht) {
+			fail_cached_dir_locked(cde);
+			return false;
+		}
+	}
+
+	/* Grow phase: ensure folioq exists */
+	if (!cde->folioq) {
+		rc = grow_cached_dirents_folioq_locked(cde);
+		if (rc < 0) {
+			fail_cached_dir_locked(cde);
+			return false;
+		}
+		cached_dir_reset_insert_cursor_locked(cde);
+	}
+
+	if (!cde->insert_cursor_fq)
+		cached_dir_reset_insert_cursor_locked(cde);
+
+retry_insert:
+	/* Insertion phase: try to find space in current folios */
+	de = NULL;
+	fq = cde->insert_cursor_fq;
+	start_slot = cde->insert_cursor_slot;
+	cur_folio = cde->insert_cursor_folio_index;
+	if (!fq) {
+		fq = cde->folioq;
+		start_slot = 0;
+		cur_folio = 0;
+	}
+
+	for (; fq && !de; fq = fq->next) {
+		for (int s = start_slot; s < folioq_count(fq) && !de; s++, cur_folio++) {
+			struct folio *folio = folioq_folio(fq, s);
+
+			cached_mapping = cached_dir_mapping(folio);
+			if (!cached_mapping)
+				continue;
+
+			if (cached_mapping->folio_full)
+				continue;
+
+			if (cached_dirent_has_space_for_record(cached_mapping, 0)) {
+				target_folio = folio;
+				de = &cached_mapping->entries[cached_mapping->entries_count];
+				cached_dir_set_insert_cursor_locked(cde, fq, s, cur_folio);
+				break;
+			}
+
+			cached_mapping->folio_full = 1;
+		}
+		start_slot = 0;
+	}
+
+	/* If no space found and haven't grown yet, grow and retry once */
+	if (!de && !grew) {
+		rc = grow_cached_dirents_folioq_locked(cde);
+		if (rc < 0) {
+			fail_cached_dir_locked(cde);
+			return false;
+		}
+
+		cached_dir_reset_insert_cursor_locked(cde);
+		grew = true;
+		goto retry_insert;
+	}
+
+	if (!de) {
+		fail_cached_dir_locked(cde);
+		return false;
+	}
+
+	memset(de, 0, sizeof(*de));
+	de->name_len = namelen;
+	de->ctx_pos = ctx_pos;
+	memcpy(&de->fattr, fattr, sizeof(*fattr));
+	stored_name = NULL;
+	if (!cached_dirent_try_inline_name(target_folio, cached_mapping, de,
+					      name, namelen, &stored_name)) {
+		de->name = kstrndup(name, namelen, GFP_KERNEL);
+		if (!de->name) {
+			fail_cached_dir_locked(cde);
+			return false;
+		}
+		kmemleak_not_leak((void *)de->name);
+		de->external_name = true;
+		cde->external_name_bytes += (size_t)namelen + 1;
+		stored_name = de->name;
+	} else {
+		de->external_name = false;
+	}
+	de->name = stored_name;
+
+	if (insert_cached_dir_lookup_locked(cde, stored_name, namelen,
+				   de,
+				   pending_dcache) < 0) {
+		if (de->external_name)
+			kfree((void *)de->name);
+		memset(de, 0, sizeof(*de));
+		fail_cached_dir_locked(cde);
+		return false;
+	}
+
+	cached_mapping->entries_count++;
+	cde->entries_count++;
+	cde->bytes_used = cde->folioq_size + cde->external_name_bytes +
+				  cde->lookup_bytes;
+	return true;
+}
+
+/* add cached_dirent to list */
+static bool add_cached_dirent_list_locked(struct cached_dirents *cde,
+					  loff_t ctx_pos,
+					  const char *name,
+					  unsigned int namelen,
+					  const struct cifs_fattr *fattr)
+{
+	struct cached_dir_lookup_entry *entry;
+	struct cached_dirent *de;
+
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		return false;
+
+	de = kzalloc(sizeof(*de), GFP_KERNEL);
+	if (!de) {
+		kfree(entry);
+		return false;
+	}
+
+	de->name = kstrndup(name, namelen, GFP_KERNEL);
+	if (!de->name) {
+		kfree(de);
+		kfree(entry);
+		return false;
+	}
+
+	de->name_len = namelen;
+	de->external_name = true;
+	de->ctx_pos = ctx_pos;
+	memcpy(&de->fattr, fattr, sizeof(*fattr));
+
+	entry->dirent = de;
+	entry->name_hash = full_name_hash(NULL, name, namelen);
+	entry->pending_dcache = false;
+	list_add_tail(&entry->list_node, &cde->entry_list);
+
+	cde->entry_list_count++;
+	cde->entries_count++;
+	cde->external_name_bytes += (size_t)namelen + 1;
+	cde->bytes_used = cde->external_name_bytes +
+			  cde->entry_list_count * (sizeof(*entry) + sizeof(*de));
 	return true;
 }
 
+/* convert cached_dirents from list to folioq format, freeing list entries */
+static int convert_cached_dirents_list_to_folioq_locked(struct cached_dirents *cde)
+{
+	struct cached_dir_lookup_entry *entry;
+	struct cached_dir_lookup_entry *tmp;
+	unsigned long restored_entries = 0;
+
+	if (cde->folioq)
+		return 0;
+
+	release_lookup_table_locked(cde);
+	cde->entries_count = 0;
+	cde->external_name_bytes = 0;
+	cde->lookup_bytes = 0;
+	cde->bytes_used = 0;
+
+	list_for_each_entry_safe(entry, tmp, &cde->entry_list, list_node) {
+		if (!add_cached_dirent_folioq_locked(cde, entry->dirent->ctx_pos,
+						   entry->dirent->name,
+						   entry->dirent->name_len,
+						   &entry->dirent->fattr, false)) {
+			return -ENOMEM;
+		}
+
+		restored_entries++;
+		list_del(&entry->list_node);
+		kfree((void *)entry->dirent->name);
+		kfree(entry->dirent);
+		kfree(entry);
+	}
+
+	cde->entry_list_count = 0;
+	cde->entries_count = restored_entries;
+	cde->bytes_used = cde->folioq_size + cde->external_name_bytes +
+			  cde->lookup_bytes;
+	return 0;
+}
+
+/* add cached_dirent, deciding whether to put it in the list or folioq */
 static bool add_cached_dirent(struct cached_dirents *cde,
 			      struct dir_context *ctx, const char *name,
 			      int namelen, struct cifs_fattr *fattr,
 			      struct file *file)
 {
-	struct cached_dirent *de;
+	int rc;
 
 	lockdep_assert_held(&cde->de_mutex);
 
@@ -105,32 +751,36 @@ static bool add_cached_dirent(struct cached_dirents *cde,
 	if (cde->is_valid || cde->is_failed)
 		return false;
 	if (ctx->pos != cde->pos) {
-		cde->is_failed = 1;
+		fail_cached_dir_locked(cde);
 		return false;
 	}
-	de = kzalloc_obj(*de, GFP_KERNEL);
-	if (de == NULL) {
-		cde->is_failed = 1;
-		return false;
+
+	if (!cached_dirents_use_folioq_locked(cde)) {
+		if (cde->entry_list_count < CIFS_CACHED_DIRENT_LIST_THRESHOLD)
+			return add_cached_dirent_list_locked(cde, ctx->pos, name,
+						     namelen, fattr);
+
+		rc = convert_cached_dirents_list_to_folioq_locked(cde);
+		if (rc < 0) {
+			fail_cached_dir_locked(cde);
+			return false;
+		}
 	}
-	de->namelen = namelen;
-	de->name = kstrndup(name, namelen, GFP_KERNEL);
-	if (de->name == NULL) {
-		kfree(de);
-		cde->is_failed = 1;
+
+	if (!add_cached_dirent_folioq_locked(cde, ctx->pos, name, namelen, fattr,
+					     true)) {
+		fail_cached_dir_locked(cde);
 		return false;
 	}
-	de->pos = ctx->pos;
 
-	memcpy(&de->fattr, fattr, sizeof(struct cifs_fattr));
-
-	list_add_tail(&de->entry, &cde->entries);
-	/* update accounting */
-	cde->entries_count++;
-	cde->bytes_used += sizeof(*de) + (size_t)namelen + 1;
 	return true;
 }
 
+/*
+ * Emit cached dirents for the current ctx position if the cache is valid.
+ * If there is no ongoing population for this directory (ctx->pos == 0),
+ * make this readdir call responsible for populating the cache.
+ */
 bool emit_cached_dir_if_valid(struct cached_fid *cfid,
 			      struct file *file,
 			      struct dir_context *ctx)
@@ -146,7 +796,15 @@ bool emit_cached_dir_if_valid(struct cached_fid *cfid,
 	 */
 	if (ctx->pos == 0 && cfid->dirents.file == NULL) {
 		cfid->dirents.file = file;
+		cfid->dirents.dir_inode = file_inode(file);
 		cfid->dirents.pos = 2;
+		cached_dir_reset_insert_cursor_locked(&cfid->dirents);
+		/*
+		 * A previous population attempt may have failed and left
+		 * is_failed set.  Clear it now so add_cached_dirent() will
+		 * accept new entries from this readdir pass.
+		 */
+		cfid->dirents.is_failed = 0;
 	}
 
 	if (!cfid->dirents.is_valid) {
@@ -161,6 +819,155 @@ bool emit_cached_dir_if_valid(struct cached_fid *cfid,
 	return true;
 }
 
+/* update the cached dir position during a readdir population pass */
+static void update_cached_dirents_count(struct cached_dirents *cde,
+					struct file *file)
+{
+	if (cde->file != file)
+		return;
+	if (cde->is_valid || cde->is_failed)
+		return;
+
+	cde->pos++;
+}
+
+/* mark the cached_dirents as valid if readdir population pass completed successfully */
+static void finished_cached_dirents_count(struct cached_dirents *cde,
+					  struct dir_context *ctx,
+					  struct file *file)
+{
+	struct cifs_cached_dir_mapping *cached_mapping;
+
+	if (cde->file != file)
+		return;
+	if (cde->is_valid || cde->is_failed)
+		return;
+	if (ctx->pos != cde->pos)
+		return;
+
+	cached_mapping = last_cached_dir_mapping_locked(cde);
+	if (cached_mapping)
+		cached_mapping->folio_is_eof = 1;
+
+	cde->is_valid = 1;
+}
+
+/* update the cached_dirent for a given name in list */
+static bool update_cached_dirent_list_locked(struct cached_dirents *cde,
+						     const char *name,
+						     unsigned int namelen,
+						     const struct cifs_fattr *fattr)
+{
+	struct cached_dir_lookup_entry *entry;
+	struct cached_dirent *dirent;
+
+	entry = lookup_cached_dirent_list_locked(cde, name, namelen);
+	if (!entry)
+		return false;
+
+	dirent = entry->dirent;
+	if (!dirent)
+		return false;
+
+	memcpy(&dirent->fattr, fattr, sizeof(dirent->fattr));
+	dirent->tombstone = false;
+	return true;
+}
+
+/* update the cached_dirent for a given name in folioq */
+static bool update_cached_dirent_folioq_locked(struct cached_dirents *cde,
+						       const char *name,
+						       unsigned int namelen,
+						       const struct cifs_fattr *fattr)
+{
+	struct cached_dir_lookup_entry *entry;
+	struct cached_dirent *dirent;
+
+	entry = lookup_cached_dirent_locked(cde, name, namelen);
+	if (!entry)
+		return false;
+
+	dirent = entry->dirent;
+	if (!dirent)
+		return false;
+
+	memcpy(&dirent->fattr, fattr, sizeof(dirent->fattr));
+	dirent->tombstone = false;
+	return true;
+}
+
+/* update wrapper to decide if the entry is in list or folioq */
+static bool update_cached_dirent_locked(struct cached_dirents *cde,
+						const char *name,
+						unsigned int namelen,
+						const struct cifs_fattr *fattr)
+{
+	if (cached_dirents_use_folioq_locked(cde))
+		return update_cached_dirent_folioq_locked(cde, name, namelen,
+							  fattr);
+
+	return update_cached_dirent_list_locked(cde, name, namelen,
+							 fattr);
+}
+
+/* invalidate a cached_dirent by name in list */
+static bool invalidate_cached_dirent_list_locked(struct cached_dirents *cde,
+						 const char *name,
+						 unsigned int namelen)
+{
+	struct cached_dir_lookup_entry *entry;
+	struct cached_dirent *dirent;
+
+	entry = lookup_cached_dirent_list_locked(cde, name, namelen);
+	if (!entry)
+		return true;
+
+	dirent = entry->dirent;
+	if (!dirent)
+		return true;
+
+	dirent->tombstone = true;
+	return true;
+}
+
+/* invalidate a cached_dirent by name in folioq */
+static bool invalidate_cached_dirent_folioq_locked(struct cached_dirents *cde,
+						   const char *name,
+						   unsigned int namelen)
+{
+	struct cached_dir_lookup_entry *entry;
+	struct cached_dirent *dirent;
+
+	entry = lookup_cached_dirent_locked(cde, name, namelen);
+	if (!entry)
+		return true;
+
+	dirent = entry->dirent;
+	if (!dirent)
+		return false;
+
+	dirent->tombstone = true;
+	if (entry->pending_dcache) {
+		entry->pending_dcache = false;
+		complete_all(&entry->dcache_complete);
+	}
+
+	return true;
+}
+
+/* invalidate wrapper to decide if the entry is in list or folioq */
+static bool invalidate_cached_dirent_locked(struct cached_dirents *cde,
+						const char *name,
+						unsigned int namelen)
+{
+	if (cached_dirents_use_folioq_locked(cde))
+		return invalidate_cached_dirent_folioq_locked(cde, name,
+							      namelen);
+
+	return invalidate_cached_dirent_list_locked(cde, name, namelen);
+}
+
+/* append a dirent to the cached_dir */
 bool add_to_cached_dir(struct cached_fid *cfid,
 		       struct dir_context *ctx,
 		       const char *name,
@@ -168,96 +975,258 @@ bool add_to_cached_dir(struct cached_fid *cfid,
 		       struct cifs_fattr *fattr,
 		       struct file *file)
 {
-	size_t delta_bytes;
+	unsigned long old_entries;
+	unsigned long new_entries;
+	u64 old_bytes;
+	u64 new_bytes;
+	long entry_diff;
+	long long bytes_diff;
 	bool added = false;
 
 	if (!cfid)
 		return false;
 
-	/* Cost of this entry */
-	delta_bytes = sizeof(struct cached_dirent) + (size_t)namelen + 1;
-
 	mutex_lock(&cfid->dirents.de_mutex);
+	old_entries = cfid->dirents.entries_count;
+	old_bytes = cfid->dirents.bytes_used;
 	added = add_cached_dirent(&cfid->dirents, ctx, name, namelen,
 				  fattr, file);
+	new_entries = cfid->dirents.entries_count;
+	new_bytes = cfid->dirents.bytes_used;
 	mutex_unlock(&cfid->dirents.de_mutex);
 
-	if (added) {
-		/* per-tcon then global for consistency with free path */
-		atomic64_add((long long)delta_bytes, &cfid->cfids->total_dirents_bytes);
-		atomic_long_inc(&cfid->cfids->total_dirents_entries);
-		atomic64_add((long long)delta_bytes, &cifs_dircache_bytes_used);
+	entry_diff = (long)new_entries - (long)old_entries;
+	bytes_diff = (long long)new_bytes - (long long)old_bytes;
+
+	if (entry_diff > 0) {
+		atomic_long_add(entry_diff, &cfid->cfids->total_dirents_entries);
+	} else if (entry_diff < 0) {
+		atomic_long_sub(-entry_diff, &cfid->cfids->total_dirents_entries);
+	}
+
+	if (bytes_diff > 0) {
+		atomic64_add(bytes_diff, &cfid->cfids->total_dirents_bytes);
+		atomic64_add(bytes_diff, &cifs_dircache_bytes_used);
+	} else if (bytes_diff < 0) {
+		atomic64_sub(-bytes_diff, &cfid->cfids->total_dirents_bytes);
+		atomic64_sub(-bytes_diff, &cifs_dircache_bytes_used);
 	}
 
+
 	return added;
 }
 
-static void update_cached_dirents_count(struct cached_dirents *cde,
-					struct file *file)
+/* update the cached_dir position during a readdir population pass */
+void update_pos_cached_dir(struct cached_fid *cfid,
+				      struct file *file)
 {
-	if (cde->file != file)
-		return;
-	if (cde->is_valid || cde->is_failed)
+	if (!cfid)
 		return;
 
-	cde->pos++;
+	mutex_lock(&cfid->dirents.de_mutex);
+	update_cached_dirents_count(&cfid->dirents, file);
+	mutex_unlock(&cfid->dirents.de_mutex);
 }
 
-static void finished_cached_dirents_count(struct cached_dirents *cde,
-					  struct dir_context *ctx,
-					  struct file *file)
+/* signal completion of cached_dir population after a readdir pass */
+void complete_cached_dir(struct cached_fid *cfid,
+					struct dir_context *ctx,
+					struct file *file)
 {
-	if (cde->file != file)
-		return;
-	if (cde->is_valid || cde->is_failed)
-		return;
-	if (ctx->pos != cde->pos)
+	struct cached_dirents *cde;
+
+	if (!cfid)
 		return;
 
-	cde->is_valid = 1;
+	cde = &cfid->dirents;
+	mutex_lock(&cfid->dirents.de_mutex);
+	finished_cached_dirents_count(cde, ctx, file);
+	mutex_unlock(&cfid->dirents.de_mutex);
 }
 
-void update_pos_cached_dir(struct cached_fid *cfid,
-				      struct file *file)
+/*
+ * lookup a cached_dirent by name, returning -ENOENT if not found or if the
+ * entry is a tombstone.  The result struct is filled in with the fattr of the
+ * found entry, and flags indicating whether the entry was found, whether the
+ * cache was fully populated at the time of lookup, and whether there was an
+ * active lease on the directory at the time of lookup.
+ */
+int lookup_cached_dir(struct cached_fid *cfid,
+				 const char *name,
+				 unsigned int namelen,
+				 struct cached_dirent_lookup_result *result)
+{
+	struct cached_dir_lookup_entry *entry;
+	struct cached_dirent *dirent;
+	bool lease_active;
+
+	if (!cfid || !name || !namelen || !result)
+		return -EINVAL;
+
+	memset(result, 0, sizeof(*result));
+
+	spin_lock(&cfid->cfid_lock);
+	lease_active = is_valid_cached_dir(cfid);
+	spin_unlock(&cfid->cfid_lock);
+
+	mutex_lock(&cfid->dirents.de_mutex);
+	result->under_active_lease = lease_active;
+	result->fully_populated = cfid->dirents.is_valid;
+
+	entry = lookup_cached_dirent_entry_locked(&cfid->dirents, name, namelen);
+	if (!entry || !entry->dirent) {
+		mutex_unlock(&cfid->dirents.de_mutex);
+		return -ENOENT;
+	}
+
+	dirent = entry->dirent;
+	if (dirent->tombstone) {
+		mutex_unlock(&cfid->dirents.de_mutex);
+		return -ENOENT;
+	}
+
+	result->found = true;
+	memcpy(&result->fattr, &dirent->fattr, sizeof(result->fattr));
+
+	mutex_unlock(&cfid->dirents.de_mutex);
+	return 0;
+}
+
+/*
+ * Invalidate all cached_dirents for a cached_fid. We generally
+ * try to invalidate specific entries by name. This is used as
+ * a last resort when we can't invalidate specific entries
+ */
+void invalidate_cached_dir_contents(struct cached_fid *cfid)
 {
 	if (!cfid)
 		return;
 
 	mutex_lock(&cfid->dirents.de_mutex);
-	update_cached_dirents_count(&cfid->dirents, file);
+	fail_cached_dir_locked(&cfid->dirents);
 	mutex_unlock(&cfid->dirents.de_mutex);
 }
 
-void complete_cached_dir(struct cached_fid *cfid,
-					struct dir_context *ctx,
-					struct file *file)
+/*
+ * Update a cached_dirent for a given name.  Returns true if the entry was
+ * found and updated, false if the entry was not found or if the cache is not
+ * valid.
+ */
+bool update_dirent_in_cached_dir(struct cached_fid *cfid,
+				  const char *name,
+				  unsigned int namelen,
+				  const struct cifs_fattr *fattr)
+{
+	bool updated = false;
+
+	if (!cfid || !name || !namelen || !fattr)
+		return false;
+
+	mutex_lock(&cfid->dirents.de_mutex);
+	updated = update_cached_dirent_locked(&cfid->dirents, name,
+						      namelen, fattr);
+	mutex_unlock(&cfid->dirents.de_mutex);
+	return updated;
+}
+
+/*
+ * Invalidate a cached_dirent for a given name.  Returns true if the entry was
+ * found and invalidated, false if the entry was not found or if the cache is
+ * not valid.
+ */
+bool invalidate_dirent_in_cached_dir(struct cached_fid *cfid,
+				      const char *name,
+				      unsigned int namelen)
 {
+	bool invalidated = false;
+
+	if (!cfid || !name || !namelen)
+		return false;
+	if (!cached_dir_is_valid(cfid))
+		return false;
+
+	mutex_lock(&cfid->dirents.de_mutex);
+	if (!cfid->dirents.is_valid || cfid->dirents.is_failed)
+		goto out_unlock;
+
+	invalidated = invalidate_cached_dirent_locked(&cfid->dirents,
+							 name, namelen);
+
+out_unlock:
+	mutex_unlock(&cfid->dirents.de_mutex);
+	return invalidated;
+}
+
+/*
+ * Signal completion of dcache population for a specific dirent.
+ * Called after cifs_prime_dcache returns, on both sync and async paths.
+ * Clears the pending_dcache flag and unblocks any waiting lookups.
+ */
+void cifs_complete_pending_dcache(struct cached_fid *cfid,
+		const char *name, unsigned int namelen)
+{
+	struct cached_dir_lookup_entry *entry;
+	bool uses_folioq;
+	int ret = -ENOENT;
+
 	if (!cfid)
 		return;
 
 	mutex_lock(&cfid->dirents.de_mutex);
-	finished_cached_dirents_count(&cfid->dirents, ctx, file);
+	uses_folioq = cached_dirents_use_folioq_locked(&cfid->dirents);
+	entry = lookup_cached_dirent_entry_locked(&cfid->dirents, name, namelen);
+	if (entry) {
+		if (uses_folioq && entry->pending_dcache) {
+			entry->pending_dcache = false;
+			complete_all(&entry->dcache_complete);
+		}
+		ret = 0;
+	}
 	mutex_unlock(&cfid->dirents.de_mutex);
+	cifs_dbg(FYI, "Dcache population of %.*s. status: %d\n",
+					namelen, name, ret);
 }
 
-struct cached_dirent *lookup_cached_dirent(struct cached_dirents *cde,
-				   const char *name,
-				   unsigned int namelen)
+/*
+ * Wait for async dcache population to complete for a specific dirent,
+ * blocking until it is signalled or a short timeout expires.
+ * Returns: 0 on completion or entry not pending, -ETIMEDOUT on timeout,
+ *          -ENOENT if entry not found in the cache.
+ */
+int cifs_wait_for_pending_dcache(struct cached_fid *cfid,
+		const char *name, unsigned int namelen)
 {
-	struct cached_dirent *entry;
+	struct cached_dir_lookup_entry *entry;
+	bool uses_folioq;
+	struct completion *comp = NULL;
+	int ret = -ENOENT;
 
-	if (!cde)
-		return NULL;
+	if (!cfid)
+		return -ENOENT;
 
-	lockdep_assert_held(&cde->de_mutex);
+	mutex_lock(&cfid->dirents.de_mutex);
+	uses_folioq = cached_dirents_use_folioq_locked(&cfid->dirents);
+	entry = lookup_cached_dirent_entry_locked(&cfid->dirents, name, namelen);
+	if (entry) {
+		ret = 0;
+		if (uses_folioq && entry->pending_dcache)
+			comp = &entry->dcache_complete;
+	}
+	mutex_unlock(&cfid->dirents.de_mutex);
 
-	list_for_each_entry(entry, &cde->entries, entry) {
-		if (entry->namelen == namelen &&
-		    memcmp(entry->name, name, namelen) == 0)
-			return entry;
+	if (comp) {
+		if (wait_for_completion_timeout(comp, CIFS_DCACHE_WAIT_TIMEOUT) == 0) {
+			cifs_dbg(FYI, "Timeout waiting for dcache population of %.*s\n",
+					namelen, name);
+			ret = -ETIMEDOUT;
+		} else {
+			cifs_dbg(FYI, "Dcache population completed for %.*s\n",
+					namelen, name);
+			ret = 0;
+		}
 	}
 
-	return NULL;
+	return ret;
 }
 
 static struct cached_fid *find_or_create_cached_dir(struct cached_fids *cfids,
@@ -682,7 +1651,9 @@ int open_cached_dir_by_dentry(struct cifs_tcon *tcon,
 			      struct cached_fid **ret_cfid)
 {
 	struct cached_fid *cfid;
+	struct cached_fid *trace_cfid = NULL;
 	struct cached_fids *cfids = tcon->cfids;
+	int rc = -ENOENT;
 
 	if (cfids == NULL)
 		return -EOPNOTSUPP;
@@ -702,13 +1673,15 @@ int open_cached_dir_by_dentry(struct cifs_tcon *tcon,
 			kref_get(&cfid->refcount);
 			*ret_cfid = cfid;
 			cfid->last_access_time = jiffies;
+			rc = 0;
+			trace_cfid = cfid;
 			spin_unlock(&cfid->cfid_lock);
 			spin_unlock(&cfids->cfid_list_lock);
-			return 0;
+			return rc;
 		}
 	}
 	spin_unlock(&cfids->cfid_list_lock);
-	return -ENOENT;
+	return rc;
 }
 
 static void
@@ -853,10 +1826,10 @@ void close_all_cached_dirs(struct cifs_sb_info *cifs_sb)
 }
 
 /*
- * Invalidate all cached dirs when a TCON has been reset
- * due to a session loss.
+ * Queue all cached dirs for invalidation on laundromat without waiting.
+ * Safe for callers that hold cifs_tcp_ses_lock.
  */
-void invalidate_all_cached_dirs(struct cifs_tcon *tcon)
+void invalidate_all_cached_dirs_nowait(struct cifs_tcon *tcon)
 {
 	struct cached_fids *cfids = tcon->cfids;
 	struct cached_fid *cfid, *q;
@@ -890,8 +1863,22 @@ void invalidate_all_cached_dirs(struct cifs_tcon *tcon)
 	}
 	spin_unlock(&cfids->cfid_list_lock);
 
-	/* run laundromat unconditionally now as there might have been previously queued work */
+	/* Run laundromat now as there might have been previously queued work. */
 	mod_delayed_work(cfid_put_wq, &cfids->laundromat_work, 0);
+}
+
+/*
+ * Invalidate all cached dirs when a TCON has been reset
+ * due to a session loss.
+ */
+void invalidate_all_cached_dirs(struct cifs_tcon *tcon)
+{
+	struct cached_fids *cfids = tcon->cfids;
+
+	if (!cfids)
+		return;
+
+	invalidate_all_cached_dirs_nowait(tcon);
 	flush_delayed_work(&cfids->laundromat_work);
 }
 
@@ -980,7 +1967,7 @@ static struct cached_fid *init_cached_dir(const char *path)
 	INIT_WORK(&cfid->close_work, cached_dir_offload_close);
 	INIT_WORK(&cfid->put_work, cached_dir_put_work);
 	INIT_LIST_HEAD(&cfid->entry);
-	INIT_LIST_HEAD(&cfid->dirents.entries);
+	INIT_LIST_HEAD(&cfid->dirents.entry_list);
 	mutex_init(&cfid->dirents.de_mutex);
 	mutex_init(&cfid->cfid_open_mutex);
 	spin_lock_init(&cfid->cfid_lock);
@@ -990,38 +1977,34 @@ static struct cached_fid *init_cached_dir(const char *path)
 
 static void free_cached_dir(struct cached_fid *cfid)
 {
-	struct cached_dirent *dirent, *q;
+	unsigned long entries_count = 0;
+	u64 bytes_used = 0;
 
 	WARN_ON(work_pending(&cfid->close_work));
 	WARN_ON(work_pending(&cfid->put_work));
 
+
 	dput(cfid->dentry);
 	cfid->dentry = NULL;
 
-	/*
-	 * Delete all cached dirent names
-	 */
-	list_for_each_entry_safe(dirent, q, &cfid->dirents.entries, entry) {
-		list_del(&dirent->entry);
-		kfree(dirent->name);
-		kfree(dirent);
-	}
+	mutex_lock(&cfid->dirents.de_mutex);
+	entries_count = cfid->dirents.entries_count;
+	bytes_used = cfid->dirents.bytes_used;
+	release_cached_dirents_locked(&cfid->dirents);
+	mutex_unlock(&cfid->dirents.de_mutex);
 
 	/* adjust tcon-level counters and reset per-dir accounting */
 	if (cfid->cfids) {
-		if (cfid->dirents.entries_count)
-			atomic_long_sub((long)cfid->dirents.entries_count,
+		if (entries_count)
+			atomic_long_sub((long)entries_count,
 					&cfid->cfids->total_dirents_entries);
-		if (cfid->dirents.bytes_used) {
-			atomic64_sub((long long)cfid->dirents.bytes_used,
+		if (bytes_used) {
+			atomic64_sub((long long)bytes_used,
 					&cfid->cfids->total_dirents_bytes);
-			atomic64_sub((long long)cfid->dirents.bytes_used,
+			atomic64_sub((long long)bytes_used,
 					&cifs_dircache_bytes_used);
 		}
 	}
-	cfid->dirents.entries_count = 0;
-	cfid->dirents.bytes_used = 0;
-
 	kfree(cfid->path);
 	cfid->path = NULL;
 	kfree(cfid);
@@ -1041,7 +2024,7 @@ static void cfids_laundromat_worker(struct work_struct *work)
 
 	list_for_each_entry_safe(cfid, q, &cfids->entries, entry) {
 		spin_lock(&cfid->cfid_lock);
-		if (cfid->last_access_time &&
+		if (dir_cache_timeout && cfid->last_access_time &&
 		    time_after(jiffies, cfid->last_access_time + HZ * dir_cache_timeout)) {
 			cfid->on_list = false;
 			list_move(&cfid->entry, &entry);
@@ -1083,8 +2066,9 @@ static void cfids_laundromat_worker(struct work_struct *work)
 			 */
 			close_cached_dir(cfid);
 	}
-	queue_delayed_work(cfid_put_wq, &cfids->laundromat_work,
-			   dir_cache_timeout * HZ);
+	if (dir_cache_timeout)
+		queue_delayed_work(cfid_put_wq, &cfids->laundromat_work,
+				   dir_cache_timeout * HZ);
 }
 
 struct cached_fids *init_cached_dirs(void)
@@ -1099,8 +2083,9 @@ struct cached_fids *init_cached_dirs(void)
 	INIT_LIST_HEAD(&cfids->dying);
 
 	INIT_DELAYED_WORK(&cfids->laundromat_work, cfids_laundromat_worker);
-	queue_delayed_work(cfid_put_wq, &cfids->laundromat_work,
-			   dir_cache_timeout * HZ);
+	if (dir_cache_timeout)
+		queue_delayed_work(cfid_put_wq, &cfids->laundromat_work,
+				   dir_cache_timeout * HZ);
 
 	atomic_long_set(&cfids->total_dirents_entries, 0);
 	atomic64_set(&cfids->total_dirents_bytes, 0);
diff --git a/fs/smb/client/cached_dir.h b/fs/smb/client/cached_dir.h
index 0767350b40fba..0726f25b9144a 100644
--- a/fs/smb/client/cached_dir.h
+++ b/fs/smb/client/cached_dir.h
@@ -8,16 +8,107 @@
 #ifndef _CACHED_DIR_H
 #define _CACHED_DIR_H
 
+#include <linux/completion.h>
+#include <linux/build_bug.h>
+#include <linux/list.h>
+#include <linux/netfs.h>
+
 struct cifs_search_info;
 
+/* Timeout for waiting on async dcache population to complete */
+#define CIFS_DCACHE_WAIT_TIMEOUT	(HZ / 10)
+
+#define CIFS_CACHED_INLINE_NAME_LEN	64
+#define CIFS_CACHED_DIRENT_LIST_THRESHOLD	64
+
 struct cached_dirent {
-	struct list_head entry;
-	char *name;
-	int namelen;
-	loff_t pos;
+	const char *name;
+	u32 name_len;
+	bool external_name;
+	bool tombstone;
+	u32 inline_name_off;
+	loff_t ctx_pos;
 	struct cifs_fattr fattr;
 };
 
+/*
+ * Folio-backed cached directory entry storage:
+ *
+ * Directory entries are stored in a folio_queue managed by cached_dirents.
+ * Each folio's virtual address points to a cifs_cached_dir_mapping structure,
+ * which combines directory metadata and a variable-length array of cached_dirent
+ * entries in a single folio allocation.
+ *
+ * Layout within each folio:
+ *   [cifs_cached_dir_mapping][cached_dirent[0..entries_count-1]] ... [inline names]
+ *
+ *   The entries[] array grows upward from the folio header, while inline name
+ *   data is packed downward from the folio end; name_tail_offset marks the
+ *   lowest byte currently occupied by inline names.
+ *
+ * Field meanings:
+ *   name_tail_offset: Current start offset of inline-name storage in the folio.
+ *                     This moves downward as inline names are packed from tail.
+ *   folio_full: Set when this folio cannot accept another cached_dirent record
+ *               (record array would collide with inline-name tail region).
+ *   folio_is_eof: Set when this folio contains the last emitted dirent for the
+ *                 cached directory stream; readers stop when this folio is seen.
+ *
+ * Inline name optimization:
+ *   Names <= CIFS_CACHED_INLINE_NAME_LEN are packed at the tail of the folio,
+ *   after the last dirent entry. This avoids per-name allocation. For longer names,
+ *   external_name is set and a separate kstrndup'd pointer is used.
+ *
+ * Tracking and lookup:
+ *   A hash table (lookup_ht) in cached_dirents indexes all entries by name.
+ *   Each hash entry (cached_dir_lookup_entry) records:
+ *     - name pointer (points into inline region or external memory)
+ *     - dirent pointer (points to cached_dirent in folio or list allocation)
+ *   This enables O(1) lookups during dirent reservation and update operations,
+ *   while also allowing list-backed staging to reuse cached_dirent directly.
+ *
+ * Sequencing and position tracking:
+ *   last_pos tracks the directory position (ctx->pos) of the last entry added
+ *   to this folio. When adding the next entry, we use last_pos + 1 to maintain
+ *   consistent incrementing positions used for directory iteration.
+ */
+struct cifs_cached_dir_mapping {
+	u64 last_cookie;
+	u32 entries_count;
+	u32 name_tail_offset;
+	u32 folio_full:1;
+	u32 folio_is_eof:1;
+	struct cached_dirent entries[];
+};
+
+struct cached_dir_lookup_entry {
+	struct hlist_node hash_node;
+	struct list_head list_node;
+	struct completion dcache_complete;
+	struct cached_dirent *dirent;
+	u32 name_hash;
+	bool pending_dcache;
+};
+
+/*
+ * Per-directory dirent cache using a two-mode storage strategy:
+ *
+ * Small directories (up to CIFS_CACHED_DIRENT_LIST_THRESHOLD entries):
+ *   Entries are stored as individually allocated cached_dirent structs linked
+ *   via cached_dir_lookup_entry nodes in entry_list. Each entry carries its
+ *   own name allocation. This avoids folio overhead for short-lived or small
+ *   directories.
+ *
+ * Large directories (above the threshold):
+ *   The list is converted to folio-backed storage. Entries are packed into
+ *   folios managed by folioq, with names <= CIFS_CACHED_INLINE_NAME_LEN stored
+ *   inline at the tail of each folio to reduce per-name allocations. A hash
+ *   table (lookup_ht) provides O(1) name lookup in this mode.
+ *
+ * The active mode is determined by whether folioq is non-NULL. All CRUD
+ * operations (insert, lookup, update, invalidate, release) dispatch to the
+ * appropriate list or folioq implementation via mode-dispatching helpers.
+ */
 struct cached_dirents {
 	bool is_valid:1;
 	bool is_failed:1;
@@ -25,9 +116,23 @@ struct cached_dirents {
 			    * Used to associate the cache with a single
 			    * open file instance.
 			    */
+	struct inode *dir_inode;
 	struct mutex de_mutex;
 	loff_t pos;		 /* Expected ctx->pos */
-	struct list_head entries;
+	struct folio_queue *folioq;
+	struct list_head entry_list;
+	unsigned int entry_list_count;
+	/*
+	 * Insertion cursor used by add_cached_dirent() to avoid rescanning folioq
+	 * from the head on every append.
+	 */
+	struct folio_queue *insert_cursor_fq;
+	unsigned int insert_cursor_slot;
+	unsigned int insert_cursor_folio_index;
+	size_t folioq_size;
+	unsigned long external_name_bytes;
+	struct hlist_head *lookup_ht;
+	unsigned long lookup_bytes;
 	/* accounting for cached entries in this directory */
 	unsigned long entries_count;
 	unsigned long bytes_used;
@@ -57,6 +162,13 @@ struct cached_fid {
 	struct smb2_file_all_info file_all_info;
 };
 
+struct cached_dirent_lookup_result {
+	bool found;
+	bool under_active_lease;
+	bool fully_populated;
+	struct cifs_fattr fattr;
+};
+
 /* default MAX_CACHED_FIDS is 16 */
 struct cached_fids {
 	/* Must be held when:
@@ -115,12 +227,25 @@ void update_pos_cached_dir(struct cached_fid *cfid,
 void complete_cached_dir(struct cached_fid *cfid,
 					struct dir_context *ctx,
 					struct file *file);
-struct cached_dirent *lookup_cached_dirent(struct cached_dirents *cde,
-				   const char *name,
-				   unsigned int namelen);
+int lookup_cached_dir(struct cached_fid *cfid,
+				 const char *name, unsigned int namelen,
+				 struct cached_dirent_lookup_result *result);
+void invalidate_cached_dir_contents(struct cached_fid *cfid);
+bool update_dirent_in_cached_dir(struct cached_fid *cfid,
+				  const char *name,
+				  unsigned int namelen,
+				  const struct cifs_fattr *fattr);
+bool invalidate_dirent_in_cached_dir(struct cached_fid *cfid,
+				      const char *name,
+				      unsigned int namelen);
+void cifs_complete_pending_dcache(struct cached_fid *cfid,
+				  const char *name, unsigned int namelen);
+int cifs_wait_for_pending_dcache(struct cached_fid *cfid,
+				 const char *name, unsigned int namelen);
 void drop_cached_dir_by_name(const unsigned int xid, struct cifs_tcon *tcon,
 			     const char *name, struct cifs_sb_info *cifs_sb);
 void close_all_cached_dirs(struct cifs_sb_info *cifs_sb);
+void invalidate_all_cached_dirs_nowait(struct cifs_tcon *tcon);
 void invalidate_all_cached_dirs(struct cifs_tcon *tcon);
 bool cached_dir_lease_break(struct cifs_tcon *tcon, __u8 lease_key[16]);
 
diff --git a/fs/smb/client/cifsproto.h b/fs/smb/client/cifsproto.h
index bbbee0ef09443..1bf34a97f051f 100644
--- a/fs/smb/client/cifsproto.h
+++ b/fs/smb/client/cifsproto.h
@@ -179,6 +179,7 @@ void cifs_unix_basic_to_fattr(struct cifs_fattr *fattr,
 void cifs_dir_info_to_fattr(struct cifs_fattr *fattr,
 			    FILE_DIRECTORY_INFO *info,
 			    struct cifs_sb_info *cifs_sb);
+void cifs_inode_to_fattr(struct inode *inode, struct cifs_fattr *fattr);
 int cifs_fattr_to_inode(struct inode *inode, struct cifs_fattr *fattr,
 			bool from_readdir);
 struct inode *cifs_iget(struct super_block *sb, struct cifs_fattr *fattr);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 11/19] cifs: in place changes to cached_dirents when dir lease is held
  2026-05-01 11:20 [PATCH v4 01/19] cifs: change_conf needs to be called for session setup nspmangalore
                   ` (8 preceding siblings ...)
  2026-05-01 11:20 ` [PATCH v4 10/19] cifs: back cached_dirents with page cache nspmangalore
@ 2026-05-01 11:20 ` nspmangalore
  2026-05-01 11:20 ` [PATCH v4 12/19] cifs: register a shrinker to manage cached_dirents nspmangalore
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: nspmangalore @ 2026-05-01 11:20 UTC (permalink / raw)
  To: linux-cifs, smfrench, pc, bharathsm, dhowells, henrique.carvalho,
	ematsumiya
  Cc: Shyam Prasad N

From: Shyam Prasad N <sprasad@microsoft.com>

When a directory lease is held, we do not need to invalidate the
dirent cache on the cfid when new dentries are added.

This change applies local additions to directory contents directly
to the cached_dirents, so that we do not need to fetch them again
from the server.
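
As a rough illustration of that update-or-invalidate policy (using
hypothetical types, not the kernel code): when a new name appears under
a directory whose contents are cached, the cached entry is refreshed in
place if it is already tracked, and the cache is only dropped wholesale
as a fallback.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct cache_entry {
	const char *name;
	long attrs;
};

struct dir_cache {
	bool valid;
	size_t count;
	struct cache_entry e[8];
};

/* Refresh one name in place; invalidate the whole cache only if unknown. */
static void cache_note_create(struct dir_cache *c, const char *name, long attrs)
{
	for (size_t i = 0; i < c->count; i++) {
		if (strcmp(c->e[i].name, name) == 0) {
			c->e[i].attrs = attrs;	/* keep the cache warm */
			return;
		}
	}
	c->valid = false;			/* unknown name: drop the cache */
}

int main(void)
{
	struct dir_cache c = {
		.valid = true, .count = 2,
		.e = { { "a.txt", 1 }, { "b.txt", 2 } },
	};

	cache_note_create(&c, "a.txt", 42);	/* updated in place */
	cache_note_create(&c, "new.txt", 7);	/* forces invalidation */
	printf("valid=%d a.attrs=%ld\n", c.valid, c.e[0].attrs);
	return 0;
}

The diff below implements this with update_dirent_in_cached_dir(),
falling back to invalidate_cached_dir_contents() when the entry is not
present in the cache.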

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
---
 fs/smb/client/dir.c   |  95 ++++++++++---------
 fs/smb/client/inode.c | 209 +++++++++++++++++++++++++++++++++++++-----
 2 files changed, 236 insertions(+), 68 deletions(-)

diff --git a/fs/smb/client/dir.c b/fs/smb/client/dir.c
index 4e5c580e4de0a..092fff1ad02a2 100644
--- a/fs/smb/client/dir.c
+++ b/fs/smb/client/dir.c
@@ -194,7 +194,6 @@ static int cifs_do_create(struct inode *inode, struct dentry *direntry, unsigned
 	struct cached_fid *parent_cfid = NULL;
 	int rdwr_for_fscache = 0;
 	__le32 lease_flags = 0;
-	bool found_parent_cfid;
 
 	*oplock = 0;
 	if (tcon->ses->server->oplocks)
@@ -320,33 +319,14 @@ static int cifs_do_create(struct inode *inode, struct dentry *direntry, unsigned
 
 retry_open:
 	if (tcon->cfids && direntry->d_parent && server->dialect >= SMB30_PROT_ID) {
-		found_parent_cfid = false;
 		parent_cfid = NULL;
-		spin_lock(&tcon->cfids->cfid_list_lock);
-		list_for_each_entry(parent_cfid, &tcon->cfids->entries, entry) {
-			spin_lock(&parent_cfid->cfid_lock);
-			if (parent_cfid->dentry == direntry->d_parent) {
-				kref_get(&parent_cfid->refcount);
-				spin_unlock(&parent_cfid->cfid_lock);
-				spin_unlock(&tcon->cfids->cfid_list_lock);
-				found_parent_cfid = true;
-				cifs_dbg(FYI, "found a parent cached file handle\n");
-				if (cached_dir_copy_lease_key(parent_cfid,
-						      fid->parent_lease_key)) {
-					lease_flags
-						|= SMB2_LEASE_FLAG_PARENT_LEASE_KEY_SET_LE;
-					mutex_lock(&parent_cfid->dirents.de_mutex);
-					parent_cfid->dirents.is_valid = false;
-					parent_cfid->dirents.is_failed = true;
-					mutex_unlock(&parent_cfid->dirents.de_mutex);
-				}
-				close_cached_dir(parent_cfid);
-				break;
-			}
-			spin_unlock(&parent_cfid->cfid_lock);
+		if (!open_cached_dir_by_dentry(tcon, direntry->d_parent,
+					      &parent_cfid)) {
+			cifs_dbg(FYI, "found a parent cached file handle\n");
+			if (cached_dir_copy_lease_key(parent_cfid,
+					      fid->parent_lease_key))
+				lease_flags |= SMB2_LEASE_FLAG_PARENT_LEASE_KEY_SET_LE;
 		}
-		if (!found_parent_cfid)
-			spin_unlock(&tcon->cfids->cfid_list_lock);
 	}
 
 	oparms = (struct cifs_open_parms) {
@@ -364,6 +344,10 @@ static int cifs_do_create(struct inode *inode, struct dentry *direntry, unsigned
 	if (rc) {
 		cifs_dbg(FYI, "cifs_create returned 0x%x\n", rc);
 		if (rc == -EACCES && rdwr_for_fscache == 1) {
+			if (parent_cfid) {
+				close_cached_dir(parent_cfid);
+				parent_cfid = NULL;
+			}
 			desired_access &= ~GENERIC_READ;
 			rdwr_for_fscache = 2;
 			goto retry_open;
@@ -452,10 +436,25 @@ static int cifs_do_create(struct inode *inode, struct dentry *direntry, unsigned
 			goto out_err;
 		}
 
+	if (newinode && parent_cfid) {
+		struct cifs_fattr fattr;
+		bool cache_updated;
+
+		cifs_inode_to_fattr(newinode, &fattr);
+		cache_updated = update_dirent_in_cached_dir(parent_cfid,
+						    direntry->d_name.name,
+						    direntry->d_name.len,
+						    &fattr);
+		if (!cache_updated)
+			invalidate_cached_dir_contents(parent_cfid);
+	}
+
 	d_drop(direntry);
 	d_add(direntry, newinode);
 
 out:
+	if (parent_cfid)
+		close_cached_dir(parent_cfid);
 	free_dentry_path(page);
 	return rc;
 
@@ -732,27 +731,35 @@ cifs_lookup(struct inode *parent_dir_inode, struct dentry *direntry,
 
 		cifs_dbg(FYI, "NULL inode in lookup\n");
 
-		/*
-		 * We can only rely on negative dentries having the same
-		 * spelling as the cached dirent if case insensitivity is
-		 * forced on mount.
-		 *
-		 * XXX: if servers correctly announce Case Sensitivity Search
-		 * on GetInfo of FileFSAttributeInformation, then we can take
-		 * correct action even if case insensitive is not forced on
-		 * mount.
-		 */
-		if (pTcon->nocase && !open_cached_dir_by_dentry(pTcon, direntry->d_parent, &cfid)) {
+		if (!open_cached_dir_by_dentry(pTcon, direntry->d_parent, &cfid)) {
+			struct qstr qname = QSTR_INIT(direntry->d_name.name, direntry->d_name.len);
+			struct cached_dirent_lookup_result lookup = {};
+			int rc_lookup;
+			int rc_wait;
+
+			rc_wait = cifs_wait_for_pending_dcache(cfid, qname.name, qname.len);
+			if (rc_wait == -ETIMEDOUT)
+				cifs_dbg(FYI, "Wait for pending dcache entry timed out\n");
+
+			rc_lookup = lookup_cached_dir(cfid, qname.name,
+							       qname.len, &lookup);
+			if (!rc_lookup && lookup.found && lookup.under_active_lease) {
+				newInode = cifs_iget(parent_dir_inode->i_sb, &lookup.fattr);
+				close_cached_dir(cfid);
+				if (!newInode) {
+					de = ERR_PTR(-ENOMEM);
+					goto free_dentry_path;
+				}
+				rc = 0;
+				renew_parental_timestamps(direntry);
+				goto out;
+			}
+
 			/*
 			 * dentry is negative and parent is fully cached:
-			 * we can assume file does not exist
+			 * we can assume the file does not exist (nocase mount)
 			 */
-			bool dirents_valid;
-
-			mutex_lock(&cfid->dirents.de_mutex);
-			dirents_valid = cfid->dirents.is_valid;
-			mutex_unlock(&cfid->dirents.de_mutex);
-			if (dirents_valid) {
+			if (pTcon->nocase && cfid->dirents.is_valid) {
 				close_cached_dir(cfid);
 				goto out;
 			}
diff --git a/fs/smb/client/inode.c b/fs/smb/client/inode.c
index f0b76670b0921..ecc92e5c7f7b6 100644
--- a/fs/smb/client/inode.c
+++ b/fs/smb/client/inode.c
@@ -28,19 +28,18 @@
 #include "cached_dir.h"
 #include "reparse.h"
 
-static void cifs_invalidate_cached_dir(struct cifs_tcon *tcon,
-				       struct dentry *parent)
+static void cifs_invalidate_cached_dirent(struct cifs_tcon *tcon,
+						 struct dentry *parent,
+						 const char *name,
+						 unsigned int namelen)
 {
 	struct cached_fid *parent_cfid = NULL;
 
-	if (!tcon || !parent)
+	if (!tcon || !parent || !name || !namelen)
 		return;
 
 	if (!open_cached_dir_by_dentry(tcon, parent, &parent_cfid)) {
-		mutex_lock(&parent_cfid->dirents.de_mutex);
-		parent_cfid->dirents.is_valid = false;
-		parent_cfid->dirents.is_failed = true;
-		mutex_unlock(&parent_cfid->dirents.de_mutex);
+		invalidate_dirent_in_cached_dir(parent_cfid, name, namelen);
 		close_cached_dir(parent_cfid);
 	}
 }
@@ -177,6 +176,28 @@ cifs_nlink_fattr_to_inode(struct inode *inode, struct cifs_fattr *fattr)
 	set_nlink(inode, fattr->cf_nlink);
 }
 
+void cifs_inode_to_fattr(struct inode *inode, struct cifs_fattr *fattr)
+{
+	struct cifsInodeInfo *cifs_i = CIFS_I(inode);
+
+	memset(fattr, 0, sizeof(*fattr));
+	fattr->cf_cifsattrs = cifs_i->cifsAttrs;
+	fattr->cf_uniqueid = cifs_i->uniqueid;
+	fattr->cf_eof = cifs_i->netfs.remote_i_size;
+	fattr->cf_bytes = (u64)inode->i_blocks << 9;
+	fattr->cf_createtime = cifs_i->createtime;
+	fattr->cf_uid = inode->i_uid;
+	fattr->cf_gid = inode->i_gid;
+	fattr->cf_mode = inode->i_mode;
+	fattr->cf_rdev = inode->i_rdev;
+	fattr->cf_nlink = inode->i_nlink;
+	fattr->cf_dtype = S_DT(inode->i_mode);
+	fattr->cf_atime = inode_get_atime(inode);
+	fattr->cf_mtime = inode_get_mtime(inode);
+	fattr->cf_ctime = inode_get_ctime(inode);
+	fattr->cf_cifstag = cifs_i->reparse_tag;
+}
+
 /* populate an inode with info from a cifs_fattr struct */
 int
 cifs_fattr_to_inode(struct inode *inode, struct cifs_fattr *fattr,
@@ -1169,6 +1190,24 @@ static inline bool is_inode_cache_good(struct inode *ino)
 	return ino && CIFS_CACHE_READ(CIFS_I(ino)) && CIFS_I(ino)->time != 0;
 }
 
+static bool cifs_inode_has_writable_handle(struct inode *inode)
+{
+	struct cifsInodeInfo *cifs_inode = CIFS_I(inode);
+	struct cifsFileInfo *open_file;
+	bool writable = false;
+
+	spin_lock(&cifs_inode->open_file_lock);
+	list_for_each_entry(open_file, &cifs_inode->openFileList, flist) {
+		if (OPEN_FMODE(open_file->f_flags) & FMODE_WRITE) {
+			writable = true;
+			break;
+		}
+	}
+	spin_unlock(&cifs_inode->open_file_lock);
+
+	return writable;
+}
+
 static int reparse_info_to_fattr(struct cifs_open_info_data *data,
 				 struct super_block *sb,
 				 const unsigned int xid,
@@ -2085,7 +2124,9 @@ static int __cifs_unlink(struct inode *dir, struct dentry *dentry, bool sillyren
 
 out_reval:
 	if (!rc && dentry->d_parent)
-		cifs_invalidate_cached_dir(tcon, dentry->d_parent);
+		cifs_invalidate_cached_dirent(tcon, dentry->d_parent,
+							    dentry->d_name.name,
+							    dentry->d_name.len);
 
 	if (inode) {
 		cifs_inode = CIFS_I(inode);
@@ -2276,6 +2317,7 @@ struct dentry *cifs_mkdir(struct mnt_idmap *idmap, struct inode *inode,
 	int rc = 0;
 	unsigned int xid;
 	struct cifs_sb_info *cifs_sb;
+	struct cached_fid *parent_cfid = NULL;
 	struct tcon_link *tlink;
 	struct cifs_tcon *tcon;
 	struct TCP_Server_Info *server;
@@ -2337,12 +2379,26 @@ struct dentry *cifs_mkdir(struct mnt_idmap *idmap, struct inode *inode,
 	/* TODO: skip this for smb2/smb3 */
 	rc = cifs_mkdir_qinfo(inode, direntry, mode, full_path, cifs_sb, tcon,
 			      xid);
+	if (!rc && d_inode(direntry) && direntry->d_parent &&
+	    server->dialect >= SMB30_PROT_ID &&
+	    !open_cached_dir_by_dentry(tcon, direntry->d_parent, &parent_cfid)) {
+		struct cifs_fattr fattr;
+
+		cifs_inode_to_fattr(d_inode(direntry), &fattr);
+		if (!update_dirent_in_cached_dir(parent_cfid,
+						  direntry->d_name.name,
+						  direntry->d_name.len,
+						  &fattr))
+			invalidate_cached_dir_contents(parent_cfid);
+	}
 mkdir_out:
 	/*
 	 * Force revalidate to get parent dir info when needed since cached
 	 * attributes are invalid now.
 	 */
 	CIFS_I(inode)->time = 0;
+	if (parent_cfid)
+		close_cached_dir(parent_cfid);
 	free_dentry_path(page);
 	free_xid(xid);
 	cifs_put_tlink(tlink);
@@ -2408,7 +2464,9 @@ int cifs_rmdir(struct inode *inode, struct dentry *direntry)
 		clear_nlink(d_inode(direntry));
 		spin_unlock(&d_inode(direntry)->i_lock);
 		if (direntry->d_parent)
-			cifs_invalidate_cached_dir(tcon, direntry->d_parent);
+			cifs_invalidate_cached_dirent(tcon, direntry->d_parent,
+							    direntry->d_name.name,
+							    direntry->d_name.len);
 	}
 
 	/* force revalidate to go get info when needed */
@@ -2518,15 +2576,18 @@ cifs_rename2(struct mnt_idmap *idmap, struct inode *source_dir,
 	     struct dentry *target_dentry, unsigned int flags)
 {
 	const char *from_name, *to_name;
+	const char *source_name, *target_name;
 	struct TCP_Server_Info *server;
 	void *page1, *page2;
 	struct cifs_sb_info *cifs_sb;
 	struct tcon_link *tlink;
 	struct cifs_tcon *tcon;
+	struct inode *source_inode;
 	bool rehash = false;
 	unsigned int xid;
 	int rc, tmprc;
 	int retry_count = 0;
+	unsigned int source_namelen, target_namelen;
 	FILE_UNIX_BASIC_INFO *info_buf_source = NULL;
 #ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY
 	FILE_UNIX_BASIC_INFO *info_buf_target;
@@ -2554,6 +2615,11 @@ cifs_rename2(struct mnt_idmap *idmap, struct inode *source_dir,
 	if (IS_ERR(tlink))
 		return PTR_ERR(tlink);
 	tcon = tlink_tcon(tlink);
+	source_inode = d_inode(source_dentry);
+	source_name = source_dentry->d_name.name;
+	source_namelen = source_dentry->d_name.len;
+	target_name = target_dentry->d_name.name;
+	target_namelen = target_dentry->d_name.len;
 	server = tcon->ses->server;
 
 	page1 = alloc_dentry_path();
@@ -2594,6 +2660,33 @@ cifs_rename2(struct mnt_idmap *idmap, struct inode *source_dir,
 
 	if (!rc)
 		rehash = false;
+
+	/* Update cached dirents after successful rename (before exit checks) */
+	if (!rc) {
+		struct cifs_fattr fattr;
+		struct cached_fid *target_cfid = NULL;
+
+		/* Invalidate source entry (no longer exists at old name) */
+		cifs_invalidate_cached_dirent(tcon, source_dentry->d_parent,
+					      source_name,
+					      source_namelen);
+
+		/* Upsert target entry with the renamed inode's attributes */
+		if (source_inode) {
+			cifs_inode_to_fattr(source_inode, &fattr);
+			if (!open_cached_dir_by_dentry(tcon,
+						       target_dentry->d_parent,
+						       &target_cfid)) {
+				if (!update_dirent_in_cached_dir(target_cfid,
+								  target_name,
+								  target_namelen,
+								  &fattr))
+					invalidate_cached_dir_contents(target_cfid);
+				close_cached_dir(target_cfid);
+			}
+		}
+	}
+
 	/*
 	 * No-replace is the natural behavior for CIFS, so skip unlink hacks.
 	 */
@@ -2689,13 +2782,6 @@ cifs_rename2(struct mnt_idmap *idmap, struct inode *source_dir,
 		}
 	}
 
-	/* force revalidate to go get info when needed */
-	if (!rc) {
-		cifs_invalidate_cached_dir(tcon, source_dentry->d_parent);
-		if (target_dentry->d_parent != source_dentry->d_parent)
-			cifs_invalidate_cached_dir(tcon, target_dentry->d_parent);
-	}
-
 	CIFS_I(source_dir)->time = CIFS_I(target_dir)->time = 0;
 
 cifs_rename_exit:
@@ -2715,13 +2801,14 @@ cifs_dentry_needs_reval(struct dentry *dentry)
 	struct inode *inode = d_inode(dentry);
 	struct cifsInodeInfo *cifs_i = CIFS_I(inode);
 	struct cifs_sb_info *cifs_sb = CIFS_SB(inode);
-	struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);
+	struct tcon_link *tlink;
+	struct cifs_tcon *tcon;
 	struct cached_fid *cfid = NULL;
+	bool retried_pending = false;
+	bool force_reval = cifs_i->time == 0;
 
 	if (test_bit(CIFS_INO_DELETE_PENDING, &cifs_i->flags))
 		return false;
-	if (cifs_i->time == 0)
-		return true;
 
 	if (CIFS_CACHE_READ(cifs_i))
 		return false;
@@ -2729,36 +2816,110 @@ cifs_dentry_needs_reval(struct dentry *dentry)
 	if (!lookupCacheEnabled)
 		return true;
 
+	tlink = cifs_sb_tlink(cifs_sb);
+	if (IS_ERR(tlink))
+		return true;
+	tcon = tlink_tcon(tlink);
+
 	if (!open_cached_dir_by_dentry(tcon, dentry->d_parent, &cfid)) {
 		if (cifs_i->time > cfid->time) {
 			close_cached_dir(cfid);
+			cifs_put_tlink(tlink);
 			return false;
 		}
 		close_cached_dir(cfid);
 	}
+
+	if (dentry->d_parent) {
+		struct cached_dirent_lookup_result lookup = {};
+		int rc;
+		int rc_wait;
+
+	retry_lookup:
+		cfid = NULL;
+		if (!open_cached_dir_by_dentry(tcon, dentry->d_parent, &cfid)) {
+			rc = lookup_cached_dir(cfid, dentry->d_name.name,
+							  dentry->d_name.len,
+							  &lookup);
+			if (rc == -ENOENT && !retried_pending) {
+				rc_wait = cifs_wait_for_pending_dcache(cfid,
+							     dentry->d_name.name,
+							     dentry->d_name.len);
+				if (rc_wait == -ETIMEDOUT)
+					cifs_dbg(FYI,
+						 "Timed out waiting for async dcache population of %pd\n",
+						 dentry);
+				else if (!rc_wait) {
+					close_cached_dir(cfid);
+					retried_pending = true;
+					goto retry_lookup;
+				}
+			}
+			close_cached_dir(cfid);
+			if (!rc && lookup.found && lookup.under_active_lease) {
+				if (cifs_inode_has_writable_handle(inode)) {
+					cifs_set_time(dentry, jiffies);
+					cifs_put_tlink(tlink);
+					return false;
+				}
+				rc = cifs_fattr_to_inode(inode, &lookup.fattr, false);
+				if (!rc) {
+					cifs_set_time(dentry, jiffies);
+					cifs_put_tlink(tlink);
+					return false;
+				}
+				if (rc != -ESTALE) {
+					cifs_put_tlink(tlink);
+					return true;
+				}
+			}
+		}
+	}
+
+	/*
+	 * Even when metadata is marked stale (time == 0), attempt the
+	 * cached-dir fast path above first; only force wire revalidation if
+	 * cache lookup/update did not satisfy this dentry.
+	 */
+	if (force_reval) {
+		cifs_put_tlink(tlink);
+		return true;
+	}
+
 	/*
 	 * depending on inode type, check if attribute caching disabled for
 	 * files or directories
 	 */
 	if (S_ISDIR(inode->i_mode)) {
-		if (!cifs_sb->ctx->acdirmax)
+		if (!cifs_sb->ctx->acdirmax) {
+			cifs_put_tlink(tlink);
 			return true;
+		}
 		if (!time_in_range(jiffies, cifs_i->time,
-				   cifs_i->time + cifs_sb->ctx->acdirmax))
+				   cifs_i->time + cifs_sb->ctx->acdirmax)) {
+			cifs_put_tlink(tlink);
 			return true;
+		}
 	} else { /* file */
-		if (!cifs_sb->ctx->acregmax)
+		if (!cifs_sb->ctx->acregmax) {
+			cifs_put_tlink(tlink);
 			return true;
+		}
 		if (!time_in_range(jiffies, cifs_i->time,
-				   cifs_i->time + cifs_sb->ctx->acregmax))
+				   cifs_i->time + cifs_sb->ctx->acregmax)) {
+			cifs_put_tlink(tlink);
 			return true;
+		}
 	}
 
 	/* hardlinked files w/ noserverino get "special" treatment */
 	if (!(cifs_sb_flags(cifs_sb) & CIFS_MOUNT_SERVER_INUM) &&
-	    S_ISREG(inode->i_mode) && inode->i_nlink != 1)
+	    S_ISREG(inode->i_mode) && inode->i_nlink != 1) {
+		cifs_put_tlink(tlink);
 		return true;
+	}
 
+	cifs_put_tlink(tlink);
 	return false;
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 12/19] cifs: register a shrinker to manage cached_dirents
  2026-05-01 11:20 [PATCH v4 01/19] cifs: change_conf needs to be called for session setup nspmangalore
                   ` (9 preceding siblings ...)
  2026-05-01 11:20 ` [PATCH v4 11/19] cifs: in place changes to cached_dirents when dir lease is held nspmangalore
@ 2026-05-01 11:20 ` nspmangalore
  2026-05-01 11:20 ` [PATCH v4 13/19] cifs: option to disable time-based eviction of cache nspmangalore
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: nspmangalore @ 2026-05-01 11:20 UTC (permalink / raw)
  To: linux-cifs, smfrench, pc, bharathsm, dhowells, henrique.carvalho,
	ematsumiya
  Cc: Shyam Prasad N

From: Shyam Prasad N <sprasad@microsoft.com>

Since the cached_dirents are now backed by folioq, we no longer need a
timed cleanup of these cfids. This change registers a shrinker for the
dircache with the mm layer. If mm needs to free up memory or flush the
cache, it can tell cifs to do so through the shrinker interface.
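
As a rough illustration of the page-based accounting the shrinker
callbacks use, here is a standalone sketch (the 4096-byte PAGE_SIZE is
an assumption for the example only):

#include <stdint.h>
#include <stdio.h>

#define EXAMPLE_PAGE_SIZE 4096UL

/* report the reclaimable size in pages, rounded up (count_objects side) */
static unsigned long pages_for_bytes(uint64_t bytes_used)
{
	return (bytes_used + EXAMPLE_PAGE_SIZE - 1) / EXAMPLE_PAGE_SIZE;
}

/* translate a requested page count back into a byte budget (scan side) */
static uint64_t bytes_for_pages(unsigned long nr_to_scan)
{
	return (uint64_t)nr_to_scan * EXAMPLE_PAGE_SIZE;
}

int main(void)
{
	uint64_t cached = 150000;	/* bytes currently cached */

	printf("count reports %lu pages\n", pages_for_bytes(cached));
	printf("a scan of 16 pages targets %llu bytes\n",
	       (unsigned long long)bytes_for_pages(16));
	return 0;
}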

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
---
 fs/smb/client/cifsfs.c | 86 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 80 insertions(+), 6 deletions(-)

diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index 32d0305a1239a..ee5de358e27f8 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -29,6 +29,7 @@
 #include <linux/uuid.h>
 #include <linux/xattr.h>
 #include <linux/mm.h>
+#include <linux/shrinker.h>
 #include <linux/key-type.h>
 #include <uapi/linux/magic.h>
 #include <net/ipv6.h>
@@ -123,27 +124,86 @@ MODULE_PARM_DESC(dir_cache_timeout, "Number of seconds to cache directory conten
 				 "Range: 1 to 65000 seconds, 0 to disable caching dir contents");
 /* Module-wide total cached dirents (in bytes) across all tcons */
 atomic64_t cifs_dircache_bytes_used = ATOMIC64_INIT(0);
+static struct shrinker *cifs_dircache_shrinker;
 
 /*
  * Write-only module parameter to drop all cached directory entries across
  * all CIFS mounts. Echo a non-zero value to trigger.
  */
-static void cifs_drop_all_dir_caches(void)
+static unsigned long cifs_drop_all_dir_caches(bool wait, unsigned long nr_to_free)
 {
 	struct TCP_Server_Info *server;
 	struct cifs_ses *ses;
 	struct cifs_tcon *tcon;
+	u64 before, after, freed_bytes = 0;
+	u64 target_bytes;
+
+	before = atomic64_read(&cifs_dircache_bytes_used);
+	if (nr_to_free == ULONG_MAX)
+		target_bytes = U64_MAX;
+	else
+		target_bytes = (u64)nr_to_free * PAGE_SIZE;
 
 	spin_lock(&cifs_tcp_ses_lock);
 	list_for_each_entry(server, &cifs_tcp_ses_list, tcp_ses_list) {
 		list_for_each_entry(ses, &server->smb_ses_list, smb_ses_list) {
 			if (cifs_ses_exiting(ses))
 				continue;
-			list_for_each_entry(tcon, &ses->tcon_list, tcon_list)
-				invalidate_all_cached_dirs(tcon);
+			list_for_each_entry(tcon, &ses->tcon_list, tcon_list) {
+				invalidate_all_cached_dirs_nowait(tcon);
+				after = atomic64_read(&cifs_dircache_bytes_used);
+				if (after < before)
+					freed_bytes = before - after;
+				if (freed_bytes >= target_bytes)
+					goto out_unlock;
+			}
 		}
 	}
+out_unlock:
 	spin_unlock(&cifs_tcp_ses_lock);
+
+	if (wait)
+		flush_workqueue(cfid_put_wq);
+
+	after = atomic64_read(&cifs_dircache_bytes_used);
+	if (after >= before)
+		return 0;
+	return (unsigned long)(before - after);
+}
+
+static unsigned long cifs_dircache_shrinker_count(struct shrinker *shrink,
+						   struct shrink_control *sc)
+{
+	u64 bytes = atomic64_read(&cifs_dircache_bytes_used);
+
+	(void)shrink;
+	(void)sc;
+
+	return DIV_ROUND_UP_ULL(bytes, PAGE_SIZE);
+}
+
+static unsigned long cifs_dircache_shrinker_scan(struct shrinker *shrink,
+						  struct shrink_control *sc)
+{
+	unsigned long freed_bytes;
+
+	(void)shrink;
+
+	if (!sc->nr_to_scan)
+		return 0;
+
+	if (!atomic64_read(&cifs_dircache_bytes_used))
+		return SHRINK_STOP;
+
+	/*
+	 * Shrinker scan can run from reclaim context, so avoid synchronously
+	 * flushing worker queues here to prevent long stalls/deadlocks.
+	 */
+	freed_bytes = cifs_drop_all_dir_caches(false, max_t(unsigned long, 1, sc->nr_to_scan));
+	if (!freed_bytes)
+		return SHRINK_STOP;
+
+	return DIV_ROUND_UP_ULL(freed_bytes, PAGE_SIZE);
 }
 
 static int cifs_param_set_drop_dir_cache(const char *val, const struct kernel_param *kp)
@@ -154,7 +214,7 @@ static int cifs_param_set_drop_dir_cache(const char *val, const struct kernel_pa
 	if (rc)
 		return rc;
 	if (bv)
-		cifs_drop_all_dir_caches();
+		cifs_drop_all_dir_caches(true, ULONG_MAX);
 	return 0;
 }
 
@@ -2038,10 +2098,19 @@ init_cifs(void)
 	if (rc)
 		goto out_destroy_mids;
 
+	cifs_dircache_shrinker = shrinker_alloc(0, "cifs-dircache");
+	if (!cifs_dircache_shrinker) {
+		rc = -ENOMEM;
+		goto out_destroy_request_bufs;
+	}
+	cifs_dircache_shrinker->count_objects = cifs_dircache_shrinker_count;
+	cifs_dircache_shrinker->scan_objects = cifs_dircache_shrinker_scan;
+	shrinker_register(cifs_dircache_shrinker);
+
 #ifdef CONFIG_CIFS_DFS_UPCALL
 	rc = dfs_cache_init();
 	if (rc)
-		goto out_destroy_request_bufs;
+		goto out_free_dircache_shrinker;
 #endif /* CONFIG_CIFS_DFS_UPCALL */
 #ifdef CONFIG_CIFS_UPCALL
 	rc = init_cifs_spnego();
@@ -2083,8 +2152,11 @@ init_cifs(void)
 #endif
 #ifdef CONFIG_CIFS_DFS_UPCALL
 	dfs_cache_destroy();
-out_destroy_request_bufs:
+out_free_dircache_shrinker:
 #endif
+	shrinker_free(cifs_dircache_shrinker);
+	cifs_dircache_shrinker = NULL;
+out_destroy_request_bufs:
 	cifs_destroy_request_bufs();
 out_destroy_mids:
 	destroy_mids();
@@ -2117,6 +2189,8 @@ exit_cifs(void)
 	cifs_dbg(NOISY, "exit_smb3\n");
 	unregister_filesystem(&cifs_fs_type);
 	unregister_filesystem(&smb3_fs_type);
+	shrinker_free(cifs_dircache_shrinker);
+	cifs_dircache_shrinker = NULL;
 	cifs_release_automount_timer();
 	exit_cifs_idmap();
 #ifdef CONFIG_CIFS_SWN_UPCALL
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 13/19] cifs: option to disable time-based eviction of cache
  2026-05-01 11:20 [PATCH v4 01/19] cifs: change_conf needs to be called for session setup nspmangalore
                   ` (10 preceding siblings ...)
  2026-05-01 11:20 ` [PATCH v4 12/19] cifs: register a shrinker to manage cached_dirents nspmangalore
@ 2026-05-01 11:20 ` nspmangalore
  2026-05-01 15:47   ` Steve French
  2026-05-01 11:20 ` [PATCH v4 14/19] cifs: option to set unlimited number of cached dirs nspmangalore
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 22+ messages in thread
From: nspmangalore @ 2026-05-01 11:20 UTC (permalink / raw)
  To: linux-cifs, smfrench, pc, bharathsm, dhowells, henrique.carvalho,
	ematsumiya
  Cc: Shyam Prasad N

From: Shyam Prasad N <sprasad@microsoft.com>

Today there is no way to disable time-based eviction of the dir cache:
dir_cache_timeout = 0 meant the dir cache was freed on the next
laundromat scan, and we already have nohandlecache to disable the dir
cache entirely.

This change redefines dir_cache_timeout = 0 to mean an unlimited
timeout. Shrinker-based eviction is still possible.
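
A standalone sketch of the aging decision this relies on (mirroring the
laundromat check in spirit; the numbers are arbitrary):

#include <stdbool.h>
#include <stdio.h>

/* a timeout of 0 short-circuits, so entries are never aged out on time */
static bool should_expire(unsigned int timeout_secs,
			  unsigned long now, unsigned long last_access)
{
	return timeout_secs && last_access &&
	       now > last_access + timeout_secs;
}

int main(void)
{
	printf("timeout=30: %d\n", should_expire(30, 100, 50));	/* 1: expires */
	printf("timeout=0:  %d\n", should_expire(0, 100, 50));	/* 0: never expires */
	return 0;
}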

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
---
 fs/smb/client/cifsfs.c  | 2 +-
 fs/smb/client/connect.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index ee5de358e27f8..79a6a4c297ee3 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -121,7 +121,7 @@ MODULE_PARM_DESC(cifs_max_pending, "Simultaneous requests to server for "
 unsigned int dir_cache_timeout = 30;
 module_param(dir_cache_timeout, uint, 0644);
 MODULE_PARM_DESC(dir_cache_timeout, "Number of seconds to cache directory contents for which we have a lease. Default: 30 "
-				 "Range: 1 to 65000 seconds, 0 to disable caching dir contents");
+				 "Range: 0 to 65000 seconds. 0 disables timeout-based cleanup (cached dirs persist until explicitly invalidated).");
 /* Module-wide total cached dirents (in bytes) across all tcons */
 atomic64_t cifs_dircache_bytes_used = ATOMIC64_INIT(0);
 static struct shrinker *cifs_dircache_shrinker;
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index 69b38f0ccf2b2..849c16c538353 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -2698,7 +2698,7 @@ cifs_get_tcon(struct cifs_ses *ses, struct smb3_fs_context *ctx)
 
 	if (ses->server->dialect >= SMB20_PROT_ID &&
 	    (ses->server->capabilities & SMB2_GLOBAL_CAP_DIRECTORY_LEASING))
-		nohandlecache = ctx->nohandlecache || !dir_cache_timeout;
+		nohandlecache = ctx->nohandlecache;
 	else
 		nohandlecache = true;
 	tcon = tcon_info_alloc(!nohandlecache, netfs_trace_tcon_ref_new);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 14/19] cifs: option to set unlimited number of cached dirs
  2026-05-01 11:20 [PATCH v4 01/19] cifs: change_conf needs to be called for session setup nspmangalore
                   ` (11 preceding siblings ...)
  2026-05-01 11:20 ` [PATCH v4 13/19] cifs: option to disable time-based eviction of cache nspmangalore
@ 2026-05-01 11:20 ` nspmangalore
  2026-05-01 11:20 ` [PATCH v4 15/19] cifs: allow dcache population to happen asynchronously nspmangalore
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: nspmangalore @ 2026-05-01 11:20 UTC (permalink / raw)
  To: linux-cifs, smfrench, pc, bharathsm, dhowells, henrique.carvalho,
	ematsumiya
  Cc: Shyam Prasad N

From: Shyam Prasad N <sprasad@microsoft.com>

Today we can control the number of cached dirs on the system using the
mount option max_cached_dirs. The default value is 16.

This change allows setting this option to the special value 0, which
means an unlimited number of cached dirs. The shrinker can still evict
cached dirs.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
---
 fs/smb/client/cached_dir.c |  2 +-
 fs/smb/client/fs_context.c | 10 +++++-----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
index 7cfbe50db66f5..863b8d666ea95 100644
--- a/fs/smb/client/cached_dir.c
+++ b/fs/smb/client/cached_dir.c
@@ -1256,7 +1256,7 @@ static struct cached_fid *find_or_create_cached_dir(struct cached_fids *cfids,
 	if (lookup_only) {
 		return NULL;
 	}
-	if (cfids->num_entries >= max_cached_dirs) {
+	if (max_cached_dirs && cfids->num_entries >= max_cached_dirs) {
 		return NULL;
 	}
 	cfid = init_cached_dir(path);
diff --git a/fs/smb/client/fs_context.c b/fs/smb/client/fs_context.c
index a46764c247107..3eff94df70d45 100644
--- a/fs/smb/client/fs_context.c
+++ b/fs/smb/client/fs_context.c
@@ -1493,12 +1493,12 @@ static int smb3_fs_context_parse_param(struct fs_context *fc,
 		ctx->max_channels = result.uint_32;
 		break;
 	case Opt_max_cached_dirs:
-		if (result.uint_32 < 1) {
-			cifs_errorf(fc, "%s: Invalid max_cached_dirs, needs to be 1 or more\n",
-				    __func__);
-			goto cifs_parse_mount_err;
+		if (result.uint_32 > 0) {
+			ctx->max_cached_dirs = result.uint_32;
+		} else {
+			/* 0 means unlimited */
+			ctx->max_cached_dirs = 0;
 		}
-		ctx->max_cached_dirs = result.uint_32;
 		break;
 	case Opt_handletimeout:
 		ctx->handle_timeout = result.uint_32;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 15/19] cifs: allow dcache population to happen asynchronously
  2026-05-01 11:20 [PATCH v4 01/19] cifs: change_conf needs to be called for session setup nspmangalore
                   ` (12 preceding siblings ...)
  2026-05-01 11:20 ` [PATCH v4 14/19] cifs: option to set unlimited number of cached dirs nspmangalore
@ 2026-05-01 11:20 ` nspmangalore
  2026-05-01 11:20 ` [PATCH v4 16/19] cifs: trace points for cached_dir operations nspmangalore
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: nspmangalore @ 2026-05-01 11:20 UTC (permalink / raw)
  To: linux-cifs, smfrench, pc, bharathsm, dhowells, henrique.carvalho,
	ematsumiya
  Cc: Shyam Prasad N

From: Shyam Prasad N <sprasad@microsoft.com>

Today cifs_readdir populates the dcache inline whenever it emits
dentries, which is a performance bottleneck for readdir.

This change makes dcache population asynchronous and parallel by
offloading it to workqueue workers. It introduces a new flag
(dcache_populate_async) to enable this, and lets reval and lookup wait
for pending dentry population instead.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
---
 fs/smb/client/cifs_fs_sb.h |  1 +
 fs/smb/client/cifsfs.c     | 22 +++++++++-
 fs/smb/client/cifsglob.h   |  2 +
 fs/smb/client/readdir.c    | 88 +++++++++++++++++++++++++++++++++++++-
 4 files changed, 110 insertions(+), 3 deletions(-)

diff --git a/fs/smb/client/cifs_fs_sb.h b/fs/smb/client/cifs_fs_sb.h
index 84e7e366b0ff4..0efc1483f3ab4 100644
--- a/fs/smb/client/cifs_fs_sb.h
+++ b/fs/smb/client/cifs_fs_sb.h
@@ -71,5 +71,6 @@ struct cifs_sb_info {
 	 * Available once the mount has completed.
 	 */
 	struct dentry *root;
+	bool sb_dying:1;  /* superblock is being destroyed, skip dcache ops */
 };
 #endif				/* _CIFS_FS_SB_H */
diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index 79a6a4c297ee3..f8f3b27ba60bc 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -69,6 +69,7 @@ bool disable_legacy_dialects; /* false by default */
 bool enable_gcm_256 = true;
 bool require_gcm_256; /* false by default */
 bool enable_negotiate_signing; /* false by default */
+bool dcache_populate_async; /* false by default */
 unsigned int global_secflags = CIFSSEC_DEF;
 /* unsigned int ntlmv2_support = 0; */
 
@@ -241,6 +242,9 @@ MODULE_PARM_DESC(require_gcm_256, "Require strongest (256 bit) GCM encryption. D
 module_param(enable_negotiate_signing, bool, 0644);
 MODULE_PARM_DESC(enable_negotiate_signing, "Enable negotiating packet signing algorithm with server. Default: n/N/0");
 
+module_param(dcache_populate_async, bool, 0644);
+MODULE_PARM_DESC(dcache_populate_async, "Enable asynchronous dcache population during readdir to improve performance on large directories. Default: n/N/0");
+
 module_param(disable_legacy_dialects, bool, 0644);
 MODULE_PARM_DESC(disable_legacy_dialects, "To improve security it may be "
 				  "helpful to restrict the ability to "
@@ -395,6 +399,11 @@ static void cifs_kill_sb(struct super_block *sb)
 	 * and close all deferred file handles before we kill the sb.
 	 */
 	if (cifs_sb->root) {
+		/* Mark superblock as dying to skip expensive dcache ops in flight */
+		cifs_sb->sb_dying = true;
+		/* Wait for dcache work to complete and clean up */
+		flush_workqueue(cifs_dcache_wq);
+
 		close_all_cached_dirs(cifs_sb);
 		cifs_close_all_deferred_files_sb(cifs_sb);
 
@@ -2082,9 +2091,17 @@ init_cifs(void)
 		goto out_destroy_serverclose_wq;
 	}
 
+	/* WQ_UNBOUND allows dcache work to run on any CPU for parallelism */
+	cifs_dcache_wq = alloc_workqueue("cifs_dcache",
+					 WQ_UNBOUND | WQ_FREEZABLE, 0);
+	if (!cifs_dcache_wq) {
+		rc = -ENOMEM;
+		goto out_destroy_cfid_put_wq;
+	}
+
 	rc = cifs_init_inodecache();
 	if (rc)
-		goto out_destroy_cfid_put_wq;
+		goto out_destroy_dcache_wq;
 
 	rc = cifs_init_netfs();
 	if (rc)
@@ -2164,6 +2181,8 @@ init_cifs(void)
 	cifs_destroy_netfs();
 out_destroy_inodecache:
 	cifs_destroy_inodecache();
+out_destroy_dcache_wq:
+	destroy_workqueue(cifs_dcache_wq);
 out_destroy_cfid_put_wq:
 	destroy_workqueue(cfid_put_wq);
 out_destroy_serverclose_wq:
@@ -2212,6 +2231,7 @@ exit_cifs(void)
 	destroy_workqueue(fileinfo_put_wq);
 	destroy_workqueue(serverclose_wq);
 	destroy_workqueue(cfid_put_wq);
+	destroy_workqueue(cifs_dcache_wq);
 	destroy_workqueue(cifsiod_wq);
 	cifs_proc_clean();
 }
diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h
index 2a3fad071564a..278bf2bf11e97 100644
--- a/fs/smb/client/cifsglob.h
+++ b/fs/smb/client/cifsglob.h
@@ -2144,6 +2144,7 @@ extern unsigned int sign_CIFS_PDUs;  /* enable smb packet signing */
 extern bool enable_gcm_256; /* allow optional negotiate of strongest signing (aes-gcm-256) */
 extern bool require_gcm_256; /* require use of strongest signing (aes-gcm-256) */
 extern bool enable_negotiate_signing; /* request use of faster (GMAC) signing if available */
+extern bool dcache_populate_async; /* enable async dcache population during readdir */
 extern bool linuxExtEnabled;/*enable Linux/Unix CIFS extensions*/
 extern unsigned int CIFSMaxBufSize;  /* max size not including hdr */
 extern unsigned int cifs_min_rcv;    /* min size of big ntwrk buf pool */
@@ -2165,6 +2166,7 @@ extern struct workqueue_struct *cifsoplockd_wq;
 extern struct workqueue_struct *deferredclose_wq;
 extern struct workqueue_struct *serverclose_wq;
 extern struct workqueue_struct *cfid_put_wq;
+extern struct workqueue_struct *cifs_dcache_wq;
 extern __u32 cifs_lock_secret;
 
 extern mempool_t *cifs_sm_req_poolp;
diff --git a/fs/smb/client/readdir.c b/fs/smb/client/readdir.c
index ef81fdb503c0a..a1202a82be4e8 100644
--- a/fs/smb/client/readdir.c
+++ b/fs/smb/client/readdir.c
@@ -13,6 +13,7 @@
 #include <linux/pagemap.h>
 #include <linux/slab.h>
 #include <linux/stat.h>
+#include <linux/workqueue.h>
 #include "cifsglob.h"
 #include "cifsproto.h"
 #include "cifs_unicode.h"
@@ -24,6 +25,19 @@
 #include "cached_dir.h"
 #include "reparse.h"
 
+/* Workqueue for async dcache population */
+struct workqueue_struct *cifs_dcache_wq;
+
+/* Work item for async dcache population */
+struct cifs_dcache_work {
+	struct work_struct work;
+	struct dentry *parent;		/* dget() reference to parent dir */
+	char *name;			/* kstrdup() copy of filename */
+	unsigned int namelen;
+	struct cifs_fattr fattr;	/* Copy of attributes */
+	struct cached_fid *cfid;	/* ref-counted cfid for cifs_complete_pending_dcache */
+};
+
 /*
  * To be safe - for UCS to UTF-8 with strings loaded with the rare long
  * characters alloc more to account for such multibyte target UTF-8
@@ -171,6 +185,65 @@ cifs_prime_dcache(struct dentry *parent, struct qstr *name,
 	dput(dentry);
 }
 
+/*
+ * Async dcache population work handler.
+ * Delegates to cifs_prime_dcache then signals completion to unblock waiters.
+ */
+static void cifs_dcache_work_handler(struct work_struct *work)
+{
+	struct cifs_dcache_work *dcache_work =
+		container_of(work, struct cifs_dcache_work, work);
+	struct qstr name = QSTR_INIT(dcache_work->name, dcache_work->namelen);
+	struct cifs_sb_info *cifs_sb = CIFS_SB(dcache_work->parent->d_sb);
+
+	cifs_dbg(FYI, "%s: async dcache for %s\n", __func__, name.name);
+
+	/* Skip expensive dcache operations if superblock is being torn down */
+	if (!cifs_sb->sb_dying) {
+		cifs_prime_dcache(dcache_work->parent, &name, &dcache_work->fattr);
+		cifs_complete_pending_dcache(dcache_work->cfid, dcache_work->name,
+					     dcache_work->namelen);
+	}
+	close_cached_dir(dcache_work->cfid);
+	dput(dcache_work->parent);
+	kfree(dcache_work->name);
+	kfree(dcache_work);
+}
+
+/*
+ * Queue async dcache population work.
+ * Returns true if work was queued, false if sync fallback needed.
+ */
+static bool cifs_queue_dcache_work(struct dentry *parent, const char *name,
+				    unsigned int namelen, struct cifs_fattr *fattr,
+				    struct cached_fid *cfid)
+{
+	struct cifs_dcache_work *work;
+
+	if (!cfid)
+		return false;
+
+	work = kzalloc(sizeof(*work), GFP_KERNEL);
+	if (!work)
+		return false;
+
+	work->name = kstrndup(name, namelen, GFP_KERNEL);
+	if (!work->name) {
+		kfree(work);
+		return false;
+	}
+
+	work->parent = dget(parent);
+	work->namelen = namelen;
+	memcpy(&work->fattr, fattr, sizeof(work->fattr));
+	kref_get(&cfid->refcount);
+	work->cfid = cfid;
+
+	INIT_WORK(&work->work, cifs_dcache_work_handler);
+	queue_work(cifs_dcache_wq, &work->work);
+	return true;
+}
+
 static void
 cifs_fill_common_info(struct cifs_fattr *fattr, struct cifs_sb_info *cifs_sb)
 {
@@ -923,8 +996,19 @@ static int cifs_filldir(char *find_entry, struct file *file,
 		 */
 		fattr.cf_flags |= CIFS_FATTR_NEED_REVAL;
 
-	add_to_cached_dir(cfid, ctx, name.name, name.len, &fattr, file);
-	cifs_prime_dcache(file_dentry(file), &name, &fattr);
+	/* queue async dcache population if enabled; fallback to sync if disabled or queueing fails */
+	bool cached = add_to_cached_dir(cfid, ctx, name.name, name.len, &fattr, file);
+
+	if (dcache_populate_async && cached &&
+	    cifs_queue_dcache_work(file_dentry(file), name.name, name.len,
+				  &fattr, cfid)) {
+		/* Async: handler will call cifs_prime_dcache + cifs_complete_pending_dcache */
+		cifs_dbg(FYI, "Queued async dcache population for %.*s\n", name.len, name.name);
+	} else {
+		cifs_prime_dcache(file_dentry(file), &name, &fattr);
+		if (cached)
+			cifs_complete_pending_dcache(cfid, name.name, name.len);
+	}
 
 	return !cifs_dir_emit(ctx, name.name, name.len, &fattr);
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 16/19] cifs: trace points for cached_dir operations
  2026-05-01 11:20 [PATCH v4 01/19] cifs: change_conf needs to be called for session setup nspmangalore
                   ` (13 preceding siblings ...)
  2026-05-01 11:20 ` [PATCH v4 15/19] cifs: allow dcache population to happen asynchronously nspmangalore
@ 2026-05-01 11:20 ` nspmangalore
  2026-05-01 11:20 ` [PATCH v4 17/19] cifs: discard functions to ensure that mid callbacks get called nspmangalore
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: nspmangalore @ 2026-05-01 11:20 UTC (permalink / raw)
  To: linux-cifs, smfrench, pc, bharathsm, dhowells, henrique.carvalho,
	ematsumiya
  Cc: Shyam Prasad N

From: Shyam Prasad N <sprasad@microsoft.com>

This change introduces ftrace tracepoints for cached_dir operations,
intended to aid debugging of the cached_dir code.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
---
 fs/smb/client/cached_dir.c | 21 ++++++++++-
 fs/smb/client/inode.c      | 13 +++++++
 fs/smb/client/trace.h      | 73 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
index 863b8d666ea95..6fb88c4c97dc1 100644
--- a/fs/smb/client/cached_dir.c
+++ b/fs/smb/client/cached_dir.c
@@ -1011,6 +1011,7 @@ bool add_to_cached_dir(struct cached_fid *cfid,
 		atomic64_sub(-bytes_diff, &cfid->cfids->total_dirents_bytes);
 		atomic64_sub(-bytes_diff, &cifs_dircache_bytes_used);
 	}
+	trace_smb3_add_to_cached_dir(cfid, name, namelen, added ? 0 : -1);
 
 
 	return added;
@@ -1041,6 +1042,9 @@ void complete_cached_dir(struct cached_fid *cfid,
 	cde = &cfid->dirents;
 	mutex_lock(&cfid->dirents.de_mutex);
 	finished_cached_dirents_count(cde, ctx, file);
+	trace_smb3_complete_cached_dir(cfid, ctx->pos, cde->pos,
+					 cde->is_valid,
+					 cde->is_failed);
 	mutex_unlock(&cfid->dirents.de_mutex);
 }
 
@@ -1076,12 +1080,14 @@ int lookup_cached_dir(struct cached_fid *cfid,
 	entry = lookup_cached_dirent_entry_locked(&cfid->dirents, name, namelen);
 	if (!entry || !entry->dirent) {
 		mutex_unlock(&cfid->dirents.de_mutex);
+		trace_smb3_lookup_cached_dir(cfid, name, namelen, -ENOENT);
 		return -ENOENT;
 	}
 
 	dirent = entry->dirent;
 	if (dirent->tombstone) {
 		mutex_unlock(&cfid->dirents.de_mutex);
+		trace_smb3_lookup_cached_dir(cfid, name, namelen, -ENOENT);
 		return -ENOENT;
 	}
 
@@ -1089,6 +1095,7 @@ int lookup_cached_dir(struct cached_fid *cfid,
 	memcpy(&result->fattr, &dirent->fattr, sizeof(result->fattr));
 
 	mutex_unlock(&cfid->dirents.de_mutex);
+	trace_smb3_lookup_cached_dir(cfid, name, namelen, 0);
 	return 0;
 }
 
@@ -1126,6 +1133,8 @@ bool update_dirent_in_cached_dir(struct cached_fid *cfid,
 	updated = update_cached_dirent_locked(&cfid->dirents, name,
 						      namelen, fattr);
 	mutex_unlock(&cfid->dirents.de_mutex);
+	trace_smb3_update_dirent_in_cached_dir(cfid, name, namelen,
+					      updated ? 0 : -ENOENT);
 	return updated;
 }
 
@@ -1185,6 +1194,7 @@ void cifs_complete_pending_dcache(struct cached_fid *cfid,
 	mutex_unlock(&cfid->dirents.de_mutex);
 	cifs_dbg(FYI, "Dcache population of %.*s. status: %d\n",
 					namelen, name, ret);
+	trace_smb3_dcache_complete(cfid, name, namelen, ret);
 }
 
 /*
@@ -1226,6 +1236,7 @@ int cifs_wait_for_pending_dcache(struct cached_fid *cfid,
 		}
 	}
 
+	trace_smb3_dcache_wait(cfid, name, namelen, ret);
 	return ret;
 }
 
@@ -1399,6 +1410,7 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
 	if (cfid == NULL) {
 		spin_unlock(&cfids->cfid_list_lock);
 		kfree(utf16_path);
+		trace_smb3_open_cached_dir(NULL, path, strlen(path), -ENOENT);
 		return -ENOENT;
 	}
 	spin_unlock(&cfids->cfid_list_lock);
@@ -1637,6 +1649,7 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
 		*ret_cfid = cfid;
 		atomic_inc(&tcon->num_remote_opens);
 	}
+	trace_smb3_open_cached_dir(cfid, path, strlen(path), rc);
 	kfree(utf16_path);
 
 	if (is_replayable_error(rc) &&
@@ -1651,7 +1664,6 @@ int open_cached_dir_by_dentry(struct cifs_tcon *tcon,
 			      struct cached_fid **ret_cfid)
 {
 	struct cached_fid *cfid;
-	struct cached_fid *trace_cfid = NULL;
 	struct cached_fids *cfids = tcon->cfids;
 	int rc = -ENOENT;
 
@@ -1667,6 +1679,7 @@ int open_cached_dir_by_dentry(struct cifs_tcon *tcon,
 			spin_lock(&cfid->cfid_lock);
 			if (!is_valid_cached_dir(cfid)) {
 				spin_unlock(&cfid->cfid_lock);
+				rc = -ENOENT;
 				break;
 			}
 			cifs_dbg(FYI, "found a cached file handle by dentry\n");
@@ -1677,10 +1690,15 @@ int open_cached_dir_by_dentry(struct cifs_tcon *tcon,
 			trace_cfid = cfid;
 			spin_unlock(&cfid->cfid_lock);
 			spin_unlock(&cfids->cfid_list_lock);
+			trace_smb3_open_cached_dir_by_dentry(cfid, dentry->d_name.name,
+							 dentry->d_name.len, 0);
 			return rc;
 		}
 	}
 	spin_unlock(&cfids->cfid_list_lock);
+	trace_smb3_open_cached_dir_by_dentry(NULL, dentry->d_name.name,
+					     dentry->d_name.len, rc);
+
 	return rc;
 }
 
@@ -1982,6 +2000,7 @@ static void free_cached_dir(struct cached_fid *cfid)
 
 	WARN_ON(work_pending(&cfid->close_work));
 	WARN_ON(work_pending(&cfid->put_work));
+	trace_smb3_free_cached_dir(cfid, cfid->path, strlen(cfid->path), 0);
 
 
 	dput(cfid->dentry);
diff --git a/fs/smb/client/inode.c b/fs/smb/client/inode.c
index ecc92e5c7f7b6..08cf0cfcda89b 100644
--- a/fs/smb/client/inode.c
+++ b/fs/smb/client/inode.c
@@ -2859,20 +2859,31 @@ cifs_dentry_needs_reval(struct dentry *dentry)
 			if (!rc && lookup.found && lookup.under_active_lease) {
 				if (cifs_inode_has_writable_handle(inode)) {
 					cifs_set_time(dentry, jiffies);
+					trace_smb3_dcache_revalidate(cfid,
+							 dentry->d_name.name,
+							     dentry->d_name.len, 0);
 					cifs_put_tlink(tlink);
 					return false;
 				}
 				rc = cifs_fattr_to_inode(inode, &lookup.fattr, false);
 				if (!rc) {
 					cifs_set_time(dentry, jiffies);
+					trace_smb3_dcache_revalidate(cfid,
+							 dentry->d_name.name,
+							     dentry->d_name.len, 0);
 					cifs_put_tlink(tlink);
 					return false;
 				}
 				if (rc != -ESTALE) {
+					trace_smb3_dcache_revalidate(cfid,
+							 dentry->d_name.name,
+							     dentry->d_name.len, rc);
 					cifs_put_tlink(tlink);
 					return true;
 				}
 			}
+			trace_smb3_dcache_revalidate(cfid, dentry->d_name.name,
+						     dentry->d_name.len, rc);
 		}
 	}
 
@@ -2882,6 +2893,8 @@ cifs_dentry_needs_reval(struct dentry *dentry)
 	 * cache lookup/update did not satisfy this dentry.
 	 */
 	if (force_reval) {
+		trace_smb3_dcache_revalidate(cfid, dentry->d_name.name,
+					     dentry->d_name.len, -1);
 		cifs_put_tlink(tlink);
 		return true;
 	}
diff --git a/fs/smb/client/trace.h b/fs/smb/client/trace.h
index 54ee1317c5b12..9907a89ee4001 100644
--- a/fs/smb/client/trace.h
+++ b/fs/smb/client/trace.h
@@ -1813,6 +1813,79 @@ TRACE_EVENT(smb3_eio,
 		      __entry->info, __entry->info2)
 	    );
 
+/*
+ * Trace events for async directory cache population work
+ */
+DECLARE_EVENT_CLASS(smb3_dcache,
+	TP_PROTO(const void *cfid,
+		const char *name,
+		int namelen,
+		int result),
+	TP_ARGS(cfid, name, namelen, result),
+	TP_STRUCT__entry(
+		__field(const void *, cfid)
+		__string(name, name)
+		__field(int, namelen)
+		__field(int, result)
+	),
+	TP_fast_assign(
+		__entry->cfid = cfid;
+		__assign_str(name);
+		__entry->namelen = namelen;
+		__entry->result = result;
+	),
+	TP_printk("cfid=%p name=%.*s result=%d",
+		__entry->cfid,
+		__entry->namelen, __get_str(name), __entry->result)
+);
+
+#define DEFINE_SMB3_DCACHE_EVENT(name)          \
+DEFINE_EVENT(smb3_dcache, smb3_##name,    \
+	TP_PROTO(const void *cfid,		\
+		const char *name,		\
+		int namelen,			\
+		int result),			\
+	TP_ARGS(cfid, name, namelen, result))
+
+DEFINE_SMB3_DCACHE_EVENT(dcache_complete);
+DEFINE_SMB3_DCACHE_EVENT(dcache_wait);
+DEFINE_SMB3_DCACHE_EVENT(dcache_revalidate);
+DEFINE_SMB3_DCACHE_EVENT(open_cached_dir);
+DEFINE_SMB3_DCACHE_EVENT(open_cached_dir_by_dentry);
+DEFINE_SMB3_DCACHE_EVENT(free_cached_dir);
+DEFINE_SMB3_DCACHE_EVENT(add_to_cached_dir);
+DEFINE_SMB3_DCACHE_EVENT(lookup_cached_dir);
+DEFINE_SMB3_DCACHE_EVENT(update_dirent_in_cached_dir);
+
+TRACE_EVENT(smb3_complete_cached_dir,
+	TP_PROTO(const void *cfid,
+		 loff_t ctx_pos,
+		 loff_t cached_pos,
+		 int is_valid,
+		 int is_failed),
+	TP_ARGS(cfid, ctx_pos, cached_pos, is_valid, is_failed),
+	TP_STRUCT__entry(
+		__field(const void *, cfid)
+		__field(loff_t, ctx_pos)
+		__field(loff_t, cached_pos)
+		__field(int, is_valid)
+		__field(int, is_failed)
+	),
+	TP_fast_assign(
+		__entry->cfid = cfid;
+		__entry->ctx_pos = ctx_pos;
+		__entry->cached_pos = cached_pos;
+		__entry->is_valid = is_valid;
+		__entry->is_failed = is_failed;
+	),
+	TP_printk("cfid=%p ctx_pos=%lld cached_pos=%lld is_valid=%d is_failed=%d",
+		__entry->cfid,
+		(long long)__entry->ctx_pos,
+		(long long)__entry->cached_pos,
+		__entry->is_valid,
+		__entry->is_failed)
+);
+
 #undef EM
 #undef E_
 #endif /* _CIFS_TRACE_H */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 17/19] cifs: discard functions to ensure that mid callbacks get called
  2026-05-01 11:20 [PATCH v4 01/19] cifs: change_conf needs to be called for session setup nspmangalore
                   ` (14 preceding siblings ...)
  2026-05-01 11:20 ` [PATCH v4 16/19] cifs: trace points for cached_dir operations nspmangalore
@ 2026-05-01 11:20 ` nspmangalore
  2026-05-01 11:20 ` [PATCH v4 18/19] cifs: keep cfids in rbtree for efficient lookups nspmangalore
  2026-05-01 11:20 ` [PATCH v4 19/19] cifs: invalidate cached_dirents if population aborted nspmangalore
  17 siblings, 0 replies; 22+ messages in thread
From: nspmangalore @ 2026-05-01 11:20 UTC (permalink / raw)
  To: linux-cifs, smfrench, pc, bharathsm, dhowells, henrique.carvalho,
	ematsumiya
  Cc: Shyam Prasad N

From: Shyam Prasad N <sprasad@microsoft.com>

If the discard functions for readv and query_dir return an error, the
callback functions can be skipped. This can leave syscalls hung because
the completion functions never get called.

This change ensures that both of these discard functions invoke the
callback when discarding data from the socket fails. That way, once the
mid for the response has been found, the callback is not skipped and we
do not leave syscalls waiting.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
---
 fs/smb/client/transport.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/smb/client/transport.c b/fs/smb/client/transport.c
index 24ccadb00f568..a11e6eba008a6 100644
--- a/fs/smb/client/transport.c
+++ b/fs/smb/client/transport.c
@@ -1143,6 +1143,11 @@ __cifs_discard_and_dequeue(struct TCP_Server_Info *server, struct mid_q_entry *m
 	dequeue_mid(server, mid, malformed);
 	mid->resp_buf = server->smallbuf;
 	server->smallbuf = NULL;
+
+	/* Once the mid is dequeued, the callback must run to terminate the subreq */
+	if (length < 0)
+		mid_execute_callback(server, mid);
+
 	return length;
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 18/19] cifs: keep cfids in rbtree for efficient lookups
  2026-05-01 11:20 [PATCH v4 01/19] cifs: change_conf needs to be called for session setup nspmangalore
                   ` (15 preceding siblings ...)
  2026-05-01 11:20 ` [PATCH v4 17/19] cifs: discard functions to ensure that mid callbacks get called nspmangalore
@ 2026-05-01 11:20 ` nspmangalore
  2026-05-01 11:20 ` [PATCH v4 19/19] cifs: invalidate cached_dirents if population aborted nspmangalore
  17 siblings, 0 replies; 22+ messages in thread
From: nspmangalore @ 2026-05-01 11:20 UTC (permalink / raw)
  To: linux-cifs, smfrench, pc, bharathsm, dhowells, henrique.carvalho,
	ematsumiya
  Cc: Shyam Prasad N

From: Shyam Prasad N <sprasad@microsoft.com>

Today tcon->cfids are maintained as a linked list. When the number of
cfids is not limited (bounded only by system memory), this becomes a
problem on mounts with a large number of active directory accesses: we
soon hit performance bottlenecks walking the list.

This change stores cfids in an rbtree instead of a linked list. The
nodes of this tree are keyed by the path within the tcon.

Additionally, it introduces a hashtable so that cfids can be looked up
by dentry more quickly. This is important since
open_cached_dir_by_dentry is called quite frequently.
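
The dentry-keyed side can be pictured with this standalone sketch (a
generic multiplicative hash for illustration, not the kernel's
hash_ptr(); the bucket count matches the 2^8 used in the patch):

#include <stdint.h>
#include <stdio.h>

#define HT_BITS 8
#define HT_SIZE (1u << HT_BITS)

/* pick one of 256 chains from the dentry pointer value */
static unsigned int bucket_for(const void *dentry)
{
	uint64_t v = (uint64_t)(uintptr_t)dentry;

	v *= 0x9E3779B97F4A7C15ull;	/* illustrative 64-bit mix constant */
	return (unsigned int)(v >> (64 - HT_BITS));
}

int main(void)
{
	int a, b;	/* stand-ins for two struct dentry objects */

	printf("dentry %p -> bucket %u / %u\n", (void *)&a, bucket_for(&a), HT_SIZE);
	printf("dentry %p -> bucket %u / %u\n", (void *)&b, bucket_for(&b), HT_SIZE);
	return 0;
}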

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
---
 fs/smb/client/cached_dir.c | 227 ++++++++++++++++++++++++++-----------
 fs/smb/client/cached_dir.h |   8 +-
 fs/smb/client/cifs_debug.c |   4 +-
 3 files changed, 172 insertions(+), 67 deletions(-)

diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
index 6fb88c4c97dc1..14c87ac1a4ad4 100644
--- a/fs/smb/client/cached_dir.c
+++ b/fs/smb/client/cached_dir.c
@@ -22,6 +22,7 @@ static void smb2_close_cached_fid(struct kref *ref);
 static void cfids_laundromat_worker(struct work_struct *work);
 
 #define CACHED_DIRENT_HASH_BITS	7
+#define CACHED_DIR_DENTRY_HT_BITS	8
 
 struct cached_dir_dentry {
 	struct list_head entry;
@@ -1240,6 +1241,57 @@ int cifs_wait_for_pending_dcache(struct cached_fid *cfid,
 	return ret;
 }
 
+static struct cached_fid *cfid_rb_find(struct rb_root *root, const char *path)
+{
+	struct rb_node *node = root->rb_node;
+
+	while (node) {
+		struct cached_fid *cfid = rb_entry(node, struct cached_fid, node);
+		int cmp = strcmp(path, cfid->path);
+
+		if (cmp < 0)
+			node = node->rb_left;
+		else if (cmp > 0)
+			node = node->rb_right;
+		else
+			return cfid;
+	}
+	return NULL;
+}
+
+static struct cached_fid *cfid_dentry_ht_find(struct cached_fids *cfids,
+					      struct dentry *dentry)
+{
+	struct cached_fid *cfid;
+
+	hlist_for_each_entry(cfid,
+			     &cfids->dentry_ht[hash_ptr(dentry, CACHED_DIR_DENTRY_HT_BITS)],
+			     dentry_node) {
+		if (cfid->dentry == dentry)
+			return cfid;
+	}
+	return NULL;
+}
+
+static void cfid_rb_insert(struct rb_root *root, struct cached_fid *new)
+{
+	struct rb_node **link = &root->rb_node;
+	struct rb_node *parent = NULL;
+
+	while (*link) {
+		struct cached_fid *cfid = rb_entry(*link, struct cached_fid, node);
+		int cmp = strcmp(new->path, cfid->path);
+
+		parent = *link;
+		if (cmp < 0)
+			link = &(*link)->rb_left;
+		else
+			link = &(*link)->rb_right;
+	}
+	rb_link_node(&new->node, parent, link);
+	rb_insert_color(&new->node, root);
+}
+
 static struct cached_fid *find_or_create_cached_dir(struct cached_fids *cfids,
 						    const char *path,
 						    bool lookup_only,
@@ -1247,22 +1299,21 @@ static struct cached_fid *find_or_create_cached_dir(struct cached_fids *cfids,
 {
 	struct cached_fid *cfid;
 
-	list_for_each_entry(cfid, &cfids->entries, entry) {
-		if (!strcmp(cfid->path, path)) {
-			/*
-			 * If it doesn't have a lease it is either not yet
-			 * fully cached or it may be in the process of
-			 * being deleted due to a lease break.
-			 */
-			spin_lock(&cfid->cfid_lock);
-			if (!is_valid_cached_dir(cfid)) {
-				spin_unlock(&cfid->cfid_lock);
-				return NULL;
-			}
-			kref_get(&cfid->refcount);
+	cfid = cfid_rb_find(&cfids->entries, path);
+	if (cfid) {
+		/*
+		 * If it doesn't have a lease it is either not yet
+		 * fully cached or it may be in the process of
+		 * being deleted due to a lease break.
+		 */
+		spin_lock(&cfid->cfid_lock);
+		if (!is_valid_cached_dir(cfid)) {
 			spin_unlock(&cfid->cfid_lock);
-			return cfid;
+			return NULL;
 		}
+		kref_get(&cfid->refcount);
+		spin_unlock(&cfid->cfid_lock);
+		return cfid;
 	}
 	if (lookup_only) {
 		return NULL;
@@ -1276,7 +1327,7 @@ static struct cached_fid *find_or_create_cached_dir(struct cached_fids *cfids,
 	}
 	cfid->cfids = cfids;
 	cfids->num_entries++;
-	list_add(&cfid->entry, &cfids->entries);
+	cfid_rb_insert(&cfids->entries, cfid);
 	cfid->on_list = true;
 	kref_get(&cfid->refcount);
 	/*
@@ -1458,26 +1509,30 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
 			struct cached_fid *parent_cfid;
 
 			spin_lock(&cfids->cfid_list_lock);
-			list_for_each_entry(parent_cfid, &cfids->entries, entry) {
+			hlist_for_each_entry(parent_cfid,
+					     &cfids->dentry_ht[hash_ptr(dentry->d_parent,
+								CACHED_DIR_DENTRY_HT_BITS)],
+					     dentry_node) {
+				if (parent_cfid->dentry != dentry->d_parent)
+					continue;
 				spin_lock(&parent_cfid->cfid_lock);
-				if (parent_cfid->dentry == dentry->d_parent) {
-					cifs_dbg(FYI, "found a parent cached file handle\n");
-					if (is_valid_cached_dir(parent_cfid)) {
-						lease_flags
-							|= SMB2_LEASE_FLAG_PARENT_LEASE_KEY_SET_LE;
-						memcpy(pfid->parent_lease_key,
-						       parent_cfid->fid.lease_key,
-						       SMB2_LEASE_KEY_SIZE);
-					}
-					spin_unlock(&parent_cfid->cfid_lock);
-					break;
+				cifs_dbg(FYI, "found a parent cached file handle\n");
+				if (is_valid_cached_dir(parent_cfid)) {
+					lease_flags
+						|= SMB2_LEASE_FLAG_PARENT_LEASE_KEY_SET_LE;
+					memcpy(pfid->parent_lease_key,
+					       parent_cfid->fid.lease_key,
+					       SMB2_LEASE_KEY_SIZE);
 				}
 				spin_unlock(&parent_cfid->cfid_lock);
+				break;
 			}
 			spin_unlock(&cfids->cfid_list_lock);
 		}
 	}
 	cfid->dentry = dentry;
+	hlist_add_head(&cfid->dentry_node,
+		       &cfids->dentry_ht[hash_ptr(dentry, CACHED_DIR_DENTRY_HT_BITS)]);
 	cfid->tcon = tcon;
 
 	/*
@@ -1630,7 +1685,9 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
 
 		spin_lock(&cfids->cfid_list_lock);
 		if (cfid->on_list) {
-			list_del(&cfid->entry);
+			if (cfid->dentry)
+				hlist_del_init(&cfid->dentry_node);
+			rb_erase(&cfid->node, &cfids->entries);
 			cfid->on_list = false;
 			cfids->num_entries--;
 		}
@@ -1674,26 +1731,28 @@ int open_cached_dir_by_dentry(struct cifs_tcon *tcon,
 		return -ENOENT;
 
 	spin_lock(&cfids->cfid_list_lock);
-	list_for_each_entry(cfid, &cfids->entries, entry) {
-		if (cfid->dentry == dentry) {
-			spin_lock(&cfid->cfid_lock);
-			if (!is_valid_cached_dir(cfid)) {
-				spin_unlock(&cfid->cfid_lock);
-				rc = -ENOENT;
-				break;
-			}
-			cifs_dbg(FYI, "found a cached file handle by dentry\n");
-			kref_get(&cfid->refcount);
-			*ret_cfid = cfid;
-			cfid->last_access_time = jiffies;
-			rc = 0;
-			trace_cfid = cfid;
+	cfid = cfid_dentry_ht_find(cfids, dentry);
+	if (cfid) {
+		spin_lock(&cfid->cfid_lock);
+		if (!is_valid_cached_dir(cfid)) {
 			spin_unlock(&cfid->cfid_lock);
 			spin_unlock(&cfids->cfid_list_lock);
-			trace_smb3_open_cached_dir_by_dentry(cfid, dentry->d_name.name,
-							 dentry->d_name.len, 0);
+			trace_smb3_open_cached_dir_by_dentry(cfid,
+					   dentry->d_name.name,
+					   dentry->d_name.len, rc);
 			return rc;
 		}
+		cifs_dbg(FYI, "found a cached file handle by dentry\n");
+		kref_get(&cfid->refcount);
+		*ret_cfid = cfid;
+		cfid->last_access_time = jiffies;
+		rc = 0;
+		spin_unlock(&cfid->cfid_lock);
+		spin_unlock(&cfids->cfid_list_lock);
+		trace_smb3_open_cached_dir_by_dentry(cfid,
+				   dentry->d_name.name,
+				   dentry->d_name.len, rc);
+		return rc;
 	}
 	spin_unlock(&cfids->cfid_list_lock);
 	trace_smb3_open_cached_dir_by_dentry(NULL, dentry->d_name.name,
@@ -1715,7 +1774,9 @@ __releases(&cfid->cfids->cfid_list_lock)
 	lockdep_assert_held(&cfid->cfids->cfid_list_lock);
 
 	if (cfid->on_list) {
-		list_del(&cfid->entry);
+		if (cfid->dentry)
+			hlist_del_init(&cfid->dentry_node);
+		rb_erase(&cfid->node, &cfid->cfids->entries);
 		cfid->on_list = false;
 		cfid->cfids->num_entries--;
 	}
@@ -1790,6 +1851,7 @@ void close_all_cached_dirs(struct cifs_sb_info *cifs_sb)
 {
 	struct rb_root *root = &cifs_sb->tlink_tree;
 	struct rb_node *node;
+	struct rb_node *cfid_node;
 	struct cached_fid *cfid;
 	struct cifs_tcon *tcon;
 	struct tcon_link *tlink;
@@ -1807,7 +1869,9 @@ void close_all_cached_dirs(struct cifs_sb_info *cifs_sb)
 		if (cfids == NULL)
 			continue;
 		spin_lock(&cfids->cfid_list_lock);
-		list_for_each_entry(cfid, &cfids->entries, entry) {
+		for (cfid_node = rb_first(&cfids->entries);
+		     cfid_node; cfid_node = rb_next(cfid_node)) {
+			cfid = rb_entry(cfid_node, struct cached_fid, node);
 			tmp_list = kmalloc_obj(*tmp_list, GFP_ATOMIC);
 			if (tmp_list == NULL) {
 				/*
@@ -1823,6 +1887,7 @@ void close_all_cached_dirs(struct cifs_sb_info *cifs_sb)
 
 			spin_lock(&cfid->cfid_lock);
 			tmp_list->dentry = cfid->dentry;
+			hlist_del_init(&cfid->dentry_node);
 			cfid->dentry = NULL;
 			spin_unlock(&cfid->cfid_lock);
 
@@ -1850,7 +1915,8 @@ void close_all_cached_dirs(struct cifs_sb_info *cifs_sb)
 void invalidate_all_cached_dirs_nowait(struct cifs_tcon *tcon)
 {
 	struct cached_fids *cfids = tcon->cfids;
-	struct cached_fid *cfid, *q;
+	struct cached_fid *cfid;
+	struct rb_node *rb_node = NULL, *next_node = NULL;
 
 	if (cfids == NULL)
 		return;
@@ -1861,8 +1927,14 @@ void invalidate_all_cached_dirs_nowait(struct cifs_tcon *tcon)
 	 * during this process.
 	 */
 	spin_lock(&cfids->cfid_list_lock);
-	list_for_each_entry_safe(cfid, q, &cfids->entries, entry) {
-		list_move(&cfid->entry, &cfids->dying);
+	for (rb_node = rb_first(&cfids->entries);
+	     rb_node; rb_node = next_node) {
+		next_node = rb_next(rb_node);
+		cfid = rb_entry(rb_node, struct cached_fid, node);
+		if (cfid->dentry)
+			hlist_del_init(&cfid->dentry_node);
+		rb_erase(rb_node, &cfids->entries);
+		list_add(&cfid->dying_entry, &cfids->dying);
 		cfids->num_entries--;
 		spin_lock(&cfid->cfid_lock);
 		cfid->is_open = false;
@@ -1939,7 +2011,9 @@ bool cached_dir_lease_break(struct cifs_tcon *tcon, __u8 lease_key[16])
 		return false;
 
 	spin_lock(&cfids->cfid_list_lock);
-	list_for_each_entry(cfid, &cfids->entries, entry) {
+	for (struct rb_node *rb_node = rb_first(&cfids->entries);
+	     rb_node; rb_node = rb_next(rb_node)) {
+		cfid = rb_entry(rb_node, struct cached_fid, node);
 		spin_lock(&cfid->cfid_lock);
 		if (cfid->has_lease &&
 		    !memcmp(lease_key,
@@ -1952,7 +2026,9 @@ bool cached_dir_lease_break(struct cifs_tcon *tcon, __u8 lease_key[16])
 			 * We found a lease remove it from the list
 			 * so no threads can access it.
 			 */
-			list_del(&cfid->entry);
+			if (cfid->dentry)
+				hlist_del_init(&cfid->dentry_node);
+			rb_erase(rb_node, &cfids->entries);
 			cfid->on_list = false;
 			cfids->num_entries--;
 
@@ -1984,7 +2060,9 @@ static struct cached_fid *init_cached_dir(const char *path)
 
 	INIT_WORK(&cfid->close_work, cached_dir_offload_close);
 	INIT_WORK(&cfid->put_work, cached_dir_put_work);
-	INIT_LIST_HEAD(&cfid->entry);
+	RB_CLEAR_NODE(&cfid->node);
+	INIT_HLIST_NODE(&cfid->dentry_node);
+	INIT_LIST_HEAD(&cfid->dying_entry);
 	INIT_LIST_HEAD(&cfid->dirents.entry_list);
 	mutex_init(&cfid->dirents.de_mutex);
 	mutex_init(&cfid->cfid_open_mutex);
@@ -2039,14 +2117,20 @@ static void cfids_laundromat_worker(struct work_struct *work)
 
 	spin_lock(&cfids->cfid_list_lock);
 	/* move cfids->dying to the local list */
-	list_cut_before(&entry, &cfids->dying, &cfids->dying);
+	list_splice_init(&cfids->dying, &entry);
 
-	list_for_each_entry_safe(cfid, q, &cfids->entries, entry) {
+	for (struct rb_node *rb_node = rb_first(&cfids->entries), *next_node;
+	     rb_node; rb_node = next_node) {
+		next_node = rb_next(rb_node);
+		cfid = rb_entry(rb_node, struct cached_fid, node);
 		spin_lock(&cfid->cfid_lock);
 		if (dir_cache_timeout && cfid->last_access_time &&
 		    time_after(jiffies, cfid->last_access_time + HZ * dir_cache_timeout)) {
 			cfid->on_list = false;
-			list_move(&cfid->entry, &entry);
+			if (cfid->dentry)
+				hlist_del_init(&cfid->dentry_node);
+			rb_erase(rb_node, &cfids->entries);
+			list_add(&cfid->dying_entry, &entry);
 			cfids->num_entries--;
 			if (cfid->has_lease) {
 				/*
@@ -2065,8 +2149,8 @@ static void cfids_laundromat_worker(struct work_struct *work)
 	}
 	spin_unlock(&cfids->cfid_list_lock);
 
-	list_for_each_entry_safe(cfid, q, &entry, entry) {
-		list_del(&cfid->entry);
+	list_for_each_entry_safe(cfid, q, &entry, dying_entry) {
+		list_del(&cfid->dying_entry);
 
 		dput(cfid->dentry);
 		cfid->dentry = NULL;
@@ -2098,7 +2182,13 @@ struct cached_fids *init_cached_dirs(void)
 	if (!cfids)
 		return NULL;
 	spin_lock_init(&cfids->cfid_list_lock);
-	INIT_LIST_HEAD(&cfids->entries);
+	cfids->entries = RB_ROOT;
+	cfids->dentry_ht = kcalloc(1 << CACHED_DIR_DENTRY_HT_BITS,
+				   sizeof(*cfids->dentry_ht), GFP_KERNEL);
+	if (!cfids->dentry_ht) {
+		kfree(cfids);
+		return NULL;
+	}
 	INIT_LIST_HEAD(&cfids->dying);
 
 	INIT_DELAYED_WORK(&cfids->laundromat_work, cfids_laundromat_worker);
@@ -2126,25 +2216,34 @@ void free_cached_dirs(struct cached_fids *cfids)
 
 	cancel_delayed_work_sync(&cfids->laundromat_work);
 
+	kfree(cfids->dentry_ht);
+	cfids->dentry_ht = NULL;
+
 	spin_lock(&cfids->cfid_list_lock);
-	list_for_each_entry_safe(cfid, q, &cfids->entries, entry) {
+	for (struct rb_node *rb_node = rb_first(&cfids->entries), *next_node;
+	     rb_node; rb_node = next_node) {
+		next_node = rb_next(rb_node);
+		cfid = rb_entry(rb_node, struct cached_fid, node);
 		cfid->on_list = false;
+		if (cfid->dentry)
+			hlist_del_init(&cfid->dentry_node);
 		spin_lock(&cfid->cfid_lock);
 		cfid->is_open = false;
 		spin_unlock(&cfid->cfid_lock);
-		list_move(&cfid->entry, &entry);
+		rb_erase(rb_node, &cfids->entries);
+		list_add(&cfid->dying_entry, &entry);
 	}
-	list_for_each_entry_safe(cfid, q, &cfids->dying, entry) {
+	list_for_each_entry_safe(cfid, q, &cfids->dying, dying_entry) {
 		cfid->on_list = false;
 		spin_lock(&cfid->cfid_lock);
 		cfid->is_open = false;
 		spin_unlock(&cfid->cfid_lock);
-		list_move(&cfid->entry, &entry);
+		list_move(&cfid->dying_entry, &entry);
 	}
 	spin_unlock(&cfids->cfid_list_lock);
 
-	list_for_each_entry_safe(cfid, q, &entry, entry) {
-		list_del(&cfid->entry);
+	list_for_each_entry_safe(cfid, q, &entry, dying_entry) {
+		list_del(&cfid->dying_entry);
 		free_cached_dir(cfid);
 	}
 
diff --git a/fs/smb/client/cached_dir.h b/fs/smb/client/cached_dir.h
index 0726f25b9144a..58dde9452ec9b 100644
--- a/fs/smb/client/cached_dir.h
+++ b/fs/smb/client/cached_dir.h
@@ -11,6 +11,7 @@
 #include <linux/completion.h>
 #include <linux/build_bug.h>
 #include <linux/list.h>
+#include <linux/rbtree.h>
 #include <linux/netfs.h>
 
 struct cifs_search_info;
@@ -139,7 +140,9 @@ struct cached_dirents {
 };
 
 struct cached_fid {
-	struct list_head entry;
+	struct rb_node node;
+	struct hlist_node dentry_node;
+	struct list_head dying_entry;
 	struct cached_fids *cfids;
 	const char *path;
 	bool has_lease;
@@ -181,7 +184,8 @@ struct cached_fids {
 	 */
 	spinlock_t cfid_list_lock;
 	int num_entries;
-	struct list_head entries;
+	struct rb_root entries;
+	struct hlist_head *dentry_ht;
 	struct list_head dying;
 	struct delayed_work laundromat_work;
 	/* aggregate accounting for all cached dirents under this tcon */
diff --git a/fs/smb/client/cifs_debug.c b/fs/smb/client/cifs_debug.c
index cc7d26a3917c5..81908679a11e3 100644
--- a/fs/smb/client/cifs_debug.c
+++ b/fs/smb/client/cifs_debug.c
@@ -326,7 +326,9 @@ static int cifs_debug_dirs_proc_show(struct seq_file *m, void *v)
 						cfids->num_entries,
 						(unsigned long)atomic_long_read(&cfids->total_dirents_entries),
 						(unsigned long long)atomic64_read(&cfids->total_dirents_bytes));
-				list_for_each_entry(cfid, &cfids->entries, entry) {
+				for (struct rb_node *rb_node = rb_first(&cfids->entries);
+				     rb_node; rb_node = rb_next(rb_node)) {
+					cfid = rb_entry(rb_node, struct cached_fid, node);
 					spin_lock(&cfid->cfid_lock);
 					seq_printf(m, "0x%x 0x%llx 0x%llx ",
 						tcon->tid,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 19/19] cifs: invalidate cached_dirents if population aborted
  2026-05-01 11:20 [PATCH v4 01/19] cifs: change_conf needs to be called for session setup nspmangalore
                   ` (16 preceding siblings ...)
  2026-05-01 11:20 ` [PATCH v4 18/19] cifs: keep cfids in rbtree for efficient lookups nspmangalore
@ 2026-05-01 11:20 ` nspmangalore
  17 siblings, 0 replies; 22+ messages in thread
From: nspmangalore @ 2026-05-01 11:20 UTC (permalink / raw)
  To: linux-cifs, smfrench, pc, bharathsm, dhowells, henrique.carvalho,
	ematsumiya
  Cc: Shyam Prasad N

From: Shyam Prasad N <sprasad@microsoft.com>

To make sure that parallel readdirs do not race to populate
cfid->cached_dirents, only the first readdir is given
"ownership" of populating it. However, if the next readdir
on the same FD never arrives, that population never completes
and we will always miss the dirent cache.

This change introduces a 10-second timeout which the laundromat
thread uses to check whether a stalled cached_dirents population
can be invalidated. Ten seconds is a long enough interval between
successive readdir calls.
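
Put differently, the laundromat's rule is: if a population attempt
still owns the cache, the cache is neither valid nor already failed,
and no entry has been added for more than ten seconds, give the
ownership up so a later readdir can repopulate. A small userspace
sketch of just that decision (wall-clock seconds stand in for jiffies;
struct and field names are illustrative and only loosely mirror the
diff below):

#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#define POPULATE_TIMEOUT 10	/* seconds, mirrors CACHED_DIR_POPULATE_TIMEOUT */

struct cached_dirents_model {
	bool is_valid;			/* population finished, cache usable */
	bool is_failed;			/* population already aborted */
	bool owner_present;		/* a readdir owns population (cde->file != NULL in the patch) */
	time_t last_populate_time;	/* 0 when no population is in flight */
};

/* Should the laundromat abandon a stalled population attempt? */
static bool population_is_stale(const struct cached_dirents_model *cde, time_t now)
{
	if (cde->is_valid || cde->is_failed || !cde->owner_present)
		return false;
	if (!cde->last_populate_time)
		return false;
	return now - cde->last_populate_time > POPULATE_TIMEOUT;
}

int main(void)
{
	struct cached_dirents_model cde = {
		.owner_present = true,
		.last_populate_time = time(NULL) - 30,	/* no progress for 30 seconds */
	};

	if (population_is_stale(&cde, time(NULL))) {
		/* what fail_cached_dir_locked() does: release ownership so a
		 * later readdir can start the population over */
		cde.owner_present = false;
		cde.last_populate_time = 0;
		printf("stale population dropped, next readdir repopulates\n");
	}
	return 0;
}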

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
---
 fs/smb/client/cached_dir.c | 65 ++++++++++++++++++++++++++++++++++----
 fs/smb/client/cached_dir.h |  1 +
 2 files changed, 60 insertions(+), 6 deletions(-)

diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
index 14c87ac1a4ad4..b626045745ca2 100644
--- a/fs/smb/client/cached_dir.c
+++ b/fs/smb/client/cached_dir.c
@@ -23,12 +23,18 @@ static void cfids_laundromat_worker(struct work_struct *work);
 
 #define CACHED_DIRENT_HASH_BITS	7
 #define CACHED_DIR_DENTRY_HT_BITS	8
+#define CACHED_DIR_POPULATE_TIMEOUT	10
 
 struct cached_dir_dentry {
 	struct list_head entry;
 	struct dentry *dentry;
 };
 
+struct cached_dir_invalidate_entry {
+	struct list_head entry;
+	struct cached_fid *cfid;
+};
+
 /* Generic helpers */
 bool cached_dir_is_valid(struct cached_fid *cfid)
 {
@@ -496,6 +502,7 @@ static void fail_cached_dir_locked(struct cached_dirents *cde)
 	 * can claim this slot and repopulate the cache.
 	 */
 	cde->file = NULL;
+	cde->last_populate_time = 0;
 }
 
 /* insert cached_dirent into lookup hashtable */
@@ -799,6 +806,7 @@ bool emit_cached_dir_if_valid(struct cached_fid *cfid,
 		cfid->dirents.file = file;
 		cfid->dirents.dir_inode = file_inode(file);
 		cfid->dirents.pos = 2;
+		cfid->dirents.last_populate_time = jiffies;
 		cached_dir_reset_insert_cursor_locked(&cfid->dirents);
 		/*
 		 * A previous population attempt may have failed and left
@@ -851,6 +859,28 @@ static void finished_cached_dirents_count(struct cached_dirents *cde,
 		cached_mapping->folio_is_eof = 1;
 
 	cde->is_valid = 1;
+	cde->last_populate_time = 0;
+}
+
+static void maybe_invalidate_stale_cached_dirents(struct cached_fid *cfid)
+{
+	struct cached_dirents *cde = &cfid->dirents;
+
+	mutex_lock(&cde->de_mutex);
+	if (cde->last_populate_time && !cde->is_valid && !cde->is_failed &&
+	    cde->file &&
+	    time_after(jiffies,
+		       cde->last_populate_time + HZ * CACHED_DIR_POPULATE_TIMEOUT))
+		fail_cached_dir_locked(cde);
+	mutex_unlock(&cde->de_mutex);
+}
+
+static unsigned long cached_dir_laundromat_interval_seconds(void)
+{
+	if (!dir_cache_timeout)
+		return CACHED_DIR_POPULATE_TIMEOUT;
+
+	return min_t(unsigned int, dir_cache_timeout, CACHED_DIR_POPULATE_TIMEOUT);
 }
 
 /* update the cached_dirent for a given name in list */
@@ -992,6 +1022,8 @@ bool add_to_cached_dir(struct cached_fid *cfid,
 	old_bytes = cfid->dirents.bytes_used;
 	added = add_cached_dirent(&cfid->dirents, ctx, name, namelen,
 				  fattr, file);
+	if (added)
+		cfid->dirents.last_populate_time = jiffies;
 	new_entries = cfid->dirents.entries_count;
 	new_bytes = cfid->dirents.bytes_used;
 	mutex_unlock(&cfid->dirents.de_mutex);
@@ -2111,7 +2143,9 @@ static void cfids_laundromat_worker(struct work_struct *work)
 {
 	struct cached_fids *cfids;
 	struct cached_fid *cfid, *q;
+	struct cached_dir_invalidate_entry *inv, *inv_q;
 	LIST_HEAD(entry);
+	LIST_HEAD(invalidate_list);
 
 	cfids = container_of(work, struct cached_fids, laundromat_work.work);
 
@@ -2121,6 +2155,9 @@ static void cfids_laundromat_worker(struct work_struct *work)
 
 	for (struct rb_node *rb_node = rb_first(&cfids->entries), *next_node;
 	     rb_node; rb_node = next_node) {
+		struct cached_dir_invalidate_entry *inv_ent;
+		unsigned long last_populate_time;
+
 		next_node = rb_next(rb_node);
 		cfid = rb_entry(rb_node, struct cached_fid, node);
 		spin_lock(&cfid->cfid_lock);
@@ -2144,11 +2181,29 @@ static void cfids_laundromat_worker(struct work_struct *work)
 				kref_get(&cfid->refcount);
 			}
 		} else {
+			last_populate_time = READ_ONCE(cfid->dirents.last_populate_time);
+			if (last_populate_time &&
+			    time_after(jiffies,
+				       last_populate_time + HZ * CACHED_DIR_POPULATE_TIMEOUT)) {
+				inv_ent = kmalloc_obj(*inv_ent, GFP_ATOMIC);
+				if (inv_ent) {
+					kref_get(&cfid->refcount);
+					inv_ent->cfid = cfid;
+					list_add_tail(&inv_ent->entry, &invalidate_list);
+				}
+			}
 			spin_unlock(&cfid->cfid_lock);
 		}
 	}
 	spin_unlock(&cfids->cfid_list_lock);
 
+	list_for_each_entry_safe(inv, inv_q, &invalidate_list, entry) {
+		list_del(&inv->entry);
+		maybe_invalidate_stale_cached_dirents(inv->cfid);
+		close_cached_dir(inv->cfid);
+		kfree(inv);
+	}
+
 	list_for_each_entry_safe(cfid, q, &entry, dying_entry) {
 		list_del(&cfid->dying_entry);
 
@@ -2169,9 +2224,8 @@ static void cfids_laundromat_worker(struct work_struct *work)
 			 */
 			close_cached_dir(cfid);
 	}
-	if (dir_cache_timeout)
-		queue_delayed_work(cfid_put_wq, &cfids->laundromat_work,
-				   dir_cache_timeout * HZ);
+	queue_delayed_work(cfid_put_wq, &cfids->laundromat_work,
+			   cached_dir_laundromat_interval_seconds() * HZ);
 }
 
 struct cached_fids *init_cached_dirs(void)
@@ -2192,9 +2246,8 @@ struct cached_fids *init_cached_dirs(void)
 	INIT_LIST_HEAD(&cfids->dying);
 
 	INIT_DELAYED_WORK(&cfids->laundromat_work, cfids_laundromat_worker);
-	if (dir_cache_timeout)
-		queue_delayed_work(cfid_put_wq, &cfids->laundromat_work,
-				   dir_cache_timeout * HZ);
+	queue_delayed_work(cfid_put_wq, &cfids->laundromat_work,
+			   cached_dir_laundromat_interval_seconds() * HZ);
 
 	atomic_long_set(&cfids->total_dirents_entries, 0);
 	atomic64_set(&cfids->total_dirents_bytes, 0);
diff --git a/fs/smb/client/cached_dir.h b/fs/smb/client/cached_dir.h
index 58dde9452ec9b..eca0a0ca3674c 100644
--- a/fs/smb/client/cached_dir.h
+++ b/fs/smb/client/cached_dir.h
@@ -120,6 +120,7 @@ struct cached_dirents {
 	struct inode *dir_inode;
 	struct mutex de_mutex;
 	loff_t pos;		 /* Expected ctx->pos */
+	unsigned long last_populate_time; /* jiffies of last successful populate progress */
 	struct folio_queue *folioq;
 	struct list_head entry_list;
 	unsigned int entry_list_count;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 13/19] cifs: option to disable time-based eviction of cache
  2026-05-01 11:20 ` [PATCH v4 13/19] cifs: option to disable time-based eviction of cache nspmangalore
@ 2026-05-01 15:47   ` Steve French
  2026-05-04 12:28     ` Shyam Prasad N
  0 siblings, 1 reply; 22+ messages in thread
From: Steve French @ 2026-05-01 15:47 UTC (permalink / raw)
  To: nspmangalore
  Cc: linux-cifs, pc, bharathsm, dhowells, henrique.carvalho,
	ematsumiya, Shyam Prasad N

Seems like there is still value in temporarily freeing up the dir cache
(setting dir_cache_timeout to 0); remounting with nohandlecache may not
be as helpful if you just want to temporarily reset the dir cache for
debugging.  Is there another option that could set it to unlimited
(e.g. setting to -1, if that is allowed)?
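
(For reference, the age check being discussed reduces to something
like the userspace sketch below; seconds stand in for jiffies, the
names are illustrative, and with this patch a timeout of 0 makes the
check a no-op instead of disabling the cache.)

#include <stdbool.h>
#include <stdio.h>
#include <time.h>

static unsigned int dir_cache_timeout;	/* 0: skip time-based eviction */

/* Age check the laundromat applies to each cached directory handle. */
static bool should_evict(time_t last_access, time_t now)
{
	if (!dir_cache_timeout)
		return false;	/* 0 now means "never evict on age" */
	return now - last_access > dir_cache_timeout;
}

int main(void)
{
	time_t now = time(NULL);

	printf("idle 60s, timeout 0:  evict? %s\n",
	       should_evict(now - 60, now) ? "yes" : "no");
	dir_cache_timeout = 30;
	printf("idle 60s, timeout 30: evict? %s\n",
	       should_evict(now - 60, now) ? "yes" : "no");
	return 0;
}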

On Fri, May 1, 2026 at 6:20 AM <nspmangalore@gmail.com> wrote:
>
> From: Shyam Prasad N <sprasad@microsoft.com>
>
> Today there is no way to disable time-based eviction of dir cache.
> dir_cache_timeout = 0 meant immediate free up of dir cache on next
> laundromat scan. We already have nohandlecache to disable dir cache.
>
> This changes the meaning of dir_cache_timeout = 0 to mean unlimited
> timeout. Shrinker-based eviction is still possible.
>
> Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
> ---
>  fs/smb/client/cifsfs.c  | 2 +-
>  fs/smb/client/connect.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
> index ee5de358e27f8..79a6a4c297ee3 100644
> --- a/fs/smb/client/cifsfs.c
> +++ b/fs/smb/client/cifsfs.c
> @@ -121,7 +121,7 @@ MODULE_PARM_DESC(cifs_max_pending, "Simultaneous requests to server for "
>  unsigned int dir_cache_timeout = 30;
>  module_param(dir_cache_timeout, uint, 0644);
>  MODULE_PARM_DESC(dir_cache_timeout, "Number of seconds to cache directory contents for which we have a lease. Default: 30 "
> -                                "Range: 1 to 65000 seconds, 0 to disable caching dir contents");
> +                                "Range: 0 to 65000 seconds. 0 disables timeout-based cleanup (cached dirs persist until explicitly invalidated).");
>  /* Module-wide total cached dirents (in bytes) across all tcons */
>  atomic64_t cifs_dircache_bytes_used = ATOMIC64_INIT(0);
>  static struct shrinker *cifs_dircache_shrinker;
> diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
> index 69b38f0ccf2b2..849c16c538353 100644
> --- a/fs/smb/client/connect.c
> +++ b/fs/smb/client/connect.c
> @@ -2698,7 +2698,7 @@ cifs_get_tcon(struct cifs_ses *ses, struct smb3_fs_context *ctx)
>
>         if (ses->server->dialect >= SMB20_PROT_ID &&
>             (ses->server->capabilities & SMB2_GLOBAL_CAP_DIRECTORY_LEASING))
> -               nohandlecache = ctx->nohandlecache || !dir_cache_timeout;
> +               nohandlecache = ctx->nohandlecache;
>         else
>                 nohandlecache = true;
>         tcon = tcon_info_alloc(!nohandlecache, netfs_trace_tcon_ref_new);
> --
> 2.43.0
>


-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 13/19] cifs: option to disable time-based eviction of cache
  2026-05-01 15:47   ` Steve French
@ 2026-05-04 12:28     ` Shyam Prasad N
  0 siblings, 0 replies; 22+ messages in thread
From: Shyam Prasad N @ 2026-05-04 12:28 UTC (permalink / raw)
  To: Steve French
  Cc: linux-cifs, pc, bharathsm, dhowells, henrique.carvalho,
	ematsumiya, Shyam Prasad N

On Fri, May 1, 2026 at 9:17 PM Steve French <smfrench@gmail.com> wrote:
>
> Seems like there is still value in temporarily freeing up the dir cache
> (setting dir_cache_timeout to 0); remounting with nohandlecache may not
> be as helpful if you just want to temporarily reset the dir cache for
> debugging.  Is there another option that could set it to unlimited
> (e.g. setting to -1, if that is allowed)?
>
> On Fri, May 1, 2026 at 6:20 AM <nspmangalore@gmail.com> wrote:
> >
> > From: Shyam Prasad N <sprasad@microsoft.com>
> >
> > Today there is no way to disable time-based eviction of dir cache.
> > dir_cache_timeout = 0 meant immediate free up of dir cache on next
> > laundromat scan. We already have nohandlecache to disable dir cache.
> >
> > This changes the meaning of dir_cache_timeout = 0 to mean unlimited
> > timeout. Shrinker-based eviction is still possible.
> >
> > Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
> > ---
> >  fs/smb/client/cifsfs.c  | 2 +-
> >  fs/smb/client/connect.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
> > index ee5de358e27f8..79a6a4c297ee3 100644
> > --- a/fs/smb/client/cifsfs.c
> > +++ b/fs/smb/client/cifsfs.c
> > @@ -121,7 +121,7 @@ MODULE_PARM_DESC(cifs_max_pending, "Simultaneous requests to server for "
> >  unsigned int dir_cache_timeout = 30;
> >  module_param(dir_cache_timeout, uint, 0644);
> >  MODULE_PARM_DESC(dir_cache_timeout, "Number of seconds to cache directory contents for which we have a lease. Default: 30 "
> > -                                "Range: 1 to 65000 seconds, 0 to disable caching dir contents");
> > +                                "Range: 0 to 65000 seconds. 0 disables timeout-based cleanup (cached dirs persist until explicitly invalidated).");
> >  /* Module-wide total cached dirents (in bytes) across all tcons */
> >  atomic64_t cifs_dircache_bytes_used = ATOMIC64_INIT(0);
> >  static struct shrinker *cifs_dircache_shrinker;
> > diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
> > index 69b38f0ccf2b2..849c16c538353 100644
> > --- a/fs/smb/client/connect.c
> > +++ b/fs/smb/client/connect.c
> > @@ -2698,7 +2698,7 @@ cifs_get_tcon(struct cifs_ses *ses, struct smb3_fs_context *ctx)
> >
> >         if (ses->server->dialect >= SMB20_PROT_ID &&
> >             (ses->server->capabilities & SMB2_GLOBAL_CAP_DIRECTORY_LEASING))
> > -               nohandlecache = ctx->nohandlecache || !dir_cache_timeout;
> > +               nohandlecache = ctx->nohandlecache;
> >         else
> >                 nohandlecache = true;
> >         tcon = tcon_info_alloc(!nohandlecache, netfs_trace_tcon_ref_new);
> > --
> > 2.43.0
> >
>
>
> --
> Thanks,
>
> Steve

You have an option to drop the dir cache just for that.

-- 
Regards,
Shyam

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 02/19] cifs: abort open_cached_dir if we don't request leases
  2026-05-01 11:20 ` [PATCH v4 02/19] cifs: abort open_cached_dir if we don't request leases nspmangalore
@ 2026-05-06 14:16   ` Bharath SM
  0 siblings, 0 replies; 22+ messages in thread
From: Bharath SM @ 2026-05-06 14:16 UTC (permalink / raw)
  To: nspmangalore
  Cc: linux-cifs, smfrench, pc, bharathsm, dhowells, henrique.carvalho,
	ematsumiya, Shyam Prasad N, stable

On Fri, May 1, 2026 at 4:22 AM <nspmangalore@gmail.com> wrote:
>
> From: Shyam Prasad N <sprasad@microsoft.com>
>
> It is possible that SMB2_open_init may not set lease context based
> on the requested oplock level. This can happen when leases have been
> temporarily or permanently disabled. When this happens, we will have
> open_cached_dir making an open without lease context and the response
> will anyway be rejected by open_cached_dir (thereby forcing a close to
> discard this open). That's two unnecessary round-trips to the server.
>
> This change adds a check before making the open request to the server
> to make sure that SMB2_open_init did add the expected lease context
> to the open in open_cached_dir.
>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
> ---
>  fs/smb/client/cached_dir.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
> index 04bb95091f498..64e22c064fa0a 100644
> --- a/fs/smb/client/cached_dir.c
> +++ b/fs/smb/client/cached_dir.c
> @@ -286,6 +286,14 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
>                             &rqst[0], &oplock, &oparms, utf16_path);
>         if (rc)
>                 goto oshr_free;
> +
> +       if (oplock != SMB2_OPLOCK_LEVEL_II) {
> +               rc = -EINVAL;
> +               cifs_dbg(FYI, "%s: Oplock level %d not suitable for cached directory\n",
> +                        __func__, oplock);
> +               goto oshr_free;
> +       }
> +
>         smb2_set_next_command(tcon, &rqst[0]);
>
>         memset(&qi_iov, 0, sizeof(qi_iov));
> --
Reviewed-by: Bharath SM <bharathsm@microsoft.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2026-05-06 14:16 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-05-01 11:20 [PATCH v4 01/19] cifs: change_conf needs to be called for session setup nspmangalore
2026-05-01 11:20 ` [PATCH v4 02/19] cifs: abort open_cached_dir if we don't request leases nspmangalore
2026-05-06 14:16   ` Bharath SM
2026-05-01 11:20 ` [PATCH v4 03/19] cifs: invalidate cfid on unlink/rename/rmdir nspmangalore
2026-05-01 11:20 ` [PATCH v4 04/19] cifs: define variable sized buffer for querydir responses nspmangalore
2026-05-01 11:20 ` [PATCH v4 05/19] cifs: optimize readdir for small directories nspmangalore
2026-05-01 11:20 ` [PATCH v4 06/19] cifs: optimize readdir for larger directories nspmangalore
2026-05-01 11:20 ` [PATCH v4 07/19] cifs: reorganize cached dir helpers nspmangalore
2026-05-01 11:20 ` [PATCH v4 08/19] cifs: make cfid locks more granular nspmangalore
2026-05-01 11:20 ` [PATCH v4 09/19] cifs: query dir should reuse cfid even if not fully cached nspmangalore
2026-05-01 11:20 ` [PATCH v4 10/19] cifs: back cached_dirents with page cache nspmangalore
2026-05-01 11:20 ` [PATCH v4 11/19] cifs: in place changes to cached_dirents when dir lease is held nspmangalore
2026-05-01 11:20 ` [PATCH v4 12/19] cifs: register a shrinker to manage cached_dirents nspmangalore
2026-05-01 11:20 ` [PATCH v4 13/19] cifs: option to disable time-based eviction of cache nspmangalore
2026-05-01 15:47   ` Steve French
2026-05-04 12:28     ` Shyam Prasad N
2026-05-01 11:20 ` [PATCH v4 14/19] cifs: option to set unlimited number of cached dirs nspmangalore
2026-05-01 11:20 ` [PATCH v4 15/19] cifs: allow dcache population to happen asynchronously nspmangalore
2026-05-01 11:20 ` [PATCH v4 16/19] cifs: trace points for cached_dir operations nspmangalore
2026-05-01 11:20 ` [PATCH v4 17/19] cifs: discard functions to ensure that mid callbacks get called nspmangalore
2026-05-01 11:20 ` [PATCH v4 18/19] cifs: keep cfids in rbtree for efficient lookups nspmangalore
2026-05-01 11:20 ` [PATCH v4 19/19] cifs: invalidate cached_dirents if population aborted nspmangalore

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox