Linux CIFS filesystem development
 help / color / mirror / Atom feed
* [PATCH v2 14/14] netfs: Fix DIO write retry for filesystems without a ->prepare_write()
From: David Howells @ 2026-06-24 11:57 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
	ceph-devel, linux-fsdevel, linux-kernel,
	syzbot+3c74b1f0c372e98efc32, hongao, ChenXiaoSong
In-Reply-To: <20260624115737.2964520-1-dhowells@redhat.com>

Fix netfs_unbuffered_write() so that it doesn't re-issue a write twice when
the filesystem doesn't have a ->prepare_write().  The bit of code that does
the first issue should just be removed as everything it does is done again
when the loop it's in goes back to the top.

Reported-by: syzbot+3c74b1f0c372e98efc32@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=3c74b1f0c372e98efc32
Tested-by: syzbot+3c74b1f0c372e98efc32@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: hongao <hongao@uniontech.com>
cc: ChenXiaoSong <chenxiaosong@chenxiaosong.com>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/direct_write.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index 25f8ceb15fad..1ff1789016b8 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -190,12 +190,6 @@ static int netfs_unbuffered_write(struct netfs_io_request *wreq)
 			stream->prepare_write(subreq);
 			__set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
 			netfs_stat(&netfs_n_wh_retry_write_subreq);
-		} else {
-			struct iov_iter source;
-
-			netfs_reset_iter(subreq);
-			source = subreq->io_iter;
-			netfs_reissue_write(stream, subreq, &source);
 		}
 	}
 


^ permalink raw reply related

* [PATCH v2 13/14] netfs: Fix folio state after ENOMEM whilst under writeback iteration
From: David Howells @ 2026-06-24 11:57 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
	ceph-devel, linux-fsdevel, linux-kernel, Matthew Wilcox
In-Reply-To: <20260624115737.2964520-1-dhowells@redhat.com>

Fix the state of the current folio when ENOMEM occurs during writeback
iteration.  The folio needs to be redirtied and unlocked before the
terminal writeback_iter() is invoked.

Fixes: 06fa229ceb36 ("netfs: Abstract out a rolling folio buffer implementation")
Link: https://sashiko.dev/#/patchset/20260619140646.2633762-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/write_issue.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
index d1f98cca5d0f..6eb6cff12c17 100644
--- a/fs/netfs/write_issue.c
+++ b/fs/netfs/write_issue.c
@@ -334,8 +334,11 @@ static int netfs_write_folio(struct netfs_io_request *wreq,
 
 	_enter("");
 
-	if (rolling_buffer_make_space(&wreq->buffer) < 0)
+	if (rolling_buffer_make_space(&wreq->buffer) < 0) {
+		folio_redirty_for_writepage(wbc, folio);
+		folio_unlock(folio);
 		return -ENOMEM;
+	}
 
 	/* netfs_perform_write() may shift i_size around the page or from out
 	 * of the page to beyond it, but cannot move i_size into or through the


^ permalink raw reply related

* [PATCH v2 12/14] netfs: Fix writeback error handling
From: David Howells @ 2026-06-24 11:57 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
	ceph-devel, linux-fsdevel, linux-kernel, Matthew Wilcox
In-Reply-To: <20260624115737.2964520-1-dhowells@redhat.com>

Fix the error handling in writeback_iter() loop.  If an error occurs,
writeback_iter() needs to be called again with *error set to the error so
that it can clean up iteration state.  Further, the current folio needs
unlocking and redirtying.

Fixes: 288ace2f57c9 ("netfs: New writeback implementation")
Link: https://sashiko.dev/#/patchset/20260619140646.2633762-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/write_issue.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
index e5d4f9ba3d6b..d1f98cca5d0f 100644
--- a/fs/netfs/write_issue.c
+++ b/fs/netfs/write_issue.c
@@ -582,8 +582,6 @@ int netfs_writepages(struct address_space *mapping,
 		}
 
 		error = netfs_write_folio(wreq, wbc, folio);
-		if (error < 0)
-			break;
 	} while ((folio = writeback_iter(mapping, wbc, folio, &error)));
 
 	netfs_end_issue_write(wreq);
@@ -594,6 +592,9 @@ int netfs_writepages(struct address_space *mapping,
 	return error;
 
 couldnt_start:
+	folio_redirty_for_writepage(wbc, folio);
+	folio_unlock(folio);
+	writeback_iter(mapping, wbc, folio, &error);
 	netfs_kill_dirty_pages(mapping, wbc, folio);
 out:
 	netfs_wb_end(ictx);


^ permalink raw reply related

* [PATCH v2 11/14] netfs: Fix writethrough to use collection offload
From: David Howells @ 2026-06-24 11:57 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
	ceph-devel, linux-fsdevel, linux-kernel
In-Reply-To: <20260624115737.2964520-1-dhowells@redhat.com>

Fix writethrough write to set NETFS_RREQ_OFFLOAD_COLLECTION on the request
so that collection is processed asynchronously rather than only right at
the end - and also so that asynchronous O_SYNC writes get collected at all.

Fixes: 288ace2f57c9 ("netfs: New writeback implementation")
Closes: https://sashiko.dev/#/patchset/20260616100821.2062304-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/write_issue.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
index 437bb50ce175..e5d4f9ba3d6b 100644
--- a/fs/netfs/write_issue.c
+++ b/fs/netfs/write_issue.c
@@ -620,6 +620,7 @@ struct netfs_io_request *netfs_begin_writethrough(struct kiocb *iocb, size_t len
 	}
 
 	wreq->io_streams[0].avail = true;
+	__set_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &wreq->flags);
 	trace_netfs_write(wreq, netfs_write_trace_writethrough);
 	return wreq;
 }


^ permalink raw reply related

* [PATCH v2 10/14] netfs: Replace wb_lock with a bit lock for asynchronicity
From: David Howells @ 2026-06-24 11:57 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
	ceph-devel, linux-fsdevel, linux-kernel
In-Reply-To: <20260624115737.2964520-1-dhowells@redhat.com>

The netfs_inode::wb_lock mutex is used to prevent multiple simultaneous
writebacks from fighting each other (a writeback thread will write multiple
discontiguous regions within the same request).  The mutex, however, only
serialises the issuing of subrequests; it doesn't serialise the collection
of results, and, in particular, the updating of file size information and
fscache populatedness data.

Unfortunately, the mutex cannot be held around the entire process as it has
to be unlocked in the same thread in which it is locked - and we don't want
to hold up the allocator whilst we complete the writeback.

Fix this by replacing the mutex with a bit flag and a list of lock waiters
so that the lock can be dropped in the collector thread after collection is
complete.

Link: https://sashiko.dev/#/patchset/20260608145432.681865-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/afs/symlink.c         |  4 +-
 fs/netfs/locking.c       | 95 ++++++++++++++++++++++++++++++++++++++++
 fs/netfs/write_collect.c | 10 +++++
 fs/netfs/write_issue.c   | 37 +++++-----------
 include/linux/netfs.h    | 11 ++++-
 5 files changed, 126 insertions(+), 31 deletions(-)

diff --git a/fs/afs/symlink.c b/fs/afs/symlink.c
index ed5868369f37..16b4823cb7b7 100644
--- a/fs/afs/symlink.c
+++ b/fs/afs/symlink.c
@@ -255,11 +255,11 @@ int afs_symlink_writepages(struct address_space *mapping,
 	}
 
 	if (ret == 0) {
-		mutex_lock(&vnode->netfs.wb_lock);
+		netfs_wb_begin(&vnode->netfs, false);
 		netfs_free_folioq_buffer(vnode->directory);
 		vnode->directory = NULL;
 		vnode->directory_size = 0;
-		mutex_unlock(&vnode->netfs.wb_lock);
+		netfs_wb_end(&vnode->netfs);
 	} else if (ret == 1) {
 		ret = 0; /* Skipped write due to lock conflict. */
 	}
diff --git a/fs/netfs/locking.c b/fs/netfs/locking.c
index 2249ecd09d0a..4e3be2b81504 100644
--- a/fs/netfs/locking.c
+++ b/fs/netfs/locking.c
@@ -9,6 +9,11 @@
 #include <linux/netfs.h>
 #include "internal.h"
 
+struct netfs_wb_waiter {
+	struct list_head	link;		/* Link in ictx->wb_queue */
+	struct task_struct	*waiter;	/* Waiter task; cleared when lock granted */
+};
+
 /*
  * inode_dio_wait_interruptible - wait for outstanding DIO requests to finish
  * @inode: inode to wait for
@@ -203,3 +208,93 @@ void netfs_end_io_direct(struct inode *inode)
 	up_read(&inode->i_rwsem);
 }
 EXPORT_SYMBOL(netfs_end_io_direct);
+
+/*
+ * Wait to have exclusive access to writeback.
+ */
+static bool netfs_wb_begin_wait(struct netfs_inode *ictx)
+{
+	struct netfs_wb_waiter waiter = {};
+	struct task_struct *tsk = current;
+	bool got = false;
+
+	spin_lock(&ictx->lock);
+
+	if (test_and_set_bit_lock(NETFS_ICTX_WB_LOCK, &ictx->flags)) {
+		get_task_struct(tsk);
+		waiter.waiter = tsk;
+		list_add_tail(&waiter.link, &ictx->wb_queue);
+	} else {
+		got = true;
+	}
+	spin_unlock(&ictx->lock);
+
+	if (!got) {
+		for (;;) {
+			set_current_state(TASK_UNINTERRUPTIBLE);
+			/* Read waiter before accessing inode state. */
+			if (smp_load_acquire(&waiter.waiter) == NULL)
+				break;
+			schedule();
+		}
+	}
+	__set_current_state(TASK_RUNNING);
+	return true;
+}
+
+/**
+ * netfs_wb_begin - Begin writeback, waiting if need be
+ * @ictx: The inode to get writeback access on
+ * @nowait: Return failure immediately rather than waiting if true
+ *
+ * Begin writeback to an inode, waiting for exclusive access if @nowait is
+ * false.  This prevents collection from being done out of order with respect
+ * to the issuance of write subrequests.
+ *
+ * Note that writeback may be ended in a different process (e.g. the collection
+ * function on a workqueue) than started it.
+ *
+ * Return: True if can proceed, false if denied.
+ */
+bool netfs_wb_begin(struct netfs_inode *ictx, bool nowait)
+{
+	if (!test_and_set_bit_lock(NETFS_ICTX_WB_LOCK, &ictx->flags))
+		return true;
+	if (nowait) {
+		netfs_stat(&netfs_n_wb_lock_skip);
+		return false;
+	}
+	netfs_stat(&netfs_n_wb_lock_wait);
+	return netfs_wb_begin_wait(ictx);
+}
+EXPORT_SYMBOL(netfs_wb_begin);
+
+/* netfs_wb_end - End writeback
+ * @ictx: The inode we have writeback access to
+ *
+ * End writeback access on an inode, waking up the next writeback request.
+ */
+void netfs_wb_end(struct netfs_inode *ictx)
+{
+	struct netfs_wb_waiter *waiter;
+	struct task_struct *tsk;
+
+	WARN_ON_ONCE(!test_bit(NETFS_ICTX_WB_LOCK, &ictx->flags));
+
+	spin_lock(&ictx->lock);
+
+	waiter = list_first_entry_or_null(&ictx->wb_queue, struct netfs_wb_waiter, link);
+	if (waiter) {
+		list_del(&waiter->link);
+		tsk = waiter->waiter;
+		/* Write inode state before clearing waiter. */
+		smp_store_release(&waiter->waiter, NULL);
+		wake_up_process(tsk);
+		put_task_struct(tsk);
+	} else {
+		clear_bit_unlock(NETFS_ICTX_WB_LOCK, &ictx->flags);
+	}
+
+	spin_unlock(&ictx->lock);
+}
+EXPORT_SYMBOL(netfs_wb_end);
diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c
index 24fc2bb2f8a4..210eb8f3958d 100644
--- a/fs/netfs/write_collect.c
+++ b/fs/netfs/write_collect.c
@@ -408,6 +408,16 @@ bool netfs_write_collection(struct netfs_io_request *wreq)
 	netfs_wake_rreq_flag(wreq, NETFS_RREQ_IN_PROGRESS, netfs_rreq_trace_wake_ip);
 	/* As we cleared NETFS_RREQ_IN_PROGRESS, we acquired its ref. */
 
+	switch (wreq->origin) {
+	case NETFS_WRITEBACK:
+	case NETFS_WRITEBACK_SINGLE:
+	case NETFS_WRITETHROUGH:
+		netfs_wb_end(ictx);
+		break;
+	default:
+		break;
+	}
+
 	if (wreq->iocb) {
 		size_t written = min(wreq->transferred, wreq->len);
 		wreq->iocb->ki_pos += written;
diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
index c03c7cc45e47..437bb50ce175 100644
--- a/fs/netfs/write_issue.c
+++ b/fs/netfs/write_issue.c
@@ -551,14 +551,8 @@ int netfs_writepages(struct address_space *mapping,
 	struct folio *folio;
 	int error = 0;
 
-	if (!mutex_trylock(&ictx->wb_lock)) {
-		if (wbc->sync_mode == WB_SYNC_NONE) {
-			netfs_stat(&netfs_n_wb_lock_skip);
-			return 0;
-		}
-		netfs_stat(&netfs_n_wb_lock_wait);
-		mutex_lock(&ictx->wb_lock);
-	}
+	if (!netfs_wb_begin(ictx, wbc->sync_mode == WB_SYNC_NONE))
+		return 0;
 
 	/* Need the first folio to be able to set up the op. */
 	folio = writeback_iter(mapping, wbc, NULL, &error);
@@ -593,8 +587,6 @@ int netfs_writepages(struct address_space *mapping,
 	} while ((folio = writeback_iter(mapping, wbc, folio, &error)));
 
 	netfs_end_issue_write(wreq);
-
-	mutex_unlock(&ictx->wb_lock);
 	netfs_wake_collector(wreq);
 
 	netfs_put_request(wreq, netfs_rreq_trace_put_return);
@@ -604,7 +596,7 @@ int netfs_writepages(struct address_space *mapping,
 couldnt_start:
 	netfs_kill_dirty_pages(mapping, wbc, folio);
 out:
-	mutex_unlock(&ictx->wb_lock);
+	netfs_wb_end(ictx);
 	_leave(" = %d", error);
 	return error;
 }
@@ -618,12 +610,12 @@ struct netfs_io_request *netfs_begin_writethrough(struct kiocb *iocb, size_t len
 	struct netfs_io_request *wreq = NULL;
 	struct netfs_inode *ictx = netfs_inode(file_inode(iocb->ki_filp));
 
-	mutex_lock(&ictx->wb_lock);
+	netfs_wb_begin(ictx, false);
 
 	wreq = netfs_create_write_req(iocb->ki_filp->f_mapping, iocb->ki_filp,
 				      iocb->ki_pos, NETFS_WRITETHROUGH);
 	if (IS_ERR(wreq)) {
-		mutex_unlock(&ictx->wb_lock);
+		netfs_wb_end(ictx);
 		return wreq;
 	}
 
@@ -685,7 +677,6 @@ int netfs_advance_writethrough(struct netfs_io_request *wreq, struct writeback_c
 ssize_t netfs_end_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
 			       struct folio *writethrough_cache)
 {
-	struct netfs_inode *ictx = netfs_inode(wreq->inode);
 	ssize_t ret;
 
 	_enter("R=%x", wreq->debug_id);
@@ -699,8 +690,6 @@ ssize_t netfs_end_writethrough(struct netfs_io_request *wreq, struct writeback_c
 
 	netfs_end_issue_write(wreq);
 
-	mutex_unlock(&ictx->wb_lock);
-
 	if (wreq->iocb)
 		ret = -EIOCBQUEUED;
 	else
@@ -847,15 +836,10 @@ int netfs_writeback_single(struct address_space *mapping,
 	if (WARN_ON_ONCE(!iov_iter_is_folioq(iter)))
 		return -EIO;
 
-	if (!mutex_trylock(&ictx->wb_lock)) {
-		if (wbc->sync_mode == WB_SYNC_NONE) {
-			/* The VFS will have undirtied the inode. */
-			netfs_single_mark_inode_dirty(&ictx->inode);
-			netfs_stat(&netfs_n_wb_lock_skip);
-			return 1;
-		}
-		netfs_stat(&netfs_n_wb_lock_wait);
-		mutex_lock(&ictx->wb_lock);
+	if (!netfs_wb_begin(ictx, wbc->sync_mode == WB_SYNC_NONE)) {
+		/* The VFS will have undirtied the inode. */
+		netfs_single_mark_inode_dirty(&ictx->inode);
+		return 1;
 	}
 
 	wreq = netfs_create_write_req(mapping, NULL, 0, NETFS_WRITEBACK_SINGLE);
@@ -893,7 +877,6 @@ int netfs_writeback_single(struct address_space *mapping,
 	smp_wmb(); /* Write lists before ALL_QUEUED. */
 	set_bit(NETFS_RREQ_ALL_QUEUED, &wreq->flags);
 
-	mutex_unlock(&ictx->wb_lock);
 	netfs_wake_collector(wreq);
 
 	netfs_put_request(wreq, netfs_rreq_trace_put_return);
@@ -901,7 +884,7 @@ int netfs_writeback_single(struct address_space *mapping,
 	return ret;
 
 couldnt_start:
-	mutex_unlock(&ictx->wb_lock);
+	netfs_wb_end(ictx);
 	_leave(" = %d", ret);
 	return ret;
 }
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index bdc270e84b30..1bc120d61c5b 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -61,14 +61,16 @@ struct netfs_inode {
 #if IS_ENABLED(CONFIG_FSCACHE)
 	struct fscache_cookie	*cache;
 #endif
-	struct mutex		wb_lock;	/* Writeback serialisation */
+	struct list_head	wb_queue;	/* Queue of processes wanting to do writeback */
 	loff_t			_remote_i_size;	/* Size of the remote file */
 	loff_t			_zero_point;	/* Size after which we assume there's no data
 						 * on the server */
+	spinlock_t		lock;		/* Lock covering wb_queue */
 	atomic_t		io_count;	/* Number of outstanding reqs */
 	unsigned long		flags;
 #define NETFS_ICTX_ODIRECT	0		/* The file has DIO in progress */
 #define NETFS_ICTX_UNBUFFERED	1		/* I/O should not use the pagecache */
+#define NETFS_ICTX_WB_LOCK	2		/* Writeback serialisation lock */
 #define NETFS_ICTX_MODIFIED_ATTR 3		/* Indicate change in mtime/ctime */
 #define NETFS_ICTX_SINGLE_NO_UPLOAD 4		/* Monolithic payload, cache but no upload */
 };
@@ -462,6 +464,10 @@ int netfs_alloc_folioq_buffer(struct address_space *mapping,
 			      size_t *_cur_size, ssize_t size, gfp_t gfp);
 void netfs_free_folioq_buffer(struct folio_queue *fq);
 
+/* Writeback exclusion API. */
+bool netfs_wb_begin(struct netfs_inode *ictx, bool nowait);
+void netfs_wb_end(struct netfs_inode *ictx);
+
 /**
  * netfs_inode - Get the netfs inode context from the inode
  * @inode: The inode to query
@@ -743,7 +749,8 @@ static inline void netfs_inode_init(struct netfs_inode *ctx,
 #if IS_ENABLED(CONFIG_FSCACHE)
 	ctx->cache = NULL;
 #endif
-	mutex_init(&ctx->wb_lock);
+	INIT_LIST_HEAD(&ctx->wb_queue);
+	spin_lock_init(&ctx->lock);
 	/* ->releasepage() drives zero_point */
 	if (use_zero_point) {
 		ctx->_zero_point = ctx->_remote_i_size;


^ permalink raw reply related

* [PATCH v2 09/14] netfs: Fix kdoc warning
From: David Howells @ 2026-06-24 11:57 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
	ceph-devel, linux-fsdevel, linux-kernel, Matthew Wilcox
In-Reply-To: <20260624115737.2964520-1-dhowells@redhat.com>

Fix a kdoc warning due to a misnamed parameter in the description.

Reported-by: Matthew Wilcox <willy@infradead.org>
cc: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 include/linux/netfs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 243c0f737938..bdc270e84b30 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -753,7 +753,7 @@ static inline void netfs_inode_init(struct netfs_inode *ctx,
 
 /**
  * netfs_resize_file - Note that a file got resized
- * @ctx: The netfs inode being resized
+ * @ictx: The netfs inode being resized
  * @new_i_size: The new file size
  * @changed_on_server: The change was applied to the server
  *


^ permalink raw reply related

* [PATCH v2 08/14] scatterlist: Fix offset in folio calc in extract_xarray_to_sg()
From: David Howells @ 2026-06-24 11:57 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
	ceph-devel, linux-fsdevel, linux-kernel, Matthew Wilcox,
	Christoph Hellwig, Jens Axboe, Mike Marshall
In-Reply-To: <20260624115737.2964520-1-dhowells@redhat.com>

Fix the calculation of the offset in the folio being extracted in
extract_xarray_to_sg().

Note that in the near future, ITER_XARRAY should be removed.

Fixes: f5f82cd18732 ("Move netfs_extract_iter_to_sg() to lib/scatterlist.c")
Link: https://sashiko.dev/#/patchset/20260608145432.681865-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: Christoph Hellwig <hch@infradead.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Mike Marshall <hubcap@omnibond.com>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 lib/scatterlist.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index b7fe91ef35b8..6ea40d2e6247 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -1366,6 +1366,7 @@ static ssize_t extract_xarray_to_sg(struct iov_iter *iter,
 		sg_max--;
 
 		maxsize -= len;
+		start += len;
 		ret += len;
 		if (maxsize <= 0 || sg_max == 0)
 			break;


^ permalink raw reply related

* [PATCH v2 07/14] iov_iter: Remove unused variable in kunit_iov_iter.c
From: David Howells @ 2026-06-24 11:57 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
	ceph-devel, linux-fsdevel, linux-kernel, Ming Lei, Matthew Wilcox,
	Christoph Hellwig, Jens Axboe
In-Reply-To: <20260624115737.2964520-1-dhowells@redhat.com>

Remove the no longer used variable 'b' from iov_kunit_copy_to_bvec().  The
variable is initialised and incremented, but nothing now makes use of the
value.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Ming Lei <ming.lei@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: Christoph Hellwig <hch@infradead.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 lib/tests/kunit_iov_iter.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/lib/tests/kunit_iov_iter.c b/lib/tests/kunit_iov_iter.c
index f02f7b7aa796..e7e154f94f66 100644
--- a/lib/tests/kunit_iov_iter.c
+++ b/lib/tests/kunit_iov_iter.c
@@ -275,7 +275,7 @@ static void __init iov_kunit_copy_to_bvec(struct kunit *test)
 	struct page **spages, **bpages;
 	u8 *scratch, *buffer;
 	size_t bufsize, npages, size, copied;
-	int i, b, patt;
+	int i, patt;
 
 	bufsize = 0x100000;
 	npages = bufsize / PAGE_SIZE;
@@ -298,10 +298,9 @@ static void __init iov_kunit_copy_to_bvec(struct kunit *test)
 	KUNIT_EXPECT_EQ(test, iter.nr_segs, 0);
 
 	/* Build the expected image in the scratch buffer. */
-	b = 0;
 	patt = 0;
 	memset(scratch, 0, bufsize);
-	for (pr = bvec_test_ranges; pr->from >= 0; pr++, b++) {
+	for (pr = bvec_test_ranges; pr->from >= 0; pr++) {
 		u8 *p = scratch + pr->page * PAGE_SIZE;
 
 		for (i = pr->from; i < pr->to; i++)


^ permalink raw reply related

* [PATCH v2 06/14] iov_iter: Fix a memory leak in iov_iter_extract_user_pages()
From: David Howells @ 2026-06-24 11:57 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
	ceph-devel, linux-fsdevel, linux-kernel, Matthew Wilcox,
	Christoph Hellwig, Jens Axboe
In-Reply-To: <20260624115737.2964520-1-dhowells@redhat.com>

There's a potential memory leak in callers of iov_iter_extract_user_pages()
whereby if a pages array is allocated in function, it isn't freed before
returning of an error or 0.

Now, it's not a leak per se in iov_iter_extract_user_pages() as, if an
array is allocated, it's returned through *pages, so it's incumbent on the
caller to free it.  However, not all callers do.

Fix this by freeing the table and clearing *pages before returning an error
or 0.  Note that iov_iter_extract_pages() and its subfunctions are allowed
to return 0 without returning an array (for instance if the iterator count
is 0).

Fixes: 7d58fe731028 ("iov_iter: Add a function to extract a page list from an iterator")
Closes: https://sashiko.dev/#/patchset/20260616100821.2062304-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: Christoph Hellwig <hch@infradead.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 lib/iov_iter.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 1661c237b643..5a6434709716 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1756,6 +1756,7 @@ static ssize_t iov_iter_extract_user_pages(struct iov_iter *i,
 	unsigned long addr;
 	unsigned int gup_flags = 0;
 	size_t offset;
+	bool will_alloc = !*pages;
 	int res;
 
 	if (i->data_source == ITER_DEST)
@@ -1772,8 +1773,14 @@ static ssize_t iov_iter_extract_user_pages(struct iov_iter *i,
 	if (!maxpages)
 		return -ENOMEM;
 	res = pin_user_pages_fast(addr, maxpages, gup_flags, *pages);
-	if (unlikely(res <= 0))
+	if (unlikely(res <= 0)) {
+		if (will_alloc) {
+			kvfree(*pages);
+			*pages = NULL;
+		}
 		return res;
+	}
+
 	maxsize = min_t(size_t, maxsize, res * PAGE_SIZE - offset);
 	iov_iter_advance(i, maxsize);
 	return maxsize;


^ permalink raw reply related

* [PATCH v2 05/14] iov_iter: Fix missing alloc fail check in iov_iter_extract_bvec_pages()
From: David Howells @ 2026-06-24 11:57 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
	ceph-devel, linux-fsdevel, linux-kernel, Ming Lei, Matthew Wilcox,
	Christoph Hellwig, Jens Axboe
In-Reply-To: <20260624115737.2964520-1-dhowells@redhat.com>

Fix iov_iter_extract_bvec_pages() to check if want_pages_array() fails and,
if so, return -ENOMEM appropriately.

Fixes: e4e535bff2bc ("iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages")
Link: https://sashiko.dev/#/patchset/20260608145432.681865-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Ming Lei <ming.lei@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: Christoph Hellwig <hch@infradead.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 lib/iov_iter.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 5f60733f3280..1661c237b643 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1637,6 +1637,8 @@ static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i,
 	bi.bi_bvec_done = skip;
 
 	maxpages = want_pages_array(pages, maxsize, skip, maxpages);
+	if (!maxpages)
+		return -ENOMEM;
 
 	while (bi.bi_size && bi.bi_idx < i->nr_segs) {
 		struct bio_vec bv = bvec_iter_bvec(i->bvec, bi);


^ permalink raw reply related

* [PATCH v2 04/14] iov_iter: Fix potential underflow in iov_iter_extract_xarray_pages()
From: David Howells @ 2026-06-24 11:57 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
	ceph-devel, linux-fsdevel, linux-kernel, Matthew Wilcox,
	Christoph Hellwig, Jens Axboe, Mike Marshall
In-Reply-To: <20260624115737.2964520-1-dhowells@redhat.com>

In iov_iter_extract_xarray_pages(), if no pages are extracted because
there's a hole (or something otherwise unextractable) in the xarray, then
the calculation of maxsize at the end can go wrong if the starting offset
is not zero.

Fix this by returning 0 in such a case and freeing the page array if
allocated here rather than being passed in.

Note that in the near future, ITER_XARRAY should be removed.

Fixes: 7d58fe731028 ("iov_iter: Add a function to extract a page list from an iterator")
Link: https://sashiko.dev/#/patchset/20260608145432.681865-1-dhowells%40redhat.com
Link: https://sashiko.dev/#/patchset/20260616100821.2062304-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: Christoph Hellwig <hch@infradead.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Mike Marshall <hubcap@omnibond.com>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 lib/iov_iter.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 243662af1af7..5f60733f3280 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1568,6 +1568,7 @@ static ssize_t iov_iter_extract_xarray_pages(struct iov_iter *i,
 	struct folio *folio;
 	unsigned int nr = 0, offset;
 	loff_t pos = i->xarray_start + i->iov_offset;
+	bool will_alloc = !*pages;
 	XA_STATE(xas, i->xarray, pos >> PAGE_SHIFT);
 
 	offset = pos & ~PAGE_MASK;
@@ -1595,6 +1596,14 @@ static ssize_t iov_iter_extract_xarray_pages(struct iov_iter *i,
 	}
 	rcu_read_unlock();
 
+	if (!nr) {
+		if (will_alloc) {
+			kvfree(*pages);
+			*pages = NULL;
+		}
+		return 0;
+	}
+
 	maxsize = min_t(size_t, nr * PAGE_SIZE - offset, maxsize);
 	iov_iter_advance(i, maxsize);
 	return maxsize;


^ permalink raw reply related

* [PATCH v2 03/14] cachefiles: Fix file burial to take lock when unsetting S_KERNEL_FILE
From: David Howells @ 2026-06-24 11:57 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
	ceph-devel, linux-fsdevel, linux-kernel, NeilBrown
In-Reply-To: <20260624115737.2964520-1-dhowells@redhat.com>

Fix cachefiles_bury_object() to lock the inode of the file being buried
whilst it unsets the S_KERNEL_FILE flag.

Fixes: 07a90e97400c ("cachefiles: Implement culling daemon commands")
Closes: https://sashiko.dev/#/patchset/20260616100821.2062304-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: NeilBrown <neil@brown.name>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/cachefiles/namei.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index a464c4646c04..d4309dfb55df 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -375,7 +375,7 @@ int cachefiles_bury_object(struct cachefiles_cache *cache,
 					    "Rename failed with error %d", ret);
 	}
 
-	__cachefiles_unmark_inode_in_use(object, d_inode(rep));
+	cachefiles_do_unmark_inode_in_use(object, d_inode(rep));
 	end_renaming(&rd);
 	_leave(" = 0");
 	return 0;


^ permalink raw reply related

* [PATCH v2 02/14] cachefiles: Fix double fput
From: David Howells @ 2026-06-24 11:57 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
	ceph-devel, linux-fsdevel, linux-kernel
In-Reply-To: <20260624115737.2964520-1-dhowells@redhat.com>

Fix a double fput() in error handling in cachefiles_create_tmpfile().

Link: https://sashiko.dev/#/patchset/20260608145432.681865-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/cachefiles/namei.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 2937db690b40..a464c4646c04 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -467,7 +467,6 @@ struct file *cachefiles_create_tmpfile(struct cachefiles_object *object)
 	ret = -EINVAL;
 	if (unlikely(!file->f_op->read_iter) ||
 	    unlikely(!file->f_op->write_iter)) {
-		fput(file);
 		pr_notice("Cache does not support read_iter and write_iter\n");
 		goto err_unuse;
 	}


^ permalink raw reply related

* [PATCH v2 01/14] netfs: Fix decision whether to disallow write-streaming due to fscache use
From: David Howells @ 2026-06-24 11:57 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
	ceph-devel, linux-fsdevel, linux-kernel, Marc Dionne
In-Reply-To: <20260624115737.2964520-1-dhowells@redhat.com>

netfs_perform_write() buffers data by writing it into the pagecache for
later writeback.  If the folio it wants to write to isn't present, it uses
"write streaming" in which is will store partial data in a non-uptodate,
but dirty folio.

However, when fscache is in use, this is a potential problem as writes to
the cache have to be aligned to the cache backend's DIO granularity, and so
netfs_perform_write() attempts to suppress write-streaming in such a case,
requiring the folio content to be fetched first unless the entire folio is
going to be overwritten.  This allows the content to be written to the
cache too.

Unfortunately, the test netfs_perform_write() uses isn't correct because it
doesn't take into account the fact that the object lookup is asynchronous
and farmed off to a work queue, so there's a short window in which the
cache is doing a lookup but the test fails because the answer is undefined.

This can be triggered by the generic/464 xfstest, and causes a warning to
be emitted in cachefiles (in code not yet upstream) because it sees a write
that doesn't have its bounds rounded out to DIO alignment.

Fix this by changing the condition to whether FSCACHE_COOKIE_IS_CACHING is
set on a cookie rather than whether the cookie is marked enabled.  Note
that this is really just a hint as to whether we allow write streaming or
not and no other aspects of the cookie or cache object are accessed.

Reported-by: Marc Dionne <marc.dionne@auristor.com>
cc: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/buffered_write.c |  2 +-
 fs/netfs/internal.h       | 12 ++++++++++++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 6bde3320bcec..2cdb68e6b16f 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -277,7 +277,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
 		 * caching service temporarily because the backing store got
 		 * culled.
 		 */
-		if (netfs_is_cache_enabled(ctx)) {
+		if (netfs_is_cache_maybe_enabled(ctx)) {
 			if (finfo) {
 				netfs_stat(&netfs_n_wh_wstream_conflict);
 				goto flush_content;
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index 645996ecfc80..d889caa401dc 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -239,6 +239,18 @@ static inline bool netfs_is_cache_enabled(struct netfs_inode *ctx)
 #endif
 }
 
+static inline bool netfs_is_cache_maybe_enabled(struct netfs_inode *ctx)
+{
+#if IS_ENABLED(CONFIG_FSCACHE)
+	struct fscache_cookie *cookie = ctx->cache;
+
+	return fscache_cookie_valid(cookie) &&
+		test_bit(FSCACHE_COOKIE_IS_CACHING, &cookie->flags);
+#else
+	return false;
+#endif
+}
+
 /*
  * Get a ref on a netfs group attached to a dirty page (e.g. a ceph snap).
  */


^ permalink raw reply related

* [PATCH v2 00/14] netfs: Miscellaneous fixes
From: David Howells @ 2026-06-24 11:57 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
	ceph-devel, linux-fsdevel, linux-kernel

Hi Christian,

Here are some miscellaneous fixes for netfslib.  I separated them from my
netfs-next branch.  Various Sashiko review comments[1][2][3] are addressed:

 (1) Fix the decision whether to disallow write-streaming due to fscache
     use.

 (2) Fix a double fput in cachefiles_create_tmpfile().

 (3) Fix alteration of S_KERNEL_FILE inode flag without holding inode lock.

 (4) Fix a potential mathematical underflow in
     iov_iter_extract_xarray_pages() and make it return 0 and free the
     array if no pages could be extracted.

 (5) Fix a missing alloc failure check in iov_iter_extract_bvec_pages().

 (6) Fix iov_iter_extract_user_pages() so that it doesn't leak the pages
     array if it returns an error or 0 (inasmuch as the leak is really in
     the callers).

 (7) Remove an unused variable in kunit_iov_iter.c.

 (8) Fix extract_xarray_to_sg() to calculate folio offset correctly.

 (9) Fix a kdoc comment.

(10) Replace the netfs_inode::wb_lock mutex with a bit lock so that the
     lock can be passed to the collector so that multiple asynchronous
     writebacks won't interfere with each other.

(11) Fix writeback error handling to go through writeback_iter() so that it
     can clean up its state.

(12) Fix ENOMEM handling in writeback to clean up the current folio if we
     can't allocate a rolling buffer segment.

(13) Fix unbuffered/DIO write retry for filesystems that don't have a
     ->prepare_write() method.

The patches can also be found here:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-fixes

Thanks,
David

[1] https://sashiko.dev/#/patchset/20260608145432.681865-1-dhowells%40redhat.com
[2] https://sashiko.dev/#/patchset/20260616100821.2062304-1-dhowells%40redhat.com
[3] https://sashiko.dev/#/patchset/20260619140646.2633762-1-dhowells%40redhat.com

Changes
=======
ver #2)
- Addressed various Sashiko comments[3]:
  - Changed some kfree() to kvfree().
  - Added some writeback error handling patches.
- Fixed DIO write retry.

ver #1)
- Added a fix for S_KERNEL_FILE in cachefiles_bury_object().
- Fixed how iov_iter_extract_*_pages() deals with *pages when returning 0
  or error.
- Fixed the replacement of wb_lock with bit lock:
  - Only release the bit lock for writeback, writethrough and write-single,
    not for PG_private_2-based I/O, which doesn't take the lock.
  - In netfs_writeback_single(), need to redirty the inode if can't get the
    lock.

David Howells (14):
  netfs: Fix decision whether to disallow write-streaming due to fscache
    use
  cachefiles: Fix double fput
  cachefiles: Fix file burial to take lock when unsetting S_KERNEL_FILE
  iov_iter: Fix potential underflow in iov_iter_extract_xarray_pages()
  iov_iter: Fix missing alloc fail check in
    iov_iter_extract_bvec_pages()
  iov_iter: Fix a memory leak in iov_iter_extract_user_pages()
  iov_iter: Remove unused variable in kunit_iov_iter.c
  scatterlist: Fix offset in folio calc in extract_xarray_to_sg()
  netfs: Fix kdoc warning
  netfs: Replace wb_lock with a bit lock for asynchronicity
  netfs: Fix writethrough to use collection offload
  netfs: Fix writeback error handling
  netfs: Fix folio state after ENOMEM whilst under writeback iteration
  netfs: Fix DIO write retry for filesystems without a ->prepare_write()

 fs/afs/symlink.c           |  4 +-
 fs/cachefiles/namei.c      |  3 +-
 fs/netfs/buffered_write.c  |  2 +-
 fs/netfs/direct_write.c    |  6 ---
 fs/netfs/internal.h        | 12 +++++
 fs/netfs/locking.c         | 95 ++++++++++++++++++++++++++++++++++++++
 fs/netfs/write_collect.c   | 10 ++++
 fs/netfs/write_issue.c     | 48 ++++++++-----------
 include/linux/netfs.h      | 13 ++++--
 lib/iov_iter.c             | 20 +++++++-
 lib/scatterlist.c          |  1 +
 lib/tests/kunit_iov_iter.c |  5 +-
 12 files changed, 171 insertions(+), 48 deletions(-)


^ permalink raw reply

* Re: [PATCH] netfs: Fix UAF in netfs_unbuffered_write() on failed preparation
From: David Howells @ 2026-06-24 11:02 UTC (permalink / raw)
  To: ChenXiaoSong
  Cc: dhowells, hongao, Paulo Alcantara, netfs, linux-fsdevel,
	linux-kernel, syzbot+3c74b1f0c372e98efc32, Steve French,
	Namjae Jeon, linux-cifs@vger.kernel.org
In-Reply-To: <498534d3-e82e-40ce-bc7d-230580b2fcae@chenxiaosong.com>

I suspect the issue is this bit in netfs_unbuffered_write():

	for (;;) {
		...
		netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);

		if (stream->prepare_write) {
			stream->prepare_write(subreq);
			__set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
			netfs_stat(&netfs_n_wh_retry_write_subreq);
		} else {
			struct iov_iter source;

			netfs_reset_iter(subreq);
			source = subreq->io_iter;
			netfs_reissue_write(stream, subreq, &source); <----
		}
	}

This doesn't happen with AFS because it has a ->prepare_write() method.  Does
this change fix the problem for you?

diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index 25f8ceb15fad..9f6258da45d6 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -190,12 +190,6 @@ static int netfs_unbuffered_write(struct netfs_io_request *wreq)
 			stream->prepare_write(subreq);
 			__set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
 			netfs_stat(&netfs_n_wh_retry_write_subreq);
-		} else {
-			struct iov_iter source;
-
-			netfs_reset_iter(subreq);
-			source = subreq->io_iter;
-			netfs_reissue_write(stream, subreq, &source);
 		}
 	}
 

David


^ permalink raw reply related

* Re: [PATCH] netfs: Fix UAF in netfs_unbuffered_write() on failed preparation
From: David Howells @ 2026-06-24 10:59 UTC (permalink / raw)
  To: ChenXiaoSong
  Cc: dhowells, hongao, Paulo Alcantara, netfs, linux-fsdevel,
	linux-kernel, syzbot+3c74b1f0c372e98efc32, Steve French,
	Namjae Jeon, linux-cifs@vger.kernel.org
In-Reply-To: <498534d3-e82e-40ce-bc7d-230580b2fcae@chenxiaosong.com>

Actually, this shows what happened:

 a.out-717 : netfs_sreq: R=00000001[13] UPLD RETRY f=210 s=240000 1fac0/20000 s=160 e=-11
 a.out-717 : netfs_sreq: R=00000001[13] UPLD SUBMT f=100 s=240000 0/dbfac0 s=192 e=0
 a.out-717 : netfs_failure: R=00000001[13] UPLD f=500 s=240000 0/dbfac0 write e=-5
 a.out-717 : netfs_rreq: R=00000001 UW PAUSE   f=1801
 a.out-717 : netfs_sreq: R=00000001[13] UPLD TERM  f=500 s=240000 0/dbfac0 s=192 e=-5
 a.out-717 : netfs_rreq: R=00000001 UW WAKE-Q  f=1805
 a.out-717 : netfs_failure: R=00000001[13] UPLD f=400 s=240000 0/dbfac0 write e=-5
 a.out-717 : netfs_rreq: R=00000001 UW PAUSE   f=1805
 a.out-717 : netfs_sreq: R=00000001[13] UPLD TERM  f=400 s=240000 0/dbfac0 s=192 e=-5
 a.out-717 : netfs_rreq: R=00000001 UW WAKE-Q  f=1805
 a.out-717 : netfs_sreq: R=00000001[13] UPLD FREE  f=400 s=240000 0/dbfac0 s=192 e=-5

There are two calls to netfs_write_subrequest_terminated().  You can see the
IN_PROGRESS bit has been cleared by the second one (f=500 -> f=400).

David


^ permalink raw reply

* Re: Request CVE for ksmbd authenticated permission bypass in FSCTL_SET_SPARSE handle
From: Greg KH @ 2026-06-24  9:57 UTC (permalink / raw)
  To: grayhat@foxmail.com; +Cc: cve, sfrench, namjae.jeon, linux-cifs
In-Reply-To: <tencent_32075C41A32AB2B941A3641C72794B4B6408@qq.com>

On Wed, Jun 24, 2026 at 05:53:48PM +0800, grayhat@foxmail.com wrote:
> Hi CNA team,
> 
> I am writing to request a CVE ID for a permission bypass vulnerability in the ksmbd FSCTL_SET_SPARSE handler. A fix for this issue has been merged into the mainline Linux kernel and backported to stable kernels, but no corresponding CVE has yet been assigned.
> 
> Vulnerability classification: CWE-274 (Improper Handling of Insufficient Privileges)
> Commit references:
> - Mainline fix commit: cc57232cae23c0df91b4a59d0f519141ce9b5b02 ("ksmbd: fix FSCTL permission bypass by adding a permission check for FSCTL_SET_SPARSE")
> - Stable backport releases: v7.0.12, v6.18.35, v6.6.143 LTS

Now assigned CVE-2026-52944

Hope this helps,

greg k-h

^ permalink raw reply

* Request CVE for ksmbd authenticated permission bypass in FSCTL_SET_SPARSE handle
From: grayhat @ 2026-06-24  9:53 UTC (permalink / raw)
  To: cve; +Cc: sfrench, namjae.jeon, linux-cifs

Hi CNA team,

I am writing to request a CVE ID for a permission bypass vulnerability in the ksmbd FSCTL_SET_SPARSE handler. A fix for this issue has been merged into the mainline Linux kernel and backported to stable kernels, but no corresponding CVE has yet been assigned.

Vulnerability classification: CWE-274 (Improper Handling of Insufficient Privileges)
Commit references:
- Mainline fix commit: cc57232cae23c0df91b4a59d0f519141ce9b5b02 ("ksmbd: fix FSCTL permission bypass by adding a permission check for FSCTL_SET_SPARSE")
- Stable backport releases: v7.0.12, v6.18.35, v6.6.143 LTS

Note: ksmbd features are developed in a standalone Samba repository and merged into mainline Linux in periodic bulk pull requests. The FSCTL_SET_SPARSE handler landed via one such batch merge, so there is no dedicated mainline commit solely introducing this handler. Correspondingly, the fix commit does not carry a precise Fixes: tag referencing a mainline introduction commit.

The ksmbd FSCTL_SET_SPARSE handler is missing both share-level and handle-level write permission checks that are enforced for other write-sensitive FSCTL operations.

Within smb2_ioctl() (fs/smb/server/smb2pdu.c), the branch handling FSCTL_SET_SPARSE invokes fsctl_set_sparse() without validating two critical access conditions:
1. Share writable flag: test_tree_conn_flag(work->tcon, KSMBD_TREE_CONN_FLAG_WRITABLE)
2. Per-handle write access: fp->daccess & (FILE_WRITE_DATA_LE | FILE_WRITE_ATTRIBUTES_LE)

For comparison, related destructive operations such as FSCTL_COPYCHUNK and FSCTL_SET_ZERO_DATA both explicitly check KSMBD_TREE_CONN_FLAG_WRITABLE, while FSCTL_SET_SPARSE omits this protection entirely.

Internally, fsctl_set_sparse() modifies the in-memory file attribute flag fp->f_ci->m_fattr. When the Samba share option `store dos attributes = yes` is enabled, it also persists DOS attribute state via vfs_setxattr(). Unlike vfs_write() or vfs_fallocate(), vfs_setxattr() does not validate FMODE_WRITE at the VFS layer, meaning there is no underlying kernel enforcement to compensate for missing permission checks inside ksmbd.

As a result, an authenticated SMB client with read-only access to a share is able to alter the sparse attribute of arbitrary files. This bypasses both share-level read-only restrictions and file-handle write access requirements.

This behavior violates the MS-FSA specification section §2.1.5.10.38:
"If Open.GrantedAccess.FILE_WRITE_DATA is FALSE and Open.GrantedAccess.FILE_WRITE_ATTRIBUTES is FALSE, the operation MUST be failed with STATUS_ACCESS_DENIED."
ksmbd does not implement this mandatory access control check.

Thanks,
Sean Shen
grayhat@foxmail.com

^ permalink raw reply

* Re: [f2fs-dev] [PATCH v8 00/17] Subject: Exposing case folding behavior
From: patchwork-bot+f2fs @ 2026-06-24  8:59 UTC (permalink / raw)
  To: Chuck Lever
  Cc: viro, brauner, jack, pc, yuezhang.mo, cem, almaz.alexandrovich,
	adilger.kernel, linux-cifs, sfrench, slava, linux-ext4,
	linkinjeon, sprasad, frank.li, ronniesahlberg, glaubitz, jaegeuk,
	hirofumi, linux-nfs, tytso, linux-api, linux-f2fs-devel,
	linux-xfs, senozhatsky, chuck.lever, hansg, anna, linux-fsdevel,
	sj1557.seo, trondmy
In-Reply-To: <20260217214741.1928576-1-cel@kernel.org>

Hello:

This series was applied to jaegeuk/f2fs.git (dev)
by Christian Brauner <brauner@kernel.org>:

On Tue, 17 Feb 2026 16:47:24 -0500 you wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> Following on from
> 
> https://lore.kernel.org/linux-nfs/20251021-zypressen-bazillus-545a44af57fd@brauner/T/#m0ba197d75b7921d994cf284f3cef3a62abb11aaa
> 
> I'm attempting to implement enough support in the Linux VFS to
> enable file services like NFSD and ksmbd (and user space
> equivalents) to provide the actual status of case folding support
> in local file systems. The default behavior for local file systems
> not explicitly supported in this series is to reflect the usual
> POSIX behaviors:
> 
> [...]

Here is the summary with links:
  - [f2fs-dev,v8,01/17] fs: Move file_kattr initialization to callers
    (no matching commit)
  - [f2fs-dev,v8,02/17] fs: Add case sensitivity flags to file_kattr
    https://git.kernel.org/jaegeuk/f2fs/c/3035e4454142
  - [f2fs-dev,v8,03/17] fat: Implement fileattr_get for case sensitivity
    (no matching commit)
  - [f2fs-dev,v8,04/17] exfat: Implement fileattr_get for case sensitivity
    (no matching commit)
  - [f2fs-dev,v8,05/17] ntfs3: Implement fileattr_get for case sensitivity
    (no matching commit)
  - [f2fs-dev,v8,06/17] hfs: Implement fileattr_get for case sensitivity
    (no matching commit)
  - [f2fs-dev,v8,07/17] hfsplus: Report case sensitivity in fileattr_get
    (no matching commit)
  - [f2fs-dev,v8,08/17] ext4: Report case sensitivity in fileattr_get
    (no matching commit)
  - [f2fs-dev,v8,09/17] xfs: Report case sensitivity in fileattr_get
    (no matching commit)
  - [f2fs-dev,v8,10/17] cifs: Implement fileattr_get for case sensitivity
    (no matching commit)
  - [f2fs-dev,v8,11/17] nfs: Implement fileattr_get for case sensitivity
    (no matching commit)
  - [f2fs-dev,v8,12/17] f2fs: Add case sensitivity reporting to fileattr_get
    (no matching commit)
  - [f2fs-dev,v8,13/17] vboxsf: Implement fileattr_get for case sensitivity
    (no matching commit)
  - [f2fs-dev,v8,14/17] isofs: Implement fileattr_get for case sensitivity
    (no matching commit)
  - [f2fs-dev,v8,15/17] nfsd: Report export case-folding via NFSv3 PATHCONF
    (no matching commit)
  - [f2fs-dev,v8,16/17] nfsd: Implement NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING
    (no matching commit)
  - [f2fs-dev,v8,17/17] ksmbd: Report filesystem case sensitivity via FS_ATTRIBUTE_INFORMATION
    (no matching commit)

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [f2fs-dev] [PATCH v14 00/15] Exposing case folding behavior
From: patchwork-bot+f2fs @ 2026-06-24  8:59 UTC (permalink / raw)
  To: Chuck Lever
  Cc: viro, brauner, jack, pc, yuezhang.mo, cem, roland.mainz,
	almaz.alexandrovich, adilger.kernel, linux-cifs, sfrench, slava,
	djwong, linux-ext4, linkinjeon, stfrench, sprasad, frank.li,
	ronniesahlberg, glaubitz, jaegeuk, hirofumi, linux-nfs, tytso,
	linux-api, linux-f2fs-devel, linux-xfs, senozhatsky, chuck.lever,
	hansg, anna, linux-fsdevel, sj1557.seo, trondmy
In-Reply-To: <20260507-case-sensitivity-v14-0-e62cc8200435@oracle.com>

Hello:

This series was applied to jaegeuk/f2fs.git (dev)
by Christian Brauner <brauner@kernel.org>:

On Thu, 07 May 2026 04:52:53 -0400 you wrote:
> Christian, let's lock this one in. I will post subsequent changes
> as delta patches.
> 
> Following on from:
> 
> https://lore.kernel.org/linux-nfs/20251021-zypressen-bazillus-545a44af57fd@brauner/T/#m0ba197d75b7921d994cf284f3cef3a62abb11aaa
> 
> [...]

Here is the summary with links:
  - [f2fs-dev,v14,01/15] fs: Move file_kattr initialization to callers
    https://git.kernel.org/jaegeuk/f2fs/c/14c3197ecf07
  - [f2fs-dev,v14,02/15] fs: Add case sensitivity flags to file_kattr
    https://git.kernel.org/jaegeuk/f2fs/c/3035e4454142
  - [f2fs-dev,v14,03/15] fat: Implement fileattr_get for case sensitivity
    https://git.kernel.org/jaegeuk/f2fs/c/c92db2ca726f
  - [f2fs-dev,v14,04/15] exfat: Implement fileattr_get for case sensitivity
    https://git.kernel.org/jaegeuk/f2fs/c/27e0b573dd4a
  - [f2fs-dev,v14,05/15] ntfs3: Implement fileattr_get for case sensitivity
    https://git.kernel.org/jaegeuk/f2fs/c/eeb7b37b9700
  - [f2fs-dev,v14,06/15] hfs: Implement fileattr_get for case sensitivity
    https://git.kernel.org/jaegeuk/f2fs/c/b6fe046c3023
  - [f2fs-dev,v14,07/15] hfsplus: Report case sensitivity in fileattr_get
    https://git.kernel.org/jaegeuk/f2fs/c/a6469a15eefe
  - [f2fs-dev,v14,08/15] xfs: Report case sensitivity in fileattr_get
    https://git.kernel.org/jaegeuk/f2fs/c/c9da43e4e5c3
  - [f2fs-dev,v14,09/15] cifs: Implement fileattr_get for case sensitivity
    https://git.kernel.org/jaegeuk/f2fs/c/e50bc12f5a36
  - [f2fs-dev,v14,10/15] nfs: Implement fileattr_get for case sensitivity
    https://git.kernel.org/jaegeuk/f2fs/c/92d67628a1a9
  - [f2fs-dev,v14,11/15] vboxsf: Implement fileattr_get for case sensitivity
    https://git.kernel.org/jaegeuk/f2fs/c/ef14aa143f1d
  - [f2fs-dev,v14,12/15] isofs: Implement fileattr_get for case sensitivity
    https://git.kernel.org/jaegeuk/f2fs/c/7bbd51b1d748
  - [f2fs-dev,v14,13/15] nfsd: Report export case-folding via NFSv3 PATHCONF
    https://git.kernel.org/jaegeuk/f2fs/c/211cb2ba4877
  - [f2fs-dev,v14,14/15] nfsd: Implement NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING
    https://git.kernel.org/jaegeuk/f2fs/c/01ee7c3d2e23
  - [f2fs-dev,v14,15/15] ksmbd: Report filesystem case sensitivity via FS_ATTRIBUTE_INFORMATION
    https://git.kernel.org/jaegeuk/f2fs/c/0164df1d1de7

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH] netfs: Fix UAF in netfs_unbuffered_write() on failed preparation
From: David Howells @ 2026-06-24  7:36 UTC (permalink / raw)
  To: ChenXiaoSong
  Cc: dhowells, hongao, Paulo Alcantara, netfs, linux-fsdevel,
	linux-kernel, syzbot+3c74b1f0c372e98efc32, Steve French,
	Namjae Jeon, linux-cifs@vger.kernel.org
In-Reply-To: <498534d3-e82e-40ce-bc7d-230580b2fcae@chenxiaosong.com>

This excerpt would seem to show how the problem comes about:

  a.out-717 : netfs_sreq: R=00000001[13] UPLD PREP  f=000 s=240000 0/0 s=192 e=0
  a.out-717 : netfs_sreq: R=00000001[13] UPLD SUBMT f=100 s=240000 0/20000 s=192 e=0
  a.out-717 : netfs_sreq: R=00000001[13] UPLD TERM  f=310 s=240000 1fac0/20000 s=160 e=0
  a.out-717 : netfs_rreq: R=00000001 UW WAKE-Q  f=1801
  a.out-717 : netfs_rreq: R=00000001 UW DONE-QUIESCE f=1801
  a.out-717 : netfs_sreq: R=00000001[13] UPLD RETRY f=210 s=240000 1fac0/20000 s=160 e=-11
  a.out-717 : netfs_sreq: R=00000001[13] UPLD SUBMT f=100 s=240000 0/dbfac0 s=192 e=0
  a.out-717 : netfs_failure: R=00000001[13] UPLD f=500 s=240000 0/dbfac0 write e=-5

It looks like there are two issues: firstly, why does the upload stop short?
Presumably that's forced by the test.  Secondly, I think it has miscalculated
the subreq size in the retry, hence the EIO.

David


^ permalink raw reply

* Re: Subject: Re: [PATCH] fix smb client defer close causes file
From: Chunjie Zhu @ 2026-06-24  7:19 UTC (permalink / raw)
  To: ematsumiya
  Cc: chunjie.zhu, linux-cifs, linux-kernel, lsahlber, pc,
	samba-technical, sfrench, smfrench, sprasad, tom
In-Reply-To: <ajtYUL_7qdzu21mx@suse.de>

It looks like it's not enough to mark a reused file handle by only file path.
Is there any plan to update it?
 
> As for the original bug report, do you happen to have a simple
> reproducer?
> 
> I created one on top of your instructions and I'm definitely getting a
> lease break on rename, which IIUC is the alleged root cause.
> This was with closetimeo=30 on a Windows Server 2022 share.

4 clients shared mount looks simple. Currently I do not have a simpler
reproduce step or script.

> Also, I'm failing to understand how cache= has any impact on this -- I
> get the same successful results with none,loose,strict.

In real tests with cache=none, we cannot see the symptom, by disabling
cache/buffer, client goes back to the server more directly, and stale cached
data/metadata windows shrink.

> 
> Cheers,
> 
> Enzo
> 
> >> Other narrower client fixes for this presumably could be done, e.g.
> >> forcing close on the deferred close when rename a hardlink.
> >>
> >> On Mon, Jun 22, 2026 at 4:40=E2=80=AFAM Chunjie Zhu <chunjie.zhu@citrix.com=
> >> > wrote:
> >> >
> >> > Test environment
> >> >
> >> >  4 hosts as smb client, 1 host as smb server
> >> >  smb client hosts, kernel 6.6.138
> >> >  mount options,
> >> >   //10.70.48.15/xxx /run/xxx cifs rw,relatime,vers=3D3.0,
> >> >   cache=3Dloose,username=3Dxxx,domain=3Dxxx,uid=3D0,noforceuid,
> >> >   gid=3D0,noforcegid,addr=3D10.70.48.15,file_mode=3D0755,
> >> >   dir_mode=3D0755,soft,nounix,serverino,mapposix,reparse=3Dnfs,
> >> >   rsize=3D1048576,wsize=3D1048576,bsize=3D1048576,echo_interval=3D60,
> >> >   actimeo=3D0,closetimeo=3D1
> >> >
> >> > Work around
> >> >
> >> >  mount with cache=3Dnone or closetimeo=3D0
> >> >
> >> > The Race Condition Flow
> >> >
> >> >  Step 1: Host-01 closes file
> >> >
> >> >   Host-01:
> >> >    file close (eeefe8d0.vhd)
> >> >    -> CIFS defers SMB2 CLOSE
> >> >    -> Handle H1 stored in deferred_closes list
> >> >    -> Lease L1 (RWH or RH) still active on server
> >> >    -> Entry: { path=3D=E2=80=9Ceeefe8d0.vhd=E2=80=9D, handle=3DH1, inode=
> >> =3DI1 }
> >> >
> >> >  Step 2: Host-02 does hardlink and rename
> >> >
> >> >   Host-02:
> >> >    hardlink(eeefe8d0.vhd, 0f11b74e.vhd)
> >> >    -> SMB2: Creates new name for same inode
> >> >    -> Server: inode I1 now has 2 names (link count =3D 2)
> >> >    -> Host-01 lease L1: NO BREAK (same inode, just added name)
> >> >
> >> >    crate(eeefe8d0.vhd.new)
> >> >    -> Entry { path=3D"eeefe8d0.vhd.new", handle=3DH2, inode=3DI2 }
> >> >
> >> >    rename(eeefe8d0.vhd.new, eeefe8d0.vhd)
> >> >    -> SMB2: Replaces =E2=80=9Ceeefe8d0.vhd=E2=80=9D name =E2=86=92 points=
> >>  to new inode I2
> >> >    -> Server: old inode I1 now only accessible as =E2=80=9C0f11b74e.vhd=
> >> =E2=80=9D
> >> >    -> Server SHOULD send: Lease Break notification to H1 =E2=86=90 KEY!
> >> >
> >> >  Step 3: Lease break delivery is not reliable
> >> >
> >> >   strict locking off, level2 oplock
> >> >
> >> >    Host-01:
> >> >    -> Lease break not received or processed
> >> >    -> H1 is in deferred_closes list (not "active")
> >> >
> >> >    Result: Stale entry remains:
> >> >       { path=3D=E2=80=9Ceeefe8d0.vhd=E2=80=9D, handle=3DH1, inode=3DI1_OL=
> >> D }
> >> >
> >> >    Host-02:
> >> >    -> Open 0f11b74e.vhd in readonly
> >> >
> >> >    Result:
> >> >       { path=3D"0f11b74e.vhd", inode=3DI1_NEW }
> >> >
> >> >  Step 4: Host-01 reopens file
> >> >
> >> >   Host-01:
> >> >    file open (eeefe8d0.vhd)
> >> >    -> Kernel checks deferred_closes for =E2=80=9Ceeefe8d0.vhd=E2=80=9D
> >> >    -> Found H1! (matched by pathname string)
> >> >    -> REUSES H1 without checking
> >> >    -> close or reconnect, flush buffered writes
> >> >       slient corruption?
> >> >
> >> > Signed-off-by: Chunjie Zhu <chunjie.zhu@citrix.com>
> >> > ---
> >> >  fs/smb/client/fs_context.c | 8 ++++----
> >> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >> >
> >> > diff --git a/fs/smb/client/fs_context.c b/fs/smb/client/fs_context.c
> >> > index 0812af001417..4ed33de0a00d 100644
> >> > --- a/fs/smb/client/fs_context.c
> >> > +++ b/fs/smb/client/fs_context.c
> >> > @@ -1300,11 +1300,11 @@ static int smb3_fs_context_parse_param(struct fs_=
> >> context *fc,
> >> >                 ctx->acdirmax =3D ctx->acregmax =3D HZ * result.uint_32;
> >> >                 break;
> >> >         case Opt_closetimeo:
> >> > -               if (result.uint_32 > SMB3_MAX_DCLOSETIMEO / HZ) {
> >> > -                       cifs_errorf(fc, "closetimeo too large\n");
> >> > +               if (result.uint_32 !=3D 0) {
> >> > +                       cifs_errorf(fc, "closetimeo must be 0, deferred c=
> >> lose is disabled\n");
> >> >                         goto cifs_parse_mount_err;
> >> >                 }
> >> > -               ctx->closetimeo =3D HZ * result.uint_32;
> >> > +               ctx->closetimeo =3D 0;
> >> >                 break;
> >> >         case Opt_echo_interval:
> >> >                 if (result.uint_32 < SMB_ECHO_INTERVAL_MIN ||
> >> > @@ -1795,7 +1795,7 @@ int smb3_init_fs_context(struct fs_context *fc)
> >> >
> >> >         ctx->acregmax =3D CIFS_DEF_ACTIMEO;
> >> >         ctx->acdirmax =3D CIFS_DEF_ACTIMEO;
> >> > -       ctx->closetimeo =3D SMB3_DEF_DCLOSETIMEO;
> >> > +       ctx->closetimeo =3D 0;
> >> >         ctx->max_cached_dirs =3D MAX_CACHED_FIDS;
> >> >         /* Most clients set timeout to 0, allows server to use its defaul=
> >> t */
> >> >         ctx->handle_timeout =3D 0; /* See MS-SMB2 spec section 2.2.14.2.1=
> >> 2 */
> >> > --
> >> > 2.52.0
> >> >
> >> >
> >>
> >>
> >> --=20
> >> Thanks,
> >>
> >> Steve
> >>
> >
> 

^ permalink raw reply

* Re: [PATCH] fix smb client defer close causes file corruption
From: Enzo Matsumiya @ 2026-06-24  4:19 UTC (permalink / raw)
  To: Steve French
  Cc: Chunjie Zhu, Steve French, Paulo Alcantara, Ronnie Sahlberg,
	Shyam Prasad N, Tom Talpey, linux-cifs, samba-technical,
	linux-kernel
In-Reply-To: <CAH2r5mu7sdqJ4NJQzsqhY+3LmHE7AcREM9nYAbEZCNZ08yEafA@mail.gmail.com>

On 06/23, Steve French wrote:
>Do you see different behavior of this to different servers (Samba,
>Windows, ksmbd, Azure xSMB etc)?
>
>Disabling deferred close carries metadata integrity issues because it
>will lead users to set actimeo (or acregmax) to higher values, and
>disabling deferred close could significantly hurt performance for some
>common workload patterns (open/write/close/open/read/close e.g.)
>
>Other narrower client fixes for this presumably could be done, e.g.
>forcing close on the deferred close when rename a hardlink.

That looks like a safe measure to have IMO.


Cheers,

Enzo

>On Mon, Jun 22, 2026 at 4:40 AM Chunjie Zhu <chunjie.zhu@citrix.com> wrote:
>>
>> Test environment
>>
>>  4 hosts as smb client, 1 host as smb server
>>  smb client hosts, kernel 6.6.138
>>  mount options,
>>   //10.70.48.15/xxx /run/xxx cifs rw,relatime,vers=3.0,
>>   cache=loose,username=xxx,domain=xxx,uid=0,noforceuid,
>>   gid=0,noforcegid,addr=10.70.48.15,file_mode=0755,
>>   dir_mode=0755,soft,nounix,serverino,mapposix,reparse=nfs,
>>   rsize=1048576,wsize=1048576,bsize=1048576,echo_interval=60,
>>   actimeo=0,closetimeo=1
>>
>> Work around
>>
>>  mount with cache=none or closetimeo=0
>>
>> The Race Condition Flow
>>
>>  Step 1: Host-01 closes file
>>
>>   Host-01:
>>    file close (eeefe8d0.vhd)
>>    -> CIFS defers SMB2 CLOSE
>>    -> Handle H1 stored in deferred_closes list
>>    -> Lease L1 (RWH or RH) still active on server
>>    -> Entry: { path=“eeefe8d0.vhd”, handle=H1, inode=I1 }
>>
>>  Step 2: Host-02 does hardlink and rename
>>
>>   Host-02:
>>    hardlink(eeefe8d0.vhd, 0f11b74e.vhd)
>>    -> SMB2: Creates new name for same inode
>>    -> Server: inode I1 now has 2 names (link count = 2)
>>    -> Host-01 lease L1: NO BREAK (same inode, just added name)
>>
>>    crate(eeefe8d0.vhd.new)
>>    -> Entry { path="eeefe8d0.vhd.new", handle=H2, inode=I2 }
>>
>>    rename(eeefe8d0.vhd.new, eeefe8d0.vhd)
>>    -> SMB2: Replaces “eeefe8d0.vhd” name → points to new inode I2
>>    -> Server: old inode I1 now only accessible as “0f11b74e.vhd”
>>    -> Server SHOULD send: Lease Break notification to H1 ← KEY!
>>
>>  Step 3: Lease break delivery is not reliable
>>
>>   strict locking off, level2 oplock
>>
>>    Host-01:
>>    -> Lease break not received or processed
>>    -> H1 is in deferred_closes list (not "active")
>>
>>    Result: Stale entry remains:
>>       { path=“eeefe8d0.vhd”, handle=H1, inode=I1_OLD }
>>
>>    Host-02:
>>    -> Open 0f11b74e.vhd in readonly
>>
>>    Result:
>>       { path="0f11b74e.vhd", inode=I1_NEW }
>>
>>  Step 4: Host-01 reopens file
>>
>>   Host-01:
>>    file open (eeefe8d0.vhd)
>>    -> Kernel checks deferred_closes for “eeefe8d0.vhd”
>>    -> Found H1! (matched by pathname string)
>>    -> REUSES H1 without checking
>>    -> close or reconnect, flush buffered writes
>>       slient corruption?
>>
>> Signed-off-by: Chunjie Zhu <chunjie.zhu@citrix.com>
>> ---
>>  fs/smb/client/fs_context.c | 8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/smb/client/fs_context.c b/fs/smb/client/fs_context.c
>> index 0812af001417..4ed33de0a00d 100644
>> --- a/fs/smb/client/fs_context.c
>> +++ b/fs/smb/client/fs_context.c
>> @@ -1300,11 +1300,11 @@ static int smb3_fs_context_parse_param(struct fs_context *fc,
>>                 ctx->acdirmax = ctx->acregmax = HZ * result.uint_32;
>>                 break;
>>         case Opt_closetimeo:
>> -               if (result.uint_32 > SMB3_MAX_DCLOSETIMEO / HZ) {
>> -                       cifs_errorf(fc, "closetimeo too large\n");
>> +               if (result.uint_32 != 0) {
>> +                       cifs_errorf(fc, "closetimeo must be 0, deferred close is disabled\n");
>>                         goto cifs_parse_mount_err;
>>                 }
>> -               ctx->closetimeo = HZ * result.uint_32;
>> +               ctx->closetimeo = 0;
>>                 break;
>>         case Opt_echo_interval:
>>                 if (result.uint_32 < SMB_ECHO_INTERVAL_MIN ||
>> @@ -1795,7 +1795,7 @@ int smb3_init_fs_context(struct fs_context *fc)
>>
>>         ctx->acregmax = CIFS_DEF_ACTIMEO;
>>         ctx->acdirmax = CIFS_DEF_ACTIMEO;
>> -       ctx->closetimeo = SMB3_DEF_DCLOSETIMEO;
>> +       ctx->closetimeo = 0;
>>         ctx->max_cached_dirs = MAX_CACHED_FIDS;
>>         /* Most clients set timeout to 0, allows server to use its default */
>>         ctx->handle_timeout = 0; /* See MS-SMB2 spec section 2.2.14.2.12 */
>> --
>> 2.52.0
>>
>>
>
>
>-- 
>Thanks,
>
>Steve
>

^ permalink raw reply

* Re: Subject: Re: [PATCH] fix smb client defer close causes file corruption
From: Enzo Matsumiya @ 2026-06-24  4:18 UTC (permalink / raw)
  To: Chunjie Zhu
  Cc: smfrench, linux-cifs, linux-kernel, lsahlber, pc, samba-technical,
	sfrench, sprasad, tom
In-Reply-To: <20260624034414.5087-1-chunjie.zhu@citrix.com>

On 06/24, Chunjie Zhu wrote:
>From my experience, these common workloads tend to occur within a single
>process. Therefore, it might be worth optimizing cifs_get_readable_path
>by extending its matching logic: instead of matching only the file path,
>we could also check the process PID. The old handler would then be reused
>only when both the file path and the PID match.
>
>Do you think this approach is viable?

I don't think this is a good idea; PIDs change and are naturally made to
be reused as well, so storing PID for a 'later' (no matter how later)
check is 100% unreliable.

As for the original bug report, do you happen to have a simple
reproducer?

I created one on top of your instructions and I'm definitely getting a
lease break on rename, which IIUC is the alleged root cause.
This was with closetimeo=30 on a Windows Server 2022 share.

Also, I'm failing to understand how cache= has any impact on this -- I
get the same successful results with none,loose,strict.


Cheers,

Enzo

>> Other narrower client fixes for this presumably could be done, e.g.
>> forcing close on the deferred close when rename a hardlink.
>>
>> On Mon, Jun 22, 2026 at 4:40=E2=80=AFAM Chunjie Zhu <chunjie.zhu@citrix.com=
>> > wrote:
>> >
>> > Test environment
>> >
>> >  4 hosts as smb client, 1 host as smb server
>> >  smb client hosts, kernel 6.6.138
>> >  mount options,
>> >   //10.70.48.15/xxx /run/xxx cifs rw,relatime,vers=3D3.0,
>> >   cache=3Dloose,username=3Dxxx,domain=3Dxxx,uid=3D0,noforceuid,
>> >   gid=3D0,noforcegid,addr=3D10.70.48.15,file_mode=3D0755,
>> >   dir_mode=3D0755,soft,nounix,serverino,mapposix,reparse=3Dnfs,
>> >   rsize=3D1048576,wsize=3D1048576,bsize=3D1048576,echo_interval=3D60,
>> >   actimeo=3D0,closetimeo=3D1
>> >
>> > Work around
>> >
>> >  mount with cache=3Dnone or closetimeo=3D0
>> >
>> > The Race Condition Flow
>> >
>> >  Step 1: Host-01 closes file
>> >
>> >   Host-01:
>> >    file close (eeefe8d0.vhd)
>> >    -> CIFS defers SMB2 CLOSE
>> >    -> Handle H1 stored in deferred_closes list
>> >    -> Lease L1 (RWH or RH) still active on server
>> >    -> Entry: { path=3D=E2=80=9Ceeefe8d0.vhd=E2=80=9D, handle=3DH1, inode=
>> =3DI1 }
>> >
>> >  Step 2: Host-02 does hardlink and rename
>> >
>> >   Host-02:
>> >    hardlink(eeefe8d0.vhd, 0f11b74e.vhd)
>> >    -> SMB2: Creates new name for same inode
>> >    -> Server: inode I1 now has 2 names (link count =3D 2)
>> >    -> Host-01 lease L1: NO BREAK (same inode, just added name)
>> >
>> >    crate(eeefe8d0.vhd.new)
>> >    -> Entry { path=3D"eeefe8d0.vhd.new", handle=3DH2, inode=3DI2 }
>> >
>> >    rename(eeefe8d0.vhd.new, eeefe8d0.vhd)
>> >    -> SMB2: Replaces =E2=80=9Ceeefe8d0.vhd=E2=80=9D name =E2=86=92 points=
>>  to new inode I2
>> >    -> Server: old inode I1 now only accessible as =E2=80=9C0f11b74e.vhd=
>> =E2=80=9D
>> >    -> Server SHOULD send: Lease Break notification to H1 =E2=86=90 KEY!
>> >
>> >  Step 3: Lease break delivery is not reliable
>> >
>> >   strict locking off, level2 oplock
>> >
>> >    Host-01:
>> >    -> Lease break not received or processed
>> >    -> H1 is in deferred_closes list (not "active")
>> >
>> >    Result: Stale entry remains:
>> >       { path=3D=E2=80=9Ceeefe8d0.vhd=E2=80=9D, handle=3DH1, inode=3DI1_OL=
>> D }
>> >
>> >    Host-02:
>> >    -> Open 0f11b74e.vhd in readonly
>> >
>> >    Result:
>> >       { path=3D"0f11b74e.vhd", inode=3DI1_NEW }
>> >
>> >  Step 4: Host-01 reopens file
>> >
>> >   Host-01:
>> >    file open (eeefe8d0.vhd)
>> >    -> Kernel checks deferred_closes for =E2=80=9Ceeefe8d0.vhd=E2=80=9D
>> >    -> Found H1! (matched by pathname string)
>> >    -> REUSES H1 without checking
>> >    -> close or reconnect, flush buffered writes
>> >       slient corruption?
>> >
>> > Signed-off-by: Chunjie Zhu <chunjie.zhu@citrix.com>
>> > ---
>> >  fs/smb/client/fs_context.c | 8 ++++----
>> >  1 file changed, 4 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/fs/smb/client/fs_context.c b/fs/smb/client/fs_context.c
>> > index 0812af001417..4ed33de0a00d 100644
>> > --- a/fs/smb/client/fs_context.c
>> > +++ b/fs/smb/client/fs_context.c
>> > @@ -1300,11 +1300,11 @@ static int smb3_fs_context_parse_param(struct fs_=
>> context *fc,
>> >                 ctx->acdirmax =3D ctx->acregmax =3D HZ * result.uint_32;
>> >                 break;
>> >         case Opt_closetimeo:
>> > -               if (result.uint_32 > SMB3_MAX_DCLOSETIMEO / HZ) {
>> > -                       cifs_errorf(fc, "closetimeo too large\n");
>> > +               if (result.uint_32 !=3D 0) {
>> > +                       cifs_errorf(fc, "closetimeo must be 0, deferred c=
>> lose is disabled\n");
>> >                         goto cifs_parse_mount_err;
>> >                 }
>> > -               ctx->closetimeo =3D HZ * result.uint_32;
>> > +               ctx->closetimeo =3D 0;
>> >                 break;
>> >         case Opt_echo_interval:
>> >                 if (result.uint_32 < SMB_ECHO_INTERVAL_MIN ||
>> > @@ -1795,7 +1795,7 @@ int smb3_init_fs_context(struct fs_context *fc)
>> >
>> >         ctx->acregmax =3D CIFS_DEF_ACTIMEO;
>> >         ctx->acdirmax =3D CIFS_DEF_ACTIMEO;
>> > -       ctx->closetimeo =3D SMB3_DEF_DCLOSETIMEO;
>> > +       ctx->closetimeo =3D 0;
>> >         ctx->max_cached_dirs =3D MAX_CACHED_FIDS;
>> >         /* Most clients set timeout to 0, allows server to use its defaul=
>> t */
>> >         ctx->handle_timeout =3D 0; /* See MS-SMB2 spec section 2.2.14.2.1=
>> 2 */
>> > --
>> > 2.52.0
>> >
>> >
>>
>>
>> --=20
>> Thanks,
>>
>> Steve
>>
>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox