* [PATCH v4 00/22] netfs: Miscellaneous fixes
@ 2026-04-27 15:29 David Howells
2026-04-27 15:29 ` [PATCH v4 01/22] netfs: Fix cancellation of a DIO and single read subrequests David Howells
` (10 more replies)
0 siblings, 11 replies; 13+ messages in thread
From: David Howells @ 2026-04-27 15:29 UTC (permalink / raw)
To: Christian Brauner
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
ceph-devel, linux-fsdevel, linux-kernel
Hi Christian,
Here are the outstanding miscellaneous fixes for netfslib gathered together
and with some fixes-to-fixes folded down and one rearrangement. Various
Sashiko review comments[1][2][3] are addressed:
(1) Fix subrequest cancellation cleanup in DIO read and single-read.
(2) Fix read and write result collection to use barriering correctly to
access a request's subrequest lists without taking a lock.
This adds list_add_tail_release() and list_first_entry_acquire() to
incorporate appropriate barriering into some list functions.
(3) Fix missing locking around retry adding new subrequests.
(4) Fix netfs_read_to_pagecache() to pause on subrequest I/O failure.
(5) Fix the potential for 64-bit tearing on a 32-bit machine when reading
netfs_inode->remote_i_size and ->zero_point by using much the same
mechanism as is used for ->i_size.
(6) Fix the calculation of zero_point in netfs_release_folio() to limit it
to ->remote_i_size, not ->i_size.
(7) Fix triggering of a VM_BUG_ON_FOLIO() in netfs_write_begin().
(8) Fix error handling in netfs_extract_user_iter().
(9) Fix netfs_invalidate_folio() to clear the folio dirty bit if all the
dirty data is removed.
(10) Defer the emission of trace_netfs_folio() in netfs_perform_write().
This allows the next patch to emit the correct traces.
(11) Fix the handling of a partially failed copy (ie. EFAULT) into a
streaming write folio. Also remove the netfs_folio if a streaming
write folio is entirely overwritten.
(12) Fix netfs_read_gaps() to remove the netfs_folio from a filled folio.
(13) Fix netfs_perform_write() to not disable streaming writes when writing
to an fd that's open O_RDWR.
(14) Fix an early put of the sink page used in netfs_read_gaps(), before
the request has completed.
(15) Fix request leak in netfs_write_begin() error handling.
(16) Fix a potential UAF in netfs_unlock_abandoned_read_pages() caused by
checking the index of each folio being abandoned to see if that folio
is actually owned by the caller (in which case, we're not actually
allowed to dereference it).
(17) Fix a potentially uninitialised error value in
netfs_extract_user_iter().
(18) Fix incorrect adjustment of dirty region when partially invalidating a
streaming write folio.
(19) Fix the handling of folio->private in netfs_perform_write() and the
attached netfs_folio and/or group when a streaming write folio is
modified.
(20) Fix netfs_read_folio() to wait on writeback first (it holds the folio
lock) otherwise we aren't allowed to look at the netfs_folio struct as
that could be modified at any time by the writeback collector.
(21) Fix write skipping in dir/symlink writepages.
(22) Fix the locking used by afs_get_link().
The patches can also be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-fixes
Thanks,
David
[1] https://sashiko.dev/#/patchset/20260414082004.3756080-1-dhowells%40redhat.com
[2] https://sashiko.dev/#/patchset/20260326104544.509518-1-dhowells%40redhat.com
[3] https://sashiko.dev/#/patchset/20260425125426.3855807-1-dhowells%40redhat.com
Changes
=======
ver #4)
- Rebase on v7.0-rc1
- Fix latest set of Sashiko issues[3].
- Move the ->subrequests barriering patch up front as it modifies
linux/list.h.
- Split that barriering patch in two: the first patch harmonises the
order of adding a read subreq to the queue for buffered, dio and
single reads and fixes cancellation on prep failure; the second patch
then fixes the barriering.
- Lock ->subrequests in retry when adding in extra subreqs.
- Use a spinlock as well when modifying ->zero_point with a seq lock.
- Atomically check and change ->zero_point when bumping it up.
- Merged the two patches sorting out the locking in afs symlink handling,
then fixed a number of issues in them.
- Added a patch to make afs dir and symlink writepages skip if the
validate_lock is held and WB_SYNC_NONE is set.
ver #3)
- Rebase on linus/master.
- Consolidate the various sets of fixes for reposting.
- Fold down fixes-to-fixes.
- Move the tracing change in netfs_perform_write() down to below the patch
it primarily affects.
base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
David Howells (20):
netfs: Fix cancellation of a DIO and single read subrequests
netfs: Fix missing barriers when accessing stream->subrequests
locklessly
netfs: Fix missing locking around retry adding new subreqs
netfs: Fix netfs_read_to_pagecache() to pause on subreq failure
netfs: Fix potential for tearing in ->remote_i_size and ->zero_point
netfs: Fix zeropoint update where i_size > remote_i_size
netfs: Fix netfs_invalidate_folio() to clear dirty bit if all changes
gone
netfs: Defer the emission of trace_netfs_folio()
netfs: Fix streaming write being overwritten
netfs: Fix read-gaps to remove netfs_folio from filled folio
netfs: Fix write streaming disablement if fd open O_RDWR
netfs: Fix early put of sink folio in netfs_read_gaps()
netfs: Fix leak of request in netfs_write_begin() error handling
netfs: Fix potential UAF in netfs_unlock_abandoned_read_pages()
netfs: Fix potential uninitialised var in netfs_extract_user_iter()
netfs: Fix partial invalidation of streaming-write folio
netfs: Fix folio->private handling in netfs_perform_write()
netfs: Fix netfs_read_folio() to wait on writeback
netfs, afs: Fix write skipping in dir/link writepages
afs: Fix the locking used by afs_get_link()
Paulo Alcantara (1):
netfs: fix error handling in netfs_extract_user_iter()
Viacheslav Dubeyko (1):
netfs: fix VM_BUG_ON_FOLIO() issue in netfs_write_begin() call
fs/9p/vfs_inode.c | 2 +-
fs/9p/vfs_inode_dotl.c | 4 +-
fs/afs/Makefile | 1 +
fs/afs/dir.c | 76 ++++----
fs/afs/fsclient.c | 4 +-
fs/afs/inode.c | 104 +----------
fs/afs/internal.h | 34 +++-
fs/afs/symlink.c | 267 +++++++++++++++++++++++++++++
fs/afs/validation.c | 8 +-
fs/afs/write.c | 2 +-
fs/afs/yfsclient.c | 4 +-
fs/netfs/buffered_read.c | 64 ++++---
fs/netfs/buffered_write.c | 147 ++++++++++------
fs/netfs/direct_read.c | 19 +-
fs/netfs/direct_write.c | 4 +-
fs/netfs/internal.h | 2 +
fs/netfs/iterator.c | 15 +-
fs/netfs/misc.c | 21 ++-
fs/netfs/read_collect.c | 6 +-
fs/netfs/read_retry.c | 13 +-
fs/netfs/read_single.c | 20 +--
fs/netfs/write_collect.c | 7 +-
fs/netfs/write_issue.c | 8 +-
fs/netfs/write_retry.c | 2 +
fs/smb/client/cifsfs.c | 28 +--
fs/smb/client/cifssmb.c | 2 +-
fs/smb/client/file.c | 9 +-
fs/smb/client/inode.c | 9 +-
fs/smb/client/readdir.c | 3 +-
fs/smb/client/smb2ops.c | 16 +-
fs/smb/client/smb2pdu.c | 2 +-
include/linux/list.h | 37 ++++
include/linux/netfs.h | 324 +++++++++++++++++++++++++++++++++--
include/trace/events/netfs.h | 8 +
34 files changed, 949 insertions(+), 323 deletions(-)
create mode 100644 fs/afs/symlink.c
* [PATCH v4 01/22] netfs: Fix cancellation of a DIO and single read subrequests
From: David Howells @ 2026-04-27 15:29 UTC (permalink / raw)
To: Christian Brauner
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
ceph-devel, linux-fsdevel, linux-kernel
When the preparation of a new subrequest for a read fails, if the
subrequest has already been added to the stream->subrequests list, it can't
simply be put and abandoned as the collector may see it. Also, if it
hasn't been queued yet, it has two outstanding refs that both need to be
put. Both DIO read and single-read dispatch get this wrong; further, both
differ from the buffered read code in the order in which they do things.
Fix cancellation of both DIO-read and single-read subrequests that failed
preparation by the following steps:
(1) Harmonise all three reads (buffered, dio, single) to queue the subreq
before prepping it.
(2) Make all three call netfs_queue_read() to do the queuing.
(3) Set NETFS_RREQ_ALL_QUEUED independently of the queuing as we don't
know the length of the subreq at this point.
(4) In all cases, set the error and NETFS_SREQ_FAILED flag on the subreq
and then call netfs_read_subreq_terminated() to deal with it. This
will pass responsibility off to the collector for dealing with it.
Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item")
Closes: https://sashiko.dev/#/patchset/20260425125426.3855807-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/buffered_read.c | 26 +++++++++++---------------
fs/netfs/direct_read.c | 19 ++++---------------
fs/netfs/internal.h | 2 ++
fs/netfs/read_single.c | 20 ++++++++------------
4 files changed, 25 insertions(+), 42 deletions(-)
diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index a8c0d86118c5..2c51c55a9b15 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -156,9 +156,8 @@ static void netfs_read_cache_to_pagecache(struct netfs_io_request *rreq,
netfs_cache_read_terminated, subreq);
}
-static void netfs_queue_read(struct netfs_io_request *rreq,
- struct netfs_io_subrequest *subreq,
- bool last_subreq)
+void netfs_queue_read(struct netfs_io_request *rreq,
+ struct netfs_io_subrequest *subreq)
{
struct netfs_io_stream *stream = &rreq->io_streams[0];
@@ -178,11 +177,6 @@ static void netfs_queue_read(struct netfs_io_request *rreq,
}
}
- if (last_subreq) {
- smp_wmb(); /* Write lists before ALL_QUEUED. */
- set_bit(NETFS_RREQ_ALL_QUEUED, &rreq->flags);
- }
-
spin_unlock(&rreq->lock);
}
@@ -233,6 +227,8 @@ static void netfs_read_to_pagecache(struct netfs_io_request *rreq,
subreq->start = start;
subreq->len = size;
+ netfs_queue_read(rreq, subreq);
+
source = netfs_cache_prepare_read(rreq, subreq, rreq->i_size);
subreq->source = source;
if (source == NETFS_DOWNLOAD_FROM_SERVER) {
@@ -262,11 +258,8 @@ static void netfs_read_to_pagecache(struct netfs_io_request *rreq,
ret = rreq->netfs_ops->prepare_read(subreq);
if (ret < 0) {
subreq->error = ret;
- /* Not queued - release both refs. */
- netfs_put_subrequest(subreq,
- netfs_sreq_trace_put_cancel);
- netfs_put_subrequest(subreq,
- netfs_sreq_trace_put_cancel);
+ __set_bit(NETFS_SREQ_FAILED, &subreq->flags);
+ netfs_read_subreq_terminated(subreq);
break;
}
trace_netfs_sreq(subreq, netfs_sreq_trace_prepare);
@@ -302,10 +295,13 @@ static void netfs_read_to_pagecache(struct netfs_io_request *rreq,
netfs_put_subrequest(subreq, netfs_sreq_trace_put_cancel);
break;
}
- size -= slice;
start += slice;
+ size -= slice;
+ if (size <= 0) {
+ smp_wmb(); /* Write lists before ALL_QUEUED. */
+ set_bit(NETFS_RREQ_ALL_QUEUED, &rreq->flags);
+ }
- netfs_queue_read(rreq, subreq, size <= 0);
netfs_issue_read(rreq, subreq);
cond_resched();
} while (size > 0);
diff --git a/fs/netfs/direct_read.c b/fs/netfs/direct_read.c
index f72e6da88cca..4fd5cfa690cf 100644
--- a/fs/netfs/direct_read.c
+++ b/fs/netfs/direct_read.c
@@ -47,7 +47,6 @@ static void netfs_prepare_dio_read_iterator(struct netfs_io_subrequest *subreq)
*/
static int netfs_dispatch_unbuffered_reads(struct netfs_io_request *rreq)
{
- struct netfs_io_stream *stream = &rreq->io_streams[0];
unsigned long long start = rreq->start;
ssize_t size = rreq->len;
int ret = 0;
@@ -66,25 +65,15 @@ static int netfs_dispatch_unbuffered_reads(struct netfs_io_request *rreq)
subreq->start = start;
subreq->len = size;
- __set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
-
- spin_lock(&rreq->lock);
- list_add_tail(&subreq->rreq_link, &stream->subrequests);
- if (list_is_first(&subreq->rreq_link, &stream->subrequests)) {
- if (!stream->active) {
- stream->collected_to = subreq->start;
- /* Store list pointers before active flag */
- smp_store_release(&stream->active, true);
- }
- }
- trace_netfs_sreq(subreq, netfs_sreq_trace_added);
- spin_unlock(&rreq->lock);
+ netfs_queue_read(rreq, subreq);
netfs_stat(&netfs_n_rh_download);
if (rreq->netfs_ops->prepare_read) {
ret = rreq->netfs_ops->prepare_read(subreq);
if (ret < 0) {
- netfs_put_subrequest(subreq, netfs_sreq_trace_put_cancel);
+ __set_bit(NETFS_SREQ_FAILED, &subreq->flags);
+ subreq->error = ret;
+ netfs_read_subreq_terminated(subreq);
break;
}
}
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index d436e20d3418..24fefa1b179d 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -23,6 +23,8 @@
/*
* buffered_read.c
*/
+void netfs_queue_read(struct netfs_io_request *rreq,
+ struct netfs_io_subrequest *subreq);
void netfs_cache_read_terminated(void *priv, ssize_t transferred_or_error);
int netfs_prefetch_for_write(struct file *file, struct folio *folio,
size_t offset, size_t len);
diff --git a/fs/netfs/read_single.c b/fs/netfs/read_single.c
index d0e23bc42445..432c7456a1b6 100644
--- a/fs/netfs/read_single.c
+++ b/fs/netfs/read_single.c
@@ -89,7 +89,6 @@ static void netfs_single_read_cache(struct netfs_io_request *rreq,
*/
static int netfs_single_dispatch_read(struct netfs_io_request *rreq)
{
- struct netfs_io_stream *stream = &rreq->io_streams[0];
struct netfs_io_subrequest *subreq;
int ret = 0;
@@ -102,14 +101,7 @@ static int netfs_single_dispatch_read(struct netfs_io_request *rreq)
subreq->len = rreq->len;
subreq->io_iter = rreq->buffer.iter;
- __set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
-
- spin_lock(&rreq->lock);
- list_add_tail(&subreq->rreq_link, &stream->subrequests);
- trace_netfs_sreq(subreq, netfs_sreq_trace_added);
- /* Store list pointers before active flag */
- smp_store_release(&stream->active, true);
- spin_unlock(&rreq->lock);
+ netfs_queue_read(rreq, subreq);
netfs_single_cache_prepare_read(rreq, subreq);
switch (subreq->source) {
@@ -121,10 +113,14 @@ static int netfs_single_dispatch_read(struct netfs_io_request *rreq)
goto cancel;
}
+ smp_wmb(); /* Write lists before ALL_QUEUED. */
+ set_bit(NETFS_RREQ_ALL_QUEUED, &rreq->flags);
rreq->netfs_ops->issue_read(subreq);
rreq->submitted += subreq->len;
break;
case NETFS_READ_FROM_CACHE:
+ smp_wmb(); /* Write lists before ALL_QUEUED. */
+ set_bit(NETFS_RREQ_ALL_QUEUED, &rreq->flags);
trace_netfs_sreq(subreq, netfs_sreq_trace_submit);
netfs_single_read_cache(rreq, subreq);
rreq->submitted += subreq->len;
@@ -137,11 +133,11 @@ static int netfs_single_dispatch_read(struct netfs_io_request *rreq)
break;
}
- smp_wmb(); /* Write lists before ALL_QUEUED. */
- set_bit(NETFS_RREQ_ALL_QUEUED, &rreq->flags);
return ret;
cancel:
- netfs_put_subrequest(subreq, netfs_sreq_trace_put_cancel);
+ __set_bit(NETFS_SREQ_FAILED, &subreq->flags);
+ subreq->error = ret;
+ netfs_read_subreq_terminated(subreq);
return ret;
}
* [PATCH v4 02/22] netfs: Fix missing barriers when accessing stream->subrequests locklessly
From: David Howells @ 2026-04-27 15:29 UTC (permalink / raw)
To: Christian Brauner
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
ceph-devel, linux-fsdevel, linux-kernel
The list of subrequests attached to stream->subrequests is accessed without
locks by netfs_collect_read_results() and netfs_collect_write_results(),
and then they access subreq->flags without taking a barrier after getting
the subreq pointer from the list. Relatedly, the functions that build the
list don't use any sort of write barrier when constructing it, so a lockless
observer isn't guaranteed to see the NETFS_SREQ_IN_PROGRESS flag as set
before it sees the new subreq on the list.
Fix this by:
(1) Add a new list_add_tail_release() function that uses a release barrier
to set the pointer to the new member of the list.
(2) Add a new list_first_entry_acquire() function that uses an acquire
barrier to read the pointer to the first member in a list (or return
NULL).
(3) Use list_add_tail_release() when adding a subreq to ->subrequests.
(4) Use list_first_entry_acquire() when initially accessing the front of
the list (when an item is removed, the pointer to the new front item
is obtained under the same lock).
Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item")
Fixes: 288ace2f57c9 ("netfs: New writeback implementation")
Link: https://sashiko.dev/#/patchset/20260326104544.509518-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/buffered_read.c | 3 ++-
fs/netfs/read_collect.c | 4 +++-
fs/netfs/write_collect.c | 4 +++-
fs/netfs/write_issue.c | 3 ++-
include/linux/list.h | 37 +++++++++++++++++++++++++++++++++++++
5 files changed, 47 insertions(+), 4 deletions(-)
diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 2c51c55a9b15..3bc7d0c5c3b9 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -168,7 +168,8 @@ void netfs_queue_read(struct netfs_io_request *rreq,
* remove entries off of the front.
*/
spin_lock(&rreq->lock);
- list_add_tail(&subreq->rreq_link, &stream->subrequests);
+ /* Write IN_PROGRESS before pointer to new subreq */
+ list_add_tail_release(&subreq->rreq_link, &stream->subrequests);
if (list_is_first(&subreq->rreq_link, &stream->subrequests)) {
if (!stream->active) {
stream->collected_to = subreq->start;
diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index e5f6665b3341..f6b87a22c290 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -205,8 +205,10 @@ static void netfs_collect_read_results(struct netfs_io_request *rreq)
* in progress. The issuer thread may be adding stuff to the tail
* whilst we're doing this.
*/
- front = list_first_entry_or_null(&stream->subrequests,
+ front = list_first_entry_acquire(&stream->subrequests,
struct netfs_io_subrequest, rreq_link);
+ /* Read first subreq pointer before IN_PROGRESS flag. */
+
while (front) {
size_t transferred;
diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c
index b194447f4b11..ba4ac6993b74 100644
--- a/fs/netfs/write_collect.c
+++ b/fs/netfs/write_collect.c
@@ -228,8 +228,10 @@ static void netfs_collect_write_results(struct netfs_io_request *wreq)
if (!smp_load_acquire(&stream->active))
continue;
- front = list_first_entry_or_null(&stream->subrequests,
+ front = list_first_entry_acquire(&stream->subrequests,
struct netfs_io_subrequest, rreq_link);
+ /* Read first subreq pointer before IN_PROGRESS flag. */
+
while (front) {
trace_netfs_collect_sreq(wreq, front);
//_debug("sreq [%x] %llx %zx/%zx",
diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
index 2db688f94125..b0e9690bb90c 100644
--- a/fs/netfs/write_issue.c
+++ b/fs/netfs/write_issue.c
@@ -204,7 +204,8 @@ void netfs_prepare_write(struct netfs_io_request *wreq,
* remove entries off of the front.
*/
spin_lock(&wreq->lock);
- list_add_tail(&subreq->rreq_link, &stream->subrequests);
+ /* Write IN_PROGRESS before pointer to new subreq */
+ list_add_tail_release(&subreq->rreq_link, &stream->subrequests);
if (list_is_first(&subreq->rreq_link, &stream->subrequests)) {
if (!stream->active) {
stream->collected_to = subreq->start;
diff --git a/include/linux/list.h b/include/linux/list.h
index 00ea8e5fb88b..5af356efd725 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -191,6 +191,29 @@ static inline void list_add_tail(struct list_head *new, struct list_head *head)
__list_add(new, head->prev, head);
}
+/**
+ * list_add_tail_release - add a new entry with release barrier
+ * @new: new entry to be added
+ * @head: list head to add it before
+ *
+ * Insert a new entry before the specified head, using a release barrier to set
+ * the ->next pointer that points to it. This is useful for implementing
+ * queues, in particular one that the elements will be walked through forwards
+ * locklessly.
+ */
+static inline void list_add_tail_release(struct list_head *new,
+ struct list_head *head)
+{
+ struct list_head *prev = head->prev;
+
+ if (__list_add_valid(new, prev, head)) {
+ new->next = head;
+ new->prev = prev;
+ head->prev = new;
+ smp_store_release(&prev->next, new);
+ }
+}
+
/*
* Delete a list entry by making the prev/next entries
* point to each other.
@@ -644,6 +667,20 @@ static inline void list_splice_tail_init(struct list_head *list,
pos__ != head__ ? list_entry(pos__, type, member) : NULL; \
})
+/**
+ * list_first_entry_acquire - get the first element from a list with barrier
+ * @ptr: the list head to take the element from.
+ * @type: the type of the struct this is embedded in.
+ * @member: the name of the list_head within the struct.
+ *
+ * Note that if the list is empty, it returns NULL.
+ */
+#define list_first_entry_acquire(ptr, type, member) ({ \
+ struct list_head *head__ = (ptr); \
+ struct list_head *pos__ = smp_load_acquire(&head__->next); \
+ pos__ != head__ ? list_entry(pos__, type, member) : NULL; \
+})
+
/**
* list_last_entry_or_null - get the last element from a list
* @ptr: the list head to take the element from.
* [PATCH v4 03/22] netfs: Fix missing locking around retry adding new subreqs
From: David Howells @ 2026-04-27 15:29 UTC (permalink / raw)
To: Christian Brauner
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
ceph-devel, linux-fsdevel, linux-kernel
Fix netfs_retry_read_subrequests() and netfs_retry_write_stream() to take
the appropriate lock when adding extra subrequests into
stream->subrequests.
Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item")
Fixes: 288ace2f57c9 ("netfs: New writeback implementation")
Closes: https://sashiko.dev/#/patchset/20260425125426.3855807-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/read_retry.c | 2 ++
fs/netfs/write_retry.c | 2 ++
2 files changed, 4 insertions(+)
diff --git a/fs/netfs/read_retry.c b/fs/netfs/read_retry.c
index cca9ac43c077..b34561e257f0 100644
--- a/fs/netfs/read_retry.c
+++ b/fs/netfs/read_retry.c
@@ -203,7 +203,9 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
refcount_read(&subreq->ref),
netfs_sreq_trace_new);
+ spin_lock(&rreq->lock);
list_add(&subreq->rreq_link, &to->rreq_link);
+ spin_unlock(&rreq->lock);
to = list_next_entry(to, rreq_link);
trace_netfs_sreq(subreq, netfs_sreq_trace_retry);
diff --git a/fs/netfs/write_retry.c b/fs/netfs/write_retry.c
index 29489a23a220..db0f23708b2d 100644
--- a/fs/netfs/write_retry.c
+++ b/fs/netfs/write_retry.c
@@ -153,7 +153,9 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
netfs_sreq_trace_new);
trace_netfs_sreq(subreq, netfs_sreq_trace_split);
+ spin_lock(&wreq->lock);
list_add(&subreq->rreq_link, &to->rreq_link);
+ spin_unlock(&wreq->lock);
to = list_next_entry(to, rreq_link);
trace_netfs_sreq(subreq, netfs_sreq_trace_retry);
* [PATCH v4 04/22] netfs: Fix netfs_read_to_pagecache() to pause on subreq failure
From: David Howells @ 2026-04-27 15:29 UTC (permalink / raw)
To: Christian Brauner
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
ceph-devel, linux-fsdevel, linux-kernel
Fix netfs_read_to_pagecache() so that it pauses the generation of new
subrequests if an already-issued subrequest fails.
Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Closes: https://sashiko.dev/#/patchset/20260425125426.3855807-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/buffered_read.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 3bc7d0c5c3b9..9c7a2f984be9 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -304,6 +304,11 @@ static void netfs_read_to_pagecache(struct netfs_io_request *rreq,
}
netfs_issue_read(rreq, subreq);
+
+ if (test_bit(NETFS_RREQ_PAUSE, &rreq->flags))
+ netfs_wait_for_paused_read(rreq);
+ if (test_bit(NETFS_RREQ_FAILED, &rreq->flags))
+ break;
cond_resched();
} while (size > 0);
* [PATCH v4 05/22] netfs: Fix potential for tearing in ->remote_i_size and ->zero_point
From: David Howells @ 2026-04-27 15:29 UTC (permalink / raw)
To: Christian Brauner
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
ceph-devel, linux-fsdevel, linux-kernel, Matthew Wilcox
Fix the potential for tearing when using ->remote_i_size and ->zero_point by
copying the i_size_read()/i_size_write() mechanism and using the same
seqcount as for i_size.
Fixes: 4058f742105e ("netfs: Keep track of the actual remote file size")
Fixes: 100ccd18bb41 ("netfs: Optimise away reads above the point at which there can be no data")
Closes: https://sashiko.dev/#/patchset/20260414082004.3756080-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
fs/9p/vfs_inode.c | 2 +-
fs/9p/vfs_inode_dotl.c | 4 +-
fs/afs/inode.c | 8 +-
fs/afs/write.c | 2 +-
fs/netfs/buffered_read.c | 5 +-
fs/netfs/buffered_write.c | 2 +-
fs/netfs/direct_write.c | 4 +-
fs/netfs/misc.c | 13 +-
fs/netfs/write_collect.c | 3 +-
fs/smb/client/cifsfs.c | 28 ++--
fs/smb/client/cifssmb.c | 2 +-
fs/smb/client/file.c | 9 +-
fs/smb/client/inode.c | 9 +-
fs/smb/client/readdir.c | 3 +-
fs/smb/client/smb2ops.c | 16 +-
fs/smb/client/smb2pdu.c | 2 +-
include/linux/netfs.h | 303 ++++++++++++++++++++++++++++++++++++--
17 files changed, 354 insertions(+), 61 deletions(-)
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index d1508b1fe109..b13156ac2f1f 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1141,7 +1141,7 @@ v9fs_stat2inode(struct p9_wstat *stat, struct inode *inode,
mode |= inode->i_mode & ~S_IALLUGO;
inode->i_mode = mode;
- v9inode->netfs.remote_i_size = stat->length;
+ netfs_write_remote_i_size(&v9inode->netfs, stat->length);
if (!(flags & V9FS_STAT2INODE_KEEP_ISIZE))
v9fs_i_size_write(inode, stat->length);
/* not real number of blocks, but 512 byte ones ... */
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 71796a89bcf4..81d6150a8ae4 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -634,7 +634,7 @@ v9fs_stat2inode_dotl(struct p9_stat_dotl *stat, struct inode *inode,
mode |= inode->i_mode & ~S_IALLUGO;
inode->i_mode = mode;
- v9inode->netfs.remote_i_size = stat->st_size;
+ netfs_write_remote_i_size(&v9inode->netfs, stat->st_size);
if (!(flags & V9FS_STAT2INODE_KEEP_ISIZE))
v9fs_i_size_write(inode, stat->st_size);
inode->i_blocks = stat->st_blocks;
@@ -664,7 +664,7 @@ v9fs_stat2inode_dotl(struct p9_stat_dotl *stat, struct inode *inode,
}
if (!(flags & V9FS_STAT2INODE_KEEP_ISIZE) &&
stat->st_result_mask & P9_STATS_SIZE) {
- v9inode->netfs.remote_i_size = stat->st_size;
+ netfs_write_remote_i_size(&v9inode->netfs, stat->st_size);
v9fs_i_size_write(inode, stat->st_size);
}
if (stat->st_result_mask & P9_STATS_BLOCKS)
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index a5173434f786..06e25e1b12df 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -343,11 +343,11 @@ static void afs_apply_status(struct afs_operation *op,
* idea of what the size should be that's not the same as
* what's on the server.
*/
- vnode->netfs.remote_i_size = status->size;
+ netfs_write_remote_i_size(&vnode->netfs, status->size);
if (change_size || status->size > i_size_read(inode)) {
afs_set_i_size(vnode, status->size);
if (unexpected_jump)
- vnode->netfs.zero_point = status->size;
+ netfs_write_zero_point(&vnode->netfs, status->size);
inode_set_ctime_to_ts(inode, t);
inode_set_atime_to_ts(inode, t);
}
@@ -709,7 +709,7 @@ int afs_getattr(struct mnt_idmap *idmap, const struct path *path,
* it, but we need to give userspace the server's size.
*/
if (S_ISDIR(inode->i_mode))
- stat->size = vnode->netfs.remote_i_size;
+ stat->size = netfs_read_remote_i_size(&vnode->netfs);
} while (read_seqretry(&vnode->cb_lock, seq));
return 0;
@@ -889,7 +889,7 @@ int afs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
*/
if (!(attr->ia_valid & (supported & ~ATTR_SIZE & ~ATTR_MTIME)) &&
attr->ia_size < i_size &&
- attr->ia_size > vnode->netfs.remote_i_size) {
+ attr->ia_size > netfs_read_remote_i_size(&vnode->netfs)) {
truncate_setsize(inode, attr->ia_size);
netfs_resize_file(&vnode->netfs, size, false);
fscache_resize_cookie(afs_vnode_cache(vnode),
diff --git a/fs/afs/write.c b/fs/afs/write.c
index fcfed9d24e0a..c087151c4bf9 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -142,7 +142,7 @@ static void afs_issue_write_worker(struct work_struct *work)
afs_begin_vnode_operation(op);
op->store.write_iter = &subreq->io_iter;
- op->store.i_size = umax(pos + len, vnode->netfs.remote_i_size);
+ op->store.i_size = umax(pos + len, netfs_read_remote_i_size(&vnode->netfs));
op->mtime = inode_get_mtime(&vnode->netfs.inode);
afs_wait_for_operation(op);
diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 9c7a2f984be9..3b26b8113401 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -233,7 +233,8 @@ static void netfs_read_to_pagecache(struct netfs_io_request *rreq,
source = netfs_cache_prepare_read(rreq, subreq, rreq->i_size);
subreq->source = source;
if (source == NETFS_DOWNLOAD_FROM_SERVER) {
- unsigned long long zp = umin(ictx->zero_point, rreq->i_size);
+ unsigned long long zero_point = netfs_read_zero_point(ictx);
+ unsigned long long zp = umin(zero_point, rreq->i_size);
size_t len = subreq->len;
if (unlikely(rreq->origin == NETFS_READ_SINGLE))
@@ -249,7 +250,7 @@ static void netfs_read_to_pagecache(struct netfs_io_request *rreq,
pr_err("ZERO-LEN READ: R=%08x[%x] l=%zx/%zx s=%llx z=%llx i=%llx",
rreq->debug_id, subreq->debug_index,
subreq->len, size,
- subreq->start, ictx->zero_point, rreq->i_size);
+ subreq->start, zero_point, rreq->i_size);
break;
}
subreq->len = len;
diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 05ea5b0cc0e8..fc94eb1ef27b 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -230,7 +230,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
* server would just return a block of zeros or a short read if
* we try to read it.
*/
- if (fpos >= ctx->zero_point) {
+ if (fpos >= netfs_read_zero_point(ctx)) {
folio_zero_segment(folio, 0, offset);
copied = copy_folio_from_iter_atomic(folio, offset, part, iter);
if (unlikely(copied == 0))
diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index f9ab69de3e29..96c1dad04168 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -376,8 +376,8 @@ ssize_t netfs_unbuffered_write_iter(struct kiocb *iocb, struct iov_iter *from)
if (ret < 0)
goto out;
end = iocb->ki_pos + iov_iter_count(from);
- if (end > ictx->zero_point)
- ictx->zero_point = end;
+ if (end > netfs_read_zero_point(ictx))
+ netfs_write_zero_point(ictx, end);
fscache_invalidate(netfs_i_cookie(ictx), NULL, i_size_read(inode),
FSCACHE_INVAL_DIO_WRITE);
diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
index 6df89c92b10b..9d92d068f1da 100644
--- a/fs/netfs/misc.c
+++ b/fs/netfs/misc.c
@@ -221,8 +221,8 @@ void netfs_invalidate_folio(struct folio *folio, size_t offset, size_t length)
unsigned long long fpos = folio_pos(folio), end;
end = umin(fpos + flen, i_size);
- if (fpos < i_size && end > ctx->zero_point)
- ctx->zero_point = end;
+ if (fpos < i_size && end > netfs_read_zero_point(ctx))
+ netfs_write_zero_point(ctx, end);
}
folio_wait_private_2(folio); /* [DEPRECATED] */
@@ -293,14 +293,15 @@ EXPORT_SYMBOL(netfs_invalidate_folio);
bool netfs_release_folio(struct folio *folio, gfp_t gfp)
{
struct netfs_inode *ctx = netfs_inode(folio_inode(folio));
- unsigned long long end;
+ unsigned long long i_size, remote_i_size, zero_point, end;
if (folio_test_dirty(folio))
return false;
- end = umin(folio_next_pos(folio), i_size_read(&ctx->inode));
- if (end > ctx->zero_point)
- ctx->zero_point = end;
+ netfs_read_sizes(ctx, &i_size, &remote_i_size, &zero_point);
+ end = umin(folio_next_pos(folio), i_size);
+ if (end > zero_point)
+ netfs_write_zero_point(ctx, end);
if (folio_test_private(folio))
return false;
diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c
index ba4ac6993b74..f0cafa1d5835 100644
--- a/fs/netfs/write_collect.c
+++ b/fs/netfs/write_collect.c
@@ -69,8 +69,7 @@ int netfs_folio_written_back(struct folio *folio)
unsigned long long fend;
fend = folio_pos(folio) + finfo->dirty_offset + finfo->dirty_len;
- if (fend > ictx->zero_point)
- ictx->zero_point = fend;
+ netfs_push_back_zero_point(ictx, fend);
folio_detach_private(folio);
group = finfo->netfs_group;
diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index 9f76b0347fa9..dacf35676e75 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -434,7 +434,8 @@ cifs_alloc_inode(struct super_block *sb)
spin_lock_init(&cifs_inode->writers_lock);
cifs_inode->writers = 0;
cifs_inode->netfs.inode.i_blkbits = 14; /* 2**14 = CIFS_MAX_MSGSIZE */
- cifs_inode->netfs.remote_i_size = 0;
+ cifs_inode->netfs._remote_i_size = 0;
+ cifs_inode->netfs._zero_point = 0;
cifs_inode->uniqueid = 0;
cifs_inode->createtime = 0;
cifs_inode->epoch = 0;
@@ -1303,7 +1304,8 @@ static loff_t cifs_remap_file_range(struct file *src_file, loff_t off,
struct cifsFileInfo *smb_file_src = src_file->private_data;
struct cifsFileInfo *smb_file_target = dst_file->private_data;
struct cifs_tcon *target_tcon, *src_tcon;
- unsigned long long destend, fstart, fend, old_size, new_size;
+ unsigned long long i_size, old_size, new_size, zero_point;
+ unsigned long long destend, fstart, fend;
unsigned int xid;
int rc;
@@ -1347,7 +1349,7 @@ static loff_t cifs_remap_file_range(struct file *src_file, loff_t off,
* Advance the EOF marker after the flush above to the end of the range
* if it's short of that.
*/
- if (src_cifsi->netfs.remote_i_size < off + len) {
+ if (netfs_read_remote_i_size(&src_cifsi->netfs) < off + len) {
rc = cifs_precopy_set_eof(src_inode, src_cifsi, src_tcon, xid, off + len);
if (rc < 0)
goto unlock;
@@ -1368,16 +1370,16 @@ static loff_t cifs_remap_file_range(struct file *src_file, loff_t off,
rc = cifs_flush_folio(target_inode, destend, &fstart, &fend, false);
if (rc)
goto unlock;
- if (fend > target_cifsi->netfs.zero_point)
- target_cifsi->netfs.zero_point = fend + 1;
- old_size = target_cifsi->netfs.remote_i_size;
+
+ netfs_read_sizes(&target_cifsi->netfs, &i_size, &old_size, &zero_point);
+ if (fend > zero_point)
+ netfs_write_zero_point(&target_cifsi->netfs, fend + 1);
/* Discard all the folios that overlap the destination region. */
cifs_dbg(FYI, "about to discard pages %llx-%llx\n", fstart, fend);
truncate_inode_pages_range(&target_inode->i_data, fstart, fend);
- fscache_invalidate(cifs_inode_cookie(target_inode), NULL,
- i_size_read(target_inode), 0);
+ fscache_invalidate(cifs_inode_cookie(target_inode), NULL, i_size, 0);
rc = -EOPNOTSUPP;
if (target_tcon->ses->server->ops->duplicate_extents) {
@@ -1402,8 +1404,8 @@ static loff_t cifs_remap_file_range(struct file *src_file, loff_t off,
rc = -EINVAL;
}
}
- if (rc == 0 && new_size > target_cifsi->netfs.zero_point)
- target_cifsi->netfs.zero_point = new_size;
+ if (rc == 0)
+ netfs_push_back_zero_point(&target_cifsi->netfs, new_size);
}
/* force revalidate of size and timestamps of target file now
@@ -1474,7 +1476,7 @@ ssize_t cifs_file_copychunk_range(unsigned int xid,
* Advance the EOF marker after the flush above to the end of the range
* if it's short of that.
*/
- if (src_cifsi->netfs.remote_i_size < off + len) {
+ if (netfs_read_remote_i_size(&src_cifsi->netfs) < off + len) {
rc = cifs_precopy_set_eof(src_inode, src_cifsi, src_tcon, xid, off + len);
if (rc < 0)
goto unlock;
@@ -1502,8 +1504,8 @@ ssize_t cifs_file_copychunk_range(unsigned int xid,
fscache_resize_cookie(cifs_inode_cookie(target_inode),
i_size_read(target_inode));
}
- if (rc > 0 && destoff + rc > target_cifsi->netfs.zero_point)
- target_cifsi->netfs.zero_point = destoff + rc;
+ if (rc > 0)
+ netfs_push_back_zero_point(&target_cifsi->netfs, destoff + rc);
}
file_accessed(src_file);
diff --git a/fs/smb/client/cifssmb.c b/fs/smb/client/cifssmb.c
index 3990a9012264..102dd9dde760 100644
--- a/fs/smb/client/cifssmb.c
+++ b/fs/smb/client/cifssmb.c
@@ -1538,7 +1538,7 @@ cifs_readv_callback(struct TCP_Server_Info *server, struct mid_q_entry *mid)
} else {
size_t trans = rdata->subreq.transferred + rdata->got_bytes;
if (trans < rdata->subreq.len &&
- rdata->subreq.start + trans >= ictx->remote_i_size) {
+ rdata->subreq.start + trans >= netfs_read_remote_i_size(ictx)) {
rdata->result = 0;
__set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags);
} else if (rdata->got_bytes > 0) {
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 664a2c223089..c1b152f8d20f 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2518,16 +2518,19 @@ void cifs_write_subrequest_terminated(struct cifs_io_subrequest *wdata, ssize_t
{
struct netfs_io_request *wreq = wdata->rreq;
struct netfs_inode *ictx = netfs_inode(wreq->inode);
+ unsigned long long i_size, remote_i_size, zero_point;
loff_t wrend;
if (result > 0) {
+ netfs_read_sizes(ictx, &i_size, &remote_i_size, &zero_point);
+
wrend = wdata->subreq.start + wdata->subreq.transferred + result;
- if (wrend > ictx->zero_point &&
+ if (wrend > zero_point &&
(wdata->rreq->origin == NETFS_UNBUFFERED_WRITE ||
wdata->rreq->origin == NETFS_DIO_WRITE))
- ictx->zero_point = wrend;
- if (wrend > ictx->remote_i_size)
+ netfs_write_zero_point(ictx, wrend);
+ if (wrend > remote_i_size)
netfs_resize_file(ictx, wrend, true);
}
diff --git a/fs/smb/client/inode.c b/fs/smb/client/inode.c
index 16a5310155d5..c5a1e37ce55a 100644
--- a/fs/smb/client/inode.c
+++ b/fs/smb/client/inode.c
@@ -119,7 +119,7 @@ cifs_revalidate_cache(struct inode *inode, struct cifs_fattr *fattr)
fattr->cf_mtime = timestamp_truncate(fattr->cf_mtime, inode);
mtime = inode_get_mtime(inode);
if (timespec64_equal(&mtime, &fattr->cf_mtime) &&
- cifs_i->netfs.remote_i_size == fattr->cf_eof) {
+ netfs_read_remote_i_size(&cifs_i->netfs) == fattr->cf_eof) {
cifs_dbg(FYI, "%s: inode %llu is unchanged\n",
__func__, cifs_i->uniqueid);
return;
@@ -174,7 +174,7 @@ cifs_fattr_to_inode(struct inode *inode, struct cifs_fattr *fattr,
return -ESTALE;
}
if (inode_state_read_once(inode) & I_NEW)
- CIFS_I(inode)->netfs.zero_point = fattr->cf_eof;
+ netfs_write_zero_point(&CIFS_I(inode)->netfs, fattr->cf_eof);
cifs_revalidate_cache(inode, fattr);
@@ -212,7 +212,7 @@ cifs_fattr_to_inode(struct inode *inode, struct cifs_fattr *fattr,
else
clear_bit(CIFS_INO_DELETE_PENDING, &cifs_i->flags);
- cifs_i->netfs.remote_i_size = fattr->cf_eof;
+ netfs_write_remote_i_size(&cifs_i->netfs, fattr->cf_eof);
/*
* Can't safely change the file size here if the client is writing to
* it due to potential races.
@@ -2772,7 +2772,8 @@ cifs_revalidate_mapping(struct inode *inode)
if (cifs_sb_flags(cifs_sb) & CIFS_MOUNT_RW_CACHE)
goto skip_invalidate;
- cifs_inode->netfs.zero_point = cifs_inode->netfs.remote_i_size;
+ netfs_write_zero_point(&cifs_inode->netfs,
+ netfs_read_remote_i_size(&cifs_inode->netfs));
rc = filemap_invalidate_inode(inode, true, 0, LLONG_MAX);
if (rc) {
cifs_dbg(VFS, "%s: invalidate inode %p failed with rc %d\n",
diff --git a/fs/smb/client/readdir.c b/fs/smb/client/readdir.c
index be22bbc4a65a..d88682e89ec0 100644
--- a/fs/smb/client/readdir.c
+++ b/fs/smb/client/readdir.c
@@ -143,7 +143,8 @@ cifs_prime_dcache(struct dentry *parent, struct qstr *name,
fattr->cf_rdev = inode->i_rdev;
fattr->cf_uid = inode->i_uid;
fattr->cf_gid = inode->i_gid;
- fattr->cf_eof = CIFS_I(inode)->netfs.remote_i_size;
+ fattr->cf_eof =
+ netfs_read_remote_i_size(&CIFS_I(inode)->netfs);
fattr->cf_symlink_target = NULL;
} else {
CIFS_I(inode)->time = 0;
diff --git a/fs/smb/client/smb2ops.c b/fs/smb/client/smb2ops.c
index 7f346ee50289..98638ac17b7b 100644
--- a/fs/smb/client/smb2ops.c
+++ b/fs/smb/client/smb2ops.c
@@ -3404,7 +3404,7 @@ static long smb3_zero_range(struct file *file, struct cifs_tcon *tcon,
filemap_invalidate_lock(inode->i_mapping);
i_size = i_size_read(inode);
- remote_size = ictx->remote_i_size;
+ remote_size = netfs_read_remote_i_size(ictx);
if (offset + len >= remote_size && offset < i_size) {
unsigned long long top = umin(offset + len, i_size);
@@ -3439,8 +3439,8 @@ static long smb3_zero_range(struct file *file, struct cifs_tcon *tcon,
if (rc >= 0) {
truncate_setsize(inode, new_size);
netfs_resize_file(&cifsi->netfs, new_size, true);
- if (offset < cifsi->netfs.zero_point)
- cifsi->netfs.zero_point = offset;
+ if (offset < netfs_read_zero_point(&cifsi->netfs))
+ netfs_write_zero_point(&cifsi->netfs, offset);
fscache_resize_cookie(cifs_inode_cookie(inode), new_size);
}
}
@@ -3506,13 +3506,13 @@ static long smb3_punch_hole(struct file *file, struct cifs_tcon *tcon,
* EOF update will end up in the wrong place.
*/
i_size = i_size_read(inode);
- remote_i_size = netfs_inode(inode)->remote_i_size;
+ remote_i_size = netfs_read_remote_i_size(netfs_inode(inode));
if (end > remote_i_size && i_size > remote_i_size) {
unsigned long long extend_to = umin(end, i_size);
rc = SMB2_set_eof(xid, tcon, cfile->fid.persistent_fid,
cfile->fid.volatile_fid, cfile->pid, extend_to);
if (rc >= 0)
- netfs_inode(inode)->remote_i_size = extend_to;
+ netfs_write_remote_i_size(netfs_inode(inode), extend_to);
}
unlock:
@@ -3794,7 +3794,7 @@ static long smb3_collapse_range(struct file *file, struct cifs_tcon *tcon,
goto out_2;
truncate_pagecache_range(inode, off, old_eof);
- ictx->zero_point = old_eof;
+ netfs_write_zero_point(ictx, old_eof);
netfs_wait_for_outstanding_io(inode);
rc = smb2_copychunk_range(xid, cfile, cfile, off + len,
@@ -3812,7 +3812,7 @@ static long smb3_collapse_range(struct file *file, struct cifs_tcon *tcon,
truncate_setsize(inode, new_eof);
netfs_resize_file(&cifsi->netfs, new_eof, true);
- ictx->zero_point = new_eof;
+ netfs_write_zero_point(ictx, new_eof);
fscache_resize_cookie(cifs_inode_cookie(inode), new_eof);
out_2:
filemap_invalidate_unlock(inode->i_mapping);
@@ -3861,7 +3861,7 @@ static long smb3_insert_range(struct file *file, struct cifs_tcon *tcon,
rc = smb2_copychunk_range(xid, cfile, cfile, off, count, off + len);
if (rc < 0)
goto out_2;
- cifsi->netfs.zero_point = new_eof;
+ netfs_write_zero_point(&cifsi->netfs, new_eof);
rc = smb3_zero_data(file, tcon, off, len, xid);
if (rc < 0)
diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c
index cb61051f9af3..368472589fe6 100644
--- a/fs/smb/client/smb2pdu.c
+++ b/fs/smb/client/smb2pdu.c
@@ -4708,7 +4708,7 @@ smb2_readv_callback(struct TCP_Server_Info *server, struct mid_q_entry *mid)
} else {
size_t trans = rdata->subreq.transferred + rdata->got_bytes;
if (trans < rdata->subreq.len &&
- rdata->subreq.start + trans >= ictx->remote_i_size) {
+ rdata->subreq.start + trans >= netfs_read_remote_i_size(ictx)) {
__set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags);
rdata->result = 0;
}
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index ba17ac5bf356..90e061e444ce 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -62,8 +62,8 @@ struct netfs_inode {
struct fscache_cookie *cache;
#endif
struct mutex wb_lock; /* Writeback serialisation */
- loff_t remote_i_size; /* Size of the remote file */
- loff_t zero_point; /* Size after which we assume there's no data
+ loff_t _remote_i_size; /* Size of the remote file */
+ loff_t _zero_point; /* Size after which we assume there's no data
* on the server */
atomic_t io_count; /* Number of outstanding reqs */
unsigned long flags;
@@ -474,6 +474,264 @@ static inline struct netfs_inode *netfs_inode(struct inode *inode)
return container_of(inode, struct netfs_inode, inode);
}
+/**
+ * netfs_read_remote_i_size - Read remote_i_size safely
+ * @ictx: The inode context to access
+ *
+ * Read remote_i_size safely without the potential for tearing on 32-bit
+ * arches.
+ *
+ * NOTE: in a 32bit arch with a preemptable kernel and an UP compile the
+ * i_size_read/write must be atomic with respect to the local cpu (unlike with
+ * preempt disabled), but they don't need to be atomic with respect to other
+ * cpus like in true SMP (so they need to either locally disable irq
+ * around the read or, for example on x86, they can still be implemented as a
+ * cmpxchg8b without the need of the lock prefix). For SMP compiles and 64bit
+ * archs it makes no difference if preempt is enabled or not.
+ */
+static inline unsigned long long netfs_read_remote_i_size(const struct netfs_inode *ictx)
+{
+ unsigned long long remote_i_size;
+
+#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
+ const struct inode *inode = &ictx->inode;
+ unsigned int seq;
+
+ do {
+ seq = read_seqcount_begin(&inode->i_size_seqcount);
+ remote_i_size = ictx->_remote_i_size;
+ } while (read_seqcount_retry(&inode->i_size_seqcount, seq));
+#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION)
+ preempt_disable();
+ remote_i_size = ictx->_remote_i_size;
+ preempt_enable();
+#else
+ /* Pairs with smp_store_release() in netfs_write_remote_i_size() */
+ remote_i_size = smp_load_acquire(&ictx->_remote_i_size);
+#endif
+ return remote_i_size;
+}
+
+/*
+ * netfs_write_remote_i_size - Set remote_i_size safely
+ * @ictx: The inode context to access
+ * @remote_i_size: The new value for the size of the file on the server
+ *
+ * Set remote_i_size safely without the potential for tearing on 32-bit arches.
+ *
+ * NOTE: unlike netfs_read_remote_i_size(), netfs_write_remote_i_size() does
+ * need locking around it (normally i_rwsem), otherwise on 32bit/SMP an update
+ * of i_size_seqcount can be lost, resulting in subsequent i_size_read() calls
+ * spinning forever.
+ */
+static inline void netfs_write_remote_i_size(struct netfs_inode *ictx,
+ unsigned long long remote_i_size)
+{
+#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
+ struct inode *inode = &ictx->inode;
+
+ preempt_disable();
+ write_seqcount_begin(&inode->i_size_seqcount);
+ ictx->_remote_i_size = remote_i_size;
+ write_seqcount_end(&inode->i_size_seqcount);
+ preempt_enable();
+#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION)
+ preempt_disable();
+ ictx->_remote_i_size = remote_i_size;
+ preempt_enable();
+#else
+ /*
+ * Pairs with smp_load_acquire() in netfs_read_remote_i_size() to
+ * ensure changes related to inode size (such as page contents) are
+ * visible before we see the changed inode size.
+ */
+ smp_store_release(&ictx->_remote_i_size, remote_i_size);
+#endif
+}
+
+/**
+ * netfs_read_zero_point - Read zero_point safely
+ * @ictx: The inode context to access
+ *
+ * Read zero_point safely without the potential for tearing on 32-bit
+ * arches.
+ *
+ * NOTE: in a 32bit arch with a preemptable kernel and an UP compile the
+ * i_size_read/write must be atomic with respect to the local cpu (unlike with
+ * preempt disabled), but they don't need to be atomic with respect to other
+ * cpus like in true SMP (so they need to either locally disable irq
+ * around the read or, for example on x86, they can still be implemented as a
+ * cmpxchg8b without the need of the lock prefix). For SMP compiles and 64bit
+ * archs it makes no difference if preempt is enabled or not.
+ */
+static inline unsigned long long netfs_read_zero_point(const struct netfs_inode *ictx)
+{
+ unsigned long long zero_point;
+
+#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
+ const struct inode *inode = &ictx->inode;
+ unsigned int seq;
+
+ do {
+ seq = read_seqcount_begin(&inode->i_size_seqcount);
+ zero_point = ictx->_zero_point;
+ } while (read_seqcount_retry(&inode->i_size_seqcount, seq));
+#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION)
+ preempt_disable();
+ zero_point = ictx->_zero_point;
+ preempt_enable();
+#else
+ /* Pairs with smp_store_release() in netfs_write_zero_point() */
+ zero_point = smp_load_acquire(&ictx->_zero_point);
+#endif
+ return zero_point;
+}
+
+/*
+ * netfs_write_zero_point - Set zero_point safely
+ * @ictx: The inode context to access
+ * @zero_point: The new value for the point beyond which the server has no data
+ *
+ * Set zero_point safely without the potential for tearing on 32-bit arches.
+ *
+ * NOTE: unlike netfs_read_zero_point(), netfs_write_zero_point() does need
+ * locking around it (normally i_rwsem), otherwise on 32bit/SMP an update of
+ * i_size_seqcount can be lost, resulting in subsequent read calls spinning
+ * forever.
+ */
+static inline void netfs_write_zero_point(struct netfs_inode *ictx,
+ unsigned long long zero_point)
+{
+#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
+ struct inode *inode = &ictx->inode;
+
+ preempt_disable();
+ write_seqcount_begin(&inode->i_size_seqcount);
+ ictx->_zero_point = zero_point;
+ write_seqcount_end(&inode->i_size_seqcount);
+ preempt_enable();
+#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION)
+ preempt_disable();
+ ictx->_zero_point = zero_point;
+ preempt_enable();
+#else
+ /*
+ * Pairs with smp_load_acquire() in netfs_read_zero_point() to
+ * ensure changes related to inode size (such as page contents) are
+ * visible before we see the changed inode size.
+ */
+ smp_store_release(&ictx->_zero_point, zero_point);
+#endif
+}
+
+/**
+ * netfs_push_back_zero_point - Push back the zero point if unknown data now beyond it
+ * @ictx: The inode context to access
+ * @to: The end of a new region of unknown data
+ *
+ * Move back the zero_point if we cause a region of unknown data to appear
+ * beyond it (such as doing a copy_file_range).
+ */
+static inline void netfs_push_back_zero_point(struct netfs_inode *ictx,
+ unsigned long long to)
+{
+ if (to > netfs_read_zero_point(ictx))
+ netfs_write_zero_point(ictx, to);
+}
+
+/**
+ * netfs_read_sizes - Read remote_i_size and zero_point safely
+ * @ictx: The inode context to access
+ * @i_size: Where to return the local file size.
+ * @remote_i_size: Where to return the size of the file on the server
+ * @zero_point: Where to return the point beyond which the server has no data
+ *
+ * Read remote_i_size and zero_point safely without the potential for tearing
+ * on 32-bit arches.
+ *
+ * NOTE: in a 32bit arch with a preemptable kernel and an UP compile the
+ * i_size_read/write must be atomic with respect to the local cpu (unlike with
+ * preempt disabled), but they don't need to be atomic with respect to other
+ * cpus like in true SMP (so they need to either locally disable irq
+ * around the read or, for example on x86, they can still be implemented as a
+ * cmpxchg8b without the need of the lock prefix). For SMP compiles and 64bit
+ * archs it makes no difference if preempt is enabled or not.
+ */
+static inline void netfs_read_sizes(const struct netfs_inode *ictx,
+ unsigned long long *i_size,
+ unsigned long long *remote_i_size,
+ unsigned long long *zero_point)
+{
+ const struct inode *inode = &ictx->inode;
+#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
+ unsigned int seq;
+
+ do {
+ seq = read_seqcount_begin(&inode->i_size_seqcount);
+ *i_size = inode->i_size;
+ *remote_i_size = ictx->_remote_i_size;
+ *zero_point = ictx->_zero_point;
+ } while (read_seqcount_retry(&inode->i_size_seqcount, seq));
+#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION)
+ preempt_disable();
+ *i_size = inode->i_size;
+ *remote_i_size = ictx->_remote_i_size;
+ *zero_point = ictx->_zero_point;
+ preempt_enable();
+#else
+ /* Pairs with smp_store_release() in i_size_write() */
+ *i_size = smp_load_acquire(&inode->i_size);
+ /* Pairs with smp_store_release() in netfs_write_remote_i_size() */
+ *remote_i_size = smp_load_acquire(&ictx->_remote_i_size);
+ /* Pairs with smp_store_release() in netfs_write_zero_point() */
+ *zero_point = smp_load_acquire(&ictx->_zero_point);
+#endif
+}
+
+/*
+ * netfs_write_sizes - Set remote_i_size and zero_point safely
+ * @ictx: The inode context to access
+ * @remote_i_size: The new value for the size of the file on the server
+ * @zero_point: The new value for the point beyond which the server has no data
+ *
+ * Set both remote_i_size and zero_point safely without the potential for
+ * tearing on 32-bit arches.
+ *
+ * NOTE: unlike netfs_read_sizes(), netfs_write_sizes() does need
+ * locking around it (normally i_rwsem), otherwise on 32bit/SMP an update of
+ * i_size_seqcount can be lost, resulting in subsequent read calls spinning
+ * forever.
+ */
+static inline void netfs_write_sizes(struct netfs_inode *ictx,
+ unsigned long long remote_i_size,
+ unsigned long long zero_point)
+{
+#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
+ struct inode *inode = &ictx->inode;
+
+ preempt_disable();
+ write_seqcount_begin(&inode->i_size_seqcount);
+ ictx->_remote_i_size = remote_i_size;
+ ictx->_zero_point = zero_point;
+ write_seqcount_end(&inode->i_size_seqcount);
+ preempt_enable();
+#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION)
+ preempt_disable();
+ ictx->_remote_i_size = remote_i_size;
+ ictx->_zero_point = zero_point;
+ preempt_enable();
+#else
+ /*
+ * Pairs with smp_load_acquire() in netfs_read_remote_i_size() and
+ * netfs_read_zero_point() to ensure changes related to inode size
+ * (such as page contents) are visible before we see the changed inode
+ * size.
+ */
+ smp_store_release(&ictx->_remote_i_size, remote_i_size);
+ smp_store_release(&ictx->_zero_point, zero_point);
+#endif
+}
+
/**
* netfs_inode_init - Initialise a netfslib inode context
* @ctx: The netfs inode to initialise
@@ -488,8 +746,8 @@ static inline void netfs_inode_init(struct netfs_inode *ctx,
bool use_zero_point)
{
ctx->ops = ops;
- ctx->remote_i_size = i_size_read(&ctx->inode);
- ctx->zero_point = LLONG_MAX;
+ ctx->_remote_i_size = i_size_read(&ctx->inode);
+ ctx->_zero_point = LLONG_MAX;
ctx->flags = 0;
atomic_set(&ctx->io_count, 0);
#if IS_ENABLED(CONFIG_FSCACHE)
@@ -498,7 +756,7 @@ static inline void netfs_inode_init(struct netfs_inode *ctx,
mutex_init(&ctx->wb_lock);
/* ->releasepage() drives zero_point */
if (use_zero_point) {
- ctx->zero_point = ctx->remote_i_size;
+ ctx->_zero_point = ctx->_remote_i_size;
mapping_set_release_always(ctx->inode.i_mapping);
}
}
@@ -511,13 +769,40 @@ static inline void netfs_inode_init(struct netfs_inode *ctx,
*
* Inform the netfs lib that a file got resized so that it can adjust its state.
*/
-static inline void netfs_resize_file(struct netfs_inode *ctx, loff_t new_i_size,
+static inline void netfs_resize_file(struct netfs_inode *ictx,
+ unsigned long long new_i_size,
bool changed_on_server)
{
+#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
+ struct inode *inode = &ictx->inode;
+
+ preempt_disable();
+ write_seqcount_begin(&inode->i_size_seqcount);
+ if (changed_on_server)
+ ictx->_remote_i_size = new_i_size;
+ if (new_i_size < ictx->_zero_point)
+ ictx->_zero_point = new_i_size;
+ write_seqcount_end(&inode->i_size_seqcount);
+ preempt_enable();
+#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION)
+ preempt_disable();
if (changed_on_server)
- ctx->remote_i_size = new_i_size;
- if (new_i_size < ctx->zero_point)
- ctx->zero_point = new_i_size;
+ ictx->_remote_i_size = new_i_size;
+ if (new_i_size < ictx->_zero_point)
+ ictx->_zero_point = new_i_size;
+ preempt_enable();
+#else
+ /*
+ * Pairs with smp_load_acquire() in netfs_read_remote_i_size() and
+ * netfs_read_zero_point() to ensure changes related to inode size
+ * (such as page contents) are visible before we see the changed inode
+ * size.
+ */
+ if (changed_on_server)
+ smp_store_release(&ictx->_remote_i_size, new_i_size);
+ if (new_i_size < ictx->_zero_point)
+ smp_store_release(&ictx->_zero_point, new_i_size);
+#endif
}
/**
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v4 06/22] netfs: Fix zeropoint update where i_size > remote_i_size
2026-04-27 15:29 [PATCH v4 00/22] netfs: Miscellaneous fixes David Howells
` (4 preceding siblings ...)
2026-04-27 15:29 ` [PATCH v4 05/22] netfs: Fix potential for tearing in ->remote_i_size and ->zero_point David Howells
@ 2026-04-27 15:29 ` David Howells
2026-04-27 15:29 ` [PATCH v4 07/22] netfs: fix VM_BUG_ON_FOLIO() issue in netfs_write_begin() call David Howells
` (4 subsequent siblings)
10 siblings, 0 replies; 13+ messages in thread
From: David Howells @ 2026-04-27 15:29 UTC (permalink / raw)
To: Christian Brauner
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
ceph-devel, linux-fsdevel, linux-kernel, Matthew Wilcox
Fix the update of the zero point[*] by netfs_release_folio() when there is
uncommitted data in the pagecache beyond the folio being released but the
on-server EOF is in this folio (ie. i_size > remote_i_size). The update
needs to limit zero_point to remote_i_size, not i_size as i_size is a local
phenomenon reflecting updates made locally to the pagecache, not stuff
written to the server. remote_i_size tracks the server's i_size.
[*] The zero point is the file position from which we can assume that the
server will just return zeros, so we can avoid generating reads.
Note that netfs_invalidate_folio() probably doesn't need fixing as
zero_point should be updated by setattr after truncation or fallocate.
Found with:
fsx -q -N 1000000 -p 10000 -o 128000 -l 600000 \
/xfstest.test/junk --replay-ops=junk.fsxops
using the following as junk.fsxops:
truncate 0x0 0x1bbae 0x82864
write 0x3ef2e 0xf9c8 0x1bbae
write 0x67e05 0xcb5a 0x4e8f6
mapread 0x57781 0x85b6 0x7495f
copy_range 0x5d3d 0x10329 0x54fac 0x7495f
write 0x64710 0x1c2b 0x7495f
mapread 0x64000 0x1000 0x7495f
on cifs with the default cache option.
It shows read-gaps on folio 0x64 failing with a short read (ie. it hits
EOF) if the FMODE_READ check is commented out in netfs_perform_write():
if (//(file->f_mode & FMODE_READ) ||
netfs_is_cache_enabled(ctx)) {
and no fscache. This was initially found with the generic/522 xfstest.
Fixes: cce6bfa6ca0e ("netfs: Fix trimming of streaming-write folios in netfs_inval_folio()")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/misc.c | 4 ++--
include/linux/netfs.h | 35 +++++++++++++++++++++++++++--------
2 files changed, 29 insertions(+), 10 deletions(-)
diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
index 9d92d068f1da..37d9651078e6 100644
--- a/fs/netfs/misc.c
+++ b/fs/netfs/misc.c
@@ -299,9 +299,9 @@ bool netfs_release_folio(struct folio *folio, gfp_t gfp)
return false;
netfs_read_sizes(ctx, &i_size, &remote_i_size, &zero_point);
- end = umin(folio_next_pos(folio), i_size);
+ end = folio_next_pos(folio);
if (end > zero_point)
- netfs_write_zero_point(ctx, end);
+ netfs_push_back_zero_point(ctx, umin(end, remote_i_size));
if (folio_test_private(folio))
return false;
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 90e061e444ce..59f35d2eeb2e 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -530,11 +530,11 @@ static inline void netfs_write_remote_i_size(struct netfs_inode *ictx,
#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
struct inode *inode = &ictx->inode;
- preempt_disable();
+ spin_lock(&inode->i_lock);
write_seqcount_begin(&inode->i_size_seqcount);
ictx->_remote_i_size = remote_i_size;
write_seqcount_end(&inode->i_size_seqcount);
- preempt_enable();
+ spin_unlock(&inode->i_lock);
#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION)
preempt_disable();
ictx->_remote_i_size = remote_i_size;
@@ -605,11 +605,11 @@ static inline void netfs_write_zero_point(struct netfs_inode *ictx,
#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
struct inode *inode = &ictx->inode;
- preempt_disable();
+ spin_lock(&inode->i_lock);
write_seqcount_begin(&inode->i_size_seqcount);
ictx->_zero_point = zero_point;
write_seqcount_end(&inode->i_size_seqcount);
- preempt_enable();
+ spin_unlock(&inode->i_lock);
#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION)
preempt_disable();
ictx->_zero_point = zero_point;
@@ -635,8 +635,27 @@ static inline void netfs_write_zero_point(struct netfs_inode *ictx,
static inline void netfs_push_back_zero_point(struct netfs_inode *ictx,
unsigned long long to)
{
- if (to > netfs_read_zero_point(ictx))
- netfs_write_zero_point(ictx, to);
+#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
+ struct inode *inode = &ictx->inode;
+
+ spin_lock(&inode->i_lock);
+ write_seqcount_begin(&inode->i_size_seqcount);
+ if (to > ictx->_zero_point)
+ ictx->_zero_point = to;
+ write_seqcount_end(&inode->i_size_seqcount);
+ spin_unlock(&inode->i_lock);
+#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION)
+ preempt_disable();
+ if (to > ictx->_zero_point)
+ ictx->_zero_point = to;
+ preempt_enable();
+#else
+ unsigned long long old = ictx->_zero_point;
+
+ while (to > old) {
+ old = cmpxchg_release(&ictx->_zero_point, old, to);
+ }
+#endif
}
/**
@@ -709,12 +728,12 @@ static inline void netfs_write_sizes(struct netfs_inode *ictx,
#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
struct inode *inode = &ictx->inode;
- preempt_disable();
+ spin_lock(&inode->i_lock);
write_seqcount_begin(&inode->i_size_seqcount);
ictx->_remote_i_size = remote_i_size;
ictx->_zero_point = zero_point;
write_seqcount_end(&inode->i_size_seqcount);
- preempt_enable();
+ spin_unlock(&inode->i_lock);
#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION)
preempt_disable();
ictx->_remote_i_size = remote_i_size;
* [PATCH v4 07/22] netfs: fix VM_BUG_ON_FOLIO() issue in netfs_write_begin() call
2026-04-27 15:29 [PATCH v4 00/22] netfs: Miscellaneous fixes David Howells
` (5 preceding siblings ...)
2026-04-27 15:29 ` [PATCH v4 06/22] netfs: Fix zeropoint update where i_size > remote_i_size David Howells
@ 2026-04-27 15:29 ` David Howells
2026-04-27 15:29 ` [PATCH v4 08/22] netfs: fix error handling in netfs_extract_user_iter() David Howells
` (3 subsequent siblings)
10 siblings, 0 replies; 13+ messages in thread
From: David Howells @ 2026-04-27 15:29 UTC (permalink / raw)
To: Christian Brauner
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
ceph-devel, linux-fsdevel, linux-kernel, Viacheslav Dubeyko
From: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Multiple runs of the generic/013 test-case can reproduce a kernel BUG
at mm/filemap.c:1504 with a probability of about 30%.
while true; do
sudo ./check generic/013
done
[ 9849.452376] page: refcount:3 mapcount:0 mapping:00000000e58ff252 index:0x10781 pfn:0x1c322
[ 9849.452412] memcg:ffff8881a1915800
[ 9849.452417] aops:ceph_aops ino:1000058db9e dentry name(?):"f9XXXXXX"
[ 9849.452432] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[ 9849.452441] raw: 0017ffffc0000000 0000000000000000 dead000000000122 ffff88816110d248
[ 9849.452445] raw: 0000000000010781 0000000000000000 00000003ffffffff ffff8881a1915800
[ 9849.452447] page dumped because: VM_BUG_ON_FOLIO(!folio_test_locked(folio))
[ 9849.452474] ------------[ cut here ]------------
[ 9849.452476] kernel BUG at mm/filemap.c:1504!
[ 9849.478635] Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
[ 9849.481772] CPU: 2 UID: 0 PID: 84223 Comm: fsstress Not tainted 7.0.0-rc1+ #18 PREEMPT(full)
[ 9849.482881] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-9.fc43 06/10/2025
[ 9849.484539] RIP: 0010:folio_unlock+0x85/0xa0
[ 9849.485076] Code: 89 df 31 f6 e8 1c f3 ff ff 48 8b 5d f8 c9 31 c0 31 d2 31 f6 31 ff c3 cc
cc cc cc 48 c7 c6 80 6c d9 a7 48 89 df e8 4b b3 10 00 <0f> 0b 48 89 df e8 21 e6 2c 00 eb 9d 0f 1f 40 00 66 66 2e 0f 1f 84
[ 9849.493818] RSP: 0018:ffff8881bb8076b0 EFLAGS: 00010246
[ 9849.495740] RAX: 0000000000000000 RBX: ffffea00070c8980 RCX: 0000000000000000
[ 9849.498678] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 9849.500559] RBP: ffff8881bb8076b8 R08: 0000000000000000 R09: 0000000000000000
[ 9849.501097] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000010782000
[ 9849.502108] R13: ffff8881935de738 R14: ffff88816110d010 R15: 0000000000001000
[ 9849.502516] FS: 00007e36cbe94740(0000) GS:ffff88824a899000(0000) knlGS:0000000000000000
[ 9849.502996] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9849.503810] CR2: 000000c0002b0000 CR3: 000000011bbf6004 CR4: 0000000000772ef0
[ 9849.504459] PKRU: 55555554
[ 9849.504626] Call Trace:
[ 9849.505242] <TASK>
[ 9849.505379] netfs_write_begin+0x7c8/0x10a0
[ 9849.505877] ? __kasan_check_read+0x11/0x20
[ 9849.506384] ? __pfx_netfs_write_begin+0x10/0x10
[ 9849.507178] ceph_write_begin+0x8c/0x1c0
[ 9849.507934] generic_perform_write+0x391/0x8f0
[ 9849.508503] ? __pfx_generic_perform_write+0x10/0x10
[ 9849.509062] ? file_update_time_flags+0x19a/0x4b0
[ 9849.509581] ? ceph_get_caps+0x63/0xf0
[ 9849.510259] ? ceph_get_caps+0x63/0xf0
[ 9849.510530] ceph_write_iter+0xe79/0x1ae0
[ 9849.511282] ? __pfx_ceph_write_iter+0x10/0x10
[ 9849.511839] ? lock_acquire+0x1ad/0x310
[ 9849.512334] ? ksys_write+0xf9/0x230
[ 9849.512582] ? lock_is_held_type+0xaa/0x140
[ 9849.513128] vfs_write+0x512/0x1110
[ 9849.513634] ? __fget_files+0x33/0x350
[ 9849.513893] ? __pfx_vfs_write+0x10/0x10
[ 9849.514143] ? mutex_lock_nested+0x1b/0x30
[ 9849.514394] ksys_write+0xf9/0x230
[ 9849.514621] ? __pfx_ksys_write+0x10/0x10
[ 9849.514887] ? do_syscall_64+0x25e/0x1520
[ 9849.515122] ? __kasan_check_read+0x11/0x20
[ 9849.515366] ? trace_hardirqs_on_prepare+0x178/0x1c0
[ 9849.515655] __x64_sys_write+0x72/0xd0
[ 9849.515885] ? trace_hardirqs_on+0x24/0x1c0
[ 9849.516130] x64_sys_call+0x22f/0x2390
[ 9849.516341] do_syscall_64+0x12b/0x1520
[ 9849.516545] ? do_syscall_64+0x27c/0x1520
[ 9849.516783] ? do_syscall_64+0x27c/0x1520
[ 9849.517003] ? lock_release+0x318/0x480
[ 9849.517220] ? __x64_sys_io_getevents+0x143/0x2d0
[ 9849.517479] ? percpu_ref_put_many.constprop.0+0x8f/0x210
[ 9849.517779] ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 9849.518073] ? do_syscall_64+0x25e/0x1520
[ 9849.518291] ? __kasan_check_read+0x11/0x20
[ 9849.518519] ? trace_hardirqs_on_prepare+0x178/0x1c0
[ 9849.518799] ? do_syscall_64+0x27c/0x1520
[ 9849.519024] ? local_clock_noinstr+0xf/0x120
[ 9849.519262] ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 9849.519544] ? do_syscall_64+0x25e/0x1520
[ 9849.519781] ? __kasan_check_read+0x11/0x20
[ 9849.520008] ? trace_hardirqs_on_prepare+0x178/0x1c0
[ 9849.520273] ? do_syscall_64+0x27c/0x1520
[ 9849.520491] ? trace_hardirqs_on_prepare+0x178/0x1c0
[ 9849.520767] ? irqentry_exit+0x10c/0x6c0
[ 9849.520984] ? trace_hardirqs_off+0x86/0x1b0
[ 9849.521224] ? exc_page_fault+0xab/0x130
[ 9849.521472] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 9849.521766] RIP: 0033:0x7e36cbd14907
[ 9849.521989] Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 9849.523057] RSP: 002b:00007ffff2d2a968 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 9849.523484] RAX: ffffffffffffffda RBX: 000000000000e549 RCX: 00007e36cbd14907
[ 9849.523885] RDX: 000000000000e549 RSI: 00005bd797ec6370 RDI: 0000000000000004
[ 9849.524277] RBP: 0000000000000004 R08: 0000000000000047 R09: 00005bd797ec6370
[ 9849.524652] R10: 0000000000000078 R11: 0000000000000246 R12: 0000000000000049
[ 9849.525062] R13: 0000000010781a37 R14: 00005bd797ec6370 R15: 0000000000000000
[ 9849.525447] </TASK>
[ 9849.525574] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass ghash_clmulni_intel aesni_intel input_leds rapl mac_hid psmouse vga16fb serio_raw vgastate floppy i2c_piix4 bochs qemu_fw_cfg i2c_smbus pata_acpi sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore
[ 9849.529150] ---[ end trace 0000000000000000 ]---
[ 9849.529502] RIP: 0010:folio_unlock+0x85/0xa0
[ 9849.530813] Code: 89 df 31 f6 e8 1c f3 ff ff 48 8b 5d f8 c9 31 c0 31 d2 31 f6 31 ff c3 cc cc cc cc 48 c7 c6 80 6c d9 a7 48 89 df e8 4b b3 10 00 <0f> 0b 48 89 df e8 21 e6 2c 00 eb 9d 0f 1f 40 00 66 66 2e 0f 1f 84
[ 9849.534986] RSP: 0018:ffff8881bb8076b0 EFLAGS: 00010246
[ 9849.536198] RAX: 0000000000000000 RBX: ffffea00070c8980 RCX: 0000000000000000
[ 9849.537718] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 9849.539321] RBP: ffff8881bb8076b8 R08: 0000000000000000 R09: 0000000000000000
[ 9849.540862] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000010782000
[ 9849.542438] R13: ffff8881935de738 R14: ffff88816110d010 R15: 0000000000001000
[ 9849.543996] FS: 00007e36cbe94740(0000) GS:ffff88824b899000(0000) knlGS:0000000000000000
[ 9849.545854] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9849.547092] CR2: 00007e36cb3ff000 CR3: 000000011bbf6006 CR4: 0000000000772ef0
[ 9849.548679] PKRU: 55555554
The race sequence:
1. Read completes -> netfs_read_collection() runs
2. netfs_wake_rreq_flag(rreq, NETFS_RREQ_IN_PROGRESS, ...)
3. netfs_wait_for_read() returns -EFAULT to netfs_write_begin()
4. The netfs_unlock_abandoned_read_pages() unlocks the folio
5. netfs_write_begin() calls folio_unlock(folio) -> VM_BUG_ON_FOLIO()
The key reason for the issue is that netfs_unlock_abandoned_read_pages()
doesn't check the NETFS_RREQ_NO_UNLOCK_FOLIO flag and executes
folio_unlock() unconditionally. This patch implements logic in
netfs_unlock_abandoned_read_pages() similar to that in
netfs_unlock_read_folio().
Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: Ceph Development <ceph-devel@vger.kernel.org>
---
fs/netfs/read_retry.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/fs/netfs/read_retry.c b/fs/netfs/read_retry.c
index b34561e257f0..6e5e0da88290 100644
--- a/fs/netfs/read_retry.c
+++ b/fs/netfs/read_retry.c
@@ -290,8 +290,15 @@ void netfs_unlock_abandoned_read_pages(struct netfs_io_request *rreq)
struct folio *folio = folioq_folio(p, slot);
if (folio && !folioq_is_marked2(p, slot)) {
- trace_netfs_folio(folio, netfs_folio_trace_abandon);
- folio_unlock(folio);
+ if (folio->index == rreq->no_unlock_folio &&
+ test_bit(NETFS_RREQ_NO_UNLOCK_FOLIO,
+ &rreq->flags)) {
+ _debug("no unlock");
+ } else {
+ trace_netfs_folio(folio,
+ netfs_folio_trace_abandon);
+ folio_unlock(folio);
+ }
}
}
}
* [PATCH v4 08/22] netfs: fix error handling in netfs_extract_user_iter()
2026-04-27 15:29 [PATCH v4 00/22] netfs: Miscellaneous fixes David Howells
` (6 preceding siblings ...)
2026-04-27 15:29 ` [PATCH v4 07/22] netfs: fix VM_BUG_ON_FOLIO() issue in netfs_write_begin() call David Howells
@ 2026-04-27 15:29 ` David Howells
2026-04-27 15:29 ` [PATCH v4 09/22] netfs: Fix netfs_invalidate_folio() to clear dirty bit if all changes gone David Howells
` (2 subsequent siblings)
10 siblings, 0 replies; 13+ messages in thread
From: David Howells @ 2026-04-27 15:29 UTC (permalink / raw)
To: Christian Brauner
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
ceph-devel, linux-fsdevel, linux-kernel, Xiaoli Feng, stable
From: Paulo Alcantara <pc@manguebit.org>
In netfs_extract_user_iter(), if iov_iter_extract_pages() failed to
extract user pages, bail out on -ENOMEM, otherwise return the error
code only if @npages == 0, allowing short DIO reads and writes to be
issued.
This fixes mmapstress02 from LTP tests against CIFS.
Fixes: 85dd2c8ff368 ("netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator")
Reported-by: Xiaoli Feng <xifeng@redhat.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: netfs@lists.linux.dev
Cc: stable@vger.kernel.org
Cc: linux-cifs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/iterator.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
index 154a14bb2d7f..adca78747f23 100644
--- a/fs/netfs/iterator.c
+++ b/fs/netfs/iterator.c
@@ -22,7 +22,7 @@
*
* Extract the page fragments from the given amount of the source iterator and
* build up a second iterator that refers to all of those bits. This allows
- * the original iterator to disposed of.
+ * the original iterator to be disposed of.
*
* @extraction_flags can have ITER_ALLOW_P2PDMA set to request peer-to-peer DMA be
* allowed on the pages extracted.
@@ -67,8 +67,8 @@ ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len,
ret = iov_iter_extract_pages(orig, &pages, count,
max_pages - npages, extraction_flags,
&offset);
- if (ret < 0) {
- pr_err("Couldn't get user pages (rc=%zd)\n", ret);
+ if (unlikely(ret <= 0)) {
+ ret = ret ?: -EIO;
break;
}
@@ -97,6 +97,13 @@ ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len,
npages += cur_npages;
}
+ if (ret < 0 && (ret == -ENOMEM || npages == 0)) {
+ for (i = 0; i < npages; i++)
+ unpin_user_page(bv[i].bv_page);
+ kvfree(bv);
+ return ret;
+ }
+
iov_iter_bvec(new, orig->data_source, bv, npages, orig_len - count);
return npages;
}
* [PATCH v4 09/22] netfs: Fix netfs_invalidate_folio() to clear dirty bit if all changes gone
2026-04-27 15:29 [PATCH v4 00/22] netfs: Miscellaneous fixes David Howells
` (7 preceding siblings ...)
2026-04-27 15:29 ` [PATCH v4 08/22] netfs: fix error handling in netfs_extract_user_iter() David Howells
@ 2026-04-27 15:29 ` David Howells
2026-04-27 15:29 ` [PATCH v4 10/22] netfs: Defer the emission of trace_netfs_folio() David Howells
2026-04-27 15:29 ` [PATCH v4 11/22] netfs: Fix streaming write being overwritten David Howells
10 siblings, 0 replies; 13+ messages in thread
From: David Howells @ 2026-04-27 15:29 UTC (permalink / raw)
To: Christian Brauner
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
ceph-devel, linux-fsdevel, linux-kernel, Marc Dionne,
Matthew Wilcox
If a streaming write is made, this will leave the relevant modified folio
in a not-uptodate, but dirty state with a netfs_folio struct hung off of
folio->private indicating the dirty range. Subsequently truncating the
file such that the dirty data in the folio is removed, but the first part
of the folio theoretically remains will cause the netfs_folio struct to be
discarded... but will leave the dirty flag set.
If the folio is then read via mmap(), netfs_read_folio() will see that the
page is dirty and jump to netfs_read_gaps() to fill in the missing bits.
netfs_read_gaps(), however, expects there to be a netfs_folio struct
present and can oops because truncate removed it.
Fix this by calling folio_cancel_dirty() in netfs_invalidate_folio() in the
event that all the dirty data in the folio is erased (as nfs does).
Also add some tracepoints to log modifications to a dirty page.
This can be reproduced with something like:
dd if=/dev/zero of=/xfstest.test/foo bs=1M count=1
umount /xfstest.test
mount /xfstest.test
xfs_io -c "w 0xbbbf 0xf96c" \
-c "truncate 0xbbbf" \
-c "mmap -r 0xb000 0x11000" \
-c "mr 0xb000 0x11000" \
/xfstest.test/foo
with fscaching disabled (otherwise streaming writes are suppressed) and a
change to netfs_perform_write() to disallow streaming writes if the fd is
open O_RDWR:
if (//(file->f_mode & FMODE_READ) || <--- comment this out
netfs_is_cache_enabled(ctx)) {
It should be reproducible even without this change, but the check prevents
the above trivial xfs_io command from reproducing it.
Note that the initial dd is important: the file must start out sufficiently
large that the zero-point logic doesn't just clear the gaps because it
knows there's nothing in the file to read yet. Unmounting and remounting
are needed to clear the pagecache (other ways of doing that may also
work).
This was initially reproduced with the generic/522 xfstest on some patches
that remove the FMODE_READ restriction.
Fixes: 9ebff83e6481 ("netfs: Prep to use folio->private for write grouping and streaming write")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/misc.c | 6 +++++-
include/trace/events/netfs.h | 4 ++++
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
index 37d9651078e6..4e91a8d75bce 100644
--- a/fs/netfs/misc.c
+++ b/fs/netfs/misc.c
@@ -256,6 +256,7 @@ void netfs_invalidate_folio(struct folio *folio, size_t offset, size_t length)
/* Move the start of the data. */
finfo->dirty_len = fend - iend;
finfo->dirty_offset = offset;
+ trace_netfs_folio(folio, netfs_folio_trace_invalidate_front);
return;
}
@@ -264,12 +265,14 @@ void netfs_invalidate_folio(struct folio *folio, size_t offset, size_t length)
*/
if (iend >= fend) {
finfo->dirty_len = offset - fstart;
+ trace_netfs_folio(folio, netfs_folio_trace_invalidate_tail);
return;
}
/* A partial write was split. The caller has already zeroed
* it, so just absorb the hole.
*/
+ trace_netfs_folio(folio, netfs_folio_trace_invalidate_middle);
}
return;
@@ -277,8 +280,9 @@ void netfs_invalidate_folio(struct folio *folio, size_t offset, size_t length)
netfs_put_group(netfs_folio_group(folio));
folio_detach_private(folio);
folio_clear_uptodate(folio);
+ folio_cancel_dirty(folio);
kfree(finfo);
- return;
+ trace_netfs_folio(folio, netfs_folio_trace_invalidate_all);
}
EXPORT_SYMBOL(netfs_invalidate_folio);
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 8c936fc575d5..0b702f74aefe 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -194,6 +194,10 @@
EM(netfs_folio_trace_copy_to_cache, "mark-copy") \
EM(netfs_folio_trace_end_copy, "end-copy") \
EM(netfs_folio_trace_filled_gaps, "filled-gaps") \
+ EM(netfs_folio_trace_invalidate_all, "inval-all") \
+ EM(netfs_folio_trace_invalidate_front, "inval-front") \
+ EM(netfs_folio_trace_invalidate_middle, "inval-mid") \
+ EM(netfs_folio_trace_invalidate_tail, "inval-tail") \
EM(netfs_folio_trace_kill, "kill") \
EM(netfs_folio_trace_kill_cc, "kill-cc") \
EM(netfs_folio_trace_kill_g, "kill-g") \
* [PATCH v4 10/22] netfs: Defer the emission of trace_netfs_folio()
2026-04-27 15:29 [PATCH v4 00/22] netfs: Miscellaneous fixes David Howells
` (8 preceding siblings ...)
2026-04-27 15:29 ` [PATCH v4 09/22] netfs: Fix netfs_invalidate_folio() to clear dirty bit if all changes gone David Howells
@ 2026-04-27 15:29 ` David Howells
2026-04-27 15:29 ` [PATCH v4 11/22] netfs: Fix streaming write being overwritten David Howells
10 siblings, 0 replies; 13+ messages in thread
From: David Howells @ 2026-04-27 15:29 UTC (permalink / raw)
To: Christian Brauner
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
ceph-devel, linux-fsdevel, linux-kernel, Matthew Wilcox
Change netfs_perform_write() to keep the netfs_folio trace value in a
variable and emit it later to make it easier to choose the value displayed.
This is a prerequisite for a subsequent patch.
Closes: https://sashiko.dev/#/patchset/20260414082004.3756080-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/buffered_write.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index fc94eb1ef27b..7ac128d0b4e5 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -149,6 +149,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
}
do {
+ enum netfs_folio_trace trace;
struct netfs_folio *finfo;
struct netfs_group *group;
unsigned long long fpos;
@@ -222,7 +223,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
if (unlikely(copied == 0))
goto copy_failed;
netfs_set_group(folio, netfs_group);
- trace_netfs_folio(folio, netfs_folio_is_uptodate);
+ trace = netfs_folio_is_uptodate;
goto copied;
}
@@ -238,7 +239,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
folio_zero_segment(folio, offset + copied, flen);
__netfs_set_group(folio, netfs_group);
folio_mark_uptodate(folio);
- trace_netfs_folio(folio, netfs_modify_and_clear);
+ trace = netfs_modify_and_clear;
goto copied;
}
@@ -256,7 +257,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
}
__netfs_set_group(folio, netfs_group);
folio_mark_uptodate(folio);
- trace_netfs_folio(folio, netfs_whole_folio_modify);
+ trace = netfs_whole_folio_modify;
goto copied;
}
@@ -283,7 +284,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
if (unlikely(copied == 0))
goto copy_failed;
netfs_set_group(folio, netfs_group);
- trace_netfs_folio(folio, netfs_just_prefetch);
+ trace = netfs_just_prefetch;
goto copied;
}
@@ -297,7 +298,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
if (offset == 0 && copied == flen) {
__netfs_set_group(folio, netfs_group);
folio_mark_uptodate(folio);
- trace_netfs_folio(folio, netfs_streaming_filled_page);
+ trace = netfs_streaming_filled_page;
goto copied;
}
@@ -312,7 +313,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
finfo->dirty_len = copied;
folio_attach_private(folio, (void *)((unsigned long)finfo |
NETFS_FOLIO_INFO));
- trace_netfs_folio(folio, netfs_streaming_write);
+ trace = netfs_streaming_write;
goto copied;
}
@@ -332,9 +333,9 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
folio_detach_private(folio);
folio_mark_uptodate(folio);
kfree(finfo);
- trace_netfs_folio(folio, netfs_streaming_cont_filled_page);
+ trace = netfs_streaming_cont_filled_page;
} else {
- trace_netfs_folio(folio, netfs_streaming_write_cont);
+ trace = netfs_streaming_write_cont;
}
goto copied;
}
@@ -350,6 +351,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
continue;
copied:
+ trace_netfs_folio(folio, trace);
flush_dcache_folio(folio);
/* Update the inode size if we moved the EOF marker */
* [PATCH v4 11/22] netfs: Fix streaming write being overwritten
2026-04-27 15:29 [PATCH v4 00/22] netfs: Miscellaneous fixes David Howells
` (9 preceding siblings ...)
2026-04-27 15:29 ` [PATCH v4 10/22] netfs: Defer the emission of trace_netfs_folio() David Howells
@ 2026-04-27 15:29 ` David Howells
10 siblings, 0 replies; 13+ messages in thread
From: David Howells @ 2026-04-27 15:29 UTC (permalink / raw)
To: Christian Brauner
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
ceph-devel, linux-fsdevel, linux-kernel, Matthew Wilcox
In order to avoid reading whilst writing, netfslib will allow "streaming
writes" in which dirty data is stored directly into folios without reading
them first. Such folios are marked dirty but may not be marked uptodate.
If a folio is entirely written by a streaming write, uptodate will be set,
otherwise it will have a netfs_folio struct attached to ->private recording
the dirty region.
In the event that a partially written streaming write page is to be
overwritten entirely by a single write(), netfs_perform_write() will try to
copy over it, but doesn't discard the netfs_folio if it succeeds; further,
it doesn't correctly handle a partial copy that overwrites some of the
dirty data.
Fix this by the following:
(1) If the folio is successfully overwritten, free the netfs_folio struct
before marking the page uptodate.
(2) If the copy to the folio partially fails, but short of the dirty data,
just ignore the copy.
(3) If the copy partially fails and overwrites some of the dirty data,
accept the copy and update the netfs_folio struct to record the new data.
If the folio is now filled, free the netfs_folio and set uptodate,
otherwise return a partial write.
Found with:
fsx -q -N 1000000 -p 10000 -o 128000 -l 600000 \
/xfstest.test/junk --replay-ops=junk.fsxops
using the following as junk.fsxops:
truncate 0x0 0 0x927c0
write 0x63fb8 0x53c8 0
copy_range 0xb704 0x19b9 0x24429 0x79380
write 0x2402b 0x144a2 0x90660 *
write 0x204d5 0x140a0 0x927c0 *
copy_range 0x1f72c 0x137d0 0x7a906 0x927c0 *
read 0x00000 0x20000 0x9157c
read 0x20000 0x20000 0x9157c
read 0x40000 0x20000 0x9157c
read 0x60000 0x20000 0x9157c
read 0x7e1a0 0xcfb9 0x9157c
on cifs with the default cache option.
It shows folio 0x24 misbehaving if the FMODE_READ check is commented out in
netfs_perform_write():
if (//(file->f_mode & FMODE_READ) ||
netfs_is_cache_enabled(ctx)) {
and no fscache. This was initially found with the generic/522 xfstest.
Fixes: 8f52de0077ba ("netfs: Reduce number of conditional branches in netfs_perform_write()")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/buffered_write.c | 47 ++++++++++++++++++++++++++----------
include/trace/events/netfs.h | 3 +++
2 files changed, 37 insertions(+), 13 deletions(-)
diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 7ac128d0b4e5..25571a570ac9 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -246,18 +246,38 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
/* See if we can write a whole folio in one go. */
if (!maybe_trouble && offset == 0 && part >= flen) {
copied = copy_folio_from_iter_atomic(folio, offset, part, iter);
- if (unlikely(copied == 0))
+ if (likely(copied == part)) {
+ if (finfo) {
+ trace = netfs_whole_folio_modify_filled;
+ goto folio_now_filled;
+ }
+ __netfs_set_group(folio, netfs_group);
+ folio_mark_uptodate(folio);
+ trace = netfs_whole_folio_modify;
+ goto copied;
+ }
+ if (copied == 0)
goto copy_failed;
- if (unlikely(copied < part)) {
+ if (!finfo || copied <= finfo->dirty_offset) {
maybe_trouble = true;
iov_iter_revert(iter, copied);
copied = 0;
folio_unlock(folio);
goto retry;
}
- __netfs_set_group(folio, netfs_group);
- folio_mark_uptodate(folio);
- trace = netfs_whole_folio_modify;
+
+ /* We overwrote some existing dirty data, so we have to
+ * accept the partial write.
+ */
+ finfo->dirty_len += finfo->dirty_offset;
+ if (finfo->dirty_len == flen) {
+ trace = netfs_whole_folio_modify_filled_efault;
+ goto folio_now_filled;
+ }
+ if (copied > finfo->dirty_len)
+ finfo->dirty_len = copied;
+ finfo->dirty_offset = 0;
+ trace = netfs_whole_folio_modify_efault;
goto copied;
}
@@ -327,16 +347,10 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
goto copy_failed;
finfo->dirty_len += copied;
if (finfo->dirty_offset == 0 && finfo->dirty_len == flen) {
- if (finfo->netfs_group)
- folio_change_private(folio, finfo->netfs_group);
- else
- folio_detach_private(folio);
- folio_mark_uptodate(folio);
- kfree(finfo);
trace = netfs_streaming_cont_filled_page;
- } else {
- trace = netfs_streaming_write_cont;
+ goto folio_now_filled;
}
+ trace = netfs_streaming_write_cont;
goto copied;
}
@@ -350,6 +364,13 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
goto out;
continue;
+ folio_now_filled:
+ if (finfo->netfs_group)
+ folio_change_private(folio, finfo->netfs_group);
+ else
+ folio_detach_private(folio);
+ folio_mark_uptodate(folio);
+ kfree(finfo);
copied:
trace_netfs_folio(folio, trace);
flush_dcache_folio(folio);
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 0b702f74aefe..aa9940ba307b 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -177,6 +177,9 @@
EM(netfs_folio_is_uptodate, "mod-uptodate") \
EM(netfs_just_prefetch, "mod-prefetch") \
EM(netfs_whole_folio_modify, "mod-whole-f") \
+ EM(netfs_whole_folio_modify_efault, "mod-whole-f!") \
+ EM(netfs_whole_folio_modify_filled, "mod-whole-f+") \
+ EM(netfs_whole_folio_modify_filled_efault, "mod-whole-f+!") \
EM(netfs_modify_and_clear, "mod-n-clear") \
EM(netfs_streaming_write, "mod-streamw") \
EM(netfs_streaming_write_cont, "mod-streamw+") \
* [PATCH v4 11/22] netfs: Fix streaming write being overwritten
2026-04-27 15:46 [PATCH v4 00/22] netfs: Miscellaneous fixes David Howells
@ 2026-04-27 15:46 ` David Howells
0 siblings, 0 replies; 13+ messages in thread
From: David Howells @ 2026-04-27 15:46 UTC (permalink / raw)
To: Christian Brauner
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
ceph-devel, linux-fsdevel, linux-kernel, Matthew Wilcox
In order to avoid reading whilst writing, netfslib will allow "streaming
writes" in which dirty data is stored directly into folios without reading
them first. Such folios are marked dirty but may not be marked uptodate.
If a folio is entirely written by a streaming write, uptodate will be set,
otherwise it will have a netfs_folio struct attached to ->private recording
the dirty region.
In the event that a partially written streaming write page is to be
overwritten entirely by a single write(), netfs_perform_write() will try to
copy over it, but doesn't discard the netfs_folio if it succeeds; further,
it doesn't correctly handle a partial copy that overwrites some of the
dirty data.
Fix this by the following:
(1) If the folio is successfully overwritten, free the netfs_folio struct
before marking the page uptodate.
(2) If the copy to the folio partially fails, but short of the dirty data,
just ignore the copy.
(3) If the copy partially fails and overwrites some of the dirty data,
accept the copy, update the netfs_folio struct to record the new data.
If the folio is now filled, free the netfs_folio and set uptodate,
otherwise return a partial write.
Found with:
fsx -q -N 1000000 -p 10000 -o 128000 -l 600000 \
/xfstest.test/junk --replay-ops=junk.fsxops
using the following as junk.fsxops:
truncate 0x0 0 0x927c0
write 0x63fb8 0x53c8 0
copy_range 0xb704 0x19b9 0x24429 0x79380
write 0x2402b 0x144a2 0x90660 *
write 0x204d5 0x140a0 0x927c0 *
copy_range 0x1f72c 0x137d0 0x7a906 0x927c0 *
read 0x00000 0x20000 0x9157c
read 0x20000 0x20000 0x9157c
read 0x40000 0x20000 0x9157c
read 0x60000 0x20000 0x9157c
read 0x7e1a0 0xcfb9 0x9157c
on cifs with the default cache option.
It shows folio 0x24 misbehaving if the FMODE_READ check is commented out in
netfs_perform_write():
if (//(file->f_mode & FMODE_READ) ||
netfs_is_cache_enabled(ctx)) {
and no fscache. This was initially found with the generic/522 xfstest.
Fixes: 8f52de0077ba ("netfs: Reduce number of conditional branches in netfs_perform_write()")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/buffered_write.c | 47 ++++++++++++++++++++++++++----------
include/trace/events/netfs.h | 3 +++
2 files changed, 37 insertions(+), 13 deletions(-)
diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 7ac128d0b4e5..25571a570ac9 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -246,18 +246,38 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
/* See if we can write a whole folio in one go. */
if (!maybe_trouble && offset == 0 && part >= flen) {
copied = copy_folio_from_iter_atomic(folio, offset, part, iter);
- if (unlikely(copied == 0))
+ if (likely(copied == part)) {
+ if (finfo) {
+ trace = netfs_whole_folio_modify_filled;
+ goto folio_now_filled;
+ }
+ __netfs_set_group(folio, netfs_group);
+ folio_mark_uptodate(folio);
+ trace = netfs_whole_folio_modify;
+ goto copied;
+ }
+ if (copied == 0)
goto copy_failed;
- if (unlikely(copied < part)) {
+ if (!finfo || copied <= finfo->dirty_offset) {
maybe_trouble = true;
iov_iter_revert(iter, copied);
copied = 0;
folio_unlock(folio);
goto retry;
}
- __netfs_set_group(folio, netfs_group);
- folio_mark_uptodate(folio);
- trace = netfs_whole_folio_modify;
+
+ /* We overwrote some existing dirty data, so we have to
+ * accept the partial write.
+ */
+ finfo->dirty_len += finfo->dirty_offset;
+ if (finfo->dirty_len == flen) {
+ trace = netfs_whole_folio_modify_filled_efault;
+ goto folio_now_filled;
+ }
+ if (copied > finfo->dirty_len)
+ finfo->dirty_len = copied;
+ finfo->dirty_offset = 0;
+ trace = netfs_whole_folio_modify_efault;
goto copied;
}
@@ -327,16 +347,10 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
goto copy_failed;
finfo->dirty_len += copied;
if (finfo->dirty_offset == 0 && finfo->dirty_len == flen) {
- if (finfo->netfs_group)
- folio_change_private(folio, finfo->netfs_group);
- else
- folio_detach_private(folio);
- folio_mark_uptodate(folio);
- kfree(finfo);
trace = netfs_streaming_cont_filled_page;
- } else {
- trace = netfs_streaming_write_cont;
+ goto folio_now_filled;
}
+ trace = netfs_streaming_write_cont;
goto copied;
}
@@ -350,6 +364,13 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
goto out;
continue;
+ folio_now_filled:
+ if (finfo->netfs_group)
+ folio_change_private(folio, finfo->netfs_group);
+ else
+ folio_detach_private(folio);
+ folio_mark_uptodate(folio);
+ kfree(finfo);
copied:
trace_netfs_folio(folio, trace);
flush_dcache_folio(folio);
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 0b702f74aefe..aa9940ba307b 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -177,6 +177,9 @@
EM(netfs_folio_is_uptodate, "mod-uptodate") \
EM(netfs_just_prefetch, "mod-prefetch") \
EM(netfs_whole_folio_modify, "mod-whole-f") \
+ EM(netfs_whole_folio_modify_efault, "mod-whole-f!") \
+ EM(netfs_whole_folio_modify_filled, "mod-whole-f+") \
+ EM(netfs_whole_folio_modify_filled_efault, "mod-whole-f+!") \
EM(netfs_modify_and_clear, "mod-n-clear") \
EM(netfs_streaming_write, "mod-streamw") \
EM(netfs_streaming_write_cont, "mod-streamw+") \
Thread overview: 13+ messages (newest: 2026-04-27 15:47 UTC)
2026-04-27 15:29 [PATCH v4 00/22] netfs: Miscellaneous fixes David Howells
2026-04-27 15:29 ` [PATCH v4 01/22] netfs: Fix cancellation of a DIO and single read subrequests David Howells
2026-04-27 15:29 ` [PATCH v4 02/22] netfs: Fix missing barriers when accessing stream->subrequests locklessly David Howells
2026-04-27 15:29 ` [PATCH v4 03/22] netfs: Fix missing locking around retry adding new subreqs David Howells
2026-04-27 15:29 ` [PATCH v4 04/22] netfs: Fix netfs_read_to_pagecache() to pause on subreq failure David Howells
2026-04-27 15:29 ` [PATCH v4 05/22] netfs: Fix potential for tearing in ->remote_i_size and ->zero_point David Howells
2026-04-27 15:29 ` [PATCH v4 06/22] netfs: Fix zeropoint update where i_size > remote_i_size David Howells
2026-04-27 15:29 ` [PATCH v4 07/22] netfs: fix VM_BUG_ON_FOLIO() issue in netfs_write_begin() call David Howells
2026-04-27 15:29 ` [PATCH v4 08/22] netfs: fix error handling in netfs_extract_user_iter() David Howells
2026-04-27 15:29 ` [PATCH v4 09/22] netfs: Fix netfs_invalidate_folio() to clear dirty bit if all changes gone David Howells
2026-04-27 15:29 ` [PATCH v4 10/22] netfs: Defer the emission of trace_netfs_folio() David Howells
2026-04-27 15:29 ` [PATCH v4 11/22] netfs: Fix streaming write being overwritten David Howells