* [PATCH 01/13] netfs: Fix hang due to missing case in final DIO read result collection
2025-07-01 16:38 [PATCH 00/13] netfs, cifs: Fixes to retry-related code David Howells
@ 2025-07-01 16:38 ` David Howells
2025-07-01 16:38 ` [PATCH 02/13] netfs: Fix double put of request David Howells
` (13 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: David Howells @ 2025-07-01 16:38 UTC (permalink / raw)
To: Christian Brauner, Steve French
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
linux-nfs, ceph-devel, v9fs, linux-fsdevel, linux-kernel,
Paulo Alcantara
When doing a DIO read, if the subrequests we issue fail and cause the
request PAUSE flag to be set to put a pause on subrequest generation, we
may complete collection of the subrequests (possibly discarding them) prior
to the ALL_QUEUED flags being set.
In such a case, netfs_read_collection() doesn't see ALL_QUEUED being set
after netfs_collect_read_results() returns and will just return to the app
(the collector can be seen unpausing the generator in the trace log).
The subrequest generator can then set ALL_QUEUED and the app thread reaches
netfs_wait_for_request(). This causes netfs_collect_in_app() to be called
to see if we're done yet, but there's missing case here.
netfs_collect_in_app() will see that a thread is active and set inactive to
false, but won't see any subrequests in the read stream, and so won't set
need_collect to true. The function will then just return 0, indicating
that the caller should just sleep until further activity (which won't be
forthcoming) occurs.
Fix this by making netfs_collect_in_app() check to see if an active thread
is complete - i.e. that ALL_QUEUED is set and the subrequests list is empty
- and to skip the sleep return path. The collector will then be called
which will clear the request IN_PROGRESS flag, allowing the app to
progress.
Fixes: 2b1424cd131c ("netfs: Fix wait/wake to be consistent about the waitqueue used")
Reported-by: Steve French <sfrench@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara <pc@manguebit.org>
Tested-by: Steve French <sfrench@samba.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/misc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
index 43b67a28a8fa..0a54b1203486 100644
--- a/fs/netfs/misc.c
+++ b/fs/netfs/misc.c
@@ -381,7 +381,7 @@ void netfs_wait_for_in_progress_stream(struct netfs_io_request *rreq,
static int netfs_collect_in_app(struct netfs_io_request *rreq,
bool (*collector)(struct netfs_io_request *rreq))
{
- bool need_collect = false, inactive = true;
+ bool need_collect = false, inactive = true, done = true;
for (int i = 0; i < NR_IO_STREAMS; i++) {
struct netfs_io_subrequest *subreq;
@@ -400,9 +400,11 @@ static int netfs_collect_in_app(struct netfs_io_request *rreq,
need_collect = true;
break;
}
+ if (subreq || !test_bit(NETFS_RREQ_ALL_QUEUED, &rreq->flags))
+ done = false;
}
- if (!need_collect && !inactive)
+ if (!need_collect && !inactive && !done)
return 0; /* Sleep */
__set_current_state(TASK_RUNNING);
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 02/13] netfs: Fix double put of request
2025-07-01 16:38 [PATCH 00/13] netfs, cifs: Fixes to retry-related code David Howells
2025-07-01 16:38 ` [PATCH 01/13] netfs: Fix hang due to missing case in final DIO read result collection David Howells
@ 2025-07-01 16:38 ` David Howells
2025-07-01 16:38 ` [PATCH 03/13] netfs: Provide helpers to perform NETFS_RREQ_IN_PROGRESS flag wangling David Howells
` (12 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: David Howells @ 2025-07-01 16:38 UTC (permalink / raw)
To: Christian Brauner, Steve French
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
linux-nfs, ceph-devel, v9fs, linux-fsdevel, linux-kernel,
Paulo Alcantara
If a netfs request finishes during the pause loop, it will have the ref
that belongs to the IN_PROGRESS flag removed at that point - however, if it
then goes to the final wait loop, that will *also* put the ref because it
sees that the IN_PROGRESS flag is clear and incorrectly assumes that this
happened when it called the collector.
In fact, since IN_PROGRESS is clear, we shouldn't call the collector again
since it's done all the cleanup, such as calling ->ki_complete().
Fix this by making netfs_collect_in_app() just return, indicating that
we're done if IN_PROGRESS is removed.
Fixes: 2b1424cd131c ("netfs: Fix wait/wake to be consistent about the waitqueue used")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara <pc@manguebit.org>
Tested-by: Steve French <sfrench@samba.org>
cc: Steve French <sfrench@samba.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
---
fs/netfs/misc.c | 5 +++++
include/trace/events/netfs.h | 1 +
2 files changed, 6 insertions(+)
diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
index 0a54b1203486..8cf73b237269 100644
--- a/fs/netfs/misc.c
+++ b/fs/netfs/misc.c
@@ -383,6 +383,11 @@ static int netfs_collect_in_app(struct netfs_io_request *rreq,
{
bool need_collect = false, inactive = true, done = true;
+ if (!netfs_check_rreq_in_progress(rreq)) {
+ trace_netfs_rreq(rreq, netfs_rreq_trace_recollect);
+ return 1; /* Done */
+ }
+
for (int i = 0; i < NR_IO_STREAMS; i++) {
struct netfs_io_subrequest *subreq;
struct netfs_io_stream *stream = &rreq->io_streams[i];
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 333d2e38dd2c..ba35dc66e986 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -56,6 +56,7 @@
EM(netfs_rreq_trace_dirty, "DIRTY ") \
EM(netfs_rreq_trace_done, "DONE ") \
EM(netfs_rreq_trace_free, "FREE ") \
+ EM(netfs_rreq_trace_recollect, "RECLLCT") \
EM(netfs_rreq_trace_redirty, "REDIRTY") \
EM(netfs_rreq_trace_resubmit, "RESUBMT") \
EM(netfs_rreq_trace_set_abandon, "S-ABNDN") \
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 03/13] netfs: Provide helpers to perform NETFS_RREQ_IN_PROGRESS flag wangling
2025-07-01 16:38 [PATCH 00/13] netfs, cifs: Fixes to retry-related code David Howells
2025-07-01 16:38 ` [PATCH 01/13] netfs: Fix hang due to missing case in final DIO read result collection David Howells
2025-07-01 16:38 ` [PATCH 02/13] netfs: Fix double put of request David Howells
@ 2025-07-01 16:38 ` David Howells
2025-07-01 16:38 ` [PATCH 04/13] netfs: Fix looping in wait functions David Howells
` (11 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: David Howells @ 2025-07-01 16:38 UTC (permalink / raw)
To: Christian Brauner, Steve French
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
linux-nfs, ceph-devel, v9fs, linux-fsdevel, linux-kernel,
Paulo Alcantara
Provide helpers to clear and test the NETFS_RREQ_IN_PROGRESS and to insert
the appropriate barrierage.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara <pc@manguebit.org>
Tested-by: Steve French <sfrench@samba.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/internal.h | 18 ++++++++++++++++++
fs/netfs/misc.c | 10 +++++-----
fs/netfs/read_collect.c | 4 ++--
fs/netfs/write_collect.c | 4 ++--
4 files changed, 27 insertions(+), 9 deletions(-)
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index e2ee9183392b..d6656d2b54ab 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -274,6 +274,24 @@ static inline void netfs_wake_rreq_flag(struct netfs_io_request *rreq,
}
}
+/*
+ * Test the NETFS_RREQ_IN_PROGRESS flag, inserting an appropriate barrier.
+ */
+static inline bool netfs_check_rreq_in_progress(const struct netfs_io_request *rreq)
+{
+ /* Order read of flags before read of anything else, such as error. */
+ return test_bit_acquire(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
+}
+
+/*
+ * Test the NETFS_SREQ_IN_PROGRESS flag, inserting an appropriate barrier.
+ */
+static inline bool netfs_check_subreq_in_progress(const struct netfs_io_subrequest *subreq)
+{
+ /* Order read of flags before read of anything else, such as error. */
+ return test_bit_acquire(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
+}
+
/*
* fscache-cache.c
*/
diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
index 8cf73b237269..7f31c3cbfe01 100644
--- a/fs/netfs/misc.c
+++ b/fs/netfs/misc.c
@@ -356,14 +356,14 @@ void netfs_wait_for_in_progress_stream(struct netfs_io_request *rreq,
DEFINE_WAIT(myself);
list_for_each_entry(subreq, &stream->subrequests, rreq_link) {
- if (!test_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags))
+ if (!netfs_check_subreq_in_progress(subreq))
continue;
trace_netfs_rreq(rreq, netfs_rreq_trace_wait_queue);
for (;;) {
prepare_to_wait(&rreq->waitq, &myself, TASK_UNINTERRUPTIBLE);
- if (!test_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags))
+ if (!netfs_check_subreq_in_progress(subreq))
break;
trace_netfs_sreq(subreq, netfs_sreq_trace_wait_for);
@@ -400,7 +400,7 @@ static int netfs_collect_in_app(struct netfs_io_request *rreq,
struct netfs_io_subrequest,
rreq_link);
if (subreq &&
- (!test_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags) ||
+ (!netfs_check_subreq_in_progress(subreq) ||
test_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags))) {
need_collect = true;
break;
@@ -451,7 +451,7 @@ static ssize_t netfs_wait_for_request(struct netfs_io_request *rreq,
}
}
- if (!test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags))
+ if (!netfs_check_rreq_in_progress(rreq))
break;
schedule();
@@ -518,7 +518,7 @@ static void netfs_wait_for_pause(struct netfs_io_request *rreq,
}
}
- if (!test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags) ||
+ if (!netfs_check_rreq_in_progress(rreq) ||
!test_bit(NETFS_RREQ_PAUSE, &rreq->flags))
break;
diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index 96ee18af28ef..cceed9d629c6 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -218,7 +218,7 @@ static void netfs_collect_read_results(struct netfs_io_request *rreq)
stream->collected_to = front->start;
}
- if (test_bit(NETFS_SREQ_IN_PROGRESS, &front->flags))
+ if (netfs_check_subreq_in_progress(front))
notes |= HIT_PENDING;
smp_rmb(); /* Read counters after IN_PROGRESS flag. */
transferred = READ_ONCE(front->transferred);
@@ -445,7 +445,7 @@ void netfs_read_collection_worker(struct work_struct *work)
struct netfs_io_request *rreq = container_of(work, struct netfs_io_request, work);
netfs_see_request(rreq, netfs_rreq_trace_see_work);
- if (test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags)) {
+ if (netfs_check_rreq_in_progress(rreq)) {
if (netfs_read_collection(rreq))
/* Drop the ref from the IN_PROGRESS flag. */
netfs_put_request(rreq, netfs_rreq_trace_put_work_ip);
diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c
index e2b102ffb768..2ac85a819b71 100644
--- a/fs/netfs/write_collect.c
+++ b/fs/netfs/write_collect.c
@@ -240,7 +240,7 @@ static void netfs_collect_write_results(struct netfs_io_request *wreq)
}
/* Stall if the front is still undergoing I/O. */
- if (test_bit(NETFS_SREQ_IN_PROGRESS, &front->flags)) {
+ if (netfs_check_subreq_in_progress(front)) {
notes |= HIT_PENDING;
break;
}
@@ -434,7 +434,7 @@ void netfs_write_collection_worker(struct work_struct *work)
struct netfs_io_request *rreq = container_of(work, struct netfs_io_request, work);
netfs_see_request(rreq, netfs_rreq_trace_see_work);
- if (test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags)) {
+ if (netfs_check_rreq_in_progress(rreq)) {
if (netfs_write_collection(rreq))
/* Drop the ref from the IN_PROGRESS flag. */
netfs_put_request(rreq, netfs_rreq_trace_put_work_ip);
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 04/13] netfs: Fix looping in wait functions
2025-07-01 16:38 [PATCH 00/13] netfs, cifs: Fixes to retry-related code David Howells
` (2 preceding siblings ...)
2025-07-01 16:38 ` [PATCH 03/13] netfs: Provide helpers to perform NETFS_RREQ_IN_PROGRESS flag wangling David Howells
@ 2025-07-01 16:38 ` David Howells
2025-07-01 16:38 ` [PATCH 05/13] netfs: Fix ref leak on inserted extra subreq in write retry David Howells
` (10 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: David Howells @ 2025-07-01 16:38 UTC (permalink / raw)
To: Christian Brauner, Steve French
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
linux-nfs, ceph-devel, v9fs, linux-fsdevel, linux-kernel,
Paulo Alcantara
netfs_wait_for_request() and netfs_wait_for_pause() can loop forever if
netfs_collect_in_app() returns 2, indicating that it wants to repeat
because the ALL_QUEUED flag isn't yet set and there are no subreqs left
that haven't been collected.
The problem is that, unless collection is offloaded (OFFLOAD_COLLECTION),
we have to return to the application thread to continue and eventually set
ALL_QUEUED after pausing to deal with a retry - but we never get there.
Fix this by inserting checks for the IN_PROGRESS and PAUSE flags as
appropriate before cycling round - and add cond_resched() for good measure.
Fixes: 2b1424cd131c ("netfs: Fix wait/wake to be consistent about the waitqueue used")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara <pc@manguebit.org>
Tested-by: Steve French <sfrench@samba.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/misc.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
index 7f31c3cbfe01..127a269938bb 100644
--- a/fs/netfs/misc.c
+++ b/fs/netfs/misc.c
@@ -430,8 +430,8 @@ static int netfs_collect_in_app(struct netfs_io_request *rreq,
/*
* Wait for a request to complete, successfully or otherwise.
*/
-static ssize_t netfs_wait_for_request(struct netfs_io_request *rreq,
- bool (*collector)(struct netfs_io_request *rreq))
+static ssize_t netfs_wait_for_in_progress(struct netfs_io_request *rreq,
+ bool (*collector)(struct netfs_io_request *rreq))
{
DEFINE_WAIT(myself);
ssize_t ret;
@@ -447,6 +447,9 @@ static ssize_t netfs_wait_for_request(struct netfs_io_request *rreq,
case 1:
goto all_collected;
case 2:
+ if (!netfs_check_rreq_in_progress(rreq))
+ break;
+ cond_resched();
continue;
}
}
@@ -485,12 +488,12 @@ static ssize_t netfs_wait_for_request(struct netfs_io_request *rreq,
ssize_t netfs_wait_for_read(struct netfs_io_request *rreq)
{
- return netfs_wait_for_request(rreq, netfs_read_collection);
+ return netfs_wait_for_in_progress(rreq, netfs_read_collection);
}
ssize_t netfs_wait_for_write(struct netfs_io_request *rreq)
{
- return netfs_wait_for_request(rreq, netfs_write_collection);
+ return netfs_wait_for_in_progress(rreq, netfs_write_collection);
}
/*
@@ -514,6 +517,10 @@ static void netfs_wait_for_pause(struct netfs_io_request *rreq,
case 1:
goto all_collected;
case 2:
+ if (!netfs_check_rreq_in_progress(rreq) ||
+ !test_bit(NETFS_RREQ_PAUSE, &rreq->flags))
+ break;
+ cond_resched();
continue;
}
}
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 05/13] netfs: Fix ref leak on inserted extra subreq in write retry
2025-07-01 16:38 [PATCH 00/13] netfs, cifs: Fixes to retry-related code David Howells
` (3 preceding siblings ...)
2025-07-01 16:38 ` [PATCH 04/13] netfs: Fix looping in wait functions David Howells
@ 2025-07-01 16:38 ` David Howells
2025-07-01 16:38 ` [PATCH 06/13] smb: client: set missing retry flag in smb2_writev_callback() David Howells
` (9 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: David Howells @ 2025-07-01 16:38 UTC (permalink / raw)
To: Christian Brauner, Steve French
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
linux-nfs, ceph-devel, v9fs, linux-fsdevel, linux-kernel,
Paulo Alcantara
The write-retry algorithm will insert extra subrequests into the list if it
can't get sufficient capacity to split the range that needs to be retried
into the sequence of subrequests it currently has (for instance, if the
cifs credit pool has fewer credits available than it did when the range was
originally divided).
However, the allocator furnishes each new subreq with 2 refs and then
another is added for resubmission, causing one to be leaked.
Fix this by replacing the ref-getting line with a neutral trace line.
Fixes: 288ace2f57c9 ("netfs: New writeback implementation")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara <pc@manguebit.org>
Tested-by: Steve French <sfrench@samba.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/write_retry.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/netfs/write_retry.c b/fs/netfs/write_retry.c
index 9d1d8a8bab72..7158657061e9 100644
--- a/fs/netfs/write_retry.c
+++ b/fs/netfs/write_retry.c
@@ -153,7 +153,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
trace_netfs_sreq_ref(wreq->debug_id, subreq->debug_index,
refcount_read(&subreq->ref),
netfs_sreq_trace_new);
- netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
+ trace_netfs_sreq(subreq, netfs_sreq_trace_split);
list_add(&subreq->rreq_link, &to->rreq_link);
to = list_next_entry(to, rreq_link);
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 06/13] smb: client: set missing retry flag in smb2_writev_callback()
2025-07-01 16:38 [PATCH 00/13] netfs, cifs: Fixes to retry-related code David Howells
` (4 preceding siblings ...)
2025-07-01 16:38 ` [PATCH 05/13] netfs: Fix ref leak on inserted extra subreq in write retry David Howells
@ 2025-07-01 16:38 ` David Howells
2025-07-01 16:38 ` [PATCH 07/13] smb: client: set missing retry flag in cifs_readv_callback() David Howells
` (8 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: David Howells @ 2025-07-01 16:38 UTC (permalink / raw)
To: Christian Brauner, Steve French
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
linux-nfs, ceph-devel, v9fs, linux-fsdevel, linux-kernel,
Paulo Alcantara
From: Paulo Alcantara <pc@manguebit.org>
Set NETFS_SREQ_NEED_RETRY flag to tell netfslib that the subreq needs
to be retried.
Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Steve French <sfrench@samba.org>
Cc: linux-cifs@vger.kernel.org
Cc: netfs@lists.linux.dev
---
fs/smb/client/smb2pdu.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c
index a717be1626a3..084ee66e73fd 100644
--- a/fs/smb/client/smb2pdu.c
+++ b/fs/smb/client/smb2pdu.c
@@ -4862,6 +4862,7 @@ smb2_writev_callback(struct mid_q_entry *mid)
break;
case MID_REQUEST_SUBMITTED:
case MID_RETRY_NEEDED:
+ __set_bit(NETFS_SREQ_NEED_RETRY, &wdata->subreq.flags);
result = -EAGAIN;
break;
case MID_RESPONSE_MALFORMED:
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 07/13] smb: client: set missing retry flag in cifs_readv_callback()
2025-07-01 16:38 [PATCH 00/13] netfs, cifs: Fixes to retry-related code David Howells
` (5 preceding siblings ...)
2025-07-01 16:38 ` [PATCH 06/13] smb: client: set missing retry flag in smb2_writev_callback() David Howells
@ 2025-07-01 16:38 ` David Howells
2025-07-01 16:38 ` [PATCH 08/13] smb: client: set missing retry flag in cifs_writev_callback() David Howells
` (7 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: David Howells @ 2025-07-01 16:38 UTC (permalink / raw)
To: Christian Brauner, Steve French
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
linux-nfs, ceph-devel, v9fs, linux-fsdevel, linux-kernel,
Paulo Alcantara
From: Paulo Alcantara <pc@manguebit.org>
Set NETFS_SREQ_NEED_RETRY flag to tell netfslib that the subreq needs
to be retried.
Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Steve French <sfrench@samba.org>
Cc: linux-cifs@vger.kernel.org
Cc: netfs@lists.linux.dev
---
fs/smb/client/cifssmb.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/smb/client/cifssmb.c b/fs/smb/client/cifssmb.c
index 7216fcec79e8..f9ccae5de5b8 100644
--- a/fs/smb/client/cifssmb.c
+++ b/fs/smb/client/cifssmb.c
@@ -1335,6 +1335,7 @@ cifs_readv_callback(struct mid_q_entry *mid)
break;
case MID_REQUEST_SUBMITTED:
case MID_RETRY_NEEDED:
+ __set_bit(NETFS_SREQ_NEED_RETRY, &rdata->subreq.flags);
rdata->result = -EAGAIN;
if (server->sign && rdata->got_bytes)
/* reset bytes number since we can not check a sign */
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 08/13] smb: client: set missing retry flag in cifs_writev_callback()
2025-07-01 16:38 [PATCH 00/13] netfs, cifs: Fixes to retry-related code David Howells
` (6 preceding siblings ...)
2025-07-01 16:38 ` [PATCH 07/13] smb: client: set missing retry flag in cifs_readv_callback() David Howells
@ 2025-07-01 16:38 ` David Howells
2025-07-01 16:38 ` [PATCH 09/13] smb: client: fix warning when reconnecting channel David Howells
` (6 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: David Howells @ 2025-07-01 16:38 UTC (permalink / raw)
To: Christian Brauner, Steve French
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
linux-nfs, ceph-devel, v9fs, linux-fsdevel, linux-kernel,
Paulo Alcantara
From: Paulo Alcantara <pc@manguebit.org>
Set NETFS_SREQ_NEED_RETRY flag to tell netfslib that the subreq needs
to be retried.
Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Steve French <sfrench@samba.org>
Cc: linux-cifs@vger.kernel.org
Cc: netfs@lists.linux.dev
---
fs/smb/client/cifssmb.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/smb/client/cifssmb.c b/fs/smb/client/cifssmb.c
index f9ccae5de5b8..0e509a0433fb 100644
--- a/fs/smb/client/cifssmb.c
+++ b/fs/smb/client/cifssmb.c
@@ -1715,6 +1715,7 @@ cifs_writev_callback(struct mid_q_entry *mid)
break;
case MID_REQUEST_SUBMITTED:
case MID_RETRY_NEEDED:
+ __set_bit(NETFS_SREQ_NEED_RETRY, &wdata->subreq.flags);
result = -EAGAIN;
break;
default:
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 09/13] smb: client: fix warning when reconnecting channel
2025-07-01 16:38 [PATCH 00/13] netfs, cifs: Fixes to retry-related code David Howells
` (7 preceding siblings ...)
2025-07-01 16:38 ` [PATCH 08/13] smb: client: set missing retry flag in cifs_writev_callback() David Howells
@ 2025-07-01 16:38 ` David Howells
2025-07-01 17:07 ` Steve French
2025-07-01 16:38 ` [PATCH 10/13] netfs: Fix i_size updating David Howells
` (5 subsequent siblings)
14 siblings, 1 reply; 25+ messages in thread
From: David Howells @ 2025-07-01 16:38 UTC (permalink / raw)
To: Christian Brauner, Steve French
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
linux-nfs, ceph-devel, v9fs, linux-fsdevel, linux-kernel,
Paulo Alcantara
From: Paulo Alcantara <pc@manguebit.org>
When reconnecting a channel in smb2_reconnect_server(), a dummy tcon
is passed down to smb2_reconnect() with ->query_interface
uninitialized, so we can't call queue_delayed_work() on it.
Fix the following warning by ensuring that we're queueing the delayed
worker from correct tcon.
WARNING: CPU: 4 PID: 1126 at kernel/workqueue.c:2498 __queue_delayed_work+0x1d2/0x200
Modules linked in: cifs cifs_arc4 nls_ucs2_utils cifs_md4 [last unloaded: cifs]
CPU: 4 UID: 0 PID: 1126 Comm: kworker/4:0 Not tainted 6.16.0-rc3 #5 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-4.fc42 04/01/2014
Workqueue: cifsiod smb2_reconnect_server [cifs]
RIP: 0010:__queue_delayed_work+0x1d2/0x200
Code: 41 5e 41 5f e9 7f ee ff ff 90 0f 0b 90 e9 5d ff ff ff bf 02 00
00 00 e8 6c f3 07 00 89 c3 eb bd 90 0f 0b 90 e9 57 f> 0b 90 e9 65 fe
ff ff 90 0f 0b 90 e9 72 fe ff ff 90 0f 0b 90 e9
RSP: 0018:ffffc900014afad8 EFLAGS: 00010003
RAX: 0000000000000000 RBX: ffff888124d99988 RCX: ffffffff81399cc1
RDX: dffffc0000000000 RSI: ffff888114326e00 RDI: ffff888124d999f0
RBP: 000000000000ea60 R08: 0000000000000001 R09: ffffed10249b3331
R10: ffff888124d9998f R11: 0000000000000004 R12: 0000000000000040
R13: ffff888114326e00 R14: ffff888124d999d8 R15: ffff888114939020
FS: 0000000000000000(0000) GS:ffff88829f7fe000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffe7a2b4038 CR3: 0000000120a6f000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
queue_delayed_work_on+0xb4/0xc0
smb2_reconnect+0xb22/0xf50 [cifs]
smb2_reconnect_server+0x413/0xd40 [cifs]
? __pfx_smb2_reconnect_server+0x10/0x10 [cifs]
? local_clock_noinstr+0xd/0xd0
? local_clock+0x15/0x30
? lock_release+0x29b/0x390
process_one_work+0x4c5/0xa10
? __pfx_process_one_work+0x10/0x10
? __list_add_valid_or_report+0x37/0x120
worker_thread+0x2f1/0x5a0
? __kthread_parkme+0xde/0x100
? __pfx_worker_thread+0x10/0x10
kthread+0x1fe/0x380
? kthread+0x10f/0x380
? __pfx_kthread+0x10/0x10
? local_clock_noinstr+0xd/0xd0
? ret_from_fork+0x1b/0x1f0
? local_clock+0x15/0x30
? lock_release+0x29b/0x390
? rcu_is_watching+0x20/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork+0x15b/0x1f0
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
irq event stamp: 1116206
hardirqs last enabled at (1116205): [<ffffffff8143af42>] __up_console_sem+0x52/0x60
hardirqs last disabled at (1116206): [<ffffffff81399f0e>] queue_delayed_work_on+0x6e/0xc0
softirqs last enabled at (1116138): [<ffffffffc04562fd>] __smb_send_rqst+0x42d/0x950 [cifs]
softirqs last disabled at (1116136): [<ffffffff823d35e1>] release_sock+0x21/0xf0
Cc: linux-cifs@vger.kernel.org
Reported-by: David Howells <dhowells@redhat.com>
Fixes: 42ca547b13a2 ("cifs: do not disable interface polling on failure")
Reviewed-by: David Howells <dhowells@redhat.com>
Tested-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Steve French <sfrench@samba.org>
---
fs/smb/client/cifsglob.h | 1 +
fs/smb/client/smb2pdu.c | 10 ++++------
2 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h
index 318a8405d475..fdd95e5100cd 100644
--- a/fs/smb/client/cifsglob.h
+++ b/fs/smb/client/cifsglob.h
@@ -1303,6 +1303,7 @@ struct cifs_tcon {
bool use_persistent:1; /* use persistent instead of durable handles */
bool no_lease:1; /* Do not request leases on files or directories */
bool use_witness:1; /* use witness protocol */
+ bool dummy:1; /* dummy tcon used for reconnecting channels */
__le32 capabilities;
__u32 share_flags;
__u32 maximal_access;
diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c
index 084ee66e73fd..572cfc42dda8 100644
--- a/fs/smb/client/smb2pdu.c
+++ b/fs/smb/client/smb2pdu.c
@@ -424,9 +424,9 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon,
free_xid(xid);
ses->flags &= ~CIFS_SES_FLAGS_PENDING_QUERY_INTERFACES;
- /* regardless of rc value, setup polling */
- queue_delayed_work(cifsiod_wq, &tcon->query_interfaces,
- (SMB_INTERFACE_POLL_INTERVAL * HZ));
+ if (!tcon->ipc && !tcon->dummy)
+ queue_delayed_work(cifsiod_wq, &tcon->query_interfaces,
+ (SMB_INTERFACE_POLL_INTERVAL * HZ));
mutex_unlock(&ses->session_mutex);
@@ -4229,10 +4229,8 @@ void smb2_reconnect_server(struct work_struct *work)
}
goto done;
}
-
tcon->status = TID_GOOD;
- tcon->retry = false;
- tcon->need_reconnect = false;
+ tcon->dummy = true;
/* now reconnect sessions for necessary channels */
list_for_each_entry_safe(ses, ses2, &tmp_ses_list, rlist) {
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 09/13] smb: client: fix warning when reconnecting channel
2025-07-01 16:38 ` [PATCH 09/13] smb: client: fix warning when reconnecting channel David Howells
@ 2025-07-01 17:07 ` Steve French
0 siblings, 0 replies; 25+ messages in thread
From: Steve French @ 2025-07-01 17:07 UTC (permalink / raw)
To: David Howells
Cc: Christian Brauner, Paulo Alcantara, netfs, linux-afs, linux-cifs,
linux-nfs, ceph-devel, v9fs, linux-fsdevel, linux-kernel,
Paulo Alcantara, samba-technical, Shyam Prasad
I already have this patch in my for-next branch (which also includes
the Reviewed-by from Shyam)
On Tue, Jul 1, 2025 at 11:44 AM David Howells <dhowells@redhat.com> wrote:
>
> From: Paulo Alcantara <pc@manguebit.org>
>
> When reconnecting a channel in smb2_reconnect_server(), a dummy tcon
> is passed down to smb2_reconnect() with ->query_interface
> uninitialized, so we can't call queue_delayed_work() on it.
>
> Fix the following warning by ensuring that we're queueing the delayed
> worker from correct tcon.
>
> WARNING: CPU: 4 PID: 1126 at kernel/workqueue.c:2498 __queue_delayed_work+0x1d2/0x200
> Modules linked in: cifs cifs_arc4 nls_ucs2_utils cifs_md4 [last unloaded: cifs]
> CPU: 4 UID: 0 PID: 1126 Comm: kworker/4:0 Not tainted 6.16.0-rc3 #5 PREEMPT(voluntary)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-4.fc42 04/01/2014
> Workqueue: cifsiod smb2_reconnect_server [cifs]
> RIP: 0010:__queue_delayed_work+0x1d2/0x200
> Code: 41 5e 41 5f e9 7f ee ff ff 90 0f 0b 90 e9 5d ff ff ff bf 02 00
> 00 00 e8 6c f3 07 00 89 c3 eb bd 90 0f 0b 90 e9 57 f> 0b 90 e9 65 fe
> ff ff 90 0f 0b 90 e9 72 fe ff ff 90 0f 0b 90 e9
> RSP: 0018:ffffc900014afad8 EFLAGS: 00010003
> RAX: 0000000000000000 RBX: ffff888124d99988 RCX: ffffffff81399cc1
> RDX: dffffc0000000000 RSI: ffff888114326e00 RDI: ffff888124d999f0
> RBP: 000000000000ea60 R08: 0000000000000001 R09: ffffed10249b3331
> R10: ffff888124d9998f R11: 0000000000000004 R12: 0000000000000040
> R13: ffff888114326e00 R14: ffff888124d999d8 R15: ffff888114939020
> FS: 0000000000000000(0000) GS:ffff88829f7fe000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ffe7a2b4038 CR3: 0000000120a6f000 CR4: 0000000000750ef0
> PKRU: 55555554
> Call Trace:
> <TASK>
> queue_delayed_work_on+0xb4/0xc0
> smb2_reconnect+0xb22/0xf50 [cifs]
> smb2_reconnect_server+0x413/0xd40 [cifs]
> ? __pfx_smb2_reconnect_server+0x10/0x10 [cifs]
> ? local_clock_noinstr+0xd/0xd0
> ? local_clock+0x15/0x30
> ? lock_release+0x29b/0x390
> process_one_work+0x4c5/0xa10
> ? __pfx_process_one_work+0x10/0x10
> ? __list_add_valid_or_report+0x37/0x120
> worker_thread+0x2f1/0x5a0
> ? __kthread_parkme+0xde/0x100
> ? __pfx_worker_thread+0x10/0x10
> kthread+0x1fe/0x380
> ? kthread+0x10f/0x380
> ? __pfx_kthread+0x10/0x10
> ? local_clock_noinstr+0xd/0xd0
> ? ret_from_fork+0x1b/0x1f0
> ? local_clock+0x15/0x30
> ? lock_release+0x29b/0x390
> ? rcu_is_watching+0x20/0x50
> ? __pfx_kthread+0x10/0x10
> ret_from_fork+0x15b/0x1f0
> ? __pfx_kthread+0x10/0x10
> ret_from_fork_asm+0x1a/0x30
> </TASK>
> irq event stamp: 1116206
> hardirqs last enabled at (1116205): [<ffffffff8143af42>] __up_console_sem+0x52/0x60
> hardirqs last disabled at (1116206): [<ffffffff81399f0e>] queue_delayed_work_on+0x6e/0xc0
> softirqs last enabled at (1116138): [<ffffffffc04562fd>] __smb_send_rqst+0x42d/0x950 [cifs]
> softirqs last disabled at (1116136): [<ffffffff823d35e1>] release_sock+0x21/0xf0
>
> Cc: linux-cifs@vger.kernel.org
> Reported-by: David Howells <dhowells@redhat.com>
> Fixes: 42ca547b13a2 ("cifs: do not disable interface polling on failure")
> Reviewed-by: David Howells <dhowells@redhat.com>
> Tested-by: David Howells <dhowells@redhat.com>
> Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
> Signed-off-by: David Howells <dhowells@redhat.com>
> Tested-by: Steve French <sfrench@samba.org>
> ---
> fs/smb/client/cifsglob.h | 1 +
> fs/smb/client/smb2pdu.c | 10 ++++------
> 2 files changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h
> index 318a8405d475..fdd95e5100cd 100644
> --- a/fs/smb/client/cifsglob.h
> +++ b/fs/smb/client/cifsglob.h
> @@ -1303,6 +1303,7 @@ struct cifs_tcon {
> bool use_persistent:1; /* use persistent instead of durable handles */
> bool no_lease:1; /* Do not request leases on files or directories */
> bool use_witness:1; /* use witness protocol */
> + bool dummy:1; /* dummy tcon used for reconnecting channels */
> __le32 capabilities;
> __u32 share_flags;
> __u32 maximal_access;
> diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c
> index 084ee66e73fd..572cfc42dda8 100644
> --- a/fs/smb/client/smb2pdu.c
> +++ b/fs/smb/client/smb2pdu.c
> @@ -424,9 +424,9 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon,
> free_xid(xid);
> ses->flags &= ~CIFS_SES_FLAGS_PENDING_QUERY_INTERFACES;
>
> - /* regardless of rc value, setup polling */
> - queue_delayed_work(cifsiod_wq, &tcon->query_interfaces,
> - (SMB_INTERFACE_POLL_INTERVAL * HZ));
> + if (!tcon->ipc && !tcon->dummy)
> + queue_delayed_work(cifsiod_wq, &tcon->query_interfaces,
> + (SMB_INTERFACE_POLL_INTERVAL * HZ));
>
> mutex_unlock(&ses->session_mutex);
>
> @@ -4229,10 +4229,8 @@ void smb2_reconnect_server(struct work_struct *work)
> }
> goto done;
> }
> -
> tcon->status = TID_GOOD;
> - tcon->retry = false;
> - tcon->need_reconnect = false;
> + tcon->dummy = true;
>
> /* now reconnect sessions for necessary channels */
> list_for_each_entry_safe(ses, ses2, &tmp_ses_list, rlist) {
>
>
--
Thanks,
Steve
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 10/13] netfs: Fix i_size updating
2025-07-01 16:38 [PATCH 00/13] netfs, cifs: Fixes to retry-related code David Howells
` (8 preceding siblings ...)
2025-07-01 16:38 ` [PATCH 09/13] smb: client: fix warning when reconnecting channel David Howells
@ 2025-07-01 16:38 ` David Howells
2025-07-01 16:38 ` [PATCH 11/13] netfs: Merge i_size update functions David Howells
` (4 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: David Howells @ 2025-07-01 16:38 UTC (permalink / raw)
To: Christian Brauner, Steve French
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
linux-nfs, ceph-devel, v9fs, linux-fsdevel, linux-kernel,
Paulo Alcantara
Fix the updating of i_size, particularly in regard to the completion of DIO
writes and especially async DIO writes by using a lock.
The bug is triggered occasionally by the generic/207 xfstest as it chucks a
bunch of AIO DIO writes at the filesystem and then checks that fstat()
returns a reasonable st_size as each completes.
The problem is that netfs is trying to do "if new_size > inode->i_size,
update inode->i_size" sort of thing but without a lock around it.
This can be seen with cifs, but shouldn't be seen with kafs because kafs
serialises modification ops on the client whereas cifs sends the requests
to the server as they're generated and lets the server order them.
Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/buffered_write.c | 2 ++
fs/netfs/direct_write.c | 8 ++++++--
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 72a3e6db2524..b87ef3fe4ea4 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -64,6 +64,7 @@ static void netfs_update_i_size(struct netfs_inode *ctx, struct inode *inode,
return;
}
+ spin_lock(&inode->i_lock);
i_size_write(inode, pos);
#if IS_ENABLED(CONFIG_FSCACHE)
fscache_update_cookie(ctx->cache, NULL, &pos);
@@ -77,6 +78,7 @@ static void netfs_update_i_size(struct netfs_inode *ctx, struct inode *inode,
DIV_ROUND_UP(pos, SECTOR_SIZE),
inode->i_blocks + add);
}
+ spin_unlock(&inode->i_lock);
}
/**
diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index fa9a5bf3c6d5..3efa5894b2c0 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -14,13 +14,17 @@ static void netfs_cleanup_dio_write(struct netfs_io_request *wreq)
struct inode *inode = wreq->inode;
unsigned long long end = wreq->start + wreq->transferred;
- if (!wreq->error &&
- i_size_read(inode) < end) {
+ if (wreq->error || end <= i_size_read(inode))
+ return;
+
+ spin_lock(&inode->i_lock);
+ if (end > i_size_read(inode)) {
if (wreq->netfs_ops->update_i_size)
wreq->netfs_ops->update_i_size(inode, end);
else
i_size_write(inode, end);
}
+ spin_unlock(&inode->i_lock);
}
/*
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 11/13] netfs: Merge i_size update functions
2025-07-01 16:38 [PATCH 00/13] netfs, cifs: Fixes to retry-related code David Howells
` (9 preceding siblings ...)
2025-07-01 16:38 ` [PATCH 10/13] netfs: Fix i_size updating David Howells
@ 2025-07-01 16:38 ` David Howells
2025-07-01 16:38 ` [PATCH 12/13] netfs: Renumber the NETFS_RREQ_* flags to make traces easier to read David Howells
` (3 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: David Howells @ 2025-07-01 16:38 UTC (permalink / raw)
To: Christian Brauner, Steve French
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
linux-nfs, ceph-devel, v9fs, linux-fsdevel, linux-kernel,
Paulo Alcantara
Netfslib has two functions for updating the i_size after a write: one for
buffered writes into the pagecache and one for direct/unbuffered writes.
However, what needs to be done is much the same in both cases, so merge
them together.
This does raise one question, though: should updating the i_size after a
direct write do the same estimated update of i_blocks as is done for
buffered writes.
Also get rid of the cleanup function pointer from netfs_io_request as it's
only used for direct write to update i_size; instead do the i_size setting
directly from write collection.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/buffered_write.c | 36 +++++++++++++++++++++---------------
fs/netfs/direct_write.c | 19 -------------------
fs/netfs/internal.h | 6 ++++++
fs/netfs/write_collect.c | 6 ++++--
include/linux/netfs.h | 1 -
5 files changed, 31 insertions(+), 37 deletions(-)
diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index b87ef3fe4ea4..f27ea5099a68 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -53,30 +53,38 @@ static struct folio *netfs_grab_folio_for_write(struct address_space *mapping,
* data written into the pagecache until we can find out from the server what
* the values actually are.
*/
-static void netfs_update_i_size(struct netfs_inode *ctx, struct inode *inode,
- loff_t i_size, loff_t pos, size_t copied)
+void netfs_update_i_size(struct netfs_inode *ctx, struct inode *inode,
+ loff_t pos, size_t copied)
{
+ loff_t i_size, end = pos + copied;
blkcnt_t add;
size_t gap;
+ if (end <= i_size_read(inode))
+ return;
+
if (ctx->ops->update_i_size) {
- ctx->ops->update_i_size(inode, pos);
+ ctx->ops->update_i_size(inode, end);
return;
}
spin_lock(&inode->i_lock);
- i_size_write(inode, pos);
+
+ i_size = i_size_read(inode);
+ if (end > i_size) {
+ i_size_write(inode, end);
#if IS_ENABLED(CONFIG_FSCACHE)
- fscache_update_cookie(ctx->cache, NULL, &pos);
+ fscache_update_cookie(ctx->cache, NULL, &end);
#endif
- gap = SECTOR_SIZE - (i_size & (SECTOR_SIZE - 1));
- if (copied > gap) {
- add = DIV_ROUND_UP(copied - gap, SECTOR_SIZE);
+ gap = SECTOR_SIZE - (i_size & (SECTOR_SIZE - 1));
+ if (copied > gap) {
+ add = DIV_ROUND_UP(copied - gap, SECTOR_SIZE);
- inode->i_blocks = min_t(blkcnt_t,
- DIV_ROUND_UP(pos, SECTOR_SIZE),
- inode->i_blocks + add);
+ inode->i_blocks = min_t(blkcnt_t,
+ DIV_ROUND_UP(end, SECTOR_SIZE),
+ inode->i_blocks + add);
+ }
}
spin_unlock(&inode->i_lock);
}
@@ -113,7 +121,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
struct folio *folio = NULL, *writethrough = NULL;
unsigned int bdp_flags = (iocb->ki_flags & IOCB_NOWAIT) ? BDP_ASYNC : 0;
ssize_t written = 0, ret, ret2;
- loff_t i_size, pos = iocb->ki_pos;
+ loff_t pos = iocb->ki_pos;
size_t max_chunk = mapping_max_folio_size(mapping);
bool maybe_trouble = false;
@@ -346,10 +354,8 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
flush_dcache_folio(folio);
/* Update the inode size if we moved the EOF marker */
+ netfs_update_i_size(ctx, inode, pos, copied);
pos += copied;
- i_size = i_size_read(inode);
- if (pos > i_size)
- netfs_update_i_size(ctx, inode, i_size, pos, copied);
written += copied;
if (likely(!wreq)) {
diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index 3efa5894b2c0..dcf2b096cc4e 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -9,24 +9,6 @@
#include <linux/uio.h>
#include "internal.h"
-static void netfs_cleanup_dio_write(struct netfs_io_request *wreq)
-{
- struct inode *inode = wreq->inode;
- unsigned long long end = wreq->start + wreq->transferred;
-
- if (wreq->error || end <= i_size_read(inode))
- return;
-
- spin_lock(&inode->i_lock);
- if (end > i_size_read(inode)) {
- if (wreq->netfs_ops->update_i_size)
- wreq->netfs_ops->update_i_size(inode, end);
- else
- i_size_write(inode, end);
- }
- spin_unlock(&inode->i_lock);
-}
-
/*
* Perform an unbuffered write where we may have to do an RMW operation on an
* encrypted file. This can also be used for direct I/O writes.
@@ -102,7 +84,6 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
if (async)
wreq->iocb = iocb;
wreq->len = iov_iter_count(&wreq->buffer.iter);
- wreq->cleanup = netfs_cleanup_dio_write;
ret = netfs_unbuffered_write(wreq, is_sync_kiocb(iocb), wreq->len);
if (ret < 0) {
_debug("begin = %zd", ret);
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index d6656d2b54ab..f9bb9464a147 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -27,6 +27,12 @@ void netfs_cache_read_terminated(void *priv, ssize_t transferred_or_error);
int netfs_prefetch_for_write(struct file *file, struct folio *folio,
size_t offset, size_t len);
+/*
+ * buffered_write.c
+ */
+void netfs_update_i_size(struct netfs_inode *ctx, struct inode *inode,
+ loff_t pos, size_t copied);
+
/*
* main.c
*/
diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c
index 2ac85a819b71..33a93258f36e 100644
--- a/fs/netfs/write_collect.c
+++ b/fs/netfs/write_collect.c
@@ -393,8 +393,10 @@ bool netfs_write_collection(struct netfs_io_request *wreq)
ictx->ops->invalidate_cache(wreq);
}
- if (wreq->cleanup)
- wreq->cleanup(wreq);
+ if ((wreq->origin == NETFS_UNBUFFERED_WRITE ||
+ wreq->origin == NETFS_DIO_WRITE) &&
+ !wreq->error)
+ netfs_update_i_size(ictx, &ictx->inode, wreq->start, wreq->transferred);
if (wreq->origin == NETFS_DIO_WRITE &&
wreq->mapping->nrpages) {
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 065c17385e53..d8186b90fb38 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -279,7 +279,6 @@ struct netfs_io_request {
#define NETFS_RREQ_USE_PGPRIV2 31 /* [DEPRECATED] Use PG_private_2 to mark
* write to cache on read */
const struct netfs_request_ops *netfs_ops;
- void (*cleanup)(struct netfs_io_request *req);
};
/*
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 12/13] netfs: Renumber the NETFS_RREQ_* flags to make traces easier to read
2025-07-01 16:38 [PATCH 00/13] netfs, cifs: Fixes to retry-related code David Howells
` (10 preceding siblings ...)
2025-07-01 16:38 ` [PATCH 11/13] netfs: Merge i_size update functions David Howells
@ 2025-07-01 16:38 ` David Howells
2025-07-01 16:38 ` [PATCH 13/13] netfs: Update tracepoints in a number of ways David Howells
` (2 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: David Howells @ 2025-07-01 16:38 UTC (permalink / raw)
To: Christian Brauner, Steve French
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
linux-nfs, ceph-devel, v9fs, linux-fsdevel, linux-kernel,
Paulo Alcantara
Renumber the NETFS_RREQ_* flags to put the most useful status bits in the
bottom nibble - and therefore the last hex digit in the trace output -
making it easier to grasp the state at a glance.
In particular, put the IN_PROGRESS flag in bit 0 and ALL_QUEUED at bit 1.
Also make the flags field in /proc/fs/netfs/requests larger to accommodate
all the flags.
Also make the flags field in the netfs_sreq tracepoint larger to
accommodate all the NETFS_SREQ_* flags.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/main.c | 6 +++---
include/linux/netfs.h | 20 ++++++++++----------
include/trace/events/netfs.h | 2 +-
3 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/fs/netfs/main.c b/fs/netfs/main.c
index 3db401d269e7..73da6c9f5777 100644
--- a/fs/netfs/main.c
+++ b/fs/netfs/main.c
@@ -58,15 +58,15 @@ static int netfs_requests_seq_show(struct seq_file *m, void *v)
if (v == &netfs_io_requests) {
seq_puts(m,
- "REQUEST OR REF FL ERR OPS COVERAGE\n"
- "======== == === == ==== === =========\n"
+ "REQUEST OR REF FLAG ERR OPS COVERAGE\n"
+ "======== == === ==== ==== === =========\n"
);
return 0;
}
rreq = list_entry(v, struct netfs_io_request, proc_link);
seq_printf(m,
- "%08x %s %3d %2lx %4ld %3d @%04llx %llx/%llx",
+ "%08x %s %3d %4lx %4ld %3d @%04llx %llx/%llx",
rreq->debug_id,
netfs_origins[rreq->origin],
refcount_read(&rreq->ref),
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index d8186b90fb38..f43f075852c0 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -265,17 +265,17 @@ struct netfs_io_request {
bool direct_bv_unpin; /* T if direct_bv[] must be unpinned */
refcount_t ref;
unsigned long flags;
-#define NETFS_RREQ_OFFLOAD_COLLECTION 0 /* Offload collection to workqueue */
-#define NETFS_RREQ_NO_UNLOCK_FOLIO 2 /* Don't unlock no_unlock_folio on completion */
-#define NETFS_RREQ_FAILED 4 /* The request failed */
-#define NETFS_RREQ_IN_PROGRESS 5 /* Unlocked when the request completes (has ref) */
-#define NETFS_RREQ_FOLIO_COPY_TO_CACHE 6 /* Copy current folio to cache from read */
-#define NETFS_RREQ_UPLOAD_TO_SERVER 8 /* Need to write to the server */
-#define NETFS_RREQ_PAUSE 11 /* Pause subrequest generation */
+#define NETFS_RREQ_IN_PROGRESS 0 /* Unlocked when the request completes (has ref) */
+#define NETFS_RREQ_ALL_QUEUED 1 /* All subreqs are now queued */
+#define NETFS_RREQ_PAUSE 2 /* Pause subrequest generation */
+#define NETFS_RREQ_FAILED 3 /* The request failed */
+#define NETFS_RREQ_RETRYING 4 /* Set if we're in the retry path */
+#define NETFS_RREQ_SHORT_TRANSFER 5 /* Set if we have a short transfer */
+#define NETFS_RREQ_OFFLOAD_COLLECTION 8 /* Offload collection to workqueue */
+#define NETFS_RREQ_NO_UNLOCK_FOLIO 9 /* Don't unlock no_unlock_folio on completion */
+#define NETFS_RREQ_FOLIO_COPY_TO_CACHE 10 /* Copy current folio to cache from read */
+#define NETFS_RREQ_UPLOAD_TO_SERVER 11 /* Need to write to the server */
#define NETFS_RREQ_USE_IO_ITER 12 /* Use ->io_iter rather than ->i_pages */
-#define NETFS_RREQ_ALL_QUEUED 13 /* All subreqs are now queued */
-#define NETFS_RREQ_RETRYING 14 /* Set if we're in the retry path */
-#define NETFS_RREQ_SHORT_TRANSFER 15 /* Set if we have a short transfer */
#define NETFS_RREQ_USE_PGPRIV2 31 /* [DEPRECATED] Use PG_private_2 to mark
* write to cache on read */
const struct netfs_request_ops *netfs_ops;
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index ba35dc66e986..c2d581429a7b 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -367,7 +367,7 @@ TRACE_EVENT(netfs_sreq,
__entry->slot = sreq->io_iter.folioq_slot;
),
- TP_printk("R=%08x[%x] %s %s f=%02x s=%llx %zx/%zx s=%u e=%d",
+ TP_printk("R=%08x[%x] %s %s f=%03x s=%llx %zx/%zx s=%u e=%d",
__entry->rreq, __entry->index,
__print_symbolic(__entry->source, netfs_sreq_sources),
__print_symbolic(__entry->what, netfs_sreq_traces),
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 13/13] netfs: Update tracepoints in a number of ways
2025-07-01 16:38 [PATCH 00/13] netfs, cifs: Fixes to retry-related code David Howells
` (11 preceding siblings ...)
2025-07-01 16:38 ` [PATCH 12/13] netfs: Renumber the NETFS_RREQ_* flags to make traces easier to read David Howells
@ 2025-07-01 16:38 ` David Howells
2025-07-09 10:26 ` [PATCH 00/13] netfs, cifs: Fixes to retry-related code Max Kellermann
2025-07-09 13:01 ` David Howells
14 siblings, 0 replies; 25+ messages in thread
From: David Howells @ 2025-07-01 16:38 UTC (permalink / raw)
To: Christian Brauner, Steve French
Cc: David Howells, Paulo Alcantara, netfs, linux-afs, linux-cifs,
linux-nfs, ceph-devel, v9fs, linux-fsdevel, linux-kernel,
Paulo Alcantara
Make a number of updates to the netfs tracepoints:
(1) Remove a duplicate trace from netfs_unbuffered_write_iter_locked().
(2) Move the trace in netfs_wake_rreq_flag() to after the flag is cleared
so that the change appears in the trace.
(3) Differentiate the use of netfs_rreq_trace_wait/woke_queue symbols.
(4) Don't do so many trace emissions in the wait functions as some of them
are redundant.
(5) In netfs_collect_read_results(), differentiate a subreq that's being
abandoned vs one that has been consumed in a regular way.
(6) Add a tracepoint to indicate the call to ->ki_complete().
(7) Don't double-increment the subreq_counter when retrying a write.
(8) Move the netfs_sreq_trace_io_progress tracepoint within cifs code to
just MID_RESPONSE_RECEIVED and add different tracepoints for other MID
states and note check failure.
Signed-off-by: David Howells <dhowells@redhat.com>
Co-developed-by: Paulo Alcantara <pc@manguebit.org>
Signed-off-by: Paulo Alcantara <pc@manguebit.org>
cc: Steve French <sfrench@samba.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
---
fs/netfs/direct_write.c | 1 -
fs/netfs/internal.h | 2 +-
fs/netfs/misc.c | 14 ++++++--------
fs/netfs/read_collect.c | 12 +++++++++---
fs/netfs/write_collect.c | 4 +++-
fs/netfs/write_retry.c | 1 -
fs/smb/client/cifssmb.c | 20 ++++++++++++++++++++
fs/smb/client/smb2pdu.c | 26 ++++++++++++++++++++++----
include/trace/events/netfs.h | 26 ++++++++++++++++++--------
9 files changed, 79 insertions(+), 27 deletions(-)
diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index dcf2b096cc4e..a16660ab7f83 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -91,7 +91,6 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
}
if (!async) {
- trace_netfs_rreq(wreq, netfs_rreq_trace_wait_ip);
ret = netfs_wait_for_write(wreq);
if (ret > 0)
iocb->ki_pos += ret;
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index f9bb9464a147..d4f16fefd965 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -273,9 +273,9 @@ static inline void netfs_wake_rreq_flag(struct netfs_io_request *rreq,
enum netfs_rreq_trace trace)
{
if (test_bit(rreq_flag, &rreq->flags)) {
- trace_netfs_rreq(rreq, trace);
clear_bit_unlock(rreq_flag, &rreq->flags);
smp_mb__after_atomic(); /* Set flag before task state */
+ trace_netfs_rreq(rreq, trace);
wake_up(&rreq->waitq);
}
}
diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
index 127a269938bb..20748bcfbf59 100644
--- a/fs/netfs/misc.c
+++ b/fs/netfs/misc.c
@@ -359,7 +359,7 @@ void netfs_wait_for_in_progress_stream(struct netfs_io_request *rreq,
if (!netfs_check_subreq_in_progress(subreq))
continue;
- trace_netfs_rreq(rreq, netfs_rreq_trace_wait_queue);
+ trace_netfs_rreq(rreq, netfs_rreq_trace_wait_quiesce);
for (;;) {
prepare_to_wait(&rreq->waitq, &myself, TASK_UNINTERRUPTIBLE);
@@ -368,10 +368,10 @@ void netfs_wait_for_in_progress_stream(struct netfs_io_request *rreq,
trace_netfs_sreq(subreq, netfs_sreq_trace_wait_for);
schedule();
- trace_netfs_rreq(rreq, netfs_rreq_trace_woke_queue);
}
}
+ trace_netfs_rreq(rreq, netfs_rreq_trace_waited_quiesce);
finish_wait(&rreq->waitq, &myself);
}
@@ -437,7 +437,6 @@ static ssize_t netfs_wait_for_in_progress(struct netfs_io_request *rreq,
ssize_t ret;
for (;;) {
- trace_netfs_rreq(rreq, netfs_rreq_trace_wait_queue);
prepare_to_wait(&rreq->waitq, &myself, TASK_UNINTERRUPTIBLE);
if (!test_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &rreq->flags)) {
@@ -457,11 +456,12 @@ static ssize_t netfs_wait_for_in_progress(struct netfs_io_request *rreq,
if (!netfs_check_rreq_in_progress(rreq))
break;
+ trace_netfs_rreq(rreq, netfs_rreq_trace_wait_ip);
schedule();
- trace_netfs_rreq(rreq, netfs_rreq_trace_woke_queue);
}
all_collected:
+ trace_netfs_rreq(rreq, netfs_rreq_trace_waited_ip);
finish_wait(&rreq->waitq, &myself);
ret = rreq->error;
@@ -504,10 +504,8 @@ static void netfs_wait_for_pause(struct netfs_io_request *rreq,
{
DEFINE_WAIT(myself);
- trace_netfs_rreq(rreq, netfs_rreq_trace_wait_pause);
-
for (;;) {
- trace_netfs_rreq(rreq, netfs_rreq_trace_wait_queue);
+ trace_netfs_rreq(rreq, netfs_rreq_trace_wait_pause);
prepare_to_wait(&rreq->waitq, &myself, TASK_UNINTERRUPTIBLE);
if (!test_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &rreq->flags)) {
@@ -530,10 +528,10 @@ static void netfs_wait_for_pause(struct netfs_io_request *rreq,
break;
schedule();
- trace_netfs_rreq(rreq, netfs_rreq_trace_woke_queue);
}
all_collected:
+ trace_netfs_rreq(rreq, netfs_rreq_trace_waited_pause);
finish_wait(&rreq->waitq, &myself);
}
diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index cceed9d629c6..3e804da1e1eb 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -293,7 +293,9 @@ static void netfs_collect_read_results(struct netfs_io_request *rreq)
spin_lock(&rreq->lock);
remove = front;
- trace_netfs_sreq(front, netfs_sreq_trace_discard);
+ trace_netfs_sreq(front,
+ notes & ABANDON_SREQ ?
+ netfs_sreq_trace_abandoned : netfs_sreq_trace_consumed);
list_del_init(&front->rreq_link);
front = list_first_entry_or_null(&stream->subrequests,
struct netfs_io_subrequest, rreq_link);
@@ -353,9 +355,11 @@ static void netfs_rreq_assess_dio(struct netfs_io_request *rreq)
if (rreq->iocb) {
rreq->iocb->ki_pos += rreq->transferred;
- if (rreq->iocb->ki_complete)
+ if (rreq->iocb->ki_complete) {
+ trace_netfs_rreq(rreq, netfs_rreq_trace_ki_complete);
rreq->iocb->ki_complete(
rreq->iocb, rreq->error ? rreq->error : rreq->transferred);
+ }
}
if (rreq->netfs_ops->done)
rreq->netfs_ops->done(rreq);
@@ -379,9 +383,11 @@ static void netfs_rreq_assess_single(struct netfs_io_request *rreq)
if (rreq->iocb) {
rreq->iocb->ki_pos += rreq->transferred;
- if (rreq->iocb->ki_complete)
+ if (rreq->iocb->ki_complete) {
+ trace_netfs_rreq(rreq, netfs_rreq_trace_ki_complete);
rreq->iocb->ki_complete(
rreq->iocb, rreq->error ? rreq->error : rreq->transferred);
+ }
}
if (rreq->netfs_ops->done)
rreq->netfs_ops->done(rreq);
diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c
index 33a93258f36e..0f3a36852a4d 100644
--- a/fs/netfs/write_collect.c
+++ b/fs/netfs/write_collect.c
@@ -421,9 +421,11 @@ bool netfs_write_collection(struct netfs_io_request *wreq)
if (wreq->iocb) {
size_t written = min(wreq->transferred, wreq->len);
wreq->iocb->ki_pos += written;
- if (wreq->iocb->ki_complete)
+ if (wreq->iocb->ki_complete) {
+ trace_netfs_rreq(wreq, netfs_rreq_trace_ki_complete);
wreq->iocb->ki_complete(
wreq->iocb, wreq->error ? wreq->error : written);
+ }
wreq->iocb = VFS_PTR_POISON;
}
diff --git a/fs/netfs/write_retry.c b/fs/netfs/write_retry.c
index 7158657061e9..fc9c3e0d34d8 100644
--- a/fs/netfs/write_retry.c
+++ b/fs/netfs/write_retry.c
@@ -146,7 +146,6 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
subreq = netfs_alloc_subrequest(wreq);
subreq->source = to->source;
subreq->start = start;
- subreq->debug_index = atomic_inc_return(&wreq->subreq_counter);
subreq->stream_nr = to->stream_nr;
subreq->retry_count = 1;
diff --git a/fs/smb/client/cifssmb.c b/fs/smb/client/cifssmb.c
index 0e509a0433fb..75142f49d65d 100644
--- a/fs/smb/client/cifssmb.c
+++ b/fs/smb/client/cifssmb.c
@@ -1334,7 +1334,11 @@ cifs_readv_callback(struct mid_q_entry *mid)
cifs_stats_bytes_read(tcon, rdata->got_bytes);
break;
case MID_REQUEST_SUBMITTED:
+ trace_netfs_sreq(&rdata->subreq, netfs_sreq_trace_io_req_submitted);
+ goto do_retry;
case MID_RETRY_NEEDED:
+ trace_netfs_sreq(&rdata->subreq, netfs_sreq_trace_io_retry_needed);
+do_retry:
__set_bit(NETFS_SREQ_NEED_RETRY, &rdata->subreq.flags);
rdata->result = -EAGAIN;
if (server->sign && rdata->got_bytes)
@@ -1344,8 +1348,14 @@ cifs_readv_callback(struct mid_q_entry *mid)
task_io_account_read(rdata->got_bytes);
cifs_stats_bytes_read(tcon, rdata->got_bytes);
break;
+ case MID_RESPONSE_MALFORMED:
+ trace_netfs_sreq(&rdata->subreq, netfs_sreq_trace_io_malformed);
+ rdata->result = -EIO;
+ break;
default:
+ trace_netfs_sreq(&rdata->subreq, netfs_sreq_trace_io_unknown);
rdata->result = -EIO;
+ break;
}
if (rdata->result == -ENODATA) {
@@ -1714,11 +1724,21 @@ cifs_writev_callback(struct mid_q_entry *mid)
}
break;
case MID_REQUEST_SUBMITTED:
+ trace_netfs_sreq(&wdata->subreq, netfs_sreq_trace_io_req_submitted);
+ __set_bit(NETFS_SREQ_NEED_RETRY, &wdata->subreq.flags);
+ result = -EAGAIN;
+ break;
case MID_RETRY_NEEDED:
+ trace_netfs_sreq(&wdata->subreq, netfs_sreq_trace_io_retry_needed);
__set_bit(NETFS_SREQ_NEED_RETRY, &wdata->subreq.flags);
result = -EAGAIN;
break;
+ case MID_RESPONSE_MALFORMED:
+ trace_netfs_sreq(&wdata->subreq, netfs_sreq_trace_io_malformed);
+ result = -EIO;
+ break;
default:
+ trace_netfs_sreq(&wdata->subreq, netfs_sreq_trace_io_unknown);
result = -EIO;
break;
}
diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c
index 572cfc42dda8..2df93a75e3b8 100644
--- a/fs/smb/client/smb2pdu.c
+++ b/fs/smb/client/smb2pdu.c
@@ -4565,7 +4565,11 @@ smb2_readv_callback(struct mid_q_entry *mid)
cifs_stats_bytes_read(tcon, rdata->got_bytes);
break;
case MID_REQUEST_SUBMITTED:
+ trace_netfs_sreq(&rdata->subreq, netfs_sreq_trace_io_req_submitted);
+ goto do_retry;
case MID_RETRY_NEEDED:
+ trace_netfs_sreq(&rdata->subreq, netfs_sreq_trace_io_retry_needed);
+do_retry:
__set_bit(NETFS_SREQ_NEED_RETRY, &rdata->subreq.flags);
rdata->result = -EAGAIN;
if (server->sign && rdata->got_bytes)
@@ -4576,11 +4580,15 @@ smb2_readv_callback(struct mid_q_entry *mid)
cifs_stats_bytes_read(tcon, rdata->got_bytes);
break;
case MID_RESPONSE_MALFORMED:
+ trace_netfs_sreq(&rdata->subreq, netfs_sreq_trace_io_malformed);
credits.value = le16_to_cpu(shdr->CreditRequest);
credits.instance = server->reconnect_instance;
- fallthrough;
+ rdata->result = -EIO;
+ break;
default:
+ trace_netfs_sreq(&rdata->subreq, netfs_sreq_trace_io_unknown);
rdata->result = -EIO;
+ break;
}
#ifdef CONFIG_CIFS_SMB_DIRECT
/*
@@ -4833,11 +4841,14 @@ smb2_writev_callback(struct mid_q_entry *mid)
switch (mid->mid_state) {
case MID_RESPONSE_RECEIVED:
+ trace_netfs_sreq(&wdata->subreq, netfs_sreq_trace_io_progress);
credits.value = le16_to_cpu(rsp->hdr.CreditRequest);
credits.instance = server->reconnect_instance;
result = smb2_check_receive(mid, server, 0);
- if (result != 0)
+ if (result != 0) {
+ trace_netfs_sreq(&wdata->subreq, netfs_sreq_trace_io_bad);
break;
+ }
written = le32_to_cpu(rsp->DataLength);
/*
@@ -4859,15 +4870,23 @@ smb2_writev_callback(struct mid_q_entry *mid)
}
break;
case MID_REQUEST_SUBMITTED:
+ trace_netfs_sreq(&wdata->subreq, netfs_sreq_trace_io_req_submitted);
+ __set_bit(NETFS_SREQ_NEED_RETRY, &wdata->subreq.flags);
+ result = -EAGAIN;
+ break;
case MID_RETRY_NEEDED:
+ trace_netfs_sreq(&wdata->subreq, netfs_sreq_trace_io_retry_needed);
__set_bit(NETFS_SREQ_NEED_RETRY, &wdata->subreq.flags);
result = -EAGAIN;
break;
case MID_RESPONSE_MALFORMED:
+ trace_netfs_sreq(&wdata->subreq, netfs_sreq_trace_io_malformed);
credits.value = le16_to_cpu(rsp->hdr.CreditRequest);
credits.instance = server->reconnect_instance;
- fallthrough;
+ result = -EIO;
+ break;
default:
+ trace_netfs_sreq(&wdata->subreq, netfs_sreq_trace_io_unknown);
result = -EIO;
break;
}
@@ -4907,7 +4926,6 @@ smb2_writev_callback(struct mid_q_entry *mid)
server->credits, server->in_flight,
0, cifs_trace_rw_credits_write_response_clear);
wdata->credits.value = 0;
- trace_netfs_sreq(&wdata->subreq, netfs_sreq_trace_io_progress);
cifs_write_subrequest_terminated(wdata, result ?: written);
release_mid(mid);
trace_smb3_rw_credits(rreq_debug_id, subreq_debug_index, 0,
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index c2d581429a7b..73e96ccbe830 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -50,12 +50,13 @@
#define netfs_rreq_traces \
EM(netfs_rreq_trace_assess, "ASSESS ") \
- EM(netfs_rreq_trace_copy, "COPY ") \
EM(netfs_rreq_trace_collect, "COLLECT") \
EM(netfs_rreq_trace_complete, "COMPLET") \
+ EM(netfs_rreq_trace_copy, "COPY ") \
EM(netfs_rreq_trace_dirty, "DIRTY ") \
EM(netfs_rreq_trace_done, "DONE ") \
EM(netfs_rreq_trace_free, "FREE ") \
+ EM(netfs_rreq_trace_ki_complete, "KI-CMPL") \
EM(netfs_rreq_trace_recollect, "RECLLCT") \
EM(netfs_rreq_trace_redirty, "REDIRTY") \
EM(netfs_rreq_trace_resubmit, "RESUBMT") \
@@ -64,13 +65,15 @@
EM(netfs_rreq_trace_unlock, "UNLOCK ") \
EM(netfs_rreq_trace_unlock_pgpriv2, "UNLCK-2") \
EM(netfs_rreq_trace_unmark, "UNMARK ") \
+ EM(netfs_rreq_trace_unpause, "UNPAUSE") \
EM(netfs_rreq_trace_wait_ip, "WAIT-IP") \
- EM(netfs_rreq_trace_wait_pause, "WT-PAUS") \
- EM(netfs_rreq_trace_wait_queue, "WAIT-Q ") \
+ EM(netfs_rreq_trace_wait_pause, "--PAUSED--") \
+ EM(netfs_rreq_trace_wait_quiesce, "WAIT-QUIESCE") \
+ EM(netfs_rreq_trace_waited_ip, "DONE-IP") \
+ EM(netfs_rreq_trace_waited_pause, "--UNPAUSED--") \
+ EM(netfs_rreq_trace_waited_quiesce, "DONE-QUIESCE") \
EM(netfs_rreq_trace_wake_ip, "WAKE-IP") \
EM(netfs_rreq_trace_wake_queue, "WAKE-Q ") \
- EM(netfs_rreq_trace_woke_queue, "WOKE-Q ") \
- EM(netfs_rreq_trace_unpause, "UNPAUSE") \
E_(netfs_rreq_trace_write_done, "WR-DONE")
#define netfs_sreq_sources \
@@ -83,6 +86,7 @@
E_(NETFS_WRITE_TO_CACHE, "WRIT")
#define netfs_sreq_traces \
+ EM(netfs_sreq_trace_abandoned, "ABNDN") \
EM(netfs_sreq_trace_add_donations, "+DON ") \
EM(netfs_sreq_trace_added, "ADD ") \
EM(netfs_sreq_trace_cache_nowrite, "CA-NW") \
@@ -90,6 +94,7 @@
EM(netfs_sreq_trace_cache_write, "CA-WR") \
EM(netfs_sreq_trace_cancel, "CANCL") \
EM(netfs_sreq_trace_clear, "CLEAR") \
+ EM(netfs_sreq_trace_consumed, "CONSM") \
EM(netfs_sreq_trace_discard, "DSCRD") \
EM(netfs_sreq_trace_donate_to_prev, "DON-P") \
EM(netfs_sreq_trace_donate_to_next, "DON-N") \
@@ -97,7 +102,12 @@
EM(netfs_sreq_trace_fail, "FAIL ") \
EM(netfs_sreq_trace_free, "FREE ") \
EM(netfs_sreq_trace_hit_eof, "EOF ") \
- EM(netfs_sreq_trace_io_progress, "IO ") \
+ EM(netfs_sreq_trace_io_bad, "I-BAD") \
+ EM(netfs_sreq_trace_io_malformed, "I-MLF") \
+ EM(netfs_sreq_trace_io_unknown, "I-UNK") \
+ EM(netfs_sreq_trace_io_progress, "I-OK ") \
+ EM(netfs_sreq_trace_io_req_submitted, "I-RSB") \
+ EM(netfs_sreq_trace_io_retry_needed, "I-RTR") \
EM(netfs_sreq_trace_limited, "LIMIT") \
EM(netfs_sreq_trace_need_clear, "N-CLR") \
EM(netfs_sreq_trace_partial_read, "PARTR") \
@@ -143,8 +153,8 @@
#define netfs_sreq_ref_traces \
EM(netfs_sreq_trace_get_copy_to_cache, "GET COPY2C ") \
- EM(netfs_sreq_trace_get_resubmit, "GET RESUBMIT") \
- EM(netfs_sreq_trace_get_submit, "GET SUBMIT") \
+ EM(netfs_sreq_trace_get_resubmit, "GET RESUBMT") \
+ EM(netfs_sreq_trace_get_submit, "GET SUBMIT ") \
EM(netfs_sreq_trace_get_short_read, "GET SHORTRD") \
EM(netfs_sreq_trace_new, "NEW ") \
EM(netfs_sreq_trace_put_abandon, "PUT ABANDON") \
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 00/13] netfs, cifs: Fixes to retry-related code
2025-07-01 16:38 [PATCH 00/13] netfs, cifs: Fixes to retry-related code David Howells
` (12 preceding siblings ...)
2025-07-01 16:38 ` [PATCH 13/13] netfs: Update tracepoints in a number of ways David Howells
@ 2025-07-09 10:26 ` Max Kellermann
2025-07-09 13:01 ` David Howells
14 siblings, 0 replies; 25+ messages in thread
From: Max Kellermann @ 2025-07-09 10:26 UTC (permalink / raw)
To: David Howells
Cc: Christian Brauner, Steve French, Paulo Alcantara, netfs,
linux-afs, linux-cifs, linux-nfs, ceph-devel, v9fs, linux-fsdevel,
linux-kernel, stable
David Howells <dhowells@redhat.com> wrote:
> Here are some miscellaneous fixes and changes for netfslib and cifs, if you
> could consider pulling them. All the bugs fixed were observed in cifs, so
> they should probably go through the cifs tree unless Christian would much
> prefer for them to go through the VFS tree.
Hi David,
your commit 2b1424cd131c ("netfs: Fix wait/wake to be consistent about
the waitqueue used") has given me serious headaches; it has caused
outages in our web hosting clusters (yet again - all Linux versions
since 6.9 had serious netfs regressions). Your patch was backported to
6.15 as commit 329ba1cb402a in 6.15.3 (why oh why??), and therefore
the bugs it has caused will be "available" to all Linux stable users.
The problem we had is that writing to certain files never finishes. It
looks like it has to do with the cachefiles subrequest never reporting
completion. (We use Ceph with cachefiles)
I have tried applying the fixes in this pull request, which sounded
promising, but the problem is still there. The only thing that helps
is reverting 2b1424cd131c completely - everything is fine with 6.15.5
plus the revert.
What do you need from me in order to analyze the bug?
Max
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 00/13] netfs, cifs: Fixes to retry-related code
2025-07-01 16:38 [PATCH 00/13] netfs, cifs: Fixes to retry-related code David Howells
` (13 preceding siblings ...)
2025-07-09 10:26 ` [PATCH 00/13] netfs, cifs: Fixes to retry-related code Max Kellermann
@ 2025-07-09 13:01 ` David Howells
2025-07-09 19:04 ` Max Kellermann
2025-07-09 20:22 ` David Howells
14 siblings, 2 replies; 25+ messages in thread
From: David Howells @ 2025-07-09 13:01 UTC (permalink / raw)
To: Max Kellermann
Cc: dhowells, Christian Brauner, Steve French, Paulo Alcantara, netfs,
linux-afs, linux-cifs, linux-nfs, ceph-devel, v9fs, linux-fsdevel,
linux-kernel, stable
Max Kellermann <max.kellermann@ionos.com> wrote:
> your commit 2b1424cd131c ("netfs: Fix wait/wake to be consistent about
> the waitqueue used") has given me serious headaches; it has caused
> outages in our web hosting clusters (yet again - all Linux versions
> since 6.9 had serious netfs regressions). Your patch was backported to
> 6.15 as commit 329ba1cb402a in 6.15.3 (why oh why??), and therefore
> the bugs it has caused will be "available" to all Linux stable users.
>
> The problem we had is that writing to certain files never finishes. It
> looks like it has to do with the cachefiles subrequest never reporting
> completion. (We use Ceph with cachefiles)
>
> I have tried applying the fixes in this pull request, which sounded
> promising, but the problem is still there. The only thing that helps
> is reverting 2b1424cd131c completely - everything is fine with 6.15.5
> plus the revert.
>
> What do you need from me in order to analyze the bug?
As a start, can you turn on:
echo 65536 >/sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/netfs/netfs_read/enable
echo 1 > /sys/kernel/debug/tracing/events/netfs/netfs_rreq/enable
echo 1 > /sys/kernel/debug/tracing/events/netfs/netfs_sreq/enable
echo 1 > /sys/kernel/debug/tracing/events/netfs/netfs_failure/enable
If you keep an eye on /proc/fs/netfs/requests you should be able to see any
tasks in there that get stuck. If one gets stuck, then:
echo 0 > /sys/kernel/debug/tracing/events/enable
to stop further tracing.
Looking in /proc/fs/netfs/requests, you should be able to see the debug ID of
the stuck request. If you can try grepping the trace log for that:
grep "R=<8-digit-hex-id>" /sys/kernel/debug/tracing/trace
that should hopefully let me see how things progressed on that call.
David
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 00/13] netfs, cifs: Fixes to retry-related code
2025-07-09 13:01 ` David Howells
@ 2025-07-09 19:04 ` Max Kellermann
2025-07-09 20:22 ` David Howells
1 sibling, 0 replies; 25+ messages in thread
From: Max Kellermann @ 2025-07-09 19:04 UTC (permalink / raw)
To: David Howells
Cc: Christian Brauner, Steve French, Paulo Alcantara, netfs,
linux-afs, linux-cifs, linux-nfs, ceph-devel, v9fs, linux-fsdevel,
linux-kernel, stable
On Wed, Jul 9, 2025 at 3:01 PM David Howells <dhowells@redhat.com> wrote:
> If you keep an eye on /proc/fs/netfs/requests you should be able to see any
> tasks in there that get stuck. If one gets stuck, then:
After one got stuck, this is what I see in /proc/fs/netfs/requests:
REQUEST OR REF FL ERR OPS COVERAGE
======== == === == ==== === =========
00000065 2C 2 80002020 0 0 @0000 0/0
> Looking in /proc/fs/netfs/requests, you should be able to see the debug ID of
> the stuck request. If you can try grepping the trace log for that:
>
> grep "R=<8-digit-hex-id>" /sys/kernel/debug/tracing/trace
kworker/u96:4-455 [008] ...1. 107.145222: netfs_sreq:
R=00000065[1] WRIT PREP f=00 s=0 0/0 s=0 e=0
kworker/u96:4-455 [008] ...1. 107.145292: netfs_sreq:
R=00000065[1] WRIT SUBMT f=100 s=0 0/29e1 s=0 e=0
kworker/u96:4-455 [008] ...1. 107.145311: netfs_sreq:
R=00000065[1] WRIT CA-PR f=100 s=0 0/3000 s=0 e=0
kworker/u96:4-455 [008] ...1. 107.145457: netfs_sreq:
R=00000065[1] WRIT CA-WR f=100 s=0 0/3000 s=0 e=0
kworker/8:1-437 [008] ...1. 107.149530: netfs_sreq:
R=00000065[1] WRIT TERM f=100 s=0 3000/3000 s=2 e=0
kworker/8:1-437 [008] ...1. 107.149531: netfs_rreq:
R=00000065 2C WAKE-Q f=80002020
I can reproduce this easily - it happens every time I log out of that
machine when bash tries to write the bash_history file - the write()
always gets stuck.
(The above was 6.15.5 plus all patches in this PR.)
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 00/13] netfs, cifs: Fixes to retry-related code
2025-07-09 13:01 ` David Howells
2025-07-09 19:04 ` Max Kellermann
@ 2025-07-09 20:22 ` David Howells
2025-07-09 21:44 ` Max Kellermann
` (2 more replies)
1 sibling, 3 replies; 25+ messages in thread
From: David Howells @ 2025-07-09 20:22 UTC (permalink / raw)
To: Max Kellermann
Cc: dhowells, Christian Brauner, Steve French, Paulo Alcantara, netfs,
linux-afs, linux-cifs, linux-nfs, ceph-devel, v9fs, linux-fsdevel,
linux-kernel, stable
Max Kellermann <max.kellermann@ionos.com> wrote:
> kworker/8:1-437 [008] ...1. 107.149531: netfs_rreq:
> R=00000065 2C WAKE-Q f=80002020
>
> ...
> (The above was 6.15.5 plus all patches in this PR.)
Can you check that, please? If you have patch 12 applied, then the flags
should be renumbered and there shouldn't be a NETFS_RREQ_ flag with 13, but
f=80002020 would seem to have 0x2000 (ie. bit 13) set in it.
If you do have it applied, then this might be an indicator of the issue.
David
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 00/13] netfs, cifs: Fixes to retry-related code
2025-07-09 20:22 ` David Howells
@ 2025-07-09 21:44 ` Max Kellermann
2025-07-10 10:47 ` David Howells
2025-07-10 11:17 ` David Howells
2 siblings, 0 replies; 25+ messages in thread
From: Max Kellermann @ 2025-07-09 21:44 UTC (permalink / raw)
To: David Howells
Cc: Christian Brauner, Steve French, Paulo Alcantara, netfs,
linux-afs, linux-cifs, linux-nfs, ceph-devel, v9fs, linux-fsdevel,
linux-kernel, stable
On Wed, Jul 9, 2025 at 10:22 PM David Howells <dhowells@redhat.com> wrote:
> > (The above was 6.15.5 plus all patches in this PR.)
>
> Can you check that, please? If you have patch 12 applied, then the flags
> should be renumbered and there shouldn't be a NETFS_RREQ_ flag with 13, but
> f=80002020 would seem to have 0x2000 (ie. bit 13) set in it.
Oh, I was slightly wrong, I merged only 12 patches, omitting the
renumbering patch because it had conflicts with my branch, and it was
only a cosmetic change, not relevant for the bug. Sorry for the
confusion!
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 00/13] netfs, cifs: Fixes to retry-related code
2025-07-09 20:22 ` David Howells
2025-07-09 21:44 ` Max Kellermann
@ 2025-07-10 10:47 ` David Howells
2025-07-10 13:41 ` Max Kellermann
` (2 more replies)
2025-07-10 11:17 ` David Howells
2 siblings, 3 replies; 25+ messages in thread
From: David Howells @ 2025-07-10 10:47 UTC (permalink / raw)
To: Max Kellermann
Cc: dhowells, Christian Brauner, Viacheslav Dubeyko, Alex Markuze,
Steve French, Paulo Alcantara, netfs, linux-afs, linux-cifs,
linux-nfs, ceph-devel, v9fs, linux-fsdevel, linux-kernel, stable
Hi Max,
I managed to reproduce it on my test machine with ceph + fscache.
Does this fix the problem for you?
David
---
netfs: Fix copy-to-cache so that it performs collection with ceph+fscache
The netfs copy-to-cache that is used by Ceph with local caching sets up a
new request to write data just read to the cache. The request is started
and then left to look after itself whilst the app continues. The request
gets notified by the backing fs upon completion of the async DIO write, but
then tries to wake up the app because NETFS_RREQ_OFFLOAD_COLLECTION isn't
set - but the app isn't waiting there, and so the request just hangs.
Fix this by setting NETFS_RREQ_OFFLOAD_COLLECTION which causes the
notification from the backing filesystem to put the collection onto a work
queue instead.
Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item")
Reported-by: Max Kellermann <max.kellermann@ionos.com>
Link: https://lore.kernel.org/r/CAKPOu+8z_ijTLHdiCYGU_Uk7yYD=shxyGLwfe-L7AV3DhebS3w@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Viacheslav Dubeyko <slava@dubeyko.com>
cc: Alex Markuze <amarkuze@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: netfs@lists.linux.dev
cc: ceph-devel@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: stable@vger.kernel.org
---
fs/netfs/read_pgpriv2.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/netfs/read_pgpriv2.c b/fs/netfs/read_pgpriv2.c
index 5bbe906a551d..080d2a6a51d9 100644
--- a/fs/netfs/read_pgpriv2.c
+++ b/fs/netfs/read_pgpriv2.c
@@ -110,6 +110,7 @@ static struct netfs_io_request *netfs_pgpriv2_begin_copy_to_cache(
if (!creq->io_streams[1].avail)
goto cancel_put;
+ __set_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &creq->flags);
trace_netfs_write(creq, netfs_write_trace_copy_to_cache);
netfs_stat(&netfs_n_wh_copy_to_cache);
rreq->copy_to_cache = creq;
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 00/13] netfs, cifs: Fixes to retry-related code
2025-07-10 10:47 ` David Howells
@ 2025-07-10 13:41 ` Max Kellermann
2025-07-10 15:32 ` David Howells
2025-07-10 16:31 ` David Howells
2 siblings, 0 replies; 25+ messages in thread
From: Max Kellermann @ 2025-07-10 13:41 UTC (permalink / raw)
To: David Howells
Cc: Christian Brauner, Viacheslav Dubeyko, Alex Markuze, Steve French,
Paulo Alcantara, netfs, linux-afs, linux-cifs, linux-nfs,
ceph-devel, v9fs, linux-fsdevel, linux-kernel, stable
On Thu, Jul 10, 2025 at 12:47 PM David Howells <dhowells@redhat.com> wrote:
> I managed to reproduce it on my test machine with ceph + fscache.
>
> Does this fix the problem for you?
Yes! I can no longer reproduce my problem.
Thanks, David.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 00/13] netfs, cifs: Fixes to retry-related code
2025-07-10 10:47 ` David Howells
2025-07-10 13:41 ` Max Kellermann
@ 2025-07-10 15:32 ` David Howells
2025-07-10 16:31 ` David Howells
2 siblings, 0 replies; 25+ messages in thread
From: David Howells @ 2025-07-10 15:32 UTC (permalink / raw)
To: Max Kellermann
Cc: dhowells, Christian Brauner, Viacheslav Dubeyko, Alex Markuze,
Steve French, Paulo Alcantara, netfs, linux-afs, linux-cifs,
linux-nfs, ceph-devel, v9fs, linux-fsdevel, linux-kernel, stable
Hi Max,
Depending on what you're doing on ceph, you might need the attached patch as
well. I managed to reproduce it by doing a git clone and kernel build on a
ceph mount with cachefiles active.
David
----
commit 6c24e7124642846b628274c845cd6de252f1dfbf
Author: David Howells <dhowells@redhat.com>
Date: Thu Jul 10 15:02:57 2025 +0100
netfs: Fix race between cache write completion and ALL_QUEUED being set
When netfslib is issuing subrequests, the subrequests start processing
immediately and may complete before we reach the end of the issuing
function. At the end of the issuing function we set NETFS_RREQ_ALL_QUEUED
to indicate to the collector that we aren't going to issue any more subreqs
and that it can do the final notifications and cleanup.
Now, this isn't a problem if the request is synchronous
(NETFS_RREQ_OFFLOAD_COLLECTION is unset) as the result collection will be
done in-thread and we're guaranteed an opportunity to run the collector.
However, if the request is asynchronous, collection is primarily triggered
by the termination of subrequests queuing it on a workqueue. Now, a race
can occur here if the app thread sets ALL_QUEUED after the last subrequest
terminates.
This can happen most easily with the copy2cache code (as used by Ceph)
where, in the collection routine of a read request, an asynchronous write
request is spawned to copy data to the cache. Folios are added to the
write request as they're unlocked, but there may be a delay before
ALL_QUEUED is set as the write subrequests may complete before we get
there.
If all the write subreqs have finished by the ALL_QUEUED point, no further
events happen and the collection never happens, leaving the request
hanging.
Fix this by queuing the collector after setting ALL_QUEUED. This is a bit
heavy-handed and it may be sufficient to do it only if there are no extant
subreqs.
Also add a tracepoint to cross-reference both requests in a copy-to-request
operation and add a trace to the netfs_rreq tracepoint to indicate the
setting of ALL_QUEUED.
Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item")
Reported-by: Max Kellermann <max.kellermann@ionos.com>
Link: https://lore.kernel.org/r/CAKPOu+8z_ijTLHdiCYGU_Uk7yYD=shxyGLwfe-L7AV3DhebS3w@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Viacheslav Dubeyko <slava@dubeyko.com>
cc: Alex Markuze <amarkuze@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: netfs@lists.linux.dev
cc: ceph-devel@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: stable@vger.kernel.org
diff --git a/fs/netfs/read_pgpriv2.c b/fs/netfs/read_pgpriv2.c
index 080d2a6a51d9..889ff7954f8c 100644
--- a/fs/netfs/read_pgpriv2.c
+++ b/fs/netfs/read_pgpriv2.c
@@ -111,6 +111,7 @@ static struct netfs_io_request *netfs_pgpriv2_begin_copy_to_cache(
goto cancel_put;
__set_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &creq->flags);
+ trace_netfs_copy2cache(rreq, creq);
trace_netfs_write(creq, netfs_write_trace_copy_to_cache);
netfs_stat(&netfs_n_wh_copy_to_cache);
rreq->copy_to_cache = creq;
@@ -155,6 +156,8 @@ void netfs_pgpriv2_end_copy_to_cache(struct netfs_io_request *rreq)
netfs_issue_write(creq, &creq->io_streams[1]);
smp_wmb(); /* Write lists before ALL_QUEUED. */
set_bit(NETFS_RREQ_ALL_QUEUED, &creq->flags);
+ trace_netfs_rreq(rreq, netfs_rreq_trace_end_copy_to_cache);
+ netfs_wake_collector(creq);
netfs_put_request(creq, netfs_rreq_trace_put_return);
creq->copy_to_cache = NULL;
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 73e96ccbe830..64a382fbc31a 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -55,6 +55,7 @@
EM(netfs_rreq_trace_copy, "COPY ") \
EM(netfs_rreq_trace_dirty, "DIRTY ") \
EM(netfs_rreq_trace_done, "DONE ") \
+ EM(netfs_rreq_trace_end_copy_to_cache, "END-C2C") \
EM(netfs_rreq_trace_free, "FREE ") \
EM(netfs_rreq_trace_ki_complete, "KI-CMPL") \
EM(netfs_rreq_trace_recollect, "RECLLCT") \
@@ -559,6 +560,35 @@ TRACE_EVENT(netfs_write,
__entry->start, __entry->start + __entry->len - 1)
);
+TRACE_EVENT(netfs_copy2cache,
+ TP_PROTO(const struct netfs_io_request *rreq,
+ const struct netfs_io_request *creq),
+
+ TP_ARGS(rreq, creq),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, rreq)
+ __field(unsigned int, creq)
+ __field(unsigned int, cookie)
+ __field(unsigned int, ino)
+ ),
+
+ TP_fast_assign(
+ struct netfs_inode *__ctx = netfs_inode(rreq->inode);
+ struct fscache_cookie *__cookie = netfs_i_cookie(__ctx);
+ __entry->rreq = rreq->debug_id;
+ __entry->creq = creq->debug_id;
+ __entry->cookie = __cookie ? __cookie->debug_id : 0;
+ __entry->ino = rreq->inode->i_ino;
+ ),
+
+ TP_printk("R=%08x CR=%08x c=%08x i=%x ",
+ __entry->rreq,
+ __entry->creq,
+ __entry->cookie,
+ __entry->ino)
+ );
+
TRACE_EVENT(netfs_collect,
TP_PROTO(const struct netfs_io_request *wreq),
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 00/13] netfs, cifs: Fixes to retry-related code
2025-07-10 10:47 ` David Howells
2025-07-10 13:41 ` Max Kellermann
2025-07-10 15:32 ` David Howells
@ 2025-07-10 16:31 ` David Howells
2 siblings, 0 replies; 25+ messages in thread
From: David Howells @ 2025-07-10 16:31 UTC (permalink / raw)
Cc: dhowells, Max Kellermann, Christian Brauner, Viacheslav Dubeyko,
Alex Markuze, Steve French, Paulo Alcantara, netfs, linux-afs,
linux-cifs, linux-nfs, ceph-devel, v9fs, linux-fsdevel,
linux-kernel, stable
David Howells <dhowells@redhat.com> wrote:
> Depending on what you're doing on ceph, you might need the attached patch as
> well. I managed to reproduce it by doing a git clone and kernel build on a
> ceph mount with cachefiles active.
Here's a version of the patch that conditionally does the needed wakeup. I
don't want to force processing if there's no need.
David
---
commit 1fe42a9a7f0b2f51107574f0b8e151d13dc766cc
Author: David Howells <dhowells@redhat.com>
Date: Thu Jul 10 15:02:57 2025 +0100
netfs: Fix race between cache write completion and ALL_QUEUED being set
When netfslib is issuing subrequests, the subrequests start processing
immediately and may complete before we reach the end of the issuing
function. At the end of the issuing function we set NETFS_RREQ_ALL_QUEUED
to indicate to the collector that we aren't going to issue any more subreqs
and that it can do the final notifications and cleanup.
Now, this isn't a problem if the request is synchronous
(NETFS_RREQ_OFFLOAD_COLLECTION is unset) as the result collection will be
done in-thread and we're guaranteed an opportunity to run the collector.
However, if the request is asynchronous, collection is primarily triggered
by the termination of subrequests queuing it on a workqueue. Now, a race
can occur here if the app thread sets ALL_QUEUED after the last subrequest
terminates.
This can happen most easily with the copy2cache code (as used by Ceph)
where, in the collection routine of a read request, an asynchronous write
request is spawned to copy data to the cache. Folios are added to the
write request as they're unlocked, but there may be a delay before
ALL_QUEUED is set as the write subrequests may complete before we get
there.
If all the write subreqs have finished by the ALL_QUEUED point, no further
events happen and the collection never happens, leaving the request
hanging.
Fix this by queuing the collector after setting ALL_QUEUED. This is a bit
heavy-handed and it may be sufficient to do it only if there are no extant
subreqs.
Also add a tracepoint to cross-reference both requests in a copy-to-request
operation and add a trace to the netfs_rreq tracepoint to indicate the
setting of ALL_QUEUED.
Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item")
Reported-by: Max Kellermann <max.kellermann@ionos.com>
Link: https://lore.kernel.org/r/CAKPOu+8z_ijTLHdiCYGU_Uk7yYD=shxyGLwfe-L7AV3DhebS3w@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Viacheslav Dubeyko <slava@dubeyko.com>
cc: Alex Markuze <amarkuze@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: netfs@lists.linux.dev
cc: ceph-devel@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: stable@vger.kernel.org
diff --git a/fs/netfs/read_pgpriv2.c b/fs/netfs/read_pgpriv2.c
index 080d2a6a51d9..8097bc069c1d 100644
--- a/fs/netfs/read_pgpriv2.c
+++ b/fs/netfs/read_pgpriv2.c
@@ -111,6 +111,7 @@ static struct netfs_io_request *netfs_pgpriv2_begin_copy_to_cache(
goto cancel_put;
__set_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &creq->flags);
+ trace_netfs_copy2cache(rreq, creq);
trace_netfs_write(creq, netfs_write_trace_copy_to_cache);
netfs_stat(&netfs_n_wh_copy_to_cache);
rreq->copy_to_cache = creq;
@@ -155,6 +156,9 @@ void netfs_pgpriv2_end_copy_to_cache(struct netfs_io_request *rreq)
netfs_issue_write(creq, &creq->io_streams[1]);
smp_wmb(); /* Write lists before ALL_QUEUED. */
set_bit(NETFS_RREQ_ALL_QUEUED, &creq->flags);
+ trace_netfs_rreq(rreq, netfs_rreq_trace_end_copy_to_cache);
+ if (list_empty_careful(&creq->io_streams[1].subrequests))
+ netfs_wake_collector(creq);
netfs_put_request(creq, netfs_rreq_trace_put_return);
creq->copy_to_cache = NULL;
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 73e96ccbe830..64a382fbc31a 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -55,6 +55,7 @@
EM(netfs_rreq_trace_copy, "COPY ") \
EM(netfs_rreq_trace_dirty, "DIRTY ") \
EM(netfs_rreq_trace_done, "DONE ") \
+ EM(netfs_rreq_trace_end_copy_to_cache, "END-C2C") \
EM(netfs_rreq_trace_free, "FREE ") \
EM(netfs_rreq_trace_ki_complete, "KI-CMPL") \
EM(netfs_rreq_trace_recollect, "RECLLCT") \
@@ -559,6 +560,35 @@ TRACE_EVENT(netfs_write,
__entry->start, __entry->start + __entry->len - 1)
);
+TRACE_EVENT(netfs_copy2cache,
+ TP_PROTO(const struct netfs_io_request *rreq,
+ const struct netfs_io_request *creq),
+
+ TP_ARGS(rreq, creq),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, rreq)
+ __field(unsigned int, creq)
+ __field(unsigned int, cookie)
+ __field(unsigned int, ino)
+ ),
+
+ TP_fast_assign(
+ struct netfs_inode *__ctx = netfs_inode(rreq->inode);
+ struct fscache_cookie *__cookie = netfs_i_cookie(__ctx);
+ __entry->rreq = rreq->debug_id;
+ __entry->creq = creq->debug_id;
+ __entry->cookie = __cookie ? __cookie->debug_id : 0;
+ __entry->ino = rreq->inode->i_ino;
+ ),
+
+ TP_printk("R=%08x CR=%08x c=%08x i=%x ",
+ __entry->rreq,
+ __entry->creq,
+ __entry->cookie,
+ __entry->ino)
+ );
+
TRACE_EVENT(netfs_collect,
TP_PROTO(const struct netfs_io_request *wreq),
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 00/13] netfs, cifs: Fixes to retry-related code
2025-07-09 20:22 ` David Howells
2025-07-09 21:44 ` Max Kellermann
2025-07-10 10:47 ` David Howells
@ 2025-07-10 11:17 ` David Howells
2 siblings, 0 replies; 25+ messages in thread
From: David Howells @ 2025-07-10 11:17 UTC (permalink / raw)
Cc: dhowells, Max Kellermann, Christian Brauner, Viacheslav Dubeyko,
Alex Markuze, Steve French, Paulo Alcantara, netfs, linux-afs,
linux-cifs, linux-nfs, ceph-devel, v9fs, linux-fsdevel,
linux-kernel, stable
David Howells <dhowells@redhat.com> wrote:
> I managed to reproduce it on my test machine with ceph + fscache.
>
> Does this fix the problem for you?
There's at least one more bug in there, so this won't fix all of the cases.
David
^ permalink raw reply [flat|nested] 25+ messages in thread