* [PATCH] netfs: Fix a number of read-retry hangs
@ 2025-01-28 0:33 David Howells
2025-01-28 9:33 ` [PATCH] netfs: Add retry stat counters David Howells
From: David Howells @ 2025-01-28 0:33 UTC (permalink / raw)
To: Ihor Solodrai, Marc Dionne, Steve French
Cc: dhowells, Eric Van Hensbergen, Latchesar Ionkov,
Dominique Martinet, Christian Schoenebeck, Paulo Alcantara,
Jeff Layton, Christian Brauner, v9fs, linux-cifs, netfs,
linux-fsdevel, linux-kernel
Hi Ihor, Marc, Steve,
I think this patch fixes the hangs that have been seen better than Ihor's
previously tested fix (which I believe isn't actually correct).
David
---
From: David Howells <dhowells@redhat.com>
netfs: Fix a number of read-retry hangs
Fix a number of hangs in the netfslib read-retry code, including:
(1) netfs_reissue_read() doubles up the getting of references on
subrequests, thereby leaking the subrequest and causing inode eviction
to wait indefinitely. This can lead to the kernel reporting a hang in
the filesystem's evict_inode().
Fix this by removing the get from netfs_reissue_read() and adding one
to netfs_retry_read_subrequests() to deal with the one place that
didn't double up.
(2) The loop in netfs_retry_read_subrequests() that retries a sequence of
failed subrequests doesn't record whether or not it retried the one
that the "subreq" pointer points to when it leaves the loop. It may
not if renegotiation/repreparation of the subrequests means that fewer
subrequests are needed to span the cumulative range of the sequence.
Because it doesn't record this, the piece of code that discards
now-superfluous subrequests doesn't know whether it should discard the
one "subreq" points to - and so it doesn't.
Fix this by noting whether the last subreq it examines is superfluous
and, if it is, getting rid of it and all subsequent subrequests.
If that one wasn't superfluous, then we would have tried to go
round the previous loop again and so there can be no further unretried
subrequests in the sequence.
(3) netfs_retry_read_subrequests() gets an unnecessary extra ref on any
additional subrequests it has to allocate when it runs out of ones it
can reuse, due to renegotiation/repreparation shrinking the subrequests.
Fix this by removing that extra ref.
(4) In netfs_retry_reads(), it was using wait_on_bit() to wait for
NETFS_SREQ_IN_PROGRESS to be cleared on all subrequests in the
sequence - but netfs_read_subreq_terminated() is now using a wait
queue on the request instead and so this wait will never finish.
Fix this by waiting on the wait queue instead. To make this work, a
new flag, NETFS_RREQ_RETRYING, is now set around the wait loop to tell
the wake-up code to wake up the wait queue rather than requeuing the
request's work item.
Note that this flag replaces the NETFS_RREQ_NEED_RETRY flag which is
no longer used.
(5) Whilst not strictly anything to do with the hang,
netfs_retry_read_subrequests() was also doubly incrementing the
subreq_counter and re-setting the debug index, leaving a gap in the
trace. This is also fixed.
One of these hangs was observed with 9p and with cifs. Others were forced
by manual code injection into fs/afs/file.c. Firstly, afs_prepare_read()
was created to provide a changing pattern of maximum subrequest sizes:
	static int afs_prepare_read(struct netfs_io_subrequest *subreq)
	{
		struct netfs_io_request *rreq = subreq->rreq;

		if (!S_ISREG(subreq->rreq->inode->i_mode))
			return 0;
		if (subreq->retry_count < 20)
			rreq->io_streams[0].sreq_max_len =
				umax(200, 2222 - subreq->retry_count * 40);
		else
			rreq->io_streams[0].sreq_max_len = 3333;
		return 0;
	}
and pointed to by afs_req_ops. Then the following:
	struct netfs_io_subrequest *subreq = op->fetch.subreq;

	if (subreq->error == 0 &&
	    S_ISREG(subreq->rreq->inode->i_mode) &&
	    subreq->retry_count < 20) {
		subreq->transferred = subreq->already_done;
		__clear_bit(NETFS_SREQ_HIT_EOF, &subreq->flags);
		__set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
		afs_fetch_data_notify(op);
		return;
	}
was inserted at the beginning of afs_fetch_data_success(), and struct
netfs_io_subrequest was given an extra field, "already_done", that
netfs_reissue_read() set to the value in "subreq->transferred".
When reading a 4K file, the subrequests would get gradually smaller, a new
subrequest would be allocated around the 3rd retry and then eventually be
rendered superfluous when the 20th retry was hit and the limit on the first
subrequest was eased.
Reported-by: Ihor Solodrai <ihor.solodrai@pm.me>
Closes: https://lore.kernel.org/r/a7x33d4dnMdGTtRivptq6S1i8btK70SNBP2XyX_xwDAhLvgQoPox6FVBOkifq4eBinfFfbZlIkMZBe3QarlWTxoEtHZwJCZbNKtaqrR7PvI=@pm.me/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Steve French <stfrench@microsoft.com>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: v9fs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/read_collect.c | 6 ++++--
fs/netfs/read_retry.c | 40 ++++++++++++++++++++++++++++++----------
include/linux/netfs.h | 2 +-
include/trace/events/netfs.h | 4 +++-
4 files changed, 38 insertions(+), 14 deletions(-)
diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index f65affa5a9e4..636cc5a98ef5 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -470,7 +470,8 @@ void netfs_read_collection_worker(struct work_struct *work)
*/
void netfs_wake_read_collector(struct netfs_io_request *rreq)
{
- if (test_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &rreq->flags)) {
+ if (test_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &rreq->flags) &&
+ !test_bit(NETFS_RREQ_RETRYING, &rreq->flags)) {
if (!work_pending(&rreq->work)) {
netfs_get_request(rreq, netfs_rreq_trace_get_work);
if (!queue_work(system_unbound_wq, &rreq->work))
@@ -586,7 +587,8 @@ void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq)
smp_mb__after_atomic(); /* Clear IN_PROGRESS before task state */
/* If we are at the head of the queue, wake up the collector. */
- if (list_is_first(&subreq->rreq_link, &stream->subrequests))
+ if (list_is_first(&subreq->rreq_link, &stream->subrequests) ||
+ test_bit(NETFS_RREQ_RETRYING, &rreq->flags))
netfs_wake_read_collector(rreq);
netfs_put_subrequest(subreq, true, netfs_sreq_trace_put_terminated);
diff --git a/fs/netfs/read_retry.c b/fs/netfs/read_retry.c
index 2290af0d51ac..8316c4533a51 100644
--- a/fs/netfs/read_retry.c
+++ b/fs/netfs/read_retry.c
@@ -14,7 +14,6 @@ static void netfs_reissue_read(struct netfs_io_request *rreq,
{
__clear_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
__set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
- netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
subreq->rreq->netfs_ops->issue_read(subreq);
}
@@ -48,6 +47,7 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
__clear_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
subreq->retry_count++;
netfs_reset_iter(subreq);
+ netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
netfs_reissue_read(rreq, subreq);
}
}
@@ -75,7 +75,7 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
struct iov_iter source;
unsigned long long start, len;
size_t part;
- bool boundary = false;
+ bool boundary = false, subreq_superfluous = false;
/* Go through the subreqs and find the next span of contiguous
* buffer that we then rejig (cifs, for example, needs the
@@ -116,8 +116,10 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
/* Work through the sublist. */
subreq = from;
list_for_each_entry_from(subreq, &stream->subrequests, rreq_link) {
- if (!len)
+ if (!len) {
+ subreq_superfluous = true;
break;
+ }
subreq->source = NETFS_DOWNLOAD_FROM_SERVER;
subreq->start = start - subreq->transferred;
subreq->len = len + subreq->transferred;
@@ -154,19 +156,21 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
netfs_reissue_read(rreq, subreq);
- if (subreq == to)
+ if (subreq == to) {
+ subreq_superfluous = false;
break;
+ }
}
/* If we managed to use fewer subreqs, we can discard the
* excess; if we used the same number, then we're done.
*/
if (!len) {
- if (subreq == to)
+ if (!subreq_superfluous)
continue;
list_for_each_entry_safe_from(subreq, tmp,
&stream->subrequests, rreq_link) {
- trace_netfs_sreq(subreq, netfs_sreq_trace_discard);
+ trace_netfs_sreq(subreq, netfs_sreq_trace_superfluous);
list_del(&subreq->rreq_link);
netfs_put_subrequest(subreq, false, netfs_sreq_trace_put_done);
if (subreq == to)
@@ -187,14 +191,12 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
subreq->source = NETFS_DOWNLOAD_FROM_SERVER;
subreq->start = start;
subreq->len = len;
- subreq->debug_index = atomic_inc_return(&rreq->subreq_counter);
subreq->stream_nr = stream->stream_nr;
subreq->retry_count = 1;
trace_netfs_sreq_ref(rreq->debug_id, subreq->debug_index,
refcount_read(&subreq->ref),
netfs_sreq_trace_new);
- netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
list_add(&subreq->rreq_link, &to->rreq_link);
to = list_next_entry(to, rreq_link);
@@ -256,14 +258,32 @@ void netfs_retry_reads(struct netfs_io_request *rreq)
{
struct netfs_io_subrequest *subreq;
struct netfs_io_stream *stream = &rreq->io_streams[0];
+ DEFINE_WAIT(myself);
+
+ set_bit(NETFS_RREQ_RETRYING, &rreq->flags);
/* Wait for all outstanding I/O to quiesce before performing retries as
* we may need to renegotiate the I/O sizes.
*/
list_for_each_entry(subreq, &stream->subrequests, rreq_link) {
- wait_on_bit(&subreq->flags, NETFS_SREQ_IN_PROGRESS,
- TASK_UNINTERRUPTIBLE);
+ if (!test_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags))
+ continue;
+
+ trace_netfs_rreq(rreq, netfs_rreq_trace_wait_queue);
+ for (;;) {
+ prepare_to_wait(&rreq->waitq, &myself, TASK_UNINTERRUPTIBLE);
+
+ if (!test_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags))
+ break;
+
+ trace_netfs_sreq(subreq, netfs_sreq_trace_wait_for);
+ schedule();
+ trace_netfs_rreq(rreq, netfs_rreq_trace_woke_queue);
+ }
+
+ finish_wait(&rreq->waitq, &myself);
}
+ clear_bit(NETFS_RREQ_RETRYING, &rreq->flags);
trace_netfs_rreq(rreq, netfs_rreq_trace_resubmit);
netfs_retry_read_subrequests(rreq);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 071d05d81d38..c86a11cfc4a3 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -278,7 +278,7 @@ struct netfs_io_request {
#define NETFS_RREQ_PAUSE 11 /* Pause subrequest generation */
#define NETFS_RREQ_USE_IO_ITER 12 /* Use ->io_iter rather than ->i_pages */
#define NETFS_RREQ_ALL_QUEUED 13 /* All subreqs are now queued */
-#define NETFS_RREQ_NEED_RETRY 14 /* Need to try retrying */
+#define NETFS_RREQ_RETRYING 14 /* Set if we're in the retry path */
#define NETFS_RREQ_USE_PGPRIV2 31 /* [DEPRECATED] Use PG_private_2 to mark
* write to cache on read */
const struct netfs_request_ops *netfs_ops;
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 6e699cadcb29..f880835f7695 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -99,7 +99,7 @@
EM(netfs_sreq_trace_limited, "LIMIT") \
EM(netfs_sreq_trace_need_clear, "N-CLR") \
EM(netfs_sreq_trace_partial_read, "PARTR") \
- EM(netfs_sreq_trace_need_retry, "NRTRY") \
+ EM(netfs_sreq_trace_need_retry, "ND-RT") \
EM(netfs_sreq_trace_prepare, "PREP ") \
EM(netfs_sreq_trace_prep_failed, "PRPFL") \
EM(netfs_sreq_trace_progress, "PRGRS") \
@@ -108,7 +108,9 @@
EM(netfs_sreq_trace_short, "SHORT") \
EM(netfs_sreq_trace_split, "SPLIT") \
EM(netfs_sreq_trace_submit, "SUBMT") \
+ EM(netfs_sreq_trace_superfluous, "SPRFL") \
EM(netfs_sreq_trace_terminated, "TERM ") \
+ EM(netfs_sreq_trace_wait_for, "_WAIT") \
EM(netfs_sreq_trace_write, "WRITE") \
EM(netfs_sreq_trace_write_skip, "SKIP ") \
E_(netfs_sreq_trace_write_term, "WTERM")
* [PATCH] netfs: Add retry stat counters
2025-01-28 0:33 [PATCH] netfs: Fix a number of read-retry hangs David Howells
@ 2025-01-28 9:33 ` David Howells
2025-01-28 19:11 ` Ihor Solodrai
2025-01-28 16:51 ` [PATCH] netfs: Fix a number of read-retry hangs Marc Dionne
2025-02-11 1:07 ` Steve French
From: David Howells @ 2025-01-28 9:33 UTC (permalink / raw)
To: Ihor Solodrai, Marc Dionne, Steve French
Cc: dhowells, Eric Van Hensbergen, Latchesar Ionkov,
Dominique Martinet, Christian Schoenebeck, Paulo Alcantara,
Jeff Layton, Christian Brauner, v9fs, linux-cifs, netfs,
linux-fsdevel, linux-kernel
Here's an additional patch to allow stats on the number of retries to be
obtained. This isn't a fix per se.
David
---
From: David Howells <dhowells@redhat.com>
netfs: Add retry stat counters
Add stat counters to count the number of request and subrequest retries and
display them in /proc/fs/netfs/stats.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/internal.h | 4 ++++
fs/netfs/read_retry.c | 3 +++
fs/netfs/stats.c | 9 +++++++++
fs/netfs/write_issue.c | 1 +
fs/netfs/write_retry.c | 2 ++
5 files changed, 19 insertions(+)
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index eb76f98c894b..1c4f953c3d68 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -135,6 +135,8 @@ extern atomic_t netfs_n_rh_write_begin;
extern atomic_t netfs_n_rh_write_done;
extern atomic_t netfs_n_rh_write_failed;
extern atomic_t netfs_n_rh_write_zskip;
+extern atomic_t netfs_n_rh_retry_read_req;
+extern atomic_t netfs_n_rh_retry_read_subreq;
extern atomic_t netfs_n_wh_buffered_write;
extern atomic_t netfs_n_wh_writethrough;
extern atomic_t netfs_n_wh_dio_write;
@@ -147,6 +149,8 @@ extern atomic_t netfs_n_wh_upload_failed;
extern atomic_t netfs_n_wh_write;
extern atomic_t netfs_n_wh_write_done;
extern atomic_t netfs_n_wh_write_failed;
+extern atomic_t netfs_n_wh_retry_write_req;
+extern atomic_t netfs_n_wh_retry_write_subreq;
extern atomic_t netfs_n_wb_lock_skip;
extern atomic_t netfs_n_wb_lock_wait;
extern atomic_t netfs_n_folioq;
diff --git a/fs/netfs/read_retry.c b/fs/netfs/read_retry.c
index 8316c4533a51..0f294b26e08c 100644
--- a/fs/netfs/read_retry.c
+++ b/fs/netfs/read_retry.c
@@ -14,6 +14,7 @@ static void netfs_reissue_read(struct netfs_io_request *rreq,
{
__clear_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
__set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
+ netfs_stat(&netfs_n_rh_retry_read_subreq);
subreq->rreq->netfs_ops->issue_read(subreq);
}
@@ -260,6 +261,8 @@ void netfs_retry_reads(struct netfs_io_request *rreq)
struct netfs_io_stream *stream = &rreq->io_streams[0];
DEFINE_WAIT(myself);
+ netfs_stat(&netfs_n_rh_retry_read_req);
+
set_bit(NETFS_RREQ_RETRYING, &rreq->flags);
/* Wait for all outstanding I/O to quiesce before performing retries as
diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c
index f1af344266cc..ab6b916addc4 100644
--- a/fs/netfs/stats.c
+++ b/fs/netfs/stats.c
@@ -29,6 +29,8 @@ atomic_t netfs_n_rh_write_begin;
atomic_t netfs_n_rh_write_done;
atomic_t netfs_n_rh_write_failed;
atomic_t netfs_n_rh_write_zskip;
+atomic_t netfs_n_rh_retry_read_req;
+atomic_t netfs_n_rh_retry_read_subreq;
atomic_t netfs_n_wh_buffered_write;
atomic_t netfs_n_wh_writethrough;
atomic_t netfs_n_wh_dio_write;
@@ -41,6 +43,8 @@ atomic_t netfs_n_wh_upload_failed;
atomic_t netfs_n_wh_write;
atomic_t netfs_n_wh_write_done;
atomic_t netfs_n_wh_write_failed;
+atomic_t netfs_n_wh_retry_write_req;
+atomic_t netfs_n_wh_retry_write_subreq;
atomic_t netfs_n_wb_lock_skip;
atomic_t netfs_n_wb_lock_wait;
atomic_t netfs_n_folioq;
@@ -81,6 +85,11 @@ int netfs_stats_show(struct seq_file *m, void *v)
atomic_read(&netfs_n_wh_write),
atomic_read(&netfs_n_wh_write_done),
atomic_read(&netfs_n_wh_write_failed));
+ seq_printf(m, "Retries: rq=%u rs=%u wq=%u ws=%u\n",
+ atomic_read(&netfs_n_rh_retry_read_req),
+ atomic_read(&netfs_n_rh_retry_read_subreq),
+ atomic_read(&netfs_n_wh_retry_write_req),
+ atomic_read(&netfs_n_wh_retry_write_subreq));
seq_printf(m, "Objs : rr=%u sr=%u foq=%u wsc=%u\n",
atomic_read(&netfs_n_rh_rreq),
atomic_read(&netfs_n_rh_sreq),
diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
index 69727411683e..77279fc5b5a7 100644
--- a/fs/netfs/write_issue.c
+++ b/fs/netfs/write_issue.c
@@ -253,6 +253,7 @@ void netfs_reissue_write(struct netfs_io_stream *stream,
subreq->retry_count++;
__clear_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
__set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
+ netfs_stat(&netfs_n_wh_retry_write_subreq);
netfs_do_issue_write(stream, subreq);
}
diff --git a/fs/netfs/write_retry.c b/fs/netfs/write_retry.c
index c841a851dd73..545d33079a77 100644
--- a/fs/netfs/write_retry.c
+++ b/fs/netfs/write_retry.c
@@ -203,6 +203,8 @@ void netfs_retry_writes(struct netfs_io_request *wreq)
struct netfs_io_stream *stream;
int s;
+ netfs_stat(&netfs_n_wh_retry_write_req);
+
/* Wait for all outstanding I/O to quiesce before performing retries as
* we may need to renegotiate the I/O sizes.
*/
* Re: [PATCH] netfs: Fix a number of read-retry hangs
2025-01-28 0:33 [PATCH] netfs: Fix a number of read-retry hangs David Howells
2025-01-28 9:33 ` [PATCH] netfs: Add retry stat counters David Howells
@ 2025-01-28 16:51 ` Marc Dionne
2025-02-11 1:07 ` Steve French
From: Marc Dionne @ 2025-01-28 16:51 UTC (permalink / raw)
To: David Howells
Cc: Ihor Solodrai, Steve French, Eric Van Hensbergen,
Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck,
Paulo Alcantara, Jeff Layton, Christian Brauner, v9fs, linux-cifs,
netfs, linux-fsdevel, linux-kernel
On Mon, Jan 27, 2025 at 8:33 PM David Howells <dhowells@redhat.com> wrote:
>
> [...]
Looks good in testing with afs. It took quite a few iterations to see
evidence of a retry occurring; the new stat was helpful there.
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Marc
* Re: [PATCH] netfs: Add retry stat counters
2025-01-28 9:33 ` [PATCH] netfs: Add retry stat counters David Howells
@ 2025-01-28 19:11 ` Ihor Solodrai
2025-02-03 18:01 ` Ihor Solodrai
2025-02-10 10:57 ` David Howells
2025-02-10 11:12 ` David Howells
From: Ihor Solodrai @ 2025-01-28 19:11 UTC (permalink / raw)
To: David Howells, Marc Dionne, Steve French
Cc: dhowells, Eric Van Hensbergen, Latchesar Ionkov,
Dominique Martinet, Christian Schoenebeck, Paulo Alcantara,
Jeff Layton, Christian Brauner, v9fs, linux-cifs, netfs,
linux-fsdevel, linux-kernel, ast, bpf
January 28, 2025 at 1:33 AM, "David Howells" <dhowells@redhat.com> wrote:
>
> Here's an additional patch to allow stats on the number of retries to be
> obtained. This isn't a fix per se.
>
> David
>
> [...]
Hi David, Marc.
I regret to report that this patch didn't fix the hang in
v9fs_evict_inode when running selftests/bpf.
Here is what I tried to test:
* Checked out latest bpf-next source tree (0fc5dddb9409)
* Applied patch: https://lore.kernel.org/netfs/3173328.1738024385@warthog.procyon.org.uk/
* Applied retry stats patch: https://lore.kernel.org/netfs/3187377.1738056789@warthog.procyon.org.uk/
* Modified tools/testing/selftests/bpf/config to enable /proc/fs/netfs/stats
* Modified CI scripts to collect the stats
* Ran the shell script reproducing the CI testing pipeline
Bash piece starting a process collecting /proc/fs/netfs/stats:
function tail_netfs {
echo -n > /mnt/vmtest/netfs-stats.log
while true; do
echo >> /mnt/vmtest/netfs-stats.log
cat /proc/fs/netfs/stats >> /mnt/vmtest/netfs-stats.log
sleep 1
done
}
export -f tail_netfs
nohup bash -c 'tail_netfs' & disown
Last recorded /proc/fs/netfs/stats (note 0 retries):
Reads : DR=0 RA=15184 RF=5 RS=0 WB=0 WBZ=0
Writes : BW=488 WT=0 DW=0 WP=488 2C=0
ZeroOps: ZR=7964 sh=0 sk=0
DownOps: DL=15189 ds=15189 df=0 di=0
CaRdOps: RD=0 rs=0 rf=0
UpldOps: UL=488 us=488 uf=0
CaWrOps: WR=0 ws=0 wf=0
Retries: rq=0 rs=0 wq=0 ws=0
Objs : rr=2 sr=1 foq=1 wsc=0
WbLock : skip=0 wait=0
-- FS-Cache statistics --
Cookies: n=0 v=0 vcol=0 voom=0
Acquire: n=0 ok=0 oom=0
LRU : n=0 exp=0 rmv=0 drp=0 at=0
Invals : n=0
Updates: n=0 rsz=0 rsn=0
Relinqs: n=0 rtr=0 drop=0
NoSpace: nwr=0 ncr=0 cull=0
IO : rd=0 wr=0 mis=0
The stack on hung task hasn't changed:
[ 184.375149] INFO: task modprobe:2759 blocked for more than 20 seconds.
[ 184.376149] Tainted: G OE 6.13.0-gbb67a65a921c-dirty #3
[ 184.376593] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 184.377119] task:modprobe state:D stack:0 pid:2759 tgid:2759 ppid:455 flags:0x00004002
[ 184.377701] Call Trace:
[ 184.377886] <TASK>
[ 184.378039] __schedule+0xa91/0xe80
[ 184.378282] schedule+0x41/0xb0
[ 184.378490] v9fs_evict_inode+0xfe/0x170
[ 184.378754] ? __pfx_var_wake_function+0x10/0x10
[ 184.379070] evict+0x1ef/0x360
[ 184.379288] __dentry_kill+0xb0/0x220
[ 184.379528] ? dput+0x3a/0x1d0
[ 184.379736] dput+0x114/0x1d0
[ 184.379946] __fput+0x136/0x2b0
[ 184.380158] task_work_run+0x89/0xc0
[ 184.380396] do_exit+0x2c6/0x9c0
[ 184.380617] do_group_exit+0xa4/0xb0
[ 184.380870] __x64_sys_exit_group+0x17/0x20
[ 184.381137] x64_sys_call+0x21a0/0x21a0
[ 184.381386] do_syscall_64+0x79/0x120
[ 184.381630] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 184.381969] RIP: 0033:0x7f817bf7c21d
[ 184.382202] RSP: 002b:00007fff92c8d148 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
[ 184.382676] RAX: ffffffffffffffda RBX: 00007f817c092fa8 RCX: 00007f817bf7c21d
[ 184.383138] RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI: 0000000000000001
[ 184.383582] RBP: 00007fff92c8d1a0 R08: 00007fff92c8d0e8 R09: 0000000000000000
[ 184.384042] R10: 00007fff92c8d05f R11: 0000000000000206 R12: 0000000000000001
[ 184.384486] R13: 0000000000000000 R14: 0000000000000001 R15: 00007f817c092fc0
[ 184.384963] </TASK>
[ 184.385112]
[ 184.385112] Showing all locks held in the system:
[ 184.385499] 1 lock held by khungtaskd/32:
[ 184.385793] #0: ffffffff9d195d90 (rcu_read_lock){....}-{1:3}, at: debug_show_all_locks+0x2e/0x180
[ 184.386366] 2 locks held by kworker/u8:10/455:
[ 184.386649] #0: ffffa1a240104d48 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_scheduled_works+0x23a/0x600
[ 184.387357] #1: ffffb06380a23e20 ((work_completion)(&sub_info->work)){+.+.}-{0:0}, at: process_scheduled_works+0x25a/0x600
[ 184.388076]
[ 184.388183] =============================================
I pushed full logs to github:
https://github.com/kernel-patches/bpf/commit/88c0d0e1692b04c0d54b7c1941003758d23e0d6a
I recommend trying to reproduce with steps I shared in my initial report:
https://lore.kernel.org/bpf/a7x33d4dnMdGTtRivptq6S1i8btK70SNBP2XyX_xwDAhLvgQoPox6FVBOkifq4eBinfFfbZlIkMZBe3QarlWTxoEtHZwJCZbNKtaqrR7PvI=@pm.me/
I know it may not be very convenient due to all the CI stuff, but you
should be able to use it to iterate on the kernel source locally and
narrow down the problem.
I have everything set up, so you also might share some debugging code
for me to run if you prefer.
Thanks.
---
Not directly related, but it took me a while to figure out how to
collect the netfs stats.
I first added:
CONFIG_NETFS_DEBUG=y
CONFIG_NETFS_STATS=y
But that didn't work, because /proc/fs/netfs/stats is created only
with CONFIG_FSCACHE_STATS (fs/netfs/main.c):
#ifdef CONFIG_FSCACHE_STATS
if (!proc_create_single("fs/netfs/stats", S_IFREG | 0444, NULL,
netfs_stats_show))
goto error_procfile;
#endif
And that depends on CONFIG_FSCACHE=y, so I ended up with:
CONFIG_FSCACHE=y
CONFIG_FSCACHE_STATS=y
CONFIG_NETFS_SUPPORT=y
CONFIG_NETFS_DEBUG=y
CONFIG_NETFS_STATS=y
* Re: [PATCH] netfs: Add retry stat counters
2025-01-28 19:11 ` Ihor Solodrai
@ 2025-02-03 18:01 ` Ihor Solodrai
0 siblings, 0 replies; 13+ messages in thread
From: Ihor Solodrai @ 2025-02-03 18:01 UTC (permalink / raw)
To: David Howells, Marc Dionne, Steve French
Cc: dhowells, Eric Van Hensbergen, Latchesar Ionkov,
Dominique Martinet, Christian Schoenebeck, Paulo Alcantara,
Jeff Layton, Christian Brauner, v9fs, linux-cifs, netfs,
linux-fsdevel, linux-kernel, ast, bpf
On 1/28/25 11:11 AM, Ihor Solodrai wrote:
> January 28, 2025 at 1:33 AM, "David Howells" <dhowells@redhat.com> wrote:
>
>> Here's an additional patch to allow stats on the number of retries to be
>> obtained. This isn't a fix per se.
>> David
>>
>> [...]
>
> Hi David, Marc.
>
> I regret to report that this patch didn't fix the hang in
> v9fs_evict_inode when running selftests/bpf.
David, Marc,
Were you able to reproduce the hang?
We're still using the revert of the original patch [1] in BPF CI as a
mitigation.
If there is other pending work that may be relevant, please share.
Thanks.
[1] https://lore.kernel.org/all/20241216204124.3752367-28-dhowells@redhat.com/
>
> [...]
* Re: [PATCH] netfs: Add retry stat counters
2025-01-28 9:33 ` [PATCH] netfs: Add retry stat counters David Howells
2025-01-28 19:11 ` Ihor Solodrai
@ 2025-02-10 10:57 ` David Howells
2025-02-10 11:12 ` David Howells
2 siblings, 0 replies; 13+ messages in thread
From: David Howells @ 2025-02-10 10:57 UTC (permalink / raw)
To: Ihor Solodrai
Cc: dhowells, Marc Dionne, Steve French, Eric Van Hensbergen,
Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck,
Paulo Alcantara, Jeff Layton, Christian Brauner, v9fs, linux-cifs,
netfs, linux-fsdevel, linux-kernel, ast, bpf
Ihor Solodrai <ihor.solodrai@linux.dev> wrote:
> I recommend trying to reproduce with steps I shared in my initial report:
> https://lore.kernel.org/bpf/a7x33d4dnMdGTtRivptq6S1i8btK70SNBP2XyX_xwDAhLvgQoPox6FVBOkifq4eBinfFfbZlIkMZBe3QarlWTxoEtHZwJCZbNKtaqrR7PvI=@pm.me/
>
> I know it may not be very convenient due to all the CI stuff,
That's an understatement. :-)
> but you should be able to use it to iterate on the kernel source locally and
> narrow down the problem.
Can you share just the reproducer without all the docker stuff? Is this one
of those tests that requires 9p over virtio? I have a different environment
for that.
David
* Re: [PATCH] netfs: Add retry stat counters
2025-01-28 9:33 ` [PATCH] netfs: Add retry stat counters David Howells
2025-01-28 19:11 ` Ihor Solodrai
2025-02-10 10:57 ` David Howells
@ 2025-02-10 11:12 ` David Howells
2025-02-10 21:54 ` Ihor Solodrai
2025-02-10 23:18 ` David Howells
2 siblings, 2 replies; 13+ messages in thread
From: David Howells @ 2025-02-10 11:12 UTC (permalink / raw)
To: Ihor Solodrai
Cc: dhowells, Marc Dionne, Steve French, Eric Van Hensbergen,
Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck,
Paulo Alcantara, Jeff Layton, Christian Brauner, v9fs, linux-cifs,
netfs, linux-fsdevel, linux-kernel, ast, bpf
Ihor Solodrai <ihor.solodrai@linux.dev> wrote:
> Bash piece starting a process collecting /proc/fs/netfs/stats:
>
> function tail_netfs {
> echo -n > /mnt/vmtest/netfs-stats.log
> while true; do
> echo >> /mnt/vmtest/netfs-stats.log
> cat /proc/fs/netfs/stats >> /mnt/vmtest/netfs-stats.log
> sleep 1
> done
> }
> export -f tail_netfs
> nohup bash -c 'tail_netfs' & disown
I'm afraid intermediate snapshots of this file aren't particularly useful -
just the last snapshot:
> Last recorded /proc/fs/netfs/stats (note 0 retries):
>
> Reads : DR=0 RA=15184 RF=5 RS=0 WB=0 WBZ=0
> Writes : BW=488 WT=0 DW=0 WP=488 2C=0
> ZeroOps: ZR=7964 sh=0 sk=0
> DownOps: DL=15189 ds=15189 df=0 di=0
> CaRdOps: RD=0 rs=0 rf=0
> UpldOps: UL=488 us=488 uf=0
> CaWrOps: WR=0 ws=0 wf=0
> Retries: rq=0 rs=0 wq=0 ws=0
> Objs : rr=2 sr=1 foq=1 wsc=0
> WbLock : skip=0 wait=0
> -- FS-Cache statistics --
> Cookies: n=0 v=0 vcol=0 voom=0
> Acquire: n=0 ok=0 oom=0
> LRU : n=0 exp=0 rmv=0 drp=0 at=0
> Invals : n=0
> Updates: n=0 rsz=0 rsn=0
> Relinqs: n=0 rtr=0 drop=0
> NoSpace: nwr=0 ncr=0 cull=0
> IO : rd=0 wr=0 mis=0
Could you collect some tracing:
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_read/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_write/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_write_iter/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_rreq/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_rreq_ref/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_sreq/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_sreq_ref/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_failure/enable
and then collect the tracelog:
trace-cmd show | bzip2 >some_file_somewhere.bz2
And if you could collect /proc/fs/netfs/requests as well, that will show the
debug IDs of the hanging requests. These can be used to grep the trace by
prepending "R=". For example, if you see:
REQUEST OR REF FL ERR OPS COVERAGE
======== == === == ==== === =========
00000043 WB 1 2120 0 0 @34000000 0/0
then:
trace-cmd show | grep R=00000043
Thanks,
David
* Re: [PATCH] netfs: Add retry stat counters
2025-02-10 11:12 ` David Howells
@ 2025-02-10 21:54 ` Ihor Solodrai
2025-02-10 23:18 ` David Howells
1 sibling, 0 replies; 13+ messages in thread
From: Ihor Solodrai @ 2025-02-10 21:54 UTC (permalink / raw)
To: David Howells
Cc: dhowells, Marc Dionne, Steve French, Eric Van Hensbergen,
Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck,
Paulo Alcantara, Jeff Layton, Christian Brauner, v9fs, linux-cifs,
netfs, linux-fsdevel, linux-kernel, ast, bpf
On 2/10/25 2:57 AM, David Howells wrote:
> Ihor Solodrai <ihor.solodrai@linux.dev> wrote:
>
>> I recommend trying to reproduce with steps I shared in my initial report:
>> https://lore.kernel.org/bpf/a7x33d4dnMdGTtRivptq6S1i8btK70SNBP2XyX_xwDAhLvgQoPox6FVBOkifq4eBinfFfbZlIkMZBe3QarlWTxoEtHZwJCZbNKtaqrR7PvI=@pm.me/
>>
>> I know it may not be very convenient due to all the CI stuff,
>
> That's an understatement. :-)
>
>> but you should be able to use it to iterate on the kernel source locally and
>> narrow down the problem.
>
> Can you share just the reproducer without all the docker stuff?
I wrote a couple of shell scripts capturing the gist of what happens in
CI: build the kernel, build the selftests, and run them. You may try them.
Pull this branch from my github:
https://github.com/theihor/bpf/tree/netfs-debug
It's the kernel source in a broken state with the scripts.
Inlining the scripts here:
## ./reproducer.sh
#!/bin/bash
set -euo pipefail
export KBUILD_OUTPUT=$(realpath kbuild-output)
mkdir -p $KBUILD_OUTPUT
cp -f repro.config $KBUILD_OUTPUT/.config
make olddefconfig
make -j$(nproc) all
make -j$(nproc) headers
# apt install lsb-release wget software-properties-common gnupg
# bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"
export LLVM_VERSION=18
make -C tools/testing/selftests/bpf \
CLANG=clang-${LLVM_VERSION} \
LLC=llc-${LLVM_VERSION} \
LLVM_STRIP=llvm-strip-${LLVM_VERSION} \
-j$(nproc) test_progs-no_alu32
# wget https://github.com/danobi/vmtest/releases/download/v0.15.0/vmtest-x86_64
# chmod +x vmtest-x86_64
./vmtest-x86_64 -k $KBUILD_OUTPUT/$(make -s image_name) ./run-bpf-selftests.sh | tee test.log
## end of ./reproducer.sh
## ./run-bpf-selftests.sh
#!/bin/bash
/bin/mount bpffs /sys/fs/bpf -t bpf
ip link set lo up
echo 10 > /proc/sys/kernel/hung_task_timeout_secs
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_read/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_write/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_write_iter/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_rreq/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_rreq_ref/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_sreq/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_sreq_ref/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_failure/enable
function tail_proc {
src=$1
dst=$2
echo -n > $dst
while true; do
echo >> $dst
cat $src >> $dst
sleep 1
done
}
export -f tail_proc
nohup bash -c 'tail_proc /proc/fs/netfs/stats netfs-stats.log' & disown
nohup bash -c 'tail_proc /proc/fs/netfs/requests netfs-requests.log' & disown
nohup bash -c 'trace-cmd show -p > trace-cmd.log' & disown
cd tools/testing/selftests/bpf
./test_progs-no_alu32
## end of ./run-bpf-selftests.sh
One of the reasons for suggesting docker is that all the dependencies
are pre-packaged in the image, and so the environment is pretty close
to the actual CI environment. With only shell scripts you will have to
detect and install missing dependencies on your system and hope
package versions are more or less the same and don't affect the issue.
Notable things: LLVM 18, pahole, qemu, qemu-guest-agent, vmtest tool.
> Is this one
> of those tests that requires 9p over virtio? I have a different environment
> for that.
We run the tests via vmtest tool: https://github.com/danobi/vmtest
This is essentially a qemu wrapper.
I am not familiar with its internals, but it definitely uses 9p.
On 2/10/25 3:12 AM, David Howells wrote:
> Ihor Solodrai <ihor.solodrai@linux.dev> wrote:
>
>> Bash piece starting a process collecting /proc/fs/netfs/stats:
>>
>> function tail_netfs {
>> echo -n > /mnt/vmtest/netfs-stats.log
>> while true; do
>> echo >> /mnt/vmtest/netfs-stats.log
>> cat /proc/fs/netfs/stats >> /mnt/vmtest/netfs-stats.log
>> sleep 1
>> done
>> }
>> export -f tail_netfs
>> nohup bash -c 'tail_netfs' & disown
>
> I'm afraid intermediate snapshots of this file aren't particularly useful -
> just the last snapshot:
The reason I wrote it like this is that the test runner hangs, and so
I have to kill qemu to stop it (with no ability to run post-processing
within the qemu instance; at least, I don't know how to do it).
>
> [...]
>
> Could you collect some tracing:
>
> echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_read/enable
> echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_write/enable
> echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_write_iter/enable
> echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_rreq/enable
> echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_rreq_ref/enable
> echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_sreq/enable
> echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_sreq_ref/enable
> echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_failure/enable
>
> and then collect the tracelog:
>
> trace-cmd show | bzip2 >some_file_somewhere.bz2
>
> And if you could collect /proc/fs/netfs/requests as well, that will show the
> debug IDs of the hanging requests. These can be used to grep the trace by
> prepending "R=". For example, if you see:
>
> REQUEST OR REF FL ERR OPS COVERAGE
> ======== == === == ==== === =========
> 00000043 WB 1 2120 0 0 @34000000 0/0
>
> then:
>
> trace-cmd show | grep R=00000043
Done. I pushed the logs to the previously mentioned github branch:
https://github.com/kernel-patches/bpf/commit/699a3bb95e2291d877737438fb641628702fd18f
Let me know if I can help with anything else.
>
> Thanks,
> David
>
* Re: [PATCH] netfs: Add retry stat counters
2025-02-10 11:12 ` David Howells
2025-02-10 21:54 ` Ihor Solodrai
@ 2025-02-10 23:18 ` David Howells
2025-02-11 0:54 ` Ihor Solodrai
2025-02-12 9:47 ` [PATCH] netfs: Fix setting NETFS_RREQ_ALL_QUEUED to be after all subreqs queued David Howells
1 sibling, 2 replies; 13+ messages in thread
From: David Howells @ 2025-02-10 23:18 UTC (permalink / raw)
To: Ihor Solodrai
Cc: dhowells, Marc Dionne, Steve French, Eric Van Hensbergen,
Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck,
Paulo Alcantara, Jeff Layton, Christian Brauner, v9fs, linux-cifs,
netfs, linux-fsdevel, linux-kernel, ast, bpf
Ihor Solodrai <ihor.solodrai@linux.dev> wrote:
> Done. I pushed the logs to the previously mentioned github branch:
> https://github.com/kernel-patches/bpf/commit/699a3bb95e2291d877737438fb641628702fd18f
Perfect, thanks.
Looking at the last record of /proc/fs/netfs/requests, I see:
REQUEST OR REF FL ERR OPS COVERAGE
======== == === == ==== === =========
00000a98 RA 1 2001 0 0 @0000 2000/2000
So the request of interest is R=00000a98 in the trace. Grepping for that, I
see (with a few columns cut out):
test_progs-no_a-97: netfs_rreq_ref: R=00000a98 NEW r=1
test_progs-no_a-97: netfs_read: R=00000a98 READAHEAD c=00000000 ni=2ec02f16 s=0 l=2000 sz=17a8
test_progs-no_a-97: netfs_rreq_ref: R=00000a98 GET SUBREQ r=2
test_progs-no_a-97: netfs_sreq: R=00000a98[1] DOWN TERM f=192 s=0 17a8/17a8 s=1 e=0
test_progs-no_a-97: netfs_rreq_ref: R=00000a98 GET WORK r=3
test_progs-no_a-97: netfs_sreq_ref: R=00000a98[1] PUT TERM r=1
test_progs-no_a-97: netfs_rreq_ref: R=00000a98 GET SUBREQ r=4
test_progs-no_a-97: netfs_sreq: R=00000a98[2] ZERO SUBMT f=00 s=17a8 0/858 s=0 e=0
kworker/u8:2-36: netfs_rreq_ref: R=00000a98 SEE WORK r=4
kworker/u8:2-36: netfs_rreq: R=00000a98 RA COLLECT f=2021
kworker/u8:2-36: netfs_sreq: R=00000a98[1] DOWN DSCRD f=92 s=0 17a8/17a8 s=1 e=0
kworker/u8:2-36: netfs_sreq_ref: R=00000a98[1] PUT DONE r=0
kworker/u8:2-36: netfs_sreq: R=00000a98[1] DOWN FREE f=92 s=0 17a8/17a8 s=1 e=0
kworker/u8:2-36: netfs_rreq_ref: R=00000a98 PUT SUBREQ r=3
kworker/u8:2-36: netfs_rreq: R=00000a98 RA COMPLET f=2021
kworker/u8:2-36: netfs_rreq: R=00000a98 RA WAKE-IP f=2021
kworker/u8:2-36: netfs_rreq: R=00000a98 RA DONE f=2001
kworker/u8:2-36: netfs_rreq_ref: R=00000a98 PUT WORK r=2
test_progs-no_a-97: netfs_sreq: R=00000a98[2] ZERO TERM f=102 s=17a8 858/858 s=1 e=0
test_progs-no_a-97: netfs_rreq_ref: R=00000a98 GET WORK r=3
test_progs-no_a-97: netfs_sreq_ref: R=00000a98[2] PUT TERM r=1
test_progs-no_a-97: netfs_rreq_ref: R=00000a98 PUT RETURN r=2
kworker/u8:2-36: netfs_rreq_ref: R=00000a98 SEE WORK r=2
kworker/u8:2-36: netfs_rreq_ref: R=00000a98 PUT WORK r=1
You can see subrequest 1 completes fine, the subrequest is freed and the ref
it had on the request is put:
netfs_sreq: R=00000a98[1] DOWN FREE f=92 s=0 17a8/17a8 s=1 e=0
netfs_rreq_ref: R=00000a98 PUT SUBREQ r=3
Subrequest 2, however, isn't collected:
netfs_sreq: R=00000a98[2] ZERO SUBMT f=00 s=17a8 0/858 s=0 e=0
netfs_sreq: R=00000a98[2] ZERO TERM f=102 s=17a8 858/858 s=1 e=0
netfs_sreq_ref: R=00000a98[2] PUT TERM r=1
and the work happens again in kworker/u8:2-36 right at the end:
netfs_rreq_ref: R=00000a98 SEE WORK r=2
netfs_rreq_ref: R=00000a98 PUT WORK r=1
but this doesn't do anything.
The excess buffer clearance happened in the app thread (test_progs-no_a-97):
netfs_sreq: R=00000a98[2] ZERO TERM f=102 s=17a8 858/858 s=1 e=0
> Let me know if I can help with anything else.
Can you add some more tracepoints?
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_collect/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_collect_sreq/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_collect_state/enable
However, I think I may have spotted the issue: I'm mixing
clear_and_wake_up_bit() for NETFS_RREQ_IN_PROGRESS (which will use a common
system waitqueue) with waiting on an rreq-specific waitqueue in functions
such as netfs_wait_for_read().
I'll work up a fix patch for that tomorrow.
Thanks,
David
* Re: [PATCH] netfs: Add retry stat counters
2025-02-10 23:18 ` David Howells
@ 2025-02-11 0:54 ` Ihor Solodrai
2025-02-12 9:47 ` [PATCH] netfs: Fix setting NETFS_RREQ_ALL_QUEUED to be after all subreqs queued David Howells
1 sibling, 0 replies; 13+ messages in thread
From: Ihor Solodrai @ 2025-02-11 0:54 UTC (permalink / raw)
To: David Howells
Cc: dhowells, Marc Dionne, Steve French, Eric Van Hensbergen,
Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck,
Paulo Alcantara, Jeff Layton, Christian Brauner, v9fs, linux-cifs,
netfs, linux-fsdevel, linux-kernel, ast, bpf
On 2/10/25 3:18 PM, David Howells wrote:
> Ihor Solodrai <ihor.solodrai@linux.dev> wrote:
>
>> Done. I pushed the logs to the previously mentioned github branch:
>> https://github.com/kernel-patches/bpf/commit/699a3bb95e2291d877737438fb641628702fd18f
>
> Perfect, thanks.
>
> Looking at the last record of /proc/fs/netfs/requests, I see:
>
> REQUEST OR REF FL ERR OPS COVERAGE
> ======== == === == ==== === =========
> 00000a98 RA 1 2001 0 0 @0000 2000/2000
>
> So the request of interest is R=00000a98 in the trace. Grepping for that, I
> see (with a few columns cut out):
>
> test_progs-no_a-97: netfs_rreq_ref: R=00000a98 NEW r=1
> test_progs-no_a-97: netfs_read: R=00000a98 READAHEAD c=00000000 ni=2ec02f16 s=0 l=2000 sz=17a8
> test_progs-no_a-97: netfs_rreq_ref: R=00000a98 GET SUBREQ r=2
> test_progs-no_a-97: netfs_sreq: R=00000a98[1] DOWN TERM f=192 s=0 17a8/17a8 s=1 e=0
> test_progs-no_a-97: netfs_rreq_ref: R=00000a98 GET WORK r=3
> test_progs-no_a-97: netfs_sreq_ref: R=00000a98[1] PUT TERM r=1
> test_progs-no_a-97: netfs_rreq_ref: R=00000a98 GET SUBREQ r=4
> test_progs-no_a-97: netfs_sreq: R=00000a98[2] ZERO SUBMT f=00 s=17a8 0/858 s=0 e=0
> kworker/u8:2-36: netfs_rreq_ref: R=00000a98 SEE WORK r=4
> kworker/u8:2-36: netfs_rreq: R=00000a98 RA COLLECT f=2021
> kworker/u8:2-36: netfs_sreq: R=00000a98[1] DOWN DSCRD f=92 s=0 17a8/17a8 s=1 e=0
> kworker/u8:2-36: netfs_sreq_ref: R=00000a98[1] PUT DONE r=0
> kworker/u8:2-36: netfs_sreq: R=00000a98[1] DOWN FREE f=92 s=0 17a8/17a8 s=1 e=0
> kworker/u8:2-36: netfs_rreq_ref: R=00000a98 PUT SUBREQ r=3
> kworker/u8:2-36: netfs_rreq: R=00000a98 RA COMPLET f=2021
> kworker/u8:2-36: netfs_rreq: R=00000a98 RA WAKE-IP f=2021
> kworker/u8:2-36: netfs_rreq: R=00000a98 RA DONE f=2001
> kworker/u8:2-36: netfs_rreq_ref: R=00000a98 PUT WORK r=2
> test_progs-no_a-97: netfs_sreq: R=00000a98[2] ZERO TERM f=102 s=17a8 858/858 s=1 e=0
> test_progs-no_a-97: netfs_rreq_ref: R=00000a98 GET WORK r=3
> test_progs-no_a-97: netfs_sreq_ref: R=00000a98[2] PUT TERM r=1
> test_progs-no_a-97: netfs_rreq_ref: R=00000a98 PUT RETURN r=2
> kworker/u8:2-36: netfs_rreq_ref: R=00000a98 SEE WORK r=2
> kworker/u8:2-36: netfs_rreq_ref: R=00000a98 PUT WORK r=1
>
> You can see subrequest 1 completes fine, the subrequest is freed and the ref
> it had on the request is put:
>
> netfs_sreq: R=00000a98[1] DOWN FREE f=92 s=0 17a8/17a8 s=1 e=0
> netfs_rreq_ref: R=00000a98 PUT SUBREQ r=3
>
> Subrequest 2, however, isn't collected:
>
> netfs_sreq: R=00000a98[2] ZERO SUBMT f=00 s=17a8 0/858 s=0 e=0
> netfs_sreq: R=00000a98[2] ZERO TERM f=102 s=17a8 858/858 s=1 e=0
> netfs_sreq_ref: R=00000a98[2] PUT TERM r=1
>
> and the work happens again in kworker/u8:2-36 right at the end:
>
> netfs_rreq_ref: R=00000a98 SEE WORK r=2
> netfs_rreq_ref: R=00000a98 PUT WORK r=1
>
> but this doesn't do anything.
>
> The excess buffer clearance happened in the app thread (test_progs-no_a-97):
>
> netfs_sreq: R=00000a98[2] ZERO TERM f=102 s=17a8 858/858 s=1 e=0
>
>> Let me know if I can help with anything else.
>
> Can you add some more tracepoints?
>
> echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_collect/enable
> echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_collect_sreq/enable
> echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_collect_state/enable
See here: https://github.com/kernel-patches/bpf/commit/517f51d1f6c09ebab9df3e3d17bb669601ab14ef
Beware: the uncompressed trace-cmd.log is 37 MB.
>
> However, I think I may have spotted the issue: I'm mixing
> clear_and_wake_up_bit() for NETFS_RREQ_IN_PROGRESS (which will use a common
> system waitqueue) with waiting on an rreq-specific waitqueue in functions
> such as netfs_wait_for_read().
>
> I'll work up a fix patch for that tomorrow.
Great! Thank you.
>
> Thanks,
> David
>
* Re: [PATCH] netfs: Fix a number of read-retry hangs
2025-01-28 0:33 [PATCH] netfs: Fix a number of read-retry hangs David Howells
2025-01-28 9:33 ` [PATCH] netfs: Add retry stat counters David Howells
2025-01-28 16:51 ` [PATCH] netfs: Fix a number of read-retry hangs Marc Dionne
@ 2025-02-11 1:07 ` Steve French
2 siblings, 0 replies; 13+ messages in thread
From: Steve French @ 2025-02-11 1:07 UTC (permalink / raw)
To: David Howells
Cc: Ihor Solodrai, Marc Dionne, Steve French, Eric Van Hensbergen,
Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck,
Paulo Alcantara, Jeff Layton, Christian Brauner, v9fs, linux-cifs,
netfs, linux-fsdevel, linux-kernel
Tested-by: Steve French <stfrench@microsoft.com>
Verified that it fixed various hangs I saw in testing cifs.ko. Here is a
successful run with the patch, e.g.:
http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/5/builds/342
On Mon, Jan 27, 2025 at 6:33 PM David Howells <dhowells@redhat.com> wrote:
>
> Hi Ihor, Marc, Steve,
>
> I think this patch should better fix the hangs that have been seen than Ihor's
> previously tested fix (which I think isn't actually correct).
>
> David
> ---
> From: David Howells <dhowells@redhat.com>
>
> netfs: Fix a number of read-retry hangs
>
> Fix a number of hangs in the netfslib read-retry code, including:
>
> (1) netfs_reissue_read() doubles up the getting of references on
> subrequests, thereby leaking the subrequest and causing inode eviction
> to wait indefinitely. This can lead to the kernel reporting a hang in
> the filesystem's evict_inode().
>
> Fix this by removing the get from netfs_reissue_read() and adding one
> to netfs_retry_read_subrequests() to deal with the one place that
> didn't double up.
>
> (2) The loop in netfs_retry_read_subrequests() that retries a sequence of
> failed subrequests doesn't record whether or not it retried the one
> that the "subreq" pointer points to when it leaves the loop. It may
> not if renegotiation/repreparation of the subrequests means that fewer
> subrequests are needed to span the cumulative range of the sequence.
>
> Because it doesn't record this, the piece of code that discards
> now-superfluous subrequests doesn't know whether it should discard the
> one "subreq" points to - and so it doesn't.
>
> Fix this by noting whether the last subreq it examines is superfluous
> and if it is, then getting rid of it and all subsequent subrequests.
>
> If that one wasn't superfluous, then we would have tried to go
> round the previous loop again and so there can be no further unretried
> subrequests in the sequence.
>
> (3) netfs_retry_read_subrequests() gets yet an extra ref on any additional
> subrequests it has to get because it ran out of ones it could reuse due
> to renegotiation/repreparation shrinking the subrequests.
>
> Fix this by removing that extra ref.
>
> (4) In netfs_retry_reads(), it was using wait_on_bit() to wait for
> NETFS_SREQ_IN_PROGRESS to be cleared on all subrequests in the
> sequence - but netfs_read_subreq_terminated() is now using a wait
> queue on the request instead and so this wait will never finish.
>
> Fix this by waiting on the wait queue instead. To make this work, a
> new flag, NETFS_RREQ_RETRYING, is now set around the wait loop to tell
> the wake-up code to wake up the wait queue rather than requeuing the
> request's work item.
>
> Note that this flag replaces the NETFS_RREQ_NEED_RETRY flag which is
> no longer used.
>
> (5) Whilst not strictly anything to do with the hang,
> netfs_retry_read_subrequests() was also doubly incrementing the
> subreq_counter and re-setting the debug index, leaving a gap in the
> trace. This is also fixed.
>
> One of these hangs was observed with 9p and with cifs. Others were forced
> by manual code injection into fs/afs/file.c. Firstly, afs_prepare_read()
> was created to provide a changing pattern of maximum subrequest sizes:
>
> static int afs_prepare_read(struct netfs_io_subrequest *subreq)
> {
> struct netfs_io_request *rreq = subreq->rreq;
> if (!S_ISREG(subreq->rreq->inode->i_mode))
> return 0;
> if (subreq->retry_count < 20)
> rreq->io_streams[0].sreq_max_len =
> umax(200, 2222 - subreq->retry_count * 40);
> else
> rreq->io_streams[0].sreq_max_len = 3333;
> return 0;
> }
>
> and pointed to by afs_req_ops. Then the following:
>
> struct netfs_io_subrequest *subreq = op->fetch.subreq;
> if (subreq->error == 0 &&
> S_ISREG(subreq->rreq->inode->i_mode) &&
> subreq->retry_count < 20) {
> subreq->transferred = subreq->already_done;
> __clear_bit(NETFS_SREQ_HIT_EOF, &subreq->flags);
> __set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
> afs_fetch_data_notify(op);
> return;
> }
>
> was inserted into afs_fetch_data_success() at the beginning and struct
> netfs_io_subrequest given an extra field, "already_done" that was set to
> the value in "subreq->transferred" by netfs_reissue_read().
>
> When reading a 4K file, the subrequests would get gradually smaller, a new
> subrequest would be allocated around the 3rd retry and then eventually be
> rendered superfluous when the 20th retry was hit and the limit on the first
> subrequest was eased.
>
> Reported-by: Ihor Solodrai <ihor.solodrai@pm.me>
> Closes: https://lore.kernel.org/r/a7x33d4dnMdGTtRivptq6S1i8btK70SNBP2XyX_xwDAhLvgQoPox6FVBOkifq4eBinfFfbZlIkMZBe3QarlWTxoEtHZwJCZbNKtaqrR7PvI=@pm.me/
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Marc Dionne <marc.dionne@auristor.com>
> cc: Steve French <stfrench@microsoft.com>
> cc: Eric Van Hensbergen <ericvh@kernel.org>
> cc: Latchesar Ionkov <lucho@ionkov.net>
> cc: Dominique Martinet <asmadeus@codewreck.org>
> cc: Christian Schoenebeck <linux_oss@crudebyte.com>
> cc: Paulo Alcantara <pc@manguebit.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: v9fs@lists.linux.dev
> cc: linux-cifs@vger.kernel.org
> cc: netfs@lists.linux.dev
> cc: linux-fsdevel@vger.kernel.org
> ---
> fs/netfs/read_collect.c | 6 ++++--
> fs/netfs/read_retry.c | 40 ++++++++++++++++++++++++++++++----------
> include/linux/netfs.h | 2 +-
> include/trace/events/netfs.h | 4 +++-
> 4 files changed, 38 insertions(+), 14 deletions(-)
>
> diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
> index f65affa5a9e4..636cc5a98ef5 100644
> --- a/fs/netfs/read_collect.c
> +++ b/fs/netfs/read_collect.c
> @@ -470,7 +470,8 @@ void netfs_read_collection_worker(struct work_struct *work)
> */
> void netfs_wake_read_collector(struct netfs_io_request *rreq)
> {
> - if (test_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &rreq->flags)) {
> + if (test_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &rreq->flags) &&
> + !test_bit(NETFS_RREQ_RETRYING, &rreq->flags)) {
> if (!work_pending(&rreq->work)) {
> netfs_get_request(rreq, netfs_rreq_trace_get_work);
> if (!queue_work(system_unbound_wq, &rreq->work))
> @@ -586,7 +587,8 @@ void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq)
> smp_mb__after_atomic(); /* Clear IN_PROGRESS before task state */
>
> /* If we are at the head of the queue, wake up the collector. */
> - if (list_is_first(&subreq->rreq_link, &stream->subrequests))
> + if (list_is_first(&subreq->rreq_link, &stream->subrequests) ||
> + test_bit(NETFS_RREQ_RETRYING, &rreq->flags))
> netfs_wake_read_collector(rreq);
>
> netfs_put_subrequest(subreq, true, netfs_sreq_trace_put_terminated);
> diff --git a/fs/netfs/read_retry.c b/fs/netfs/read_retry.c
> index 2290af0d51ac..8316c4533a51 100644
> --- a/fs/netfs/read_retry.c
> +++ b/fs/netfs/read_retry.c
> @@ -14,7 +14,6 @@ static void netfs_reissue_read(struct netfs_io_request *rreq,
> {
> __clear_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
> __set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
> - netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
> subreq->rreq->netfs_ops->issue_read(subreq);
> }
>
> @@ -48,6 +47,7 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
> __clear_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
> subreq->retry_count++;
> netfs_reset_iter(subreq);
> + netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
> netfs_reissue_read(rreq, subreq);
> }
> }
> @@ -75,7 +75,7 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
> struct iov_iter source;
> unsigned long long start, len;
> size_t part;
> - bool boundary = false;
> + bool boundary = false, subreq_superfluous = false;
>
> /* Go through the subreqs and find the next span of contiguous
> * buffer that we then rejig (cifs, for example, needs the
> @@ -116,8 +116,10 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
> /* Work through the sublist. */
> subreq = from;
> list_for_each_entry_from(subreq, &stream->subrequests, rreq_link) {
> - if (!len)
> + if (!len) {
> + subreq_superfluous = true;
> break;
> + }
> subreq->source = NETFS_DOWNLOAD_FROM_SERVER;
> subreq->start = start - subreq->transferred;
> subreq->len = len + subreq->transferred;
> @@ -154,19 +156,21 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
>
> netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
> netfs_reissue_read(rreq, subreq);
> - if (subreq == to)
> + if (subreq == to) {
> + subreq_superfluous = false;
> break;
> + }
> }
>
> /* If we managed to use fewer subreqs, we can discard the
> * excess; if we used the same number, then we're done.
> */
> if (!len) {
> - if (subreq == to)
> + if (!subreq_superfluous)
> continue;
> list_for_each_entry_safe_from(subreq, tmp,
> &stream->subrequests, rreq_link) {
> - trace_netfs_sreq(subreq, netfs_sreq_trace_discard);
> + trace_netfs_sreq(subreq, netfs_sreq_trace_superfluous);
> list_del(&subreq->rreq_link);
> netfs_put_subrequest(subreq, false, netfs_sreq_trace_put_done);
> if (subreq == to)
> @@ -187,14 +191,12 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
> subreq->source = NETFS_DOWNLOAD_FROM_SERVER;
> subreq->start = start;
> subreq->len = len;
> - subreq->debug_index = atomic_inc_return(&rreq->subreq_counter);
> subreq->stream_nr = stream->stream_nr;
> subreq->retry_count = 1;
>
> trace_netfs_sreq_ref(rreq->debug_id, subreq->debug_index,
> refcount_read(&subreq->ref),
> netfs_sreq_trace_new);
> - netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
>
> list_add(&subreq->rreq_link, &to->rreq_link);
> to = list_next_entry(to, rreq_link);
> @@ -256,14 +258,32 @@ void netfs_retry_reads(struct netfs_io_request *rreq)
> {
> struct netfs_io_subrequest *subreq;
> struct netfs_io_stream *stream = &rreq->io_streams[0];
> + DEFINE_WAIT(myself);
> +
> + set_bit(NETFS_RREQ_RETRYING, &rreq->flags);
>
> /* Wait for all outstanding I/O to quiesce before performing retries as
> * we may need to renegotiate the I/O sizes.
> */
> list_for_each_entry(subreq, &stream->subrequests, rreq_link) {
> - wait_on_bit(&subreq->flags, NETFS_SREQ_IN_PROGRESS,
> - TASK_UNINTERRUPTIBLE);
> + if (!test_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags))
> + continue;
> +
> + trace_netfs_rreq(rreq, netfs_rreq_trace_wait_queue);
> + for (;;) {
> + prepare_to_wait(&rreq->waitq, &myself, TASK_UNINTERRUPTIBLE);
> +
> + if (!test_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags))
> + break;
> +
> + trace_netfs_sreq(subreq, netfs_sreq_trace_wait_for);
> + schedule();
> + trace_netfs_rreq(rreq, netfs_rreq_trace_woke_queue);
> + }
> +
> + finish_wait(&rreq->waitq, &myself);
> }
> + clear_bit(NETFS_RREQ_RETRYING, &rreq->flags);
>
> trace_netfs_rreq(rreq, netfs_rreq_trace_resubmit);
> netfs_retry_read_subrequests(rreq);
> diff --git a/include/linux/netfs.h b/include/linux/netfs.h
> index 071d05d81d38..c86a11cfc4a3 100644
> --- a/include/linux/netfs.h
> +++ b/include/linux/netfs.h
> @@ -278,7 +278,7 @@ struct netfs_io_request {
> #define NETFS_RREQ_PAUSE 11 /* Pause subrequest generation */
> #define NETFS_RREQ_USE_IO_ITER 12 /* Use ->io_iter rather than ->i_pages */
> #define NETFS_RREQ_ALL_QUEUED 13 /* All subreqs are now queued */
> -#define NETFS_RREQ_NEED_RETRY 14 /* Need to try retrying */
> +#define NETFS_RREQ_RETRYING 14 /* Set if we're in the retry path */
> #define NETFS_RREQ_USE_PGPRIV2 31 /* [DEPRECATED] Use PG_private_2 to mark
> * write to cache on read */
> const struct netfs_request_ops *netfs_ops;
> diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
> index 6e699cadcb29..f880835f7695 100644
> --- a/include/trace/events/netfs.h
> +++ b/include/trace/events/netfs.h
> @@ -99,7 +99,7 @@
> EM(netfs_sreq_trace_limited, "LIMIT") \
> EM(netfs_sreq_trace_need_clear, "N-CLR") \
> EM(netfs_sreq_trace_partial_read, "PARTR") \
> - EM(netfs_sreq_trace_need_retry, "NRTRY") \
> + EM(netfs_sreq_trace_need_retry, "ND-RT") \
> EM(netfs_sreq_trace_prepare, "PREP ") \
> EM(netfs_sreq_trace_prep_failed, "PRPFL") \
> EM(netfs_sreq_trace_progress, "PRGRS") \
> @@ -108,7 +108,9 @@
> EM(netfs_sreq_trace_short, "SHORT") \
> EM(netfs_sreq_trace_split, "SPLIT") \
> EM(netfs_sreq_trace_submit, "SUBMT") \
> + EM(netfs_sreq_trace_superfluous, "SPRFL") \
> EM(netfs_sreq_trace_terminated, "TERM ") \
> + EM(netfs_sreq_trace_wait_for, "_WAIT") \
> EM(netfs_sreq_trace_write, "WRITE") \
> EM(netfs_sreq_trace_write_skip, "SKIP ") \
> E_(netfs_sreq_trace_write_term, "WTERM")
>
>
--
Thanks,
Steve
* [PATCH] netfs: Fix setting NETFS_RREQ_ALL_QUEUED to be after all subreqs queued
2025-02-10 23:18 ` David Howells
2025-02-11 0:54 ` Ihor Solodrai
@ 2025-02-12 9:47 ` David Howells
2025-02-12 18:01 ` Ihor Solodrai
1 sibling, 1 reply; 13+ messages in thread
From: David Howells @ 2025-02-12 9:47 UTC (permalink / raw)
To: Ihor Solodrai
Cc: dhowells, Marc Dionne, Steve French, Eric Van Hensbergen,
Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck,
Paulo Alcantara, Jeff Layton, Christian Brauner, v9fs, linux-cifs,
netfs, linux-fsdevel, linux-kernel, ast, bpf
Hi Ihor,
Okay, the bug you're hitting appears to be different from the one I first
thought. Can you try the attached patch? I managed to reproduce it with AFS
by injecting a delay.
Grepping your logs for the stuck request, you can see the issue:
ip: netfs_rreq_ref: R=00002152 NEW r=1
ip: netfs_read: R=00002152 READAHEAD c=00000000 ni=1034fe3 s=4000 l=3000 sz=6898
ip: netfs_rreq_ref: R=00002152 GET SUBREQ r=2
Subrequest 1 completes synchronously and queues the collector work item:
ip: netfs_sreq: R=00002152[1] DOWN TERM f=192 s=4000 2898/2898 s=2 e=0
ip: netfs_rreq_ref: R=00002152 GET WORK r=3
kworker/u8:3: netfs_rreq_ref: R=00002152 SEE WORK r=3
ip: netfs_sreq_ref: R=00002152[1] PUT TERM r=1
ip: netfs_rreq_ref: R=00002152 GET SUBREQ r=4
Then a new subreq is proposed to clear the end of the page, but it isn't
queued at this point:
ip: netfs_sreq: R=00002152[2] ZERO SUBMT f=00 s=6898 0/768 s=0 e=0
(I should probably move the tracepoint to the queue point to make it more
obvious). The collector processes the subrequests it can see, and
NETFS_RREQ_ALL_QUEUED (0x2000) is set in the flags (f=2021):
kworker/u8:3: netfs_rreq: R=00002152 RA COLLECT f=2021
kworker/u8:3: netfs_collect: R=00002152 s=4000-7000
kworker/u8:3: netfs_collect_sreq: R=00002152[0:01] s=4000 t=2898/2898
kworker/u8:3: netfs_sreq: R=00002152[1] DOWN DSCRD f=92 s=4000 2898/2898 s=2 e=0
kworker/u8:3: netfs_sreq_ref: R=00002152[1] PUT DONE r=0
kworker/u8:3: netfs_sreq: R=00002152[1] DOWN FREE f=92 s=4000 2898/2898 s=2 e=0
kworker/u8:3: netfs_rreq_ref: R=00002152 PUT SUBREQ r=3
The notes (n=x) indicate that the collector didn't see subreq 2 (bit 0,
HIT_PENDING, wasn't set)...:
kworker/u8:3: netfs_collect_state: R=00002152 col=6898 cln=7000 n=c
kworker/u8:3: netfs_collect_state: R=00002152 col=6898 cln=7000 n=8
... and so it completed the request:
kworker/u8:3: netfs_rreq: R=00002152 RA COMPLET f=2021
kworker/u8:3: netfs_rreq: R=00002152 RA WAKE-IP f=2021
And now, NETFS_RREQ_IN_PROGRESS has been cleared, which means we can't get
back into the read collector.
kworker/u8:3: netfs_rreq: R=00002152 RA DONE f=2001
kworker/u8:3: netfs_rreq_ref: R=00002152 PUT WORK r=2
Then subreq 2 finishes and you can see the worker run, but do nothing:
ip: netfs_sreq: R=00002152[2] ZERO TERM f=102 s=6898 768/768 s=2 e=0
ip: netfs_rreq_ref: R=00002152 GET WORK r=3
kworker/u8:3: netfs_rreq_ref: R=00002152 SEE WORK r=3
kworker/u8:3: netfs_rreq_ref: R=00002152 PUT WORK r=2
David
---
netfs: Fix setting NETFS_RREQ_ALL_QUEUED to be after all subreqs queued
Due to the code that queues a subreq on the active subrequest list getting
moved to netfs_issue_read(), the NETFS_RREQ_ALL_QUEUED flag may now get set
before the list-add actually happens. This is not a problem if the
collection worker happens after the list-add, but it's a race - and, for
9P, where the read from the server is synchronous and done in the
submitting thread, this is a lot more likely.
The result is that, if the timing is wrong, a ref gets leaked because the
collector thinks that all the subreqs have completed (because it can't see
the last one yet) and clears NETFS_RREQ_IN_PROGRESS - at which point, the
collection worker no longer enters the collector.
This can be provoked with AFS by injecting an msleep() right before the
final subreq is queued.
Fix this by splitting the queuing part out of netfs_issue_read() into a new
function, netfs_queue_read(), and calling it separately. The setting of
NETFS_RREQ_ALL_QUEUED is then done by netfs_queue_read() whilst it is
holding the spinlock (that's probably unnecessary, but shouldn't hurt).
It might be better to set a flag on the final subreq, but this could be a
problem if an error occurs and we can't queue it.
Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item")
Reported-by: Ihor Solodrai <ihor.solodrai@pm.me>
Closes: https://lore.kernel.org/r/a7x33d4dnMdGTtRivptq6S1i8btK70SNBP2XyX_xwDAhLvgQoPox6FVBOkifq4eBinfFfbZlIkMZBe3QarlWTxoEtHZwJCZbNKtaqrR7PvI=@pm.me/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Steve French <stfrench@microsoft.com>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: v9fs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/buffered_read.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index f761d44b3436..0d1b6d35ff3b 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -155,8 +155,9 @@ static void netfs_read_cache_to_pagecache(struct netfs_io_request *rreq,
netfs_cache_read_terminated, subreq);
}
-static void netfs_issue_read(struct netfs_io_request *rreq,
- struct netfs_io_subrequest *subreq)
+static void netfs_queue_read(struct netfs_io_request *rreq,
+ struct netfs_io_subrequest *subreq,
+ bool last_subreq)
{
struct netfs_io_stream *stream = &rreq->io_streams[0];
@@ -177,8 +178,17 @@ static void netfs_issue_read(struct netfs_io_request *rreq,
}
}
+ if (last_subreq) {
+ smp_wmb(); /* Write lists before ALL_QUEUED. */
+ set_bit(NETFS_RREQ_ALL_QUEUED, &rreq->flags);
+ }
+
spin_unlock(&rreq->lock);
+}
+static void netfs_issue_read(struct netfs_io_request *rreq,
+ struct netfs_io_subrequest *subreq)
+{
switch (subreq->source) {
case NETFS_DOWNLOAD_FROM_SERVER:
rreq->netfs_ops->issue_read(subreq);
@@ -293,11 +303,8 @@ static void netfs_read_to_pagecache(struct netfs_io_request *rreq)
}
size -= slice;
start += slice;
- if (size <= 0) {
- smp_wmb(); /* Write lists before ALL_QUEUED. */
- set_bit(NETFS_RREQ_ALL_QUEUED, &rreq->flags);
- }
+ netfs_queue_read(rreq, subreq, size <= 0);
netfs_issue_read(rreq, subreq);
cond_resched();
} while (size > 0);
* Re: [PATCH] netfs: Fix setting NETFS_RREQ_ALL_QUEUED to be after all subreqs queued
2025-02-12 9:47 ` [PATCH] netfs: Fix setting NETFS_RREQ_ALL_QUEUED to be after all subreqs queued David Howells
@ 2025-02-12 18:01 ` Ihor Solodrai
0 siblings, 0 replies; 13+ messages in thread
From: Ihor Solodrai @ 2025-02-12 18:01 UTC (permalink / raw)
To: David Howells
Cc: dhowells, Marc Dionne, Steve French, Eric Van Hensbergen,
Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck,
Paulo Alcantara, Jeff Layton, Christian Brauner, v9fs, linux-cifs,
netfs, linux-fsdevel, linux-kernel, ast, bpf
On 2/12/25 1:47 AM, David Howells wrote:
> Hi Ihor,
>
> Okay, the bug you're hitting appears to be a different one to the one I
> thought first. Can you try the attached patch? I managed to reproduce it
> with AFS by injecting a delay.
>
> [...]
>
> netfs: Fix setting NETFS_RREQ_ALL_QUEUED to be after all subreqs queued
>
> [...]
Hi David.
I tried this patch locally, and then on BPF CI. It fixes the 9p hanging issue.
A couple of platforms and toolchains are tested there:
https://github.com/kernel-patches/vmtest/actions/runs/13291034531
Tested-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Note that on CI the "netfs: Fix a number of read-retry hangs" patch [1]
is *not* applied. Only this one.
Thank you!
[1] https://lore.kernel.org/v9fs/3173328.1738024385@warthog.procyon.org.uk/
> [...]
Thread overview: 13+ messages
2025-01-28 0:33 [PATCH] netfs: Fix a number of read-retry hangs David Howells
2025-01-28 9:33 ` [PATCH] netfs: Add retry stat counters David Howells
2025-01-28 19:11 ` Ihor Solodrai
2025-02-03 18:01 ` Ihor Solodrai
2025-02-10 10:57 ` David Howells
2025-02-10 11:12 ` David Howells
2025-02-10 21:54 ` Ihor Solodrai
2025-02-10 23:18 ` David Howells
2025-02-11 0:54 ` Ihor Solodrai
2025-02-12 9:47 ` [PATCH] netfs: Fix setting NETFS_RREQ_ALL_QUEUED to be after all subreqs queued David Howells
2025-02-12 18:01 ` Ihor Solodrai
2025-01-28 16:51 ` [PATCH] netfs: Fix a number of read-retry hangs Marc Dionne
2025-02-11 1:07 ` Steve French