[PATCH 6.15 097/178] netfs: Fix hang due to missing case in final DIO read result collection

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 6.15 097/178] netfs: Fix hang due to missing case in final DIO read result collection
       [not found] <20250708162236.549307806@linuxfoundation.org>
@ 2025-07-08 16:22 ` Greg Kroah-Hartman
  2025-07-08 16:22 ` [PATCH 6.15 098/178] netfs: Fix looping in wait functions Greg Kroah-Hartman
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 5+ messages in thread
From: Greg Kroah-Hartman @ 2025-07-08 16:22 UTC (permalink / raw)
  To: stable
  Cc: Greg Kroah-Hartman, patches, Steve French, David Howells,
	Paulo Alcantara, linux-cifs, netfs, linux-fsdevel,
	Christian Brauner, Sasha Levin

6.15-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Howells <dhowells@redhat.com>

[ Upstream commit da8cf4bd458722d090a788c6e581eeb72695c62f ]

When doing a DIO read, if the subrequests we issue fail and cause the
request PAUSE flag to be set to put a pause on subrequest generation, we
may complete collection of the subrequests (possibly discarding them) prior
to the ALL_QUEUED flags being set.

In such a case, netfs_read_collection() doesn't see ALL_QUEUED being set
after netfs_collect_read_results() returns and will just return to the app
(the collector can be seen unpausing the generator in the trace log).

The subrequest generator can then set ALL_QUEUED and the app thread reaches
netfs_wait_for_request().  This causes netfs_collect_in_app() to be called
to see if we're done yet, but there's missing case here.

netfs_collect_in_app() will see that a thread is active and set inactive to
false, but won't see any subrequests in the read stream, and so won't set
need_collect to true.  The function will then just return 0, indicating
that the caller should just sleep until further activity (which won't be
forthcoming) occurs.

Fix this by making netfs_collect_in_app() check to see if an active thread
is complete - i.e. that ALL_QUEUED is set and the subrequests list is empty
- and to skip the sleep return path.  The collector will then be called
which will clear the request IN_PROGRESS flag, allowing the app to
progress.

Fixes: 2b1424cd131c ("netfs: Fix wait/wake to be consistent about the waitqueue used")
Reported-by: Steve French <sfrench@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/20250701163852.2171681-2-dhowells@redhat.com
Tested-by: Steve French <sfrench@samba.org>
Reviewed-by: Paulo Alcantara <pc@manguebit.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/netfs/misc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
index 43b67a28a8fa0..0a54b12034868 100644
--- a/fs/netfs/misc.c
+++ b/fs/netfs/misc.c
@@ -381,7 +381,7 @@ void netfs_wait_for_in_progress_stream(struct netfs_io_request *rreq,
 static int netfs_collect_in_app(struct netfs_io_request *rreq,
 				bool (*collector)(struct netfs_io_request *rreq))
 {
-	bool need_collect = false, inactive = true;
+	bool need_collect = false, inactive = true, done = true;
 
 	for (int i = 0; i < NR_IO_STREAMS; i++) {
 		struct netfs_io_subrequest *subreq;
@@ -400,9 +400,11 @@ static int netfs_collect_in_app(struct netfs_io_request *rreq,
 			need_collect = true;
 			break;
 		}
+		if (subreq || !test_bit(NETFS_RREQ_ALL_QUEUED, &rreq->flags))
+			done = false;
 	}
 
-	if (!need_collect && !inactive)
+	if (!need_collect && !inactive && !done)
 		return 0; /* Sleep */
 
 	__set_current_state(TASK_RUNNING);
-- 
2.39.5




^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 6.15 098/178] netfs: Fix looping in wait functions
       [not found] <20250708162236.549307806@linuxfoundation.org>
  2025-07-08 16:22 ` [PATCH 6.15 097/178] netfs: Fix hang due to missing case in final DIO read result collection Greg Kroah-Hartman
@ 2025-07-08 16:22 ` Greg Kroah-Hartman
  2025-07-08 16:22 ` [PATCH 6.15 099/178] netfs: Fix ref leak on inserted extra subreq in write retry Greg Kroah-Hartman
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 5+ messages in thread
From: Greg Kroah-Hartman @ 2025-07-08 16:22 UTC (permalink / raw)
  To: stable
  Cc: Greg Kroah-Hartman, patches, David Howells, Steve French,
	Paulo Alcantara, netfs, linux-fsdevel, Christian Brauner,
	Sasha Levin

6.15-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Howells <dhowells@redhat.com>

[ Upstream commit 09623e3a14c1c5465124350cd227457c2b0fb017 ]

netfs_wait_for_request() and netfs_wait_for_pause() can loop forever if
netfs_collect_in_app() returns 2, indicating that it wants to repeat
because the ALL_QUEUED flag isn't yet set and there are no subreqs left
that haven't been collected.

The problem is that, unless collection is offloaded (OFFLOAD_COLLECTION),
we have to return to the application thread to continue and eventually set
ALL_QUEUED after pausing to deal with a retry - but we never get there.

Fix this by inserting checks for the IN_PROGRESS and PAUSE flags as
appropriate before cycling round - and add cond_resched() for good measure.

Fixes: 2b1424cd131c ("netfs: Fix wait/wake to be consistent about the waitqueue used")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/20250701163852.2171681-5-dhowells@redhat.com
Tested-by: Steve French <sfrench@samba.org>
Reviewed-by: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/netfs/misc.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
index 0a54b12034868..d8b1a279dbda9 100644
--- a/fs/netfs/misc.c
+++ b/fs/netfs/misc.c
@@ -425,8 +425,8 @@ static int netfs_collect_in_app(struct netfs_io_request *rreq,
 /*
  * Wait for a request to complete, successfully or otherwise.
  */
-static ssize_t netfs_wait_for_request(struct netfs_io_request *rreq,
-				      bool (*collector)(struct netfs_io_request *rreq))
+static ssize_t netfs_wait_for_in_progress(struct netfs_io_request *rreq,
+					  bool (*collector)(struct netfs_io_request *rreq))
 {
 	DEFINE_WAIT(myself);
 	ssize_t ret;
@@ -442,6 +442,9 @@ static ssize_t netfs_wait_for_request(struct netfs_io_request *rreq,
 			case 1:
 				goto all_collected;
 			case 2:
+				if (!netfs_check_rreq_in_progress(rreq))
+					break;
+				cond_resched();
 				continue;
 			}
 		}
@@ -480,12 +483,12 @@ static ssize_t netfs_wait_for_request(struct netfs_io_request *rreq,
 
 ssize_t netfs_wait_for_read(struct netfs_io_request *rreq)
 {
-	return netfs_wait_for_request(rreq, netfs_read_collection);
+	return netfs_wait_for_in_progress(rreq, netfs_read_collection);
 }
 
 ssize_t netfs_wait_for_write(struct netfs_io_request *rreq)
 {
-	return netfs_wait_for_request(rreq, netfs_write_collection);
+	return netfs_wait_for_in_progress(rreq, netfs_write_collection);
 }
 
 /*
@@ -509,6 +512,10 @@ static void netfs_wait_for_pause(struct netfs_io_request *rreq,
 			case 1:
 				goto all_collected;
 			case 2:
+				if (!netfs_check_rreq_in_progress(rreq) ||
+				    !test_bit(NETFS_RREQ_PAUSE, &rreq->flags))
+					break;
+				cond_resched();
 				continue;
 			}
 		}
-- 
2.39.5




^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 6.15 099/178] netfs: Fix ref leak on inserted extra subreq in write retry
       [not found] <20250708162236.549307806@linuxfoundation.org>
  2025-07-08 16:22 ` [PATCH 6.15 097/178] netfs: Fix hang due to missing case in final DIO read result collection Greg Kroah-Hartman
  2025-07-08 16:22 ` [PATCH 6.15 098/178] netfs: Fix looping in wait functions Greg Kroah-Hartman
@ 2025-07-08 16:22 ` Greg Kroah-Hartman
  2025-07-08 16:22 ` [PATCH 6.15 103/178] netfs: Fix i_size updating Greg Kroah-Hartman
  2025-07-08 16:22 ` [PATCH 6.15 119/178] netfs: Fix double put of request Greg Kroah-Hartman
  4 siblings, 0 replies; 5+ messages in thread
From: Greg Kroah-Hartman @ 2025-07-08 16:22 UTC (permalink / raw)
  To: stable
  Cc: Greg Kroah-Hartman, patches, David Howells, Steve French,
	Paulo Alcantara, netfs, linux-fsdevel, Christian Brauner,
	Sasha Levin

6.15-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Howells <dhowells@redhat.com>

[ Upstream commit 97d8e8e52cb8ab3d7675880a92626d9a4332f7a6 ]

The write-retry algorithm will insert extra subrequests into the list if it
can't get sufficient capacity to split the range that needs to be retried
into the sequence of subrequests it currently has (for instance, if the
cifs credit pool has fewer credits available than it did when the range was
originally divided).

However, the allocator furnishes each new subreq with 2 refs and then
another is added for resubmission, causing one to be leaked.

Fix this by replacing the ref-getting line with a neutral trace line.

Fixes: 288ace2f57c9 ("netfs: New writeback implementation")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/20250701163852.2171681-6-dhowells@redhat.com
Tested-by: Steve French <sfrench@samba.org>
Reviewed-by: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/netfs/write_retry.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/netfs/write_retry.c b/fs/netfs/write_retry.c
index 9d1d8a8bab726..7158657061e98 100644
--- a/fs/netfs/write_retry.c
+++ b/fs/netfs/write_retry.c
@@ -153,7 +153,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
 			trace_netfs_sreq_ref(wreq->debug_id, subreq->debug_index,
 					     refcount_read(&subreq->ref),
 					     netfs_sreq_trace_new);
-			netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
+			trace_netfs_sreq(subreq, netfs_sreq_trace_split);
 
 			list_add(&subreq->rreq_link, &to->rreq_link);
 			to = list_next_entry(to, rreq_link);
-- 
2.39.5




^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 6.15 103/178] netfs: Fix i_size updating
       [not found] <20250708162236.549307806@linuxfoundation.org>
                   ` (2 preceding siblings ...)
  2025-07-08 16:22 ` [PATCH 6.15 099/178] netfs: Fix ref leak on inserted extra subreq in write retry Greg Kroah-Hartman
@ 2025-07-08 16:22 ` Greg Kroah-Hartman
  2025-07-08 16:22 ` [PATCH 6.15 119/178] netfs: Fix double put of request Greg Kroah-Hartman
  4 siblings, 0 replies; 5+ messages in thread
From: Greg Kroah-Hartman @ 2025-07-08 16:22 UTC (permalink / raw)
  To: stable
  Cc: Greg Kroah-Hartman, patches, David Howells,
	Paulo Alcantara (Red Hat), Steve French, linux-cifs, netfs,
	linux-fsdevel, Christian Brauner, Sasha Levin

6.15-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Howells <dhowells@redhat.com>

[ Upstream commit 2e0658940d90a3dc130bb3b7f75bae9f4100e01f ]

Fix the updating of i_size, particularly in regard to the completion of DIO
writes and especially async DIO writes by using a lock.

The bug is triggered occasionally by the generic/207 xfstest as it chucks a
bunch of AIO DIO writes at the filesystem and then checks that fstat()
returns a reasonable st_size as each completes.

The problem is that netfs is trying to do "if new_size > inode->i_size,
update inode->i_size" sort of thing but without a lock around it.

This can be seen with cifs, but shouldn't be seen with kafs because kafs
serialises modification ops on the client whereas cifs sends the requests
to the server as they're generated and lets the server order them.

Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/20250701163852.2171681-11-dhowells@redhat.com
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/netfs/buffered_write.c | 2 ++
 fs/netfs/direct_write.c   | 8 ++++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index dbb544e183d13..9f22ff890a8cd 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -64,6 +64,7 @@ static void netfs_update_i_size(struct netfs_inode *ctx, struct inode *inode,
 		return;
 	}
 
+	spin_lock(&inode->i_lock);
 	i_size_write(inode, pos);
 #if IS_ENABLED(CONFIG_FSCACHE)
 	fscache_update_cookie(ctx->cache, NULL, &pos);
@@ -77,6 +78,7 @@ static void netfs_update_i_size(struct netfs_inode *ctx, struct inode *inode,
 					DIV_ROUND_UP(pos, SECTOR_SIZE),
 					inode->i_blocks + add);
 	}
+	spin_unlock(&inode->i_lock);
 }
 
 /**
diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index fa9a5bf3c6d51..3efa5894b2c07 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -14,13 +14,17 @@ static void netfs_cleanup_dio_write(struct netfs_io_request *wreq)
 	struct inode *inode = wreq->inode;
 	unsigned long long end = wreq->start + wreq->transferred;
 
-	if (!wreq->error &&
-	    i_size_read(inode) < end) {
+	if (wreq->error || end <= i_size_read(inode))
+		return;
+
+	spin_lock(&inode->i_lock);
+	if (end > i_size_read(inode)) {
 		if (wreq->netfs_ops->update_i_size)
 			wreq->netfs_ops->update_i_size(inode, end);
 		else
 			i_size_write(inode, end);
 	}
+	spin_unlock(&inode->i_lock);
 }
 
 /*
-- 
2.39.5




^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 6.15 119/178] netfs: Fix double put of request
       [not found] <20250708162236.549307806@linuxfoundation.org>
                   ` (3 preceding siblings ...)
  2025-07-08 16:22 ` [PATCH 6.15 103/178] netfs: Fix i_size updating Greg Kroah-Hartman
@ 2025-07-08 16:22 ` Greg Kroah-Hartman
  4 siblings, 0 replies; 5+ messages in thread
From: Greg Kroah-Hartman @ 2025-07-08 16:22 UTC (permalink / raw)
  To: stable
  Cc: Greg Kroah-Hartman, patches, David Howells, Steve French,
	Paulo Alcantara, netfs, linux-fsdevel, linux-cifs,
	Christian Brauner, Sasha Levin

6.15-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Howells <dhowells@redhat.com>

[ Upstream commit 9df7b5ebead649b00bf9a53a798e4bf83a1318fd ]

If a netfs request finishes during the pause loop, it will have the ref
that belongs to the IN_PROGRESS flag removed at that point - however, if it
then goes to the final wait loop, that will *also* put the ref because it
sees that the IN_PROGRESS flag is clear and incorrectly assumes that this
happened when it called the collector.

In fact, since IN_PROGRESS is clear, we shouldn't call the collector again
since it's done all the cleanup, such as calling ->ki_complete().

Fix this by making netfs_collect_in_app() just return, indicating that
we're done if IN_PROGRESS is removed.

Fixes: 2b1424cd131c ("netfs: Fix wait/wake to be consistent about the waitqueue used")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/20250701163852.2171681-3-dhowells@redhat.com
Tested-by: Steve French <sfrench@samba.org>
Reviewed-by: Paulo Alcantara <pc@manguebit.org>
cc: Steve French <sfrench@samba.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/netfs/misc.c              | 9 +++++++--
 include/trace/events/netfs.h | 1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
index d8b1a279dbda9..8b1c11ef32aa5 100644
--- a/fs/netfs/misc.c
+++ b/fs/netfs/misc.c
@@ -383,6 +383,11 @@ static int netfs_collect_in_app(struct netfs_io_request *rreq,
 {
 	bool need_collect = false, inactive = true, done = true;
 
+	if (!test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags)) {
+		trace_netfs_rreq(rreq, netfs_rreq_trace_recollect);
+		return 1; /* Done */
+	}
+
 	for (int i = 0; i < NR_IO_STREAMS; i++) {
 		struct netfs_io_subrequest *subreq;
 		struct netfs_io_stream *stream = &rreq->io_streams[i];
@@ -442,7 +447,7 @@ static ssize_t netfs_wait_for_in_progress(struct netfs_io_request *rreq,
 			case 1:
 				goto all_collected;
 			case 2:
-				if (!netfs_check_rreq_in_progress(rreq))
+				if (!test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags))
 					break;
 				cond_resched();
 				continue;
@@ -512,7 +517,7 @@ static void netfs_wait_for_pause(struct netfs_io_request *rreq,
 			case 1:
 				goto all_collected;
 			case 2:
-				if (!netfs_check_rreq_in_progress(rreq) ||
+				if (!test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags) ||
 				    !test_bit(NETFS_RREQ_PAUSE, &rreq->flags))
 					break;
 				cond_resched();
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 4175eec40048a..ecc1b852661e3 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -56,6 +56,7 @@
 	EM(netfs_rreq_trace_dirty,		"DIRTY  ")	\
 	EM(netfs_rreq_trace_done,		"DONE   ")	\
 	EM(netfs_rreq_trace_free,		"FREE   ")	\
+	EM(netfs_rreq_trace_recollect,		"RECLLCT")	\
 	EM(netfs_rreq_trace_redirty,		"REDIRTY")	\
 	EM(netfs_rreq_trace_resubmit,		"RESUBMT")	\
 	EM(netfs_rreq_trace_set_abandon,	"S-ABNDN")	\
-- 
2.39.5




^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-07-08 16:52 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20250708162236.549307806@linuxfoundation.org>
2025-07-08 16:22 ` [PATCH 6.15 097/178] netfs: Fix hang due to missing case in final DIO read result collection Greg Kroah-Hartman
2025-07-08 16:22 ` [PATCH 6.15 098/178] netfs: Fix looping in wait functions Greg Kroah-Hartman
2025-07-08 16:22 ` [PATCH 6.15 099/178] netfs: Fix ref leak on inserted extra subreq in write retry Greg Kroah-Hartman
2025-07-08 16:22 ` [PATCH 6.15 103/178] netfs: Fix i_size updating Greg Kroah-Hartman
2025-07-08 16:22 ` [PATCH 6.15 119/178] netfs: Fix double put of request Greg Kroah-Hartman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).