linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] netfs: Fix use of fscache with ceph
@ 2025-07-11 15:09 David Howells
  2025-07-11 15:10 ` [PATCH 1/2] netfs: Fix copy-to-cache so that it performs collection with ceph+fscache David Howells
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: David Howells @ 2025-07-11 15:09 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Howells, Paulo Alcantara, Max Kellermann,
	Viacheslav Dubeyko, Alex Markuze, Ilya Dryomov, netfs, linux-nfs,
	ceph-devel, linux-fsdevel, linux-kernel

Hi Christian,

Here are a couple of patches that fix the use of fscaching with ceph:

 (1) Fix the read collector to mark the write request that it creates to copy
     data to the cache with NETFS_RREQ_OFFLOAD_COLLECTION so that it will run
     the write collector on a workqueue as it's meant to run in the background
     and the app isn't going to wait for it.

 (2) Fix the read collector to wake up the copy-to-cache write request after
     it sets NETFS_RREQ_ALL_QUEUED if the write request doesn't have any
     subrequests left on it.  ALL_QUEUED indicates that there won't be any
     more subreqs coming and the collector should clean up - except that an
     event is needed to trigger that, but it only gets events from subreq
     termination and so the last event can beat us to setting ALL_QUEUED.

The patches can also be found here:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-fixes

Thanks,
David

David Howells (2):
  netfs: Fix copy-to-cache so that it performs collection with
    ceph+fscache
  netfs: Fix race between cache write completion and ALL_QUEUED being
    set

 fs/netfs/read_pgpriv2.c      |  5 +++++
 include/trace/events/netfs.h | 30 ++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] netfs: Fix copy-to-cache so that it performs collection with ceph+fscache
  2025-07-11 15:09 [PATCH 0/2] netfs: Fix use of fscache with ceph David Howells
@ 2025-07-11 15:10 ` David Howells
  2025-07-12  5:25   ` Max Kellermann
  2025-07-11 15:10 ` [PATCH 2/2] netfs: Fix race between cache write completion and ALL_QUEUED being set David Howells
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 6+ messages in thread
From: David Howells @ 2025-07-11 15:10 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Howells, Paulo Alcantara, Max Kellermann,
	Viacheslav Dubeyko, Alex Markuze, Ilya Dryomov, netfs, linux-nfs,
	ceph-devel, linux-fsdevel, linux-kernel, Paulo Alcantara, stable

The netfs copy-to-cache that is used by Ceph with local caching sets up a
new request to write data just read to the cache.  The request is started
and then left to look after itself whilst the app continues.  The request
gets notified by the backing fs upon completion of the async DIO write, but
then tries to wake up the app because NETFS_RREQ_OFFLOAD_COLLECTION isn't
set - but the app isn't waiting there, and so the request just hangs.

Fix this by setting NETFS_RREQ_OFFLOAD_COLLECTION which causes the
notification from the backing filesystem to put the collection onto a work
queue instead.

Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item")
Reported-by: Max Kellermann <max.kellermann@ionos.com>
Link: https://lore.kernel.org/r/CAKPOu+8z_ijTLHdiCYGU_Uk7yYD=shxyGLwfe-L7AV3DhebS3w@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Viacheslav Dubeyko <slava@dubeyko.com>
cc: Alex Markuze <amarkuze@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: netfs@lists.linux.dev
cc: ceph-devel@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: stable@vger.kernel.org
---
 fs/netfs/read_pgpriv2.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/netfs/read_pgpriv2.c b/fs/netfs/read_pgpriv2.c
index 5bbe906a551d..080d2a6a51d9 100644
--- a/fs/netfs/read_pgpriv2.c
+++ b/fs/netfs/read_pgpriv2.c
@@ -110,6 +110,7 @@ static struct netfs_io_request *netfs_pgpriv2_begin_copy_to_cache(
 	if (!creq->io_streams[1].avail)
 		goto cancel_put;
 
+	__set_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &creq->flags);
 	trace_netfs_write(creq, netfs_write_trace_copy_to_cache);
 	netfs_stat(&netfs_n_wh_copy_to_cache);
 	rreq->copy_to_cache = creq;


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] netfs: Fix race between cache write completion and ALL_QUEUED being set
  2025-07-11 15:09 [PATCH 0/2] netfs: Fix use of fscache with ceph David Howells
  2025-07-11 15:10 ` [PATCH 1/2] netfs: Fix copy-to-cache so that it performs collection with ceph+fscache David Howells
@ 2025-07-11 15:10 ` David Howells
  2025-07-11 20:09 ` [PATCH 0/2] netfs: Fix use of fscache with ceph Paulo Alcantara
  2025-07-14  9:05 ` Christian Brauner
  3 siblings, 0 replies; 6+ messages in thread
From: David Howells @ 2025-07-11 15:10 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Howells, Paulo Alcantara, Max Kellermann,
	Viacheslav Dubeyko, Alex Markuze, Ilya Dryomov, netfs, linux-nfs,
	ceph-devel, linux-fsdevel, linux-kernel, Paulo Alcantara, stable

When netfslib is issuing subrequests, the subrequests start processing
immediately and may complete before we reach the end of the issuing
function.  At the end of the issuing function we set NETFS_RREQ_ALL_QUEUED
to indicate to the collector that we aren't going to issue any more subreqs
and that it can do the final notifications and cleanup.

Now, this isn't a problem if the request is synchronous
(NETFS_RREQ_OFFLOAD_COLLECTION is unset) as the result collection will be
done in-thread and we're guaranteed an opportunity to run the collector.

However, if the request is asynchronous, collection is primarily triggered
by the termination of subrequests queuing it on a workqueue.  Now, a race
can occur here if the app thread sets ALL_QUEUED after the last subrequest
terminates.

This can happen most easily with the copy2cache code (as used by Ceph)
where, in the collection routine of a read request, an asynchronous write
request is spawned to copy data to the cache.  Folios are added to the
write request as they're unlocked, but there may be a delay before
ALL_QUEUED is set as the write subrequests may complete before we get
there.

If all the write subreqs have finished by the ALL_QUEUED point, no further
events happen and the collection never happens, leaving the request
hanging.

Fix this by queuing the collector after setting ALL_QUEUED.  This is a bit
heavy-handed and it may be sufficient to do it only if there are no extant
subreqs.

Also add a tracepoint to cross-reference both requests in a copy-to-request
operation and add a trace to the netfs_rreq tracepoint to indicate the
setting of ALL_QUEUED.

Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item")
Reported-by: Max Kellermann <max.kellermann@ionos.com>
Link: https://lore.kernel.org/r/CAKPOu+8z_ijTLHdiCYGU_Uk7yYD=shxyGLwfe-L7AV3DhebS3w@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Viacheslav Dubeyko <slava@dubeyko.com>
cc: Alex Markuze <amarkuze@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: netfs@lists.linux.dev
cc: ceph-devel@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: stable@vger.kernel.org
---
 fs/netfs/read_pgpriv2.c      |  4 ++++
 include/trace/events/netfs.h | 30 ++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/fs/netfs/read_pgpriv2.c b/fs/netfs/read_pgpriv2.c
index 080d2a6a51d9..8097bc069c1d 100644
--- a/fs/netfs/read_pgpriv2.c
+++ b/fs/netfs/read_pgpriv2.c
@@ -111,6 +111,7 @@ static struct netfs_io_request *netfs_pgpriv2_begin_copy_to_cache(
 		goto cancel_put;
 
 	__set_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &creq->flags);
+	trace_netfs_copy2cache(rreq, creq);
 	trace_netfs_write(creq, netfs_write_trace_copy_to_cache);
 	netfs_stat(&netfs_n_wh_copy_to_cache);
 	rreq->copy_to_cache = creq;
@@ -155,6 +156,9 @@ void netfs_pgpriv2_end_copy_to_cache(struct netfs_io_request *rreq)
 	netfs_issue_write(creq, &creq->io_streams[1]);
 	smp_wmb(); /* Write lists before ALL_QUEUED. */
 	set_bit(NETFS_RREQ_ALL_QUEUED, &creq->flags);
+	trace_netfs_rreq(rreq, netfs_rreq_trace_end_copy_to_cache);
+	if (list_empty_careful(&creq->io_streams[1].subrequests))
+		netfs_wake_collector(creq);
 
 	netfs_put_request(creq, netfs_rreq_trace_put_return);
 	creq->copy_to_cache = NULL;
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 73e96ccbe830..64a382fbc31a 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -55,6 +55,7 @@
 	EM(netfs_rreq_trace_copy,		"COPY   ")	\
 	EM(netfs_rreq_trace_dirty,		"DIRTY  ")	\
 	EM(netfs_rreq_trace_done,		"DONE   ")	\
+	EM(netfs_rreq_trace_end_copy_to_cache,	"END-C2C")	\
 	EM(netfs_rreq_trace_free,		"FREE   ")	\
 	EM(netfs_rreq_trace_ki_complete,	"KI-CMPL")	\
 	EM(netfs_rreq_trace_recollect,		"RECLLCT")	\
@@ -559,6 +560,35 @@ TRACE_EVENT(netfs_write,
 		      __entry->start, __entry->start + __entry->len - 1)
 	    );
 
+TRACE_EVENT(netfs_copy2cache,
+	    TP_PROTO(const struct netfs_io_request *rreq,
+		     const struct netfs_io_request *creq),
+
+	    TP_ARGS(rreq, creq),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,		rreq)
+		    __field(unsigned int,		creq)
+		    __field(unsigned int,		cookie)
+		    __field(unsigned int,		ino)
+			     ),
+
+	    TP_fast_assign(
+		    struct netfs_inode *__ctx = netfs_inode(rreq->inode);
+		    struct fscache_cookie *__cookie = netfs_i_cookie(__ctx);
+		    __entry->rreq	= rreq->debug_id;
+		    __entry->creq	= creq->debug_id;
+		    __entry->cookie	= __cookie ? __cookie->debug_id : 0;
+		    __entry->ino	= rreq->inode->i_ino;
+			   ),
+
+	    TP_printk("R=%08x CR=%08x c=%08x i=%x ",
+		      __entry->rreq,
+		      __entry->creq,
+		      __entry->cookie,
+		      __entry->ino)
+	    );
+
 TRACE_EVENT(netfs_collect,
 	    TP_PROTO(const struct netfs_io_request *wreq),
 


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/2] netfs: Fix use of fscache with ceph
  2025-07-11 15:09 [PATCH 0/2] netfs: Fix use of fscache with ceph David Howells
  2025-07-11 15:10 ` [PATCH 1/2] netfs: Fix copy-to-cache so that it performs collection with ceph+fscache David Howells
  2025-07-11 15:10 ` [PATCH 2/2] netfs: Fix race between cache write completion and ALL_QUEUED being set David Howells
@ 2025-07-11 20:09 ` Paulo Alcantara
  2025-07-14  9:05 ` Christian Brauner
  3 siblings, 0 replies; 6+ messages in thread
From: Paulo Alcantara @ 2025-07-11 20:09 UTC (permalink / raw)
  To: David Howells, Christian Brauner
  Cc: David Howells, Max Kellermann, Viacheslav Dubeyko, Alex Markuze,
	Ilya Dryomov, netfs, linux-nfs, ceph-devel, linux-fsdevel,
	linux-kernel

David Howells <dhowells@redhat.com> writes:

> Hi Christian,
>
> Here are a couple of patches that fix the use of fscaching with ceph:
>
>  (1) Fix the read collector to mark the write request that it creates to copy
>      data to the cache with NETFS_RREQ_OFFLOAD_COLLECTION so that it will run
>      the write collector on a workqueue as it's meant to run in the background
>      and the app isn't going to wait for it.
>
>  (2) Fix the read collector to wake up the copy-to-cache write request after
>      it sets NETFS_RREQ_ALL_QUEUED if the write request doesn't have any
>      subrequests left on it.  ALL_QUEUED indicates that there won't be any
>      more subreqs coming and the collector should clean up - except that an
>      event is needed to trigger that, but it only gets events from subreq
>      termination and so the last event can beat us to setting ALL_QUEUED.
>
> The patches can also be found here:
>
> 	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-fixes
>
> Thanks,
> David
>
> David Howells (2):
>   netfs: Fix copy-to-cache so that it performs collection with
>     ceph+fscache
>   netfs: Fix race between cache write completion and ALL_QUEUED being
>     set
>
>  fs/netfs/read_pgpriv2.c      |  5 +++++
>  include/trace/events/netfs.h | 30 ++++++++++++++++++++++++++++++
>  2 files changed, 35 insertions(+)

Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] netfs: Fix copy-to-cache so that it performs collection with ceph+fscache
  2025-07-11 15:10 ` [PATCH 1/2] netfs: Fix copy-to-cache so that it performs collection with ceph+fscache David Howells
@ 2025-07-12  5:25   ` Max Kellermann
  0 siblings, 0 replies; 6+ messages in thread
From: Max Kellermann @ 2025-07-12  5:25 UTC (permalink / raw)
  To: David Howells
  Cc: Christian Brauner, Paulo Alcantara, Viacheslav Dubeyko,
	Alex Markuze, Ilya Dryomov, netfs, linux-nfs, ceph-devel,
	linux-fsdevel, linux-kernel, Paulo Alcantara, stable

On Fri, Jul 11, 2025 at 5:10 PM David Howells <dhowells@redhat.com> wrote:
>
> The netfs copy-to-cache that is used by Ceph with local caching sets up a
> new request to write data just read to the cache.  The request is started
> and then left to look after itself whilst the app continues.  The request
> gets notified by the backing fs upon completion of the async DIO write, but
> then tries to wake up the app because NETFS_RREQ_OFFLOAD_COLLECTION isn't
> set - but the app isn't waiting there, and so the request just hangs.
>
> Fix this by setting NETFS_RREQ_OFFLOAD_COLLECTION which causes the
> notification from the backing filesystem to put the collection onto a work
> queue instead.

Thanks David, you can add me as Tested-by if you want.

I can't test the other patch for the next two weeks (vacation). When
I'm back, I'll install both fixes on some heavily loaded production
machines - our clusters always shake out the worst in every piece of
code they run!

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/2] netfs: Fix use of fscache with ceph
  2025-07-11 15:09 [PATCH 0/2] netfs: Fix use of fscache with ceph David Howells
                   ` (2 preceding siblings ...)
  2025-07-11 20:09 ` [PATCH 0/2] netfs: Fix use of fscache with ceph Paulo Alcantara
@ 2025-07-14  9:05 ` Christian Brauner
  3 siblings, 0 replies; 6+ messages in thread
From: Christian Brauner @ 2025-07-14  9:05 UTC (permalink / raw)
  To: David Howells
  Cc: Christian Brauner, Max Kellermann, Viacheslav Dubeyko,
	Alex Markuze, Ilya Dryomov, netfs, linux-nfs, ceph-devel,
	linux-fsdevel, linux-kernel, Paulo Alcantara

On Fri, 11 Jul 2025 16:09:59 +0100, David Howells wrote:
> Here are a couple of patches that fix the use of fscaching with ceph:
> 
>  (1) Fix the read collector to mark the write request that it creates to copy
>      data to the cache with NETFS_RREQ_OFFLOAD_COLLECTION so that it will run
>      the write collector on a workqueue as it's meant to run in the background
>      and the app isn't going to wait for it.
> 
> [...]

Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes

[1/2] netfs: Fix copy-to-cache so that it performs collection with ceph+fscache
      https://git.kernel.org/vfs/vfs/c/4c238e30774e
[2/2] netfs: Fix race between cache write completion and ALL_QUEUED being set
      https://git.kernel.org/vfs/vfs/c/89635eae076c

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-07-14  9:05 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-11 15:09 [PATCH 0/2] netfs: Fix use of fscache with ceph David Howells
2025-07-11 15:10 ` [PATCH 1/2] netfs: Fix copy-to-cache so that it performs collection with ceph+fscache David Howells
2025-07-12  5:25   ` Max Kellermann
2025-07-11 15:10 ` [PATCH 2/2] netfs: Fix race between cache write completion and ALL_QUEUED being set David Howells
2025-07-11 20:09 ` [PATCH 0/2] netfs: Fix use of fscache with ceph Paulo Alcantara
2025-07-14  9:05 ` Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).