* Re: [PATCH] netfs, ceph: Revert "netfs: Remove deprecated use of PG_private_2 as a second writeback flag"
2024-07-31 10:41 ` David Howells
@ 2024-07-31 11:37 ` Max Kellermann
2024-08-04 13:57 ` [PATCH] netfs: Set NETFS_RREQ_WRITE_TO_CACHE when caching is possible Hristo Venev
2024-08-07 20:17 ` [RFC][PATCH] netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags David Howells
` (2 subsequent siblings)
3 siblings, 1 reply; 17+ messages in thread
From: Max Kellermann @ 2024-07-31 11:37 UTC (permalink / raw)
To: David Howells
Cc: Ilya Dryomov, Xiubo Li, Jeff Layton, willy, ceph-devel, netfs,
linux-fsdevel, linux-kernel, stable
On Wed, Jul 31, 2024 at 12:41 PM David Howells <dhowells@redhat.com> wrote:
> > ------------[ cut here ]------------
> > WARNING: CPU: 13 PID: 3621 at fs/ceph/caps.c:3386
> > ceph_put_wrbuffer_cap_refs+0x416/0x500
>
> Is that "WARN_ON_ONCE(ci->i_auth_cap);" for you?
Yes, and that happens because no "capsnap" was found, because the
"snapc" parameter is 0x356 (NETFS_FOLIO_COPY_TO_CACHE); no
snap_context with address 0x356 could be found, of course.
Max
^ permalink raw reply [flat|nested] 17+ messages in thread* [PATCH] netfs: Set NETFS_RREQ_WRITE_TO_CACHE when caching is possible
2024-07-31 11:37 ` Max Kellermann
@ 2024-08-04 13:57 ` Hristo Venev
2024-08-04 23:22 ` Trond Myklebust
0 siblings, 1 reply; 17+ messages in thread
From: Hristo Venev @ 2024-08-04 13:57 UTC (permalink / raw)
To: Max Kellermann, David Howells
Cc: Ilya Dryomov, Xiubo Li, Jeff Layton, willy, ceph-devel, netfs,
linux-fsdevel, linux-kernel, linux-nfs, blokos, Trond Myklebust,
dan.aloni@vastdata.com
[-- Attachment #1: Type: text/plain, Size: 1870 bytes --]
In addition to Ceph, in NFS there are also some crashes related to the
use of 0x356 as a pointer.
`netfs_is_cache_enabled()` only returns true when the fscache cookie is
fully initialized. This may happen after the request has been created,
so check for the cookie's existence instead.
Link: https://lore.kernel.org/linux-nfs/b78c88db-8b3a-4008-94cb-82ae08f0e37b@free.fr/T/
Fixes: 2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
Cc: linux-nfs@vger.kernel.org <linux-nfs@vger.kernel.org>
Cc: blokos <blokos@free.fr>
Cc: Trond Myklebust <trondmy@hammerspace.com>
Cc: dan.aloni@vastdata.com <dan.aloni@vastdata.com>
Signed-off-by: Hristo Venev <hristo@venev.name>
---
fs/netfs/objects.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
index f4a6427274792..a74ca90c86c9b 100644
--- a/fs/netfs/objects.c
+++ b/fs/netfs/objects.c
@@ -27,7 +27,6 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
bool is_unbuffered = (origin == NETFS_UNBUFFERED_WRITE ||
origin == NETFS_DIO_READ ||
origin == NETFS_DIO_WRITE);
- bool cached = !is_unbuffered && netfs_is_cache_enabled(ctx);
int ret;
for (;;) {
@@ -56,8 +55,9 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
refcount_set(&rreq->ref, 1);
__set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
- if (cached) {
- __set_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags);
+ if (!is_unbuffered && fscache_cookie_valid(netfs_i_cookie(ctx))) {
+ if(netfs_is_cache_enabled(ctx))
+ __set_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags);
if (test_bit(NETFS_ICTX_USE_PGPRIV2, &ctx->flags))
/* Filesystem uses deprecated PG_private_2 marking. */
__set_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags);
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 858 bytes --]
^ permalink raw reply related [flat|nested] 17+ messages in thread* Re: [PATCH] netfs: Set NETFS_RREQ_WRITE_TO_CACHE when caching is possible
2024-08-04 13:57 ` [PATCH] netfs: Set NETFS_RREQ_WRITE_TO_CACHE when caching is possible Hristo Venev
@ 2024-08-04 23:22 ` Trond Myklebust
2024-08-05 15:34 ` Hristo Venev
0 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2024-08-04 23:22 UTC (permalink / raw)
To: max.kellermann@ionos.com, hristo@venev.name, dhowells@redhat.com
Cc: dan.aloni@vastdata.com, xiubli@redhat.com,
linux-fsdevel@vger.kernel.org, ceph-devel@vger.kernel.org,
linux-kernel@vger.kernel.org, netfs@lists.linux.dev,
jlayton@kernel.org, idryomov@gmail.com, willy@infradead.org,
blokos@free.fr, linux-nfs@vger.kernel.org
On Sun, 2024-08-04 at 16:57 +0300, Hristo Venev wrote:
> In addition to Ceph, in NFS there are also some crashes related to
> the
> use of 0x356 as a pointer.
>
> `netfs_is_cache_enabled()` only returns true when the fscache cookie
> is
> fully initialized. This may happen after the request has been
> created,
> so check for the cookie's existence instead.
>
> Link:
> https://lore.kernel.org/linux-nfs/b78c88db-8b3a-4008-94cb-82ae08f0e37b@free.fr/T/
> Fixes: 2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio-
> >private and marking dirty")
> Cc: linux-nfs@vger.kernel.org <linux-nfs@vger.kernel.org>
> Cc: blokos <blokos@free.fr>
> Cc: Trond Myklebust <trondmy@hammerspace.com>
> Cc: dan.aloni@vastdata.com <dan.aloni@vastdata.com>
> Signed-off-by: Hristo Venev <hristo@venev.name>
> ---
> fs/netfs/objects.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
> index f4a6427274792..a74ca90c86c9b 100644
> --- a/fs/netfs/objects.c
> +++ b/fs/netfs/objects.c
> @@ -27,7 +27,6 @@ struct netfs_io_request *netfs_alloc_request(struct
> address_space *mapping,
> bool is_unbuffered = (origin == NETFS_UNBUFFERED_WRITE ||
> origin == NETFS_DIO_READ ||
> origin == NETFS_DIO_WRITE);
> - bool cached = !is_unbuffered && netfs_is_cache_enabled(ctx);
> int ret;
>
> for (;;) {
> @@ -56,8 +55,9 @@ struct netfs_io_request *netfs_alloc_request(struct
> address_space *mapping,
> refcount_set(&rreq->ref, 1);
>
> __set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
> - if (cached) {
> - __set_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags);
> + if (!is_unbuffered &&
> fscache_cookie_valid(netfs_i_cookie(ctx))) {
> + if(netfs_is_cache_enabled(ctx))
> + __set_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq-
> >flags);
> if (test_bit(NETFS_ICTX_USE_PGPRIV2, &ctx->flags))
> /* Filesystem uses deprecated PG_private_2
> marking. */
> __set_bit(NETFS_RREQ_USE_PGPRIV2, &rreq-
> >flags);
Does this mean that netfs could still end up setting a value for folio-
>private in NFS given some other set of circumstances?
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
^ permalink raw reply [flat|nested] 17+ messages in thread* Re: [PATCH] netfs: Set NETFS_RREQ_WRITE_TO_CACHE when caching is possible
2024-08-04 23:22 ` Trond Myklebust
@ 2024-08-05 15:34 ` Hristo Venev
0 siblings, 0 replies; 17+ messages in thread
From: Hristo Venev @ 2024-08-05 15:34 UTC (permalink / raw)
To: Trond Myklebust, max.kellermann@ionos.com, dhowells@redhat.com
Cc: dan.aloni@vastdata.com, xiubli@redhat.com,
linux-fsdevel@vger.kernel.org, ceph-devel@vger.kernel.org,
linux-kernel@vger.kernel.org, netfs@lists.linux.dev,
jlayton@kernel.org, idryomov@gmail.com, willy@infradead.org,
blokos@free.fr, linux-nfs@vger.kernel.org
[-- Attachment #1: Type: text/plain, Size: 2752 bytes --]
On Sun, 2024-08-04 at 23:22 +0000, Trond Myklebust wrote:
> On Sun, 2024-08-04 at 16:57 +0300, Hristo Venev wrote:
> > In addition to Ceph, in NFS there are also some crashes related to
> > the
> > use of 0x356 as a pointer.
> >
> > `netfs_is_cache_enabled()` only returns true when the fscache
> > cookie
> > is
> > fully initialized. This may happen after the request has been
> > created,
> > so check for the cookie's existence instead.
> >
> > Link:
> > https://lore.kernel.org/linux-nfs/b78c88db-8b3a-4008-94cb-82ae08f0e37b@free.fr/T/
> > Fixes: 2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio-
> > > private and marking dirty")
> > Cc: linux-nfs@vger.kernel.org <linux-nfs@vger.kernel.org>
> > Cc: blokos <blokos@free.fr>
> > Cc: Trond Myklebust <trondmy@hammerspace.com>
> > Cc: dan.aloni@vastdata.com <dan.aloni@vastdata.com>
> > Signed-off-by: Hristo Venev <hristo@venev.name>
> > ---
> > fs/netfs/objects.c | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
> > index f4a6427274792..a74ca90c86c9b 100644
> > --- a/fs/netfs/objects.c
> > +++ b/fs/netfs/objects.c
> > @@ -27,7 +27,6 @@ struct netfs_io_request
> > *netfs_alloc_request(struct
> > address_space *mapping,
> > bool is_unbuffered = (origin == NETFS_UNBUFFERED_WRITE ||
> > origin == NETFS_DIO_READ ||
> > origin == NETFS_DIO_WRITE);
> > - bool cached = !is_unbuffered &&
> > netfs_is_cache_enabled(ctx);
> > int ret;
> >
> > for (;;) {
> > @@ -56,8 +55,9 @@ struct netfs_io_request
> > *netfs_alloc_request(struct
> > address_space *mapping,
> > refcount_set(&rreq->ref, 1);
> >
> > __set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
> > - if (cached) {
> > - __set_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq-
> > >flags);
> > + if (!is_unbuffered &&
> > fscache_cookie_valid(netfs_i_cookie(ctx))) {
> > + if(netfs_is_cache_enabled(ctx))
> > + __set_bit(NETFS_RREQ_WRITE_TO_CACHE,
> > &rreq-
> > > flags);
> > if (test_bit(NETFS_ICTX_USE_PGPRIV2, &ctx->flags))
> > /* Filesystem uses deprecated PG_private_2
> > marking. */
> > __set_bit(NETFS_RREQ_USE_PGPRIV2, &rreq-
> > > flags);
>
> Does this mean that netfs could still end up setting a value for
> folio-
> > private in NFS given some other set of circumstances?
Hopefully not? For NFS the cookie should be allocated in
`nfs_fscache_init_inode`, and for Ceph I think `ceph_fill_inode` (which
calls `ceph_fscache_register_inode_cookie`) should also be called early
enough as well.
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
>
>
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* [RFC][PATCH] netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags
2024-07-31 10:41 ` David Howells
2024-07-31 11:37 ` Max Kellermann
@ 2024-08-07 20:17 ` David Howells
2024-08-08 11:46 ` [PATCH v2] " David Howells
2024-08-08 21:31 ` [PATCH v3] " David Howells
3 siblings, 0 replies; 17+ messages in thread
From: David Howells @ 2024-08-07 20:17 UTC (permalink / raw)
To: Max Kellermann, Hristo Venev
Cc: dhowells, Ilya Dryomov, Xiubo Li, Jeff Layton, willy, ceph-devel,
netfs, linux-fsdevel, linux-kernel, stable
The attached patch gets me most of the way there, applied on the top of the
reversion one. See:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-fixes
There's still an occasional slab-use-after-free that pops up:
BUG: KASAN: slab-use-after-free in xa_head+0xe/0x70
Read of size 8 at addr ffff8881b2cf6df8 by task kworker/0:1/9
...
xa_head+0xe/0x70
xas_start+0xca/0x140
xas_load+0x16/0x110
xas_find+0x84/0x1f0
__fscache_clear_page_bits+0x136/0x340
...
where the thing being allocated is a ceph inode.
Note that Hristo's patch is not sufficient.
David
---
netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags
The NETFS_RREQ_USE_PGPRIV2 and NETFS_RREQ_WRITE_TO_CACHE flags aren't used
correctly. The problem is that we try to set them up in the request
initialisation, but we the cache may be in the process of setting up still,
and so the state may not be correct. Further, we secondarily sample the
cache state and make contradictory decisions later.
The issue arises because we set up the cache resources, which allows the
cache's ->prepare_read() to switch on NETFS_SREQ_COPY_TO_CACHE - which
triggers cache writing even if we didn't set the flags when allocating.
Fix this in the following way:
(1) Drop NETFS_ICTX_USE_PGPRIV2 and instead set NETFS_RREQ_USE_PGPRIV2 in
->init_request() rather than trying to juggle that in
netfs_alloc_request().
(2) Repurpose NETFS_RREQ_USE_PGPRIV2 to merely indicate that if caching is
to be done, then PG_private_2 is to be used rather than only setting
it if we decide to cache and then having netfs_rreq_unlock_folios()
set the non-PG_private_2 writeback-to-cache if it wasn't set.
(3) Split netfs_rreq_unlock_folios() into two functions, one of which
contains the deprecated code for using PG_private_2 to avoid
accidentally doing the writeback path - and always use it if
USE_PGPRIV2 is set.
(4) As NETFS_ICTX_USE_PGPRIV2 is removed, make netfs_write_begin() always
wait for PG_private_2. This function is deprecated and only used by
ceph anyway, and so label it so.
(5) Drop the NETFS_RREQ_WRITE_TO_CACHE flag and use
fscache_operation_valid() on the cache_resources instead. This has
the advantage of picking up the result of netfs_begin_cache_read() and
fscache_begin_write_operation() - which are called after the object is
initialised and will wait for the cache to come to a usable state.
Just reverting ae678317b95e[1] isn't a sufficient fix, so this need to be
applied on top of that. Without this as well, things like:
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: {
and:
WARNING: CPU: 13 PID: 3621 at fs/ceph/caps.c:3386
may happen, along with some UAFs due to PG_private_2 not getting used to
wait on writeback completion.
Fixes: 2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
Reported-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Xiubo Li <xiubli@redhat.com>
cc: Hristo Venev <hristo@venev.name>
cc: Jeff Layton <jlayton@kernel.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: ceph-devel@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/3575457.1722355300@warthog.procyon.org.uk/ [1]
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 73b5a07bf94d..cc0a2240de98 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -424,6 +424,9 @@ static int ceph_init_request(struct netfs_io_request *rreq, struct file *file)
struct ceph_netfs_request_data *priv;
int ret = 0;
+ /* [DEPRECATED] Use PG_private_2 to mark folio being written to the cache. */
+ __set_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags);
+
if (rreq->origin != NETFS_READAHEAD)
return 0;
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 8f8de8f33abb..71cd70514efa 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -577,8 +577,6 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
/* Set parameters for the netfs library */
netfs_inode_init(&ci->netfs, &ceph_netfs_ops, false);
- /* [DEPRECATED] Use PG_private_2 to mark folio being written to the cache. */
- __set_bit(NETFS_ICTX_USE_PGPRIV2, &ci->netfs.flags);
spin_lock_init(&ci->i_ceph_lock);
diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 424048f9ed1f..79d83abb655b 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -9,6 +9,97 @@
#include <linux/task_io_accounting_ops.h>
#include "internal.h"
+/*
+ * [DEPRECATED] Unlock the folios in a read operation for when the filesystem
+ * is using PG_private_2 and direct writing to the cache from here rather than
+ * marking the page for writeback.
+ *
+ * Note that we don't touch folio->private in this code.
+ */
+static void netfs_rreq_unlock_folios_pgpriv2(struct netfs_io_request *rreq)
+{
+ struct netfs_io_subrequest *subreq;
+ struct folio *folio;
+ pgoff_t start_page = rreq->start / PAGE_SIZE;
+ pgoff_t last_page = ((rreq->start + rreq->len) / PAGE_SIZE) - 1;
+ size_t account = 0;
+ bool subreq_failed = false;
+
+ XA_STATE(xas, &rreq->mapping->i_pages, start_page);
+
+ /* Walk through the pagecache and the I/O request lists simultaneously.
+ * We may have a mixture of cached and uncached sections and we only
+ * really want to write out the uncached sections. This is slightly
+ * complicated by the possibility that we might have huge pages with a
+ * mixture inside.
+ */
+ subreq = list_first_entry(&rreq->subrequests,
+ struct netfs_io_subrequest, rreq_link);
+ subreq_failed = (subreq->error < 0);
+
+ trace_netfs_rreq(rreq, netfs_rreq_trace_unlock_pgpriv2);
+
+ rcu_read_lock();
+ xas_for_each(&xas, folio, last_page) {
+ loff_t pg_end;
+ bool pg_failed = false;
+ bool folio_started = false;
+
+ if (xas_retry(&xas, folio))
+ continue;
+
+ pg_end = folio_pos(folio) + folio_size(folio) - 1;
+
+ for (;;) {
+ loff_t sreq_end;
+
+ if (!subreq) {
+ pg_failed = true;
+ break;
+ }
+
+ if (!folio_started &&
+ test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags) &&
+ fscache_operation_valid(&rreq->cache_resources)) {
+ trace_netfs_folio(folio, netfs_folio_trace_copy_to_cache);
+ folio_start_private_2(folio);
+ folio_started = true;
+ }
+
+ pg_failed |= subreq_failed;
+ sreq_end = subreq->start + subreq->len - 1;
+ if (pg_end < sreq_end)
+ break;
+
+ account += subreq->transferred;
+ if (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) {
+ subreq = list_next_entry(subreq, rreq_link);
+ subreq_failed = (subreq->error < 0);
+ } else {
+ subreq = NULL;
+ subreq_failed = false;
+ }
+
+ if (pg_end == sreq_end)
+ break;
+ }
+
+ if (!pg_failed) {
+ flush_dcache_folio(folio);
+ folio_mark_uptodate(folio);
+ }
+
+ if (!test_bit(NETFS_RREQ_DONT_UNLOCK_FOLIOS, &rreq->flags)) {
+ if (folio->index == rreq->no_unlock_folio &&
+ test_bit(NETFS_RREQ_NO_UNLOCK_FOLIO, &rreq->flags))
+ kdebug("no unlock");
+ else
+ folio_unlock(folio);
+ }
+ }
+ rcu_read_unlock();
+}
+
/*
* Unlock the folios in a read operation. We need to set PG_writeback on any
* folios we're going to write back before we unlock them.
@@ -35,6 +126,12 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
}
}
+ /* Handle deprecated PG_private_2 case. */
+ if (test_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags)) {
+ netfs_rreq_unlock_folios_pgpriv2(rreq);
+ goto out;
+ }
+
/* Walk through the pagecache and the I/O request lists simultaneously.
* We may have a mixture of cached and uncached sections and we only
* really want to write out the uncached sections. This is slightly
@@ -52,7 +149,6 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
loff_t pg_end;
bool pg_failed = false;
bool wback_to_cache = false;
- bool folio_started = false;
if (xas_retry(&xas, folio))
continue;
@@ -66,17 +162,8 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
pg_failed = true;
break;
}
- if (test_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags)) {
- if (!folio_started && test_bit(NETFS_SREQ_COPY_TO_CACHE,
- &subreq->flags)) {
- trace_netfs_folio(folio, netfs_folio_trace_copy_to_cache);
- folio_start_private_2(folio);
- folio_started = true;
- }
- } else {
- wback_to_cache |=
- test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
- }
+
+ wback_to_cache |= test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
pg_failed |= subreq_failed;
sreq_end = subreq->start + subreq->len - 1;
if (pg_end < sreq_end)
@@ -124,6 +211,7 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
}
rcu_read_unlock();
+out:
task_io_account_read(account);
if (rreq->netfs_ops->done)
rreq->netfs_ops->done(rreq);
@@ -395,7 +483,7 @@ static bool netfs_skip_folio_read(struct folio *folio, loff_t pos, size_t len,
}
/**
- * netfs_write_begin - Helper to prepare for writing
+ * netfs_write_begin - Helper to prepare for writing [DEPRECATED]
* @ctx: The netfs context
* @file: The file to read from
* @mapping: The mapping to read from
@@ -426,6 +514,9 @@ static bool netfs_skip_folio_read(struct folio *folio, loff_t pos, size_t len,
* inode before calling this.
*
* This is usable whether or not caching is enabled.
+ *
+ * Note that this should be considered deprecated and netfs_perform_write()
+ * used instead.
*/
int netfs_write_begin(struct netfs_inode *ctx,
struct file *file, struct address_space *mapping,
@@ -507,11 +598,9 @@ int netfs_write_begin(struct netfs_inode *ctx,
netfs_put_request(rreq, false, netfs_rreq_trace_put_return);
have_folio:
- if (test_bit(NETFS_ICTX_USE_PGPRIV2, &ctx->flags)) {
- ret = folio_wait_private_2_killable(folio);
- if (ret < 0)
- goto error;
- }
+ ret = folio_wait_private_2_killable(folio);
+ if (ret < 0)
+ goto error;
have_folio_no_wait:
*_folio = folio;
_leave(" = 0");
diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
index f4a642727479..0faea0cee179 100644
--- a/fs/netfs/objects.c
+++ b/fs/netfs/objects.c
@@ -57,10 +57,6 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
__set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
if (cached) {
- __set_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags);
- if (test_bit(NETFS_ICTX_USE_PGPRIV2, &ctx->flags))
- /* Filesystem uses deprecated PG_private_2 marking. */
- __set_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags);
}
if (file && file->f_flags & O_NONBLOCK)
__set_bit(NETFS_RREQ_NONBLOCK, &rreq->flags);
diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
index 9258d30cffe3..d35bb0f25d69 100644
--- a/fs/netfs/write_issue.c
+++ b/fs/netfs/write_issue.c
@@ -102,7 +102,7 @@ struct netfs_io_request *netfs_create_write_req(struct address_space *mapping,
_enter("R=%x", wreq->debug_id);
ictx = netfs_inode(wreq->inode);
- if (test_bit(NETFS_RREQ_WRITE_TO_CACHE, &wreq->flags))
+ if (fscache_operation_valid(&wreq->cache_resources))
fscache_begin_write_operation(&wreq->cache_resources, netfs_i_cookie(ictx));
wreq->contiguity = wreq->start;
diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index 7202ce84d0eb..bf29a65c5027 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -265,6 +265,8 @@ static int nfs_netfs_init_request(struct netfs_io_request *rreq, struct file *fi
{
rreq->netfs_priv = get_nfs_open_context(nfs_file_open_context(file));
rreq->debug_id = atomic_inc_return(&nfs_netfs_debug_id);
+ /* [DEPRECATED] Use PG_private_2 to mark folio being written to the cache. */
+ __set_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags);
return 0;
}
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index fbed0027996f..e8adae1bc260 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -81,8 +81,6 @@ static inline void nfs_netfs_put(struct nfs_netfs_io_data *netfs)
static inline void nfs_netfs_inode_init(struct nfs_inode *nfsi)
{
netfs_inode_init(&nfsi->netfs, &nfs_netfs_ops, false);
- /* [DEPRECATED] Use PG_private_2 to mark folio being written to the cache. */
- __set_bit(NETFS_ICTX_USE_PGPRIV2, &nfsi->netfs.flags);
}
extern void nfs_netfs_initiate_read(struct nfs_pgio_header *hdr);
extern void nfs_netfs_read_completion(struct nfs_pgio_header *hdr);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 5d0288938cc2..983816608f15 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -73,8 +73,6 @@ struct netfs_inode {
#define NETFS_ICTX_ODIRECT 0 /* The file has DIO in progress */
#define NETFS_ICTX_UNBUFFERED 1 /* I/O should not use the pagecache */
#define NETFS_ICTX_WRITETHROUGH 2 /* Write-through caching */
-#define NETFS_ICTX_USE_PGPRIV2 31 /* [DEPRECATED] Use PG_private_2 to mark
- * write to cache on read */
};
/*
@@ -269,7 +267,6 @@ struct netfs_io_request {
#define NETFS_RREQ_DONT_UNLOCK_FOLIOS 3 /* Don't unlock the folios on completion */
#define NETFS_RREQ_FAILED 4 /* The request failed */
#define NETFS_RREQ_IN_PROGRESS 5 /* Unlocked when the request completes */
-#define NETFS_RREQ_WRITE_TO_CACHE 7 /* Need to write to the cache */
#define NETFS_RREQ_UPLOAD_TO_SERVER 8 /* Need to write to the server */
#define NETFS_RREQ_NONBLOCK 9 /* Don't block if possible (O_NONBLOCK) */
#define NETFS_RREQ_BLOCKED 10 /* We blocked */
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 24ec3434d32e..606b4a0f92da 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -51,6 +51,7 @@
EM(netfs_rreq_trace_resubmit, "RESUBMT") \
EM(netfs_rreq_trace_set_pause, "PAUSE ") \
EM(netfs_rreq_trace_unlock, "UNLOCK ") \
+ EM(netfs_rreq_trace_unlock_pgpriv2, "UNLCK-2") \
EM(netfs_rreq_trace_unmark, "UNMARK ") \
EM(netfs_rreq_trace_wait_ip, "WAIT-IP") \
EM(netfs_rreq_trace_wait_pause, "WT-PAUS") \
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH v2] netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags
2024-07-31 10:41 ` David Howells
2024-07-31 11:37 ` Max Kellermann
2024-08-07 20:17 ` [RFC][PATCH] netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags David Howells
@ 2024-08-08 11:46 ` David Howells
2024-08-08 21:31 ` [PATCH v3] " David Howells
3 siblings, 0 replies; 17+ messages in thread
From: David Howells @ 2024-08-08 11:46 UTC (permalink / raw)
To: Max Kellermann
Cc: dhowells, Hristo Venev, Ilya Dryomov, Xiubo Li, Jeff Layton,
willy, ceph-devel, netfs, linux-fsdevel, linux-kernel, stable
Okay, I updated to -rc2 and now the apparent UAF of a ceph inode doesn't
seem to happen. I've fixed a couple of minor issues in the patch:
- Switched a kdebug() to a _debug().
- netfs_rreq_unlock_folios_pgpriv2() needed to updated the 'account'
variable in the caller, not do it's own thing.
David
---
netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags
The NETFS_RREQ_USE_PGPRIV2 and NETFS_RREQ_WRITE_TO_CACHE flags aren't used
correctly. The problem is that we try to set them up in the request
initialisation, but we the cache may be in the process of setting up still,
and so the state may not be correct. Further, we secondarily sample the
cache state and make contradictory decisions later.
The issue arises because we set up the cache resources, which allows the
cache's ->prepare_read() to switch on NETFS_SREQ_COPY_TO_CACHE - which
triggers cache writing even if we didn't set the flags when allocating.
Fix this in the following way:
(1) Drop NETFS_ICTX_USE_PGPRIV2 and instead set NETFS_RREQ_USE_PGPRIV2 in
->init_request() rather than trying to juggle that in
netfs_alloc_request().
(2) Repurpose NETFS_RREQ_USE_PGPRIV2 to merely indicate that if caching is
to be done, then PG_private_2 is to be used rather than only setting
it if we decide to cache and then having netfs_rreq_unlock_folios()
set the non-PG_private_2 writeback-to-cache if it wasn't set.
(3) Split netfs_rreq_unlock_folios() into two functions, one of which
contains the deprecated code for using PG_private_2 to avoid
accidentally doing the writeback path - and always use it if
USE_PGPRIV2 is set.
(4) As NETFS_ICTX_USE_PGPRIV2 is removed, make netfs_write_begin() always
wait for PG_private_2. This function is deprecated and only used by
ceph anyway, and so label it so.
(5) Drop the NETFS_RREQ_WRITE_TO_CACHE flag and use
fscache_operation_valid() on the cache_resources instead. This has
the advantage of picking up the result of netfs_begin_cache_read() and
fscache_begin_write_operation() - which are called after the object is
initialised and will wait for the cache to come to a usable state.
Just reverting ae678317b95e[1] isn't a sufficient fix, so this need to be
applied on top of that. Without this as well, things like:
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: {
and:
WARNING: CPU: 13 PID: 3621 at fs/ceph/caps.c:3386
may happen, along with some UAFs due to PG_private_2 not getting used to
wait on writeback completion.
Fixes: 2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
Reported-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Xiubo Li <xiubli@redhat.com>
cc: Hristo Venev <hristo@venev.name>
cc: Jeff Layton <jlayton@kernel.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: ceph-devel@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/3575457.1722355300@warthog.procyon.org.uk/ [1]
---
fs/ceph/addr.c | 3 +
fs/ceph/inode.c | 2
fs/netfs/buffered_read.c | 125 ++++++++++++++++++++++++++++++++++++-------
fs/netfs/objects.c | 4 -
fs/netfs/write_issue.c | 2
fs/nfs/fscache.c | 2
fs/nfs/fscache.h | 2
include/linux/netfs.h | 3 -
include/trace/events/netfs.h | 1
9 files changed, 114 insertions(+), 30 deletions(-)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 73b5a07bf94d..cc0a2240de98 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -424,6 +424,9 @@ static int ceph_init_request(struct netfs_io_request *rreq, struct file *file)
struct ceph_netfs_request_data *priv;
int ret = 0;
+ /* [DEPRECATED] Use PG_private_2 to mark folio being written to the cache. */
+ __set_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags);
+
if (rreq->origin != NETFS_READAHEAD)
return 0;
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 8f8de8f33abb..71cd70514efa 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -577,8 +577,6 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
/* Set parameters for the netfs library */
netfs_inode_init(&ci->netfs, &ceph_netfs_ops, false);
- /* [DEPRECATED] Use PG_private_2 to mark folio being written to the cache. */
- __set_bit(NETFS_ICTX_USE_PGPRIV2, &ci->netfs.flags);
spin_lock_init(&ci->i_ceph_lock);
diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 424048f9ed1f..27c750d39476 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -9,6 +9,97 @@
#include <linux/task_io_accounting_ops.h>
#include "internal.h"
+/*
+ * [DEPRECATED] Unlock the folios in a read operation for when the filesystem
+ * is using PG_private_2 and direct writing to the cache from here rather than
+ * marking the page for writeback.
+ *
+ * Note that we don't touch folio->private in this code.
+ */
+static void netfs_rreq_unlock_folios_pgpriv2(struct netfs_io_request *rreq,
+ size_t *account)
+{
+ struct netfs_io_subrequest *subreq;
+ struct folio *folio;
+ pgoff_t start_page = rreq->start / PAGE_SIZE;
+ pgoff_t last_page = ((rreq->start + rreq->len) / PAGE_SIZE) - 1;
+ bool subreq_failed = false;
+
+ XA_STATE(xas, &rreq->mapping->i_pages, start_page);
+
+ /* Walk through the pagecache and the I/O request lists simultaneously.
+ * We may have a mixture of cached and uncached sections and we only
+ * really want to write out the uncached sections. This is slightly
+ * complicated by the possibility that we might have huge pages with a
+ * mixture inside.
+ */
+ subreq = list_first_entry(&rreq->subrequests,
+ struct netfs_io_subrequest, rreq_link);
+ subreq_failed = (subreq->error < 0);
+
+ trace_netfs_rreq(rreq, netfs_rreq_trace_unlock_pgpriv2);
+
+ rcu_read_lock();
+ xas_for_each(&xas, folio, last_page) {
+ loff_t pg_end;
+ bool pg_failed = false;
+ bool folio_started = false;
+
+ if (xas_retry(&xas, folio))
+ continue;
+
+ pg_end = folio_pos(folio) + folio_size(folio) - 1;
+
+ for (;;) {
+ loff_t sreq_end;
+
+ if (!subreq) {
+ pg_failed = true;
+ break;
+ }
+
+ if (!folio_started &&
+ test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags) &&
+ fscache_operation_valid(&rreq->cache_resources)) {
+ trace_netfs_folio(folio, netfs_folio_trace_copy_to_cache);
+ folio_start_private_2(folio);
+ folio_started = true;
+ }
+
+ pg_failed |= subreq_failed;
+ sreq_end = subreq->start + subreq->len - 1;
+ if (pg_end < sreq_end)
+ break;
+
+ *account += subreq->transferred;
+ if (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) {
+ subreq = list_next_entry(subreq, rreq_link);
+ subreq_failed = (subreq->error < 0);
+ } else {
+ subreq = NULL;
+ subreq_failed = false;
+ }
+
+ if (pg_end == sreq_end)
+ break;
+ }
+
+ if (!pg_failed) {
+ flush_dcache_folio(folio);
+ folio_mark_uptodate(folio);
+ }
+
+ if (!test_bit(NETFS_RREQ_DONT_UNLOCK_FOLIOS, &rreq->flags)) {
+ if (folio->index == rreq->no_unlock_folio &&
+ test_bit(NETFS_RREQ_NO_UNLOCK_FOLIO, &rreq->flags))
+ _debug("no unlock");
+ else
+ folio_unlock(folio);
+ }
+ }
+ rcu_read_unlock();
+}
+
/*
* Unlock the folios in a read operation. We need to set PG_writeback on any
* folios we're going to write back before we unlock them.
@@ -35,6 +126,12 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
}
}
+ /* Handle deprecated PG_private_2 case. */
+ if (test_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags)) {
+ netfs_rreq_unlock_folios_pgpriv2(rreq, &account);
+ goto out;
+ }
+
/* Walk through the pagecache and the I/O request lists simultaneously.
* We may have a mixture of cached and uncached sections and we only
* really want to write out the uncached sections. This is slightly
@@ -52,7 +149,6 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
loff_t pg_end;
bool pg_failed = false;
bool wback_to_cache = false;
- bool folio_started = false;
if (xas_retry(&xas, folio))
continue;
@@ -66,17 +162,8 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
pg_failed = true;
break;
}
- if (test_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags)) {
- if (!folio_started && test_bit(NETFS_SREQ_COPY_TO_CACHE,
- &subreq->flags)) {
- trace_netfs_folio(folio, netfs_folio_trace_copy_to_cache);
- folio_start_private_2(folio);
- folio_started = true;
- }
- } else {
- wback_to_cache |=
- test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
- }
+
+ wback_to_cache |= test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
pg_failed |= subreq_failed;
sreq_end = subreq->start + subreq->len - 1;
if (pg_end < sreq_end)
@@ -124,6 +211,7 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
}
rcu_read_unlock();
+out:
task_io_account_read(account);
if (rreq->netfs_ops->done)
rreq->netfs_ops->done(rreq);
@@ -395,7 +483,7 @@ static bool netfs_skip_folio_read(struct folio *folio, loff_t pos, size_t len,
}
/**
- * netfs_write_begin - Helper to prepare for writing
+ * netfs_write_begin - Helper to prepare for writing [DEPRECATED]
* @ctx: The netfs context
* @file: The file to read from
* @mapping: The mapping to read from
@@ -426,6 +514,9 @@ static bool netfs_skip_folio_read(struct folio *folio, loff_t pos, size_t len,
* inode before calling this.
*
* This is usable whether or not caching is enabled.
+ *
+ * Note that this should be considered deprecated and netfs_perform_write()
+ * used instead.
*/
int netfs_write_begin(struct netfs_inode *ctx,
struct file *file, struct address_space *mapping,
@@ -507,11 +598,9 @@ int netfs_write_begin(struct netfs_inode *ctx,
netfs_put_request(rreq, false, netfs_rreq_trace_put_return);
have_folio:
- if (test_bit(NETFS_ICTX_USE_PGPRIV2, &ctx->flags)) {
- ret = folio_wait_private_2_killable(folio);
- if (ret < 0)
- goto error;
- }
+ ret = folio_wait_private_2_killable(folio);
+ if (ret < 0)
+ goto error;
have_folio_no_wait:
*_folio = folio;
_leave(" = 0");
diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
index f4a642727479..0faea0cee179 100644
--- a/fs/netfs/objects.c
+++ b/fs/netfs/objects.c
@@ -57,10 +57,6 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
__set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
if (cached) {
- __set_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags);
- if (test_bit(NETFS_ICTX_USE_PGPRIV2, &ctx->flags))
- /* Filesystem uses deprecated PG_private_2 marking. */
- __set_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags);
}
if (file && file->f_flags & O_NONBLOCK)
__set_bit(NETFS_RREQ_NONBLOCK, &rreq->flags);
diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
index 9258d30cffe3..d35bb0f25d69 100644
--- a/fs/netfs/write_issue.c
+++ b/fs/netfs/write_issue.c
@@ -102,7 +102,7 @@ struct netfs_io_request *netfs_create_write_req(struct address_space *mapping,
_enter("R=%x", wreq->debug_id);
ictx = netfs_inode(wreq->inode);
- if (test_bit(NETFS_RREQ_WRITE_TO_CACHE, &wreq->flags))
+ if (fscache_operation_valid(&wreq->cache_resources))
fscache_begin_write_operation(&wreq->cache_resources, netfs_i_cookie(ictx));
wreq->contiguity = wreq->start;
diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index 7202ce84d0eb..bf29a65c5027 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -265,6 +265,8 @@ static int nfs_netfs_init_request(struct netfs_io_request *rreq, struct file *fi
{
rreq->netfs_priv = get_nfs_open_context(nfs_file_open_context(file));
rreq->debug_id = atomic_inc_return(&nfs_netfs_debug_id);
+ /* [DEPRECATED] Use PG_private_2 to mark folio being written to the cache. */
+ __set_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags);
return 0;
}
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index fbed0027996f..e8adae1bc260 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -81,8 +81,6 @@ static inline void nfs_netfs_put(struct nfs_netfs_io_data *netfs)
static inline void nfs_netfs_inode_init(struct nfs_inode *nfsi)
{
netfs_inode_init(&nfsi->netfs, &nfs_netfs_ops, false);
- /* [DEPRECATED] Use PG_private_2 to mark folio being written to the cache. */
- __set_bit(NETFS_ICTX_USE_PGPRIV2, &nfsi->netfs.flags);
}
extern void nfs_netfs_initiate_read(struct nfs_pgio_header *hdr);
extern void nfs_netfs_read_completion(struct nfs_pgio_header *hdr);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 5d0288938cc2..983816608f15 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -73,8 +73,6 @@ struct netfs_inode {
#define NETFS_ICTX_ODIRECT 0 /* The file has DIO in progress */
#define NETFS_ICTX_UNBUFFERED 1 /* I/O should not use the pagecache */
#define NETFS_ICTX_WRITETHROUGH 2 /* Write-through caching */
-#define NETFS_ICTX_USE_PGPRIV2 31 /* [DEPRECATED] Use PG_private_2 to mark
- * write to cache on read */
};
/*
@@ -269,7 +267,6 @@ struct netfs_io_request {
#define NETFS_RREQ_DONT_UNLOCK_FOLIOS 3 /* Don't unlock the folios on completion */
#define NETFS_RREQ_FAILED 4 /* The request failed */
#define NETFS_RREQ_IN_PROGRESS 5 /* Unlocked when the request completes */
-#define NETFS_RREQ_WRITE_TO_CACHE 7 /* Need to write to the cache */
#define NETFS_RREQ_UPLOAD_TO_SERVER 8 /* Need to write to the server */
#define NETFS_RREQ_NONBLOCK 9 /* Don't block if possible (O_NONBLOCK) */
#define NETFS_RREQ_BLOCKED 10 /* We blocked */
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 24ec3434d32e..606b4a0f92da 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -51,6 +51,7 @@
EM(netfs_rreq_trace_resubmit, "RESUBMT") \
EM(netfs_rreq_trace_set_pause, "PAUSE ") \
EM(netfs_rreq_trace_unlock, "UNLOCK ") \
+ EM(netfs_rreq_trace_unlock_pgpriv2, "UNLCK-2") \
EM(netfs_rreq_trace_unmark, "UNMARK ") \
EM(netfs_rreq_trace_wait_ip, "WAIT-IP") \
EM(netfs_rreq_trace_wait_pause, "WT-PAUS") \
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH v3] netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags
2024-07-31 10:41 ` David Howells
` (2 preceding siblings ...)
2024-08-08 11:46 ` [PATCH v2] " David Howells
@ 2024-08-08 21:31 ` David Howells
3 siblings, 0 replies; 17+ messages in thread
From: David Howells @ 2024-08-08 21:31 UTC (permalink / raw)
Cc: dhowells, Max Kellermann, Hristo Venev, Ilya Dryomov, Xiubo Li,
Jeff Layton, willy, Christian Brauner, ceph-devel, netfs,
linux-fsdevel, linux-kernel, stable
It turned out that I had accidentally disabled caching for 9p, afs and cifs,
so here's a v3 that fixes that.
David
---
netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags
The NETFS_RREQ_USE_PGPRIV2 and NETFS_RREQ_WRITE_TO_CACHE flags aren't used
correctly. The problem is that we try to set them up in the request
initialisation, but we the cache may be in the process of setting up still,
and so the state may not be correct. Further, we secondarily sample the
cache state and make contradictory decisions later.
The issue arises because we set up the cache resources, which allows the
cache's ->prepare_read() to switch on NETFS_SREQ_COPY_TO_CACHE - which
triggers cache writing even if we didn't set the flags when allocating.
Fix this in the following way:
(1) Drop NETFS_ICTX_USE_PGPRIV2 and instead set NETFS_RREQ_USE_PGPRIV2 in
->init_request() rather than trying to juggle that in
netfs_alloc_request().
(2) Repurpose NETFS_RREQ_USE_PGPRIV2 to merely indicate that if caching is
to be done, then PG_private_2 is to be used rather than only setting
it if we decide to cache and then having netfs_rreq_unlock_folios()
set the non-PG_private_2 writeback-to-cache if it wasn't set.
(3) Split netfs_rreq_unlock_folios() into two functions, one of which
contains the deprecated code for using PG_private_2 to avoid
accidentally doing the writeback path - and always use it if
USE_PGPRIV2 is set.
(4) As NETFS_ICTX_USE_PGPRIV2 is removed, make netfs_write_begin() always
wait for PG_private_2. This function is deprecated and only used by
ceph anyway, and so label it so.
(5) Drop the NETFS_RREQ_WRITE_TO_CACHE flag and use
fscache_operation_valid() on the cache_resources instead. This has
the advantage of picking up the result of netfs_begin_cache_read() and
fscache_begin_write_operation() - which are called after the object is
initialised and will wait for the cache to come to a usable state.
Just reverting ae678317b95e[1] isn't a sufficient fix, so this need to be
applied on top of that. Without this as well, things like:
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: {
and:
WARNING: CPU: 13 PID: 3621 at fs/ceph/caps.c:3386
may happen, along with some UAFs due to PG_private_2 not getting used to
wait on writeback completion.
Fixes: 2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
Reported-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Xiubo Li <xiubli@redhat.com>
cc: Hristo Venev <hristo@venev.name>
cc: Jeff Layton <jlayton@kernel.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: ceph-devel@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/3575457.1722355300@warthog.procyon.org.uk/ [1]
---
fs/ceph/addr.c | 3 +
fs/ceph/inode.c | 2
fs/netfs/buffered_read.c | 125 ++++++++++++++++++++++++++++++++++++-------
fs/netfs/objects.c | 10 ---
fs/netfs/write_issue.c | 4 +
fs/nfs/fscache.c | 2
fs/nfs/fscache.h | 2
include/linux/netfs.h | 3 -
include/trace/events/netfs.h | 1
9 files changed, 116 insertions(+), 36 deletions(-)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 73b5a07bf94d..cc0a2240de98 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -424,6 +424,9 @@ static int ceph_init_request(struct netfs_io_request *rreq, struct file *file)
struct ceph_netfs_request_data *priv;
int ret = 0;
+ /* [DEPRECATED] Use PG_private_2 to mark folio being written to the cache. */
+ __set_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags);
+
if (rreq->origin != NETFS_READAHEAD)
return 0;
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 8f8de8f33abb..71cd70514efa 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -577,8 +577,6 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
/* Set parameters for the netfs library */
netfs_inode_init(&ci->netfs, &ceph_netfs_ops, false);
- /* [DEPRECATED] Use PG_private_2 to mark folio being written to the cache. */
- __set_bit(NETFS_ICTX_USE_PGPRIV2, &ci->netfs.flags);
spin_lock_init(&ci->i_ceph_lock);
diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 424048f9ed1f..27c750d39476 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -9,6 +9,97 @@
#include <linux/task_io_accounting_ops.h>
#include "internal.h"
+/*
+ * [DEPRECATED] Unlock the folios in a read operation for when the filesystem
+ * is using PG_private_2 and direct writing to the cache from here rather than
+ * marking the page for writeback.
+ *
+ * Note that we don't touch folio->private in this code.
+ */
+static void netfs_rreq_unlock_folios_pgpriv2(struct netfs_io_request *rreq,
+ size_t *account)
+{
+ struct netfs_io_subrequest *subreq;
+ struct folio *folio;
+ pgoff_t start_page = rreq->start / PAGE_SIZE;
+ pgoff_t last_page = ((rreq->start + rreq->len) / PAGE_SIZE) - 1;
+ bool subreq_failed = false;
+
+ XA_STATE(xas, &rreq->mapping->i_pages, start_page);
+
+ /* Walk through the pagecache and the I/O request lists simultaneously.
+ * We may have a mixture of cached and uncached sections and we only
+ * really want to write out the uncached sections. This is slightly
+ * complicated by the possibility that we might have huge pages with a
+ * mixture inside.
+ */
+ subreq = list_first_entry(&rreq->subrequests,
+ struct netfs_io_subrequest, rreq_link);
+ subreq_failed = (subreq->error < 0);
+
+ trace_netfs_rreq(rreq, netfs_rreq_trace_unlock_pgpriv2);
+
+ rcu_read_lock();
+ xas_for_each(&xas, folio, last_page) {
+ loff_t pg_end;
+ bool pg_failed = false;
+ bool folio_started = false;
+
+ if (xas_retry(&xas, folio))
+ continue;
+
+ pg_end = folio_pos(folio) + folio_size(folio) - 1;
+
+ for (;;) {
+ loff_t sreq_end;
+
+ if (!subreq) {
+ pg_failed = true;
+ break;
+ }
+
+ if (!folio_started &&
+ test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags) &&
+ fscache_operation_valid(&rreq->cache_resources)) {
+ trace_netfs_folio(folio, netfs_folio_trace_copy_to_cache);
+ folio_start_private_2(folio);
+ folio_started = true;
+ }
+
+ pg_failed |= subreq_failed;
+ sreq_end = subreq->start + subreq->len - 1;
+ if (pg_end < sreq_end)
+ break;
+
+ *account += subreq->transferred;
+ if (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) {
+ subreq = list_next_entry(subreq, rreq_link);
+ subreq_failed = (subreq->error < 0);
+ } else {
+ subreq = NULL;
+ subreq_failed = false;
+ }
+
+ if (pg_end == sreq_end)
+ break;
+ }
+
+ if (!pg_failed) {
+ flush_dcache_folio(folio);
+ folio_mark_uptodate(folio);
+ }
+
+ if (!test_bit(NETFS_RREQ_DONT_UNLOCK_FOLIOS, &rreq->flags)) {
+ if (folio->index == rreq->no_unlock_folio &&
+ test_bit(NETFS_RREQ_NO_UNLOCK_FOLIO, &rreq->flags))
+ _debug("no unlock");
+ else
+ folio_unlock(folio);
+ }
+ }
+ rcu_read_unlock();
+}
+
/*
* Unlock the folios in a read operation. We need to set PG_writeback on any
* folios we're going to write back before we unlock them.
@@ -35,6 +126,12 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
}
}
+ /* Handle deprecated PG_private_2 case. */
+ if (test_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags)) {
+ netfs_rreq_unlock_folios_pgpriv2(rreq, &account);
+ goto out;
+ }
+
/* Walk through the pagecache and the I/O request lists simultaneously.
* We may have a mixture of cached and uncached sections and we only
* really want to write out the uncached sections. This is slightly
@@ -52,7 +149,6 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
loff_t pg_end;
bool pg_failed = false;
bool wback_to_cache = false;
- bool folio_started = false;
if (xas_retry(&xas, folio))
continue;
@@ -66,17 +162,8 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
pg_failed = true;
break;
}
- if (test_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags)) {
- if (!folio_started && test_bit(NETFS_SREQ_COPY_TO_CACHE,
- &subreq->flags)) {
- trace_netfs_folio(folio, netfs_folio_trace_copy_to_cache);
- folio_start_private_2(folio);
- folio_started = true;
- }
- } else {
- wback_to_cache |=
- test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
- }
+
+ wback_to_cache |= test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
pg_failed |= subreq_failed;
sreq_end = subreq->start + subreq->len - 1;
if (pg_end < sreq_end)
@@ -124,6 +211,7 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
}
rcu_read_unlock();
+out:
task_io_account_read(account);
if (rreq->netfs_ops->done)
rreq->netfs_ops->done(rreq);
@@ -395,7 +483,7 @@ static bool netfs_skip_folio_read(struct folio *folio, loff_t pos, size_t len,
}
/**
- * netfs_write_begin - Helper to prepare for writing
+ * netfs_write_begin - Helper to prepare for writing [DEPRECATED]
* @ctx: The netfs context
* @file: The file to read from
* @mapping: The mapping to read from
@@ -426,6 +514,9 @@ static bool netfs_skip_folio_read(struct folio *folio, loff_t pos, size_t len,
* inode before calling this.
*
* This is usable whether or not caching is enabled.
+ *
+ * Note that this should be considered deprecated and netfs_perform_write()
+ * used instead.
*/
int netfs_write_begin(struct netfs_inode *ctx,
struct file *file, struct address_space *mapping,
@@ -507,11 +598,9 @@ int netfs_write_begin(struct netfs_inode *ctx,
netfs_put_request(rreq, false, netfs_rreq_trace_put_return);
have_folio:
- if (test_bit(NETFS_ICTX_USE_PGPRIV2, &ctx->flags)) {
- ret = folio_wait_private_2_killable(folio);
- if (ret < 0)
- goto error;
- }
+ ret = folio_wait_private_2_killable(folio);
+ if (ret < 0)
+ goto error;
have_folio_no_wait:
*_folio = folio;
_leave(" = 0");
diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
index f4a642727479..0294df70c3ff 100644
--- a/fs/netfs/objects.c
+++ b/fs/netfs/objects.c
@@ -24,10 +24,6 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
struct netfs_io_request *rreq;
mempool_t *mempool = ctx->ops->request_pool ?: &netfs_request_pool;
struct kmem_cache *cache = mempool->pool_data;
- bool is_unbuffered = (origin == NETFS_UNBUFFERED_WRITE ||
- origin == NETFS_DIO_READ ||
- origin == NETFS_DIO_WRITE);
- bool cached = !is_unbuffered && netfs_is_cache_enabled(ctx);
int ret;
for (;;) {
@@ -56,12 +52,6 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
refcount_set(&rreq->ref, 1);
__set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
- if (cached) {
- __set_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags);
- if (test_bit(NETFS_ICTX_USE_PGPRIV2, &ctx->flags))
- /* Filesystem uses deprecated PG_private_2 marking. */
- __set_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags);
- }
if (file && file->f_flags & O_NONBLOCK)
__set_bit(NETFS_RREQ_NONBLOCK, &rreq->flags);
if (rreq->netfs_ops->init_request) {
diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
index 9258d30cffe3..3f7e37e50c7d 100644
--- a/fs/netfs/write_issue.c
+++ b/fs/netfs/write_issue.c
@@ -94,6 +94,8 @@ struct netfs_io_request *netfs_create_write_req(struct address_space *mapping,
{
struct netfs_io_request *wreq;
struct netfs_inode *ictx;
+ bool is_buffered = (origin == NETFS_WRITEBACK ||
+ origin == NETFS_WRITETHROUGH);
wreq = netfs_alloc_request(mapping, file, start, 0, origin);
if (IS_ERR(wreq))
@@ -102,7 +104,7 @@ struct netfs_io_request *netfs_create_write_req(struct address_space *mapping,
_enter("R=%x", wreq->debug_id);
ictx = netfs_inode(wreq->inode);
- if (test_bit(NETFS_RREQ_WRITE_TO_CACHE, &wreq->flags))
+ if (is_buffered && netfs_is_cache_enabled(ictx))
fscache_begin_write_operation(&wreq->cache_resources, netfs_i_cookie(ictx));
wreq->contiguity = wreq->start;
diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index 7202ce84d0eb..bf29a65c5027 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -265,6 +265,8 @@ static int nfs_netfs_init_request(struct netfs_io_request *rreq, struct file *fi
{
rreq->netfs_priv = get_nfs_open_context(nfs_file_open_context(file));
rreq->debug_id = atomic_inc_return(&nfs_netfs_debug_id);
+ /* [DEPRECATED] Use PG_private_2 to mark folio being written to the cache. */
+ __set_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags);
return 0;
}
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index fbed0027996f..e8adae1bc260 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -81,8 +81,6 @@ static inline void nfs_netfs_put(struct nfs_netfs_io_data *netfs)
static inline void nfs_netfs_inode_init(struct nfs_inode *nfsi)
{
netfs_inode_init(&nfsi->netfs, &nfs_netfs_ops, false);
- /* [DEPRECATED] Use PG_private_2 to mark folio being written to the cache. */
- __set_bit(NETFS_ICTX_USE_PGPRIV2, &nfsi->netfs.flags);
}
extern void nfs_netfs_initiate_read(struct nfs_pgio_header *hdr);
extern void nfs_netfs_read_completion(struct nfs_pgio_header *hdr);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 5d0288938cc2..983816608f15 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -73,8 +73,6 @@ struct netfs_inode {
#define NETFS_ICTX_ODIRECT 0 /* The file has DIO in progress */
#define NETFS_ICTX_UNBUFFERED 1 /* I/O should not use the pagecache */
#define NETFS_ICTX_WRITETHROUGH 2 /* Write-through caching */
-#define NETFS_ICTX_USE_PGPRIV2 31 /* [DEPRECATED] Use PG_private_2 to mark
- * write to cache on read */
};
/*
@@ -269,7 +267,6 @@ struct netfs_io_request {
#define NETFS_RREQ_DONT_UNLOCK_FOLIOS 3 /* Don't unlock the folios on completion */
#define NETFS_RREQ_FAILED 4 /* The request failed */
#define NETFS_RREQ_IN_PROGRESS 5 /* Unlocked when the request completes */
-#define NETFS_RREQ_WRITE_TO_CACHE 7 /* Need to write to the cache */
#define NETFS_RREQ_UPLOAD_TO_SERVER 8 /* Need to write to the server */
#define NETFS_RREQ_NONBLOCK 9 /* Don't block if possible (O_NONBLOCK) */
#define NETFS_RREQ_BLOCKED 10 /* We blocked */
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 24ec3434d32e..606b4a0f92da 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -51,6 +51,7 @@
EM(netfs_rreq_trace_resubmit, "RESUBMT") \
EM(netfs_rreq_trace_set_pause, "PAUSE ") \
EM(netfs_rreq_trace_unlock, "UNLOCK ") \
+ EM(netfs_rreq_trace_unlock_pgpriv2, "UNLCK-2") \
EM(netfs_rreq_trace_unmark, "UNMARK ") \
EM(netfs_rreq_trace_wait_ip, "WAIT-IP") \
EM(netfs_rreq_trace_wait_pause, "WT-PAUS") \
^ permalink raw reply related [flat|nested] 17+ messages in thread