* [PATCH 0/1] Fix potential oops in nfs_inode_remove_request()
@ 2023-07-25 15:08 Scott Mayhew
2023-07-25 15:08 ` [PATCH 1/1] NFS: " Scott Mayhew
0 siblings, 1 reply; 9+ messages in thread
From: Scott Mayhew @ 2023-07-25 15:08 UTC (permalink / raw)
To: trond.myklebust, anna; +Cc: linux-nfs
IBM reported a panic in nfs_inode_remove_request() while running LTP
(based on the NFS mounts and the processes in the vmcore, I'm pretty
sure it was in the nfslock3t_01 test):
BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc008000005015600
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: loop dummy can_bcm can vcan can_dev veth vsock_loopback vmw_vsock_virtio_transport_common vsock tun tcp_dctcp nft_limit xt_limit xt_multiport xt_LOG nf_log_syslog n_hdlc sch_netem tcp_bbr overlay exfat vfat fat squashfs brd uinput snd_hrtimer snd_seq snd_timer snd_seq_device snd soundcore sctp_diag sctp xt_sctp act_tunnel_key nft_tunnel tunnel6 ip6_udp_tunnel xfrm4_tunnel udp_tunnel tunnel4 ip_tunnel xt_dccp xfrm_ipcomp nft_xfrm ext4 mbcache jbd2 nft_counter nft_compat nf_tables nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver rpadlpar_io rpaphp xsk_diag rpcrdma rdma_cm iw_cm ib_cm nfsv3 ib_core nfs bonding fscache netfs tls rfkill binfmt_misc pseries_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc drm drm_panel_orientation_quirks xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc ibmvnic scsi_transport_fc vmx_crypto pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: ltp_fw_load]
CPU: 0 PID: 3878459 Comm: kworker/u128:7 Tainted: G OE ------- --- 5.14.0-316.el9.ppc64le #1
Workqueue: nfsiod rpc_async_release [sunrpc]
NIP: c008000005015600 LR: c008000005015598 CTR: c000000000f6d9d0
REGS: c0000001aecd3840 TRAP: 0300 Tainted: G OE ------- --- (5.14.0-316.el9.ppc64le)
MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24002820 XER: 00000001
CFAR: c0080000050155bc DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0
GPR00: c008000005015598 c0000001aecd3ae0 c008000005069000 0000000000000000
GPR04: 0000000000000001 0000000000000000 0000000000000000 0000000000000005
GPR08: 0000000000000001 0000000000000000 0000000000000002 c008000005033420
GPR12: c000000000f6d9d0 c000000002eb0000 c0000000001876a8 c000000150c7e200
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 c000000002c1d238 0000000000000000 c000000002650e40
GPR24: c00000008a941b74 0000000000000000 c0000006774cafd0 c0000006774cafe0
GPR28: c008000001ee26f0 c00c000001d49580 c0000006774cae00 c00000016fb2a600
NIP [c008000005015600] nfs_inode_remove_request+0x1c8/0x240 [nfs]
LR [c008000005015598] nfs_inode_remove_request+0x160/0x240 [nfs]
Call Trace:
[c0000001aecd3ae0] [c008000005015464] nfs_inode_remove_request+0x2c/0x240 [nfs] (unreliable)
[c0000001aecd3b30] [c008000005015d50] nfs_commit_release_pages+0x298/0x410 [nfs]
[c0000001aecd3c00] [c008000005013250] nfs_commit_release+0x38/0x80 [nfs]
[c0000001aecd3c30] [c008000001e53410] rpc_free_task+0x58/0xc0 [sunrpc]
[c0000001aecd3c60] [c008000001e534b8] rpc_async_release+0x40/0x70 [sunrpc]
[c0000001aecd3c90] [c00000000017a0b8] process_one_work+0x298/0x580
[c0000001aecd3d30] [c00000000017aa48] worker_thread+0xa8/0x620
[c0000001aecd3dc0] [c0000000001877c4] kthread+0x124/0x130
[c0000001aecd3e10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
Instruction dump:
e8410018 4800002c 60000000 60000000 e9230008 3949ffff 71290001 7c63509e
e9230000 75290008 4082ffc8 e8630018 <e9230000> 3929ff38 7d4048a8 314affff
---[ end trace 2902c6562da6500b ]---
In the vmcore, I can see that the folio->mapping was NULL:
crash> nfs_page.wb_folio c00000016fb2a600
wb_folio = 0xc00c000001d49580
crash> folio.mapping 0xc00c000001d49580
mapping = 0x0,
Note that I have *not* been able to reproduce the panic myself, but it
looks to me like once nfs_inode_remove_request() calls
folio_clear_private(), another process can come in and clobber the
folio->mapping, for example:
nfs_file_write
nfs_clear_invalid_mapping
nfs_invalidate_mapping
invalidate_inode_pages2
invalidate_inode_pages2_range
invalidate_complete_folio2
__filemap_remove_folio
or:
drop_pagecache_sb
invalidate_mapping_pages
invalidate_mapping_pagevec
mapping_evict_folio
remove_mapping
__remove_mapping
__filemap_remove_folio
So the nfs_inode pointer should probably be obtained at the beginning of
nfs_inode_remove_request(), which is what the function did before commit
0c493b5cf16e.
Scott Mayhew (1):
NFS: Fix potential oops in nfs_inode_remove_request()
fs/nfs/write.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--
2.41.0
* [PATCH 1/1] NFS: Fix potential oops in nfs_inode_remove_request()
2023-07-25 15:08 [PATCH 0/1] Fix potential oops in nfs_inode_remove_request() Scott Mayhew
@ 2023-07-25 15:08 ` Scott Mayhew
2023-07-25 15:14 ` Trond Myklebust
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Scott Mayhew @ 2023-07-25 15:08 UTC (permalink / raw)
To: trond.myklebust, anna; +Cc: linux-nfs
Once a folio's private data has been cleared, it's possible for another
process to clear the folio->mapping (e.g. via invalidate_complete_folio2
or evict_mapping_folio), so it wouldn't be safe to call
nfs_page_to_inode() after that.
Fixes: 0c493b5cf16e ("NFS: Convert buffered writes to use folios")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
---
fs/nfs/write.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index f4cca8f00c0c..489c3f9dae23 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -785,6 +785,8 @@ static void nfs_inode_add_request(struct nfs_page *req)
*/
static void nfs_inode_remove_request(struct nfs_page *req)
{
+ struct nfs_inode *nfsi = NFS_I(nfs_page_to_inode(req));
+
if (nfs_page_group_sync_on_bit(req, PG_REMOVE)) {
struct folio *folio = nfs_page_to_folio(req->wb_head);
struct address_space *mapping = folio_file_mapping(folio);
@@ -800,7 +802,7 @@ static void nfs_inode_remove_request(struct nfs_page *req)
if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
nfs_release_request(req);
- atomic_long_dec(&NFS_I(nfs_page_to_inode(req))->nrequests);
+ atomic_long_dec(&nfsi->nrequests);
}
}
--
2.41.0
* Re: [PATCH 1/1] NFS: Fix potential oops in nfs_inode_remove_request()
2023-07-25 15:08 ` [PATCH 1/1] NFS: " Scott Mayhew
@ 2023-07-25 15:14 ` Trond Myklebust
2023-07-25 16:24 ` Scott Mayhew
2023-10-02 20:16 ` Benjamin Coddington
2023-10-02 20:29 ` Jeff Layton
2 siblings, 1 reply; 9+ messages in thread
From: Trond Myklebust @ 2023-07-25 15:14 UTC (permalink / raw)
To: anna@kernel.org, smayhew@redhat.com; +Cc: linux-nfs@vger.kernel.org
On Tue, 2023-07-25 at 11:08 -0400, Scott Mayhew wrote:
> Once a folio's private data has been cleared, it's possible for
> another
> process to clear the folio->mapping (e.g. via
> invalidate_complete_folio2
> or evict_mapping_folio), so it wouldn't be safe to call
> nfs_page_to_inode() after that.
>
> Fixes: 0c493b5cf16e ("NFS: Convert buffered writes to use folios")
> Signed-off-by: Scott Mayhew <smayhew@redhat.com>
> ---
> fs/nfs/write.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index f4cca8f00c0c..489c3f9dae23 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -785,6 +785,8 @@ static void nfs_inode_add_request(struct nfs_page
> *req)
> */
> static void nfs_inode_remove_request(struct nfs_page *req)
> {
> + struct nfs_inode *nfsi = NFS_I(nfs_page_to_inode(req));
> +
> if (nfs_page_group_sync_on_bit(req, PG_REMOVE)) {
> struct folio *folio = nfs_page_to_folio(req-
> >wb_head);
> struct address_space *mapping =
> folio_file_mapping(folio);
> @@ -800,7 +802,7 @@ static void nfs_inode_remove_request(struct
> nfs_page *req)
>
> if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
> nfs_release_request(req);
> - atomic_long_dec(&NFS_I(nfs_page_to_inode(req))-
> >nrequests);
> + atomic_long_dec(&nfsi->nrequests);
Why not just invert the order of the atomic_long_dec() and the
nfs_release_request()? That way you are also ensuring that the inode is
still pinned in memory by the open context.
> }
> }
>
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
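For reference, the reordering Trond describes would look something like
the following (illustrative sketch only, not a tested patch). As the
follow-up messages discuss, this guarantees the inode itself stays
pinned by the open context until the request is released, though it
does not by itself keep folio->mapping populated:

```c
	if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
		/* Decrement first, while the request still pins the
		 * open context and hence the inode. */
		atomic_long_dec(&NFS_I(nfs_page_to_inode(req))->nrequests);
		nfs_release_request(req);
	}
```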
* Re: [PATCH 1/1] NFS: Fix potential oops in nfs_inode_remove_request()
2023-07-25 15:14 ` Trond Myklebust
@ 2023-07-25 16:24 ` Scott Mayhew
2023-07-25 17:41 ` Trond Myklebust
0 siblings, 1 reply; 9+ messages in thread
From: Scott Mayhew @ 2023-07-25 16:24 UTC (permalink / raw)
To: Trond Myklebust; +Cc: anna@kernel.org, linux-nfs@vger.kernel.org
On Tue, 25 Jul 2023, Trond Myklebust wrote:
> On Tue, 2023-07-25 at 11:08 -0400, Scott Mayhew wrote:
> > Once a folio's private data has been cleared, it's possible for
> > another
> > process to clear the folio->mapping (e.g. via
> > invalidate_complete_folio2
> > or evict_mapping_folio), so it wouldn't be safe to call
> > nfs_page_to_inode() after that.
> >
> > Fixes: 0c493b5cf16e ("NFS: Convert buffered writes to use folios")
> > Signed-off-by: Scott Mayhew <smayhew@redhat.com>
> > ---
> > fs/nfs/write.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > index f4cca8f00c0c..489c3f9dae23 100644
> > --- a/fs/nfs/write.c
> > +++ b/fs/nfs/write.c
> > @@ -785,6 +785,8 @@ static void nfs_inode_add_request(struct nfs_page
> > *req)
> > */
> > static void nfs_inode_remove_request(struct nfs_page *req)
> > {
> > + struct nfs_inode *nfsi = NFS_I(nfs_page_to_inode(req));
> > +
> > if (nfs_page_group_sync_on_bit(req, PG_REMOVE)) {
> > struct folio *folio = nfs_page_to_folio(req-
> > >wb_head);
> > struct address_space *mapping =
> > folio_file_mapping(folio);
> > @@ -800,7 +802,7 @@ static void nfs_inode_remove_request(struct
> > nfs_page *req)
> >
> > if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
> > nfs_release_request(req);
> > - atomic_long_dec(&NFS_I(nfs_page_to_inode(req))-
> > >nrequests);
> > + atomic_long_dec(&nfsi->nrequests);
>
> Why not just invert the order of the atomic_long_dec() and the
> nfs_release_request()? That way you are also ensuring that the inode is
> still pinned in memory by the open context.
I'm not following. How does inverting the order prevent the
folio->mapping from getting clobbered?
-Scott
> > }
> > }
> >
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
>
>
* Re: [PATCH 1/1] NFS: Fix potential oops in nfs_inode_remove_request()
2023-07-25 16:24 ` Scott Mayhew
@ 2023-07-25 17:41 ` Trond Myklebust
2023-07-26 12:40 ` Jeff Layton
0 siblings, 1 reply; 9+ messages in thread
From: Trond Myklebust @ 2023-07-25 17:41 UTC (permalink / raw)
To: smayhew@redhat.com; +Cc: anna@kernel.org, linux-nfs@vger.kernel.org
On Tue, 2023-07-25 at 12:24 -0400, Scott Mayhew wrote:
> On Tue, 25 Jul 2023, Trond Myklebust wrote:
>
> > On Tue, 2023-07-25 at 11:08 -0400, Scott Mayhew wrote:
> > > Once a folio's private data has been cleared, it's possible for
> > > another
> > > process to clear the folio->mapping (e.g. via
> > > invalidate_complete_folio2
> > > or evict_mapping_folio), so it wouldn't be safe to call
> > > nfs_page_to_inode() after that.
> > >
> > > Fixes: 0c493b5cf16e ("NFS: Convert buffered writes to use
> > > folios")
> > > Signed-off-by: Scott Mayhew <smayhew@redhat.com>
> > > ---
> > > fs/nfs/write.c | 4 +++-
> > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > > index f4cca8f00c0c..489c3f9dae23 100644
> > > --- a/fs/nfs/write.c
> > > +++ b/fs/nfs/write.c
> > > @@ -785,6 +785,8 @@ static void nfs_inode_add_request(struct
> > > nfs_page
> > > *req)
> > > */
> > > static void nfs_inode_remove_request(struct nfs_page *req)
> > > {
> > > + struct nfs_inode *nfsi = NFS_I(nfs_page_to_inode(req));
> > > +
> > > if (nfs_page_group_sync_on_bit(req, PG_REMOVE)) {
> > > struct folio *folio = nfs_page_to_folio(req-
> > > > wb_head);
> > > struct address_space *mapping =
> > > folio_file_mapping(folio);
> > > @@ -800,7 +802,7 @@ static void nfs_inode_remove_request(struct
> > > nfs_page *req)
> > >
> > > if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
> > > nfs_release_request(req);
> > > - atomic_long_dec(&NFS_I(nfs_page_to_inode(req))-
> > > > nrequests);
> > > + atomic_long_dec(&nfsi->nrequests);
> >
> > Why not just invert the order of the atomic_long_dec() and the
> > nfs_release_request()? That way you are also ensuring that the
> > inode is
> > still pinned in memory by the open context.
>
> I'm not following. How does inverting the order prevent the
> folio->mapping from getting clobbered?
>
The open/lock context is refcounted by the nfs_page until the latter is
released. That's why the inode is guaranteed to remain around at least
until the call to nfs_release_request().
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
* Re: [PATCH 1/1] NFS: Fix potential oops in nfs_inode_remove_request()
2023-07-25 17:41 ` Trond Myklebust
@ 2023-07-26 12:40 ` Jeff Layton
2023-07-26 13:43 ` Scott Mayhew
0 siblings, 1 reply; 9+ messages in thread
From: Jeff Layton @ 2023-07-26 12:40 UTC (permalink / raw)
To: Trond Myklebust, smayhew@redhat.com
Cc: anna@kernel.org, linux-nfs@vger.kernel.org
On Tue, 2023-07-25 at 17:41 +0000, Trond Myklebust wrote:
> On Tue, 2023-07-25 at 12:24 -0400, Scott Mayhew wrote:
> > On Tue, 25 Jul 2023, Trond Myklebust wrote:
> >
> > > On Tue, 2023-07-25 at 11:08 -0400, Scott Mayhew wrote:
> > > > Once a folio's private data has been cleared, it's possible for
> > > > another
> > > > process to clear the folio->mapping (e.g. via
> > > > invalidate_complete_folio2
> > > > or evict_mapping_folio), so it wouldn't be safe to call
> > > > nfs_page_to_inode() after that.
> > > >
> > > > Fixes: 0c493b5cf16e ("NFS: Convert buffered writes to use
> > > > folios")
> > > > Signed-off-by: Scott Mayhew <smayhew@redhat.com>
> > > > ---
> > > > fs/nfs/write.c | 4 +++-
> > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > > > index f4cca8f00c0c..489c3f9dae23 100644
> > > > --- a/fs/nfs/write.c
> > > > +++ b/fs/nfs/write.c
> > > > @@ -785,6 +785,8 @@ static void nfs_inode_add_request(struct
> > > > nfs_page
> > > > *req)
> > > > */
> > > > static void nfs_inode_remove_request(struct nfs_page *req)
> > > > {
> > > > + struct nfs_inode *nfsi = NFS_I(nfs_page_to_inode(req));
> > > > +
> > > > if (nfs_page_group_sync_on_bit(req, PG_REMOVE)) {
> > > > struct folio *folio = nfs_page_to_folio(req-
> > > > > wb_head);
> > > > struct address_space *mapping =
> > > > folio_file_mapping(folio);
> > > > @@ -800,7 +802,7 @@ static void nfs_inode_remove_request(struct
> > > > nfs_page *req)
> > > >
> > > > if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
> > > > nfs_release_request(req);
> > > > - atomic_long_dec(&NFS_I(nfs_page_to_inode(req))-
> > > > > nrequests);
> > > > + atomic_long_dec(&nfsi->nrequests);
> > >
> > > Why not just invert the order of the atomic_long_dec() and the
> > > nfs_release_request()? That way you are also ensuring that the
> > > inode is
> > > still pinned in memory by the open context.
> >
> > I'm not following. How does inverting the order prevent the
> > folio->mapping from getting clobbered?
> >
>
> The open/lock context is refcounted by the nfs_page until the latter is
> released. That's why the inode is guaranteed to remain around at least
> until the call to nfs_release_request().
>
The problem is not that the inode is going away, but rather that we
can't guarantee that the page is still part of the mapping at this
point, and so we can't safely dereference page->mapping there. I do see
that nfs_release_request releases a reference to the page, but I don't
think that's sufficient to ensure that it remains part of the mapping.
AFAICT, once we clear page->private, the page is subject to be removed
from the mapping. So, I *think* it's safe to just move the call to
nfs_page_to_inode prior to the call to nfs_page_group_sync_on_bit.
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH 1/1] NFS: Fix potential oops in nfs_inode_remove_request()
2023-07-26 12:40 ` Jeff Layton
@ 2023-07-26 13:43 ` Scott Mayhew
0 siblings, 0 replies; 9+ messages in thread
From: Scott Mayhew @ 2023-07-26 13:43 UTC (permalink / raw)
To: Jeff Layton; +Cc: Trond Myklebust, anna@kernel.org, linux-nfs@vger.kernel.org
On Wed, 26 Jul 2023, Jeff Layton wrote:
> On Tue, 2023-07-25 at 17:41 +0000, Trond Myklebust wrote:
> > On Tue, 2023-07-25 at 12:24 -0400, Scott Mayhew wrote:
> > > On Tue, 25 Jul 2023, Trond Myklebust wrote:
> > >
> > > > On Tue, 2023-07-25 at 11:08 -0400, Scott Mayhew wrote:
> > > > > Once a folio's private data has been cleared, it's possible for
> > > > > another
> > > > > process to clear the folio->mapping (e.g. via
> > > > > invalidate_complete_folio2
> > > > > or evict_mapping_folio), so it wouldn't be safe to call
> > > > > nfs_page_to_inode() after that.
> > > > >
> > > > > Fixes: 0c493b5cf16e ("NFS: Convert buffered writes to use
> > > > > folios")
> > > > > Signed-off-by: Scott Mayhew <smayhew@redhat.com>
> > > > > ---
> > > > > fs/nfs/write.c | 4 +++-
> > > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > > > > index f4cca8f00c0c..489c3f9dae23 100644
> > > > > --- a/fs/nfs/write.c
> > > > > +++ b/fs/nfs/write.c
> > > > > @@ -785,6 +785,8 @@ static void nfs_inode_add_request(struct
> > > > > nfs_page
> > > > > *req)
> > > > > */
> > > > > static void nfs_inode_remove_request(struct nfs_page *req)
> > > > > {
> > > > > + struct nfs_inode *nfsi = NFS_I(nfs_page_to_inode(req));
> > > > > +
> > > > > if (nfs_page_group_sync_on_bit(req, PG_REMOVE)) {
> > > > > struct folio *folio = nfs_page_to_folio(req-
> > > > > > wb_head);
> > > > > struct address_space *mapping =
> > > > > folio_file_mapping(folio);
> > > > > @@ -800,7 +802,7 @@ static void nfs_inode_remove_request(struct
> > > > > nfs_page *req)
> > > > >
> > > > > if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
> > > > > nfs_release_request(req);
> > > > > - atomic_long_dec(&NFS_I(nfs_page_to_inode(req))-
> > > > > > nrequests);
> > > > > + atomic_long_dec(&nfsi->nrequests);
> > > >
> > > > Why not just invert the order of the atomic_long_dec() and the
> > > > nfs_release_request()? That way you are also ensuring that the
> > > > inode is
> > > > still pinned in memory by the open context.
> > >
> > > I'm not following. How does inverting the order prevent the
> > > folio->mapping from getting clobbered?
> > >
> >
> > The open/lock context is refcounted by the nfs_page until the latter is
> > released. That's why the inode is guaranteed to remain around at least
> > until the call to nfs_release_request().
> >
>
> The problem is not that the inode is going away, but rather that we
> can't guarantee that the page is still part of the mapping at this
> point, and so we can't safely dereference page->mapping there. I do see
> that nfs_release_request releases a reference to the page, but I don't
> think that's sufficient to ensure that it remains part of the mapping.
>
> AFAICT, once we clear page->private, the page is subject to be removed
> from the mapping. So, I *think* it's safe to just move the call to
> nfs_page_to_inode prior to the call to nfs_page_group_sync_on_bit.
Yeah, the inode hasn't gone away. I can pick the nfs_commit_data
address off the stack in nfs_commit_release_pages:
crash> nfs_commit_data.inode c0000006774cae00
inode = 0xc00000006c1b05f8,
The nfs_inode is still allocated:
crash> kmem 0xc00000006c1b05f8
CACHE OBJSIZE ALLOCATED TOTAL SLABS SSIZE NAME
c000000030332600 1088 128 5959 101 64k nfs_inode_cache
SLAB MEMORY NODE TOTAL ALLOCATED FREE
c00c0000001b06c0 c00000006c1b0000 0 59 1 58
FREE / [ALLOCATED]
[c00000006c1b0448]
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
c00c0000001b06c0 6c1b0000 c000000030332600 c00000006c1b4480 1 23ffff800000200 slab
The vfs_inode:
crash> px &((struct nfs_inode *)0xc00000006c1b0448)->vfs_inode
$7 = (struct inode *) 0xc00000006c1b05f8
Matches the inodes open by both nfs_flock programs from the test:
crash> foreach nfs_flock files
PID: 4006780 TASK: c00000009d436600 CPU: 43 COMMAND: "nfs_flock"
ROOT: / CWD: /tmp/ltp-aFr4AJt3R9/LTP_nfslock01.9hyHNgoKqq/3/0/
FD FILE DENTRY INODE TYPE PATH
0 c000000196e9a000 c000000004090840 c00000000ae13bf0 CHR /dev/null
1 c0000000bfd1ff00 c0000000963e0e40 c00000006c573900 REG /opt/ltp/output/nfslock01.sh_20230610112802
2 c0000000bfd1ff00 c0000000963e0e40 c00000006c573900 REG /opt/ltp/output/nfslock01.sh_20230610112802
3 c000000196e97700 c000000419ccb040 c00000006c1b05f8 REG /tmp/ltp-aFr4AJt3R9/LTP_nfslock01.9hyHNgoKqq/3/0/flock_idata
^^^^^^^^^^^^^^^^
PID: 4006781 TASK: c00000009d42d500 CPU: 42 COMMAND: "nfs_flock"
ROOT: / CWD: /tmp/ltp-aFr4AJt3R9/LTP_nfslock01.9hyHNgoKqq/3/0/
FD FILE DENTRY INODE TYPE PATH
0 c0000000f0812200 c000000004090840 c00000000ae13bf0 CHR /dev/null
1 c0000000bfd1ff00 c0000000963e0e40 c00000006c573900 REG /opt/ltp/output/nfslock01.sh_20230610112802
2 c0000000bfd1ff00 c0000000963e0e40 c00000006c573900 REG /opt/ltp/output/nfslock01.sh_20230610112802
3 c0000000f0813c00 c000000419ccb040 c00000006c1b05f8 REG /tmp/ltp-aFr4AJt3R9/LTP_nfslock01.9hyHNgoKqq/3/0/flock_idata
^^^^^^^^^^^^^^^^
The file->f_mapping for both struct files matches the inode->i_data:
crash> file.f_mapping c000000196e97700
f_mapping = 0xc00000006c1b0770,
crash> file.f_mapping c0000000f0813c00
f_mapping = 0xc00000006c1b0770,
crash> px &((struct inode *)0xc00000006c1b05f8)->i_data
$8 = (struct address_space *) 0xc00000006c1b0770
and if I look at one of those nfs_flock tasks, the folio
passed in to nfs_read_folio has the same mapping:
crash> bt 4006781
PID: 4006781 TASK: c00000009d42d500 CPU: 42 COMMAND: "nfs_flock"
#0 [c000000177053710] __schedule at c000000000f61d9c
#1 [c0000001770537d0] schedule at c000000000f621f4
#2 [c000000177053840] io_schedule at c000000000f62354
#3 [c000000177053870] folio_wait_bit_common at c00000000042dc60
#4 [c000000177053970] nfs_read_folio at c0080000050108a8 [nfs]
#5 [c000000177053a60] nfs_write_begin at c008000004fff06c [nfs]
#6 [c000000177053b10] generic_perform_write at c00000000042b044
#7 [c000000177053bc0] nfs_file_write at c008000004ffda08 [nfs]
#8 [c000000177053c60] new_sync_write at c00000000057fdd8
#9 [c000000177053d10] vfs_write at c000000000582fd4
#10 [c000000177053d60] ksys_write at c0000000005833a4
#11 [c000000177053db0] system_call_exception at c00000000002f434
#12 [c000000177053e10] system_call_vectored_common at c00000000000bfe8
crash> folio.mapping c00c000000564400
mapping = 0xc00000006c1b0770,
It's just that if we go back to the nfs_page being released by our panic
task, the folio->mapping has been cleared, so we panic when we try to
follow folio->mapping->host.
crash> nfs_page c00000016fb2a600
struct nfs_page {
wb_list = {
next = 0xc00000016fb2a600,
prev = 0xc00000016fb2a600
},
{
wb_page = 0xc00c000001d49580,
wb_folio = 0xc00c000001d49580
},
wb_lock_context = 0xc00000010518b2c0,
wb_index = 0x1,
wb_offset = 0x6940,
wb_pgbase = 0x6940,
wb_bytes = 0x40,
wb_kref = {
refcount = {
refs = {
counter = 0x1
}
}
},
wb_flags = 0x5,
wb_verf = {
data = "\214\205_d\214\210W\036"
},
wb_this_page = 0xc00000016fb2a600,
wb_head = 0xc00000016fb2a600,
wb_nio = 0x0
}
crash> folio.mapping 0xc00c000001d49580
mapping = 0x0,
-Scott
> --
> Jeff Layton <jlayton@kernel.org>
>
* Re: [PATCH 1/1] NFS: Fix potential oops in nfs_inode_remove_request()
2023-07-25 15:08 ` [PATCH 1/1] NFS: " Scott Mayhew
2023-07-25 15:14 ` Trond Myklebust
@ 2023-10-02 20:16 ` Benjamin Coddington
2023-10-02 20:29 ` Jeff Layton
2 siblings, 0 replies; 9+ messages in thread
From: Benjamin Coddington @ 2023-10-02 20:16 UTC (permalink / raw)
To: Scott Mayhew; +Cc: trond.myklebust, anna, linux-nfs
On 25 Jul 2023, at 11:08, Scott Mayhew wrote:
> Once a folio's private data has been cleared, it's possible for another
> process to clear the folio->mapping (e.g. via invalidate_complete_folio2
> or evict_mapping_folio), so it wouldn't be safe to call
> nfs_page_to_inode() after that.
>
> Fixes: 0c493b5cf16e ("NFS: Convert buffered writes to use folios")
> Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Ben
* Re: [PATCH 1/1] NFS: Fix potential oops in nfs_inode_remove_request()
2023-07-25 15:08 ` [PATCH 1/1] NFS: " Scott Mayhew
2023-07-25 15:14 ` Trond Myklebust
2023-10-02 20:16 ` Benjamin Coddington
@ 2023-10-02 20:29 ` Jeff Layton
2 siblings, 0 replies; 9+ messages in thread
From: Jeff Layton @ 2023-10-02 20:29 UTC (permalink / raw)
To: Scott Mayhew, trond.myklebust, anna; +Cc: linux-nfs
On Tue, 2023-07-25 at 11:08 -0400, Scott Mayhew wrote:
> Once a folio's private data has been cleared, it's possible for another
> process to clear the folio->mapping (e.g. via invalidate_complete_folio2
> or evict_mapping_folio), so it wouldn't be safe to call
> nfs_page_to_inode() after that.
>
> Fixes: 0c493b5cf16e ("NFS: Convert buffered writes to use folios")
> Signed-off-by: Scott Mayhew <smayhew@redhat.com>
> ---
> fs/nfs/write.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index f4cca8f00c0c..489c3f9dae23 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -785,6 +785,8 @@ static void nfs_inode_add_request(struct nfs_page *req)
> */
> static void nfs_inode_remove_request(struct nfs_page *req)
> {
> + struct nfs_inode *nfsi = NFS_I(nfs_page_to_inode(req));
> +
> if (nfs_page_group_sync_on_bit(req, PG_REMOVE)) {
> struct folio *folio = nfs_page_to_folio(req->wb_head);
> struct address_space *mapping = folio_file_mapping(folio);
> @@ -800,7 +802,7 @@ static void nfs_inode_remove_request(struct nfs_page *req)
>
> if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
> nfs_release_request(req);
> - atomic_long_dec(&NFS_I(nfs_page_to_inode(req))->nrequests);
> + atomic_long_dec(&nfsi->nrequests);
> }
> }
>
Reviewed-by: Jeff Layton <jlayton@kernel.org>