linux-nfs.vger.kernel.org archive mirror
* [PATCH 0/1] Fix potential oops in nfs_inode_remove_request()
@ 2023-07-25 15:08 Scott Mayhew
  2023-07-25 15:08 ` [PATCH 1/1] NFS: " Scott Mayhew
  0 siblings, 1 reply; 9+ messages in thread
From: Scott Mayhew @ 2023-07-25 15:08 UTC (permalink / raw)
  To: trond.myklebust, anna; +Cc: linux-nfs

IBM reported a panic in nfs_inode_remove_request() while running LTP
(based on the NFS mounts and the processes in the vmcore I'm pretty
sure it was in the nfslock3t_01 test):

BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc008000005015600
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: loop dummy can_bcm can vcan can_dev veth vsock_loopback vmw_vsock_virtio_transport_common vsock tun tcp_dctcp nft_limit xt_limit xt_multiport xt_LOG nf_log_syslog n_hdlc sch_netem tcp_bbr overlay exfat vfat fat squashfs brd uinput snd_hrtimer snd_seq snd_timer snd_seq_device snd soundcore sctp_diag sctp xt_sctp act_tunnel_key nft_tunnel tunnel6 ip6_udp_tunnel xfrm4_tunnel udp_tunnel tunnel4 ip_tunnel xt_dccp xfrm_ipcomp nft_xfrm ext4 mbcache jbd2 nft_counter nft_compat nf_tables nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver rpadlpar_io rpaphp xsk_diag rpcrdma rdma_cm iw_cm ib_cm nfsv3 ib_core nfs bonding fscache netfs tls rfkill binfmt_misc pseries_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc drm drm_panel_orientation_quirks xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc ibmvnic scsi_transport_fc vmx_crypto pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: ltp_fw_load]
CPU: 0 PID: 3878459 Comm: kworker/u128:7 Tainted: G           OE     -------  ---  5.14.0-316.el9.ppc64le #1
Workqueue: nfsiod rpc_async_release [sunrpc]
NIP:  c008000005015600 LR: c008000005015598 CTR: c000000000f6d9d0
REGS: c0000001aecd3840 TRAP: 0300   Tainted: G           OE     -------  ---   (5.14.0-316.el9.ppc64le)
MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002820  XER: 00000001
CFAR: c0080000050155bc DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0 
GPR00: c008000005015598 c0000001aecd3ae0 c008000005069000 0000000000000000 
GPR04: 0000000000000001 0000000000000000 0000000000000000 0000000000000005 
GPR08: 0000000000000001 0000000000000000 0000000000000002 c008000005033420 
GPR12: c000000000f6d9d0 c000000002eb0000 c0000000001876a8 c000000150c7e200 
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
GPR20: 0000000000000000 c000000002c1d238 0000000000000000 c000000002650e40 
GPR24: c00000008a941b74 0000000000000000 c0000006774cafd0 c0000006774cafe0 
GPR28: c008000001ee26f0 c00c000001d49580 c0000006774cae00 c00000016fb2a600 
NIP [c008000005015600] nfs_inode_remove_request+0x1c8/0x240 [nfs]
LR [c008000005015598] nfs_inode_remove_request+0x160/0x240 [nfs]
Call Trace:
[c0000001aecd3ae0] [c008000005015464] nfs_inode_remove_request+0x2c/0x240 [nfs] (unreliable)
[c0000001aecd3b30] [c008000005015d50] nfs_commit_release_pages+0x298/0x410 [nfs]
[c0000001aecd3c00] [c008000005013250] nfs_commit_release+0x38/0x80 [nfs]
[c0000001aecd3c30] [c008000001e53410] rpc_free_task+0x58/0xc0 [sunrpc]
[c0000001aecd3c60] [c008000001e534b8] rpc_async_release+0x40/0x70 [sunrpc]
[c0000001aecd3c90] [c00000000017a0b8] process_one_work+0x298/0x580
[c0000001aecd3d30] [c00000000017aa48] worker_thread+0xa8/0x620
[c0000001aecd3dc0] [c0000000001877c4] kthread+0x124/0x130
[c0000001aecd3e10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
Instruction dump:
e8410018 4800002c 60000000 60000000 e9230008 3949ffff 71290001 7c63509e 
e9230000 75290008 4082ffc8 e8630018 <e9230000> 3929ff38 7d4048a8 314affff 
---[ end trace 2902c6562da6500b ]---

In the vmcore, I can see that the folio->mapping was NULL:

crash> nfs_page.wb_folio c00000016fb2a600
    wb_folio = 0xc00c000001d49580
crash> folio.mapping 0xc00c000001d49580
      mapping = 0x0,

Note that I have *not* been able to reproduce the panic myself, but it
looks to me like once nfs_inode_remove_request() calls
folio_clear_private(), another process can come in and clobber the
folio->mapping, for example:

nfs_file_write
  nfs_clear_invalid_mapping
    nfs_invalidate_mapping
      invalidate_inode_pages2
        invalidate_inode_pages2_range
          invalidate_complete_folio2
            __filemap_remove_folio

or:

drop_pagecache_sb
  invalidate_mapping_pages
    invalidate_mapping_pagevec
      mapping_evict_folio
        remove_mapping
          __remove_mapping
            __filemap_remove_folio
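
Putting that together, a rough timeline of the race as I see it (just a
sketch for illustration; I have not traced this in the vmcore):

    nfsiod worker                        racing task (e.g. nfs_file_write)
    -------------                        ---------------------------------
    nfs_inode_remove_request()
      folio_clear_private(folio)
                                         invalidate_complete_folio2()
                                           (no private data left, so)
                                           __filemap_remove_folio()
                                             folio->mapping = NULL
      nfs_page_to_inode(req)
        reads folio->mapping->host   <-- NULL pointer dereference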

So the nfs_inode pointer should probably be saved at the beginning of
nfs_inode_remove_request(), which is what the code did before commit
0c493b5cf16e.
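
For reference, before that commit the function started out roughly like
this (paraphrasing the old page-based code from memory, so details may
differ):

static void nfs_inode_remove_request(struct nfs_page *req)
{
	struct address_space *mapping = page_file_mapping(req->wb_page);
	struct inode *inode = mapping->host;
	struct nfs_inode *nfsi = NFS_I(inode);

	/* ... PG_REMOVE handling, which may clear the page's private
	 * data, followed by uses of the already-saved nfsi ... */
}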

Scott Mayhew (1):
  NFS: Fix potential oops in nfs_inode_remove_request()

 fs/nfs/write.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

-- 
2.41.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/1] NFS: Fix potential oops in nfs_inode_remove_request()
  2023-07-25 15:08 [PATCH 0/1] Fix potential oops in nfs_inode_remove_request() Scott Mayhew
@ 2023-07-25 15:08 ` Scott Mayhew
  2023-07-25 15:14   ` Trond Myklebust
                     ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Scott Mayhew @ 2023-07-25 15:08 UTC (permalink / raw)
  To: trond.myklebust, anna; +Cc: linux-nfs

Once a folio's private data has been cleared, it's possible for another
process to clear the folio->mapping (e.g. via invalidate_complete_folio2
or mapping_evict_folio), so it wouldn't be safe to call
nfs_page_to_inode() after that.

Fixes: 0c493b5cf16e ("NFS: Convert buffered writes to use folios")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
---
 fs/nfs/write.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index f4cca8f00c0c..489c3f9dae23 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -785,6 +785,8 @@ static void nfs_inode_add_request(struct nfs_page *req)
  */
 static void nfs_inode_remove_request(struct nfs_page *req)
 {
+	struct nfs_inode *nfsi = NFS_I(nfs_page_to_inode(req));
+
 	if (nfs_page_group_sync_on_bit(req, PG_REMOVE)) {
 		struct folio *folio = nfs_page_to_folio(req->wb_head);
 		struct address_space *mapping = folio_file_mapping(folio);
@@ -800,7 +802,7 @@ static void nfs_inode_remove_request(struct nfs_page *req)
 
 	if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
 		nfs_release_request(req);
-		atomic_long_dec(&NFS_I(nfs_page_to_inode(req))->nrequests);
+		atomic_long_dec(&nfsi->nrequests);
 	}
 }
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] NFS: Fix potential oops in nfs_inode_remove_request()
  2023-07-25 15:08 ` [PATCH 1/1] NFS: " Scott Mayhew
@ 2023-07-25 15:14   ` Trond Myklebust
  2023-07-25 16:24     ` Scott Mayhew
  2023-10-02 20:16   ` Benjamin Coddington
  2023-10-02 20:29   ` Jeff Layton
  2 siblings, 1 reply; 9+ messages in thread
From: Trond Myklebust @ 2023-07-25 15:14 UTC (permalink / raw)
  To: anna@kernel.org, smayhew@redhat.com; +Cc: linux-nfs@vger.kernel.org

On Tue, 2023-07-25 at 11:08 -0400, Scott Mayhew wrote:
> Once a folio's private data has been cleared, it's possible for another
> process to clear the folio->mapping (e.g. via invalidate_complete_folio2
> or mapping_evict_folio), so it wouldn't be safe to call
> nfs_page_to_inode() after that.
> 
> Fixes: 0c493b5cf16e ("NFS: Convert buffered writes to use folios")
> Signed-off-by: Scott Mayhew <smayhew@redhat.com>
> ---
>  fs/nfs/write.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index f4cca8f00c0c..489c3f9dae23 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -785,6 +785,8 @@ static void nfs_inode_add_request(struct nfs_page *req)
>   */
>  static void nfs_inode_remove_request(struct nfs_page *req)
>  {
> +       struct nfs_inode *nfsi = NFS_I(nfs_page_to_inode(req));
> +
>         if (nfs_page_group_sync_on_bit(req, PG_REMOVE)) {
>                 struct folio *folio = nfs_page_to_folio(req->wb_head);
>                 struct address_space *mapping = folio_file_mapping(folio);
> @@ -800,7 +802,7 @@ static void nfs_inode_remove_request(struct nfs_page *req)
>  
>         if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
>                 nfs_release_request(req);
> -               atomic_long_dec(&NFS_I(nfs_page_to_inode(req))->nrequests);
> +               atomic_long_dec(&nfsi->nrequests);

Why not just invert the order of the atomic_long_dec() and the
nfs_release_request()? That way you are also ensuring that the inode is
still pinned in memory by the open context.
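
i.e. something like this (untested, just to illustrate the ordering I
have in mind):

	if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
		/* decrement first, while the request still pins the inode */
		atomic_long_dec(&NFS_I(nfs_page_to_inode(req))->nrequests);
		nfs_release_request(req);
	}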
>         }
>  }
>  

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] NFS: Fix potential oops in nfs_inode_remove_request()
  2023-07-25 15:14   ` Trond Myklebust
@ 2023-07-25 16:24     ` Scott Mayhew
  2023-07-25 17:41       ` Trond Myklebust
  0 siblings, 1 reply; 9+ messages in thread
From: Scott Mayhew @ 2023-07-25 16:24 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: anna@kernel.org, linux-nfs@vger.kernel.org

On Tue, 25 Jul 2023, Trond Myklebust wrote:

> On Tue, 2023-07-25 at 11:08 -0400, Scott Mayhew wrote:
> > Once a folio's private data has been cleared, it's possible for another
> > process to clear the folio->mapping (e.g. via invalidate_complete_folio2
> > or mapping_evict_folio), so it wouldn't be safe to call
> > nfs_page_to_inode() after that.
> > 
> > Fixes: 0c493b5cf16e ("NFS: Convert buffered writes to use folios")
> > Signed-off-by: Scott Mayhew <smayhew@redhat.com>
> > ---
> >  fs/nfs/write.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > index f4cca8f00c0c..489c3f9dae23 100644
> > --- a/fs/nfs/write.c
> > +++ b/fs/nfs/write.c
> > @@ -785,6 +785,8 @@ static void nfs_inode_add_request(struct nfs_page *req)
> >   */
> >  static void nfs_inode_remove_request(struct nfs_page *req)
> >  {
> > +       struct nfs_inode *nfsi = NFS_I(nfs_page_to_inode(req));
> > +
> >         if (nfs_page_group_sync_on_bit(req, PG_REMOVE)) {
> >                 struct folio *folio = nfs_page_to_folio(req->wb_head);
> >                 struct address_space *mapping = folio_file_mapping(folio);
> > @@ -800,7 +802,7 @@ static void nfs_inode_remove_request(struct nfs_page *req)
> >  
> >         if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
> >                 nfs_release_request(req);
> > -               atomic_long_dec(&NFS_I(nfs_page_to_inode(req))->nrequests);
> > +               atomic_long_dec(&nfsi->nrequests);
> 
> Why not just invert the order of the atomic_long_dec() and the
> nfs_release_request()? That way you are also ensuring that the inode is
> still pinned in memory by the open context.

I'm not following.  How does inverting the order prevent the
folio->mapping from getting clobbered?

-Scott
> >         }
> >  }
> >  
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
> 
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] NFS: Fix potential oops in nfs_inode_remove_request()
  2023-07-25 16:24     ` Scott Mayhew
@ 2023-07-25 17:41       ` Trond Myklebust
  2023-07-26 12:40         ` Jeff Layton
  0 siblings, 1 reply; 9+ messages in thread
From: Trond Myklebust @ 2023-07-25 17:41 UTC (permalink / raw)
  To: smayhew@redhat.com; +Cc: anna@kernel.org, linux-nfs@vger.kernel.org

On Tue, 2023-07-25 at 12:24 -0400, Scott Mayhew wrote:
> On Tue, 25 Jul 2023, Trond Myklebust wrote:
> 
> > On Tue, 2023-07-25 at 11:08 -0400, Scott Mayhew wrote:
> > > Once a folio's private data has been cleared, it's possible for another
> > > process to clear the folio->mapping (e.g. via invalidate_complete_folio2
> > > or mapping_evict_folio), so it wouldn't be safe to call
> > > nfs_page_to_inode() after that.
> > > 
> > > Fixes: 0c493b5cf16e ("NFS: Convert buffered writes to use folios")
> > > Signed-off-by: Scott Mayhew <smayhew@redhat.com>
> > > ---
> > >  fs/nfs/write.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > > index f4cca8f00c0c..489c3f9dae23 100644
> > > --- a/fs/nfs/write.c
> > > +++ b/fs/nfs/write.c
> > > @@ -785,6 +785,8 @@ static void nfs_inode_add_request(struct nfs_page *req)
> > >   */
> > >  static void nfs_inode_remove_request(struct nfs_page *req)
> > >  {
> > > +       struct nfs_inode *nfsi = NFS_I(nfs_page_to_inode(req));
> > > +
> > >         if (nfs_page_group_sync_on_bit(req, PG_REMOVE)) {
> > >                 struct folio *folio = nfs_page_to_folio(req->wb_head);
> > >                 struct address_space *mapping = folio_file_mapping(folio);
> > > @@ -800,7 +802,7 @@ static void nfs_inode_remove_request(struct nfs_page *req)
> > >  
> > >         if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
> > >                 nfs_release_request(req);
> > > -               atomic_long_dec(&NFS_I(nfs_page_to_inode(req))->nrequests);
> > > +               atomic_long_dec(&nfsi->nrequests);
> > 
> > Why not just invert the order of the atomic_long_dec() and the
> > nfs_release_request()? That way you are also ensuring that the
> > inode is
> > still pinned in memory by the open context.
> 
> I'm not following.  How does inverting the order prevent the
> folio->mapping from getting clobbered?
> 

The open/lock context is refcounted by the nfs_page until the latter is
released. That's why the inode is guaranteed to remain around at least
until the call to nfs_release_request().
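
Roughly, the release chain is (paraphrasing fs/nfs/pagelist.c and
fs/nfs/inode.c from memory, so treat this as a sketch):

	nfs_release_request(req)
	  -> nfs_free_request(req)               /* on the last reference */
	       nfs_put_lock_context(req->wb_lock_context)
	         -> put_nfs_open_context(l_ctx->open_context)
	              -> dput(ctx->dentry)       /* drops the inode pin */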


-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] NFS: Fix potential oops in nfs_inode_remove_request()
  2023-07-25 17:41       ` Trond Myklebust
@ 2023-07-26 12:40         ` Jeff Layton
  2023-07-26 13:43           ` Scott Mayhew
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff Layton @ 2023-07-26 12:40 UTC (permalink / raw)
  To: Trond Myklebust, smayhew@redhat.com
  Cc: anna@kernel.org, linux-nfs@vger.kernel.org

On Tue, 2023-07-25 at 17:41 +0000, Trond Myklebust wrote:
> On Tue, 2023-07-25 at 12:24 -0400, Scott Mayhew wrote:
> > On Tue, 25 Jul 2023, Trond Myklebust wrote:
> > 
> > > On Tue, 2023-07-25 at 11:08 -0400, Scott Mayhew wrote:
> > > > Once a folio's private data has been cleared, it's possible for another
> > > > process to clear the folio->mapping (e.g. via invalidate_complete_folio2
> > > > or mapping_evict_folio), so it wouldn't be safe to call
> > > > nfs_page_to_inode() after that.
> > > > 
> > > > Fixes: 0c493b5cf16e ("NFS: Convert buffered writes to use folios")
> > > > Signed-off-by: Scott Mayhew <smayhew@redhat.com>
> > > > ---
> > > >  fs/nfs/write.c | 4 +++-
> > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > > > index f4cca8f00c0c..489c3f9dae23 100644
> > > > --- a/fs/nfs/write.c
> > > > +++ b/fs/nfs/write.c
> > > > @@ -785,6 +785,8 @@ static void nfs_inode_add_request(struct nfs_page *req)
> > > >   */
> > > >  static void nfs_inode_remove_request(struct nfs_page *req)
> > > >  {
> > > > +       struct nfs_inode *nfsi = NFS_I(nfs_page_to_inode(req));
> > > > +
> > > >         if (nfs_page_group_sync_on_bit(req, PG_REMOVE)) {
> > > >                 struct folio *folio = nfs_page_to_folio(req->wb_head);
> > > >                 struct address_space *mapping = folio_file_mapping(folio);
> > > > @@ -800,7 +802,7 @@ static void nfs_inode_remove_request(struct nfs_page *req)
> > > >  
> > > >         if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
> > > >                 nfs_release_request(req);
> > > > -               atomic_long_dec(&NFS_I(nfs_page_to_inode(req))->nrequests);
> > > > +               atomic_long_dec(&nfsi->nrequests);
> > > 
> > > Why not just invert the order of the atomic_long_dec() and the
> > > nfs_release_request()? That way you are also ensuring that the
> > > inode is
> > > still pinned in memory by the open context.
> > 
> > I'm not following.  How does inverting the order prevent the
> > folio->mapping from getting clobbered?
> > 
> 
> The open/lock context is refcounted by the nfs_page until the latter is
> released. That's why the inode is guaranteed to remain around at least
> until the  call to nfs_release_request().
> 

The problem is not that the inode is going away, but rather that we
can't guarantee that the page is still part of the mapping at this
point, and so we can't safely dereference page->mapping there. I do see
that nfs_release_request releases a reference to the page, but I don't
think that's sufficient to ensure that it remains part of the mapping.

AFAICT, once we clear page->private, the page is subject to being removed
from the mapping. So, I *think* it's safe to just move the call to
nfs_page_to_inode prior to the call to nfs_page_group_sync_on_bit.
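
Something like this is what I mean (an untested sketch, and essentially
what the posted patch already achieves by saving the nfs_inode up front):

static void nfs_inode_remove_request(struct nfs_page *req)
{
	/* resolve the inode while folio->mapping is still intact */
	struct nfs_inode *nfsi = NFS_I(nfs_page_to_inode(req));

	if (nfs_page_group_sync_on_bit(req, PG_REMOVE)) {
		/* ... existing PG_REMOVE handling, which clears the folio's
		 * private data and lets it drop out of the mapping ... */
	}

	if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
		nfs_release_request(req);
		atomic_long_dec(&nfsi->nrequests);
	}
}
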
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] NFS: Fix potential oops in nfs_inode_remove_request()
  2023-07-26 12:40         ` Jeff Layton
@ 2023-07-26 13:43           ` Scott Mayhew
  0 siblings, 0 replies; 9+ messages in thread
From: Scott Mayhew @ 2023-07-26 13:43 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Trond Myklebust, anna@kernel.org, linux-nfs@vger.kernel.org

On Wed, 26 Jul 2023, Jeff Layton wrote:

> On Tue, 2023-07-25 at 17:41 +0000, Trond Myklebust wrote:
> > On Tue, 2023-07-25 at 12:24 -0400, Scott Mayhew wrote:
> > > On Tue, 25 Jul 2023, Trond Myklebust wrote:
> > > 
> > > > On Tue, 2023-07-25 at 11:08 -0400, Scott Mayhew wrote:
> > > > > Once a folio's private data has been cleared, it's possible for another
> > > > > process to clear the folio->mapping (e.g. via invalidate_complete_folio2
> > > > > or mapping_evict_folio), so it wouldn't be safe to call
> > > > > nfs_page_to_inode() after that.
> > > > > 
> > > > > Fixes: 0c493b5cf16e ("NFS: Convert buffered writes to use folios")
> > > > > Signed-off-by: Scott Mayhew <smayhew@redhat.com>
> > > > > ---
> > > > >  fs/nfs/write.c | 4 +++-
> > > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > > > > index f4cca8f00c0c..489c3f9dae23 100644
> > > > > --- a/fs/nfs/write.c
> > > > > +++ b/fs/nfs/write.c
> > > > > @@ -785,6 +785,8 @@ static void nfs_inode_add_request(struct nfs_page *req)
> > > > >   */
> > > > >  static void nfs_inode_remove_request(struct nfs_page *req)
> > > > >  {
> > > > > +       struct nfs_inode *nfsi = NFS_I(nfs_page_to_inode(req));
> > > > > +
> > > > >         if (nfs_page_group_sync_on_bit(req, PG_REMOVE)) {
> > > > >                 struct folio *folio = nfs_page_to_folio(req->wb_head);
> > > > >                 struct address_space *mapping = folio_file_mapping(folio);
> > > > > @@ -800,7 +802,7 @@ static void nfs_inode_remove_request(struct nfs_page *req)
> > > > >  
> > > > >         if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
> > > > >                 nfs_release_request(req);
> > > > > -               atomic_long_dec(&NFS_I(nfs_page_to_inode(req))->nrequests);
> > > > > +               atomic_long_dec(&nfsi->nrequests);
> > > > 
> > > > Why not just invert the order of the atomic_long_dec() and the
> > > > nfs_release_request()? That way you are also ensuring that the
> > > > inode is
> > > > still pinned in memory by the open context.
> > > 
> > > I'm not following.  How does inverting the order prevent the
> > > folio->mapping from getting clobbered?
> > > 
> > 
> > The open/lock context is refcounted by the nfs_page until the latter is
> > released. That's why the inode is guaranteed to remain around at least
> > until the  call to nfs_release_request().
> > 
> 
> The problem is not that the inode is going away, but rather that we
> can't guarantee that the page is still part of the mapping at this
> point, and so we can't safely dereference page->mapping there. I do see
> that nfs_release_request releases a reference to the page, but I don't
> think that's sufficient to ensure that it remains part of the mapping.
> 
> AFAICT, once we clear page->private, the page is subject to be removed
> from the mapping. So, I *think* it's safe to just move the call to
> nfs_page_to_inode prior to the call to nfs_page_group_sync_on_bit.

Yeah, the inode hasn't gone away.  I can pick the nfs_commit_data
address off the stack in nfs_commit_release_pages:

crash> nfs_commit_data.inode c0000006774cae00
  inode = 0xc00000006c1b05f8,
 
The nfs_inode is still allocated:

crash> kmem 0xc00000006c1b05f8
CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
c000000030332600     1088        128      5959    101    64k  nfs_inode_cache
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  c00c0000001b06c0  c00000006c1b0000     0     59          1    58
  FREE / [ALLOCATED]
  [c00000006c1b0448]

      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
c00c0000001b06c0  6c1b0000 c000000030332600 c00000006c1b4480  1 23ffff800000200 slab

The vfs_inode:

crash> px &((struct nfs_inode *)0xc00000006c1b0448)->vfs_inode
$7 = (struct inode *) 0xc00000006c1b05f8

Matches the inode opened by both nfs_flock programs from the test:

crash> foreach nfs_flock files
PID: 4006780  TASK: c00000009d436600  CPU: 43   COMMAND: "nfs_flock"
ROOT: /    CWD: /tmp/ltp-aFr4AJt3R9/LTP_nfslock01.9hyHNgoKqq/3/0/
 FD       FILE            DENTRY           INODE       TYPE PATH
  0 c000000196e9a000 c000000004090840 c00000000ae13bf0 CHR  /dev/null
  1 c0000000bfd1ff00 c0000000963e0e40 c00000006c573900 REG  /opt/ltp/output/nfslock01.sh_20230610112802
  2 c0000000bfd1ff00 c0000000963e0e40 c00000006c573900 REG  /opt/ltp/output/nfslock01.sh_20230610112802
  3 c000000196e97700 c000000419ccb040 c00000006c1b05f8 REG  /tmp/ltp-aFr4AJt3R9/LTP_nfslock01.9hyHNgoKqq/3/0/flock_idata
                                      ^^^^^^^^^^^^^^^^

PID: 4006781  TASK: c00000009d42d500  CPU: 42   COMMAND: "nfs_flock"
ROOT: /    CWD: /tmp/ltp-aFr4AJt3R9/LTP_nfslock01.9hyHNgoKqq/3/0/
 FD       FILE            DENTRY           INODE       TYPE PATH
  0 c0000000f0812200 c000000004090840 c00000000ae13bf0 CHR  /dev/null
  1 c0000000bfd1ff00 c0000000963e0e40 c00000006c573900 REG  /opt/ltp/output/nfslock01.sh_20230610112802
  2 c0000000bfd1ff00 c0000000963e0e40 c00000006c573900 REG  /opt/ltp/output/nfslock01.sh_20230610112802
  3 c0000000f0813c00 c000000419ccb040 c00000006c1b05f8 REG  /tmp/ltp-aFr4AJt3R9/LTP_nfslock01.9hyHNgoKqq/3/0/flock_idata
                                      ^^^^^^^^^^^^^^^^

The file->f_mapping for both struct files matches the inode->i_data:

crash> file.f_mapping c000000196e97700
  f_mapping = 0xc00000006c1b0770,
crash> file.f_mapping c0000000f0813c00
  f_mapping = 0xc00000006c1b0770,
crash> px &((struct inode *)0xc00000006c1b05f8)->i_data
$8 = (struct address_space *) 0xc00000006c1b0770

and if I look at one of those nfs_flock tasks, the folio
passed in to nfs_read_folio has the same mapping:

crash> bt 4006781
PID: 4006781  TASK: c00000009d42d500  CPU: 42   COMMAND: "nfs_flock"
 #0 [c000000177053710] __schedule at c000000000f61d9c
 #1 [c0000001770537d0] schedule at c000000000f621f4
 #2 [c000000177053840] io_schedule at c000000000f62354
 #3 [c000000177053870] folio_wait_bit_common at c00000000042dc60
 #4 [c000000177053970] nfs_read_folio at c0080000050108a8 [nfs]
 #5 [c000000177053a60] nfs_write_begin at c008000004fff06c [nfs]
 #6 [c000000177053b10] generic_perform_write at c00000000042b044
 #7 [c000000177053bc0] nfs_file_write at c008000004ffda08 [nfs]
 #8 [c000000177053c60] new_sync_write at c00000000057fdd8
 #9 [c000000177053d10] vfs_write at c000000000582fd4
#10 [c000000177053d60] ksys_write at c0000000005833a4
#11 [c000000177053db0] system_call_exception at c00000000002f434
#12 [c000000177053e10] system_call_vectored_common at c00000000000bfe8

crash> folio.mapping c00c000000564400
      mapping = 0xc00000006c1b0770,

It's just that if we go back to the nfs_page being released by our panic
task, the folio->mapping has been cleared, so we panic when we try to
dereference folio->mapping->host:

crash> nfs_page c00000016fb2a600
struct nfs_page {
  wb_list = {
    next = 0xc00000016fb2a600,
    prev = 0xc00000016fb2a600
  },
  {
    wb_page = 0xc00c000001d49580,
    wb_folio = 0xc00c000001d49580
  },
  wb_lock_context = 0xc00000010518b2c0,
  wb_index = 0x1,
  wb_offset = 0x6940,
  wb_pgbase = 0x6940,
  wb_bytes = 0x40,
  wb_kref = {
    refcount = {
      refs = {
        counter = 0x1
      }
    }
  },
  wb_flags = 0x5,
  wb_verf = {
    data = "\214\205_d\214\210W\036"
  },
  wb_this_page = 0xc00000016fb2a600,
  wb_head = 0xc00000016fb2a600,
  wb_nio = 0x0
}
crash> folio.mapping 0xc00c000001d49580
      mapping = 0x0,
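
And nfs_page_to_inode() walks exactly that chain (paraphrasing
fs/nfs/internal.h from memory, so the exact shape may differ slightly):

static inline struct inode *nfs_page_to_inode(const struct nfs_page *req)
{
	struct folio *folio = nfs_page_to_folio(req);

	if (folio == NULL)
		return page_file_mapping(req->wb_page)->host;
	return folio_file_mapping(folio)->host;
}

With folio->mapping == NULL, and ->host being the first field of struct
address_space, that's the read at address 0x00000000 in the oops above.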

-Scott

> -- 
> Jeff Layton <jlayton@kernel.org>
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] NFS: Fix potential oops in nfs_inode_remove_request()
  2023-07-25 15:08 ` [PATCH 1/1] NFS: " Scott Mayhew
  2023-07-25 15:14   ` Trond Myklebust
@ 2023-10-02 20:16   ` Benjamin Coddington
  2023-10-02 20:29   ` Jeff Layton
  2 siblings, 0 replies; 9+ messages in thread
From: Benjamin Coddington @ 2023-10-02 20:16 UTC (permalink / raw)
  To: Scott Mayhew; +Cc: trond.myklebust, anna, linux-nfs

On 25 Jul 2023, at 11:08, Scott Mayhew wrote:

> Once a folio's private data has been cleared, it's possible for another
> process to clear the folio->mapping (e.g. via invalidate_complete_folio2
> or mapping_evict_folio), so it wouldn't be safe to call
> nfs_page_to_inode() after that.
>
> Fixes: 0c493b5cf16e ("NFS: Convert buffered writes to use folios")
> Signed-off-by: Scott Mayhew <smayhew@redhat.com>

Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>

Ben


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] NFS: Fix potential oops in nfs_inode_remove_request()
  2023-07-25 15:08 ` [PATCH 1/1] NFS: " Scott Mayhew
  2023-07-25 15:14   ` Trond Myklebust
  2023-10-02 20:16   ` Benjamin Coddington
@ 2023-10-02 20:29   ` Jeff Layton
  2 siblings, 0 replies; 9+ messages in thread
From: Jeff Layton @ 2023-10-02 20:29 UTC (permalink / raw)
  To: Scott Mayhew, trond.myklebust, anna; +Cc: linux-nfs

On Tue, 2023-07-25 at 11:08 -0400, Scott Mayhew wrote:
> Once a folio's private data has been cleared, it's possible for another
> process to clear the folio->mapping (e.g. via invalidate_complete_folio2
> or mapping_evict_folio), so it wouldn't be safe to call
> nfs_page_to_inode() after that.
> 
> Fixes: 0c493b5cf16e ("NFS: Convert buffered writes to use folios")
> Signed-off-by: Scott Mayhew <smayhew@redhat.com>
> ---
>  fs/nfs/write.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index f4cca8f00c0c..489c3f9dae23 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -785,6 +785,8 @@ static void nfs_inode_add_request(struct nfs_page *req)
>   */
>  static void nfs_inode_remove_request(struct nfs_page *req)
>  {
> +	struct nfs_inode *nfsi = NFS_I(nfs_page_to_inode(req));
> +
>  	if (nfs_page_group_sync_on_bit(req, PG_REMOVE)) {
>  		struct folio *folio = nfs_page_to_folio(req->wb_head);
>  		struct address_space *mapping = folio_file_mapping(folio);
> @@ -800,7 +802,7 @@ static void nfs_inode_remove_request(struct nfs_page *req)
>  
>  	if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
>  		nfs_release_request(req);
> -		atomic_long_dec(&NFS_I(nfs_page_to_inode(req))->nrequests);
> +		atomic_long_dec(&nfsi->nrequests);
>  	}
>  }
>  

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread

Thread overview: 9+ messages
2023-07-25 15:08 [PATCH 0/1] Fix potential oops in nfs_inode_remove_request() Scott Mayhew
2023-07-25 15:08 ` [PATCH 1/1] NFS: " Scott Mayhew
2023-07-25 15:14   ` Trond Myklebust
2023-07-25 16:24     ` Scott Mayhew
2023-07-25 17:41       ` Trond Myklebust
2023-07-26 12:40         ` Jeff Layton
2023-07-26 13:43           ` Scott Mayhew
2023-10-02 20:16   ` Benjamin Coddington
2023-10-02 20:29   ` Jeff Layton
