public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] NFSv4/pnfs: Reset the layout state after a layoutreturn
@ 2025-05-11 13:28 trondmy
  2025-05-11 13:48 ` Trond Myklebust
  0 siblings, 1 reply; 5+ messages in thread
From: trondmy @ 2025-05-11 13:28 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

If there are still layout segments in the layout plh_return_lsegs list
after a layout return, we should be resetting the state to ensure they
eventually get returned as well.

Fixes: 68f744797edd ("pNFS: Do not free layout segments that are marked for return")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/pnfs.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 10fdd065a61c..fc7c5fb10198 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -745,6 +745,14 @@ pnfs_mark_matching_lsegs_invalid(struct pnfs_layout_hdr *lo,
 	return remaining;
 }
 
+static void pnfs_reset_return_info(struct pnfs_layout_hdr *lo)
+{
+	struct pnfs_layout_segment *lseg;
+
+	list_for_each_entry(lseg, &lo->plh_return_segs, pls_list)
+		pnfs_set_plh_return_info(lo, lseg->pls_range.iomode, 0);
+}
+
 static void
 pnfs_free_returned_lsegs(struct pnfs_layout_hdr *lo,
 		struct list_head *free_me,
@@ -1292,6 +1300,7 @@ void pnfs_layoutreturn_free_lsegs(struct pnfs_layout_hdr *lo,
 		pnfs_mark_matching_lsegs_invalid(lo, &freeme, range, seq);
 		pnfs_free_returned_lsegs(lo, &freeme, range, seq);
 		pnfs_set_layout_stateid(lo, stateid, NULL, true);
+		pnfs_reset_return_info(lo);
 	} else
 		pnfs_mark_layout_stateid_invalid(lo, &freeme);
 out_unlock:
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] NFSv4/pnfs: Reset the layout state after a layoutreturn
  2025-05-11 13:28 [PATCH] NFSv4/pnfs: Reset the layout state after a layoutreturn trondmy
@ 2025-05-11 13:48 ` Trond Myklebust
  2025-05-12 18:59   ` Jeff Layton
  0 siblings, 1 reply; 5+ messages in thread
From: Trond Myklebust @ 2025-05-11 13:48 UTC (permalink / raw)
  To: osandov@osandov.com, linux-nfs@vger.kernel.org,
	jlayton@kernel.org, clm@fb.com

Hi Jeff and Omar,

On Sun, 2025-05-11 at 09:28 -0400, trondmy@kernel.org wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> 
> If there are still layout segments in the layout plh_return_lsegs
> list
> after a layout return, we should be resetting the state to ensure
> they
> eventually get returned as well.
> 
> Fixes: 68f744797edd ("pNFS: Do not free layout segments that are
> marked for return")
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
>  fs/nfs/pnfs.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 10fdd065a61c..fc7c5fb10198 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -745,6 +745,14 @@ pnfs_mark_matching_lsegs_invalid(struct
> pnfs_layout_hdr *lo,
>   return remaining;
>  }
>  
> +static void pnfs_reset_return_info(struct pnfs_layout_hdr *lo)
> +{
> + struct pnfs_layout_segment *lseg;
> +
> + list_for_each_entry(lseg, &lo->plh_return_segs, pls_list)
> + pnfs_set_plh_return_info(lo, lseg->pls_range.iomode, 0);
> +}
> +
>  static void
>  pnfs_free_returned_lsegs(struct pnfs_layout_hdr *lo,
>   struct list_head *free_me,
> @@ -1292,6 +1300,7 @@ void pnfs_layoutreturn_free_lsegs(struct
> pnfs_layout_hdr *lo,
>   pnfs_mark_matching_lsegs_invalid(lo, &freeme, range, seq);
>   pnfs_free_returned_lsegs(lo, &freeme, range, seq);
>   pnfs_set_layout_stateid(lo, stateid, NULL, true);
> + pnfs_reset_return_info(lo);
>   } else
>   pnfs_mark_layout_stateid_invalid(lo, &freeme);
>  out_unlock:

Could the above bug perhaps explain the issue with leaked layout
segments that you were seeing?
If the client doesn't set NFS_LAYOUT_RETURN_REQUESTED, and the server
is unable to recall the layout due to the network getting shut down,
then it seems to me that these layout segments just disappear down a
black hole.

IOW: the scenario is something like this:
 * The client holds a read and a read/write layout.
 * The server recalls the read layout.
 * The client closes the file while the recall is being processed, so
   that the read and read/write layout segments are both put on the
   plh_return_segs list.
 * The client returns the read layout, and clears the associated read
   layout segments. The read/write layout segments are still on the
   list, but without NFS_LAYOUT_RETURN_REQUESTED being set.

Cheers
Trond

-- 
Trond Myklebust Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] NFSv4/pnfs: Reset the layout state after a layoutreturn
  2025-05-11 13:48 ` Trond Myklebust
@ 2025-05-12 18:59   ` Jeff Layton
  2025-05-13  0:35     ` Trond Myklebust
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff Layton @ 2025-05-12 18:59 UTC (permalink / raw)
  To: Trond Myklebust, osandov@osandov.com, linux-nfs@vger.kernel.org,
	clm@fb.com

On Sun, 2025-05-11 at 13:48 +0000, Trond Myklebust wrote:
> Hi Jeff and Omar,
> 
> On Sun, 2025-05-11 at 09:28 -0400, trondmy@kernel.org wrote:
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > 
> > If there are still layout segments in the layout plh_return_lsegs
> > list
> > after a layout return, we should be resetting the state to ensure
> > they
> > eventually get returned as well.
> > 
> > Fixes: 68f744797edd ("pNFS: Do not free layout segments that are
> > marked for return")
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > ---
> >  fs/nfs/pnfs.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> > index 10fdd065a61c..fc7c5fb10198 100644
> > --- a/fs/nfs/pnfs.c
> > +++ b/fs/nfs/pnfs.c
> > @@ -745,6 +745,14 @@ pnfs_mark_matching_lsegs_invalid(struct
> > pnfs_layout_hdr *lo,
> >   return remaining;
> >  }
> >  
> > +static void pnfs_reset_return_info(struct pnfs_layout_hdr *lo)
> > +{
> > + struct pnfs_layout_segment *lseg;
> > +
> > + list_for_each_entry(lseg, &lo->plh_return_segs, pls_list)
> > + pnfs_set_plh_return_info(lo, lseg->pls_range.iomode, 0);
> > +}
> > +
> >  static void
> >  pnfs_free_returned_lsegs(struct pnfs_layout_hdr *lo,
> >   struct list_head *free_me,
> > @@ -1292,6 +1300,7 @@ void pnfs_layoutreturn_free_lsegs(struct
> > pnfs_layout_hdr *lo,
> >   pnfs_mark_matching_lsegs_invalid(lo, &freeme, range, seq);
> >   pnfs_free_returned_lsegs(lo, &freeme, range, seq);
> >   pnfs_set_layout_stateid(lo, stateid, NULL, true);
> > + pnfs_reset_return_info(lo);
> >   } else
> >   pnfs_mark_layout_stateid_invalid(lo, &freeme);
> >  out_unlock:
> 
> Could the above bug perhaps explain the issue with leaked layout
> segments that you were seeing?
> If the client doesn't set NFS_LAYOUT_RETURN_REQUESTED, and the server
> is unable to recall the layout due to the network getting shut down,
> then it seems to me that these layout segments just disappear down a
> black hole.
> 
> IOW: the scenario is something like this:
>  * The client holds a read and a read/write layout.
>  * The server recalls the read layout.
>  * The client closes the file while the recall is being processed, so
>    that the read and read/write layout segments are both put on the
>    plh_return_segs list.
>  * The client returns the read layout, and clears the associated read
>    layout segments. The read/write layout segments are still on the
>    list, but without NFS_LAYOUT_RETURN_REQUESTED being set.
> 

Maybe?

The problem I think we hit was that pnfs_put_layout_hdr() got called
and its refcount went to zero while there were still entries on
plh_return_segs.

pnfs_put_layout_hdr() calls pnfs_layoutreturn_before_put_layout_hdr()
as its "last ditch" effort to clean out the plh_return_segs list. It
looks like your patch will ensure NFS_LAYOUT_RETURN_REQUESTED is set on
all of those entries, but if that flag gets set during the
pnfs_layoutreturn_before_put_layout_hdr() call, then I think it may be
too late and they'll just leak anyway.

So I guess the question is: is every entry on plh_return_segs
guaranteed to get a first attempt at a LAYOUTRETURN before
pnfs_layoutreturn_before_put_layout_hdr() is called?

If so, then yes that should fix it. If not, then I think it may not
(unless I'm misunderstanding this code).

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] NFSv4/pnfs: Reset the layout state after a layoutreturn
  2025-05-12 18:59   ` Jeff Layton
@ 2025-05-13  0:35     ` Trond Myklebust
  2025-05-13 15:21       ` Jeff Layton
  0 siblings, 1 reply; 5+ messages in thread
From: Trond Myklebust @ 2025-05-13  0:35 UTC (permalink / raw)
  To: osandov@osandov.com, linux-nfs@vger.kernel.org,
	jlayton@kernel.org, clm@fb.com

On Mon, 2025-05-12 at 13:59 -0500, Jeff Layton wrote:
> On Sun, 2025-05-11 at 13:48 +0000, Trond Myklebust wrote:
> > Hi Jeff and Omar,
> > 
> > On Sun, 2025-05-11 at 09:28 -0400, trondmy@kernel.org wrote:
> > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > 
> > > If there are still layout segments in the layout plh_return_lsegs
> > > list
> > > after a layout return, we should be resetting the state to ensure
> > > they
> > > eventually get returned as well.
> > > 
> > > Fixes: 68f744797edd ("pNFS: Do not free layout segments that are
> > > marked for return")
> > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > ---
> > >  fs/nfs/pnfs.c | 9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > > 
> > > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> > > index 10fdd065a61c..fc7c5fb10198 100644
> > > --- a/fs/nfs/pnfs.c
> > > +++ b/fs/nfs/pnfs.c
> > > @@ -745,6 +745,14 @@ pnfs_mark_matching_lsegs_invalid(struct
> > > pnfs_layout_hdr *lo,
> > >   return remaining;
> > >  }
> > >  
> > > +static void pnfs_reset_return_info(struct pnfs_layout_hdr *lo)
> > > +{
> > > + struct pnfs_layout_segment *lseg;
> > > +
> > > + list_for_each_entry(lseg, &lo->plh_return_segs, pls_list)
> > > + pnfs_set_plh_return_info(lo, lseg->pls_range.iomode, 0);
> > > +}
> > > +
> > >  static void
> > >  pnfs_free_returned_lsegs(struct pnfs_layout_hdr *lo,
> > >   struct list_head *free_me,
> > > @@ -1292,6 +1300,7 @@ void pnfs_layoutreturn_free_lsegs(struct
> > > pnfs_layout_hdr *lo,
> > >   pnfs_mark_matching_lsegs_invalid(lo, &freeme, range, seq);
> > >   pnfs_free_returned_lsegs(lo, &freeme, range, seq);
> > >   pnfs_set_layout_stateid(lo, stateid, NULL, true);
> > > + pnfs_reset_return_info(lo);
> > >   } else
> > >   pnfs_mark_layout_stateid_invalid(lo, &freeme);
> > >  out_unlock:
> > 
> > Could the above bug perhaps explain the issue with leaked layout
> > segments that you were seeing?
> > If the client doesn't set NFS_LAYOUT_RETURN_REQUESTED, and the
> > server
> > is unable to recall the layout due to the network getting shut
> > down,
> > then it seems to me that these layout segments just disappear down
> > a
> > black hole.
> > 
> > IOW: the scenario is something like this:
> >  * The client holds a read and a read/write layout.
> >  * The server recalls the read layout.
> >  * The client closes the file while the recall is being processed,
> > so
> >    that the read and read/write layout segments are both put on the
> >    plh_return_segs list.
> >  * The client returns the read layout, and clears the associated
> > read
> >    layout segments. The read/write layout segments are still on the
> >    list, but without NFS_LAYOUT_RETURN_REQUESTED being set.
> > 
> 
> Maybe?
> 
> The problem I think we hit was that pnfs_put_layout_hdr() got called
> and its refcount went to zero while there were still entries on
> plh_return_segs.
> 
> pnfs_put_layout_hdr() calls pnfs_layoutreturn_before_put_layout_hdr()
> as its "last ditch" effort to clean out the plh_return_segs list. It
> looks like your patch will ensure NFS_LAYOUT_RETURN_REQUESTED is set
> on
> all of those entries, but if that flag gets set during the
> pnfs_layoutreturn_before_put_layout_hdr() call, then I think it may
> be
> too late and they'll just leak anyway.
> 
> So I guess the question is: is every entry on plh_return_segs
> guaranteed to get a first attempt at a LAYOUTRETURN before
> pnfs_layoutreturn_before_put_layout_hdr() is called?
> 
> If so, then yes that should fix it. If not, then I think it may not
> (unless I'm misunderstanding this code).

So, the point is that pnfs_layoutreturn_before_put_layout_hdr() will
take a reference to the layout (in pnfs_prepare_layoutreturn()) before
it kicks off the layoutreturn RPC call. Upon finishing the
layoutreturn, the nfs4_layoutreturn_release() callback will then
trigger another layoutreturn attempt when it calls
pnfs_put_layout_hdr(), assuming that NFS_LAYOUT_RETURN_REQUESTED was
set by pnfs_layoutreturn_free_lsegs()->pnfs_reset_return_info().

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] NFSv4/pnfs: Reset the layout state after a layoutreturn
  2025-05-13  0:35     ` Trond Myklebust
@ 2025-05-13 15:21       ` Jeff Layton
  0 siblings, 0 replies; 5+ messages in thread
From: Jeff Layton @ 2025-05-13 15:21 UTC (permalink / raw)
  To: Trond Myklebust, osandov@osandov.com, linux-nfs@vger.kernel.org,
	clm@fb.com

On Tue, 2025-05-13 at 00:35 +0000, Trond Myklebust wrote:
> On Mon, 2025-05-12 at 13:59 -0500, Jeff Layton wrote:
> > On Sun, 2025-05-11 at 13:48 +0000, Trond Myklebust wrote:
> > > Hi Jeff and Omar,
> > > 
> > > On Sun, 2025-05-11 at 09:28 -0400, trondmy@kernel.org wrote:
> > > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > > 
> > > > If there are still layout segments in the layout plh_return_lsegs
> > > > list
> > > > after a layout return, we should be resetting the state to ensure
> > > > they
> > > > eventually get returned as well.
> > > > 
> > > > Fixes: 68f744797edd ("pNFS: Do not free layout segments that are
> > > > marked for return")
> > > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > > ---
> > > >  fs/nfs/pnfs.c | 9 +++++++++
> > > >  1 file changed, 9 insertions(+)
> > > > 
> > > > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> > > > index 10fdd065a61c..fc7c5fb10198 100644
> > > > --- a/fs/nfs/pnfs.c
> > > > +++ b/fs/nfs/pnfs.c
> > > > @@ -745,6 +745,14 @@ pnfs_mark_matching_lsegs_invalid(struct
> > > > pnfs_layout_hdr *lo,
> > > >   return remaining;
> > > >  }
> > > >  
> > > > +static void pnfs_reset_return_info(struct pnfs_layout_hdr *lo)
> > > > +{
> > > > + struct pnfs_layout_segment *lseg;
> > > > +
> > > > + list_for_each_entry(lseg, &lo->plh_return_segs, pls_list)
> > > > + pnfs_set_plh_return_info(lo, lseg->pls_range.iomode, 0);
> > > > +}
> > > > +
> > > >  static void
> > > >  pnfs_free_returned_lsegs(struct pnfs_layout_hdr *lo,
> > > >   struct list_head *free_me,
> > > > @@ -1292,6 +1300,7 @@ void pnfs_layoutreturn_free_lsegs(struct
> > > > pnfs_layout_hdr *lo,
> > > >   pnfs_mark_matching_lsegs_invalid(lo, &freeme, range, seq);
> > > >   pnfs_free_returned_lsegs(lo, &freeme, range, seq);
> > > >   pnfs_set_layout_stateid(lo, stateid, NULL, true);
> > > > + pnfs_reset_return_info(lo);
> > > >   } else
> > > >   pnfs_mark_layout_stateid_invalid(lo, &freeme);
> > > >  out_unlock:
> > > 
> > > Could the above bug perhaps explain the issue with leaked layout
> > > segments that you were seeing?
> > > If the client doesn't set NFS_LAYOUT_RETURN_REQUESTED, and the
> > > server
> > > is unable to recall the layout due to the network getting shut
> > > down,
> > > then it seems to me that these layout segments just disappear down
> > > a
> > > black hole.
> > > 
> > > IOW: the scenario is something like this:
> > >  * The client holds a read and a read/write layout.
> > >  * The server recalls the read layout.
> > >  * The client closes the file while the recall is being processed,
> > > so
> > >    that the read and read/write layout segments are both put on the
> > >    plh_return_segs list.
> > >  * The client returns the read layout, and clears the associated
> > > read
> > >    layout segments. The read/write layout segments are still on the
> > >    list, but without NFS_LAYOUT_RETURN_REQUESTED being set.
> > > 
> > 
> > Maybe?
> > 
> > The problem I think we hit was that pnfs_put_layout_hdr() got called
> > and its refcount went to zero while there were still entries on
> > plh_return_segs.
> > 
> > pnfs_put_layout_hdr() calls pnfs_layoutreturn_before_put_layout_hdr()
> > as its "last ditch" effort to clean out the plh_return_segs list. It
> > looks like your patch will ensure NFS_LAYOUT_RETURN_REQUESTED is set
> > on
> > all of those entries, but if that flag gets set during the
> > pnfs_layoutreturn_before_put_layout_hdr() call, then I think it may
> > be
> > too late and they'll just leak anyway.
> > 
> > So I guess the question is: is every entry on plh_return_segs
> > guaranteed to get a first attempt at a LAYOUTRETURN before
> > pnfs_layoutreturn_before_put_layout_hdr() is called?
> > 
> > If so, then yes that should fix it. If not, then I think it may not
> > (unless I'm misunderstanding this code).
> 
> So, the point is that pnfs_layoutreturn_before_put_layout_hdr() will
> take a reference to the layout (in pnfs_prepare_layoutreturn()) before
> it kicks off the layoutreturn RPC call. Upon finishing the
> layoutreturn, the nfs4_layoutreturn_release() callback will then
> trigger another layoutreturn attempt when it calls
> pnfs_put_layout_hdr(), assuming that NFS_LAYOUT_RETURN_REQUESTED was
> set by pnfs_layoutreturn_free_lsegs()->pnfs_reset_return_info().
> 

Oh lovely. I missed that detail. So the thing that puts a reference to
the layout_hdr will _itself_ take and put an extra reference. I guess
the extra put happens in another context though so you don't need to
worry about recursion.

So yeah, that probably will fix the problem Omar found. I'll see about
spinning up a kernel internally that we can test.

Thanks!
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-05-13 15:21 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-11 13:28 [PATCH] NFSv4/pnfs: Reset the layout state after a layoutreturn trondmy
2025-05-11 13:48 ` Trond Myklebust
2025-05-12 18:59   ` Jeff Layton
2025-05-13  0:35     ` Trond Myklebust
2025-05-13 15:21       ` Jeff Layton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox