* [PATCH] NFSv4/pnfs: Reset the layout state after a layoutreturn
@ 2025-05-11 13:28 trondmy
2025-05-11 13:48 ` Trond Myklebust
0 siblings, 1 reply; 5+ messages in thread
From: trondmy @ 2025-05-11 13:28 UTC (permalink / raw)
To: linux-nfs
From: Trond Myklebust <trond.myklebust@hammerspace.com>
If there are still layout segments in the layout plh_return_lsegs list
after a layout return, we should be resetting the state to ensure they
eventually get returned as well.
Fixes: 68f744797edd ("pNFS: Do not free layout segments that are marked for return")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
fs/nfs/pnfs.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 10fdd065a61c..fc7c5fb10198 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -745,6 +745,14 @@ pnfs_mark_matching_lsegs_invalid(struct pnfs_layout_hdr *lo,
return remaining;
}
+static void pnfs_reset_return_info(struct pnfs_layout_hdr *lo)
+{
+ struct pnfs_layout_segment *lseg;
+
+ list_for_each_entry(lseg, &lo->plh_return_segs, pls_list)
+ pnfs_set_plh_return_info(lo, lseg->pls_range.iomode, 0);
+}
+
static void
pnfs_free_returned_lsegs(struct pnfs_layout_hdr *lo,
struct list_head *free_me,
@@ -1292,6 +1300,7 @@ void pnfs_layoutreturn_free_lsegs(struct pnfs_layout_hdr *lo,
pnfs_mark_matching_lsegs_invalid(lo, &freeme, range, seq);
pnfs_free_returned_lsegs(lo, &freeme, range, seq);
pnfs_set_layout_stateid(lo, stateid, NULL, true);
+ pnfs_reset_return_info(lo);
} else
pnfs_mark_layout_stateid_invalid(lo, &freeme);
out_unlock:
--
2.49.0
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH] NFSv4/pnfs: Reset the layout state after a layoutreturn 2025-05-11 13:28 [PATCH] NFSv4/pnfs: Reset the layout state after a layoutreturn trondmy @ 2025-05-11 13:48 ` Trond Myklebust 2025-05-12 18:59 ` Jeff Layton 0 siblings, 1 reply; 5+ messages in thread From: Trond Myklebust @ 2025-05-11 13:48 UTC (permalink / raw) To: osandov@osandov.com, linux-nfs@vger.kernel.org, jlayton@kernel.org, clm@fb.com Hi Jeff and Omar, On Sun, 2025-05-11 at 09:28 -0400, trondmy@kernel.org wrote: > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > If there are still layout segments in the layout plh_return_lsegs > list > after a layout return, we should be resetting the state to ensure > they > eventually get returned as well. > > Fixes: 68f744797edd ("pNFS: Do not free layout segments that are > marked for return") > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> > --- > fs/nfs/pnfs.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c > index 10fdd065a61c..fc7c5fb10198 100644 > --- a/fs/nfs/pnfs.c > +++ b/fs/nfs/pnfs.c > @@ -745,6 +745,14 @@ pnfs_mark_matching_lsegs_invalid(struct > pnfs_layout_hdr *lo, > return remaining; > } > > +static void pnfs_reset_return_info(struct pnfs_layout_hdr *lo) > +{ > + struct pnfs_layout_segment *lseg; > + > + list_for_each_entry(lseg, &lo->plh_return_segs, pls_list) > + pnfs_set_plh_return_info(lo, lseg->pls_range.iomode, 0); > +} > + > static void > pnfs_free_returned_lsegs(struct pnfs_layout_hdr *lo, > struct list_head *free_me, > @@ -1292,6 +1300,7 @@ void pnfs_layoutreturn_free_lsegs(struct > pnfs_layout_hdr *lo, > pnfs_mark_matching_lsegs_invalid(lo, &freeme, range, seq); > pnfs_free_returned_lsegs(lo, &freeme, range, seq); > pnfs_set_layout_stateid(lo, stateid, NULL, true); > + pnfs_reset_return_info(lo); > } else > pnfs_mark_layout_stateid_invalid(lo, &freeme); > out_unlock: Could the above bug perhaps explain the issue with leaked layout segments that you were seeing? If the client doesn't set NFS_LAYOUT_RETURN_REQUESTED, and the server is unable to recall the layout due to the network getting shut down, then it seems to me that these layout segments just disappear down a black hole. IOW: the scenario is something like this: * The client holds a read and a read/write layout. * The server recalls the read layout. * The client closes the file while the recall is being processed, so that the read and read/write layout segments are both put on the plh_return_segs list. * The client returns the read layout, and clears the associated read layout segments. The read/write layout segments are still on the list, but without NFS_LAYOUT_RETURN_REQUESTED being set. Cheers Trond -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] NFSv4/pnfs: Reset the layout state after a layoutreturn 2025-05-11 13:48 ` Trond Myklebust @ 2025-05-12 18:59 ` Jeff Layton 2025-05-13 0:35 ` Trond Myklebust 0 siblings, 1 reply; 5+ messages in thread From: Jeff Layton @ 2025-05-12 18:59 UTC (permalink / raw) To: Trond Myklebust, osandov@osandov.com, linux-nfs@vger.kernel.org, clm@fb.com On Sun, 2025-05-11 at 13:48 +0000, Trond Myklebust wrote: > Hi Jeff and Omar, > > On Sun, 2025-05-11 at 09:28 -0400, trondmy@kernel.org wrote: > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > If there are still layout segments in the layout plh_return_lsegs > > list > > after a layout return, we should be resetting the state to ensure > > they > > eventually get returned as well. > > > > Fixes: 68f744797edd ("pNFS: Do not free layout segments that are > > marked for return") > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> > > --- > > fs/nfs/pnfs.c | 9 +++++++++ > > 1 file changed, 9 insertions(+) > > > > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c > > index 10fdd065a61c..fc7c5fb10198 100644 > > --- a/fs/nfs/pnfs.c > > +++ b/fs/nfs/pnfs.c > > @@ -745,6 +745,14 @@ pnfs_mark_matching_lsegs_invalid(struct > > pnfs_layout_hdr *lo, > > return remaining; > > } > > > > +static void pnfs_reset_return_info(struct pnfs_layout_hdr *lo) > > +{ > > + struct pnfs_layout_segment *lseg; > > + > > + list_for_each_entry(lseg, &lo->plh_return_segs, pls_list) > > + pnfs_set_plh_return_info(lo, lseg->pls_range.iomode, 0); > > +} > > + > > static void > > pnfs_free_returned_lsegs(struct pnfs_layout_hdr *lo, > > struct list_head *free_me, > > @@ -1292,6 +1300,7 @@ void pnfs_layoutreturn_free_lsegs(struct > > pnfs_layout_hdr *lo, > > pnfs_mark_matching_lsegs_invalid(lo, &freeme, range, seq); > > pnfs_free_returned_lsegs(lo, &freeme, range, seq); > > pnfs_set_layout_stateid(lo, stateid, NULL, true); > > + pnfs_reset_return_info(lo); > > } else > > pnfs_mark_layout_stateid_invalid(lo, &freeme); > > out_unlock: > > Could the above bug perhaps explain the issue with leaked layout > segments that you were seeing? > If the client doesn't set NFS_LAYOUT_RETURN_REQUESTED, and the server > is unable to recall the layout due to the network getting shut down, > then it seems to me that these layout segments just disappear down a > black hole. > > IOW: the scenario is something like this: > * The client holds a read and a read/write layout. > * The server recalls the read layout. > * The client closes the file while the recall is being processed, so > that the read and read/write layout segments are both put on the > plh_return_segs list. > * The client returns the read layout, and clears the associated read > layout segments. The read/write layout segments are still on the > list, but without NFS_LAYOUT_RETURN_REQUESTED being set. > Maybe? The problem I think we hit was that pnfs_put_layout_hdr() got called and its refcount went to zero while there were still entries on plh_return_segs. pnfs_put_layout_hdr() calls pnfs_layoutreturn_before_put_layout_hdr() as its "last ditch" effort to clean out the plh_return_segs list. It looks like your patch will ensure NFS_LAYOUT_RETURN_REQUESTED is set on all of those entries, but if that flag gets set during the pnfs_layoutreturn_before_put_layout_hdr() call, then I think it may be too late and they'll just leak anyway. So I guess the question is: is every entry on plh_return_segs guaranteed to get a first attempt at a LAYOUTRETURN before pnfs_layoutreturn_before_put_layout_hdr() is called? If so, then yes that should fix it. If not, then I think it may not (unless I'm misunderstanding this code). -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] NFSv4/pnfs: Reset the layout state after a layoutreturn 2025-05-12 18:59 ` Jeff Layton @ 2025-05-13 0:35 ` Trond Myklebust 2025-05-13 15:21 ` Jeff Layton 0 siblings, 1 reply; 5+ messages in thread From: Trond Myklebust @ 2025-05-13 0:35 UTC (permalink / raw) To: osandov@osandov.com, linux-nfs@vger.kernel.org, jlayton@kernel.org, clm@fb.com On Mon, 2025-05-12 at 13:59 -0500, Jeff Layton wrote: > On Sun, 2025-05-11 at 13:48 +0000, Trond Myklebust wrote: > > Hi Jeff and Omar, > > > > On Sun, 2025-05-11 at 09:28 -0400, trondmy@kernel.org wrote: > > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > > > If there are still layout segments in the layout plh_return_lsegs > > > list > > > after a layout return, we should be resetting the state to ensure > > > they > > > eventually get returned as well. > > > > > > Fixes: 68f744797edd ("pNFS: Do not free layout segments that are > > > marked for return") > > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> > > > --- > > > fs/nfs/pnfs.c | 9 +++++++++ > > > 1 file changed, 9 insertions(+) > > > > > > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c > > > index 10fdd065a61c..fc7c5fb10198 100644 > > > --- a/fs/nfs/pnfs.c > > > +++ b/fs/nfs/pnfs.c > > > @@ -745,6 +745,14 @@ pnfs_mark_matching_lsegs_invalid(struct > > > pnfs_layout_hdr *lo, > > > return remaining; > > > } > > > > > > +static void pnfs_reset_return_info(struct pnfs_layout_hdr *lo) > > > +{ > > > + struct pnfs_layout_segment *lseg; > > > + > > > + list_for_each_entry(lseg, &lo->plh_return_segs, pls_list) > > > + pnfs_set_plh_return_info(lo, lseg->pls_range.iomode, 0); > > > +} > > > + > > > static void > > > pnfs_free_returned_lsegs(struct pnfs_layout_hdr *lo, > > > struct list_head *free_me, > > > @@ -1292,6 +1300,7 @@ void pnfs_layoutreturn_free_lsegs(struct > > > pnfs_layout_hdr *lo, > > > pnfs_mark_matching_lsegs_invalid(lo, &freeme, range, seq); > > > pnfs_free_returned_lsegs(lo, &freeme, range, seq); > > > pnfs_set_layout_stateid(lo, stateid, NULL, true); > > > + pnfs_reset_return_info(lo); > > > } else > > > pnfs_mark_layout_stateid_invalid(lo, &freeme); > > > out_unlock: > > > > Could the above bug perhaps explain the issue with leaked layout > > segments that you were seeing? > > If the client doesn't set NFS_LAYOUT_RETURN_REQUESTED, and the > > server > > is unable to recall the layout due to the network getting shut > > down, > > then it seems to me that these layout segments just disappear down > > a > > black hole. > > > > IOW: the scenario is something like this: > > * The client holds a read and a read/write layout. > > * The server recalls the read layout. > > * The client closes the file while the recall is being processed, > > so > > that the read and read/write layout segments are both put on the > > plh_return_segs list. > > * The client returns the read layout, and clears the associated > > read > > layout segments. The read/write layout segments are still on the > > list, but without NFS_LAYOUT_RETURN_REQUESTED being set. > > > > Maybe? > > The problem I think we hit was that pnfs_put_layout_hdr() got called > and its refcount went to zero while there were still entries on > plh_return_segs. > > pnfs_put_layout_hdr() calls pnfs_layoutreturn_before_put_layout_hdr() > as its "last ditch" effort to clean out the plh_return_segs list. It > looks like your patch will ensure NFS_LAYOUT_RETURN_REQUESTED is set > on > all of those entries, but if that flag gets set during the > pnfs_layoutreturn_before_put_layout_hdr() call, then I think it may > be > too late and they'll just leak anyway. > > So I guess the question is: is every entry on plh_return_segs > guaranteed to get a first attempt at a LAYOUTRETURN before > pnfs_layoutreturn_before_put_layout_hdr() is called? > > If so, then yes that should fix it. If not, then I think it may not > (unless I'm misunderstanding this code). So, the point is that pnfs_layoutreturn_before_put_layout_hdr() will take a reference to the layout (in pnfs_prepare_layoutreturn()) before it kicks off the layoutreturn RPC call. Upon finishing the layoutreturn, the nfs4_layoutreturn_release() callback will then trigger another layoutreturn attempt when it calls pnfs_put_layout_hdr(), assuming that NFS_LAYOUT_RETURN_REQUESTED was set by pnfs_layoutreturn_free_lsegs()->pnfs_reset_return_info(). -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] NFSv4/pnfs: Reset the layout state after a layoutreturn 2025-05-13 0:35 ` Trond Myklebust @ 2025-05-13 15:21 ` Jeff Layton 0 siblings, 0 replies; 5+ messages in thread From: Jeff Layton @ 2025-05-13 15:21 UTC (permalink / raw) To: Trond Myklebust, osandov@osandov.com, linux-nfs@vger.kernel.org, clm@fb.com On Tue, 2025-05-13 at 00:35 +0000, Trond Myklebust wrote: > On Mon, 2025-05-12 at 13:59 -0500, Jeff Layton wrote: > > On Sun, 2025-05-11 at 13:48 +0000, Trond Myklebust wrote: > > > Hi Jeff and Omar, > > > > > > On Sun, 2025-05-11 at 09:28 -0400, trondmy@kernel.org wrote: > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > > > > > If there are still layout segments in the layout plh_return_lsegs > > > > list > > > > after a layout return, we should be resetting the state to ensure > > > > they > > > > eventually get returned as well. > > > > > > > > Fixes: 68f744797edd ("pNFS: Do not free layout segments that are > > > > marked for return") > > > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > --- > > > > fs/nfs/pnfs.c | 9 +++++++++ > > > > 1 file changed, 9 insertions(+) > > > > > > > > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c > > > > index 10fdd065a61c..fc7c5fb10198 100644 > > > > --- a/fs/nfs/pnfs.c > > > > +++ b/fs/nfs/pnfs.c > > > > @@ -745,6 +745,14 @@ pnfs_mark_matching_lsegs_invalid(struct > > > > pnfs_layout_hdr *lo, > > > > return remaining; > > > > } > > > > > > > > +static void pnfs_reset_return_info(struct pnfs_layout_hdr *lo) > > > > +{ > > > > + struct pnfs_layout_segment *lseg; > > > > + > > > > + list_for_each_entry(lseg, &lo->plh_return_segs, pls_list) > > > > + pnfs_set_plh_return_info(lo, lseg->pls_range.iomode, 0); > > > > +} > > > > + > > > > static void > > > > pnfs_free_returned_lsegs(struct pnfs_layout_hdr *lo, > > > > struct list_head *free_me, > > > > @@ -1292,6 +1300,7 @@ void pnfs_layoutreturn_free_lsegs(struct > > > > pnfs_layout_hdr *lo, > > > > pnfs_mark_matching_lsegs_invalid(lo, &freeme, range, seq); > > > > pnfs_free_returned_lsegs(lo, &freeme, range, seq); > > > > pnfs_set_layout_stateid(lo, stateid, NULL, true); > > > > + pnfs_reset_return_info(lo); > > > > } else > > > > pnfs_mark_layout_stateid_invalid(lo, &freeme); > > > > out_unlock: > > > > > > Could the above bug perhaps explain the issue with leaked layout > > > segments that you were seeing? > > > If the client doesn't set NFS_LAYOUT_RETURN_REQUESTED, and the > > > server > > > is unable to recall the layout due to the network getting shut > > > down, > > > then it seems to me that these layout segments just disappear down > > > a > > > black hole. > > > > > > IOW: the scenario is something like this: > > > * The client holds a read and a read/write layout. > > > * The server recalls the read layout. > > > * The client closes the file while the recall is being processed, > > > so > > > that the read and read/write layout segments are both put on the > > > plh_return_segs list. > > > * The client returns the read layout, and clears the associated > > > read > > > layout segments. The read/write layout segments are still on the > > > list, but without NFS_LAYOUT_RETURN_REQUESTED being set. > > > > > > > Maybe? > > > > The problem I think we hit was that pnfs_put_layout_hdr() got called > > and its refcount went to zero while there were still entries on > > plh_return_segs. > > > > pnfs_put_layout_hdr() calls pnfs_layoutreturn_before_put_layout_hdr() > > as its "last ditch" effort to clean out the plh_return_segs list. It > > looks like your patch will ensure NFS_LAYOUT_RETURN_REQUESTED is set > > on > > all of those entries, but if that flag gets set during the > > pnfs_layoutreturn_before_put_layout_hdr() call, then I think it may > > be > > too late and they'll just leak anyway. > > > > So I guess the question is: is every entry on plh_return_segs > > guaranteed to get a first attempt at a LAYOUTRETURN before > > pnfs_layoutreturn_before_put_layout_hdr() is called? > > > > If so, then yes that should fix it. If not, then I think it may not > > (unless I'm misunderstanding this code). > > So, the point is that pnfs_layoutreturn_before_put_layout_hdr() will > take a reference to the layout (in pnfs_prepare_layoutreturn()) before > it kicks off the layoutreturn RPC call. Upon finishing the > layoutreturn, the nfs4_layoutreturn_release() callback will then > trigger another layoutreturn attempt when it calls > pnfs_put_layout_hdr(), assuming that NFS_LAYOUT_RETURN_REQUESTED was > set by pnfs_layoutreturn_free_lsegs()->pnfs_reset_return_info(). > Oh lovely. I missed that detail. So the thing that puts a reference to the layout_hdr will _itself_ take and put an extra reference. I guess the extra put happens in another context though so you don't need to worry about recursion. So yeah, that probably will fix the problem Omar found. I'll see about spinning up a kernel internally that we can test. Thanks! -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-05-13 15:21 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-05-11 13:28 [PATCH] NFSv4/pnfs: Reset the layout state after a layoutreturn trondmy 2025-05-11 13:48 ` Trond Myklebust 2025-05-12 18:59 ` Jeff Layton 2025-05-13 0:35 ` Trond Myklebust 2025-05-13 15:21 ` Jeff Layton
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox