Linux NFS development
 help / color / mirror / Atom feed
* [PATCH 00/11] LAYOUTRETURN on reboot
@ 2024-06-13  5:00 trondmy
  2024-06-13  5:00 ` [PATCH 01/11] NFSv4/pnfs: Remove redundant list check trondmy
  2024-06-14 11:47 ` [PATCH 00/11] LAYOUTRETURN on reboot Jeff Layton
  0 siblings, 2 replies; 13+ messages in thread
From: trondmy @ 2024-06-13  5:00 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Now that https://datatracker.ietf.org/doc/draft-ietf-nfsv4-layrec/ is
mostly done with the review process, I'd like to move the final patches
for the client implementation upstream.

The following patch series therefore adds support to the flexfiles pNFS
driver so that if a metadata server reboot occurs while a client has
layouts outstanding, and is performing I/O, then the client will report
layoutstats and layout errors through a LAYOUTRETURN during the grace
period, after the metadata server comes back up.
This has implications for mirrored workloads, since it allows the client
to report exactly which mirror data instances may have been corrupted
due to the presence of errors during WRITEs or COMMITs.

Trond Myklebust (11):
  NFSv4/pnfs: Remove redundant list check
  NFSv4.1: constify the stateid argument in nfs41_test_stateid()
  NFSv4: Clean up encode_nfs4_stateid()
  pNFS: Add a flag argument to pnfs_destroy_layouts_byclid()
  NFSv4/pnfs: Add support for the PNFS_LAYOUT_FILE_BULK_RETURN flag
  NFSv4/pNFS: Add a helper to defer failed layoutreturn calls
  NFSv4/pNFS: Handle server reboots in pnfs_poc_release()
  NFSv4/pNFS: Retry the layout return later in case of a timeout or
    reboot
  NFSv4/pnfs: Give nfs4_proc_layoutreturn() a flags argument
  NFSv4/pNFS: Remove redundant call to unhash the layout
  NFSv4/pNFS: Do layout state recovery upon reboot

 fs/nfs/callback_proc.c                 |   5 +-
 fs/nfs/flexfilelayout/flexfilelayout.c |   2 +-
 fs/nfs/nfs4_fs.h                       |   3 +-
 fs/nfs/nfs4proc.c                      |  53 ++++--
 fs/nfs/nfs4state.c                     |   4 +-
 fs/nfs/nfs4xdr.c                       |   7 +-
 fs/nfs/pnfs.c                          | 223 +++++++++++++++++++------
 fs/nfs/pnfs.h                          |  30 +++-
 include/linux/nfs_fs_sb.h              |   1 +
 include/linux/nfs_xdr.h                |   2 +-
 10 files changed, 249 insertions(+), 81 deletions(-)

-- 
2.45.2


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 01/11] NFSv4/pnfs: Remove redundant list check
  2024-06-13  5:00 [PATCH 00/11] LAYOUTRETURN on reboot trondmy
@ 2024-06-13  5:00 ` trondmy
  2024-06-13  5:00   ` [PATCH 02/11] NFSv4.1: constify the stateid argument in nfs41_test_stateid() trondmy
  2024-06-14 11:47 ` [PATCH 00/11] LAYOUTRETURN on reboot Jeff Layton
  1 sibling, 1 reply; 13+ messages in thread
From: trondmy @ 2024-06-13  5:00 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

pnfs_layout_free_bulk_destroy_list() already checks for whether the list
is empty or not.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/pnfs.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index b5834728f31b..bbbb692b2a47 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -923,8 +923,6 @@ pnfs_destroy_layouts_byfsid(struct nfs_client *clp,
 	rcu_read_unlock();
 	spin_unlock(&clp->cl_lock);
 
-	if (list_empty(&layout_list))
-		return 0;
 	return pnfs_layout_free_bulk_destroy_list(&layout_list, is_recall);
 }
 
@@ -947,8 +945,6 @@ pnfs_destroy_layouts_byclid(struct nfs_client *clp,
 	rcu_read_unlock();
 	spin_unlock(&clp->cl_lock);
 
-	if (list_empty(&layout_list))
-		return 0;
 	return pnfs_layout_free_bulk_destroy_list(&layout_list, is_recall);
 }
 
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 02/11] NFSv4.1: constify the stateid argument in nfs41_test_stateid()
  2024-06-13  5:00 ` [PATCH 01/11] NFSv4/pnfs: Remove redundant list check trondmy
@ 2024-06-13  5:00   ` trondmy
  2024-06-13  5:00     ` [PATCH 03/11] NFSv4: Clean up encode_nfs4_stateid() trondmy
  0 siblings, 1 reply; 13+ messages in thread
From: trondmy @ 2024-06-13  5:00 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/nfs4_fs.h        |  3 ++-
 fs/nfs/nfs4proc.c       | 24 ++++++++++++------------
 fs/nfs/nfs4xdr.c        |  2 +-
 include/linux/nfs_xdr.h |  2 +-
 4 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 7024230f0d1d..c2045a2a9d0f 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -67,7 +67,8 @@ struct nfs4_minor_version_ops {
 	void	(*free_lock_state)(struct nfs_server *,
 			struct nfs4_lock_state *);
 	int	(*test_and_free_expired)(struct nfs_server *,
-			nfs4_stateid *, const struct cred *);
+					 const nfs4_stateid *,
+					 const struct cred *);
 	struct nfs_seqid *
 		(*alloc_seqid)(struct nfs_seqid_counter *, gfp_t);
 	void	(*session_trunk)(struct rpc_clnt *clnt,
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index a4f85af880c2..ae835d14ac75 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -103,10 +103,10 @@ static struct rpc_task *_nfs41_proc_sequence(struct nfs_client *clp,
 		const struct cred *cred,
 		struct nfs4_slot *slot,
 		bool is_privileged);
-static int nfs41_test_stateid(struct nfs_server *, nfs4_stateid *,
-		const struct cred *);
+static int nfs41_test_stateid(struct nfs_server *, const nfs4_stateid *,
+			      const struct cred *);
 static int nfs41_free_stateid(struct nfs_server *, const nfs4_stateid *,
-		const struct cred *, bool);
+			      const struct cred *, bool);
 #endif
 
 #ifdef CONFIG_NFS_V4_SECURITY_LABEL
@@ -2867,16 +2867,16 @@ static int nfs40_open_expired(struct nfs4_state_owner *sp, struct nfs4_state *st
 }
 
 static int nfs40_test_and_free_expired_stateid(struct nfs_server *server,
-		nfs4_stateid *stateid,
-		const struct cred *cred)
+					       const nfs4_stateid *stateid,
+					       const struct cred *cred)
 {
 	return -NFS4ERR_BAD_STATEID;
 }
 
 #if defined(CONFIG_NFS_V4_1)
 static int nfs41_test_and_free_expired_stateid(struct nfs_server *server,
-		nfs4_stateid *stateid,
-		const struct cred *cred)
+					       const nfs4_stateid *stateid,
+					       const struct cred *cred)
 {
 	int status;
 
@@ -10357,12 +10357,12 @@ nfs41_find_root_sec(struct nfs_server *server, struct nfs_fh *fhandle,
 }
 
 static int _nfs41_test_stateid(struct nfs_server *server,
-		nfs4_stateid *stateid,
-		const struct cred *cred)
+			       const nfs4_stateid *stateid,
+			       const struct cred *cred)
 {
 	int status;
 	struct nfs41_test_stateid_args args = {
-		.stateid = stateid,
+		.stateid = *stateid,
 	};
 	struct nfs41_test_stateid_res res;
 	struct rpc_message msg = {
@@ -10418,8 +10418,8 @@ static void nfs4_handle_delay_or_session_error(struct nfs_server *server,
  * failed or the state ID is not currently valid.
  */
 static int nfs41_test_stateid(struct nfs_server *server,
-		nfs4_stateid *stateid,
-		const struct cred *cred)
+			      const nfs4_stateid *stateid,
+			      const struct cred *cred)
 {
 	struct nfs4_exception exception = {
 		.interruptible = true,
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 98aab2c324c9..4bf7d5c09282 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -2137,7 +2137,7 @@ static void encode_test_stateid(struct xdr_stream *xdr,
 {
 	encode_op_hdr(xdr, OP_TEST_STATEID, decode_test_stateid_maxsz, hdr);
 	encode_uint32(xdr, 1);
-	encode_nfs4_stateid(xdr, args->stateid);
+	encode_nfs4_stateid(xdr, &args->stateid);
 }
 
 static void encode_free_stateid(struct xdr_stream *xdr,
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 01efacae4634..45623af3e7b8 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1438,7 +1438,7 @@ struct nfs41_secinfo_no_name_args {
 
 struct nfs41_test_stateid_args {
 	struct nfs4_sequence_args	seq_args;
-	nfs4_stateid			*stateid;
+	nfs4_stateid			stateid;
 };
 
 struct nfs41_test_stateid_res {
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 03/11] NFSv4: Clean up encode_nfs4_stateid()
  2024-06-13  5:00   ` [PATCH 02/11] NFSv4.1: constify the stateid argument in nfs41_test_stateid() trondmy
@ 2024-06-13  5:00     ` trondmy
  2024-06-13  5:00       ` [PATCH 04/11] pNFS: Add a flag argument to pnfs_destroy_layouts_byclid() trondmy
  0 siblings, 1 reply; 13+ messages in thread
From: trondmy @ 2024-06-13  5:00 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Ensure that we encode the actual stateid, and not any metadata.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/nfs4xdr.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 4bf7d5c09282..7704a4509676 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -1067,9 +1067,10 @@ static void encode_nops(struct compound_hdr *hdr)
 	*hdr->nops_p = htonl(hdr->nops);
 }
 
-static void encode_nfs4_stateid(struct xdr_stream *xdr, const nfs4_stateid *stateid)
+static void encode_nfs4_stateid(struct xdr_stream *xdr,
+				const nfs4_stateid *stateid)
 {
-	encode_opaque_fixed(xdr, stateid, NFS4_STATEID_SIZE);
+	encode_opaque_fixed(xdr, stateid->data, NFS4_STATEID_SIZE);
 }
 
 static void encode_nfs4_verifier(struct xdr_stream *xdr, const nfs4_verifier *verf)
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 04/11] pNFS: Add a flag argument to pnfs_destroy_layouts_byclid()
  2024-06-13  5:00     ` [PATCH 03/11] NFSv4: Clean up encode_nfs4_stateid() trondmy
@ 2024-06-13  5:00       ` trondmy
  2024-06-13  5:00         ` [PATCH 05/11] NFSv4/pnfs: Add support for the PNFS_LAYOUT_FILE_BULK_RETURN flag trondmy
  0 siblings, 1 reply; 13+ messages in thread
From: trondmy @ 2024-06-13  5:00 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Change the bool argument to a flag so that we can add different modes
for doing bulk destroy of a layout. In particular, we will want the
ability to schedule return of all the layouts associated with a given
NFS server when it reboots.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/callback_proc.c |  5 +++--
 fs/nfs/pnfs.c          | 21 +++++++++------------
 fs/nfs/pnfs.h          | 14 +++++++++-----
 3 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index 199c52788640..7832fb0369a1 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -323,9 +323,10 @@ static u32 initiate_bulk_draining(struct nfs_client *clp,
 	int stat;
 
 	if (args->cbl_recall_type == RETURN_FSID)
-		stat = pnfs_destroy_layouts_byfsid(clp, &args->cbl_fsid, true);
+		stat = pnfs_layout_destroy_byfsid(clp, &args->cbl_fsid,
+						  PNFS_LAYOUT_BULK_RETURN);
 	else
-		stat = pnfs_destroy_layouts_byclid(clp, true);
+		stat = pnfs_layout_destroy_byclid(clp, PNFS_LAYOUT_BULK_RETURN);
 	if (stat != 0)
 		return NFS4ERR_DELAY;
 	return NFS4ERR_NOMATCHING_LAYOUT;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index bbbb692b2a47..0e188bc303ee 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -868,7 +868,7 @@ pnfs_layout_bulk_destroy_byserver_locked(struct nfs_client *clp,
 
 static int
 pnfs_layout_free_bulk_destroy_list(struct list_head *layout_list,
-		bool is_bulk_recall)
+				   enum pnfs_layout_destroy_mode mode)
 {
 	struct pnfs_layout_hdr *lo;
 	struct inode *inode;
@@ -887,7 +887,7 @@ pnfs_layout_free_bulk_destroy_list(struct list_head *layout_list,
 		spin_lock(&inode->i_lock);
 		list_del_init(&lo->plh_bulk_destroy);
 		if (pnfs_mark_layout_stateid_invalid(lo, &lseg_list)) {
-			if (is_bulk_recall)
+			if (mode == PNFS_LAYOUT_BULK_RETURN)
 				set_bit(NFS_LAYOUT_BULK_RECALL, &lo->plh_flags);
 			ret = -EAGAIN;
 		}
@@ -901,10 +901,8 @@ pnfs_layout_free_bulk_destroy_list(struct list_head *layout_list,
 	return ret;
 }
 
-int
-pnfs_destroy_layouts_byfsid(struct nfs_client *clp,
-		struct nfs_fsid *fsid,
-		bool is_recall)
+int pnfs_layout_destroy_byfsid(struct nfs_client *clp, struct nfs_fsid *fsid,
+			       enum pnfs_layout_destroy_mode mode)
 {
 	struct nfs_server *server;
 	LIST_HEAD(layout_list);
@@ -923,12 +921,11 @@ pnfs_destroy_layouts_byfsid(struct nfs_client *clp,
 	rcu_read_unlock();
 	spin_unlock(&clp->cl_lock);
 
-	return pnfs_layout_free_bulk_destroy_list(&layout_list, is_recall);
+	return pnfs_layout_free_bulk_destroy_list(&layout_list, mode);
 }
 
-int
-pnfs_destroy_layouts_byclid(struct nfs_client *clp,
-		bool is_recall)
+int pnfs_layout_destroy_byclid(struct nfs_client *clp,
+			       enum pnfs_layout_destroy_mode mode)
 {
 	struct nfs_server *server;
 	LIST_HEAD(layout_list);
@@ -945,7 +942,7 @@ pnfs_destroy_layouts_byclid(struct nfs_client *clp,
 	rcu_read_unlock();
 	spin_unlock(&clp->cl_lock);
 
-	return pnfs_layout_free_bulk_destroy_list(&layout_list, is_recall);
+	return pnfs_layout_free_bulk_destroy_list(&layout_list, mode);
 }
 
 /*
@@ -958,7 +955,7 @@ pnfs_destroy_all_layouts(struct nfs_client *clp)
 	nfs4_deviceid_mark_client_invalid(clp);
 	nfs4_deviceid_purge_client(clp);
 
-	pnfs_destroy_layouts_byclid(clp, false);
+	pnfs_layout_destroy_byclid(clp, PNFS_LAYOUT_INVALIDATE);
 }
 
 static void
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index fa5beeaaf5da..a6f9427782c2 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -118,6 +118,11 @@ enum layoutdriver_policy_flags {
 	PNFS_LAYOUTGET_ON_OPEN		= 1 << 3,
 };
 
+enum pnfs_layout_destroy_mode {
+	PNFS_LAYOUT_INVALIDATE = 0,
+	PNFS_LAYOUT_BULK_RETURN,
+};
+
 struct nfs4_deviceid_node;
 
 /* Per-layout driver specific registration structure */
@@ -273,11 +278,10 @@ void pnfs_free_lseg_list(struct list_head *tmp_list);
 void pnfs_destroy_layout(struct nfs_inode *);
 void pnfs_destroy_layout_final(struct nfs_inode *);
 void pnfs_destroy_all_layouts(struct nfs_client *);
-int pnfs_destroy_layouts_byfsid(struct nfs_client *clp,
-		struct nfs_fsid *fsid,
-		bool is_recall);
-int pnfs_destroy_layouts_byclid(struct nfs_client *clp,
-		bool is_recall);
+int pnfs_layout_destroy_byfsid(struct nfs_client *clp, struct nfs_fsid *fsid,
+			       enum pnfs_layout_destroy_mode mode);
+int pnfs_layout_destroy_byclid(struct nfs_client *clp,
+			       enum pnfs_layout_destroy_mode mode);
 bool nfs4_layout_refresh_old_stateid(nfs4_stateid *dst,
 		struct pnfs_layout_range *dst_range,
 		struct inode *inode);
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 05/11] NFSv4/pnfs: Add support for the PNFS_LAYOUT_FILE_BULK_RETURN flag
  2024-06-13  5:00       ` [PATCH 04/11] pNFS: Add a flag argument to pnfs_destroy_layouts_byclid() trondmy
@ 2024-06-13  5:00         ` trondmy
  2024-06-13  5:00           ` [PATCH 06/11] NFSv4/pNFS: Add a helper to defer failed layoutreturn calls trondmy
  0 siblings, 1 reply; 13+ messages in thread
From: trondmy @ 2024-06-13  5:00 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Add a flag PNFS_LAYOUT_FILE_BULK_RETURN, that will attempt to return all
the layouts in a pnfs_layout_destroy_byfsid/pnfs_layout_destroy_byclid
call, instead of just invalidating them.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/pnfs.c | 35 +++++++++++++++++++----------------
 fs/nfs/pnfs.h |  1 +
 2 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 0e188bc303ee..3bfc74841831 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -476,6 +476,18 @@ pnfs_mark_layout_stateid_invalid(struct pnfs_layout_hdr *lo,
 	return !list_empty(&lo->plh_segs);
 }
 
+static int pnfs_mark_layout_stateid_return(struct pnfs_layout_hdr *lo,
+					   struct list_head *lseg_list,
+					   enum pnfs_iomode iomode, u32 seq)
+{
+	struct pnfs_layout_range range = {
+		.iomode = iomode,
+		.length = NFS4_MAX_UINT64,
+	};
+
+	return pnfs_mark_matching_lsegs_return(lo, lseg_list, &range, seq);
+}
+
 static int
 pnfs_iomode_to_fail_bit(u32 iomode)
 {
@@ -886,7 +898,10 @@ pnfs_layout_free_bulk_destroy_list(struct list_head *layout_list,
 
 		spin_lock(&inode->i_lock);
 		list_del_init(&lo->plh_bulk_destroy);
-		if (pnfs_mark_layout_stateid_invalid(lo, &lseg_list)) {
+		if (mode == PNFS_LAYOUT_FILE_BULK_RETURN) {
+			pnfs_mark_layout_stateid_return(lo, &lseg_list,
+							IOMODE_ANY, 0);
+		} else if (pnfs_mark_layout_stateid_invalid(lo, &lseg_list)) {
 			if (mode == PNFS_LAYOUT_BULK_RETURN)
 				set_bit(NFS_LAYOUT_BULK_RECALL, &lo->plh_flags);
 			ret = -EAGAIN;
@@ -1265,27 +1280,15 @@ pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo,
 	return status;
 }
 
-static bool
-pnfs_layout_segments_returnable(struct pnfs_layout_hdr *lo,
-				enum pnfs_iomode iomode,
-				u32 seq)
-{
-	struct pnfs_layout_range recall_range = {
-		.length = NFS4_MAX_UINT64,
-		.iomode = iomode,
-	};
-	return pnfs_mark_matching_lsegs_return(lo, &lo->plh_return_segs,
-					       &recall_range, seq) != -EBUSY;
-}
-
 /* Return true if layoutreturn is needed */
 static bool
 pnfs_layout_need_return(struct pnfs_layout_hdr *lo)
 {
 	if (!test_bit(NFS_LAYOUT_RETURN_REQUESTED, &lo->plh_flags))
 		return false;
-	return pnfs_layout_segments_returnable(lo, lo->plh_return_iomode,
-					       lo->plh_return_seq);
+	return pnfs_mark_layout_stateid_return(lo, &lo->plh_return_segs,
+					       lo->plh_return_iomode,
+					       lo->plh_return_seq) != EBUSY;
 }
 
 static void pnfs_layoutreturn_before_put_layout_hdr(struct pnfs_layout_hdr *lo)
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index a6f9427782c2..8fa0f152ed19 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -121,6 +121,7 @@ enum layoutdriver_policy_flags {
 enum pnfs_layout_destroy_mode {
 	PNFS_LAYOUT_INVALIDATE = 0,
 	PNFS_LAYOUT_BULK_RETURN,
+	PNFS_LAYOUT_FILE_BULK_RETURN,
 };
 
 struct nfs4_deviceid_node;
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 06/11] NFSv4/pNFS: Add a helper to defer failed layoutreturn calls
  2024-06-13  5:00         ` [PATCH 05/11] NFSv4/pnfs: Add support for the PNFS_LAYOUT_FILE_BULK_RETURN flag trondmy
@ 2024-06-13  5:00           ` trondmy
  2024-06-13  5:00             ` [PATCH 07/11] NFSv4/pNFS: Handle server reboots in pnfs_poc_release() trondmy
  0 siblings, 1 reply; 13+ messages in thread
From: trondmy @ 2024-06-13  5:00 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

If the layoutreturn-on-close fails due to an RPC layer problem, such as
a timeout, then we want to retry at a later time. Add a helper function
to allow this.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/pnfs.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 3bfc74841831..a79ae47b3842 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1171,6 +1171,26 @@ static void pnfs_clear_layoutcommit(struct inode *inode,
 	}
 }
 
+static void
+pnfs_layoutreturn_retry_later_locked(struct pnfs_layout_hdr *lo,
+				     const nfs4_stateid *arg_stateid,
+				     const struct pnfs_layout_range *range)
+{
+	const struct pnfs_layout_segment *lseg;
+	u32 seq = be32_to_cpu(arg_stateid->seqid);
+
+	if (pnfs_layout_is_valid(lo) &&
+	    nfs4_stateid_match_other(&lo->plh_stateid, arg_stateid)) {
+		list_for_each_entry(lseg, &lo->plh_return_segs, pls_list) {
+			if (pnfs_seqid_is_newer(lseg->pls_seq, seq) ||
+			    !pnfs_should_free_range(&lseg->pls_range, range))
+				continue;
+			pnfs_set_plh_return_info(lo, range->iomode, seq);
+			break;
+		}
+	}
+}
+
 void pnfs_layoutreturn_free_lsegs(struct pnfs_layout_hdr *lo,
 		const nfs4_stateid *arg_stateid,
 		const struct pnfs_layout_range *range,
@@ -1577,9 +1597,8 @@ void pnfs_roc_release(struct nfs4_layoutreturn_args *args,
 	switch (ret) {
 	case -NFS4ERR_NOMATCHING_LAYOUT:
 		spin_lock(&inode->i_lock);
-		if (pnfs_layout_is_valid(lo) &&
-		    nfs4_stateid_match_other(&args->stateid, &lo->plh_stateid))
-			pnfs_set_plh_return_info(lo, args->range.iomode, 0);
+		pnfs_layoutreturn_retry_later_locked(lo, &args->stateid,
+						     &args->range);
 		pnfs_clear_layoutreturn_waitbit(lo);
 		spin_unlock(&inode->i_lock);
 		break;
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 07/11] NFSv4/pNFS: Handle server reboots in pnfs_poc_release()
  2024-06-13  5:00           ` [PATCH 06/11] NFSv4/pNFS: Add a helper to defer failed layoutreturn calls trondmy
@ 2024-06-13  5:00             ` trondmy
  2024-06-13  5:00               ` [PATCH 08/11] NFSv4/pNFS: Retry the layout return later in case of a timeout or reboot trondmy
  0 siblings, 1 reply; 13+ messages in thread
From: trondmy @ 2024-06-13  5:00 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

If the server reboots, then handle it by deferring the layout return.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/pnfs.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index a79ae47b3842..c8b1be1810e2 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1586,8 +1586,7 @@ int pnfs_roc_done(struct rpc_task *task, struct nfs4_layoutreturn_args **argpp,
 }
 
 void pnfs_roc_release(struct nfs4_layoutreturn_args *args,
-		struct nfs4_layoutreturn_res *res,
-		int ret)
+		      struct nfs4_layoutreturn_res *res, int ret)
 {
 	struct pnfs_layout_hdr *lo = args->layout;
 	struct inode *inode = args->inode;
@@ -1595,6 +1594,9 @@ void pnfs_roc_release(struct nfs4_layoutreturn_args *args,
 	struct nfs4_xdr_opaque_data *ld_private = args->ld_private;
 
 	switch (ret) {
+	case -NFS4ERR_BADSESSION:
+	case -NFS4ERR_DEADSESSION:
+	case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
 	case -NFS4ERR_NOMATCHING_LAYOUT:
 		spin_lock(&inode->i_lock);
 		pnfs_layoutreturn_retry_later_locked(lo, &args->stateid,
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 08/11] NFSv4/pNFS: Retry the layout return later in case of a timeout or reboot
  2024-06-13  5:00             ` [PATCH 07/11] NFSv4/pNFS: Handle server reboots in pnfs_poc_release() trondmy
@ 2024-06-13  5:00               ` trondmy
  2024-06-13  5:00                 ` [PATCH 09/11] NFSv4/pnfs: Give nfs4_proc_layoutreturn() a flags argument trondmy
  0 siblings, 1 reply; 13+ messages in thread
From: trondmy @ 2024-06-13  5:00 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

If the layout return failed due to a timeout or reboot, then leave the
layout segments on the list so that the layout return gets replayed
later.
The exception would be if we're freeing the inode.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/nfs4proc.c | 21 ++++++++++++++++++++-
 fs/nfs/pnfs.c     | 12 ++++++++++++
 fs/nfs/pnfs.h     |  3 +++
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index ae835d14ac75..952d1e930185 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -9944,6 +9944,11 @@ static void nfs4_layoutreturn_done(struct rpc_task *task, void *calldata)
 	if (!nfs41_sequence_process(task, &lrp->res.seq_res))
 		return;
 
+	if (task->tk_rpc_status == -ETIMEDOUT) {
+		lrp->rpc_status = -EAGAIN;
+		lrp->res.lrs_present = 0;
+		return;
+	}
 	/*
 	 * Was there an RPC level error? Assume the call succeeded,
 	 * and that we need to release the layout
@@ -9966,6 +9971,15 @@ static void nfs4_layoutreturn_done(struct rpc_task *task, void *calldata)
 		fallthrough;
 	case 0:
 		break;
+	case -NFS4ERR_BADSESSION:
+	case -NFS4ERR_DEADSESSION:
+	case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
+		nfs4_schedule_session_recovery(server->nfs_client->cl_session,
+					       task->tk_status);
+		lrp->res.lrs_present = 0;
+		lrp->rpc_status = -EAGAIN;
+		task->tk_status = 0;
+		break;
 	case -NFS4ERR_DELAY:
 		if (nfs4_async_handle_error(task, server, NULL, NULL) != -EAGAIN)
 			break;
@@ -9983,8 +9997,13 @@ static void nfs4_layoutreturn_release(void *calldata)
 	struct nfs4_layoutreturn *lrp = calldata;
 	struct pnfs_layout_hdr *lo = lrp->args.layout;
 
-	pnfs_layoutreturn_free_lsegs(lo, &lrp->args.stateid, &lrp->args.range,
+	if (lrp->rpc_status == 0 || !lrp->inode)
+		pnfs_layoutreturn_free_lsegs(
+			lo, &lrp->args.stateid, &lrp->args.range,
 			lrp->res.lrs_present ? &lrp->res.stateid : NULL);
+	else
+		pnfs_layoutreturn_retry_later(lo, &lrp->args.stateid,
+					      &lrp->args.range);
 	nfs4_sequence_free_slot(&lrp->res.seq_res);
 	if (lrp->ld_private.ops && lrp->ld_private.ops->free)
 		lrp->ld_private.ops->free(&lrp->ld_private);
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index c8b1be1810e2..04a52fa3d28c 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1191,6 +1191,18 @@ pnfs_layoutreturn_retry_later_locked(struct pnfs_layout_hdr *lo,
 	}
 }
 
+void pnfs_layoutreturn_retry_later(struct pnfs_layout_hdr *lo,
+				   const nfs4_stateid *arg_stateid,
+				   const struct pnfs_layout_range *range)
+{
+	struct inode *inode = lo->plh_inode;
+
+	spin_lock(&inode->i_lock);
+	pnfs_layoutreturn_retry_later_locked(lo, arg_stateid, range);
+	pnfs_clear_layoutreturn_waitbit(lo);
+	spin_unlock(&inode->i_lock);
+}
+
 void pnfs_layoutreturn_free_lsegs(struct pnfs_layout_hdr *lo,
 		const nfs4_stateid *arg_stateid,
 		const struct pnfs_layout_range *range,
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 8fa0f152ed19..cd23a38eac75 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -328,6 +328,9 @@ struct pnfs_layout_segment *pnfs_update_layout(struct inode *ino,
 					       enum pnfs_iomode iomode,
 					       bool strict_iomode,
 					       gfp_t gfp_flags);
+void pnfs_layoutreturn_retry_later(struct pnfs_layout_hdr *lo,
+				   const nfs4_stateid *arg_stateid,
+				   const struct pnfs_layout_range *range);
 void pnfs_layoutreturn_free_lsegs(struct pnfs_layout_hdr *lo,
 		const nfs4_stateid *arg_stateid,
 		const struct pnfs_layout_range *range,
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 09/11] NFSv4/pnfs: Give nfs4_proc_layoutreturn() a flags argument
  2024-06-13  5:00               ` [PATCH 08/11] NFSv4/pNFS: Retry the layout return later in case of a timeout or reboot trondmy
@ 2024-06-13  5:00                 ` trondmy
  2024-06-13  5:00                   ` [PATCH 10/11] NFSv4/pNFS: Remove redundant call to unhash the layout trondmy
  0 siblings, 1 reply; 13+ messages in thread
From: trondmy @ 2024-06-13  5:00 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Replace the boolean in nfs4_proc_layoutreturn() with a set of flags that
will allow us to craft a version that is appropriate for reboot
recovery.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/nfs4proc.c |  8 +++++---
 fs/nfs/pnfs.c     | 18 +++++++++++-------
 fs/nfs/pnfs.h     |  6 +++++-
 3 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 952d1e930185..8d237cf15a03 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -10019,7 +10019,7 @@ static const struct rpc_call_ops nfs4_layoutreturn_call_ops = {
 	.rpc_release = nfs4_layoutreturn_release,
 };
 
-int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp, bool sync)
+int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp, unsigned int flags)
 {
 	struct rpc_task *task;
 	struct rpc_message msg = {
@@ -10042,7 +10042,7 @@ int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp, bool sync)
 			&task_setup_data.rpc_client, &msg);
 
 	lrp->inode = nfs_igrab_and_active(lrp->args.inode);
-	if (!sync) {
+	if (flags & PNFS_FL_LAYOUTRETURN_ASYNC) {
 		if (!lrp->inode) {
 			nfs4_layoutreturn_release(lrp);
 			return -EAGAIN;
@@ -10050,6 +10050,8 @@ int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp, bool sync)
 		task_setup_data.flags |= RPC_TASK_ASYNC;
 	}
 	if (!lrp->inode)
+		flags |= PNFS_FL_LAYOUTRETURN_PRIVILEGED;
+	if (flags & PNFS_FL_LAYOUTRETURN_PRIVILEGED)
 		nfs4_init_sequence(&lrp->args.seq_args, &lrp->res.seq_res, 1,
 				   1);
 	else
@@ -10058,7 +10060,7 @@ int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp, bool sync)
 	task = rpc_run_task(&task_setup_data);
 	if (IS_ERR(task))
 		return PTR_ERR(task);
-	if (sync)
+	if (!(flags & PNFS_FL_LAYOUTRETURN_ASYNC))
 		status = task->tk_status;
 	trace_nfs4_layoutreturn(lrp->args.inode, &lrp->args.stateid, status);
 	dprintk("<-- %s status=%d\n", __func__, status);
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 04a52fa3d28c..c482088cb485 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1279,7 +1279,7 @@ pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo,
 		       const nfs4_stateid *stateid,
 		       const struct cred **pcred,
 		       enum pnfs_iomode iomode,
-		       bool sync)
+		       unsigned int flags)
 {
 	struct inode *ino = lo->plh_inode;
 	struct pnfs_layoutdriver_type *ld = NFS_SERVER(ino)->pnfs_curr_ld;
@@ -1306,7 +1306,7 @@ pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo,
 	if (ld->prepare_layoutreturn)
 		ld->prepare_layoutreturn(&lrp->args);
 
-	status = nfs4_proc_layoutreturn(lrp, sync);
+	status = nfs4_proc_layoutreturn(lrp, flags);
 out:
 	dprintk("<-- %s status: %d\n", __func__, status);
 	return status;
@@ -1340,7 +1340,8 @@ static void pnfs_layoutreturn_before_put_layout_hdr(struct pnfs_layout_hdr *lo)
 		spin_unlock(&inode->i_lock);
 		if (send) {
 			/* Send an async layoutreturn so we dont deadlock */
-			pnfs_send_layoutreturn(lo, &stateid, &cred, iomode, false);
+			pnfs_send_layoutreturn(lo, &stateid, &cred, iomode,
+					       PNFS_FL_LAYOUTRETURN_ASYNC);
 		}
 	} else
 		spin_unlock(&inode->i_lock);
@@ -1407,7 +1408,8 @@ _pnfs_return_layout(struct inode *ino)
 	send = pnfs_prepare_layoutreturn(lo, &stateid, &cred, NULL);
 	spin_unlock(&ino->i_lock);
 	if (send)
-		status = pnfs_send_layoutreturn(lo, &stateid, &cred, IOMODE_ANY, true);
+		status = pnfs_send_layoutreturn(lo, &stateid, &cred, IOMODE_ANY,
+						0);
 out_wait_layoutreturn:
 	wait_on_bit(&lo->plh_flags, NFS_LAYOUT_RETURN, TASK_UNINTERRUPTIBLE);
 out_put_layout_hdr:
@@ -1548,7 +1550,7 @@ bool pnfs_roc(struct inode *ino,
 		return true;
 	}
 	if (layoutreturn)
-		pnfs_send_layoutreturn(lo, &stateid, &lc_cred, iomode, true);
+		pnfs_send_layoutreturn(lo, &stateid, &lc_cred, iomode, 0);
 	pnfs_put_layout_hdr(lo);
 	return false;
 }
@@ -2595,7 +2597,8 @@ pnfs_mark_layout_for_return(struct inode *inode,
 		return_now = pnfs_prepare_layoutreturn(lo, &stateid, &cred, &iomode);
 		spin_unlock(&inode->i_lock);
 		if (return_now)
-			pnfs_send_layoutreturn(lo, &stateid, &cred, iomode, false);
+			pnfs_send_layoutreturn(lo, &stateid, &cred, iomode,
+					       PNFS_FL_LAYOUTRETURN_ASYNC);
 	} else {
 		spin_unlock(&inode->i_lock);
 		nfs_commit_inode(inode, 0);
@@ -2711,7 +2714,8 @@ static int pnfs_layout_return_unused_byserver(struct nfs_server *server,
 		}
 		spin_unlock(&inode->i_lock);
 		rcu_read_unlock();
-		pnfs_send_layoutreturn(lo, &stateid, &cred, iomode, false);
+		pnfs_send_layoutreturn(lo, &stateid, &cred, iomode,
+				       PNFS_FL_LAYOUTRETURN_ASYNC);
 		pnfs_put_layout_hdr(lo);
 		cond_resched();
 		goto restart;
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index cd23a38eac75..d192feb346b4 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -248,6 +248,9 @@ extern const struct pnfs_layoutdriver_type *pnfs_find_layoutdriver(u32 id);
 extern void pnfs_put_layoutdriver(const struct pnfs_layoutdriver_type *ld);
 
 /* nfs4proc.c */
+#define PNFS_FL_LAYOUTRETURN_ASYNC (1U << 0)
+#define PNFS_FL_LAYOUTRETURN_PRIVILEGED (1U << 1)
+
 extern size_t max_response_pages(struct nfs_server *server);
 extern int nfs4_proc_getdeviceinfo(struct nfs_server *server,
 				   struct pnfs_device *dev,
@@ -255,7 +258,8 @@ extern int nfs4_proc_getdeviceinfo(struct nfs_server *server,
 extern struct pnfs_layout_segment *
 nfs4_proc_layoutget(struct nfs4_layoutget *lgp,
 		    struct nfs4_exception *exception);
-extern int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp, bool sync);
+extern int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp,
+				  unsigned int flags);
 
 /* pnfs.c */
 void pnfs_get_layout_hdr(struct pnfs_layout_hdr *lo);
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 10/11] NFSv4/pNFS: Remove redundant call to unhash the layout
  2024-06-13  5:00                 ` [PATCH 09/11] NFSv4/pnfs: Give nfs4_proc_layoutreturn() a flags argument trondmy
@ 2024-06-13  5:00                   ` trondmy
  2024-06-13  5:00                     ` [PATCH 11/11] NFSv4/pNFS: Do layout state recovery upon reboot trondmy
  0 siblings, 1 reply; 13+ messages in thread
From: trondmy @ 2024-06-13  5:00 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

The layout will be automatically unhashed on final release of the
reference count.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/pnfs.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index c482088cb485..31df5fae7acb 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -858,8 +858,6 @@ pnfs_layout_bulk_destroy_byserver_locked(struct nfs_client *clp,
 			break;
 		inode = pnfs_grab_inode_layout_hdr(lo);
 		if (inode != NULL) {
-			if (test_and_clear_bit(NFS_LAYOUT_HASHED, &lo->plh_flags))
-				list_del_rcu(&lo->plh_layouts);
 			if (pnfs_layout_add_bulk_destroy_list(inode,
 						layout_list))
 				continue;
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 11/11] NFSv4/pNFS: Do layout state recovery upon reboot
  2024-06-13  5:00                   ` [PATCH 10/11] NFSv4/pNFS: Remove redundant call to unhash the layout trondmy
@ 2024-06-13  5:00                     ` trondmy
  0 siblings, 0 replies; 13+ messages in thread
From: trondmy @ 2024-06-13  5:00 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Some pNFS implementations, such as flexible files, want the client to
send the layout stats and layout errors that may have incurred while the
metadata server was booting. To do so, the client sends a layoutreturn
with an all-zero stateid while the server is in grace during reboot
recovery.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/flexfilelayout/flexfilelayout.c |   2 +-
 fs/nfs/nfs4state.c                     |   4 +-
 fs/nfs/pnfs.c                          | 106 +++++++++++++++++++++++--
 fs/nfs/pnfs.h                          |   6 ++
 include/linux/nfs_fs_sb.h              |   1 +
 5 files changed, 110 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 24188af56d5b..39ba9f4208aa 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -2548,7 +2548,7 @@ ff_layout_set_layoutdriver(struct nfs_server *server,
 		const struct nfs_fh *dummy)
 {
 #if IS_ENABLED(CONFIG_NFS_V4_2)
-	server->caps |= NFS_CAP_LAYOUTSTATS;
+	server->caps |= NFS_CAP_LAYOUTSTATS | NFS_CAP_REBOOT_LAYOUTRETURN;
 #endif
 	return 0;
 }
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 5b452411e8fd..877f682b45f2 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1863,6 +1863,7 @@ static void nfs4_state_end_reclaim_reboot(struct nfs_client *clp)
 
 	if (!nfs4_state_clear_reclaim_reboot(clp))
 		return;
+	pnfs_destroy_all_layouts(clp);
 	ops = clp->cl_mvops->reboot_recovery_ops;
 	cred = nfs4_get_clid_cred(clp);
 	err = nfs4_reclaim_complete(clp, ops, cred);
@@ -2068,7 +2069,6 @@ static int nfs4_establish_lease(struct nfs_client *clp)
 	put_cred(cred);
 	if (status != 0)
 		return status;
-	pnfs_destroy_all_layouts(clp);
 	return 0;
 }
 
@@ -2680,6 +2680,8 @@ static void nfs4_state_manager(struct nfs_client *clp)
 			section = "reclaim reboot";
 			status = nfs4_do_reclaim(clp,
 				clp->cl_mvops->reboot_recovery_ops);
+			if (status == 0)
+				status = pnfs_layout_handle_reboot(clp);
 			if (status == -EAGAIN)
 				continue;
 			if (status < 0)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 31df5fae7acb..aa698481bec8 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -61,6 +61,7 @@ static void pnfs_free_returned_lsegs(struct pnfs_layout_hdr *lo,
 		u32 seq);
 static bool pnfs_lseg_dec_and_remove_zero(struct pnfs_layout_segment *lseg,
 		                struct list_head *tmp_list);
+static int pnfs_layout_return_on_reboot(struct pnfs_layout_hdr *lo);
 
 /* Return the registered pnfs layout driver module matching given id */
 static struct pnfs_layoutdriver_type *
@@ -937,25 +938,37 @@ int pnfs_layout_destroy_byfsid(struct nfs_client *clp, struct nfs_fsid *fsid,
 	return pnfs_layout_free_bulk_destroy_list(&layout_list, mode);
 }
 
-int pnfs_layout_destroy_byclid(struct nfs_client *clp,
-			       enum pnfs_layout_destroy_mode mode)
+static void pnfs_layout_build_destroy_list_byclient(struct nfs_client *clp,
+						    struct list_head *list)
 {
 	struct nfs_server *server;
-	LIST_HEAD(layout_list);
 
 	spin_lock(&clp->cl_lock);
 	rcu_read_lock();
 restart:
 	list_for_each_entry_rcu(server, &clp->cl_superblocks, client_link) {
-		if (pnfs_layout_bulk_destroy_byserver_locked(clp,
-					server,
-					&layout_list) != 0)
+		if (pnfs_layout_bulk_destroy_byserver_locked(clp, server,
+							     list) != 0)
 			goto restart;
 	}
 	rcu_read_unlock();
 	spin_unlock(&clp->cl_lock);
+}
 
-	return pnfs_layout_free_bulk_destroy_list(&layout_list, mode);
+static int pnfs_layout_do_destroy_byclid(struct nfs_client *clp,
+					 struct list_head *list,
+					 enum pnfs_layout_destroy_mode mode)
+{
+	pnfs_layout_build_destroy_list_byclient(clp, list);
+	return pnfs_layout_free_bulk_destroy_list(list, mode);
+}
+
+int pnfs_layout_destroy_byclid(struct nfs_client *clp,
+			       enum pnfs_layout_destroy_mode mode)
+{
+	LIST_HEAD(layout_list);
+
+	return pnfs_layout_do_destroy_byclid(clp, &layout_list, mode);
 }
 
 /*
@@ -971,6 +984,67 @@ pnfs_destroy_all_layouts(struct nfs_client *clp)
 	pnfs_layout_destroy_byclid(clp, PNFS_LAYOUT_INVALIDATE);
 }
 
+static void pnfs_layout_build_recover_list_byclient(struct nfs_client *clp,
+						    struct list_head *list)
+{
+	struct nfs_server *server;
+
+	spin_lock(&clp->cl_lock);
+	rcu_read_lock();
+restart:
+	list_for_each_entry_rcu(server, &clp->cl_superblocks, client_link) {
+		if (!(server->caps & NFS_CAP_REBOOT_LAYOUTRETURN))
+			continue;
+		if (pnfs_layout_bulk_destroy_byserver_locked(clp, server,
+							     list) != 0)
+			goto restart;
+	}
+	rcu_read_unlock();
+	spin_unlock(&clp->cl_lock);
+}
+
+static int pnfs_layout_bulk_list_reboot(struct list_head *list)
+{
+	struct pnfs_layout_hdr *lo;
+	struct nfs_server *server;
+	int ret;
+
+	list_for_each_entry(lo, list, plh_bulk_destroy) {
+		server = NFS_SERVER(lo->plh_inode);
+		ret = pnfs_layout_return_on_reboot(lo);
+		switch (ret) {
+		case 0:
+			continue;
+		case -NFS4ERR_BAD_STATEID:
+			server->caps &= ~NFS_CAP_REBOOT_LAYOUTRETURN;
+			break;
+		case -NFS4ERR_NO_GRACE:
+			break;
+		default:
+			goto err;
+		}
+		break;
+	}
+	return 0;
+err:
+	return ret;
+}
+
+int pnfs_layout_handle_reboot(struct nfs_client *clp)
+{
+	LIST_HEAD(list);
+	int ret = 0, ret2;
+
+	pnfs_layout_build_recover_list_byclient(clp, &list);
+	if (!list_empty(&list))
+		ret = pnfs_layout_bulk_list_reboot(&list);
+	ret2 = pnfs_layout_do_destroy_byclid(clp, &list,
+					     PNFS_LAYOUT_INVALIDATE);
+	if (!ret)
+		ret = ret2;
+	return (ret == 0) ?  0 : -EAGAIN;
+}
+
 static void
 pnfs_set_layout_cred(struct pnfs_layout_hdr *lo, const struct cred *cred)
 {
@@ -1445,6 +1519,24 @@ pnfs_commit_and_return_layout(struct inode *inode)
 	return ret;
 }
 
+static int pnfs_layout_return_on_reboot(struct pnfs_layout_hdr *lo)
+{
+	struct inode *inode = lo->plh_inode;
+	const struct cred *cred;
+
+	spin_lock(&inode->i_lock);
+	if (!pnfs_layout_is_valid(lo)) {
+		spin_unlock(&inode->i_lock);
+		return 0;
+	}
+	cred = get_cred(lo->plh_lc_cred);
+	pnfs_get_layout_hdr(lo);
+	spin_unlock(&inode->i_lock);
+
+	return pnfs_send_layoutreturn(lo, &zero_stateid, &cred, IOMODE_ANY,
+				      PNFS_FL_LAYOUTRETURN_PRIVILEGED);
+}
+
 bool pnfs_roc(struct inode *ino,
 		struct nfs4_layoutreturn_args *args,
 		struct nfs4_layoutreturn_res *res,
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index d192feb346b4..bb5142b4e67a 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -356,6 +356,7 @@ void pnfs_error_mark_layout_for_return(struct inode *inode,
 				       struct pnfs_layout_segment *lseg);
 void pnfs_layout_return_unused_byclid(struct nfs_client *clp,
 				      enum pnfs_iomode iomode);
+int pnfs_layout_handle_reboot(struct nfs_client *clp);
 
 /* nfs4_deviceid_flags */
 enum {
@@ -737,6 +738,11 @@ static inline void pnfs_destroy_layout_final(struct nfs_inode *nfsi)
 {
 }
 
+static inline int pnfs_layout_handle_reboot(struct nfs_client *clp)
+{
+	return 0;
+}
+
 static inline struct pnfs_layout_segment *
 pnfs_get_lseg(struct pnfs_layout_segment *lseg)
 {
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index fe5b1a8bd723..ba9df1848b35 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -278,6 +278,7 @@ struct nfs_server {
 #define NFS_CAP_LGOPEN		(1U << 5)
 #define NFS_CAP_CASE_INSENSITIVE	(1U << 6)
 #define NFS_CAP_CASE_PRESERVING	(1U << 7)
+#define NFS_CAP_REBOOT_LAYOUTRETURN	(1U << 8)
 #define NFS_CAP_OPEN_XOR	(1U << 12)
 #define NFS_CAP_DELEGTIME	(1U << 13)
 #define NFS_CAP_POSIX_LOCK	(1U << 14)
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 00/11] LAYOUTRETURN on reboot
  2024-06-13  5:00 [PATCH 00/11] LAYOUTRETURN on reboot trondmy
  2024-06-13  5:00 ` [PATCH 01/11] NFSv4/pnfs: Remove redundant list check trondmy
@ 2024-06-14 11:47 ` Jeff Layton
  1 sibling, 0 replies; 13+ messages in thread
From: Jeff Layton @ 2024-06-14 11:47 UTC (permalink / raw)
  To: trondmy, linux-nfs

On Thu, 2024-06-13 at 01:00 -0400, trondmy@gmail.com wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> 
> Now that https://datatracker.ietf.org/doc/draft-ietf-nfsv4-layrec/ is
> mostly done with the review process, I'd like to move the final patches
> for the client implementation upstream.
> 
> The following patch series therefore adds support to the flexfiles pNFS
> driver so that if a metadata server reboot occurs while a client has
> layouts outstanding, and is performing I/O, then the client will report
> layoutstats and layout errors through a LAYOUTRETURN during the grace
> period, after the metadata server comes back up.
> This has implications for mirrored workloads, since it allows the client
> to report exactly which mirror data instances may have been corrupted
> due to the presence of errors during WRITEs or COMMITs.
> 
> Trond Myklebust (11):
>   NFSv4/pnfs: Remove redundant list check
>   NFSv4.1: constify the stateid argument in nfs41_test_stateid()
>   NFSv4: Clean up encode_nfs4_stateid()
>   pNFS: Add a flag argument to pnfs_destroy_layouts_byclid()
>   NFSv4/pnfs: Add support for the PNFS_LAYOUT_FILE_BULK_RETURN flag
>   NFSv4/pNFS: Add a helper to defer failed layoutreturn calls
>   NFSv4/pNFS: Handle server reboots in pnfs_poc_release()
>   NFSv4/pNFS: Retry the layout return later in case of a timeout or
>     reboot
>   NFSv4/pnfs: Give nfs4_proc_layoutreturn() a flags argument
>   NFSv4/pNFS: Remove redundant call to unhash the layout
>   NFSv4/pNFS: Do layout state recovery upon reboot
> 
>  fs/nfs/callback_proc.c                 |   5 +-
>  fs/nfs/flexfilelayout/flexfilelayout.c |   2 +-
>  fs/nfs/nfs4_fs.h                       |   3 +-
>  fs/nfs/nfs4proc.c                      |  53 ++++--
>  fs/nfs/nfs4state.c                     |   4 +-
>  fs/nfs/nfs4xdr.c                       |   7 +-
>  fs/nfs/pnfs.c                          | 223 +++++++++++++++++++------
>  fs/nfs/pnfs.h                          |  30 +++-
>  include/linux/nfs_fs_sb.h              |   1 +
>  include/linux/nfs_xdr.h                |   2 +-
>  10 files changed, 249 insertions(+), 81 deletions(-)
> 

These have been used for a while inside of Meta vs. Hammerspace's
servers and have been behaving.

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-06-14 11:47 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-13  5:00 [PATCH 00/11] LAYOUTRETURN on reboot trondmy
2024-06-13  5:00 ` [PATCH 01/11] NFSv4/pnfs: Remove redundant list check trondmy
2024-06-13  5:00   ` [PATCH 02/11] NFSv4.1: constify the stateid argument in nfs41_test_stateid() trondmy
2024-06-13  5:00     ` [PATCH 03/11] NFSv4: Clean up encode_nfs4_stateid() trondmy
2024-06-13  5:00       ` [PATCH 04/11] pNFS: Add a flag argument to pnfs_destroy_layouts_byclid() trondmy
2024-06-13  5:00         ` [PATCH 05/11] NFSv4/pnfs: Add support for the PNFS_LAYOUT_FILE_BULK_RETURN flag trondmy
2024-06-13  5:00           ` [PATCH 06/11] NFSv4/pNFS: Add a helper to defer failed layoutreturn calls trondmy
2024-06-13  5:00             ` [PATCH 07/11] NFSv4/pNFS: Handle server reboots in pnfs_poc_release() trondmy
2024-06-13  5:00               ` [PATCH 08/11] NFSv4/pNFS: Retry the layout return later in case of a timeout or reboot trondmy
2024-06-13  5:00                 ` [PATCH 09/11] NFSv4/pnfs: Give nfs4_proc_layoutreturn() a flags argument trondmy
2024-06-13  5:00                   ` [PATCH 10/11] NFSv4/pNFS: Remove redundant call to unhash the layout trondmy
2024-06-13  5:00                     ` [PATCH 11/11] NFSv4/pNFS: Do layout state recovery upon reboot trondmy
2024-06-14 11:47 ` [PATCH 00/11] LAYOUTRETURN on reboot Jeff Layton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox