[PATCH v2 0/7] Client-side OFFLOAD

public inbox for linux-nfs@vger.kernel.org

* [PATCH v2 0/7] Client-side OFFLOAD_STATUS implementation
@ 2024-12-20 15:42 cel
  2024-12-20 15:42 ` [PATCH v2 1/7] NFS: CB_OFFLOAD can return NFS4ERR_DELAY cel
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: cel @ 2024-12-20 15:42 UTC
  To: Olga Kornievskaia, Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

SCSI implementation experience has shown that an interrupt-only
COPY offload implementation is not reliable. There are too many
common scenarios where the client can miss the completion interrupt
(in our case, this is a CB_OFFLOAD callback).

Therefore, a polling mechanism is needed. The NFSv4.2 protocol
provides one in the form of the new OFFLOAD_STATUS operation. Linux
NFSD implements OFFLOAD_STATUS already. This series adds a Linux NFS
client implementation of the OFFLOAD_STATUS operation that can query
the state of a background COPY on the server.
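
As a rough userspace illustration of the wait-then-poll idea described
above (not code from this series), the sketch below waits for a
completion callback and falls back to a status poll each time the wait
times out. Every name in it (copy_job, wait_for_copy, the two demo
helpers) is hypothetical, and the "server" is simulated so that the
callback is never delivered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
enum poll_result { POLL_IN_PROGRESS, POLL_DONE, POLL_FAILED };

struct copy_job {
	bool callback_seen;	/* set when the completion callback arrives */
	uint64_t bytes_copied;	/* progress reported by the last poll */
	/* Stand-in for OFFLOAD_STATUS: asks the server for progress. */
	enum poll_result (*poll)(struct copy_job *job);
	/* Stand-in for a timed wait; returns true if the callback arrived. */
	bool (*wait_timeout)(struct copy_job *job);
};

/*
 * Wait for the callback, but poll the server each time the wait times
 * out. Returns 0 on completion, -1 on failure or when max_polls is
 * exhausted.
 */
static int wait_for_copy(struct copy_job *job, unsigned int max_polls)
{
	assert(job != NULL && job->poll != NULL && job->wait_timeout != NULL);

	for (unsigned int i = 0; i < max_polls; i++) {
		if (job->wait_timeout(job))
			return 0;	/* callback was delivered */
		switch (job->poll(job)) {
		case POLL_DONE:
			return 0;	/* callback was lost; the poll caught it */
		case POLL_FAILED:
			return -1;
		case POLL_IN_PROGRESS:
			break;		/* go back to waiting */
		}
	}
	return -1;
}

/* Demo server: the callback never arrives ... */
static bool lost_callback_wait(struct copy_job *job)
{
	return job->callback_seen;
}

/* ... and the copy finishes on the third poll, 100 bytes per poll. */
static enum poll_result three_step_poll(struct copy_job *job)
{
	job->bytes_copied += 100;
	return job->bytes_copied >= 300 ? POLL_DONE : POLL_IN_PROGRESS;
}
```

With an interrupt-only design the simulated job would wait forever;
with the polling fallback it completes on the third poll.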

These patches are also available here:

https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/log/?h=fix-async-copy

Changes since v1:
- nfs42_proc_offload_status() now uses a synchronous RPC

Chuck Lever (7):
  NFS: CB_OFFLOAD can return NFS4ERR_DELAY
  NFS: Fix typo in OFFLOAD_CANCEL comment
  NFS: Rename struct nfs4_offloadcancel_data
  NFS: Implement NFSv4.2's OFFLOAD_STATUS XDR
  NFS: Implement NFSv4.2's OFFLOAD_STATUS operation
  NFS: Use NFSv4.2's OFFLOAD_STATUS operation
  NFS: Refactor trace_nfs4_offload_cancel

 fs/nfs/callback_proc.c    |   2 +-
 fs/nfs/nfs42proc.c        | 188 ++++++++++++++++++++++++++++++++++----
 fs/nfs/nfs42xdr.c         |  88 +++++++++++++++++-
 fs/nfs/nfs4proc.c         |   3 +-
 fs/nfs/nfs4trace.h        |  11 ++-
 fs/nfs/nfs4xdr.c          |   1 +
 include/linux/nfs4.h      |   1 +
 include/linux/nfs_fs_sb.h |   1 +
 include/linux/nfs_xdr.h   |   5 +-
 9 files changed, 275 insertions(+), 25 deletions(-)

-- 
2.47.0



* [PATCH v2 1/7] NFS: CB_OFFLOAD can return NFS4ERR_DELAY
  2024-12-20 15:42 [PATCH v2 0/7] Client-side OFFLOAD_STATUS implementation cel
@ 2024-12-20 15:42 ` cel
  2024-12-20 15:42 ` [PATCH v2 2/7] NFS: Fix typo in OFFLOAD_CANCEL comment cel
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: cel @ 2024-12-20 15:42 UTC
  To: Olga Kornievskaia, Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

RFC 7862 permits the callback service to respond to a CB_OFFLOAD
operation with NFS4ERR_DELAY. Use that instead of
NFS4ERR_SERVERFAULT for temporary memory allocation failure, as that
is more consistent with how other operations report memory
allocation failure.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfs/callback_proc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index 7832fb0369a1..8397c43358bd 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -718,7 +718,7 @@ __be32 nfs4_callback_offload(void *data, void *dummy,
 
 	copy = kzalloc(sizeof(struct nfs4_copy_state), GFP_KERNEL);
 	if (!copy)
-		return htonl(NFS4ERR_SERVERFAULT);
+		return cpu_to_be32(NFS4ERR_DELAY);
 
 	spin_lock(&cps->clp->cl_lock);
 	rcu_read_lock();
-- 
2.47.0



* [PATCH v2 2/7] NFS: Fix typo in OFFLOAD_CANCEL comment
  2024-12-20 15:42 [PATCH v2 0/7] Client-side OFFLOAD_STATUS implementation cel
  2024-12-20 15:42 ` [PATCH v2 1/7] NFS: CB_OFFLOAD can return NFS4ERR_DELAY cel
@ 2024-12-20 15:42 ` cel
  2024-12-20 15:42 ` [PATCH v2 3/7] NFS: Rename struct nfs4_offloadcancel_data cel
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: cel @ 2024-12-20 15:42 UTC
  To: Olga Kornievskaia, Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfs/nfs42xdr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
index 9e3ae53e2205..ef5730c5e704 100644
--- a/fs/nfs/nfs42xdr.c
+++ b/fs/nfs/nfs42xdr.c
@@ -549,7 +549,7 @@ static void nfs4_xdr_enc_copy(struct rpc_rqst *req,
 }
 
 /*
- * Encode OFFLOAD_CANEL request
+ * Encode OFFLOAD_CANCEL request
  */
 static void nfs4_xdr_enc_offload_cancel(struct rpc_rqst *req,
 					struct xdr_stream *xdr,
-- 
2.47.0



* [PATCH v2 3/7] NFS: Rename struct nfs4_offloadcancel_data
  2024-12-20 15:42 [PATCH v2 0/7] Client-side OFFLOAD_STATUS implementation cel
  2024-12-20 15:42 ` [PATCH v2 1/7] NFS: CB_OFFLOAD can return NFS4ERR_DELAY cel
  2024-12-20 15:42 ` [PATCH v2 2/7] NFS: Fix typo in OFFLOAD_CANCEL comment cel
@ 2024-12-20 15:42 ` cel
  2024-12-20 15:42 ` [PATCH v2 4/7] NFS: Implement NFSv4.2's OFFLOAD_STATUS XDR cel
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: cel @ 2024-12-20 15:42 UTC
  To: Olga Kornievskaia, Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

Refactor: This struct can be used unchanged for the new
OFFLOAD_STATUS implementation, so give it a more generic name.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfs/nfs42proc.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index 531c9c20ef1d..9d716907cf30 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -498,15 +498,15 @@ ssize_t nfs42_proc_copy(struct file *src, loff_t pos_src,
 	return err;
 }
 
-struct nfs42_offloadcancel_data {
+struct nfs42_offload_data {
 	struct nfs_server *seq_server;
 	struct nfs42_offload_status_args args;
 	struct nfs42_offload_status_res res;
 };
 
-static void nfs42_offload_cancel_prepare(struct rpc_task *task, void *calldata)
+static void nfs42_offload_prepare(struct rpc_task *task, void *calldata)
 {
-	struct nfs42_offloadcancel_data *data = calldata;
+	struct nfs42_offload_data *data = calldata;
 
 	nfs4_setup_sequence(data->seq_server->nfs_client,
 				&data->args.osa_seq_args,
@@ -515,7 +515,7 @@ static void nfs42_offload_cancel_prepare(struct rpc_task *task, void *calldata)
 
 static void nfs42_offload_cancel_done(struct rpc_task *task, void *calldata)
 {
-	struct nfs42_offloadcancel_data *data = calldata;
+	struct nfs42_offload_data *data = calldata;
 
 	trace_nfs4_offload_cancel(&data->args, task->tk_status);
 	nfs41_sequence_done(task, &data->res.osr_seq_res);
@@ -525,22 +525,22 @@ static void nfs42_offload_cancel_done(struct rpc_task *task, void *calldata)
 		rpc_restart_call_prepare(task);
 }
 
-static void nfs42_free_offloadcancel_data(void *data)
+static void nfs42_offload_release(void *data)
 {
 	kfree(data);
 }
 
 static const struct rpc_call_ops nfs42_offload_cancel_ops = {
-	.rpc_call_prepare = nfs42_offload_cancel_prepare,
+	.rpc_call_prepare = nfs42_offload_prepare,
 	.rpc_call_done = nfs42_offload_cancel_done,
-	.rpc_release = nfs42_free_offloadcancel_data,
+	.rpc_release = nfs42_offload_release,
 };
 
 static int nfs42_do_offload_cancel_async(struct file *dst,
 					 nfs4_stateid *stateid)
 {
 	struct nfs_server *dst_server = NFS_SERVER(file_inode(dst));
-	struct nfs42_offloadcancel_data *data = NULL;
+	struct nfs42_offload_data *data = NULL;
 	struct nfs_open_context *ctx = nfs_file_open_context(dst);
 	struct rpc_task *task;
 	struct rpc_message msg = {
@@ -559,7 +559,7 @@ static int nfs42_do_offload_cancel_async(struct file *dst,
 	if (!(dst_server->caps & NFS_CAP_OFFLOAD_CANCEL))
 		return -EOPNOTSUPP;
 
-	data = kzalloc(sizeof(struct nfs42_offloadcancel_data), GFP_KERNEL);
+	data = kzalloc(sizeof(struct nfs42_offload_data), GFP_KERNEL);
 	if (data == NULL)
 		return -ENOMEM;
 
-- 
2.47.0



* [PATCH v2 4/7] NFS: Implement NFSv4.2's OFFLOAD_STATUS XDR
  2024-12-20 15:42 [PATCH v2 0/7] Client-side OFFLOAD_STATUS implementation cel
                   ` (2 preceding siblings ...)
  2024-12-20 15:42 ` [PATCH v2 3/7] NFS: Rename struct nfs4_offloadcancel_data cel
@ 2024-12-20 15:42 ` cel
  2024-12-20 15:42 ` [PATCH v2 5/7] NFS: Implement NFSv4.2's OFFLOAD_STATUS operation cel
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: cel @ 2024-12-20 15:42 UTC
  To: Olga Kornievskaia, Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

Add XDR encoding and decoding functions for the NFSv4.2
OFFLOAD_STATUS operation.
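
For readers unfamiliar with the wire format, here is a userspace
sketch of what decoding the OFFLOAD_STATUS result body involves: a
64-bit big-endian osr_count followed by the variable-length array
osr_complete<1>, which carries zero or one status values. This is an
illustration only; it does not use the kernel's xdr_stream API, and
the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the decoded result. */
struct offload_status_result {
	uint64_t count;		/* osr_count: bytes copied so far */
	uint32_t complete_count;	/* entries in osr_complete<1>: 0 or 1 */
	uint32_t complete;	/* osr_complete[0], valid if complete_count == 1 */
};

/* Read one big-endian 32-bit XDR word. */
static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Decode the body that follows the operation header. Returns 0 on
 * success, -1 if the buffer is short or the array exceeds its bound.
 */
static int decode_offload_status_body(const uint8_t *buf, size_t len,
				      struct offload_status_result *out)
{
	assert(buf != NULL && out != NULL);

	if (len < 12)		/* 8-byte hyper + 4-byte array length */
		return -1;
	out->count = ((uint64_t)get_be32(buf) << 32) | get_be32(buf + 4);
	out->complete_count = get_be32(buf + 8);
	if (out->complete_count > 1)
		return -1;	/* osr_complete<1> holds at most one entry */
	out->complete = 0;
	if (out->complete_count == 1) {
		if (len < 16)
			return -1;
		out->complete = get_be32(buf + 12);
	}
	return 0;
}
```

An empty array means the copy is still running; a single entry
carries the final status of the copy.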

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfs/nfs42xdr.c       | 86 +++++++++++++++++++++++++++++++++++++++++
 fs/nfs/nfs4xdr.c        |  1 +
 include/linux/nfs4.h    |  1 +
 include/linux/nfs_xdr.h |  5 ++-
 4 files changed, 91 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
index ef5730c5e704..a928b7f90e59 100644
--- a/fs/nfs/nfs42xdr.c
+++ b/fs/nfs/nfs42xdr.c
@@ -35,6 +35,11 @@
 #define encode_offload_cancel_maxsz	(op_encode_hdr_maxsz + \
 					 XDR_QUADLEN(NFS4_STATEID_SIZE))
 #define decode_offload_cancel_maxsz	(op_decode_hdr_maxsz)
+#define encode_offload_status_maxsz	(op_encode_hdr_maxsz + \
+					 XDR_QUADLEN(NFS4_STATEID_SIZE))
+#define decode_offload_status_maxsz	(op_decode_hdr_maxsz + \
+					 2 /* osr_count */ + \
+					 2 /* osr_complete */)
 #define encode_copy_notify_maxsz	(op_encode_hdr_maxsz + \
 					 XDR_QUADLEN(NFS4_STATEID_SIZE) + \
 					 1 + /* nl4_type */ \
@@ -143,6 +148,14 @@
 					 decode_sequence_maxsz + \
 					 decode_putfh_maxsz + \
 					 decode_offload_cancel_maxsz)
+#define NFS4_enc_offload_status_sz	(compound_encode_hdr_maxsz + \
+					 encode_sequence_maxsz + \
+					 encode_putfh_maxsz + \
+					 encode_offload_status_maxsz)
+#define NFS4_dec_offload_status_sz	(compound_decode_hdr_maxsz + \
+					 decode_sequence_maxsz + \
+					 decode_putfh_maxsz + \
+					 decode_offload_status_maxsz)
 #define NFS4_enc_copy_notify_sz		(compound_encode_hdr_maxsz + \
 					 encode_putfh_maxsz + \
 					 encode_copy_notify_maxsz)
@@ -343,6 +356,14 @@ static void encode_offload_cancel(struct xdr_stream *xdr,
 	encode_nfs4_stateid(xdr, &args->osa_stateid);
 }
 
+static void encode_offload_status(struct xdr_stream *xdr,
+				  const struct nfs42_offload_status_args *args,
+				  struct compound_hdr *hdr)
+{
+	encode_op_hdr(xdr, OP_OFFLOAD_STATUS, decode_offload_status_maxsz, hdr);
+	encode_nfs4_stateid(xdr, &args->osa_stateid);
+}
+
 static void encode_copy_notify(struct xdr_stream *xdr,
 			       const struct nfs42_copy_notify_args *args,
 			       struct compound_hdr *hdr)
@@ -567,6 +588,25 @@ static void nfs4_xdr_enc_offload_cancel(struct rpc_rqst *req,
 	encode_nops(&hdr);
 }
 
+/*
+ * Encode OFFLOAD_STATUS request
+ */
+static void nfs4_xdr_enc_offload_status(struct rpc_rqst *req,
+					struct xdr_stream *xdr,
+					const void *data)
+{
+	const struct nfs42_offload_status_args *args = data;
+	struct compound_hdr hdr = {
+		.minorversion = nfs4_xdr_minorversion(&args->osa_seq_args),
+	};
+
+	encode_compound_hdr(xdr, req, &hdr);
+	encode_sequence(xdr, &args->osa_seq_args, &hdr);
+	encode_putfh(xdr, args->osa_src_fh, &hdr);
+	encode_offload_status(xdr, args, &hdr);
+	encode_nops(&hdr);
+}
+
 /*
  * Encode COPY_NOTIFY request
  */
@@ -919,6 +959,26 @@ static int decode_offload_cancel(struct xdr_stream *xdr,
 	return decode_op_hdr(xdr, OP_OFFLOAD_CANCEL);
 }
 
+static int decode_offload_status(struct xdr_stream *xdr,
+				 struct nfs42_offload_status_res *res)
+{
+	ssize_t result;
+	int status;
+
+	status = decode_op_hdr(xdr, OP_OFFLOAD_STATUS);
+	if (status)
+		return status;
+	/* osr_count */
+	if (xdr_stream_decode_u64(xdr, &res->osr_count) < 0)
+		return -EIO;
+	/* osr_complete<1> */
+	result = xdr_stream_decode_uint32_array(xdr, &res->osr_complete, 1);
+	if (result < 0)
+		return -EIO;
+	res->complete_count = result;
+	return 0;
+}
+
 static int decode_copy_notify(struct xdr_stream *xdr,
 			      struct nfs42_copy_notify_res *res)
 {
@@ -1368,6 +1428,32 @@ static int nfs4_xdr_dec_offload_cancel(struct rpc_rqst *rqstp,
 	return status;
 }
 
+/*
+ * Decode OFFLOAD_STATUS response
+ */
+static int nfs4_xdr_dec_offload_status(struct rpc_rqst *rqstp,
+				       struct xdr_stream *xdr,
+				       void *data)
+{
+	struct nfs42_offload_status_res *res = data;
+	struct compound_hdr hdr;
+	int status;
+
+	status = decode_compound_hdr(xdr, &hdr);
+	if (status)
+		goto out;
+	status = decode_sequence(xdr, &res->osr_seq_res, rqstp);
+	if (status)
+		goto out;
+	status = decode_putfh(xdr);
+	if (status)
+		goto out;
+	status = decode_offload_status(xdr, res);
+
+out:
+	return status;
+}
+
 /*
  * Decode COPY_NOTIFY response
  */
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index e8ac3f615f93..08be0a0cce24 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -7702,6 +7702,7 @@ const struct rpc_procinfo nfs4_procedures[] = {
 	PROC42(CLONE,		enc_clone,		dec_clone),
 	PROC42(COPY,		enc_copy,		dec_copy),
 	PROC42(OFFLOAD_CANCEL,	enc_offload_cancel,	dec_offload_cancel),
+	PROC42(OFFLOAD_STATUS,	enc_offload_status,	dec_offload_status),
 	PROC42(COPY_NOTIFY,	enc_copy_notify,	dec_copy_notify),
 	PROC(LOOKUPP,		enc_lookupp,		dec_lookupp),
 	PROC42(LAYOUTERROR,	enc_layouterror,	dec_layouterror),
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 8d7430d9f218..5de96243a252 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -695,6 +695,7 @@ enum {
 	NFSPROC4_CLNT_LISTXATTRS,
 	NFSPROC4_CLNT_REMOVEXATTR,
 	NFSPROC4_CLNT_READ_PLUS,
+	NFSPROC4_CLNT_OFFLOAD_STATUS,
 };
 
 /* nfs41 types */
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 559273a0f16d..9ac6c7a26b44 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1520,8 +1520,9 @@ struct nfs42_offload_status_args {
 
 struct nfs42_offload_status_res {
 	struct nfs4_sequence_res	osr_seq_res;
-	uint64_t			osr_count;
-	int				osr_status;
+	u64				osr_count;
+	int				complete_count;
+	u32				osr_complete;
 };
 
 struct nfs42_copy_notify_args {
-- 
2.47.0



* [PATCH v2 5/7] NFS: Implement NFSv4.2's OFFLOAD_STATUS operation
  2024-12-20 15:42 [PATCH v2 0/7] Client-side OFFLOAD_STATUS implementation cel
                   ` (3 preceding siblings ...)
  2024-12-20 15:42 ` [PATCH v2 4/7] NFS: Implement NFSv4.2's OFFLOAD_STATUS XDR cel
@ 2024-12-20 15:42 ` cel
  2024-12-20 15:42 ` [PATCH v2 6/7] NFS: Use " cel
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: cel @ 2024-12-20 15:42 UTC
  To: Olga Kornievskaia, Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

Enable the Linux NFS client to observe the progress of an offloaded
asynchronous COPY operation. This new operation will be put to use
in a subsequent patch.
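
The return values listed in the kernel-doc comment of
nfs42_proc_offload_status() amount to a small decision table. The
userspace sketch below is one way to express it; the function name and
DEMO_NFS4_OK are hypothetical. Note one assumption: the sketch returns
an RPC-level error before looking at the result fields, whereas the
hunk as posted examines the result fields unconditionally after the
retry loop.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121		/* Linux value, for non-Linux builds */
#endif

#define DEMO_NFS4_OK 0		/* NFS4_OK is 0 in the NFSv4 protocol */

/*
 * Map the outcome of one OFFLOAD_STATUS round trip to a return value.
 * @rpc_status: 0, or a negative errno from the RPC itself
 * @complete_count: number of entries in osr_complete<1> (0 or 1)
 * @complete: osr_complete[0], meaningful only if complete_count == 1
 */
static int offload_status_to_errno(int rpc_status, uint32_t complete_count,
				   uint32_t complete)
{
	assert(complete_count <= 1);

	if (rpc_status)
		return rpc_status;	/* e.g. -EBADF or -EOPNOTSUPP */
	if (complete_count == 0)
		return -EINPROGRESS;	/* no completion status yet */
	if (complete != DEMO_NFS4_OK)
		return -EREMOTEIO;	/* copy failed on the server */
	return 0;
}
```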

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfs/nfs42proc.c        | 101 ++++++++++++++++++++++++++++++++++++++
 fs/nfs/nfs4proc.c         |   3 +-
 include/linux/nfs_fs_sb.h |   1 +
 3 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index 9d716907cf30..7fd0f2aa42d4 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -21,6 +21,8 @@
 
 #define NFSDBG_FACILITY NFSDBG_PROC
 static int nfs42_do_offload_cancel_async(struct file *dst, nfs4_stateid *std);
+static int nfs42_proc_offload_status(struct file *file, nfs4_stateid *stateid,
+				     u64 *copied);
 
 static void nfs42_set_netaddr(struct file *filep, struct nfs42_netaddr *naddr)
 {
@@ -582,6 +584,105 @@ static int nfs42_do_offload_cancel_async(struct file *dst,
 	return status;
 }
 
+static int
+_nfs42_proc_offload_status(struct nfs_server *server, struct file *file,
+			   struct nfs42_offload_data *data)
+{
+	struct nfs_open_context *ctx = nfs_file_open_context(file);
+	struct rpc_message msg = {
+		.rpc_proc	= &nfs4_procedures[NFSPROC4_CLNT_OFFLOAD_STATUS],
+		.rpc_argp	= &data->args,
+		.rpc_resp	= &data->res,
+		.rpc_cred	= ctx->cred,
+	};
+	int status;
+
+	status = nfs4_call_sync(server->client, server, &msg,
+				&data->args.osa_seq_args,
+				&data->res.osr_seq_res, 1);
+	switch (status) {
+	case 0:
+		break;
+
+	case -NFS4ERR_ADMIN_REVOKED:
+	case -NFS4ERR_BAD_STATEID:
+	case -NFS4ERR_OLD_STATEID:
+		/*
+		 * Server does not recognize the COPY stateid. CB_OFFLOAD
+		 * could have purged it, or server might have rebooted.
+		 * Since COPY stateids don't have an associated inode,
+		 * avoid triggering state recovery.
+		 */
+		status = -EBADF;
+		break;
+	case -NFS4ERR_NOTSUPP:
+	case -ENOTSUPP:
+	case -EOPNOTSUPP:
+		server->caps &= ~NFS_CAP_OFFLOAD_STATUS;
+		status = -EOPNOTSUPP;
+		break;
+	}
+
+	return status;
+}
+
+/**
+ * nfs42_proc_offload_status - Poll completion status of an async copy operation
+ * @dst: handle of file being copied into
+ * @stateid: copy stateid (from async COPY result)
+ * @copied: OUT: number of bytes copied so far
+ *
+ * Return values:
+ *   %0: Server returned an NFS4_OK completion status
+ *   %-EINPROGRESS: Server returned no completion status
+ *   %-EREMOTEIO: Server returned an error completion status
+ *   %-EBADF: Server did not recognize the copy stateid
+ *   %-EOPNOTSUPP: Server does not support OFFLOAD_STATUS
+ *   %-ERESTARTSYS: Wait interrupted by signal
+ *
+ * Other negative errnos indicate the client could not complete the
+ * request.
+ */
+static int __maybe_unused
+nfs42_proc_offload_status(struct file *dst, nfs4_stateid *stateid, u64 *copied)
+{
+	struct inode *inode = file_inode(dst);
+	struct nfs_server *server = NFS_SERVER(inode);
+	struct nfs4_exception exception = {
+		.inode = inode,
+	};
+	struct nfs42_offload_data *data;
+	int status;
+
+	if (!(server->caps & NFS_CAP_OFFLOAD_STATUS))
+		return -EOPNOTSUPP;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+	data->seq_server = server;
+	data->args.osa_src_fh = NFS_FH(inode);
+	memcpy(&data->args.osa_stateid, stateid,
+		sizeof(data->args.osa_stateid));
+	exception.stateid = &data->args.osa_stateid;
+	do {
+		status = _nfs42_proc_offload_status(server, dst, data);
+		if (status == -EOPNOTSUPP)
+			goto out;
+		status = nfs4_handle_exception(server, status, &exception);
+	} while (exception.retry);
+
+	*copied = data->res.osr_count;
+	if (!data->res.complete_count)
+		status = -EINPROGRESS;
+	else if (data->res.osr_complete != NFS_OK)
+		status = -EREMOTEIO;
+
+out:
+	kfree(data);
+	return status;
+}
+
 static int _nfs42_proc_copy_notify(struct file *src, struct file *dst,
 				   struct nfs42_copy_notify_args *args,
 				   struct nfs42_copy_notify_res *res)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 405f17e6e0b4..973b8d8fa98b 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -10769,7 +10769,8 @@ static const struct nfs4_minor_version_ops nfs_v4_2_minor_ops = {
 		| NFS_CAP_CLONE
 		| NFS_CAP_LAYOUTERROR
 		| NFS_CAP_READ_PLUS
-		| NFS_CAP_MOVEABLE,
+		| NFS_CAP_MOVEABLE
+		| NFS_CAP_OFFLOAD_STATUS,
 	.init_client = nfs41_init_client,
 	.shutdown_client = nfs41_shutdown_client,
 	.match_stateid = nfs41_match_stateid,
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index b804346a9741..946ca1c28773 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -290,6 +290,7 @@ struct nfs_server {
 #define NFS_CAP_CASE_INSENSITIVE	(1U << 6)
 #define NFS_CAP_CASE_PRESERVING	(1U << 7)
 #define NFS_CAP_REBOOT_LAYOUTRETURN	(1U << 8)
+#define NFS_CAP_OFFLOAD_STATUS	(1U << 9)
 #define NFS_CAP_OPEN_XOR	(1U << 12)
 #define NFS_CAP_DELEGTIME	(1U << 13)
 #define NFS_CAP_POSIX_LOCK	(1U << 14)
-- 
2.47.0



* [PATCH v2 6/7] NFS: Use NFSv4.2's OFFLOAD_STATUS operation
  2024-12-20 15:42 [PATCH v2 0/7] Client-side OFFLOAD_STATUS implementation cel
                   ` (4 preceding siblings ...)
  2024-12-20 15:42 ` [PATCH v2 5/7] NFS: Implement NFSv4.2's OFFLOAD_STATUS operation cel
@ 2024-12-20 15:42 ` cel
  2024-12-21 14:36   ` Olga Kornievskaia
  2024-12-20 15:42 ` [PATCH v2 7/7] NFS: Refactor trace_nfs4_offload_cancel cel
  2024-12-20 16:46 ` [PATCH v2 0/7] Client-side OFFLOAD_STATUS implementation Jeff Layton
  7 siblings, 1 reply; 11+ messages in thread
From: cel @ 2024-12-20 15:42 UTC
  To: Olga Kornievskaia, Trond Myklebust, Anna Schumaker
  Cc: linux-nfs, Chuck Lever, Olga Kornievskaia

From: Chuck Lever <chuck.lever@oracle.com>

We've found that there are cases where a transport disconnection
results in the loss of callback RPCs. NFS servers typically do not
retransmit callback operations after a disconnect.

This can be a problem for the Linux NFS client's current
implementation of asynchronous COPY, which waits indefinitely for a
CB_OFFLOAD callback. If a transport disconnect occurs while an async
COPY is running, there's a good chance the client will never get the
completing CB_OFFLOAD.

Fix this by implementing the OFFLOAD_STATUS operation so that the
Linux NFS client can probe the NFS server if it doesn't see a
CB_OFFLOAD in a reasonable amount of time.

This patch implements a simplistic check. As future work, the client
might also be able to detect when the requested asynchronous COPY
operation is making no forward progress, and CANCEL it.

Suggested-by: Olga Kornievskaia <kolga@netapp.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218735
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfs/nfs42proc.c | 70 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 59 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index 7fd0f2aa42d4..65cfdb5c7b02 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -175,6 +175,25 @@ int nfs42_proc_deallocate(struct file *filep, loff_t offset, loff_t len)
 	return err;
 }
 
+/* Wait this long before checking progress on a COPY operation */
+enum {
+	NFS42_COPY_TIMEOUT	= 3 * HZ,
+};
+
+static void nfs4_copy_dequeue_callback(struct nfs_server *dst_server,
+				       struct nfs_server *src_server,
+				       struct nfs4_copy_state *copy)
+{
+	spin_lock(&dst_server->nfs_client->cl_lock);
+	list_del_init(&copy->copies);
+	spin_unlock(&dst_server->nfs_client->cl_lock);
+	if (dst_server != src_server) {
+		spin_lock(&src_server->nfs_client->cl_lock);
+		list_del_init(&copy->src_copies);
+		spin_unlock(&src_server->nfs_client->cl_lock);
+	}
+}
+
 static int handle_async_copy(struct nfs42_copy_res *res,
 			     struct nfs_server *dst_server,
 			     struct nfs_server *src_server,
@@ -184,9 +203,10 @@ static int handle_async_copy(struct nfs42_copy_res *res,
 			     bool *restart)
 {
 	struct nfs4_copy_state *copy, *tmp_copy = NULL, *iter;
-	int status = NFS4_OK;
 	struct nfs_open_context *dst_ctx = nfs_file_open_context(dst);
 	struct nfs_open_context *src_ctx = nfs_file_open_context(src);
+	int status = NFS4_OK;
+	u64 copied;
 
 	copy = kzalloc(sizeof(struct nfs4_copy_state), GFP_KERNEL);
 	if (!copy)
@@ -224,15 +244,12 @@ static int handle_async_copy(struct nfs42_copy_res *res,
 		spin_unlock(&src_server->nfs_client->cl_lock);
 	}
 
-	status = wait_for_completion_interruptible(&copy->completion);
-	spin_lock(&dst_server->nfs_client->cl_lock);
-	list_del_init(&copy->copies);
-	spin_unlock(&dst_server->nfs_client->cl_lock);
-	if (dst_server != src_server) {
-		spin_lock(&src_server->nfs_client->cl_lock);
-		list_del_init(&copy->src_copies);
-		spin_unlock(&src_server->nfs_client->cl_lock);
-	}
+wait:
+	status = wait_for_completion_interruptible_timeout(&copy->completion,
+							   NFS42_COPY_TIMEOUT);
+	if (!status)
+		goto timeout;
+	nfs4_copy_dequeue_callback(dst_server, src_server, copy);
 	if (status == -ERESTARTSYS) {
 		goto out_cancel;
 	} else if (copy->flags || copy->error == NFS4ERR_PARTNER_NO_AUTH) {
@@ -242,6 +259,7 @@ static int handle_async_copy(struct nfs42_copy_res *res,
 	}
 out:
 	res->write_res.count = copy->count;
+	/* Copy out the updated write verifier provided by CB_OFFLOAD. */
 	memcpy(&res->write_res.verifier, &copy->verf, sizeof(copy->verf));
 	status = -copy->error;
 
@@ -253,6 +271,36 @@ static int handle_async_copy(struct nfs42_copy_res *res,
 	if (!nfs42_files_from_same_server(src, dst))
 		nfs42_do_offload_cancel_async(src, src_stateid);
 	goto out_free;
+timeout:
+	status = nfs42_proc_offload_status(dst, &copy->stateid, &copied);
+	if (status == -EINPROGRESS)
+		goto wait;
+	nfs4_copy_dequeue_callback(dst_server, src_server, copy);
+	switch (status) {
+	case 0:
+		/* The server recognized the copy stateid, so it hasn't
+		 * rebooted. Don't overwrite the verifier returned in the
+		 * COPY result. */
+		res->write_res.count = copied;
+		goto out_free;
+	case -EREMOTEIO:
+		/* COPY operation failed on the server. */
+		status = -EOPNOTSUPP;
+		res->write_res.count = copied;
+		goto out_free;
+	case -EBADF:
+		/* Server did not recognize the copy stateid. It has
+		 * probably restarted and lost the plot. */
+		res->write_res.count = 0;
+		status = -EOPNOTSUPP;
+		break;
+	case -EOPNOTSUPP:
+		/* RFC 7862 REQUIREs server to support OFFLOAD_STATUS when
+		 * it has signed up for an async COPY, so server is not
+		 * spec-compliant. */
+		res->write_res.count = 0;
+	}
+	goto out_free;
 }
 
 static int process_copy_commit(struct file *dst, loff_t pos_dst,
@@ -643,7 +691,7 @@ _nfs42_proc_offload_status(struct nfs_server *server, struct file *file,
  * Other negative errnos indicate the client could not complete the
  * request.
  */
-static int __maybe_unused
+static int
 nfs42_proc_offload_status(struct file *dst, nfs4_stateid *stateid, u64 *copied)
 {
 	struct inode *inode = file_inode(dst);
-- 
2.47.0



* Re: [PATCH v2 6/7] NFS: Use NFSv4.2's OFFLOAD_STATUS operation
  2024-12-20 15:42 ` [PATCH v2 6/7] NFS: Use " cel
@ 2024-12-21 14:36   ` Olga Kornievskaia
  2024-12-21 16:10     ` Chuck Lever
  0 siblings, 1 reply; 11+ messages in thread
From: Olga Kornievskaia @ 2024-12-21 14:36 UTC
  To: cel
  Cc: Olga Kornievskaia, Trond Myklebust, Anna Schumaker, linux-nfs,
	Chuck Lever, Olga Kornievskaia

On Fri, Dec 20, 2024 at 10:46 AM <cel@kernel.org> wrote:
>
> From: Chuck Lever <chuck.lever@oracle.com>
>
> We've found that there are cases where a transport disconnection
> results in the loss of callback RPCs. NFS servers typically do not
> retransmit callback operations after a disconnect.
>
> This can be a problem for the Linux NFS client's current
> implementation of asynchronous COPY, which waits indefinitely for a
> CB_OFFLOAD callback. If a transport disconnect occurs while an async
> COPY is running, there's a good chance the client will never get the
> completing CB_OFFLOAD.
>
> Fix this by implementing the OFFLOAD_STATUS operation so that the
> Linux NFS client can probe the NFS server if it doesn't see a
> CB_OFFLOAD in a reasonable amount of time.
>
> This patch implements a simplistic check. As future work, the client
> might also be able to detect when the requested asynchronous COPY
> operation is making no forward progress, and CANCEL it.
>
> Suggested-by: Olga Kornievskaia <kolga@netapp.com>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=218735
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfs/nfs42proc.c | 70 ++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 59 insertions(+), 11 deletions(-)
>
> diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
> index 7fd0f2aa42d4..65cfdb5c7b02 100644
> --- a/fs/nfs/nfs42proc.c
> +++ b/fs/nfs/nfs42proc.c
> @@ -175,6 +175,25 @@ int nfs42_proc_deallocate(struct file *filep, loff_t offset, loff_t len)
>         return err;
>  }
>
> +/* Wait this long before checking progress on a COPY operation */
> +enum {
> +       NFS42_COPY_TIMEOUT      = 3 * HZ,

I'm really not a fan of such a short timeout. It makes OFFLOAD_STATUS
more likely, rather than less likely, to be sent during a copy.
Because OFFLOAD_STATUS and CB_OFFLOAD can run concurrently, they
introduce races that we have to try to account for.

> +};
> +
> +static void nfs4_copy_dequeue_callback(struct nfs_server *dst_server,
> +                                      struct nfs_server *src_server,
> +                                      struct nfs4_copy_state *copy)
> +{
> +       spin_lock(&dst_server->nfs_client->cl_lock);
> +       list_del_init(&copy->copies);
> +       spin_unlock(&dst_server->nfs_client->cl_lock);
> +       if (dst_server != src_server) {
> +               spin_lock(&src_server->nfs_client->cl_lock);
> +               list_del_init(&copy->src_copies);
> +               spin_unlock(&src_server->nfs_client->cl_lock);
> +       }
> +}
> +
>  static int handle_async_copy(struct nfs42_copy_res *res,
>                              struct nfs_server *dst_server,
>                              struct nfs_server *src_server,
> @@ -184,9 +203,10 @@ static int handle_async_copy(struct nfs42_copy_res *res,
>                              bool *restart)
>  {
>         struct nfs4_copy_state *copy, *tmp_copy = NULL, *iter;
> -       int status = NFS4_OK;
>         struct nfs_open_context *dst_ctx = nfs_file_open_context(dst);
>         struct nfs_open_context *src_ctx = nfs_file_open_context(src);
> +       int status = NFS4_OK;
> +       u64 copied;
>
>         copy = kzalloc(sizeof(struct nfs4_copy_state), GFP_KERNEL);
>         if (!copy)
> @@ -224,15 +244,12 @@ static int handle_async_copy(struct nfs42_copy_res *res,
>                 spin_unlock(&src_server->nfs_client->cl_lock);
>         }
>
> -       status = wait_for_completion_interruptible(&copy->completion);
> -       spin_lock(&dst_server->nfs_client->cl_lock);
> -       list_del_init(&copy->copies);
> -       spin_unlock(&dst_server->nfs_client->cl_lock);
> -       if (dst_server != src_server) {
> -               spin_lock(&src_server->nfs_client->cl_lock);
> -               list_del_init(&copy->src_copies);
> -               spin_unlock(&src_server->nfs_client->cl_lock);
> -       }
> +wait:
> +       status = wait_for_completion_interruptible_timeout(&copy->completion,
> +                                                          NFS42_COPY_TIMEOUT);
> +       if (!status)
> +               goto timeout;
> +       nfs4_copy_dequeue_callback(dst_server, src_server, copy);
>         if (status == -ERESTARTSYS) {
>                 goto out_cancel;
>         } else if (copy->flags || copy->error == NFS4ERR_PARTNER_NO_AUTH) {
> @@ -242,6 +259,7 @@ static int handle_async_copy(struct nfs42_copy_res *res,
>         }
>  out:
>         res->write_res.count = copy->count;
> +       /* Copy out the updated write verifier provided by CB_OFFLOAD. */
>         memcpy(&res->write_res.verifier, &copy->verf, sizeof(copy->verf));
>         status = -copy->error;
>
> @@ -253,6 +271,36 @@ static int handle_async_copy(struct nfs42_copy_res *res,
>         if (!nfs42_files_from_same_server(src, dst))
>                 nfs42_do_offload_cancel_async(src, src_stateid);
>         goto out_free;
> +timeout:
> +       status = nfs42_proc_offload_status(dst, &copy->stateid, &copied);

Regardless of what OFFLOAD_STATUS returned, we have to check whether
the CB_OFFLOAD arrived while we were waiting for the OFFLOAD_STATUS
reply.

> +       if (status == -EINPROGRESS)
> +               goto wait;
> +       nfs4_copy_dequeue_callback(dst_server, src_server, copy);
> +       switch (status) {
> +       case 0:
> +               /* The server recognized the copy stateid, so it hasn't
> +                * rebooted. Don't overwrite the verifier returned in the
> +                * COPY result. */
> +               res->write_res.count = copied;
> +               goto out_free;

In case OFFLOAD_STATUS was successful and CB_OFFLOAD was received, we
should take the verifier from the CB_OFFLOAD; otherwise we are sending
an unneeded and expensive COMMIT. OFFLOAD_STATUS carries the
completion status and byte count of the copy, but it does not carry the
"how committed" value, and thus we are forced to use the async copy's
"how committed" value.

> +       case -EREMOTEIO:
> +               /* COPY operation failed on the server. */
> +               status = -EOPNOTSUPP;
> +               res->write_res.count = copied;
> +               goto out_free;
> +       case -EBADF:
> +               /* Server did not recognize the copy stateid. It has
> +                * probably restarted and lost the plot. */
> +               res->write_res.count = 0;
> +               status = -EOPNOTSUPP;
> +               break;

This is the case of receiving a BAD_STATEID from OFFLOAD_STATUS, and
this would lead to the copy falling back to read/write operations. If we
don't check for the existence of a CB_OFFLOAD reply, then OFFLOAD_STATUS
can and will carry BAD_STATEID, as the server is free to delete the copy
state after it gets the CB_OFFLOAD reply (which, as I said, we have to
check). And if we did, then we need to take the result of the CB_OFFLOAD
and not act on OFFLOAD_STATUS's error.

> +       case -EOPNOTSUPP:
> +               /* RFC 7862 REQUIREs server to support OFFLOAD_STATUS when
> +                * it has signed up for an async COPY, so server is not
> +                * spec-compliant. */
> +               res->write_res.count = 0;
> +       }
> +       goto out_free;
>  }
>
>  static int process_copy_commit(struct file *dst, loff_t pos_dst,
> @@ -643,7 +691,7 @@ _nfs42_proc_offload_status(struct nfs_server *server, struct file *file,
>   * Other negative errnos indicate the client could not complete the
>   * request.
>   */
> -static int __maybe_unused
> +static int
>  nfs42_proc_offload_status(struct file *dst, nfs4_stateid *stateid, u64 *copied)
>  {
>         struct inode *inode = file_inode(dst);
> --
> 2.47.0
>
>


* Re: [PATCH v2 6/7] NFS: Use NFSv4.2's OFFLOAD_STATUS operation
  2024-12-21 14:36   ` Olga Kornievskaia
@ 2024-12-21 16:10     ` Chuck Lever
  0 siblings, 0 replies; 11+ messages in thread
From: Chuck Lever @ 2024-12-21 16:10 UTC (permalink / raw)
  To: Olga Kornievskaia, cel
  Cc: Olga Kornievskaia, Trond Myklebust, Anna Schumaker, linux-nfs,
	Olga Kornievskaia

On 12/21/24 9:36 AM, Olga Kornievskaia wrote:
> On Fri, Dec 20, 2024 at 10:46 AM <cel@kernel.org> wrote:
>>
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> We've found that there are cases where a transport disconnection
>> results in the loss of callback RPCs. NFS servers typically do not
>> retransmit callback operations after a disconnect.
>>
>> This can be a problem for the Linux NFS client's current
>> implementation of asynchronous COPY, which waits indefinitely for a
>> CB_OFFLOAD callback. If a transport disconnect occurs while an async
>> COPY is running, there's a good chance the client will never get the
>> completing CB_OFFLOAD.
>>
>> Fix this by implementing the OFFLOAD_STATUS operation so that the
>> Linux NFS client can probe the NFS server if it doesn't see a
>> CB_OFFLOAD in a reasonable amount of time.
>>
>> This patch implements a simplistic check. As future work, the client
>> might also be able to detect whether there is no forward progress on
>> the requested asynchronous COPY operation, and CANCEL it.
>>
>> Suggested-by: Olga Kornievskaia <kolga@netapp.com>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=218735
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>   fs/nfs/nfs42proc.c | 70 ++++++++++++++++++++++++++++++++++++++--------
>>   1 file changed, 59 insertions(+), 11 deletions(-)
>>
>> diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
>> index 7fd0f2aa42d4..65cfdb5c7b02 100644
>> --- a/fs/nfs/nfs42proc.c
>> +++ b/fs/nfs/nfs42proc.c
>> @@ -175,6 +175,25 @@ int nfs42_proc_deallocate(struct file *filep, loff_t offset, loff_t len)
>>          return err;
>>   }
>>
>> +/* Wait this long before checking progress on a COPY operation */
>> +enum {
>> +       NFS42_COPY_TIMEOUT      = 3 * HZ,
> 
> I'm really not a fan of such a short time out.

I'm not wedded to 3 seconds.

For the purpose of a proof-of-concept, it can be valuable to make
such timeouts short to exacerbate races. But perhaps we are about to
step out of the realm of proof-of-concept. Suggestions are welcome.


> This makes
> OFFLOAD_STATUS a more likely operation rather than a less likely
> operation to occur during a copy. OFFLOAD_STATUS and CB_OFFLOAD being
> concurrent operations introduces races which we have to try to account
> for.

However, we have to close these windows whether the timeout is short or
long. Making a timeout longer simply to avoid races seems like a
band-aid to me.

Back to choosing a value.

For a very long running copy, every 3 seconds might indeed be overkill.
We could use the advertised lease timeout (or a fraction of it) instead.

Neil suggested an exponential backoff here. That feels to me like an
optimization that can be made later if we find there is evidence in
favor of doing so.
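
For illustration only, here is a minimal userspace sketch of how the two
ideas above (a cap derived from the lease time, plus exponential backoff)
might be combined. Nothing here is in the series: copy_poll_next(), the
3-second starting value, and the choice of half the lease time as the cap
are all assumptions.

```c
/*
 * Hypothetical helper, not part of this series.
 *
 * Returns the next polling interval in seconds. The first interval is
 * 3 seconds (assumed), each later interval doubles the previous one,
 * and the result never exceeds half the lease time (assumed cap).
 */
unsigned int copy_poll_next(unsigned int prev_secs, unsigned int lease_secs)
{
	unsigned int cap = lease_secs / 2;

	if (cap == 0)
		cap = 1;		/* never poll more often than once a second */
	if (prev_secs == 0)		/* no previous poll: pick the starting value */
		return cap < 3 ? cap : 3;
	if (prev_secs > cap / 2)	/* doubling would exceed the cap */
		return cap;
	return prev_secs * 2;
}
```

With a 90-second lease this yields 3, 6, 12, 24, 45, 45, ... seconds, so a
short COPY is still checked quickly while a long-running one is polled
at most about twice per lease period.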



>> +};
>> +
>> +static void nfs4_copy_dequeue_callback(struct nfs_server *dst_server,
>> +                                      struct nfs_server *src_server,
>> +                                      struct nfs4_copy_state *copy)
>> +{
>> +       spin_lock(&dst_server->nfs_client->cl_lock);
>> +       list_del_init(&copy->copies);
>> +       spin_unlock(&dst_server->nfs_client->cl_lock);
>> +       if (dst_server != src_server) {
>> +               spin_lock(&src_server->nfs_client->cl_lock);
>> +               list_del_init(&copy->src_copies);
>> +               spin_unlock(&src_server->nfs_client->cl_lock);
>> +       }
>> +}
>> +
>>   static int handle_async_copy(struct nfs42_copy_res *res,
>>                               struct nfs_server *dst_server,
>>                               struct nfs_server *src_server,
>> @@ -184,9 +203,10 @@ static int handle_async_copy(struct nfs42_copy_res *res,
>>                               bool *restart)
>>   {
>>          struct nfs4_copy_state *copy, *tmp_copy = NULL, *iter;
>> -       int status = NFS4_OK;
>>          struct nfs_open_context *dst_ctx = nfs_file_open_context(dst);
>>          struct nfs_open_context *src_ctx = nfs_file_open_context(src);
>> +       int status = NFS4_OK;
>> +       u64 copied;
>>
>>          copy = kzalloc(sizeof(struct nfs4_copy_state), GFP_KERNEL);
>>          if (!copy)
>> @@ -224,15 +244,12 @@ static int handle_async_copy(struct nfs42_copy_res *res,
>>                  spin_unlock(&src_server->nfs_client->cl_lock);
>>          }
>>
>> -       status = wait_for_completion_interruptible(&copy->completion);
>> -       spin_lock(&dst_server->nfs_client->cl_lock);
>> -       list_del_init(&copy->copies);
>> -       spin_unlock(&dst_server->nfs_client->cl_lock);
>> -       if (dst_server != src_server) {
>> -               spin_lock(&src_server->nfs_client->cl_lock);
>> -               list_del_init(&copy->src_copies);
>> -               spin_unlock(&src_server->nfs_client->cl_lock);
>> -       }
>> +wait:
>> +       status = wait_for_completion_interruptible_timeout(&copy->completion,
>> +                                                          NFS42_COPY_TIMEOUT);
>> +       if (!status)
>> +               goto timeout;
>> +       nfs4_copy_dequeue_callback(dst_server, src_server, copy);
>>          if (status == -ERESTARTSYS) {
>>                  goto out_cancel;
>>          } else if (copy->flags || copy->error == NFS4ERR_PARTNER_NO_AUTH) {
>> @@ -242,6 +259,7 @@ static int handle_async_copy(struct nfs42_copy_res *res,
>>          }
>>   out:
>>          res->write_res.count = copy->count;
>> +       /* Copy out the updated write verifier provided by CB_OFFLOAD. */
>>          memcpy(&res->write_res.verifier, &copy->verf, sizeof(copy->verf));
>>          status = -copy->error;
>>
>> @@ -253,6 +271,36 @@ static int handle_async_copy(struct nfs42_copy_res *res,
>>          if (!nfs42_files_from_same_server(src, dst))
>>                  nfs42_do_offload_cancel_async(src, src_stateid);
>>          goto out_free;
>> +timeout:
>> +       status = nfs42_proc_offload_status(dst, &copy->stateid, &copied);
> 
> Regardless of what OFFLOAD_STATUS returned we have to check whether or
> not the CB_OFFLOAD had arrived while we were waiting for the reply to
> the OFFLOAD_STATUS.

I think that could be done. The only reason to add that extra complexity
is because CB_OFFLOAD carries a verifier and stable_how, and
OFFLOAD_STATUS does not.

But it seems to me this race scenario should be pretty infrequent. And
I'm not yet convinced that the COPY will misbehave if the client ignores
a racing CB_OFFLOAD request. It might be a little slower, is all.

For me, the Minimum Viable Product for this series is that COPY should
always be able to make forward progress. Are we at that point yet?
Anything else is an optimization that should be made based on whether
there is evidence that shows there is a substantial application-visible
improvement.


>> +       if (status == -EINPROGRESS)
>> +               goto wait;
>> +       nfs4_copy_dequeue_callback(dst_server, src_server, copy);
>> +       switch (status) {
>> +       case 0:
>> +               /* The server recognized the copy stateid, so it hasn't
>> +                * rebooted. Don't overwrite the verifier returned in the
>> +                * COPY result. */
>> +               res->write_res.count = copied;
>> +               goto out_free;
> 
> In case OFFLOAD_STATUS was successful and CB_OFFLOAD was received, we
> should take the verifier from the CB_OFFLOAD; otherwise we are sending
> an unneeded and expensive COMMIT. OFFLOAD_STATUS carries the
> completion status and byte count of the copy, but it does not carry the
> "how committed" value, and thus we are forced to use the async copy's
> "how committed" value.

I quibble with the description of COMMIT as "expensive".

- If the COPY was UNSTABLE, then a COMMIT will have to be done whether
the client checks for an intervening CB_OFFLOAD or not.

- If the COPY was FILE_SYNC, then the COMMIT is a no-op, and the only
expense here is an extra network round trip.

- This is not a performance path. If the client missed the CB_OFFLOAD
in the first place, then it's already waited a few seconds longer than
the server took to do the COPY.

So my feeling is that avoiding the COMMIT is an optimization. It isn't
required to make COPY or OFFLOAD_STATUS work correctly AFAICT.
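
The three cases above reduce to a small decision, sketched here as a
userspace model. The names are hypothetical (copy_needs_commit() and the
enum are not taken from the client code), and the sketch assumes that
anything short of FILE_SYNC is followed by a COMMIT.

```c
#include <stdbool.h>

/* Values mirror stable_how4 in the NFSv4 protocol. */
enum stable_how { UNSTABLE = 0, DATA_SYNC = 1, FILE_SYNC = 2 };

/*
 * Hypothetical helper. copy_reply is the "how committed" value from the
 * COPY reply; cb_offload is the value from CB_OFFLOAD and is consulted
 * only when have_cb_offload is true.
 */
bool copy_needs_commit(enum stable_how copy_reply, bool have_cb_offload,
		       enum stable_how cb_offload)
{
	enum stable_how how = have_cb_offload ? cb_offload : copy_reply;

	/* Assumption: only FILE_SYNC lets the client skip the COMMIT. */
	return how != FILE_SYNC;
}
```

Under these assumptions, ignoring a racing CB_OFFLOAD costs an extra
COMMIT only when the COPY reply said UNSTABLE and the CB_OFFLOAD said
FILE_SYNC; in every other combination the outcome is the same.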


>> +       case -EREMOTEIO:
>> +               /* COPY operation failed on the server. */
>> +               status = -EOPNOTSUPP;
>> +               res->write_res.count = copied;
>> +               goto out_free;
>> +       case -EBADF:
>> +               /* Server did not recognize the copy stateid. It has
>> +                * probably restarted and lost the plot. */
>> +               res->write_res.count = 0;
>> +               status = -EOPNOTSUPP;
>> +               break;
> 
> This is the case of receiving a BAD_STATEID from OFFLOAD_STATUS, and
> this would lead to the copy falling back to read/write operations. If we
> don't check for the existence of a CB_OFFLOAD reply, then OFFLOAD_STATUS
> can and will carry BAD_STATEID, as the server is free to delete the copy
> state after it gets the CB_OFFLOAD reply (which, as I said, we have to
> check).

I don't agree that the client /has/ to check for it. I think it's an
optimization for an infrequent race. As the meme goes: "Change my
mind". ;-)

But back to changing the logic in this arm: what is the problem with
falling back to a read/write copy in such cases? This seems to me like
an infrequent case where we should prioritize recovering reliably.


> And
> if we did, then we need to take the result of the CB_OFFLOAD and not act
> on OFFLOAD_STATUS's error.

Checking for CB_OFFLOAD after an OFFLOAD_STATUS operation seems racy to
me. There is still a gap where the client has done OFFLOAD_STATUS,
checks for the CB_OFFLOAD and doesn't see one, and then the CB_OFFLOAD
actually shows up. So this does not eliminate a race, it merely narrows
the window.
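
To make the proposed check concrete, here is a userspace model of the
decision it implies. It is a sketch under stated assumptions, not code
from the series: copy_resolve(), the enum, and the struct are
hypothetical, and cb_arrived stands for a snapshot of "has CB_OFFLOAD
been seen" taken after the OFFLOAD_STATUS reply comes back.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of combining the two information sources. */
enum copy_action { USE_CB_OFFLOAD, USE_OFFLOAD_STATUS, WAIT_AGAIN, FALL_BACK };

struct copy_decision {
	enum copy_action action;
	uint64_t count;		/* bytes to report; 0 when falling back */
};

/*
 * status is the errno-style result of the OFFLOAD_STATUS call
 * (0, -EINPROGRESS, -EBADF, ...). A CB_OFFLOAD that was seen always
 * wins, whatever OFFLOAD_STATUS returned.
 */
struct copy_decision copy_resolve(bool cb_arrived, uint64_t cb_count,
				  int status, uint64_t polled_count)
{
	struct copy_decision d = { FALL_BACK, 0 };

	if (cb_arrived) {
		d.action = USE_CB_OFFLOAD;
		d.count = cb_count;
	} else if (status == -EINPROGRESS) {
		d.action = WAIT_AGAIN;
	} else if (status == 0) {
		d.action = USE_OFFLOAD_STATUS;
		d.count = polled_count;
	}
	/* Any other error, e.g. -EBADF: fall back to read/write copy. */
	return d;
}
```

Because cb_arrived is only a snapshot, a CB_OFFLOAD that lands just after
it is taken still goes unseen, which is the remaining gap described
above.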


>> +       case -EOPNOTSUPP:
>> +               /* RFC 7862 REQUIREs server to support OFFLOAD_STATUS when
>> +                * it has signed up for an async COPY, so server is not
>> +                * spec-compliant. */
>> +               res->write_res.count = 0;
>> +       }
>> +       goto out_free;
>>   }
>>
>>   static int process_copy_commit(struct file *dst, loff_t pos_dst,
>> @@ -643,7 +691,7 @@ _nfs42_proc_offload_status(struct nfs_server *server, struct file *file,
>>    * Other negative errnos indicate the client could not complete the
>>    * request.
>>    */
>> -static int __maybe_unused
>> +static int
>>   nfs42_proc_offload_status(struct file *dst, nfs4_stateid *stateid, u64 *copied)
>>   {
>>          struct inode *inode = file_inode(dst);
>> --
>> 2.47.0
>>
>>


-- 
Chuck Lever


* [PATCH v2 7/7] NFS: Refactor trace_nfs4_offload_cancel
  2024-12-20 15:42 [PATCH v2 0/7] Client-side OFFLOAD_STATUS implementation cel
                   ` (5 preceding siblings ...)
  2024-12-20 15:42 ` [PATCH v2 6/7] NFS: Use " cel
@ 2024-12-20 15:42 ` cel
  2024-12-20 16:46 ` [PATCH v2 0/7] Client-side OFFLOAD_STATUS implementation Jeff Layton
  7 siblings, 0 replies; 11+ messages in thread
From: cel @ 2024-12-20 15:42 UTC (permalink / raw)
  To: Olga Kornievskaia, Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

Add a trace_nfs4_offload_status trace point that looks just like
trace_nfs4_offload_cancel. Promote that event to an event class to
avoid duplicating code.

An alternative approach would be to expand trace_nfs4_offload_status
to report more of the actual OFFLOAD_STATUS result.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfs/nfs42proc.c |  1 +
 fs/nfs/nfs4trace.h | 11 ++++++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index 65cfdb5c7b02..3cdd4f6f87e9 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -648,6 +648,7 @@ _nfs42_proc_offload_status(struct nfs_server *server, struct file *file,
 	status = nfs4_call_sync(server->client, server, &msg,
 				&data->args.osa_seq_args,
 				&data->res.osr_seq_res, 1);
+	trace_nfs4_offload_status(&data->args, status);
 	switch (status) {
 	case 0:
 		break;
diff --git a/fs/nfs/nfs4trace.h b/fs/nfs/nfs4trace.h
index 22c973316f0b..bc67fe6801b1 100644
--- a/fs/nfs/nfs4trace.h
+++ b/fs/nfs/nfs4trace.h
@@ -2608,7 +2608,7 @@ TRACE_EVENT(nfs4_copy_notify,
 		)
 );
 
-TRACE_EVENT(nfs4_offload_cancel,
+DECLARE_EVENT_CLASS(nfs4_offload_class,
 		TP_PROTO(
 			const struct nfs42_offload_status_args *args,
 			int error
@@ -2640,6 +2640,15 @@ TRACE_EVENT(nfs4_offload_cancel,
 			__entry->stateid_seq, __entry->stateid_hash
 		)
 );
+#define DEFINE_NFS4_OFFLOAD_EVENT(name) \
+	DEFINE_EVENT(nfs4_offload_class, name,  \
+			TP_PROTO( \
+				const struct nfs42_offload_status_args *args, \
+				int error \
+			), \
+			TP_ARGS(args, error))
+DEFINE_NFS4_OFFLOAD_EVENT(nfs4_offload_cancel);
+DEFINE_NFS4_OFFLOAD_EVENT(nfs4_offload_status);
 
 DECLARE_EVENT_CLASS(nfs4_xattr_event,
 		TP_PROTO(
-- 
2.47.0



* Re: [PATCH v2 0/7] Client-side OFFLOAD_STATUS implementation
  2024-12-20 15:42 [PATCH v2 0/7] Client-side OFFLOAD_STATUS implementation cel
                   ` (6 preceding siblings ...)
  2024-12-20 15:42 ` [PATCH v2 7/7] NFS: Refactor trace_nfs4_offload_cancel cel
@ 2024-12-20 16:46 ` Jeff Layton
  7 siblings, 0 replies; 11+ messages in thread
From: Jeff Layton @ 2024-12-20 16:46 UTC (permalink / raw)
  To: cel, Olga Kornievskaia, Trond Myklebust, Anna Schumaker
  Cc: linux-nfs, Chuck Lever

On Fri, 2024-12-20 at 10:42 -0500, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> SCSI implementation experience has shown that an interrupt-only
> COPY offload implementation is not reliable. There are too many
> common scenarios where the client can miss the completion interrupt
> (in our case, this is a CB_OFFLOAD callback).
> 
> Therefore, a polling mechanism is needed. The NFSv4.2 protocol
> provides one in the form of the new OFFLOAD_STATUS operation. Linux
> NFSD implements OFFLOAD_STATUS already. This series adds a Linux NFS
> client implementation of the OFFLOAD_STATUS operation that can query
> the state of a background COPY on the server.
> 
> These patches are also available here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/log/?h=fix-async-copy
> 
> Changes since v1:
> - nfs42_proc_offload_status() now uses a synchronous RPC
> 
> Chuck Lever (7):
>   NFS: CB_OFFLOAD can return NFS4ERR_DELAY
>   NFS: Fix typo in OFFLOAD_CANCEL comment
>   NFS: Rename struct nfs4_offloadcancel_data
>   NFS: Implement NFSv4.2's OFFLOAD_STATUS XDR
>   NFS: Implement NFSv4.2's OFFLOAD_STATUS operation
>   NFS: Use NFSv4.2's OFFLOAD_STATUS operation
>   NFS: Refactor trace_nfs4_offload_cancel
> 
>  fs/nfs/callback_proc.c    |   2 +-
>  fs/nfs/nfs42proc.c        | 188 ++++++++++++++++++++++++++++++++++----
>  fs/nfs/nfs42xdr.c         |  88 +++++++++++++++++-
>  fs/nfs/nfs4proc.c         |   3 +-
>  fs/nfs/nfs4trace.h        |  11 ++-
>  fs/nfs/nfs4xdr.c          |   1 +
>  include/linux/nfs4.h      |   1 +
>  include/linux/nfs_fs_sb.h |   1 +
>  include/linux/nfs_xdr.h   |   5 +-
>  9 files changed, 275 insertions(+), 25 deletions(-)
> 

Nice work, Chuck. This all looks good to me.

Reviewed-by: Jeff Layton <jlayton@kernel.org>

