[PATCH v5 0/7] nfsd: CB_SEQUENCE error handling fixes and cleanups

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v5 0/7] nfsd: CB_SEQUENCE error handling fixes and cleanups
@ 2025-02-07 21:53 Jeff Layton
  2025-02-07 21:53 ` [PATCH v5 1/7] nfsd: prepare nfsd4_cb_sequence_done() for error handling rework Jeff Layton
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: Jeff Layton @ 2025-02-07 21:53 UTC (permalink / raw)
  To: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel, Jeff Layton

This patch is mostly the same as the v4 series, but with the first
patch broken up into multiple changes (as Chuck suggested), and a few
changes to the comments.

Another minor difference here is that I didn't change the code to ignore
the return value of rpc_restart_call() and rpc_restart_call_prepare().
These could end up being backported to kernels that have to handle those
cases.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
Changes in v5:
- don't ignore return of rpc_restart_call() and rpc_restart_call_prepare()
- Break up the nfsd4_cb_sequence_done() error handling changes into multiple patches
- Link to v4: https://lore.kernel.org/r/20250207-nfsd-6-14-v4-0-1aa42c407265@kernel.org

Changes in v4:
- Hold back on session refcounting changes for now and just send CB_SEQUENCE
  error handling rework.
- Link to v3: https://lore.kernel.org/r/20250129-nfsd-6-14-v3-0-506e71e39e6b@kernel.org

Changes in v3:
- rename cb_session_changed to nfsd4_cb_session_changed
- rename restart_callback to requeue_callback, and rename need_restart:
  label to requeue:
- don't increment seq_nr on -ESERVERFAULT
- comment cleanups
- drop client-side rpc patch (will send separately)
- Link to v2: https://lore.kernel.org/r/20250129-nfsd-6-14-v2-0-2700c92f3e44@kernel.org

Changes in v2:
- make nfsd4_session be RCU-freed
- change code to keep reference to session over callback RPCs
- rework error handling in nfsd4_cb_sequence_done()
- move NFSv4.0 handling out of nfsd4_cb_sequence_done()
- Link to v1: https://lore.kernel.org/r/20250123-nfsd-6-14-v1-0-c1137a4fa2ae@kernel.org

---
Jeff Layton (7):
      nfsd: prepare nfsd4_cb_sequence_done() for error handling rework
      nfsd: always release slot when requeueing callback
      nfsd: only check RPC_SIGNALLED() when restarting rpc_task
      nfsd: when CB_SEQUENCE gets ESERVERFAULT don't increment seq_nr
      nfsd: handle CB_SEQUENCE NFS4ERR_BADSLOT better
      nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED error better
      nfsd: lift NFSv4.0 handling out of nfsd4_cb_sequence_done()

 fs/nfsd/nfs4callback.c | 102 +++++++++++++++++++++++++++++--------------------
 1 file changed, 61 insertions(+), 41 deletions(-)
---
base-commit: 50934b1a613cabba2b917879c3e722882b72f628
change-id: 20250123-nfsd-6-14-b0797e385dc0

Best regards,
-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v5 1/7] nfsd: prepare nfsd4_cb_sequence_done() for error handling rework
  2025-02-07 21:53 [PATCH v5 0/7] nfsd: CB_SEQUENCE error handling fixes and cleanups Jeff Layton
@ 2025-02-07 21:53 ` Jeff Layton
  2025-02-07 21:53 ` [PATCH v5 2/7] nfsd: always release slot when requeueing callback Jeff Layton
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Jeff Layton @ 2025-02-07 21:53 UTC (permalink / raw)
  To: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel, Jeff Layton

There is only one case where we want to proceed with processing the rest
of the CB_COMPOUND, and that's when the cb_seq_status is 0. Make the
default return value be false, and only set it to true in that case.

Rename the "need_restart" label to "requeue", to better indicate that
it's being requeued to the workqueue.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/nfs4callback.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index cf6d29828f4e561418b812ea2c9402929dd52bd0..79abc981e6416a88d9a81497e03e12faa3ce6d0e 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1328,11 +1328,12 @@ static void nfsd4_cb_prepare(struct rpc_task *task, void *calldata)
 	rpc_call_start(task);
 }
 
+/* Returns true if CB_COMPOUND processing should continue */
 static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback *cb)
 {
 	struct nfs4_client *clp = cb->cb_clp;
 	struct nfsd4_session *session = clp->cl_cb_session;
-	bool ret = true;
+	bool ret = false;
 
 	if (!clp->cl_minorversion) {
 		/*
@@ -1345,13 +1346,13 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
 		 * handle that case here.
 		 */
 		if (RPC_SIGNALLED(task))
-			goto need_restart;
+			goto requeue;
 
 		return true;
 	}
 
 	if (cb->cb_held_slot < 0)
-		goto need_restart;
+		goto requeue;
 
 	/* This is the operation status code for CB_SEQUENCE */
 	trace_nfsd_cb_seq_status(task, cb);
@@ -1365,11 +1366,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
 		 * (sequence ID, cached reply) MUST NOT change.
 		 */
 		++session->se_cb_seq_nr[cb->cb_held_slot];
+		ret = true;
 		break;
 	case -ESERVERFAULT:
 		++session->se_cb_seq_nr[cb->cb_held_slot];
 		nfsd4_mark_cb_fault(cb->cb_clp);
-		ret = false;
 		break;
 	case 1:
 		/*
@@ -1381,13 +1382,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
 		fallthrough;
 	case -NFS4ERR_BADSESSION:
 		nfsd4_mark_cb_fault(cb->cb_clp);
-		ret = false;
-		goto need_restart;
+		goto requeue;
 	case -NFS4ERR_DELAY:
 		cb->cb_seq_status = 1;
 		if (!rpc_restart_call(task))
 			goto out;
-
 		rpc_delay(task, 2 * HZ);
 		return false;
 	case -NFS4ERR_BADSLOT:
@@ -1405,14 +1404,13 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
 	nfsd41_cb_release_slot(cb);
 
 	if (RPC_SIGNALLED(task))
-		goto need_restart;
+		goto requeue;
 out:
 	return ret;
 retry_nowait:
-	if (rpc_restart_call_prepare(task))
-		ret = false;
+	rpc_restart_call_prepare(task);
 	goto out;
-need_restart:
+requeue:
 	if (!test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) {
 		trace_nfsd_cb_restart(clp, cb);
 		task->tk_status = 0;

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 2/7] nfsd: always release slot when requeueing callback
  2025-02-07 21:53 [PATCH v5 0/7] nfsd: CB_SEQUENCE error handling fixes and cleanups Jeff Layton
  2025-02-07 21:53 ` [PATCH v5 1/7] nfsd: prepare nfsd4_cb_sequence_done() for error handling rework Jeff Layton
@ 2025-02-07 21:53 ` Jeff Layton
  2025-02-08 16:57   ` Chuck Lever
  2025-02-07 21:53 ` [PATCH v5 3/7] nfsd: only check RPC_SIGNALLED() when restarting rpc_task Jeff Layton
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Jeff Layton @ 2025-02-07 21:53 UTC (permalink / raw)
  To: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel, Jeff Layton

If the callback is going to be requeued to the workqueue, then release
the slot. The callback client and session could change and the slot may
no longer be valid after that point.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/nfs4callback.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 79abc981e6416a88d9a81497e03e12faa3ce6d0e..bb5356e8713a8840bb714859618ff88130825efd 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1411,6 +1411,7 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
 	rpc_restart_call_prepare(task);
 	goto out;
 requeue:
+	nfsd41_cb_release_slot(cb);
 	if (!test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) {
 		trace_nfsd_cb_restart(clp, cb);
 		task->tk_status = 0;

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 3/7] nfsd: only check RPC_SIGNALLED() when restarting rpc_task
  2025-02-07 21:53 [PATCH v5 0/7] nfsd: CB_SEQUENCE error handling fixes and cleanups Jeff Layton
  2025-02-07 21:53 ` [PATCH v5 1/7] nfsd: prepare nfsd4_cb_sequence_done() for error handling rework Jeff Layton
  2025-02-07 21:53 ` [PATCH v5 2/7] nfsd: always release slot when requeueing callback Jeff Layton
@ 2025-02-07 21:53 ` Jeff Layton
  2025-02-08 16:59   ` Chuck Lever
  2025-02-07 21:53 ` [PATCH v5 4/7] nfsd: when CB_SEQUENCE gets ESERVERFAULT don't increment seq_nr Jeff Layton
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Jeff Layton @ 2025-02-07 21:53 UTC (permalink / raw)
  To: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel, Jeff Layton

nfsd4_cb_sequence_done() currently checks RPC_SIGNALLED() when
processing the compound and releasing the slot. If RPC_SIGNALLED()
returns true, then that means that the client is going to be torn down.

Don't check RPC_SIGNALLED() after processing a successful reply.
Instead, only check that before restarting the rpc_task. If that returns
true, then requeue the callback.

Also, handle rpc_restart_call() and rpc_restart_call_prepare() failures
correctly, by requeueing the callback.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/nfs4callback.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index bb5356e8713a8840bb714859618ff88130825efd..1e601075fac02bd5f01ff89d9252ab23d08c4fd9 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1385,8 +1385,8 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
 		goto requeue;
 	case -NFS4ERR_DELAY:
 		cb->cb_seq_status = 1;
-		if (!rpc_restart_call(task))
-			goto out;
+		if (RPC_SIGNALLED(task) || !rpc_restart_call(task))
+			goto requeue;
 		rpc_delay(task, 2 * HZ);
 		return false;
 	case -NFS4ERR_BADSLOT:
@@ -1402,14 +1402,16 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
 	}
 	trace_nfsd_cb_free_slot(task, cb);
 	nfsd41_cb_release_slot(cb);
-
-	if (RPC_SIGNALLED(task))
-		goto requeue;
-out:
 	return ret;
 retry_nowait:
-	rpc_restart_call_prepare(task);
-	goto out;
+	/*
+	 * RPC_SIGNALLED() means that the rpc_client is being torn down and
+	 * (possibly) recreated. Requeue the call in that case.
+	 */
+	if (!RPC_SIGNALLED(task)) {
+		if (rpc_restart_call_prepare(task))
+			return false;
+	}
 requeue:
 	nfsd41_cb_release_slot(cb);
 	if (!test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) {

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 4/7] nfsd: when CB_SEQUENCE gets ESERVERFAULT don't increment seq_nr
  2025-02-07 21:53 [PATCH v5 0/7] nfsd: CB_SEQUENCE error handling fixes and cleanups Jeff Layton
                   ` (2 preceding siblings ...)
  2025-02-07 21:53 ` [PATCH v5 3/7] nfsd: only check RPC_SIGNALLED() when restarting rpc_task Jeff Layton
@ 2025-02-07 21:53 ` Jeff Layton
  2025-02-08 17:13   ` Chuck Lever
  2025-02-07 21:53 ` [PATCH v5 5/7] nfsd: handle CB_SEQUENCE NFS4ERR_BADSLOT better Jeff Layton
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Jeff Layton @ 2025-02-07 21:53 UTC (permalink / raw)
  To: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel, Jeff Layton

ESERVERFAULT means that the server sent a successful and legitimate
reply, but the session info didn't match what was expected. Don't
increment the seq_nr in that case.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/nfs4callback.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 1e601075fac02bd5f01ff89d9252ab23d08c4fd9..d307ca1771bd1eca8f60b62a74170e71e5acf144 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1369,7 +1369,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
 		ret = true;
 		break;
 	case -ESERVERFAULT:
-		++session->se_cb_seq_nr[cb->cb_held_slot];
+		/*
+		 * Call succeeded, but CB_SEQUENCE reply failed sanity checks.
+		 * The client has gone insane. Mark the BC faulty, since there
+		 * isn't much else we can do.
+		 */
 		nfsd4_mark_cb_fault(cb->cb_clp);
 		break;
 	case 1:

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 5/7] nfsd: handle CB_SEQUENCE NFS4ERR_BADSLOT better
  2025-02-07 21:53 [PATCH v5 0/7] nfsd: CB_SEQUENCE error handling fixes and cleanups Jeff Layton
                   ` (3 preceding siblings ...)
  2025-02-07 21:53 ` [PATCH v5 4/7] nfsd: when CB_SEQUENCE gets ESERVERFAULT don't increment seq_nr Jeff Layton
@ 2025-02-07 21:53 ` Jeff Layton
  2025-02-07 21:53 ` [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED error better Jeff Layton
  2025-02-07 21:53 ` [PATCH v5 7/7] nfsd: lift NFSv4.0 handling out of nfsd4_cb_sequence_done() Jeff Layton
  6 siblings, 0 replies; 27+ messages in thread
From: Jeff Layton @ 2025-02-07 21:53 UTC (permalink / raw)
  To: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel, Jeff Layton

Currently it just restarts the call, without getting a new slot.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/nfs4callback.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index d307ca1771bd1eca8f60b62a74170e71e5acf144..10067a34db3afff8d4e4383854ab9abd9767c2d6 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1394,6 +1394,14 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
 		rpc_delay(task, 2 * HZ);
 		return false;
 	case -NFS4ERR_BADSLOT:
+		/*
+		 * BADSLOT means that the client and server are out of sync
+		 * on the BC parameters. In this case, we want to mark the
+		 * backchannel faulty and then try it again, but leak the
+		 * slot so no one uses it.
+		 */
+		nfsd4_mark_cb_fault(cb->cb_clp);
+		cb->cb_held_slot = -1;
 		goto retry_nowait;
 	case -NFS4ERR_SEQ_MISORDERED:
 		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED error better
  2025-02-07 21:53 [PATCH v5 0/7] nfsd: CB_SEQUENCE error handling fixes and cleanups Jeff Layton
                   ` (4 preceding siblings ...)
  2025-02-07 21:53 ` [PATCH v5 5/7] nfsd: handle CB_SEQUENCE NFS4ERR_BADSLOT better Jeff Layton
@ 2025-02-07 21:53 ` Jeff Layton
  2025-02-08 17:01   ` Chuck Lever
  2025-02-07 21:53 ` [PATCH v5 7/7] nfsd: lift NFSv4.0 handling out of nfsd4_cb_sequence_done() Jeff Layton
  6 siblings, 1 reply; 27+ messages in thread
From: Jeff Layton @ 2025-02-07 21:53 UTC (permalink / raw)
  To: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel, Jeff Layton

For NFS4ERR_SEQ_MISORDERED, do one attempt with a seqid of 1, and then
fall back to treating it like a BADSLOT if that fails.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/nfs4callback.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 10067a34db3afff8d4e4383854ab9abd9767c2d6..d6e3e8bb2efabadda9f922318880e12e1cb2c23f 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1393,6 +1393,16 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
 			goto requeue;
 		rpc_delay(task, 2 * HZ);
 		return false;
+	case -NFS4ERR_SEQ_MISORDERED:
+		/*
+		 * Reattempt once with seq_nr 1. If that fails, treat this
+		 * like BADSLOT.
+		 */
+		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
+			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
+			goto retry_nowait;
+		}
+		fallthrough;
 	case -NFS4ERR_BADSLOT:
 		/*
 		 * BADSLOT means that the client and server are out of sync
@@ -1403,12 +1413,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
 		nfsd4_mark_cb_fault(cb->cb_clp);
 		cb->cb_held_slot = -1;
 		goto retry_nowait;
-	case -NFS4ERR_SEQ_MISORDERED:
-		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
-			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
-			goto retry_nowait;
-		}
-		break;
 	default:
 		nfsd4_mark_cb_fault(cb->cb_clp);
 	}

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 7/7] nfsd: lift NFSv4.0 handling out of nfsd4_cb_sequence_done()
  2025-02-07 21:53 [PATCH v5 0/7] nfsd: CB_SEQUENCE error handling fixes and cleanups Jeff Layton
                   ` (5 preceding siblings ...)
  2025-02-07 21:53 ` [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED error better Jeff Layton
@ 2025-02-07 21:53 ` Jeff Layton
  2025-02-08 17:05   ` Chuck Lever
  6 siblings, 1 reply; 27+ messages in thread
From: Jeff Layton @ 2025-02-07 21:53 UTC (permalink / raw)
  To: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel, Jeff Layton

It's a bit strange to call nfsd4_cb_sequence_done() on a callback with no
CB_SEQUENCE. Lift the handling of restarting a call into a new helper,
and move the handling of NFSv4.0 into nfsd4_cb_done().

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/nfs4callback.c | 53 ++++++++++++++++++++++++++------------------------
 1 file changed, 28 insertions(+), 25 deletions(-)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index d6e3e8bb2efabadda9f922318880e12e1cb2c23f..a4427e2f6182415755b646dba1a1ef4acddc0709 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1328,28 +1328,23 @@ static void nfsd4_cb_prepare(struct rpc_task *task, void *calldata)
 	rpc_call_start(task);
 }
 
-/* Returns true if CB_COMPOUND processing should continue */
-static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback *cb)
+static void requeue_callback(struct rpc_task *task, struct nfsd4_callback *cb)
 {
 	struct nfs4_client *clp = cb->cb_clp;
-	struct nfsd4_session *session = clp->cl_cb_session;
-	bool ret = false;
-
-	if (!clp->cl_minorversion) {
-		/*
-		 * If the backchannel connection was shut down while this
-		 * task was queued, we need to resubmit it after setting up
-		 * a new backchannel connection.
-		 *
-		 * Note that if we lost our callback connection permanently
-		 * the submission code will error out, so we don't need to
-		 * handle that case here.
-		 */
-		if (RPC_SIGNALLED(task))
-			goto requeue;
 
-		return true;
+	nfsd41_cb_release_slot(cb);
+	if (!test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) {
+		trace_nfsd_cb_restart(clp, cb);
+		task->tk_status = 0;
+		cb->cb_need_restart = true;
 	}
+}
+
+/* Returns true if CB_COMPOUND processing should continue */
+static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback *cb)
+{
+	struct nfsd4_session *session = cb->cb_clp->cl_cb_session;
+	bool ret = false;
 
 	if (cb->cb_held_slot < 0)
 		goto requeue;
@@ -1429,12 +1424,7 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
 			return false;
 	}
 requeue:
-	nfsd41_cb_release_slot(cb);
-	if (!test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) {
-		trace_nfsd_cb_restart(clp, cb);
-		task->tk_status = 0;
-		cb->cb_need_restart = true;
-	}
+	requeue_callback(task, cb);
 	return false;
 }
 
@@ -1445,8 +1435,21 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
 
 	trace_nfsd_cb_rpc_done(clp);
 
-	if (!nfsd4_cb_sequence_done(task, cb))
+	if (!clp->cl_minorversion) {
+		/*
+		 * If the backchannel connection was shut down while this
+		 * task was queued, we need to resubmit it after setting up
+		 * a new backchannel connection.
+		 *
+		 * Note that if we lost our callback connection permanently
+		 * the submission code will error out, so we don't need to
+		 * handle that case here.
+		 */
+		if (RPC_SIGNALLED(task))
+			requeue_callback(task, cb);
+	} else if (!nfsd4_cb_sequence_done(task, cb)) {
 		return;
+	}
 
 	if (cb->cb_status) {
 		WARN_ONCE(task->tk_status,

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 2/7] nfsd: always release slot when requeueing callback
  2025-02-07 21:53 ` [PATCH v5 2/7] nfsd: always release slot when requeueing callback Jeff Layton
@ 2025-02-08 16:57   ` Chuck Lever
  2025-02-08 17:55     ` Jeff Layton
  0 siblings, 1 reply; 27+ messages in thread
From: Chuck Lever @ 2025-02-08 16:57 UTC (permalink / raw)
  To: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel

On 2/7/25 4:53 PM, Jeff Layton wrote:
> If the callback is going to be requeued to the workqueue, then release
> the slot. The callback client and session could change and the slot may
> no longer be valid after that point.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/nfsd/nfs4callback.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 79abc981e6416a88d9a81497e03e12faa3ce6d0e..bb5356e8713a8840bb714859618ff88130825efd 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -1411,6 +1411,7 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>  	rpc_restart_call_prepare(task);
>  	goto out;
>  requeue:
> +	nfsd41_cb_release_slot(cb);
>  	if (!test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) {
>  		trace_nfsd_cb_restart(clp, cb);
>  		task->tk_status = 0;
> 

The NFSv4.0 case also goes to the requeue label. Is
nfsd41_cb_release_slot() a no-op for NFSv4.0? Even if it is, this is a
little confusing. Later, in 7/7, the nfsd41_cb_release_slot() call is
carried into the helper that is called by both NFSv4.0 and NFSv4.1, and
that doesn't seem necessary.

Perhaps this change should be done /after/ the NFSv4.0 error flow is
hoisted into nfsd4_cb_done().

-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 3/7] nfsd: only check RPC_SIGNALLED() when restarting rpc_task
  2025-02-07 21:53 ` [PATCH v5 3/7] nfsd: only check RPC_SIGNALLED() when restarting rpc_task Jeff Layton
@ 2025-02-08 16:59   ` Chuck Lever
  0 siblings, 0 replies; 27+ messages in thread
From: Chuck Lever @ 2025-02-08 16:59 UTC (permalink / raw)
  To: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel

On 2/7/25 4:53 PM, Jeff Layton wrote:
> nfsd4_cb_sequence_done() currently checks RPC_SIGNALLED() when
> processing the compound and releasing the slot. If RPC_SIGNALLED()
> returns true, then that means that the client is going to be torn down.
> 
> Don't check RPC_SIGNALLED() after processing a successful reply.
> Instead, only check that before restarting the rpc_task. If that returns
> true, then requeue the callback.

This might seem like a nit, but this paragraph didn't make sense to me
at all until I changed s/only check that before/check that only before/.


> Also, handle rpc_restart_call() and rpc_restart_call_prepare() failures
> correctly, by requeueing the callback.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/nfsd/nfs4callback.c | 18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index bb5356e8713a8840bb714859618ff88130825efd..1e601075fac02bd5f01ff89d9252ab23d08c4fd9 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -1385,8 +1385,8 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>  		goto requeue;
>  	case -NFS4ERR_DELAY:
>  		cb->cb_seq_status = 1;
> -		if (!rpc_restart_call(task))
> -			goto out;
> +		if (RPC_SIGNALLED(task) || !rpc_restart_call(task))
> +			goto requeue;
>  		rpc_delay(task, 2 * HZ);
>  		return false;
>  	case -NFS4ERR_BADSLOT:
> @@ -1402,14 +1402,16 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>  	}
>  	trace_nfsd_cb_free_slot(task, cb);
>  	nfsd41_cb_release_slot(cb);
> -
> -	if (RPC_SIGNALLED(task))
> -		goto requeue;
> -out:
>  	return ret;
>  retry_nowait:
> -	rpc_restart_call_prepare(task);
> -	goto out;
> +	/*
> +	 * RPC_SIGNALLED() means that the rpc_client is being torn down and
> +	 * (possibly) recreated. Requeue the call in that case.
> +	 */
> +	if (!RPC_SIGNALLED(task)) {
> +		if (rpc_restart_call_prepare(task))
> +			return false;
> +	}
>  requeue:
>  	nfsd41_cb_release_slot(cb);
>  	if (!test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) {
> 


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED error better
  2025-02-07 21:53 ` [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED error better Jeff Layton
@ 2025-02-08 17:01   ` Chuck Lever
  2025-02-08 18:02     ` Jeff Layton
  0 siblings, 1 reply; 27+ messages in thread
From: Chuck Lever @ 2025-02-08 17:01 UTC (permalink / raw)
  To: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel

On 2/7/25 4:53 PM, Jeff Layton wrote:
> For NFS4ERR_SEQ_MISORDERED, do one attempt with a seqid of 1, and then
> fall back to treating it like a BADSLOT if that fails.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/nfsd/nfs4callback.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 10067a34db3afff8d4e4383854ab9abd9767c2d6..d6e3e8bb2efabadda9f922318880e12e1cb2c23f 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -1393,6 +1393,16 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>  			goto requeue;
>  		rpc_delay(task, 2 * HZ);
>  		return false;
> +	case -NFS4ERR_SEQ_MISORDERED:
> +		/*
> +		 * Reattempt once with seq_nr 1. If that fails, treat this
> +		 * like BADSLOT.
> +		 */

Nit: this comment says exactly what the code says. If it were me, I'd
remove it. Is there a "why" statement that could be made here? Like,
why retry with a seq_nr of 1 instead of just failing immediately?


> +		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
> +			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
> +			goto retry_nowait;
> +		}
> +		fallthrough;
>  	case -NFS4ERR_BADSLOT:
>  		/*
>  		 * BADSLOT means that the client and server are out of sync
> @@ -1403,12 +1413,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>  		nfsd4_mark_cb_fault(cb->cb_clp);
>  		cb->cb_held_slot = -1;
>  		goto retry_nowait;
> -	case -NFS4ERR_SEQ_MISORDERED:
> -		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
> -			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
> -			goto retry_nowait;
> -		}
> -		break;
>  	default:
>  		nfsd4_mark_cb_fault(cb->cb_clp);
>  	}
> 


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 7/7] nfsd: lift NFSv4.0 handling out of nfsd4_cb_sequence_done()
  2025-02-07 21:53 ` [PATCH v5 7/7] nfsd: lift NFSv4.0 handling out of nfsd4_cb_sequence_done() Jeff Layton
@ 2025-02-08 17:05   ` Chuck Lever
  0 siblings, 0 replies; 27+ messages in thread
From: Chuck Lever @ 2025-02-08 17:05 UTC (permalink / raw)
  To: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel

On 2/7/25 4:53 PM, Jeff Layton wrote:
> It's a bit strange to call nfsd4_cb_sequence_done() on a callback with no
> CB_SEQUENCE. Lift the handling of restarting a call into a new helper,
> and move the handling of NFSv4.0 into nfsd4_cb_done().
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/nfsd/nfs4callback.c | 53 ++++++++++++++++++++++++++------------------------
>  1 file changed, 28 insertions(+), 25 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index d6e3e8bb2efabadda9f922318880e12e1cb2c23f..a4427e2f6182415755b646dba1a1ef4acddc0709 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -1328,28 +1328,23 @@ static void nfsd4_cb_prepare(struct rpc_task *task, void *calldata)
>  	rpc_call_start(task);
>  }
>  
> -/* Returns true if CB_COMPOUND processing should continue */
> -static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback *cb)
> +static void requeue_callback(struct rpc_task *task, struct nfsd4_callback *cb)

Nit: requeue_callback => nfsd4_requeue_cb().

Once the nfsd41_cb_release_slot() call is gone from this helper, it
can easily be placed just after nfsd4_queue_cb(), which seems like
more legible code organization.

Just small nits this time. v6 ought to be ready for me to pull.



>  {
>  	struct nfs4_client *clp = cb->cb_clp;
> -	struct nfsd4_session *session = clp->cl_cb_session;
> -	bool ret = false;
> -
> -	if (!clp->cl_minorversion) {
> -		/*
> -		 * If the backchannel connection was shut down while this
> -		 * task was queued, we need to resubmit it after setting up
> -		 * a new backchannel connection.
> -		 *
> -		 * Note that if we lost our callback connection permanently
> -		 * the submission code will error out, so we don't need to
> -		 * handle that case here.
> -		 */
> -		if (RPC_SIGNALLED(task))
> -			goto requeue;
>  
> -		return true;
> +	nfsd41_cb_release_slot(cb);
> +	if (!test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) {
> +		trace_nfsd_cb_restart(clp, cb);
> +		task->tk_status = 0;
> +		cb->cb_need_restart = true;
>  	}
> +}
> +
> +/* Returns true if CB_COMPOUND processing should continue */
> +static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback *cb)
> +{
> +	struct nfsd4_session *session = cb->cb_clp->cl_cb_session;
> +	bool ret = false;
>  
>  	if (cb->cb_held_slot < 0)
>  		goto requeue;
> @@ -1429,12 +1424,7 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>  			return false;
>  	}
>  requeue:
> -	nfsd41_cb_release_slot(cb);
> -	if (!test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) {
> -		trace_nfsd_cb_restart(clp, cb);
> -		task->tk_status = 0;
> -		cb->cb_need_restart = true;
> -	}
> +	requeue_callback(task, cb);
>  	return false;
>  }
>  
> @@ -1445,8 +1435,21 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
>  
>  	trace_nfsd_cb_rpc_done(clp);
>  
> -	if (!nfsd4_cb_sequence_done(task, cb))
> +	if (!clp->cl_minorversion) {
> +		/*
> +		 * If the backchannel connection was shut down while this
> +		 * task was queued, we need to resubmit it after setting up
> +		 * a new backchannel connection.
> +		 *
> +		 * Note that if we lost our callback connection permanently
> +		 * the submission code will error out, so we don't need to
> +		 * handle that case here.
> +		 */
> +		if (RPC_SIGNALLED(task))
> +			requeue_callback(task, cb);
> +	} else if (!nfsd4_cb_sequence_done(task, cb)) {
>  		return;
> +	}
>  
>  	if (cb->cb_status) {
>  		WARN_ONCE(task->tk_status,
> 


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 4/7] nfsd: when CB_SEQUENCE gets ESERVERFAULT don't increment seq_nr
  2025-02-07 21:53 ` [PATCH v5 4/7] nfsd: when CB_SEQUENCE gets ESERVERFAULT don't increment seq_nr Jeff Layton
@ 2025-02-08 17:13   ` Chuck Lever
  0 siblings, 0 replies; 27+ messages in thread
From: Chuck Lever @ 2025-02-08 17:13 UTC (permalink / raw)
  To: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel

On 2/7/25 4:53 PM, Jeff Layton wrote:
> ESERVERFAULT means that the server sent a successful and legitimate
> reply, but the session info didn't match what was expected. Don't
> increment the seq_nr in that case.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/nfsd/nfs4callback.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 1e601075fac02bd5f01ff89d9252ab23d08c4fd9..d307ca1771bd1eca8f60b62a74170e71e5acf144 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -1369,7 +1369,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>  		ret = true;
>  		break;
>  	case -ESERVERFAULT:
> -		++session->se_cb_seq_nr[cb->cb_held_slot];
> +		/*
> +		 * Call succeeded, but CB_SEQUENCE reply failed sanity checks.
> +		 * The client has gone insane. Mark the BC faulty, since there
> +		 * isn't much else we can do.
> +		 */

I'm chewing on the "gone insane" remark. Since I'm tagging a few nits in
other spots: perhaps this comment could explain in more specific detail
why the server cannot trust this CB_SEQUENCE response. Maybe:

 /*
  * Call succeeded, but the session, slot index, or slot
  * sequence number in the response do not match the same
  * in the server's call. The sequence information is thus
  * untrustworthy.
  */

That's wordy, but you get the idea. Even better if we can find some
relevant spec language.


>  		nfsd4_mark_cb_fault(cb->cb_clp);
>  		break;
>  	case 1:
> 


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 2/7] nfsd: always release slot when requeueing callback
  2025-02-08 16:57   ` Chuck Lever
@ 2025-02-08 17:55     ` Jeff Layton
  0 siblings, 0 replies; 27+ messages in thread
From: Jeff Layton @ 2025-02-08 17:55 UTC (permalink / raw)
  To: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel

On Sat, 2025-02-08 at 11:57 -0500, Chuck Lever wrote:
> On 2/7/25 4:53 PM, Jeff Layton wrote:
> > If the callback is going to be requeued to the workqueue, then release
> > the slot. The callback client and session could change and the slot may
> > no longer be valid after that point.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/nfsd/nfs4callback.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > index 79abc981e6416a88d9a81497e03e12faa3ce6d0e..bb5356e8713a8840bb714859618ff88130825efd 100644
> > --- a/fs/nfsd/nfs4callback.c
> > +++ b/fs/nfsd/nfs4callback.c
> > @@ -1411,6 +1411,7 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> >  	rpc_restart_call_prepare(task);
> >  	goto out;
> >  requeue:
> > +	nfsd41_cb_release_slot(cb);
> >  	if (!test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) {
> >  		trace_nfsd_cb_restart(clp, cb);
> >  		task->tk_status = 0;
> > 
> 
> The NFSv4.0 case also goes to the requeue label. Is
> nfsd41_cb_release_slot() a no-op for NFSv4.0?
> 

Yes.

> Even if it is, this is a
> little confusing. Later, in 7/7, the nfsd41_cb_release_slot() call is
> carried into the helper that is called by both NFSv4.0 and NFSv4.1, and
> that doesn't seem necessary.
> 
> Perhaps this change should be done /after/ the NFSv4.0 error flow is
> hoisted into nfsd4_cb_done().
> 

Fair point. I'll move the v4.0 rework ahead of this patch and only call
nfsd41_cb_release_slot for v4.1+. I'll also fix the changelog of #3
like you suggested.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED error better
  2025-02-08 17:01   ` Chuck Lever
@ 2025-02-08 18:02     ` Jeff Layton
  2025-02-08 18:40       ` Tom Talpey
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff Layton @ 2025-02-08 18:02 UTC (permalink / raw)
  To: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel

On Sat, 2025-02-08 at 12:01 -0500, Chuck Lever wrote:
> On 2/7/25 4:53 PM, Jeff Layton wrote:
> > For NFS4ERR_SEQ_MISORDERED, do one attempt with a seqid of 1, and then
> > fall back to treating it like a BADSLOT if that fails.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/nfsd/nfs4callback.c | 16 ++++++++++------
> >  1 file changed, 10 insertions(+), 6 deletions(-)
> > 
> > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > index 10067a34db3afff8d4e4383854ab9abd9767c2d6..d6e3e8bb2efabadda9f922318880e12e1cb2c23f 100644
> > --- a/fs/nfsd/nfs4callback.c
> > +++ b/fs/nfsd/nfs4callback.c
> > @@ -1393,6 +1393,16 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> >  			goto requeue;
> >  		rpc_delay(task, 2 * HZ);
> >  		return false;
> > +	case -NFS4ERR_SEQ_MISORDERED:
> > +		/*
> > +		 * Reattempt once with seq_nr 1. If that fails, treat this
> > +		 * like BADSLOT.
> > +		 */
> 
> Nit: this comment says exactly what the code says. If it were me, I'd
> remove it. Is there a "why" statement that could be made here? Like,
> why retry with a seq_nr of 1 instead of just failing immediately?
> 

There isn't one that I know of. It looks like Kinglong Mee added it in
7ba6cad6c88f, but there is no real mention of that in the changelog.

TBH, I'm not enamored with this remedy either. What if the seq_nr was 2
when we got this error, and we then retry with a seq_nr of 1? Does the
server then treat that as a retransmission? We might be best off
dropping this and just always treating it like BADSLOT.

Thoughts?

> 
> > +		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
> > +			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
> > +			goto retry_nowait;
> > +		}
> > +		fallthrough;
> >  	case -NFS4ERR_BADSLOT:
> >  		/*
> >  		 * BADSLOT means that the client and server are out of sync
> > @@ -1403,12 +1413,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> >  		nfsd4_mark_cb_fault(cb->cb_clp);
> >  		cb->cb_held_slot = -1;
> >  		goto retry_nowait;
> > -	case -NFS4ERR_SEQ_MISORDERED:
> > -		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
> > -			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
> > -			goto retry_nowait;
> > -		}
> > -		break;
> >  	default:
> >  		nfsd4_mark_cb_fault(cb->cb_clp);
> >  	}
> > 
> 
> 

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED error better
  2025-02-08 18:02     ` Jeff Layton
@ 2025-02-08 18:40       ` Tom Talpey
  2025-02-08 19:08         ` Jeff Layton
  0 siblings, 1 reply; 27+ messages in thread
From: Tom Talpey @ 2025-02-08 18:40 UTC (permalink / raw)
  To: Jeff Layton, Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel

On 2/8/2025 10:02 AM, Jeff Layton wrote:
> On Sat, 2025-02-08 at 12:01 -0500, Chuck Lever wrote:
>> On 2/7/25 4:53 PM, Jeff Layton wrote:
>>> For NFS4ERR_SEQ_MISORDERED, do one attempt with a seqid of 1, and then
>>> fall back to treating it like a BADSLOT if that fails.
>>>
>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>> ---
>>>   fs/nfsd/nfs4callback.c | 16 ++++++++++------
>>>   1 file changed, 10 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>> index 10067a34db3afff8d4e4383854ab9abd9767c2d6..d6e3e8bb2efabadda9f922318880e12e1cb2c23f 100644
>>> --- a/fs/nfsd/nfs4callback.c
>>> +++ b/fs/nfsd/nfs4callback.c
>>> @@ -1393,6 +1393,16 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>>   			goto requeue;
>>>   		rpc_delay(task, 2 * HZ);
>>>   		return false;
>>> +	case -NFS4ERR_SEQ_MISORDERED:
>>> +		/*
>>> +		 * Reattempt once with seq_nr 1. If that fails, treat this
>>> +		 * like BADSLOT.
>>> +		 */
>>
>> Nit: this comment says exactly what the code says. If it were me, I'd
>> remove it. Is there a "why" statement that could be made here? Like,
>> why retry with a seq_nr of 1 instead of just failing immediately?
>>
> 
> There isn't one that I know of. It looks like Kinglong Mee added it in
> 7ba6cad6c88f, but there is no real mention of that in the changelog.
> 
> TBH, I'm not enamored with this remedy either. What if the seq_nr was 2
> when we got this error, and we then retry with a seq_nr of 1? Does the
> server then treat that as a retransmission? 

So I assume you mean the requester sent seq_nr 1, saw a reply and sent a
subsequent seq_nr 2, to which it gets SEQ_MISORDERED.

If so, yes definitely backing up the seq_nr to 1 will result in the
peer considering it to be a retransmission, which will be bad.

> We might be best off
> dropping this and just always treating it like BADSLOT.

But, why would this happen? Usually I'd think the peer sent seq_nr X
before it received a reply to seq_nr X-1, which would be a peer bug.

OTOH, SEQ_MISORDERED is a valid response to an in-progress retry. So,
how does the requester know the difference?

If treating it as BADSLOT completely resets the sequence, then sure,
but either a) the request is still in-progress, or b) if a bug is
causing the situation, well it's not going to converge on a functional
session.

Not sure I have a solid suggestion right now. Whatever the fix, it
should capture any subtlety in a comment.

Tom.


> 
> Thoughts?
> 
>>
>>> +		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
>>> +			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
>>> +			goto retry_nowait;
>>> +		}
>>> +		fallthrough;
>>>   	case -NFS4ERR_BADSLOT:
>>>   		/*
>>>   		 * BADSLOT means that the client and server are out of sync
>>> @@ -1403,12 +1413,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>>   		nfsd4_mark_cb_fault(cb->cb_clp);
>>>   		cb->cb_held_slot = -1;
>>>   		goto retry_nowait;
>>> -	case -NFS4ERR_SEQ_MISORDERED:
>>> -		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
>>> -			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
>>> -			goto retry_nowait;
>>> -		}
>>> -		break;
>>>   	default:
>>>   		nfsd4_mark_cb_fault(cb->cb_clp);
>>>   	}
>>>
>>
>>
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED error better
  2025-02-08 18:40       ` Tom Talpey
@ 2025-02-08 19:08         ` Jeff Layton
  2025-02-08 19:18           ` Tom Talpey
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff Layton @ 2025-02-08 19:08 UTC (permalink / raw)
  To: Tom Talpey, Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel

On Sat, 2025-02-08 at 13:40 -0500, Tom Talpey wrote:
> On 2/8/2025 10:02 AM, Jeff Layton wrote:
> > On Sat, 2025-02-08 at 12:01 -0500, Chuck Lever wrote:
> > > On 2/7/25 4:53 PM, Jeff Layton wrote:
> > > > For NFS4ERR_SEQ_MISORDERED, do one attempt with a seqid of 1, and then
> > > > fall back to treating it like a BADSLOT if that fails.
> > > > 
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > >   fs/nfsd/nfs4callback.c | 16 ++++++++++------
> > > >   1 file changed, 10 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > index 10067a34db3afff8d4e4383854ab9abd9767c2d6..d6e3e8bb2efabadda9f922318880e12e1cb2c23f 100644
> > > > --- a/fs/nfsd/nfs4callback.c
> > > > +++ b/fs/nfsd/nfs4callback.c
> > > > @@ -1393,6 +1393,16 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> > > >   			goto requeue;
> > > >   		rpc_delay(task, 2 * HZ);
> > > >   		return false;
> > > > +	case -NFS4ERR_SEQ_MISORDERED:
> > > > +		/*
> > > > +		 * Reattempt once with seq_nr 1. If that fails, treat this
> > > > +		 * like BADSLOT.
> > > > +		 */
> > > 
> > > Nit: this comment says exactly what the code says. If it were me, I'd
> > > remove it. Is there a "why" statement that could be made here? Like,
> > > why retry with a seq_nr of 1 instead of just failing immediately?
> > > 
> > 
> > There isn't one that I know of. It looks like Kinglong Mee added it in
> > 7ba6cad6c88f, but there is no real mention of that in the changelog.
> > 
> > TBH, I'm not enamored with this remedy either. What if the seq_nr was 2
> > when we got this error, and we then retry with a seq_nr of 1? Does the
> > server then treat that as a retransmission? 
> 
> So I assume you mean the requester sent seq_nr 1, saw a reply and sent a
> subsequent seq_nr 2, to which it gets SEQ_MISORDERED.
> 
> If so, yes definitely backing up the seq_nr to 1 will result in the
> peer considering it to be a retransmission, which will be bad.
> 

Yes, that's what I meant.

> > We might be best off
> > dropping this and just always treating it like BADSLOT.
> 
> But, why would this happen? Usually I'd think the peer sent seq_nr X
> before it received a reply to seq_nr X-1, which would be a peer bug.
> 
> OTOH, SEQ_MISORDERED is a valid response to an in-progress retry. So,
> how does the requester know the difference?
> 
> If treating it as BADSLOT completely resets the sequence, then sure,
> but either a) the request is still in-progress, or b) if a bug is
> causing the situation, well it's not going to converge on a functional
> session.
> 

With this patchset, on BADSLOT, we'll set SEQ4_STATUS_BACKCHANNEL_FAULT
in the next forechannel SEQUENCE on the session. That should cause the
client to (eventually) send a DESTROY_SESSION and create a new one. 

Unfortunately, in the meantime, because of the way the callback channel
update works, the server can end up trying to send the callback again
on the same session (and maybe more than once). I'm not sure that
that's a real problem per-se, but it's less than ideal.

> Not sure I have a solid suggestion right now. Whatever the fix, it
> should capture any subtlety in a comment.
> 

At this point, I'm leaning toward just treating it like BADSLOT.
Basically, mark the backchannel faulty, and leak the slot so that
nothing else uses it. That allows us to send backchannel requests on
the other slots until the session gets recreated.

> > 
> > Thoughts?
> > 
> > > 
> > > > +		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
> > > > +			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
> > > > +			goto retry_nowait;
> > > > +		}
> > > > +		fallthrough;
> > > >   	case -NFS4ERR_BADSLOT:
> > > >   		/*
> > > >   		 * BADSLOT means that the client and server are out of sync
> > > > @@ -1403,12 +1413,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> > > >   		nfsd4_mark_cb_fault(cb->cb_clp);
> > > >   		cb->cb_held_slot = -1;
> > > >   		goto retry_nowait;
> > > > -	case -NFS4ERR_SEQ_MISORDERED:
> > > > -		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
> > > > -			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
> > > > -			goto retry_nowait;
> > > > -		}
> > > > -		break;
> > > >   	default:
> > > >   		nfsd4_mark_cb_fault(cb->cb_clp);
> > > >   	}
> > > > 
> > > 
> > > 
> > 
> 

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED error better
  2025-02-08 19:08         ` Jeff Layton
@ 2025-02-08 19:18           ` Tom Talpey
  2025-02-08 20:45             ` Jeff Layton
  0 siblings, 1 reply; 27+ messages in thread
From: Tom Talpey @ 2025-02-08 19:18 UTC (permalink / raw)
  To: Jeff Layton, Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel

On 2/8/2025 11:08 AM, Jeff Layton wrote:
> On Sat, 2025-02-08 at 13:40 -0500, Tom Talpey wrote:
>> On 2/8/2025 10:02 AM, Jeff Layton wrote:
>>> On Sat, 2025-02-08 at 12:01 -0500, Chuck Lever wrote:
>>>> On 2/7/25 4:53 PM, Jeff Layton wrote:
>>>>> For NFS4ERR_SEQ_MISORDERED, do one attempt with a seqid of 1, and then
>>>>> fall back to treating it like a BADSLOT if that fails.
>>>>>
>>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>>>> ---
>>>>>    fs/nfsd/nfs4callback.c | 16 ++++++++++------
>>>>>    1 file changed, 10 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>>>> index 10067a34db3afff8d4e4383854ab9abd9767c2d6..d6e3e8bb2efabadda9f922318880e12e1cb2c23f 100644
>>>>> --- a/fs/nfsd/nfs4callback.c
>>>>> +++ b/fs/nfsd/nfs4callback.c
>>>>> @@ -1393,6 +1393,16 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>>>>    			goto requeue;
>>>>>    		rpc_delay(task, 2 * HZ);
>>>>>    		return false;
>>>>> +	case -NFS4ERR_SEQ_MISORDERED:
>>>>> +		/*
>>>>> +		 * Reattempt once with seq_nr 1. If that fails, treat this
>>>>> +		 * like BADSLOT.
>>>>> +		 */
>>>>
>>>> Nit: this comment says exactly what the code says. If it were me, I'd
>>>> remove it. Is there a "why" statement that could be made here? Like,
>>>> why retry with a seq_nr of 1 instead of just failing immediately?
>>>>
>>>
>>> There isn't one that I know of. It looks like Kinglong Mee added it in
>>> 7ba6cad6c88f, but there is no real mention of that in the changelog.
>>>
>>> TBH, I'm not enamored with this remedy either. What if the seq_nr was 2
>>> when we got this error, and we then retry with a seq_nr of 1? Does the
>>> server then treat that as a retransmission?
>>
>> So I assume you mean the requester sent seq_nr 1, saw a reply and sent a
>> subsequent seq_nr 2, to which it gets SEQ_MISORDERED.
>>
>> If so, yes definitely backing up the seq_nr to 1 will result in the
>> peer considering it to be a retransmission, which will be bad.
>>
> 
> Yes, that's what I meant.
> 
>>> We might be best off
>>> dropping this and just always treating it like BADSLOT.
>>
>> But, why would this happen? Usually I'd think the peer sent seq_nr X
>> before it received a reply to seq_nr X-1, which would be a peer bug.
>>
>> OTOH, SEQ_MISORDERED is a valid response to an in-progress retry. So,
>> how does the requester know the difference?
>>
>> If treating it as BADSLOT completely resets the sequence, then sure,
>> but either a) the request is still in-progress, or b) if a bug is
>> causing the situation, well it's not going to converge on a functional
>> session.
>>
> 
> With this patchset, on BADSLOT, we'll set SEQ4_STATUS_BACKCHANNEL_FAULT
> in the next forechannel SEQUENCE on the session. That should cause the
> client to (eventually) send a DESTROY_SESSION and create a new one.
> 
> Unfortunately, in the meantime, because of the way the callback channel
> update works, the server can end up trying to send the callback again
> on the same session (and maybe more than once). I'm not sure that
> that's a real problem per-se, but it's less than ideal.
> 
>> Not sure I have a solid suggestion right now. Whatever the fix, it
>> should capture any subtlety in a comment.
>>
> 
> At this point, I'm leaning toward just treating it like BADSLOT.
> Basically, mark the backchannel faulty, and leak the slot so that
> nothing else uses it. That allows us to send backchannel requests on
> the other slots until the session gets recreated.

Hmm, leaking the slot is a workable approach, as long as it doesn't
cascade more than a time or two. Some sort of trigger should be armed
to prevent runaway retries.

It's maybe worth considering what state the peer might be in when this
happens. It too may effectively leak a slot, and if is retaining some
bogus state either as a result of or because of the previous exchange(s)
then this may lead to future hangs/failures. Not pretty, and maybe not
worth trying to guess.

Tom.

> 
>>>
>>> Thoughts?
>>>
>>>>
>>>>> +		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
>>>>> +			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
>>>>> +			goto retry_nowait;
>>>>> +		}
>>>>> +		fallthrough;
>>>>>    	case -NFS4ERR_BADSLOT:
>>>>>    		/*
>>>>>    		 * BADSLOT means that the client and server are out of sync
>>>>> @@ -1403,12 +1413,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>>>>    		nfsd4_mark_cb_fault(cb->cb_clp);
>>>>>    		cb->cb_held_slot = -1;
>>>>>    		goto retry_nowait;
>>>>> -	case -NFS4ERR_SEQ_MISORDERED:
>>>>> -		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
>>>>> -			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
>>>>> -			goto retry_nowait;
>>>>> -		}
>>>>> -		break;
>>>>>    	default:
>>>>>    		nfsd4_mark_cb_fault(cb->cb_clp);
>>>>>    	}
>>>>>
>>>>
>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED error better
  2025-02-08 19:18           ` Tom Talpey
@ 2025-02-08 20:45             ` Jeff Layton
  2025-02-08 21:07               ` Chuck Lever
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff Layton @ 2025-02-08 20:45 UTC (permalink / raw)
  To: Tom Talpey, Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel

On Sat, 2025-02-08 at 14:18 -0500, Tom Talpey wrote:
> On 2/8/2025 11:08 AM, Jeff Layton wrote:
> > On Sat, 2025-02-08 at 13:40 -0500, Tom Talpey wrote:
> > > On 2/8/2025 10:02 AM, Jeff Layton wrote:
> > > > On Sat, 2025-02-08 at 12:01 -0500, Chuck Lever wrote:
> > > > > On 2/7/25 4:53 PM, Jeff Layton wrote:
> > > > > > For NFS4ERR_SEQ_MISORDERED, do one attempt with a seqid of 1, and then
> > > > > > fall back to treating it like a BADSLOT if that fails.
> > > > > > 
> > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > > ---
> > > > > >    fs/nfsd/nfs4callback.c | 16 ++++++++++------
> > > > > >    1 file changed, 10 insertions(+), 6 deletions(-)
> > > > > > 
> > > > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > > > index 10067a34db3afff8d4e4383854ab9abd9767c2d6..d6e3e8bb2efabadda9f922318880e12e1cb2c23f 100644
> > > > > > --- a/fs/nfsd/nfs4callback.c
> > > > > > +++ b/fs/nfsd/nfs4callback.c
> > > > > > @@ -1393,6 +1393,16 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> > > > > >    			goto requeue;
> > > > > >    		rpc_delay(task, 2 * HZ);
> > > > > >    		return false;
> > > > > > +	case -NFS4ERR_SEQ_MISORDERED:
> > > > > > +		/*
> > > > > > +		 * Reattempt once with seq_nr 1. If that fails, treat this
> > > > > > +		 * like BADSLOT.
> > > > > > +		 */
> > > > > 
> > > > > Nit: this comment says exactly what the code says. If it were me, I'd
> > > > > remove it. Is there a "why" statement that could be made here? Like,
> > > > > why retry with a seq_nr of 1 instead of just failing immediately?
> > > > > 
> > > > 
> > > > There isn't one that I know of. It looks like Kinglong Mee added it in
> > > > 7ba6cad6c88f, but there is no real mention of that in the changelog.
> > > > 
> > > > TBH, I'm not enamored with this remedy either. What if the seq_nr was 2
> > > > when we got this error, and we then retry with a seq_nr of 1? Does the
> > > > server then treat that as a retransmission?
> > > 
> > > So I assume you mean the requester sent seq_nr 1, saw a reply and sent a
> > > subsequent seq_nr 2, to which it gets SEQ_MISORDERED.
> > > 
> > > If so, yes definitely backing up the seq_nr to 1 will result in the
> > > peer considering it to be a retransmission, which will be bad.
> > > 
> > 
> > Yes, that's what I meant.
> > 
> > > > We might be best off
> > > > dropping this and just always treating it like BADSLOT.
> > > 
> > > But, why would this happen? Usually I'd think the peer sent seq_nr X
> > > before it received a reply to seq_nr X-1, which would be a peer bug.
> > > 
> > > OTOH, SEQ_MISORDERED is a valid response to an in-progress retry. So,
> > > how does the requester know the difference?
> > > 
> > > If treating it as BADSLOT completely resets the sequence, then sure,
> > > but either a) the request is still in-progress, or b) if a bug is
> > > causing the situation, well it's not going to converge on a functional
> > > session.
> > > 
> > 
> > With this patchset, on BADSLOT, we'll set SEQ4_STATUS_BACKCHANNEL_FAULT
> > in the next forechannel SEQUENCE on the session. That should cause the
> > client to (eventually) send a DESTROY_SESSION and create a new one.
> > 
> > Unfortunately, in the meantime, because of the way the callback channel
> > update works, the server can end up trying to send the callback again
> > on the same session (and maybe more than once). I'm not sure that
> > that's a real problem per-se, but it's less than ideal.
> > 
> > > Not sure I have a solid suggestion right now. Whatever the fix, it
> > > should capture any subtlety in a comment.
> > > 
> > 
> > At this point, I'm leaning toward just treating it like BADSLOT.
> > Basically, mark the backchannel faulty, and leak the slot so that
> > nothing else uses it. That allows us to send backchannel requests on
> > the other slots until the session gets recreated.
> 
> Hmm, leaking the slot is a workable approach, as long as it doesn't
> cascade more than a time or two. Some sort of trigger should be armed
> to prevent runaway retries.
> 
> It's maybe worth considering what state the peer might be in when this
> happens. It too may effectively leak a slot, and if is retaining some
> bogus state either as a result of or because of the previous exchange(s)
> then this may lead to future hangs/failures. Not pretty, and maybe not
> worth trying to guess.
> 
> Tom.
> 


The idea here is that eventually the client should figure out that
something is wrong and reestablish the session. Currently we don't
limit the number of retries on a callback.

Maybe they should time out after a while? If we've retried a callback
for more than two lease periods, give up and log something?

Either way, I'd consider that to be follow-on work to this set.


> > 
> > > > 
> > > > Thoughts?
> > > > 
> > > > > 
> > > > > > +		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
> > > > > > +			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
> > > > > > +			goto retry_nowait;
> > > > > > +		}
> > > > > > +		fallthrough;
> > > > > >    	case -NFS4ERR_BADSLOT:
> > > > > >    		/*
> > > > > >    		 * BADSLOT means that the client and server are out of sync
> > > > > > @@ -1403,12 +1413,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> > > > > >    		nfsd4_mark_cb_fault(cb->cb_clp);
> > > > > >    		cb->cb_held_slot = -1;
> > > > > >    		goto retry_nowait;
> > > > > > -	case -NFS4ERR_SEQ_MISORDERED:
> > > > > > -		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
> > > > > > -			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
> > > > > > -			goto retry_nowait;
> > > > > > -		}
> > > > > > -		break;
> > > > > >    	default:
> > > > > >    		nfsd4_mark_cb_fault(cb->cb_clp);
> > > > > >    	}
> > > > > > 
> > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED error better
  2025-02-08 20:45             ` Jeff Layton
@ 2025-02-08 21:07               ` Chuck Lever
  2025-02-09  1:24                 ` Tom Talpey
  0 siblings, 1 reply; 27+ messages in thread
From: Chuck Lever @ 2025-02-08 21:07 UTC (permalink / raw)
  To: Jeff Layton, Tom Talpey, Neil Brown, Olga Kornievskaia, Dai Ngo,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel

On 2/8/25 3:45 PM, Jeff Layton wrote:
> On Sat, 2025-02-08 at 14:18 -0500, Tom Talpey wrote:
>> On 2/8/2025 11:08 AM, Jeff Layton wrote:
>>> On Sat, 2025-02-08 at 13:40 -0500, Tom Talpey wrote:
>>>> On 2/8/2025 10:02 AM, Jeff Layton wrote:
>>>>> On Sat, 2025-02-08 at 12:01 -0500, Chuck Lever wrote:
>>>>>> On 2/7/25 4:53 PM, Jeff Layton wrote:
>>>>>>> For NFS4ERR_SEQ_MISORDERED, do one attempt with a seqid of 1, and then
>>>>>>> fall back to treating it like a BADSLOT if that fails.
>>>>>>>
>>>>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>>>>>> ---
>>>>>>>    fs/nfsd/nfs4callback.c | 16 ++++++++++------
>>>>>>>    1 file changed, 10 insertions(+), 6 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>>>>>> index 10067a34db3afff8d4e4383854ab9abd9767c2d6..d6e3e8bb2efabadda9f922318880e12e1cb2c23f 100644
>>>>>>> --- a/fs/nfsd/nfs4callback.c
>>>>>>> +++ b/fs/nfsd/nfs4callback.c
>>>>>>> @@ -1393,6 +1393,16 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>>>>>>    			goto requeue;
>>>>>>>    		rpc_delay(task, 2 * HZ);
>>>>>>>    		return false;
>>>>>>> +	case -NFS4ERR_SEQ_MISORDERED:
>>>>>>> +		/*
>>>>>>> +		 * Reattempt once with seq_nr 1. If that fails, treat this
>>>>>>> +		 * like BADSLOT.
>>>>>>> +		 */
>>>>>>
>>>>>> Nit: this comment says exactly what the code says. If it were me, I'd
>>>>>> remove it. Is there a "why" statement that could be made here? Like,
>>>>>> why retry with a seq_nr of 1 instead of just failing immediately?
>>>>>>
>>>>>
>>>>> There isn't one that I know of. It looks like Kinglong Mee added it in
>>>>> 7ba6cad6c88f, but there is no real mention of that in the changelog.
>>>>>
>>>>> TBH, I'm not enamored with this remedy either. What if the seq_nr was 2
>>>>> when we got this error, and we then retry with a seq_nr of 1? Does the
>>>>> server then treat that as a retransmission?
>>>>
>>>> So I assume you mean the requester sent seq_nr 1, saw a reply and sent a
>>>> subsequent seq_nr 2, to which it gets SEQ_MISORDERED.
>>>>
>>>> If so, yes definitely backing up the seq_nr to 1 will result in the
>>>> peer considering it to be a retransmission, which will be bad.
>>>>
>>>
>>> Yes, that's what I meant.
>>>
>>>>> We might be best off
>>>>> dropping this and just always treating it like BADSLOT.
>>>>
>>>> But, why would this happen? Usually I'd think the peer sent seq_nr X
>>>> before it received a reply to seq_nr X-1, which would be a peer bug.
>>>>
>>>> OTOH, SEQ_MISORDERED is a valid response to an in-progress retry. So,
>>>> how does the requester know the difference?
>>>>
>>>> If treating it as BADSLOT completely resets the sequence, then sure,
>>>> but either a) the request is still in-progress, or b) if a bug is
>>>> causing the situation, well it's not going to converge on a functional
>>>> session.
>>>>
>>>
>>> With this patchset, on BADSLOT, we'll set SEQ4_STATUS_BACKCHANNEL_FAULT
>>> in the next forechannel SEQUENCE on the session. That should cause the
>>> client to (eventually) send a DESTROY_SESSION and create a new one.
>>>
>>> Unfortunately, in the meantime, because of the way the callback channel
>>> update works, the server can end up trying to send the callback again
>>> on the same session (and maybe more than once). I'm not sure that
>>> that's a real problem per-se, but it's less than ideal.
>>>
>>>> Not sure I have a solid suggestion right now. Whatever the fix, it
>>>> should capture any subtlety in a comment.
>>>>
>>>
>>> At this point, I'm leaning toward just treating it like BADSLOT.
>>> Basically, mark the backchannel faulty, and leak the slot so that
>>> nothing else uses it. That allows us to send backchannel requests on
>>> the other slots until the session gets recreated.
>>
>> Hmm, leaking the slot is a workable approach, as long as it doesn't
>> cascade more than a time or two. Some sort of trigger should be armed
>> to prevent runaway retries.
>>
>> It's maybe worth considering what state the peer might be in when this
>> happens. It too may effectively leak a slot, and if is retaining some
>> bogus state either as a result of or because of the previous exchange(s)
>> then this may lead to future hangs/failures. Not pretty, and maybe not
>> worth trying to guess.
>>
>> Tom.
>>
> 
> 
> The idea here is that eventually the client should figure out that
> something is wrong and reestablish the session. Currently we don't
> limit the number of retries on a callback.
> 
> Maybe they should time out after a while? If we've retried a callback
> for more than two lease periods, give up and log something?
> 
> Either way, I'd consider that to be follow-on work to this set.

As a general comment, I think making a heroic effort to recover in any
of these cases is probably not worth the additional complexity. Where it
is required or where we believe it is worth the trouble, that's where we
want a detailed comment.

What we want to do is ensure forward progress. I'm guessing that error
conditions are going to be rare, so leaking the slot until a certain
portion of them are gone, and then indicating a session fault to force
the client to start over from scratch, is probably the most
straightforward approach.

So, is there a good reason to retry? There doesn't appear to be any
reasoning mentioned in the commit log or in nearby comments.


>>>>> Thoughts?
>>>>>
>>>>>>
>>>>>>> +		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
>>>>>>> +			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
>>>>>>> +			goto retry_nowait;
>>>>>>> +		}
>>>>>>> +		fallthrough;
>>>>>>>    	case -NFS4ERR_BADSLOT:
>>>>>>>    		/*
>>>>>>>    		 * BADSLOT means that the client and server are out of sync
>>>>>>> @@ -1403,12 +1413,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>>>>>>    		nfsd4_mark_cb_fault(cb->cb_clp);
>>>>>>>    		cb->cb_held_slot = -1;
>>>>>>>    		goto retry_nowait;
>>>>>>> -	case -NFS4ERR_SEQ_MISORDERED:
>>>>>>> -		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
>>>>>>> -			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
>>>>>>> -			goto retry_nowait;
>>>>>>> -		}
>>>>>>> -		break;
>>>>>>>    	default:
>>>>>>>    		nfsd4_mark_cb_fault(cb->cb_clp);
>>>>>>>    	}
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> 


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED error better
  2025-02-08 21:07               ` Chuck Lever
@ 2025-02-09  1:24                 ` Tom Talpey
  2025-02-09  2:14                   ` Jeff Layton
  0 siblings, 1 reply; 27+ messages in thread
From: Tom Talpey @ 2025-02-09  1:24 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel

On 2/8/2025 4:07 PM, Chuck Lever wrote:
> On 2/8/25 3:45 PM, Jeff Layton wrote:
>> On Sat, 2025-02-08 at 14:18 -0500, Tom Talpey wrote:
>>> On 2/8/2025 11:08 AM, Jeff Layton wrote:
>>>> On Sat, 2025-02-08 at 13:40 -0500, Tom Talpey wrote:
>>>>> On 2/8/2025 10:02 AM, Jeff Layton wrote:
>>>>>> On Sat, 2025-02-08 at 12:01 -0500, Chuck Lever wrote:
>>>>>>> On 2/7/25 4:53 PM, Jeff Layton wrote:
>>>>>>>> For NFS4ERR_SEQ_MISORDERED, do one attempt with a seqid of 1, and then
>>>>>>>> fall back to treating it like a BADSLOT if that fails.
>>>>>>>>
>>>>>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>>>>>>> ---
>>>>>>>>     fs/nfsd/nfs4callback.c | 16 ++++++++++------
>>>>>>>>     1 file changed, 10 insertions(+), 6 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>>>>>>> index 10067a34db3afff8d4e4383854ab9abd9767c2d6..d6e3e8bb2efabadda9f922318880e12e1cb2c23f 100644
>>>>>>>> --- a/fs/nfsd/nfs4callback.c
>>>>>>>> +++ b/fs/nfsd/nfs4callback.c
>>>>>>>> @@ -1393,6 +1393,16 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>>>>>>>     			goto requeue;
>>>>>>>>     		rpc_delay(task, 2 * HZ);
>>>>>>>>     		return false;
>>>>>>>> +	case -NFS4ERR_SEQ_MISORDERED:
>>>>>>>> +		/*
>>>>>>>> +		 * Reattempt once with seq_nr 1. If that fails, treat this
>>>>>>>> +		 * like BADSLOT.
>>>>>>>> +		 */
>>>>>>>
>>>>>>> Nit: this comment says exactly what the code says. If it were me, I'd
>>>>>>> remove it. Is there a "why" statement that could be made here? Like,
>>>>>>> why retry with a seq_nr of 1 instead of just failing immediately?
>>>>>>>
>>>>>>
>>>>>> There isn't one that I know of. It looks like Kinglong Mee added it in
>>>>>> 7ba6cad6c88f, but there is no real mention of that in the changelog.
>>>>>>
>>>>>> TBH, I'm not enamored with this remedy either. What if the seq_nr was 2
>>>>>> when we got this error, and we then retry with a seq_nr of 1? Does the
>>>>>> server then treat that as a retransmission?
>>>>>
>>>>> So I assume you mean the requester sent seq_nr 1, saw a reply and sent a
>>>>> subsequent seq_nr 2, to which it gets SEQ_MISORDERED.
>>>>>
>>>>> If so, yes definitely backing up the seq_nr to 1 will result in the
>>>>> peer considering it to be a retransmission, which will be bad.
>>>>>
>>>>
>>>> Yes, that's what I meant.
>>>>
>>>>>> We might be best off
>>>>>> dropping this and just always treating it like BADSLOT.
>>>>>
>>>>> But, why would this happen? Usually I'd think the peer sent seq_nr X
>>>>> before it received a reply to seq_nr X-1, which would be a peer bug.
>>>>>
>>>>> OTOH, SEQ_MISORDERED is a valid response to an in-progress retry. So,
>>>>> how does the requester know the difference?
>>>>>
>>>>> If treating it as BADSLOT completely resets the sequence, then sure,
>>>>> but either a) the request is still in-progress, or b) if a bug is
>>>>> causing the situation, well it's not going to converge on a functional
>>>>> session.
>>>>>
>>>>
>>>> With this patchset, on BADSLOT, we'll set SEQ4_STATUS_BACKCHANNEL_FAULT
>>>> in the next forechannel SEQUENCE on the session. That should cause the
>>>> client to (eventually) send a DESTROY_SESSION and create a new one.
>>>>
>>>> Unfortunately, in the meantime, because of the way the callback channel
>>>> update works, the server can end up trying to send the callback again
>>>> on the same session (and maybe more than once). I'm not sure that
>>>> that's a real problem per-se, but it's less than ideal.
>>>>
>>>>> Not sure I have a solid suggestion right now. Whatever the fix, it
>>>>> should capture any subtlety in a comment.
>>>>>
>>>>
>>>> At this point, I'm leaning toward just treating it like BADSLOT.
>>>> Basically, mark the backchannel faulty, and leak the slot so that
>>>> nothing else uses it. That allows us to send backchannel requests on
>>>> the other slots until the session gets recreated.
>>>
>>> Hmm, leaking the slot is a workable approach, as long as it doesn't
>>> cascade more than a time or two. Some sort of trigger should be armed
>>> to prevent runaway retries.
>>>
>>> It's maybe worth considering what state the peer might be in when this
>>> happens. It too may effectively leak a slot, and if is retaining some
>>> bogus state either as a result of or because of the previous exchange(s)
>>> then this may lead to future hangs/failures. Not pretty, and maybe not
>>> worth trying to guess.
>>>
>>> Tom.
>>>
>>
>>
>> The idea here is that eventually the client should figure out that
>> something is wrong and reestablish the session. Currently we don't
>> limit the number of retries on a callback.
>>
>> Maybe they should time out after a while? If we've retried a callback
>> for more than two lease periods, give up and log something?
>>
>> Either way, I'd consider that to be follow-on work to this set.
> 
> As a general comment, I think making a heroic effort to recover in any
> of these cases is probably not worth the additional complexity. Where it
> is required or where we believe it is worth the trouble, that's where we
> want a detailed comment.
> 
> What we want to do is ensure forward progress. I'm guessing that error
> conditions are going to be rare, so leaking the slot until a certain
> portion of them are gone, and then indicating a session fault to force
> the client to start over from scratch, is probably the most
> straightforward approach.
> 
> So, is there a good reason to retry? There doesn't appear to be any
> reasoning mentioned in the commit log or in nearby comments.

Agreed on the general comment.

As for the "any reason to retry" - maybe. If it's a transient error we
don't want to give up early. Unfortunately that appears to be an
ambiguous situation, because SEQ_MISORDERED is allowed in place of
ERR_DELAY. I don't have any great suggestion however.

Jeff, to your point that the "client should figure out something is
wrong", I'm not sure how you think that will happen. If the server is
making a delegation recall and the client receive code chooses to reject 
it at the sequence check, how would that eventually cause the client to
reestablish the session (on the forechannel)?

Tom.

> 
> 
>>>>>> Thoughts?
>>>>>>
>>>>>>>
>>>>>>>> +		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
>>>>>>>> +			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
>>>>>>>> +			goto retry_nowait;
>>>>>>>> +		}
>>>>>>>> +		fallthrough;
>>>>>>>>     	case -NFS4ERR_BADSLOT:
>>>>>>>>     		/*
>>>>>>>>     		 * BADSLOT means that the client and server are out of sync
>>>>>>>> @@ -1403,12 +1413,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>>>>>>>     		nfsd4_mark_cb_fault(cb->cb_clp);
>>>>>>>>     		cb->cb_held_slot = -1;
>>>>>>>>     		goto retry_nowait;
>>>>>>>> -	case -NFS4ERR_SEQ_MISORDERED:
>>>>>>>> -		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
>>>>>>>> -			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
>>>>>>>> -			goto retry_nowait;
>>>>>>>> -		}
>>>>>>>> -		break;
>>>>>>>>     	default:
>>>>>>>>     		nfsd4_mark_cb_fault(cb->cb_clp);
>>>>>>>>     	}
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> 
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED error better
  2025-02-09  1:24                 ` Tom Talpey
@ 2025-02-09  2:14                   ` Jeff Layton
  2025-02-09 16:26                     ` Tom Talpey
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff Layton @ 2025-02-09  2:14 UTC (permalink / raw)
  To: Tom Talpey, Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel

On Sat, 2025-02-08 at 20:24 -0500, Tom Talpey wrote:
> On 2/8/2025 4:07 PM, Chuck Lever wrote:
> > On 2/8/25 3:45 PM, Jeff Layton wrote:
> > > On Sat, 2025-02-08 at 14:18 -0500, Tom Talpey wrote:
> > > > On 2/8/2025 11:08 AM, Jeff Layton wrote:
> > > > > On Sat, 2025-02-08 at 13:40 -0500, Tom Talpey wrote:
> > > > > > On 2/8/2025 10:02 AM, Jeff Layton wrote:
> > > > > > > On Sat, 2025-02-08 at 12:01 -0500, Chuck Lever wrote:
> > > > > > > > On 2/7/25 4:53 PM, Jeff Layton wrote:
> > > > > > > > > For NFS4ERR_SEQ_MISORDERED, do one attempt with a seqid of 1, and then
> > > > > > > > > fall back to treating it like a BADSLOT if that fails.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > > > > > ---
> > > > > > > > >     fs/nfsd/nfs4callback.c | 16 ++++++++++------
> > > > > > > > >     1 file changed, 10 insertions(+), 6 deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > > > > > > index 10067a34db3afff8d4e4383854ab9abd9767c2d6..d6e3e8bb2efabadda9f922318880e12e1cb2c23f 100644
> > > > > > > > > --- a/fs/nfsd/nfs4callback.c
> > > > > > > > > +++ b/fs/nfsd/nfs4callback.c
> > > > > > > > > @@ -1393,6 +1393,16 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> > > > > > > > >     			goto requeue;
> > > > > > > > >     		rpc_delay(task, 2 * HZ);
> > > > > > > > >     		return false;
> > > > > > > > > +	case -NFS4ERR_SEQ_MISORDERED:
> > > > > > > > > +		/*
> > > > > > > > > +		 * Reattempt once with seq_nr 1. If that fails, treat this
> > > > > > > > > +		 * like BADSLOT.
> > > > > > > > > +		 */
> > > > > > > > 
> > > > > > > > Nit: this comment says exactly what the code says. If it were me, I'd
> > > > > > > > remove it. Is there a "why" statement that could be made here? Like,
> > > > > > > > why retry with a seq_nr of 1 instead of just failing immediately?
> > > > > > > > 
> > > > > > > 
> > > > > > > There isn't one that I know of. It looks like Kinglong Mee added it in
> > > > > > > 7ba6cad6c88f, but there is no real mention of that in the changelog.
> > > > > > > 
> > > > > > > TBH, I'm not enamored with this remedy either. What if the seq_nr was 2
> > > > > > > when we got this error, and we then retry with a seq_nr of 1? Does the
> > > > > > > server then treat that as a retransmission?
> > > > > > 
> > > > > > So I assume you mean the requester sent seq_nr 1, saw a reply and sent a
> > > > > > subsequent seq_nr 2, to which it gets SEQ_MISORDERED.
> > > > > > 
> > > > > > If so, yes definitely backing up the seq_nr to 1 will result in the
> > > > > > peer considering it to be a retransmission, which will be bad.
> > > > > > 
> > > > > 
> > > > > Yes, that's what I meant.
> > > > > 
> > > > > > > We might be best off
> > > > > > > dropping this and just always treating it like BADSLOT.
> > > > > > 
> > > > > > But, why would this happen? Usually I'd think the peer sent seq_nr X
> > > > > > before it received a reply to seq_nr X-1, which would be a peer bug.
> > > > > > 
> > > > > > OTOH, SEQ_MISORDERED is a valid response to an in-progress retry. So,
> > > > > > how does the requester know the difference?
> > > > > > 
> > > > > > If treating it as BADSLOT completely resets the sequence, then sure,
> > > > > > but either a) the request is still in-progress, or b) if a bug is
> > > > > > causing the situation, well it's not going to converge on a functional
> > > > > > session.
> > > > > > 
> > > > > 
> > > > > With this patchset, on BADSLOT, we'll set SEQ4_STATUS_BACKCHANNEL_FAULT
> > > > > in the next forechannel SEQUENCE on the session. That should cause the
> > > > > client to (eventually) send a DESTROY_SESSION and create a new one.
> > > > > 
> > > > > Unfortunately, in the meantime, because of the way the callback channel
> > > > > update works, the server can end up trying to send the callback again
> > > > > on the same session (and maybe more than once). I'm not sure that
> > > > > that's a real problem per-se, but it's less than ideal.
> > > > > 
> > > > > > Not sure I have a solid suggestion right now. Whatever the fix, it
> > > > > > should capture any subtlety in a comment.
> > > > > > 
> > > > > 
> > > > > At this point, I'm leaning toward just treating it like BADSLOT.
> > > > > Basically, mark the backchannel faulty, and leak the slot so that
> > > > > nothing else uses it. That allows us to send backchannel requests on
> > > > > the other slots until the session gets recreated.
> > > > 
> > > > Hmm, leaking the slot is a workable approach, as long as it doesn't
> > > > cascade more than a time or two. Some sort of trigger should be armed
> > > > to prevent runaway retries.
> > > > 
> > > > It's maybe worth considering what state the peer might be in when this
> > > > happens. It too may effectively leak a slot, and if is retaining some
> > > > bogus state either as a result of or because of the previous exchange(s)
> > > > then this may lead to future hangs/failures. Not pretty, and maybe not
> > > > worth trying to guess.
> > > > 
> > > > Tom.
> > > > 
> > > 
> > > 
> > > The idea here is that eventually the client should figure out that
> > > something is wrong and reestablish the session. Currently we don't
> > > limit the number of retries on a callback.
> > > 
> > > Maybe they should time out after a while? If we've retried a callback
> > > for more than two lease periods, give up and log something?
> > > 
> > > Either way, I'd consider that to be follow-on work to this set.
> > 
> > As a general comment, I think making a heroic effort to recover in any
> > of these cases is probably not worth the additional complexity. Where it
> > is required or where we believe it is worth the trouble, that's where we
> > want a detailed comment.
> > 
> > What we want to do is ensure forward progress. I'm guessing that error
> > conditions are going to be rare, so leaking the slot until a certain
> > portion of them are gone, and then indicating a session fault to force
> > the client to start over from scratch, is probably the most
> > straightforward approach.
> > 
> > So, is there a good reason to retry? There doesn't appear to be any
> > reasoning mentioned in the commit log or in nearby comments.
> 
> Agreed on the general comment.
> 
> As for the "any reason to retry" - maybe. If it's a transient error we
> don't want to give up early. Unfortunately that appears to be an
> ambiguous situation, because SEQ_MISORDERED is allowed in place of
> ERR_DELAY. I don't have any great suggestion however.
> 

IMO, we should retry callbacks (basically) indefinitely, unless the
NFSv4 client is being torn down (i.e. lease expires or an unmount
happened, etc).

> Jeff, to your point that the "client should figure out something is
> wrong", I'm not sure how you think that will happen. If the server is
> making a delegation recall and the client receive code chooses to reject 
> it at the sequence check, how would that eventually cause the client to
> reestablish the session (on the forechannel)?
> 
> 

In the BADSLOT case, it calls nfsd4_mark_cb_fault(cb->cb_clp), which
sets a flag in the client that makes it set
SEQ4_STATUS_BACKCHANNEL_FAULT in the next SEQUENCE call.

The client should take that as an indication that there is a problem
and reestablish a new session (and maybe a new connection). Granted, it
might take up to the next lease renewal, but there's not much else we
can do if the client won't talk to us.

That's why I was suggesting that we might time out the backchannel
calls after two lease periods. OTOH, maybe it's sufficient to not queue
any callbacks for courtesy clients?

> > 
> > 
> > > > > > > Thoughts?
> > > > > > > 
> > > > > > > > 
> > > > > > > > > +		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
> > > > > > > > > +			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
> > > > > > > > > +			goto retry_nowait;
> > > > > > > > > +		}
> > > > > > > > > +		fallthrough;
> > > > > > > > >     	case -NFS4ERR_BADSLOT:
> > > > > > > > >     		/*
> > > > > > > > >     		 * BADSLOT means that the client and server are out of sync
> > > > > > > > > @@ -1403,12 +1413,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> > > > > > > > >     		nfsd4_mark_cb_fault(cb->cb_clp);
> > > > > > > > >     		cb->cb_held_slot = -1;
> > > > > > > > >     		goto retry_nowait;
> > > > > > > > > -	case -NFS4ERR_SEQ_MISORDERED:
> > > > > > > > > -		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
> > > > > > > > > -			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
> > > > > > > > > -			goto retry_nowait;
> > > > > > > > > -		}
> > > > > > > > > -		break;
> > > > > > > > >     	default:
> > > > > > > > >     		nfsd4_mark_cb_fault(cb->cb_clp);
> > > > > > > > >     	}
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> > 
> 

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED error better
  2025-02-09  2:14                   ` Jeff Layton
@ 2025-02-09 16:26                     ` Tom Talpey
  2025-02-09 16:51                       ` Jeff Layton
  0 siblings, 1 reply; 27+ messages in thread
From: Tom Talpey @ 2025-02-09 16:26 UTC (permalink / raw)
  To: Jeff Layton, Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel

On 2/8/2025 9:14 PM, Jeff Layton wrote:
> On Sat, 2025-02-08 at 20:24 -0500, Tom Talpey wrote:
>> On 2/8/2025 4:07 PM, Chuck Lever wrote:
>>> On 2/8/25 3:45 PM, Jeff Layton wrote:
>>>> On Sat, 2025-02-08 at 14:18 -0500, Tom Talpey wrote:
>>>>> On 2/8/2025 11:08 AM, Jeff Layton wrote:
>>>>>> On Sat, 2025-02-08 at 13:40 -0500, Tom Talpey wrote:
>>>>>>> On 2/8/2025 10:02 AM, Jeff Layton wrote:
>>>>>>>> On Sat, 2025-02-08 at 12:01 -0500, Chuck Lever wrote:
>>>>>>>>> On 2/7/25 4:53 PM, Jeff Layton wrote:
>>>>>>>>>> For NFS4ERR_SEQ_MISORDERED, do one attempt with a seqid of 1, and then
>>>>>>>>>> fall back to treating it like a BADSLOT if that fails.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>>>>>>>>> ---
>>>>>>>>>>      fs/nfsd/nfs4callback.c | 16 ++++++++++------
>>>>>>>>>>      1 file changed, 10 insertions(+), 6 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>>>>>>>>> index 10067a34db3afff8d4e4383854ab9abd9767c2d6..d6e3e8bb2efabadda9f922318880e12e1cb2c23f 100644
>>>>>>>>>> --- a/fs/nfsd/nfs4callback.c
>>>>>>>>>> +++ b/fs/nfsd/nfs4callback.c
>>>>>>>>>> @@ -1393,6 +1393,16 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>>>>>>>>>      			goto requeue;
>>>>>>>>>>      		rpc_delay(task, 2 * HZ);
>>>>>>>>>>      		return false;
>>>>>>>>>> +	case -NFS4ERR_SEQ_MISORDERED:
>>>>>>>>>> +		/*
>>>>>>>>>> +		 * Reattempt once with seq_nr 1. If that fails, treat this
>>>>>>>>>> +		 * like BADSLOT.
>>>>>>>>>> +		 */
>>>>>>>>>
>>>>>>>>> Nit: this comment says exactly what the code says. If it were me, I'd
>>>>>>>>> remove it. Is there a "why" statement that could be made here? Like,
>>>>>>>>> why retry with a seq_nr of 1 instead of just failing immediately?
>>>>>>>>>
>>>>>>>>
>>>>>>>> There isn't one that I know of. It looks like Kinglong Mee added it in
>>>>>>>> 7ba6cad6c88f, but there is no real mention of that in the changelog.
>>>>>>>>
>>>>>>>> TBH, I'm not enamored with this remedy either. What if the seq_nr was 2
>>>>>>>> when we got this error, and we then retry with a seq_nr of 1? Does the
>>>>>>>> server then treat that as a retransmission?
>>>>>>>
>>>>>>> So I assume you mean the requester sent seq_nr 1, saw a reply and sent a
>>>>>>> subsequent seq_nr 2, to which it gets SEQ_MISORDERED.
>>>>>>>
>>>>>>> If so, yes definitely backing up the seq_nr to 1 will result in the
>>>>>>> peer considering it to be a retransmission, which will be bad.
>>>>>>>
>>>>>>
>>>>>> Yes, that's what I meant.
>>>>>>
>>>>>>>> We might be best off
>>>>>>>> dropping this and just always treating it like BADSLOT.
>>>>>>>
>>>>>>> But, why would this happen? Usually I'd think the peer sent seq_nr X
>>>>>>> before it received a reply to seq_nr X-1, which would be a peer bug.
>>>>>>>
>>>>>>> OTOH, SEQ_MISORDERED is a valid response to an in-progress retry. So,
>>>>>>> how does the requester know the difference?
>>>>>>>
>>>>>>> If treating it as BADSLOT completely resets the sequence, then sure,
>>>>>>> but either a) the request is still in-progress, or b) if a bug is
>>>>>>> causing the situation, well it's not going to converge on a functional
>>>>>>> session.
>>>>>>>
>>>>>>
>>>>>> With this patchset, on BADSLOT, we'll set SEQ4_STATUS_BACKCHANNEL_FAULT
>>>>>> in the next forechannel SEQUENCE on the session. That should cause the
>>>>>> client to (eventually) send a DESTROY_SESSION and create a new one.
>>>>>>
>>>>>> Unfortunately, in the meantime, because of the way the callback channel
>>>>>> update works, the server can end up trying to send the callback again
>>>>>> on the same session (and maybe more than once). I'm not sure that
>>>>>> that's a real problem per-se, but it's less than ideal.
>>>>>>
>>>>>>> Not sure I have a solid suggestion right now. Whatever the fix, it
>>>>>>> should capture any subtlety in a comment.
>>>>>>>
>>>>>>
>>>>>> At this point, I'm leaning toward just treating it like BADSLOT.
>>>>>> Basically, mark the backchannel faulty, and leak the slot so that
>>>>>> nothing else uses it. That allows us to send backchannel requests on
>>>>>> the other slots until the session gets recreated.
>>>>>
>>>>> Hmm, leaking the slot is a workable approach, as long as it doesn't
>>>>> cascade more than a time or two. Some sort of trigger should be armed
>>>>> to prevent runaway retries.
>>>>>
>>>>> It's maybe worth considering what state the peer might be in when this
>>>>> happens. It too may effectively leak a slot, and if is retaining some
>>>>> bogus state either as a result of or because of the previous exchange(s)
>>>>> then this may lead to future hangs/failures. Not pretty, and maybe not
>>>>> worth trying to guess.
>>>>>
>>>>> Tom.
>>>>>
>>>>
>>>>
>>>> The idea here is that eventually the client should figure out that
>>>> something is wrong and reestablish the session. Currently we don't
>>>> limit the number of retries on a callback.
>>>>
>>>> Maybe they should time out after a while? If we've retried a callback
>>>> for more than two lease periods, give up and log something?
>>>>
>>>> Either way, I'd consider that to be follow-on work to this set.
>>>
>>> As a general comment, I think making a heroic effort to recover in any
>>> of these cases is probably not worth the additional complexity. Where it
>>> is required or where we believe it is worth the trouble, that's where we
>>> want a detailed comment.
>>>
>>> What we want to do is ensure forward progress. I'm guessing that error
>>> conditions are going to be rare, so leaking the slot until a certain
>>> portion of them are gone, and then indicating a session fault to force
>>> the client to start over from scratch, is probably the most
>>> straightforward approach.
>>>
>>> So, is there a good reason to retry? There doesn't appear to be any
>>> reasoning mentioned in the commit log or in nearby comments.
>>
>> Agreed on the general comment.
>>
>> As for the "any reason to retry" - maybe. If it's a transient error we
>> don't want to give up early. Unfortunately that appears to be an
>> ambiguous situation, because SEQ_MISORDERED is allowed in place of
>> ERR_DELAY. I don't have any great suggestion however.
>>
> 
> IMO, we should retry callbacks (basically) indefinitely, unless the
> NFSv4 client is being torn down (i.e. lease expires or an unmount
> happened, etc).
> 
>> Jeff, to your point that the "client should figure out something is
>> wrong", I'm not sure how you think that will happen. If the server is
>> making a delegation recall and the client receive code chooses to reject
>> it at the sequence check, how would that eventually cause the client to
>> reestablish the session (on the forechannel)?
>>
>>
> 
> In the BADSLOT case, it calls nfsd4_mark_cb_fault(cb->cb_clp), which
> sets a flag in the client that makes it set
> SEQ4_STATUS_BACKCHANNEL_FAULT in the next SEQUENCE call.

Aha, that's good. RFC8881 only mentions it twice, but it's normative:

SEQ4_STATUS_BACKCHANNEL_FAULT
     The server has encountered an unrecoverable fault with the
     backchannel (e.g., it has lost track of the sequence ID for a slot
     in the backchannel). The client MUST stop sending more requests on
     the session's fore channel, wait for all outstanding requests to
     complete on the fore and back channel, and then destroy the session.

I guess my question is, what if the client ignores it anyway? What
server code actually forces the recovery?

Tom.


> 
> The client should take that as an indication that there is a problem
> and reestablish a new session (and maybe a new connection). Granted, it
> might take up to the next lease renewal, but there's not much else we
> can do if the client won't talk to us.
> 
> That's why I was suggesting that we might time out the backchannel
> calls after two lease periods. OTOH, maybe it's sufficient to not queue
> any callbacks for courtesy clients?
> 
>>>
>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> +		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
>>>>>>>>>> +			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
>>>>>>>>>> +			goto retry_nowait;
>>>>>>>>>> +		}
>>>>>>>>>> +		fallthrough;
>>>>>>>>>>      	case -NFS4ERR_BADSLOT:
>>>>>>>>>>      		/*
>>>>>>>>>>      		 * BADSLOT means that the client and server are out of sync
>>>>>>>>>> @@ -1403,12 +1413,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>>>>>>>>>      		nfsd4_mark_cb_fault(cb->cb_clp);
>>>>>>>>>>      		cb->cb_held_slot = -1;
>>>>>>>>>>      		goto retry_nowait;
>>>>>>>>>> -	case -NFS4ERR_SEQ_MISORDERED:
>>>>>>>>>> -		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
>>>>>>>>>> -			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
>>>>>>>>>> -			goto retry_nowait;
>>>>>>>>>> -		}
>>>>>>>>>> -		break;
>>>>>>>>>>      	default:
>>>>>>>>>>      		nfsd4_mark_cb_fault(cb->cb_clp);
>>>>>>>>>>      	}
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED error better
  2025-02-09 16:26                     ` Tom Talpey
@ 2025-02-09 16:51                       ` Jeff Layton
  2025-02-09 16:58                         ` Tom Talpey
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff Layton @ 2025-02-09 16:51 UTC (permalink / raw)
  To: Tom Talpey, Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel

On Sun, 2025-02-09 at 11:26 -0500, Tom Talpey wrote:
> On 2/8/2025 9:14 PM, Jeff Layton wrote:
> > On Sat, 2025-02-08 at 20:24 -0500, Tom Talpey wrote:
> > > On 2/8/2025 4:07 PM, Chuck Lever wrote:
> > > > On 2/8/25 3:45 PM, Jeff Layton wrote:
> > > > > On Sat, 2025-02-08 at 14:18 -0500, Tom Talpey wrote:
> > > > > > On 2/8/2025 11:08 AM, Jeff Layton wrote:
> > > > > > > On Sat, 2025-02-08 at 13:40 -0500, Tom Talpey wrote:
> > > > > > > > On 2/8/2025 10:02 AM, Jeff Layton wrote:
> > > > > > > > > On Sat, 2025-02-08 at 12:01 -0500, Chuck Lever wrote:
> > > > > > > > > > On 2/7/25 4:53 PM, Jeff Layton wrote:
> > > > > > > > > > > For NFS4ERR_SEQ_MISORDERED, do one attempt with a seqid of 1, and then
> > > > > > > > > > > fall back to treating it like a BADSLOT if that fails.
> > > > > > > > > > > 
> > > > > > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > > > > > > > ---
> > > > > > > > > > >      fs/nfsd/nfs4callback.c | 16 ++++++++++------
> > > > > > > > > > >      1 file changed, 10 insertions(+), 6 deletions(-)
> > > > > > > > > > > 
> > > > > > > > > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > > > > > > > > index 10067a34db3afff8d4e4383854ab9abd9767c2d6..d6e3e8bb2efabadda9f922318880e12e1cb2c23f 100644
> > > > > > > > > > > --- a/fs/nfsd/nfs4callback.c
> > > > > > > > > > > +++ b/fs/nfsd/nfs4callback.c
> > > > > > > > > > > @@ -1393,6 +1393,16 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> > > > > > > > > > >      			goto requeue;
> > > > > > > > > > >      		rpc_delay(task, 2 * HZ);
> > > > > > > > > > >      		return false;
> > > > > > > > > > > +	case -NFS4ERR_SEQ_MISORDERED:
> > > > > > > > > > > +		/*
> > > > > > > > > > > +		 * Reattempt once with seq_nr 1. If that fails, treat this
> > > > > > > > > > > +		 * like BADSLOT.
> > > > > > > > > > > +		 */
> > > > > > > > > > 
> > > > > > > > > > Nit: this comment says exactly what the code says. If it were me, I'd
> > > > > > > > > > remove it. Is there a "why" statement that could be made here? Like,
> > > > > > > > > > why retry with a seq_nr of 1 instead of just failing immediately?
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > There isn't one that I know of. It looks like Kinglong Mee added it in
> > > > > > > > > 7ba6cad6c88f, but there is no real mention of that in the changelog.
> > > > > > > > > 
> > > > > > > > > TBH, I'm not enamored with this remedy either. What if the seq_nr was 2
> > > > > > > > > when we got this error, and we then retry with a seq_nr of 1? Does the
> > > > > > > > > server then treat that as a retransmission?
> > > > > > > > 
> > > > > > > > So I assume you mean the requester sent seq_nr 1, saw a reply and sent a
> > > > > > > > subsequent seq_nr 2, to which it gets SEQ_MISORDERED.
> > > > > > > > 
> > > > > > > > If so, yes definitely backing up the seq_nr to 1 will result in the
> > > > > > > > peer considering it to be a retransmission, which will be bad.
> > > > > > > > 
> > > > > > > 
> > > > > > > Yes, that's what I meant.
> > > > > > > 
> > > > > > > > > We might be best off
> > > > > > > > > dropping this and just always treating it like BADSLOT.
> > > > > > > > 
> > > > > > > > But, why would this happen? Usually I'd think the peer sent seq_nr X
> > > > > > > > before it received a reply to seq_nr X-1, which would be a peer bug.
> > > > > > > > 
> > > > > > > > OTOH, SEQ_MISORDERED is a valid response to an in-progress retry. So,
> > > > > > > > how does the requester know the difference?
> > > > > > > > 
> > > > > > > > If treating it as BADSLOT completely resets the sequence, then sure,
> > > > > > > > but either a) the request is still in-progress, or b) if a bug is
> > > > > > > > causing the situation, well it's not going to converge on a functional
> > > > > > > > session.
> > > > > > > > 
> > > > > > > 
> > > > > > > With this patchset, on BADSLOT, we'll set SEQ4_STATUS_BACKCHANNEL_FAULT
> > > > > > > in the next forechannel SEQUENCE on the session. That should cause the
> > > > > > > client to (eventually) send a DESTROY_SESSION and create a new one.
> > > > > > > 
> > > > > > > Unfortunately, in the meantime, because of the way the callback channel
> > > > > > > update works, the server can end up trying to send the callback again
> > > > > > > on the same session (and maybe more than once). I'm not sure that
> > > > > > > that's a real problem per-se, but it's less than ideal.
> > > > > > > 
> > > > > > > > Not sure I have a solid suggestion right now. Whatever the fix, it
> > > > > > > > should capture any subtlety in a comment.
> > > > > > > > 
> > > > > > > 
> > > > > > > At this point, I'm leaning toward just treating it like BADSLOT.
> > > > > > > Basically, mark the backchannel faulty, and leak the slot so that
> > > > > > > nothing else uses it. That allows us to send backchannel requests on
> > > > > > > the other slots until the session gets recreated.
> > > > > > 
> > > > > > Hmm, leaking the slot is a workable approach, as long as it doesn't
> > > > > > cascade more than a time or two. Some sort of trigger should be armed
> > > > > > to prevent runaway retries.
> > > > > > 
> > > > > > It's maybe worth considering what state the peer might be in when this
> > > > > > happens. It too may effectively leak a slot, and if is retaining some
> > > > > > bogus state either as a result of or because of the previous exchange(s)
> > > > > > then this may lead to future hangs/failures. Not pretty, and maybe not
> > > > > > worth trying to guess.
> > > > > > 
> > > > > > Tom.
> > > > > > 
> > > > > 
> > > > > 
> > > > > The idea here is that eventually the client should figure out that
> > > > > something is wrong and reestablish the session. Currently we don't
> > > > > limit the number of retries on a callback.
> > > > > 
> > > > > Maybe they should time out after a while? If we've retried a callback
> > > > > for more than two lease periods, give up and log something?
> > > > > 
> > > > > Either way, I'd consider that to be follow-on work to this set.
> > > > 
> > > > As a general comment, I think making a heroic effort to recover in any
> > > > of these cases is probably not worth the additional complexity. Where it
> > > > is required or where we believe it is worth the trouble, that's where we
> > > > want a detailed comment.
> > > > 
> > > > What we want to do is ensure forward progress. I'm guessing that error
> > > > conditions are going to be rare, so leaking the slot until a certain
> > > > portion of them are gone, and then indicating a session fault to force
> > > > the client to start over from scratch, is probably the most
> > > > straightforward approach.
> > > > 
> > > > So, is there a good reason to retry? There doesn't appear to be any
> > > > reasoning mentioned in the commit log or in nearby comments.
> > > 
> > > Agreed on the general comment.
> > > 
> > > As for the "any reason to retry" - maybe. If it's a transient error we
> > > don't want to give up early. Unfortunately that appears to be an
> > > ambiguous situation, because SEQ_MISORDERED is allowed in place of
> > > ERR_DELAY. I don't have any great suggestion however.
> > > 
> > 
> > IMO, we should retry callbacks (basically) indefinitely, unless the
> > NFSv4 client is being torn down (i.e. lease expires or an unmount
> > happened, etc).
> > 
> > > Jeff, to your point that the "client should figure out something is
> > > wrong", I'm not sure how you think that will happen. If the server is
> > > making a delegation recall and the client receive code chooses to reject
> > > it at the sequence check, how would that eventually cause the client to
> > > reestablish the session (on the forechannel)?
> > > 
> > > 
> > 
> > In the BADSLOT case, it calls nfsd4_mark_cb_fault(cb->cb_clp), which
> > sets a flag in the client that makes it set
> > SEQ4_STATUS_BACKCHANNEL_FAULT in the next SEQUENCE call.
> 
> Aha, that's good. RFC8881 only mentions it twice, but it's normative:
> 
> SEQ4_STATUS_BACKCHANNEL_FAULT
>      The server has encountered an unrecoverable fault with the
>      backchannel (e.g., it has lost track of the sequence ID for a slot
>      in the backchannel). The client MUST stop sending more requests on
>      the session's fore channel, wait for all outstanding requests to
>      complete on the fore and back channel, and then destroy the session.
> 
> I guess my question is, what if the client ignores it anyway? What
> server code actually forces the recovery?
> 
> Tom.
> 

I don't think there is anything that does this right now. Does the RFC
mention what the server should do if that happens? I suppose the server
could just unilaterally destroy the session at some point, and force
the client to reestablish it.

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED error better
  2025-02-09 16:51                       ` Jeff Layton
@ 2025-02-09 16:58                         ` Tom Talpey
  2025-02-09 17:05                           ` Jeff Layton
  0 siblings, 1 reply; 27+ messages in thread
From: Tom Talpey @ 2025-02-09 16:58 UTC (permalink / raw)
  To: Jeff Layton, Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel

On 2/9/2025 11:51 AM, Jeff Layton wrote:
> On Sun, 2025-02-09 at 11:26 -0500, Tom Talpey wrote:
>> On 2/8/2025 9:14 PM, Jeff Layton wrote:
>>> On Sat, 2025-02-08 at 20:24 -0500, Tom Talpey wrote:
>>>> On 2/8/2025 4:07 PM, Chuck Lever wrote:
>>>>> On 2/8/25 3:45 PM, Jeff Layton wrote:
>>>>>> On Sat, 2025-02-08 at 14:18 -0500, Tom Talpey wrote:
>>>>>>> On 2/8/2025 11:08 AM, Jeff Layton wrote:
>>>>>>>> On Sat, 2025-02-08 at 13:40 -0500, Tom Talpey wrote:
>>>>>>>>> On 2/8/2025 10:02 AM, Jeff Layton wrote:
>>>>>>>>>> On Sat, 2025-02-08 at 12:01 -0500, Chuck Lever wrote:
>>>>>>>>>>> On 2/7/25 4:53 PM, Jeff Layton wrote:
>>>>>>>>>>>> For NFS4ERR_SEQ_MISORDERED, do one attempt with a seqid of 1, and then
>>>>>>>>>>>> fall back to treating it like a BADSLOT if that fails.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>>>>>>>>>>> ---
>>>>>>>>>>>>       fs/nfsd/nfs4callback.c | 16 ++++++++++------
>>>>>>>>>>>>       1 file changed, 10 insertions(+), 6 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>>>>>>>>>>> index 10067a34db3afff8d4e4383854ab9abd9767c2d6..d6e3e8bb2efabadda9f922318880e12e1cb2c23f 100644
>>>>>>>>>>>> --- a/fs/nfsd/nfs4callback.c
>>>>>>>>>>>> +++ b/fs/nfsd/nfs4callback.c
>>>>>>>>>>>> @@ -1393,6 +1393,16 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>>>>>>>>>>>       			goto requeue;
>>>>>>>>>>>>       		rpc_delay(task, 2 * HZ);
>>>>>>>>>>>>       		return false;
>>>>>>>>>>>> +	case -NFS4ERR_SEQ_MISORDERED:
>>>>>>>>>>>> +		/*
>>>>>>>>>>>> +		 * Reattempt once with seq_nr 1. If that fails, treat this
>>>>>>>>>>>> +		 * like BADSLOT.
>>>>>>>>>>>> +		 */
>>>>>>>>>>>
>>>>>>>>>>> Nit: this comment says exactly what the code says. If it were me, I'd
>>>>>>>>>>> remove it. Is there a "why" statement that could be made here? Like,
>>>>>>>>>>> why retry with a seq_nr of 1 instead of just failing immediately?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> There isn't one that I know of. It looks like Kinglong Mee added it in
>>>>>>>>>> 7ba6cad6c88f, but there is no real mention of that in the changelog.
>>>>>>>>>>
>>>>>>>>>> TBH, I'm not enamored with this remedy either. What if the seq_nr was 2
>>>>>>>>>> when we got this error, and we then retry with a seq_nr of 1? Does the
>>>>>>>>>> server then treat that as a retransmission?
>>>>>>>>>
>>>>>>>>> So I assume you mean the requester sent seq_nr 1, saw a reply and sent a
>>>>>>>>> subsequent seq_nr 2, to which it gets SEQ_MISORDERED.
>>>>>>>>>
>>>>>>>>> If so, yes definitely backing up the seq_nr to 1 will result in the
>>>>>>>>> peer considering it to be a retransmission, which will be bad.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, that's what I meant.
>>>>>>>>
>>>>>>>>>> We might be best off
>>>>>>>>>> dropping this and just always treating it like BADSLOT.
>>>>>>>>>
>>>>>>>>> But, why would this happen? Usually I'd think the peer sent seq_nr X
>>>>>>>>> before it received a reply to seq_nr X-1, which would be a peer bug.
>>>>>>>>>
>>>>>>>>> OTOH, SEQ_MISORDERED is a valid response to an in-progress retry. So,
>>>>>>>>> how does the requester know the difference?
>>>>>>>>>
>>>>>>>>> If treating it as BADSLOT completely resets the sequence, then sure,
>>>>>>>>> but either a) the request is still in-progress, or b) if a bug is
>>>>>>>>> causing the situation, well it's not going to converge on a functional
>>>>>>>>> session.
>>>>>>>>>
>>>>>>>>
>>>>>>>> With this patchset, on BADSLOT, we'll set SEQ4_STATUS_BACKCHANNEL_FAULT
>>>>>>>> in the next forechannel SEQUENCE on the session. That should cause the
>>>>>>>> client to (eventually) send a DESTROY_SESSION and create a new one.
>>>>>>>>
>>>>>>>> Unfortunately, in the meantime, because of the way the callback channel
>>>>>>>> update works, the server can end up trying to send the callback again
>>>>>>>> on the same session (and maybe more than once). I'm not sure that
>>>>>>>> that's a real problem per-se, but it's less than ideal.
>>>>>>>>
>>>>>>>>> Not sure I have a solid suggestion right now. Whatever the fix, it
>>>>>>>>> should capture any subtlety in a comment.
>>>>>>>>>
>>>>>>>>
>>>>>>>> At this point, I'm leaning toward just treating it like BADSLOT.
>>>>>>>> Basically, mark the backchannel faulty, and leak the slot so that
>>>>>>>> nothing else uses it. That allows us to send backchannel requests on
>>>>>>>> the other slots until the session gets recreated.
>>>>>>>
>>>>>>> Hmm, leaking the slot is a workable approach, as long as it doesn't
>>>>>>> cascade more than a time or two. Some sort of trigger should be armed
>>>>>>> to prevent runaway retries.
>>>>>>>
>>>>>>> It's maybe worth considering what state the peer might be in when this
>>>>>>> happens. It too may effectively leak a slot, and if is retaining some
>>>>>>> bogus state either as a result of or because of the previous exchange(s)
>>>>>>> then this may lead to future hangs/failures. Not pretty, and maybe not
>>>>>>> worth trying to guess.
>>>>>>>
>>>>>>> Tom.
>>>>>>>
>>>>>>
>>>>>>
>>>>>> The idea here is that eventually the client should figure out that
>>>>>> something is wrong and reestablish the session. Currently we don't
>>>>>> limit the number of retries on a callback.
>>>>>>
>>>>>> Maybe they should time out after a while? If we've retried a callback
>>>>>> for more than two lease periods, give up and log something?
>>>>>>
>>>>>> Either way, I'd consider that to be follow-on work to this set.
>>>>>
>>>>> As a general comment, I think making a heroic effort to recover in any
>>>>> of these cases is probably not worth the additional complexity. Where it
>>>>> is required or where we believe it is worth the trouble, that's where we
>>>>> want a detailed comment.
>>>>>
>>>>> What we want to do is ensure forward progress. I'm guessing that error
>>>>> conditions are going to be rare, so leaking the slot until a certain
>>>>> portion of them are gone, and then indicating a session fault to force
>>>>> the client to start over from scratch, is probably the most
>>>>> straightforward approach.
>>>>>
>>>>> So, is there a good reason to retry? There doesn't appear to be any
>>>>> reasoning mentioned in the commit log or in nearby comments.
>>>>
>>>> Agreed on the general comment.
>>>>
>>>> As for the "any reason to retry" - maybe. If it's a transient error we
>>>> don't want to give up early. Unfortunately that appears to be an
>>>> ambiguous situation, because SEQ_MISORDERED is allowed in place of
>>>> ERR_DELAY. I don't have any great suggestion however.
>>>>
>>>
>>> IMO, we should retry callbacks (basically) indefinitely, unless the
>>> NFSv4 client is being torn down (i.e. lease expires or an unmount
>>> happened, etc).
>>>
>>>> Jeff, to your point that the "client should figure out something is
>>>> wrong", I'm not sure how you think that will happen. If the server is
>>>> making a delegation recall and the client receive code chooses to reject
>>>> it at the sequence check, how would that eventually cause the client to
>>>> reestablish the session (on the forechannel)?
>>>>
>>>>
>>>
>>> In the BADSLOT case, it calls nfsd4_mark_cb_fault(cb->cb_clp), which
>>> sets a flag in the client that makes it set
>>> SEQ4_STATUS_BACKCHANNEL_FAULT in the next SEQUENCE call.
>>
>> Aha, that's good. RFC8881 only mentions it twice, but it's normative:
>>
>> SEQ4_STATUS_BACKCHANNEL_FAULT
>>       The server has encountered an unrecoverable fault with the
>>       backchannel (e.g., it has lost track of the sequence ID for a slot
>>       in the backchannel). The client MUST stop sending more requests on
>>       the session's fore channel, wait for all outstanding requests to
>>       complete on the fore and back channel, and then destroy the session.
>>
>> I guess my question is, what if the client ignores it anyway? What
>> server code actually forces the recovery?
>>
>> Tom.
>>
> 
> I don't think there is anything that does this right now. Does the RFC
> mention what the server should do if that happens? I suppose the server
> could just unilaterally destroy the session at some point, and force
> the client to reestablish it.

Nope. :( Like many other requirements, it's unenforced normatively.

Perhaps we could consider this as an erratum, but it's more of an
omission. Because of that, it may need IETF discussion ("at some point"
needs a MUST). I'll volunteer to open an issue, if you agree to discuss!

Tom.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED error better
  2025-02-09 16:58                         ` Tom Talpey
@ 2025-02-09 17:05                           ` Jeff Layton
  2025-02-09 18:52                             ` Tom Talpey
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff Layton @ 2025-02-09 17:05 UTC (permalink / raw)
  To: Tom Talpey, Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel

On Sun, 2025-02-09 at 11:58 -0500, Tom Talpey wrote:
> On 2/9/2025 11:51 AM, Jeff Layton wrote:
> > On Sun, 2025-02-09 at 11:26 -0500, Tom Talpey wrote:
> > > On 2/8/2025 9:14 PM, Jeff Layton wrote:
> > > > On Sat, 2025-02-08 at 20:24 -0500, Tom Talpey wrote:
> > > > > On 2/8/2025 4:07 PM, Chuck Lever wrote:
> > > > > > On 2/8/25 3:45 PM, Jeff Layton wrote:
> > > > > > > On Sat, 2025-02-08 at 14:18 -0500, Tom Talpey wrote:
> > > > > > > > On 2/8/2025 11:08 AM, Jeff Layton wrote:
> > > > > > > > > On Sat, 2025-02-08 at 13:40 -0500, Tom Talpey wrote:
> > > > > > > > > > On 2/8/2025 10:02 AM, Jeff Layton wrote:
> > > > > > > > > > > On Sat, 2025-02-08 at 12:01 -0500, Chuck Lever wrote:
> > > > > > > > > > > > On 2/7/25 4:53 PM, Jeff Layton wrote:
> > > > > > > > > > > > > For NFS4ERR_SEQ_MISORDERED, do one attempt with a seqid of 1, and then
> > > > > > > > > > > > > fall back to treating it like a BADSLOT if that fails.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >       fs/nfsd/nfs4callback.c | 16 ++++++++++------
> > > > > > > > > > > > >       1 file changed, 10 insertions(+), 6 deletions(-)
> > > > > > > > > > > > > 
> > > > > > > > > > > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > > > > > > > > > > index 10067a34db3afff8d4e4383854ab9abd9767c2d6..d6e3e8bb2efabadda9f922318880e12e1cb2c23f 100644
> > > > > > > > > > > > > --- a/fs/nfsd/nfs4callback.c
> > > > > > > > > > > > > +++ b/fs/nfsd/nfs4callback.c
> > > > > > > > > > > > > @@ -1393,6 +1393,16 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> > > > > > > > > > > > >       			goto requeue;
> > > > > > > > > > > > >       		rpc_delay(task, 2 * HZ);
> > > > > > > > > > > > >       		return false;
> > > > > > > > > > > > > +	case -NFS4ERR_SEQ_MISORDERED:
> > > > > > > > > > > > > +		/*
> > > > > > > > > > > > > +		 * Reattempt once with seq_nr 1. If that fails, treat this
> > > > > > > > > > > > > +		 * like BADSLOT.
> > > > > > > > > > > > > +		 */
> > > > > > > > > > > > 
> > > > > > > > > > > > Nit: this comment says exactly what the code says. If it were me, I'd
> > > > > > > > > > > > remove it. Is there a "why" statement that could be made here? Like,
> > > > > > > > > > > > why retry with a seq_nr of 1 instead of just failing immediately?
> > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > There isn't one that I know of. It looks like Kinglong Mee added it in
> > > > > > > > > > > 7ba6cad6c88f, but there is no real mention of that in the changelog.
> > > > > > > > > > > 
> > > > > > > > > > > TBH, I'm not enamored with this remedy either. What if the seq_nr was 2
> > > > > > > > > > > when we got this error, and we then retry with a seq_nr of 1? Does the
> > > > > > > > > > > server then treat that as a retransmission?
> > > > > > > > > > 
> > > > > > > > > > So I assume you mean the requester sent seq_nr 1, saw a reply and sent a
> > > > > > > > > > subsequent seq_nr 2, to which it gets SEQ_MISORDERED.
> > > > > > > > > > 
> > > > > > > > > > If so, yes definitely backing up the seq_nr to 1 will result in the
> > > > > > > > > > peer considering it to be a retransmission, which will be bad.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Yes, that's what I meant.
> > > > > > > > > 
> > > > > > > > > > > We might be best off
> > > > > > > > > > > dropping this and just always treating it like BADSLOT.
> > > > > > > > > > 
> > > > > > > > > > But, why would this happen? Usually I'd think the peer sent seq_nr X
> > > > > > > > > > before it received a reply to seq_nr X-1, which would be a peer bug.
> > > > > > > > > > 
> > > > > > > > > > OTOH, SEQ_MISORDERED is a valid response to an in-progress retry. So,
> > > > > > > > > > how does the requester know the difference?
> > > > > > > > > > 
> > > > > > > > > > If treating it as BADSLOT completely resets the sequence, then sure,
> > > > > > > > > > but either a) the request is still in-progress, or b) if a bug is
> > > > > > > > > > causing the situation, well it's not going to converge on a functional
> > > > > > > > > > session.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > With this patchset, on BADSLOT, we'll set SEQ4_STATUS_BACKCHANNEL_FAULT
> > > > > > > > > in the next forechannel SEQUENCE on the session. That should cause the
> > > > > > > > > client to (eventually) send a DESTROY_SESSION and create a new one.
> > > > > > > > > 
> > > > > > > > > Unfortunately, in the meantime, because of the way the callback channel
> > > > > > > > > update works, the server can end up trying to send the callback again
> > > > > > > > > on the same session (and maybe more than once). I'm not sure that
> > > > > > > > > that's a real problem per-se, but it's less than ideal.
> > > > > > > > > 
> > > > > > > > > > Not sure I have a solid suggestion right now. Whatever the fix, it
> > > > > > > > > > should capture any subtlety in a comment.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > At this point, I'm leaning toward just treating it like BADSLOT.
> > > > > > > > > Basically, mark the backchannel faulty, and leak the slot so that
> > > > > > > > > nothing else uses it. That allows us to send backchannel requests on
> > > > > > > > > the other slots until the session gets recreated.
> > > > > > > > 
> > > > > > > > Hmm, leaking the slot is a workable approach, as long as it doesn't
> > > > > > > > cascade more than a time or two. Some sort of trigger should be armed
> > > > > > > > to prevent runaway retries.
> > > > > > > > 
> > > > > > > > It's maybe worth considering what state the peer might be in when this
> > > > > > > > happens. It too may effectively leak a slot, and if is retaining some
> > > > > > > > bogus state either as a result of or because of the previous exchange(s)
> > > > > > > > then this may lead to future hangs/failures. Not pretty, and maybe not
> > > > > > > > worth trying to guess.
> > > > > > > > 
> > > > > > > > Tom.
> > > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > The idea here is that eventually the client should figure out that
> > > > > > > something is wrong and reestablish the session. Currently we don't
> > > > > > > limit the number of retries on a callback.
> > > > > > > 
> > > > > > > Maybe they should time out after a while? If we've retried a callback
> > > > > > > for more than two lease periods, give up and log something?
> > > > > > > 
> > > > > > > Either way, I'd consider that to be follow-on work to this set.
> > > > > > 
> > > > > > As a general comment, I think making a heroic effort to recover in any
> > > > > > of these cases is probably not worth the additional complexity. Where it
> > > > > > is required or where we believe it is worth the trouble, that's where we
> > > > > > want a detailed comment.
> > > > > > 
> > > > > > What we want to do is ensure forward progress. I'm guessing that error
> > > > > > conditions are going to be rare, so leaking the slot until a certain
> > > > > > portion of them are gone, and then indicating a session fault to force
> > > > > > the client to start over from scratch, is probably the most
> > > > > > straightforward approach.
> > > > > > 
> > > > > > So, is there a good reason to retry? There doesn't appear to be any
> > > > > > reasoning mentioned in the commit log or in nearby comments.
> > > > > 
> > > > > Agreed on the general comment.
> > > > > 
> > > > > As for the "any reason to retry" - maybe. If it's a transient error we
> > > > > don't want to give up early. Unfortunately that appears to be an
> > > > > ambiguous situation, because SEQ_MISORDERED is allowed in place of
> > > > > ERR_DELAY. I don't have any great suggestion however.
> > > > > 
> > > > 
> > > > IMO, we should retry callbacks (basically) indefinitely, unless the
> > > > NFSv4 client is being torn down (i.e. lease expires or an unmount
> > > > happened, etc).
> > > > 
> > > > > Jeff, to your point that the "client should figure out something is
> > > > > wrong", I'm not sure how you think that will happen. If the server is
> > > > > making a delegation recall and the client receive code chooses to reject
> > > > > it at the sequence check, how would that eventually cause the client to
> > > > > reestablish the session (on the forechannel)?
> > > > > 
> > > > > 
> > > > 
> > > > In the BADSLOT case, it calls nfsd4_mark_cb_fault(cb->cb_clp), which
> > > > sets a flag in the client that makes it set
> > > > SEQ4_STATUS_BACKCHANNEL_FAULT in the next SEQUENCE call.
> > > 
> > > Aha, that's good. RFC8881 only mentions it twice, but it's normative:
> > > 
> > > SEQ4_STATUS_BACKCHANNEL_FAULT
> > >       The server has encountered an unrecoverable fault with the
> > >       backchannel (e.g., it has lost track of the sequence ID for a slot
> > >       in the backchannel). The client MUST stop sending more requests on
> > >       the session's fore channel, wait for all outstanding requests to
> > >       complete on the fore and back channel, and then destroy the session.
> > > 
> > > I guess my question is, what if the client ignores it anyway? What
> > > server code actually forces the recovery?
> > > 
> > > Tom.
> > > 
> > 
> > I don't think there is anything that does this right now. Does the RFC
> > mention what the server should do if that happens? I suppose the server
> > could just unilaterally destroy the session at some point, and force
> > the client to reestablish it.
> 
> Nope. :( Like many other requirements, it's unenforced normatively.
> 
> Perhaps we could consider this as an erratum, but it's more of an
> omission. Because of that, it may need IETF discussion ("at some point"
> needs a MUST). I'll volunteer to open an issue, if you agree to discuss!

Sure. I don't have a whole lot to add, but my proposal would be:

If the server sends a SEQUENCE reply with SEQ4_STATUS_BACKCHANNEL_FAULT
set, then the client has another lease period in which to reestablish
the session. After that, the server may unilaterally drop the session
and start returning NFS4ERR_BADSESSION to attempts to use it.

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED error better
  2025-02-09 17:05                           ` Jeff Layton
@ 2025-02-09 18:52                             ` Tom Talpey
  0 siblings, 0 replies; 27+ messages in thread
From: Tom Talpey @ 2025-02-09 18:52 UTC (permalink / raw)
  To: Jeff Layton, Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo,
	J. Bruce Fields
  Cc: linux-nfs, linux-kernel

On 2/9/2025 12:05 PM, Jeff Layton wrote:
> On Sun, 2025-02-09 at 11:58 -0500, Tom Talpey wrote:
>> On 2/9/2025 11:51 AM, Jeff Layton wrote:
>>> On Sun, 2025-02-09 at 11:26 -0500, Tom Talpey wrote:
>>>> On 2/8/2025 9:14 PM, Jeff Layton wrote:
>>>>> On Sat, 2025-02-08 at 20:24 -0500, Tom Talpey wrote:
>>>>>> On 2/8/2025 4:07 PM, Chuck Lever wrote:
>>>>>>> On 2/8/25 3:45 PM, Jeff Layton wrote:
>>>>>>>> On Sat, 2025-02-08 at 14:18 -0500, Tom Talpey wrote:
>>>>>>>>> On 2/8/2025 11:08 AM, Jeff Layton wrote:
>>>>>>>>>> On Sat, 2025-02-08 at 13:40 -0500, Tom Talpey wrote:
>>>>>>>>>>> On 2/8/2025 10:02 AM, Jeff Layton wrote:
>>>>>>>>>>>> On Sat, 2025-02-08 at 12:01 -0500, Chuck Lever wrote:
>>>>>>>>>>>>> On 2/7/25 4:53 PM, Jeff Layton wrote:
>>>>>>>>>>>>>> For NFS4ERR_SEQ_MISORDERED, do one attempt with a seqid of 1, and then
>>>>>>>>>>>>>> fall back to treating it like a BADSLOT if that fails.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>        fs/nfsd/nfs4callback.c | 16 ++++++++++------
>>>>>>>>>>>>>>        1 file changed, 10 insertions(+), 6 deletions(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>>>>>>>>>>>>> index 10067a34db3afff8d4e4383854ab9abd9767c2d6..d6e3e8bb2efabadda9f922318880e12e1cb2c23f 100644
>>>>>>>>>>>>>> --- a/fs/nfsd/nfs4callback.c
>>>>>>>>>>>>>> +++ b/fs/nfsd/nfs4callback.c
>>>>>>>>>>>>>> @@ -1393,6 +1393,16 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>>>>>>>>>>>>>        			goto requeue;
>>>>>>>>>>>>>>        		rpc_delay(task, 2 * HZ);
>>>>>>>>>>>>>>        		return false;
>>>>>>>>>>>>>> +	case -NFS4ERR_SEQ_MISORDERED:
>>>>>>>>>>>>>> +		/*
>>>>>>>>>>>>>> +		 * Reattempt once with seq_nr 1. If that fails, treat this
>>>>>>>>>>>>>> +		 * like BADSLOT.
>>>>>>>>>>>>>> +		 */
>>>>>>>>>>>>>
>>>>>>>>>>>>> Nit: this comment says exactly what the code says. If it were me, I'd
>>>>>>>>>>>>> remove it. Is there a "why" statement that could be made here? Like,
>>>>>>>>>>>>> why retry with a seq_nr of 1 instead of just failing immediately?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> There isn't one that I know of. It looks like Kinglong Mee added it in
>>>>>>>>>>>> 7ba6cad6c88f, but there is no real mention of that in the changelog.
>>>>>>>>>>>>
>>>>>>>>>>>> TBH, I'm not enamored with this remedy either. What if the seq_nr was 2
>>>>>>>>>>>> when we got this error, and we then retry with a seq_nr of 1? Does the
>>>>>>>>>>>> server then treat that as a retransmission?
>>>>>>>>>>>
>>>>>>>>>>> So I assume you mean the requester sent seq_nr 1, saw a reply and sent a
>>>>>>>>>>> subsequent seq_nr 2, to which it gets SEQ_MISORDERED.
>>>>>>>>>>>
>>>>>>>>>>> If so, yes definitely backing up the seq_nr to 1 will result in the
>>>>>>>>>>> peer considering it to be a retransmission, which will be bad.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yes, that's what I meant.
>>>>>>>>>>
>>>>>>>>>>>> We might be best off
>>>>>>>>>>>> dropping this and just always treating it like BADSLOT.
>>>>>>>>>>>
>>>>>>>>>>> But, why would this happen? Usually I'd think the peer sent seq_nr X
>>>>>>>>>>> before it received a reply to seq_nr X-1, which would be a peer bug.
>>>>>>>>>>>
>>>>>>>>>>> OTOH, SEQ_MISORDERED is a valid response to an in-progress retry. So,
>>>>>>>>>>> how does the requester know the difference?
>>>>>>>>>>>
>>>>>>>>>>> If treating it as BADSLOT completely resets the sequence, then sure,
>>>>>>>>>>> but either a) the request is still in-progress, or b) if a bug is
>>>>>>>>>>> causing the situation, well it's not going to converge on a functional
>>>>>>>>>>> session.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> With this patchset, on BADSLOT, we'll set SEQ4_STATUS_BACKCHANNEL_FAULT
>>>>>>>>>> in the next forechannel SEQUENCE on the session. That should cause the
>>>>>>>>>> client to (eventually) send a DESTROY_SESSION and create a new one.
>>>>>>>>>>
>>>>>>>>>> Unfortunately, in the meantime, because of the way the callback channel
>>>>>>>>>> update works, the server can end up trying to send the callback again
>>>>>>>>>> on the same session (and maybe more than once). I'm not sure that
>>>>>>>>>> that's a real problem per-se, but it's less than ideal.
>>>>>>>>>>
>>>>>>>>>>> Not sure I have a solid suggestion right now. Whatever the fix, it
>>>>>>>>>>> should capture any subtlety in a comment.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> At this point, I'm leaning toward just treating it like BADSLOT.
>>>>>>>>>> Basically, mark the backchannel faulty, and leak the slot so that
>>>>>>>>>> nothing else uses it. That allows us to send backchannel requests on
>>>>>>>>>> the other slots until the session gets recreated.
>>>>>>>>>
>>>>>>>>> Hmm, leaking the slot is a workable approach, as long as it doesn't
>>>>>>>>> cascade more than a time or two. Some sort of trigger should be armed
>>>>>>>>> to prevent runaway retries.
>>>>>>>>>
>>>>>>>>> It's maybe worth considering what state the peer might be in when this
>>>>>>>>> happens. It too may effectively leak a slot, and if is retaining some
>>>>>>>>> bogus state either as a result of or because of the previous exchange(s)
>>>>>>>>> then this may lead to future hangs/failures. Not pretty, and maybe not
>>>>>>>>> worth trying to guess.
>>>>>>>>>
>>>>>>>>> Tom.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The idea here is that eventually the client should figure out that
>>>>>>>> something is wrong and reestablish the session. Currently we don't
>>>>>>>> limit the number of retries on a callback.
>>>>>>>>
>>>>>>>> Maybe they should time out after a while? If we've retried a callback
>>>>>>>> for more than two lease periods, give up and log something?
>>>>>>>>
>>>>>>>> Either way, I'd consider that to be follow-on work to this set.
>>>>>>>
>>>>>>> As a general comment, I think making a heroic effort to recover in any
>>>>>>> of these cases is probably not worth the additional complexity. Where it
>>>>>>> is required or where we believe it is worth the trouble, that's where we
>>>>>>> want a detailed comment.
>>>>>>>
>>>>>>> What we want to do is ensure forward progress. I'm guessing that error
>>>>>>> conditions are going to be rare, so leaking the slot until a certain
>>>>>>> portion of them are gone, and then indicating a session fault to force
>>>>>>> the client to start over from scratch, is probably the most
>>>>>>> straightforward approach.
>>>>>>>
>>>>>>> So, is there a good reason to retry? There doesn't appear to be any
>>>>>>> reasoning mentioned in the commit log or in nearby comments.
>>>>>>
>>>>>> Agreed on the general comment.
>>>>>>
>>>>>> As for the "any reason to retry" - maybe. If it's a transient error we
>>>>>> don't want to give up early. Unfortunately that appears to be an
>>>>>> ambiguous situation, because SEQ_MISORDERED is allowed in place of
>>>>>> ERR_DELAY. I don't have any great suggestion however.
>>>>>>
>>>>>
>>>>> IMO, we should retry callbacks (basically) indefinitely, unless the
>>>>> NFSv4 client is being torn down (i.e. lease expires or an unmount
>>>>> happened, etc).
>>>>>
>>>>>> Jeff, to your point that the "client should figure out something is
>>>>>> wrong", I'm not sure how you think that will happen. If the server is
>>>>>> making a delegation recall and the client receive code chooses to reject
>>>>>> it at the sequence check, how would that eventually cause the client to
>>>>>> reestablish the session (on the forechannel)?
>>>>>>
>>>>>>
>>>>>
>>>>> In the BADSLOT case, it calls nfsd4_mark_cb_fault(cb->cb_clp), which
>>>>> sets a flag in the client that makes it set
>>>>> SEQ4_STATUS_BACKCHANNEL_FAULT in the next SEQUENCE call.
>>>>
>>>> Aha, that's good. RFC8881 only mentions it twice, but it's normative:
>>>>
>>>> SEQ4_STATUS_BACKCHANNEL_FAULT
>>>>        The server has encountered an unrecoverable fault with the
>>>>        backchannel (e.g., it has lost track of the sequence ID for a slot
>>>>        in the backchannel). The client MUST stop sending more requests on
>>>>        the session's fore channel, wait for all outstanding requests to
>>>>        complete on the fore and back channel, and then destroy the session.
>>>>
>>>> I guess my question is, what if the client ignores it anyway? What
>>>> server code actually forces the recovery?
>>>>
>>>> Tom.
>>>>
>>>
>>> I don't think there is anything that does this right now. Does the RFC
>>> mention what the server should do if that happens? I suppose the server
>>> could just unilaterally destroy the session at some point, and force
>>> the client to reestablish it.
>>
>> Nope. :( Like many other requirements, it's unenforced normatively.
>>
>> Perhaps we could consider this as an erratum, but it's more of an
>> omission. Because of that, it may need IETF discussion ("at some point"
>> needs a MUST). I'll volunteer to open an issue, if you agree to discuss!
> 
> Sure. I don't have a whole lot to add, but my proposal would be:
> 
> If the server sends a SEQUENCE reply with SEQ4_STATUS_BACKCHANNEL_FAULT
> set, then the client has another lease period in which to reestablish
> the session. After that, the server may unilaterally drop the session
> and start returning NFS4ERR_BADSESSION to attempts to use it.
Oh, I don't think it needs to wait one lease period, but that's an
interesting approach. The language needs to be normative and a bit
more clear.

I'll take it to the IETF this week.

Tom.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2025-02-09 18:53 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-07 21:53 [PATCH v5 0/7] nfsd: CB_SEQUENCE error handling fixes and cleanups Jeff Layton
2025-02-07 21:53 ` [PATCH v5 1/7] nfsd: prepare nfsd4_cb_sequence_done() for error handling rework Jeff Layton
2025-02-07 21:53 ` [PATCH v5 2/7] nfsd: always release slot when requeueing callback Jeff Layton
2025-02-08 16:57   ` Chuck Lever
2025-02-08 17:55     ` Jeff Layton
2025-02-07 21:53 ` [PATCH v5 3/7] nfsd: only check RPC_SIGNALLED() when restarting rpc_task Jeff Layton
2025-02-08 16:59   ` Chuck Lever
2025-02-07 21:53 ` [PATCH v5 4/7] nfsd: when CB_SEQUENCE gets ESERVERFAULT don't increment seq_nr Jeff Layton
2025-02-08 17:13   ` Chuck Lever
2025-02-07 21:53 ` [PATCH v5 5/7] nfsd: handle CB_SEQUENCE NFS4ERR_BADSLOT better Jeff Layton
2025-02-07 21:53 ` [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED error better Jeff Layton
2025-02-08 17:01   ` Chuck Lever
2025-02-08 18:02     ` Jeff Layton
2025-02-08 18:40       ` Tom Talpey
2025-02-08 19:08         ` Jeff Layton
2025-02-08 19:18           ` Tom Talpey
2025-02-08 20:45             ` Jeff Layton
2025-02-08 21:07               ` Chuck Lever
2025-02-09  1:24                 ` Tom Talpey
2025-02-09  2:14                   ` Jeff Layton
2025-02-09 16:26                     ` Tom Talpey
2025-02-09 16:51                       ` Jeff Layton
2025-02-09 16:58                         ` Tom Talpey
2025-02-09 17:05                           ` Jeff Layton
2025-02-09 18:52                             ` Tom Talpey
2025-02-07 21:53 ` [PATCH v5 7/7] nfsd: lift NFSv4.0 handling out of nfsd4_cb_sequence_done() Jeff Layton
2025-02-08 17:05   ` Chuck Lever

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox