* [PATCH 2 01/14] NFSD: Reset cb_seq_status after NFS4ERR_DELAY
2024-01-26 17:45 [PATCH 2 00/14] NFSD backchannel fixes Chuck Lever
@ 2024-01-26 17:45 ` Chuck Lever
2024-01-26 17:45 ` [PATCH 2 02/14] NFSD: Convert the callback workqueue to use delayed_work Chuck Lever
` (13 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Chuck Lever @ 2024-01-26 17:45 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
I noticed that once an NFSv4.1 callback operation gets a
NFS4ERR_DELAY status on CB_SEQUENCE and then the connection is lost,
the callback client loops, resending it indefinitely.
The switch arm in nfsd4_cb_sequence_done() that handles
NFS4ERR_DELAY uses rpc_restart_call() to rearm the RPC state machine
for the retransmit, but that path does not call the rpc_prepare_call
callback again. Thus cb_seq_status is set to -10008 by the first
NFS4ERR_DELAY result, but is never set back to 1 for the retransmits.
nfsd4_cb_sequence_done() thinks it's getting nothing but a
long series of CB_SEQUENCE NFS4ERR_DELAY replies.
Fixes: 7ba6cad6c88f ("nfsd: New helper nfsd4_cb_sequence_done() for processing more cb errors")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4callback.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 926c29879c6a..43b0a34a5d5b 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1178,6 +1178,7 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
ret = false;
break;
case -NFS4ERR_DELAY:
+ cb->cb_seq_status = 1;
if (!rpc_restart_call(task))
goto out;
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 2 02/14] NFSD: Convert the callback workqueue to use delayed_work
2024-01-26 17:45 [PATCH 2 00/14] NFSD backchannel fixes Chuck Lever
2024-01-26 17:45 ` [PATCH 2 01/14] NFSD: Reset cb_seq_status after NFS4ERR_DELAY Chuck Lever
@ 2024-01-26 17:45 ` Chuck Lever
2024-01-26 19:09 ` Jeff Layton
2024-01-26 17:45 ` [PATCH 2 03/14] NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down Chuck Lever
` (12 subsequent siblings)
14 siblings, 1 reply; 18+ messages in thread
From: Chuck Lever @ 2024-01-26 17:45 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
Normally, NFSv4 callback operations are supposed to be sent to the
client as soon as they are queued up.
In a moment, I will introduce a recovery path where the server has
to wait for the client to reconnect. We don't want a hard busy wait
here -- the callback should be requeued to try again in several
milliseconds.
For now, convert nfsd4_callback from struct work_struct to struct
delayed_work, and queue with a zero delay argument. This should
avoid behavior changes for current operation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4callback.c | 6 +++---
fs/nfsd/state.h | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 43b0a34a5d5b..1ed2512b3648 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -887,7 +887,7 @@ static struct workqueue_struct *callback_wq;
static bool nfsd4_queue_cb(struct nfsd4_callback *cb)
{
- return queue_work(callback_wq, &cb->cb_work);
+ return queue_delayed_work(callback_wq, &cb->cb_work, 0);
}
static void nfsd41_cb_inflight_begin(struct nfs4_client *clp)
@@ -1370,7 +1370,7 @@ static void
nfsd4_run_cb_work(struct work_struct *work)
{
struct nfsd4_callback *cb =
- container_of(work, struct nfsd4_callback, cb_work);
+ container_of(work, struct nfsd4_callback, cb_work.work);
struct nfs4_client *clp = cb->cb_clp;
struct rpc_clnt *clnt;
int flags;
@@ -1415,7 +1415,7 @@ void nfsd4_init_cb(struct nfsd4_callback *cb, struct nfs4_client *clp,
cb->cb_msg.rpc_argp = cb;
cb->cb_msg.rpc_resp = cb;
cb->cb_ops = ops;
- INIT_WORK(&cb->cb_work, nfsd4_run_cb_work);
+ INIT_DELAYED_WORK(&cb->cb_work, nfsd4_run_cb_work);
cb->cb_seq_status = 1;
cb->cb_status = 0;
cb->cb_need_restart = false;
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 41bdc913fa71..87c4372ba36a 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -68,7 +68,7 @@ struct nfsd4_callback {
struct nfs4_client *cb_clp;
struct rpc_message cb_msg;
const struct nfsd4_callback_ops *cb_ops;
- struct work_struct cb_work;
+ struct delayed_work cb_work;
int cb_seq_status;
int cb_status;
bool cb_need_restart;
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH 2 02/14] NFSD: Convert the callback workqueue to use delayed_work
2024-01-26 17:45 ` [PATCH 2 02/14] NFSD: Convert the callback workqueue to use delayed_work Chuck Lever
@ 2024-01-26 19:09 ` Jeff Layton
0 siblings, 0 replies; 18+ messages in thread
From: Jeff Layton @ 2024-01-26 19:09 UTC (permalink / raw)
To: Chuck Lever, linux-nfs
On Fri, 2024-01-26 at 12:45 -0500, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> Normally, NFSv4 callback operations are supposed to be sent to the
> client as soon as they are queued up.
>
> In a moment, I will introduce a recovery path where the server has
> to wait for the client to reconnect. We don't want a hard busy wait
> here -- the callback should be requeued to try again in several
> milliseconds.
>
> For now, convert nfsd4_callback from struct work_struct to struct
> delayed_work, and queue with a zero delay argument. This should
> avoid behavior changes for current operation.
>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> fs/nfsd/nfs4callback.c | 6 +++---
> fs/nfsd/state.h | 2 +-
> 2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 43b0a34a5d5b..1ed2512b3648 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -887,7 +887,7 @@ static struct workqueue_struct *callback_wq;
>
> static bool nfsd4_queue_cb(struct nfsd4_callback *cb)
> {
> - return queue_work(callback_wq, &cb->cb_work);
> + return queue_delayed_work(callback_wq, &cb->cb_work, 0);
> }
>
> static void nfsd41_cb_inflight_begin(struct nfs4_client *clp)
> @@ -1370,7 +1370,7 @@ static void
> nfsd4_run_cb_work(struct work_struct *work)
> {
> struct nfsd4_callback *cb =
> - container_of(work, struct nfsd4_callback, cb_work);
> + container_of(work, struct nfsd4_callback, cb_work.work);
> struct nfs4_client *clp = cb->cb_clp;
> struct rpc_clnt *clnt;
> int flags;
> @@ -1415,7 +1415,7 @@ void nfsd4_init_cb(struct nfsd4_callback *cb, struct nfs4_client *clp,
> cb->cb_msg.rpc_argp = cb;
> cb->cb_msg.rpc_resp = cb;
> cb->cb_ops = ops;
> - INIT_WORK(&cb->cb_work, nfsd4_run_cb_work);
> + INIT_DELAYED_WORK(&cb->cb_work, nfsd4_run_cb_work);
> cb->cb_seq_status = 1;
> cb->cb_status = 0;
> cb->cb_need_restart = false;
> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> index 41bdc913fa71..87c4372ba36a 100644
> --- a/fs/nfsd/state.h
> +++ b/fs/nfsd/state.h
> @@ -68,7 +68,7 @@ struct nfsd4_callback {
> struct nfs4_client *cb_clp;
> struct rpc_message cb_msg;
> const struct nfsd4_callback_ops *cb_ops;
> - struct work_struct cb_work;
> + struct delayed_work cb_work;
> int cb_seq_status;
> int cb_status;
> bool cb_need_restart;
>
>
>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2 03/14] NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down
2024-01-26 17:45 [PATCH 2 00/14] NFSD backchannel fixes Chuck Lever
2024-01-26 17:45 ` [PATCH 2 01/14] NFSD: Reset cb_seq_status after NFS4ERR_DELAY Chuck Lever
2024-01-26 17:45 ` [PATCH 2 02/14] NFSD: Convert the callback workqueue to use delayed_work Chuck Lever
@ 2024-01-26 17:45 ` Chuck Lever
2024-01-26 19:10 ` Jeff Layton
2024-01-26 17:45 ` [PATCH 2 04/14] NFSD: Retransmit callbacks after client reconnects Chuck Lever
` (11 subsequent siblings)
14 siblings, 1 reply; 18+ messages in thread
From: Chuck Lever @ 2024-01-26 17:45 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
As part of managing a client disconnect, NFSD closes down and
replaces the backchannel rpc_clnt.
If a callback operation is pending when the backchannel rpc_clnt is
shut down, currently nfsd4_run_cb_work() just discards that
callback. But there are multiple cases to deal with here:
o The client's lease is getting destroyed. Throw the CB away.
o The client disconnected. It might be forcing a retransmit of
CB operations, or it could have disconnected for other reasons.
Reschedule the CB so it is retransmitted when the client
reconnects.
Since callback operations can now be rescheduled, ensure that
cb_ops->prepare can be called only once by moving the
cb_ops->prepare paragraph down to just before the rpc_call_async()
call.
Fixes: 2bbfed98a4d8 ("nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4callback.c | 32 +++++++++++++++++++++++---------
1 file changed, 23 insertions(+), 9 deletions(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 1ed2512b3648..389d05985c52 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -890,6 +890,13 @@ static bool nfsd4_queue_cb(struct nfsd4_callback *cb)
return queue_delayed_work(callback_wq, &cb->cb_work, 0);
}
+static void nfsd4_queue_cb_delayed(struct nfsd4_callback *cb,
+ unsigned long msecs)
+{
+ queue_delayed_work(callback_wq, &cb->cb_work,
+ msecs_to_jiffies(msecs));
+}
+
static void nfsd41_cb_inflight_begin(struct nfs4_client *clp)
{
atomic_inc(&clp->cl_cb_inflight);
@@ -1375,20 +1382,21 @@ nfsd4_run_cb_work(struct work_struct *work)
struct rpc_clnt *clnt;
int flags;
- if (cb->cb_need_restart) {
- cb->cb_need_restart = false;
- } else {
- if (cb->cb_ops && cb->cb_ops->prepare)
- cb->cb_ops->prepare(cb);
- }
-
if (clp->cl_flags & NFSD4_CLIENT_CB_FLAG_MASK)
nfsd4_process_cb_update(cb);
clnt = clp->cl_cb_client;
if (!clnt) {
- /* Callback channel broken, or client killed; give up: */
- nfsd41_destroy_cb(cb);
+ if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags))
+ nfsd41_destroy_cb(cb);
+ else {
+ /*
+ * XXX: Ideally, we could wait for the client to
+ * reconnect, but I haven't figured out how
+ * to do that yet.
+ */
+ nfsd4_queue_cb_delayed(cb, 25);
+ }
return;
}
@@ -1401,6 +1409,12 @@ nfsd4_run_cb_work(struct work_struct *work)
return;
}
+ if (cb->cb_need_restart) {
+ cb->cb_need_restart = false;
+ } else {
+ if (cb->cb_ops && cb->cb_ops->prepare)
+ cb->cb_ops->prepare(cb);
+ }
cb->cb_msg.rpc_cred = clp->cl_cb_cred;
flags = clp->cl_minorversion ? RPC_TASK_NOCONNECT : RPC_TASK_SOFTCONN;
rpc_call_async(clnt, &cb->cb_msg, RPC_TASK_SOFT | flags,
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH 2 03/14] NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down
2024-01-26 17:45 ` [PATCH 2 03/14] NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down Chuck Lever
@ 2024-01-26 19:10 ` Jeff Layton
0 siblings, 0 replies; 18+ messages in thread
From: Jeff Layton @ 2024-01-26 19:10 UTC (permalink / raw)
To: Chuck Lever, linux-nfs
On Fri, 2024-01-26 at 12:45 -0500, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> As part of managing a client disconnect, NFSD closes down and
> replaces the backchannel rpc_clnt.
>
> If a callback operation is pending when the backchannel rpc_clnt is
> shut down, currently nfsd4_run_cb_work() just discards that
> callback. But there are multiple cases to deal with here:
>
> o The client's lease is getting destroyed. Throw the CB away.
>
> o The client disconnected. It might be forcing a retransmit of
> CB operations, or it could have disconnected for other reasons.
> Reschedule the CB so it is retransmitted when the client
> reconnects.
>
> Since callback operations can now be rescheduled, ensure that
> cb_ops->prepare can be called only once by moving the
> cb_ops->prepare paragraph down to just before the rpc_call_async()
> call.
>
> Fixes: 2bbfed98a4d8 ("nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback()")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> fs/nfsd/nfs4callback.c | 32 +++++++++++++++++++++++---------
> 1 file changed, 23 insertions(+), 9 deletions(-)
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 1ed2512b3648..389d05985c52 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -890,6 +890,13 @@ static bool nfsd4_queue_cb(struct nfsd4_callback *cb)
> return queue_delayed_work(callback_wq, &cb->cb_work, 0);
> }
>
> +static void nfsd4_queue_cb_delayed(struct nfsd4_callback *cb,
> + unsigned long msecs)
> +{
> + queue_delayed_work(callback_wq, &cb->cb_work,
> + msecs_to_jiffies(msecs));
> +}
> +
> static void nfsd41_cb_inflight_begin(struct nfs4_client *clp)
> {
> atomic_inc(&clp->cl_cb_inflight);
> @@ -1375,20 +1382,21 @@ nfsd4_run_cb_work(struct work_struct *work)
> struct rpc_clnt *clnt;
> int flags;
>
> - if (cb->cb_need_restart) {
> - cb->cb_need_restart = false;
> - } else {
> - if (cb->cb_ops && cb->cb_ops->prepare)
> - cb->cb_ops->prepare(cb);
> - }
> -
> if (clp->cl_flags & NFSD4_CLIENT_CB_FLAG_MASK)
> nfsd4_process_cb_update(cb);
>
> clnt = clp->cl_cb_client;
> if (!clnt) {
> - /* Callback channel broken, or client killed; give up: */
> - nfsd41_destroy_cb(cb);
> + if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags))
> + nfsd41_destroy_cb(cb);
> + else {
> + /*
> + * XXX: Ideally, we could wait for the client to
> + * reconnect, but I haven't figured out how
> + * to do that yet.
> + */
> + nfsd4_queue_cb_delayed(cb, 25);
> + }
> return;
> }
>
> @@ -1401,6 +1409,12 @@ nfsd4_run_cb_work(struct work_struct *work)
> return;
> }
>
> + if (cb->cb_need_restart) {
> + cb->cb_need_restart = false;
> + } else {
> + if (cb->cb_ops && cb->cb_ops->prepare)
> + cb->cb_ops->prepare(cb);
> + }
> cb->cb_msg.rpc_cred = clp->cl_cb_cred;
> flags = clp->cl_minorversion ? RPC_TASK_NOCONNECT : RPC_TASK_SOFTCONN;
> rpc_call_async(clnt, &cb->cb_msg, RPC_TASK_SOFT | flags,
>
>
>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2 04/14] NFSD: Retransmit callbacks after client reconnects
2024-01-26 17:45 [PATCH 2 00/14] NFSD backchannel fixes Chuck Lever
` (2 preceding siblings ...)
2024-01-26 17:45 ` [PATCH 2 03/14] NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down Chuck Lever
@ 2024-01-26 17:45 ` Chuck Lever
2024-01-26 17:45 ` [PATCH 2 05/14] NFSD: Add nfsd_seq4_status trace event Chuck Lever
` (10 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Chuck Lever @ 2024-01-26 17:45 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
NFSv4.1 clients assume that if they disconnect, that will force the
server to resend pending callback operations once a fresh connection
has been established.
Turns out NFSD has not been resending after reconnect.
Fixes: 7ba6cad6c88f ("nfsd: New helper nfsd4_cb_sequence_done() for processing more cb errors")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4callback.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 389d05985c52..3bff14241b3c 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1178,12 +1178,21 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
break;
case -ESERVERFAULT:
++session->se_cb_seq_nr;
- fallthrough;
+ nfsd4_mark_cb_fault(cb->cb_clp, cb->cb_seq_status);
+ ret = false;
+ break;
case 1:
+ /*
+ * cb_seq_status remains 1 if an RPC Reply was never
+ * received. NFSD can't know if the client processed
+ * the CB_SEQUENCE operation. Ask the client to send a
+ * DESTROY_SESSION to recover.
+ */
+ fallthrough;
case -NFS4ERR_BADSESSION:
nfsd4_mark_cb_fault(cb->cb_clp, cb->cb_seq_status);
ret = false;
- break;
+ goto need_restart;
case -NFS4ERR_DELAY:
cb->cb_seq_status = 1;
if (!rpc_restart_call(task))
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 2 05/14] NFSD: Add nfsd_seq4_status trace event
2024-01-26 17:45 [PATCH 2 00/14] NFSD backchannel fixes Chuck Lever
` (3 preceding siblings ...)
2024-01-26 17:45 ` [PATCH 2 04/14] NFSD: Retransmit callbacks after client reconnects Chuck Lever
@ 2024-01-26 17:45 ` Chuck Lever
2024-01-26 17:45 ` [PATCH 2 06/14] NFSD: Replace dprintks in nfsd4_cb_sequence_done() Chuck Lever
` (9 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Chuck Lever @ 2024-01-26 17:45 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
Add a trace point that records SEQ4_STATUS flags returned in an
NFSv4.1 SEQUENCE response. SEQ4_STATUS flags report backchannel
issues and changes to lease state to clients. Knowing what the
server is reporting to clients is useful for debugging both
configuration and operational issues in real time.
For example, upcoming patches will enable server administrators to
revoke parts of a client's lease; that revocation is indicated to
the client when a subsequent SEQUENCE operation has one or more
SEQ4_STATUS flags that are set.
Sample trace records:
nfsd-927 [006] 615.581821: nfsd_seq4_status: xid=0x095ded07 sessionid=65a032c3:b7845faf:00000001:00000000 status_flags=BACKCHANNEL_FAULT
nfsd-927 [006] 615.588043: nfsd_seq4_status: xid=0x0a5ded07 sessionid=65a032c3:b7845faf:00000001:00000000 status_flags=BACKCHANNEL_FAULT
nfsd-928 [003] 615.588448: nfsd_seq4_status: xid=0x0b5ded07 sessionid=65a032c3:b7845faf:00000001:00000000 status_flags=BACKCHANNEL_FAULT
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4state.c | 1 +
fs/nfsd/trace.h | 35 +++++++++++++++++++++++++++++++++++
2 files changed, 36 insertions(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index d7d561b29fb0..3d6e3dcfdee1 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4058,6 +4058,7 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
}
if (!list_empty(&clp->cl_revoked))
seq->status_flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED;
+ trace_nfsd_seq4_status(rqstp, seq);
out_no_session:
if (conn)
free_conn(conn);
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index d1e8cf079b0f..38d11b43779c 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -696,6 +696,41 @@ DEFINE_EVENT(nfsd_stid_class, nfsd_stid_##name, \
DEFINE_STID_EVENT(revoke);
+TRACE_EVENT_CONDITION(nfsd_seq4_status,
+ TP_PROTO(
+ const struct svc_rqst *rqstp,
+ const struct nfsd4_sequence *sequence
+ ),
+ TP_ARGS(rqstp, sequence),
+ TP_CONDITION(sequence->status_flags),
+ TP_STRUCT__entry(
+ __field(unsigned int, netns_ino)
+ __field(u32, xid)
+ __field(u32, cl_boot)
+ __field(u32, cl_id)
+ __field(u32, seqno)
+ __field(u32, reserved)
+ __field(unsigned long, status_flags)
+ ),
+ TP_fast_assign(
+ const struct nfsd4_sessionid *sid =
+ (struct nfsd4_sessionid *)&sequence->sessionid;
+
+ __entry->netns_ino = SVC_NET(rqstp)->ns.inum;
+ __entry->xid = be32_to_cpu(rqstp->rq_xid);
+ __entry->cl_boot = sid->clientid.cl_boot;
+ __entry->cl_id = sid->clientid.cl_id;
+ __entry->seqno = sid->sequence;
+ __entry->reserved = sid->reserved;
+ __entry->status_flags = sequence->status_flags;
+ ),
+ TP_printk("xid=0x%08x sessionid=%08x:%08x:%08x:%08x status_flags=%s",
+ __entry->xid, __entry->cl_boot, __entry->cl_id,
+ __entry->seqno, __entry->reserved,
+ show_nfs4_seq4_status(__entry->status_flags)
+ )
+);
+
DECLARE_EVENT_CLASS(nfsd_clientid_class,
TP_PROTO(const clientid_t *clid),
TP_ARGS(clid),
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 2 06/14] NFSD: Replace dprintks in nfsd4_cb_sequence_done()
2024-01-26 17:45 [PATCH 2 00/14] NFSD backchannel fixes Chuck Lever
` (4 preceding siblings ...)
2024-01-26 17:45 ` [PATCH 2 05/14] NFSD: Add nfsd_seq4_status trace event Chuck Lever
@ 2024-01-26 17:45 ` Chuck Lever
2024-01-26 17:45 ` [PATCH 2 07/14] NFSD: Rename nfsd_cb_state trace point Chuck Lever
` (8 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Chuck Lever @ 2024-01-26 17:45 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
Improve observability of backchannel session operation.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4callback.c | 9 ++---
fs/nfsd/trace.h | 82 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 86 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 3bff14241b3c..78d9939cf4b0 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1165,6 +1165,8 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
if (!cb->cb_holds_slot)
goto need_restart;
+ /* This is the operation status code for CB_SEQUENCE */
+ trace_nfsd_cb_seq_status(task, cb);
switch (cb->cb_seq_status) {
case 0:
/*
@@ -1210,13 +1212,10 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
break;
default:
nfsd4_mark_cb_fault(cb->cb_clp, cb->cb_seq_status);
- dprintk("%s: unprocessed error %d\n", __func__,
- cb->cb_seq_status);
}
-
nfsd41_cb_release_slot(cb);
- dprintk("%s: freed slot, new seqid=%d\n", __func__,
- clp->cl_cb_session->se_cb_seq_nr);
+
+ trace_nfsd_cb_free_slot(task, cb);
if (RPC_SIGNALLED(task))
goto need_restart;
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 38d11b43779c..c134c755ae5d 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -9,8 +9,10 @@
#define _NFSD_TRACE_H
#include <linux/tracepoint.h>
+#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/xprt.h>
#include <trace/misc/nfs.h>
+#include <trace/misc/sunrpc.h>
#include "export.h"
#include "nfsfh.h"
@@ -1440,6 +1442,86 @@ TRACE_EVENT(nfsd_cb_setup_err,
__entry->error)
);
+TRACE_EVENT(nfsd_cb_seq_status,
+ TP_PROTO(
+ const struct rpc_task *task,
+ const struct nfsd4_callback *cb
+ ),
+ TP_ARGS(task, cb),
+ TP_STRUCT__entry(
+ __field(unsigned int, task_id)
+ __field(unsigned int, client_id)
+ __field(u32, cl_boot)
+ __field(u32, cl_id)
+ __field(u32, seqno)
+ __field(u32, reserved)
+ __field(int, tk_status)
+ __field(int, seq_status)
+ ),
+ TP_fast_assign(
+ const struct nfs4_client *clp = cb->cb_clp;
+ const struct nfsd4_session *session = clp->cl_cb_session;
+ const struct nfsd4_sessionid *sid =
+ (struct nfsd4_sessionid *)&session->se_sessionid;
+
+ __entry->task_id = task->tk_pid;
+ __entry->client_id = task->tk_client ?
+ task->tk_client->cl_clid : -1;
+ __entry->cl_boot = sid->clientid.cl_boot;
+ __entry->cl_id = sid->clientid.cl_id;
+ __entry->seqno = sid->sequence;
+ __entry->reserved = sid->reserved;
+ __entry->tk_status = task->tk_status;
+ __entry->seq_status = cb->cb_seq_status;
+ ),
+ TP_printk(SUNRPC_TRACE_TASK_SPECIFIER
+ " sessionid=%08x:%08x:%08x:%08x tk_status=%d seq_status=%d\n",
+ __entry->task_id, __entry->client_id,
+ __entry->cl_boot, __entry->cl_id,
+ __entry->seqno, __entry->reserved,
+ __entry->tk_status, __entry->seq_status
+ )
+);
+
+TRACE_EVENT(nfsd_cb_free_slot,
+ TP_PROTO(
+ const struct rpc_task *task,
+ const struct nfsd4_callback *cb
+ ),
+ TP_ARGS(task, cb),
+ TP_STRUCT__entry(
+ __field(unsigned int, task_id)
+ __field(unsigned int, client_id)
+ __field(u32, cl_boot)
+ __field(u32, cl_id)
+ __field(u32, seqno)
+ __field(u32, reserved)
+ __field(u32, slot_seqno)
+ ),
+ TP_fast_assign(
+ const struct nfs4_client *clp = cb->cb_clp;
+ const struct nfsd4_session *session = clp->cl_cb_session;
+ const struct nfsd4_sessionid *sid =
+ (struct nfsd4_sessionid *)&session->se_sessionid;
+
+ __entry->task_id = task->tk_pid;
+ __entry->client_id = task->tk_client ?
+ task->tk_client->cl_clid : -1;
+ __entry->cl_boot = sid->clientid.cl_boot;
+ __entry->cl_id = sid->clientid.cl_id;
+ __entry->seqno = sid->sequence;
+ __entry->reserved = sid->reserved;
+ __entry->slot_seqno = session->se_cb_seq_nr;
+ ),
+ TP_printk(SUNRPC_TRACE_TASK_SPECIFIER
+ " sessionid=%08x:%08x:%08x:%08x new slot seqno=%u\n",
+ __entry->task_id, __entry->client_id,
+ __entry->cl_boot, __entry->cl_id,
+ __entry->seqno, __entry->reserved,
+ __entry->slot_seqno
+ )
+);
+
TRACE_EVENT_CONDITION(nfsd_cb_recall,
TP_PROTO(
const struct nfs4_stid *stid
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 2 07/14] NFSD: Rename nfsd_cb_state trace point
2024-01-26 17:45 [PATCH 2 00/14] NFSD backchannel fixes Chuck Lever
` (5 preceding siblings ...)
2024-01-26 17:45 ` [PATCH 2 06/14] NFSD: Replace dprintks in nfsd4_cb_sequence_done() Chuck Lever
@ 2024-01-26 17:45 ` Chuck Lever
2024-01-26 17:46 ` [PATCH 2 08/14] NFSD: Add callback operation lifetime trace points Chuck Lever
` (7 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Chuck Lever @ 2024-01-26 17:45 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
Make it clear where backchannel state is updated.
Example trace point output:
kworker/u16:0-10 [006] 2800.080404: nfsd_cb_new_state: addr=192.168.122.6:0 client 65b3c5b8:f541f749 state=UP
nfsd-940 [003] 2800.478368: nfsd_cb_new_state: addr=192.168.122.6:0 client 65b3c5b8:f541f749 state=UNKNOWN
kworker/u16:0-10 [003] 2800.478828: nfsd_cb_new_state: addr=192.168.122.6:0 client 65b3c5b8:f541f749 state=DOWN
kworker/u16:0-10 [005] 2802.039724: nfsd_cb_start: addr=192.168.122.6:0 client 65b3c5b8:f541f749 state=UP
kworker/u16:0-10 [005] 2810.611452: nfsd_cb_start: addr=192.168.122.6:0 client 65b3c5b8:f541f749 state=FAULT
kworker/u16:0-10 [005] 2810.616832: nfsd_cb_start: addr=192.168.122.6:0 client 65b3c5b8:f541f749 state=UNKNOWN
kworker/u16:0-10 [005] 2810.616931: nfsd_cb_start: addr=192.168.122.6:0 client 65b3c5b8:f541f749 state=DOWN
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4callback.c | 4 +++-
fs/nfsd/trace.h | 3 ++-
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 78d9939cf4b0..a63171ccfc2b 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1006,7 +1006,7 @@ static void nfsd4_mark_cb_state(struct nfs4_client *clp, int newstate)
{
if (clp->cl_cb_state != newstate) {
clp->cl_cb_state = newstate;
- trace_nfsd_cb_state(clp);
+ trace_nfsd_cb_new_state(clp);
}
}
@@ -1390,6 +1390,8 @@ nfsd4_run_cb_work(struct work_struct *work)
struct rpc_clnt *clnt;
int flags;
+ trace_nfsd_cb_start(clp);
+
if (clp->cl_flags & NFSD4_CLIENT_CB_FLAG_MASK)
nfsd4_process_cb_update(cb);
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index c134c755ae5d..6003af2bee33 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -1371,7 +1371,8 @@ DEFINE_EVENT(nfsd_cb_class, nfsd_cb_##name, \
TP_PROTO(const struct nfs4_client *clp), \
TP_ARGS(clp))
-DEFINE_NFSD_CB_EVENT(state);
+DEFINE_NFSD_CB_EVENT(start);
+DEFINE_NFSD_CB_EVENT(new_state);
DEFINE_NFSD_CB_EVENT(probe);
DEFINE_NFSD_CB_EVENT(lost);
DEFINE_NFSD_CB_EVENT(shutdown);
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 2 08/14] NFSD: Add callback operation lifetime trace points
2024-01-26 17:45 [PATCH 2 00/14] NFSD backchannel fixes Chuck Lever
` (6 preceding siblings ...)
2024-01-26 17:45 ` [PATCH 2 07/14] NFSD: Rename nfsd_cb_state trace point Chuck Lever
@ 2024-01-26 17:46 ` Chuck Lever
2024-01-26 17:46 ` [PATCH 2 09/14] SUNRPC: Remove EXPORT_SYMBOL_GPL for svc_process_bc() Chuck Lever
` (6 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Chuck Lever @ 2024-01-26 17:46 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
Help observe the flow of callback operations.
bc_shutdown() records exactly when the backchannel RPC client is
destroyed and cl_cb_client is replaced with NULL.
Examples include:
nfsd-955 [004] 650.013997: nfsd_cb_queue: addr=192.168.122.6:0 client 65b3c5b8:f541f749 cb=0xffff8881134b02f8 (first try)
kworker/u21:4-497 [004] 650.014050: nfsd_cb_seq_status: task:00000001@00000001 sessionid=65b3c5b8:f541f749:00000001:00000000 tk_status=-107 seq_status=1
kworker/u21:4-497 [004] 650.014051: nfsd_cb_restart: addr=192.168.122.6:0 client 65b3c5b8:f541f749 cb=0xffff88810e39f400 (first try)
kworker/u21:4-497 [004] 650.014066: nfsd_cb_queue: addr=192.168.122.6:0 client 65b3c5b8:f541f749 cb=0xffff88810e39f400 (need restart)
kworker/u16:0-10 [006] 650.065750: nfsd_cb_start: addr=192.168.122.6:0 client 65b3c5b8:f541f749 state=UNKNOWN
kworker/u16:0-10 [006] 650.065752: nfsd_cb_bc_update: addr=192.168.122.6:0 client 65b3c5b8:f541f749 cb=0xffff8881134b02f8 (first try)
kworker/u16:0-10 [006] 650.065754: nfsd_cb_bc_shutdown: addr=192.168.122.6:0 client 65b3c5b8:f541f749 cb=0xffff8881134b02f8 (first try)
kworker/u16:0-10 [006] 650.065810: nfsd_cb_new_state: addr=192.168.122.6:0 client 65b3c5b8:f541f749 state=DOWN
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4callback.c | 8 ++++++++
fs/nfsd/trace.h | 42 ++++++++++++++++++++++++++++++++++++++++++
include/trace/misc/nfs.h | 34 ++++++++++++++++++++++++++++++++++
3 files changed, 84 insertions(+)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index a63171ccfc2b..b50ce54aa1bf 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -887,12 +887,14 @@ static struct workqueue_struct *callback_wq;
static bool nfsd4_queue_cb(struct nfsd4_callback *cb)
{
+ trace_nfsd_cb_queue(cb->cb_clp, cb);
return queue_delayed_work(callback_wq, &cb->cb_work, 0);
}
static void nfsd4_queue_cb_delayed(struct nfsd4_callback *cb,
unsigned long msecs)
{
+ trace_nfsd_cb_queue(cb->cb_clp, cb);
queue_delayed_work(callback_wq, &cb->cb_work,
msecs_to_jiffies(msecs));
}
@@ -1113,6 +1115,7 @@ static void nfsd41_destroy_cb(struct nfsd4_callback *cb)
{
struct nfs4_client *clp = cb->cb_clp;
+ trace_nfsd_cb_destroy(clp, cb);
nfsd41_cb_release_slot(cb);
if (cb->cb_ops && cb->cb_ops->release)
cb->cb_ops->release(cb);
@@ -1227,6 +1230,7 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
goto out;
need_restart:
if (!test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) {
+ trace_nfsd_cb_restart(clp, cb);
task->tk_status = 0;
cb->cb_need_restart = true;
}
@@ -1340,11 +1344,14 @@ static void nfsd4_process_cb_update(struct nfsd4_callback *cb)
struct nfsd4_conn *c;
int err;
+ trace_nfsd_cb_bc_update(clp, cb);
+
/*
* This is either an update, or the client dying; in either case,
* kill the old client:
*/
if (clp->cl_cb_client) {
+ trace_nfsd_cb_bc_shutdown(clp, cb);
rpc_shutdown_client(clp->cl_cb_client);
clp->cl_cb_client = NULL;
put_cred(clp->cl_cb_cred);
@@ -1356,6 +1363,7 @@ static void nfsd4_process_cb_update(struct nfsd4_callback *cb)
}
if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags))
return;
+
spin_lock(&clp->cl_lock);
/*
* Only serialized callback code is allowed to clear these
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 6003af2bee33..9f9e58debc26 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -1443,6 +1443,48 @@ TRACE_EVENT(nfsd_cb_setup_err,
__entry->error)
);
+DECLARE_EVENT_CLASS(nfsd_cb_lifetime_class,
+ TP_PROTO(
+ const struct nfs4_client *clp,
+ const struct nfsd4_callback *cb
+ ),
+ TP_ARGS(clp, cb),
+ TP_STRUCT__entry(
+ __field(u32, cl_boot)
+ __field(u32, cl_id)
+ __field(const void *, cb)
+ __field(bool, need_restart)
+ __sockaddr(addr, clp->cl_cb_conn.cb_addrlen)
+ ),
+ TP_fast_assign(
+ __entry->cl_boot = clp->cl_clientid.cl_boot;
+ __entry->cl_id = clp->cl_clientid.cl_id;
+ __entry->cb = cb;
+ __entry->need_restart = cb->cb_need_restart;
+ __assign_sockaddr(addr, &clp->cl_cb_conn.cb_addr,
+ clp->cl_cb_conn.cb_addrlen)
+ ),
+ TP_printk("addr=%pISpc client %08x:%08x cb=%p%s",
+ __get_sockaddr(addr), __entry->cl_boot, __entry->cl_id,
+ __entry->cb, __entry->need_restart ?
+ " (need restart)" : " (first try)"
+ )
+);
+
+#define DEFINE_NFSD_CB_LIFETIME_EVENT(name) \
+DEFINE_EVENT(nfsd_cb_lifetime_class, nfsd_cb_##name, \
+ TP_PROTO( \
+ const struct nfs4_client *clp, \
+ const struct nfsd4_callback *cb \
+ ), \
+ TP_ARGS(clp, cb))
+
+DEFINE_NFSD_CB_LIFETIME_EVENT(queue);
+DEFINE_NFSD_CB_LIFETIME_EVENT(destroy);
+DEFINE_NFSD_CB_LIFETIME_EVENT(restart);
+DEFINE_NFSD_CB_LIFETIME_EVENT(bc_update);
+DEFINE_NFSD_CB_LIFETIME_EVENT(bc_shutdown);
+
TRACE_EVENT(nfsd_cb_seq_status,
TP_PROTO(
const struct rpc_task *task,
diff --git a/include/trace/misc/nfs.h b/include/trace/misc/nfs.h
index 0d9d48dca38a..64ab5dac59ce 100644
--- a/include/trace/misc/nfs.h
+++ b/include/trace/misc/nfs.h
@@ -385,3 +385,37 @@ TRACE_DEFINE_ENUM(IOMODE_ANY);
{ SEQ4_STATUS_RESTART_RECLAIM_NEEDED, "RESTART_RECLAIM_NEEDED" }, \
{ SEQ4_STATUS_CB_PATH_DOWN_SESSION, "CB_PATH_DOWN_SESSION" }, \
{ SEQ4_STATUS_BACKCHANNEL_FAULT, "BACKCHANNEL_FAULT" })
+
+TRACE_DEFINE_ENUM(OP_CB_GETATTR);
+TRACE_DEFINE_ENUM(OP_CB_RECALL);
+TRACE_DEFINE_ENUM(OP_CB_LAYOUTRECALL);
+TRACE_DEFINE_ENUM(OP_CB_NOTIFY);
+TRACE_DEFINE_ENUM(OP_CB_PUSH_DELEG);
+TRACE_DEFINE_ENUM(OP_CB_RECALL_ANY);
+TRACE_DEFINE_ENUM(OP_CB_RECALLABLE_OBJ_AVAIL);
+TRACE_DEFINE_ENUM(OP_CB_RECALL_SLOT);
+TRACE_DEFINE_ENUM(OP_CB_SEQUENCE);
+TRACE_DEFINE_ENUM(OP_CB_WANTS_CANCELLED);
+TRACE_DEFINE_ENUM(OP_CB_NOTIFY_LOCK);
+TRACE_DEFINE_ENUM(OP_CB_NOTIFY_DEVICEID);
+TRACE_DEFINE_ENUM(OP_CB_OFFLOAD);
+TRACE_DEFINE_ENUM(OP_CB_ILLEGAL);
+
+#define show_nfs4_cb_op(x) \
+ __print_symbolic(x, \
+ { 0, "CB_NULL" }, \
+ { 1, "CB_COMPOUND" }, \
+ { OP_CB_GETATTR, "CB_GETATTR" }, \
+ { OP_CB_RECALL, "CB_RECALL" }, \
+ { OP_CB_LAYOUTRECALL, "CB_LAYOUTRECALL" }, \
+ { OP_CB_NOTIFY, "CB_NOTIFY" }, \
+ { OP_CB_PUSH_DELEG, "CB_PUSH_DELEG" }, \
+ { OP_CB_RECALL_ANY, "CB_RECALL_ANY" }, \
+ { OP_CB_RECALLABLE_OBJ_AVAIL, "CB_RECALLABLE_OBJ_AVAIL" }, \
+ { OP_CB_RECALL_SLOT, "CB_RECALL_SLOT" }, \
+ { OP_CB_SEQUENCE, "CB_SEQUENCE" }, \
+ { OP_CB_WANTS_CANCELLED, "CB_WANTS_CANCELLED" }, \
+ { OP_CB_NOTIFY_LOCK, "CB_NOTIFY_LOCK" }, \
+ { OP_CB_NOTIFY_DEVICEID, "CB_NOTIFY_DEVICEID" }, \
+ { OP_CB_OFFLOAD, "CB_OFFLOAD" }, \
+ { OP_CB_ILLEGAL, "CB_ILLEGAL" })
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 2 09/14] SUNRPC: Remove EXPORT_SYMBOL_GPL for svc_process_bc()
2024-01-26 17:45 [PATCH 2 00/14] NFSD backchannel fixes Chuck Lever
` (7 preceding siblings ...)
2024-01-26 17:46 ` [PATCH 2 08/14] NFSD: Add callback operation lifetime trace points Chuck Lever
@ 2024-01-26 17:46 ` Chuck Lever
2024-01-26 17:46 ` [PATCH 2 10/14] NFSD: Remove unused @reason argument Chuck Lever
` (5 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Chuck Lever @ 2024-01-26 17:46 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
svc_process_bc(), previously known as bc_svc_process(), was
added in commit 4d6bbb6233c9 ("nfs41: Backchannel bc_svc_process()")
but there has never been a call site outside of the sunrpc.ko
module.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/svc.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index eb5856e1351d..425c1ca9a772 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1603,7 +1603,6 @@ void svc_process_bc(struct rpc_rqst *req, struct svc_rqst *rqstp)
WARN_ON_ONCE(atomic_read(&task->tk_count) != 1);
rpc_put_task(task);
}
-EXPORT_SYMBOL_GPL(svc_process_bc);
#endif /* CONFIG_SUNRPC_BACKCHANNEL */
/**
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 2 10/14] NFSD: Remove unused @reason argument
2024-01-26 17:45 [PATCH 2 00/14] NFSD backchannel fixes Chuck Lever
` (8 preceding siblings ...)
2024-01-26 17:46 ` [PATCH 2 09/14] SUNRPC: Remove EXPORT_SYMBOL_GPL for svc_process_bc() Chuck Lever
@ 2024-01-26 17:46 ` Chuck Lever
2024-01-26 17:46 ` [PATCH 2 11/14] NFSD: Replace comment with lockdep assertion Chuck Lever
` (4 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Chuck Lever @ 2024-01-26 17:46 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4callback.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index b50ce54aa1bf..45a31f051595 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -45,7 +45,7 @@
#define NFSDDBG_FACILITY NFSDDBG_PROC
-static void nfsd4_mark_cb_fault(struct nfs4_client *, int reason);
+static void nfsd4_mark_cb_fault(struct nfs4_client *clp);
#define NFSPROC4_CB_NULL 0
#define NFSPROC4_CB_COMPOUND 1
@@ -1012,14 +1012,14 @@ static void nfsd4_mark_cb_state(struct nfs4_client *clp, int newstate)
}
}
-static void nfsd4_mark_cb_down(struct nfs4_client *clp, int reason)
+static void nfsd4_mark_cb_down(struct nfs4_client *clp)
{
if (test_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_flags))
return;
nfsd4_mark_cb_state(clp, NFSD4_CB_DOWN);
}
-static void nfsd4_mark_cb_fault(struct nfs4_client *clp, int reason)
+static void nfsd4_mark_cb_fault(struct nfs4_client *clp)
{
if (test_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_flags))
return;
@@ -1031,7 +1031,7 @@ static void nfsd4_cb_probe_done(struct rpc_task *task, void *calldata)
struct nfs4_client *clp = container_of(calldata, struct nfs4_client, cl_cb_null);
if (task->tk_status)
- nfsd4_mark_cb_down(clp, task->tk_status);
+ nfsd4_mark_cb_down(clp);
else
nfsd4_mark_cb_state(clp, NFSD4_CB_UP);
}
@@ -1183,7 +1183,7 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
break;
case -ESERVERFAULT:
++session->se_cb_seq_nr;
- nfsd4_mark_cb_fault(cb->cb_clp, cb->cb_seq_status);
+ nfsd4_mark_cb_fault(cb->cb_clp);
ret = false;
break;
case 1:
@@ -1195,7 +1195,7 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
*/
fallthrough;
case -NFS4ERR_BADSESSION:
- nfsd4_mark_cb_fault(cb->cb_clp, cb->cb_seq_status);
+ nfsd4_mark_cb_fault(cb->cb_clp);
ret = false;
goto need_restart;
case -NFS4ERR_DELAY:
@@ -1214,7 +1214,7 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
}
break;
default:
- nfsd4_mark_cb_fault(cb->cb_clp, cb->cb_seq_status);
+ nfsd4_mark_cb_fault(cb->cb_clp);
}
nfsd41_cb_release_slot(cb);
@@ -1260,7 +1260,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
case -EIO:
case -ETIMEDOUT:
case -EACCES:
- nfsd4_mark_cb_down(clp, task->tk_status);
+ nfsd4_mark_cb_down(clp);
}
break;
default:
@@ -1382,7 +1382,7 @@ static void nfsd4_process_cb_update(struct nfsd4_callback *cb)
err = setup_callback_client(clp, &conn, ses);
if (err) {
- nfsd4_mark_cb_down(clp, err);
+ nfsd4_mark_cb_down(clp);
if (c)
svc_xprt_put(c->cn_xprt);
return;
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 2 11/14] NFSD: Replace comment with lockdep assertion
2024-01-26 17:45 [PATCH 2 00/14] NFSD backchannel fixes Chuck Lever
` (9 preceding siblings ...)
2024-01-26 17:46 ` [PATCH 2 10/14] NFSD: Remove unused @reason argument Chuck Lever
@ 2024-01-26 17:46 ` Chuck Lever
2024-01-26 17:46 ` [PATCH 2 12/14] NFSD: Remove BUG_ON in nfsd4_process_cb_update() Chuck Lever
` (3 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Chuck Lever @ 2024-01-26 17:46 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
Convert a code comment into a real assertion.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4callback.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 45a31f051595..d73c66fa131d 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1315,12 +1315,13 @@ void nfsd4_shutdown_callback(struct nfs4_client *clp)
nfsd41_cb_inflight_wait_complete(clp);
}
-/* requires cl_lock: */
static struct nfsd4_conn * __nfsd4_find_backchannel(struct nfs4_client *clp)
{
struct nfsd4_session *s;
struct nfsd4_conn *c;
+ lockdep_assert_held(&clp->cl_lock);
+
list_for_each_entry(s, &clp->cl_sessions, se_perclnt) {
list_for_each_entry(c, &s->se_conns, cn_persession) {
if (c->cn_flags & NFS4_CDFC4_BACK)
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 2 12/14] NFSD: Remove BUG_ON in nfsd4_process_cb_update()
2024-01-26 17:45 [PATCH 2 00/14] NFSD backchannel fixes Chuck Lever
` (10 preceding siblings ...)
2024-01-26 17:46 ` [PATCH 2 11/14] NFSD: Replace comment with lockdep assertion Chuck Lever
@ 2024-01-26 17:46 ` Chuck Lever
2024-01-26 17:46 ` [PATCH 2 13/14] SUNRPC: Remove stale comments Chuck Lever
` (2 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Chuck Lever @ 2024-01-26 17:46 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
Don't kill the kworker thread, and don't panic while cl_lock is
held. There's no need for scorching the earth here.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4callback.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index d73c66fa131d..fd6a27e20f65 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1370,8 +1370,9 @@ static void nfsd4_process_cb_update(struct nfsd4_callback *cb)
* Only serialized callback code is allowed to clear these
* flags; main nfsd code can only set them:
*/
- BUG_ON(!(clp->cl_flags & NFSD4_CLIENT_CB_FLAG_MASK));
+ WARN_ON(!(clp->cl_flags & NFSD4_CLIENT_CB_FLAG_MASK));
clear_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_flags);
+
memcpy(&conn, &cb->cb_clp->cl_cb_conn, sizeof(struct nfs4_cb_conn));
c = __nfsd4_find_backchannel(clp);
if (c) {
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 2 13/14] SUNRPC: Remove stale comments
2024-01-26 17:45 [PATCH 2 00/14] NFSD backchannel fixes Chuck Lever
` (11 preceding siblings ...)
2024-01-26 17:46 ` [PATCH 2 12/14] NFSD: Remove BUG_ON in nfsd4_process_cb_update() Chuck Lever
@ 2024-01-26 17:46 ` Chuck Lever
2024-01-26 17:46 ` [PATCH 2 14/14] NFSD: Remove redundant cb_seq_status initialization Chuck Lever
2024-01-26 20:10 ` [PATCH 2 00/14] NFSD backchannel fixes Benjamin Coddington
14 siblings, 0 replies; 18+ messages in thread
From: Chuck Lever @ 2024-01-26 17:46 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
bc_close() and bc_destroy now do something, so the comments are
no longer correct. Commit 6221f1d9b63f ("SUNRPC: Fix backchannel
RPC soft lockups") should have removed these.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/xprtsock.c | 9 ---------
1 file changed, 9 deletions(-)
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 58f3dc8d0d71..d92c13e78a56 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2987,20 +2987,11 @@ static int bc_send_request(struct rpc_rqst *req)
return len;
}
-/*
- * The close routine. Since this is client initiated, we do nothing
- */
-
static void bc_close(struct rpc_xprt *xprt)
{
xprt_disconnect_done(xprt);
}
-/*
- * The xprt destroy routine. Again, because this connection is client
- * initiated, we do nothing
- */
-
static void bc_destroy(struct rpc_xprt *xprt)
{
dprintk("RPC: bc_destroy xprt %p\n", xprt);
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH 2 14/14] NFSD: Remove redundant cb_seq_status initialization
2024-01-26 17:45 [PATCH 2 00/14] NFSD backchannel fixes Chuck Lever
` (12 preceding siblings ...)
2024-01-26 17:46 ` [PATCH 2 13/14] SUNRPC: Remove stale comments Chuck Lever
@ 2024-01-26 17:46 ` Chuck Lever
2024-01-26 20:10 ` [PATCH 2 00/14] NFSD backchannel fixes Benjamin Coddington
14 siblings, 0 replies; 18+ messages in thread
From: Chuck Lever @ 2024-01-26 17:46 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
As far as I can see, setting cb_seq_status in nfsd4_init_cb() is
superfluous because it is set again in nfsd4_cb_prepare().
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4callback.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index fd6a27e20f65..32dd2fbb1f30 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1450,7 +1450,6 @@ void nfsd4_init_cb(struct nfsd4_callback *cb, struct nfs4_client *clp,
cb->cb_msg.rpc_resp = cb;
cb->cb_ops = ops;
INIT_DELAYED_WORK(&cb->cb_work, nfsd4_run_cb_work);
- cb->cb_seq_status = 1;
cb->cb_status = 0;
cb->cb_need_restart = false;
cb->cb_holds_slot = false;
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH 2 00/14] NFSD backchannel fixes
2024-01-26 17:45 [PATCH 2 00/14] NFSD backchannel fixes Chuck Lever
` (13 preceding siblings ...)
2024-01-26 17:46 ` [PATCH 2 14/14] NFSD: Remove redundant cb_seq_status initialization Chuck Lever
@ 2024-01-26 20:10 ` Benjamin Coddington
14 siblings, 0 replies; 18+ messages in thread
From: Benjamin Coddington @ 2024-01-26 20:10 UTC (permalink / raw)
To: Chuck Lever; +Cc: linux-nfs
On 26 Jan 2024, at 12:45, Chuck Lever wrote:
> The first four patches fix bugs that prevent NFSD's backchannel
> from reliably retransmitting after a client reconnects.
>
> Following that are some new trace points that might be helpful for
> field troubleshooting.
>
> Then there are some minor clean-ups.
>
> Changes since RFC:
> - Replace the msleep with queue_delayed_work
> - Refinements to patch descriptions
>
> ---
>
> Chuck Lever (14):
> NFSD: Reset cb_seq_status after NFS4ERR_DELAY
> NFSD: Convert the callback workqueue to use delayed_work
> NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down
> NFSD: Retransmit callbacks after client reconnects
> NFSD: Add nfsd_seq4_status trace event
> NFSD: Replace dprintks in nfsd4_cb_sequence_done()
> NFSD: Rename nfsd_cb_state trace point
> NFSD: Add callback operation lifetime trace points
> SUNRPC: Remove EXPORT_SYMBOL_GPL for svc_process_bc()
> NFSD: Remove unused @reason argument
> NFSD: Replace comment with lockdep assertion
> NFSD: Remove BUG_ON in nfsd4_process_cb_update()
> SUNRPC: Remove stale comments
> NFSD: Remove redundant cb_seq_status initialization
>
>
> fs/nfsd/nfs4callback.c | 94 +++++++++++++++--------
> fs/nfsd/nfs4state.c | 1 +
> fs/nfsd/state.h | 2 +-
> fs/nfsd/trace.h | 162 ++++++++++++++++++++++++++++++++++++++-
> include/trace/misc/nfs.h | 34 ++++++++
> net/sunrpc/svc.c | 1 -
> net/sunrpc/xprtsock.c | 9 ---
> 7 files changed, 261 insertions(+), 42 deletions(-)
>
> --
> Chuck Lever
These all make sense, even to me. For the series:
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Ben
^ permalink raw reply [flat|nested] 18+ messages in thread