* [PATCH RFC 00/13] NFSD backchannel fixes
@ 2024-01-25 16:28 Chuck Lever
2024-01-25 16:28 ` [PATCH RFC 01/13] NFSD: Reset cb_seq_status after NFS4ERR_DELAY Chuck Lever
` (14 more replies)
0 siblings, 15 replies; 24+ messages in thread
From: Chuck Lever @ 2024-01-25 16:28 UTC (permalink / raw)
To: linux-nfs
The first three patches fix bugs that prevent NFSD's backchannel
from reliably retransmitting after a client reconnects. These fixes
might be appropriate for 6.8-rc.
Following that are some new trace points that might be helpful for
field troubleshooting.
Then there are some minor clean-ups.
I am still testing this series, and there is one msleep() call that
needs some thought. Thoughts, comments, opinions, rotten fruit? You
know the drill.
---
Chuck Lever (13):
NFSD: Reset cb_seq_status after NFS4ERR_DELAY
NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down
NFSD: Retransmit callbacks after client reconnects
NFSD: Add nfsd_seq4_status trace event
NFSD: Replace dprintks in nfsd4_cb_sequence_done()
NFSD: Rename nfsd_cb_state trace point
NFSD: Add callback operation lifetime trace points
SUNRPC: Remove EXPORT_SYMBOL_GPL for svc_process_bc()
NFSD: Remove unused @reason argument
NFSD: Replace comment with lockdep assertion
NFSD: Remove BUG_ON in nfsd4_process_cb_update()
SUNRPC: Remove stale comments
NFSD: Remove redundant cb_seq_status initialization
fs/nfsd/nfs4callback.c | 81 +++++++++++++-------
fs/nfsd/nfs4state.c | 1 +
fs/nfsd/trace.h | 162 ++++++++++++++++++++++++++++++++++++++-
include/trace/misc/nfs.h | 34 ++++++++
net/sunrpc/svc.c | 1 -
net/sunrpc/xprtsock.c | 9 ---
6 files changed, 250 insertions(+), 38 deletions(-)
--
Chuck Lever
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH RFC 01/13] NFSD: Reset cb_seq_status after NFS4ERR_DELAY
2024-01-25 16:28 [PATCH RFC 00/13] NFSD backchannel fixes Chuck Lever
@ 2024-01-25 16:28 ` Chuck Lever
2024-01-25 20:07 ` Jeff Layton
2024-01-25 16:28 ` [PATCH RFC 02/13] NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down Chuck Lever
` (13 subsequent siblings)
14 siblings, 1 reply; 24+ messages in thread
From: Chuck Lever @ 2024-01-25 16:28 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
I noticed that once an NFSv4.1 callback operation gets a
NFS4ERR_DELAY status on CB_SEQUENCE and then the connection is lost,
the callback client loops, resending it indefinitely.
The switch arm in nfsd4_cb_sequence_done() that handles
NFS4ERR_DELAY uses rpc_restart_call() to rearm the RPC state machine
for the retransmit, but that path does not call the rpc_prepare_call
callback again. Thus cb_seq_status is set to -10008 by the first
NFS4ERR_DELAY result, but is never set back to 1 for the retransmits.
nfsd4_cb_sequence_done() thinks it's getting nothing but a
long series of CB_SEQUENCE NFS4ERR_DELAY replies.
Fixes: 7ba6cad6c88f ("nfsd: New helper nfsd4_cb_sequence_done() for processing more cb errors")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4callback.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 926c29879c6a..43b0a34a5d5b 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1178,6 +1178,7 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
ret = false;
break;
case -NFS4ERR_DELAY:
+ cb->cb_seq_status = 1;
if (!rpc_restart_call(task))
goto out;
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH RFC 02/13] NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down
2024-01-25 16:28 [PATCH RFC 00/13] NFSD backchannel fixes Chuck Lever
2024-01-25 16:28 ` [PATCH RFC 01/13] NFSD: Reset cb_seq_status after NFS4ERR_DELAY Chuck Lever
@ 2024-01-25 16:28 ` Chuck Lever
2024-01-25 20:19 ` Jeff Layton
2024-01-25 16:28 ` [PATCH RFC 03/13] NFSD: Retransmit callbacks after client reconnects Chuck Lever
` (12 subsequent siblings)
14 siblings, 1 reply; 24+ messages in thread
From: Chuck Lever @ 2024-01-25 16:28 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
As part of managing a client disconnect, NFSD closes down and
replaces the backchannel rpc_clnt.
If a callback operation is pending when the backchannel rpc_clnt is
shut down, currently nfsd4_run_cb_work() just discards that
callback. But there are multiple cases to deal with here:
o The client's lease is getting destroyed. Throw the CB away.
o The client disconnected. It might be forcing a retransmit of
CB operations, or it could have disconnected for other reasons.
Reschedule the CB so it is retransmitted when the client
reconnects.
Since callback operations can now be rescheduled, ensure that
cb_ops->prepare can be called only once by moving the
cb_ops->prepare paragraph down to just before the rpc_call_async()
call.
Fixes: 2bbfed98a4d8 ("nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4callback.c | 26 +++++++++++++++++---------
1 file changed, 17 insertions(+), 9 deletions(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 43b0a34a5d5b..b2844abcb51f 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1375,20 +1375,22 @@ nfsd4_run_cb_work(struct work_struct *work)
struct rpc_clnt *clnt;
int flags;
- if (cb->cb_need_restart) {
- cb->cb_need_restart = false;
- } else {
- if (cb->cb_ops && cb->cb_ops->prepare)
- cb->cb_ops->prepare(cb);
- }
-
if (clp->cl_flags & NFSD4_CLIENT_CB_FLAG_MASK)
nfsd4_process_cb_update(cb);
clnt = clp->cl_cb_client;
if (!clnt) {
- /* Callback channel broken, or client killed; give up: */
- nfsd41_destroy_cb(cb);
+ if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags))
+ nfsd41_destroy_cb(cb);
+ else {
+ /*
+ * XXX: Ideally, we would wait for the client to
+ * reconnect, but I haven't figured out how
+ * to do that yet.
+ */
+ msleep(30);
+ nfsd4_queue_cb(cb);
+ }
return;
}
@@ -1401,6 +1403,12 @@ nfsd4_run_cb_work(struct work_struct *work)
return;
}
+ if (cb->cb_need_restart) {
+ cb->cb_need_restart = false;
+ } else {
+ if (cb->cb_ops && cb->cb_ops->prepare)
+ cb->cb_ops->prepare(cb);
+ }
cb->cb_msg.rpc_cred = clp->cl_cb_cred;
flags = clp->cl_minorversion ? RPC_TASK_NOCONNECT : RPC_TASK_SOFTCONN;
rpc_call_async(clnt, &cb->cb_msg, RPC_TASK_SOFT | flags,
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH RFC 03/13] NFSD: Retransmit callbacks after client reconnects
2024-01-25 16:28 [PATCH RFC 00/13] NFSD backchannel fixes Chuck Lever
2024-01-25 16:28 ` [PATCH RFC 01/13] NFSD: Reset cb_seq_status after NFS4ERR_DELAY Chuck Lever
2024-01-25 16:28 ` [PATCH RFC 02/13] NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down Chuck Lever
@ 2024-01-25 16:28 ` Chuck Lever
2024-01-25 20:24 ` Jeff Layton
2024-01-25 16:29 ` [PATCH RFC 04/13] NFSD: Add nfsd_seq4_status trace event Chuck Lever
` (11 subsequent siblings)
14 siblings, 1 reply; 24+ messages in thread
From: Chuck Lever @ 2024-01-25 16:28 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
NFSv4.1 clients assume that if they disconnect, that will force the
server to resend pending callback operations once a fresh connection
has been established.
Turns out NFSD has not been resending after reconnect.
Fixes: 7ba6cad6c88f ("nfsd: New helper nfsd4_cb_sequence_done() for processing more cb errors")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4callback.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index b2844abcb51f..1ff64efe1f5c 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1171,12 +1171,21 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
break;
case -ESERVERFAULT:
++session->se_cb_seq_nr;
- fallthrough;
+ nfsd4_mark_cb_fault(cb->cb_clp, cb->cb_seq_status);
+ ret = false;
+ break;
case 1:
+ /*
+ * cb_seq_status remains 1 if an RPC Reply was never
+ * received. NFSD can't know if the client processed
+ * the CB_SEQUENCE operation. Ask the client to send a
+ * DESTROY_SESSION to recover.
+ */
+ fallthrough;
case -NFS4ERR_BADSESSION:
nfsd4_mark_cb_fault(cb->cb_clp, cb->cb_seq_status);
ret = false;
- break;
+ goto need_restart;
case -NFS4ERR_DELAY:
cb->cb_seq_status = 1;
if (!rpc_restart_call(task))
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH RFC 04/13] NFSD: Add nfsd_seq4_status trace event
2024-01-25 16:28 [PATCH RFC 00/13] NFSD backchannel fixes Chuck Lever
` (2 preceding siblings ...)
2024-01-25 16:28 ` [PATCH RFC 03/13] NFSD: Retransmit callbacks after client reconnects Chuck Lever
@ 2024-01-25 16:29 ` Chuck Lever
2024-01-25 16:29 ` [PATCH RFC 05/13] NFSD: Replace dprintks in nfsd4_cb_sequence_done() Chuck Lever
` (10 subsequent siblings)
14 siblings, 0 replies; 24+ messages in thread
From: Chuck Lever @ 2024-01-25 16:29 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
Add a trace point that records SEQ4_STATUS flags returned in an
NFSv4.1 SEQUENCE response. SEQ4_STATUS flags report backchannel
issues and changes to lease state to clients. Knowing what the
server is reporting to clients is useful for debugging both
configuration and operational issues in real time.
For example, upcoming patches will enable server administrators to
revoke parts of a client's lease; that revocation is indicated to
the client when a subsequent SEQUENCE operation has one or more
SEQ4_STATUS flags that are set.
Sample trace records:
nfsd-927 [006] 615.581821: nfsd_seq4_status: xid=0x095ded07 sessionid=65a032c3:b7845faf:00000001:00000000 status_flags=BACKCHANNEL_FAULT
nfsd-927 [006] 615.588043: nfsd_seq4_status: xid=0x0a5ded07 sessionid=65a032c3:b7845faf:00000001:00000000 status_flags=BACKCHANNEL_FAULT
nfsd-928 [003] 615.588448: nfsd_seq4_status: xid=0x0b5ded07 sessionid=65a032c3:b7845faf:00000001:00000000 status_flags=BACKCHANNEL_FAULT
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4state.c | 1 +
fs/nfsd/trace.h | 35 +++++++++++++++++++++++++++++++++++
2 files changed, 36 insertions(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index d7d561b29fb0..3d6e3dcfdee1 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4058,6 +4058,7 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
}
if (!list_empty(&clp->cl_revoked))
seq->status_flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED;
+ trace_nfsd_seq4_status(rqstp, seq);
out_no_session:
if (conn)
free_conn(conn);
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index d1e8cf079b0f..38d11b43779c 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -696,6 +696,41 @@ DEFINE_EVENT(nfsd_stid_class, nfsd_stid_##name, \
DEFINE_STID_EVENT(revoke);
+TRACE_EVENT_CONDITION(nfsd_seq4_status,
+ TP_PROTO(
+ const struct svc_rqst *rqstp,
+ const struct nfsd4_sequence *sequence
+ ),
+ TP_ARGS(rqstp, sequence),
+ TP_CONDITION(sequence->status_flags),
+ TP_STRUCT__entry(
+ __field(unsigned int, netns_ino)
+ __field(u32, xid)
+ __field(u32, cl_boot)
+ __field(u32, cl_id)
+ __field(u32, seqno)
+ __field(u32, reserved)
+ __field(unsigned long, status_flags)
+ ),
+ TP_fast_assign(
+ const struct nfsd4_sessionid *sid =
+ (struct nfsd4_sessionid *)&sequence->sessionid;
+
+ __entry->netns_ino = SVC_NET(rqstp)->ns.inum;
+ __entry->xid = be32_to_cpu(rqstp->rq_xid);
+ __entry->cl_boot = sid->clientid.cl_boot;
+ __entry->cl_id = sid->clientid.cl_id;
+ __entry->seqno = sid->sequence;
+ __entry->reserved = sid->reserved;
+ __entry->status_flags = sequence->status_flags;
+ ),
+ TP_printk("xid=0x%08x sessionid=%08x:%08x:%08x:%08x status_flags=%s",
+ __entry->xid, __entry->cl_boot, __entry->cl_id,
+ __entry->seqno, __entry->reserved,
+ show_nfs4_seq4_status(__entry->status_flags)
+ )
+);
+
DECLARE_EVENT_CLASS(nfsd_clientid_class,
TP_PROTO(const clientid_t *clid),
TP_ARGS(clid),
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH RFC 05/13] NFSD: Replace dprintks in nfsd4_cb_sequence_done()
2024-01-25 16:28 [PATCH RFC 00/13] NFSD backchannel fixes Chuck Lever
` (3 preceding siblings ...)
2024-01-25 16:29 ` [PATCH RFC 04/13] NFSD: Add nfsd_seq4_status trace event Chuck Lever
@ 2024-01-25 16:29 ` Chuck Lever
2024-01-25 16:29 ` [PATCH RFC 06/13] NFSD: Rename nfsd_cb_state trace point Chuck Lever
` (9 subsequent siblings)
14 siblings, 0 replies; 24+ messages in thread
From: Chuck Lever @ 2024-01-25 16:29 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
Improve observability of backchannel session operation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4callback.c | 9 ++---
fs/nfsd/trace.h | 82 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 86 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 1ff64efe1f5c..9f5aebeef83c 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1158,6 +1158,8 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
if (!cb->cb_holds_slot)
goto need_restart;
+ /* This is the operation status code for CB_SEQUENCE */
+ trace_nfsd_cb_seq_status(task, cb);
switch (cb->cb_seq_status) {
case 0:
/*
@@ -1203,13 +1205,10 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
break;
default:
nfsd4_mark_cb_fault(cb->cb_clp, cb->cb_seq_status);
- dprintk("%s: unprocessed error %d\n", __func__,
- cb->cb_seq_status);
}
-
nfsd41_cb_release_slot(cb);
- dprintk("%s: freed slot, new seqid=%d\n", __func__,
- clp->cl_cb_session->se_cb_seq_nr);
+
+ trace_nfsd_cb_free_slot(task, cb);
if (RPC_SIGNALLED(task))
goto need_restart;
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 38d11b43779c..c134c755ae5d 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -9,8 +9,10 @@
#define _NFSD_TRACE_H
#include <linux/tracepoint.h>
+#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/xprt.h>
#include <trace/misc/nfs.h>
+#include <trace/misc/sunrpc.h>
#include "export.h"
#include "nfsfh.h"
@@ -1440,6 +1442,86 @@ TRACE_EVENT(nfsd_cb_setup_err,
__entry->error)
);
+TRACE_EVENT(nfsd_cb_seq_status,
+ TP_PROTO(
+ const struct rpc_task *task,
+ const struct nfsd4_callback *cb
+ ),
+ TP_ARGS(task, cb),
+ TP_STRUCT__entry(
+ __field(unsigned int, task_id)
+ __field(unsigned int, client_id)
+ __field(u32, cl_boot)
+ __field(u32, cl_id)
+ __field(u32, seqno)
+ __field(u32, reserved)
+ __field(int, tk_status)
+ __field(int, seq_status)
+ ),
+ TP_fast_assign(
+ const struct nfs4_client *clp = cb->cb_clp;
+ const struct nfsd4_session *session = clp->cl_cb_session;
+ const struct nfsd4_sessionid *sid =
+ (struct nfsd4_sessionid *)&session->se_sessionid;
+
+ __entry->task_id = task->tk_pid;
+ __entry->client_id = task->tk_client ?
+ task->tk_client->cl_clid : -1;
+ __entry->cl_boot = sid->clientid.cl_boot;
+ __entry->cl_id = sid->clientid.cl_id;
+ __entry->seqno = sid->sequence;
+ __entry->reserved = sid->reserved;
+ __entry->tk_status = task->tk_status;
+ __entry->seq_status = cb->cb_seq_status;
+ ),
+ TP_printk(SUNRPC_TRACE_TASK_SPECIFIER
+ " sessionid=%08x:%08x:%08x:%08x tk_status=%d seq_status=%d\n",
+ __entry->task_id, __entry->client_id,
+ __entry->cl_boot, __entry->cl_id,
+ __entry->seqno, __entry->reserved,
+ __entry->tk_status, __entry->seq_status
+ )
+);
+
+TRACE_EVENT(nfsd_cb_free_slot,
+ TP_PROTO(
+ const struct rpc_task *task,
+ const struct nfsd4_callback *cb
+ ),
+ TP_ARGS(task, cb),
+ TP_STRUCT__entry(
+ __field(unsigned int, task_id)
+ __field(unsigned int, client_id)
+ __field(u32, cl_boot)
+ __field(u32, cl_id)
+ __field(u32, seqno)
+ __field(u32, reserved)
+ __field(u32, slot_seqno)
+ ),
+ TP_fast_assign(
+ const struct nfs4_client *clp = cb->cb_clp;
+ const struct nfsd4_session *session = clp->cl_cb_session;
+ const struct nfsd4_sessionid *sid =
+ (struct nfsd4_sessionid *)&session->se_sessionid;
+
+ __entry->task_id = task->tk_pid;
+ __entry->client_id = task->tk_client ?
+ task->tk_client->cl_clid : -1;
+ __entry->cl_boot = sid->clientid.cl_boot;
+ __entry->cl_id = sid->clientid.cl_id;
+ __entry->seqno = sid->sequence;
+ __entry->reserved = sid->reserved;
+ __entry->slot_seqno = session->se_cb_seq_nr;
+ ),
+ TP_printk(SUNRPC_TRACE_TASK_SPECIFIER
+ " sessionid=%08x:%08x:%08x:%08x new slot seqno=%u\n",
+ __entry->task_id, __entry->client_id,
+ __entry->cl_boot, __entry->cl_id,
+ __entry->seqno, __entry->reserved,
+ __entry->slot_seqno
+ )
+);
+
TRACE_EVENT_CONDITION(nfsd_cb_recall,
TP_PROTO(
const struct nfs4_stid *stid
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH RFC 06/13] NFSD: Rename nfsd_cb_state trace point
2024-01-25 16:28 [PATCH RFC 00/13] NFSD backchannel fixes Chuck Lever
` (4 preceding siblings ...)
2024-01-25 16:29 ` [PATCH RFC 05/13] NFSD: Replace dprintks in nfsd4_cb_sequence_done() Chuck Lever
@ 2024-01-25 16:29 ` Chuck Lever
2024-01-25 16:29 ` [PATCH RFC 07/13] NFSD: Add callback operation lifetime trace points Chuck Lever
` (8 subsequent siblings)
14 siblings, 0 replies; 24+ messages in thread
From: Chuck Lever @ 2024-01-25 16:29 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
Make it clear where backchannel state is updated.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4callback.c | 4 +++-
fs/nfsd/trace.h | 3 ++-
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 9f5aebeef83c..1c85426830b1 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -999,7 +999,7 @@ static void nfsd4_mark_cb_state(struct nfs4_client *clp, int newstate)
{
if (clp->cl_cb_state != newstate) {
clp->cl_cb_state = newstate;
- trace_nfsd_cb_state(clp);
+ trace_nfsd_cb_new_state(clp);
}
}
@@ -1383,6 +1383,8 @@ nfsd4_run_cb_work(struct work_struct *work)
struct rpc_clnt *clnt;
int flags;
+ trace_nfsd_cb_start(clp);
+
if (clp->cl_flags & NFSD4_CLIENT_CB_FLAG_MASK)
nfsd4_process_cb_update(cb);
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index c134c755ae5d..6003af2bee33 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -1371,7 +1371,8 @@ DEFINE_EVENT(nfsd_cb_class, nfsd_cb_##name, \
TP_PROTO(const struct nfs4_client *clp), \
TP_ARGS(clp))
-DEFINE_NFSD_CB_EVENT(state);
+DEFINE_NFSD_CB_EVENT(start);
+DEFINE_NFSD_CB_EVENT(new_state);
DEFINE_NFSD_CB_EVENT(probe);
DEFINE_NFSD_CB_EVENT(lost);
DEFINE_NFSD_CB_EVENT(shutdown);
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH RFC 07/13] NFSD: Add callback operation lifetime trace points
2024-01-25 16:28 [PATCH RFC 00/13] NFSD backchannel fixes Chuck Lever
` (5 preceding siblings ...)
2024-01-25 16:29 ` [PATCH RFC 06/13] NFSD: Rename nfsd_cb_state trace point Chuck Lever
@ 2024-01-25 16:29 ` Chuck Lever
2024-01-25 21:49 ` Benjamin Coddington
2024-01-25 16:29 ` [PATCH RFC 08/13] SUNRPC: Remove EXPORT_SYMBOL_GPL for svc_process_bc() Chuck Lever
` (7 subsequent siblings)
14 siblings, 1 reply; 24+ messages in thread
From: Chuck Lever @ 2024-01-25 16:29 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
Help observe the flow of callback operations.
bc_shutdown() records exactly when the backchannel RPC client is
destroyed and cl_cb_client is replaced with NULL.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4callback.c | 7 +++++++
fs/nfsd/trace.h | 42 ++++++++++++++++++++++++++++++++++++++++++
include/trace/misc/nfs.h | 34 ++++++++++++++++++++++++++++++++++
3 files changed, 83 insertions(+)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 1c85426830b1..4d5a6370b92c 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -887,6 +887,7 @@ static struct workqueue_struct *callback_wq;
static bool nfsd4_queue_cb(struct nfsd4_callback *cb)
{
+ trace_nfsd_cb_queue(cb->cb_clp, cb);
return queue_work(callback_wq, &cb->cb_work);
}
@@ -1106,6 +1107,7 @@ static void nfsd41_destroy_cb(struct nfsd4_callback *cb)
{
struct nfs4_client *clp = cb->cb_clp;
+ trace_nfsd_cb_destroy(clp, cb);
nfsd41_cb_release_slot(cb);
if (cb->cb_ops && cb->cb_ops->release)
cb->cb_ops->release(cb);
@@ -1220,6 +1222,7 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
goto out;
need_restart:
if (!test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) {
+ trace_nfsd_cb_restart(clp, cb);
task->tk_status = 0;
cb->cb_need_restart = true;
}
@@ -1333,11 +1336,14 @@ static void nfsd4_process_cb_update(struct nfsd4_callback *cb)
struct nfsd4_conn *c;
int err;
+ trace_nfsd_cb_bc_update(clp, cb);
+
/*
* This is either an update, or the client dying; in either case,
* kill the old client:
*/
if (clp->cl_cb_client) {
+ trace_nfsd_cb_bc_shutdown(clp, cb);
rpc_shutdown_client(clp->cl_cb_client);
clp->cl_cb_client = NULL;
put_cred(clp->cl_cb_cred);
@@ -1349,6 +1355,7 @@ static void nfsd4_process_cb_update(struct nfsd4_callback *cb)
}
if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags))
return;
+
spin_lock(&clp->cl_lock);
/*
* Only serialized callback code is allowed to clear these
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 6003af2bee33..9f9e58debc26 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -1443,6 +1443,48 @@ TRACE_EVENT(nfsd_cb_setup_err,
__entry->error)
);
+DECLARE_EVENT_CLASS(nfsd_cb_lifetime_class,
+ TP_PROTO(
+ const struct nfs4_client *clp,
+ const struct nfsd4_callback *cb
+ ),
+ TP_ARGS(clp, cb),
+ TP_STRUCT__entry(
+ __field(u32, cl_boot)
+ __field(u32, cl_id)
+ __field(const void *, cb)
+ __field(bool, need_restart)
+ __sockaddr(addr, clp->cl_cb_conn.cb_addrlen)
+ ),
+ TP_fast_assign(
+ __entry->cl_boot = clp->cl_clientid.cl_boot;
+ __entry->cl_id = clp->cl_clientid.cl_id;
+ __entry->cb = cb;
+ __entry->need_restart = cb->cb_need_restart;
+ __assign_sockaddr(addr, &clp->cl_cb_conn.cb_addr,
+ clp->cl_cb_conn.cb_addrlen)
+ ),
+ TP_printk("addr=%pISpc client %08x:%08x cb=%p%s",
+ __get_sockaddr(addr), __entry->cl_boot, __entry->cl_id,
+ __entry->cb, __entry->need_restart ?
+ " (need restart)" : " (first try)"
+ )
+);
+
+#define DEFINE_NFSD_CB_LIFETIME_EVENT(name) \
+DEFINE_EVENT(nfsd_cb_lifetime_class, nfsd_cb_##name, \
+ TP_PROTO( \
+ const struct nfs4_client *clp, \
+ const struct nfsd4_callback *cb \
+ ), \
+ TP_ARGS(clp, cb))
+
+DEFINE_NFSD_CB_LIFETIME_EVENT(queue);
+DEFINE_NFSD_CB_LIFETIME_EVENT(destroy);
+DEFINE_NFSD_CB_LIFETIME_EVENT(restart);
+DEFINE_NFSD_CB_LIFETIME_EVENT(bc_update);
+DEFINE_NFSD_CB_LIFETIME_EVENT(bc_shutdown);
+
TRACE_EVENT(nfsd_cb_seq_status,
TP_PROTO(
const struct rpc_task *task,
diff --git a/include/trace/misc/nfs.h b/include/trace/misc/nfs.h
index 0d9d48dca38a..64ab5dac59ce 100644
--- a/include/trace/misc/nfs.h
+++ b/include/trace/misc/nfs.h
@@ -385,3 +385,37 @@ TRACE_DEFINE_ENUM(IOMODE_ANY);
{ SEQ4_STATUS_RESTART_RECLAIM_NEEDED, "RESTART_RECLAIM_NEEDED" }, \
{ SEQ4_STATUS_CB_PATH_DOWN_SESSION, "CB_PATH_DOWN_SESSION" }, \
{ SEQ4_STATUS_BACKCHANNEL_FAULT, "BACKCHANNEL_FAULT" })
+
+TRACE_DEFINE_ENUM(OP_CB_GETATTR);
+TRACE_DEFINE_ENUM(OP_CB_RECALL);
+TRACE_DEFINE_ENUM(OP_CB_LAYOUTRECALL);
+TRACE_DEFINE_ENUM(OP_CB_NOTIFY);
+TRACE_DEFINE_ENUM(OP_CB_PUSH_DELEG);
+TRACE_DEFINE_ENUM(OP_CB_RECALL_ANY);
+TRACE_DEFINE_ENUM(OP_CB_RECALLABLE_OBJ_AVAIL);
+TRACE_DEFINE_ENUM(OP_CB_RECALL_SLOT);
+TRACE_DEFINE_ENUM(OP_CB_SEQUENCE);
+TRACE_DEFINE_ENUM(OP_CB_WANTS_CANCELLED);
+TRACE_DEFINE_ENUM(OP_CB_NOTIFY_LOCK);
+TRACE_DEFINE_ENUM(OP_CB_NOTIFY_DEVICEID);
+TRACE_DEFINE_ENUM(OP_CB_OFFLOAD);
+TRACE_DEFINE_ENUM(OP_CB_ILLEGAL);
+
+#define show_nfs4_cb_op(x) \
+ __print_symbolic(x, \
+ { 0, "CB_NULL" }, \
+ { 1, "CB_COMPOUND" }, \
+ { OP_CB_GETATTR, "CB_GETATTR" }, \
+ { OP_CB_RECALL, "CB_RECALL" }, \
+ { OP_CB_LAYOUTRECALL, "CB_LAYOUTRECALL" }, \
+ { OP_CB_NOTIFY, "CB_NOTIFY" }, \
+ { OP_CB_PUSH_DELEG, "CB_PUSH_DELEG" }, \
+ { OP_CB_RECALL_ANY, "CB_RECALL_ANY" }, \
+ { OP_CB_RECALLABLE_OBJ_AVAIL, "CB_RECALLABLE_OBJ_AVAIL" }, \
+ { OP_CB_RECALL_SLOT, "CB_RECALL_SLOT" }, \
+ { OP_CB_SEQUENCE, "CB_SEQUENCE" }, \
+ { OP_CB_WANTS_CANCELLED, "CB_WANTS_CANCELLED" }, \
+ { OP_CB_NOTIFY_LOCK, "CB_NOTIFY_LOCK" }, \
+ { OP_CB_NOTIFY_DEVICEID, "CB_NOTIFY_DEVICEID" }, \
+ { OP_CB_OFFLOAD, "CB_OFFLOAD" }, \
+ { OP_CB_ILLEGAL, "CB_ILLEGAL" })
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH RFC 08/13] SUNRPC: Remove EXPORT_SYMBOL_GPL for svc_process_bc()
2024-01-25 16:28 [PATCH RFC 00/13] NFSD backchannel fixes Chuck Lever
` (6 preceding siblings ...)
2024-01-25 16:29 ` [PATCH RFC 07/13] NFSD: Add callback operation lifetime trace points Chuck Lever
@ 2024-01-25 16:29 ` Chuck Lever
2024-01-25 16:29 ` [PATCH RFC 09/13] NFSD: Remove unused @reason argument Chuck Lever
` (6 subsequent siblings)
14 siblings, 0 replies; 24+ messages in thread
From: Chuck Lever @ 2024-01-25 16:29 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
svc_process_bc(), previously known as bc_svc_process(), was
added in commit 4d6bbb6233c9 ("nfs41: Backchannel bc_svc_process()")
but there has never been a call site outside of the sunrpc.ko
module.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/svc.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index eb5856e1351d..425c1ca9a772 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1603,7 +1603,6 @@ void svc_process_bc(struct rpc_rqst *req, struct svc_rqst *rqstp)
WARN_ON_ONCE(atomic_read(&task->tk_count) != 1);
rpc_put_task(task);
}
-EXPORT_SYMBOL_GPL(svc_process_bc);
#endif /* CONFIG_SUNRPC_BACKCHANNEL */
/**
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH RFC 09/13] NFSD: Remove unused @reason argument
2024-01-25 16:28 [PATCH RFC 00/13] NFSD backchannel fixes Chuck Lever
` (7 preceding siblings ...)
2024-01-25 16:29 ` [PATCH RFC 08/13] SUNRPC: Remove EXPORT_SYMBOL_GPL for svc_process_bc() Chuck Lever
@ 2024-01-25 16:29 ` Chuck Lever
2024-01-25 16:29 ` [PATCH RFC 10/13] NFSD: Replace comment with lockdep assertion Chuck Lever
` (5 subsequent siblings)
14 siblings, 0 replies; 24+ messages in thread
From: Chuck Lever @ 2024-01-25 16:29 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4callback.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 4d5a6370b92c..1e0f5a0bd804 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -45,7 +45,7 @@
#define NFSDDBG_FACILITY NFSDDBG_PROC
-static void nfsd4_mark_cb_fault(struct nfs4_client *, int reason);
+static void nfsd4_mark_cb_fault(struct nfs4_client *clp);
#define NFSPROC4_CB_NULL 0
#define NFSPROC4_CB_COMPOUND 1
@@ -1004,14 +1004,14 @@ static void nfsd4_mark_cb_state(struct nfs4_client *clp, int newstate)
}
}
-static void nfsd4_mark_cb_down(struct nfs4_client *clp, int reason)
+static void nfsd4_mark_cb_down(struct nfs4_client *clp)
{
if (test_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_flags))
return;
nfsd4_mark_cb_state(clp, NFSD4_CB_DOWN);
}
-static void nfsd4_mark_cb_fault(struct nfs4_client *clp, int reason)
+static void nfsd4_mark_cb_fault(struct nfs4_client *clp)
{
if (test_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_flags))
return;
@@ -1023,7 +1023,7 @@ static void nfsd4_cb_probe_done(struct rpc_task *task, void *calldata)
struct nfs4_client *clp = container_of(calldata, struct nfs4_client, cl_cb_null);
if (task->tk_status)
- nfsd4_mark_cb_down(clp, task->tk_status);
+ nfsd4_mark_cb_down(clp);
else
nfsd4_mark_cb_state(clp, NFSD4_CB_UP);
}
@@ -1175,7 +1175,7 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
break;
case -ESERVERFAULT:
++session->se_cb_seq_nr;
- nfsd4_mark_cb_fault(cb->cb_clp, cb->cb_seq_status);
+ nfsd4_mark_cb_fault(cb->cb_clp);
ret = false;
break;
case 1:
@@ -1187,7 +1187,7 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
*/
fallthrough;
case -NFS4ERR_BADSESSION:
- nfsd4_mark_cb_fault(cb->cb_clp, cb->cb_seq_status);
+ nfsd4_mark_cb_fault(cb->cb_clp);
ret = false;
goto need_restart;
case -NFS4ERR_DELAY:
@@ -1206,7 +1206,7 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
}
break;
default:
- nfsd4_mark_cb_fault(cb->cb_clp, cb->cb_seq_status);
+ nfsd4_mark_cb_fault(cb->cb_clp);
}
nfsd41_cb_release_slot(cb);
@@ -1252,7 +1252,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
case -EIO:
case -ETIMEDOUT:
case -EACCES:
- nfsd4_mark_cb_down(clp, task->tk_status);
+ nfsd4_mark_cb_down(clp);
}
break;
default:
@@ -1374,7 +1374,7 @@ static void nfsd4_process_cb_update(struct nfsd4_callback *cb)
err = setup_callback_client(clp, &conn, ses);
if (err) {
- nfsd4_mark_cb_down(clp, err);
+ nfsd4_mark_cb_down(clp);
if (c)
svc_xprt_put(c->cn_xprt);
return;
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH RFC 10/13] NFSD: Replace comment with lockdep assertion
2024-01-25 16:28 [PATCH RFC 00/13] NFSD backchannel fixes Chuck Lever
` (8 preceding siblings ...)
2024-01-25 16:29 ` [PATCH RFC 09/13] NFSD: Remove unused @reason argument Chuck Lever
@ 2024-01-25 16:29 ` Chuck Lever
2024-01-25 16:29 ` [PATCH RFC 11/13] NFSD: Remove BUG_ON in nfsd4_process_cb_update() Chuck Lever
` (4 subsequent siblings)
14 siblings, 0 replies; 24+ messages in thread
From: Chuck Lever @ 2024-01-25 16:29 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
Convert a code comment into a real assertion.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4callback.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 1e0f5a0bd804..dc308569bc31 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1307,12 +1307,13 @@ void nfsd4_shutdown_callback(struct nfs4_client *clp)
nfsd41_cb_inflight_wait_complete(clp);
}
-/* requires cl_lock: */
static struct nfsd4_conn * __nfsd4_find_backchannel(struct nfs4_client *clp)
{
struct nfsd4_session *s;
struct nfsd4_conn *c;
+ lockdep_assert_held(&clp->cl_lock);
+
list_for_each_entry(s, &clp->cl_sessions, se_perclnt) {
list_for_each_entry(c, &s->se_conns, cn_persession) {
if (c->cn_flags & NFS4_CDFC4_BACK)
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH RFC 11/13] NFSD: Remove BUG_ON in nfsd4_process_cb_update()
2024-01-25 16:28 [PATCH RFC 00/13] NFSD backchannel fixes Chuck Lever
` (9 preceding siblings ...)
2024-01-25 16:29 ` [PATCH RFC 10/13] NFSD: Replace comment with lockdep assertion Chuck Lever
@ 2024-01-25 16:29 ` Chuck Lever
2024-01-25 16:29 ` [PATCH RFC 12/13] SUNRPC: Remove stale comments Chuck Lever
` (3 subsequent siblings)
14 siblings, 0 replies; 24+ messages in thread
From: Chuck Lever @ 2024-01-25 16:29 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
Don't kill the kworker thread, and don't panic while cl_lock is
held. There's no need for scorching the earth here.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4callback.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index dc308569bc31..46ecb3ec0f8f 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1362,8 +1362,9 @@ static void nfsd4_process_cb_update(struct nfsd4_callback *cb)
* Only serialized callback code is allowed to clear these
* flags; main nfsd code can only set them:
*/
- BUG_ON(!(clp->cl_flags & NFSD4_CLIENT_CB_FLAG_MASK));
+ WARN_ON(!(clp->cl_flags & NFSD4_CLIENT_CB_FLAG_MASK));
clear_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_flags);
+
memcpy(&conn, &cb->cb_clp->cl_cb_conn, sizeof(struct nfs4_cb_conn));
c = __nfsd4_find_backchannel(clp);
if (c) {
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH RFC 12/13] SUNRPC: Remove stale comments
2024-01-25 16:28 [PATCH RFC 00/13] NFSD backchannel fixes Chuck Lever
` (10 preceding siblings ...)
2024-01-25 16:29 ` [PATCH RFC 11/13] NFSD: Remove BUG_ON in nfsd4_process_cb_update() Chuck Lever
@ 2024-01-25 16:29 ` Chuck Lever
2024-01-25 16:30 ` [PATCH RFC 13/13] NFSD: Remove redundant cb_seq_status initialization Chuck Lever
` (2 subsequent siblings)
14 siblings, 0 replies; 24+ messages in thread
From: Chuck Lever @ 2024-01-25 16:29 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
bc_close() and bc_destroy now do something, so the comments are
no longer correct. Commit 6221f1d9b63f ("SUNRPC: Fix backchannel
RPC soft lockups") should have removed these.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/xprtsock.c | 9 ---------
1 file changed, 9 deletions(-)
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 58f3dc8d0d71..d92c13e78a56 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2987,20 +2987,11 @@ static int bc_send_request(struct rpc_rqst *req)
return len;
}
-/*
- * The close routine. Since this is client initiated, we do nothing
- */
-
static void bc_close(struct rpc_xprt *xprt)
{
xprt_disconnect_done(xprt);
}
-/*
- * The xprt destroy routine. Again, because this connection is client
- * initiated, we do nothing
- */
-
static void bc_destroy(struct rpc_xprt *xprt)
{
dprintk("RPC: bc_destroy xprt %p\n", xprt);
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH RFC 13/13] NFSD: Remove redundant cb_seq_status initialization
2024-01-25 16:28 [PATCH RFC 00/13] NFSD backchannel fixes Chuck Lever
` (11 preceding siblings ...)
2024-01-25 16:29 ` [PATCH RFC 12/13] SUNRPC: Remove stale comments Chuck Lever
@ 2024-01-25 16:30 ` Chuck Lever
2024-01-25 20:41 ` [PATCH RFC 00/13] NFSD backchannel fixes Jeff Layton
2024-01-25 22:09 ` Benjamin Coddington
14 siblings, 0 replies; 24+ messages in thread
From: Chuck Lever @ 2024-01-25 16:30 UTC (permalink / raw)
To: linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
As far as I can see, setting cb_seq_status in nfsd4_init_cb() is
superfluous because it is set again in nfsd4_cb_prepare().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4callback.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 46ecb3ec0f8f..e72a904f120b 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1443,7 +1443,6 @@ void nfsd4_init_cb(struct nfsd4_callback *cb, struct nfs4_client *clp,
cb->cb_msg.rpc_resp = cb;
cb->cb_ops = ops;
INIT_WORK(&cb->cb_work, nfsd4_run_cb_work);
- cb->cb_seq_status = 1;
cb->cb_status = 0;
cb->cb_need_restart = false;
cb->cb_holds_slot = false;
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH RFC 01/13] NFSD: Reset cb_seq_status after NFS4ERR_DELAY
2024-01-25 16:28 ` [PATCH RFC 01/13] NFSD: Reset cb_seq_status after NFS4ERR_DELAY Chuck Lever
@ 2024-01-25 20:07 ` Jeff Layton
0 siblings, 0 replies; 24+ messages in thread
From: Jeff Layton @ 2024-01-25 20:07 UTC (permalink / raw)
To: Chuck Lever, linux-nfs
On Thu, 2024-01-25 at 11:28 -0500, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> I noticed that once an NFSv4.1 callback operation gets a
> NFS4ERR_DELAY status on CB_SEQUENCE and then the connection is lost,
> the callback client loops, resending it indefinitely.
>
> The switch arm in nfsd4_cb_sequence_done() that handles
> NFS4ERR_DELAY uses rpc_restart_call() to rearm the RPC state machine
> for the retransmit, but that path does not call the rpc_prepare_call
> callback again. Thus cb_seq_status is set to -10008 by the first
> NFS4ERR_DELAY result, but is never set back to 1 for the retransmits.
>
> nfsd4_cb_sequence_done() thinks it's getting nothing but a
> long series of CB_SEQUENCE NFS4ERR_DELAY replies.
>
> Fixes: 7ba6cad6c88f ("nfsd: New helper nfsd4_cb_sequence_done() for processing more cb errors")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> fs/nfsd/nfs4callback.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 926c29879c6a..43b0a34a5d5b 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -1178,6 +1178,7 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> ret = false;
> break;
> case -NFS4ERR_DELAY:
> + cb->cb_seq_status = 1;
> if (!rpc_restart_call(task))
> goto out;
>
>
>
>
Nice catch!
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH RFC 02/13] NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down
2024-01-25 16:28 ` [PATCH RFC 02/13] NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down Chuck Lever
@ 2024-01-25 20:19 ` Jeff Layton
2024-01-25 20:29 ` Chuck Lever
0 siblings, 1 reply; 24+ messages in thread
From: Jeff Layton @ 2024-01-25 20:19 UTC (permalink / raw)
To: Chuck Lever, linux-nfs
On Thu, 2024-01-25 at 11:28 -0500, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> As part of managing a client disconnect, NFSD closes down and
> replaces the backchannel rpc_clnt.
>
> If a callback operation is pending when the backchannel rpc_clnt is
> shut down, currently nfsd4_run_cb_work() just discards that
> callback. But there are multiple cases to deal with here:
>
> o The client's lease is getting destroyed. Throw the CB away.
>
> o The client disconnected. It might be forcing a retransmit of
> CB operations, or it could have disconnected for other reasons.
> Reschedule the CB so it is retransmitted when the client
> reconnects.
>
> Since callback operations can now be rescheduled, ensure that
> cb_ops->prepare can be called only once by moving the
> cb_ops->prepare paragraph down to just before the rpc_call_async()
> call.
>
> Fixes: 2bbfed98a4d8 ("nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback()")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> fs/nfsd/nfs4callback.c | 26 +++++++++++++++++---------
> 1 file changed, 17 insertions(+), 9 deletions(-)
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 43b0a34a5d5b..b2844abcb51f 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -1375,20 +1375,22 @@ nfsd4_run_cb_work(struct work_struct *work)
> struct rpc_clnt *clnt;
> int flags;
>
> - if (cb->cb_need_restart) {
> - cb->cb_need_restart = false;
> - } else {
> - if (cb->cb_ops && cb->cb_ops->prepare)
> - cb->cb_ops->prepare(cb);
> - }
> -
> if (clp->cl_flags & NFSD4_CLIENT_CB_FLAG_MASK)
> nfsd4_process_cb_update(cb);
>
> clnt = clp->cl_cb_client;
> if (!clnt) {
> - /* Callback channel broken, or client killed; give up: */
> - nfsd41_destroy_cb(cb);
> + if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags))
> + nfsd41_destroy_cb(cb);
> + else {
> + /*
> + * XXX: Ideally, we would wait for the client to
> + * reconnect, but I haven't figured out how
> + * to do that yet.
> + */
> + msleep(30);
> + nfsd4_queue_cb(cb);
It would probably be better to just queue the cb as delayed_work here,
so you don't squat on the workqueue thread. That'll mean changing
cb_work to struct delayed_work, but that should be NBD.
> + }
> return;
> }
>
> @@ -1401,6 +1403,12 @@ nfsd4_run_cb_work(struct work_struct *work)
> return;
> }
>
> + if (cb->cb_need_restart) {
> + cb->cb_need_restart = false;
> + } else {
> + if (cb->cb_ops && cb->cb_ops->prepare)
> + cb->cb_ops->prepare(cb);
> + }
> cb->cb_msg.rpc_cred = clp->cl_cb_cred;
> flags = clp->cl_minorversion ? RPC_TASK_NOCONNECT : RPC_TASK_SOFTCONN;
> rpc_call_async(clnt, &cb->cb_msg, RPC_TASK_SOFT | flags,
>
>
>
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH RFC 03/13] NFSD: Retransmit callbacks after client reconnects
2024-01-25 16:28 ` [PATCH RFC 03/13] NFSD: Retransmit callbacks after client reconnects Chuck Lever
@ 2024-01-25 20:24 ` Jeff Layton
0 siblings, 0 replies; 24+ messages in thread
From: Jeff Layton @ 2024-01-25 20:24 UTC (permalink / raw)
To: Chuck Lever, linux-nfs
On Thu, 2024-01-25 at 11:28 -0500, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> NFSv4.1 clients assume that if they disconnect, that will force the
> server to resend pending callback operations once a fresh connection
> has been established.
>
> Turns out NFSD has not been resending after reconnect.
>
> Fixes: 7ba6cad6c88f ("nfsd: New helper nfsd4_cb_sequence_done() for processing more cb errors")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> fs/nfsd/nfs4callback.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index b2844abcb51f..1ff64efe1f5c 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -1171,12 +1171,21 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> break;
> case -ESERVERFAULT:
> ++session->se_cb_seq_nr;
> - fallthrough;
> + nfsd4_mark_cb_fault(cb->cb_clp, cb->cb_seq_status);
> + ret = false;
> + break;
> case 1:
> + /*
> + * cb_seq_status remains 1 if an RPC Reply was never
> + * received. NFSD can't know if the client processed
> + * the CB_SEQUENCE operation. Ask the client to send a
> + * DESTROY_SESSION to recover.
> + */
> + fallthrough;
> case -NFS4ERR_BADSESSION:
> nfsd4_mark_cb_fault(cb->cb_clp, cb->cb_seq_status);
> ret = false;
> - break;
> + goto need_restart;
> case -NFS4ERR_DELAY:
> cb->cb_seq_status = 1;
> if (!rpc_restart_call(task))
>
>
>
Nice work tracking that down.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH RFC 02/13] NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down
2024-01-25 20:19 ` Jeff Layton
@ 2024-01-25 20:29 ` Chuck Lever
0 siblings, 0 replies; 24+ messages in thread
From: Chuck Lever @ 2024-01-25 20:29 UTC (permalink / raw)
To: Jeff Layton; +Cc: Chuck Lever, linux-nfs
On Thu, Jan 25, 2024 at 03:19:41PM -0500, Jeff Layton wrote:
> On Thu, 2024-01-25 at 11:28 -0500, Chuck Lever wrote:
> > From: Chuck Lever <chuck.lever@oracle.com>
> >
> > As part of managing a client disconnect, NFSD closes down and
> > replaces the backchannel rpc_clnt.
> >
> > If a callback operation is pending when the backchannel rpc_clnt is
> > shut down, currently nfsd4_run_cb_work() just discards that
> > callback. But there are multiple cases to deal with here:
> >
> > o The client's lease is getting destroyed. Throw the CB away.
> >
> > o The client disconnected. It might be forcing a retransmit of
> > CB operations, or it could have disconnected for other reasons.
> > Reschedule the CB so it is retransmitted when the client
> > reconnects.
> >
> > Since callback operations can now be rescheduled, ensure that
> > cb_ops->prepare can be called only once by moving the
> > cb_ops->prepare paragraph down to just before the rpc_call_async()
> > call.
> >
> > Fixes: 2bbfed98a4d8 ("nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback()")
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > ---
> > fs/nfsd/nfs4callback.c | 26 +++++++++++++++++---------
> > 1 file changed, 17 insertions(+), 9 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > index 43b0a34a5d5b..b2844abcb51f 100644
> > --- a/fs/nfsd/nfs4callback.c
> > +++ b/fs/nfsd/nfs4callback.c
> > @@ -1375,20 +1375,22 @@ nfsd4_run_cb_work(struct work_struct *work)
> > struct rpc_clnt *clnt;
> > int flags;
> >
> > - if (cb->cb_need_restart) {
> > - cb->cb_need_restart = false;
> > - } else {
> > - if (cb->cb_ops && cb->cb_ops->prepare)
> > - cb->cb_ops->prepare(cb);
> > - }
> > -
> > if (clp->cl_flags & NFSD4_CLIENT_CB_FLAG_MASK)
> > nfsd4_process_cb_update(cb);
> >
> > clnt = clp->cl_cb_client;
> > if (!clnt) {
> > - /* Callback channel broken, or client killed; give up: */
> > - nfsd41_destroy_cb(cb);
> > + if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags))
> > + nfsd41_destroy_cb(cb);
> > + else {
> > + /*
> > + * XXX: Ideally, we would wait for the client to
> > + * reconnect, but I haven't figured out how
> > + * to do that yet.
> > + */
> > + msleep(30);
> > + nfsd4_queue_cb(cb);
>
> It would probably be better to just queue the cb as delayed_work here,
> so you don't squat on the workqueue thread.
You found my placeholder :-)
> That'll mean changing
> cb_work to struct delayed_work, but that should be NBD.
I've looked at that. I wanted to be sure, before going that route,
that there is no obvious way to implement a "wait for client to
reconnect". msleep (or delayed_work) is basically a slow busy wait.
> > + }
> > return;
> > }
> >
> > @@ -1401,6 +1403,12 @@ nfsd4_run_cb_work(struct work_struct *work)
> > return;
> > }
> >
> > + if (cb->cb_need_restart) {
> > + cb->cb_need_restart = false;
> > + } else {
> > + if (cb->cb_ops && cb->cb_ops->prepare)
> > + cb->cb_ops->prepare(cb);
> > + }
> > cb->cb_msg.rpc_cred = clp->cl_cb_cred;
> > flags = clp->cl_minorversion ? RPC_TASK_NOCONNECT : RPC_TASK_SOFTCONN;
> > rpc_call_async(clnt, &cb->cb_msg, RPC_TASK_SOFT | flags,
> >
> >
> >
>
> --
> Jeff Layton <jlayton@kernel.org>
--
Chuck Lever
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH RFC 00/13] NFSD backchannel fixes
2024-01-25 16:28 [PATCH RFC 00/13] NFSD backchannel fixes Chuck Lever
` (12 preceding siblings ...)
2024-01-25 16:30 ` [PATCH RFC 13/13] NFSD: Remove redundant cb_seq_status initialization Chuck Lever
@ 2024-01-25 20:41 ` Jeff Layton
2024-01-25 22:09 ` Benjamin Coddington
14 siblings, 0 replies; 24+ messages in thread
From: Jeff Layton @ 2024-01-25 20:41 UTC (permalink / raw)
To: Chuck Lever, linux-nfs
On Thu, 2024-01-25 at 11:28 -0500, Chuck Lever wrote:
> The first three patches fix bugs that prevent NFSD's backchannel
> from reliably retransmitting after a client reconnects. These fixes
> might be appropriate for 6.8-rc.
>
> Following that are some new trace points that might be helpful for
> field troubleshooting.
>
> Then there are some minor clean-ups.
>
> I am still testing this series, and there is one msleep() call that
> needs some thought. Thoughts, comments, opinions, rotten fruit? You
> know the drill.
>
> ---
>
> Chuck Lever (13):
> NFSD: Reset cb_seq_status after NFS4ERR_DELAY
> NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down
> NFSD: Retransmit callbacks after client reconnects
> NFSD: Add nfsd_seq4_status trace event
> NFSD: Replace dprintks in nfsd4_cb_sequence_done()
> NFSD: Rename nfsd_cb_state trace point
> NFSD: Add callback operation lifetime trace points
> SUNRPC: Remove EXPORT_SYMBOL_GPL for svc_process_bc()
> NFSD: Remove unused @reason argument
> NFSD: Replace comment with lockdep assertion
> NFSD: Remove BUG_ON in nfsd4_process_cb_update()
> SUNRPC: Remove stale comments
> NFSD: Remove redundant cb_seq_status initialization
>
>
> fs/nfsd/nfs4callback.c | 81 +++++++++++++-------
> fs/nfsd/nfs4state.c | 1 +
> fs/nfsd/trace.h | 162 ++++++++++++++++++++++++++++++++++++++-
> include/trace/misc/nfs.h | 34 ++++++++
> net/sunrpc/svc.c | 1 -
> net/sunrpc/xprtsock.c | 9 ---
> 6 files changed, 250 insertions(+), 38 deletions(-)
>
> --
> Chuck Lever
>
>
Love some of the new backchannel tracepoints. That should be helpful.
You can add this to 4-13:
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH RFC 07/13] NFSD: Add callback operation lifetime trace points
2024-01-25 16:29 ` [PATCH RFC 07/13] NFSD: Add callback operation lifetime trace points Chuck Lever
@ 2024-01-25 21:49 ` Benjamin Coddington
2024-01-26 14:12 ` Chuck Lever
0 siblings, 1 reply; 24+ messages in thread
From: Benjamin Coddington @ 2024-01-25 21:49 UTC (permalink / raw)
To: Chuck Lever; +Cc: linux-nfs
On 25 Jan 2024, at 11:29, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> Help observe the flow of callback operations.
>
> bc_shutdown() records exactly when the backchannel RPC client is
> destroyed and cl_cb_client is replaced with NULL.
>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> fs/nfsd/nfs4callback.c | 7 +++++++
> fs/nfsd/trace.h | 42 ++++++++++++++++++++++++++++++++++++++++++
> include/trace/misc/nfs.h | 34 ++++++++++++++++++++++++++++++++++
> 3 files changed, 83 insertions(+)
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 1c85426830b1..4d5a6370b92c 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -887,6 +887,7 @@ static struct workqueue_struct *callback_wq;
>
> static bool nfsd4_queue_cb(struct nfsd4_callback *cb)
> {
> + trace_nfsd_cb_queue(cb->cb_clp, cb);
> return queue_work(callback_wq, &cb->cb_work);
> }
>
> @@ -1106,6 +1107,7 @@ static void nfsd41_destroy_cb(struct nfsd4_callback *cb)
> {
> struct nfs4_client *clp = cb->cb_clp;
>
> + trace_nfsd_cb_destroy(clp, cb);
> nfsd41_cb_release_slot(cb);
> if (cb->cb_ops && cb->cb_ops->release)
> cb->cb_ops->release(cb);
> @@ -1220,6 +1222,7 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> goto out;
> need_restart:
> if (!test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) {
> + trace_nfsd_cb_restart(clp, cb);
> task->tk_status = 0;
> cb->cb_need_restart = true;
I think you want to call the tracepoint /after/ setting cb_need_restart here..
> }
> @@ -1333,11 +1336,14 @@ static void nfsd4_process_cb_update(struct nfsd4_callback *cb)
> struct nfsd4_conn *c;
> int err;
>
> + trace_nfsd_cb_bc_update(clp, cb);
> +
> /*
> * This is either an update, or the client dying; in either case,
> * kill the old client:
> */
> if (clp->cl_cb_client) {
> + trace_nfsd_cb_bc_shutdown(clp, cb);
> rpc_shutdown_client(clp->cl_cb_client);
> clp->cl_cb_client = NULL;
> put_cred(clp->cl_cb_cred);
> @@ -1349,6 +1355,7 @@ static void nfsd4_process_cb_update(struct nfsd4_callback *cb)
> }
> if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags))
> return;
> +
I'm in favor of this whitespace change, but did you mean to include it?
Ben
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH RFC 00/13] NFSD backchannel fixes
2024-01-25 16:28 [PATCH RFC 00/13] NFSD backchannel fixes Chuck Lever
` (13 preceding siblings ...)
2024-01-25 20:41 ` [PATCH RFC 00/13] NFSD backchannel fixes Jeff Layton
@ 2024-01-25 22:09 ` Benjamin Coddington
2024-01-26 13:56 ` Chuck Lever III
14 siblings, 1 reply; 24+ messages in thread
From: Benjamin Coddington @ 2024-01-25 22:09 UTC (permalink / raw)
To: Chuck Lever; +Cc: linux-nfs
On 25 Jan 2024, at 11:28, Chuck Lever wrote:
> The first three patches fix bugs that prevent NFSD's backchannel
> from reliably retransmitting after a client reconnects. These fixes
> might be appropriate for 6.8-rc.
>
> Following that are some new trace points that might be helpful for
> field troubleshooting.
>
> Then there are some minor clean-ups.
>
> I am still testing this series, and there is one msleep() call that
> needs some thought. Thoughts, comments, opinions, rotten fruit? You
> know the drill.
>
> ---
>
> Chuck Lever (13):
> NFSD: Reset cb_seq_status after NFS4ERR_DELAY
> NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down
> NFSD: Retransmit callbacks after client reconnects
> NFSD: Add nfsd_seq4_status trace event
> NFSD: Replace dprintks in nfsd4_cb_sequence_done()
> NFSD: Rename nfsd_cb_state trace point
> NFSD: Add callback operation lifetime trace points
> SUNRPC: Remove EXPORT_SYMBOL_GPL for svc_process_bc()
> NFSD: Remove unused @reason argument
> NFSD: Replace comment with lockdep assertion
> NFSD: Remove BUG_ON in nfsd4_process_cb_update()
> SUNRPC: Remove stale comments
> NFSD: Remove redundant cb_seq_status initialization
>
>
> fs/nfsd/nfs4callback.c | 81 +++++++++++++-------
> fs/nfsd/nfs4state.c | 1 +
> fs/nfsd/trace.h | 162 ++++++++++++++++++++++++++++++++++++++-
> include/trace/misc/nfs.h | 34 ++++++++
> net/sunrpc/svc.c | 1 -
> net/sunrpc/xprtsock.c | 9 ---
> 6 files changed, 250 insertions(+), 38 deletions(-)
These are great, looking forward to see how 02/13 waits for reconnection.
Seems like a wait_on_bit or wait_on_var triggered from nfsd4_init_conn()
would do, but that's just my wild speculation.
Ben
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH RFC 00/13] NFSD backchannel fixes
2024-01-25 22:09 ` Benjamin Coddington
@ 2024-01-26 13:56 ` Chuck Lever III
0 siblings, 0 replies; 24+ messages in thread
From: Chuck Lever III @ 2024-01-26 13:56 UTC (permalink / raw)
To: Benjamin Coddington; +Cc: Chuck Lever, Linux NFS Mailing List
> On Jan 25, 2024, at 5:09 PM, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 25 Jan 2024, at 11:28, Chuck Lever wrote:
>
>> The first three patches fix bugs that prevent NFSD's backchannel
>> from reliably retransmitting after a client reconnects. These fixes
>> might be appropriate for 6.8-rc.
>>
>> Following that are some new trace points that might be helpful for
>> field troubleshooting.
>>
>> Then there are some minor clean-ups.
>>
>> I am still testing this series, and there is one msleep() call that
>> needs some thought. Thoughts, comments, opinions, rotten fruit? You
>> know the drill.
>>
>> ---
>>
>> Chuck Lever (13):
>> NFSD: Reset cb_seq_status after NFS4ERR_DELAY
>> NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down
>> NFSD: Retransmit callbacks after client reconnects
>> NFSD: Add nfsd_seq4_status trace event
>> NFSD: Replace dprintks in nfsd4_cb_sequence_done()
>> NFSD: Rename nfsd_cb_state trace point
>> NFSD: Add callback operation lifetime trace points
>> SUNRPC: Remove EXPORT_SYMBOL_GPL for svc_process_bc()
>> NFSD: Remove unused @reason argument
>> NFSD: Replace comment with lockdep assertion
>> NFSD: Remove BUG_ON in nfsd4_process_cb_update()
>> SUNRPC: Remove stale comments
>> NFSD: Remove redundant cb_seq_status initialization
>>
>>
>> fs/nfsd/nfs4callback.c | 81 +++++++++++++-------
>> fs/nfsd/nfs4state.c | 1 +
>> fs/nfsd/trace.h | 162 ++++++++++++++++++++++++++++++++++++++-
>> include/trace/misc/nfs.h | 34 ++++++++
>> net/sunrpc/svc.c | 1 -
>> net/sunrpc/xprtsock.c | 9 ---
>> 6 files changed, 250 insertions(+), 38 deletions(-)
>
>
> These are great, looking forward to see how 02/13 waits for reconnection.
> Seems like a wait_on_bit or wait_on_var triggered from nfsd4_init_conn()
> would do, but that's just my wild speculation.
Thanks for having a look!
Well Jeff pointed out that we don't want to put a workqueue worker
thread to sleep, so wait_on_* would work but is not an appropriate
long-term solution.
I'd rather prefer to have a mechanism where the callback RPC tasks
can be "hard" and then they would wait in the RPC client for the
reconnect. The issue then is how connection establishment is
indicated so that the waiting tasks are woken at the right time.
--
Chuck Lever
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH RFC 07/13] NFSD: Add callback operation lifetime trace points
2024-01-25 21:49 ` Benjamin Coddington
@ 2024-01-26 14:12 ` Chuck Lever
2024-01-26 14:27 ` Benjamin Coddington
0 siblings, 1 reply; 24+ messages in thread
From: Chuck Lever @ 2024-01-26 14:12 UTC (permalink / raw)
To: Benjamin Coddington; +Cc: Chuck Lever, linux-nfs
On Thu, Jan 25, 2024 at 04:49:17PM -0500, Benjamin Coddington wrote:
> On 25 Jan 2024, at 11:29, Chuck Lever wrote:
>
> > From: Chuck Lever <chuck.lever@oracle.com>
> >
> > Help observe the flow of callback operations.
> >
> > bc_shutdown() records exactly when the backchannel RPC client is
> > destroyed and cl_cb_client is replaced with NULL.
> >
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > ---
> > fs/nfsd/nfs4callback.c | 7 +++++++
> > fs/nfsd/trace.h | 42 ++++++++++++++++++++++++++++++++++++++++++
> > include/trace/misc/nfs.h | 34 ++++++++++++++++++++++++++++++++++
> > 3 files changed, 83 insertions(+)
> >
> > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > index 1c85426830b1..4d5a6370b92c 100644
> > --- a/fs/nfsd/nfs4callback.c
> > +++ b/fs/nfsd/nfs4callback.c
> > @@ -887,6 +887,7 @@ static struct workqueue_struct *callback_wq;
> >
> > static bool nfsd4_queue_cb(struct nfsd4_callback *cb)
> > {
> > + trace_nfsd_cb_queue(cb->cb_clp, cb);
> > return queue_work(callback_wq, &cb->cb_work);
> > }
> >
> > @@ -1106,6 +1107,7 @@ static void nfsd41_destroy_cb(struct nfsd4_callback *cb)
> > {
> > struct nfs4_client *clp = cb->cb_clp;
> >
> > + trace_nfsd_cb_destroy(clp, cb);
> > nfsd41_cb_release_slot(cb);
> > if (cb->cb_ops && cb->cb_ops->release)
> > cb->cb_ops->release(cb);
> > @@ -1220,6 +1222,7 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> > goto out;
> > need_restart:
> > if (!test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) {
> > + trace_nfsd_cb_restart(clp, cb);
> > task->tk_status = 0;
> > cb->cb_need_restart = true;
>
> I think you want to call the tracepoint /after/ setting cb_need_restart here..
Maybe?
The trace point currently captures the value of cb_need_restart
before this logic overwrites it. Is that beneficial for
troubleshooting?
> > }
> > @@ -1333,11 +1336,14 @@ static void nfsd4_process_cb_update(struct nfsd4_callback *cb)
> > struct nfsd4_conn *c;
> > int err;
> >
> > + trace_nfsd_cb_bc_update(clp, cb);
> > +
> > /*
> > * This is either an update, or the client dying; in either case,
> > * kill the old client:
> > */
> > if (clp->cl_cb_client) {
> > + trace_nfsd_cb_bc_shutdown(clp, cb);
> > rpc_shutdown_client(clp->cl_cb_client);
> > clp->cl_cb_client = NULL;
> > put_cred(clp->cl_cb_cred);
> > @@ -1349,6 +1355,7 @@ static void nfsd4_process_cb_update(struct nfsd4_callback *cb)
> > }
> > if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags))
> > return;
> > +
>
> I'm in favor of this whitespace change, but did you mean to include it?
Yes, but it might be better off in one of the other patches.
--
Chuck Lever
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH RFC 07/13] NFSD: Add callback operation lifetime trace points
2024-01-26 14:12 ` Chuck Lever
@ 2024-01-26 14:27 ` Benjamin Coddington
0 siblings, 0 replies; 24+ messages in thread
From: Benjamin Coddington @ 2024-01-26 14:27 UTC (permalink / raw)
To: Chuck Lever; +Cc: Chuck Lever, linux-nfs
On 26 Jan 2024, at 9:12, Chuck Lever wrote:
> On Thu, Jan 25, 2024 at 04:49:17PM -0500, Benjamin Coddington wrote:
>> On 25 Jan 2024, at 11:29, Chuck Lever wrote:
>>
>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>
>>> Help observe the flow of callback operations.
>>>
>>> bc_shutdown() records exactly when the backchannel RPC client is
>>> destroyed and cl_cb_client is replaced with NULL.
>>>
>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>> ---
>>> fs/nfsd/nfs4callback.c | 7 +++++++
>>> fs/nfsd/trace.h | 42 ++++++++++++++++++++++++++++++++++++++++++
>>> include/trace/misc/nfs.h | 34 ++++++++++++++++++++++++++++++++++
>>> 3 files changed, 83 insertions(+)
>>>
>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>> index 1c85426830b1..4d5a6370b92c 100644
>>> --- a/fs/nfsd/nfs4callback.c
>>> +++ b/fs/nfsd/nfs4callback.c
>>> @@ -887,6 +887,7 @@ static struct workqueue_struct *callback_wq;
>>>
>>> static bool nfsd4_queue_cb(struct nfsd4_callback *cb)
>>> {
>>> + trace_nfsd_cb_queue(cb->cb_clp, cb);
>>> return queue_work(callback_wq, &cb->cb_work);
>>> }
>>>
>>> @@ -1106,6 +1107,7 @@ static void nfsd41_destroy_cb(struct nfsd4_callback *cb)
>>> {
>>> struct nfs4_client *clp = cb->cb_clp;
>>>
>>> + trace_nfsd_cb_destroy(clp, cb);
>>> nfsd41_cb_release_slot(cb);
>>> if (cb->cb_ops && cb->cb_ops->release)
>>> cb->cb_ops->release(cb);
>>> @@ -1220,6 +1222,7 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>> goto out;
>>> need_restart:
>>> if (!test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) {
>>> + trace_nfsd_cb_restart(clp, cb);
>>> task->tk_status = 0;
>>> cb->cb_need_restart = true;
>>
>> I think you want to call the tracepoint /after/ setting cb_need_restart here..
>
> Maybe?
>
> The trace point currently captures the value of cb_need_restart
> before this logic overwrites it. Is that beneficial for
> troubleshooting?
Got it, yes its better the way you have it - and the tracepoint name makes
clear we want to restart.
Ben
^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2024-01-26 14:27 UTC | newest]
Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-25 16:28 [PATCH RFC 00/13] NFSD backchannel fixes Chuck Lever
2024-01-25 16:28 ` [PATCH RFC 01/13] NFSD: Reset cb_seq_status after NFS4ERR_DELAY Chuck Lever
2024-01-25 20:07 ` Jeff Layton
2024-01-25 16:28 ` [PATCH RFC 02/13] NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down Chuck Lever
2024-01-25 20:19 ` Jeff Layton
2024-01-25 20:29 ` Chuck Lever
2024-01-25 16:28 ` [PATCH RFC 03/13] NFSD: Retransmit callbacks after client reconnects Chuck Lever
2024-01-25 20:24 ` Jeff Layton
2024-01-25 16:29 ` [PATCH RFC 04/13] NFSD: Add nfsd_seq4_status trace event Chuck Lever
2024-01-25 16:29 ` [PATCH RFC 05/13] NFSD: Replace dprintks in nfsd4_cb_sequence_done() Chuck Lever
2024-01-25 16:29 ` [PATCH RFC 06/13] NFSD: Rename nfsd_cb_state trace point Chuck Lever
2024-01-25 16:29 ` [PATCH RFC 07/13] NFSD: Add callback operation lifetime trace points Chuck Lever
2024-01-25 21:49 ` Benjamin Coddington
2024-01-26 14:12 ` Chuck Lever
2024-01-26 14:27 ` Benjamin Coddington
2024-01-25 16:29 ` [PATCH RFC 08/13] SUNRPC: Remove EXPORT_SYMBOL_GPL for svc_process_bc() Chuck Lever
2024-01-25 16:29 ` [PATCH RFC 09/13] NFSD: Remove unused @reason argument Chuck Lever
2024-01-25 16:29 ` [PATCH RFC 10/13] NFSD: Replace comment with lockdep assertion Chuck Lever
2024-01-25 16:29 ` [PATCH RFC 11/13] NFSD: Remove BUG_ON in nfsd4_process_cb_update() Chuck Lever
2024-01-25 16:29 ` [PATCH RFC 12/13] SUNRPC: Remove stale comments Chuck Lever
2024-01-25 16:30 ` [PATCH RFC 13/13] NFSD: Remove redundant cb_seq_status initialization Chuck Lever
2024-01-25 20:41 ` [PATCH RFC 00/13] NFSD backchannel fixes Jeff Layton
2024-01-25 22:09 ` Benjamin Coddington
2024-01-26 13:56 ` Chuck Lever III
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox