From: Jeff Layton <jlayton@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>, Neil Brown <neilb@suse.de>,
Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
"J. Bruce Fields" <bfields@fieldses.org>,
Kinglong Mee <kinglongmee@gmail.com>,
Trond Myklebust <trondmy@kernel.org>,
Anna Schumaker <anna@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>,
Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>
Cc: linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org
Subject: Re: [PATCH 2/8] nfsd: fix CB_SEQUENCE error handling of NFS4ERR_{BADSLOT,BADSESSION,SEQ_MISORDERED}
Date: Fri, 24 Jan 2025 11:08:58 -0500
Message-ID: <00baf78e7d483930ddac4129fb91828707d89769.camel@kernel.org>
In-Reply-To: <e8b4f46a-2c4b-43b3-bf82-dc5d8f6af171@oracle.com>
On Fri, 2025-01-24 at 10:31 -0500, Chuck Lever wrote:
> On 1/24/25 9:46 AM, Jeff Layton wrote:
> > On Fri, 2025-01-24 at 09:32 -0500, Chuck Lever wrote:
> > > On 1/23/25 3:25 PM, Jeff Layton wrote:
> > > > The current error handling has some problems:
> > > >
> > > > BADSLOT and BADSESSION: don't release the slot before retrying the call
> > > >
> > > > SEQ_MISORDERED: does some sketchy resetting of the seqid? I can't find any
> > > > recommendation about doing that in the spec, and it seems wrong.
> > >
> > > Random thought: You might use the Linux NFS client's forechannel session
> > > implementation as a code reference.
> > >
> > >
> > > > Handle all three errors the same way: release the slot, but then handle
> > > > it just as if we hadn't gotten a reply: mark the session as faulty and
> > > > retry the call.
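
To make the proposed control flow concrete, here is a minimal user-space model of the patched switch. struct cb_model and model_sequence_done() are hypothetical stand-ins, not nfsd code: the fields only mirror cb_seq_status and cb_held_slot, and boolean flags record where nfsd4_mark_cb_fault() would be called or where the code would take the need_restart path. The success and generic-error paths are reduced to their effect on the slot and the fault state.

```c
#include <assert.h>
#include <stdbool.h>

/* NFSv4.1 error values as assigned in RFC 8881; nfsd stores them negated. */
#define NFS4ERR_DELAY          10008
#define NFS4ERR_BADSESSION     10052
#define NFS4ERR_BADSLOT        10053
#define NFS4ERR_SEQ_MISORDERED 10063

/* Hypothetical model of the parts of struct nfsd4_callback the patch touches. */
struct cb_model {
	int  seq_status;   /* like cb_seq_status: 1 means no RPC reply was seen */
	int  held_slot;    /* like cb_held_slot: -1 means no slot is held */
	bool fault_marked; /* set where nfsd4_mark_cb_fault() would be called */
	bool restart;      /* set where the code would goto need_restart/retry */
};

/* Sketch of the patched switch in nfsd4_cb_sequence_done(). */
static void model_sequence_done(struct cb_model *cb)
{
	assert(cb);
	switch (cb->seq_status) {
	case 0:
		/* Success: the real code bumps the slot's seqid, then frees it. */
		cb->held_slot = -1;
		break;
	case -NFS4ERR_BADSESSION:
	case -NFS4ERR_BADSLOT:
	case -NFS4ERR_SEQ_MISORDERED:
		/* Give the slot back first... */
		cb->held_slot = -1;
		/* fall through */
	case 1:
		/* ...then recover exactly as if no reply had arrived. */
		cb->fault_marked = true;
		cb->restart = true;
		break;
	case -NFS4ERR_DELAY:
		/* Retry later on the same slot; no fault is declared. */
		cb->seq_status = 1;
		cb->restart = true;
		break;
	default:
		/* Unexpected error: declare a fault and free the slot. */
		cb->fault_marked = true;
		cb->held_slot = -1;
		break;
	}
}
```

The point of the fallthrough is that the three protocol errors differ from case 1 only in handing the slot back before recovery starts.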
> > >
> > > Some questions:
> > >
> > > Why does it matter whether NFSD keeps the slot if both sides plan to
> > > destroy the session?
> > >
> >
> > It may not be required, but there is no reason to hold onto the slot in
> > these cases.
>
> In the BADSLOT case, if the slot is released, then another session
> consumer on the NFS server can use it and will encounter the same error.
> Best to keep it in the penalty box, IMO.
>
There is another problem here too. Once the session is reconstituted,
there is no guarantee that the slot the call is sitting on will still
be valid, since the new CB slot table may be smaller than before. I
think we need to release the slot in these cases for that reason
alone.
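
A minimal sketch of the problem, assuming a simple bitmap slot table whose size comes from ca_maxrequests in the back channel attributes negotiated by CREATE_SESSION. struct cb_slot_table, slot_still_valid() and slot_acquire() are hypothetical and do not reflect how nfsd actually lays out its slot table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical backchannel slot table; at most 64 slots in this sketch. */
struct cb_slot_table {
	unsigned int max_slots; /* from the back channel's ca_maxrequests */
	uint64_t busy;          /* bit i set means slot i carries a callback */
};

/* A slot index held across a session rebuild is only usable if the
 * new table still has that many slots. */
static bool slot_still_valid(const struct cb_slot_table *tbl, int slot)
{
	return slot >= 0 && (unsigned int)slot < tbl->max_slots;
}

/* Take the lowest free slot, or return -1 if every slot is busy. */
static int slot_acquire(struct cb_slot_table *tbl)
{
	assert(tbl->max_slots <= 64);
	for (unsigned int i = 0; i < tbl->max_slots; i++) {
		uint64_t bit = UINT64_C(1) << i;

		if (!(tbl->busy & bit)) {
			tbl->busy |= bit;
			return (int)i;
		}
	}
	return -1;
}
```

With an old table of 8 slots, a call parked on slot 5 fails slot_still_valid() against a rebuilt table of 2 slots, so the only safe course is to drop the old index and acquire a fresh slot from the new table.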
> If there are other slots, they are likely still usable. An
> implementation can choose to continue using the session rather than
> scuttling it immediately. In the past, with a single backchannel slot,
> NFSD had no choice but to replace the session. But now it can be more
> conservative.
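
One way to read the penalty-box suggestion, as a minimal sketch: a slot that drew NFS4ERR_BADSLOT is retired for the life of the session, other callbacks keep using the remaining slots, and the session is replaced only when none are left. struct cb_slots, slot_quarantine(), cb_slot_get() and session_needs_replacement() are hypothetical helpers, not existing nfsd code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-session slot state; at most 64 slots in this sketch. */
struct cb_slots {
	unsigned int max_slots;
	uint64_t busy;        /* slot currently carries a callback */
	uint64_t quarantined; /* slot retired after NFS4ERR_BADSLOT */
};

/* Move a slot into the penalty box: the callback gives it up, and no
 * other callback may pick it up and hit the same error. */
static void slot_quarantine(struct cb_slots *s, unsigned int slot)
{
	uint64_t bit = UINT64_C(1) << slot;

	assert(slot < s->max_slots && s->max_slots <= 64);
	s->busy &= ~bit;
	s->quarantined |= bit;
}

/* Take the lowest slot that is neither busy nor quarantined, or -1. */
static int cb_slot_get(struct cb_slots *s)
{
	for (unsigned int i = 0; i < s->max_slots && i < 64; i++) {
		uint64_t bit = UINT64_C(1) << i;

		if (!((s->busy | s->quarantined) & bit)) {
			s->busy |= bit;
			return (int)i;
		}
	}
	return -1;
}

/* The session only has to be replaced once every slot is retired. */
static bool session_needs_replacement(const struct cb_slots *s)
{
	uint64_t all = s->max_slots >= 64 ? UINT64_MAX
					  : (UINT64_C(1) << s->max_slots) - 1;

	return (s->quarantined & all) == all;
}
```

Note that the quarantine bitmap belongs to the session, so it has to be discarded along with the old slot table when the session is rebuilt.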
>
>
> > Also, at this point, only nfsd has declared that it needs
> > a new session (see below).
>
> If the client's backchannel service has returned BADSESSION, then the
> client already knows this session is unusable.
>
>
> > > Also, AFAICT marking CB_FAULT does not destroy the session, it simply
> > > tries to recreate the backchannel's rpc_clnt. Perhaps NFSD's callback code
> > > should actively destroy the session and let the client drive a fresh
> > > CREATE_SESSION to recover?
> > >
> >
> > Marking it with a fault just sets the cl_cb_state to NFSD4_CB_FAULT.
> > Then, on the next SEQUENCE call, that makes nfsd set
> > SEQ4_STATUS_BACKCHANNEL_FAULT, which should make the client recreate
> > the session. Obviously, there is some delay involved there since we
> > might have to wait for the client to do a lease renewal before this
> > happens.
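
A minimal sketch of that two-step signalling, assuming a simplified client structure. struct client_model, enum cb_state, mark_cb_fault() and sequence_status_flags() are hypothetical stand-ins for cl_cb_state, nfsd4_mark_cb_fault() and the SEQUENCE reply path; only SEQ4_STATUS_BACKCHANNEL_FAULT and its value come from RFC 8881, and other callback states are left out.

```c
#include <assert.h>
#include <stdint.h>

/* SEQUENCE sr_status_flags bit, value as defined in RFC 8881. */
#define SEQ4_STATUS_BACKCHANNEL_FAULT 0x00000400

/* Hypothetical mirror of cl_cb_state; enum values are illustrative. */
enum cb_state { CB_UP, CB_FAULT };

struct client_model {
	enum cb_state cb_state;
};

/* Step 1: a failed callback only records the fault on the client. */
static void mark_cb_fault(struct client_model *clp)
{
	assert(clp);
	clp->cb_state = CB_FAULT;
}

/* Step 2: the next forechannel SEQUENCE reports it in sr_status_flags,
 * which is the client's cue to recreate the session. */
static uint32_t sequence_status_flags(const struct client_model *clp)
{
	return clp->cb_state == CB_FAULT ? SEQ4_STATUS_BACKCHANNEL_FAULT : 0;
}
```

Nothing reaches the client until it sends its next SEQUENCE, which is where the delay mentioned above comes from.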
> >
> > >
> > > > Fixes: 7ba6cad6c88f ("nfsd: New helper nfsd4_cb_sequence_done() for processing more cb errors")
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > > fs/nfsd/nfs4callback.c | 27 +++++++++++----------------
> > > > 1 file changed, 11 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > index e12205ef16ca932ffbcc86d67b0817aec2436c89..bfc9de1fcb67b4f05ed2f7a28038cd8290809c17 100644
> > > > --- a/fs/nfsd/nfs4callback.c
> > > > +++ b/fs/nfsd/nfs4callback.c
> > > > @@ -1371,17 +1371,24 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> > > > nfsd4_mark_cb_fault(cb->cb_clp);
> > > > ret = false;
> > > > break;
> > > > + case -NFS4ERR_BADSESSION:
> > > > + case -NFS4ERR_BADSLOT:
> > > > + case -NFS4ERR_SEQ_MISORDERED:
> > > > + /*
> > > > + * These errors indicate that something has gone wrong
> > > > + * with the server and client's synchronization. Release
> > > > + * the slot, but handle it as if we hadn't gotten a reply.
> > > > + */
> > > > + nfsd41_cb_release_slot(cb);
> > > > + fallthrough;
> > > > case 1:
> > > > /*
> > > > * cb_seq_status remains 1 if an RPC Reply was never
> > > > * received. NFSD can't know if the client processed
> > > > * the CB_SEQUENCE operation. Ask the client to send a
> > > > - * DESTROY_SESSION to recover.
> > > > + * DESTROY_SESSION to recover, but keep the slot.
> > > > */
> > > > - fallthrough;
> > > > - case -NFS4ERR_BADSESSION:
> > > > nfsd4_mark_cb_fault(cb->cb_clp);
> > > > - ret = false;
> > > > goto need_restart;
> > > > case -NFS4ERR_DELAY:
> > > > cb->cb_seq_status = 1;
> > > > @@ -1390,14 +1397,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> > > >
> > > > rpc_delay(task, 2 * HZ);
> > > > return false;
> > > > - case -NFS4ERR_BADSLOT:
> > > > - goto retry_nowait;
> > > > - case -NFS4ERR_SEQ_MISORDERED:
> > > > - if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
> > > > - session->se_cb_seq_nr[cb->cb_held_slot] = 1;
> > > > - goto retry_nowait;
> > > > - }
> > > > - break;
> > > > default:
> > > > nfsd4_mark_cb_fault(cb->cb_clp);
> > > > }
> > > > @@ -1405,10 +1404,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> > > > nfsd41_cb_release_slot(cb);
> > > > out:
> > > > return ret;
> > > > -retry_nowait:
> > > > - if (rpc_restart_call_prepare(task))
> > > > - ret = false;
> > > > - goto out;
> > > > need_restart:
> > > > if (!test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) {
> > > > trace_nfsd_cb_restart(clp, cb);
> > > >
> > >
> > >
> >
>
>
--
Jeff Layton <jlayton@kernel.org>