From: Jeff Layton <jlayton@kernel.org>
To: Tom Talpey <tom@talpey.com>, Chuck Lever <chuck.lever@oracle.com>,
Neil Brown <neilb@suse.de>,
Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <Dai.Ngo@oracle.com>,
"J. Bruce Fields" <bfields@fieldses.org>
Cc: linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED error better
Date: Sat, 08 Feb 2025 14:08:16 -0500 [thread overview]
Message-ID: <66532654ca25280ffa30168a977601ba4a37aaab.camel@kernel.org> (raw)
In-Reply-To: <40970e33-4689-4623-a423-b346e739ba80@talpey.com>
On Sat, 2025-02-08 at 13:40 -0500, Tom Talpey wrote:
> On 2/8/2025 10:02 AM, Jeff Layton wrote:
> > On Sat, 2025-02-08 at 12:01 -0500, Chuck Lever wrote:
> > > On 2/7/25 4:53 PM, Jeff Layton wrote:
> > > > For NFS4ERR_SEQ_MISORDERED, do one attempt with a seqid of 1, and then
> > > > fall back to treating it like a BADSLOT if that fails.
> > > >
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > > fs/nfsd/nfs4callback.c | 16 ++++++++++------
> > > > 1 file changed, 10 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > index 10067a34db3afff8d4e4383854ab9abd9767c2d6..d6e3e8bb2efabadda9f922318880e12e1cb2c23f 100644
> > > > --- a/fs/nfsd/nfs4callback.c
> > > > +++ b/fs/nfsd/nfs4callback.c
> > > > @@ -1393,6 +1393,16 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> > > > goto requeue;
> > > > rpc_delay(task, 2 * HZ);
> > > > return false;
> > > > + case -NFS4ERR_SEQ_MISORDERED:
> > > > + /*
> > > > + * Reattempt once with seq_nr 1. If that fails, treat this
> > > > + * like BADSLOT.
> > > > + */
> > >
> > > Nit: this comment says exactly what the code says. If it were me, I'd
> > > remove it. Is there a "why" statement that could be made here? Like,
> > > why retry with a seq_nr of 1 instead of just failing immediately?
> > >
> >
> > There isn't one that I know of. It looks like Kinglong Mee added it in
> > 7ba6cad6c88f, but there is no real mention of that in the changelog.
> >
> > TBH, I'm not enamored with this remedy either. What if the seq_nr was 2
> > when we got this error, and we then retry with a seq_nr of 1? Does the
> > server then treat that as a retransmission?
>
> So I assume you mean the requester sent seq_nr 1, saw a reply and sent a
> subsequent seq_nr 2, to which it gets SEQ_MISORDERED.
>
> If so, yes definitely backing up the seq_nr to 1 will result in the
> peer considering it to be a retransmission, which will be bad.
>
Yes, that's what I meant.
> > We might be best off
> > dropping this and just always treating it like BADSLOT.
>
> But, why would this happen? Usually I'd think the peer sent seq_nr X
> before it received a reply to seq_nr X-1, which would be a peer bug.
>
> OTOH, SEQ_MISORDERED is a valid response to an in-progress retry. So,
> how does the requester know the difference?
>
> If treating it as BADSLOT completely resets the sequence, then sure,
> but either a) the request is still in-progress, or b) if a bug is
> causing the situation, well it's not going to converge on a functional
> session.
>
With this patchset, on BADSLOT, we'll set SEQ4_STATUS_BACKCHANNEL_FAULT
in the next forechannel SEQUENCE on the session. That should cause the
client to (eventually) send a DESTROY_SESSION and create a new one.
Unfortunately, in the meantime, because of the way the callback channel
update works, the server can end up trying to send the callback again
on the same session (and maybe more than once). I'm not sure that
that's a real problem per-se, but it's less than ideal.
> Not sure I have a solid suggestion right now. Whatever the fix, it
> should capture any subtlety in a comment.
>
At this point, I'm leaning toward just treating it like BADSLOT.
Basically, mark the backchannel faulty, and leak the slot so that
nothing else uses it. That allows us to send backchannel requests on
the other slots until the session gets recreated.
> >
> > Thoughts?
> >
> > >
> > > > + if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
> > > > + session->se_cb_seq_nr[cb->cb_held_slot] = 1;
> > > > + goto retry_nowait;
> > > > + }
> > > > + fallthrough;
> > > > case -NFS4ERR_BADSLOT:
> > > > /*
> > > > * BADSLOT means that the client and server are out of sync
> > > > @@ -1403,12 +1413,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> > > > nfsd4_mark_cb_fault(cb->cb_clp);
> > > > cb->cb_held_slot = -1;
> > > > goto retry_nowait;
> > > > - case -NFS4ERR_SEQ_MISORDERED:
> > > > - if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
> > > > - session->se_cb_seq_nr[cb->cb_held_slot] = 1;
> > > > - goto retry_nowait;
> > > > - }
> > > > - break;
> > > > default:
> > > > nfsd4_mark_cb_fault(cb->cb_clp);
> > > > }
> > > >
> > >
> > >
> >
>
--
Jeff Layton <jlayton@kernel.org>
next prev parent reply other threads:[~2025-02-08 19:08 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-07 21:53 [PATCH v5 0/7] nfsd: CB_SEQUENCE error handling fixes and cleanups Jeff Layton
2025-02-07 21:53 ` [PATCH v5 1/7] nfsd: prepare nfsd4_cb_sequence_done() for error handling rework Jeff Layton
2025-02-07 21:53 ` [PATCH v5 2/7] nfsd: always release slot when requeueing callback Jeff Layton
2025-02-08 16:57 ` Chuck Lever
2025-02-08 17:55 ` Jeff Layton
2025-02-07 21:53 ` [PATCH v5 3/7] nfsd: only check RPC_SIGNALLED() when restarting rpc_task Jeff Layton
2025-02-08 16:59 ` Chuck Lever
2025-02-07 21:53 ` [PATCH v5 4/7] nfsd: when CB_SEQUENCE gets ESERVERFAULT don't increment seq_nr Jeff Layton
2025-02-08 17:13 ` Chuck Lever
2025-02-07 21:53 ` [PATCH v5 5/7] nfsd: handle CB_SEQUENCE NFS4ERR_BADSLOT better Jeff Layton
2025-02-07 21:53 ` [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED error better Jeff Layton
2025-02-08 17:01 ` Chuck Lever
2025-02-08 18:02 ` Jeff Layton
2025-02-08 18:40 ` Tom Talpey
2025-02-08 19:08 ` Jeff Layton [this message]
2025-02-08 19:18 ` Tom Talpey
2025-02-08 20:45 ` Jeff Layton
2025-02-08 21:07 ` Chuck Lever
2025-02-09 1:24 ` Tom Talpey
2025-02-09 2:14 ` Jeff Layton
2025-02-09 16:26 ` Tom Talpey
2025-02-09 16:51 ` Jeff Layton
2025-02-09 16:58 ` Tom Talpey
2025-02-09 17:05 ` Jeff Layton
2025-02-09 18:52 ` Tom Talpey
2025-02-07 21:53 ` [PATCH v5 7/7] nfsd: lift NFSv4.0 handling out of nfsd4_cb_sequence_done() Jeff Layton
2025-02-08 17:05 ` Chuck Lever
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=66532654ca25280ffa30168a977601ba4a37aaab.camel@kernel.org \
--to=jlayton@kernel.org \
--cc=Dai.Ngo@oracle.com \
--cc=bfields@fieldses.org \
--cc=chuck.lever@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=neilb@suse.de \
--cc=okorniev@redhat.com \
--cc=tom@talpey.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox