linux-nfs.vger.kernel.org archive mirror
From: Dai Ngo <dai.ngo@oracle.com>
To: Chuck Lever <chuck.lever@oracle.com>, Christoph Hellwig <hch@lst.de>
Cc: jlayton@kernel.org, neilb@ownmail.net, okorniev@redhat.com,
	tom@talpey.com, linux-nfs@vger.kernel.org
Subject: Re: [PATCH 2/3] NFSD: Do not fence the client on NFS4ERR_RETRY_UNCACHED_REP error
Date: Mon, 3 Nov 2025 12:36:48 -0800
Message-ID: <5195fb82-0ebc-43a5-9b9a-54ad0d74e92c@oracle.com>
In-Reply-To: <40c969cf-9898-48f5-88bb-d6bed7b54a9c@oracle.com>


On 11/3/25 12:15 PM, Chuck Lever wrote:
> On 11/3/25 2:14 PM, Dai Ngo wrote:
>>>    and I disagree that fencing is harsh, because
>>> NFS4ERR_RETRY_UNCACHED_REP is supposed to be quite rare, and of course
>>> there are other ways this error can happen.
>> Yes, this error should be rare. But is fencing the client the correct
>> solution for it? IMHO, NFS4ERR_RETRY_UNCACHED_REP means the client has
>> received and replied to the server; the server somehow did not see the
>> reply, for any of a number of reasons.
> Fencing seems appropriate when there is a clear indication that the
> client and server state are out of sync. The question is why, and how do
> we prevent that situation from occurring? And, when we get into this
> state, what is the correct recovery?
>
> I don't see NFSD doing this short-circuit when processing a CB_RECALL
> response, for instance.
>
>
>> I think in this case we should just
>> mark the back channel down and let the client recover it, instead of
>> fencing the client.
> Clearly the backchannel needs to recover properly from
> NFS4ERR_RETRY_UNCACHED_REP, and if it goes into a loop, something is not
> right. I don't think this is the correct fix for looping, either.
>
> I don't understand why, after the server indicates a backchannel fault,
> the client and server don't replace the session. The server is trying
> to re-use what is obviously an incorrect slot sequence ID; it shouldn't
> expect any different result by retrying.
>
> So, yes, there are one or more real bugs here. But ignoring a sign that
> state synchrony has been lost is not the right fix.
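The point that retrying with the same slot sequence ID cannot produce a different result can be illustrated with a simplified model of the per-slot check that the replier (the client, for backchannel callbacks) performs. This is a sketch that loosely follows the session slot and reply-cache rules in RFC 8881, not NFSD or Linux client code; the `Slot` class, `replier_check()`, and the string status values are hypothetical, and details such as slot table sizing and false retries are ignored.

```python
from dataclasses import dataclass
from typing import Optional

# NFSv4.1 status names, modeled as plain strings for readability (assumption).
NFS4_OK = "NFS4_OK"
NFS4ERR_RETRY_UNCACHED_REP = "NFS4ERR_RETRY_UNCACHED_REP"
NFS4ERR_SEQ_MISORDERED = "NFS4ERR_SEQ_MISORDERED"

SEQID_MOD = 2 ** 32  # slot sequence IDs are 32-bit and wrap around


@dataclass
class Slot:
    """Replier-side state for one session slot (hypothetical model)."""
    seqid: int = 0
    cached_reply: Optional[str] = None  # None means the reply was not cached


def replier_check(slot: Slot, seqid: int, cachethis: bool) -> str:
    """Simplified per-slot sequence ID check performed by the replier."""
    if seqid == (slot.seqid + 1) % SEQID_MOD:
        # A new request: execute it and advance the slot.
        slot.seqid = seqid
        reply = NFS4_OK
        slot.cached_reply = reply if cachethis else None
        return reply
    if seqid == slot.seqid:
        # A retry of the last request on this slot: replay the cached
        # reply if there is one, otherwise report that it was not cached.
        if slot.cached_reply is not None:
            return slot.cached_reply
        return NFS4ERR_RETRY_UNCACHED_REP
    # Any other sequence ID is out of order.
    return NFS4ERR_SEQ_MISORDERED


# The client executes the callback sent with seqid 5 without caching the
# reply, and that reply never reaches the server.
slot = Slot(seqid=4)
first = replier_check(slot, 5, cachethis=False)

# Resending with the same seqid leaves the slot unchanged, so every
# retry gets the same error back.
retries = [replier_check(slot, 5, cachethis=False) for _ in range(3)]
print(first, retries)
```

In this model a retry never changes the replier's slot state, so resending the same sequence ID returns NFS4ERR_RETRY_UNCACHED_REP indefinitely; only a request with a new sequence ID or a new session moves things forward.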

OK, I'll drop this patch. This error occurs because of the hard hang problem.
In that case, no backchannel recovery, new session, etc. takes place; the
callback code just keeps retrying forever.

Let's fix the hang problem first; we can take care of this error later
if it still happens.

-Dai


Thread overview: 22+ messages
2025-11-01 18:51 [PATCH 0/3] NFSD: Fix problem with nfsd4_scsi_fence_client Dai Ngo
2025-11-01 18:51 ` [PATCH 1/3] NFSD: Fix problem with nfsd4_scsi_fence_client using the wrong reservation type Dai Ngo
2025-11-03 11:42   ` Christoph Hellwig
2025-11-01 18:51 ` [PATCH 2/3] NFSD: Do not fence the client on NFS4ERR_RETRY_UNCACHED_REP error Dai Ngo
2025-11-03 11:45   ` Christoph Hellwig
2025-11-03 14:16     ` Chuck Lever
2025-11-03 18:50       ` Dai Ngo
2025-11-03 18:57         ` Chuck Lever
2025-11-03 19:14           ` Dai Ngo
2025-11-03 20:03             ` Dai Ngo
2025-11-03 20:15             ` Chuck Lever
2025-11-03 20:36               ` Dai Ngo [this message]
2025-11-03 19:22         ` Jeff Layton
2025-11-03 19:36           ` Dai Ngo
2025-11-03 19:40             ` Jeff Layton
2025-11-01 18:51 ` [PATCH 3/3] NFSD: Add trace point for SCSI fencing operation Dai Ngo
2025-11-02 15:40   ` Chuck Lever
2025-11-03 20:44     ` Dai Ngo
2025-11-03 21:00       ` Chuck Lever
2025-11-04  0:32     ` Dai Ngo
2025-11-04 14:05       ` Chuck Lever
  -- strict thread matches above, loose matches on Subject: below --
2025-11-01 18:25 Dai Ngo
2025-11-01 18:25 ` [PATCH 2/3] NFSD: Do not fence the client on NFS4ERR_RETRY_UNCACHED_REP error Dai Ngo
