From: Jeff Layton <jlayton@kernel.org>
To: Dai Ngo <dai.ngo@oracle.com>,
	Chuck Lever <chuck.lever@oracle.com>,
	 Christoph Hellwig <hch@lst.de>
Cc: neilb@ownmail.net, okorniev@redhat.com, tom@talpey.com,
	 linux-nfs@vger.kernel.org
Subject: Re: [PATCH 2/3] NFSD: Do not fence the client on NFS4ERR_RETRY_UNCACHED_REP error
Date: Mon, 03 Nov 2025 14:22:13 -0500
Message-ID: <ae427b4ffdb219a64abd7d68680240d9798af845.camel@kernel.org>
In-Reply-To: <b8489e0f-550c-4c63-8429-fb2d44f24c0e@oracle.com>

On Mon, 2025-11-03 at 10:50 -0800, Dai Ngo wrote:
> On 11/3/25 6:16 AM, Chuck Lever wrote:
> > On 11/3/25 6:45 AM, Christoph Hellwig wrote:
> > > On Sat, Nov 01, 2025 at 11:51:34AM -0700, Dai Ngo wrote:
> > > > NFS4ERR_RETRY_UNCACHED_REP error means client has seen and replied
> > > > to the layout recall, no fencing is needed.
> > > RFC 5661 specifies that error as:
> > > 
> > >    The requester has attempted a retry of a Compound that it previously
> > >    requested not be placed in the reply cache.
> > > 
> > > which to me doesn't imply a positive action here.
> > Agreed, this status code seems like a loss of synchronization of session
> > state between the client and server, or an implementation bug. Ie, it
> > seems to mean that at the very least, session re-negotiation is needed,
> > at first blush. Should the server mark a callback channel FAULT, for
> > instance?
> > 
> > 
> > > But I'm not an
> > > expert at reply cache semantics, so I'll leave others to correct me.
> > > But please add a more detailed comment and commit log as this is
> > > completely unintuitive.
> > The session state and the state of the layout are at two different
> > and separate layers. Connect the dots to show that not fencing is
> > the correct action and will result in recovery of full backchannel
> > operation while maintaining the integrity of the file's content.
> > 
> > So IMHO reviewers need this patch description to provide:
> > 
> > - How this came up during your testing (and maybe a small reproducer)
> > 
> > - An explanation of why leaving the client unfenced is appropriate
> > 
> > - A discussion of what will happen when the server subsequently sends
> >    another operation on this session slot
> 
> Here is the sequence of events that leads to NFS4ERR_RETRY_UNCACHED_REP:
> 
> 1. Server sends CB_LAYOUTRECALL with stateID seqid 2
> 2. Client replies NFS4ERR_NOMATCHING_LAYOUT
> 3. Server does not receive the reply due to hard hang - no server thread
>     available to service the reply (I will post a fix for this problem)
> 4. Server RPC times out waiting for the reply, nfsd4_cb_sequence_done
>     is called with cb_seq_status == 1, nfsd4_mark_cb_fault is called
>     and the request is re-queued.
> 5. Client receives the same CB_LAYOUTRECALL with stateID seqid 2
>     again and this time client replies with NFS4ERR_RETRY_UNCACHED_REP.
> 
> This process repeats forever from step 4.
> 
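
The loop in steps 4 and 5 above can be sketched with a toy model of slot
replay detection. This is only an illustration, not nfsd or client code:
BackchannelSlot, handle() and simulate() are hypothetical names, the slot
sequence ID used here is the CB_SEQUENCE one (not the layout stateID seqid
mentioned in the steps), and it assumes the server sends the callback with
cachethis set to false.

```python
from dataclasses import dataclass
from typing import Callable, Optional

NFS4ERR_NOMATCHING_LAYOUT = "NFS4ERR_NOMATCHING_LAYOUT"
NFS4ERR_RETRY_UNCACHED_REP = "NFS4ERR_RETRY_UNCACHED_REP"
NFS4ERR_SEQ_MISORDERED = "NFS4ERR_SEQ_MISORDERED"


@dataclass
class BackchannelSlot:
    """Hypothetical client-side state for one backchannel slot."""
    last_seqid: int = 0
    cached_reply: Optional[str] = None

    def handle(self, seqid: int, cachethis: bool,
               execute: Callable[[], str]) -> str:
        if seqid == self.last_seqid:
            # Same sequence ID as the last request: this is a retry.
            if self.cached_reply is not None:
                return self.cached_reply
            # Nothing was cached, so the original reply cannot be replayed.
            return NFS4ERR_RETRY_UNCACHED_REP
        if seqid != self.last_seqid + 1:
            return NFS4ERR_SEQ_MISORDERED
        # New request: execute it and remember the reply only if asked to.
        reply = execute()
        self.last_seqid = seqid
        self.cached_reply = reply if cachethis else None
        return reply


def simulate(retransmits: int, cachethis: bool = False) -> list:
    """Replay steps 1-5: one CB_LAYOUTRECALL, then retransmissions."""
    slot = BackchannelSlot()
    recall = lambda: NFS4ERR_NOMATCHING_LAYOUT
    # Steps 1-2: the client executes the recall; step 3: the reply is lost.
    replies = [slot.handle(1, cachethis, recall)]
    # Steps 4-5: the server times out and resends with the same sequence ID.
    for _ in range(retransmits):
        replies.append(slot.handle(1, cachethis, recall))
    return replies
```

In this model every retransmission gets NFS4ERR_RETRY_UNCACHED_REP for as
long as the server keeps resending with the same sequence ID, which matches
the "repeats forever" behaviour described in the steps.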

I'm a little confused here. I could see that you might not be able to
process a LAYOUTRETURN if all nfsd threads were blocked waiting for the
break_layout(), but I don't get why that would block a CB_LAYOUTRECALL
reply.

For the server, CB_LAYOUTRECALL is a client RPC (server acts as client
and vice versa). A CB_LAYOUTRECALL shouldn't depend on having an nfsd
thread available, since it runs in the context of a workqueue thread.
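
As a rough userspace analogy of that point (not kernel code), the sketch
below uses two separate executors: nfsd_pool stands in for the nfsd threads
and callback_wq for the workqueue that handles callback RPCs. All names are
hypothetical, and the blocked workers only imitate threads stuck waiting in
break_layout().

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def demo(nfsd_threads: int = 2):
    release = threading.Event()
    nfsd_pool = ThreadPoolExecutor(max_workers=nfsd_threads)
    callback_wq = ThreadPoolExecutor(max_workers=1)
    try:
        # Saturate the pool: every worker blocks until 'release' is set.
        for _ in range(nfsd_threads):
            nfsd_pool.submit(release.wait)
        # A forechannel request queued now has no free worker to run it.
        layoutreturn = nfsd_pool.submit(lambda: "LAYOUTRETURN processed")
        # The callback reply is handled in a separate execution context.
        cb_reply = callback_wq.submit(
            lambda: "CB_LAYOUTRECALL reply processed")
        cb_result = cb_reply.result(timeout=5)
        stalled = not layoutreturn.done()
        # Unblock the pool so the queued LAYOUTRETURN can finally run.
        release.set()
        return cb_result, stalled, layoutreturn.result(timeout=5)
    finally:
        release.set()
        nfsd_pool.shutdown()
        callback_wq.shutdown()
```

In this model the callback reply completes while the LAYOUTRETURN is still
stalled behind the blocked workers.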

What am I missing?

-- 
Jeff Layton <jlayton@kernel.org>
