From: Dai Ngo <dai.ngo@oracle.com>
To: Jeff Layton <jlayton@kernel.org>,
Chuck Lever <chuck.lever@oracle.com>,
Christoph Hellwig <hch@lst.de>
Cc: neilb@ownmail.net, okorniev@redhat.com, tom@talpey.com,
linux-nfs@vger.kernel.org
Subject: Re: [PATCH 2/3] NFSD: Do not fence the client on NFS4ERR_RETRY_UNCACHED_REP error
Date: Mon, 3 Nov 2025 11:36:22 -0800
Message-ID: <a5d81349-c670-4aba-add5-c921a14c8e6a@oracle.com>
In-Reply-To: <ae427b4ffdb219a64abd7d68680240d9798af845.camel@kernel.org>
On 11/3/25 11:22 AM, Jeff Layton wrote:
> On Mon, 2025-11-03 at 10:50 -0800, Dai Ngo wrote:
>> On 11/3/25 6:16 AM, Chuck Lever wrote:
>>> On 11/3/25 6:45 AM, Christoph Hellwig wrote:
>>>> On Sat, Nov 01, 2025 at 11:51:34AM -0700, Dai Ngo wrote:
>>>>> NFS4ERR_RETRY_UNCACHED_REP error means client has seen and replied
>>>>> to the layout recall, no fencing is needed.
>>>> RFC 5661 specifies that error as:
>>>>
>>>> The requester has attempted a retry of a Compound that it previously
>>>> requested not be placed in the reply cache.
>>>>
>>>> which to me doesn't imply a positive action here.
>>> Agreed, this status code seems like a loss of synchronization of session
>>> state between the client and server, or an implementation bug. I.e., it
>>> seems to mean that at the very least, session re-negotiation is needed,
>>> at first blush. Should the server mark a callback channel FAULT, for
>>> instance?
>>>
>>>
>>>> But I'm not an
>>>> expert at reply cache semantics, so I'll leave others to correct me.
>>>> But please add a more detailed comment and commit log as this is
>>>> completely unintuitive.
>>> The session state and the state of the layout are at two different
>>> and separate layers. Connect the dots to show that not fencing is
>>> the correct action and will result in recovery of full backchannel
>>> operation while maintaining the integrity of the file's content.
>>>
>>> So IMHO reviewers need this patch description to provide:
>>>
>>> - How this came up during your testing (and maybe a small reproducer)
>>>
>>> - An explanation of why leaving the client unfenced is appropriate
>>>
>>> - A discussion of what will happen when the server subsequently sends
>>> another operation on this session slot
>> Here is the sequence of events that leads to NFS4ERR_RETRY_UNCACHED_REP:
>>
>> 1. Server sends CB_LAYOUTRECALL with stateID seqid 2
>> 2. Client replies NFS4ERR_NOMATCHING_LAYOUT
>> 3. Server does not receive the reply due to hard hang - no server thread
>> available to service the reply (I will post a fix for this problem)
>> 4. Server RPC times out waiting for the reply, nfsd4_cb_sequence_done
>> is called with cb_seq_status == 1, nfsd4_mark_cb_fault is called
>> and the request is re-queued.
>> 5. Client receives the same CB_LAYOUTRECALL with stateID seqid 2
>> again and this time client replies with NFS4ERR_RETRY_UNCACHED_REP.
>>
>> This process repeats forever from step 4.
>>
> I'm a little confused here. I could see that you might not be able to
> process a LAYOUTRETURN if all nfsd threads were blocked waiting for the
> break_layout(), but I don't get why that would block a CB_LAYOUTRECALL
> reply.
>
> For the server, CB_LAYOUTRECALL is a client RPC (server acts as client
> and vice versa). A CB_LAYOUTRECALL shouldn't depend on having a nfsd
> thread available, since it runs in the context of a workqueue thread.
>
> What am I missing?
This is the call stack when the server receives the callback reply:

    receive_cb_reply+1
    svc_tcp_recvfrom+3531
    svc_handle_xprt+3747
    svc_recv+511
    nfsd+588
    kthread+916
    ret_from_fork+479
    ret_from_fork_asm+26

As shown, it requires an NFSD thread to service it. The NFSD thread eventually
calls xprt_complete_rqst to wake up the RPC task waiting for the reply.
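
To make the interaction concrete, here is a toy model of the sequence
described earlier in this thread. It is only a sketch and not kernel code:
the Client and Server classes, the TIMEOUT status, and run() are hypothetical
names made up for illustration, and the client is assumed to detect a retry
by slot sequence number without caching its reply. The only thing it models
is that a callback reply is lost to the waiting RPC task when no NFSD thread
is free to run the receive path.

```python
from dataclasses import dataclass, field

# Status names from the thread; TIMEOUT is a made-up marker for this model.
NFS4ERR_NOMATCHING_LAYOUT = "NFS4ERR_NOMATCHING_LAYOUT"
NFS4ERR_RETRY_UNCACHED_REP = "NFS4ERR_RETRY_UNCACHED_REP"
TIMEOUT = "TIMEOUT"


@dataclass
class Client:
    """Toy client that does not cache callback replies (assumption)."""
    seen: set = field(default_factory=set)

    def handle_cb_layoutrecall(self, slot_seq):
        # A retry of an already-seen, uncached request cannot be replayed.
        if slot_seq in self.seen:
            return NFS4ERR_RETRY_UNCACHED_REP
        self.seen.add(slot_seq)
        return NFS4ERR_NOMATCHING_LAYOUT


@dataclass
class Server:
    """Toy server: the reply is only delivered if an NFSD thread is free."""
    free_nfsd_threads: int

    def send_callback(self, client, slot_seq):
        reply = client.handle_cb_layoutrecall(slot_seq)
        if self.free_nfsd_threads == 0:
            # Nobody runs the receive path, so the RPC task times out.
            return TIMEOUT
        return reply


def run(free_threads_per_attempt):
    """Send the same callback once per entry; return what the server sees."""
    client = Client()
    return [
        Server(free).send_callback(client, slot_seq=1)
        for free in free_threads_per_attempt
    ]
```

With run([1]) the server sees the client's original reply. With
run([0, 0, 1]) the first two attempts time out on the server side even though
the client answered both, and the server only ever sees
NFS4ERR_RETRY_UNCACHED_REP once a thread becomes available.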
-Dai
>