From: Chuck Lever <chuck.lever@oracle.com>
To: Jeff Layton <jlayton@kernel.org>, NeilBrown <neilb@suse.de>
Cc: Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
"J. Bruce Fields" <bfields@fieldses.org>,
Kinglong Mee <kinglongmee@gmail.com>,
Trond Myklebust <trondmy@kernel.org>,
Anna Schumaker <anna@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org
Subject: Re: [PATCH 1/8] nfsd: don't restart v4.1+ callback when RPC_SIGNALLED is set
Date: Sun, 26 Jan 2025 11:41:44 -0500
Message-ID: <ac5834d4-1465-4dde-a451-b0804c537f04@oracle.com>
In-Reply-To: <d52cf9b9b83753434c1b0098afe1b77bf65590d4.camel@kernel.org>
On 1/26/25 6:18 AM, Jeff Layton wrote:
> On Sun, 2025-01-26 at 10:01 +1100, NeilBrown wrote:
>> On Fri, 24 Jan 2025, Jeff Layton wrote:
>>> This is problematic, since the RPC might have been entirely successful.
>>> There is no point in restarting a v4.1+ callback just because
>>> RPC_SIGNALLED is true. The v4.1+ error handling has other mechanisms for
>>> detecting when it should retransmit the RPC.
>>
>> But why might RPC_SIGNALLED() ever be true?
>> The flag RPC_TASK_SIGNALLED is only ever set by rpc_signal_task() which
>> is only called from rpc_killall_tasks() and __rpc_execute() for
>> non-async tasks which doesn't apply to nfsd callbacks as they are
>> started with rpc_call_async().
>>
>> rpc_killall_tasks() is called by fs/nfs/ which isn't relevant for us,
>> and from rpc_shutdown_client(). In those cases we certainly don't want
>> the request to be retried, though the nfsd4_process_cb_update() case is
>> a little interesting as we do want it to be retried, but in a different
>> client.
>>
>> So the code you are removing is either dead code because something else
>> will prevent the restart when a client is being shut down, or it is bad
>> code because it would delay rpc_shutdown_client() while the request is
>> retried.
>>
>> I haven't dug the extra step to figure out which, but either way I think
>> the code should go.
>
> Thanks. That was my analysis too.

Agreed, this code is problematic, but it appears to me that removing the
signal check alone does not resolve some of these problems.

> rpc_shutdown_client() is called when we tear down and rebuild the
> rpc_client. nfsd does this in setup_callback_client(), which gets
> called from nfsd4_process_cb_update() (basically when we detect that
> the backchannel is having problems).
>
> There are really only two states: We either got a reply from the server
> before the client went down, or we didn't. In the case where we got a
> reply, there is no need to retry anything. In the case where we didn't,
> the tk_status will be '1', so there is no need to check RPC_SIGNALLED()
> at all here.

FWIW, the "cb_seq_status == 1" arm already skips the signal check in the
current code.

> The existing code could lead to the call being retried when we had
> already gotten a perfectly valid reply.

Here's a case-by-case audit:

- NFS_OK: SEQUENCE was decoded and passed sanity checks, so this arm
  should never requeue. The callback might still be requeued during
  subsequent processing.
- ESERVERFAULT: SEQUENCE was decoded but failed sanity checking. The
  reply should be dropped now, and the session marked FAULT. No requeue
  is ever needed here.
  [ I question whether the sequence number should be bumped in this
  case -- the client's callback server replied with the identity of
  some other slot. And anyway, this session is about to become
  toast. ]
- 1: The timeout case. We want a fresh session here, so it falls
  through to BADSESSION.
- NFS4ERR_BADSESSION: This needs a requeue so that the operation can
  be retried with a fresh session, but it should always check whether
  the rpc_clnt is shutting down before doing so. This is a problem in
  the current code.
- NFS4ERR_DELAY: Skips the signal check, but shouldn't. If the rpc_clnt
  is shutting down, this RPC should not be requeued.
- NFS4ERR_BADSLOT: Skips the signal check, but shouldn't. I need to
  think more about BADSLOT recovery best practice.
- NFS4ERR_SEQ_MISORDERED: Does one retry with a seq_nr of 1. It
  probably should terminate if that fails. IMO this should check for
  rpc_clnt shutdown before requeuing the retry.
- default: I don't think this case should ever be requeued, but it
  appears that it could be if the rpc_clnt is shutting down.

>>> Fixes: 7ba6cad6c88f ("nfsd: New helper nfsd4_cb_sequence_done() for processing more cb errors")
>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>> ---
>>> fs/nfsd/nfs4callback.c | 3 ---
>>> 1 file changed, 3 deletions(-)
>>>
>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>> index 50e468bdb8d4838b5217346dcc2bd0fec1765c1a..e12205ef16ca932ffbcc86d67b0817aec2436c89 100644
>>> --- a/fs/nfsd/nfs4callback.c
>>> +++ b/fs/nfsd/nfs4callback.c
>>> @@ -1403,9 +1403,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>> }
>>> trace_nfsd_cb_free_slot(task, cb);
>>> nfsd41_cb_release_slot(cb);
>>> -
>>> - if (RPC_SIGNALLED(task))
>>> - goto need_restart;
>>> out:
>>> return ret;
>>> retry_nowait:
>>>
>>> --
>>> 2.48.1
>>>
>>>
>>
>
--
Chuck Lever