From: Chuck Lever <chuck.lever@oracle.com>
To: Dai Ngo <dai.ngo@oracle.com>
Cc: jlayton@kernel.org, linux-nfs@vger.kernel.org
Subject: Re: [PATCH 1/1] NFSD: cancel CB_RECALL_ANY call when nfs4_client is about to be destroyed
Date: Fri, 29 Mar 2024 10:55:54 -0400 [thread overview]
Message-ID: <ZgbWevtNp8Vust4A@tissot.1015granger.net> (raw)
In-Reply-To: <c97be8b9-c0ba-4f2d-9340-78368008ba4b@oracle.com>
On Thu, Mar 28, 2024 at 05:31:02PM -0700, Dai Ngo wrote:
>
> On 3/28/24 11:14 AM, Dai Ngo wrote:
> >
> > On 3/28/24 7:08 AM, Chuck Lever wrote:
> > > On Wed, Mar 27, 2024 at 06:09:28PM -0700, Dai Ngo wrote:
> > > > On 3/26/24 11:27 AM, Chuck Lever wrote:
> > > > > On Tue, Mar 26, 2024 at 11:13:29AM -0700, Dai Ngo wrote:
> > > > > > Currently when a nfs4_client is destroyed we wait for
> > > > > > the cb_recall_any
> > > > > > callback to complete before proceed. This adds
> > > > > > unnecessary delay to the
> > > > > > __destroy_client call if there is problem communicating
> > > > > > with the client.
> > > > > By "unnecessary delay" do you mean only the seven-second RPC
> > > > > retransmit timeout, or is there something else?
> > > > when the client network interface is down, the RPC task takes ~9s to
> > > > send the callback, waits for the reply and gets ETIMEDOUT. This process
> > > > repeats in a loop with the same RPC task before being stopped by
> > > > rpc_shutdown_client after client lease expires.
> > > I'll have to review this code again, but rpc_shutdown_client
> > > should cause these RPCs to terminate immediately and safely. Can't
> > > we use that?
> >
> > rpc_shutdown_client works, it terminated the RPC call to stop the loop.
> >
> > >
> > >
> > > > It takes a total of about 1m20s before the CB_RECALL is terminated.
> > > > For CB_RECALL_ANY and CB_OFFLOAD, this process gets in to a infinite
> > > > loop since there is no delegation conflict and the client is allowed
> > > > to stay in courtesy state.
> > > >
> > > > The loop happens because in nfsd4_cb_sequence_done if cb_seq_status
> > > > is 1 (an RPC Reply was never received) it calls nfsd4_mark_cb_fault
> > > > to set the NFSD4_CB_FAULT bit. It then sets cb_need_restart to true.
> > > > When nfsd4_cb_release is called, it checks cb_need_restart bit and
> > > > re-queues the work again.
> > > Something in the sequence_done path should check if the server is
> > > tearing down this callback connection. If it doesn't, that is a bug
> > > IMO.
>
> TCP terminated the connection after retrying for 16 minutes and
> notified the RPC layer which deleted the nfsd4_conn.
The server should have closed this connection already. Is it stuck
waiting for the client to respond to a FIN or something?
> But when nfsd4_run_cb_work ran again, it got into the infinite
> loop caused by:
> /*
> * XXX: Ideally, we could wait for the client to
> * reconnect, but I haven't figured out how
> * to do that yet.
> */
> nfsd4_queue_cb_delayed(cb, 25);
>
> which was introduced by c1ccfcf1a9bf. Note that I'm using 6.9-rc1.
The whole paragraph is:
1503 clnt = clp->cl_cb_client;
1504 if (!clnt) {
1505 if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags))
1506 nfsd41_destroy_cb(cb);
1507 else {
1508 /*
1509 * XXX: Ideally, we could wait for the client to
1510 * reconnect, but I haven't figured out how
1511 * to do that yet.
1512 */
1513 nfsd4_queue_cb_delayed(cb, 25);
1514 }
1515 return;
1516 }
When there's no rpc_clnt and CB_KILL is set, the callback
operation should be destroyed immediately. CB_KILL is set by
nfsd4_shutdown_callback. It's only caller is
__destroy_client().
Why isn't NFSD4_CLIENT_CB_KILL set?
--
Chuck Lever
next prev parent reply other threads:[~2024-03-29 14:56 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-26 18:13 [PATCH 1/1] NFSD: cancel CB_RECALL_ANY call when nfs4_client is about to be destroyed Dai Ngo
2024-03-26 18:27 ` Chuck Lever
2024-03-28 1:09 ` Dai Ngo
2024-03-28 14:08 ` Chuck Lever
2024-03-28 18:14 ` Dai Ngo
2024-03-29 0:31 ` Dai Ngo
2024-03-29 14:55 ` Chuck Lever [this message]
2024-03-29 17:57 ` Dai Ngo
2024-03-29 23:42 ` Chuck Lever
2024-03-30 17:46 ` Dai Ngo
2024-03-30 18:28 ` Chuck Lever
2024-03-30 23:30 ` Dai Ngo
2024-04-01 12:49 ` Jeff Layton
2024-04-01 13:34 ` Chuck Lever
2024-04-01 16:00 ` Dai Ngo
2024-04-01 16:46 ` Dai Ngo
2024-04-01 17:49 ` Chuck Lever
2024-04-01 19:55 ` Dai Ngo
2024-04-01 20:17 ` Dai Ngo
2024-04-02 13:58 ` Chuck Lever
2024-04-02 14:29 ` Dai Ngo
2024-04-01 16:11 ` Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZgbWevtNp8Vust4A@tissot.1015granger.net \
--to=chuck.lever@oracle.com \
--cc=dai.ngo@oracle.com \
--cc=jlayton@kernel.org \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox