From: Dai Ngo <dai.ngo@oracle.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: jlayton@kernel.org, linux-nfs@vger.kernel.org
Subject: Re: [PATCH 1/1] NFSD: cancel CB_RECALL_ANY call when nfs4_client is about to be destroyed
Date: Sat, 30 Mar 2024 10:46:08 -0700
Message-ID: <09da882f-4bad-4398-a2d0-ef8daefce374@oracle.com>
In-Reply-To: <ZgdR48dhdSoUj+VM@tissot.1015granger.net>
On 3/29/24 4:42 PM, Chuck Lever wrote:
> On Fri, Mar 29, 2024 at 10:57:22AM -0700, Dai Ngo wrote:
>> On 3/29/24 7:55 AM, Chuck Lever wrote:
>>> On Thu, Mar 28, 2024 at 05:31:02PM -0700, Dai Ngo wrote:
>>>> On 3/28/24 11:14 AM, Dai Ngo wrote:
>>>>> On 3/28/24 7:08 AM, Chuck Lever wrote:
>>>>>> On Wed, Mar 27, 2024 at 06:09:28PM -0700, Dai Ngo wrote:
>>>>>>> On 3/26/24 11:27 AM, Chuck Lever wrote:
>>>>>>>> On Tue, Mar 26, 2024 at 11:13:29AM -0700, Dai Ngo wrote:
>>>>>>>>> Currently when an nfs4_client is destroyed we wait for the
>>>>>>>>> cb_recall_any callback to complete before proceeding. This adds
>>>>>>>>> unnecessary delay to the __destroy_client call if there is a
>>>>>>>>> problem communicating with the client.
>>>>>>>> By "unnecessary delay" do you mean only the seven-second RPC
>>>>>>>> retransmit timeout, or is there something else?
>>>>>>> When the client's network interface is down, the RPC task takes ~9s to
>>>>>>> send the callback, waits for the reply, and gets ETIMEDOUT. This
>>>>>>> repeats in a loop with the same RPC task until rpc_shutdown_client
>>>>>>> stops it after the client's lease expires.
>>>>>> I'll have to review this code again, but rpc_shutdown_client
>>>>>> should cause these RPCs to terminate immediately and safely. Can't
>>>>>> we use that?
>>>>> rpc_shutdown_client works; it terminated the RPC call and stopped the loop.
>>>>>
>>>>>>> It takes a total of about 1m20s before the CB_RECALL is terminated.
>>>>>>> For CB_RECALL_ANY and CB_OFFLOAD, this process gets into an infinite
>>>>>>> loop since there is no delegation conflict and the client is allowed
>>>>>>> to stay in courtesy state.
>>>>>>>
>>>>>>> The loop happens because in nfsd4_cb_sequence_done if cb_seq_status
>>>>>>> is 1 (an RPC Reply was never received) it calls nfsd4_mark_cb_fault
>>>>>>> to set the NFSD4_CB_FAULT bit. It then sets cb_need_restart to true.
>>>>>>> When nfsd4_cb_release is called, it checks the cb_need_restart bit
>>>>>>> and re-queues the work.
>>>>>> Something in the sequence_done path should check if the server is
>>>>>> tearing down this callback connection. If it doesn't, that is a bug
>>>>>> IMO.
>>>> TCP terminated the connection after retrying for 16 minutes and
>>>> notified the RPC layer which deleted the nfsd4_conn.
>>> The server should have closed this connection already.
>> Since there is no delegation conflict, the client remains in courtesy
>> state.
>>
>>> Is it stuck
>>> waiting for the client to respond to a FIN or something?
>> No. In this case, since the client's network interface was down, the
>> server's TCP never received an ACK for its transmissions, so it kept
>> retrying.
> It sounds like the server attempts to maintain the client's
> transport while the client is in courtesy state?
Yes, TCP keeps retrying while the client is in courtesy state.
> I thought the
> purpose of courteous server was to more gracefully handle network
> partitions, in which case, the transport is not reliable.
>
> If a courtesy client hasn't re-established a connection with a
> backchannel by the time a conflicting open/lock request arrives,
> the server has no choice but to expire that client's courtesy
> state immediately. Right?
Yes, that is the case for CB_RECALL but not for CB_RECALL_ANY.
>
> But maybe that's a side-bar.
>
>
>>>> But when nfsd4_run_cb_work ran again, it got into the infinite
>>>> loop caused by:
>>>> /*
>>>> * XXX: Ideally, we could wait for the client to
>>>> * reconnect, but I haven't figured out how
>>>> * to do that yet.
>>>> */
>>>> nfsd4_queue_cb_delayed(cb, 25);
>>>>
>>>> which was introduced by c1ccfcf1a9bf. Note that I'm using 6.9-rc1.
>>> The whole paragraph is:
>>>
>>> 1503 clnt = clp->cl_cb_client;
>>> 1504 if (!clnt) {
>>> 1505 if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags))
>>> 1506 nfsd41_destroy_cb(cb);
>>> 1507 else {
>>> 1508 /*
>>> 1509 * XXX: Ideally, we could wait for the client to
>>> 1510 * reconnect, but I haven't figured out how
>>> 1511 * to do that yet.
>>> 1512 */
>>> 1513 nfsd4_queue_cb_delayed(cb, 25);
>>> 1514 }
>>> 1515 return;
>>> 1516 }
>>>
>>> When there's no rpc_clnt and CB_KILL is set, the callback
>>> operation should be destroyed immediately. CB_KILL is set by
>>> nfsd4_shutdown_callback. Its only caller is
>>> __destroy_client().
>>>
>>> Why isn't NFSD4_CLIENT_CB_KILL set?
>> Since __destroy_client was not called, for the reason mentioned above,
>> nfsd4_shutdown_callback was not called so NFSD4_CLIENT_CB_KILL was not
>> set.
> Well, then I'm missing something. You said, above:
>
>> Currently when a nfs4_client is destroyed we wait for
> And __destroy_client() invokes nfsd4_shutdown_callback(). Can you
> explain the usage scenario(s) to me again?
__destroy_client is not called in the case of CB_RECALL_ANY since
there is no open/lock conflict associated with the client.
>
>
>> Since the nfsd callback_wq was created with alloc_ordered_workqueue,
>> once this loop happens the nfsd callback_wq is stuck because this
>> workqueue executes at most one work item at any given time.
>>
>> This means that if an nfs client is shut down and the server is in this
>> state then all other clients are affected; all callbacks to other
>> clients are stuck in the workqueue. And if a callback for a client is
>> stuck in the workqueue then that client cannot unmount its shares.
>>
>> This is the symptom that was reported recently on this list. I think
>> the nfsd callback_wq should be created as a normal workqueue that allows
>> multiple work items to be executed at the same time so a problem with
>> one client does not affect other clients.
> My impression is that the callback_wq is single-threaded to avoid
> locking complexity. I'm not yet convinced it would be safe to simply
> change that workqueue to permit multiple concurrent work items.
Do you have any specific concern about lock complexity related to
callback operation?
>
> It could be straightforward, however, to move the callback_wq into
> struct nfs4_client so that each client can have its own workqueue.
> Then we can take some time and design something less brittle and
> more scalable (and maybe come up with some test infrastructure so
> this stuff doesn't break as often).
IMHO I don't see why the callback workqueue has to be different
from the laundry_wq or nfsd_filecache_wq used by nfsd.
-Dai
>
>