Linux NFS development
From: Dai Ngo <dai.ngo@oracle.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: jlayton@kernel.org, linux-nfs@vger.kernel.org
Subject: Re: [PATCH 1/1] NFSD: cancel CB_RECALL_ANY call when nfs4_client is about to be destroyed
Date: Sat, 30 Mar 2024 10:46:08 -0700	[thread overview]
Message-ID: <09da882f-4bad-4398-a2d0-ef8daefce374@oracle.com> (raw)
In-Reply-To: <ZgdR48dhdSoUj+VM@tissot.1015granger.net>


On 3/29/24 4:42 PM, Chuck Lever wrote:
> On Fri, Mar 29, 2024 at 10:57:22AM -0700, Dai Ngo wrote:
>> On 3/29/24 7:55 AM, Chuck Lever wrote:
>>> On Thu, Mar 28, 2024 at 05:31:02PM -0700, Dai Ngo wrote:
>>>> On 3/28/24 11:14 AM, Dai Ngo wrote:
>>>>> On 3/28/24 7:08 AM, Chuck Lever wrote:
>>>>>> On Wed, Mar 27, 2024 at 06:09:28PM -0700, Dai Ngo wrote:
>>>>>>> On 3/26/24 11:27 AM, Chuck Lever wrote:
>>>>>>>> On Tue, Mar 26, 2024 at 11:13:29AM -0700, Dai Ngo wrote:
>>>>>>>>> Currently when a nfs4_client is destroyed we wait for the
>>>>>>>>> cb_recall_any callback to complete before proceeding. This
>>>>>>>>> adds unnecessary delay to the __destroy_client call if there
>>>>>>>>> is a problem communicating with the client.
>>>>>>>> By "unnecessary delay" do you mean only the seven-second RPC
>>>>>>>> retransmit timeout, or is there something else?
>>>>>>> when the client network interface is down, the RPC task takes ~9s to
>>>>>>> send the callback, waits for the reply and gets ETIMEDOUT. This process
>>>>>>> repeats in a loop with the same RPC task before being stopped by
>>>>>>> rpc_shutdown_client after client lease expires.
>>>>>> I'll have to review this code again, but rpc_shutdown_client
>>>>>> should cause these RPCs to terminate immediately and safely. Can't
>>>>>> we use that?
>>>>> rpc_shutdown_client works; it terminated the RPC call and stopped the loop.
>>>>>
>>>>>>> It takes a total of about 1m20s before the CB_RECALL is terminated.
>>>>>>> For CB_RECALL_ANY and CB_OFFLOAD, this process gets into an infinite
>>>>>>> loop since there is no delegation conflict and the client is allowed
>>>>>>> to stay in courtesy state.
>>>>>>>
>>>>>>> The loop happens because in nfsd4_cb_sequence_done if cb_seq_status
>>>>>>> is 1 (an RPC Reply was never received) it calls nfsd4_mark_cb_fault
>>>>>>> to set the NFSD4_CB_FAULT bit. It then sets cb_need_restart to true.
>>>>>>> When nfsd4_cb_release is called, it checks cb_need_restart bit and
>>>>>>> re-queues the work again.
>>>>>> Something in the sequence_done path should check if the server is
>>>>>> tearing down this callback connection. If it doesn't, that is a bug
>>>>>> IMO.
>>>> TCP terminated the connection after retrying for 16 minutes and
>>>> notified the RPC layer which deleted the nfsd4_conn.
>>> The server should have closed this connection already.
>> Since there is no delegation conflict, the client remains in courtesy
>> state.
>>
>>>    Is it stuck
>>> waiting for the client to respond to a FIN or something?
>> No, in this case since the client network interface was down server
>> TCP did not even receive ACK for the transmit so the server TCP
>> kept retrying.
> It sounds like the server attempts to maintain the client's
> transport while the client is in courtesy state?

Yes, TCP keeps retrying while the client is in courtesy state.
  

>   I thought the
> purpose of courteous server was to more gracefully handle network
> partitions, in which case, the transport is not reliable.
>
> If a courtesy client hasn't re-established a connection with a
> backchannel by the time a conflicting open/lock request arrives,
> the server has no choice but to expire that client's courtesy
> state immediately. Right?

Yes, that is the case for CB_RECALL but not for CB_RECALL_ANY.

>
> But maybe that's a side-bar.
>
>
>>>> But when nfsd4_run_cb_work ran again, it got into the infinite
>>>> loop caused by:
>>>>        /*
>>>>         * XXX: Ideally, we could wait for the client to
>>>>         *      reconnect, but I haven't figured out how
>>>>         *      to do that yet.
>>>>         */
>>>>         nfsd4_queue_cb_delayed(cb, 25);
>>>>
>>>> which was introduced by c1ccfcf1a9bf. Note that I'm using 6.9-rc1.
>>> The whole paragraph is:
>>>
>>> 1503         clnt = clp->cl_cb_client;
>>> 1504         if (!clnt) {
>>> 1505                 if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags))
>>> 1506                         nfsd41_destroy_cb(cb);
>>> 1507                 else {
>>> 1508                         /*
>>> 1509                          * XXX: Ideally, we could wait for the client to
>>> 1510                          *      reconnect, but I haven't figured out how
>>> 1511                          *      to do that yet.
>>> 1512                          */
>>> 1513                         nfsd4_queue_cb_delayed(cb, 25);
>>> 1514                 }
>>> 1515                 return;
>>> 1516         }
>>>
>>> When there's no rpc_clnt and CB_KILL is set, the callback
>>> operation should be destroyed immediately. CB_KILL is set by
>>> nfsd4_shutdown_callback. It's only caller is
>>> __destroy_client().
>>>
>>> Why isn't NFSD4_CLIENT_CB_KILL set?
>> Since __destroy_client was not called, for the reason mentioned above,
>> nfsd4_shutdown_callback was not called so NFSD4_CLIENT_CB_KILL was not
>> set.
> Well, then I'm missing something. You said, above:
>
>> Currently when a nfs4_client is destroyed we wait for
> And __destroy_client() invokes nfsd4_shutdown_callback(). Can you
> explain the usage scenario(s) to me again?

__destroy_client is not called in the case of CB_RECALL_ANY since
there is no open/lock conflict associated with the client.

>
>
>> Since the nfsd callback_wq was created with alloc_ordered_workqueue,
>> once this loop happens the nfsd callback_wq is stuck since this workqueue
>> executes at most one work item at any given time.
>>
>> This means that if an nfs client is shut down and the server is in this
>> state then all other clients are affected; all callbacks to other
>> clients are stuck in the workqueue. And if a callback for a client is
>> stuck in the workqueue then that client cannot unmount its shares.
>>
>> This is the symptom that was reported recently on this list. I think
>> the nfsd callback_wq should be created as a normal workqueue that allows
>> multiple work items to be executed at the same time so a problem with
>> one client does not affect other clients.
> My impression is that the callback_wq is single-threaded to avoid
> locking complexity. I'm not yet convinced it would be safe to simply
> change that workqueue to permit multiple concurrent work items.

Do you have any specific concerns about locking complexity related to
the callback operations?

>
> It could be straightforward, however, to move the callback_wq into
> struct nfs4_client so that each client can have its own workqueue.
> Then we can take some time and design something less brittle and
> more scalable (and maybe come up with some test infrastructure so
> this stuff doesn't break as often).

IMHO I don't see why the callback workqueue has to be different
from the laundry_wq or nfsd_filecache_wq used by nfsd.

-Dai

>
>


Thread overview: 22+ messages
2024-03-26 18:13 [PATCH 1/1] NFSD: cancel CB_RECALL_ANY call when nfs4_client is about to be destroyed Dai Ngo
2024-03-26 18:27 ` Chuck Lever
2024-03-28  1:09   ` Dai Ngo
2024-03-28 14:08     ` Chuck Lever
2024-03-28 18:14       ` Dai Ngo
2024-03-29  0:31         ` Dai Ngo
2024-03-29 14:55           ` Chuck Lever
2024-03-29 17:57             ` Dai Ngo
2024-03-29 23:42               ` Chuck Lever
2024-03-30 17:46                 ` Dai Ngo [this message]
2024-03-30 18:28                   ` Chuck Lever
2024-03-30 23:30                     ` Dai Ngo
2024-04-01 12:49                       ` Jeff Layton
2024-04-01 13:34                         ` Chuck Lever
2024-04-01 16:00                           ` Dai Ngo
2024-04-01 16:46                             ` Dai Ngo
2024-04-01 17:49                               ` Chuck Lever
2024-04-01 19:55                                 ` Dai Ngo
2024-04-01 20:17                                   ` Dai Ngo
2024-04-02 13:58                                   ` Chuck Lever
2024-04-02 14:29                                     ` Dai Ngo
2024-04-01 16:11                           ` Jeff Layton
