From: Li Lingfeng <lilingfeng3@huawei.com>
To: <Dai.Ngo@oracle.com>, Chuck Lever <chuck.lever@oracle.com>,
Jeff Layton <jlayton@kernel.org>, NeilBrown <neilb@suse.de>,
<okorniev@redhat.com>, <tom@talpey.com>,
<trond.myklebust@hammerspace.com>
Cc: <linux-nfs@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
Yu Kuai <yukuai1@huaweicloud.com>, Hou Tao <houtao1@huawei.com>,
"zhangyi (F)" <yi.zhang@huawei.com>,
yangerkun <yangerkun@huawei.com>, <chengzhihao1@huawei.com>,
Li Lingfeng <lilingfeng@huaweicloud.com>
Subject: Re: [bug report] deploying both NFS client and server on the same machine triggle hungtask
Date: Thu, 28 Nov 2024 15:22:04 +0800
Message-ID: <8b155d3c-62b4-4f16-ab00-e3d030148d29@huawei.com>
In-Reply-To: <887cd8f6-3e49-410c-8b36-9e617c34ca6f@huawei.com>
Besides nfsd_file_shrinker, the nfsd_client_shrinker added by commit
7746b32f467b ("NFSD: add shrinker to reap courtesy clients on low memory
condition") in 2022 and the nfsd_reply_cache_shrinker added by commit
3ba75830ce17 ("nfsd4: drc containerization") in 2019 may also trigger such
an issue.
Was this scenario not considered when designing the shrinkers for NFSD, or
was it deemed unreasonable and not worth considering?
On 2024/11/25 19:17, Li Lingfeng wrote:
> Hi, we have found a hung task issue recently.
>
> Commit 7746b32f467b ("NFSD: add shrinker to reap courtesy clients on low
> memory condition") adds a shrinker to NFSD, which causes NFSD to try to
> obtain shrinker_rwsem when starting and stopping services.
>
> Deploying both NFS client and server on the same machine may lead to the
> following issue, since they will share the global shrinker_rwsem.
>
>          nfsd                              nfs
>                                  drop_cache // hold shrinker_rwsem
>                                  write back, wait for rpc_task to exit
> // stop nfsd threads
> svc_set_num_threads
> // clean up xprts
> svc_xprt_destroy_all
>                                  rpc_check_timeout
>                                   rpc_check_connected
>                                   // wait for the connection to be disconnected
> unregister_shrinker
> // wait for shrinker_rwsem
>
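The wait chain in the diagram above can be checked mechanically. What follows is only a user-space sketch in C, not kernel code: the enum labels and has_wait_cycle() are hypothetical names introduced for illustration, and each party is assumed to wait on at most one other party.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the four parties in the report; not kernel symbols. */
enum party {
    DROP_CACHE,  /* holds shrinker_rwsem, waits for writeback to finish */
    RPC_TASK,    /* client request, waits for the connection to drop */
    SOCK_CLOSE,  /* deferred close, waits for the process stopping nfsd */
    NFSD_STOP,   /* in unregister_shrinker, waits for shrinker_rwsem */
    NR_PARTIES
};

/*
 * waits_on[p] is the party that p is blocked on, or -1 if p can make
 * progress. Returns true if following the chain from 'start' leads
 * back to 'start', i.e. 'start' is part of a wait cycle.
 */
bool has_wait_cycle(const int waits_on[NR_PARTIES], int start)
{
    int cur = start;

    for (int steps = 0; steps < NR_PARTIES; steps++) {
        assert(cur >= 0 && cur < NR_PARTIES);
        cur = waits_on[cur];
        if (cur < 0)
            return false;   /* chain ends at a party that can run */
        if (cur == start)
            return true;    /* back where we started: circular wait */
    }
    return false;
}
```

With the four waits from the report (drop_cache on the rpc_task, the rpc_task on the socket close, the socket close on the process stopping nfsd, and that process on shrinker_rwsem), has_wait_cycle() returns true. If the SOCK_CLOSE entry is set to -1, meaning the close does not depend on the stopping process, the chain ends and the function returns false in this model.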
> Normally, the client's rpc_task will exit after the server's nfsd thread
> has processed the request.
> When all the server's nfsd threads exit, the client's rpc_task is expected
> to detect the network connection being disconnected and exit.
> However, although the server has executed svc_xprt_destroy_all before
> waiting for shrinker_rwsem, the network connection is not actually
> disconnected. Instead, the operation to close the socket is simply added
> to the task_works queue.
>
> svc_xprt_destroy_all
>  ...
>  svc_sock_free
>   sockfd_put
>    fput_many
>     init_task_work // ____fput
>     task_work_add // add to task->task_works
>
> The actual disconnection of the network connection will only occur after
> the current process finishes.
> do_exit
>  exit_task_work
>   task_work_run
>    ...
>    ____fput // close sock
>
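The deferred close in the two traces above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel implementation: struct model_sock, struct model_task and the model_* functions are hypothetical stand-ins for the socket, the task's task_works list, fput_many()/task_work_add() and task_work_run().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these are kernel APIs. */
#define MAX_WORKS 8

struct model_sock {
    bool connected;
};

struct model_task {
    void (*works[MAX_WORKS])(struct model_sock *);
    struct model_sock *args[MAX_WORKS];
    size_t nr_works;
};

/* Stands in for ____fput(): the point where the socket is really closed. */
void model_close_sock(struct model_sock *s)
{
    s->connected = false;
}

/* Stands in for fput_many() + task_work_add(): queue the close, do not run it. */
void model_fput(struct model_task *t, struct model_sock *s)
{
    assert(t->nr_works < MAX_WORKS);
    t->works[t->nr_works] = model_close_sock;
    t->args[t->nr_works] = s;
    t->nr_works++;
}

/* Stands in for task_work_run(), reached from do_exit() in the trace above. */
void model_task_work_run(struct model_task *t)
{
    for (size_t i = 0; i < t->nr_works; i++)
        t->works[i](t->args[i]);
    t->nr_works = 0;
}
```

In this model, calling model_fput() leaves connected set to true. The socket only becomes disconnected once model_task_work_run() is called, which mirrors why the client still sees a live connection while the server process is blocked before it can run its queued work.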
> Although deploying an NFS client and server on the same machine is not
> common practice, I think this issue still needs to be addressed.
> Otherwise, every process that tries to acquire shrinker_rwsem will hang.
>
> I don't have any ideas yet on how to solve this problem. Does anyone have
> any suggestions?
>
> Thanks.
>
Thread overview: 6+ messages
2024-11-25 11:17 [bug report] deploying both NFS client and server on the same machine triggle hungtask Li Lingfeng
2024-11-25 17:32 ` Mark Liam Brown
2024-11-26 2:28 ` Li Lingfeng
2024-11-28 7:22 ` Li Lingfeng [this message]
2024-12-02 16:05 ` Chuck Lever III
2024-12-03 2:32 ` Li Lingfeng