From: Li Lingfeng <lilingfeng3@huawei.com>
To: Mark Liam Brown <brownmarkliam@gmail.com>,
<linux-nfs@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Cc: yangerkun <yangerkun@huawei.com>,
"zhangyi (F)" <yi.zhang@huawei.com>,
"yukuai (C)" <yukuai3@huawei.com>, <chengzhihao1@huawei.com>,
Hou Tao <houtao1@huawei.com>
Subject: Re: [bug report] deploying both NFS client and server on the same machine triggle hungtask
Date: Tue, 26 Nov 2024 10:28:49 +0800
Message-ID: <9420a368-8d18-4920-b196-a65cb265a26a@huawei.com>
In-Reply-To: <CAN0SSYwzsVEvopiuJuQTbJkOeGhDtLLFMsetVM2m5zOa0JEwDA@mail.gmail.com>
On 2024/11/26 1:32, Mark Liam Brown wrote:
> On Mon, Nov 25, 2024 at 1:48 PM Li Lingfeng <lilingfeng3@huawei.com> wrote:
>> Hi, we have found a hungtask issue recently.
>>
>> Commit 7746b32f467b ("NFSD: add shrinker to reap courtesy clients on low
>> memory condition") adds a shrinker to NFSD, which causes NFSD to try to
>> obtain shrinker_rwsem when starting and stopping services.
>>
>> Deploying both NFS client and server on the same machine may lead to the
>> following issue, since they will share the global shrinker_rwsem.
>>
>> nfsd                            nfs
>>                                 drop_cache // hold shrinker_rwsem
>>                                 write back, wait for rpc_task to exit
>> // stop nfsd threads
>> svc_set_num_threads
>> // clean up xprts
>> svc_xprt_destroy_all
>>                                 rpc_check_timeout
>>                                  rpc_check_connected
>>                                  // wait for the connection to be disconnected
>> unregister_shrinker
>> // wait for shrinker_rwsem
>>
>> Normally, the client's rpc_task will exit after the server's nfsd thread
>> has processed the request.
>> When all the server's nfsd threads exit, the client’s rpc_task is expected
>> to detect the network connection being disconnected and exit.
>> However, although the server has executed svc_xprt_destroy_all before
>> waiting for shrinker_rwsem, the network connection is not actually
>> disconnected. Instead, the operation to close the socket is simply added
>> to the task_works queue.
>>
>> svc_xprt_destroy_all
>>  ...
>>  svc_sock_free
>>   sockfd_put
>>    fput_many
>>     init_task_work // ____fput
>>     task_work_add // add to task->task_works
>>
>> The actual disconnection of the network connection will only occur after
>> the current process finishes.
>> do_exit
>>  exit_task_work
>>   task_work_run
>>    ...
>>    ____fput // close sock
>>
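The deferral in the two traces above can be modelled in userspace. The sketch below is only an illustration of the ordering, not kernel code: `Task`, `Sock`, `fput` and `simulate` are hypothetical names, and the model assumes a single task whose queued work runs only when `task_work_run` is called explicitly.

```python
from collections import deque


class Task:
    """Hypothetical stand-in for a task with a task_works queue."""

    def __init__(self, name):
        self.name = name
        self.task_works = deque()

    def task_work_add(self, callback):
        # Queue the callback; nothing is executed here.
        self.task_works.append(callback)

    def task_work_run(self):
        # Drain the queue in FIFO order, as happens on the exit path.
        while self.task_works:
            self.task_works.popleft()()


class Sock:
    """Hypothetical socket that stays connected until close() runs."""

    def __init__(self):
        self.connected = True

    def close(self):
        self.connected = False


def fput(task, sock):
    # Model of the deferred final put: the close is queued, not performed.
    task.task_work_add(sock.close)


def simulate():
    task = Task("nfsd-stop")
    sock = Sock()
    events = []

    fput(task, sock)  # svc_xprt_destroy_all -> ... -> fput_many
    events.append(("after svc_xprt_destroy_all", sock.connected))

    # unregister_shrinker would block here on shrinker_rwsem; the queued
    # close has not run, so the peer still sees a live connection.
    events.append(("blocked in unregister_shrinker", sock.connected))

    task.task_work_run()  # reached only once the task gets to exit
    events.append(("after task_work_run", sock.connected))
    return events


if __name__ == "__main__":
    for step, connected in simulate():
        print(f"{step}: connected={connected}")
```

In this model the socket is still connected at the point where the task would block, which is the window in which the client's rpc_task keeps waiting.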
>> Although it is not a common practice to deploy NFS client and server on
>> the same machine, I think this issue still needs to be addressed,
>> otherwise it will cause all processes trying to acquire the shrinker_rwsem
>> to hang.
> I disagree with that comment. Most small companies have NFS client and
> NFS server on the same machine, the client being used to allow logins
> by users, or to support schroot or containers.
>
> Mark
Sorry for my hasty conclusion.
By the way, nfsd_reply_cache_shrinker triggers this too.
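Since the blocking point is the shared shrinker_rwsem, the dependency loop looks the same whichever shrinker is being unregistered. A minimal sketch of that loop as a wait-for graph follows; the node labels and the `find_cycle` helper are hypothetical and reflect a simplified reading of the report quoted above, with one outgoing edge per waiter.

```python
def find_cycle(waits_for):
    """Return the nodes of a cycle in a wait-for graph, or None.

    waits_for maps each waiter to the single thing it is waiting on.
    """
    for start in waits_for:
        path = [start]
        seen = {start}
        node = waits_for.get(start)
        while node is not None:
            if node == start:
                return path  # walked back to the start: cycle found
            if node in seen:
                break  # loop that does not include start; try next start
            path.append(node)
            seen.add(node)
            node = waits_for.get(node)
    return None


# Simplified edges taken from the quoted report (labels are illustrative).
WAITS_FOR = {
    "nfsd stop (unregister_shrinker)": "shrinker_rwsem",
    "shrinker_rwsem": "drop_cache writeback",
    "drop_cache writeback": "rpc_task",
    "rpc_task": "socket close",
    "socket close": "nfsd stop (unregister_shrinker)",
}

if __name__ == "__main__":
    print(find_cycle(WAITS_FOR))
```

Removing the last edge, that is, closing the socket without waiting for the stopping task to finish, leaves the graph acyclic in this model.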
Li