Linux NFS development
From: Li Lingfeng <lilingfeng3@huawei.com>
To: Mark Liam Brown <brownmarkliam@gmail.com>,
	<linux-nfs@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Cc: yangerkun <yangerkun@huawei.com>,
	"zhangyi (F)" <yi.zhang@huawei.com>,
	"yukuai (C)" <yukuai3@huawei.com>, <chengzhihao1@huawei.com>,
	Hou Tao <houtao1@huawei.com>
Subject: Re: [bug report] deploying both NFS client and server on the same machine triggle hungtask
Date: Tue, 26 Nov 2024 10:28:49 +0800
Message-ID: <9420a368-8d18-4920-b196-a65cb265a26a@huawei.com>
In-Reply-To: <CAN0SSYwzsVEvopiuJuQTbJkOeGhDtLLFMsetVM2m5zOa0JEwDA@mail.gmail.com>


On 2024/11/26 1:32, Mark Liam Brown wrote:
> On Mon, Nov 25, 2024 at 1:48 PM Li Lingfeng <lilingfeng3@huawei.com> wrote:
>> Hi, we have found a hungtask issue recently.
>>
>> Commit 7746b32f467b ("NFSD: add shrinker to reap courtesy clients on low
>> memory condition") adds a shrinker to NFSD, which causes NFSD to try to
>> obtain shrinker_rwsem when starting and stopping services.
>>
>> Deploying both NFS client and server on the same machine may lead to the
>> following issue, since they will share the global shrinker_rwsem.
>>
>>       nfsd                            nfs
>>                               drop_cache // hold shrinker_rwsem
>>                               write back, wait for rpc_task to exit
>> // stop nfsd threads
>> svc_set_num_threads
>> // clean up xprts
>> svc_xprt_destroy_all
>>                               rpc_check_timeout
>>                                rpc_check_connected
>>                                // wait for the connection to be disconnected
>> unregister_shrinker
>> // wait for shrinker_rwsem
>>
>> Normally, the client's rpc_task will exit after the server's nfsd thread
>> has processed the request.
>> When all the server's nfsd threads exit, the client’s rpc_task is expected
>> to detect the network connection being disconnected and exit.
>> However, although the server has executed svc_xprt_destroy_all before
>> waiting for shrinker_rwsem, the network connection is not actually
>> disconnected. Instead, the operation to close the socket is simply added
>> to the task_works queue.
>>
>> svc_xprt_destroy_all
>>    ...
>>    svc_sock_free
>>     sockfd_put
>>      fput_many
>>       init_task_work // ____fput
>>       task_work_add // add to task->task_works
>>
>> The actual disconnection of the network connection will only occur after
>> the current process finishes.
>> do_exit
>>    exit_task_work
>>     task_work_run
>>     ...
>>      ____fput // close sock
>>
>> Although it is not a common practice to deploy NFS client and server on
>> the same machine, I think this issue still needs to be addressed,
>> otherwise it will cause all processes trying to acquire the shrinker_rwsem
>> to hang.
> I disagree with that comment. Most small companies have NFS client and
> NFS server on the same machine, the client being used to allow logins
> by users, or to support schroot or containers.
>
> Mark

Sorry for my hasty conclusion.

By the way, nfsd_reply_cache_shrinker triggers this too.

Li
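
The deferred close described in the quoted report can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every name below (model_task, model_fput, model_task_work_run, model_stop_nfsd_hangs) is hypothetical and is not a kernel API, and the lock is reduced to a boolean condition. It shows the ordering problem from the report: the final close is only queued, so the connection still looks alive while the stopping task waits for a lock whose holder is waiting for the disconnect.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_MAX_WORKS 8

struct model_sock {
	bool connected;
};

typedef void (*model_work_fn)(void *arg);

/* Per-task list of deferred callbacks, standing in for task->task_works. */
struct model_task {
	model_work_fn fn[MODEL_MAX_WORKS];
	void *arg[MODEL_MAX_WORKS];
	size_t nr_works;
};

/* The real close: only here does the peer see a disconnect. */
static void model_close_sock(void *arg)
{
	((struct model_sock *)arg)->connected = false;
}

/* Dropping the last reference queues the close instead of running it. */
static bool model_fput(struct model_task *t, struct model_sock *s)
{
	if (t->nr_works == MODEL_MAX_WORKS)
		return false;
	t->fn[t->nr_works] = model_close_sock;
	t->arg[t->nr_works] = s;
	t->nr_works++;
	return true;
}

/* Runs the queued callbacks; in the report this happens on the exit path. */
static void model_task_work_run(struct model_task *t)
{
	for (size_t i = 0; i < t->nr_works; i++)
		t->fn[i](t->arg[i]);
	t->nr_works = 0;
}

/*
 * Models the stop sequence from the report. Returns true if it would hang.
 * Assumption: while drop_caches is in progress, the lock holder releases
 * the lock only after it observes the socket as disconnected.
 */
static bool model_stop_nfsd_hangs(bool drop_caches_in_progress)
{
	struct model_task stopper = {0};
	struct model_sock sock = { .connected = true };

	/* Transport teardown: the close is queued, not performed. */
	model_fput(&stopper, &sock);

	/* Unregistering the shrinker needs the lock before anything else runs. */
	bool lock_available = !drop_caches_in_progress || !sock.connected;
	if (!lock_available)
		return true; /* holder waits for disconnect, we wait for holder */

	/* Only now would the deferred close actually run. */
	model_task_work_run(&stopper);
	return false;
}
```

In this model, calling model_stop_nfsd_hangs(true) reports a hang because the queued close never runs before the lock is requested, while model_stop_nfsd_hangs(false) completes. Running the deferred work before taking the lock would break the cycle in the model; whether that is the right fix for the kernel is a separate question for this thread.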


Thread overview: 6+ messages
2024-11-25 11:17 [bug report] deploying both NFS client and server on the same machine triggle hungtask Li Lingfeng
2024-11-25 17:32 ` Mark Liam Brown
2024-11-26  2:28   ` Li Lingfeng [this message]
2024-11-28  7:22 ` Li Lingfeng
2024-12-02 16:05   ` Chuck Lever III
2024-12-03  2:32     ` Li Lingfeng
