Linux NFS development
From: Li Lingfeng <lilingfeng3@huawei.com>
To: <Dai.Ngo@oracle.com>, Chuck Lever <chuck.lever@oracle.com>,
	Jeff Layton <jlayton@kernel.org>, NeilBrown <neilb@suse.de>,
	<okorniev@redhat.com>, <tom@talpey.com>,
	<trond.myklebust@hammerspace.com>
Cc: <linux-nfs@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	Yu Kuai <yukuai1@huaweicloud.com>, Hou Tao <houtao1@huawei.com>,
	"zhangyi (F)" <yi.zhang@huawei.com>,
	yangerkun <yangerkun@huawei.com>, <chengzhihao1@huawei.com>,
	Li Lingfeng <lilingfeng3@huawei.com>,
	Li Lingfeng <lilingfeng@huaweicloud.com>
Subject: [bug report] deploying both NFS client and server on the same machine triggers hung task
Date: Mon, 25 Nov 2024 19:17:13 +0800
Message-ID: <887cd8f6-3e49-410c-8b36-9e617c34ca6f@huawei.com>

Hi, we recently found a hung task issue.

Commit 7746b32f467b ("NFSD: add shrinker to reap courtesy clients on low
memory condition") adds a shrinker to NFSD, so NFSD now has to take
shrinker_rwsem when it registers and unregisters that shrinker, i.e. when
its services are started and stopped.

Running both the NFS client and the NFS server on the same machine can
lead to the following deadlock, since the two share the global
shrinker_rwsem.

     nfsd                            nfs
                             drop_cache // hold shrinker_rwsem
                             write back, wait for rpc_task to exit
// stop nfsd threads
svc_set_num_threads
// clean up xprts
svc_xprt_destroy_all
                             rpc_check_timeout
                              rpc_check_connected
                              // wait for the connection to be disconnected
unregister_shrinker
// wait for shrinker_rwsem

Normally, the client's rpc_task exits once a server nfsd thread has
processed its request. If all of the server's nfsd threads exit instead,
the client's rpc_task is expected to notice that the connection has been
closed and exit. However, although the server calls svc_xprt_destroy_all
before it waits for shrinker_rwsem, the connection is not actually closed
at that point: the socket close is merely queued on the current task's
task_works list.

svc_xprt_destroy_all
  ...
  svc_sock_free
   sockfd_put
    fput_many
     init_task_work // ____fput
     task_work_add // add to task->task_works

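The queued close, and with it the actual disconnect, only runs when the
current task returns to user space or exits. Since that task is blocked
in unregister_shrinker waiting for shrinker_rwsem, this never happens and
the client's rpc_task keeps waiting. The exit path is:
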
do_exit
  exit_task_work
   task_work_run
   ...
    ____fput // close sock
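
To illustrate the deferral, here is a minimal userspace sketch, not
kernel code. struct fake_task, deferred_put() and task_exit() are
hypothetical stand-ins for a task's task_works list, the fput() path and
exit_task_work()/task_work_run(); the real kernel list is lock-free and
its run order is not modelled here (this sketch drains it LIFO). The only
point is that the socket stays connected until the queued work is run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the socket behind an nfsd transport. */
struct fake_sock {
	bool connected;
};

/* One queued callback, loosely like a struct callback_head. */
struct work_item {
	void (*func)(struct fake_sock *sock);
	struct fake_sock *sock;
};

#define MAX_WORK 8

/* Hypothetical stand-in for a task and its task_works list. */
struct fake_task {
	struct work_item works[MAX_WORK];
	size_t nr_works;
};

/* Plays the role of ____fput(): actually close the socket. */
static void close_sock(struct fake_sock *sock)
{
	sock->connected = false;
}

/* Plays the role of the fput() path: queue the close, don't do it yet. */
static void deferred_put(struct fake_task *task, struct fake_sock *sock)
{
	assert(task->nr_works < MAX_WORK);
	task->works[task->nr_works++] = (struct work_item){ close_sock, sock };
}

/* Plays the role of exit_task_work() -> task_work_run(): drain the queue. */
static void task_exit(struct fake_task *task)
{
	while (task->nr_works > 0) {
		struct work_item w = task->works[--task->nr_works];

		w.func(w.sock);
	}
}
```

After deferred_put(&task, &sock) the socket is still connected; only
task_exit(&task) closes it. In the reported scenario the task never gets
that far, because it is stuck waiting for shrinker_rwsem.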

Although running the NFS client and server on the same machine is not
common practice, I think this issue still needs to be addressed;
otherwise every process that tries to acquire shrinker_rwsem will hang.
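
The waits above can be summarised as a wait-for graph. The following is
a hypothetical C sketch: the actor names are illustrative, not kernel
identifiers, and it assumes, as described above, that the pending socket
close can only run once the task that stopped nfsd leaves the kernel.
Each actor waits on at most one other, so following the chain from any
actor either reaches a runnable actor or loops, and a loop is a deadlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical actors in the reported scenario (illustrative names). */
enum actor {
	NFSD_STOPPER, /* ran svc_set_num_threads(), now in unregister_shrinker() */
	DROP_CACHES,  /* holds shrinker_rwsem, waiting on writeback */
	RPC_TASK,     /* client rpc_task waiting for the disconnect */
	NR_ACTORS
};

#define RUNNABLE (-1)

/* waits_on[a] is the actor that a is blocked on, or RUNNABLE. */
static int waits_on[NR_ACTORS];

/* The three waits described in the report. */
static void model_reported_case(void)
{
	waits_on[NFSD_STOPPER] = DROP_CACHES; /* needs shrinker_rwsem */
	waits_on[DROP_CACHES] = RPC_TASK;     /* writeback waits for rpc_task */
	waits_on[RPC_TASK] = NFSD_STOPPER;    /* close queued on the stopper */
}

/* Counterfactual: the socket is closed synchronously, so rpc_task can exit. */
static void model_sync_close(void)
{
	model_reported_case();
	waits_on[RPC_TASK] = RUNNABLE;
}

/*
 * Follow the chain from start. A chain that does not reach RUNNABLE
 * within NR_ACTORS steps visits some actor twice (pigeonhole), so it
 * contains a cycle and start can never make progress.
 */
static bool blocked_forever(int start)
{
	int cur = start;

	assert(start >= 0 && start < NR_ACTORS);
	for (int steps = 0; steps < NR_ACTORS; steps++) {
		cur = waits_on[cur];
		if (cur == RUNNABLE)
			return false;
	}
	return true;
}
```

With model_reported_case() every actor is blocked forever; with
model_sync_close() the chain ends at a runnable rpc_task, reflecting
that removing any single edge breaks the cycle.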

I don't have any idea yet how to solve this problem. Does anyone have
suggestions?

Thanks.

