From: Christian Herzog <herzog@phys.ethz.ch>
To: Yu Kuai <yukuai1@huaweicloud.com>
Cc: linux-block@vger.kernel.org, "yukuai (C)" <yukuai3@huawei.com>
Subject: Re: file server freezes with all nfsds stuck in D state after upgrade to Debian bookworm
Date: Thu, 20 Apr 2023 14:57:37 +0200
Message-ID: <ZEE2wZPpY7JBWbY8@phys.ethz.ch>
In-Reply-To: <50766556-cd33-7506-13b1-64940b5995bb@huaweicloud.com>
Dear all,
we just had another freeze on one of our bookworm file servers. The scenario
is a bit different, but the root cause might be just the same. So what
happened:
- the server had been happily serving NFS + SMB for two weeks
- today I noticed a left-over rsync process from a recent backup run that
didn't do any IO and was in D state
- I killed this rsync process, but since it was in D, it never died
- after a few minutes I noticed an nfsd in D state too (but just one). I
watched it for a bit and then decided to try "service nfs-kernel-server
restart" to see if nfs was involved again. I guess it was...
- from then on, all sorts of processes entered eternal D: several smbd,
autofs, the rsync and one nfsd
- however: at all times, the underlying file systems seemed perfectly fine. We
could write to every single one of them and gdu the hundred-TiB ones without
a problem
- my impression is that at least this time, nfsd was just one of the victims
of a deeper problem
- we collected all the forensics suggested last time by Kuai and Bob. I don't
really understand them, but here are the facts:
- memory on the machine is not under any pressure, < 20% used
- the rqos/wbt/inflight of all block devices are 0 (remember: those are
iSCSI LUNs)
- all the hctx* values look unremarkable to me, but what do I know
- the stack traces of the D processes don't show any rq_qos_wait this time
here's the D rsync trace:
[<0>] iterate_dir+0x52/0x1c0
[<0>] __x64_sys_getdents64+0x84/0x120
[<0>] do_syscall_64+0x58/0xc0
[<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
and the D nfsd:
[<0>] vfs_rename+0x266/0xd70
[<0>] nfsd_rename+0x327/0x470 [nfsd]
[<0>] nfsd4_rename+0x53/0x110 [nfsd]
[<0>] nfsd4_proc_compound+0x352/0x660 [nfsd]
[<0>] nfsd_dispatch+0x167/0x280 [nfsd]
[<0>] svc_process_common+0x286/0x5e0 [sunrpc]
[<0>] svc_process+0xad/0x100 [sunrpc]
[<0>] nfsd+0xd5/0x190 [nfsd]
[<0>] kthread+0xe6/0x110
[<0>] ret_from_fork+0x1f/0x30
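Traces like the two above can be collected for every D-state task in one pass. The following is only a minimal sketch, assuming a Linux /proc and root privileges (reading /proc/<pid>/stack requires them). The function names and the proc_root parameter are made up for illustration and are not part of any tool mentioned in this thread. It only walks the top-level /proc/<pid> entries, not the per-thread entries under /proc/<pid>/task.

```python
import os
import re

# Matches one line of /proc/<pid>/stack, e.g. "[<0>] vfs_rename+0x266/0xd70"
FRAME_RE = re.compile(
    r"^\[<[0-9a-fx]+>\]\s+([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def task_state(status_text):
    """Return the one-letter state from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            fields = line.split()
            return fields[1] if len(fields) > 1 else None
    return None


def frame_symbols(stack_text):
    """Return the function names found in a stack dump, innermost first."""
    symbols = []
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            symbols.append(match.group(1))
    return symbols


def d_state_tasks(proc_root="/proc"):
    """Yield (pid, name, symbols) for every task in uninterruptible sleep."""
    for entry in sorted(os.listdir(proc_root)):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "status")) as fh:
                if task_state(fh.read()) != "D":
                    continue
            with open(os.path.join(base, "comm")) as fh:
                name = fh.read().strip()
            with open(os.path.join(base, "stack")) as fh:
                symbols = frame_symbols(fh.read())
        except OSError:
            # The task exited in the meantime, or "stack" is not readable.
            continue
        yield int(entry), name, symbols
```

Calling d_state_tasks() as root and checking each result with "rq_qos_wait" in symbols would show at a glance whether any blocked task is waiting in the writeback throttling path, which is the check described in the list above.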
all the forensics are contained in
https://people.phys.ethz.ch/~daduke/freeze.tgz
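The rqos/wbt/inflight check mentioned above can be scripted in a similar way. This is a rough sketch under several assumptions: debugfs is mounted at /sys/kernel/debug, the kernel exposes block-layer debugfs entries, and each line of the inflight file ends in an integer counter (something like "0: inflight 0"). The helper names and the debug_root parameter are hypothetical.

```python
import glob
import os
import re


def parse_inflight(text):
    """Sum the trailing integer of each line (assumed format: 'N: inflight M')."""
    total = 0
    for line in text.splitlines():
        match = re.search(r"(-?\d+)\s*$", line)
        if match:
            total += int(match.group(1))
    return total


def busy_devices(debug_root="/sys/kernel/debug/block"):
    """Return {device: total} for devices whose wbt inflight total is non-zero."""
    busy = {}
    pattern = os.path.join(debug_root, "*", "rqos", "wbt", "inflight")
    for path in sorted(glob.glob(pattern)):
        # Path layout: <debug_root>/<device>/rqos/wbt/inflight
        device = path.split(os.sep)[-4]
        try:
            with open(path) as fh:
                total = parse_inflight(fh.read())
        except OSError:
            continue  # device went away or the file is not readable
        if total != 0:
            busy[device] = total
    return busy
```

An empty dictionary from busy_devices() corresponds to the observation above that the inflight values of all block devices are 0.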
we would be extremely grateful for any hints how we can debug (or even solve)
this. We're really at a loss here...
thanks and kind regards,
-Christian
--
Dr. Christian Herzog <herzog@phys.ethz.ch> support: +41 44 633 26 68
Head, IT Services Group, HPT H 8 voice: +41 44 633 39 50
Department of Physics, ETH Zurich
8093 Zurich, Switzerland http://isg.phys.ethz.ch/
Thread overview: 3+ messages
2023-04-06 16:59 file server freezes with all nfsds stuck in D state after upgrade to Debian bookworm Christian Herzog
2023-04-07 6:26 ` Yu Kuai
2023-04-20 12:57 ` Christian Herzog [this message]