From: Christian Herzog <herzog@phys.ethz.ch>
To: Yu Kuai <yukuai1@huaweicloud.com>
Cc: linux-block@vger.kernel.org, "yukuai (C)" <yukuai3@huawei.com>
Subject: Re: file server freezes with all nfsds stuck in D state after upgrade to Debian bookworm
Date: Thu, 20 Apr 2023 14:57:37 +0200
Message-ID: <ZEE2wZPpY7JBWbY8@phys.ethz.ch>
In-Reply-To: <50766556-cd33-7506-13b1-64940b5995bb@huaweicloud.com>

Dear all,

we just had another freeze on one of our bookworm file servers. The scenario
is a bit different, but the root cause might be just the same. Here is what
happened:
- the server had been happily serving NFS + SMB for two weeks
- today I noticed a left-over rsync process from a recent backup run that
  didn't do any IO and was in D state
- I killed this rsync process, but since it was in D, it never died
- after a few minutes I noticed an nfsd in D state too (but just one). I
  watched it for a bit and then decided to try "service nfs-kernel-server
  restart" to see if NFS was involved again. I guess it was...
- from then on, all sorts of processes entered eternal D: several smbd,
  autofs, the rsync and one nfsd
- however: at all times, the underlying file systems seemed perfectly fine. We
  could write to every single one of them and gdu the hundred-TiB ones without
  a problem
- my impression is that at least this time, nfsd was just one of the victims
  of a deeper problem
- we took all the forensics suggested last time by Kuai and Bob. I don't
  really understand them, but here are the facts:
  - memory on the machine is not under pressure, < 20% used
  - the rqos/wbt/inflight of all block devices are 0 (remember: those are
    iSCSI LUNs)
  - all the hctx* values look unremarkable to me, but what do I know
  - the stack traces of the D processes don't show any rq_qos_wait this time
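
As an aside, the rqos/wbt/inflight check can be scripted. The following is only a minimal sketch, not the procedure we actually used: it assumes debugfs is mounted at /sys/kernel/debug, that each device exposes rqos/wbt/inflight there, and that each line of that file ends in a counter. The function names are made up for illustration.

```python
import glob
import os
import re


def wbt_inflight(debug_root="/sys/kernel/debug/block"):
    """Map device name -> list of counters read from rqos/wbt/inflight.

    Assumption: the last integer on each non-empty line is a counter.
    """
    result = {}
    pattern = os.path.join(debug_root, "*", "rqos", "wbt", "inflight")
    for path in sorted(glob.glob(pattern)):
        # First path component below debug_root is the device name.
        dev = os.path.relpath(path, debug_root).split(os.sep)[0]
        try:
            with open(path) as f:
                text = f.read()
        except OSError:
            continue  # unreadable (not root) or device went away
        counts = []
        for line in text.splitlines():
            nums = re.findall(r"-?\d+", line)
            if nums:
                counts.append(int(nums[-1]))
        result[dev] = counts
    return result


def busy_devices(inflight):
    """Return the sorted names of devices with any non-zero counter."""
    return sorted(dev for dev, counts in inflight.items() if any(counts))
```

Run as root, `busy_devices(wbt_inflight())` returning an empty list would correspond to "all inflight values are 0".
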
here's the D rsync trace:
[<0>] iterate_dir+0x52/0x1c0
[<0>] __x64_sys_getdents64+0x84/0x120
[<0>] do_syscall_64+0x58/0xc0
[<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
and the D nfsd:
[<0>] vfs_rename+0x266/0xd70
[<0>] nfsd_rename+0x327/0x470 [nfsd]
[<0>] nfsd4_rename+0x53/0x110 [nfsd]
[<0>] nfsd4_proc_compound+0x352/0x660 [nfsd]
[<0>] nfsd_dispatch+0x167/0x280 [nfsd]
[<0>] svc_process_common+0x286/0x5e0 [sunrpc]
[<0>] svc_process+0xad/0x100 [sunrpc]
[<0>] nfsd+0xd5/0x190 [nfsd]
[<0>] kthread+0xe6/0x110
[<0>] ret_from_fork+0x1f/0x30
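
For anyone wanting to gather the same kind of traces, here is a minimal sketch of one way to do it, assuming a Linux /proc in which /proc/<pid>/stack is readable (normally root only). It is not the script we used, the function names are hypothetical, and it looks only at top-level PIDs, not at the individual threads under /proc/<pid>/task.

```python
import os


def parse_stat(text):
    """Return (comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lpar = text.index("(")
    rpar = text.rindex(")")
    comm = text[lpar + 1:rpar]
    state = text[rpar + 1:].split()[0]
    return comm, state


def d_state_tasks(proc_root="/proc"):
    """Return a sorted list of (pid, comm, stack) for processes in D state."""
    tasks = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or stat was malformed
        if state != "D":
            continue
        try:
            with open(os.path.join(proc_root, entry, "stack")) as f:
                stack = f.read()
        except OSError:
            stack = "(stack unavailable)\n"
        tasks.append((int(entry), comm, stack))
    return sorted(tasks)


def format_report(tasks):
    """Render each task as a 'pid comm' header followed by its stack lines."""
    lines = []
    for pid, comm, stack in tasks:
        lines.append(f"{pid} {comm}")
        lines.extend(stack.splitlines())
    return "\n".join(lines)
```

Calling `print(format_report(d_state_tasks()))` as root prints one header line per stuck process followed by its kernel stack.
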
all the forensics are contained in
https://people.phys.ethz.ch/~daduke/freeze.tgz
we would be extremely grateful for any hints on how we can debug (or even
solve) this. We're really at a loss here...
thanks and kind regards,
-Christian
--
Dr. Christian Herzog <herzog@phys.ethz.ch> support: +41 44 633 26 68
Head, IT Services Group, HPT H 8 voice: +41 44 633 39 50
Department of Physics, ETH Zurich
8093 Zurich, Switzerland http://isg.phys.ethz.ch/