From: Christian Herzog <herzog@phys.ethz.ch>
To: linux-block@vger.kernel.org
Subject: file server freezes with all nfsds stuck in D state after upgrade to Debian bookworm
Date: Thu, 6 Apr 2023 18:59:32 +0200 [thread overview]
Message-ID: <ZC76dIshWvaWlki4@phys.ethz.ch> (raw)
Dear all,
disclaimer: this email was originally posted to linux-nfs since we believed
the problem to be nfsd, but Chuck Lever suggested that rq_qos_wait hinted at a
problem further down in the storage stack and referred to you guys, so here we
are:
for our researchers we are running file servers in the hundreds-of-TiB to
low-PiB range that export via NFS and SMB. Storage is iSCSI-over-Infiniband
LUNs LVM'ed into individual XFS file systems. With Ubuntu 18.04 nearing EOL,
we prepared an upgrade to Debian bookworm and tests went well. About a week
after one of the upgrades, we ran into the first occurence of our problem: all
of a sudden, all nfsds enter the D state and are not recoverable. However, the
underlying file systems seem fine and can be read and written to. The only way
out appears to be to reboot the server. The only clues are the frozen nfsds
and strack traces like
[<0>] rq_qos_wait+0xbc/0x130
[<0>] wbt_wait+0xa2/0x110
[<0>] __rq_qos_throttle+0x20/0x40
[<0>] blk_mq_submit_bio+0x2d3/0x580
[<0>] submit_bio_noacct_nocheck+0xf7/0x2c0
[<0>] iomap_submit_ioend+0x4b/0x80
[<0>] iomap_do_writepage+0x4b4/0x820
[<0>] write_cache_pages+0x180/0x4c0
[<0>] iomap_writepages+0x1c/0x40
[<0>] xfs_vm_writepages+0x79/0xb0 [xfs]
[<0>] do_writepages+0xbd/0x1c0
[<0>] filemap_fdatawrite_wbc+0x5f/0x80
[<0>] __filemap_fdatawrite_range+0x58/0x80
[<0>] file_write_and_wait_range+0x41/0x90
[<0>] xfs_file_fsync+0x5a/0x2a0 [xfs]
[<0>] nfsd_commit+0x93/0x190 [nfsd]
[<0>] nfsd4_commit+0x5e/0x90 [nfsd]
[<0>] nfsd4_proc_compound+0x352/0x660 [nfsd]
[<0>] nfsd_dispatch+0x167/0x280 [nfsd]
[<0>] svc_process_common+0x286/0x5e0 [sunrpc]
[<0>] svc_process+0xad/0x100 [sunrpc]
[<0>] nfsd+0xd5/0x190 [nfsd]
[<0>] kthread+0xe6/0x110
[<0>] ret_from_fork+0x1f/0x30
(we've also seen nfsd3). It's very sporadic, we have no idea what's triggering
it and it has now happened 4 times on one server and once on a second.
Needless to say, these are production systems, so we have a window of a few
minutes for debugging before people start yelling. We've thrown everything we
could at our test setup but so far haven't been able to trigger it.
Any pointers would be highly appreciated.
thanks and best regards,
-Christian
cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
uname -vr
6.1.0-7-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.20-1 (2023-03-19)
apt list --installed '*nfs*'
libnfsidmap1/testing,now 1:2.6.2-4 amd64 [installed,automatic]
nfs-common/testing,now 1:2.6.2-4 amd64 [installed]
nfs-kernel-server/testing,now 1:2.6.2-4 amd64 [installed]
nfsconf -d
[exportd]
debug = all
[exportfs]
debug = all
[general]
pipefs-directory = /run/rpc_pipefs
[lockd]
port = 32769
udp-port = 32769
[mountd]
debug = all
manage-gids = True
port = 892
[nfsd]
debug = all
port = 2049
threads = 48
[nfsdcld]
debug = all
[nfsdcltrack]
debug = all
[sm-notify]
debug = all
outgoing-port = 846
[statd]
debug = all
outgoing-port = 2020
port = 662
--
Dr. Christian Herzog <herzog@phys.ethz.ch> support: +41 44 633 26 68
Head, IT Services Group, HPT H 8 voice: +41 44 633 39 50
Department of Physics, ETH Zurich
8093 Zurich, Switzerland http://isg.phys.ethz.ch/
----- End forwarded message -----
--
Dr. Christian Herzog <herzog@phys.ethz.ch> support: +41 44 633 26 68
Head, IT Services Group, HPT H 8 voice: +41 44 633 39 50
Department of Physics, ETH Zurich
8093 Zurich, Switzerland http://isg.phys.ethz.ch/
next reply other threads:[~2023-04-06 17:06 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-06 16:59 Christian Herzog [this message]
2023-04-07 6:26 ` file server freezes with all nfsds stuck in D state after upgrade to Debian bookworm Yu Kuai
2023-04-20 12:57 ` Christian Herzog
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZC76dIshWvaWlki4@phys.ethz.ch \
--to=herzog@phys.ethz.ch \
--cc=linux-block@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).