linux-block.vger.kernel.org archive mirror
From: Christian Herzog <herzog@phys.ethz.ch>
To: linux-block@vger.kernel.org
Subject: file server freezes with all nfsds stuck in D state after upgrade to Debian bookworm
Date: Thu, 6 Apr 2023 18:59:32 +0200
Message-ID: <ZC76dIshWvaWlki4@phys.ethz.ch>

Dear all,

Disclaimer: this email was originally posted to linux-nfs because we believed
the problem to be in nfsd, but Chuck Lever suggested that rq_qos_wait hinted at
a problem further down the storage stack and referred us to you, so here we
are:

For our researchers, we run file servers in the hundreds-of-TiB to low-PiB
range that export via NFS and SMB. Storage is iSCSI-over-InfiniBand LUNs,
LVM'ed into individual XFS file systems. With Ubuntu 18.04 nearing EOL, we
prepared an upgrade to Debian bookworm, and tests went well. About a week
after one of the upgrades, we ran into the first occurrence of our problem:
all of a sudden, all nfsds enter the D state and do not recover. However, the
underlying file systems seem fine and can be read from and written to. The only
way out appears to be rebooting the server. The only clues are the frozen
nfsds and stack traces like

[<0>] rq_qos_wait+0xbc/0x130
[<0>] wbt_wait+0xa2/0x110
[<0>] __rq_qos_throttle+0x20/0x40
[<0>] blk_mq_submit_bio+0x2d3/0x580
[<0>] submit_bio_noacct_nocheck+0xf7/0x2c0
[<0>] iomap_submit_ioend+0x4b/0x80
[<0>] iomap_do_writepage+0x4b4/0x820
[<0>] write_cache_pages+0x180/0x4c0
[<0>] iomap_writepages+0x1c/0x40
[<0>] xfs_vm_writepages+0x79/0xb0 [xfs]
[<0>] do_writepages+0xbd/0x1c0
[<0>] filemap_fdatawrite_wbc+0x5f/0x80
[<0>] __filemap_fdatawrite_range+0x58/0x80
[<0>] file_write_and_wait_range+0x41/0x90
[<0>] xfs_file_fsync+0x5a/0x2a0 [xfs]
[<0>] nfsd_commit+0x93/0x190 [nfsd]
[<0>] nfsd4_commit+0x5e/0x90 [nfsd]
[<0>] nfsd4_proc_compound+0x352/0x660 [nfsd]
[<0>] nfsd_dispatch+0x167/0x280 [nfsd]
[<0>] svc_process_common+0x286/0x5e0 [sunrpc]
[<0>] svc_process+0xad/0x100 [sunrpc]
[<0>] nfsd+0xd5/0x190 [nfsd]
[<0>] kthread+0xe6/0x110
[<0>] ret_from_fork+0x1f/0x30

(we've also seen nfsd3). It is very sporadic; we have no idea what triggers
it, and it has now happened four times on one server and once on a second.
Needless to say, these are production systems, so we have a window of only a
few minutes for debugging before people start yelling. We've thrown everything
we could at our test setup but so far haven't been able to trigger it.
Any pointers would be highly appreciated.
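For the few-minute debugging window mentioned above, a minimal sketch like the
following could snapshot the state before rebooting. It assumes root and a
POSIX sh: it lists nfsd threads in uninterruptible sleep (state D) and appends
each one's kernel stack from /proc/PID/stack to a single file. The function
names and the default output path are hypothetical. As an alternative, writing
`w` to /proc/sysrq-trigger dumps all blocked tasks to the kernel log.

```shell
#!/bin/sh
# Sketch (assumes root and a POSIX sh): capture kernel stacks of nfsd
# threads stuck in uninterruptible sleep (D state) for later analysis.
# Function names and the default output path are hypothetical.

# Read "pid stat comm" lines on stdin (as printed by
# `ps -eo pid=,stat=,comm=`) and print the PIDs of threads named
# exactly "nfsd" whose state starts with D.
d_state_nfsd_pids() {
    awk '$2 ~ /^D/ && $3 == "nfsd" { print $1 }'
}

# Append the kernel stack of every stuck nfsd thread to one file and
# print the file name. Reading /proc/PID/stack requires root.
dump_nfsd_stacks() {
    out=${1:-/tmp/nfsd-stacks.$(date +%s).txt}
    : > "$out"    # create or truncate the output file
    ps -eo pid=,stat=,comm= | d_state_nfsd_pids | while read -r pid; do
        {
            echo "== pid $pid =="
            cat "/proc/$pid/stack"
        } >> "$out" 2>&1
    done
    echo "$out"
}
```

Usage: `dump_nfsd_stacks /root/nfsd-hang.txt`. Running it a few times, a few
seconds apart, would show whether the stuck threads' stacks change at all.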


thanks and best regards,
-Christian



cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"

uname -vr
6.1.0-7-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.20-1 (2023-03-19)

apt list --installed '*nfs*'
libnfsidmap1/testing,now 1:2.6.2-4 amd64 [installed,automatic]
nfs-common/testing,now 1:2.6.2-4 amd64 [installed]
nfs-kernel-server/testing,now 1:2.6.2-4 amd64 [installed]

nfsconf -d
[exportd]
 debug = all
[exportfs]
 debug = all
[general]
 pipefs-directory = /run/rpc_pipefs
[lockd]
 port = 32769
 udp-port = 32769
[mountd]
 debug = all
 manage-gids = True
 port = 892
[nfsd]
 debug = all
 port = 2049
 threads = 48
[nfsdcld]
 debug = all
[nfsdcltrack]
 debug = all
[sm-notify]
 debug = all
 outgoing-port = 846
[statd]
 debug = all
 outgoing-port = 2020
 port = 662



-- 
Dr. Christian Herzog <herzog@phys.ethz.ch>  support: +41 44 633 26 68
Head, IT Services Group, HPT H 8              voice: +41 44 633 39 50
Department of Physics, ETH Zurich           
8093 Zurich, Switzerland                     http://isg.phys.ethz.ch/

----- End forwarded message -----

Thread overview: 3+ messages
2023-04-06 16:59 Christian Herzog [this message]
2023-04-07  6:26 ` file server freezes with all nfsds stuck in D state after upgrade to Debian bookworm Yu Kuai
2023-04-20 12:57   ` Christian Herzog
