From: Chuck Lever III <chuck.lever@oracle.com>
To: Christian Herzog <herzog@phys.ethz.ch>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: file server freezes with all nfsds stuck in D state after upgrade to Debian bookworm
Date: Thu, 6 Apr 2023 15:40:42 +0000
Message-ID: <478CD009-C11B-46F2-AD13-689953D612ED@oracle.com>
In-Reply-To: <ZC7mOH4I3roIM4xr@phys.ethz.ch>
> On Apr 6, 2023, at 11:33 AM, Christian Herzog <herzog@phys.ethz.ch> wrote:
>
> Dear Chuck,
>
>>> for our researchers we are running file servers in the hundreds-of-TiB to
>>> low-PiB range that export via NFS and SMB. Storage is iSCSI-over-Infiniband
>>> LUNs LVM'ed into individual XFS file systems. With Ubuntu 18.04 nearing EOL,
>>> we prepared an upgrade to Debian bookworm and tests went well. About a week
>>> after one of the upgrades, we ran into the first occurrence of our problem: all
>>> of a sudden, all nfsds enter the D state and are not recoverable. However, the
>>> underlying file systems seem fine and can be read and written to. The only way
>>> out appears to be to reboot the server. The only clues are the frozen nfsds
>>> and stack traces like
>>>
>>> [<0>] rq_qos_wait+0xbc/0x130
>>> [<0>] wbt_wait+0xa2/0x110
>>
>> Hi Christian, you have a pretty deep storage stack!
>> rq_qos_wait is a few layers below NFSD. Jens Axboe
>> and linux-block are the folks who maintain that.
> Are you saying the root cause isn't nfs*, but the file system?
I can't possibly know what the root cause is at this point.
> That was our first idea too, but we haven't found any indication that this is the case. The xfs file systems seem perfectly fine when all nfsds are in D state, and we can
> read from them and write to them. If xfs were to block nfs IO, this should
> affect other processes too, right?
It's possible that the NFSD threads are waiting on I/O to a particular filesystem block. XFS is not likely to block other activity in this case.
I'm merely suggesting that you should start troubleshooting at the bottom of the stack instead of the top. The wait is far outside the realm of NFSD.
--
Chuck Lever
Thread overview: 7+ messages
2023-04-06 11:09 file server freezes with all nfsds stuck in D state after upgrade to Debian bookworm Christian Herzog
2023-04-06 13:48 ` Chuck Lever III
2023-04-06 15:33 ` Christian Herzog
2023-04-06 15:40 ` Chuck Lever III [this message]
2023-04-06 15:54 ` Christian Herzog
2023-04-06 16:19 ` Chuck Lever III
[not found] ` <4F41FC87-908F-451F-8D2C-089CB7AB5919@gmail.com>
2023-04-06 17:26 ` Christian Herzog