* file server freezes with all nfsds stuck in D state after upgrade to Debian bookworm
@ 2023-04-06 11:09 Christian Herzog
  2023-04-06 13:48 ` Chuck Lever III
  0 siblings, 1 reply; 10+ messages in thread
From: Christian Herzog @ 2023-04-06 11:09 UTC (permalink / raw)
  To: linux-nfs

Dear all,

for our researchers we are running file servers in the hundreds-of-TiB to
low-PiB range that export via NFS and SMB. Storage is iSCSI-over-Infiniband
LUNs LVM'ed into individual XFS file systems. With Ubuntu 18.04 nearing EOL,
we prepared an upgrade to Debian bookworm and tests went well. About a week
after one of the upgrades, we ran into the first occurrence of our problem: all
of a sudden, all nfsds enter the D state and are not recoverable. However, the
underlying file systems seem fine and can be read and written to. The only way
out appears to be to reboot the server. The only clues are the frozen nfsds
and stack traces like

[<0>] rq_qos_wait+0xbc/0x130
[<0>] wbt_wait+0xa2/0x110
[<0>] __rq_qos_throttle+0x20/0x40
[<0>] blk_mq_submit_bio+0x2d3/0x580
[<0>] submit_bio_noacct_nocheck+0xf7/0x2c0
[<0>] iomap_submit_ioend+0x4b/0x80
[<0>] iomap_do_writepage+0x4b4/0x820
[<0>] write_cache_pages+0x180/0x4c0
[<0>] iomap_writepages+0x1c/0x40
[<0>] xfs_vm_writepages+0x79/0xb0 [xfs]
[<0>] do_writepages+0xbd/0x1c0
[<0>] filemap_fdatawrite_wbc+0x5f/0x80
[<0>] __filemap_fdatawrite_range+0x58/0x80
[<0>] file_write_and_wait_range+0x41/0x90
[<0>] xfs_file_fsync+0x5a/0x2a0 [xfs]
[<0>] nfsd_commit+0x93/0x190 [nfsd]
[<0>] nfsd4_commit+0x5e/0x90 [nfsd]
[<0>] nfsd4_proc_compound+0x352/0x660 [nfsd]
[<0>] nfsd_dispatch+0x167/0x280 [nfsd]
[<0>] svc_process_common+0x286/0x5e0 [sunrpc]
[<0>] svc_process+0xad/0x100 [sunrpc]
[<0>] nfsd+0xd5/0x190 [nfsd]
[<0>] kthread+0xe6/0x110
[<0>] ret_from_fork+0x1f/0x30

(we've also seen nfsd3). It's very sporadic, we have no idea what's triggering
it and it has now happened 4 times on one server and once on a second.
Needless to say, these are production systems, so we have a window of a few
minutes for debugging before people start yelling. We've thrown everything we
could at our test setup but so far haven't been able to trigger it.
Any pointers would be highly appreciated.


thanks and best regards,
-Christian



cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"

uname -vr
6.1.0-7-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.20-1 (2023-03-19)

apt list --installed '*nfs*'
libnfsidmap1/testing,now 1:2.6.2-4 amd64 [installed,automatic]
nfs-common/testing,now 1:2.6.2-4 amd64 [installed]
nfs-kernel-server/testing,now 1:2.6.2-4 amd64 [installed]

nfsconf -d
[exportd]
 debug = all
[exportfs]
 debug = all
[general]
 pipefs-directory = /run/rpc_pipefs
[lockd]
 port = 32769
 udp-port = 32769
[mountd]
 debug = all
 manage-gids = True
 port = 892
[nfsd]
 debug = all
 port = 2049
 threads = 48
[nfsdcld]
 debug = all
[nfsdcltrack]
 debug = all
[sm-notify]
 debug = all
 outgoing-port = 846
[statd]
 debug = all
 outgoing-port = 2020
 port = 662



-- 
Dr. Christian Herzog <herzog@phys.ethz.ch>  support: +41 44 633 26 68
Head, IT Services Group, HPT H 8              voice: +41 44 633 39 50
Department of Physics, ETH Zurich           
8093 Zurich, Switzerland                     http://isg.phys.ethz.ch/
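
The `[<0>]` lines in the report above look like the format printed by
/proc/<pid>/stack. A minimal sketch of how such traces could be collected for
every task in D state during a short debugging window, assuming a Linux /proc
and root privileges (reading the stack file normally requires them). The
function names d_state_tasks and kernel_stack are hypothetical helpers, not
part of any existing tool:

```python
import os


def d_state_tasks(proc_root="/proc"):
    """Return sorted (pid, comm) pairs for tasks whose state in <pid>/stat is 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # task exited (or has no stat file) while we were scanning
        # comm is wrapped in parentheses and may itself contain spaces or ')',
        # so split on the first '(' and the last ')'.
        lpar, rpar = stat.find("("), stat.rfind(")")
        if lpar < 0 or rpar < 0:
            continue
        comm = stat[lpar + 1:rpar]
        fields = stat[rpar + 1:].split()
        if fields and fields[0] == "D":  # first field after comm is the state
            found.append((int(entry), comm))
    return sorted(found)


def kernel_stack(pid, proc_root="/proc"):
    """Return the text of <pid>/stack, or '' if it cannot be read."""
    try:
        with open(os.path.join(proc_root, str(pid), "stack")) as f:
            return f.read()
    except OSError:
        return ""
```

Run as root, looping over d_state_tasks() and printing kernel_stack(pid) for
each entry gives one trace per blocked task. Only the top-level entries of
/proc are scanned; each nfsd kernel thread has its own entry there, but
threads of multi-threaded user processes would additionally need
/proc/<pid>/task/.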


* Re: file server freezes with all nfsds stuck in D state after upgrade to Debian bookworm
  2023-04-06 11:09 Christian Herzog
@ 2023-04-06 13:48 ` Chuck Lever III
  2023-04-06 15:33   ` Christian Herzog
  0 siblings, 1 reply; 10+ messages in thread
From: Chuck Lever III @ 2023-04-06 13:48 UTC (permalink / raw)
  To: Christian Herzog; +Cc: Linux NFS Mailing List



> On Apr 6, 2023, at 7:09 AM, Christian Herzog <herzog@phys.ethz.ch> wrote:
> 
> Dear all,
> 
> for our researchers we are running file servers in the hundreds-of-TiB to
> low-PiB range that export via NFS and SMB. Storage is iSCSI-over-Infiniband
> LUNs LVM'ed into individual XFS file systems. With Ubuntu 18.04 nearing EOL,
> we prepared an upgrade to Debian bookworm and tests went well. About a week
> after one of the upgrades, we ran into the first occurrence of our problem: all
> of a sudden, all nfsds enter the D state and are not recoverable. However, the
> underlying file systems seem fine and can be read and written to. The only way
> out appears to be to reboot the server. The only clues are the frozen nfsds
> and stack traces like
> 
> [<0>] rq_qos_wait+0xbc/0x130
> [<0>] wbt_wait+0xa2/0x110

Hi Christian, you have a pretty deep storage stack!
rq_qos_wait is a few layers below NFSD. Jens Axboe
and linux-block are the folks who maintain that.


> [<0>] __rq_qos_throttle+0x20/0x40
> [<0>] blk_mq_submit_bio+0x2d3/0x580
> [<0>] submit_bio_noacct_nocheck+0xf7/0x2c0
> [<0>] iomap_submit_ioend+0x4b/0x80
> [<0>] iomap_do_writepage+0x4b4/0x820
> [<0>] write_cache_pages+0x180/0x4c0
> [<0>] iomap_writepages+0x1c/0x40
> [<0>] xfs_vm_writepages+0x79/0xb0 [xfs]
> [<0>] do_writepages+0xbd/0x1c0
> [<0>] filemap_fdatawrite_wbc+0x5f/0x80
> [<0>] __filemap_fdatawrite_range+0x58/0x80
> [<0>] file_write_and_wait_range+0x41/0x90
> [<0>] xfs_file_fsync+0x5a/0x2a0 [xfs]
> [<0>] nfsd_commit+0x93/0x190 [nfsd]
> [<0>] nfsd4_commit+0x5e/0x90 [nfsd]
> [<0>] nfsd4_proc_compound+0x352/0x660 [nfsd]
> [<0>] nfsd_dispatch+0x167/0x280 [nfsd]
> [<0>] svc_process_common+0x286/0x5e0 [sunrpc]
> [<0>] svc_process+0xad/0x100 [sunrpc]
> [<0>] nfsd+0xd5/0x190 [nfsd]
> [<0>] kthread+0xe6/0x110
> [<0>] ret_from_fork+0x1f/0x30

--
Chuck Lever




* Re: file server freezes with all nfsds stuck in D state after upgrade to Debian bookworm
  2023-04-06 13:48 ` Chuck Lever III
@ 2023-04-06 15:33   ` Christian Herzog
  2023-04-06 15:40     ` Chuck Lever III
  0 siblings, 1 reply; 10+ messages in thread
From: Christian Herzog @ 2023-04-06 15:33 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Linux NFS Mailing List

Dear Chuck,

> > for our researchers we are running file servers in the hundreds-of-TiB to
> > low-PiB range that export via NFS and SMB. Storage is iSCSI-over-Infiniband
> > LUNs LVM'ed into individual XFS file systems. With Ubuntu 18.04 nearing EOL,
> > we prepared an upgrade to Debian bookworm and tests went well. About a week
> > after one of the upgrades, we ran into the first occurrence of our problem: all
> > of a sudden, all nfsds enter the D state and are not recoverable. However, the
> > underlying file systems seem fine and can be read and written to. The only way
> > out appears to be to reboot the server. The only clues are the frozen nfsds
> > and stack traces like
> > 
> > [<0>] rq_qos_wait+0xbc/0x130
> > [<0>] wbt_wait+0xa2/0x110
> 
> Hi Christian, you have a pretty deep storage stack!
> rq_qos_wait is a few layers below NFSD. Jens Axboe
> and linux-block are the folks who maintain that.
are you saying the root cause isn't nfs*, but the file system? That was our
first idea too, but we haven't found any indication that this is the case. The
xfs file systems seem perfectly fine when all nfsds are in D state, and we can
read from them and write to them. If xfs were to block nfs IO, this should
affect other processes too, right?

thanks and Happy Easter,
-Christian



* Re: file server freezes with all nfsds stuck in D state after upgrade to Debian bookworm
  2023-04-06 15:33   ` Christian Herzog
@ 2023-04-06 15:40     ` Chuck Lever III
  2023-04-06 15:54       ` Christian Herzog
  0 siblings, 1 reply; 10+ messages in thread
From: Chuck Lever III @ 2023-04-06 15:40 UTC (permalink / raw)
  To: Christian Herzog; +Cc: Linux NFS Mailing List



> On Apr 6, 2023, at 11:33 AM, Christian Herzog <herzog@phys.ethz.ch> wrote:
> 
> Dear Chuck,
> 
>>> for our researchers we are running file servers in the hundreds-of-TiB to
>>> low-PiB range that export via NFS and SMB. Storage is iSCSI-over-Infiniband
>>> LUNs LVM'ed into individual XFS file systems. With Ubuntu 18.04 nearing EOL,
>>> we prepared an upgrade to Debian bookworm and tests went well. About a week
>>> after one of the upgrades, we ran into the first occurrence of our problem: all
>>> of a sudden, all nfsds enter the D state and are not recoverable. However, the
>>> underlying file systems seem fine and can be read and written to. The only way
>>> out appears to be to reboot the server. The only clues are the frozen nfsds
>>> and stack traces like
>>> 
>>> [<0>] rq_qos_wait+0xbc/0x130
>>> [<0>] wbt_wait+0xa2/0x110
>> 
>> Hi Christian, you have a pretty deep storage stack!
>> rq_qos_wait is a few layers below NFSD. Jens Axboe
>> and linux-block are the folks who maintain that.
> are you saying the root cause isn't nfs*, but the file system?

I can't possibly know what the root cause is at this point.


> That was our first idea too, but we haven't found any indication that this is the case. The xfs file systems seem perfectly fine when all nfsds are in D state, and we can
> read from them and write to them. If xfs were to block nfs IO, this should
> affect other processes too, right?

It's possible that the NFSD threads are waiting on I/O to a particular filesystem block. XFS is not likely to block other activity in this case.

I'm merely suggesting that you should start troubleshooting at the bottom of the stack instead of the top. The wait is far outside the realm of NFSD.


--
Chuck Lever
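
To illustrate the point about starting at the bottom of the storage stack: in
a trace like the one quoted in this thread, the first line is the innermost
frame, i.e. where the task is actually sleeping. A small sketch that parses
such a trace and lists which kernel component owns each frame, assuming the
`func+0xoff/0xsize [module]` layout seen above. The names parse_stack and
owners, and the label "kernel" for frames without a module tag, are
illustrative choices only:

```python
import re

# One frame per line, e.g. "[<0>] xfs_file_fsync+0x5a/0x2a0 [xfs]".
# The trailing "[module]" is absent for code built into the kernel image.
FRAME = re.compile(
    r"^\[<[0-9a-f]+>\]\s+(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?$"
)


def parse_stack(text):
    """Return [(function, module)] in trace order, innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME.match(line.strip())
        if match:
            frames.append((match.group("func"), match.group("module") or "kernel"))
    return frames


def owners(frames):
    """Collapse consecutive frames from the same module, innermost first."""
    collapsed = []
    for _, module in frames:
        if not collapsed or collapsed[-1] != module:
            collapsed.append(module)
    return collapsed
```

For the trace in the original report, parse_stack puts rq_qos_wait (no module
tag, so core kernel code) first, with the xfs frames after it and the nfsd
frames further out still.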




* Re: file server freezes with all nfsds stuck in D state after upgrade to Debian bookworm
  2023-04-06 15:40     ` Chuck Lever III
@ 2023-04-06 15:54       ` Christian Herzog
  2023-04-06 16:19         ` Chuck Lever III
  0 siblings, 1 reply; 10+ messages in thread
From: Christian Herzog @ 2023-04-06 15:54 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Linux NFS Mailing List

Dear Chuck,

> > That was our first idea too, but we haven't found any indication that this is the case. The xfs file systems seem perfectly fine when all nfsds are in D state, and we can
> > read from them and write to them. If xfs were to block nfs IO, this should
> > affect other processes too, right?
> 
> It's possible that the NFSD threads are waiting on I/O to a particular filesystem block. XFS is not likely to block other activity in this case.
ok good to know. So far we were under the impression that a file system would
block as a whole.

> I'm merely suggesting that you should start troubleshooting at the bottom of the stack instead of the top. The wait is far outside the realm of NFSD.
thanks, point taken. So next time it happens we'll make sure to poke in this
direction during the few minutes we have for debugging before we get tarred
and feathered by the users.


-Christian


-- 
Dr. Christian Herzog <herzog@phys.ethz.ch>  support: +41 44 633 26 68
Head, IT Services Group, HPT H 8              voice: +41 44 633 39 50
Department of Physics, ETH Zurich           
8093 Zurich, Switzerland                     http://isg.phys.ethz.ch/


* Re: file server freezes with all nfsds stuck in D state after upgrade to Debian bookworm
  2023-04-06 15:54       ` Christian Herzog
@ 2023-04-06 16:19         ` Chuck Lever III
       [not found]           ` <4F41FC87-908F-451F-8D2C-089CB7AB5919@gmail.com>
  0 siblings, 1 reply; 10+ messages in thread
From: Chuck Lever III @ 2023-04-06 16:19 UTC (permalink / raw)
  To: Christian Herzog; +Cc: Linux NFS Mailing List



> On Apr 6, 2023, at 11:54 AM, Christian Herzog <herzog@phys.ethz.ch> wrote:
> 
> Dear Chuck,
> 
>>> That was our first idea too, but we haven't found any indication that this is the case. The xfs file systems seem perfectly fine when all nfsds are in D state, and we can
>>> read from them and write to them. If xfs were to block nfs IO, this should
>>> affect other processes too, right?
>> 
>> It's possible that the NFSD threads are waiting on I/O to a particular filesystem block. XFS is not likely to block other activity in this case.
> ok good to know. So far we were under the impression that a file system would
> block as a whole.

XFS tries to operate in parallel as much as it can. Maybe other filesystems aren't as capable.

If the unresponsive block is part of a superblock or the journal (i.e., shared metadata), I would expect XFS to become unresponsive. For I/O on blocks containing file data, it is likely to have more robust behavior.


>> I'm merely suggesting that you should start troubleshooting at the bottom of the stack instead of the top. The wait is far outside the realm of NFSD.
> thanks, point taken. So next time it happens we'll make sure to poke in this
> direction during the few minutes we have for debugging before we get tarred
> and feathered by the users.

I encourage you to discuss debugging tactics with Jens and the block folks -- you can probably capture a lot of info during those few minutes if you have some expert guidance.

Good luck!


--
Chuck Lever




* file server freezes with all nfsds stuck in D state after upgrade to Debian bookworm
@ 2023-04-06 16:59 Christian Herzog
  2023-04-07  6:26 ` Yu Kuai
  0 siblings, 1 reply; 10+ messages in thread
From: Christian Herzog @ 2023-04-06 16:59 UTC (permalink / raw)
  To: linux-block

Dear all,

disclaimer: this email was originally posted to linux-nfs since we believed
the problem to be nfsd, but Chuck Lever suggested that rq_qos_wait hinted at a
problem further down in the storage stack and referred to you guys, so here we
are:

for our researchers we are running file servers in the hundreds-of-TiB to
low-PiB range that export via NFS and SMB. Storage is iSCSI-over-Infiniband
LUNs LVM'ed into individual XFS file systems. With Ubuntu 18.04 nearing EOL,
we prepared an upgrade to Debian bookworm and tests went well. About a week
after one of the upgrades, we ran into the first occurrence of our problem: all
of a sudden, all nfsds enter the D state and are not recoverable. However, the
underlying file systems seem fine and can be read and written to. The only way
out appears to be to reboot the server. The only clues are the frozen nfsds
and stack traces like

[<0>] rq_qos_wait+0xbc/0x130
[<0>] wbt_wait+0xa2/0x110
[<0>] __rq_qos_throttle+0x20/0x40
[<0>] blk_mq_submit_bio+0x2d3/0x580
[<0>] submit_bio_noacct_nocheck+0xf7/0x2c0
[<0>] iomap_submit_ioend+0x4b/0x80
[<0>] iomap_do_writepage+0x4b4/0x820
[<0>] write_cache_pages+0x180/0x4c0
[<0>] iomap_writepages+0x1c/0x40
[<0>] xfs_vm_writepages+0x79/0xb0 [xfs]
[<0>] do_writepages+0xbd/0x1c0
[<0>] filemap_fdatawrite_wbc+0x5f/0x80
[<0>] __filemap_fdatawrite_range+0x58/0x80
[<0>] file_write_and_wait_range+0x41/0x90
[<0>] xfs_file_fsync+0x5a/0x2a0 [xfs]
[<0>] nfsd_commit+0x93/0x190 [nfsd]
[<0>] nfsd4_commit+0x5e/0x90 [nfsd]
[<0>] nfsd4_proc_compound+0x352/0x660 [nfsd]
[<0>] nfsd_dispatch+0x167/0x280 [nfsd]
[<0>] svc_process_common+0x286/0x5e0 [sunrpc]
[<0>] svc_process+0xad/0x100 [sunrpc]
[<0>] nfsd+0xd5/0x190 [nfsd]
[<0>] kthread+0xe6/0x110
[<0>] ret_from_fork+0x1f/0x30

(we've also seen nfsd3). It's very sporadic, we have no idea what's triggering
it and it has now happened 4 times on one server and once on a second.
Needless to say, these are production systems, so we have a window of a few
minutes for debugging before people start yelling. We've thrown everything we
could at our test setup but so far haven't been able to trigger it.
Any pointers would be highly appreciated.


thanks and best regards,
-Christian



cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"

uname -vr
6.1.0-7-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.20-1 (2023-03-19)

apt list --installed '*nfs*'
libnfsidmap1/testing,now 1:2.6.2-4 amd64 [installed,automatic]
nfs-common/testing,now 1:2.6.2-4 amd64 [installed]
nfs-kernel-server/testing,now 1:2.6.2-4 amd64 [installed]

nfsconf -d
[exportd]
 debug = all
[exportfs]
 debug = all
[general]
 pipefs-directory = /run/rpc_pipefs
[lockd]
 port = 32769
 udp-port = 32769
[mountd]
 debug = all
 manage-gids = True
 port = 892
[nfsd]
 debug = all
 port = 2049
 threads = 48
[nfsdcld]
 debug = all
[nfsdcltrack]
 debug = all
[sm-notify]
 debug = all
 outgoing-port = 846
[statd]
 debug = all
 outgoing-port = 2020
 port = 662



-- 
Dr. Christian Herzog <herzog@phys.ethz.ch>  support: +41 44 633 26 68
Head, IT Services Group, HPT H 8              voice: +41 44 633 39 50
Department of Physics, ETH Zurich           
8093 Zurich, Switzerland                     http://isg.phys.ethz.ch/

----- End forwarded message -----

-- 
Dr. Christian Herzog <herzog@phys.ethz.ch>  support: +41 44 633 26 68
Head, IT Services Group, HPT H 8              voice: +41 44 633 39 50
Department of Physics, ETH Zurich           
8093 Zurich, Switzerland                     http://isg.phys.ethz.ch/


* Re: file server freezes with all nfsds stuck in D state after upgrade to Debian bookworm
       [not found]           ` <4F41FC87-908F-451F-8D2C-089CB7AB5919@gmail.com>
@ 2023-04-06 17:26             ` Christian Herzog
  0 siblings, 0 replies; 10+ messages in thread
From: Christian Herzog @ 2023-04-06 17:26 UTC (permalink / raw)
  To: Bob Ciotti; +Cc: Chuck Lever III, Linux NFS Mailing List, Bob Ciotti

Dear Bob,

thanks a lot for your input.

> >>>> That was our first idea too, but we haven't found any indication that this is the case. The xfs file systems seem perfectly fine when all nfsds are in D state, and we can
> >>>> read from them and write to them. If xfs were to block nfs IO, this should
> >>>> affect other processes too, right?
> >>> It's possible that the NFSD threads are waiting on I/O to a particular filesystem block. XFS is not likely to block other activity in this case.
> >> ok good to know. So far we were under the impression that a file system would
> >> block as a whole.
> > 
> > XFS tries to operate in parallel as much as it can. Maybe other filesystems aren't as capable.
> > 
> > If the unresponsive block is part of a superblock or the journal (ie, shared metadata) I would expect XFS to become unresponsive. For I/O on blocks containing file data, it is likely to have more robust behavior.
> > 
> 
> Pretty sure we have seen a similar issue - never fully explained. From what
> I recall, the server gets to a low memory state. At that point, efforts to
> coalesce writes are abandoned, and each write request is processed in line -
> vs. scheduled - and all nfsds then pile up in D. Writes continue to arrive at
> a rate higher than the server can keep up with. But the back-end store (a
> high-end NetApp RAID 6 with 240 drives, also with XFS) had very little load -
> not too busy. Never fully explained it - but Chuck's point on shared metadata
> blocks may be a good place to look - and whether in-line writes at low memory
> could have synergy. IIRC, worked around with releases and tunables like
> minfree kmem et al. that came into play to reduce - but not eliminate - the
> problem. I'm away from reference material for a while but I'll review and
> update if I find anything.
we'll certainly investigate this topic, but right now it's kinda hard to
imagine since I've never seen the file server above ~10G of its 64G of RAM
(excluding page cache of course). We're not even sure heavy writes trigger the
problem, in one case our monitoring hinted at a lot of reads leading up to the
freeze.
OTOH if our issue could be resolved by throwing a bunch of RAM bars into the
server, all the better.


thanks,
-Christian
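
The memory figures discussed above (usage excluding page cache, plus pending
writeback) can be read from /proc/meminfo. A rough sketch, assuming the
standard MemTotal, MemAvailable, Dirty and Writeback fields; MemAvailable is
the kernel's own estimate and already treats reclaimable page cache as
available. parse_meminfo and memory_summary are hypothetical helper names:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: number}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_summary(info):
    """Summarise memory pressure and pending writeback from parsed meminfo."""
    total = info["MemTotal"]
    available = info["MemAvailable"]
    return {
        # fraction of RAM that is not readily available, page cache excluded
        "used_fraction": (total - available) / total,
        # data waiting to be written back, and data currently being written
        "dirty_kb": info.get("Dirty", 0),
        "writeback_kb": info.get("Writeback", 0),
    }
```

Feeding it the contents of /proc/meminfo at intervals, e.g.
memory_summary(parse_meminfo(open("/proc/meminfo").read())), would show
whether a freeze coincides with memory pressure or a growing writeback
backlog.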



* Re: file server freezes with all nfsds stuck in D state after upgrade to Debian bookworm
  2023-04-06 16:59 file server freezes with all nfsds stuck in D state after upgrade to Debian bookworm Christian Herzog
@ 2023-04-07  6:26 ` Yu Kuai
  2023-04-20 12:57   ` Christian Herzog
  0 siblings, 1 reply; 10+ messages in thread
From: Yu Kuai @ 2023-04-07  6:26 UTC (permalink / raw)
  To: Christian Herzog, linux-block, yukuai (C)

Hi,

On 2023/04/07 0:59, Christian Herzog wrote:
> Dear all,
> 
> disclaimer: this email was originally posted to linux-nfs since we believed
> the problem to be nfsd, but Chuck Lever suggested that rq_qos_wait hinted at a
> problem further down in the storage stack and referred to you guys, so here we
> are:
> 
> for our researchers we are running file servers in the hundreds-of-TiB to
> low-PiB range that export via NFS and SMB. Storage is iSCSI-over-Infiniband
> LUNs LVM'ed into individual XFS file systems. With Ubuntu 18.04 nearing EOL,
> we prepared an upgrade to Debian bookworm and tests went well. About a week
> after one of the upgrades, we ran into the first occurrence of our problem: all
> of a sudden, all nfsds enter the D state and are not recoverable. However, the
> underlying file systems seem fine and can be read and written to. The only way
> out appears to be to reboot the server. The only clues are the frozen nfsds
> and stack traces like
> 
> [<0>] rq_qos_wait+0xbc/0x130
> [<0>] wbt_wait+0xa2/0x110
> [<0>] __rq_qos_throttle+0x20/0x40
> [<0>] blk_mq_submit_bio+0x2d3/0x580
> [<0>] submit_bio_noacct_nocheck+0xf7/0x2c0
> [<0>] iomap_submit_ioend+0x4b/0x80
> [<0>] iomap_do_writepage+0x4b4/0x820
> [<0>] write_cache_pages+0x180/0x4c0
> [<0>] iomap_writepages+0x1c/0x40
> [<0>] xfs_vm_writepages+0x79/0xb0 [xfs]
> [<0>] do_writepages+0xbd/0x1c0
> [<0>] filemap_fdatawrite_wbc+0x5f/0x80
> [<0>] __filemap_fdatawrite_range+0x58/0x80
> [<0>] file_write_and_wait_range+0x41/0x90
> [<0>] xfs_file_fsync+0x5a/0x2a0 [xfs]
> [<0>] nfsd_commit+0x93/0x190 [nfsd]
> [<0>] nfsd4_commit+0x5e/0x90 [nfsd]
> [<0>] nfsd4_proc_compound+0x352/0x660 [nfsd]
> [<0>] nfsd_dispatch+0x167/0x280 [nfsd]
> [<0>] svc_process_common+0x286/0x5e0 [sunrpc]
> [<0>] svc_process+0xad/0x100 [sunrpc]
> [<0>] nfsd+0xd5/0x190 [nfsd]
> [<0>] kthread+0xe6/0x110
> [<0>] ret_from_fork+0x1f/0x30

I'm not familiar with nfsd, but since the thread above is waiting for
in-flight requests to complete, it would be helpful to monitor the following
debugfs files:

under /sys/kernel/debug/block/[device]/:

rqos/wbt/inflight
hctx*/tags
hctx*/sched_tags
hctx*/busy
hctx*/dispatch

This should give a preliminary indication of whether I/O is merely too slow
or whether there is a bug and I/O is hung.

Thanks,
Kuai
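
A minimal sketch of how the debugfs files listed above could be captured in
one pass during a freeze, assuming debugfs is mounted at /sys/kernel/debug and
the script runs as root. The helper names snapshot and changed are
hypothetical; files are read verbatim and not interpreted, and any that do
not exist for a device (sched_tags, for instance, may be absent) are simply
skipped:

```python
import glob
import os

# File patterns relative to /sys/kernel/debug/block/<device>/, as listed above.
PATTERNS = [
    "rqos/wbt/inflight",
    "hctx*/tags",
    "hctx*/sched_tags",
    "hctx*/busy",
    "hctx*/dispatch",
]


def snapshot(device, debug_root="/sys/kernel/debug/block"):
    """Return {relative_path: file_contents} for one block device."""
    base = os.path.join(debug_root, device)
    result = {}
    for pattern in PATTERNS:
        for path in sorted(glob.glob(os.path.join(base, pattern))):
            rel = os.path.relpath(path, base)
            try:
                with open(path) as f:
                    result[rel] = f.read()
            except OSError as exc:
                result[rel] = f"<unreadable: {exc.strerror}>"
    return result


def changed(before, after):
    """Return the sorted paths whose contents differ between two snapshots."""
    keys = set(before) | set(after)
    return sorted(k for k in keys if before.get(k) != after.get(k))
```

Taking two snapshots a few seconds apart and comparing them with changed()
is one way to approach the distinction drawn above: contents that keep
changing suggest I/O is still progressing, while identical non-empty busy or
dispatch lists would point more towards requests that never complete.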



* Re: file server freezes with all nfsds stuck in D state after upgrade to Debian bookworm
  2023-04-07  6:26 ` Yu Kuai
@ 2023-04-20 12:57   ` Christian Herzog
  0 siblings, 0 replies; 10+ messages in thread
From: Christian Herzog @ 2023-04-20 12:57 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-block, yukuai (C)

Dear all,

we just had another freeze on one of our bookworm file servers. The scenario
is a bit different, but the root cause might be just the same. So what
happened:

- the server had been happily serving NFS + SMB for two weeks
- today I noticed a left-over rsync process from a recent backup run that
  didn't do any IO and was in D state
- I killed this rsync process, but since it was in D, it never died
- after a few minutes I noticed an nfsd in D state too (but just one). I
  watched it for a bit and then decided to try "service nfs-kernel-server
  restart" to see if again nfs was involved. I guess it was...
- from then on, all sorts of processes entered eternal D: several smbd,
  autofs, the rsync and one nfsd
- however: at all times, the underlying file systems seemed perfectly fine. We
  could write to every single one of them and gdu the hundred-TiB ones without
  a problem
- my impression is that at least this time, nfsd was just one of the victims
  of a deeper problem
- we took all the forensics suggested last time by Kuai and Bob. I don't
  really understand them, but here are the facts:
  - memory on the machine is completely uncritical, < 20% used
  - the rqos/wbt/inflight of all block devices are 0 (remember: those are
    iSCSI LUNs)
  - all the hctx* values seem unsuspicious to me, but what do I know
  - the stack traces of the D processes don't show any rq_qos_wait this time

here's the D rsync trace:

[<0>] iterate_dir+0x52/0x1c0
[<0>] __x64_sys_getdents64+0x84/0x120
[<0>] do_syscall_64+0x58/0xc0
[<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd


and the D nfsd:

[<0>] vfs_rename+0x266/0xd70
[<0>] nfsd_rename+0x327/0x470 [nfsd]
[<0>] nfsd4_rename+0x53/0x110 [nfsd]
[<0>] nfsd4_proc_compound+0x352/0x660 [nfsd]
[<0>] nfsd_dispatch+0x167/0x280 [nfsd]
[<0>] svc_process_common+0x286/0x5e0 [sunrpc]
[<0>] svc_process+0xad/0x100 [sunrpc]
[<0>] nfsd+0xd5/0x190 [nfsd]
[<0>] kthread+0xe6/0x110
[<0>] ret_from_fork+0x1f/0x30

all the forensics are contained in
https://people.phys.ethz.ch/~daduke/freeze.tgz

we would be extremely grateful for any hints how we can debug (or even solve)
this. We're really at a loss here...


thanks and kind regards,
-Christian


-- 
Dr. Christian Herzog <herzog@phys.ethz.ch>  support: +41 44 633 26 68
Head, IT Services Group, HPT H 8              voice: +41 44 633 39 50
Department of Physics, ETH Zurich           
8093 Zurich, Switzerland                     http://isg.phys.ethz.ch/

