From: Christian Herzog <herzog@phys.ethz.ch>
To: linux-nfs@vger.kernel.org
Subject: re: file server freezes with all nfsds stuck in D state after upgrade to Debian
Date: Thu, 27 Apr 2023 11:27:33 +0200	[thread overview]
Message-ID: <ZEpABew3i0S/DBL8@phys.ethz.ch> (raw)

Hello again

Three weeks ago we reported nfsd D state-induced freezes on our
bookworm-upgraded file servers [1]. The general consensus at the time was
that the real issue lay deeper in our storage stack, so we headed over to
linux-block but were never able to pinpoint the problem.
We just had another instance where all 64 of our nfsd processes were stuck
in D state. This time the stack traces look different, we have some more
hints in our logs, and we're now pretty sure it's nfsd and not general
block IO.

All 64 nfsds have similar stack traces:

14 processes:  
[<0>] __flush_workqueue+0x152/0x420
[<0>] nfsd4_shutdown_callback+0x49/0x130 [nfsd]
[<0>] __destroy_client+0x1f3/0x290 [nfsd]
[<0>] nfsd4_exchange_id+0x752/0x760 [nfsd]
[<0>] nfsd4_proc_compound+0x352/0x660 [nfsd]
[<0>] nfsd_dispatch+0x167/0x280 [nfsd]
[<0>] svc_process_common+0x286/0x5e0 [sunrpc]
[<0>] svc_process+0xad/0x100 [sunrpc]
[<0>] nfsd+0xd5/0x190 [nfsd]
[<0>] kthread+0xe6/0x110
[<0>] ret_from_fork+0x1f/0x30

9 processes:
[<0>] __flush_workqueue+0x152/0x420
[<0>] nfsd4_shutdown_callback+0x49/0x130 [nfsd]
[<0>] __destroy_client+0x1f3/0x290 [nfsd]
[<0>] nfsd4_exchange_id+0x358/0x760 [nfsd]
[<0>] nfsd4_proc_compound+0x352/0x660 [nfsd]
[<0>] nfsd_dispatch+0x167/0x280 [nfsd]
[<0>] svc_process_common+0x286/0x5e0 [sunrpc]
[<0>] svc_process+0xad/0x100 [sunrpc]
[<0>] nfsd+0xd5/0x190 [nfsd]
[<0>] kthread+0xe6/0x110
[<0>] ret_from_fork+0x1f/0x30

41 processes:
[<0>] __flush_workqueue+0x152/0x420
[<0>] nfsd4_destroy_session+0x1b6/0x250 [nfsd]
[<0>] nfsd4_proc_compound+0x352/0x660 [nfsd]
[<0>] nfsd_dispatch+0x167/0x280 [nfsd]
[<0>] svc_process_common+0x286/0x5e0 [sunrpc]
[<0>] svc_process+0xad/0x100 [sunrpc]
[<0>] nfsd+0xd5/0x190 [nfsd]
[<0>] kthread+0xe6/0x110
[<0>] ret_from_fork+0x1f/0x30
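For reference, the stacks above were gathered roughly like this (a small
helper sketch, not the exact script we used; it needs root to read
/proc/<pid>/stack):

```shell
#!/bin/sh
# Dump the kernel stacks of all nfsd threads currently in D
# (uninterruptible sleep) state. Requires root for /proc/<pid>/stack.
count=0
for pid in $(pgrep nfsd); do
    # Field 3 of /proc/<pid>/stat is the process state; the nfsd comm
    # name contains no spaces, so a plain awk field split is safe here.
    state=$(awk '{print $3}' "/proc/$pid/stat" 2>/dev/null)
    if [ "$state" = "D" ]; then
        count=$((count + 1))
        echo "=== nfsd pid $pid ==="
        cat "/proc/$pid/stack"
    fi
done
echo "$count nfsd thread(s) in D state"
```

On a healthy server this prints "0 nfsd thread(s) in D state"; during the
freeze it dumps one stack per stuck thread.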


20 minutes prior to the first frozen nfsds, we saw messages similar to

  receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000fcdd40ac xid 182df75c

These messages come from receive_cb_reply [2]; apparently
xprt_lookup_rqst cannot find the RPC request belonging to a given
transaction id. We see these messages with different values for
xpt_bc_xprt, which we think correspond to the different NFS clients.
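For the next occurrence we're considering capturing more context around
these failed lookups. A sketch of what we have in mind (untested on the
production boxes; assumes root, tracefs mounted at /sys/kernel/tracing,
and the rpcdebug utility; the output path is arbitrary):

```shell
# Enable all sunrpc trace events so the backchannel xid mismatch shows
# up in the trace buffer with surrounding transport activity.
echo 1 > /sys/kernel/tracing/events/sunrpc/enable

# Alternatively (or additionally), turn on SUNRPC transport debugging;
# this is verbose and goes to the kernel log.
rpcdebug -m rpc -s trans

# ... wait for the hang to reproduce, then save the trace buffer:
cat /sys/kernel/tracing/trace > /tmp/sunrpc-trace.txt

# Turn everything off again:
echo 0 > /sys/kernel/tracing/events/sunrpc/enable
rpcdebug -m rpc -c trans
```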

All this is on production file servers running Debian bookworm with iSCSI
block devices and XFS file systems.

Does anyone have suggestions on how to debug this further? Unfortunately we
have yet to find a way to trigger it deliberately; for the time being it
happens whenever it happens...


thanks and best regards,
-Christian


[1] https://www.spinics.net/lists/linux-nfs/msg96048.html
[2] https://elixir.bootlin.com/linux/v6.1.20/source/net/sunrpc/svcsock.c#L902

-- 
Dr. Christian Herzog <herzog@phys.ethz.ch>  support: +41 44 633 26 68
Head, IT Services Group, HPT H 8              voice: +41 44 633 39 50
Department of Physics, ETH Zurich           
8093 Zurich, Switzerland                     http://isg.phys.ethz.ch/
