From: Li Lingfeng via Bugspray Bot <bugbot@kernel.org>
To: benoit.gschwind@minesparis.psl.eu, harald.dunkel@aixigo.com,
herzog@phys.ethz.ch, tom@talpey.com, chuck.lever@oracle.com,
jlayton@kernel.org, cel@kernel.org, trondmy@kernel.org,
baptiste.pellegrin@ac-grenoble.fr, carnil@debian.org,
linux-nfs@vger.kernel.org, anna@kernel.org
Subject: Re: NFSD threads hang when destroying a session or client ID
Date: Thu, 23 Jan 2025 02:10:21 +0000
Message-ID: <20250123-b219710c17-c6cd701c9207@bugzilla.kernel.org>
In-Reply-To: <20250120-b219710c0-da932078cddb@bugzilla.kernel.org>
Li Lingfeng writes via Kernel.org Bugzilla:
(In reply to Chuck Lever from comment #0)
> On recent v6.1.y, intermittently, NFSD threads wait forever for NFSv4
> callback to shutdown. The wait is in __flush_workqueue(). A server system
> reboot is necessary to recover.
>
> On new kernels, similar symptoms but the indefinite wait is in the "destroy
> client" path, waiting for NFSv4 callback shutdown. The wait is on the
> wait_var_event() in nfsd41_cb_inflight_wait_complete().
Hi, I recently ran into a similar problem and have done some analysis.
[ 6526.031343] INFO: task bash:846259 blocked for more than 606 seconds.
[ 6526.032060] Not tainted 6.6.0-gfbf24d352c28-dirty #22
[ 6526.032635] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6526.033404] task:bash state:D stack:0 pid:846259 ppid:838395 flags:0x0000020d
[ 6526.034226] Call trace:
[ 6526.034527] __switch_to+0x218/0x3e0
[ 6526.034925] __schedule+0x734/0x11a8
[ 6526.035323] schedule+0xa8/0x200
[ 6526.035731] nfsd4_shutdown_callback+0x24c/0x2f0
[ 6526.036228] __destroy_client+0x414/0x680
[ 6526.036663] nfs4_state_destroy_net+0x144/0x448
[ 6526.037152] nfs4_state_shutdown_net+0x2c8/0x450
[ 6526.037640] nfsd_shutdown_net+0x100/0x2e0
[ 6526.038078] nfsd_last_thread+0x190/0x330
[ 6526.038518] nfsd_svc+0x3cc/0x4a0
[ 6526.038892] write_threads+0x15c/0x2f0
[ 6526.039301] nfsctl_transaction_write+0x90/0xd0
[ 6526.039836] vfs_write+0x110/0x688
[ 6526.040221] ksys_write+0xd0/0x188
[ 6526.040607] __arm64_sys_write+0x4c/0x68
[ 6526.041035] invoke_syscall+0x68/0x198
[ 6526.041455] el0_svc_common.constprop.0+0x11c/0x150
[ 6526.041967] do_el0_svc+0x38/0x50
[ 6526.042353] el0_svc+0x5c/0x240
[ 6526.042723] el0t_64_sync_handler+0x100/0x130
[ 6526.043186] el0t_64_sync+0x188/0x190
[ 6526.051007] INFO: task cat:846265 blocked for more than 606 seconds.
1) Check cl_cb_inflight
crash> nfs4_client.cl_cb_inflight ffff000012338f08
cl_cb_inflight = {
counter = 1
},
crash>
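The nonzero cl_cb_inflight above matters because the shutdown path only proceeds once that count drops to zero. As a rough illustration, here is a toy Python model of that pattern. It is not kernel code, and the class and method names are made up for this sketch. It shows how one reference that is never dropped keeps the waiter asleep:

```python
import threading


class InflightCounter:
    """Toy stand-in for an in-flight callback counter (hypothetical names)."""

    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()

    def get(self):
        # A callback is queued: take a reference.
        with self._cond:
            self._count += 1

    def put(self):
        # A callback finished: drop the reference, wake waiters at zero.
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    def wait_complete(self, timeout):
        # True once the count reaches zero, False if the timeout expires.
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout)


inflight = InflightCounter()
inflight.get()                                     # one callback in flight
stuck = not inflight.wait_complete(timeout=0.05)   # waiter cannot proceed
inflight.put()                                     # the missing "put" arrives
done = inflight.wait_complete(timeout=0.05)        # now the waiter returns
print(stuck, done)
```

In the real hang there is no timeout, so a count that stays at 1 means the waiter never returns.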
2) No queued work is associated with NFSD
The worklist of the pool behind callback_wq holds only two work items, neither related to NFSD.
crash> p callback_wq
callback_wq = $1 = (struct workqueue_struct *) 0xffff0000c30a1400
crash>
crash> workqueue_struct.cpu_pwq 0xffff0000c30a1400
cpu_pwq = 0xccfe9cb5d8d0
crash> kmem -o
PER-CPU OFFSET VALUES:
CPU 0: ffff2f015341c000
CPU 1: ffff2f0153442000
CPU 2: ffff2f0153468000
CPU 3: ffff2f015348e000
crash>
// ffff2f015341c000 + ccfe9cb5d8d0 = FFFFFBFFEFF798D0
crash> rd FFFFFBFFEFF798D0
fffffbffeff798d0: ffff0000d3488d00 ..H.....
crash>
// ffff2f0153442000 + ccfe9cb5d8d0 = FFFFFBFFEFF9F8D0
crash> rd FFFFFBFFEFF9F8D0
fffffbffeff9f8d0: ffff0000d3488d00 ..H.....
crash>
// ffff2f0153468000 + ccfe9cb5d8d0 = FFFFFBFFEFFC58D0
crash> rd FFFFFBFFEFFC58D0
fffffbffeffc58d0: ffff0000d3488d00 ..H.....
crash>
// ffff2f015348e000 + ccfe9cb5d8d0 = FFFFFBFFEFFEB8D0
crash> rd FFFFFBFFEFFEB8D0
fffffbffeffeb8d0: ffff0000d3488d00 ..H.....
crash>
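The address arithmetic above adds each CPU's offset from "kmem -o" to the cpu_pwq value and keeps the low 64 bits. A small Python check of the four sums, using only values from the dump (the helper name is made up for illustration):

```python
CPU_PWQ = 0xccfe9cb5d8d0  # workqueue_struct.cpu_pwq from the dump

# Per-CPU offsets as printed by "kmem -o".
PER_CPU_OFFSETS = [
    0xffff2f015341c000,
    0xffff2f0153442000,
    0xffff2f0153468000,
    0xffff2f015348e000,
]

MASK64 = (1 << 64) - 1


def per_cpu_addr(cpu_offset, percpu_ptr):
    # Pointer arithmetic wraps at 64 bits, so mask the sum.
    return (cpu_offset + percpu_ptr) & MASK64


addrs = [per_cpu_addr(off, CPU_PWQ) for off in PER_CPU_OFFSETS]
for cpu, addr in enumerate(addrs):
    print(f"CPU {cpu}: {addr:016x}")
```

All four slots read back the same pool_workqueue pointer (ffff0000d3488d00), which is why a single pool is inspected next.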
crash> pool_workqueue.pool ffff0000d3488d00
pool = 0xffff0000c01b6800,
crash>
crash> worker_pool.worklist 0xffff0000c01b6800
worklist = {
next = 0xffff0000c906c4a8,
prev = 0xffffd0ff8944fc68 <stats_flush_dwork+8>
},
crash>
crash> list 0xffff0000c906c4a8
ffff0000c906c4a8
ffffd0ff8944fc68
ffff0000c01b6860
crash>
crash> work_struct.func ffff0000c906c4a0
func = 0xffffd0ff84fae128 <wb_update_bandwidth_workfn>,
crash> work_struct.func 0xffffd0ff8944fc60
func = 0xffffd0ff8510b258 <flush_memcg_stats_dwork>,
crash>
3) No running kworker
I checked the vmcore with "foreach bt" and found that all kworkers look like the following:
PID: 62 TASK: ffff0000c31d0040 CPU: 1 COMMAND: "kworker/R-nfsio"
#0 [ffff800080c27b80] __switch_to at ffffd0ff866297dc
#1 [ffff800080c27bd0] __schedule at ffffd0ff8662a180
#2 [ffff800080c27d00] schedule at ffffd0ff8662ac9c
#3 [ffff800080c27d40] rescuer_thread at ffffd0ff84b418e4
#4 [ffff800080c27e60] kthread at ffffd0ff84b52e14
PID: 94 TASK: ffff0000c74ba080 CPU: 0 COMMAND: "kworker/0:1H"
#0 [ffff800080e07c00] __switch_to at ffffd0ff866297dc
#1 [ffff800080e07c50] __schedule at ffffd0ff8662a180
#2 [ffff800080e07d80] schedule at ffffd0ff8662ac9c
#3 [ffff800080e07dc0] worker_thread at ffffd0ff84b40f94
#4 [ffff800080e07e60] kthread at ffffd0ff84b52e14
4) Check work items related to nfsd4_run_cb_work
crash> p nfsd4_run_cb_work
nfsd4_run_cb_work = $5 =
{void (struct work_struct *)} 0xffffd0ff855691e0 <nfsd4_run_cb_work>
crash> search ffffd0ff855691e0
ffff000010474138: ffffd0ff855691e0
ffff0000104750f8: ffffd0ff855691e0
ffff0000104752f0: ffffd0ff855691e0
ffff0000104756e0: ffffd0ff855691e0
ffff000012338388: ffffd0ff855691e0
ffff000012339288: ffffd0ff855691e0
ffff00001233a908: ffffd0ff855691e0
ffff00001233b808: ffffd0ff855691e0
ffff0000c745d038: ffffd0ff855691e0
ffff0000c86499f8: ffffd0ff855691e0
ffff0000c8649b30: ffffd0ff855691e0
ffff0000c9ff8dc8: ffffd0ff855691e0
crash>
ffff000010474138 --> (work) ffff000010474120
ffff0000104750f8 --> (work) ffff0000104750e0
ffff0000104752f0 --> (work) ffff0000104752d8
ffff0000104756e0 --> (work) ffff0000104756c8
ffff000012338388 --> (work) ffff000012338370
ffff000012339288 --> (work) ffff000012339270
ffff00001233a908 --> (work) ffff00001233a8f0
ffff00001233b808 --> (work) ffff00001233b7f0
ffff0000c745d038 --> (work) ffff0000c745d020
ffff0000c86499f8 --> (work) ffff0000c86499e0
ffff0000c8649b30 --> (work) ffff0000c8649b18
ffff0000c9ff8dc8 --> (work) ffff0000c9ff8db0
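The mapping above converts the address of each func slot back to the start of its work_struct. A sketch of that conversion in Python, assuming a 64-bit work_struct laid out as data, entry, func with no lockdep field. The ctypes classes only mirror that assumed layout, and the helper names are hypothetical:

```python
import ctypes


class ListHead(ctypes.Structure):
    _fields_ = [("next", ctypes.c_uint64), ("prev", ctypes.c_uint64)]


class WorkStruct(ctypes.Structure):
    # Assumed layout: data (8 bytes), entry (16 bytes), func (8 bytes).
    _fields_ = [
        ("data", ctypes.c_uint64),
        ("entry", ListHead),
        ("func", ctypes.c_uint64),
    ]


ENTRY_OFF = WorkStruct.entry.offset  # offset of the list node
FUNC_OFF = WorkStruct.func.offset    # offset of the function pointer


def work_from_func_slot(addr):
    # container_of(): address of the func slot -> start of work_struct.
    return addr - FUNC_OFF


def work_from_entry(addr):
    # container_of(): address of the list node -> start of work_struct.
    return addr - ENTRY_OFF


# Addresses where "search" found the nfsd4_run_cb_work pointer.
hits = [
    0xffff000010474138, 0xffff0000104750f8, 0xffff0000104752f0,
    0xffff0000104756e0, 0xffff000012338388, 0xffff000012339288,
    0xffff00001233a908, 0xffff00001233b808, 0xffff0000c745d038,
    0xffff0000c86499f8, 0xffff0000c8649b30, 0xffff0000c9ff8dc8,
]
works = [work_from_func_slot(h) for h in hits]
for hit, work in zip(hits, works):
    print(f"{hit:016x} --> (work) {work:016x}")
```

With the entry offset, the same subtraction reproduces the work_struct addresses used in step 2 (for example ffff0000c906c4a8 maps to ffff0000c906c4a0).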
crash> work_struct.data ffff0000104750e0
data = {
counter = 68719476704 // 0xFFFFFFFE0, bits 0-4 are 0
},
crash>
crash> work_struct.data ffff0000c9ff8db0
data = {
counter = 256 // 0x100
},
crash>
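The two data values above can be split into low flag bits and an upper field. The sketch below assumes that bit 0 is the pending flag and that the low 5 bits hold flags on this build. Neither assumption is confirmed by the dump, and the constant names are made up for illustration:

```python
FLAG_BITS = 5      # assumption: number of low flag bits on this build
PENDING = 1 << 0   # assumption: bit 0 marks a pending work item


def decode(data):
    # Split work_struct.data into its low flag bits and the rest.
    return {
        "hex": hex(data),
        "pending": bool(data & PENDING),
        "flags": data & ((1 << FLAG_BITS) - 1),
        "upper": data >> FLAG_BITS,
    }


first = decode(68719476704)  # work_struct at ffff0000104750e0
second = decode(256)         # work_struct at ffff0000c9ff8db0
print(first)
print(second)
```

Under these assumptions neither work item is pending, which would be consistent with step 2 finding no NFSD work on the pool's worklist. If the upper field is a pool ID, the first value (0x7fffffff, all ones) would read as "no pool" and the second as pool 8, but that interpretation depends on the assumed bit layout.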
I have added some debug output and am trying to reproduce the hang.
Could you share any further information you have collected, or any suggestions?
Thanks.
>
> In some cases, clients suspend (inactivity). The server converts them to
> courteous clients. The NFSv4 callback shutdown workqueue item for that
> client appears to be stuck waiting in rpc_shutdown_client().
>
> Let's collect data under this bug report.
View: https://bugzilla.kernel.org/show_bug.cgi?id=219710#c17
You can reply to this message to join the discussion.
--
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (bugspray 0.1-dev)