From: "rik.theys via Bugspray Bot" <bugbot@kernel.org>
To: tom@talpey.com, trondmy@kernel.org, chuck.lever@oracle.com,
cel@kernel.org, anna@kernel.org,
baptiste.pellegrin@ac-grenoble.fr, harald.dunkel@aixigo.com,
linux-nfs@vger.kernel.org, benoit.gschwind@minesparis.psl.eu,
carnil@debian.org, jlayton@kernel.org, herzog@phys.ethz.ch
Subject: Re: NFSD threads hang when destroying a session or client ID
Date: Wed, 29 Jan 2025 13:15:28 +0000
Message-ID: <20250129-b219710c24-6f95a749b544@bugzilla.kernel.org>
In-Reply-To: <20250120-b219710c0-da932078cddb@bugzilla.kernel.org>
rik.theys writes via Kernel.org Bugzilla:
Hi,
We already had similar tracing running because we observed a similar issue on our 6.11 kernel. We recently upgraded to 6.12.11, and that kernel produced the warning below.
I'm not 100% sure this is the same issue. If not, let me know and I'll open a different thread/bug.
Unfortunately the attachment is too large to upload as the trace.dat file is ~900MiB.
I've uploaded it here:
https://homes.esat.kuleuven.be/~rtheys/logs-6.12.11.tar.zst
This file contains the warning, the trace.dat file, and the output of "echo t > /proc/sysrq-trigger".
The warning that was issued was:
[Wed Jan 29 10:11:17 2025] cb_status=-521 tk_status=-10036
[Wed Jan 29 10:11:17 2025] WARNING: CPU: 16 PID: 1670626 at fs/nfsd/nfs4callback.c:1339 nfsd4_cb_done+0x171/0x180 [nfsd]
[Wed Jan 29 10:11:17 2025] Modules linked in: dm_snapshot(E) nfsv4(E) dns_resolver(E) nfs(E) netfs(E) binfmt_misc(E) rpcsec_gss_krb5(E) rpcrdma(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) bonding(E) tls(E) rfkill(E) nft_ct(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nf_tables(E) nfnetlink(E) vfat(E) fat(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) libcrc32c(E) dm_service_time(E) dm_multipath(E) intel_rapl_msr(E) intel_rapl_common(E) intel_uncore_frequency(E) intel_uncore_frequency_common(E) skx_edac(E) skx_edac_common(E) nfit(E) libnvdimm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) mgag200(E) kvm(E) i2c_algo_bit(E) dell_pc(E) rapl(E) drm_shmem_helper(E) ipmi_ssif(E) intel_cstate(E) platform_profile(E) dell_smbios(E) dcdbas(E) dell_wmi_descriptor(E) intel_uncore(E) wmi_bmof(E) pcspkr(E) drm_kms_helper(E) i2c_i801(E) mei_me(E) mei(E) intel_pch_thermal(E) i2c_mux(E) lpc_ich(E) i2c_smbus(E) joydev(E)
[Wed Jan 29 10:11:17 2025] acpi_power_meter(E) ipmi_si(E) acpi_ipmi(E) ipmi_devintf(E) ipmi_msghandler(E) nfsd(E) nfs_acl(E) lockd(E) auth_rpcgss(E) grace(E) fuse(E) drm(E) sunrpc(E) ext4(E) mbcache(E) jbd2(E) lpfc(E) sd_mod(E) sg(E) nvmet_fc(E) nvmet(E) nvme_fc(E) ahci(E) crct10dif_pclmul(E) nvme_fabrics(E) crc32_pclmul(E) libahci(E) crc32c_intel(E) polyval_clmulni(E) nvme_core(E) megaraid_sas(E) ixgbe(E) polyval_generic(E) ghash_clmulni_intel(E) mdio(E) libata(E) wdat_wdt(E) scsi_transport_fc(E) dca(E) wmi(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
[Wed Jan 29 10:11:17 2025] CPU: 16 UID: 0 PID: 1670626 Comm: kworker/u193:0 Kdump: loaded Tainted: G E 6.12.11-1.el9.esat.x86_64 #1
[Wed Jan 29 10:11:17 2025] Tainted: [E]=UNSIGNED_MODULE
[Wed Jan 29 10:11:17 2025] Hardware name: Dell Inc. PowerEdge R740/00WGD1, BIOS 2.20.1 09/13/2023
[Wed Jan 29 10:11:17 2025] Workqueue: rpciod rpc_async_schedule [sunrpc]
[Wed Jan 29 10:11:17 2025] RIP: 0010:nfsd4_cb_done+0x171/0x180 [nfsd]
[Wed Jan 29 10:11:17 2025] Code: 0f 1f 44 00 00 e9 1d ff ff ff 80 3d 1c a7 01 00 00 0f 85 d9 fe ff ff 48 c7 c7 e5 b2 06 c1 c6 05 08 a7 01 00 01 e8 1f 4f ef d4 <0f> 0b 8b 75 54 e9 bc fe ff ff 0f 0b 0f 1f 00 90 90 90 90 90 90 90
[Wed Jan 29 10:11:17 2025] RSP: 0018:ffffa469b58c7e08 EFLAGS: 00010282
[Wed Jan 29 10:11:17 2025] RAX: 0000000000000000 RBX: ffff8a8f13ef6400 RCX: 0000000000000000
[Wed Jan 29 10:11:17 2025] RDX: 0000000000000002 RSI: ffffffff97398443 RDI: 00000000ffffffff
[Wed Jan 29 10:11:17 2025] RBP: ffff8a8c574515b8 R08: 0000000000000000 R09: ffffa469b58c7cb0
[Wed Jan 29 10:11:17 2025] R10: ffffa469b58c7ca8 R11: ffffffff97fdf688 R12: ffff8a7548f73f60
[Wed Jan 29 10:11:17 2025] R13: ffff8a8f13ef6400 R14: 0000000004248060 R15: ffffffffc0d66a40
[Wed Jan 29 10:11:17 2025] FS: 0000000000000000(0000) GS:ffff8a8ae0a00000(0000) knlGS:0000000000000000
[Wed Jan 29 10:11:17 2025] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed Jan 29 10:11:17 2025] CR2: 00007f8a14576160 CR3: 000000274fa20004 CR4: 00000000007726f0
[Wed Jan 29 10:11:17 2025] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Wed Jan 29 10:11:17 2025] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Wed Jan 29 10:11:17 2025] PKRU: 55555554
[Wed Jan 29 10:11:17 2025] Call Trace:
[Wed Jan 29 10:11:17 2025] <TASK>
[Wed Jan 29 10:11:17 2025] ? __warn+0x84/0x130
[Wed Jan 29 10:11:17 2025] ? nfsd4_cb_done+0x171/0x180 [nfsd]
[Wed Jan 29 10:11:17 2025] ? report_bug+0x1c3/0x1d0
[Wed Jan 29 10:11:17 2025] ? handle_bug+0x5b/0xa0
[Wed Jan 29 10:11:17 2025] ? exc_invalid_op+0x14/0x70
[Wed Jan 29 10:11:17 2025] ? asm_exc_invalid_op+0x16/0x20
[Wed Jan 29 10:11:17 2025] ? __pfx_rpc_exit_task+0x10/0x10 [sunrpc]
[Wed Jan 29 10:11:17 2025] ? nfsd4_cb_done+0x171/0x180 [nfsd]
[Wed Jan 29 10:11:17 2025] ? nfsd4_cb_done+0x171/0x180 [nfsd]
[Wed Jan 29 10:11:17 2025] rpc_exit_task+0x5b/0x170 [sunrpc]
[Wed Jan 29 10:11:17 2025] __rpc_execute+0x9f/0x370 [sunrpc]
[Wed Jan 29 10:11:17 2025] rpc_async_schedule+0x2b/0x40 [sunrpc]
[Wed Jan 29 10:11:17 2025] process_one_work+0x179/0x390
[Wed Jan 29 10:11:17 2025] worker_thread+0x239/0x340
[Wed Jan 29 10:11:17 2025] ? __pfx_worker_thread+0x10/0x10
[Wed Jan 29 10:11:17 2025] kthread+0xdb/0x110
[Wed Jan 29 10:11:17 2025] ? __pfx_kthread+0x10/0x10
[Wed Jan 29 10:11:17 2025] ret_from_fork+0x2d/0x50
[Wed Jan 29 10:11:17 2025] ? __pfx_kthread+0x10/0x10
[Wed Jan 29 10:11:17 2025] ret_from_fork_asm+0x1a/0x30
[Wed Jan 29 10:11:17 2025] </TASK>
[Wed Jan 29 10:11:17 2025] ---[ end trace 0000000000000000 ]---
It also seems to indicate an issue with nfsd4_cb_done in a workqueue.
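For anyone reading the first line of the warning: the two numbers can be turned into symbolic names with a small script. This is only a sketch. It assumes that 521 is the kernel-internal EBADHANDLE errno (include/linux/errno.h) and that 10036 is the NFSv4 status NFS4ERR_BADXDR; that mapping is a reading of the numeric values and has not been confirmed in this thread. The decode() helper and the two lookup tables are hypothetical names used for illustration.

```python
import re

# Assumed mappings (not confirmed in this thread):
#   521   -> EBADHANDLE, a kernel-internal errno from include/linux/errno.h
#   10036 -> NFS4ERR_BADXDR, an NFSv4 status code
KERNEL_ERRNO = {521: "EBADHANDLE"}
NFS4_STATUS = {10036: "NFS4ERR_BADXDR"}

# First line of the warning, copied from the log above.
LINE = "[Wed Jan 29 10:11:17 2025] cb_status=-521 tk_status=-10036"


def decode(line):
    """Map each *_status field on the line to (value, symbolic name or None)."""
    out = {}
    for field, raw in re.findall(r"(\w+_status)=(-?\d+)", line):
        value = int(raw)
        code = abs(value)  # the kernel logs these as negative numbers
        name = KERNEL_ERRNO.get(code) or NFS4_STATUS.get(code)
        out[field] = (value, name)
    return out


if __name__ == "__main__":
    for field, (value, name) in decode(LINE).items():
        print(f"{field}={value} ({name or 'unknown'})")
```

Codes missing from the two tables are reported as unknown rather than guessed.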
The "echo t > /proc/sysrq-trigger" output was captured after stopping trace-cmd.
I hope the trace.dat file contains enough information to help find the root cause. The trace was started before the first client mounted anything from this server.
Regards,
Rik
View: https://bugzilla.kernel.org/show_bug.cgi?id=219710#c24
You can reply to this message to join the discussion.
--
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (bugspray 0.1-dev)