public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed
From: "rik.theys via Bugspray Bot" <bugbot@kernel.org>
To: tom@talpey.com, trondmy@kernel.org, chuck.lever@oracle.com,
	 cel@kernel.org, anna@kernel.org,
	baptiste.pellegrin@ac-grenoble.fr,  harald.dunkel@aixigo.com,
	linux-nfs@vger.kernel.org,  benoit.gschwind@minesparis.psl.eu,
	carnil@debian.org, jlayton@kernel.org,  herzog@phys.ethz.ch
Subject: Re: NFSD threads hang when destroying a session or client ID
Date: Wed, 29 Jan 2025 13:15:28 +0000	[thread overview]
Message-ID: <20250129-b219710c24-6f95a749b544@bugzilla.kernel.org> (raw)
In-Reply-To: <20250120-b219710c0-da932078cddb@bugzilla.kernel.org>

rik.theys writes via Kernel.org Bugzilla:

Hi,

We had similar tracing running as we observed a similar issue on our 6.11 kernel. Recently, we upgraded to 6.12.11 and this has resulted in the warning below.

I'm not 100% sure this is the same issue. If not, let me know and I'll open a different thread/bug.

Unfortunately the attachment is too large to upload as the trace.dat file is ~900MiB.

I've uploaded it here:

https://homes.esat.kuleuven.be/~rtheys/logs-6.12.11.tar.zst

This file contains the warning, trace.dat file and the output of "echo t" > /proc/sysrq-trigger.

The warning that was issued was:

[Wed Jan 29 10:11:17 2025] cb_status=-521 tk_status=-10036
[Wed Jan 29 10:11:17 2025] WARNING: CPU: 16 PID: 1670626 at fs/nfsd/nfs4callback.c:1339 nfsd4_cb_done+0x171/0x180 [nfsd]
[Wed Jan 29 10:11:17 2025] Modules linked in: dm_snapshot(E) nfsv4(E) dns_resolver(E) nfs(E) netfs(E) binfmt_misc(E) rpcsec_gss_krb5(E) rpcrdma(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) bonding(E) tls(E) rfkill(E) nft_ct(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nf_tables(E) nfnetlink(E) vfat(E) fat(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) libcrc32c(E) dm_service_time(E) dm_multipath(E) intel_rapl_msr(E) intel_rapl_common(E) intel_uncore_frequency(E) intel_uncore_frequency_common(E) skx_edac(E) skx_edac_common(E) nfit(E) libnvdimm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) mgag200(E) kvm(E) i2c_algo_bit(E) dell_pc(E) rapl(E) drm_shmem_helper(E) ipmi_ssif(E) intel_cstate(E) platform_profile(E) dell_smbios(E) dcdbas(E) dell_wmi_descriptor(E) intel_uncore(E) wmi_bmof(E) pcspkr(E) drm_kms_helper(E) i2c_i801(E) mei_me(E) mei(E) intel_pch_thermal(E) i
 2c_mux(E) lpc_ich(E) i2c_smbus(E) joydev(E)
[Wed Jan 29 10:11:17 2025]  acpi_power_meter(E) ipmi_si(E) acpi_ipmi(E) ipmi_devintf(E) ipmi_msghandler(E) nfsd(E) nfs_acl(E) lockd(E) auth_rpcgss(E) grace(E) fuse(E) drm(E) sunrpc(E) ext4(E) mbcache(E) jbd2(E) lpfc(E) sd_mod(E) sg(E) nvmet_fc(E) nvmet(E) nvme_fc(E) ahci(E) crct10dif_pclmul(E) nvme_fabrics(E) crc32_pclmul(E) libahci(E) crc32c_intel(E) polyval_clmulni(E) nvme_core(E) megaraid_sas(E) ixgbe(E) polyval_generic(E) ghash_clmulni_intel(E) mdio(E) libata(E) wdat_wdt(E) scsi_transport_fc(E) dca(E) wmi(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
[Wed Jan 29 10:11:17 2025] CPU: 16 UID: 0 PID: 1670626 Comm: kworker/u193:0 Kdump: loaded Tainted: G            E      6.12.11-1.el9.esat.x86_64 #1
[Wed Jan 29 10:11:17 2025] Tainted: [E]=UNSIGNED_MODULE
[Wed Jan 29 10:11:17 2025] Hardware name: Dell Inc. PowerEdge R740/00WGD1, BIOS 2.20.1 09/13/2023
[Wed Jan 29 10:11:17 2025] Workqueue: rpciod rpc_async_schedule [sunrpc]
[Wed Jan 29 10:11:17 2025] RIP: 0010:nfsd4_cb_done+0x171/0x180 [nfsd]
[Wed Jan 29 10:11:17 2025] Code: 0f 1f 44 00 00 e9 1d ff ff ff 80 3d 1c a7 01 00 00 0f 85 d9 fe ff ff 48 c7 c7 e5 b2 06 c1 c6 05 08 a7 01 00 01 e8 1f 4f ef d4 <0f> 0b 8b 75 54 e9 bc fe ff ff 0f 0b 0f 1f 00 90 90 90 90 90 90 90
[Wed Jan 29 10:11:17 2025] RSP: 0018:ffffa469b58c7e08 EFLAGS: 00010282
[Wed Jan 29 10:11:17 2025] RAX: 0000000000000000 RBX: ffff8a8f13ef6400 RCX: 0000000000000000
[Wed Jan 29 10:11:17 2025] RDX: 0000000000000002 RSI: ffffffff97398443 RDI: 00000000ffffffff
[Wed Jan 29 10:11:17 2025] RBP: ffff8a8c574515b8 R08: 0000000000000000 R09: ffffa469b58c7cb0
[Wed Jan 29 10:11:17 2025] R10: ffffa469b58c7ca8 R11: ffffffff97fdf688 R12: ffff8a7548f73f60
[Wed Jan 29 10:11:17 2025] R13: ffff8a8f13ef6400 R14: 0000000004248060 R15: ffffffffc0d66a40
[Wed Jan 29 10:11:17 2025] FS:  0000000000000000(0000) GS:ffff8a8ae0a00000(0000) knlGS:0000000000000000
[Wed Jan 29 10:11:17 2025] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed Jan 29 10:11:17 2025] CR2: 00007f8a14576160 CR3: 000000274fa20004 CR4: 00000000007726f0
[Wed Jan 29 10:11:17 2025] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Wed Jan 29 10:11:17 2025] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Wed Jan 29 10:11:17 2025] PKRU: 55555554
[Wed Jan 29 10:11:17 2025] Call Trace:
[Wed Jan 29 10:11:17 2025]  <TASK>
[Wed Jan 29 10:11:17 2025]  ? __warn+0x84/0x130
[Wed Jan 29 10:11:17 2025]  ? nfsd4_cb_done+0x171/0x180 [nfsd]
[Wed Jan 29 10:11:17 2025]  ? report_bug+0x1c3/0x1d0
[Wed Jan 29 10:11:17 2025]  ? handle_bug+0x5b/0xa0
[Wed Jan 29 10:11:17 2025]  ? exc_invalid_op+0x14/0x70
[Wed Jan 29 10:11:17 2025]  ? asm_exc_invalid_op+0x16/0x20
[Wed Jan 29 10:11:17 2025]  ? __pfx_rpc_exit_task+0x10/0x10 [sunrpc]
[Wed Jan 29 10:11:17 2025]  ? nfsd4_cb_done+0x171/0x180 [nfsd]
[Wed Jan 29 10:11:17 2025]  ? nfsd4_cb_done+0x171/0x180 [nfsd]
[Wed Jan 29 10:11:17 2025]  rpc_exit_task+0x5b/0x170 [sunrpc]
[Wed Jan 29 10:11:17 2025]  __rpc_execute+0x9f/0x370 [sunrpc]
[Wed Jan 29 10:11:17 2025]  rpc_async_schedule+0x2b/0x40 [sunrpc]
[Wed Jan 29 10:11:17 2025]  process_one_work+0x179/0x390
[Wed Jan 29 10:11:17 2025]  worker_thread+0x239/0x340
[Wed Jan 29 10:11:17 2025]  ? __pfx_worker_thread+0x10/0x10
[Wed Jan 29 10:11:17 2025]  kthread+0xdb/0x110
[Wed Jan 29 10:11:17 2025]  ? __pfx_kthread+0x10/0x10
[Wed Jan 29 10:11:17 2025]  ret_from_fork+0x2d/0x50
[Wed Jan 29 10:11:17 2025]  ? __pfx_kthread+0x10/0x10
[Wed Jan 29 10:11:17 2025]  ret_from_fork_asm+0x1a/0x30
[Wed Jan 29 10:11:17 2025]  </TASK>
[Wed Jan 29 10:11:17 2025] ---[ end trace 0000000000000000 ]---

It also seems to indicate an issue with nfsd4_cb_done in a workqueue.

The "echo t > /proc/sysrq-trigger" was taken after stopping the trace-cmd.

I hope the trace.dat file contains enough information to help find the root cause. It was started before the first client mounted something from this server.

Regards,
Rik

View: https://bugzilla.kernel.org/show_bug.cgi?id=219710#c24
You can reply to this message to join the discussion.
-- 
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (bugspray 0.1-dev)


  parent reply	other threads:[~2025-01-29 13:14 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-20 15:00 NFSD threads hang when destroying a session or client ID Chuck Lever via Bugspray Bot
2025-01-20 15:14 ` Chuck Lever
2025-01-20 15:25 ` Chuck Lever via Bugspray Bot
2025-01-20 15:40 ` Chuck Lever via Bugspray Bot
2025-01-20 19:00 ` Chuck Lever via Bugspray Bot
2025-01-20 20:35 ` Baptiste PELLEGRIN via Bugspray Bot
2025-01-21 14:40 ` Jeff Layton via Bugspray Bot
2025-01-21 16:10 ` Chuck Lever via Bugspray Bot
2025-01-21 17:35   ` Jeff Layton via Bugspray Bot
2025-01-21 19:38     ` Tom Talpey
2025-01-21 19:43       ` Chuck Lever
2025-01-21 16:25 ` Baptiste PELLEGRIN via Bugspray Bot
2025-01-21 16:35   ` Chuck Lever via Bugspray Bot
2025-01-22 11:40     ` Baptiste PELLEGRIN via Bugspray Bot
2025-01-22 14:19       ` Chuck Lever
2025-01-22 21:25 ` JJ Jordan via Bugspray Bot
2025-01-22 21:25 ` JJ Jordan via Bugspray Bot
2025-01-23  2:10 ` Li Lingfeng via Bugspray Bot
2025-01-23 13:50 ` Jeff Layton via Bugspray Bot
2025-01-23 14:22   ` Chuck Lever
2025-01-23 20:25 ` Baptiste PELLEGRIN via Bugspray Bot
2025-01-23 21:45 ` Chuck Lever via Bugspray Bot
2025-01-26  9:25 ` Baptiste PELLEGRIN via Bugspray Bot
2025-01-26 17:05   ` Chuck Lever via Bugspray Bot
2025-01-29 13:15 ` rik.theys via Bugspray Bot [this message]
2025-01-29 19:40 ` Chuck Lever via Bugspray Bot
2025-01-30 14:05   ` rik.theys via Bugspray Bot
2025-01-29 19:50 ` Chuck Lever via Bugspray Bot
2025-02-10 12:05 ` Baptiste PELLEGRIN via Bugspray Bot
2025-02-21 13:42   ` Salvatore Bonaccorso
2025-02-21 13:57     ` Harald Dunkel
2025-02-21 14:31       ` Salvatore Bonaccorso
2025-02-21 14:50       ` Jeff Layton via Bugspray Bot
2025-02-21 16:00     ` Chuck Lever via Bugspray Bot
2025-02-21 14:45 ` Jeff Layton via Bugspray Bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250129-b219710c24-6f95a749b544@bugzilla.kernel.org \
    --to=bugbot@kernel.org \
    --cc=anna@kernel.org \
    --cc=baptiste.pellegrin@ac-grenoble.fr \
    --cc=benoit.gschwind@minesparis.psl.eu \
    --cc=carnil@debian.org \
    --cc=cel@kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=harald.dunkel@aixigo.com \
    --cc=herzog@phys.ethz.ch \
    --cc=jlayton@kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=tom@talpey.com \
    --cc=trondmy@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox