From: Lucas via Bugspray Bot <bugbot@kernel.org>
To: jlayton@kernel.org, trondmy@kernel.org, anna@kernel.org,
cel@kernel.org, linux-nfs@vger.kernel.org
Subject: NFS Server Issues with RDMA in Kernel 6.13.6
Date: Thu, 13 Mar 2025 18:35:08 +0000
Message-ID: <20250313-b219865c0-2a34cbc6e249@bugzilla.kernel.org>
Lucas added an attachment on Kernel.org Bugzilla:
Created attachment 307819
NFS over RDMA - Watchdog detected hard LOCKUP on cpu
Hi,
I am experiencing stability and performance issues when using NFS (kernel 6.13.6) over the RDMA protocol.
All I need to do to trigger the issue is connect a client and start read/write operations.
The fastest way to reproduce the issue is to run this fio job:
fio --name=test --rw=randwrite --bs=4k --filename=/mnt/nfs/test.io --size=40G --direct=1 --numjobs=18 --iodepth=24 --exitall --group_reporting --ioengine=libaio --time_based --runtime=300
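As a rough illustration of the load this job generates: with --ioengine=libaio and --direct=1, each of the 18 jobs can keep up to 24 writes in flight, so the server may see several hundred concurrent 4 KiB writes. The sketch below only does that arithmetic. The function names are made up for illustration, and the result is an upper bound derived from the command line, not a measurement.

```python
# Parameters copied from the fio command line above.
NUMJOBS = 18           # --numjobs=18
IODEPTH = 24           # --iodepth=24 (applies per job)
BLOCK_SIZE = 4 * 1024  # --bs=4k, in bytes


def max_in_flight(numjobs: int, iodepth: int) -> int:
    """Upper bound on outstanding I/O requests across all jobs."""
    return numjobs * iodepth


def max_in_flight_bytes(numjobs: int, iodepth: int, block_size: int) -> int:
    """Upper bound on bytes covered by outstanding requests."""
    return max_in_flight(numjobs, iodepth) * block_size


if __name__ == "__main__":
    requests = max_in_flight(NUMJOBS, IODEPTH)
    size = max_in_flight_bytes(NUMJOBS, IODEPTH, BLOCK_SIZE)
    print(f"{requests} requests, {size} bytes")  # 432 requests, 1769472 bytes
```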
dmesg reports "watchdog: Watchdog detected hard LOCKUP on cpu":
[ 976.676922] watchdog: Watchdog detected hard LOCKUP on cpu 182
[ 976.676931] Modules linked in: xfs(E) brd(E) nft_chain_nat(E) xt_MASQUERADE(E) nf_nat(E) nf_conntrack_netlink(E) xfrm_user(E) xfrm_algo(E) br_netfilter(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) xt_recent(E) null_blk(E) nvme_fabrics(E) nvme(E) nvme_core(E) overlay(E) ip6t_REJECT(E) nf_reject_ipv6(E) xt_hl(E) ip6t_rt(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_LOG(E) nf_log_syslog(E) xt_multiport(E) nft_limit(E) xt_limit(E) xt_addrtype(E) xt_tcpudp(E) xt_conntrack(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nft_compat(E) nf_tables(E) binfmt_misc(E) nfnetlink(E) nls_iso8859_1(E) rpcrdma(E) amd_atl(E) intel_rapl_msr(E) intel_rapl_common(E) amd64_edac(E) edac_mce_amd(E) sunrpc(E) rdma_ucm(E) ib_iser(E) libiscsi(E) ipmi_ssif(E) scsi_transport_iscsi(E) rdma_cm(E) ib_umad(E) kvm_amd(E) ib_ipoib(E) iw_cm(E) kvm(E) ib_cm(E) rapl(E) bridge(E) stp(E) llc(E) joydev(E) input_leds(E) ccp(E) ee1004(E) ptdma(E) k10temp(E) acpi_ipmi(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E)
mac_hid(E) sch_fq_codel(E) bonding(E)
[ 976.677035] efi_pstore(E) ip_tables(E) x_tables(E) autofs4(E) btrfs(E) blake2b_generic(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) raid1(E) raid0(E) mlx5_ib(E) ib_uverbs(E) ib_core(E) ast(E) drm_client_lib(E) drm_shmem_helper(E) hid_generic(E) mlx5_core(E) mpt3sas(E) rndis_host(E) igb(E) raid_class(E) drm_kms_helper(E) usbhid(E) cdc_ether(E) dca(E) mlxfw(E) crct10dif_pclmul(E) crc32_pclmul(E) polyval_clmulni(E) polyval_generic(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) sha1_ssse3(E) usbnet(E) psample(E) i2c_algo_bit(E) scsi_transport_sas(E) ahci(E) drm(E) mii(E) hid(E) libahci(E) i2c_piix4(E) tls(E) i2c_smbus(E) pci_hyperv_intf(E) aesni_intel(E) crypto_simd(E) cryptd(E)
[ 976.677112] CPU: 182 UID: 0 PID: 20143 Comm: nfsd Kdump: loaded Tainted: G E 6.13.6+ #1
[ 976.677118] Tainted: [E]=UNSIGNED_MODULE
[ 976.677120] Hardware name: Supermicro AS -4124GS-TNR/H12DSG-O-CPU, BIOS 2.8 01/26/2024
[ 976.677123] RIP: 0010:native_queued_spin_lock_slowpath+0x244/0x320
[ 976.677138] Code: ff ff 41 83 c6 01 41 c1 e5 10 41 c1 e6 12 45 09 ee 44 89 f0 c1 e8 10 66 41 87 44 24 02 89 c2 c1 e2 10 75 5f 31 d2 eb 02 f3 90 <41> 8b 04 24 66 85 c0 75 f5 89 c1 66 31 c9 44 39 f1 0f 84 97 00 00
[ 976.677141] RSP: 0018:ffffa16f2de8b948 EFLAGS: 00000002
[ 976.677145] RAX: 0000000001d00001 RBX: ffff8bfa8e937bc0 RCX: 0000000000000001
[ 976.677147] RDX: ffff8bfa8bd37bc0 RSI: 0000000003340000 RDI: ffff8bfb9fffbe08
[ 976.677149] RBP: ffffa16f2de8b970 R08: 0000000000000000 R09: 0000000000000000
[ 976.677151] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8bfb9fffbe08
[ 976.677153] R13: ffff8c3a8e037bc0 R14: 0000000002dc0000 R15: 00000000000000cc
[ 976.677155] FS: 0000000000000000(0000) GS:ffff8bfa8e900000(0000) knlGS:0000000000000000
[ 976.677158] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 976.677160] CR2: 00007fb9b1da66b0 CR3: 00000003e84bc000 CR4: 0000000000350ef0
[ 976.677162] Call Trace:
[ 976.677166] <NMI>
[ 976.677172] ? show_regs+0x71/0x90
[ 976.677182] ? watchdog_hardlockup_check+0x1ac/0x380
[ 976.677189] ? srso_return_thunk+0x5/0x5f
[ 976.677194] ? watchdog_overflow_callback+0x69/0x80
[ 976.677198] ? __perf_event_overflow+0x153/0x450
[ 976.677206] ? srso_return_thunk+0x5/0x5f
[ 976.677211] ? perf_event_overflow+0x19/0x30
[ 976.677215] ? x86_pmu_handle_irq+0x189/0x210
[ 976.677225] ? srso_return_thunk+0x5/0x5f
[ 976.677228] ? flush_tlb_one_kernel+0xe/0x40
[ 976.677234] ? srso_return_thunk+0x5/0x5f
[ 976.677237] ? set_pte_vaddr_p4d+0x58/0x80
[ 976.677244] ? srso_return_thunk+0x5/0x5f
[ 976.677247] ? set_pte_vaddr+0x89/0xc0
[ 976.677250] ? cc_platform_has+0x30/0x40
[ 976.677256] ? srso_return_thunk+0x5/0x5f
[ 976.677259] ? native_set_fixmap+0x6b/0xa0
[ 976.677262] ? srso_return_thunk+0x5/0x5f
[ 976.677265] ? ghes_copy_tofrom_phys+0x7c/0x130
[ 976.677274] ? srso_return_thunk+0x5/0x5f
[ 976.677277] ? __ghes_peek_estatus.isra.0+0x4e/0xd0
[ 976.677282] ? amd_pmu_handle_irq+0x48/0xc0
[ 976.677287] ? perf_event_nmi_handler+0x2d/0x60
[ 976.677290] ? nmi_handle+0x67/0x190
[ 976.677295] ? default_do_nmi+0x45/0x150
[ 976.677301] ? exc_nmi+0x13e/0x1e0
[ 976.677304] ? end_repeat_nmi+0xf/0x53
[ 976.677313] ? native_queued_spin_lock_slowpath+0x244/0x320
[ 976.677317] ? native_queued_spin_lock_slowpath+0x244/0x320
[ 976.677322] ? native_queued_spin_lock_slowpath+0x244/0x320
[ 976.677325] </NMI>
[ 976.677326] <TASK>
[ 976.677329] _raw_spin_lock_irqsave+0x5c/0x80
[ 976.677334] alloc_iova+0x92/0x290
[ 976.677341] ? current_time+0x2d/0x120
[ 976.677348] alloc_iova_fast+0x1fb/0x400
[ 976.677351] ? srso_return_thunk+0x5/0x5f
[ 976.677354] ? touch_atime+0x1f/0x110
[ 976.677360] iommu_dma_alloc_iova+0xa2/0x190
[ 976.677365] iommu_dma_map_sg+0x447/0x4e0
[ 976.677373] __dma_map_sg_attrs+0x139/0x1b0
[ 976.677380] dma_map_sgtable+0x21/0x50
[ 976.677386] rdma_rw_ctx_init+0x6c/0x820 [ib_core]
[ 976.677525] ? common_perm_cond+0x4d/0x210
[ 976.677532] ? srso_return_thunk+0x5/0x5f
[ 976.677538] ? xfs_vn_getattr+0xe2/0x3c0 [xfs]
[ 976.677700] svc_rdma_rw_ctx_init+0x49/0xf0 [rpcrdma]
[ 976.677725] svc_rdma_build_writes+0xa5/0x210 [rpcrdma]
[ 976.677746] ? __pfx_svc_rdma_pagelist_to_sg+0x10/0x10 [rpcrdma]
[ 976.677767] ? svc_rdma_send_write_list+0xf4/0x290 [rpcrdma]
[ 976.677790] svc_rdma_xb_write+0x7d/0xb0 [rpcrdma]
[ 976.677811] svc_rdma_send_write_list+0x144/0x290 [rpcrdma]
[ 976.677834] ? nfsd_cache_update+0x57/0x2c0 [nfsd]
[ 976.677889] svc_rdma_sendto+0x99/0x510 [rpcrdma]
[ 976.677912] ? svcauth_unix_release+0x1e/0x80 [sunrpc]
[ 976.677968] svc_send+0x49/0x140 [sunrpc]
[ 976.678013] svc_process+0x166/0x200 [sunrpc]
[ 976.678058] svc_recv+0x8a1/0xaa0 [sunrpc]
[ 976.678101] ? __pfx_nfsd+0x10/0x10 [nfsd]
[ 976.678144] nfsd+0xa7/0x110 [nfsd]
[ 976.678183] kthread+0xe4/0x120
[ 976.678188] ? __pfx_kthread+0x10/0x10
[ 976.678192] ret_from_fork+0x46/0x70
[ 976.678197] ? __pfx_kthread+0x10/0x10
[ 976.678200] ret_from_fork_asm+0x1a/0x30
[ 976.678210] </TASK>
Full log attached.
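For anyone going through the full log, the following is a minimal sketch of how the lockup reports could be pulled out of it. It assumes every report looks like the one quoted above (a "hard LOCKUP on cpu N" line followed by a <TASK> ... </TASK> section) and keeps only the reliable frames, skipping the ones the kernel marks with "?". The function and regex names are hypothetical, not part of any existing tool.

```python
import re
import sys

# "[ 976.676922] watchdog: Watchdog detected hard LOCKUP on cpu 182"
LOCKUP_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"watchdog: Watchdog detected hard LOCKUP on cpu (?P<cpu>\d+)\s*$"
)
# "[ 976.677334] alloc_iova+0x92/0x290" or "[ ... ] ? current_time+0x2d/0x120"
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<guess>\? )?"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_lockups(lines):
    """Return one dict per lockup report: timestamp, cpu, reliable task frames."""
    reports = []
    current = None
    in_task = False
    for line in lines:
        m = LOCKUP_RE.match(line)
        if m:
            current = {"timestamp": float(m["ts"]), "cpu": int(m["cpu"]), "frames": []}
            reports.append(current)
            in_task = False
            continue
        if current is None:
            continue
        # Text after the "[timestamp]" prefix.
        body = line.split("]", 1)[-1].strip()
        if body == "<TASK>":
            in_task = True
        elif body == "</TASK>":
            in_task = False
            current = None
        elif in_task:
            f = FRAME_RE.match(line)
            if f and not f["guess"]:  # skip unreliable "?" frames
                current["frames"].append(f["sym"])
    return reports


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 lockups.py dmesg-6.13.6.log
    with open(sys.argv[1], errors="replace") as log:
        for report in parse_lockups(log):
            print(report["cpu"], " <- ".join(report["frames"][:4]))
```

Run against the trace above, this would list CPU 182 with _raw_spin_lock_irqsave, alloc_iova, alloc_iova_fast and iommu_dma_alloc_iova as the innermost frames, which makes it easier to see whether the other reports in the log spin on the same lock.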
File: dmesg-6.13.6.log (text/plain)
Size: 407.10 KiB
Link: https://bugzilla.kernel.org/attachment.cgi?id=307819
---
NFS over RDMA - Watchdog detected hard LOCKUP on cpu
You can reply to this message to join the discussion.
--
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (bugspray 0.1-dev)