From: Lucas via Bugspray Bot <bugbot@kernel.org>
To: jlayton@kernel.org, trondmy@kernel.org, anna@kernel.org,
cel@kernel.org, linux-nfs@vger.kernel.org
Subject: NFS Server Issues with RDMA in Kernel 6.13.6
Date: Thu, 13 Mar 2025 18:35:08 +0000
Message-ID: <20250313-b219865c0-2a34cbc6e249@bugzilla.kernel.org>
Lucas added an attachment on Kernel.org Bugzilla:
Created attachment 307819
NFS over RDMA - Watchdog detected hard LOCKUP on cpu
Hi,
I am experiencing stability and performance issues when using the NFS server (kernel 6.13.6) over the RDMA transport.
All I need to do to trigger the issue is connect a client and start read/write operations.
The fastest way to reproduce the issue is to run this fio job:
fio --name=test --rw=randwrite --bs=4k --filename=/mnt/nfs/test.io --size=40G --direct=1 --numjobs=18 --iodepth=24 --exitall --group_reporting --ioengine=libaio --time_based --runtime=300
dmesg on the server reports "watchdog: Watchdog detected hard LOCKUP on cpu":
[ 976.676922] watchdog: Watchdog detected hard LOCKUP on cpu 182
[ 976.676931] Modules linked in: xfs(E) brd(E) nft_chain_nat(E) xt_MASQUERADE(E) nf_nat(E) nf_conntrack_netlink(E) xfrm_user(E) xfrm_algo(E) br_netfilter(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) xt_recent(E) null_blk(E) nvme_fabrics(E) nvme(E) nvme_core(E) overlay(E) ip6t_REJECT(E) nf_reject_ipv6(E) xt_hl(E) ip6t_rt(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_LOG(E) nf_log_syslog(E) xt_multiport(E) nft_limit(E) xt_limit(E) xt_addrtype(E) xt_tcpudp(E) xt_conntrack(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nft_compat(E) nf_tables(E) binfmt_misc(E) nfnetlink(E) nls_iso8859_1(E) rpcrdma(E) amd_atl(E) intel_rapl_msr(E) intel_rapl_common(E) amd64_edac(E) edac_mce_amd(E) sunrpc(E) rdma_ucm(E) ib_iser(E) libiscsi(E) ipmi_ssif(E) scsi_transport_iscsi(E) rdma_cm(E) ib_umad(E) kvm_amd(E) ib_ipoib(E) iw_cm(E) kvm(E) ib_cm(E) rapl(E) bridge(E) stp(E) llc(E) joydev(E) input_leds(E) ccp(E) ee1004(E) ptdma(E) k10temp(E) acpi_ipmi(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E)
mac_hid(E) sch_fq_codel(E) bonding(E)
[ 976.677035] efi_pstore(E) ip_tables(E) x_tables(E) autofs4(E) btrfs(E) blake2b_generic(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) raid1(E) raid0(E) mlx5_ib(E) ib_uverbs(E) ib_core(E) ast(E) drm_client_lib(E) drm_shmem_helper(E) hid_generic(E) mlx5_core(E) mpt3sas(E) rndis_host(E) igb(E) raid_class(E) drm_kms_helper(E) usbhid(E) cdc_ether(E) dca(E) mlxfw(E) crct10dif_pclmul(E) crc32_pclmul(E) polyval_clmulni(E) polyval_generic(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) sha1_ssse3(E) usbnet(E) psample(E) i2c_algo_bit(E) scsi_transport_sas(E) ahci(E) drm(E) mii(E) hid(E) libahci(E) i2c_piix4(E) tls(E) i2c_smbus(E) pci_hyperv_intf(E) aesni_intel(E) crypto_simd(E) cryptd(E)
[ 976.677112] CPU: 182 UID: 0 PID: 20143 Comm: nfsd Kdump: loaded Tainted: G E 6.13.6+ #1
[ 976.677118] Tainted: [E]=UNSIGNED_MODULE
[ 976.677120] Hardware name: Supermicro AS -4124GS-TNR/H12DSG-O-CPU, BIOS 2.8 01/26/2024
[ 976.677123] RIP: 0010:native_queued_spin_lock_slowpath+0x244/0x320
[ 976.677138] Code: ff ff 41 83 c6 01 41 c1 e5 10 41 c1 e6 12 45 09 ee 44 89 f0 c1 e8 10 66 41 87 44 24 02 89 c2 c1 e2 10 75 5f 31 d2 eb 02 f3 90 <41> 8b 04 24 66 85 c0 75 f5 89 c1 66 31 c9 44 39 f1 0f 84 97 00 00
[ 976.677141] RSP: 0018:ffffa16f2de8b948 EFLAGS: 00000002
[ 976.677145] RAX: 0000000001d00001 RBX: ffff8bfa8e937bc0 RCX: 0000000000000001
[ 976.677147] RDX: ffff8bfa8bd37bc0 RSI: 0000000003340000 RDI: ffff8bfb9fffbe08
[ 976.677149] RBP: ffffa16f2de8b970 R08: 0000000000000000 R09: 0000000000000000
[ 976.677151] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8bfb9fffbe08
[ 976.677153] R13: ffff8c3a8e037bc0 R14: 0000000002dc0000 R15: 00000000000000cc
[ 976.677155] FS: 0000000000000000(0000) GS:ffff8bfa8e900000(0000) knlGS:0000000000000000
[ 976.677158] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 976.677160] CR2: 00007fb9b1da66b0 CR3: 00000003e84bc000 CR4: 0000000000350ef0
[ 976.677162] Call Trace:
[ 976.677166] <NMI>
[ 976.677172] ? show_regs+0x71/0x90
[ 976.677182] ? watchdog_hardlockup_check+0x1ac/0x380
[ 976.677189] ? srso_return_thunk+0x5/0x5f
[ 976.677194] ? watchdog_overflow_callback+0x69/0x80
[ 976.677198] ? __perf_event_overflow+0x153/0x450
[ 976.677206] ? srso_return_thunk+0x5/0x5f
[ 976.677211] ? perf_event_overflow+0x19/0x30
[ 976.677215] ? x86_pmu_handle_irq+0x189/0x210
[ 976.677225] ? srso_return_thunk+0x5/0x5f
[ 976.677228] ? flush_tlb_one_kernel+0xe/0x40
[ 976.677234] ? srso_return_thunk+0x5/0x5f
[ 976.677237] ? set_pte_vaddr_p4d+0x58/0x80
[ 976.677244] ? srso_return_thunk+0x5/0x5f
[ 976.677247] ? set_pte_vaddr+0x89/0xc0
[ 976.677250] ? cc_platform_has+0x30/0x40
[ 976.677256] ? srso_return_thunk+0x5/0x5f
[ 976.677259] ? native_set_fixmap+0x6b/0xa0
[ 976.677262] ? srso_return_thunk+0x5/0x5f
[ 976.677265] ? ghes_copy_tofrom_phys+0x7c/0x130
[ 976.677274] ? srso_return_thunk+0x5/0x5f
[ 976.677277] ? __ghes_peek_estatus.isra.0+0x4e/0xd0
[ 976.677282] ? amd_pmu_handle_irq+0x48/0xc0
[ 976.677287] ? perf_event_nmi_handler+0x2d/0x60
[ 976.677290] ? nmi_handle+0x67/0x190
[ 976.677295] ? default_do_nmi+0x45/0x150
[ 976.677301] ? exc_nmi+0x13e/0x1e0
[ 976.677304] ? end_repeat_nmi+0xf/0x53
[ 976.677313] ? native_queued_spin_lock_slowpath+0x244/0x320
[ 976.677317] ? native_queued_spin_lock_slowpath+0x244/0x320
[ 976.677322] ? native_queued_spin_lock_slowpath+0x244/0x320
[ 976.677325] </NMI>
[ 976.677326] <TASK>
[ 976.677329] _raw_spin_lock_irqsave+0x5c/0x80
[ 976.677334] alloc_iova+0x92/0x290
[ 976.677341] ? current_time+0x2d/0x120
[ 976.677348] alloc_iova_fast+0x1fb/0x400
[ 976.677351] ? srso_return_thunk+0x5/0x5f
[ 976.677354] ? touch_atime+0x1f/0x110
[ 976.677360] iommu_dma_alloc_iova+0xa2/0x190
[ 976.677365] iommu_dma_map_sg+0x447/0x4e0
[ 976.677373] __dma_map_sg_attrs+0x139/0x1b0
[ 976.677380] dma_map_sgtable+0x21/0x50
[ 976.677386] rdma_rw_ctx_init+0x6c/0x820 [ib_core]
[ 976.677525] ? common_perm_cond+0x4d/0x210
[ 976.677532] ? srso_return_thunk+0x5/0x5f
[ 976.677538] ? xfs_vn_getattr+0xe2/0x3c0 [xfs]
[ 976.677700] svc_rdma_rw_ctx_init+0x49/0xf0 [rpcrdma]
[ 976.677725] svc_rdma_build_writes+0xa5/0x210 [rpcrdma]
[ 976.677746] ? __pfx_svc_rdma_pagelist_to_sg+0x10/0x10 [rpcrdma]
[ 976.677767] ? svc_rdma_send_write_list+0xf4/0x290 [rpcrdma]
[ 976.677790] svc_rdma_xb_write+0x7d/0xb0 [rpcrdma]
[ 976.677811] svc_rdma_send_write_list+0x144/0x290 [rpcrdma]
[ 976.677834] ? nfsd_cache_update+0x57/0x2c0 [nfsd]
[ 976.677889] svc_rdma_sendto+0x99/0x510 [rpcrdma]
[ 976.677912] ? svcauth_unix_release+0x1e/0x80 [sunrpc]
[ 976.677968] svc_send+0x49/0x140 [sunrpc]
[ 976.678013] svc_process+0x166/0x200 [sunrpc]
[ 976.678058] svc_recv+0x8a1/0xaa0 [sunrpc]
[ 976.678101] ? __pfx_nfsd+0x10/0x10 [nfsd]
[ 976.678144] nfsd+0xa7/0x110 [nfsd]
[ 976.678183] kthread+0xe4/0x120
[ 976.678188] ? __pfx_kthread+0x10/0x10
[ 976.678192] ret_from_fork+0x46/0x70
[ 976.678197] ? __pfx_kthread+0x10/0x10
[ 976.678200] ret_from_fork_asm+0x1a/0x30
[ 976.678210] </TASK>
Full log attached.
File: dmesg-6.13.6.log (text/plain)
Size: 407.10 KiB
Link: https://bugzilla.kernel.org/attachment.cgi?id=307819
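When reading a trace like the one above, it can help to drop the entries marked "?", which the kernel unwinder prints as unreliable guesses, and keep only the frames between <TASK> and </TASK>. The following is a minimal sketch of that filtering, not part of the original report; the function name reliable_task_frames and the regular expression are illustrative assumptions and have only been checked against the line format shown in this message.

```python
import re

# Matches one stack-frame line as it appears in the dmesg excerpt, e.g.
#   "[ 976.677334] alloc_iova+0x92/0x290"
#   "[ 976.677386] rdma_rw_ctx_init+0x6c/0x820 [ib_core]"
#   "[ 976.677341] ? current_time+0x2d/0x120"
# The pattern is an assumption based on this excerpt only.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<guess>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?\s*$"
)


def reliable_task_frames(log_text):
    """Return (symbol, module) pairs for non-'?' frames inside <TASK>...</TASK>.

    module is None for symbols built into the kernel image.
    """
    frames = []
    in_task = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.endswith("</TASK>"):
            in_task = False
        elif stripped.endswith("<TASK>"):
            in_task = True
        elif in_task:
            m = FRAME_RE.match(stripped)
            # Skip lines that are not frames and frames flagged with "?".
            if m and not m.group("guess"):
                frames.append((m.group("sym"), m.group("mod")))
    return frames


# A few lines copied from the trace in this message.
sample = """\
[ 976.677326] <TASK>
[ 976.677329] _raw_spin_lock_irqsave+0x5c/0x80
[ 976.677334] alloc_iova+0x92/0x290
[ 976.677341] ? current_time+0x2d/0x120
[ 976.677386] rdma_rw_ctx_init+0x6c/0x820 [ib_core]
[ 976.678210] </TASK>
"""

for sym, mod in reliable_task_frames(sample):
    print(sym, f"[{mod}]" if mod else "")
```

Applied to the full trace, this leaves the chain from nfsd through svc_rdma_sendto and rdma_rw_ctx_init down to alloc_iova and _raw_spin_lock_irqsave, which is where the CPU was spinning when the watchdog fired.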
---
NFS over RDMA - Watchdog detected hard LOCKUP on cpu
You can reply to this message to join the discussion.
--
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (bugspray 0.1-dev)