* NFS Server Issues with RDMA in Kernel 6.13.6
@ 2025-03-13 18:35 Lucas via Bugspray Bot
2025-03-13 18:44 ` Chuck Lever
` (4 more replies)
0 siblings, 5 replies; 9+ messages in thread
From: Lucas via Bugspray Bot @ 2025-03-13 18:35 UTC (permalink / raw)
To: jlayton, trondmy, anna, cel, linux-nfs
Lucas added an attachment on Kernel.org Bugzilla:
Created attachment 307819
NFS over RDMA - Watchdog detected hard LOCKUP on cpu
Hi
I am experiencing stability and performance issues when using NFS (kernel 6.13.6) over the RDMA protocol.
All I need to do to trigger the issue is connect a client and start read/write operations.
The fastest way to reproduce the issue is to run this fio job:
fio --name=test --rw=randwrite --bs=4k --filename=/mnt/nfs/test.io --size=40G --direct=1 --numjobs=18 --iodepth=24 --exitall --group_reporting --ioengine=libaio --time_based --runtime=300
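For reference, the client is mounted over RDMA in the usual way, something like this (the server name and export path here are placeholders, not my exact command):

mount -t nfs -o rdma,port=20049 server:/export /mnt/nfs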
Dmesg says: "watchdog: Watchdog detected hard LOCKUP on cpu "
[ 976.676922] watchdog: Watchdog detected hard LOCKUP on cpu 182
[ 976.676931] Modules linked in: xfs(E) brd(E) nft_chain_nat(E) xt_MASQUERADE(E) nf_nat(E) nf_conntrack_netlink(E) xfrm_user(E) xfrm_algo(E) br_netfilter(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) xt_recent(E) null_blk(E) nvme_fabrics(E) nvme(E) nvme_core(E) overlay(E) ip6t_REJECT(E) nf_reject_ipv6(E) xt_hl(E) ip6t_rt(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_LOG(E) nf_log_syslog(E) xt_multiport(E) nft_limit(E) xt_limit(E) xt_addrtype(E) xt_tcpudp(E) xt_conntrack(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nft_compat(E) nf_tables(E) binfmt_misc(E) nfnetlink(E) nls_iso8859_1(E) rpcrdma(E) amd_atl(E) intel_rapl_msr(E) intel_rapl_common(E) amd64_edac(E) edac_mce_amd(E) sunrpc(E) rdma_ucm(E) ib_iser(E) libiscsi(E) ipmi_ssif(E) scsi_transport_iscsi(E) rdma_cm(E) ib_umad(E) kvm_amd(E) ib_ipoib(E) iw_cm(E) kvm(E) ib_cm(E) rapl(E) bridge(E) stp(E) llc(E) joydev(E) input_leds(E) ccp(E) ee1004(E) ptdma(E) k10temp(E) acpi_ipmi(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E)
mac_hid(E) sch_fq_codel(E) bonding(E)
[ 976.677035] efi_pstore(E) ip_tables(E) x_tables(E) autofs4(E) btrfs(E) blake2b_generic(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) raid1(E) raid0(E) mlx5_ib(E) ib_uverbs(E) ib_core(E) ast(E) drm_client_lib(E) drm_shmem_helper(E) hid_generic(E) mlx5_core(E) mpt3sas(E) rndis_host(E) igb(E) raid_class(E) drm_kms_helper(E) usbhid(E) cdc_ether(E) dca(E) mlxfw(E) crct10dif_pclmul(E) crc32_pclmul(E) polyval_clmulni(E) polyval_generic(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) sha1_ssse3(E) usbnet(E) psample(E) i2c_algo_bit(E) scsi_transport_sas(E) ahci(E) drm(E) mii(E) hid(E) libahci(E) i2c_piix4(E) tls(E) i2c_smbus(E) pci_hyperv_intf(E) aesni_intel(E) crypto_simd(E) cryptd(E)
[ 976.677112] CPU: 182 UID: 0 PID: 20143 Comm: nfsd Kdump: loaded Tainted: G E 6.13.6+ #1
[ 976.677118] Tainted: [E]=UNSIGNED_MODULE
[ 976.677120] Hardware name: Supermicro AS -4124GS-TNR/H12DSG-O-CPU, BIOS 2.8 01/26/2024
[ 976.677123] RIP: 0010:native_queued_spin_lock_slowpath+0x244/0x320
[ 976.677138] Code: ff ff 41 83 c6 01 41 c1 e5 10 41 c1 e6 12 45 09 ee 44 89 f0 c1 e8 10 66 41 87 44 24 02 89 c2 c1 e2 10 75 5f 31 d2 eb 02 f3 90 <41> 8b 04 24 66 85 c0 75 f5 89 c1 66 31 c9 44 39 f1 0f 84 97 00 00
[ 976.677141] RSP: 0018:ffffa16f2de8b948 EFLAGS: 00000002
[ 976.677145] RAX: 0000000001d00001 RBX: ffff8bfa8e937bc0 RCX: 0000000000000001
[ 976.677147] RDX: ffff8bfa8bd37bc0 RSI: 0000000003340000 RDI: ffff8bfb9fffbe08
[ 976.677149] RBP: ffffa16f2de8b970 R08: 0000000000000000 R09: 0000000000000000
[ 976.677151] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8bfb9fffbe08
[ 976.677153] R13: ffff8c3a8e037bc0 R14: 0000000002dc0000 R15: 00000000000000cc
[ 976.677155] FS: 0000000000000000(0000) GS:ffff8bfa8e900000(0000) knlGS:0000000000000000
[ 976.677158] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 976.677160] CR2: 00007fb9b1da66b0 CR3: 00000003e84bc000 CR4: 0000000000350ef0
[ 976.677162] Call Trace:
[ 976.677166] <NMI>
[ 976.677172] ? show_regs+0x71/0x90
[ 976.677182] ? watchdog_hardlockup_check+0x1ac/0x380
[ 976.677189] ? srso_return_thunk+0x5/0x5f
[ 976.677194] ? watchdog_overflow_callback+0x69/0x80
[ 976.677198] ? __perf_event_overflow+0x153/0x450
[ 976.677206] ? srso_return_thunk+0x5/0x5f
[ 976.677211] ? perf_event_overflow+0x19/0x30
[ 976.677215] ? x86_pmu_handle_irq+0x189/0x210
[ 976.677225] ? srso_return_thunk+0x5/0x5f
[ 976.677228] ? flush_tlb_one_kernel+0xe/0x40
[ 976.677234] ? srso_return_thunk+0x5/0x5f
[ 976.677237] ? set_pte_vaddr_p4d+0x58/0x80
[ 976.677244] ? srso_return_thunk+0x5/0x5f
[ 976.677247] ? set_pte_vaddr+0x89/0xc0
[ 976.677250] ? cc_platform_has+0x30/0x40
[ 976.677256] ? srso_return_thunk+0x5/0x5f
[ 976.677259] ? native_set_fixmap+0x6b/0xa0
[ 976.677262] ? srso_return_thunk+0x5/0x5f
[ 976.677265] ? ghes_copy_tofrom_phys+0x7c/0x130
[ 976.677274] ? srso_return_thunk+0x5/0x5f
[ 976.677277] ? __ghes_peek_estatus.isra.0+0x4e/0xd0
[ 976.677282] ? amd_pmu_handle_irq+0x48/0xc0
[ 976.677287] ? perf_event_nmi_handler+0x2d/0x60
[ 976.677290] ? nmi_handle+0x67/0x190
[ 976.677295] ? default_do_nmi+0x45/0x150
[ 976.677301] ? exc_nmi+0x13e/0x1e0
[ 976.677304] ? end_repeat_nmi+0xf/0x53
[ 976.677313] ? native_queued_spin_lock_slowpath+0x244/0x320
[ 976.677317] ? native_queued_spin_lock_slowpath+0x244/0x320
[ 976.677322] ? native_queued_spin_lock_slowpath+0x244/0x320
[ 976.677325] </NMI>
[ 976.677326] <TASK>
[ 976.677329] _raw_spin_lock_irqsave+0x5c/0x80
[ 976.677334] alloc_iova+0x92/0x290
[ 976.677341] ? current_time+0x2d/0x120
[ 976.677348] alloc_iova_fast+0x1fb/0x400
[ 976.677351] ? srso_return_thunk+0x5/0x5f
[ 976.677354] ? touch_atime+0x1f/0x110
[ 976.677360] iommu_dma_alloc_iova+0xa2/0x190
[ 976.677365] iommu_dma_map_sg+0x447/0x4e0
[ 976.677373] __dma_map_sg_attrs+0x139/0x1b0
[ 976.677380] dma_map_sgtable+0x21/0x50
[ 976.677386] rdma_rw_ctx_init+0x6c/0x820 [ib_core]
[ 976.677525] ? common_perm_cond+0x4d/0x210
[ 976.677532] ? srso_return_thunk+0x5/0x5f
[ 976.677538] ? xfs_vn_getattr+0xe2/0x3c0 [xfs]
[ 976.677700] svc_rdma_rw_ctx_init+0x49/0xf0 [rpcrdma]
[ 976.677725] svc_rdma_build_writes+0xa5/0x210 [rpcrdma]
[ 976.677746] ? __pfx_svc_rdma_pagelist_to_sg+0x10/0x10 [rpcrdma]
[ 976.677767] ? svc_rdma_send_write_list+0xf4/0x290 [rpcrdma]
[ 976.677790] svc_rdma_xb_write+0x7d/0xb0 [rpcrdma]
[ 976.677811] svc_rdma_send_write_list+0x144/0x290 [rpcrdma]
[ 976.677834] ? nfsd_cache_update+0x57/0x2c0 [nfsd]
[ 976.677889] svc_rdma_sendto+0x99/0x510 [rpcrdma]
[ 976.677912] ? svcauth_unix_release+0x1e/0x80 [sunrpc]
[ 976.677968] svc_send+0x49/0x140 [sunrpc]
[ 976.678013] svc_process+0x166/0x200 [sunrpc]
[ 976.678058] svc_recv+0x8a1/0xaa0 [sunrpc]
[ 976.678101] ? __pfx_nfsd+0x10/0x10 [nfsd]
[ 976.678144] nfsd+0xa7/0x110 [nfsd]
[ 976.678183] kthread+0xe4/0x120
[ 976.678188] ? __pfx_kthread+0x10/0x10
[ 976.678192] ret_from_fork+0x46/0x70
[ 976.678197] ? __pfx_kthread+0x10/0x10
[ 976.678200] ret_from_fork_asm+0x1a/0x30
[ 976.678210] </TASK>
Full log attached.
File: dmesg-6.13.6.log (text/plain)
Size: 407.10 KiB
Link: https://bugzilla.kernel.org/attachment.cgi?id=307819
---
NFS over RDMA - Watchdog detected hard LOCKUP on cpu
You can reply to this message to join the discussion.
--
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (bugspray 0.1-dev)
* Re: NFS Server Issues with RDMA in Kernel 6.13.6
2025-03-13 18:35 NFS Server Issues with RDMA in Kernel 6.13.6 Lucas via Bugspray Bot
@ 2025-03-13 18:44 ` Chuck Lever
2025-03-13 19:20 ` Lucas via Bugspray Bot
2025-03-24 16:55 ` Lucas via Bugspray Bot
` (3 subsequent siblings)
4 siblings, 1 reply; 9+ messages in thread
From: Chuck Lever @ 2025-03-13 18:44 UTC (permalink / raw)
To: Lucas via Bugspray Bot, jlayton, trondmy, anna, cel; +Cc: iommu, linux-nfs
On 3/13/25 2:35 PM, Lucas via Bugspray Bot wrote:
> Lucas added an attachment on Kernel.org Bugzilla:
>
> Created attachment 307819
> NFS over RDMA - Watchdog detected hard LOCKUP on cpu
>
> Hi
>
> I am experiencing stability and performance issues when using NFS (kernel 6.13.6) over the RDMA protocol.
> All I need to do to trigger the issue is connect a client and start read/write operations.
>
> The fastest way to reproduce the issue is to run this fio job:
>
> fio --name=test --rw=randwrite --bs=4k --filename=/mnt/nfs/test.io --size=40G --direct=1 --numjobs=18 --iodepth=24 --exitall --group_reporting --ioengine=libaio --time_based --runtime=300
>
> Dmesg says: "watchdog: Watchdog detected hard LOCKUP on cpu "
>
> [ 976.676922] watchdog: Watchdog detected hard LOCKUP on cpu 182
> [...]
> [ 976.677326] <TASK>
> [ 976.677329] _raw_spin_lock_irqsave+0x5c/0x80
> [ 976.677334] alloc_iova+0x92/0x290
> [ 976.677341] ? current_time+0x2d/0x120
> [ 976.677348] alloc_iova_fast+0x1fb/0x400
> [ 976.677351] ? srso_return_thunk+0x5/0x5f
> [ 976.677354] ? touch_atime+0x1f/0x110
> [ 976.677360] iommu_dma_alloc_iova+0xa2/0x190
> [ 976.677365] iommu_dma_map_sg+0x447/0x4e0
I'm assuming you have IOMMU enabled on the boot command line.
Can you share your hardware configuration and RDMA NIC?
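Something like the following would show the IOMMU state (a rough sketch, adjust as needed; the last command shows the per-group translation mode, if your kernel exposes it):

$ cat /proc/cmdline
$ dmesg | grep -i -e AMD-Vi -e iommu
$ cat /sys/kernel/iommu_groups/*/type | sort | uniq -c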
> [ 976.677373] __dma_map_sg_attrs+0x139/0x1b0
> [ 976.677380] dma_map_sgtable+0x21/0x50
So, here (and above) is where we leave the NFS server and venture into
the IOMMU layer. Adding the I/O folks for additional eyes.
Can you give us the output of:
$ scripts/faddr2line drivers/iommu/iova.o alloc_iova+0x92
Should look something like:
alloc_iova+0x92/0x1d0:
__get_cached_rbnode at
/home/cel/src/linux/server-development/drivers/iommu/iova.c:68
(inlined by) __alloc_and_insert_iova_range at
/home/cel/src/linux/server-development/drivers/iommu/iova.c:184
(inlined by) alloc_iova at
/home/cel/src/linux/server-development/drivers/iommu/iova.c:263
> [ 976.677386] rdma_rw_ctx_init+0x6c/0x820 [ib_core]
> [...]
--
Chuck Lever
* Re: NFS Server Issues with RDMA in Kernel 6.13.6
2025-03-13 18:44 ` Chuck Lever
@ 2025-03-13 19:20 ` Lucas via Bugspray Bot
2025-03-14 10:43 ` Robin Murphy
0 siblings, 1 reply; 9+ messages in thread
From: Lucas via Bugspray Bot @ 2025-03-13 19:20 UTC (permalink / raw)
To: jlayton, chuck.lever, linux-nfs, iommu, cel, trondmy, anna
Lucas writes via Kernel.org Bugzilla:
(In reply to Bugspray Bot from comment #1)
> Chuck Lever <chuck.lever@oracle.com> writes:
>
> On 3/13/25 2:35 PM, Lucas via Bugspray Bot wrote:
> [...]
> > [ 976.677360] iommu_dma_alloc_iova+0xa2/0x190
> > [ 976.677365] iommu_dma_map_sg+0x447/0x4e0
>
> I'm assuming you have IOMMU enabled on the boot command line.
>
> Can you share your hardware configuration and RDMA NIC?
>
>
system: Supermicro AS-4124GS-TNR
cpu: AMD EPYC 7H12 64-Core Processor
ram: 512G
rdma nic: Mellanox Technologies MT2910 Family [ConnectX-7]
> > [ 976.677373] __dma_map_sg_attrs+0x139/0x1b0
> > [ 976.677380] dma_map_sgtable+0x21/0x50
>
> So, here (and above) is where we leave the NFS server and venture into
> the IOMMU layer. Adding the I/O folks for additional eyes.
>
> Can you give us the output of:
>
> $ scripts/faddr2line drivers/iommu/iova.o alloc_iova+0x92
>
root@test:/usr/src/linux-6.13.6# scripts/faddr2line drivers/iommu/iova.o alloc_iova+0x92
alloc_iova+0x92/0x290:
__alloc_and_insert_iova_range at /usr/src/linux-6.13.6/drivers/iommu/iova.c:180
(inlined by) alloc_iova at /usr/src/linux-6.13.6/drivers/iommu/iova.c:263
root@test:/usr/src/linux-6.13.6#
> Should look something like:
>
> alloc_iova+0x92/0x1d0:
> __get_cached_rbnode at
> /home/cel/src/linux/server-development/drivers/iommu/iova.c:68
> (inlined by) __alloc_and_insert_iova_range at
> /home/cel/src/linux/server-development/drivers/iommu/iova.c:184
> (inlined by) alloc_iova at
> /home/cel/src/linux/server-development/drivers/iommu/iova.c:263
>
> [...]
> (via https://msgid.link/7285c885-f998-410a-b6a6-f743328bf0b8@oracle.com)
View: https://bugzilla.kernel.org/show_bug.cgi?id=219865#c2
You can reply to this message to join the discussion.
--
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (bugspray 0.1-dev)
* Re: NFS Server Issues with RDMA in Kernel 6.13.6
2025-03-13 19:20 ` Lucas via Bugspray Bot
@ 2025-03-14 10:43 ` Robin Murphy
2025-03-14 20:26 ` Chuck Lever
0 siblings, 1 reply; 9+ messages in thread
From: Robin Murphy @ 2025-03-14 10:43 UTC (permalink / raw)
To: Lucas via Bugspray Bot, jlayton, chuck.lever, linux-nfs, iommu,
cel, trondmy, anna
On 2025-03-13 7:20 pm, Lucas via Bugspray Bot wrote:
[...]
> system: Supermicro AS-4124GS-TNR
> cpu: AMD EPYC 7H12 64-Core Processor
> ram: 512G
> rdma nic: Mellanox Technologies MT2910 Family [ConnectX-7]
>
>
>>> [ 976.677373] __dma_map_sg_attrs+0x139/0x1b0
>>> [ 976.677380] dma_map_sgtable+0x21/0x50
>>
>> So, here (and above) is where we leave the NFS server and venture into
>> the IOMMU layer. Adding the I/O folks for additional eyes.
>>
>> Can you give us the output of:
>>
>> $ scripts/faddr2line drivers/iommu/iova.o alloc_iova+0x92
>>
>
>
> root@test:/usr/src/linux-6.13.6# scripts/faddr2line drivers/iommu/iova.o alloc_iova+0x92
> alloc_iova+0x92/0x290:
> __alloc_and_insert_iova_range at /usr/src/linux-6.13.6/drivers/iommu/iova.c:180
> (inlined by) alloc_iova at /usr/src/linux-6.13.6/drivers/iommu/iova.c:263
> root@test:/usr/src/linux-6.13.6#
OK, so this is waiting for iova_rbtree_lock to get into the allocation
slowpath, since there was nothing suitable in the IOVA caches. Said
slowpath under the lock is unfortunately prone to being quite slow,
especially as the rbtree fills up with massive numbers of relatively
small allocations (which I'm guessing I/O with a 4KB block size would
tend towards). If you have 256 threads all contending on the same path,
then they could certainly end up waiting a while, although they
shouldn't be *permanently* stuck...
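One quick way to sanity-check that theory, for what it's worth: rerun the reproducer with a much larger block size, so the same workload generates far fewer (and larger) DMA mappings per second. Something like:

fio --name=test --rw=randwrite --bs=1M --filename=/mnt/nfs/test.io --size=40G --direct=1 --numjobs=18 --iodepth=24 --exitall --group_reporting --ioengine=libaio --time_based --runtime=300

If the lockup warnings disappear with that, it points at rbtree contention from the sheer number of small allocations rather than anything else.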
Thanks,
Robin.
* Re: NFS Server Issues with RDMA in Kernel 6.13.6
2025-03-14 10:43 ` Robin Murphy
@ 2025-03-14 20:26 ` Chuck Lever
0 siblings, 0 replies; 9+ messages in thread
From: Chuck Lever @ 2025-03-14 20:26 UTC (permalink / raw)
To: Robin Murphy, Lucas via Bugspray Bot, jlayton, linux-nfs, iommu,
cel, trondmy, anna
On 3/14/25 6:43 AM, Robin Murphy wrote:
> On 2025-03-13 7:20 pm, Lucas via Bugspray Bot wrote:
> [...]
>> [...]
>
> OK, so this is waiting for iova_rbtree_lock to get into the allocation
> slowpath, since there was nothing suitable in the IOVA caches. Said
> slowpath under the lock is unfortunately prone to being quite slow,
> especially as the rbtree fills up with massive numbers of relatively
> small allocations (which I'm guessing I/O with a 4KB block size would
> tend towards). If you have 256 threads all contending on the same path,
> then they could certainly end up waiting a while, although they
> shouldn't be *permanently* stuck...
The reported PID is different on every stack dump, so this doesn't look
like a permanent stall for any of the nfsd threads.
But is there a way that NFSD can reduce the amount of IOVA fragmentation
it causes? I wouldn't think that a similar multi-threaded 4KB I/O
workload on a local disk would result in the same kind of stalling
behavior.
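For comparison, the equivalent job run directly against the exported file system (the path below is illustrative) would show whether the local stack stalls the same way:

fio --name=local --rw=randwrite --bs=4k --filename=/export/test.io --size=40G --direct=1 --numjobs=18 --iodepth=24 --exitall --group_reporting --ioengine=libaio --time_based --runtime=300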
I also note that the stack trace is the same for each occurrence:
[ 1047.817528] alloc_iova+0x92/0x290
[ 1047.817534] ? __alloc_pages_noprof+0x191/0x1280
[ 1047.817542] ? current_time+0x2d/0x120
[ 1047.817548] alloc_iova_fast+0x1fb/0x400
[ 1047.817554] iommu_dma_alloc_iova+0xa2/0x190
[ 1047.817559] iommu_dma_map_sg+0x447/0x4e0
[ 1047.817566] __dma_map_sg_attrs+0x139/0x1b0
[ 1047.817572] dma_map_sgtable+0x21/0x50
[ 1047.817578] rdma_rw_ctx_init+0x6c/0x820 [ib_core]
[ 1047.817720] ? srso_return_thunk+0x5/0x5f
[ 1047.817729] svc_rdma_rw_ctx_init+0x49/0xf0 [rpcrdma]
[ 1047.817757] svc_rdma_build_writes+0xa5/0x210 [rpcrdma]
[ 1047.817774] ? __pfx_svc_rdma_pagelist_to_sg+0x10/0x10 [rpcrdma]
[ 1047.817791] ? svc_rdma_send_write_list+0xf4/0x290 [rpcrdma]
[ 1047.817810] svc_rdma_xb_write+0x7d/0xb0 [rpcrdma]
[ 1047.817828] svc_rdma_send_write_list+0x144/0x290 [rpcrdma]
svc_rdma_send_write_list() appears in all of these.
This function assembles an NFS READ response that will use an RDMA Write
to convey the I/O payload to the NFS client.
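If it helps to quantify this, a rough way to count how many nfsd threads are sitting in that path at a given instant (needs root, and a kernel that exposes /proc/<pid>/stack):

$ for pid in $(pgrep -x nfsd); do cat /proc/$pid/stack; done | grep -c svc_rdma_send_write_list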
--
Chuck Lever
* Re: NFS Server Issues with RDMA in Kernel 6.13.6
2025-03-13 18:35 NFS Server Issues with RDMA in Kernel 6.13.6 Lucas via Bugspray Bot
2025-03-13 18:44 ` Chuck Lever
@ 2025-03-24 16:55 ` Lucas via Bugspray Bot
2025-03-24 18:35 ` Chuck Lever via Bugspray Bot
` (2 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Lucas via Bugspray Bot @ 2025-03-24 16:55 UTC (permalink / raw)
To: linux-nfs, iommu, trondmy, chuck.lever, cel, robin.murphy, anna,
jlayton
Lucas writes via Kernel.org Bugzilla:
Apologies for the silly question; I’m not a kernel developer. Is there any update on this issue, or is a fix included in the latest 6.14 kernel? If there’s anything I can do to help, such as running more tests, please let me know.
View: https://bugzilla.kernel.org/show_bug.cgi?id=219865#c5
You can reply to this message to join the discussion.
--
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (bugspray 0.1-dev)
* Re: NFS Server Issues with RDMA in Kernel 6.13.6
2025-03-13 18:35 NFS Server Issues with RDMA in Kernel 6.13.6 Lucas via Bugspray Bot
2025-03-13 18:44 ` Chuck Lever
2025-03-24 16:55 ` Lucas via Bugspray Bot
@ 2025-03-24 18:35 ` Chuck Lever via Bugspray Bot
2025-03-26 23:00 ` Lucas via Bugspray Bot
2025-04-01 14:10 ` Chuck Lever via Bugspray Bot
4 siblings, 0 replies; 9+ messages in thread
From: Chuck Lever via Bugspray Bot @ 2025-03-24 18:35 UTC (permalink / raw)
To: iommu, robin.murphy, anna, chuck.lever, jlayton, trondmy,
linux-nfs, cel
Chuck Lever writes via Kernel.org Bugzilla:
No progress. It looks like a long-term structural problem that we won't be able to address quickly.
It would be good to confirm that the issue is indeed contention in the IOVA allocator. Compiling with LOCK_STAT enabled and looking in /proc/lock_stat would give some indication of whether Robin's theory is on the right track.
Meanwhile I plan to ask the RDMA gurus if there is an improvement that can be made in the svcrdma implementation to help address the problem. If you need immediate relief, reducing the NFSD thread count might help.
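Roughly, and only as a sketch (paths assume a standard kernel built with CONFIG_LOCK_STAT=y, and knfsd):

$ echo 0 > /proc/lock_stat               # clear the counters
$ echo 1 > /proc/sys/kernel/lock_stat    # start collecting
  (run the fio reproducer)
$ echo 0 > /proc/sys/kernel/lock_stat    # stop collecting
$ grep -B 1 -A 6 iova_rbtree_lock /proc/lock_stat

And for the stopgap, since each nfsd thread is one more contender for the IOVA lock:

$ echo 16 > /proc/fs/nfsd/threads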
View: https://bugzilla.kernel.org/show_bug.cgi?id=219865#c6
You can reply to this message to join the discussion.
--
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (bugspray 0.1-dev)
* Re: NFS Server Issues with RDMA in Kernel 6.13.6
2025-03-13 18:35 NFS Server Issues with RDMA in Kernel 6.13.6 Lucas via Bugspray Bot
` (2 preceding siblings ...)
2025-03-24 18:35 ` Chuck Lever via Bugspray Bot
@ 2025-03-26 23:00 ` Lucas via Bugspray Bot
2025-04-01 14:10 ` Chuck Lever via Bugspray Bot
4 siblings, 0 replies; 9+ messages in thread
From: Lucas via Bugspray Bot @ 2025-03-26 23:00 UTC (permalink / raw)
To: linux-nfs, trondmy, cel, anna, iommu, jlayton, robin.murphy,
chuck.lever
Lucas added an attachment on Kernel.org Bugzilla:
Created attachment 307898
logs with LOCK_STAT enabled
Hi
I was able to recompile the kernel with LOCK_STAT enabled and reproduce the issue. Logs are attached.
File: logs-lock_stat.zip (application/zip)
Size: 139.49 KiB
Link: https://bugzilla.kernel.org/attachment.cgi?id=307898
---
logs with LOCK_STAT enabled
You can reply to this message to join the discussion.
--
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (bugspray 0.1-dev)
* Re: NFS Server Issues with RDMA in Kernel 6.13.6
2025-03-13 18:35 NFS Server Issues with RDMA in Kernel 6.13.6 Lucas via Bugspray Bot
` (3 preceding siblings ...)
2025-03-26 23:00 ` Lucas via Bugspray Bot
@ 2025-04-01 14:10 ` Chuck Lever via Bugspray Bot
4 siblings, 0 replies; 9+ messages in thread
From: Chuck Lever via Bugspray Bot @ 2025-04-01 14:10 UTC (permalink / raw)
To: anna, cel, iommu, jlayton, trondmy, robin.murphy, linux-nfs,
chuck.lever
Chuck Lever writes via Kernel.org Bugzilla:
Attachment 307898 shows something completely different from before:
[ 4478.512632] xfs_blockgc_flush_all+0x99/0x140 [xfs]
[ 4478.512760] xfs_trans_alloc+0x116/0x2b0 [xfs]
[ 4478.512890] xfs_trans_alloc_inode+0x7d/0x190 [xfs]
[ 4478.513012] xfs_alloc_file_space+0x1ad/0x340 [xfs]
[ 4478.513149] xfs_file_fallocate+0x243/0x4b0 [xfs]
[ 4478.513276] vfs_fallocate+0x18d/0x440
[ 4478.513294] nfsd4_vfs_fallocate+0x50/0xa0 [nfsd]
[ 4478.513351] nfsd4_allocate+0x69/0xb0 [nfsd]
[ 4478.513398] nfsd4_proc_compound+0x484/0x930 [nfsd]
[ 4478.513432] ? srso_return_thunk+0x5/0x5f
[ 4478.513447] nfsd_dispatch+0xfe/0x2e0 [nfsd]
[ 4478.513493] svc_process_common+0x352/0x7d0 [sunrpc]
[ 4478.513556] ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[ 4478.513600] svc_process+0x13e/0x260 [sunrpc]
[ 4478.513644] svc_recv+0x9a8/0xc40 [sunrpc]
[ 4478.513694] ? __pfx_nfsd+0x10/0x10 [nfsd]
[ 4478.513733] nfsd+0xe0/0x150 [nfsd]
[ 4478.513769] kthread+0xe7/0x120
[ 4478.513777] ? __pfx_kthread+0x10/0x10
[ 4478.513787] ret_from_fork+0x46/0x70
[ 4478.513794] ? __pfx_kthread+0x10/0x10
[ 4478.513801] ret_from_fork_asm+0x1a/0x30
Here your NFS server is waiting in the local file system for an NFSv4.2 ALLOCATE operation. Nothing to do with RDMA.
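For what it's worth, a server-side ALLOCATE corresponds to fallocate() on the client, and fio preallocates its files by default when laying them out, so the reproducer itself will generate these. The same single operation can be triggered by hand with, e.g.:

$ fallocate -l 40G /mnt/nfs/test.io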
I'm at a loss to point to anything specific; I think you are simply at the limit of what your server configuration can handle.
View: https://bugzilla.kernel.org/show_bug.cgi?id=219865#c8
You can reply to this message to join the discussion.
--
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (bugspray 0.1-dev)
end of thread
Thread overview: 9+ messages
2025-03-13 18:35 NFS Server Issues with RDMA in Kernel 6.13.6 Lucas via Bugspray Bot
2025-03-13 18:44 ` Chuck Lever
2025-03-13 19:20 ` Lucas via Bugspray Bot
2025-03-14 10:43 ` Robin Murphy
2025-03-14 20:26 ` Chuck Lever
2025-03-24 16:55 ` Lucas via Bugspray Bot
2025-03-24 18:35 ` Chuck Lever via Bugspray Bot
2025-03-26 23:00 ` Lucas via Bugspray Bot
2025-04-01 14:10 ` Chuck Lever via Bugspray Bot