From: Chuck Lever <chuck.lever@oracle.com>
To: Robin Murphy <robin.murphy@arm.com>,
Lucas via Bugspray Bot <bugbot@kernel.org>,
jlayton@kernel.org, linux-nfs@vger.kernel.org,
iommu@lists.linux.dev, cel@kernel.org, trondmy@kernel.org,
anna@kernel.org
Subject: Re: NFS Server Issues with RDMA in Kernel 6.13.6
Date: Fri, 14 Mar 2025 16:26:55 -0400 [thread overview]
Message-ID: <86bc01b6-fb3e-42e9-ae2b-fdea7bb16420@oracle.com> (raw)
In-Reply-To: <e59f75ea-9b50-45dc-aa89-f0e02aa4e787@arm.com>
On 3/14/25 6:43 AM, Robin Murphy wrote:
> On 2025-03-13 7:20 pm, Lucas via Bugspray Bot wrote:
> [...]
>> system: Supermicro AS-4124GS-TNR
>> cpu: AMD EPYC 7H12 64-Core Processor
>> ram: 512G
>> rdma nic: Mellanox Technologies MT2910 Family [ConnectX-7]
>>
>>
>>>> [ 976.677373] __dma_map_sg_attrs+0x139/0x1b0
>>>> [ 976.677380] dma_map_sgtable+0x21/0x50
>>>
>>> So, here (and above) is where we leave the NFS server and venture into
>>> the IOMMU layer. Adding the I/O folks for additional eyes.
>>>
>>> Can you give us the output of:
>>>
>>> $ scripts/faddr2line drivers/iommu/iova.o alloc_iova+0x92
>>>
>>
>>
>> root@test:/usr/src/linux-6.13.6# scripts/faddr2line drivers/iommu/iova.o alloc_iova+0x92
>> alloc_iova+0x92/0x290:
>> __alloc_and_insert_iova_range at /usr/src/linux-6.13.6/drivers/iommu/iova.c:180
>> (inlined by) alloc_iova at /usr/src/linux-6.13.6/drivers/iommu/iova.c:263
>> root@test:/usr/src/linux-6.13.6#
>
> OK so this is waiting for iova_rbtree_lock to get into the allocation
> slowpath since there was nothing suitable in the IOVA caches. Said
> slowpath under the lock is unfortunately prone to being quite slow,
> especially as the rbtree fills up with massive numbers of relatively
> small allocations (which I'm guessing I/O with a 4KB block size would
> tend towards). If you have 256 threads all contending the same path then
> they could certainly end up waiting a while, although they shouldn't be
> *permanently* stuck...

The reported PID is different on every stack dump, so this doesn't look
like a permanent stall for any of the nfsd threads.

But is there a way that NFSD can reduce the amount of IOVA fragmentation
it causes? I wouldn't expect a similar multi-threaded 4KB I/O workload
on a local disk to result in the same kind of stalling behavior.

I also note that the stack trace is the same for each occurrence:
[ 1047.817528] alloc_iova+0x92/0x290
[ 1047.817534] ? __alloc_pages_noprof+0x191/0x1280
[ 1047.817542] ? current_time+0x2d/0x120
[ 1047.817548] alloc_iova_fast+0x1fb/0x400
[ 1047.817554] iommu_dma_alloc_iova+0xa2/0x190
[ 1047.817559] iommu_dma_map_sg+0x447/0x4e0
[ 1047.817566] __dma_map_sg_attrs+0x139/0x1b0
[ 1047.817572] dma_map_sgtable+0x21/0x50
[ 1047.817578] rdma_rw_ctx_init+0x6c/0x820 [ib_core]
[ 1047.817720] ? srso_return_thunk+0x5/0x5f
[ 1047.817729] svc_rdma_rw_ctx_init+0x49/0xf0 [rpcrdma]
[ 1047.817757] svc_rdma_build_writes+0xa5/0x210 [rpcrdma]
[ 1047.817774] ? __pfx_svc_rdma_pagelist_to_sg+0x10/0x10 [rpcrdma]
[ 1047.817791] ? svc_rdma_send_write_list+0xf4/0x290 [rpcrdma]
[ 1047.817810] svc_rdma_xb_write+0x7d/0xb0 [rpcrdma]
[ 1047.817828] svc_rdma_send_write_list+0x144/0x290 [rpcrdma]

svc_rdma_send_write_list() appears in all of these. This function
assembles an NFS READ response that will use an RDMA Write to convey
the I/O payload to the NFS client.
--
Chuck Lever

Thread overview: 9+ messages
2025-03-13 18:35 NFS Server Issues with RDMA in Kernel 6.13.6 Lucas via Bugspray Bot
2025-03-13 18:44 ` Chuck Lever
2025-03-13 19:20 ` Lucas via Bugspray Bot
2025-03-14 10:43 ` Robin Murphy
2025-03-14 20:26 ` Chuck Lever [this message]
2025-03-24 16:55 ` Lucas via Bugspray Bot
2025-03-24 18:35 ` Chuck Lever via Bugspray Bot
2025-03-26 23:00 ` Lucas via Bugspray Bot
2025-04-01 14:10 ` Chuck Lever via Bugspray Bot