Linux NFS development
From: Chuck Lever <chuck.lever@oracle.com>
To: Robin Murphy <robin.murphy@arm.com>,
	Lucas via Bugspray Bot <bugbot@kernel.org>,
	jlayton@kernel.org, linux-nfs@vger.kernel.org,
	iommu@lists.linux.dev, cel@kernel.org, trondmy@kernel.org,
	anna@kernel.org
Subject: Re: NFS Server Issues with RDMA in Kernel 6.13.6
Date: Fri, 14 Mar 2025 16:26:55 -0400
Message-ID: <86bc01b6-fb3e-42e9-ae2b-fdea7bb16420@oracle.com>
In-Reply-To: <e59f75ea-9b50-45dc-aa89-f0e02aa4e787@arm.com>

On 3/14/25 6:43 AM, Robin Murphy wrote:
> On 2025-03-13 7:20 pm, Lucas via Bugspray Bot wrote:
> [...]
>> system: Supermicro AS-4124GS-TNR
>> cpu: AMD EPYC 7H12 64-Core Processor
>> ram: 512G
>> rdma nic: Mellanox Technologies MT2910 Family [ConnectX-7]
>>
>>
>>>> [  976.677373]  __dma_map_sg_attrs+0x139/0x1b0
>>>> [  976.677380]  dma_map_sgtable+0x21/0x50
>>>
>>> So, here (and above) is where we leave the NFS server and venture into
>>> the IOMMU layer. Adding the I/O folks for additional eyes.
>>>
>>> Can you give us the output of:
>>>
>>>    $ scripts/faddr2line drivers/iommu/iova.o alloc_iova+0x92
>>>
>>
>>
>> root@test:/usr/src/linux-6.13.6# scripts/faddr2line drivers/iommu/iova.o alloc_iova+0x92
>> alloc_iova+0x92/0x290:
>> __alloc_and_insert_iova_range at /usr/src/linux-6.13.6/drivers/iommu/iova.c:180
>> (inlined by) alloc_iova at /usr/src/linux-6.13.6/drivers/iommu/iova.c:263
>> root@test:/usr/src/linux-6.13.6#
> 
> OK so this is waiting for iova_rbtree_lock to get into the allocation
> slowpath since there was nothing suitable in the IOVA caches. Said
> slowpath under the lock is unfortunately prone to being quite slow,
> especially as the rbtree fills up with massive numbers of relatively
> small allocations (which I'm guessing I/O with a 4KB block size would
> tend towards). If you have 256 threads all contending the same path then
> they could certainly end up waiting a while, although they shouldn't be
> *permanently* stuck...
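
For anyone who doesn't have iova.c open: the split Robin describes
looks roughly like this (hand-condensed from drivers/iommu/iova.c as
of 6.13; simplified from memory, not verbatim, and the retry after
flushing the rcaches on failure is elided):

unsigned long alloc_iova_fast(struct iova_domain *iovad, unsigned long size,
			      unsigned long limit_pfn, bool flush_rcache)
{
	unsigned long iova_pfn;
	struct iova *new_iova;

	/* Cacheable sizes are rounded up to a power of two so that a
	 * later free can go back into the per-CPU caches. */
	if (size < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
		size = roundup_pow_of_two(size);

	/* Fast path: per-CPU magazine caches, no global lock taken. */
	iova_pfn = iova_rcache_get(iovad, size, limit_pfn + 1);
	if (iova_pfn)
		return iova_pfn;

	/* Slow path: alloc_iova() takes iova_rbtree_lock and walks the
	 * rbtree. This is where the nfsd threads in the traces are
	 * queued up. */
	new_iova = alloc_iova(iovad, size, limit_pfn, true);
	if (!new_iova)
		return 0;
	return new_iova->pfn_lo;
}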

The reported PID is different on every stack dump, so this doesn't look
like a permanent stall for any of the nfsd threads.

But is there a way that NFSD can reduce the amount of IOVA fragmentation
it causes? I wouldn't think that a similar multi-threaded 4KB I/O
workload on a local disk would result in the same kind of stalling
behavior.

I also note that the stack trace is the same for each occurrence:

[ 1047.817528]  alloc_iova+0x92/0x290
[ 1047.817534]  ? __alloc_pages_noprof+0x191/0x1280
[ 1047.817542]  ? current_time+0x2d/0x120
[ 1047.817548]  alloc_iova_fast+0x1fb/0x400
[ 1047.817554]  iommu_dma_alloc_iova+0xa2/0x190
[ 1047.817559]  iommu_dma_map_sg+0x447/0x4e0
[ 1047.817566]  __dma_map_sg_attrs+0x139/0x1b0
[ 1047.817572]  dma_map_sgtable+0x21/0x50
[ 1047.817578]  rdma_rw_ctx_init+0x6c/0x820 [ib_core]
[ 1047.817720]  ? srso_return_thunk+0x5/0x5f
[ 1047.817729]  svc_rdma_rw_ctx_init+0x49/0xf0 [rpcrdma]
[ 1047.817757]  svc_rdma_build_writes+0xa5/0x210 [rpcrdma]
[ 1047.817774]  ? __pfx_svc_rdma_pagelist_to_sg+0x10/0x10 [rpcrdma]
[ 1047.817791]  ? svc_rdma_send_write_list+0xf4/0x290 [rpcrdma]
[ 1047.817810]  svc_rdma_xb_write+0x7d/0xb0 [rpcrdma]
[ 1047.817828]  svc_rdma_send_write_list+0x144/0x290 [rpcrdma]

svc_rdma_send_write_list() appears in all of these.

This function assembles an NFS READ response that will use an RDMA Write
to convey the I/O payload to the NFS client.
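
For context, the DMA mapping happens once per Write chunk segment.
Condensed from net/sunrpc/xprtrdma/svc_rdma_rw.c (again from memory,
not verbatim), svc_rdma_rw_ctx_init() boils down to:

	/* One rdma_rw context per Write chunk segment. Each init call
	 * DMA-maps the segment's scatterlist; with the IOMMU enabled,
	 * that descends through dma_map_sgtable() into alloc_iova()
	 * as seen in the traces above. */
	ret = rdma_rw_ctx_init(&ctxt->rw_ctx, rdma->sc_qp,
			       rdma->sc_port_num,
			       ctxt->rw_sg_table.sgl, ctxt->rw_nents,
			       0, offset, handle, direction);

With a 4KB block size each scatterlist is built from single pages, so
the IOVA space sees a very large number of small, short-lived
allocations, consistent with Robin's guess above.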


-- 
Chuck Lever


Thread overview: 9+ messages
2025-03-13 18:35 NFS Server Issues with RDMA in Kernel 6.13.6 Lucas via Bugspray Bot
2025-03-13 18:44 ` Chuck Lever
2025-03-13 19:20   ` Lucas via Bugspray Bot
2025-03-14 10:43     ` Robin Murphy
2025-03-14 20:26       ` Chuck Lever [this message]
2025-03-24 16:55 ` Lucas via Bugspray Bot
2025-03-24 18:35 ` Chuck Lever via Bugspray Bot
2025-03-26 23:00 ` Lucas via Bugspray Bot
2025-04-01 14:10 ` Chuck Lever via Bugspray Bot
