From: Chuck Lever <cel@kernel.org>
To: Tom Talpey <tom@talpey.com>,
linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
Cc: Chuck Lever <chuck.lever@oracle.com>
Subject: Re: [RFC PATCH v2] svcrdma: Introduce Receive buffer arenas
Date: Tue, 19 Aug 2025 13:16:00 -0400
Message-ID: <ee20aca3-8c32-48f4-8a90-5e4cd4e05aab@kernel.org>
In-Reply-To: <383ea66a-8b0e-4b90-98c7-69a737c23f82@talpey.com>

On 8/19/25 12:58 PM, Tom Talpey wrote:
> On 8/19/2025 12:08 PM, Chuck Lever wrote:
>> On 8/19/25 12:03 PM, Tom Talpey wrote:
>>> On 8/11/2025 4:35 PM, Chuck Lever wrote:
>>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>>
>>>> Reduce the per-connection footprint in the host's and RNIC's memory
>>>> management TLBs by combining groups of a connection's Receive
>>>> buffers into fewer IOVAs.
>>>
>>> This is an interesting and potentially useful approach. Keeping
>>> the iova count (==1) reduces the size of work requests and greatly
>>> simplifies processing.
>>>
>>> But how large are the iova's currently? RPCRDMA_DEF_INLINE_THRESH
>>> is just 4096, which would mean typically <= 2 iova's. The max is
>>> arbitrarily but consistently 64KB, is this complexity worth it?
>>
>> The pool's shard size is RPCRDMA_MAX_INLINE_THRESH, or 64KB. That's the
>> largest inline threshold this implementation allows.
>>
>> The default inline threshold is 4KB, so one shard can hold up to
>> sixteen 4KB Receive buffers. The default credit limit is 64, plus 8
>> batch overflow, so 72 Receive buffers total per connection.
>>
>>
>>> And, allocating large contiguous buffers would seem to shift the
>>> burden to kmalloc and/or the IOMMU, so it's not free, right?
>>
>> Can you elaborate on what you mean by "burden" ?
>
> Sure, it's that somebody has to manage the iova scatter/gather
> segments.
>
> Using kmalloc or its moral equivalent offers a contract that the
> memory returned is physically contiguous, 1 segment. That's
> gonna scale badly.

I'm still not sure what's not going to scale. We're already using
kmalloc today, one per Receive buffer. I'm making it one kmalloc per
shard (which can contain more than a dozen Receive buffers).
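
As a rough illustration of the numbers quoted above (this is not code from the patch), here is a minimal C sketch of the arithmetic. It assumes each shard is filled completely before another one is allocated, which the thread does not state. All macro and function names are hypothetical; only the 64KB shard size, the 4KB default inline threshold, the credit limit of 64, and the batch overflow of 8 come from this thread.

```c
#include <assert.h>

/* Values quoted in this thread; the names are hypothetical. */
#define SKETCH_SHARD_SIZE      (64u * 1024u)  /* RPCRDMA_MAX_INLINE_THRESH */
#define SKETCH_INLINE_THRESH   (4u * 1024u)   /* default inline threshold */
#define SKETCH_CREDIT_LIMIT    64u            /* default credit limit */
#define SKETCH_BATCH_OVERFLOW  8u             /* extra Receives for batching */

/* How many Receive buffers of size @thresh fit in one shard. */
static inline unsigned int sketch_bufs_per_shard(unsigned int shard_size,
						 unsigned int thresh)
{
	assert(thresh != 0 && thresh <= shard_size);
	return shard_size / thresh;
}

/*
 * How many shards (one allocation each) are needed for @nbufs buffers,
 * assuming every shard is packed full before the next is allocated.
 */
static inline unsigned int sketch_shards_needed(unsigned int nbufs,
						unsigned int per_shard)
{
	assert(per_shard != 0);
	return (nbufs + per_shard - 1) / per_shard;  /* round up */
}
```

With the defaults, sketch_bufs_per_shard(65536, 4096) gives 16 and sketch_shards_needed(72, 16) gives 5, so under that packing assumption a connection would need five shard allocations where today it needs 72 per-buffer allocations.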

> Using the IOMMU, when available, stuffs the s/g list into its
> hardware. Simple at the verb layer (again 1 segment) but uses
> the shared hardware resource to provide it.
>
> Another approach might be to use fast-register for the receive
> buffers, instead of ib_register_mr on the privileged lmr. This
> would be a page list with first-byte-offset and length, which
> would put it in the adapter's TPT instead of the PCI-facing IOMMU.
> The fmr's would be registered only once, unlike the fmr's used for
> remote transfers, so the cost would remain low. And fmr's typically
> support 16 segments minimum, so no restriction there.

I can experiment with fast registration. The goal of this work is to
reduce the per-connection hardware footprint.
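
To make the "page list with first-byte-offset and length" description concrete, here is a small userspace C sketch of that arithmetic. It is not verbs or kernel code and does not perform any registration. The struct, the function, and the 4KB page size are assumptions made only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */

/* Hypothetical description of one buffer as a page list. */
struct sketch_page_list {
	size_t first_byte_offset;  /* offset of the buffer in its first page */
	size_t length;             /* buffer length in bytes */
	size_t npages;             /* number of pages the buffer touches */
};

/* Describe the buffer [addr, addr + length) as a page list. */
static inline struct sketch_page_list sketch_describe(uintptr_t addr,
						      size_t length)
{
	struct sketch_page_list pl;

	assert(length > 0);
	pl.first_byte_offset = (size_t)(addr & (SKETCH_PAGE_SIZE - 1));
	pl.length = length;
	/* Round up so a partially covered last page is still counted. */
	pl.npages = (pl.first_byte_offset + length + SKETCH_PAGE_SIZE - 1) /
		    SKETCH_PAGE_SIZE;
	return pl;
}
```

Under these assumptions, a page-aligned 64KB buffer is described by 16 pages with a first-byte offset of zero, and a page-aligned 4KB Receive buffer by a single page.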

> My point is that it seems unnecessary somehow in the RPCRDMA
> layer.

Well, if this effort is intriguing to others, it can certainly be moved
into the RDMA core. I already intend to convert the RPC/RDMA client
Receive code to use it too.

> But, that's just my intuition. Finding some way to measure
> any benefit (performance, setup overhead, scalability, ...) would
> certainly be useful.

That is a primary purpose of me posting this RFC. As stated in the patch
description, I would like some help quantifying the improvement (if
there is any).

--
Chuck Lever