From: Jiri Pirko <jiri@resnulli.us>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma@vger.kernel.org, leon@kernel.org, mrgolin@amazon.com,
gal.pressman@linux.dev, sleybo@amazon.com, parav@nvidia.com,
mbloch@nvidia.com, yanjun.zhu@linux.dev,
marco.crivellari@suse.com, roman.gushchin@linux.dev,
phaddad@nvidia.com, lirongqing@baidu.com, ynachum@amazon.com,
huangjunxian6@hisilicon.com,
kalesh-anakkur.purayil@broadcom.com, ohartoov@nvidia.com,
michaelgur@nvidia.com, shayd@nvidia.com, edwards@nvidia.com,
sriharsha.basavapatna@broadcom.com,
andrew.gospodarek@broadcom.com, selvin.xavier@broadcom.com
Subject: Re: [PATCH rdma-next v3 03/17] RDMA/core: Introduce generic buffer descriptor infrastructure for umem
Date: Thu, 14 May 2026 11:02:45 +0200 [thread overview]
Message-ID: <agWOldIWkFI3i1xB@FV6GYCPJ69> (raw)
In-Reply-To: <20260513233447.GU7702@ziepe.ca>
Thu, May 14, 2026 at 01:34:47AM +0200, jgg@ziepe.ca wrote:
>On Wed, May 13, 2026 at 09:18:28PM +0200, Jiri Pirko wrote:
>> Tue, May 12, 2026 at 08:12:36PM +0200, jgg@ziepe.ca wrote:
>> >On Wed, May 06, 2026 at 04:14:34PM +0200, Jiri Pirko wrote:
>> >> Wed, May 06, 2026 at 03:37:57PM +0200, jgg@ziepe.ca wrote:
>> >> >On Mon, May 04, 2026 at 03:57:17PM +0200, Jiri Pirko wrote:
>> >> >
>> >> >> +/**
>> >> >> + * ib_umem_get - Canonical on-demand umem getter.
>> >> >> + * @device: IB device.
>> >> >> + * @udata: uverbs udata bundle (may be NULL).
>> >> >> + * @attr_id: per-command UMEM attribute id; consulted if @udata is set.
>> >> >> + * @legacy_filler: optional command-specific legacy attr filler.
>> >> >> + * invoked if @udata is set.
>> >> >> + * @va_fallback: if true, build a VA-typed desc with @addr.
>> >> >> + * @addr: user VA, used if @va_fallback is true.
>> >> >> + * @size: driver-required minimum length.
>> >> >> + * @access: IB access flags forwarded to ib_umem_get_desc().
>> >> >> + *
>> >> >> + * Return: valid umem on success, ERR_PTR(...) on error, NULL
>> >> >> + * if no source produced a buffer (only possible when @va_fallback is false).
>> >> >> + */
>> >> >> +struct ib_umem *ib_umem_get(struct ib_device *device, struct ib_udata *udata,
>> >> >> + u16 attr_id,
>> >> >> + ib_umem_buf_desc_filler_t legacy_filler,
>> >> >> + bool va_fallback, u64 addr, size_t size, int access)
>> >> >
>> >> >I didn't try to look at what the drivers actually do, but I'm slightly
>> >> >surprised not to see an addr_size here? Is it the case the drivers
>> >> >don't have have a uhw->size to go along with their uhw->va?
>> >>
>> >> "size_t size". What am I missing?
>> >
>> >size is the minimum length, not the actual length passed into the
>> >system call
>>
>> Right, you're correct. Size is the post-pin minimum, and on
>> the VA fallback path it serves also as the pin length (we synthesize
>> desc.length = size when no attr/legacy filler matched).
>>
>> There's no addr_size because no caller has a user-passed
>> length distinct from the driver-required minimum.
>> Drivers either have no length in their legacy ucmd (mlx4/mlx5 CQ, mlx5 QP,
>> the size is driver-computed from entries*cqe_size etc.) or they
>> pass ucmd.buf_size which serves as both (vmw_pvrdma, qedr, mlx5
>> SRQ, ...).
>
>This is what I was wondering about, so how does it work when there is
>a ucmd.buf_size and also a minimum size computed from
>entries*cqe_size? What does the driver write?
>
>I think the driver has to pass in both the minimum size computed from
>entries*cqe_size and also the uhw exact size? The minimum size is used
>if the uhw path isn't used to check the other attribute while the uhw
>exact size is used to pin the memory if the uhw path is used - and it
>should also be checked against the minimum?
I asked Claude to check every caller: At the moment of conversion
none of them would have addr_size != min_size. They split into
three groups:
1) Driver-computed total only (no user-passed length distinct from
min). addr_size == min_size by construction:
mlx4 CQ/QP/SRQ entries * cqe_size, qp->buf_size, ...
mlx5 CQ/QP/SRQ entries * cqe_size, rwq->buf_size, ...
bnxt_re QP/SRQ/CQ-resize max_wqe * wqe_size, entries * sizeof
mana CQ cq->cqe * COMP_ENTRY_SIZE
hns_roce MTR mtr_bufs_size(buf_attr)
qedr SRQ producer pair sizeof(struct rdma_srq_producers)
all DBR helpers PAGE_SIZE
2) MR registration / opaque user umem. The user-passed length *is*
the request; there's no separate driver minimum. addr_size ==
min_size by definition:
reg_user_mr in every driver
mlx5 devx user umem
3) User-passed length without a driver minimum cross-check today:
vmw_pvrdma CQ/QP/SRQ derivable min (entries*sizeof, wqe_*),
not computed in the user path
qedr CQ/QP/SRQ ureq.size, no min computed in wrapper
mana WQ, QP raw_sq,
QP RC queues ucmd.{wq,sq}_buf_size, queue_size[],
no derivable min
ionic CQ, QP sq/rq req_cq->size, sq->size, rq->size,
no derivable min
For completeness, the two drivers that don't go through ib_umem use
the same single-length shape:
hfi1 hfi1_acquire_user_pages(vaddr, npages, ...) — thin wrapper
around pin_user_pages_fast(); no separate driver-min check.
usnic usnic_uiom_get_pages(addr, size, ...) — single size, same
shape as ib_umem_get_va().
So across the entire RDMA tree (umem and non-umem) every user-memory
pin helper currently takes a single length, and no caller
distinguishes a user-passed addr_size from a driver-computed min_size.
Group 3 is where addr_size != min_size could become real, but only
after each driver is independently tightened. Until then the split
just makes every caller pass the same value twice and would leave
ib_umem_get() the only helper in the subsystem that enforces the
distinction structurally.
Given that, I'd prefer to keep the single size argument for now and
spell the contract out in the kdoc:
@size: minimum required umem length, validated post-pin against any
descriptor produced via @attr_id / @legacy_filler; also used
as the pin length on the VA fallback path. Callers that have
a distinct user-passed length must validate it against their
driver minimum before calling.
If/when a driver actually needs distinct values, splitting into
addr_size + min_size is mechanical.
Would that work for you?
next prev parent reply other threads:[~2026-05-14 9:02 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-04 13:57 [PATCH rdma-next v3 00/17] RDMA: Introduce generic buffer descriptor infrastructure for umem Jiri Pirko
2026-05-04 13:57 ` [PATCH rdma-next v3 01/17] RDMA/umem: Rename ib_umem_get() to ib_umem_get_va() Jiri Pirko
2026-05-04 13:57 ` [PATCH rdma-next v3 02/17] RDMA/umem: Split ib_umem_get_va() into a thin wrapper around __ib_umem_get_va() Jiri Pirko
2026-05-04 13:57 ` [PATCH rdma-next v3 03/17] RDMA/core: Introduce generic buffer descriptor infrastructure for umem Jiri Pirko
2026-05-06 13:37 ` Jason Gunthorpe
2026-05-06 14:14 ` Jiri Pirko
2026-05-12 18:12 ` Jason Gunthorpe
2026-05-13 19:18 ` Jiri Pirko
2026-05-13 23:34 ` Jason Gunthorpe
2026-05-14 9:02 ` Jiri Pirko [this message]
2026-05-14 12:14 ` Jason Gunthorpe
2026-05-04 13:57 ` [PATCH rdma-next v3 04/17] RDMA/umem: Route ib_umem_get_va() through ib_umem_get() Jiri Pirko
2026-05-04 13:57 ` [PATCH rdma-next v3 05/17] RDMA/uverbs: Inline _uverbs_get_const_{signed,unsigned}() Jiri Pirko
2026-05-04 13:57 ` [PATCH rdma-next v3 06/17] RDMA/uverbs: Push out CQ buffer umem processing into a helper Jiri Pirko
2026-05-04 13:57 ` [PATCH rdma-next v3 07/17] RDMA/uverbs: Add CQ buffer UMEM attribute and driver helpers Jiri Pirko
2026-05-06 13:46 ` Jason Gunthorpe
2026-05-06 14:27 ` Jiri Pirko
2026-05-04 13:57 ` [PATCH rdma-next v3 08/17] RDMA/efa: Use ib_umem_get_cq_buf() for user CQ buffer Jiri Pirko
2026-05-06 13:51 ` Jason Gunthorpe
2026-05-06 14:32 ` Jiri Pirko
2026-05-12 18:18 ` Jason Gunthorpe
2026-05-04 13:57 ` [PATCH rdma-next v3 09/17] RDMA/mlx5: Use ib_umem_get_cq_buf_or_va() " Jiri Pirko
2026-05-04 13:57 ` [PATCH rdma-next v3 10/17] RDMA/bnxt_re: " Jiri Pirko
2026-05-04 13:57 ` [PATCH rdma-next v3 11/17] RDMA/mlx4: Use ib_umem_get_cq_buf() " Jiri Pirko
2026-05-04 13:57 ` [PATCH rdma-next v3 12/17] RDMA/uverbs: Remove legacy umem field from struct ib_cq Jiri Pirko
2026-05-04 13:57 ` [PATCH rdma-next v3 13/17] RDMA/uverbs: Use UMEM attributes for QP creation Jiri Pirko
2026-05-06 13:43 ` Jason Gunthorpe
2026-05-06 14:17 ` Jiri Pirko
2026-05-04 13:57 ` [PATCH rdma-next v3 14/17] RDMA/mlx5: Use UMEM attributes for QP buffers in create_qp Jiri Pirko
2026-05-04 13:57 ` [PATCH rdma-next v3 15/17] RDMA/mlx5: Use UMEM attribute for CQ doorbell record Jiri Pirko
2026-05-04 13:57 ` [PATCH rdma-next v3 16/17] RDMA/mlx5: Use UMEM attribute for QP " Jiri Pirko
2026-05-04 13:57 ` [PATCH rdma-next v3 17/17] RDMA/uverbs: Track attr consumption and warn on unused attrs Jiri Pirko
2026-05-06 13:56 ` Jason Gunthorpe
2026-05-06 14:22 ` Jiri Pirko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=agWOldIWkFI3i1xB@FV6GYCPJ69 \
--to=jiri@resnulli.us \
--cc=andrew.gospodarek@broadcom.com \
--cc=edwards@nvidia.com \
--cc=gal.pressman@linux.dev \
--cc=huangjunxian6@hisilicon.com \
--cc=jgg@ziepe.ca \
--cc=kalesh-anakkur.purayil@broadcom.com \
--cc=leon@kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=lirongqing@baidu.com \
--cc=marco.crivellari@suse.com \
--cc=mbloch@nvidia.com \
--cc=michaelgur@nvidia.com \
--cc=mrgolin@amazon.com \
--cc=ohartoov@nvidia.com \
--cc=parav@nvidia.com \
--cc=phaddad@nvidia.com \
--cc=roman.gushchin@linux.dev \
--cc=selvin.xavier@broadcom.com \
--cc=shayd@nvidia.com \
--cc=sleybo@amazon.com \
--cc=sriharsha.basavapatna@broadcom.com \
--cc=yanjun.zhu@linux.dev \
--cc=ynachum@amazon.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox