From: Leon Romanovsky <leon@kernel.org>
To: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Cc: jgg@ziepe.ca, linux-rdma@vger.kernel.org,
andrew.gospodarek@broadcom.com, selvin.xavier@broadcom.com,
kalesh-anakkur.purayil@broadcom.com
Subject: Re: [PATCH rdma-next v3 4/4] RDMA/bnxt_re: Direct Verbs: Support CQ and QP verbs
Date: Tue, 11 Nov 2025 13:00:09 +0200 [thread overview]
Message-ID: <20251111110009.GN15456@unreal> (raw)
In-Reply-To: <20251110145628.290296-5-sriharsha.basavapatna@broadcom.com>
On Mon, Nov 10, 2025 at 08:26:28PM +0530, Sriharsha Basavapatna wrote:
> The following Direct Verb (DV) methods have been implemented in
> this patch.
>
> CQ Direct Verbs:
> ----------------
> - BNXT_RE_METHOD_DV_CREATE_CQ:
> Create a CQ of requested size (cqe). The application must have
> already registered this memory with the driver using DV_UMEM_REG.
> The CQ umem-handle and umem-offset are passed to the driver. The
> driver now maps/pins the CQ user memory and registers it with the
> hardware. The driver returns a CQ-handle to the application.
>
> - BNXT_RE_METHOD_DV_DESTROY_CQ:
> Destroy the DV_CQ specified by the CQ-handle; unmap the user memory.
>
> QP Direct Verbs:
> ----------------
> - BNXT_RE_METHOD_DV_CREATE_QP:
> Create a QP using specified params (struct bnxt_re_dv_create_qp_req).
> The application must have already registered SQ/RQ memory with the
> driver using DV_UMEM_REG. The SQ/RQ umem-handle and umem-offset are
> passed to the driver. The driver now maps/pins the SQ/RQ user memory
> and registers it with the hardware. The driver returns a QP-handle to
> the application.
>
> - BNXT_RE_METHOD_DV_DESTROY_QP:
> Destroy the DV_QP specified by the QP-handle; unmap SQ/RQ user memory.
>
> - BNXT_RE_METHOD_DV_MODIFY_QP:
> Modify QP attributes for the DV_QP specified by the QP-handle;
> wrapper functions have been implemented to resolve dmac/smac using
> rdma_resolve_ip().
>
> - BNXT_RE_METHOD_DV_QUERY_QP:
> Return QP attributes for the DV_QP specified by the QP-handle.
>
> Note:
> -----
> Some applications might want to allocate memory for all resources of a
> given type (CQ/QP) in one big chunk and then register that entire memory
> once using DV_UMEM_REG. At the time of creating each individual
> resource, the application would pass a specific offset/length in the
> umem registered memory.
>
> - The DV_UMEM_REG handler (previous patch) only creates a dv_umem object
> and saves user memory parameters, but doesn't really map/pin this
> memory.
> - The mapping would be done at the time of creating individual objects.
> - This actual mapping of specific umem offsets is implemented by the
> function bnxt_re_dv_umem_get(). This function validates the
> umem-offset and size parameters passed during CQ/QP creation. If the
> request is valid, it maps the specified offset/length within the umem
> registered memory.
> - The CQ and QP creation DV handlers call bnxt_re_dv_umem_get() to map
> offsets/sizes specific to each individual object. This means each
> object gets its own mapped dv_umem object that is distinct from the
> main dv_umem object created during DV_UMEM_REG.
> - The object specific dv_umem is unmapped when the object is destroyed.
>
> Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
> Co-developed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
> Co-developed-by: Selvin Xavier <selvin.xavier@broadcom.com>
> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
> ---
> drivers/infiniband/hw/bnxt_re/bnxt_re.h | 12 +-
> drivers/infiniband/hw/bnxt_re/dv.c | 1071 ++++++++++++++++++++++
> drivers/infiniband/hw/bnxt_re/ib_verbs.c | 55 +-
> drivers/infiniband/hw/bnxt_re/ib_verbs.h | 12 +
> include/uapi/rdma/bnxt_re-abi.h | 93 ++
> 5 files changed, 1227 insertions(+), 16 deletions(-)
<...>
> +enum {
> + BNXT_RE_DV_RES_TYPE_QP = 0,
There is no need to set BNXT_RE_DV_RES_TYPE_QP to 0, all enums start
from 0 in C.
> + BNXT_RE_DV_RES_TYPE_CQ,
> + BNXT_RE_DV_RES_TYPE_MAX
> +};
> +
> struct bnxt_re_dev {
> struct ib_device ibdev;
> struct list_head list;
> @@ -231,6 +237,8 @@ struct bnxt_re_dev {
> union ib_gid ugid;
> u32 ugid_index;
> u8 sniffer_flow_created : 1;
> + atomic_t dv_cq_count;
> + atomic_t dv_qp_count;
> };
>
> #define to_bnxt_re_dev(ptr, member) \
> @@ -274,6 +282,9 @@ static inline int bnxt_re_read_context_allowed(struct bnxt_re_dev *rdev)
> return 0;
> }
>
> +struct bnxt_qplib_nq *bnxt_re_get_nq(struct bnxt_re_dev *rdev);
> +void bnxt_re_put_nq(struct bnxt_re_dev *rdev, struct bnxt_qplib_nq *nq);
> +
> #define BNXT_RE_CONTEXT_TYPE_QPC_SIZE_P5 1088
> #define BNXT_RE_CONTEXT_TYPE_CQ_SIZE_P5 128
> #define BNXT_RE_CONTEXT_TYPE_MRW_SIZE_P5 128
> @@ -286,5 +297,4 @@ static inline int bnxt_re_read_context_allowed(struct bnxt_re_dev *rdev)
>
> #define BNXT_RE_HWRM_CMD_TIMEOUT(rdev) \
> ((rdev)->chip_ctx->hwrm_cmd_max_timeout * 1000)
> -
> #endif
> diff --git a/drivers/infiniband/hw/bnxt_re/dv.c b/drivers/infiniband/hw/bnxt_re/dv.c
> index 1485aa0a6818..100e18d07f51 100644
> --- a/drivers/infiniband/hw/bnxt_re/dv.c
> +++ b/drivers/infiniband/hw/bnxt_re/dv.c
> @@ -14,6 +14,7 @@
> #include <rdma/uverbs_named_ioctl.h>
> #include <rdma/ib_umem.h>
> #include <rdma/bnxt_re-abi.h>
> +#include <rdma/ib_cache.h>
>
> #include "roce_hsi.h"
> #include "qplib_res.h"
> @@ -379,6 +380,76 @@ static int bnxt_re_dv_validate_umem_attr(struct bnxt_re_dev *rdev,
> return 0;
> }
>
> +static bool bnxt_re_dv_is_valid_umem(struct bnxt_re_dv_umem *umem,
> + u64 offset, u32 size)
> +{
> + return ((offset == ALIGN(offset, PAGE_SIZE)) &&
> + (offset + size <= umem->size));
> +}
> +
> +static struct bnxt_re_dv_umem *bnxt_re_dv_umem_get(struct bnxt_re_dev *rdev,
> + struct ib_ucontext *ib_uctx,
> + struct bnxt_re_dv_umem *obj,
> + u64 umem_offset, u64 size,
> + struct bnxt_qplib_sg_info *sg)
> +{
> + struct bnxt_re_dv_umem *dv_umem;
> + struct ib_umem *umem;
> + int umem_pgs, rc;
> +
> + if (!bnxt_re_dv_is_valid_umem(obj, umem_offset, size))
> + return ERR_PTR(-EINVAL);
I wonder why do you need this check. Will it be better to rely on
ib_umem_*_get() interfaces for these checks and slightly change code
to perform dv_umem = kzalloc(...) after that?
> +
> + dv_umem = kzalloc(sizeof(*dv_umem), GFP_KERNEL);
> + if (!dv_umem)
> + return ERR_PTR(-ENOMEM);
> +
> + dv_umem->addr = obj->addr + umem_offset;
> + dv_umem->size = size;
> + dv_umem->rdev = obj->rdev;
> + dv_umem->dmabuf_fd = obj->dmabuf_fd;
> + dv_umem->access = obj->access;
> +
> + if (obj->dmabuf_fd) {
> + struct ib_umem_dmabuf *umem_dmabuf;
> +
> + umem_dmabuf = ib_umem_dmabuf_get_pinned(&rdev->ibdev, dv_umem->addr,
> + dv_umem->size, dv_umem->dmabuf_fd,
> + dv_umem->access);
> + if (IS_ERR(umem_dmabuf)) {
> + rc = PTR_ERR(umem_dmabuf);
> + goto free_umem;
> + }
> + umem = &umem_dmabuf->umem;
> + } else {
> + umem = ib_umem_get(&rdev->ibdev, (unsigned long)dv_umem->addr,
> + dv_umem->size, dv_umem->access);
> + if (IS_ERR(umem)) {
> + rc = PTR_ERR(umem);
> + goto free_umem;
> + }
> + }
> +
> + dv_umem->umem = umem;
> +
> + umem_pgs = ib_umem_num_dma_blocks(umem, PAGE_SIZE);
> + if (!umem_pgs) {
> + rc = -EINVAL;
> + goto rel_umem;
> + }
> + sg->npages = ib_umem_num_dma_blocks(umem, PAGE_SIZE);
You already calculated it, exactly 4 lines above.
> + sg->pgshft = PAGE_SHIFT;
> + sg->pgsize = PAGE_SIZE;
> + sg->umem = umem;
> + return dv_umem;
<...>
> + if (atomic_read(&rdev->stats.res.cq_count) >= dev_attr->max_cq) {
> + ibdev_err(&rdev->ibdev, "Create CQ failed - max exceeded(CQs)");
User can trigger this error and fill dmesg. If you want to keep it, move
to be debug print.
> + return NULL;
> + }
> +
> + /* Validate CQ fields */
> + if (cqe < 1 || cqe > dev_attr->max_cq_wqes) {
> + ibdev_err(&rdev->ibdev, "Create CQ failed - max exceeded(CQ_WQs)");
Same
> + return NULL;
> + }
<...>
> + atomic_inc(&rdev->stats.res.cq_count);
> + max_active_cqs = atomic_read(&rdev->stats.res.cq_count);
atomic_inc_return(&rdev->stats.res.cq_count);
> + if (max_active_cqs > rdev->stats.res.cq_watermark)
> + rdev->stats.res.cq_watermark = max_active_cqs;
> + spin_lock_init(&cq->cq_lock);
<...>
> +static int bnxt_re_dv_uverbs_copy_to(struct bnxt_re_dev *rdev,
> + struct uverbs_attr_bundle *attrs,
> + int attr, void *from, size_t size)
> +{
> + return uverbs_copy_to_struct_or_zero(attrs, attr, from, size);
> +}
> +
Please don't wrap basic functions.
> +static void bnxt_re_dv_finalize_uobj(struct ib_uobject *uobj, void *priv_obj,
> + struct uverbs_attr_bundle *attrs, int attr)
> +{
> + uobj->object = priv_obj;
> + uverbs_finalize_uobj_create(attrs, attr);
> +}
Same
> +
<...>
> + dv_umem = bnxt_re_dv_umem_get(rdev, context, sq_umem,
> + init_attr->sq_umem_offset,
> + init_attr->sq_len, sginfo);
> + if (IS_ERR(dv_umem)) {
> + rc = PTR_ERR(dv_umem);
> + ibdev_dbg(&rdev->ibdev, "%s: bnxt_re_dv_umem_get() failed, rc: %d\n",
> + __func__, rc);
Please remove __func__ prints.
> + return rc;
> + }
> + qp->sq_umem = dv_umem;
> + qp->sumem = dv_umem->umem;
<...>
> +static void
> +bnxt_re_dv_qp_init_msn(struct bnxt_re_qp *qp,
> + struct bnxt_re_dv_create_qp_req *req)
> +{
> + struct bnxt_qplib_qp *qplib_qp = &qp->qplib_qp;
> +
> + qplib_qp->is_host_msn_tbl = true;
> + qplib_qp->msn = 0;
> + qplib_qp->psn_sz = req->sq_psn_sz;
> + qplib_qp->msn_tbl_sz = req->sq_psn_sz * req->sq_npsn;
Did you check that it can't overflow?
> +}
<...>
> +static int bnxt_re_dv_free_qp(struct ib_uobject *uobj,
> + enum rdma_remove_reason why,
> + struct uverbs_attr_bundle *attrs)
> +{
> + struct bnxt_re_qp *qp = uobj->object;
> + struct bnxt_re_dev *rdev = qp->rdev;
> + struct bnxt_qplib_qp *qplib_qp = &qp->qplib_qp;
> + struct bnxt_qplib_nq *scq_nq = NULL;
> + struct bnxt_qplib_nq *rcq_nq = NULL;
> + int rc;
> +
> + mutex_lock(&rdev->qp_lock);
> + list_del(&qp->list);
> + atomic_dec_return(&rdev->stats.res.qp_count);
atomic_dec()
> + if (qp->qplib_qp.type == CMDQ_CREATE_QP_TYPE_RC)
> + atomic_dec(&rdev->stats.res.rc_qp_count);
> + mutex_unlock(&rdev->qp_lock);
<...>
> +static void bnxt_re_copyout_ah_attr(struct ib_uverbs_ah_attr *dattr,
> + struct rdma_ah_attr *sattr)
> +{
> + dattr->sl = sattr->sl;
Please don't do vertical alignment.
> + memcpy(dattr->grh.dgid, &sattr->grh.dgid, 16);
> + dattr->grh.flow_label = sattr->grh.flow_label;
> + dattr->grh.hop_limit = sattr->grh.hop_limit;
> + dattr->grh.sgid_index = sattr->grh.sgid_index;
> + dattr->grh.traffic_class = sattr->grh.traffic_class;
> +}
> +
> +static void bnxt_re_dv_copy_qp_attr_out(struct bnxt_re_dev *rdev,
> + struct ib_uverbs_qp_attr *out,
> + struct ib_qp_attr *qp_attr,
> + struct ib_qp_init_attr *qp_init_attr)
> +{
> + out->qp_state = qp_attr->qp_state;
> + out->cur_qp_state = qp_attr->cur_qp_state;
> + out->path_mtu = qp_attr->path_mtu;
> + out->path_mig_state = qp_attr->path_mig_state;
> + out->qkey = qp_attr->qkey;
> + out->rq_psn = qp_attr->rq_psn;
> + out->sq_psn = qp_attr->sq_psn;
> + out->dest_qp_num = qp_attr->dest_qp_num;
> + out->qp_access_flags = qp_attr->qp_access_flags;
> + out->max_send_wr = qp_attr->cap.max_send_wr;
> + out->max_recv_wr = qp_attr->cap.max_recv_wr;
> + out->max_send_sge = qp_attr->cap.max_send_sge;
> + out->max_recv_sge = qp_attr->cap.max_recv_sge;
> + out->max_inline_data = qp_attr->cap.max_inline_data;
> + out->pkey_index = qp_attr->pkey_index;
> + out->alt_pkey_index = qp_attr->alt_pkey_index;
> + out->en_sqd_async_notify = qp_attr->en_sqd_async_notify;
> + out->sq_draining = qp_attr->sq_draining;
> + out->max_rd_atomic = qp_attr->max_rd_atomic;
> + out->max_dest_rd_atomic = qp_attr->max_dest_rd_atomic;
> + out->min_rnr_timer = qp_attr->min_rnr_timer;
> + out->port_num = qp_attr->port_num;
> + out->timeout = qp_attr->timeout;
> + out->retry_cnt = qp_attr->retry_cnt;
> + out->rnr_retry = qp_attr->rnr_retry;
> + out->alt_port_num = qp_attr->alt_port_num;
> + out->alt_timeout = qp_attr->alt_timeout;
> +
> + bnxt_re_copyout_ah_attr(&out->ah_attr, &qp_attr->ah_attr);
> + bnxt_re_copyout_ah_attr(&out->alt_ah_attr, &qp_attr->alt_ah_attr);
> +}
Why do you need to open-code existing ib_uverbs_query_qp()? I afraid
that this in-driver prone to errors.
> +
<...>
> + uobj = uverbs_attr_get_uobject(attrs, BNXT_RE_DV_QUERY_QP_HANDLE);
You should check that uobj is not an error.
> + qp = uobj->object;
<...>
> +static int bnxt_re_dv_copy_qp_attr(struct bnxt_re_dev *rdev,
> + struct ib_qp_attr *dst,
> + struct ib_uverbs_qp_attr *src)
> +{
> + int rc;
> +
> + if (src->qp_attr_mask & IB_QP_ALT_PATH)
> + return -EINVAL;
> +
> + dst->qp_state = src->qp_state;
> + dst->cur_qp_state = src->cur_qp_state;
> + dst->path_mtu = src->path_mtu;
> + dst->path_mig_state = src->path_mig_state;
> + dst->qkey = src->qkey;
> + dst->rq_psn = src->rq_psn;
> + dst->sq_psn = src->sq_psn;
> + dst->dest_qp_num = src->dest_qp_num;
> + dst->qp_access_flags = src->qp_access_flags;
> +
> + dst->cap.max_send_wr = src->max_send_wr;
> + dst->cap.max_recv_wr = src->max_recv_wr;
> + dst->cap.max_send_sge = src->max_send_sge;
> + dst->cap.max_recv_sge = src->max_recv_sge;
> + dst->cap.max_inline_data = src->max_inline_data;
> +
> + if (src->qp_attr_mask & IB_QP_AV) {
> + rc = bnxt_re_copyin_ah_attr(&rdev->ibdev, &dst->ah_attr,
> + &src->ah_attr);
> + if (rc)
> + return rc;
> + }
> +
> + dst->pkey_index = src->pkey_index;
> + dst->alt_pkey_index = src->alt_pkey_index;
> + dst->en_sqd_async_notify = src->en_sqd_async_notify;
> + dst->sq_draining = src->sq_draining;
> + dst->max_rd_atomic = src->max_rd_atomic;
> + dst->max_dest_rd_atomic = src->max_dest_rd_atomic;
> + dst->min_rnr_timer = src->min_rnr_timer;
> + dst->port_num = src->port_num;
> + dst->timeout = src->timeout;
> + dst->retry_cnt = src->retry_cnt;
> + dst->rnr_retry = src->rnr_retry;
> + dst->alt_port_num = src->alt_port_num;
> + dst->alt_timeout = src->alt_timeout;
> +
> + return 0;
> +}
Again open-code variant of existing IB/core function.
> +
> +static int bnxt_re_dv_modify_qp(struct ib_uobject *uobj,
> + struct uverbs_attr_bundle *attrs)
> +{
I'm starting to believe that this DV QP interface is not right thing to
do. I would expect DV to be for extra parameters on top of existing
basic QP infrastructure.
Thanks
next prev parent reply other threads:[~2025-11-11 11:00 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-10 14:56 [PATCH rdma-next v3 0/4] RDMA/bnxt_re: Support direct verbs Sriharsha Basavapatna
2025-11-10 14:56 ` [PATCH rdma-next v3 1/4] RDMA/bnxt_re: Move the UAPI methods to a dedicated file Sriharsha Basavapatna
2025-11-10 14:56 ` [PATCH rdma-next v3 2/4] RDMA/bnxt_re: Refactor bnxt_qplib_create_qp() function Sriharsha Basavapatna
2025-11-10 14:56 ` [PATCH rdma-next v3 3/4] RDMA/bnxt_re: Direct Verbs: Support DBR and UMEM verbs Sriharsha Basavapatna
2025-11-10 14:56 ` [PATCH rdma-next v3 4/4] RDMA/bnxt_re: Direct Verbs: Support CQ and QP verbs Sriharsha Basavapatna
2025-11-11 11:00 ` Leon Romanovsky [this message]
2025-11-17 6:22 ` Sriharsha Basavapatna
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251111110009.GN15456@unreal \
--to=leon@kernel.org \
--cc=andrew.gospodarek@broadcom.com \
--cc=jgg@ziepe.ca \
--cc=kalesh-anakkur.purayil@broadcom.com \
--cc=linux-rdma@vger.kernel.org \
--cc=selvin.xavier@broadcom.com \
--cc=sriharsha.basavapatna@broadcom.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox