public inbox for linux-rdma@vger.kernel.org
* [PATCH rdma-next 00/10] RDMA: Extend uverbs umem support for QP buffers and doorbell records
@ 2026-02-03  8:49 Jiri Pirko
  2026-02-03  8:49 ` [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem Jiri Pirko
                   ` (10 more replies)
  0 siblings, 11 replies; 34+ messages in thread
From: Jiri Pirko @ 2026-02-03  8:49 UTC (permalink / raw)
  To: linux-rdma
  Cc: jgg, leon, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

From: Jiri Pirko <jiri@nvidia.com>

This patchset extends the existing CQ umem buffer infrastructure to QP
creation and adds doorbell record (DBR) umem support for both CQ and QP.

The kernel already supports CQ creation with externally provided umem
buffers. This patchset:

1. Adds umem reference counting to simplify memory lifecycle management.
2. Adds mlx5 driver implementation for the existing CQ buffer umem support.
3. Extends QP creation to support external SQ/RQ buffer umems.
4. Adds doorbell record umem support for both CQ and QP.
5. Implements all extensions in mlx5 driver.

This enables integration with dmabuf exporters for specialized memory
types, such as GPU memory for GPU-direct RDMA scenarios.

Patch #1 introduces reference counting for ib_umem objects to simplify
memory lifecycle management when umem buffers are shared between the
core layer and device drivers.
Patch #2 adopts the new ib_umem refcounting model for create_cq_umem.
Patch #3 implements mlx5 driver support for the existing CQ buffer umem
interface.
Patch #4 factors out common buffer umem parsing into a shared helper.
Patch #5 adds core framework for QP buffer umem support with new uverbs
attributes for SQ/RQ buffers.
Patch #6 implements mlx5 driver support for QP buffer umem extension.
Patch #7 extends create_cq_umem device op with DBR umem parameter and
adds new uverbs attributes for CQ DBR.
Patch #8 implements mlx5 driver support for CQ DBR umem and extends the
doorbell mapping infrastructure.
Patch #9 extends create_qp_umem device op with DBR umem parameters and
adds new uverbs attributes for SQ/RQ DBR.
Patch #10 implements mlx5 driver support for QP DBR umem.

Jiri Pirko (10):
  RDMA/umem: Add reference counting to ib_umem
  RDMA/uverbs: Use umem refcounting for CQ creation with external buffer
  RDMA/mlx5: Add support for CQ creation with external umem buffer
  RDMA/uverbs: Factor out common buffer umem parsing into helper
  RDMA/core: Add support for QP buffer umem in QP creation
  RDMA/mlx5: Add support for QP creation with external umem buffers
  RDMA/uverbs: Add doorbell record umem support to CQ creation
  RDMA/mlx5: Add external doorbell record umem support for CQ
  RDMA/uverbs: Add doorbell record umem support to QP creation
  RDMA/mlx5: Add external doorbell record umem support for QP

 drivers/infiniband/core/core_priv.h           |   9 ++
 drivers/infiniband/core/device.c              |   1 +
 drivers/infiniband/core/umem.c                |   5 +
 drivers/infiniband/core/umem_dmabuf.c         |   1 +
 drivers/infiniband/core/umem_odp.c            |   3 +
 drivers/infiniband/core/uverbs_ioctl.c        |  68 +++++++++++
 drivers/infiniband/core/uverbs_std_types_cq.c | 104 +++++++---------
 drivers/infiniband/core/uverbs_std_types_qp.c | 114 +++++++++++++++++-
 drivers/infiniband/core/verbs.c               |  72 +++++++++--
 drivers/infiniband/hw/efa/efa.h               |   3 +-
 drivers/infiniband/hw/efa/efa_verbs.c         |  10 +-
 drivers/infiniband/hw/mlx5/cq.c               |  47 ++++++--
 drivers/infiniband/hw/mlx5/doorbell.c         |  27 ++++-
 drivers/infiniband/hw/mlx5/main.c             |   2 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |   9 +-
 drivers/infiniband/hw/mlx5/qp.c               |  93 +++++++++++---
 drivers/infiniband/hw/mlx5/srq.c              |   2 +-
 include/rdma/ib_umem.h                        |   9 ++
 include/rdma/ib_verbs.h                       |   7 ++
 include/rdma/uverbs_ioctl.h                   |   7 ++
 include/uapi/rdma/ib_user_ioctl_cmds.h        |  20 +++
 21 files changed, 500 insertions(+), 113 deletions(-)

-- 
2.51.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem
  2026-02-03  8:49 [PATCH rdma-next 00/10] RDMA: Extend uverbs umem support for QP buffers and doorbell records Jiri Pirko
@ 2026-02-03  8:49 ` Jiri Pirko
  2026-02-03 10:03   ` Leon Romanovsky
  2026-02-03 14:51   ` Jason Gunthorpe
  2026-02-03  8:49 ` [PATCH rdma-next 02/10] RDMA/uverbs: Use umem refcounting for CQ creation with external buffer Jiri Pirko
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 34+ messages in thread
From: Jiri Pirko @ 2026-02-03  8:49 UTC (permalink / raw)
  To: linux-rdma
  Cc: jgg, leon, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

From: Jiri Pirko <jiri@nvidia.com>

Introduce reference counting for ib_umem objects to simplify memory
lifecycle management when umem buffers are shared between the core
layer and device drivers.

When the core RDMA layer allocates an ib_umem and passes it to a driver
(e.g., for CQ or QP creation with external buffers), both layers need
to manage the umem lifecycle. Without reference counting, this requires
complex conditional release logic to avoid double-frees on error paths
and leaks on success paths.

With reference counting:
- Core allocates umem with refcount=1
- Driver calls ib_umem_get_ref() to take a reference
- Both layers can unconditionally call ib_umem_release()
- The umem is freed only when the last reference is dropped
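
The ownership pattern above can be sketched as a tiny standalone C harness. This is illustrative only: the fake_umem_* names are made-up stand-ins for ib_umem and refcount_t, with a plain int counter replacing the kernel's refcount API.

```c
#include <stdlib.h>

/* Illustrative stand-in for struct ib_umem with the new refcount field. */
struct fake_umem {
	int refcount;
};

static struct fake_umem *fake_umem_get(void)
{
	struct fake_umem *u = malloc(sizeof(*u));

	if (u)
		u->refcount = 1;	/* core allocates with refcount=1 */
	return u;
}

static void fake_umem_get_ref(struct fake_umem *u)
{
	u->refcount++;			/* driver takes its own reference */
}

/* Returns 1 when this release freed the object, 0 otherwise. */
static int fake_umem_release(struct fake_umem *u)
{
	if (--u->refcount)
		return 0;
	free(u);
	return 1;
}

/* Lifecycle: the first release keeps the object alive, the last frees it. */
int demo_umem_lifecycle(void)
{
	struct fake_umem *u = fake_umem_get();
	int first, second;

	if (!u)
		return -1;
	fake_umem_get_ref(u);		/* refcount: 1 -> 2 */
	first = fake_umem_release(u);	/* 2 -> 1, still alive */
	second = fake_umem_release(u);	/* 1 -> 0, freed here */
	return !first && second;
}
```

Both the core and the driver call the release unconditionally; neither needs to know whether the other still holds a reference.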

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 drivers/infiniband/core/umem.c        | 5 +++++
 drivers/infiniband/core/umem_dmabuf.c | 1 +
 drivers/infiniband/core/umem_odp.c    | 3 +++
 include/rdma/ib_umem.h                | 9 +++++++++
 4 files changed, 18 insertions(+)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 8137031c2a65..09ce694d66ea 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -192,6 +192,7 @@ struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
 	umem = kzalloc(sizeof(*umem), GFP_KERNEL);
 	if (!umem)
 		return ERR_PTR(-ENOMEM);
+	refcount_set(&umem->refcount, 1);
 	umem->ibdev      = device;
 	umem->length     = size;
 	umem->address    = addr;
@@ -280,11 +281,15 @@ EXPORT_SYMBOL(ib_umem_get);
 /**
  * ib_umem_release - release memory pinned with ib_umem_get
  * @umem: umem struct to release
+ *
+ * Decrement the reference count and free the umem when it reaches zero.
  */
 void ib_umem_release(struct ib_umem *umem)
 {
 	if (!umem)
 		return;
+	if (!refcount_dec_and_test(&umem->refcount))
+		return;
 	if (umem->is_dmabuf)
 		return ib_umem_dmabuf_release(to_ib_umem_dmabuf(umem));
 	if (umem->is_odp)
diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c
index 939da49b0dcc..5c5286092fca 100644
--- a/drivers/infiniband/core/umem_dmabuf.c
+++ b/drivers/infiniband/core/umem_dmabuf.c
@@ -143,6 +143,7 @@ ib_umem_dmabuf_get_with_dma_device(struct ib_device *device,
 	}
 
 	umem = &umem_dmabuf->umem;
+	refcount_set(&umem->refcount, 1);
 	umem->ibdev = device;
 	umem->length = size;
 	umem->address = offset;
diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index 572a91a62a7b..7be30fda57e3 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -144,6 +144,7 @@ struct ib_umem_odp *ib_umem_odp_alloc_implicit(struct ib_device *device,
 	if (!umem_odp)
 		return ERR_PTR(-ENOMEM);
 	umem = &umem_odp->umem;
+	refcount_set(&umem->refcount, 1);
 	umem->ibdev = device;
 	umem->writable = ib_access_writable(access);
 	umem->owning_mm = current->mm;
@@ -185,6 +186,7 @@ ib_umem_odp_alloc_child(struct ib_umem_odp *root, unsigned long addr,
 	if (!odp_data)
 		return ERR_PTR(-ENOMEM);
 	umem = &odp_data->umem;
+	refcount_set(&umem->refcount, 1);
 	umem->ibdev = root->umem.ibdev;
 	umem->length     = size;
 	umem->address    = addr;
@@ -245,6 +247,7 @@ struct ib_umem_odp *ib_umem_odp_get(struct ib_device *device,
 	if (!umem_odp)
 		return ERR_PTR(-ENOMEM);
 
+	refcount_set(&umem_odp->umem.refcount, 1);
 	umem_odp->umem.ibdev = device;
 	umem_odp->umem.length = size;
 	umem_odp->umem.address = addr;
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 0a8e092c0ea8..44264f78eab3 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -10,6 +10,7 @@
 #include <linux/list.h>
 #include <linux/scatterlist.h>
 #include <linux/workqueue.h>
+#include <linux/refcount.h>
 #include <rdma/ib_verbs.h>
 
 struct ib_ucontext;
@@ -19,6 +20,7 @@ struct dma_buf_attach_ops;
 struct ib_umem {
 	struct ib_device       *ibdev;
 	struct mm_struct       *owning_mm;
+	refcount_t		refcount;
 	u64 iova;
 	size_t			length;
 	unsigned long		address;
@@ -110,6 +112,12 @@ static inline bool __rdma_umem_block_iter_next(struct ib_block_iter *biter)
 
 struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
 			    size_t size, int access);
+
+static inline void ib_umem_get_ref(struct ib_umem *umem)
+{
+	refcount_inc(&umem->refcount);
+}
+
 void ib_umem_release(struct ib_umem *umem);
 int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
 		      size_t length);
@@ -188,6 +196,7 @@ static inline struct ib_umem *ib_umem_get(struct ib_device *device,
 {
 	return ERR_PTR(-EOPNOTSUPP);
 }
+static inline void ib_umem_get_ref(struct ib_umem *umem) { }
 static inline void ib_umem_release(struct ib_umem *umem) { }
 static inline int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
 		      		    size_t length) {
-- 
2.51.1



* [PATCH rdma-next 02/10] RDMA/uverbs: Use umem refcounting for CQ creation with external buffer
  2026-02-03  8:49 [PATCH rdma-next 00/10] RDMA: Extend uverbs umem support for QP buffers and doorbell records Jiri Pirko
  2026-02-03  8:49 ` [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem Jiri Pirko
@ 2026-02-03  8:49 ` Jiri Pirko
  2026-02-03  8:49 ` [PATCH rdma-next 03/10] RDMA/mlx5: Add support for CQ creation with external umem buffer Jiri Pirko
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Jiri Pirko @ 2026-02-03  8:49 UTC (permalink / raw)
  To: linux-rdma
  Cc: jgg, leon, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

From: Jiri Pirko <jiri@nvidia.com>

Adopt the new ib_umem refcounting model in the create_cq_umem path and
in its only driver implementation (EFA).

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 drivers/infiniband/core/uverbs_std_types_cq.c | 3 +++
 drivers/infiniband/hw/efa/efa_verbs.c         | 5 ++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/uverbs_std_types_cq.c b/drivers/infiniband/core/uverbs_std_types_cq.c
index fab5d914029d..380e4e4d6fb1 100644
--- a/drivers/infiniband/core/uverbs_std_types_cq.c
+++ b/drivers/infiniband/core/uverbs_std_types_cq.c
@@ -196,6 +196,9 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(
 	if (ret)
 		goto err_free;
 
+	/* Driver took a reference, release ours */
+	ib_umem_release(umem);
+
 	obj->uevent.uobject.object = cq;
 	obj->uevent.uobject.user_handle = user_handle;
 	rdma_restrack_add(&cq->res);
diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
index 22d3e25c3b9d..aca48825cd71 100644
--- a/drivers/infiniband/hw/efa/efa_verbs.c
+++ b/drivers/infiniband/hw/efa/efa_verbs.c
@@ -1225,6 +1225,7 @@ int efa_create_cq_umem(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 			goto err_out;
 		}
 
+		ib_umem_get_ref(umem);
 		cq->cpu_addr = NULL;
 		cq->dma_addr = ib_umem_start_dma_addr(umem);
 		cq->umem = umem;
@@ -1300,7 +1301,9 @@ int efa_create_cq_umem(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 err_destroy_cq:
 	efa_destroy_cq_idx(dev, cq->cq_idx);
 err_free_mapped:
-	if (!umem)
+	if (umem)
+		ib_umem_release(umem);
+	else
 		efa_free_mapped(dev, cq->cpu_addr, cq->dma_addr, cq->size,
 				DMA_FROM_DEVICE);
 err_out:
-- 
2.51.1



* [PATCH rdma-next 03/10] RDMA/mlx5: Add support for CQ creation with external umem buffer
  2026-02-03  8:49 [PATCH rdma-next 00/10] RDMA: Extend uverbs umem support for QP buffers and doorbell records Jiri Pirko
  2026-02-03  8:49 ` [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem Jiri Pirko
  2026-02-03  8:49 ` [PATCH rdma-next 02/10] RDMA/uverbs: Use umem refcounting for CQ creation with external buffer Jiri Pirko
@ 2026-02-03  8:49 ` Jiri Pirko
  2026-02-03  8:49 ` [PATCH rdma-next 04/10] RDMA/uverbs: Factor out common buffer umem parsing into helper Jiri Pirko
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Jiri Pirko @ 2026-02-03  8:49 UTC (permalink / raw)
  To: linux-rdma
  Cc: jgg, leon, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

From: Jiri Pirko <jiri@nvidia.com>

Add mlx5_ib_create_cq_umem(), which creates a Completion Queue using an
externally provided umem buffer instead of pinning one from the
user-supplied virtual address.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 drivers/infiniband/hw/mlx5/cq.c      | 42 ++++++++++++++++++++++------
 drivers/infiniband/hw/mlx5/main.c    |  1 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  2 ++
 3 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 651d76bca114..2558a52c31c0 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -718,6 +718,7 @@ static int mini_cqe_res_format_to_hw(struct mlx5_ib_dev *dev, u8 format)
 static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
 			  struct mlx5_ib_cq *cq, int entries, u32 **cqb,
 			  int *cqe_size, int *index, int *inlen,
+			  struct ib_umem *ext_umem,
 			  struct uverbs_attr_bundle *attrs)
 {
 	struct mlx5_ib_create_cq ucmd = {};
@@ -749,12 +750,21 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
 
 	*cqe_size = ucmd.cqe_size;
 
-	cq->buf.umem =
-		ib_umem_get(&dev->ib_dev, ucmd.buf_addr,
-			    entries * ucmd.cqe_size, IB_ACCESS_LOCAL_WRITE);
-	if (IS_ERR(cq->buf.umem)) {
-		err = PTR_ERR(cq->buf.umem);
-		return err;
+	if (ext_umem) {
+		if (ext_umem->length < entries * ucmd.cqe_size) {
+			mlx5_ib_dbg(dev, "External umem too small for CQ\n");
+			return -EINVAL;
+		}
+		ib_umem_get_ref(ext_umem);
+		cq->buf.umem = ext_umem;
+	} else {
+		cq->buf.umem =
+			ib_umem_get(&dev->ib_dev, ucmd.buf_addr,
+				    entries * ucmd.cqe_size, IB_ACCESS_LOCAL_WRITE);
+		if (IS_ERR(cq->buf.umem)) {
+			err = PTR_ERR(cq->buf.umem);
+			return err;
+		}
 	}
 
 	page_size = mlx5_umem_find_best_cq_quantized_pgoff(
@@ -949,8 +959,10 @@ static void notify_soft_wc_handler(struct work_struct *work)
 	cq->ibcq.comp_handler(&cq->ibcq, cq->ibcq.cq_context);
 }
 
-int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
-		      struct uverbs_attr_bundle *attrs)
+static int __mlx5_ib_create_cq(struct ib_cq *ibcq,
+			       const struct ib_cq_init_attr *attr,
+			       struct ib_umem *ext_umem,
+			       struct uverbs_attr_bundle *attrs)
 {
 	struct ib_udata *udata = &attrs->driver_udata;
 	struct ib_device *ibdev = ibcq->device;
@@ -989,7 +1001,7 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 
 	if (udata) {
 		err = create_cq_user(dev, udata, cq, entries, &cqb, &cqe_size,
-				     &index, &inlen, attrs);
+				     &index, &inlen, ext_umem, attrs);
 		if (err)
 			return err;
 	} else {
@@ -1058,6 +1070,18 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	return err;
 }
 
+int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+		      struct uverbs_attr_bundle *attrs)
+{
+	return __mlx5_ib_create_cq(ibcq, attr, NULL, attrs);
+}
+
+int mlx5_ib_create_cq_umem(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+			   struct ib_umem *umem, struct uverbs_attr_bundle *attrs)
+{
+	return __mlx5_ib_create_cq(ibcq, attr, umem, attrs);
+}
+
 int mlx5_ib_pre_destroy_cq(struct ib_cq *cq)
 {
 	struct mlx5_ib_dev *dev = to_mdev(cq->device);
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index eba023b7af0f..a719738ebd0c 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -4447,6 +4447,7 @@ static const struct ib_device_ops mlx5_ib_dev_ops = {
 	.check_mr_status = mlx5_ib_check_mr_status,
 	.create_ah = mlx5_ib_create_ah,
 	.create_cq = mlx5_ib_create_cq,
+	.create_cq_umem = mlx5_ib_create_cq_umem,
 	.create_qp = mlx5_ib_create_qp,
 	.create_srq = mlx5_ib_create_srq,
 	.create_user_ah = mlx5_ib_create_ah,
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 4f4114d95130..25efdae83d27 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1371,6 +1371,8 @@ int mlx5_ib_read_wqe_srq(struct mlx5_ib_srq *srq, int wqe_index, void *buffer,
 			 size_t buflen, size_t *bc);
 int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		      struct uverbs_attr_bundle *attrs);
+int mlx5_ib_create_cq_umem(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+			   struct ib_umem *umem, struct uverbs_attr_bundle *attrs);
 int mlx5_ib_destroy_cq(struct ib_cq *cq, struct ib_udata *udata);
 int mlx5_ib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc);
 int mlx5_ib_pre_destroy_cq(struct ib_cq *cq);
-- 
2.51.1



* [PATCH rdma-next 04/10] RDMA/uverbs: Factor out common buffer umem parsing into helper
  2026-02-03  8:49 [PATCH rdma-next 00/10] RDMA: Extend uverbs umem support for QP buffers and doorbell records Jiri Pirko
                   ` (2 preceding siblings ...)
  2026-02-03  8:49 ` [PATCH rdma-next 03/10] RDMA/mlx5: Add support for CQ creation with external umem buffer Jiri Pirko
@ 2026-02-03  8:49 ` Jiri Pirko
  2026-02-03  8:49 ` [PATCH rdma-next 05/10] RDMA/core: Add support for QP buffer umem in QP creation Jiri Pirko
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Jiri Pirko @ 2026-02-03  8:49 UTC (permalink / raw)
  To: linux-rdma
  Cc: jgg, leon, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

From: Jiri Pirko <jiri@nvidia.com>

Introduce uverbs_get_buffer_umem() helper function to consolidate the
common pattern of parsing buffer attributes (VA or dmabuf FD) and
creating an ib_umem.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 drivers/infiniband/core/uverbs_ioctl.c        | 68 ++++++++++++++++++
 drivers/infiniband/core/uverbs_std_types_cq.c | 69 ++++---------------
 include/rdma/uverbs_ioctl.h                   |  7 ++
 3 files changed, 87 insertions(+), 57 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_ioctl.c b/drivers/infiniband/core/uverbs_ioctl.c
index f80da6a67e24..581b0705b496 100644
--- a/drivers/infiniband/core/uverbs_ioctl.c
+++ b/drivers/infiniband/core/uverbs_ioctl.c
@@ -32,6 +32,7 @@
 
 #include <rdma/rdma_user_ioctl.h>
 #include <rdma/uverbs_ioctl.h>
+#include <rdma/ib_umem.h>
 #include "rdma_core.h"
 #include "uverbs.h"
 
@@ -847,3 +848,70 @@ void uverbs_finalize_uobj_create(const struct uverbs_attr_bundle *bundle,
 		  pbundle->uobj_hw_obj_valid);
 }
 EXPORT_SYMBOL(uverbs_finalize_uobj_create);
+
+int uverbs_get_buffer_umem(struct ib_device *ib_dev,
+			   struct uverbs_attr_bundle *attrs,
+			   u16 va_attr, u16 len_attr,
+			   u16 fd_attr, u16 offset_attr,
+			   bool has_umem_support,
+			   int access, struct ib_umem **umem)
+{
+	struct ib_umem_dmabuf *umem_dmabuf;
+	u64 buffer_va, buffer_length, buffer_offset;
+	int buffer_fd;
+	int ret;
+
+	*umem = NULL;
+
+	if (uverbs_attr_is_valid(attrs, va_attr)) {
+		ret = uverbs_copy_from(&buffer_va, attrs, va_attr);
+		if (ret)
+			return ret;
+
+		ret = uverbs_copy_from(&buffer_length, attrs, len_attr);
+		if (ret)
+			return ret;
+
+		if (uverbs_attr_is_valid(attrs, fd_attr) ||
+		    uverbs_attr_is_valid(attrs, offset_attr) ||
+		    !has_umem_support)
+			return -EINVAL;
+
+		*umem = ib_umem_get(ib_dev, buffer_va, buffer_length, access);
+		if (IS_ERR(*umem)) {
+			ret = PTR_ERR(*umem);
+			*umem = NULL;
+			return ret;
+		}
+	} else if (uverbs_attr_is_valid(attrs, fd_attr)) {
+		ret = uverbs_get_raw_fd(&buffer_fd, attrs, fd_attr);
+		if (ret)
+			return ret;
+
+		ret = uverbs_copy_from(&buffer_offset, attrs, offset_attr);
+		if (ret)
+			return ret;
+
+		ret = uverbs_copy_from(&buffer_length, attrs, len_attr);
+		if (ret)
+			return ret;
+
+		if (uverbs_attr_is_valid(attrs, va_attr) ||
+		    !has_umem_support)
+			return -EINVAL;
+
+		umem_dmabuf = ib_umem_dmabuf_get_pinned(ib_dev, buffer_offset,
+						       buffer_length, buffer_fd,
+						       access);
+		if (IS_ERR(umem_dmabuf))
+			return PTR_ERR(umem_dmabuf);
+
+		*umem = &umem_dmabuf->umem;
+	} else if (uverbs_attr_is_valid(attrs, len_attr) ||
+		   uverbs_attr_is_valid(attrs, offset_attr)) {
+		/* Length or offset without VA/FD is invalid */
+		return -EINVAL;
+	}
+
+	return 0;
+}
diff --git a/drivers/infiniband/core/uverbs_std_types_cq.c b/drivers/infiniband/core/uverbs_std_types_cq.c
index 380e4e4d6fb1..64c43dbd96ba 100644
--- a/drivers/infiniband/core/uverbs_std_types_cq.c
+++ b/drivers/infiniband/core/uverbs_std_types_cq.c
@@ -66,16 +66,11 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(
 		typeof(*obj), uevent.uobject);
 	struct ib_uverbs_completion_event_file *ev_file = NULL;
 	struct ib_device *ib_dev = attrs->context->device;
-	struct ib_umem_dmabuf *umem_dmabuf;
 	struct ib_cq_init_attr attr = {};
 	struct ib_uobject *ev_file_uobj;
-	struct ib_umem *umem = NULL;
-	u64 buffer_length;
-	u64 buffer_offset;
+	struct ib_umem *umem;
 	struct ib_cq *cq;
 	u64 user_handle;
-	u64 buffer_va;
-	int buffer_fd;
 	int ret;
 
 	if ((!ib_dev->ops.create_cq && !ib_dev->ops.create_cq_umem) || !ib_dev->ops.destroy_cq)
@@ -118,58 +113,18 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(
 	INIT_LIST_HEAD(&obj->comp_list);
 	INIT_LIST_HEAD(&obj->uevent.event_list);
 
-	if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_CREATE_CQ_BUFFER_VA)) {
-
-		ret = uverbs_copy_from(&buffer_va, attrs, UVERBS_ATTR_CREATE_CQ_BUFFER_VA);
-		if (ret)
-			goto err_event_file;
-
-		ret = uverbs_copy_from(&buffer_length, attrs, UVERBS_ATTR_CREATE_CQ_BUFFER_LENGTH);
-		if (ret)
-			goto err_event_file;
-
-		if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_CREATE_CQ_BUFFER_FD) ||
-		    uverbs_attr_is_valid(attrs, UVERBS_ATTR_CREATE_CQ_BUFFER_OFFSET) ||
-		    !ib_dev->ops.create_cq_umem) {
-			ret = -EINVAL;
-			goto err_event_file;
-		}
-
-		umem = ib_umem_get(ib_dev, buffer_va, buffer_length, IB_ACCESS_LOCAL_WRITE);
-		if (IS_ERR(umem)) {
-			ret = PTR_ERR(umem);
-			goto err_event_file;
-		}
-	} else if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_CREATE_CQ_BUFFER_FD)) {
-
-		ret = uverbs_get_raw_fd(&buffer_fd, attrs, UVERBS_ATTR_CREATE_CQ_BUFFER_FD);
-		if (ret)
-			goto err_event_file;
-
-		ret = uverbs_copy_from(&buffer_offset, attrs, UVERBS_ATTR_CREATE_CQ_BUFFER_OFFSET);
-		if (ret)
-			goto err_event_file;
-
-		ret = uverbs_copy_from(&buffer_length, attrs, UVERBS_ATTR_CREATE_CQ_BUFFER_LENGTH);
-		if (ret)
-			goto err_event_file;
-
-		if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_CREATE_CQ_BUFFER_VA) ||
-		    !ib_dev->ops.create_cq_umem) {
-			ret = -EINVAL;
-			goto err_event_file;
-		}
+	ret = uverbs_get_buffer_umem(ib_dev, attrs,
+				     UVERBS_ATTR_CREATE_CQ_BUFFER_VA,
+				     UVERBS_ATTR_CREATE_CQ_BUFFER_LENGTH,
+				     UVERBS_ATTR_CREATE_CQ_BUFFER_FD,
+				     UVERBS_ATTR_CREATE_CQ_BUFFER_OFFSET,
+				     ib_dev->ops.create_cq_umem,
+				     IB_ACCESS_LOCAL_WRITE, &umem);
+	if (ret)
+		goto err_event_file;
 
-		umem_dmabuf = ib_umem_dmabuf_get_pinned(ib_dev, buffer_offset, buffer_length,
-							buffer_fd, IB_ACCESS_LOCAL_WRITE);
-		if (IS_ERR(umem_dmabuf)) {
-			ret = PTR_ERR(umem_dmabuf);
-			goto err_event_file;
-		}
-		umem = &umem_dmabuf->umem;
-	} else if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_CREATE_CQ_BUFFER_OFFSET) ||
-		   uverbs_attr_is_valid(attrs, UVERBS_ATTR_CREATE_CQ_BUFFER_LENGTH) ||
-		   !ib_dev->ops.create_cq) {
+	/* If no external buffer provided, require create_cq op */
+	if (!umem && !ib_dev->ops.create_cq) {
 		ret = -EINVAL;
 		goto err_event_file;
 	}
diff --git a/include/rdma/uverbs_ioctl.h b/include/rdma/uverbs_ioctl.h
index e6c0de227fad..2f96edf395bd 100644
--- a/include/rdma/uverbs_ioctl.h
+++ b/include/rdma/uverbs_ioctl.h
@@ -861,6 +861,13 @@ int uverbs_get_flags32(u32 *to, const struct uverbs_attr_bundle *attrs_bundle,
 		       size_t idx, u64 allowed_bits);
 int uverbs_copy_to(const struct uverbs_attr_bundle *attrs_bundle, size_t idx,
 		   const void *from, size_t size);
+struct ib_umem;
+int uverbs_get_buffer_umem(struct ib_device *ib_dev,
+			   struct uverbs_attr_bundle *attrs,
+			   u16 va_attr, u16 len_attr,
+			   u16 fd_attr, u16 offset_attr,
+			   bool has_umem_support,
+			   int access, struct ib_umem **umem);
 __malloc void *_uverbs_alloc(struct uverbs_attr_bundle *bundle, size_t size,
 			     gfp_t flags);
 
-- 
2.51.1



* [PATCH rdma-next 05/10] RDMA/core: Add support for QP buffer umem in QP creation
  2026-02-03  8:49 [PATCH rdma-next 00/10] RDMA: Extend uverbs umem support for QP buffers and doorbell records Jiri Pirko
                   ` (3 preceding siblings ...)
  2026-02-03  8:49 ` [PATCH rdma-next 04/10] RDMA/uverbs: Factor out common buffer umem parsing into helper Jiri Pirko
@ 2026-02-03  8:49 ` Jiri Pirko
  2026-02-03  8:49 ` [PATCH rdma-next 06/10] RDMA/mlx5: Add support for QP creation with external umem buffers Jiri Pirko
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Jiri Pirko @ 2026-02-03  8:49 UTC (permalink / raw)
  To: linux-rdma
  Cc: jgg, leon, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

From: Jiri Pirko <jiri@nvidia.com>

Extend the QP creation path to support passing SQ and RQ buffers
via user memory (umem), either through virtual address or dmabuf
file descriptor.

Add new uverbs attributes for QP creation:
- SQ/RQ buffer virtual address
- SQ/RQ buffer length
- SQ/RQ buffer dmabuf file descriptor
- SQ/RQ buffer offset (for dmabuf)

The VA and FD modes are mutually exclusive for each buffer.

Introduce ib_create_qp_user_umem() function that extends
ib_create_qp_user() with sq_umem and rq_umem parameters. The
existing ib_create_qp_user() becomes a wrapper that passes NULL
for umem pointers, maintaining backward compatibility with all
existing callers.

Add create_qp_umem device op for drivers to implement umem-based
QP creation. When umem buffers are provided, the core calls
create_qp_umem instead of create_qp, allowing drivers to use
pre-pinned memory for QP buffers.
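
The core-side selection between the two ops can be sketched as follows. This is a userspace illustration with hypothetical demo_* names and a deliberately simplified op table; the real struct ib_device_ops and the create_qp_umem signature are defined by the patch itself.

```c
#include <stddef.h>

/* Hypothetical, simplified op table standing in for ib_device_ops. */
struct demo_ops {
	int (*create_qp)(void *attr);
	int (*create_qp_umem)(void *attr, void *sq_umem, void *rq_umem);
};

/* Core-side dispatch: use the umem op only when a buffer was supplied. */
int demo_create_qp(const struct demo_ops *ops, void *attr,
		   void *sq_umem, void *rq_umem)
{
	if (sq_umem || rq_umem)
		return ops->create_qp_umem(attr, sq_umem, rq_umem);
	return ops->create_qp(attr);
}

/* Stub "drivers" so the dispatch can be exercised. */
static int demo_plain_qp(void *attr) { (void)attr; return 1; }
static int demo_umem_qp(void *attr, void *sq, void *rq)
{ (void)attr; (void)sq; (void)rq; return 2; }

int demo_dispatch(int with_umem)
{
	static const struct demo_ops ops = { demo_plain_qp, demo_umem_qp };
	int dummy;

	return demo_create_qp(&ops, &dummy,
			      with_umem ? (void *)&dummy : NULL, NULL);
}
```

A driver that does not implement create_qp_umem never sees the new path, because the uverbs handler rejects the buffer attributes up front when the op is absent.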

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 drivers/infiniband/core/core_priv.h           |  7 ++
 drivers/infiniband/core/device.c              |  1 +
 drivers/infiniband/core/uverbs_std_types_qp.c | 63 +++++++++++++++++-
 drivers/infiniband/core/verbs.c               | 65 +++++++++++++++----
 include/rdma/ib_verbs.h                       |  4 ++
 include/uapi/rdma/ib_user_ioctl_cmds.h        |  8 +++
 6 files changed, 132 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 05102769a918..553326c4eca9 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -320,6 +320,13 @@ struct ib_qp *ib_create_qp_user(struct ib_device *dev, struct ib_pd *pd,
 				struct ib_qp_init_attr *attr,
 				struct ib_udata *udata,
 				struct ib_uqp_object *uobj, const char *caller);
+struct ib_qp *ib_create_qp_user_umem(struct ib_device *dev, struct ib_pd *pd,
+				     struct ib_qp_init_attr *attr,
+				     struct ib_umem *sq_umem,
+				     struct ib_umem *rq_umem,
+				     struct ib_udata *udata,
+				     struct ib_uqp_object *uobj,
+				     const char *caller);
 
 void ib_qp_usecnt_inc(struct ib_qp *qp);
 void ib_qp_usecnt_dec(struct ib_qp *qp);
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 4e09f6e0995e..a9ae03f1e936 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2704,6 +2704,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, create_cq_umem);
 	SET_DEVICE_OP(dev_ops, create_flow);
 	SET_DEVICE_OP(dev_ops, create_qp);
+	SET_DEVICE_OP(dev_ops, create_qp_umem);
 	SET_DEVICE_OP(dev_ops, create_rwq_ind_table);
 	SET_DEVICE_OP(dev_ops, create_srq);
 	SET_DEVICE_OP(dev_ops, create_user_ah);
diff --git a/drivers/infiniband/core/uverbs_std_types_qp.c b/drivers/infiniband/core/uverbs_std_types_qp.c
index be0730e8509e..f1e2cfb27aa5 100644
--- a/drivers/infiniband/core/uverbs_std_types_qp.c
+++ b/drivers/infiniband/core/uverbs_std_types_qp.c
@@ -96,6 +96,7 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QP_CREATE)(
 	struct ib_xrcd *xrcd = NULL;
 	struct ib_uobject *xrcd_uobj = NULL;
 	struct ib_device *device;
+	struct ib_umem *sq_umem, *rq_umem;
 	u64 user_handle;
 	int ret;
 
@@ -248,12 +249,39 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QP_CREATE)(
 	set_caps(&attr, &cap, true);
 	mutex_init(&obj->mcast_lock);
 
-	qp = ib_create_qp_user(device, pd, &attr, &attrs->driver_udata, obj,
-			       KBUILD_MODNAME);
+	/* Get SQ buffer umem (from VA or dmabuf FD) */
+	ret = uverbs_get_buffer_umem(device, attrs,
+				     UVERBS_ATTR_CREATE_QP_SQ_BUFFER_VA,
+				     UVERBS_ATTR_CREATE_QP_SQ_BUFFER_LENGTH,
+				     UVERBS_ATTR_CREATE_QP_SQ_BUFFER_FD,
+				     UVERBS_ATTR_CREATE_QP_SQ_BUFFER_OFFSET,
+				     device->ops.create_qp_umem,
+				     IB_ACCESS_LOCAL_WRITE, &sq_umem);
+	if (ret)
+		goto err_put;
+
+	/* Get RQ buffer umem (from VA or dmabuf FD) */
+	ret = uverbs_get_buffer_umem(device, attrs,
+				     UVERBS_ATTR_CREATE_QP_RQ_BUFFER_VA,
+				     UVERBS_ATTR_CREATE_QP_RQ_BUFFER_LENGTH,
+				     UVERBS_ATTR_CREATE_QP_RQ_BUFFER_FD,
+				     UVERBS_ATTR_CREATE_QP_RQ_BUFFER_OFFSET,
+				     device->ops.create_qp_umem,
+				     IB_ACCESS_LOCAL_WRITE, &rq_umem);
+	if (ret)
+		goto err_release_sq_umem;
+
+	qp = ib_create_qp_user_umem(device, pd, &attr, sq_umem, rq_umem,
+				    &attrs->driver_udata, obj, KBUILD_MODNAME);
 	if (IS_ERR(qp)) {
 		ret = PTR_ERR(qp);
-		goto err_put;
+		goto err_release_rq_umem;
 	}
+
+	/* Driver took a reference, release ours */
+	ib_umem_release(rq_umem);
+	ib_umem_release(sq_umem);
+
 	ib_qp_usecnt_inc(qp);
 
 	if (attr.qp_type == IB_QPT_XRC_TGT) {
@@ -277,6 +305,11 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QP_CREATE)(
 			     sizeof(qp->qp_num));
 
 	return ret;
+
+err_release_rq_umem:
+	ib_umem_release(rq_umem);
+err_release_sq_umem:
+	ib_umem_release(sq_umem);
 err_put:
 	if (obj->uevent.event_file)
 		uverbs_uobject_put(&obj->uevent.event_file->uobj);
@@ -340,6 +373,30 @@ DECLARE_UVERBS_NAMED_METHOD(
 	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_CREATE_QP_RESP_QP_NUM,
 			   UVERBS_ATTR_TYPE(u32),
 			   UA_MANDATORY),
+	/* SQ buffer attributes - use VA or FD, not both */
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_QP_SQ_BUFFER_VA,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_OPTIONAL),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_QP_SQ_BUFFER_LENGTH,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_OPTIONAL),
+	UVERBS_ATTR_RAW_FD(UVERBS_ATTR_CREATE_QP_SQ_BUFFER_FD,
+			   UA_OPTIONAL),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_QP_SQ_BUFFER_OFFSET,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_OPTIONAL),
+	/* RQ buffer attributes - use VA or FD, not both */
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_QP_RQ_BUFFER_VA,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_OPTIONAL),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_QP_RQ_BUFFER_LENGTH,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_OPTIONAL),
+	UVERBS_ATTR_RAW_FD(UVERBS_ATTR_CREATE_QP_RQ_BUFFER_FD,
+			   UA_OPTIONAL),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_QP_RQ_BUFFER_OFFSET,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_OPTIONAL),
 	UVERBS_ATTR_UHW());
 
 static int UVERBS_HANDLER(UVERBS_METHOD_QP_DESTROY)(
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 8b56b6b62352..97073b50fc30 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1264,6 +1264,7 @@ static struct ib_qp *create_xrc_qp_user(struct ib_qp *qp,
 
 static struct ib_qp *create_qp(struct ib_device *dev, struct ib_pd *pd,
 			       struct ib_qp_init_attr *attr,
+			       struct ib_umem *sq_umem, struct ib_umem *rq_umem,
 			       struct ib_udata *udata,
 			       struct ib_uqp_object *uobj, const char *caller)
 {
@@ -1271,8 +1272,13 @@ static struct ib_qp *create_qp(struct ib_device *dev, struct ib_pd *pd,
 	struct ib_qp *qp;
 	int ret;
 
-	if (!dev->ops.create_qp)
-		return ERR_PTR(-EOPNOTSUPP);
+	if (sq_umem || rq_umem) {
+		if (!dev->ops.create_qp_umem)
+			return ERR_PTR(-EOPNOTSUPP);
+	} else {
+		if (!dev->ops.create_qp)
+			return ERR_PTR(-EOPNOTSUPP);
+	}
 
 	qp = rdma_zalloc_drv_obj_numa(dev, ib_qp);
 	if (!qp)
@@ -1302,7 +1308,12 @@ static struct ib_qp *create_qp(struct ib_device *dev, struct ib_pd *pd,
 	rdma_restrack_new(&qp->res, RDMA_RESTRACK_QP);
 	WARN_ONCE(!udata && !caller, "Missing kernel QP owner");
 	rdma_restrack_set_name(&qp->res, udata ? NULL : caller);
-	ret = dev->ops.create_qp(qp, attr, udata);
+
+	if (sq_umem || rq_umem)
+		ret = dev->ops.create_qp_umem(qp, attr, sq_umem, rq_umem,
+					      udata);
+	else
+		ret = dev->ops.create_qp(qp, attr, udata);
 	if (ret)
 		goto err_create;
 
@@ -1330,28 +1341,33 @@ static struct ib_qp *create_qp(struct ib_device *dev, struct ib_pd *pd,
 }
 
 /**
- * ib_create_qp_user - Creates a QP associated with the specified protection
- *   domain.
+ * ib_create_qp_user_umem - Creates a QP with optional umem buffers
  * @dev: IB device
  * @pd: The protection domain associated with the QP.
  * @attr: A list of initial attributes required to create the
  *   QP.  If QP creation succeeds, then the attributes are updated to
  *   the actual capabilities of the created QP.
+ * @sq_umem: SQ buffer umem (optional)
+ * @rq_umem: RQ buffer umem (optional)
  * @udata: User data
- * @uobj: uverbs obect
+ * @uobj: uverbs object
  * @caller: caller's build-time module name
  */
-struct ib_qp *ib_create_qp_user(struct ib_device *dev, struct ib_pd *pd,
-				struct ib_qp_init_attr *attr,
-				struct ib_udata *udata,
-				struct ib_uqp_object *uobj, const char *caller)
+struct ib_qp *ib_create_qp_user_umem(struct ib_device *dev, struct ib_pd *pd,
+				     struct ib_qp_init_attr *attr,
+				     struct ib_umem *sq_umem,
+				     struct ib_umem *rq_umem,
+				     struct ib_udata *udata,
+				     struct ib_uqp_object *uobj,
+				     const char *caller)
 {
 	struct ib_qp *qp, *xrc_qp;
 
 	if (attr->qp_type == IB_QPT_XRC_TGT)
-		qp = create_qp(dev, pd, attr, NULL, NULL, caller);
+		qp = create_qp(dev, pd, attr, sq_umem, rq_umem, NULL, NULL, caller);
 	else
-		qp = create_qp(dev, pd, attr, udata, uobj, NULL);
+		qp = create_qp(dev, pd, attr, sq_umem, rq_umem, udata, uobj,
+			       NULL);
 	if (attr->qp_type != IB_QPT_XRC_TGT || IS_ERR(qp))
 		return qp;
 
@@ -1364,6 +1380,28 @@ struct ib_qp *ib_create_qp_user(struct ib_device *dev, struct ib_pd *pd,
 	xrc_qp->uobject = uobj;
 	return xrc_qp;
 }
+EXPORT_SYMBOL(ib_create_qp_user_umem);
+
+/**
+ * ib_create_qp_user - Creates a QP associated with the specified protection
+ *   domain.
+ * @dev: IB device
+ * @pd: The protection domain associated with the QP.
+ * @attr: A list of initial attributes required to create the
+ *   QP.  If QP creation succeeds, then the attributes are updated to
+ *   the actual capabilities of the created QP.
+ * @udata: User data
+ * @uobj: uverbs object
+ * @caller: caller's build-time module name
+ */
+struct ib_qp *ib_create_qp_user(struct ib_device *dev, struct ib_pd *pd,
+				struct ib_qp_init_attr *attr,
+				struct ib_udata *udata,
+				struct ib_uqp_object *uobj, const char *caller)
+{
+	return ib_create_qp_user_umem(dev, pd, attr, NULL, NULL, udata, uobj,
+				      caller);
+}
 EXPORT_SYMBOL(ib_create_qp_user);
 
 void ib_qp_usecnt_inc(struct ib_qp *qp)
@@ -1413,7 +1451,8 @@ struct ib_qp *ib_create_qp_kernel(struct ib_pd *pd,
 	if (qp_init_attr->cap.max_rdma_ctxs)
 		rdma_rw_init_qp(device, qp_init_attr);
 
-	qp = create_qp(device, pd, qp_init_attr, NULL, NULL, caller);
+	qp = create_qp(device, pd, qp_init_attr, NULL, NULL, NULL, NULL,
+		       caller);
 	if (IS_ERR(qp))
 		return qp;
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 8bd020da7745..831426d27078 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2521,6 +2521,10 @@ struct ib_device_ops {
 	int (*destroy_srq)(struct ib_srq *srq, struct ib_udata *udata);
 	int (*create_qp)(struct ib_qp *qp, struct ib_qp_init_attr *qp_init_attr,
 			 struct ib_udata *udata);
+	int (*create_qp_umem)(struct ib_qp *qp,
+			      struct ib_qp_init_attr *qp_init_attr,
+			      struct ib_umem *sq_umem, struct ib_umem *rq_umem,
+			      struct ib_udata *udata);
 	int (*modify_qp)(struct ib_qp *qp, struct ib_qp_attr *qp_attr,
 			 int qp_attr_mask, struct ib_udata *udata);
 	int (*query_qp)(struct ib_qp *qp, struct ib_qp_attr *qp_attr,
diff --git a/include/uapi/rdma/ib_user_ioctl_cmds.h b/include/uapi/rdma/ib_user_ioctl_cmds.h
index 35da4026f452..0d6b4151512d 100644
--- a/include/uapi/rdma/ib_user_ioctl_cmds.h
+++ b/include/uapi/rdma/ib_user_ioctl_cmds.h
@@ -157,6 +157,14 @@ enum uverbs_attrs_create_qp_cmd_attr_ids {
 	UVERBS_ATTR_CREATE_QP_EVENT_FD,
 	UVERBS_ATTR_CREATE_QP_RESP_CAP,
 	UVERBS_ATTR_CREATE_QP_RESP_QP_NUM,
+	UVERBS_ATTR_CREATE_QP_SQ_BUFFER_VA,
+	UVERBS_ATTR_CREATE_QP_SQ_BUFFER_LENGTH,
+	UVERBS_ATTR_CREATE_QP_SQ_BUFFER_FD,
+	UVERBS_ATTR_CREATE_QP_SQ_BUFFER_OFFSET,
+	UVERBS_ATTR_CREATE_QP_RQ_BUFFER_VA,
+	UVERBS_ATTR_CREATE_QP_RQ_BUFFER_LENGTH,
+	UVERBS_ATTR_CREATE_QP_RQ_BUFFER_FD,
+	UVERBS_ATTR_CREATE_QP_RQ_BUFFER_OFFSET,
 };
 
 enum uverbs_attrs_destroy_qp_cmd_attr_ids {
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH rdma-next 06/10] RDMA/mlx5: Add support for QP creation with external umem buffers
  2026-02-03  8:49 [PATCH rdma-next 00/10] RDMA: Extend uverbs umem support for QP buffers and doorbell records Jiri Pirko
                   ` (4 preceding siblings ...)
  2026-02-03  8:49 ` [PATCH rdma-next 05/10] RDMA/core: Add support for QP buffer umem in QP creation Jiri Pirko
@ 2026-02-03  8:49 ` Jiri Pirko
  2026-02-03  8:49 ` [PATCH rdma-next 07/10] RDMA/uverbs: Add doorbell record umem support to CQ creation Jiri Pirko
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Jiri Pirko @ 2026-02-03  8:49 UTC (permalink / raw)
  To: linux-rdma
  Cc: jgg, leon, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

From: Jiri Pirko <jiri@nvidia.com>

Implement the create_qp_umem device operation to support QP creation
with externally provided SQ and RQ umem buffers. The RQ buffer umem is
plugged into the common user QP buffer setup path, while the SQ buffer
umem is plugged into the raw packet QP SQ creation path.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 drivers/infiniband/hw/mlx5/main.c    |  1 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  3 ++
 drivers/infiniband/hw/mlx5/qp.c      | 81 +++++++++++++++++++++++-----
 3 files changed, 71 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index a719738ebd0c..e77a46e463c6 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -4449,6 +4449,7 @@ static const struct ib_device_ops mlx5_ib_dev_ops = {
 	.create_cq = mlx5_ib_create_cq,
 	.create_cq_umem = mlx5_ib_create_cq_umem,
 	.create_qp = mlx5_ib_create_qp,
+	.create_qp_umem = mlx5_ib_create_qp_umem,
 	.create_srq = mlx5_ib_create_srq,
 	.create_user_ah = mlx5_ib_create_ah,
 	.dealloc_pd = mlx5_ib_dealloc_pd,
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 25efdae83d27..755d7c17dbfd 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1356,6 +1356,9 @@ int mlx5_ib_enable_lb(struct mlx5_ib_dev *dev, bool td, bool qp);
 void mlx5_ib_disable_lb(struct mlx5_ib_dev *dev, bool td, bool qp);
 int mlx5_ib_create_qp(struct ib_qp *qp, struct ib_qp_init_attr *init_attr,
 		      struct ib_udata *udata);
+int mlx5_ib_create_qp_umem(struct ib_qp *qp, struct ib_qp_init_attr *init_attr,
+			   struct ib_umem *sq_umem, struct ib_umem *rq_umem,
+			   struct ib_udata *udata);
 int mlx5_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 		      int attr_mask, struct ib_udata *udata);
 int mlx5_ib_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr, int qp_attr_mask,
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 69af20790481..010f707e209d 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -943,7 +943,8 @@ static int _create_user_qp(struct mlx5_ib_dev *dev, struct ib_pd *pd,
 			   struct ib_qp_init_attr *attr, u32 **in,
 			   struct mlx5_ib_create_qp_resp *resp, int *inlen,
 			   struct mlx5_ib_qp_base *base,
-			   struct mlx5_ib_create_qp *ucmd)
+			   struct mlx5_ib_create_qp *ucmd,
+			   struct ib_umem *ext_umem)
 {
 	struct mlx5_ib_ucontext *context;
 	struct mlx5_ib_ubuffer *ubuffer = &base->ubuffer;
@@ -998,7 +999,26 @@ static int _create_user_qp(struct mlx5_ib_dev *dev, struct ib_pd *pd,
 	if (err)
 		goto err_bfreg;
 
-	if (ucmd->buf_addr && ubuffer->buf_size) {
+	if (ext_umem) {
+		if (ext_umem->length < ubuffer->buf_size) {
+			mlx5_ib_dbg(dev, "External umem too small for QP\n");
+			err = -EINVAL;
+			goto err_bfreg;
+		}
+		ib_umem_get_ref(ext_umem);
+		ubuffer->umem = ext_umem;
+		ubuffer->buf_size = ext_umem->length;
+		ubuffer->buf_addr = ext_umem->address;
+		page_size = mlx5_umem_find_best_quantized_pgoff(
+			ubuffer->umem, qpc, log_page_size,
+			MLX5_ADAPTER_PAGE_SHIFT, page_offset, 64,
+			&page_offset_quantized);
+		if (!page_size) {
+			err = -EINVAL;
+			goto err_umem;
+		}
+		ncont = ib_umem_num_dma_blocks(ubuffer->umem, page_size);
+	} else if (ucmd->buf_addr && ubuffer->buf_size) {
 		ubuffer->buf_addr = ucmd->buf_addr;
 		ubuffer->umem = ib_umem_get(&dev->ib_dev, ubuffer->buf_addr,
 					    ubuffer->buf_size, 0);
@@ -1335,7 +1355,8 @@ static int get_qp_ts_format(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *send_cq,
 static int create_raw_packet_qp_sq(struct mlx5_ib_dev *dev,
 				   struct ib_udata *udata,
 				   struct mlx5_ib_sq *sq, void *qpin,
-				   struct ib_pd *pd, struct mlx5_ib_cq *cq)
+				   struct ib_pd *pd, struct mlx5_ib_cq *cq,
+				   struct ib_umem *sq_ext_umem)
 {
 	struct mlx5_ib_ubuffer *ubuffer = &sq->ubuffer;
 	__be64 *pas;
@@ -1353,10 +1374,21 @@ static int create_raw_packet_qp_sq(struct mlx5_ib_dev *dev,
 	if (ts_format < 0)
 		return ts_format;
 
-	sq->ubuffer.umem = ib_umem_get(&dev->ib_dev, ubuffer->buf_addr,
-				       ubuffer->buf_size, 0);
-	if (IS_ERR(sq->ubuffer.umem))
-		return PTR_ERR(sq->ubuffer.umem);
+	if (sq_ext_umem) {
+		if (sq_ext_umem->length < ubuffer->buf_size) {
+			mlx5_ib_dbg(dev, "External umem too small for SQ\n");
+			return -EINVAL;
+		}
+		ib_umem_get_ref(sq_ext_umem);
+		sq->ubuffer.umem = sq_ext_umem;
+		sq->ubuffer.buf_size = sq_ext_umem->length;
+		sq->ubuffer.buf_addr = sq_ext_umem->address;
+	} else {
+		sq->ubuffer.umem = ib_umem_get(&dev->ib_dev, ubuffer->buf_addr,
+					       ubuffer->buf_size, 0);
+		if (IS_ERR(sq->ubuffer.umem))
+			return PTR_ERR(sq->ubuffer.umem);
+	}
 	page_size = mlx5_umem_find_best_quantized_pgoff(
 		ubuffer->umem, wq, log_wq_pg_sz, MLX5_ADAPTER_PAGE_SHIFT,
 		page_offset, 64, &page_offset_quantized);
@@ -1568,7 +1600,8 @@ static int create_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
 				u32 *in, size_t inlen, struct ib_pd *pd,
 				struct ib_udata *udata,
 				struct mlx5_ib_create_qp_resp *resp,
-				struct ib_qp_init_attr *init_attr)
+				struct ib_qp_init_attr *init_attr,
+				struct ib_umem *sq_ext_umem)
 {
 	struct mlx5_ib_raw_packet_qp *raw_packet_qp = &qp->raw_packet_qp;
 	struct mlx5_ib_sq *sq = &raw_packet_qp->sq;
@@ -1588,7 +1621,8 @@ static int create_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
 			return err;
 
 		err = create_raw_packet_qp_sq(dev, udata, sq, in, pd,
-					      to_mcq(init_attr->send_cq));
+					      to_mcq(init_attr->send_cq),
+					      sq_ext_umem);
 		if (err)
 			goto err_destroy_tis;
 
@@ -1708,6 +1742,8 @@ struct mlx5_create_qp_params {
 	struct ib_qp_init_attr *attr;
 	u32 uidx;
 	struct mlx5_ib_create_qp_resp resp;
+	struct ib_umem *sq_ext_umem;
+	struct ib_umem *rq_ext_umem;
 };
 
 static int create_rss_raw_qp_tir(struct mlx5_ib_dev *dev, struct ib_pd *pd,
@@ -2122,7 +2158,7 @@ static int create_dci(struct mlx5_ib_dev *dev, struct ib_pd *pd,
 		return ts_format;
 
 	err = _create_user_qp(dev, pd, qp, udata, init_attr, &in, &params->resp,
-			      &inlen, base, ucmd);
+			      &inlen, base, ucmd, NULL);
 	if (err)
 		return err;
 
@@ -2290,7 +2326,7 @@ static int create_user_qp(struct mlx5_ib_dev *dev, struct ib_pd *pd,
 	}
 
 	err = _create_user_qp(dev, pd, qp, udata, init_attr, &in, &params->resp,
-			      &inlen, base, ucmd);
+			      &inlen, base, ucmd, params->rq_ext_umem);
 	if (err)
 		return err;
 
@@ -2394,7 +2430,8 @@ static int create_user_qp(struct mlx5_ib_dev *dev, struct ib_pd *pd,
 		qp->raw_packet_qp.sq.ubuffer.buf_addr = ucmd->sq_buf_addr;
 		raw_packet_qp_copy_info(qp, &qp->raw_packet_qp);
 		err = create_raw_packet_qp(dev, qp, in, inlen, pd, udata,
-					   &params->resp, init_attr);
+					   &params->resp, init_attr,
+					   params->sq_ext_umem);
 	} else
 		err = mlx5_qpc_create_qp(dev, &base->mqp, in, inlen, out);
 
@@ -3251,8 +3288,9 @@ static int check_ucmd_data(struct mlx5_ib_dev *dev,
 	return ret ? 0 : -EINVAL;
 }
 
-int mlx5_ib_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *attr,
-		      struct ib_udata *udata)
+static int __mlx5_ib_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *attr,
+			       struct ib_udata *udata,
+			       struct ib_umem *sq_umem, struct ib_umem *rq_umem)
 {
 	struct mlx5_create_qp_params params = {};
 	struct mlx5_ib_dev *dev = to_mdev(ibqp->device);
@@ -3277,6 +3315,8 @@ int mlx5_ib_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *attr,
 	params.uidx = MLX5_IB_DEFAULT_UIDX;
 	params.attr = attr;
 	params.is_rss_raw = !!attr->rwq_ind_tbl;
+	params.sq_ext_umem = sq_umem;
+	params.rq_ext_umem = rq_umem;
 
 	if (udata) {
 		err = process_udata_size(dev, &params);
@@ -3351,6 +3391,19 @@ int mlx5_ib_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *attr,
 	return err;
 }
 
+int mlx5_ib_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *attr,
+		      struct ib_udata *udata)
+{
+	return __mlx5_ib_create_qp(ibqp, attr, udata, NULL, NULL);
+}
+
+int mlx5_ib_create_qp_umem(struct ib_qp *ibqp, struct ib_qp_init_attr *attr,
+			   struct ib_umem *sq_umem, struct ib_umem *rq_umem,
+			   struct ib_udata *udata)
+{
+	return __mlx5_ib_create_qp(ibqp, attr, udata, sq_umem, rq_umem);
+}
+
 int mlx5_ib_destroy_qp(struct ib_qp *qp, struct ib_udata *udata)
 {
 	struct mlx5_ib_dev *dev = to_mdev(qp->device);
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH rdma-next 07/10] RDMA/uverbs: Add doorbell record umem support to CQ creation
  2026-02-03  8:49 [PATCH rdma-next 00/10] RDMA: Extend uverbs umem support for QP buffers and doorbell records Jiri Pirko
                   ` (5 preceding siblings ...)
  2026-02-03  8:49 ` [PATCH rdma-next 06/10] RDMA/mlx5: Add support for QP creation with external umem buffers Jiri Pirko
@ 2026-02-03  8:49 ` Jiri Pirko
  2026-02-03  8:50 ` [PATCH rdma-next 08/10] RDMA/mlx5: Add external doorbell record umem support for CQ Jiri Pirko
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Jiri Pirko @ 2026-02-03  8:49 UTC (permalink / raw)
  To: linux-rdma
  Cc: jgg, leon, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

From: Jiri Pirko <jiri@nvidia.com>

Extend the user CQ creation path to support passing doorbell record
(DBR) memory via umem. This allows userspace to provide pre-pinned
doorbell record buffers, which is useful for confidential computing
scenarios where doorbell records need to be in decrypted memory
accessible by the device.

The DBR umem can be provided via:
- Virtual address (VA) mode: UVERBS_ATTR_CREATE_CQ_DBR_VA + LENGTH
- DMA-buf file descriptor mode: UVERBS_ATTR_CREATE_CQ_DBR_FD + OFFSET +
  LENGTH

These modes are mutually exclusive, similar to the CQ buffer umem.
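
The exclusivity rule can be modeled with a small userspace sketch (the
function name and exact checks here are illustrative assumptions, not
kernel code; in the kernel the validation is done by
uverbs_get_buffer_umem()):

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model only: mirrors the attribute validation this
 * patch relies on, it is not the kernel implementation. */
int check_dbr_attrs(bool has_va, bool has_fd, uint64_t length)
{
	if (has_va && has_fd)
		return -EINVAL;	/* VA and FD modes are mutually exclusive */
	if ((has_va || has_fd) && !length)
		return -EINVAL;	/* a provided buffer needs a nonzero length */
	return 0;		/* no DBR requested, or attributes are valid */
}
```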

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 drivers/infiniband/core/uverbs_std_types_cq.c | 38 +++++++++++++++----
 drivers/infiniband/hw/efa/efa.h               |  3 +-
 drivers/infiniband/hw/efa/efa_verbs.c         |  5 ++-
 drivers/infiniband/hw/mlx5/cq.c               |  3 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |  3 +-
 include/rdma/ib_verbs.h                       |  1 +
 include/uapi/rdma/ib_user_ioctl_cmds.h        |  4 ++
 7 files changed, 45 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_std_types_cq.c b/drivers/infiniband/core/uverbs_std_types_cq.c
index 64c43dbd96ba..95ee0fea09b0 100644
--- a/drivers/infiniband/core/uverbs_std_types_cq.c
+++ b/drivers/infiniband/core/uverbs_std_types_cq.c
@@ -68,7 +68,7 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(
 	struct ib_device *ib_dev = attrs->context->device;
 	struct ib_cq_init_attr attr = {};
 	struct ib_uobject *ev_file_uobj;
-	struct ib_umem *umem;
+	struct ib_umem *umem, *dbr_umem;
 	struct ib_cq *cq;
 	u64 user_handle;
 	int ret;
@@ -123,17 +123,25 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(
 	if (ret)
 		goto err_event_file;
 
+	ret = uverbs_get_buffer_umem(ib_dev, attrs,
+				     UVERBS_ATTR_CREATE_CQ_DBR_VA,
+				     UVERBS_ATTR_CREATE_CQ_DBR_LENGTH,
+				     UVERBS_ATTR_CREATE_CQ_DBR_FD,
+				     UVERBS_ATTR_CREATE_CQ_DBR_OFFSET,
+				     ib_dev->ops.create_cq_umem, 0, &dbr_umem);
+	if (ret)
+		goto err_umem;
+
 	/* If no external buffer provided, require create_cq op */
-	if (!umem && !ib_dev->ops.create_cq) {
+	if (!umem && !dbr_umem && !ib_dev->ops.create_cq) {
 		ret = -EINVAL;
-		goto err_event_file;
+		goto err_dbr_umem;
 	}
 
 	cq = rdma_zalloc_drv_obj(ib_dev, ib_cq);
 	if (!cq) {
 		ret = -ENOMEM;
-		ib_umem_release(umem);
-		goto err_event_file;
+		goto err_dbr_umem;
 	}
 
 	cq->device        = ib_dev;
@@ -146,12 +154,14 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(
 	rdma_restrack_new(&cq->res, RDMA_RESTRACK_CQ);
 	rdma_restrack_set_name(&cq->res, NULL);
 
-	ret = umem ? ib_dev->ops.create_cq_umem(cq, &attr, umem, attrs) :
+	ret = (umem || dbr_umem) ?
+		ib_dev->ops.create_cq_umem(cq, &attr, umem, dbr_umem, attrs) :
 		ib_dev->ops.create_cq(cq, &attr, attrs);
 	if (ret)
 		goto err_free;
 
 	/* Driver took a reference, release ours */
+	ib_umem_release(dbr_umem);
 	ib_umem_release(umem);
 
 	obj->uevent.uobject.object = cq;
@@ -164,9 +174,12 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(
 	return ret;
 
 err_free:
-	ib_umem_release(umem);
 	rdma_restrack_put(&cq->res);
 	kfree(cq);
+err_dbr_umem:
+	ib_umem_release(dbr_umem);
+err_umem:
+	ib_umem_release(umem);
 err_event_file:
 	if (obj->uevent.event_file)
 		uverbs_uobject_put(&obj->uevent.event_file->uobj);
@@ -214,6 +227,17 @@ DECLARE_UVERBS_NAMED_METHOD(
 	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_CQ_BUFFER_OFFSET,
 			   UVERBS_ATTR_TYPE(u64),
 			   UA_OPTIONAL),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_CQ_DBR_VA,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_OPTIONAL),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_CQ_DBR_LENGTH,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_OPTIONAL),
+	UVERBS_ATTR_RAW_FD(UVERBS_ATTR_CREATE_CQ_DBR_FD,
+			   UA_OPTIONAL),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_CQ_DBR_OFFSET,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_OPTIONAL),
 	UVERBS_ATTR_UHW());
 
 static int UVERBS_HANDLER(UVERBS_METHOD_CQ_DESTROY)(
diff --git a/drivers/infiniband/hw/efa/efa.h b/drivers/infiniband/hw/efa/efa.h
index 96f9c3bc98b2..e4be0335af51 100644
--- a/drivers/infiniband/hw/efa/efa.h
+++ b/drivers/infiniband/hw/efa/efa.h
@@ -164,7 +164,8 @@ int efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata);
 int efa_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		  struct uverbs_attr_bundle *attrs);
 int efa_create_cq_umem(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
-		       struct ib_umem *umem, struct uverbs_attr_bundle *attrs);
+		       struct ib_umem *umem, struct ib_umem *dbr_umem,
+		       struct uverbs_attr_bundle *attrs);
 struct ib_mr *efa_reg_mr(struct ib_pd *ibpd, u64 start, u64 length,
 			 u64 virt_addr, int access_flags,
 			 struct ib_dmah *dmah,
diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
index aca48825cd71..165a2e1e5b72 100644
--- a/drivers/infiniband/hw/efa/efa_verbs.c
+++ b/drivers/infiniband/hw/efa/efa_verbs.c
@@ -1132,7 +1132,8 @@ static int cq_mmap_entries_setup(struct efa_dev *dev, struct efa_cq *cq,
 }
 
 int efa_create_cq_umem(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
-		       struct ib_umem *umem, struct uverbs_attr_bundle *attrs)
+		       struct ib_umem *umem, struct ib_umem *dbr_umem,
+		       struct uverbs_attr_bundle *attrs)
 {
 	struct ib_udata *udata = &attrs->driver_udata;
 	struct efa_ucontext *ucontext = rdma_udata_to_drv_context(
@@ -1314,7 +1315,7 @@ int efa_create_cq_umem(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 int efa_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		  struct uverbs_attr_bundle *attrs)
 {
-	return efa_create_cq_umem(ibcq, attr, NULL, attrs);
+	return efa_create_cq_umem(ibcq, attr, NULL, NULL, attrs);
 }
 
 static int umem_to_page_list(struct efa_dev *dev,
diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 2558a52c31c0..35c51a91e8c4 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -1077,7 +1077,8 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 }
 
 int mlx5_ib_create_cq_umem(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
-			   struct ib_umem *umem, struct uverbs_attr_bundle *attrs)
+			   struct ib_umem *umem, struct ib_umem *dbr_umem,
+			   struct uverbs_attr_bundle *attrs)
 {
 	return __mlx5_ib_create_cq(ibcq, attr, umem, attrs);
 }
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 755d7c17dbfd..05f764185e92 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1375,7 +1375,8 @@ int mlx5_ib_read_wqe_srq(struct mlx5_ib_srq *srq, int wqe_index, void *buffer,
 int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		      struct uverbs_attr_bundle *attrs);
 int mlx5_ib_create_cq_umem(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
-			   struct ib_umem *umem, struct uverbs_attr_bundle *attrs);
+			   struct ib_umem *umem, struct ib_umem *dbr_umem,
+			   struct uverbs_attr_bundle *attrs);
 int mlx5_ib_destroy_cq(struct ib_cq *cq, struct ib_udata *udata);
 int mlx5_ib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc);
 int mlx5_ib_pre_destroy_cq(struct ib_cq *cq);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 831426d27078..eeafa5358b49 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2535,6 +2535,7 @@ struct ib_device_ops {
 	int (*create_cq_umem)(struct ib_cq *cq,
 			      const struct ib_cq_init_attr *attr,
 			      struct ib_umem *umem,
+			      struct ib_umem *dbr_umem,
 			      struct uverbs_attr_bundle *attrs);
 	int (*modify_cq)(struct ib_cq *cq, u16 cq_count, u16 cq_period);
 	int (*destroy_cq)(struct ib_cq *cq, struct ib_udata *udata);
diff --git a/include/uapi/rdma/ib_user_ioctl_cmds.h b/include/uapi/rdma/ib_user_ioctl_cmds.h
index 0d6b4151512d..ef33b96511a8 100644
--- a/include/uapi/rdma/ib_user_ioctl_cmds.h
+++ b/include/uapi/rdma/ib_user_ioctl_cmds.h
@@ -116,6 +116,10 @@ enum uverbs_attrs_create_cq_cmd_attr_ids {
 	UVERBS_ATTR_CREATE_CQ_BUFFER_LENGTH,
 	UVERBS_ATTR_CREATE_CQ_BUFFER_FD,
 	UVERBS_ATTR_CREATE_CQ_BUFFER_OFFSET,
+	UVERBS_ATTR_CREATE_CQ_DBR_VA,
+	UVERBS_ATTR_CREATE_CQ_DBR_LENGTH,
+	UVERBS_ATTR_CREATE_CQ_DBR_FD,
+	UVERBS_ATTR_CREATE_CQ_DBR_OFFSET,
 };
 
 enum uverbs_attrs_destroy_cq_cmd_attr_ids {
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH rdma-next 08/10] RDMA/mlx5: Add external doorbell record umem support for CQ
  2026-02-03  8:49 [PATCH rdma-next 00/10] RDMA: Extend uverbs umem support for QP buffers and doorbell records Jiri Pirko
                   ` (6 preceding siblings ...)
  2026-02-03  8:49 ` [PATCH rdma-next 07/10] RDMA/uverbs: Add doorbell record umem support to CQ creation Jiri Pirko
@ 2026-02-03  8:50 ` Jiri Pirko
  2026-02-03  8:50 ` [PATCH rdma-next 09/10] RDMA/uverbs: Add doorbell record umem support to QP creation Jiri Pirko
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Jiri Pirko @ 2026-02-03  8:50 UTC (permalink / raw)
  To: linux-rdma
  Cc: jgg, leon, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

From: Jiri Pirko <jiri@nvidia.com>

Extend the mlx5 doorbell mapping infrastructure to support externally
provided doorbell record (DBR) umem buffers. This enables userspace to
pass pre-pinned DBR memory for CQ creation.
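
The resulting doorbell DMA offset selection can be modeled as follows
(userspace sketch with an assumed 4 KiB page size, not kernel code):

```c
#include <stdbool.h>

#define PAGE_SIZE 4096UL
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* Models the offset selection in mlx5_ib_db_map_user(): an external
 * DBR umem supplies its own offset within the mapping (as returned by
 * ib_umem_offset()), while the legacy path keeps using the low bits
 * of the user virtual address. */
unsigned long dbr_dma_offset(bool ext_umem, unsigned long umem_offset,
			     unsigned long virt)
{
	return ext_umem ? umem_offset : (virt & ~PAGE_MASK);
}
```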

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 drivers/infiniband/hw/mlx5/cq.c       | 12 +++++++-----
 drivers/infiniband/hw/mlx5/doorbell.c | 27 +++++++++++++++++++++++----
 drivers/infiniband/hw/mlx5/mlx5_ib.h  |  2 +-
 drivers/infiniband/hw/mlx5/qp.c       |  4 ++--
 drivers/infiniband/hw/mlx5/srq.c      |  2 +-
 5 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 35c51a91e8c4..3afcb81e53d4 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -718,7 +718,7 @@ static int mini_cqe_res_format_to_hw(struct mlx5_ib_dev *dev, u8 format)
 static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
 			  struct mlx5_ib_cq *cq, int entries, u32 **cqb,
 			  int *cqe_size, int *index, int *inlen,
-			  struct ib_umem *ext_umem,
+			  struct ib_umem *ext_umem, struct ib_umem *ext_dbr_umem,
 			  struct uverbs_attr_bundle *attrs)
 {
 	struct mlx5_ib_create_cq ucmd = {};
@@ -775,7 +775,7 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
 		goto err_umem;
 	}
 
-	err = mlx5_ib_db_map_user(context, ucmd.db_addr, &cq->db);
+	err = mlx5_ib_db_map_user(context, ucmd.db_addr, ext_dbr_umem, &cq->db);
 	if (err)
 		goto err_umem;
 
@@ -962,6 +962,7 @@ static void notify_soft_wc_handler(struct work_struct *work)
 static int __mlx5_ib_create_cq(struct ib_cq *ibcq,
 			       const struct ib_cq_init_attr *attr,
 			       struct ib_umem *ext_umem,
+			       struct ib_umem *ext_dbr_umem,
 			       struct uverbs_attr_bundle *attrs)
 {
 	struct ib_udata *udata = &attrs->driver_udata;
@@ -1001,7 +1002,8 @@ static int __mlx5_ib_create_cq(struct ib_cq *ibcq,
 
 	if (udata) {
 		err = create_cq_user(dev, udata, cq, entries, &cqb, &cqe_size,
-				     &index, &inlen, ext_umem, attrs);
+				     &index, &inlen, ext_umem, ext_dbr_umem,
+				     attrs);
 		if (err)
 			return err;
 	} else {
@@ -1073,14 +1075,14 @@ static int __mlx5_ib_create_cq(struct ib_cq *ibcq,
 int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		      struct uverbs_attr_bundle *attrs)
 {
-	return __mlx5_ib_create_cq(ibcq, attr, NULL, attrs);
+	return __mlx5_ib_create_cq(ibcq, attr, NULL, NULL, attrs);
 }
 
 int mlx5_ib_create_cq_umem(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 			   struct ib_umem *umem, struct ib_umem *dbr_umem,
 			   struct uverbs_attr_bundle *attrs)
 {
-	return __mlx5_ib_create_cq(ibcq, attr, umem, attrs);
+	return __mlx5_ib_create_cq(ibcq, attr, umem, dbr_umem, attrs);
 }
 
 int mlx5_ib_pre_destroy_cq(struct ib_cq *cq)
diff --git a/drivers/infiniband/hw/mlx5/doorbell.c b/drivers/infiniband/hw/mlx5/doorbell.c
index e32111117a5e..fd80b3ff6145 100644
--- a/drivers/infiniband/hw/mlx5/doorbell.c
+++ b/drivers/infiniband/hw/mlx5/doorbell.c
@@ -46,13 +46,31 @@ struct mlx5_ib_user_db_page {
 };
 
 int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context, unsigned long virt,
-			struct mlx5_db *db)
+			struct ib_umem *ext_umem, struct mlx5_db *db)
 {
+	unsigned long dma_offset;
 	struct mlx5_ib_user_db_page *page;
 	int err = 0;
 
 	mutex_lock(&context->db_page_mutex);
 
+	if (ext_umem) {
+		/* External umem path - no page sharing */
+		page = kzalloc(sizeof(*page), GFP_KERNEL);
+		if (!page) {
+			err = -ENOMEM;
+			goto out;
+		}
+
+		ib_umem_get_ref(ext_umem);
+		page->umem = ext_umem;
+		dma_offset = ib_umem_offset(ext_umem);
+		goto add_page;
+	}
+
+	dma_offset = virt & ~PAGE_MASK;
+
+	/* Standard path - lookup existing or create new page */
 	list_for_each_entry(page, &context->db_page_list, list)
 		if ((current->mm == page->mm) &&
 		    (page->user_virt == (virt & PAGE_MASK)))
@@ -76,11 +94,11 @@ int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context, unsigned long virt,
 	mmgrab(current->mm);
 	page->mm = current->mm;
 
+add_page:
 	list_add(&page->list, &context->db_page_list);
 
 found:
-	db->dma = sg_dma_address(page->umem->sgt_append.sgt.sgl) +
-		  (virt & ~PAGE_MASK);
+	db->dma = sg_dma_address(page->umem->sgt_append.sgt.sgl) + dma_offset;
 	db->u.user_page = page;
 	++page->refcnt;
 
@@ -96,7 +114,8 @@ void mlx5_ib_db_unmap_user(struct mlx5_ib_ucontext *context, struct mlx5_db *db)
 
 	if (!--db->u.user_page->refcnt) {
 		list_del(&db->u.user_page->list);
-		mmdrop(db->u.user_page->mm);
+		if (db->u.user_page->mm)
+			mmdrop(db->u.user_page->mm);
 		ib_umem_release(db->u.user_page->umem);
 		kfree(db->u.user_page);
 	}
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 05f764185e92..c1e60f5c1754 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1332,7 +1332,7 @@ to_mmmap(struct rdma_user_mmap_entry *rdma_entry)
 int mlx5_ib_dev_res_cq_init(struct mlx5_ib_dev *dev);
 int mlx5_ib_dev_res_srq_init(struct mlx5_ib_dev *dev);
 int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context, unsigned long virt,
-			struct mlx5_db *db);
+			struct ib_umem *ext_umem, struct mlx5_db *db);
 void mlx5_ib_db_unmap_user(struct mlx5_ib_ucontext *context, struct mlx5_db *db);
 void __mlx5_ib_cq_clean(struct mlx5_ib_cq *cq, u32 qpn, struct mlx5_ib_srq *srq);
 void mlx5_ib_cq_clean(struct mlx5_ib_cq *cq, u32 qpn, struct mlx5_ib_srq *srq);
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 010f707e209d..4b926b6f2461 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -918,7 +918,7 @@ static int create_user_rq(struct mlx5_ib_dev *dev, struct ib_pd *pd,
 		ib_umem_num_pages(rwq->umem), page_size, rwq->rq_num_pas,
 		offset);
 
-	err = mlx5_ib_db_map_user(ucontext, ucmd->db_addr, &rwq->db);
+	err = mlx5_ib_db_map_user(ucontext, ucmd->db_addr, NULL, &rwq->db);
 	if (err) {
 		mlx5_ib_dbg(dev, "map failed\n");
 		goto err_umem;
@@ -1064,7 +1064,7 @@ static int _create_user_qp(struct mlx5_ib_dev *dev, struct ib_pd *pd,
 		resp->bfreg_index = MLX5_IB_INVALID_BFREG;
 	qp->bfregn = bfregn;
 
-	err = mlx5_ib_db_map_user(context, ucmd->db_addr, &qp->db);
+	err = mlx5_ib_db_map_user(context, ucmd->db_addr, NULL, &qp->db);
 	if (err) {
 		mlx5_ib_dbg(dev, "map failed\n");
 		goto err_free;
diff --git a/drivers/infiniband/hw/mlx5/srq.c b/drivers/infiniband/hw/mlx5/srq.c
index bcb6b324af50..6169ffaf457c 100644
--- a/drivers/infiniband/hw/mlx5/srq.c
+++ b/drivers/infiniband/hw/mlx5/srq.c
@@ -83,7 +83,7 @@ static int create_srq_user(struct ib_pd *pd, struct mlx5_ib_srq *srq,
 	}
 	in->umem = srq->umem;
 
-	err = mlx5_ib_db_map_user(ucontext, ucmd.db_addr, &srq->db);
+	err = mlx5_ib_db_map_user(ucontext, ucmd.db_addr, NULL, &srq->db);
 	if (err) {
 		mlx5_ib_dbg(dev, "map doorbell failed\n");
 		goto err_umem;
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH rdma-next 09/10] RDMA/uverbs: Add doorbell record umem support to QP creation
  2026-02-03  8:49 [PATCH rdma-next 00/10] RDMA: Extend uverbs umem support for QP buffers and doorbell records Jiri Pirko
                   ` (7 preceding siblings ...)
  2026-02-03  8:50 ` [PATCH rdma-next 08/10] RDMA/mlx5: Add external doorbell record umem support for CQ Jiri Pirko
@ 2026-02-03  8:50 ` Jiri Pirko
  2026-02-03  8:50 ` [PATCH rdma-next 10/10] RDMA/mlx5: Add external doorbell record umem support for QP Jiri Pirko
  2026-02-03  9:59 ` [PATCH rdma-next 00/10] RDMA: Extend uverbs umem support for QP buffers and doorbell records Leon Romanovsky
  10 siblings, 0 replies; 34+ messages in thread
From: Jiri Pirko @ 2026-02-03  8:50 UTC (permalink / raw)
  To: linux-rdma
  Cc: jgg, leon, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

From: Jiri Pirko <jiri@nvidia.com>

Extend the QP creation path to support passing doorbell record (DBR)
memory via umem for both SQ and RQ. This allows userspace to provide
pre-pinned doorbell record buffers.

The DBR umem can be provided via:
- Virtual address (VA) mode: UVERBS_ATTR_CREATE_QP_{SQ,RQ}_DBR_VA + LENGTH
- DMA-buf file descriptor mode: UVERBS_ATTR_CREATE_QP_{SQ,RQ}_DBR_FD +
  OFFSET + LENGTH

These modes are mutually exclusive, similar to the QP buffer umem.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Change-Id: Id2836b852d114a93c9b5b45c09bcef386386a193
---
 drivers/infiniband/core/core_priv.h           |  2 +
 drivers/infiniband/core/uverbs_std_types_qp.c | 55 ++++++++++++++++++-
 drivers/infiniband/core/verbs.c               | 25 ++++++---
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |  1 +
 drivers/infiniband/hw/mlx5/qp.c               |  1 +
 include/rdma/ib_verbs.h                       |  2 +
 include/uapi/rdma/ib_user_ioctl_cmds.h        |  8 +++
 7 files changed, 83 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 553326c4eca9..baca8cc120aa 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -324,6 +324,8 @@ struct ib_qp *ib_create_qp_user_umem(struct ib_device *dev, struct ib_pd *pd,
 				     struct ib_qp_init_attr *attr,
 				     struct ib_umem *sq_umem,
 				     struct ib_umem *rq_umem,
+				     struct ib_umem *sq_dbr_umem,
+				     struct ib_umem *rq_dbr_umem,
 				     struct ib_udata *udata,
 				     struct ib_uqp_object *uobj,
 				     const char *caller);
diff --git a/drivers/infiniband/core/uverbs_std_types_qp.c b/drivers/infiniband/core/uverbs_std_types_qp.c
index f1e2cfb27aa5..3c8817f286a2 100644
--- a/drivers/infiniband/core/uverbs_std_types_qp.c
+++ b/drivers/infiniband/core/uverbs_std_types_qp.c
@@ -96,7 +96,7 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QP_CREATE)(
 	struct ib_xrcd *xrcd = NULL;
 	struct ib_uobject *xrcd_uobj = NULL;
 	struct ib_device *device;
-	struct ib_umem *sq_umem, *rq_umem;
+	struct ib_umem *sq_umem, *rq_umem, *sq_dbr_umem, *rq_dbr_umem;
 	u64 user_handle;
 	int ret;
 
@@ -271,14 +271,37 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QP_CREATE)(
 	if (ret)
 		goto err_release_sq_umem;
 
+	/* Get SQ DBR umem (from VA or dmabuf FD) */
+	ret = uverbs_get_buffer_umem(device, attrs,
+				 UVERBS_ATTR_CREATE_QP_SQ_DBR_VA,
+				 UVERBS_ATTR_CREATE_QP_SQ_DBR_LENGTH,
+				 UVERBS_ATTR_CREATE_QP_SQ_DBR_FD,
+				 UVERBS_ATTR_CREATE_QP_SQ_DBR_OFFSET,
+				 device->ops.create_qp_umem, 0, &sq_dbr_umem);
+	if (ret)
+		goto err_release_rq_umem;
+
+	/* Get RQ DBR umem (from VA or dmabuf FD) */
+	ret = uverbs_get_buffer_umem(device, attrs,
+				 UVERBS_ATTR_CREATE_QP_RQ_DBR_VA,
+				 UVERBS_ATTR_CREATE_QP_RQ_DBR_LENGTH,
+				 UVERBS_ATTR_CREATE_QP_RQ_DBR_FD,
+				 UVERBS_ATTR_CREATE_QP_RQ_DBR_OFFSET,
+				 device->ops.create_qp_umem, 0, &rq_dbr_umem);
+	if (ret)
+		goto err_release_sq_dbr_umem;
+
 	qp = ib_create_qp_user_umem(device, pd, &attr, sq_umem, rq_umem,
+				    sq_dbr_umem, rq_dbr_umem,
 				    &attrs->driver_udata, obj, KBUILD_MODNAME);
 	if (IS_ERR(qp)) {
 		ret = PTR_ERR(qp);
-		goto err_release_rq_umem;
+		goto err_release_rq_dbr_umem;
 	}
 
 	/* Driver took a reference, release ours */
+	ib_umem_release(rq_dbr_umem);
+	ib_umem_release(sq_dbr_umem);
 	ib_umem_release(rq_umem);
 	ib_umem_release(sq_umem);
 
@@ -306,6 +329,10 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QP_CREATE)(
 
 	return ret;
 
+err_release_rq_dbr_umem:
+	ib_umem_release(rq_dbr_umem);
+err_release_sq_dbr_umem:
+	ib_umem_release(sq_dbr_umem);
 err_release_rq_umem:
 	ib_umem_release(rq_umem);
 err_release_sq_umem:
@@ -397,6 +424,30 @@ DECLARE_UVERBS_NAMED_METHOD(
 	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_QP_RQ_BUFFER_OFFSET,
 			   UVERBS_ATTR_TYPE(u64),
 			   UA_OPTIONAL),
+	/* SQ DBR attributes - use VA or FD, not both */
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_QP_SQ_DBR_VA,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_OPTIONAL),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_QP_SQ_DBR_LENGTH,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_OPTIONAL),
+	UVERBS_ATTR_RAW_FD(UVERBS_ATTR_CREATE_QP_SQ_DBR_FD,
+			   UA_OPTIONAL),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_QP_SQ_DBR_OFFSET,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_OPTIONAL),
+	/* RQ DBR attributes - use VA or FD, not both */
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_QP_RQ_DBR_VA,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_OPTIONAL),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_QP_RQ_DBR_LENGTH,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_OPTIONAL),
+	UVERBS_ATTR_RAW_FD(UVERBS_ATTR_CREATE_QP_RQ_DBR_FD,
+			   UA_OPTIONAL),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_QP_RQ_DBR_OFFSET,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_OPTIONAL),
 	UVERBS_ATTR_UHW());
 
 static int UVERBS_HANDLER(UVERBS_METHOD_QP_DESTROY)(
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 97073b50fc30..db0ad750a78a 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1265,6 +1265,8 @@ static struct ib_qp *create_xrc_qp_user(struct ib_qp *qp,
 static struct ib_qp *create_qp(struct ib_device *dev, struct ib_pd *pd,
 			       struct ib_qp_init_attr *attr,
 			       struct ib_umem *sq_umem, struct ib_umem *rq_umem,
+			       struct ib_umem *sq_dbr_umem,
+			       struct ib_umem *rq_dbr_umem,
 			       struct ib_udata *udata,
 			       struct ib_uqp_object *uobj, const char *caller)
 {
@@ -1272,7 +1274,7 @@ static struct ib_qp *create_qp(struct ib_device *dev, struct ib_pd *pd,
 	struct ib_qp *qp;
 	int ret;
 
-	if (sq_umem || rq_umem) {
+	if (sq_umem || rq_umem || sq_dbr_umem || rq_dbr_umem) {
 		if (!dev->ops.create_qp_umem)
 			return ERR_PTR(-EOPNOTSUPP);
 	} else {
@@ -1309,9 +1311,9 @@ static struct ib_qp *create_qp(struct ib_device *dev, struct ib_pd *pd,
 	WARN_ONCE(!udata && !caller, "Missing kernel QP owner");
 	rdma_restrack_set_name(&qp->res, udata ? NULL : caller);
 
-	if (sq_umem || rq_umem)
+	if (sq_umem || rq_umem || sq_dbr_umem || rq_dbr_umem)
 		ret = dev->ops.create_qp_umem(qp, attr, sq_umem, rq_umem,
-					      udata);
+					      sq_dbr_umem, rq_dbr_umem, udata);
 	else
 		ret = dev->ops.create_qp(qp, attr, udata);
 	if (ret)
@@ -1349,6 +1351,8 @@ static struct ib_qp *create_qp(struct ib_device *dev, struct ib_pd *pd,
  *   the actual capabilities of the created QP.
  * @sq_umem: SQ buffer umem (optional)
  * @rq_umem: RQ buffer umem (optional)
+ * @sq_dbr_umem: SQ doorbell record umem (optional)
+ * @rq_dbr_umem: RQ doorbell record umem (optional)
  * @udata: User data
  * @uobj: uverbs object
  * @caller: caller's build-time module name
@@ -1357,6 +1361,8 @@ struct ib_qp *ib_create_qp_user_umem(struct ib_device *dev, struct ib_pd *pd,
 				     struct ib_qp_init_attr *attr,
 				     struct ib_umem *sq_umem,
 				     struct ib_umem *rq_umem,
+				     struct ib_umem *sq_dbr_umem,
+				     struct ib_umem *rq_dbr_umem,
 				     struct ib_udata *udata,
 				     struct ib_uqp_object *uobj,
 				     const char *caller)
@@ -1364,10 +1370,11 @@ struct ib_qp *ib_create_qp_user_umem(struct ib_device *dev, struct ib_pd *pd,
 	struct ib_qp *qp, *xrc_qp;
 
 	if (attr->qp_type == IB_QPT_XRC_TGT)
-		qp = create_qp(dev, pd, attr, sq_umem, rq_umem, NULL, NULL, caller);
+		qp = create_qp(dev, pd, attr, NULL, NULL, NULL, NULL, NULL,
+			       NULL, caller);
 	else
-		qp = create_qp(dev, pd, attr, sq_umem, rq_umem, udata, uobj,
-			       NULL);
+		qp = create_qp(dev, pd, attr, sq_umem, rq_umem, sq_dbr_umem,
+			       rq_dbr_umem, udata, uobj, NULL);
 	if (attr->qp_type != IB_QPT_XRC_TGT || IS_ERR(qp))
 		return qp;
 
@@ -1399,8 +1406,8 @@ struct ib_qp *ib_create_qp_user(struct ib_device *dev, struct ib_pd *pd,
 				struct ib_udata *udata,
 				struct ib_uqp_object *uobj, const char *caller)
 {
-	return ib_create_qp_user_umem(dev, pd, attr, NULL, NULL, udata, uobj,
-				      caller);
+	return ib_create_qp_user_umem(dev, pd, attr, NULL, NULL, NULL, NULL,
+				      udata, uobj, caller);
 }
 EXPORT_SYMBOL(ib_create_qp_user);
 
@@ -1452,7 +1459,7 @@ struct ib_qp *ib_create_qp_kernel(struct ib_pd *pd,
 		rdma_rw_init_qp(device, qp_init_attr);
 
 	qp = create_qp(device, pd, qp_init_attr, NULL, NULL, NULL, NULL,
-		       caller);
+		       NULL, NULL, caller);
 	if (IS_ERR(qp))
 		return qp;
 
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index c1e60f5c1754..f654d0fde3f1 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1358,6 +1358,7 @@ int mlx5_ib_create_qp(struct ib_qp *qp, struct ib_qp_init_attr *init_attr,
 		      struct ib_udata *udata);
 int mlx5_ib_create_qp_umem(struct ib_qp *qp, struct ib_qp_init_attr *init_attr,
 			   struct ib_umem *sq_umem, struct ib_umem *rq_umem,
+			   struct ib_umem *sq_dbr_umem, struct ib_umem *rq_dbr_umem,
 			   struct ib_udata *udata);
 int mlx5_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 		      int attr_mask, struct ib_udata *udata);
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 4b926b6f2461..79d30e68b4cb 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -3399,6 +3399,7 @@ int mlx5_ib_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *attr,
 
 int mlx5_ib_create_qp_umem(struct ib_qp *ibqp, struct ib_qp_init_attr *attr,
 			   struct ib_umem *sq_umem, struct ib_umem *rq_umem,
+			   struct ib_umem *sq_dbr_umem, struct ib_umem *rq_dbr_umem,
 			   struct ib_udata *udata)
 {
 	return __mlx5_ib_create_qp(ibqp, attr, udata, sq_umem, rq_umem);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index eeafa5358b49..36d59e9d45b5 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2524,6 +2524,8 @@ struct ib_device_ops {
 	int (*create_qp_umem)(struct ib_qp *qp,
 			      struct ib_qp_init_attr *qp_init_attr,
 			      struct ib_umem *sq_umem, struct ib_umem *rq_umem,
+			      struct ib_umem *sq_dbr_umem,
+			      struct ib_umem *rq_dbr_umem,
 			      struct ib_udata *udata);
 	int (*modify_qp)(struct ib_qp *qp, struct ib_qp_attr *qp_attr,
 			 int qp_attr_mask, struct ib_udata *udata);
diff --git a/include/uapi/rdma/ib_user_ioctl_cmds.h b/include/uapi/rdma/ib_user_ioctl_cmds.h
index ef33b96511a8..5e9be458e990 100644
--- a/include/uapi/rdma/ib_user_ioctl_cmds.h
+++ b/include/uapi/rdma/ib_user_ioctl_cmds.h
@@ -169,6 +169,14 @@ enum uverbs_attrs_create_qp_cmd_attr_ids {
 	UVERBS_ATTR_CREATE_QP_RQ_BUFFER_LENGTH,
 	UVERBS_ATTR_CREATE_QP_RQ_BUFFER_FD,
 	UVERBS_ATTR_CREATE_QP_RQ_BUFFER_OFFSET,
+	UVERBS_ATTR_CREATE_QP_SQ_DBR_VA,
+	UVERBS_ATTR_CREATE_QP_SQ_DBR_LENGTH,
+	UVERBS_ATTR_CREATE_QP_SQ_DBR_FD,
+	UVERBS_ATTR_CREATE_QP_SQ_DBR_OFFSET,
+	UVERBS_ATTR_CREATE_QP_RQ_DBR_VA,
+	UVERBS_ATTR_CREATE_QP_RQ_DBR_LENGTH,
+	UVERBS_ATTR_CREATE_QP_RQ_DBR_FD,
+	UVERBS_ATTR_CREATE_QP_RQ_DBR_OFFSET,
 };
 
 enum uverbs_attrs_destroy_qp_cmd_attr_ids {
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH rdma-next 10/10] RDMA/mlx5: Add external doorbell record umem support for QP
  2026-02-03  8:49 [PATCH rdma-next 00/10] RDMA: Extend uverbs umem support for QP buffers and doorbell records Jiri Pirko
                   ` (8 preceding siblings ...)
  2026-02-03  8:50 ` [PATCH rdma-next 09/10] RDMA/uverbs: Add doorbell record umem support to QP creation Jiri Pirko
@ 2026-02-03  8:50 ` Jiri Pirko
  2026-02-03  9:59 ` [PATCH rdma-next 00/10] RDMA: Extend uverbs umem support for QP buffers and doorbell records Leon Romanovsky
  10 siblings, 0 replies; 34+ messages in thread
From: Jiri Pirko @ 2026-02-03  8:50 UTC (permalink / raw)
  To: linux-rdma
  Cc: jgg, leon, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

From: Jiri Pirko <jiri@nvidia.com>

Extend the mlx5 QP creation path to support externally provided
doorbell record (DBR) umem buffers. This enables userspace to pass
pre-pinned DBR memory for QP creation.

Note that a single doorbell record is shared between the SQ and RQ for
raw QPs.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Change-Id: I641f6412ad99b48f6633da742bf169df2bd1edf3
---
 drivers/infiniband/hw/mlx5/qp.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 79d30e68b4cb..f8c24f66fef5 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -944,7 +944,8 @@ static int _create_user_qp(struct mlx5_ib_dev *dev, struct ib_pd *pd,
 			   struct mlx5_ib_create_qp_resp *resp, int *inlen,
 			   struct mlx5_ib_qp_base *base,
 			   struct mlx5_ib_create_qp *ucmd,
-			   struct ib_umem *ext_umem)
+			   struct ib_umem *ext_umem,
+			   struct ib_umem *ext_dbr_umem)
 {
 	struct mlx5_ib_ucontext *context;
 	struct mlx5_ib_ubuffer *ubuffer = &base->ubuffer;
@@ -1064,7 +1065,7 @@ static int _create_user_qp(struct mlx5_ib_dev *dev, struct ib_pd *pd,
 		resp->bfreg_index = MLX5_IB_INVALID_BFREG;
 	qp->bfregn = bfregn;
 
-	err = mlx5_ib_db_map_user(context, ucmd->db_addr, NULL, &qp->db);
+	err = mlx5_ib_db_map_user(context, ucmd->db_addr, ext_dbr_umem, &qp->db);
 	if (err) {
 		mlx5_ib_dbg(dev, "map failed\n");
 		goto err_free;
@@ -1744,6 +1745,7 @@ struct mlx5_create_qp_params {
 	struct mlx5_ib_create_qp_resp resp;
 	struct ib_umem *sq_ext_umem;
 	struct ib_umem *rq_ext_umem;
+	struct ib_umem *rq_dbr_ext_umem;
 };
 
 static int create_rss_raw_qp_tir(struct mlx5_ib_dev *dev, struct ib_pd *pd,
@@ -2158,7 +2160,7 @@ static int create_dci(struct mlx5_ib_dev *dev, struct ib_pd *pd,
 		return ts_format;
 
 	err = _create_user_qp(dev, pd, qp, udata, init_attr, &in, &params->resp,
-			      &inlen, base, ucmd, NULL);
+			      &inlen, base, ucmd, NULL, NULL);
 	if (err)
 		return err;
 
@@ -2326,7 +2328,8 @@ static int create_user_qp(struct mlx5_ib_dev *dev, struct ib_pd *pd,
 	}
 
 	err = _create_user_qp(dev, pd, qp, udata, init_attr, &in, &params->resp,
-			      &inlen, base, ucmd, params->rq_ext_umem);
+			      &inlen, base, ucmd, params->rq_ext_umem,
+			      params->rq_dbr_ext_umem);
 	if (err)
 		return err;
 
@@ -3290,7 +3293,8 @@ static int check_ucmd_data(struct mlx5_ib_dev *dev,
 
 static int __mlx5_ib_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *attr,
 			       struct ib_udata *udata,
-			       struct ib_umem *sq_umem, struct ib_umem *rq_umem)
+			       struct ib_umem *sq_umem, struct ib_umem *rq_umem,
+			       struct ib_umem *rq_dbr_umem)
 {
 	struct mlx5_create_qp_params params = {};
 	struct mlx5_ib_dev *dev = to_mdev(ibqp->device);
@@ -3317,6 +3321,7 @@ static int __mlx5_ib_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *attr,
 	params.is_rss_raw = !!attr->rwq_ind_tbl;
 	params.sq_ext_umem = sq_umem;
 	params.rq_ext_umem = rq_umem;
+	params.rq_dbr_ext_umem = rq_dbr_umem;
 
 	if (udata) {
 		err = process_udata_size(dev, &params);
@@ -3394,7 +3399,7 @@ static int __mlx5_ib_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *attr,
 int mlx5_ib_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *attr,
 		      struct ib_udata *udata)
 {
-	return __mlx5_ib_create_qp(ibqp, attr, udata, NULL, NULL);
+	return __mlx5_ib_create_qp(ibqp, attr, udata, NULL, NULL, NULL);
 }
 
 int mlx5_ib_create_qp_umem(struct ib_qp *ibqp, struct ib_qp_init_attr *attr,
@@ -3402,7 +3407,9 @@ int mlx5_ib_create_qp_umem(struct ib_qp *ibqp, struct ib_qp_init_attr *attr,
 			   struct ib_umem *sq_dbr_umem, struct ib_umem *rq_dbr_umem,
 			   struct ib_udata *udata)
 {
-	return __mlx5_ib_create_qp(ibqp, attr, udata, sq_umem, rq_umem);
+	/* Single DBR umem for both SQ and RQ. */
+	return __mlx5_ib_create_qp(ibqp, attr, udata, sq_umem, rq_umem,
+				   rq_dbr_umem);
 }
 
 int mlx5_ib_destroy_qp(struct ib_qp *qp, struct ib_udata *udata)
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH rdma-next 00/10] RDMA: Extend uverbs umem support for QP buffers and doorbell records
  2026-02-03  8:49 [PATCH rdma-next 00/10] RDMA: Extend uverbs umem support for QP buffers and doorbell records Jiri Pirko
                   ` (9 preceding siblings ...)
  2026-02-03  8:50 ` [PATCH rdma-next 10/10] RDMA/mlx5: Add external doorbell record umem support for QP Jiri Pirko
@ 2026-02-03  9:59 ` Leon Romanovsky
  2026-02-03 10:13   ` Jiri Pirko
  10 siblings, 1 reply; 34+ messages in thread
From: Leon Romanovsky @ 2026-02-03  9:59 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: linux-rdma, jgg, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

On Tue, Feb 03, 2026 at 09:49:52AM +0100, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@nvidia.com>
> 
> This patchset extends the existing CQ umem buffer infrastructure to QP
> creation and adds doorbell record (DBR) umem support for both CQ and QP.
> 
> The kernel already supports CQ creation with externally provided umem
> buffers. This patchset:
> 
> 1. Adds umem reference counting to simplify memory lifecycle management.
> 2. Adds mlx5 driver implementation for the existing CQ buffer umem support.
> 3. Extends QP creation to support external SQ/RQ buffer umems.
> 4. Adds doorbell record umem support for both CQ and QP.
> 5. Implements all extensions in mlx5 driver.
> 
> This enables integration with dmabuf exporters for specialized memory
> types, such as GPU memory for GPU-direct RDMA scenarios.

What do you mean by that?

Are you referring to the RDMA dma-buf exporter from Yishai?
The GPU dma-buf exporters already work correctly with the existing
RDMA importer.

Thanks

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem
  2026-02-03  8:49 ` [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem Jiri Pirko
@ 2026-02-03 10:03   ` Leon Romanovsky
  2026-02-03 10:11     ` Jiri Pirko
  2026-02-03 14:51   ` Jason Gunthorpe
  1 sibling, 1 reply; 34+ messages in thread
From: Leon Romanovsky @ 2026-02-03 10:03 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: linux-rdma, jgg, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

On Tue, Feb 03, 2026 at 09:49:53AM +0100, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@nvidia.com>
> 
> Introduce reference counting for ib_umem objects to simplify memory
> lifecycle management when umem buffers are shared between the core
> layer and device drivers.
> 
> When the core RDMA layer allocates an ib_umem and passes it to a driver
> (e.g., for CQ or QP creation with external buffers), both layers need
> to manage the umem lifecycle. Without reference counting, this requires
> complex conditional release logic to avoid double-frees on error paths
> and leaks on success paths.

This sentence doesn't align with the proposed change.

> 
> With reference counting:
> - Core allocates umem with refcount=1
> - Driver calls ib_umem_get_ref() to take a reference
> - Both layers can unconditionally call ib_umem_release()
> - The umem is freed only when the last reference is dropped
> 
> Signed-off-by: Jiri Pirko <jiri@nvidia.com>
> Change-Id: Ifb1765ea3b14dab3329294633ea5df063c74420d

Please remove the Change-Ids and write the commit message yourself,
without relying on AI. The current message provides no meaningful
information, particularly the auto-generated summary at the end.

Thanks

> ---
>  drivers/infiniband/core/umem.c        | 5 +++++
>  drivers/infiniband/core/umem_dmabuf.c | 1 +
>  drivers/infiniband/core/umem_odp.c    | 3 +++
>  include/rdma/ib_umem.h                | 9 +++++++++
>  4 files changed, 18 insertions(+)
> 
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index 8137031c2a65..09ce694d66ea 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -192,6 +192,7 @@ struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
>  	umem = kzalloc(sizeof(*umem), GFP_KERNEL);
>  	if (!umem)
>  		return ERR_PTR(-ENOMEM);
> +	refcount_set(&umem->refcount, 1);
>  	umem->ibdev      = device;
>  	umem->length     = size;
>  	umem->address    = addr;
> @@ -280,11 +281,15 @@ EXPORT_SYMBOL(ib_umem_get);
>  /**
>   * ib_umem_release - release memory pinned with ib_umem_get
>   * @umem: umem struct to release
> + *
> + * Decrement the reference count and free the umem when it reaches zero.
>   */
>  void ib_umem_release(struct ib_umem *umem)
>  {
>  	if (!umem)
>  		return;
> +	if (!refcount_dec_and_test(&umem->refcount))
> +		return;
>  	if (umem->is_dmabuf)
>  		return ib_umem_dmabuf_release(to_ib_umem_dmabuf(umem));
>  	if (umem->is_odp)
> diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c
> index 939da49b0dcc..5c5286092fca 100644
> --- a/drivers/infiniband/core/umem_dmabuf.c
> +++ b/drivers/infiniband/core/umem_dmabuf.c
> @@ -143,6 +143,7 @@ ib_umem_dmabuf_get_with_dma_device(struct ib_device *device,
>  	}
>  
>  	umem = &umem_dmabuf->umem;
> +	refcount_set(&umem->refcount, 1);
>  	umem->ibdev = device;
>  	umem->length = size;
>  	umem->address = offset;
> diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
> index 572a91a62a7b..7be30fda57e3 100644
> --- a/drivers/infiniband/core/umem_odp.c
> +++ b/drivers/infiniband/core/umem_odp.c
> @@ -144,6 +144,7 @@ struct ib_umem_odp *ib_umem_odp_alloc_implicit(struct ib_device *device,
>  	if (!umem_odp)
>  		return ERR_PTR(-ENOMEM);
>  	umem = &umem_odp->umem;
> +	refcount_set(&umem->refcount, 1);
>  	umem->ibdev = device;
>  	umem->writable = ib_access_writable(access);
>  	umem->owning_mm = current->mm;
> @@ -185,6 +186,7 @@ ib_umem_odp_alloc_child(struct ib_umem_odp *root, unsigned long addr,
>  	if (!odp_data)
>  		return ERR_PTR(-ENOMEM);
>  	umem = &odp_data->umem;
> +	refcount_set(&umem->refcount, 1);
>  	umem->ibdev = root->umem.ibdev;
>  	umem->length     = size;
>  	umem->address    = addr;
> @@ -245,6 +247,7 @@ struct ib_umem_odp *ib_umem_odp_get(struct ib_device *device,
>  	if (!umem_odp)
>  		return ERR_PTR(-ENOMEM);
>  
> +	refcount_set(&umem_odp->umem.refcount, 1);
>  	umem_odp->umem.ibdev = device;
>  	umem_odp->umem.length = size;
>  	umem_odp->umem.address = addr;
> diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
> index 0a8e092c0ea8..44264f78eab3 100644
> --- a/include/rdma/ib_umem.h
> +++ b/include/rdma/ib_umem.h
> @@ -10,6 +10,7 @@
>  #include <linux/list.h>
>  #include <linux/scatterlist.h>
>  #include <linux/workqueue.h>
> +#include <linux/refcount.h>
>  #include <rdma/ib_verbs.h>
>  
>  struct ib_ucontext;
> @@ -19,6 +20,7 @@ struct dma_buf_attach_ops;
>  struct ib_umem {
>  	struct ib_device       *ibdev;
>  	struct mm_struct       *owning_mm;
> +	refcount_t		refcount;
>  	u64 iova;
>  	size_t			length;
>  	unsigned long		address;
> @@ -110,6 +112,12 @@ static inline bool __rdma_umem_block_iter_next(struct ib_block_iter *biter)
>  
>  struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
>  			    size_t size, int access);
> +
> +static inline void ib_umem_get_ref(struct ib_umem *umem)
> +{
> +	refcount_inc(&umem->refcount);
> +}
> +
>  void ib_umem_release(struct ib_umem *umem);
>  int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
>  		      size_t length);
> @@ -188,6 +196,7 @@ static inline struct ib_umem *ib_umem_get(struct ib_device *device,
>  {
>  	return ERR_PTR(-EOPNOTSUPP);
>  }
> +static inline void ib_umem_get_ref(struct ib_umem *umem) { }
>  static inline void ib_umem_release(struct ib_umem *umem) { }
>  static inline int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
>  		      		    size_t length) {
> -- 
> 2.51.1
> 
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem
  2026-02-03 10:03   ` Leon Romanovsky
@ 2026-02-03 10:11     ` Jiri Pirko
  2026-02-03 12:26       ` Leon Romanovsky
  0 siblings, 1 reply; 34+ messages in thread
From: Jiri Pirko @ 2026-02-03 10:11 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: linux-rdma, jgg, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

Tue, Feb 03, 2026 at 11:03:27AM +0100, leon@kernel.org wrote:
>On Tue, Feb 03, 2026 at 09:49:53AM +0100, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@nvidia.com>
>> 
>> Introduce reference counting for ib_umem objects to simplify memory
>> lifecycle management when umem buffers are shared between the core
>> layer and device drivers.
>> 
>> When the core RDMA layer allocates an ib_umem and passes it to a driver
>> (e.g., for CQ or QP creation with external buffers), both layers need
>> to manage the umem lifecycle. Without reference counting, this requires
>> complex conditional release logic to avoid double-frees on error paths
>> and leaks on success paths.
>
>This sentence doesn't align with the proposed change.

Hmm, I'm not sure why you think it does not align. It describes exactly
the code as I originally had it, without this patch in place :)

>
>> 
>> With reference counting:
>> - Core allocates umem with refcount=1
>> - Driver calls ib_umem_get_ref() to take a reference
>> - Both layers can unconditionally call ib_umem_release()
>> - The umem is freed only when the last reference is dropped
>> 
>> Signed-off-by: Jiri Pirko <jiri@nvidia.com>
>> Change-Id: Ifb1765ea3b14dab3329294633ea5df063c74420d
>
>Please remove the Change-Ids and write the commit message yourself,
>without relying on AI. The current message provides no meaningful

I'm new to RDMA. Not sure what you mean by meaningful information :)
I'm always trying to provide it.

>information, particularly the auto‑generated summary at the end.

Doh, the changeIDs :) Sorry about that.


>
>Thanks
>
>> ---
>>  drivers/infiniband/core/umem.c        | 5 +++++
>>  drivers/infiniband/core/umem_dmabuf.c | 1 +
>>  drivers/infiniband/core/umem_odp.c    | 3 +++
>>  include/rdma/ib_umem.h                | 9 +++++++++
>>  4 files changed, 18 insertions(+)
>> 
>> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>> index 8137031c2a65..09ce694d66ea 100644
>> --- a/drivers/infiniband/core/umem.c
>> +++ b/drivers/infiniband/core/umem.c
>> @@ -192,6 +192,7 @@ struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
>>  	umem = kzalloc(sizeof(*umem), GFP_KERNEL);
>>  	if (!umem)
>>  		return ERR_PTR(-ENOMEM);
>> +	refcount_set(&umem->refcount, 1);
>>  	umem->ibdev      = device;
>>  	umem->length     = size;
>>  	umem->address    = addr;
>> @@ -280,11 +281,15 @@ EXPORT_SYMBOL(ib_umem_get);
>>  /**
>>   * ib_umem_release - release memory pinned with ib_umem_get
>>   * @umem: umem struct to release
>> + *
>> + * Decrement the reference count and free the umem when it reaches zero.
>>   */
>>  void ib_umem_release(struct ib_umem *umem)
>>  {
>>  	if (!umem)
>>  		return;
>> +	if (!refcount_dec_and_test(&umem->refcount))
>> +		return;
>>  	if (umem->is_dmabuf)
>>  		return ib_umem_dmabuf_release(to_ib_umem_dmabuf(umem));
>>  	if (umem->is_odp)
>> diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c
>> index 939da49b0dcc..5c5286092fca 100644
>> --- a/drivers/infiniband/core/umem_dmabuf.c
>> +++ b/drivers/infiniband/core/umem_dmabuf.c
>> @@ -143,6 +143,7 @@ ib_umem_dmabuf_get_with_dma_device(struct ib_device *device,
>>  	}
>>  
>>  	umem = &umem_dmabuf->umem;
>> +	refcount_set(&umem->refcount, 1);
>>  	umem->ibdev = device;
>>  	umem->length = size;
>>  	umem->address = offset;
>> diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
>> index 572a91a62a7b..7be30fda57e3 100644
>> --- a/drivers/infiniband/core/umem_odp.c
>> +++ b/drivers/infiniband/core/umem_odp.c
>> @@ -144,6 +144,7 @@ struct ib_umem_odp *ib_umem_odp_alloc_implicit(struct ib_device *device,
>>  	if (!umem_odp)
>>  		return ERR_PTR(-ENOMEM);
>>  	umem = &umem_odp->umem;
>> +	refcount_set(&umem->refcount, 1);
>>  	umem->ibdev = device;
>>  	umem->writable = ib_access_writable(access);
>>  	umem->owning_mm = current->mm;
>> @@ -185,6 +186,7 @@ ib_umem_odp_alloc_child(struct ib_umem_odp *root, unsigned long addr,
>>  	if (!odp_data)
>>  		return ERR_PTR(-ENOMEM);
>>  	umem = &odp_data->umem;
>> +	refcount_set(&umem->refcount, 1);
>>  	umem->ibdev = root->umem.ibdev;
>>  	umem->length     = size;
>>  	umem->address    = addr;
>> @@ -245,6 +247,7 @@ struct ib_umem_odp *ib_umem_odp_get(struct ib_device *device,
>>  	if (!umem_odp)
>>  		return ERR_PTR(-ENOMEM);
>>  
>> +	refcount_set(&umem_odp->umem.refcount, 1);
>>  	umem_odp->umem.ibdev = device;
>>  	umem_odp->umem.length = size;
>>  	umem_odp->umem.address = addr;
>> diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
>> index 0a8e092c0ea8..44264f78eab3 100644
>> --- a/include/rdma/ib_umem.h
>> +++ b/include/rdma/ib_umem.h
>> @@ -10,6 +10,7 @@
>>  #include <linux/list.h>
>>  #include <linux/scatterlist.h>
>>  #include <linux/workqueue.h>
>> +#include <linux/refcount.h>
>>  #include <rdma/ib_verbs.h>
>>  
>>  struct ib_ucontext;
>> @@ -19,6 +20,7 @@ struct dma_buf_attach_ops;
>>  struct ib_umem {
>>  	struct ib_device       *ibdev;
>>  	struct mm_struct       *owning_mm;
>> +	refcount_t		refcount;
>>  	u64 iova;
>>  	size_t			length;
>>  	unsigned long		address;
>> @@ -110,6 +112,12 @@ static inline bool __rdma_umem_block_iter_next(struct ib_block_iter *biter)
>>  
>>  struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
>>  			    size_t size, int access);
>> +
>> +static inline void ib_umem_get_ref(struct ib_umem *umem)
>> +{
>> +	refcount_inc(&umem->refcount);
>> +}
>> +
>>  void ib_umem_release(struct ib_umem *umem);
>>  int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
>>  		      size_t length);
>> @@ -188,6 +196,7 @@ static inline struct ib_umem *ib_umem_get(struct ib_device *device,
>>  {
>>  	return ERR_PTR(-EOPNOTSUPP);
>>  }
>> +static inline void ib_umem_get_ref(struct ib_umem *umem) { }
>>  static inline void ib_umem_release(struct ib_umem *umem) { }
>>  static inline int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
>>  		      		    size_t length) {
>> -- 
>> 2.51.1
>> 
>> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rdma-next 00/10] RDMA: Extend uverbs umem support for QP buffers and doorbell records
  2026-02-03  9:59 ` [PATCH rdma-next 00/10] RDMA: Extend uverbs umem support for QP buffers and doorbell records Leon Romanovsky
@ 2026-02-03 10:13   ` Jiri Pirko
  0 siblings, 0 replies; 34+ messages in thread
From: Jiri Pirko @ 2026-02-03 10:13 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: linux-rdma, jgg, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

Tue, Feb 03, 2026 at 10:59:33AM +0100, leon@kernel.org wrote:
>On Tue, Feb 03, 2026 at 09:49:52AM +0100, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@nvidia.com>
>> 
>> This patchset extends the existing CQ umem buffer infrastructure to QP
>> creation and adds doorbell record (DBR) umem support for both CQ and QP.
>> 
>> The kernel already supports CQ creation with externally provided umem
>> buffers. This patchset:
>> 
>> 1. Adds umem reference counting to simplify memory lifecycle management.
>> 2. Adds mlx5 driver implementation for the existing CQ buffer umem support.
>> 3. Extends QP creation to support external SQ/RQ buffer umems.
>> 4. Adds doorbell record umem support for both CQ and QP.
>> 5. Implements all extensions in mlx5 driver.
>> 
>> This enables integration with dmabuf exporters for specialized memory
>> types, such as GPU memory for GPU-direct RDMA scenarios.
>
>What do you mean by that?
>
>Are you referring to the RDMA dma‑buf exporter from Yishai?  
>The GPU dma‑buf exporters already work correctly with the existing  
>RDMA importer.

Basically it allows you to use any DMA-buf exported memory for CQ/QP/DBR
purposes. Currently, only MR works with DMA-buf exported memory.
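The sharing pattern this refers to, where the core allocates a umem and a CQ/QP/DBR consumer holds it as well, can be sketched as a small userspace model. This is plain C, and the `model_*` names are hypothetical stand-ins for `ib_umem_get()`, `ib_umem_get_ref()` and `ib_umem_release()` from patch #1, not kernel code:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Userspace model of the proposed ib_umem refcounting. The "model_"
 * names are hypothetical stand-ins for the kernel helpers.
 */
struct model_umem {
	int refcount;
};

/* Models ib_umem_get(): the core allocates the umem with refcount 1. */
static struct model_umem *model_umem_get(void)
{
	struct model_umem *umem = calloc(1, sizeof(*umem));

	if (!umem)
		return NULL;
	umem->refcount = 1;
	return umem;
}

/* Models ib_umem_get_ref(): another consumer (CQ/QP/DBR) pins the umem. */
static void model_umem_get_ref(struct model_umem *umem)
{
	umem->refcount++;
}

/* Models ib_umem_release(): frees only on the last put; returns 1 if freed. */
static int model_umem_release(struct model_umem *umem)
{
	if (!umem)
		return 0;
	if (--umem->refcount > 0)
		return 0;
	free(umem);
	return 1;
}
```

Either holder can drop its reference first; the memory goes away only on the last put.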

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem
  2026-02-03 10:11     ` Jiri Pirko
@ 2026-02-03 12:26       ` Leon Romanovsky
  2026-02-03 12:46         ` Jiri Pirko
  0 siblings, 1 reply; 34+ messages in thread
From: Leon Romanovsky @ 2026-02-03 12:26 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: linux-rdma, jgg, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

On Tue, Feb 03, 2026 at 11:11:39AM +0100, Jiri Pirko wrote:
> Tue, Feb 03, 2026 at 11:03:27AM +0100, leon@kernel.org wrote:
> >On Tue, Feb 03, 2026 at 09:49:53AM +0100, Jiri Pirko wrote:
> >> From: Jiri Pirko <jiri@nvidia.com>
> >> 
> >> Introduce reference counting for ib_umem objects to simplify memory
> >> lifecycle management when umem buffers are shared between the core
> >> layer and device drivers.

The lifecycle should be confined either to the core or to the driver,
but it should not mix responsibilities across both.

> >> 
> >> When the core RDMA layer allocates an ib_umem and passes it to a driver
> >> (e.g., for CQ or QP creation with external buffers), both layers need
> >> to manage the umem lifecycle. Without reference counting, this requires
> >> complex conditional release logic to avoid double-frees on error paths
> >> and leaks on success paths.
> >
> >This sentence doesn't align with the proposed change.
> 
> Hmm, I'm not sure why you think it does not align. It exactly describes
> the code I originally had, without this patch in place :)

There is no complex conditional release logic which this reference
counting solves. That counting allows delayed, semi-random release,
nothing more.

> 
> >
> >> 
> >> With reference counting:
> >> - Core allocates umem with refcount=1
> >> - Driver calls ib_umem_get_ref() to take a reference
> >> - Both layers can unconditionally call ib_umem_release()
> >> - The umem is freed only when the last reference is dropped
> >> 
> >> Signed-off-by: Jiri Pirko <jiri@nvidia.com>
> >> Change-Id: Ifb1765ea3b14dab3329294633ea5df063c74420d
> >
> >Please remove the Change-Ids and write the commit message yourself,
> >without relying on AI. The current message provides no meaningful
> 
> I'm new to RDMA. Not sure what you mean by meaningful information :)

This part of the commit message is clearly generated by Cursor:
With reference counting:
- Core allocates umem with refcount=1
- Driver calls ib_umem_get_ref() to take a reference
- Both layers can unconditionally call ib_umem_release()
- The umem is freed only when the last reference is dropped

The above paragraphs have a clear AI pattern, as they don't explain why,
but only how.

I'm seeing the SAME commit messages in many internal and external
patches.

Even my AI review agrees with me :)
...
"AI-authorship-score": "medium"
...

> I'm always trying to provide it.
> 
> >information, particularly the auto‑generated summary at the end.
> 
> Doh, the changeIDs :) Sorry about that.
> 
> 
> >
> >Thanks
> >

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem
  2026-02-03 12:26       ` Leon Romanovsky
@ 2026-02-03 12:46         ` Jiri Pirko
  2026-02-03 13:03           ` Leon Romanovsky
  0 siblings, 1 reply; 34+ messages in thread
From: Jiri Pirko @ 2026-02-03 12:46 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: linux-rdma, jgg, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

Tue, Feb 03, 2026 at 01:26:18PM +0100, leon@kernel.org wrote:
>On Tue, Feb 03, 2026 at 11:11:39AM +0100, Jiri Pirko wrote:
>> Tue, Feb 03, 2026 at 11:03:27AM +0100, leon@kernel.org wrote:
>> >On Tue, Feb 03, 2026 at 09:49:53AM +0100, Jiri Pirko wrote:
>> >> From: Jiri Pirko <jiri@nvidia.com>
>> >> 
>> >> Introduce reference counting for ib_umem objects to simplify memory
>> >> lifecycle management when umem buffers are shared between the core
>> >> layer and device drivers.
>
>The lifecycle should be confined either to the core or to the driver,
>but it should not mix responsibilities across both.

Okay, I can re-phrase. The point is, the last reference holder actually
does the release.


>
>> >> 
>> >> When the core RDMA layer allocates an ib_umem and passes it to a driver
>> >> (e.g., for CQ or QP creation with external buffers), both layers need
>> >> to manage the umem lifecycle. Without reference counting, this requires
>> >> complex conditional release logic to avoid double-frees on error paths
>> >> and leaks on success paths.
>> >
>> >This sentence doesn't align with the proposed change.
>> 
>> Hmm, I'm not sure why you think it does not align. It exactly describes
>> the code I originally had, without this patch in place :)
>
>There is no complex conditional release logic which this reference
>counting solves. That counting allows delayed, semi-random release,
>nothing more.

Again. Without the refcount, you have to make sure the umem is consumed
by the op. Actually, check the existing create_cq_umem. On the error
path, the umem is released by the caller. On the success path, the
ownership is transferred to the callee. Consider the various error paths
in the callee, some of which are shared with the destroy_cq op, some
umems are not actually used, etc. It's a nightmare. I had the code
written like this originally, that's why I actually know.

Perhaps I'm missing your point. Is it just about the patch description
or about the code itself?
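The error-path situation described here can be sketched with a userspace model. This is plain C, and the `toy_*` names are hypothetical illustrations, not the actual kernel helpers: the create op takes its own reference and drops it on failure, the caller drops its reference unconditionally, and the object is freed exactly once.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Userspace model of the error path: caller creates the object, the
 * create op pins it, fails mid-way and cleans up, then the caller
 * cleans up too. The "toy_" names are hypothetical stand-ins.
 */
struct toy_umem {
	int refcount;
	int *freed;	/* out-parameter so the free can be observed */
};

static struct toy_umem *toy_umem_get(int *freed)
{
	struct toy_umem *umem = calloc(1, sizeof(*umem));

	if (!umem)
		return NULL;
	umem->refcount = 1;
	umem->freed = freed;
	return umem;
}

static void toy_umem_get_ref(struct toy_umem *umem)
{
	umem->refcount++;
}

static void toy_umem_release(struct toy_umem *umem)
{
	if (!umem || --umem->refcount > 0)
		return;
	*umem->freed = 1;
	free(umem);
}

/* Models a create op that pins the umem, fails mid-way and cleans up. */
static int toy_create_op(struct toy_umem *umem)
{
	toy_umem_get_ref(umem);
	/* ... provisioning fails somewhere in the middle ... */
	toy_umem_release(umem);	/* callee cleanup: unconditional put */
	return -1;
}
```

With the refcount, neither side needs to know whether the other already released: both put unconditionally and no double-free or leak can occur.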


>
>> 
>> >
>> >> 
>> >> With reference counting:
>> >> - Core allocates umem with refcount=1
>> >> - Driver calls ib_umem_get_ref() to take a reference
>> >> - Both layers can unconditionally call ib_umem_release()
>> >> - The umem is freed only when the last reference is dropped
>> >> 
>> >> Signed-off-by: Jiri Pirko <jiri@nvidia.com>
>> >> Change-Id: Ifb1765ea3b14dab3329294633ea5df063c74420d
>> >
>> >Please remove the Change-Ids and write the commit message yourself,
>> >without relying on AI. The current message provides no meaningful
>> 
>> I'm new to RDMA. Not sure what you mean by meaningful information :)
>
>This part of the commit message is clearly generated by Cursor:
>With reference counting:
>- Core allocates umem with refcount=1
>- Driver calls ib_umem_get_ref() to take a reference
>- Both layers can unconditionally call ib_umem_release()
>- The umem is freed only when the last reference is dropped
>
>The above paragraphs have a clear AI pattern, as they don't explain why,
>but only how.

The why is explained above, isn't it?
If you don't want the "how part", I can remove it, no problem.


>
>I'm seeing the SAME commit messages in many internal and external
>patches.
>
>Even my AI review agrees with me :)
>...
>"AI-authorship-score": "medium"
>...
>
>> I'm always trying to provide it.
>> 
>> >information, particularly the auto‑generated summary at the end.
>> 
>> Doh, the changeIDs :) Sorry about that.
>> 
>> 
>> >
>> >Thanks
>> >

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem
  2026-02-03 12:46         ` Jiri Pirko
@ 2026-02-03 13:03           ` Leon Romanovsky
  2026-02-03 13:20             ` Jiri Pirko
  0 siblings, 1 reply; 34+ messages in thread
From: Leon Romanovsky @ 2026-02-03 13:03 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: linux-rdma, jgg, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

On Tue, Feb 03, 2026 at 01:46:21PM +0100, Jiri Pirko wrote:
> Tue, Feb 03, 2026 at 01:26:18PM +0100, leon@kernel.org wrote:
> >On Tue, Feb 03, 2026 at 11:11:39AM +0100, Jiri Pirko wrote:
> >> Tue, Feb 03, 2026 at 11:03:27AM +0100, leon@kernel.org wrote:
> >> >On Tue, Feb 03, 2026 at 09:49:53AM +0100, Jiri Pirko wrote:
> >> >> From: Jiri Pirko <jiri@nvidia.com>
> >> >> 
> >> >> Introduce reference counting for ib_umem objects to simplify memory
> >> >> lifecycle management when umem buffers are shared between the core
> >> >> layer and device drivers.
> >
> >The lifecycle should be confined either to the core or to the driver,
> >but it should not mix responsibilities across both.
> 
> Okay, I can re-phrase. The point is, the last reference holder actually
> does the release.
> 
> 
> >
> >> >> 
> >> >> When the core RDMA layer allocates an ib_umem and passes it to a driver
> >> >> (e.g., for CQ or QP creation with external buffers), both layers need
> >> >> to manage the umem lifecycle. Without reference counting, this requires
> >> >> complex conditional release logic to avoid double-frees on error paths
> >> >> and leaks on success paths.
> >> >
> >> >This sentence doesn't align with the proposed change.
> >> 
> >> Hmm, I'm not sure why you think it does not align. It exactly describes
> >> the code I originally had, without this patch in place :)
> >
> >There is no complex conditional release logic which this reference
> >counting solves. That counting allows delayed, semi-random release,
> >nothing more.
> 
> Again. Without the refcount, you have to make sure the umem is consumed
> by the op. Actually, check the existing create_cq_umem. On the error
> path, the umem is released by the caller. On the success path, the
> ownership is transferred to the callee.

We need to fix it. Right now, I'm working to make sure that the umem
is managed by ib_core and drivers don't call ib_umem_get() at all.

> Consider the various error paths in the callee, some of which are
> shared with the destroy_cq op, some umems are not actually used, etc.
> It's a nightmare. I had the code written like this originally, that's
> why I actually know.
> 
> Perhaps I'm missing your point. Is it just about the patch description
> or about the code itself?

Description and the code. UMEMs need to be created by ib_core and
ib_core should destroy them.

> 
> 
> >
> >> 
> >> >
> >> >> 
> >> >> With reference counting:
> >> >> - Core allocates umem with refcount=1
> >> >> - Driver calls ib_umem_get_ref() to take a reference
> >> >> - Both layers can unconditionally call ib_umem_release()
> >> >> - The umem is freed only when the last reference is dropped
> >> >> 
> >> >> Signed-off-by: Jiri Pirko <jiri@nvidia.com>
> >> >> Change-Id: Ifb1765ea3b14dab3329294633ea5df063c74420d
> >> >
> >> >Please remove the Change-Ids and write the commit message yourself,
> >> >without relying on AI. The current message provides no meaningful
> >> 
> >> I'm new to RDMA. Not sure what you mean by meaningful information :)
> >
> >This part of the commit message is clearly generated by Cursor:
> >With reference counting:
> >- Core allocates umem with refcount=1
> >- Driver calls ib_umem_get_ref() to take a reference
> >- Both layers can unconditionally call ib_umem_release()
> >- The umem is freed only when the last reference is dropped
> >
> >The above paragraphs have a clear AI pattern, as they don't explain why,
> >but only how.
> 
> The why is explained above, isn't it?
> If you don't want the "how part", I can remove it, no problem.

A commit message should provide additional information which is not
available in the code itself. A description like "Core allocates umem
with refcount=1" has zero value.

Thanks

> 
> 
> >
> >I'm seeing the SAME commit messages in many internal and external
> >patches.
> >
> >Even my AI review agrees with me :)
> >...
> >"AI-authorship-score": "medium"
> >...
> >
> >> I'm always trying to provide it.
> >> 
> >> >information, particularly the auto‑generated summary at the end.
> >> 
> >> Doh, the changeIDs :) Sorry about that.
> >> 
> >> 
> >> >
> >> >Thanks
> >> >

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem
  2026-02-03 13:03           ` Leon Romanovsky
@ 2026-02-03 13:20             ` Jiri Pirko
  2026-02-03 13:32               ` Leon Romanovsky
  2026-02-03 13:49               ` Sriharsha Basavapatna
  0 siblings, 2 replies; 34+ messages in thread
From: Jiri Pirko @ 2026-02-03 13:20 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: linux-rdma, jgg, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

Tue, Feb 03, 2026 at 02:03:35PM +0100, leon@kernel.org wrote:
>On Tue, Feb 03, 2026 at 01:46:21PM +0100, Jiri Pirko wrote:
>> Tue, Feb 03, 2026 at 01:26:18PM +0100, leon@kernel.org wrote:
>> >On Tue, Feb 03, 2026 at 11:11:39AM +0100, Jiri Pirko wrote:
>> >> Tue, Feb 03, 2026 at 11:03:27AM +0100, leon@kernel.org wrote:
>> >> >On Tue, Feb 03, 2026 at 09:49:53AM +0100, Jiri Pirko wrote:
>> >> >> From: Jiri Pirko <jiri@nvidia.com>
>> >> >> 
>> >> >> Introduce reference counting for ib_umem objects to simplify memory
>> >> >> lifecycle management when umem buffers are shared between the core
>> >> >> layer and device drivers.
>> >
>> >The lifecycle should be confined either to the core or to the driver,
>> >but it should not mix responsibilities across both.
>> 
>> Okay, I can re-phrase. The point is, the holder of the last reference
>> actually does the release.
>> 
>> 
>> >
>> >> >> 
>> >> >> When the core RDMA layer allocates an ib_umem and passes it to a driver
>> >> >> (e.g., for CQ or QP creation with external buffers), both layers need
>> >> >> to manage the umem lifecycle. Without reference counting, this requires
>> >> >> complex conditional release logic to avoid double-frees on error paths
>> >> >> and leaks on success paths.
>> >> >
>> >> >This sentence doesn't align with the proposed change.
>> >> 
>> >> Hmm, I'm not sure why you think it does not align. It exactly describes
>> >> the code I had originally, without this patch in place :)
>> >
>> >There is no complex conditional release logic which this reference
>> >counting solves. That counting allows delayed, semi-random release,
>> >nothing more.
>> 
>> Again. Without the refcount, you have to make sure the umem is consumed
>> by the op. Actually, check the existing create_cq_umem. On the error
>> path, the umem is released by the caller. On the success path, ownership
>> is transferred to the callee.
>
>We need to fix it. Exactly right now, I'm working to make sure that umem
>is managed by ib_core and drivers don't call ib_umem_get() at all.

Should I wait and rebase?


>
>> Consider the various error paths in the callee,
>> some of which are shared with the destroy_cq op, some umems are not actually
>> used, etc.; it's a nightmare. I had the code written down like this
>> originally, that's why I actually know.
>> 
>> Perhaps I'm missing your point. Is it just about the patch description or
>> about the code itself?
>
>Description and the code. UMEM needs to be created by ib_core and
>ib_core should destroy them.
>
>> 
>> 
>> >
>> >> 
>> >> >
>> >> >> 
>> >> >> With reference counting:
>> >> >> - Core allocates umem with refcount=1
>> >> >> - Driver calls ib_umem_get_ref() to take a reference
>> >> >> - Both layers can unconditionally call ib_umem_release()
>> >> >> - The umem is freed only when the last reference is dropped
>> >> >> 
>> >> >> Signed-off-by: Jiri Pirko <jiri@nvidia.com>
>> >> >> Change-Id: Ifb1765ea3b14dab3329294633ea5df063c74420d
>> >> >
>> >> >Please remove the Change-Ids and write the commit message yourself,
>> >> >without relying on AI. The current message provides no meaningful
>> >> 
>> >> I'm new in RDMA. Not sure what you mean by meaningful information :)
>> >
>> >This part of commit message is clearly generated by Cursor:
>> >With reference counting:
>> >- Core allocates umem with refcount=1
>> >- Driver calls ib_umem_get_ref() to take a reference
>> >- Both layers can unconditionally call ib_umem_release()
>> >- The umem is freed only when the last reference is dropped
>> >
>> >The above paragraphs have a clear AI pattern, as they don't explain why,
>> >but only how.
>> 
>> Why is explained above, isn't it?
>> If you don't want the "how part", I can remove it, no problem.
>
>Commit message should provide an additional information, which is not
>available in the code itself. Description like "Core allocates umem with
>refcount=1" has zero value.

:), sure


>
>Thanks
>
>> 
>> 
>> >
>> >I'm seeing the SAME commit messages in many internal and external
>> >patches.
>> >
>> >Even my AI review agrees with me :)
>> >...
>> >"AI-authorship-score": "medium"
>> >...
>> >
>> >> I'm always trying to provide it.
>> >> 
>> >> >information, particularly the auto‑generated summary at the end.
>> >> 
>> >> Doh, the changeIDs :) Sorry about that.
>> >> 
>> >> 
>> >> >
>> >> >Thanks
>> >> >
>> >> >> ---
>> >> >>  drivers/infiniband/core/umem.c        | 5 +++++
>> >> >>  drivers/infiniband/core/umem_dmabuf.c | 1 +
>> >> >>  drivers/infiniband/core/umem_odp.c    | 3 +++
>> >> >>  include/rdma/ib_umem.h                | 9 +++++++++
>> >> >>  4 files changed, 18 insertions(+)
>> >> >> 
>> >> >> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>> >> >> index 8137031c2a65..09ce694d66ea 100644
>> >> >> --- a/drivers/infiniband/core/umem.c
>> >> >> +++ b/drivers/infiniband/core/umem.c
>> >> >> @@ -192,6 +192,7 @@ struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
>> >> >>  	umem = kzalloc(sizeof(*umem), GFP_KERNEL);
>> >> >>  	if (!umem)
>> >> >>  		return ERR_PTR(-ENOMEM);
>> >> >> +	refcount_set(&umem->refcount, 1);
>> >> >>  	umem->ibdev      = device;
>> >> >>  	umem->length     = size;
>> >> >>  	umem->address    = addr;
>> >> >> @@ -280,11 +281,15 @@ EXPORT_SYMBOL(ib_umem_get);
>> >> >>  /**
>> >> >>   * ib_umem_release - release memory pinned with ib_umem_get
>> >> >>   * @umem: umem struct to release
>> >> >> + *
>> >> >> + * Decrement the reference count and free the umem when it reaches zero.
>> >> >>   */
>> >> >>  void ib_umem_release(struct ib_umem *umem)
>> >> >>  {
>> >> >>  	if (!umem)
>> >> >>  		return;
>> >> >> +	if (!refcount_dec_and_test(&umem->refcount))
>> >> >> +		return;
>> >> >>  	if (umem->is_dmabuf)
>> >> >>  		return ib_umem_dmabuf_release(to_ib_umem_dmabuf(umem));
>> >> >>  	if (umem->is_odp)
>> >> >> diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c
>> >> >> index 939da49b0dcc..5c5286092fca 100644
>> >> >> --- a/drivers/infiniband/core/umem_dmabuf.c
>> >> >> +++ b/drivers/infiniband/core/umem_dmabuf.c
>> >> >> @@ -143,6 +143,7 @@ ib_umem_dmabuf_get_with_dma_device(struct ib_device *device,
>> >> >>  	}
>> >> >>  
>> >> >>  	umem = &umem_dmabuf->umem;
>> >> >> +	refcount_set(&umem->refcount, 1);
>> >> >>  	umem->ibdev = device;
>> >> >>  	umem->length = size;
>> >> >>  	umem->address = offset;
>> >> >> diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
>> >> >> index 572a91a62a7b..7be30fda57e3 100644
>> >> >> --- a/drivers/infiniband/core/umem_odp.c
>> >> >> +++ b/drivers/infiniband/core/umem_odp.c
>> >> >> @@ -144,6 +144,7 @@ struct ib_umem_odp *ib_umem_odp_alloc_implicit(struct ib_device *device,
>> >> >>  	if (!umem_odp)
>> >> >>  		return ERR_PTR(-ENOMEM);
>> >> >>  	umem = &umem_odp->umem;
>> >> >> +	refcount_set(&umem->refcount, 1);
>> >> >>  	umem->ibdev = device;
>> >> >>  	umem->writable = ib_access_writable(access);
>> >> >>  	umem->owning_mm = current->mm;
>> >> >> @@ -185,6 +186,7 @@ ib_umem_odp_alloc_child(struct ib_umem_odp *root, unsigned long addr,
>> >> >>  	if (!odp_data)
>> >> >>  		return ERR_PTR(-ENOMEM);
>> >> >>  	umem = &odp_data->umem;
>> >> >> +	refcount_set(&umem->refcount, 1);
>> >> >>  	umem->ibdev = root->umem.ibdev;
>> >> >>  	umem->length     = size;
>> >> >>  	umem->address    = addr;
>> >> >> @@ -245,6 +247,7 @@ struct ib_umem_odp *ib_umem_odp_get(struct ib_device *device,
>> >> >>  	if (!umem_odp)
>> >> >>  		return ERR_PTR(-ENOMEM);
>> >> >>  
>> >> >> +	refcount_set(&umem_odp->umem.refcount, 1);
>> >> >>  	umem_odp->umem.ibdev = device;
>> >> >>  	umem_odp->umem.length = size;
>> >> >>  	umem_odp->umem.address = addr;
>> >> >> diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
>> >> >> index 0a8e092c0ea8..44264f78eab3 100644
>> >> >> --- a/include/rdma/ib_umem.h
>> >> >> +++ b/include/rdma/ib_umem.h
>> >> >> @@ -10,6 +10,7 @@
>> >> >>  #include <linux/list.h>
>> >> >>  #include <linux/scatterlist.h>
>> >> >>  #include <linux/workqueue.h>
>> >> >> +#include <linux/refcount.h>
>> >> >>  #include <rdma/ib_verbs.h>
>> >> >>  
>> >> >>  struct ib_ucontext;
>> >> >> @@ -19,6 +20,7 @@ struct dma_buf_attach_ops;
>> >> >>  struct ib_umem {
>> >> >>  	struct ib_device       *ibdev;
>> >> >>  	struct mm_struct       *owning_mm;
>> >> >> +	refcount_t		refcount;
>> >> >>  	u64 iova;
>> >> >>  	size_t			length;
>> >> >>  	unsigned long		address;
>> >> >> @@ -110,6 +112,12 @@ static inline bool __rdma_umem_block_iter_next(struct ib_block_iter *biter)
>> >> >>  
>> >> >>  struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
>> >> >>  			    size_t size, int access);
>> >> >> +
>> >> >> +static inline void ib_umem_get_ref(struct ib_umem *umem)
>> >> >> +{
>> >> >> +	refcount_inc(&umem->refcount);
>> >> >> +}
>> >> >> +
>> >> >>  void ib_umem_release(struct ib_umem *umem);
>> >> >>  int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
>> >> >>  		      size_t length);
>> >> >> @@ -188,6 +196,7 @@ static inline struct ib_umem *ib_umem_get(struct ib_device *device,
>> >> >>  {
>> >> >>  	return ERR_PTR(-EOPNOTSUPP);
>> >> >>  }
>> >> >> +static inline void ib_umem_get_ref(struct ib_umem *umem) { }
>> >> >>  static inline void ib_umem_release(struct ib_umem *umem) { }
>> >> >>  static inline int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
>> >> >>  		      		    size_t length) {
>> >> >> -- 
>> >> >> 2.51.1
>> >> >> 
>> >> >> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem
  2026-02-03 13:20             ` Jiri Pirko
@ 2026-02-03 13:32               ` Leon Romanovsky
  2026-02-03 14:31                 ` Jiri Pirko
  2026-02-03 13:49               ` Sriharsha Basavapatna
  1 sibling, 1 reply; 34+ messages in thread
From: Leon Romanovsky @ 2026-02-03 13:32 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: linux-rdma, jgg, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

On Tue, Feb 03, 2026 at 02:20:45PM +0100, Jiri Pirko wrote:
> Tue, Feb 03, 2026 at 02:03:35PM +0100, leon@kernel.org wrote:
> >On Tue, Feb 03, 2026 at 01:46:21PM +0100, Jiri Pirko wrote:
> >> Tue, Feb 03, 2026 at 01:26:18PM +0100, leon@kernel.org wrote:
> >> >On Tue, Feb 03, 2026 at 11:11:39AM +0100, Jiri Pirko wrote:
> >> >> Tue, Feb 03, 2026 at 11:03:27AM +0100, leon@kernel.org wrote:
> >> >> >On Tue, Feb 03, 2026 at 09:49:53AM +0100, Jiri Pirko wrote:
> >> >> >> From: Jiri Pirko <jiri@nvidia.com>
> >> >> >> 
> >> >> >> Introduce reference counting for ib_umem objects to simplify memory
> >> >> >> lifecycle management when umem buffers are shared between the core
> >> >> >> layer and device drivers.
> >> >
> >> >The lifecycle should be confined either to the core or to the driver,
> >> >but it should not mix responsibilities across both.
> >> 
> >> Okay, I can re-phrase. The point is, the holder of the last reference
> >> actually does the release.
> >> 
> >> 
> >> >
> >> >> >> 
> >> >> >> When the core RDMA layer allocates an ib_umem and passes it to a driver
> >> >> >> (e.g., for CQ or QP creation with external buffers), both layers need
> >> >> >> to manage the umem lifecycle. Without reference counting, this requires
> >> >> >> complex conditional release logic to avoid double-frees on error paths
> >> >> >> and leaks on success paths.
> >> >> >
> >> >> >This sentence doesn't align with the proposed change.
> >> >> 
> >> >> Hmm, I'm not sure why you think it does not align. It exactly describes
> >> >> the code I had originally, without this patch in place :)
> >> >
> >> >There is no complex conditional release logic which this reference
> >> >counting solves. That counting allows delayed, semi-random release,
> >> >nothing more.
> >> 
> >> Again. Without the refcount, you have to make sure the umem is consumed
> >> by the op. Actually, check the existing create_cq_umem. On the error
> >> path, the umem is released by the caller. On the success path, ownership
> >> is transferred to the callee.
> >
> >We need to fix it. Exactly right now, I'm working to make sure that umem
> >is managed by ib_core and drivers don't call ib_umem_get() at all.
> 
> Should I wait and rebase?

It depends on your timeline. We are in -rc8 right now, so for at least the
next 3 weeks (one week till 6.19 + a two-week merge window), your
series won't be applied.

Thanks

> 
> 
> >
> >> Consider the various error paths in the callee,
> >> some of which are shared with the destroy_cq op, some umems are not actually
> >> used, etc.; it's a nightmare. I had the code written down like this
> >> originally, that's why I actually know.
> >> 
> >> Perhaps I'm missing your point. Is it just about the patch description or
> >> about the code itself?
> >
> >Description and the code. UMEM needs to be created by ib_core and
> >ib_core should destroy them.
> >
> >> 
> >> 
> >> >
> >> >> 
> >> >> >
> >> >> >> 
> >> >> >> With reference counting:
> >> >> >> - Core allocates umem with refcount=1
> >> >> >> - Driver calls ib_umem_get_ref() to take a reference
> >> >> >> - Both layers can unconditionally call ib_umem_release()
> >> >> >> - The umem is freed only when the last reference is dropped
> >> >> >> 
> >> >> >> Signed-off-by: Jiri Pirko <jiri@nvidia.com>
> >> >> >> Change-Id: Ifb1765ea3b14dab3329294633ea5df063c74420d
> >> >> >
> >> >> >Please remove the Change-Ids and write the commit message yourself,
> >> >> >without relying on AI. The current message provides no meaningful
> >> >> 
> >> >> I'm new in RDMA. Not sure what you mean by meaningful information :)
> >> >
> >> >This part of commit message is clearly generated by Cursor:
> >> >With reference counting:
> >> >- Core allocates umem with refcount=1
> >> >- Driver calls ib_umem_get_ref() to take a reference
> >> >- Both layers can unconditionally call ib_umem_release()
> >> >- The umem is freed only when the last reference is dropped
> >> >
> >> >The above paragraphs have a clear AI pattern, as they don't explain why,
> >> >but only how.
> >> 
> >> Why is explained above, isn't it?
> >> If you don't want the "how part", I can remove it, no problem.
> >
> >Commit message should provide an additional information, which is not
> >available in the code itself. Description like "Core allocates umem with
> >refcount=1" has zero value.
> 
> :), sure
> 
> 
> >
> >Thanks
> >
> >> 
> >> 
> >> >
> >> >I'm seeing the SAME commit messages in many internal and external
> >> >patches.
> >> >
> >> >Even my AI review agrees with me :)
> >> >...
> >> >"AI-authorship-score": "medium"
> >> >...
> >> >
> >> >> I'm always trying to provide it.
> >> >> 
> >> >> >information, particularly the auto‑generated summary at the end.
> >> >> 
> >> >> Doh, the changeIDs :) Sorry about that.
> >> >> 
> >> >> 
> >> >> >
> >> >> >Thanks
> >> >> >
> >> >> >> ---
> >> >> >>  drivers/infiniband/core/umem.c        | 5 +++++
> >> >> >>  drivers/infiniband/core/umem_dmabuf.c | 1 +
> >> >> >>  drivers/infiniband/core/umem_odp.c    | 3 +++
> >> >> >>  include/rdma/ib_umem.h                | 9 +++++++++
> >> >> >>  4 files changed, 18 insertions(+)
> >> >> >> 
> >> >> >> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> >> >> >> index 8137031c2a65..09ce694d66ea 100644
> >> >> >> --- a/drivers/infiniband/core/umem.c
> >> >> >> +++ b/drivers/infiniband/core/umem.c
> >> >> >> @@ -192,6 +192,7 @@ struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
> >> >> >>  	umem = kzalloc(sizeof(*umem), GFP_KERNEL);
> >> >> >>  	if (!umem)
> >> >> >>  		return ERR_PTR(-ENOMEM);
> >> >> >> +	refcount_set(&umem->refcount, 1);
> >> >> >>  	umem->ibdev      = device;
> >> >> >>  	umem->length     = size;
> >> >> >>  	umem->address    = addr;
> >> >> >> @@ -280,11 +281,15 @@ EXPORT_SYMBOL(ib_umem_get);
> >> >> >>  /**
> >> >> >>   * ib_umem_release - release memory pinned with ib_umem_get
> >> >> >>   * @umem: umem struct to release
> >> >> >> + *
> >> >> >> + * Decrement the reference count and free the umem when it reaches zero.
> >> >> >>   */
> >> >> >>  void ib_umem_release(struct ib_umem *umem)
> >> >> >>  {
> >> >> >>  	if (!umem)
> >> >> >>  		return;
> >> >> >> +	if (!refcount_dec_and_test(&umem->refcount))
> >> >> >> +		return;
> >> >> >>  	if (umem->is_dmabuf)
> >> >> >>  		return ib_umem_dmabuf_release(to_ib_umem_dmabuf(umem));
> >> >> >>  	if (umem->is_odp)
> >> >> >> diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c
> >> >> >> index 939da49b0dcc..5c5286092fca 100644
> >> >> >> --- a/drivers/infiniband/core/umem_dmabuf.c
> >> >> >> +++ b/drivers/infiniband/core/umem_dmabuf.c
> >> >> >> @@ -143,6 +143,7 @@ ib_umem_dmabuf_get_with_dma_device(struct ib_device *device,
> >> >> >>  	}
> >> >> >>  
> >> >> >>  	umem = &umem_dmabuf->umem;
> >> >> >> +	refcount_set(&umem->refcount, 1);
> >> >> >>  	umem->ibdev = device;
> >> >> >>  	umem->length = size;
> >> >> >>  	umem->address = offset;
> >> >> >> diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
> >> >> >> index 572a91a62a7b..7be30fda57e3 100644
> >> >> >> --- a/drivers/infiniband/core/umem_odp.c
> >> >> >> +++ b/drivers/infiniband/core/umem_odp.c
> >> >> >> @@ -144,6 +144,7 @@ struct ib_umem_odp *ib_umem_odp_alloc_implicit(struct ib_device *device,
> >> >> >>  	if (!umem_odp)
> >> >> >>  		return ERR_PTR(-ENOMEM);
> >> >> >>  	umem = &umem_odp->umem;
> >> >> >> +	refcount_set(&umem->refcount, 1);
> >> >> >>  	umem->ibdev = device;
> >> >> >>  	umem->writable = ib_access_writable(access);
> >> >> >>  	umem->owning_mm = current->mm;
> >> >> >> @@ -185,6 +186,7 @@ ib_umem_odp_alloc_child(struct ib_umem_odp *root, unsigned long addr,
> >> >> >>  	if (!odp_data)
> >> >> >>  		return ERR_PTR(-ENOMEM);
> >> >> >>  	umem = &odp_data->umem;
> >> >> >> +	refcount_set(&umem->refcount, 1);
> >> >> >>  	umem->ibdev = root->umem.ibdev;
> >> >> >>  	umem->length     = size;
> >> >> >>  	umem->address    = addr;
> >> >> >> @@ -245,6 +247,7 @@ struct ib_umem_odp *ib_umem_odp_get(struct ib_device *device,
> >> >> >>  	if (!umem_odp)
> >> >> >>  		return ERR_PTR(-ENOMEM);
> >> >> >>  
> >> >> >> +	refcount_set(&umem_odp->umem.refcount, 1);
> >> >> >>  	umem_odp->umem.ibdev = device;
> >> >> >>  	umem_odp->umem.length = size;
> >> >> >>  	umem_odp->umem.address = addr;
> >> >> >> diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
> >> >> >> index 0a8e092c0ea8..44264f78eab3 100644
> >> >> >> --- a/include/rdma/ib_umem.h
> >> >> >> +++ b/include/rdma/ib_umem.h
> >> >> >> @@ -10,6 +10,7 @@
> >> >> >>  #include <linux/list.h>
> >> >> >>  #include <linux/scatterlist.h>
> >> >> >>  #include <linux/workqueue.h>
> >> >> >> +#include <linux/refcount.h>
> >> >> >>  #include <rdma/ib_verbs.h>
> >> >> >>  
> >> >> >>  struct ib_ucontext;
> >> >> >> @@ -19,6 +20,7 @@ struct dma_buf_attach_ops;
> >> >> >>  struct ib_umem {
> >> >> >>  	struct ib_device       *ibdev;
> >> >> >>  	struct mm_struct       *owning_mm;
> >> >> >> +	refcount_t		refcount;
> >> >> >>  	u64 iova;
> >> >> >>  	size_t			length;
> >> >> >>  	unsigned long		address;
> >> >> >> @@ -110,6 +112,12 @@ static inline bool __rdma_umem_block_iter_next(struct ib_block_iter *biter)
> >> >> >>  
> >> >> >>  struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
> >> >> >>  			    size_t size, int access);
> >> >> >> +
> >> >> >> +static inline void ib_umem_get_ref(struct ib_umem *umem)
> >> >> >> +{
> >> >> >> +	refcount_inc(&umem->refcount);
> >> >> >> +}
> >> >> >> +
> >> >> >>  void ib_umem_release(struct ib_umem *umem);
> >> >> >>  int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
> >> >> >>  		      size_t length);
> >> >> >> @@ -188,6 +196,7 @@ static inline struct ib_umem *ib_umem_get(struct ib_device *device,
> >> >> >>  {
> >> >> >>  	return ERR_PTR(-EOPNOTSUPP);
> >> >> >>  }
> >> >> >> +static inline void ib_umem_get_ref(struct ib_umem *umem) { }
> >> >> >>  static inline void ib_umem_release(struct ib_umem *umem) { }
> >> >> >>  static inline int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
> >> >> >>  		      		    size_t length) {
> >> >> >> -- 
> >> >> >> 2.51.1
> >> >> >> 
> >> >> >> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem
  2026-02-03 13:20             ` Jiri Pirko
  2026-02-03 13:32               ` Leon Romanovsky
@ 2026-02-03 13:49               ` Sriharsha Basavapatna
  2026-02-03 14:29                 ` Jiri Pirko
  1 sibling, 1 reply; 34+ messages in thread
From: Sriharsha Basavapatna @ 2026-02-03 13:49 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Leon Romanovsky, linux-rdma, jgg, mrgolin, gal.pressman, sleybo,
	parav, mbloch, yanjun.zhu, wangliang74, marco.crivellari,
	roman.gushchin, phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	andrew.gospodarek, selvin.xavier, Sriharsha Basavapatna

[-- Attachment #1: Type: text/plain, Size: 11302 bytes --]

On Tue, Feb 3, 2026 at 6:50 PM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Tue, Feb 03, 2026 at 02:03:35PM +0100, leon@kernel.org wrote:
> >On Tue, Feb 03, 2026 at 01:46:21PM +0100, Jiri Pirko wrote:
> >> Tue, Feb 03, 2026 at 01:26:18PM +0100, leon@kernel.org wrote:
> >> >On Tue, Feb 03, 2026 at 11:11:39AM +0100, Jiri Pirko wrote:
> >> >> Tue, Feb 03, 2026 at 11:03:27AM +0100, leon@kernel.org wrote:
> >> >> >On Tue, Feb 03, 2026 at 09:49:53AM +0100, Jiri Pirko wrote:
> >> >> >> From: Jiri Pirko <jiri@nvidia.com>
> >> >> >>
> >> >> >> Introduce reference counting for ib_umem objects to simplify memory
> >> >> >> lifecycle management when umem buffers are shared between the core
> >> >> >> layer and device drivers.
> >> >
> >> >The lifecycle should be confined either to the core or to the driver,
> >> >but it should not mix responsibilities across both.
> >>
> >> Okay, I can re-phrase. The point is, the holder of the last reference
> >> actually does the release.
> >>
> >>
> >> >
> >> >> >>
> >> >> >> When the core RDMA layer allocates an ib_umem and passes it to a driver
> >> >> >> (e.g., for CQ or QP creation with external buffers), both layers need
> >> >> >> to manage the umem lifecycle. Without reference counting, this requires
> >> >> >> complex conditional release logic to avoid double-frees on error paths
> >> >> >> and leaks on success paths.
> >> >> >
> >> >> >This sentence doesn't align with the proposed change.
> >> >>
> >> >> Hmm, I'm not sure why you think it does not align. It exactly describes
> >> >> the code I had originally, without this patch in place :)
> >> >
> >> >There is no complex conditional release logic which this reference
> >> >counting solves. That counting allows delayed, semi-random release,
> >> >nothing more.
> >>
> >> Again. Without the refcount, you have to make sure the umem is consumed
> >> by the op. Actually, check the existing create_cq_umem. On the error
> >> path, the umem is released by the caller. On the success path, ownership
> >> is transferred to the callee.
> >
> >We need to fix it. Exactly right now, I'm working to make sure that umem
> >is managed by ib_core and drivers don't call ib_umem_get() at all.
>
> Should I wait and rebase?
>
>
> >
> >> Consider the various error paths in the callee,
> >> some of which are shared with the destroy_cq op, some umems are not actually
> >> used, etc.; it's a nightmare. I had the code written down like this
> >> originally, that's why I actually know.
> >>
> >> Perhaps I'm missing your point. Is it just about the patch description or
> >> about the code itself?
> >
> >Description and the code. UMEM needs to be created by ib_core and
> >ib_core should destroy them.

I'm seeing a kernel crash when perftest is stopped; I'm still
debugging it, but I'm wondering if it is related to this change. I see
this inline comment in the uverbs handler:

        /* Driver took a reference, release ours */
        ib_umem_release(rq_dbr_umem);
        ib_umem_release(sq_dbr_umem);
        ib_umem_release(rq_umem);
        ib_umem_release(sq_umem);

What does "Driver took a reference" mean? If the driver returns
success from create_qp_umem(), does the umem it is still using get
released above? Is there something the driver should call to hold
a reference? It is not obvious from the create_qp_umem() dev op.


> >
> >>
> >>
> >> >
> >> >>
> >> >> >
> >> >> >>
> >> >> >> With reference counting:
> >> >> >> - Core allocates umem with refcount=1
> >> >> >> - Driver calls ib_umem_get_ref() to take a reference
> >> >> >> - Both layers can unconditionally call ib_umem_release()
> >> >> >> - The umem is freed only when the last reference is dropped
> >> >> >>
> >> >> >> Signed-off-by: Jiri Pirko <jiri@nvidia.com>
> >> >> >> Change-Id: Ifb1765ea3b14dab3329294633ea5df063c74420d
> >> >> >
> >> >> >Please remove the Change-Ids and write the commit message yourself,
> >> >> >without relying on AI. The current message provides no meaningful
> >> >>
> >> >> I'm new in RDMA. Not sure what you mean by meaningful information :)
> >> >
> >> >This part of commit message is clearly generated by Cursor:
> >> >With reference counting:
> >> >- Core allocates umem with refcount=1
> >> >- Driver calls ib_umem_get_ref() to take a reference
> >> >- Both layers can unconditionally call ib_umem_release()
> >> >- The umem is freed only when the last reference is dropped
> >> >
> >> >The above paragraphs have a clear AI pattern, as they don't explain why,
> >> >but only how.
> >>
> >> Why is explained above, isn't it?
> >> If you don't want the "how part", I can remove it, no problem.
> >
> >Commit message should provide an additional information, which is not
> >available in the code itself. Description like "Core allocates umem with
> >refcount=1" has zero value.
>
> :), sure
>
>
> >
> >Thanks
> >
> >>
> >>
> >> >
> >> >I'm seeing the SAME commit messages in many internal and external
> >> >patches.
> >> >
> >> >Even my AI review agrees with me :)
> >> >...
> >> >"AI-authorship-score": "medium"
> >> >...
> >> >
> >> >> I'm always trying to provide it.
> >> >>
> >> >> >information, particularly the auto‑generated summary at the end.
> >> >>
> >> >> Doh, the changeIDs :) Sorry about that.
> >> >>
> >> >>
> >> >> >
> >> >> >Thanks
> >> >> >
> >> >> >> ---
> >> >> >>  drivers/infiniband/core/umem.c        | 5 +++++
> >> >> >>  drivers/infiniband/core/umem_dmabuf.c | 1 +
> >> >> >>  drivers/infiniband/core/umem_odp.c    | 3 +++
> >> >> >>  include/rdma/ib_umem.h                | 9 +++++++++
> >> >> >>  4 files changed, 18 insertions(+)
> >> >> >>
> >> >> >> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> >> >> >> index 8137031c2a65..09ce694d66ea 100644
> >> >> >> --- a/drivers/infiniband/core/umem.c
> >> >> >> +++ b/drivers/infiniband/core/umem.c
> >> >> >> @@ -192,6 +192,7 @@ struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
> >> >> >>        umem = kzalloc(sizeof(*umem), GFP_KERNEL);
> >> >> >>        if (!umem)
> >> >> >>                return ERR_PTR(-ENOMEM);
> >> >> >> +      refcount_set(&umem->refcount, 1);
> >> >> >>        umem->ibdev      = device;
> >> >> >>        umem->length     = size;
> >> >> >>        umem->address    = addr;
> >> >> >> @@ -280,11 +281,15 @@ EXPORT_SYMBOL(ib_umem_get);
> >> >> >>  /**
> >> >> >>   * ib_umem_release - release memory pinned with ib_umem_get
> >> >> >>   * @umem: umem struct to release
> >> >> >> + *
> >> >> >> + * Decrement the reference count and free the umem when it reaches zero.
> >> >> >>   */
> >> >> >>  void ib_umem_release(struct ib_umem *umem)
> >> >> >>  {
> >> >> >>        if (!umem)
> >> >> >>                return;
> >> >> >> +      if (!refcount_dec_and_test(&umem->refcount))
> >> >> >> +              return;
> >> >> >>        if (umem->is_dmabuf)
> >> >> >>                return ib_umem_dmabuf_release(to_ib_umem_dmabuf(umem));
> >> >> >>        if (umem->is_odp)
> >> >> >> diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c
> >> >> >> index 939da49b0dcc..5c5286092fca 100644
> >> >> >> --- a/drivers/infiniband/core/umem_dmabuf.c
> >> >> >> +++ b/drivers/infiniband/core/umem_dmabuf.c
> >> >> >> @@ -143,6 +143,7 @@ ib_umem_dmabuf_get_with_dma_device(struct ib_device *device,
> >> >> >>        }
> >> >> >>
> >> >> >>        umem = &umem_dmabuf->umem;
> >> >> >> +      refcount_set(&umem->refcount, 1);
> >> >> >>        umem->ibdev = device;
> >> >> >>        umem->length = size;
> >> >> >>        umem->address = offset;
> >> >> >> diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
> >> >> >> index 572a91a62a7b..7be30fda57e3 100644
> >> >> >> --- a/drivers/infiniband/core/umem_odp.c
> >> >> >> +++ b/drivers/infiniband/core/umem_odp.c
> >> >> >> @@ -144,6 +144,7 @@ struct ib_umem_odp *ib_umem_odp_alloc_implicit(struct ib_device *device,
> >> >> >>        if (!umem_odp)
> >> >> >>                return ERR_PTR(-ENOMEM);
> >> >> >>        umem = &umem_odp->umem;
> >> >> >> +      refcount_set(&umem->refcount, 1);
> >> >> >>        umem->ibdev = device;
> >> >> >>        umem->writable = ib_access_writable(access);
> >> >> >>        umem->owning_mm = current->mm;
> >> >> >> @@ -185,6 +186,7 @@ ib_umem_odp_alloc_child(struct ib_umem_odp *root, unsigned long addr,
> >> >> >>        if (!odp_data)
> >> >> >>                return ERR_PTR(-ENOMEM);
> >> >> >>        umem = &odp_data->umem;
> >> >> >> +      refcount_set(&umem->refcount, 1);
> >> >> >>        umem->ibdev = root->umem.ibdev;
> >> >> >>        umem->length     = size;
> >> >> >>        umem->address    = addr;
> >> >> >> @@ -245,6 +247,7 @@ struct ib_umem_odp *ib_umem_odp_get(struct ib_device *device,
> >> >> >>        if (!umem_odp)
> >> >> >>                return ERR_PTR(-ENOMEM);
> >> >> >>
> >> >> >> +      refcount_set(&umem_odp->umem.refcount, 1);
> >> >> >>        umem_odp->umem.ibdev = device;
> >> >> >>        umem_odp->umem.length = size;
> >> >> >>        umem_odp->umem.address = addr;
> >> >> >> diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
> >> >> >> index 0a8e092c0ea8..44264f78eab3 100644
> >> >> >> --- a/include/rdma/ib_umem.h
> >> >> >> +++ b/include/rdma/ib_umem.h
> >> >> >> @@ -10,6 +10,7 @@
> >> >> >>  #include <linux/list.h>
> >> >> >>  #include <linux/scatterlist.h>
> >> >> >>  #include <linux/workqueue.h>
> >> >> >> +#include <linux/refcount.h>
> >> >> >>  #include <rdma/ib_verbs.h>
> >> >> >>
> >> >> >>  struct ib_ucontext;
> >> >> >> @@ -19,6 +20,7 @@ struct dma_buf_attach_ops;
> >> >> >>  struct ib_umem {
> >> >> >>        struct ib_device       *ibdev;
> >> >> >>        struct mm_struct       *owning_mm;
> >> >> >> +      refcount_t              refcount;
> >> >> >>        u64 iova;
> >> >> >>        size_t                  length;
> >> >> >>        unsigned long           address;
> >> >> >> @@ -110,6 +112,12 @@ static inline bool __rdma_umem_block_iter_next(struct ib_block_iter *biter)
> >> >> >>
> >> >> >>  struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
> >> >> >>                            size_t size, int access);
> >> >> >> +
> >> >> >> +static inline void ib_umem_get_ref(struct ib_umem *umem)
> >> >> >> +{
> >> >> >> +      refcount_inc(&umem->refcount);
> >> >> >> +}
> >> >> >> +
> >> >> >>  void ib_umem_release(struct ib_umem *umem);
> >> >> >>  int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
> >> >> >>                      size_t length);
> >> >> >> @@ -188,6 +196,7 @@ static inline struct ib_umem *ib_umem_get(struct ib_device *device,
> >> >> >>  {
> >> >> >>        return ERR_PTR(-EOPNOTSUPP);
> >> >> >>  }
> >> >> >> +static inline void ib_umem_get_ref(struct ib_umem *umem) { }
> >> >> >>  static inline void ib_umem_release(struct ib_umem *umem) { }
> >> >> >>  static inline int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
> >> >> >>                                    size_t length) {
> >> >> >> --
> >> >> >> 2.51.1
> >> >> >>
> >> >> >>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem
  2026-02-03 13:49               ` Sriharsha Basavapatna
@ 2026-02-03 14:29                 ` Jiri Pirko
  2026-02-03 14:49                   ` Sriharsha Basavapatna
  0 siblings, 1 reply; 34+ messages in thread
From: Jiri Pirko @ 2026-02-03 14:29 UTC (permalink / raw)
  To: Sriharsha Basavapatna
  Cc: Leon Romanovsky, linux-rdma, jgg, mrgolin, gal.pressman, sleybo,
	parav, mbloch, yanjun.zhu, wangliang74, marco.crivellari,
	roman.gushchin, phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	andrew.gospodarek, selvin.xavier

Tue, Feb 03, 2026 at 02:49:33PM +0100, sriharsha.basavapatna@broadcom.com wrote:
>On Tue, Feb 3, 2026 at 6:50 PM Jiri Pirko <jiri@resnulli.us> wrote:
>>
>> Tue, Feb 03, 2026 at 02:03:35PM +0100, leon@kernel.org wrote:
>> >On Tue, Feb 03, 2026 at 01:46:21PM +0100, Jiri Pirko wrote:
>> >> Tue, Feb 03, 2026 at 01:26:18PM +0100, leon@kernel.org wrote:
>> >> >On Tue, Feb 03, 2026 at 11:11:39AM +0100, Jiri Pirko wrote:
>> >> >> Tue, Feb 03, 2026 at 11:03:27AM +0100, leon@kernel.org wrote:
>> >> >> >On Tue, Feb 03, 2026 at 09:49:53AM +0100, Jiri Pirko wrote:
>> >> >> >> From: Jiri Pirko <jiri@nvidia.com>
>> >> >> >>
>> >> >> >> Introduce reference counting for ib_umem objects to simplify memory
>> >> >> >> lifecycle management when umem buffers are shared between the core
>> >> >> >> layer and device drivers.
>> >> >
>> >> >The lifecycle should be confined either to the core or to the driver,
>> >> >but it should not mix responsibilities across both.
>> >>
>> >> Okay, I can re-phrase. The point is, the last holding reference actually
>> >> does the release.
>> >>
>> >>
>> >> >
>> >> >> >>
>> >> >> >> When the core RDMA layer allocates an ib_umem and passes it to a driver
>> >> >> >> (e.g., for CQ or QP creation with external buffers), both layers need
>> >> >> >> to manage the umem lifecycle. Without reference counting, this requires
>> >> >> >> complex conditional release logic to avoid double-frees on error paths
>> >> >> >> and leaks on success paths.
>> >> >> >
>> >> >> >This sentence doesn't align with the proposed change.
>> >> >>
>> >> >> Hmm, I'm not sure why you think it does not align. It exactly describes
>> >> >> the code I had originally, without this patch in place :)
>> >> >
>> >> >There is no complex conditional release logic which this reference
>> >> >counting solves. That counting allows delayed, semi-random release,
>> >> >nothing more.
>> >>
>> >> Again. Without the refcount, you have to make sure the umem is consumed
>> >> by the op. Actually, check the existing create_cq_umem. On the error
>> >> path, the umem is released by the caller. On success path the ownership
>> >> is transferred to the callee.
>> >
>> >We need to fix it. Exactly right now, I'm working to make sure that umem
>> >is managed by ib_core and drivers don't call to ib_umem_get() at all.
>>
>> Should I wait and rebase?
>>
>>
>> >
>> >> Consider various error paths in the callee,
>> >> some of which are shared with destroy_cq op, some umems are not actually
>> >> used etc, it's a nightmare. I had the code written down like this
>> >> originally, that's why I actually know.
>> >>
>> >> Perhaps I'm missing your point. Is it just about the patch description or
>> >> about the code itself?
>> >
>> >Description and the code. UMEM needs to be created by ib_core and
>> >ib_core should destroy them.
>
>I'm seeing a kernel crash when perftest is stopped, I'm still
>debugging it, but I'm wondering if it is related to this change. I see
>this inline comment in the uverbs handler:
>
>        /* Driver took a reference, release ours */
>        ib_umem_release(rq_dbr_umem);
>        ib_umem_release(sq_dbr_umem);
>        ib_umem_release(rq_umem);
>        ib_umem_release(sq_umem);
>
>What does it mean by "Driver took a reference"? If the driver returns
>success from create_qp_umem(), the umem it is still using gets
>released above? Is there something that the driver should call to hold
>a reference? It is not obvious from the create_qp_umem() dev op.

Yes, see "RDMA/uverbs: Use umem refcounting for CQ creation with external
buffer"
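
[Editorial note: to make the handoff concrete, here is a minimal userspace
sketch of the refcount pattern under discussion. All toy_* names are
hypothetical stand-ins for ib_umem, ib_umem_get_ref() and
ib_umem_release(), and C11 atomics stand in for the kernel's refcount_t;
this is not kernel code.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for struct ib_umem; freed_flag only exists so a caller
 * can observe when the free actually happens. */
struct toy_umem {
	atomic_int refcount;
	bool *freed_flag;
};

/* Core allocates the umem with refcount = 1. */
static struct toy_umem *toy_umem_alloc(bool *freed_flag)
{
	struct toy_umem *umem = calloc(1, sizeof(*umem));

	if (!umem)
		return NULL;
	atomic_init(&umem->refcount, 1);
	umem->freed_flag = freed_flag;
	return umem;
}

/* Driver takes an extra reference when it decides to keep the umem. */
static void toy_umem_get_ref(struct toy_umem *umem)
{
	atomic_fetch_add(&umem->refcount, 1);
}

/* Both layers may call this unconditionally; only the last reference
 * actually frees. */
static void toy_umem_release(struct toy_umem *umem)
{
	if (!umem)
		return;
	if (atomic_fetch_sub(&umem->refcount, 1) != 1)
		return;
	*umem->freed_flag = true;
	free(umem);
}

/* Success path: driver took a ref, so the core's release is a no-op. */
static bool success_path_defers_free(void)
{
	bool freed = false;
	struct toy_umem *umem = toy_umem_alloc(&freed);
	bool freed_early;

	toy_umem_get_ref(umem);		/* driver's op returned success */
	toy_umem_release(umem);		/* core's unconditional release */
	freed_early = freed;		/* still false: driver holds a ref */
	toy_umem_release(umem);		/* driver's destroy path */
	return !freed_early && freed;
}

/* Error path: driver took no ref, so the core's release frees at once. */
static bool error_path_frees_at_once(void)
{
	bool freed = false;
	struct toy_umem *umem = toy_umem_alloc(&freed);

	toy_umem_release(umem);
	return freed;
}
```

On the success path the core's unconditional release is a no-op because
the driver still holds a reference; on the error path the core's release
drops the last reference and frees immediately.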

[...]

>


* Re: [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem
  2026-02-03 13:32               ` Leon Romanovsky
@ 2026-02-03 14:31                 ` Jiri Pirko
  0 siblings, 0 replies; 34+ messages in thread
From: Jiri Pirko @ 2026-02-03 14:31 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: linux-rdma, jgg, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

Tue, Feb 03, 2026 at 02:32:28PM +0100, leon@kernel.org wrote:
>On Tue, Feb 03, 2026 at 02:20:45PM +0100, Jiri Pirko wrote:
>> Tue, Feb 03, 2026 at 02:03:35PM +0100, leon@kernel.org wrote:
>> >On Tue, Feb 03, 2026 at 01:46:21PM +0100, Jiri Pirko wrote:
>> >> Tue, Feb 03, 2026 at 01:26:18PM +0100, leon@kernel.org wrote:
>> >> >On Tue, Feb 03, 2026 at 11:11:39AM +0100, Jiri Pirko wrote:
>> >> >> Tue, Feb 03, 2026 at 11:03:27AM +0100, leon@kernel.org wrote:
>> >> >> >On Tue, Feb 03, 2026 at 09:49:53AM +0100, Jiri Pirko wrote:
>> >> >> >> From: Jiri Pirko <jiri@nvidia.com>
>> >> >> >> 
>> >> >> >> Introduce reference counting for ib_umem objects to simplify memory
>> >> >> >> lifecycle management when umem buffers are shared between the core
>> >> >> >> layer and device drivers.
>> >> >
>> >> >The lifecycle should be confined either to the core or to the driver,
>> >> >but it should not mix responsibilities across both.
>> >> 
>> >> Okay, I can re-phrase. The point is, the last holding reference actually
>> >> does the release.
>> >> 
>> >> 
>> >> >
>> >> >> >> 
>> >> >> >> When the core RDMA layer allocates an ib_umem and passes it to a driver
>> >> >> >> (e.g., for CQ or QP creation with external buffers), both layers need
>> >> >> >> to manage the umem lifecycle. Without reference counting, this requires
>> >> >> >> complex conditional release logic to avoid double-frees on error paths
>> >> >> >> and leaks on success paths.
>> >> >> >
>> >> >> >This sentence doesn't align with the proposed change.
>> >> >> 
>> >> >> Hmm, I'm not sure why you think it does not align. It exactly describes
>> >> >> the code I had originally, without this patch in place :)
>> >> >
>> >> >There is no complex conditional release logic which this reference
>> >> >counting solves. That counting allows delayed, semi-random release,
>> >> >nothing more.
>> >> 
>> >> Again. Without the refcount, you have to make sure the umem is consumed
>> >> by the op. Actually, check the existing create_cq_umem. On the error
>> >> path, the umem is released by the caller. On success path the ownership
>> >> is transferred to the callee. 
>> >
>> >We need to fix it. Exactly right now, I'm working to make sure that umem
>> >is managed by ib_core and drivers don't call to ib_umem_get() at all.
>> 
>> Should I wait and rebase?
>
>It depends on your timeline. We are in -rc8 right now, so for at least
>the next 3 weeks (one week till 6.19 + a two-week merge window), your
>series won't be applied.

Well, I don't think I have a different option, do I :)


>
>Thanks
>
>> 
>> 
>> >
>> >> Consider various error paths in the callee,
>> >> some of which are shared with destroy_cq op, some umems are not actually
>> >> used etc, it's a nightmare. I had the code written down like this
>> >> originally, that's why I actually know.
>> >> 
>> >> Perhaps I'm missing your point. Is it just about the patch description or
>> >> about the code itself?
>> >
>> >Description and the code. UMEM needs to be created by ib_core and
>> >ib_core should destroy them.
>> >
>> >> 
>> >> 
>> >> >
>> >> >> 
>> >> >> >
>> >> >> >> 
>> >> >> >> With reference counting:
>> >> >> >> - Core allocates umem with refcount=1
>> >> >> >> - Driver calls ib_umem_get_ref() to take a reference
>> >> >> >> - Both layers can unconditionally call ib_umem_release()
>> >> >> >> - The umem is freed only when the last reference is dropped
>> >> >> >> 
>> >> >> >> Signed-off-by: Jiri Pirko <jiri@nvidia.com>
>> >> >> >> Change-Id: Ifb1765ea3b14dab3329294633ea5df063c74420d
>> >> >> >
>> >> >> >Please remove the Change-Ids and write the commit message yourself,
>> >> >> >without relying on AI. The current message provides no meaningful
>> >> >> 
>> >> >> I'm new in RDMA. Not sure what you mean by meaningful information :)
>> >> >
>> >> >This part of commit message is clearly generated by Cursor:
>> >> >With reference counting:
>> >> >- Core allocates umem with refcount=1
>> >> >- Driver calls ib_umem_get_ref() to take a reference
>> >> >- Both layers can unconditionally call ib_umem_release()
>> >> >- The umem is freed only when the last reference is dropped
>> >> >
>> >> >The above paragraphs have clear AI pattern as they don't explain why,
>> >> >but only how.
>> >> 
>> >> Why is explained above, isn't it?
>> >> If you don't want the "how part", I can remove it, no problem.
>> >
>> >Commit message should provide additional information which is not
>> >available in the code itself. A description like "Core allocates umem
>> >with refcount=1" has zero value.
>> 
>> :), sure
>> 
>> 
>> >
>> >Thanks
>> >
>> >> 
>> >> 
>> >> >
>> >> >I'm seeing the SAME commit messages in many internals and externals
>> >> >patches.
>> >> >
>> >> >Even my AI review is agreed with me :)
>> >> >...
>> >> >"AI-authorship-score": "medium"
>> >> >...
>> >> >
>> >> >> I'm always trying to provide it.
>> >> >> 
>> >> >> >information, particularly the auto‑generated summary at the end.
>> >> >> 
>> >> >> Doh, the changeIDs :) Sorry about that.
>> >> >> 
>> >> >> 
>> >> >> >
>> >> >> >Thanks
>> >> >> >
>> >> >> >> ---
>> >> >> >>  drivers/infiniband/core/umem.c        | 5 +++++
>> >> >> >>  drivers/infiniband/core/umem_dmabuf.c | 1 +
>> >> >> >>  drivers/infiniband/core/umem_odp.c    | 3 +++
>> >> >> >>  include/rdma/ib_umem.h                | 9 +++++++++
>> >> >> >>  4 files changed, 18 insertions(+)
>> >> >> >> 
>> >> >> >> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>> >> >> >> index 8137031c2a65..09ce694d66ea 100644
>> >> >> >> --- a/drivers/infiniband/core/umem.c
>> >> >> >> +++ b/drivers/infiniband/core/umem.c
>> >> >> >> @@ -192,6 +192,7 @@ struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
>> >> >> >>  	umem = kzalloc(sizeof(*umem), GFP_KERNEL);
>> >> >> >>  	if (!umem)
>> >> >> >>  		return ERR_PTR(-ENOMEM);
>> >> >> >> +	refcount_set(&umem->refcount, 1);
>> >> >> >>  	umem->ibdev      = device;
>> >> >> >>  	umem->length     = size;
>> >> >> >>  	umem->address    = addr;
>> >> >> >> @@ -280,11 +281,15 @@ EXPORT_SYMBOL(ib_umem_get);
>> >> >> >>  /**
>> >> >> >>   * ib_umem_release - release memory pinned with ib_umem_get
>> >> >> >>   * @umem: umem struct to release
>> >> >> >> + *
>> >> >> >> + * Decrement the reference count and free the umem when it reaches zero.
>> >> >> >>   */
>> >> >> >>  void ib_umem_release(struct ib_umem *umem)
>> >> >> >>  {
>> >> >> >>  	if (!umem)
>> >> >> >>  		return;
>> >> >> >> +	if (!refcount_dec_and_test(&umem->refcount))
>> >> >> >> +		return;
>> >> >> >>  	if (umem->is_dmabuf)
>> >> >> >>  		return ib_umem_dmabuf_release(to_ib_umem_dmabuf(umem));
>> >> >> >>  	if (umem->is_odp)
>> >> >> >> diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c
>> >> >> >> index 939da49b0dcc..5c5286092fca 100644
>> >> >> >> --- a/drivers/infiniband/core/umem_dmabuf.c
>> >> >> >> +++ b/drivers/infiniband/core/umem_dmabuf.c
>> >> >> >> @@ -143,6 +143,7 @@ ib_umem_dmabuf_get_with_dma_device(struct ib_device *device,
>> >> >> >>  	}
>> >> >> >>  
>> >> >> >>  	umem = &umem_dmabuf->umem;
>> >> >> >> +	refcount_set(&umem->refcount, 1);
>> >> >> >>  	umem->ibdev = device;
>> >> >> >>  	umem->length = size;
>> >> >> >>  	umem->address = offset;
>> >> >> >> diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
>> >> >> >> index 572a91a62a7b..7be30fda57e3 100644
>> >> >> >> --- a/drivers/infiniband/core/umem_odp.c
>> >> >> >> +++ b/drivers/infiniband/core/umem_odp.c
>> >> >> >> @@ -144,6 +144,7 @@ struct ib_umem_odp *ib_umem_odp_alloc_implicit(struct ib_device *device,
>> >> >> >>  	if (!umem_odp)
>> >> >> >>  		return ERR_PTR(-ENOMEM);
>> >> >> >>  	umem = &umem_odp->umem;
>> >> >> >> +	refcount_set(&umem->refcount, 1);
>> >> >> >>  	umem->ibdev = device;
>> >> >> >>  	umem->writable = ib_access_writable(access);
>> >> >> >>  	umem->owning_mm = current->mm;
>> >> >> >> @@ -185,6 +186,7 @@ ib_umem_odp_alloc_child(struct ib_umem_odp *root, unsigned long addr,
>> >> >> >>  	if (!odp_data)
>> >> >> >>  		return ERR_PTR(-ENOMEM);
>> >> >> >>  	umem = &odp_data->umem;
>> >> >> >> +	refcount_set(&umem->refcount, 1);
>> >> >> >>  	umem->ibdev = root->umem.ibdev;
>> >> >> >>  	umem->length     = size;
>> >> >> >>  	umem->address    = addr;
>> >> >> >> @@ -245,6 +247,7 @@ struct ib_umem_odp *ib_umem_odp_get(struct ib_device *device,
>> >> >> >>  	if (!umem_odp)
>> >> >> >>  		return ERR_PTR(-ENOMEM);
>> >> >> >>  
>> >> >> >> +	refcount_set(&umem_odp->umem.refcount, 1);
>> >> >> >>  	umem_odp->umem.ibdev = device;
>> >> >> >>  	umem_odp->umem.length = size;
>> >> >> >>  	umem_odp->umem.address = addr;
>> >> >> >> diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
>> >> >> >> index 0a8e092c0ea8..44264f78eab3 100644
>> >> >> >> --- a/include/rdma/ib_umem.h
>> >> >> >> +++ b/include/rdma/ib_umem.h
>> >> >> >> @@ -10,6 +10,7 @@
>> >> >> >>  #include <linux/list.h>
>> >> >> >>  #include <linux/scatterlist.h>
>> >> >> >>  #include <linux/workqueue.h>
>> >> >> >> +#include <linux/refcount.h>
>> >> >> >>  #include <rdma/ib_verbs.h>
>> >> >> >>  
>> >> >> >>  struct ib_ucontext;
>> >> >> >> @@ -19,6 +20,7 @@ struct dma_buf_attach_ops;
>> >> >> >>  struct ib_umem {
>> >> >> >>  	struct ib_device       *ibdev;
>> >> >> >>  	struct mm_struct       *owning_mm;
>> >> >> >> +	refcount_t		refcount;
>> >> >> >>  	u64 iova;
>> >> >> >>  	size_t			length;
>> >> >> >>  	unsigned long		address;
>> >> >> >> @@ -110,6 +112,12 @@ static inline bool __rdma_umem_block_iter_next(struct ib_block_iter *biter)
>> >> >> >>  
>> >> >> >>  struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
>> >> >> >>  			    size_t size, int access);
>> >> >> >> +
>> >> >> >> +static inline void ib_umem_get_ref(struct ib_umem *umem)
>> >> >> >> +{
>> >> >> >> +	refcount_inc(&umem->refcount);
>> >> >> >> +}
>> >> >> >> +
>> >> >> >>  void ib_umem_release(struct ib_umem *umem);
>> >> >> >>  int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
>> >> >> >>  		      size_t length);
>> >> >> >> @@ -188,6 +196,7 @@ static inline struct ib_umem *ib_umem_get(struct ib_device *device,
>> >> >> >>  {
>> >> >> >>  	return ERR_PTR(-EOPNOTSUPP);
>> >> >> >>  }
>> >> >> >> +static inline void ib_umem_get_ref(struct ib_umem *umem) { }
>> >> >> >>  static inline void ib_umem_release(struct ib_umem *umem) { }
>> >> >> >>  static inline int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
>> >> >> >>  		      		    size_t length) {
>> >> >> >> -- 
>> >> >> >> 2.51.1
>> >> >> >> 
>> >> >> >> 


* Re: [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem
  2026-02-03 14:29                 ` Jiri Pirko
@ 2026-02-03 14:49                   ` Sriharsha Basavapatna
  0 siblings, 0 replies; 34+ messages in thread
From: Sriharsha Basavapatna @ 2026-02-03 14:49 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Leon Romanovsky, linux-rdma, jgg, mrgolin, gal.pressman, sleybo,
	parav, mbloch, yanjun.zhu, wangliang74, marco.crivellari,
	roman.gushchin, phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	andrew.gospodarek, selvin.xavier, Sriharsha Basavapatna


On Tue, Feb 3, 2026 at 7:59 PM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Tue, Feb 03, 2026 at 02:49:33PM +0100, sriharsha.basavapatna@broadcom.com wrote:
> >On Tue, Feb 3, 2026 at 6:50 PM Jiri Pirko <jiri@resnulli.us> wrote:
> >>
> >> Tue, Feb 03, 2026 at 02:03:35PM +0100, leon@kernel.org wrote:
> >> >On Tue, Feb 03, 2026 at 01:46:21PM +0100, Jiri Pirko wrote:
> >> >> Tue, Feb 03, 2026 at 01:26:18PM +0100, leon@kernel.org wrote:
> >> >> >On Tue, Feb 03, 2026 at 11:11:39AM +0100, Jiri Pirko wrote:
> >> >> >> Tue, Feb 03, 2026 at 11:03:27AM +0100, leon@kernel.org wrote:
> >> >> >> >On Tue, Feb 03, 2026 at 09:49:53AM +0100, Jiri Pirko wrote:
> >> >> >> >> From: Jiri Pirko <jiri@nvidia.com>
> >> >> >> >>
> >> >> >> >> Introduce reference counting for ib_umem objects to simplify memory
> >> >> >> >> lifecycle management when umem buffers are shared between the core
> >> >> >> >> layer and device drivers.
> >> >> >
> >> >> >The lifecycle should be confined either to the core or to the driver,
> >> >> >but it should not mix responsibilities across both.
> >> >>
> >> >> Okay, I can re-phrase. The point is, the last holding reference actually
> >> >> does the release.
> >> >>
> >> >>
> >> >> >
> >> >> >> >>
> >> >> >> >> When the core RDMA layer allocates an ib_umem and passes it to a driver
> >> >> >> >> (e.g., for CQ or QP creation with external buffers), both layers need
> >> >> >> >> to manage the umem lifecycle. Without reference counting, this requires
> >> >> >> >> complex conditional release logic to avoid double-frees on error paths
> >> >> >> >> and leaks on success paths.
> >> >> >> >
> >> >> >> >This sentence doesn't align with the proposed change.
> >> >> >>
> >> >> >> Hmm, I'm not sure why you think it does not align. It exactly describes
> >> >> >> the code I had originally, without this patch in place :)
> >> >> >
> >> >> >There is no complex conditional release logic which this reference
> >> >> >counting solves. That counting allows delayed, semi-random release,
> >> >> >nothing more.
> >> >>
> >> >> Again. Without the refcount, you have to make sure the umem is consumed
> >> >> by the op. Actually, check the existing create_cq_umem. On the error
> >> >> path, the umem is released by the caller. On success path the ownership
> >> >> is transferred to the callee.
> >> >
> >> >We need to fix it. Exactly right now, I'm working to make sure that umem
> >> >is managed by ib_core and drivers don't call to ib_umem_get() at all.
> >>
> >> Should I wait and rebase?
> >>
> >>
> >> >
> >> >> Consider various error paths in the callee,
> >> >> some of which are shared with destroy_cq op, some umems are not actually
> >> >> used etc, it's a nightmare. I had the code written down like this
> >> >> originally, that's why I actually know.
> >> >>
> >> >> Perhaps I'm missing your point. Is it just about the patch description or
> >> >> about the code itself?
> >> >
> >> >Description and the code. UMEM needs to be created by ib_core and
> >> >ib_core should destroy them.
> >
> >I'm seeing a kernel crash when perftest is stopped, I'm still
> >debugging it, but I'm wondering if it is related to this change. I see
> >this inline comment in the uverbs handler:
> >
> >        /* Driver took a reference, release ours */
> >        ib_umem_release(rq_dbr_umem);
> >        ib_umem_release(sq_dbr_umem);
> >        ib_umem_release(rq_umem);
> >        ib_umem_release(sq_umem);
> >
> >What does it mean by "Driver took a reference"? If the driver returns
> >success from create_qp_umem(), the umem it is still using gets
> >released above? Is there something that the driver should call to hold
> >a reference? It is not obvious from the create_qp_umem() dev op.
>
> Yes, see "RDMA/uverbs: Use umem refcounting for CQ creation with external
> buffer"
>
> [...]
Ok, added ib_umem_get_ref() before returning (success) from
create_cq/qp_umem() and it doesn't crash anymore.
>
> >



* Re: [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem
  2026-02-03  8:49 ` [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem Jiri Pirko
  2026-02-03 10:03   ` Leon Romanovsky
@ 2026-02-03 14:51   ` Jason Gunthorpe
  2026-02-03 15:39     ` Jiri Pirko
  2026-02-03 16:56     ` Leon Romanovsky
  1 sibling, 2 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2026-02-03 14:51 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: linux-rdma, leon, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

On Tue, Feb 03, 2026 at 09:49:53AM +0100, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@nvidia.com>
> 
> Introduce reference counting for ib_umem objects to simplify memory
> lifecycle management when umem buffers are shared between the core
> layer and device drivers.

I admit I have reservations about this too.. The flow should not be so
convoluted that a refcount is necessary. The lifecycle of a umem is
not uncertain at all.

I imagined it would be like:

core code:
  if (ops->create_cq_umem) {
     umem = umem_get
     rc = ops->create_cq_umem(umem)
     if (rc)
      umem_free(umem)
  } else {
     rc = ops->create_cq()
  }

Driver:
  create_cq():
    copy_from_user(drvdata)
    umem = umem_get()
    rc = driver_create_cq_umem(umem, &drvdata))
    if (rc)
      umem_free(umem)

   create_cq_umem()
     copy_from_user(drvdata)
     return driver_create_cq_umem(umem, &drvdata)

   destroy_cq()
     destroy_hw
     umem_free()
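
[Editorial note: the flow above can also be modeled as a small userspace
sketch. The toy_* names are hypothetical; only the ownership rules are
illustrated, not any real verbs API.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins for ib_umem / ib_cq; the freed flag exists only so the
 * flow can be observed. toy_umem_free() deliberately does not free()
 * the struct, so callers can inspect the flag afterwards. */
struct toy_umem { bool freed; };
struct toy_cq { struct toy_umem *umem; };

static struct toy_umem *toy_umem_get(void)
{
	return calloc(1, sizeof(struct toy_umem));
}

static void toy_umem_free(struct toy_umem *umem)
{
	umem->freed = true;
}

/* Driver op: on success it stores the pointer and owns it from then on;
 * on failure it touches nothing and the caller still owns the umem. */
static int driver_create_cq_umem(struct toy_cq *cq, struct toy_umem *umem,
				 bool fail)
{
	if (fail)
		return -1;
	cq->umem = umem;
	return 0;
}

/* Driver destroy: the one place the umem is freed on the success path. */
static void driver_destroy_cq(struct toy_cq *cq)
{
	toy_umem_free(cq->umem);
	cq->umem = NULL;
}

/* Core side: unwind on error, transfer ownership on success. */
static int core_create_cq(struct toy_cq *cq, struct toy_umem *umem,
			  bool fail)
{
	int rc = driver_create_cq_umem(cq, umem, fail);

	if (rc)
		toy_umem_free(umem);
	return rc;
}
```

In this model there is exactly one owner at any point in time, which is
why no refcount appears anywhere.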

This basically moves all the working code in the driver to the
driver_create_cq_umem() which *always* gets a umem as a parameter.

If the user uses the drvdata path to specify the umem then the driver
helper create_cq() creates the umem from the drvdata parameters,
otherwise the core creates it from the common UATTRs.

We can keep things so the umem is always freed by the driver on
success, which doesn't require any driver changes.

It should never be "shared", this is just a very simple unwind on
error kind of pattern.

I think the challenge here is to unwind the drivers into the above
three functions so they don't have a mess of convoluted error handling
around the umem.

Jason


* Re: [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem
  2026-02-03 14:51   ` Jason Gunthorpe
@ 2026-02-03 15:39     ` Jiri Pirko
  2026-02-03 16:59       ` Jason Gunthorpe
  2026-02-03 16:56     ` Leon Romanovsky
  1 sibling, 1 reply; 34+ messages in thread
From: Jiri Pirko @ 2026-02-03 15:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, leon, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

Tue, Feb 03, 2026 at 03:51:38PM +0100, jgg@ziepe.ca wrote:
>On Tue, Feb 03, 2026 at 09:49:53AM +0100, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@nvidia.com>
>> 
>> Introduce reference counting for ib_umem objects to simplify memory
>> lifecycle management when umem buffers are shared between the core
>> layer and device drivers.
>
>I admit I have reservations about this too.. The flow should not be so
>convoluted that a refcount is necessary. The lifecycle of a umem is
>not uncertain at all.
>
>I imagined it would be like:
>
>core code:
>  if (ops->create_cq_umem) {
>     umem = umem_get
>     rc = ops->create_cq_umem(umem)
>     if (rc)
>      umem_free(umem)
>  } else {
>     rc = ops->create_cq()
>  }
>
>Driver:
>  create_cq():
>    copy_from_user(drvdata)
>    umem = umem_get()
>    rc = driver_create_cq_umem(umem, &drvdata))
>    if (rc)
>      umem_free(umem)
>
>   create_cq_umem()
>     copy_from_user(drvdata)
>     return driver_create_cq_umem(umem, &drvdata)
>
>   destroy_cq()
>     destroy_hw
>     umem_free()


This is how it is now. However, there are a couple of challenges with
this flow:
1) umem usage. For example, create_qp_umem at the end of the set gets 4
   umem pointers: sq, rq, sq_dbr, rq_dbr. A driver may use only one,
   two, or three of those; mlx5 actually uses 2 or 3.
   If, as you suggest, the current approach stands, the driver has to
   always take all the pointers, store them and eventually release them
   on the destroy_qp path.
2) error path. I found the error path quite odd. When create_cq/qp_umem
   returns != 0, the core releases all umems. However, the standard
   cq/qp destroy path also takes care of releasing umems. Since a lot
   of the code on the error path and the destroy path is shared, it has
   to be told whether or not to release the umems. That is not nice.

That is why I tried to make this more elegant using a reference, which
the driver takes when it assumes responsibility for the release.
I admit that if umem_get/release becomes possible only in the core, not
in the drivers, that would solve this issue in the most elegant way.
That is why I would like to wait for Leon to do that and rebase on top
of it.


>
>This basically moves all the working code in the driver to the
>driver_create_cq_umem() which *always* gets a umem as a parameter.
>
>If the user uses the drvdata path to specify the umem then the driver
>helper create_cq() creates the umem from the drvdata parameters,
>otherwise the core creates it from the common UATTRs.
>
>We can keep things so the umem is always freed by the driver on
>success, which doesn't require any driver changes.
>
>It should never be "shared", this is just a very simple unwind on
>error kind of pattern.

Not "shared", more like "taken over".


>
>I think the challenge here is to unwind the drivers into the above
>three functions so they don't have a mess of convoluted error handling
>around the umem.
>
>Jason


* Re: [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem
  2026-02-03 14:51   ` Jason Gunthorpe
  2026-02-03 15:39     ` Jiri Pirko
@ 2026-02-03 16:56     ` Leon Romanovsky
  2026-02-03 17:01       ` Jason Gunthorpe
  1 sibling, 1 reply; 34+ messages in thread
From: Leon Romanovsky @ 2026-02-03 16:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jiri Pirko, linux-rdma, mrgolin, gal.pressman, sleybo, parav,
	mbloch, yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

On Tue, Feb 03, 2026 at 10:51:38AM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 03, 2026 at 09:49:53AM +0100, Jiri Pirko wrote:
> > From: Jiri Pirko <jiri@nvidia.com>
> > 
> > Introduce reference counting for ib_umem objects to simplify memory
> > lifecycle management when umem buffers are shared between the core
> > layer and device drivers.
> 
> I admit I have reservations about this too.. The flow should not be so
> convoluted that a refcount is necessary. The lifecycle of a umem is
> not uncertain at all.
> 
> I imagined it would be like:
> 
> core code:
>   if (ops->create_cq_umem) {
>      umem = umem_get
>      rc = ops->create_cq_umem(umem)
>      if (rc)
>       umem_free(umem)
>   } else {
>      rc = ops->create_cq()
>   }
> 
> Driver:
>   create_cq():
>     copy_from_user(drvdata)
>     umem = umem_get()
>     rc = driver_create_cq_umem(umem, &drvdata))
>     if (rc)
>       umem_free(umem)
> 
>    create_cq_umem()
>      copy_from_user(drvdata)
>      return driver_create_cq_umem(umem, &drvdata)
> 
>    destroy_cq()
>      destroy_hw
>      umem_free()
> 
> This basically moves all the working code in the driver to the
> driver_create_cq_umem() which *always* gets a umem as a parameter.
> 
> If the user uses the drvdata path to specify the umem then the driver
> helper create_cq() creates the umem from the drvdata parameters,
> otherwise the core creates it from the common UATTRs.
> 
> We can keep things so the umem is always freed by the driver on
> success, which doesn't require any driver changes.
> 
> It should never be "shared", this is just a very simple unwind on
> error kind of pattern.
> 
> I think the challenge here is to unwind the drivers into the above
> three functions so they don't have a mess of convoluted error handling
> around the umem.

I opted for an even simpler approach: embed the umem directly in ib_cq, set  
cq->umem in ib_core’s create_cq, and clean it up in ib_core’s destroy_cq.

The beginning of the series is available here:  
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=refactor-umem-v1

Thanks

> 
> Jason


* Re: [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem
  2026-02-03 15:39     ` Jiri Pirko
@ 2026-02-03 16:59       ` Jason Gunthorpe
  2026-02-04  7:01         ` Jiri Pirko
  2026-02-04 15:38         ` Jiri Pirko
  0 siblings, 2 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2026-02-03 16:59 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: linux-rdma, leon, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

On Tue, Feb 03, 2026 at 04:39:52PM +0100, Jiri Pirko wrote:
> Tue, Feb 03, 2026 at 03:51:38PM +0100, jgg@ziepe.ca wrote:
> >On Tue, Feb 03, 2026 at 09:49:53AM +0100, Jiri Pirko wrote:
> >> From: Jiri Pirko <jiri@nvidia.com>
> >> 
> >> Introduce reference counting for ib_umem objects to simplify memory
> >> lifecycle management when umem buffers are shared between the core
> >> layer and device drivers.
> >
> >I admit I have reservations about this too.. The flow should not be so
> >convoluted that a refcount is necessary. The lifecycle of a umem is
> >not uncertain at all.
> >
> >I imagined it would be like:
> >
> >core code:
> >  if (ops->create_cq_umem) {
> >     umem = umem_get
> >     rc = ops->create_cq_umem(umem)
> >     if (rc)
> >      umem_free(umem)
> >  } else {
> >     rc = ops->create_cq()
> >  }
> >
> >Driver:
> >  create_cq():
> >    copy_from_user(drvdata)
> >    umem = umem_get()
> >    rc = driver_create_cq_umem(umem, &drvdata))
> >    if (rc)
> >      umem_free(umem)
> >
> >   create_cq_umem()
> >     copy_from_user(drvdata)
> >     return driver_create_cq_umem(umem, &drvdata)
> >
> >   destroy_cq()
> >     destroy_hw
> >     umem_free()
> 
> 
> This is how it is now. However, there are a couple of challenges with this
> flow:
> 1) umem usage. For example, create_qp_umem at the end of the set gets 4
>    umem pointers. sq,rq,sq_dbr,rq_dbr. Some driver may use only one of
>    those, 2 of those, 3 of those. Depends. mlx5 actually uses 2 or 3.
>    If what you suggest, the current approach stands, the user has to
>    always take all pointers, store them and eventually release them on
>    destroy_qp path.

Userspace passing umems that are not used by the driver is an error.
Fail the call.

> 2) error path. I found the error path quite odd. When create_cq/qp_umem
>    returns !=0, core releases all umems. However, standard cq/qp
>    destroy path takes care of releasing umems. Since a lot of code on
>    error path and destroy path is shared, it has to be informed to
>    release or not release the umems. That is not nice.

Generally I would not assign to the driver's umem storage until the
creation is completed to avoid this. ie it stays null until committed.

But looking at mlx5 that looks like quite a maze there.. Yikes..

So maybe mlx5 adds some NULL assignments on its error paths and less
convoluted drivers can use a simpler option?

My issue with refcounts is that this isn't a refcounted structure, it
has very well defined points where it must become freed.

Like we can't get through destroy_qp without the umem being freed; that
is illegal.
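The "stays NULL until committed" rule can be sketched in a few lines of
C. This is a hypothetical illustration with stand-in types, not the
actual ib_core or mlx5 code: the driver keeps the umem in a local
variable and assigns it to its object only once nothing else can fail,
so the shared error/destroy code can simply key off a NULL pointer.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the real structures. */
struct ib_umem { int dummy; };
struct drv_cq {
	struct ib_umem *umem;	/* stays NULL until creation is committed */
};

/* Stand-in for programming the HW; may fail in a real driver. */
static int hw_create_cq(struct drv_cq *cq)
{
	(void)cq;
	return 0;
}

/* Driver create path: assign cq->umem only on full success. */
static int drv_create_cq_with_umem(struct drv_cq *cq, struct ib_umem *umem)
{
	int rc = hw_create_cq(cq);

	if (rc)
		return rc;	/* cq->umem is still NULL; core frees umem */
	cq->umem = umem;	/* committed: destroy path now owns it */
	return 0;
}
```

With this shape the core can free the umem whenever the create op
returns an error, and the destroy path frees cq->umem unconditionally;
the two paths never disagree about ownership.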

Jason

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem
  2026-02-03 16:56     ` Leon Romanovsky
@ 2026-02-03 17:01       ` Jason Gunthorpe
  0 siblings, 0 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2026-02-03 17:01 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jiri Pirko, linux-rdma, mrgolin, gal.pressman, sleybo, parav,
	mbloch, yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

On Tue, Feb 03, 2026 at 06:56:00PM +0200, Leon Romanovsky wrote:

> I opted for an even simpler approach: embed the umem directly in ib_cq, set  
> cq->umem in ib_core’s create_cq, and clean it up in ib_core’s destroy_cq.

The issue is most drivers create the umem from parameters in their
driver data so the core code cannot do it at all.

The structure I gave is a way for the drivers to parse their driver
data while still keeping the umem out of the bowels of the driver.

You can lift the umems into the core structures too, but things like
the DBR with their driver-specific umems would be hard to deal with.

Jason

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem
  2026-02-03 16:59       ` Jason Gunthorpe
@ 2026-02-04  7:01         ` Jiri Pirko
  2026-02-04 15:38         ` Jiri Pirko
  1 sibling, 0 replies; 34+ messages in thread
From: Jiri Pirko @ 2026-02-04  7:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, leon, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, wangliang74, marco.crivellari, roman.gushchin,
	phaddad, lirongqing, ynachum, huangjunxian6,
	kalesh-anakkur.purayil, ohartoov, michaelgur, shayd, edwards,
	sriharsha.basavapatna, andrew.gospodarek, selvin.xavier

Tue, Feb 03, 2026 at 05:59:38PM +0100, jgg@ziepe.ca wrote:
>On Tue, Feb 03, 2026 at 04:39:52PM +0100, Jiri Pirko wrote:
>> Tue, Feb 03, 2026 at 03:51:38PM +0100, jgg@ziepe.ca wrote:
>> >On Tue, Feb 03, 2026 at 09:49:53AM +0100, Jiri Pirko wrote:
>> >> From: Jiri Pirko <jiri@nvidia.com>
>> >> 
>> >> Introduce reference counting for ib_umem objects to simplify memory
>> >> lifecycle management when umem buffers are shared between the core
>> >> layer and device drivers.
>> >
>> >I admit I have reservations about this too.. The flow should not be so
>> >convoluted that a refcount is necessary. The lifecycle of a umem is
>> >not uncertain at all.
>> >
>> >I imagined it would be like:
>> >
>> >core code:
>> >  if (ops->create_cq_umem) {
>> >     umem = umem_get
>> >     rc = ops->create_cq_umem(umem)
>> >     if (rc)
>> >      umem_free(umem)
>> >  } else {
>> >     rc = ops->create_cq()
>> >  }
>> >
>> >Driver:
>> >  create_cq():
>> >    copy_from_user(drvdata)
>> >    umem = umem_get()
>> >    rc = driver_create_cq_umem(umem, &drvdata)
>> >    if (rc)
>> >      umem_free(umem)
>> >
>> >   create_cq_umem()
>> >     copy_from_user(drvdata)
>> >     return driver_create_cq_umem(umem, &drvdata)
>> >
>> >   destroy_cq()
>> >     destroy_hw
>> >     umem_free()
>> 
>> 
>> This is how it is now. However, there are a couple of challenges with this
>> flow:
>> 1) umem usage. For example, create_qp_umem at the end of the set gets 4
>>    umem pointers. sq,rq,sq_dbr,rq_dbr. Some driver may use only one of
>>    those, 2 of those, 3 of those. Depends. mlx5 actually uses 2 or 3.
>>    If what you suggest, the current approach stands, the user has to
>>    always take all pointers, store them and eventually release them on
>>    destroy_qp path.
>
>Userspace passing umems that are not used by the driver is an error.
>Fail the call.

Okay, that makes sense. I just wanted to ignore it :)


>
>> 2) error path. I found the error path quite odd. When create_cq/qp_umem
>>    returns !=0, core releases all umems. However, standard cq/qp
>>    destroy path takes care of releasing umems. Since a lot of code on
>>    error path and destroy path is shared, it has to be informed to
>>    release or not release the umems. That is not nice.
>
>Generally I would not assign to the driver's umem storage until the
>creation is completed to avoid this. ie it stays null until committed.
>
>But looking at mlx5 that looks like quite a maze there.. Yikes..

Exactly.


>
>So maybe mlx5 adds some NULL assignments on its error paths and less
>convoluted drivers can use a simpler option?

Yeah, I wanted to avoid that. But sure.


>
>My issue with refcounts is that this isn't a refcounted structure, it
>has very well defined points where it must become freed.
>
>Like we can't get through destroy_qp without the umem being freed; that
>is illegal.

Sure, even with this patch that never happens. But I get your point.

>
>Jason

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem
  2026-02-03 16:59       ` Jason Gunthorpe
  2026-02-04  7:01         ` Jiri Pirko
@ 2026-02-04 15:38         ` Jiri Pirko
  2026-02-04 17:46           ` Jason Gunthorpe
  1 sibling, 1 reply; 34+ messages in thread
From: Jiri Pirko @ 2026-02-04 15:38 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, leon, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, marco.crivellari, roman.gushchin, phaddad, lirongqing,
	ynachum, huangjunxian6, kalesh-anakkur.purayil, ohartoov,
	michaelgur, shayd, edwards, sriharsha.basavapatna,
	andrew.gospodarek, selvin.xavier

Tue, Feb 03, 2026 at 05:59:38PM +0100, jgg@ziepe.ca wrote:
>On Tue, Feb 03, 2026 at 04:39:52PM +0100, Jiri Pirko wrote:
>> Tue, Feb 03, 2026 at 03:51:38PM +0100, jgg@ziepe.ca wrote:
>> >On Tue, Feb 03, 2026 at 09:49:53AM +0100, Jiri Pirko wrote:
>> >> From: Jiri Pirko <jiri@nvidia.com>
>> >> 
>> >> Introduce reference counting for ib_umem objects to simplify memory
>> >> lifecycle management when umem buffers are shared between the core
>> >> layer and device drivers.
>> >
>> >I admit I have reservations about this too.. The flow should not be so
>> >convoluted that a refcount is necessary. The lifecycle of a umem is
>> >not uncertain at all.
>> >
>> >I imagined it would be like:
>> >
>> >core code:
>> >  if (ops->create_cq_umem) {
>> >     umem = umem_get
>> >     rc = ops->create_cq_umem(umem)
>> >     if (rc)
>> >      umem_free(umem)
>> >  } else {
>> >     rc = ops->create_cq()
>> >  }
>> >
>> >Driver:
>> >  create_cq():
>> >    copy_from_user(drvdata)
>> >    umem = umem_get()
>> >    rc = driver_create_cq_umem(umem, &drvdata)
>> >    if (rc)
>> >      umem_free(umem)
>> >
>> >   create_cq_umem()
>> >     copy_from_user(drvdata)
>> >     return driver_create_cq_umem(umem, &drvdata)
>> >
>> >   destroy_cq()
>> >     destroy_hw
>> >     umem_free()
>> 
>> 
>> This is how it is now. However, there are a couple of challenges with this
>> flow:
>> 1) umem usage. For example, create_qp_umem at the end of the set gets 4
>>    umem pointers. sq,rq,sq_dbr,rq_dbr. Some driver may use only one of
>>    those, 2 of those, 3 of those. Depends. mlx5 actually uses 2 or 3.
>>    If what you suggest, the current approach stands, the user has to
>>    always take all pointers, store them and eventually release them on
>>    destroy_qp path.
>
>Userspace passing umems that are not used by the driver is an error.
>Fail the call.
>
>> 2) error path. I found the error path quite odd. When create_cq/qp_umem
>>    returns !=0, core releases all umems. However, standard cq/qp
>>    destroy path takes care of releasing umems. Since a lot of code on
>>    error path and destroy path is shared, it has to be informed to
>>    release or not release the umems. That is not nice.
>
>Generally I would not assign to the driver's umem storage until the
>creation is completed to avoid this. ie it stays null until committed.
>
>But looking at mlx5 that looks like quite a maze there.. Yikes..
>
>So maybe mlx5 adds some NULL assignments on its error paths and less
>convoluted drivers can use a simpler option?

How about we have:
	int (*create_cq_umem)(struct ib_cq *cq,
			      const struct ib_cq_init_attr *attr,
			      struct ib_umem **umem,
                                             ^^

			      struct uverbs_attr_bundle *attrs);

And instead of taking a ref, the callee just does *umem = NULL? :S

This would help to cover the error path vs destroy path differences.
It could also be used to make sure the op consumed all umems; all
should be NULLed on success.

Makes sense?
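The proposal above can be sketched in plain C (hypothetical names and
error codes, not the real uverbs code): the op NULLs every umem pointer
it takes ownership of, so the core can both free leftovers on the error
path and reject umems the driver never consumed.

```c
#include <stddef.h>

/* Hypothetical minimal stand-ins for the real ib_* types. */
struct ib_umem { int dummy; };

static void ib_umem_release(struct ib_umem *umem)
{
	(void)umem;
}

/* Driver op: takes ownership of the umem it uses and NULLs the slot. */
static int drv_create_cq_umem(struct ib_umem **umem)
{
	/* ... program the HW with *umem ... */
	*umem = NULL;	/* mark the umem as consumed */
	return 0;
}

/*
 * Core side: a pointer left non-NULL means the op did not consume the
 * umem -- either an error path or an umem the driver does not support.
 * Release it and, if the op "succeeded" anyway, fail the call.
 */
static int core_create_cq_umem(struct ib_umem *umem)
{
	int rc = drv_create_cq_umem(&umem);

	if (umem) {
		ib_umem_release(umem);
		if (!rc)
			rc = -22;	/* EINVAL: op must consume all umems */
	}
	return rc;
}
```

With this check in the core, a driver that forgets to consume one of
the four QP umems (sq, rq, sq_dbr, rq_dbr) is caught immediately.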



>
>My issue with refcounts is that this isn't a refcounted structure, it
>has very well defined points where it must become freed.
>
>Like we can't get through destroy_qp without the umem being freed; that
>is illegal.
>
>Jason

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem
  2026-02-04 15:38         ` Jiri Pirko
@ 2026-02-04 17:46           ` Jason Gunthorpe
  2026-02-04 17:54             ` Leon Romanovsky
  0 siblings, 1 reply; 34+ messages in thread
From: Jason Gunthorpe @ 2026-02-04 17:46 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: linux-rdma, leon, mrgolin, gal.pressman, sleybo, parav, mbloch,
	yanjun.zhu, marco.crivellari, roman.gushchin, phaddad, lirongqing,
	ynachum, huangjunxian6, kalesh-anakkur.purayil, ohartoov,
	michaelgur, shayd, edwards, sriharsha.basavapatna,
	andrew.gospodarek, selvin.xavier

On Wed, Feb 04, 2026 at 04:38:22PM +0100, Jiri Pirko wrote:

> >Generally I would not assign to the driver's umem storage until the
> >creation is completed to avoid this. ie it stays null until committed.
> >
> >But looking at mlx5 that looks like quite a maze there.. Yikes..
> >
> >So maybe mlx5 adds some NULL assignments on its error paths and less
> >convoluted drivers can use a simpler option?
> 
> How about we have:
> 	int (*create_cq_umem)(struct ib_cq *cq,
> 			      const struct ib_cq_init_attr *attr,
> 			      struct ib_umem **umem,
>                                              ^^
> 
> 			      struct uverbs_attr_bundle *attrs);
> 
> And instead of taking a ref, the callee just does *umem = NULL? :S
> 
> This would help to cover the error path vs destroy path differences.
> It could also be used to make sure the op consumed all umems; all
> should be NULLed on success.
> 
> Makes sense?

I'm ok with that, though I've never seen the pattern before. If the core
code fails on !NULL it would be pretty hard to use it wrong.

Jason

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem
  2026-02-04 17:46           ` Jason Gunthorpe
@ 2026-02-04 17:54             ` Leon Romanovsky
  2026-02-04 17:56               ` Jiri Pirko
  0 siblings, 1 reply; 34+ messages in thread
From: Leon Romanovsky @ 2026-02-04 17:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jiri Pirko, linux-rdma, mrgolin, gal.pressman, sleybo, parav,
	mbloch, yanjun.zhu, marco.crivellari, roman.gushchin, phaddad,
	lirongqing, ynachum, huangjunxian6, kalesh-anakkur.purayil,
	ohartoov, michaelgur, shayd, edwards, sriharsha.basavapatna,
	andrew.gospodarek, selvin.xavier

On Wed, Feb 04, 2026 at 01:46:15PM -0400, Jason Gunthorpe wrote:
> On Wed, Feb 04, 2026 at 04:38:22PM +0100, Jiri Pirko wrote:
> 
> > >Generally I would not assign to the driver's umem storage until the
> > >creation is completed to avoid this. ie it stays null until committed.
> > >
> > >But looking at mlx5 that looks like quite a maze there.. Yikes..
> > >
> > >So maybe mlx5 adds some NULL assignments on its error paths and less
> > >convoluted drivers can use a simpler option?
> > 
> > How about we have:
> > 	int (*create_cq_umem)(struct ib_cq *cq,
> > 			      const struct ib_cq_init_attr *attr,
> > 			      struct ib_umem **umem,
> >                                              ^^
> > 
> > 			      struct uverbs_attr_bundle *attrs);
> > 
> > And instead of taking a ref, the callee just does *umem = NULL? :S
> > 
> > This would help to cover the error path vs destroy path differences.
> > It could also be used to make sure the op consumed all umems; all
> > should be NULLed on success.
> > 
> > Makes sense?
> 
> I'm ok with that, though I've never seen the pattern before. If the core
> code fails on !NULL it would be pretty hard to use it wrong.

I'm less excited to see unique coding patterns. We should rather invest
the time and effort to layer everything correctly.

Thanks

> 
> Jason
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem
  2026-02-04 17:54             ` Leon Romanovsky
@ 2026-02-04 17:56               ` Jiri Pirko
  0 siblings, 0 replies; 34+ messages in thread
From: Jiri Pirko @ 2026-02-04 17:56 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jason Gunthorpe, linux-rdma, mrgolin, gal.pressman, sleybo, parav,
	mbloch, yanjun.zhu, marco.crivellari, roman.gushchin, phaddad,
	lirongqing, ynachum, huangjunxian6, kalesh-anakkur.purayil,
	ohartoov, michaelgur, shayd, edwards, sriharsha.basavapatna,
	andrew.gospodarek, selvin.xavier

Wed, Feb 04, 2026 at 06:54:11PM +0100, leon@kernel.org wrote:
>On Wed, Feb 04, 2026 at 01:46:15PM -0400, Jason Gunthorpe wrote:
>> On Wed, Feb 04, 2026 at 04:38:22PM +0100, Jiri Pirko wrote:
>> 
>> > >Generally I would not assign to the driver's umem storage until the
>> > >creation is completed to avoid this. ie it stays null until committed.
>> > >
>> > >But looking at mlx5 that looks like quite a maze there.. Yikes..
>> > >
>> > >So maybe mlx5 adds some NULL assignments on its error paths and less
>> > >convoluted drivers can use a simpler option?
>> > 
>> > How about we have:
>> > 	int (*create_cq_umem)(struct ib_cq *cq,
>> > 			      const struct ib_cq_init_attr *attr,
>> > 			      struct ib_umem **umem,
>> >                                              ^^
>> > 
>> > 			      struct uverbs_attr_bundle *attrs);
>> > 
>> > And instead of taking a ref, the callee just does *umem = NULL? :S
>> > 
>> > This would help to cover the error path vs destroy path differences.
>> > It could also be used to make sure the op consumed all umems; all
>> > should be NULLed on success.
>> > 
>> > Makes sense?
>> 
>> I'm ok with that, though I've never seen the pattern before. If the core
>> code fails on !NULL it would be pretty hard to use it wrong.
>
>I'm less excited to see unique coding patterns. We should rather invest
>the time and effort to layer everything correctly.

It will be easy to leave this pattern behind once the effort you refer to
bears fruit :)

>
>Thanks
>
>> 
>> Jason
>> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2026-02-04 17:56 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-03  8:49 [PATCH rdma-next 00/10] RDMA: Extend uverbs umem support for QP buffers and doorbell records Jiri Pirko
2026-02-03  8:49 ` [PATCH rdma-next 01/10] RDMA/umem: Add reference counting to ib_umem Jiri Pirko
2026-02-03 10:03   ` Leon Romanovsky
2026-02-03 10:11     ` Jiri Pirko
2026-02-03 12:26       ` Leon Romanovsky
2026-02-03 12:46         ` Jiri Pirko
2026-02-03 13:03           ` Leon Romanovsky
2026-02-03 13:20             ` Jiri Pirko
2026-02-03 13:32               ` Leon Romanovsky
2026-02-03 14:31                 ` Jiri Pirko
2026-02-03 13:49               ` Sriharsha Basavapatna
2026-02-03 14:29                 ` Jiri Pirko
2026-02-03 14:49                   ` Sriharsha Basavapatna
2026-02-03 14:51   ` Jason Gunthorpe
2026-02-03 15:39     ` Jiri Pirko
2026-02-03 16:59       ` Jason Gunthorpe
2026-02-04  7:01         ` Jiri Pirko
2026-02-04 15:38         ` Jiri Pirko
2026-02-04 17:46           ` Jason Gunthorpe
2026-02-04 17:54             ` Leon Romanovsky
2026-02-04 17:56               ` Jiri Pirko
2026-02-03 16:56     ` Leon Romanovsky
2026-02-03 17:01       ` Jason Gunthorpe
2026-02-03  8:49 ` [PATCH rdma-next 02/10] RDMA/uverbs: Use umem refcounting for CQ creation with external buffer Jiri Pirko
2026-02-03  8:49 ` [PATCH rdma-next 03/10] RDMA/mlx5: Add support for CQ creation with external umem buffer Jiri Pirko
2026-02-03  8:49 ` [PATCH rdma-next 04/10] RDMA/uverbs: Factor out common buffer umem parsing into helper Jiri Pirko
2026-02-03  8:49 ` [PATCH rdma-next 05/10] RDMA/core: Add support for QP buffer umem in QP creation Jiri Pirko
2026-02-03  8:49 ` [PATCH rdma-next 06/10] RDMA/mlx5: Add support for QP creation with external umem buffers Jiri Pirko
2026-02-03  8:49 ` [PATCH rdma-next 07/10] RDMA/uverbs: Add doorbell record umem support to CQ creation Jiri Pirko
2026-02-03  8:50 ` [PATCH rdma-next 08/10] RDMA/mlx5: Add external doorbell record umem support for CQ Jiri Pirko
2026-02-03  8:50 ` [PATCH rdma-next 09/10] RDMA/uverbs: Add doorbell record umem support to QP creation Jiri Pirko
2026-02-03  8:50 ` [PATCH rdma-next 10/10] RDMA/mlx5: Add external doorbell record umem support for QP Jiri Pirko
2026-02-03  9:59 ` [PATCH rdma-next 00/10] RDMA: Extend uverbs umem support for QP buffers and doorbell records Leon Romanovsky
2026-02-03 10:13   ` Jiri Pirko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox