public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core
@ 2026-02-13 10:57 Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 01/50] RDMA: Move DMA block iterator logic into dedicated files Leon Romanovsky
                   ` (51 more replies)
  0 siblings, 52 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

Unify CQ UMEM creation, resize and release in ib_core to avoid the need
for complex driver-side handling. This lets us rely on the internal
reference counters of the relevant ib_XXX objects to manage UMEM
lifetime safely and consistently.

The resize cleanup made it clear that most drivers never handled this
path correctly, and there is a good chance the functionality was never
actually used. The most common issue was relying on the cq->resize_umem
pointer to detect races with other CQ commands while failing to clear
it on error paths and without taking the locking needed against
concurrent CQ operations.
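
To illustrate, the fragile pattern looked roughly like the composite
sketch below. This is a hedged illustration, not code from any single
driver; the drv_* names and the drv_resize_cq_cmd uABI struct are
hypothetical:

  static int drv_resize_cq(struct ib_cq *ibcq, int cqe,
                           struct ib_udata *udata)
  {
          struct drv_cq *cq = to_drv_cq(ibcq);
          struct drv_resize_cq_cmd ucmd;  /* hypothetical uABI struct */
          int err;

          if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd)))
                  return -EFAULT;

          /* resize_umem doubles as a "resize in progress" flag */
          cq->resize_umem = ib_umem_get(ibcq->device, ucmd.buf_addr,
                                        ucmd.buf_len,
                                        IB_ACCESS_LOCAL_WRITE);
          if (IS_ERR(cq->resize_umem))
                  return PTR_ERR(cq->resize_umem); /* ERR_PTR left in place */

          err = drv_hw_resize_cq(cq, cqe);
          if (err)
                  return err;     /* resize_umem never cleared on error */

          ib_umem_release(cq->umem);
          cq->umem = cq->resize_umem;
          cq->resize_umem = NULL;
          return 0;
  }

Moving the umem into ib_core removes the need for such driver-local
state juggling entirely.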

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
Leon Romanovsky (50):
      RDMA: Move DMA block iterator logic into dedicated files
      RDMA/umem: Allow including ib_umem header from any location
      RDMA/umem: Remove unnecessary includes and defines from ib_umem header
      RDMA/core: Promote UMEM to a core component
      RDMA/core: Manage CQ umem in core code
      RDMA/efa: Rely on CPU address in create-QP
      RDMA/core: Prepare create CQ path for API unification
      RDMA/core: Reject zero CQE count
      RDMA/efa: Remove check for zero CQE count
      RDMA/mlx5: Save 4 bytes in CQ structure
      RDMA/mlx5: Provide a modern CQ creation interface
      RDMA/mlx4: Inline mlx4_ib_get_cq_umem into callers
      RDMA/mlx4: Introduce a modern CQ creation interface
      RDMA/mlx4: Remove unused create_flags field from CQ structure
      RDMA/bnxt_re: Convert to modern CQ interface
      RDMA/cxgb4: Separate kernel and user CQ creation paths
      RDMA/mthca: Split user and kernel CQ creation paths
      RDMA/erdma: Separate user and kernel CQ creation paths
      RDMA/ionic: Split user and kernel CQ creation paths
      RDMA/qedr: Convert to modern CQ interface
      RDMA/vmw_pvrdma: Provide a modern CQ creation interface
      RDMA/ocrdma: Split user and kernel CQ creation paths
      RDMA/irdma: Split user and kernel CQ creation paths
      RDMA/usnic: Provide a modern CQ creation interface
      RDMA/mana: Provide a modern CQ creation interface
      RDMA/hns: Split user and kernel CQ creation paths
      RDMA/rdmavt: Split user and kernel CQ creation paths
      RDMA/siw: Split user and kernel CQ creation paths
      RDMA/rxe: Split user and kernel CQ creation paths
      RDMA/core: Remove legacy CQ creation fallback path
      RDMA/core: Remove unused ib_resize_cq() implementation
      RDMA: Clarify that CQ resize is a user-space verb
      RDMA/bnxt_re: Drop support for resizing kernel CQs
      RDMA/irdma: Remove resize support for kernel CQs
      RDMA/mlx4: Remove support for kernel CQ resize
      RDMA/mlx5: Remove support for resizing kernel CQs
      RDMA/mthca: Remove resize support for kernel CQs
      RDMA/rdmavt: Remove resize support for kernel CQs
      RDMA/rxe: Remove unused kernel-side CQ resize support
      RDMA: Properly propagate the number of CQEs as unsigned int
      RDMA/core: Generalize CQ resize locking
      RDMA/bnxt_re: Complete CQ resize in a single step
      RDMA/bnxt_re: Rely on common resize-CQ locking
      RDMA/bnxt_re: Reduce CQ memory footprint
      RDMA/mlx4: Use generic resize-CQ lock
      RDMA/mlx4: Use on-stack variables instead of storing them in the CQ object
      RDMA/mlx5: Use generic resize-CQ lock
      RDMA/mlx5: Select resize-CQ callback based on device capabilities
      RDMA/mlx5: Reduce CQ memory footprint
      RDMA/mthca: Use generic resize-CQ lock

 drivers/infiniband/core/Makefile                |   6 +-
 drivers/infiniband/core/cq.c                    |   3 +
 drivers/infiniband/core/device.c                |   4 +-
 drivers/infiniband/core/iter.c                  |  43 +++
 drivers/infiniband/core/umem.c                  |   2 +-
 drivers/infiniband/core/uverbs_cmd.c            |  18 +-
 drivers/infiniband/core/uverbs_std_types_cq.c   |  35 ++-
 drivers/infiniband/core/verbs.c                 |  61 +---
 drivers/infiniband/hw/bnxt_re/ib_verbs.c        | 246 ++++++++-------
 drivers/infiniband/hw/bnxt_re/ib_verbs.h        |   9 +-
 drivers/infiniband/hw/bnxt_re/main.c            |   3 +-
 drivers/infiniband/hw/bnxt_re/qplib_res.c       |   2 +-
 drivers/infiniband/hw/cxgb4/cq.c                | 218 +++++++++----
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h          |   2 +
 drivers/infiniband/hw/cxgb4/mem.c               |   2 +-
 drivers/infiniband/hw/cxgb4/provider.c          |   1 +
 drivers/infiniband/hw/efa/efa.h                 |   6 +-
 drivers/infiniband/hw/efa/efa_main.c            |   3 +-
 drivers/infiniband/hw/efa/efa_verbs.c           |  44 ++-
 drivers/infiniband/hw/erdma/erdma_main.c        |   1 +
 drivers/infiniband/hw/erdma/erdma_verbs.c       |  99 ++++--
 drivers/infiniband/hw/erdma/erdma_verbs.h       |   2 +
 drivers/infiniband/hw/hns/hns_roce_alloc.c      |   2 +-
 drivers/infiniband/hw/hns/hns_roce_cq.c         | 103 ++++--
 drivers/infiniband/hw/hns/hns_roce_debugfs.c    |   1 -
 drivers/infiniband/hw/hns/hns_roce_device.h     |   3 +-
 drivers/infiniband/hw/hns/hns_roce_main.c       |   1 +
 drivers/infiniband/hw/ionic/ionic_controlpath.c |  88 ++++--
 drivers/infiniband/hw/ionic/ionic_ibdev.c       |   1 +
 drivers/infiniband/hw/ionic/ionic_ibdev.h       |   4 +-
 drivers/infiniband/hw/irdma/main.h              |   2 +-
 drivers/infiniband/hw/irdma/verbs.c             | 402 +++++++++++++-----------
 drivers/infiniband/hw/mana/cq.c                 | 128 +++++---
 drivers/infiniband/hw/mana/device.c             |   1 +
 drivers/infiniband/hw/mana/main.c               |  25 +-
 drivers/infiniband/hw/mana/mana_ib.h            |   6 +-
 drivers/infiniband/hw/mana/qp.c                 |  42 ++-
 drivers/infiniband/hw/mana/wq.c                 |  14 +-
 drivers/infiniband/hw/mlx4/cq.c                 | 401 ++++++++---------------
 drivers/infiniband/hw/mlx4/main.c               |   3 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h            |  10 +-
 drivers/infiniband/hw/mlx4/mr.c                 |   1 +
 drivers/infiniband/hw/mlx5/cq.c                 | 383 ++++++++--------------
 drivers/infiniband/hw/mlx5/main.c               |   9 +-
 drivers/infiniband/hw/mlx5/mem.c                |   1 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h            |  12 +-
 drivers/infiniband/hw/mlx5/qp.c                 |   2 +-
 drivers/infiniband/hw/mlx5/umr.c                |   1 +
 drivers/infiniband/hw/mthca/mthca_cq.c          |   1 -
 drivers/infiniband/hw/mthca/mthca_provider.c    | 193 ++++--------
 drivers/infiniband/hw/mthca/mthca_provider.h    |   1 -
 drivers/infiniband/hw/ocrdma/ocrdma_main.c      |   3 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c     |  70 +++--
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h     |   6 +-
 drivers/infiniband/hw/qedr/main.c               |   1 +
 drivers/infiniband/hw/qedr/verbs.c              | 325 +++++++++++--------
 drivers/infiniband/hw/qedr/verbs.h              |   2 +
 drivers/infiniband/hw/usnic/usnic_ib_main.c     |   2 +-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c    |   6 +-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.h    |   4 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma.h       |   2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c    | 171 ++++++----
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c  |   1 +
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.h |   3 +
 drivers/infiniband/sw/rdmavt/cq.c               | 224 +++++++------
 drivers/infiniband/sw/rdmavt/cq.h               |   4 +-
 drivers/infiniband/sw/rdmavt/vt.c               |   3 +-
 drivers/infiniband/sw/rxe/rxe_cq.c              |  31 --
 drivers/infiniband/sw/rxe/rxe_loc.h             |   3 -
 drivers/infiniband/sw/rxe/rxe_verbs.c           | 115 +++----
 drivers/infiniband/sw/siw/siw_main.c            |   1 +
 drivers/infiniband/sw/siw/siw_verbs.c           | 111 +++++--
 drivers/infiniband/sw/siw/siw_verbs.h           |   2 +
 include/rdma/ib_umem.h                          |  36 +--
 include/rdma/ib_verbs.h                         |  67 +---
 include/rdma/iter.h                             |  88 ++++++
 76 files changed, 2085 insertions(+), 1847 deletions(-)
---
base-commit: 42e3aac65c1c9eb36cdee0d8312a326196e0822f
change-id: 20260203-refactor-umem-e5b4277e41b4

Best regards,
--  
Leon Romanovsky <leonro@nvidia.com>


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 01/50] RDMA: Move DMA block iterator logic into dedicated files
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
@ 2026-02-13 10:57 ` Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 02/50] RDMA/umem: Allow including ib_umem header from any location Leon Romanovsky
                   ` (50 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

The DMA block iterator logic was spread across the verbs and
umem-specific code, forcing all of its users to include rdma/ib_umem.h.
Move it into dedicated files, iter.c and rdma/iter.h, so that
rdma/ib_umem.h and rdma/ib_verbs.h can be decoupled in a follow-up
patch.
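
After this split, a driver that only needs block iteration can include
rdma/iter.h alone. A minimal usage sketch (drv_fill_page_list is a
hypothetical helper; everything else comes from the moved header):

  #include <rdma/iter.h>

  static void drv_fill_page_list(struct ib_umem *umem, u64 *pas,
                                 unsigned long pgsz)
  {
          struct ib_block_iter biter;
          unsigned int i = 0;

          /* pgsz would typically come from ib_umem_find_best_pgsz() */
          rdma_umem_for_each_dma_block(umem, &biter, pgsz)
                  pas[i++] = rdma_block_iter_dma_address(&biter);
  }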

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/Makefile             |  2 +-
 drivers/infiniband/core/iter.c               | 43 ++++++++++++++
 drivers/infiniband/core/verbs.c              | 38 ------------
 drivers/infiniband/hw/bnxt_re/qplib_res.c    |  2 +-
 drivers/infiniband/hw/cxgb4/mem.c            |  2 +-
 drivers/infiniband/hw/efa/efa_verbs.c        |  2 +-
 drivers/infiniband/hw/erdma/erdma_verbs.c    |  2 +-
 drivers/infiniband/hw/hns/hns_roce_alloc.c   |  2 +-
 drivers/infiniband/hw/ionic/ionic_ibdev.h    |  2 +-
 drivers/infiniband/hw/irdma/main.h           |  2 +-
 drivers/infiniband/hw/mana/mana_ib.h         |  2 +-
 drivers/infiniband/hw/mlx4/mr.c              |  1 +
 drivers/infiniband/hw/mlx5/mem.c             |  1 +
 drivers/infiniband/hw/mlx5/umr.c             |  1 +
 drivers/infiniband/hw/mthca/mthca_provider.c |  2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  |  2 +-
 drivers/infiniband/hw/qedr/verbs.c           |  2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma.h    |  2 +-
 include/rdma/ib_umem.h                       | 30 ----------
 include/rdma/ib_verbs.h                      | 48 ---------------
 include/rdma/iter.h                          | 88 ++++++++++++++++++++++++++++
 21 files changed, 147 insertions(+), 129 deletions(-)

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index f483e0c12444..48922e0ede56 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -12,7 +12,7 @@ ib_core-y :=			packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
 				roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \
 				multicast.o mad.o smi.o agent.o mad_rmpp.o \
 				nldev.o restrack.o counters.o ib_core_uverbs.o \
-				trace.o lag.o
+				trace.o lag.o iter.o
 
 ib_core-$(CONFIG_SECURITY_INFINIBAND) += security.o
 ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o
diff --git a/drivers/infiniband/core/iter.c b/drivers/infiniband/core/iter.c
new file mode 100644
index 000000000000..8e543d100657
--- /dev/null
+++ b/drivers/infiniband/core/iter.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. */
+
+#include <linux/export.h>
+#include <rdma/iter.h>
+
+void __rdma_block_iter_start(struct ib_block_iter *biter,
+			     struct scatterlist *sglist, unsigned int nents,
+			     unsigned long pgsz)
+{
+	memset(biter, 0, sizeof(struct ib_block_iter));
+	biter->__sg = sglist;
+	biter->__sg_nents = nents;
+
+	/* Driver provides best block size to use */
+	biter->__pg_bit = __fls(pgsz);
+}
+EXPORT_SYMBOL(__rdma_block_iter_start);
+
+bool __rdma_block_iter_next(struct ib_block_iter *biter)
+{
+	unsigned int block_offset;
+	unsigned int delta;
+
+	if (!biter->__sg_nents || !biter->__sg)
+		return false;
+
+	biter->__dma_addr = sg_dma_address(biter->__sg) + biter->__sg_advance;
+	block_offset = biter->__dma_addr & (BIT_ULL(biter->__pg_bit) - 1);
+	delta = BIT_ULL(biter->__pg_bit) - block_offset;
+
+	while (biter->__sg_nents && biter->__sg &&
+	       sg_dma_len(biter->__sg) - biter->__sg_advance <= delta) {
+		delta -= sg_dma_len(biter->__sg) - biter->__sg_advance;
+		biter->__sg_advance = 0;
+		biter->__sg = sg_next(biter->__sg);
+		biter->__sg_nents--;
+	}
+	biter->__sg_advance += delta;
+
+	return true;
+}
+EXPORT_SYMBOL(__rdma_block_iter_next);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 02ebc3e52196..47a97797d7be 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -3154,44 +3154,6 @@ int rdma_init_netdev(struct ib_device *device, u32 port_num,
 }
 EXPORT_SYMBOL(rdma_init_netdev);
 
-void __rdma_block_iter_start(struct ib_block_iter *biter,
-			     struct scatterlist *sglist, unsigned int nents,
-			     unsigned long pgsz)
-{
-	memset(biter, 0, sizeof(struct ib_block_iter));
-	biter->__sg = sglist;
-	biter->__sg_nents = nents;
-
-	/* Driver provides best block size to use */
-	biter->__pg_bit = __fls(pgsz);
-}
-EXPORT_SYMBOL(__rdma_block_iter_start);
-
-bool __rdma_block_iter_next(struct ib_block_iter *biter)
-{
-	unsigned int block_offset;
-	unsigned int delta;
-
-	if (!biter->__sg_nents || !biter->__sg)
-		return false;
-
-	biter->__dma_addr = sg_dma_address(biter->__sg) + biter->__sg_advance;
-	block_offset = biter->__dma_addr & (BIT_ULL(biter->__pg_bit) - 1);
-	delta = BIT_ULL(biter->__pg_bit) - block_offset;
-
-	while (biter->__sg_nents && biter->__sg &&
-	       sg_dma_len(biter->__sg) - biter->__sg_advance <= delta) {
-		delta -= sg_dma_len(biter->__sg) - biter->__sg_advance;
-		biter->__sg_advance = 0;
-		biter->__sg = sg_next(biter->__sg);
-		biter->__sg_nents--;
-	}
-	biter->__sg_advance += delta;
-
-	return true;
-}
-EXPORT_SYMBOL(__rdma_block_iter_next);
-
 /**
  * rdma_alloc_hw_stats_struct - Helper function to allocate dynamic struct
  *   for the drivers.
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_res.c b/drivers/infiniband/hw/bnxt_re/qplib_res.c
index 875d7b52c06a..64b02ea98cac 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_res.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_res.c
@@ -46,7 +46,7 @@
 #include <linux/if_vlan.h>
 #include <linux/vmalloc.h>
 #include <rdma/ib_verbs.h>
-#include <rdma/ib_umem.h>
+#include <rdma/iter.h>
 
 #include "roce_hsi.h"
 #include "qplib_res.h"
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index adeed7447e7b..e0ec2c4158a0 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -32,9 +32,9 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
-#include <rdma/ib_umem.h>
 #include <linux/atomic.h>
 #include <rdma/ib_user_verbs.h>
+#include <rdma/iter.h>
 
 #include "iw_cxgb4.h"
 
diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
index 22d3e25c3b9d..19e3033d4ff7 100644
--- a/drivers/infiniband/hw/efa/efa_verbs.c
+++ b/drivers/infiniband/hw/efa/efa_verbs.c
@@ -9,9 +9,9 @@
 #include <linux/log2.h>
 
 #include <rdma/ib_addr.h>
-#include <rdma/ib_umem.h>
 #include <rdma/ib_user_verbs.h>
 #include <rdma/ib_verbs.h>
+#include <rdma/iter.h>
 #include <rdma/uverbs_ioctl.h>
 #define UVERBS_MODULE_NAME efa_ib
 #include <rdma/uverbs_named_ioctl.h>
diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.c b/drivers/infiniband/hw/erdma/erdma_verbs.c
index 109a3f3de911..058edc42de58 100644
--- a/drivers/infiniband/hw/erdma/erdma_verbs.c
+++ b/drivers/infiniband/hw/erdma/erdma_verbs.c
@@ -12,7 +12,7 @@
 #include <linux/vmalloc.h>
 #include <net/addrconf.h>
 #include <rdma/erdma-abi.h>
-#include <rdma/ib_umem.h>
+#include <rdma/iter.h>
 #include <rdma/uverbs_ioctl.h>
 
 #include "erdma.h"
diff --git a/drivers/infiniband/hw/hns/hns_roce_alloc.c b/drivers/infiniband/hw/hns/hns_roce_alloc.c
index 6ee911f6885b..c21004814c3c 100644
--- a/drivers/infiniband/hw/hns/hns_roce_alloc.c
+++ b/drivers/infiniband/hw/hns/hns_roce_alloc.c
@@ -32,7 +32,7 @@
  */
 
 #include <linux/vmalloc.h>
-#include <rdma/ib_umem.h>
+#include <rdma/iter.h>
 #include "hns_roce_device.h"
 
 void hns_roce_buf_free(struct hns_roce_dev *hr_dev, struct hns_roce_buf *buf)
diff --git a/drivers/infiniband/hw/ionic/ionic_ibdev.h b/drivers/infiniband/hw/ionic/ionic_ibdev.h
index 82fda1e3cdb6..63828240d659 100644
--- a/drivers/infiniband/hw/ionic/ionic_ibdev.h
+++ b/drivers/infiniband/hw/ionic/ionic_ibdev.h
@@ -4,9 +4,9 @@
 #ifndef _IONIC_IBDEV_H_
 #define _IONIC_IBDEV_H_
 
-#include <rdma/ib_umem.h>
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_pack.h>
+#include <rdma/iter.h>
 #include <rdma/uverbs_ioctl.h>
 
 #include <rdma/ionic-abi.h>
diff --git a/drivers/infiniband/hw/irdma/main.h b/drivers/infiniband/hw/irdma/main.h
index d320d1a228b3..3d49bd57bae7 100644
--- a/drivers/infiniband/hw/irdma/main.h
+++ b/drivers/infiniband/hw/irdma/main.h
@@ -37,8 +37,8 @@
 #include <rdma/rdma_cm.h>
 #include <rdma/iw_cm.h>
 #include <rdma/ib_user_verbs.h>
-#include <rdma/ib_umem.h>
 #include <rdma/ib_cache.h>
+#include <rdma/iter.h>
 #include <rdma/uverbs_ioctl.h>
 #include "osdep.h"
 #include "defs.h"
diff --git a/drivers/infiniband/hw/mana/mana_ib.h b/drivers/infiniband/hw/mana/mana_ib.h
index e447acfd2071..a7c8c0fd7019 100644
--- a/drivers/infiniband/hw/mana/mana_ib.h
+++ b/drivers/infiniband/hw/mana/mana_ib.h
@@ -8,7 +8,7 @@
 
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_mad.h>
-#include <rdma/ib_umem.h>
+#include <rdma/iter.h>
 #include <rdma/mana-abi.h>
 #include <rdma/uverbs_ioctl.h>
 #include <linux/dmapool.h>
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 94464f1694d9..9b647a300eb9 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -33,6 +33,7 @@
 
 #include <linux/slab.h>
 #include <rdma/ib_user_verbs.h>
+#include <rdma/iter.h>
 
 #include "mlx4_ib.h"
 
diff --git a/drivers/infiniband/hw/mlx5/mem.c b/drivers/infiniband/hw/mlx5/mem.c
index af321f6ef7f5..75d5b5672b5c 100644
--- a/drivers/infiniband/hw/mlx5/mem.c
+++ b/drivers/infiniband/hw/mlx5/mem.c
@@ -31,6 +31,7 @@
  */
 
 #include <rdma/ib_umem_odp.h>
+#include <rdma/iter.h>
 #include "mlx5_ib.h"
 
 /*
diff --git a/drivers/infiniband/hw/mlx5/umr.c b/drivers/infiniband/hw/mlx5/umr.c
index 4e562e0dd9e1..29488fba21a0 100644
--- a/drivers/infiniband/hw/mlx5/umr.c
+++ b/drivers/infiniband/hw/mlx5/umr.c
@@ -2,6 +2,7 @@
 /* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. */
 
 #include <rdma/ib_umem_odp.h>
+#include <rdma/iter.h>
 #include "mlx5_ib.h"
 #include "umr.h"
 #include "wr.h"
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index dd572d76866c..aa5ca5c4ff77 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -35,8 +35,8 @@
  */
 
 #include <rdma/ib_smi.h>
-#include <rdma/ib_umem.h>
 #include <rdma/ib_user_verbs.h>
+#include <rdma/iter.h>
 #include <rdma/uverbs_ioctl.h>
 
 #include <linux/sched.h>
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 46d911fd38de..bf9211d8d130 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -45,9 +45,9 @@
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_user_verbs.h>
 #include <rdma/iw_cm.h>
-#include <rdma/ib_umem.h>
 #include <rdma/ib_addr.h>
 #include <rdma/ib_cache.h>
+#include <rdma/iter.h>
 #include <rdma/uverbs_ioctl.h>
 
 #include "ocrdma.h"
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index ab9bf0922979..cb06c5d894b8 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -39,9 +39,9 @@
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_user_verbs.h>
 #include <rdma/iw_cm.h>
-#include <rdma/ib_umem.h>
 #include <rdma/ib_addr.h>
 #include <rdma/ib_cache.h>
+#include <rdma/iter.h>
 #include <rdma/uverbs_ioctl.h>
 
 #include <linux/qed/common_hsi.h>
diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma.h b/drivers/infiniband/hw/vmw_pvrdma/pvrdma.h
index 763ddc6f25d1..23e547d4b3a7 100644
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma.h
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma.h
@@ -53,8 +53,8 @@
 #include <linux/pci.h>
 #include <linux/semaphore.h>
 #include <linux/workqueue.h>
-#include <rdma/ib_umem.h>
 #include <rdma/ib_verbs.h>
+#include <rdma/iter.h>
 #include <rdma/vmw_pvrdma-abi.h>
 
 #include "pvrdma_ring.h"
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 0a8e092c0ea8..ce47688dd003 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -76,36 +76,6 @@ static inline size_t ib_umem_num_pages(struct ib_umem *umem)
 	return ib_umem_num_dma_blocks(umem, PAGE_SIZE);
 }
 
-static inline void __rdma_umem_block_iter_start(struct ib_block_iter *biter,
-						struct ib_umem *umem,
-						unsigned long pgsz)
-{
-	__rdma_block_iter_start(biter, umem->sgt_append.sgt.sgl,
-				umem->sgt_append.sgt.nents, pgsz);
-	biter->__sg_advance = ib_umem_offset(umem) & ~(pgsz - 1);
-	biter->__sg_numblocks = ib_umem_num_dma_blocks(umem, pgsz);
-}
-
-static inline bool __rdma_umem_block_iter_next(struct ib_block_iter *biter)
-{
-	return __rdma_block_iter_next(biter) && biter->__sg_numblocks--;
-}
-
-/**
- * rdma_umem_for_each_dma_block - iterate over contiguous DMA blocks of the umem
- * @umem: umem to iterate over
- * @pgsz: Page size to split the list into
- *
- * pgsz must be <= PAGE_SIZE or computed by ib_umem_find_best_pgsz(). The
- * returned DMA blocks will be aligned to pgsz and span the range:
- * ALIGN_DOWN(umem->address, pgsz) to ALIGN(umem->address + umem->length, pgsz)
- *
- * Performs exactly ib_umem_num_dma_blocks() iterations.
- */
-#define rdma_umem_for_each_dma_block(umem, biter, pgsz)                        \
-	for (__rdma_umem_block_iter_start(biter, umem, pgsz);                  \
-	     __rdma_umem_block_iter_next(biter);)
-
 #ifdef CONFIG_INFINIBAND_USER_MEM
 
 struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 8bd020da7745..e1ec5a6c74e6 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2950,22 +2950,6 @@ struct ib_client {
 	u8 no_kverbs_req:1;
 };
 
-/*
- * IB block DMA iterator
- *
- * Iterates the DMA-mapped SGL in contiguous memory blocks aligned
- * to a HW supported page size.
- */
-struct ib_block_iter {
-	/* internal states */
-	struct scatterlist *__sg;	/* sg holding the current aligned block */
-	dma_addr_t __dma_addr;		/* unaligned DMA address of this block */
-	size_t __sg_numblocks;		/* ib_umem_num_dma_blocks() */
-	unsigned int __sg_nents;	/* number of SG entries */
-	unsigned int __sg_advance;	/* number of bytes to advance in sg in next step */
-	unsigned int __pg_bit;		/* alignment of current block */
-};
-
 struct ib_device *_ib_alloc_device(size_t size, struct net *net);
 #define ib_alloc_device(drv_struct, member)                                    \
 	container_of(_ib_alloc_device(sizeof(struct drv_struct) +              \
@@ -2994,38 +2978,6 @@ void ib_unregister_device_queued(struct ib_device *ib_dev);
 int ib_register_client   (struct ib_client *client);
 void ib_unregister_client(struct ib_client *client);
 
-void __rdma_block_iter_start(struct ib_block_iter *biter,
-			     struct scatterlist *sglist,
-			     unsigned int nents,
-			     unsigned long pgsz);
-bool __rdma_block_iter_next(struct ib_block_iter *biter);
-
-/**
- * rdma_block_iter_dma_address - get the aligned dma address of the current
- * block held by the block iterator.
- * @biter: block iterator holding the memory block
- */
-static inline dma_addr_t
-rdma_block_iter_dma_address(struct ib_block_iter *biter)
-{
-	return biter->__dma_addr & ~(BIT_ULL(biter->__pg_bit) - 1);
-}
-
-/**
- * rdma_for_each_block - iterate over contiguous memory blocks of the sg list
- * @sglist: sglist to iterate over
- * @biter: block iterator holding the memory block
- * @nents: maximum number of sg entries to iterate over
- * @pgsz: best HW supported page size to use
- *
- * Callers may use rdma_block_iter_dma_address() to get each
- * blocks aligned DMA address.
- */
-#define rdma_for_each_block(sglist, biter, nents, pgsz)		\
-	for (__rdma_block_iter_start(biter, sglist, nents,	\
-				     pgsz);			\
-	     __rdma_block_iter_next(biter);)
-
 /**
  * ib_get_client_data - Get IB client context
  * @device:Device to get context for
diff --git a/include/rdma/iter.h b/include/rdma/iter.h
new file mode 100644
index 000000000000..19d64ef04ba9
--- /dev/null
+++ b/include/rdma/iter.h
@@ -0,0 +1,88 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. */
+
+#ifndef _RDMA_ITER_H_
+#define _RDMA_ITER_H_
+
+#include <linux/scatterlist.h>
+#include <rdma/ib_umem.h>
+
+/**
+ * IB block DMA iterator
+ *
+ * Iterates the DMA-mapped SGL in contiguous memory blocks aligned
+ * to a HW supported page size.
+ */
+struct ib_block_iter {
+	/* internal states */
+	struct scatterlist *__sg;	/* sg holding the current aligned block */
+	dma_addr_t __dma_addr;		/* unaligned DMA address of this block */
+	size_t __sg_numblocks;		/* ib_umem_num_dma_blocks() */
+	unsigned int __sg_nents;	/* number of SG entries */
+	unsigned int __sg_advance;	/* number of bytes to advance in sg in next step */
+	unsigned int __pg_bit;		/* alignment of current block */
+};
+
+void __rdma_block_iter_start(struct ib_block_iter *biter,
+			     struct scatterlist *sglist,
+			     unsigned int nents,
+			     unsigned long pgsz);
+bool __rdma_block_iter_next(struct ib_block_iter *biter);
+
+/**
+ * rdma_block_iter_dma_address - get the aligned dma address of the current
+ * block held by the block iterator.
+ * @biter: block iterator holding the memory block
+ */
+static inline dma_addr_t
+rdma_block_iter_dma_address(struct ib_block_iter *biter)
+{
+	return biter->__dma_addr & ~(BIT_ULL(biter->__pg_bit) - 1);
+}
+
+/**
+ * rdma_for_each_block - iterate over contiguous memory blocks of the sg list
+ * @sglist: sglist to iterate over
+ * @biter: block iterator holding the memory block
+ * @nents: maximum number of sg entries to iterate over
+ * @pgsz: best HW supported page size to use
+ *
+ * Callers may use rdma_block_iter_dma_address() to get each
+ * blocks aligned DMA address.
+ */
+#define rdma_for_each_block(sglist, biter, nents, pgsz)		\
+	for (__rdma_block_iter_start(biter, sglist, nents,	\
+				     pgsz);			\
+	     __rdma_block_iter_next(biter);)
+
+static inline void __rdma_umem_block_iter_start(struct ib_block_iter *biter,
+						struct ib_umem *umem,
+						unsigned long pgsz)
+{
+	__rdma_block_iter_start(biter, umem->sgt_append.sgt.sgl,
+				umem->sgt_append.sgt.nents, pgsz);
+	biter->__sg_advance = ib_umem_offset(umem) & ~(pgsz - 1);
+	biter->__sg_numblocks = ib_umem_num_dma_blocks(umem, pgsz);
+}
+
+static inline bool __rdma_umem_block_iter_next(struct ib_block_iter *biter)
+{
+	return __rdma_block_iter_next(biter) && biter->__sg_numblocks--;
+}
+
+/**
+ * rdma_umem_for_each_dma_block - iterate over contiguous DMA blocks of the umem
+ * @umem: umem to iterate over
+ * @pgsz: Page size to split the list into
+ *
+ * pgsz must be <= PAGE_SIZE or computed by ib_umem_find_best_pgsz(). The
+ * returned DMA blocks will be aligned to pgsz and span the range:
+ * ALIGN_DOWN(umem->address, pgsz) to ALIGN(umem->address + umem->length, pgsz)
+ *
+ * Performs exactly ib_umem_num_dma_blocks() iterations.
+ */
+#define rdma_umem_for_each_dma_block(umem, biter, pgsz)                        \
+	for (__rdma_umem_block_iter_start(biter, umem, pgsz);                  \
+	     __rdma_umem_block_iter_next(biter);)
+
+#endif /* _RDMA_ITER_H_ */

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 02/50] RDMA/umem: Allow including ib_umem header from any location
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 01/50] RDMA: Move DMA block iterator logic into dedicated files Leon Romanovsky
@ 2026-02-13 10:57 ` Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 03/50] RDMA/umem: Remove unnecessary includes and defines from ib_umem header Leon Romanovsky
                   ` (49 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

Including ib_umem.h from some locations currently triggers circular
module dependency errors such as the ones below. Resolve them by
removing the include of ib_verbs.h, which was only needed for the
struct ib_device pointer; a forward declaration is sufficient.

>> depmod: ERROR: Cycle detected: ib_core -> ib_uverbs -> ib_core
>> depmod: ERROR: Found 2 modules in dependency cycles!
  make[3]: *** [scripts/Makefile.modinst:132: depmod] Error 1
  make[3]: Target '__modinst' not remade because of errors.
  make[2]: *** [Makefile:1960: modules_install] Error 2
  make[1]: *** [Makefile:248: __sub-make] Error 2
  make[1]: Target 'modules_install' not remade because of errors.
  make: *** [Makefile:248: __sub-make] Error 2
  make: Target 'modules_install' not remade because of errors.
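
The forward declaration is enough because ib_umem.h only ever stores a
pointer to struct ib_device and never needs its layout. Abridged sketch
of the resulting header:

  /* rdma/ib_umem.h: no rdma/ib_verbs.h include required */
  struct ib_device;

  struct ib_umem {
          struct ib_device *ibdev; /* pointer only, incomplete type is fine */
          /* ... remaining members unchanged ... */
  };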

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 include/rdma/ib_umem.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index ce47688dd003..084a1d9a66f3 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -10,8 +10,8 @@
 #include <linux/list.h>
 #include <linux/scatterlist.h>
 #include <linux/workqueue.h>
-#include <rdma/ib_verbs.h>
 
+struct ib_device;
 struct ib_ucontext;
 struct ib_umem_odp;
 struct dma_buf_attach_ops;

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 03/50] RDMA/umem: Remove unnecessary includes and defines from ib_umem header
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 01/50] RDMA: Move DMA block iterator logic into dedicated files Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 02/50] RDMA/umem: Allow including ib_umem header from any location Leon Romanovsky
@ 2026-02-13 10:57 ` Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 04/50] RDMA/core: Promote UMEM to a core component Leon Romanovsky
                   ` (48 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

The ib_umem header no longer uses anything from these includes and
forward declarations, so drop them to reduce clutter.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 include/rdma/ib_umem.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 084a1d9a66f3..c3ab11e6879f 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -7,13 +7,9 @@
 #ifndef IB_UMEM_H
 #define IB_UMEM_H
 
-#include <linux/list.h>
 #include <linux/scatterlist.h>
-#include <linux/workqueue.h>
 
 struct ib_device;
-struct ib_ucontext;
-struct ib_umem_odp;
 struct dma_buf_attach_ops;
 
 struct ib_umem {

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 04/50] RDMA/core: Promote UMEM to a core component
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (2 preceding siblings ...)
  2026-02-13 10:57 ` [PATCH rdma-next 03/50] RDMA/umem: Remove unnecessary includes and defines from ib_umem header Leon Romanovsky
@ 2026-02-13 10:57 ` Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 05/50] RDMA/core: Manage CQ umem in core code Leon Romanovsky
                   ` (47 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

To manage UMEM objects at the core level and reuse the existing
ib_destroy_cq*() flow, move the UMEM files so that they are built into
ib_core. Without this, attempting to call ib_umem_release() from
verbs.c results in the following errors:

    depmod: ERROR: Cycle detected: ib_core -> ib_uverbs -> ib_core
    depmod: ERROR: Found 2 modules in dependency cycles!
    verbs.c:(.text+0x250c): undefined reference to `ib_umem_release'

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 48922e0ede56..ada9877d02df 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -16,6 +16,8 @@ ib_core-y :=			packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
 
 ib_core-$(CONFIG_SECURITY_INFINIBAND) += security.o
 ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o
+ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o umem_dmabuf.o
+ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o
 
 ib_cm-y :=			cm.o cm_trace.o
 
@@ -42,5 +44,3 @@ ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
 				uverbs_std_types_wq.o \
 				uverbs_std_types_qp.o \
 				ucaps.o
-ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o umem_dmabuf.o
-ib_uverbs-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 05/50] RDMA/core: Manage CQ umem in core code
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (3 preceding siblings ...)
  2026-02-13 10:57 ` [PATCH rdma-next 04/50] RDMA/core: Promote UMEM to a core component Leon Romanovsky
@ 2026-02-13 10:57 ` Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 06/50] RDMA/efa: Rely on CPU address in create-QP Leon Romanovsky
                   ` (46 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

In the current implementation, CQ umem ownership is split between
ib_core and the driver: ib_core sometimes creates and destroys it,
while the driver destroys it on other paths.

Store the umem in struct ib_cq and ensure that only ib_core manages
its lifetime, relying solely on its internal reference counter.
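
With this change the ownership rule is simple: ib_core owns cq->umem
for the whole CQ lifetime. A converted driver's destroy path reduces to
something like the following hedged sketch (drv_* names hypothetical):

  static int drv_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
  {
          struct drv_cq *cq = to_drv_cq(ibcq);

          drv_hw_destroy_cq(cq);
          /*
           * No ib_umem_release() here: ib_core releases ibcq->umem in
           * ib_destroy_cq_user() after ->destroy_cq() returns.
           */
          return 0;
  }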

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/umem.c                |  2 +-
 drivers/infiniband/core/uverbs_cmd.c          |  1 +
 drivers/infiniband/core/uverbs_std_types_cq.c |  7 ++++++-
 drivers/infiniband/core/verbs.c               |  2 ++
 drivers/infiniband/hw/efa/efa_verbs.c         | 24 +++++++++++-------------
 include/rdma/ib_verbs.h                       |  1 +
 6 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 8137031c2a65..fc70b918f3f0 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -283,7 +283,7 @@ EXPORT_SYMBOL(ib_umem_get);
  */
 void ib_umem_release(struct ib_umem *umem)
 {
-	if (!umem)
+	if (IS_ERR_OR_NULL(umem))
 		return;
 	if (umem->is_dmabuf)
 		return ib_umem_dmabuf_release(to_ib_umem_dmabuf(umem));
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index f4616deeca54..fb19395b9f2a 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1085,6 +1085,7 @@ static int create_cq(struct uverbs_attr_bundle *attrs,
 	return uverbs_response(attrs, &resp, sizeof(resp));
 
 err_free:
+	ib_umem_release(cq->umem);
 	rdma_restrack_put(&cq->res);
 	kfree(cq);
 err_file:
diff --git a/drivers/infiniband/core/uverbs_std_types_cq.c b/drivers/infiniband/core/uverbs_std_types_cq.c
index fab5d914029d..05809f9ff0f6 100644
--- a/drivers/infiniband/core/uverbs_std_types_cq.c
+++ b/drivers/infiniband/core/uverbs_std_types_cq.c
@@ -186,6 +186,11 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(
 	cq->comp_handler  = ib_uverbs_comp_handler;
 	cq->event_handler = ib_uverbs_cq_event_handler;
 	cq->cq_context    = ev_file ? &ev_file->ev_queue : NULL;
+	/*
+	 * If UMEM is not provided here, legacy drivers will set it during
+	 * CQ creation based on their internal udata.
+	 */
+	cq->umem = umem;
 	atomic_set(&cq->usecnt, 0);
 
 	rdma_restrack_new(&cq->res, RDMA_RESTRACK_CQ);
@@ -206,7 +211,7 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(
 	return ret;
 
 err_free:
-	ib_umem_release(umem);
+	ib_umem_release(cq->umem);
 	rdma_restrack_put(&cq->res);
 	kfree(cq);
 err_event_file:
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 47a97797d7be..ad48d2458a3f 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -49,6 +49,7 @@
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_cache.h>
 #include <rdma/ib_addr.h>
+#include <rdma/ib_umem.h>
 #include <rdma/rw.h>
 #include <rdma/lag.h>
 
@@ -2249,6 +2250,7 @@ int ib_destroy_cq_user(struct ib_cq *cq, struct ib_udata *udata)
 	if (ret)
 		return ret;
 
+	ib_umem_release(cq->umem);
 	rdma_restrack_del(&cq->res);
 	kfree(cq);
 	return ret;
diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
index 19e3033d4ff7..ae9b98b4b528 100644
--- a/drivers/infiniband/hw/efa/efa_verbs.c
+++ b/drivers/infiniband/hw/efa/efa_verbs.c
@@ -1083,15 +1083,14 @@ int efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 		  cq->cq_idx, cq->cpu_addr, cq->size, &cq->dma_addr);
 
 	efa_destroy_cq_idx(dev, cq->cq_idx);
-	efa_cq_user_mmap_entries_remove(cq);
+	if (cq->cpu_addr)
+		efa_cq_user_mmap_entries_remove(cq);
 	if (cq->eq) {
 		xa_erase(&dev->cqs_xa, cq->cq_idx);
 		synchronize_irq(cq->eq->irq.irqn);
 	}
 
-	if (cq->umem)
-		ib_umem_release(cq->umem);
-	else
+	if (cq->cpu_addr)
 		efa_free_mapped(dev, cq->cpu_addr, cq->dma_addr, cq->size, DMA_FROM_DEVICE);
 	return 0;
 }
@@ -1212,22 +1211,20 @@ int efa_create_cq_umem(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	cq->ucontext = ucontext;
 	cq->size = PAGE_ALIGN(cmd.cq_entry_size * entries * cmd.num_sub_cqs);
 
-	if (umem) {
-		if (umem->length < cq->size) {
+	if (ibcq->umem) {
+		if (ibcq->umem->length < cq->size) {
 			ibdev_dbg(&dev->ibdev, "External memory too small\n");
 			err = -EINVAL;
 			goto err_out;
 		}
 
-		if (!ib_umem_is_contiguous(umem)) {
+		if (!ib_umem_is_contiguous(ibcq->umem)) {
 			ibdev_dbg(&dev->ibdev, "Non contiguous CQ unsupported\n");
 			err = -EINVAL;
 			goto err_out;
 		}
 
-		cq->cpu_addr = NULL;
-		cq->dma_addr = ib_umem_start_dma_addr(umem);
-		cq->umem = umem;
+		cq->dma_addr = ib_umem_start_dma_addr(ibcq->umem);
 	} else {
 		cq->cpu_addr = efa_zalloc_mapped(dev, &cq->dma_addr, cq->size,
 						 DMA_FROM_DEVICE);
@@ -1259,7 +1256,7 @@ int efa_create_cq_umem(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	cq->ibcq.cqe = result.actual_depth;
 	WARN_ON_ONCE(entries != result.actual_depth);
 
-	if (!umem)
+	if (cq->cpu_addr)
 		err = cq_mmap_entries_setup(dev, cq, &resp, result.db_valid);
 
 	if (err) {
@@ -1296,11 +1293,12 @@ int efa_create_cq_umem(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	if (cq->eq)
 		xa_erase(&dev->cqs_xa, cq->cq_idx);
 err_remove_mmap:
-	efa_cq_user_mmap_entries_remove(cq);
+	if (cq->cpu_addr)
+		efa_cq_user_mmap_entries_remove(cq);
 err_destroy_cq:
 	efa_destroy_cq_idx(dev, cq->cq_idx);
 err_free_mapped:
-	if (!umem)
+	if (cq->cpu_addr)
 		efa_free_mapped(dev, cq->cpu_addr, cq->dma_addr, cq->size,
 				DMA_FROM_DEVICE);
 err_out:
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index e1ec5a6c74e6..b1e34fd2ed5f 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1649,6 +1649,7 @@ struct ib_cq {
 	u8 interrupt:1;
 	u8 shared:1;
 	unsigned int comp_vector;
+	struct ib_umem *umem;
 
 	/*
 	 * Implementation details of the RDMA core, don't use in drivers:

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 06/50] RDMA/efa: Rely on CPU address in create-QP
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (4 preceding siblings ...)
  2026-02-13 10:57 ` [PATCH rdma-next 05/50] RDMA/core: Manage CQ umem in core code Leon Romanovsky
@ 2026-02-13 10:57 ` Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 07/50] RDMA/core: Prepare create CQ path for API unification Leon Romanovsky
                   ` (45 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

Test qp->rq_cpu_addr instead of qp->rq_size, aligning this code with
other locations where efa_free_mapped() depends on the presence of a
valid CPU address (which is present exactly when qp->rq_size != 0).

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/efa/efa_verbs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
index ae9b98b4b528..bc69aef3e436 100644
--- a/drivers/infiniband/hw/efa/efa_verbs.c
+++ b/drivers/infiniband/hw/efa/efa_verbs.c
@@ -579,7 +579,7 @@ static int qp_mmap_entries_setup(struct efa_qp *qp,
 
 	resp->llq_desc_offset &= ~PAGE_MASK;
 
-	if (qp->rq_size) {
+	if (qp->rq_cpu_addr) {
 		address = dev->db_bar_addr + resp->rq_db_offset;
 
 		qp->rq_db_mmap_entry =
@@ -828,7 +828,7 @@ int efa_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *init_attr,
 err_destroy_qp:
 	efa_destroy_qp_handle(dev, create_qp_resp.qp_handle);
 err_free_mapped:
-	if (qp->rq_size)
+	if (qp->rq_cpu_addr)
 		efa_free_mapped(dev, qp->rq_cpu_addr, qp->rq_dma_addr,
 				qp->rq_size, DMA_TO_DEVICE);
 err_out:

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 07/50] RDMA/core: Prepare create CQ path for API unification
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (5 preceding siblings ...)
  2026-02-13 10:57 ` [PATCH rdma-next 06/50] RDMA/efa: Rely on CPU address in create-QP Leon Romanovsky
@ 2026-02-13 10:57 ` Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 08/50] RDMA/core: Reject zero CQE count Leon Romanovsky
                   ` (44 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

Rename .create_cq_umem() to .create_user_cq() and drop its explicit
umem argument, which is now delivered through cq->umem. This gives
.create_user_cq() and .create_cq() the same API contract, allowing
drivers to be gradually migrated to the umem-aware CQ management flow.
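
For a driver, the conversion amounts to renaming the op and reading the
umem from the CQ itself. A hedged sketch with hypothetical drv_* names:

  static int drv_create_user_cq(struct ib_cq *ibcq,
                                const struct ib_cq_init_attr *attr,
                                struct uverbs_attr_bundle *attrs)
  {
          /*
           * ibcq->umem is pre-set by ib_core when userspace supplied
           * external CQ memory; otherwise it is NULL and the driver
           * may set it from its own udata parsing.
           */
          if (ibcq->umem && !ib_umem_is_contiguous(ibcq->umem))
                  return -EINVAL;

          return drv_hw_create_cq(to_drv_cq(ibcq), attr->cqe);
  }

  static const struct ib_device_ops drv_dev_ops = {
          .create_user_cq = drv_create_user_cq,
          /* .create_cq is still used for kernel-verbs consumers */
  };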

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/device.c              |  2 +-
 drivers/infiniband/core/uverbs_cmd.c          |  5 ++++-
 drivers/infiniband/core/uverbs_std_types_cq.c | 16 +++++++++++-----
 drivers/infiniband/core/verbs.c               |  6 +++++-
 drivers/infiniband/hw/efa/efa.h               |  6 ++----
 drivers/infiniband/hw/efa/efa_main.c          |  3 +--
 drivers/infiniband/hw/efa/efa_verbs.c         | 10 ++--------
 include/rdma/ib_verbs.h                       |  3 +--
 8 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 4e09f6e0995e..9209b8c664ef 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2701,7 +2701,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, create_ah);
 	SET_DEVICE_OP(dev_ops, create_counters);
 	SET_DEVICE_OP(dev_ops, create_cq);
-	SET_DEVICE_OP(dev_ops, create_cq_umem);
+	SET_DEVICE_OP(dev_ops, create_user_cq);
 	SET_DEVICE_OP(dev_ops, create_flow);
 	SET_DEVICE_OP(dev_ops, create_qp);
 	SET_DEVICE_OP(dev_ops, create_rwq_ind_table);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index fb19395b9f2a..c7be592f60e8 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1068,7 +1068,10 @@ static int create_cq(struct uverbs_attr_bundle *attrs,
 	rdma_restrack_new(&cq->res, RDMA_RESTRACK_CQ);
 	rdma_restrack_set_name(&cq->res, NULL);
 
-	ret = ib_dev->ops.create_cq(cq, &attr, attrs);
+	if (ib_dev->ops.create_user_cq)
+		ret = ib_dev->ops.create_user_cq(cq, &attr, attrs);
+	else
+		ret = ib_dev->ops.create_cq(cq, &attr, attrs);
 	if (ret)
 		goto err_free;
 	rdma_restrack_add(&cq->res);
diff --git a/drivers/infiniband/core/uverbs_std_types_cq.c b/drivers/infiniband/core/uverbs_std_types_cq.c
index 05809f9ff0f6..b999d8d62694 100644
--- a/drivers/infiniband/core/uverbs_std_types_cq.c
+++ b/drivers/infiniband/core/uverbs_std_types_cq.c
@@ -78,7 +78,8 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(
 	int buffer_fd;
 	int ret;
 
-	if ((!ib_dev->ops.create_cq && !ib_dev->ops.create_cq_umem) || !ib_dev->ops.destroy_cq)
+	if ((!ib_dev->ops.create_cq && !ib_dev->ops.create_user_cq) ||
+	    !ib_dev->ops.destroy_cq)
 		return -EOPNOTSUPP;
 
 	ret = uverbs_copy_from(&attr.comp_vector, attrs,
@@ -130,7 +131,7 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(
 
 		if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_CREATE_CQ_BUFFER_FD) ||
 		    uverbs_attr_is_valid(attrs, UVERBS_ATTR_CREATE_CQ_BUFFER_OFFSET) ||
-		    !ib_dev->ops.create_cq_umem) {
+		    !ib_dev->ops.create_user_cq) {
 			ret = -EINVAL;
 			goto err_event_file;
 		}
@@ -155,7 +156,7 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(
 			goto err_event_file;
 
 		if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_CREATE_CQ_BUFFER_VA) ||
-		    !ib_dev->ops.create_cq_umem) {
+		    !ib_dev->ops.create_user_cq) {
 			ret = -EINVAL;
 			goto err_event_file;
 		}
@@ -196,11 +197,16 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(
 	rdma_restrack_new(&cq->res, RDMA_RESTRACK_CQ);
 	rdma_restrack_set_name(&cq->res, NULL);
 
-	ret = umem ? ib_dev->ops.create_cq_umem(cq, &attr, umem, attrs) :
-		ib_dev->ops.create_cq(cq, &attr, attrs);
+	if (ib_dev->ops.create_user_cq)
+		ret = ib_dev->ops.create_user_cq(cq, &attr, attrs);
+	else
+		ret = ib_dev->ops.create_cq(cq, &attr, attrs);
 	if (ret)
 		goto err_free;
 
+	/* Check that the driver didn't overwrite the pre-set umem */
+	WARN_ON(umem && cq->umem != umem);
+
 	obj->uevent.uobject.object = cq;
 	obj->uevent.uobject.user_handle = user_handle;
 	rdma_restrack_add(&cq->res);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index ad48d2458a3f..d0880346ebe2 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -2204,7 +2204,6 @@ struct ib_cq *__ib_create_cq(struct ib_device *device,
 		return ERR_PTR(-ENOMEM);
 
 	cq->device = device;
-	cq->uobject = NULL;
 	cq->comp_handler = comp_handler;
 	cq->event_handler = event_handler;
 	cq->cq_context = cq_context;
@@ -2219,6 +2218,11 @@ struct ib_cq *__ib_create_cq(struct ib_device *device,
 		kfree(cq);
 		return ERR_PTR(ret);
 	}
+	/*
+	 * We are in the kernel verbs flow and drivers are not allowed
+	 * to set the umem pointer; it must stay NULL.
+	 */
+	WARN_ON_ONCE(cq->umem);
 
 	rdma_restrack_add(&cq->res);
 	return cq;
diff --git a/drivers/infiniband/hw/efa/efa.h b/drivers/infiniband/hw/efa/efa.h
index 96f9c3bc98b2..00b19f2ba3da 100644
--- a/drivers/infiniband/hw/efa/efa.h
+++ b/drivers/infiniband/hw/efa/efa.h
@@ -161,10 +161,8 @@ int efa_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata);
 int efa_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *init_attr,
 		  struct ib_udata *udata);
 int efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata);
-int efa_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
-		  struct uverbs_attr_bundle *attrs);
-int efa_create_cq_umem(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
-		       struct ib_umem *umem, struct uverbs_attr_bundle *attrs);
+int efa_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+		       struct uverbs_attr_bundle *attrs);
 struct ib_mr *efa_reg_mr(struct ib_pd *ibpd, u64 start, u64 length,
 			 u64 virt_addr, int access_flags,
 			 struct ib_dmah *dmah,
diff --git a/drivers/infiniband/hw/efa/efa_main.c b/drivers/infiniband/hw/efa/efa_main.c
index 6c415b9adb5f..a1d68dc49e45 100644
--- a/drivers/infiniband/hw/efa/efa_main.c
+++ b/drivers/infiniband/hw/efa/efa_main.c
@@ -371,8 +371,7 @@ static const struct ib_device_ops efa_dev_ops = {
 	.alloc_hw_device_stats = efa_alloc_hw_device_stats,
 	.alloc_pd = efa_alloc_pd,
 	.alloc_ucontext = efa_alloc_ucontext,
-	.create_cq = efa_create_cq,
-	.create_cq_umem = efa_create_cq_umem,
+	.create_user_cq = efa_create_user_cq,
 	.create_qp = efa_create_qp,
 	.create_user_ah = efa_create_ah,
 	.dealloc_pd = efa_dealloc_pd,
diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
index bc69aef3e436..d465e6acfe3c 100644
--- a/drivers/infiniband/hw/efa/efa_verbs.c
+++ b/drivers/infiniband/hw/efa/efa_verbs.c
@@ -1130,8 +1130,8 @@ static int cq_mmap_entries_setup(struct efa_dev *dev, struct efa_cq *cq,
 	return 0;
 }
 
-int efa_create_cq_umem(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
-		       struct ib_umem *umem, struct uverbs_attr_bundle *attrs)
+int efa_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+		       struct uverbs_attr_bundle *attrs)
 {
 	struct ib_udata *udata = &attrs->driver_udata;
 	struct efa_ucontext *ucontext = rdma_udata_to_drv_context(
@@ -1306,12 +1306,6 @@ int efa_create_cq_umem(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	return err;
 }
 
-int efa_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
-		  struct uverbs_attr_bundle *attrs)
-{
-	return efa_create_cq_umem(ibcq, attr, NULL, attrs);
-}
-
 static int umem_to_page_list(struct efa_dev *dev,
 			     struct ib_umem *umem,
 			     u64 *page_list,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index b1e34fd2ed5f..67aa5fc2c0b7 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2529,9 +2529,8 @@ struct ib_device_ops {
 	int (*destroy_qp)(struct ib_qp *qp, struct ib_udata *udata);
 	int (*create_cq)(struct ib_cq *cq, const struct ib_cq_init_attr *attr,
 			 struct uverbs_attr_bundle *attrs);
-	int (*create_cq_umem)(struct ib_cq *cq,
+	int (*create_user_cq)(struct ib_cq *cq,
 			      const struct ib_cq_init_attr *attr,
-			      struct ib_umem *umem,
 			      struct uverbs_attr_bundle *attrs);
 	int (*modify_cq)(struct ib_cq *cq, u16 cq_count, u16 cq_period);
 	int (*destroy_cq)(struct ib_cq *cq, struct ib_udata *udata);

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 08/50] RDMA/core: Reject zero CQE count
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (6 preceding siblings ...)
  2026-02-13 10:57 ` [PATCH rdma-next 07/50] RDMA/core: Prepare create CQ path for API unification Leon Romanovsky
@ 2026-02-13 10:57 ` Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 09/50] RDMA/efa: Remove check for " Leon Romanovsky
                   ` (43 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

All drivers already ensure that the number of CQEs is at least 1.
Add this validation to the core so drivers no longer need to repeat it.
Future patches converting to the .create_user_cq() interface will remove
the per-driver checks.
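
For reviewers unfamiliar with the shorthand in the uverbs handler below:
"ret ? : -EINVAL" is the GNU "elvis" operator, equivalent to
"ret ? ret : -EINVAL", i.e. propagate the copy error if there is one and
reject a zero count otherwise. A minimal sketch of the shared pattern
(names as in the hunks below):

	ret = uverbs_copy_from(&attr.cqe, attrs, UVERBS_ATTR_CREATE_CQ_CQE);
	if (ret || !attr.cqe)
		return ret ? : -EINVAL;	/* copy error, or zero CQE count */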

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cq.c                  |  3 +++
 drivers/infiniband/core/uverbs_cmd.c          |  3 +++
 drivers/infiniband/core/uverbs_std_types_cq.c | 15 +++++++++------
 drivers/infiniband/core/verbs.c               |  5 +++++
 4 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index 584537c71545..7e0b54ec4141 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -220,6 +220,9 @@ struct ib_cq *__ib_alloc_cq(struct ib_device *dev, void *private, int nr_cqe,
 	struct ib_cq *cq;
 	int ret = -ENOMEM;
 
+	if (WARN_ON_ONCE(!nr_cqe))
+		return ERR_PTR(-EINVAL);
+
 	cq = rdma_zalloc_drv_obj(dev, ib_cq);
 	if (!cq)
 		return ERR_PTR(ret);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index c7be592f60e8..041bed7a43b4 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1032,6 +1032,9 @@ static int create_cq(struct uverbs_attr_bundle *attrs,
 	if (cmd->comp_vector >= attrs->ufile->device->num_comp_vectors)
 		return -EINVAL;
 
+	if (!cmd->cqe)
+		return -EINVAL;
+
 	obj = (struct ib_ucq_object *)uobj_alloc(UVERBS_OBJECT_CQ, attrs,
 						 &ib_dev);
 	if (IS_ERR(obj))
diff --git a/drivers/infiniband/core/uverbs_std_types_cq.c b/drivers/infiniband/core/uverbs_std_types_cq.c
index b999d8d62694..d2c8f71f934c 100644
--- a/drivers/infiniband/core/uverbs_std_types_cq.c
+++ b/drivers/infiniband/core/uverbs_std_types_cq.c
@@ -84,12 +84,15 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(
 
 	ret = uverbs_copy_from(&attr.comp_vector, attrs,
 			       UVERBS_ATTR_CREATE_CQ_COMP_VECTOR);
-	if (!ret)
-		ret = uverbs_copy_from(&attr.cqe, attrs,
-				       UVERBS_ATTR_CREATE_CQ_CQE);
-	if (!ret)
-		ret = uverbs_copy_from(&user_handle, attrs,
-				       UVERBS_ATTR_CREATE_CQ_USER_HANDLE);
+	if (ret)
+		return ret;
+
+	ret = uverbs_copy_from(&attr.cqe, attrs, UVERBS_ATTR_CREATE_CQ_CQE);
+	if (ret || !attr.cqe)
+		return ret ? : -EINVAL;
+
+	ret = uverbs_copy_from(&user_handle, attrs,
+			       UVERBS_ATTR_CREATE_CQ_USER_HANDLE);
 	if (ret)
 		return ret;
 
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index d0880346ebe2..9d075eeda463 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -2203,6 +2203,11 @@ struct ib_cq *__ib_create_cq(struct ib_device *device,
 	if (!cq)
 		return ERR_PTR(-ENOMEM);
 
+	if (WARN_ON_ONCE(!cq_attr->cqe)) {
+		kfree(cq);
+		return ERR_PTR(-EINVAL);
+	}
+
 	cq->device = device;
 	cq->comp_handler = comp_handler;
 	cq->event_handler = event_handler;

-- 
2.52.0



* [PATCH rdma-next 09/50] RDMA/efa: Remove check for zero CQE count
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (7 preceding siblings ...)
  2026-02-13 10:57 ` [PATCH rdma-next 08/50] RDMA/core: Reject zero CQE count Leon Romanovsky
@ 2026-02-13 10:57 ` Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 10/50] RDMA/mlx5: Save 4 bytes in CQ structure Leon Romanovsky
                   ` (42 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

Since ib_core now handles validation, the device driver no longer needs
to verify that the CQE count is non-zero.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/efa/efa_verbs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
index d465e6acfe3c..e8fb99b61be8 100644
--- a/drivers/infiniband/hw/efa/efa_verbs.c
+++ b/drivers/infiniband/hw/efa/efa_verbs.c
@@ -1152,9 +1152,9 @@ int efa_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	if (attr->flags)
 		return -EOPNOTSUPP;
 
-	if (entries < 1 || entries > dev->dev_attr.max_cq_depth) {
+	if (entries > dev->dev_attr.max_cq_depth) {
 		ibdev_dbg(ibdev,
-			  "cq: requested entries[%u] non-positive or greater than max[%u]\n",
+			  "cq: requested entries[%u] greater than max[%u]\n",
 			  entries, dev->dev_attr.max_cq_depth);
 		err = -EINVAL;
 		goto err_out;

-- 
2.52.0



* [PATCH rdma-next 10/50] RDMA/mlx5: Save 4 bytes in CQ structure
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (8 preceding siblings ...)
  2026-02-13 10:57 ` [PATCH rdma-next 09/50] RDMA/efa: Remove check for " Leon Romanovsky
@ 2026-02-13 10:57 ` Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 11/50] RDMA/mlx5: Provide a modern CQ creation interface Leon Romanovsky
                   ` (41 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

There is no need to maintain two separate, sparsely used flag fields,
create_flags and private_flags. Folding the former into the latter saves
4 bytes per CQ.
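
The mechanism, sketched with the names this patch introduces: the uverbs
flag is translated once at create time into a driver-private bit, and
only that bit is consulted afterwards (context and error handling
elided):

	/* at CQ creation */
	if (attr->flags & IB_UVERBS_CQ_FLAGS_TIMESTAMP_COMPLETION)
		cq->private_flags |= MLX5_IB_CQ_PR_TIMESTAMP_COMPLETION;

	/* later, when choosing the QP timestamp format in qp.c */
	if (cq->private_flags & MLX5_IB_CQ_PR_TIMESTAMP_COMPLETION) {
		/* free-running timestamp format was requested */
		...
	}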

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/cq.c      | 5 +++--
 drivers/infiniband/hw/mlx5/mlx5_ib.h | 2 +-
 drivers/infiniband/hw/mlx5/qp.c      | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 651d76bca114..1b4290166e87 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -983,7 +983,8 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	spin_lock_init(&cq->lock);
 	cq->resize_buf = NULL;
 	cq->resize_umem = NULL;
-	cq->create_flags = attr->flags;
+	if (attr->flags & IB_UVERBS_CQ_FLAGS_TIMESTAMP_COMPLETION)
+		cq->private_flags |= MLX5_IB_CQ_PR_TIMESTAMP_COMPLETION;
 	INIT_LIST_HEAD(&cq->list_send_qp);
 	INIT_LIST_HEAD(&cq->list_recv_qp);
 
@@ -1017,7 +1018,7 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	MLX5_SET(cqc, cqc, uar_page, index);
 	MLX5_SET(cqc, cqc, c_eqn_or_apu_element, eqn);
 	MLX5_SET64(cqc, cqc, dbr_addr, cq->db.dma);
-	if (cq->create_flags & IB_UVERBS_CQ_FLAGS_IGNORE_OVERRUN)
+	if (attr->flags & IB_UVERBS_CQ_FLAGS_IGNORE_OVERRUN)
 		MLX5_SET(cqc, cqc, oi, 1);
 
 	if (udata) {
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 4f4114d95130..ce3372aea48b 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -561,6 +561,7 @@ struct mlx5_ib_cq_buf {
 enum mlx5_ib_cq_pr_flags {
 	MLX5_IB_CQ_PR_FLAGS_CQE_128_PAD	= 1 << 0,
 	MLX5_IB_CQ_PR_FLAGS_REAL_TIME_TS = 1 << 1,
+	MLX5_IB_CQ_PR_TIMESTAMP_COMPLETION = 1 << 2,
 };
 
 struct mlx5_ib_cq {
@@ -581,7 +582,6 @@ struct mlx5_ib_cq {
 	int			cqe_size;
 	struct list_head	list_send_qp;
 	struct list_head	list_recv_qp;
-	u32			create_flags;
 	struct list_head	wc_list;
 	enum ib_cq_notify_flags notify_flags;
 	struct work_struct	notify_work;
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 0324909e3151..7af09e668c4c 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1274,7 +1274,7 @@ static int get_ts_format(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq,
 		}
 		return MLX5_TIMESTAMP_FORMAT_REAL_TIME;
 	}
-	if (cq->create_flags & IB_UVERBS_CQ_FLAGS_TIMESTAMP_COMPLETION) {
+	if (cq->private_flags & MLX5_IB_CQ_PR_TIMESTAMP_COMPLETION) {
 		if (!fr_sup) {
 			mlx5_ib_dbg(dev,
 				    "Free running TS format is not supported\n");

-- 
2.52.0



* [PATCH rdma-next 11/50] RDMA/mlx5: Provide a modern CQ creation interface
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (9 preceding siblings ...)
  2026-02-13 10:57 ` [PATCH rdma-next 10/50] RDMA/mlx5: Save 4 bytes in CQ structure Leon Romanovsky
@ 2026-02-13 10:57 ` Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 12/50] RDMA/mlx4: Inline mlx4_ib_get_cq_umem into callers Leon Romanovsky
                   ` (40 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

The uverbs CQ creation UAPI allows users to supply their own umem for a CQ.
Update mlx5 to support this workflow while preserving support for creating
umem through the legacy interface.
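
The conversion follows the umem ownership pattern used across this
series (a minimal sketch, not the full function): pin the buffer only
when ib_core has not already attached a user-supplied umem, and on
failure leave the release to ib_core:

	if (!cq->ibcq.umem)
		cq->ibcq.umem = ib_umem_get(&dev->ib_dev, ucmd.buf_addr,
					    entries * ucmd.cqe_size,
					    IB_ACCESS_LOCAL_WRITE);
	if (IS_ERR(cq->ibcq.umem))
		return PTR_ERR(cq->ibcq.umem);

	/* later error paths must not call ib_umem_release() themselves */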

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/cq.c      | 153 +++++++++++++++++++++++------------
 drivers/infiniband/hw/mlx5/main.c    |   1 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h |   3 +
 3 files changed, 106 insertions(+), 51 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 1b4290166e87..52a435efd0de 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -749,16 +749,15 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
 
 	*cqe_size = ucmd.cqe_size;
 
-	cq->buf.umem =
-		ib_umem_get(&dev->ib_dev, ucmd.buf_addr,
-			    entries * ucmd.cqe_size, IB_ACCESS_LOCAL_WRITE);
-	if (IS_ERR(cq->buf.umem)) {
-		err = PTR_ERR(cq->buf.umem);
-		return err;
-	}
+	if (!cq->ibcq.umem)
+		cq->ibcq.umem = ib_umem_get(&dev->ib_dev, ucmd.buf_addr,
+					    entries * ucmd.cqe_size,
+					    IB_ACCESS_LOCAL_WRITE);
+	if (IS_ERR(cq->ibcq.umem))
+		return PTR_ERR(cq->ibcq.umem);
 
 	page_size = mlx5_umem_find_best_cq_quantized_pgoff(
-		cq->buf.umem, cqc, log_page_size, MLX5_ADAPTER_PAGE_SHIFT,
+		cq->ibcq.umem, cqc, log_page_size, MLX5_ADAPTER_PAGE_SHIFT,
 		page_offset, 64, &page_offset_quantized);
 	if (!page_size) {
 		err = -EINVAL;
@@ -769,12 +768,12 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
 	if (err)
 		goto err_umem;
 
-	ncont = ib_umem_num_dma_blocks(cq->buf.umem, page_size);
+	ncont = ib_umem_num_dma_blocks(cq->ibcq.umem, page_size);
 	mlx5_ib_dbg(
 		dev,
 		"addr 0x%llx, size %u, npages %zu, page_size %lu, ncont %d\n",
 		ucmd.buf_addr, entries * ucmd.cqe_size,
-		ib_umem_num_pages(cq->buf.umem), page_size, ncont);
+		ib_umem_num_pages(cq->ibcq.umem), page_size, ncont);
 
 	*inlen = MLX5_ST_SZ_BYTES(create_cq_in) +
 		 MLX5_FLD_SZ_BYTES(create_cq_in, pas[0]) * ncont;
@@ -785,7 +784,7 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
 	}
 
 	pas = (__be64 *)MLX5_ADDR_OF(create_cq_in, *cqb, pas);
-	mlx5_ib_populate_pas(cq->buf.umem, page_size, pas, 0);
+	mlx5_ib_populate_pas(cq->ibcq.umem, page_size, pas, 0);
 
 	cqc = MLX5_ADDR_OF(create_cq_in, *cqb, cq_context);
 	MLX5_SET(cqc, cqc, log_page_size,
@@ -858,7 +857,7 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
 	mlx5_ib_db_unmap_user(context, &cq->db);
 
 err_umem:
-	ib_umem_release(cq->buf.umem);
+	/* UMEM is released by ib_core */
 	return err;
 }
 
@@ -868,7 +867,6 @@ static void destroy_cq_user(struct mlx5_ib_cq *cq, struct ib_udata *udata)
 		udata, struct mlx5_ib_ucontext, ibucontext);
 
 	mlx5_ib_db_unmap_user(context, &cq->db);
-	ib_umem_release(cq->buf.umem);
 }
 
 static void init_cq_frag_buf(struct mlx5_ib_cq_buf *buf)
@@ -949,8 +947,9 @@ static void notify_soft_wc_handler(struct work_struct *work)
 	cq->ibcq.comp_handler(&cq->ibcq, cq->ibcq.cq_context);
 }
 
-int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
-		      struct uverbs_attr_bundle *attrs)
+int mlx5_ib_create_user_cq(struct ib_cq *ibcq,
+			   const struct ib_cq_init_attr *attr,
+			   struct uverbs_attr_bundle *attrs)
 {
 	struct ib_udata *udata = &attrs->driver_udata;
 	struct ib_device *ibdev = ibcq->device;
@@ -967,8 +966,7 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	int eqn;
 	int err;
 
-	if (entries < 0 ||
-	    (entries > (1 << MLX5_CAP_GEN(dev->mdev, log_max_cq_sz))))
+	if (attr->cqe > (1 << MLX5_CAP_GEN(dev->mdev, log_max_cq_sz)))
 		return -EINVAL;
 
 	if (check_cq_create_flags(attr->flags))
@@ -981,27 +979,15 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	cq->ibcq.cqe = entries - 1;
 	mutex_init(&cq->resize_mutex);
 	spin_lock_init(&cq->lock);
-	cq->resize_buf = NULL;
-	cq->resize_umem = NULL;
 	if (attr->flags & IB_UVERBS_CQ_FLAGS_TIMESTAMP_COMPLETION)
 		cq->private_flags |= MLX5_IB_CQ_PR_TIMESTAMP_COMPLETION;
 	INIT_LIST_HEAD(&cq->list_send_qp);
 	INIT_LIST_HEAD(&cq->list_recv_qp);
 
-	if (udata) {
-		err = create_cq_user(dev, udata, cq, entries, &cqb, &cqe_size,
-				     &index, &inlen, attrs);
-		if (err)
-			return err;
-	} else {
-		cqe_size = cache_line_size() == 128 ? 128 : 64;
-		err = create_cq_kernel(dev, cq, entries, cqe_size, &cqb,
-				       &index, &inlen);
-		if (err)
-			return err;
-
-		INIT_WORK(&cq->notify_work, notify_soft_wc_handler);
-	}
+	err = create_cq_user(dev, udata, cq, entries, &cqb, &cqe_size, &index,
+			     &inlen, attrs);
+	if (err)
+		return err;
 
 	err = mlx5_comp_eqn_get(dev->mdev, vector, &eqn);
 	if (err)
@@ -1021,12 +1007,8 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	if (attr->flags & IB_UVERBS_CQ_FLAGS_IGNORE_OVERRUN)
 		MLX5_SET(cqc, cqc, oi, 1);
 
-	if (udata) {
-		cq->mcq.comp = mlx5_add_cq_to_tasklet;
-		cq->mcq.tasklet_ctx.comp = mlx5_ib_cq_comp;
-	} else {
-		cq->mcq.comp  = mlx5_ib_cq_comp;
-	}
+	cq->mcq.comp = mlx5_add_cq_to_tasklet;
+	cq->mcq.tasklet_ctx.comp = mlx5_ib_cq_comp;
 
 	err = mlx5_core_create_cq(dev->mdev, &cq->mcq, cqb, inlen, out, sizeof(out));
 	if (err)
@@ -1037,12 +1019,10 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 
 	INIT_LIST_HEAD(&cq->wc_list);
 
-	if (udata)
-		if (ib_copy_to_udata(udata, &cq->mcq.cqn, sizeof(__u32))) {
-			err = -EFAULT;
-			goto err_cmd;
-		}
-
+	if (ib_copy_to_udata(udata, &cq->mcq.cqn, sizeof(__u32))) {
+		err = -EFAULT;
+		goto err_cmd;
+	}
 
 	kvfree(cqb);
 	return 0;
@@ -1052,10 +1032,81 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 
 err_cqb:
 	kvfree(cqb);
-	if (udata)
-		destroy_cq_user(cq, udata);
-	else
-		destroy_cq_kernel(dev, cq);
+	destroy_cq_user(cq, udata);
+	return err;
+}
+
+int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+		      struct uverbs_attr_bundle *attrs)
+{
+	struct ib_device *ibdev = ibcq->device;
+	int entries = attr->cqe;
+	int vector = attr->comp_vector;
+	struct mlx5_ib_dev *dev = to_mdev(ibdev);
+	struct mlx5_ib_cq *cq = to_mcq(ibcq);
+	u32 out[MLX5_ST_SZ_DW(create_cq_out)];
+	int index;
+	int inlen;
+	u32 *cqb = NULL;
+	void *cqc;
+	int cqe_size;
+	int eqn;
+	int err;
+
+	if (attr->cqe > (1 << MLX5_CAP_GEN(dev->mdev, log_max_cq_sz)))
+		return -EINVAL;
+
+	entries = roundup_pow_of_two(entries + 1);
+	if (entries > (1 << MLX5_CAP_GEN(dev->mdev, log_max_cq_sz)))
+		return -EINVAL;
+
+	cq->ibcq.cqe = entries - 1;
+	mutex_init(&cq->resize_mutex);
+	spin_lock_init(&cq->lock);
+	INIT_LIST_HEAD(&cq->list_send_qp);
+	INIT_LIST_HEAD(&cq->list_recv_qp);
+
+	cqe_size = cache_line_size() == 128 ? 128 : 64;
+	err = create_cq_kernel(dev, cq, entries, cqe_size, &cqb, &index,
+			       &inlen);
+	if (err)
+		return err;
+
+	INIT_WORK(&cq->notify_work, notify_soft_wc_handler);
+
+	err = mlx5_comp_eqn_get(dev->mdev, vector, &eqn);
+	if (err)
+		goto err_cqb;
+
+	cq->cqe_size = cqe_size;
+
+	cqc = MLX5_ADDR_OF(create_cq_in, cqb, cq_context);
+	MLX5_SET(cqc, cqc, cqe_sz,
+		 cqe_sz_to_mlx_sz(cqe_size,
+				  cq->private_flags &
+				  MLX5_IB_CQ_PR_FLAGS_CQE_128_PAD));
+	MLX5_SET(cqc, cqc, log_cq_size, ilog2(entries));
+	MLX5_SET(cqc, cqc, uar_page, index);
+	MLX5_SET(cqc, cqc, c_eqn_or_apu_element, eqn);
+	MLX5_SET64(cqc, cqc, dbr_addr, cq->db.dma);
+
+	cq->mcq.comp = mlx5_ib_cq_comp;
+
+	err = mlx5_core_create_cq(dev->mdev, &cq->mcq, cqb, inlen, out,
+				  sizeof(out));
+	if (err)
+		goto err_cqb;
+
+	mlx5_ib_dbg(dev, "cqn 0x%x\n", cq->mcq.cqn);
+	cq->mcq.event = mlx5_ib_cq_event;
+
+	INIT_LIST_HEAD(&cq->wc_list);
+	kvfree(cqb);
+	return 0;
+
+err_cqb:
+	kvfree(cqb);
+	destroy_cq_kernel(dev, cq);
 	return err;
 }
 
@@ -1390,8 +1442,8 @@ int mlx5_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
 
 	if (udata) {
 		cq->ibcq.cqe = entries - 1;
-		ib_umem_release(cq->buf.umem);
-		cq->buf.umem = cq->resize_umem;
+		ib_umem_release(cq->ibcq.umem);
+		cq->ibcq.umem = cq->resize_umem;
 		cq->resize_umem = NULL;
 	} else {
 		struct mlx5_ib_cq_buf tbuf;
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index eba023b7af0f..4f49f65e2c16 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -4447,6 +4447,7 @@ static const struct ib_device_ops mlx5_ib_dev_ops = {
 	.check_mr_status = mlx5_ib_check_mr_status,
 	.create_ah = mlx5_ib_create_ah,
 	.create_cq = mlx5_ib_create_cq,
+	.create_user_cq = mlx5_ib_create_user_cq,
 	.create_qp = mlx5_ib_create_qp,
 	.create_srq = mlx5_ib_create_srq,
 	.create_user_ah = mlx5_ib_create_ah,
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index ce3372aea48b..2556e326afde 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1371,6 +1371,9 @@ int mlx5_ib_read_wqe_srq(struct mlx5_ib_srq *srq, int wqe_index, void *buffer,
 			 size_t buflen, size_t *bc);
 int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		      struct uverbs_attr_bundle *attrs);
+int mlx5_ib_create_user_cq(struct ib_cq *ibcq,
+			   const struct ib_cq_init_attr *attr,
+			   struct uverbs_attr_bundle *attrs);
 int mlx5_ib_destroy_cq(struct ib_cq *cq, struct ib_udata *udata);
 int mlx5_ib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc);
 int mlx5_ib_pre_destroy_cq(struct ib_cq *cq);

-- 
2.52.0



* [PATCH rdma-next 12/50] RDMA/mlx4: Inline mlx4_ib_get_cq_umem into callers
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (10 preceding siblings ...)
  2026-02-13 10:57 ` [PATCH rdma-next 11/50] RDMA/mlx5: Provide a modern CQ creation interface Leon Romanovsky
@ 2026-02-13 10:57 ` Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 13/50] RDMA/mlx4: Introduce a modern CQ creation interface Leon Romanovsky
                   ` (39 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

Inline the mlx4_ib_get_cq_umem helper function into its two call sites
(mlx4_ib_create_cq and mlx4_alloc_resize_umem) to prepare for the
transition to the modern CQ creation interface.
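
Both call sites end up with the same four-step setup, which the
follow-up patches can then split cleanly between the user and kernel
paths. The sequence, sketched without the error unwinding:

	umem = ib_umem_get(&dev->ib_dev, buf_addr, cqe * cqe_size,
			   IB_ACCESS_LOCAL_WRITE);
	shift = mlx4_ib_umem_calc_optimal_mtt_size(umem, 0, &n);
	err = mlx4_mtt_init(dev->dev, n, shift, &buf->mtt);
	err = mlx4_ib_umem_write_mtt(dev, &buf->mtt, umem);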

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx4/cq.c | 108 ++++++++++++++++++++++------------------
 1 file changed, 60 insertions(+), 48 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index c592374f4a58..94e9ff45725a 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -135,45 +135,6 @@ static void mlx4_ib_free_cq_buf(struct mlx4_ib_dev *dev, struct mlx4_ib_cq_buf *
 	mlx4_buf_free(dev->dev, (cqe + 1) * buf->entry_size, &buf->buf);
 }
 
-static int mlx4_ib_get_cq_umem(struct mlx4_ib_dev *dev,
-			       struct mlx4_ib_cq_buf *buf,
-			       struct ib_umem **umem, u64 buf_addr, int cqe)
-{
-	int err;
-	int cqe_size = dev->dev->caps.cqe_size;
-	int shift;
-	int n;
-
-	*umem = ib_umem_get(&dev->ib_dev, buf_addr, cqe * cqe_size,
-			    IB_ACCESS_LOCAL_WRITE);
-	if (IS_ERR(*umem))
-		return PTR_ERR(*umem);
-
-	shift = mlx4_ib_umem_calc_optimal_mtt_size(*umem, 0, &n);
-	if (shift < 0) {
-		err = shift;
-		goto err_buf;
-	}
-
-	err = mlx4_mtt_init(dev->dev, n, shift, &buf->mtt);
-	if (err)
-		goto err_buf;
-
-	err = mlx4_ib_umem_write_mtt(dev, &buf->mtt, *umem);
-	if (err)
-		goto err_mtt;
-
-	return 0;
-
-err_mtt:
-	mlx4_mtt_cleanup(dev->dev, &buf->mtt);
-
-err_buf:
-	ib_umem_release(*umem);
-
-	return err;
-}
-
 #define CQ_CREATE_FLAGS_SUPPORTED IB_UVERBS_CQ_FLAGS_TIMESTAMP_COMPLETION
 int mlx4_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		      struct uverbs_attr_bundle *attrs)
@@ -208,6 +169,9 @@ int mlx4_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 
 	if (udata) {
 		struct mlx4_ib_create_cq ucmd;
+		int cqe_size = dev->dev->caps.cqe_size;
+		int shift;
+		int n;
 
 		if (ib_copy_from_udata(&ucmd, udata, sizeof ucmd)) {
 			err = -EFAULT;
@@ -215,10 +179,28 @@ int mlx4_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		}
 
 		buf_addr = (void *)(unsigned long)ucmd.buf_addr;
-		err = mlx4_ib_get_cq_umem(dev, &cq->buf, &cq->umem,
-					  ucmd.buf_addr, entries);
-		if (err)
+
+		cq->umem = ib_umem_get(&dev->ib_dev, ucmd.buf_addr,
+				       entries * cqe_size,
+				       IB_ACCESS_LOCAL_WRITE);
+		if (IS_ERR(cq->umem)) {
+			err = PTR_ERR(cq->umem);
 			goto err_cq;
+		}
+
+		shift = mlx4_ib_umem_calc_optimal_mtt_size(cq->umem, 0, &n);
+		if (shift < 0) {
+			err = shift;
+			goto err_umem;
+		}
+
+		err = mlx4_mtt_init(dev->dev, n, shift, &cq->buf.mtt);
+		if (err)
+			goto err_umem;
+
+		err = mlx4_ib_umem_write_mtt(dev, &cq->buf.mtt, cq->umem);
+		if (err)
+			goto err_mtt;
 
 		err = mlx4_ib_db_map_user(udata, ucmd.db_addr, &cq->db);
 		if (err)
@@ -281,6 +263,7 @@ int mlx4_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 err_mtt:
 	mlx4_mtt_cleanup(dev->dev, &cq->buf.mtt);
 
+err_umem:
 	ib_umem_release(cq->umem);
 	if (!udata)
 		mlx4_ib_free_cq_buf(dev, &cq->buf, cq->ibcq.cqe);
@@ -320,6 +303,9 @@ static int mlx4_alloc_resize_umem(struct mlx4_ib_dev *dev, struct mlx4_ib_cq *cq
 				   int entries, struct ib_udata *udata)
 {
 	struct mlx4_ib_resize_cq ucmd;
+	int cqe_size = dev->dev->caps.cqe_size;
+	int shift;
+	int n;
 	int err;
 
 	if (cq->resize_umem)
@@ -332,17 +318,43 @@ static int mlx4_alloc_resize_umem(struct mlx4_ib_dev *dev, struct mlx4_ib_cq *cq
 	if (!cq->resize_buf)
 		return -ENOMEM;
 
-	err = mlx4_ib_get_cq_umem(dev, &cq->resize_buf->buf, &cq->resize_umem,
-				  ucmd.buf_addr, entries);
-	if (err) {
-		kfree(cq->resize_buf);
-		cq->resize_buf = NULL;
-		return err;
+	cq->resize_umem = ib_umem_get(&dev->ib_dev, ucmd.buf_addr,
+				      entries * cqe_size,
+				      IB_ACCESS_LOCAL_WRITE);
+	if (IS_ERR(cq->resize_umem)) {
+		err = PTR_ERR(cq->resize_umem);
+		goto err_buf;
+	}
+
+	shift = mlx4_ib_umem_calc_optimal_mtt_size(cq->resize_umem, 0, &n);
+	if (shift < 0) {
+		err = shift;
+		goto err_umem;
 	}
 
+	err = mlx4_mtt_init(dev->dev, n, shift, &cq->resize_buf->buf.mtt);
+	if (err)
+		goto err_umem;
+
+	err = mlx4_ib_umem_write_mtt(dev, &cq->resize_buf->buf.mtt,
+				     cq->resize_umem);
+	if (err)
+		goto err_mtt;
+
 	cq->resize_buf->cqe = entries - 1;
 
 	return 0;
+
+err_mtt:
+	mlx4_mtt_cleanup(dev->dev, &cq->resize_buf->buf.mtt);
+
+err_umem:
+	ib_umem_release(cq->resize_umem);
+
+err_buf:
+	kfree(cq->resize_buf);
+	cq->resize_buf = NULL;
+	return err;
 }
 
 static int mlx4_ib_get_outstanding_cqes(struct mlx4_ib_cq *cq)

-- 
2.52.0



* [PATCH rdma-next 13/50] RDMA/mlx4: Introduce a modern CQ creation interface
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (11 preceding siblings ...)
  2026-02-13 10:57 ` [PATCH rdma-next 12/50] RDMA/mlx4: Inline mlx4_ib_get_cq_umem into callers Leon Romanovsky
@ 2026-02-13 10:57 ` Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 14/50] RDMA/mlx4: Remove unused create_flags field from CQ structure Leon Romanovsky
                   ` (38 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

The uverbs CQ creation UAPI allows users to supply their own umem when
creating a CQ. Update mlx4 to support this model while preserving compatibility
with the legacy interface that allocates umem internally.
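
Resize keeps the same ownership model: once the hardware has switched to
the new buffer, the freshly pinned resize umem becomes the CQ umem that
ib_core tracks, and the old one is released explicitly. A sketch of the
swap (the exact bookkeeping is in the resize hunk below):

	ib_umem_release(cq->ibcq.umem);
	cq->ibcq.umem = cq->resize_umem;
	cq->resize_umem = NULL;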

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx4/cq.c      | 191 ++++++++++++++++++++---------------
 drivers/infiniband/hw/mlx4/main.c    |   1 +
 drivers/infiniband/hw/mlx4/mlx4_ib.h |   4 +-
 3 files changed, 111 insertions(+), 85 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 94e9ff45725a..4bee08317620 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -136,8 +136,9 @@ static void mlx4_ib_free_cq_buf(struct mlx4_ib_dev *dev, struct mlx4_ib_cq_buf *
 }
 
 #define CQ_CREATE_FLAGS_SUPPORTED IB_UVERBS_CQ_FLAGS_TIMESTAMP_COMPLETION
-int mlx4_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
-		      struct uverbs_attr_bundle *attrs)
+int mlx4_ib_create_user_cq(struct ib_cq *ibcq,
+			   const struct ib_cq_init_attr *attr,
+			   struct uverbs_attr_bundle *attrs)
 {
 	struct ib_udata *udata = &attrs->driver_udata;
 	struct ib_device *ibdev = ibcq->device;
@@ -145,13 +146,16 @@ int mlx4_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	int vector = attr->comp_vector;
 	struct mlx4_ib_dev *dev = to_mdev(ibdev);
 	struct mlx4_ib_cq *cq = to_mcq(ibcq);
-	struct mlx4_uar *uar;
+	struct mlx4_ib_create_cq ucmd;
+	int cqe_size = dev->dev->caps.cqe_size;
 	void *buf_addr;
+	int shift;
+	int n;
 	int err;
 	struct mlx4_ib_ucontext *context = rdma_udata_to_drv_context(
 		udata, struct mlx4_ib_ucontext, ibucontext);
 
-	if (entries < 1 || entries > dev->dev->caps.max_cqes)
+	if (attr->cqe > dev->dev->caps.max_cqes)
 		return -EINVAL;
 
 	if (attr->flags & ~CQ_CREATE_FLAGS_SUPPORTED)
@@ -161,95 +165,63 @@ int mlx4_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	cq->ibcq.cqe = entries - 1;
 	mutex_init(&cq->resize_mutex);
 	spin_lock_init(&cq->lock);
-	cq->resize_buf = NULL;
-	cq->resize_umem = NULL;
 	cq->create_flags = attr->flags;
 	INIT_LIST_HEAD(&cq->send_qp_list);
 	INIT_LIST_HEAD(&cq->recv_qp_list);
 
-	if (udata) {
-		struct mlx4_ib_create_cq ucmd;
-		int cqe_size = dev->dev->caps.cqe_size;
-		int shift;
-		int n;
-
-		if (ib_copy_from_udata(&ucmd, udata, sizeof ucmd)) {
-			err = -EFAULT;
-			goto err_cq;
-		}
-
-		buf_addr = (void *)(unsigned long)ucmd.buf_addr;
-
-		cq->umem = ib_umem_get(&dev->ib_dev, ucmd.buf_addr,
-				       entries * cqe_size,
-				       IB_ACCESS_LOCAL_WRITE);
-		if (IS_ERR(cq->umem)) {
-			err = PTR_ERR(cq->umem);
-			goto err_cq;
-		}
-
-		shift = mlx4_ib_umem_calc_optimal_mtt_size(cq->umem, 0, &n);
-		if (shift < 0) {
-			err = shift;
-			goto err_umem;
-		}
-
-		err = mlx4_mtt_init(dev->dev, n, shift, &cq->buf.mtt);
-		if (err)
-			goto err_umem;
-
-		err = mlx4_ib_umem_write_mtt(dev, &cq->buf.mtt, cq->umem);
-		if (err)
-			goto err_mtt;
+	if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd))) {
+		err = -EFAULT;
+		goto err_cq;
+	}
 
-		err = mlx4_ib_db_map_user(udata, ucmd.db_addr, &cq->db);
-		if (err)
-			goto err_mtt;
+	buf_addr = (void *)(unsigned long)ucmd.buf_addr;
 
-		uar = &context->uar;
-		cq->mcq.usage = MLX4_RES_USAGE_USER_VERBS;
-	} else {
-		err = mlx4_db_alloc(dev->dev, &cq->db, 1);
-		if (err)
-			goto err_cq;
+	if (!ibcq->umem)
+		ibcq->umem = ib_umem_get(&dev->ib_dev, ucmd.buf_addr,
+					 entries * cqe_size,
+					 IB_ACCESS_LOCAL_WRITE);
+	if (IS_ERR(ibcq->umem)) {
+		err = PTR_ERR(ibcq->umem);
+		goto err_cq;
+	}
 
-		cq->mcq.set_ci_db  = cq->db.db;
-		cq->mcq.arm_db     = cq->db.db + 1;
-		*cq->mcq.set_ci_db = 0;
-		*cq->mcq.arm_db    = 0;
+	shift = mlx4_ib_umem_calc_optimal_mtt_size(cq->ibcq.umem, 0, &n);
+	if (shift < 0) {
+		err = shift;
+		goto err_cq;
+	}
 
-		err = mlx4_ib_alloc_cq_buf(dev, &cq->buf, entries);
-		if (err)
-			goto err_db;
+	err = mlx4_mtt_init(dev->dev, n, shift, &cq->buf.mtt);
+	if (err)
+		goto err_cq;
 
-		buf_addr = &cq->buf.buf;
+	err = mlx4_ib_umem_write_mtt(dev, &cq->buf.mtt, cq->ibcq.umem);
+	if (err)
+		goto err_mtt;
 
-		uar = &dev->priv_uar;
-		cq->mcq.usage = MLX4_RES_USAGE_DRIVER;
-	}
+	err = mlx4_ib_db_map_user(udata, ucmd.db_addr, &cq->db);
+	if (err)
+		goto err_mtt;
 
 	if (dev->eq_table)
 		vector = dev->eq_table[vector % ibdev->num_comp_vectors];
 
-	err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, uar, cq->db.dma,
-			    &cq->mcq, vector, 0,
+	err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, &context->uar,
+			    cq->db.dma, &cq->mcq, vector, 0,
 			    !!(cq->create_flags &
 			       IB_UVERBS_CQ_FLAGS_TIMESTAMP_COMPLETION),
-			    buf_addr, !!udata);
+			    buf_addr, true);
 	if (err)
 		goto err_dbmap;
 
-	if (udata)
-		cq->mcq.tasklet_ctx.comp = mlx4_ib_cq_comp;
-	else
-		cq->mcq.comp = mlx4_ib_cq_comp;
+	cq->mcq.tasklet_ctx.comp = mlx4_ib_cq_comp;
 	cq->mcq.event = mlx4_ib_cq_event;
+	cq->mcq.usage = MLX4_RES_USAGE_USER_VERBS;
 
-	if (udata)
-		if (ib_copy_to_udata(udata, &cq->mcq.cqn, sizeof (__u32))) {
-			err = -EFAULT;
-			goto err_cq_free;
-		}
+	if (ib_copy_to_udata(udata, &cq->mcq.cqn, sizeof(__u32))) {
+		err = -EFAULT;
+		goto err_cq_free;
+	}
 
 	return 0;
 
@@ -257,21 +229,72 @@ int mlx4_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	mlx4_cq_free(dev->dev, &cq->mcq);
 
 err_dbmap:
-	if (udata)
-		mlx4_ib_db_unmap_user(context, &cq->db);
+	mlx4_ib_db_unmap_user(context, &cq->db);
 
 err_mtt:
 	mlx4_mtt_cleanup(dev->dev, &cq->buf.mtt);
+	/* UMEM is released by ib_core */
 
-err_umem:
-	ib_umem_release(cq->umem);
-	if (!udata)
-		mlx4_ib_free_cq_buf(dev, &cq->buf, cq->ibcq.cqe);
+err_cq:
+	return err;
+}
+
+int mlx4_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+		      struct uverbs_attr_bundle *attrs)
+{
+	struct ib_device *ibdev = ibcq->device;
+	int entries = attr->cqe;
+	int vector = attr->comp_vector;
+	struct mlx4_ib_dev *dev = to_mdev(ibdev);
+	struct mlx4_ib_cq *cq = to_mcq(ibcq);
+	void *buf_addr;
+	int err;
+
+	if (attr->cqe > dev->dev->caps.max_cqes)
+		return -EINVAL;
+
+	entries      = roundup_pow_of_two(entries + 1);
+	cq->ibcq.cqe = entries - 1;
+	mutex_init(&cq->resize_mutex);
+	spin_lock_init(&cq->lock);
+	INIT_LIST_HEAD(&cq->send_qp_list);
+	INIT_LIST_HEAD(&cq->recv_qp_list);
+
+	err = mlx4_db_alloc(dev->dev, &cq->db, 1);
+	if (err)
+		return err;
+
+	cq->mcq.set_ci_db  = cq->db.db;
+	cq->mcq.arm_db     = cq->db.db + 1;
+	*cq->mcq.set_ci_db = 0;
+	*cq->mcq.arm_db    = 0;
+
+	err = mlx4_ib_alloc_cq_buf(dev, &cq->buf, entries);
+	if (err)
+		goto err_db;
+
+	buf_addr = &cq->buf.buf;
+
+	if (dev->eq_table)
+		vector = dev->eq_table[vector % ibdev->num_comp_vectors];
+
+	err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, &dev->priv_uar,
+			    cq->db.dma, &cq->mcq, vector, 0, 0,
+			    buf_addr, false);
+	if (err)
+		goto err_buf;
+
+	cq->mcq.comp = mlx4_ib_cq_comp;
+	cq->mcq.event = mlx4_ib_cq_event;
+	cq->mcq.usage = MLX4_RES_USAGE_DRIVER;
+
+	return 0;
+
+err_buf:
+	mlx4_ib_free_cq_buf(dev, &cq->buf, cq->ibcq.cqe);
 
 err_db:
-	if (!udata)
-		mlx4_db_free(dev->dev, &cq->db);
-err_cq:
+	mlx4_db_free(dev->dev, &cq->db);
 	return err;
 }
 
@@ -445,8 +468,8 @@ int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
 	if (ibcq->uobject) {
 		cq->buf      = cq->resize_buf->buf;
 		cq->ibcq.cqe = cq->resize_buf->cqe;
-		ib_umem_release(cq->umem);
-		cq->umem     = cq->resize_umem;
+		ib_umem_release(cq->ibcq.umem);
+		cq->ibcq.umem     = cq->resize_umem;
 
 		kfree(cq->resize_buf);
 		cq->resize_buf = NULL;
@@ -506,11 +529,11 @@ int mlx4_ib_destroy_cq(struct ib_cq *cq, struct ib_udata *udata)
 				struct mlx4_ib_ucontext,
 				ibucontext),
 			&mcq->db);
+		/* UMEM is released by ib_core */
 	} else {
 		mlx4_ib_free_cq_buf(dev, &mcq->buf, cq->cqe);
 		mlx4_db_free(dev->dev, &mcq->db);
 	}
-	ib_umem_release(mcq->umem);
 	return 0;
 }
 
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index dd35e03402ab..fc05e7a1a870 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2527,6 +2527,7 @@ static const struct ib_device_ops mlx4_ib_dev_ops = {
 	.attach_mcast = mlx4_ib_mcg_attach,
 	.create_ah = mlx4_ib_create_ah,
 	.create_cq = mlx4_ib_create_cq,
+	.create_user_cq = mlx4_ib_create_user_cq,
 	.create_qp = mlx4_ib_create_qp,
 	.create_srq = mlx4_ib_create_srq,
 	.dealloc_pd = mlx4_ib_dealloc_pd,
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 5df5b955114e..96563c0836ce 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -121,7 +121,6 @@ struct mlx4_ib_cq {
 	struct mlx4_db		db;
 	spinlock_t		lock;
 	struct mutex		resize_mutex;
-	struct ib_umem	       *umem;
 	struct ib_umem	       *resize_umem;
 	int			create_flags;
 	/* List of qps that it serves.*/
@@ -772,6 +771,9 @@ int mlx4_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period);
 int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata);
 int mlx4_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		      struct uverbs_attr_bundle *attrs);
+int mlx4_ib_create_user_cq(struct ib_cq *ibcq,
+			   const struct ib_cq_init_attr *attr,
+			   struct uverbs_attr_bundle *attrs);
 int mlx4_ib_destroy_cq(struct ib_cq *cq, struct ib_udata *udata);
 int mlx4_ib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc);
 int mlx4_ib_arm_cq(struct ib_cq *cq, enum ib_cq_notify_flags flags);

-- 
2.52.0



* [PATCH rdma-next 14/50] RDMA/mlx4: Remove unused create_flags field from CQ structure
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (12 preceding siblings ...)
  2026-02-13 10:57 ` [PATCH rdma-next 13/50] RDMA/mlx4: Introduce a modern CQ creation interface Leon Romanovsky
@ 2026-02-13 10:57 ` Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 15/50] RDMA/bnxt_re: Convert to modern CQ interface Leon Romanovsky
                   ` (37 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

The CQ creation flags do not need to be cached: their only consumer runs
at the point where they were being stored and can read attr->flags
directly. Drop the now-unneeded field and reclaim 4 bytes.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx4/cq.c      | 4 +---
 drivers/infiniband/hw/mlx4/mlx4_ib.h | 1 -
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 4bee08317620..83169060d120 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -165,7 +165,6 @@ int mlx4_ib_create_user_cq(struct ib_cq *ibcq,
 	cq->ibcq.cqe = entries - 1;
 	mutex_init(&cq->resize_mutex);
 	spin_lock_init(&cq->lock);
-	cq->create_flags = attr->flags;
 	INIT_LIST_HEAD(&cq->send_qp_list);
 	INIT_LIST_HEAD(&cq->recv_qp_list);
 
@@ -208,8 +207,7 @@ int mlx4_ib_create_user_cq(struct ib_cq *ibcq,
 
 	err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, &context->uar,
 			    cq->db.dma, &cq->mcq, vector, 0,
-			    !!(cq->create_flags &
-			       IB_UVERBS_CQ_FLAGS_TIMESTAMP_COMPLETION),
+			    attr->flags & IB_UVERBS_CQ_FLAGS_TIMESTAMP_COMPLETION,
 			    buf_addr, true);
 	if (err)
 		goto err_dbmap;
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 96563c0836ce..6a7ed5225c7d 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -122,7 +122,6 @@ struct mlx4_ib_cq {
 	spinlock_t		lock;
 	struct mutex		resize_mutex;
 	struct ib_umem	       *resize_umem;
-	int			create_flags;
 	/* List of qps that it serves.*/
 	struct list_head		send_qp_list;
 	struct list_head		recv_qp_list;

-- 
2.52.0



* [PATCH rdma-next 15/50] RDMA/bnxt_re: Convert to modern CQ interface
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (13 preceding siblings ...)
  2026-02-13 10:57 ` [PATCH rdma-next 14/50] RDMA/mlx4: Remove unused create_flags field from CQ structure Leon Romanovsky
@ 2026-02-13 10:57 ` Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 16/50] RDMA/cxgb4: Separate kernel and user CQ creation paths Leon Romanovsky
                   ` (36 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

Convert bnxt_re to the modern .create_user_cq() interface, splitting the
user flow out of bnxt_re_create_cq() and allowing users to supply their
own umem.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/bnxt_re/ib_verbs.c | 172 ++++++++++++++++++++-----------
 drivers/infiniband/hw/bnxt_re/ib_verbs.h |   4 +-
 drivers/infiniband/hw/bnxt_re/main.c     |   1 +
 3 files changed, 113 insertions(+), 64 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index c146f43ae875..b8516d8b8426 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -3134,22 +3134,20 @@ int bnxt_re_destroy_cq(struct ib_cq *ib_cq, struct ib_udata *udata)
 	nq = cq->qplib_cq.nq;
 	cctx = rdev->chip_ctx;
 
-	if (cctx->modes.toggle_bits & BNXT_QPLIB_CQ_TOGGLE_BIT) {
-		free_page((unsigned long)cq->uctx_cq_page);
+	free_page((unsigned long)cq->uctx_cq_page);
+	if (cctx->modes.toggle_bits & BNXT_QPLIB_CQ_TOGGLE_BIT)
 		hash_del(&cq->hash_entry);
-	}
-	bnxt_qplib_destroy_cq(&rdev->qplib_res, &cq->qplib_cq);
 
+	bnxt_qplib_destroy_cq(&rdev->qplib_res, &cq->qplib_cq);
 	bnxt_re_put_nq(rdev, nq);
-	ib_umem_release(cq->umem);
-
 	atomic_dec(&rdev->stats.res.cq_count);
 	kfree(cq->cql);
 	return 0;
 }
 
-int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
-		      struct uverbs_attr_bundle *attrs)
+int bnxt_re_create_user_cq(struct ib_cq *ibcq,
+			   const struct ib_cq_init_attr *attr,
+			   struct uverbs_attr_bundle *attrs)
 {
 	struct bnxt_re_cq *cq = container_of(ibcq, struct bnxt_re_cq, ib_cq);
 	struct bnxt_re_dev *rdev = to_bnxt_re_dev(ibcq->device, ibdev);
@@ -3158,6 +3156,8 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		rdma_udata_to_drv_context(udata, struct bnxt_re_ucontext, ib_uctx);
 	struct bnxt_qplib_dev_attr *dev_attr = rdev->dev_attr;
 	struct bnxt_qplib_chip_ctx *cctx;
+	struct bnxt_re_cq_resp resp = {};
+	struct bnxt_re_cq_req req;
 	int cqe = attr->cqe;
 	int rc, entries;
 	u32 active_cqs;
@@ -3166,7 +3166,7 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		return -EOPNOTSUPP;
 
 	/* Validate CQ fields */
-	if (cqe < 1 || cqe > dev_attr->max_cq_wqes) {
+	if (attr->cqe > dev_attr->max_cq_wqes) {
 		ibdev_err(&rdev->ibdev, "Failed to create CQ -max exceeded");
 		return -EINVAL;
 	}
@@ -3181,33 +3181,107 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 
 	cq->qplib_cq.sg_info.pgsize = PAGE_SIZE;
 	cq->qplib_cq.sg_info.pgshft = PAGE_SHIFT;
-	if (udata) {
-		struct bnxt_re_cq_req req;
-		if (ib_copy_from_udata(&req, udata, sizeof(req))) {
-			rc = -EFAULT;
-			goto fail;
-		}
 
-		cq->umem = ib_umem_get(&rdev->ibdev, req.cq_va,
-				       entries * sizeof(struct cq_base),
-				       IB_ACCESS_LOCAL_WRITE);
-		if (IS_ERR(cq->umem)) {
-			rc = PTR_ERR(cq->umem);
-			goto fail;
-		}
-		cq->qplib_cq.sg_info.umem = cq->umem;
-		cq->qplib_cq.dpi = &uctx->dpi;
-	} else {
-		cq->max_cql = min_t(u32, entries, MAX_CQL_PER_POLL);
-		cq->cql = kcalloc(cq->max_cql, sizeof(struct bnxt_qplib_cqe),
-				  GFP_KERNEL);
-		if (!cq->cql) {
+	if (ib_copy_from_udata(&req, udata, sizeof(req)))
+		return -EFAULT;
+
+	if (!ibcq->umem)
+		ibcq->umem = ib_umem_get(&rdev->ibdev, req.cq_va,
+					 entries * sizeof(struct cq_base),
+					 IB_ACCESS_LOCAL_WRITE);
+	if (IS_ERR(ibcq->umem))
+		return PTR_ERR(ibcq->umem);
+
+	cq->qplib_cq.sg_info.umem = cq->ib_cq.umem;
+	cq->qplib_cq.dpi = &uctx->dpi;
+
+	cq->qplib_cq.max_wqe = entries;
+	cq->qplib_cq.coalescing = &rdev->cq_coalescing;
+	cq->qplib_cq.nq = bnxt_re_get_nq(rdev);
+	cq->qplib_cq.cnq_hw_ring_id = cq->qplib_cq.nq->ring_id;
+
+	rc = bnxt_qplib_create_cq(&rdev->qplib_res, &cq->qplib_cq);
+	if (rc)
+		goto create_cq;
+
+	cq->ib_cq.cqe = entries;
+	cq->cq_period = cq->qplib_cq.period;
+
+	active_cqs = atomic_inc_return(&rdev->stats.res.cq_count);
+	if (active_cqs > rdev->stats.res.cq_watermark)
+		rdev->stats.res.cq_watermark = active_cqs;
+	spin_lock_init(&cq->cq_lock);
+
+	if (cctx->modes.toggle_bits & BNXT_QPLIB_CQ_TOGGLE_BIT) {
+		/* Allocate a page */
+		cq->uctx_cq_page = (void *)get_zeroed_page(GFP_KERNEL);
+		if (!cq->uctx_cq_page) {
 			rc = -ENOMEM;
-			goto fail;
+			goto c2fail;
 		}
+		hash_add(rdev->cq_hash, &cq->hash_entry, cq->qplib_cq.id);
+		resp.comp_mask |= BNXT_RE_CQ_TOGGLE_PAGE_SUPPORT;
+	}
+	resp.cqid = cq->qplib_cq.id;
+	resp.tail = cq->qplib_cq.hwq.cons;
+	resp.phase = cq->qplib_cq.period;
+	rc = ib_copy_to_udata(udata, &resp, min(sizeof(resp), udata->outlen));
+	if (rc) {
+		ibdev_err(&rdev->ibdev, "Failed to copy CQ udata");
+		goto free_mem;
+	}
 
-		cq->qplib_cq.dpi = &rdev->dpi_privileged;
+	return 0;
+
+free_mem:
+	if (cctx->modes.toggle_bits & BNXT_QPLIB_CQ_TOGGLE_BIT)
+		hash_del(&cq->hash_entry);
+	free_page((unsigned long)cq->uctx_cq_page);
+c2fail:
+	atomic_dec(&rdev->stats.res.cq_count);
+	bnxt_qplib_destroy_cq(&rdev->qplib_res, &cq->qplib_cq);
+	/* UMEM is released by ib_core */
+create_cq:
+	bnxt_re_put_nq(rdev, cq->qplib_cq.nq);
+	return rc;
+}
+
+int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+		      struct uverbs_attr_bundle *attrs)
+{
+	struct bnxt_re_cq *cq = container_of(ibcq, struct bnxt_re_cq, ib_cq);
+	struct bnxt_re_dev *rdev = to_bnxt_re_dev(ibcq->device, ibdev);
+	struct bnxt_qplib_dev_attr *dev_attr = rdev->dev_attr;
+	int cqe = attr->cqe;
+	int rc, entries;
+	u32 active_cqs;
+
+	if (attr->flags)
+		return -EOPNOTSUPP;
+
+	/* Validate CQ fields */
+	if (attr->cqe > dev_attr->max_cq_wqes) {
+		ibdev_err(&rdev->ibdev, "Failed to create CQ -max exceeded");
+		return -EINVAL;
 	}
+
+	cq->rdev = rdev;
+	cq->qplib_cq.cq_handle = (u64)(unsigned long)(&cq->qplib_cq);
+
+	entries = bnxt_re_init_depth(cqe + 1, NULL);
+	if (entries > dev_attr->max_cq_wqes + 1)
+		entries = dev_attr->max_cq_wqes + 1;
+
+	cq->qplib_cq.sg_info.pgsize = PAGE_SIZE;
+	cq->qplib_cq.sg_info.pgshft = PAGE_SHIFT;
+
+	cq->max_cql = min_t(u32, entries, MAX_CQL_PER_POLL);
+	cq->cql = kcalloc(cq->max_cql, sizeof(struct bnxt_qplib_cqe),
+			  GFP_KERNEL);
+	if (!cq->cql)
+		return -ENOMEM;
+
+	cq->qplib_cq.dpi = &rdev->dpi_privileged;
 	cq->qplib_cq.max_wqe = entries;
 	cq->qplib_cq.coalescing = &rdev->cq_coalescing;
 	cq->qplib_cq.nq = bnxt_re_get_nq(rdev);
@@ -3227,38 +3301,10 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		rdev->stats.res.cq_watermark = active_cqs;
 	spin_lock_init(&cq->cq_lock);
 
-	if (udata) {
-		struct bnxt_re_cq_resp resp = {};
-
-		if (cctx->modes.toggle_bits & BNXT_QPLIB_CQ_TOGGLE_BIT) {
-			hash_add(rdev->cq_hash, &cq->hash_entry, cq->qplib_cq.id);
-			/* Allocate a page */
-			cq->uctx_cq_page = (void *)get_zeroed_page(GFP_KERNEL);
-			if (!cq->uctx_cq_page) {
-				rc = -ENOMEM;
-				goto c2fail;
-			}
-			resp.comp_mask |= BNXT_RE_CQ_TOGGLE_PAGE_SUPPORT;
-		}
-		resp.cqid = cq->qplib_cq.id;
-		resp.tail = cq->qplib_cq.hwq.cons;
-		resp.phase = cq->qplib_cq.period;
-		resp.rsvd = 0;
-		rc = ib_copy_to_udata(udata, &resp, min(sizeof(resp), udata->outlen));
-		if (rc) {
-			ibdev_err(&rdev->ibdev, "Failed to copy CQ udata");
-			bnxt_qplib_destroy_cq(&rdev->qplib_res, &cq->qplib_cq);
-			goto free_mem;
-		}
-	}
-
 	return 0;
 
-free_mem:
-	free_page((unsigned long)cq->uctx_cq_page);
-c2fail:
-	ib_umem_release(cq->umem);
 fail:
+	bnxt_re_put_nq(rdev, cq->qplib_cq.nq);
 	kfree(cq->cql);
 	return rc;
 }
@@ -3271,8 +3317,8 @@ static void bnxt_re_resize_cq_complete(struct bnxt_re_cq *cq)
 
 	cq->qplib_cq.max_wqe = cq->resize_cqe;
 	if (cq->resize_umem) {
-		ib_umem_release(cq->umem);
-		cq->umem = cq->resize_umem;
+		ib_umem_release(cq->ib_cq.umem);
+		cq->ib_cq.umem = cq->resize_umem;
 		cq->resize_umem = NULL;
 		cq->resize_cqe = 0;
 	}
@@ -3872,7 +3918,7 @@ int bnxt_re_poll_cq(struct ib_cq *ib_cq, int num_entries, struct ib_wc *wc)
 	/* User CQ; the only processing we do is to
 	 * complete any pending CQ resize operation.
 	 */
-	if (cq->umem) {
+	if (cq->ib_cq.umem) {
 		if (cq->resize_umem)
 			bnxt_re_resize_cq_complete(cq);
 		return 0;
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.h b/drivers/infiniband/hw/bnxt_re/ib_verbs.h
index 76ba9ab04d5c..cac3e10b73f6 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.h
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.h
@@ -108,7 +108,6 @@ struct bnxt_re_cq {
 	struct bnxt_qplib_cqe	*cql;
 #define MAX_CQL_PER_POLL	1024
 	u32			max_cql;
-	struct ib_umem		*umem;
 	struct ib_umem		*resize_umem;
 	int			resize_cqe;
 	void			*uctx_cq_page;
@@ -247,6 +246,9 @@ int bnxt_re_post_recv(struct ib_qp *qp, const struct ib_recv_wr *recv_wr,
 		      const struct ib_recv_wr **bad_recv_wr);
 int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		      struct uverbs_attr_bundle *attrs);
+int bnxt_re_create_user_cq(struct ib_cq *ibcq,
+			   const struct ib_cq_init_attr *attr,
+			   struct uverbs_attr_bundle *attrs);
 int bnxt_re_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata);
 int bnxt_re_destroy_cq(struct ib_cq *cq, struct ib_udata *udata);
 int bnxt_re_poll_cq(struct ib_cq *cq, int num_entries, struct ib_wc *wc);
diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index 73003ad25ee8..368c1fd8172e 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -1334,6 +1334,7 @@ static const struct ib_device_ops bnxt_re_dev_ops = {
 	.alloc_ucontext = bnxt_re_alloc_ucontext,
 	.create_ah = bnxt_re_create_ah,
 	.create_cq = bnxt_re_create_cq,
+	.create_user_cq = bnxt_re_create_user_cq,
 	.create_qp = bnxt_re_create_qp,
 	.create_srq = bnxt_re_create_srq,
 	.create_user_ah = bnxt_re_create_ah,

-- 
2.52.0



* [PATCH rdma-next 16/50] RDMA/cxgb4: Separate kernel and user CQ creation paths
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (14 preceding siblings ...)
  2026-02-13 10:57 ` [PATCH rdma-next 15/50] RDMA/bnxt_re: Convert to modern CQ interface Leon Romanovsky
@ 2026-02-13 10:57 ` Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 17/50] RDMA/mthca: Split user and kernel " Leon Romanovsky
                   ` (35 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

Split the CQ creation logic so the kernel and user flows are clearly
separated.
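
After the split, neither function needs to test udata: the core is
expected to dispatch to the matching op depending on where the request
originates. A sketch of the assumed ib_core-side dispatch (not cxgb4
code; signatures as in ib_device_ops):

	if (udata)	/* request arrived from user space */
		ret = device->ops.create_user_cq(cq, attr, attrs);
	else		/* in-kernel consumer, no attr bundle */
		ret = device->ops.create_cq(cq, attr, NULL);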

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/cxgb4/cq.c       | 218 ++++++++++++++++++++++-----------
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h |   2 +
 drivers/infiniband/hw/cxgb4/provider.c |   1 +
 3 files changed, 152 insertions(+), 69 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cq.c b/drivers/infiniband/hw/cxgb4/cq.c
index 14ced7b667fa..d263cca47432 100644
--- a/drivers/infiniband/hw/cxgb4/cq.c
+++ b/drivers/infiniband/hw/cxgb4/cq.c
@@ -994,8 +994,8 @@ int c4iw_destroy_cq(struct ib_cq *ib_cq, struct ib_udata *udata)
 	return 0;
 }
 
-int c4iw_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
-		   struct uverbs_attr_bundle *attrs)
+int c4iw_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+			struct uverbs_attr_bundle *attrs)
 {
 	struct ib_udata *udata = &attrs->driver_udata;
 	struct ib_device *ibdev = ibcq->device;
@@ -1012,25 +1012,21 @@ int c4iw_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		udata, struct c4iw_ucontext, ibucontext);
 
 	pr_debug("ib_dev %p entries %d\n", ibdev, entries);
-	if (attr->flags)
+	if (attr->flags || ibcq->umem)
 		return -EOPNOTSUPP;
 
-	if (entries < 1 || entries > ibdev->attrs.max_cqe)
+	if (attr->cqe > ibdev->attrs.max_cqe)
 		return -EINVAL;
 
 	if (vector >= rhp->rdev.lldi.nciq)
 		return -EINVAL;
 
-	if (udata) {
-		if (udata->inlen < sizeof(ucmd))
-			ucontext->is_32b_cqe = 1;
-	}
+	if (udata->inlen < sizeof(ucmd))
+		ucontext->is_32b_cqe = 1;
 
 	chp->wr_waitp = c4iw_alloc_wr_wait(GFP_KERNEL);
-	if (!chp->wr_waitp) {
-		ret = -ENOMEM;
-		goto err_free_chp;
-	}
+	if (!chp->wr_waitp)
+		return -ENOMEM;
 	c4iw_init_wr_wait(chp->wr_waitp);
 
 	wr_len = sizeof(struct fw_ri_res_wr) + sizeof(struct fw_ri_res);
@@ -1063,22 +1059,19 @@ int c4iw_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	if (hwentries < 64)
 		hwentries = 64;
 
-	memsize = hwentries * ((ucontext && ucontext->is_32b_cqe) ?
+	memsize = hwentries * (ucontext->is_32b_cqe ?
 			(sizeof(*chp->cq.queue) / 2) : sizeof(*chp->cq.queue));
 
 	/*
 	 * memsize must be a multiple of the page size if its a user cq.
 	 */
-	if (udata)
-		memsize = roundup(memsize, PAGE_SIZE);
+	memsize = roundup(memsize, PAGE_SIZE);
 
 	chp->cq.size = hwentries;
 	chp->cq.memsize = memsize;
 	chp->cq.vector = vector;
 
-	ret = create_cq(&rhp->rdev, &chp->cq,
-			ucontext ? &ucontext->uctx : &rhp->rdev.uctx,
-			chp->wr_waitp);
+	ret = create_cq(&rhp->rdev, &chp->cq, &ucontext->uctx, chp->wr_waitp);
 	if (ret)
 		goto err_free_skb;
 
@@ -1093,54 +1086,52 @@ int c4iw_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	if (ret)
 		goto err_destroy_cq;
 
-	if (ucontext) {
-		ret = -ENOMEM;
-		mm = kmalloc(sizeof(*mm), GFP_KERNEL);
-		if (!mm)
-			goto err_remove_handle;
-		mm2 = kmalloc(sizeof(*mm2), GFP_KERNEL);
-		if (!mm2)
-			goto err_free_mm;
-
-		memset(&uresp, 0, sizeof(uresp));
-		uresp.qid_mask = rhp->rdev.cqmask;
-		uresp.cqid = chp->cq.cqid;
-		uresp.size = chp->cq.size;
-		uresp.memsize = chp->cq.memsize;
-		spin_lock(&ucontext->mmap_lock);
-		uresp.key = ucontext->key;
-		ucontext->key += PAGE_SIZE;
-		uresp.gts_key = ucontext->key;
-		ucontext->key += PAGE_SIZE;
-		/* communicate to the userspace that
-		 * kernel driver supports 64B CQE
-		 */
-		uresp.flags |= C4IW_64B_CQE;
-
-		spin_unlock(&ucontext->mmap_lock);
-		ret = ib_copy_to_udata(udata, &uresp,
-				       ucontext->is_32b_cqe ?
-				       sizeof(uresp) - sizeof(uresp.flags) :
-				       sizeof(uresp));
-		if (ret)
-			goto err_free_mm2;
-
-		mm->key = uresp.key;
-		mm->addr = 0;
-		mm->vaddr = chp->cq.queue;
-		mm->dma_addr = chp->cq.dma_addr;
-		mm->len = chp->cq.memsize;
-		insert_flag_to_mmap(&rhp->rdev, mm, mm->addr);
-		insert_mmap(ucontext, mm);
-
-		mm2->key = uresp.gts_key;
-		mm2->addr = chp->cq.bar2_pa;
-		mm2->len = PAGE_SIZE;
-		mm2->vaddr = NULL;
-		mm2->dma_addr = 0;
-		insert_flag_to_mmap(&rhp->rdev, mm2, mm2->addr);
-		insert_mmap(ucontext, mm2);
-	}
+	ret = -ENOMEM;
+	mm = kmalloc(sizeof(*mm), GFP_KERNEL);
+	if (!mm)
+		goto err_remove_handle;
+	mm2 = kmalloc(sizeof(*mm2), GFP_KERNEL);
+	if (!mm2)
+		goto err_free_mm;
+
+	memset(&uresp, 0, sizeof(uresp));
+	uresp.qid_mask = rhp->rdev.cqmask;
+	uresp.cqid = chp->cq.cqid;
+	uresp.size = chp->cq.size;
+	uresp.memsize = chp->cq.memsize;
+	spin_lock(&ucontext->mmap_lock);
+	uresp.key = ucontext->key;
+	ucontext->key += PAGE_SIZE;
+	uresp.gts_key = ucontext->key;
+	ucontext->key += PAGE_SIZE;
+	/* communicate to the userspace that
+	 * kernel driver supports 64B CQE
+	 */
+	uresp.flags |= C4IW_64B_CQE;
+
+	spin_unlock(&ucontext->mmap_lock);
+	ret = ib_copy_to_udata(udata, &uresp,
+			       ucontext->is_32b_cqe ?
+			       sizeof(uresp) - sizeof(uresp.flags) :
+			       sizeof(uresp));
+	if (ret)
+		goto err_free_mm2;
+
+	mm->key = uresp.key;
+	mm->addr = 0;
+	mm->vaddr = chp->cq.queue;
+	mm->dma_addr = chp->cq.dma_addr;
+	mm->len = chp->cq.memsize;
+	insert_flag_to_mmap(&rhp->rdev, mm, mm->addr);
+	insert_mmap(ucontext, mm);
+
+	mm2->key = uresp.gts_key;
+	mm2->addr = chp->cq.bar2_pa;
+	mm2->len = PAGE_SIZE;
+	mm2->vaddr = NULL;
+	mm2->dma_addr = 0;
+	insert_flag_to_mmap(&rhp->rdev, mm2, mm2->addr);
+	insert_mmap(ucontext, mm2);
 
 	pr_debug("cqid 0x%0x chp %p size %u memsize %zu, dma_addr %pad\n",
 		 chp->cq.cqid, chp, chp->cq.size, chp->cq.memsize,
@@ -1153,14 +1144,103 @@ int c4iw_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 err_remove_handle:
 	xa_erase_irq(&rhp->cqs, chp->cq.cqid);
 err_destroy_cq:
-	destroy_cq(&chp->rhp->rdev, &chp->cq,
-		   ucontext ? &ucontext->uctx : &rhp->rdev.uctx,
+	destroy_cq(&chp->rhp->rdev, &chp->cq, &ucontext->uctx,
+		   chp->destroy_skb, chp->wr_waitp);
+err_free_skb:
+	kfree_skb(chp->destroy_skb);
+err_free_wr_wait:
+	c4iw_put_wr_wait(chp->wr_waitp);
+	return ret;
+}
+
+int c4iw_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+		   struct uverbs_attr_bundle *attrs)
+{
+	struct ib_device *ibdev = ibcq->device;
+	int entries = attr->cqe;
+	int vector = attr->comp_vector;
+	struct c4iw_dev *rhp = to_c4iw_dev(ibcq->device);
+	struct c4iw_cq *chp = to_c4iw_cq(ibcq);
+	int ret, wr_len;
+	size_t memsize, hwentries;
+
+	pr_debug("ib_dev %p entries %d\n", ibdev, entries);
+	if (attr->flags)
+		return -EOPNOTSUPP;
+
+	if (attr->cqe > ibdev->attrs.max_cqe)
+		return -EINVAL;
+
+	if (vector >= rhp->rdev.lldi.nciq)
+		return -EINVAL;
+
+	chp->wr_waitp = c4iw_alloc_wr_wait(GFP_KERNEL);
+	if (!chp->wr_waitp)
+		return -ENOMEM;
+	c4iw_init_wr_wait(chp->wr_waitp);
+
+	wr_len = sizeof(struct fw_ri_res_wr) + sizeof(struct fw_ri_res);
+	chp->destroy_skb = alloc_skb(wr_len, GFP_KERNEL);
+	if (!chp->destroy_skb) {
+		ret = -ENOMEM;
+		goto err_free_wr_wait;
+	}
+
+	/* account for the status page. */
+	entries++;
+
+	/* IQ needs one extra entry to differentiate full vs empty. */
+	entries++;
+
+	/*
+	 * entries must be multiple of 16 for HW.
+	 */
+	entries = roundup(entries, 16);
+
+	/*
+	 * Make actual HW queue 2x to avoid cdix_inc overflows.
+	 */
+	hwentries = min(entries * 2, rhp->rdev.hw_queue.t4_max_iq_size);
+
+	/*
+	 * Make HW queue at least 64 entries so GTS updates aren't too
+	 * frequent.
+	 */
+	if (hwentries < 64)
+		hwentries = 64;
+
+	memsize = hwentries * sizeof(*chp->cq.queue);
+
+	chp->cq.size = hwentries;
+	chp->cq.memsize = memsize;
+	chp->cq.vector = vector;
+
+	ret = create_cq(&rhp->rdev, &chp->cq, &rhp->rdev.uctx, chp->wr_waitp);
+	if (ret)
+		goto err_free_skb;
+
+	chp->rhp = rhp;
+	chp->cq.size--;				/* status page */
+	chp->ibcq.cqe = entries - 2;
+	spin_lock_init(&chp->lock);
+	spin_lock_init(&chp->comp_handler_lock);
+	refcount_set(&chp->refcnt, 1);
+	init_completion(&chp->cq_rel_comp);
+	ret = xa_insert_irq(&rhp->cqs, chp->cq.cqid, chp, GFP_KERNEL);
+	if (ret)
+		goto err_destroy_cq;
+
+	pr_debug("cqid 0x%0x chp %p size %u memsize %zu, dma_addr %pad\n",
+		 chp->cq.cqid, chp, chp->cq.size, chp->cq.memsize,
+		 &chp->cq.dma_addr);
+	return 0;
+err_destroy_cq:
+	destroy_cq(&chp->rhp->rdev, &chp->cq, &rhp->rdev.uctx,
 		   chp->destroy_skb, chp->wr_waitp);
 err_free_skb:
 	kfree_skb(chp->destroy_skb);
 err_free_wr_wait:
 	c4iw_put_wr_wait(chp->wr_waitp);
-err_free_chp:
 	return ret;
 }
 
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index e17c1252536b..b8e3ee2a0c84 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -1014,6 +1014,8 @@ int c4iw_destroy_cq(struct ib_cq *ib_cq, struct ib_udata *udata);
 void c4iw_cq_rem_ref(struct c4iw_cq *chp);
 int c4iw_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		   struct uverbs_attr_bundle *attrs);
+int c4iw_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+			struct uverbs_attr_bundle *attrs);
 int c4iw_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags);
 int c4iw_modify_srq(struct ib_srq *ib_srq, struct ib_srq_attr *attr,
 		    enum ib_srq_attr_mask srq_attr_mask,
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index e059f92d90fd..b9c183d1389d 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -461,6 +461,7 @@ static const struct ib_device_ops c4iw_dev_ops = {
 	.alloc_pd = c4iw_allocate_pd,
 	.alloc_ucontext = c4iw_alloc_ucontext,
 	.create_cq = c4iw_create_cq,
+	.create_user_cq = c4iw_create_user_cq,
 	.create_qp = c4iw_create_qp,
 	.create_srq = c4iw_create_srq,
 	.dealloc_pd = c4iw_deallocate_pd,

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 17/50] RDMA/mthca: Split user and kernel CQ creation paths
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (15 preceding siblings ...)
  2026-02-13 10:57 ` [PATCH rdma-next 16/50] RDMA/cxgb4: Separate kernel and user CQ creation paths Leon Romanovsky
@ 2026-02-13 10:57 ` Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 18/50] RDMA/erdma: Separate " Leon Romanovsky
                   ` (34 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

Separate the create-CQ logic into distinct user and kernel
code paths.
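
For context, a minimal sketch of how the core is expected to pick
between the two entry points once a driver provides both. The real
dispatch is added by the earlier core patches in this series; the
helper name below is illustrative:

static int ib_create_cq_dispatch(struct ib_cq *cq,
                                 const struct ib_cq_init_attr *attr,
                                 struct uverbs_attr_bundle *attrs)
{
        const struct ib_device_ops *ops = &cq->device->ops;

        /* User verbs take the user path when the driver has one. */
        if (!rdma_is_kernel_res(&cq->res) && ops->create_user_cq)
                return ops->create_user_cq(cq, attr, attrs);

        return ops->create_cq(cq, attr, attrs);
}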

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mthca/mthca_provider.c | 92 ++++++++++++++++++----------
 1 file changed, 58 insertions(+), 34 deletions(-)

diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index aa5ca5c4ff77..6bf825978846 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -572,9 +572,9 @@ static int mthca_destroy_qp(struct ib_qp *qp, struct ib_udata *udata)
 	return 0;
 }
 
-static int mthca_create_cq(struct ib_cq *ibcq,
-			   const struct ib_cq_init_attr *attr,
-			   struct uverbs_attr_bundle *attrs)
+static int mthca_create_user_cq(struct ib_cq *ibcq,
+				const struct ib_cq_init_attr *attr,
+				struct uverbs_attr_bundle *attrs)
 {
 	struct ib_udata *udata = &attrs->driver_udata;
 	struct ib_device *ibdev = ibcq->device;
@@ -586,47 +586,41 @@ static int mthca_create_cq(struct ib_cq *ibcq,
 	struct mthca_ucontext *context = rdma_udata_to_drv_context(
 		udata, struct mthca_ucontext, ibucontext);
 
-	if (attr->flags)
+	if (attr->flags || ibcq->umem)
 		return -EOPNOTSUPP;
 
-	if (entries < 1 || entries > to_mdev(ibdev)->limits.max_cqes)
+	if (attr->cqe > to_mdev(ibdev)->limits.max_cqes)
 		return -EINVAL;
 
-	if (udata) {
-		if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd)))
-			return -EFAULT;
+	if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd)))
+		return -EFAULT;
 
-		err = mthca_map_user_db(to_mdev(ibdev), &context->uar,
-					context->db_tab, ucmd.set_db_index,
-					ucmd.set_db_page);
-		if (err)
-			return err;
+	err = mthca_map_user_db(to_mdev(ibdev), &context->uar,
+				context->db_tab, ucmd.set_db_index,
+				ucmd.set_db_page);
+	if (err)
+		return err;
 
-		err = mthca_map_user_db(to_mdev(ibdev), &context->uar,
-					context->db_tab, ucmd.arm_db_index,
-					ucmd.arm_db_page);
-		if (err)
-			goto err_unmap_set;
-	}
+	err = mthca_map_user_db(to_mdev(ibdev), &context->uar,
+				context->db_tab, ucmd.arm_db_index,
+				ucmd.arm_db_page);
+	if (err)
+		goto err_unmap_set;
 
 	cq = to_mcq(ibcq);
 
-	if (udata) {
-		cq->buf.mr.ibmr.lkey = ucmd.lkey;
-		cq->set_ci_db_index  = ucmd.set_db_index;
-		cq->arm_db_index     = ucmd.arm_db_index;
-	}
+	cq->buf.mr.ibmr.lkey = ucmd.lkey;
+	cq->set_ci_db_index  = ucmd.set_db_index;
+	cq->arm_db_index     = ucmd.arm_db_index;
 
 	for (nent = 1; nent <= entries; nent <<= 1)
 		; /* nothing */
 
-	err = mthca_init_cq(to_mdev(ibdev), nent, context,
-			    udata ? ucmd.pdn : to_mdev(ibdev)->driver_pd.pd_num,
-			    cq);
+	err = mthca_init_cq(to_mdev(ibdev), nent, context, ucmd.pdn, cq);
 	if (err)
 		goto err_unmap_arm;
 
-	if (udata && ib_copy_to_udata(udata, &cq->cqn, sizeof(__u32))) {
+	if (ib_copy_to_udata(udata, &cq->cqn, sizeof(__u32))) {
 		mthca_free_cq(to_mdev(ibdev), cq);
 		err = -EFAULT;
 		goto err_unmap_arm;
@@ -637,18 +631,47 @@ static int mthca_create_cq(struct ib_cq *ibcq,
 	return 0;
 
 err_unmap_arm:
-	if (udata)
-		mthca_unmap_user_db(to_mdev(ibdev), &context->uar,
-				    context->db_tab, ucmd.arm_db_index);
+	mthca_unmap_user_db(to_mdev(ibdev), &context->uar,
+			    context->db_tab, ucmd.arm_db_index);
 
 err_unmap_set:
-	if (udata)
-		mthca_unmap_user_db(to_mdev(ibdev), &context->uar,
-				    context->db_tab, ucmd.set_db_index);
+	mthca_unmap_user_db(to_mdev(ibdev), &context->uar,
+			    context->db_tab, ucmd.set_db_index);
 
 	return err;
 }
 
+static int mthca_create_cq(struct ib_cq *ibcq,
+			   const struct ib_cq_init_attr *attr,
+			   struct uverbs_attr_bundle *attrs)
+{
+	struct ib_device *ibdev = ibcq->device;
+	int entries = attr->cqe;
+	struct mthca_cq *cq;
+	int nent;
+	int err;
+
+	if (attr->flags)
+		return -EOPNOTSUPP;
+
+	if (attr->cqe > to_mdev(ibdev)->limits.max_cqes)
+		return -EINVAL;
+
+	cq = to_mcq(ibcq);
+
+	for (nent = 1; nent <= entries; nent <<= 1)
+		; /* nothing */
+
+	err = mthca_init_cq(to_mdev(ibdev), nent, NULL,
+			    to_mdev(ibdev)->driver_pd.pd_num, cq);
+	if (err)
+		return err;
+
+	cq->resize_buf = NULL;
+
+	return 0;
+}
+
 static int mthca_alloc_resize_buf(struct mthca_dev *dev, struct mthca_cq *cq,
 				  int entries)
 {
@@ -1070,6 +1093,7 @@ static const struct ib_device_ops mthca_dev_ops = {
 	.attach_mcast = mthca_multicast_attach,
 	.create_ah = mthca_ah_create,
 	.create_cq = mthca_create_cq,
+	.create_user_cq = mthca_create_user_cq,
 	.create_qp = mthca_create_qp,
 	.dealloc_pd = mthca_dealloc_pd,
 	.dealloc_ucontext = mthca_dealloc_ucontext,

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 18/50] RDMA/erdma: Separate user and kernel CQ creation paths
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (16 preceding siblings ...)
  2026-02-13 10:57 ` [PATCH rdma-next 17/50] RDMA/mthca: Split user and kernel " Leon Romanovsky
@ 2026-02-13 10:57 ` Leon Romanovsky
  2026-02-24  5:51   ` Cheng Xu
  2026-02-13 10:57 ` [PATCH rdma-next 19/50] RDMA/ionic: Split " Leon Romanovsky
                   ` (33 subsequent siblings)
  51 siblings, 1 reply; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

Split CQ creation into distinct kernel and user flows. The erdma driver
uses a problematic pattern, inherited from mlx4, that shares and caches
the doorbell-record umem in erdma_map_user_dbrecords(). This design
blocks the driver from supporting generic umem sources (VMA, dmabuf,
memfd, and others).
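
To make the objection concrete, here is a rough sketch of a
per-context doorbell-record page cache of the kind referred to above;
structure and names are illustrative, not the exact erdma code:

struct dbrec_page {
        struct list_head list;
        struct ib_umem *umem;   /* one pinned page shared by many objects */
        u64 va;                 /* page-aligned user VA this entry covers */
        u32 refcnt;
};

/* Reuse an already-pinned page when several CQs/QPs place their
 * doorbell records in the same user page.  Because the umem is
 * shared and refcounted behind the driver's back, it cannot simply
 * be swapped for a caller-supplied umem (dmabuf, memfd, ...).
 */
static struct ib_umem *dbrec_lookup_or_pin(struct ib_device *dev,
                                           struct list_head *cache, u64 va)
{
        struct dbrec_page *p;

        list_for_each_entry(p, cache, list) {
                if (p->va == (va & PAGE_MASK)) {
                        p->refcnt++;
                        return p->umem;
                }
        }

        p = kzalloc(sizeof(*p), GFP_KERNEL);
        if (!p)
                return ERR_PTR(-ENOMEM);
        p->va = va & PAGE_MASK;
        p->umem = ib_umem_get(dev, p->va, PAGE_SIZE, 0);
        if (IS_ERR(p->umem)) {
                struct ib_umem *umem = p->umem;

                kfree(p);
                return umem;
        }
        p->refcnt = 1;
        list_add(&p->list, cache);
        return p->umem;
}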

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/erdma/erdma_main.c  |  1 +
 drivers/infiniband/hw/erdma/erdma_verbs.c | 97 ++++++++++++++++++++-----------
 drivers/infiniband/hw/erdma/erdma_verbs.h |  2 +
 3 files changed, 67 insertions(+), 33 deletions(-)

diff --git a/drivers/infiniband/hw/erdma/erdma_main.c b/drivers/infiniband/hw/erdma/erdma_main.c
index f35b30235018..1b6426e89d80 100644
--- a/drivers/infiniband/hw/erdma/erdma_main.c
+++ b/drivers/infiniband/hw/erdma/erdma_main.c
@@ -505,6 +505,7 @@ static const struct ib_device_ops erdma_device_ops = {
 	.alloc_pd = erdma_alloc_pd,
 	.alloc_ucontext = erdma_alloc_ucontext,
 	.create_cq = erdma_create_cq,
+	.create_user_cq = erdma_create_user_cq,
 	.create_qp = erdma_create_qp,
 	.dealloc_pd = erdma_dealloc_pd,
 	.dealloc_ucontext = erdma_dealloc_ucontext,
diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.c b/drivers/infiniband/hw/erdma/erdma_verbs.c
index 058edc42de58..6f809907fec5 100644
--- a/drivers/infiniband/hw/erdma/erdma_verbs.c
+++ b/drivers/infiniband/hw/erdma/erdma_verbs.c
@@ -1952,8 +1952,8 @@ static int erdma_init_kernel_cq(struct erdma_cq *cq)
 	return -ENOMEM;
 }
 
-int erdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
-		    struct uverbs_attr_bundle *attrs)
+int erdma_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+			 struct uverbs_attr_bundle *attrs)
 {
 	struct ib_udata *udata = &attrs->driver_udata;
 	struct erdma_cq *cq = to_ecq(ibcq);
@@ -1962,6 +1962,11 @@ int erdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	int ret;
 	struct erdma_ucontext *ctx = rdma_udata_to_drv_context(
 		udata, struct erdma_ucontext, ibucontext);
+	struct erdma_ureq_create_cq ureq;
+	struct erdma_uresp_create_cq uresp;
+
+	if (ibcq->umem)
+		return -EOPNOTSUPP;
 
 	if (depth > dev->attrs.max_cqe)
 		return -EINVAL;
@@ -1977,31 +1982,22 @@ int erdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	if (ret < 0)
 		return ret;
 
-	if (!rdma_is_kernel_res(&ibcq->res)) {
-		struct erdma_ureq_create_cq ureq;
-		struct erdma_uresp_create_cq uresp;
-
-		ret = ib_copy_from_udata(&ureq, udata,
-					 min(udata->inlen, sizeof(ureq)));
-		if (ret)
-			goto err_out_xa;
+	ret = ib_copy_from_udata(&ureq, udata,
+				 min(udata->inlen, sizeof(ureq)));
+	if (ret)
+		goto err_out_xa;
 
-		ret = erdma_init_user_cq(ctx, cq, &ureq);
-		if (ret)
-			goto err_out_xa;
+	ret = erdma_init_user_cq(ctx, cq, &ureq);
+	if (ret)
+		goto err_out_xa;
 
-		uresp.cq_id = cq->cqn;
-		uresp.num_cqe = depth;
+	uresp.cq_id = cq->cqn;
+	uresp.num_cqe = depth;
 
-		ret = ib_copy_to_udata(udata, &uresp,
-				       min(sizeof(uresp), udata->outlen));
-		if (ret)
-			goto err_free_res;
-	} else {
-		ret = erdma_init_kernel_cq(cq);
-		if (ret)
-			goto err_out_xa;
-	}
+	ret = ib_copy_to_udata(udata, &uresp,
+			       min(sizeof(uresp), udata->outlen));
+	if (ret)
+		goto err_free_res;
 
 	ret = create_cq_cmd(ctx, cq);
 	if (ret)
@@ -2010,19 +2006,54 @@ int erdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	return 0;
 
 err_free_res:
-	if (!rdma_is_kernel_res(&ibcq->res)) {
-		erdma_unmap_user_dbrecords(ctx, &cq->user_cq.user_dbr_page);
-		put_mtt_entries(dev, &cq->user_cq.qbuf_mem);
-	} else {
-		dma_free_coherent(&dev->pdev->dev, depth << CQE_SHIFT,
-				  cq->kern_cq.qbuf, cq->kern_cq.qbuf_dma_addr);
-		dma_pool_free(dev->db_pool, cq->kern_cq.dbrec,
-			      cq->kern_cq.dbrec_dma);
-	}
+	erdma_unmap_user_dbrecords(ctx, &cq->user_cq.user_dbr_page);
+	put_mtt_entries(dev, &cq->user_cq.qbuf_mem);
 
 err_out_xa:
 	xa_erase(&dev->cq_xa, cq->cqn);
+	return ret;
+}
+
+int erdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+		    struct uverbs_attr_bundle *attrs)
+{
+	struct erdma_cq *cq = to_ecq(ibcq);
+	struct erdma_dev *dev = to_edev(ibcq->device);
+	unsigned int depth = attr->cqe;
+	int ret;
+
+	if (depth > dev->attrs.max_cqe)
+		return -EINVAL;
 
+	depth = roundup_pow_of_two(depth);
+	cq->ibcq.cqe = depth;
+	cq->depth = depth;
+	cq->assoc_eqn = attr->comp_vector + 1;
+
+	ret = xa_alloc_cyclic(&dev->cq_xa, &cq->cqn, cq,
+			      XA_LIMIT(1, dev->attrs.max_cq - 1),
+			      &dev->next_alloc_cqn, GFP_KERNEL);
+	if (ret < 0)
+		return ret;
+
+	ret = erdma_init_kernel_cq(cq);
+	if (ret)
+		goto err_out_xa;
+
+	ret = create_cq_cmd(NULL, cq);
+	if (ret)
+		goto err_free_res;
+
+	return 0;
+
+err_free_res:
+	dma_free_coherent(&dev->pdev->dev, depth << CQE_SHIFT,
+			  cq->kern_cq.qbuf, cq->kern_cq.qbuf_dma_addr);
+	dma_pool_free(dev->db_pool, cq->kern_cq.dbrec,
+		      cq->kern_cq.dbrec_dma);
+
+err_out_xa:
+	xa_erase(&dev->cq_xa, cq->cqn);
 	return ret;
 }
 
diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.h b/drivers/infiniband/hw/erdma/erdma_verbs.h
index 7d8d3fe501d5..21a4fb404806 100644
--- a/drivers/infiniband/hw/erdma/erdma_verbs.h
+++ b/drivers/infiniband/hw/erdma/erdma_verbs.h
@@ -435,6 +435,8 @@ int erdma_get_port_immutable(struct ib_device *dev, u32 port,
 			     struct ib_port_immutable *ib_port_immutable);
 int erdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		    struct uverbs_attr_bundle *attrs);
+int erdma_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+			 struct uverbs_attr_bundle *attrs);
 int erdma_query_port(struct ib_device *dev, u32 port,
 		     struct ib_port_attr *attr);
 int erdma_query_gid(struct ib_device *dev, u32 port, int idx,

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 19/50] RDMA/ionic: Split user and kernel CQ creation paths
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (17 preceding siblings ...)
  2026-02-13 10:57 ` [PATCH rdma-next 18/50] RDMA/erdma: Separate " Leon Romanovsky
@ 2026-02-13 10:57 ` Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 20/50] RDMA/qedr: Convert to modern CQ interface Leon Romanovsky
                   ` (32 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

Separate the CQ creation logic into distinct kernel and user flows. The ionic
driver may allocate two umems per CQ, and the current layout prevents it from
supporting generic umem sources (VMA, dmabuf, memfd, and others).
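
The reworked error handling in the kernel path follows the common
partial-unwind idiom: when creation fails at index i, everything fully
created for lower indices is torn down. A generic sketch of the idiom,
with illustrative helper names rather than the actual ionic functions:

        int i, rc;

        for (i = 0; i < n; i++) {
                rc = create_one(i);
                if (rc)
                        goto unwind;
        }
        return 0;

unwind:
        while (i--)
                destroy_one(i);
        return rc;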

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/ionic/ionic_controlpath.c | 88 +++++++++++++++++--------
 drivers/infiniband/hw/ionic/ionic_ibdev.c       |  1 +
 drivers/infiniband/hw/ionic/ionic_ibdev.h       |  2 +
 3 files changed, 64 insertions(+), 27 deletions(-)

diff --git a/drivers/infiniband/hw/ionic/ionic_controlpath.c b/drivers/infiniband/hw/ionic/ionic_controlpath.c
index ea12d9b8e125..5b8b6baaf5d4 100644
--- a/drivers/infiniband/hw/ionic/ionic_controlpath.c
+++ b/drivers/infiniband/hw/ionic/ionic_controlpath.c
@@ -89,7 +89,7 @@ int ionic_create_cq_common(struct ionic_vcq *vcq,
 
 	cq->vcq = vcq;
 
-	if (attr->cqe < 1 || attr->cqe + IONIC_CQ_GRACE > 0xffff) {
+	if (attr->cqe > 0xffff - IONIC_CQ_GRACE) {
 		rc = -EINVAL;
 		goto err_args;
 	}
@@ -1209,8 +1209,8 @@ static int ionic_destroy_cq_cmd(struct ionic_ibdev *dev, u32 cqid)
 	return ionic_admin_wait(dev, &wr, IONIC_ADMIN_F_TEARDOWN);
 }
 
-int ionic_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
-		    struct uverbs_attr_bundle *attrs)
+int ionic_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+			 struct uverbs_attr_bundle *attrs)
 {
 	struct ionic_ibdev *dev = to_ionic_ibdev(ibcq->device);
 	struct ib_udata *udata = &attrs->driver_udata;
@@ -1222,21 +1222,18 @@ int ionic_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	struct ionic_cq_req req;
 	int udma_idx = 0, rc;
 
-	if (udata) {
-		rc = ib_copy_from_udata(&req, udata, sizeof(req));
-		if (rc)
-			return rc;
-	}
+	if (ibcq->umem)
+		return -EOPNOTSUPP;
 
-	vcq->udma_mask = BIT(dev->lif_cfg.udma_count) - 1;
+	rc = ib_copy_from_udata(&req, udata, sizeof(req));
+	if (rc)
+		return rc;
 
-	if (udata)
-		vcq->udma_mask &= req.udma_mask;
+	vcq->udma_mask = BIT(dev->lif_cfg.udma_count) - 1;
+	vcq->udma_mask &= req.udma_mask;
 
-	if (!vcq->udma_mask) {
-		rc = -EINVAL;
-		goto err_init;
-	}
+	if (!vcq->udma_mask)
+		return -EINVAL;
 
 	for (; udma_idx < dev->lif_cfg.udma_count; ++udma_idx) {
 		if (!(vcq->udma_mask & BIT(udma_idx)))
@@ -1247,24 +1244,25 @@ int ionic_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 					    &resp.cqid[udma_idx],
 					    udma_idx);
 		if (rc)
-			goto err_init;
+			goto err_resp;
 
 		rc = ionic_create_cq_cmd(dev, ctx, &vcq->cq[udma_idx], &buf);
-		if (rc)
-			goto err_cmd;
+		if (rc) {
+			ionic_pgtbl_unbuf(dev, &buf);
+			ionic_destroy_cq_common(dev, &vcq->cq[udma_idx]);
+			goto err_resp;
+		}
 
 		ionic_pgtbl_unbuf(dev, &buf);
 	}
 
 	vcq->ibcq.cqe = attr->cqe;
 
-	if (udata) {
-		resp.udma_mask = vcq->udma_mask;
+	resp.udma_mask = vcq->udma_mask;
 
-		rc = ib_copy_to_udata(udata, &resp, sizeof(resp));
-		if (rc)
-			goto err_resp;
-	}
+	rc = ib_copy_to_udata(udata, &resp, sizeof(resp));
+	if (rc)
+		goto err_resp;
 
 	return 0;
 
@@ -1274,11 +1272,47 @@ int ionic_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		if (!(vcq->udma_mask & BIT(udma_idx)))
 			continue;
 		ionic_destroy_cq_cmd(dev, vcq->cq[udma_idx].cqid);
-err_cmd:
 		ionic_pgtbl_unbuf(dev, &buf);
 		ionic_destroy_cq_common(dev, &vcq->cq[udma_idx]);
-err_init:
-		;
+	}
+
+	return rc;
+}
+
+int ionic_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+		    struct uverbs_attr_bundle *attrs)
+{
+	struct ionic_ibdev *dev = to_ionic_ibdev(ibcq->device);
+	struct ionic_vcq *vcq = to_ionic_vcq(ibcq);
+	struct ionic_tbl_buf buf = {};
+	int udma_idx = 0, rc;
+
+	vcq->udma_mask = BIT(dev->lif_cfg.udma_count) - 1;
+	for (; udma_idx < dev->lif_cfg.udma_count; ++udma_idx) {
+		rc = ionic_create_cq_common(vcq, &buf, attr, NULL, NULL, NULL,
+					    NULL, udma_idx);
+		if (rc)
+			goto err_resp;
+
+		rc = ionic_create_cq_cmd(dev, NULL, &vcq->cq[udma_idx], &buf);
+		if (rc) {
+			ionic_pgtbl_unbuf(dev, &buf);
+			ionic_destroy_cq_common(dev, &vcq->cq[udma_idx]);
+			goto err_resp;
+		}
+
+		ionic_pgtbl_unbuf(dev, &buf);
+	}
+
+	vcq->ibcq.cqe = attr->cqe;
+
+	return 0;
+
+err_resp:
+	while (udma_idx--) {
+		ionic_destroy_cq_cmd(dev, vcq->cq[udma_idx].cqid);
+		ionic_pgtbl_unbuf(dev, &buf);
+		ionic_destroy_cq_common(dev, &vcq->cq[udma_idx]);
 	}
 
 	return rc;
diff --git a/drivers/infiniband/hw/ionic/ionic_ibdev.c b/drivers/infiniband/hw/ionic/ionic_ibdev.c
index 164046d00e5d..32321a8996d6 100644
--- a/drivers/infiniband/hw/ionic/ionic_ibdev.c
+++ b/drivers/infiniband/hw/ionic/ionic_ibdev.c
@@ -229,6 +229,7 @@ static const struct ib_device_ops ionic_dev_ops = {
 	.alloc_mw = ionic_alloc_mw,
 	.dealloc_mw = ionic_dealloc_mw,
 	.create_cq = ionic_create_cq,
+	.create_user_cq = ionic_create_user_cq,
 	.destroy_cq = ionic_destroy_cq,
 	.create_qp = ionic_create_qp,
 	.modify_qp = ionic_modify_qp,
diff --git a/drivers/infiniband/hw/ionic/ionic_ibdev.h b/drivers/infiniband/hw/ionic/ionic_ibdev.h
index 63828240d659..0bcb8be6fb62 100644
--- a/drivers/infiniband/hw/ionic/ionic_ibdev.h
+++ b/drivers/infiniband/hw/ionic/ionic_ibdev.h
@@ -482,6 +482,8 @@ int ionic_alloc_mw(struct ib_mw *ibmw, struct ib_udata *udata);
 int ionic_dealloc_mw(struct ib_mw *ibmw);
 int ionic_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		    struct uverbs_attr_bundle *attrs);
+int ionic_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+			 struct uverbs_attr_bundle *attrs);
 int ionic_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata);
 int ionic_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *attr,
 		    struct ib_udata *udata);

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 20/50] RDMA/qedr: Convert to modern CQ interface
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (18 preceding siblings ...)
  2026-02-13 10:57 ` [PATCH rdma-next 19/50] RDMA/ionic: Split " Leon Romanovsky
@ 2026-02-13 10:57 ` Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 21/50] RDMA/vmw_pvrdma: Provide a modern CQ creation interface Leon Romanovsky
                   ` (31 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

Allow users to supply their own umem.
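
The heart of the conversion is the adopt-or-create pattern that now
opens the user path (visible in the diff below): a umem already
attached by ib_core is used as-is, and the legacy flow still pins one
from the address passed in udata:

        if (!ibcq->umem)
                ibcq->umem = ib_umem_get(&dev->ibdev, ureq.addr, ureq.len,
                                         IB_ACCESS_LOCAL_WRITE);
        if (IS_ERR(ibcq->umem))
                return PTR_ERR(ibcq->umem);
        cq->q.umem = ibcq->umem;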

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/qedr/main.c  |   1 +
 drivers/infiniband/hw/qedr/verbs.c | 323 +++++++++++++++++++++----------------
 drivers/infiniband/hw/qedr/verbs.h |   2 +
 3 files changed, 188 insertions(+), 138 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/main.c b/drivers/infiniband/hw/qedr/main.c
index ecdfeff3d44f..c6ca95983492 100644
--- a/drivers/infiniband/hw/qedr/main.c
+++ b/drivers/infiniband/hw/qedr/main.c
@@ -199,6 +199,7 @@ static const struct ib_device_ops qedr_dev_ops = {
 	.alloc_ucontext = qedr_alloc_ucontext,
 	.create_ah = qedr_create_ah,
 	.create_cq = qedr_create_cq,
+	.create_user_cq = qedr_create_user_cq,
 	.create_qp = qedr_create_qp,
 	.create_srq = qedr_create_srq,
 	.dealloc_pd = qedr_dealloc_pd,
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index cb06c5d894b8..10010ccf63b3 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -789,52 +789,33 @@ static int qedr_init_user_db_rec(struct ib_udata *udata,
 
 static inline int qedr_init_user_queue(struct ib_udata *udata,
 				       struct qedr_dev *dev,
-				       struct qedr_userq *q, u64 buf_addr,
-				       size_t buf_len, bool requires_db_rec,
-				       int access,
+				       struct qedr_userq *q,
+				       bool requires_db_rec,
 				       int alloc_and_init)
 {
 	u32 fw_pages;
 	int rc;
 
-	q->buf_addr = buf_addr;
-	q->buf_len = buf_len;
-	q->umem = ib_umem_get(&dev->ibdev, q->buf_addr, q->buf_len, access);
-	if (IS_ERR(q->umem)) {
-		DP_ERR(dev, "create user queue: failed ib_umem_get, got %ld\n",
-		       PTR_ERR(q->umem));
-		return PTR_ERR(q->umem);
-	}
-
 	fw_pages = ib_umem_num_dma_blocks(q->umem, 1 << FW_PAGE_SHIFT);
 	rc = qedr_prepare_pbl_tbl(dev, &q->pbl_info, fw_pages, 0);
 	if (rc)
-		goto err0;
+		return rc;
 
 	if (alloc_and_init) {
 		q->pbl_tbl = qedr_alloc_pbl_tbl(dev, &q->pbl_info, GFP_KERNEL);
-		if (IS_ERR(q->pbl_tbl)) {
-			rc = PTR_ERR(q->pbl_tbl);
-			goto err0;
-		}
+		if (IS_ERR(q->pbl_tbl))
+			return PTR_ERR(q->pbl_tbl);
+
 		qedr_populate_pbls(dev, q->umem, q->pbl_tbl, &q->pbl_info,
 				   FW_PAGE_SHIFT);
 	} else {
 		q->pbl_tbl = kzalloc(sizeof(*q->pbl_tbl), GFP_KERNEL);
-		if (!q->pbl_tbl) {
-			rc = -ENOMEM;
-			goto err0;
-		}
+		if (!q->pbl_tbl)
+			return -ENOMEM;
 	}
 
 	/* mmap the user address used to store doorbell data for recovery */
 	return qedr_init_user_db_rec(udata, dev, q, requires_db_rec);
-
-err0:
-	ib_umem_release(q->umem);
-	q->umem = NULL;
-
-	return rc;
 }
 
 static inline void qedr_init_cq_params(struct qedr_cq *cq,
@@ -899,8 +880,8 @@ int qedr_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags)
 	return 0;
 }
 
-int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
-		   struct uverbs_attr_bundle *attrs)
+int qedr_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+			struct uverbs_attr_bundle *attrs)
 {
 	struct ib_udata *udata = &attrs->driver_udata;
 	struct ib_device *ibdev = ibcq->device;
@@ -908,6 +889,104 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		udata, struct qedr_ucontext, ibucontext);
 	struct qed_rdma_destroy_cq_out_params destroy_oparams;
 	struct qed_rdma_destroy_cq_in_params destroy_iparams;
+	struct qedr_dev *dev = get_qedr_dev(ibdev);
+	struct qed_rdma_create_cq_in_params params;
+	struct qedr_create_cq_ureq ureq = {};
+	int vector = attr->comp_vector;
+	int entries = attr->cqe;
+	struct qedr_cq *cq = get_qedr_cq(ibcq);
+	int chain_entries;
+	u32 db_offset;
+	int page_cnt;
+	u64 pbl_ptr;
+	u16 icid;
+	int rc;
+
+	DP_DEBUG(dev, QEDR_MSG_INIT,
+		 "create_cq: called from User Lib. entries=%d, vector=%d\n",
+		 entries, vector);
+
+	if (attr->flags)
+		return -EOPNOTSUPP;
+
+	if (attr->cqe > QEDR_MAX_CQES)
+		return -EINVAL;
+
+	chain_entries = qedr_align_cq_entries(entries);
+	chain_entries = min_t(int, chain_entries, QEDR_MAX_CQES);
+
+	/* calc db offset. user will add DPI base, kernel will add db addr */
+	db_offset = DB_ADDR_SHIFT(DQ_PWM_OFFSET_UCM_RDMA_CQ_CONS_32BIT);
+
+	if (ib_copy_from_udata(&ureq, udata, min(sizeof(ureq), udata->inlen)))
+		return -EINVAL;
+
+	cq->cq_type = QEDR_CQ_TYPE_USER;
+
+	cq->q.buf_addr = ureq.addr;
+	cq->q.buf_len = ureq.len;
+	if (!ibcq->umem)
+		ibcq->umem = ib_umem_get(&dev->ibdev, ureq.addr, ureq.len,
+					 IB_ACCESS_LOCAL_WRITE);
+	if (IS_ERR(ibcq->umem))
+		return PTR_ERR(ibcq->umem);
+	cq->q.umem = ibcq->umem;
+
+	rc = qedr_init_user_queue(udata, dev, &cq->q, true, 1);
+	if (rc)
+		return rc;
+
+	pbl_ptr = cq->q.pbl_tbl->pa;
+	page_cnt = cq->q.pbl_info.num_pbes;
+
+	cq->ibcq.cqe = chain_entries;
+	cq->q.db_addr = ctx->dpi_addr + db_offset;
+
+	qedr_init_cq_params(cq, ctx, dev, vector, chain_entries, page_cnt,
+			    pbl_ptr, &params);
+
+	rc = dev->ops->rdma_create_cq(dev->rdma_ctx, &params, &icid);
+	if (rc)
+		goto err1;
+
+	cq->icid = icid;
+	cq->sig = QEDR_CQ_MAGIC_NUMBER;
+	spin_lock_init(&cq->cq_lock);
+
+	rc = qedr_copy_cq_uresp(dev, cq, udata, db_offset);
+	if (rc)
+		goto err2;
+
+	rc = qedr_db_recovery_add(dev, cq->q.db_addr,
+				  &cq->q.db_rec_data->db_data,
+				  DB_REC_WIDTH_64B,
+				  DB_REC_USER);
+	if (rc)
+		goto err2;
+
+	DP_DEBUG(dev, QEDR_MSG_CQ,
+		 "create cq: icid=0x%0x, addr=%p, size(entries)=0x%0x\n",
+		 cq->icid, cq, params.cq_size);
+
+	return 0;
+
+err2:
+	destroy_iparams.icid = cq->icid;
+	dev->ops->rdma_destroy_cq(dev->rdma_ctx, &destroy_iparams,
+				  &destroy_oparams);
+err1:
+	qedr_free_pbl(dev, &cq->q.pbl_info, cq->q.pbl_tbl);
+	if (cq->q.db_mmap_entry)
+		rdma_user_mmap_entry_remove(cq->q.db_mmap_entry);
+	return rc;
+}
+
+int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+		   struct uverbs_attr_bundle *attrs)
+{
+	struct ib_device *ibdev = ibcq->device;
+	struct qed_rdma_destroy_cq_out_params destroy_oparams;
+	struct qed_rdma_destroy_cq_in_params destroy_iparams;
 	struct qed_chain_init_params chain_params = {
 		.mode		= QED_CHAIN_MODE_PBL,
 		.intended_use	= QED_CHAIN_USE_TO_CONSUME,
@@ -916,7 +995,6 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	};
 	struct qedr_dev *dev = get_qedr_dev(ibdev);
 	struct qed_rdma_create_cq_in_params params;
-	struct qedr_create_cq_ureq ureq = {};
 	int vector = attr->comp_vector;
 	int entries = attr->cqe;
 	struct qedr_cq *cq = get_qedr_cq(ibcq);
@@ -928,18 +1006,14 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	int rc;
 
 	DP_DEBUG(dev, QEDR_MSG_INIT,
-		 "create_cq: called from %s. entries=%d, vector=%d\n",
-		 udata ? "User Lib" : "Kernel", entries, vector);
+		 "create_cq: called from Kernel. entries=%d, vector=%d\n",
+		 entries, vector);
 
 	if (attr->flags)
 		return -EOPNOTSUPP;
 
-	if (entries > QEDR_MAX_CQES) {
-		DP_ERR(dev,
-		       "create cq: the number of entries %d is too high. Must be equal or below %d.\n",
-		       entries, QEDR_MAX_CQES);
+	if (attr->cqe > QEDR_MAX_CQES)
 		return -EINVAL;
-	}
 
 	chain_entries = qedr_align_cq_entries(entries);
 	chain_entries = min_t(int, chain_entries, QEDR_MAX_CQES);
@@ -948,47 +1022,18 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	/* calc db offset. user will add DPI base, kernel will add db addr */
 	db_offset = DB_ADDR_SHIFT(DQ_PWM_OFFSET_UCM_RDMA_CQ_CONS_32BIT);
 
-	if (udata) {
-		if (ib_copy_from_udata(&ureq, udata, min(sizeof(ureq),
-							 udata->inlen))) {
-			DP_ERR(dev,
-			       "create cq: problem copying data from user space\n");
-			goto err0;
-		}
+	cq->cq_type = QEDR_CQ_TYPE_KERNEL;
 
-		if (!ureq.len) {
-			DP_ERR(dev,
-			       "create cq: cannot create a cq with 0 entries\n");
-			goto err0;
-		}
-
-		cq->cq_type = QEDR_CQ_TYPE_USER;
-
-		rc = qedr_init_user_queue(udata, dev, &cq->q, ureq.addr,
-					  ureq.len, true, IB_ACCESS_LOCAL_WRITE,
-					  1);
-		if (rc)
-			goto err0;
-
-		pbl_ptr = cq->q.pbl_tbl->pa;
-		page_cnt = cq->q.pbl_info.num_pbes;
-
-		cq->ibcq.cqe = chain_entries;
-		cq->q.db_addr = ctx->dpi_addr + db_offset;
-	} else {
-		cq->cq_type = QEDR_CQ_TYPE_KERNEL;
+	rc = dev->ops->common->chain_alloc(dev->cdev, &cq->pbl,
+					   &chain_params);
+	if (rc)
+		return rc;
 
-		rc = dev->ops->common->chain_alloc(dev->cdev, &cq->pbl,
-						   &chain_params);
-		if (rc)
-			goto err0;
+	page_cnt = qed_chain_get_page_cnt(&cq->pbl);
+	pbl_ptr = qed_chain_get_pbl_phys(&cq->pbl);
+	cq->ibcq.cqe = cq->pbl.capacity;
 
-		page_cnt = qed_chain_get_page_cnt(&cq->pbl);
-		pbl_ptr = qed_chain_get_pbl_phys(&cq->pbl);
-		cq->ibcq.cqe = cq->pbl.capacity;
-	}
-
-	qedr_init_cq_params(cq, ctx, dev, vector, chain_entries, page_cnt,
+	qedr_init_cq_params(cq, NULL, dev, vector, chain_entries, page_cnt,
 			    pbl_ptr, &params);
 
 	rc = dev->ops->rdma_create_cq(dev->rdma_ctx, &params, &icid);
@@ -999,37 +1044,23 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	cq->sig = QEDR_CQ_MAGIC_NUMBER;
 	spin_lock_init(&cq->cq_lock);
 
-	if (udata) {
-		rc = qedr_copy_cq_uresp(dev, cq, udata, db_offset);
-		if (rc)
-			goto err2;
-
-		rc = qedr_db_recovery_add(dev, cq->q.db_addr,
-					  &cq->q.db_rec_data->db_data,
-					  DB_REC_WIDTH_64B,
-					  DB_REC_USER);
-		if (rc)
-			goto err2;
+	/* Generate doorbell address. */
+	cq->db.data.icid = cq->icid;
+	cq->db_addr = dev->db_addr + db_offset;
+	cq->db.data.params = DB_AGG_CMD_MAX <<
+	    RDMA_PWM_VAL32_DATA_AGG_CMD_SHIFT;
 
-	} else {
-		/* Generate doorbell address. */
-		cq->db.data.icid = cq->icid;
-		cq->db_addr = dev->db_addr + db_offset;
-		cq->db.data.params = DB_AGG_CMD_MAX <<
-		    RDMA_PWM_VAL32_DATA_AGG_CMD_SHIFT;
-
-		/* point to the very last element, passing it we will toggle */
-		cq->toggle_cqe = qed_chain_get_last_elem(&cq->pbl);
-		cq->pbl_toggle = RDMA_CQE_REQUESTER_TOGGLE_BIT_MASK;
-		cq->latest_cqe = NULL;
-		consume_cqe(cq);
-		cq->cq_cons = qed_chain_get_cons_idx_u32(&cq->pbl);
+	/* point to the very last element, passing it we will toggle */
+	cq->toggle_cqe = qed_chain_get_last_elem(&cq->pbl);
+	cq->pbl_toggle = RDMA_CQE_REQUESTER_TOGGLE_BIT_MASK;
+	cq->latest_cqe = NULL;
+	consume_cqe(cq);
+	cq->cq_cons = qed_chain_get_cons_idx_u32(&cq->pbl);
 
-		rc = qedr_db_recovery_add(dev, cq->db_addr, &cq->db.data,
-					  DB_REC_WIDTH_64B, DB_REC_KERNEL);
-		if (rc)
-			goto err2;
-	}
+	rc = qedr_db_recovery_add(dev, cq->db_addr, &cq->db.data,
+				  DB_REC_WIDTH_64B, DB_REC_KERNEL);
+	if (rc)
+		goto err2;
 
 	DP_DEBUG(dev, QEDR_MSG_CQ,
 		 "create cq: icid=0x%0x, addr=%p, size(entries)=0x%0x\n",
@@ -1042,16 +1073,8 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	dev->ops->rdma_destroy_cq(dev->rdma_ctx, &destroy_iparams,
 				  &destroy_oparams);
 err1:
-	if (udata) {
-		qedr_free_pbl(dev, &cq->q.pbl_info, cq->q.pbl_tbl);
-		ib_umem_release(cq->q.umem);
-		if (cq->q.db_mmap_entry)
-			rdma_user_mmap_entry_remove(cq->q.db_mmap_entry);
-	} else {
-		dev->ops->common->chain_free(dev->cdev, &cq->pbl);
-	}
-err0:
-	return -EINVAL;
+	dev->ops->common->chain_free(dev->cdev, &cq->pbl);
+	return rc;
 }
 
 #define QEDR_DESTROY_CQ_MAX_ITERATIONS		(10)
@@ -1081,7 +1104,6 @@ int qedr_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 
 	if (udata) {
 		qedr_free_pbl(dev, &cq->q.pbl_info, cq->q.pbl_tbl);
-		ib_umem_release(cq->q.umem);
 
 		if (cq->q.db_rec_data) {
 			qedr_db_recovery_del(dev, cq->q.db_addr,
@@ -1472,26 +1494,33 @@ static int qedr_init_srq_user_params(struct ib_udata *udata,
 	struct scatterlist *sg;
 	int rc;
 
-	rc = qedr_init_user_queue(udata, srq->dev, &srq->usrq, ureq->srq_addr,
-				  ureq->srq_len, false, access, 1);
+	srq->usrq.buf_addr = ureq->srq_addr;
+	srq->usrq.buf_len = ureq->srq_len;
+	srq->usrq.umem = ib_umem_get(&srq->dev->ibdev, ureq->srq_addr,
+				     ureq->srq_len, access);
+	if (IS_ERR(srq->usrq.umem))
+		return PTR_ERR(srq->usrq.umem);
+
+	rc = qedr_init_user_queue(udata, srq->dev, &srq->usrq, false, 1);
 	if (rc)
-		return rc;
+		goto err_umem;
 
 	srq->prod_umem = ib_umem_get(srq->ibsrq.device, ureq->prod_pair_addr,
 				     sizeof(struct rdma_srq_producers), access);
 	if (IS_ERR(srq->prod_umem)) {
+		rc = PTR_ERR(srq->prod_umem);
 		qedr_free_pbl(srq->dev, &srq->usrq.pbl_info, srq->usrq.pbl_tbl);
-		ib_umem_release(srq->usrq.umem);
-		DP_ERR(srq->dev,
-		       "create srq: failed ib_umem_get for producer, got %ld\n",
-		       PTR_ERR(srq->prod_umem));
-		return PTR_ERR(srq->prod_umem);
+		goto err_umem;
 	}
 
 	sg = srq->prod_umem->sgt_append.sgt.sgl;
 	srq->hw_srq.phy_prod_pair_addr = sg_dma_address(sg);
 
 	return 0;
+
+err_umem:
+	ib_umem_release(srq->usrq.umem);
+	return rc;
 }
 
 static int qedr_alloc_srq_kernel_params(struct qedr_srq *srq,
@@ -1870,27 +1899,34 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
 
 	if (qedr_qp_has_sq(qp)) {
 		/* SQ - read access only (0) */
-		rc = qedr_init_user_queue(udata, dev, &qp->usq, ureq.sq_addr,
-					  ureq.sq_len, true, 0, alloc_and_init);
+		qp->usq.buf_addr = ureq.sq_addr;
+		qp->usq.buf_len = ureq.sq_len;
+		qp->usq.umem = ib_umem_get(&dev->ibdev, ureq.sq_addr,
+					   ureq.sq_len, 0);
+		if (IS_ERR(qp->usq.umem))
+			return PTR_ERR(qp->usq.umem);
+
+		rc = qedr_init_user_queue(udata, dev, &qp->usq, true,
+					  alloc_and_init);
 		if (rc)
-			return rc;
+			goto err_sq_umem;
 	}
 
 	if (qedr_qp_has_rq(qp)) {
 		/* RQ - read access only (0) */
-		rc = qedr_init_user_queue(udata, dev, &qp->urq, ureq.rq_addr,
-					  ureq.rq_len, true, 0, alloc_and_init);
-		if (rc) {
-			ib_umem_release(qp->usq.umem);
-			qp->usq.umem = NULL;
-			if (rdma_protocol_roce(&dev->ibdev, 1)) {
-				qedr_free_pbl(dev, &qp->usq.pbl_info,
-					      qp->usq.pbl_tbl);
-			} else {
-				kfree(qp->usq.pbl_tbl);
-			}
-			return rc;
+		qp->urq.buf_addr = ureq.rq_addr;
+		qp->urq.buf_len = ureq.rq_len;
+		qp->urq.umem = ib_umem_get(&dev->ibdev, ureq.rq_addr,
+					   ureq.rq_len, 0);
+		if (IS_ERR(qp->urq.umem)) {
+			rc = PTR_ERR(qp->urq.umem);
+			goto err_rq_umem;
 		}
+
+		rc = qedr_init_user_queue(udata, dev, &qp->urq, true,
+					  alloc_and_init);
+		if (rc)
+			goto err_rq_umem2;
 	}
 
 	memset(&in_params, 0, sizeof(in_params));
@@ -1989,6 +2025,17 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
 err1:
 	qedr_cleanup_user(dev, ctx, qp);
 	return rc;
+
+err_rq_umem2:
+	ib_umem_release(qp->urq.umem);
+err_rq_umem:
+	if (rdma_protocol_roce(&dev->ibdev, 1))
+		qedr_free_pbl(dev, &qp->usq.pbl_info, qp->usq.pbl_tbl);
+	else
+		kfree(qp->usq.pbl_tbl);
+err_sq_umem:
+	ib_umem_release(qp->usq.umem);
+	return rc;
 }
 
 static int qedr_set_iwarp_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
diff --git a/drivers/infiniband/hw/qedr/verbs.h b/drivers/infiniband/hw/qedr/verbs.h
index 62420a15101b..292d77df562d 100644
--- a/drivers/infiniband/hw/qedr/verbs.h
+++ b/drivers/infiniband/hw/qedr/verbs.h
@@ -53,6 +53,8 @@ int qedr_alloc_xrcd(struct ib_xrcd *ibxrcd, struct ib_udata *udata);
 int qedr_dealloc_xrcd(struct ib_xrcd *ibxrcd, struct ib_udata *udata);
 int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		   struct uverbs_attr_bundle *attrs);
+int qedr_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+			struct uverbs_attr_bundle *attrs);
 int qedr_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata);
 int qedr_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags);
 int qedr_create_qp(struct ib_qp *qp, struct ib_qp_init_attr *attrs,

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 21/50] RDMA/vmw_pvrdma: Provide a modern CQ creation interface
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (19 preceding siblings ...)
  2026-02-13 10:57 ` [PATCH rdma-next 20/50] RDMA/qedr: Convert to modern CQ interface Leon Romanovsky
@ 2026-02-13 10:57 ` Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 22/50] RDMA/ocrdma: Split user and kernel CQ creation paths Leon Romanovsky
                   ` (30 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

The uverbs CQ creation UAPI allows users to supply their own umem for a
CQ. Update vmw_pvrdma to support this workflow while preserving the
legacy interface, in which the driver pins the umem itself.
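
One consequence worth calling out: with ib_core owning ibcq->umem,
driver-side ib_umem_release() calls drop out of the destroy path (see
the pvrdma_free_cq() hunk below). The single release is assumed to
happen once in the core, roughly along these lines; the exact core
code lives in the earlier patches of this series:

        /* Assumed core-side ownership: release the CQ umem exactly
         * once at destroy time, instead of in every driver.
         */
        if (cq->umem) {
                ib_umem_release(cq->umem);
                cq->umem = NULL;
        }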

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c    | 171 ++++++++++++++++--------
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c  |   1 +
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.h |   3 +
 3 files changed, 121 insertions(+), 54 deletions(-)

diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c
index b3df6eb9b8ef..c43c363565c1 100644
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c
@@ -90,16 +90,9 @@ int pvrdma_req_notify_cq(struct ib_cq *ibcq,
 	return has_data;
 }
 
-/**
- * pvrdma_create_cq - create completion queue
- * @ibcq: Allocated CQ
- * @attr: completion queue attributes
- * @attrs: bundle
- *
- * @return: 0 on success
- */
-int pvrdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
-		     struct uverbs_attr_bundle *attrs)
+int pvrdma_create_user_cq(struct ib_cq *ibcq,
+			  const struct ib_cq_init_attr *attr,
+			  struct uverbs_attr_bundle *attrs)
 {
 	struct ib_udata *udata = &attrs->driver_udata;
 	struct ib_device *ibdev = ibcq->device;
@@ -123,58 +116,48 @@ int pvrdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	if (attr->flags)
 		return -EOPNOTSUPP;
 
-	entries = roundup_pow_of_two(entries);
-	if (entries < 1 || entries > dev->dsr->caps.max_cqe)
+	if (attr->cqe > dev->dsr->caps.max_cqe)
 		return -EINVAL;
 
+	entries = roundup_pow_of_two(entries);
+
 	if (!atomic_add_unless(&dev->num_cqs, 1, dev->dsr->caps.max_cq))
 		return -ENOMEM;
 
 	cq->ibcq.cqe = entries;
-	cq->is_kernel = !udata;
-
-	if (!cq->is_kernel) {
-		if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd))) {
-			ret = -EFAULT;
-			goto err_cq;
-		}
-
-		cq->umem = ib_umem_get(ibdev, ucmd.buf_addr, ucmd.buf_size,
-				       IB_ACCESS_LOCAL_WRITE);
-		if (IS_ERR(cq->umem)) {
-			ret = PTR_ERR(cq->umem);
-			goto err_cq;
-		}
+	cq->is_kernel = false;
 
-		npages = ib_umem_num_dma_blocks(cq->umem, PAGE_SIZE);
-	} else {
-		/* One extra page for shared ring state */
-		npages = 1 + (entries * sizeof(struct pvrdma_cqe) +
-			      PAGE_SIZE - 1) / PAGE_SIZE;
+	if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd))) {
+		ret = -EFAULT;
+		goto err_cq;
+	}
 
-		/* Skip header page. */
-		cq->offset = PAGE_SIZE;
+	if (!ibcq->umem)
+		ibcq->umem = ib_umem_get(ibdev, ucmd.buf_addr, ucmd.buf_size,
+					 IB_ACCESS_LOCAL_WRITE);
+	if (IS_ERR(ibcq->umem)) {
+		ret = PTR_ERR(ibcq->umem);
+		goto err_cq;
 	}
 
+	npages = ib_umem_num_dma_blocks(cq->umem, PAGE_SIZE);
+
 	if (npages < 0 || npages > PVRDMA_PAGE_DIR_MAX_PAGES) {
 		dev_warn(&dev->pdev->dev,
 			 "overflow pages in completion queue\n");
 		ret = -EINVAL;
-		goto err_umem;
+		goto err_cq;
 	}
 
-	ret = pvrdma_page_dir_init(dev, &cq->pdir, npages, cq->is_kernel);
+	ret = pvrdma_page_dir_init(dev, &cq->pdir, npages, false);
 	if (ret) {
 		dev_warn(&dev->pdev->dev,
 			 "could not allocate page directory\n");
-		goto err_umem;
+		goto err_cq;
 	}
 
 	/* Ring state is always the first page. Set in library for user cq. */
-	if (cq->is_kernel)
-		cq->ring_state = cq->pdir.pages[0];
-	else
-		pvrdma_page_dir_insert_umem(&cq->pdir, cq->umem, 0);
+	pvrdma_page_dir_insert_umem(&cq->pdir, cq->umem, 0);
 
 	refcount_set(&cq->refcnt, 1);
 	init_completion(&cq->free);
@@ -183,7 +166,7 @@ int pvrdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	memset(cmd, 0, sizeof(*cmd));
 	cmd->hdr.cmd = PVRDMA_CMD_CREATE_CQ;
 	cmd->nchunks = npages;
-	cmd->ctx_handle = context ? context->ctx_handle : 0;
+	cmd->ctx_handle = context->ctx_handle;
 	cmd->cqe = entries;
 	cmd->pdir_dma = cq->pdir.dir_dma;
 	ret = pvrdma_cmd_post(dev, &req, &rsp, PVRDMA_CMD_CREATE_CQ_RESP);
@@ -200,24 +183,106 @@ int pvrdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	dev->cq_tbl[cq->cq_handle % dev->dsr->caps.max_cq] = cq;
 	spin_unlock_irqrestore(&dev->cq_tbl_lock, flags);
 
-	if (!cq->is_kernel) {
-		cq->uar = &context->uar;
+	cq->uar = &context->uar;
 
-		/* Copy udata back. */
-		if (ib_copy_to_udata(udata, &cq_resp, sizeof(cq_resp))) {
-			dev_warn(&dev->pdev->dev,
-				 "failed to copy back udata\n");
-			pvrdma_destroy_cq(&cq->ibcq, udata);
-			return -EINVAL;
-		}
+	/* Copy udata back. */
+	if (ib_copy_to_udata(udata, &cq_resp, sizeof(cq_resp))) {
+		dev_warn(&dev->pdev->dev,
+			 "failed to copy back udata\n");
+		pvrdma_destroy_cq(&cq->ibcq, udata);
+		return -EINVAL;
 	}
 
 	return 0;
 
 err_page_dir:
 	pvrdma_page_dir_cleanup(dev, &cq->pdir);
-err_umem:
-	ib_umem_release(cq->umem);
+err_cq:
+	atomic_dec(&dev->num_cqs);
+	return ret;
+}
+
+int pvrdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+		     struct uverbs_attr_bundle *attrs)
+{
+	struct ib_device *ibdev = ibcq->device;
+	int entries = attr->cqe;
+	struct pvrdma_dev *dev = to_vdev(ibdev);
+	struct pvrdma_cq *cq = to_vcq(ibcq);
+	int ret;
+	int npages;
+	unsigned long flags;
+	union pvrdma_cmd_req req;
+	union pvrdma_cmd_resp rsp;
+	struct pvrdma_cmd_create_cq *cmd = &req.create_cq;
+	struct pvrdma_cmd_create_cq_resp *resp = &rsp.create_cq_resp;
+
+	BUILD_BUG_ON(sizeof(struct pvrdma_cqe) != 64);
+
+	if (attr->flags)
+		return -EOPNOTSUPP;
+
+	if (attr->cqe > dev->dsr->caps.max_cqe)
+		return -EINVAL;
+	entries = roundup_pow_of_two(entries);
+
+	if (!atomic_add_unless(&dev->num_cqs, 1, dev->dsr->caps.max_cq))
+		return -ENOMEM;
+
+	cq->ibcq.cqe = entries;
+	cq->is_kernel = true;
+
+	/* One extra page for shared ring state */
+	npages = 1 + (entries * sizeof(struct pvrdma_cqe) +
+		      PAGE_SIZE - 1) / PAGE_SIZE;
+
+	/* Skip header page. */
+	cq->offset = PAGE_SIZE;
+
+	if (npages < 0 || npages > PVRDMA_PAGE_DIR_MAX_PAGES) {
+		dev_warn(&dev->pdev->dev,
+			 "overflow pages in completion queue\n");
+		ret = -EINVAL;
+		goto err_cq;
+	}
+
+	ret = pvrdma_page_dir_init(dev, &cq->pdir, npages, true);
+	if (ret) {
+		dev_warn(&dev->pdev->dev,
+			 "could not allocate page directory\n");
+		goto err_cq;
+	}
+
+	/* Ring state is always the first page. Set in library for user cq. */
+	cq->ring_state = cq->pdir.pages[0];
+
+	refcount_set(&cq->refcnt, 1);
+	init_completion(&cq->free);
+	spin_lock_init(&cq->cq_lock);
+
+	memset(cmd, 0, sizeof(*cmd));
+	cmd->hdr.cmd = PVRDMA_CMD_CREATE_CQ;
+	cmd->nchunks = npages;
+	cmd->ctx_handle = 0;
+	cmd->cqe = entries;
+	cmd->pdir_dma = cq->pdir.dir_dma;
+	ret = pvrdma_cmd_post(dev, &req, &rsp, PVRDMA_CMD_CREATE_CQ_RESP);
+	if (ret < 0) {
+		dev_warn(&dev->pdev->dev,
+			 "could not create completion queue, error: %d\n", ret);
+		goto err_page_dir;
+	}
+
+	cq->ibcq.cqe = resp->cqe;
+	cq->cq_handle = resp->cq_handle;
+	spin_lock_irqsave(&dev->cq_tbl_lock, flags);
+	dev->cq_tbl[cq->cq_handle % dev->dsr->caps.max_cq] = cq;
+	spin_unlock_irqrestore(&dev->cq_tbl_lock, flags);
+
+	return 0;
+
+err_page_dir:
+	pvrdma_page_dir_cleanup(dev, &cq->pdir);
 err_cq:
 	atomic_dec(&dev->num_cqs);
 	return ret;
@@ -229,8 +294,6 @@ static void pvrdma_free_cq(struct pvrdma_dev *dev, struct pvrdma_cq *cq)
 		complete(&cq->free);
 	wait_for_completion(&cq->free);
 
-	ib_umem_release(cq->umem);
-
 	pvrdma_page_dir_cleanup(dev, &cq->pdir);
 }
 
diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c
index 1664d1d7d969..3f5b94a1e517 100644
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c
@@ -194,6 +194,7 @@ static const struct ib_device_ops pvrdma_dev_ops = {
 	.alloc_ucontext = pvrdma_alloc_ucontext,
 	.create_ah = pvrdma_create_ah,
 	.create_cq = pvrdma_create_cq,
+	.create_user_cq = pvrdma_create_user_cq,
 	.create_qp = pvrdma_create_qp,
 	.dealloc_pd = pvrdma_dealloc_pd,
 	.dealloc_ucontext = pvrdma_dealloc_ucontext,
diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.h b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.h
index 603e5a9311eb..18910d336744 100644
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.h
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.h
@@ -375,6 +375,9 @@ int pvrdma_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
 		     int sg_nents, unsigned int *sg_offset);
 int pvrdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		     struct uverbs_attr_bundle *attrs);
+int pvrdma_create_user_cq(struct ib_cq *ibcq,
+			  const struct ib_cq_init_attr *attr,
+			  struct uverbs_attr_bundle *attrs);
 int pvrdma_destroy_cq(struct ib_cq *cq, struct ib_udata *udata);
 int pvrdma_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc);
 int pvrdma_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify_flags flags);

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 22/50] RDMA/ocrdma: Split user and kernel CQ creation paths
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (20 preceding siblings ...)
  2026-02-13 10:57 ` [PATCH rdma-next 21/50] RDMA/vmw_pvrdma: Provide a modern CQ creation interface Leon Romanovsky
@ 2026-02-13 10:57 ` Leon Romanovsky
  2026-02-13 10:57 ` [PATCH rdma-next 23/50] RDMA/irdma: " Leon Romanovsky
                   ` (29 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

Separate the CQ creation logic into distinct kernel and user flows.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/ocrdma/ocrdma_main.c  |  1 +
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 56 +++++++++++++++++++----------
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h |  3 ++
 3 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index 5d4b3bc16493..0d89c5ec9a7a 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -141,6 +141,7 @@ static const struct ib_device_ops ocrdma_dev_ops = {
 	.create_cq = ocrdma_create_cq,
 	.create_qp = ocrdma_create_qp,
 	.create_user_ah = ocrdma_create_ah,
+	.create_user_cq = ocrdma_create_user_cq,
 	.dealloc_pd = ocrdma_dealloc_pd,
 	.dealloc_ucontext = ocrdma_dealloc_ucontext,
 	.dereg_mr = ocrdma_dereg_mr,
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index bf9211d8d130..034d8b937a77 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -966,8 +966,9 @@ static int ocrdma_copy_cq_uresp(struct ocrdma_dev *dev, struct ocrdma_cq *cq,
 	return status;
 }
 
-int ocrdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
-		     struct uverbs_attr_bundle *attrs)
+int ocrdma_create_user_cq(struct ib_cq *ibcq,
+			  const struct ib_cq_init_attr *attr,
+			  struct uverbs_attr_bundle *attrs)
 {
 	struct ib_udata *udata = &attrs->driver_udata;
 	struct ib_device *ibdev = ibcq->device;
@@ -976,36 +977,29 @@ int ocrdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	struct ocrdma_dev *dev = get_ocrdma_dev(ibdev);
 	struct ocrdma_ucontext *uctx = rdma_udata_to_drv_context(
 		udata, struct ocrdma_ucontext, ibucontext);
-	u16 pd_id = 0;
 	int status;
 	struct ocrdma_create_cq_ureq ureq;
 
-	if (attr->flags)
+	if (attr->flags || ibcq->umem)
 		return -EOPNOTSUPP;
 
-	if (udata) {
-		if (ib_copy_from_udata(&ureq, udata, sizeof(ureq)))
-			return -EFAULT;
-	} else
-		ureq.dpp_cq = 0;
+	if (ib_copy_from_udata(&ureq, udata, sizeof(ureq)))
+		return -EFAULT;
 
 	spin_lock_init(&cq->cq_lock);
 	spin_lock_init(&cq->comp_handler_lock);
 	INIT_LIST_HEAD(&cq->sq_head);
 	INIT_LIST_HEAD(&cq->rq_head);
 
-	if (udata)
-		pd_id = uctx->cntxt_pd->id;
-
-	status = ocrdma_mbx_create_cq(dev, cq, entries, ureq.dpp_cq, pd_id);
+	status = ocrdma_mbx_create_cq(dev, cq, entries, ureq.dpp_cq,
+				      uctx->cntxt_pd->id);
 	if (status)
 		return status;
 
-	if (udata) {
-		status = ocrdma_copy_cq_uresp(dev, cq, udata);
-		if (status)
-			goto ctx_err;
-	}
+	status = ocrdma_copy_cq_uresp(dev, cq, udata);
+	if (status)
+		goto ctx_err;
+
 	cq->phase = OCRDMA_CQE_VALID;
 	dev->cq_tbl[cq->id] = cq;
 	return 0;
@@ -1015,6 +1009,32 @@ int ocrdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	return status;
 }
 
+int ocrdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+		     struct uverbs_attr_bundle *attrs)
+{
+	struct ib_device *ibdev = ibcq->device;
+	int entries = attr->cqe;
+	struct ocrdma_cq *cq = get_ocrdma_cq(ibcq);
+	struct ocrdma_dev *dev = get_ocrdma_dev(ibdev);
+	int status;
+
+	if (attr->flags)
+		return -EOPNOTSUPP;
+
+	spin_lock_init(&cq->cq_lock);
+	spin_lock_init(&cq->comp_handler_lock);
+	INIT_LIST_HEAD(&cq->sq_head);
+	INIT_LIST_HEAD(&cq->rq_head);
+
+	status = ocrdma_mbx_create_cq(dev, cq, entries, 0, 0);
+	if (status)
+		return status;
+
+	cq->phase = OCRDMA_CQE_VALID;
+	dev->cq_tbl[cq->id] = cq;
+	return 0;
+}
+
 int ocrdma_resize_cq(struct ib_cq *ibcq, int new_cnt,
 		     struct ib_udata *udata)
 {
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index 6c5c3755b8a9..4a572608fd9f 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -71,6 +71,9 @@ int ocrdma_dealloc_pd(struct ib_pd *pd, struct ib_udata *udata);
 
 int ocrdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		     struct uverbs_attr_bundle *attrs);
+int ocrdma_create_user_cq(struct ib_cq *ibcq,
+			  const struct ib_cq_init_attr *attr,
+			  struct uverbs_attr_bundle *attrs);
 int ocrdma_resize_cq(struct ib_cq *, int cqe, struct ib_udata *);
 int ocrdma_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata);
 

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 23/50] RDMA/irdma: Split user and kernel CQ creation paths
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (21 preceding siblings ...)
  2026-02-13 10:57 ` [PATCH rdma-next 22/50] RDMA/ocrdma: Split user and kernel CQ creation paths Leon Romanovsky
@ 2026-02-13 10:57 ` Leon Romanovsky
  2026-02-13 10:58 ` [PATCH rdma-next 24/50] RDMA/usnic: Provide a modern CQ creation interface Leon Romanovsky
                   ` (28 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

Separate the CQ creation logic into distinct kernel and user flows.
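
To make the pattern easy to follow across the series: after the split,
the ops table carries two dedicated entry points and ib_core invokes
exactly one of them per CQ, so neither path has to test for udata
anymore. An illustrative sketch (not part of the patch; the real
hook-up is the one-line ops change at the end of the diff):

	/* illustrative only: two dedicated entry points per driver */
	static const struct ib_device_ops example_dev_ops = {
		.create_cq      = irdma_create_cq,      /* kernel CQs */
		.create_user_cq = irdma_create_user_cq, /* uverbs CQs */
	};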

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/irdma/verbs.c | 310 +++++++++++++++++++++++-------------
 1 file changed, 195 insertions(+), 115 deletions(-)

diff --git a/drivers/infiniband/hw/irdma/verbs.c b/drivers/infiniband/hw/irdma/verbs.c
index cf8d19150574..f2b3cfe125af 100644
--- a/drivers/infiniband/hw/irdma/verbs.c
+++ b/drivers/infiniband/hw/irdma/verbs.c
@@ -2461,15 +2461,9 @@ static inline int cq_validate_flags(u32 flags, u8 hw_rev)
 	return flags & ~IB_UVERBS_CQ_FLAGS_TIMESTAMP_COMPLETION ? -EOPNOTSUPP : 0;
 }
 
-/**
- * irdma_create_cq - create cq
- * @ibcq: CQ allocated
- * @attr: attributes for cq
- * @attrs: uverbs attribute bundle
- */
-static int irdma_create_cq(struct ib_cq *ibcq,
-			   const struct ib_cq_init_attr *attr,
-			   struct uverbs_attr_bundle *attrs)
+static int irdma_create_user_cq(struct ib_cq *ibcq,
+				const struct ib_cq_init_attr *attr,
+				struct uverbs_attr_bundle *attrs)
 {
 #define IRDMA_CREATE_CQ_MIN_REQ_LEN offsetofend(struct irdma_create_cq_req, user_cq_buf)
 #define IRDMA_CREATE_CQ_MIN_RESP_LEN offsetofend(struct irdma_create_cq_resp, cq_size)
@@ -2489,14 +2483,22 @@ static int irdma_create_cq(struct ib_cq *ibcq,
 	int err_code;
 	int entries = attr->cqe;
 	bool cqe_64byte_ena;
-	u8 cqe_size;
+	struct irdma_ucontext *ucontext;
+	struct irdma_create_cq_req req = {};
+	struct irdma_cq_mr *cqmr;
+	struct irdma_pbl *iwpbl;
+	struct irdma_pbl *iwpbl_shadow;
+	struct irdma_cq_mr *cqmr_shadow;
+
+	if (ibcq->umem)
+		return -EOPNOTSUPP;
 
 	err_code = cq_validate_flags(attr->flags, dev->hw_attrs.uk_attrs.hw_rev);
 	if (err_code)
 		return err_code;
 
-	if (udata && (udata->inlen < IRDMA_CREATE_CQ_MIN_REQ_LEN ||
-		      udata->outlen < IRDMA_CREATE_CQ_MIN_RESP_LEN))
+	if (udata->inlen < IRDMA_CREATE_CQ_MIN_REQ_LEN ||
+	    udata->outlen < IRDMA_CREATE_CQ_MIN_RESP_LEN)
 		return -EINVAL;
 
 	err_code = irdma_alloc_rsrc(rf, rf->allocated_cqs, rf->max_cq, &cq_num,
@@ -2516,7 +2518,6 @@ static int irdma_create_cq(struct ib_cq *ibcq,
 	ukinfo->cq_id = cq_num;
 	cqe_64byte_ena = dev->hw_attrs.uk_attrs.feature_flags & IRDMA_FEATURE_64_BYTE_CQE ?
 			 true : false;
-	cqe_size = cqe_64byte_ena ? 64 : 32;
 	ukinfo->avoid_mem_cflct = cqe_64byte_ena;
 	iwcq->ibcq.cqe = info.cq_uk_init_info.cq_size;
 	if (attr->comp_vector < rf->ceqs_count)
@@ -2526,110 +2527,203 @@ static int irdma_create_cq(struct ib_cq *ibcq,
 	info.type = IRDMA_CQ_TYPE_IWARP;
 	info.vsi = &iwdev->vsi;
 
-	if (udata) {
-		struct irdma_ucontext *ucontext;
-		struct irdma_create_cq_req req = {};
-		struct irdma_cq_mr *cqmr;
-		struct irdma_pbl *iwpbl;
-		struct irdma_pbl *iwpbl_shadow;
-		struct irdma_cq_mr *cqmr_shadow;
-
-		iwcq->user_mode = true;
-		ucontext =
-			rdma_udata_to_drv_context(udata, struct irdma_ucontext,
-						  ibucontext);
-		if (ib_copy_from_udata(&req, udata,
-				       min(sizeof(req), udata->inlen))) {
-			err_code = -EFAULT;
-			goto cq_free_rsrc;
-		}
+	iwcq->user_mode = true;
+	ucontext =
+		rdma_udata_to_drv_context(udata, struct irdma_ucontext,
+					  ibucontext);
+	if (ib_copy_from_udata(&req, udata,
+			       min(sizeof(req), udata->inlen))) {
+		err_code = -EFAULT;
+		goto cq_free_rsrc;
+	}
 
+	spin_lock_irqsave(&ucontext->cq_reg_mem_list_lock, flags);
+	iwpbl = irdma_get_pbl((unsigned long)req.user_cq_buf,
+			      &ucontext->cq_reg_mem_list);
+	spin_unlock_irqrestore(&ucontext->cq_reg_mem_list_lock, flags);
+	if (!iwpbl) {
+		err_code = -EPROTO;
+		goto cq_free_rsrc;
+	}
+
+	cqmr = &iwpbl->cq_mr;
+
+	if (rf->sc_dev.hw_attrs.uk_attrs.feature_flags &
+	    IRDMA_FEATURE_CQ_RESIZE && !ucontext->legacy_mode) {
 		spin_lock_irqsave(&ucontext->cq_reg_mem_list_lock, flags);
-		iwpbl = irdma_get_pbl((unsigned long)req.user_cq_buf,
-				      &ucontext->cq_reg_mem_list);
+		iwpbl_shadow = irdma_get_pbl(
+				(unsigned long)req.user_shadow_area,
+				&ucontext->cq_reg_mem_list);
 		spin_unlock_irqrestore(&ucontext->cq_reg_mem_list_lock, flags);
-		if (!iwpbl) {
+
+		if (!iwpbl_shadow) {
 			err_code = -EPROTO;
 			goto cq_free_rsrc;
 		}
+		cqmr_shadow = &iwpbl_shadow->cq_mr;
+		info.shadow_area_pa = cqmr_shadow->cq_pbl.addr;
+		cqmr->split = true;
+	} else {
+		info.shadow_area_pa = cqmr->shadow;
+	}
+	if (iwpbl->pbl_allocated) {
+		info.virtual_map = true;
+		info.pbl_chunk_size = 1;
+		info.first_pm_pbl_idx = cqmr->cq_pbl.idx;
+	} else {
+		info.cq_base_pa = cqmr->cq_pbl.addr;
+	}
 
-		cqmr = &iwpbl->cq_mr;
+	info.shadow_read_threshold = min(info.cq_uk_init_info.cq_size / 2,
+					 (u32)IRDMA_MAX_CQ_READ_THRESH);
 
-		if (rf->sc_dev.hw_attrs.uk_attrs.feature_flags &
-		    IRDMA_FEATURE_CQ_RESIZE && !ucontext->legacy_mode) {
-			spin_lock_irqsave(&ucontext->cq_reg_mem_list_lock, flags);
-			iwpbl_shadow = irdma_get_pbl(
-					(unsigned long)req.user_shadow_area,
-					&ucontext->cq_reg_mem_list);
-			spin_unlock_irqrestore(&ucontext->cq_reg_mem_list_lock, flags);
+	if (irdma_sc_cq_init(cq, &info)) {
+		ibdev_dbg(&iwdev->ibdev, "VERBS: init cq fail\n");
+		err_code = -EPROTO;
+		goto cq_free_rsrc;
+	}
 
-			if (!iwpbl_shadow) {
-				err_code = -EPROTO;
-				goto cq_free_rsrc;
-			}
-			cqmr_shadow = &iwpbl_shadow->cq_mr;
-			info.shadow_area_pa = cqmr_shadow->cq_pbl.addr;
-			cqmr->split = true;
-		} else {
-			info.shadow_area_pa = cqmr->shadow;
-		}
-		if (iwpbl->pbl_allocated) {
-			info.virtual_map = true;
-			info.pbl_chunk_size = 1;
-			info.first_pm_pbl_idx = cqmr->cq_pbl.idx;
-		} else {
-			info.cq_base_pa = cqmr->cq_pbl.addr;
-		}
-	} else {
-		/* Kmode allocations */
-		int rsize;
+	cqp_request = irdma_alloc_and_get_cqp_request(&rf->cqp, true);
+	if (!cqp_request) {
+		err_code = -ENOMEM;
+		goto cq_free_rsrc;
+	}
 
-		if (entries < 1 || entries > rf->max_cqe) {
-			err_code = -EINVAL;
-			goto cq_free_rsrc;
-		}
+	cqp_info = &cqp_request->info;
+	cqp_info->cqp_cmd = IRDMA_OP_CQ_CREATE;
+	cqp_info->post_sq = 1;
+	cqp_info->in.u.cq_create.cq = cq;
+	cqp_info->in.u.cq_create.check_overflow = true;
+	cqp_info->in.u.cq_create.scratch = (uintptr_t)cqp_request;
+	err_code = irdma_handle_cqp_op(rf, cqp_request);
+	irdma_put_cqp_request(&rf->cqp, cqp_request);
+	if (err_code)
+		goto cq_free_rsrc;
 
-		entries += 2;
-		if (!cqe_64byte_ena && dev->hw_attrs.uk_attrs.hw_rev >= IRDMA_GEN_2)
-			entries *= 2;
+	struct irdma_create_cq_resp resp = {};
 
-		if (entries & 1)
-			entries += 1; /* cq size must be an even number */
+	resp.cq_id = info.cq_uk_init_info.cq_id;
+	resp.cq_size = info.cq_uk_init_info.cq_size;
+	if (ib_copy_to_udata(udata, &resp,
+			     min(sizeof(resp), udata->outlen))) {
+		ibdev_dbg(&iwdev->ibdev,
+			  "VERBS: copy to user data\n");
+		err_code = -EPROTO;
+		goto cq_destroy;
+	}
 
-		if (entries * cqe_size == IRDMA_HW_PAGE_SIZE)
-			entries += 2;
+	init_completion(&iwcq->free_cq);
 
-		ukinfo->cq_size = entries;
+	/* Populate table entry after CQ is fully created. */
+	smp_store_release(&rf->cq_table[cq_num], iwcq);
 
-		if (cqe_64byte_ena)
-			rsize = info.cq_uk_init_info.cq_size * sizeof(struct irdma_extended_cqe);
-		else
-			rsize = info.cq_uk_init_info.cq_size * sizeof(struct irdma_cqe);
-		iwcq->kmem.size = ALIGN(round_up(rsize, 256), 256);
-		iwcq->kmem.va = dma_alloc_coherent(dev->hw->device,
-						   iwcq->kmem.size,
-						   &iwcq->kmem.pa, GFP_KERNEL);
-		if (!iwcq->kmem.va) {
-			err_code = -ENOMEM;
-			goto cq_free_rsrc;
-		}
+	return 0;
+cq_destroy:
+	irdma_cq_wq_destroy(rf, cq);
+cq_free_rsrc:
+	irdma_cq_free_rsrc(rf, iwcq);
 
-		iwcq->kmem_shadow.size = ALIGN(IRDMA_SHADOW_AREA_SIZE << 3,
-					       64);
-		iwcq->kmem_shadow.va = dma_alloc_coherent(dev->hw->device,
-							  iwcq->kmem_shadow.size,
-							  &iwcq->kmem_shadow.pa,
-							  GFP_KERNEL);
-		if (!iwcq->kmem_shadow.va) {
-			err_code = -ENOMEM;
-			goto cq_free_rsrc;
-		}
-		info.shadow_area_pa = iwcq->kmem_shadow.pa;
-		ukinfo->shadow_area = iwcq->kmem_shadow.va;
-		ukinfo->cq_base = iwcq->kmem.va;
-		info.cq_base_pa = iwcq->kmem.pa;
+	return err_code;
+}
+
+static int irdma_create_cq(struct ib_cq *ibcq,
+			   const struct ib_cq_init_attr *attr,
+			   struct uverbs_attr_bundle *attrs)
+{
+	struct ib_device *ibdev = ibcq->device;
+	struct irdma_device *iwdev = to_iwdev(ibdev);
+	struct irdma_pci_f *rf = iwdev->rf;
+	struct irdma_cq *iwcq = to_iwcq(ibcq);
+	u32 cq_num = 0;
+	struct irdma_sc_cq *cq;
+	struct irdma_sc_dev *dev = &rf->sc_dev;
+	struct irdma_cq_init_info info = {};
+	struct irdma_cqp_request *cqp_request;
+	struct cqp_cmds_info *cqp_info;
+	struct irdma_cq_uk_init_info *ukinfo = &info.cq_uk_init_info;
+	int err_code;
+	int entries = attr->cqe;
+	bool cqe_64byte_ena;
+	u8 cqe_size;
+	int rsize;
+
+	err_code = cq_validate_flags(attr->flags, dev->hw_attrs.uk_attrs.hw_rev);
+	if (err_code)
+		return err_code;
+
+	err_code = irdma_alloc_rsrc(rf, rf->allocated_cqs, rf->max_cq, &cq_num,
+				    &rf->next_cq);
+	if (err_code)
+		return err_code;
+
+	cq = &iwcq->sc_cq;
+	cq->back_cq = iwcq;
+	refcount_set(&iwcq->refcnt, 1);
+	spin_lock_init(&iwcq->lock);
+	INIT_LIST_HEAD(&iwcq->resize_list);
+	INIT_LIST_HEAD(&iwcq->cmpl_generated);
+	iwcq->cq_num = cq_num;
+	info.dev = dev;
+	ukinfo->cq_size = max(entries, 4);
+	ukinfo->cq_id = cq_num;
+	cqe_64byte_ena = dev->hw_attrs.uk_attrs.feature_flags & IRDMA_FEATURE_64_BYTE_CQE ?
+			 true : false;
+	cqe_size = cqe_64byte_ena ? 64 : 32;
+	ukinfo->avoid_mem_cflct = cqe_64byte_ena;
+	iwcq->ibcq.cqe = info.cq_uk_init_info.cq_size;
+	if (attr->comp_vector < rf->ceqs_count)
+		info.ceq_id = attr->comp_vector;
+	info.ceq_id_valid = true;
+	info.ceqe_mask = 1;
+	info.type = IRDMA_CQ_TYPE_IWARP;
+	info.vsi = &iwdev->vsi;
+
+	/* Kmode allocations */
+	if (entries < 1 || entries > rf->max_cqe) {
+		err_code = -EINVAL;
+		goto cq_free_rsrc;
 	}
 
+	entries += 2;
+	if (!cqe_64byte_ena && dev->hw_attrs.uk_attrs.hw_rev >= IRDMA_GEN_2)
+		entries *= 2;
+
+	if (entries & 1)
+		entries += 1; /* cq size must be an even number */
+
+	if (entries * cqe_size == IRDMA_HW_PAGE_SIZE)
+		entries += 2;
+
+	ukinfo->cq_size = entries;
+
+	if (cqe_64byte_ena)
+		rsize = info.cq_uk_init_info.cq_size * sizeof(struct irdma_extended_cqe);
+	else
+		rsize = info.cq_uk_init_info.cq_size * sizeof(struct irdma_cqe);
+	iwcq->kmem.size = ALIGN(round_up(rsize, 256), 256);
+	iwcq->kmem.va = dma_alloc_coherent(dev->hw->device,
+					   iwcq->kmem.size,
+					   &iwcq->kmem.pa, GFP_KERNEL);
+	if (!iwcq->kmem.va) {
+		err_code = -ENOMEM;
+		goto cq_free_rsrc;
+	}
+
+	iwcq->kmem_shadow.size = ALIGN(IRDMA_SHADOW_AREA_SIZE << 3,
+				       64);
+	iwcq->kmem_shadow.va = dma_alloc_coherent(dev->hw->device,
+						  iwcq->kmem_shadow.size,
+						  &iwcq->kmem_shadow.pa,
+						  GFP_KERNEL);
+	if (!iwcq->kmem_shadow.va) {
+		err_code = -ENOMEM;
+		goto cq_free_rsrc;
+	}
+	info.shadow_area_pa = iwcq->kmem_shadow.pa;
+	ukinfo->shadow_area = iwcq->kmem_shadow.va;
+	ukinfo->cq_base = iwcq->kmem.va;
+	info.cq_base_pa = iwcq->kmem.pa;
+
 	info.shadow_read_threshold = min(info.cq_uk_init_info.cq_size / 2,
 					 (u32)IRDMA_MAX_CQ_READ_THRESH);
 
@@ -2656,28 +2750,13 @@ static int irdma_create_cq(struct ib_cq *ibcq,
 	if (err_code)
 		goto cq_free_rsrc;
 
-	if (udata) {
-		struct irdma_create_cq_resp resp = {};
-
-		resp.cq_id = info.cq_uk_init_info.cq_id;
-		resp.cq_size = info.cq_uk_init_info.cq_size;
-		if (ib_copy_to_udata(udata, &resp,
-				     min(sizeof(resp), udata->outlen))) {
-			ibdev_dbg(&iwdev->ibdev,
-				  "VERBS: copy to user data\n");
-			err_code = -EPROTO;
-			goto cq_destroy;
-		}
-	}
-
 	init_completion(&iwcq->free_cq);
 
 	/* Populate table entry after CQ is fully created. */
 	smp_store_release(&rf->cq_table[cq_num], iwcq);
 
 	return 0;
-cq_destroy:
-	irdma_cq_wq_destroy(rf, cq);
+
 cq_free_rsrc:
 	irdma_cq_free_rsrc(rf, iwcq);
 
@@ -5355,6 +5434,7 @@ static const struct ib_device_ops irdma_dev_ops = {
 	.alloc_pd = irdma_alloc_pd,
 	.alloc_ucontext = irdma_alloc_ucontext,
 	.create_cq = irdma_create_cq,
+	.create_user_cq = irdma_create_user_cq,
 	.create_qp = irdma_create_qp,
 	.dealloc_driver = irdma_ib_dealloc_device,
 	.dealloc_mw = irdma_dealloc_mw,

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 24/50] RDMA/usnic: Provide a modern CQ creation interface
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (22 preceding siblings ...)
  2026-02-13 10:57 ` [PATCH rdma-next 23/50] RDMA/irdma: " Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-13 10:58 ` [PATCH rdma-next 25/50] RDMA/mana: " Leon Romanovsky
                   ` (27 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

usnic doesn't support kernel verbs, so it should expose only the
.create_user_cq() callback.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/usnic/usnic_ib_main.c  | 2 +-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 6 +++---
 drivers/infiniband/hw/usnic/usnic_ib_verbs.h | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/usnic/usnic_ib_main.c b/drivers/infiniband/hw/usnic/usnic_ib_main.c
index 11eca39b73a9..8a3b641d6059 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_main.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_main.c
@@ -356,7 +356,7 @@ static const struct ib_device_ops usnic_dev_ops = {
 
 	.alloc_pd = usnic_ib_alloc_pd,
 	.alloc_ucontext = usnic_ib_alloc_ucontext,
-	.create_cq = usnic_ib_create_cq,
+	.create_user_cq = usnic_ib_create_user_cq,
 	.create_qp = usnic_ib_create_qp,
 	.dealloc_pd = usnic_ib_dealloc_pd,
 	.dealloc_ucontext = usnic_ib_dealloc_ucontext,
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
index ae5df96589d9..2b41ded14a65 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
@@ -576,10 +576,10 @@ int usnic_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 	return status;
 }
 
-int usnic_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
-		       struct uverbs_attr_bundle *attrs)
+int usnic_ib_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+			    struct uverbs_attr_bundle *attrs)
 {
-	if (attr->flags)
+	if (attr->flags || ibcq->umem)
 		return -EOPNOTSUPP;
 
 	return 0;
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
index e3031ac32488..15882110a5d5 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
@@ -55,8 +55,8 @@ int usnic_ib_create_qp(struct ib_qp *qp, struct ib_qp_init_attr *init_attr,
 int usnic_ib_destroy_qp(struct ib_qp *qp, struct ib_udata *udata);
 int usnic_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 				int attr_mask, struct ib_udata *udata);
-int usnic_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
-		       struct uverbs_attr_bundle *attrs);
+int usnic_ib_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+			    struct uverbs_attr_bundle *attrs);
 int usnic_ib_destroy_cq(struct ib_cq *cq, struct ib_udata *udata);
 struct ib_mr *usnic_ib_reg_mr(struct ib_pd *pd, u64 start, u64 length,
 				u64 virt_addr, int access_flags,

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 25/50] RDMA/mana: Provide a modern CQ creation interface
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (23 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 24/50] RDMA/usnic: Provide a modern CQ creation interface Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-24 22:30   ` [EXTERNAL] " Long Li
  2026-02-13 10:58 ` [PATCH rdma-next 26/50] RDMA/erdma: Separate user and kernel CQ creation paths Leon Romanovsky
                   ` (26 subsequent siblings)
  51 siblings, 1 reply; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

The uverbs CQ creation UAPI allows users to supply their own umem for a CQ.
Update mana to support this workflow while preserving support for creating
umem through the legacy interface.
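
The heart of the conversion, condensed from the hunk below (error
unwinding trimmed): reuse a umem that ib_core already attached to the
CQ, and fall back to the legacy ib_umem_get() pin only when none was
supplied.

	if (!ibcq->umem)	/* legacy path: pin the user buffer */
		ibcq->umem = ib_umem_get(ibdev, ucmd.buf_addr,
					 cq->cqe * COMP_ENTRY_SIZE,
					 IB_ACCESS_LOCAL_WRITE);
	if (IS_ERR(ibcq->umem))
		return PTR_ERR(ibcq->umem);
	cq->queue.umem = ibcq->umem;	/* queue borrows the CQ umem */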

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mana/cq.c      | 128 +++++++++++++++++++++++------------
 drivers/infiniband/hw/mana/device.c  |   1 +
 drivers/infiniband/hw/mana/main.c    |  25 +++----
 drivers/infiniband/hw/mana/mana_ib.h |   4 +-
 drivers/infiniband/hw/mana/qp.c      |  42 ++++++++++--
 drivers/infiniband/hw/mana/wq.c      |  14 +++-
 6 files changed, 147 insertions(+), 67 deletions(-)

diff --git a/drivers/infiniband/hw/mana/cq.c b/drivers/infiniband/hw/mana/cq.c
index 2dce1b677115..605122ecf9f9 100644
--- a/drivers/infiniband/hw/mana/cq.c
+++ b/drivers/infiniband/hw/mana/cq.c
@@ -5,8 +5,8 @@
 
 #include "mana_ib.h"
 
-int mana_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
-		      struct uverbs_attr_bundle *attrs)
+int mana_ib_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+			   struct uverbs_attr_bundle *attrs)
 {
 	struct ib_udata *udata = &attrs->driver_udata;
 	struct mana_ib_cq *cq = container_of(ibcq, struct mana_ib_cq, ibcq);
@@ -17,7 +17,6 @@ int mana_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	struct mana_ib_dev *mdev;
 	bool is_rnic_cq;
 	u32 doorbell;
-	u32 buf_size;
 	int err;
 
 	mdev = container_of(ibdev, struct mana_ib_dev, ib_dev);
@@ -26,44 +25,100 @@ int mana_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	cq->cq_handle = INVALID_MANA_HANDLE;
 	is_rnic_cq = mana_ib_is_rnic(mdev);
 
-	if (udata) {
-		if (udata->inlen < offsetof(struct mana_ib_create_cq, flags))
-			return -EINVAL;
+	if (udata->inlen < offsetof(struct mana_ib_create_cq, flags))
+		return -EINVAL;
 
-		err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen));
-		if (err) {
-			ibdev_dbg(ibdev, "Failed to copy from udata for create cq, %d\n", err);
-			return err;
-		}
+	err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen));
+	if (err) {
+		ibdev_dbg(ibdev, "Failed to copy from udata for create cq, %d\n", err);
+		return err;
+	}
 
-		if ((!is_rnic_cq && attr->cqe > mdev->adapter_caps.max_qp_wr) ||
-		    attr->cqe > U32_MAX / COMP_ENTRY_SIZE) {
-			ibdev_dbg(ibdev, "CQE %d exceeding limit\n", attr->cqe);
-			return -EINVAL;
-		}
+	if ((!is_rnic_cq && attr->cqe > mdev->adapter_caps.max_qp_wr) ||
+	    attr->cqe > U32_MAX / COMP_ENTRY_SIZE) {
+		ibdev_dbg(ibdev, "CQE %d exceeding limit\n", attr->cqe);
+		return -EINVAL;
+	}
+
+	cq->cqe = attr->cqe;
+	if (!ibcq->umem)
+		ibcq->umem = ib_umem_get(ibdev, ucmd.buf_addr,
+				     cq->cqe * COMP_ENTRY_SIZE,
+				     IB_ACCESS_LOCAL_WRITE);
+	if (IS_ERR(ibcq->umem))
+		return PTR_ERR(ibcq->umem);
+	cq->queue.umem = ibcq->umem;
+
+	err = mana_ib_create_queue(mdev, &cq->queue);
+	if (err)
+		return err;
 
-		cq->cqe = attr->cqe;
-		err = mana_ib_create_queue(mdev, ucmd.buf_addr, cq->cqe * COMP_ENTRY_SIZE,
-					   &cq->queue);
+	mana_ucontext = rdma_udata_to_drv_context(udata, struct mana_ib_ucontext,
+						  ibucontext);
+	doorbell = mana_ucontext->doorbell;
+
+	if (is_rnic_cq) {
+		err = mana_ib_gd_create_cq(mdev, cq, doorbell);
 		if (err) {
-			ibdev_dbg(ibdev, "Failed to create queue for create cq, %d\n", err);
-			return err;
+			ibdev_dbg(ibdev, "Failed to create RNIC cq, %d\n", err);
+			goto err_destroy_queue;
 		}
 
-		mana_ucontext = rdma_udata_to_drv_context(udata, struct mana_ib_ucontext,
-							  ibucontext);
-		doorbell = mana_ucontext->doorbell;
-	} else {
-		buf_size = MANA_PAGE_ALIGN(roundup_pow_of_two(attr->cqe * COMP_ENTRY_SIZE));
-		cq->cqe = buf_size / COMP_ENTRY_SIZE;
-		err = mana_ib_create_kernel_queue(mdev, buf_size, GDMA_CQ, &cq->queue);
+		err = mana_ib_install_cq_cb(mdev, cq);
 		if (err) {
-			ibdev_dbg(ibdev, "Failed to create kernel queue for create cq, %d\n", err);
-			return err;
+			ibdev_dbg(ibdev, "Failed to install cq callback, %d\n", err);
+			goto err_destroy_rnic_cq;
 		}
-		doorbell = mdev->gdma_dev->doorbell;
 	}
 
+	resp.cqid = cq->queue.id;
+	err = ib_copy_to_udata(udata, &resp, min(sizeof(resp), udata->outlen));
+	if (err) {
+		ibdev_dbg(&mdev->ib_dev, "Failed to copy to udata, %d\n", err);
+		goto err_remove_cq_cb;
+	}
+
+	spin_lock_init(&cq->cq_lock);
+	INIT_LIST_HEAD(&cq->list_send_qp);
+	INIT_LIST_HEAD(&cq->list_recv_qp);
+
+	return 0;
+
+err_remove_cq_cb:
+	mana_ib_remove_cq_cb(mdev, cq);
+err_destroy_rnic_cq:
+	mana_ib_gd_destroy_cq(mdev, cq);
+err_destroy_queue:
+	mana_ib_destroy_queue(mdev, &cq->queue);
+	return err;
+}
+
+int mana_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+		      struct uverbs_attr_bundle *attrs)
+{
+	struct mana_ib_cq *cq = container_of(ibcq, struct mana_ib_cq, ibcq);
+	struct ib_device *ibdev = ibcq->device;
+	struct mana_ib_dev *mdev;
+	bool is_rnic_cq;
+	u32 doorbell;
+	u32 buf_size;
+	int err;
+
+	mdev = container_of(ibdev, struct mana_ib_dev, ib_dev);
+
+	cq->comp_vector = attr->comp_vector % ibdev->num_comp_vectors;
+	cq->cq_handle = INVALID_MANA_HANDLE;
+	is_rnic_cq = mana_ib_is_rnic(mdev);
+
+	buf_size = MANA_PAGE_ALIGN(roundup_pow_of_two(attr->cqe * COMP_ENTRY_SIZE));
+	cq->cqe = buf_size / COMP_ENTRY_SIZE;
+	err = mana_ib_create_kernel_queue(mdev, buf_size, GDMA_CQ, &cq->queue);
+	if (err) {
+		ibdev_dbg(ibdev, "Failed to create kernel queue for create cq, %d\n", err);
+		return err;
+	}
+	doorbell = mdev->gdma_dev->doorbell;
+
 	if (is_rnic_cq) {
 		err = mana_ib_gd_create_cq(mdev, cq, doorbell);
 		if (err) {
@@ -78,23 +133,12 @@ int mana_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		}
 	}
 
-	if (udata) {
-		resp.cqid = cq->queue.id;
-		err = ib_copy_to_udata(udata, &resp, min(sizeof(resp), udata->outlen));
-		if (err) {
-			ibdev_dbg(&mdev->ib_dev, "Failed to copy to udata, %d\n", err);
-			goto err_remove_cq_cb;
-		}
-	}
-
 	spin_lock_init(&cq->cq_lock);
 	INIT_LIST_HEAD(&cq->list_send_qp);
 	INIT_LIST_HEAD(&cq->list_recv_qp);
 
 	return 0;
 
-err_remove_cq_cb:
-	mana_ib_remove_cq_cb(mdev, cq);
 err_destroy_rnic_cq:
 	mana_ib_gd_destroy_cq(mdev, cq);
 err_destroy_queue:
diff --git a/drivers/infiniband/hw/mana/device.c b/drivers/infiniband/hw/mana/device.c
index ccc2279ca63c..c5c5fe051424 100644
--- a/drivers/infiniband/hw/mana/device.c
+++ b/drivers/infiniband/hw/mana/device.c
@@ -21,6 +21,7 @@ static const struct ib_device_ops mana_ib_dev_ops = {
 	.alloc_ucontext = mana_ib_alloc_ucontext,
 	.create_ah = mana_ib_create_ah,
 	.create_cq = mana_ib_create_cq,
+	.create_user_cq = mana_ib_create_user_cq,
 	.create_qp = mana_ib_create_qp,
 	.create_rwq_ind_table = mana_ib_create_rwq_ind_table,
 	.create_wq = mana_ib_create_wq,
diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
index fac159f7128d..a871b8287dc9 100644
--- a/drivers/infiniband/hw/mana/main.c
+++ b/drivers/infiniband/hw/mana/main.c
@@ -261,35 +261,26 @@ int mana_ib_create_kernel_queue(struct mana_ib_dev *mdev, u32 size, enum gdma_qu
 	return 0;
 }
 
-int mana_ib_create_queue(struct mana_ib_dev *mdev, u64 addr, u32 size,
+int mana_ib_create_queue(struct mana_ib_dev *mdev,
 			 struct mana_ib_queue *queue)
 {
-	struct ib_umem *umem;
 	int err;
 
-	queue->umem = NULL;
 	queue->id = INVALID_QUEUE_ID;
 	queue->gdma_region = GDMA_INVALID_DMA_REGION;
 
-	umem = ib_umem_get(&mdev->ib_dev, addr, size, IB_ACCESS_LOCAL_WRITE);
-	if (IS_ERR(umem)) {
-		ibdev_dbg(&mdev->ib_dev, "Failed to get umem, %pe\n", umem);
-		return PTR_ERR(umem);
-	}
-
-	err = mana_ib_create_zero_offset_dma_region(mdev, umem, &queue->gdma_region);
+	err = mana_ib_create_zero_offset_dma_region(mdev, queue->umem,
+						    &queue->gdma_region);
 	if (err) {
-		ibdev_dbg(&mdev->ib_dev, "Failed to create dma region, %d\n", err);
-		goto free_umem;
+		ibdev_dbg(&mdev->ib_dev, "Failed to create dma region, %d\n",
+			  err);
+		return err;
 	}
-	queue->umem = umem;
 
-	ibdev_dbg(&mdev->ib_dev, "created dma region 0x%llx\n", queue->gdma_region);
+	ibdev_dbg(&mdev->ib_dev, "created dma region 0x%llx\n",
+		  queue->gdma_region);
 
 	return 0;
-free_umem:
-	ib_umem_release(umem);
-	return err;
 }
 
 void mana_ib_destroy_queue(struct mana_ib_dev *mdev, struct mana_ib_queue *queue)
diff --git a/drivers/infiniband/hw/mana/mana_ib.h b/drivers/infiniband/hw/mana/mana_ib.h
index a7c8c0fd7019..3bc7c88dc136 100644
--- a/drivers/infiniband/hw/mana/mana_ib.h
+++ b/drivers/infiniband/hw/mana/mana_ib.h
@@ -624,7 +624,7 @@ int mana_ib_gd_destroy_dma_region(struct mana_ib_dev *dev,
 
 int mana_ib_create_kernel_queue(struct mana_ib_dev *mdev, u32 size, enum gdma_queue_type type,
 				struct mana_ib_queue *queue);
-int mana_ib_create_queue(struct mana_ib_dev *mdev, u64 addr, u32 size,
+int mana_ib_create_queue(struct mana_ib_dev *mdev,
 			 struct mana_ib_queue *queue);
 void mana_ib_destroy_queue(struct mana_ib_dev *mdev, struct mana_ib_queue *queue);
 
@@ -667,6 +667,8 @@ void mana_ib_uncfg_vport(struct mana_ib_dev *dev, struct mana_ib_pd *pd,
 
 int mana_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		      struct uverbs_attr_bundle *attrs);
+int mana_ib_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+			   struct uverbs_attr_bundle *attrs);
 
 int mana_ib_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata);
 
diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
index 48c1f4977f21..b08dbc675741 100644
--- a/drivers/infiniband/hw/mana/qp.c
+++ b/drivers/infiniband/hw/mana/qp.c
@@ -326,11 +326,20 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp, struct ib_pd *ibpd,
 	ibdev_dbg(&mdev->ib_dev, "ucmd sq_buf_addr 0x%llx port %u\n",
 		  ucmd.sq_buf_addr, ucmd.port);
 
-	err = mana_ib_create_queue(mdev, ucmd.sq_buf_addr, ucmd.sq_buf_size, &qp->raw_sq);
+	qp->raw_sq.umem = ib_umem_get(&mdev->ib_dev, ucmd.sq_buf_addr,
+				      ucmd.sq_buf_size, IB_ACCESS_LOCAL_WRITE);
+	if (IS_ERR(qp->raw_sq.umem)) {
+		err = PTR_ERR(qp->raw_sq.umem);
+		ibdev_dbg(&mdev->ib_dev,
+			  "Failed to get umem for qp-raw, err %d\n", err);
+		goto err_free_vport;
+	}
+
+	err = mana_ib_create_queue(mdev, &qp->raw_sq);
 	if (err) {
 		ibdev_dbg(&mdev->ib_dev,
 			  "Failed to create queue for create qp-raw, err %d\n", err);
-		goto err_free_vport;
+		goto err_release_umem;
 	}
 
 	/* Create a WQ on the same port handle used by the Ethernet */
@@ -391,6 +400,10 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp, struct ib_pd *ibpd,
 
 err_destroy_queue:
 	mana_ib_destroy_queue(mdev, &qp->raw_sq);
+	return err;
+
+err_release_umem:
+	ib_umem_release(qp->raw_sq.umem);
 
 err_free_vport:
 	mana_ib_uncfg_vport(mdev, pd, port);
@@ -553,13 +566,25 @@ static int mana_ib_create_rc_qp(struct ib_qp *ibqp, struct ib_pd *ibpd,
 		if (i == MANA_RC_SEND_QUEUE_FMR) {
 			qp->rc_qp.queues[i].id = INVALID_QUEUE_ID;
 			qp->rc_qp.queues[i].gdma_region = GDMA_INVALID_DMA_REGION;
+			qp->rc_qp.queues[i].umem = NULL;
 			continue;
 		}
-		err = mana_ib_create_queue(mdev, ucmd.queue_buf[j], ucmd.queue_size[j],
-					   &qp->rc_qp.queues[i]);
+		qp->rc_qp.queues[i].umem = ib_umem_get(&mdev->ib_dev,
+						       ucmd.queue_buf[j],
+						       ucmd.queue_size[j],
+						       IB_ACCESS_LOCAL_WRITE);
+		if (IS_ERR(qp->rc_qp.queues[i].umem)) {
+			err = PTR_ERR(qp->rc_qp.queues[i].umem);
+			ibdev_err(&mdev->ib_dev, "Failed to get umem for queue %d, err %d\n",
+				  i, err);
+			goto release_umems;
+		}
+
+		err = mana_ib_create_queue(mdev, &qp->rc_qp.queues[i]);
 		if (err) {
 			ibdev_err(&mdev->ib_dev, "Failed to create queue %d, err %d\n", i, err);
-			goto destroy_queues;
+			ib_umem_release(qp->rc_qp.queues[i].umem);
+			goto release_umems;
 		}
 		j++;
 	}
@@ -598,6 +623,13 @@ static int mana_ib_create_rc_qp(struct ib_qp *ibqp, struct ib_pd *ibpd,
 	while (i-- > 0)
 		mana_ib_destroy_queue(mdev, &qp->rc_qp.queues[i]);
 	return err;
+
+release_umems:
+	while (i-- > 0) {
+		if (i != MANA_RC_SEND_QUEUE_FMR)
+			ib_umem_release(qp->rc_qp.queues[i].umem);
+	}
+	return err;
 }
 
 static void mana_add_qp_to_cqs(struct mana_ib_qp *qp)
diff --git a/drivers/infiniband/hw/mana/wq.c b/drivers/infiniband/hw/mana/wq.c
index f959f4b9244f..be474aa8bdfc 100644
--- a/drivers/infiniband/hw/mana/wq.c
+++ b/drivers/infiniband/hw/mana/wq.c
@@ -31,11 +31,19 @@ struct ib_wq *mana_ib_create_wq(struct ib_pd *pd,
 
 	ibdev_dbg(&mdev->ib_dev, "ucmd wq_buf_addr 0x%llx\n", ucmd.wq_buf_addr);
 
-	err = mana_ib_create_queue(mdev, ucmd.wq_buf_addr, ucmd.wq_buf_size, &wq->queue);
+	wq->queue.umem = ib_umem_get(&mdev->ib_dev, ucmd.wq_buf_addr,
+				     ucmd.wq_buf_size, IB_ACCESS_LOCAL_WRITE);
+	if (IS_ERR(wq->queue.umem)) {
+		err = PTR_ERR(wq->queue.umem);
+		ibdev_dbg(&mdev->ib_dev, "Failed to get umem for create wq, %d\n", err);
+		goto err_free_wq;
+	}
+
+	err = mana_ib_create_queue(mdev, &wq->queue);
 	if (err) {
 		ibdev_dbg(&mdev->ib_dev,
 			  "Failed to create queue for create wq, %d\n", err);
-		goto err_free_wq;
+		goto err_release_umem;
 	}
 
 	wq->wqe = init_attr->max_wr;
@@ -43,6 +51,8 @@ struct ib_wq *mana_ib_create_wq(struct ib_pd *pd,
 	wq->rx_object = INVALID_MANA_HANDLE;
 	return &wq->ibwq;
 
+err_release_umem:
+	ib_umem_release(wq->queue.umem);
 err_free_wq:
 	kfree(wq);
 

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 26/50] RDMA/erdma: Separate user and kernel CQ creation paths
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (24 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 25/50] RDMA/mana: " Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-24  2:20   ` Cheng Xu
  2026-02-26  6:17   ` Junxian Huang
  2026-02-13 10:58 ` [PATCH rdma-next 27/50] RDMA/rdmavt: Split " Leon Romanovsky
                   ` (25 subsequent siblings)
  51 siblings, 2 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

Split CQ creation into distinct kernel and user flows. The hns driver,
inherited from mlx4, uses a problematic pattern that shares and caches
umem in hns_roce_db_map_user(). This design blocks the driver from
supporting generic umem sources (VMA, dmabuf, memfd, and others).

In addition, delete the counter that counts CQ creation errors. A
modern kernel offers plenty of ways to debug such failures without
relying on that debugfs counter.
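
That limitation shows up directly in the new user path below, which has
to refuse a core-attached umem until the doorbell sharing is untangled:

	/* from the diff below: hns cannot yet consume a umem created by
	 * ib_core, because its doorbell code caches umems per user VA
	 */
	if (ib_cq->umem)
		return -EOPNOTSUPP;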

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/hns/hns_roce_cq.c      | 103 ++++++++++++++++++++-------
 drivers/infiniband/hw/hns/hns_roce_debugfs.c |   1 -
 drivers/infiniband/hw/hns/hns_roce_device.h  |   3 +-
 drivers/infiniband/hw/hns/hns_roce_main.c    |   1 +
 4 files changed, 82 insertions(+), 26 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_cq.c b/drivers/infiniband/hw/hns/hns_roce_cq.c
index 857a913326cd..0f24a916466b 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cq.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cq.c
@@ -335,7 +335,10 @@ static int verify_cq_create_attr(struct hns_roce_dev *hr_dev,
 {
 	struct ib_device *ibdev = &hr_dev->ib_dev;
 
-	if (!attr->cqe || attr->cqe > hr_dev->caps.max_cqes) {
+	if (attr->flags)
+		return -EOPNOTSUPP;
+
+	if (attr->cqe > hr_dev->caps.max_cqes) {
 		ibdev_err(ibdev, "failed to check CQ count %u, max = %u.\n",
 			  attr->cqe, hr_dev->caps.max_cqes);
 		return -EINVAL;
@@ -407,8 +410,8 @@ static int set_cqe_size(struct hns_roce_cq *hr_cq, struct ib_udata *udata,
 	return 0;
 }
 
-int hns_roce_create_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
-		       struct uverbs_attr_bundle *attrs)
+int hns_roce_create_user_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
+			    struct uverbs_attr_bundle *attrs)
 {
 	struct hns_roce_dev *hr_dev = to_hr_dev(ib_cq->device);
 	struct ib_udata *udata = &attrs->driver_udata;
@@ -418,31 +421,27 @@ int hns_roce_create_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
 	struct hns_roce_ib_create_cq ucmd = {};
 	int ret;
 
-	if (attr->flags) {
-		ret = -EOPNOTSUPP;
-		goto err_out;
-	}
+	if (ib_cq->umem)
+		return -EOPNOTSUPP;
 
 	ret = verify_cq_create_attr(hr_dev, attr);
 	if (ret)
-		goto err_out;
+		return ret;
 
-	if (udata) {
-		ret = get_cq_ucmd(hr_cq, udata, &ucmd);
-		if (ret)
-			goto err_out;
-	}
+	ret = get_cq_ucmd(hr_cq, udata, &ucmd);
+	if (ret)
+		return ret;
 
 	set_cq_param(hr_cq, attr->cqe, attr->comp_vector, &ucmd);
 
 	ret = set_cqe_size(hr_cq, udata, &ucmd);
 	if (ret)
-		goto err_out;
+		return ret;
 
 	ret = alloc_cq_buf(hr_dev, hr_cq, udata, ucmd.buf_addr);
 	if (ret) {
 		ibdev_err(ibdev, "failed to alloc CQ buf, ret = %d.\n", ret);
-		goto err_out;
+		return ret;
 	}
 
 	ret = alloc_cq_db(hr_dev, hr_cq, udata, ucmd.db_addr, &resp);
@@ -464,13 +463,11 @@ int hns_roce_create_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
 		goto err_cqn;
 	}
 
-	if (udata) {
-		resp.cqn = hr_cq->cqn;
-		ret = ib_copy_to_udata(udata, &resp,
-				       min(udata->outlen, sizeof(resp)));
-		if (ret)
-			goto err_cqc;
-	}
+	resp.cqn = hr_cq->cqn;
+	ret = ib_copy_to_udata(udata, &resp,
+			       min(udata->outlen, sizeof(resp)));
+	if (ret)
+		goto err_cqc;
 
 	hr_cq->cons_index = 0;
 	hr_cq->arm_sn = 1;
@@ -487,9 +484,67 @@ int hns_roce_create_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
 	free_cq_db(hr_dev, hr_cq, udata);
 err_cq_buf:
 	free_cq_buf(hr_dev, hr_cq);
-err_out:
-	atomic64_inc(&hr_dev->dfx_cnt[HNS_ROCE_DFX_CQ_CREATE_ERR_CNT]);
+	return ret;
+}
+
+int hns_roce_create_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
+		       struct uverbs_attr_bundle *attrs)
+{
+	struct hns_roce_dev *hr_dev = to_hr_dev(ib_cq->device);
+	struct hns_roce_ib_create_cq_resp resp = {};
+	struct hns_roce_cq *hr_cq = to_hr_cq(ib_cq);
+	struct ib_device *ibdev = &hr_dev->ib_dev;
+	struct hns_roce_ib_create_cq ucmd = {};
+	int ret;
+
+	ret = verify_cq_create_attr(hr_dev, attr);
+	if (ret)
+		return ret;
+
+	set_cq_param(hr_cq, attr->cqe, attr->comp_vector, &ucmd);
+
+	ret = set_cqe_size(hr_cq, NULL, &ucmd);
+	if (ret)
+		return ret;
 
+	ret = alloc_cq_buf(hr_dev, hr_cq, NULL, 0);
+	if (ret) {
+		ibdev_err(ibdev, "failed to alloc CQ buf, ret = %d.\n", ret);
+		return ret;
+	}
+
+	ret = alloc_cq_db(hr_dev, hr_cq, NULL, 0, &resp);
+	if (ret) {
+		ibdev_err(ibdev, "failed to alloc CQ db, ret = %d.\n", ret);
+		goto err_cq_buf;
+	}
+
+	ret = alloc_cqn(hr_dev, hr_cq, NULL);
+	if (ret) {
+		ibdev_err(ibdev, "failed to alloc CQN, ret = %d.\n", ret);
+		goto err_cq_db;
+	}
+
+	ret = alloc_cqc(hr_dev, hr_cq);
+	if (ret) {
+		ibdev_err(ibdev,
+			  "failed to alloc CQ context, ret = %d.\n", ret);
+		goto err_cqn;
+	}
+
+	hr_cq->cons_index = 0;
+	hr_cq->arm_sn = 1;
+	refcount_set(&hr_cq->refcount, 1);
+	init_completion(&hr_cq->free);
+
+	return 0;
+
+err_cqn:
+	free_cqn(hr_dev, hr_cq->cqn);
+err_cq_db:
+	free_cq_db(hr_dev, hr_cq, NULL);
+err_cq_buf:
+	free_cq_buf(hr_dev, hr_cq);
 	return ret;
 }
 
diff --git a/drivers/infiniband/hw/hns/hns_roce_debugfs.c b/drivers/infiniband/hw/hns/hns_roce_debugfs.c
index b869cdc54118..481b30f2f5b5 100644
--- a/drivers/infiniband/hw/hns/hns_roce_debugfs.c
+++ b/drivers/infiniband/hw/hns/hns_roce_debugfs.c
@@ -47,7 +47,6 @@ static const char * const sw_stat_info[] = {
 	[HNS_ROCE_DFX_MBX_EVENT_CNT] = "mbx_event",
 	[HNS_ROCE_DFX_QP_CREATE_ERR_CNT] = "qp_create_err",
 	[HNS_ROCE_DFX_QP_MODIFY_ERR_CNT] = "qp_modify_err",
-	[HNS_ROCE_DFX_CQ_CREATE_ERR_CNT] = "cq_create_err",
 	[HNS_ROCE_DFX_CQ_MODIFY_ERR_CNT] = "cq_modify_err",
 	[HNS_ROCE_DFX_SRQ_CREATE_ERR_CNT] = "srq_create_err",
 	[HNS_ROCE_DFX_SRQ_MODIFY_ERR_CNT] = "srq_modify_err",
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 3f032b8038af..fdc5f487d7a3 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -902,7 +902,6 @@ enum hns_roce_sw_dfx_stat_index {
 	HNS_ROCE_DFX_MBX_EVENT_CNT,
 	HNS_ROCE_DFX_QP_CREATE_ERR_CNT,
 	HNS_ROCE_DFX_QP_MODIFY_ERR_CNT,
-	HNS_ROCE_DFX_CQ_CREATE_ERR_CNT,
 	HNS_ROCE_DFX_CQ_MODIFY_ERR_CNT,
 	HNS_ROCE_DFX_SRQ_CREATE_ERR_CNT,
 	HNS_ROCE_DFX_SRQ_MODIFY_ERR_CNT,
@@ -1295,6 +1294,8 @@ int to_hr_qp_type(int qp_type);
 
 int hns_roce_create_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
 		       struct uverbs_attr_bundle *attrs);
+int hns_roce_create_user_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
+			    struct uverbs_attr_bundle *attrs);
 
 int hns_roce_destroy_cq(struct ib_cq *ib_cq, struct ib_udata *udata);
 int hns_roce_db_map_user(struct hns_roce_ucontext *context, unsigned long virt,
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index a3490bab297a..64de49bf8df7 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -727,6 +727,7 @@ static const struct ib_device_ops hns_roce_dev_ops = {
 	.create_ah = hns_roce_create_ah,
 	.create_user_ah = hns_roce_create_ah,
 	.create_cq = hns_roce_create_cq,
+	.create_user_cq = hns_roce_create_user_cq,
 	.create_qp = hns_roce_create_qp,
 	.dealloc_pd = hns_roce_dealloc_pd,
 	.dealloc_ucontext = hns_roce_dealloc_ucontext,

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 27/50] RDMA/rdmavt: Split user and kernel CQ creation paths
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (25 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 26/50] RDMA/erdma: Separate user and kernel CQ creation paths Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-13 10:58 ` [PATCH rdma-next 28/50] RDMA/siw: " Leon Romanovsky
                   ` (24 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

Separate the CQ creation logic into distinct kernel and user flows.
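
The substantive difference between the two flows is the allocation
primitive, condensed from the diff below: the user path needs an
mmap()-able queue whose offset is handed back through udata, while the
kernel path only wants zeroed, NUMA-local memory.

	/* user CQ: mmap()-able completion queue */
	sz = sizeof(struct ib_uverbs_wc) * (entries + 1) + sizeof(*u_wc);
	u_wc = vmalloc_user(sz);

	/* kernel CQ: plain zeroed allocation on the device's node */
	sz = sizeof(struct ib_wc) * (entries + 1) + sizeof(*k_wc);
	k_wc = vzalloc_node(sz, rdi->dparms.node);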

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/sw/rdmavt/cq.c | 144 +++++++++++++++++++++++++++-----------
 drivers/infiniband/sw/rdmavt/cq.h |   2 +
 drivers/infiniband/sw/rdmavt/vt.c |   1 +
 3 files changed, 106 insertions(+), 41 deletions(-)

diff --git a/drivers/infiniband/sw/rdmavt/cq.c b/drivers/infiniband/sw/rdmavt/cq.c
index e7835ca70e2b..db86eb026bb3 100644
--- a/drivers/infiniband/sw/rdmavt/cq.c
+++ b/drivers/infiniband/sw/rdmavt/cq.c
@@ -147,33 +147,32 @@ static void send_complete(struct work_struct *work)
 }
 
 /**
- * rvt_create_cq - create a completion queue
+ * rvt_create_user_cq - create a completion queue for userspace
  * @ibcq: Allocated CQ
  * @attr: creation attributes
  * @attrs: uverbs bundle
  *
- * Called by ib_create_cq() in the generic verbs code.
+ * Called by ib_create_cq() in the generic verbs code for userspace CQs.
  *
  * Return: 0 on success
  */
-int rvt_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
-		  struct uverbs_attr_bundle *attrs)
+int rvt_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+		       struct uverbs_attr_bundle *attrs)
 {
 	struct ib_udata *udata = &attrs->driver_udata;
 	struct ib_device *ibdev = ibcq->device;
 	struct rvt_dev_info *rdi = ib_to_rvt(ibdev);
 	struct rvt_cq *cq = ibcq_to_rvtcq(ibcq);
-	struct rvt_cq_wc *u_wc = NULL;
-	struct rvt_k_cq_wc *k_wc = NULL;
+	struct rvt_cq_wc *u_wc;
 	u32 sz;
 	unsigned int entries = attr->cqe;
 	int comp_vector = attr->comp_vector;
 	int err;
 
-	if (attr->flags)
+	if (attr->flags || ibcq->umem)
 		return -EOPNOTSUPP;
 
-	if (entries < 1 || entries > rdi->dparms.props.max_cqe)
+	if (entries > rdi->dparms.props.max_cqe)
 		return -EINVAL;
 
 	if (comp_vector < 0)
@@ -188,37 +187,27 @@ int rvt_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	 * We need to use vmalloc() in order to support mmap and large
 	 * numbers of entries.
 	 */
-	if (udata && udata->outlen >= sizeof(__u64)) {
-		sz = sizeof(struct ib_uverbs_wc) * (entries + 1);
-		sz += sizeof(*u_wc);
-		u_wc = vmalloc_user(sz);
-		if (!u_wc)
-			return -ENOMEM;
-	} else {
-		sz = sizeof(struct ib_wc) * (entries + 1);
-		sz += sizeof(*k_wc);
-		k_wc = vzalloc_node(sz, rdi->dparms.node);
-		if (!k_wc)
-			return -ENOMEM;
-	}
+	sz = sizeof(struct ib_uverbs_wc) * (entries + 1);
+	sz += sizeof(*u_wc);
+	u_wc = vmalloc_user(sz);
+	if (!u_wc)
+		return -ENOMEM;
 
 	/*
 	 * Return the address of the WC as the offset to mmap.
 	 * See rvt_mmap() for details.
 	 */
-	if (udata && udata->outlen >= sizeof(__u64)) {
-		cq->ip = rvt_create_mmap_info(rdi, sz, udata, u_wc);
-		if (IS_ERR(cq->ip)) {
-			err = PTR_ERR(cq->ip);
-			goto bail_wc;
-		}
-
-		err = ib_copy_to_udata(udata, &cq->ip->offset,
-				       sizeof(cq->ip->offset));
-		if (err)
-			goto bail_ip;
+	cq->ip = rvt_create_mmap_info(rdi, sz, udata, u_wc);
+	if (IS_ERR(cq->ip)) {
+		err = PTR_ERR(cq->ip);
+		goto bail_wc;
 	}
 
+	err = ib_copy_to_udata(udata, &cq->ip->offset,
+			       sizeof(cq->ip->offset));
+	if (err)
+		goto bail_ip;
+
 	spin_lock_irq(&rdi->n_cqs_lock);
 	if (rdi->n_cqs_allocated == rdi->dparms.props.max_cq) {
 		spin_unlock_irq(&rdi->n_cqs_lock);
@@ -229,11 +218,9 @@ int rvt_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	rdi->n_cqs_allocated++;
 	spin_unlock_irq(&rdi->n_cqs_lock);
 
-	if (cq->ip) {
-		spin_lock_irq(&rdi->pending_lock);
-		list_add(&cq->ip->pending_mmaps, &rdi->pending_mmaps);
-		spin_unlock_irq(&rdi->pending_lock);
-	}
+	spin_lock_irq(&rdi->pending_lock);
+	list_add(&cq->ip->pending_mmaps, &rdi->pending_mmaps);
+	spin_unlock_irq(&rdi->pending_lock);
 
 	/*
 	 * ib_create_cq() will initialize cq->ibcq except for cq->ibcq.cqe.
@@ -252,10 +239,7 @@ int rvt_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	cq->notify = RVT_CQ_NONE;
 	spin_lock_init(&cq->lock);
 	INIT_WORK(&cq->comptask, send_complete);
-	if (u_wc)
-		cq->queue = u_wc;
-	else
-		cq->kqueue = k_wc;
+	cq->queue = u_wc;
 
 	trace_rvt_create_cq(cq, attr);
 	return 0;
@@ -264,6 +248,84 @@ int rvt_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	kfree(cq->ip);
 bail_wc:
 	vfree(u_wc);
+	return err;
+}
+
+/**
+ * rvt_create_cq - create a completion queue for kernel
+ * @ibcq: Allocated CQ
+ * @attr: creation attributes
+ * @attrs: uverbs bundle
+ *
+ * Called by ib_create_cq() in the generic verbs code for kernel CQs.
+ *
+ * Return: 0 on success
+ */
+int rvt_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+		  struct uverbs_attr_bundle *attrs)
+{
+	struct ib_device *ibdev = ibcq->device;
+	struct rvt_dev_info *rdi = ib_to_rvt(ibdev);
+	struct rvt_cq *cq = ibcq_to_rvtcq(ibcq);
+	struct rvt_k_cq_wc *k_wc;
+	u32 sz;
+	unsigned int entries = attr->cqe;
+	int comp_vector = attr->comp_vector;
+	int err;
+
+	if (attr->flags)
+		return -EOPNOTSUPP;
+
+	if (entries > rdi->dparms.props.max_cqe)
+		return -EINVAL;
+
+	if (comp_vector < 0)
+		comp_vector = 0;
+
+	comp_vector = comp_vector % rdi->ibdev.num_comp_vectors;
+
+	/*
+	 * Allocate the completion queue entries and head/tail pointers.
+	 */
+	sz = sizeof(struct ib_wc) * (entries + 1);
+	sz += sizeof(*k_wc);
+	k_wc = vzalloc_node(sz, rdi->dparms.node);
+	if (!k_wc)
+		return -ENOMEM;
+
+	spin_lock_irq(&rdi->n_cqs_lock);
+	if (rdi->n_cqs_allocated == rdi->dparms.props.max_cq) {
+		spin_unlock_irq(&rdi->n_cqs_lock);
+		err = -ENOMEM;
+		goto bail_wc;
+	}
+
+	rdi->n_cqs_allocated++;
+	spin_unlock_irq(&rdi->n_cqs_lock);
+
+	/*
+	 * ib_create_cq() will initialize cq->ibcq except for cq->ibcq.cqe.
+	 * The number of entries should be >= the number requested or return
+	 * an error.
+	 */
+	cq->rdi = rdi;
+	if (rdi->driver_f.comp_vect_cpu_lookup)
+		cq->comp_vector_cpu =
+			rdi->driver_f.comp_vect_cpu_lookup(rdi, comp_vector);
+	else
+		cq->comp_vector_cpu =
+			cpumask_first(cpumask_of_node(rdi->dparms.node));
+
+	cq->ibcq.cqe = entries;
+	cq->notify = RVT_CQ_NONE;
+	spin_lock_init(&cq->lock);
+	INIT_WORK(&cq->comptask, send_complete);
+	cq->kqueue = k_wc;
+
+	trace_rvt_create_cq(cq, attr);
+	return 0;
+
+bail_wc:
 	vfree(k_wc);
 	return err;
 }
diff --git a/drivers/infiniband/sw/rdmavt/cq.h b/drivers/infiniband/sw/rdmavt/cq.h
index 4028702a7b2f..14ee2705c443 100644
--- a/drivers/infiniband/sw/rdmavt/cq.h
+++ b/drivers/infiniband/sw/rdmavt/cq.h
@@ -11,6 +11,8 @@
 
 int rvt_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		  struct uverbs_attr_bundle *attrs);
+int rvt_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+		       struct uverbs_attr_bundle *attrs);
 int rvt_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata);
 int rvt_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags notify_flags);
 int rvt_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata);
diff --git a/drivers/infiniband/sw/rdmavt/vt.c b/drivers/infiniband/sw/rdmavt/vt.c
index d22d610c2696..15964400b8d3 100644
--- a/drivers/infiniband/sw/rdmavt/vt.c
+++ b/drivers/infiniband/sw/rdmavt/vt.c
@@ -333,6 +333,7 @@ static const struct ib_device_ops rvt_dev_ops = {
 	.attach_mcast = rvt_attach_mcast,
 	.create_ah = rvt_create_ah,
 	.create_cq = rvt_create_cq,
+	.create_user_cq = rvt_create_user_cq,
 	.create_qp = rvt_create_qp,
 	.create_srq = rvt_create_srq,
 	.create_user_ah = rvt_create_ah,

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 28/50] RDMA/siw: Split user and kernel CQ creation paths
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (26 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 27/50] RDMA/rdmavt: Split " Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-13 16:56   ` Bernard Metzler
  2026-02-13 10:58 ` [PATCH rdma-next 29/50] RDMA/rxe: " Leon Romanovsky
                   ` (23 subsequent siblings)
  51 siblings, 1 reply; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

Separate the CQ creation logic into distinct kernel and user flows.
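
As in rdmavt, the split comes down to the queue allocation and the mmap
bookkeeping, condensed from the diff below:

	/* user CQ: queue is mmap()-able and exported to the consumer */
	cq->queue = vmalloc_user(size * sizeof(struct siw_cqe) +
				 sizeof(struct siw_cq_ctrl));
	cq->cq_entry = siw_mmap_entry_insert(ctx, cq->queue, length,
					     &uresp.cq_key);

	/* kernel CQ: plain zeroed vmalloc, no mmap entry */
	cq->queue = vzalloc(size * sizeof(struct siw_cqe) +
			    sizeof(struct siw_cq_ctrl));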

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/sw/siw/siw_main.c  |   1 +
 drivers/infiniband/sw/siw/siw_verbs.c | 111 +++++++++++++++++++++++-----------
 drivers/infiniband/sw/siw/siw_verbs.h |   2 +
 3 files changed, 80 insertions(+), 34 deletions(-)

diff --git a/drivers/infiniband/sw/siw/siw_main.c b/drivers/infiniband/sw/siw/siw_main.c
index 5168307229a9..75dcf3578eac 100644
--- a/drivers/infiniband/sw/siw/siw_main.c
+++ b/drivers/infiniband/sw/siw/siw_main.c
@@ -232,6 +232,7 @@ static const struct ib_device_ops siw_device_ops = {
 	.alloc_pd = siw_alloc_pd,
 	.alloc_ucontext = siw_alloc_ucontext,
 	.create_cq = siw_create_cq,
+	.create_user_cq = siw_create_user_cq,
 	.create_qp = siw_create_qp,
 	.create_srq = siw_create_srq,
 	.dealloc_driver = siw_device_cleanup,
diff --git a/drivers/infiniband/sw/siw/siw_verbs.c b/drivers/infiniband/sw/siw/siw_verbs.c
index efa2f097b582..92b25b389b69 100644
--- a/drivers/infiniband/sw/siw/siw_verbs.c
+++ b/drivers/infiniband/sw/siw/siw_verbs.c
@@ -1139,15 +1139,15 @@ int siw_destroy_cq(struct ib_cq *base_cq, struct ib_udata *udata)
  * @attrs: uverbs bundle
  */
 
-int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
-		  struct uverbs_attr_bundle *attrs)
+int siw_create_user_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
+		       struct uverbs_attr_bundle *attrs)
 {
 	struct ib_udata *udata = &attrs->driver_udata;
 	struct siw_device *sdev = to_siw_dev(base_cq->device);
 	struct siw_cq *cq = to_siw_cq(base_cq);
 	int rv, size = attr->cqe;
 
-	if (attr->flags)
+	if (attr->flags || base_cq->umem)
 		return -EOPNOTSUPP;
 
 	if (atomic_inc_return(&sdev->num_cq) > SIW_MAX_CQ) {
@@ -1155,7 +1155,7 @@ int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
 		rv = -ENOMEM;
 		goto err_out;
 	}
-	if (size < 1 || size > sdev->attrs.max_cqe) {
+	if (attr->cqe > sdev->attrs.max_cqe) {
 		siw_dbg(base_cq->device, "CQ size error: %d\n", size);
 		rv = -EINVAL;
 		goto err_out;
@@ -1164,13 +1164,8 @@ int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
 	cq->base_cq.cqe = size;
 	cq->num_cqe = size;
 
-	if (udata)
-		cq->queue = vmalloc_user(size * sizeof(struct siw_cqe) +
-					 sizeof(struct siw_cq_ctrl));
-	else
-		cq->queue = vzalloc(size * sizeof(struct siw_cqe) +
-				    sizeof(struct siw_cq_ctrl));
-
+	cq->queue = vmalloc_user(size * sizeof(struct siw_cqe) +
+				 sizeof(struct siw_cq_ctrl));
 	if (cq->queue == NULL) {
 		rv = -ENOMEM;
 		goto err_out;
@@ -1182,33 +1177,32 @@ int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
 
 	cq->notify = (struct siw_cq_ctrl *)&cq->queue[size];
 
-	if (udata) {
-		struct siw_uresp_create_cq uresp = {};
-		struct siw_ucontext *ctx =
-			rdma_udata_to_drv_context(udata, struct siw_ucontext,
-						  base_ucontext);
-		size_t length = size * sizeof(struct siw_cqe) +
-			sizeof(struct siw_cq_ctrl);
+	struct siw_uresp_create_cq uresp = {};
+	struct siw_ucontext *ctx =
+		rdma_udata_to_drv_context(udata, struct siw_ucontext,
+					  base_ucontext);
+	size_t length = size * sizeof(struct siw_cqe) +
+		sizeof(struct siw_cq_ctrl);
 
-		cq->cq_entry =
-			siw_mmap_entry_insert(ctx, cq->queue,
-					      length, &uresp.cq_key);
-		if (!cq->cq_entry) {
-			rv = -ENOMEM;
-			goto err_out;
-		}
+	cq->cq_entry =
+		siw_mmap_entry_insert(ctx, cq->queue,
+				      length, &uresp.cq_key);
+	if (!cq->cq_entry) {
+		rv = -ENOMEM;
+		goto err_out;
+	}
 
-		uresp.cq_id = cq->id;
-		uresp.num_cqe = size;
+	uresp.cq_id = cq->id;
+	uresp.num_cqe = size;
 
-		if (udata->outlen < sizeof(uresp)) {
-			rv = -EINVAL;
-			goto err_out;
-		}
-		rv = ib_copy_to_udata(udata, &uresp, sizeof(uresp));
-		if (rv)
-			goto err_out;
+	if (udata->outlen < sizeof(uresp)) {
+		rv = -EINVAL;
+		goto err_out;
 	}
+	rv = ib_copy_to_udata(udata, &uresp, sizeof(uresp));
+	if (rv)
+		goto err_out;
+
 	return 0;
 
 err_out:
@@ -1227,6 +1221,55 @@ int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
 	return rv;
 }
 
+int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
+		  struct uverbs_attr_bundle *attrs)
+{
+	struct siw_device *sdev = to_siw_dev(base_cq->device);
+	struct siw_cq *cq = to_siw_cq(base_cq);
+	int rv, size = attr->cqe;
+
+	if (attr->flags)
+		return -EOPNOTSUPP;
+
+	if (atomic_inc_return(&sdev->num_cq) > SIW_MAX_CQ) {
+		siw_dbg(base_cq->device, "too many CQ's\n");
+		rv = -ENOMEM;
+		goto err_out;
+	}
+	if (size < 1 || size > sdev->attrs.max_cqe) {
+		siw_dbg(base_cq->device, "CQ size error: %d\n", size);
+		rv = -EINVAL;
+		goto err_out;
+	}
+	size = roundup_pow_of_two(size);
+	cq->base_cq.cqe = size;
+	cq->num_cqe = size;
+
+	cq->queue = vzalloc(size * sizeof(struct siw_cqe) +
+			    sizeof(struct siw_cq_ctrl));
+	if (cq->queue == NULL) {
+		rv = -ENOMEM;
+		goto err_out;
+	}
+	get_random_bytes(&cq->id, 4);
+	siw_dbg(base_cq->device, "new CQ [%u]\n", cq->id);
+
+	spin_lock_init(&cq->lock);
+
+	cq->notify = (struct siw_cq_ctrl *)&cq->queue[size];
+
+	return 0;
+
+err_out:
+	siw_dbg(base_cq->device, "CQ creation failed: %d", rv);
+
+	if (cq->queue)
+		vfree(cq->queue);
+	atomic_dec(&sdev->num_cq);
+
+	return rv;
+}
+
 /*
  * siw_poll_cq()
  *
diff --git a/drivers/infiniband/sw/siw/siw_verbs.h b/drivers/infiniband/sw/siw/siw_verbs.h
index e9f4463aecdc..527c356b55af 100644
--- a/drivers/infiniband/sw/siw/siw_verbs.h
+++ b/drivers/infiniband/sw/siw/siw_verbs.h
@@ -44,6 +44,8 @@ int siw_query_device(struct ib_device *base_dev, struct ib_device_attr *attr,
 		     struct ib_udata *udata);
 int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
 		  struct uverbs_attr_bundle *attrs);
+int siw_create_user_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
+		       struct uverbs_attr_bundle *attrs);
 int siw_query_port(struct ib_device *base_dev, u32 port,
 		   struct ib_port_attr *attr);
 int siw_query_gid(struct ib_device *base_dev, u32 port, int idx,

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 29/50] RDMA/rxe: Split user and kernel CQ creation paths
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (27 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 28/50] RDMA/siw: " Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-13 23:22   ` yanjun.zhu
  2026-02-13 10:58 ` [PATCH rdma-next 30/50] RDMA/core: Remove legacy CQ creation fallback path Leon Romanovsky
                   ` (22 subsequent siblings)
  51 siblings, 1 reply; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

Separate the CQ creation logic into distinct kernel and user flows.
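
The split follows the same pattern as the earlier driver conversions; a
minimal sketch with a hypothetical "foo" driver (foo_cq_init() and
struct foo_create_cq_resp are illustrative only, the real rxe code is in
the diff below):

    static int foo_create_user_cq(struct ib_cq *ibcq,
                                  const struct ib_cq_init_attr *attr,
                                  struct uverbs_attr_bundle *attrs)
    {
            struct ib_udata *udata = &attrs->driver_udata;

            /* User flow: udata is always valid, copy the response out */
            if (udata->outlen < sizeof(struct foo_create_cq_resp))
                    return -EINVAL;
            return foo_cq_init(ibcq, attr, udata);
    }

    static int foo_create_cq(struct ib_cq *ibcq,
                             const struct ib_cq_init_attr *attr,
                             struct uverbs_attr_bundle *attrs)
    {
            /* Kernel flow: no udata and no user response to fill */
            return foo_cq_init(ibcq, attr, NULL);
    }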

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/sw/rxe/rxe_verbs.c | 81 ++++++++++++++++++++---------------
 1 file changed, 47 insertions(+), 34 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 38d8c408320f..1e651bdd8622 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1072,58 +1072,70 @@ static int rxe_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr,
 }
 
 /* cq */
-static int rxe_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
-			 struct uverbs_attr_bundle *attrs)
+static int rxe_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+			      struct uverbs_attr_bundle *attrs)
 {
 	struct ib_udata *udata = &attrs->driver_udata;
 	struct ib_device *dev = ibcq->device;
 	struct rxe_dev *rxe = to_rdev(dev);
 	struct rxe_cq *cq = to_rcq(ibcq);
-	struct rxe_create_cq_resp __user *uresp = NULL;
-	int err, cleanup_err;
+	struct rxe_create_cq_resp __user *uresp;
+	int err;
 
-	if (udata) {
-		if (udata->outlen < sizeof(*uresp)) {
-			err = -EINVAL;
-			rxe_dbg_dev(rxe, "malformed udata, err = %d\n", err);
-			goto err_out;
-		}
-		uresp = udata->outbuf;
-	}
+	if (udata->outlen < sizeof(*uresp))
+		return -EINVAL;
 
-	if (attr->flags) {
-		err = -EOPNOTSUPP;
-		rxe_dbg_dev(rxe, "bad attr->flags, err = %d\n", err);
-		goto err_out;
-	}
+	uresp = udata->outbuf;
 
-	err = rxe_cq_chk_attr(rxe, NULL, attr->cqe, attr->comp_vector);
-	if (err) {
-		rxe_dbg_dev(rxe, "bad init attributes, err = %d\n", err);
-		goto err_out;
-	}
+	if (attr->flags || ibcq->umem)
+		return -EOPNOTSUPP;
+
+	if (attr->cqe > rxe->attr.max_cqe)
+		return -EINVAL;
 
 	err = rxe_add_to_pool(&rxe->cq_pool, cq);
-	if (err) {
-		rxe_dbg_dev(rxe, "unable to create cq, err = %d\n", err);
-		goto err_out;
-	}
+	if (err)
+		return err;
 
 	err = rxe_cq_from_init(rxe, cq, attr->cqe, attr->comp_vector, udata,
 			       uresp);
-	if (err) {
-		rxe_dbg_cq(cq, "create cq failed, err = %d\n", err);
+	if (err)
 		goto err_cleanup;
-	}
 
 	return 0;
 
 err_cleanup:
-	cleanup_err = rxe_cleanup(cq);
-	if (cleanup_err)
-		rxe_err_cq(cq, "cleanup failed, err = %d\n", cleanup_err);
-err_out:
-	rxe_err_dev(rxe, "returned err = %d\n", err);
+	rxe_cleanup(cq);
+	return err;
+}
+
+static int rxe_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+			 struct uverbs_attr_bundle *attrs)
+{
+	struct ib_device *dev = ibcq->device;
+	struct rxe_dev *rxe = to_rdev(dev);
+	struct rxe_cq *cq = to_rcq(ibcq);
+	int err;
+
+	if (attr->flags)
+		return -EOPNOTSUPP;
+
+	if (attr->cqe > rxe->attr.max_cqe)
+		return -EINVAL;
+
+	err = rxe_add_to_pool(&rxe->cq_pool, cq);
+	if (err)
+		return err;
+
+	err = rxe_cq_from_init(rxe, cq, attr->cqe, attr->comp_vector, NULL,
+			       NULL);
+	if (err)
+		goto err_cleanup;
+
+	return 0;
+
+err_cleanup:
+	rxe_cleanup(cq);
 	return err;
 }
 
@@ -1478,6 +1490,7 @@ static const struct ib_device_ops rxe_dev_ops = {
 	.attach_mcast = rxe_attach_mcast,
 	.create_ah = rxe_create_ah,
 	.create_cq = rxe_create_cq,
+	.create_user_cq = rxe_create_user_cq,
 	.create_qp = rxe_create_qp,
 	.create_srq = rxe_create_srq,
 	.create_user_ah = rxe_create_ah,

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 30/50] RDMA/core: Remove legacy CQ creation fallback path
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (28 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 29/50] RDMA/rxe: " Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-13 10:58 ` [PATCH rdma-next 31/50] RDMA/core: Remove unused ib_resize_cq() implementation Leon Romanovsky
                   ` (21 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

All drivers now support the modern CQ creation interface via the
create_user_cq callback. Remove the legacy fallback to create_cq
for userspace CQ creation.

This simplifies the core code by eliminating conditional logic and
ensures all userspace CQ creation goes through the modern interface
that properly supports user-supplied umem.
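
A driver that wants to expose CQs to userspace must now wire up the
callback explicitly; a minimal sketch of the required ops wiring
(hypothetical "foo" driver):

    static const struct ib_device_ops foo_dev_ops = {
            .create_cq      = foo_create_cq,        /* kernel CQs only */
            .create_user_cq = foo_create_user_cq,   /* all uverbs CQs */
            .destroy_cq     = foo_destroy_cq,
    };

A device that leaves create_user_cq unset now simply fails the
UAPI_DEF_METHOD_NEEDS_FN() check instead of silently falling back to
create_cq.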

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/uverbs_cmd.c          | 9 +++------
 drivers/infiniband/core/uverbs_std_types_cq.c | 8 ++------
 2 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 041bed7a43b4..cdfee86fb800 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1071,10 +1071,7 @@ static int create_cq(struct uverbs_attr_bundle *attrs,
 	rdma_restrack_new(&cq->res, RDMA_RESTRACK_CQ);
 	rdma_restrack_set_name(&cq->res, NULL);
 
-	if (ib_dev->ops.create_user_cq)
-		ret = ib_dev->ops.create_user_cq(cq, &attr, attrs);
-	else
-		ret = ib_dev->ops.create_cq(cq, &attr, attrs);
+	ret = ib_dev->ops.create_user_cq(cq, &attr, attrs);
 	if (ret)
 		goto err_free;
 	rdma_restrack_add(&cq->res);
@@ -3791,7 +3788,7 @@ const struct uapi_definition uverbs_def_write_intf[] = {
 				     UAPI_DEF_WRITE_UDATA_IO(
 					     struct ib_uverbs_create_cq,
 					     struct ib_uverbs_create_cq_resp),
-				     UAPI_DEF_METHOD_NEEDS_FN(create_cq)),
+				     UAPI_DEF_METHOD_NEEDS_FN(create_user_cq)),
 		DECLARE_UVERBS_WRITE(
 			IB_USER_VERBS_CMD_DESTROY_CQ,
 			ib_uverbs_destroy_cq,
@@ -3822,7 +3819,7 @@ const struct uapi_definition uverbs_def_write_intf[] = {
 					     reserved,
 					     struct ib_uverbs_ex_create_cq_resp,
 					     response_length),
-			UAPI_DEF_METHOD_NEEDS_FN(create_cq)),
+			UAPI_DEF_METHOD_NEEDS_FN(create_user_cq)),
 		DECLARE_UVERBS_WRITE_EX(
 			IB_USER_VERBS_EX_CMD_MODIFY_CQ,
 			ib_uverbs_ex_modify_cq,
diff --git a/drivers/infiniband/core/uverbs_std_types_cq.c b/drivers/infiniband/core/uverbs_std_types_cq.c
index d2c8f71f934c..a12e3184dd5c 100644
--- a/drivers/infiniband/core/uverbs_std_types_cq.c
+++ b/drivers/infiniband/core/uverbs_std_types_cq.c
@@ -78,8 +78,7 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(
 	int buffer_fd;
 	int ret;
 
-	if ((!ib_dev->ops.create_cq && !ib_dev->ops.create_user_cq) ||
-	    !ib_dev->ops.destroy_cq)
+	if (!ib_dev->ops.create_user_cq || !ib_dev->ops.destroy_cq)
 		return -EOPNOTSUPP;
 
 	ret = uverbs_copy_from(&attr.comp_vector, attrs,
@@ -200,10 +199,7 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(
 	rdma_restrack_new(&cq->res, RDMA_RESTRACK_CQ);
 	rdma_restrack_set_name(&cq->res, NULL);
 
-	if (ib_dev->ops.create_user_cq)
-		ret = ib_dev->ops.create_user_cq(cq, &attr, attrs);
-	else
-		ret = ib_dev->ops.create_cq(cq, &attr, attrs);
+	ret = ib_dev->ops.create_user_cq(cq, &attr, attrs);
 	if (ret)
 		goto err_free;
 

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 31/50] RDMA/core: Remove unused ib_resize_cq() implementation
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (29 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 30/50] RDMA/core: Remove legacy CQ creation fallback path Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-13 10:58 ` [PATCH rdma-next 32/50] RDMA: Clarify that CQ resize is a user-space verb Leon Romanovsky
                   ` (20 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

There are no in-kernel users of the CQ resize functionality, so drop it.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/verbs.c | 10 ----------
 include/rdma/ib_verbs.h         |  9 ---------
 2 files changed, 19 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 9d075eeda463..5f59487fc9d4 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -2264,16 +2264,6 @@ int ib_destroy_cq_user(struct ib_cq *cq, struct ib_udata *udata)
 }
 EXPORT_SYMBOL(ib_destroy_cq_user);
 
-int ib_resize_cq(struct ib_cq *cq, int cqe)
-{
-	if (cq->shared)
-		return -EOPNOTSUPP;
-
-	return cq->device->ops.resize_cq ?
-		cq->device->ops.resize_cq(cq, cqe, NULL) : -EOPNOTSUPP;
-}
-EXPORT_SYMBOL(ib_resize_cq);
-
 /* Memory regions */
 
 struct ib_mr *ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 67aa5fc2c0b7..b8adc2f17e73 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -4001,15 +4001,6 @@ struct ib_cq *__ib_create_cq(struct ib_device *device,
 #define ib_create_cq(device, cmp_hndlr, evt_hndlr, cq_ctxt, cq_attr) \
 	__ib_create_cq((device), (cmp_hndlr), (evt_hndlr), (cq_ctxt), (cq_attr), KBUILD_MODNAME)
 
-/**
- * ib_resize_cq - Modifies the capacity of the CQ.
- * @cq: The CQ to resize.
- * @cqe: The minimum size of the CQ.
- *
- * Users can examine the cq structure to determine the actual CQ size.
- */
-int ib_resize_cq(struct ib_cq *cq, int cqe);
-
 /**
  * rdma_set_cq_moderation - Modifies moderation params of the CQ
  * @cq: The CQ to modify.

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 32/50] RDMA: Clarify that CQ resize is a user-space verb
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (30 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 31/50] RDMA/core: Remove unused ib_resize_cq() implementation Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-13 10:58 ` [PATCH rdma-next 33/50] RDMA/bnxt_re: Drop support for resizing kernel CQs Leon Romanovsky
                   ` (19 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

The CQ resize operation is used only by uverbs. Make this explicit.
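
After the rename the callback is reachable only through uverbs, so
implementations may assume a non-NULL udata; the declaration becomes
(see the ib_verbs.h hunk below):

    int (*resize_user_cq)(struct ib_cq *cq, int cqe,
                          struct ib_udata *udata);

This is what lets the follow-up patches delete the "if (!udata)" kernel
branches from the individual drivers.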

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/device.c             | 2 +-
 drivers/infiniband/core/uverbs_cmd.c         | 4 ++--
 drivers/infiniband/hw/bnxt_re/main.c         | 2 +-
 drivers/infiniband/hw/irdma/verbs.c          | 2 +-
 drivers/infiniband/hw/mlx4/main.c            | 2 +-
 drivers/infiniband/hw/mlx5/main.c            | 2 +-
 drivers/infiniband/hw/mthca/mthca_provider.c | 2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_main.c   | 2 +-
 drivers/infiniband/sw/rdmavt/vt.c            | 2 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c        | 2 +-
 include/rdma/ib_verbs.h                      | 3 ++-
 11 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 9209b8c664ef..9411f7805eed 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2799,7 +2799,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, reg_user_mr_dmabuf);
 	SET_DEVICE_OP(dev_ops, req_notify_cq);
 	SET_DEVICE_OP(dev_ops, rereg_user_mr);
-	SET_DEVICE_OP(dev_ops, resize_cq);
+	SET_DEVICE_OP(dev_ops, resize_user_cq);
 	SET_DEVICE_OP(dev_ops, set_vf_guid);
 	SET_DEVICE_OP(dev_ops, set_vf_link_state);
 	SET_DEVICE_OP(dev_ops, ufile_hw_cleanup);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index cdfee86fb800..57697738fd25 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1151,7 +1151,7 @@ static int ib_uverbs_resize_cq(struct uverbs_attr_bundle *attrs)
 	if (IS_ERR(cq))
 		return PTR_ERR(cq);
 
-	ret = cq->device->ops.resize_cq(cq, cmd.cqe, &attrs->driver_udata);
+	ret = cq->device->ops.resize_user_cq(cq, cmd.cqe, &attrs->driver_udata);
 	if (ret)
 		goto out;
 
@@ -3811,7 +3811,7 @@ const struct uapi_definition uverbs_def_write_intf[] = {
 				     UAPI_DEF_WRITE_UDATA_IO(
 					     struct ib_uverbs_resize_cq,
 					     struct ib_uverbs_resize_cq_resp),
-				     UAPI_DEF_METHOD_NEEDS_FN(resize_cq)),
+				     UAPI_DEF_METHOD_NEEDS_FN(resize_user_cq)),
 		DECLARE_UVERBS_WRITE_EX(
 			IB_USER_VERBS_EX_CMD_CREATE_CQ,
 			ib_uverbs_ex_create_cq,
diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index 368c1fd8172e..ccc01fc222ca 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -1373,7 +1373,7 @@ static const struct ib_device_ops bnxt_re_dev_ops = {
 	.reg_user_mr = bnxt_re_reg_user_mr,
 	.reg_user_mr_dmabuf = bnxt_re_reg_user_mr_dmabuf,
 	.req_notify_cq = bnxt_re_req_notify_cq,
-	.resize_cq = bnxt_re_resize_cq,
+	.resize_user_cq = bnxt_re_resize_cq,
 	.create_flow = bnxt_re_create_flow,
 	.destroy_flow = bnxt_re_destroy_flow,
 	INIT_RDMA_OBJ_SIZE(ib_ah, bnxt_re_ah, ib_ah),
diff --git a/drivers/infiniband/hw/irdma/verbs.c b/drivers/infiniband/hw/irdma/verbs.c
index f2b3cfe125af..f727d1922a84 100644
--- a/drivers/infiniband/hw/irdma/verbs.c
+++ b/drivers/infiniband/hw/irdma/verbs.c
@@ -5460,7 +5460,7 @@ static const struct ib_device_ops irdma_dev_ops = {
 	.reg_user_mr_dmabuf = irdma_reg_user_mr_dmabuf,
 	.rereg_user_mr = irdma_rereg_user_mr,
 	.req_notify_cq = irdma_req_notify_cq,
-	.resize_cq = irdma_resize_cq,
+	.resize_user_cq = irdma_resize_cq,
 	INIT_RDMA_OBJ_SIZE(ib_pd, irdma_pd, ibpd),
 	INIT_RDMA_OBJ_SIZE(ib_ucontext, irdma_ucontext, ibucontext),
 	INIT_RDMA_OBJ_SIZE(ib_ah, irdma_ah, ibah),
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index fc05e7a1a870..daf95f94ec6f 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2570,7 +2570,7 @@ static const struct ib_device_ops mlx4_ib_dev_ops = {
 	.reg_user_mr = mlx4_ib_reg_user_mr,
 	.req_notify_cq = mlx4_ib_arm_cq,
 	.rereg_user_mr = mlx4_ib_rereg_user_mr,
-	.resize_cq = mlx4_ib_resize_cq,
+	.resize_user_cq = mlx4_ib_resize_cq,
 	.report_port_event = mlx4_ib_port_event,
 
 	INIT_RDMA_OBJ_SIZE(ib_ah, mlx4_ib_ah, ibah),
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 4f49f65e2c16..0471155eb739 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -4496,7 +4496,7 @@ static const struct ib_device_ops mlx5_ib_dev_ops = {
 	.reg_user_mr_dmabuf = mlx5_ib_reg_user_mr_dmabuf,
 	.req_notify_cq = mlx5_ib_arm_cq,
 	.rereg_user_mr = mlx5_ib_rereg_user_mr,
-	.resize_cq = mlx5_ib_resize_cq,
+	.resize_user_cq = mlx5_ib_resize_cq,
 	.ufile_hw_cleanup = mlx5_ib_ufile_hw_cleanup,
 
 	INIT_RDMA_OBJ_SIZE(ib_ah, mlx5_ib_ah, ibah),
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 6bf825978846..8920deceea73 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -1119,7 +1119,7 @@ static const struct ib_device_ops mthca_dev_ops = {
 	.query_port = mthca_query_port,
 	.query_qp = mthca_query_qp,
 	.reg_user_mr = mthca_reg_user_mr,
-	.resize_cq = mthca_resize_cq,
+	.resize_user_cq = mthca_resize_cq,
 
 	INIT_RDMA_OBJ_SIZE(ib_ah, mthca_ah, ibah),
 	INIT_RDMA_OBJ_SIZE(ib_cq, mthca_cq, ibcq),
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index 0d89c5ec9a7a..7dafebc7f57e 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -167,7 +167,7 @@ static const struct ib_device_ops ocrdma_dev_ops = {
 	.query_qp = ocrdma_query_qp,
 	.reg_user_mr = ocrdma_reg_user_mr,
 	.req_notify_cq = ocrdma_arm_cq,
-	.resize_cq = ocrdma_resize_cq,
+	.resize_user_cq = ocrdma_resize_cq,
 
 	INIT_RDMA_OBJ_SIZE(ib_ah, ocrdma_ah, ibah),
 	INIT_RDMA_OBJ_SIZE(ib_cq, ocrdma_cq, ibcq),
diff --git a/drivers/infiniband/sw/rdmavt/vt.c b/drivers/infiniband/sw/rdmavt/vt.c
index 15964400b8d3..5aff65b3916b 100644
--- a/drivers/infiniband/sw/rdmavt/vt.c
+++ b/drivers/infiniband/sw/rdmavt/vt.c
@@ -368,7 +368,7 @@ static const struct ib_device_ops rvt_dev_ops = {
 	.query_srq = rvt_query_srq,
 	.reg_user_mr = rvt_reg_user_mr,
 	.req_notify_cq = rvt_req_notify_cq,
-	.resize_cq = rvt_resize_cq,
+	.resize_user_cq = rvt_resize_cq,
 
 	INIT_RDMA_OBJ_SIZE(ib_ah, rvt_ah, ibah),
 	INIT_RDMA_OBJ_SIZE(ib_cq, rvt_cq, ibcq),
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 1e651bdd8622..72e3019ed1cb 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1532,7 +1532,7 @@ static const struct ib_device_ops rxe_dev_ops = {
 	.reg_user_mr = rxe_reg_user_mr,
 	.req_notify_cq = rxe_req_notify_cq,
 	.rereg_user_mr = rxe_rereg_user_mr,
-	.resize_cq = rxe_resize_cq,
+	.resize_user_cq = rxe_resize_cq,
 
 	INIT_RDMA_OBJ_SIZE(ib_ah, rxe_ah, ibah),
 	INIT_RDMA_OBJ_SIZE(ib_cq, rxe_cq, ibcq),
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index b8adc2f17e73..94bb3cc4c67a 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2534,7 +2534,8 @@ struct ib_device_ops {
 			      struct uverbs_attr_bundle *attrs);
 	int (*modify_cq)(struct ib_cq *cq, u16 cq_count, u16 cq_period);
 	int (*destroy_cq)(struct ib_cq *cq, struct ib_udata *udata);
-	int (*resize_cq)(struct ib_cq *cq, int cqe, struct ib_udata *udata);
+	int (*resize_user_cq)(struct ib_cq *cq, int cqe,
+			      struct ib_udata *udata);
 	/*
 	 * pre_destroy_cq - Prevent a cq from generating any new work
 	 * completions, but not free any kernel resources

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 33/50] RDMA/bnxt_re: Drop support for resizing kernel CQs
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (31 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 32/50] RDMA: Clarify that CQ resize is a user-space verb Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-13 10:58 ` [PATCH rdma-next 34/50] RDMA/irdma: Remove resize support for " Leon Romanovsky
                   ` (18 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

There are no ULP callers of the CQ resize functionality, so remove it.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/bnxt_re/ib_verbs.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index b8516d8b8426..16bb586d68c7 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -3338,10 +3338,6 @@ int bnxt_re_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
 	cq =  container_of(ibcq, struct bnxt_re_cq, ib_cq);
 	rdev = cq->rdev;
 	dev_attr = rdev->dev_attr;
-	if (!ibcq->uobject) {
-		ibdev_err(&rdev->ibdev, "Kernel CQ Resize not supported");
-		return -EOPNOTSUPP;
-	}
 
 	if (cq->resize_umem) {
 		ibdev_err(&rdev->ibdev, "Resize CQ %#x failed - Busy",
@@ -3375,7 +3371,7 @@ int bnxt_re_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
 		ibdev_err(&rdev->ibdev, "%s: ib_umem_get failed! rc = %pe\n",
 			  __func__, cq->resize_umem);
 		cq->resize_umem = NULL;
-		goto fail;
+		return rc;
 	}
 	cq->resize_cqe = entries;
 	memcpy(&sg_info, &cq->qplib_cq.sg_info, sizeof(sg_info));
@@ -3399,13 +3395,11 @@ int bnxt_re_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
 	return 0;
 
 fail:
-	if (cq->resize_umem) {
-		ib_umem_release(cq->resize_umem);
-		cq->resize_umem = NULL;
-		cq->resize_cqe = 0;
-		memcpy(&cq->qplib_cq.sg_info, &sg_info, sizeof(sg_info));
-		cq->qplib_cq.dpi = orig_dpi;
-	}
+	ib_umem_release(cq->resize_umem);
+	cq->resize_umem = NULL;
+	cq->resize_cqe = 0;
+	memcpy(&cq->qplib_cq.sg_info, &sg_info, sizeof(sg_info));
+	cq->qplib_cq.dpi = orig_dpi;
 	return rc;
 }
 

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 34/50] RDMA/irdma: Remove resize support for kernel CQs
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (32 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 33/50] RDMA/bnxt_re: Drop support for resizing kernel CQs Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-13 10:58 ` [PATCH rdma-next 35/50] RDMA/mlx4: Remove support for kernel CQ resize Leon Romanovsky
                   ` (17 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

The CQ resize operation is a uverbs-only interface and is not required for
kernel-created CQs. Drop this unused functionality.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/irdma/verbs.c | 88 +++++++++----------------------------
 1 file changed, 21 insertions(+), 67 deletions(-)

diff --git a/drivers/infiniband/hw/irdma/verbs.c b/drivers/infiniband/hw/irdma/verbs.c
index f727d1922a84..d5442aebf1ac 100644
--- a/drivers/infiniband/hw/irdma/verbs.c
+++ b/drivers/infiniband/hw/irdma/verbs.c
@@ -2015,6 +2015,9 @@ static int irdma_destroy_cq(struct ib_cq *ib_cq, struct ib_udata *udata)
 static int irdma_resize_cq(struct ib_cq *ibcq, int entries,
 			   struct ib_udata *udata)
 {
+	struct irdma_resize_cq_req req = {};
+	struct irdma_ucontext *ucontext = rdma_udata_to_drv_context(
+		udata, struct irdma_ucontext, ibucontext);
 #define IRDMA_RESIZE_CQ_MIN_REQ_LEN offsetofend(struct irdma_resize_cq_req, user_cq_buffer)
 	struct irdma_cq *iwcq = to_iwcq(ibcq);
 	struct irdma_sc_dev *dev = iwcq->sc_cq.dev;
@@ -2029,7 +2032,6 @@ static int irdma_resize_cq(struct ib_cq *ibcq, int entries,
 	struct irdma_pci_f *rf;
 	struct irdma_cq_buf *cq_buf = NULL;
 	unsigned long flags;
-	u8 cqe_size;
 	int ret;
 
 	iwdev = to_iwdev(ibcq->device);
@@ -2039,81 +2041,39 @@ static int irdma_resize_cq(struct ib_cq *ibcq, int entries,
 	    IRDMA_FEATURE_CQ_RESIZE))
 		return -EOPNOTSUPP;
 
-	if (udata && udata->inlen < IRDMA_RESIZE_CQ_MIN_REQ_LEN)
+	if (udata->inlen < IRDMA_RESIZE_CQ_MIN_REQ_LEN)
 		return -EINVAL;
 
 	if (entries > rf->max_cqe)
 		return -EINVAL;
 
-	if (!iwcq->user_mode) {
-		entries += 2;
-
-		if (!iwcq->sc_cq.cq_uk.avoid_mem_cflct &&
-		    dev->hw_attrs.uk_attrs.hw_rev >= IRDMA_GEN_2)
-			entries *= 2;
-
-		if (entries & 1)
-			entries += 1; /* cq size must be an even number */
-
-		cqe_size = iwcq->sc_cq.cq_uk.avoid_mem_cflct ? 64 : 32;
-		if (entries * cqe_size == IRDMA_HW_PAGE_SIZE)
-			entries += 2;
-	}
-
 	info.cq_size = max(entries, 4);
 
 	if (info.cq_size == iwcq->sc_cq.cq_uk.cq_size - 1)
 		return 0;
 
-	if (udata) {
-		struct irdma_resize_cq_req req = {};
-		struct irdma_ucontext *ucontext =
-			rdma_udata_to_drv_context(udata, struct irdma_ucontext,
-						  ibucontext);
-
-		/* CQ resize not supported with legacy GEN_1 libi40iw */
-		if (ucontext->legacy_mode)
-			return -EOPNOTSUPP;
+	/* CQ resize not supported with legacy GEN_1 libi40iw */
+	if (ucontext->legacy_mode)
+		return -EOPNOTSUPP;
 
-		if (ib_copy_from_udata(&req, udata,
-				       min(sizeof(req), udata->inlen)))
-			return -EINVAL;
+	if (ib_copy_from_udata(&req, udata, min(sizeof(req), udata->inlen)))
+		return -EINVAL;
 
-		spin_lock_irqsave(&ucontext->cq_reg_mem_list_lock, flags);
-		iwpbl_buf = irdma_get_pbl((unsigned long)req.user_cq_buffer,
-					  &ucontext->cq_reg_mem_list);
-		spin_unlock_irqrestore(&ucontext->cq_reg_mem_list_lock, flags);
+	spin_lock_irqsave(&ucontext->cq_reg_mem_list_lock, flags);
+	iwpbl_buf = irdma_get_pbl((unsigned long)req.user_cq_buffer,
+				  &ucontext->cq_reg_mem_list);
+	spin_unlock_irqrestore(&ucontext->cq_reg_mem_list_lock, flags);
 
-		if (!iwpbl_buf)
-			return -ENOMEM;
+	if (!iwpbl_buf)
+		return -ENOMEM;
 
-		cqmr_buf = &iwpbl_buf->cq_mr;
-		if (iwpbl_buf->pbl_allocated) {
-			info.virtual_map = true;
-			info.pbl_chunk_size = 1;
-			info.first_pm_pbl_idx = cqmr_buf->cq_pbl.idx;
-		} else {
-			info.cq_pa = cqmr_buf->cq_pbl.addr;
-		}
+	cqmr_buf = &iwpbl_buf->cq_mr;
+	if (iwpbl_buf->pbl_allocated) {
+		info.virtual_map = true;
+		info.pbl_chunk_size = 1;
+		info.first_pm_pbl_idx = cqmr_buf->cq_pbl.idx;
 	} else {
-		/* Kmode CQ resize */
-		int rsize;
-
-		rsize = info.cq_size * sizeof(struct irdma_cqe);
-		kmem_buf.size = ALIGN(round_up(rsize, 256), 256);
-		kmem_buf.va = dma_alloc_coherent(dev->hw->device,
-						 kmem_buf.size, &kmem_buf.pa,
-						 GFP_KERNEL);
-		if (!kmem_buf.va)
-			return -ENOMEM;
-
-		info.cq_base = kmem_buf.va;
-		info.cq_pa = kmem_buf.pa;
-		cq_buf = kzalloc(sizeof(*cq_buf), GFP_KERNEL);
-		if (!cq_buf) {
-			ret = -ENOMEM;
-			goto error;
-		}
+		info.cq_pa = cqmr_buf->cq_pbl.addr;
 	}
 
 	cqp_request = irdma_alloc_and_get_cqp_request(&rf->cqp, true);
@@ -2154,13 +2114,7 @@ static int irdma_resize_cq(struct ib_cq *ibcq, int entries,
 
 	return 0;
 error:
-	if (!udata) {
-		dma_free_coherent(dev->hw->device, kmem_buf.size, kmem_buf.va,
-				  kmem_buf.pa);
-		kmem_buf.va = NULL;
-	}
 	kfree(cq_buf);
-
 	return ret;
 }
 

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 35/50] RDMA/mlx4: Remove support for kernel CQ resize
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (33 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 34/50] RDMA/irdma: Remove resize support for " Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-13 10:58 ` [PATCH rdma-next 36/50] RDMA/mlx5: Remove support for resizing kernel CQs Leon Romanovsky
                   ` (16 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

No upper-layer protocol currently uses CQ resize, and the feature has no
active callers. Drop the unused functionality.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx4/cq.c | 167 +++++-----------------------------------
 1 file changed, 21 insertions(+), 146 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 83169060d120..05fad06b89c2 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -296,30 +296,6 @@ int mlx4_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	return err;
 }
 
-static int mlx4_alloc_resize_buf(struct mlx4_ib_dev *dev, struct mlx4_ib_cq *cq,
-				  int entries)
-{
-	int err;
-
-	if (cq->resize_buf)
-		return -EBUSY;
-
-	cq->resize_buf = kmalloc(sizeof *cq->resize_buf, GFP_KERNEL);
-	if (!cq->resize_buf)
-		return -ENOMEM;
-
-	err = mlx4_ib_alloc_cq_buf(dev, &cq->resize_buf->buf, entries);
-	if (err) {
-		kfree(cq->resize_buf);
-		cq->resize_buf = NULL;
-		return err;
-	}
-
-	cq->resize_buf->cqe = entries - 1;
-
-	return 0;
-}
-
 static int mlx4_alloc_resize_umem(struct mlx4_ib_dev *dev, struct mlx4_ib_cq *cq,
 				   int entries, struct ib_udata *udata)
 {
@@ -329,9 +305,6 @@ static int mlx4_alloc_resize_umem(struct mlx4_ib_dev *dev, struct mlx4_ib_cq *cq
 	int n;
 	int err;
 
-	if (cq->resize_umem)
-		return -EBUSY;
-
 	if (ib_copy_from_udata(&ucmd, udata, sizeof ucmd))
 		return -EFAULT;
 
@@ -371,91 +344,36 @@ static int mlx4_alloc_resize_umem(struct mlx4_ib_dev *dev, struct mlx4_ib_cq *cq
 
 err_umem:
 	ib_umem_release(cq->resize_umem);
-
+	cq->resize_umem = NULL;
 err_buf:
 	kfree(cq->resize_buf);
 	cq->resize_buf = NULL;
 	return err;
 }
 
-static int mlx4_ib_get_outstanding_cqes(struct mlx4_ib_cq *cq)
-{
-	u32 i;
-
-	i = cq->mcq.cons_index;
-	while (get_sw_cqe(cq, i))
-		++i;
-
-	return i - cq->mcq.cons_index;
-}
-
-static void mlx4_ib_cq_resize_copy_cqes(struct mlx4_ib_cq *cq)
-{
-	struct mlx4_cqe *cqe, *new_cqe;
-	int i;
-	int cqe_size = cq->buf.entry_size;
-	int cqe_inc = cqe_size == 64 ? 1 : 0;
-
-	i = cq->mcq.cons_index;
-	cqe = get_cqe(cq, i & cq->ibcq.cqe);
-	cqe += cqe_inc;
-
-	while ((cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) != MLX4_CQE_OPCODE_RESIZE) {
-		new_cqe = get_cqe_from_buf(&cq->resize_buf->buf,
-					   (i + 1) & cq->resize_buf->cqe);
-		memcpy(new_cqe, get_cqe(cq, i & cq->ibcq.cqe), cqe_size);
-		new_cqe += cqe_inc;
-
-		new_cqe->owner_sr_opcode = (cqe->owner_sr_opcode & ~MLX4_CQE_OWNER_MASK) |
-			(((i + 1) & (cq->resize_buf->cqe + 1)) ? MLX4_CQE_OWNER_MASK : 0);
-		cqe = get_cqe(cq, ++i & cq->ibcq.cqe);
-		cqe += cqe_inc;
-	}
-	++cq->mcq.cons_index;
-}
-
 int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
 {
 	struct mlx4_ib_dev *dev = to_mdev(ibcq->device);
 	struct mlx4_ib_cq *cq = to_mcq(ibcq);
 	struct mlx4_mtt mtt;
-	int outst_cqe;
 	int err;
 
-	mutex_lock(&cq->resize_mutex);
-	if (entries < 1 || entries > dev->dev->caps.max_cqes) {
-		err = -EINVAL;
-		goto out;
-	}
+	if (entries < 1 || entries > dev->dev->caps.max_cqes)
+		return -EINVAL;
 
 	entries = roundup_pow_of_two(entries + 1);
-	if (entries == ibcq->cqe + 1) {
-		err = 0;
-		goto out;
-	}
-
-	if (entries > dev->dev->caps.max_cqes + 1) {
-		err = -EINVAL;
-		goto out;
-	}
+	if (entries == ibcq->cqe + 1)
+		return 0;
 
-	if (ibcq->uobject) {
-		err = mlx4_alloc_resize_umem(dev, cq, entries, udata);
-		if (err)
-			goto out;
-	} else {
-		/* Can't be smaller than the number of outstanding CQEs */
-		outst_cqe = mlx4_ib_get_outstanding_cqes(cq);
-		if (entries < outst_cqe + 1) {
-			err = -EINVAL;
-			goto out;
-		}
+	if (entries > dev->dev->caps.max_cqes + 1)
+		return -EINVAL;
 
-		err = mlx4_alloc_resize_buf(dev, cq, entries);
-		if (err)
-			goto out;
+	mutex_lock(&cq->resize_mutex);
+	err = mlx4_alloc_resize_umem(dev, cq, entries, udata);
+	if (err) {
+		mutex_unlock(&cq->resize_mutex);
+		return err;
 	}
-
 	mtt = cq->buf.mtt;
 
 	err = mlx4_cq_resize(dev->dev, &cq->mcq, entries, &cq->resize_buf->buf.mtt);
@@ -463,52 +381,26 @@ int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
 		goto err_buf;
 
 	mlx4_mtt_cleanup(dev->dev, &mtt);
-	if (ibcq->uobject) {
-		cq->buf      = cq->resize_buf->buf;
-		cq->ibcq.cqe = cq->resize_buf->cqe;
-		ib_umem_release(cq->ibcq.umem);
-		cq->ibcq.umem     = cq->resize_umem;
-
-		kfree(cq->resize_buf);
-		cq->resize_buf = NULL;
-		cq->resize_umem = NULL;
-	} else {
-		struct mlx4_ib_cq_buf tmp_buf;
-		int tmp_cqe = 0;
-
-		spin_lock_irq(&cq->lock);
-		if (cq->resize_buf) {
-			mlx4_ib_cq_resize_copy_cqes(cq);
-			tmp_buf = cq->buf;
-			tmp_cqe = cq->ibcq.cqe;
-			cq->buf      = cq->resize_buf->buf;
-			cq->ibcq.cqe = cq->resize_buf->cqe;
-
-			kfree(cq->resize_buf);
-			cq->resize_buf = NULL;
-		}
-		spin_unlock_irq(&cq->lock);
+	cq->buf = cq->resize_buf->buf;
+	cq->ibcq.cqe = cq->resize_buf->cqe;
+	ib_umem_release(cq->ibcq.umem);
+	cq->ibcq.umem = cq->resize_umem;
 
-		if (tmp_cqe)
-			mlx4_ib_free_cq_buf(dev, &tmp_buf, tmp_cqe);
-	}
+	kfree(cq->resize_buf);
+	cq->resize_buf = NULL;
+	cq->resize_umem = NULL;
+	mutex_unlock(&cq->resize_mutex);
+	return 0;
 
-	goto out;
 
 err_buf:
 	mlx4_mtt_cleanup(dev->dev, &cq->resize_buf->buf.mtt);
-	if (!ibcq->uobject)
-		mlx4_ib_free_cq_buf(dev, &cq->resize_buf->buf,
-				    cq->resize_buf->cqe);
-
 	kfree(cq->resize_buf);
 	cq->resize_buf = NULL;
 
 	ib_umem_release(cq->resize_umem);
 	cq->resize_umem = NULL;
-out:
 	mutex_unlock(&cq->resize_mutex);
-
 	return err;
 }
 
@@ -707,7 +599,6 @@ static int mlx4_ib_poll_one(struct mlx4_ib_cq *cq,
 	u16 wqe_ctr;
 	unsigned tail = 0;
 
-repoll:
 	cqe = next_cqe_sw(cq);
 	if (!cqe)
 		return -EAGAIN;
@@ -727,22 +618,6 @@ static int mlx4_ib_poll_one(struct mlx4_ib_cq *cq,
 	is_error = (cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) ==
 		MLX4_CQE_OPCODE_ERROR;
 
-	/* Resize CQ in progress */
-	if (unlikely((cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) == MLX4_CQE_OPCODE_RESIZE)) {
-		if (cq->resize_buf) {
-			struct mlx4_ib_dev *dev = to_mdev(cq->ibcq.device);
-
-			mlx4_ib_free_cq_buf(dev, &cq->buf, cq->ibcq.cqe);
-			cq->buf      = cq->resize_buf->buf;
-			cq->ibcq.cqe = cq->resize_buf->cqe;
-
-			kfree(cq->resize_buf);
-			cq->resize_buf = NULL;
-		}
-
-		goto repoll;
-	}
-
 	if (!*cur_qp ||
 	    (be32_to_cpu(cqe->vlan_my_qpn) & MLX4_CQE_QPN_MASK) != (*cur_qp)->mqp.qpn) {
 		/*

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 36/50] RDMA/mlx5: Remove support for resizing kernel CQs
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (34 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 35/50] RDMA/mlx4: Remove support for kernel CQ resize Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-13 10:58 ` [PATCH rdma-next 37/50] RDMA/mthca: Remove resize support for " Leon Romanovsky
                   ` (15 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

No ULP users rely on CQ resize support, so drop the unused code.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/cq.c | 161 +++++-----------------------------------
 1 file changed, 18 insertions(+), 143 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 52a435efd0de..ce20af01cde0 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -74,11 +74,6 @@ static void *get_cqe(struct mlx5_ib_cq *cq, int n)
 	return mlx5_frag_buf_get_wqe(&cq->buf.fbc, n);
 }
 
-static u8 sw_ownership_bit(int n, int nent)
-{
-	return (n & nent) ? 1 : 0;
-}
-
 static void *get_sw_cqe(struct mlx5_ib_cq *cq, int n)
 {
 	void *cqe = get_cqe(cq, n & cq->ibcq.cqe);
@@ -1258,87 +1253,11 @@ static int resize_user(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq,
 	return 0;
 }
 
-static int resize_kernel(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq,
-			 int entries, int cqe_size)
-{
-	int err;
-
-	cq->resize_buf = kzalloc(sizeof(*cq->resize_buf), GFP_KERNEL);
-	if (!cq->resize_buf)
-		return -ENOMEM;
-
-	err = alloc_cq_frag_buf(dev, cq->resize_buf, entries, cqe_size);
-	if (err)
-		goto ex;
-
-	init_cq_frag_buf(cq->resize_buf);
-
-	return 0;
-
-ex:
-	kfree(cq->resize_buf);
-	return err;
-}
-
-static int copy_resize_cqes(struct mlx5_ib_cq *cq)
-{
-	struct mlx5_ib_dev *dev = to_mdev(cq->ibcq.device);
-	struct mlx5_cqe64 *scqe64;
-	struct mlx5_cqe64 *dcqe64;
-	void *start_cqe;
-	void *scqe;
-	void *dcqe;
-	int ssize;
-	int dsize;
-	int i;
-	u8 sw_own;
-
-	ssize = cq->buf.cqe_size;
-	dsize = cq->resize_buf->cqe_size;
-	if (ssize != dsize) {
-		mlx5_ib_warn(dev, "resize from different cqe size is not supported\n");
-		return -EINVAL;
-	}
-
-	i = cq->mcq.cons_index;
-	scqe = get_sw_cqe(cq, i);
-	scqe64 = ssize == 64 ? scqe : scqe + 64;
-	start_cqe = scqe;
-	if (!scqe) {
-		mlx5_ib_warn(dev, "expected cqe in sw ownership\n");
-		return -EINVAL;
-	}
-
-	while (get_cqe_opcode(scqe64) != MLX5_CQE_RESIZE_CQ) {
-		dcqe = mlx5_frag_buf_get_wqe(&cq->resize_buf->fbc,
-					     (i + 1) & cq->resize_buf->nent);
-		dcqe64 = dsize == 64 ? dcqe : dcqe + 64;
-		sw_own = sw_ownership_bit(i + 1, cq->resize_buf->nent);
-		memcpy(dcqe, scqe, dsize);
-		dcqe64->op_own = (dcqe64->op_own & ~MLX5_CQE_OWNER_MASK) | sw_own;
-
-		++i;
-		scqe = get_sw_cqe(cq, i);
-		scqe64 = ssize == 64 ? scqe : scqe + 64;
-		if (!scqe) {
-			mlx5_ib_warn(dev, "expected cqe in sw ownership\n");
-			return -EINVAL;
-		}
-
-		if (scqe == start_cqe) {
-			pr_warn("resize CQ failed to get resize CQE, CQN 0x%x\n",
-				cq->mcq.cqn);
-			return -ENOMEM;
-		}
-	}
-	++cq->mcq.cons_index;
-	return 0;
-}
-
 int mlx5_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
 {
 	struct mlx5_ib_dev *dev = to_mdev(ibcq->device);
 	struct mlx5_ib_cq *cq = to_mcq(ibcq);
+	unsigned long page_size;
 	void *cqc;
 	u32 *in;
 	int err;
@@ -1348,7 +1267,6 @@ int mlx5_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
 	unsigned int page_shift;
 	int inlen;
 	int cqe_size;
-	unsigned long flags;
 
 	if (!MLX5_CAP_GEN(dev->mdev, cq_resize)) {
 		pr_info("Firmware does not support resize CQ\n");
@@ -1371,34 +1289,19 @@ int mlx5_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
 		return 0;
 
 	mutex_lock(&cq->resize_mutex);
-	if (udata) {
-		unsigned long page_size;
-
-		err = resize_user(dev, cq, entries, udata, &cqe_size);
-		if (err)
-			goto ex;
-
-		page_size = mlx5_umem_find_best_cq_quantized_pgoff(
-			cq->resize_umem, cqc, log_page_size,
-			MLX5_ADAPTER_PAGE_SHIFT, page_offset, 64,
-			&page_offset_quantized);
-		if (!page_size) {
-			err = -EINVAL;
-			goto ex_resize;
-		}
-		npas = ib_umem_num_dma_blocks(cq->resize_umem, page_size);
-		page_shift = order_base_2(page_size);
-	} else {
-		struct mlx5_frag_buf *frag_buf;
+	err = resize_user(dev, cq, entries, udata, &cqe_size);
+	if (err)
+		goto ex;
 
-		cqe_size = 64;
-		err = resize_kernel(dev, cq, entries, cqe_size);
-		if (err)
-			goto ex;
-		frag_buf = &cq->resize_buf->frag_buf;
-		npas = frag_buf->npages;
-		page_shift = frag_buf->page_shift;
+	page_size = mlx5_umem_find_best_cq_quantized_pgoff(
+		cq->resize_umem, cqc, log_page_size, MLX5_ADAPTER_PAGE_SHIFT,
+		page_offset, 64, &page_offset_quantized);
+	if (!page_size) {
+		err = -EINVAL;
+		goto ex_resize;
 	}
+	npas = ib_umem_num_dma_blocks(cq->resize_umem, page_size);
+	page_shift = order_base_2(page_size);
 
 	inlen = MLX5_ST_SZ_BYTES(modify_cq_in) +
 		MLX5_FLD_SZ_BYTES(modify_cq_in, pas[0]) * npas;
@@ -1410,11 +1313,7 @@ int mlx5_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
 	}
 
 	pas = (__be64 *)MLX5_ADDR_OF(modify_cq_in, in, pas);
-	if (udata)
-		mlx5_ib_populate_pas(cq->resize_umem, 1UL << page_shift, pas,
-				     0);
-	else
-		mlx5_fill_page_frag_array(&cq->resize_buf->frag_buf, pas);
+	mlx5_ib_populate_pas(cq->resize_umem, 1UL << page_shift, pas, 0);
 
 	MLX5_SET(modify_cq_in, in,
 		 modify_field_select_resize_field_select.resize_field_select.resize_field_select,
@@ -1440,31 +1339,10 @@ int mlx5_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
 	if (err)
 		goto ex_alloc;
 
-	if (udata) {
-		cq->ibcq.cqe = entries - 1;
-		ib_umem_release(cq->ibcq.umem);
-		cq->ibcq.umem = cq->resize_umem;
-		cq->resize_umem = NULL;
-	} else {
-		struct mlx5_ib_cq_buf tbuf;
-		int resized = 0;
-
-		spin_lock_irqsave(&cq->lock, flags);
-		if (cq->resize_buf) {
-			err = copy_resize_cqes(cq);
-			if (!err) {
-				tbuf = cq->buf;
-				cq->buf = *cq->resize_buf;
-				kfree(cq->resize_buf);
-				cq->resize_buf = NULL;
-				resized = 1;
-			}
-		}
-		cq->ibcq.cqe = entries - 1;
-		spin_unlock_irqrestore(&cq->lock, flags);
-		if (resized)
-			free_cq_buf(dev, &tbuf);
-	}
+	cq->ibcq.cqe = entries - 1;
+	ib_umem_release(cq->ibcq.umem);
+	cq->ibcq.umem = cq->resize_umem;
+	cq->resize_umem = NULL;
 	mutex_unlock(&cq->resize_mutex);
 
 	kvfree(in);
@@ -1475,10 +1353,7 @@ int mlx5_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
 
 ex_resize:
 	ib_umem_release(cq->resize_umem);
-	if (!udata) {
-		free_cq_buf(dev, cq->resize_buf);
-		cq->resize_buf = NULL;
-	}
+	cq->resize_umem = NULL;
 ex:
 	mutex_unlock(&cq->resize_mutex);
 	return err;

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 37/50] RDMA/mthca: Remove resize support for kernel CQs
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (35 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 36/50] RDMA/mlx5: Remove support for resizing kernel CQs Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-13 10:58 ` [PATCH rdma-next 38/50] RDMA/rdmavt: " Leon Romanovsky
                   ` (14 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

The CQ resize operation is a uverbs-only interface and is not required for
kernel-created CQs. Drop this unused functionality.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mthca/mthca_provider.c | 102 ++-------------------------
 1 file changed, 6 insertions(+), 96 deletions(-)

diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 8920deceea73..fd306a229318 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -626,8 +626,6 @@ static int mthca_create_user_cq(struct ib_cq *ibcq,
 		goto err_unmap_arm;
 	}
 
-	cq->resize_buf = NULL;
-
 	return 0;
 
 err_unmap_arm:
@@ -667,53 +665,6 @@ static int mthca_create_cq(struct ib_cq *ibcq,
 	if (err)
 		return err;
 
-	cq->resize_buf = NULL;
-
-	return 0;
-}
-
-static int mthca_alloc_resize_buf(struct mthca_dev *dev, struct mthca_cq *cq,
-				  int entries)
-{
-	int ret;
-
-	spin_lock_irq(&cq->lock);
-	if (cq->resize_buf) {
-		ret = -EBUSY;
-		goto unlock;
-	}
-
-	cq->resize_buf = kmalloc(sizeof *cq->resize_buf, GFP_ATOMIC);
-	if (!cq->resize_buf) {
-		ret = -ENOMEM;
-		goto unlock;
-	}
-
-	cq->resize_buf->state = CQ_RESIZE_ALLOC;
-
-	ret = 0;
-
-unlock:
-	spin_unlock_irq(&cq->lock);
-
-	if (ret)
-		return ret;
-
-	ret = mthca_alloc_cq_buf(dev, &cq->resize_buf->buf, entries);
-	if (ret) {
-		spin_lock_irq(&cq->lock);
-		kfree(cq->resize_buf);
-		cq->resize_buf = NULL;
-		spin_unlock_irq(&cq->lock);
-		return ret;
-	}
-
-	cq->resize_buf->cqe = entries - 1;
-
-	spin_lock_irq(&cq->lock);
-	cq->resize_buf->state = CQ_RESIZE_READY;
-	spin_unlock_irq(&cq->lock);
-
 	return 0;
 }
 
@@ -736,60 +687,19 @@ static int mthca_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *uda
 		goto out;
 	}
 
-	if (cq->is_kernel) {
-		ret = mthca_alloc_resize_buf(dev, cq, entries);
-		if (ret)
-			goto out;
-		lkey = cq->resize_buf->buf.mr.ibmr.lkey;
-	} else {
-		if (ib_copy_from_udata(&ucmd, udata, sizeof ucmd)) {
-			ret = -EFAULT;
-			goto out;
-		}
-		lkey = ucmd.lkey;
+	if (ib_copy_from_udata(&ucmd, udata, sizeof ucmd)) {
+		ret = -EFAULT;
+		goto out;
 	}
+	lkey = ucmd.lkey;
 
 	ret = mthca_RESIZE_CQ(dev, cq->cqn, lkey, ilog2(entries));
-
-	if (ret) {
-		if (cq->resize_buf) {
-			mthca_free_cq_buf(dev, &cq->resize_buf->buf,
-					  cq->resize_buf->cqe);
-			kfree(cq->resize_buf);
-			spin_lock_irq(&cq->lock);
-			cq->resize_buf = NULL;
-			spin_unlock_irq(&cq->lock);
-		}
+	if (ret)
 		goto out;
-	}
-
-	if (cq->is_kernel) {
-		struct mthca_cq_buf tbuf;
-		int tcqe;
-
-		spin_lock_irq(&cq->lock);
-		if (cq->resize_buf->state == CQ_RESIZE_READY) {
-			mthca_cq_resize_copy_cqes(cq);
-			tbuf         = cq->buf;
-			tcqe         = cq->ibcq.cqe;
-			cq->buf      = cq->resize_buf->buf;
-			cq->ibcq.cqe = cq->resize_buf->cqe;
-		} else {
-			tbuf = cq->resize_buf->buf;
-			tcqe = cq->resize_buf->cqe;
-		}
-
-		kfree(cq->resize_buf);
-		cq->resize_buf = NULL;
-		spin_unlock_irq(&cq->lock);
-
-		mthca_free_cq_buf(dev, &tbuf, tcqe);
-	} else
-		ibcq->cqe = entries - 1;
 
+	ibcq->cqe = entries - 1;
 out:
 	mutex_unlock(&cq->mutex);
-
 	return ret;
 }
 

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 38/50] RDMA/rdmavt: Remove resize support for kernel CQs
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (36 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 37/50] RDMA/mthca: Remove resize support for " Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-13 10:58 ` [PATCH rdma-next 39/50] RDMA/rxe: Remove unused kernel-side CQ resize support Leon Romanovsky
                   ` (13 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

The CQ resize operation is a uverbs-only interface and is not needed for
CQs created by the kernel. Remove this unused functionality.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/sw/rdmavt/cq.c | 70 ++++++++++++---------------------------
 1 file changed, 21 insertions(+), 49 deletions(-)

diff --git a/drivers/infiniband/sw/rdmavt/cq.c b/drivers/infiniband/sw/rdmavt/cq.c
index db86eb026bb3..1ae5d8c86acb 100644
--- a/drivers/infiniband/sw/rdmavt/cq.c
+++ b/drivers/infiniband/sw/rdmavt/cq.c
@@ -408,51 +408,36 @@ int rvt_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
 	struct rvt_dev_info *rdi = cq->rdi;
 	struct rvt_cq_wc *u_wc = NULL;
 	struct rvt_cq_wc *old_u_wc = NULL;
-	struct rvt_k_cq_wc *k_wc = NULL;
-	struct rvt_k_cq_wc *old_k_wc = NULL;
+	__u64 offset = 0;
 
 	if (cqe < 1 || cqe > rdi->dparms.props.max_cqe)
 		return -EINVAL;
 
+	if (udata->outlen < sizeof(__u64))
+		return -EINVAL;
+
 	/*
 	 * Need to use vmalloc() if we want to support large #s of entries.
 	 */
-	if (udata && udata->outlen >= sizeof(__u64)) {
-		sz = sizeof(struct ib_uverbs_wc) * (cqe + 1);
-		sz += sizeof(*u_wc);
-		u_wc = vmalloc_user(sz);
-		if (!u_wc)
-			return -ENOMEM;
-	} else {
-		sz = sizeof(struct ib_wc) * (cqe + 1);
-		sz += sizeof(*k_wc);
-		k_wc = vzalloc_node(sz, rdi->dparms.node);
-		if (!k_wc)
-			return -ENOMEM;
-	}
-	/* Check that we can write the offset to mmap. */
-	if (udata && udata->outlen >= sizeof(__u64)) {
-		__u64 offset = 0;
+	sz = sizeof(struct ib_uverbs_wc) * (cqe + 1);
+	sz += sizeof(*u_wc);
+	u_wc = vmalloc_user(sz);
+	if (!u_wc)
+		return -ENOMEM;
 
-		ret = ib_copy_to_udata(udata, &offset, sizeof(offset));
-		if (ret)
-			goto bail_free;
-	}
+	/* Check that we can write the offset to mmap. */
+	ret = ib_copy_to_udata(udata, &offset, sizeof(offset));
+	if (ret)
+		goto bail_free;
 
 	spin_lock_irq(&cq->lock);
 	/*
 	 * Make sure head and tail are sane since they
 	 * might be user writable.
 	 */
-	if (u_wc) {
-		old_u_wc = cq->queue;
-		head = RDMA_READ_UAPI_ATOMIC(old_u_wc->head);
-		tail = RDMA_READ_UAPI_ATOMIC(old_u_wc->tail);
-	} else {
-		old_k_wc = cq->kqueue;
-		head = old_k_wc->head;
-		tail = old_k_wc->tail;
-	}
+	old_u_wc = cq->queue;
+	head = RDMA_READ_UAPI_ATOMIC(old_u_wc->head);
+	tail = RDMA_READ_UAPI_ATOMIC(old_u_wc->tail);
 
 	if (head > (u32)cq->ibcq.cqe)
 		head = (u32)cq->ibcq.cqe;
@@ -467,31 +452,19 @@ int rvt_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
 		goto bail_unlock;
 	}
 	for (n = 0; tail != head; n++) {
-		if (u_wc)
-			u_wc->uqueue[n] = old_u_wc->uqueue[tail];
-		else
-			k_wc->kqueue[n] = old_k_wc->kqueue[tail];
+		u_wc->uqueue[n] = old_u_wc->uqueue[tail];
 		if (tail == (u32)cq->ibcq.cqe)
 			tail = 0;
 		else
 			tail++;
 	}
 	cq->ibcq.cqe = cqe;
-	if (u_wc) {
-		RDMA_WRITE_UAPI_ATOMIC(u_wc->head, n);
-		RDMA_WRITE_UAPI_ATOMIC(u_wc->tail, 0);
-		cq->queue = u_wc;
-	} else {
-		k_wc->head = n;
-		k_wc->tail = 0;
-		cq->kqueue = k_wc;
-	}
+	RDMA_WRITE_UAPI_ATOMIC(u_wc->head, n);
+	RDMA_WRITE_UAPI_ATOMIC(u_wc->tail, 0);
+	cq->queue = u_wc;
 	spin_unlock_irq(&cq->lock);
 
-	if (u_wc)
-		vfree(old_u_wc);
-	else
-		vfree(old_k_wc);
+	vfree(old_u_wc);
 
 	if (cq->ip) {
 		struct rvt_mmap_info *ip = cq->ip;
@@ -521,7 +494,6 @@ int rvt_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
 	spin_unlock_irq(&cq->lock);
 bail_free:
 	vfree(u_wc);
-	vfree(k_wc);
 
 	return ret;
 }

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 39/50] RDMA/rxe: Remove unused kernel-side CQ resize support
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (37 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 38/50] RDMA/rdmavt: " Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-13 10:58 ` [PATCH rdma-next 40/50] RDMA: Properly propagate the number of CQEs as unsigned int Leon Romanovsky
                   ` (12 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

CQ resizing is only used by uverbs; the kernel-side CQ resize path has
no users and can be removed.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/sw/rxe/rxe_verbs.c | 27 +++++++--------------------
 1 file changed, 7 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 72e3019ed1cb..bc7c77ff3d90 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1146,32 +1146,19 @@ static int rxe_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
 	struct rxe_resize_cq_resp __user *uresp = NULL;
 	int err;
 
-	if (udata) {
-		if (udata->outlen < sizeof(*uresp)) {
-			err = -EINVAL;
-			rxe_dbg_cq(cq, "malformed udata\n");
-			goto err_out;
-		}
-		uresp = udata->outbuf;
-	}
+	if (udata->outlen < sizeof(*uresp))
+		return -EINVAL;
+	uresp = udata->outbuf;
 
 	err = rxe_cq_chk_attr(rxe, cq, cqe, 0);
-	if (err) {
-		rxe_dbg_cq(cq, "bad attr, err = %d\n", err);
-		goto err_out;
-	}
+	if (err)
+		return err;
 
 	err = rxe_cq_resize_queue(cq, cqe, uresp, udata);
-	if (err) {
-		rxe_dbg_cq(cq, "resize cq failed, err = %d\n", err);
-		goto err_out;
-	}
+	if (err)
+		return err;
 
 	return 0;
-
-err_out:
-	rxe_err_cq(cq, "returned err = %d\n", err);
-	return err;
 }
 
 static int rxe_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc)

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 40/50] RDMA: Properly propagate the number of CQEs as unsigned int
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (38 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 39/50] RDMA/rxe: Remove unused kernel‑side CQ resize support Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-13 10:58 ` [PATCH rdma-next 41/50] RDMA/core: Generalize CQ resize locking Leon Romanovsky
                   ` (11 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

Instead of checking in every driver whether the number of CQEs is
negative or zero, fix the .resize_user_cq() declaration to use
unsigned int, which better reflects the expected value range. The zero
check is then handled once in ib_uverbs, and any value that previously
appeared negative wraps to a large unsigned number that the drivers'
existing maximum-size checks reject.
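
To see why no explicit negative check is needed once the prototype is
unsigned, here is a small stand-alone C sketch (the values are
illustrative, not taken from any driver):

#include <stdio.h>

int main(void)
{
	int user_cqe = -1;                /* buggy or hostile input */
	unsigned int cqe = (unsigned int)user_cqe;
	unsigned int max_cq_wqes = 65536; /* illustrative device limit */

	/* -1 wraps to UINT_MAX, so the drivers' existing "cqe > max"
	 * checks reject it without a separate negativity test.
	 */
	printf("cqe=%u rejected=%d\n", cqe, cqe > max_cq_wqes);
	return 0;
}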

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/uverbs_cmd.c         |  3 +++
 drivers/infiniband/hw/bnxt_re/ib_verbs.c     |  8 +++----
 drivers/infiniband/hw/bnxt_re/ib_verbs.h     |  3 ++-
 drivers/infiniband/hw/irdma/verbs.c          |  2 +-
 drivers/infiniband/hw/mlx4/cq.c              |  5 +++--
 drivers/infiniband/hw/mlx4/mlx4_ib.h         |  3 ++-
 drivers/infiniband/hw/mlx5/cq.c              | 10 +++------
 drivers/infiniband/hw/mlx5/mlx5_ib.h         |  3 ++-
 drivers/infiniband/hw/mthca/mthca_provider.c |  5 +++--
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  | 12 +++++------
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h  |  3 ++-
 drivers/infiniband/sw/rdmavt/cq.c            | 10 ++-------
 drivers/infiniband/sw/rdmavt/cq.h            |  2 +-
 drivers/infiniband/sw/rxe/rxe_cq.c           | 31 ----------------------------
 drivers/infiniband/sw/rxe/rxe_loc.h          |  3 ---
 drivers/infiniband/sw/rxe/rxe_verbs.c        |  9 ++++----
 include/rdma/ib_verbs.h                      |  2 +-
 17 files changed, 38 insertions(+), 76 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 57697738fd25..b4b0c7c92fb1 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1147,6 +1147,9 @@ static int ib_uverbs_resize_cq(struct uverbs_attr_bundle *attrs)
 	if (ret)
 		return ret;
 
+	if (!cmd.cqe)
+		return -EINVAL;
+
 	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs);
 	if (IS_ERR(cq))
 		return PTR_ERR(cq);
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index 16bb586d68c7..d652018c19b3 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -3324,7 +3324,8 @@ static void bnxt_re_resize_cq_complete(struct bnxt_re_cq *cq)
 	}
 }
 
-int bnxt_re_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
+int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
+		      struct ib_udata *udata)
 {
 	struct bnxt_qplib_sg_info sg_info = {};
 	struct bnxt_qplib_dpi *orig_dpi = NULL;
@@ -3346,11 +3347,8 @@ int bnxt_re_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
 	}
 
 	/* Check the requested cq depth out of supported depth */
-	if (cqe < 1 || cqe > dev_attr->max_cq_wqes) {
-		ibdev_err(&rdev->ibdev, "Resize CQ %#x failed - out of range cqe %d",
-			  cq->qplib_cq.id, cqe);
+	if (cqe > dev_attr->max_cq_wqes)
 		return -EINVAL;
-	}
 
 	uctx = rdma_udata_to_drv_context(udata, struct bnxt_re_ucontext, ib_uctx);
 	entries = bnxt_re_init_depth(cqe + 1, uctx);
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.h b/drivers/infiniband/hw/bnxt_re/ib_verbs.h
index cac3e10b73f6..7890d6ebad90 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.h
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.h
@@ -249,7 +249,8 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 int bnxt_re_create_user_cq(struct ib_cq *ibcq,
 			   const struct ib_cq_init_attr *attr,
 			   struct uverbs_attr_bundle *attrs);
-int bnxt_re_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata);
+int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
+		      struct ib_udata *udata);
 int bnxt_re_destroy_cq(struct ib_cq *cq, struct ib_udata *udata);
 int bnxt_re_poll_cq(struct ib_cq *cq, int num_entries, struct ib_wc *wc);
 int bnxt_re_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify_flags flags);
diff --git a/drivers/infiniband/hw/irdma/verbs.c b/drivers/infiniband/hw/irdma/verbs.c
index d5442aebf1ac..f20f53ecd869 100644
--- a/drivers/infiniband/hw/irdma/verbs.c
+++ b/drivers/infiniband/hw/irdma/verbs.c
@@ -2012,7 +2012,7 @@ static int irdma_destroy_cq(struct ib_cq *ib_cq, struct ib_udata *udata)
  * @entries: desired cq size
  * @udata: user data
  */
-static int irdma_resize_cq(struct ib_cq *ibcq, int entries,
+static int irdma_resize_cq(struct ib_cq *ibcq, unsigned int entries,
 			   struct ib_udata *udata)
 {
 	struct irdma_resize_cq_req req = {};
diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 05fad06b89c2..f4595afced45 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -351,14 +351,15 @@ static int mlx4_alloc_resize_umem(struct mlx4_ib_dev *dev, struct mlx4_ib_cq *cq
 	return err;
 }
 
-int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
+int mlx4_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries,
+		      struct ib_udata *udata)
 {
 	struct mlx4_ib_dev *dev = to_mdev(ibcq->device);
 	struct mlx4_ib_cq *cq = to_mcq(ibcq);
 	struct mlx4_mtt mtt;
 	int err;
 
-	if (entries < 1 || entries > dev->dev->caps.max_cqes)
+	if (entries > dev->dev->caps.max_cqes)
 		return -EINVAL;
 
 	entries = roundup_pow_of_two(entries + 1);
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 6a7ed5225c7d..5a799d6df93e 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -767,7 +767,8 @@ struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd, enum ib_mr_type mr_type,
 int mlx4_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
 		      unsigned int *sg_offset);
 int mlx4_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period);
-int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata);
+int mlx4_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries,
+		      struct ib_udata *udata);
 int mlx4_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		      struct uverbs_attr_bundle *attrs);
 int mlx4_ib_create_user_cq(struct ib_cq *ibcq,
diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index ce20af01cde0..78c3494517d7 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -1253,7 +1253,8 @@ static int resize_user(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq,
 	return 0;
 }
 
-int mlx5_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
+int mlx5_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries,
+		      struct ib_udata *udata)
 {
 	struct mlx5_ib_dev *dev = to_mdev(ibcq->device);
 	struct mlx5_ib_cq *cq = to_mcq(ibcq);
@@ -1273,13 +1274,8 @@ int mlx5_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
 		return -ENOSYS;
 	}
 
-	if (entries < 1 ||
-	    entries > (1 << MLX5_CAP_GEN(dev->mdev, log_max_cq_sz))) {
-		mlx5_ib_warn(dev, "wrong entries number %d, max %d\n",
-			     entries,
-			     1 << MLX5_CAP_GEN(dev->mdev, log_max_cq_sz));
+	if (entries > (1 << MLX5_CAP_GEN(dev->mdev, log_max_cq_sz)))
 		return -EINVAL;
-	}
 
 	entries = roundup_pow_of_two(entries + 1);
 	if (entries > (1 << MLX5_CAP_GEN(dev->mdev, log_max_cq_sz)) + 1)
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 2556e326afde..e99a647ed62d 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1380,7 +1380,8 @@ int mlx5_ib_pre_destroy_cq(struct ib_cq *cq);
 void mlx5_ib_post_destroy_cq(struct ib_cq *cq);
 int mlx5_ib_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags);
 int mlx5_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period);
-int mlx5_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata);
+int mlx5_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries,
+		      struct ib_udata *udata);
 struct ib_mr *mlx5_ib_get_dma_mr(struct ib_pd *pd, int acc);
 struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 				  u64 virt_addr, int access_flags,
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index fd306a229318..85de004547ab 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -668,7 +668,8 @@ static int mthca_create_cq(struct ib_cq *ibcq,
 	return 0;
 }
 
-static int mthca_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
+static int mthca_resize_cq(struct ib_cq *ibcq, unsigned int entries,
+			   struct ib_udata *udata)
 {
 	struct mthca_dev *dev = to_mdev(ibcq->device);
 	struct mthca_cq *cq = to_mcq(ibcq);
@@ -676,7 +677,7 @@ static int mthca_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *uda
 	u32 lkey;
 	int ret;
 
-	if (entries < 1 || entries > dev->limits.max_cqes)
+	if (entries > dev->limits.max_cqes)
 		return -EINVAL;
 
 	mutex_lock(&cq->mutex);
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 034d8b937a77..8445780c398f 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -1035,18 +1035,16 @@ int ocrdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	return 0;
 }
 
-int ocrdma_resize_cq(struct ib_cq *ibcq, int new_cnt,
+int ocrdma_resize_cq(struct ib_cq *ibcq, unsigned int new_cnt,
 		     struct ib_udata *udata)
 {
-	int status = 0;
 	struct ocrdma_cq *cq = get_ocrdma_cq(ibcq);
 
-	if (new_cnt < 1 || new_cnt > cq->max_hw_cqe) {
-		status = -EINVAL;
-		return status;
-	}
+	if (new_cnt > cq->max_hw_cqe)
+		return -EINVAL;
+
 	ibcq->cqe = new_cnt;
-	return status;
+	return 0;
 }
 
 static void ocrdma_flush_cq(struct ocrdma_cq *cq)
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index 4a572608fd9f..bbc08f88c046 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -74,7 +74,8 @@ int ocrdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 int ocrdma_create_user_cq(struct ib_cq *ibcq,
 			  const struct ib_cq_init_attr *attr,
 			  struct uverbs_attr_bundle *attrs);
-int ocrdma_resize_cq(struct ib_cq *, int cqe, struct ib_udata *);
+int ocrdma_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
+		     struct ib_udata *udata);
 int ocrdma_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata);
 
 int ocrdma_create_qp(struct ib_qp *qp, struct ib_qp_init_attr *attrs,
diff --git a/drivers/infiniband/sw/rdmavt/cq.c b/drivers/infiniband/sw/rdmavt/cq.c
index 1ae5d8c86acb..7be79274bafb 100644
--- a/drivers/infiniband/sw/rdmavt/cq.c
+++ b/drivers/infiniband/sw/rdmavt/cq.c
@@ -393,13 +393,7 @@ int rvt_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags notify_flags)
 	return ret;
 }
 
-/*
- * rvt_resize_cq - change the size of the CQ
- * @ibcq: the completion queue
- *
- * Return: 0 for success.
- */
-int rvt_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
+int rvt_resize_cq(struct ib_cq *ibcq, unsigned int cqe, struct ib_udata *udata)
 {
 	struct rvt_cq *cq = ibcq_to_rvtcq(ibcq);
 	u32 head, tail, n;
@@ -410,7 +404,7 @@ int rvt_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
 	struct rvt_cq_wc *old_u_wc = NULL;
 	__u64 offset = 0;
 
-	if (cqe < 1 || cqe > rdi->dparms.props.max_cqe)
+	if (cqe > rdi->dparms.props.max_cqe)
 		return -EINVAL;
 
 	if (udata->outlen < sizeof(__u64))
diff --git a/drivers/infiniband/sw/rdmavt/cq.h b/drivers/infiniband/sw/rdmavt/cq.h
index 14ee2705c443..3827c0e6a0fb 100644
--- a/drivers/infiniband/sw/rdmavt/cq.h
+++ b/drivers/infiniband/sw/rdmavt/cq.h
@@ -15,7 +15,7 @@ int rvt_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		       struct uverbs_attr_bundle *attrs);
 int rvt_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata);
 int rvt_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags notify_flags);
-int rvt_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata);
+int rvt_resize_cq(struct ib_cq *ibcq, unsigned int cqe, struct ib_udata *udata);
 int rvt_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry);
 int rvt_driver_cq_init(void);
 void rvt_cq_exit(void);
diff --git a/drivers/infiniband/sw/rxe/rxe_cq.c b/drivers/infiniband/sw/rxe/rxe_cq.c
index fffd144d509e..eaf7802a5cbe 100644
--- a/drivers/infiniband/sw/rxe/rxe_cq.c
+++ b/drivers/infiniband/sw/rxe/rxe_cq.c
@@ -8,37 +8,6 @@
 #include "rxe_loc.h"
 #include "rxe_queue.h"
 
-int rxe_cq_chk_attr(struct rxe_dev *rxe, struct rxe_cq *cq,
-		    int cqe, int comp_vector)
-{
-	int count;
-
-	if (cqe <= 0) {
-		rxe_dbg_dev(rxe, "cqe(%d) <= 0\n", cqe);
-		goto err1;
-	}
-
-	if (cqe > rxe->attr.max_cqe) {
-		rxe_dbg_dev(rxe, "cqe(%d) > max_cqe(%d)\n",
-				cqe, rxe->attr.max_cqe);
-		goto err1;
-	}
-
-	if (cq) {
-		count = queue_count(cq->queue, QUEUE_TYPE_TO_CLIENT);
-		if (cqe < count) {
-			rxe_dbg_cq(cq, "cqe(%d) < current # elements in queue (%d)\n",
-					cqe, count);
-			goto err1;
-		}
-	}
-
-	return 0;
-
-err1:
-	return -EINVAL;
-}
-
 int rxe_cq_from_init(struct rxe_dev *rxe, struct rxe_cq *cq, int cqe,
 		     int comp_vector, struct ib_udata *udata,
 		     struct rxe_create_cq_resp __user *uresp)
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 7992290886e1..e095c12699cb 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -18,9 +18,6 @@ void rxe_av_fill_ip_info(struct rxe_av *av, struct rdma_ah_attr *attr);
 struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt, struct rxe_ah **ahp);
 
 /* rxe_cq.c */
-int rxe_cq_chk_attr(struct rxe_dev *rxe, struct rxe_cq *cq,
-		    int cqe, int comp_vector);
-
 int rxe_cq_from_init(struct rxe_dev *rxe, struct rxe_cq *cq, int cqe,
 		     int comp_vector, struct ib_udata *udata,
 		     struct rxe_create_cq_resp __user *uresp);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index bc7c77ff3d90..f57b4ba22a4f 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1139,7 +1139,8 @@ static int rxe_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	return err;
 }
 
-static int rxe_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
+static int rxe_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
+			 struct ib_udata *udata)
 {
 	struct rxe_cq *cq = to_rcq(ibcq);
 	struct rxe_dev *rxe = to_rdev(ibcq->device);
@@ -1150,9 +1151,9 @@ static int rxe_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
 		return -EINVAL;
 	uresp = udata->outbuf;
 
-	err = rxe_cq_chk_attr(rxe, cq, cqe, 0);
-	if (err)
-		return err;
+	if (cqe > rxe->attr.max_cqe ||
+	    cqe < queue_count(cq->queue, QUEUE_TYPE_TO_CLIENT))
+		return -EINVAL;
 
 	err = rxe_cq_resize_queue(cq, cqe, uresp, udata);
 	if (err)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 94bb3cc4c67a..7d32d02c35e3 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2534,7 +2534,7 @@ struct ib_device_ops {
 			      struct uverbs_attr_bundle *attrs);
 	int (*modify_cq)(struct ib_cq *cq, u16 cq_count, u16 cq_period);
 	int (*destroy_cq)(struct ib_cq *cq, struct ib_udata *udata);
-	int (*resize_user_cq)(struct ib_cq *cq, int cqe,
+	int (*resize_user_cq)(struct ib_cq *cq, unsigned int cqe,
 			      struct ib_udata *udata);
 	/*
 	 * pre_destroy_cq - Prevent a cq from generating any new work

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 41/50] RDMA/core: Generalize CQ resize locking
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (39 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 40/50] RDMA: Properly propagate the number of CQEs as unsigned int Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-13 10:58 ` [PATCH rdma-next 42/50] RDMA/bnxt_re: Complete CQ resize in a single step Leon Romanovsky
                   ` (10 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

The CQ resize path must be protected from concurrent execution because it
updates in-kernel objects. Some drivers did not provide any locking,
leading to inconsistent behavior.

Rely on the core mutex for synchronization and drop the various ad‑hoc
locking implementations in individual drivers.
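
The hunks below only add the mutex's storage and its init/destroy; as
a minimal sketch, assuming the core simply brackets the driver
callback in the uverbs resize handler, the intended call site would
look like this (hypothetical, not part of the diff):

	/* Serialize resize requests on the same CQ. Only uverbs CQs
	 * embed resize_mutex (it shares a union with the kernel-CQ
	 * completion machinery), so this runs only on the user path,
	 * where udata is guaranteed to be present.
	 */
	mutex_lock(&cq->resize_mutex);
	ret = cq->device->ops.resize_user_cq(cq, cmd.cqe,
					     &attrs->driver_udata);
	mutex_unlock(&cq->resize_mutex);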

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/uverbs_cmd.c          | 1 +
 drivers/infiniband/core/uverbs_std_types_cq.c | 1 +
 drivers/infiniband/core/verbs.c               | 2 ++
 include/rdma/ib_verbs.h                       | 3 +++
 4 files changed, 7 insertions(+)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index b4b0c7c92fb1..1348ebd7a1c3 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1067,6 +1067,7 @@ static int create_cq(struct uverbs_attr_bundle *attrs,
 	cq->event_handler = ib_uverbs_cq_event_handler;
 	cq->cq_context    = ev_file ? &ev_file->ev_queue : NULL;
 	atomic_set(&cq->usecnt, 0);
+	mutex_init(&cq->resize_mutex);
 
 	rdma_restrack_new(&cq->res, RDMA_RESTRACK_CQ);
 	rdma_restrack_set_name(&cq->res, NULL);
diff --git a/drivers/infiniband/core/uverbs_std_types_cq.c b/drivers/infiniband/core/uverbs_std_types_cq.c
index a12e3184dd5c..c572f528579d 100644
--- a/drivers/infiniband/core/uverbs_std_types_cq.c
+++ b/drivers/infiniband/core/uverbs_std_types_cq.c
@@ -195,6 +195,7 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(
 	 */
 	cq->umem = umem;
 	atomic_set(&cq->usecnt, 0);
+	mutex_init(&cq->resize_mutex);
 
 	rdma_restrack_new(&cq->res, RDMA_RESTRACK_CQ);
 	rdma_restrack_set_name(&cq->res, NULL);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 5f59487fc9d4..b308100ba964 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -2257,6 +2257,8 @@ int ib_destroy_cq_user(struct ib_cq *cq, struct ib_udata *udata)
 	if (ret)
 		return ret;
 
+	if (udata)
+		mutex_destroy(&cq->resize_mutex);
 	ib_umem_release(cq->umem);
 	rdma_restrack_del(&cq->res);
 	kfree(cq);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 7d32d02c35e3..48340b39ab26 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1638,8 +1638,11 @@ struct ib_cq {
 	struct ib_wc		*wc;
 	struct list_head        pool_entry;
 	union {
+		/* Kernel CQs */
 		struct irq_poll		iop;
 		struct work_struct	work;
+		/* Uverbs CQs */
+		struct mutex resize_mutex;
 	};
 	struct workqueue_struct *comp_wq;
 	struct dim *dim;

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 42/50] RDMA/bnxt_re: Complete CQ resize in a single step
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (40 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 41/50] RDMA/core: Generalize CQ resize locking Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-16  3:59   ` Selvin Xavier
  2026-02-24  8:15   ` Selvin Xavier
  2026-02-13 10:58 ` [PATCH rdma-next 43/50] RDMA/bnxt_re: Rely on common resize‑CQ locking Leon Romanovsky
                   ` (9 subsequent siblings)
  51 siblings, 2 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

There is no need to defer completion of the CQ resize operation to the
next poll_cq() call; it can be finished in a single pass inside
bnxt_re_resize_cq() itself. The current implementation also does not
handle concurrent CQ resize requests; that is addressed in the
following patches.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/bnxt_re/ib_verbs.c | 33 +++++++++-----------------------
 1 file changed, 9 insertions(+), 24 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index d652018c19b3..2aecfbbb7eaf 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -3309,20 +3309,6 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	return rc;
 }
 
-static void bnxt_re_resize_cq_complete(struct bnxt_re_cq *cq)
-{
-	struct bnxt_re_dev *rdev = cq->rdev;
-
-	bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
-
-	cq->qplib_cq.max_wqe = cq->resize_cqe;
-	if (cq->resize_umem) {
-		ib_umem_release(cq->ib_cq.umem);
-		cq->ib_cq.umem = cq->resize_umem;
-		cq->resize_umem = NULL;
-		cq->resize_cqe = 0;
-	}
-}
 
 int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
 		      struct ib_udata *udata)
@@ -3387,7 +3373,15 @@ int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
 		goto fail;
 	}
 
-	cq->ib_cq.cqe = cq->resize_cqe;
+	bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
+
+	cq->qplib_cq.max_wqe = cq->resize_cqe;
+	ib_umem_release(cq->ib_cq.umem);
+	cq->ib_cq.umem = cq->resize_umem;
+	cq->resize_umem = NULL;
+	cq->resize_cqe = 0;
+
+	cq->ib_cq.cqe = entries;
 	atomic_inc(&rdev->stats.res.resize_count);
 
 	return 0;
@@ -3907,15 +3901,6 @@ int bnxt_re_poll_cq(struct ib_cq *ib_cq, int num_entries, struct ib_wc *wc)
 	struct bnxt_re_sqp_entries *sqp_entry = NULL;
 	unsigned long flags;
 
-	/* User CQ; the only processing we do is to
-	 * complete any pending CQ resize operation.
-	 */
-	if (cq->ib_cq.umem) {
-		if (cq->resize_umem)
-			bnxt_re_resize_cq_complete(cq);
-		return 0;
-	}
-
 	spin_lock_irqsave(&cq->cq_lock, flags);
 	budget = min_t(u32, num_entries, cq->max_cql);
 	num_entries = budget;

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 43/50] RDMA/bnxt_re: Rely on common resize‑CQ locking
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (41 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 42/50] RDMA/bnxt_re: Complete CQ resize in a single step Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-13 10:58 ` [PATCH rdma-next 44/50] RDMA/bnxt_re: Reduce CQ memory footprint Leon Romanovsky
                   ` (8 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

After introducing a shared mutex to protect against concurrent
resize‑CQ operations, update the bnxt_re driver to rely on it and drop
the driver's own resize_umem busy check, as concurrent requests are
now serialized by the core instead of failing with -EBUSY.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/bnxt_re/ib_verbs.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index 2aecfbbb7eaf..d544a4fb1e96 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -3326,12 +3326,6 @@ int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
 	rdev = cq->rdev;
 	dev_attr = rdev->dev_attr;
 
-	if (cq->resize_umem) {
-		ibdev_err(&rdev->ibdev, "Resize CQ %#x failed - Busy",
-			  cq->qplib_cq.id);
-		return -EBUSY;
-	}
-
 	/* Check the requested cq depth out of supported depth */
 	if (cqe > dev_attr->max_cq_wqes)
 		return -EINVAL;

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 44/50] RDMA/bnxt_re: Reduce CQ memory footprint
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (42 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 43/50] RDMA/bnxt_re: Rely on common resize‑CQ locking Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-13 10:58 ` [PATCH rdma-next 45/50] RDMA/mlx4: Use generic resize-CQ lock Leon Romanovsky
                   ` (7 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

There is no need to store resize_cqe and resize_umem in the CQ object.
Let's remove them.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/bnxt_re/ib_verbs.c | 37 +++++++++++---------------------
 drivers/infiniband/hw/bnxt_re/ib_verbs.h |  2 --
 2 files changed, 13 insertions(+), 26 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index d544a4fb1e96..9a8bdb52097f 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -3320,6 +3320,8 @@ int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
 	struct bnxt_re_resize_cq_req req;
 	struct bnxt_re_dev *rdev;
 	struct bnxt_re_cq *cq;
+	struct ib_umem *umem;
+
 	int rc, entries;
 
 	cq =  container_of(ibcq, struct bnxt_re_cq, ib_cq);
@@ -3336,26 +3338,18 @@ int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
 		entries = dev_attr->max_cq_wqes + 1;
 
 	/* uverbs consumer */
-	if (ib_copy_from_udata(&req, udata, sizeof(req))) {
-		rc = -EFAULT;
-		goto fail;
-	}
+	if (ib_copy_from_udata(&req, udata, sizeof(req)))
+		return -EFAULT;
 
-	cq->resize_umem = ib_umem_get(&rdev->ibdev, req.cq_va,
-				      entries * sizeof(struct cq_base),
-				      IB_ACCESS_LOCAL_WRITE);
-	if (IS_ERR(cq->resize_umem)) {
-		rc = PTR_ERR(cq->resize_umem);
-		ibdev_err(&rdev->ibdev, "%s: ib_umem_get failed! rc = %pe\n",
-			  __func__, cq->resize_umem);
-		cq->resize_umem = NULL;
-		return rc;
-	}
-	cq->resize_cqe = entries;
+	umem = ib_umem_get(&rdev->ibdev, req.cq_va,
+			   entries * sizeof(struct cq_base),
+			   IB_ACCESS_LOCAL_WRITE);
+	if (IS_ERR(umem))
+		return PTR_ERR(umem);
 	memcpy(&sg_info, &cq->qplib_cq.sg_info, sizeof(sg_info));
 	orig_dpi = cq->qplib_cq.dpi;
 
-	cq->qplib_cq.sg_info.umem = cq->resize_umem;
+	cq->qplib_cq.sg_info.umem = umem;
 	cq->qplib_cq.sg_info.pgsize = PAGE_SIZE;
 	cq->qplib_cq.sg_info.pgshft = PAGE_SHIFT;
 	cq->qplib_cq.dpi = &uctx->dpi;
@@ -3369,21 +3363,16 @@ int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
 
 	bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
 
-	cq->qplib_cq.max_wqe = cq->resize_cqe;
+	cq->qplib_cq.max_wqe = entries;
 	ib_umem_release(cq->ib_cq.umem);
-	cq->ib_cq.umem = cq->resize_umem;
-	cq->resize_umem = NULL;
-	cq->resize_cqe = 0;
-
+	cq->ib_cq.umem = umem;
 	cq->ib_cq.cqe = entries;
 	atomic_inc(&rdev->stats.res.resize_count);
 
 	return 0;
 
 fail:
-	ib_umem_release(cq->resize_umem);
-	cq->resize_umem = NULL;
-	cq->resize_cqe = 0;
+	ib_umem_release(umem);
 	memcpy(&cq->qplib_cq.sg_info, &sg_info, sizeof(sg_info));
 	cq->qplib_cq.dpi = orig_dpi;
 	return rc;
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.h b/drivers/infiniband/hw/bnxt_re/ib_verbs.h
index 7890d6ebad90..ee7ccaa2ed4c 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.h
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.h
@@ -108,8 +108,6 @@ struct bnxt_re_cq {
 	struct bnxt_qplib_cqe	*cql;
 #define MAX_CQL_PER_POLL	1024
 	u32			max_cql;
-	struct ib_umem		*resize_umem;
-	int			resize_cqe;
 	void			*uctx_cq_page;
 	struct hlist_node	hash_entry;
 };

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 45/50] RDMA/mlx4: Use generic resize-CQ lock
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (43 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 44/50] RDMA/bnxt_re: Reduce CQ memory footprint Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-13 10:58 ` [PATCH rdma-next 46/50] RDMA/mlx4: Use on‑stack variables instead of storing them in the CQ object Leon Romanovsky
                   ` (6 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

Replace the open‑coded resize‑CQ lock with the standard core
implementation for better consistency and maintainability.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx4/cq.c      | 9 +--------
 drivers/infiniband/hw/mlx4/mlx4_ib.h | 1 -
 2 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index f4595afced45..ffc3902dc329 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -163,7 +163,6 @@ int mlx4_ib_create_user_cq(struct ib_cq *ibcq,
 
 	entries      = roundup_pow_of_two(entries + 1);
 	cq->ibcq.cqe = entries - 1;
-	mutex_init(&cq->resize_mutex);
 	spin_lock_init(&cq->lock);
 	INIT_LIST_HEAD(&cq->send_qp_list);
 	INIT_LIST_HEAD(&cq->recv_qp_list);
@@ -253,7 +252,6 @@ int mlx4_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 
 	entries      = roundup_pow_of_two(entries + 1);
 	cq->ibcq.cqe = entries - 1;
-	mutex_init(&cq->resize_mutex);
 	spin_lock_init(&cq->lock);
 	INIT_LIST_HEAD(&cq->send_qp_list);
 	INIT_LIST_HEAD(&cq->recv_qp_list);
@@ -369,12 +367,9 @@ int mlx4_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries,
 	if (entries > dev->dev->caps.max_cqes + 1)
 		return -EINVAL;
 
-	mutex_lock(&cq->resize_mutex);
 	err = mlx4_alloc_resize_umem(dev, cq, entries, udata);
-	if (err) {
-		mutex_unlock(&cq->resize_mutex);
+	if (err)
 		return err;
-	}
 	mtt = cq->buf.mtt;
 
 	err = mlx4_cq_resize(dev->dev, &cq->mcq, entries, &cq->resize_buf->buf.mtt);
@@ -390,7 +385,6 @@ int mlx4_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries,
 	kfree(cq->resize_buf);
 	cq->resize_buf = NULL;
 	cq->resize_umem = NULL;
-	mutex_unlock(&cq->resize_mutex);
 	return 0;
 
 
@@ -401,7 +395,6 @@ int mlx4_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries,
 
 	ib_umem_release(cq->resize_umem);
 	cq->resize_umem = NULL;
-	mutex_unlock(&cq->resize_mutex);
 	return err;
 }
 
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 5a799d6df93e..2f1043690554 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -120,7 +120,6 @@ struct mlx4_ib_cq {
 	struct mlx4_ib_cq_resize *resize_buf;
 	struct mlx4_db		db;
 	spinlock_t		lock;
-	struct mutex		resize_mutex;
 	struct ib_umem	       *resize_umem;
 	/* List of qps that it serves.*/
 	struct list_head		send_qp_list;

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 46/50] RDMA/mlx4: Use on‑stack variables instead of storing them in the CQ object
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (44 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 45/50] RDMA/mlx4: Use generic resize-CQ lock Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-13 10:58 ` [PATCH rdma-next 47/50] RDMA/mlx5: Use generic resize-CQ lock Leon Romanovsky
                   ` (5 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

These variables do not need to persist for the lifetime of the CQ object.
They can be safely allocated on the stack instead.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx4/cq.c      | 81 +++++++++++++-----------------------
 drivers/infiniband/hw/mlx4/mlx4_ib.h |  1 -
 2 files changed, 28 insertions(+), 54 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index ffc3902dc329..6e8017ecf137 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -294,15 +294,29 @@ int mlx4_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	return err;
 }
 
-static int mlx4_alloc_resize_umem(struct mlx4_ib_dev *dev, struct mlx4_ib_cq *cq,
-				   int entries, struct ib_udata *udata)
+int mlx4_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries,
+		      struct ib_udata *udata)
 {
+	struct mlx4_ib_dev *dev = to_mdev(ibcq->device);
+	struct mlx4_ib_cq *cq = to_mcq(ibcq);
 	struct mlx4_ib_resize_cq ucmd;
 	int cqe_size = dev->dev->caps.cqe_size;
+	struct ib_umem *umem;
+	struct mlx4_mtt mtt;
 	int shift;
 	int n;
 	int err;
 
+	if (entries > dev->dev->caps.max_cqes)
+		return -EINVAL;
+
+	entries = roundup_pow_of_two(entries + 1);
+	if (entries == ibcq->cqe + 1)
+		return 0;
+
+	if (entries > dev->dev->caps.max_cqes + 1)
+		return -EINVAL;
+
 	if (ib_copy_from_udata(&ucmd, udata, sizeof ucmd))
 		return -EFAULT;
 
@@ -310,15 +324,14 @@ static int mlx4_alloc_resize_umem(struct mlx4_ib_dev *dev, struct mlx4_ib_cq *cq
 	if (!cq->resize_buf)
 		return -ENOMEM;
 
-	cq->resize_umem = ib_umem_get(&dev->ib_dev, ucmd.buf_addr,
-				      entries * cqe_size,
-				      IB_ACCESS_LOCAL_WRITE);
-	if (IS_ERR(cq->resize_umem)) {
-		err = PTR_ERR(cq->resize_umem);
+	umem = ib_umem_get(&dev->ib_dev, ucmd.buf_addr,
+			   entries * cqe_size, IB_ACCESS_LOCAL_WRITE);
+	if (IS_ERR(umem)) {
+		err = PTR_ERR(umem);
 		goto err_buf;
 	}
 
-	shift = mlx4_ib_umem_calc_optimal_mtt_size(cq->resize_umem, 0, &n);
+	shift = mlx4_ib_umem_calc_optimal_mtt_size(umem, 0, &n);
 	if (shift < 0) {
 		err = shift;
 		goto err_umem;
@@ -328,73 +341,35 @@ static int mlx4_alloc_resize_umem(struct mlx4_ib_dev *dev, struct mlx4_ib_cq *cq
 	if (err)
 		goto err_umem;
 
-	err = mlx4_ib_umem_write_mtt(dev, &cq->resize_buf->buf.mtt,
-				     cq->resize_umem);
+	err = mlx4_ib_umem_write_mtt(dev, &cq->resize_buf->buf.mtt, umem);
 	if (err)
 		goto err_mtt;
 
 	cq->resize_buf->cqe = entries - 1;
 
-	return 0;
-
-err_mtt:
-	mlx4_mtt_cleanup(dev->dev, &cq->resize_buf->buf.mtt);
-
-err_umem:
-	ib_umem_release(cq->resize_umem);
-	cq->resize_umem = NULL;
-err_buf:
-	kfree(cq->resize_buf);
-	cq->resize_buf = NULL;
-	return err;
-}
-
-int mlx4_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries,
-		      struct ib_udata *udata)
-{
-	struct mlx4_ib_dev *dev = to_mdev(ibcq->device);
-	struct mlx4_ib_cq *cq = to_mcq(ibcq);
-	struct mlx4_mtt mtt;
-	int err;
-
-	if (entries > dev->dev->caps.max_cqes)
-		return -EINVAL;
-
-	entries = roundup_pow_of_two(entries + 1);
-	if (entries == ibcq->cqe + 1)
-		return 0;
-
-	if (entries > dev->dev->caps.max_cqes + 1)
-		return -EINVAL;
-
-	err = mlx4_alloc_resize_umem(dev, cq, entries, udata);
-	if (err)
-		return err;
 	mtt = cq->buf.mtt;
 
 	err = mlx4_cq_resize(dev->dev, &cq->mcq, entries, &cq->resize_buf->buf.mtt);
 	if (err)
-		goto err_buf;
+		goto err_mtt;
 
 	mlx4_mtt_cleanup(dev->dev, &mtt);
 	cq->buf = cq->resize_buf->buf;
 	cq->ibcq.cqe = cq->resize_buf->cqe;
 	ib_umem_release(cq->ibcq.umem);
-	cq->ibcq.umem = cq->resize_umem;
+	cq->ibcq.umem = umem;
 
 	kfree(cq->resize_buf);
 	cq->resize_buf = NULL;
-	cq->resize_umem = NULL;
 	return 0;
 
+err_mtt:
+	mlx4_mtt_cleanup(dev->dev, &cq->resize_buf->buf.mtt);
 
+err_umem:
+	ib_umem_release(umem);
 err_buf:
-	mlx4_mtt_cleanup(dev->dev, &cq->resize_buf->buf.mtt);
 	kfree(cq->resize_buf);
-	cq->resize_buf = NULL;
-
-	ib_umem_release(cq->resize_umem);
-	cq->resize_umem = NULL;
 	return err;
 }
 
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 2f1043690554..4163a6cb32d0 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -120,7 +120,6 @@ struct mlx4_ib_cq {
 	struct mlx4_ib_cq_resize *resize_buf;
 	struct mlx4_db		db;
 	spinlock_t		lock;
-	struct ib_umem	       *resize_umem;
 	/* List of qps that it serves.*/
 	struct list_head		send_qp_list;
 	struct list_head		recv_qp_list;

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 47/50] RDMA/mlx5: Use generic resize-CQ lock
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (45 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 46/50] RDMA/mlx4: Use on‑stack variables instead of storing them in the CQ object Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-13 10:58 ` [PATCH rdma-next 48/50] RDMA/mlx5: Select resize‑CQ callback based on device capabilities Leon Romanovsky
                   ` (4 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

Replace the open‑coded resize‑CQ lock with the standard core
implementation for better consistency and maintainability.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/cq.c      | 8 +-------
 drivers/infiniband/hw/mlx5/mlx5_ib.h | 3 ---
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 78c3494517d7..f7fb6f4aef7d 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -972,7 +972,6 @@ int mlx5_ib_create_user_cq(struct ib_cq *ibcq,
 		return -EINVAL;
 
 	cq->ibcq.cqe = entries - 1;
-	mutex_init(&cq->resize_mutex);
 	spin_lock_init(&cq->lock);
 	if (attr->flags & IB_UVERBS_CQ_FLAGS_TIMESTAMP_COMPLETION)
 		cq->private_flags |= MLX5_IB_CQ_PR_TIMESTAMP_COMPLETION;
@@ -1057,7 +1056,6 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		return -EINVAL;
 
 	cq->ibcq.cqe = entries - 1;
-	mutex_init(&cq->resize_mutex);
 	spin_lock_init(&cq->lock);
 	INIT_LIST_HEAD(&cq->list_send_qp);
 	INIT_LIST_HEAD(&cq->list_recv_qp);
@@ -1284,10 +1282,9 @@ int mlx5_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries,
 	if (entries == ibcq->cqe + 1)
 		return 0;
 
-	mutex_lock(&cq->resize_mutex);
 	err = resize_user(dev, cq, entries, udata, &cqe_size);
 	if (err)
-		goto ex;
+		return err;
 
 	page_size = mlx5_umem_find_best_cq_quantized_pgoff(
 		cq->resize_umem, cqc, log_page_size, MLX5_ADAPTER_PAGE_SHIFT,
@@ -1339,7 +1336,6 @@ int mlx5_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries,
 	ib_umem_release(cq->ibcq.umem);
 	cq->ibcq.umem = cq->resize_umem;
 	cq->resize_umem = NULL;
-	mutex_unlock(&cq->resize_mutex);
 
 	kvfree(in);
 	return 0;
@@ -1350,8 +1346,6 @@ int mlx5_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries,
 ex_resize:
 	ib_umem_release(cq->resize_umem);
 	cq->resize_umem = NULL;
-ex:
-	mutex_unlock(&cq->resize_mutex);
 	return err;
 }
 
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index e99a647ed62d..7b34f32b5ecb 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -574,9 +574,6 @@ struct mlx5_ib_cq {
 	 */
 	spinlock_t		lock;
 
-	/* protect resize cq
-	 */
-	struct mutex		resize_mutex;
 	struct mlx5_ib_cq_buf  *resize_buf;
 	struct ib_umem	       *resize_umem;
 	int			cqe_size;

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 48/50] RDMA/mlx5: Select resize‑CQ callback based on device capabilities
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (46 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 47/50] RDMA/mlx5: Use generic resize-CQ lock Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-13 10:58 ` [PATCH rdma-next 49/50] RDMA/mlx5: Reduce CQ memory footprint Leon Romanovsky
                   ` (3 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

Remove the legacy capability check when issuing the resize‑CQ command.
Instead, choose the correct ops during initialization, so devices
without cq_resize support never register the .resize_user_cq callback
in the first place.
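
As a self-contained analogue of the pattern, this sketch mimics how
ib_set_device_ops() layers an optional callback onto the device ops
table; all names here are illustrative:

#include <stdio.h>

struct dev_ops {
	int (*resize_user_cq)(unsigned int cqe); /* optional verb */
};

/* Copy only the callbacks the source table actually provides,
 * mirroring how ib_set_device_ops() skips NULL entries.
 */
static void set_ops(struct dev_ops *dst, const struct dev_ops *src)
{
	if (src->resize_user_cq)
		dst->resize_user_cq = src->resize_user_cq;
}

static int do_resize(unsigned int cqe)
{
	(void)cqe;
	return 0;
}

int main(void)
{
	struct dev_ops dev = { 0 };
	const struct dev_ops resize_ops = { .resize_user_cq = do_resize };
	int cap_cq_resize = 1; /* stand-in for MLX5_CAP_GEN(cq_resize) */

	if (cap_cq_resize)
		set_ops(&dev, &resize_ops);

	/* The core can now gate the verb on callback presence instead
	 * of a per-call capability check.
	 */
	printf("resize supported: %d\n", dev.resize_user_cq != NULL);
	return 0;
}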

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/cq.c   | 5 -----
 drivers/infiniband/hw/mlx5/main.c | 8 +++++++-
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index f7fb6f4aef7d..88f0f5e2944f 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -1267,11 +1267,6 @@ int mlx5_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries,
 	int inlen;
 	int cqe_size;
 
-	if (!MLX5_CAP_GEN(dev->mdev, cq_resize)) {
-		pr_info("Firmware does not support resize CQ\n");
-		return -ENOSYS;
-	}
-
 	if (entries > (1 << MLX5_CAP_GEN(dev->mdev, log_max_cq_sz)))
 		return -EINVAL;
 
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 0471155eb739..f86721681f5b 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -4496,7 +4496,6 @@ static const struct ib_device_ops mlx5_ib_dev_ops = {
 	.reg_user_mr_dmabuf = mlx5_ib_reg_user_mr_dmabuf,
 	.req_notify_cq = mlx5_ib_arm_cq,
 	.rereg_user_mr = mlx5_ib_rereg_user_mr,
-	.resize_user_cq = mlx5_ib_resize_cq,
 	.ufile_hw_cleanup = mlx5_ib_ufile_hw_cleanup,
 
 	INIT_RDMA_OBJ_SIZE(ib_ah, mlx5_ib_ah, ibah),
@@ -4509,6 +4508,10 @@ static const struct ib_device_ops mlx5_ib_dev_ops = {
 	INIT_RDMA_OBJ_SIZE(ib_ucontext, mlx5_ib_ucontext, ibucontext),
 };
 
+static const struct ib_device_ops mlx5_ib_dev_resize_cq_ops = {
+	.resize_user_cq = mlx5_ib_resize_cq,
+};
+
 static const struct ib_device_ops mlx5_ib_dev_ipoib_enhanced_ops = {
 	.rdma_netdev_get_params = mlx5_ib_rn_get_params,
 };
@@ -4635,6 +4638,9 @@ static int mlx5_ib_stage_caps_init(struct mlx5_ib_dev *dev)
 
 	ib_set_device_ops(&dev->ib_dev, &mlx5_ib_dev_ops);
 
+	if (MLX5_CAP_GEN(mdev, cq_resize))
+		ib_set_device_ops(&dev->ib_dev, &mlx5_ib_dev_resize_cq_ops);
+
 	if (IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS))
 		dev->ib_dev.driver_def = mlx5_ib_defs;
 

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 49/50] RDMA/mlx5: Reduce CQ memory footprint
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (47 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 48/50] RDMA/mlx5: Select resize‑CQ callback based on device capabilities Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-13 10:58 ` [PATCH rdma-next 50/50] RDMA/mthca: Use generic resize-CQ lock Leon Romanovsky
                   ` (2 subsequent siblings)
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

There is no need to store a temporary umem pointer in the mlx5 CQ
object. Use an on‑stack variable instead.
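
The patch also replaces the open-coded multiplication-overflow test
with check_mul_overflow(). A stand-alone sketch of the same semantics,
using the compiler builtin the kernel helper wraps (values are
illustrative):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	size_t umem_size;
	size_t cqe_size = 64, entries = SIZE_MAX / 8;

	/* Returns true on overflow and stores the wrapped product in
	 * umem_size; check_mul_overflow() in <linux/overflow.h> is a
	 * type-checked wrapper around this builtin.
	 */
	bool ovf = __builtin_mul_overflow(cqe_size, entries, &umem_size);
	printf("overflow=%d size=%zu\n", ovf, umem_size);
	return 0;
}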

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/cq.c      | 64 ++++++++++++------------------------
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  1 -
 2 files changed, 21 insertions(+), 44 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 88f0f5e2944f..6d9b62742674 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -1218,44 +1218,13 @@ int mlx5_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period)
 	return err;
 }
 
-static int resize_user(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq,
-		       int entries, struct ib_udata *udata,
-		       int *cqe_size)
-{
-	struct mlx5_ib_resize_cq ucmd;
-	struct ib_umem *umem;
-	int err;
-
-	err = ib_copy_from_udata(&ucmd, udata, sizeof(ucmd));
-	if (err)
-		return err;
-
-	if (ucmd.reserved0 || ucmd.reserved1)
-		return -EINVAL;
-
-	/* check multiplication overflow */
-	if (ucmd.cqe_size && SIZE_MAX / ucmd.cqe_size <= entries - 1)
-		return -EINVAL;
-
-	umem = ib_umem_get(&dev->ib_dev, ucmd.buf_addr,
-			   (size_t)ucmd.cqe_size * entries,
-			   IB_ACCESS_LOCAL_WRITE);
-	if (IS_ERR(umem)) {
-		err = PTR_ERR(umem);
-		return err;
-	}
-
-	cq->resize_umem = umem;
-	*cqe_size = ucmd.cqe_size;
-
-	return 0;
-}
-
 int mlx5_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries,
 		      struct ib_udata *udata)
 {
 	struct mlx5_ib_dev *dev = to_mdev(ibcq->device);
 	struct mlx5_ib_cq *cq = to_mcq(ibcq);
+	struct mlx5_ib_resize_cq ucmd;
+	struct ib_umem *umem;
 	unsigned long page_size;
 	void *cqc;
 	u32 *in;
@@ -1264,8 +1233,8 @@ int mlx5_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries,
 	__be64 *pas;
 	unsigned int page_offset_quantized = 0;
 	unsigned int page_shift;
+	size_t umem_size;
 	int inlen;
-	int cqe_size;
 
 	if (entries > (1 << MLX5_CAP_GEN(dev->mdev, log_max_cq_sz)))
 		return -EINVAL;
@@ -1277,18 +1246,29 @@ int mlx5_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries,
 	if (entries == ibcq->cqe + 1)
 		return 0;
 
-	err = resize_user(dev, cq, entries, udata, &cqe_size);
+	err = ib_copy_from_udata(&ucmd, udata, sizeof(ucmd));
 	if (err)
 		return err;
 
+	if (ucmd.reserved0 || ucmd.reserved1)
+		return -EINVAL;
+
+	if (check_mul_overflow(ucmd.cqe_size, entries, &umem_size))
+		return -EINVAL;
+
+	umem = ib_umem_get(&dev->ib_dev, ucmd.buf_addr, umem_size,
+			   IB_ACCESS_LOCAL_WRITE);
+	if (IS_ERR(umem))
+		return PTR_ERR(umem);
+
 	page_size = mlx5_umem_find_best_cq_quantized_pgoff(
-		cq->resize_umem, cqc, log_page_size, MLX5_ADAPTER_PAGE_SHIFT,
+		umem, cqc, log_page_size, MLX5_ADAPTER_PAGE_SHIFT,
 		page_offset, 64, &page_offset_quantized);
 	if (!page_size) {
 		err = -EINVAL;
 		goto ex_resize;
 	}
-	npas = ib_umem_num_dma_blocks(cq->resize_umem, page_size);
+	npas = ib_umem_num_dma_blocks(umem, page_size);
 	page_shift = order_base_2(page_size);
 
 	inlen = MLX5_ST_SZ_BYTES(modify_cq_in) +
@@ -1301,7 +1281,7 @@ int mlx5_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries,
 	}
 
 	pas = (__be64 *)MLX5_ADDR_OF(modify_cq_in, in, pas);
-	mlx5_ib_populate_pas(cq->resize_umem, 1UL << page_shift, pas, 0);
+	mlx5_ib_populate_pas(umem, 1UL << page_shift, pas, 0);
 
 	MLX5_SET(modify_cq_in, in,
 		 modify_field_select_resize_field_select.resize_field_select.resize_field_select,
@@ -1315,7 +1295,7 @@ int mlx5_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries,
 		 page_shift - MLX5_ADAPTER_PAGE_SHIFT);
 	MLX5_SET(cqc, cqc, page_offset, page_offset_quantized);
 	MLX5_SET(cqc, cqc, cqe_sz,
-		 cqe_sz_to_mlx_sz(cqe_size,
+		 cqe_sz_to_mlx_sz(ucmd.cqe_size,
 				  cq->private_flags &
 				  MLX5_IB_CQ_PR_FLAGS_CQE_128_PAD));
 	MLX5_SET(cqc, cqc, log_cq_size, ilog2(entries));
@@ -1329,8 +1309,7 @@ int mlx5_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries,
 
 	cq->ibcq.cqe = entries - 1;
 	ib_umem_release(cq->ibcq.umem);
-	cq->ibcq.umem = cq->resize_umem;
-	cq->resize_umem = NULL;
+	cq->ibcq.umem = umem;
 
 	kvfree(in);
 	return 0;
@@ -1339,8 +1318,7 @@ int mlx5_ib_resize_cq(struct ib_cq *ibcq, unsigned int entries,
 	kvfree(in);
 
 ex_resize:
-	ib_umem_release(cq->resize_umem);
-	cq->resize_umem = NULL;
+	ib_umem_release(umem);
 	return err;
 }
 
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 7b34f32b5ecb..11e4b2ae0469 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -575,7 +575,6 @@ struct mlx5_ib_cq {
 	spinlock_t		lock;
 
 	struct mlx5_ib_cq_buf  *resize_buf;
-	struct ib_umem	       *resize_umem;
 	int			cqe_size;
 	struct list_head	list_send_qp;
 	struct list_head	list_recv_qp;

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH rdma-next 50/50] RDMA/mthca: Use generic resize-CQ lock
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (48 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 49/50] RDMA/mlx5: Reduce CQ memory footprint Leon Romanovsky
@ 2026-02-13 10:58 ` Leon Romanovsky
  2026-02-25 13:51 ` (subset) [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
  2026-02-25 13:53 ` Leon Romanovsky
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 10:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

From: Leon Romanovsky <leonro@nvidia.com>

Replace the open‑coded resize‑CQ lock with the standard core
implementation for better consistency and maintainability.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mthca/mthca_cq.c       |  1 -
 drivers/infiniband/hw/mthca/mthca_provider.c | 20 ++++++--------------
 drivers/infiniband/hw/mthca/mthca_provider.h |  1 -
 3 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c
index 26c3408dcaca..9c15e9b886d1 100644
--- a/drivers/infiniband/hw/mthca/mthca_cq.c
+++ b/drivers/infiniband/hw/mthca/mthca_cq.c
@@ -819,7 +819,6 @@ int mthca_init_cq(struct mthca_dev *dev, int nent,
 	spin_lock_init(&cq->lock);
 	cq->refcount = 1;
 	init_waitqueue_head(&cq->wait);
-	mutex_init(&cq->mutex);
 
 	memset(cq_context, 0, sizeof *cq_context);
 	cq_context->flags           = cpu_to_be32(MTHCA_CQ_STATUS_OK      |
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 85de004547ab..cb94d73e89d6 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -680,28 +680,20 @@ static int mthca_resize_cq(struct ib_cq *ibcq, unsigned int entries,
 	if (entries > dev->limits.max_cqes)
 		return -EINVAL;
 
-	mutex_lock(&cq->mutex);
-
 	entries = roundup_pow_of_two(entries + 1);
-	if (entries == ibcq->cqe + 1) {
-		ret = 0;
-		goto out;
-	}
+	if (entries == ibcq->cqe + 1)
+		return 0;
 
-	if (ib_copy_from_udata(&ucmd, udata, sizeof ucmd)) {
-		ret = -EFAULT;
-		goto out;
-	}
+	if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd)))
+		return -EFAULT;
 	lkey = ucmd.lkey;
 
 	ret = mthca_RESIZE_CQ(dev, cq->cqn, lkey, ilog2(entries));
 	if (ret)
-		goto out;
+		return ret;
 
 	ibcq->cqe = entries - 1;
-out:
-	mutex_unlock(&cq->mutex);
-	return ret;
+	return 0;
 }
 
 static int mthca_destroy_cq(struct ib_cq *cq, struct ib_udata *udata)
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.h b/drivers/infiniband/hw/mthca/mthca_provider.h
index 8a77483bb33c..7797d76fb93d 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.h
+++ b/drivers/infiniband/hw/mthca/mthca_provider.h
@@ -198,7 +198,6 @@ struct mthca_cq {
 	int			arm_sn;
 
 	wait_queue_head_t	wait;
-	struct mutex		mutex;
 };
 
 struct mthca_srq {

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: [PATCH rdma-next 28/50] RDMA/siw: Split user and kernel CQ creation paths
  2026-02-13 10:58 ` [PATCH rdma-next 28/50] RDMA/siw: " Leon Romanovsky
@ 2026-02-13 16:56   ` Bernard Metzler
  2026-02-13 21:17     ` Leon Romanovsky
  0 siblings, 1 reply; 73+ messages in thread
From: Bernard Metzler @ 2026-02-13 16:56 UTC (permalink / raw)
  To: Leon Romanovsky, Jason Gunthorpe, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

On 13.02.2026 11:58, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Separate the CQ creation logic into distinct kernel and user flows.
> 
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>   drivers/infiniband/sw/siw/siw_main.c  |   1 +
>   drivers/infiniband/sw/siw/siw_verbs.c | 111 +++++++++++++++++++++++-----------
>   drivers/infiniband/sw/siw/siw_verbs.h |   2 +
>   3 files changed, 80 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/siw/siw_main.c b/drivers/infiniband/sw/siw/siw_main.c
> index 5168307229a9..75dcf3578eac 100644
> --- a/drivers/infiniband/sw/siw/siw_main.c
> +++ b/drivers/infiniband/sw/siw/siw_main.c
> @@ -232,6 +232,7 @@ static const struct ib_device_ops siw_device_ops = {
>   	.alloc_pd = siw_alloc_pd,
>   	.alloc_ucontext = siw_alloc_ucontext,
>   	.create_cq = siw_create_cq,
> +	.create_user_cq = siw_create_user_cq,
>   	.create_qp = siw_create_qp,
>   	.create_srq = siw_create_srq,
>   	.dealloc_driver = siw_device_cleanup,
> diff --git a/drivers/infiniband/sw/siw/siw_verbs.c b/drivers/infiniband/sw/siw/siw_verbs.c
> index efa2f097b582..92b25b389b69 100644
> --- a/drivers/infiniband/sw/siw/siw_verbs.c
> +++ b/drivers/infiniband/sw/siw/siw_verbs.c
> @@ -1139,15 +1139,15 @@ int siw_destroy_cq(struct ib_cq *base_cq, struct ib_udata *udata)
>    * @attrs: uverbs bundle
>    */
>   
> -int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
> -		  struct uverbs_attr_bundle *attrs)
> +int siw_create_user_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
> +		       struct uverbs_attr_bundle *attrs)
>   {
>   	struct ib_udata *udata = &attrs->driver_udata;
>   	struct siw_device *sdev = to_siw_dev(base_cq->device);
>   	struct siw_cq *cq = to_siw_cq(base_cq);
>   	int rv, size = attr->cqe;
>   
> -	if (attr->flags)
> +	if (attr->flags || base_cq->umem)
>   		return -EOPNOTSUPP;
>   
>   	if (atomic_inc_return(&sdev->num_cq) > SIW_MAX_CQ) {
> @@ -1155,7 +1155,7 @@ int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
>   		rv = -ENOMEM;
>   		goto err_out;
>   	}
> -	if (size < 1 || size > sdev->attrs.max_cqe) {
> +	if (attr->cqe > sdev->attrs.max_cqe) {
>   		siw_dbg(base_cq->device, "CQ size error: %d\n", size);
>   		rv = -EINVAL;
>   		goto err_out;
> @@ -1164,13 +1164,8 @@ int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
>   	cq->base_cq.cqe = size;
>   	cq->num_cqe = size;
>   
> -	if (udata)
> -		cq->queue = vmalloc_user(size * sizeof(struct siw_cqe) +
> -					 sizeof(struct siw_cq_ctrl));
> -	else
> -		cq->queue = vzalloc(size * sizeof(struct siw_cqe) +
> -				    sizeof(struct siw_cq_ctrl));
> -
> +	cq->queue = vmalloc_user(size * sizeof(struct siw_cqe) +
> +				 sizeof(struct siw_cq_ctrl));
>   	if (cq->queue == NULL) {
>   		rv = -ENOMEM;
>   		goto err_out;
> @@ -1182,33 +1177,32 @@ int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
>   
>   	cq->notify = (struct siw_cq_ctrl *)&cq->queue[size];
>   
> -	if (udata) {
> -		struct siw_uresp_create_cq uresp = {};
> -		struct siw_ucontext *ctx =
> -			rdma_udata_to_drv_context(udata, struct siw_ucontext,
> -						  base_ucontext);
> -		size_t length = size * sizeof(struct siw_cqe) +
> -			sizeof(struct siw_cq_ctrl);
> +	struct siw_uresp_create_cq uresp = {};
> +	struct siw_ucontext *ctx =
> +		rdma_udata_to_drv_context(udata, struct siw_ucontext,
> +					  base_ucontext);
> +	size_t length = size * sizeof(struct siw_cqe) +
> +		sizeof(struct siw_cq_ctrl);
>   
> -		cq->cq_entry =
> -			siw_mmap_entry_insert(ctx, cq->queue,
> -					      length, &uresp.cq_key);
> -		if (!cq->cq_entry) {
> -			rv = -ENOMEM;
> -			goto err_out;
> -		}
> +	cq->cq_entry =
> +		siw_mmap_entry_insert(ctx, cq->queue,
> +				      length, &uresp.cq_key);
> +	if (!cq->cq_entry) {
> +		rv = -ENOMEM;
> +		goto err_out;
> +	}
>   
> -		uresp.cq_id = cq->id;
> -		uresp.num_cqe = size;
> +	uresp.cq_id = cq->id;
> +	uresp.num_cqe = size;
>   
> -		if (udata->outlen < sizeof(uresp)) {
> -			rv = -EINVAL;
> -			goto err_out;
> -		}
> -		rv = ib_copy_to_udata(udata, &uresp, sizeof(uresp));
> -		if (rv)
> -			goto err_out;
> +	if (udata->outlen < sizeof(uresp)) {
> +		rv = -EINVAL;
> +		goto err_out;
>   	}
> +	rv = ib_copy_to_udata(udata, &uresp, sizeof(uresp));
> +	if (rv)
> +		goto err_out;
> +
>   	return 0;
>   
>   err_out:
> @@ -1227,6 +1221,55 @@ int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
>   	return rv;
>   }
>   
> +int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
> +		  struct uverbs_attr_bundle *attrs)
> +{
> +	struct siw_device *sdev = to_siw_dev(base_cq->device);
> +	struct siw_cq *cq = to_siw_cq(base_cq);
> +	int rv, size = attr->cqe;
> +
> +	if (attr->flags)
> +		return -EOPNOTSUPP;
> +
> +	if (atomic_inc_return(&sdev->num_cq) > SIW_MAX_CQ) {
> +		siw_dbg(base_cq->device, "too many CQ's\n");
> +		rv = -ENOMEM;
> +		goto err_out;
> +	}
> +	if (size < 1 || size > sdev->attrs.max_cqe) {

Isn't there now also a check for a zero-sized CQ in
__ib_alloc_cq(), which makes that < 1 check obsolete?

Everything looks right otherwise.

Thanks,
Bernard.

> +		siw_dbg(base_cq->device, "CQ size error: %d\n", size);
> +		rv = -EINVAL;
> +		goto err_out;
> +	}
> +	size = roundup_pow_of_two(size);
> +	cq->base_cq.cqe = size;
> +	cq->num_cqe = size;
> +
> +	cq->queue = vzalloc(size * sizeof(struct siw_cqe) +
> +			    sizeof(struct siw_cq_ctrl));
> +	if (cq->queue == NULL) {
> +		rv = -ENOMEM;
> +		goto err_out;
> +	}
> +	get_random_bytes(&cq->id, 4);
> +	siw_dbg(base_cq->device, "new CQ [%u]\n", cq->id);
> +
> +	spin_lock_init(&cq->lock);
> +
> +	cq->notify = (struct siw_cq_ctrl *)&cq->queue[size];
> +
> +	return 0;
> +
> +err_out:
> +	siw_dbg(base_cq->device, "CQ creation failed: %d", rv);
> +
> +	if (cq->queue)
> +		vfree(cq->queue);
> +	atomic_dec(&sdev->num_cq);
> +
> +	return rv;
> +}
> +
>   /*
>    * siw_poll_cq()
>    *
> diff --git a/drivers/infiniband/sw/siw/siw_verbs.h b/drivers/infiniband/sw/siw/siw_verbs.h
> index e9f4463aecdc..527c356b55af 100644
> --- a/drivers/infiniband/sw/siw/siw_verbs.h
> +++ b/drivers/infiniband/sw/siw/siw_verbs.h
> @@ -44,6 +44,8 @@ int siw_query_device(struct ib_device *base_dev, struct ib_device_attr *attr,
>   		     struct ib_udata *udata);
>   int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
>   		  struct uverbs_attr_bundle *attrs);
> +int siw_create_user_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
> +		       struct uverbs_attr_bundle *attrs);
>   int siw_query_port(struct ib_device *base_dev, u32 port,
>   		   struct ib_port_attr *attr);
>   int siw_query_gid(struct ib_device *base_dev, u32 port, int idx,
> 


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH rdma-next 28/50] RDMA/siw: Split user and kernel CQ creation paths
  2026-02-13 16:56   ` Bernard Metzler
@ 2026-02-13 21:17     ` Leon Romanovsky
  0 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-13 21:17 UTC (permalink / raw)
  To: Bernard Metzler
  Cc: Jason Gunthorpe, Selvin Xavier, Kalesh AP, Potnuri Bharat Teja,
	Michael Margolin, Gal Pressman, Yossi Leybovich, Cheng Xu,
	Kai Shen, Chengchang Tang, Junxian Huang, Abhijit Gangurde,
	Allen Hubbe, Krzysztof Czurylo, Tatyana Nikolova, Long Li,
	Konstantin Taranov, Yishai Hadas, Michal Kalderon, Bryan Tan,
	Vishnu Dasa, Broadcom internal kernel review list,
	Christian Benvenuti, Nelson Escobar, Dennis Dalessandro,
	Zhu Yanjun, linux-kernel, linux-rdma, linux-hyperv

On Fri, Feb 13, 2026 at 05:56:32PM +0100, Bernard Metzler wrote:
> On 13.02.2026 11:58, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > Separate the CQ creation logic into distinct kernel and user flows.
> > 
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >   drivers/infiniband/sw/siw/siw_main.c  |   1 +
> >   drivers/infiniband/sw/siw/siw_verbs.c | 111 +++++++++++++++++++++++-----------
> >   drivers/infiniband/sw/siw/siw_verbs.h |   2 +
> >   3 files changed, 80 insertions(+), 34 deletions(-)

<...>

> > +int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,
> > +		  struct uverbs_attr_bundle *attrs)
> > +{
> > +	struct siw_device *sdev = to_siw_dev(base_cq->device);
> > +	struct siw_cq *cq = to_siw_cq(base_cq);
> > +	int rv, size = attr->cqe;
> > +
> > +	if (attr->flags)
> > +		return -EOPNOTSUPP;
> > +
> > +	if (atomic_inc_return(&sdev->num_cq) > SIW_MAX_CQ) {
> > +		siw_dbg(base_cq->device, "too many CQ's\n");
> > +		rv = -ENOMEM;
> > +		goto err_out;
> > +	}
> > +	if (size < 1 || size > sdev->attrs.max_cqe) {
> 
> isn't there now also a check for zero sized CQ in
> __ib_alloc_cq(), which obsoletes that < 1 check?

Thanks, this line needs to be changed to "if (attr->cqe > sdev->attrs.max_cqe)".
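
i.e. the same transformation this patch already applies in the user-CQ
path (sketch):

-	if (size < 1 || size > sdev->attrs.max_cqe) {
+	if (attr->cqe > sdev->attrs.max_cqe) {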

> 
> Everything looks right otherwise.
> 
> Thanks,
> Bernard.

Thanks

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH rdma-next 29/50] RDMA/rxe: Split user and kernel CQ creation paths
  2026-02-13 10:58 ` [PATCH rdma-next 29/50] RDMA/rxe: " Leon Romanovsky
@ 2026-02-13 23:22   ` yanjun.zhu
  2026-02-15  7:06     ` Leon Romanovsky
  0 siblings, 1 reply; 73+ messages in thread
From: yanjun.zhu @ 2026-02-13 23:22 UTC (permalink / raw)
  To: Leon Romanovsky, Jason Gunthorpe, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

On 2/13/26 2:58 AM, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Separate the CQ creation logic into distinct kernel and user flows.
> 
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>   drivers/infiniband/sw/rxe/rxe_verbs.c | 81 ++++++++++++++++++++---------------
>   1 file changed, 47 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
> index 38d8c408320f..1e651bdd8622 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.c
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
> @@ -1072,58 +1072,70 @@ static int rxe_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr,
>   }
>   
>   /* cq */
> -static int rxe_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
> -			 struct uverbs_attr_bundle *attrs)
> +static int rxe_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
> +			      struct uverbs_attr_bundle *attrs)
>   {
>   	struct ib_udata *udata = &attrs->driver_udata;
>   	struct ib_device *dev = ibcq->device;
>   	struct rxe_dev *rxe = to_rdev(dev);
>   	struct rxe_cq *cq = to_rcq(ibcq);
> -	struct rxe_create_cq_resp __user *uresp = NULL;
> -	int err, cleanup_err;
> +	struct rxe_create_cq_resp __user *uresp;
> +	int err;
>   
> -	if (udata) {
> -		if (udata->outlen < sizeof(*uresp)) {
> -			err = -EINVAL;
> -			rxe_dbg_dev(rxe, "malformed udata, err = %d\n", err);
> -			goto err_out;
> -		}
> -		uresp = udata->outbuf;
> -	}
> +	if (udata->outlen < sizeof(*uresp))
> +		return -EINVAL;
>   
> -	if (attr->flags) {
> -		err = -EOPNOTSUPP;
> -		rxe_dbg_dev(rxe, "bad attr->flags, err = %d\n", err);
> -		goto err_out;
> -	}
> +	uresp = udata->outbuf;
>   
> -	err = rxe_cq_chk_attr(rxe, NULL, attr->cqe, attr->comp_vector);
> -	if (err) {
> -		rxe_dbg_dev(rxe, "bad init attributes, err = %d\n", err);
> -		goto err_out;
> -	}
> +	if (attr->flags || ibcq->umem)
> +		return -EOPNOTSUPP;
> +
> +	if (attr->cqe > rxe->attr.max_cqe)
> +		return -EINVAL;
>   
>   	err = rxe_add_to_pool(&rxe->cq_pool, cq);
> -	if (err) {
> -		rxe_dbg_dev(rxe, "unable to create cq, err = %d\n", err);
> -		goto err_out;
> -	}
> +	if (err)
> +		return err;
>   
>   	err = rxe_cq_from_init(rxe, cq, attr->cqe, attr->comp_vector, udata,
>   			       uresp);

Neither rxe_create_user_cq() nor rxe_create_cq() explicitly validates 
attr->comp_vector. Is this guaranteed to be validated by the core before 
reaching the driver, or should rxe still enforce device-specific limits?

> -	if (err) {
> -		rxe_dbg_cq(cq, "create cq failed, err = %d\n", err);
> +	if (err)
>   		goto err_cleanup;

The err_cleanup label is only used for this specific error path. It may 
improve readability to inline the cleanup logic at this site and remove 
the label altogether.
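
e.g. something like this (sketch, reusing the names from this patch):

	err = rxe_cq_from_init(rxe, cq, attr->cqe, attr->comp_vector, udata,
			       uresp);
	if (err) {
		rxe_cleanup(cq);
		return err;
	}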

> -	}
>   
>   	return 0;
>   
>   err_cleanup:
> -	cleanup_err = rxe_cleanup(cq);
> -	if (cleanup_err)
> -		rxe_err_cq(cq, "cleanup failed, err = %d\n", cleanup_err);
> -err_out:
> -	rxe_err_dev(rxe, "returned err = %d\n", err);
> +	rxe_cleanup(cq);
> +	return err;
> +}
> +
> +static int rxe_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
> +			 struct uverbs_attr_bundle *attrs)
> +{
> +	struct ib_device *dev = ibcq->device;
> +	struct rxe_dev *rxe = to_rdev(dev);
> +	struct rxe_cq *cq = to_rcq(ibcq);
> +	int err;
> +
> +	if (attr->flags)
> +		return -EOPNOTSUPP;
> +
> +	if (attr->cqe > rxe->attr.max_cqe)
> +		return -EINVAL;
> +
> +	err = rxe_add_to_pool(&rxe->cq_pool, cq);
> +	if (err)
> +		return err;
> +
> +	err = rxe_cq_from_init(rxe, cq, attr->cqe, attr->comp_vector, NULL,
> +			       NULL);
> +	if (err)
> +		goto err_cleanup;

ditto

Thanks a lot.

Zhu Yanjun

> +
> +	return 0;
> +
> +err_cleanup:
> +	rxe_cleanup(cq);
>   	return err;
>   }
>   
> @@ -1478,6 +1490,7 @@ static const struct ib_device_ops rxe_dev_ops = {
>   	.attach_mcast = rxe_attach_mcast,
>   	.create_ah = rxe_create_ah,
>   	.create_cq = rxe_create_cq,
> +	.create_user_cq = rxe_create_user_cq,
>   	.create_qp = rxe_create_qp,
>   	.create_srq = rxe_create_srq,
>   	.create_user_ah = rxe_create_ah,
> 


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH rdma-next 29/50] RDMA/rxe: Split user and kernel CQ creation paths
  2026-02-13 23:22   ` yanjun.zhu
@ 2026-02-15  7:06     ` Leon Romanovsky
  0 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-15  7:06 UTC (permalink / raw)
  To: yanjun.zhu
  Cc: Jason Gunthorpe, Selvin Xavier, Kalesh AP, Potnuri Bharat Teja,
	Michael Margolin, Gal Pressman, Yossi Leybovich, Cheng Xu,
	Kai Shen, Chengchang Tang, Junxian Huang, Abhijit Gangurde,
	Allen Hubbe, Krzysztof Czurylo, Tatyana Nikolova, Long Li,
	Konstantin Taranov, Yishai Hadas, Michal Kalderon, Bryan Tan,
	Vishnu Dasa, Broadcom internal kernel review list,
	Christian Benvenuti, Nelson Escobar, Dennis Dalessandro,
	Bernard Metzler, Zhu Yanjun, linux-kernel, linux-rdma,
	linux-hyperv

On Fri, Feb 13, 2026 at 03:22:13PM -0800, yanjun.zhu wrote:
> On 2/13/26 2:58 AM, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > Separate the CQ creation logic into distinct kernel and user flows.
> > 
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >   drivers/infiniband/sw/rxe/rxe_verbs.c | 81 ++++++++++++++++++++---------------
> >   1 file changed, 47 insertions(+), 34 deletions(-)

<...>

> > +	if (err)
> > +		return err;
> >   	err = rxe_cq_from_init(rxe, cq, attr->cqe, attr->comp_vector, udata,
> >   			       uresp);
> 
> Neither rxe_create_user_cq() nor rxe_create_cq() explicitly validates
> attr->comp_vector. Is this guaranteed to be validated by the core before
> reaching the driver, or should rxe still enforce device-specific limits?

We should validate it in IB/core level.
https://github.com/linux-rdma/rdma-core/blob/8b9cdb7c6bd2b6e4e64e08888c10124b0d1873f2/libibverbs/man/ibv_create_cq.3#L32
.I comp_vector
for signaling completion events; it must be at least zero and less than
.I context\fR->num_comp_vectors.
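
A minimal core-level check could look like this (untested sketch; the
exact placement in the core CQ-create path is still to be decided):

	if (attr->comp_vector >= device->num_comp_vectors)
		return -EINVAL;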

> 
> > -	if (err) {
> > -		rxe_dbg_cq(cq, "create cq failed, err = %d\n", err);
> > +	if (err)
> >   		goto err_cleanup;
> 
> The err_cleanup label is only used for this specific error path. It may
> improve readability to inline the cleanup logic at this site and remove the
> label altogether.

I'll delete it. Thanks

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH rdma-next 42/50] RDMA/bnxt_re: Complete CQ resize in a single step
  2026-02-13 10:58 ` [PATCH rdma-next 42/50] RDMA/bnxt_re: Complete CQ resize in a single step Leon Romanovsky
@ 2026-02-16  3:59   ` Selvin Xavier
  2026-02-16  8:07     ` Leon Romanovsky
  2026-02-24  8:15   ` Selvin Xavier
  1 sibling, 1 reply; 73+ messages in thread
From: Selvin Xavier @ 2026-02-16  3:59 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jason Gunthorpe, Kalesh AP, Potnuri Bharat Teja, Michael Margolin,
	Gal Pressman, Yossi Leybovich, Cheng Xu, Kai Shen,
	Chengchang Tang, Junxian Huang, Abhijit Gangurde, Allen Hubbe,
	Krzysztof Czurylo, Tatyana Nikolova, Long Li, Konstantin Taranov,
	Yishai Hadas, Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun,
	linux-kernel, linux-rdma, linux-hyperv

[-- Attachment #1: Type: text/plain, Size: 3297 bytes --]

On Fri, Feb 13, 2026 at 4:31 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> From: Leon Romanovsky <leonro@nvidia.com>
>
> There is no need to defer the CQ resize operation, as it is intended to
> be completed in one pass. The current bnxt_re_resize_cq() implementation
> does not handle concurrent CQ resize requests, and this will be addressed
> in the following patches.
bnxt HW requires that the previous CQ memory stay available to the HW
until the HW generates a cut-off CQE on the CQ that is being destroyed.
This is the reason for polling the completions in the user library
after returning from the resize_cq call. Once the polling thread sees
the expected CQE, it will invoke the driver to free the CQ memory. So
ib_umem_release should wait. This patch doesn't guarantee that. Do you
think there is a better way to handle this requirement?

>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/infiniband/hw/bnxt_re/ib_verbs.c | 33 +++++++++-----------------------
>  1 file changed, 9 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> index d652018c19b3..2aecfbbb7eaf 100644
> --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> @@ -3309,20 +3309,6 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
>         return rc;
>  }
>
> -static void bnxt_re_resize_cq_complete(struct bnxt_re_cq *cq)
> -{
> -       struct bnxt_re_dev *rdev = cq->rdev;
> -
> -       bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
> -
> -       cq->qplib_cq.max_wqe = cq->resize_cqe;
> -       if (cq->resize_umem) {
> -               ib_umem_release(cq->ib_cq.umem);
> -               cq->ib_cq.umem = cq->resize_umem;
> -               cq->resize_umem = NULL;
> -               cq->resize_cqe = 0;
> -       }
> -}
>
>  int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
>                       struct ib_udata *udata)
> @@ -3387,7 +3373,15 @@ int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
>                 goto fail;
>         }
>
> -       cq->ib_cq.cqe = cq->resize_cqe;
> +       bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
> +
> +       cq->qplib_cq.max_wqe = cq->resize_cqe;
> +       ib_umem_release(cq->ib_cq.umem);
> +       cq->ib_cq.umem = cq->resize_umem;
> +       cq->resize_umem = NULL;
> +       cq->resize_cqe = 0;
> +
> +       cq->ib_cq.cqe = entries;
>         atomic_inc(&rdev->stats.res.resize_count);
>
>         return 0;
> @@ -3907,15 +3901,6 @@ int bnxt_re_poll_cq(struct ib_cq *ib_cq, int num_entries, struct ib_wc *wc)
>         struct bnxt_re_sqp_entries *sqp_entry = NULL;
>         unsigned long flags;
>
> -       /* User CQ; the only processing we do is to
> -        * complete any pending CQ resize operation.
> -        */
> -       if (cq->ib_cq.umem) {
> -               if (cq->resize_umem)
> -                       bnxt_re_resize_cq_complete(cq);
> -               return 0;
> -       }
> -
>         spin_lock_irqsave(&cq->cq_lock, flags);
>         budget = min_t(u32, num_entries, cq->max_cql);
>         num_entries = budget;
>
> --
> 2.52.0
>

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5473 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH rdma-next 42/50] RDMA/bnxt_re: Complete CQ resize in a single step
  2026-02-16  3:59   ` Selvin Xavier
@ 2026-02-16  8:07     ` Leon Romanovsky
  2026-02-17  5:02       ` Selvin Xavier
  0 siblings, 1 reply; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-16  8:07 UTC (permalink / raw)
  To: Selvin Xavier
  Cc: Jason Gunthorpe, Kalesh AP, Potnuri Bharat Teja, Michael Margolin,
	Gal Pressman, Yossi Leybovich, Cheng Xu, Kai Shen,
	Chengchang Tang, Junxian Huang, Abhijit Gangurde, Allen Hubbe,
	Krzysztof Czurylo, Tatyana Nikolova, Long Li, Konstantin Taranov,
	Yishai Hadas, Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun,
	linux-kernel, linux-rdma, linux-hyperv

On Mon, Feb 16, 2026 at 09:29:29AM +0530, Selvin Xavier wrote:
> On Fri, Feb 13, 2026 at 4:31 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > From: Leon Romanovsky <leonro@nvidia.com>
> >
> > There is no need to defer the CQ resize operation, as it is intended to
> > be completed in one pass. The current bnxt_re_resize_cq() implementation
> > does not handle concurrent CQ resize requests, and this will be addressed
> > in the following patches.
> bnxt HW requires that the previous CQ memory stay available to the HW
> until the HW generates a cut-off CQE on the CQ that is being destroyed.
> This is the reason for polling the completions in the user library
> after returning from the resize_cq call. Once the polling thread sees
> the expected CQE, it will invoke the driver to free the CQ memory.

This flow is problematic. It requires the kernel to trust a user‑space
application, which is not acceptable. There is no guarantee that the
rdma-core implementation is correct or will invoke the interface properly.
Users can bypass rdma-core entirely and issue ioctls directly (syzkaller,
custom rdma-core variants, etc.), leading to umem leaks, races that overwrite
kernel memory, and access to fields that are now being modified. All of this
can occur silently and without any protections.

> So ib_umem_release should wait. This patch doesn't guarantee that.

The issue is that it was never guaranteed in the first place. It only appeared
to work under very controlled conditions.

> Do you think there is a better way to handle this requirement?

You should wait for BNXT_RE_WC_TYPE_COFF in the kernel before returning
from resize_cq.
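
Roughly (untested sketch; bnxt_re_wait_for_cutoff() is a hypothetical
helper that blocks until the cut-off CQE for the old ring is observed):

	rc = bnxt_re_wait_for_cutoff(cq);	/* hypothetical helper */
	if (rc)
		goto fail;
	/* only now is it safe to release the old CQ memory */
	ib_umem_release(cq->ib_cq.umem);
	cq->ib_cq.umem = cq->resize_umem;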

Thanks

> 
> >
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >  drivers/infiniband/hw/bnxt_re/ib_verbs.c | 33 +++++++++-----------------------
> >  1 file changed, 9 insertions(+), 24 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > index d652018c19b3..2aecfbbb7eaf 100644
> > --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > @@ -3309,20 +3309,6 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
> >         return rc;
> >  }
> >
> > -static void bnxt_re_resize_cq_complete(struct bnxt_re_cq *cq)
> > -{
> > -       struct bnxt_re_dev *rdev = cq->rdev;
> > -
> > -       bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
> > -
> > -       cq->qplib_cq.max_wqe = cq->resize_cqe;
> > -       if (cq->resize_umem) {
> > -               ib_umem_release(cq->ib_cq.umem);
> > -               cq->ib_cq.umem = cq->resize_umem;
> > -               cq->resize_umem = NULL;
> > -               cq->resize_cqe = 0;
> > -       }
> > -}
> >
> >  int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
> >                       struct ib_udata *udata)
> > @@ -3387,7 +3373,15 @@ int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
> >                 goto fail;
> >         }
> >
> > -       cq->ib_cq.cqe = cq->resize_cqe;
> > +       bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
> > +
> > +       cq->qplib_cq.max_wqe = cq->resize_cqe;
> > +       ib_umem_release(cq->ib_cq.umem);
> > +       cq->ib_cq.umem = cq->resize_umem;
> > +       cq->resize_umem = NULL;
> > +       cq->resize_cqe = 0;
> > +
> > +       cq->ib_cq.cqe = entries;
> >         atomic_inc(&rdev->stats.res.resize_count);
> >
> >         return 0;
> > @@ -3907,15 +3901,6 @@ int bnxt_re_poll_cq(struct ib_cq *ib_cq, int num_entries, struct ib_wc *wc)
> >         struct bnxt_re_sqp_entries *sqp_entry = NULL;
> >         unsigned long flags;
> >
> > -       /* User CQ; the only processing we do is to
> > -        * complete any pending CQ resize operation.
> > -        */
> > -       if (cq->ib_cq.umem) {
> > -               if (cq->resize_umem)
> > -                       bnxt_re_resize_cq_complete(cq);
> > -               return 0;
> > -       }
> > -
> >         spin_lock_irqsave(&cq->cq_lock, flags);
> >         budget = min_t(u32, num_entries, cq->max_cql);
> >         num_entries = budget;
> >
> > --
> > 2.52.0
> >



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH rdma-next 42/50] RDMA/bnxt_re: Complete CQ resize in a single step
  2026-02-16  8:07     ` Leon Romanovsky
@ 2026-02-17  5:02       ` Selvin Xavier
  2026-02-17  7:56         ` Leon Romanovsky
  0 siblings, 1 reply; 73+ messages in thread
From: Selvin Xavier @ 2026-02-17  5:02 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jason Gunthorpe, Kalesh AP, Potnuri Bharat Teja, Michael Margolin,
	Gal Pressman, Yossi Leybovich, Cheng Xu, Kai Shen,
	Chengchang Tang, Junxian Huang, Abhijit Gangurde, Allen Hubbe,
	Krzysztof Czurylo, Tatyana Nikolova, Long Li, Konstantin Taranov,
	Yishai Hadas, Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun,
	linux-kernel, linux-rdma, linux-hyperv

[-- Attachment #1: Type: text/plain, Size: 4916 bytes --]

On Mon, Feb 16, 2026 at 1:37 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Mon, Feb 16, 2026 at 09:29:29AM +0530, Selvin Xavier wrote:
> > On Fri, Feb 13, 2026 at 4:31 PM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > From: Leon Romanovsky <leonro@nvidia.com>
> > >
> > > There is no need to defer the CQ resize operation, as it is intended to
> > > be completed in one pass. The current bnxt_re_resize_cq() implementation
> > > does not handle concurrent CQ resize requests, and this will be addressed
> > > in the following patches.
> > bnxt HW requires that the previous CQ memory stay available to the HW
> > until the HW generates a cut-off CQE on the CQ that is being destroyed.
> > This is the reason for polling the completions in the user library
> > after returning from the resize_cq call. Once the polling thread sees
> > the expected CQE, it will invoke the driver to free the CQ memory.
>
> This flow is problematic. It requires the kernel to trust a user‑space
> application, which is not acceptable. There is no guarantee that the
> rdma-core implementation is correct or will invoke the interface properly.
> Users can bypass rdma-core entirely and issue ioctls directly (syzkaller,
> custom rdma-core variants, etc.), leading to umem leaks, races that overwrite
> kernel memory, and access to fields that are now being modified. All of this
> can occur silently and without any protections.
>
> > So ib_umem_release should wait. This patch doesn't guarantee that.
>
> The issue is that it was never guaranteed in the first place. It only appeared
> to work under very controlled conditions.
>
> > Do you think there is a better way to handle this requirement?
>
> You should wait for BNXT_RE_WC_TYPE_COFF in the kernel before returning
> from resize_cq.
The difficulty is that libbnxt_re in rdma-core has the queue and the
consumer index used for completion lookup. The driver therefore has to
use copy_from_user to read the queue memory and then check for
BNXT_RE_WC_TYPE_COFF, along with the queue consumer index and the
relevant validity flags. I'll explore if we have a way to handle this
and get back.
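
For reference, that access would look roughly like this on the kernel
side (sketch only; the CQE layout and the queue/index variables here
are illustrative, not the real libbnxt_re ABI):

	struct bnxt_re_user_cqe cqe;	/* illustrative layout */

	if (copy_from_user(&cqe, uqueue + cons_idx * sizeof(cqe),
			   sizeof(cqe)))
		return -EFAULT;
	/* check the validity/phase bit, then look for BNXT_RE_WC_TYPE_COFF */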
>
> Thanks
>
> >
> > >
> > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > ---
> > >  drivers/infiniband/hw/bnxt_re/ib_verbs.c | 33 +++++++++-----------------------
> > >  1 file changed, 9 insertions(+), 24 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > > index d652018c19b3..2aecfbbb7eaf 100644
> > > --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > > +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > > @@ -3309,20 +3309,6 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
> > >         return rc;
> > >  }
> > >
> > > -static void bnxt_re_resize_cq_complete(struct bnxt_re_cq *cq)
> > > -{
> > > -       struct bnxt_re_dev *rdev = cq->rdev;
> > > -
> > > -       bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
> > > -
> > > -       cq->qplib_cq.max_wqe = cq->resize_cqe;
> > > -       if (cq->resize_umem) {
> > > -               ib_umem_release(cq->ib_cq.umem);
> > > -               cq->ib_cq.umem = cq->resize_umem;
> > > -               cq->resize_umem = NULL;
> > > -               cq->resize_cqe = 0;
> > > -       }
> > > -}
> > >
> > >  int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
> > >                       struct ib_udata *udata)
> > > @@ -3387,7 +3373,15 @@ int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
> > >                 goto fail;
> > >         }
> > >
> > > -       cq->ib_cq.cqe = cq->resize_cqe;
> > > +       bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
> > > +
> > > +       cq->qplib_cq.max_wqe = cq->resize_cqe;
> > > +       ib_umem_release(cq->ib_cq.umem);
> > > +       cq->ib_cq.umem = cq->resize_umem;
> > > +       cq->resize_umem = NULL;
> > > +       cq->resize_cqe = 0;
> > > +
> > > +       cq->ib_cq.cqe = entries;
> > >         atomic_inc(&rdev->stats.res.resize_count);
> > >
> > >         return 0;
> > > @@ -3907,15 +3901,6 @@ int bnxt_re_poll_cq(struct ib_cq *ib_cq, int num_entries, struct ib_wc *wc)
> > >         struct bnxt_re_sqp_entries *sqp_entry = NULL;
> > >         unsigned long flags;
> > >
> > > -       /* User CQ; the only processing we do is to
> > > -        * complete any pending CQ resize operation.
> > > -        */
> > > -       if (cq->ib_cq.umem) {
> > > -               if (cq->resize_umem)
> > > -                       bnxt_re_resize_cq_complete(cq);
> > > -               return 0;
> > > -       }
> > > -
> > >         spin_lock_irqsave(&cq->cq_lock, flags);
> > >         budget = min_t(u32, num_entries, cq->max_cql);
> > >         num_entries = budget;
> > >
> > > --
> > > 2.52.0
> > >
>
>

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5473 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH rdma-next 42/50] RDMA/bnxt_re: Complete CQ resize in a single step
  2026-02-17  5:02       ` Selvin Xavier
@ 2026-02-17  7:56         ` Leon Romanovsky
  2026-02-17 10:52           ` Selvin Xavier
  0 siblings, 1 reply; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-17  7:56 UTC (permalink / raw)
  To: Selvin Xavier
  Cc: Jason Gunthorpe, Kalesh AP, Potnuri Bharat Teja, Michael Margolin,
	Gal Pressman, Yossi Leybovich, Cheng Xu, Kai Shen,
	Chengchang Tang, Junxian Huang, Abhijit Gangurde, Allen Hubbe,
	Krzysztof Czurylo, Tatyana Nikolova, Long Li, Konstantin Taranov,
	Yishai Hadas, Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun,
	linux-kernel, linux-rdma, linux-hyperv

On Tue, Feb 17, 2026 at 10:32:25AM +0530, Selvin Xavier wrote:
> On Mon, Feb 16, 2026 at 1:37 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Mon, Feb 16, 2026 at 09:29:29AM +0530, Selvin Xavier wrote:
> > > On Fri, Feb 13, 2026 at 4:31 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > From: Leon Romanovsky <leonro@nvidia.com>
> > > >
> > > > There is no need to defer the CQ resize operation, as it is intended to
> > > > be completed in one pass. The current bnxt_re_resize_cq() implementation
> > > > does not handle concurrent CQ resize requests, and this will be addressed
> > > > in the following patches.
> > > bnxt HW requires that the previous CQ memory stay available to the HW
> > > until the HW generates a cut-off CQE on the CQ that is being destroyed.
> > > This is the reason for polling the completions in the user library
> > > after returning from the resize_cq call. Once the polling thread sees
> > > the expected CQE, it will invoke the driver to free the CQ memory.
> >
> > This flow is problematic. It requires the kernel to trust a user‑space
> > application, which is not acceptable. There is no guarantee that the
> > rdma-core implementation is correct or will invoke the interface properly.
> > Users can bypass rdma-core entirely and issue ioctls directly (syzkaller,
> > custom rdma-core variants, etc.), leading to umem leaks, races that overwrite
> > kernel memory, and access to fields that are now being modified. All of this
> > can occur silently and without any protections.
> >
> > > So ib_umem_release should wait. This patch doesn't guarantee that.
> >
> > The issue is that it was never guaranteed in the first place. It only appeared
> > to work under very controlled conditions.
> >
> > > Do you think there is a better way to handle this requirement?
> >
> > You should wait for BNXT_RE_WC_TYPE_COFF in the kernel before returning
> > from resize_cq.
> The difficulty is that libbnxt_re in rdma-core has the queue and the
> consumer index used for completion lookup. The driver therefore has to
> use copy_from_user to read the queue memory and then check for
> BNXT_RE_WC_TYPE_COFF, along with the queue consumer index and the
> relevant validity flags. I'll explore if we have a way to handle this
> and get back.

The thing is that you need to ensure that after libbnxt_re issues the
resize_cq command, the kernel won't require anything from user-space.

Can you make your HW stop generating CQEs before resize_cq?

Thanks

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH rdma-next 42/50] RDMA/bnxt_re: Complete CQ resize in a single step
  2026-02-17  7:56         ` Leon Romanovsky
@ 2026-02-17 10:52           ` Selvin Xavier
  2026-02-19  8:02             ` Selvin Xavier
  0 siblings, 1 reply; 73+ messages in thread
From: Selvin Xavier @ 2026-02-17 10:52 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jason Gunthorpe, Kalesh AP, Potnuri Bharat Teja, Michael Margolin,
	Gal Pressman, Yossi Leybovich, Cheng Xu, Kai Shen,
	Chengchang Tang, Junxian Huang, Abhijit Gangurde, Allen Hubbe,
	Krzysztof Czurylo, Tatyana Nikolova, Long Li, Konstantin Taranov,
	Yishai Hadas, Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun,
	linux-kernel, linux-rdma, linux-hyperv

[-- Attachment #1: Type: text/plain, Size: 2851 bytes --]

On Tue, Feb 17, 2026 at 1:27 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Tue, Feb 17, 2026 at 10:32:25AM +0530, Selvin Xavier wrote:
> > On Mon, Feb 16, 2026 at 1:37 PM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > On Mon, Feb 16, 2026 at 09:29:29AM +0530, Selvin Xavier wrote:
> > > > On Fri, Feb 13, 2026 at 4:31 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > > >
> > > > > From: Leon Romanovsky <leonro@nvidia.com>
> > > > >
> > > > > There is no need to defer the CQ resize operation, as it is intended to
> > > > > be completed in one pass. The current bnxt_re_resize_cq() implementation
> > > > > does not handle concurrent CQ resize requests, and this will be addressed
> > > > > in the following patches.
> > > > bnxt HW requires that the previous CQ memory stay available to the HW
> > > > until the HW generates a cut-off CQE on the CQ that is being destroyed.
> > > > This is the reason for polling the completions in the user library
> > > > after returning from the resize_cq call. Once the polling thread sees
> > > > the expected CQE, it will invoke the driver to free the CQ memory.
> > >
> > > This flow is problematic. It requires the kernel to trust a user‑space
> > > application, which is not acceptable. There is no guarantee that the
> > > rdma-core implementation is correct or will invoke the interface properly.
> > > Users can bypass rdma-core entirely and issue ioctls directly (syzkaller,
> > > custom rdma-core variants, etc.), leading to umem leaks, races that overwrite
> > > kernel memory, and access to fields that are now being modified. All of this
> > > can occur silently and without any protections.
> > >
> > > > So ib_umem_release should wait. This patch doesn't guarantee that.
> > >
> > > The issue is that it was never guaranteed in the first place. It only appeared
> > > to work under very controlled conditions.
> > >
> > > > Do you think there is a better way to handle this requirement?
> > >
> > > You should wait for BNXT_RE_WC_TYPE_COFF in the kernel before returning
> > > from resize_cq.
> > The difficulty is that libbnxt_re in rdma-core has the queue and the
> > consumer index used for completion lookup. The driver therefore has to
> > use copy_from_user to read the queue memory and then check for
> > BNXT_RE_WC_TYPE_COFF, along with the queue consumer index and the
> > relevant validity flags. I'll explore if we have a way to handle this
> > and get back.
>
> The thing is that you need to ensure that after libbnxt_re issues the
> resize_cq command, the kernel won't require anything from user-space.
>
> Can you make your HW stop generating CQEs before resize_cq?
we don't have this control (especially on the Receive CQ side). For
the Tx side, maybe we can prevent posting to the Tx queue.
>
> Thanks

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5473 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH rdma-next 42/50] RDMA/bnxt_re: Complete CQ resize in a single step
  2026-02-17 10:52           ` Selvin Xavier
@ 2026-02-19  8:02             ` Selvin Xavier
  0 siblings, 0 replies; 73+ messages in thread
From: Selvin Xavier @ 2026-02-19  8:02 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jason Gunthorpe, Kalesh AP, Potnuri Bharat Teja, Michael Margolin,
	Gal Pressman, Yossi Leybovich, Cheng Xu, Kai Shen,
	Chengchang Tang, Junxian Huang, Abhijit Gangurde, Allen Hubbe,
	Krzysztof Czurylo, Tatyana Nikolova, Long Li, Konstantin Taranov,
	Yishai Hadas, Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun,
	linux-kernel, linux-rdma, linux-hyperv

[-- Attachment #1: Type: text/plain, Size: 3316 bytes --]

On Tue, Feb 17, 2026 at 4:22 PM Selvin Xavier
<selvin.xavier@broadcom.com> wrote:
>
> On Tue, Feb 17, 2026 at 1:27 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Tue, Feb 17, 2026 at 10:32:25AM +0530, Selvin Xavier wrote:
> > > On Mon, Feb 16, 2026 at 1:37 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > On Mon, Feb 16, 2026 at 09:29:29AM +0530, Selvin Xavier wrote:
> > > > > On Fri, Feb 13, 2026 at 4:31 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > > > >
> > > > > > From: Leon Romanovsky <leonro@nvidia.com>
> > > > > >
> > > > > > There is no need to defer the CQ resize operation, as it is intended to
> > > > > > be completed in one pass. The current bnxt_re_resize_cq() implementation
> > > > > > does not handle concurrent CQ resize requests, and this will be addressed
> > > > > > in the following patches.
> > > > > bnxt HW requires that the previous CQ memory stay available to the HW
> > > > > until the HW generates a cut-off CQE on the CQ that is being destroyed.
> > > > > This is the reason for polling the completions in the user library
> > > > > after returning from the resize_cq call. Once the polling thread sees
> > > > > the expected CQE, it will invoke the driver to free the CQ memory.
> > > >
> > > > This flow is problematic. It requires the kernel to trust a user‑space
> > > > application, which is not acceptable. There is no guarantee that the
> > > > rdma-core implementation is correct or will invoke the interface properly.
> > > > Users can bypass rdma-core entirely and issue ioctls directly (syzkaller,
> > > > custom rdma-core variants, etc.), leading to umem leaks, races that overwrite
> > > > kernel memory, and access to fields that are now being modified. All of this
> > > > can occur silently and without any protections.
> > > >
> > > > > So ib_umem_release should wait. This patch doesn't guarantee that.
> > > >
> > > > The issue is that it was never guaranteed in the first place. It only appeared
> > > > to work under very controlled conditions.
> > > >
> > > > > Do you think there is a better way to handle this requirement?
> > > >
> > > > You should wait for BNXT_RE_WC_TYPE_COFF in the kernel before returning
> > > > from resize_cq.
> > > The difficulty is that libbnxt_re in rdma-core has the queue and the
> > > consumer index used for completion lookup. The driver therefore has to
> > > use copy_from_user to read the queue memory and then check for
> > > BNXT_RE_WC_TYPE_COFF, along with the queue consumer index and the
> > > relevant validity flags. I'll explore if we have a way to handle this
> > > and get back.
> >
> > The thing is that you need to ensure that after libbnxt_re issues the
> > resize_cq command, the kernel won't require anything from user-space.
> >
> > Can you make your HW stop generating CQEs before resize_cq?
> we don't have this control (especially on the Receive CQ side). For
> the Tx side, maybe we can prevent posting to the Tx queue.
After discussing with other teams internally, we feel that the
sequence you suggested should work fine. As per that sequence,
BNXT_RE_WC_TYPE_COFF should be available by the time the resize request
returns from the FW. We will test your series and confirm the above
behavior.
> >
> > Thanks

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5473 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH rdma-next 26/50] RDMA/erdma: Separate user and kernel CQ creation paths
  2026-02-13 10:58 ` [PATCH rdma-next 26/50] RDMA/erdma: Separate user and kernel CQ creation paths Leon Romanovsky
@ 2026-02-24  2:20   ` Cheng Xu
  2026-02-24 10:46     ` Leon Romanovsky
  2026-02-26  6:17   ` Junxian Huang
  1 sibling, 1 reply; 73+ messages in thread
From: Cheng Xu @ 2026-02-24  2:20 UTC (permalink / raw)
  To: Leon Romanovsky, Jason Gunthorpe, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Kai Shen, Chengchang Tang, Junxian Huang,
	Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv



On 2/13/26 6:58 PM, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Split CQ creation into distinct kernel and user flows. The hns driver,
> inherited from mlx4, uses a problematic pattern that shares and caches
> umem in hns_roce_db_map_user(). This design blocks the driver from
> supporting generic umem sources (VMA, dmabuf, memfd, and others).
> 
> In addition, let's delete the counter that counts CQ creation errors.
> There are multiple ways to debug a modern kernel without needing to
> rely on that debugfs counter.
> 
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/infiniband/hw/hns/hns_roce_cq.c      | 103 ++++++++++++++++++++-------
>  drivers/infiniband/hw/hns/hns_roce_debugfs.c |   1 -
>  drivers/infiniband/hw/hns/hns_roce_device.h  |   3 +-
>  drivers/infiniband/hw/hns/hns_roce_main.c    |   1 +
>  4 files changed, 82 insertions(+), 26 deletions(-)
> 

Hi Leon,

The driver name in this patch's title should be "RDMA/hns".

Thanks,
Cheng Xu

> diff --git a/drivers/infiniband/hw/hns/hns_roce_cq.c b/drivers/infiniband/hw/hns/hns_roce_cq.c
> index 857a913326cd..0f24a916466b 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_cq.c
> +++ b/drivers/infiniband/hw/hns/hns_roce_cq.c
> @@ -335,7 +335,10 @@ static int verify_cq_create_attr(struct hns_roce_dev *hr_dev,
>  {
>  	struct ib_device *ibdev = &hr_dev->ib_dev;
>  
> -	if (!attr->cqe || attr->cqe > hr_dev->caps.max_cqes) {
> +	if (attr->flags)
> +		return -EOPNOTSUPP;
> +
> +	if (attr->cqe > hr_dev->caps.max_cqes) {
>  		ibdev_err(ibdev, "failed to check CQ count %u, max = %u.\n",
>  			  attr->cqe, hr_dev->caps.max_cqes);
>  		return -EINVAL;
> @@ -407,8 +410,8 @@ static int set_cqe_size(struct hns_roce_cq *hr_cq, struct ib_udata *udata,
>  	return 0;
>  }
>  
> -int hns_roce_create_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
> -		       struct uverbs_attr_bundle *attrs)
> +int hns_roce_create_user_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
> +			    struct uverbs_attr_bundle *attrs)
>  {
>  	struct hns_roce_dev *hr_dev = to_hr_dev(ib_cq->device);
>  	struct ib_udata *udata = &attrs->driver_udata;
> @@ -418,31 +421,27 @@ int hns_roce_create_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
>  	struct hns_roce_ib_create_cq ucmd = {};
>  	int ret;
>  
> -	if (attr->flags) {
> -		ret = -EOPNOTSUPP;
> -		goto err_out;
> -	}
> +	if (ib_cq->umem)
> +		return -EOPNOTSUPP;
>  
>  	ret = verify_cq_create_attr(hr_dev, attr);
>  	if (ret)
> -		goto err_out;
> +		return ret;
>  
> -	if (udata) {
> -		ret = get_cq_ucmd(hr_cq, udata, &ucmd);
> -		if (ret)
> -			goto err_out;
> -	}
> +	ret = get_cq_ucmd(hr_cq, udata, &ucmd);
> +	if (ret)
> +		return ret;
>  
>  	set_cq_param(hr_cq, attr->cqe, attr->comp_vector, &ucmd);
>  
>  	ret = set_cqe_size(hr_cq, udata, &ucmd);
>  	if (ret)
> -		goto err_out;
> +		return ret;
>  
>  	ret = alloc_cq_buf(hr_dev, hr_cq, udata, ucmd.buf_addr);
>  	if (ret) {
>  		ibdev_err(ibdev, "failed to alloc CQ buf, ret = %d.\n", ret);
> -		goto err_out;
> +		return ret;
>  	}
>  
>  	ret = alloc_cq_db(hr_dev, hr_cq, udata, ucmd.db_addr, &resp);
> @@ -464,13 +463,11 @@ int hns_roce_create_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
>  		goto err_cqn;
>  	}
>  
> -	if (udata) {
> -		resp.cqn = hr_cq->cqn;
> -		ret = ib_copy_to_udata(udata, &resp,
> -				       min(udata->outlen, sizeof(resp)));
> -		if (ret)
> -			goto err_cqc;
> -	}
> +	resp.cqn = hr_cq->cqn;
> +	ret = ib_copy_to_udata(udata, &resp,
> +			       min(udata->outlen, sizeof(resp)));
> +	if (ret)
> +		goto err_cqc;
>  
>  	hr_cq->cons_index = 0;
>  	hr_cq->arm_sn = 1;
> @@ -487,9 +484,67 @@ int hns_roce_create_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
>  	free_cq_db(hr_dev, hr_cq, udata);
>  err_cq_buf:
>  	free_cq_buf(hr_dev, hr_cq);
> -err_out:
> -	atomic64_inc(&hr_dev->dfx_cnt[HNS_ROCE_DFX_CQ_CREATE_ERR_CNT]);
> +	return ret;
> +}
> +
> +int hns_roce_create_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
> +		       struct uverbs_attr_bundle *attrs)
> +{
> +	struct hns_roce_dev *hr_dev = to_hr_dev(ib_cq->device);
> +	struct hns_roce_ib_create_cq_resp resp = {};
> +	struct hns_roce_cq *hr_cq = to_hr_cq(ib_cq);
> +	struct ib_device *ibdev = &hr_dev->ib_dev;
> +	struct hns_roce_ib_create_cq ucmd = {};
> +	int ret;
> +
> +	ret = verify_cq_create_attr(hr_dev, attr);
> +	if (ret)
> +		return ret;
> +
> +	set_cq_param(hr_cq, attr->cqe, attr->comp_vector, &ucmd);
> +
> +	ret = set_cqe_size(hr_cq, NULL, &ucmd);
> +	if (ret)
> +		return ret;
>  
> +	ret = alloc_cq_buf(hr_dev, hr_cq, NULL, 0);
> +	if (ret) {
> +		ibdev_err(ibdev, "failed to alloc CQ buf, ret = %d.\n", ret);
> +		return ret;
> +	}
> +
> +	ret = alloc_cq_db(hr_dev, hr_cq, NULL, 0, &resp);
> +	if (ret) {
> +		ibdev_err(ibdev, "failed to alloc CQ db, ret = %d.\n", ret);
> +		goto err_cq_buf;
> +	}
> +
> +	ret = alloc_cqn(hr_dev, hr_cq, NULL);
> +	if (ret) {
> +		ibdev_err(ibdev, "failed to alloc CQN, ret = %d.\n", ret);
> +		goto err_cq_db;
> +	}
> +
> +	ret = alloc_cqc(hr_dev, hr_cq);
> +	if (ret) {
> +		ibdev_err(ibdev,
> +			  "failed to alloc CQ context, ret = %d.\n", ret);
> +		goto err_cqn;
> +	}
> +
> +	hr_cq->cons_index = 0;
> +	hr_cq->arm_sn = 1;
> +	refcount_set(&hr_cq->refcount, 1);
> +	init_completion(&hr_cq->free);
> +
> +	return 0;
> +
> +err_cqn:
> +	free_cqn(hr_dev, hr_cq->cqn);
> +err_cq_db:
> +	free_cq_db(hr_dev, hr_cq, NULL);
> +err_cq_buf:
> +	free_cq_buf(hr_dev, hr_cq);
>  	return ret;
>  }
>  
> diff --git a/drivers/infiniband/hw/hns/hns_roce_debugfs.c b/drivers/infiniband/hw/hns/hns_roce_debugfs.c
> index b869cdc54118..481b30f2f5b5 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_debugfs.c
> +++ b/drivers/infiniband/hw/hns/hns_roce_debugfs.c
> @@ -47,7 +47,6 @@ static const char * const sw_stat_info[] = {
>  	[HNS_ROCE_DFX_MBX_EVENT_CNT] = "mbx_event",
>  	[HNS_ROCE_DFX_QP_CREATE_ERR_CNT] = "qp_create_err",
>  	[HNS_ROCE_DFX_QP_MODIFY_ERR_CNT] = "qp_modify_err",
> -	[HNS_ROCE_DFX_CQ_CREATE_ERR_CNT] = "cq_create_err",
>  	[HNS_ROCE_DFX_CQ_MODIFY_ERR_CNT] = "cq_modify_err",
>  	[HNS_ROCE_DFX_SRQ_CREATE_ERR_CNT] = "srq_create_err",
>  	[HNS_ROCE_DFX_SRQ_MODIFY_ERR_CNT] = "srq_modify_err",
> diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
> index 3f032b8038af..fdc5f487d7a3 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_device.h
> +++ b/drivers/infiniband/hw/hns/hns_roce_device.h
> @@ -902,7 +902,6 @@ enum hns_roce_sw_dfx_stat_index {
>  	HNS_ROCE_DFX_MBX_EVENT_CNT,
>  	HNS_ROCE_DFX_QP_CREATE_ERR_CNT,
>  	HNS_ROCE_DFX_QP_MODIFY_ERR_CNT,
> -	HNS_ROCE_DFX_CQ_CREATE_ERR_CNT,
>  	HNS_ROCE_DFX_CQ_MODIFY_ERR_CNT,
>  	HNS_ROCE_DFX_SRQ_CREATE_ERR_CNT,
>  	HNS_ROCE_DFX_SRQ_MODIFY_ERR_CNT,
> @@ -1295,6 +1294,8 @@ int to_hr_qp_type(int qp_type);
>  
>  int hns_roce_create_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
>  		       struct uverbs_attr_bundle *attrs);
> +int hns_roce_create_user_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
> +			    struct uverbs_attr_bundle *attrs);
>  
>  int hns_roce_destroy_cq(struct ib_cq *ib_cq, struct ib_udata *udata);
>  int hns_roce_db_map_user(struct hns_roce_ucontext *context, unsigned long virt,
> diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
> index a3490bab297a..64de49bf8df7 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_main.c
> +++ b/drivers/infiniband/hw/hns/hns_roce_main.c
> @@ -727,6 +727,7 @@ static const struct ib_device_ops hns_roce_dev_ops = {
>  	.create_ah = hns_roce_create_ah,
>  	.create_user_ah = hns_roce_create_ah,
>  	.create_cq = hns_roce_create_cq,
> +	.create_user_cq = hns_roce_create_user_cq,
>  	.create_qp = hns_roce_create_qp,
>  	.dealloc_pd = hns_roce_dealloc_pd,
>  	.dealloc_ucontext = hns_roce_dealloc_ucontext,
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH rdma-next 18/50] RDMA/erdma: Separate user and kernel CQ creation paths
  2026-02-13 10:57 ` [PATCH rdma-next 18/50] RDMA/erdma: Separate " Leon Romanovsky
@ 2026-02-24  5:51   ` Cheng Xu
  2026-02-24 10:57     ` Leon Romanovsky
  0 siblings, 1 reply; 73+ messages in thread
From: Cheng Xu @ 2026-02-24  5:51 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: linux-kernel, linux-rdma, linux-hyperv



On 2/13/26 6:57 PM, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Split CQ creation into distinct kernel and user flows. The erdma driver,
> inherited from mlx4, uses a problematic pattern that shares and caches
> umem in erdma_map_user_dbrecords(). This design blocks the driver from
> supporting generic umem sources (VMA, dmabuf, memfd, and others).
> 
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/infiniband/hw/erdma/erdma_main.c  |  1 +
>  drivers/infiniband/hw/erdma/erdma_verbs.c | 97 ++++++++++++++++++++-----------
>  drivers/infiniband/hw/erdma/erdma_verbs.h |  2 +
>  3 files changed, 67 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/erdma/erdma_main.c b/drivers/infiniband/hw/erdma/erdma_main.c
> index f35b30235018..1b6426e89d80 100644
> --- a/drivers/infiniband/hw/erdma/erdma_main.c
> +++ b/drivers/infiniband/hw/erdma/erdma_main.c
> @@ -505,6 +505,7 @@ static const struct ib_device_ops erdma_device_ops = {
>  	.alloc_pd = erdma_alloc_pd,
>  	.alloc_ucontext = erdma_alloc_ucontext,
>  	.create_cq = erdma_create_cq,
> +	.create_user_cq = erdma_create_user_cq,
>  	.create_qp = erdma_create_qp,
>  	.dealloc_pd = erdma_dealloc_pd,
>  	.dealloc_ucontext = erdma_dealloc_ucontext,

<...>

> +
> +int erdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
> +		    struct uverbs_attr_bundle *attrs)

create_cq will now be used only for kernel CQ creation, so the third
parameter 'struct uverbs_attr_bundle *attrs' becomes useless and can be
removed? The same applies to all drivers.
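
e.g. a sketch of what the slimmed-down kernel-only op could look like:

	int erdma_create_cq(struct ib_cq *ibcq,
			    const struct ib_cq_init_attr *attr);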


> +{

<...>

> +	ret = create_cq_cmd(NULL, cq);
> +	if (ret)
> +		goto err_free_res;


In create_cq_cmd, we should add the following change:

diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.c b/drivers/infiniband/hw/erdma/erdma_verbs.c
index 8c30df61ae3d..eca28524e04b 100644
--- a/drivers/infiniband/hw/erdma/erdma_verbs.c
+++ b/drivers/infiniband/hw/erdma/erdma_verbs.c
@@ -240,7 +240,7 @@ static int create_cq_cmd(struct erdma_ucontext *uctx, struct erdma_cq *cq)
                req.first_page_offset = mem->page_offset;
                req.cq_dbrec_dma = cq->user_cq.dbrec_dma;
 
-               if (uctx->ext_db.enable) {
+               if (uctx && uctx->ext_db.enable) {
                        req.cfg1 |= FIELD_PREP(
                                ERDMA_CMD_CREATE_CQ_MTT_DB_CFG_MASK, 1);
                        req.cfg2 = FIELD_PREP(ERDMA_CMD_CREATE_CQ_DB_CFG_MASK,


Thanks,
Cheng Xu


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: [PATCH rdma-next 42/50] RDMA/bnxt_re: Complete CQ resize in a single step
  2026-02-13 10:58 ` [PATCH rdma-next 42/50] RDMA/bnxt_re: Complete CQ resize in a single step Leon Romanovsky
  2026-02-16  3:59   ` Selvin Xavier
@ 2026-02-24  8:15   ` Selvin Xavier
  2026-02-24 10:59     ` Leon Romanovsky
  1 sibling, 1 reply; 73+ messages in thread
From: Selvin Xavier @ 2026-02-24  8:15 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jason Gunthorpe, Kalesh AP, Potnuri Bharat Teja, Michael Margolin,
	Gal Pressman, Yossi Leybovich, Cheng Xu, Kai Shen,
	Chengchang Tang, Junxian Huang, Abhijit Gangurde, Allen Hubbe,
	Krzysztof Czurylo, Tatyana Nikolova, Long Li, Konstantin Taranov,
	Yishai Hadas, Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun,
	linux-kernel, linux-rdma, linux-hyperv

[-- Attachment #1: Type: text/plain, Size: 3239 bytes --]

On Fri, Feb 13, 2026 at 4:31 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> From: Leon Romanovsky <leonro@nvidia.com>
>
> There is no need to defer the CQ resize operation, as it is intended to
> be completed in one pass. The current bnxt_re_resize_cq() implementation
> does not handle concurrent CQ resize requests, and this will be addressed
> in the following patches.
>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/infiniband/hw/bnxt_re/ib_verbs.c | 33 +++++++++-----------------------
>  1 file changed, 9 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> index d652018c19b3..2aecfbbb7eaf 100644
> --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> @@ -3309,20 +3309,6 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
>         return rc;
>  }
>
> -static void bnxt_re_resize_cq_complete(struct bnxt_re_cq *cq)
> -{
> -       struct bnxt_re_dev *rdev = cq->rdev;
> -
> -       bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
> -
> -       cq->qplib_cq.max_wqe = cq->resize_cqe;
> -       if (cq->resize_umem) {
> -               ib_umem_release(cq->ib_cq.umem);
> -               cq->ib_cq.umem = cq->resize_umem;
> -               cq->resize_umem = NULL;
> -               cq->resize_cqe = 0;
> -       }
> -}
>
>  int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
>                       struct ib_udata *udata)
> @@ -3387,7 +3373,15 @@ int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
>                 goto fail;
>         }
>
> -       cq->ib_cq.cqe = cq->resize_cqe;
> +       bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
> +
> +       cq->qplib_cq.max_wqe = cq->resize_cqe;
> +       ib_umem_release(cq->ib_cq.umem);
> +       cq->ib_cq.umem = cq->resize_umem;
> +       cq->resize_umem = NULL;
> +       cq->resize_cqe = 0;
> +
> +       cq->ib_cq.cqe = entries;
>         atomic_inc(&rdev->stats.res.resize_count);
>
>         return 0;
> @@ -3907,15 +3901,6 @@ int bnxt_re_poll_cq(struct ib_cq *ib_cq, int num_entries, struct ib_wc *wc)
>         struct bnxt_re_sqp_entries *sqp_entry = NULL;
>         unsigned long flags;
>
> -       /* User CQ; the only processing we do is to
> -        * complete any pending CQ resize operation.
> -        */
> -       if (cq->ib_cq.umem) {
> -               if (cq->resize_umem)
> -                       bnxt_re_resize_cq_complete(cq);
> -               return 0;
> -       }
> -
Since this code is removed, we need to remove the ibv_cmd_poll_cq call
from the user library.
For older libraries that still call ibv_cmd_poll_cq, I think we should
keep a check; otherwise it will print "POLL CQ : no CQL to use". We
should either add the following code or remove that print.
       if (cq->ib_cq.umem)
               return 0;
Otherwise, it looks good to me.

Thanks,
Selvin




>         spin_lock_irqsave(&cq->cq_lock, flags);
>         budget = min_t(u32, num_entries, cq->max_cql);
>         num_entries = budget;
>
> --
> 2.52.0
>


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH rdma-next 26/50] RDMA/erdma: Separate user and kernel CQ creation paths
  2026-02-24  2:20   ` Cheng Xu
@ 2026-02-24 10:46     ` Leon Romanovsky
  0 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-24 10:46 UTC (permalink / raw)
  To: Cheng Xu
  Cc: Jason Gunthorpe, Selvin Xavier, Kalesh AP, Potnuri Bharat Teja,
	Michael Margolin, Gal Pressman, Yossi Leybovich, Kai Shen,
	Chengchang Tang, Junxian Huang, Abhijit Gangurde, Allen Hubbe,
	Krzysztof Czurylo, Tatyana Nikolova, Long Li, Konstantin Taranov,
	Yishai Hadas, Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun,
	linux-kernel, linux-rdma, linux-hyperv

On Tue, Feb 24, 2026 at 10:20:39AM +0800, Cheng Xu wrote:
> 
> 
> On 2/13/26 6:58 PM, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > Split CQ creation into distinct kernel and user flows. The hns driver,
> > inherited from mlx4, uses a problematic pattern that shares and caches
> > umem in hns_roce_db_map_user(). This design blocks the driver from
> > supporting generic umem sources (VMA, dmabuf, memfd, and others).
> > 
> > In addition, let's delete the counter that counts CQ creation errors. There
> > are multiple ways to debug a modern kernel without needing to rely
> > on that debugfs counter.
> > 
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >  drivers/infiniband/hw/hns/hns_roce_cq.c      | 103 ++++++++++++++++++++-------
> >  drivers/infiniband/hw/hns/hns_roce_debugfs.c |   1 -
> >  drivers/infiniband/hw/hns/hns_roce_device.h  |   3 +-
> >  drivers/infiniband/hw/hns/hns_roce_main.c    |   1 +
> >  4 files changed, 82 insertions(+), 26 deletions(-)
> > 
> 
> Hi Leon,
> 
> The driver name in this patch's title should be "RDMA/hns".

Right, thanks

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH rdma-next 18/50] RDMA/erdma: Separate user and kernel CQ creation paths
  2026-02-24  5:51   ` Cheng Xu
@ 2026-02-24 10:57     ` Leon Romanovsky
  0 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-24 10:57 UTC (permalink / raw)
  To: Cheng Xu; +Cc: linux-kernel, linux-rdma, linux-hyperv

On Tue, Feb 24, 2026 at 01:51:41PM +0800, Cheng Xu wrote:
> 
> 
> On 2/13/26 6:57 PM, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > Split CQ creation into distinct kernel and user flows. The erdma driver,
> > inherited from mlx4, uses a problematic pattern that shares and caches
> > umem in erdma_map_user_dbrecords(). This design blocks the driver from
> > supporting generic umem sources (VMA, dmabuf, memfd, and others).
> > 
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >  drivers/infiniband/hw/erdma/erdma_main.c  |  1 +
> >  drivers/infiniband/hw/erdma/erdma_verbs.c | 97 ++++++++++++++++++++-----------
> >  drivers/infiniband/hw/erdma/erdma_verbs.h |  2 +
> >  3 files changed, 67 insertions(+), 33 deletions(-)
> > 
> > diff --git a/drivers/infiniband/hw/erdma/erdma_main.c b/drivers/infiniband/hw/erdma/erdma_main.c
> > index f35b30235018..1b6426e89d80 100644
> > --- a/drivers/infiniband/hw/erdma/erdma_main.c
> > +++ b/drivers/infiniband/hw/erdma/erdma_main.c
> > @@ -505,6 +505,7 @@ static const struct ib_device_ops erdma_device_ops = {
> >  	.alloc_pd = erdma_alloc_pd,
> >  	.alloc_ucontext = erdma_alloc_ucontext,
> >  	.create_cq = erdma_create_cq,
> > +	.create_user_cq = erdma_create_user_cq,
> >  	.create_qp = erdma_create_qp,
> >  	.dealloc_pd = erdma_dealloc_pd,
> >  	.dealloc_ucontext = erdma_dealloc_ucontext,
> 
> <...>
> 
> > +
> > +int erdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
> > +		    struct uverbs_attr_bundle *attrs)
> 
> create_cq will only be used for kernel CQ creation, so the third parameter
> 'struct uverbs_attr_bundle *attrs' will be unused; can it be removed? The same
> applies to all drivers.

Yes, but only after all drivers are converted. I have that removal patch
in my v2.
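
A minimal sketch of the kernel-only prototype once the bundle argument is
dropped (illustrative only; the actual v2 removal patch may differ):

	int erdma_create_cq(struct ib_cq *ibcq,
			    const struct ib_cq_init_attr *attr);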

> 
> 
> > +{
> 
> <...>
> 
> > +	ret = create_cq_cmd(NULL, cq);
> > +	if (ret)
> > +		goto err_free_res;
> 
> 
> In create_cq_cmd(), the following change should be added:

I took a slightly different approach and inlined create_cq_cmd() into erdma_create_*_cq().
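
Roughly, the inlined shape is (a sketch, not the actual v2 diff):

	/* erdma_create_user_cq(): uctx is always valid here, so the
	 * ext_db handling stays unchanged:
	 */
	req.cq_dbrec_dma = cq->user_cq.dbrec_dma;
	if (uctx->ext_db.enable) {
		...
	}

	/* erdma_create_cq(): the kernel flow carries no uctx at all,
	 * so the ext_db branch simply does not exist there.
	 */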

Thanks

> 
> diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.c b/drivers/infiniband/hw/erdma/erdma_verbs.c
> index 8c30df61ae3d..eca28524e04b 100644
> --- a/drivers/infiniband/hw/erdma/erdma_verbs.c
> +++ b/drivers/infiniband/hw/erdma/erdma_verbs.c
> @@ -240,7 +240,7 @@ static int create_cq_cmd(struct erdma_ucontext *uctx, struct erdma_cq *cq)
>                 req.first_page_offset = mem->page_offset;
>                 req.cq_dbrec_dma = cq->user_cq.dbrec_dma;
>  
> -               if (uctx->ext_db.enable) {
> +               if (uctx && uctx->ext_db.enable) {
>                         req.cfg1 |= FIELD_PREP(
>                                 ERDMA_CMD_CREATE_CQ_MTT_DB_CFG_MASK, 1);
>                         req.cfg2 = FIELD_PREP(ERDMA_CMD_CREATE_CQ_DB_CFG_MASK,
> 
> 
> Thanks,
> Cheng Xu
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH rdma-next 42/50] RDMA/bnxt_re: Complete CQ resize in a single step
  2026-02-24  8:15   ` Selvin Xavier
@ 2026-02-24 10:59     ` Leon Romanovsky
  0 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-24 10:59 UTC (permalink / raw)
  To: Selvin Xavier
  Cc: Jason Gunthorpe, Kalesh AP, Potnuri Bharat Teja, Michael Margolin,
	Gal Pressman, Yossi Leybovich, Cheng Xu, Kai Shen,
	Chengchang Tang, Junxian Huang, Abhijit Gangurde, Allen Hubbe,
	Krzysztof Czurylo, Tatyana Nikolova, Long Li, Konstantin Taranov,
	Yishai Hadas, Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun,
	linux-kernel, linux-rdma, linux-hyperv

On Tue, Feb 24, 2026 at 01:45:42PM +0530, Selvin Xavier wrote:
> On Fri, Feb 13, 2026 at 4:31 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > From: Leon Romanovsky <leonro@nvidia.com>
> >
> > There is no need to defer the CQ resize operation, as it is intended to
> > be completed in one pass. The current bnxt_re_resize_cq() implementation
> > does not handle concurrent CQ resize requests, and this will be addressed
> > in the following patches.
> >
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >  drivers/infiniband/hw/bnxt_re/ib_verbs.c | 33 +++++++++-----------------------
> >  1 file changed, 9 insertions(+), 24 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > index d652018c19b3..2aecfbbb7eaf 100644
> > --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > @@ -3309,20 +3309,6 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
> >         return rc;
> >  }
> >
> > -static void bnxt_re_resize_cq_complete(struct bnxt_re_cq *cq)
> > -{
> > -       struct bnxt_re_dev *rdev = cq->rdev;
> > -
> > -       bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
> > -
> > -       cq->qplib_cq.max_wqe = cq->resize_cqe;
> > -       if (cq->resize_umem) {
> > -               ib_umem_release(cq->ib_cq.umem);
> > -               cq->ib_cq.umem = cq->resize_umem;
> > -               cq->resize_umem = NULL;
> > -               cq->resize_cqe = 0;
> > -       }
> > -}
> >
> >  int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
> >                       struct ib_udata *udata)
> > @@ -3387,7 +3373,15 @@ int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
> >                 goto fail;
> >         }
> >
> > -       cq->ib_cq.cqe = cq->resize_cqe;
> > +       bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
> > +
> > +       cq->qplib_cq.max_wqe = cq->resize_cqe;
> > +       ib_umem_release(cq->ib_cq.umem);
> > +       cq->ib_cq.umem = cq->resize_umem;
> > +       cq->resize_umem = NULL;
> > +       cq->resize_cqe = 0;
> > +
> > +       cq->ib_cq.cqe = entries;
> >         atomic_inc(&rdev->stats.res.resize_count);
> >
> >         return 0;
> > @@ -3907,15 +3901,6 @@ int bnxt_re_poll_cq(struct ib_cq *ib_cq, int num_entries, struct ib_wc *wc)
> >         struct bnxt_re_sqp_entries *sqp_entry = NULL;
> >         unsigned long flags;
> >
> > -       /* User CQ; the only processing we do is to
> > -        * complete any pending CQ resize operation.
> > -        */
> > -       if (cq->ib_cq.umem) {
> > -               if (cq->resize_umem)
> > -                       bnxt_re_resize_cq_complete(cq);
> > -               return 0;
> > -       }
> > -
> Since this code is removed, we need to remove the ibv_cmd_poll_cq call
> from the user library.
> For older libraries that still call ibv_cmd_poll_cq, I think we should
> keep a check; otherwise it will print "POLL CQ : no CQL to use". We
> should either add the following code or remove that print.
>        if (cq->ib_cq.umem)
>                return 0;

I'll add the check with an extra comment.
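
Something along these lines (a sketch; the final comment wording may differ):

	/* User CQs are polled from user space; older rdma-core versions
	 * may still call ibv_cmd_poll_cq, so return success quietly
	 * instead of triggering the "POLL CQ : no CQL to use" print.
	 */
	if (cq->ib_cq.umem)
		return 0;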

> Otherwise, it looks good to me.

Thanks

> 
> Thanks,
> Selvin
> 
> 
> 
> 
> >         spin_lock_irqsave(&cq->cq_lock, flags);
> >         budget = min_t(u32, num_entries, cq->max_cql);
> >         num_entries = budget;
> >
> > --
> > 2.52.0
> >



^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [EXTERNAL] [PATCH rdma-next 25/50] RDMA/mana: Provide a modern CQ creation interface
  2026-02-13 10:58 ` [PATCH rdma-next 25/50] RDMA/mana: " Leon Romanovsky
@ 2026-02-24 22:30   ` Long Li
  2026-02-25  8:24     ` Leon Romanovsky
  0 siblings, 1 reply; 73+ messages in thread
From: Long Li @ 2026-02-24 22:30 UTC (permalink / raw)
  To: Leon Romanovsky, Jason Gunthorpe, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Junxian Huang, Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun,
	Shiraz Saleem
  Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-hyperv@vger.kernel.org

> diff --git a/drivers/infiniband/hw/mana/cq.c b/drivers/infiniband/hw/mana/cq.c
> index 2dce1b677115..605122ecf9f9 100644
> --- a/drivers/infiniband/hw/mana/cq.c
> +++ b/drivers/infiniband/hw/mana/cq.c
> @@ -5,8 +5,8 @@
> 
>  #include "mana_ib.h"
> 
> -int mana_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
> -		      struct uverbs_attr_bundle *attrs)
> +int mana_ib_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr
> *attr,
> +			   struct uverbs_attr_bundle *attrs)
>  {
>  	struct ib_udata *udata = &attrs->driver_udata;
>  	struct mana_ib_cq *cq = container_of(ibcq, struct mana_ib_cq, ibcq);
> @@ -17,7 +17,6 @@ int mana_ib_create_cq(struct ib_cq *ibcq, const struct
> ib_cq_init_attr *attr,
>  	struct mana_ib_dev *mdev;
>  	bool is_rnic_cq;
>  	u32 doorbell;
> -	u32 buf_size;
>  	int err;
> 
>  	mdev = container_of(ibdev, struct mana_ib_dev, ib_dev); @@ -26,44
> +25,100 @@ int mana_ib_create_cq(struct ib_cq *ibcq, const struct
> ib_cq_init_attr *attr,
>  	cq->cq_handle = INVALID_MANA_HANDLE;
>  	is_rnic_cq = mana_ib_is_rnic(mdev);
> 
> -	if (udata) {
> -		if (udata->inlen < offsetof(struct mana_ib_create_cq, flags))
> -			return -EINVAL;
> +	if (udata->inlen < offsetof(struct mana_ib_create_cq, flags))
> +		return -EINVAL;
> 
> -		err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd),
> udata->inlen));
> -		if (err) {
> -			ibdev_dbg(ibdev, "Failed to copy from udata for create
> cq, %d\n", err);
> -			return err;
> -		}
> +	err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata-
> >inlen));
> +	if (err) {
> +		ibdev_dbg(ibdev, "Failed to copy from udata for create
> cq, %d\n", err);
> +		return err;
> +	}
> 
> -		if ((!is_rnic_cq && attr->cqe > mdev-
> >adapter_caps.max_qp_wr) ||
> -		    attr->cqe > U32_MAX / COMP_ENTRY_SIZE) {
> -			ibdev_dbg(ibdev, "CQE %d exceeding limit\n", attr-
> >cqe);
> -			return -EINVAL;
> -		}
> +	if ((!is_rnic_cq && attr->cqe > mdev->adapter_caps.max_qp_wr) ||
> +	    attr->cqe > U32_MAX / COMP_ENTRY_SIZE) {
> +		ibdev_dbg(ibdev, "CQE %d exceeding limit\n", attr->cqe);
> +		return -EINVAL;
> +	}
> +
> +	cq->cqe = attr->cqe;
> +	if (!ibcq->umem)
> +		ibcq->umem = ib_umem_get(ibdev, ucmd.buf_addr,
> +				     cq->cqe * COMP_ENTRY_SIZE,
> +				     IB_ACCESS_LOCAL_WRITE);
> +	if (IS_ERR(ibcq->umem))
> +		return PTR_ERR(ibcq->umem);
> +	cq->queue.umem = ibcq->umem;
> +
> +	err = mana_ib_create_queue(mdev, &cq->queue);
> +	if (err)
> +		return err;

Should we call ib_umem_release() on this error path?

> 
> diff --git a/drivers/infiniband/hw/mana/qp.c
> b/drivers/infiniband/hw/mana/qp.c index 48c1f4977f21..b08dbc675741
> 100644
> --- a/drivers/infiniband/hw/mana/qp.c
> +++ b/drivers/infiniband/hw/mana/qp.c
> @@ -326,11 +326,20 @@ static int mana_ib_create_qp_raw(struct ib_qp
> *ibqp, struct ib_pd *ibpd,
>  	ibdev_dbg(&mdev->ib_dev, "ucmd sq_buf_addr 0x%llx port %u\n",
>  		  ucmd.sq_buf_addr, ucmd.port);
> 
> -	err = mana_ib_create_queue(mdev, ucmd.sq_buf_addr,
> ucmd.sq_buf_size, &qp->raw_sq);
> +	qp->raw_sq.umem = ib_umem_get(&mdev->ib_dev, ucmd.sq_buf_addr,
> +				      ucmd.sq_buf_size,
> IB_ACCESS_LOCAL_WRITE);
> +	if (IS_ERR(qp->raw_sq.umem)) {
> +		err = PTR_ERR(qp->raw_sq.umem);
> +		ibdev_dbg(&mdev->ib_dev,
> +			  "Failed to get umem for qp-raw, err %d\n", err);
> +		goto err_free_vport;
> +	}
> +
> +	err = mana_ib_create_queue(mdev, &qp->raw_sq);
>  	if (err) {
>  		ibdev_dbg(&mdev->ib_dev,
>  			  "Failed to create queue for create qp-raw, err %d\n",
> err);
> -		goto err_free_vport;
> +		goto err_release_umem;
>  	}
> 
>  	/* Create a WQ on the same port handle used by the Ethernet */ @@ -
> 391,6 +400,10 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp,
> struct ib_pd *ibpd,
> 
>  err_destroy_queue:
>  	mana_ib_destroy_queue(mdev, &qp->raw_sq);
> +	return err;

This "return err" should be removed; the error handling code should fall through.
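
i.e. the unwind would read (a sketch based on the hunk above):

	err_destroy_queue:
		mana_ib_destroy_queue(mdev, &qp->raw_sq);
		/* no return here; fall through to release the umem */
	err_release_umem:
		ib_umem_release(qp->raw_sq.umem);
	err_free_vport:
		mana_ib_uncfg_vport(mdev, pd, port);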

> +
> +err_release_umem:
> +	ib_umem_release(qp->raw_sq.umem);
> 
>  err_free_vport:
>  	mana_ib_uncfg_vport(mdev, pd, port);
> @@ -553,13 +566,25 @@ static int mana_ib_create_rc_qp(struct ib_qp *ibqp,
> struct ib_pd *ibpd,
>  		if (i == MANA_RC_SEND_QUEUE_FMR) {
>  			qp->rc_qp.queues[i].id = INVALID_QUEUE_ID;
>  			qp->rc_qp.queues[i].gdma_region =
> GDMA_INVALID_DMA_REGION;
> +			qp->rc_qp.queues[i].umem = NULL;
>  			continue;
>  		}
> -		err = mana_ib_create_queue(mdev, ucmd.queue_buf[j],
> ucmd.queue_size[j],
> -					   &qp->rc_qp.queues[i]);
> +		qp->rc_qp.queues[i].umem = ib_umem_get(&mdev->ib_dev,
> +						       ucmd.queue_buf[j],
> +						       ucmd.queue_size[j],
> +
> IB_ACCESS_LOCAL_WRITE);
> +		if (IS_ERR(qp->rc_qp.queues[i].umem)) {
> +			err = PTR_ERR(qp->rc_qp.queues[i].umem);
> +			ibdev_err(&mdev->ib_dev, "Failed to get umem for
> queue %d, err %d\n",
> +				  i, err);
> +			goto release_umems;

mana_ib_create_queue() may already have created some queues; we need to clean them up or we leak them.

Maybe use the destroy_queues: label to call ib_umem_release()?


Another issue: there is a call to ib_umem_release(queue->umem) in mana_ib_destroy_queue(); should we remove that as well?

Thanks,
Long

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [EXTERNAL] [PATCH rdma-next 25/50] RDMA/mana: Provide a modern CQ creation interface
  2026-02-24 22:30   ` [EXTERNAL] " Long Li
@ 2026-02-25  8:24     ` Leon Romanovsky
  0 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-25  8:24 UTC (permalink / raw)
  To: Long Li
  Cc: Jason Gunthorpe, Selvin Xavier, Kalesh AP, Potnuri Bharat Teja,
	Michael Margolin, Gal Pressman, Yossi Leybovich, Cheng Xu,
	Kai Shen, Chengchang Tang, Junxian Huang, Abhijit Gangurde,
	Allen Hubbe, Krzysztof Czurylo, Tatyana Nikolova,
	Konstantin Taranov, Yishai Hadas, Michal Kalderon, Bryan Tan,
	Vishnu Dasa, Broadcom internal kernel review list,
	Christian Benvenuti, Nelson Escobar, Dennis Dalessandro,
	Bernard Metzler, Zhu Yanjun, Shiraz Saleem,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-hyperv@vger.kernel.org

On Tue, Feb 24, 2026 at 10:30:37PM +0000, Long Li wrote:
> > diff --git a/drivers/infiniband/hw/mana/cq.c b/drivers/infiniband/hw/mana/cq.c
> > index 2dce1b677115..605122ecf9f9 100644
> > --- a/drivers/infiniband/hw/mana/cq.c
> > +++ b/drivers/infiniband/hw/mana/cq.c
> > @@ -5,8 +5,8 @@
> > 
> >  #include "mana_ib.h"
> > 
> > -int mana_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
> > -		      struct uverbs_attr_bundle *attrs)
> > +int mana_ib_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr
> > *attr,
> > +			   struct uverbs_attr_bundle *attrs)
> >  {
> >  	struct ib_udata *udata = &attrs->driver_udata;
> >  	struct mana_ib_cq *cq = container_of(ibcq, struct mana_ib_cq, ibcq);
> > @@ -17,7 +17,6 @@ int mana_ib_create_cq(struct ib_cq *ibcq, const struct
> > ib_cq_init_attr *attr,
> >  	struct mana_ib_dev *mdev;
> >  	bool is_rnic_cq;
> >  	u32 doorbell;
> > -	u32 buf_size;
> >  	int err;
> > 
> >  	mdev = container_of(ibdev, struct mana_ib_dev, ib_dev); @@ -26,44
> > +25,100 @@ int mana_ib_create_cq(struct ib_cq *ibcq, const struct
> > ib_cq_init_attr *attr,
> >  	cq->cq_handle = INVALID_MANA_HANDLE;
> >  	is_rnic_cq = mana_ib_is_rnic(mdev);
> > 
> > -	if (udata) {
> > -		if (udata->inlen < offsetof(struct mana_ib_create_cq, flags))
> > -			return -EINVAL;
> > +	if (udata->inlen < offsetof(struct mana_ib_create_cq, flags))
> > +		return -EINVAL;
> > 
> > -		err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd),
> > udata->inlen));
> > -		if (err) {
> > -			ibdev_dbg(ibdev, "Failed to copy from udata for create
> > cq, %d\n", err);
> > -			return err;
> > -		}
> > +	err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata-
> > >inlen));
> > +	if (err) {
> > +		ibdev_dbg(ibdev, "Failed to copy from udata for create
> > cq, %d\n", err);
> > +		return err;
> > +	}
> > 
> > -		if ((!is_rnic_cq && attr->cqe > mdev-
> > >adapter_caps.max_qp_wr) ||
> > -		    attr->cqe > U32_MAX / COMP_ENTRY_SIZE) {
> > -			ibdev_dbg(ibdev, "CQE %d exceeding limit\n", attr-
> > >cqe);
> > -			return -EINVAL;
> > -		}
> > +	if ((!is_rnic_cq && attr->cqe > mdev->adapter_caps.max_qp_wr) ||
> > +	    attr->cqe > U32_MAX / COMP_ENTRY_SIZE) {
> > +		ibdev_dbg(ibdev, "CQE %d exceeding limit\n", attr->cqe);
> > +		return -EINVAL;
> > +	}
> > +
> > +	cq->cqe = attr->cqe;
> > +	if (!ibcq->umem)
> > +		ibcq->umem = ib_umem_get(ibdev, ucmd.buf_addr,
> > +				     cq->cqe * COMP_ENTRY_SIZE,
> > +				     IB_ACCESS_LOCAL_WRITE);
> > +	if (IS_ERR(ibcq->umem))
> > +		return PTR_ERR(ibcq->umem);
> > +	cq->queue.umem = ibcq->umem;
> > +
> > +	err = mana_ib_create_queue(mdev, &cq->queue);
> > +	if (err)
> > +		return err;
> 
> Should we call ib_umem_release() on this error path?

<...>

> >  err_destroy_queue:
> >  	mana_ib_destroy_queue(mdev, &qp->raw_sq);
> > +	return err;
> 
> This "return err" should be removed; the error handling code should fall through.

The main idea of this series is to allocate/release umem in the core logic.
See patch #5 https://lore.kernel.org/linux-rdma/20260213-refactor-umem-v1-5-f3be85847922@nvidia.com/
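
In other words (a simplified sketch, not the literal patch #5 code): in the
create path the core may already have attached a umem from a generic source,
and the driver only does the classic lookup when it has not, as in the hunk
above:

	if (!ibcq->umem)
		ibcq->umem = ib_umem_get(ibdev, ucmd.buf_addr,
					 cq->cqe * COMP_ENTRY_SIZE,
					 IB_ACCESS_LOCAL_WRITE);

while on destroy and on every error path ib_core itself releases ibcq->umem,
so the driver no longer needs its own ib_umem_release() for CQ buffers.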

> 
> > +
> > +err_release_umem:
> > +	ib_umem_release(qp->raw_sq.umem);
> > 
> >  err_free_vport:
> >  	mana_ib_uncfg_vport(mdev, pd, port);
> > @@ -553,13 +566,25 @@ static int mana_ib_create_rc_qp(struct ib_qp *ibqp,
> > struct ib_pd *ibpd,
> >  		if (i == MANA_RC_SEND_QUEUE_FMR) {
> >  			qp->rc_qp.queues[i].id = INVALID_QUEUE_ID;
> >  			qp->rc_qp.queues[i].gdma_region =
> > GDMA_INVALID_DMA_REGION;
> > +			qp->rc_qp.queues[i].umem = NULL;
> >  			continue;
> >  		}
> > -		err = mana_ib_create_queue(mdev, ucmd.queue_buf[j],
> > ucmd.queue_size[j],
> > -					   &qp->rc_qp.queues[i]);
> > +		qp->rc_qp.queues[i].umem = ib_umem_get(&mdev->ib_dev,
> > +						       ucmd.queue_buf[j],
> > +						       ucmd.queue_size[j],
> > +
> > IB_ACCESS_LOCAL_WRITE);
> > +		if (IS_ERR(qp->rc_qp.queues[i].umem)) {
> > +			err = PTR_ERR(qp->rc_qp.queues[i].umem);
> > +			ibdev_err(&mdev->ib_dev, "Failed to get umem for
> > queue %d, err %d\n",
> > +				  i, err);
> > +			goto release_umems;
> 
> mana_ib_create_queue() may already have created some queues; we need to clean them up or we leak them.
>
> Maybe use the destroy_queues: label to call ib_umem_release()?

We should remove the mana_ib_create_rc_qp() hunk; it came from my future
work, where I removed umem from QPs as well.

Thanks

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: (subset) [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (49 preceding siblings ...)
  2026-02-13 10:58 ` [PATCH rdma-next 50/50] RDMA/mthca: Use generic resize-CQ lock Leon Romanovsky
@ 2026-02-25 13:51 ` Leon Romanovsky
  2026-02-25 13:53 ` Leon Romanovsky
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-25 13:51 UTC (permalink / raw)
  To: Jason Gunthorpe, Selvin Xavier, Kalesh AP, Potnuri Bharat Teja,
	Michael Margolin, Gal Pressman, Yossi Leybovich, Cheng Xu,
	Kai Shen, Chengchang Tang, Junxian Huang, Abhijit Gangurde,
	Allen Hubbe, Krzysztof Czurylo, Tatyana Nikolova, Long Li,
	Konstantin Taranov, Yishai Hadas, Michal Kalderon, Bryan Tan,
	Vishnu Dasa, Broadcom internal kernel review list,
	Christian Benvenuti, Nelson Escobar, Dennis Dalessandro,
	Bernard Metzler, Zhu Yanjun, Leon Romanovsky
  Cc: linux-kernel, linux-rdma, linux-hyperv


On Fri, 13 Feb 2026 12:57:36 +0200, Leon Romanovsky wrote:
> Unify CQ UMEM creation, resize and release in ib_core to avoid the need
> for complex driver-side handling. This lets us rely on the internal
> reference counters of the relevant ib_XXX objects to manage UMEM
> lifetime safely and consistently.
> 
> The resize cleanup made it clear that most drivers never handled this
> path correctly, and there's a good chance the functionality was never
> actually used. The most common issue was relying on the cq->resize_umem
> pointer to detect races with other CQ commands, without clearing it on
> errors and while ignoring proper locking for other CQ operations.
> 
> [...]

Applied, thanks!

[01/50] RDMA: Move DMA block iterator logic into dedicated files
        (no commit info)
[02/50] RDMA/umem: Allow including ib_umem header from any location
        (no commit info)
[03/50] RDMA/umem: Remove unnecessary includes and defines from ib_umem header
        (no commit info)
[04/50] RDMA/core: Promote UMEM to a core component
        (no commit info)
[05/50] RDMA/core: Manage CQ umem in core code
        (no commit info)
[06/50] RDMA/efa: Rely on CPU address in create‑QP
        (no commit info)
[07/50] RDMA/core: Prepare create CQ path for API unification
        (no commit info)
[08/50] RDMA/core: Reject zero CQE count
        (no commit info)
[09/50] RDMA/efa: Remove check for zero CQE count
        (no commit info)
[10/50] RDMA/mlx5: Save 4 bytes in CQ structure
        (no commit info)
[11/50] RDMA/mlx5: Provide a modern CQ creation interface
        (no commit info)
[12/50] RDMA/mlx4: Inline mlx4_ib_get_cq_umem into callers
        (no commit info)
[13/50] RDMA/mlx4: Introduce a modern CQ creation interface
        (no commit info)
[14/50] RDMA/mlx4: Remove unused create_flags field from CQ structure
        (no commit info)

Best regards,
-- 
Leon Romanovsky <leon@kernel.org>


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core
  2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
                   ` (50 preceding siblings ...)
  2026-02-25 13:51 ` (subset) [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
@ 2026-02-25 13:53 ` Leon Romanovsky
  51 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-25 13:53 UTC (permalink / raw)
  To: Jason Gunthorpe, Selvin Xavier, Kalesh AP, Potnuri Bharat Teja,
	Michael Margolin, Gal Pressman, Yossi Leybovich, Cheng Xu,
	Kai Shen, Chengchang Tang, Junxian Huang, Abhijit Gangurde,
	Allen Hubbe, Krzysztof Czurylo, Tatyana Nikolova, Long Li,
	Konstantin Taranov, Yishai Hadas, Michal Kalderon, Bryan Tan,
	Vishnu Dasa, Broadcom internal kernel review list,
	Christian Benvenuti, Nelson Escobar, Dennis Dalessandro,
	Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv

On Fri, Feb 13, 2026 at 12:57:36PM +0200, Leon Romanovsky wrote:
> Unify CQ UMEM creation, resize and release in ib_core to avoid the need
> for complex driver-side handling. This lets us rely on the internal
> reference counters of the relevant ib_XXX objects to manage UMEM
> lifetime safely and consistently.
> 
> The resize cleanup made it clear that most drivers never handled this
> path correctly, and there's a good chance the functionality was never
> actually used. The most common issue was relying on the cq->resize_umem
> pointer to detect races with other CQ commands, without clearing it on
> errors and while ignoring proper locking for other CQ operations.
> 
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
> Leon Romanovsky (50):
>       RDMA: Move DMA block iterator logic into dedicated files
>       RDMA/umem: Allow including ib_umem header from any location
>       RDMA/umem: Remove unnecessary includes and defines from ib_umem header
>       RDMA/core: Promote UMEM to a core component
>       RDMA/core: Manage CQ umem in core code
>       RDMA/efa: Rely on CPU address in create‑QP
>       RDMA/core: Prepare create CQ path for API unification
>       RDMA/core: Reject zero CQE count
>       RDMA/efa: Remove check for zero CQE count
>       RDMA/mlx5: Save 4 bytes in CQ structure
>       RDMA/mlx5: Provide a modern CQ creation interface
>       RDMA/mlx4: Inline mlx4_ib_get_cq_umem into callers
>       RDMA/mlx4: Introduce a modern CQ creation interface
>       RDMA/mlx4: Remove unused create_flags field from CQ structure

I took the 14 patches above; the rest will need to be resubmitted.

Thanks

>       RDMA/bnxt_re: Convert to modern CQ interface
>       RDMA/cxgb4: Separate kernel and user CQ creation paths
>       RDMA/mthca: Split user and kernel CQ creation paths
>       RDMA/erdma: Separate user and kernel CQ creation paths
>       RDMA/ionic: Split user and kernel CQ creation paths
>       RDMA/qedr: Convert to modern CQ interface
>       RDMA/vmw_pvrdma: Provide a modern CQ creation interface
>       RDMA/ocrdma: Split user and kernel CQ creation paths
>       RDMA/irdma: Split user and kernel CQ creation paths
>       RDMA/usnic: Provide a modern CQ creation interface
>       RDMA/mana: Provide a modern CQ creation interface
>       RDMA/erdma: Separate user and kernel CQ creation paths
>       RDMA/rdmavt: Split user and kernel CQ creation paths
>       RDMA/siw: Split user and kernel CQ creation paths
>       RDMA/rxe: Split user and kernel CQ creation paths
>       RDMA/core: Remove legacy CQ creation fallback path
>       RDMA/core: Remove unused ib_resize_cq() implementation
>       RDMA: Clarify that CQ resize is a user‑space verb
>       RDMA/bnxt_re: Drop support for resizing kernel CQs
>       RDMA/irdma: Remove resize support for kernel CQs
>       RDMA/mlx4: Remove support for kernel CQ resize
>       RDMA/mlx5: Remove support for resizing kernel CQs
>       RDMA/mthca: Remove resize support for kernel CQs
>       RDMA/rdmavt: Remove resize support for kernel CQs
>       RDMA/rxe: Remove unused kernel‑side CQ resize support
>       RDMA: Properly propagate the number of CQEs as unsigned int
>       RDMA/core: Generalize CQ resize locking
>       RDMA/bnxt_re: Complete CQ resize in a single step
>       RDMA/bnxt_re: Rely on common resize‑CQ locking
>       RDMA/bnxt_re: Reduce CQ memory footprint
>       RDMA/mlx4: Use generic resize-CQ lock
>       RDMA/mlx4: Use on‑stack variables instead of storing them in the CQ object
>       RDMA/mlx5: Use generic resize-CQ lock
>       RDMA/mlx5: Select resize‑CQ callback based on device capabilities
>       RDMA/mlx5: Reduce CQ memory footprint
>       RDMA/mthca: Use generic resize-CQ lock
> 
>  drivers/infiniband/core/Makefile                |   6 +-
>  drivers/infiniband/core/cq.c                    |   3 +
>  drivers/infiniband/core/device.c                |   4 +-
>  drivers/infiniband/core/iter.c                  |  43 +++
>  drivers/infiniband/core/umem.c                  |   2 +-
>  drivers/infiniband/core/uverbs_cmd.c            |  18 +-
>  drivers/infiniband/core/uverbs_std_types_cq.c   |  35 ++-
>  drivers/infiniband/core/verbs.c                 |  61 +---
>  drivers/infiniband/hw/bnxt_re/ib_verbs.c        | 246 ++++++++-------
>  drivers/infiniband/hw/bnxt_re/ib_verbs.h        |   9 +-
>  drivers/infiniband/hw/bnxt_re/main.c            |   3 +-
>  drivers/infiniband/hw/bnxt_re/qplib_res.c       |   2 +-
>  drivers/infiniband/hw/cxgb4/cq.c                | 218 +++++++++----
>  drivers/infiniband/hw/cxgb4/iw_cxgb4.h          |   2 +
>  drivers/infiniband/hw/cxgb4/mem.c               |   2 +-
>  drivers/infiniband/hw/cxgb4/provider.c          |   1 +
>  drivers/infiniband/hw/efa/efa.h                 |   6 +-
>  drivers/infiniband/hw/efa/efa_main.c            |   3 +-
>  drivers/infiniband/hw/efa/efa_verbs.c           |  44 ++-
>  drivers/infiniband/hw/erdma/erdma_main.c        |   1 +
>  drivers/infiniband/hw/erdma/erdma_verbs.c       |  99 ++++--
>  drivers/infiniband/hw/erdma/erdma_verbs.h       |   2 +
>  drivers/infiniband/hw/hns/hns_roce_alloc.c      |   2 +-
>  drivers/infiniband/hw/hns/hns_roce_cq.c         | 103 ++++--
>  drivers/infiniband/hw/hns/hns_roce_debugfs.c    |   1 -
>  drivers/infiniband/hw/hns/hns_roce_device.h     |   3 +-
>  drivers/infiniband/hw/hns/hns_roce_main.c       |   1 +
>  drivers/infiniband/hw/ionic/ionic_controlpath.c |  88 ++++--
>  drivers/infiniband/hw/ionic/ionic_ibdev.c       |   1 +
>  drivers/infiniband/hw/ionic/ionic_ibdev.h       |   4 +-
>  drivers/infiniband/hw/irdma/main.h              |   2 +-
>  drivers/infiniband/hw/irdma/verbs.c             | 402 +++++++++++++-----------
>  drivers/infiniband/hw/mana/cq.c                 | 128 +++++---
>  drivers/infiniband/hw/mana/device.c             |   1 +
>  drivers/infiniband/hw/mana/main.c               |  25 +-
>  drivers/infiniband/hw/mana/mana_ib.h            |   6 +-
>  drivers/infiniband/hw/mana/qp.c                 |  42 ++-
>  drivers/infiniband/hw/mana/wq.c                 |  14 +-
>  drivers/infiniband/hw/mlx4/cq.c                 | 401 ++++++++---------------
>  drivers/infiniband/hw/mlx4/main.c               |   3 +-
>  drivers/infiniband/hw/mlx4/mlx4_ib.h            |  10 +-
>  drivers/infiniband/hw/mlx4/mr.c                 |   1 +
>  drivers/infiniband/hw/mlx5/cq.c                 | 383 ++++++++--------------
>  drivers/infiniband/hw/mlx5/main.c               |   9 +-
>  drivers/infiniband/hw/mlx5/mem.c                |   1 +
>  drivers/infiniband/hw/mlx5/mlx5_ib.h            |  12 +-
>  drivers/infiniband/hw/mlx5/qp.c                 |   2 +-
>  drivers/infiniband/hw/mlx5/umr.c                |   1 +
>  drivers/infiniband/hw/mthca/mthca_cq.c          |   1 -
>  drivers/infiniband/hw/mthca/mthca_provider.c    | 193 ++++--------
>  drivers/infiniband/hw/mthca/mthca_provider.h    |   1 -
>  drivers/infiniband/hw/ocrdma/ocrdma_main.c      |   3 +-
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c     |  70 +++--
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.h     |   6 +-
>  drivers/infiniband/hw/qedr/main.c               |   1 +
>  drivers/infiniband/hw/qedr/verbs.c              | 325 +++++++++++--------
>  drivers/infiniband/hw/qedr/verbs.h              |   2 +
>  drivers/infiniband/hw/usnic/usnic_ib_main.c     |   2 +-
>  drivers/infiniband/hw/usnic/usnic_ib_verbs.c    |   6 +-
>  drivers/infiniband/hw/usnic/usnic_ib_verbs.h    |   4 +-
>  drivers/infiniband/hw/vmw_pvrdma/pvrdma.h       |   2 +-
>  drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c    | 171 ++++++----
>  drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c  |   1 +
>  drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.h |   3 +
>  drivers/infiniband/sw/rdmavt/cq.c               | 224 +++++++------
>  drivers/infiniband/sw/rdmavt/cq.h               |   4 +-
>  drivers/infiniband/sw/rdmavt/vt.c               |   3 +-
>  drivers/infiniband/sw/rxe/rxe_cq.c              |  31 --
>  drivers/infiniband/sw/rxe/rxe_loc.h             |   3 -
>  drivers/infiniband/sw/rxe/rxe_verbs.c           | 115 +++----
>  drivers/infiniband/sw/siw/siw_main.c            |   1 +
>  drivers/infiniband/sw/siw/siw_verbs.c           | 111 +++++--
>  drivers/infiniband/sw/siw/siw_verbs.h           |   2 +
>  include/rdma/ib_umem.h                          |  36 +--
>  include/rdma/ib_verbs.h                         |  67 +---
>  include/rdma/iter.h                             |  88 ++++++
>  76 files changed, 2085 insertions(+), 1847 deletions(-)
> ---
> base-commit: 42e3aac65c1c9eb36cdee0d8312a326196e0822f
> change-id: 20260203-refactor-umem-e5b4277e41b4
> 
> Best regards,
> --  
> Leon Romanovsky <leonro@nvidia.com>
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH rdma-next 26/50] RDMA/erdma: Separate user and kernel CQ creation paths
  2026-02-13 10:58 ` [PATCH rdma-next 26/50] RDMA/erdma: Separate user and kernel CQ creation paths Leon Romanovsky
  2026-02-24  2:20   ` Cheng Xu
@ 2026-02-26  6:17   ` Junxian Huang
  2026-02-26  6:54     ` Leon Romanovsky
  1 sibling, 1 reply; 73+ messages in thread
From: Junxian Huang @ 2026-02-26  6:17 UTC (permalink / raw)
  To: Leon Romanovsky, Jason Gunthorpe, Selvin Xavier, Kalesh AP,
	Potnuri Bharat Teja, Michael Margolin, Gal Pressman,
	Yossi Leybovich, Cheng Xu, Kai Shen, Chengchang Tang,
	Abhijit Gangurde, Allen Hubbe, Krzysztof Czurylo,
	Tatyana Nikolova, Long Li, Konstantin Taranov, Yishai Hadas,
	Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun
  Cc: linux-kernel, linux-rdma, linux-hyperv



On 2026/2/13 18:58, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Split CQ creation into distinct kernel and user flows. The hns driver,
> inherited from mlx4, uses a problematic pattern that shares and caches
> umem in hns_roce_db_map_user(). This design blocks the driver from
> supporting generic umem sources (VMA, dmabuf, memfd, and others).
> 
> > In addition, let's delete the counter that counts CQ creation errors. There
> > are multiple ways to debug a modern kernel without needing to rely
> > on that debugfs counter.
> 
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/infiniband/hw/hns/hns_roce_cq.c      | 103 ++++++++++++++++++++-------
>  drivers/infiniband/hw/hns/hns_roce_debugfs.c |   1 -
>  drivers/infiniband/hw/hns/hns_roce_device.h  |   3 +-
>  drivers/infiniband/hw/hns/hns_roce_main.c    |   1 +
>  4 files changed, 82 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/hns/hns_roce_cq.c b/drivers/infiniband/hw/hns/hns_roce_cq.c
> index 857a913326cd..0f24a916466b 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_cq.c
> +++ b/drivers/infiniband/hw/hns/hns_roce_cq.c
> @@ -335,7 +335,10 @@ static int verify_cq_create_attr(struct hns_roce_dev *hr_dev,
>  {
>  	struct ib_device *ibdev = &hr_dev->ib_dev;
>  
> -	if (!attr->cqe || attr->cqe > hr_dev->caps.max_cqes) {
> +	if (attr->flags)
> +		return -EOPNOTSUPP;
> +
> +	if (attr->cqe > hr_dev->caps.max_cqes) {
>  		ibdev_err(ibdev, "failed to check CQ count %u, max = %u.\n",
>  			  attr->cqe, hr_dev->caps.max_cqes);
>  		return -EINVAL;
> @@ -407,8 +410,8 @@ static int set_cqe_size(struct hns_roce_cq *hr_cq, struct ib_udata *udata,
>  	return 0;
>  }
>  
> -int hns_roce_create_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
> -		       struct uverbs_attr_bundle *attrs)
> +int hns_roce_create_user_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
> +			    struct uverbs_attr_bundle *attrs)
>  {
>  	struct hns_roce_dev *hr_dev = to_hr_dev(ib_cq->device);
>  	struct ib_udata *udata = &attrs->driver_udata;
> @@ -418,31 +421,27 @@ int hns_roce_create_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
>  	struct hns_roce_ib_create_cq ucmd = {};
>  	int ret;
>  
> -	if (attr->flags) {
> -		ret = -EOPNOTSUPP;
> -		goto err_out;
> -	}
> +	if (ib_cq->umem)
> +		return -EOPNOTSUPP;
>  
>  	ret = verify_cq_create_attr(hr_dev, attr);
>  	if (ret)
> -		goto err_out;
> +		return ret;
>  
> -	if (udata) {
> -		ret = get_cq_ucmd(hr_cq, udata, &ucmd);
> -		if (ret)
> -			goto err_out;
> -	}
> +	ret = get_cq_ucmd(hr_cq, udata, &ucmd);
> +	if (ret)
> +		return ret;
>  
>  	set_cq_param(hr_cq, attr->cqe, attr->comp_vector, &ucmd);
>  
>  	ret = set_cqe_size(hr_cq, udata, &ucmd);
>  	if (ret)
> -		goto err_out;
> +		return ret;
>  
>  	ret = alloc_cq_buf(hr_dev, hr_cq, udata, ucmd.buf_addr);
>  	if (ret) {
>  		ibdev_err(ibdev, "failed to alloc CQ buf, ret = %d.\n", ret);
> -		goto err_out;
> +		return ret;
>  	}
>  
>  	ret = alloc_cq_db(hr_dev, hr_cq, udata, ucmd.db_addr, &resp);
> @@ -464,13 +463,11 @@ int hns_roce_create_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
>  		goto err_cqn;
>  	}
>  
> -	if (udata) {
> -		resp.cqn = hr_cq->cqn;
> -		ret = ib_copy_to_udata(udata, &resp,
> -				       min(udata->outlen, sizeof(resp)));
> -		if (ret)
> -			goto err_cqc;
> -	}
> +	resp.cqn = hr_cq->cqn;
> +	ret = ib_copy_to_udata(udata, &resp,
> +			       min(udata->outlen, sizeof(resp)));
> +	if (ret)
> +		goto err_cqc;
>  
>  	hr_cq->cons_index = 0;
>  	hr_cq->arm_sn = 1;
> @@ -487,9 +484,67 @@ int hns_roce_create_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
>  	free_cq_db(hr_dev, hr_cq, udata);
>  err_cq_buf:
>  	free_cq_buf(hr_dev, hr_cq);
> -err_out:
> -	atomic64_inc(&hr_dev->dfx_cnt[HNS_ROCE_DFX_CQ_CREATE_ERR_CNT]);
> +	return ret;
> +}
> +
> +int hns_roce_create_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
> +		       struct uverbs_attr_bundle *attrs)
> +{
> +	struct hns_roce_dev *hr_dev = to_hr_dev(ib_cq->device);
> +	struct hns_roce_ib_create_cq_resp resp = {};
> +	struct hns_roce_cq *hr_cq = to_hr_cq(ib_cq);
> +	struct ib_device *ibdev = &hr_dev->ib_dev;
> +	struct hns_roce_ib_create_cq ucmd = {};

ucmd and resp are not needed since we don't have udata here.

Junxian

> +	int ret;
> +
> +	ret = verify_cq_create_attr(hr_dev, attr);
> +	if (ret)
> +		return ret;
> +
> > +	set_cq_param(hr_cq, attr->cqe, attr->comp_vector, &ucmd);
> > +
> +	ret = set_cqe_size(hr_cq, NULL, &ucmd);
> +	if (ret)
> +		return ret;
>  
> +	ret = alloc_cq_buf(hr_dev, hr_cq, NULL, 0);
> +	if (ret) {
> +		ibdev_err(ibdev, "failed to alloc CQ buf, ret = %d.\n", ret);
> +		return ret;
> +	}
> +
> +	ret = alloc_cq_db(hr_dev, hr_cq, NULL, 0, &resp);
> +	if (ret) {
> +		ibdev_err(ibdev, "failed to alloc CQ db, ret = %d.\n", ret);
> +		goto err_cq_buf;
> +	}
> +
> +	ret = alloc_cqn(hr_dev, hr_cq, NULL);
> +	if (ret) {
> +		ibdev_err(ibdev, "failed to alloc CQN, ret = %d.\n", ret);
> +		goto err_cq_db;
> +	}
> +
> +	ret = alloc_cqc(hr_dev, hr_cq);
> +	if (ret) {
> +		ibdev_err(ibdev,
> +			  "failed to alloc CQ context, ret = %d.\n", ret);
> +		goto err_cqn;
> +	}
> +
> +	hr_cq->cons_index = 0;
> +	hr_cq->arm_sn = 1;
> +	refcount_set(&hr_cq->refcount, 1);
> +	init_completion(&hr_cq->free);
> +
> +	return 0;
> +
> +err_cqn:
> +	free_cqn(hr_dev, hr_cq->cqn);
> +err_cq_db:
> +	free_cq_db(hr_dev, hr_cq, NULL);
> +err_cq_buf:
> +	free_cq_buf(hr_dev, hr_cq);
>  	return ret;
>  }
>  
> diff --git a/drivers/infiniband/hw/hns/hns_roce_debugfs.c b/drivers/infiniband/hw/hns/hns_roce_debugfs.c
> index b869cdc54118..481b30f2f5b5 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_debugfs.c
> +++ b/drivers/infiniband/hw/hns/hns_roce_debugfs.c
> @@ -47,7 +47,6 @@ static const char * const sw_stat_info[] = {
>  	[HNS_ROCE_DFX_MBX_EVENT_CNT] = "mbx_event",
>  	[HNS_ROCE_DFX_QP_CREATE_ERR_CNT] = "qp_create_err",
>  	[HNS_ROCE_DFX_QP_MODIFY_ERR_CNT] = "qp_modify_err",
> -	[HNS_ROCE_DFX_CQ_CREATE_ERR_CNT] = "cq_create_err",
>  	[HNS_ROCE_DFX_CQ_MODIFY_ERR_CNT] = "cq_modify_err",
>  	[HNS_ROCE_DFX_SRQ_CREATE_ERR_CNT] = "srq_create_err",
>  	[HNS_ROCE_DFX_SRQ_MODIFY_ERR_CNT] = "srq_modify_err",
> diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
> index 3f032b8038af..fdc5f487d7a3 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_device.h
> +++ b/drivers/infiniband/hw/hns/hns_roce_device.h
> @@ -902,7 +902,6 @@ enum hns_roce_sw_dfx_stat_index {
>  	HNS_ROCE_DFX_MBX_EVENT_CNT,
>  	HNS_ROCE_DFX_QP_CREATE_ERR_CNT,
>  	HNS_ROCE_DFX_QP_MODIFY_ERR_CNT,
> -	HNS_ROCE_DFX_CQ_CREATE_ERR_CNT,
>  	HNS_ROCE_DFX_CQ_MODIFY_ERR_CNT,
>  	HNS_ROCE_DFX_SRQ_CREATE_ERR_CNT,
>  	HNS_ROCE_DFX_SRQ_MODIFY_ERR_CNT,
> @@ -1295,6 +1294,8 @@ int to_hr_qp_type(int qp_type);
>  
>  int hns_roce_create_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
>  		       struct uverbs_attr_bundle *attrs);
> +int hns_roce_create_user_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
> +			    struct uverbs_attr_bundle *attrs);
>  
>  int hns_roce_destroy_cq(struct ib_cq *ib_cq, struct ib_udata *udata);
>  int hns_roce_db_map_user(struct hns_roce_ucontext *context, unsigned long virt,
> diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
> index a3490bab297a..64de49bf8df7 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_main.c
> +++ b/drivers/infiniband/hw/hns/hns_roce_main.c
> @@ -727,6 +727,7 @@ static const struct ib_device_ops hns_roce_dev_ops = {
>  	.create_ah = hns_roce_create_ah,
>  	.create_user_ah = hns_roce_create_ah,
>  	.create_cq = hns_roce_create_cq,
> +	.create_user_cq = hns_roce_create_user_cq,
>  	.create_qp = hns_roce_create_qp,
>  	.dealloc_pd = hns_roce_dealloc_pd,
>  	.dealloc_ucontext = hns_roce_dealloc_ucontext,
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH rdma-next 26/50] RDMA/erdma: Separate user and kernel CQ creation paths
  2026-02-26  6:17   ` Junxian Huang
@ 2026-02-26  6:54     ` Leon Romanovsky
  0 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2026-02-26  6:54 UTC (permalink / raw)
  To: Junxian Huang
  Cc: Jason Gunthorpe, Selvin Xavier, Kalesh AP, Potnuri Bharat Teja,
	Michael Margolin, Gal Pressman, Yossi Leybovich, Cheng Xu,
	Kai Shen, Chengchang Tang, Abhijit Gangurde, Allen Hubbe,
	Krzysztof Czurylo, Tatyana Nikolova, Long Li, Konstantin Taranov,
	Yishai Hadas, Michal Kalderon, Bryan Tan, Vishnu Dasa,
	Broadcom internal kernel review list, Christian Benvenuti,
	Nelson Escobar, Dennis Dalessandro, Bernard Metzler, Zhu Yanjun,
	linux-kernel, linux-rdma, linux-hyperv

On Thu, Feb 26, 2026 at 02:17:38PM +0800, Junxian Huang wrote:
> 
> 
> On 2026/2/13 18:58, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > Split CQ creation into distinct kernel and user flows. The hns driver,
> > inherited from mlx4, uses a problematic pattern that shares and caches
> > umem in hns_roce_db_map_user(). This design blocks the driver from
> > supporting generic umem sources (VMA, dmabuf, memfd, and others).
> > 
> > In addition, let's delete the counter that counts CQ creation errors. There
> > are multiple ways to debug a modern kernel without needing to rely
> > on that debugfs counter.
> > 
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >  drivers/infiniband/hw/hns/hns_roce_cq.c      | 103 ++++++++++++++++++++-------
> >  drivers/infiniband/hw/hns/hns_roce_debugfs.c |   1 -
> >  drivers/infiniband/hw/hns/hns_roce_device.h  |   3 +-
> >  drivers/infiniband/hw/hns/hns_roce_main.c    |   1 +
> >  4 files changed, 82 insertions(+), 26 deletions(-)

<...>

> > +int hns_roce_create_cq(struct ib_cq *ib_cq, const struct ib_cq_init_attr *attr,
> > +		       struct uverbs_attr_bundle *attrs)
> > +{
> > +	struct hns_roce_dev *hr_dev = to_hr_dev(ib_cq->device);
> > +	struct hns_roce_ib_create_cq_resp resp = {};
> > +	struct hns_roce_cq *hr_cq = to_hr_cq(ib_cq);
> > +	struct ib_device *ibdev = &hr_dev->ib_dev;
> > +	struct hns_roce_ib_create_cq ucmd = {};
> 
> ucmd and resp are not needed since we don't have udata here.

Thanks, will fix.

> 
> Junxian

^ permalink raw reply	[flat|nested] 73+ messages in thread

end of thread, other threads:[~2026-02-26  6:54 UTC | newest]

Thread overview: 73+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-13 10:57 [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
2026-02-13 10:57 ` [PATCH rdma-next 01/50] RDMA: Move DMA block iterator logic into dedicated files Leon Romanovsky
2026-02-13 10:57 ` [PATCH rdma-next 02/50] RDMA/umem: Allow including ib_umem header from any location Leon Romanovsky
2026-02-13 10:57 ` [PATCH rdma-next 03/50] RDMA/umem: Remove unnecessary includes and defines from ib_umem header Leon Romanovsky
2026-02-13 10:57 ` [PATCH rdma-next 04/50] RDMA/core: Promote UMEM to a core component Leon Romanovsky
2026-02-13 10:57 ` [PATCH rdma-next 05/50] RDMA/core: Manage CQ umem in core code Leon Romanovsky
2026-02-13 10:57 ` [PATCH rdma-next 06/50] RDMA/efa: Rely on CPU address in create‑QP Leon Romanovsky
2026-02-13 10:57 ` [PATCH rdma-next 07/50] RDMA/core: Prepare create CQ path for API unification Leon Romanovsky
2026-02-13 10:57 ` [PATCH rdma-next 08/50] RDMA/core: Reject zero CQE count Leon Romanovsky
2026-02-13 10:57 ` [PATCH rdma-next 09/50] RDMA/efa: Remove check for " Leon Romanovsky
2026-02-13 10:57 ` [PATCH rdma-next 10/50] RDMA/mlx5: Save 4 bytes in CQ structure Leon Romanovsky
2026-02-13 10:57 ` [PATCH rdma-next 11/50] RDMA/mlx5: Provide a modern CQ creation interface Leon Romanovsky
2026-02-13 10:57 ` [PATCH rdma-next 12/50] RDMA/mlx4: Inline mlx4_ib_get_cq_umem into callers Leon Romanovsky
2026-02-13 10:57 ` [PATCH rdma-next 13/50] RDMA/mlx4: Introduce a modern CQ creation interface Leon Romanovsky
2026-02-13 10:57 ` [PATCH rdma-next 14/50] RDMA/mlx4: Remove unused create_flags field from CQ structure Leon Romanovsky
2026-02-13 10:57 ` [PATCH rdma-next 15/50] RDMA/bnxt_re: Convert to modern CQ interface Leon Romanovsky
2026-02-13 10:57 ` [PATCH rdma-next 16/50] RDMA/cxgb4: Separate kernel and user CQ creation paths Leon Romanovsky
2026-02-13 10:57 ` [PATCH rdma-next 17/50] RDMA/mthca: Split user and kernel " Leon Romanovsky
2026-02-13 10:57 ` [PATCH rdma-next 18/50] RDMA/erdma: Separate " Leon Romanovsky
2026-02-24  5:51   ` Cheng Xu
2026-02-24 10:57     ` Leon Romanovsky
2026-02-13 10:57 ` [PATCH rdma-next 19/50] RDMA/ionic: Split " Leon Romanovsky
2026-02-13 10:57 ` [PATCH rdma-next 20/50] RDMA/qedr: Convert to modern CQ interface Leon Romanovsky
2026-02-13 10:57 ` [PATCH rdma-next 21/50] RDMA/vmw_pvrdma: Provide a modern CQ creation interface Leon Romanovsky
2026-02-13 10:57 ` [PATCH rdma-next 22/50] RDMA/ocrdma: Split user and kernel CQ creation paths Leon Romanovsky
2026-02-13 10:57 ` [PATCH rdma-next 23/50] RDMA/irdma: " Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 24/50] RDMA/usnic: Provide a modern CQ creation interface Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 25/50] RDMA/mana: " Leon Romanovsky
2026-02-24 22:30   ` [EXTERNAL] " Long Li
2026-02-25  8:24     ` Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 26/50] RDMA/erdma: Separate user and kernel CQ creation paths Leon Romanovsky
2026-02-24  2:20   ` Cheng Xu
2026-02-24 10:46     ` Leon Romanovsky
2026-02-26  6:17   ` Junxian Huang
2026-02-26  6:54     ` Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 27/50] RDMA/rdmavt: Split " Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 28/50] RDMA/siw: " Leon Romanovsky
2026-02-13 16:56   ` Bernard Metzler
2026-02-13 21:17     ` Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 29/50] RDMA/rxe: " Leon Romanovsky
2026-02-13 23:22   ` yanjun.zhu
2026-02-15  7:06     ` Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 30/50] RDMA/core: Remove legacy CQ creation fallback path Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 31/50] RDMA/core: Remove unused ib_resize_cq() implementation Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 32/50] RDMA: Clarify that CQ resize is a user‑space verb Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 33/50] RDMA/bnxt_re: Drop support for resizing kernel CQs Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 34/50] RDMA/irdma: Remove resize support for " Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 35/50] RDMA/mlx4: Remove support for kernel CQ resize Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 36/50] RDMA/mlx5: Remove support for resizing kernel CQs Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 37/50] RDMA/mthca: Remove resize support for " Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 38/50] RDMA/rdmavt: " Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 39/50] RDMA/rxe: Remove unused kernel‑side CQ resize support Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 40/50] RDMA: Properly propagate the number of CQEs as unsigned int Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 41/50] RDMA/core: Generalize CQ resize locking Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 42/50] RDMA/bnxt_re: Complete CQ resize in a single step Leon Romanovsky
2026-02-16  3:59   ` Selvin Xavier
2026-02-16  8:07     ` Leon Romanovsky
2026-02-17  5:02       ` Selvin Xavier
2026-02-17  7:56         ` Leon Romanovsky
2026-02-17 10:52           ` Selvin Xavier
2026-02-19  8:02             ` Selvin Xavier
2026-02-24  8:15   ` Selvin Xavier
2026-02-24 10:59     ` Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 43/50] RDMA/bnxt_re: Rely on common resize‑CQ locking Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 44/50] RDMA/bnxt_re: Reduce CQ memory footprint Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 45/50] RDMA/mlx4: Use generic resize-CQ lock Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 46/50] RDMA/mlx4: Use on‑stack variables instead of storing them in the CQ object Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 47/50] RDMA/mlx5: Use generic resize-CQ lock Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 48/50] RDMA/mlx5: Select resize‑CQ callback based on device capabilities Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 49/50] RDMA/mlx5: Reduce CQ memory footprint Leon Romanovsky
2026-02-13 10:58 ` [PATCH rdma-next 50/50] RDMA/mthca: Use generic resize-CQ lock Leon Romanovsky
2026-02-25 13:51 ` (subset) [PATCH rdma-next 00/50] RDMA: Ensure CQ UMEMs are managed by ib_core Leon Romanovsky
2026-02-25 13:53 ` Leon Romanovsky
