public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v6 0/6] net: mana: Per-vPort EQ and MSI-X interrupt management
@ 2026-04-29 22:16 Long Li
  2026-04-29 22:16 ` [PATCH net-next v6 1/6] net: mana: Create separate EQs for each vPort Long Li
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Long Li @ 2026-04-29 22:16 UTC (permalink / raw)
  To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
	Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
	Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
	Dexuan Cui
  Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel

This series adds per-vPort Event Queue (EQ) allocation and MSI-X interrupt
management for the MANA driver. Previously, all vPorts shared a single set
of EQs. This change enables dedicated EQs per vPort with support for both
dedicated and shared MSI-X vector allocation modes.

Patch 1 moves EQ ownership from mana_context to the per-vPort
mana_port_context and exports the create/destroy functions for the RDMA
driver. It also adds EQ create/destroy calls to mana_ib_cfg_vport() and
mana_ib_uncfg_vport() so RDMA vPorts get their own EQs.

Patch 2 adds device capability queries to determine whether MSI-X vectors
should be dedicated per-vPort or shared. When the number of available MSI-X
vectors is insufficient for dedicated allocation, the driver enables sharing
mode with bitmap-based vector assignment.
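As a rough worked example (assuming MANA_DEF_NUM_QUEUES is 16, per the v4
changelog note, and a hardware queue maximum of at least 16): with 65
usable MSI-X vectors and 4 vPorts, each vPort can get (65 - 1) / 4 = 16
dedicated vectors and 16 * 4 = 64 fits within the 64 non-HWC vectors, so
sharing stays off; with 8 vPorts the per-vPort count is clamped up to 16,
16 * 8 = 128 exceeds 64, and the driver falls back to sharing mode.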

Patch 3 introduces the GIC (GDMA IRQ Context) abstraction with reference
counting, allowing multiple EQs to safely share a single MSI-X vector.
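In shorthand, the consumer pattern added in patches 3-5 looks roughly like
this (a condensed sketch, not verbatim driver code):

	int msi = 0;
	struct gdma_irq_context *gic;

	/* Dedicated mode picks a free vector from the bitmap; shared
	 * mode attaches to the vector index the caller passes in.
	 */
	gic = mana_gd_get_gic(gc, !gc->msi_sharing, &msi);
	if (!gic)
		return -ENOMEM;

	/* ... create the EQ with spec.eq.msix_index = msi ... */

	/* The last put on a vector frees the IRQ and its bitmap slot. */
	mana_gd_put_gic(gc, !gc->msi_sharing, msi);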

Patch 4 converts the global EQ allocation in probe/resume to use the new
GIC functions.

Patch 5 adds per-vPort GIC lifecycle management: a GIC is acquired for
each EQ created during vPort open and released when the EQ is destroyed
during vPort close.

Patch 6 extends the same GIC lifecycle management to the RDMA driver's EQ
allocation path.

Changes in v6:
- Rebased on net-next/main (v7.1-rc1)

Changes in v5:
- Rebased on net-next/main

Changes in v4:
- Rebased on net-next/main 7.0-rc4
- Patch 2: Use MANA_DEF_NUM_QUEUES instead of hardcoded 16 for
  max_num_queues clamping
- Patch 3: Track dyn_msix in GIC context instead of re-checking
  pci_msix_can_alloc_dyn() on each call; improved remove_irqs iteration
  to skip unallocated entries

Changes in v3:
- Rebased on net-next/main
- Patch 1: Added NULL check for mpc->eqs in mana_ib_create_qp_rss() to
  prevent NULL pointer dereference when RSS QP is created before a raw QP
  has configured the vport and allocated EQs

Changes in v2:
- Rebased on net-next/main (adapted to kzalloc_objs/kzalloc_obj macros,
  new GDMA_DRV_CAP_FLAG definitions)
- Patch 2: Fixed misleading comment for max_num_queues vs
  max_num_queues_vport in gdma.h
- Patch 3: Fixed spelling typo in gdma_main.c ("difference" -> "different")

Long Li (6):
  net: mana: Create separate EQs for each vPort
  net: mana: Query device capabilities and configure MSI-X sharing for
    EQs
  net: mana: Introduce GIC context with refcounting for interrupt
    management
  net: mana: Use GIC functions to allocate global EQs
  net: mana: Allocate interrupt context for each EQ when creating vPort
  RDMA/mana_ib: Allocate interrupt contexts on EQs

 drivers/infiniband/hw/mana/main.c             |  47 ++-
 drivers/infiniband/hw/mana/qp.c               |  16 +-
 .../net/ethernet/microsoft/mana/gdma_main.c   | 307 +++++++++++++-----
 drivers/net/ethernet/microsoft/mana/mana_en.c | 163 ++++++----
 include/net/mana/gdma.h                       |  32 +-
 include/net/mana/mana.h                       |   7 +-
 6 files changed, 416 insertions(+), 156 deletions(-)

-- 
2.43.0

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH net-next v6 1/6] net: mana: Create separate EQs for each vPort
  2026-04-29 22:16 [PATCH net-next v6 0/6] net: mana: Per-vPort EQ and MSI-X interrupt management Long Li
@ 2026-04-29 22:16 ` Long Li
  2026-05-02 15:07   ` Simon Horman
  2026-05-02 15:23   ` Simon Horman
  2026-04-29 22:16 ` [PATCH net-next v6 2/6] net: mana: Query device capabilities and configure MSI-X sharing for EQs Long Li
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Long Li @ 2026-04-29 22:16 UTC (permalink / raw)
  To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
	Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
	Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
	Dexuan Cui
  Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel

To prepare for assigning vPorts to dedicated MSI-X vectors, remove EQ
sharing among the vPorts and create dedicated EQs for each vPort.

Move the EQ definition from struct mana_context to struct mana_port_context
and update related support functions. Export mana_create_eq() and
mana_destroy_eq() for use by the MANA RDMA driver.

Signed-off-by: Long Li <longli@microsoft.com>
---
Changes in v3:
- Added NULL check for mpc->eqs in mana_ib_create_qp_rss()

 drivers/infiniband/hw/mana/main.c             |  14 ++-
 drivers/infiniband/hw/mana/qp.c               |  16 ++-
 drivers/net/ethernet/microsoft/mana/mana_en.c | 110 ++++++++++--------
 include/net/mana/mana.h                       |   7 +-
 4 files changed, 94 insertions(+), 53 deletions(-)

diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
index ac5e75dd3494..60cc02e4ad10 100644
--- a/drivers/infiniband/hw/mana/main.c
+++ b/drivers/infiniband/hw/mana/main.c
@@ -20,8 +20,10 @@ void mana_ib_uncfg_vport(struct mana_ib_dev *dev, struct mana_ib_pd *pd,
 	pd->vport_use_count--;
 	WARN_ON(pd->vport_use_count < 0);
 
-	if (!pd->vport_use_count)
+	if (!pd->vport_use_count) {
+		mana_destroy_eq(mpc);
 		mana_uncfg_vport(mpc);
+	}
 
 	mutex_unlock(&pd->vport_mutex);
 }
@@ -55,15 +57,21 @@ int mana_ib_cfg_vport(struct mana_ib_dev *dev, u32 port, struct mana_ib_pd *pd,
 		return err;
 	}
 
-	mutex_unlock(&pd->vport_mutex);
 
 	pd->tx_shortform_allowed = mpc->tx_shortform_allowed;
 	pd->tx_vp_offset = mpc->tx_vp_offset;
+	err = mana_create_eq(mpc);
+	if (err) {
+		mana_uncfg_vport(mpc);
+		pd->vport_use_count--;
+	}
+
+	mutex_unlock(&pd->vport_mutex);
 
 	ibdev_dbg(&dev->ib_dev, "vport handle %llx pdid %x doorbell_id %x\n",
 		  mpc->port_handle, pd->pdn, doorbell_id);
 
-	return 0;
+	return err;
 }
 
 int mana_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
index 645581359cee..6f1043383e8c 100644
--- a/drivers/infiniband/hw/mana/qp.c
+++ b/drivers/infiniband/hw/mana/qp.c
@@ -168,7 +168,15 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
 		cq_spec.gdma_region = cq->queue.gdma_region;
 		cq_spec.queue_size = cq->cqe * COMP_ENTRY_SIZE;
 		cq_spec.modr_ctx_id = 0;
-		eq = &mpc->ac->eqs[cq->comp_vector];
+		/* EQs are created when a raw QP configures the vport.
+		 * A raw QP must be created before creating rwq_ind_tbl.
+		 */
+		if (!mpc->eqs) {
+			ret = -EINVAL;
+			i--;
+			goto fail;
+		}
+		eq = &mpc->eqs[cq->comp_vector % mpc->num_queues];
 		cq_spec.attached_eq = eq->eq->id;
 
 		ret = mana_create_wq_obj(mpc, mpc->port_handle, GDMA_RQ,
@@ -317,7 +325,11 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp, struct ib_pd *ibpd,
 	cq_spec.queue_size = send_cq->cqe * COMP_ENTRY_SIZE;
 	cq_spec.modr_ctx_id = 0;
 	eq_vec = send_cq->comp_vector;
-	eq = &mpc->ac->eqs[eq_vec];
+	if (!mpc->eqs) {
+		err = -EINVAL;
+		goto err_destroy_queue;
+	}
+	eq = &mpc->eqs[eq_vec % mpc->num_queues];
 	cq_spec.attached_eq = eq->eq->id;
 
 	err = mana_create_wq_obj(mpc, mpc->port_handle, GDMA_SQ, &wq_spec,
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index a654b3699c4c..6c709f8b875d 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1609,78 +1609,82 @@ void mana_destroy_wq_obj(struct mana_port_context *apc, u32 wq_type,
 }
 EXPORT_SYMBOL_NS(mana_destroy_wq_obj, "NET_MANA");
 
-static void mana_destroy_eq(struct mana_context *ac)
+void mana_destroy_eq(struct mana_port_context *apc)
 {
+	struct mana_context *ac = apc->ac;
 	struct gdma_context *gc = ac->gdma_dev->gdma_context;
 	struct gdma_queue *eq;
 	int i;
 
-	if (!ac->eqs)
+	if (!apc->eqs)
 		return;
 
-	debugfs_remove_recursive(ac->mana_eqs_debugfs);
-	ac->mana_eqs_debugfs = NULL;
+	debugfs_remove_recursive(apc->mana_eqs_debugfs);
+	apc->mana_eqs_debugfs = NULL;
 
-	for (i = 0; i < gc->max_num_queues; i++) {
-		eq = ac->eqs[i].eq;
+	for (i = 0; i < apc->num_queues; i++) {
+		eq = apc->eqs[i].eq;
 		if (!eq)
 			continue;
 
 		mana_gd_destroy_queue(gc, eq);
 	}
 
-	kfree(ac->eqs);
-	ac->eqs = NULL;
+	kfree(apc->eqs);
+	apc->eqs = NULL;
 }
+EXPORT_SYMBOL_NS(mana_destroy_eq, "NET_MANA");
 
-static void mana_create_eq_debugfs(struct mana_context *ac, int i)
+static void mana_create_eq_debugfs(struct mana_port_context *apc, int i)
 {
-	struct mana_eq eq = ac->eqs[i];
+	struct mana_eq eq = apc->eqs[i];
 	char eqnum[32];
 
 	sprintf(eqnum, "eq%d", i);
-	eq.mana_eq_debugfs = debugfs_create_dir(eqnum, ac->mana_eqs_debugfs);
+	eq.mana_eq_debugfs = debugfs_create_dir(eqnum, apc->mana_eqs_debugfs);
 	debugfs_create_u32("head", 0400, eq.mana_eq_debugfs, &eq.eq->head);
 	debugfs_create_u32("tail", 0400, eq.mana_eq_debugfs, &eq.eq->tail);
 	debugfs_create_file("eq_dump", 0400, eq.mana_eq_debugfs, eq.eq, &mana_dbg_q_fops);
 }
 
-static int mana_create_eq(struct mana_context *ac)
+int mana_create_eq(struct mana_port_context *apc)
 {
-	struct gdma_dev *gd = ac->gdma_dev;
+	struct gdma_dev *gd = apc->ac->gdma_dev;
 	struct gdma_context *gc = gd->gdma_context;
 	struct gdma_queue_spec spec = {};
 	int err;
 	int i;
 
-	ac->eqs = kzalloc_objs(struct mana_eq, gc->max_num_queues);
-	if (!ac->eqs)
+	WARN_ON(apc->eqs);
+	apc->eqs = kzalloc_objs(struct mana_eq, apc->num_queues);
+	if (!apc->eqs)
 		return -ENOMEM;
 
 	spec.type = GDMA_EQ;
 	spec.monitor_avl_buf = false;
 	spec.queue_size = EQ_SIZE;
 	spec.eq.callback = NULL;
-	spec.eq.context = ac->eqs;
+	spec.eq.context = apc->eqs;
 	spec.eq.log2_throttle_limit = LOG2_EQ_THROTTLE;
 
-	ac->mana_eqs_debugfs = debugfs_create_dir("EQs", gc->mana_pci_debugfs);
+	apc->mana_eqs_debugfs = debugfs_create_dir("EQs", apc->mana_port_debugfs);
 
-	for (i = 0; i < gc->max_num_queues; i++) {
+	for (i = 0; i < apc->num_queues; i++) {
 		spec.eq.msix_index = (i + 1) % gc->num_msix_usable;
-		err = mana_gd_create_mana_eq(gd, &spec, &ac->eqs[i].eq);
+		err = mana_gd_create_mana_eq(gd, &spec, &apc->eqs[i].eq);
 		if (err) {
 			dev_err(gc->dev, "Failed to create EQ %d : %d\n", i, err);
 			goto out;
 		}
-		mana_create_eq_debugfs(ac, i);
+		mana_create_eq_debugfs(apc, i);
 	}
 
 	return 0;
 out:
-	mana_destroy_eq(ac);
+	mana_destroy_eq(apc);
 	return err;
 }
+EXPORT_SYMBOL_NS(mana_create_eq, "NET_MANA");
 
 static int mana_fence_rq(struct mana_port_context *apc, struct mana_rxq *rxq)
 {
@@ -2434,7 +2438,7 @@ static int mana_create_txq(struct mana_port_context *apc,
 		spec.monitor_avl_buf = false;
 		spec.queue_size = cq_size;
 		spec.cq.callback = mana_schedule_napi;
-		spec.cq.parent_eq = ac->eqs[i].eq;
+		spec.cq.parent_eq = apc->eqs[i].eq;
 		spec.cq.context = cq;
 		err = mana_gd_create_mana_wq_cq(gd, &spec, &cq->gdma_cq);
 		if (err)
@@ -2827,13 +2831,12 @@ static void mana_create_rxq_debugfs(struct mana_port_context *apc, int idx)
 static int mana_add_rx_queues(struct mana_port_context *apc,
 			      struct net_device *ndev)
 {
-	struct mana_context *ac = apc->ac;
 	struct mana_rxq *rxq;
 	int err = 0;
 	int i;
 
 	for (i = 0; i < apc->num_queues; i++) {
-		rxq = mana_create_rxq(apc, i, &ac->eqs[i], ndev);
+		rxq = mana_create_rxq(apc, i, &apc->eqs[i], ndev);
 		if (!rxq) {
 			err = -ENOMEM;
 			netdev_err(ndev, "Failed to create rxq %d : %d\n", i, err);
@@ -2852,9 +2855,8 @@ static int mana_add_rx_queues(struct mana_port_context *apc,
 	return err;
 }
 
-static void mana_destroy_vport(struct mana_port_context *apc)
+static void mana_destroy_rxqs(struct mana_port_context *apc)
 {
-	struct gdma_dev *gd = apc->ac->gdma_dev;
 	struct mana_rxq *rxq;
 	u32 rxq_idx;
 
@@ -2866,8 +2868,12 @@ static void mana_destroy_vport(struct mana_port_context *apc)
 		mana_destroy_rxq(apc, rxq, true);
 		apc->rxqs[rxq_idx] = NULL;
 	}
+}
+
+static void mana_destroy_vport(struct mana_port_context *apc)
+{
+	struct gdma_dev *gd = apc->ac->gdma_dev;
 
-	mana_destroy_txq(apc);
 	mana_uncfg_vport(apc);
 
 	if (gd->gdma_context->is_pf && !apc->ac->bm_hostmode)
@@ -2888,11 +2894,7 @@ static int mana_create_vport(struct mana_port_context *apc,
 			return err;
 	}
 
-	err = mana_cfg_vport(apc, gd->pdid, gd->doorbell);
-	if (err)
-		return err;
-
-	return mana_create_txq(apc, net);
+	return mana_cfg_vport(apc, gd->pdid, gd->doorbell);
 }
 
 static int mana_rss_table_alloc(struct mana_port_context *apc)
@@ -3178,21 +3180,36 @@ int mana_alloc_queues(struct net_device *ndev)
 
 	err = mana_create_vport(apc, ndev);
 	if (err) {
-		netdev_err(ndev, "Failed to create vPort %u : %d\n", apc->port_idx, err);
+		netdev_err(ndev, "Failed to create vPort %u : %d\n",
+			   apc->port_idx, err);
 		return err;
 	}
 
+	err = mana_create_eq(apc);
+	if (err) {
+		netdev_err(ndev, "Failed to create EQ on vPort %u: %d\n",
+			   apc->port_idx, err);
+		goto destroy_vport;
+	}
+
+	err = mana_create_txq(apc, ndev);
+	if (err) {
+		netdev_err(ndev, "Failed to create TXQ on vPort %u: %d\n",
+			   apc->port_idx, err);
+		goto destroy_eq;
+	}
+
 	err = netif_set_real_num_tx_queues(ndev, apc->num_queues);
 	if (err) {
 		netdev_err(ndev,
 			   "netif_set_real_num_tx_queues () failed for ndev with num_queues %u : %d\n",
 			   apc->num_queues, err);
-		goto destroy_vport;
+		goto destroy_txq;
 	}
 
 	err = mana_add_rx_queues(apc, ndev);
 	if (err)
-		goto destroy_vport;
+		goto destroy_rxq;
 
 	apc->rss_state = apc->num_queues > 1 ? TRI_STATE_TRUE : TRI_STATE_FALSE;
 
@@ -3201,7 +3218,7 @@ int mana_alloc_queues(struct net_device *ndev)
 		netdev_err(ndev,
 			   "netif_set_real_num_rx_queues () failed for ndev with num_queues %u : %d\n",
 			   apc->num_queues, err);
-		goto destroy_vport;
+		goto destroy_rxq;
 	}
 
 	mana_rss_table_init(apc);
@@ -3209,19 +3226,25 @@ int mana_alloc_queues(struct net_device *ndev)
 	err = mana_config_rss(apc, TRI_STATE_TRUE, true, true);
 	if (err) {
 		netdev_err(ndev, "Failed to configure RSS table: %d\n", err);
-		goto destroy_vport;
+		goto destroy_rxq;
 	}
 
 	if (gd->gdma_context->is_pf && !apc->ac->bm_hostmode) {
 		err = mana_pf_register_filter(apc);
 		if (err)
-			goto destroy_vport;
+			goto destroy_rxq;
 	}
 
 	mana_chn_setxdp(apc, mana_xdp_get(apc));
 
 	return 0;
 
+destroy_rxq:
+	mana_destroy_rxqs(apc);
+destroy_txq:
+	mana_destroy_txq(apc);
+destroy_eq:
+	mana_destroy_eq(apc);
 destroy_vport:
 	mana_destroy_vport(apc);
 	return err;
@@ -3326,6 +3349,9 @@ static int mana_dealloc_queues(struct net_device *ndev)
 	mana_fence_rqs(apc);
 
 	/* Even in err case, still need to cleanup the vPort */
+	mana_destroy_rxqs(apc);
+	mana_destroy_txq(apc);
+	mana_destroy_eq(apc);
 	mana_destroy_vport(apc);
 
 	return 0;
@@ -3646,12 +3672,6 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
 
 	INIT_DELAYED_WORK(&ac->gf_stats_work, mana_gf_stats_work_handler);
 
-	err = mana_create_eq(ac);
-	if (err) {
-		dev_err(dev, "Failed to create EQs: %d\n", err);
-		goto out;
-	}
-
 	err = mana_query_device_cfg(ac, MANA_MAJOR_VERSION, MANA_MINOR_VERSION,
 				    MANA_MICRO_VERSION, &num_ports, &bm_hostmode);
 	if (err)
@@ -3791,8 +3811,6 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
 		free_netdev(ndev);
 	}
 
-	mana_destroy_eq(ac);
-
 	if (ac->per_port_queue_reset_wq) {
 		destroy_workqueue(ac->per_port_queue_reset_wq);
 		ac->per_port_queue_reset_wq = NULL;
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 8f721cd4e4a7..2634e9135eed 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -480,8 +480,6 @@ struct mana_context {
 	u8 bm_hostmode;
 
 	struct mana_ethtool_hc_stats hc_stats;
-	struct mana_eq *eqs;
-	struct dentry *mana_eqs_debugfs;
 	struct workqueue_struct *per_port_queue_reset_wq;
 	/* Workqueue for querying hardware stats */
 	struct delayed_work gf_stats_work;
@@ -501,6 +499,9 @@ struct mana_port_context {
 
 	u8 mac_addr[ETH_ALEN];
 
+	struct mana_eq *eqs;
+	struct dentry *mana_eqs_debugfs;
+
 	enum TRI_STATE rss_state;
 
 	mana_handle_t default_rxobj;
@@ -1034,6 +1035,8 @@ void mana_destroy_wq_obj(struct mana_port_context *apc, u32 wq_type,
 int mana_cfg_vport(struct mana_port_context *apc, u32 protection_dom_id,
 		   u32 doorbell_pg_id);
 void mana_uncfg_vport(struct mana_port_context *apc);
+int mana_create_eq(struct mana_port_context *apc);
+void mana_destroy_eq(struct mana_port_context *apc);
 
 struct net_device *mana_get_primary_netdev(struct mana_context *ac,
 					   u32 port_index,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next v6 2/6] net: mana: Query device capabilities and configure MSI-X sharing for EQs
  2026-04-29 22:16 [PATCH net-next v6 0/6] net: mana: Per-vPort EQ and MSI-X interrupt management Long Li
  2026-04-29 22:16 ` [PATCH net-next v6 1/6] net: mana: Create separate EQs for each vPort Long Li
@ 2026-04-29 22:16 ` Long Li
  2026-05-02 15:08   ` Simon Horman
  2026-05-02 15:26   ` Simon Horman
  2026-04-29 22:16 ` [PATCH net-next v6 3/6] net: mana: Introduce GIC context with refcounting for interrupt management Long Li
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Long Li @ 2026-04-29 22:16 UTC (permalink / raw)
  To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
	Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
	Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
	Dexuan Cui
  Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel

When querying the device, adjust the max number of queues to allow
dedicated MSI-X vectors for each vPort. The number of queues per vPort
is clamped to no less than MANA_DEF_NUM_QUEUES. MSI-X sharing among
vPorts is disabled by default and is only enabled when there are not
enough MSI-X vectors for dedicated allocation.

Rename mana_query_device_cfg() to mana_gd_query_device_cfg() as it is
used at GDMA device probe time for querying device capabilities.

Signed-off-by: Long Li <longli@microsoft.com>
---
Changes in v4:
- Use MANA_DEF_NUM_QUEUES instead of hardcoded 16 for max_num_queues
  clamping

Changes in v2:
- Fixed misleading comment for max_num_queues vs max_num_queues_vport

 .../net/ethernet/microsoft/mana/gdma_main.c   | 66 ++++++++++++++++---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 36 +++++-----
 include/net/mana/gdma.h                       | 13 +++-
 3 files changed, 91 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 098fbda0d128..b96859e0aec9 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -149,6 +149,9 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
 	struct gdma_context *gc = pci_get_drvdata(pdev);
 	struct gdma_query_max_resources_resp resp = {};
 	struct gdma_general_req req = {};
+	unsigned int max_num_queues;
+	u8 bm_hostmode;
+	u16 num_ports;
 	int err;
 
 	mana_gd_init_req_hdr(&req.hdr, GDMA_QUERY_MAX_RESOURCES,
@@ -194,6 +197,40 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
 	if (gc->max_num_queues > gc->num_msix_usable - 1)
 		gc->max_num_queues = gc->num_msix_usable - 1;
 
+	err = mana_gd_query_device_cfg(gc, MANA_MAJOR_VERSION, MANA_MINOR_VERSION,
+				       MANA_MICRO_VERSION, &num_ports, &bm_hostmode);
+	if (err)
+		return err;
+
+	if (!num_ports)
+		return -EINVAL;
+
+	/*
+	 * Adjust gc->max_num_queues returned from the SOC to allow dedicated
+	 * MSIx for each vPort. Clamp to no less than MANA_DEF_NUM_QUEUES.
+	 */
+	max_num_queues = (gc->num_msix_usable - 1) / num_ports;
+	max_num_queues = roundup_pow_of_two(max(max_num_queues, 1U));
+	if (max_num_queues < MANA_DEF_NUM_QUEUES)
+		max_num_queues = MANA_DEF_NUM_QUEUES;
+
+	/*
+	 * Use dedicated MSIx for EQs whenever possible, use MSIx sharing for
+	 * Ethernet EQs when (max_num_queues * num_ports > num_msix_usable - 1)
+	 */
+	max_num_queues = min(gc->max_num_queues, max_num_queues);
+	if (max_num_queues * num_ports > gc->num_msix_usable - 1)
+		gc->msi_sharing = true;
+
+	/* If MSI is shared, use max allowed value */
+	if (gc->msi_sharing)
+		gc->max_num_queues_vport = min(gc->num_msix_usable - 1, gc->max_num_queues);
+	else
+		gc->max_num_queues_vport = max_num_queues;
+
+	dev_info(gc->dev, "MSI sharing mode %d max queues %d\n",
+		 gc->msi_sharing, gc->max_num_queues);
+
 	return 0;
 }
 
@@ -1856,6 +1893,7 @@ static int mana_gd_setup_hwc_irqs(struct pci_dev *pdev)
 		/* Need 1 interrupt for HWC */
 		max_irqs = min(num_online_cpus(), MANA_MAX_NUM_QUEUES) + 1;
 		min_irqs = 2;
+		gc->msi_sharing = true;
 	}
 
 	nvec = pci_alloc_irq_vectors(pdev, min_irqs, max_irqs, PCI_IRQ_MSIX);
@@ -1934,6 +1972,8 @@ static void mana_gd_remove_irqs(struct pci_dev *pdev)
 
 	pci_free_irq_vectors(pdev);
 
+	bitmap_free(gc->msi_bitmap);
+	gc->msi_bitmap = NULL;
 	gc->max_num_msix = 0;
 	gc->num_msix_usable = 0;
 }
@@ -1968,20 +2008,30 @@ static int mana_gd_setup(struct pci_dev *pdev)
 	if (err)
 		goto destroy_hwc;
 
-	err = mana_gd_query_max_resources(pdev);
+	err = mana_gd_detect_devices(pdev);
 	if (err)
 		goto destroy_hwc;
 
-	err = mana_gd_setup_remaining_irqs(pdev);
-	if (err) {
-		dev_err(gc->dev, "Failed to setup remaining IRQs: %d", err);
-		goto destroy_hwc;
-	}
-
-	err = mana_gd_detect_devices(pdev);
+	err = mana_gd_query_max_resources(pdev);
 	if (err)
 		goto destroy_hwc;
 
+	if (!gc->msi_sharing) {
+		gc->msi_bitmap = bitmap_zalloc(gc->num_msix_usable, GFP_KERNEL);
+		if (!gc->msi_bitmap) {
+			err = -ENOMEM;
+			goto destroy_hwc;
+		}
+		/* Set bit for HWC */
+		set_bit(0, gc->msi_bitmap);
+	} else {
+		err = mana_gd_setup_remaining_irqs(pdev);
+		if (err) {
+			dev_err(gc->dev, "Failed to setup remaining IRQs: %d", err);
+			goto destroy_hwc;
+		}
+	}
+
 	dev_dbg(&pdev->dev, "mana gdma setup successful\n");
 	return 0;
 
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 6c709f8b875d..e7f734994b5e 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1007,10 +1007,9 @@ static int mana_init_port_context(struct mana_port_context *apc)
 	return !apc->rxqs ? -ENOMEM : 0;
 }
 
-static int mana_send_request(struct mana_context *ac, void *in_buf,
-			     u32 in_len, void *out_buf, u32 out_len)
+static int gdma_mana_send_request(struct gdma_context *gc, void *in_buf,
+				  u32 in_len, void *out_buf, u32 out_len)
 {
-	struct gdma_context *gc = ac->gdma_dev->gdma_context;
 	struct gdma_resp_hdr *resp = out_buf;
 	struct gdma_req_hdr *req = in_buf;
 	struct device *dev = gc->dev;
@@ -1044,6 +1043,14 @@ static int mana_send_request(struct mana_context *ac, void *in_buf,
 	return 0;
 }
 
+static int mana_send_request(struct mana_context *ac, void *in_buf,
+			     u32 in_len, void *out_buf, u32 out_len)
+{
+	struct gdma_context *gc = ac->gdma_dev->gdma_context;
+
+	return gdma_mana_send_request(gc, in_buf, in_len, out_buf, out_len);
+}
+
 static int mana_verify_resp_hdr(const struct gdma_resp_hdr *resp_hdr,
 				const enum mana_command_code expected_code,
 				const u32 min_size)
@@ -1177,11 +1184,10 @@ static void mana_pf_deregister_filter(struct mana_port_context *apc)
 			   err, resp.hdr.status);
 }
 
-static int mana_query_device_cfg(struct mana_context *ac, u32 proto_major_ver,
-				 u32 proto_minor_ver, u32 proto_micro_ver,
-				 u16 *max_num_vports, u8 *bm_hostmode)
+int mana_gd_query_device_cfg(struct gdma_context *gc, u32 proto_major_ver,
+			     u32 proto_minor_ver, u32 proto_micro_ver,
+			     u16 *max_num_vports, u8 *bm_hostmode)
 {
-	struct gdma_context *gc = ac->gdma_dev->gdma_context;
 	struct mana_query_device_cfg_resp resp = {};
 	struct mana_query_device_cfg_req req = {};
 	struct device *dev = gc->dev;
@@ -1196,7 +1202,7 @@ static int mana_query_device_cfg(struct mana_context *ac, u32 proto_major_ver,
 	req.proto_minor_ver = proto_minor_ver;
 	req.proto_micro_ver = proto_micro_ver;
 
-	err = mana_send_request(ac, &req, sizeof(req), &resp, sizeof(resp));
+	err = gdma_mana_send_request(gc, &req, sizeof(req), &resp, sizeof(resp));
 	if (err) {
 		dev_err(dev, "Failed to query config: %d", err);
 		return err;
@@ -1230,8 +1236,6 @@ static int mana_query_device_cfg(struct mana_context *ac, u32 proto_major_ver,
 	else
 		*bm_hostmode = 0;
 
-	debugfs_create_u16("adapter-MTU", 0400, gc->mana_pci_debugfs, &gc->adapter_mtu);
-
 	return 0;
 }
 
@@ -3397,7 +3401,7 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
 	int err;
 
 	ndev = alloc_etherdev_mq(sizeof(struct mana_port_context),
-				 gc->max_num_queues);
+				 gc->max_num_queues_vport);
 	if (!ndev)
 		return -ENOMEM;
 
@@ -3406,9 +3410,9 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
 	apc = netdev_priv(ndev);
 	apc->ac = ac;
 	apc->ndev = ndev;
-	apc->max_queues = gc->max_num_queues;
+	apc->max_queues = gc->max_num_queues_vport;
 	/* Use MANA_DEF_NUM_QUEUES as default, still honoring the HW limit */
-	apc->num_queues = min(gc->max_num_queues, MANA_DEF_NUM_QUEUES);
+	apc->num_queues = min(gc->max_num_queues_vport, MANA_DEF_NUM_QUEUES);
 	apc->tx_queue_size = DEF_TX_BUFFERS_PER_QUEUE;
 	apc->rx_queue_size = DEF_RX_BUFFERS_PER_QUEUE;
 	apc->port_handle = INVALID_MANA_HANDLE;
@@ -3672,13 +3676,15 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
 
 	INIT_DELAYED_WORK(&ac->gf_stats_work, mana_gf_stats_work_handler);
 
-	err = mana_query_device_cfg(ac, MANA_MAJOR_VERSION, MANA_MINOR_VERSION,
-				    MANA_MICRO_VERSION, &num_ports, &bm_hostmode);
+	err = mana_gd_query_device_cfg(gc, MANA_MAJOR_VERSION, MANA_MINOR_VERSION,
+				       MANA_MICRO_VERSION, &num_ports, &bm_hostmode);
 	if (err)
 		goto out;
 
 	ac->bm_hostmode = bm_hostmode;
 
+	debugfs_create_u16("adapter-MTU", 0400, gc->mana_pci_debugfs, &gc->adapter_mtu);
+
 	if (!resuming) {
 		ac->num_ports = num_ports;
 	} else {
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index 6d836060976a..9c05b1e15c3e 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -399,8 +399,10 @@ struct gdma_context {
 	struct device		*dev;
 	struct dentry		*mana_pci_debugfs;
 
-	/* Per-vPort max number of queues */
+	/* Hardware max number of queues */
 	unsigned int		max_num_queues;
+	/* Per-vPort max number of queues */
+	unsigned int		max_num_queues_vport;
 	unsigned int		max_num_msix;
 	unsigned int		num_msix_usable;
 	struct xarray		irq_contexts;
@@ -446,6 +448,12 @@ struct gdma_context {
 	struct workqueue_struct *service_wq;
 
 	unsigned long		flags;
+
+	/* Indicate if this device is sharing MSI for EQs on MANA */
+	bool msi_sharing;
+
+	/* Bitmap tracks where MSI is allocated when it is not shared for EQs */
+	unsigned long *msi_bitmap;
 };
 
 static inline bool mana_gd_is_mana(struct gdma_dev *gd)
@@ -1018,4 +1026,7 @@ int mana_gd_resume(struct pci_dev *pdev);
 
 bool mana_need_log(struct gdma_context *gc, int err);
 
+int mana_gd_query_device_cfg(struct gdma_context *gc, u32 proto_major_ver,
+			     u32 proto_minor_ver, u32 proto_micro_ver,
+			     u16 *max_num_vports, u8 *bm_hostmode);
 #endif /* _GDMA_H */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next v6 3/6] net: mana: Introduce GIC context with refcounting for interrupt management
  2026-04-29 22:16 [PATCH net-next v6 0/6] net: mana: Per-vPort EQ and MSI-X interrupt management Long Li
  2026-04-29 22:16 ` [PATCH net-next v6 1/6] net: mana: Create separate EQs for each vPort Long Li
  2026-04-29 22:16 ` [PATCH net-next v6 2/6] net: mana: Query device capabilities and configure MSI-X sharing for EQs Long Li
@ 2026-04-29 22:16 ` Long Li
  2026-04-29 22:16 ` [PATCH net-next v6 4/6] net: mana: Use GIC functions to allocate global EQs Long Li
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Long Li @ 2026-04-29 22:16 UTC (permalink / raw)
  To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
	Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
	Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
	Dexuan Cui
  Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel

To allow Ethernet EQs to use dedicated or shared MSI-X vectors and RDMA
EQs to share the same MSI-X, introduce a GIC (GDMA IRQ Context) with
reference counting. This allows the driver to create an interrupt context
on an assigned or unassigned MSI-X vector and share it across multiple
EQ consumers.
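
The refcounting gives the following semantics when two EQs attach to the
same MSI-X vector (an illustrative trace, not verbatim code):

	mana_gd_get_gic(gc, false, &msi); /* 1st get: request_irq(), refcount 1 */
	mana_gd_get_gic(gc, false, &msi); /* 2nd get: refcount 2, IRQ reused */
	mana_gd_put_gic(gc, false, msi);  /* refcount 1, IRQ kept */
	mana_gd_put_gic(gc, false, msi);  /* last put: free_irq(), context freed */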

Signed-off-by: Long Li <longli@microsoft.com>
---
Changes in v4:
- Track dyn_msix in GIC context instead of re-checking
  pci_msix_can_alloc_dyn() on each call; improved remove_irqs
  iteration to skip unallocated entries

Changes in v2:
- Fixed spelling typo ("difference" -> "different")

 .../net/ethernet/microsoft/mana/gdma_main.c   | 159 ++++++++++++++++++
 include/net/mana/gdma.h                       |  11 ++
 2 files changed, 170 insertions(+)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index b96859e0aec9..3b6711355002 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -1612,6 +1612,164 @@ static irqreturn_t mana_gd_intr(int irq, void *arg)
 	return IRQ_HANDLED;
 }
 
+void mana_gd_put_gic(struct gdma_context *gc, bool use_msi_bitmap, int msi)
+{
+	struct pci_dev *dev = to_pci_dev(gc->dev);
+	struct msi_map irq_map;
+	struct gdma_irq_context *gic;
+	int irq;
+
+	mutex_lock(&gc->gic_mutex);
+
+	gic = xa_load(&gc->irq_contexts, msi);
+	if (WARN_ON(!gic)) {
+		mutex_unlock(&gc->gic_mutex);
+		return;
+	}
+
+	if (use_msi_bitmap)
+		gic->bitmap_refs--;
+
+	if (use_msi_bitmap && gic->bitmap_refs == 0)
+		clear_bit(msi, gc->msi_bitmap);
+
+	if (!refcount_dec_and_test(&gic->refcount))
+		goto out;
+
+	irq = pci_irq_vector(dev, msi);
+
+	irq_update_affinity_hint(irq, NULL);
+	free_irq(irq, gic);
+
+	if (gic->dyn_msix) {
+		irq_map.virq = irq;
+		irq_map.index = msi;
+		pci_msix_free_irq(dev, irq_map);
+	}
+
+	xa_erase(&gc->irq_contexts, msi);
+	kfree(gic);
+
+out:
+	mutex_unlock(&gc->gic_mutex);
+}
+EXPORT_SYMBOL_NS(mana_gd_put_gic, "NET_MANA");
+
+/*
+ * Get a GIC (GDMA IRQ Context) on an MSI vector.
+ * An MSI can be shared between different EQs; this function supports
+ * setting up separate MSIs using a bitmap, or directly using the MSI index.
+ *
+ * @use_msi_bitmap:
+ * True if the MSI is assigned by this function from available slots in
+ * the bitmap.
+ * False if the MSI is passed in via *msi_requested.
+ */
+struct gdma_irq_context *mana_gd_get_gic(struct gdma_context *gc,
+					 bool use_msi_bitmap,
+					 int *msi_requested)
+{
+	struct gdma_irq_context *gic;
+	struct pci_dev *dev = to_pci_dev(gc->dev);
+	struct msi_map irq_map = { };
+	int irq;
+	int msi;
+	int err;
+
+	mutex_lock(&gc->gic_mutex);
+
+	if (use_msi_bitmap) {
+		msi = find_first_zero_bit(gc->msi_bitmap, gc->num_msix_usable);
+		if (msi >= gc->num_msix_usable) {
+			dev_err(gc->dev, "No free MSI vectors available\n");
+			gic = NULL;
+			goto out;
+		}
+		*msi_requested = msi;
+	} else {
+		msi = *msi_requested;
+	}
+
+	gic = xa_load(&gc->irq_contexts, msi);
+	if (gic) {
+		refcount_inc(&gic->refcount);
+		if (use_msi_bitmap) {
+			gic->bitmap_refs++;
+			set_bit(msi, gc->msi_bitmap);
+		}
+		goto out;
+	}
+
+	irq = pci_irq_vector(dev, msi);
+	if (irq == -EINVAL) {
+		irq_map = pci_msix_alloc_irq_at(dev, msi, NULL);
+		if (!irq_map.virq) {
+			err = irq_map.index;
+			dev_err(gc->dev,
+				"Failed to alloc irq_map msi %d err %d\n",
+				msi, err);
+			gic = NULL;
+			goto out;
+		}
+		irq = irq_map.virq;
+		msi = irq_map.index;
+	}
+
+	gic = kzalloc(sizeof(*gic), GFP_KERNEL);
+	if (!gic) {
+		if (irq_map.virq)
+			pci_msix_free_irq(dev, irq_map);
+		goto out;
+	}
+
+	gic->handler = mana_gd_process_eq_events;
+	gic->msi = msi;
+	gic->irq = irq;
+	INIT_LIST_HEAD(&gic->eq_list);
+	spin_lock_init(&gic->lock);
+
+	if (!gic->msi)
+		snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_hwc@pci:%s",
+			 pci_name(dev));
+	else
+		snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_msi%d@pci:%s",
+			 gic->msi, pci_name(dev));
+
+	err = request_irq(irq, mana_gd_intr, 0, gic->name, gic);
+	if (err) {
+		dev_err(gc->dev, "Failed to request irq %d %s\n",
+			irq, gic->name);
+		kfree(gic);
+		gic = NULL;
+		if (irq_map.virq)
+			pci_msix_free_irq(dev, irq_map);
+		goto out;
+	}
+
+	gic->dyn_msix = !!irq_map.virq;
+	refcount_set(&gic->refcount, 1);
+	gic->bitmap_refs = use_msi_bitmap ? 1 : 0;
+
+	err = xa_err(xa_store(&gc->irq_contexts, msi, gic, GFP_KERNEL));
+	if (err) {
+		dev_err(gc->dev, "Failed to store irq context for msi %d: %d\n",
+			msi, err);
+		free_irq(irq, gic);
+		kfree(gic);
+		gic = NULL;
+		if (irq_map.virq)
+			pci_msix_free_irq(dev, irq_map);
+		goto out;
+	}
+
+	if (use_msi_bitmap)
+		set_bit(msi, gc->msi_bitmap);
+
+out:
+	mutex_unlock(&gc->gic_mutex);
+	return gic;
+}
+EXPORT_SYMBOL_NS(mana_gd_get_gic, "NET_MANA");
+
 int mana_gd_alloc_res_map(u32 res_avail, struct gdma_resource *r)
 {
 	r->map = bitmap_zalloc(res_avail, GFP_KERNEL);
@@ -2101,6 +2259,7 @@ static int mana_gd_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto release_region;
 
 	mutex_init(&gc->eq_test_event_mutex);
+	mutex_init(&gc->gic_mutex);
 	pci_set_drvdata(pdev, gc);
 	gc->bar0_pa = pci_resource_start(pdev, 0);
 	gc->bar0_size = pci_resource_len(pdev, 0);
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index 9c05b1e15c3e..690208a26121 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -388,6 +388,11 @@ struct gdma_irq_context {
 	spinlock_t lock;
 	struct list_head eq_list;
 	char name[MANA_IRQ_NAME_SZ];
+	unsigned int msi;
+	unsigned int irq;
+	refcount_t refcount;
+	unsigned int bitmap_refs;
+	bool dyn_msix;
 };
 
 enum gdma_context_flags {
@@ -449,6 +454,9 @@ struct gdma_context {
 
 	unsigned long		flags;
 
+	/* Protect access to GIC context */
+	struct mutex		gic_mutex;
+
 	/* Indicate if this device is sharing MSI for EQs on MANA */
 	bool msi_sharing;
 
@@ -1026,6 +1034,9 @@ int mana_gd_resume(struct pci_dev *pdev);
 
 bool mana_need_log(struct gdma_context *gc, int err);
 
+struct gdma_irq_context *mana_gd_get_gic(struct gdma_context *gc, bool use_msi_bitmap,
+					 int *msi_requested);
+void mana_gd_put_gic(struct gdma_context *gc, bool use_msi_bitmap, int msi);
 int mana_gd_query_device_cfg(struct gdma_context *gc, u32 proto_major_ver,
 			     u32 proto_minor_ver, u32 proto_micro_ver,
 			     u16 *max_num_vports, u8 *bm_hostmode);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next v6 4/6] net: mana: Use GIC functions to allocate global EQs
  2026-04-29 22:16 [PATCH net-next v6 0/6] net: mana: Per-vPort EQ and MSI-X interrupt management Long Li
                   ` (2 preceding siblings ...)
  2026-04-29 22:16 ` [PATCH net-next v6 3/6] net: mana: Introduce GIC context with refcounting for interrupt management Long Li
@ 2026-04-29 22:16 ` Long Li
  2026-04-29 22:16 ` [PATCH net-next v6 5/6] net: mana: Allocate interrupt context for each EQ when creating vPort Long Li
  2026-04-29 22:16 ` [PATCH net-next v6 6/6] RDMA/mana_ib: Allocate interrupt contexts on EQs Long Li
  5 siblings, 0 replies; 17+ messages in thread
From: Long Li @ 2026-04-29 22:16 UTC (permalink / raw)
  To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
	Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
	Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
	Dexuan Cui
  Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel

Replace the GDMA global interrupt setup code with the new GIC allocation
and release functions for managing interrupt contexts.

Signed-off-by: Long Li <longli@microsoft.com>
---
 .../net/ethernet/microsoft/mana/gdma_main.c   | 80 +++----------------
 1 file changed, 10 insertions(+), 70 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 3b6711355002..ce433a68938f 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -1885,30 +1885,13 @@ static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec)
 	 * further used in irq_setup()
 	 */
 	for (i = 1; i <= nvec; i++) {
-		gic = kzalloc_obj(*gic);
+		gic = mana_gd_get_gic(gc, false, &i);
 		if (!gic) {
 			err = -ENOMEM;
 			goto free_irq;
 		}
-		gic->handler = mana_gd_process_eq_events;
-		INIT_LIST_HEAD(&gic->eq_list);
-		spin_lock_init(&gic->lock);
-
-		snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_q%d@pci:%s",
-			 i - 1, pci_name(pdev));
-
-		/* one pci vector is already allocated for HWC */
-		irqs[i - 1] = pci_irq_vector(pdev, i);
-		if (irqs[i - 1] < 0) {
-			err = irqs[i - 1];
-			goto free_current_gic;
-		}
-
-		err = request_irq(irqs[i - 1], mana_gd_intr, 0, gic->name, gic);
-		if (err)
-			goto free_current_gic;
 
-		xa_store(&gc->irq_contexts, i, gic, GFP_KERNEL);
+		irqs[i - 1] = gic->irq;
 	}
 
 	/*
@@ -1930,19 +1913,11 @@ static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec)
 	kfree(irqs);
 	return 0;
 
-free_current_gic:
-	kfree(gic);
 free_irq:
 	for (i -= 1; i > 0; i--) {
 		irq = pci_irq_vector(pdev, i);
-		gic = xa_load(&gc->irq_contexts, i);
-		if (WARN_ON(!gic))
-			continue;
-
 		irq_update_affinity_hint(irq, NULL);
-		free_irq(irq, gic);
-		xa_erase(&gc->irq_contexts, i);
-		kfree(gic);
+		mana_gd_put_gic(gc, false, i);
 	}
 	kfree(irqs);
 	return err;
@@ -1963,34 +1938,13 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev, int nvec)
 	start_irqs = irqs;
 
 	for (i = 0; i < nvec; i++) {
-		gic = kzalloc_obj(*gic);
+		gic = mana_gd_get_gic(gc, false, &i);
 		if (!gic) {
 			err = -ENOMEM;
 			goto free_irq;
 		}
 
-		gic->handler = mana_gd_process_eq_events;
-		INIT_LIST_HEAD(&gic->eq_list);
-		spin_lock_init(&gic->lock);
-
-		if (!i)
-			snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_hwc@pci:%s",
-				 pci_name(pdev));
-		else
-			snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_q%d@pci:%s",
-				 i - 1, pci_name(pdev));
-
-		irqs[i] = pci_irq_vector(pdev, i);
-		if (irqs[i] < 0) {
-			err = irqs[i];
-			goto free_current_gic;
-		}
-
-		err = request_irq(irqs[i], mana_gd_intr, 0, gic->name, gic);
-		if (err)
-			goto free_current_gic;
-
-		xa_store(&gc->irq_contexts, i, gic, GFP_KERNEL);
+		irqs[i] = gic->irq;
 	}
 
 	/* If number of IRQ is one extra than number of online CPUs,
@@ -2019,19 +1973,11 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev, int nvec)
 	kfree(start_irqs);
 	return 0;
 
-free_current_gic:
-	kfree(gic);
 free_irq:
 	for (i -= 1; i >= 0; i--) {
 		irq = pci_irq_vector(pdev, i);
-		gic = xa_load(&gc->irq_contexts, i);
-		if (WARN_ON(!gic))
-			continue;
-
 		irq_update_affinity_hint(irq, NULL);
-		free_irq(irq, gic);
-		xa_erase(&gc->irq_contexts, i);
-		kfree(gic);
+		mana_gd_put_gic(gc, false, i);
 	}
 
 	kfree(start_irqs);
@@ -2106,26 +2052,20 @@ static int mana_gd_setup_remaining_irqs(struct pci_dev *pdev)
 static void mana_gd_remove_irqs(struct pci_dev *pdev)
 {
 	struct gdma_context *gc = pci_get_drvdata(pdev);
-	struct gdma_irq_context *gic;
 	int irq, i;
 
 	if (gc->max_num_msix < 1)
 		return;
 
 	for (i = 0; i < gc->max_num_msix; i++) {
-		irq = pci_irq_vector(pdev, i);
-		if (irq < 0)
-			continue;
-
-		gic = xa_load(&gc->irq_contexts, i);
-		if (WARN_ON(!gic))
+		if (!xa_load(&gc->irq_contexts, i))
 			continue;
 
 		/* Need to clear the hint before free_irq */
+		irq = pci_irq_vector(pdev, i);
 		irq_update_affinity_hint(irq, NULL);
-		free_irq(irq, gic);
-		xa_erase(&gc->irq_contexts, i);
-		kfree(gic);
+
+		mana_gd_put_gic(gc, false, i);
 	}
 
 	pci_free_irq_vectors(pdev);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next v6 5/6] net: mana: Allocate interrupt context for each EQ when creating vPort
  2026-04-29 22:16 [PATCH net-next v6 0/6] net: mana: Per-vPort EQ and MSI-X interrupt management Long Li
                   ` (3 preceding siblings ...)
  2026-04-29 22:16 ` [PATCH net-next v6 4/6] net: mana: Use GIC functions to allocate global EQs Long Li
@ 2026-04-29 22:16 ` Long Li
  2026-04-29 22:16 ` [PATCH net-next v6 6/6] RDMA/mana_ib: Allocate interrupt contexts on EQs Long Li
  5 siblings, 0 replies; 17+ messages in thread
From: Long Li @ 2026-04-29 22:16 UTC (permalink / raw)
  To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
	Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
	Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
	Dexuan Cui
  Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel

Use GIC functions to create a dedicated interrupt context or acquire a
shared interrupt context for each EQ when setting up a vPort.

Signed-off-by: Long Li <longli@microsoft.com>
---
 drivers/net/ethernet/microsoft/mana/gdma_main.c |  2 +-
 drivers/net/ethernet/microsoft/mana/mana_en.c   | 17 ++++++++++++++++-
 include/net/mana/gdma.h                         |  1 +
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index ce433a68938f..ccecf2adcfe6 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -851,7 +851,6 @@ static void mana_gd_deregister_irq(struct gdma_queue *queue)
 	}
 	spin_unlock_irqrestore(&gic->lock, flags);
 
-	queue->eq.msix_index = INVALID_PCI_MSIX_INDEX;
 	synchronize_rcu();
 }
 
@@ -966,6 +965,7 @@ static int mana_gd_create_eq(struct gdma_dev *gd,
 out:
 	dev_err(dev, "Failed to create EQ: %d\n", err);
 	mana_gd_destroy_eq(gc, false, queue);
+	queue->eq.msix_index = INVALID_PCI_MSIX_INDEX;
 	return err;
 }
 
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index e7f734994b5e..15dcfb009ef0 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1619,6 +1619,7 @@ void mana_destroy_eq(struct mana_port_context *apc)
 	struct gdma_context *gc = ac->gdma_dev->gdma_context;
 	struct gdma_queue *eq;
 	int i;
+	unsigned int msi;
 
 	if (!apc->eqs)
 		return;
@@ -1631,7 +1632,9 @@ void mana_destroy_eq(struct mana_port_context *apc)
 		if (!eq)
 			continue;
 
+		msi = eq->eq.msix_index;
 		mana_gd_destroy_queue(gc, eq);
+		mana_gd_put_gic(gc, !gc->msi_sharing, msi);
 	}
 
 	kfree(apc->eqs);
@@ -1648,6 +1651,7 @@ static void mana_create_eq_debugfs(struct mana_port_context *apc, int i)
 	eq.mana_eq_debugfs = debugfs_create_dir(eqnum, apc->mana_eqs_debugfs);
 	debugfs_create_u32("head", 0400, eq.mana_eq_debugfs, &eq.eq->head);
 	debugfs_create_u32("tail", 0400, eq.mana_eq_debugfs, &eq.eq->tail);
+	debugfs_create_u32("irq", 0400, eq.mana_eq_debugfs, &eq.eq->eq.irq);
 	debugfs_create_file("eq_dump", 0400, eq.mana_eq_debugfs, eq.eq, &mana_dbg_q_fops);
 }
 
@@ -1658,6 +1662,7 @@ int mana_create_eq(struct mana_port_context *apc)
 	struct gdma_queue_spec spec = {};
 	int err;
 	int i;
+	struct gdma_irq_context *gic;
 
 	WARN_ON(apc->eqs);
 	apc->eqs = kzalloc_objs(struct mana_eq, apc->num_queues);
@@ -1674,12 +1679,22 @@ int mana_create_eq(struct mana_port_context *apc)
 	apc->mana_eqs_debugfs = debugfs_create_dir("EQs", apc->mana_port_debugfs);
 
 	for (i = 0; i < apc->num_queues; i++) {
-		spec.eq.msix_index = (i + 1) % gc->num_msix_usable;
+		if (gc->msi_sharing)
+			spec.eq.msix_index = (i + 1) % gc->num_msix_usable;
+
+		gic = mana_gd_get_gic(gc, !gc->msi_sharing, &spec.eq.msix_index);
+		if (!gic) {
+			err = -ENOMEM;
+			goto out;
+		}
+
 		err = mana_gd_create_mana_eq(gd, &spec, &apc->eqs[i].eq);
 		if (err) {
 			dev_err(gc->dev, "Failed to create EQ %d : %d\n", i, err);
+			mana_gd_put_gic(gc, !gc->msi_sharing, spec.eq.msix_index);
 			goto out;
 		}
+		apc->eqs[i].eq->eq.irq = gic->irq;
 		mana_create_eq_debugfs(apc, i);
 	}
 
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index 690208a26121..240d7f1c0733 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -342,6 +342,7 @@ struct gdma_queue {
 			void *context;
 
 			unsigned int msix_index;
+			unsigned int irq;
 
 			u32 log2_throttle_limit;
 		} eq;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next v6 6/6] RDMA/mana_ib: Allocate interrupt contexts on EQs
  2026-04-29 22:16 [PATCH net-next v6 0/6] net: mana: Per-vPort EQ and MSI-X interrupt management Long Li
                   ` (4 preceding siblings ...)
  2026-04-29 22:16 ` [PATCH net-next v6 5/6] net: mana: Allocate interrupt context for each EQ when creating vPort Long Li
@ 2026-04-29 22:16 ` Long Li
  5 siblings, 0 replies; 17+ messages in thread
From: Long Li @ 2026-04-29 22:16 UTC (permalink / raw)
  To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
	Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
	Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
	Dexuan Cui
  Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel

Use the GIC functions to allocate interrupt contexts for RDMA EQs. These
interrupt contexts may be shared with Ethernet EQs when MSI-X vectors
are limited.

The driver now supports allocating a dedicated MSI-X vector for each EQ.
Indicate this capability to the hardware through a driver capability bit.

Signed-off-by: Long Li <longli@microsoft.com>
---
 drivers/infiniband/hw/mana/main.c | 33 ++++++++++++++++++++++++++-----
 include/net/mana/gdma.h           |  7 +++++--
 2 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
index 60cc02e4ad10..2267a73f0d6e 100644
--- a/drivers/infiniband/hw/mana/main.c
+++ b/drivers/infiniband/hw/mana/main.c
@@ -748,6 +748,7 @@ int mana_ib_create_eqs(struct mana_ib_dev *mdev)
 {
 	struct gdma_context *gc = mdev_to_gc(mdev);
 	struct gdma_queue_spec spec = {};
+	struct gdma_irq_context *gic;
 	int err, i;
 
 	spec.type = GDMA_EQ;
@@ -758,9 +759,15 @@ int mana_ib_create_eqs(struct mana_ib_dev *mdev)
 	spec.eq.log2_throttle_limit = LOG2_EQ_THROTTLE;
 	spec.eq.msix_index = 0;
 
+	gic = mana_gd_get_gic(gc, false, &spec.eq.msix_index);
+	if (!gic)
+		return -ENOMEM;
+
 	err = mana_gd_create_mana_eq(mdev->gdma_dev, &spec, &mdev->fatal_err_eq);
-	if (err)
+	if (err) {
+		mana_gd_put_gic(gc, false, 0);
 		return err;
+	}
 
 	mdev->eqs = kzalloc_objs(struct gdma_queue *,
 				 mdev->ib_dev.num_comp_vectors);
@@ -771,31 +778,47 @@ int mana_ib_create_eqs(struct mana_ib_dev *mdev)
 	spec.eq.callback = NULL;
 	for (i = 0; i < mdev->ib_dev.num_comp_vectors; i++) {
 		spec.eq.msix_index = (i + 1) % gc->num_msix_usable;
+
+		gic = mana_gd_get_gic(gc, false, &spec.eq.msix_index);
+		if (!gic) {
+			err = -ENOMEM;
+			goto destroy_eqs;
+		}
+
 		err = mana_gd_create_mana_eq(mdev->gdma_dev, &spec, &mdev->eqs[i]);
-		if (err)
+		if (err) {
+			mana_gd_put_gic(gc, false, spec.eq.msix_index);
 			goto destroy_eqs;
+		}
 	}
 
 	return 0;
 
 destroy_eqs:
-	while (i-- > 0)
+	while (i-- > 0) {
 		mana_gd_destroy_queue(gc, mdev->eqs[i]);
+		mana_gd_put_gic(gc, false, (i + 1) % gc->num_msix_usable);
+	}
 	kfree(mdev->eqs);
 destroy_fatal_eq:
 	mana_gd_destroy_queue(gc, mdev->fatal_err_eq);
+	mana_gd_put_gic(gc, false, 0);
 	return err;
 }
 
 void mana_ib_destroy_eqs(struct mana_ib_dev *mdev)
 {
 	struct gdma_context *gc = mdev_to_gc(mdev);
-	int i;
+	int i, msi;
 
 	mana_gd_destroy_queue(gc, mdev->fatal_err_eq);
+	mana_gd_put_gic(gc, false, 0);
 
-	for (i = 0; i < mdev->ib_dev.num_comp_vectors; i++)
+	for (i = 0; i < mdev->ib_dev.num_comp_vectors; i++) {
 		mana_gd_destroy_queue(gc, mdev->eqs[i]);
+		msi = (i + 1) % gc->num_msix_usable;
+		mana_gd_put_gic(gc, false, msi);
+	}
 
 	kfree(mdev->eqs);
 }
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index 240d7f1c0733..12502b1b7be1 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -615,6 +615,7 @@ enum {
 #define GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG BIT(3)
 #define GDMA_DRV_CAP_FLAG_1_GDMA_PAGES_4MB_1GB_2GB BIT(4)
 #define GDMA_DRV_CAP_FLAG_1_VARIABLE_INDIRECTION_TABLE_SUPPORT BIT(5)
+#define GDMA_DRV_CAP_FLAG_1_HW_VPORT_LINK_AWARE BIT(6)
 
 /* Driver can handle holes (zeros) in the device list */
 #define GDMA_DRV_CAP_FLAG_1_DEV_LIST_HOLES_SUP BIT(11)
@@ -631,7 +632,8 @@ enum {
 /* Driver detects stalled send queues and recovers them */
 #define GDMA_DRV_CAP_FLAG_1_HANDLE_STALL_SQ_RECOVERY BIT(18)
 
-#define GDMA_DRV_CAP_FLAG_1_HW_VPORT_LINK_AWARE BIT(6)
+/* Driver supports separate EQ/MSIs for each vPort */
+#define GDMA_DRV_CAP_FLAG_1_EQ_MSI_UNSHARE_MULTI_VPORT BIT(19)
 
 /* Driver supports linearizing the skb when num_sge exceeds hardware limit */
 #define GDMA_DRV_CAP_FLAG_1_SKB_LINEARIZE BIT(20)
@@ -659,7 +661,8 @@ enum {
 	 GDMA_DRV_CAP_FLAG_1_SKB_LINEARIZE | \
 	 GDMA_DRV_CAP_FLAG_1_PROBE_RECOVERY | \
 	 GDMA_DRV_CAP_FLAG_1_HANDLE_STALL_SQ_RECOVERY | \
-	 GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECOVERY)
+	 GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECOVERY | \
+	 GDMA_DRV_CAP_FLAG_1_EQ_MSI_UNSHARE_MULTI_VPORT)
 
 #define GDMA_DRV_CAP_FLAGS2 0
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next v6 1/6] net: mana: Create separate EQs for each vPort
  2026-04-29 22:16 ` [PATCH net-next v6 1/6] net: mana: Create separate EQs for each vPort Long Li
@ 2026-05-02 15:07   ` Simon Horman
  2026-05-04 21:51     ` [EXTERNAL] " Long Li
  2026-05-02 15:23   ` Simon Horman
  1 sibling, 1 reply; 17+ messages in thread
From: Simon Horman @ 2026-05-02 15:07 UTC (permalink / raw)
  To: longli
  Cc: 'Simon Horman', kotaranov, kuba, davem, pabeni, edumazet,
	andrew+netdev, jgg, leon, haiyangz, kys, wei.liu, decui, netdev,
	linux-rdma, linux-hyperv, linux-kernel

From: 'Simon Horman' <horms@kernel.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
net: mana: Create separate EQs for each vPort

This commit modifies the mana driver to prepare for assigning vPorts to
dedicated MSI-X vectors by removing EQ sharing among vPorts. It moves the EQ
definition from the mana_context to the mana_port_context and exports EQ
lifecycle functions for use by the MANA RDMA driver.

> diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
> index 645581359cee..6f1043383e8c 100644
> --- a/drivers/infiniband/hw/mana/qp.c
> +++ b/drivers/infiniband/hw/mana/qp.c
> @@ -168,7 +168,15 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
>  		cq_spec.gdma_region = cq->queue.gdma_region;
>  		cq_spec.queue_size = cq->cqe * COMP_ENTRY_SIZE;
>  		cq_spec.modr_ctx_id = 0;
> -		eq = &mpc->ac->eqs[cq->comp_vector];
> +		/* EQs are created when a raw QP configures the vport.
> +		 * A raw QP must be created before creating rwq_ind_tbl.
> +		 */
> +		if (!mpc->eqs) {
> +			ret = -EINVAL;
> +			i--;
> +			goto fail;
> +		}
> +		eq = &mpc->eqs[cq->comp_vector % mpc->num_queues];

Is it possible for the EQs to be destroyed while this RSS QP is still actively
using them?

If the EQs are created by the Ethernet interface being brought up, or by
a RAW QP configuring the vport, this RSS QP will attach to them without
incrementing pd->vport_use_count or taking any vport reference count.

If the Ethernet interface is subsequently brought down, or the RAW QP is
destroyed, mana_destroy_eq() will be called, freeing the mpc->eqs array and
destroying the underlying DMA regions while this RSS QP remains active. This
regression could allow the hardware to DMA completion events into freed
EQ memory.

Additionally, since mpc->eqs is accessed here without holding pd->vport_mutex,
could a concurrent teardown of the EQs lead to a use-after-free when reading
eq->eq->id?

>  		cq_spec.attached_eq = eq->eq->id;
>  
>  		ret = mana_create_wq_obj(mpc, mpc->port_handle, GDMA_RQ,

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next v6 2/6] net: mana: Query device capabilities and configure MSI-X sharing for EQs
  2026-04-29 22:16 ` [PATCH net-next v6 2/6] net: mana: Query device capabilities and configure MSI-X sharing for EQs Long Li
@ 2026-05-02 15:08   ` Simon Horman
  2026-05-04 22:21     ` [EXTERNAL] " Long Li
  2026-05-02 15:26   ` Simon Horman
  1 sibling, 1 reply; 17+ messages in thread
From: Simon Horman @ 2026-05-02 15:08 UTC (permalink / raw)
  To: longli
  Cc: 'Simon Horman', kotaranov, kuba, davem, pabeni, edumazet,
	andrew+netdev, jgg, leon, haiyangz, kys, wei.liu, decui, netdev,
	linux-rdma, linux-hyperv, linux-kernel

From: 'Simon Horman' <horms@kernel.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
net: mana: Query device capabilities and configure MSI-X sharing for EQs

When querying the device, this adjusts the max number of queues to allow
dedicated MSI-X vectors for each vPort, clamping to no less than
MANA_DEF_NUM_QUEUES. MSI-X sharing among vPorts is disabled by default
and enabled only when there are not enough vectors. It also renames
mana_query_device_cfg() to mana_gd_query_device_cfg().

> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c

[ ... ]

> +	/*
> +	 * Adjust gc->max_num_queues returned from the SOC to allow dedicated
> +	 * MSIx for each vPort. Clamp to no less than MANA_DEF_NUM_QUEUES.
> +	 */
> +	max_num_queues = (gc->num_msix_usable - 1) / num_ports;
> +	max_num_queues = roundup_pow_of_two(max(max_num_queues, 1U));

Does rounding up here inflate max_num_queues and unnecessarily force MSI-X
sharing?

For example, if there are 63 usable MSIs and 2 ports, the division yields 31.
Rounding up gives 32. The subsequent check below:

> +	/*
> +	 * Use dedicated MSIx for EQs whenever possible, use MSIx sharing for
> +	 * Ethernet EQs when (max_num_queues * num_ports > num_msix_usable - 1)
> +	 */
> +	max_num_queues = min(gc->max_num_queues, max_num_queues);
> +	if (max_num_queues * num_ports > gc->num_msix_usable - 1)
> +		gc->msi_sharing = true;

would then evaluate to true (32 * 2 = 64 > 62) and force the driver into
MSI-X shared mode. This seems to contradict the intent to use dedicated
MSI-X whenever possible.
whenever possible.

Would it be better to use rounddown_pow_of_two() instead to ensure the
calculated queues fit within the available dedicated vectors?
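
Continuing the example above (illustrative arithmetic only):

    rounddown_pow_of_two(31) = 16;  16 * 2 = 32 <= 62

so the dedicated-vector path would still be taken.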

[ ... ]

> @@ -1968,20 +2008,30 @@ static int mana_gd_setup(struct pci_dev *pdev)
>  	if (err)
>  		goto destroy_hwc;
>  
> -	err = mana_gd_query_max_resources(pdev);
> +	err = mana_gd_detect_devices(pdev);
>  	if (err)
>  		goto destroy_hwc;
>  
> -	err = mana_gd_setup_remaining_irqs(pdev);
> -	if (err) {
> -		dev_err(gc->dev, "Failed to setup remaining IRQs: %d", err);
> -		goto destroy_hwc;
> -	}
> -
> -	err = mana_gd_detect_devices(pdev);
> +	err = mana_gd_query_max_resources(pdev);
>  	if (err)
>  		goto destroy_hwc;
>  
> +	if (!gc->msi_sharing) {
> +		gc->msi_bitmap = bitmap_zalloc(gc->num_msix_usable, GFP_KERNEL);
> +		if (!gc->msi_bitmap) {
> +			err = -ENOMEM;
> +			goto destroy_hwc;
> +		}
> +		/* Set bit for HWC */
> +		set_bit(0, gc->msi_bitmap);
> +	} else {
> +		err = mana_gd_setup_remaining_irqs(pdev);
> +		if (err) {
> +			dev_err(gc->dev, "Failed to setup remaining IRQs: %d", err);
> +			goto destroy_hwc;
> +		}
> +	}

If msi_sharing is disabled, we allocate the msi_bitmap but skip calling
mana_gd_setup_remaining_irqs(). 

Since mana_gd_setup_hwc_irqs() only allocates a single vector for the hardware
channel when dynamic allocation is supported, does this leave the device
without interrupts for its Ethernet queues?

If so, it seems this could lead to queue creation failures when the driver
attempts to map uninitialized vectors. I notice this is fixed in a later patch
in the series ("net: mana: Allocate interrupt context for each EQ when
creating vPort"), but does leaving it out here break bisectability?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next v6 1/6] net: mana: Create separate EQs for each vPort
  2026-04-29 22:16 ` [PATCH net-next v6 1/6] net: mana: Create separate EQs for each vPort Long Li
  2026-05-02 15:07   ` Simon Horman
@ 2026-05-02 15:23   ` Simon Horman
  2026-05-02 15:29     ` Simon Horman
  2026-05-04 22:07     ` Long Li
  1 sibling, 2 replies; 17+ messages in thread
From: Simon Horman @ 2026-05-02 15:23 UTC (permalink / raw)
  To: longli
  Cc: 'Simon Horman', kotaranov, kuba, davem, pabeni, edumazet,
	andrew+netdev, jgg, leon, haiyangz, kys, wei.liu, decui, netdev,
	linux-rdma, linux-hyperv, linux-kernel

From: 'Simon Horman' <horms@kernel.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
net: mana: Create separate EQs for each vPort

This change moves EQ ownership from struct mana_context to struct
mana_port_context so each vPort can have dedicated EQs, and exports
mana_create_eq()/mana_destroy_eq() for use by the MANA RDMA driver.

> diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
> index ac5e75dd3494..60cc02e4ad10 100644
> --- a/drivers/infiniband/hw/mana/main.c
> +++ b/drivers/infiniband/hw/mana/main.c
> @@ -20,8 +20,10 @@ void mana_ib_uncfg_vport(struct mana_ib_dev *dev, struct mana_ib_pd *pd,
>  	pd->vport_use_count--;
>  	WARN_ON(pd->vport_use_count < 0);
>  
> -	if (!pd->vport_use_count)
> +	if (!pd->vport_use_count) {
> +		mana_destroy_eq(mpc);
>  		mana_uncfg_vport(mpc);
> +	}
>  
>  	mutex_unlock(&pd->vport_mutex);
>  }
> @@ -55,15 +57,21 @@ int mana_ib_cfg_vport(struct mana_ib_dev *dev, u32 port, struct mana_ib_pd *pd,
>  		return err;
>  	}
>  
> -	mutex_unlock(&pd->vport_mutex);
>  
>  	pd->tx_shortform_allowed = mpc->tx_shortform_allowed;
>  	pd->tx_vp_offset = mpc->tx_vp_offset;
> +	err = mana_create_eq(mpc);
> +	if (err) {
> +		mana_uncfg_vport(mpc);
> +		pd->vport_use_count--;
> +	}
> +
> +	mutex_unlock(&pd->vport_mutex);
>  
>  	ibdev_dbg(&dev->ib_dev, "vport handle %llx pdid %x doorbell_id %x\n",
>  		  mpc->port_handle, pd->pdn, doorbell_id);

On the failure path, mana_uncfg_vport() is called and pd->vport_use_count
is decremented, but execution falls through to the ibdev_dbg() line that
prints the vport handle as if it were a success. Should this debug print
be skipped on the error path so logs are not ambiguous?
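
Something like the following would keep the print on the success path
only (untested sketch, reusing the identifiers from the hunk above):

	err = mana_create_eq(mpc);
	if (err) {
		mana_uncfg_vport(mpc);
		pd->vport_use_count--;
		mutex_unlock(&pd->vport_mutex);
		return err;
	}

	mutex_unlock(&pd->vport_mutex);

	ibdev_dbg(&dev->ib_dev, "vport handle %llx pdid %x doorbell_id %x\n",
		  mpc->port_handle, pd->pdn, doorbell_id);

	return 0;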

Also, pd->tx_shortform_allowed and pd->tx_vp_offset are assigned from mpc
before the mana_create_eq() attempt and are not cleared when that call
fails. Callers are expected to gate on pd->vport_use_count, but is it
worth resetting these fields on failure so the state matches the
pre-patch behavior?

The mutex_unlock(&pd->vport_mutex) has moved from before the field
assignments to after mana_create_eq(). That now holds pd->vport_mutex
across GFP_KERNEL allocations, debugfs directory creation, and multiple
mana_gd_create_mana_eq() firmware RPCs, and establishes the lock order
pd->vport_mutex -> apc->vport_mutex via mana_uncfg_vport() on the error
path. Is the extended critical section intentional?

>  
> -	return 0;
> +	return err;
>  }
>  
>  int mana_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
> diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
> index 645581359cee..6f1043383e8c 100644
> --- a/drivers/infiniband/hw/mana/qp.c
> +++ b/drivers/infiniband/hw/mana/qp.c
> @@ -168,7 +168,15 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
>  		cq_spec.gdma_region = cq->queue.gdma_region;
>  		cq_spec.queue_size = cq->cqe * COMP_ENTRY_SIZE;
>  		cq_spec.modr_ctx_id = 0;
> -		eq = &mpc->ac->eqs[cq->comp_vector];
> +		/* EQs are created when a raw QP configures the vport.
> +		 * A raw QP must be created before creating rwq_ind_tbl.
> +		 */
> +		if (!mpc->eqs) {
> +			ret = -EINVAL;
> +			i--;
> +			goto fail;
> +		}
> +		eq = &mpc->eqs[cq->comp_vector % mpc->num_queues];

Can the mpc->eqs read race with a concurrent free? mana_ib_create_qp_rss()
runs without pd->vport_mutex or RTNL, but mpc->eqs is freed by
mana_destroy_eq() from two paths:

  mana_ib_uncfg_vport()   (under pd->vport_mutex, on last raw-QP destroy)
  mana_dealloc_queues()   (under RTNL, on netdev down)

both of which do:

  kfree(apc->eqs);
  apc->eqs = NULL;

with no RCU grace period or reader-visible synchronization. If CPU-A
passes the !mpc->eqs check after CPU-B begins ip link set dev X down,
does CPU-A then dereference freed memory via mpc->eqs[...].eq->id?

Separately, what populates mpc->eqs for an RDMA-only RSS QP user on a
probed-but-idle Ethernet port? Pre-patch mana_probe() called
mana_create_eq(ac) unconditionally, so ac->eqs existed for the device
lifetime. After this patch the only creators are mana_alloc_queues()
(Ethernet up) and mana_ib_cfg_vport() (raw QP). An RDMA application that
uses only RSS QPs and never creates a raw QP will now get -EINVAL here
where it used to succeed. Is this intended, and should the commit log
mention it?

The adjacent comment:

   /* EQs are created when a raw QP configures the vport.
    * A raw QP must be created before creating rwq_ind_tbl.
    */

omits the Ethernet-up path that also populates mpc->eqs. Would it be
clearer to describe both creators?

There is also a behavior change in the index expression:

   eq = &mpc->eqs[cq->comp_vector % mpc->num_queues];

Pre-patch this was ac->eqs[cq->comp_vector] sized by gc->max_num_queues.
Now comp_vector is folded modulo mpc->num_queues, which is tunable via
ethtool -L. Userspace that used distinct comp_vector values to hit
distinct EQs will silently alias when comp_vector >= num_queues. Should
this be documented or rejected with -EINVAL rather than silently
wrapped?
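
If rejection is preferred over wrapping, a guard mirroring the existing
!mpc->eqs check might look like this (untested):

	if (cq->comp_vector >= mpc->num_queues) {
		ret = -EINVAL;
		i--;
		goto fail;
	}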

Can mpc->num_queues be 0 at this point? mana_set_channels() does not
reject new_count==0, and kzalloc_objs(struct mana_eq, 0) returns
ZERO_SIZE_PTR, which passes the !mpc->eqs check. During the window
between mana_create_eq() and the subsequent netif_set_real_num_tx_queues()
failing, a concurrent RDMA QP create would compute
cq->comp_vector % 0 here. Should mpc->num_queues be validated alongside
mpc->eqs?

The placement of the !mpc->eqs check is inside the per-iteration loop
over ind_tbl_size, but mpc->eqs cannot change across iterations, so the
check is only meaningful at i==0. It works today because i-- then makes
i = -1 and the cleanup while (i-- > 0) skips. Would hoisting the check
above the loop be clearer and less fragile against refactoring?
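
That is, assuming nothing allocated earlier in the function needs
unwinding at that point, a single hoisted guard (untested) could replace
the in-loop check and also address the num_queues concern above:

	/* EQs and a nonzero queue count are prerequisites for any CQ below */
	if (!mpc->eqs || !mpc->num_queues)
		return -EINVAL;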

>  		cq_spec.attached_eq = eq->eq->id;
>  
>  		ret = mana_create_wq_obj(mpc, mpc->port_handle, GDMA_RQ,
> @@ -317,7 +325,11 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp, struct ib_pd *ibpd,
>  	cq_spec.queue_size = send_cq->cqe * COMP_ENTRY_SIZE;
>  	cq_spec.modr_ctx_id = 0;
>  	eq_vec = send_cq->comp_vector;
> -	eq = &mpc->ac->eqs[eq_vec];
> +	if (!mpc->eqs) {
> +		err = -EINVAL;
> +		goto err_destroy_queue;
> +	}
> +	eq = &mpc->eqs[eq_vec % mpc->num_queues];

The same mpc->eqs TOCTOU concern applies here: the check and subsequent
dereference occur without pd->vport_mutex or RTNL held, so a concurrent
mana_dealloc_queues() on the Ethernet side can free mpc->eqs between the
check and the index access. Is there synchronization that prevents this?

The same comp_vector % num_queues aliasing and num_queues==0 divide
concerns apply here as well.

>  	cq_spec.attached_eq = eq->eq->id;
>  
>  	err = mana_create_wq_obj(mpc, mpc->port_handle, GDMA_SQ, &wq_spec,
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index a654b3699c4c..6c709f8b875d 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c

[ ... ]

> -static int mana_create_eq(struct mana_context *ac)
> +int mana_create_eq(struct mana_port_context *apc)
>  {
> -	struct gdma_dev *gd = ac->gdma_dev;
> +	struct gdma_dev *gd = apc->ac->gdma_dev;
>  	struct gdma_context *gc = gd->gdma_context;
>  	struct gdma_queue_spec spec = {};
>  	int err;
>  	int i;
>  
> -	ac->eqs = kzalloc_objs(struct mana_eq, gc->max_num_queues);
> -	if (!ac->eqs)
> +	WARN_ON(apc->eqs);
> +	apc->eqs = kzalloc_objs(struct mana_eq, apc->num_queues);
> +	if (!apc->eqs)
>  		return -ENOMEM;

WARN_ON(apc->eqs) is used as a defensive assertion but execution
proceeds to overwrite apc->eqs unconditionally. If the invariant is
violated, the previous array and all HW EQs it referenced are leaked.
Should this instead bail out with an error or call mana_destroy_eq()
first to keep the function idempotent against future callers?
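
e.g. (untested):

	if (WARN_ON(apc->eqs))
		return -EBUSY;

	apc->eqs = kzalloc_objs(struct mana_eq, apc->num_queues);
	if (!apc->eqs)
		return -ENOMEM;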

[ ... ]

> @@ -3326,6 +3349,9 @@ static int mana_dealloc_queues(struct net_device *ndev)
>  	mana_fence_rqs(apc);
>  
>  	/* Even in err case, still need to cleanup the vPort */
> +	mana_destroy_rxqs(apc);
> +	mana_destroy_txq(apc);
> +	mana_destroy_eq(apc);
>  	mana_destroy_vport(apc);

What is the intended interaction with outstanding RDMA QPs here? An
RDMA user that created an RSS QP while the Ethernet netdev was up sees
mpc->eqs populated by mana_alloc_queues(). When the admin issues
ip link set dev N down, mana_dealloc_queues() -> mana_destroy_eq()
destroys those EQs and frees mpc->eqs while the RDMA QPs are still
alive, leaving the QPs with dangling attached_eq IDs at the hardware
level and stale kernel references.

Pre-patch ac->eqs lived for the full mana_context lifetime and was torn
down only in mana_remove(). Is unconditionally destroying the EQs on
netdev-down the intended new behavior, and if so how are concurrent
RDMA consumers expected to recover?

>  	return 0;
>  }

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next v6 2/6] net: mana: Query device capabilities and configure MSI-X sharing for EQs
  2026-04-29 22:16 ` [PATCH net-next v6 2/6] net: mana: Query device capabilities and configure MSI-X sharing for EQs Long Li
  2026-05-02 15:08   ` Simon Horman
@ 2026-05-02 15:26   ` Simon Horman
  2026-05-04 22:30     ` [EXTERNAL] " Long Li
  1 sibling, 1 reply; 17+ messages in thread
From: Simon Horman @ 2026-05-02 15:26 UTC (permalink / raw)
  To: longli
  Cc: 'Simon Horman', kotaranov, kuba, davem, pabeni, edumazet,
	andrew+netdev, jgg, leon, haiyangz, kys, wei.liu, decui, netdev,
	linux-rdma, linux-hyperv, linux-kernel

From: 'Simon Horman' <horms@kernel.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/

I apologise for any overlap between this review and the review of this
patchset available at sashiko.dev. They both seem relevant and
I'm unsure how to cleanly pass them both on.

Both Sashiko instances also flagged concerns in the remaining
patches of this patchset. But I am not forwarding them as they
seem to be lower priority, and I think I've already created
enough noise here.

---
net: mana: Query device capabilities and configure MSI-X sharing for EQs

This patch queries device capabilities early in GDMA probe and chooses
between dedicated MSI-X per vPort or a shared-MSI-X mode based on the
available vector count, introducing gc->max_num_queues_vport,
gc->msi_sharing and gc->msi_bitmap, and renaming mana_query_device_cfg()
to mana_gd_query_device_cfg() so it can be called from the GDMA layer.

> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> index 098fbda0d128..b96859e0aec9 100644
> --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> @@ -149,6 +149,9 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
>  	struct gdma_context *gc = pci_get_drvdata(pdev);
>  	struct gdma_query_max_resources_resp resp = {};
>  	struct gdma_general_req req = {};
> +	unsigned int max_num_queues;
> +	u8 bm_hostmode;
> +	u16 num_ports;
>  	int err;
>  
>  	mana_gd_init_req_hdr(&req.hdr, GDMA_QUERY_MAX_RESOURCES,
> @@ -194,6 +197,40 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
>  	if (gc->max_num_queues > gc->num_msix_usable - 1)
>  		gc->max_num_queues = gc->num_msix_usable - 1;
>  
> +	err = mana_gd_query_device_cfg(gc, MANA_MAJOR_VERSION, MANA_MINOR_VERSION,
> +				       MANA_MICRO_VERSION, &num_ports, &bm_hostmode);
> +	if (err)
> +		return err;

Is this the same MANA_QUERY_DEV_CONFIG that mana_probe() issues shortly
after?  mana_probe() still contains:

	err = mana_gd_query_device_cfg(gc, MANA_MAJOR_VERSION, MANA_MINOR_VERSION,
				       MANA_MICRO_VERSION, &num_ports, &bm_hostmode);

so the same HWC round-trip runs twice on every probe with identical
arguments, both populating gc->adapter_mtu.

The commit message frames this rename as consolidation ("used at GDMA
device probe time"), but the mana_probe() call-site is not removed.  Was
one of the two call-sites meant to go away?

The relocation of debugfs_create_u16("adapter-MTU", ...) from
mana_gd_query_device_cfg() into mana_probe() only makes sense if the
function is now invoked more than once, which hints at the same
duplication.

> +
> +	if (!num_ports)
> +		return -EINVAL;
> +
> +	/*
> +	 * Adjust gc->max_num_queues returned from the SOC to allow dedicated
> +	 * MSIx for each vPort. Clamp to no less than MANA_DEF_NUM_QUEUES.
> +	 */
> +	max_num_queues = (gc->num_msix_usable - 1) / num_ports;
> +	max_num_queues = roundup_pow_of_two(max(max_num_queues, 1U));
> +	if (max_num_queues < MANA_DEF_NUM_QUEUES)
> +		max_num_queues = MANA_DEF_NUM_QUEUES;
> +
> +	/*
> +	 * Use dedicated MSIx for EQs whenever possible, use MSIx sharing for
> +	 * Ethernet EQs when (max_num_queues * num_ports > num_msix_usable - 1)
> +	 */
> +	max_num_queues = min(gc->max_num_queues, max_num_queues);
> +	if (max_num_queues * num_ports > gc->num_msix_usable - 1)
> +		gc->msi_sharing = true;

Is gc->msi_sharing ever reset to false?  The only two writers are this
line and mana_gd_setup_hwc_irqs() (the !pci_msix_can_alloc_dyn branch),
and both only set it to true.  mana_gd_remove_irqs() frees msi_bitmap
and zeros max_num_msix / num_msix_usable, but does not clear
msi_sharing, and the gdma_context survives mana_gd_suspend() /
mana_gd_resume().

Once true is latched in any setup cycle, mana_gd_setup() will always
take the sharing branch on resume even if the recomputed
max_num_queues * num_ports fits within num_msix_usable - 1.  Is that
consistent with the commit message saying sharing "is only enabled when
there are not enough MSI-X vectors for dedicated allocation"?

> +
> +	/* If MSI is shared, use max allowed value */
> +	if (gc->msi_sharing)
> +		gc->max_num_queues_vport = min(gc->num_msix_usable - 1, gc->max_num_queues);
> +	else
> +		gc->max_num_queues_vport = max_num_queues;
> +
> +	dev_info(gc->dev, "MSI sharing mode %d max queues %d\n",
> +		 gc->msi_sharing, gc->max_num_queues);

Should this print gc->max_num_queues_vport rather than
gc->max_num_queues?  The block immediately above computes
gc->max_num_queues_vport and leaves gc->max_num_queues unchanged, and
mana_probe_port() sizes alloc_etherdev_mq() and apc->max_queues from
gc->max_num_queues_vport:

	ndev = alloc_etherdev_mq(sizeof(struct mana_port_context),
				 gc->max_num_queues_vport);
	...
	apc->max_queues = gc->max_num_queues_vport;
	apc->num_queues = min(gc->max_num_queues_vport, MANA_DEF_NUM_QUEUES);

so the logged value does not match the per-vPort queue count the driver
actually exposes.
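
i.e. (untested):

	dev_info(gc->dev, "MSI sharing mode %d max queues per vPort %d\n",
		 gc->msi_sharing, gc->max_num_queues_vport);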

> +
>  	return 0;
>  }
>  
> @@ -1856,6 +1893,7 @@ static int mana_gd_setup_hwc_irqs(struct pci_dev *pdev)

[ ... ]

> @@ -1968,20 +2008,30 @@ static int mana_gd_setup(struct pci_dev *pdev)
>  	if (err)
>  		goto destroy_hwc;
>  
> -	err = mana_gd_query_max_resources(pdev);
> +	err = mana_gd_detect_devices(pdev);
>  	if (err)
>  		goto destroy_hwc;
>  
> -	err = mana_gd_setup_remaining_irqs(pdev);
> -	if (err) {
> -		dev_err(gc->dev, "Failed to setup remaining IRQs: %d", err);
> -		goto destroy_hwc;
> -	}
> -
> -	err = mana_gd_detect_devices(pdev);
> +	err = mana_gd_query_max_resources(pdev);
>  	if (err)
>  		goto destroy_hwc;
>  
> +	if (!gc->msi_sharing) {
> +		gc->msi_bitmap = bitmap_zalloc(gc->num_msix_usable, GFP_KERNEL);
> +		if (!gc->msi_bitmap) {
> +			err = -ENOMEM;
> +			goto destroy_hwc;
> +		}
> +		/* Set bit for HWC */
> +		set_bit(0, gc->msi_bitmap);
> +	} else {
> +		err = mana_gd_setup_remaining_irqs(pdev);
> +		if (err) {
> +			dev_err(gc->dev, "Failed to setup remaining IRQs: %d", err);
> +			goto destroy_hwc;
> +		}
> +	}
> +

Can the driver bring up any vPort after this change when the !msi_sharing
branch is taken?

In the dedicated branch, only gc->msi_bitmap is allocated and bit 0 is
set for HWC.  mana_gd_setup_remaining_irqs() is skipped, so no
gdma_irq_context is inserted into gc->irq_contexts for indices 1..
num_msix_usable-1.

Later, mana_create_eq() still assigns

	spec.eq.msix_index = (i + 1) % gc->num_msix_usable;

and mana_gd_register_irq() does:

	gic = xa_load(&gc->irq_contexts, msi_index);
	if (WARN_ON(!gic))
		return -EINVAL;

On a typical cloud SKU with, for example, num_msix_usable=32,
num_ports=1 and num_online_cpus=16, the new math keeps msi_sharing=false
(16 * 1 <= 31), so every EQ-create goes down this path and hits the
WARN_ON.  Doesn't that make every vPort open and every resume fail for
the common dedicated-MSI-X case?

The msi_bitmap allocated here is not consumed anywhere in this commit;
the on-demand allocation via mana_gd_get_gic() appears in the later
commit "net: mana: Allocate interrupt context for each EQ when creating
vPort" (dbbdf40a8974).  Should the bitmap and the new branch be
introduced in the same commit that actually uses them, so each commit in
the series is independently bootable?

>  	dev_dbg(&pdev->dev, "mana gdma setup successful\n");
>  	return 0;
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next v6 1/6] net: mana: Create separate EQs for each vPort
  2026-05-02 15:23   ` Simon Horman
@ 2026-05-02 15:29     ` Simon Horman
  2026-05-04 22:08       ` [EXTERNAL] " Long Li
  2026-05-04 22:07     ` Long Li
  1 sibling, 1 reply; 17+ messages in thread
From: Simon Horman @ 2026-05-02 15:29 UTC (permalink / raw)
  To: longli
  Cc: kotaranov, kuba, davem, pabeni, edumazet, andrew+netdev, jgg,
	leon, haiyangz, kys, wei.liu, decui, netdev, linux-rdma,
	linux-hyperv, linux-kernel

On Sat, May 02, 2026 at 04:23:55PM +0100, Simon Horman wrote:
> From: 'Simon Horman' <horms@kernel.org>
> 
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://sashiko.dev

Sorry about this, there was supposed to be some different text here.

This review is available at https://netdev-ai.bots.linux.dev/sashiko/
And I apologise that it overlaps with the review from https://sashiko.dev
which I also posted.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [EXTERNAL] Re: [PATCH net-next v6 1/6] net: mana: Create separate EQs for each vPort
  2026-05-02 15:07   ` Simon Horman
@ 2026-05-04 21:51     ` Long Li
  0 siblings, 0 replies; 17+ messages in thread
From: Long Li @ 2026-05-04 21:51 UTC (permalink / raw)
  To: Simon Horman
  Cc: Konstantin Taranov, kuba@kernel.org, davem@davemloft.net,
	pabeni@redhat.com, edumazet@google.com, andrew+netdev@lunn.ch,
	jgg@ziepe.ca, leon@kernel.org, Haiyang Zhang, KY Srinivasan,
	wei.liu@kernel.org, Dexuan Cui, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org

> From: 'Simon Horman' <horms@kernel.org>
>
> This is an AI-generated review of your patch. The human sending this email has
> considered the AI review valid, or at least plausible.
> Full review at: https://sashiko.dev
> ---
> net: mana: Create separate EQs for each vPort
>
> This commit modifies the mana driver to prepare for assigning vPorts to dedicated
> MSI-X vectors by removing EQ sharing among vPorts. It moves the EQ definition
> from the mana_context to the mana_port_context and exports EQ lifecycle
> functions for use by the MANA RDMA driver.
>
> > diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
> > index 645581359cee..6f1043383e8c 100644
> > --- a/drivers/infiniband/hw/mana/qp.c
> > +++ b/drivers/infiniband/hw/mana/qp.c
> > @@ -168,7 +168,15 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
> >             cq_spec.gdma_region = cq->queue.gdma_region;
> >             cq_spec.queue_size = cq->cqe * COMP_ENTRY_SIZE;
> >             cq_spec.modr_ctx_id = 0;
> > -           eq = &mpc->ac->eqs[cq->comp_vector];
> > +           /* EQs are created when a raw QP configures the vport.
> > +            * A raw QP must be created before creating rwq_ind_tbl.
> > +            */
> > +           if (!mpc->eqs) {
> > +                   ret = -EINVAL;
> > +                   i--;
> > +                   goto fail;
> > +           }
> > +           eq = &mpc->eqs[cq->comp_vector % mpc->num_queues];
>
> Is it possible for the EQs to be destroyed while this RSS QP is still actively using
> them?

  No. The EQ id is consumed only at creation time - it is passed to firmware as cq_spec.attached_eq during mana_create_wq_obj(). After that call the CQ-to-EQ binding lives entirely in firmware.
  The kernel never dereferences mpc->eqs again for that RSS QP's lifetime, so there is no ongoing kernel-side access to the EQ struct from an active RSS QP.

>
> If the EQs are created by the Ethernet interface being brought up, or by a RAW
> QP configuring the vport, this RSS QP will attach to them without incrementing
> pd->vport_use_count or taking any vport reference count.

  This is by design. The RSS QP (RX side) does not take a vport reference, and symmetrically mana_ib_destroy_qp_rss() does not release one either. Only raw QPs (SQ side) take and release vport
  references. The refcount is balanced: RSS QPs are pure consumers of an already-configured vport.

>
> If the Ethernet interface is subsequently brought down, or the RAW QP is
> destroyed, mana_destroy_eq() will be called, freeing the mpc->eqs array and
> destroying the underlying DMA regions while this RSS QP remains active. This
> regression could allow the hardware to DMA completion events into freed EQ
> memory.

  Destroying the raw QP also calls mana_uncfg_vport(), which deconfigures the vport entirely. After that, firmware will not route any traffic or generate completions on this vport, so there are no
  in-flight DMA writes to the EQ. This is the same pre-existing behavior: the raw QP has always been the vport lifecycle anchor, and destroying it while an RSS QP is active would have broken the
  vport regardless - this patch does not change that relationship. Before this patch the EQs simply outlived the vport (device lifetime vs vport lifetime), which masked the dependency but did not
  make the out-of-order teardown any safer at the vport level.

>
> Additionally, since mpc->eqs is accessed here without holding pd->vport_mutex,
> could a concurrent teardown of the EQs lead to a use-after-free when reading
> eq->eq->id?

  The RDMA core serializes QP creation and destruction on the same device context. A concurrent teardown would require destroying the raw QP on the same PD simultaneously with RSS QP creation,
  which the IB core does not permit. The !mpc->eqs check is a defensive guard against wrong call ordering (creating an RSS QP before any raw QP), not a synchronization point for concurrent access.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [EXTERNAL] Re: [PATCH net-next v6 1/6] net: mana: Create separate EQs for each vPort
  2026-05-02 15:23   ` Simon Horman
  2026-05-02 15:29     ` Simon Horman
@ 2026-05-04 22:07     ` Long Li
  1 sibling, 0 replies; 17+ messages in thread
From: Long Li @ 2026-05-04 22:07 UTC (permalink / raw)
  To: Simon Horman
  Cc: Konstantin Taranov, kuba@kernel.org, davem@davemloft.net,
	pabeni@redhat.com, edumazet@google.com, andrew+netdev@lunn.ch,
	jgg@ziepe.ca, leon@kernel.org, Haiyang Zhang, KY Srinivasan,
	wei.liu@kernel.org, Dexuan Cui, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org

> From: 'Simon Horman' <horms@kernel.org>
>
> This is an AI-generated review of your patch. The human sending this email has
> considered the AI review valid, or at least plausible.
> Full review at: https://sashiko.dev
> ---
> net: mana: Create separate EQs for each vPort
>
> This change moves EQ ownership from struct mana_context to struct
> mana_port_context so each vPort can have dedicated EQs, and exports
> mana_create_eq()/mana_destroy_eq() for use by the MANA RDMA driver.
>
> > diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
> > index ac5e75dd3494..60cc02e4ad10 100644
> > --- a/drivers/infiniband/hw/mana/main.c
> > +++ b/drivers/infiniband/hw/mana/main.c
> > @@ -20,8 +20,10 @@ void mana_ib_uncfg_vport(struct mana_ib_dev *dev, struct mana_ib_pd *pd,
> >     pd->vport_use_count--;
> >     WARN_ON(pd->vport_use_count < 0);
> >
> > -   if (!pd->vport_use_count)
> > +   if (!pd->vport_use_count) {
> > +           mana_destroy_eq(mpc);
> >             mana_uncfg_vport(mpc);
> > +   }
> >
> >     mutex_unlock(&pd->vport_mutex);
> >  }
> > @@ -55,15 +57,21 @@ int mana_ib_cfg_vport(struct mana_ib_dev *dev, u32 port, struct mana_ib_pd *pd,
> >             return err;
> >     }
> >
> > -   mutex_unlock(&pd->vport_mutex);
> >
> >     pd->tx_shortform_allowed = mpc->tx_shortform_allowed;
> >     pd->tx_vp_offset = mpc->tx_vp_offset;
> > +   err = mana_create_eq(mpc);
> > +   if (err) {
> > +           mana_uncfg_vport(mpc);
> > +           pd->vport_use_count--;
> > +   }
> > +
> > +   mutex_unlock(&pd->vport_mutex);
> >
> >     ibdev_dbg(&dev->ib_dev, "vport handle %llx pdid %x doorbell_id %x\n",
> >               mpc->port_handle, pd->pdn, doorbell_id);
>
> On the failure path, mana_uncfg_vport() is called and pd->vport_use_count is
> decremented, but execution falls through to the ibdev_dbg() line that prints the
> vport handle as if it were a success. Should this debug print be skipped on the
> error path so logs are not ambiguous?

  This is a fair point. The ibdev_dbg could be guarded with if (!err) for clarity. That said, ibdev_dbg is compiled out in non-debug builds and the error itself is returned to the caller, so this is
  cosmetic. Happy to add the guard if you feel strongly.
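
  Concretely, something like this (untested):

	if (!err)
		ibdev_dbg(&dev->ib_dev, "vport handle %llx pdid %x doorbell_id %x\n",
			  mpc->port_handle, pd->pdn, doorbell_id);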

>
> Also, pd->tx_shortform_allowed and pd->tx_vp_offset are assigned from mpc
> before the mana_create_eq() attempt and are not cleared when that call fails.
> Callers are expected to gate on pd->vport_use_count, but is it worth resetting
> these fields on failure so the state matches the pre-patch behavior?

  These fields are only consumed when pd->vport_use_count > 0. On failure the count is decremented back to 0, so the stale values are never read. The next successful mana_ib_cfg_vport() call will
  overwrite them. This matches pre-patch behavior - pre-patch also assigned these fields unconditionally and relied on the use count to gate access.

>
> The mutex_unlock(&pd->vport_mutex) has moved from before the field
> assignments to after mana_create_eq(). That now holds pd->vport_mutex across
> GFP_KERNEL allocations, debugfs directory creation, and multiple
> mana_gd_create_mana_eq() firmware RPCs, and establishes the lock order
> pd->vport_mutex -> apc->vport_mutex via mana_uncfg_vport() on the error
> path. Is the extended critical section intentional?

  Yes, this is intentional. Pre-patch, pd->vport_mutex was released before assigning pd->tx_shortform_allowed and pd->tx_vp_offset, which was a data race - a concurrent raw
  QP creation on the same PD could read partially-initialized state. The extended critical section ensures the vport is fully configured (including EQs) before any concurrent user can observe
  vport_use_count > 0.

>
> >
> > -   return 0;
> > +   return err;
> >  }
> >
> >  int mana_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
> > diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
> > index 645581359cee..6f1043383e8c 100644
> > --- a/drivers/infiniband/hw/mana/qp.c
> > +++ b/drivers/infiniband/hw/mana/qp.c
> > @@ -168,7 +168,15 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
> >             cq_spec.gdma_region = cq->queue.gdma_region;
> >             cq_spec.queue_size = cq->cqe * COMP_ENTRY_SIZE;
> >             cq_spec.modr_ctx_id = 0;
> > -           eq = &mpc->ac->eqs[cq->comp_vector];
> > +           /* EQs are created when a raw QP configures the vport.
> > +            * A raw QP must be created before creating rwq_ind_tbl.
> > +            */
> > +           if (!mpc->eqs) {
> > +                   ret = -EINVAL;
> > +                   i--;
> > +                   goto fail;
> > +           }
> > +           eq = &mpc->eqs[cq->comp_vector % mpc->num_queues];
>
> Can the mpc->eqs read race with a concurrent free? mana_ib_create_qp_rss()
> runs without pd->vport_mutex or RTNL, but mpc->eqs is freed by
> mana_destroy_eq() from two paths:
>
>   mana_ib_uncfg_vport()   (under pd->vport_mutex, on last raw-QP destroy)
>   mana_dealloc_queues()   (under RTNL, on netdev down)
>
> both of which do:
>
>   kfree(apc->eqs);
>   apc->eqs = NULL;
>
> with no RCU grace period or reader-visible synchronization. If CPU-A passes
> the !mpc->eqs check after CPU-B begins ip link set dev X down, does CPU-A then
> dereference freed memory via mpc->eqs[...].eq->id?

  These two paths cannot run concurrently with RDMA QP creation on the same port. mana_cfg_vport() enforces mutual exclusion between Ethernet and RDMA via apc->vport_use_count:

       mutex_lock(&apc->vport_mutex);
       if (apc->vport_use_count > 0) {
               mutex_unlock(&apc->vport_mutex);
               return -EBUSY;
       }
       apc->vport_use_count++;

  If RDMA holds the vport (created a raw QP), Ethernet cannot bring the interface up, so mana_dealloc_queues() cannot run. If Ethernet holds the vport, mana_ib_cfg_vport() → mana_cfg_vport()
  returns -EBUSY and no RDMA raw QP is created, so mpc->eqs belongs exclusively to Ethernet.

  The mana_ib_uncfg_vport() path requires destroying the last raw QP on the PD, which means no new RDMA QPs should be in flight on that PD. The IB core serializes QP creation/destruction on the
  same device context.

>
> Separately, what populates mpc->eqs for an RDMA-only RSS QP user on a probed-
> but-idle Ethernet port? Pre-patch mana_probe() called
> mana_create_eq(ac) unconditionally, so ac->eqs existed for the device lifetime.
> After this patch the only creators are mana_alloc_queues() (Ethernet up) and
> mana_ib_cfg_vport() (raw QP). An RDMA application that uses only RSS QPs and
> never creates a raw QP will now get -EINVAL here where it used to succeed. Is
> this intended, and should the commit log mention it?

  This is intentional. An RSS-only RDMA application (no raw QP) could never have worked in practice: mana_ib_create_qp_rss() calls mana_create_wq_obj(mpc, mpc->port_handle, ...) which requires the
  vport to be configured. Without a raw QP calling mana_ib_cfg_vport() → mana_cfg_vport(), mpc->port_handle is INVALID_MANA_HANDLE and the firmware call would fail. The -EINVAL is a cleaner early
  error for a path that was already broken.

>
> The adjacent comment:
>
>    /* EQs are created when a raw QP configures the vport.
>     * A raw QP must be created before creating rwq_ind_tbl.
>     */
>
> omits the Ethernet-up path that also populates mpc->eqs. Would it be clearer to
> describe both creators?

  The comment is in the RDMA code path. Due to the mana_cfg_vport() mutual exclusion described above, when RDMA is executing this code, it owns the vport - so the EQs were created by the raw QP,
  not by Ethernet. The Ethernet path is not reachable when RDMA holds the port. The comment is accurate for the context it appears in.

>
> There is also a behavior change in the index expression:
>
>    eq = &mpc->eqs[cq->comp_vector % mpc->num_queues];
>
> Pre-patch this was ac->eqs[cq->comp_vector] sized by gc->max_num_queues.
> Now comp_vector is folded modulo mpc->num_queues, which is tunable via
> ethtool -L. Userspace that used distinct comp_vector values to hit distinct EQs
> will silently alias when comp_vector >= num_queues. Should this be documented
> or rejected with -EINVAL rather than silently wrapped?

  The modulo wrap is the correct behavior. The EQ array is now sized to mpc->num_queues (the actual configured queue count), not the hardware maximum. Rejecting with -EINVAL would break
  applications that set comp_vector based on the device's total num_comp_vectors - which is the standard RDMA practice. Wrapping is what other drivers do in this situation (e.g., mlx5). Pre-patch
  the wrap just happened to be unnecessary because the array was oversized to the hardware max.

>
> Can mpc->num_queues be 0 at this point? mana_set_channels() does not reject
> new_count==0, and kzalloc_objs(struct mana_eq, 0) returns ZERO_SIZE_PTR,
> which passes the !mpc->eqs check. During the window between
> mana_create_eq() and the subsequent netif_set_real_num_tx_queues() failing, a
> concurrent RDMA QP create would compute
> cq->comp_vector % 0 here. Should mpc->num_queues be validated alongside
> mpc->eqs?

  num_queues cannot be 0. The ethtool core validates this before calling the driver - ethnl_set_channels() rejects the request if combined_count is 0 and there are no separate rx/tx channels
  (which MANA doesn't use). Additionally, num_queues is initialized at probe time to min(gc->max_num_queues_vport, MANA_DEF_NUM_QUEUES) where MANA_DEF_NUM_QUEUES is 16 and the hardware always
  reports at least 1 queue.

>
> The placement of the !mpc->eqs check is inside the per-iteration loop over
> ind_tbl_size, but mpc->eqs cannot change across iterations, so the check is only
> meaningful at i==0. It works today because i-- then makes i = -1 and the cleanup
> while (i-- > 0) skips. Would hoisting the check above the loop be clearer and less
> fragile against refactoring?
>
> >             cq_spec.attached_eq = eq->eq->id;
> >
> >             ret = mana_create_wq_obj(mpc, mpc->port_handle, GDMA_RQ,
> > @@ -317,7 +325,11 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp, struct ib_pd *ibpd,
> >     cq_spec.queue_size = send_cq->cqe * COMP_ENTRY_SIZE;
> >     cq_spec.modr_ctx_id = 0;
> >     eq_vec = send_cq->comp_vector;
> > -   eq = &mpc->ac->eqs[eq_vec];
> > +   if (!mpc->eqs) {
> > +           err = -EINVAL;
> > +           goto err_destroy_queue;
> > +   }
> > +   eq = &mpc->eqs[eq_vec % mpc->num_queues];
>
> The same mpc->eqs TOCTOU concern applies here: the check and subsequent
> dereference occur without pd->vport_mutex or RTNL held, so a concurrent
> mana_dealloc_queues() on the Ethernet side can free mpc->eqs between the
> check and the index access. Is there synchronization that prevents this?

  Same answer as above: mana_cfg_vport() mutual exclusion prevents this. In mana_ib_create_qp_raw(), mana_ib_cfg_vport() is called just before this code, which calls mana_cfg_vport(). If Ethernet
  holds the port, that call returns -EBUSY and we never reach the mpc->eqs access. If we reach it, RDMA owns the vport and Ethernet cannot tear it down.

>
> The same comp_vector % num_queues aliasing and num_queues==0 divide
> concerns apply here as well.
>
> >     cq_spec.attached_eq = eq->eq->id;
> >
> >     err = mana_create_wq_obj(mpc, mpc->port_handle, GDMA_SQ, &wq_spec,
> > diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > index a654b3699c4c..6c709f8b875d 100644
> > --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
>
> [ ... ]
>
> > -static int mana_create_eq(struct mana_context *ac)
> > +int mana_create_eq(struct mana_port_context *apc)
> >  {
> > -   struct gdma_dev *gd = ac->gdma_dev;
> > +   struct gdma_dev *gd = apc->ac->gdma_dev;
> >     struct gdma_context *gc = gd->gdma_context;
> >     struct gdma_queue_spec spec = {};
> >     int err;
> >     int i;
> >
> > -   ac->eqs = kzalloc_objs(struct mana_eq, gc->max_num_queues);
> > -   if (!ac->eqs)
> > +   WARN_ON(apc->eqs);
> > +   apc->eqs = kzalloc_objs(struct mana_eq, apc->num_queues);
> > +   if (!apc->eqs)
> >             return -ENOMEM;
>
> WARN_ON(apc->eqs) is used as a defensive assertion but execution proceeds to
> overwrite apc->eqs unconditionally. If the invariant is violated, the previous array
> and all HW EQs it referenced are leaked.
> Should this instead bail out with an error or call mana_destroy_eq() first to keep
> the function idempotent against future callers?


  The WARN_ON is a development assertion to catch double-init bugs. If it fires, there is already a logic error in the caller. Silently cleaning up and proceeding would mask the root cause. The
  standard kernel pattern for "this must never happen" is WARN_ON + continue, not silent recovery. If we wanted to be stricter, returning -EBUSY would be reasonable, but the current approach is
  consistent with kernel convention for invariant checks.

>
> [ ... ]
>
> > @@ -3326,6 +3349,9 @@ static int mana_dealloc_queues(struct net_device *ndev)
> >     mana_fence_rqs(apc);
> >
> >     /* Even in err case, still need to cleanup the vPort */
> > +   mana_destroy_rxqs(apc);
> > +   mana_destroy_txq(apc);
> > +   mana_destroy_eq(apc);
> >     mana_destroy_vport(apc);
>
> What is the intended interaction with outstanding RDMA QPs here? An RDMA
> user that created an RSS QP while the Ethernet netdev was up sees
> mpc->eqs populated by mana_alloc_queues(). When the admin issues
> ip link set dev N down, mana_dealloc_queues() -> mana_destroy_eq() destroys
> those EQs and frees mpc->eqs while the RDMA QPs are still alive, leaving the QPs
> with dangling attached_eq IDs at the hardware level and stale kernel references.

  This scenario cannot occur. As described above, mana_cfg_vport() enforces mutual exclusion - Ethernet and RDMA cannot hold the same port simultaneously. If Ethernet is up, RDMA raw QP creation
  on that port returns -EBUSY. If RDMA holds the port, mana_alloc_queues() → mana_create_vport() → mana_cfg_vport() returns -EBUSY and the interface fails to come up. There is no state where both
  have created EQs on the same mpc.

>
> Pre-patch ac->eqs lived for the full mana_context lifetime and was torn down
> only in mana_remove(). Is unconditionally destroying the EQs on netdev-down
> the intended new behavior, and if so how are concurrent RDMA consumers
> expected to recover?

Yes, destroying EQs on netdev-down is the intended new behavior, and no RDMA recovery path is needed because the scenario has no concurrent RDMA consumers.

mana_cfg_vport() enforces mutual exclusion at the hardware port level via apc->vport_use_count - it returns -EBUSY if the port is already held. Ethernet and RDMA cannot hold the same port
simultaneously:

- If Ethernet is up → RDMA raw QP creation fails at mana_cfg_vport() with -EBUSY → no RDMA EQs or QPs exist on that port → netdev-down destroys only Ethernet's own EQs.
- If RDMA holds the port → mana_alloc_queues() → mana_create_vport() → mana_cfg_vport() returns -EBUSY → interface never comes up → mana_dealloc_queues() never runs.

Pre-patch, the device-lifetime EQs in ac->eqs were shared across all ports and both subsystems, which masked this exclusivity - the EQs were always present regardless of who owned the port.
Post-patch, each port owns its EQs, and they follow the lifecycle of whoever holds the port. The exclusion guarantee means there is nothing to recover from.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [EXTERNAL] Re: [PATCH net-next v6 1/6] net: mana: Create separate EQs for each vPort
  2026-05-02 15:29     ` Simon Horman
@ 2026-05-04 22:08       ` Long Li
  0 siblings, 0 replies; 17+ messages in thread
From: Long Li @ 2026-05-04 22:08 UTC (permalink / raw)
  To: Simon Horman
  Cc: Konstantin Taranov, kuba@kernel.org, davem@davemloft.net,
	pabeni@redhat.com, edumazet@google.com, andrew+netdev@lunn.ch,
	jgg@ziepe.ca, leon@kernel.org, Haiyang Zhang, KY Srinivasan,
	wei.liu@kernel.org, Dexuan Cui, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org

> On Sat, May 02, 2026 at 04:23:55PM +0100, Simon Horman wrote:
> > From: 'Simon Horman' <horms@kernel.org>
> >
> > This is an AI-generated review of your patch. The human sending this
> > email has considered the AI review valid, or at least plausible.
> > Full review at: https://sashiko.dev
>
> Sorry about this, there was supposed to be some different text here.

I have replied to both comments.

Thank you,
Long

>
> This review is available at https://netdev-ai.bots.linux.dev/sashiko/
> And I apologise that it overlaps with the review from https://sashiko.dev
> which I also posted.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [EXTERNAL] Re: [PATCH net-next v6 2/6] net: mana: Query device capabilities and configure MSI-X sharing for EQs
  2026-05-02 15:08   ` Simon Horman
@ 2026-05-04 22:21     ` Long Li
  0 siblings, 0 replies; 17+ messages in thread
From: Long Li @ 2026-05-04 22:21 UTC (permalink / raw)
  To: Simon Horman
  Cc: Konstantin Taranov, kuba@kernel.org, davem@davemloft.net,
	pabeni@redhat.com, edumazet@google.com, andrew+netdev@lunn.ch,
	jgg@ziepe.ca, leon@kernel.org, Haiyang Zhang, KY Srinivasan,
	wei.liu@kernel.org, Dexuan Cui, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org



> -----Original Message-----
> From: Simon Horman <horms@kernel.org>
> Sent: Saturday, May 2, 2026 8:09 AM
> To: Long Li <longli@microsoft.com>
> Cc: 'Simon Horman' <horms@kernel.org>; Konstantin Taranov
> <kotaranov@microsoft.com>; kuba@kernel.org; davem@davemloft.net;
> pabeni@redhat.com; edumazet@google.com; andrew+netdev@lunn.ch;
> jgg@ziepe.ca; leon@kernel.org; Haiyang Zhang <haiyangz@microsoft.com>; KY
> Srinivasan <kys@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
> <DECUI@microsoft.com>; netdev@vger.kernel.org; linux-rdma@vger.kernel.org;
> linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [EXTERNAL] Re: [PATCH net-next v6 2/6] net: mana: Query device
> capabilities and configure MSI-X sharing for EQs
>
> From: 'Simon Horman' <horms@kernel.org>
>
> This is an AI-generated review of your patch. The human sending this email has
> considered the AI review valid, or at least plausible.
> Full review at: https://sashiko.dev
> ---
> net: mana: Query device capabilities and configure MSI-X sharing for EQs
>
> When querying the device, this adjusts the max number of queues to allow
> dedicated MSI-X vectors for each vPort, clamping to no less than
> MANA_DEF_NUM_QUEUES. MSI-X sharing among vPorts is disabled by default
> and enabled only when there are not enough vectors. It also renames
> mana_query_device_cfg() to mana_gd_query_device_cfg().
>
> > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
>
> [ ... ]
>
> > +   /*
> > +    * Adjust gc->max_num_queues returned from the SOC to allow dedicated
> > +    * MSIx for each vPort. Clamp to no less than MANA_DEF_NUM_QUEUES.
> > +    */
> > +   max_num_queues = (gc->num_msix_usable - 1) / num_ports;
> > +   max_num_queues = roundup_pow_of_two(max(max_num_queues, 1U));
>
> Does rounding up here inflate max_num_queues and unnecessarily force MSI-X
> sharing?
>
> For example, if there are 63 usable MSIs and 2 ports, the division yields 31.
> Rounding up gives 32. The subsequent check below:
>
> > +   /*
> > +    * Use dedicated MSIx for EQs whenever possible, use MSIx sharing for
> > +    * Ethernet EQs when (max_num_queues * num_ports > num_msix_usable - 1)
> > +    */
> > +   max_num_queues = min(gc->max_num_queues, max_num_queues);
> > +   if (max_num_queues * num_ports > gc->num_msix_usable - 1)
> > +           gc->msi_sharing = true;
>
> would then evaluate to true (32 * 2 > 63) and force the driver into MSI-X shared
> mode. This seems to contradict the intent to use dedicated MSI-X whenever
> possible.
>
> Would it be better to use rounddown_pow_of_two() instead to ensure the
> calculated queues fit within the available dedicated vectors?

  The roundup is intentional. The primary goal is to maximize the number of queues per vPort for throughput - power-of-2 queue counts align with RSS indirection table sizes and give the best
  distribution. Dedicated MSI-X is preferred when it comes for free, but not at the cost of halving the queue count.

  In your example (63 MSIs, 2 ports), rounddown would give 16 queues per port - nearly half the 31 that the hardware could support. Roundup gives 32 and triggers sharing mode, but sharing mode is
  designed to work correctly and the queues still get serviced. The driver trades per-EQ interrupt exclusivity for better parallelism, which is the right tradeoff for network throughput on these
  devices.

  The sharing check exists as a fallback for exactly this situation - it is not a failure mode but a deliberate operating point.

>
> [ ... ]
>
> > @@ -1968,20 +2008,30 @@ static int mana_gd_setup(struct pci_dev *pdev)
> >     if (err)
> >             goto destroy_hwc;
> >
> > -   err = mana_gd_query_max_resources(pdev);
> > +   err = mana_gd_detect_devices(pdev);
> >     if (err)
> >             goto destroy_hwc;
> >
> > -   err = mana_gd_setup_remaining_irqs(pdev);
> > -   if (err) {
> > -           dev_err(gc->dev, "Failed to setup remaining IRQs: %d", err);
> > -           goto destroy_hwc;
> > -   }
> > -
> > -   err = mana_gd_detect_devices(pdev);
> > +   err = mana_gd_query_max_resources(pdev);
> >     if (err)
> >             goto destroy_hwc;
> >
> > +   if (!gc->msi_sharing) {
> > +           gc->msi_bitmap = bitmap_zalloc(gc->num_msix_usable, GFP_KERNEL);
> > +           if (!gc->msi_bitmap) {
> > +                   err = -ENOMEM;
> > +                   goto destroy_hwc;
> > +           }
> > +           /* Set bit for HWC */
> > +           set_bit(0, gc->msi_bitmap);
> > +   } else {
> > +           err = mana_gd_setup_remaining_irqs(pdev);
> > +           if (err) {
> > +                   dev_err(gc->dev, "Failed to setup remaining IRQs: %d",
> err);
> > +                   goto destroy_hwc;
> > +           }
> > +   }
>
> If msi_sharing is disabled, we allocate the msi_bitmap but skip calling
> mana_gd_setup_remaining_irqs().
>
> Since mana_gd_setup_hwc_irqs() only allocates a single vector for the hardware
> channel when dynamic allocation is supported, does this leave the device without
> interrupts for its Ethernet queues?
>
> If so, it seems this could lead to queue creation failures when the driver attempts
> to map uninitialized vectors. I notice this is fixed in a later patch in the series
> ("net: mana: Allocate interrupt context for each EQ when creating vPort"), but
> does leaving it out here break bisectability?

  You're right that with pci_msix_can_alloc_dyn() returning true on MANA, the non-sharing path is reachable between patches 2 and 5. In that window, mana_gd_register_irq() would fail with -EINVAL
  because irq_contexts for indices 1+ haven't been populated yet.

  However, the failure is contained: mana_create_eq() returns an error, mana_alloc_queues() propagates it, and the interface simply fails to come up - one WARN_ON splat from
  mana_gd_register_irq(), but no crash and no data corruption. The driver remains in a consistent state and succeeds once the full series is applied.

  This is a new capability being built up across the series. The dedicated MSI-X mode did not exist before, so there is no regression from the pre-patch baseline - the pre-patch code always went
  through mana_gd_setup_remaining_irqs() and operated in what is now called sharing mode. Restructuring the series to make non-sharing mode functional at each intermediate commit would require
  squashing the GIC infrastructure (patches 3-4) into this patch, producing a single large change that is significantly harder to review.

  I'd prefer to keep the logical separation as-is. If you feel strongly about strict bisectability, I could add a fallback in this patch that forces msi_sharing = true when the GIC allocator is
  not yet available, and have patch 5 remove it - but that adds throwaway code to an intermediate commit.
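
  For reference, that throwaway fallback would be a one-liner after the sharing computation (untested sketch), deleted again by the patch that adds mana_gd_get_gic():

	/* Interim: dedicated mode has no per-EQ IRQ allocator yet, force sharing */
	gc->msi_sharing = true;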

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [EXTERNAL] Re: [PATCH net-next v6 2/6] net: mana: Query device capabilities and configure MSI-X sharing for EQs
  2026-05-02 15:26   ` Simon Horman
@ 2026-05-04 22:30     ` Long Li
  0 siblings, 0 replies; 17+ messages in thread
From: Long Li @ 2026-05-04 22:30 UTC (permalink / raw)
  To: Simon Horman
  Cc: Konstantin Taranov, kuba@kernel.org, davem@davemloft.net,
	pabeni@redhat.com, edumazet@google.com, andrew+netdev@lunn.ch,
	jgg@ziepe.ca, leon@kernel.org, Haiyang Zhang, KY Srinivasan,
	wei.liu@kernel.org, Dexuan Cui, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org

> From: 'Simon Horman' <horms@kernel.org>
> 
> This is an AI-generated review of your patch. The human sending this email has
> considered the AI review valid, or at least plausible.
> Full review at: https://netdev-ai.bots.linux.dev/sashiko/
> 
> I apologise for any overlap with this and the review of this patchset available at
> sashiko.dev. They both seem relevant and I'm unsure how to cleanly pass them
> both on.
> 
> Both Sashiko instances also flagged concerns in the remaining patches of this
> patchset. But I am not forwarding them as they seem to be lower priority, and I
> think I've already created enough noise here.
> 
> ---
> net: mana: Query device capabilities and configure MSI-X sharing for EQs
> 
> This patch queries device capabilities early in GDMA probe and chooses between
> dedicated MSI-X per vPort or a shared-MSI-X mode based on the available vector
> count, introducing gc->max_num_queues_vport,
> gc->msi_sharing and gc->msi_bitmap, and renaming mana_query_device_cfg()
> to mana_gd_query_device_cfg() so it can be called from the GDMA layer.
> 
> > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > index 098fbda0d128..b96859e0aec9 100644
> > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > @@ -149,6 +149,9 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
> >  	struct gdma_context *gc = pci_get_drvdata(pdev);
> >  	struct gdma_query_max_resources_resp resp = {};
> >  	struct gdma_general_req req = {};
> > +	unsigned int max_num_queues;
> > +	u8 bm_hostmode;
> > +	u16 num_ports;
> >  	int err;
> >
> >  	mana_gd_init_req_hdr(&req.hdr, GDMA_QUERY_MAX_RESOURCES,
> > @@ -194,6 +197,40 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
> >  	if (gc->max_num_queues > gc->num_msix_usable - 1)
> >  		gc->max_num_queues = gc->num_msix_usable - 1;
> >
> > +	err = mana_gd_query_device_cfg(gc, MANA_MAJOR_VERSION, MANA_MINOR_VERSION,
> > +				       MANA_MICRO_VERSION, &num_ports, &bm_hostmode);
> > +	if (err)
> > +		return err;
> 
> Is this the same MANA_QUERY_DEV_CONFIG that mana_probe() issues shortly
> after?  mana_probe() still contains:
> 
> 	err = mana_gd_query_device_cfg(gc, MANA_MAJOR_VERSION, MANA_MINOR_VERSION,
> 				       MANA_MICRO_VERSION, &num_ports, &bm_hostmode);
> 
> so the same HWC round-trip runs twice on every probe with identical arguments,
> both populating gc->adapter_mtu.
> 
> The commit message frames this rename as consolidation ("used at GDMA device
> probe time"), but the mana_probe() call-site is not removed.  Was one of the two
> call-sites meant to go away?
> 
> The relocation of debugfs_create_u16("adapter-MTU", ...) from
> mana_gd_query_device_cfg() into mana_probe() only makes sense if the function
> is now invoked more than once, which hints at the same duplication.

  The two calls are intentional and serve different layers. The call in mana_gd_query_max_resources() runs at the GDMA layer during device setup - it needs num_ports to compute per-vPort MSI-X
  allocation before any MANA-level structures exist. The call in mana_probe() runs at the MANA layer and consumes the results differently: it populates ac->num_ports, ac->bm_hostmode, and on
  resume validates that the port count hasn't changed.

  The HWC round-trip is lightweight (a single command/response exchange), and the function is renamed to mana_gd_ precisely because it now has callers at both layers. The debugfs move is the same
  reason - the debugfs node should only be created once, so it moves to mana_probe() which runs once, rather than staying in the function that is now called from both sites.

> 
> > +
> > +	if (!num_ports)
> > +		return -EINVAL;
> > +
> > +	/*
> > +	 * Adjust gc->max_num_queues returned from the SOC to allow dedicated
> > +	 * MSIx for each vPort. Clamp to no less than MANA_DEF_NUM_QUEUES.
> > +	 */
> > +	max_num_queues = (gc->num_msix_usable - 1) / num_ports;
> > +	max_num_queues = roundup_pow_of_two(max(max_num_queues, 1U));
> > +	if (max_num_queues < MANA_DEF_NUM_QUEUES)
> > +		max_num_queues = MANA_DEF_NUM_QUEUES;
> > +
> > +	/*
> > +	 * Use dedicated MSIx for EQs whenever possible, use MSIx sharing for
> > +	 * Ethernet EQs when (max_num_queues * num_ports > num_msix_usable - 1)
> > +	 */
> > +	max_num_queues = min(gc->max_num_queues, max_num_queues);
> > +	if (max_num_queues * num_ports > gc->num_msix_usable - 1)
> > +		gc->msi_sharing = true;
> 
> Is gc->msi_sharing ever reset to false?  The only two writers are this line and
> mana_gd_setup_hwc_irqs() (the !pci_msix_can_alloc_dyn branch), and both only
> set it to true.  mana_gd_remove_irqs() frees msi_bitmap and zeros
> max_num_msix / num_msix_usable, but does not clear msi_sharing, and the
> gdma_context survives mana_gd_suspend() / mana_gd_resume().
> 
> Once true is latched in any setup cycle, mana_gd_setup() will always take the
> sharing branch on resume even if the recomputed max_num_queues * num_ports
> fits within num_msix_usable - 1.  Is that consistent with the commit message
> saying sharing "is only enabled when there are not enough MSI-X vectors for
> dedicated allocation"?

  msi_sharing is one-directional by design, and that is not a problem in
  practice. The inputs to the sharing computation - num_msix_usable,
  num_ports, and the hardware queue limits - are properties of the
  device and do not change across suspend/resume cycles. The
  gdma_context is kzalloc'd at probe time, so msi_sharing starts out
  false, and on any given hardware configuration the computation in
  mana_gd_query_max_resources() always produces the same result.

  On resume, mana_gd_setup_hwc_irqs() re-evaluates dynamic MSI-X support
  and mana_gd_query_max_resources() recomputes the sharing check from
  the same hardware parameters. Since those parameters do not change,
  the recomputation would latch the flag to true again anyway, so the
  retained value always matches what a fresh computation would produce.
  An explicit reset would be defensive but not functionally necessary.
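
  If we did want the defensive variant, it would be a small change at
  the top of the recomputation - a sketch only, not part of this series:

	/* Recompute sharing from scratch on every setup cycle instead of
	 * latching the answer from a previous cycle.
	 */
	gc->msi_sharing = false;
	if (max_num_queues * num_ports > gc->num_msix_usable - 1)
		gc->msi_sharing = true;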

> 
> > +
> > +	/* If MSI is shared, use max allowed value */
> > +	if (gc->msi_sharing)
> > +		gc->max_num_queues_vport = min(gc->num_msix_usable - 1, gc->max_num_queues);
> > +	else
> > +		gc->max_num_queues_vport = max_num_queues;
> > +
> > +	dev_info(gc->dev, "MSI sharing mode %d max queues %d\n",
> > +		 gc->msi_sharing, gc->max_num_queues);
> 
> Should this print gc->max_num_queues_vport rather than
> gc->max_num_queues?  The block immediately above computes
> gc->max_num_queues_vport and leaves gc->max_num_queues unchanged, and
> mana_probe_port() sizes alloc_etherdev_mq() and apc->max_queues from
> gc->max_num_queues_vport:
> 
> 	ndev = alloc_etherdev_mq(sizeof(struct mana_port_context),
> 				 gc->max_num_queues_vport);
> 	...
> 	apc->max_queues = gc->max_num_queues_vport;
> 	apc->num_queues = min(gc->max_num_queues_vport, MANA_DEF_NUM_QUEUES);
> 
> so the logged value does not match the per-vPort queue count the driver actually
> exposes.

  The printed value is intentional. gc->max_num_queues is the GDMA
  device-level maximum, which is consumed by both the Ethernet and RDMA
  drivers - the RDMA driver uses it to size its queue resources.
  Printing max_num_queues_vport here would reflect only the Ethernet
  per-vPort limit, which would be misleading in a device-level log
  message. The per-vPort value is already visible through ethtool
  channel reporting.
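
  That said, if having both levels in the log helps, extending the
  message is cheap - the format string below is illustrative only:

	dev_info(gc->dev,
		 "MSI sharing mode %d device max queues %d vPort max queues %d\n",
		 gc->msi_sharing, gc->max_num_queues,
		 gc->max_num_queues_vport);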

> 
> > +
> >  	return 0;
> >  }
> >
> > @@ -1856,6 +1893,7 @@ static int mana_gd_setup_hwc_irqs(struct pci_dev *pdev)
> 
> [ ... ]
> 
> > @@ -1968,20 +2008,30 @@ static int mana_gd_setup(struct pci_dev *pdev)
> >  	if (err)
> >  		goto destroy_hwc;
> >
> > -	err = mana_gd_query_max_resources(pdev);
> > +	err = mana_gd_detect_devices(pdev);
> >  	if (err)
> >  		goto destroy_hwc;
> >
> > -	err = mana_gd_setup_remaining_irqs(pdev);
> > -	if (err) {
> > -		dev_err(gc->dev, "Failed to setup remaining IRQs: %d", err);
> > -		goto destroy_hwc;
> > -	}
> > -
> > -	err = mana_gd_detect_devices(pdev);
> > +	err = mana_gd_query_max_resources(pdev);
> >  	if (err)
> >  		goto destroy_hwc;
> >
> > +	if (!gc->msi_sharing) {
> > +		gc->msi_bitmap = bitmap_zalloc(gc->num_msix_usable, GFP_KERNEL);
> > +		if (!gc->msi_bitmap) {
> > +			err = -ENOMEM;
> > +			goto destroy_hwc;
> > +		}
> > +		/* Set bit for HWC */
> > +		set_bit(0, gc->msi_bitmap);
> > +	} else {
> > +		err = mana_gd_setup_remaining_irqs(pdev);
> > +		if (err) {
> > +			dev_err(gc->dev, "Failed to setup remaining IRQs: %d", err);
> > +			goto destroy_hwc;
> > +		}
> > +	}
> > +
> 
> Can the driver bring up any vPort after this change when the !msi_sharing branch
> is taken?
> 
> In the dedicated branch, only gc->msi_bitmap is allocated and bit 0 is set for
> HWC.  mana_gd_setup_remaining_irqs() is skipped, so no gdma_irq_context is
> inserted into gc->irq_contexts for indices 1..num_msix_usable-1.
> 
> Later, mana_create_eq() still assigns
> 
> 	spec.eq.msix_index = (i + 1) % gc->num_msix_usable;
> 
> and mana_gd_register_irq() does:
> 
> 	gic = xa_load(&gc->irq_contexts, msi_index);
> 	if (WARN_ON(!gic))
> 		return -EINVAL;
> 
> On a typical cloud SKU with, for example, num_msix_usable=32,
> num_ports=1 and num_online_cpus=16, the new math keeps msi_sharing=false
> (16 * 1 <= 31), so every EQ-create goes down this path and hits the WARN_ON.
> Doesn't that make every vPort open and every resume fail for the common
> dedicated-MSI-X case?
> 
> The msi_bitmap allocated here is not consumed anywhere in this commit; the
> on-demand allocation via mana_gd_get_gic() appears in the later commit "net:
> mana: Allocate interrupt context for each EQ when creating vPort"
> (dbbdf40a8974).  Should the bitmap and the new branch be introduced in the
> same commit that actually uses them, so each commit in the series is
> independently bootable?

  You're right that the non-sharing EQ creation path is not fully
  functional until patch 5 wires mana_gd_get_gic() into mana_create_eq().
  However, this is a new capability built up incrementally: patch 2
  introduces the decision framework and the bitmap, patch 3 adds the GIC
  infrastructure, patch 4 converts the global IRQ setup to use it, and
  patch 5 integrates it into per-vPort EQ creation.

  The intermediate state between patches 2 and 5 results in a clean
  error (-EINVAL from mana_gd_register_irq()), not a crash or data
  corruption. Dedicated MSI-X mode is a new feature that did not exist
  before this series, so there is no regression against the pre-patch
  baseline. Restructuring so that each intermediate commit is fully
  functional would require squashing the GIC infrastructure into this
  patch, producing a significantly larger and harder-to-review change.
  I'd prefer to keep the logical separation as-is.
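
  If the intermediate WARN_ON is a concern for bisection, I could add a
  throwaway guard to this patch and remove it again in patch 5 - a
  sketch placed in mana_create_eq(), assuming nothing about the final
  GIC API:

	/* Transitional: dedicated-MSI-X EQs cannot be wired up until the
	 * GIC get/put helpers land in a later patch, so reject vPort EQ
	 * creation explicitly instead of tripping the WARN_ON.
	 */
	if (!gc->msi_sharing)
		return -EOPNOTSUPP;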
