* [PATCH net-next v12 1/6] net: mana: Create separate EQs for each vPort
2026-06-05 0:57 [PATCH net-next v12 0/6] net: mana: Per-vPort EQ and MSI-X management Long Li
@ 2026-06-05 0:57 ` Long Li
2026-06-05 0:57 ` [PATCH net-next v12 2/6] net: mana: Query device capabilities and configure MSI-X sharing for EQs Long Li
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Long Li @ 2026-06-05 0:57 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui, shradhagupta
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
To prepare for assigning vPorts to dedicated MSI-X vectors, remove EQ
sharing among the vPorts and create dedicated EQs for each vPort.
Move the EQ definition from struct mana_context to struct mana_port_context
and update related support functions. Export mana_create_eq() and
mana_destroy_eq() for use by the MANA RDMA driver.
RSS QPs now take a vport reference via pd->vport_use_count to ensure
EQs outlive all QP consumers. The vport must already be configured by
a raw QP before an RSS QP can be created. EQs are only destroyed when
the last QP (raw or RSS) on the PD releases its reference.
Restrict each vport to a single RSS QP. The hardware only supports one
steering configuration (indirection table / hash key) per vport, and
mana_disable_vport_rx() on QP destroy disables RX globally for the
vport. Previously, creating a second RSS QP would silently overwrite
the first QP's steering config and destroy would blackhole all traffic.
This is now explicitly rejected with -EBUSY. Existing applications
(DPDK being the primary RDMA consumer) always create one RSS QP per
vport, so no real-world flows are affected.
Reject cross-port PD sharing for both raw and RSS QPs. Since EQs and
vport configuration are per-port, a PD is bound to the port used by
its first raw QP. Subsequent QPs on the same PD must use the same
port or the creation fails with -EINVAL. Previously this was silently
broken: with shared EQs it appeared to work, but with per-vPort EQs
a cross-port PD would cause wrong-port EQ teardown and corruption.
DPDK creates one PD per port so no existing flows are affected.
Serialize mana_set_channels() and the async per-port queue reset
handler against RDMA vport configuration to prevent RDMA from claiming
the vport during the detach/attach window. A channel_changing flag is
set under apc->vport_mutex before detach and checked by
mana_cfg_vport() when called from the RDMA path, blocking RDMA from
grabbing the vport during the entire window. When the port is down
and RDMA already holds the vport, the channel change is rejected with
-EBUSY.
Signed-off-by: Long Li <longli@microsoft.com>
---
drivers/infiniband/hw/mana/main.c | 40 ++++--
drivers/infiniband/hw/mana/mana_ib.h | 14 ++
drivers/infiniband/hw/mana/qp.c | 68 ++++++++-
drivers/net/ethernet/microsoft/mana/mana_en.c | 135 +++++++++++-------
.../ethernet/microsoft/mana/mana_ethtool.c | 23 ++-
include/net/mana/mana.h | 15 +-
6 files changed, 228 insertions(+), 67 deletions(-)
diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
index afc2fc124fee..f42ea20cb75d 100644
--- a/drivers/infiniband/hw/mana/main.c
+++ b/drivers/infiniband/hw/mana/main.c
@@ -20,8 +20,10 @@ void mana_ib_uncfg_vport(struct mana_ib_dev *dev, struct mana_ib_pd *pd,
pd->vport_use_count--;
WARN_ON(pd->vport_use_count < 0);
- if (!pd->vport_use_count)
+ if (!pd->vport_use_count) {
+ mana_destroy_eq(mpc);
mana_uncfg_vport(mpc);
+ }
mutex_unlock(&pd->vport_mutex);
}
@@ -40,13 +42,27 @@ int mana_ib_cfg_vport(struct mana_ib_dev *dev, u32 port, struct mana_ib_pd *pd,
pd->vport_use_count++;
if (pd->vport_use_count > 1) {
+ /* Reject cross-port PD sharing. EQs and vport config
+ * are per-port, so the PD must stay bound to the port
+ * that was configured on the first raw QP creation.
+ */
+ if (pd->vport_port != port) {
+ pd->vport_use_count--;
+ mutex_unlock(&pd->vport_mutex);
+ ibdev_dbg(&dev->ib_dev,
+ "PD already bound to port %u\n",
+ pd->vport_port);
+ return -EINVAL;
+ }
ibdev_dbg(&dev->ib_dev,
"Skip as this PD is already configured vport\n");
mutex_unlock(&pd->vport_mutex);
return 0;
}
- err = mana_cfg_vport(mpc, pd->pdn, doorbell_id);
+ pd->vport_port = port;
+
+ err = mana_cfg_vport(mpc, pd->pdn, doorbell_id, true);
if (err) {
pd->vport_use_count--;
mutex_unlock(&pd->vport_mutex);
@@ -55,15 +71,23 @@ int mana_ib_cfg_vport(struct mana_ib_dev *dev, u32 port, struct mana_ib_pd *pd,
return err;
}
- mutex_unlock(&pd->vport_mutex);
- pd->tx_shortform_allowed = mpc->tx_shortform_allowed;
- pd->tx_vp_offset = mpc->tx_vp_offset;
+ err = mana_create_eq(mpc);
+ if (err) {
+ mana_uncfg_vport(mpc);
+ pd->vport_use_count--;
+ } else {
+ pd->tx_shortform_allowed = mpc->tx_shortform_allowed;
+ pd->tx_vp_offset = mpc->tx_vp_offset;
+ }
+
+ mutex_unlock(&pd->vport_mutex);
- ibdev_dbg(&dev->ib_dev, "vport handle %llx pdid %x doorbell_id %x\n",
- mpc->port_handle, pd->pdn, doorbell_id);
+ if (!err)
+ ibdev_dbg(&dev->ib_dev, "vport handle %llx pdid %x doorbell_id %x\n",
+ mpc->port_handle, pd->pdn, doorbell_id);
- return 0;
+ return err;
}
int mana_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
diff --git a/drivers/infiniband/hw/mana/mana_ib.h b/drivers/infiniband/hw/mana/mana_ib.h
index c9c94e86a72b..da05966aff19 100644
--- a/drivers/infiniband/hw/mana/mana_ib.h
+++ b/drivers/infiniband/hw/mana/mana_ib.h
@@ -102,6 +102,20 @@ struct mana_ib_pd {
struct mutex vport_mutex;
int vport_use_count;
+ /* Port bound to this PD for raw QP usage. Only valid when
+ * vport_use_count > 0. A PD can only be associated with a
+ * single physical port because per-port EQs and vport
+ * configuration are tied to the PD's refcount.
+ */
+ u32 vport_port;
+
+ /* Only one RSS QP is allowed per vport because each RSS QP
+ * overwrites the vport steering config (indirection table /
+ * hash key) and mana_disable_vport_rx() on destroy would
+ * blackhole traffic for any other RSS QP on the same vport.
+ */
+ bool has_rss_qp;
+
bool tx_shortform_allowed;
u32 tx_vp_offset;
};
diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
index 0fbcf449c134..d3ee30b64f53 100644
--- a/drivers/infiniband/hw/mana/qp.c
+++ b/drivers/infiniband/hw/mana/qp.c
@@ -79,6 +79,7 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
struct ib_qp_init_attr *attr,
struct ib_udata *udata)
{
+ struct mana_ib_pd *mana_pd = container_of(pd, struct mana_ib_pd, ibpd);
struct mana_ib_qp *qp = container_of(ibqp, struct mana_ib_qp, ibqp);
struct mana_ib_dev *mdev =
container_of(pd->device, struct mana_ib_dev, ib_dev);
@@ -155,6 +156,30 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
qp->port = port;
+ /* Take a reference on the vport to ensure EQs outlive this QP.
+ * The vport must already be configured by a raw QP on the
+ * same port — cross-port PD sharing is not supported.
+ * Only one RSS QP per vport is allowed because each one
+ * overwrites the steering config and destroy disables RX
+ * globally.
+ */
+ mutex_lock(&mana_pd->vport_mutex);
+ if (!mana_pd->vport_use_count || mana_pd->vport_port != port) {
+ mutex_unlock(&mana_pd->vport_mutex);
+ ret = -EINVAL;
+ goto fail;
+ }
+ if (mana_pd->has_rss_qp) {
+ mutex_unlock(&mana_pd->vport_mutex);
+ ibdev_dbg(&mdev->ib_dev,
+ "Only one RSS QP per vport is supported\n");
+ ret = -EBUSY;
+ goto fail;
+ }
+ mana_pd->vport_use_count++;
+ mana_pd->has_rss_qp = true;
+ mutex_unlock(&mana_pd->vport_mutex);
+
for (i = 0; i < ind_tbl_size; i++) {
struct mana_obj_spec wq_spec = {};
struct mana_obj_spec cq_spec = {};
@@ -171,13 +196,19 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
cq_spec.gdma_region = cq->queue.gdma_region;
cq_spec.queue_size = cq->cqe * COMP_ENTRY_SIZE;
cq_spec.modr_ctx_id = 0;
- eq = &mpc->ac->eqs[cq->comp_vector];
+ /* Map comp_vector to a per-vPort EQ. The modulo handles
+ * the case where the RDMA-advertised num_comp_vectors
+ * exceeds this port's num_queues (e.g. after ethtool -L
+ * reduces it), remapping to an available EQ rather than
+ * failing the QP creation.
+ */
+ eq = &mpc->eqs[cq->comp_vector % mpc->num_queues];
cq_spec.attached_eq = eq->eq->id;
ret = mana_create_wq_obj(mpc, mpc->port_handle, GDMA_RQ,
&wq_spec, &cq_spec, &wq->rx_object);
if (ret)
- goto fail;
+ goto free_vport;
/* The GDMA regions are now owned by the WQ object */
wq->queue.gdma_region = GDMA_INVALID_DMA_REGION;
@@ -199,7 +230,7 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
ret = mana_ib_install_cq_cb(mdev, cq);
if (ret) {
mana_destroy_wq_obj(mpc, GDMA_RQ, wq->rx_object);
- goto fail;
+ goto free_vport;
}
}
resp.num_entries = i;
@@ -210,7 +241,7 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
ucmd.rx_hash_key_len,
ucmd.rx_hash_key);
if (ret)
- goto fail;
+ goto free_vport;
ret = ib_copy_to_udata(udata, &resp, sizeof(resp));
if (ret) {
@@ -226,7 +257,7 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
err_disable_vport_rx:
mana_disable_vport_rx(mpc);
-fail:
+free_vport:
while (i-- > 0) {
ibwq = ind_tbl->ind_tbl[i];
ibcq = ibwq->cq;
@@ -237,6 +268,13 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
mana_destroy_wq_obj(mpc, GDMA_RQ, wq->rx_object);
}
+ mutex_lock(&mana_pd->vport_mutex);
+ mana_pd->has_rss_qp = false;
+ mutex_unlock(&mana_pd->vport_mutex);
+
+ mana_ib_uncfg_vport(mdev, mana_pd, port);
+
+fail:
kfree(mana_ind_table);
return ret;
@@ -299,7 +337,7 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp, struct ib_pd *ibpd,
err = mana_ib_cfg_vport(mdev, port, pd, mana_ucontext->doorbell);
if (err)
- return -ENODEV;
+ return err;
qp->port = port;
@@ -321,7 +359,14 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp, struct ib_pd *ibpd,
cq_spec.queue_size = send_cq->cqe * COMP_ENTRY_SIZE;
cq_spec.modr_ctx_id = 0;
eq_vec = send_cq->comp_vector;
- eq = &mpc->ac->eqs[eq_vec];
+ if (!mpc->eqs) {
+ err = -EINVAL;
+ goto err_destroy_queue;
+ }
+ /* Map comp_vector to a per-vPort EQ. See comment in
+ * mana_ib_create_qp_rss() for the modulo rationale.
+ */
+ eq = &mpc->eqs[eq_vec % mpc->num_queues];
cq_spec.attached_eq = eq->eq->id;
err = mana_create_wq_obj(mpc, mpc->port_handle, GDMA_SQ, &wq_spec,
@@ -785,14 +830,17 @@ static int mana_ib_destroy_qp_rss(struct mana_ib_qp *qp,
{
struct mana_ib_dev *mdev =
container_of(qp->ibqp.device, struct mana_ib_dev, ib_dev);
+ struct ib_pd *ibpd = qp->ibqp.pd;
struct mana_port_context *mpc;
struct net_device *ndev;
+ struct mana_ib_pd *pd;
struct mana_ib_wq *wq;
struct ib_wq *ibwq;
int i;
ndev = mana_ib_get_netdev(qp->ibqp.device, qp->port);
mpc = netdev_priv(ndev);
+ pd = container_of(ibpd, struct mana_ib_pd, ibpd);
/* Disable vPort RX steering before destroying RX WQ objects.
* Otherwise firmware still routes traffic to the destroyed queues,
@@ -817,6 +865,12 @@ static int mana_ib_destroy_qp_rss(struct mana_ib_qp *qp,
mana_destroy_wq_obj(mpc, GDMA_RQ, wq->rx_object);
}
+ mutex_lock(&pd->vport_mutex);
+ pd->has_rss_qp = false;
+ mutex_unlock(&pd->vport_mutex);
+
+ mana_ib_uncfg_vport(mdev, pd, qp->port);
+
return 0;
}
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index db14357d3732..ed60cc15fe78 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -309,11 +309,18 @@ static void mana_per_port_queue_reset_work_handler(struct work_struct *work)
rtnl_lock();
+ /* Block RDMA from grabbing the vport during the detach/attach
+ * window, same as mana_set_channels().
+ */
+ mutex_lock(&apc->vport_mutex);
+ apc->channel_changing = true;
+ mutex_unlock(&apc->vport_mutex);
+
/* Pre-allocate buffers to prevent failure in mana_attach later */
err = mana_pre_alloc_rxbufs(apc, ndev->mtu, apc->num_queues);
if (err) {
netdev_err(ndev, "Insufficient memory for reset post tx stall detection\n");
- goto out;
+ goto clear_flag;
}
err = mana_detach(ndev, false);
@@ -328,7 +335,11 @@ static void mana_per_port_queue_reset_work_handler(struct work_struct *work)
dealloc_pre_rxbufs:
mana_pre_dealloc_rxbufs(apc);
-out:
+clear_flag:
+ mutex_lock(&apc->vport_mutex);
+ apc->channel_changing = false;
+ mutex_unlock(&apc->vport_mutex);
+
rtnl_unlock();
}
@@ -1298,7 +1309,7 @@ void mana_uncfg_vport(struct mana_port_context *apc)
EXPORT_SYMBOL_NS(mana_uncfg_vport, "NET_MANA");
int mana_cfg_vport(struct mana_port_context *apc, u32 protection_dom_id,
- u32 doorbell_pg_id)
+ u32 doorbell_pg_id, bool check_channel_changing)
{
struct mana_config_vport_resp resp = {};
struct mana_config_vport_req req = {};
@@ -1323,7 +1334,8 @@ int mana_cfg_vport(struct mana_port_context *apc, u32 protection_dom_id,
* Ethernet usage on the same port.
*/
mutex_lock(&apc->vport_mutex);
- if (apc->vport_use_count > 0) {
+ if (apc->vport_use_count > 0 ||
+ (check_channel_changing && apc->channel_changing)) {
mutex_unlock(&apc->vport_mutex);
return -EBUSY;
}
@@ -1623,78 +1635,84 @@ void mana_destroy_wq_obj(struct mana_port_context *apc, u32 wq_type,
}
EXPORT_SYMBOL_NS(mana_destroy_wq_obj, "NET_MANA");
-static void mana_destroy_eq(struct mana_context *ac)
+void mana_destroy_eq(struct mana_port_context *apc)
{
+ struct mana_context *ac = apc->ac;
struct gdma_context *gc = ac->gdma_dev->gdma_context;
struct gdma_queue *eq;
int i;
- if (!ac->eqs)
+ if (!apc->eqs)
return;
- debugfs_remove_recursive(ac->mana_eqs_debugfs);
- ac->mana_eqs_debugfs = NULL;
+ debugfs_remove_recursive(apc->mana_eqs_debugfs);
+ apc->mana_eqs_debugfs = NULL;
- for (i = 0; i < gc->max_num_queues; i++) {
- eq = ac->eqs[i].eq;
+ for (i = 0; i < apc->num_queues; i++) {
+ eq = apc->eqs[i].eq;
if (!eq)
continue;
mana_gd_destroy_queue(gc, eq);
}
- kfree(ac->eqs);
- ac->eqs = NULL;
+ kfree(apc->eqs);
+ apc->eqs = NULL;
}
+EXPORT_SYMBOL_NS(mana_destroy_eq, "NET_MANA");
-static void mana_create_eq_debugfs(struct mana_context *ac, int i)
+static void mana_create_eq_debugfs(struct mana_port_context *apc, int i)
{
- struct mana_eq eq = ac->eqs[i];
+ struct mana_eq eq = apc->eqs[i];
char eqnum[32];
sprintf(eqnum, "eq%d", i);
- eq.mana_eq_debugfs = debugfs_create_dir(eqnum, ac->mana_eqs_debugfs);
+ eq.mana_eq_debugfs = debugfs_create_dir(eqnum, apc->mana_eqs_debugfs);
debugfs_create_u32("head", 0400, eq.mana_eq_debugfs, &eq.eq->head);
debugfs_create_u32("tail", 0400, eq.mana_eq_debugfs, &eq.eq->tail);
debugfs_create_file("eq_dump", 0400, eq.mana_eq_debugfs, eq.eq, &mana_dbg_q_fops);
}
-static int mana_create_eq(struct mana_context *ac)
+int mana_create_eq(struct mana_port_context *apc)
{
- struct gdma_dev *gd = ac->gdma_dev;
+ struct gdma_dev *gd = apc->ac->gdma_dev;
struct gdma_context *gc = gd->gdma_context;
struct gdma_queue_spec spec = {};
int err;
int i;
- ac->eqs = kzalloc_objs(struct mana_eq, gc->max_num_queues);
- if (!ac->eqs)
+ if (WARN_ON(apc->eqs))
+ return -EEXIST;
+ apc->eqs = kzalloc_objs(struct mana_eq, apc->num_queues);
+ if (!apc->eqs)
return -ENOMEM;
spec.type = GDMA_EQ;
spec.monitor_avl_buf = false;
spec.queue_size = EQ_SIZE;
spec.eq.callback = NULL;
- spec.eq.context = ac->eqs;
+ spec.eq.context = apc->eqs;
spec.eq.log2_throttle_limit = LOG2_EQ_THROTTLE;
- ac->mana_eqs_debugfs = debugfs_create_dir("EQs", gc->mana_pci_debugfs);
+ apc->mana_eqs_debugfs =
+ debugfs_create_dir("EQs", apc->mana_port_debugfs);
- for (i = 0; i < gc->max_num_queues; i++) {
+ for (i = 0; i < apc->num_queues; i++) {
spec.eq.msix_index = (i + 1) % gc->num_msix_usable;
- err = mana_gd_create_mana_eq(gd, &spec, &ac->eqs[i].eq);
+ err = mana_gd_create_mana_eq(gd, &spec, &apc->eqs[i].eq);
if (err) {
dev_err(gc->dev, "Failed to create EQ %d : %d\n", i, err);
goto out;
}
- mana_create_eq_debugfs(ac, i);
+ mana_create_eq_debugfs(apc, i);
}
return 0;
out:
- mana_destroy_eq(ac);
+ mana_destroy_eq(apc);
return err;
}
+EXPORT_SYMBOL_NS(mana_create_eq, "NET_MANA");
static int mana_fence_rq(struct mana_port_context *apc, struct mana_rxq *rxq)
{
@@ -2462,7 +2480,7 @@ static int mana_create_txq(struct mana_port_context *apc,
spec.monitor_avl_buf = false;
spec.queue_size = cq_size;
spec.cq.callback = mana_schedule_napi;
- spec.cq.parent_eq = ac->eqs[i].eq;
+ spec.cq.parent_eq = apc->eqs[i].eq;
spec.cq.context = cq;
err = mana_gd_create_mana_wq_cq(gd, &spec, &cq->gdma_cq);
if (err)
@@ -2855,13 +2873,12 @@ static void mana_create_rxq_debugfs(struct mana_port_context *apc, int idx)
static int mana_add_rx_queues(struct mana_port_context *apc,
struct net_device *ndev)
{
- struct mana_context *ac = apc->ac;
struct mana_rxq *rxq;
int err = 0;
int i;
for (i = 0; i < apc->num_queues; i++) {
- rxq = mana_create_rxq(apc, i, &ac->eqs[i], ndev);
+ rxq = mana_create_rxq(apc, i, &apc->eqs[i], ndev);
if (!rxq) {
err = -ENOMEM;
netdev_err(ndev, "Failed to create rxq %d : %d\n", i, err);
@@ -2880,9 +2897,8 @@ static int mana_add_rx_queues(struct mana_port_context *apc,
return err;
}
-static void mana_destroy_vport(struct mana_port_context *apc)
+static void mana_destroy_rxqs(struct mana_port_context *apc)
{
- struct gdma_dev *gd = apc->ac->gdma_dev;
struct mana_rxq *rxq;
u32 rxq_idx;
@@ -2897,8 +2913,12 @@ static void mana_destroy_vport(struct mana_port_context *apc)
apc->rxqs[rxq_idx] = NULL;
}
}
+}
+
+static void mana_destroy_vport(struct mana_port_context *apc)
+{
+ struct gdma_dev *gd = apc->ac->gdma_dev;
- mana_destroy_txq(apc);
mana_uncfg_vport(apc);
if (gd->gdma_context->is_pf && !apc->ac->bm_hostmode)
@@ -2919,11 +2939,14 @@ static int mana_create_vport(struct mana_port_context *apc,
return err;
}
- err = mana_cfg_vport(apc, gd->pdid, gd->doorbell);
- if (err)
+ err = mana_cfg_vport(apc, gd->pdid, gd->doorbell, false);
+ if (err) {
+ if (gd->gdma_context->is_pf && !apc->ac->bm_hostmode)
+ mana_pf_deregister_hw_vport(apc);
return err;
+ }
- return mana_create_txq(apc, net);
+ return 0;
}
static int mana_rss_table_alloc(struct mana_port_context *apc)
@@ -3226,21 +3249,36 @@ int mana_alloc_queues(struct net_device *ndev)
err = mana_create_vport(apc, ndev);
if (err) {
- netdev_err(ndev, "Failed to create vPort %u : %d\n", apc->port_idx, err);
+ netdev_err(ndev, "Failed to create vPort %u : %d\n",
+ apc->port_idx, err);
return err;
}
+ err = mana_create_eq(apc);
+ if (err) {
+ netdev_err(ndev, "Failed to create EQ on vPort %u: %d\n",
+ apc->port_idx, err);
+ goto destroy_vport;
+ }
+
+ err = mana_create_txq(apc, ndev);
+ if (err) {
+ netdev_err(ndev, "Failed to create TXQ on vPort %u: %d\n",
+ apc->port_idx, err);
+ goto destroy_eq;
+ }
+
err = netif_set_real_num_tx_queues(ndev, apc->num_queues);
if (err) {
netdev_err(ndev,
"netif_set_real_num_tx_queues () failed for ndev with num_queues %u : %d\n",
apc->num_queues, err);
- goto destroy_vport;
+ goto destroy_txq;
}
err = mana_add_rx_queues(apc, ndev);
if (err)
- goto destroy_vport;
+ goto destroy_rxq;
apc->rss_state = apc->num_queues > 1 ? TRI_STATE_TRUE : TRI_STATE_FALSE;
@@ -3249,7 +3287,7 @@ int mana_alloc_queues(struct net_device *ndev)
netdev_err(ndev,
"netif_set_real_num_rx_queues () failed for ndev with num_queues %u : %d\n",
apc->num_queues, err);
- goto destroy_vport;
+ goto destroy_rxq;
}
mana_rss_table_init(apc);
@@ -3257,19 +3295,25 @@ int mana_alloc_queues(struct net_device *ndev)
err = mana_config_rss(apc, TRI_STATE_TRUE, true, true);
if (err) {
netdev_err(ndev, "Failed to configure RSS table: %d\n", err);
- goto destroy_vport;
+ goto destroy_rxq;
}
if (gd->gdma_context->is_pf && !apc->ac->bm_hostmode) {
err = mana_pf_register_filter(apc);
if (err)
- goto destroy_vport;
+ goto destroy_rxq;
}
mana_chn_setxdp(apc, mana_xdp_get(apc));
return 0;
+destroy_rxq:
+ mana_destroy_rxqs(apc);
+destroy_txq:
+ mana_destroy_txq(apc);
+destroy_eq:
+ mana_destroy_eq(apc);
destroy_vport:
mana_destroy_vport(apc);
return err;
@@ -3380,6 +3424,9 @@ static int mana_dealloc_queues(struct net_device *ndev)
mana_fence_rqs(apc);
/* Even in err case, still need to cleanup the vPort */
+ mana_destroy_rxqs(apc);
+ mana_destroy_txq(apc);
+ mana_destroy_eq(apc);
mana_destroy_vport(apc);
return 0;
@@ -3706,12 +3753,6 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
INIT_DELAYED_WORK(&ac->gf_stats_work, mana_gf_stats_work_handler);
- err = mana_create_eq(ac);
- if (err) {
- dev_err(dev, "Failed to create EQs: %d\n", err);
- goto out;
- }
-
err = mana_query_device_cfg(ac, MANA_MAJOR_VERSION, MANA_MINOR_VERSION,
MANA_MICRO_VERSION, &num_ports, &bm_hostmode);
if (err)
@@ -3856,8 +3897,6 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
free_netdev(ndev);
}
- mana_destroy_eq(ac);
-
if (ac->per_port_queue_reset_wq) {
destroy_workqueue(ac->per_port_queue_reset_wq);
ac->per_port_queue_reset_wq = NULL;
diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
index 04350973e19e..4633acc976f0 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
@@ -454,6 +454,11 @@ static int mana_set_coalesce(struct net_device *ndev,
return err;
}
+/* mana_set_channels - change the number of queues on a port
+ *
+ * Returns -EBUSY if RDMA holds the vport with EQs sized to the
+ * current num_queues.
+ */
static int mana_set_channels(struct net_device *ndev,
struct ethtool_channels *channels)
{
@@ -462,10 +467,22 @@ static int mana_set_channels(struct net_device *ndev,
unsigned int old_count = apc->num_queues;
int err;
+ /* Set channel_changing to block RDMA from grabbing the vport
+ * during the detach/attach window. mana_cfg_vport() checks
+ * this flag under vport_mutex and returns -EBUSY if set.
+ */
+ mutex_lock(&apc->vport_mutex);
+ if (!apc->port_is_up && apc->vport_use_count) {
+ mutex_unlock(&apc->vport_mutex);
+ return -EBUSY;
+ }
+ apc->channel_changing = true;
+ mutex_unlock(&apc->vport_mutex);
+
err = mana_pre_alloc_rxbufs(apc, ndev->mtu, new_count);
if (err) {
netdev_err(ndev, "Insufficient memory for new allocations");
- return err;
+ goto clear_flag;
}
err = mana_detach(ndev, false);
@@ -483,6 +500,10 @@ static int mana_set_channels(struct net_device *ndev,
out:
mana_pre_dealloc_rxbufs(apc);
+clear_flag:
+ mutex_lock(&apc->vport_mutex);
+ apc->channel_changing = false;
+ mutex_unlock(&apc->vport_mutex);
return err;
}
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index d9c27310fd04..5a9b94e0ef34 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -480,8 +480,6 @@ struct mana_context {
u8 bm_hostmode;
struct mana_ethtool_hc_stats hc_stats;
- struct mana_eq *eqs;
- struct dentry *mana_eqs_debugfs;
struct workqueue_struct *per_port_queue_reset_wq;
/* Workqueue for querying hardware stats */
struct delayed_work gf_stats_work;
@@ -501,6 +499,9 @@ struct mana_port_context {
u8 mac_addr[ETH_ALEN];
+ struct mana_eq *eqs;
+ struct dentry *mana_eqs_debugfs;
+
enum TRI_STATE rss_state;
mana_handle_t default_rxobj;
@@ -547,6 +548,12 @@ struct mana_port_context {
struct mutex vport_mutex;
int vport_use_count;
+ /* Set by mana_set_channels() under vport_mutex to block RDMA
+ * from grabbing the vport during the detach/attach window.
+ * Checked by mana_cfg_vport() when called from the RDMA path.
+ */
+ bool channel_changing;
+
/* Net shaper handle*/
struct net_shaper_handle handle;
@@ -1040,8 +1047,10 @@ void mana_destroy_wq_obj(struct mana_port_context *apc, u32 wq_type,
mana_handle_t wq_obj);
int mana_cfg_vport(struct mana_port_context *apc, u32 protection_dom_id,
- u32 doorbell_pg_id);
+ u32 doorbell_pg_id, bool check_channel_changing);
void mana_uncfg_vport(struct mana_port_context *apc);
+int mana_create_eq(struct mana_port_context *apc);
+void mana_destroy_eq(struct mana_port_context *apc);
struct net_device *mana_get_primary_netdev(struct mana_context *ac,
u32 port_index,
--
2.43.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* [PATCH net-next v12 2/6] net: mana: Query device capabilities and configure MSI-X sharing for EQs
2026-06-05 0:57 [PATCH net-next v12 0/6] net: mana: Per-vPort EQ and MSI-X management Long Li
2026-06-05 0:57 ` [PATCH net-next v12 1/6] net: mana: Create separate EQs for each vPort Long Li
@ 2026-06-05 0:57 ` Long Li
2026-06-05 0:57 ` [PATCH net-next v12 3/6] net: mana: Introduce GIC context with refcounting for interrupt management Long Li
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Long Li @ 2026-06-05 0:57 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui, shradhagupta
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
When querying the device, adjust the max number of queues to allow
dedicated MSI-X vectors for each vPort. The per-vPort queue count is
clamped towards MANA_DEF_NUM_QUEUES but will not exceed the hardware
maximum reported by the device.
MSI-X sharing among vPorts is enabled when there are not enough MSI-X
vectors for dedicated allocation, or when the platform does not support
dynamic MSI-X allocation (in which case all vectors are pre-allocated
at probe time and sharing is always used). The msi_sharing flag is
reset at the top of mana_gd_query_max_resources() so it is recomputed
from current hardware state on each probe or resume cycle.
Clamp apc->max_queues to gc->max_num_queues_vport in mana_init_port()
so that on resume, if max_num_queues_vport has decreased due to fewer
MSI-X vectors, num_queues is reduced accordingly before EQ allocation.
A device reporting zero ports now results in a fatal probe error since
the per-vPort MSI-X math requires at least one port.
Rename mana_query_device_cfg() to mana_gd_query_device_cfg() as it is
used at GDMA device probe time for querying device capabilities.
Signed-off-by: Long Li <longli@microsoft.com>
---
.../net/ethernet/microsoft/mana/gdma_main.c | 78 ++++++++++++++++++-
drivers/net/ethernet/microsoft/mana/mana_en.c | 45 ++++++-----
include/net/mana/gdma.h | 13 +++-
3 files changed, 114 insertions(+), 22 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 712a0881d720..786e1d4cd4fe 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -179,8 +179,21 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
struct gdma_context *gc = pci_get_drvdata(pdev);
struct gdma_query_max_resources_resp resp = {};
struct gdma_general_req req = {};
+ unsigned int max_num_queues;
+ u8 bm_hostmode;
+ u16 num_ports;
int err;
+ /* Reset msi_sharing so it is recomputed from current hardware
+ * state. On resume, num_online_cpus() or num_msix_usable may
+ * have changed, making dedicated MSI-X feasible where it was
+ * not before. Only reset on platforms that support dynamic
+ * MSI-X allocation; on non-dyn platforms msi_sharing is
+ * unconditionally true (set in mana_gd_setup_hwc_irqs).
+ */
+ if (pci_msix_can_alloc_dyn(to_pci_dev(gc->dev)))
+ gc->msi_sharing = false;
+
mana_gd_init_req_hdr(&req.hdr, GDMA_QUERY_MAX_RESOURCES,
sizeof(req), sizeof(resp));
@@ -232,6 +245,52 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
debugfs_create_u32("max_num_queues", 0400, gc->mana_pci_debugfs,
&gc->max_num_queues);
+ err = mana_gd_query_device_cfg(gc, MANA_MAJOR_VERSION,
+ MANA_MINOR_VERSION,
+ MANA_MICRO_VERSION,
+ &num_ports, &bm_hostmode);
+ if (err)
+ return err;
+
+ if (!num_ports) {
+ dev_err(gc->dev, "Failed to detect any vPort\n");
+ return -EINVAL;
+ }
+
+ /* Cap to the same limit used by mana_probe() for port instantiation,
+ * so MSI-X and queue budgeting matches the actual port count.
+ */
+ if (num_ports > MAX_PORTS_IN_MANA_DEV)
+ num_ports = MAX_PORTS_IN_MANA_DEV;
+
+ /*
+ * Adjust the per-vPort max queue count to allow dedicated
+ * MSIx for each vPort. Prefer at least MANA_DEF_NUM_QUEUES,
+ * but the hardware max (gc->max_num_queues) takes precedence.
+ */
+ max_num_queues = (gc->num_msix_usable - 1) / num_ports;
+ max_num_queues = rounddown_pow_of_two(max(max_num_queues, 1U));
+ if (max_num_queues < MANA_DEF_NUM_QUEUES)
+ max_num_queues = MANA_DEF_NUM_QUEUES;
+
+ /*
+ * Use dedicated MSIx for EQs whenever possible, use MSIx sharing for
+ * Ethernet EQs when (max_num_queues * num_ports > num_msix_usable - 1).
+ */
+ max_num_queues = min(gc->max_num_queues, max_num_queues);
+ if (max_num_queues * num_ports > gc->num_msix_usable - 1)
+ gc->msi_sharing = true;
+
+ /* If MSI is shared, use max allowed value */
+ if (gc->msi_sharing)
+ gc->max_num_queues_vport = min(gc->num_msix_usable - 1,
+ gc->max_num_queues);
+ else
+ gc->max_num_queues_vport = max_num_queues;
+
+ dev_info(gc->dev, "MSI sharing mode %u max queues %u\n",
+ gc->msi_sharing, gc->max_num_queues_vport);
+
return 0;
}
@@ -1901,6 +1960,7 @@ static int mana_gd_setup_hwc_irqs(struct pci_dev *pdev)
/* Need 1 interrupt for HWC */
max_irqs = min(num_online_cpus(), MANA_MAX_NUM_QUEUES) + 1;
min_irqs = 2;
+ gc->msi_sharing = true;
}
nvec = pci_alloc_irq_vectors(pdev, min_irqs, max_irqs, PCI_IRQ_MSIX);
@@ -1979,6 +2039,8 @@ static void mana_gd_remove_irqs(struct pci_dev *pdev)
pci_free_irq_vectors(pdev);
+ bitmap_free(gc->msi_bitmap);
+ gc->msi_bitmap = NULL;
gc->max_num_msix = 0;
gc->num_msix_usable = 0;
}
@@ -2018,6 +2080,10 @@ static int mana_gd_setup(struct pci_dev *pdev)
if (err)
goto destroy_hwc;
+ err = mana_gd_detect_devices(pdev);
+ if (err)
+ goto destroy_hwc;
+
err = mana_gd_query_max_resources(pdev);
if (err)
goto destroy_hwc;
@@ -2028,9 +2094,15 @@ static int mana_gd_setup(struct pci_dev *pdev)
goto destroy_hwc;
}
- err = mana_gd_detect_devices(pdev);
- if (err)
- goto destroy_hwc;
+ if (!gc->msi_sharing) {
+ gc->msi_bitmap = bitmap_zalloc(gc->num_msix_usable, GFP_KERNEL);
+ if (!gc->msi_bitmap) {
+ err = -ENOMEM;
+ goto destroy_hwc;
+ }
+ /* Set bit for HWC */
+ set_bit(0, gc->msi_bitmap);
+ }
dev_dbg(&pdev->dev, "mana gdma setup successful\n");
return 0;
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index ed60cc15fe78..3ec8e94e7c17 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1018,10 +1018,9 @@ static int mana_init_port_context(struct mana_port_context *apc)
return !apc->rxqs ? -ENOMEM : 0;
}
-static int mana_send_request(struct mana_context *ac, void *in_buf,
- u32 in_len, void *out_buf, u32 out_len)
+static int gdma_mana_send_request(struct gdma_context *gc, void *in_buf,
+ u32 in_len, void *out_buf, u32 out_len)
{
- struct gdma_context *gc = ac->gdma_dev->gdma_context;
struct gdma_resp_hdr *resp = out_buf;
struct gdma_req_hdr *req = in_buf;
struct device *dev = gc->dev;
@@ -1055,6 +1054,14 @@ static int mana_send_request(struct mana_context *ac, void *in_buf,
return 0;
}
+static int mana_send_request(struct mana_context *ac, void *in_buf,
+ u32 in_len, void *out_buf, u32 out_len)
+{
+ struct gdma_context *gc = ac->gdma_dev->gdma_context;
+
+ return gdma_mana_send_request(gc, in_buf, in_len, out_buf, out_len);
+}
+
static int mana_verify_resp_hdr(const struct gdma_resp_hdr *resp_hdr,
const enum mana_command_code expected_code,
const u32 min_size)
@@ -1188,11 +1195,10 @@ static void mana_pf_deregister_filter(struct mana_port_context *apc)
err, resp.hdr.status);
}
-static int mana_query_device_cfg(struct mana_context *ac, u32 proto_major_ver,
- u32 proto_minor_ver, u32 proto_micro_ver,
- u16 *max_num_vports, u8 *bm_hostmode)
+int mana_gd_query_device_cfg(struct gdma_context *gc, u32 proto_major_ver,
+ u32 proto_minor_ver, u32 proto_micro_ver,
+ u16 *max_num_vports, u8 *bm_hostmode)
{
- struct gdma_context *gc = ac->gdma_dev->gdma_context;
struct mana_query_device_cfg_resp resp = {};
struct mana_query_device_cfg_req req = {};
struct device *dev = gc->dev;
@@ -1207,7 +1213,8 @@ static int mana_query_device_cfg(struct mana_context *ac, u32 proto_major_ver,
req.proto_minor_ver = proto_minor_ver;
req.proto_micro_ver = proto_micro_ver;
- err = mana_send_request(ac, &req, sizeof(req), &resp, sizeof(resp));
+ err = gdma_mana_send_request(gc, &req, sizeof(req),
+ &resp, sizeof(resp));
if (err) {
dev_err(dev, "Failed to query config: %d", err);
return err;
@@ -1241,8 +1248,6 @@ static int mana_query_device_cfg(struct mana_context *ac, u32 proto_major_ver,
else
*bm_hostmode = 0;
- debugfs_create_u16("adapter-MTU", 0400, gc->mana_pci_debugfs, &gc->adapter_mtu);
-
return 0;
}
@@ -3208,6 +3213,8 @@ static int mana_init_port(struct net_device *ndev)
max_queues = min_t(u32, max_txq, max_rxq);
if (apc->max_queues > max_queues)
apc->max_queues = max_queues;
+ if (apc->max_queues > gc->max_num_queues_vport)
+ apc->max_queues = gc->max_num_queues_vport;
if (apc->num_queues > apc->max_queues)
apc->num_queues = apc->max_queues;
@@ -3478,7 +3485,7 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
int err;
ndev = alloc_etherdev_mq(sizeof(struct mana_port_context),
- gc->max_num_queues);
+ gc->max_num_queues_vport);
if (!ndev)
return -ENOMEM;
@@ -3487,9 +3494,9 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
apc = netdev_priv(ndev);
apc->ac = ac;
apc->ndev = ndev;
- apc->max_queues = gc->max_num_queues;
+ apc->max_queues = gc->max_num_queues_vport;
/* Use MANA_DEF_NUM_QUEUES as default, still honoring the HW limit */
- apc->num_queues = min(gc->max_num_queues, MANA_DEF_NUM_QUEUES);
+ apc->num_queues = min(gc->max_num_queues_vport, MANA_DEF_NUM_QUEUES);
apc->tx_queue_size = DEF_TX_BUFFERS_PER_QUEUE;
apc->rx_queue_size = DEF_RX_BUFFERS_PER_QUEUE;
apc->port_handle = INVALID_MANA_HANDLE;
@@ -3753,13 +3760,18 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
INIT_DELAYED_WORK(&ac->gf_stats_work, mana_gf_stats_work_handler);
- err = mana_query_device_cfg(ac, MANA_MAJOR_VERSION, MANA_MINOR_VERSION,
- MANA_MICRO_VERSION, &num_ports, &bm_hostmode);
+ err = mana_gd_query_device_cfg(gc, MANA_MAJOR_VERSION,
+ MANA_MINOR_VERSION,
+ MANA_MICRO_VERSION,
+ &num_ports, &bm_hostmode);
if (err)
goto out;
ac->bm_hostmode = bm_hostmode;
+ debugfs_create_u16("adapter-MTU", 0400,
+ gc->mana_pci_debugfs, &gc->adapter_mtu);
+
if (!resuming) {
ac->num_ports = num_ports;
} else {
@@ -3773,9 +3785,6 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
enable_work(&ac->link_change_work);
}
- if (ac->num_ports == 0)
- dev_err(dev, "Failed to detect any vPort\n");
-
if (ac->num_ports > MAX_PORTS_IN_MANA_DEV)
ac->num_ports = MAX_PORTS_IN_MANA_DEV;
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index 70d62bc32837..145cc59dfc19 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -399,8 +399,10 @@ struct gdma_context {
struct device *dev;
struct dentry *mana_pci_debugfs;
- /* Per-vPort max number of queues */
+ /* Hardware max number of queues */
unsigned int max_num_queues;
+ /* Per-vPort max number of queues */
+ unsigned int max_num_queues_vport;
unsigned int max_num_msix;
unsigned int num_msix_usable;
struct xarray irq_contexts;
@@ -447,6 +449,12 @@ struct gdma_context {
struct workqueue_struct *service_wq;
unsigned long flags;
+
+ /* Indicate if this device is sharing MSI for EQs on MANA */
+ bool msi_sharing;
+
+ /* Bitmap tracks where MSI is allocated when it is not shared for EQs */
+ unsigned long *msi_bitmap;
};
static inline bool mana_gd_is_mana(struct gdma_dev *gd)
@@ -1019,4 +1027,7 @@ int mana_gd_resume(struct pci_dev *pdev);
bool mana_need_log(struct gdma_context *gc, int err);
+int mana_gd_query_device_cfg(struct gdma_context *gc, u32 proto_major_ver,
+ u32 proto_minor_ver, u32 proto_micro_ver,
+ u16 *max_num_vports, u8 *bm_hostmode);
#endif /* _GDMA_H */
--
2.43.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* [PATCH net-next v12 3/6] net: mana: Introduce GIC context with refcounting for interrupt management
2026-06-05 0:57 [PATCH net-next v12 0/6] net: mana: Per-vPort EQ and MSI-X management Long Li
2026-06-05 0:57 ` [PATCH net-next v12 1/6] net: mana: Create separate EQs for each vPort Long Li
2026-06-05 0:57 ` [PATCH net-next v12 2/6] net: mana: Query device capabilities and configure MSI-X sharing for EQs Long Li
@ 2026-06-05 0:57 ` Long Li
2026-06-05 0:57 ` [PATCH net-next v12 4/6] net: mana: Use GIC functions to allocate global EQs Long Li
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Long Li @ 2026-06-05 0:57 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui, shradhagupta
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
To allow Ethernet EQs to use dedicated or shared MSI-X vectors and RDMA
EQs to share the same MSI-X, introduce a GIC (GDMA IRQ Context) with
reference counting. This allows the driver to create an interrupt context
on an assigned or unassigned MSI-X vector and share it across multiple
EQ consumers.
Signed-off-by: Long Li <longli@microsoft.com>
---
.../net/ethernet/microsoft/mana/gdma_main.c | 169 ++++++++++++++++++
include/net/mana/gdma.h | 12 ++
2 files changed, 181 insertions(+)
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 786e1d4cd4fe..fd643300a427 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -871,6 +871,10 @@ static int mana_gd_register_irq(struct gdma_queue *queue,
}
queue->eq.msix_index = msi_index;
+ /* The caller acquired a GIC reference via mana_gd_get_gic().
+ * That refcount prevents mana_gd_put_gic() from erasing this
+ * irq_contexts entry concurrently.
+ */
gic = xa_load(&gc->irq_contexts, msi_index);
if (WARN_ON(!gic))
return -EINVAL;
@@ -898,6 +902,10 @@ static void mana_gd_deregister_irq(struct gdma_queue *queue)
if (WARN_ON(msix_index >= gc->num_msix_usable))
return;
+ /* The caller releases the GIC reference via mana_gd_put_gic()
+ * after this function returns. The refcount guarantees this
+ * irq_contexts entry is still valid.
+ */
gic = xa_load(&gc->irq_contexts, msix_index);
if (WARN_ON(!gic))
return;
@@ -1679,6 +1687,166 @@ static irqreturn_t mana_gd_intr(int irq, void *arg)
return IRQ_HANDLED;
}
+void mana_gd_put_gic(struct gdma_context *gc, bool use_msi_bitmap, int msi)
+{
+ struct pci_dev *dev = to_pci_dev(gc->dev);
+ struct gdma_irq_context *gic;
+ struct msi_map irq_map;
+ int irq;
+
+ mutex_lock(&gc->gic_mutex);
+
+ gic = xa_load(&gc->irq_contexts, msi);
+ if (WARN_ON(!gic)) {
+ mutex_unlock(&gc->gic_mutex);
+ return;
+ }
+
+ if (use_msi_bitmap)
+ gic->bitmap_refs--;
+
+ if (use_msi_bitmap && gic->bitmap_refs == 0)
+ clear_bit(msi, gc->msi_bitmap);
+
+ if (!refcount_dec_and_test(&gic->refcount))
+ goto out;
+
+ irq = gic->irq;
+
+ irq_update_affinity_hint(irq, NULL);
+ free_irq(irq, gic);
+
+ if (gic->dyn_msix) {
+ irq_map.virq = irq;
+ irq_map.index = msi;
+ pci_msix_free_irq(dev, irq_map);
+ }
+
+ xa_erase(&gc->irq_contexts, msi);
+ kfree(gic);
+
+out:
+ mutex_unlock(&gc->gic_mutex);
+}
+EXPORT_SYMBOL_NS(mana_gd_put_gic, "NET_MANA");
+
+/*
+ * Get a GIC (GDMA IRQ Context) on a MSI vector
+ * a MSI can be shared between different EQs, this function supports setting
+ * up separate MSIs using a bitmap, or directly using the MSI index
+ *
+ * @use_msi_bitmap:
+ * True if MSI is assigned by this function on available slots from bitmap.
+ * False if MSI is passed from *msi_requested
+ */
+struct gdma_irq_context *mana_gd_get_gic(struct gdma_context *gc,
+ bool use_msi_bitmap,
+ int *msi_requested)
+{
+ struct pci_dev *dev = to_pci_dev(gc->dev);
+ struct gdma_irq_context *gic;
+ struct msi_map irq_map = { };
+ int irq;
+ int msi;
+ int err;
+
+ mutex_lock(&gc->gic_mutex);
+
+ if (use_msi_bitmap) {
+ msi = find_first_zero_bit(gc->msi_bitmap, gc->num_msix_usable);
+ if (msi >= gc->num_msix_usable) {
+ dev_err(gc->dev, "No free MSI vectors available\n");
+ gic = ERR_PTR(-ENOSPC);
+ goto out;
+ }
+ *msi_requested = msi;
+ } else {
+ msi = *msi_requested;
+ }
+
+ gic = xa_load(&gc->irq_contexts, msi);
+ if (gic) {
+ refcount_inc(&gic->refcount);
+ if (use_msi_bitmap) {
+ gic->bitmap_refs++;
+ set_bit(msi, gc->msi_bitmap);
+ }
+ goto out;
+ }
+
+ irq = pci_irq_vector(dev, msi);
+ if (irq == -EINVAL) {
+ irq_map = pci_msix_alloc_irq_at(dev, msi, NULL);
+ if (!irq_map.virq) {
+ err = irq_map.index;
+ dev_err(gc->dev,
+ "Failed to alloc irq_map msi %d err %d\n",
+ msi, err);
+ gic = ERR_PTR(err);
+ goto out;
+ }
+ irq = irq_map.virq;
+ msi = irq_map.index;
+ *msi_requested = msi;
+ }
+
+ gic = kzalloc(sizeof(*gic), GFP_KERNEL);
+ if (!gic) {
+ gic = ERR_PTR(-ENOMEM);
+ if (irq_map.virq)
+ pci_msix_free_irq(dev, irq_map);
+ goto out;
+ }
+
+ gic->handler = mana_gd_process_eq_events;
+ gic->msi = msi;
+ gic->irq = irq;
+ INIT_LIST_HEAD(&gic->eq_list);
+ spin_lock_init(&gic->lock);
+
+ if (!gic->msi)
+ snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_hwc@pci:%s",
+ pci_name(dev));
+ else
+ snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_msi%d@pci:%s",
+ gic->msi, pci_name(dev));
+
+ err = request_irq(irq, mana_gd_intr, 0, gic->name, gic);
+ if (err) {
+ dev_err(gc->dev, "Failed to request irq %d %s\n",
+ irq, gic->name);
+ kfree(gic);
+ gic = ERR_PTR(err);
+ if (irq_map.virq)
+ pci_msix_free_irq(dev, irq_map);
+ goto out;
+ }
+
+ gic->dyn_msix = !!irq_map.virq;
+ refcount_set(&gic->refcount, 1);
+ gic->bitmap_refs = use_msi_bitmap ? 1 : 0;
+
+ err = xa_err(xa_store(&gc->irq_contexts, msi, gic, GFP_KERNEL));
+ if (err) {
+ dev_err(gc->dev, "Failed to store irq context for msi %d: %d\n",
+ msi, err);
+ free_irq(irq, gic);
+ kfree(gic);
+ gic = ERR_PTR(err);
+ if (irq_map.virq)
+ pci_msix_free_irq(dev, irq_map);
+ goto out;
+ }
+
+ if (use_msi_bitmap)
+ set_bit(msi, gc->msi_bitmap);
+
+out:
+ mutex_unlock(&gc->gic_mutex);
+ return gic;
+}
+EXPORT_SYMBOL_NS(mana_gd_get_gic, "NET_MANA");
+
int mana_gd_alloc_res_map(u32 res_avail, struct gdma_resource *r)
{
r->map = bitmap_zalloc(res_avail, GFP_KERNEL);
@@ -2180,6 +2348,7 @@ static int mana_gd_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
goto release_region;
mutex_init(&gc->eq_test_event_mutex);
+ mutex_init(&gc->gic_mutex);
pci_set_drvdata(pdev, gc);
gc->bar0_pa = pci_resource_start(pdev, 0);
gc->bar0_size = pci_resource_len(pdev, 0);
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index 145cc59dfc19..e3ee85c614ec 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -388,6 +388,11 @@ struct gdma_irq_context {
spinlock_t lock;
struct list_head eq_list;
char name[MANA_IRQ_NAME_SZ];
+ unsigned int msi;
+ unsigned int irq;
+ refcount_t refcount;
+ unsigned int bitmap_refs;
+ bool dyn_msix;
};
enum gdma_context_flags {
@@ -450,6 +455,9 @@ struct gdma_context {
unsigned long flags;
+ /* Protect access to GIC context */
+ struct mutex gic_mutex;
+
/* Indicate if this device is sharing MSI for EQs on MANA */
bool msi_sharing;
@@ -1027,6 +1035,10 @@ int mana_gd_resume(struct pci_dev *pdev);
bool mana_need_log(struct gdma_context *gc, int err);
+struct gdma_irq_context *mana_gd_get_gic(struct gdma_context *gc,
+ bool use_msi_bitmap,
+ int *msi_requested);
+void mana_gd_put_gic(struct gdma_context *gc, bool use_msi_bitmap, int msi);
int mana_gd_query_device_cfg(struct gdma_context *gc, u32 proto_major_ver,
u32 proto_minor_ver, u32 proto_micro_ver,
u16 *max_num_vports, u8 *bm_hostmode);
--
2.43.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* [PATCH net-next v12 4/6] net: mana: Use GIC functions to allocate global EQs
2026-06-05 0:57 [PATCH net-next v12 0/6] net: mana: Per-vPort EQ and MSI-X management Long Li
` (2 preceding siblings ...)
2026-06-05 0:57 ` [PATCH net-next v12 3/6] net: mana: Introduce GIC context with refcounting for interrupt management Long Li
@ 2026-06-05 0:57 ` Long Li
2026-06-05 0:57 ` [PATCH net-next v12 5/6] net: mana: Allocate interrupt context for each EQ when creating vPort Long Li
2026-06-05 0:57 ` [PATCH net-next v12 6/6] RDMA/mana_ib: Allocate interrupt contexts on EQs Long Li
5 siblings, 0 replies; 7+ messages in thread
From: Long Li @ 2026-06-05 0:57 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui, shradhagupta
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
Replace the GDMA global interrupt setup code with the new GIC allocation
and release functions for managing interrupt contexts.
This changes the per-queue interrupt names in /proc/interrupts from
mana_q0, mana_q1, ... to mana_msi1, mana_msi2, ... to reflect the
MSI-X index rather than a zero-based queue number. The HWC interrupt
name (mana_hwc) is unchanged.
Signed-off-by: Long Li <longli@microsoft.com>
---
.../net/ethernet/microsoft/mana/gdma_main.c | 110 ++++--------------
1 file changed, 22 insertions(+), 88 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index fd643300a427..653c091ad296 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -1949,7 +1949,7 @@ static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec)
struct gdma_context *gc = pci_get_drvdata(pdev);
struct gdma_irq_context *gic;
bool skip_first_cpu = false;
- int *irqs, irq, err, i;
+ int *irqs, err, i, msi;
irqs = kmalloc_objs(int, nvec);
if (!irqs)
@@ -1962,30 +1962,14 @@ static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec)
* further used in irq_setup()
*/
for (i = 1; i <= nvec; i++) {
- gic = kzalloc_obj(*gic);
- if (!gic) {
- err = -ENOMEM;
+ msi = i;
+ gic = mana_gd_get_gic(gc, false, &msi);
+ if (IS_ERR(gic)) {
+ err = PTR_ERR(gic);
goto free_irq;
}
- gic->handler = mana_gd_process_eq_events;
- INIT_LIST_HEAD(&gic->eq_list);
- spin_lock_init(&gic->lock);
-
- snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_q%d@pci:%s",
- i - 1, pci_name(pdev));
-
- /* one pci vector is already allocated for HWC */
- irqs[i - 1] = pci_irq_vector(pdev, i);
- if (irqs[i - 1] < 0) {
- err = irqs[i - 1];
- goto free_current_gic;
- }
-
- err = request_irq(irqs[i - 1], mana_gd_intr, 0, gic->name, gic);
- if (err)
- goto free_current_gic;
- xa_store(&gc->irq_contexts, i, gic, GFP_KERNEL);
+ irqs[i - 1] = gic->irq;
}
/*
@@ -2007,20 +1991,9 @@ static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec)
kfree(irqs);
return 0;
-free_current_gic:
- kfree(gic);
free_irq:
- for (i -= 1; i > 0; i--) {
- irq = pci_irq_vector(pdev, i);
- gic = xa_load(&gc->irq_contexts, i);
- if (WARN_ON(!gic))
- continue;
-
- irq_update_affinity_hint(irq, NULL);
- free_irq(irq, gic);
- xa_erase(&gc->irq_contexts, i);
- kfree(gic);
- }
+ for (i -= 1; i > 0; i--)
+ mana_gd_put_gic(gc, false, i);
kfree(irqs);
return err;
}
@@ -2029,9 +2002,9 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev, int nvec)
{
struct gdma_context *gc = pci_get_drvdata(pdev);
struct gdma_irq_context *gic;
- int *irqs, *start_irqs, irq;
+ int *irqs, *start_irqs;
unsigned int cpu;
- int err, i;
+ int err, i, msi;
irqs = kmalloc_objs(int, nvec);
if (!irqs)
@@ -2040,34 +2013,14 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev, int nvec)
start_irqs = irqs;
for (i = 0; i < nvec; i++) {
- gic = kzalloc_obj(*gic);
- if (!gic) {
- err = -ENOMEM;
+ msi = i;
+ gic = mana_gd_get_gic(gc, false, &msi);
+ if (IS_ERR(gic)) {
+ err = PTR_ERR(gic);
goto free_irq;
}
- gic->handler = mana_gd_process_eq_events;
- INIT_LIST_HEAD(&gic->eq_list);
- spin_lock_init(&gic->lock);
-
- if (!i)
- snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_hwc@pci:%s",
- pci_name(pdev));
- else
- snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_q%d@pci:%s",
- i - 1, pci_name(pdev));
-
- irqs[i] = pci_irq_vector(pdev, i);
- if (irqs[i] < 0) {
- err = irqs[i];
- goto free_current_gic;
- }
-
- err = request_irq(irqs[i], mana_gd_intr, 0, gic->name, gic);
- if (err)
- goto free_current_gic;
-
- xa_store(&gc->irq_contexts, i, gic, GFP_KERNEL);
+ irqs[i] = gic->irq;
}
/* If number of IRQ is one extra than number of online CPUs,
@@ -2096,20 +2049,9 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev, int nvec)
kfree(start_irqs);
return 0;
-free_current_gic:
- kfree(gic);
free_irq:
- for (i -= 1; i >= 0; i--) {
- irq = pci_irq_vector(pdev, i);
- gic = xa_load(&gc->irq_contexts, i);
- if (WARN_ON(!gic))
- continue;
-
- irq_update_affinity_hint(irq, NULL);
- free_irq(irq, gic);
- xa_erase(&gc->irq_contexts, i);
- kfree(gic);
- }
+ for (i -= 1; i >= 0; i--)
+ mana_gd_put_gic(gc, false, i);
kfree(start_irqs);
return err;
@@ -2183,28 +2125,20 @@ static int mana_gd_setup_remaining_irqs(struct pci_dev *pdev)
static void mana_gd_remove_irqs(struct pci_dev *pdev)
{
struct gdma_context *gc = pci_get_drvdata(pdev);
- struct gdma_irq_context *gic;
- int irq, i;
+ int i;
if (gc->max_num_msix < 1)
return;
for (i = 0; i < gc->max_num_msix; i++) {
- irq = pci_irq_vector(pdev, i);
- if (irq < 0)
- continue;
-
- gic = xa_load(&gc->irq_contexts, i);
- if (WARN_ON(!gic))
+ if (!xa_load(&gc->irq_contexts, i))
continue;
- /* Need to clear the hint before free_irq */
- irq_update_affinity_hint(irq, NULL);
- free_irq(irq, gic);
- xa_erase(&gc->irq_contexts, i);
- kfree(gic);
+ mana_gd_put_gic(gc, false, i);
}
+ WARN_ON(!xa_empty(&gc->irq_contexts));
+
pci_free_irq_vectors(pdev);
bitmap_free(gc->msi_bitmap);
--
2.43.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* [PATCH net-next v12 5/6] net: mana: Allocate interrupt context for each EQ when creating vPort
2026-06-05 0:57 [PATCH net-next v12 0/6] net: mana: Per-vPort EQ and MSI-X management Long Li
` (3 preceding siblings ...)
2026-06-05 0:57 ` [PATCH net-next v12 4/6] net: mana: Use GIC functions to allocate global EQs Long Li
@ 2026-06-05 0:57 ` Long Li
2026-06-05 0:57 ` [PATCH net-next v12 6/6] RDMA/mana_ib: Allocate interrupt contexts on EQs Long Li
5 siblings, 0 replies; 7+ messages in thread
From: Long Li @ 2026-06-05 0:57 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui, shradhagupta
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
Use GIC functions to create a dedicated interrupt context or acquire a
shared interrupt context for each EQ when setting up a vPort.
The caller now owns the GIC reference across the EQ create/destroy
lifecycle: mana_create_eq() calls mana_gd_get_gic() before creating
each EQ and mana_destroy_eq() calls mana_gd_put_gic() after destroying
it. The msix_index invalidation is moved from mana_gd_deregister_irq()
to the mana_gd_create_eq() error path so that mana_destroy_eq() can
read the index before teardown.
Signed-off-by: Long Li <longli@microsoft.com>
---
.../net/ethernet/microsoft/mana/gdma_main.c | 2 +-
drivers/net/ethernet/microsoft/mana/mana_en.c | 18 +++++++++++++++++-
include/net/mana/gdma.h | 1 +
3 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 653c091ad296..6feedbbdebc3 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -919,7 +919,6 @@ static void mana_gd_deregister_irq(struct gdma_queue *queue)
}
spin_unlock_irqrestore(&gic->lock, flags);
- queue->eq.msix_index = INVALID_PCI_MSIX_INDEX;
synchronize_rcu();
}
@@ -1034,6 +1033,7 @@ static int mana_gd_create_eq(struct gdma_dev *gd,
out:
dev_err(dev, "Failed to create EQ: %d\n", err);
mana_gd_destroy_eq(gc, false, queue);
+ queue->eq.msix_index = INVALID_PCI_MSIX_INDEX;
return err;
}
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 3ec8e94e7c17..beada8660258 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1645,6 +1645,7 @@ void mana_destroy_eq(struct mana_port_context *apc)
struct mana_context *ac = apc->ac;
struct gdma_context *gc = ac->gdma_dev->gdma_context;
struct gdma_queue *eq;
+ unsigned int msi;
int i;
if (!apc->eqs)
@@ -1658,7 +1659,9 @@ void mana_destroy_eq(struct mana_port_context *apc)
if (!eq)
continue;
+ msi = eq->eq.msix_index;
mana_gd_destroy_queue(gc, eq);
+ mana_gd_put_gic(gc, !gc->msi_sharing, msi);
}
kfree(apc->eqs);
@@ -1675,6 +1678,7 @@ static void mana_create_eq_debugfs(struct mana_port_context *apc, int i)
eq.mana_eq_debugfs = debugfs_create_dir(eqnum, apc->mana_eqs_debugfs);
debugfs_create_u32("head", 0400, eq.mana_eq_debugfs, &eq.eq->head);
debugfs_create_u32("tail", 0400, eq.mana_eq_debugfs, &eq.eq->tail);
+ debugfs_create_u32("irq", 0400, eq.mana_eq_debugfs, &eq.eq->eq.irq);
debugfs_create_file("eq_dump", 0400, eq.mana_eq_debugfs, eq.eq, &mana_dbg_q_fops);
}
@@ -1683,7 +1687,9 @@ int mana_create_eq(struct mana_port_context *apc)
struct gdma_dev *gd = apc->ac->gdma_dev;
struct gdma_context *gc = gd->gdma_context;
struct gdma_queue_spec spec = {};
+ struct gdma_irq_context *gic;
int err;
+ int msi;
int i;
if (WARN_ON(apc->eqs))
@@ -1703,12 +1709,22 @@ int mana_create_eq(struct mana_port_context *apc)
debugfs_create_dir("EQs", apc->mana_port_debugfs);
for (i = 0; i < apc->num_queues; i++) {
- spec.eq.msix_index = (i + 1) % gc->num_msix_usable;
+ msi = (i + 1) % gc->num_msix_usable;
+
+ gic = mana_gd_get_gic(gc, !gc->msi_sharing, &msi);
+ if (IS_ERR(gic)) {
+ err = PTR_ERR(gic);
+ goto out;
+ }
+ spec.eq.msix_index = msi;
+
err = mana_gd_create_mana_eq(gd, &spec, &apc->eqs[i].eq);
if (err) {
dev_err(gc->dev, "Failed to create EQ %d : %d\n", i, err);
+ mana_gd_put_gic(gc, !gc->msi_sharing, msi);
goto out;
}
+ apc->eqs[i].eq->eq.irq = gic->irq;
mana_create_eq_debugfs(apc, i);
}
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index e3ee85c614ec..6a65fedae38f 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -342,6 +342,7 @@ struct gdma_queue {
void *context;
unsigned int msix_index;
+ unsigned int irq;
u32 log2_throttle_limit;
} eq;
--
2.43.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* [PATCH net-next v12 6/6] RDMA/mana_ib: Allocate interrupt contexts on EQs
2026-06-05 0:57 [PATCH net-next v12 0/6] net: mana: Per-vPort EQ and MSI-X management Long Li
` (4 preceding siblings ...)
2026-06-05 0:57 ` [PATCH net-next v12 5/6] net: mana: Allocate interrupt context for each EQ when creating vPort Long Li
@ 2026-06-05 0:57 ` Long Li
5 siblings, 0 replies; 7+ messages in thread
From: Long Li @ 2026-06-05 0:57 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui, shradhagupta
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
Use the GIC functions to allocate interrupt contexts for RDMA EQs. These
interrupt contexts may be shared with Ethernet EQs when MSI-X vectors
are limited.
The driver now supports allocating dedicated MSI-X for each EQ. Indicate
this capability through driver capability bits. The RDMA EQs pass
use_msi_bitmap=false to share MSI-X vectors with Ethernet, while the
capability flag advertises that the driver supports per-vPort EQ
separation when hardware has sufficient vectors.
Populate eq.irq on all RDMA EQs for consistency with the Ethernet path.
Also relocate the GDMA_DRV_CAP_FLAG_1_HW_VPORT_LINK_AWARE define to its
numeric BIT(6) position among the other capability flags.
Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: Leon Romanovsky <leon@kernel.org>
---
drivers/infiniband/hw/mana/main.c | 43 +++++++++++++++++++++++++------
include/net/mana/gdma.h | 7 +++--
2 files changed, 40 insertions(+), 10 deletions(-)
diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
index f42ea20cb75d..142047847f38 100644
--- a/drivers/infiniband/hw/mana/main.c
+++ b/drivers/infiniband/hw/mana/main.c
@@ -765,7 +765,8 @@ int mana_ib_create_eqs(struct mana_ib_dev *mdev)
{
struct gdma_context *gc = mdev_to_gc(mdev);
struct gdma_queue_spec spec = {};
- int err, i;
+ struct gdma_irq_context *gic;
+ int err, i, msi;
spec.type = GDMA_EQ;
spec.monitor_avl_buf = false;
@@ -773,11 +774,19 @@ int mana_ib_create_eqs(struct mana_ib_dev *mdev)
spec.eq.callback = mana_ib_event_handler;
spec.eq.context = mdev;
spec.eq.log2_throttle_limit = LOG2_EQ_THROTTLE;
- spec.eq.msix_index = 0;
+
+ msi = 0;
+ gic = mana_gd_get_gic(gc, false, &msi);
+ if (IS_ERR(gic))
+ return PTR_ERR(gic);
+ spec.eq.msix_index = msi;
err = mana_gd_create_mana_eq(mdev->gdma_dev, &spec, &mdev->fatal_err_eq);
- if (err)
+ if (err) {
+ mana_gd_put_gic(gc, false, 0);
return err;
+ }
+ mdev->fatal_err_eq->eq.irq = gic->irq;
mdev->eqs = kzalloc_objs(struct gdma_queue *,
mdev->ib_dev.num_comp_vectors);
@@ -787,32 +796,50 @@ int mana_ib_create_eqs(struct mana_ib_dev *mdev)
}
spec.eq.callback = NULL;
for (i = 0; i < mdev->ib_dev.num_comp_vectors; i++) {
- spec.eq.msix_index = (i + 1) % gc->num_msix_usable;
+ msi = (i + 1) % gc->num_msix_usable;
+
+ gic = mana_gd_get_gic(gc, false, &msi);
+ if (IS_ERR(gic)) {
+ err = PTR_ERR(gic);
+ goto destroy_eqs;
+ }
+ spec.eq.msix_index = msi;
+
err = mana_gd_create_mana_eq(mdev->gdma_dev, &spec, &mdev->eqs[i]);
- if (err)
+ if (err) {
+ mana_gd_put_gic(gc, false, msi);
goto destroy_eqs;
+ }
+ mdev->eqs[i]->eq.irq = gic->irq;
}
return 0;
destroy_eqs:
- while (i-- > 0)
+ while (i-- > 0) {
mana_gd_destroy_queue(gc, mdev->eqs[i]);
+ mana_gd_put_gic(gc, false, (i + 1) % gc->num_msix_usable);
+ }
kfree(mdev->eqs);
destroy_fatal_eq:
mana_gd_destroy_queue(gc, mdev->fatal_err_eq);
+ mana_gd_put_gic(gc, false, 0);
return err;
}
void mana_ib_destroy_eqs(struct mana_ib_dev *mdev)
{
struct gdma_context *gc = mdev_to_gc(mdev);
- int i;
+ int i, msi;
mana_gd_destroy_queue(gc, mdev->fatal_err_eq);
+ mana_gd_put_gic(gc, false, 0);
- for (i = 0; i < mdev->ib_dev.num_comp_vectors; i++)
+ for (i = 0; i < mdev->ib_dev.num_comp_vectors; i++) {
mana_gd_destroy_queue(gc, mdev->eqs[i]);
+ msi = (i + 1) % gc->num_msix_usable;
+ mana_gd_put_gic(gc, false, msi);
+ }
kfree(mdev->eqs);
}
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index 6a65fedae38f..78afd696b08b 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -616,6 +616,7 @@ enum {
#define GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG BIT(3)
#define GDMA_DRV_CAP_FLAG_1_GDMA_PAGES_4MB_1GB_2GB BIT(4)
#define GDMA_DRV_CAP_FLAG_1_VARIABLE_INDIRECTION_TABLE_SUPPORT BIT(5)
+#define GDMA_DRV_CAP_FLAG_1_HW_VPORT_LINK_AWARE BIT(6)
/* Driver can handle holes (zeros) in the device list */
#define GDMA_DRV_CAP_FLAG_1_DEV_LIST_HOLES_SUP BIT(11)
@@ -632,7 +633,8 @@ enum {
/* Driver detects stalled send queues and recovers them */
#define GDMA_DRV_CAP_FLAG_1_HANDLE_STALL_SQ_RECOVERY BIT(18)
-#define GDMA_DRV_CAP_FLAG_1_HW_VPORT_LINK_AWARE BIT(6)
+/* Driver supports separate EQ/MSIs for each vPort */
+#define GDMA_DRV_CAP_FLAG_1_EQ_MSI_UNSHARE_MULTI_VPORT BIT(19)
/* Driver supports linearizing the skb when num_sge exceeds hardware limit */
#define GDMA_DRV_CAP_FLAG_1_SKB_LINEARIZE BIT(20)
@@ -660,7 +662,8 @@ enum {
GDMA_DRV_CAP_FLAG_1_SKB_LINEARIZE | \
GDMA_DRV_CAP_FLAG_1_PROBE_RECOVERY | \
GDMA_DRV_CAP_FLAG_1_HANDLE_STALL_SQ_RECOVERY | \
- GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECOVERY)
+ GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECOVERY | \
+ GDMA_DRV_CAP_FLAG_1_EQ_MSI_UNSHARE_MULTI_VPORT)
#define GDMA_DRV_CAP_FLAGS2 0
--
2.43.0
^ permalink raw reply related [flat|nested] 7+ messages in thread