From: Jakub Kicinski <kuba@kernel.org>
To: longli@microsoft.com
Cc: Jakub Kicinski <kuba@kernel.org>,
kotaranov@microsoft.com, davem@davemloft.net, pabeni@redhat.com,
edumazet@google.com, andrew+netdev@lunn.ch, jgg@ziepe.ca,
leon@kernel.org, haiyangz@microsoft.com, kys@microsoft.com,
wei.liu@kernel.org, decui@microsoft.com,
shradhagupta@linux.microsoft.com, horms@kernel.org,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next v7 1/6] net: mana: Create separate EQs for each vPort
Date: Mon, 11 May 2026 19:14:32 -0700 [thread overview]
Message-ID: <20260512021432.850108-1-kuba@kernel.org> (raw)
In-Reply-To: <20260507191237.438671-2-longli@microsoft.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
Subject: net: mana: Create separate EQs for each vPort
This commit prepares for assigning vPorts to dedicated MSI-X vectors by
removing EQ sharing among the vPorts and creating dedicated EQs for each
vPort. It moves the EQ definition from struct mana_context to struct
mana_port_context and exports related functions for the RDMA driver.
> diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
> index ac5e75dd3494d..8000ab6e8beb9 100644
> --- a/drivers/infiniband/hw/mana/main.c
> +++ b/drivers/infiniband/hw/mana/main.c
> @@ -55,15 +57,22 @@ int mana_ib_cfg_vport(struct mana_ib_dev *dev, u32 port, struct mana_ib_pd *pd,
> return err;
> }
>
> - mutex_unlock(&pd->vport_mutex);
>
> pd->tx_shortform_allowed = mpc->tx_shortform_allowed;
> pd->tx_vp_offset = mpc->tx_vp_offset;
> + err = mana_create_eq(mpc);
> + if (err) {
> + mana_uncfg_vport(mpc);
> + pd->vport_use_count--;
> + }
>
> - ibdev_dbg(&dev->ib_dev, "vport handle %llx pdid %x doorbell_id %x\n",
> - mpc->port_handle, pd->pdn, doorbell_id);
> + mutex_unlock(&pd->vport_mutex);
How does this handle a Protection Domain shared across multiple physical
ports?
Looking at the earlier part of mana_ib_cfg_vport(), if a PD is already
in use, the code increments pd->vport_use_count and returns early:
pd->vport_use_count++;
if (pd->vport_use_count > 1) {
ibdev_dbg(&dev->ib_dev,
"Skip as this PD is already configured vport\n");
mutex_unlock(&pd->vport_mutex);
return 0;
}
If a second port is configured with the same PD, it skips calling
mana_create_eq() for the new port's mpc. Will this cause mpc->eqs to
remain NULL for the second port and fail subsequent queue creations?
[ ... ]
> diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
> index 0fbcf449c134b..6167742df9d29 100644
> --- a/drivers/infiniband/hw/mana/qp.c
> +++ b/drivers/infiniband/hw/mana/qp.c
> @@ -171,7 +171,15 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
> cq_spec.gdma_region = cq->queue.gdma_region;
> cq_spec.queue_size = cq->cqe * COMP_ENTRY_SIZE;
> cq_spec.modr_ctx_id = 0;
> - eq = &mpc->ac->eqs[cq->comp_vector];
> + /* EQs are created when a raw QP configures the vport.
> + * A raw QP must be created before creating rwq_ind_tbl.
> + */
> + if (!mpc->eqs) {
> + ret = -EINVAL;
> + i--;
> + goto fail;
> + }
Does this explicit i-- cause a resource leak in the error path?
The fail block cleans up previously created WQ objects using a post-decrement
while loop:
fail:
while (i-- > 0) {
ibwq = ind_tbl->ind_tbl[i];
ibcq = ibwq->cq;
...
mana_destroy_wq_obj(mpc, GDMA_RQ, wq->rx_object);
}
If we manually decrement i before jumping to the fail block, the while loop
will decrement it again prior to evaluation, completely skipping the cleanup
of the WQ object that was just successfully created at the original index i.
> + eq = &mpc->eqs[cq->comp_vector % mpc->num_queues];
What guarantees the lifetime of the mpc->eqs array for active RSS QPs?
Since mana_ib_create_qp_rss() doesn't take a reference count or hold a lock
that prevents mana_destroy_eq() from running, if the Ethernet interface is
brought down or the RAW QP is destroyed, could mana_destroy_eq() free the
array while RSS QPs are still actively using it?
Additionally, can this out-of-bounds read if an ethtool channel update
changes mpc->num_queues while the interface is down?
If a RAW QP is created while the interface is down, mpc->eqs is sized to the
old channel count. If the channel count is then increased, mpc->num_queues
updates but the array isn't reallocated because the interface is down and
queue deallocation was skipped. This could lead to accessing past the end of
the allocation both here and in mana_destroy_eq().
[ ... ]
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index b2faa7cf398f3..cef86c9310543 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
[ ... ]
> -static void mana_create_eq_debugfs(struct mana_context *ac, int i)
> +static void mana_create_eq_debugfs(struct mana_port_context *apc, int i)
> {
> - struct mana_eq eq = ac->eqs[i];
> + struct mana_eq eq = apc->eqs[i];
> char eqnum[32];
>
> sprintf(eqnum, "eq%d", i);
> - eq.mana_eq_debugfs = debugfs_create_dir(eqnum, ac->mana_eqs_debugfs);
> + eq.mana_eq_debugfs = debugfs_create_dir(eqnum, apc->mana_eqs_debugfs);
> }
Does this update the actual debugfs pointer in the array?
By using struct mana_eq eq = apc->eqs[i];, it creates a local by-value copy
on the stack. The debugfs directory is created, but the pointer in the actual
apc->eqs array remains NULL. Would it be better to use a pointer reference
like struct mana_eq *eq = &apc->eqs[i]; here?
[ ... ]
> -static int mana_create_eq(struct mana_context *ac)
> +int mana_create_eq(struct mana_port_context *apc)
> {
> - struct gdma_dev *gd = ac->gdma_dev;
> + struct gdma_dev *gd = apc->ac->gdma_dev;
> struct gdma_context *gc = gd->gdma_context;
> struct gdma_queue_spec spec = {};
> int err;
> int i;
>
> - ac->eqs = kzalloc_objs(struct mana_eq, gc->max_num_queues);
> - if (!ac->eqs)
> + WARN_ON(apc->eqs);
> + apc->eqs = kzalloc_objs(struct mana_eq, apc->num_queues);
> + if (!apc->eqs)
> return -ENOMEM;
>
> spec.type = GDMA_EQ;
> spec.monitor_avl_buf = false;
> spec.queue_size = EQ_SIZE;
> spec.eq.callback = NULL;
> - spec.eq.context = ac->eqs;
> + spec.eq.context = apc->eqs;
> spec.eq.log2_throttle_limit = LOG2_EQ_THROTTLE;
>
> - ac->mana_eqs_debugfs = debugfs_create_dir("EQs", gc->mana_pci_debugfs);
> + apc->mana_eqs_debugfs = debugfs_create_dir("EQs",
> + apc->mana_port_debugfs);
>
> - for (i = 0; i < gc->max_num_queues; i++) {
> + for (i = 0; i < apc->num_queues; i++) {
> spec.eq.msix_index = (i + 1) % gc->num_msix_usable;
> - err = mana_gd_create_mana_eq(gd, &spec, &ac->eqs[i].eq);
> + err = mana_gd_create_mana_eq(gd, &spec, &apc->eqs[i].eq);
Could there be a race condition between mana_create_eq() and
mana_ib_create_qp_rss() here?
apc->eqs is assigned immediately after allocation, but before the individual
eq structures are populated by mana_gd_create_mana_eq() in this loop.
If an RDMA application concurrently calls mana_ib_create_qp_rss(), it checks
if (!mpc->eqs). That check will pass because the array is allocated, but
accessing eq->eq->id could result in a NULL pointer dereference because the
element has not been fully initialized yet.
next prev parent reply other threads:[~2026-05-12 2:14 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-07 19:12 [PATCH net-next v7 0/6] net: mana: Per-vPort EQ and MSI-X interrupt management Long Li
2026-05-07 19:12 ` [PATCH net-next v7 1/6] net: mana: Create separate EQs for each vPort Long Li
2026-05-12 2:13 ` Jakub Kicinski
2026-05-12 2:14 ` Jakub Kicinski [this message]
2026-05-07 19:12 ` [PATCH net-next v7 2/6] net: mana: Query device capabilities and configure MSI-X sharing for EQs Long Li
2026-05-12 2:14 ` Jakub Kicinski
2026-05-07 19:12 ` [PATCH net-next v7 3/6] net: mana: Introduce GIC context with refcounting for interrupt management Long Li
2026-05-07 19:12 ` [PATCH net-next v7 4/6] net: mana: Use GIC functions to allocate global EQs Long Li
2026-05-12 2:14 ` Jakub Kicinski
2026-05-12 2:14 ` Jakub Kicinski
2026-05-07 19:12 ` [PATCH net-next v7 5/6] net: mana: Allocate interrupt context for each EQ when creating vPort Long Li
2026-05-12 2:14 ` Jakub Kicinski
2026-05-07 19:12 ` [PATCH net-next v7 6/6] RDMA/mana_ib: Allocate interrupt contexts on EQs Long Li
2026-05-12 2:14 ` Jakub Kicinski
2026-05-12 2:15 ` [PATCH net-next v7 0/6] net: mana: Per-vPort EQ and MSI-X interrupt management Jakub Kicinski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260512021432.850108-1-kuba@kernel.org \
--to=kuba@kernel.org \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=decui@microsoft.com \
--cc=edumazet@google.com \
--cc=haiyangz@microsoft.com \
--cc=horms@kernel.org \
--cc=jgg@ziepe.ca \
--cc=kotaranov@microsoft.com \
--cc=kys@microsoft.com \
--cc=leon@kernel.org \
--cc=linux-hyperv@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=longli@microsoft.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=shradhagupta@linux.microsoft.com \
--cc=wei.liu@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox