From: Jakub Kicinski <kuba@kernel.org>
To: longli@microsoft.com
Cc: Jakub Kicinski <kuba@kernel.org>,
kotaranov@microsoft.com, davem@davemloft.net, pabeni@redhat.com,
edumazet@google.com, andrew+netdev@lunn.ch, jgg@ziepe.ca,
leon@kernel.org, haiyangz@microsoft.com, kys@microsoft.com,
wei.liu@kernel.org, decui@microsoft.com,
shradhagupta@linux.microsoft.com, horms@kernel.org,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next v7 1/6] net: mana: Create separate EQs for each vPort
Date: Mon, 11 May 2026 19:13:58 -0700 [thread overview]
Message-ID: <20260512021358.849780-1-kuba@kernel.org> (raw)
In-Reply-To: <20260507191237.438671-2-longli@microsoft.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: mana: Create separate EQs for each vPort
This change removes EQ sharing across vPorts and moves EQ ownership
from struct mana_context to struct mana_port_context. It exports
mana_create_eq() and mana_destroy_eq() so the MANA RDMA driver can
create EQs when a raw QP configures the vport, and also rearranges
mana_alloc_queues()/mana_dealloc_queues() to own the netdev-side
EQ lifecycle.
A few questions on the resulting ownership model below.
> diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
> index ac5e75dd3494..8000ab6e8beb 100644
> --- a/drivers/infiniband/hw/mana/main.c
> +++ b/drivers/infiniband/hw/mana/main.c
> @@ -20,8 +20,10 @@ void mana_ib_uncfg_vport(struct mana_ib_dev *dev, struct mana_ib_pd *pd,
> pd->vport_use_count--;
> WARN_ON(pd->vport_use_count < 0);
>
> - if (!pd->vport_use_count)
> + if (!pd->vport_use_count) {
> + mana_destroy_eq(mpc);
> mana_uncfg_vport(mpc);
> + }
>
> mutex_unlock(&pd->vport_mutex);
> }
[High]
After this change, mpc->eqs has two distinct owners: the RDMA path
frees it here under pd->vport_mutex, and the netdev path frees it under
rtnl_lock via mana_dealloc_queues() -> mana_destroy_eq(). Neither lock
is held by the readers in qp.c. See the comment on the RSS QP path
below.
> diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
> index 645581359cee..6f1043383e8c 100644
> --- a/drivers/infiniband/hw/mana/qp.c
> +++ b/drivers/infiniband/hw/mana/qp.c
> @@ -168,7 +168,15 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
> cq_spec.gdma_region = cq->queue.gdma_region;
> cq_spec.queue_size = cq->cqe * COMP_ENTRY_SIZE;
> cq_spec.modr_ctx_id = 0;
> - eq = &mpc->ac->eqs[cq->comp_vector];
> + /* EQs are created when a raw QP configures the vport.
> + * A raw QP must be created before creating rwq_ind_tbl.
> + */
> + if (!mpc->eqs) {
> + ret = -EINVAL;
> + i--;
> + goto fail;
> + }
> + eq = &mpc->eqs[cq->comp_vector % mpc->num_queues];
> cq_spec.attached_eq = eq->eq->id;
[High]
Is the NULL check on mpc->eqs safe here without any synchronization?
Before this patch, ac->eqs was allocated in mana_probe() and freed in
mana_remove(), so its lifetime spanned the whole driver. With this
patch, apc->eqs can be freed concurrently by either owner:
mana_ib_uncfg_vport() (pd->vport_mutex)
mana_destroy_eq(mpc)
kfree(apc->eqs);
apc->eqs = NULL;
mana_dealloc_queues() (rtnl_lock, ndo_close)
mana_destroy_eq(apc)
kfree(apc->eqs);
apc->eqs = NULL;
This path holds neither lock. Can the 'if (!mpc->eqs)' observation
survive through the subsequent '&mpc->eqs[...]' and 'eq->eq->id'
dereference, or can a concurrent destroyer free apc->eqs between the
check and the deref and turn this into a read of freed memory?
There is also an initialization window in the writer:
apc->eqs = kzalloc_objs(struct mana_eq, apc->num_queues);
...
for (i = 0; i < apc->num_queues; i++) {
...
err = mana_gd_create_mana_eq(gd, &spec, &apc->eqs[i].eq);
apc->eqs is published before apc->eqs[i].eq is filled in. Can a
concurrent reader here see apc->eqs non-NULL but apc->eqs[i].eq still
NULL, and fault on eq->eq->id?
[Medium]
mana_ib advertises ib_dev.num_comp_vectors as gc->max_num_queues in
mana_ib_probe(), but this patch sizes the per-port pool as
apc->num_queues, which defaults to min(gc->max_num_queues,
MANA_DEF_NUM_QUEUES) and can be reduced further via ethtool -L.
eq = &mpc->eqs[cq->comp_vector % mpc->num_queues];
When gc->max_num_queues > apc->num_queues, does this modulo silently
collapse distinct advertised completion vectors onto shared EQs, so
CQs created by userspace with different comp_vector values end up on
the same notification stream? Would it be better to cap
ib_dev.num_comp_vectors to apc->num_queues, or reject comp_vector >=
apc->num_queues, instead of wrapping?
[Medium]
The added comment codifies a new ordering constraint:
/* EQs are created when a raw QP configures the vport.
* A raw QP must be created before creating rwq_ind_tbl.
*/
Before this patch, ac->eqs existed from mana_probe() onward, so
mana_ib_create_qp_rss() succeeded whether or not the netdev was up and
whether or not any raw QP had been created on the port. After this
patch, if the netdev is administratively down and no raw QP has
configured the vport, mpc->eqs is NULL and this path returns -EINVAL.
Is that ABI-visible change to RSS QP creation intentional, and worth
calling out in the commit message?
The comment is also slightly incomplete: EQs are created on netdev open
via mana_alloc_queues() -> mana_create_eq() as well, not only when a
raw QP configures the vport. Could the wording be adjusted?
>
> ret = mana_create_wq_obj(mpc, mpc->port_handle, GDMA_RQ,
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
[ ... ]
> @@ -3343,6 +3367,9 @@ static int mana_dealloc_queues(struct net_device *ndev)
> mana_fence_rqs(apc);
>
> /* Even in err case, still need to cleanup the vPort */
> + mana_destroy_rxqs(apc);
> + mana_destroy_txq(apc);
> + mana_destroy_eq(apc);
> mana_destroy_vport(apc);
>
> return 0;
--
pw-bot: cr
next prev parent reply other threads:[~2026-05-12 2:14 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-07 19:12 [PATCH net-next v7 0/6] net: mana: Per-vPort EQ and MSI-X interrupt management Long Li
2026-05-07 19:12 ` [PATCH net-next v7 1/6] net: mana: Create separate EQs for each vPort Long Li
2026-05-12 2:13 ` Jakub Kicinski [this message]
2026-05-12 2:14 ` Jakub Kicinski
2026-05-07 19:12 ` [PATCH net-next v7 2/6] net: mana: Query device capabilities and configure MSI-X sharing for EQs Long Li
2026-05-12 2:14 ` Jakub Kicinski
2026-05-07 19:12 ` [PATCH net-next v7 3/6] net: mana: Introduce GIC context with refcounting for interrupt management Long Li
2026-05-07 19:12 ` [PATCH net-next v7 4/6] net: mana: Use GIC functions to allocate global EQs Long Li
2026-05-12 2:14 ` Jakub Kicinski
2026-05-12 2:14 ` Jakub Kicinski
2026-05-07 19:12 ` [PATCH net-next v7 5/6] net: mana: Allocate interrupt context for each EQ when creating vPort Long Li
2026-05-12 2:14 ` Jakub Kicinski
2026-05-07 19:12 ` [PATCH net-next v7 6/6] RDMA/mana_ib: Allocate interrupt contexts on EQs Long Li
2026-05-12 2:14 ` Jakub Kicinski
2026-05-12 2:15 ` [PATCH net-next v7 0/6] net: mana: Per-vPort EQ and MSI-X interrupt management Jakub Kicinski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260512021358.849780-1-kuba@kernel.org \
--to=kuba@kernel.org \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=decui@microsoft.com \
--cc=edumazet@google.com \
--cc=haiyangz@microsoft.com \
--cc=horms@kernel.org \
--cc=jgg@ziepe.ca \
--cc=kotaranov@microsoft.com \
--cc=kys@microsoft.com \
--cc=leon@kernel.org \
--cc=linux-hyperv@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=longli@microsoft.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=shradhagupta@linux.microsoft.com \
--cc=wei.liu@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox