From: Paolo Abeni <pabeni@redhat.com>
To: longli@microsoft.com
Cc: kotaranov@microsoft.com, kuba@kernel.org, davem@davemloft.net,
pabeni@redhat.com, edumazet@google.com, andrew+netdev@lunn.ch,
jgg@ziepe.ca, leon@kernel.org, haiyangz@microsoft.com,
kys@microsoft.com, wei.liu@kernel.org, decui@microsoft.com,
shradhagupta@linux.microsoft.com, horms@kernel.org,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next v10 1/6] net: mana: Create separate EQs for each vPort
Date: Tue, 19 May 2026 15:28:25 +0200 [thread overview]
Message-ID: <20260519132825.172490-1-pabeni@redhat.com> (raw)
In-Reply-To: <20260515040508.491748-2-longli@microsoft.com>
From: AI Reviewer <ai@example.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
net: mana: Create separate EQs for each vPort
This patch moves EQ allocation from per-device to per-vPort, exports
mana_create_eq()/mana_destroy_eq() for use by the RDMA driver, takes
a vport reference for RSS QPs, and adds cross-port PD-sharing
rejection for raw QPs along with serialization between
mana_set_channels() and the RDMA vport configuration.
> diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
> index ac5e75dd3494..f8a9013f0ca3 100644
> --- a/drivers/infiniband/hw/mana/main.c
> +++ b/drivers/infiniband/hw/mana/main.c
> @@ -40,13 +42,27 @@ int mana_ib_cfg_vport(struct mana_ib_dev *dev, u32 port, struct mana_ib_pd *pd,
>
> pd->vport_use_count++;
> if (pd->vport_use_count > 1) {
> + /* Reject cross-port PD sharing. EQs and vport config
> + * are per-port, so the PD must stay bound to the port
> + * that was configured on the first raw QP creation.
> + */
> + if (pd->vport_port != port) {
> + pd->vport_use_count--;
> + mutex_unlock(&pd->vport_mutex);
> + ibdev_dbg(&dev->ib_dev,
> + "PD already bound to port %u\n",
> + pd->vport_port);
> + return -EINVAL;
> + }
> ibdev_dbg(&dev->ib_dev,
> "Skip as this PD is already configured vport\n");
> mutex_unlock(&pd->vport_mutex);
> return 0;
> }
>
> - err = mana_cfg_vport(mpc, pd->pdn, doorbell_id);
> + pd->vport_port = port;
> +
> + err = mana_cfg_vport(mpc, pd->pdn, doorbell_id, true);
> if (err) {
> pd->vport_use_count--;
> mutex_unlock(&pd->vport_mutex);
[Low]
In the failure path of mana_ib_cfg_vport(), if mana_cfg_vport() or the
later mana_create_eq() fails, vport_use_count is rolled back to 0 but
pd->vport_port keeps the value just written.
Is the contract that pd->vport_port is only meaningful while
vport_use_count > 0? That invariant is not documented in the struct
mana_ib_pd comment, and a future reader who consults pd->vport_port
without first checking vport_use_count under vport_mutex would see stale
data. Would it be worth either resetting pd->vport_port on the rollback
path or documenting the rule explicitly?
> @@ -55,15 +71,23 @@ int mana_ib_cfg_vport(struct mana_ib_dev *dev, u32 port, struct mana_ib_pd *pd,
> return err;
> }
>
> - mutex_unlock(&pd->vport_mutex);
>
> - pd->tx_shortform_allowed = mpc->tx_shortform_allowed;
> - pd->tx_vp_offset = mpc->tx_vp_offset;
> + err = mana_create_eq(mpc);
> + if (err) {
> + mana_uncfg_vport(mpc);
> + pd->vport_use_count--;
> + } else {
> + pd->tx_shortform_allowed = mpc->tx_shortform_allowed;
> + pd->tx_vp_offset = mpc->tx_vp_offset;
> + }
[ ... ]
> diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
> index 0fbcf449c134..108ec4c5ce51 100644
> --- a/drivers/infiniband/hw/mana/qp.c
> +++ b/drivers/infiniband/hw/mana/qp.c
> @@ -155,6 +156,18 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
>
> qp->port = port;
>
> + /* Take a reference on the vport to ensure EQs outlive this QP.
> + * The vport must already be configured by a raw QP.
> + */
> + mutex_lock(&mana_pd->vport_mutex);
> + if (!mana_pd->vport_use_count) {
> + mutex_unlock(&mana_pd->vport_mutex);
> + ret = -EINVAL;
> + goto fail;
> + }
> + mana_pd->vport_use_count++;
> + mutex_unlock(&mana_pd->vport_mutex);
> +
[Critical]
The cross-port PD-sharing check added to mana_ib_cfg_vport() is not
mirrored here. mana_ib_create_qp_rss() takes the user-supplied port
from ucmd.port and only checks that vport_use_count is non-zero, so the
RSS path will happily attach to a different port than the one bound to
the PD by the first raw QP.
If a user creates a raw QP on port A (which sets pd->vport_port=A and
allocates mpc_A->eqs via mana_create_eq()), then creates an RSS QP on
the same PD with ucmd.port=B where port B has no Ethernet up and no
RDMA activity, mpc_B->eqs is NULL. The subsequent code in this same
function:
eq = &mpc->eqs[cq->comp_vector % mpc->num_queues];
cq_spec.attached_eq = eq->eq->id;
would then dereference NULL through mpc_B->eqs.
If port B does have Ethernet up, mpc_B->eqs is owned by the Ethernet
driver and the RSS QP attaches to those EQs. When the QP is destroyed
mana_ib_destroy_qp_rss() calls mana_ib_uncfg_vport(mdev, pd, qp->port=B),
and once pd->vport_use_count reaches 0 mana_ib_uncfg_vport() will run
mana_destroy_eq(mpc_B) on Ethernet's live EQs and call mana_uncfg_vport
on a port whose apc->vport_use_count was never incremented by this PD,
tripping the WARN_ON in mana_uncfg_vport() and corrupting Ethernet's
vport state. Meanwhile port A's EQs and vport configuration are leaked
because nothing on this PD will destroy them.
Should mana_ib_create_qp_rss() apply the same pd->vport_port == port
check that mana_ib_cfg_vport() now performs, before bumping
vport_use_count?
[Medium]
The new "vport must already be configured by a raw QP" requirement is a
visible uAPI change. Before this patch, EQs were allocated device-wide
on mana_context at probe time, so an RSS QP could be created without any
prior raw QP on the PD. After the patch, applications that legitimately
use only RSS QPs (RX-only RDMA workloads) will see -EINVAL here unless
they also create a raw QP first.
Was breaking the RSS-only case considered, and would lazy EQ creation on
the first QP regardless of type be feasible to avoid the
incompatibility?
> for (i = 0; i < ind_tbl_size; i++) {
> struct mana_obj_spec wq_spec = {};
> struct mana_obj_spec cq_spec = {};
> @@ -171,13 +184,13 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
> cq_spec.gdma_region = cq->queue.gdma_region;
> cq_spec.queue_size = cq->cqe * COMP_ENTRY_SIZE;
> cq_spec.modr_ctx_id = 0;
> - eq = &mpc->ac->eqs[cq->comp_vector];
> + eq = &mpc->eqs[cq->comp_vector % mpc->num_queues];
> cq_spec.attached_eq = eq->eq->id;
[ ... ]
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> index 04350973e19e..4633acc976f0 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> @@ -462,10 +467,22 @@ static int mana_set_channels(struct net_device *ndev,
> unsigned int new_count = channels->combined_count;
> unsigned int old_count = apc->num_queues;
> int err;
>
> + /* Set channel_changing to block RDMA from grabbing the vport
> + * during the detach/attach window. mana_cfg_vport() checks
> + * this flag under vport_mutex and returns -EBUSY if set.
> + */
> + mutex_lock(&apc->vport_mutex);
> + if (!apc->port_is_up && apc->vport_use_count) {
> + mutex_unlock(&apc->vport_mutex);
> + return -EBUSY;
> + }
> + apc->channel_changing = true;
> + mutex_unlock(&apc->vport_mutex);
[Low]
The commit message says:
When the port is down, apc->vport_mutex is held for the entire
operation since mana_detach()/mana_attach() do not take vport_mutex
in that case. When the port is up, Ethernet owns the vport
exclusively so no additional locking is needed.
Does this description match the code? mana_set_channels() drops
apc->vport_mutex immediately after setting channel_changing=true and
runs mana_pre_alloc_rxbufs(), mana_detach() and mana_attach() without
the mutex held, then re-acquires the mutex only at the end to clear the
flag.
The actual serialization mechanism is the channel_changing boolean
checked by mana_cfg_vport() under vport_mutex, not a long mutex hold.
Could the commit message be updated to match the implemented design so
that future readers do not assume a different invariant?
--
This is an AI-generated review.
next prev parent reply other threads:[~2026-05-19 13:29 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-15 4:05 [PATCH net-next v10 0/6] net: mana: Per-vPort EQ and MSI-X interrupt management Long Li
2026-05-15 4:05 ` [PATCH net-next v10 1/6] net: mana: Create separate EQs for each vPort Long Li
2026-05-16 4:05 ` sashiko-bot
2026-05-19 13:28 ` Paolo Abeni [this message]
2026-05-19 13:30 ` Paolo Abeni
2026-05-15 4:05 ` [PATCH net-next v10 2/6] net: mana: Query device capabilities and configure MSI-X sharing for EQs Long Li
2026-05-16 4:05 ` sashiko-bot
2026-05-15 4:05 ` [PATCH net-next v10 3/6] net: mana: Introduce GIC context with refcounting for interrupt management Long Li
2026-05-15 4:05 ` [PATCH net-next v10 4/6] net: mana: Use GIC functions to allocate global EQs Long Li
2026-05-15 4:05 ` [PATCH net-next v10 5/6] net: mana: Allocate interrupt context for each EQ when creating vPort Long Li
2026-05-16 4:05 ` sashiko-bot
2026-05-15 4:05 ` [PATCH net-next v10 6/6] RDMA/mana_ib: Allocate interrupt contexts on EQs Long Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260519132825.172490-1-pabeni@redhat.com \
--to=pabeni@redhat.com \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=decui@microsoft.com \
--cc=edumazet@google.com \
--cc=haiyangz@microsoft.com \
--cc=horms@kernel.org \
--cc=jgg@ziepe.ca \
--cc=kotaranov@microsoft.com \
--cc=kuba@kernel.org \
--cc=kys@microsoft.com \
--cc=leon@kernel.org \
--cc=linux-hyperv@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=longli@microsoft.com \
--cc=netdev@vger.kernel.org \
--cc=shradhagupta@linux.microsoft.com \
--cc=wei.liu@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox