From: Jakub Kicinski <kuba@kernel.org>
To: longli@microsoft.com
Cc: Jakub Kicinski <kuba@kernel.org>,
kotaranov@microsoft.com, davem@davemloft.net, pabeni@redhat.com,
edumazet@google.com, andrew+netdev@lunn.ch, jgg@ziepe.ca,
leon@kernel.org, haiyangz@microsoft.com, kys@microsoft.com,
wei.liu@kernel.org, decui@microsoft.com,
shradhagupta@linux.microsoft.com, horms@kernel.org,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next v7 2/6] net: mana: Query device capabilities and configure MSI-X sharing for EQs
Date: Mon, 11 May 2026 19:14:03 -0700 [thread overview]
Message-ID: <20260512021403.849804-1-kuba@kernel.org> (raw)
In-Reply-To: <20260507191237.438671-3-longli@microsoft.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: mana: Query device capabilities and configure MSI-X sharing for EQs
This change queries device capabilities earlier during GDMA setup,
renames mana_query_device_cfg() to mana_gd_query_device_cfg(), and adds
a new gc->msi_sharing flag plus a gc->msi_bitmap to manage dedicated
vs. shared MSI-X vectors for EQs across vPorts.
A few questions below, mostly about msi_sharing state across resume and
about some wording in the commit message and in the code.
> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> index f3316e929175..3aa96329f359 100644
> --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> @@ -149,6 +149,9 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
> struct gdma_context *gc = pci_get_drvdata(pdev);
> struct gdma_query_max_resources_resp resp = {};
> struct gdma_general_req req = {};
> + unsigned int max_num_queues;
> + u8 bm_hostmode;
> + u16 num_ports;
> int err;
>
> mana_gd_init_req_hdr(&req.hdr, GDMA_QUERY_MAX_RESOURCES,
> @@ -197,6 +200,40 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
> if (gc->max_num_queues == 0)
> return -ENOSPC;
>
> + err = mana_gd_query_device_cfg(gc, MANA_MAJOR_VERSION, MANA_MINOR_VERSION,
> + MANA_MICRO_VERSION, &num_ports, &bm_hostmode);
> + if (err)
> + return err;
> +
> + if (!num_ports)
> + return -EINVAL;
[Low]
This is the first of two back-to-back mana_gd_query_device_cfg() calls
per probe cycle. The second one happens in mana_probe() (further down
in this same patch) and also fetches num_ports and bm_hostmode from
firmware. The first caller here discards bm_hostmode; the second one
uses both.
Would it be reasonable to cache num_ports and bm_hostmode on
struct gdma_context during the first call and have mana_probe() read
them from there, instead of doing a second HWC round-trip that must
return the same values for the per-vPort math in
mana_gd_query_max_resources() to match ac->num_ports in mana_probe()?
> +
> + /*
> + * Adjust gc->max_num_queues returned from the SOC to allow dedicated
> + * MSIx for each vPort. Clamp to no less than MANA_DEF_NUM_QUEUES.
> + */
> + max_num_queues = (gc->num_msix_usable - 1) / num_ports;
> + max_num_queues = rounddown_pow_of_two(max(max_num_queues, 1U));
> + if (max_num_queues < MANA_DEF_NUM_QUEUES)
> + max_num_queues = MANA_DEF_NUM_QUEUES;
> +
> + /*
> + * Use dedicated MSIx for EQs whenever possible, use MSIx sharing for
> + * Ethernet EQs when (max_num_queues * num_ports > num_msix_usable - 1)
> + */
> + max_num_queues = min(gc->max_num_queues, max_num_queues);
[Low]
The commit message says "The number of queues per vPort is clamped to
no less than MANA_DEF_NUM_QUEUES". After the clamp-up above, this
min() with gc->max_num_queues can bring the result back below
MANA_DEF_NUM_QUEUES whenever gc->max_num_queues is below
MANA_DEF_NUM_QUEUES (which is 16).
gc->max_num_queues was previously clamped by num_online_cpus(),
resp.max_eq/cq/sq/rq, and gc->num_msix_usable - 1, so a VM with
fewer than 16 online CPUs will end up with gc->max_num_queues_vport
below MANA_DEF_NUM_QUEUES in the non-sharing branch.
Could the commit message be reworded to describe the actual behaviour
(the per-vPort count is clamped towards MANA_DEF_NUM_QUEUES but never
exceeds the hardware maximum)?
> + if (max_num_queues * num_ports > gc->num_msix_usable - 1)
> + gc->msi_sharing = true;
[Medium]
Is gc->msi_sharing ever cleared?
Walking the write sites for this flag in the series: it is set to true
unconditionally in the non-dyn branch of mana_gd_setup_hwc_irqs() (see
the hunk below), and conditionally true here. I could not find any
path that writes false, and neither mana_gd_remove_irqs() nor
mana_gd_cleanup() reset it.
The same gdma_context allocated in mana_gd_probe() survives a suspend
/ resume cycle via mana_gd_suspend() -> mana_gd_cleanup() ->
mana_gd_resume() -> mana_gd_setup(). In the dynamic MSI-X path
(pci_msix_can_alloc_dyn() true, i.e. the PF), gc->num_msix_usable is
recomputed as min(resp.max_msix, num_online_cpus() + 1), so it can
legitimately grow across suspend/resume if CPUs came online while the
guest was suspended.
If the first probe set msi_sharing=true because MSI-X was tight, but
on resume there are enough vectors for dedicated allocation, does the
flag stay stuck at true? If so:
- gc->max_num_queues_vport is computed via the sharing branch
min(gc->num_msix_usable - 1, gc->max_num_queues) rather than the
dedicated max_num_queues value;
- the if (!gc->msi_sharing) branch at the end of mana_gd_setup()
is skipped and gc->msi_bitmap stays NULL;
- later patches in this series that consult !gc->msi_sharing as the
"use bitmap" predicate keep taking the shared path.
Would it be appropriate to reset gc->msi_sharing = false at the top of
mana_gd_query_max_resources() (and/or in mana_gd_setup_hwc_irqs()'s
other branch) before the conditions that may set it to true are
evaluated? This would also line up with the commit message's claim
that "MSI-X sharing among vPorts is disabled by default and is only
enabled when there are not enough MSI-X vectors for dedicated
allocation."
> +
> + /* If MSI is shared, use max allowed value */
> + if (gc->msi_sharing)
> + gc->max_num_queues_vport = min(gc->num_msix_usable - 1, gc->max_num_queues);
> + else
> + gc->max_num_queues_vport = max_num_queues;
> +
> + dev_info(gc->dev, "MSI sharing mode %d max queues %d\n",
> + gc->msi_sharing, gc->max_num_queues);
[Low, Low]
Two small wording questions on this block:
The comment above reads "Adjust gc->max_num_queues returned from the
SOC to allow dedicated MSIx for each vPort", but the code never
modifies gc->max_num_queues; it updates a local and assigns to
gc->max_num_queues_vport. Should the comment say
gc->max_num_queues_vport instead?
For the dev_info, should the printed field be gc->max_num_queues_vport
rather than gc->max_num_queues? As written, the log always shows the
hardware maximum that was already decided earlier in the function,
not the per-vPort value that the preceding logic just chose, so the
number never reflects the effect of toggling msi_sharing.
> +
> return 0;
> }
>
> @@ -1859,6 +1896,7 @@ static int mana_gd_setup_hwc_irqs(struct pci_dev *pdev)
> /* Need 1 interrupt for HWC */
> max_irqs = min(num_online_cpus(), MANA_MAX_NUM_QUEUES) + 1;
> min_irqs = 2;
> + gc->msi_sharing = true;
> }
>
[ ... ]
next prev parent reply other threads:[~2026-05-12 2:14 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-07 19:12 [PATCH net-next v7 0/6] net: mana: Per-vPort EQ and MSI-X interrupt management Long Li
2026-05-07 19:12 ` [PATCH net-next v7 1/6] net: mana: Create separate EQs for each vPort Long Li
2026-05-12 2:13 ` Jakub Kicinski
2026-05-12 2:14 ` Jakub Kicinski
2026-05-07 19:12 ` [PATCH net-next v7 2/6] net: mana: Query device capabilities and configure MSI-X sharing for EQs Long Li
2026-05-12 2:14 ` Jakub Kicinski [this message]
2026-05-07 19:12 ` [PATCH net-next v7 3/6] net: mana: Introduce GIC context with refcounting for interrupt management Long Li
2026-05-07 19:12 ` [PATCH net-next v7 4/6] net: mana: Use GIC functions to allocate global EQs Long Li
2026-05-12 2:14 ` Jakub Kicinski
2026-05-12 2:14 ` Jakub Kicinski
2026-05-07 19:12 ` [PATCH net-next v7 5/6] net: mana: Allocate interrupt context for each EQ when creating vPort Long Li
2026-05-12 2:14 ` Jakub Kicinski
2026-05-07 19:12 ` [PATCH net-next v7 6/6] RDMA/mana_ib: Allocate interrupt contexts on EQs Long Li
2026-05-12 2:14 ` Jakub Kicinski
2026-05-12 2:15 ` [PATCH net-next v7 0/6] net: mana: Per-vPort EQ and MSI-X interrupt management Jakub Kicinski
2026-05-13 22:43 ` [EXTERNAL] " Long Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260512021403.849804-1-kuba@kernel.org \
--to=kuba@kernel.org \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=decui@microsoft.com \
--cc=edumazet@google.com \
--cc=haiyangz@microsoft.com \
--cc=horms@kernel.org \
--cc=jgg@ziepe.ca \
--cc=kotaranov@microsoft.com \
--cc=kys@microsoft.com \
--cc=leon@kernel.org \
--cc=linux-hyperv@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=longli@microsoft.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=shradhagupta@linux.microsoft.com \
--cc=wei.liu@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.