From: Jakub Kicinski <kuba@kernel.org>
To: longli@microsoft.com
Cc: Jakub Kicinski <kuba@kernel.org>,
kotaranov@microsoft.com, davem@davemloft.net, pabeni@redhat.com,
edumazet@google.com, andrew+netdev@lunn.ch, jgg@ziepe.ca,
leon@kernel.org, haiyangz@microsoft.com, kys@microsoft.com,
wei.liu@kernel.org, decui@microsoft.com,
shradhagupta@linux.microsoft.com, horms@kernel.org,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next v7 2/6] net: mana: Query device capabilities and configure MSI-X sharing for EQs
Date: Mon, 11 May 2026 19:14:03 -0700
Message-ID: <20260512021403.849804-1-kuba@kernel.org>
In-Reply-To: <20260507191237.438671-3-longli@microsoft.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: mana: Query device capabilities and configure MSI-X sharing for EQs
This change queries device capabilities earlier during GDMA setup,
renames mana_query_device_cfg() to mana_gd_query_device_cfg(), and adds
a new gc->msi_sharing flag plus a gc->msi_bitmap to manage dedicated
vs. shared MSI-X vectors for EQs across vPorts.
A few questions below, mostly about msi_sharing state across resume and
about some wording in the commit message and in the code.
> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> index f3316e929175..3aa96329f359 100644
> --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> @@ -149,6 +149,9 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
> struct gdma_context *gc = pci_get_drvdata(pdev);
> struct gdma_query_max_resources_resp resp = {};
> struct gdma_general_req req = {};
> + unsigned int max_num_queues;
> + u8 bm_hostmode;
> + u16 num_ports;
> int err;
>
> mana_gd_init_req_hdr(&req.hdr, GDMA_QUERY_MAX_RESOURCES,
> @@ -197,6 +200,40 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
> if (gc->max_num_queues == 0)
> return -ENOSPC;
>
> + err = mana_gd_query_device_cfg(gc, MANA_MAJOR_VERSION, MANA_MINOR_VERSION,
> + MANA_MICRO_VERSION, &num_ports, &bm_hostmode);
> + if (err)
> + return err;
> +
> + if (!num_ports)
> + return -EINVAL;
[Low]
This is the first of two mana_gd_query_device_cfg() calls in the same
probe cycle. The second happens in mana_probe() (further down
in this same patch) and also fetches num_ports and bm_hostmode from
firmware. The first caller here discards bm_hostmode; the second one
uses both.
Would it be reasonable to cache num_ports and bm_hostmode in
struct gdma_context during the first call and have mana_probe() read
them from there? That would save a second HWC round-trip, and it would
drop the implicit requirement that both calls return the same values
for the per-vPort math in mana_gd_query_max_resources() to stay
consistent with ac->num_ports in mana_probe().
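If that sounds right, a minimal sketch (untested; the two gdma_context
fields are hypothetical names, not something already in the struct):

	/* struct gdma_context: cache the HWC answer from probe */
	u16	num_ports;
	u8	bm_hostmode;

	/* mana_gd_query_max_resources(): store after the one query */
	gc->num_ports = num_ports;
	gc->bm_hostmode = bm_hostmode;

	/* mana_probe(): read the cached values instead of issuing a
	 * second mana_gd_query_device_cfg() call
	 */
	ac->num_ports = gc->num_ports;
	ac->bm_hostmode = gc->bm_hostmode;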
> +
> + /*
> + * Adjust gc->max_num_queues returned from the SOC to allow dedicated
> + * MSIx for each vPort. Clamp to no less than MANA_DEF_NUM_QUEUES.
> + */
> + max_num_queues = (gc->num_msix_usable - 1) / num_ports;
> + max_num_queues = rounddown_pow_of_two(max(max_num_queues, 1U));
> + if (max_num_queues < MANA_DEF_NUM_QUEUES)
> + max_num_queues = MANA_DEF_NUM_QUEUES;
> +
> + /*
> + * Use dedicated MSIx for EQs whenever possible, use MSIx sharing for
> + * Ethernet EQs when (max_num_queues * num_ports > num_msix_usable - 1)
> + */
> + max_num_queues = min(gc->max_num_queues, max_num_queues);
[Low]
The commit message says "The number of queues per vPort is clamped to
no less than MANA_DEF_NUM_QUEUES". After the clamp-up above, this
min() with gc->max_num_queues can bring the result back below
MANA_DEF_NUM_QUEUES whenever gc->max_num_queues is below
MANA_DEF_NUM_QUEUES (which is 16).
gc->max_num_queues was previously clamped by num_online_cpus(),
resp.max_eq/cq/sq/rq, and gc->num_msix_usable - 1, so a VM with
fewer than 16 online CPUs will end up with gc->max_num_queues_vport
below MANA_DEF_NUM_QUEUES in the non-sharing branch.
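Worked example (assuming the hardware maxima are not the limit): a
single-vPort VM with 8 online CPUs gives gc->max_num_queues == 8 and,
in the dynamic path, gc->num_msix_usable == 9. Then (9 - 1) / 1 == 8
rounds to 8, the clamp raises it to 16, min(8, 16) brings it back to
8, the sharing check (8 * 1 > 8) stays false, and
gc->max_num_queues_vport ends up at 8, below 16.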
Could the commit message be reworded to describe the actual behaviour
(the per-vPort count is clamped towards MANA_DEF_NUM_QUEUES but never
exceeds the hardware maximum)?
> + if (max_num_queues * num_ports > gc->num_msix_usable - 1)
> + gc->msi_sharing = true;
[Medium]
Is gc->msi_sharing ever cleared?
Walking the write sites for this flag in the series: it is set to true
unconditionally in the non-dyn branch of mana_gd_setup_hwc_irqs() (see
the hunk below), and conditionally true here. I could not find any
path that writes false, and neither mana_gd_remove_irqs() nor
mana_gd_cleanup() reset it.
The same gdma_context allocated in mana_gd_probe() survives a suspend
/ resume cycle via mana_gd_suspend() -> mana_gd_cleanup() ->
mana_gd_resume() -> mana_gd_setup(). In the dynamic MSI-X path
(pci_msix_can_alloc_dyn() true, i.e. the PF), gc->num_msix_usable is
recomputed as min(resp.max_msix, num_online_cpus() + 1), so it can
legitimately grow across suspend/resume if CPUs came online while the
guest was suspended.
If the first probe set msi_sharing=true because MSI-X was tight, but
on resume there are enough vectors for dedicated allocation, does the
flag stay stuck at true? If so:
- gc->max_num_queues_vport is computed via the sharing branch
min(gc->num_msix_usable - 1, gc->max_num_queues) rather than the
dedicated max_num_queues value;
- the if (!gc->msi_sharing) branch at the end of mana_gd_setup()
is skipped and gc->msi_bitmap stays NULL;
- later patches in this series that consult !gc->msi_sharing as the
"use bitmap" predicate keep taking the shared path.
Would it be appropriate to reset gc->msi_sharing = false at the top of
mana_gd_query_max_resources() (and/or in mana_gd_setup_hwc_irqs()'s
other branch) before the conditions that may set it to true are
evaluated? This would also line up with the commit message's claim
that "MSI-X sharing among vPorts is disabled by default and is only
enabled when there are not enough MSI-X vectors for dedicated
allocation."
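As a sketch (untested; exact placement depends on whether every setup
path re-runs this function):

	/* top of mana_gd_query_max_resources(): restore the documented
	 * default on every probe and resume, before any condition that
	 * may set the flag to true is evaluated
	 */
	gc->msi_sharing = false;

with the dyn branch of mana_gd_setup_hwc_irqs() doing the same if
setup can reach it without going through this function.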
> +
> + /* If MSI is shared, use max allowed value */
> + if (gc->msi_sharing)
> + gc->max_num_queues_vport = min(gc->num_msix_usable - 1, gc->max_num_queues);
> + else
> + gc->max_num_queues_vport = max_num_queues;
> +
> + dev_info(gc->dev, "MSI sharing mode %d max queues %d\n",
> + gc->msi_sharing, gc->max_num_queues);
[Low, Low]
Two small wording questions on this block:
The comment above reads "Adjust gc->max_num_queues returned from the
SOC to allow dedicated MSIx for each vPort", but the code never
modifies gc->max_num_queues; it updates a local and assigns to
gc->max_num_queues_vport. Should the comment say
gc->max_num_queues_vport instead?
For the dev_info, should the printed field be gc->max_num_queues_vport
rather than gc->max_num_queues? As written, the log always shows the
hardware maximum that was already decided earlier in the function,
not the per-vPort value that the preceding logic just chose, so the
number never reflects the effect of toggling msi_sharing.
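If both readings are right, the print could become something like:

	dev_info(gc->dev, "MSI sharing mode %d max queues per vPort %d\n",
		 gc->msi_sharing, gc->max_num_queues_vport);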
> +
> return 0;
> }
>
> @@ -1859,6 +1896,7 @@ static int mana_gd_setup_hwc_irqs(struct pci_dev *pdev)
> /* Need 1 interrupt for HWC */
> max_irqs = min(num_online_cpus(), MANA_MAX_NUM_QUEUES) + 1;
> min_irqs = 2;
> + gc->msi_sharing = true;
> }
>
[ ... ]