Linux-HyperV List
* Re: [PATCH rdma-next v3] RDMA/mana_ib: Clamp adapter capabilities at the ib_device_attr boundary
From: Erni Sri Satya Vennela @ 2026-06-19 19:41 UTC
  To: Leon Romanovsky
  Cc: longli, kotaranov, Jason Gunthorpe, linux-rdma, linux-hyperv,
	linux-kernel
In-Reply-To: <20260611111745.GM327369@unreal>

On Thu, Jun 11, 2026 at 02:17:45PM +0300, Leon Romanovsky wrote:
> On Mon, May 25, 2026 at 12:01:01PM -0700, Erni Sri Satya Vennela wrote:
> > mana_ib stores its adapter capabilities internally as u32 in
> > struct mana_ib_adapter_caps. The IB core, however, exposes the
> > corresponding device attributes through struct ib_device_attr, where
> > fields such as max_qp, max_qp_wr, max_send_sge, max_recv_sge,
> > max_sge_rd, max_cq, max_cqe, max_mr, max_pd, max_qp_rd_atom,
> > max_res_rd_atom and max_qp_init_rd_atom are signed int.
> > 
> > mana_ib_query_device() is the only place that copies the cached u32
> > caps into these int fields. If a cap exceeds INT_MAX, the implicit
> > u32-to-int narrowing yields a negative value. Clamp each cap to
> > INT_MAX at this boundary so the values handed to the IB core are always
> > non-negative.
> > 
> > While here, fix a related overflow in the computation of
> > max_res_rd_atom. It is derived as max_qp_rd_atom * max_qp, both of
> > which are int after the assignment above; the multiplication can
> > overflow an int even with the new clamps in place. Widen to s64
> > before multiplying and clamp the result to INT_MAX.
> > 
> > Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
> > ---
> > Changes in v3:
> > * Drop clamping from mana_ib_gd_query_adapter_caps(). The internal u32
> >   caps cache does not need to be clamped.
> > * Move all clamping exclusively to mana_ib_query_device(), which is the
> >   only place the cached u32 values are narrowed into the signed int
> >   fields of struct ib_device_attr.
> > * Reframe commit message: this is a u32-to-int type boundary fix, not a
> >   CVM/untrusted-hardware hardening patch.
> 
> You should align all types to u32 and avoid hiding the issue behind
> min_t().
> 
> Thanks
Yes Leon, I'm currently at v7 of this patch.
I plan to avoid min_t entirely in the next version.

- Vennela
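
For readers following along, the narrowing problem described in the quoted commit message can be reproduced in a few lines of userspace C. This is only a sketch under stated assumptions: clamp_u32_to_int() and res_rd_atom() are hypothetical helper names, not functions from mana_ib, and the code shows the clamp-at-the-boundary approach from v3, not the u32 type alignment requested in the review.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: narrow a u32 capability to int without letting
 * values above INT_MAX turn negative. */
static int clamp_u32_to_int(uint32_t cap)
{
	return cap > (uint32_t)INT_MAX ? INT_MAX : (int)cap;
}

/* Hypothetical helper: multiply two non-negative int caps in 64 bits,
 * then clamp the product back into int range. */
static int res_rd_atom(int max_qp_rd_atom, int max_qp)
{
	int64_t prod = (int64_t)max_qp_rd_atom * (int64_t)max_qp;

	return prod > INT_MAX ? INT_MAX : (int)prod;
}
```

Without the 64-bit widening, the product of two clamped values could still overflow int, which is the second issue the commit message calls out.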


* Re: [PATCH net] net: mana: Fall back to standard MTU when PF reports adapter_mtu of 0
From: sashiko-bot @ 2026-06-20  5:54 UTC
  To: Erni Sri Satya Vennela; +Cc: linux-hyperv, bpf
In-Reply-To: <20260619055348.467224-1-ernis@linux.microsoft.com>

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] Preexisting error handling bug in XDP set path leaves apc->bpf_prog pointing to a freed BPF program, leading to a Use-After-Free/Double Free.
--

commit 425f6bf5274f2172e13cdc649ea2288d8dc1ad72
Author: Erni Sri Satya Vennela <ernis@linux.microsoft.com>

net: mana: Fall back to standard MTU when PF reports adapter_mtu of 0

This commit fixes a regression for older PF firmware versions by allowing
an adapter_mtu of 0 to fall back to the standard ETH_FRAME_LEN. It also
updates MTU bounds checks for XDP limits.

> diff --git a/drivers/net/ethernet/microsoft/mana/mana_bpf.c b/drivers/net/ethernet/microsoft/mana/mana_bpf.c
> index b5e9bb184a1dd..53308e139cbe9 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_bpf.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_bpf.c
> @@ -237,7 +237,8 @@ static int mana_xdp_set(struct net_device *ndev, struct bpf_prog *prog,
>  		bpf_prog_put(old_prog);
>  
>  	if (prog)
> -		ndev->max_mtu = MANA_XDP_MTU_MAX;
> +		ndev->max_mtu = min_t(unsigned int, MANA_XDP_MTU_MAX,
> +				      gc->adapter_mtu - ETH_HLEN);

[Severity: High]
This isn't a bug introduced by this patch, but while reviewing the changes
to mana_xdp_set(), I noticed a pre-existing issue on an error path earlier
in this function.

Earlier in mana_xdp_set(), the program is assigned:

    apc->bpf_prog = prog;

If the port is up, it then allocates rx buffers:

    err = mana_pre_alloc_rxbufs(apc, ndev->mtu, apc->num_queues);
    if (err) {
        ...
        return err;
    }

If mana_pre_alloc_rxbufs() fails, it returns directly rather than jumping
to the err_dealloc_rxbuffs label. This bypasses the cleanup step that
restores apc->bpf_prog = old_prog.

Since the networking core frees the bpf program on error, does this leave
apc->bpf_prog pointing to a freed program?

If mana_xdp_set() is called again later, could fetching old_prog lead to a
use-after-free or double-free when calling bpf_prog_put(old_prog)?
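
If the concern holds, the usual shape of a fix is to route every failure through a label that restores the old pointer. The following is a minimal userspace sketch of that pattern, not the MANA code: struct prog, struct port, pre_alloc_rxbufs() and fake_alloc_result are all hypothetical stand-ins, and the refcount decrement only models bpf_prog_put().

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver state; not the real MANA types. */
struct prog { int refcnt; };
struct port { struct prog *bpf_prog; int up; };

/* Hypothetical allocator hook so the failure path can be exercised. */
static int fake_alloc_result;

static int pre_alloc_rxbufs(struct port *p)
{
	(void)p;
	return fake_alloc_result;
}

/* Install prog; on allocation failure restore the old pointer so the
 * port never keeps a reference the caller is about to drop. */
static int xdp_set(struct port *p, struct prog *prog)
{
	struct prog *old_prog = p->bpf_prog;
	int err;

	p->bpf_prog = prog;
	if (p->up) {
		err = pre_alloc_rxbufs(p);
		if (err)
			goto err_restore;
	}
	if (old_prog)
		old_prog->refcnt--;	/* models bpf_prog_put(old_prog) */
	return 0;

err_restore:
	p->bpf_prog = old_prog;
	return err;
}
```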

>  	else
>  		ndev->max_mtu = gc->adapter_mtu - ETH_HLEN;
>

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260619055348.467224-1-ernis@linux.microsoft.com?part=1


* [GIT PULL] Hyper-V patches for v7.2
From: Wei Liu @ 2026-06-22  6:45 UTC
  To: Linus Torvalds
  Cc: Wei Liu, Linux on Hyper-V List, Linux Kernel List, kys, haiyangz,
	decui, longli

Hi Linus,

The following changes since commit e7ae89a0c97ce2b68b0983cd01eda67cf373517d:

  Linux 7.1-rc5 (2026-05-24 13:48:06 -0700)

are available in the Git repository at:

  ssh://git@gitolite.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git tags/hyperv-next-signed-20260621

for you to fetch changes up to a4ffc59238be84dd1c26bf1c001543e832674fc6:

  mshv: add bounds check on vp_index in mshv_intercept_isr() (2026-06-07 23:22:46 -0700)

----------------------------------------------------------------
hyperv-next for v7.2-rc1
 - Use wakeup mailbox to boot APs in Hyper-V VTL2 TDX guests (Yunhong Jiang,
   Ricardo Neri)
 - Move the Hyper-V IOMMU to its own subdirectory (Mukesh Rathor)
 - Cosmetic changes to mshv and balloon driver (Junrui Luo, Markus
   Elfring)
----------------------------------------------------------------
Junrui Luo (1):
      mshv: add bounds check on vp_index in mshv_intercept_isr()

Markus Elfring (1):
      hv_balloon: Simplify data output in hv_balloon_debug_show()

Mukesh R (2):
      iommu/hyperv: Create hyperv subdirectory under drivers/iommu
      x86/hyperv: Cosmetic changes in irqdomain.c for readability

Ricardo Neri (6):
      x86/topology: Add missing struct declaration and attribute dependency
      x86/acpi: Add functions to setup and access the wakeup mailbox
      dt-bindings: reserved-memory: Wakeup Mailbox for Intel processors
      x86/dt: Parse the Wakeup Mailbox for Intel processors
      x86/acpi: Add a helper to get the address of the wakeup mailbox
      x86/hyperv/vtl: Use the wakeup mailbox to boot secondary CPUs

Yunhong Jiang (4):
      x86/hyperv/vtl: Set real_mode_header in hv_vtl_init_platform()
      x86/realmode: Make the location of the trampoline configurable
      x86/hyperv/vtl: Setup the 64-bit trampoline for TDX guests
      x86/hyperv/vtl: Mark the wakeup mailbox page as private

 .../reserved-memory/intel,wakeup-mailbox.yaml      |  49 +++++
 MAINTAINERS                                        |   2 +-
 arch/x86/hyperv/hv_vtl.c                           |  38 +++-
 arch/x86/hyperv/irqdomain.c                        | 198 +++++++++++----------
 arch/x86/include/asm/acpi.h                        |  16 ++
 arch/x86/include/asm/topology.h                    |   3 +
 arch/x86/include/asm/x86_init.h                    |   3 +
 arch/x86/kernel/acpi/madt_wakeup.c                 |  16 ++
 arch/x86/kernel/devicetree.c                       |  47 +++++
 arch/x86/kernel/x86_init.c                         |   3 +
 arch/x86/realmode/init.c                           |   7 +-
 drivers/hv/hv_balloon.c                            |   4 +-
 drivers/hv/mshv_synic.c                            |   5 +
 drivers/iommu/Kconfig                              |   9 -
 drivers/iommu/Makefile                             |   2 +-
 drivers/iommu/hyperv/Makefile                      |   2 +
 .../{hyperv-iommu.c => hyperv/hv-irq-remap-x86.c}  |   6 +-
 drivers/iommu/irq_remapping.c                      |   2 +-
 18 files changed, 290 insertions(+), 122 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/reserved-memory/intel,wakeup-mailbox.yaml
 create mode 100644 drivers/iommu/hyperv/Makefile
 rename drivers/iommu/{hyperv-iommu.c => hyperv/hv-irq-remap-x86.c} (99%)


* Re: [PATCH net] net: mana: Fall back to standard MTU when PF reports adapter_mtu of 0
From: Erni Sri Satya Vennela @ 2026-06-22  9:22 UTC
  To: sashiko-reviews; +Cc: linux-hyperv, bpf
In-Reply-To: <20260620055406.C564C1F000E9@smtp.kernel.org>

On Sat, Jun 20, 2026 at 05:54:06AM +0000, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> 
> Pre-existing issues:
> - [High] Preexisting error handling bug in XDP set path leaves apc->bpf_prog pointing to a freed BPF program, leading to a Use-After-Free/Double Free.
> --
> 
> commit 425f6bf5274f2172e13cdc649ea2288d8dc1ad72
> Author: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
> 
> net: mana: Fall back to standard MTU when PF reports adapter_mtu of 0
> 
> This commit fixes a regression for older PF firmware versions by allowing
> an adapter_mtu of 0 to fall back to the standard ETH_FRAME_LEN. It also
> updates MTU bounds checks for XDP limits.
> 
> > diff --git a/drivers/net/ethernet/microsoft/mana/mana_bpf.c b/drivers/net/ethernet/microsoft/mana/mana_bpf.c
> > index b5e9bb184a1dd..53308e139cbe9 100644
> > --- a/drivers/net/ethernet/microsoft/mana/mana_bpf.c
> > +++ b/drivers/net/ethernet/microsoft/mana/mana_bpf.c
> > @@ -237,7 +237,8 @@ static int mana_xdp_set(struct net_device *ndev, struct bpf_prog *prog,
> >  		bpf_prog_put(old_prog);
> >  
> >  	if (prog)
> > -		ndev->max_mtu = MANA_XDP_MTU_MAX;
> > +		ndev->max_mtu = min_t(unsigned int, MANA_XDP_MTU_MAX,
> > +				      gc->adapter_mtu - ETH_HLEN);
> 
> [Severity: High]
> This isn't a bug introduced by this patch, but while reviewing the changes
> to mana_xdp_set(), I noticed a pre-existing issue on an error path earlier
> in this function.
> 
> Earlier in mana_xdp_set(), the program is assigned:
> 
>     apc->bpf_prog = prog;
> 
> If the port is up, it then allocates rx buffers:
> 
>     err = mana_pre_alloc_rxbufs(apc, ndev->mtu, apc->num_queues);
>     if (err) {
>         ...
>         return err;
>     }
> 
> If mana_pre_alloc_rxbufs() fails, it returns directly rather than jumping
> to the err_dealloc_rxbuffs label. This bypasses the cleanup step that
> restores apc->bpf_prog = old_prog.
> 
> Since the networking core frees the bpf program on error, does this leave
> apc->bpf_prog pointing to a freed program?
> 
> If mana_xdp_set() is called again later, could fetching old_prog lead to a
> use-after-free or double-free when calling bpf_prog_put(old_prog)?
> 
> >  	else
> >  		ndev->max_mtu = gc->adapter_mtu - ETH_HLEN;
> >
> 
> -- 
> Sashiko AI review · https://sashiko.dev/#/patchset/20260619055348.467224-1-ernis@linux.microsoft.com?part=1

I'll take this up in a different patch.

Thank you,
Vennela


* Re: [PATCH net] net: mana: Fall back to standard MTU when PF reports adapter_mtu of 0
From: Simon Horman @ 2026-06-22 10:41 UTC
  To: Erni Sri Satya Vennela
  Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, dipayanroy, ssengar, jacob.e.keller,
	gargaditya, kees, linux-hyperv, netdev, linux-kernel, bpf
In-Reply-To: <20260619055348.467224-1-ernis@linux.microsoft.com>

On Thu, Jun 18, 2026 at 10:53:38PM -0700, Erni Sri Satya Vennela wrote:
> Commit d7709812e13d ("net: mana: hardening: Validate adapter_mtu from
> MANA_QUERY_DEV_CONFIG") rejected any adapter_mtu value smaller than
> ETH_MIN_MTU + ETH_HLEN, including 0, returning -EPROTO and failing
> mana_probe().
> 
> Some older PF firmware versions still in the field report
> adapter_mtu as 0 in the MANA_QUERY_DEV_CONFIG response. With the
> hardening check in place, the MANA VF driver now fails to load on
> those hosts, breaking networking entirely for guests.
> 
> MANA hardware always supports the standard Ethernet MTU. Treat a
> reported adapter_mtu of 0 as "the PF did not advertise a value" and
> fall back to ETH_FRAME_LEN, the same value used for the pre-V2
> message version path. Only jumbo frames remain unavailable until
> the PF reports a valid MTU.
> 
> Other small-but-nonzero bogus values are still rejected, preserving
> the original protection against the unsigned-subtraction wrap that
> would otherwise let ndev->max_mtu underflow to a huge value.
> 
> Fixes: d7709812e13d ("net: mana: hardening: Validate adapter_mtu from MANA_QUERY_DEV_CONFIG")
> Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>

Reviewed-by: Simon Horman <horms@kernel.org>

FTR, I agree with your assessment that the issue flagged in the
AI-generated review of this patch on sashiko.dev can be
treated as a follow-up [1].

And I don't think the low priority issue flagged in the AI-generated
review on https://netdev-ai.bots.linux.dev/sashiko/ should impede progress
of this patch.

[1] https://lore.kernel.org/bpf/ajj+5mhswcqhI2z7@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/
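
The validation rule described in the quoted commit message can be sketched in userspace C as follows. This is an illustration only: resolve_adapter_mtu() is a hypothetical helper name, not the driver's function, and the three constants are redefined locally with the values they have in the Linux UAPI header <linux/if_ether.h>.

```c
#include <errno.h>
#include <stdint.h>

/* Values match the Linux UAPI definitions in <linux/if_ether.h>. */
#define ETH_HLEN	14
#define ETH_MIN_MTU	68
#define ETH_FRAME_LEN	1514

/* Hypothetical helper: validate the MTU reported by the PF.
 * Returns 0 and stores the usable value, or -EPROTO for bogus input. */
static int resolve_adapter_mtu(uint32_t reported, uint32_t *adapter_mtu)
{
	if (reported == 0) {
		/* PF did not advertise a value: standard frames only. */
		*adapter_mtu = ETH_FRAME_LEN;
		return 0;
	}

	/* Too small to be valid; values below ETH_HLEN would also make
	 * the unsigned expression reported - ETH_HLEN wrap around. */
	if (reported < ETH_MIN_MTU + ETH_HLEN)
		return -EPROTO;

	*adapter_mtu = reported;
	return 0;
}
```

With the value validated this way, a later adapter_mtu - ETH_HLEN computation for max_mtu cannot underflow.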



* Re: [GIT PULL] Hyper-V patches for v7.2
From: pr-tracker-bot @ 2026-06-22 16:36 UTC
  To: Wei Liu
  Cc: Linus Torvalds, Wei Liu, Linux on Hyper-V List, Linux Kernel List,
	kys, haiyangz, decui, longli
In-Reply-To: <20260622064549.GA2852659@liuwe-devbox-debian-v2.local>

The pull request you sent on Sun, 21 Jun 2026 23:45:49 -0700:

> ssh://git@gitolite.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git tags/hyperv-next-signed-20260621

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/6e869de3a1b9ef9f096223e0e7f30c727de4f6bc

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


* Re: [PATCH v4 net] net: mana: Optimize irq affinity for low vcpu configs
From: Jakub Kicinski @ 2026-06-23  1:22 UTC
  To: Shradha Gupta
  Cc: Dexuan Cui, Wei Liu, Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn,
	David S. Miller, Eric Dumazet, Paolo Abeni, Konstantin Taranov,
	Simon Horman, Erni Sri Satya Vennela, Dipayaan Roy, Shiraz Saleem,
	Michael Kelley, Long Li, Yury Norov, linux-hyperv, linux-kernel,
	netdev, Paul Rosswurm, Shradha Gupta, Saurabh Singh Sengar,
	stable
In-Reply-To: <20260619073338.481035-1-shradhagupta@linux.microsoft.com>

On Fri, 19 Jun 2026 00:33:35 -0700 Shradha Gupta wrote:
> Fixes: 755391121038 ("net: mana: Allocate MSI-X vectors dynamically")
> Cc: stable@vger.kernel.org

If you want this to be a fix -- could you please rewrite the commit
message? What matters most is the comparison between the behavior
before the bad commit, with the bad commit, and with this fix applied.
Perhaps the three cases you list are that, but it's not immediately obvious.
-- 
pw-bot: cr


* Re: [PATCH v4 net] net: mana: Optimize irq affinity for low vcpu configs
From: Shradha Gupta @ 2026-06-23  9:11 UTC
  To: Jakub Kicinski
  Cc: Dexuan Cui, Wei Liu, Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn,
	David S. Miller, Eric Dumazet, Paolo Abeni, Konstantin Taranov,
	Simon Horman, Erni Sri Satya Vennela, Dipayaan Roy, Shiraz Saleem,
	Michael Kelley, Long Li, Yury Norov, linux-hyperv, linux-kernel,
	netdev, Paul Rosswurm, Shradha Gupta, Saurabh Singh Sengar,
	stable
In-Reply-To: <20260622182248.5bfc49ce@kernel.org>

On Mon, Jun 22, 2026 at 06:22:48PM -0700, Jakub Kicinski wrote:
> On Fri, 19 Jun 2026 00:33:35 -0700 Shradha Gupta wrote:
> > Fixes: 755391121038 ("net: mana: Allocate MSI-X vectors dynamically")
> > Cc: stable@vger.kernel.org
> 
> If you want this to be a fix -- could you please rewrite the commit
> message? What matters most is the comparison between the behavior
> before the bad commit, with the bad commit, and with this fix applied.
> Perhaps the three cases you list are that, but it's not immediately obvious.
> -- 
> pw-bot: cr

Sure, Jakub. Let me send out a new version with this change in the
commit message.

regards,
Shradha


* [PATCH v5 net] net: mana: Optimize irq affinity for low vcpu configs
From: Shradha Gupta @ 2026-06-24  7:21 UTC
  To: Dexuan Cui, Wei Liu, Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Konstantin Taranov, Simon Horman, Erni Sri Satya Vennela,
	Dipayaan Roy, Shiraz Saleem, Michael Kelley, Long Li, Yury Norov
  Cc: Shradha Gupta, linux-hyperv, linux-kernel, netdev, Paul Rosswurm,
	Shradha Gupta, Saurabh Singh Sengar, stable, Yury Norov

Before commit 755391121038 ("net: mana: Allocate MSI-X vectors
dynamically"), all MANA IRQs were assigned statically, together, during
early driver load.

After that commit, IRQ allocation for MANA happens in two phases: the HWC
IRQ is allocated first, and the queue IRQs are added dynamically at a
later point. By then, the IRQ weights on the vCPUs can be imbalanced.
If the IRQ count is greater than the vCPU count, the topology-aware IRQ
distribution logic in MANA can place multiple MANA IRQs on the same
vCPUs while sibling vCPUs have none (case 1).

On SMP-enabled, low-vCPU systems this becomes a bigger problem, because
the softIRQ handling overhead of two IRQs on the same vCPU is much higher
than when the IRQs are spread across sibling vCPUs.

In such cases, throughput drops significantly when many parallel TCP
connections are tested.

Fix the affinity assignment logic for the case where the IRQ count is
greater than the vCPU count and IRQs are added dynamically, by using all
the vCPUs irrespective of their NUMA/core bindings (case 2).

Setting the affinity and hint to NULL was also studied. With that
approach, if IRQs other than MANA's are already allocated on the VM when
the MANA IRQs are allocated, the MANA queue IRQs cluster again (case 3).


=======================================================
Case 1: without this patch
=======================================================
4 vCPU(2 cores), 5 MANA IRQs (1 HWC + 4 Queue)

        TYPE            effective vCPU aff
=======================================================
IRQ0:   HWC             0
IRQ1:   mana_q1         0
IRQ2:   mana_q2         2
IRQ3:   mana_q3         0
IRQ4:   mana_q4         3

%soft on each vCPU(mpstat -P ALL 1) on receiver
vCPU            0       1       2       3
=======================================================
pass 1:         38.85   0.03    24.89   24.65
pass 2:         39.15   0.03    24.57   25.28
pass 3:         40.36   0.03    23.20   23.17

=======================================================
Case 2: with this patch
=======================================================
4 vCPU(2 cores), 5 MANA IRQs (1 HWC + 4 Queue)

        TYPE            effective vCPU aff
=======================================================
IRQ0:   HWC             0
IRQ1:   mana_q1         0
IRQ2:   mana_q2         1
IRQ3:   mana_q3         2
IRQ4:   mana_q4         3

%soft on each vCPU(mpstat -P ALL 1) on receiver
vCPU            0       1       2       3
=======================================================
pass 1:         15.42   15.85   14.99   14.51
pass 2:         15.53   15.94   15.81   15.93
pass 3:         16.41   16.35   16.40   16.36

=======================================================
Case 3: with affinity set to NULL
=======================================================
4 vCPU(2 cores), 5 MANA IRQs (1 HWC + 4 Queue)

        TYPE            effective vCPU aff
=======================================================
IRQ0:   HWC             0
IRQ1:   mana_q1         2
IRQ2:   mana_q2         3
IRQ3:   mana_q3         2
IRQ4:   mana_q4         3

=======================================================
Throughput Impact(in Gbps, same env)
=======================================================
TCP conn        with patch      w/o patch       aff NULL
20480           15.65           7.73            5.25
10240           15.63           8.93            5.77
8192            15.64           9.69            7.16
6144            15.64           13.16           9.33
4096            15.69           15.75           13.50
2048            15.69           15.83           13.61
1024            15.71           15.28           13.60

Fixes: 755391121038 ("net: mana: Allocate MSI-X vectors dynamically")
Cc: stable@vger.kernel.org
Co-developed-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Yury Norov <ynorov@nvidia.com>
---
Changes in v5
 * modify commit message to align with fix patch format
---
Changes in v4
 * Add mana prefix on irq_affinity_*() in mana driver
 * Corrected grammar, comment for mana_irq_setup_linear()
 * added new line as per guidelines
 * added case 3 in commit message for when affinity is NULL
---
Changes in v3
 * Optimize the comments in mana_gd_setup_dyn_irqs()
 * add more details in the dev_dbg for extra IRQs
---
Changes in v2
 * Removed the unused skip_first_cpu variable
 * fixed exit condition in irq_setup_linear() with len == 0
 * changed return type of irq_setup_linear() as it will always be 0
 * removed the unnecessary rcu_read_lock() in irq_setup_linear()
 * added appropriate comments to indicate expected behaviour when
   IRQs are more than or equal to num_online_cpus()
---
 .../net/ethernet/microsoft/mana/gdma_main.c   | 78 +++++++++++++++----
 1 file changed, 64 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index a0fdd052d7f1..e8b7ffb47eb9 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -210,6 +210,8 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
 	} else {
 		/* If dynamic allocation is enabled we have already allocated
 		 * hwc msi
+		 * Also, we make sure in this case the following is always true
+		 * (num_msix_usable - 1 HWC) <= num_online_cpus()
 		 */
 		gc->num_msix_usable = min(resp.max_msix, num_online_cpus() + 1);
 	}
@@ -1909,8 +1911,8 @@ void mana_gd_free_res_map(struct gdma_resource *r)
  * do the same thing.
  */
 
-static int irq_setup(unsigned int *irqs, unsigned int len, int node,
-		     bool skip_first_cpu)
+static int mana_irq_setup_numa_aware(unsigned int *irqs, unsigned int len,
+				     int node, bool skip_first_cpu)
 {
 	const struct cpumask *next, *prev = cpu_none_mask;
 	cpumask_var_t cpus __free(free_cpumask_var);
@@ -1946,11 +1948,24 @@ static int irq_setup(unsigned int *irqs, unsigned int len, int node,
 	return 0;
 }
 
+/* must be called with cpus_read_lock() held */
+static void mana_irq_setup_linear(unsigned int *irqs, unsigned int len)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu) {
+		if (len == 0)
+			break;
+
+		irq_set_affinity_and_hint(*irqs++, cpumask_of(cpu));
+		len--;
+	}
+}
+
 static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec)
 {
 	struct gdma_context *gc = pci_get_drvdata(pdev);
 	struct gdma_irq_context *gic;
-	bool skip_first_cpu = false;
 	int *irqs, err, i, msi;
 
 	irqs = kmalloc_objs(int, nvec);
@@ -1958,10 +1973,12 @@ static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec)
 		return -ENOMEM;
 
 	/*
+	 * In this function, num_msix_usable = HWC IRQ + Queue IRQ.
+	 * nvec is only Queue IRQ (HWC already setup).
 	 * While processing the next pci irq vector, we start with index 1,
 	 * as IRQ vector at index 0 is already processed for HWC.
 	 * However, the population of irqs array starts with index 0, to be
-	 * further used in irq_setup()
+	 * further used in mana_irq_setup_numa_aware()
 	 */
 	for (i = 1; i <= nvec; i++) {
 		msi = i;
@@ -1975,18 +1992,51 @@ static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec)
 	}
 
 	/*
-	 * When calling irq_setup() for dynamically added IRQs, if number of
-	 * CPUs is more than or equal to allocated MSI-X, we need to skip the
-	 * first CPU sibling group since they are already affinitized to HWC IRQ
+	 * When calling mana_irq_setup_numa_aware() for dynamically added IRQs,
+	 * if number of CPUs is more than or equal to allocated MSI-X, we need to
+	 * skip the first CPU sibling group since they are already affinitized to
+	 * HWC IRQ
 	 */
 	cpus_read_lock();
-	if (gc->num_msix_usable <= num_online_cpus())
-		skip_first_cpu = true;
+	if (gc->num_msix_usable <= num_online_cpus()) {
+		err = mana_irq_setup_numa_aware(irqs, nvec, gc->numa_node,
+						true);
+		if (err) {
+			cpus_read_unlock();
+			goto free_irq;
+		}
+	} else {
+		/*
+		 * When num_msix_usable are more than num_online_cpus, our
+		 * queue IRQs should be equal to num of online vCPUs.
+		 * We try to make sure queue IRQs spread across all vCPUs.
+		 * In such a case NUMA or CPU core affinity does not matter.
+		 * Note: in this case the total mana IRQ should always be
+		 * num_online_cpus + 1. The first HWC IRQ is already handled
+		 * in HWC setup calls
+		 * However, if CPUs went offline since num_msix_usable was
+		 * computed, queue IRQs will be more than num_online_cpus().
+		 * In such cases remaining extra IRQs will retain their default
+		 * affinity.
+		 */
+		int first_unassigned = num_online_cpus();
 
-	err = irq_setup(irqs, nvec, gc->numa_node, skip_first_cpu);
-	if (err) {
-		cpus_read_unlock();
-		goto free_irq;
+		if (nvec > first_unassigned) {
+			char buf[32];
+
+			if (first_unassigned == nvec - 1)
+				snprintf(buf, sizeof(buf), "%d",
+					 first_unassigned);
+			else
+				snprintf(buf, sizeof(buf), "%d-%d",
+					 first_unassigned, nvec - 1);
+
+			dev_dbg(&pdev->dev,
+				"MANA IRQ indices #%s will retain the default CPU affinity\n",
+				buf);
+		}
+
+		mana_irq_setup_linear(irqs, nvec);
 	}
 
 	cpus_read_unlock();
@@ -2041,7 +2091,7 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev, int nvec)
 		nvec -= 1;
 	}
 
-	err = irq_setup(irqs, nvec, gc->numa_node, false);
+	err = mana_irq_setup_numa_aware(irqs, nvec, gc->numa_node, false);
 	if (err) {
 		cpus_read_unlock();
 		goto free_irq;

base-commit: 96e7f9122aae0ed000ee321f324b812a447906d9
-- 
2.34.1



* Re: [PATCH v5 01/15] drm/amd/display: Handle struct drm_plane_state.ignore_damage_clips
From: Harry Wentland @ 2026-06-24 16:06 UTC
  To: Thomas Zimmermann, mripard, maarten.lankhorst, airlied, airlied,
	simona, admin, gargaditya08, paul, jani.nikula, mhklkml,
	zack.rusin, bcm-kernel-feedback-list, sunpeng.li, siqueira,
	alexander.deucher, rodrigo.vivi, joonas.lahtinen, tursulin,
	javierm, dmitry.osipenko, gurchetansingh, olvaffe
  Cc: dri-devel, linux-hyperv, intel-gfx, intel-xe, linux-mips,
	virtualization, amd-gfx, Zack Rusin, stable
In-Reply-To: <20260610152505.260172-2-tzimmermann@suse.de>



On 2026-06-10 11:18, Thomas Zimmermann wrote:
> The mode-setting pipeline can disable damage clipping for a commit
> by setting ignore_damage_clips in struct drm_plane_state. The commit
> will then do a full display update.
> 
> Test the flag in DCN code and do a full update if it has been set.
> 
> Commit 35ed38d58257 ("drm: Allow drivers to indicate the damage helpers
> to ignore damage clips") introduced ignore_damage_clips to selectively
> ignore damage clipping in certain framebuffer changes. This driver does
> not do that, but DRM's damage iterator will soon rely on the flag.
> Therefore supporting it here as well makes sense for consistency.
> 
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> Fixes: 35ed38d58257 ("drm: Allow drivers to indicate the damage helpers to ignore damage clips")
> Cc: Javier Martinez Canillas <javierm@redhat.com>
> Cc: Thomas Zimmermann <tzimmermann@suse.de>
> Cc: Zack Rusin <zackr@vmware.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: <stable@vger.kernel.org> # v6.8+

While I haven't looked thoroughly at the rest of the series, this
patch for amdgpu_dm looks fine.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>

Harry

> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 0e20194e6662..4cbb27f65a0b 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -6614,8 +6614,8 @@ static void fill_dc_dirty_rects(struct drm_plane *plane,
>  {
>  	struct dm_crtc_state *dm_crtc_state = to_dm_crtc_state(crtc_state);
>  	struct rect *dirty_rects = flip_addrs->dirty_rects;
> -	u32 num_clips;
> -	struct drm_mode_rect *clips;
> +	u32 num_clips = 0;
> +	struct drm_mode_rect *clips = NULL;
>  	bool bb_changed;
>  	bool fb_changed;
>  	u32 i = 0;
> @@ -6631,8 +6631,10 @@ static void fill_dc_dirty_rects(struct drm_plane *plane,
>  	if (new_plane_state->rotation != DRM_MODE_ROTATE_0)
>  		goto ffu;
>  
> -	num_clips = drm_plane_get_damage_clips_count(new_plane_state);
> -	clips = drm_plane_get_damage_clips(new_plane_state);
> +	if (!new_plane_state->ignore_damage_clips) {
> +		num_clips = drm_plane_get_damage_clips_count(new_plane_state);
> +		clips = drm_plane_get_damage_clips(new_plane_state);
> +	}
>  
>  	if (num_clips && (!amdgpu_damage_clips || (amdgpu_damage_clips < 0 &&
>  						   is_psr_su)))



* [PATCH] hyperv: mshv: zero VTL hypercall output page
From: Yousef Alhouseen @ 2026-06-24 17:21 UTC
  To: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li
  Cc: linux-hyperv, linux-kernel, Yousef Alhouseen

mshv_vtl_hvcall_call() copies output_size bytes from a freshly allocated
hypercall output page back to userspace. The page is currently allocated
without __GFP_ZERO, so any bytes not written by the hypervisor are copied
from stale page contents.

Allocate the output page zeroed before issuing the hypercall. Also check
both bounce-page allocations before using them so memory pressure cannot
turn the copy paths into NULL pointer dereferences.

Signed-off-by: Yousef Alhouseen <alhouseenyousef@gmail.com>
---
 drivers/hv/mshv_vtl_main.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/hv/mshv_vtl_main.c b/drivers/hv/mshv_vtl_main.c
index 0d3d41619..0365d207c 100644
--- a/drivers/hv/mshv_vtl_main.c
+++ b/drivers/hv/mshv_vtl_main.c
@@ -1147,7 +1147,11 @@ static int mshv_vtl_hvcall_call(struct mshv_vtl_hvcall_fd *fd,
 	 * TODO: Take care of this when CVM support is added.
 	 */
 	in = (void *)__get_free_page(GFP_KERNEL);
-	out = (void *)__get_free_page(GFP_KERNEL);
+	out = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+	if (!in || !out) {
+		ret = -ENOMEM;
+		goto free_pages;
+	}
 
 	if (copy_from_user(in, (void __user *)hvcall.input_ptr, hvcall.input_size)) {
 		ret = -EFAULT;
@@ -1162,8 +1166,10 @@ static int mshv_vtl_hvcall_call(struct mshv_vtl_hvcall_fd *fd,
 	}
 	ret = put_user(hvcall.status, &hvcall_user->status);
 free_pages:
-	free_page((unsigned long)in);
-	free_page((unsigned long)out);
+	if (in)
+		free_page((unsigned long)in);
+	if (out)
+		free_page((unsigned long)out);
 
 	return ret;
 }
-- 
2.54.0



* Re: [PATCH] hyperv: mshv: zero VTL hypercall output page
From: sashiko-bot @ 2026-06-24 17:32 UTC
  To: Yousef Alhouseen; +Cc: linux-hyperv
In-Reply-To: <20260624172157.2790-1-alhouseenyousef@gmail.com>

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] This problem wasn't introduced by this patch, but the patch only partially fixes the information leak. While it zeroes the `out` page, the input bounce page `in` is still allocated without the `__GFP_ZERO` flag. If a user provides an `hvcall.input_size` that is smaller than the input size expected by the hypercall (specified by `hvcall.control`), `copy_from_user()` will only partially initialize the `in` page. The remainder of the page will contain uninitialized, stale kernel memory. This full page is then passed to the hypervisor. If the hypervisor processes these uninitialized bytes and reflects them into the `out` page or alters its observable behavior based on them, sensitive kernel data is leaked to userspace.
--

commit d8767fa45f0b927a8630a3fee3fba738b9ae7851
Author: Yousef Alhouseen <alhouseenyousef@gmail.com>

hyperv: mshv: zero VTL hypercall output page
    
This patch zeroes the hypercall output page to avoid copying stale memory
to userspace and adds NULL pointer checks for the page allocations.

> diff --git a/drivers/hv/mshv_vtl_main.c b/drivers/hv/mshv_vtl_main.c
> index 0d3d4161974f8..0365d207cb110 100644
> --- a/drivers/hv/mshv_vtl_main.c
> +++ b/drivers/hv/mshv_vtl_main.c
> @@ -1147,7 +1147,11 @@ static int mshv_vtl_hvcall_call(struct mshv_vtl_hvcall_fd *fd,
>  	 * TODO: Take care of this when CVM support is added.
>  	 */
>  	in = (void *)__get_free_page(GFP_KERNEL);

[Severity: High]
This isn't a bug introduced by this patch, but should the "in" page also
be allocated with __GFP_ZERO to prevent leaking stale kernel memory?

If userspace provides an hvcall.input_size that is smaller than what the
hypercall expects, copy_from_user() will only partially initialize the
"in" page. The remainder of the page will contain uninitialized kernel
memory which is then passed to the hypervisor in hv_do_hypercall(). If the
hypervisor reflects this input or alters behavior based on it, could this
result in an information leak?

> -	out = (void *)__get_free_page(GFP_KERNEL);
> +	out = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
> +	if (!in || !out) {
> +		ret = -ENOMEM;
> +		goto free_pages;
> +	}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260624172157.2790-1-alhouseenyousef@gmail.com?part=1


* [PATCH] hyperv: mshv: zero VTL hypercall input page
From: Yousef Alhouseen @ 2026-06-24 17:57 UTC (permalink / raw)
  To: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li
  Cc: linux-hyperv, linux-kernel, Yousef Alhouseen

mshv_vtl_hvcall_call() copies only the user-provided input size.

It then passes the page to hv_do_hypercall().

For short inputs, stale bytes can remain in the bounce page.

Those bytes can be consumed by the hypervisor.

Allocate the input page zeroed, matching the output page.
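
A minimal userspace sketch of the idea, not the driver code: calloc()
stands in for __get_free_page(GFP_KERNEL | __GFP_ZERO) and memcpy() for
copy_from_user(). The helper name and SKETCH_PAGE_SIZE are made up for
illustration.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/*
 * Hypothetical model of the input bounce page. Returns a zero-filled page
 * with the first input_size bytes copied from user_in, or NULL on failure.
 */
static unsigned char *sketch_fill_bounce(const void *user_in, size_t input_size)
{
	unsigned char *page;

	if (input_size > SKETCH_PAGE_SIZE)
		return NULL;

	/* calloc() plays the role of GFP_KERNEL | __GFP_ZERO. */
	page = calloc(1, SKETCH_PAGE_SIZE);
	if (!page)
		return NULL;

	/* A short input initializes only a prefix; the rest stays zero. */
	memcpy(page, user_in, input_size);
	return page;
}
```

With a plain malloc() instead of calloc(), the bytes past input_size would
be indeterminate, which is the situation the patch avoids.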

Signed-off-by: Yousef Alhouseen <alhouseenyousef@gmail.com>
---
 drivers/hv/mshv_vtl_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hv/mshv_vtl_main.c b/drivers/hv/mshv_vtl_main.c
index 0365d207c..f2633148c 100644
--- a/drivers/hv/mshv_vtl_main.c
+++ b/drivers/hv/mshv_vtl_main.c
@@ -1146,7 +1146,7 @@ static int mshv_vtl_hvcall_call(struct mshv_vtl_hvcall_fd *fd,
 	 *
 	 * TODO: Take care of this when CVM support is added.
 	 */
-	in = (void *)__get_free_page(GFP_KERNEL);
+	in = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
 	out = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
 	if (!in || !out) {
 		ret = -ENOMEM;
-- 
2.54.0



* [PATCH net] net: ethtool: keep rtnl_lock for ops using ethtool_op_get_link()
From: Jakub Kicinski @ 2026-06-24 19:04 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, Jakub Kicinski,
	Breno Leitao, joshwash, hramamurthy, anthony.l.nguyen,
	przemyslaw.kitszel, saeedm, tariqt, mbloch, leon, alexanderduyck,
	kernel-team, kys, haiyangz, wei.liu, decui, longli, jordanrhee,
	jacob.e.keller, nktgrg, debarghyak, mohsin.bashr, ernis, sdf, gal,
	linux-rdma, linux-hyperv

Breno reports following splats on mlx5:

  RTNL: assertion failed at net/core/dev.c (2241)
  WARNING: net/core/dev.c:2241 at netif_state_change+0xed/0x130, CPU#5: ethtool/1335
  RIP: 0010:netif_state_change+0xf9/0x130
  Call Trace:
    <TASK>
     __linkwatch_sync_dev+0xea/0x120
     ethtool_op_get_link+0xe/0x20
     __ethtool_get_link+0x26/0x40
     linkstate_prepare_data+0x51/0x200
     ethnl_default_doit+0x213/0x470
     genl_family_rcv_msg_doit+0xdd/0x110

Looks like I missed ethtool_op_get_link() trying to sync linkwatch,
which needs rtnl_lock. Not all drivers do this - bnxt doesn't,
it just returns the link state, so add an opt-in bit.
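
For illustration only, the opt-in check can be modeled in userspace as
below. The bit positions for RSS (7) and GLINK (8) follow the patch; the
struct, enum and function names are simplified stand-ins, not the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n)			(1u << (n))
#define OP_NEEDS_RTNL_RSS	BIT(7)
#define OP_NEEDS_RTNL_GLINK	BIT(8)

/* Hypothetical command IDs standing in for the ETHTOOL_MSG_* values. */
enum sketch_cmd {
	SKETCH_CMD_RSS_SET,
	SKETCH_CMD_LINKSTATE_GET,
	SKETCH_CMD_OTHER,
};

/* Stand-in for the op_needs_rtnl member of struct ethtool_ops. */
struct sketch_ops {
	uint32_t op_needs_rtnl;
};

/* A command keeps rtnl_lock only if the driver set the matching bit. */
static bool sketch_needs_rtnl(const struct sketch_ops *ops, enum sketch_cmd cmd)
{
	switch (cmd) {
	case SKETCH_CMD_RSS_SET:
		return ops->op_needs_rtnl & OP_NEEDS_RTNL_RSS;
	case SKETCH_CMD_LINKSTATE_GET:
		return ops->op_needs_rtnl & OP_NEEDS_RTNL_GLINK;
	default:
		return false;
	}
}
```

A driver whose get_link only reads cached state leaves the bit clear and
keeps the lockless path.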

Reported-by: Breno Leitao <leitao@debian.org>
Fixes: 45079e00133e ("net: ethtool: optionally skip rtnl_lock on Netlink path for GET ops")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: joshwash@google.com
CC: hramamurthy@google.com
CC: anthony.l.nguyen@intel.com
CC: przemyslaw.kitszel@intel.com
CC: saeedm@nvidia.com
CC: tariqt@nvidia.com
CC: mbloch@nvidia.com
CC: leon@kernel.org
CC: alexanderduyck@fb.com
CC: kernel-team@meta.com
CC: kys@microsoft.com
CC: haiyangz@microsoft.com
CC: wei.liu@kernel.org
CC: decui@microsoft.com
CC: longli@microsoft.com
CC: jordanrhee@google.com
CC: jacob.e.keller@intel.com
CC: nktgrg@google.com
CC: debarghyak@google.com
CC: leitao@debian.org
CC: mohsin.bashr@gmail.com
CC: ernis@linux.microsoft.com
CC: sdf@fomichev.me
CC: gal@nvidia.com
CC: linux-rdma@vger.kernel.org
CC: linux-hyperv@vger.kernel.org
---
 include/linux/ethtool.h                                 | 2 ++
 net/ethtool/common.h                                    | 4 ++++
 drivers/net/ethernet/google/gve/gve_ethtool.c           | 3 ++-
 drivers/net/ethernet/intel/iavf/iavf_ethtool.c          | 1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c    | 3 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c        | 3 ++-
 drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c | 4 +++-
 drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c         | 3 ++-
 drivers/net/ethernet/microsoft/mana/mana_ethtool.c      | 3 ++-
 9 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 1b834e2a522e..5d491a98265e 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -942,6 +942,7 @@ struct kernel_ethtool_ts_info {
 #define ETHTOOL_OP_NEEDS_RTNL_GPAUSEPARAM	BIT(5)
 #define ETHTOOL_OP_NEEDS_RTNL_SPAUSEPARAM	BIT(6)
 #define ETHTOOL_OP_NEEDS_RTNL_RSS		BIT(7)
+#define ETHTOOL_OP_NEEDS_RTNL_GLINK		BIT(8)
 
 /**
  * struct ethtool_ops - optional netdev operations
@@ -978,6 +979,7 @@ struct kernel_ethtool_ts_info {
  *	 - phylink helpers (note that phydev is currently unsupported!)
  *	 - netdev_update_features()
  *	 - netif_set_real_num_tx_queues()
+ *	 - ethtool_op_get_link() (syncs link watch under rtnl_lock)
  *
  * @get_drvinfo: Report driver/device information. Modern drivers no
  *	longer have to implement this callback. Most fields are
diff --git a/net/ethtool/common.h b/net/ethtool/common.h
index 2b3847f00801..4e5356e26f40 100644
--- a/net/ethtool/common.h
+++ b/net/ethtool/common.h
@@ -113,6 +113,8 @@ ethtool_nl_msg_needs_rtnl(const struct net_device *dev, u8 cmd)
 		return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_SPAUSEPARAM;
 	case ETHTOOL_MSG_RSS_SET:
 		return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_RSS;
+	case ETHTOOL_MSG_LINKSTATE_GET:
+		return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_GLINK;
 	case ETHTOOL_MSG_TSCONFIG_GET:
 	case ETHTOOL_MSG_TSCONFIG_SET:
 		/* tsconfig calls ndos (ndo_hwtstamp_set/get), not ethtool ops.
@@ -159,6 +161,8 @@ ethtool_ioctl_needs_rtnl(const struct net_device *dev, u32 ethcmd)
 	case ETHTOOL_SRXFH:
 	case ETHTOOL_SRXFHINDIR:
 		return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_RSS;
+	case ETHTOOL_GLINK:
+		return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_GLINK;
 	}
 	return false;
 }
diff --git a/drivers/net/ethernet/google/gve/gve_ethtool.c b/drivers/net/ethernet/google/gve/gve_ethtool.c
index 7cc22916852f..8199738ba979 100644
--- a/drivers/net/ethernet/google/gve/gve_ethtool.c
+++ b/drivers/net/ethernet/google/gve/gve_ethtool.c
@@ -984,7 +984,8 @@ const struct ethtool_ops gve_ethtool_ops = {
 	.supported_ring_params = ETHTOOL_RING_USE_TCP_DATA_SPLIT |
 				 ETHTOOL_RING_USE_RX_BUF_LEN,
 	.op_needs_rtnl = ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
-			 ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM,
+			 ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
+			 ETHTOOL_OP_NEEDS_RTNL_GLINK,
 	.get_drvinfo = gve_get_drvinfo,
 	.get_strings = gve_get_strings,
 	.get_sset_count = gve_get_sset_count,
diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
index a615d599b88e..e7cf12eaa268 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
@@ -1855,6 +1855,7 @@ static const struct ethtool_ops iavf_ethtool_ops = {
 	.supported_coalesce_params = ETHTOOL_COALESCE_USECS |
 				     ETHTOOL_COALESCE_USE_ADAPTIVE,
 	.supported_input_xfrm	= RXH_XFRM_SYM_XOR,
+	.op_needs_rtnl		= ETHTOOL_OP_NEEDS_RTNL_GLINK,
 	.get_drvinfo		= iavf_get_drvinfo,
 	.get_link		= ethtool_op_get_link,
 	.get_ringparam		= iavf_get_ringparam,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 2f5b626ba33f..112926d07634 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -2721,7 +2721,8 @@ const struct ethtool_ops mlx5e_ethtool_ops = {
 	.rxfh_max_num_contexts	= MLX5E_MAX_NUM_RSS,
 	.op_needs_rtnl		= ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
 				  ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
-				  ETHTOOL_OP_NEEDS_RTNL_SPFLAGS,
+				  ETHTOOL_OP_NEEDS_RTNL_SPFLAGS |
+				  ETHTOOL_OP_NEEDS_RTNL_GLINK,
 	.supported_coalesce_params = ETHTOOL_COALESCE_USECS |
 				     ETHTOOL_COALESCE_MAX_FRAMES |
 				     ETHTOOL_COALESCE_USE_ADAPTIVE |
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 1a8a19f980d3..c8b76d301c92 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -419,7 +419,8 @@ static const struct ethtool_ops mlx5e_rep_ethtool_ops = {
 				     ETHTOOL_COALESCE_MAX_FRAMES |
 				     ETHTOOL_COALESCE_USE_ADAPTIVE,
 	.op_needs_rtnl	   = ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
-			     ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM,
+			     ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
+			     ETHTOOL_OP_NEEDS_RTNL_GLINK,
 	.get_drvinfo	   = mlx5e_rep_get_drvinfo,
 	.get_link	   = ethtool_op_get_link,
 	.get_strings       = mlx5e_rep_get_strings,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
index 9b3b32408c64..01ddc3def9ac 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
@@ -286,7 +286,8 @@ const struct ethtool_ops mlx5i_ethtool_ops = {
 				     ETHTOOL_COALESCE_MAX_FRAMES |
 				     ETHTOOL_COALESCE_USE_ADAPTIVE,
 	.op_needs_rtnl	    = ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
-			      ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM,
+			      ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
+			      ETHTOOL_OP_NEEDS_RTNL_GLINK,
 	.get_drvinfo        = mlx5i_get_drvinfo,
 	.get_strings        = mlx5i_get_strings,
 	.get_sset_count     = mlx5i_get_sset_count,
@@ -309,6 +310,7 @@ const struct ethtool_ops mlx5i_ethtool_ops = {
 };
 
 const struct ethtool_ops mlx5i_pkey_ethtool_ops = {
+	.op_needs_rtnl	    = ETHTOOL_OP_NEEDS_RTNL_GLINK,
 	.get_drvinfo        = mlx5i_get_drvinfo,
 	.get_link           = ethtool_op_get_link,
 	.get_ts_info        = mlx5i_get_ts_info,
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
index cb34fc166ef9..0e47088ec44b 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
@@ -2024,7 +2024,8 @@ static const struct ethtool_ops fbnic_ethtool_ops = {
 					  ETHTOOL_OP_NEEDS_RTNL_GPAUSEPARAM |
 					  ETHTOOL_OP_NEEDS_RTNL_SPAUSEPARAM |
 					  ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
-					  ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM,
+					  ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
+					  ETHTOOL_OP_NEEDS_RTNL_GLINK,
 	.get_drvinfo			= fbnic_get_drvinfo,
 	.get_regs_len			= fbnic_get_regs_len,
 	.get_regs			= fbnic_get_regs,
diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
index 94e658d07a27..881df597d7f9 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
@@ -597,7 +597,8 @@ static int mana_get_link_ksettings(struct net_device *ndev,
 const struct ethtool_ops mana_ethtool_ops = {
 	.supported_coalesce_params = ETHTOOL_COALESCE_RX_CQE_FRAMES,
 	.op_needs_rtnl		= ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
-				  ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM,
+				  ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
+				  ETHTOOL_OP_NEEDS_RTNL_GLINK,
 	.get_ethtool_stats	= mana_get_ethtool_stats,
 	.get_sset_count		= mana_get_sset_count,
 	.get_strings		= mana_get_strings,
-- 
2.54.0



* [PATCH net v2 0/2] Fix MANA RX with bounce buffering
From: Dexuan Cui @ 2026-06-24 22:26 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, kotaranov, horms, ernis, dipayanroy, kees,
	jacob.e.keller, ssengar, linux-hyperv, netdev, linux-kernel,
	linux-rdma

With swiotlb=force, the MANA NIC fails to work properly due to commit
730ff06d3f5c ("net: mana: Use page pool fragments for RX buffers instead
of full pages to improve memory efficiency.")

Dipayaan tried to fix this by avoiding page pool frags when bounce
buffering is in use [1][2]. However, that is not a clean solution: no
other NIC drivers need to explicitly check whether bounce buffering is
in use. It is also not good for throughput, since
dma_map_single()/dma_unmap_single() are then called for each incoming
packet.

In fact, page pool frags can still be used with the standard MTU of
1500: all we need is to add page_pool_dma_sync_for_cpu() before the CPU
reads the incoming packet, so I implemented that in v1 [3].

As Simon suggested [4], this version splits v1 into two patches:
Patch 1 adds page_pool_dma_sync_for_cpu().
Patch 2 validates the packet length reported by the NIC.

There is no functional difference between v1 and v2, so I am keeping
Haiyang's Reviewed-by tag in v2.

Please review. Thanks!

Note that, with jumbo MTU and XDP, page pool frags are not used, and
dma_map_single()/dma_unmap_single() are still called for each incoming
packet, causing poor throughput with swiotlb=force; see
mana_get_rxbuf_cfg() and mana_refill_rx_oob() -> mana_get_rxfrag().
The jumbo MTU/XDP issue will be addressed later since that needs more
consideration if we want to use page pool with PP_FLAG_DMA_MAP there:
e.g., for XDP, the received packet can be transmitted in place, i.e. the
same RX buffer can be used as a TX buffer:
mana_rx_skb() -> mana_xdp_tx() -> mana_start_xmit() -> mana_map_skb().

In mana_create_page_pool(), we may have to set pprm.dma_dir to
DMA_BIDIRECTIONAL if XDP is in use:
pprm.dma_dir = mana_xdp_get(mpc) ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE;

In the case of XDP, the next issue is that mana_rx_skb() -> ... ->
mana_map_skb() appears to call dma_map_single() on an RX buffer allocated
from a page pool created with PP_FLAG_DMA_MAP, which seems incorrect.
Any thoughts?

[1] https://lore.kernel.org/all/ae91hyrLf4n23XE6@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/#r
[2] https://lore.kernel.org/all/ae9pxvJfkAZYfKMf@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/
[3] https://lore.kernel.org/all/20260618035029.249361-1-decui@microsoft.com/
[4] https://lore.kernel.org/all/20260619090514.GT827683@horms.kernel.org/

Dexuan Cui (2):
  net: mana: Sync page pool RX frags for CPU
  net: mana: Validate the packet length reported by the NIC

 drivers/net/ethernet/microsoft/mana/mana_en.c | 61 +++++++++++++++----
 include/net/mana/mana.h                       |  8 +++
 2 files changed, 58 insertions(+), 11 deletions(-)

-- 
2.34.1



* [PATCH net v2 2/2] net: mana: Validate the packet length reported by the NIC
From: Dexuan Cui @ 2026-06-24 22:26 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, kotaranov, horms, ernis, dipayanroy, kees,
	jacob.e.keller, ssengar, linux-hyperv, netdev, linux-kernel,
	linux-rdma
  Cc: stable
In-Reply-To: <20260624222605.1794719-1-decui@microsoft.com>

Validate the packet length reported in the RX CQE before using it as a DMA
sync length or passing it to skb processing. The CQE is supplied by the
NIC device and should not be blindly trusted.
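
As a rough userspace sketch of the check (the struct and function names
are hypothetical, only the pktlen > datasize comparison mirrors the
patch):

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the fields of struct mana_rxq used here. */
struct sketch_rxq {
	uint32_t datasize;	/* size of each posted RX buffer */
	uint64_t rx_dropped;	/* count of packets dropped by the check */
};

/*
 * Returns true if a device-reported length fits in the posted buffer.
 * Otherwise the packet is counted as dropped and must not be processed.
 */
static bool sketch_rx_len_ok(struct sketch_rxq *rxq, uint32_t pktlen)
{
	if (pktlen > rxq->datasize) {
		rxq->rx_dropped++;
		return false;
	}
	return true;
}
```

A length exactly equal to datasize is accepted; only larger values are
rejected.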

Cc: stable@vger.kernel.org
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---

Changes since v1:
    v1 is split into two patches in v2.
    Add Haiyang's Reviewed-by.

 drivers/net/ethernet/microsoft/mana/mana_en.c | 24 +++++++++++++++----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 1875bffd82b7..0b44c51ae6ec 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -2190,12 +2190,26 @@ static void mana_process_rx_cqe(struct mana_rxq *rxq, struct mana_cq *cq,
 		rxbuf_oob = &rxq->rx_oobs[curr];
 		WARN_ON_ONCE(rxbuf_oob->wqe_inf.wqe_size_in_bu != 1);
 
-		mana_refill_rx_oob(dev, rxq, rxbuf_oob, pktlen, &old_buf, &old_fp);
+		if (unlikely(pktlen > rxq->datasize)) {
+			/* Increase it even if mana_rx_skb() isn't called. */
+			rxq->rx_cq.work_done++;
 
-		/* Unsuccessful refill will have old_buf == NULL.
-		 * In this case, mana_rx_skb() will drop the packet.
-		 */
-		mana_rx_skb(old_buf, old_fp, oob, rxq, i);
+			++ndev->stats.rx_dropped;
+			netdev_warn_once(ndev,
+				"Dropped oversized RX packet: len=%u, datasize=%u\n",
+				pktlen, rxq->datasize);
+
+			/* Reuse the RX buffer since rxbuf_oob is unchanged. */
+		} else {
+
+			mana_refill_rx_oob(dev, rxq, rxbuf_oob, pktlen,
+					   &old_buf, &old_fp);
+
+			/* Unsuccessful refill will have old_buf == NULL.
+			 * In this case, mana_rx_skb() will drop the packet.
+			 */
+			mana_rx_skb(old_buf, old_fp, oob, rxq, i);
+		}
 
 		mana_move_wq_tail(rxq->gdma_rq,
 				  rxbuf_oob->wqe_inf.wqe_size_in_bu);
-- 
2.34.1



* [PATCH net v2 1/2] net: mana: Sync page pool RX frags for CPU
From: Dexuan Cui @ 2026-06-24 22:26 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, kotaranov, horms, ernis, dipayanroy, kees,
	jacob.e.keller, ssengar, linux-hyperv, netdev, linux-kernel,
	linux-rdma
  Cc: stable
In-Reply-To: <20260624222605.1794719-1-decui@microsoft.com>

MANA allocates RX buffers from page pool fragments when frag_count is
greater than 1. In that case the buffers remain DMA mapped by page pool
and the RX completion path does not call dma_unmap_single(). As a result,
the implicit sync-for-CPU normally performed by dma_unmap_single() is
missing before the packet data is passed to the networking stack.

This breaks RX on configurations which require explicit DMA syncing, for
example when booted with swiotlb=force.

Fix this by recording the page pool page and DMA sync offset when the RX
buffer is allocated, and syncing the received packet range for CPU access
before handing the RX buffer to the stack.
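
A toy userspace model of why the sync matters with bounce buffering. This
is not how swiotlb or page pool are implemented; the two arrays and all
names are assumptions made for illustration. The device writes into a
bounce copy, and the CPU-visible buffer only sees the data after an
explicit sync of the received range.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_BUF_SIZE 2048

/* Hypothetical mapping: the driver reads cpu_buf, the device writes bounce. */
struct sketch_mapping {
	unsigned char cpu_buf[SKETCH_BUF_SIZE];
	unsigned char bounce[SKETCH_BUF_SIZE];
};

/* Models the NIC's DMA write. Caller keeps off + len within the buffer. */
static void sketch_device_write(struct sketch_mapping *m, size_t off,
				const void *data, size_t len)
{
	memcpy(m->bounce + off, data, len);
}

/* Models a sync-for-CPU of [off, off + len): copy bounce data back. */
static void sketch_sync_for_cpu(struct sketch_mapping *m, size_t off,
				size_t len)
{
	memcpy(m->cpu_buf + off, m->bounce + off, len);
}
```

Without the sync call, a reader of cpu_buf sees whatever was there before
the device write, which corresponds to the RX breakage described above.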

Fixes: 730ff06d3f5c ("net: mana: Use page pool fragments for RX buffers instead of full pages to improve memory efficiency.")
Cc: stable@vger.kernel.org
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---

Changes since v1:
    v1 is split into two patches in v2.
    Add Haiyang's Reviewed-by.

 drivers/net/ethernet/microsoft/mana/mana_en.c | 39 +++++++++++++++----
 include/net/mana/mana.h                       |  8 ++++
 2 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index c9b1df1ed109..1875bffd82b7 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -2044,12 +2044,16 @@ static void mana_rx_skb(void *buf_va, bool from_pool,
 }
 
 static void *mana_get_rxfrag(struct mana_rxq *rxq, struct device *dev,
-			     dma_addr_t *da, bool *from_pool)
+			     dma_addr_t *da, bool *from_pool,
+			     struct page **pp_page, u32 *dma_sync_offset)
 {
 	struct page *page;
 	u32 offset;
 	void *va;
+
 	*from_pool = false;
+	*pp_page = NULL;
+	*dma_sync_offset = 0;
 
 	/* Don't use fragments for jumbo frames or XDP where it's 1 fragment
 	 * per page.
@@ -2087,31 +2091,47 @@ static void *mana_get_rxfrag(struct mana_rxq *rxq, struct device *dev,
 	va  = page_to_virt(page) + offset;
 	*da = page_pool_get_dma_addr(page) + offset + rxq->headroom;
 	*from_pool = true;
+	*pp_page = page;
+	*dma_sync_offset = offset + rxq->headroom;
 
 	return va;
 }
 
 /* Allocate frag for rx buffer, and save the old buf */
 static void mana_refill_rx_oob(struct device *dev, struct mana_rxq *rxq,
-			       struct mana_recv_buf_oob *rxoob, void **old_buf,
-			       bool *old_fp)
+			       struct mana_recv_buf_oob *rxoob, u32 pktlen,
+			       void **old_buf, bool *old_fp)
 {
+	struct page *pp_page;
+	u32 dma_sync_offset;
 	bool from_pool;
 	dma_addr_t da;
 	void *va;
 
-	va = mana_get_rxfrag(rxq, dev, &da, &from_pool);
+	va = mana_get_rxfrag(rxq, dev, &da, &from_pool, &pp_page,
+			     &dma_sync_offset);
 	if (!va)
 		return;
-	if (!rxoob->from_pool || rxq->frag_count == 1)
+	if (!rxoob->from_pool || rxq->frag_count == 1) {
 		dma_unmap_single(dev, rxoob->sgl[0].address, rxq->datasize,
 				 DMA_FROM_DEVICE);
+	} else {
+		/* The page pool maps the whole page and only syncs for device
+		 * automatically (PP_FLAG_DMA_SYNC_DEV). Sync the received bytes
+		 * for the CPU before they are read: this is required if DMA
+		 * is incoherent or bounce buffers are used.
+		 */
+		page_pool_dma_sync_for_cpu(rxq->page_pool, rxoob->pp_page,
+					   rxoob->dma_sync_offset, pktlen);
+	}
 	*old_buf = rxoob->buf_va;
 	*old_fp = rxoob->from_pool;
 
 	rxoob->buf_va = va;
 	rxoob->sgl[0].address = da;
 	rxoob->from_pool = from_pool;
+	rxoob->pp_page = pp_page;
+	rxoob->dma_sync_offset = dma_sync_offset;
 }
 
 static void mana_process_rx_cqe(struct mana_rxq *rxq, struct mana_cq *cq,
@@ -2170,7 +2190,7 @@ static void mana_process_rx_cqe(struct mana_rxq *rxq, struct mana_cq *cq,
 		rxbuf_oob = &rxq->rx_oobs[curr];
 		WARN_ON_ONCE(rxbuf_oob->wqe_inf.wqe_size_in_bu != 1);
 
-		mana_refill_rx_oob(dev, rxq, rxbuf_oob, &old_buf, &old_fp);
+		mana_refill_rx_oob(dev, rxq, rxbuf_oob, pktlen, &old_buf, &old_fp);
 
 		/* Unsuccessful refill will have old_buf == NULL.
 		 * In this case, mana_rx_skb() will drop the packet.
@@ -2566,6 +2586,8 @@ static int mana_fill_rx_oob(struct mana_recv_buf_oob *rx_oob, u32 mem_key,
 			    struct mana_rxq *rxq, struct device *dev)
 {
 	struct mana_port_context *mpc = netdev_priv(rxq->ndev);
+	struct page *pp_page = NULL;
+	u32 dma_sync_offset = 0;
 	bool from_pool = false;
 	dma_addr_t da;
 	void *va;
@@ -2573,13 +2595,16 @@ static int mana_fill_rx_oob(struct mana_recv_buf_oob *rx_oob, u32 mem_key,
 	if (mpc->rxbufs_pre)
 		va = mana_get_rxbuf_pre(rxq, &da);
 	else
-		va = mana_get_rxfrag(rxq, dev, &da, &from_pool);
+		va = mana_get_rxfrag(rxq, dev, &da, &from_pool, &pp_page,
+				     &dma_sync_offset);
 
 	if (!va)
 		return -ENOMEM;
 
 	rx_oob->buf_va = va;
 	rx_oob->from_pool = from_pool;
+	rx_oob->pp_page = pp_page;
+	rx_oob->dma_sync_offset = dma_sync_offset;
 
 	rx_oob->sgl[0].address = da;
 	rx_oob->sgl[0].size = rxq->datasize;
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 8f721cd4e4a7..4111b93169d2 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -305,6 +305,14 @@ struct mana_recv_buf_oob {
 
 	void *buf_va;
 	bool from_pool; /* allocated from a page pool */
+	/* head page of the page_pool fragment; valid only when
+	 * from_pool && frag_count > 1.
+	 */
+	struct page *pp_page;
+	/* Fragment offset plus rxq->headroom, passed to
+	 * page_pool_dma_sync_for_cpu().
+	 */
+	u32 dma_sync_offset;
 
 	/* SGL of the buffer going to be sent as part of the work request. */
 	u32 num_sge;
-- 
2.34.1



* RE: [EXTERNAL] Re: [PATCH net] net: mana: Sync page pool RX frags for CPU
From: Dexuan Cui @ 2026-06-24 22:50 UTC (permalink / raw)
  To: Simon Horman
  Cc: KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org, Long Li,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, Konstantin Taranov,
	ernis@linux.microsoft.com, dipayanroy@linux.microsoft.com,
	kees@kernel.org, jacob.e.keller@intel.com,
	ssengar@linux.microsoft.com, linux-hyperv@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org, stable@vger.kernel.org
In-Reply-To: <20260619090514.GT827683@horms.kernel.org>

> From: Simon Horman <horms@kernel.org>
> Sent: Friday, June 19, 2026 2:05 AM
> > ...
> > Also validate the packet length reported in the RX CQE before using it as
> > a DMA sync length or passing it to skb processing. The CQE is supplied
> > by the device and should not be blindly trusted by Confidential VMs.
> 
> I think this last part warrants being split out into a separate patch.

Sorry for the late reply. I split v1 into two patches in v2, which I just posted:
https://lwn.net/ml/linux-kernel/20260624222605.1794719-1-decui@microsoft.com/
 
Thanks,
Dexuan


* Re: [PATCH net] net: mana: Fall back to standard MTU when PF reports adapter_mtu of 0
From: patchwork-bot+netdevbpf @ 2026-06-25  1:08 UTC (permalink / raw)
  To: Erni Sri Satya Vennela
  Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, dipayanroy, ssengar, jacob.e.keller,
	horms, gargaditya, kees, linux-hyperv, netdev, linux-kernel, bpf
In-Reply-To: <20260619055348.467224-1-ernis@linux.microsoft.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 18 Jun 2026 22:53:38 -0700 you wrote:
> Commit d7709812e13d ("net: mana: hardening: Validate adapter_mtu from
> MANA_QUERY_DEV_CONFIG") rejected any adapter_mtu value smaller than
> ETH_MIN_MTU + ETH_HLEN, including 0, returning -EPROTO and failing
> mana_probe().
> 
> Some older PF firmware versions still in the field report
> adapter_mtu as 0 in the MANA_QUERY_DEV_CONFIG response. With the
> hardening check in place, the MANA VF driver now fails to load on
> those hosts, breaking networking entirely for guests.
> 
> [...]

Here is the summary with links:
  - [net] net: mana: Fall back to standard MTU when PF reports adapter_mtu of 0
    https://git.kernel.org/netdev/net/c/6bd81a5b4e0d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net] net: ethtool: keep rtnl_lock for ops using ethtool_op_get_link()
From: Stanislav Fomichev @ 2026-06-25 15:48 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
	Breno Leitao, joshwash, hramamurthy, anthony.l.nguyen,
	przemyslaw.kitszel, saeedm, tariqt, mbloch, leon, alexanderduyck,
	kernel-team, kys, haiyangz, wei.liu, decui, longli, jordanrhee,
	jacob.e.keller, nktgrg, debarghyak, mohsin.bashr, ernis, sdf, gal,
	linux-rdma, linux-hyperv
In-Reply-To: <20260624190439.2521219-1-kuba@kernel.org>

On 06/24, Jakub Kicinski wrote:
> Breno reports following splats on mlx5:
> 
>   RTNL: assertion failed at net/core/dev.c (2241)
>   WARNING: net/core/dev.c:2241 at netif_state_change+0xed/0x130, CPU#5: ethtool/1335
>   RIP: 0010:netif_state_change+0xf9/0x130
>   Call Trace:
>     <TASK>
>      __linkwatch_sync_dev+0xea/0x120
>      ethtool_op_get_link+0xe/0x20
>      __ethtool_get_link+0x26/0x40
>      linkstate_prepare_data+0x51/0x200
>      ethnl_default_doit+0x213/0x470
>      genl_family_rcv_msg_doit+0xdd/0x110
> 
> Looks like I missed ethtool_op_get_link() trying to sync linkwatch,
> which needs rtnl_lock. Not all drivers do this - bnxt doesn't,
> it just returns the link state, so add an opt-in bit.
> 
> Reported-by: Breno Leitao <leitao@debian.org>
> Fixes: 45079e00133e ("net: ethtool: optionally skip rtnl_lock on Netlink path for GET ops")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Acked-by: Stanislav Fomichev <sdf@fomichev.me>


* Re: [PATCH v5 net] net: mana: Optimize irq affinity for low vcpu configs
From: patchwork-bot+netdevbpf @ 2026-06-25 16:20 UTC (permalink / raw)
  To: Shradha Gupta
  Cc: decui, wei.liu, haiyangz, kys, andrew+netdev, davem, edumazet,
	kuba, pabeni, kotaranov, horms, ernis, dipayanroy, shirazsaleem,
	mhklinux, longli, yury.norov, linux-hyperv, linux-kernel, netdev,
	paulros, shradhagupta, ssengar, stable, ynorov
In-Reply-To: <20260624072138.1632849-1-shradhagupta@linux.microsoft.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 24 Jun 2026 00:21:35 -0700 you wrote:
> Before the commit 755391121038 ("net: mana: Allocate MSI-X vectors
> dynamically"), all the MANA IRQs were assigned statically and together
> during early driver load.
> 
> After this commit, the IRQ allocation for MANA was done in two phases.
> HWC IRQ allocated earlier and then, queue IRQs dynamically added at a
> later point. By this time, the IRQ weights on vCPUs can become imbalanced
> and if IRQ count is greater than the vCPU count the topology aware IRQ
> distribution logic in MANA can cause multiple MANA IRQs to land on the
> same vCPUs, while other sibling vCPUs have none (case 1).
> 
> [...]

Here is the summary with links:
  - [v5,net] net: mana: Optimize irq affinity for low vcpu configs
    https://git.kernel.org/netdev/net/c/5316394b1752

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* RE: [PATCH] hyperv: mshv: zero VTL hypercall output page
From: Michael Kelley @ 2026-06-25 16:41 UTC (permalink / raw)
  To: Yousef Alhouseen, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20260624172157.2790-1-alhouseenyousef@gmail.com>

From: Yousef Alhouseen <alhouseenyousef@gmail.com> Sent: Wednesday, June 24, 2026 10:22 AM

> Subject: [PATCH] hyperv: mshv: zero VTL hypercall output page

There was a recent discussion about what prefix to use in the patch
"Subject:" field for changes to MSHV VTL code. The agreement was to
use just "mshv_vtl:". See [1].

[1] https://lore.kernel.org/linux-hyperv/a0d271e3-ece8-45cf-9dbb-ced773d6f3f8@linux.microsoft.com/

> 
> mshv_vtl_hvcall_call() copies output_size bytes from a freshly allocated
> hypercall output page back to userspace. The page is currently allocated
> without __GFP_ZERO, so any bytes not written by the hypervisor are copied
> from stale page contents.

This is a good find! Even though the VTL user space code is somewhat trusted,
there should not be any circumstances where the kernel could copy random
garbage to user space.

> 
> Allocate the output page zeroed before issuing the hypercall.

Hypercall output is usually no more than a few tens of bytes. Zeroing
the entire page is a bit expensive. It would be sufficient to just zero
output_size bytes.

Standard practice is to *not* zero the hypercall output area, since
the hypercall invoker knows exactly how many bytes Hyper-V will
return for a particular hypercall, and Hyper-V is responsible for not
leaving any garbage. So it would be good to leave a code comment
here about why the output area is being zeroed contrary to that
standard practice.

I would note that many hypercalls don't return any output other
than the hypercall status. If output_size is zero, allocating the
output page could be skipped. But that's a further
optimization for another patch.
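
To make the two suggestions concrete, here is a userspace sketch under
stated assumptions: malloc() stands in for __get_free_page(GFP_KERNEL),
and the helper name, return convention and SKETCH_PAGE_SIZE are invented
for illustration. It combines both ideas in one function even though they
would likely be separate patches.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/*
 * Hypothetical helper, not the driver's code. Returns 0 on success and -1
 * on failure. *out stays NULL when output_size is 0, since no output page
 * is needed in that case.
 */
static int sketch_alloc_out(size_t output_size, unsigned char **out)
{
	*out = NULL;
	if (output_size > SKETCH_PAGE_SIZE)
		return -1;
	if (output_size == 0)
		return 0;

	/* malloc() stands in for __get_free_page(GFP_KERNEL). */
	*out = malloc(SKETCH_PAGE_SIZE);
	if (!*out)
		return -1;

	/* Zero only the bytes that will later be copied to user space. */
	memset(*out, 0, output_size);
	return 0;
}
```

The rest of the page is left untouched because it is never copied out.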

> Also check
> both bounce-page allocations before using them so memory pressure cannot
> turn the copy paths into NULL pointer dereferences.
> 
> Signed-off-by: Yousef Alhouseen <alhouseenyousef@gmail.com>
> ---
>  drivers/hv/mshv_vtl_main.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/hv/mshv_vtl_main.c b/drivers/hv/mshv_vtl_main.c
> index 0d3d41619..0365d207c 100644
> --- a/drivers/hv/mshv_vtl_main.c
> +++ b/drivers/hv/mshv_vtl_main.c
> @@ -1147,7 +1147,11 @@ static int mshv_vtl_hvcall_call(struct mshv_vtl_hvcall_fd *fd,
>  	 * TODO: Take care of this when CVM support is added.
>  	 */
>  	in = (void *)__get_free_page(GFP_KERNEL);
> -	out = (void *)__get_free_page(GFP_KERNEL);
> +	out = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
> +	if (!in || !out) {
> +		ret = -ENOMEM;
> +		goto free_pages;
> +	}
> 
>  	if (copy_from_user(in, (void __user *)hvcall.input_ptr, hvcall.input_size)) {
>  		ret = -EFAULT;
> @@ -1162,8 +1166,10 @@ static int mshv_vtl_hvcall_call(struct mshv_vtl_hvcall_fd *fd,
>  	}
>  	ret = put_user(hvcall.status, &hvcall_user->status);
>  free_pages:
> -	free_page((unsigned long)in);
> -	free_page((unsigned long)out);
> +	if (in)
> +		free_page((unsigned long)in);
> +	if (out)
> +		free_page((unsigned long)out);

Testing "in" and "out" here isn't necessary. free_page()
already has code to do nothing if its argument is zero.
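
A userspace sketch of the simplified cleanup path, under the assumption
that free() can stand in for free_page() (free(NULL) is likewise a
no-op). model_alloc_pair() and its fail_out knob are hypothetical and
exist only to exercise the error path.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Userspace model: free() plays the role of free_page(). Because
 * free(NULL) does nothing, the cleanup label needs no "if (in)" or
 * "if (out)" tests. 'fail_out' simulates the second allocation failing.
 */
static int model_alloc_pair(int fail_out)
{
	void *in, *out;
	int ret = 0;

	in = malloc(4096);
	out = fail_out ? NULL : malloc(4096);
	if (!in || !out) {
		ret = -1;	/* the driver would return -ENOMEM */
		goto free_pages;
	}

	/* Both buffers are valid here; the real code would use them. */
	assert(in && out);

free_pages:
	free(in);	/* safe even when in == NULL */
	free(out);	/* safe even when out == NULL */
	return ret;
}
```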

Michael

> 
>  	return ret;
>  }
> --
> 2.54.0
> 


* RE: [PATCH] hyperv: mshv: zero VTL hypercall input page
From: Michael Kelley @ 2026-06-25 16:41 UTC (permalink / raw)
  To: Yousef Alhouseen, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20260624175703.9285-1-alhouseenyousef@gmail.com>

From: Yousef Alhouseen <alhouseenyousef@gmail.com> Sent: Wednesday, June 24, 2026 10:57 AM
> Subject: [PATCH] hyperv: mshv: zero VTL hypercall input page
> 

Same comment here about the patch "Subject:" prefix.

> mshv_vtl_hvcall_call() copies only the user-provided input size.
> 
> It then passes the page to hv_do_hypercall().
> 
> For short inputs, stale bytes can remain in the bounce page.
> 
> Those bytes can be consumed by the hypervisor.

It's unclear to me that there's really a problem here. In a
CoCo VM, the host hypervisor isn't trusted, so hypercall sites
must be careful to only expose intended data in the hypercall
input and output pages. But this code already doesn't support
CoCo VMs, as noted in the comment. So in the supported
scenario, the hypervisor has access to all of guest memory. Passing
stale bytes to the hypervisor vs. passing zeros really wouldn't matter.
And user space can already pass stale/garbage bytes to the hypervisor
if it wants to. This code doesn't try to validate the input data for
whatever hypercall user space is requesting to be made.

When support for CoCo VMs is added, this code will indeed
need to make sure not to allow garbage kernel data in the
hypercall input or output pages. But decrypting the pages
so the hypervisor can access them should take care of that
issue.

Michael

> 
> Allocate the input page zeroed, matching the output page.
> 
> Signed-off-by: Yousef Alhouseen <alhouseenyousef@gmail.com>
> ---
>  drivers/hv/mshv_vtl_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/hv/mshv_vtl_main.c b/drivers/hv/mshv_vtl_main.c
> index 0365d207c..f2633148c 100644
> --- a/drivers/hv/mshv_vtl_main.c
> +++ b/drivers/hv/mshv_vtl_main.c
> @@ -1146,7 +1146,7 @@ static int mshv_vtl_hvcall_call(struct mshv_vtl_hvcall_fd *fd,
>  	 *
>  	 * TODO: Take care of this when CVM support is added.
>  	 */
> -	in = (void *)__get_free_page(GFP_KERNEL);
> +	in = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
>  	out = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
>  	if (!in || !out) {
>  		ret = -ENOMEM;
> --
> 2.54.0
> 


* Re: [PATCH net] net: ethtool: keep rtnl_lock for ops using ethtool_op_get_link()
From: Breno Leitao @ 2026-06-25 16:47 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Jakub Kicinski, davem, netdev, edumazet, pabeni, andrew+netdev,
	horms, joshwash, hramamurthy, anthony.l.nguyen,
	przemyslaw.kitszel, saeedm, tariqt, mbloch, leon, alexanderduyck,
	kernel-team, kys, haiyangz, wei.liu, decui, longli, jordanrhee,
	jacob.e.keller, nktgrg, debarghyak, mohsin.bashr, ernis, sdf, gal,
	linux-rdma, linux-hyperv
In-Reply-To: <aj1Nqe3RoITzxSEb@devvm7509.cco0.facebook.com>

On Thu, Jun 25, 2026 at 08:48:03AM -0700, Stanislav Fomichev wrote:
> On 06/24, Jakub Kicinski wrote:
> > Breno reports following splats on mlx5:
> > 
> >   RTNL: assertion failed at net/core/dev.c (2241)
> >   WARNING: net/core/dev.c:2241 at netif_state_change+0xed/0x130, CPU#5: ethtool/1335
> >   RIP: 0010:netif_state_change+0xf9/0x130
> >   Call Trace:
> >     <TASK>
> >      __linkwatch_sync_dev+0xea/0x120
> >      ethtool_op_get_link+0xe/0x20
> >      __ethtool_get_link+0x26/0x40
> >      linkstate_prepare_data+0x51/0x200
> >      ethnl_default_doit+0x213/0x470
> >      genl_family_rcv_msg_doit+0xdd/0x110
> > 
> > Looks like I missed ethtool_op_get_link() trying to sync linkwatch,
> > which needs rtnl_lock. Not all drivers do this - bnxt doesn't,
> > it just returns the link state, so add an opt-in bit.
> > 
> > Reported-by: Breno Leitao <leitao@debian.org>
> > Fixes: 45079e00133e ("net: ethtool: optionally skip rtnl_lock on Netlink path for GET ops")
> > Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> 
> Acked-by: Stanislav Fomichev <sdf@fomichev.me>

Reviewed-by: Breno Leitao <leitao@debian.org>

* Re: [PATCH net] net: ethtool: keep rtnl_lock for ops using ethtool_op_get_link()
From: Harshitha Ramamurthy @ 2026-06-25 17:06 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
	Breno Leitao, joshwash, anthony.l.nguyen, przemyslaw.kitszel,
	saeedm, tariqt, mbloch, leon, alexanderduyck, kernel-team, kys,
	haiyangz, wei.liu, decui, longli, jordanrhee, jacob.e.keller,
	nktgrg, debarghyak, mohsin.bashr, ernis, sdf, gal, linux-rdma,
	linux-hyperv
In-Reply-To: <20260624190439.2521219-1-kuba@kernel.org>

On Wed, Jun 24, 2026 at 12:04 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Breno reports following splats on mlx5:
>
>   RTNL: assertion failed at net/core/dev.c (2241)
>   WARNING: net/core/dev.c:2241 at netif_state_change+0xed/0x130, CPU#5: ethtool/1335
>   RIP: 0010:netif_state_change+0xf9/0x130
>   Call Trace:
>     <TASK>
>      __linkwatch_sync_dev+0xea/0x120
>      ethtool_op_get_link+0xe/0x20
>      __ethtool_get_link+0x26/0x40
>      linkstate_prepare_data+0x51/0x200
>      ethnl_default_doit+0x213/0x470
>      genl_family_rcv_msg_doit+0xdd/0x110
>
> Looks like I missed ethtool_op_get_link() trying to sync linkwatch,
> which needs rtnl_lock. Not all drivers do this - bnxt doesn't,
> it just returns the link state, so add an opt-in bit.
>
> Reported-by: Breno Leitao <leitao@debian.org>
> Fixes: 45079e00133e ("net: ethtool: optionally skip rtnl_lock on Netlink path for GET ops")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: joshwash@google.com
> CC: hramamurthy@google.com
> CC: anthony.l.nguyen@intel.com
> CC: przemyslaw.kitszel@intel.com
> CC: saeedm@nvidia.com
> CC: tariqt@nvidia.com
> CC: mbloch@nvidia.com
> CC: leon@kernel.org
> CC: alexanderduyck@fb.com
> CC: kernel-team@meta.com
> CC: kys@microsoft.com
> CC: haiyangz@microsoft.com
> CC: wei.liu@kernel.org
> CC: decui@microsoft.com
> CC: longli@microsoft.com
> CC: jordanrhee@google.com
> CC: jacob.e.keller@intel.com
> CC: nktgrg@google.com
> CC: debarghyak@google.com
> CC: leitao@debian.org
> CC: mohsin.bashr@gmail.com
> CC: ernis@linux.microsoft.com
> CC: sdf@fomichev.me
> CC: gal@nvidia.com
> CC: linux-rdma@vger.kernel.org
> CC: linux-hyperv@vger.kernel.org
> ---
>  include/linux/ethtool.h                                 | 2 ++
>  net/ethtool/common.h                                    | 4 ++++
>  drivers/net/ethernet/google/gve/gve_ethtool.c           | 3 ++-
>  drivers/net/ethernet/intel/iavf/iavf_ethtool.c          | 1 +
>  drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c    | 3 ++-
>  drivers/net/ethernet/mellanox/mlx5/core/en_rep.c        | 3 ++-
>  drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c | 4 +++-
>  drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c         | 3 ++-
>  drivers/net/ethernet/microsoft/mana/mana_ethtool.c      | 3 ++-
>  9 files changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> index 1b834e2a522e..5d491a98265e 100644
> --- a/include/linux/ethtool.h
> +++ b/include/linux/ethtool.h
> @@ -942,6 +942,7 @@ struct kernel_ethtool_ts_info {
>  #define ETHTOOL_OP_NEEDS_RTNL_GPAUSEPARAM      BIT(5)
>  #define ETHTOOL_OP_NEEDS_RTNL_SPAUSEPARAM      BIT(6)
>  #define ETHTOOL_OP_NEEDS_RTNL_RSS              BIT(7)
> +#define ETHTOOL_OP_NEEDS_RTNL_GLINK            BIT(8)
>
>  /**
>   * struct ethtool_ops - optional netdev operations
> @@ -978,6 +979,7 @@ struct kernel_ethtool_ts_info {
>   *      - phylink helpers (note that phydev is currently unsupported!)
>   *      - netdev_update_features()
>   *      - netif_set_real_num_tx_queues()
> + *      - ethtool_op_get_link() (syncs link watch under rtnl_lock)
>   *
>   * @get_drvinfo: Report driver/device information. Modern drivers no
>   *     longer have to implement this callback. Most fields are
> diff --git a/net/ethtool/common.h b/net/ethtool/common.h
> index 2b3847f00801..4e5356e26f40 100644
> --- a/net/ethtool/common.h
> +++ b/net/ethtool/common.h
> @@ -113,6 +113,8 @@ ethtool_nl_msg_needs_rtnl(const struct net_device *dev, u8 cmd)
>                 return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_SPAUSEPARAM;
>         case ETHTOOL_MSG_RSS_SET:
>                 return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_RSS;
> +       case ETHTOOL_MSG_LINKSTATE_GET:
> +               return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_GLINK;
>         case ETHTOOL_MSG_TSCONFIG_GET:
>         case ETHTOOL_MSG_TSCONFIG_SET:
>                 /* tsconfig calls ndos (ndo_hwtstamp_set/get), not ethtool ops.
> @@ -159,6 +161,8 @@ ethtool_ioctl_needs_rtnl(const struct net_device *dev, u32 ethcmd)
>         case ETHTOOL_SRXFH:
>         case ETHTOOL_SRXFHINDIR:
>                 return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_RSS;
> +       case ETHTOOL_GLINK:
> +               return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_GLINK;
>         }
>         return false;
>  }
> diff --git a/drivers/net/ethernet/google/gve/gve_ethtool.c b/drivers/net/ethernet/google/gve/gve_ethtool.c
> index 7cc22916852f..8199738ba979 100644
> --- a/drivers/net/ethernet/google/gve/gve_ethtool.c
> +++ b/drivers/net/ethernet/google/gve/gve_ethtool.c
> @@ -984,7 +984,8 @@ const struct ethtool_ops gve_ethtool_ops = {
>         .supported_ring_params = ETHTOOL_RING_USE_TCP_DATA_SPLIT |
>                                  ETHTOOL_RING_USE_RX_BUF_LEN,
>         .op_needs_rtnl = ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
> -                        ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM,
> +                        ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
> +                        ETHTOOL_OP_NEEDS_RTNL_GLINK,

Acked-by: Harshitha Ramamurthy <hramamurthy@google.com>

Thanks for the fix!
>         .get_drvinfo = gve_get_drvinfo,
>         .get_strings = gve_get_strings,
>         .get_sset_count = gve_get_sset_count,
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
> index a615d599b88e..e7cf12eaa268 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
> @@ -1855,6 +1855,7 @@ static const struct ethtool_ops iavf_ethtool_ops = {
>         .supported_coalesce_params = ETHTOOL_COALESCE_USECS |
>                                      ETHTOOL_COALESCE_USE_ADAPTIVE,
>         .supported_input_xfrm   = RXH_XFRM_SYM_XOR,
> +       .op_needs_rtnl          = ETHTOOL_OP_NEEDS_RTNL_GLINK,
>         .get_drvinfo            = iavf_get_drvinfo,
>         .get_link               = ethtool_op_get_link,
>         .get_ringparam          = iavf_get_ringparam,
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
> index 2f5b626ba33f..112926d07634 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
> @@ -2721,7 +2721,8 @@ const struct ethtool_ops mlx5e_ethtool_ops = {
>         .rxfh_max_num_contexts  = MLX5E_MAX_NUM_RSS,
>         .op_needs_rtnl          = ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
>                                   ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
> -                                 ETHTOOL_OP_NEEDS_RTNL_SPFLAGS,
> +                                 ETHTOOL_OP_NEEDS_RTNL_SPFLAGS |
> +                                 ETHTOOL_OP_NEEDS_RTNL_GLINK,
>         .supported_coalesce_params = ETHTOOL_COALESCE_USECS |
>                                      ETHTOOL_COALESCE_MAX_FRAMES |
>                                      ETHTOOL_COALESCE_USE_ADAPTIVE |
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> index 1a8a19f980d3..c8b76d301c92 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> @@ -419,7 +419,8 @@ static const struct ethtool_ops mlx5e_rep_ethtool_ops = {
>                                      ETHTOOL_COALESCE_MAX_FRAMES |
>                                      ETHTOOL_COALESCE_USE_ADAPTIVE,
>         .op_needs_rtnl     = ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
> -                            ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM,
> +                            ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
> +                            ETHTOOL_OP_NEEDS_RTNL_GLINK,
>         .get_drvinfo       = mlx5e_rep_get_drvinfo,
>         .get_link          = ethtool_op_get_link,
>         .get_strings       = mlx5e_rep_get_strings,
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
> index 9b3b32408c64..01ddc3def9ac 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
> @@ -286,7 +286,8 @@ const struct ethtool_ops mlx5i_ethtool_ops = {
>                                      ETHTOOL_COALESCE_MAX_FRAMES |
>                                      ETHTOOL_COALESCE_USE_ADAPTIVE,
>         .op_needs_rtnl      = ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
> -                             ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM,
> +                             ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
> +                             ETHTOOL_OP_NEEDS_RTNL_GLINK,
>         .get_drvinfo        = mlx5i_get_drvinfo,
>         .get_strings        = mlx5i_get_strings,
>         .get_sset_count     = mlx5i_get_sset_count,
> @@ -309,6 +310,7 @@ const struct ethtool_ops mlx5i_ethtool_ops = {
>  };
>
>  const struct ethtool_ops mlx5i_pkey_ethtool_ops = {
> +       .op_needs_rtnl      = ETHTOOL_OP_NEEDS_RTNL_GLINK,
>         .get_drvinfo        = mlx5i_get_drvinfo,
>         .get_link           = ethtool_op_get_link,
>         .get_ts_info        = mlx5i_get_ts_info,
> diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
> index cb34fc166ef9..0e47088ec44b 100644
> --- a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
> +++ b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
> @@ -2024,7 +2024,8 @@ static const struct ethtool_ops fbnic_ethtool_ops = {
>                                           ETHTOOL_OP_NEEDS_RTNL_GPAUSEPARAM |
>                                           ETHTOOL_OP_NEEDS_RTNL_SPAUSEPARAM |
>                                           ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
> -                                         ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM,
> +                                         ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
> +                                         ETHTOOL_OP_NEEDS_RTNL_GLINK,
>         .get_drvinfo                    = fbnic_get_drvinfo,
>         .get_regs_len                   = fbnic_get_regs_len,
>         .get_regs                       = fbnic_get_regs,
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> index 94e658d07a27..881df597d7f9 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> @@ -597,7 +597,8 @@ static int mana_get_link_ksettings(struct net_device *ndev,
>  const struct ethtool_ops mana_ethtool_ops = {
>         .supported_coalesce_params = ETHTOOL_COALESCE_RX_CQE_FRAMES,
>         .op_needs_rtnl          = ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
> -                                 ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM,
> +                                 ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
> +                                 ETHTOOL_OP_NEEDS_RTNL_GLINK,
>         .get_ethtool_stats      = mana_get_ethtool_stats,
>         .get_sset_count         = mana_get_sset_count,
>         .get_strings            = mana_get_strings,
> --
> 2.54.0
>
