Netdev List


* Re: [PATCH net v2 4/4] net: mana: Fix EQ leak in mana_remove on NULL port
From: Simon Horman @ 2026-04-14 15:40 UTC (permalink / raw)
  To: ernis
  Cc: Simon Horman, pabeni, edumazet, ssengar, dipayanroy, gargaditya,
	decui, shacharr, linux-hyperv, kotaranov, kys, shirazsaleem,
	longli, kuba, andrew+netdev, wei.liu, linux-kernel, netdev,
	haiyangz, kees, stephen, leon, davem
In-Reply-To: <20260413050843.605789-5-ernis@linux.microsoft.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.

For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
---
net: mana: Fix EQ leak in mana_remove on NULL port

In mana_remove(), when a NULL port is encountered in the port iteration
loop, 'goto out' skips the mana_destroy_eq(ac) call, leaking the event
queues allocated earlier by mana_create_eq(). This can happen when
mana_probe_port() fails for port 0, leaving ac->ports[0] as NULL. On
driver unload or error cleanup, mana_remove() hits the NULL entry and
jumps past mana_destroy_eq(). Change 'goto out' to 'break' so the
for-loop exits normally and mana_destroy_eq() is always reached.

> Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")

Does this Fixes tag point to the correct commit? Looking at the commit
history, the original driver commit ca9c54d2d6a5 did not have
mana_destroy_eq() in mana_remove() at all, so the 'goto out' pattern could
not have caused an EQ leak at that point.

The bug appears to have been introduced by commit 1e2d0824a9c3 ("net: mana:
Add support for EQ sharing"), which added mana_destroy_eq(ac) to
mana_remove() and placed it before the 'out:' label, making it skippable by
the 'goto out' on NULL port.

Should the Fixes tag be:
  Fixes: 1e2d0824a9c3 ("net: mana: Add support for EQ sharing")
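
For readers following along, the control flow in question can be sketched in
plain userspace C. This is only a model, not the driver code: struct
model_ctx, remove_with_goto(), remove_with_break() and eqs_left_after() are
hypothetical names, and a simple counter stands in for the event queues.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PORTS 2
#define NUM_EQS   4

/* Hypothetical stand-in for the driver context; not the real mana_context. */
struct model_ctx {
	int port_present[NUM_PORTS];	/* 0 models a NULL ac->ports[i] */
	int eqs_allocated;		/* models EQs created during probe */
};

static void model_destroy_eq(struct model_ctx *c)
{
	c->eqs_allocated = 0;
}

/* Early exit via goto jumps past the EQ teardown. */
static void remove_with_goto(struct model_ctx *c)
{
	assert(c);
	for (size_t i = 0; i < NUM_PORTS; i++) {
		if (!c->port_present[i])
			goto out;
		c->port_present[i] = 0;
	}
	model_destroy_eq(c);
out:
	return;
}

/* Early exit via break falls through to the EQ teardown. */
static void remove_with_break(struct model_ctx *c)
{
	assert(c);
	for (size_t i = 0; i < NUM_PORTS; i++) {
		if (!c->port_present[i])
			break;
		c->port_present[i] = 0;
	}
	model_destroy_eq(c);
}

/* Run one variant against a context whose port 0 failed to probe. */
static int eqs_left_after(void (*remove_fn)(struct model_ctx *))
{
	struct model_ctx c = {
		.port_present = { 0, 1 },
		.eqs_allocated = NUM_EQS,
	};

	remove_fn(&c);
	return c.eqs_allocated;
}
```

In this model, eqs_left_after(remove_with_goto) leaves all NUM_EQS queues
allocated, while eqs_left_after(remove_with_break) returns 0.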

^ permalink raw reply

* Re: [PATCH net v2 1/4] net: mana: Init link_change_work before potential error paths in probe
From: Simon Horman @ 2026-04-14 15:41 UTC (permalink / raw)
  To: Erni Sri Satya Vennela
  Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, ssengar, dipayanroy, gargaditya,
	shirazsaleem, kees, kotaranov, leon, shacharr, stephen,
	linux-hyperv, netdev, linux-kernel
In-Reply-To: <20260413050843.605789-2-ernis@linux.microsoft.com>

On Sun, Apr 12, 2026 at 10:08:37PM -0700, Erni Sri Satya Vennela wrote:
> Move INIT_WORK(link_change_work) to right after the mana_context
> allocation, before any error path that could reach mana_remove().
> 
> Previously, if mana_create_eq() or mana_query_device_cfg() failed,
> mana_probe() would jump to the error path which calls mana_remove().
> mana_remove() unconditionally calls disable_work_sync(link_change_work),
> but the work struct had not been initialized yet. This can trigger a
> warning in __flush_work() or debug object warnings with
> CONFIG_DEBUG_OBJECTS_WORK enabled.
> 
> Fixes: 54133f9b4b53 ("net: mana: Support HW link state events")
> Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
> ---
> Changes in v2:
> * Apply the patch in net instead of net-next.

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply

* Re: [PATCH net v2 2/4] net: mana: Init gf_stats_work before potential error paths in probe
From: Simon Horman @ 2026-04-14 15:41 UTC (permalink / raw)
  To: Erni Sri Satya Vennela
  Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, ssengar, dipayanroy, gargaditya,
	shirazsaleem, kees, kotaranov, leon, shacharr, stephen,
	linux-hyperv, netdev, linux-kernel
In-Reply-To: <20260413050843.605789-3-ernis@linux.microsoft.com>

On Sun, Apr 12, 2026 at 10:08:38PM -0700, Erni Sri Satya Vennela wrote:
> Move INIT_DELAYED_WORK(gf_stats_work) to before mana_create_eq(),
> while keeping schedule_delayed_work() at its original location.
> 
> Previously, if any function between mana_create_eq() and the
> INIT_DELAYED_WORK call failed, mana_probe() would call mana_remove()
> which unconditionally calls cancel_delayed_work_sync(gf_stats_work) on
> the uninitialized work. This can trigger a warning in __flush_work() or
> debug object warnings with CONFIG_DEBUG_OBJECTS_WORK enabled.
> 
> Fixes: be4f1d67ec56 ("net: mana: Add standard counter rx_missed_errors")
> Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply

* Re: [PATCH v2] wireguard: device: use exit_rtnl callback instead of manual rtnl_lock in pre_exit
From: Jakub Kicinski @ 2026-04-14 15:44 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Shardul Bankar, kuniyu, andrew+netdev, davem, edumazet, pabeni,
	wireguard, netdev, linux-kernel, janak, kalpan.jani, shardulsb08,
	syzbot+f2fbf7478a35a94c8b7c
In-Reply-To: <CAHmME9rwKae5b9PYzuqmwG3p05iZ=jqyp6LQefg0OsB3OP6oWg@mail.gmail.com>

On Tue, 14 Apr 2026 17:40:03 +0200 Jason A. Donenfeld wrote:
> On Tue, Apr 14, 2026 at 5:18 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Tue, 14 Apr 2026 15:28:37 +0200 Jason A. Donenfeld wrote:  
> > > Thanks. Applied to the wireguard tree, and also added the missing
> > > __net_exit and __read_mostly annotations in the process.  
> >
> > Hi Jason, while we have you - do you have a PR for us for wireguard?
> > We're going to be sending the net-next PR later today..  
> 
> Sent!

Thanks! 
(I'll apply in a couple of hours once the CI has had its way with it.)

^ permalink raw reply

* Re: [PATCH] net: mdio: MDIO_PIC64HPSC should depend on ARCH_MICROCHIP
From: Charles Perry @ 2026-04-14 15:48 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Charles Perry, Conor Dooley, Jakub Kicinski, Maxime Chevallier,
	Andrew Lunn, Heiner Kallweit, Russell King, David S . Miller,
	Eric Dumazet, Paolo Abeni, netdev, linux-kernel
In-Reply-To: <980c57efa5843733ef95459c3283aebade56f142.1776162544.git.geert+renesas@glider.be>

On Tue, Apr 14, 2026 at 12:30:47PM +0200, Geert Uytterhoeven wrote:
> The PIC64-HPSC/HX MDIO interface is only present on Microchip
> PIC64-HPSC/HX SoCs.  Hence add a dependency on ARCH_MICROCHIP, to
> prevent asking the user about this driver when configuring a kernel
> without Microchip SoC support.
> 
> Fixes: f76aef980206e7c6 ("net: mdio: add a driver for PIC64-HPSC/HX MDIO controller")
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>

Reviewed-by: Charles Perry <charles.perry@microchip.com>

Thanks!
Charles


^ permalink raw reply

* Re: [PATCH v3 1/3] net: dsa: microchip: implement KSZ87xx Module 3 low-loss cable errata
From: Marek Vasut @ 2026-04-14 15:49 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Fidelio Lawson, Woojung Huh, UNGLinuxDriver, Vladimir Oltean,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Maxime Chevallier, Simon Horman, Heiner Kallweit, Russell King,
	netdev, linux-kernel, Fidelio Lawson
In-Reply-To: <d9b161dd-f698-4d7e-8ccb-9ec12411bf87@lunn.ch>

On 4/14/26 2:40 PM, Andrew Lunn wrote:
> On Tue, Apr 14, 2026 at 01:05:49PM +0200, Marek Vasut wrote:
>> On 4/14/26 11:12 AM, Fidelio Lawson wrote:
>>> Implement the "Module 3: Equalizer fix for short cables" erratum from
>>> Microchip document DS80000687C for KSZ87xx switches.
>>>
>>> The issue affects short or low-loss cable links (e.g. CAT5e/CAT6),
>>> where the PHY receiver equalizer may amplify high-amplitude signals
>>> excessively, resulting in internal distortion and link establishment
>>> failures.
>>>
>>> KSZ87xx devices require a workaround for the Module 3 low-loss cable
>>> condition, controlled through the switch TABLE_LINK_MD_V indirect
>>> registers.
>>>
>>> The affected registers are part of the switch address space and are not
>>> directly accessible from the PHY driver. To keep the PHY-facing API
>>> clean and avoid leaking switch-specific details, model this errata
>>> control as vendor-specific Clause 22 PHY registers.
>>>
>>> A vendor-specific Clause 22 PHY register is introduced as a mode
>>> selector in PHY_REG_LOW_LOSS_CTRL, and ksz8_r_phy() / ksz8_w_phy()
>>> translate accesses to these bits into the appropriate indirect
>>> TABLE_LINK_MD_V accesses.
>>>
>>> The control register defines the following modes:
>>> 0: disabled (default behavior)
>>> 1: EQ training workaround
>>> 2: LPF 90 MHz
>>> 3: LPF 62 MHz
>>> 4: LPF 55 MHz
>>> 5: LPF 44 MHz
>> I may not fully understand this, but aren't the EQ and LPF settings
>> orthogonal ?
> 
> What is the real life experience using this feature? Is it needed for
> 1cm cables, but most > 1m cables are O.K with the defaults? Do we need
> all these configuration options? How is a user supposed to discover
> the different options? Can we simplify it down to a Boolean?

The report I got was, that if the device is cooled down AND the user 
used special short low-loss CAT6 cable, then there was packet loss until 
the communication completely broke down.

With the LPF set to 62 MHz and DSP EQ initial value set to 0, that 
situation improved and there was still up to 0.14% packet loss, but it 
is better than total breakdown of communication. We couldn't get the 
packet loss down to 0% no matter which tuning we applied.

> Ethernet is just supposed to work with any valid length of cable,
> KISS. So maybe we should try to keep this feature KISS. Just tell the
> driver it is a short cable, pick different defaults which should work
> with any short cable?

I think the user should be able to configure the LPF bandwidth and DSP 
EQ initial value as needed. While the short cable improvement settings 
are "LPF set to 62 MHz bandwidth and DSP EQ initial value to 0", there 
might be future configurations which require different settings.

I think the ideal setup would be if those two settings were configurable 
separately, with a bit of documentation explaining the two currently 
known good settings:
- Default (LPF 90 MHz BW, DSP EQ initial value as needed)
- Short cable (LPF 62 MHz BW, DSP EQ initial value 0)
But if the user needs to reduce the BW further e.g. to improve noise 
resistance further, they shouldn't be prevented from doing so.

> A boolean should also help with making this tunable reusable with
> other devices. It is unlikely any other devices have these same
> configuration options, unless it is from the same vendor.

Could the LPF PHY tunable simply take an integer as a parameter? Then it
would be portable across other PHYs, I think.

The DSP EQ initial value can also be an integer tunable.
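
As a rough illustration of what two separate integer tunables could look
like, here is a userspace sketch. Everything in it is hypothetical: struct
cable_tuning, EQ_INIT_HW_DEFAULT and the helper names are not existing PHY
tunables. The accepted bandwidths are simply the four values from the mode
list in the patch description (90, 62, 55, 44 MHz).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sentinel: leave the DSP EQ initial value to the hardware. */
#define EQ_INIT_HW_DEFAULT (-1)

/* Two orthogonal settings instead of one combined mode number. */
struct cable_tuning {
	int lpf_mhz;	/* low-pass filter bandwidth in MHz */
	int eq_init;	/* DSP EQ initial value, or EQ_INIT_HW_DEFAULT */
};

/* The two known-good settings mentioned in this thread. */
static const struct cable_tuning tuning_default = {
	.lpf_mhz = 90, .eq_init = EQ_INIT_HW_DEFAULT,
};
static const struct cable_tuning tuning_short_cable = {
	.lpf_mhz = 62, .eq_init = 0,
};

/* Accept only the bandwidths listed in the patch description. */
static bool lpf_mhz_valid(int mhz)
{
	switch (mhz) {
	case 90:
	case 62:
	case 55:
	case 44:
		return true;
	default:
		return false;
	}
}

static bool tuning_valid(const struct cable_tuning *t)
{
	assert(t != NULL);
	return lpf_mhz_valid(t->lpf_mhz) && t->eq_init >= EQ_INIT_HW_DEFAULT;
}
```

Keeping the two values separate means a user who needs a narrower bandwidth
than the short-cable preset can change lpf_mhz without touching eq_init.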

^ permalink raw reply

* Re: [PATCH v3 1/3] net: dsa: microchip: implement KSZ87xx Module 3 low-loss cable errata
From: Marek Vasut @ 2026-04-14 15:50 UTC (permalink / raw)
  To: Andrew Lunn, Fidelio LAWSON
  Cc: Woojung Huh, UNGLinuxDriver, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Marek Vasut,
	Maxime Chevallier, Simon Horman, Heiner Kallweit, Russell King,
	netdev, linux-kernel, Fidelio Lawson
In-Reply-To: <d584bd42-ee6d-465f-b4f7-2edc992368d6@lunn.ch>

On 4/14/26 4:54 PM, Andrew Lunn wrote:
> On Tue, Apr 14, 2026 at 03:48:33PM +0200, Fidelio LAWSON wrote:
>> On 4/14/26 14:40, Andrew Lunn wrote:
>>> On Tue, Apr 14, 2026 at 01:05:49PM +0200, Marek Vasut wrote:
>>>> On 4/14/26 11:12 AM, Fidelio Lawson wrote:
>>>>> Implement the "Module 3: Equalizer fix for short cables" erratum from
>>>>> Microchip document DS80000687C for KSZ87xx switches.
>>>>>
>>>>> The issue affects short or low-loss cable links (e.g. CAT5e/CAT6),
>>>>> where the PHY receiver equalizer may amplify high-amplitude signals
>>>>> excessively, resulting in internal distortion and link establishment
>>>>> failures.
>>>>>
>>>>> KSZ87xx devices require a workaround for the Module 3 low-loss cable
>>>>> condition, controlled through the switch TABLE_LINK_MD_V indirect
>>>>> registers.
>>>>>
>>>>> The affected registers are part of the switch address space and are not
>>>>> directly accessible from the PHY driver. To keep the PHY-facing API
>>>>> clean and avoid leaking switch-specific details, model this errata
>>>>> control as vendor-specific Clause 22 PHY registers.
>>>>>
>>>>> A vendor-specific Clause 22 PHY register is introduced as a mode
>>>>> selector in PHY_REG_LOW_LOSS_CTRL, and ksz8_r_phy() / ksz8_w_phy()
>>>>> translate accesses to these bits into the appropriate indirect
>>>>> TABLE_LINK_MD_V accesses.
>>>>>
>>>>> The control register defines the following modes:
>>>>> 0: disabled (default behavior)
>>>>> 1: EQ training workaround
>>>>> 2: LPF 90 MHz
>>>>> 3: LPF 62 MHz
>>>>> 4: LPF 55 MHz
>>>>> 5: LPF 44 MHz
>>>> I may not fully understand this, but aren't the EQ and LPF settings
>>>> orthogonal ?
>>>
>>> What is the real life experience using this feature? Is it needed for
>>> 1cm cables, but most > 1m cables are O.K with the defaults? Do we need
>>> all these configuration options? How is a user supposed to discover
>>> the different options? Can we simplify it down to a Boolean?
>> We were seeing random link dropouts with the default settings, and since
>> enabling the workaround 2, the link has remained stable and we have not
>> observed any further issues.
> 
> So for you, a boolean which enables workaround 2 would be sufficient.

I agree with the observation from Fidelio; the hardware behaves that
way. As for the format of the tunables, I have now replied to the
previous email.

^ permalink raw reply

* Re: [PATCH net v4 1/2] flow_dissector: do not dissect PPPoE PFC frames
From: Simon Horman @ 2026-04-14 15:50 UTC (permalink / raw)
  To: Qingfang Deng
  Cc: linux-ppp, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Guillaume Nault, Wojciech Drewek, Tony Nguyen,
	linux-kernel, netdev, Paul Mackerras, Jaco Kroon, James Carlson,
	Marcin Szycik
In-Reply-To: <717da5ac-a020-4710-8ebb-6ed9d6e48bf4@linux.dev>

On Sat, Apr 11, 2026 at 11:56:30AM +0800, Qingfang Deng wrote:
> Hi,
> 
> On 4/11/2026 1:10 AM, Simon Horman wrote:
> > On Fri, Apr 10, 2026 at 11:36:20AM +0800, Qingfang Deng wrote:
> > > @@ -1361,7 +1376,7 @@ bool __skb_flow_dissect(const struct net *net,
> > >   			struct pppoe_hdr hdr;
> > >   			__be16 proto;
> > >   		} *hdr, _hdr;
> > > -		u16 ppp_proto;
> > > +		__be16 ppp_proto;
> > 
> > I'm unclear of the relationship between changing the type of ppp_proto
> > and the problem described in the patch description. And it
> > > is creating a lot of churn in this patch. I suggest dropping it.
> 
> The intention is to restore the original behavior before the blamed commit.
> If you find it too verbose for a fix, I can drop it and then repost that
> part later to net-next.

Thanks, I see you have posted v5, which I plan to review.
FTR: I think it is best to split the fix, for net, from
other changes for net-next.

...

^ permalink raw reply

* Re: [PATCH net-next] MAINTAINERS: Add netkit selftest files
From: patchwork-bot+netdevbpf @ 2026-04-14 15:50 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: netdev, kuba, dw, pabeni, razor
In-Reply-To: <20260414075249.611608-1-daniel@iogearbox.net>

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 14 Apr 2026 09:52:49 +0200 you wrote:
> The following selftest files are related to netkit and should have
> netkit folks in Cc for review:
> 
>   - tools/testing/selftests/bpf/prog_tests/tc_netkit.c
>   - tools/testing/selftests/drivers/net/hw/nk_qlease.py
>   - tools/testing/selftests/net/nk_qlease.py
> 
> [...]

Here is the summary with links:
  - [net-next] MAINTAINERS: Add netkit selftest files
    https://git.kernel.org/netdev/net-next/c/bc28831d7a09

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net-next v2 0/3] Follow-ups to nk_qlease net selftests
From: patchwork-bot+netdevbpf @ 2026-04-14 15:50 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: netdev, kuba, dw, pabeni, razor
In-Reply-To: <20260413220809.604592-1-daniel@iogearbox.net>

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 14 Apr 2026 00:08:03 +0200 you wrote:
> This is a set of follow-ups addressing [0]:
> 
> - Split netdevsim tests from HW tests in nk_qlease and move the SW
>   tests under selftests/net/
> - Remove multiple ksft_run()s to fix the recently enforced hard-fail
> - Move all the setup inside the test cases for the ones under
>   selftests/net/ (I'll defer the HW ones to David)
> - Add more test coverage related to queue leasing behavior and corner
>   cases, so now we have 45 tests in nk_qlease.py with netdevsim
>   which does not need special HW
> 
> [...]

Here is the summary with links:
  - [net-next,v2,1/3] tools/ynl: Make YnlFamily closeable as a context manager
    https://git.kernel.org/netdev/net-next/c/4a6fe5fe6004
  - [net-next,v2,2/3] selftests/net: Split netdevsim tests from HW tests in nk_qlease
    https://git.kernel.org/netdev/net-next/c/e254ffb9502c
  - [net-next,v2,3/3] selftests/net: Add additional test coverage in nk_qlease
    https://git.kernel.org/netdev/net-next/c/1e822171ba9b

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH RFC bpf-next 1/8] kasan: expose generic kasan helpers
From: Alexei Starovoitov @ 2026-04-14 15:58 UTC (permalink / raw)
  To: Andrey Konovalov
  Cc: Alexis Lothoré, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
	John Fastabend, David S. Miller, David Ahern, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML, H. Peter Anvin,
	Shuah Khan, Maxime Coquelin, Alexandre Torgue, Andrey Ryabinin,
	Alexander Potapenko, Dmitry Vyukov, Vincenzo Frascino,
	Andrew Morton, ebpf, Bastien Curutchet, Thomas Petazzoni,
	Xu Kuohai, bpf, LKML, Network Development,
	open list:KERNEL SELFTEST FRAMEWORK, linux-stm32,
	linux-arm-kernel, kasan-dev, linux-mm
In-Reply-To: <CA+fCnZd31GzdpEqR8VhfK4JtUKyyRMgbBoAbeGACJgm7WvB6Vw@mail.gmail.com>

On Tue, Apr 14, 2026 at 8:10 AM Andrey Konovalov <andreyknvl@gmail.com> wrote:
>
> On Tue, Apr 14, 2026 at 4:36 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > > ACK, I'll try to use those kasan_check_read and kasan_check_write rather
> > > than __asan_{load,store}X.
> >
> > No. The performance penalty will be too high.
>
> With using __asan_load/storeX(), it will be one function call to get
> to check_region_inline(): __asan_load/storeX->check_region_inline.
>
> With kasan_check_read/write(), right now, it would be two function
> calls: __kasan_check_read->kasan_check_range->check_region_inline.
>
> I doubt an extra function call would make a difference in terms of
> performance: the shadow checking itself is also expensive.
>
> But if the second call is a concern, we can move kasan_check_range()
> and lower-level functions into mm/kasan/generic.h and include it into
> shadow.c, and then it will be just one function call.
>
> To improve performance further, the JIT compiler could emit inlined
> shadow checking instructions, same as the C compiler does with
> KASAN_INLINE=y.
>
> > hw_tags won't work without corresponding JIT work.
>
> You probably meant SW_TAGS here.
>
> HW_TAGS will likely just work without any JIT changes (even the
> kasan_check_byte() thing I mentioned should not be required), assuming
> JIT'ed BPF code just accesses kernel-returned pointers as is.
>
> > I see no point sacrificing performance for aesthetics.
>
> With the change I suggested above, there would be no performance
> difference. And the code stays cleaner.
>
> > __asan_load/storeX is what compilers emit.
>
> For Generic mode. For SW_TAGS, the function names are different.
> Keeping this detail within the KASAN code is cleaner.

I think we're talking past each other.
We're not interested in KASAN_SW_TAGS or KASAN_HW_TAGS.
We're not going to modify arm64 JIT at all.

This is purely KASAN_GENERIC and only on x86-64.
JIT will emit exactly what compilers emit for generic
which is __asan_load/store. This is as stable ABI as it can get
and we don't want to deviate from it.

The goal here is to find bugs in the verifier.
If something got past it, that shouldn't have,
kasan generic on x86-64 is enough.
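
For context, the generic KASAN shadow encoding that such a check relies on
can be modelled in userspace as below. This is a sketch of the encoding
only (one shadow byte per 8-byte granule), not the kernel's
check_region_inline() and not what the JIT would emit; model_shadow and
access_ok() are hypothetical names, and the model only handles accesses that
stay within one granule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE_SHIFT	3
#define GRANULE_SIZE	(1u << GRANULE_SHIFT)
#define MODEL_MEM_SIZE	64

/*
 * One shadow byte per 8-byte granule of modelled memory:
 *   0     all 8 bytes are accessible
 *   1..7  only the first k bytes are accessible
 *   < 0   the whole granule is poisoned
 * Granule 1 is partially valid (5 bytes), granule 2 is poisoned.
 */
static const int8_t model_shadow[MODEL_MEM_SIZE >> GRANULE_SHIFT] = {
	0, 5, -1,
};

/* Check an access of 'size' bytes at offset 'off' inside one granule. */
static bool access_ok(size_t off, size_t size)
{
	size_t first = off & (GRANULE_SIZE - 1);
	size_t last = first + size - 1;
	int8_t s;

	assert(size >= 1 && size <= GRANULE_SIZE);
	assert(off + size <= MODEL_MEM_SIZE);
	assert(last < GRANULE_SIZE);	/* model: no granule crossing */

	s = model_shadow[off >> GRANULE_SHIFT];
	if (s == 0)
		return true;
	/* Partially valid granule: the last touched byte must be below s. */
	return s > 0 && last < (size_t)s;
}
```

With the shadow contents above, access_ok(8, 4) and access_ok(12, 1) pass,
while access_ok(13, 1) and access_ok(16, 1) are rejected.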

^ permalink raw reply

* Re: [PATCH net-next v6 0/2] net: mana: add ethtool private flag for full-page RX buffers
From: Dipayaan Roy @ 2026-04-14 16:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
	pabeni, leon, longli, kotaranov, horms, shradhagupta, ssengar,
	ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
	linux-rdma, stephen, jacob.e.keller, leitao, kees, john.fastabend,
	hawk, bpf, daniel, ast, sdf, dipayanroy
In-Reply-To: <20260412125917.4fa8fc8d@kernel.org>

On Sun, Apr 12, 2026 at 12:59:17PM -0700, Jakub Kicinski wrote:
> On Thu, 9 Apr 2026 18:35:09 -0700 Jakub Kicinski wrote:
> > On Tue,  7 Apr 2026 12:59:17 -0700 Dipayaan Roy wrote:
> > > This behavior is observed on a single platform; other platforms
> > > perform better with page_pool fragments, indicating this is not a
> > > page_pool issue but platform-specific.  
> > 
> > Well, someone has to run some experiments and confirm other ARM
> > platforms are not impacted, with data. I was hoping to do it myself
> > but doesn't look like that will happen in time for the merge window :(
> 
> Please repost with the perf analysis on other commercially available
> ARM platform. Something like:
> 
>   This is a workaround applicable to only some platforms. Modifying
>   driver X to use a similar workaround on [Ampere Max|nVidia
>   Grace|Amazon Graviton 3|..] the performance for split pages is
>   y% higher than when using single pages.
> -- 
> pw-bot: cr

Hi Jakub,

I ran the same experiment on an alternate ARM64 platform from a
different vendor, which I was able to access only recently. I still see
roughly a 5% overhead from the atomic refcount operation itself, but on
that platform there is no throughput drop when using page fragments
versus full-page mode. In both cases, the setup reaches line rate. That
suggests the atomic overhead alone does not explain the throughput loss
on the specific hardware we are discussing.

I also received an update from the hardware team. They collected PCIe
traces and observed stalls on this particular ARM64 processor
when running with page fragments, while those stalls are not seen in
full-page mode. The exact root cause is still under investigation, but
their current assessment is that this is likely a microarchitectural
issue in the PCIe root port. Based on that, they are asking for a
software workaround that uses full pages until the issue is fully
understood.

For that reason, I am asking whether this could be accepted as an
ethtool private flag rather than as a generic driver change,
since the problem is still specific to one CPU/platform.
Please let me know whether you think this patch with private flag
would be acceptable here.

Regards
Dipayaan Roy

^ permalink raw reply

* Re: [PATCH net-next v3 5/5] net: phy: Move phy_init_hw() from phy_resume() to __phy_resume()
From: Andrew Lunn @ 2026-04-14 16:03 UTC (permalink / raw)
  To: Biju
  Cc: Heiner Kallweit, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Biju Das, Russell King, netdev, linux-kernel,
	Geert Uytterhoeven, Prabhakar Mahadev Lad, linux-renesas-soc
In-Reply-To: <20260412140032.122841-6-biju.das.jz@bp.renesas.com>

On Sun, Apr 12, 2026 at 03:00:27PM +0100, Biju wrote:
> From: Biju Das <biju.das.jz@bp.renesas.com>
> 
> Now that redundant locking has been removed from PHY driver callbacks,
> phy_init_hw() can be called with phydev->lock held.
> 
> Many MAC drivers and the phylink framework resume the PHY via
> phy_start(), which invokes __phy_resume() directly without going
> through phy_resume(). Keeping phy_init_hw() in phy_resume() means it
> is not called in this path.
> 
> Move phy_init_hw() into __phy_resume() so that PHY soft reset and
> re-initialisation happen unconditionally on every resume, regardless
> of which code path triggers it.

I would change the order of these patches. First remove the redundant
locks. You can then put phy_init_hw() into __phy_resume(), rather than
first moving it into phy_resume() and then __phy_resume().

      Andrew

^ permalink raw reply

* [PATCH] net: mdio: octeon: use %p for bus id
From: Chen Jung Ku @ 2026-04-14 15:56 UTC (permalink / raw)
  To: davem, kuba
  Cc: edumazet, pabeni, andrew, hkallweit1, linux, netdev, linux-kernel,
	Chen Jung Ku

Replace %px with %p to avoid exposing raw kernel pointer values.

Signed-off-by: Chen Jung Ku <ku.loong@gapp.nthu.edu.tw>
---
 drivers/net/mdio/mdio-octeon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/mdio/mdio-octeon.c b/drivers/net/mdio/mdio-octeon.c
index cb53dccbde1a..c9c000bb0cd5 100644
--- a/drivers/net/mdio/mdio-octeon.c
+++ b/drivers/net/mdio/mdio-octeon.c
@@ -38,7 +38,7 @@ static int octeon_mdiobus_probe(struct platform_device *pdev)
 	oct_mdio_writeq(smi_en.u64, bus->register_base + SMI_EN);
 
 	bus->mii_bus->name = KBUILD_MODNAME;
-	snprintf(bus->mii_bus->id, MII_BUS_ID_SIZE, "%px", bus->register_base);
+	snprintf(bus->mii_bus->id, MII_BUS_ID_SIZE, "%p", bus->register_base);
 	bus->mii_bus->parent = &pdev->dev;
 
 	bus->mii_bus->read = cavium_mdiobus_read_c22;
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH net-next v3 3/5] net: phy: mscc: Drop unnecessary phydev->lock
From: Andrew Lunn @ 2026-04-14 16:06 UTC (permalink / raw)
  To: Biju
  Cc: Heiner Kallweit, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Biju Das, Russell King, Lad Prabhakar,
	Horatiu Vultur, Vladimir Oltean, netdev, linux-kernel,
	Geert Uytterhoeven, linux-renesas-soc
In-Reply-To: <20260412140032.122841-4-biju.das.jz@bp.renesas.com>

On Sun, Apr 12, 2026 at 03:00:25PM +0100, Biju wrote:
> From: Biju Das <biju.das.jz@bp.renesas.com>
> 
> Remove manual mutex_lock/unlock(&phydev->lock) calls from several
> functions in the MSCC PHY driver.
> 
> In vsc85xx_edge_rate_cntl_set(), phydev->lock is taken around a single
> phy_modify_paged() call. phy_modify_paged() is already a fully locked
> atomic operation that acquires the MDIO bus lock internally, so the
> additional phydev->lock is unnecessary.
> 
> The remaining three functions — vsc85xx_mac_if_set(),
> vsc8531_pre_init_seq_set(), and vsc85xx_eee_init_seq_set() — use
> phy_read(), phy_write(), phy_select_page(), and phy_restore_page(),
> all of which operate under the MDIO bus lock. Taking phydev->lock
> around them provides no additional serialisation.
> 
> Along with dropping the locks, error-path labels are renamed from
> out_unlock to err or restore_oldpage to better reflect their purpose.
> In vsc8531_pre_init_seq_set() and vsc85xx_eee_init_seq_set(), the
> redundant intermediate assignment of oldpage before returning is also
> eliminated.
> 
> No functional change intended.
> 
> Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH net-next v3 4/5] net: phy: microchip_t1: Replace phydev->lock with mdio_lock in lan937x_dsp_workaround()
From: Andrew Lunn @ 2026-04-14 16:08 UTC (permalink / raw)
  To: Biju
  Cc: Arun Ramadoss, Heiner Kallweit, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Biju Das, UNGLinuxDriver,
	Russell King, netdev, linux-kernel, Geert Uytterhoeven,
	Prabhakar Mahadev Lad, linux-renesas-soc
In-Reply-To: <20260412140032.122841-5-biju.das.jz@bp.renesas.com>

> -	mutex_lock(&phydev->lock);
> +	mutex_lock(&phydev->mdio.bus->mdio_lock);

Use phy_lock_mdio_bus(), and then phy_unlock_mdio_bus().

    Andrew

---
pw-bot: cr

^ permalink raw reply

* [PATCH net v3 1/3] vsock/virtio: fix MSG_PEEK ignoring skb offset when calculating bytes to copy
From: Luigi Leonardi @ 2026-04-14 16:10 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Arseniy Krasnov
  Cc: kvm, virtualization, netdev, linux-kernel, Luigi Leonardi
In-Reply-To: <20260414-fix_peek-v3-0-e7daead49f83@redhat.com>

`virtio_transport_stream_do_peek()` does not account for the skb offset
when computing the number of bytes to copy.

This means that, after a partial recv() that advances the offset, a peek
requesting more bytes than are available in the sk_buff causes
`skb_copy_datagram_iter()` to go past the valid payload, resulting in
a -EFAULT.

The dequeue path already handles this correctly.
Apply the same logic to the peek path.
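
The arithmetic can be illustrated with a small userspace model. This is a
sketch, not the kernel code: peek_bytes_old() and peek_bytes_fixed() are
hypothetical helpers, with plain size_t arguments standing in for skb->len
and VIRTIO_VSOCK_SKB_CB(skb)->offset.

```c
#include <assert.h>
#include <stddef.h>

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/* Old peek logic: clamps to the whole skb length, ignoring the offset. */
static size_t peek_bytes_old(size_t skb_len, size_t offset,
			     size_t len, size_t total)
{
	(void)offset;	/* the bug: bytes already consumed are not subtracted */
	assert(total <= len);
	return min_size(len - total, skb_len);
}

/* Fixed peek logic: clamps to what is left after the consumed offset. */
static size_t peek_bytes_fixed(size_t skb_len, size_t offset,
			       size_t len, size_t total)
{
	assert(offset <= skb_len);
	assert(total <= len);
	return min_size(len - total, skb_len - offset);
}
```

For a 100-byte skb of which 60 bytes were already consumed, a 100-byte peek
yields 100 with the old computation (60 bytes past the valid payload) and 40
with the fixed one.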

Fixes: 0df7cd3c13e4 ("vsock/virtio/vhost: read data from non-linear skb")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
---
 net/vmw_vsock/virtio_transport_common.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index a152a9e208d0..b5015ab2ee1e 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -545,9 +545,8 @@ virtio_transport_stream_do_peek(struct vsock_sock *vsk,
 	skb_queue_walk(&vvs->rx_queue, skb) {
 		size_t bytes;
 
-		bytes = len - total;
-		if (bytes > skb->len)
-			bytes = skb->len;
+		bytes = min_t(size_t, len - total,
+			      skb->len - VIRTIO_VSOCK_SKB_CB(skb)->offset);
 
 		spin_unlock_bh(&vvs->rx_lock);
 

-- 
2.53.0


^ permalink raw reply related

* [PATCH net v3 0/3] vsock/virtio: fix MSG_PEEK calculation on bytes to copy
From: Luigi Leonardi @ 2026-04-14 16:10 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Arseniy Krasnov
  Cc: kvm, virtualization, netdev, linux-kernel, Luigi Leonardi

`virtio_transport_stream_do_peek`, when calculating the number of bytes to
copy, didn't consider the `offset`, caused by partial reads that happened
before.
This might cause an out-of-bounds read that leads to an EFAULT.
More details in the commits.

Commit 1 introduces the fix
Commit 2 introduces some preliminary work for adding a test and fixes a
problem in existing tests.
Commit 3 introduces a test that checks for this bug to avoid future
regressions.

For disclosure: this bug was initially found by Claude Opus 4.6; I then analyzed
it and worked on the fix and the test.

Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
---
Changes in v3:
- Addressed reviewers' comments
    - Dropped test client, reusing the one already existing
    - Minor changes: added comment, improved commit messages
    - Rebased to latest net-next
- Link to v2: https://lore.kernel.org/r/20260407-fix_peek-v2-0-2e2581dc8b7c@redhat.com

Changes in v2:
- Addressed reviewers' comments
    - Test now uses the recv_buf utils.
    - Removed unnecessary barrier
    - Checkpatch warnings.
- Added new commit that allows to use recv_buf with MSG_PEEK
- Picked up RoBs
- Link to v1: https://lore.kernel.org/r/20260402-fix_peek-v1-0-ad274fcef77b@redhat.com

---
Luigi Leonardi (3):
      vsock/virtio: fix MSG_PEEK ignoring skb offset when calculating bytes to copy
      vsock/test: fix MSG_PEEK handling in recv_buf()
      vsock/test: add MSG_PEEK after partial recv test

 net/vmw_vsock/virtio_transport_common.c |  5 ++--
 tools/testing/vsock/util.c              | 15 ++++++++++
 tools/testing/vsock/vsock_test.c        | 50 +++++++++++++++++++++++++--------
 3 files changed, 55 insertions(+), 15 deletions(-)
---
base-commit: bc28831d7a09f7058cdca4658d81e5faf635bed7
change-id: 20260401-fix_peek-6837b83469e3

Best regards,
-- 
Luigi Leonardi <leonardi@redhat.com>


^ permalink raw reply

* [PATCH net v3 2/3] vsock/test: fix MSG_PEEK handling in recv_buf()
From: Luigi Leonardi @ 2026-04-14 16:10 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Arseniy Krasnov
  Cc: kvm, virtualization, netdev, linux-kernel, Luigi Leonardi
In-Reply-To: <20260414-fix_peek-v3-0-e7daead49f83@redhat.com>

`recv_buf` does not handle the MSG_PEEK flag correctly: it keeps calling
`recv` until all requested bytes are available or an error occurs.

The problem is how it calculates the number of bytes read: MSG_PEEK
doesn't consume any bytes, so each call re-reads the same bytes from the
buffer head, and summing the return values is wrong.

Moreover, MSG_PEEK doesn't consume the bytes in the buffer, so if the
requested amount is more than the bytes available, the loop will never
terminate, because `recv` will never return EOF. For this reason we need
to compare the amount of read bytes with the number of bytes expected.

Add a check, and if the MSG_PEEK flag is present, update the counter of
read bytes differently, and break if we read the expected amount.

This allows us to simplify the `test_stream_credit_update_test`, by
reusing `recv_buf`, like some other tests already do.

This also fixes callers that pass MSG_PEEK to recv_buf().
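
The MSG_PEEK behaviour described above can be observed outside of vsock.
The following is a minimal sketch, assuming an AF_UNIX stream socketpair as
a stand-in for a vsock connection (so it runs without a guest); `peek_demo`
is a hypothetical helper and not part of the vsock test suite.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo helper, not part of tools/testing/vsock.
 * Sends 4 bytes over an AF_UNIX stream socketpair (assumed stand-in for
 * vsock), peeks twice, then does one plain recv().
 * Returns 0 on success, -1 if the socket setup or send fails.
 */
static int peek_demo(ssize_t *peek1, ssize_t *peek2, ssize_t *plain)
{
	char buf[8];
	int sv[2];

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	if (send(sv[0], "abcd", 4, 0) != 4) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	/* Both peeks return the same queued bytes: nothing is consumed. */
	*peek1 = recv(sv[1], buf, sizeof(buf), MSG_PEEK);
	*peek2 = recv(sv[1], buf, sizeof(buf), MSG_PEEK);

	/* Only a plain recv() removes the bytes from the queue. */
	*plain = recv(sv[1], buf, sizeof(buf), 0);

	close(sv[0]);
	close(sv[1]);
	return 0;
}
```

Adding up the two peek results would report more bytes than were ever
sent, which is the accounting error a summing loop runs into.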

Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
---
 tools/testing/vsock/util.c       | 15 +++++++++++++++
 tools/testing/vsock/vsock_test.c | 13 +------------
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
index 1fe1338c79cd..2c9ee3210090 100644
--- a/tools/testing/vsock/util.c
+++ b/tools/testing/vsock/util.c
@@ -381,7 +381,13 @@ void send_buf(int fd, const void *buf, size_t len, int flags,
 	}
 }

+#define RECV_PEEK_RETRY_USEC 10
+
 /* Receive bytes in a buffer and check the return value.
+ *
+ * MSG_PEEK note: MSG_PEEK doesn't consume bytes from the buffer, so partial
+ * reads cannot be summed. Instead, the function retries until recv() returns
+ * exactly expected_ret bytes in a single call.
  *
  * expected_ret:
  *  <0 Negative errno (for testing errors)
@@ -403,6 +409,15 @@ void recv_buf(int fd, void *buf, size_t len, int flags, ssize_t expected_ret)
 		if (ret <= 0)
 			break;

+		if (flags & MSG_PEEK) {
+			if (ret == expected_ret) {
+				nread = ret;
+				break;
+			}
+			timeout_usleep(RECV_PEEK_RETRY_USEC);
+			continue;
+		}
+
 		nread += ret;
 	} while (nread < len);
 	timeout_end();
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index 5bd20ccd9335..bdb0754965df 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -1500,18 +1500,7 @@ static void test_stream_credit_update_test(const struct test_opts *opts,
 	}

 	/* Wait until there will be 128KB of data in rx queue. */
-	while (1) {
-		ssize_t res;
-
-		res = recv(fd, buf, buf_size, MSG_PEEK);
-		if (res == buf_size)
-			break;
-
-		if (res <= 0) {
-			fprintf(stderr, "unexpected 'recv()' return: %zi\n", res);
-			exit(EXIT_FAILURE);
-		}
-	}
+	recv_buf(fd, buf, buf_size, MSG_PEEK, buf_size);

 	/* There is 128KB of data in the socket's rx queue, dequeue first
 	 * 64KB, credit update is sent if 'low_rx_bytes_test' == true.

-- 
2.53.0

^ permalink raw reply related

* [PATCH net v3 3/3] vsock/test: add MSG_PEEK after partial recv test
From: Luigi Leonardi @ 2026-04-14 16:10 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Arseniy Krasnov
  Cc: kvm, virtualization, netdev, linux-kernel, Luigi Leonardi
In-Reply-To: <20260414-fix_peek-v3-0-e7daead49f83@redhat.com>

Add a test that verifies MSG_PEEK works correctly after a partial
recv().

This tests a bug that was present in
`virtio_transport_stream_do_peek()` when computing the number of bytes to
copy: after a partial read, the peek function didn't take into
consideration the number of bytes that were already read. So peeking the
whole buffer would cause an out-of-bounds read that resulted in -EFAULT.

This test does exactly this: do a partial recv on a buffer, then try to
peek the whole buffer content.
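
The idea can be sketched with a toy model. This is only an illustration
under stated assumptions: `struct toy_pkt` and `toy_peek()` are
hypothetical and do not mirror the kernel's sk_buff handling; they only
show that the copy length has to be bounded by the bytes remaining after
the offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a queued packet; NOT the kernel's struct sk_buff. */
struct toy_pkt {
	const unsigned char *data;
	size_t len;    /* total payload length */
	size_t offset; /* bytes already consumed by earlier reads */
};

/* Copy up to 'want' bytes without consuming them.
 * The bound is (len - offset); bounding by 'len' alone would read past
 * the end of the payload once offset > 0.
 */
static size_t toy_peek(const struct toy_pkt *pkt, unsigned char *dst,
		       size_t want)
{
	size_t avail, n;

	assert(pkt->offset <= pkt->len);
	avail = pkt->len - pkt->offset;
	n = want < avail ? want : avail;

	memcpy(dst, pkt->data + pkt->offset, n);
	return n;
}
```

With an 8-byte payload and an offset of 1, asking for 8 bytes yields 7,
which matches the `sizeof(buf_peek) - 1` expectation used by the test.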

Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
---
 tools/testing/vsock/vsock_test.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index bdb0754965df..ab387a13f0ae 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -346,6 +346,38 @@ static void test_stream_msg_peek_server(const struct test_opts *opts)
 	return test_msg_peek_server(opts, false);
 }
 
+static void test_stream_peek_after_recv_server(const struct test_opts *opts)
+{
+	unsigned char buf_normal[MSG_PEEK_BUF_LEN];
+	unsigned char buf_peek[MSG_PEEK_BUF_LEN];
+	int fd;
+
+	fd = vsock_stream_accept(VMADDR_CID_ANY, opts->peer_port, NULL);
+	if (fd < 0) {
+		perror("accept");
+		exit(EXIT_FAILURE);
+	}
+
+	control_writeln("SRVREADY");
+
+	/* Partial recv to advance offset within the skb */
+	recv_buf(fd, buf_normal, 1, 0, 1);
+
+	/* Ask more bytes than available */
+	recv_buf(fd, buf_peek, sizeof(buf_peek), MSG_PEEK, sizeof(buf_peek) - 1);
+
+	/* Recv rest of the data */
+	recv_buf(fd, buf_normal, sizeof(buf_normal) - 1, 0, sizeof(buf_normal) - 1);
+
+	/* Compare full peek and normal read. */
+	if (memcmp(buf_peek, buf_normal, sizeof(buf_peek) - 1)) {
+		fprintf(stderr, "Full peek data mismatch\n");
+		exit(EXIT_FAILURE);
+	}
+
+	close(fd);
+}
+
 #define SOCK_BUF_SIZE (2 * 1024 * 1024)
 #define SOCK_BUF_SIZE_SMALL (64 * 1024)
 #define MAX_MSG_PAGES 4
@@ -2509,6 +2541,11 @@ static struct test_case test_cases[] = {
 		.run_client = test_stream_tx_credit_bounds_client,
 		.run_server = test_stream_tx_credit_bounds_server,
 	},
+	{
+		.name = "SOCK_STREAM MSG_PEEK after partial recv",
+		.run_client = test_stream_msg_peek_client,
+		.run_server = test_stream_peek_after_recv_server,
+	},
 	{},
 };
 

-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH] net: mdio: octeon: use %p for bus id
From: Andrew Lunn @ 2026-04-14 16:16 UTC (permalink / raw)
  To: Chen Jung Ku
  Cc: davem, kuba, edumazet, pabeni, hkallweit1, linux, netdev,
	linux-kernel
In-Reply-To: <20260414155652.7468-1-ku.loong@gapp.nthu.edu.tw>

On Tue, Apr 14, 2026 at 11:56:52PM +0800, Chen Jung Ku wrote:
> Replace %px with %p to avoid exposing raw kernel pointer values.

What exactly are we giving away here?

                        compatible = "cavium,octeon-3860-mdio";
                        #address-cells = <1>;
                        #size-cells = <0>;
                        reg = <0x11800 0x00001900 0x0 0x40>;

Isn't bus->register_base this well known value?

You also need to think about ABI.

    Andrew

^ permalink raw reply

* [PATCH net 0/2] bnge fixes
From: Vikas Gupta @ 2026-04-14 16:18 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: netdev, linux-kernel, vsrama-krishna.nemani, bhargava.marreddy,
	rajashekar.hudumula, ajit.khaparde, dharmender.garg,
	rahul-rg.gupta, Vikas Gupta

Hi,
This series fixes two issues.

Patch-1:
    Due to a wrong HWRM sequence, the driver does not get the correct
    information regarding resources and capabilities.
    The patch fixes the initial HWRM sequence.
Patch-2:
    Remove the initialization of a backing store type that is
    not supported on Thor Ultra devices.

Thanks,
Vikas

Vikas Gupta (2):
  bnge: fix initial HWRM sequence
  bnge: remove unsupported backing store type

 .../net/ethernet/broadcom/bnge/bnge_core.c    | 40 ++++++++++---------
 .../net/ethernet/broadcom/bnge/bnge_rmem.c    | 16 --------
 2 files changed, 21 insertions(+), 35 deletions(-)

-- 
2.47.1


^ permalink raw reply

* [PATCH net 1/2] bnge: fix initial HWRM sequence
From: Vikas Gupta @ 2026-04-14 16:18 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: netdev, linux-kernel, vsrama-krishna.nemani, bhargava.marreddy,
	rajashekar.hudumula, ajit.khaparde, dharmender.garg,
	rahul-rg.gupta, Vikas Gupta
In-Reply-To: <20260414161822.742382-1-vikas.gupta@broadcom.com>

Firmware may not advertise correct resources if the backing store is not
enabled before resource information is queried.
Fix the initial sequence of HWRMs so that the driver gets capabilities
and resource information correctly.
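
As a rough userspace sketch of the resulting order (allocate context
memory, query capabilities, then register) with a single unwind label:
all `toy_*` names, the `fail_step` knob and the error values are
hypothetical and only model the control flow, not the driver or firmware.

```c
#include <stdbool.h>

/* Toy device state; fail_step selects which step fails (0 = none). */
struct toy_dev {
	bool ctx_mem;
	bool registered;
	int fail_step;
};

static int toy_alloc_ctx_mem(struct toy_dev *d)
{
	if (d->fail_step == 1)
		return -12; /* illustrative error value */
	d->ctx_mem = true;
	return 0;
}

static void toy_free_ctx_mem(struct toy_dev *d)
{
	d->ctx_mem = false;
}

static int toy_query_caps(struct toy_dev *d)
{
	return d->fail_step == 2 ? -5 : 0;
}

static int toy_register(struct toy_dev *d)
{
	if (d->fail_step == 3)
		return -19;
	d->registered = true;
	return 0;
}

/* Every failure after the allocation unwinds through one label. */
static int toy_fw_register_dev(struct toy_dev *d)
{
	int rc;

	rc = toy_alloc_ctx_mem(d);
	if (rc)
		return rc;

	rc = toy_query_caps(d);
	if (rc)
		goto err_free_ctx_mem;

	rc = toy_register(d);
	if (rc)
		goto err_free_ctx_mem;

	return 0;

err_free_ctx_mem:
	toy_free_ctx_mem(d);
	return rc;
}
```

Because registration is the last step in this sketch, the error path only
has to free the context memory and never needs an unregister call.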

Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Rahul Gupta <rahul-rg.gupta@broadcom.com>
---
 .../net/ethernet/broadcom/bnge/bnge_core.c    | 40 ++++++++++---------
 1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnge/bnge_core.c b/drivers/net/ethernet/broadcom/bnge/bnge_core.c
index b4090283df0f..2b13c552a2f6 100644
--- a/drivers/net/ethernet/broadcom/bnge/bnge_core.c
+++ b/drivers/net/ethernet/broadcom/bnge/bnge_core.c
@@ -73,30 +73,39 @@ static int bnge_func_qcaps(struct bnge_dev *bd)
 		return rc;
 	}
 
+	rc = bnge_alloc_ctx_mem(bd);
+	if (rc) {
+		dev_err(bd->dev, "Failed to allocate ctx mem rc: %d\n", rc);
+		goto err_free_ctx_mem;
+	}
+
 	rc = bnge_hwrm_func_resc_qcaps(bd);
 	if (rc) {
 		dev_err(bd->dev, "query resc caps failure rc: %d\n", rc);
-		return rc;
+		goto err_free_ctx_mem;
 	}
 
 	rc = bnge_hwrm_func_qcfg(bd);
 	if (rc) {
 		dev_err(bd->dev, "query config failure rc: %d\n", rc);
-		return rc;
+		goto err_free_ctx_mem;
 	}
 
 	rc = bnge_hwrm_vnic_qcaps(bd);
 	if (rc) {
 		dev_err(bd->dev, "vnic caps failure rc: %d\n", rc);
-		return rc;
+		goto err_free_ctx_mem;
 	}
 
 	return 0;
+
+err_free_ctx_mem:
+	bnge_free_ctx_mem(bd);
+	return rc;
 }
 
 static void bnge_fw_unregister_dev(struct bnge_dev *bd)
 {
-	/* ctx mem free after unrgtr only */
 	bnge_hwrm_func_drv_unrgtr(bd);
 	bnge_free_ctx_mem(bd);
 }
@@ -132,32 +141,25 @@ static int bnge_fw_register_dev(struct bnge_dev *bd)
 
 	bnge_hwrm_fw_set_time(bd);
 
-	rc =  bnge_hwrm_func_drv_rgtr(bd);
+	/* Get the resources and configuration from firmware */
+	rc = bnge_func_qcaps(bd);
 	if (rc) {
-		dev_err(bd->dev, "Failed to rgtr with firmware rc: %d\n", rc);
+		dev_err(bd->dev, "Failed initial configuration rc: %d\n", rc);
 		return rc;
 	}
 
-	rc = bnge_alloc_ctx_mem(bd);
+	rc = bnge_hwrm_func_drv_rgtr(bd);
 	if (rc) {
-		dev_err(bd->dev, "Failed to allocate ctx mem rc: %d\n", rc);
-		goto err_func_unrgtr;
-	}
-
-	/* Get the resources and configuration from firmware */
-	rc = bnge_func_qcaps(bd);
-	if (rc) {
-		dev_err(bd->dev, "Failed initial configuration rc: %d\n", rc);
-		rc = -ENODEV;
-		goto err_func_unrgtr;
+		dev_err(bd->dev, "Failed to rgtr with firmware rc: %d\n", rc);
+		goto err_free_ctx_mem;
 	}
 
 	bnge_set_dflt_rss_hash_type(bd);
 
 	return 0;
 
-err_func_unrgtr:
-	bnge_fw_unregister_dev(bd);
+err_free_ctx_mem:
+	bnge_free_ctx_mem(bd);
 	return rc;
 }
 
-- 
2.47.1


^ permalink raw reply related

* [PATCH net 2/2] bnge: remove unsupported backing store type
From: Vikas Gupta @ 2026-04-14 16:18 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: netdev, linux-kernel, vsrama-krishna.nemani, bhargava.marreddy,
	rajashekar.hudumula, ajit.khaparde, dharmender.garg,
	rahul-rg.gupta, Vikas Gupta
In-Reply-To: <20260414161822.742382-1-vikas.gupta@broadcom.com>

The backing store type BNGE_CTX_MRAV is not applicable to Thor Ultra
devices. Remove it from the backing store configuration: the firmware
will not populate entities in this backing store type, which causes the
driver load to fail.

Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Dharmender Garg <dharmender.garg@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnge/bnge_rmem.c | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnge/bnge_rmem.c b/drivers/net/ethernet/broadcom/bnge/bnge_rmem.c
index 94f15e08a88c..b066ee887a09 100644
--- a/drivers/net/ethernet/broadcom/bnge/bnge_rmem.c
+++ b/drivers/net/ethernet/broadcom/bnge/bnge_rmem.c
@@ -324,7 +324,6 @@ int bnge_alloc_ctx_mem(struct bnge_dev *bd)
 	u32 l2_qps, qp1_qps, max_qps;
 	u32 ena, entries_sp, entries;
 	u32 srqs, max_srqs, min;
-	u32 num_mr, num_ah;
 	u32 extra_srqs = 0;
 	u32 extra_qps = 0;
 	u32 fast_qpmd_qps;
@@ -390,21 +389,6 @@ int bnge_alloc_ctx_mem(struct bnge_dev *bd)
 	if (!bnge_is_roce_en(bd))
 		goto skip_rdma;
 
-	ctxm = &ctx->ctx_arr[BNGE_CTX_MRAV];
-	/* 128K extra is needed to accommodate static AH context
-	 * allocation by f/w.
-	 */
-	num_mr = min_t(u32, ctxm->max_entries / 2, 1024 * 256);
-	num_ah = min_t(u32, num_mr, 1024 * 128);
-	ctxm->split_entry_cnt = BNGE_CTX_MRAV_AV_SPLIT_ENTRY + 1;
-	if (!ctxm->mrav_av_entries || ctxm->mrav_av_entries > num_ah)
-		ctxm->mrav_av_entries = num_ah;
-
-	rc = bnge_setup_ctxm_pg_tbls(bd, ctxm, num_mr + num_ah, 2);
-	if (rc)
-		return rc;
-	ena |= FUNC_BACKING_STORE_CFG_REQ_ENABLES_MRAV;
-
 	ctxm = &ctx->ctx_arr[BNGE_CTX_TIM];
 	rc = bnge_setup_ctxm_pg_tbls(bd, ctxm, l2_qps + qp1_qps + extra_qps, 1);
 	if (rc)
-- 
2.47.1


^ permalink raw reply related

* Re: [PATCH v2] net: wwan: t7xx: validate port_count against message length in t7xx_port_enum_msg_handler
From: Willy Tarreau @ 2026-04-14 16:23 UTC (permalink / raw)
  To: Pavitra Jha; +Cc: pabeni, chandrashekar.devegowda, linux-wwan, netdev, stable
In-Reply-To: <20260414153201.1633720-1-jhapavitra98@gmail.com>

Hello,

On Tue, Apr 14, 2026 at 11:31:56AM -0400, Pavitra Jha wrote:
> t7xx_port_enum_msg_handler() uses the modem-supplied port_count field as
> a loop bound over port_msg->data[] without checking that the message buffer
> contains sufficient data. A modem sending port_count=65535 in a 12-byte
> buffer triggers a slab-out-of-bounds read of up to 262140 bytes.
> 
> Add a struct_size() check after extracting port_count and before the loop.
> Pass msg_len to t7xx_port_enum_msg_handler() and use it to validate
> the message size before accessing port_msg->data[].
> Pass msg_len from both call sites: skb->len at the DPMAIF path after
> skb_pull(), and the captured rt_feature->data_len at the handshake path.
> 
> Fixes: 39d439047f1d ("net: wwan: t7xx: Add control DMA interface")
> Cc: stable@vger.kernel.org
> Reported-by: Pavitra Jha <jhapavitra98@gmail.com>
> Signed-off-by: Pavitra Jha <jhapavitra98@gmail.com>

Please note that you don't need the Reported-by tag when it's the same
as the Signed-off-by one.
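
For reference, the kind of length check the quoted commit message
describes can be sketched in plain C as below. This is a hedged
illustration only: `port_count_fits()` is a hypothetical helper, and the
header and entry sizes are passed in as parameters rather than taken from
the real t7xx structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: does a message of msg_len bytes have room for
 * hdr_len bytes of header plus port_count entries of entry_size bytes?
 * Dividing instead of multiplying avoids any overflow on the count.
 */
static bool port_count_fits(size_t msg_len, size_t hdr_len,
			    uint16_t port_count, size_t entry_size)
{
	assert(entry_size != 0);

	if (msg_len < hdr_len)
		return false;

	return port_count <= (msg_len - hdr_len) / entry_size;
}
```

With 4-byte entries, a port_count of 65535 implies 65535 * 4 = 262140
bytes of entries, which is the figure quoted above; a 12-byte message
cannot hold that regardless of the assumed header size.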

Also, I'm noticing a few empty-line removals out of context below:

> diff --git a/drivers/net/wwan/t7xx/t7xx_modem_ops.c b/drivers/net/wwan/t7xx/t7xx_modem_ops.c
> index 7968e208d..d0559fe16 100644
> --- a/drivers/net/wwan/t7xx/t7xx_modem_ops.c
> +++ b/drivers/net/wwan/t7xx/t7xx_modem_ops.c
> @@ -453,25 +453,25 @@ static int t7xx_parse_host_rt_data(struct t7xx_fsm_ctl *ctl, struct t7xx_sys_inf
>  {
>  	enum mtk_feature_support_type ft_spt_st, ft_spt_cfg;
>  	struct mtk_runtime_feature *rt_feature;
> +	size_t feat_data_len;
>  	int i, offset;
>  
>  	offset = sizeof(struct feature_query);
>  	for (i = 0; i < FEATURE_COUNT && offset < data_length; i++) {
>  		rt_feature = data + offset;
> -		offset += sizeof(*rt_feature) + le32_to_cpu(rt_feature->data_len);
> -
> +		feat_data_len = le32_to_cpu(rt_feature->data_len);
> +		offset += sizeof(*rt_feature) + feat_data_len;
>  		ft_spt_cfg = FIELD_GET(FEATURE_MSK, core->feature_set[i]);
>  		if (ft_spt_cfg != MTK_FEATURE_MUST_BE_SUPPORTED)
>  			continue;
> -

here

>  		ft_spt_st = FIELD_GET(FEATURE_MSK, rt_feature->support_info);
>  		if (ft_spt_st != MTK_FEATURE_MUST_BE_SUPPORTED)
>  			return -EINVAL;
> -

Here, the original author probably left the line to highlight the return
statement.

> -		if (i == RT_ID_MD_PORT_ENUM || i == RT_ID_AP_PORT_ENUM)
> -			t7xx_port_enum_msg_handler(ctl->md, rt_feature->data);
> +		if (i == RT_ID_MD_PORT_ENUM || i == RT_ID_AP_PORT_ENUM) {
> +			t7xx_port_enum_msg_handler(ctl->md, rt_feature->data,
> +						   feat_data_len);
> +		}
>  	}
> -

Here, why?

>  	return 0;
>  }
>  
> diff --git a/drivers/net/wwan/t7xx/t7xx_port_ctrl_msg.c b/drivers/net/wwan/t7xx/t7xx_port_ctrl_msg.c
> index ae632ef96..d984a688d 100644
> --- a/drivers/net/wwan/t7xx/t7xx_port_ctrl_msg.c
> +++ b/drivers/net/wwan/t7xx/t7xx_port_ctrl_msg.c
> @@ -154,7 +161,6 @@ int t7xx_port_enum_msg_handler(struct t7xx_modem *md, void *msg)
>  
>  	return 0;
>  }
> -

This one as well.

>  static int control_msg_handler(struct t7xx_port *port, struct sk_buff *skb)
>  {
>  	const struct t7xx_port_conf *port_conf = port->port_conf;

Better leave them untouched; it will keep the code as readable as it
previously was and reduce the overall review effort.

thanks,
willy

^ permalink raw reply
