Netdev List

* Re: [PATCH net-next v3 00/12] BIG TCP for UDP tunnels
From: Alice Mikityanska @ 2026-04-15 12:14 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alice Mikityanska, Daniel Borkmann, David S. Miller, Eric Dumazet,
	Paolo Abeni, Xin Long, Willem de Bruijn, David Ahern,
	Nikolay Aleksandrov, Shuah Khan, Stanislav Fomichev, Andrew Lunn,
	Simon Horman, Florian Westphal, netdev
In-Reply-To: <20260413155552.5cd00bc0@kernel.org>

On Tue, 14 Apr 2026 at 01:55, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 10 Apr 2026 18:09:31 +0300 Alice Mikityanska wrote:
> > This series is a follow-up to "BIG TCP without HBH in IPv6", and it adds
> > support for BIG TCP IPv4/IPv6 workloads in vxlan and geneve. Now that
> > IPv6 BIG TCP doesn't require stripping the HBH in all the various
> > combinations of tunneled traffic, adding BIG TCP becomes feasible.
>
> No longer applies, sorry :(

That's a pity :(. I see that the only conflict is because udplite
parts have been removed from net/netfilter/nf_conntrack_proto_udp.c,
so I just need to drop my change that touches udplite.

> We'll have to revisit after the merge window.

OK, I'll resubmit after the merge window. I'd appreciate it if I could
still collect review comments in the meantime.

> --
> pw-bot: cr

* Re: [PATCH v2] Bluetooth: Add Broadcom channel priority commands
From: Sasha Finkelstein @ 2026-04-15 12:33 UTC (permalink / raw)
  To: Luiz Augusto von Dentz
  Cc: Sven Peter, Janne Grunau, Neal Gompa, Marcel Holtmann,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, linux-kernel, asahi, linux-arm-kernel,
	linux-bluetooth, netdev
In-Reply-To: <CABBYNZJAEqwfTuVqbFAnx97HBSjcwn3Hb+y+r4r2C=MMPxFoDg@mail.gmail.com>

On Tue, 14 Apr 2026 at 16:00, Luiz Augusto von Dentz
<luiz.dentz@gmail.com> wrote:
> > +       if (sock)
> > +               set_bit(SOCK_CUSTOM_SOCKOPT, &sock->flags);
>
> This is more complicated than it needs to be. I'd just add a new
> callback, `hdev->set_priority(handle, skb->priority)`, so the driver
> is called whenever it needs to elevate a connection's priority. That
> said, there could be cases where a connection needs its priority set
> momentarily to transmit A2DP, followed by OBEX packets that are best
> effort. Therefore, `hci_conn` will probably need to track the priority
> so it can detect when it needs changing on a per skb basis.

I have tested per-skb priorities, and unfortunately, this does not work.
If something tries to send a low-priority packet (for example, a volume
adjustment), a priority drop causes the same kind of dropout that is
caused by scans. It appears that the only way to make this hardware work
is to set the entire hci connection as high priority for as long as it
is being used to transmit audio.

* Re: [PATCH] net/sched: sch_dualpi2: fix NULL pointer dereference in dualpi2_change()
From: Simon Horman @ 2026-04-15 12:34 UTC (permalink / raw)
  To: veritas501
  Cc: chia-yu.chang, jhs, jiri, davem, edumazet, kuba, linux-kernel,
	netdev, pabeni
In-Reply-To: <20260414160000.1@hxzene.gmail.com>

On Wed, Apr 15, 2026 at 10:31:58AM +0800, veritas501 wrote:
> From: "Kito Xu (veritas501)" <hxzene@gmail.com>
> 
> Hi Simon,
> 
> Thanks for the review and for pointing out the alternative patch
> from Chia-Yu. I agree that the more comprehensive fix is the better
> choice for this code path.
> 
> Since I independently discovered and reported this issue, would it
> be possible to add a Reported-by tag to Chia-Yu's patch?
> 
>     Reported-by: "Kito Xu (veritas501)" <hxzene@gmail.com>

That sounds reasonable to me.
But it might be best to bring it up in a response to Chia-Yu's patch.
(I can't make it happen myself.)

> 
> Either way, thanks for handling this!
> 
> Best regards,
> Kito

* Re: [PATCH net v2 3/4] net: mana: Don't overwrite port probe error with add_adev result
From: Simon Horman @ 2026-04-15 12:37 UTC (permalink / raw)
  To: Erni Sri Satya Vennela
  Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, ssengar, dipayanroy, gargaditya,
	shirazsaleem, kees, kotaranov, leon, shacharr, stephen,
	linux-hyperv, netdev, linux-kernel
In-Reply-To: <ad84k4qaMd9wnYwd@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>

On Wed, Apr 15, 2026 at 12:04:51AM -0700, Erni Sri Satya Vennela wrote:
> On Tue, Apr 14, 2026 at 04:35:03PM +0100, Simon Horman wrote:
> > From: 'Simon Horman' <horms@kernel.org>
> > 
> > This is an AI-generated review of your patch. The human sending this
> > email has considered the AI review valid, or at least plausible.
> > Full review at: https://sashiko.dev
> > ---
> > net: mana: Don't overwrite port probe error with add_adev result
> > 
> > This commit ensures that if mana_probe_port() or mana_attach() fails, the
> > error is not overwritten by the result of add_adev(). By preserving the
> > error, it triggers mana_remove() cleanup instead of leaving ports partially
> > initialized.
> > 
> > > diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > > index 9d095a6fb56c7..4d42cbfb3f15b 100644
> > > --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > > +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > > @@ -3634,10 +3634,9 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
> > >  	if (!resuming) {
> > >  		for (i = 0; i < ac->num_ports; i++) {
> > >  			err = mana_probe_port(ac, i, &ac->ports[i]);
> > > -			/* we log the port for which the probe failed and stop
> > > -			 * probes for subsequent ports.
> > > -			 * Note that we keep running ports, for which the probes
> > > -			 * were successful, unless add_adev fails too
> > > +			/* Log the port for which the probe failed, stop probing
> > > +			 * subsequent ports, and skip add_adev.
> > > +			 * Already-probed ports remain functional.
> > >  			 */
> > 
> > Are these comments accurate regarding ports remaining functional?
> > 
> > Since this patch preserves the error code, execution falls through to the end
> > of mana_probe() where it checks if err is set and calls mana_remove().
> > mana_remove() will iterate over the successfully allocated ports, completely
> > detach them, unregister their net_devices, and destroy event queues.
> > 
> > It appears no ports remain functional because the device is completely torn
> > down.
> >
> I will update the patch with a corrected comment.

Thanks.

> > [ ... ]
> > 
> > > @@ -3651,10 +3650,9 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
> > >  			enable_work(&apc->queue_reset_work);
> > >  			err = mana_attach(ac->ports[i]);
> > >  			rtnl_unlock();
> > > -			/* we log the port for which the attach failed and stop
> > > -			 * attach for subsequent ports
> > > -			 * Note that we keep running ports, for which the attach
> > > -			 * were successful, unless add_adev fails too
> > > +			/* Log the port for which the attach failed, stop
> > > +			 * attaching subsequent ports, and skip add_adev.
> > > +			 * Already-attached ports remain functional.
> > >  			 */
> > >  			if (err) {
> > >  				dev_err(dev, "Attach Failed for port %d\n", i);
> > > @@ -3663,7 +3661,8 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
> > 
> > Does propagating this error during PM resume introduce a NULL pointer
> > dereference?
> > 
> > If PM resume fails here (resuming == true), the preserved error code causes
> > mana_probe() to call mana_remove(), which tears down the device and sets
> > gd->gdma_context = NULL.
> > 
> > However, a failed resume callback does not automatically unbind the driver.
> > When the device is eventually unbound later, mana_remove() is invoked again:
> > 
> > drivers/net/ethernet/microsoft/mana/mana_en.c:mana_remove() {
> >     struct gdma_context *gc = gd->gdma_context;
> >     struct device *dev = gc->dev;
> >     ...
> > }
> > 
> > In the second invocation, gd->gdma_context is NULL, meaning gc is NULL,
> > which could lead to a kernel panic when dereferencing gc->dev.
> 
> Thank you for pointing it out, Simon.
> Since this is a pre-existing bug, I will create a separate patch for
> this change and make it part of this patchset.

Likewise, thanks.

FTR, if it is a pre-existing bug then I don't think it needs
to block progress of your patchset. Even if fixing things
sooner rather than later is a good maxim.

* RE: [PATCH] net/sched: sch_dualpi2: fix NULL pointer dereference in dualpi2_change()
From: Chia-Yu Chang (Nokia) @ 2026-04-15 12:37 UTC (permalink / raw)
  To: Simon Horman, veritas501
  Cc: jhs@mojatatu.com, jiri@resnulli.us, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	pabeni@redhat.com
In-Reply-To: <20260415123403.GF772670@horms.kernel.org>

> -----Original Message-----
> From: Simon Horman <horms@kernel.org> 
> Sent: Wednesday, April 15, 2026 2:34 PM
> To: veritas501 <hxzene@gmail.com>
> Cc: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>; jhs@mojatatu.com; jiri@resnulli.us; davem@davemloft.net; edumazet@google.com; kuba@kernel.org; linux-kernel@vger.kernel.org; netdev@vger.kernel.org; pabeni@redhat.com
> Subject: Re: [PATCH] net/sched: sch_dualpi2: fix NULL pointer dereference in dualpi2_change()
> 
> On Wed, Apr 15, 2026 at 10:31:58AM +0800, veritas501 wrote:
> > From: "Kito Xu (veritas501)" <hxzene@gmail.com>
> >
> > Hi Simon,
> >
> > Thanks for the review and for pointing out the alternative patch from 
> > Chia-Yu. I agree that the more comprehensive fix is the better choice 
> > for this code path.
> >
> > Since I independently discovered and reported this issue, would it be 
> > possible to add a Reported-by tag to Chia-Yu's patch?
> >
> >     Reported-by: "Kito Xu (veritas501)" <hxzene@gmail.com>
> 
> That sounds reasonable to me.
> But it might be best to bring it up in a response to Chia-Yu's patch.
> (I can't make it happen myself.)
> 
> >
> > Either way, thanks for handling this!
> >
> > Best regards,
> > Kito

Hi Kito and Simon,

Sure, I will add the proposed tag to the other patch in v2 (let's wait for more feedback before submitting v2).
Thanks!

Chia-Yu

* [syzbot ci] Re: [PATCH net] tipc: fix UAF race in tipc_mon_peer_up/down/remove_peer vs bearer teardown
From: syzbot ci @ 2026-04-15 12:39 UTC (permalink / raw)
  To: jmaloy, kai.aizen.dev, kuba, netdev, pabeni, stable,
	tipc-discussion, ying.xue
  Cc: syzbot, syzkaller-bugs
In-Reply-To: <20260415061211.45530-1-95986478+SnailSploit@users.noreply.github.com>

syzbot ci has tested the following series

[v1] [PATCH net] tipc: fix UAF race in tipc_mon_peer_up/down/remove_peer vs bearer teardown
https://lore.kernel.org/all/20260415061211.45530-1-95986478+SnailSploit@users.noreply.github.com
* [PATCH] [PATCH net] tipc: fix UAF race in tipc_mon_peer_up/down/remove_peer vs bearer teardown

and found the following issue:
WARNING: suspicious RCU usage in tipc_mon_delete

Full report is available here:
https://ci.syzbot.org/series/6267bc07-4172-4821-b3e5-dac381479d9d

***

WARNING: suspicious RCU usage in tipc_mon_delete

tree:      net-next
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net-next.git
base:      35c2c39832e569449b9192fa1afbbc4c66227af7
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/a29dabe7-96d8-4072-bc2c-d798a349301e/config
syz repro: https://ci.syzbot.org/findings/f144d75a-7c29-41a1-988e-09892a89baa1/syz_repro

tipc: Disabling bearer <eth:syzkaller0>
=============================
WARNING: suspicious RCU usage
syzkaller #0 Not tainted
-----------------------------
net/tipc/monitor.c:108 suspicious rcu_dereference_check() usage!

other info that might help us debug this:


rcu_scheduler_active = 2, debug_locks = 1
1 lock held by syz.2.19/5962:
 #0: ffffffff8fbcba48 (rtnl_mutex){+.+.}-{4:4}, at: tun_detach drivers/net/tun.c:634 [inline]
 #0: ffffffff8fbcba48 (rtnl_mutex){+.+.}-{4:4}, at: tun_chr_close+0x3e/0x1c0 drivers/net/tun.c:3438

stack backtrace:
CPU: 1 UID: 0 PID: 5962 Comm: syz.2.19 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 lockdep_rcu_suspicious+0x13f/0x1d0 kernel/locking/lockdep.c:6876
 tipc_monitor_rcu_bh+0xf5/0x110 net/tipc/monitor.c:108
 get_self net/tipc/monitor.c:209 [inline]
 tipc_mon_delete+0x10b/0x4d0 net/tipc/monitor.c:704
 tipc_l2_device_event+0x370/0x680 net/tipc/bearer.c:-1
 notifier_call_chain+0x1be/0x400 kernel/notifier.c:85
 call_netdevice_notifiers_extack net/core/dev.c:2287 [inline]
 call_netdevice_notifiers net/core/dev.c:2301 [inline]
 unregister_netdevice_many_notify+0x17a5/0x22c0 net/core/dev.c:12464
 unregister_netdevice_many net/core/dev.c:12527 [inline]
 unregister_netdevice_queue+0x31f/0x360 net/core/dev.c:12337
 unregister_netdevice include/linux/netdevice.h:3427 [inline]
 __tun_detach+0x6d9/0x15d0 drivers/net/tun.c:621
 tun_detach drivers/net/tun.c:637 [inline]
 tun_chr_close+0x10a/0x1c0 drivers/net/tun.c:3438
 __fput+0x44f/0xa70 fs/file_table.c:469
 task_work_run+0x1d9/0x270 kernel/task_work.c:233
 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
 __exit_to_user_mode_loop kernel/entry/common.c:67 [inline]
 exit_to_user_mode_loop+0xed/0x480 kernel/entry/common.c:98
 __exit_to_user_mode_prepare include/linux/irq-entry-common.h:226 [inline]
 syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:256 [inline]
 syscall_exit_to_user_mode include/linux/entry-common.h:325 [inline]
 do_syscall_64+0x32d/0xf80 arch/x86/entry/syscall_64.c:100
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f7b26d9c819
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffec30cee78 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4
RAX: 0000000000000000 RBX: 00007ffec30cef60 RCX: 00007f7b26d9c819
RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003
RBP: 0000000000011900 R08: 0000000000000001 R09: 0000000000000000
R10: 0000001b2e520000 R11: 0000000000000246 R12: 00007ffec30cefa0
R13: 00007f7b27015fac R14: 000000000001193b R15: 00007f7b27015fa0
 </TASK>


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.

To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).

The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
From: Russell King (Oracle) @ 2026-04-15 12:43 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	linux-stm32, Linux Network Development Mailing List, Paolo Abeni
In-Reply-To: <CAH5Ym4jA8w9=UxMT4vKJpnXkuDHtkFtMcg4u2sy_0S+8wgy-9w@mail.gmail.com>

On Tue, Apr 14, 2026 at 07:12:34PM -0700, Sam Edwards wrote:
> On Tue, Apr 14, 2026 at 6:19 PM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> > Okay, just a quick note to say that nvidia's 5.10.216-tegra kernel
> > survives iperf3 -c -R to the imx6.
> 
> Hi Russell,
> 
> Aw, you beat me to it! I was about to report that 5.10.104-tegra is
> unaffected. And my iperf3 server is a multi-GbE amd64 machine.
> 
> > Dumping the registers and comparing, and then forcing the RQS and TQS
> > values to 0x23 (+1 = 36, *256 = 9216 bytes) and 0x8f (+1 = 144,
> > *256 = 36864 bytes) respectively seems to solve the problem. Under
> > net-next, these both end up being 0xff (+1 = 256, *256 = 65536 bytes.)
> > Suspiciously, 36 * 4 = 144, and I also see that this kernel programs
> > all four of the MTL receive operation mode registers, but only the
> > first MTL transmit operation mode register. However, DMA channels 1-3
> > aren't initialised.
> 
> Wow, great! I wonder if the problem is that the MTL FIFOs are smaller
> than that, so when the DMA suffers a momentary hiccup, the FIFOs are
> allowed to overflow, putting the hardware in a bad state.
> 
> Though I suspect this is only half of the problem: do you still see
> RBUs? Everything you've shared so far suggests the DMA failures are
> _not_ because the rx ring is drying up.

Yes. Note that RBUs will happen not because of DMA failures, but if
the kernel fails to keep up with the packet rate. RBU means "we read
the next descriptor, and it wasn't owned by hardware".

> > Looking back at 5.10, I don't see any code that would account for these
> > values being programmed for TQS and RQS, it looks like the calculations
> > are basically the same as we have today.
> 
> Note that Nvidia have their own "nvethernet" driver for their vendor
> kernel, which appears to pick the FIFO sizes from hardcoded tables in
> its eqos_configure_mtl_queue() [1] function.

That has:

	const nveu32_t rx_fifo_sz[2U][OSI_EQOS_MAX_NUM_QUEUES] = {
		{ FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U),
		  FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U) },
		{ FIFO_SZ(36U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U),
		  FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(16U) },
	};
	const nveu32_t tx_fifo_sz[2U][OSI_EQOS_MAX_NUM_QUEUES] = {
		{ FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U),
		  FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U) },
		{ FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U),
		  FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U) },
	};

where each of those values is the RQS/TQS value to use in KiB:

#define FIFO_SZ(x)		((((x) * 1024U) / 256U) - 1U)

This doesn't correspond with the values I'm seeing programmed into
the hardware under the 5.10.216-tegra kernel. I'm seeing TQS = 143
(36KiB), and RQS = 35 (9KiB). Yes, these values exist in the tables
above from a quick look, but they're not in the right place!

For example, tx_fifo_sz[] doesn't contain an entry for 36KiB.
rx_fifo_sz[0][0..3] looks plausible.

It's certainly not a case of misreading the register values, this is
what devmem2 said:

Value at address 0x02490d00: 0x008f000a
Value at address 0x02490d30: 0x02379eb0

where TQS is bits 24:16 of the register at offset 0xd00 - which is
0x8f, and RQS is bits 29:20 of the register at 0xd30, which is
0x23.

Now, as for FIFO sizes, if we sum up all the entries, then we
get:

SUM(rx_fifo_sz[0][]) = 60KiB
SUM(rx_fifo_sz[1][]) = 64KiB
SUM(tx_fifo_sz[0][]) = 60KiB
SUM(tx_fifo_sz[1][]) = 64KiB

From what I gather in core_local.h, l_mac_ver contains one of three
values - 0 = Legacy EQOS, 1 = Orin EQOS, 2 = Orin MGBE - and bit 0 of
that selects which set of values is used. Decoding this further,
Legacy EQOS is IP version v5.0, Orin EQOS is v5.3, and Orin MGBE
is v3.1 and v4.0.

So, I wonder whether there's something in "Legacy EQOS" that consumes
4KiB of FIFO that isn't documented in iMX8M (IP v5.1).

Is anyone aware of public SoC documentation that covers the v5.0 IP
version?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

* Re: [PATCH iwl-net 1/5] iavf: fix null pointer dereference in iavf_detect_recover_hung
From: Simon Horman @ 2026-04-15 12:48 UTC (permalink / raw)
  To: aleksandr.loktionov
  Cc: Simon Horman, intel-wired-lan, kiran.patil, sylwesterx.dziedziuch,
	netdev, anthony.l.nguyen, avinash.dayanand
In-Reply-To: <20260413073035.4082204-2-aleksandr.loktionov@intel.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.

For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
---
iavf: fix null pointer dereference in iavf_detect_recover_hung

During a concurrent reset, q_vectors are freed and re-allocated while
the watchdog task may still be iterating rings in
iavf_detect_recover_hung(). Dereferencing a NULL q_vector inside
iavf_force_wb() results in a crash. Guard against this by skipping
rings whose q_vector is NULL.

> Fixes: 9c6c12595b73 ("i40e: Detection and recovery of TX queue hung logic moved to service_task from tx_timeout")

The Fixes tag points to commit 9c6c12595b73, but that commit only modified
the i40e driver and did not introduce the iavf_detect_recover_hung function.

The bug was actually introduced by commit 07d44190a389 ("i40e/i40evf: Detect
and recover hung queue scenario"), which added iavf_detect_recover_hung with
the code pattern 'if (tx_ring && tx_ring->desc)' that calls
iavf_force_wb(vsi, tx_ring->q_vector) without checking if q_vector is NULL.

Should the Fixes tag be:

    Fixes: 07d44190a389 ("i40e/i40evf: Detect and recover hung queue scenario")

> diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
> index 363c42bf3dcf..e7e7fc9a5c68 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c

[ ... ]

> @@ -195,8 +194,11 @@ void iavf_detect_recover_hung(struct iavf_vsi *vsi)
>  		return;
>
>  	for (i = 0; i < vsi->back->num_active_queues; i++) {
> -		tx_ring = &vsi->back->tx_rings[i];
> -		if (tx_ring && tx_ring->desc) {
> +		struct iavf_ring *tx_ring = &vsi->back->tx_rings[i];
> +
> +		if (!tx_ring || !tx_ring->q_vector)
> +			continue;

The check `if (!tx_ring || ...)` tests whether tx_ring is NULL, but tx_ring
is assigned as `&vsi->back->tx_rings[i]`, which is the address of an array
element. This address is always non-NULL by C language rules, as noted in
the commit message itself: "the compiler can never observe since an
array-element address is always non-NULL."

This creates misleading logic. Other similar code in the driver correctly
omits such checks (for example, in iavf_virtchnl.c and elsewhere in
iavf_txrx.c).

Should this be simplified to just check the q_vector:

    if (!tx_ring->q_vector)
        continue;

> +		if (tx_ring->desc) {
>  			/* If packet counter has not changed the queue is
>  			 * likely stalled, so force an interrupt for this
>  			 * queue.

[ ... ]

* [PATCH net v4] openvswitch: cap upcall PID array size and pre-size vport replies
From: Weiming Shi @ 2026-04-15 12:51 UTC (permalink / raw)
  To: Aaron Conole, Eelco Chaudron, Ilya Maximets, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Pravin B Shelar, Alex Wang, Thomas Graf, netdev,
	dev, Xiang Mei, Weiming Shi

The vport netlink reply helpers allocate a fixed-size skb with
nlmsg_new(NLMSG_DEFAULT_SIZE, ...) but serialize the full upcall PID
array via ovs_vport_get_upcall_portids().  Since
ovs_vport_set_upcall_portids() accepts any non-zero multiple of
sizeof(u32) with no upper bound, a CAP_NET_ADMIN user can install a PID
array large enough to overflow the reply buffer, causing nla_put() to
fail with -EMSGSIZE and hitting BUG_ON(err < 0).  On systems with
unprivileged user namespaces enabled (e.g., Ubuntu default), this is
reachable via unshare -Urn since OVS vport mutation operations use
GENL_UNS_ADMIN_PERM.

 kernel BUG at net/openvswitch/datapath.c:2414!
 Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
 CPU: 1 UID: 0 PID: 65 Comm: poc Not tainted 7.0.0-rc7-00195-geb216e422044 #1
 RIP: 0010:ovs_vport_cmd_set+0x34c/0x400
 Call Trace:
  <TASK>
  genl_family_rcv_msg_doit (net/netlink/genetlink.c:1116)
  genl_rcv_msg (net/netlink/genetlink.c:1194)
  netlink_rcv_skb (net/netlink/af_netlink.c:2550)
  genl_rcv (net/netlink/genetlink.c:1219)
  netlink_unicast (net/netlink/af_netlink.c:1344)
  netlink_sendmsg (net/netlink/af_netlink.c:1894)
  __sys_sendto (net/socket.c:2206)
  __x64_sys_sendto (net/socket.c:2209)
  do_syscall_64 (arch/x86/entry/syscall_64.c:63)
  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
  </TASK>
 Kernel panic - not syncing: Fatal exception

Reject attempts to set more PIDs than nr_cpu_ids in
ovs_vport_set_upcall_portids(), and pre-compute the worst-case reply
size in ovs_vport_cmd_msg_size() based on that bound, similar to the
existing ovs_dp_cmd_msg_size().  nr_cpu_ids matches the cap already
used by the per-CPU dispatch configuration on the datapath side
(ovs_dp_cmd_fill_info() serialises at most nr_cpu_ids PIDs), so the
two sides stay consistent.

Fixes: 5cd667b0a456 ("openvswitch: Allow each vport to have an array of 'port_id's.")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
---
v4 (per Ilya):
- Use nr_cpu_ids instead of num_possible_cpus() for consistency with
  the per-CPU dispatch on the datapath side.
- Annotate ovs_vport_cmd_msg_size() per-attribute; split nested sums.
v3: Cap at num_possible_cpus(); add ovs_vport_cmd_msg_size(); keep
    BUG_ON(); fix Fixes tag.
v2: Dynamically size reply skb; drop WARN_ON_ONCE, return plain errors.
---
 net/openvswitch/datapath.c | 33 +++++++++++++++++++++++++++++++--
 net/openvswitch/vport.c    |  3 +++
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index e209099218b4..35e67e51b0d2 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -2184,9 +2184,38 @@ static int ovs_vport_cmd_fill_info(struct vport *vport, struct sk_buff *skb,
 	return err;
 }

+static size_t ovs_vport_cmd_msg_size(void)
+{
+	size_t msgsize = NLMSG_ALIGN(sizeof(struct ovs_header));
+
+	msgsize += nla_total_size(sizeof(u32)); /* OVS_VPORT_ATTR_PORT_NO */
+	msgsize += nla_total_size(sizeof(u32)); /* OVS_VPORT_ATTR_TYPE */
+	msgsize += nla_total_size(IFNAMSIZ);    /* OVS_VPORT_ATTR_NAME */
+	msgsize += nla_total_size(sizeof(u32)); /* OVS_VPORT_ATTR_IFINDEX */
+	msgsize += nla_total_size(sizeof(s32)); /* OVS_VPORT_ATTR_NETNSID */
+	/* OVS_VPORT_ATTR_STATS */
+	msgsize += nla_total_size_64bit(sizeof(struct ovs_vport_stats));
+	/* OVS_VPORT_ATTR_UPCALL_STATS(OVS_VPORT_UPCALL_ATTR_SUCCESS +
+	 *                             OVS_VPORT_UPCALL_ATTR_FAIL)
+	 */
+	msgsize += nla_total_size(nla_total_size_64bit(sizeof(u64)) +
+				  nla_total_size_64bit(sizeof(u64)));
+	/* OVS_VPORT_ATTR_UPCALL_PID (capped at nr_cpu_ids by
+	 * ovs_vport_set_upcall_portids())
+	 */
+	msgsize += nla_total_size(nr_cpu_ids * sizeof(u32));
+	/* OVS_VPORT_ATTR_OPTIONS(OVS_TUNNEL_ATTR_DST_PORT +
+	 *                        OVS_TUNNEL_ATTR_EXTENSION(OVS_VXLAN_EXT_GBP))
+	 */
+	msgsize += nla_total_size(nla_total_size(sizeof(u16)) +
+				  nla_total_size(nla_total_size(0)));
+
+	return msgsize;
+}
+
 static struct sk_buff *ovs_vport_cmd_alloc_info(void)
 {
-	return nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	return genlmsg_new(ovs_vport_cmd_msg_size(), GFP_KERNEL);
 }

 /* Called with ovs_mutex, only via ovs_dp_notify_wq(). */
@@ -2196,7 +2225,7 @@ struct sk_buff *ovs_vport_cmd_build_info(struct vport *vport, struct net *net,
 	struct sk_buff *skb;
 	int retval;

-	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	skb = ovs_vport_cmd_alloc_info();
 	if (!skb)
 		return ERR_PTR(-ENOMEM);

diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 23f629e94a36..56b2e2d1a749 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -406,6 +406,9 @@ int ovs_vport_set_upcall_portids(struct vport *vport, const struct nlattr *ids)
 	if (!nla_len(ids) || nla_len(ids) % sizeof(u32))
 		return -EINVAL;

+	if (nla_len(ids) / sizeof(u32) > nr_cpu_ids)
+		return -EINVAL;
+
 	old = ovsl_dereference(vport->upcall_portids);

 	vport_portids = kmalloc(sizeof(*vport_portids) + nla_len(ids),
-- 
2.43.0

* Re: [PATCH bpf] bpf,tcp: avoid infinite recursion in BPF_SOCK_OPS_HDR_OPT_LEN_CB
From: KaFai Wan @ 2026-04-15 12:52 UTC (permalink / raw)
  To: Jiayuan Chen, bpf
  Cc: Quan Sun, Yinhao Hu, Kaiyan Mei, Dongliang Mu, Eric Dumazet,
	Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David Ahern, netdev, linux-doc, linux-kernel
In-Reply-To: <0b3a3a41-f709-4414-8a5d-d2eb4959db3f@linux.dev>

On Wed, 2026-04-15 at 09:47 +0800, Jiayuan Chen wrote:
> 
> On 4/14/26 11:37 PM, mkf wrote:
> > On Tue, 2026-04-14 at 18:57 +0800, Jiayuan Chen wrote:
> 
> Hi Martin, I saw your patch. Your solution is better, please ignore mine :)
> 
I'm not Martin, we just share the same first name :). Ok, I'll continue.
> 
> 

-- 
Thanks,
KaFai

* Re: [PATCH net v5] net: stmmac: Prevent NULL deref when RX memory exhausted
From: Russell King (Oracle) @ 2026-04-15 12:56 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Maxime Coquelin, Alexandre Torgue, Maxime Chevallier,
	Ovidiu Panait, Vladimir Oltean, Baruch Siach, Serge Semin,
	Giuseppe Cavallaro, netdev, linux-stm32, linux-arm-kernel,
	linux-kernel, stable
In-Reply-To: <20260415023947.7627-1-CFSworks@gmail.com>

On Tue, Apr 14, 2026 at 07:39:47PM -0700, Sam Edwards wrote:
> The CPU receives frames from the MAC through conventional DMA: the CPU
> allocates buffers for the MAC, then the MAC fills them and returns
> ownership to the CPU. For each hardware RX queue, the CPU and MAC
> coordinate through a shared ring array of DMA descriptors: one
> descriptor per DMA buffer. Each descriptor includes the buffer's
> physical address and a status flag ("OWN") indicating which side owns
> the buffer: OWN=0 for CPU, OWN=1 for MAC. The CPU is only allowed to set
> the flag and the MAC is only allowed to clear it, and both must move
> through the ring in sequence: thus the ring is used for both
> "submissions" and "completions."
> 
> In the stmmac driver, stmmac_rx() bookmarks its position in the ring
> with the `cur_rx` index. The main receive loop in that function checks
> for rx_descs[cur_rx].own=0, gives the corresponding buffer to the
> network stack (NULLing the pointer), and increments `cur_rx` modulo the
> ring size. After the loop exits, stmmac_rx_refill(), which bookmarks its
> position with `dirty_rx`, allocates fresh buffers and rearms the
> descriptors (setting OWN=1). If it fails any allocation, it simply stops
> early (leaving OWN=0) and will retry where it left off when next called.
> 
> This means descriptors have a three-stage lifecycle (terms my own):
> - `empty` (OWN=1, buffer valid)
> - `full` (OWN=0, buffer valid and populated)
> - `dirty` (OWN=0, buffer NULL)
> 
> But because stmmac_rx() only checks OWN, it confuses `full`/`dirty`. In
> the past (see 'Fixes:'), there was a bug where the loop could cycle
> `cur_rx` all the way back to the first descriptor it dirtied, resulting
> in a NULL dereference when mistaken for `full`. The aforementioned
> commit resolved that *specific* failure by capping the loop's iteration
> limit at `dma_rx_size - 1`, but this is only a partial fix: if the
> previous stmmac_rx_refill() didn't complete, then there are leftover
> `dirty` descriptors that the loop might encounter without needing to
> cycle fully around. The current code therefore panics (see 'Closes:')
> when stmmac_rx_refill() is memory-starved long enough for `cur_rx` to
> catch up to `dirty_rx`.
> 
> Fix this by further tightening the clamp from `dma_rx_size - 1` to
> `dma_rx_size - stmmac_rx_dirty() - 1`, subtracting any remnant dirty
> entries and limiting the loop so that `cur_rx` cannot catch back up to
> `dirty_rx`. This carries no risk of arithmetic underflow: since the
> maximum possible return value of stmmac_rx_dirty() is `dma_rx_size - 1`,
> the worst the clamp can do is prevent the loop from running at all.
> 
> Fixes: b6cb4541853c7 ("net: stmmac: avoid rx queue overrun")
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221010
> Cc: stable@vger.kernel.org
> Signed-off-by: Sam Edwards <CFSworks@gmail.com>

Locally, while debugging my issues, I used this to prevent cur_rx
catching up with dirty_rx:

                status = stmmac_rx_status(priv, &priv->xstats, p);
                /* check if managed by the DMA otherwise go ahead */
                if (unlikely(status & dma_own))
                        break;

                next_entry = STMMAC_NEXT_ENTRY(rx_q->cur_rx,
                                               priv->dma_conf.dma_rx_size);
                if (unlikely(next_entry == rx_q->dirty_rx))
                        break;

                rx_q->cur_rx = next_entry;

If we care about the cost of reloading rx_q->dirty_rx on every
iteration, then I'd suggest that the cost we already incur reading and
writing rx_q->cur_rx is something that should be addressed, and
eliminating that would counter the cost of reading rx_q->dirty_rx. I
suspect, however, that the cost is minimal, as cur_rx and dirty_rx are
likely in the same cache line.

It looks like any fix to stmmac_rx() will also need a corresponding
fix for stmmac_rx_zc().
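
For anyone following along, below is a small userspace model of the
check quoted above. It is only a sketch under stated assumptions:
RING_SIZE, struct rx_ring and the ring_*() helpers are invented for
illustration and are not the stmmac structures. The point it shows is
that stopping when the next entry equals dirty_rx means the consumer
never walks onto a descriptor whose buffer has not been refilled.

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8u	/* hypothetical ring size, for illustration only */

struct rx_ring {
	bool own[RING_SIZE];	/* true: descriptor owned by the MAC (OWN=1) */
	void *buf[RING_SIZE];	/* NULL once the buffer went to the stack */
	unsigned int cur_rx;	/* next descriptor the consumer looks at */
	unsigned int dirty_rx;	/* next descriptor the refill path rearms */
};

static unsigned int ring_next(unsigned int idx)
{
	return (idx + 1) % RING_SIZE;
}

/* Number of consumed-but-not-yet-refilled descriptors. */
static unsigned int ring_dirty(const struct rx_ring *r)
{
	return (r->cur_rx + RING_SIZE - r->dirty_rx) % RING_SIZE;
}

/* Consume up to @limit descriptors; returns how many were consumed. */
static unsigned int ring_rx(struct rx_ring *r, unsigned int limit)
{
	unsigned int count = 0;

	while (count < limit) {
		unsigned int next;

		if (r->own[r->cur_rx])		/* still owned by the MAC */
			break;

		next = ring_next(r->cur_rx);
		if (next == r->dirty_rx)	/* would catch up with dirty_rx */
			break;

		r->buf[r->cur_rx] = NULL;	/* buffer handed to the stack */
		r->cur_rx = next;
		count++;
	}
	return count;
}

/* Rearm at most @avail descriptors; @avail models how many allocations succeed. */
static unsigned int ring_refill(struct rx_ring *r, void *fresh, unsigned int avail)
{
	unsigned int done = 0;

	while (r->dirty_rx != r->cur_rx && done < avail) {
		r->buf[r->dirty_rx] = fresh;
		r->own[r->dirty_rx] = true;
		r->dirty_rx = ring_next(r->dirty_rx);
		done++;
	}
	return done;
}
```

In this model at most RING_SIZE - 1 descriptors can ever be dirty, which
matches the upper bound on stmmac_rx_dirty() described in the patch, and
a partial refill simply leaves dirty_rx behind for the next attempt.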

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* Re: [PATCH net-next 5/6] net: stmmac: move PHY handling out of __stmmac_open()/release()
From: Russell King (Oracle) @ 2026-04-15 12:59 UTC (permalink / raw)
  To: Alexander Stein
  Cc: Andrew Lunn, Heiner Kallweit, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel,
	linux-stm32, Maxime Coquelin, netdev, Paolo Abeni
In-Reply-To: <8409022.LvFx2qVVIh@steina-w>

On Wed, Apr 15, 2026 at 08:08:40AM +0200, Alexander Stein wrote:
> Hi,
> 
> Am Dienstag, 23. September 2025, 13:26:19 CEST schrieb Russell King (Oracle):
> > Move the PHY attachment/detachment from the network driver out of
> > __stmmac_open() and __stmmac_release() into stmmac_open() and
> > stmmac_release() where these actions will only happen when the
> > interface is administratively brought up or down. It does not make
> > sense to detach and re-attach the PHY during a change of MTU.
> 
> Sorry for coming up now. But I recently noticed this commit breaks changing
> the MTU on i.MX8MP. Once I simply change the MTU I run into some DMA error:
> $ ip link set dev end1 mtu 1400
> imx-dwmac 30bf0000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-0
> imx-dwmac 30bf0000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-1
> imx-dwmac 30bf0000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-2
> imx-dwmac 30bf0000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-3
> imx-dwmac 30bf0000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-4
> imx-dwmac 30bf0000.ethernet end1: Link is Down
> imx-dwmac 30bf0000.ethernet end1: Failed to reset the dma
> imx-dwmac 30bf0000.ethernet end1: stmmac_hw_setup: DMA engine initialization failed

This basically means that a clock is missing. Please provide more
information:

- what kernel version are you using?
- has EEE been negotiated?
- does the problem persist when EEE is disabled?
- which PHY is attached to stmmac?
- which PHY interface mode is being used to connect the PHY to stmmac?

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* [PATCH iproute2] ss: force a flush in monitor mode
From: Eric Dumazet @ 2026-04-15 13:03 UTC (permalink / raw)
  To: David Ahern, Stephen Hemminger
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima,
	netdev, eric.dumazet, Eric Dumazet

Call fflush() from generic_show_sock() in order to work
with pipes and redirects.

After this patch, "ss -E &>log_file" works as expected.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 misc/ss.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/misc/ss.c b/misc/ss.c
index 1ea804ad549e23f767633e07efdd9adf1277af18..39b109276ffa83f12d1e1e9f8f2cf58c25737b4b 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -5534,6 +5534,7 @@ static int generic_show_sock(struct nlmsghdr *nlh, void *arg)
 
 	render();
 
+	fflush(stdout);
 	return ret;
 }
 
-- 
2.54.0.rc1.513.gad8abe7a5a-goog


^ permalink raw reply related

* [PATCH net v2] net: pse-pd: fix out-of-bounds bitmap access in pse_isr() on 32-bit
From: Kory Maincent @ 2026-04-15 13:02 UTC (permalink / raw)
  To: Jakub Kicinski, Kory Maincent (Dent Project), netdev,
	linux-kernel
  Cc: Carlo Szelinsky, thomas.petazzoni, Oleksij Rempel, Andrew Lunn,
	David S. Miller, Eric Dumazet, Paolo Abeni

In pse_isr(), notifs_mask was declared as a single unsigned long on the
stack (32 bits on 32-bit architectures). For PSE controllers with more
than 32 ports, this causes two problems:

- map_event callbacks could write bit positions >= 32 via
  *notifs_mask |= BIT(i), which is undefined behaviour on a 32-bit
  unsigned long and corrupts adjacent stack memory.

- for_each_set_bit(i, &notifs_mask, pcdev->nr_lines) treats
  &notifs_mask as a multi-word bitmap and reads beyond the single
  unsigned long when nr_lines > BITS_PER_LONG.

Fix this by moving notifs_mask out of the stack and into struct pse_irq
as a dynamically allocated bitmap. It is sized with
BITS_TO_LONGS(pcdev->nr_lines) words in devm_pse_irq_helper(), so it
is always wide enough regardless of the host word size.

Fixes: fc0e6db30941a ("net: pse-pd: Add support for reporting events")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
---

Changes in v2:
- Use devm_bitmap_zalloc() instead of devm_kcalloc().
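
For reviewers who want to see the sizing argument outside the kernel,
here is a minimal userspace sketch of a multi-word bitmap. All
sketch_*() names and SKETCH_* macros are stand-ins written for this
note; they imitate, but are not, the kernel's BITS_TO_LONGS() and
bitmap helpers.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins for BITS_PER_LONG and BITS_TO_LONGS(). */
#define SKETCH_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(n) \
	(((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Allocate a zeroed bitmap wide enough for @nbits on any word size. */
static unsigned long *sketch_bitmap_zalloc(size_t nbits)
{
	return calloc(SKETCH_BITS_TO_LONGS(nbits), sizeof(unsigned long));
}

/* Select the word first, then shift within it, so no shift exceeds the word. */
static void sketch_set_bit(unsigned long *map, size_t bit)
{
	map[bit / SKETCH_BITS_PER_LONG] |= 1UL << (bit % SKETCH_BITS_PER_LONG);
}

static bool sketch_test_bit(const unsigned long *map, size_t bit)
{
	return (map[bit / SKETCH_BITS_PER_LONG] >> (bit % SKETCH_BITS_PER_LONG)) & 1UL;
}

static bool sketch_bitmap_empty(const unsigned long *map, size_t nbits)
{
	for (size_t i = 0; i < nbits; i++)
		if (sketch_test_bit(map, i))
			return false;
	return true;
}
```

With 48 lines, for example, the allocation is two words on a 32-bit
host and one word on a 64-bit host, so bit 40 always lands inside the
allocation.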
---
 drivers/net/pse-pd/pse_core.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/pse-pd/pse_core.c b/drivers/net/pse-pd/pse_core.c
index f6b94ac7a68a4..87aa4f4e97249 100644
--- a/drivers/net/pse-pd/pse_core.c
+++ b/drivers/net/pse-pd/pse_core.c
@@ -1170,6 +1170,7 @@ struct pse_irq {
 	struct pse_controller_dev *pcdev;
 	struct pse_irq_desc desc;
 	unsigned long *notifs;
+	unsigned long *notifs_mask;
 };
 
 /**
@@ -1247,7 +1248,6 @@ static int pse_set_config_isr(struct pse_controller_dev *pcdev, int id,
 static irqreturn_t pse_isr(int irq, void *data)
 {
 	struct pse_controller_dev *pcdev;
-	unsigned long notifs_mask = 0;
 	struct pse_irq_desc *desc;
 	struct pse_irq *h = data;
 	int ret, i;
@@ -1257,14 +1257,15 @@ static irqreturn_t pse_isr(int irq, void *data)
 
 	/* Clear notifs mask */
 	memset(h->notifs, 0, pcdev->nr_lines * sizeof(*h->notifs));
+	bitmap_zero(h->notifs_mask, pcdev->nr_lines);
 	mutex_lock(&pcdev->lock);
-	ret = desc->map_event(irq, pcdev, h->notifs, &notifs_mask);
-	if (ret || !notifs_mask) {
+	ret = desc->map_event(irq, pcdev, h->notifs, h->notifs_mask);
+	if (ret || bitmap_empty(h->notifs_mask, pcdev->nr_lines)) {
 		mutex_unlock(&pcdev->lock);
 		return IRQ_NONE;
 	}
 
-	for_each_set_bit(i, &notifs_mask, pcdev->nr_lines) {
+	for_each_set_bit(i, h->notifs_mask, pcdev->nr_lines) {
 		unsigned long notifs, rnotifs;
 		struct pse_ntf ntf = {};
 
@@ -1340,6 +1341,10 @@ int devm_pse_irq_helper(struct pse_controller_dev *pcdev, int irq,
 	if (!h->notifs)
 		return -ENOMEM;
 
+	h->notifs_mask = devm_bitmap_zalloc(dev, pcdev->nr_lines, GFP_KERNEL);
+	if (!h->notifs_mask)
+		return -ENOMEM;
+
 	ret = devm_request_threaded_irq(dev, irq, NULL, pse_isr,
 					IRQF_ONESHOT | irq_flags,
 					irq_name, h);
-- 
2.43.0


^ permalink raw reply related

* Re: [syzbot ci] Re: veth: add Byte Queue Limits (BQL) support
From: Aleksandr Nogikh @ 2026-04-15 13:05 UTC (permalink / raw)
  To: syzbot+cib904ea9ebb647254, hawk
  Cc: netdev, linux-kernel, syzkaller-bugs, syzbot
In-Reply-To: <69dd48c2.a00a0220.468cb.004e.GAE@google.com>

... okay, one more fixed bug, one more try.


#syz test

---
  drivers/net/veth.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 911e7e36e166..9d7b085c9548 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1138,7 +1138,9 @@ static void veth_napi_del_range(struct net_device *dev, int start, int end)
         */
        peer = rtnl_dereference(priv->peer);
        if (peer) {
-               for (i = start; i < end; i++)
+               int peer_end = min(end, (int)peer->real_num_tx_queues);
+
+               for (i = start; i < peer_end; i++)
                        netdev_tx_reset_queue(netdev_get_tx_queue(peer, i));
        }


^ permalink raw reply related

* Re: [PATCH net v3 2/3] vsock/test: fix MSG_PEEK handling in recv_buf()
From: Luigi Leonardi @ 2026-04-15 13:11 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Stefan Hajnoczi, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Arseniy Krasnov, kvm, virtualization,
	netdev, linux-kernel
In-Reply-To: <ad96TgXHW_jKitls@sgarzare-redhat>

On Wed, Apr 15, 2026 at 01:54:43PM +0200, Stefano Garzarella wrote:
>On Wed, Apr 15, 2026 at 01:31:11PM +0200, Stefano Garzarella wrote:
>>On Tue, Apr 14, 2026 at 06:10:22PM +0200, Luigi Leonardi wrote:
>>>`recv_buf` does not handle the MSG_PEEK flag correctly: it keeps calling
>>>`recv` until all requested bytes are available or an error occurs.
>>>
>>>The problem is how it calculates the amount of bytes read: MSG_PEEK
>>>doesn't consume any bytes, will re-read the same bytes from the buffer
>>>head, so, summing the return value every time is wrong.
>>>
>>>Moreover, MSG_PEEK doesn't consume the bytes in the buffer, so if the
>>>requested amount is more than the bytes available, the loop will never
>>>terminate, because `recv` will never return EOF. For this reason we need
>>>to compare the amount of read bytes with the number of bytes expected.
>>>
>>>Add a check, and if the MSG_PEEK flag is present, update the counter of
>>>read bytes differently, and break if we read the expected amount.
>>
>>nit: "..., update the counter for bytes read only after all expected
>>bytes have been read and break out of the loop; otherwise, try again
>>after a short delay to avoid consuming too many CPU cycles."
>>
>>>
>>>This allows us to simplify the `test_stream_credit_update_test`, by
>>>reusing `recv_buf`, like some other tests already do.
>>>
>>>This also fixes callers that pass MSG_PEEK to recv_buf().
>>
>>nit: this is implicit from the first part of the description.
>>
>>>
>>>Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
>>>Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
>>>---
>>>tools/testing/vsock/util.c       | 15 +++++++++++++++
>>>tools/testing/vsock/vsock_test.c | 13 +------------
>>>2 files changed, 16 insertions(+), 12 deletions(-)
>>>
>>>diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
>>>index 1fe1338c79cd..2c9ee3210090 100644
>>>--- a/tools/testing/vsock/util.c
>>>+++ b/tools/testing/vsock/util.c
>>>@@ -381,7 +381,13 @@ void send_buf(int fd, const void *buf, size_t len, int flags,
>>>	}
>>>}
>>>
>>>+#define RECV_PEEK_RETRY_USEC 10
>>
>>10 usec IMO are a bit low, it could be the same order of the 
>>syscalls involved in the loop, I'd go to some milliseconds like we 
>>do for SEND_SLEEP_USEC.
>>
>>>+
>>>/* Receive bytes in a buffer and check the return value.
>>>+ *
>>>+ * MSG_PEEK note: MSG_PEEK doesn't consume bytes from the buffer, so partial
>>>+ * reads cannot be summed. Instead, the function retries until recv() returns
>>>+ * exactly expected_ret bytes in a single call.
>>
>>I'd replace with something like this:
>>
>>  * When MSG_PEEK is set, recv() is retried until it returns exactly
>>  * expected_ret bytes. The function returns on error, EOF, or timeout
>>  * as usual.
>>
>>Thanks,
>>Stefano
>>
>>>*
>>>* expected_ret:
>>>*  <0 Negative errno (for testing errors)
>>>@@ -403,6 +409,15 @@ void recv_buf(int fd, void *buf, size_t len, int flags, ssize_t expected_ret)
>>>		if (ret <= 0)
>>>			break;
>>>
>>>+		if (flags & MSG_PEEK) {
>>>+			if (ret == expected_ret) {
>
>On second thought, I think it would be more appropriate to check for
>`ret >= expected_ret` here, because all subsequent recv() will
>definitely return more bytes, so there’s no point in continuing the
>loop... and anyway, we’ll check the result later, so just that change
>should be fine.
>
>And of course I'd update the comment on top in this way:
>
>   * When MSG_PEEK is set, recv() is retried until it returns at least
>   * expected_ret bytes. The function returns on error, EOF, or timeout
>   * as usual.
>
>Thanks,
>Stefano
>

Good idea, will do.
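
To make sure we mean the same thing, here is a standalone sketch of the
MSG_PEEK loop with the `ret >= expected_ret` condition. It is not the
util.c code: peek_at_least(), PEEK_RETRY_USEC and PEEK_MAX_RETRIES are
hypothetical names, and the retry cap stands in for the timeout that the
real tests handle elsewhere.

```c
#define _DEFAULT_SOURCE		/* for usleep() under strict -std= modes */
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define PEEK_RETRY_USEC		1000	/* hypothetical: 1 ms between retries */
#define PEEK_MAX_RETRIES	1000	/* hypothetical cap instead of a real timeout */

/*
 * Peek until at least @expected bytes are visible in a single recv() call.
 * MSG_PEEK does not consume data, so return values are never summed.
 * Returns the last recv() result: <0 on error, 0 on EOF, >= expected on success.
 */
static ssize_t peek_at_least(int fd, void *buf, size_t len, ssize_t expected)
{
	int tries;

	for (tries = 0; tries < PEEK_MAX_RETRIES; tries++) {
		ssize_t ret = recv(fd, buf, len, MSG_PEEK);

		if (ret < 0 && errno == EINTR)
			continue;
		if (ret <= 0 || ret >= expected)
			return ret;

		/* Not enough data yet: wait a little instead of spinning. */
		usleep(PEEK_RETRY_USEC);
	}

	errno = ETIMEDOUT;
	return -1;
}
```

The caller still compares the returned length with what it expects, so
a peek that sees more bytes than requested is reported rather than
hidden.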

Thanks!
Luigi


^ permalink raw reply

* [PATCH net] net: phy: motorcomm: use device properties for firmware tuning
From: chunzhi.lin @ 2026-04-15 13:14 UTC (permalink / raw)
  To: Frank.Sae
  Cc: andrew, hkallweit1, linux, davem, edumazet, kuba, pabeni, netdev,
	linux-kernel, chunzhi.lin, chunzhi.lin

The Motorcomm PHY driver reads optional firmware properties via
of_property_read_*() from phydev->mdio.dev.of_node. This works for
Device Tree based systems, but causes ACPI platforms to ignore the same
properties when they are supplied through _DSD.

As a result, ACPI-described Motorcomm PHY devices fall back to default
settings instead of applying firmware-provided tuning such as
rx/tx internal delay, drive strength, clock output frequency, and
optional boolean controls like auto-sleep-disabled,
keep-pll-enabled, and tx clock inversion.

Switch these lookups to device_property_read_*() so the driver uses the
generic firmware node interface and can consume the same property names
from either Device Tree or ACPI.

This keeps the existing DT behavior unchanged while allowing ACPI
platforms to honor PHY configuration from firmware.

We have tested this on the Sophgo SD3-10, a RISC-V server with a 64-core
Thead C920 CPU whose DWMAC is connected to a Motorcomm YT8531 PHY. The
server supports UEFI boot, and we would like it to use ACPI tables.

Signed-off-by: chunzhi.lin <linchunzhi0@gmail.com>
---
 drivers/net/phy/motorcomm.c | 41 ++++++++++++++++++-------------------
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/drivers/net/phy/motorcomm.c b/drivers/net/phy/motorcomm.c
index 4d62f7b36212..708491bc198a 100644
--- a/drivers/net/phy/motorcomm.c
+++ b/drivers/net/phy/motorcomm.c
@@ -10,7 +10,7 @@
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/phy.h>
-#include <linux/of.h>
+#include <linux/property.h>
 
 #define PHY_ID_YT8511		0x0000010a
 #define PHY_ID_YT8521		0x0000011a
@@ -843,12 +843,12 @@ static u32 ytphy_get_delay_reg_value(struct phy_device *phydev,
 				     u16 *rxc_dly_en,
 				     u32 dflt)
 {
-	struct device_node *node = phydev->mdio.dev.of_node;
+	struct device *dev = &phydev->mdio.dev;
 	int tb_size_half = tb_size / 2;
 	u32 val;
 	int i;
 
-	if (of_property_read_u32(node, prop_name, &val))
+	if (device_property_read_u32(dev, prop_name, &val))
 		goto err_dts_val;
 
 	/* when rxc_dly_en is NULL, it is get the delay for tx, only half of
@@ -996,12 +996,12 @@ static int yt8531_get_ds_map(struct phy_device *phydev, u32 cur)
 
 static int yt8531_set_ds(struct phy_device *phydev)
 {
-	struct device_node *node = phydev->mdio.dev.of_node;
+	struct device *dev = &phydev->mdio.dev;
 	u32 ds_field_low, ds_field_hi, val;
 	int ret, ds;
 
 	/* set rgmii rx clk driver strength */
-	if (!of_property_read_u32(node, "motorcomm,rx-clk-drv-microamp", &val)) {
+	if (!device_property_read_u32(dev, "motorcomm,rx-clk-drv-microamp", &val)) {
 		ds = yt8531_get_ds_map(phydev, val);
 		if (ds < 0)
 			return dev_err_probe(&phydev->mdio.dev, ds,
@@ -1018,7 +1018,7 @@ static int yt8531_set_ds(struct phy_device *phydev)
 		return ret;
 
 	/* set rgmii rx data driver strength */
-	if (!of_property_read_u32(node, "motorcomm,rx-data-drv-microamp", &val)) {
+	if (!device_property_read_u32(dev, "motorcomm,rx-data-drv-microamp", &val)) {
 		ds = yt8531_get_ds_map(phydev, val);
 		if (ds < 0)
 			return dev_err_probe(&phydev->mdio.dev, ds,
@@ -1051,7 +1051,6 @@ static int yt8531_set_ds(struct phy_device *phydev)
  */
 static int yt8521_probe(struct phy_device *phydev)
 {
-	struct device_node *node = phydev->mdio.dev.of_node;
 	struct device *dev = &phydev->mdio.dev;
 	struct yt8521_priv *priv;
 	int chip_config;
@@ -1101,7 +1100,7 @@ static int yt8521_probe(struct phy_device *phydev)
 			return ret;
 	}
 
-	if (of_property_read_u32(node, "motorcomm,clk-out-frequency-hz", &freq))
+	if (device_property_read_u32(dev, "motorcomm,clk-out-frequency-hz", &freq))
 		freq = YTPHY_DTS_OUTPUT_CLK_DIS;
 
 	if (phydev->drv->phy_id == PHY_ID_YT8521) {
@@ -1169,11 +1168,11 @@ static int yt8521_probe(struct phy_device *phydev)
 
 static int yt8531_probe(struct phy_device *phydev)
 {
-	struct device_node *node = phydev->mdio.dev.of_node;
+	struct device *dev = &phydev->mdio.dev;
 	u16 mask, val;
 	u32 freq;
 
-	if (of_property_read_u32(node, "motorcomm,clk-out-frequency-hz", &freq))
+	if (device_property_read_u32(dev, "motorcomm,clk-out-frequency-hz", &freq))
 		freq = YTPHY_DTS_OUTPUT_CLK_DIS;
 
 	switch (freq) {
@@ -1665,7 +1664,7 @@ static int yt8521_resume(struct phy_device *phydev)
  */
 static int yt8521_config_init(struct phy_device *phydev)
 {
-	struct device_node *node = phydev->mdio.dev.of_node;
+	struct device *dev = &phydev->mdio.dev;
 	int old_page;
 	int ret = 0;
 
@@ -1680,7 +1679,7 @@ static int yt8521_config_init(struct phy_device *phydev)
 			goto err_restore_page;
 	}
 
-	if (of_property_read_bool(node, "motorcomm,auto-sleep-disabled")) {
+	if (device_property_read_bool(dev, "motorcomm,auto-sleep-disabled")) {
 		/* disable auto sleep */
 		ret = ytphy_modify_ext(phydev, YT8521_EXTREG_SLEEP_CONTROL1_REG,
 				       YT8521_ESC1R_SLEEP_SW, 0);
@@ -1688,7 +1687,7 @@ static int yt8521_config_init(struct phy_device *phydev)
 			goto err_restore_page;
 	}
 
-	if (of_property_read_bool(node, "motorcomm,keep-pll-enabled")) {
+	if (device_property_read_bool(dev, "motorcomm,keep-pll-enabled")) {
 		/* enable RXC clock when no wire plug */
 		ret = ytphy_modify_ext(phydev, YT8521_CLOCK_GATING_REG,
 				       YT8521_CGR_RX_CLK_EN, 0);
@@ -1801,14 +1800,14 @@ static int yt8521_led_hw_control_get(struct phy_device *phydev, u8 index,
 
 static int yt8531_config_init(struct phy_device *phydev)
 {
-	struct device_node *node = phydev->mdio.dev.of_node;
+	struct device *dev = &phydev->mdio.dev;
 	int ret;
 
 	ret = ytphy_rgmii_clk_delay_config_with_lock(phydev);
 	if (ret < 0)
 		return ret;
 
-	if (of_property_read_bool(node, "motorcomm,auto-sleep-disabled")) {
+	if (device_property_read_bool(dev, "motorcomm,auto-sleep-disabled")) {
 		/* disable auto sleep */
 		ret = ytphy_modify_ext_with_lock(phydev,
 						 YT8521_EXTREG_SLEEP_CONTROL1_REG,
@@ -1817,7 +1816,7 @@ static int yt8531_config_init(struct phy_device *phydev)
 			return ret;
 	}
 
-	if (of_property_read_bool(node, "motorcomm,keep-pll-enabled")) {
+	if (device_property_read_bool(dev, "motorcomm,keep-pll-enabled")) {
 		/* enable RXC clock when no wire plug */
 		ret = ytphy_modify_ext_with_lock(phydev,
 						 YT8521_CLOCK_GATING_REG,
@@ -1844,7 +1843,7 @@ static int yt8531_config_init(struct phy_device *phydev)
  */
 static void yt8531_link_change_notify(struct phy_device *phydev)
 {
-	struct device_node *node = phydev->mdio.dev.of_node;
+	struct device *dev = &phydev->mdio.dev;
 	bool tx_clk_1000_inverted = false;
 	bool tx_clk_100_inverted = false;
 	bool tx_clk_10_inverted = false;
@@ -1852,17 +1851,17 @@ static void yt8531_link_change_notify(struct phy_device *phydev)
 	u16 val = 0;
 	int ret;
 
-	if (of_property_read_bool(node, "motorcomm,tx-clk-adj-enabled"))
+	if (device_property_read_bool(dev, "motorcomm,tx-clk-adj-enabled"))
 		tx_clk_adj_enabled = true;
 
 	if (!tx_clk_adj_enabled)
 		return;
 
-	if (of_property_read_bool(node, "motorcomm,tx-clk-10-inverted"))
+	if (device_property_read_bool(dev, "motorcomm,tx-clk-10-inverted"))
 		tx_clk_10_inverted = true;
-	if (of_property_read_bool(node, "motorcomm,tx-clk-100-inverted"))
+	if (device_property_read_bool(dev, "motorcomm,tx-clk-100-inverted"))
 		tx_clk_100_inverted = true;
-	if (of_property_read_bool(node, "motorcomm,tx-clk-1000-inverted"))
+	if (device_property_read_bool(dev, "motorcomm,tx-clk-1000-inverted"))
 		tx_clk_1000_inverted = true;
 
 	if (phydev->speed < 0)
-- 
2.34.1


^ permalink raw reply related

* Re: rust: net: phy: intent for MAE0621A (out-of-tree C -> Rust), request for target guidance
From: Andrew Lunn @ 2026-04-15 13:20 UTC (permalink / raw)
  To: wenzhaoliao
  Cc: hkallweit1, fujita.tomonori, linux, tmgross, ojeda, netdev,
	rust-for-linux
In-Reply-To: <AFkAcAA-KLHz8L2oAyS3qqrb.1.1776247198201.Hmail.2023000929@ruc.edu.cn>

On Wed, Apr 15, 2026 at 05:59:58PM +0800, wenzhaoliao wrote:
> 
> Hello PHY and Rust maintainers,
> 
> 
> I am a PhD student working on a C-to-Rust migration tool for systems code.
> We would like to validate it in Linux with one concrete PHY target and would
> like to confirm direction before posting a larger RFC series.
> 
> 
> Scope of this intent:
> - Initial target: MAE0621A (currently out-of-tree C driver).
> - We do NOT intend to submit a duplicate Rust rewrite of an existing in-tree C PHY driver.
> - Goal: evaluate a semi-automatic abstraction completion workflow:
>   reuse existing Rust PHY abstractions where possible, and add only minimal missing abstractions.
> 
> 
> Planned deliverables:
> - A gap analysis between MAE0621A C callbacks and current rust/kernel/net/phy.rs coverage.
> - A small RFC patch series with minimal abstraction additions (if needed).
> - A MAE0621A Rust driver prototype on top of those abstractions for linux-next/rust-next evaluation.

When done correctly, this sounds reasonable. However, i do have some
further questions.

Do you have hardware? What board do you intend to test this on? Does
the board boot using Mainline?

Do you have the datasheet?

What out of tree C driver do you intend to start from. I had a quick
look around and the first one i found is:

https://github.com/CoreELEC/linux-amlogic/blob/amlogic-5.4.210/drivers/net/phy/maxio.c

As is often the case of an out of tree driver, it is not up to the
quality of a Mainline driver. Doing a tool based C to Rust migration
based on this code will just give you a poor quality Rust driver,
which will not be accepted. Do you have the knowledge to fix all the
issues?

Maybe you can tell us what C driver you plan to use, and do a
review of it, listing all the issues you see with it and what needs
fixing. That will give us an idea if you can produce a Mainline
quality driver.

	Andrew

^ permalink raw reply

* RE: [PATCH v5 net-next 0/8] dpll/ice: Add TXC DPLL type and full TX reference clock control for E825
From: Kubalewski, Arkadiusz @ 2026-04-15 13:23 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Nitka, Grzegorz, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	Oros, Petr, richardcochran@gmail.com, andrew+netdev@lunn.ch,
	Kitszel, Przemyslaw, Nguyen, Anthony L,
	Prathosh.Satish@microchip.com, Vecera, Ivan, jiri@resnulli.us,
	vadim.fedorenko@linux.dev, donald.hunter@gmail.com,
	horms@kernel.org, pabeni@redhat.com, davem@davemloft.net,
	edumazet@google.com
In-Reply-To: <20260414145835.07fbe355@kernel.org>

>From: Jakub Kicinski <kuba@kernel.org>
>Sent: Tuesday, April 14, 2026 11:59 PM
>
>On Mon, 13 Apr 2026 08:19:30 +0000 Kubalewski, Arkadiusz wrote:
>>> My concern is that I think this is a pretty run of the mill SyncE
>>> design. If we need to pretend we have two DPLLs here if we really
>>> only have one and a mux - then our APIs are mis-designed :(
>>
>> Well, the true is that we did not anticipated per-port control of the
>> TX clock source, as a single DPLL device could drive multiple of such.
>>
>> This is not true, that we pretend there is a second PLL - there is a
>> PLL on each TX clock, maybe not a full DPLL, but still the loop with
>> a control over it's sources is there and it has the same 2 external
>> sources + default XO.
>
>Don't we put that MAC PLL into bypass mode if we feed a clock from
>the EEC DPLL?

This HW doesn't use the EEC DPLL signal to feed the MAC clock, as the
DPLL is external from the NIC's point of view. Only 2 signals from such
an external DPLL device are used by the NIC:
- synce (a single source for all of the per-port TXC DPLL devices)
- time_ref (a source for the TS_PLL, which drives the PTP timer)

Grzegorz is now working on submitting the patches for the latter one.

>
>> A mentioned try of adding per port MUX-type pin, just to give some
>> control to the user, is where we wanted to simplify things, but in the
>> end the API would have to be modified in significant way, various paths
>> related to pin registration and keeping correct references, just to
>> make working case for the pin_on_pin_register and it's internals. We
>> decided that the burden and impact for existing design was to high.
>>
>> And that is why the TXC approach emerged, the change of DPLL is minimal,
>> The model is still correct from user perspective, SyncE SW controller
>> shall anticipate possibility that per-port TXC dpll is there
>
>We are starting to push into what was previously the domain of
>drivers/clk, tho. IIUC the "ASIC PLL"s are usually integrated with
>clock dividers. And cannot be "configured" after chip init / async
>reset (which is why I presume you whack a reset in patch 7?).

Well, we need the CGU divider change for frequency compliance with lower
link speeds, and the link reset is required as part of the tx-clk switch
and link establishment on the new clock.

>
>> This particular device and driver doesn't implement any EEC-type DPLL
>> device, the one could think that we can just change the type here and
>> use EEC type instead of new one TXC - since we share pins from external
>> dpll driver, which is EEC type, and our DPLL device would have
>> different clock_id and module. But, further designs, where a single NIC
>> is having control over both a EEC DPLL and ability to control each
>> source per-port this would be problematic. At least one NIC Port driver
>> would have to have 2 EEC-type DPLLs leaving user with extra confusion.
>
>The distinction between TXC and EEC dpll is confusing.
>I thought EEC one _was_supposed_to_ drive the Tx clock?
>What PPS means is obvious, what EEC means if not driving Tx clock is
>unclear to me..
>

Yes, correct, the EEC DPLL's main task would be to drive the TX clocks of
NIC ports, but if there is per-port control, something extra is required.

>Let me summarize my concerns - we need to navigate the split between
>drivers/clk and dpll. We need a distinction on what goes where, because
>every ASIC has a bunch of PLLs which until now have been controlled by
>device tree (if at all). If the main question we want to answer is
>"which clock ref is used to drive internal clock" all we need is a MUX.
>If we want to make dpll cover also ASIC PLLs for platforms without
>device tree we need a more generic name than TXC, IMHO.

Well, a 'floating' MUX-type pin not connected to any dpll would require a
lot of additional implementation just to allow source selection, as we
have already tried.

Wouldn't a more generic name cause a problem with the DPLL's purpose?
We still want to make sure that a given DPLL device serves the role
of source selection for a particular port, where a source pin should be
an output of either an EEC dpll or some external signal generator that is
somehow related to SyncE or similar solutions.

Thanks,
Arkadiusz

^ permalink raw reply

* Re: [PATCH iwl-net 2/5] iavf: fix error path in iavf_request_misc_irq
From: Simon Horman @ 2026-04-15 13:26 UTC (permalink / raw)
  To: aleksandr.loktionov
  Cc: 'Simon Horman', intel-wired-lan, anthony.l.nguyen, netdev,
	kiran.patil
In-Reply-To: <20260413073035.4082204-3-aleksandr.loktionov@intel.com>

From: 'Simon Horman' <horms@kernel.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
iavf: fix error path in iavf_request_misc_irq

This patch drops the free_irq() call from the error path of
iavf_request_misc_irq() when request_irq() fails. This is intended to
prevent a kernel warning when freeing an unrequested vector.

> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
> index dad001abc9086..ab5f5adc2d3c2 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_main.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
> @@ -587,7 +587,6 @@ static int iavf_request_misc_irq(struct iavf_adapter *adapter)
>  		dev_err(&adapter->pdev->dev,
>  			"request_irq for %s failed: %d\n",
>  			adapter->misc_vector_name, err);
> -		free_irq(adapter->msix_entries[0].vector, netdev);

Does this patch completely fix the warning?

If request_irq() fails, iavf_request_misc_irq() returns an error but leaves
adapter->msix_entries allocated.

In caller error handling paths like iavf_disable_vf() which clean up after a
failure, iavf_free_misc_irq() is called:

static void iavf_free_misc_irq(struct iavf_adapter *adapter)
{
	struct net_device *netdev = adapter->netdev;

	if (!adapter->msix_entries)
		return;

	free_irq(adapter->msix_entries[0].vector, netdev);
}

Since adapter->msix_entries is still allocated, the check passes and free_irq()
is called on the unrequested vector anyway, triggering the exact same warning.

Does the driver need to explicitly track whether the IRQ was successfully
requested?
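
One possible shape of such tracking is sketched below as a userspace
model. It is an illustration only: struct sketch_adapter, the
misc_irq_requested field and both helpers are hypothetical and do not
exist in the driver, and the counter merely stands in for the real
free_irq() call.

```c
#include <stdbool.h>

/* Hypothetical model of the adapter state relevant to the misc IRQ. */
struct sketch_adapter {
	bool misc_irq_requested;	/* set only after a successful request */
	int free_calls;			/* counts simulated free_irq() calls */
};

/* @request_result models the return value of request_irq(). */
static int sketch_request_misc_irq(struct sketch_adapter *a, int request_result)
{
	if (request_result)
		return request_result;	/* nothing was requested, nothing to undo */

	a->misc_irq_requested = true;
	return 0;
}

/* Free the IRQ only if it was actually requested, and only once. */
static void sketch_free_misc_irq(struct sketch_adapter *a)
{
	if (!a->misc_irq_requested)
		return;

	a->free_calls++;		/* stands in for free_irq() */
	a->misc_irq_requested = false;
}
```

With a flag like this, a later cleanup path can call the free helper
unconditionally without freeing a vector that was never requested.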

>  	}
>  	return err;
>  }

^ permalink raw reply

* Re: [PATCH iwl-net 3/5] iavf: prevent VSI corruption when ring params changed during reset
From: Simon Horman @ 2026-04-15 13:28 UTC (permalink / raw)
  To: aleksandr.loktionov
  Cc: 'Simon Horman', intel-wired-lan, anthony.l.nguyen, netdev,
	kiran.patil
In-Reply-To: <20260413073035.4082204-4-aleksandr.loktionov@intel.com>

From: 'Simon Horman' <horms@kernel.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
iavf: prevent VSI corruption when ring params changed during reset

Changing ring parameters via ethtool triggers a VF reset and queue
reconfiguration. If ethtool is called again before the first reset
completes, the second reset races with uninitialised queue state and
can corrupt the VSI resource tree on the PF side. Return -EAGAIN from
iavf_set_ringparam() when the adapter is already resetting or its queues
are disabled.

> diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
> index 1cd1f3f2930a0..390913170315f 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
> @@ -495,6 +495,11 @@ static int iavf_set_ringparam(struct net_device *netdev,
>  	if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending))
>  		return -EINVAL;
>  
> +	if (adapter->state == __IAVF_RESETTING ||
> +	    (adapter->state == __IAVF_RUNNING &&
> +	     adapter->flags & IAVF_FLAG_QUEUES_DISABLED))
> +		return -EAGAIN;
> +

Is there a lockless Time-Of-Check to Time-Of-Use race here?

iavf_set_ringparam() is called from ethtool under rtnl_lock(), but the
background iavf_reset_task() synchronizes using netdev_lock(). Because iavf
does not set netdev->request_ops_lock = true, the ethtool core does not
acquire netdev_lock() for iavf_set_ringparam().

This allows adapter->state to transition to __IAVF_RESETTING immediately
after this lockless check.

Furthermore, calling iavf_reset_step() later in this function without
holding netdev_lock() allows the hardware reset sequence to execute
concurrently from both the ethtool thread and the workqueue, which could
lead to VSI resource corruption.

Could this open-coded state check miss pending resets?

If a reset is scheduled (e.g., by a watchdog timeout setting
IAVF_FLAG_RESET_NEEDED) but the worker task hasn't started yet,
adapter->state will still be __IAVF_RUNNING. This check will pass, and
iavf_set_ringparam() will proceed to mutate ring parameters and execute
iavf_reset_step(), racing with the previously scheduled iavf_reset_task.

Can this be addressed by also checking for IAVF_FLAG_RESET_NEEDED and
IAVF_FLAG_RESET_PENDING?

Also, the commit message notes that triggering a VF reset before the first
one completes causes VSI resource tree corruption. Both iavf_set_channels()
and iavf_change_mtu() also dynamically reconfigure queues and
unconditionally trigger resets by calling iavf_reset_step(adapter).

Should these functions be updated to include a similar state check to
prevent the same VSI resource tree corruption on the PF side?
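
To make the suggestion concrete, here is a minimal userspace sketch of a
single guard that also covers scheduled resets and that every
reconfiguration path could share. Everything in it is an assumption made
for illustration: the toy_* names and bit values only mirror the driver's
__IAVF_RESETTING, __IAVF_RUNNING, IAVF_FLAG_QUEUES_DISABLED,
IAVF_FLAG_RESET_NEEDED and IAVF_FLAG_RESET_PENDING, and are not the real
definitions. It models the check only and does nothing about the locking
question raised above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder model: names and bit values are illustrative, not the driver's. */
enum toy_state { TOY_RUNNING, TOY_RESETTING };

#define TOY_FLAG_QUEUES_DISABLED (1u << 0)
#define TOY_FLAG_RESET_NEEDED    (1u << 1)
#define TOY_FLAG_RESET_PENDING   (1u << 2)

struct toy_adapter {
	enum toy_state state;
	unsigned int flags;
};

/* True if a reset is running, scheduled, or has left the queues disabled. */
static bool toy_reset_in_flight(const struct toy_adapter *a)
{
	assert(a != NULL);

	if (a->state == TOY_RESETTING)
		return true;
	if (a->flags & (TOY_FLAG_RESET_NEEDED | TOY_FLAG_RESET_PENDING))
		return true;
	return a->state == TOY_RUNNING &&
	       (a->flags & TOY_FLAG_QUEUES_DISABLED);
}

/* Shared guard that each reconfiguration entry point could call first. */
static int toy_reconfig_guard(const struct toy_adapter *a)
{
	return toy_reset_in_flight(a) ? -EAGAIN : 0;
}
```

Keeping the predicate in one helper would let the ring parameter, channel
count and MTU paths reject the request in the same way, but the check is
only meaningful if it runs under the same lock as the reset task.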

>  	if (ring->tx_pending > IAVF_MAX_TXD ||
>  	    ring->tx_pending < IAVF_MIN_TXD ||
>  	    ring->rx_pending > IAVF_MAX_RXD ||

^ permalink raw reply

* [linus:master] [selftest]  400e658aa0: kernel-selftests-bpf.net.tun.fail
From: kernel test robot @ 2026-04-15 13:36 UTC (permalink / raw)
  To: Xu Du; +Cc: oe-lkp, lkp, linux-kernel, Jakub Kicinski, netdev, oliver.sang



Hello,

kernel test robot noticed "kernel-selftests-bpf.net.tun.fail" on:

commit: 400e658aa096cda99b37ce806ed63cfe894c9566 ("selftest: tun: Add test for sending gso packet into tun")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: kernel-selftests-bpf
version: 
with following parameters:

	group: net


config: x86_64-rhel-9.4-bpf
compiler: gcc-14
test machine: 16 threads Intel(R) Core(TM) i7-13620H (Raptor Lake) with 32G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202604151525.8305306f-lkp@intel.com


# timeout set to 3600
# selftests: net: tun
# TAP version 13
# 1..9
# # Starting 9 tests from 5 test cases.
# #  RUN           tun.delete_detach_close ...
# #            OK  tun.delete_detach_close
# ok 1 tun.delete_detach_close
# #  RUN           tun.detach_delete_close ...
# #            OK  tun.detach_delete_close
# ok 2 tun.detach_delete_close
# #  RUN           tun.detach_close_delete ...
# #            OK  tun.detach_close_delete
# ok 3 tun.detach_close_delete
# #  RUN           tun.reattach_delete_close ...
# #            OK  tun.reattach_delete_close
# ok 4 tun.reattach_delete_close
# #  RUN           tun.reattach_close_delete ...
# #            OK  tun.reattach_close_delete
# ok 5 tun.reattach_close_delete
# #  RUN           tun_vnet_udptnl.4in4_1mss.send_gso_packet ...
# #            OK  tun_vnet_udptnl.4in4_1mss.send_gso_packet
# ok 6 tun_vnet_udptnl.4in4_1mss.send_gso_packet
# #  RUN           tun_vnet_udptnl.6in4_1mss.send_gso_packet ...
# # tun.c:679:send_gso_packet:Expected ret (0) == variant->data_size (1402)
# # send_gso_packet: Test terminated by assertion
# #          FAIL  tun_vnet_udptnl.6in4_1mss.send_gso_packet
# not ok 7 tun_vnet_udptnl.6in4_1mss.send_gso_packet
# #  RUN           tun_vnet_udptnl.4in6_1mss.send_gso_packet ...
# # tun.c:679:send_gso_packet:Expected ret (0) == variant->data_size (1402)
# # send_gso_packet: Test terminated by assertion
# #          FAIL  tun_vnet_udptnl.4in6_1mss.send_gso_packet
# not ok 8 tun_vnet_udptnl.4in6_1mss.send_gso_packet
# #  RUN           tun_vnet_udptnl.6in6_1mss.send_gso_packet ...
# # tun.c:679:send_gso_packet:Expected ret (0) == variant->data_size (1382)
# # send_gso_packet: Test terminated by assertion
# #          FAIL  tun_vnet_udptnl.6in6_1mss.send_gso_packet
# not ok 9 tun_vnet_udptnl.6in6_1mss.send_gso_packet
# # FAILED: 6 / 9 tests passed.
# # Totals: pass:6 fail:3 xfail:0 xpass:0 skip:0 error:0
not ok 19 selftests: net: tun # exit=1



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260415/202604151525.8305306f-lkp@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply

* Re: [PATCH net] net: phy: motorcomm: use device properties for firmware tuning
From: Andrew Lunn @ 2026-04-15 13:37 UTC (permalink / raw)
  To: chunzhi.lin
  Cc: Frank.Sae, hkallweit1, linux, davem, edumazet, kuba, pabeni,
	netdev, linux-kernel, chunzhi.lin
In-Reply-To: <20260415131452.3492671-1-linchunzhi0@gmail.com>

On Wed, Apr 15, 2026 at 09:14:52PM +0800, chunzhi.lin wrote:
> The Motorcomm PHY driver reads optional firmware properties via
> of_property_read_*() from phydev->mdio.dev.of_node. This works for
> Device Tree based systems, but causes ACPI platforms to ignore the same
> properties when they are supplied through _DSD.
> 
> As a result, ACPI-described Motorcomm PHY devices fall back to default
> settings instead of applying firmware-provided tuning such as
> rx/tx internal delay, drive strength, clock output frequency, and
> optional boolean controls like auto-sleep-disabled,
> keep-pll-enabled, and tx clock inversion.
> 
> Switch these lookups to device_property_read_*() so the driver uses the
> generic firmware node interface and can consume the same property names
> from either Device Tree or ACPI.
> 
> This keeps the existing DT behavior unchanged while allowing ACPI
> platforms to honor PHY configuration from firmware.

Please document the new ACPI binding in
Documentation/firmware-guide/acpi/dsd and Cc: the ACPI list so they
can review the binding, same as a DT binding would be reviewed.

The Subject line is wrong. This patch is for net-next. Please read

https://www.kernel.org/doc/html/latest/process/maintainer-netdev.html

and since the merge window is open at the moment, you will need to
wait two weeks before resubmitting.

    Andrew

---
pw-bot: cr

^ permalink raw reply

* Re: [PATCH v2] net: wwan: t7xx: validate port_count against message length in t7xx_port_enum_msg_handler
From: kernel test robot @ 2026-04-15 13:37 UTC (permalink / raw)
  To: Pavitra Jha, pabeni
  Cc: oe-kbuild-all, w, chandrashekar.devegowda, linux-wwan, netdev,
	stable, Pavitra Jha
In-Reply-To: <20260414153201.1633720-1-jhapavitra98@gmail.com>

Hi Pavitra,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net/main]
[also build test WARNING on net-next/main linus/master v7.0 next-20260415]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Pavitra-Jha/net-wwan-t7xx-validate-port_count-against-message-length-in-t7xx_port_enum_msg_handler/20260415-014321
base:   net/main
patch link:    https://lore.kernel.org/r/20260414153201.1633720-1-jhapavitra98%40gmail.com
patch subject: [PATCH v2] net: wwan: t7xx: validate port_count against message length in t7xx_port_enum_msg_handler
config: x86_64-rhel-9.4 (https://download.01.org/0day-ci/archive/20260415/202604151531.ClMVCCxv-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260415/202604151531.ClMVCCxv-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202604151531.ClMVCCxv-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> Warning: drivers/net/wwan/t7xx/t7xx_port_ctrl_msg.c:127 function parameter 'msg_len' not described in 't7xx_port_enum_msg_handler'

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* Re: [BUG] KASAN: slab-use-after-free in sctp_addto_chunk
From: Xin Long @ 2026-04-15 13:40 UTC (permalink / raw)
  To: 许东洁
  Cc: marcelo.leitner, linux-sctp, netdev, zhaoruilin22
In-Reply-To: <7e897c44.4cb35.19d8f29411c.Coremail.xudongjie25@mails.ucas.ac.cn>

On Tue, Apr 14, 2026 at 11:23 PM 许东洁 <xudongjie25@mails.ucas.ac.cn> wrote:
>
> Hi,
>
> While running fuzzing tests on 6.19.0-rc5, we hit a slab-use-after-free in the SCTP module. The crash occurs in skb_put_data() when processing an incoming chunk and appending data via sctp_addto_chunk().
>
> Looking at the trace and the code, it seems to be an skb reallocation issue. In sctp_sf_beat_8_3(), a pointer to the payload is extracted from the incoming chunk's skb. Later, a pull operation (e.g., pskb_pull) might trigger pskb_expand_head(), which frees the original skb->head and reallocates a larger one. However, the previously extracted payload pointer becomes dangling but is still passed down to sctp_make_heartbeat_ack(), eventually being read by memcpy() in skb_put_data().
>
Hi, Dongjie,

Normally this shouldn't happen, as all incoming skbs must have already
been linearized in sctp_rcv() before coming to sctp_sf_beat_8_3().
For a linearized skb, pskb_pull() will not trigger the skb
reallocation, but only reduce skb->len and advance skb->data.

Do you have a reproducer to trigger this issue? We need to check how a
non-linearized skb arrives in sctp_sf_beat_8_3().
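
For reference, the pattern under discussion can be modelled in userspace.
This is only a sketch under stated assumptions: struct toy_buf, toy_pull()
and toy_payload_after_pull() are hypothetical stand-ins, not struct sk_buff
or any kernel helper, and the may_move flag simply represents any pull that
ends up reallocating the head. With may_move clear, the pull only advances
the data offset and shrinks the length, as described above for a linear skb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a packet buffer; NOT the kernel's struct sk_buff. */
struct toy_buf {
	unsigned char *head; /* start of the allocation */
	size_t data_off;     /* offset of the current data within head */
	size_t len;          /* bytes remaining from data_off */
};

/*
 * Hypothetical pull: consume n bytes from the front. When may_move is set,
 * the storage is reallocated first, which invalidates old pointers into it.
 */
static int toy_pull(struct toy_buf *b, size_t n, int may_move)
{
	if (n > b->len)
		return -1;
	if (may_move) {
		size_t total = b->data_off + b->len;
		unsigned char *fresh = malloc(total ? total : 1);

		if (!fresh)
			return -1;
		memcpy(fresh, b->head, total);
		free(b->head);
		b->head = fresh;
	}
	b->data_off += n;
	b->len -= n;
	return 0;
}

/* Pull the header, then derive the payload pointer from the current head. */
static const unsigned char *toy_payload_after_pull(struct toy_buf *b,
						   size_t hdr_len, int may_move)
{
	assert(b != NULL && b->head != NULL);

	if (toy_pull(b, hdr_len, may_move) < 0)
		return NULL;
	return b->head + b->data_off; /* computed after the pull, never before */
}
```

Taking the pointer only after the pull is safe in both cases, whereas a
pointer saved before the pull is valid only as long as the head never moves.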

Thanks.

> It seems we need to either ensure pull operations are completed before taking the payload pointer, or recalculate the pointer immediately after the pull.
>
> We haven't prepared a patch for this yet, but we are glad to help test any proposed fixes.
>
> Crash log, call trace, and machine info are as follows:
>
> [Machine Info]
> QEMU emulator version 6.2.0
> CPU: Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz (4 cores)
> Kernel Version: 6.19.0-rc5-00042-g944aacb68baf
>
> [Crash Report & Call Trace]
> BUG: KASAN: slab-use-after-free in skb_put_data include/linux/skbuff.h:2800 [inline]
> BUG: KASAN: slab-use-after-free in sctp_addto_chunk+0xfa/0x2a0 net/sctp/sm_make_chunk.c:1535
> Read of size 56 at addr ffff88804878bb68 by task syz.6.114/15386
>
> CPU: 3 UID: 0 PID: 15386 Comm: syz.6.114 Not tainted 6.19.0-rc5-00042-g944aacb68baf #1 PREEMPT(full)
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:94 [inline]
> dump_stack_lvl+0x116/0x1b0 lib/dump_stack.c:120
> print_address_description mm/kasan/report.c:378 [inline]
> print_report+0xca/0x5f0 mm/kasan/report.c:482
> kasan_report+0xca/0x100 mm/kasan/report.c:595
> check_region_inline mm/kasan/generic.c:194 [inline]
> kasan_check_range+0x39/0x1c0 mm/kasan/generic.c:200
> __asan_memcpy+0x24/0x60 mm/kasan/shadow.c:105
> skb_put_data include/linux/skbuff.h:2800 [inline]
> sctp_addto_chunk+0xfa/0x2a0 net/sctp/sm_make_chunk.c:1535
> sctp_make_heartbeat_ack+0x54/0x110 net/sctp/sm_make_chunk.c:1198
> sctp_sf_beat_8_3+0x4f6/0x7a0 net/sctp/sm_statefuns.c:1201
> sctp_do_sm+0x172/0x5520 net/sctp/sm_sideeffect.c:1172
> sctp_assoc_bh_rcv+0x38a/0x6c0 net/sctp/associola.c:1034
> sctp_inq_push+0x1dc/0x270 net/sctp/inqueue.c:88
> sctp_backlog_rcv+0x167/0x5a0 net/sctp/input.c:331
> sk_backlog_rcv include/net/sock.h:1177 [inline]
> __release_sock+0x397/0x430 net/core/sock.c:3213
> release_sock+0x5a/0x220 net/core/sock.c:3795
> ...
> </TASK>
>
> Freed by task 15386:
> kasan_save_stack+0x24/0x50 mm/kasan/common.c:57
> kasan_save_track+0x14/0x30 mm/kasan/common.c:78
> kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:584
> poison_slab_object mm/kasan/common.c:253 [inline]
> __kasan_slab_free+0x61/0x80 mm/kasan/common.c:285
> kasan_slab_free include/linux/kasan.h:235 [inline]
> slab_free mm/slub.c:6670 [inline]
> kmem_cache_free+0x15f/0x780 mm/slub.c:6781
> skb_kfree_head net/core/skbuff.c:1066 [inline]
> skb_free_head+0x1b7/0x210 net/core/skbuff.c:1080
> pskb_expand_head+0x3b1/0xf80 net/core/skbuff.c:2314
> skb_might_realloc+0xb1/0xd0 net/core/skb_fault_injection.c:33
> pskb_may_pull_reason include/linux/skbuff.h:2850 [inline]
> pskb_pull include/linux/skbuff.h:2871 [inline]
> sctp_sf_beat_8_3+0x419/0x7a0 net/sctp/sm_statefuns.c:1198
> ...
>
> Xu Dongjie
> University of Chinese Academy of Sciences

^ permalink raw reply
