Netdev List
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Breno Leitao @ 2026-06-16 16:32 UTC
  To: Sebastian Andrzej Siewior, john.ogness, pmladek
  Cc: Jakub Kicinski, Petr Mladek, John Ogness, Sergey Senozhatsky,
	Peter Zijlstra, Vlad Poenaru, Thomas Gleixner, netdev,
	David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel,
	stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot,
	Dietmar Eggemann, K Prateek Nayak
In-Reply-To: <20260616103529.Yh9Dxsjp@linutronix.de>

On Tue, Jun 16, 2026 at 12:35:29PM +0200, Sebastian Andrzej Siewior wrote:
> On 2026-06-11 19:11:14 [-0700], Jakub Kicinski wrote:
> > On Wed, 10 Jun 2026 11:36:21 -0700 Vlad Poenaru wrote:
> > > @@ -194,11 +194,56 @@ void netpoll_poll_dev(struct net_device *dev)
> > > +	local_bh_disable();
> > > + 	poll_napi(dev);
> > > +	_local_bh_enable();
> >
> > tglx, Sebastian, are you okay with using _local_bh_enable() to trick
> > softirq into not waking ksoftirqd? The problematic path is:
> >
> >   scheduler -> printk -> netconsole -> raise softirq -> scheduler (deadlock)
> >
> > so the softirq may never get serviced.
> >
> > In netcons we try to avoid touching the network driver if the Tx path
> > locks are already held. Ideally we'd do something similar with the
> > scheduler. Try to do bare minimum if we may be in the scheduler.
> > Failing that - don't poll the driver if we were called with irqs
> > already disabled.
> >
> > Or maybe we only poll from console->write_thread ?
>
> So this is not an issue since commit 7eab73b18630e ("netconsole: convert
> to NBCON console infrastructure"). Because from here now on writes are
> deferred to the nbcon thread. So this purely about -stable in this case.

Are writes deferred to the nbcon thread even for consoles that support
atomic operations?

netconsole is marked with CON_NBCON_ATOMIC_UNSAFE, which means it rarely
performs inline/direct printk and instead pushes to the thread, which
flushes in a safe context.

For drivers that behave correctly, I'd like to be able to drop
CON_NBCON_ATOMIC_UNSAFE, potentially setting it at runtime based on the
underlying driver capabilities. If netconsole is backed by a well-behaved
network driver, we could eventually remove the flag (!?)

Would that approach cause any issues?


* Re: [PATCH RFC 3/9] net: stmmac: qcom-ethqos: fix RGMII_ID mode to use DLL bypass
From: Mohd Ayaan Anwar @ 2026-06-16 16:32 UTC
  To: Andrew Lunn
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Richard Cochran, Bjorn Andersson, Konrad Dybcio, Maxime Coquelin,
	Alexandre Torgue, Russell King, linux-arm-msm, netdev, devicetree,
	linux-kernel, linux-stm32, linux-arm-kernel
In-Reply-To: <82705420-771d-41bf-a4d9-ed94dff86ff0@lunn.ch>

On Mon, Jun 15, 2026 at 06:48:55PM +0200, Andrew Lunn wrote:
> > > I'm curious how this works at the moment? Do no boards make use of
> > > RGMII ID? Are all current boards broken?
> > 
> > Searching through the DTS, I found that we have two boards using "rgmii"
> > (qcs404-evb-4000.dts and sa8155-adp.dts) and another board using
> > "rgmii-txid" (sa8540p-ride.dts). No board which uses RGMII ID.
> 
> So this causes problems. We cannot break existing boards, yet it would
> be good to fix the current broken behaviour.

I am trying to track down the sa8155-adp and sa8540p-ride boards. The
EMAC on QCS404 is extremely similar to QCS615 Ride [0], and I got that
board to work with this series (with RGMII ID mode). So I am fairly
confident that QCS404 would not break (if it's even booting up with the
upstream kernel currently). Also, I think we could change the phy-mode
for QCS404 to "rgmii-id" from "rgmii" if these fixes go in.

> It could be the best way forward is that you issue a warning when
> "rgmii" is found and pass rgmii-id to the PHY. And you also change the
> two boards to use rgmii-id. Lets think about the rgmii-txid case once
> we better understand it.
> 

As Konrad mentioned, it would be great to know if we can test out these
boards. Looking at the different versions of the ETHQOS programming
guide, stopping MAC side delay should be as simple as what we are doing
in this commit. But whether the two boards work directly with the
default PHY delays is unknown.

	Ayaan

[0] The proposed RGMII fixes would help enable ethernet on QCS615 Ride
as well. I see that the original series had a lot of issues:
https://lore.kernel.org/all/20250121-dts_qcs615-v3-0-fa4496950d8a@quicinc.com/


* Re: [PATCH net] appletalk: fix use-after-free in atalk_find_primary()
From: Simon Horman @ 2026-06-16 16:34 UTC
  To: Yizhou Zhao
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Kees Cook, Kito Xu, linux-kernel, Yuxiang Yang,
	Ao Wang, Xuewei Feng, Qi Li, Ke Xu, stable
In-Reply-To: <20260615103930.1484-1-zhaoyz24@mails.tsinghua.edu.cn>

On Mon, Jun 15, 2026 at 06:39:28PM +0800, Yizhou Zhao wrote:
> atalk_find_primary() walks the global AppleTalk interface list under
> atalk_interfaces_lock, but returns a pointer to iface->address after
> dropping that lock.  Both atalk_autobind() and atalk_bind() then
> dereference the returned pointer without any lifetime protection.
> 
> The interface can be removed concurrently through the normal AppleTalk
> interface ioctl path.  SIOCATALKDIFADDR calls atalk_dev_down(), which
> eventually reaches atif_drop_device() and frees the same struct
> atalk_iface that owns the returned address field.  A racing bind can
> therefore read from freed memory.
> 
> This is reachable with a configured AppleTalk interface; reproducing the
> race does not require a malicious device or driver.  The configuration
> ioctls require CAP_NET_ADMIN in the initial user namespace, and
> AF_APPLETALK sockets are limited to init_net.
> 
> Fix the lifetime issue without changing the returned address pointer
> type.  Rename the helper to atalk_find_primary_locked() and keep
> atalk_interfaces_lock held across the return.  The callers now copy
> s_net and s_node while the lock is still held, then immediately release
> the lock before doing any further work.
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Cc: stable@vger.kernel.org
> Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
> Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn>
> Reported-by: Ao Wang <wangao@seu.edu.cn>
> Reported-by: Xuewei Feng <fengxw06@126.com>
> Reported-by: Qi Li <qli01@tsinghua.edu.cn>
> Reported-by: Ke Xu <xuke@tsinghua.edu.cn>
> Assisted-by: GLM:GLM-5.1
> Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
> ---
> diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
> index 30a6dc06291c..4d6576cd0ae8 100644
> --- a/net/appletalk/ddp.c
> +++ b/net/appletalk/ddp.c
> @@ -351,7 +351,7 @@ struct atalk_addr *atalk_find_dev_addr(struct net_device *dev)
>  	return iface ? &iface->address : NULL;
>  }
>  

A kernel doc for atalk_find_dev_addr which describes the locking
expectations is probably warranted here.

> -static struct atalk_addr *atalk_find_primary(void)
> +static struct atalk_addr *atalk_find_primary_locked(void)
>  {
>  	struct atalk_iface *fiface = NULL;
>  	struct atalk_addr *retval;
> @@ -378,7 +378,6 @@ static struct atalk_addr *atalk_find_primary(void)
>  	else
>  		retval = NULL;
>  out:
> -	read_unlock_bh(&atalk_interfaces_lock);

This function still acquires atalk_interfaces_lock but I don't think that
asymmetry is justified. If the critical section needs to be expanded then I
think it would be best to both acquire and release the lock in the caller.

>  	return retval;
>  }
>  
> @@ -1132,20 +1131,24 @@ static int atalk_autobind(struct sock *sk)
>  {
>  	struct atalk_sock *at = at_sk(sk);
>  	struct sockaddr_at sat;
> -	struct atalk_addr *ap = atalk_find_primary();
> +	struct atalk_addr *ap = atalk_find_primary_locked();
>  	int n = -EADDRNOTAVAIL;

We could take this opportunity to move towards reverse xmas tree here.

>  
>  	if (!ap || ap->s_net == htons(ATADDR_ANYNET))
> -		goto out;
> +		goto unlock_and_out;
>  
>  	at->src_net  = sat.sat_addr.s_net  = ap->s_net;
>  	at->src_node = sat.sat_addr.s_node = ap->s_node;
> +	read_unlock_bh(&atalk_interfaces_lock);

The unlock_and_out label applies to the critical section which ends here.
But in my mind the goto construct is best used for handling errors 
that apply to, and in general accumulate during, the flow of a function.

Combining that with my earlier comments, I would go for something like the
following (completely untested!). Similarly in atalk_bind().

	struct atalk_sock *at = at_sk(sk);
	struct sockaddr_at sat;
	int n = -EADDRNOTAVAIL;
	struct atalk_addr *ap;

	read_lock_bh(&atalk_interfaces_lock);
	ap = atalk_find_primary_locked();

	if (ap && ap->s_net != htons(ATADDR_ANYNET)) {
		at->src_net  = sat.sat_addr.s_net  = ap->s_net;
		at->src_node = sat.sat_addr.s_node = ap->s_node;
	}

	read_unlock_bh(&atalk_interfaces_lock);
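
The same "copy under the lock, then release" shape can be tried out in
userspace. The following is only a minimal sketch under stated assumptions:
every demo_* name is hypothetical, a C11 atomic_flag spinlock stands in for
read_lock_bh()/read_unlock_bh(), and the lookup simply returns the first list
entry rather than mirroring the real selection logic in ddp.c. The helper
returns a flag so that a caller could keep its existing -EADDRNOTAVAIL path
when no usable address is found.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for struct atalk_addr / atalk_iface. */
struct demo_addr {
	uint16_t s_net;
	uint8_t  s_node;
};

struct demo_iface {
	struct demo_addr address;
	struct demo_iface *next;
};

#define DEMO_ANYNET 0	/* assumption: plays the role of ATADDR_ANYNET */

/* Spinlock standing in for atalk_interfaces_lock (assumption). */
static atomic_flag demo_ifaces_lock = ATOMIC_FLAG_INIT;
static struct demo_iface *demo_ifaces;	/* protected by demo_ifaces_lock */

static void demo_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&demo_ifaces_lock,
						 memory_order_acquire))
		;	/* spin until the flag is released */
}

static void demo_unlock(void)
{
	atomic_flag_clear_explicit(&demo_ifaces_lock, memory_order_release);
}

/* Caller must hold the lock; the result is only valid while it is held. */
static struct demo_addr *demo_find_primary_locked(void)
{
	return demo_ifaces ? &demo_ifaces->address : NULL;
}

/*
 * Copy the primary address into *out while the lock is held.
 * Returns true on success, false if there is no usable primary address.
 */
static bool demo_copy_primary(struct demo_addr *out)
{
	struct demo_addr *ap;
	bool found = false;

	assert(out != NULL);

	demo_lock();
	ap = demo_find_primary_locked();
	if (ap && ap->s_net != DEMO_ANYNET) {
		*out = *ap;	/* copy by value; no pointer escapes the lock */
		found = true;
	}
	demo_unlock();

	return found;
}
```

Both the acquire and the release sit in demo_copy_primary(), so the critical
section is visible in one place and no goto label is needed to balance the
lock.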

>  
>  	n = atalk_pick_and_bind_port(sk, &sat);
>  	if (!n)
>  		sock_reset_flag(sk, SOCK_ZAPPED);
>  out:
>  	return n;
> +unlock_and_out:
> +	read_unlock_bh(&atalk_interfaces_lock);
> +	goto out;
>  }
>  
>  /* Set the address 'our end' of the connection */

...

-- 
pw-bot: changes-requested


* Re: [PATCH RFC net-next 0/4] net: pse-pd: decouple controller lookup from MDIO probe
From: Kory Maincent @ 2026-06-16 16:42 UTC
  To: Carlo Szelinsky
  Cc: Corey Leavitt, Jakub Kicinski, Russell King, Oleksij Rempel,
	Andrew Lunn, Heiner Kallweit, David S . Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, netdev, linux-kernel
In-Reply-To: <20260615180812.829678-1-github@szelinsky.de>

Hello Carlo,

On Mon, 15 Jun 2026 20:08:12 +0200
Carlo Szelinsky <github@szelinsky.de> wrote:

> Hi Corey,
> 
> just checking in on this one. Did you get a chance to continue with the
> series, or is there anything I can help with to move it forward? I'm
> happy to test a v2, and I can still run the SFP path on the
> S600WP-5GT-2SX-SE once it's back on my desk.
> 
> Kory, Jakub, Russell :-) it would be great to hear your view on the
> approach so Corey can plan the next version. The series fixed the probe
> loop in my testing and I'd really like to see it land.

I haven't heard from Corey since this patch series, but I am in favor of this
notifier design.
Corey, do you have time to continue this work? If not, would it be okay for
Carlo to continue it for you?

Regards,
-- 
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com


* Re: [PATCH RFC 8/9] arm64: dts: qcom: shikra-cqs-evk: Enable ethernet0
From: Mohd Ayaan Anwar @ 2026-06-16 16:50 UTC
  To: Konrad Dybcio
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Richard Cochran, Bjorn Andersson, Konrad Dybcio, Maxime Coquelin,
	Alexandre Torgue, Russell King, linux-arm-msm, netdev, devicetree,
	linux-kernel, linux-stm32, linux-arm-kernel
In-Reply-To: <2cb658f3-f564-4396-884d-d025eaa674a1@oss.qualcomm.com>

On Tue, Jun 16, 2026 at 11:50:26AM +0200, Konrad Dybcio wrote:
> On 6/11/26 8:37 PM, Mohd Ayaan Anwar wrote:
> 
> > +&tlmm {
> > +	ethernet0_defaults: ethernet0-defaults-state {
> 
> s/defaults/default
> 
> Please move this definition to shikra.dtsi
> 

The CQM and CQS variants have identical GPIO mappings, but the IQS is
different. So should I keep this in shikra.dtsi and override it for IQS in
shikra-iqs-evk.dts?


> > +
> > +	emac0_phy_en_hog: emac0-phy-en-hog {
> > +		gpio-hog;
> > +		gpios = <149 GPIO_ACTIVE_HIGH>;
> > +		output-high;
> > +		line-name = "emac0-phy-en";
> > +	};
> 
> This looks like a hack - what does this pin actually do?
> 

The power supply to both PHYs on Shikra is gated by a GPIO pin. I am
unsure whether they should be modelled as a fixed, enable-on-boot
regulator or just like this. They need to be powered on early so that
MDIO can detect them.

Thank you for the review. I will fix the stylistic issues in v2.

	Ayaan


* Re: [Intel-wired-lan] e1000e: Report link down after "Detected Hardware Unit Hang" ?
From: Helge Deller @ 2026-06-16 16:55 UTC
  To: Ruinskiy, Dima, Andrew Lunn, Helge Deller
  Cc: Tony Nguyen, Przemek Kitszel, intel-wired-lan, netdev
In-Reply-To: <51828156-e859-44db-9926-c076796d0f75@intel.com>

Hello Dima,

On 6/16/26 18:20, Ruinskiy, Dima wrote:
> On 15/06/2026 23:36, Helge Deller wrote:
>> On 6/15/26 18:41, Andrew Lunn wrote:
>>> On Sun, Jun 14, 2026 at 11:48:08PM +0200, Helge Deller wrote:
>>>> I'm regularily facing the known "eno1: Detected Hardware Unit Hang:"
>>>> with my on-board intel e1000e NIC hardware.
>>>> Since none of he various tips on the internet helped, I had the idea
>>>> to setup a master/slave bond networking to fail over to another NIC when
>>>> the Intel chip hangs.
>>>>
>>>> Sadly this doesn't work as intended, because the link of the intel NIC
>>>> isn't reported "down", so the failover never happens, unless I manually
>>>> start "ifconfig eno1 down".
>>>>
>>>> My question: Shouldn't the intel NIC ideally report Link Down if we know
>>>> it hangs? That way a fail-over should at least happen, right?
>>>>
>>>> Below is a completely untested patch.
>>>> Does it make sense that I try to test and/or develop such a patch, or
>>>> are there things I miss?
>>>
>>> If the interface is dead, then setting the carrier down makes a lot of
>>> sense. 
>>
>> That's what I think as well. Thanks for confirming.
>>
>>> One question i have is, what do you need to do to recover the
>>> hardware? Will it correctly set the carrier up when you do the
>>> recovery?
>>
>> The only way I could recover was to plug the network cable and re-insert it.
>> I have not tested bringing the NIC down.
>> But in both cases the driver will need to re-detect the media & link
>>
>>> Also, just looking at your proposed change, it is not clear to me why
>>> such an assignment will result in carrier down. It would be good to
>>> explain it in the commit message.
>>
>> Sure. The patch I attached was completely untested and just based on
>> the analysis of the flow and how to make the Link possibly report to be down.
>> Maybe someone knowledgeable of the driver has a better suggestion how to
>> report the link down situation in a clean way?
>>
>> Helge
> This does not seem like the right direction to me.
> 
> The "Detected Hardware Unit Hang" print does not indicate that the
> interface is dead, but that the transmitter is stalled.

OK. But effectively it means that nothing can be transmitted at this stage,
which is much the same as if the link were down.

> This can be due to an unusually high load, or a HW fault / race condition with another component, etc.
>
> When a hang is detected, the transmitter is stopped with
> netif_stop_queue() and eventually ndo_tx_timeout triggers a full
> reset to the device, which in many cases recovers it from the hang.

That would be optimal, but in years I have never seen it recover from such a stall.
Also, judging by the many reports on the internet, people say it just
hangs and does not recover until the cable is unplugged (I might be wrong!).

> If the hang is persistent, we try to understand the cause and debug
> it. Permanently marking the device as 'down' because it hung once is
> not going to be the optimal solution.

Of course debugging this situation is preferred, but it does not help when
the remote production system stays unreachable forever.
Right now it just fills the syslog with the same stuck message.
Even a module option like "report_link_down_on_hang after 5 automatic re-tries"
would be a good compromise. You should still be able to get the necessary
debug info then.
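
To make the proposed compromise concrete, here is a minimal userspace sketch
of such a retry policy. It is not e1000e code: the demo_* names, the
max_retries field and the exact counting rule (allow max_retries reset
attempts, then report the link as down on the next hang) are all assumptions
made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policy state; none of these fields exist in e1000e. */
struct demo_hang_policy {
	unsigned int max_retries;	/* assumed module option, e.g. 5 */
	unsigned int failed_resets;	/* resets tried since last TX progress */
	bool carrier_ok;		/* what would be reported as link state */
};

/*
 * Called when a TX hang is detected.
 * Returns true if another reset should be attempted, false once the
 * retry budget is used up and the link is reported as down instead.
 */
static bool demo_on_hang(struct demo_hang_policy *p)
{
	assert(p);

	if (p->failed_resets >= p->max_retries) {
		p->carrier_ok = false;	/* lets a bond fail over */
		return false;
	}
	p->failed_resets++;
	return true;
}

/* Called when the transmitter makes progress again. */
static void demo_on_tx_progress(struct demo_hang_policy *p)
{
	assert(p);

	p->failed_resets = 0;
	p->carrier_ok = true;
}
```

With max_retries set to 5, the first five hangs would each trigger a reset
attempt and only the sixth consecutive hang would flip the reported link
state, so the debug messages from the earlier attempts remain available.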

Helge


* Re: [PATCH net v3] tipc: fix slab-use-after-free Read in tipc_aead_decrypt_done
From: Simon Horman @ 2026-06-16 16:58 UTC
  To: doruk
  Cc: Simon Horman, jmaloy, davem, edumazet, kuba, pabeni,
	aleksander.lobakin, tipc-discussion, netdev, linux-kernel, stable
In-Reply-To: <20260615114618.71249-1-doruk@0sec.ai>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
tipc: fix slab-use-after-free Read in tipc_aead_decrypt_done

This commit attempts to fix a slab-use-after-free read in
tipc_aead_decrypt_done by taking a reference on the network namespace
before starting asynchronous decryption. It mirrors the encrypt-side fix
by keeping the reference across the async return and dropping it when done.

> diff --git a/net/tipc/crypto.c b/net/tipc/crypto.c
> index 6d3b6b89b1d17..84a6489da0360 100644
> --- a/net/tipc/crypto.c
> +++ b/net/tipc/crypto.c
> @@ -941,12 +941,20 @@ static int tipc_aead_decrypt(struct net *net, struct tipc_aead *aead,
>  		goto exit;
>  	}
>  
> +	/* Get net to avoid freed tipc_crypto when delete namespace */
> +	if (!maybe_get_net(aead->crypto->net)) {

[Severity: High]
Could accessing aead->crypto->net read freed memory if the namespace
teardown races with packet decryption?

If aead is a TX cluster key, its crypto pointer points to the per-netns
crypto_tx structure. During a concurrent namespace teardown, tipc_exit_net()
synchronously frees crypto_tx without waiting for an RCU grace period.

Since packet reception (tipc_rcv) executes in softirq under rcu_read_lock()
but without holding a namespace reference, dereferencing aead->crypto to get
the net pointer could result in a use-after-free.

Because tipc_aead_decrypt() already receives the struct net *net directly as
its first parameter, would it be safer to use maybe_get_net(net) instead to
avoid accessing the potentially freed aead->crypto structure entirely?
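
For readers unfamiliar with the "get a reference only if the object is still
live" idea behind maybe_get_net(), the following is a minimal userspace sketch
using C11 atomics. It is an illustration under assumptions, not the kernel
implementation: struct demo_ns and the demo_* functions are hypothetical, and
the decrypt step is reduced to a placeholder. The point it shows is that the
get and the put both operate on the pointer handed in by the caller, so no
other structure has to be dereferenced to find it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct net; only the refcount matters here. */
struct demo_ns {
	atomic_int refs;	/* 0 means teardown has begun */
};

/* Take a reference only if the object is still live (count > 0). */
static bool demo_maybe_get(struct demo_ns *ns)
{
	int old = atomic_load(&ns->refs);

	while (old > 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&ns->refs, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool demo_put(struct demo_ns *ns)
{
	int old = atomic_fetch_sub(&ns->refs, 1);

	assert(old > 0);
	return old == 1;
}

/*
 * Shape of the path under discussion: pin the namespace passed in by the
 * caller, do the work, then drop the reference through the same pointer.
 */
static int demo_decrypt(struct demo_ns *ns)
{
	if (!demo_maybe_get(ns))
		return -1;	/* namespace is going away; bail out */

	/* ... synchronous work would happen here ... */

	demo_put(ns);
	return 0;
}
```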

> +		tipc_bearer_put(b);
> +		rc = -ENODEV;
> +		goto exit;
> +	}
> +
>  	/* Now, do decrypt */
>  	rc = crypto_aead_decrypt(req);
>  	if (rc == -EINPROGRESS || rc == -EBUSY)
>  		return rc;
>  
>  	tipc_bearer_put(b);
> +	put_net(aead->crypto->net);

[Severity: High]
Does this put_net() call have the same use-after-free risk? Could this also
use the net parameter directly instead of dereferencing aead->crypto->net?

[ ... ]


* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Peter Zijlstra @ 2026-06-16 17:02 UTC
  To: Sebastian Andrzej Siewior
  Cc: Jakub Kicinski, Petr Mladek, John Ogness, Sergey Senozhatsky,
	Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, Breno Leitao,
	Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel,
	stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot,
	Dietmar Eggemann, K Prateek Nayak
In-Reply-To: <20260616103529.Yh9Dxsjp@linutronix.de>

On Tue, Jun 16, 2026 at 12:35:29PM +0200, Sebastian Andrzej Siewior wrote:

> So this is not an issue since commit 7eab73b18630e ("netconsole: convert
> to NBCON console infrastructure"). Because from here now on writes are
> deferred to the nbcon thread. So this purely about -stable in this case.

Hmm, I thought netconsole had some reserved skbs and could do writes
'atomic' like? That said, it was 2.6 era the last time I looked at
netconsole.

> Now. The scheduler usually does printk_deferred() because of the rq lock
> so it does not deadlock for various reasons. It is kind of a pity that
> the various WARN macros don't do that.

People have tried, last time was here:

  https://lkml.kernel.org/r/20260611074344.GG48970@noisy.programming.kicks-ass.net

and I hate deferred with a passion. It means you'll never see the
message when you wreck the machine.

> We could add printk_deferred_enter/exit() to all the rq_lock() variants.
> I think PeterZ loves this the most. And Greg will appreciate it too
> while backporting because of all the context changes.

No, not going to happen, ever, sorry. Instead printk should delete
console sem and have printk() itself be atomic safe.

As stated, printk deferred is an abomination and needs to die a horrible
painful death.

As described here:

  https://lkml.kernel.org/r/20260611191922.GK187714@noisy.programming.kicks-ass.net

"So printk should:

 - stick msg in buffer (lockless)
 - print to atomic consoles (lockless)
 - use irq_work to wake console kthreads (lockless)
 - each kthread then tries to flush buffer to its own non-atomic console
   in non-atomic context."





* Re: [syzbot] [net?] KASAN: slab-use-after-free Read in fib_rules_lookup
From: Kuniyuki Iwashima @ 2026-06-16 17:06 UTC
  To: Eric Dumazet
  Cc: Ido Schimmel, syzbot, davem, dsahern, horms, kuba, linux-kernel,
	netdev, pabeni, syzkaller-bugs
In-Reply-To: <CANn89iJ7S1op9FJeaEqdR0KDiPu08PbFP7CqJ8NLVRgcPt370A@mail.gmail.com>

On Tue, Jun 16, 2026 at 8:55 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Jun 16, 2026 at 8:31 AM Ido Schimmel <idosch@nvidia.com> wrote:
> >
> > On Tue, Jun 16, 2026 at 07:05:24AM -0700, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit:    72dfa4700f78 net: dsa: sja1105: fix lastused timestamp in ..
> >
> > This includes commit 759923cf03b0 ("ipv4: fib: Convert
> > fib_net_exit_batch() to ->exit_rtnl().") that moved ip_fib_net_exit()
> > (and therefore fib4_rules_exit()) earlier in the netns dismantle path.
> >
> > Kuniyuki, can you please take a look?
> >
> > You can use this to reproduce:
> >
> > #!/bin/bash
> >
> > while true; do
> >         ip netns add ns1
> >         ip -n ns1 link set dev lo up
> >         ip -n ns1 address add 192.0.2.1/24 dev lo
> >         ip -n ns1 link add name dummy1 up type dummy
> >         ip -n ns1 address add 198.51.100.1/24 dev dummy1
> >         ip -n ns1 rule add ipproto tcp sport 12345 table 12345
> >         ip -n ns1 fou add port 5555 ipproto 47 local 192.0.2.1 peer 198.51.100.2 peer_port 54321
> >         ip netns del ns1
> > done
> >
>
> Oh right.
>
> While looking at this syzbot report I also found an old issue.
>
> https://lore.kernel.org/netdev/20260616141317.407791-1-edumazet@google.com/T/#u
>
> I guess adding some delays in enqueue_to_backlog() could trigger a
> similar bug even if we revert Kuniyuki's patch.

I'll look into it, thank you both !

>
>
>
>
> > Thanks
> >
> > > git tree:       net-next
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=15794bd2580000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=a0842261b62cdea8
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=965506b59a2de0b6905c
> > > compiler:       Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > >
> > > Downloadable assets:
> > > disk image: https://storage.googleapis.com/syzbot-assets/d4e16f50a97c/disk-72dfa470.raw.xz
> > > vmlinux: https://storage.googleapis.com/syzbot-assets/6cd4a736e796/vmlinux-72dfa470.xz
> > > kernel image: https://storage.googleapis.com/syzbot-assets/548b0011c8e8/bzImage-72dfa470.xz
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+965506b59a2de0b6905c@syzkaller.appspotmail.com
> > >
> > > bond0 (unregistering): Released all slaves
> > > bond1 (unregistering): Released all slaves
> > > bond2 (unregistering): (slave dummy0): Releasing active interface
> > > bond2 (unregistering): Released all slaves
> > > ==================================================================
> > > BUG: KASAN: slab-use-after-free in fib_rules_lookup+0x15e/0xeb0 net/core/fib_rules.c:321
> > > Read of size 8 at addr ffff88804ec4c680 by task kworker/u8:21/12641
> > >
> > > CPU: 0 UID: 0 PID: 12641 Comm: kworker/u8:21 Not tainted syzkaller #0 PREEMPT(full)
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/09/2026
> > > Workqueue: netns cleanup_net
> > > Call Trace:
> > >  <TASK>
> > >  dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
> > >  print_address_description+0x55/0x1e0 mm/kasan/report.c:378
> > >  print_report+0x58/0x70 mm/kasan/report.c:482
> > >  kasan_report+0x117/0x150 mm/kasan/report.c:595
> > >  fib_rules_lookup+0x15e/0xeb0 net/core/fib_rules.c:321
> > >  __fib_lookup+0x106/0x210 net/ipv4/fib_rules.c:96
> > >  ip_route_output_key_hash_rcu+0x294/0x2720 net/ipv4/route.c:2811
> > >  ip_route_output_key_hash+0x18d/0x2a0 net/ipv4/route.c:2702
> > >  __ip_route_output_key include/net/route.h:169 [inline]
> > >  ip_route_output_flow+0x2a/0x150 net/ipv4/route.c:2929
> > >  ip4_datagram_release_cb+0x89d/0xbe0 net/ipv4/datagram.c:118
> > >  release_sock+0x206/0x260 net/core/sock.c:3861
> > >  inet_shutdown+0x2b1/0x390 net/ipv4/af_inet.c:950
> > >  udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
> > >  fou_release net/ipv4/fou_core.c:562 [inline]
> > >  fou_exit_net+0x17d/0x1f0 net/ipv4/fou_core.c:1230
> > >  ops_exit_list net/core/net_namespace.c:199 [inline]
> > >  ops_undo_list+0x43d/0x8d0 net/core/net_namespace.c:252
> > >  cleanup_net+0x572/0x810 net/core/net_namespace.c:702
> > >  process_one_work kernel/workqueue.c:3314 [inline]
> > >  process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397
> > >  worker_thread+0xa47/0xfb0 kernel/workqueue.c:3478
> > >  kthread+0x389/0x470 kernel/kthread.c:436
> > >  ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
> > >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> > >  </TASK>
> > >
> > > Allocated by task 19121:
> > >  kasan_save_stack mm/kasan/common.c:57 [inline]
> > >  kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
> > >  poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
> > >  __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
> > >  kasan_kmalloc include/linux/kasan.h:263 [inline]
> > >  __do_kmalloc_node mm/slub.c:5296 [inline]
> > >  __kmalloc_node_track_caller_noprof+0x4d7/0x7b0 mm/slub.c:5408
> > >  kmemdup_noprof+0x2b/0x70 mm/util.c:138
> > >  kmemdup_noprof include/linux/fortify-string.h:763 [inline]
> > >  fib_rules_register+0x2f/0x400 net/core/fib_rules.c:170
> > >  fib4_rules_init+0x21/0x160 net/ipv4/fib_rules.c:508
> > >  ip_fib_net_init net/ipv4/fib_frontend.c:1578 [inline]
> > >  fib_net_init+0x17a/0x3e0 net/ipv4/fib_frontend.c:1628
> > >  ops_init+0x35d/0x5d0 net/core/net_namespace.c:137
> > >  setup_net+0x118/0x350 net/core/net_namespace.c:446
> > >  copy_net_ns+0x4f9/0x720 net/core/net_namespace.c:579
> > >  create_new_namespaces+0x3f0/0x6b0 kernel/nsproxy.c:132
> > >  unshare_nsproxy_namespaces+0x149/0x190 kernel/nsproxy.c:234
> > >  ksys_unshare+0x57d/0xa00 kernel/fork.c:3242
> > >  __do_sys_unshare kernel/fork.c:3316 [inline]
> > >  __se_sys_unshare kernel/fork.c:3314 [inline]
> > >  __x64_sys_unshare+0x38/0x50 kernel/fork.c:3314
> > >  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > >  do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
> > >  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > >
> > > Freed by task 12641:
> > >  kasan_save_stack mm/kasan/common.c:57 [inline]
> > >  kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
> > >  kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:584
> > >  poison_slab_object mm/kasan/common.c:253 [inline]
> > >  __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
> > >  kasan_slab_free include/linux/kasan.h:235 [inline]
> > >  slab_free_hook mm/slub.c:2689 [inline]
> > >  __rcu_free_sheaf_prepare+0x12d/0x2a0 mm/slub.c:2940
> > >  rcu_free_sheaf+0x31/0x200 mm/slub.c:5850
> > >  rcu_do_batch kernel/rcu/tree.c:2617 [inline]
> > >  rcu_core+0x78b/0x10a0 kernel/rcu/tree.c:2869
> > >  handle_softirqs+0x225/0x840 kernel/softirq.c:622
> > >  do_softirq+0x76/0xd0 kernel/softirq.c:523
> > >  __local_bh_enable_ip+0xf8/0x130 kernel/softirq.c:450
> > >  unregister_netdevice_many_notify+0x1874/0x2150 net/core/dev.c:12445
> > >  ops_exit_rtnl_list net/core/net_namespace.c:187 [inline]
> > >  ops_undo_list+0x391/0x8d0 net/core/net_namespace.c:248
> > >  cleanup_net+0x572/0x810 net/core/net_namespace.c:702
> > >  process_one_work kernel/workqueue.c:3314 [inline]
> > >  process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397
> > >  worker_thread+0xa47/0xfb0 kernel/workqueue.c:3478
> > >  kthread+0x389/0x470 kernel/kthread.c:436
> > >  ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
> > >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> > >
> > > The buggy address belongs to the object at ffff88804ec4c600
> > >  which belongs to the cache kmalloc-192 of size 192
> > > The buggy address is located 128 bytes inside of
> > >  freed 192-byte region [ffff88804ec4c600, ffff88804ec4c6c0)
> > >
> > > The buggy address belongs to the physical page:
> > > page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4ec4c
> > > flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
> > > page_type: f5(slab)
> > > raw: 00fff00000000000 ffff88813fe163c0 dead000000000100 dead000000000122
> > > raw: 0000000000000000 0000000800100010 00000000f5000000 0000000000000000
> > > page dumped because: kasan: bad access detected
> > > page_owner tracks the page as allocated
> > > page last allocated via order 0, migratetype Unmovable, gfp_mask 0xd2cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 13856, tgid 13853 (syz.3.2144), ts 351172300879, free_ts 351133053454
> > >  set_page_owner include/linux/page_owner.h:32 [inline]
> > >  post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
> > >  prep_new_page mm/page_alloc.c:1861 [inline]
> > >  get_page_from_freelist+0x24ae/0x2530 mm/page_alloc.c:3941
> > >  __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
> > >  alloc_slab_page mm/slub.c:3278 [inline]
> > >  allocate_slab+0x77/0x660 mm/slub.c:3467
> > >  new_slab mm/slub.c:3525 [inline]
> > >  refill_objects+0x336/0x3d0 mm/slub.c:7272
> > >  refill_sheaf mm/slub.c:2816 [inline]
> > >  __pcs_replace_empty_main+0x320/0x720 mm/slub.c:4652
> > >  alloc_from_pcs mm/slub.c:4750 [inline]
> > >  slab_alloc_node mm/slub.c:4884 [inline]
> > >  __do_kmalloc_node mm/slub.c:5295 [inline]
> > >  __kmalloc_noprof+0x464/0x750 mm/slub.c:5308
> > >  kmalloc_noprof include/linux/slab.h:954 [inline]
> > >  kzalloc_noprof include/linux/slab.h:1188 [inline]
> > >  new_dir fs/proc/proc_sysctl.c:966 [inline]
> > >  get_subdir fs/proc/proc_sysctl.c:1010 [inline]
> > >  sysctl_mkdir_p fs/proc/proc_sysctl.c:1320 [inline]
> > >  __register_sysctl_table+0xc02/0x1370 fs/proc/proc_sysctl.c:1395
> > >  neigh_sysctl_register+0x9b1/0xa90 net/core/neighbour.c:3915
> > >  addrconf_sysctl_register+0xb3/0x1c0 net/ipv6/addrconf.c:7396
> > >  ipv6_add_dev+0xd26/0x13a0 net/ipv6/addrconf.c:460
> > >  addrconf_notify+0x771/0x1050 net/ipv6/addrconf.c:3679
> > >  notifier_call_chain+0x1a5/0x3d0 kernel/notifier.c:85
> > >  call_netdevice_notifiers_extack net/core/dev.c:2288 [inline]
> > >  call_netdevice_notifiers net/core/dev.c:2302 [inline]
> > >  register_netdevice+0x18db/0x1f00 net/core/dev.c:11474
> > >  macsec_newlink+0x706/0x1200 drivers/net/macsec.c:4218
> > >  rtnl_newlink_create+0x310/0xb00 net/core/rtnetlink.c:3905
> > > page last free pid 12657 tgid 12657 stack trace:
> > >  reset_page_owner include/linux/page_owner.h:25 [inline]
> > >  __free_pages_prepare mm/page_alloc.c:1397 [inline]
> > >  __free_frozen_pages+0xc0d/0xd20 mm/page_alloc.c:2938
> > >  __tlb_remove_table_free mm/mmu_gather.c:228 [inline]
> > >  tlb_remove_table_rcu+0x85/0x100 mm/mmu_gather.c:291
> > >  rcu_do_batch kernel/rcu/tree.c:2617 [inline]
> > >  rcu_core+0x78b/0x10a0 kernel/rcu/tree.c:2869
> > >  handle_softirqs+0x225/0x840 kernel/softirq.c:622
> > >  __do_softirq kernel/softirq.c:656 [inline]
> > >  invoke_softirq kernel/softirq.c:496 [inline]
> > >  __irq_exit_rcu+0xca/0x220 kernel/softirq.c:735
> > >  irq_exit_rcu+0x9/0x30 kernel/softirq.c:752
> > >  instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1061 [inline]
> > >  sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1061
> > >  asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
> > >
> > > Memory state around the buggy address:
> > >  ffff88804ec4c580: 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc
> > >  ffff88804ec4c600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > >ffff88804ec4c680: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
> > >                    ^
> > >  ffff88804ec4c700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >  ffff88804ec4c780: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
> > > ==================================================================
> > >
> > >
> > > ---
> > > This report is generated by a bot. It may contain errors.
> > > See https://goo.gl/tpsmEJ for more information about syzbot.
> > > syzbot engineers can be reached at syzkaller@googlegroups.com.
> > >
> > > syzbot will keep track of this issue. See:
> > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> > >
> > > If the report is already addressed, let syzbot know by replying with:
> > > #syz fix: exact-commit-title
> > >
> > > If you want to overwrite report's subsystems, reply with:
> > > #syz set subsystems: new-subsystem
> > > (See the list of subsystem names on the web dashboard)
> > >
> > > If the report is a duplicate of another one, reply with:
> > > #syz dup: exact-subject-of-another-report
> > >
> > > If you want to undo deduplication, reply with:
> > > #syz undup

* Re: [PATCH bpf v2 1/2] bpf: Fix partial copy of non-linear test_run output
From: sun jian @ 2026-06-16 17:16 UTC (permalink / raw)
  To: Paul Chaignon
  Cc: bpf, netdev, linux-kselftest, linux-kernel, ast, daniel, andrii,
	martin.lau, eddyz87, memxor, song, yonghong.song, jolsa, davem,
	edumazet, kuba, pabeni, horms, shuah, hawk, john.fastabend, sdf,
	toke, lorenzo
In-Reply-To: <ajFQvedGURQuKqbX@mail.gmail.com>

On Tue, Jun 16, 2026 at 9:33 PM Paul Chaignon <paul.chaignon@gmail.com> wrote:
>
> On Tue, Jun 16, 2026 at 05:31:02PM +0800, Sun Jian wrote:
> > For non-linear test_run output, bpf_test_finish() derives the linear
> > data copy length from copy_size - frag_size. This only matches the
> > linear data length when copy_size is the full packet size.
> >
> > When userspace provides a short data_out buffer, copy_size is clamped to
> > that buffer size. If copy_size is smaller than frag_size, the computed
> > length becomes negative and bpf_test_finish() returns -ENOSPC before
> > copying the packet prefix or updating data_size_out.
> >
> > Compute the linear data length from the packet layout instead, and clamp
> > the linear copy length to copy_size. This preserves the expected
> > partial-copy semantics: return -ENOSPC, copy the packet prefix that fits
> > in data_out, and report the full packet length through data_size_out.
> >
> > Fixes: 7855e0db150ad ("bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature")
> > Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
> > ---
> >  net/bpf/test_run.c | 11 ++++-------
> >  1 file changed, 4 insertions(+), 7 deletions(-)
> >
> > diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> > index 2bc04feadfab..976e8fa31bc9 100644
> > --- a/net/bpf/test_run.c
> > +++ b/net/bpf/test_run.c
> > @@ -453,19 +453,16 @@ static int bpf_test_finish(const union bpf_attr *kattr,
> >       }
> >
> >       if (data_out) {
> > -             int len = sinfo ? copy_size - frag_size : copy_size;
> > -
> > -             if (len < 0) {
> > -                     err = -ENOSPC;
> > -                     goto out;
> > -             }
> > +             u32 head_len = size - frag_size;
> > +             u32 len = min(copy_size, head_len);
> >
> >               if (copy_to_user(data_out, data, len))
> >                       goto out;
> >
> >               if (sinfo) {
> > -                     int i, offset = len;
> > +                     u32 offset = len;
> >                       u32 data_len;
> > +                     int i;
>
> That doesn't look needed.
>
> >
> >                       for (i = 0; i < sinfo->nr_frags; i++) {
> >                               skb_frag_t *frag = &sinfo->frags[i];
> > --
> > 2.43.0
> >

Hi Paul,

Thanks for taking another look.

Agreed, I'll keep the fix patch minimal and leave offset as-is.
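
For reference, the length math from the commit message can be modelled in
userspace roughly as below. This is only a sketch, not the kernel code:
plan_copy(), struct copy_plan and old_linear_len() are hypothetical names,
and it assumes frag_size <= size and that copy_size has already been clamped
to at most size.

```c
#include <stdint.h>

/* Hypothetical model; names are illustrative, not from net/bpf/test_run.c. */
struct copy_plan {
	uint32_t head_copy;	/* bytes taken from the linear area */
	uint32_t frag_copy;	/* bytes taken from the fragments */
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Old math for the non-linear case: goes negative for a short buffer. */
int old_linear_len(uint32_t copy_size, uint32_t frag_size)
{
	return (int)copy_size - (int)frag_size;
}

/*
 * New math: derive the linear length from the packet layout and clamp the
 * linear copy to copy_size. Assumes frag_size <= size and copy_size <= size.
 */
struct copy_plan plan_copy(uint32_t size, uint32_t frag_size,
			   uint32_t copy_size)
{
	uint32_t head_len = size - frag_size;
	struct copy_plan p;

	p.head_copy = min_u32(copy_size, head_len);
	p.frag_copy = copy_size - p.head_copy;
	return p;
}
```

With a 9000-byte packet that keeps 8000 bytes in frags and a 500-byte
data_out buffer, the old math yields a negative length, while the new math
copies the 500-byte prefix from the linear area.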

For the selftest patch, I'll try to reuse pkt_v4 and the existing TC
program where possible, and keep only the minimal XDP frags program for the
XDP case.

Thanks,
Sun Jian

* Re: [PATCH net-next V3 2/7] netdevsim: Register devlink after device init
From: Mark Bloch @ 2026-06-16 17:29 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko
  Cc: Eric Dumazet, Paolo Abeni, Andrew Lunn, David S. Miller,
	Jonathan Corbet, Shuah Khan, Jiri Pirko, Simon Horman,
	Sunil Goutham, Linu Cherian, Geetha sowjanya, hariprasad,
	Subbaraya Sundeep, Bharat Bhushan, Saeed Mahameed,
	Leon Romanovsky, Tariq Toukan, Ethan Nelson-Moore, linux-doc,
	netdev, linux-rdma
In-Reply-To: <f266dfa5-0c6c-4be0-b73e-b2185dadd6a7@nvidia.com>



On 11/06/2026 20:43, Mark Bloch wrote:
> 
> 
> On 11/06/2026 18:54, Jakub Kicinski wrote:
>> On Thu, 11 Jun 2026 09:02:03 +0300 Mark Bloch wrote:
>>> On 11/06/2026 2:50, Jakub Kicinski wrote:
>>>> On Fri, 5 Jun 2026 21:10:25 +0300 Mark Bloch wrote:  
>>>>> devl_register() makes the devlink instance visible to userspace. A later
>>>>> patch also makes registration the point where devlink core may call
>>>>> eswitch_mode_set() to apply a boot-time default eswitch mode.
>>>>>
>>>>> Move netdevsim registration after all objects (resources, params, regions,
>>>>> traps, debugfs etc) are initialized, and after the initial eswitch mode is
>>>>> set to legacy.
>>>>>
>>>>> Move devl_unregister() to the beginning of nsim_drv_remove(), before those
>>>>> devlink objects are torn down. This keeps devlink register/unregister as
>>>>> the notification barrier and makes the later object teardown paths run
>>>>> after devlink is no longer registered, so they do not emit their own
>>>>> netlink DEL notifications.  
>>>>
>>>> This is going backwards. At some point someone from nVidia thought that
>>>> we can order our way out of locking, so mlx5 is likely ordered this way,
>>>> but this must not be required, or in any way normalized.
>>>> We (syzbot) quickly discovered that it doesn't cover all corner cases.
>>>> devl_lock() is exposed specifically to allow the driver to finish
>>>> whatever init it needs without letting user space invoke callbacks, yet.
>>>> Almost (?) all driver callbacks hold devl_lock(), so maybe the devlink
>>>> instance is "visible" to user space but that should not matter.  
>>>
>>> Let me clarify.
>>>
>>> No locking is changed here, and I don't want to make register/unregister
>>> ordering a substitute for devl_lock().
>>>
>>> The only requirement I have for this series is that devl_register() is called
>>> only once the driver is ready for devlink core to call eswitch_mode_set().
>>> That follows from the earlier direction to have the core apply the default
>>> mode from devl_register() instead of adding an explicit driver call.
>>
>> This is exactly what I'm objecting to. AFAIU we are trading off
>> explicit call to get the default value for an implicit behavior
>> depending on order of calls. We want to optimize for how easy it
>> is to get the API wrong, not for LoC.
> 
> Right, the reason I moved in this direction is that in v1 I had
> the explicit driver call, and Jiri asked to make this transparent
> from devlink core instead.
> 
>>
>> If we don't have a clean way to implement this without driver
>> changes let's add the explicit API to get the default value.
>> If driver doesn't call it schedule a work to go via the callback
>> once devl_lock() is dropped. That way drivers which care can optimize
>> themselves by reading the default value upfront. Drivers which don't 
>> care will work correctly, and there's no API call order trap.
> 
> The workqueue fallback is possible, but I think it makes the semantics
> more complicated.
> 
> We would need to track devlink instances which still need the default
> applied, and the worker would have to skip/remove them once handled.
> 
> More importantly, the worker can race with userspace setting the
> eswitch mode, so we would also need some state to tell whether the user
> already changed the mode. That feels more fragile than an explicit
> driver call.
> 
>>
>> Not ideal, but isn't that the best we can do here?
>> I still have flashbacks of the fallout from the call ordering games, 
>> we have too many drivers to keep this straight...
> 
> That's why I started with the explicit call in the first place.
> 
> I can switch back to this model: drivers which support boot time eswitch
> defaults will opt in and call the helper once they are ready. This keeps
> the support explicit per driver and avoids making it depend on where
> devl_register() happens in the init path.
> 
> With that, devlink can tell at register time whether the instance supports
> boot time eswitch defaults. If the user configured a default for an instance
> whose driver did not opt in, devlink can write to dmesg from
> devl_register().
> 
> Not perfect, but at least the user gets a visible failure instead of the
> config being silently ignored.
> 
> Mark

Jakub, Jiri, any thoughts?

I think the explicit helper is the cleanest option here, without any
workqueue fallback inside devlink. It avoids depending on devl_register()
ordering, and makes the support explicit per driver.

Does that sound like an acceptable direction?

Mark

> 
>>
>>> So if the objection is to the commit message wording, I can fix that and drop
>>> the "notification barrier" language.
>>>
>>> For unregister, I can probably leave the old ordering as-is. I moved it only
>>> to mirror the register path, which felt cleaner, but it is not required for
>>> the default-mode change and as the lock is held I see no issue with doing
>>> that.
> 
> 


* [PATCH net] net: thunderbolt: Fix frags[] overflow by bounding frame_count
From: Maoyi Xie @ 2026-06-16 17:38 UTC (permalink / raw)
  To: Mika Westerberg, Yehezkel Bernat, Andrew Lunn, Jakub Kicinski,
	Paolo Abeni
  Cc: David S. Miller, Eric Dumazet, netdev, linux-kernel

tbnet_poll() assembles a multi-frame ThunderboltIP packet into one skb. The
first frame goes into the skb linear area and every further frame is added as
a page fragment.

	skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
			page, hdr_size, frame_size,
			TBNET_RX_PAGE_SIZE - hdr_size);

A packet of frame_count frames therefore ends up with frame_count - 1
fragments. tbnet_check_frame() only bounds the peer-supplied frame_count to
TBNET_RING_SIZE / 4 (64), which is far above MAX_SKB_FRAGS (17 by default). A
peer that sends a packet of 19 or more small frames pushes nr_frags past
MAX_SKB_FRAGS, so skb_add_rx_frag() writes past skb_shinfo()->frags[] and
corrupts memory after the shared info.

Tighten the start-of-packet bound to MAX_SKB_FRAGS + 1 so a packet can never
produce more fragments than frags[] can hold. This matches the recent skb
frags overflow fixes in other receive paths, for example f0813bcd2d9d ("net:
wwan: t7xx: fix potential skb->frags overflow in RX path") and 600dc40554dc
("net: usb: cdc-phonet: fix skb frags[] overflow in rx_complete()").

Fixes: e69b6c02b4c3 ("net: Add support for networking over Thunderbolt cable")
Cc: stable@vger.kernel.org
Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
---
Mika preferred the bound in tbnet_check_frame() over the nr_frags <
MAX_SKB_FRAGS guard in tbnet_poll() that I first floated on the list, so this
rejects the oversized packet up front. Reproduced under KASAN with a harness
that mirrors the per-frame skb_add_rx_frag() loop.
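
The arithmetic behind the two bounds can be checked with a small userspace
model. This is a sketch only: the SKETCH_* constants and function names are
hypothetical, with MAX_SKB_FRAGS assumed to be at its default of 17 and
TBNET_RING_SIZE assumed to be 256 (so that TBNET_RING_SIZE / 4 is 64).

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_SKB_FRAGS	17u	/* assumed default MAX_SKB_FRAGS */
#define SKETCH_RING_SIZE	256u	/* assumed TBNET_RING_SIZE */

/* First frame lands in the linear area, every further frame is a frag. */
static uint32_t frags_needed(uint32_t frame_count)
{
	return frame_count ? frame_count - 1 : 0;
}

/* Bound before the patch: frame_count limited to ring size / 4. */
bool old_bound_ok(uint32_t frame_count)
{
	return frame_count != 0 && frame_count <= SKETCH_RING_SIZE / 4;
}

/* Bound after the patch: frame_count limited to MAX_SKB_FRAGS + 1. */
bool new_bound_ok(uint32_t frame_count)
{
	return frame_count != 0 && frame_count <= SKETCH_MAX_SKB_FRAGS + 1;
}

/* True when the packet needs more fragments than frags[] can hold. */
bool overflows(uint32_t frame_count)
{
	return frags_needed(frame_count) > SKETCH_MAX_SKB_FRAGS;
}

/*
 * Smallest frame_count that a bound accepts although it overflows frags[],
 * or 0 when the bound accepts no overflowing packet.
 */
uint32_t first_unsafe_accepted(bool (*bound_ok)(uint32_t))
{
	uint32_t fc;

	for (fc = 1; fc <= SKETCH_RING_SIZE; fc++) {
		if (bound_ok(fc) && overflows(fc))
			return fc;
	}
	return 0;
}
```

Under these assumptions the old bound first admits an overflowing packet at
19 frames, and the new bound admits none.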

 drivers/net/thunderbolt/main.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/thunderbolt/main.c b/drivers/net/thunderbolt/main.c
index 7aae5d915a1e..ac016890646c 100644
--- a/drivers/net/thunderbolt/main.c
+++ b/drivers/net/thunderbolt/main.c
@@ -787,8 +787,12 @@ static bool tbnet_check_frame(struct tbnet *net, const struct tbnet_frame *tf,
 		return true;
 	}
 
-	/* Start of packet, validate the frame header */
-	if (frame_count == 0 || frame_count > TBNET_RING_SIZE / 4) {
+	/* Start of packet, validate the frame header. tbnet_poll() puts the
+	 * first frame in the skb linear area and every further frame in a page
+	 * fragment, so a packet may not span more than MAX_SKB_FRAGS + 1 frames
+	 * without overflowing skb_shinfo()->frags[].
+	 */
+	if (frame_count == 0 || frame_count > MAX_SKB_FRAGS + 1) {
 		net->stats.rx_length_errors++;
 		return false;
 	}
-- 
2.34.1


* [PATCH net v2] tipc: free bearer discoverer via RCU to fix tipc_disc_rcv UAF
From: Samuel Page @ 2026-06-16 17:53 UTC (permalink / raw)
  To: Jon Maloy
  Cc: Tung Quang Nguyen, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, netdev, tipc-discussion, linux-kernel,
	Samuel Page, stable

bearer_disable() tears down a bearer's discovery object with
tipc_disc_delete(), which frees the struct tipc_discoverer with a plain,
synchronous kfree(). The discovery receive path, however, still reads
that object under RCU in softirq context:

  tipc_udp_recv()            // udp_media.c, rcu_dereference(ub->bearer)
    -> tipc_rcv()            // node.c
      -> tipc_disc_rcv()     // discover.c
        -> tipc_disc_addr_trial_msg(b->disc, ...)  // reads d->net etc.

tipc_udp_recv() only gates this path on test_bit(0, &b->up), which is a
TOCTOU check: an RX softirq that observes b->up == 1 before
bearer_disable() does clear_bit_unlock(0, &b->up) can still be executing
inside tipc_disc_rcv() when bearer_disable() reaches

	if (b->disc)
		tipc_disc_delete(b->disc);

and kfree()s the discoverer. The reader then dereferences freed memory
(d->net, inlined into tipc_disc_rcv()) in softirq context [0].

The bearer itself is freed RCU-safely (tipc_bearer_put() ->
kfree_rcu(b, rcu)) because the RX path runs under RCU, but the discoverer
hanging off b->disc is freed synchronously. The same b->disc is also
touched under rcu_read_lock() by
tipc_disc_add_dest()/tipc_disc_remove_dest().

Free the discoverer with the same RCU lifetime as its bearer. Add an
rcu_head to struct tipc_discoverer and defer the kfree_skb()/kfree() to
an RCU callback so any in-flight reader that already loaded b->disc
completes before the memory is released. The timer is still shut down
synchronously up front with timer_shutdown_sync() (which can sleep and
must not run from the RCU callback), and shutting it down before the
grace period prevents the periodic LINK_REQUEST timer from rearming or
re-entering the object.

This mirrors the existing TIPC pattern of pairing call_rcu() with a
cleanup callback (see tipc_node_free()/tipc_aead_free()).

[0]: (trailing page/memory-state dump trimmed)
BUG: KASAN: slab-use-after-free in tipc_disc_addr_trial_msg net/tipc/discover.c:149 [inline]
BUG: KASAN: slab-use-after-free in tipc_disc_rcv+0xe7c/0x103c net/tipc/discover.c:236
Read of size 8 at addr ffff000028f07428 by task ksoftirqd/0/15

CPU: 0 UID: 0 PID: 15 Comm: ksoftirqd/0 Not tainted 7.0.11 #3 PREEMPT
Hardware name: linux,dummy-virt (DT)
Call trace:
 show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:499 (C)
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0xb4/0xd4 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0x118/0x5d8 mm/kasan/report.c:482
 kasan_report+0xb0/0xf4 mm/kasan/report.c:595
 __asan_report_load8_noabort+0x20/0x2c mm/kasan/report_generic.c:381
 tipc_disc_addr_trial_msg net/tipc/discover.c:149 [inline]
 tipc_disc_rcv+0xe7c/0x103c net/tipc/discover.c:236
 tipc_rcv+0x1884/0x2b1c net/tipc/node.c:2126
 tipc_udp_recv+0x22c/0x684 net/tipc/udp_media.c:393
 udp_queue_rcv_one_skb+0x898/0x1798 net/ipv4/udp.c:2441
 udp_queue_rcv_skb+0x1b0/0xa44 net/ipv4/udp.c:2518
 udp_unicast_rcv_skb+0x13c/0x348 net/ipv4/udp.c:2678
 __udp4_lib_rcv+0x1aec/0x246c net/ipv4/udp.c:2754
 udp_rcv+0x78/0xa0 net/ipv4/udp.c:2936
 ip_protocol_deliver_rcu+0x68/0x410 net/ipv4/ip_input.c:207
 ip_local_deliver_finish+0x28c/0x4b4 net/ipv4/ip_input.c:241
 NF_HOOK include/linux/netfilter.h:318 [inline]
 NF_HOOK include/linux/netfilter.h:312 [inline]
 ip_local_deliver+0x29c/0x2ec net/ipv4/ip_input.c:262
 dst_input include/net/dst.h:480 [inline]
 ip_rcv_finish net/ipv4/ip_input.c:453 [inline]
 ip_rcv_finish net/ipv4/ip_input.c:439 [inline]
 NF_HOOK include/linux/netfilter.h:318 [inline]
 NF_HOOK include/linux/netfilter.h:312 [inline]
 ip_rcv+0x21c/0x258 net/ipv4/ip_input.c:573
 __netif_receive_skb_one_core+0x110/0x184 net/core/dev.c:6195
 __netif_receive_skb+0x2c/0x170 net/core/dev.c:6308
 process_backlog+0x178/0x488 net/core/dev.c:6659
 __napi_poll+0xa8/0x540 net/core/dev.c:7726
 napi_poll net/core/dev.c:7789 [inline]
 net_rx_action+0x360/0x964 net/core/dev.c:7946
 handle_softirqs+0x2f0/0x7b0 kernel/softirq.c:622
 run_ksoftirqd kernel/softirq.c:1063 [inline]
 run_ksoftirqd+0x6c/0x88 kernel/softirq.c:1055
 smpboot_thread_fn+0x65c/0x958 kernel/smpboot.c:160
 kthread+0x39c/0x444 kernel/kthread.c:436
 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860

Allocated by task 68873:
 kasan_save_stack+0x3c/0x64 mm/kasan/common.c:57
 kasan_save_track+0x20/0x3c mm/kasan/common.c:78
 kasan_save_alloc_info+0x40/0x54 mm/kasan/generic.c:570
 poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
 __kasan_kmalloc+0xd4/0xd8 mm/kasan/common.c:415
 kasan_kmalloc include/linux/kasan.h:263 [inline]
 __kmalloc_cache_noprof+0x1b0/0x458 mm/slub.c:5385
 kmalloc_noprof include/linux/slab.h:950 [inline]
 tipc_disc_create+0xdc/0x5e0 net/tipc/discover.c:356
 tipc_enable_bearer+0x8b8/0xf94 net/tipc/bearer.c:348
 __tipc_nl_bearer_enable+0x2a8/0x398 net/tipc/bearer.c:1047
 tipc_nl_bearer_enable+0x2c/0x48 net/tipc/bearer.c:1056
 genl_family_rcv_msg_doit+0x1e4/0x2c0 net/netlink/genetlink.c:1114
 genl_family_rcv_msg net/netlink/genetlink.c:1194 [inline]
 genl_rcv_msg+0x4e8/0x750 net/netlink/genetlink.c:1209
 netlink_rcv_skb+0x204/0x3cc net/netlink/af_netlink.c:2550
 genl_rcv+0x3c/0x54 net/netlink/genetlink.c:1218
 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
 netlink_unicast+0x638/0x930 net/netlink/af_netlink.c:1344
 netlink_sendmsg+0x798/0xc68 net/netlink/af_netlink.c:1894
 sock_sendmsg_nosec net/socket.c:727 [inline]
 __sock_sendmsg+0xe0/0x128 net/socket.c:742
 __sys_sendto+0x230/0x2f4 net/socket.c:2206
 __do_sys_sendto net/socket.c:2213 [inline]
 __se_sys_sendto net/socket.c:2209 [inline]
 __arm64_sys_sendto+0xc4/0x13c net/socket.c:2209
 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
 invoke_syscall+0x84/0x2a8 arch/arm64/kernel/syscall.c:49
 el0_svc_common.constprop.0+0xe4/0x294 arch/arm64/kernel/syscall.c:132
 do_el0_svc+0x44/0x5c arch/arm64/kernel/syscall.c:151
 el0_svc+0x38/0xac arch/arm64/kernel/entry-common.c:724
 el0t_64_sync_handler+0xa0/0xe4 arch/arm64/kernel/entry-common.c:743
 el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:596

Freed by task 60072:
 kasan_save_stack+0x3c/0x64 mm/kasan/common.c:57
 kasan_save_track+0x20/0x3c mm/kasan/common.c:78
 kasan_save_free_info+0x4c/0x74 mm/kasan/generic.c:584
 poison_slab_object mm/kasan/common.c:253 [inline]
 __kasan_slab_free+0x88/0xb8 mm/kasan/common.c:285
 kasan_slab_free include/linux/kasan.h:235 [inline]
 slab_free_hook mm/slub.c:2685 [inline]
 slab_free mm/slub.c:6170 [inline]
 kfree+0x14c/0x458 mm/slub.c:6488
 tipc_disc_delete+0x50/0x68 net/tipc/discover.c:393
 bearer_disable+0x18c/0x278 net/tipc/bearer.c:418
 tipc_bearer_stop+0xe0/0x198 net/tipc/bearer.c:757
 tipc_net_stop+0x110/0x178 net/tipc/net.c:159
 tipc_exit_net+0x80/0x19c net/tipc/core.c:112
 ops_exit_list net/core/net_namespace.c:199 [inline]
 ops_undo_list+0x244/0x694 net/core/net_namespace.c:252
 cleanup_net+0x3a0/0x830 net/core/net_namespace.c:702
 process_one_work+0x628/0xd38 kernel/workqueue.c:3289
 process_scheduled_works kernel/workqueue.c:3372 [inline]
 worker_thread+0x7a8/0xac0 kernel/workqueue.c:3453
 kthread+0x39c/0x444 kernel/kthread.c:436
 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860

Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash values")
Cc: stable@vger.kernel.org
Assisted-by: Bynario AI
Signed-off-by: Samuel Page <sam@bynar.io>
---
v2:
 - Wrap the over-80-column container_of() line in tipc_disc_free_rcu()
   to fix the coding-style issue raised in review.

v1: https://lore.kernel.org/netdev/20260615144233.1730935-1-sam@bynar.io/
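
For readers less familiar with the pattern, the deferred free can be modelled
in userspace roughly as below. This is a sketch only and not the kernel API:
every sketch_* name is hypothetical, sketch_call_rcu() merely queues a
callback, and sketch_grace_period_end() stands in for the point at which all
pre-existing readers are known to have finished.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for struct rcu_head; not the kernel type. */
struct sketch_rcu_head {
	struct sketch_rcu_head *next;
	void (*func)(struct sketch_rcu_head *head);
};

#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct sketch_rcu_head *sketch_pending;	/* queued callbacks */
static int sketch_freed;			/* objects released so far */

/* Queue the callback instead of freeing right away. */
static void sketch_call_rcu(struct sketch_rcu_head *head,
			    void (*func)(struct sketch_rcu_head *head))
{
	head->func = func;
	head->next = sketch_pending;
	sketch_pending = head;
}

/* Model the end of a grace period: run every queued callback. */
void sketch_grace_period_end(void)
{
	while (sketch_pending) {
		struct sketch_rcu_head *head = sketch_pending;

		sketch_pending = head->next;
		head->func(head);
	}
}

/* Stand-in for the discoverer, with the head embedded in the object. */
struct sketch_disc {
	struct sketch_rcu_head rcu;
	int net_id;
};

static void sketch_disc_free_rcu(struct sketch_rcu_head *rp)
{
	struct sketch_disc *d;

	d = sketch_container_of(rp, struct sketch_disc, rcu);
	free(d);
	sketch_freed++;
}

/* The kernel version shuts the timer down synchronously before this. */
void sketch_disc_delete(struct sketch_disc *d)
{
	sketch_call_rcu(&d->rcu, sketch_disc_free_rcu);
}

/* Returns 0 when the object outlives delete until the grace period ends. */
int sketch_demo(void)
{
	struct sketch_disc *d = malloc(sizeof(*d));
	int before = sketch_freed;

	if (!d)
		return -1;
	d->net_id = 7;
	sketch_disc_delete(d);
	/* A reader that loaded the pointer earlier can still use it here. */
	if (sketch_freed != before || d->net_id != 7)
		return -2;
	sketch_grace_period_end();
	return sketch_freed == before + 1 ? 0 : -3;
}
```

The point of the model is only the ordering: delete queues the release, and
the memory stays valid until the modelled grace period has ended.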

 net/tipc/discover.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/net/tipc/discover.c b/net/tipc/discover.c
index 3e54d2df5683..761b625bba5a 100644
--- a/net/tipc/discover.c
+++ b/net/tipc/discover.c
@@ -49,6 +49,7 @@
 
 /**
  * struct tipc_discoverer - information about an ongoing link setup request
+ * @rcu: RCU head used to free the structure after a grace period
  * @bearer_id: identity of bearer issuing requests
  * @net: network namespace instance
  * @dest: destination address for request messages
@@ -60,6 +61,7 @@
  * @timer_intv: current interval between requests (in ms)
  */
 struct tipc_discoverer {
+	struct rcu_head rcu;
 	u32 bearer_id;
 	struct tipc_media_addr dest;
 	struct net *net;
@@ -382,6 +384,18 @@ int tipc_disc_create(struct net *net, struct tipc_bearer *b,
 	return 0;
 }
 
+/* RCU callback: free the discoverer only after any concurrent
+ * tipc_disc_rcv() softirq reader of bearer->disc has finished.
+ */
+static void tipc_disc_free_rcu(struct rcu_head *rp)
+{
+	struct tipc_discoverer *d;
+
+	d = container_of(rp, struct tipc_discoverer, rcu);
+	kfree_skb(d->skb);
+	kfree(d);
+}
+
 /**
  * tipc_disc_delete - destroy object sending periodic link setup requests
  * @d: ptr to link dest structure
@@ -389,8 +403,7 @@ int tipc_disc_create(struct net *net, struct tipc_bearer *b,
 void tipc_disc_delete(struct tipc_discoverer *d)
 {
 	timer_shutdown_sync(&d->timer);
-	kfree_skb(d->skb);
-	kfree(d);
+	call_rcu(&d->rcu, tipc_disc_free_rcu);
 }
 
 /**

base-commit: 47186409c092cd7dd70350999186c700233e854d
-- 
2.54.0


* Re: [syzbot] [net?] KASAN: slab-use-after-free Read in fib_rules_lookup
From: Kuniyuki Iwashima @ 2026-06-16 17:59 UTC (permalink / raw)
  To: kuniyu
  Cc: davem, dsahern, edumazet, horms, idosch, kuba, linux-kernel,
	netdev, pabeni, syzbot+965506b59a2de0b6905c, syzkaller-bugs
In-Reply-To: <CAAVpQUB8W6nXOq-OQfSArKC_xzFbQ=dg62Ee3R=0nuX0sW0fMg@mail.gmail.com>

From: Kuniyuki Iwashima <kuniyu@google.com>
Date: Tue, 16 Jun 2026 10:06:55 -0700
> On Tue, Jun 16, 2026 at 8:55 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Tue, Jun 16, 2026 at 8:31 AM Ido Schimmel <idosch@nvidia.com> wrote:
> > >
> > > On Tue, Jun 16, 2026 at 07:05:24AM -0700, syzbot wrote:
> > > > Hello,
> > > >
> > > > syzbot found the following issue on:
> > > >
> > > > HEAD commit:    72dfa4700f78 net: dsa: sja1105: fix lastused timestamp in ..
> > >
> > > This includes commit 759923cf03b0 ("ipv4: fib: Convert
> > > fib_net_exit_batch() to ->exit_rtnl().") that moved ip_fib_net_exit()
> > > (and therefore fib4_rules_exit()) earlier in the netns dismantle path.
> > >
> > > Kuniyuki, can you please take a look?
> > >
> > > You can use this to reproduce:
> > >
> > > #!/bin/bash
> > >
> > > while true; do
> > >         ip netns add ns1
> > >         ip -n ns1 link set dev lo up
> > >         ip -n ns1 address add 192.0.2.1/24 dev lo
> > >         ip -n ns1 link add name dummy1 up type dummy
> > >         ip -n ns1 address add 198.51.100.1/24 dev dummy1
> > >         ip -n ns1 rule add ipproto tcp sport 12345 table 12345
> > >         ip -n ns1 fou add port 5555 ipproto 47 local 192.0.2.1 peer 198.51.100.2 peer_port 54321
> > >         ip netns del ns1
> > > done
> > >
> >
> > Oh right.
> >
> > While looking at this syzbot report I also found an old issue.
> >
> > https://lore.kernel.org/netdev/20260616141317.407791-1-edumazet@google.com/T/#u
> >
> > I guess adding some delays in enqueue_to_backlog() could trigger a
> > similar bug even if we revert Kuniyuki's patch.
> 
> I'll look into it, thank you both !

I'll move fib4_rules_exit() to ->exit().

fib_unmerge() requires RTNL, but it is not needed in ->delete()
in the first place since it's already called in ->configure().

---8<---
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index c7d1f31650d7..42212970d735 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1612,10 +1612,6 @@ static void ip_fib_net_exit(struct net *net)
 			fib_free_table(tb);
 		}
 	}
-
-#ifdef CONFIG_IP_MULTIPLE_TABLES
-	fib4_rules_exit(net);
-#endif
 }
 
 static int __net_init fib_net_init(struct net *net)
@@ -1652,6 +1648,9 @@ static int __net_init fib_net_init(struct net *net)
 	ip_fib_net_exit(net);
 	rtnl_net_unlock(net);
 
+#ifdef CONFIG_IP_MULTIPLE_TABLES
+	fib4_rules_exit(net);
+#endif
 	kfree(net->ipv4.fib_table_hash);
 	fib4_notifier_exit(net);
 	goto out;
@@ -1671,6 +1670,9 @@ static void __net_exit fib_net_exit_rtnl(struct net *net,
 
 static void __net_exit fib_net_exit(struct net *net)
 {
+#ifdef CONFIG_IP_MULTIPLE_TABLES
+	fib4_rules_exit(net);
+#endif
 	kfree(net->ipv4.fib_table_hash);
 	fib4_notifier_exit(net);
 	fib4_semantics_exit(net);
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index 51f0193092f0..0bf6204468c5 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -352,12 +352,6 @@ static int fib4_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
 static int fib4_rule_delete(struct fib_rule *rule)
 {
 	struct net *net = rule->fr_net;
-	int err;
-
-	/* split local/main if they are not already split */
-	err = fib_unmerge(net);
-	if (err)
-		goto errout;
 
 #ifdef CONFIG_IP_ROUTE_CLASSID
 	if (((struct fib4_rule *)rule)->tclassid)
@@ -368,8 +362,8 @@ static int fib4_rule_delete(struct fib_rule *rule)
 	if (net->ipv4.fib_rules_require_fldissect &&
 	    fib_rule_requires_fldissect(rule))
 		net->ipv4.fib_rules_require_fldissect--;
-errout:
-	return err;
+
+	return 0;
 }
 
 static int fib4_rule_compare(struct fib_rule *rule, struct fib_rule_hdr *frh,
---8<---



> 
> >
> >
> >
> >
> > > Thanks
> > >
> > > > git tree:       net-next
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=15794bd2580000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=a0842261b62cdea8
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=965506b59a2de0b6905c
> > > > compiler:       Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6
> > > >
> > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > >
> > > > Downloadable assets:
> > > > disk image: https://storage.googleapis.com/syzbot-assets/d4e16f50a97c/disk-72dfa470.raw.xz
> > > > vmlinux: https://storage.googleapis.com/syzbot-assets/6cd4a736e796/vmlinux-72dfa470.xz
> > > > kernel image: https://storage.googleapis.com/syzbot-assets/548b0011c8e8/bzImage-72dfa470.xz
> > > >
> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > Reported-by: syzbot+965506b59a2de0b6905c@syzkaller.appspotmail.com
> > > >
> > > > bond0 (unregistering): Released all slaves
> > > > bond1 (unregistering): Released all slaves
> > > > bond2 (unregistering): (slave dummy0): Releasing active interface
> > > > bond2 (unregistering): Released all slaves
> > > > ==================================================================
> > > > BUG: KASAN: slab-use-after-free in fib_rules_lookup+0x15e/0xeb0 net/core/fib_rules.c:321
> > > > Read of size 8 at addr ffff88804ec4c680 by task kworker/u8:21/12641
> > > >
> > > > CPU: 0 UID: 0 PID: 12641 Comm: kworker/u8:21 Not tainted syzkaller #0 PREEMPT(full)
> > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/09/2026
> > > > Workqueue: netns cleanup_net
> > > > Call Trace:
> > > >  <TASK>
> > > >  dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
> > > >  print_address_description+0x55/0x1e0 mm/kasan/report.c:378
> > > >  print_report+0x58/0x70 mm/kasan/report.c:482
> > > >  kasan_report+0x117/0x150 mm/kasan/report.c:595
> > > >  fib_rules_lookup+0x15e/0xeb0 net/core/fib_rules.c:321
> > > >  __fib_lookup+0x106/0x210 net/ipv4/fib_rules.c:96
> > > >  ip_route_output_key_hash_rcu+0x294/0x2720 net/ipv4/route.c:2811
> > > >  ip_route_output_key_hash+0x18d/0x2a0 net/ipv4/route.c:2702
> > > >  __ip_route_output_key include/net/route.h:169 [inline]
> > > >  ip_route_output_flow+0x2a/0x150 net/ipv4/route.c:2929
> > > >  ip4_datagram_release_cb+0x89d/0xbe0 net/ipv4/datagram.c:118
> > > >  release_sock+0x206/0x260 net/core/sock.c:3861
> > > >  inet_shutdown+0x2b1/0x390 net/ipv4/af_inet.c:950
> > > >  udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
> > > >  fou_release net/ipv4/fou_core.c:562 [inline]
> > > >  fou_exit_net+0x17d/0x1f0 net/ipv4/fou_core.c:1230
> > > >  ops_exit_list net/core/net_namespace.c:199 [inline]
> > > >  ops_undo_list+0x43d/0x8d0 net/core/net_namespace.c:252
> > > >  cleanup_net+0x572/0x810 net/core/net_namespace.c:702
> > > >  process_one_work kernel/workqueue.c:3314 [inline]
> > > >  process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397
> > > >  worker_thread+0xa47/0xfb0 kernel/workqueue.c:3478
> > > >  kthread+0x389/0x470 kernel/kthread.c:436
> > > >  ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
> > > >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> > > >  </TASK>
> > > >
> > > > Allocated by task 19121:
> > > >  kasan_save_stack mm/kasan/common.c:57 [inline]
> > > >  kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
> > > >  poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
> > > >  __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
> > > >  kasan_kmalloc include/linux/kasan.h:263 [inline]
> > > >  __do_kmalloc_node mm/slub.c:5296 [inline]
> > > >  __kmalloc_node_track_caller_noprof+0x4d7/0x7b0 mm/slub.c:5408
> > > >  kmemdup_noprof+0x2b/0x70 mm/util.c:138
> > > >  kmemdup_noprof include/linux/fortify-string.h:763 [inline]
> > > >  fib_rules_register+0x2f/0x400 net/core/fib_rules.c:170
> > > >  fib4_rules_init+0x21/0x160 net/ipv4/fib_rules.c:508
> > > >  ip_fib_net_init net/ipv4/fib_frontend.c:1578 [inline]
> > > >  fib_net_init+0x17a/0x3e0 net/ipv4/fib_frontend.c:1628
> > > >  ops_init+0x35d/0x5d0 net/core/net_namespace.c:137
> > > >  setup_net+0x118/0x350 net/core/net_namespace.c:446
> > > >  copy_net_ns+0x4f9/0x720 net/core/net_namespace.c:579
> > > >  create_new_namespaces+0x3f0/0x6b0 kernel/nsproxy.c:132
> > > >  unshare_nsproxy_namespaces+0x149/0x190 kernel/nsproxy.c:234
> > > >  ksys_unshare+0x57d/0xa00 kernel/fork.c:3242
> > > >  __do_sys_unshare kernel/fork.c:3316 [inline]
> > > >  __se_sys_unshare kernel/fork.c:3314 [inline]
> > > >  __x64_sys_unshare+0x38/0x50 kernel/fork.c:3314
> > > >  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > > >  do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
> > > >  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > > >
> > > > Freed by task 12641:
> > > >  kasan_save_stack mm/kasan/common.c:57 [inline]
> > > >  kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
> > > >  kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:584
> > > >  poison_slab_object mm/kasan/common.c:253 [inline]
> > > >  __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
> > > >  kasan_slab_free include/linux/kasan.h:235 [inline]
> > > >  slab_free_hook mm/slub.c:2689 [inline]
> > > >  __rcu_free_sheaf_prepare+0x12d/0x2a0 mm/slub.c:2940
> > > >  rcu_free_sheaf+0x31/0x200 mm/slub.c:5850
> > > >  rcu_do_batch kernel/rcu/tree.c:2617 [inline]
> > > >  rcu_core+0x78b/0x10a0 kernel/rcu/tree.c:2869
> > > >  handle_softirqs+0x225/0x840 kernel/softirq.c:622
> > > >  do_softirq+0x76/0xd0 kernel/softirq.c:523
> > > >  __local_bh_enable_ip+0xf8/0x130 kernel/softirq.c:450
> > > >  unregister_netdevice_many_notify+0x1874/0x2150 net/core/dev.c:12445
> > > >  ops_exit_rtnl_list net/core/net_namespace.c:187 [inline]
> > > >  ops_undo_list+0x391/0x8d0 net/core/net_namespace.c:248
> > > >  cleanup_net+0x572/0x810 net/core/net_namespace.c:702
> > > >  process_one_work kernel/workqueue.c:3314 [inline]
> > > >  process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397
> > > >  worker_thread+0xa47/0xfb0 kernel/workqueue.c:3478
> > > >  kthread+0x389/0x470 kernel/kthread.c:436
> > > >  ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
> > > >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> > > >
> > > > The buggy address belongs to the object at ffff88804ec4c600
> > > >  which belongs to the cache kmalloc-192 of size 192
> > > > The buggy address is located 128 bytes inside of
> > > >  freed 192-byte region [ffff88804ec4c600, ffff88804ec4c6c0)
> > > >
> > > > The buggy address belongs to the physical page:
> > > > page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4ec4c
> > > > flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
> > > > page_type: f5(slab)
> > > > raw: 00fff00000000000 ffff88813fe163c0 dead000000000100 dead000000000122
> > > > raw: 0000000000000000 0000000800100010 00000000f5000000 0000000000000000
> > > > page dumped because: kasan: bad access detected
> > > > page_owner tracks the page as allocated
> > > > page last allocated via order 0, migratetype Unmovable, gfp_mask 0xd2cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 13856, tgid 13853 (syz.3.2144), ts 351172300879, free_ts 351133053454
> > > >  set_page_owner include/linux/page_owner.h:32 [inline]
> > > >  post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
> > > >  prep_new_page mm/page_alloc.c:1861 [inline]
> > > >  get_page_from_freelist+0x24ae/0x2530 mm/page_alloc.c:3941
> > > >  __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
> > > >  alloc_slab_page mm/slub.c:3278 [inline]
> > > >  allocate_slab+0x77/0x660 mm/slub.c:3467
> > > >  new_slab mm/slub.c:3525 [inline]
> > > >  refill_objects+0x336/0x3d0 mm/slub.c:7272
> > > >  refill_sheaf mm/slub.c:2816 [inline]
> > > >  __pcs_replace_empty_main+0x320/0x720 mm/slub.c:4652
> > > >  alloc_from_pcs mm/slub.c:4750 [inline]
> > > >  slab_alloc_node mm/slub.c:4884 [inline]
> > > >  __do_kmalloc_node mm/slub.c:5295 [inline]
> > > >  __kmalloc_noprof+0x464/0x750 mm/slub.c:5308
> > > >  kmalloc_noprof include/linux/slab.h:954 [inline]
> > > >  kzalloc_noprof include/linux/slab.h:1188 [inline]
> > > >  new_dir fs/proc/proc_sysctl.c:966 [inline]
> > > >  get_subdir fs/proc/proc_sysctl.c:1010 [inline]
> > > >  sysctl_mkdir_p fs/proc/proc_sysctl.c:1320 [inline]
> > > >  __register_sysctl_table+0xc02/0x1370 fs/proc/proc_sysctl.c:1395
> > > >  neigh_sysctl_register+0x9b1/0xa90 net/core/neighbour.c:3915
> > > >  addrconf_sysctl_register+0xb3/0x1c0 net/ipv6/addrconf.c:7396
> > > >  ipv6_add_dev+0xd26/0x13a0 net/ipv6/addrconf.c:460
> > > >  addrconf_notify+0x771/0x1050 net/ipv6/addrconf.c:3679
> > > >  notifier_call_chain+0x1a5/0x3d0 kernel/notifier.c:85
> > > >  call_netdevice_notifiers_extack net/core/dev.c:2288 [inline]
> > > >  call_netdevice_notifiers net/core/dev.c:2302 [inline]
> > > >  register_netdevice+0x18db/0x1f00 net/core/dev.c:11474
> > > >  macsec_newlink+0x706/0x1200 drivers/net/macsec.c:4218
> > > >  rtnl_newlink_create+0x310/0xb00 net/core/rtnetlink.c:3905
> > > > page last free pid 12657 tgid 12657 stack trace:
> > > >  reset_page_owner include/linux/page_owner.h:25 [inline]
> > > >  __free_pages_prepare mm/page_alloc.c:1397 [inline]
> > > >  __free_frozen_pages+0xc0d/0xd20 mm/page_alloc.c:2938
> > > >  __tlb_remove_table_free mm/mmu_gather.c:228 [inline]
> > > >  tlb_remove_table_rcu+0x85/0x100 mm/mmu_gather.c:291
> > > >  rcu_do_batch kernel/rcu/tree.c:2617 [inline]
> > > >  rcu_core+0x78b/0x10a0 kernel/rcu/tree.c:2869
> > > >  handle_softirqs+0x225/0x840 kernel/softirq.c:622
> > > >  __do_softirq kernel/softirq.c:656 [inline]
> > > >  invoke_softirq kernel/softirq.c:496 [inline]
> > > >  __irq_exit_rcu+0xca/0x220 kernel/softirq.c:735
> > > >  irq_exit_rcu+0x9/0x30 kernel/softirq.c:752
> > > >  instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1061 [inline]
> > > >  sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1061
> > > >  asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
> > > >
> > > > Memory state around the buggy address:
> > > >  ffff88804ec4c580: 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc
> > > >  ffff88804ec4c600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > >ffff88804ec4c680: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
> > > >                    ^
> > > >  ffff88804ec4c700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > >  ffff88804ec4c780: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
> > > > ==================================================================
> > > >
> > > >
> > > > ---
> > > > This report is generated by a bot. It may contain errors.
> > > > See https://goo.gl/tpsmEJ for more information about syzbot.
> > > > syzbot engineers can be reached at syzkaller@googlegroups.com.
> > > >
> > > > syzbot will keep track of this issue. See:
> > > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> > > >
> > > > If the report is already addressed, let syzbot know by replying with:
> > > > #syz fix: exact-commit-title
> > > >
> > > > If you want to overwrite report's subsystems, reply with:
> > > > #syz set subsystems: new-subsystem
> > > > (See the list of subsystem names on the web dashboard)
> > > >
> > > > If the report is a duplicate of another one, reply with:
> > > > #syz dup: exact-subject-of-another-report
> > > >
> > > > If you want to undo deduplication, reply with:
> > > > #syz undup
> 


* [PATCH nf-next v3 1/4] netfilter: nf_nat_ftp: replace u_int16_t with u16
From: Carlos Grillet @ 2026-06-16 18:29 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <20260616182948.96865-1-carlos@carlosgrillet.me>

Use preferred kernel integer type u16 instead of the POSIX u_int16_t
variant.

No functional change.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 net/netfilter/nf_nat_ftp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_nat_ftp.c b/net/netfilter/nf_nat_ftp.c
index c92a436d9c48..ab714629e2b1 100644
--- a/net/netfilter/nf_nat_ftp.c
+++ b/net/netfilter/nf_nat_ftp.c
@@ -69,7 +69,7 @@ static unsigned int nf_nat_ftp(struct sk_buff *skb,
 			       struct nf_conntrack_expect *exp)
 {
 	union nf_inet_addr newaddr;
-	u_int16_t port;
+	u16 port;
 	int dir = CTINFO2DIR(ctinfo);
 	struct nf_conn *ct = exp->master;
 	char buffer[sizeof("|1||65535|") + INET6_ADDRSTRLEN];
-- 
2.54.0



* [PATCH nf-next v3 3/4] netfilter: nf_sockopt: replace u_int8_t with u8
From: Carlos Grillet @ 2026-06-16 18:29 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netfilter-devel, coreteam, linux-kernel, netdev
In-Reply-To: <20260616182948.96865-1-carlos@carlosgrillet.me>

Replace POSIX u_int8_t with preferred kernel type u8, update prototype
and struct definition.

No functional changes.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 include/linux/netfilter.h  | 6 +++---
 net/netfilter/nf_sockopt.c | 8 ++++----
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index efbbfa770d66..91b68bdba3f5 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -181,7 +181,7 @@ static inline void nf_hook_state_init(struct nf_hook_state *p,
 struct nf_sockopt_ops {
 	struct list_head list;
 
-	u_int8_t pf;
+	u8 pf;
 
 	/* Non-inclusive ranges: use 0/0/NULL to never get called. */
 	int set_optmin;
@@ -357,9 +357,9 @@ NF_HOOK_LIST(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
 }
 
 /* Call setsockopt() */
-int nf_setsockopt(struct sock *sk, u_int8_t pf, int optval, sockptr_t opt,
+int nf_setsockopt(struct sock *sk, u8 pf, int optval, sockptr_t opt,
 		  unsigned int len);
-int nf_getsockopt(struct sock *sk, u_int8_t pf, int optval, char __user *opt,
+int nf_getsockopt(struct sock *sk, u8 pf, int optval, char __user *opt,
 		  int *len);
 
 struct flowi;
diff --git a/net/netfilter/nf_sockopt.c b/net/netfilter/nf_sockopt.c
index 34afcd03b6f6..19a1d028158c 100644
--- a/net/netfilter/nf_sockopt.c
+++ b/net/netfilter/nf_sockopt.c
@@ -59,8 +59,8 @@ void nf_unregister_sockopt(struct nf_sockopt_ops *reg)
 }
 EXPORT_SYMBOL(nf_unregister_sockopt);
 
-static struct nf_sockopt_ops *nf_sockopt_find(struct sock *sk, u_int8_t pf,
-		int val, int get)
+static struct nf_sockopt_ops *nf_sockopt_find(struct sock *sk, u8 pf,
+					      int val, int get)
 {
 	struct nf_sockopt_ops *ops;
 
@@ -89,7 +89,7 @@ static struct nf_sockopt_ops *nf_sockopt_find(struct sock *sk, u_int8_t pf,
 	return ops;
 }
 
-int nf_setsockopt(struct sock *sk, u_int8_t pf, int val, sockptr_t opt,
+int nf_setsockopt(struct sock *sk, u8 pf, int val, sockptr_t opt,
 		  unsigned int len)
 {
 	struct nf_sockopt_ops *ops;
@@ -104,7 +104,7 @@ int nf_setsockopt(struct sock *sk, u_int8_t pf, int val, sockptr_t opt,
 }
 EXPORT_SYMBOL(nf_setsockopt);
 
-int nf_getsockopt(struct sock *sk, u_int8_t pf, int val, char __user *opt,
+int nf_getsockopt(struct sock *sk, u8 pf, int val, char __user *opt,
 		  int *len)
 {
 	struct nf_sockopt_ops *ops;
-- 
2.54.0



* [PATCH nf-next v3 4/4] netfilter: xt_TCPOPTSTRIP: replace u_int8_t and u_int16_t with u8 and u16
From: Carlos Grillet @ 2026-06-16 18:29 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <20260616182948.96865-1-carlos@carlosgrillet.me>

Replace POSIX u_int8_t/u_int16_t with preferred kernel types u8/u16.

No functional changes.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 net/netfilter/xt_TCPOPTSTRIP.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/xt_TCPOPTSTRIP.c b/net/netfilter/xt_TCPOPTSTRIP.c
index 93f064306901..265d21697847 100644
--- a/net/netfilter/xt_TCPOPTSTRIP.c
+++ b/net/netfilter/xt_TCPOPTSTRIP.c
@@ -16,7 +16,7 @@
 #include <linux/netfilter/x_tables.h>
 #include <linux/netfilter/xt_TCPOPTSTRIP.h>
 
-static inline unsigned int optlen(const u_int8_t *opt, unsigned int offset)
+static inline unsigned int optlen(const u8 *opt, unsigned int offset)
 {
 	/* Beware zero-length options: make finite progress */
 	if (opt[offset] <= TCPOPT_NOP || opt[offset+1] == 0)
@@ -33,8 +33,8 @@ tcpoptstrip_mangle_packet(struct sk_buff *skb,
 	const struct xt_tcpoptstrip_target_info *info = par->targinfo;
 	struct tcphdr *tcph, _th;
 	unsigned int optl, i, j;
-	u_int16_t n, o;
-	u_int8_t *opt;
+	u16 n, o;
+	u8 *opt;
 	int tcp_hdrlen;
 
 	/* This is a fragment, no TCP header is available */
@@ -97,7 +97,7 @@ tcpoptstrip_tg6(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	struct ipv6hdr *ipv6h = ipv6_hdr(skb);
 	int tcphoff;
-	u_int8_t nexthdr;
+	u8 nexthdr;
 	__be16 frag_off;
 
 	nexthdr = ipv6h->nexthdr;
-- 
2.54.0



* [PATCH nf-next v3 2/4] netfilter: nf_nat_irc: replace u_int16_t with u16
From: Carlos Grillet @ 2026-06-16 18:29 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <20260616182948.96865-1-carlos@carlosgrillet.me>

Replace POSIX u_int16_t with preferred kernel type u16.

No functional changes.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 net/netfilter/nf_nat_irc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_nat_irc.c b/net/netfilter/nf_nat_irc.c
index 19c4fcc60c50..14b79cb0171b 100644
--- a/net/netfilter/nf_nat_irc.c
+++ b/net/netfilter/nf_nat_irc.c
@@ -39,7 +39,7 @@ static unsigned int help(struct sk_buff *skb,
 	char buffer[sizeof("4294967296 65635")];
 	struct nf_conn *ct = exp->master;
 	union nf_inet_addr newaddr;
-	u_int16_t port;
+	u16 port;
 
 	/* Reply comes from server. */
 	newaddr = ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3;
-- 
2.54.0



* [PATCH nf-next v3 0/4] netfilter: replace u_int*_t with kernel int types
From: Carlos Grillet @ 2026-06-16 18:29 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netfilter-devel, coreteam, linux-kernel, netdev

Hi all! This is my first patch series of many, I hope :)
I'd like to start contributing by helping out with janitor work,
standardizing code and cleaning up.

This patch series replaces POSIX u_int8_t/u_int16_t with the preferred
kernel types u8/u16 across several netfilter files.

u_int*_t appears in many other files, but I wanted to keep this series
small, unless advised otherwise.

No functional changes.

Changes in v3:
- dropping changes to nf_log and xt_DSCP (need deeper understanding of the
  subsystem before converting these correctly)
- link to v2: https://lore.kernel.org/all/20260615133835.51273-1-carlos@carlosgrillet.me

Changes in v2:
- addresses sashiko comments https://sashiko.dev/#/patchset/32368
  - nf_sockopt: update function prototypes and struct definitions
  - nf_log: update the corresponding function declarations and the
    nf_logfn typedef
- link to v1: https://lore.kernel.org/all/20260612125146.75672-1-carlos@carlosgrillet.me

Carlos Grillet (4):
  netfilter: nf_nat_ftp: replace u_int16_t with u16
  netfilter: nf_nat_irc: replace u_int16_t with u16
  netfilter: nf_sockopt: replace u_int8_t with u8
  netfilter: xt_TCPOPTSTRIP: replace u_int8_t and u_int16_t with u8 and u16

 include/linux/netfilter.h      | 6 +++---
 net/netfilter/nf_nat_ftp.c     | 2 +-
 net/netfilter/nf_nat_irc.c     | 2 +-
 net/netfilter/nf_sockopt.c     | 8 ++++----
 net/netfilter/xt_TCPOPTSTRIP.c | 8 ++++----
 5 files changed, 13 insertions(+), 13 deletions(-)

-- 
2.54.0



* Re: [PATCH bpf] bpf, sockmap: fix lock inversion between stab->lock and sk_callback_lock
From: Sechang Lim @ 2026-06-16 18:40 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: John Fastabend, Jakub Sitnicki, Alexei Starovoitov,
	Daniel Borkmann, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
	Willem de Bruijn, David S . Miller, Jakub Kicinski, Simon Horman,
	netdev, bpf, linux-kernel
In-Reply-To: <575a878e-6d37-4337-a821-4883d3dd3a63@linux.dev>

On Tue, Jun 16, 2026 at 06:17:48PM +0800, Jiayuan Chen wrote:
>
>On 6/16/26 5:11 PM, Sechang Lim wrote:
>>sock_map_update_common() and __sock_map_delete() hold stab->lock and call
>>sock_map_unref() -> sock_map_del_link() under it. sock_map_del_link() takes
>>sk_callback_lock for write to stop the strparser and verdict, giving the
>>lock order stab->lock -> sk_callback_lock.
>>
>>The opposite order comes from an SK_SKB stream parser. On RX,
>>sk_psock_strp_data_ready() holds sk_callback_lock for read while running
>>the parser. The verdict redirects the skb to egress, where a sched_cls
>
>
>The commit message is wrong. A verdict does not redirect to egress
>synchronously — sk_psock_skb_redirect() only queues the skb and
>schedule_delayed_work()s sk_psock_backlog, so egress runs in workqueue
>context, not under sk_callback_lock.
>

Thanks, you're right. It's the inline ACK, not the redirect. Sorry for
the misleading changelog, I'll fix it in v2.

>
>>program calls bpf_map_delete_elem() on a sockmap, which takes stab->lock:
>>
>>   WARNING: possible circular locking dependency detected
>>   7.1.0-rc6 Not tainted
>>   ------------------------------------------------------
>>   syz.9.8824 is trying to acquire lock:
>>   (&stab->lock){+.-.}-{3:3}, at: __sock_map_delete net/core/sock_map.c:421
>>   but task is already holding lock:
>>   (clock-AF_INET){++.-}-{3:3}, at: sk_psock_strp_data_ready net/core/skmsg.c:1173
>>
>>   -> #1 (clock-AF_INET){++.-}-{3:3}:
>>          _raw_write_lock_bh
>>          sock_map_del_link net/core/sock_map.c:167
>>          sock_map_unref net/core/sock_map.c:184
>>          sock_map_update_common net/core/sock_map.c:509
>>          sock_map_update_elem_sys net/core/sock_map.c:588
>>          map_update_elem kernel/bpf/syscall.c:1805
>>
>>   -> #0 (&stab->lock){+.-.}-{3:3}:
>>          _raw_spin_lock_bh
>>          __sock_map_delete net/core/sock_map.c:421
>>          sock_map_delete_elem net/core/sock_map.c:452
>>          bpf_prog_06044d24140080b6
>>          tcx_run net/core/dev.c:4451
>>          sch_handle_egress net/core/dev.c:4541
>>          __dev_queue_xmit net/core/dev.c:4808
>>          ...
>>          tcp_bpf_strp_read_sock net/ipv4/tcp_bpf.c:701
>
>
>I guess it is an ACK. What is the actual purpose of a sched_cls program
>calling sockmap delete on the TX path of an ACK? If there is no real use
>case for it, this is just broken BPF usage, not a kernel bug worth this
>change.
>
>

I don't have a real use case for that exact program. But the verifier
allows sockmap delete from tc, and it deadlocks when the strparser's
socket is concurrently removed from the same map. The fix only moves
sock_map_unref() out from under stab->lock.

Best,
Sechang
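
A minimal userspace sketch of the idea described in the reply above:
detach the entry while holding the map lock, and drop the reference only
after the map lock is released, so the map lock and the callback lock
never nest. This is a single-threaded model under stated assumptions, not
the actual patch: the locks only record whether they are held, and every
name in it (toy_lock, toy_sock, toy_map, toy_unref, toy_map_delete) is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 4

/* Toy lock that only records whether it is held (single-threaded model). */
struct toy_lock {
	bool held;
};

static void toy_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Hypothetical stand-in for a socket; callback_lock models sk_callback_lock. */
struct toy_sock {
	struct toy_lock callback_lock;
	int links;			/* map slots that still reference us */
};

/* Hypothetical stand-in for the sockmap; lock models stab->lock. */
struct toy_map {
	struct toy_lock lock;
	struct toy_sock *slots[TOY_SLOTS];
};

/* Takes the callback lock, so it asserts that the map lock is not held. */
static void toy_unref(struct toy_map *map, struct toy_sock *sk)
{
	assert(!map->lock.held);
	toy_acquire(&sk->callback_lock);
	sk->links--;
	toy_release(&sk->callback_lock);
}

/*
 * Clear the slot under map->lock, then unref after unlocking.
 * Returns 0 on success, -1 if the index is out of range or the slot is empty.
 */
static int toy_map_delete(struct toy_map *map, size_t idx)
{
	struct toy_sock *sk;

	if (idx >= TOY_SLOTS)
		return -1;

	toy_acquire(&map->lock);
	sk = map->slots[idx];
	map->slots[idx] = NULL;
	toy_release(&map->lock);

	if (!sk)
		return -1;

	toy_unref(map, sk);
	return 0;
}
```

If toy_unref() were called before toy_release(&map->lock), the assertion
in toy_unref() would fire, which is the toy equivalent of the
stab->lock -> sk_callback_lock ordering that lockdep reports.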


* [net PATCH v2] octeontx2-pf: mcs: Fix mcs resources free on PF shutdown
From: Subbaraya Sundeep @ 2026-06-16 19:00 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, sgoutham, gakula,
	bbhushan2, rkannoth
  Cc: netdev, linux-kernel, Subbaraya Sundeep
In-Reply-To: <1781636420-19816-1-git-send-email-sbhatta@marvell.com>

From: Geetha sowjanya <gakula@marvell.com>

On PF shutdown, the driver currently frees MCS hardware resources
even when none are allocated to it. Check the MCS resource status
and send the mailbox message to free them only if resources are
allocated.

Fixes: c54ffc73601c ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
---
v2 changes:
 Addressed AI review feedback so that pfvf->macsec_cfg is freed correctly

 .../net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c    | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c
index 2cc1bdfd9b2e..4d3a7f4be962 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c
@@ -1776,11 +1776,16 @@ int cn10k_mcs_init(struct otx2_nic *pfvf)
 
 void cn10k_mcs_free(struct otx2_nic *pfvf)
 {
+	struct cn10k_mcs_cfg *cfg = pfvf->macsec_cfg;
+
 	if (!test_bit(CN10K_HW_MACSEC, &pfvf->hw.cap_flag))
 		return;
 
-	cn10k_mcs_free_rsrc(pfvf, MCS_TX, MCS_RSRC_TYPE_SECY, 0, true);
-	cn10k_mcs_free_rsrc(pfvf, MCS_RX, MCS_RSRC_TYPE_SECY, 0, true);
+	if (!list_empty(&cfg->txsc_list)) {
+		cn10k_mcs_free_rsrc(pfvf, MCS_TX, MCS_RSRC_TYPE_SECY, 0, true);
+		cn10k_mcs_free_rsrc(pfvf, MCS_RX, MCS_RSRC_TYPE_SECY, 0, true);
+	}
+
 	kfree(pfvf->macsec_cfg);
 	pfvf->macsec_cfg = NULL;
 }
-- 
2.48.1



* [net PATCH v2] octeontx2-af: mcs: Fix unsupported secy stats read
From: Subbaraya Sundeep @ 2026-06-16 19:00 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, sgoutham, gakula,
	bbhushan2, rkannoth
  Cc: netdev, linux-kernel, Subbaraya Sundeep

From: Geetha sowjanya <gakula@marvell.com>

The secy control stats counter doesn't exist on the CNF10KB platform.
Skip reading the corresponding register on CNF10KB silicon when
fetching secy stats.

Fixes: 9312150af8da ("octeontx2-af: cn10k: mcs: Support for stats collection")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
---
v2 changes:
 Addressed AI review feedback by modifying debugfs so that it also
 does not access the secy control stats counter

 drivers/net/ethernet/marvell/octeontx2/af/mcs.c         | 6 +++---
 drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c | 3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mcs.c b/drivers/net/ethernet/marvell/octeontx2/af/mcs.c
index c1775bd01c2b..a07e0b3d8d00 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mcs.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mcs.c
@@ -120,13 +120,13 @@ void mcs_get_rx_secy_stats(struct mcs *mcs, struct mcs_secy_stats *stats, int id
 	reg = MCSX_CSE_RX_MEM_SLAVE_INPKTSSECYUNTAGGEDX(id);
 	stats->pkt_untaged_cnt = mcs_reg_read(mcs, reg);
 
-	reg = MCSX_CSE_RX_MEM_SLAVE_INPKTSSECYCTLX(id);
-	stats->pkt_ctl_cnt = mcs_reg_read(mcs, reg);
-
 	if (mcs->hw->mcs_blks > 1) {
 		reg = MCSX_CSE_RX_MEM_SLAVE_INPKTSSECYNOTAGX(id);
 		stats->pkt_notag_cnt = mcs_reg_read(mcs, reg);
+		return;
 	}
+	reg = MCSX_CSE_RX_MEM_SLAVE_INPKTSSECYCTLX(id);
+	stats->pkt_ctl_cnt = mcs_reg_read(mcs, reg);
 }
 
 void mcs_get_flowid_stats(struct mcs *mcs, struct mcs_flowid_stats *stats,
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c
index fa461489acdd..ca2704b188a5 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c
@@ -482,10 +482,11 @@ static int rvu_dbg_mcs_rx_secy_stats_display(struct seq_file *filp, void *unused
 		seq_printf(filp, "secy%d: Tagged ctrl pkts: %lld\n", secy_id,
 			   stats.pkt_tagged_ctl_cnt);
 		seq_printf(filp, "secy%d: Untaged pkts: %lld\n", secy_id, stats.pkt_untaged_cnt);
-		seq_printf(filp, "secy%d: Ctrl pkts: %lld\n", secy_id, stats.pkt_ctl_cnt);
 		if (mcs->hw->mcs_blks > 1)
 			seq_printf(filp, "secy%d: pkts notag: %lld\n", secy_id,
 				   stats.pkt_notag_cnt);
+		else
+			seq_printf(filp, "secy%d: Ctrl pkts: %lld\n", secy_id, stats.pkt_ctl_cnt);
 	}
 	mutex_unlock(&mcs->stats_lock);
 	return 0;
-- 
2.48.1



* [PATCH] octeontx2-pf: Clear stats of all resources when freeing resources
From: Subbaraya Sundeep @ 2026-06-16 19:00 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, sgoutham, gakula,
	bbhushan2, rkannoth
  Cc: netdev, linux-kernel, Subbaraya Sundeep
In-Reply-To: <1781636420-19816-1-git-send-email-sbhatta@marvell.com>

When all MCS resources mapped to a PF are being freed, clear the
stats of all those resources too.

Fixes: 815debbbf7b5 ("octeontx2-pf: mcs: Clear stats before freeing resource")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
---
 drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c
index 4d3a7f4be962..9524d38f1582 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c
@@ -182,6 +182,7 @@ static void cn10k_mcs_free_rsrc(struct otx2_nic *pfvf, enum mcs_direction dir,
 	clear_req->id = hw_rsrc_id;
 	clear_req->type = type;
 	clear_req->dir = dir;
+	clear_req->all = all;
 
 	req = otx2_mbox_alloc_msg_mcs_free_resources(mbox);
 	if (!req)
-- 
2.48.1



* [PATCH 6.1] net: gro: don't merge zcopy skbs
From: Alexander Martyniuk @ 2026-06-16 22:00 UTC (permalink / raw)
  To: stable, Greg Kroah-Hartman
  Cc: Alexander Martyniuk, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Sasha Levin, Sabrina Dubroca,
	Hyunwoo Kim, Pavel Begunkov, netdev, linux-kernel, lvc-project,
	Huzaifa Sidhpurwala, Willem de Bruijn

From: Sabrina Dubroca <sd@queasysnail.net>

commit 4db79a322db8c97f7b73b8a347395ef4d685eb40 upstream.

skb_gro_receive() can currently copy frags between the source and GRO
skb, without checking the zerocopy status, and in particular the
SKBFL_MANAGED_FRAG_REFS flag.

When SKBFL_MANAGED_FRAG_REFS is set, the skb doesn't hold a reference
on the pages in shinfo->frags. Appending those frags to another skb's
frags without fixing up the page refcount can lead to UAF.

When either the last skb in the GRO chain (the one we would append
frags to) or the source skb is zerocopy, don't merge the skbs.

Fixes: 753f1ca4e1e5 ("net: introduce managed frags infrastructure")
Reported-by: Huzaifa Sidhpurwala <huzaifas@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/c3b7f906bbfcbdfd7b4fa9d6c18a438870df85be.1779307748.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexander Martyniuk <alexevgmart@gmail.com>
---
Backport fix for CVE-2026-46323
 net/core/gro.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/core/gro.c b/net/core/gro.c
index ea6571c01faa..c5a9733d929a 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -171,6 +171,9 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
 	if (p->pp_recycle != skb->pp_recycle)
 		return -ETOOMANYREFS;
 
+	if (skb_zcopy(p) || skb_zcopy(skb))
+		return -ETOOMANYREFS;
+
 	/* pairs with WRITE_ONCE() in netif_set_gro_max_size() */
 	gro_max_size = READ_ONCE(p->dev->gro_max_size);
 
-- 
2.30.2



* [PATCH v1 net-next] ipv4: fib_rule: Move fib4_rules_exit() to ->exit().
From: Kuniyuki Iwashima @ 2026-06-16 19:13 UTC (permalink / raw)
  To: David Ahern, Ido Schimmel, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev,
	syzbot+965506b59a2de0b6905c

syzbot reported use-after-free of net->ipv4.rules_ops. [0]

It can be reproduced with these commands:

  while true; do
  	ip netns add ns1
  	ip -n ns1 link set dev lo up
  	ip -n ns1 address add 192.0.2.1/24 dev lo
  	ip -n ns1 link add name dummy1 up type dummy
  	ip -n ns1 address add 198.51.100.1/24 dev dummy1
  	ip -n ns1 rule add ipproto tcp sport 12345 table 12345
  	ip -n ns1 fou add port 5555 ipproto 47 local 192.0.2.1 peer 198.51.100.2 peer_port 54321
  	ip netns del ns1
  done

The cited commit moved fib4_rules_exit() earlier to ->exit_rtnl(),
but the kernel socket destroyed in ->exit() could eventually reach
__fib_lookup().

I left fib4_rules_exit() in ->exit_rtnl() because fib4_rule_delete()
calls fib_unmerge(), which requires RTNL.

However, when ->delete() is called, ->configure() has already been
called, thus fib_unmerge() in ->delete() has no effect.

Let's remove fib_unmerge() in fib4_rule_delete() and move
fib4_rules_exit() to ->exit().

Many thanks to Ido Schimmel for providing the nice repro very quickly.

Note that we can make fib_rules_ops.delete() return void once
net-next opens.

[0]:
BUG: KASAN: slab-use-after-free in fib_rules_lookup+0x15e/0xeb0 net/core/fib_rules.c:321
Read of size 8 at addr ffff88804ec4c680 by task kworker/u8:21/12641

CPU: 0 UID: 0 PID: 12641 Comm: kworker/u8:21 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/09/2026
Workqueue: netns cleanup_net
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 fib_rules_lookup+0x15e/0xeb0 net/core/fib_rules.c:321
 __fib_lookup+0x106/0x210 net/ipv4/fib_rules.c:96
 ip_route_output_key_hash_rcu+0x294/0x2720 net/ipv4/route.c:2811
 ip_route_output_key_hash+0x18d/0x2a0 net/ipv4/route.c:2702
 __ip_route_output_key include/net/route.h:169 [inline]
 ip_route_output_flow+0x2a/0x150 net/ipv4/route.c:2929
 ip4_datagram_release_cb+0x89d/0xbe0 net/ipv4/datagram.c:118
 release_sock+0x206/0x260 net/core/sock.c:3861
 inet_shutdown+0x2b1/0x390 net/ipv4/af_inet.c:950
 udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
 fou_release net/ipv4/fou_core.c:562 [inline]
 fou_exit_net+0x17d/0x1f0 net/ipv4/fou_core.c:1230
 ops_exit_list net/core/net_namespace.c:199 [inline]
 ops_undo_list+0x43d/0x8d0 net/core/net_namespace.c:252
 cleanup_net+0x572/0x810 net/core/net_namespace.c:702
 process_one_work kernel/workqueue.c:3314 [inline]
 process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397
 worker_thread+0xa47/0xfb0 kernel/workqueue.c:3478
 kthread+0x389/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>

Fixes: 759923cf03b0 ("ipv4: fib: Convert fib_net_exit_batch() to ->exit_rtnl().")
Reported-by: syzbot+965506b59a2de0b6905c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6a315824.b0403584.28d0ff.0000.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
 net/ipv4/fib_frontend.c | 10 ++++++----
 net/ipv4/fib_rules.c    | 11 ++---------
 2 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index c7d1f31650d7..42212970d735 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1612,10 +1612,6 @@ static void ip_fib_net_exit(struct net *net)
 			fib_free_table(tb);
 		}
 	}
-
-#ifdef CONFIG_IP_MULTIPLE_TABLES
-	fib4_rules_exit(net);
-#endif
 }
 
 static int __net_init fib_net_init(struct net *net)
@@ -1652,6 +1648,9 @@ static int __net_init fib_net_init(struct net *net)
 	ip_fib_net_exit(net);
 	rtnl_net_unlock(net);
 
+#ifdef CONFIG_IP_MULTIPLE_TABLES
+	fib4_rules_exit(net);
+#endif
 	kfree(net->ipv4.fib_table_hash);
 	fib4_notifier_exit(net);
 	goto out;
@@ -1671,6 +1670,9 @@ static void __net_exit fib_net_exit_rtnl(struct net *net,
 
 static void __net_exit fib_net_exit(struct net *net)
 {
+#ifdef CONFIG_IP_MULTIPLE_TABLES
+	fib4_rules_exit(net);
+#endif
 	kfree(net->ipv4.fib_table_hash);
 	fib4_notifier_exit(net);
 	fib4_semantics_exit(net);
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index 51f0193092f0..e068a5bace73 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -352,24 +352,17 @@ static int fib4_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
 static int fib4_rule_delete(struct fib_rule *rule)
 {
 	struct net *net = rule->fr_net;
-	int err;
-
-	/* split local/main if they are not already split */
-	err = fib_unmerge(net);
-	if (err)
-		goto errout;
 
 #ifdef CONFIG_IP_ROUTE_CLASSID
 	if (((struct fib4_rule *)rule)->tclassid)
 		atomic_dec(&net->ipv4.fib_num_tclassid_users);
 #endif
-	net->ipv4.fib_has_custom_rules = true;
 
 	if (net->ipv4.fib_rules_require_fldissect &&
 	    fib_rule_requires_fldissect(rule))
 		net->ipv4.fib_rules_require_fldissect--;
-errout:
-	return err;
+
+	return 0;
 }
 
 static int fib4_rule_compare(struct fib_rule *rule, struct fib_rule_hdr *frh,
-- 
2.54.0.1136.gdb2ca164c4-goog



