Netdev List
* Re: [mellanox/mlx5-next RFC 1/1] net/mlx5: RX, Fix refcount warning on frag page release
From: Dragos Tatulea @ 2026-06-27  7:48 UTC (permalink / raw)
  To: Nabil S. Alramli, saeedm, tariqt, mbloch
  Cc: nalramli, leon, andrew+netdev, davem, edumazet, kuba, pabeni,
	netdev, linux-rdma, linux-kernel
In-Reply-To: <aa190e99-2ebf-4d59-a6c9-755ca181e16d@nalramli.com>



On 26.06.26 20:02, Nabil S. Alramli wrote:
> On 6/26/26 09:12, Dragos Tatulea wrote:
>>
>>
>> [...]
>>> ```
>>> 	ret = atomic_long_sub_return(nr, pp_ref_count);
>>> 	WARN_ON(ret < 0);
>>> ```
>>>
>>> The actual stack trace looks like this:
>>>
>>> ```
>>> WARNING: CPU: 37 PID: 447795 at include/net/page_pool/helpers.h:277 mlx5e_page_release_fragmented.isra.0+0x51/0x60 [mlx5_core]
>>> Tainted: [S]=CPU_OUT_OF_SPEC, [O]=OOT_MODULE
>>> Hardware name: *
>>> RIP: 0010:mlx5e_page_release_fragmented.isra.0+0x51/0x60 [mlx5_core]
>>> RSP: 0018:ffffc90019814d98 EFLAGS: 00010293
>>> RAX: 000000000000003f RBX: ffff88c0993d0a10 RCX: ffffea02424592c0
>>> RDX: 0000000000000001 RSI: ffffea02424592c0 RDI: ffff88c090e20000
>>> RBP: 000000000000000a R08: 0000000000001409 R09: 0000000000000006
>>> R10: 0000000000000000 R11: ffff88c095fbc040 R12: 000000000000141f
>>> R13: 0000000000000009 R14: ffff88c090e20000 R15: 0000000000000001
>>> FS:  00007f34149fa6c0(0000) GS:ffff89200fa40000(0000) knlGS:0000000000000000
>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 00007ed0265eb000 CR3: 0000005091cbe000 CR4: 0000000000350ef0
>>> Call Trace:
>>>  <IRQ>
>>>  mlx5e_free_rx_wqes+0x7b/0xa0 [mlx5_core]
>>>  mlx5e_post_rx_wqes+0x1ac/0x5a0 [mlx5_core]
>>>  mlx5e_napi_poll+0x5e5/0x6f0 [mlx5_core]
>>>  __napi_poll+0x2b/0x1a0
>>>  net_rx_action+0x30e/0x370
>>>  ? sched_clock+0x9/0x10
>>>  ? sched_clock_cpu+0xf/0x170
>>>  handle_softirqs+0xe2/0x2a0
>>>  common_interrupt+0x85/0xa0
>>>  </IRQ>
>>>  <TASK>
>>>  asm_common_interrupt+0x26/0x40
>>> RIP: 0010:page_counter_uncharge+0x34/0x90
>>> RSP: 0018:ffffc900e728bb00 EFLAGS: 00000213
>>> RAX: ffff88aff4762000 RBX: ffff88aff4762100 RCX: 0000000000000304
>>> RDX: 0000000000000001 RSI: 00000000004e9e1a RDI: ffff88aff4762100
>>> RBP: 0000000000000001 R08: ffff891ea0560048 R09: 00007ffffffff000
>>> R10: 0000000000001000 R11: ffff891ae8061b00 R12: ffffffffffffffff
>>> R13: ffff89107fcfd4c0 R14: ffff891ae8061b00 R15: ffff892002fe1400
>>>  uncharge_batch+0x40/0xd0
>>> ```
>>>
>> Can you provide more data on how you reproduced this? This helps to
>> narrow down the bug. Reproduction steps would be ideal.
>>
> 
> I don't have clear steps to reproduce it, we just have seen it randomly on
> some servers that were under memory pressure. I will try to look into it more
> and find a way to reliably reproduce it. I agree that would be ideal to find a
> proper fix.
> 
What NIC is this?
What MTU is being used?
Is strided RQ enabled (ethtool --show-priv-flags)?
Is XDP/AF_XDP used? If yes, can you provide more details?
Is HW-GRO on?

Based on those answers we can review the code path and see if there
is a case where the accounting for the fragments is not done correctly.

Also, is buf_alloc_err growing during these periods of memory pressure?

>>> The fix is to use an atomic page fragment counter, so it will always match
>>> the number of references held in the page_pool.
>>>
>> This is not the right fix. The mlx5 page frag counter is not atomic
>> on purpose because all changes to it happen only within the NAPI
>> context.
>>
> 
> That was a question that I had, is it ever possible for frag_page->frags to be
> incremented / set outside of NAPI context? I tried to answer that by looking
> at code and by tracing it but could not get a clear picture. If it's not
> possible then I agree, this is not the right fix.
> 
If that happens it is probably a bug.

Thanks,
Dragos


* Re: [PATCH] fix: net/batman-adv: batadv_interface_kill_vid: extra batadv_meshif_vlan_put after destroy
From: Sven Eckelmann @ 2026-06-27  7:07 UTC (permalink / raw)
  To: WenTao Liang
  Cc: marek.lindner, sw, antonio, davem, edumazet, kuba, pabeni, horms,
	b.a.t.m.a.n, netdev, linux-kernel, stable
In-Reply-To: <178254092045.4739.1497464106445743950.b4-review@b4>


On Saturday, 27 June 2026 08:15:20 CEST Sven Eckelmann wrote:
> On Sat, 27 Jun 2026 11:46:36 +0800, WenTao Liang <vulab@iscas.ac.cn> wrote:
> 
> Hi,
> 
> not-acked

Just noticed that we already have another odd patch from you [1] (and you
never answered after my reply). Could it be that you are just trying to spread
AI/LLM(?)-generated patches to stable@vger.kernel.org, hoping that something
sticks?

I see a lot more patch bombs and complaints all over the place when searching
the whole of lore.kernel.org [2], even when only checking the last couple of
days.

If this is really the case - please don't do this. We already stress them (and
other maintainers) enough by dumping large amounts of legitimate patches on
them. Sending patches shotgun-style all over the place without any
recognizable QA or oversight might just cause an overload. And when you then
don't even take the time to react to the review of the patches or apply the
changes requested of you (and instead invent new things to annoy them)... At
least I will no longer spend an hour writing a reply to you but will directly
reject your patch.

Regards,
	Sven

[1] https://lore.kernel.org/batman/20250401083901.2261-1-vulab@iscas.ac.cn/
[2] https://lore.kernel.org/all/?q=vulab@iscas.ac.cn



* Re: [PATCH] fix: net/batman-adv: batadv_interface_kill_vid: extra batadv_meshif_vlan_put after destroy
From: Sven Eckelmann @ 2026-06-27  6:15 UTC (permalink / raw)
  To: WenTao Liang
  Cc: marek.lindner, sw, antonio, sven, davem, edumazet, kuba, pabeni,
	horms, b.a.t.m.a.n, netdev, linux-kernel, stable
In-Reply-To: <20260627034636.59693-1-vulab@iscas.ac.cn>

On Sat, 27 Jun 2026 11:46:36 +0800, WenTao Liang <vulab@iscas.ac.cn> wrote:

Hi,

not-acked

1. please don't send patches to netdev directly. See (from any recent
   batadv.git, netdev/net.git, netdev/net-next.git or torvalds/linux.git):

    $ ./scripts/get_maintainer.pl 20260627034636.59693-1-vulab@iscas.ac.cn.mbx 
    Marek Lindner <marek.lindner@mailbox.org> (maintainer:BATMAN ADVANCED,blamed_fixes:1/1=100%)
    Simon Wunderlich <sw@simonwunderlich.de> (maintainer:BATMAN ADVANCED)
    Antonio Quartulli <antonio@mandelbit.com> (maintainer:BATMAN ADVANCED,blamed_fixes:1/1=100%)
    Sven Eckelmann <sven@narfation.org> (maintainer:BATMAN ADVANCED)
    b.a.t.m.a.n@lists.open-mesh.org (moderated list:BATMAN ADVANCED)
    linux-kernel@vger.kernel.org (open list)

2. please add after the "PATCH" the tree which it should enter (in this case
   "[PATCH batadv]"). See:

    $ ./scripts/get_maintainer.pl --scm 20260627034636.59693-1-vulab@iscas.ac.cn.mbx | grep '^git'
    git https://git.open-mesh.org/batadv.git
    git git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

3. Please use a subject line which follows the kernel style. See
   https://docs.kernel.org/process/submitting-patches.html#the-canonical-patch-format

   - no "fix: "
   - "batman-adv: " instead of "net/batman-adv: "
   - most likely no "batadv_interface_kill_vid: "
   - an actual summary of your change (because right now it says it adds(?) an extra put)

> In batadv_interface_kill_vid(), batadv_meshif_vlan_get() acquires a
> reference on the vlan object. batadv_meshif_destroy_vlan() internally
> calls batadv_meshif_vlan_put() which balances that reference. However, an

No, this doesn't balance the reference. The reference put in this function is
for the reference acquired by this function. The batadv_meshif_destroy_vlan()
put is for the reference for its "from .ndo_vlan_rx_add_vid till 
.ndo_vlan_rx_kill_vid" lifetime.

You can see exactly the same approach also in batadv_meshif_destroy_netlink()
for its "untagged" vlan. A function which you didn't touch.
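
The two references described above can be illustrated with a minimal
user-space sketch. It is only a model: struct vlan_model and all helper names
are hypothetical stand-ins, not the batman-adv code, and the plain int counter
stands in for the kernel's kref.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the vlan object; not the batman-adv struct. */
struct vlan_model {
	int refcount;
	bool freed;
};

static void vlan_get(struct vlan_model *v)
{
	v->refcount++;
}

static void vlan_put(struct vlan_model *v)
{
	assert(v->refcount > 0);	/* an underflow would trip here */
	if (--v->refcount == 0)
		v->freed = true;
}

/* Models add_vid: the object is created holding its lifetime reference. */
static struct vlan_model vlan_create(void)
{
	struct vlan_model v = { .refcount = 1, .freed = false };

	return v;
}

/* Models destroy_vlan: drops the lifetime reference taken at creation. */
static void vlan_destroy(struct vlan_model *v)
{
	vlan_put(v);
}

/*
 * Models kill_vid. With keep_final_put == false it behaves like the
 * proposed patch and never drops the reference it took itself.
 */
static void kill_vid(struct vlan_model *v, bool keep_final_put)
{
	vlan_get(v);		/* lookup reference owned by this function */
	vlan_destroy(v);	/* lifetime reference */
	if (keep_final_put)
		vlan_put(v);	/* balances the vlan_get() above */
}
```

In this model, kill_vid(&v, true) ends with the count at 0 and the object
freed, while kill_vid(&v, false) leaves the count at 1, so the object is never
released.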

> additional batadv_meshif_vlan_put(vlan) is called after
> batadv_meshif_destroy_vlan(), causing a refcount underflow and potential
> use-after-free of the vlan object.

No, this doesn't cause an underflow in my setup. Please explain exactly how you
tested this and came to the conclusion that this would cause a use-after-free.
Because I can't reproduce this and the patch in this form is causing a memory
leak for me.

> 
> Remove the extra batadv_meshif_vlan_put(vlan) call.

No, this can't be the correct solution.

>
>
> diff --git a/net/batman-adv/mesh-interface.c b/net/batman-adv/mesh-interface.c
> index e5a55d24..7a1aeeca 100644
> --- a/net/batman-adv/mesh-interface.c
> +++ b/net/batman-adv/mesh-interface.c
> @@ -693,9 +693,6 @@ static int batadv_interface_kill_vid(struct net_device *dev, __be16 proto,
>  
>  	batadv_meshif_destroy_vlan(bat_priv, vlan);
>  
> -	/* finally free the vlan object */
> -	batadv_meshif_vlan_put(vlan);
> -

This looks wrong to me. Now it leaks the VLAN reference which was acquired at
the beginning of the function. When I add a kref_get-printk before and after
batadv_meshif_destroy_vlan() and before and after the put in
batadv_tt_local_entry_release(), I get:

    refcnt before batadv_meshif_destroy_vlan: 3
    refcnt after batadv_meshif_destroy_vlan: 2
    refcnt before batadv_tt_local_entry_release: 2
    refcnt after batadv_tt_local_entry_release: 1

As you can see, the VLAN refcount now never reaches 0 and thus the VLAN isn't
freed. You can also directly see the memory leak (which didn't happen before):

    root@node01:~# ip l del dev bat0.10
    [   18.127153][  T368] refcnt before batadv_meshif_destroy_vlan: 3
    [   18.128792][  T368] refcnt after batadv_meshif_destroy_vlan: 2
    [   18.649318][   T12] refcnt before batadv_tt_local_entry_release: 2
    [   18.650220][   T12] refcnt after batadv_tt_local_entry_release: 1
    root@node01:~# rmmod batman-adv
    [   27.033891][  T374] batman_adv: bat0: Interface deactivated: dummy0
    [   27.034522][  T374] batman_adv: bat0: Removing interface: dummy0
    [   27.038340][  T374] batman_adv: bat0: Interface deactivated: enp0s1
    [   27.038973][  T374] batman_adv: bat0: Removing interface: enp0s1
    [   27.044439][  T374] br0: port 1(bat0) entered disabled state
    [   27.049110][  T374] bat0 (unregistering): left allmulticast mode
    [   27.049486][  T374] bat0 (unregistering): left promiscuous mode
    [   27.049804][  T374] br0: port 1(bat0) entered disabled state
    [   27.096326][  T374] refcnt before batadv_tt_local_entry_release: 1
    [   27.096851][  T374] refcnt after batadv_tt_local_entry_release: 0
    root@node01:~# modprobe batman-adv 
    root@node01:~# echo scan > /sys/kernel/debug/kmemleak
    root@node01:~# echo scan > /sys/kernel/debug/kmemleak
    [   41.460324][  T361] kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
    root@node01:~# cat /sys/kernel/debug/kmemleak
    unreferenced object 0xffff88800ab1bd00 (size 64):
      comm "ip", pid 300, jiffies 4294893634
      hex dump (first 32 bytes):
        c0 cb c7 13 80 88 ff ff 0a 80 00 00 00 00 00 00  ................
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      backtrace (crc 552e6e51):
        kmemleak_alloc+0x55/0xa0
        __kmalloc_cache_noprof+0x2f4/0x540
        batadv_meshif_create_vlan+0x7c/0x450 [batman_adv]
        batadv_interface_add_vid+0xb6/0xd0 [batman_adv]
        vlan_add_rx_filter_info+0xee/0x160
        vlan_vid_add+0x2f6/0x910
        register_vlan_dev+0xc5/0x6f0
        vlan_newlink+0x40e/0x6f0
        rtnl_newlink_create+0x2e1/0x770
        __rtnl_newlink+0x20b/0x9d0
        rtnl_newlink+0x7f7/0xf90
        rtnetlink_rcv_msg+0x811/0xbf0
        netlink_rcv_skb+0x148/0x3f0
        rtnetlink_rcv+0x19/0x20
        netlink_unicast+0x5fc/0xa50
        netlink_sendmsg+0x82b/0xd70

Because of the errors this patch introduces and the form of the patch: it will
not be applied in batadv.git

We can discuss an actual fix when you can explain to us how this problem can
actually be reproduced.

-- 
Sven Eckelmann <sven@narfation.org>


* [PATCH net] octeontx2-pf: fix SQ resource leaks on init failure
From: Dawei Feng @ 2026-06-27  6:03 UTC (permalink / raw)
  To: sgoutham
  Cc: gakula, sbhatta, hkelam, bbhushan2, andrew+netdev, davem,
	edumazet, kuba, pabeni, jbrandeb, richardcochran, amakarov,
	netdev, linux-kernel, stable, jianhao.xu, zilin, Dawei Feng

otx2_init_hw_resources() initializes SQ aura and pool resources
before several later setup steps. On failure, err_free_sq_ptrs only
frees SQB pages, leaving the per-SQ sqb_ptrs arrays behind. If
otx2_config_nix_queues() has initialized some SQs before failing, their
qmem-backed resources can be left behind too.

Use otx2_free_sq_res() for the SQ unwind path and let it free sqb_ptrs
even when sq->sqe has not been allocated yet. Also free the PTP
timestamp qmem from the same helper.
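
As a rough illustration of the unwind pattern described above, the following
user-space sketch uses a single cleanup helper that releases the early
allocation even when the late one was never made. All names (sq_model,
model_alloc, live_allocs, the fail_after parameter) are hypothetical and only
loosely mirror the driver; malloc() stands in for the qmem and SQB
allocations.

```c
#include <stdlib.h>

#define NUM_SQ 4

/* Hypothetical model of a send queue; field names only mirror the patch. */
struct sq_model {
	void *sqe;	/* allocated late, per configured queue */
	void *sqb_ptrs;	/* allocated early, for every queue */
};

static int live_allocs;	/* outstanding allocations in this model */

static void *model_alloc(void)
{
	void *p = malloc(16);

	if (p)
		live_allocs++;
	return p;
}

static void model_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Frees whatever each queue holds, however far initialization got. */
static void free_sq_res(struct sq_model *sq, int n)
{
	for (int i = 0; i < n; i++) {
		if (sq[i].sqe) {
			model_free(sq[i].sqe);
			sq[i].sqe = NULL;
		}
		model_free(sq[i].sqb_ptrs);	/* not gated on sqe */
		sq[i].sqb_ptrs = NULL;
	}
}

/* Stops before configuring queue fail_after; returns 0 or -1. */
static int init_hw_resources(struct sq_model *sq, int fail_after)
{
	for (int i = 0; i < NUM_SQ; i++) {
		sq[i].sqe = NULL;
		sq[i].sqb_ptrs = NULL;
	}
	for (int i = 0; i < NUM_SQ; i++) {
		sq[i].sqb_ptrs = model_alloc();
		if (!sq[i].sqb_ptrs)
			goto err_free_sq_res;
	}
	for (int i = 0; i < NUM_SQ; i++) {
		if (i == fail_after)
			goto err_free_sq_res;
		sq[i].sqe = model_alloc();
		if (!sq[i].sqe)
			goto err_free_sq_res;
	}
	return 0;

err_free_sq_res:
	free_sq_res(sq, NUM_SQ);
	return -1;
}
```

With fail_after set to 2, two queues are fully set up and two hold only their
sqb_ptrs; the single error label still brings live_allocs back to 0.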

The bug was first flagged by an experimental analysis tool we are
developing for kernel memory-management bugs while analyzing
v6.13-rc1. The tool is still under development and is not yet publicly
available. Manual inspection confirms that the bug is still
present in v7.1.1.

An x86_64 allyesconfig build showed no new warnings. As we do not have an
OcteonTX2 PF device and the corresponding AF mailbox setup to test with,
no runtime testing could be performed.

Fixes: caa2da34fd25 ("octeontx2-pf: Initialize and config queues")
Fixes: c9c12d339d93 ("octeontx2-pf: Add support for PTP clock")
Cc: stable@vger.kernel.org
Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>
---
 .../ethernet/marvell/octeontx2/nic/otx2_pf.c  | 20 +++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
index 41a0ebdf201e..88ac85354445 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
@@ -1568,14 +1568,15 @@ static void otx2_free_sq_res(struct otx2_nic *pf)
 	otx2_sq_free_sqbs(pf);
 	for (qidx = 0; qidx < otx2_get_total_tx_queues(pf); qidx++) {
 		sq = &qset->sq[qidx];
-		/* Skip freeing Qos queues if they are not initialized */
-		if (!sq->sqe)
-			continue;
-		qmem_free(pf->dev, sq->sqe);
-		qmem_free(pf->dev, sq->sqe_ring);
-		qmem_free(pf->dev, sq->cpt_resp);
-		qmem_free(pf->dev, sq->tso_hdrs);
-		kfree(sq->sg);
+		/* sq->sqe is not initialized for unused QoS queues */
+		if (sq->sqe) {
+			qmem_free(pf->dev, sq->sqe);
+			qmem_free(pf->dev, sq->sqe_ring);
+			qmem_free(pf->dev, sq->cpt_resp);
+			qmem_free(pf->dev, sq->tso_hdrs);
+			qmem_free(pf->dev, sq->timestamps);
+			kfree(sq->sg);
+		}
 		kfree(sq->sqb_ptrs);
 	}
 }
@@ -1710,13 +1711,12 @@ int otx2_init_hw_resources(struct otx2_nic *pf)
 	return err;
 
 err_free_nix_queues:
-	otx2_free_sq_res(pf);
 	otx2_free_cq_res(pf);
 	otx2_ctx_disable(mbox, NIX_AQ_CTYPE_RQ, false);
 err_free_txsch:
 	otx2_txschq_stop(pf);
 err_free_sq_ptrs:
-	otx2_sq_free_sqbs(pf);
+	otx2_free_sq_res(pf);
 err_free_rq_ptrs:
 	otx2_free_aura_ptr(pf, AURA_NIX_RQ);
 	otx2_ctx_disable(mbox, NPA_AQ_CTYPE_POOL, true);
-- 
2.34.1



* Re: [PATCH net-next] Documentation: networking: Add a test plan for ethtool pause validation
From: Maxime Chevallier @ 2026-06-27  5:34 UTC (permalink / raw)
  To: Jakub Kicinski, Andrew Lunn
  Cc: davem, Eric Dumazet, Paolo Abeni, Simon Horman, Russell King,
	Heiner Kallweit, Jonathan Corbet, Shuah Khan, Oleksij Rempel,
	Vladimir Oltean, Florian Fainelli, thomas.petazzoni, netdev,
	linux-kernel, linux-doc
In-Reply-To: <20260626173352.7dc8f106@kernel.org>

Hi Jakub,

On 6/27/26 02:33, Jakub Kicinski wrote:
> On Fri, 26 Jun 2026 14:39:57 +0200 Andrew Lunn wrote:
>> On Fri, Jun 26, 2026 at 10:33:50AM +0200, Maxime Chevallier wrote:
>>>   
>>>> Sphinx follows Python's object-oriented structure. So you could have a
>>>> class test_ethtool_pause_advertising, with class documentation. And
>>>> then methods within the class which are individual tests.  The
>>>> commented out section would then be method documentation.  
>>>
>>> Good point, so maybe something along these lines :
>>>
>>>  - A class for the test
>>>  - methods for individual tests
>>>  - For readability, I've written what the internal test helper would look
>>>    like (_adv_test), and how a test would look like without the helper in
>>>    adv_rx_on_tx_on().
>>>
>>> I'm already diving into coding, but it helps me a bit in the definition of the
>>> "description" format :)
>>>
>>> this is what the class would look like :  
>>
>> I like this :-)
> 
> This is very far from what existing python tests do in netdev.

We can probably drop the class; as it stands in this discussion, it's merely a
way to group documentation common to similar tests. The rest really is the
usual set of ksft funcs you can feed to the run function, with a set of
ksft_ethtool_* annotators for generic checks.

> 
> I would prefer to stick to the "bash on steroids" use of Python.
Maxime


* Re: [PATCH] tomoyo: Enforce connect policy in TCP Fast Open
From: Tetsuo Handa @ 2026-06-27  5:28 UTC (permalink / raw)
  To: Matthieu Buffet
  Cc: Bryam Vargas, Mickaël Salaün, Günther Noack,
	linux-security-module, Mikhail Ivanov, Paul Moore, Yuchung Cheng,
	Eric Dumazet, netdev, Kentaro Takeda
In-Reply-To: <20260619002207.61104-1-matthieu@buffet.re>

On 2026/06/19 9:22, Matthieu Buffet wrote:
> Tomoyo restricted TCP connections in 2011 in commit
> 059d84dbb389 ("TOMOYO: Add socket operation restriction support.")
> using the socket_connect() LSM hook.
> 
> However, the MSG_FASTOPEN sendmsg() flag was added in 2012 to allow
> combining connect() and the first sendmsg(). Tomoyo was not updated to
> take this into account in its send hook.
> 
> This resulted in a TCP connect policy bypass similar to that reported in
> Landlock in 2024 (see Link below), with the difference that Tomoyo was
> fine when originally merged, and the problem got introduced when adding
> fastopen support, possibly due to lack of synchronization between lsm
> and netdev worlds.
> 
> Add MSG_FASTOPEN handling in Tomoyo's existing send hook.
> 
> Link: https://github.com/landlock-lsm/linux/issues/41
> Link: https://lore.kernel.org/all/20260616201615.275032-1-hexlabsecurity@proton.me/
> Fixes: cf60af03ca4e ("net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
> Cc: stable@kernel.org
> Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
> ---
>  security/tomoyo/network.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 

Thank you for finding this problem and making a patch.
I updated your patch as shown below in order to exclude kernel threads from this check.
If we are OK with modifying individual LSMs, I'll apply this change.

 security/tomoyo/network.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/security/tomoyo/network.c b/security/tomoyo/network.c
index cfc2a019de1e..50d27c773b10 100644
--- a/security/tomoyo/network.c
+++ b/security/tomoyo/network.c
@@ -765,6 +765,15 @@ int tomoyo_socket_sendmsg_permission(struct socket *sock, struct msghdr *msg,
 	const u8 family = tomoyo_sock_family(sock->sk);
 	const unsigned int type = sock->type;
 
+	if ((msg->msg_flags & MSG_FASTOPEN) && msg->msg_name && type == SOCK_STREAM &&
+	    (family == PF_INET || family == PF_INET6) &&
+	    (sock->sk->sk_protocol == IPPROTO_TCP || sock->sk->sk_protocol == IPPROTO_MPTCP)) {
+		address.protocol = SOCK_STREAM;
+		address.operation = TOMOYO_NETWORK_CONNECT;
+		return tomoyo_check_inet_address((struct sockaddr *)msg->msg_name,
+						 msg->msg_namelen, 0, &address);
+	}
+
 	if (!msg->msg_name || !family ||
 	    (type != SOCK_DGRAM && type != SOCK_RAW))
 		return 0;
-- 
2.54.0
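
For readers who want to experiment with the condition outside the kernel, here
is a minimal user-space sketch of the same test as a standalone predicate. The
function name sendmsg_is_fastopen_connect() is hypothetical and does not exist
in Tomoyo; the fallback defines assume the values from the Linux UAPI headers
and are only used when libc does not provide them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef MSG_FASTOPEN
#define MSG_FASTOPEN 0x20000000	/* value from the Linux UAPI headers */
#endif
#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262	/* value from the Linux UAPI headers */
#endif

/* Hypothetical helper: true when a sendmsg() call also acts as a connect(). */
static bool sendmsg_is_fastopen_connect(int msg_flags, const void *msg_name,
					int type, int family, int protocol)
{
	/* Without the flag or a destination there is no implicit connect. */
	if (!(msg_flags & MSG_FASTOPEN) || !msg_name)
		return false;
	if (type != SOCK_STREAM)
		return false;
	if (family != AF_INET && family != AF_INET6)
		return false;
	return protocol == IPPROTO_TCP || protocol == IPPROTO_MPTCP;
}
```

A call carrying MSG_FASTOPEN and a destination address on a TCP or MPTCP
stream socket is classified as a connect; the same call without the flag or
without an address is not.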




* [PATCH net-next v2] r8169: migrate Rx path to page_pool
From: atharva-potdar @ 2026-06-27  3:52 UTC (permalink / raw)
  To: Heiner Kallweit, nic_swsd, Andrew Lunn, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Francois Romieu, netdev, atharvapotdar07

Migrate the Rx path to use the page_pool API, replacing the legacy
alloc_pages() + skb_copy() model with napi_build_skb() for zero-copy
delivery. This prepares the driver for future XDP support.

To prevent MTU regressions and DMA overflows on older MACs
(CVE-2009-1389), the pool allocates higher-order pages using
get_order(SZ_16K), matching the legacy driver behavior.

DMA mapping and cache syncing are delegated to the page_pool core via
PP_FLAG_DMA_MAP and PP_FLAG_DMA_SYNC_DEV to ensure safe operation across
all architectures.
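
The zero-copy idea can be sketched in user space as a buffer flip: the filled
buffer leaves the ring and a fresh one takes its place, with a copy only when
no replacement is available. This is a simplified model under stated
assumptions, not driver code: rx_slot, rx_packet and rx_one() are hypothetical
names, and malloc() stands in for the page_pool allocator.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define RX_BUF_SIZE 64

/* Hypothetical model types; none of these exist in the r8169 driver. */
struct rx_slot {
	unsigned char *buf;	/* buffer the device currently writes into */
};

struct rx_packet {
	unsigned char *data;	/* owned by the receiver of the packet */
	size_t len;
	bool zero_copy;
};

/*
 * Hands one received frame up the stack. "replacement" is a fresh buffer
 * for the ring, or NULL to model an allocation failure.
 * Returns 0 on success, -1 if the frame had to be dropped.
 */
static int rx_one(struct rx_slot *slot, size_t len,
		  unsigned char *replacement, struct rx_packet *pkt)
{
	if (len > RX_BUF_SIZE)
		return -1;

	if (replacement) {
		/* Flip: the filled buffer leaves the ring without a copy. */
		pkt->data = slot->buf;
		pkt->zero_copy = true;
		slot->buf = replacement;
	} else {
		/* Fallback: copy out and keep the old buffer in the ring. */
		pkt->data = malloc(len ? len : 1);
		if (!pkt->data)
			return -1;
		memcpy(pkt->data, slot->buf, len);
		pkt->zero_copy = false;
	}
	pkt->len = len;
	return 0;
}
```

In both branches the ring slot ends up with a buffer it owns exclusively, so
the device never writes into memory that was already handed to the stack.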

Signed-off-by: atharva-potdar <atharvapotdar07@gmail.com>
---
v2:
 - Reverted buffer size to SZ_16K and utilized get_order(SZ_16K) to 
   prevent MTU regression and mitigate CVE-2009-1389.
 - Use napi_build_skb() instead of skb_add_rx_frag() to keep ethernet
   headers in the linear data area.

 drivers/net/ethernet/realtek/r8169_main.c | 77 ++++++++++++++++-------
 1 file changed, 54 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index ec4fc21fa..a9bedf93b 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -31,6 +31,7 @@
 #include <linux/unaligned.h>
 #include <net/ip6_checksum.h>
 #include <net/netdev_queues.h>
+#include <net/page_pool/helpers.h>
 #include <net/phy/realtek_phy.h>
 
 #include "r8169.h"
@@ -729,6 +730,7 @@ enum rtl_dash_type {
 };
 
 struct rtl8169_private {
+	struct page_pool *rx_pool;
 	void __iomem *mmio_addr;	/* memory map physical address */
 	struct pci_dev *pci_dev;
 	struct net_device *dev;
@@ -4161,21 +4163,14 @@ static void rtl8169_mark_to_asic(struct RxDesc *desc)
 static struct page *rtl8169_alloc_rx_data(struct rtl8169_private *tp,
 					  struct RxDesc *desc)
 {
-	struct device *d = tp_to_dev(tp);
-	int node = dev_to_node(d);
 	dma_addr_t mapping;
 	struct page *data;
 
-	data = alloc_pages_node(node, GFP_KERNEL, get_order(R8169_RX_BUF_SIZE));
+	data = page_pool_dev_alloc_pages(tp->rx_pool);
 	if (!data)
 		return NULL;
 
-	mapping = dma_map_page(d, data, 0, R8169_RX_BUF_SIZE, DMA_FROM_DEVICE);
-	if (unlikely(dma_mapping_error(d, mapping))) {
-		netdev_err(tp->dev, "Failed to map RX DMA!\n");
-		__free_pages(data, get_order(R8169_RX_BUF_SIZE));
-		return NULL;
-	}
+	mapping = page_pool_get_dma_addr(data);
 
 	desc->addr = cpu_to_le64(mapping);
 	rtl8169_mark_to_asic(desc);
@@ -4188,14 +4183,16 @@ static void rtl8169_rx_clear(struct rtl8169_private *tp)
 	int i;
 
 	for (i = 0; i < NUM_RX_DESC && tp->Rx_databuff[i]; i++) {
-		dma_unmap_page(tp_to_dev(tp),
-			       le64_to_cpu(tp->RxDescArray[i].addr),
-			       R8169_RX_BUF_SIZE, DMA_FROM_DEVICE);
-		__free_pages(tp->Rx_databuff[i], get_order(R8169_RX_BUF_SIZE));
+		page_pool_put_full_page(tp->rx_pool, tp->Rx_databuff[i], false);
 		tp->Rx_databuff[i] = NULL;
 		tp->RxDescArray[i].addr = 0;
 		tp->RxDescArray[i].opts1 = 0;
 	}
+
+	if (tp->rx_pool) {
+		page_pool_destroy(tp->rx_pool);
+		tp->rx_pool = NULL;
+	}
 }
 
 static int rtl8169_rx_fill(struct rtl8169_private *tp)
@@ -4221,8 +4218,26 @@ static int rtl8169_rx_fill(struct rtl8169_private *tp)
 
 static int rtl8169_init_ring(struct rtl8169_private *tp)
 {
+	struct page_pool_params params = {0};
+
 	rtl8169_init_ring_indexes(tp);
 
+	params.order = get_order(SZ_16K);
+	params.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
+	params.pool_size = NUM_RX_DESC;
+	params.dev = tp_to_dev(tp);
+	params.nid = dev_to_node(tp_to_dev(tp));
+	params.dma_dir = DMA_FROM_DEVICE;
+	params.offset = 0;
+	params.max_len = SZ_16K;
+	tp->rx_pool = page_pool_create(&params);
+	if (IS_ERR(tp->rx_pool)) {
+		int err = PTR_ERR(tp->rx_pool);
+
+		tp->rx_pool = NULL;
+		return err;
+	}
+
 	memset(tp->tx_skb, 0, sizeof(tp->tx_skb));
 	memset(tp->Rx_databuff, 0, sizeof(tp->Rx_databuff));
 
@@ -4777,6 +4792,7 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, int budget
 		unsigned int pkt_size, entry = tp->cur_rx % NUM_RX_DESC;
 		struct RxDesc *desc = tp->RxDescArray + entry;
 		struct sk_buff *skb;
+		struct page *new_page;
 		const void *rx_buf;
 		dma_addr_t addr;
 		u32 status;
@@ -4820,21 +4836,36 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, int budget
 			goto release_descriptor;
 		}
 
-		skb = napi_alloc_skb(&tp->napi, pkt_size);
-		if (unlikely(!skb)) {
-			dev->stats.rx_dropped++;
-			goto release_descriptor;
-		}
-
 		addr = le64_to_cpu(desc->addr);
 		rx_buf = page_address(tp->Rx_databuff[entry]);
 
 		dma_sync_single_for_cpu(d, addr, pkt_size, DMA_FROM_DEVICE);
 		prefetch(rx_buf);
-		skb_copy_to_linear_data(skb, rx_buf, pkt_size);
-		skb->tail += pkt_size;
-		skb->len = pkt_size;
-		dma_sync_single_for_device(d, addr, pkt_size, DMA_FROM_DEVICE);
+
+		new_page = page_pool_dev_alloc_pages(tp->rx_pool);
+		if (unlikely(!new_page)) {
+			skb = napi_alloc_skb(&tp->napi, pkt_size);
+			if (unlikely(!skb)) {
+				dev->stats.rx_dropped++;
+				goto release_descriptor;
+			}
+			skb_copy_to_linear_data(skb, rx_buf, pkt_size);
+			skb_put(skb, pkt_size);
+			dma_sync_single_for_device(d, addr, pkt_size, DMA_FROM_DEVICE);
+		} else {
+			skb = napi_build_skb(page_address(tp->Rx_databuff[entry]), SZ_16K);
+			if (unlikely(!skb)) {
+				page_pool_recycle_direct(tp->rx_pool, new_page);
+				dev->stats.rx_dropped++;
+				goto release_descriptor;
+			}
+
+			skb_put(skb, pkt_size);
+			skb_mark_for_recycle(skb);
+
+			tp->Rx_databuff[entry] = new_page;
+			desc->addr = cpu_to_le64(page_pool_get_dma_addr(new_page));
+		}
 
 		rtl8169_rx_csum(skb, status);
 		skb->protocol = eth_type_trans(skb, dev);
-- 
2.54.0



* [PATCH v2] xfrm: cache the offload ifindex for netlink dumps
From: Cen Zhang @ 2026-06-27  3:50 UTC (permalink / raw)
  To: Steffen Klassert, Herbert Xu, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman
  Cc: netdev, linux-kernel, baijiaju1990, zzzccc427

copy_to_user_state_extra() only holds a reference to the outer xfrm_state.
That does not pin x->xso.dev. NETDEV_DOWN and NETDEV_UNREGISTER can race
through xfrm_dev_state_flush(), xfrm_state_delete(), and
xfrm_dev_state_free(), which clears xso->dev and drops the netdev
reference before the GETSA dump reaches xso_to_xuo() and reads
xso->dev->ifindex.

The buggy scenario involves two paths, with each column showing the order
within that path:

XFRM_MSG_GETSA dump path:           NETDEV teardown path:
1. xfrm_get_sa() gets xfrm_state    1. xfrm_dev_state_flush() finds x
2. copy_to_user_state_extra() sees  2. xfrm_state_delete() removes x
   x->xso.dev                          from the SAD
3. copy_user_offload() calls        3. xfrm_dev_state_free() clears
   xso_to_xuo()                        xso->dev
4. xso->dev->ifindex dereferences   4. netdev_put() drops the device
   a detached net_device               reference

Avoid following the live net_device from the dump paths. Cache the
attached ifindex in xfrm_dev_offload when state or policy offload is bound
to a device, and serialize that snapshot instead. This preserves the
user-visible XFRMA_OFFLOAD_DEV value without depending on the embedded
net_device lifetime.
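
The snapshot approach can be sketched in user space with C11 atomics. This is
a minimal model under stated assumptions, not the xfrm code: dev_model,
offload_model and the offload_*() helpers are hypothetical names, and the
atomics stand in for the kernel's READ_ONCE()/WRITE_ONCE() accessors.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for net_device and xfrm_dev_offload. */
struct dev_model {
	int ifindex;
};

struct offload_model {
	_Atomic(struct dev_model *) dev;
	atomic_int ifindex;	/* snapshot taken at bind time */
};

/* Bind: record the index first, then publish the device pointer. */
static void offload_bind(struct offload_model *o, struct dev_model *dev)
{
	atomic_store(&o->ifindex, dev->ifindex);
	atomic_store(&o->dev, dev);
}

/* Teardown: the pointer is cleared; the snapshot is left untouched. */
static void offload_unbind(struct offload_model *o)
{
	atomic_store(&o->dev, (struct dev_model *)NULL);
}

/* Dump path: returns true and fills *ifindex if a device is attached. */
static bool offload_dump(struct offload_model *o, int *ifindex)
{
	if (!atomic_load(&o->dev))
		return false;
	/* The device may be detached right here; the snapshot stays valid. */
	*ifindex = atomic_load(&o->ifindex);
	return true;
}
```

The dump path only tests the pointer for NULL and never dereferences it, so a
teardown that runs between the check and the read cannot make it touch a
released device.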

Validation reproduced this kernel report:
Oops: general protection fault

Call Trace:
 <TASK>
 copy_to_user_state_extra+0xb8d/0x1370 [xfrm_user]
 ? __pfx_copy_to_user_state_extra+0x10/0x10 [xfrm_user]
 ? __asan_memset+0x23/0x50
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? __alloc_skb+0x342/0x960
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? __asan_memset+0x23/0x50
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? __nlmsg_put+0x147/0x1b0
 dump_one_state+0x1c7/0x3e0 [xfrm_user]
 xfrm_state_netlink+0xcb/0x130 [xfrm_user]
 ? __pfx_xfrm_state_netlink+0x10/0x10 [xfrm_user]
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? xfrm_user_state_lookup.constprop.0+0x230/0x310 [xfrm_user]
 xfrm_get_sa+0x102/0x250 [xfrm_user]
 ? __pfx_xfrm_get_sa+0x10/0x10 [xfrm_user]
 xfrm_user_rcv_msg+0x504/0xaa0 [xfrm_user]
 ? __pfx_xfrm_user_rcv_msg+0x10/0x10 [xfrm_user]
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? stack_trace_save+0x8e/0xc0
 ? __pfx_stack_trace_save+0x10/0x10
 netlink_rcv_skb+0x11f/0x350
 ? __pfx_xfrm_user_rcv_msg+0x10/0x10 [xfrm_user]
 ? __pfx_netlink_rcv_skb+0x10/0x10
 ? __pfx_mutex_lock+0x10/0x10
 ? srso_alias_return_thunk+0x5/0xfbef5
 xfrm_netlink_rcv+0x65/0x80 [xfrm_user]
 netlink_unicast+0x600/0x870
 ? __pfx_netlink_unicast+0x10/0x10
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? __pfx_stack_trace_save+0x10/0x10
 netlink_sendmsg+0x75d/0xc10
 ? __pfx_netlink_sendmsg+0x10/0x10
 ? srso_alias_return_thunk+0x5/0xfbef5
 ____sys_sendmsg+0x77a/0x900
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? __pfx_____sys_sendmsg+0x10/0x10
 ? __pfx_copy_msghdr_from_user+0x10/0x10
 ? release_sock+0x1a/0x1d0
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? netlink_insert+0x143/0xec0
 ___sys_sendmsg+0xff/0x180
 ? __pfx____sys_sendmsg+0x10/0x10
 ? _raw_spin_lock_irqsave+0x85/0xe0
 ? do_getsockname+0xf9/0x170
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? fdget+0x53/0x3b0
 __sys_sendmsg+0x111/0x1a0
 ? __pfx___sys_sendmsg+0x10/0x10
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? __sys_getsockname+0x8c/0x100
 do_syscall_64+0x102/0x5a0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: 07b87f9eea0c ("xfrm: Fix unregister netdevice hang on hardware offload.")
Assisted-by: Codex:gpt-5.5
Signed-off-by: Cen Zhang <zzzccc427@gmail.com>
---
 include/net/xfrm.h     |  2 ++
 net/xfrm/xfrm_device.c |  1 +
 net/xfrm/xfrm_state.c  |  1 +
 net/xfrm/xfrm_user.c   | 38 +++++++++++++++++++++++++++++---------
 4 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 519a0156a05c..a6d69aaa6cd2 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -162,6 +162,8 @@ struct xfrm_dev_offload {
 	 */
 	struct net_device	*real_dev;
 	unsigned long		offload_handle;
+	/* Snapshot the attached device index for dump paths. */
+	int			ifindex;
 	u8			dir : 2;
 	u8			type : 2;
 	u8			flags : 2;
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 630f3dd31cc5..44bfaa04e621 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -313,6 +313,7 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 	}
 
 	xso->dev = dev;
+	xso->ifindex = dev->ifindex;
 	netdev_tracker_alloc(dev, &xso->dev_tracker, GFP_ATOMIC);
 
 	if (xuo->flags & XFRM_OFFLOAD_INBOUND)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index c58cd024e3c6..707e29c82020 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1547,6 +1547,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 			xso->type = XFRM_DEV_OFFLOAD_PACKET;
 			xso->dir = xdo->dir;
 			xso->dev = dev;
+			xso->ifindex = dev->ifindex;
 			xso->flags = XFRM_DEV_OFFLOAD_FLAG_ACQ;
 			netdev_hold(dev, &xso->dev_tracker, GFP_ATOMIC);
 			error = dev->xfrmdev_ops->xdo_dev_state_add(dev, x,
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 6384795ee6b2..0eb87fc998d1 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1201,17 +1201,26 @@ static int copy_sec_ctx(struct xfrm_sec_ctx *s, struct sk_buff *skb)
 	return 0;
 }
 
-static void xso_to_xuo(const struct xfrm_dev_offload *xso,
-		       struct xfrm_user_offload *xuo)
+static void xso_to_xuo_ifindex(const struct xfrm_dev_offload *xso, int ifindex,
+			       struct xfrm_user_offload *xuo)
 {
-	xuo->ifindex = xso->dev->ifindex;
+	xuo->ifindex = ifindex;
 	if (xso->dir == XFRM_DEV_OFFLOAD_IN)
 		xuo->flags = XFRM_OFFLOAD_INBOUND;
 	if (xso->type == XFRM_DEV_OFFLOAD_PACKET)
 		xuo->flags |= XFRM_OFFLOAD_PACKET;
 }
 
-static int copy_user_offload(struct xfrm_dev_offload *xso, struct sk_buff *skb)
+#ifdef CONFIG_XFRM_MIGRATE
+static void xso_to_xuo(const struct xfrm_dev_offload *xso,
+		       struct xfrm_user_offload *xuo)
+{
+	xso_to_xuo_ifindex(xso, xso->dev->ifindex, xuo);
+}
+#endif
+
+static int copy_user_offload_ifindex(const struct xfrm_dev_offload *xso,
+				     int ifindex, struct sk_buff *skb)
 {
 	struct xfrm_user_offload *xuo;
 	struct nlattr *attr;
@@ -1222,11 +1231,22 @@ static int copy_user_offload(struct xfrm_dev_offload *xso, struct sk_buff *skb)
 
 	xuo = nla_data(attr);
 	memset(xuo, 0, sizeof(*xuo));
-	xso_to_xuo(xso, xuo);
+	xso_to_xuo_ifindex(xso, ifindex, xuo);
 
 	return 0;
 }
 
+static int copy_user_offload(struct xfrm_dev_offload *xso, struct sk_buff *skb)
+{
+	return copy_user_offload_ifindex(xso, xso->dev->ifindex, skb);
+}
+
+static int copy_user_state_offload(const struct xfrm_dev_offload *xso,
+				   struct sk_buff *skb)
+{
+	return copy_user_offload_ifindex(xso, READ_ONCE(xso->ifindex), skb);
+}
+
 static bool xfrm_redact(void)
 {
 	return IS_ENABLED(CONFIG_SECURITY) &&
@@ -1433,8 +1453,8 @@ static int copy_to_user_state_extra(struct xfrm_state *x,
 			      &x->replay);
 	if (ret)
 		goto out;
-	if(x->xso.dev)
-		ret = copy_user_offload(&x->xso, skb);
+	if (READ_ONCE(x->xso.dev))
+		ret = copy_user_state_offload(&x->xso, skb);
 	if (ret)
 		goto out;
 	if (x->if_id) {
@@ -4046,8 +4066,8 @@ static inline unsigned int xfrm_sa_len(struct xfrm_state *x)
 		l += nla_total_size(sizeof(*x->coaddr));
 	if (x->props.extra_flags)
 		l += nla_total_size(sizeof(x->props.extra_flags));
-	if (x->xso.dev)
-		 l += nla_total_size(sizeof(struct xfrm_user_offload));
+	if (READ_ONCE(x->xso.dev))
+		l += nla_total_size(sizeof(struct xfrm_user_offload));
 	if (x->props.smark.v | x->props.smark.m) {
 		l += nla_total_size(sizeof(x->props.smark.v));
 		l += nla_total_size(sizeof(x->props.smark.m));
-- 
2.43.0



* [PATCH] fix: net/batman-adv: batadv_interface_kill_vid: extra batadv_meshif_vlan_put after destroy
From: WenTao Liang @ 2026-06-27  3:46 UTC (permalink / raw)
  To: marek.lindner, sw, antonio, sven, davem, edumazet, kuba, pabeni
  Cc: horms, b.a.t.m.a.n, netdev, linux-kernel, WenTao Liang, stable

In batadv_interface_kill_vid(), batadv_meshif_vlan_get() acquires a
reference on the vlan object. batadv_meshif_destroy_vlan() internally
calls batadv_meshif_vlan_put() which balances that reference. However, an
additional batadv_meshif_vlan_put(vlan) is called after
batadv_meshif_destroy_vlan(), causing a refcount underflow and potential
use-after-free of the vlan object.

Remove the extra batadv_meshif_vlan_put(vlan) call.

Cc: stable@vger.kernel.org
Fixes: 5d2c05b21337 ("batman-adv: add per VLAN interface attribute framework")
Signed-off-by: WenTao Liang <vulab@iscas.ac.cn>
---
 net/batman-adv/mesh-interface.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/batman-adv/mesh-interface.c b/net/batman-adv/mesh-interface.c
index e7aa45bc6b7a..cc974f243200 100644
--- a/net/batman-adv/mesh-interface.c
+++ b/net/batman-adv/mesh-interface.c
@@ -691,9 +691,6 @@ static int batadv_interface_kill_vid(struct net_device *dev, __be16 proto,
 
 	batadv_meshif_destroy_vlan(bat_priv, vlan);
 
-	/* finally free the vlan object */
-	batadv_meshif_vlan_put(vlan);
-
 	return 0;
 }
 
-- 
2.39.5 (Apple Git-154)
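
For readers less familiar with the pattern, the imbalance described in the
commit message can be sketched in plain userspace C. The sketch takes the
commit message's premise at face value and tracks only the reference taken
by the lookup; it does not establish which reference the real
batadv_meshif_destroy_vlan() drops. struct toy_ref and the toy_*() helpers
are hypothetical names, not batman-adv code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reference counter; stands in for the vlan object's refcount. */
struct toy_ref {
	int count;
};

static void toy_get(struct toy_ref *r)
{
	assert(r->count >= 0);
	r->count++;
}

/* Returns false when the put would underflow the counter. */
static bool toy_put(struct toy_ref *r)
{
	if (r->count <= 0)
		return false;
	r->count--;
	return true;
}

/* Models a destroy helper that drops the caller's lookup reference. */
static void toy_destroy(struct toy_ref *r)
{
	toy_put(r);
}

/* Returns true if every put in the kill path found a reference to drop. */
bool toy_kill(struct toy_ref *r, bool extra_put)
{
	toy_get(r);		/* lookup reference */
	toy_destroy(r);		/* balances the lookup, per the premise */
	if (extra_put)
		return toy_put(r);	/* the call the patch removes */
	return true;
}
```

With extra_put set, toy_kill() reports the second put as an underflow;
without it, the counter returns to its starting value.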



* [PATCH] xfrm: clear mode callbacks after failed mode setup
From: Cen Zhang @ 2026-06-27  3:01 UTC (permalink / raw)
  To: Steffen Klassert, Herbert Xu, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Christian Hopps
  Cc: netdev, linux-kernel, baijiaju1990, zzzccc427

xfrm_state_gc_task can run long after a failed IPTFS state setup. In the
reproduced case, __xfrm_init_state() cached x->mode_cbs, IPTFS setup
returned -ENOMEM before publishing mode_data, and the temporary module
reference from xfrm_get_mode_cbs() was dropped immediately. The dead state
then kept x->mode_cbs until deferred GC ran after xfrm_iptfs had been
unloaded.

Clear x->mode_cbs when mode init or clone fails before publishing
mode_data. Those states never installed mode-specific state or the
long-term IPTFS module pin, so deferred GC has nothing mode-specific to
destroy and must not retain a callback table pointer past the temporary
lookup reference.

The buggy scenario involves two paths; each list below shows the order
within that path:

failed setup path:
1. cache x->mode_cbs
2. mode setup fails before mode_data
3. drop the temporary module ref
4. dead state keeps x->mode_cbs cached

GC/unload path:
1. xfrm_state_put() queues GC work
2. xfrm_iptfs unloads later
3. xfrm_state_gc_task runs
4. GC dereferences stale x->mode_cbs

This also covers the failed clone path where clone_state() returns before
publishing mode_data.

Validation reproduced this kernel report:
Kernel panic - not syncing: Fatal exception
CONFIG_FAULT_INJECTION_STACKTRACE_FILTER=y
failslab_stacktrace_filter matched xfrm_iptfs frames
ack_error=-12
FAULT_INJECTION: forcing a failure
BUG: unable to handle page fault
Workqueue: events xfrm_state_gc_task
RIP: xfrm_state_gc_task+0x142/0x650
Modules linked in: esp4_offload xfrm_user [last unloaded: xfrm_iptfs]
Kernel panic - not syncing: Fatal exception

Fixes: 4b3faf610cc6 ("xfrm: iptfs: add new iptfs xfrm mode impl")
Assisted-by: Codex:gpt-5.5
Signed-off-by: Cen Zhang <zzzccc427@gmail.com>
---
 net/xfrm/xfrm_state.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index c58cd024e3c6..4d95b2720894 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2071,8 +2071,11 @@ static struct xfrm_state *xfrm_state_clone_and_setup(struct xfrm_state *orig,
 
 	x->mode_cbs = orig->mode_cbs;
 	if (x->mode_cbs && x->mode_cbs->clone_state) {
-		if (x->mode_cbs->clone_state(x, orig))
+		if (x->mode_cbs->clone_state(x, orig)) {
+			if (!x->mode_data)
+				x->mode_cbs = NULL;
 			goto error;
+		}
 	}
 
 	x->props.reqid = m->new_reqid;
@@ -3291,6 +3294,8 @@ int __xfrm_init_state(struct xfrm_state *x, struct netlink_ext_ack *extack)
 		if (x->mode_cbs->init_state)
 			err = x->mode_cbs->init_state(x);
 		module_put(x->mode_cbs->owner);
+		if (err && !x->mode_data)
+			x->mode_cbs = NULL;
 	}
 error:
 	return err;
-- 
2.43.0
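
As an illustration of the rule the patch applies (forget the cached
callback table when setup fails before publishing mode_data), here is a
minimal userspace sketch. It is not the xfrm code: struct toy_state,
struct toy_mode_cbs and the toy_*() functions are hypothetical, and the
-12 return value simply mirrors the ack_error=-12 seen in the report.

```c
#include <assert.h>
#include <stddef.h>

struct toy_state;

/* Toy callback table; stands in for a module-owned ops struct. */
struct toy_mode_cbs {
	int (*init_state)(struct toy_state *x);
	void (*destroy_state)(struct toy_state *x);
};

struct toy_state {
	const struct toy_mode_cbs *mode_cbs;	/* cached lookup result */
	void *mode_data;			/* published only on success */
	int destroyed;
};

static int toy_init_fail(struct toy_state *x)
{
	(void)x;
	return -12;	/* fails before mode_data is published */
}

static void toy_destroy(struct toy_state *x)
{
	x->destroyed++;
}

static const struct toy_mode_cbs toy_cbs = {
	.init_state = toy_init_fail,
	.destroy_state = toy_destroy,
};

/* Same shape as the fix: drop the table pointer if nothing was published. */
int toy_init_state(struct toy_state *x, int clear_on_failure)
{
	int err;

	assert(x != NULL);
	x->mode_cbs = &toy_cbs;
	err = x->mode_cbs->init_state(x);
	if (clear_on_failure && err && !x->mode_data)
		x->mode_cbs = NULL;
	return err;
}

/* Deferred cleanup would follow the table only if it is still cached. */
int toy_gc_touches_cbs(const struct toy_state *x)
{
	return x->mode_cbs != NULL;
}
```

With clear_on_failure set, the later cleanup step has no table pointer left
to follow, which is the property the patch relies on once the module may
have been unloaded.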



* [PATCH] xfrm: cache the offload ifindex for netlink dumps
From: Cen Zhang @ 2026-06-27  3:00 UTC (permalink / raw)
  To: Steffen Klassert, Herbert Xu, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman
  Cc: netdev, linux-kernel, baijiaju1990, zzzccc427

Validation reproduced this kernel report:
Oops: general protection fault

Call Trace:
 <TASK>
 copy_to_user_state_extra+0xb8d/0x1370 [xfrm_user]
 ? __pfx_copy_to_user_state_extra+0x10/0x10 [xfrm_user]
 ? __asan_memset+0x23/0x50
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? __alloc_skb+0x342/0x960
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? __asan_memset+0x23/0x50
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? __nlmsg_put+0x147/0x1b0
 dump_one_state+0x1c7/0x3e0 [xfrm_user]
 xfrm_state_netlink+0xcb/0x130 [xfrm_user]
 ? __pfx_xfrm_state_netlink+0x10/0x10 [xfrm_user]
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? xfrm_user_state_lookup.constprop.0+0x230/0x310 [xfrm_user]
 xfrm_get_sa+0x102/0x250 [xfrm_user]
 ? __pfx_xfrm_get_sa+0x10/0x10 [xfrm_user]
 xfrm_user_rcv_msg+0x504/0xaa0 [xfrm_user]
 ? __pfx_xfrm_user_rcv_msg+0x10/0x10 [xfrm_user]
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? stack_trace_save+0x8e/0xc0
 ? __pfx_stack_trace_save+0x10/0x10
 netlink_rcv_skb+0x11f/0x350
 ? __pfx_xfrm_user_rcv_msg+0x10/0x10 [xfrm_user]
 ? __pfx_netlink_rcv_skb+0x10/0x10
 ? __pfx_mutex_lock+0x10/0x10
 ? srso_alias_return_thunk+0x5/0xfbef5
 xfrm_netlink_rcv+0x65/0x80 [xfrm_user]
 netlink_unicast+0x600/0x870
 ? __pfx_netlink_unicast+0x10/0x10
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? __pfx_stack_trace_save+0x10/0x10
 netlink_sendmsg+0x75d/0xc10
 ? __pfx_netlink_sendmsg+0x10/0x10
 ? srso_alias_return_thunk+0x5/0xfbef5
 ____sys_sendmsg+0x77a/0x900
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? __pfx_____sys_sendmsg+0x10/0x10
 ? __pfx_copy_msghdr_from_user+0x10/0x10
 ? release_sock+0x1a/0x1d0
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? netlink_insert+0x143/0xec0
 ___sys_sendmsg+0xff/0x180
 ? __pfx____sys_sendmsg+0x10/0x10
 ? _raw_spin_lock_irqsave+0x85/0xe0
 ? do_getsockname+0xf9/0x170
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? fdget+0x53/0x3b0
 __sys_sendmsg+0x111/0x1a0
 ? __pfx___sys_sendmsg+0x10/0x10
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? __sys_getsockname+0x8c/0x100
 do_syscall_64+0x102/0x5a0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

copy_to_user_state_extra() only holds a reference to the outer xfrm_state.
That does not pin x->xso.dev. NETDEV_DOWN and NETDEV_UNREGISTER can race
through xfrm_dev_state_flush(), xfrm_state_delete(), and
xfrm_dev_state_free(), which clears xso->dev and drops the netdev
reference before the GETSA dump reaches xso_to_xuo() and reads
xso->dev->ifindex.

The buggy scenario involves two paths, with each column showing the order
within that path:

XFRM_MSG_GETSA dump path:           NETDEV teardown path:
1. xfrm_get_sa() gets xfrm_state    1. xfrm_dev_state_flush() finds x
2. copy_to_user_state_extra() sees  2. xfrm_state_delete() removes x
   x->xso.dev                          from the SAD
3. copy_user_offload() calls        3. xfrm_dev_state_free() clears
   xso_to_xuo()                        xso->dev
4. xso->dev->ifindex dereferences   4. netdev_put() drops the device
   a detached net_device               reference

Avoid following the live net_device from the dump paths. Cache the
attached ifindex in xfrm_dev_offload when state or policy offload is bound
to a device, and serialize that snapshot instead. This preserves the
user-visible XFRMA_OFFLOAD_DEV value without depending on the embedded
net_device lifetime.

Fixes: 07b87f9eea0c ("xfrm: Fix unregister netdevice hang on hardware offload.")
Assisted-by: Codex:gpt-5.5
Signed-off-by: Cen Zhang <zzzccc427@gmail.com>
---
 include/net/xfrm.h     |  2 ++
 net/xfrm/xfrm_device.c |  1 +
 net/xfrm/xfrm_state.c  |  1 +
 net/xfrm/xfrm_user.c   | 38 +++++++++++++++++++++++++++++---------
 4 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 519a0156a05c..a6d69aaa6cd2 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -162,6 +162,8 @@ struct xfrm_dev_offload {
 	 */
 	struct net_device	*real_dev;
 	unsigned long		offload_handle;
+	/* Snapshot the attached device index for dump paths. */
+	int			ifindex;
 	u8			dir : 2;
 	u8			type : 2;
 	u8			flags : 2;
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 630f3dd31cc5..44bfaa04e621 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -313,6 +313,7 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 	}
 
 	xso->dev = dev;
+	xso->ifindex = dev->ifindex;
 	netdev_tracker_alloc(dev, &xso->dev_tracker, GFP_ATOMIC);
 
 	if (xuo->flags & XFRM_OFFLOAD_INBOUND)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index c58cd024e3c6..707e29c82020 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1547,6 +1547,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 			xso->type = XFRM_DEV_OFFLOAD_PACKET;
 			xso->dir = xdo->dir;
 			xso->dev = dev;
+			xso->ifindex = dev->ifindex;
 			xso->flags = XFRM_DEV_OFFLOAD_FLAG_ACQ;
 			netdev_hold(dev, &xso->dev_tracker, GFP_ATOMIC);
 			error = dev->xfrmdev_ops->xdo_dev_state_add(dev, x,
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 6384795ee6b2..0eb87fc998d1 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1201,17 +1201,26 @@ static int copy_sec_ctx(struct xfrm_sec_ctx *s, struct sk_buff *skb)
 	return 0;
 }
 
-static void xso_to_xuo(const struct xfrm_dev_offload *xso,
-		       struct xfrm_user_offload *xuo)
+static void xso_to_xuo_ifindex(const struct xfrm_dev_offload *xso, int ifindex,
+			       struct xfrm_user_offload *xuo)
 {
-	xuo->ifindex = xso->dev->ifindex;
+	xuo->ifindex = ifindex;
 	if (xso->dir == XFRM_DEV_OFFLOAD_IN)
 		xuo->flags = XFRM_OFFLOAD_INBOUND;
 	if (xso->type == XFRM_DEV_OFFLOAD_PACKET)
 		xuo->flags |= XFRM_OFFLOAD_PACKET;
 }
 
-static int copy_user_offload(struct xfrm_dev_offload *xso, struct sk_buff *skb)
+#ifdef CONFIG_XFRM_MIGRATE
+static void xso_to_xuo(const struct xfrm_dev_offload *xso,
+		       struct xfrm_user_offload *xuo)
+{
+	xso_to_xuo_ifindex(xso, xso->dev->ifindex, xuo);
+}
+#endif
+
+static int copy_user_offload_ifindex(const struct xfrm_dev_offload *xso,
+				     int ifindex, struct sk_buff *skb)
 {
 	struct xfrm_user_offload *xuo;
 	struct nlattr *attr;
@@ -1222,11 +1231,22 @@ static int copy_user_offload(struct xfrm_dev_offload *xso, struct sk_buff *skb)
 
 	xuo = nla_data(attr);
 	memset(xuo, 0, sizeof(*xuo));
-	xso_to_xuo(xso, xuo);
+	xso_to_xuo_ifindex(xso, ifindex, xuo);
 
 	return 0;
 }
 
+static int copy_user_offload(struct xfrm_dev_offload *xso, struct sk_buff *skb)
+{
+	return copy_user_offload_ifindex(xso, xso->dev->ifindex, skb);
+}
+
+static int copy_user_state_offload(const struct xfrm_dev_offload *xso,
+				   struct sk_buff *skb)
+{
+	return copy_user_offload_ifindex(xso, READ_ONCE(xso->ifindex), skb);
+}
+
 static bool xfrm_redact(void)
 {
 	return IS_ENABLED(CONFIG_SECURITY) &&
@@ -1433,8 +1453,8 @@ static int copy_to_user_state_extra(struct xfrm_state *x,
 			      &x->replay);
 	if (ret)
 		goto out;
-	if(x->xso.dev)
-		ret = copy_user_offload(&x->xso, skb);
+	if (READ_ONCE(x->xso.dev))
+		ret = copy_user_state_offload(&x->xso, skb);
 	if (ret)
 		goto out;
 	if (x->if_id) {
@@ -4046,8 +4066,8 @@ static inline unsigned int xfrm_sa_len(struct xfrm_state *x)
 		l += nla_total_size(sizeof(*x->coaddr));
 	if (x->props.extra_flags)
 		l += nla_total_size(sizeof(x->props.extra_flags));
-	if (x->xso.dev)
-		 l += nla_total_size(sizeof(struct xfrm_user_offload));
+	if (READ_ONCE(x->xso.dev))
+		l += nla_total_size(sizeof(struct xfrm_user_offload));
 	if (x->props.smark.v | x->props.smark.m) {
 		l += nla_total_size(sizeof(x->props.smark.v));
 		l += nla_total_size(sizeof(x->props.smark.m));
-- 
2.43.0
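
The snapshot idea in this patch can be shown with a small userspace sketch:
copy the index when the device is attached, and let the dump path read only
the copy. This is a simplified model under assumed names (struct toy_dev,
struct toy_offload, toy_bind(), toy_teardown(), toy_dump_ifindex()), not
the xfrm structures, and it leaves out the READ_ONCE() annotations.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dev {
	int ifindex;
};

struct toy_offload {
	struct toy_dev *dev;	/* may be cleared by teardown */
	int ifindex;		/* snapshot taken at bind time */
};

void toy_bind(struct toy_offload *xso, struct toy_dev *dev)
{
	assert(dev != NULL);
	xso->dev = dev;
	xso->ifindex = dev->ifindex;
}

/* Teardown detaches the device; the snapshot is left in place. */
void toy_teardown(struct toy_offload *xso)
{
	xso->dev = NULL;
}

/* Dump path that never follows xso->dev. */
int toy_dump_ifindex(const struct toy_offload *xso)
{
	return xso->ifindex;
}
```

After toy_teardown(), toy_dump_ifindex() still returns the value recorded at
bind time, so the dump no longer depends on the lifetime of the device.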



* [PATCH ipsec] xfrm: fix sk_dst_cache double-free in xfrm_user_policy()
From: Xiang Mei (Microsoft) @ 2026-06-27  2:40 UTC (permalink / raw)
  To: steffen.klassert, herbert, davem, netdev
  Cc: edumazet, kuba, pabeni, linux-kernel, AutonomousCodeSecurity,
	tgopinath, kys, Xiang Mei (Microsoft)

From: "Xiang Mei (Microsoft)" <xmei5@asu.edu>

xfrm_user_policy() clears the socket dst cache with __sk_dst_reset(),
i.e. the non-atomic __sk_dst_set(sk, NULL): it reads sk_dst_cache with
rcu_dereference_protected(), stores NULL and dst_release()s the old dst.
That is only safe if no other thread modifies sk_dst_cache concurrently.

For a connected UDP socket that does not hold: the transmit fast path
(udp_sendmsg -> sk_dst_check -> sk_dst_reset) resets the cache locklessly
with an atomic xchg(). A per-socket policy change racing a send can make
both sides observe the same old dst and each dst_release() it, dropping
the socket's single reference twice and freeing the xfrm_dst bundle while
it is still referenced:

  BUG: KASAN: slab-use-after-free in dst_release
  Write of size 4 at addr ffff88801897b6c0 by task exploit/155
  Call Trace:
   ...
   dst_release (... ./include/linux/rcuref.h:109)
   xfrm_user_policy (./include/net/sock.h:2239 ./include/net/sock.h:2256 net/xfrm/xfrm_state.c:3053)
   do_ip_setsockopt (net/ipv4/ip_sockglue.c:1347)
   ip_setsockopt (net/ipv4/ip_sockglue.c:1417)
   do_sock_setsockopt (net/socket.c:2368)
   __sys_setsockopt (net/socket.c:2393)
   __x64_sys_setsockopt (net/socket.c:2396)
   do_syscall_64 (arch/x86/entry/syscall_64.c:94)
   entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)

Reachable by an unprivileged user via a user+network namespace.

Use the atomic sk_dst_reset() so the cache is cleared and released with a
single xchg(): whichever side wins releases the dst once, the other sees
NULL and does nothing. Behaviour is otherwise unchanged.

Fixes: 2b06cdf3e688 ("xfrm: Clear sk_dst_cache when applying per-socket policy.")
Fixes: be8f8284cd89 ("net: xfrm: allow clearing socket xfrm policies.")
Reported-by: AutonomousCodeSecurity@microsoft.com
Signed-off-by: Xiang Mei (Microsoft) <xmei5@asu.edu>
---
 net/xfrm/xfrm_state.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index c58cd024e3c6..08ba6805ddb3 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -3010,7 +3010,7 @@ int xfrm_user_policy(struct sock *sk, int optname, sockptr_t optval, int optlen)
 	if (sockptr_is_null(optval) && !optlen) {
 		xfrm_sk_policy_insert(sk, XFRM_POLICY_IN, NULL);
 		xfrm_sk_policy_insert(sk, XFRM_POLICY_OUT, NULL);
-		__sk_dst_reset(sk);
+		sk_dst_reset(sk);
 		return 0;
 	}
 
@@ -3050,7 +3050,7 @@ int xfrm_user_policy(struct sock *sk, int optname, sockptr_t optval, int optlen)
 	if (err >= 0) {
 		xfrm_sk_policy_insert(sk, err, pol);
 		xfrm_pol_put(pol);
-		__sk_dst_reset(sk);
+		sk_dst_reset(sk);
 		err = 0;
 	}
 
-- 
2.43.0
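
To make the difference between the two reset styles concrete, here is a
userspace sketch using C11 atomics. It replays the bad interleaving in a
single thread rather than racing real threads, and struct toy_dst, struct
toy_sock and the toy_*() functions are hypothetical stand-ins, not the
kernel's sk_dst helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy dst with a plain reference count; not struct dst_entry. */
struct toy_dst {
	int refs;
};

struct toy_sock {
	_Atomic(struct toy_dst *) dst_cache;
};

static void toy_dst_release(struct toy_dst *dst)
{
	if (dst)
		dst->refs--;
}

/*
 * Replays the bad interleaving: both sides read the cached pointer before
 * either stores NULL, so both release the same dst.
 */
int toy_racy_reset_twice(struct toy_sock *sk)
{
	struct toy_dst *seen_a = atomic_load(&sk->dst_cache);
	struct toy_dst *seen_b = atomic_load(&sk->dst_cache);

	assert(seen_a != NULL);
	atomic_store(&sk->dst_cache, (struct toy_dst *)NULL);
	toy_dst_release(seen_a);
	toy_dst_release(seen_b);
	return seen_a->refs;
}

/* Atomic reset: a single exchange decides who owns the old pointer. */
void toy_atomic_reset(struct toy_sock *sk)
{
	toy_dst_release(atomic_exchange(&sk->dst_cache,
					(struct toy_dst *)NULL));
}
```

In the replayed interleaving the single reference is dropped twice and the
count goes negative. Calling toy_atomic_reset() twice releases it once,
because the second caller exchanges out NULL.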



* Re: [PATCH net v2] seg6: validate SRH length before reading fixed fields
From: patchwork-bot+netdevbpf @ 2026-06-27  2:00 UTC (permalink / raw)
  To: Nuoqi Gui
  Cc: davem, edumazet, kuba, pabeni, horms, andrea.mayer, netdev, bpf,
	linux-kernel, m.xhonneux, daniel, dlebrun
In-Reply-To: <20260623-f01-17-seg6-srh-len-v2-1-2edc40e9e3e1@mails.tsinghua.edu.cn>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 23 Jun 2026 18:32:31 +0800 you wrote:
> seg6_validate_srh() reads fixed SRH fields such as srh->type and
> srh->hdrlen before checking that the supplied length covers the fixed
> struct ipv6_sr_hdr fields.
> 
> The BPF SEG6 encap path reaches this with a BPF program-supplied pointer
> and length: bpf_lwt_push_encap() and the SEG6 local BPF END_B6 and
> END_B6_ENCAP actions call bpf_push_seg6_encap(), which forwards the
> length to seg6_validate_srh() with no minimum-size guard.  A 2-byte SEG6
> encap header can therefore make the validator read srh->type at offset 2
> beyond the caller-supplied buffer.
> 
> [...]

Here is the summary with links:
  - [net,v2] seg6: validate SRH length before reading fixed fields
    https://git.kernel.org/netdev/net/c/a75d99f46bf2

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
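
The ordering issue in the quoted commit message (fields read before the
length is known to cover them) can be illustrated in userspace C. The
struct below follows the offset quoted above (type at offset 2), but the
remaining layout, the TOY_SRH_TYPE value and the length formula are
assumptions made for the sketch; toy_validate_srh() is a hypothetical name
and not the kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_SRH_TYPE 4	/* illustrative routing type value */

struct toy_srh {
	uint8_t nexthdr;
	uint8_t hdrlen;		/* 8-byte units beyond the first 8 bytes */
	uint8_t type;		/* at offset 2, as in the quoted text */
	uint8_t segments_left;
	uint8_t first_segment;
	uint8_t flags;
	uint16_t tag;
};

/* Returns 1 if the buffer holds a plausible header, 0 otherwise. */
int toy_validate_srh(const uint8_t *buf, size_t len)
{
	struct toy_srh hdr;

	assert(buf != NULL);
	/* length guard comes first: no field is read from a short buffer */
	if (len < sizeof(hdr))
		return 0;
	memcpy(&hdr, buf, sizeof(hdr));

	if (hdr.type != TOY_SRH_TYPE)
		return 0;
	if (((size_t)hdr.hdrlen + 1) * 8 != len)
		return 0;
	return 1;
}
```

A 2-byte input is rejected by the first check, before hdr.type is ever
looked at.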




* Re: [PATCH 0/2] net/sched: finish the qdisc_dequeue_peeked conversion (taprio, multiq)
From: patchwork-bot+netdevbpf @ 2026-06-27  2:00 UTC (permalink / raw)
  To: Bryam Vargas
  Cc: vinicius.gomes, pabeni, jhs, jiri, kuba, davem, edumazet, horms,
	netdev, jarkao2, vladimir.oltean, linux-kernel
In-Reply-To: <20260625-b4-disp-31bcb279-v1-0-85c40b83c529@proton.me>

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 25 Jun 2026 04:51:18 -0500 you wrote:
> Commit 77be155cba4e added peek emulation: a non-work-conserving qdisc's
> ->peek dequeues one skb and stashes it in the child's gso_skb. A parent
> that peeks such a child must then take the packet with
> qdisc_dequeue_peeked(), not a direct ->dequeue(), or the stashed skb is
> bypassed and the child's qlen/backlog desync. sch_red and sch_sfb were
> just fixed for this; taprio and multiq still take the direct path.
> 
> [...]

Here is the summary with links:
  - [1/2] net/sched: sch_taprio: Replace direct dequeue call with peek and qdisc_dequeue_peeked
    https://git.kernel.org/netdev/net/c/e056e1dfcddc
  - [2/2] net/sched: sch_multiq: Replace direct dequeue call with peek and qdisc_dequeue_peeked
    https://git.kernel.org/netdev/net/c/54f6b0c843e2

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
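
The peek emulation described in the quoted cover letter can be modelled in
a few lines of userspace C. This is a sketch built from that description
only: struct toy_qdisc and the toy_*() functions are hypothetical, packets
are non-zero integers, and the real qdisc helpers differ in detail.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QLEN 4

struct toy_qdisc {
	int queue[TOY_QLEN];
	int head;	/* next slot to dequeue */
	int tail;	/* next free slot */
	int qlen;	/* packets accounted to this qdisc */
	int stash;	/* peeked packet, 0 when empty */
};

void toy_enqueue(struct toy_qdisc *q, int pkt)
{
	assert(pkt != 0 && q->tail < TOY_QLEN);
	q->queue[q->tail++] = pkt;
	q->qlen++;
}

/* Direct dequeue: ignores the stash entirely. */
int toy_direct_dequeue(struct toy_qdisc *q)
{
	if (q->head == q->tail)
		return 0;
	q->qlen--;
	return q->queue[q->head++];
}

/* Peek emulation: dequeue one packet and stash it, still counted in qlen. */
int toy_peek(struct toy_qdisc *q)
{
	if (!q->stash) {
		q->stash = toy_direct_dequeue(q);
		if (q->stash)
			q->qlen++;
	}
	return q->stash;
}

/* Takes the stashed packet first, falling back to a direct dequeue. */
int toy_dequeue_peeked(struct toy_qdisc *q)
{
	int pkt = q->stash;

	if (!pkt)
		return toy_direct_dequeue(q);
	q->stash = 0;
	q->qlen--;
	return pkt;
}
```

After a peek, toy_dequeue_peeked() hands back the stashed packet and keeps
qlen consistent, while toy_direct_dequeue() returns the next packet and
leaves the peeked one stranded in the stash.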




* Re: [PATCH] qede: fix out-of-bounds check for cqe->len_list[]
From: patchwork-bot+netdevbpf @ 2026-06-27  2:00 UTC (permalink / raw)
  To: Matvey Kovalev
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, Pavel.Zhigulin,
	netdev, linux-kernel, lvc-project
In-Reply-To: <20260623144602.3521-1-matvey.kovalev@ispras.ru>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 23 Jun 2026 17:45:54 +0300 you wrote:
> Move index check before element access.
> 
> Fixes: 896f1a2493b5 ("net: qlogic/qede: fix potential out-of-bounds read in qede_tpa_cont() and qede_tpa_end()")
> Found by Linux Verification Center (linuxtesting.org) with SVACE.
> 
> Signed-off-by: Matvey Kovalev <matvey.kovalev@ispras.ru>
> 
> [...]

Here is the summary with links:
  - qede: fix out-of-bounds check for cqe->len_list[]
    https://git.kernel.org/netdev/net/c/f9ba47fce593

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
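
"Move index check before element access" comes down to the order of the
operands in a short-circuit loop condition. A minimal sketch, assuming a
fixed-size length list walked until a zero entry; TOY_LEN_LIST_SIZE and
toy_sum_len_list() are hypothetical and not taken from the qede driver:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_LEN_LIST_SIZE 2

/* Sums list entries, stopping at a zero entry or the end of the array. */
unsigned int toy_sum_len_list(const uint16_t *list)
{
	unsigned int total = 0;
	size_t i;

	assert(list != NULL);
	/* bounds test first, so list[i] is never read with i out of range */
	for (i = 0; i < TOY_LEN_LIST_SIZE && list[i]; i++)
		total += list[i];
	return total;
}
```

With the operands the other way round, a list with no zero entry would be
read one element past its end before the bound is tested.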




* Re: [PATCH net-next v1] tcp/dccp: avoid parity split for socket-local bind range
From: luoxuanqiang @ 2026-06-27  1:59 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: Eric Dumazet, Neal Cardwell, netdev, David S . Miller,
	Jakub Kicinski, Paolo Abeni, Simon Horman, luoxuanqiang
In-Reply-To: <CAAVpQUCayy3o59i2vh9hHRPi-3pw1BJgEYMwZYRpZnYEUoqsGw@mail.gmail.com>



> On Jun 27, 2026, at 07:40, Kuniyuki Iwashima <kuniyu@google.com> wrote:
> 
> On Fri, Jun 26, 2026 at 2:40 AM <xuanqiang.luo@linux.dev> wrote:
>> 
>> From: luoxuanqiang <luoxuanqiang@kylinos.cn>
>> 
>> IP_LOCAL_PORT_RANGE lets applications override the netns ephemeral port
>> range on a per-socket basis.  __inet_hash_connect() already treats such a
>> range as an explicit application partition and scans it with step 1 [1].
>> 
>> Do the same in inet_csk_find_open_port():
> 
> What's the use case of IP_LOCAL_PORT_RANGE + bind(, 0)
> without IP_BIND_ADDRESS_NO_PORT ?
Hi Kuniyuki,

Thanks for the question!

The use case is when an application wants to restrict ephemeral port
allocation to a socket-local IP_LOCAL_PORT_RANGE, but still needs
bind(..., 0) to allocate and reserve a local port immediately.

IP_BIND_ADDRESS_NO_PORT is useful when the application can defer port
allocation until connect(), but it changes this behavior: bind(..., 0)
does not reserve a port in that case. So it is not a replacement for
applications that need the local port before connect(), for example to
publish it to another component or set up local policy.

This patch is also intended to keep the bind(..., 0) path consistent with
Eric's earlier change in __inet_hash_connect().

Thanks,
Xuanqiang
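
For reference, the use case above looks roughly like this from userspace.
This is a sketch, not code from the patch: pack_port_range() and
bind_in_range() are hypothetical helpers, the fallback define assumes the
Linux UAPI value of IP_LOCAL_PORT_RANGE, and the call fails cleanly on
kernels that lack the option.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef IP_LOCAL_PORT_RANGE
#define IP_LOCAL_PORT_RANGE 51	/* assumed value from the Linux UAPI headers */
#endif

/* Low 16 bits carry the lower bound, high 16 bits the upper bound. */
uint32_t pack_port_range(uint16_t lo, uint16_t hi)
{
	return ((uint32_t)hi << 16) | lo;
}

/*
 * Returns a bound socket and stores its local port in *port_out, or
 * returns -1 if any step fails. The port stays reserved only while the
 * returned descriptor is open.
 */
int bind_in_range(uint16_t lo, uint16_t hi, int *port_out)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	uint32_t range = pack_port_range(lo, hi);
	int fd;

	assert(lo <= hi && port_out != NULL);
	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* ask the kernel to pick a port now */

	if (setsockopt(fd, IPPROTO_IP, IP_LOCAL_PORT_RANGE,
		       &range, sizeof(range)) != 0 ||
	    bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
	    getsockname(fd, (struct sockaddr *)&addr, &len) != 0) {
		close(fd);
		return -1;
	}

	*port_out = ntohs(addr.sin_port);
	return fd;
}
```

The caller learns the local port before connect(), which is the property
that IP_BIND_ADDRESS_NO_PORT gives up.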


* Re: [PATCH net v2] net: pse-pd: scope pse_control regulator handle to kref lifetime
From: patchwork-bot+netdevbpf @ 2026-06-27  1:50 UTC (permalink / raw)
  To: Carlo Szelinsky
  Cc: o.rempel, kory.maincent, andrew+netdev, davem, edumazet, kuba,
	pabeni, horms, corey, hkallweit1, linux, netdev, linux-kernel
In-Reply-To: <20260624204017.2752934-1-github@szelinsky.de>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 24 Jun 2026 22:40:16 +0200 you wrote:
> From: Corey Leavitt <corey@leavitt.info>
> 
> __pse_control_release() drops psec->ps via devm_regulator_put(), which
> only succeeds if the devres entry added by the matching
> devm_regulator_get_exclusive() is still present on pcdev->dev at the
> time the pse_control's kref hits zero.
> 
> [...]

Here is the summary with links:
  - [net,v2] net: pse-pd: scope pse_control regulator handle to kref lifetime
    https://git.kernel.org/netdev/net/c/16759757c4d2

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net v2] net: liquidio: fix BAR resource leak on PF number failure
From: patchwork-bot+netdevbpf @ 2026-06-27  1:50 UTC (permalink / raw)
  To: haoxiang_li2024
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, ricardo.farrington,
	felix.manlunas, horms, netdev, linux-kernel, stable
In-Reply-To: <20260624064013.2809570-1-haoxiang_li2024@163.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 24 Jun 2026 14:40:13 +0800 you wrote:
> If cn23xx_get_pf_num() fails, the function returns without
> unmapping either BAR. Unmap both BARs before returning from
> the error path.
> 
> Found by manual code review.
> 
> Fixes: 0c45d7fe12c7 ("liquidio: fix use of pf in pass-through mode in a virtual machine")
> Cc: stable@vger.kernel.org
> Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
> 
> [...]

Here is the summary with links:
  - [net,v2] net: liquidio: fix BAR resource leak on PF number failure
    https://git.kernel.org/netdev/net/c/c63ee62a3c4a

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
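
The fix described in the quoted commit message follows the usual goto-based
unwind pattern: every resource acquired before the failing step is released
on the way out. A minimal sketch with hypothetical names (struct toy_dev,
toy_setup() and the toy BAR helpers); it does not reproduce the liquidio
code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_dev {
	bool bar0_mapped;
	bool bar1_mapped;
};

static int toy_map_bar(bool *mapped)
{
	*mapped = true;
	return 0;
}

static void toy_unmap_bar(bool *mapped)
{
	*mapped = false;
}

/* Returns 0 on success; on failure nothing is left mapped. */
int toy_setup(struct toy_dev *d, bool pf_num_fails)
{
	int err;

	assert(d != NULL);
	err = toy_map_bar(&d->bar0_mapped);
	if (err)
		return err;
	err = toy_map_bar(&d->bar1_mapped);
	if (err)
		goto unmap_bar0;
	if (pf_num_fails) {
		err = -1;	/* models the PF number lookup failing */
		goto unmap_bar1;
	}
	return 0;

unmap_bar1:
	toy_unmap_bar(&d->bar1_mapped);
unmap_bar0:
	toy_unmap_bar(&d->bar0_mapped);
	return err;
}
```

The labels are ordered so that a failure at any step falls through exactly
the cleanups for the steps that already succeeded.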




* Re: [PATCH net v2] net: ipa: fix SMEM state handle leaks in SMP2P init
From: patchwork-bot+netdevbpf @ 2026-06-27  1:50 UTC (permalink / raw)
  To: haoxiang_li2024
  Cc: elder, andrew+netdev, davem, edumazet, kuba, pabeni, netdev,
	linux-kernel, stable
In-Reply-To: <20260624065955.2822765-1-haoxiang_li2024@163.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 24 Jun 2026 14:59:55 +0800 you wrote:
> ipa_smp2p_init() acquires two Qualcomm SMEM state handles with
> qcom_smem_state_get(). However, neither the init error paths
> nor ipa_smp2p_exit() release them.
> 
> Release both handles with qcom_smem_state_put() in the init
> error paths and in ipa_smp2p_exit().
> 
> [...]

Here is the summary with links:
  - [net,v2] net: ipa: fix SMEM state handle leaks in SMP2P init
    https://git.kernel.org/netdev/net/c/96ca1e658ae4

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* [PATCH net] net/smc: fix UAF in smc_cdc_rx_handler() by pinning the socket
From: Xiang Mei @ 2026-06-27  1:49 UTC (permalink / raw)
  To: D . Wythe, Dust Li, Sidraya Jayagond, Wenjia Zhang,
	Mahanta Jambigi, Tony Lu, Wen Gu, netdev
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Hans Wippel, linux-rdma, linux-s390, Weiming Shi,
	Xiang Mei

smc_cdc_rx_handler() looks up the connection by token under the link
group's conns_lock, drops the lock, and then dereferences conn and the
smc_sock derived from it, ending in sock_hold(&smc->sk) inside
smc_cdc_msg_recv(). No reference is held across the lock release.

The only reference pinning the socket while the connection is
discoverable in the link group is taken in smc_lgr_register_conn()
(sock_hold) and dropped in __smc_lgr_unregister_conn() (sock_put), both
under conns_lock. Once the handler drops conns_lock, a concurrent
close() -> smc_release() -> smc_conn_free() -> smc_lgr_unregister_conn()
can drop that reference and free the smc_sock, so the handler's later
sock_hold() runs on freed memory:

  WARNING: lib/refcount.c:25 at refcount_warn_saturate
  Workqueue: rxe_wq do_work
   refcount_warn_saturate (lib/refcount.c:25)
   smc_cdc_msg_recv (net/smc/smc_cdc.c:430)
   smc_cdc_rx_handler (net/smc/smc_cdc.c:502)
   smc_wr_rx_tasklet_fn (net/smc/smc_wr.c:445)
   tasklet_action_common (kernel/softirq.c:938)
   handle_softirqs (kernel/softirq.c:622)
  Kernel panic - not syncing: panic_on_warn set

Only SMC-R is affected. The SMC-D receive tasklet is stopped by
tasklet_kill(&conn->rx_tsklet) in smc_conn_free() before the connection
is unregistered, so it cannot run concurrently with the free.

Take the socket reference while still holding conns_lock, so the
registration reference can no longer be the last one, and drop it once
the handler is done.

Fixes: d7b0e37c1ac1 ("net/smc: restructure CDC message reception")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Xiang Mei <xmei5@asu.edu>
---
 net/smc/smc_cdc.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
index 619b3bab3824..b809139d7e87 100644
--- a/net/smc/smc_cdc.c
+++ b/net/smc/smc_cdc.c
@@ -483,21 +483,27 @@ static void smc_cdc_rx_handler(struct ib_wc *wc, void *buf)
 	lgr = smc_get_lgr(link);
 	read_lock_bh(&lgr->conns_lock);
 	conn = smc_lgr_find_conn(ntohl(cdc->token), lgr);
+	if (conn && !conn->out_of_sync)
+		sock_hold(&container_of(conn, struct smc_sock, conn)->sk);
+	else
+		conn = NULL;
 	read_unlock_bh(&lgr->conns_lock);
-	if (!conn || conn->out_of_sync)
+	if (!conn)
 		return;
 	smc = container_of(conn, struct smc_sock, conn);
 
 	if (cdc->prod_flags.failover_validation) {
 		smc_cdc_msg_validate(smc, cdc, link);
-		return;
+		goto out;
 	}
 	if (smc_cdc_before(ntohs(cdc->seqno),
 			   conn->local_rx_ctrl.seqno))
 		/* received seqno is old */
-		return;
+		goto out;
 
 	smc_cdc_msg_recv(smc, cdc);
+out:
+	sock_put(&smc->sk);
 }
 
 static struct smc_wr_rx_handler smc_cdc_rx_handlers[] = {
-- 
2.43.0
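
The ordering the patch enforces (take the handler's reference while the
lookup lock is still held) can be sketched as a single-threaded model. The
lock here is only a boolean that checks ordering, and struct toy_sock,
struct toy_lgr and the toy_*() functions are hypothetical names rather
than SMC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_sock {
	int refs;
};

struct toy_lgr {
	bool locked;
	struct toy_sock *conn;	/* registered connection, or NULL */
};

static void toy_lock(struct toy_lgr *l)
{
	assert(!l->locked);
	l->locked = true;
}

static void toy_unlock(struct toy_lgr *l)
{
	assert(l->locked);
	l->locked = false;
}

/* Lookup that pins the socket before the lock is dropped. */
struct toy_sock *toy_find_and_hold(struct toy_lgr *l)
{
	struct toy_sock *sk;

	toy_lock(l);
	sk = l->conn;
	if (sk)
		sk->refs++;	/* registration reference is still held here */
	toy_unlock(l);
	return sk;
}

/* Unregister drops the registration reference under the same lock. */
void toy_unregister(struct toy_lgr *l)
{
	toy_lock(l);
	if (l->conn) {
		l->conn->refs--;
		l->conn = NULL;
	}
	toy_unlock(l);
}

void toy_put(struct toy_sock *sk)
{
	assert(sk->refs > 0);
	sk->refs--;
}
```

Because the handler's reference is taken before the unlock, an unregister
that runs right after the lookup cannot drop the count to zero while the
handler still uses the socket.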



* Re: [PATCH v2] netdevsim: fix use-after-free in nsim_create and __nsim_dev_port_del
From: Jakub Kicinski @ 2026-06-27  1:48 UTC (permalink / raw)
  To: Hrushiraj Gandhi
  Cc: Simon Horman, Andrew Lunn, David S . Miller, Eric Dumazet,
	Paolo Abeni, Jiri Pirko, netdev, linux-kernel, bpf,
	syzbot+6c25f4750230faf70be9
In-Reply-To: <20260623144447.255326-1-hrushirajg23@gmail.com>

On Tue, 23 Jun 2026 20:14:47 +0530 Hrushiraj Gandhi wrote:
> Fix both paths by calling debugfs_remove_recursive() on the port's
> ddir before every free_netdev() call. The subsequent
> nsim_dev_port_debugfs_exit() calls become harmless no-ops since ddir is
> set to NULL.

Looks like the wrong fix. All features clean up after themselves with
the exception of ethtool. Save the ethtool ddir and remove just that
one. This will align with how the other features behave.
-- 
pw-bot: cr


* Re: [PATCH] selftests: Open /dev/udmabuf O_RDONLY
From: Jakub Kicinski @ 2026-06-27  1:09 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: kraxel, vivek.kasireddy, Shuah Khan, Andrew Lunn, David S. Miller,
	Eric Dumazet, Paolo Abeni, linux-kselftest, linux-kernel, netdev,
	bpf
In-Reply-To: <20260625181557.1086105-1-tjmercier@google.com>

On Thu, 25 Jun 2026 11:15:55 -0700 T.J. Mercier wrote:
> Write permissions on the /dev/udmabuf device file are not required to
> issue ioctls and allocate udmabufs. Applications should be opening this
> file as O_RDONLY. The BPF dmabuf_iter selftest already does this. [1]
> 
> Remove the write access mode from the drivers/dma-buf/udmabuf.c and
> drivers/net/hw/ncdevmem.c selftests.

You need to explain "why", too. Why change it if it clearly
worked for everyone running this test until now?
-- 
pw-bot: cr


* Re: [PATCH net-next] caif: annotate phyinfo lookup under config lock
From: Jakub Kicinski @ 2026-06-27  1:07 UTC (permalink / raw)
  To: Runyu Xiao
  Cc: davem, edumazet, pabeni, horms, netdev, linux-kernel, jianhao.xu
In-Reply-To: <20260626042440.2013499-1-runyu.xiao@seu.edu.cn>

On Fri, 26 Jun 2026 12:24:40 +0800 Runyu Xiao wrote:
> cfcnfg_get_phyinfo_rcu() is used by both RCU read-side paths and config
> update paths that hold cnfg->lock before adding or deleting entries from
> cnfg->phys. The helper walks the list with list_for_each_entry_rcu(),
> but does not tell lockdep about the config-lock-protected callers.
> 
> Pass lockdep_is_held(&cnfg->lock) to the iterator. RCU-reader callers
> remain valid, and CONFIG_PROVE_RCU_LIST can now see the non-RCU
> protection used by the add/delete paths.
> 
> This was found by our static analysis tool and then manually reviewed
> against the current tree. The dynamic triage evidence is a
> target-matched CONFIG_PROVE_RCU_LIST warning; the change is limited
> to documenting the existing protection contract.

This code was removed a couple of releases ago.


* Re: [PATCH] MAINTAINERS: Update Jason Wang's email address
From: Jakub Kicinski @ 2026-06-27  1:04 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, virtualization, netdev, eperezma, kvm, linux-kernel
In-Reply-To: <20260626022039.96139-1-jasowang@redhat.com>

On Fri, 26 Jun 2026 10:20:38 +0800 Jason Wang wrote:
> I will use jasowangio@gmail.com for future review and discussion

Do you want to add a mailmap entry, too?
Otherwise I think you'll get CCed twice (once for MAINTAINERS and once
because you've given tags to previous changes)

^ permalink raw reply

* Re: [PATCH net v2] octeontx2-pf: check DMAC extraction support before filtering
From: Harshitha Ramamurthy @ 2026-06-27  0:50 UTC (permalink / raw)
  To: nshettyj
  Cc: netdev, linux-kernel, sgoutham, gakula, sbhatta, hkelam,
	bbhushan2, andrew+netdev, davem, edumazet, kuba, pabeni, naveenm,
	tduszynski, sumang
In-Reply-To: <20260626062329.871990-1-nshettyj@marvell.com>

On Thu, Jun 25, 2026 at 11:24 PM <nshettyj@marvell.com> wrote:
>
> From: Suman Ghosh <sumang@marvell.com>
>
> Currently, configuring a VF MAC address via the PF (e.g., 'ip link
> set <pf> vf 0 mac <mac>') blindly attempts to install a DMAC-based
> hardware filter. However, the hardware parser profile might not
> support DMAC extraction.
>
> Check if the hardware parsing profile supports DMAC extraction
> before adding the filter. Additionally, emit a warning message
> to inform the operator if the MAC filter installation fails due
> to missing DMAC extraction support.
>
> Fixes: f0c2982aaf98 ("octeontx2-pf: Add support for SR-IOV management functions")
> Signed-off-by: Suman Ghosh <sumang@marvell.com>
> Signed-off-by: Nitin Shetty J <nshettyj@marvell.com>
>
> ---
> v2:
>  - Move the DMAC extraction check from otx2_set_vf_mac() into
>    otx2_do_set_vf_mac() which already holds pf->mbox.lock, so all
>    mbox operations are under a single lock/unlock pair. All error
>    paths now use the existing goto-out pattern, eliminating the
>    scattered mutex_unlock() + return calls from v1.
>  - Return -EOPNOTSUPP instead of 0 when DMAC extraction is not
>    supported, so the caller gets an explicit error rather than a
>    silent success.

Please leave a gap of at least 24 hours before posting a new revision,
and don't post patches in reply to a previous posting, as documented
in:

https://www.kernel.org/doc/html/next/process/maintainer-netdev.html

> ---
>  .../ethernet/marvell/octeontx2/nic/otx2_pf.c  | 33 +++++++++++++++++++
>  1 file changed, 33 insertions(+)
>
> diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
> index b63df5737ff2..dc7e4a225dd0 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
> @@ -2517,10 +2517,43 @@ EXPORT_SYMBOL(otx2_config_hwtstamp_set);
>
>  static int otx2_do_set_vf_mac(struct otx2_nic *pf, int vf, const u8 *mac)
>  {
> +       struct npc_get_field_status_req *freq;
> +       struct npc_get_field_status_rsp *frsp;
>         struct npc_install_flow_req *req;
>         int err;
>
>         mutex_lock(&pf->mbox.lock);
> +
> +       /* Skip installing the DMAC filter if the hardware parser profile
> +        * does not support DMAC extraction.
> +        */
> +       freq = otx2_mbox_alloc_msg_npc_get_field_status(&pf->mbox);
> +       if (!freq) {
> +               err = -ENOMEM;
> +               goto out;
> +       }

I noticed that otx2_set_vf_mac() copies the MAC address into the vf
config structure before the programming is successful. Is that
intended?

> +
> +       freq->field = NPC_DMAC;
> +       if (otx2_sync_mbox_msg(&pf->mbox)) {
> +               err = -EINVAL;
> +               goto out;
> +       }
> +
> +       frsp = (struct npc_get_field_status_rsp *)otx2_mbox_get_rsp
> +              (&pf->mbox.mbox, 0, &freq->hdr);
> +       if (IS_ERR(frsp)) {
> +               err = PTR_ERR(frsp);
> +               goto out;
> +       }
> +
> +       if (!frsp->enable) {
> +               netdev_warn(pf->netdev,
> +                           "VF %d MAC filter not installed: DMAC extraction not supported by parser profile\n",
> +                           vf);

Would a netdev_warn_ratelimited() be better here to avoid spamming the log?

> +               err = -EOPNOTSUPP;
> +               goto out;
> +       }
> +
>         req = otx2_mbox_alloc_msg_npc_install_flow(&pf->mbox);
>         if (!req) {
>                 err = -ENOMEM;
> --
> 2.48.1
>
>

^ permalink raw reply
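
The v2 changelog in the octeontx2-pf thread above describes taking the lock once and routing every error path through a single "out" label. A minimal userspace sketch of that shape follows, using a pthread mutex in place of pf->mbox.lock. struct demo_dev, demo_set_filter() and the field_supported flag are hypothetical stand-ins for illustration only; they are not the otx2 structures or mailbox API.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device context; not the otx2 structures. */
struct demo_dev {
	pthread_mutex_t lock;	/* plays the role of the mbox lock */
	bool field_supported;	/* stand-in for "DMAC extraction supported" */
	int filters_installed;
};

/*
 * One lock/unlock pair; every failure sets err and jumps to "out",
 * so no error path can return with the lock still held.
 */
int demo_set_filter(struct demo_dev *dev, int id)
{
	int err = 0;

	pthread_mutex_lock(&dev->lock);

	if (id < 0) {
		err = -EINVAL;
		goto out;
	}

	/* Report an explicit error instead of a silent success. */
	if (!dev->field_supported) {
		err = -EOPNOTSUPP;
		goto out;
	}

	dev->filters_installed++;
out:
	pthread_mutex_unlock(&dev->lock);
	return err;
}
```

Because the unlock lives only at the label, adding another check later means adding one more "goto out" rather than another unlock-and-return pair.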

