Netdev List


* [PATCH] Bluetooth: Add HCI_QUIRK_NO_SCAN_WHILE_CONNECTED for combo chips
From: StefanCondorache @ 2026-04-19  7:24 UTC
  To: linux-bluetooth
  Cc: marcel, luiz.dentz, davem, edumazet, kuba, pabeni, horms, netdev,
	linux-kernel, StefanCondorache

Realtek combo chips share a single antenna between Wi-Fi and Bluetooth.
Background LE passive scanning while an active connection exists causes
antenna multiplexing conflicts via Packet Traffic Arbitration (PTA),
resulting in audio stuttering and Wi-Fi packet loss.

Add HCI_QUIRK_NO_SCAN_WHILE_CONNECTED to suppress passive scanning in
hci_update_passive_scan_sync() when active connections are present.
Set this quirk for all Realtek devices in btrtl_set_quirks().

Also add device ID 0x0bda:0xc829 to the btusb Realtek device table.

Signed-off-by: StefanCondorache <condorachest@gmail.com>
---
 drivers/bluetooth/btrtl.c   |  6 ++++++
 drivers/bluetooth/btusb.c   |  2 ++
 include/net/bluetooth/hci.h | 11 +++++++++++
 net/bluetooth/hci_sync.c    |  7 +++++++
 4 files changed, 26 insertions(+)

diff --git a/drivers/bluetooth/btrtl.c b/drivers/bluetooth/btrtl.c
index 62f9d4df3a4f..00dfba656970 100644
--- a/drivers/bluetooth/btrtl.c
+++ b/drivers/bluetooth/btrtl.c
@@ -1299,6 +1299,12 @@ EXPORT_SYMBOL_GPL(btrtl_download_firmware);
 
 void btrtl_set_quirks(struct hci_dev *hdev, struct btrtl_device_info *btrtl_dev)
 {
+	/* Realtek combo chips share the antenna between Wi-Fi and
+	 * Bluetooth. Suppress passive scanning while connected to
+	 * prevent coexistence issues.
+	 */
+	hci_set_quirk(hdev, HCI_QUIRK_NO_SCAN_WHILE_CONNECTED);
+
 	/* Enable controller to do both LE scan and BR/EDR inquiry
 	 * simultaneously.
 	 */
diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
index 5f57953393be..87972f5fc567 100644
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@ -516,6 +516,8 @@ static const struct usb_device_id quirks_table[] = {
 	/* Realtek 8822CU Bluetooth devices */
 	{ USB_DEVICE(0x13d3, 0x3549), .driver_info = BTUSB_REALTEK |
 						     BTUSB_WIDEBAND_SPEECH },
+	{ USB_DEVICE(0x0bda, 0xc829), .driver_info = BTUSB_REALTEK |
+							 BTUSB_WIDEBAND_SPEECH },
 
 	/* Realtek 8851BE Bluetooth devices */
 	{ USB_DEVICE(0x0bda, 0xb850), .driver_info = BTUSB_REALTEK },
diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
index 572b1c620c5d..8466dc52aeca 100644
--- a/include/net/bluetooth/hci.h
+++ b/include/net/bluetooth/hci.h
@@ -378,6 +378,17 @@ enum {
 	 */
 	HCI_QUIRK_BROKEN_READ_PAGE_SCAN_TYPE,
 
+	/* When this quirk is set, the controller suppresses passive LE
+	 * background scanning while an active connection exists.
+	 * This is required for combo chips with shared Wi-Fi/Bluetooth
+	 * antennas to prevent coexistence issues causing audio drops
+	 * and Wi-Fi packet loss.
+	 *
+	 * This quirk can be set before hci_register_dev is called or
+	 * during the hdev->setup vendor callback.
+	 */
+	HCI_QUIRK_NO_SCAN_WHILE_CONNECTED,
+
 	__HCI_NUM_QUIRKS,
 };
 
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
index fd3aacdea512..5e30e725efa2 100644
--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -3194,6 +3194,13 @@ int hci_update_passive_scan_sync(struct hci_dev *hdev)
 	if (hdev->discovery.state != DISCOVERY_STOPPED)
 		return 0;
 
+	/* If the controller requires no scanning while connected,
+	 * suppress passive scanning when an active connection exists.
+	 */
+	if (hci_test_quirk(hdev, HCI_QUIRK_NO_SCAN_WHILE_CONNECTED) &&
+	    !list_empty(&hdev->conn_hash.list))
+		return 0;
+
 	/* Reset RSSI and UUID filters when starting background scanning
 	 * since these filters are meant for service discovery only.
 	 *
-- 
2.53.0
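
The early-return logic added to hci_update_passive_scan_sync() can be modelled in userspace as below. This is only a sketch: struct toy_hdev and toy_passive_scan_update_proceeds() are hypothetical names invented for illustration, and the three fields merely stand in for the discovery state, the new quirk bit and the connection list that the patch tests.

```c
#include <stdbool.h>

/* Toy stand-in for the few hci_dev fields the check looks at; not the kernel struct. */
struct toy_hdev {
	bool discovery_active;  /* models hdev->discovery.state != DISCOVERY_STOPPED */
	bool quirk_no_scan;     /* models HCI_QUIRK_NO_SCAN_WHILE_CONNECTED being set */
	unsigned int num_conns; /* models a non-empty hdev->conn_hash.list when > 0 */
};

/* Returns true when the passive scan update would proceed, false when the
 * function would return early without touching the scan state.
 */
static bool toy_passive_scan_update_proceeds(const struct toy_hdev *hdev)
{
	if (hdev->discovery_active)
		return false;

	/* New check from the patch: quirk set and at least one connection. */
	if (hdev->quirk_no_scan && hdev->num_conns > 0)
		return false;

	return true;
}
```

In this model a device with the quirk and no connections still updates its passive scan, while the same device with one connection returns early. The early return leaves any existing scan state untouched; the patch itself does not show what happens to a scan that is already running.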



* Re: [PATCH net v6 2/2] pppoe: drop PFC frames
From: Qingfang Deng @ 2026-04-19  7:53 UTC
  To: Jakub Kicinski
  Cc: linux-ppp, andrew+netdev, davem, edumazet, pabeni, hataegu0826,
	horms, kees, kuniyu, bigeasy, gnault, ericwouds, semen.protsenko,
	netdev, linux-kernel, paulus, jaco, carlsonj, wojciech.drewek,
	marcin.szycik
In-Reply-To: <20260418191145.213625-1-kuba@kernel.org>

Hi,

On 4/19/2026 3:11 AM, Jakub Kicinski wrote:
> [ ... ]
>> @@ -434,6 +434,12 @@ static int pppoe_rcv(struct sk_buff *skb, struct net_device *dev,
>>   	if (skb->len < len)
>>   		goto drop;
>>   
>> +	/* skb->data points to the PPP protocol header after skb_pull_rcsum.
>> +	 * Drop PFC frames.
>> +	 */
>> +	if (ppp_skb_is_compressed_proto(skb))
> 
> Does this code safely access the PPP protocol byte?
> 
> After pulling the PPPoE header, skb->data points to the inner payload.
> However, the new ppp_skb_is_compressed_proto() helper unconditionally
> dereferences skb->data[0].
> 
> If a crafted PPPoE packet is received with an inner length of 0 and no
> Ethernet padding, the linear buffer might end exactly after the PPPoE header.
> In that scenario, this would read past the allocated linear buffer.
> 
> Even if the inner length is greater than 0, could the payload reside entirely
> in non-linear paged fragments, causing an out-of-bounds read?
> 
> Would it be safer to verify the packet has at least 1 byte and use
> pskb_may_pull() to ensure the protocol byte is in the linear region before
> inspecting it, perhaps after the pskb_trim_rcsum() call?

I already updated the pskb_may_pull() call above, from sizeof(struct pppoe_hdr)
(6 bytes) to PPPOE_SES_HLEN (8 bytes), so the PPP protocol field is guaranteed
to be in the linear area before it is inspected.

Regards,
Qingfang
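
The length argument above can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel code: toy_check_pfc() and the TOY_ constants are hypothetical names, the linear_len parameter models what pskb_may_pull() guarantees, and the "odd first byte means compressed" test follows the usual PPP Protocol-Field-Compression rule rather than the actual body of ppp_skb_is_compressed_proto(), which is not shown in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PPPOE_HDR_LEN  6 /* sizeof(struct pppoe_hdr), per the reply above */
#define TOY_PPPOE_SES_HLEN 8 /* PPPoE header plus 2-byte PPP protocol field */

/* Returns false when fewer than TOY_PPPOE_SES_HLEN bytes are linear, which
 * models a failed pskb_may_pull() and therefore a dropped frame. Otherwise
 * sets *compressed from the first protocol byte and returns true.
 */
static bool toy_check_pfc(const uint8_t *frame, size_t linear_len,
			  bool *compressed)
{
	if (linear_len < TOY_PPPOE_SES_HLEN)
		return false;

	/* Safe: index 6 is below the 8 bytes known to be linear. */
	*compressed = (frame[TOY_PPPOE_HDR_LEN] & 0x01) != 0;
	return true;
}
```

With an 8-byte guarantee, the byte right after the 6-byte PPPoE header is always readable, so a frame whose linear area ends exactly at the PPPoE header is rejected before the protocol byte is touched.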


* Re: [PATCH nf] netfilter: xt_TCPMSS: check skb_dst before path-MTU clamping
From: Florian Westphal @ 2026-04-19  8:00 UTC
  To: Weiming Shi
  Cc: Pablo Neira Ayuso, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Phil Sutter, Simon Horman, netfilter-devel, coreteam,
	netdev, Xiang Mei
In-Reply-To: <aePiSwmP6YEQ4mNE@strlen.de>

Florian Westphal <fw@strlen.de> wrote:
> Weiming Shi <bestswngs@gmail.com> wrote:
> > When TCPMSS with CLAMP_PMTU is used via nft_compat in a non-base
> > chain, par->hook_mask is set to 0, bypassing the checkentry hook
> > validation. The target can then run at PRE_ROUTING where skb_dst is
> > NULL, causing a null-ptr-deref in tcpmss_mangle_packet():
> > 
> >  KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
> >  RIP: 0010:tcpmss_mangle_packet (include/net/dst.h:219 net/netfilter/xt_TCPMSS.c:105)
> >   tcpmss_tg4 (net/netfilter/xt_TCPMSS.c:202)
> >   nft_target_eval_xt (net/netfilter/nft_compat.c:87)
> >   nft_do_chain (net/netfilter/nf_tables_core.c:287)
> >   nf_hook_slow (net/netfilter/core.c:623)
> > 
> > Check skb_dst() for NULL before calling dst_mtu().
> 
> FWIW I will apply this patch even though it's wrong.
> 
> nft_compat.c is just too broken, I don't see how it can be
> fixed in any reasonable amount of time.

net/netfilter/xt_TCPMSS.c:          (par->hook_mask & ~((1 << NF_INET_FORWARD) |
net/netfilter/xt_addrtype.c:    if (par->hook_mask & ((1 << NF_INET_PRE_ROUTING) |
net/netfilter/xt_devgroup.c:        par->hook_mask & ~((1 << NF_INET_PRE_ROUTING) |
net/netfilter/xt_physdev.c:         par->hook_mask & (1 << NF_INET_LOCAL_OUT)) {
net/netfilter/xt_policy.c:      if (par->hook_mask & ((1 << NF_INET_POST_ROUTING) |
net/netfilter/xt_set.c:              (par->hook_mask & ~(1 << NF_INET_FORWARD |

Looking at this, I don't see an alternative to mixing nft-specific bits into
x_tables, i.e.:

diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -187,6 +187,8 @@ struct xt_target {
        /* Should return 0 on success or an error code otherwise (-Exxxx). */
        int (*checkentry)(const struct xt_tgchk_param *);
 
+       int (*nft_validate_chain)(const void *targinfo, unsigned int hook_mask);
+
        /* Called when entry of this type deleted. */
        void (*destroy)(const struct xt_tgdtor_param *);
 #ifdef CONFIG_NETFILTER_XTABLES_COMPAT

.. and then call that from nft_compat.c for TCPMSS.
Same for matches.
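
A callback of that shape could look roughly like the sketch below. Everything here is an assumption made for illustration: toy_validate_chain() and TOY_CLAMP_ALLOWED are hypothetical names, the clamp_pmtu flag stands in for the target info, and the allowed set (FORWARD, LOCAL_OUT, POST_ROUTING) is my recollection of the existing xt_TCPMSS checkentry test, which is only partially visible in the grep output above.

```c
#include <errno.h>

/* Hook numbers in the order used by enum nf_inet_hooks. */
enum {
	NF_INET_PRE_ROUTING,
	NF_INET_LOCAL_IN,
	NF_INET_FORWARD,
	NF_INET_LOCAL_OUT,
	NF_INET_POST_ROUTING,
};

/* Hooks where path-MTU clamping has a route attached (assumed set). */
#define TOY_CLAMP_ALLOWED ((1u << NF_INET_FORWARD) |   \
			   (1u << NF_INET_LOCAL_OUT) | \
			   (1u << NF_INET_POST_ROUTING))

/* Returns 0 when every hook in hook_mask is allowed, -EINVAL otherwise. */
static int toy_validate_chain(int clamp_pmtu, unsigned int hook_mask)
{
	if (clamp_pmtu && (hook_mask & ~TOY_CLAMP_ALLOWED))
		return -EINVAL;
	return 0;
}
```

The sketch also shows why the current behaviour is a problem: a hook_mask of 0, as reported for non-base chains, passes the test trivially, so the check is only meaningful if it is run with the hooks from which the chain is actually reachable.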



* Re: pre-boot plugged SFP autoneg advertisement
From: markus.stockhausen @ 2026-04-19  8:49 UTC
  To: 'Andrew Lunn'
  Cc: linux, hkallweit1, netdev, 'Jonas Jelonek', jan, nbd
In-Reply-To: <90958cc3-e291-44ff-8fc3-102c0f62a269@lunn.ch>

Hi Andrew,

> From: Andrew Lunn <andrew@lunn.ch>
> Subject: Re: pre-boot plugged SFP autoneg advertisement
> 
> > On Sat, Apr 18, 2026 at 11:27:40AM +0200, markus.stockhausen@gmx.de wrote:
> > Hi,
> > 
> > I'm currently analyzing an issue where a pre-boot-plugged SFP module 
> > comes up with autoneg=no advertisement during boot. After an
> > unplug/replug autoneg=yes advertisement is chosen. 
> > 
> > The following addition in phylink_start() just before the call to
> > phylink_mac_initial_config() mitigates this.
> > 
> > +  /* If an SFP module was already present before phylink_start() was
> > +   * called, phylink_sfp_set_config() was unable to call
> > +   * phylink_mac_initial_config() as phylink was not yet started.
> > +   * Ensure the SFP capabilities are reflected in advertising.
> > +   */
> > +  if (pl->sfp_bus && !linkmode_empty(pl->sfp_support))
> > +    linkmode_copy(pl->link_config.advertising, pl->sfp_support);
>
> Let me see if i have the call chain correct. This is net-next/main
> from today.
>
> phylink_sfp_connect_phy() ->
>   phylink_sfp_config_phy
>
>         if (changed && !test_bit(PHYLINK_DISABLE_STOPPED,
>                                  &pl->phylink_disable_state))
>                 phylink_mac_initial_config(pl, false);
>
> You are saying PHYLINK_DISABLE_STOPPED is set, so
> phylink_mac_initial_config() is not called.
>
> What i don't see is how phylink_mac_initial_config() does the
> linkmode_copy() you are adding.

I took that hint/question and dug deeper, adding further debug output
to each and every linkmode_copy() call. I think I found the culprit in
a userspace ethtool call. For now I assume it comes from OpenWrt netifd.
My latest trace is below, including the original (wrong) idea.

Thank you very much for taking the time and for your assistance.

Markus

[    3.301299] XXXX phylink_create lan12 set pl->link_config.advertising (autoneg = 1)
[    3.309954] XXXX phylink_parse_mode lan12 set pl->link_config.advertising (autoneg = 1)
[    3.318964] XXX sfp_module_insert lan12 called
[    3.323935] XXXX phylink_sfp_config_optical lan12 set config.advertising (autoneg = 1)
[    3.332815] XXXX phylink_validate_one lan12 set tmp_supported (autoneg = 1)
[    3.340629] XXXX phylink_validate_mask lan12 set supported (autoneg = 1)
[    3.348165] XXXX phylink_validate_mask lan12 set state->advertising (autoneg = 1)
[    3.356527] rtl83xx-switch 1b000000.switchcore:ethernet-switch lan12 (uninitialized): XXX phylink_sfp_set_config requesting link mode inband/1000base-x with support 0000000,00000000,00000200,00006440
--- ETHTOOL CALL HERE ---
[   81.213726] XXXX phylink_ethtool_ksettings_set lan12 start got config.advertising (autoneg = 1)
[   81.223542] XXXX phylink_ethtool_ksettings_set lan12 set accoring to kset->base.autoneg (autoneg = 0)
[   81.233961] CPU: 0 UID: 0 PID: 1470 Comm: netifd Tainted: G           O 6.18.21 #0 NONE
[   81.234010] Tainted: [O]=OOT_MODULE
[   81.234017] Hardware name: Zyxel XGS1210-12 A1 Switch
[   81.234026] Stack : 823a3bbc 80139d20 00000000 00000001 00000000 00000000 00000000 00000000
[   81.234094]         00000000 00000000 00000000 00000000 00000000 00000001 823a3b78 82040d00
[   81.234152]         00000000 00000000 80992870 823a3a10 00000000 ffffefff 00000001 00000224
[   81.234213]         00000226 823a39d4 00000226 000019c8 00000001 00000000 80992870 80a00000
[   81.234273]         82764648 00000016 00000016 82764628 00000000 80a913b0 00000000 81990000
[   81.234334]         ...
[   81.234350] Call Trace:
[   81.234356] [<80115e48>] show_stack+0x28/0xf0
[   81.234407] [<8010fc78>] dump_stack_lvl+0x70/0xb0
[   81.234432] [<80592718>] phylink_ethtool_ksettings_set+0x58c/0x6a4
[   81.234479] [<806bb890>] ethtool_set_link_ksettings+0xbc/0x198
[   81.234516] [<806be01c>] __dev_ethtool+0xfe0/0x1a1c
[   81.234550] [<806beb24>] dev_ethtool+0xcc/0x24c
[   81.234575] [<80678770>] dev_ioctl+0x30c/0x5f4
[   81.234616] [<80603ebc>] sock_ioctl+0x2bc/0x470
[   81.234642] [<8036e530>] sys_ioctl+0xb4/0x120
[   81.234683] [<8011edec>] syscall_common+0x34/0x58
[   81.234715]
[   81.234723] XXXX 2 phylink_ethtool_ksettings_set lan12 set pl->link_config.advertising (autoneg = 0)
-----------
[   81.375070] rtl83xx-switch 1b000000.switchcore:ethernet-switch lan12: XXX phylink_start configuring for inband/1000base-x link mode
--- INITIAL FIX/IDEA HERE ---
[   81.388408] XXX phylink_start lan12 sfp_bus set and linkmode not empty -> would run linkmode_copy()
-----------
[   81.398688] XXX phylink_mac_initial_config lan12 called with force_restart = 1
[   81.406752] rtl83xx-switch 1b000000.switchcore:ethernet-switch lan12: XXX major config, requested inband/1000base-x
[   81.418459] XXXX major_config_entry lan12: autoneg_adv=0 autoneg_sfp=1 sfp_may_have_phy=0
[   81.427612] XXXX phylink_pcs_neg_mode ENTRY lan12: pl->pcs_neg_mode=0x0
[   81.435102] XXXX phylink_pcs_neg_mode lan12 advertising autoneg=0
[   81.442034] rtl83xx-switch 1b000000.switchcore:ethernet-switch lan12: XXX interface 1000base-x inband modes: pcs=03 phy=00
[   81.454495] XXXX phylink_pcs_neg_mode lan12 base-x without phy
[   81.461134] XXXX phylink_pcs_neg_mode EXIT lan12 pl->pcs_neg_mode = 0x40 pl->act_link_an_mode = 0x2
[   82.085493] XXXX phylink_mac_pcs_get_state lan12 set state->advertising (autoneg = 0)
[   82.094391] XXXX phylink_mac_pcs_get_state lan12 autoneg is 0



* [PATCH net v2] net: iptunnel: fix stale transport header after GRE/TEB decap
From: Jiayuan Chen @ 2026-04-19  9:08 UTC
  To: netdev
  Cc: Jiayuan Chen, syzbot+83181a31faf9455499c5, David S. Miller,
	David Ahern, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Pravin B Shelar, Tom Herbert, linux-kernel

syzbot reported a BUG.

I found that after GRE decapsulation in gretap/ip6gretap paths, the
transport_header becomes stale with a negative offset. The sequence is:

1. Before decap, transport_header points to the outer L4 (GRE) header.
2. __iptunnel_pull_header() calls skb_pull_rcsum() to advance skb->data
   past the GRE header, but does not update transport_header.
3. For TEB (gretap/ip6gretap), eth_type_trans() in ip_tunnel_rcv() /
   __ip6_tnl_rcv() further pulls ETH_HLEN (14 bytes) from skb->data.

After these two pulls, skb->data has moved forward while transport_header
still points to the old (now behind skb->data) position, resulting in a
negative skb_transport_offset(): typically -4 after GRE pull alone, or
-18 after GRE + inner Ethernet pull.

In the normal case where the inner frame is a recognizable protocol
(e.g., IPv4/TCP), the transport_header is subsequently overwritten by
ip_rcv_core() (or inet_gro_receive() on the GRO path) via
skb_set_transport_header(), and the stale value never reaches downstream
consumers. However, if the inner frame cannot be parsed (e.g.,
eth_type_trans() classifies it as ETH_P_802_2 due to a zero/invalid
inner Ethernet header), neither rescue runs, and the stale offset
persists into __netif_receive_skb_core().

When this stale offset is combined with contradictory GSO metadata (e.g.,
SKB_GSO_TCPV4 injected via virtio_net_hdr from a tun device),
qdisc_pkt_len_segs_init() trusts the negative offset: the unsigned
wraparound makes pskb_may_pull() effectively a no-op, and __tcp_hdrlen()
then reads from an invalid memory location, causing a use-after-free.

The UAF only triggers on the GSO path, where qdisc_pkt_len_segs_init()
dereferences the transport header to compute per-segment length. Fix
this by introducing iptunnel_rebuild_transport_header(), which is a
no-op for non-GSO packets and otherwise re-probes the transport header
via the flow dissector. If re-probing fails, the contradictory GSO
metadata is cleared via skb_gso_reset() so downstream consumers cannot
trust stale offsets. Restricting the rebuild to GSO packets keeps the
flow-dissector cost off the common rx fast path.

reproducer: https://gist.github.com/mrpre/5ba943fd86367af748b70de99263da4b

Link: https://syzkaller.appspot.com/bug?extid=83181a31faf9455499c5
Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
Fixes: 0d3c703a9d17 ("ipv6: Cleanup IPv6 tunnel receive path")
Reported-by: syzbot+83181a31faf9455499c5@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/69de2bee.a00a0220.475f0.0041.GAE@google.com/T/
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---

As a follow-up for production reliability, I am wondering whether we
can extend the existing safety net in __netif_receive_skb_core() to
also handle set-but-negative transport_header:

        if (!skb_transport_header_was_set(skb) ||
            skb_transport_offset(skb) < 0)
                skb_reset_transport_header(skb);
---
 include/net/ip_tunnels.h | 12 ++++++++++++
 net/ipv4/ip_tunnel.c     |  2 ++
 net/ipv6/ip6_tunnel.c    |  2 ++
 3 files changed, 16 insertions(+)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index d708b66e55cd..9b4e662833a1 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -662,6 +662,18 @@ static inline int iptunnel_pull_offloads(struct sk_buff *skb)
 	return 0;
 }

+static inline void iptunnel_rebuild_transport_header(struct sk_buff *skb)
+{
+	if (!skb_is_gso(skb))
+		return;
+
+	skb->transport_header = (typeof(skb->transport_header))~0U;
+	skb_probe_transport_header(skb);
+
+	if (!skb_transport_header_was_set(skb))
+		skb_gso_reset(skb);
+}
+
 static inline void iptunnel_xmit_stats(struct net_device *dev, int pkt_len)
 {
 	if (pkt_len > 0) {
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 50d0f5fe4e4c..c46be68cfafa 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -445,6 +445,8 @@ int ip_tunnel_rcv(struct ip_tunnel *tunnel, struct sk_buff *skb,
 	if (tun_dst)
 		skb_dst_set(skb, (struct dst_entry *)tun_dst);

+	iptunnel_rebuild_transport_header(skb);
+
 	gro_cells_receive(&tunnel->gro_cells, skb);
 	return 0;

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 46bc06506470..f95348cf3c77 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -879,6 +879,8 @@ static int __ip6_tnl_rcv(struct ip6_tnl *tunnel, struct sk_buff *skb,
 	if (tun_dst)
 		skb_dst_set(skb, (struct dst_entry *)tun_dst);

+	iptunnel_rebuild_transport_header(skb);
+
 	gro_cells_receive(&tunnel->gro_cells, skb);
 	return 0;

-- 
2.43.0
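
The offset arithmetic described in the commit message can be reproduced with a small userspace model. This is a sketch only: struct toy_skb, toy_pull() and toy_transport_offset() are hypothetical stand-ins, and the starting offset of 100 is an arbitrary value chosen for illustration. The model assumes the transport header is stored as an offset from the buffer head and that the transport offset is that value minus the current data offset.

```c
#include <stdint.h>

/* Simplified stand-in for the sk_buff fields involved; not the kernel struct. */
struct toy_skb {
	uint32_t data_off;         /* offset of the current data pointer from head */
	uint16_t transport_header; /* offset of the L4 header from head */
};

/* Transport header position relative to the current data pointer. */
static int toy_transport_offset(const struct toy_skb *skb)
{
	return (int)skb->transport_header - (int)skb->data_off;
}

/* Advance the data pointer without touching transport_header, which is
 * the behaviour the commit message describes for the decap pulls.
 */
static void toy_pull(struct toy_skb *skb, uint32_t len)
{
	skb->data_off += len;
}
```

Starting with both offsets equal, a 4-byte pull (base GRE header) gives an offset of -4, and a further 14-byte pull (ETH_HLEN) gives -18. Cast to a 32-bit unsigned value, -18 becomes 4294967278, which illustrates the wraparound that the commit message says defeats the later length check.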


* Re: [PATCH net v2] net: iptunnel: fix stale transport header after GRE/TEB decap
From: Eric Dumazet @ 2026-04-19  9:25 UTC
  To: Jiayuan Chen
  Cc: netdev, syzbot+83181a31faf9455499c5, David S. Miller, David Ahern,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Pravin B Shelar,
	Tom Herbert, linux-kernel
In-Reply-To: <20260419090817.127334-1-jiayuan.chen@linux.dev>

On Sun, Apr 19, 2026 at 2:08 AM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
> syzbot reported a BUG.
>
> I found that after GRE decapsulation in gretap/ip6gretap paths, the
> transport_header becomes stale with a negative offset. The sequence is:
>
> 1. Before decap, transport_header points to the outer L4 (GRE) header.
> 2. __iptunnel_pull_header() calls skb_pull_rcsum() to advance skb->data
>    past the GRE header, but does not update transport_header.
> 3. For TEB (gretap/ip6gretap), eth_type_trans() in ip_tunnel_rcv() /
>    __ip6_tnl_rcv() further pulls ETH_HLEN (14 bytes) from skb->data.
>
> After these two pulls, skb->data has moved forward while transport_header
> still points to the old (now behind skb->data) position, resulting in a
> negative skb_transport_offset(): typically -4 after GRE pull alone, or
> -18 after GRE + inner Ethernet pull.
>
> In the normal case where the inner frame is a recognizable protocol
> (e.g., IPv4/TCP), the transport_header is subsequently overwritten by
> ip_rcv_core() (or inet_gro_receive() on the GRO path) via
> skb_set_transport_header(), and the stale value never reaches downstream
> consumers. However, if the inner frame cannot be parsed (e.g.,
> eth_type_trans() classifies it as ETH_P_802_2 due to a zero/invalid
> inner Ethernet header), neither rescue runs, and the stale offset
> persists into __netif_receive_skb_core().
>
> When this stale offset is combined with contradictory GSO metadata (e.g.,
> SKB_GSO_TCPV4 injected via virtio_net_hdr from a tun device),
> qdisc_pkt_len_segs_init() trusts the negative offset: the unsigned
> wraparound makes pskb_may_pull() effectively a no-op, and __tcp_hdrlen()
> then reads from an invalid memory location, causing a use-after-free.
>
> The UAF only triggers on the GSO path, where qdisc_pkt_len_segs_init()
> dereferences the transport header to compute per-segment length. Fix
> this by introducing iptunnel_rebuild_transport_header(), which is a
> no-op for non-GSO packets and otherwise re-probes the transport header
> via the flow dissector. If re-probing fails, the contradictory GSO
> metadata is cleared via skb_gso_reset() so downstream consumers cannot
> trust stale offsets. Restricting the rebuild to GSO packets keeps the
> flow-dissector cost off the common rx fast path.
>
> reproducer: https://gist.github.com/mrpre/5ba943fd86367af748b70de99263da4b
>
> Link: https://syzkaller.appspot.com/bug?extid=83181a31faf9455499c5
> Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
> Fixes: 0d3c703a9d17 ("ipv6: Cleanup IPv6 tunnel receive path")
> Reported-by: syzbot+83181a31faf9455499c5@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/69de2bee.a00a0220.475f0.0041.GAE@google.com/T/
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> ---
>
> As a follow-up for production reliability, I am wondering whether we
> can extend the existing safety net in __netif_receive_skb_core() to
> also handle set-but-negative transport_header:
>
>         if (!skb_transport_header_was_set(skb) ||
>             skb_transport_offset(skb) < 0)
>                 skb_reset_transport_header(skb);
> ---
>  include/net/ip_tunnels.h | 12 ++++++++++++
>  net/ipv4/ip_tunnel.c     |  2 ++
>  net/ipv6/ip6_tunnel.c    |  2 ++
>  3 files changed, 16 insertions(+)
>
> diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
> index d708b66e55cd..9b4e662833a1 100644
> --- a/include/net/ip_tunnels.h
> +++ b/include/net/ip_tunnels.h
> @@ -662,6 +662,18 @@ static inline int iptunnel_pull_offloads(struct sk_buff *skb)
>         return 0;
>  }
>
> +static inline void iptunnel_rebuild_transport_header(struct sk_buff *skb)
> +{
> +       if (!skb_is_gso(skb))
> +               return;
> +
> +       skb->transport_header = (typeof(skb->transport_header))~0U;
> +       skb_probe_transport_header(skb);
> +
> +       if (!skb_transport_header_was_set(skb))
> +               skb_gso_reset(skb);

I do not think this makes sense.
What is a valid case for this packet being processed further?
The buggy packet must be dropped, instead of being mangled like this.


* Re: [PATCH net 1/1] mptcp: hold subflow request owners when cloning reqsk
From: Yuan Tan @ 2026-04-19  9:51 UTC
  To: Kuniyuki Iwashima, Matthieu Baerts
  Cc: yuantan098, Ren Wei, netdev, mptcp, davem, edumazet, kuba, pabeni,
	horms, ncardwell, dsahern, martineau, geliang, daniel, kafai,
	yifanwucs, tomapufckgml, bird, caoruide123, enjou1224z
In-Reply-To: <CAAVpQUDzh-dGsQpBCZjN3rUsoDc2QjzWh-o5yVWoBWDQNXbjmQ@mail.gmail.com>


On 4/16/2026 11:48 AM, Kuniyuki Iwashima wrote:
> On Thu, Apr 16, 2026 at 10:45 AM Matthieu Baerts <matttbe@kernel.org> wrote:
>> Hi Ren,
>>
>> On 15/04/2026 11:31, Ren Wei wrote:
>>> From: Ruide Cao <caoruide123@gmail.com>
>>>
>>> TCP request migration clones pending request sockets with
>>> inet_reqsk_clone(). For MPTCP MP_JOIN requests this raw-copies
>>> subflow_req->msk, but the cloned request does not take a new reference.
>>>
>>> Both the original and the cloned request can later drop the same msk in
>>> subflow_req_destructor(), and a migrated request may keep a dangling msk
>>> pointer after the original owner has already been released.
>>>
>>> Add a request_sock clone callback and let MPTCP grab a reference for cloned
>>> subflow requests that carry an msk. This keeps ownership balanced across
>>> both successful migrations and failed clone/insert paths without changing
>>> other protocols.
>> Thank you for this patch!
>>
>> Indeed, it looks like this path has not been covered by the MPTCP test
>> suite so far.
>>
>> By chance, do you have any simpler reproducer using packetdrill? (the
>> MPTCP fork)
>>
>>   https://github.com/multipath-tcp/packetdrill
>>
>> Also, I see Sashiko is pointing to a potential issue with MP_CAPABLE and
>> the token, also only visible with net.ipv4.tcp_migrate_req=1:
>>
>> https://sashiko.dev/#/patchset/86e2514b533bf4d55d4aa2fdbf1404022e8c9430.1776149210.git.caoruide123%40gmail.com
>>
>> Are you also looking at that?
>>
>>> Fixes: c905dee62232 ("tcp: Migrate TCP_NEW_SYN_RECV requests at retransmitting SYN+ACKs.")
>>> Cc: stable@kernel.org
>>> Reported-by: Yuan Tan <yuantan098@gmail.com>
>>> Reported-by: Yifan Wu <yifanwucs@gmail.com>
>>> Reported-by: Juefei Pu <tomapufckgml@gmail.com>
>>> Reported-by: Xin Liu <bird@lzu.edu.cn>
>>> Signed-off-by: Ruide Cao <caoruide123@gmail.com>
>>> Tested-by: Ren Wei <enjou1224z@gmail.com>
>>> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
>>> ---
>>>  include/net/request_sock.h      |  2 ++
>>>  net/ipv4/inet_connection_sock.c |  3 +++
>>>  net/mptcp/subflow.c             | 13 +++++++++++++
>>>  3 files changed, 18 insertions(+)
>>>
>>> diff --git a/include/net/request_sock.h b/include/net/request_sock.h
>>> index 5a9c826a7092..560e464c400f 100644
>>> --- a/include/net/request_sock.h
>>> +++ b/include/net/request_sock.h
>>> @@ -36,6 +36,8 @@ struct request_sock_ops {
>>>                                     struct sk_buff *skb,
>>>                                     enum sk_rst_reason reason);
>>>       void            (*destructor)(struct request_sock *req);
>>> +     void            (*init_clone)(const struct request_sock *req,
>>> +                                   struct request_sock *new_req);
>> @TCP/INET maintainers: are you OK with this new function pointer?
>>
>> Currently, MPTCP is the only user, and "req" is not used, see below.
>>
>>>  };
>>>
>>>  struct saved_syn {
>>> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
>>> index e961936b6be7..140a9e96ad58 100644
>>> --- a/net/ipv4/inet_connection_sock.c
>>> +++ b/net/ipv4/inet_connection_sock.c
>>> @@ -954,6 +954,9 @@ static struct request_sock *inet_reqsk_clone(struct request_sock *req,
>>>       if (sk->sk_protocol == IPPROTO_TCP && tcp_rsk(nreq)->tfo_listener)
>>>               rcu_assign_pointer(tcp_sk(nreq->sk)->fastopen_rsk, nreq);
>> (Maybe TCP with fastopen could be this other user to call
>> rcu_assign_pointer()? (net-next material))
>>
>>> +     if (req->rsk_ops->init_clone)
>>> +             req->rsk_ops->init_clone(req, nreq);
> I think a simple direct call is better.
>
> #ifdef CONFIG_MPTCP
>     if (tcp_rsk(req)->is_mptcp)
>         mptcp_reqsk_clone(nreq);
> #endif
>
Thank you very much for your suggestion. We will use this approach in
the next version of the patch. Would you like us to add your
Suggested-by tag?


>>> +
>>>       return nreq;
>>>  }
>>>
>>> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
>>> index 4ff5863aa9fd..5f4069647822 100644
>>> --- a/net/mptcp/subflow.c
>>> +++ b/net/mptcp/subflow.c
>>> @@ -47,6 +47,17 @@ static void subflow_req_destructor(struct request_sock *req)
>>>       mptcp_token_destroy_request(req);
>>>  }
>>>
>>> +static void subflow_req_clone(const struct request_sock *req,
>>> +                           struct request_sock *new_req)
>>> +{
>>> +     struct mptcp_subflow_request_sock *subflow_req = mptcp_subflow_rsk(new_req);
>>> +
>>> +     (void)req;
>>
>> Note: if 'req' is not used, and MPTCP is currently the only user of this
>> new init_clone() callback, perhaps it is enough to pass only 'new_req' for
>> the moment? 'req' can be added later if other users need it, no?
>>
>> With only init_clone(nreq) in a v2, or if TCP maintainers prefer to keep
>> it, feel free to add:
>>
>> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>>
>>
>> Cheers,
>> Matt
>> --
>> Sponsored by the NGI0 Core fund.
>>
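
The ownership problem discussed in this thread can be illustrated with a toy reference count. This is a sketch under assumptions, not the MPTCP code: struct toy_msk, struct toy_req and the toy_ helpers are hypothetical names, and a plain integer stands in for the real socket reference counting.

```c
#include <stddef.h>

/* Toy shared object with a plain integer reference count. */
struct toy_msk {
	int refcnt;
};

/* Toy request that may own one reference on an msk. */
struct toy_req {
	struct toy_msk *msk;
};

/* Clone a request: raw-copy the fields, then take an extra reference so
 * that the original and the clone each own one.
 */
static struct toy_req toy_req_clone(const struct toy_req *req)
{
	struct toy_req nreq = *req;

	if (nreq.msk)
		nreq.msk->refcnt++;
	return nreq;
}

/* Destructor: drop the reference this request owns, if any. */
static void toy_req_destroy(struct toy_req *req)
{
	if (req->msk) {
		req->msk->refcnt--;
		req->msk = NULL;
	}
}
```

Without the increment in toy_req_clone(), destroying both the original and the clone would drop the count twice for a single reference, which is the imbalance the patch description reports for cloned MP_JOIN requests.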


* [PATCH net 1/1] ipv4: icmp: validate reply type before using icmp_pointers
From: Ren Wei @ 2026-04-19 10:19 UTC
  To: netdev
  Cc: davem, dsahern, edumazet, kuba, pabeni, horms, andreas.a.roeseler,
	yuantan098, yifanwucs, tomapufckgml, bird, caoruide123, n05ec
In-Reply-To: <cover.1776563662.git.caoruide123@gmail.com>

From: Ruide Cao <caoruide123@gmail.com>

Extended echo replies use ICMP_EXT_ECHOREPLY as the outbound reply type.
That value is outside the range covered by icmp_pointers[], which only
describes the traditional ICMP types up to NR_ICMP_TYPES.

Avoid consulting icmp_pointers[] for reply types outside that range and
keep the existing behavior for normal ICMP replies unchanged.

Fixes: d329ea5bd884 ("icmp: add response to RFC 8335 PROBE messages")
Cc: stable@kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Ruide Cao <caoruide123@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
---
 net/ipv4/icmp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 4e2a6c70dcd8..d8036663f035 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -373,7 +373,8 @@ static int icmp_glue_bits(void *from, char *to, int offset, int len, int odd,
 				      to, len);
 
 	skb->csum = csum_block_add(skb->csum, csum, odd);
-	if (icmp_pointers[icmp_param->data.icmph.type].error)
+	if (icmp_param->data.icmph.type <= NR_ICMP_TYPES &&
+	    icmp_pointers[icmp_param->data.icmph.type].error)
 		nf_ct_attach(skb, icmp_param->skb);
 	return 0;
 }
-- 
2.34.1
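
The bounds check in this patch can be exercised in isolation with the sketch below. The table and helper names (toy_icmp_pointers, toy_reply_is_error) are hypothetical, only two illustrative entries are filled in, and the sketch assumes the real table has NR_ICMP_TYPES + 1 slots, which is what makes `<=` rather than `<` the right comparison. The values 18 and 43 are the UAPI definitions of NR_ICMP_TYPES and ICMP_EXT_ECHOREPLY as I recall them.

```c
#include <stdbool.h>

#define NR_ICMP_TYPES      18 /* highest traditional ICMP type */
#define ICMP_EXT_ECHOREPLY 43 /* RFC 8335 extended echo reply */

struct toy_icmp_control {
	bool error; /* true when the type is an ICMP error message */
};

/* Illustrative table with NR_ICMP_TYPES + 1 slots; unlisted entries are zero. */
static const struct toy_icmp_control toy_icmp_pointers[NR_ICMP_TYPES + 1] = {
	[3]  = { .error = true }, /* destination unreachable */
	[11] = { .error = true }, /* time exceeded */
};

/* Consult the table only for types it covers; anything above
 * NR_ICMP_TYPES is treated as a non-error type.
 */
static bool toy_reply_is_error(unsigned char type)
{
	return type <= NR_ICMP_TYPES && toy_icmp_pointers[type].error;
}
```

With this guard, type 43 short-circuits before the array is indexed, while types inside the table behave exactly as before.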



* Re: [PATCH nf] netfilter: xt_TCPMSS: check skb_dst before path-MTU clamping
From: Pablo Neira Ayuso @ 2026-04-19 10:24 UTC
  To: Florian Westphal
  Cc: Weiming Shi, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Phil Sutter, Simon Horman, netfilter-devel, coreteam,
	netdev, Xiang Mei
In-Reply-To: <aePiSwmP6YEQ4mNE@strlen.de>

On Sat, Apr 18, 2026 at 09:58:03PM +0200, Florian Westphal wrote:
> Weiming Shi <bestswngs@gmail.com> wrote:
> > When TCPMSS with CLAMP_PMTU is used via nft_compat in a non-base
> > chain, par->hook_mask is set to 0, bypassing the checkentry hook
> > validation. The target can then run at PRE_ROUTING where skb_dst is
> > NULL, causing a null-ptr-deref in tcpmss_mangle_packet():
> > 
> >  KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
> >  RIP: 0010:tcpmss_mangle_packet (include/net/dst.h:219 net/netfilter/xt_TCPMSS.c:105)
> >   tcpmss_tg4 (net/netfilter/xt_TCPMSS.c:202)
> >   nft_target_eval_xt (net/netfilter/nft_compat.c:87)
> >   nft_do_chain (net/netfilter/nf_tables_core.c:287)
> >   nf_hook_slow (net/netfilter/core.c:623)
> > 
> > Check skb_dst() for NULL before calling dst_mtu().
> 
> FWIW I will apply this patch even though it's wrong.
> 
> nft_compat.c is just too broken, I don't see how it can be
> fixed in any reasonable amount of time.
> 
> validation is done too early, at expression instantiation
> time.
> 
> This doesn't work because we have incomplete graph, it has
> to be done at final table validation time.

I remember this used to work, maybe it broke with recent updates on
the chain graph detection?

Once the non-basechain is added it should consider the basechain where
this can be reached.

> But then all required compat info (xtables hints) is gone
> and no longer available.

What?

> AFAICS the only way to resolve this is to cache the info in
> the nft_expr priv area (WHERE IT ABSOLUTELY DOESN'T BELONG!)
> because that's the only storage there is.

No.


* Re: [PATCH nf] netfilter: xt_TCPMSS: check skb_dst before path-MTU clamping
From: Pablo Neira Ayuso @ 2026-04-19 10:25 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Weiming Shi, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Phil Sutter, Simon Horman, netfilter-devel, coreteam,
	netdev, Xiang Mei
In-Reply-To: <aePiSwmP6YEQ4mNE@strlen.de>

On Sat, Apr 18, 2026 at 09:58:03PM +0200, Florian Westphal wrote:
> Weiming Shi <bestswngs@gmail.com> wrote:
> > When TCPMSS with CLAMP_PMTU is used via nft_compat in a non-base
> > chain, par->hook_mask is set to 0, bypassing the checkentry hook
> > validation. The target can then run at PRE_ROUTING where skb_dst is
> > NULL, causing a null-ptr-deref in tcpmss_mangle_packet():
> > 
> >  KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
> >  RIP: 0010:tcpmss_mangle_packet (include/net/dst.h:219 net/netfilter/xt_TCPMSS.c:105)
> >   tcpmss_tg4 (net/netfilter/xt_TCPMSS.c:202)
> >   nft_target_eval_xt (net/netfilter/nft_compat.c:87)
> >   nft_do_chain (net/netfilter/nf_tables_core.c:287)
> >   nf_hook_slow (net/netfilter/core.c:623)
> > 
> > Check skb_dst() for NULL before calling dst_mtu().
> 
> FWIW I will apply this patch even though it's wrong.

And no please, do not apply this.

This needs to be fixed in the chain graph detection.


* [PATCH] net: dsa: realtek: rtl8365mb: fix mode mask calculation
From: Mieczyslaw Nalewaj @ 2026-04-19  9:59 UTC (permalink / raw)
  To: netdev
In-Reply-To: <c1519cfa-5a66-48b6-a578-e0ac03a57484.ref@yahoo.com>

The RTL8365MB_DIGITAL_INTERFACE_SELECT_MODE_MASK macro was shifting
the 4-bit mask (0xF) by only (_extint % 2) bits instead of
(_extint % 2) * 4. This caused the mask to overlap with the adjacent
nibble when configuring odd-numbered external interfaces, selecting
the wrong bits entirely.

Align the shift calculation with the existing ...MODE_OFFSET macro.
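
The arithmetic described above can be checked in isolation. The sketch
below is a minimal userspace illustration, not driver code: the MODE_*
names are shortened, hypothetical stand-ins for the
RTL8365MB_DIGITAL_INTERFACE_SELECT_* macros, and check_mode_masks() is
a helper invented for this example.

```c
#include <assert.h>

/* Hypothetical stand-ins for the driver macros, for illustration only. */
#define MODE_OFFSET(extint)   (((extint) % 2) * 4)
#define MODE_MASK_OLD(extint) (0xF << (((extint) % 2)))     /* before the fix */
#define MODE_MASK_NEW(extint) (0xF << (((extint) % 2) * 4)) /* after the fix */

/* Compare both mask variants for an even and an odd external interface. */
static void check_mode_masks(void)
{
	/* Even interface: both variants select the low nibble. */
	assert(MODE_MASK_OLD(0) == 0x0F);
	assert(MODE_MASK_NEW(0) == 0x0F);

	/* Odd interface: the old mask straddles both nibbles (bits 1..4). */
	assert(MODE_MASK_OLD(1) == 0x1E);

	/* The new mask covers exactly the high nibble (bits 4..7). */
	assert(MODE_MASK_NEW(1) == 0xF0);
	assert(MODE_MASK_NEW(1) == (0xF << MODE_OFFSET(1)));
}
```

Only the odd case differs: 0x1E touches three bits of the neighbouring
interface's mode field and misses the top three bits of its own.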

Signed-off-by: Abdulkader Alrezej <alrazj.abdulkader@gmail.com>
Signed-off-by: Mieczyslaw Nalewaj <namiltd@yahoo.com>
---
 drivers/net/dsa/realtek/rtl8365mb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/realtek/rtl8365mb.c b/drivers/net/dsa/realtek/rtl8365mb.c
index ad7044b295ec..b85c99216648 100644
--- a/drivers/net/dsa/realtek/rtl8365mb.c
+++ b/drivers/net/dsa/realtek/rtl8365mb.c
@@ -216,7 +216,7 @@
 		 (_extint) == 2 ? RTL8365MB_DIGITAL_INTERFACE_SELECT_REG1 : \
 		 0x0)
 #define   RTL8365MB_DIGITAL_INTERFACE_SELECT_MODE_MASK(_extint) \
-		(0xF << (((_extint) % 2)))
+		(0xF << (((_extint) % 2) * 4))
 #define   RTL8365MB_DIGITAL_INTERFACE_SELECT_MODE_OFFSET(_extint) \
 		(((_extint) % 2) * 4)
 
-- 
2.53.0


* [PATCH net v2] net: dsa: mt7530: fix .get_stats64 sleeping in atomic context
From: Daniel Golle @ 2026-04-19 10:43 UTC (permalink / raw)
  To: Chester A. Unal, Daniel Golle, Andrew Lunn, Vladimir Oltean,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Matthias Brugger, AngeloGioacchino Del Regno, Russell King,
	Christian Marangi, netdev, linux-kernel, linux-arm-kernel,
	linux-mediatek
  Cc: Frank Wunderlich, Christian Marangi (Ansuel), John Crispin

The .get_stats64 callback runs in atomic context, but on
MDIO-connected switches every register read acquires the MDIO bus
mutex, which can sleep:
[   12.645973] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:609
[   12.654442] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 759, name: grep
[   12.663377] preempt_count: 0, expected: 0
[   12.667410] RCU nest depth: 1, expected: 0
[   12.671511] INFO: lockdep is turned off.
[   12.675441] CPU: 0 UID: 0 PID: 759 Comm: grep Tainted: G S      W           7.0.0+ #0 PREEMPT
[   12.675453] Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
[   12.675456] Hardware name: Bananapi BPI-R64 (DT)
[   12.675459] Call trace:
[   12.675462]  show_stack+0x14/0x1c (C)
[   12.675477]  dump_stack_lvl+0x68/0x8c
[   12.675487]  dump_stack+0x14/0x1c
[   12.675495]  __might_resched+0x14c/0x220
[   12.675504]  __might_sleep+0x44/0x80
[   12.675511]  __mutex_lock+0x50/0xb10
[   12.675523]  mutex_lock_nested+0x20/0x30
[   12.675532]  mt7530_get_stats64+0x40/0x2ac
[   12.675542]  dsa_user_get_stats64+0x2c/0x40
[   12.675553]  dev_get_stats+0x44/0x1e0
[   12.675564]  dev_seq_printf_stats+0x24/0xe0
[   12.675575]  dev_seq_show+0x14/0x3c
[   12.675583]  seq_read_iter+0x37c/0x480
[   12.675595]  seq_read+0xd0/0xec
[   12.675605]  proc_reg_read+0x94/0xe4
[   12.675615]  vfs_read+0x98/0x29c
[   12.675625]  ksys_read+0x54/0xdc
[   12.675633]  __arm64_sys_read+0x18/0x20
[   12.675642]  invoke_syscall.constprop.0+0x54/0xec
[   12.675653]  do_el0_svc+0x3c/0xb4
[   12.675662]  el0_svc+0x38/0x200
[   12.675670]  el0t_64_sync_handler+0x98/0xdc
[   12.675679]  el0t_64_sync+0x158/0x15c

For MDIO-connected switches, poll MIB counters asynchronously using a
delayed workqueue every second and let .get_stats64 return the cached
values under a spinlock. A mod_delayed_work() call on each read
triggers an immediate refresh so counters stay responsive when queried
more frequently.

MMIO-connected switches (MT7988, EN7581, AN7583) are not affected
because their regmap does not sleep, so they continue to read MIB
counters directly in .get_stats64.
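
The rate-limit test on the read path relies on a wraparound-safe
jiffies comparison. The following is a minimal userspace sketch of that
idea only; TOY_HZ, toy_time_after() and toy_should_refresh() are
hypothetical names for this illustration, and the comparison copies the
style of the kernel's time_after() macro rather than the driver code
itself. Like the kernel macro, it assumes the usual two's-complement
conversion from unsigned long to long.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define TOY_HZ         100UL          /* assumed tick rate for the example */
#define TOY_RATE_LIMIT (TOY_HZ / 10)  /* 100 ms expressed in ticks */

/* Wraparound-safe "a is later than b", in the style of time_after(). */
static bool toy_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* True when more than TOY_RATE_LIMIT ticks have passed since `last`. */
static bool toy_should_refresh(unsigned long now, unsigned long last)
{
	return toy_time_after(now, last + TOY_RATE_LIMIT);
}
```

A plain `now > last + limit` comparison would misbehave once the tick
counter wraps; the signed difference keeps working as long as the two
timestamps are less than half the counter range apart.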

Fixes: 88c810f35ed5 ("net: dsa: mt7530: implement .get_stats64")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Acked-by: Chester A. Unal <chester.a.unal@arinc9.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
v2:
 * use spin_lock_bh()/spin_unlock_bh() to prevent potential deadlock
 * rate-limit mod_delayed_work() refresh to at most once per 100ms
 * move cancel_delayed_work_sync() after dsa_unregister_switch()
 * add mt753x_teardown() callback to cancel the stats work
 * fix commit message

 drivers/net/dsa/mt7530.c | 66 ++++++++++++++++++++++++++++++++++++++--
 drivers/net/dsa/mt7530.h |  8 +++++
 2 files changed, 71 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
index b9423389c2ef0..8c1186ba2279b 100644
--- a/drivers/net/dsa/mt7530.c
+++ b/drivers/net/dsa/mt7530.c
@@ -25,6 +25,9 @@
 
 #include "mt7530.h"
 
+#define MT7530_STATS_POLL_INTERVAL	(1 * HZ)
+#define MT7530_STATS_RATE_LIMIT		(HZ / 10)
+
 static struct mt753x_pcs *pcs_to_mt753x_pcs(struct phylink_pcs *pcs)
 {
 	return container_of(pcs, struct mt753x_pcs, pcs);
@@ -906,10 +909,9 @@ static void mt7530_get_rmon_stats(struct dsa_switch *ds, int port,
 	*ranges = mt7530_rmon_ranges;
 }
 
-static void mt7530_get_stats64(struct dsa_switch *ds, int port,
-			       struct rtnl_link_stats64 *storage)
+static void mt7530_read_port_stats64(struct mt7530_priv *priv, int port,
+				     struct rtnl_link_stats64 *storage)
 {
-	struct mt7530_priv *priv = ds->priv;
 	uint64_t data;
 
 	/* MIB counter doesn't provide a FramesTransmittedOK but instead
@@ -951,6 +953,45 @@ static void mt7530_get_stats64(struct dsa_switch *ds, int port,
 			       &storage->rx_crc_errors);
 }
 
+static void mt7530_stats_poll(struct work_struct *work)
+{
+	struct mt7530_priv *priv = container_of(work, struct mt7530_priv,
+						stats_work.work);
+	struct rtnl_link_stats64 stats = {};
+	struct dsa_port *dp;
+	int port;
+
+	dsa_switch_for_each_user_port(dp, priv->ds) {
+		port = dp->index;
+
+		mt7530_read_port_stats64(priv, port, &stats);
+
+		spin_lock_bh(&priv->stats_lock);
+		priv->ports[port].stats = stats;
+		spin_unlock_bh(&priv->stats_lock);
+	}
+
+	priv->stats_last = jiffies;
+	schedule_delayed_work(&priv->stats_work,
+			      MT7530_STATS_POLL_INTERVAL);
+}
+
+static void mt7530_get_stats64(struct dsa_switch *ds, int port,
+			       struct rtnl_link_stats64 *storage)
+{
+	struct mt7530_priv *priv = ds->priv;
+
+	if (priv->bus) {
+		spin_lock_bh(&priv->stats_lock);
+		*storage = priv->ports[port].stats;
+		spin_unlock_bh(&priv->stats_lock);
+		if (time_after(jiffies, priv->stats_last + MT7530_STATS_RATE_LIMIT))
+			mod_delayed_work(system_wq, &priv->stats_work, 0);
+	} else {
+		mt7530_read_port_stats64(priv, port, storage);
+	}
+}
+
 static void mt7530_get_eth_ctrl_stats(struct dsa_switch *ds, int port,
 				      struct ethtool_eth_ctrl_stats *ctrl_stats)
 {
@@ -3137,9 +3178,24 @@ mt753x_setup(struct dsa_switch *ds)
 	if (ret && priv->irq_domain)
 		mt7530_free_mdio_irq(priv);
 
+	if (!ret && priv->bus) {
+		spin_lock_init(&priv->stats_lock);
+		INIT_DELAYED_WORK(&priv->stats_work, mt7530_stats_poll);
+		schedule_delayed_work(&priv->stats_work,
+				      MT7530_STATS_POLL_INTERVAL);
+	}
+
 	return ret;
 }
 
+static void mt753x_teardown(struct dsa_switch *ds)
+{
+	struct mt7530_priv *priv = ds->priv;
+
+	if (priv->bus)
+		cancel_delayed_work_sync(&priv->stats_work);
+}
+
 static int mt753x_set_mac_eee(struct dsa_switch *ds, int port,
 			      struct ethtool_keee *e)
 {
@@ -3257,6 +3313,7 @@ static int mt7988_setup(struct dsa_switch *ds)
 static const struct dsa_switch_ops mt7530_switch_ops = {
 	.get_tag_protocol	= mtk_get_tag_protocol,
 	.setup			= mt753x_setup,
+	.teardown		= mt753x_teardown,
 	.preferred_default_local_cpu_port = mt753x_preferred_default_local_cpu_port,
 	.get_strings		= mt7530_get_strings,
 	.get_ethtool_stats	= mt7530_get_ethtool_stats,
@@ -3409,6 +3466,9 @@ mt7530_remove_common(struct mt7530_priv *priv)
 
 	dsa_unregister_switch(priv->ds);
 
+	if (priv->bus)
+		cancel_delayed_work_sync(&priv->stats_work);
+
 	mutex_destroy(&priv->reg_mutex);
 }
 EXPORT_SYMBOL_GPL(mt7530_remove_common);
diff --git a/drivers/net/dsa/mt7530.h b/drivers/net/dsa/mt7530.h
index 3e0090bed298d..dd33b0df3419e 100644
--- a/drivers/net/dsa/mt7530.h
+++ b/drivers/net/dsa/mt7530.h
@@ -796,6 +796,7 @@ struct mt7530_fdb {
  * @pvid:	The VLAN specified is to be considered a PVID at ingress.  Any
  *		untagged frames will be assigned to the related VLAN.
  * @sgmii_pcs:	Pointer to PCS instance for SerDes ports
+ * @stats:	Cached port statistics for MDIO-connected switches
  */
 struct mt7530_port {
 	bool enable;
@@ -803,6 +804,7 @@ struct mt7530_port {
 	u32 pm;
 	u16 pvid;
 	struct phylink_pcs *sgmii_pcs;
+	struct rtnl_link_stats64 stats;
 };
 
 /* Port 5 mode definitions of the MT7530 switch */
@@ -875,6 +877,9 @@ struct mt753x_info {
  * @create_sgmii:	Pointer to function creating SGMII PCS instance(s)
  * @active_cpu_ports:	Holding the active CPU ports
  * @mdiodev:		The pointer to the MDIO device structure
+ * @stats_lock:		Protects cached per-port stats from concurrent access
+ * @stats_work:		Delayed work for polling MIB counters on MDIO switches
+ * @stats_last:		Jiffies timestamp of last MIB counter poll
  */
 struct mt7530_priv {
 	struct device		*dev;
@@ -900,6 +905,9 @@ struct mt7530_priv {
 	int (*create_sgmii)(struct mt7530_priv *priv);
 	u8 active_cpu_ports;
 	struct mdio_device *mdiodev;
+	spinlock_t stats_lock; /* protects cached stats counters */
+	struct delayed_work stats_work;
+	unsigned long stats_last;
 };
 
 struct mt7530_hw_vlan_entry {
-- 
2.53.0


* Re: [PATCH net v2] net: iptunnel: fix stale transport header after GRE/TEB decap
From: Jiayuan Chen @ 2026-04-19 13:01 UTC (permalink / raw)
  To: Eric Dumazet, Jiayuan Chen
  Cc: netdev, syzbot+83181a31faf9455499c5, David S. Miller, David Ahern,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Pravin B Shelar,
	Tom Herbert, linux-kernel
In-Reply-To: <CANn89iKJFCw9gPdQN4hYZ94j-0Ua84N+DyYjPjwBTRLveK-j7g@mail.gmail.com>

[...]
>>   +662,18 @@ static inline int iptunnel_pull_offloads(struct sk_buff *skb)
>>          return 0;
>>   }
>>
>> +static inline void iptunnel_rebuild_transport_header(struct sk_buff *skb)
>> +{
>> +       if (!skb_is_gso(skb))
>> +               return;
>> +
>> +       skb->transport_header = (typeof(skb->transport_header))~0U;
>> +       skb_probe_transport_header(skb);
>> +
>> +       if (!skb_transport_header_was_set(skb))
>> +               skb_gso_reset(skb);
> I do not think this makes sense.
> What is a valid case for this packet being processed further?
> The buggy packet must be dropped, instead of being mangled like this.
Hi Eric,

The reproducer builds a GRE frame whose inner Ethernet header is
all-zero. Tracing the skb through RX:

1. At GRE decap exit, skb_transport_offset(skb) < 0 is the rule, not the
exception.

It is negative for every packet leaving the tunnel, including perfectly
well-formed inner IPv4 traffic, because the tunnel leaves
skb->transport_header at the outer L4 offset while pskb_pull() has
already advanced skb->data past it. skb_transport_header_was_set()
stays true, so downstream code that trusts that flag now trusts a
stale, negative offset.

2. GRO repairs it, but only for protocols it knows.

In dev_gro_receive(), skb->protocol is dispatched through the offload
table. For ETH_P_IP, inet_gro_receive() calls
skb_set_transport_header(skb, skb_gro_offset(skb)), and the offset
becomes valid again. But for a malformed skb, dev_gro_receive() just
bypasses it.

3. Both kinds then reach __netif_receive_skb_core().

So the skb that qdisc/tc/BPF segmenters later see has an invariant
violation (_was_set == true but offset < 0) that the core layer has no
intention of catching for us.

My reading of this is that the tunnel decap path is producing an skb
that doesn't honor the contract __netif_receive_skb_core() expects from
its producers, and that it doesn't really make sense to ask GRE to
parse or validate the inner L4 in order to fix this.

I'm thinking that at the end of GRE decap, before handing the skb to
gro_cells_receive(), we call skb_reset_transport_header(skb).
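
To make the negative offset concrete, here is a toy userspace model of
the bookkeeping described above. It is a sketch under stated
assumptions, not kernel code: struct toy_skb and the toy_* helpers are
hypothetical, offsets are counted from the start of the buffer, and the
header sizes (20-byte outer IPv4, 4-byte GRE, 14-byte inner Ethernet)
are picked only for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_OUTER_IP_LEN 20  /* assumed outer IPv4 header, no options */
#define TOY_GRE_LEN       4  /* assumed base GRE header */
#define TOY_ETH_LEN      14  /* inner Ethernet header */

/* Toy model: both fields are offsets from the start of the buffer. */
struct toy_skb {
	size_t data;             /* current data pointer */
	size_t transport_header; /* recorded transport header */
};

/* Models skb_transport_offset(): header position relative to data. */
static long toy_transport_offset(const struct toy_skb *skb)
{
	return (long)skb->transport_header - (long)skb->data;
}

/* Models a pull: advance data past `len` bytes, leave the header alone. */
static void toy_pull(struct toy_skb *skb, size_t len)
{
	skb->data += len;
}

/* Models skb_reset_transport_header(): point the header at data. */
static void toy_reset_transport_header(struct toy_skb *skb)
{
	skb->transport_header = skb->data;
}

/* Walk one packet through decap and return the resulting offset. */
static long toy_decap_demo(int reset_at_end)
{
	/* Outer IPv4 header already pulled: data sits on the GRE header. */
	struct toy_skb skb = { .data = TOY_OUTER_IP_LEN,
			       .transport_header = TOY_OUTER_IP_LEN };

	toy_pull(&skb, TOY_GRE_LEN); /* strip the GRE header */
	toy_pull(&skb, TOY_ETH_LEN); /* strip the inner Ethernet header */

	if (reset_at_end)
		toy_reset_transport_header(&skb);

	return toy_transport_offset(&skb);
}
```

Without the reset the recorded header lies 18 bytes behind data; with
the reset the offset is 0, which is the state the proposal aims for.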

Thanks,
Jiayuan


* Re: [PATCH v2 00/19] tracepoint: Avoid double static_branch evaluation at guarded call sites
From: Vineeth Remanan Pillai @ 2026-04-19 13:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Dmitry Ilvokhin, Masami Hiramatsu,
	Mathieu Desnoyers, Ingo Molnar, Jens Axboe, io-uring,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Alexei Starovoitov, Daniel Borkmann, Marcelo Ricardo Leitner,
	Xin Long, Jon Maloy, Aaron Conole, Eelco Chaudron, Ilya Maximets,
	netdev, bpf, linux-sctp, tipc-discussion, dev, Jiri Pirko,
	Oded Gabbay, Koby Elbaz, dri-devel, Rafael J. Wysocki,
	Viresh Kumar, Gautham R. Shenoy, Huang Rui, Mario Limonciello,
	Len Brown, Srinivas Pandruvada, linux-pm, MyungJoo Ham,
	Kyungmin Park, Chanwoo Choi, Christian König, Sumit Semwal,
	linaro-mm-sig, Eddie James, Andrew Jeffery, Joel Stanley,
	linux-fsi, David Airlie, Simona Vetter, Alex Deucher,
	Danilo Krummrich, Matthew Brost, Philipp Stanner, Harry Wentland,
	Leo Li, amd-gfx, Jiri Kosina, Benjamin Tissoires, linux-input,
	Wolfram Sang, linux-i2c, Mark Brown, Michael Hennerich,
	Nuno Sá, linux-spi, James E.J. Bottomley, Martin K. Petersen,
	linux-scsi, Chris Mason, David Sterba, linux-btrfs,
	Thomas Gleixner, Andrew Morton, SeongJae Park, linux-mm,
	Borislav Petkov, Dave Hansen, x86, linux-trace-kernel,
	linux-kernel
In-Reply-To: <20260418190456.631df6f3@fedora>

On Sat, Apr 18, 2026 at 7:05 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 23 Mar 2026 12:00:19 -0400
> "Vineeth Pillai (Google)" <vineeth@bitbyteword.org> wrote:
>
> >   if (trace_foo_enabled() && cond)
> >       trace_call__foo(args);   /* calls __do_trace_foo() directly */
>
> Hi Vineeth,
>
> Could you rebase this series on top of 7.1-rc1 when it comes out?
> Several of these patches were accepted already. Obviously drop those.
> They were the patches that added the feature, and any where the
> maintainer acked the patch.
>
> Now that the feature has been accepted, if you post the patch series
> again after 7.1-rc1 with all the patches that haven't been accepted
> yet, then the maintainers can simply take them directly. As the feature
> is now accepted, there's no dependency on it, and they don't need to go
> through the tracing tree.
>
Sure, will do. Thanks for merging this feature.

Thanks,
Vineeth


* Re: [PATCH 3/9] iio: intel_dc_ti_adc: switch to using FIELD_GET_SIGNED()
From: Jonathan Cameron @ 2026-04-19 13:18 UTC (permalink / raw)
  To: Yury Norov
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, David Lechner,
	Nuno Sá, Andy Shevchenko, Ping-Ke Shih, Richard Cochran,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Alexandre Belloni, Yury Norov, Rasmus Villemoes,
	Hans de Goede, Linus Walleij, Sakari Ailus, Salah Triki,
	Achim Gratz, Ben Collins, linux-kernel, linux-iio, linux-wireless,
	netdev, linux-rtc
In-Reply-To: <20260417173621.368914-4-ynorov@nvidia.com>

On Fri, 17 Apr 2026 13:36:14 -0400
Yury Norov <ynorov@nvidia.com> wrote:

> Switch from sign_extend32(FIELD_GET()) to the dedicated
> FIELD_GET_SIGNED() and don't provide the field's length explicitly.
> 
> Signed-off-by: Yury Norov <ynorov@nvidia.com>

Assuming any remaining discussion on the implementation of the macro
shakes out, this looks like a good little cleanup to me.

Not sure how you want to merge this, but if it's going through another tree:
Acked-by: Jonathan Cameron <jic23@kernel.org>
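
For readers unfamiliar with the pattern being replaced, the sketch
below shows in userspace what "extract a field, then sign-extend it
without spelling out the width" can look like. It is an illustration
only: the real FIELD_GET_SIGNED() implementation is not shown in this
thread, so toy_field_get_signed() is a guess at the described
behaviour, TOY_MASK is a hypothetical 4-bit field, and the
__builtin_ctz()/__builtin_popcount() calls assume GCC or Clang. It also
assumes an arithmetic right shift of negative values, as the kernel
does.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MASK 0xF0u /* hypothetical 4-bit field in bits 4..7 */

/* Same idea as sign_extend32(): `index` is the position of the sign bit. */
static int32_t toy_sign_extend32(uint32_t value, int index)
{
	int shift = 31 - index;

	return (int32_t)(value << shift) >> shift;
}

/* Extract the field selected by a contiguous, non-zero mask. */
static uint32_t toy_field_get(uint32_t mask, uint32_t reg)
{
	return (reg & mask) >> __builtin_ctz(mask);
}

/* Hypothetical signed variant: the field width comes from the mask. */
static int32_t toy_field_get_signed(uint32_t mask, uint32_t reg)
{
	int width = __builtin_popcount(mask);

	return toy_sign_extend32(toy_field_get(mask, reg), width - 1);
}
```

The point of the cleanup is visible in the last helper: the caller no
longer passes the literal 3, so the mask is the single source of truth
for the field width.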

> ---
>  drivers/iio/adc/intel_dc_ti_adc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iio/adc/intel_dc_ti_adc.c b/drivers/iio/adc/intel_dc_ti_adc.c
> index 0fe34f1c338e..b5afad713e2d 100644
> --- a/drivers/iio/adc/intel_dc_ti_adc.c
> +++ b/drivers/iio/adc/intel_dc_ti_adc.c
> @@ -290,8 +290,8 @@ static int dc_ti_adc_probe(struct platform_device *pdev)
>  	if (ret)
>  		return ret;
>  
> -	info->vbat_zse = sign_extend32(FIELD_GET(DC_TI_VBAT_ZSE, val), 3);
> -	info->vbat_ge = sign_extend32(FIELD_GET(DC_TI_VBAT_GE, val), 3);
> +	info->vbat_zse = FIELD_GET_SIGNED(DC_TI_VBAT_ZSE, val);
> +	info->vbat_ge = FIELD_GET_SIGNED(DC_TI_VBAT_GE, val);
>  
>  	dev_dbg(dev, "vbat-zse %d vbat-ge %d\n", info->vbat_zse, info->vbat_ge);
>  



* Re: [PATCH 4/9] iio: magnetometer: yas530: switch to using FIELD_GET_SIGNED()
From: Jonathan Cameron @ 2026-04-19 13:20 UTC (permalink / raw)
  To: Yury Norov
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, David Lechner,
	Nuno Sá, Andy Shevchenko, Ping-Ke Shih, Richard Cochran,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Alexandre Belloni, Yury Norov, Rasmus Villemoes,
	Hans de Goede, Linus Walleij, Sakari Ailus, Salah Triki,
	Achim Gratz, Ben Collins, linux-kernel, linux-iio, linux-wireless,
	netdev, linux-rtc
In-Reply-To: <20260417173621.368914-5-ynorov@nvidia.com>

On Fri, 17 Apr 2026 13:36:15 -0400
Yury Norov <ynorov@nvidia.com> wrote:

> Switch from sign_extend32(FIELD_GET()) to the dedicated
> FIELD_GET_SIGNED() and don't calculate the field's length explicitly.
> 
> Signed-off-by: Yury Norov <ynorov@nvidia.com>
Acked-by: Jonathan Cameron <jic23@kernel.org>


* Re: [PATCH 5/9] iio: pressure: bmp280: switch to using FIELD_GET_SIGNED()
From: Jonathan Cameron @ 2026-04-19 13:21 UTC (permalink / raw)
  To: Yury Norov
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, David Lechner,
	Nuno Sá, Andy Shevchenko, Ping-Ke Shih, Richard Cochran,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Alexandre Belloni, Yury Norov, Rasmus Villemoes,
	Hans de Goede, Linus Walleij, Sakari Ailus, Salah Triki,
	Achim Gratz, Ben Collins, linux-kernel, linux-iio, linux-wireless,
	netdev, linux-rtc
In-Reply-To: <20260417173621.368914-6-ynorov@nvidia.com>

On Fri, 17 Apr 2026 13:36:16 -0400
Yury Norov <ynorov@nvidia.com> wrote:

> Switch from sign_extend32(FIELD_GET()) to the dedicated
> FIELD_GET_SIGNED() and don't calculate the field's length explicitly.
> 
> Signed-off-by: Yury Norov <ynorov@nvidia.com>
Acked-by: Jonathan Cameron <jic23@kernel.org>


* Re: [PATCH 6/9] iio: mcp9600: switch to using FIELD_GET_SIGNED()
From: Jonathan Cameron @ 2026-04-19 13:21 UTC (permalink / raw)
  To: Yury Norov
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, David Lechner,
	Nuno Sá, Andy Shevchenko, Ping-Ke Shih, Richard Cochran,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Alexandre Belloni, Yury Norov, Rasmus Villemoes,
	Hans de Goede, Linus Walleij, Sakari Ailus, Salah Triki,
	Achim Gratz, Ben Collins, linux-kernel, linux-iio, linux-wireless,
	netdev, linux-rtc
In-Reply-To: <20260417173621.368914-7-ynorov@nvidia.com>

On Fri, 17 Apr 2026 13:36:17 -0400
Yury Norov <ynorov@nvidia.com> wrote:

> Switch from sign_extend32(FIELD_GET()) to the dedicated
> FIELD_GET_SIGNED() and don't calculate the field's length explicitly.
> 
> Signed-off-by: Yury Norov <ynorov@nvidia.com>
Acked-by: Jonathan Cameron <jic23@kernel.org>


* Re: [PATCH net 1/1] ipv4: icmp: validate reply type before using icmp_pointers
From: Eric Dumazet @ 2026-04-19 14:21 UTC (permalink / raw)
  To: Ren Wei
  Cc: netdev, davem, dsahern, kuba, pabeni, horms, andreas.a.roeseler,
	yuantan098, yifanwucs, tomapufckgml, bird, caoruide123
In-Reply-To: <efb2c33f544ece7727704608fc577525b415ae26.1776563662.git.caoruide123@gmail.com>

On Sun, Apr 19, 2026 at 3:24 AM Ren Wei <n05ec@lzu.edu.cn> wrote:
>
> From: Ruide Cao <caoruide123@gmail.com>
>
> Extended echo replies use ICMP_EXT_ECHOREPLY as the outbound reply type.
> That value is outside the range covered by icmp_pointers[], which only
> describes the traditional ICMP types up to NR_ICMP_TYPES.
>
> Avoid consulting icmp_pointers[] for reply types outside that range and
> keep the existing behavior for normal ICMP replies unchanged.
>
> Fixes: d329ea5bd884 ("icmp: add response to RFC 8335 PROBE messages")
> Cc: stable@kernel.org
> Reported-by: Yuan Tan <yuantan098@gmail.com>
> Reported-by: Yifan Wu <yifanwucs@gmail.com>
> Reported-by: Juefei Pu <tomapufckgml@gmail.com>
> Reported-by: Xin Liu <bird@lzu.edu.cn>
> Signed-off-by: Ruide Cao <caoruide123@gmail.com>
> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
> ---
>  net/ipv4/icmp.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index 4e2a6c70dcd8..d8036663f035 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -373,7 +373,8 @@ static int icmp_glue_bits(void *from, char *to, int offset, int len, int odd,
>                                       to, len);
>
>         skb->csum = csum_block_add(skb->csum, csum, odd);
> -       if (icmp_pointers[icmp_param->data.icmph.type].error)
> +       if (icmp_param->data.icmph.type <= NR_ICMP_TYPES &&
> +           icmp_pointers[icmp_param->data.icmph.type].error)
>                 nf_ct_attach(skb, icmp_param->skb);
>         return 0;

Pedantic mode: Perhaps also use array_index_nospec()
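
As background for that suggestion, the sketch below reproduces in
userspace the branchless clamping arithmetic that the generic
array_index_nospec() fallback is built on. It is an illustration of the
mask computation only and gives no speculation guarantees outside the
kernel. The toy_* names are hypothetical, TOY_NR_TYPES is an assumed
value for NR_ICMP_TYPES, and the code assumes an arithmetic right shift
of negative values plus an index no larger than LONG_MAX.

```c
#include <assert.h>
#include <limits.h>

#define TOY_BITS_PER_LONG (sizeof(long) * CHAR_BIT)
#define TOY_NR_TYPES      18                 /* assumed NR_ICMP_TYPES */
#define TOY_TABLE_SIZE    (TOY_NR_TYPES + 1) /* entries 0..NR_ICMP_TYPES */

/* All ones when index < size, zero otherwise, computed without a branch. */
static unsigned long toy_index_mask(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (TOY_BITS_PER_LONG - 1));
}

/* In-range indexes pass through unchanged; out-of-range ones become 0. */
static unsigned long toy_index_nospec(unsigned long index, unsigned long size)
{
	return index & toy_index_mask(index, size);
}
```

In the patch this would sit after the range check, so that even a
mispredicted branch can only read a valid table slot; the size argument
would be the table length rather than NR_ICMP_TYPES itself.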


* Re: [PATCH net] slip: fix slab-out-of-bounds write in slhc_uncompress()
From: Simon Horman @ 2026-04-19 14:27 UTC (permalink / raw)
  To: Weiming Shi
  Cc: Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Andrew Morton, Hans Verkuil, Alex Deucher,
	Ian Rogers, Jonathan Cameron, Kees Cook, Ingo Molnar, Alan Cox,
	netdev
In-Reply-To: <20260415213359.335657-2-bestswngs@gmail.com>

On Thu, Apr 16, 2026 at 05:34:00AM +0800, Weiming Shi wrote:
> sl_bump() reserves only 80 bytes of expansion headroom before calling
> slhc_uncompress(), but the reconstructed IP + TCP header is up to
> ip->ihl*4 + thp->doff*4 bytes. IHL and TCP doff are 4-bit fields and
> both can legitimately reach 15, so the header can grow to 2*15*4 =
> 120 bytes. A VJ-uncompressed primer with ihl=15, doff=15 followed by
> a compressed frame of size buffsize - 80 therefore writes up to
> 33 bytes past the kmalloc(buffsize + 4) rbuff allocation, with
> attacker-controlled content:
> 
>  BUG: KASAN: slab-out-of-bounds in slhc_uncompress
>  Write of size 1069 at addr ffff88800ba93078 by task kworker/u8:1/32
>  Workqueue: events_unbound flush_to_ldisc
>  Call Trace:
>   __asan_memmove+0x3f/0x70
>   slhc_uncompress (drivers/net/slip/slhc.c:614)
>   slip_receive_buf (drivers/net/slip/slip.c:342)
>   tty_ldisc_receive_buf
>   flush_to_ldisc
> 
> Raise the reservation to match the real worst case. The ppp_generic
> receive path already enforces skb_tailroom >= 124 and is unaffected.
> 
> Fixes: b5451d783ade ("slip: Move the SLIP drivers")
> Reported-by: Simon Horman <horms@kernel.org>

FTR, I was mainly passing on a review generated by Sashiko

> Signed-off-by: Weiming Shi <bestswngs@gmail.com>

Reviewed-by: Simon Horman <horms@kernel.org>

As usual I'll comment on the review of this patch by Sashiko.

TL;DR: I don't think it should block progress of this patch.

The review by Sashiko flags out-of-bounds errors. However,
these are addressed by one of your other patches:

- [PATCH net] slip: bound decode() reads against the compressed packet length
  https://lore.kernel.org/netdev/20260416100147.531855-5-bestswngs@gmail.com/

As noted in my review of that patch, while it seems too late for these
patches, please consider bundling related patches in a patchset in future.


* Re: [PATCH net] slip: fix slab-out-of-bounds write in slhc_uncompress()
From: Simon Horman @ 2026-04-19 14:27 UTC (permalink / raw)
  To: Weiming Shi
  Cc: Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Andrew Morton, Hans Verkuil, Alex Deucher,
	Ian Rogers, Jonathan Cameron, Kees Cook, Ingo Molnar, Alan Cox,
	netdev
In-Reply-To: <20260415213359.335657-2-bestswngs@gmail.com>

On Thu, Apr 16, 2026 at 05:34:00AM +0800, Weiming Shi wrote:
> sl_bump() reserves only 80 bytes of expansion headroom before calling
> slhc_uncompress(), but the reconstructed IP + TCP header is up to
> ip->ihl*4 + thp->doff*4 bytes. IHL and TCP doff are 4-bit fields and
> both can legitimately reach 15, so the header can grow to 2*15*4 =
> 120 bytes. A VJ-uncompressed primer with ihl=15, doff=15 followed by
> a compressed frame of size buffsize - 80 therefore writes up to
> 33 bytes past the kmalloc(buffsize + 4) rbuff allocation, with
> attacker-controlled content:
> 
>  BUG: KASAN: slab-out-of-bounds in slhc_uncompress
>  Write of size 1069 at addr ffff88800ba93078 by task kworker/u8:1/32
>  Workqueue: events_unbound flush_to_ldisc
>  Call Trace:
>   __asan_memmove+0x3f/0x70
>   slhc_uncompress (drivers/net/slip/slhc.c:614)
>   slip_receive_buf (drivers/net/slip/slip.c:342)
>   tty_ldisc_receive_buf
>   flush_to_ldisc
> 
> Raise the reservation to match the real worst case. The ppp_generic
> receive path already enforces skb_tailroom >= 124 and is unaffected.
> 
> Fixes: b5451d783ade ("slip: Move the SLIP drivers")
> Reported-by: Simon Horman <horms@kernel.org>

FTR, I was mainly passing on information flagged by Sashiko.

> Signed-off-by: Weiming Shi <bestswngs@gmail.com>

Reviewed-by: Simon Horman <horms@kernel.org>

Let me summarise my understanding of Sashiko's review of this patch.

TL;DR: I don't think that review should block progress of this patch.

1. The issue wrt concurrent MTU changes appears to be a separate,
   pre-existing problem. Maybe you've looked into it already,
   if not, you may wish to.

2. The bounds checking problems are addressed by other patches in flight.

   - [PATCH net v2] slip: reject VJ receive packets on instances with no rstate array
     https://lore.kernel.org/netdev/20260415204130.258866-2-bestswngs@gmail.com/

   - [PATCH net] slip: fix slab-out-of-bounds write in slhc_uncompress()
     https://lore.kernel.org/netdev/20260415213359.335657-2-bestswngs@gmail.com/

   In future you might want to consider creating patch sets for related
   patches. But I think it's too late in the case of these patches.

...


* Re: [PATCH net v2] ipv6: Apply max_dst_opts_cnt to ip6_tnl_parse_tlv_enc_lim
From: Ido Schimmel @ 2026-04-19 14:31 UTC (permalink / raw)
  To: Justin Iurman, daniel
  Cc: kuba, edumazet, dsahern, tom, willemdebruijn.kernel, pabeni,
	netdev
In-Reply-To: <8fdf517b-6217-4df6-8adf-0c79ce8d3be8@gmail.com>

On Sun, Apr 19, 2026 at 12:37:35AM +0200, Justin Iurman wrote:
> Nope. But if it happens, users would be confused as max_dst_opts_cnt would
> not have the same meaning in two different code paths. OTOH, I agree that
> such a situation would look suspicious. I guess it's fine to keep your patch
> as is and to not over-complicate things unnecessarily.

I agree that it's weird to reuse max_dst_opts_cnt here:

1. The meaning is different from the Rx path.

2. We only enforce max_dst_opts_cnt, but not max_dst_opts_len.

3. The default is derived from the initial netns, unlike in the Rx path.

Given the above and that:

1. We believe that 8 options until the tunnel encapsulation limit option
is liberal enough.

2. We don't want to over-complicate things.

Can we go with a hard-coded 8 and see if anyone complains? In the
unlikely case that someone complains we can at least gain some insight
into how this option is actually used with tunnels.


* Re: [PATCH net] slip: fix slab-out-of-bounds write in slhc_uncompress()
From: Simon Horman @ 2026-04-19 14:32 UTC (permalink / raw)
  To: Weiming Shi
  Cc: Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Andrew Morton, Hans Verkuil, Alex Deucher,
	Ian Rogers, Jonathan Cameron, Kees Cook, Ingo Molnar, Alan Cox,
	netdev
In-Reply-To: <20260419142720.GJ280379@horms.kernel.org>

On Sun, Apr 19, 2026 at 03:27:26PM +0100, Simon Horman wrote:
> On Thu, Apr 16, 2026 at 05:34:00AM +0800, Weiming Shi wrote:
> > sl_bump() reserves only 80 bytes of expansion headroom before calling
> > slhc_uncompress(), but the reconstructed IP + TCP header is up to
> > ip->ihl*4 + thp->doff*4 bytes. IHL and TCP doff are 4-bit fields and
> > both can legitimately reach 15, so the header can grow to 2*15*4 =
> > 120 bytes. A VJ-uncompressed primer with ihl=15, doff=15 followed by
> > a compressed frame of size buffsize - 80 therefore writes up to
> > 33 bytes past the kmalloc(buffsize + 4) rbuff allocation, with
> > attacker-controlled content:
> > 
> >  BUG: KASAN: slab-out-of-bounds in slhc_uncompress
> >  Write of size 1069 at addr ffff88800ba93078 by task kworker/u8:1/32
> >  Workqueue: events_unbound flush_to_ldisc
> >  Call Trace:
> >   __asan_memmove+0x3f/0x70
> >   slhc_uncompress (drivers/net/slip/slhc.c:614)
> >   slip_receive_buf (drivers/net/slip/slip.c:342)
> >   tty_ldisc_receive_buf
> >   flush_to_ldisc
> > 
> > Raise the reservation to match the real worst case. The ppp_generic
> > receive path already enforces skb_tailroom >= 124 and is unaffected.
> > 
> > Fixes: b5451d783ade ("slip: Move the SLIP drivers")
> > Reported-by: Simon Horman <horms@kernel.org>
> 
> FTR, I was mainly passing on information flagged by Sashiko.
> 
> > Signed-off-by: Weiming Shi <bestswngs@gmail.com>
> 
> Reviewed-by: Simon Horman <horms@kernel.org>

I'm very sorry, but the text below was for a different,
albeit related, patch:

- [PATCH net] slip: bound decode() reads against the compressed packet length
  https://lore.kernel.org/netdev/20260416100147.531855-5-bestswngs@gmail.com/

The corresponding text relating to this patch was posted as:

https://lore.kernel.org/netdev/20260419142710.GI280379@horms.kernel.org/

Sorry for the mix up!

> Let me summarise my understanding of Sashiko's review of this patch.
> 
> TL;DR: I don't think that review should block progress of this patch.
> 
> 1. The issue wrt concurrent MTU changes appears to be a separate,
>    pre-existing problem. Maybe you've looked into it already,
>    if not, you may wish to.
> 
> 2. The bounds checking problems are addressed by other patches in flight.
> 
>    - [PATCH net v2] slip: reject VJ receive packets on instances with no rstate array
>      https://lore.kernel.org/netdev/20260415204130.258866-2-bestswngs@gmail.com/
> 
>    - [PATCH net] slip: fix slab-out-of-bounds write in slhc_uncompress()
>      https://lore.kernel.org/netdev/20260415213359.335657-2-bestswngs@gmail.com/
> 
>    In future you might want to consider creating patch sets for related
>    patches. But I think it's too late in the case of these patches.
> 
> ...


* Re: [PATCH net v2 1/1] net: l3mdev: Ignore non-L3 uppers in l3mdev_fib_table_rcu
From: Haoze Xie @ 2026-04-19 14:38 UTC (permalink / raw)
  To: Ido Schimmel, Ao Zhou
  Cc: netdev, David Ahern, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Ido Schimmel,
	Jiri Pirko, Yifan Wu, Juefei Pu, Yuan Tan, Xin Liu, royenheart
In-Reply-To: <6eb15ec6-6994-4b24-9a53-48a653b96860@gmail.com>


On 4/19/2026 11:49 AM, Haoze Xie wrote:
> 
> On 4/6/2026 11:48 PM, Ido Schimmel wrote:
>> On Mon, Apr 06, 2026 at 09:28:16PM +0800, Ao Zhou wrote:
>>> From: Haoze Xie <royenheart@gmail.com>
>>>
>>> l3mdev_fib_table_rcu() assumes that any upper device observed for
>>> an IFF_L3MDEV_SLAVE device is an L3 master and dereferences
>>> master->l3mdev_ops unconditionally.
>>>
>>> VRF slave setup sets IFF_L3MDEV_SLAVE before the upper link is fully
>>> switched, so readers can transiently observe a non-L3 upper such as a
>>> bridge and follow a NULL l3mdev_ops pointer. Require the current upper
>>> to still be an L3 master before consulting its FIB table.
>>>
>>> Fixes: fdeea7be88b1 ("net: vrf: Set slave's private flag before linking")
>>> Reported-by: Yifan Wu <yifanwucs@gmail.com>
>>> Reported-by: Juefei Pu <tomapufckgml@gmail.com>
>>> Co-developed-by: Yuan Tan <yuantan098@gmail.com>
>>> Signed-off-by: Yuan Tan <yuantan098@gmail.com>
>>> Suggested-by: Xin Liu <bird@lzu.edu.cn>
>>> Reviewed-by: David Ahern <dsahern@kernel.org>
>>> Signed-off-by: Haoze Xie <royenheart@gmail.com>
>>> Signed-off-by: Ao Zhou <n05ec@lzu.edu.cn>
>>> ---
>>> changes in v2:
>>> - point Fixes to the VRF slave ordering change identified by David Ahern
>>> - add David Ahern's Reviewed-by trailer
>>>
>>>  net/l3mdev/l3mdev.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/net/l3mdev/l3mdev.c b/net/l3mdev/l3mdev.c
>>> index 5432a5f2dfc8..b8a3030cb2c4 100644
>>> --- a/net/l3mdev/l3mdev.c
>>> +++ b/net/l3mdev/l3mdev.c
>>> @@ -177,7 +177,7 @@ u32 l3mdev_fib_table_rcu(const struct net_device *dev)
>>>  		const struct net_device *master;
>>>  
>>>  		master = netdev_master_upper_dev_get_rcu(_dev);
>>> -		if (master &&
>>> +		if (master && netif_is_l3_master(master) &&
>>>  		    master->l3mdev_ops->l3mdev_fib_table)
>>
>> Don't we have the same problem in l3mdev_l3_rcv() and l3mdev_l3_out()?
>> If so, please check if I missed more places and include them in v3.
>>
> 
> I checked the same pattern in the other slave-side helpers, and v3 now
> extends the fix to both `l3mdev_l3_rcv()` and `l3mdev_l3_out()` in
> addition to `l3mdev_fib_table_rcu()`.
> 
> All three helpers resolve the current upper with
> `netdev_master_upper_dev_get_rcu()` and then use `master->l3mdev_ops`.
> So v3 consistently requires the resolved upper to still satisfy
> `netif_is_l3_master(master)` before dereferencing `l3mdev_ops`.
> 

While updating the patch, I found that `l3mdev_l3_rcv()` must keep
working for existing `IFF_L3MDEV_RX_HANDLER` users such as
`ipvlan_l3s`. The v3 patch keeps that direct RX-handler path
intact and applies the extra master check only to the slave-resolved
upper case.
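
A minimal userspace sketch of that dispatch, for illustration only. It is
not the v3 patch: the `sketch_` names, the flag values and the `upper`
field (standing in for `netdev_master_upper_dev_get_rcu()`) are all
hypothetical, and the real helper works on `struct net_device` under RCU.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's priv_flags bits. */
#define SKETCH_L3MDEV_MASTER     0x1u
#define SKETCH_L3MDEV_SLAVE      0x2u
#define SKETCH_L3MDEV_RX_HANDLER 0x4u

/* Simplified device: only the fields the dispatch decision needs. */
struct sketch_dev {
	unsigned int priv_flags;
	const struct sketch_dev *upper;	/* current upper device, may be NULL */
};

/*
 * Return the device whose l3mdev ops the receive path would use,
 * or NULL if the L3 receive hook must be skipped.
 */
static const struct sketch_dev *sketch_rcv_master(const struct sketch_dev *dev)
{
	if (dev->priv_flags & SKETCH_L3MDEV_SLAVE) {
		const struct sketch_dev *master = dev->upper;

		/* Extra check applies only to the slave-resolved upper:
		 * it may transiently be a non-L3 device such as a bridge.
		 */
		if (master && (master->priv_flags & SKETCH_L3MDEV_MASTER))
			return master;
		return NULL;
	}

	/* Direct path: L3 masters and RX-handler users (ipvlan l3s style)
	 * keep using their own ops, with no additional master check.
	 */
	if (dev->priv_flags & (SKETCH_L3MDEV_MASTER | SKETCH_L3MDEV_RX_HANDLER))
		return dev;

	return NULL;
}
```

In this sketch an RX-handler device resolves to itself, while a slave whose
upper is not an L3 master resolves to NULL and is skipped.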

Here is the smoke test; the ping should succeed:

===> BEGIN smoke test cmd <===
ip netns add ipvl_ns
ip link add ipvl_host type veth peer name ipvl_peer
ip link set ipvl_peer netns ipvl_ns

ip link set ipvl_host up
ip link add link ipvl_host name ipvl0 type ipvlan mode l3s
ip addr add 198.51.100.1/24 dev ipvl0
ip link set ipvl0 up

ip netns exec ipvl_ns ip link set lo up
ip netns exec ipvl_ns ip link set ipvl_peer up
ip netns exec ipvl_ns ip addr add 198.51.100.2/24 dev ipvl_peer
ip netns exec ipvl_ns ping -c 3 198.51.100.1
===> END smoke test cmd <===

>> And I think that the part that I was missing earlier is that we don't
>> have RCU synchronization in the unslaving path, so an RCU reader can
>> either see the original master, NULL or a new master (e.g., bridge
>> instead of the original VRF master).
>>
>>>  			tb_id = master->l3mdev_ops->l3mdev_fib_table(master);
>>>  	}
>>> -- 
>>> 2.53.0
>>>
> 



* Re: [PATCH net] slip: bound decode() reads against the compressed packet length
From: Simon Horman @ 2026-04-19 14:56 UTC (permalink / raw)
  To: Weiming Shi
  Cc: Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev
In-Reply-To: <20260416100147.531855-5-bestswngs@gmail.com>

On Thu, Apr 16, 2026 at 06:01:51PM +0800, Weiming Shi wrote:
> slhc_uncompress() parses a VJ-compressed TCP header by advancing a
> pointer through the packet via decode() and pull16(). Neither helper
> bounds-checks against isize, and decode() masks its return with
> & 0xffff so it can never return the -1 that callers test for -- those
> error paths are dead code.
> 
> A short compressed frame whose change byte requests optional fields
> lets decode() read past the end of the packet. The over-read bytes
> are folded into the cached cstate and reflected into subsequent
> reconstructed packets.
> 
> Make decode() and pull16() take the packet end pointer and return -1
> when exhausted. Add a bounds check before the TCP-checksum read.
> The existing == -1 tests now do what they were always meant to.
> 
> Fixes: b5451d783ade ("slip: Move the SLIP drivers")

An AI-generated review points out that the cited patch only moves code,
so it isn't the origin of the bug. It seems that the problem has been
present since the beginning of git history. So:

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")

> Reported-by: Simon Horman <horms@kernel.org>

FTR, I believe I was mainly passing on an AI-generated review.

> Closes: https://lore.kernel.org/netdev/20260414134126.758795-2-horms@kernel.org/
> Signed-off-by: Weiming Shi <bestswngs@gmail.com>

Reviewed-by: Simon Horman <horms@kernel.org>

> ---
>  drivers/net/slip/slhc.c | 43 ++++++++++++++++++++++++-----------------
>  1 file changed, 25 insertions(+), 18 deletions(-)

As usual I'll comment on the review of this patch by Sashiko.

TL;DR: I don't think it should block progress of this patch.

The review by Sashiko flags out of bounds errors. However,
these are addressed by one of your other patches:

- [PATCH net] slip: fix slab-out-of-bounds write in slhc_uncompress()
  https://lore.kernel.org/netdev/20260415213359.335657-2-bestswngs@gmail.com/

As noted in my review of that patch, while it seems too late for these
patches, please consider bundling related patches in a patchset in future.
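
For readers following along, the approach described in the commit message
(pass the packet end pointer to decode() and pull16(), return -1 when the
input is exhausted) might look roughly like the sketch below. This is an
assumption-laden illustration, not the code from the patch: the `sketch_`
names and exact signatures are hypothetical.

```c
#include <stddef.h>

/*
 * Read a big-endian 16-bit value and advance *cpp.
 * Returns -1 if fewer than two bytes remain before 'end'.
 */
static long sketch_pull16(const unsigned char **cpp, const unsigned char *end)
{
	long rval;

	if (end - *cpp < 2)
		return -1;

	rval = ((long)(*cpp)[0] << 8) | (*cpp)[1];
	*cpp += 2;
	return rval;
}

/*
 * Decode one VJ-style delta: a non-zero byte is the value itself,
 * a zero byte means a 16-bit value follows.
 * Returns -1 if the packet ends before the field is complete, so the
 * result is never masked in a way that would hide the error.
 */
static long sketch_decode(const unsigned char **cpp, const unsigned char *end)
{
	int x;

	if (*cpp >= end)
		return -1;

	x = *(*cpp)++;
	if (x == 0)
		return sketch_pull16(cpp, end);

	return x;
}
```

Valid results fall in the range 0..65535, so a caller's == -1 test can
distinguish a truncated packet from any legitimate delta.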
