Netdev List
* [syzbot] [bridge?] KASAN: use-after-free Read in qdisc_pkt_len_segs_init
From: syzbot @ 2026-04-14 11:58 UTC
  To: bridge, davem, edumazet, horms, idosch, kuba, linux-kernel,
	netdev, pabeni, razor, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    17ad4759a082 Merge branch 'wangxun-improvement'
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1505dcd2580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=229411a0a13ccb7d
dashboard link: https://syzkaller.appspot.com/bug?extid=83181a31faf9455499c5
compiler:       Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/b67be09d914c/disk-17ad4759.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/20a2548795c3/vmlinux-17ad4759.xz
kernel image: https://storage.googleapis.com/syzbot-assets/29e723395cef/bzImage-17ad4759.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+83181a31faf9455499c5@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in __tcp_hdrlen include/linux/tcp.h:31 [inline]
BUG: KASAN: use-after-free in qdisc_pkt_len_segs_init+0x7f8/0xa30 net/core/dev.c:4146
Read of size 2 at addr ffff88815ace2434 by task syz.2.24/6033

CPU: 0 UID: 0 PID: 6033 Comm: syz.2.24 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/18/2026
Call Trace:
 <IRQ>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0xba/0x230 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 __tcp_hdrlen include/linux/tcp.h:31 [inline]
 qdisc_pkt_len_segs_init+0x7f8/0xa30 net/core/dev.c:4146
 sch_handle_ingress net/core/dev.c:4483 [inline]
 __netif_receive_skb_core+0x13bd/0x31a0 net/core/dev.c:6065
 __netif_receive_skb_list_core+0x24d/0x810 net/core/dev.c:6289
 __netif_receive_skb_list net/core/dev.c:6356 [inline]
 netif_receive_skb_list_internal+0x995/0xcf0 net/core/dev.c:6447
 gro_normal_list include/net/gro.h:523 [inline]
 gro_flush_normal include/net/gro.h:531 [inline]
 napi_complete_done+0x299/0x730 net/core/dev.c:6815
 gro_cell_poll+0x5a9/0x5d0 net/core/gro_cells.c:74
 __napi_poll+0xae/0x340 net/core/dev.c:7742
 napi_poll net/core/dev.c:7805 [inline]
 net_rx_action+0x627/0xf70 net/core/dev.c:7962
 handle_softirqs+0x22a/0x870 kernel/softirq.c:622
 do_softirq+0x76/0xd0 kernel/softirq.c:523
 </IRQ>
 <TASK>
 __local_bh_enable_ip+0xf8/0x130 kernel/softirq.c:450
 local_bh_enable include/linux/bottom_half.h:33 [inline]
 tun_rx_batched+0x617/0x790 drivers/net/tun.c:-1
 tun_get_user+0x2aeb/0x3ed0 drivers/net/tun.c:1953
 tun_chr_write_iter+0x113/0x200 drivers/net/tun.c:1999
 new_sync_write fs/read_write.c:595 [inline]
 vfs_write+0x61d/0xb90 fs/read_write.c:688
 ksys_write+0x150/0x270 fs/read_write.c:740
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f177e39c819
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f177f1e3028 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007f177e616180 RCX: 00007f177e39c819
RDX: 000000000000fdef RSI: 00002000000002c0 RDI: 0000000000000003
RBP: 00007f177e432c91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f177e616218 R14: 00007f177e616180 R15: 00007ffee7d40588
 </TASK>

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x15ace2
flags: 0x57ff00000000000(node=1|zone=2|lastcpupid=0x7ff)
raw: 057ff00000000000 ffffea00056b3888 ffffea00056b3888 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner info is not present (never set?)

Memory state around the buggy address:
 ffff88815ace2300: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff88815ace2380: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>ffff88815ace2400: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                     ^
 ffff88815ace2480: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff88815ace2500: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


* Re: linux-next: manual merge of the net-next tree with the net tree
From: Jesper Dangaard Brouer @ 2026-04-14 11:56 UTC
  To: Mark Brown, David Miller, Jakub Kicinski, Paolo Abeni, Networking
  Cc: Fernando Fernandez Mancera, Linux Kernel Mailing List,
	Linux Next Mailing List
In-Reply-To: <adz0iX85FHMz0HdO@sirena.org.uk>



On 13/04/2026 15.50, Mark Brown wrote:
> Hi all,
> 
> Today's linux-next merge of the net-next tree got a conflict in:
> 
>    include/net/sch_generic.h
> 
> between commit:
> 
>    a6bd339dbb351 ("net_sched: fix skb memory leak in deferred qdisc drops")
> 
> from the net tree and commit:
> 
>    ff2998f29f390 ("net: sched: introduce qdisc-specific drop reason tracing")
> 
> from the net-next tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 
> diff --cc include/net/sch_generic.h
> index 5fc0b1ebaf25c,5af262ec4bbd2..0000000000000
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@@ -1168,24 -1185,14 +1185,24 @@@ static inline void tcf_kfree_skb_list(s
>    }
>    
>    static inline void qdisc_dequeue_drop(struct Qdisc *q, struct sk_buff *skb,
> - 				      enum skb_drop_reason reason)
> + 				      enum qdisc_drop_reason reason)
>    {
>   +	struct Qdisc *root;
>   +
>    	DEBUG_NET_WARN_ON_ONCE(!(q->flags & TCQ_F_DEQUEUE_DROPS));
>    	DEBUG_NET_WARN_ON_ONCE(q->flags & TCQ_F_NOLOCK);
>    
>   -	tcf_set_qdisc_drop_reason(skb, reason);
>   -	skb->next = q->to_free;
>   -	q->to_free = skb;
>   +	rcu_read_lock();
>   +	root = qdisc_root_sleeping(q);
>   +
>   +	if (root->flags & TCQ_F_DEQUEUE_DROPS) {
> - 		tcf_set_drop_reason(skb, reason);
> ++		tcf_set_qdisc_drop_reason(skb, reason);

Change/merge looks sane to me :-)
--Jesper


>   +		skb->next = root->to_free;
>   +		root->to_free = skb;
>   +	} else {
>   +		kfree_skb_reason(skb, (enum skb_drop_reason)reason);
>   +	}
>   +	rcu_read_unlock();
>    }
>    
>    /* Instead of calling kfree_skb() while root qdisc lock is held,



* Re: [PATCH v2] netfilter: nfnetlink_osf: fix null-ptr-deref in nf_osf_ttl
From: Fernando Fernandez Mancera @ 2026-04-14 11:50 UTC
  To: Kito Xu (veritas501), pablo
  Cc: coreteam, davem, edumazet, ffmancera, fw, horms, kuba,
	linux-kernel, netdev, netfilter-devel, pabeni, phil
In-Reply-To: <20260414104900.2617863-1-hxzene@gmail.com>

On 4/14/26 12:49 PM, Kito Xu (veritas501) wrote:
> nf_osf_ttl() calls __in_dev_get_rcu(skb->dev) and passes the result
> to in_dev_for_each_ifa_rcu() without checking for NULL. When the
> receiving device has no IPv4 configuration (ip_ptr is NULL),
> __in_dev_get_rcu() returns NULL and in_dev_for_each_ifa_rcu()
> dereferences it unconditionally, causing a kernel crash.
> 
> This can happen when a packet arrives on a device that has had its
> IPv4 configuration removed (e.g., MTU set below IPV4_MIN_MTU causing
> inetdev_destroy) or on a device that was never assigned an IPv4
> address, while an xt_osf or nft_osf rule with TTL_LESS mode is
> active and the packet TTL exceeds the fingerprint TTL.
> 
> Add a NULL check for in_dev before using it. When in_dev is NULL,
> return 0 (no match) since source-address locality cannot be
> determined without IPv4 addresses on the device.
> 
> KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
> RIP: 0010:nf_osf_match_one+0x204/0xa70
> Call Trace:
>   <IRQ>
>   nf_osf_match+0x2f8/0x780
>   xt_osf_match_packet+0x11c/0x1f0
>   ipt_do_table+0x7fe/0x12b0
>   nf_hook_slow+0xac/0x1e0
>   ip_rcv+0x123/0x370
>   __netif_receive_skb_one_core+0x166/0x1b0
>   process_backlog+0x197/0x590
>   __napi_poll+0xa1/0x540
>   net_rx_action+0x401/0xd80
>   handle_softirqs+0x19f/0x610
>   </IRQ>
> 
> Fixes: a218dc82f0b5 ("netfilter: nft_osf: Add ttl option support")
> Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
> Signed-off-by: Kito Xu (veritas501) <hxzene@gmail.com>

Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>

Thanks !
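
For readers skimming the thread, the guard described in the quoted commit message can be sketched in userspace C. This is only an illustrative model, not the patch itself: struct model_ifaddr, struct model_in_device and model_find_local_ifa() are hypothetical stand-ins for the kernel's in_ifaddr, in_device and the in_dev_for_each_ifa_rcu() walk, and addresses are plain host-order integers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one IPv4 address configured on a device. */
struct model_ifaddr {
	uint32_t addr;                   /* interface address, host order */
	uint32_t mask;                   /* netmask, host order */
	const struct model_ifaddr *next; /* next address on the same device */
};

/* Hypothetical stand-in for the per-device IPv4 state. */
struct model_in_device {
	const struct model_ifaddr *ifa_list;
};

/*
 * Return the first address whose subnet contains saddr, or NULL.
 * in_dev may be NULL when the device has no IPv4 configuration; that case
 * is treated as "no match" instead of being dereferenced.
 */
const struct model_ifaddr *
model_find_local_ifa(const struct model_in_device *in_dev, uint32_t saddr)
{
	const struct model_ifaddr *ifa;

	if (!in_dev)
		return NULL;

	for (ifa = in_dev->ifa_list; ifa; ifa = ifa->next) {
		if (((saddr ^ ifa->addr) & ifa->mask) == 0)
			return ifa;
	}
	return NULL;
}
```

What the caller does with the result is left out on purpose; the point is only that a missing IPv4 configuration short-circuits to "no match" before the address list is walked.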


* RE: [Intel-wired-lan] [PATCH iwl-next v1 0/3] i40e: support XDP metadata ops (RX
From: Holda, Patryk @ 2026-04-14 11:46 UTC
  To: Joe Damato, Kohei Enju
  Cc: intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	Nguyen, Anthony L, Kitszel, Przemyslaw, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	kohei.enju@gmail.com
In-Reply-To: <ab3TtpEKY5Pg+uQt@devvm20253.cco0.facebook.com>

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Joe Damato
> Sent: Saturday, March 21, 2026 12:10 AM
> To: Kohei Enju <kohei@enjuk.jp>
> Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; Nguyen,
> Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; Andrew Lunn <andrew+netdev@lunn.ch>;
> David S. Miller <davem@davemloft.net>; Eric Dumazet
> <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni
> <pabeni@redhat.com>; kohei.enju@gmail.com
> Subject: Re: [Intel-wired-lan] [PATCH iwl-next v1 0/3] i40e: support XDP
> metadata ops (RX
> 
> On Thu, Mar 19, 2026 at 05:16:41PM +0000, Kohei Enju wrote:
> > This series adds support for XDP metadata ops. Since the i40e RX
> > timestamps are not available from the RX descriptor in the XDP path,
> > this series doesn't implement bpf_xdp_metadata_rx_timestamp().
> >
> > Patch 1/3 prepares i40e_xdp_buff for subsequent patches.
> > Patch 2/3 and 3/3 introduce bpf_xdp_metadata_rx_hash() and
> > bpf_xdp_metadata_rx_vlan_tag() respectively.
> >
> > Tested on Intel Corporation Ethernet Controller X710 for 10GbE SFP+
> > with ./tools/testing/selftests/bpf/xdp_hw_metadata.
> > Since i40e doesn't support HWTSTAMP_FILTER_ALL as an rx_filter, I
> > locally changed the selftest to use HWTSTAMP_FILTER_NONE instead.
> >
> > Kohei Enju (3):
> >   i40e: prepare for XDP metadata ops support
> >   i40e: add support for bpf_xdp_metadata_rx_hash()
> >   i40e: add support for bpf_xdp_metadata_rx_vlan_tag()
> >
> >  drivers/net/ethernet/intel/i40e/i40e_main.c | 51 ++++++++++++++++++++-
> >  drivers/net/ethernet/intel/i40e/i40e_txrx.c |  5 +-
> >  drivers/net/ethernet/intel/i40e/i40e_txrx.h |  7 ++-
> >  drivers/net/ethernet/intel/i40e/i40e_type.h |  5 ++
> >  drivers/net/ethernet/intel/i40e/i40e_xsk.c  | 12 +++++
> >  5 files changed, 77 insertions(+), 3 deletions(-)
> 
> For the series:
> 
> Reviewed-by: Joe Damato <joe@dama.to>

Tested-by: Patryk Holda <patryk.holda@intel.com> 




* RE: [Intel-wired-lan] [PATCH iwl-next v1 3/3] i40e: add support for bpf_xdp_metadata_rx_vlan_tag()
From: Holda, Patryk @ 2026-04-14 11:46 UTC
  To: Loktionov, Aleksandr, Kohei Enju,
	intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org
  Cc: Nguyen, Anthony L, Kitszel, Przemyslaw, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	kohei.enju@gmail.com
In-Reply-To: <IA3PR11MB89865073DDFC1987A6445DA7E54CA@IA3PR11MB8986.namprd11.prod.outlook.com>

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Loktionov, Aleksandr
> Sent: Friday, March 20, 2026 7:58 AM
> To: Kohei Enju <kohei@enjuk.jp>; intel-wired-lan@lists.osuosl.org;
> netdev@vger.kernel.org
> Cc: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; Andrew Lunn <andrew+netdev@lunn.ch>;
> David S. Miller <davem@davemloft.net>; Eric Dumazet
> <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni
> <pabeni@redhat.com>; kohei.enju@gmail.com
> Subject: Re: [Intel-wired-lan] [PATCH iwl-next v1 3/3] i40e: add support for
> bpf_xdp_metadata_rx_vlan_tag()
> 
> 
> 
> > -----Original Message-----
> > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> > Of Kohei Enju
> > Sent: Thursday, March 19, 2026 6:17 PM
> > To: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org
> > Cc: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel,
> > Przemyslaw <przemyslaw.kitszel@intel.com>; Andrew Lunn
> > <andrew+netdev@lunn.ch>; David S. Miller <davem@davemloft.net>; Eric
> > Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo
> > Abeni <pabeni@redhat.com>; kohei.enju@gmail.com; Kohei Enju
> > <kohei@enjuk.jp>
> > Subject: [Intel-wired-lan] [PATCH iwl-next v1 3/3] i40e: add support
> > for bpf_xdp_metadata_rx_vlan_tag()
> >
> > Introduce i40e_xdp_rx_vlan_tag() which takes the same approach as
> > i40e_process_skb_fields() to extract the VLAN tag from the RX
> > descriptor.
> >
> > Tested with X710 adapter using xdp_hw_metadata, and confirmed that
> > VLAN tags match between bpf_xdp_metadata_rx_vlan_tag() and
> > skb->vlan_proto/vlan_tci.
> >
> > Signed-off-by: Kohei Enju <kohei@enjuk.jp>
> > ---
> >  drivers/net/ethernet/intel/i40e/i40e_main.c | 19 +++++++++++++++++++
> >  1 file changed, 19 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> > index 6b7e34b16a8d..3749f32ef95a 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> > @@ -13610,8 +13610,27 @@ static int i40e_xdp_rx_hash(const struct xdp_md *_ctx, u32 *hash,
> >  	return 0;
> >  }
> >
> > +static int i40e_xdp_rx_vlan_tag(const struct xdp_md *_ctx, __be16 *vlan_proto,
> > +				u16 *vlan_tci)
> > +{
> > +	const struct i40e_xdp_buff *ctx = (const void *)_ctx;
> > +	const union i40e_rx_desc *desc = ctx->desc;
> > +	u64 status;
> > +
> > +	status = le64_to_cpu(desc->wb.qword1.status_error_len);
> > +
> > +	if (!(status & BIT(I40E_RX_DESC_STATUS_L2TAG1P_SHIFT)))
> > +		return -ENODATA;
> > +
> > +	*vlan_proto = cpu_to_be16(ETH_P_8021Q);
> > +	*vlan_tci = le16_to_cpu(desc->wb.qword0.lo_dword.l2tag1);
> > +
> > +	return 0;
> > +}
> > +
> >  static const struct xdp_metadata_ops i40e_xdp_metadata_ops = {
> >  	.xmo_rx_hash		= i40e_xdp_rx_hash,
> > +	.xmo_rx_vlan_tag	= i40e_xdp_rx_vlan_tag,
> >  };
> >
> >  static const struct net_device_ops i40e_netdev_ops = {
> > --
> > 2.51.0
> 
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>

Tested-by: Patryk Holda <patryk.holda@intel.com> 
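
As a reading aid for the quoted patch, the descriptor check can be modelled in userspace C. This is a sketch under stated assumptions: struct model_rx_desc is a hypothetical flattened descriptor whose fields are already in host byte order, and MODEL_L2TAG1P_SHIFT is an assumed bit position standing in for I40E_RX_DESC_STATUS_L2TAG1P_SHIFT. Unlike the patch, which stores the protocol as a big-endian __be16, the model keeps it in host order.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_L2TAG1P_SHIFT 2      /* assumed position of the "L2 tag present" bit */
#define MODEL_ETH_P_8021Q   0x8100 /* 802.1Q EtherType */

/* Hypothetical flattened RX descriptor, fields already in host byte order. */
struct model_rx_desc {
	uint64_t status_error_len;
	uint16_t l2tag1;
};

/*
 * Report the VLAN tag only when the descriptor says one was extracted;
 * otherwise return -ENODATA and leave the outputs untouched.
 */
int model_rx_vlan_tag(const struct model_rx_desc *desc,
		      uint16_t *vlan_proto, uint16_t *vlan_tci)
{
	if (!(desc->status_error_len & (UINT64_C(1) << MODEL_L2TAG1P_SHIFT)))
		return -ENODATA;

	*vlan_proto = MODEL_ETH_P_8021Q;
	*vlan_tci = desc->l2tag1;
	return 0;
}
```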




* RE: [Intel-wired-lan] [PATCH iwl-next v1 2/3] i40e: add support for bpf_xdp_metadata_rx_hash()
From: Holda, Patryk @ 2026-04-14 11:45 UTC
  To: Loktionov, Aleksandr, Kohei Enju,
	intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org
  Cc: Nguyen, Anthony L, Kitszel, Przemyslaw, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	kohei.enju@gmail.com
In-Reply-To: <IA3PR11MB8986D3E4DF65EC87E23A6C1BE54CA@IA3PR11MB8986.namprd11.prod.outlook.com>

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Loktionov, Aleksandr
> Sent: Friday, March 20, 2026 7:57 AM
> To: Kohei Enju <kohei@enjuk.jp>; intel-wired-lan@lists.osuosl.org;
> netdev@vger.kernel.org
> Cc: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; Andrew Lunn <andrew+netdev@lunn.ch>;
> David S. Miller <davem@davemloft.net>; Eric Dumazet
> <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni
> <pabeni@redhat.com>; kohei.enju@gmail.com
> Subject: Re: [Intel-wired-lan] [PATCH iwl-next v1 2/3] i40e: add support for
> bpf_xdp_metadata_rx_hash()
> 
> 
> 
> > -----Original Message-----
> > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> > Of Kohei Enju
> > Sent: Thursday, March 19, 2026 6:17 PM
> > To: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org
> > Cc: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel,
> > Przemyslaw <przemyslaw.kitszel@intel.com>; Andrew Lunn
> > <andrew+netdev@lunn.ch>; David S. Miller <davem@davemloft.net>; Eric
> > Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo
> > Abeni <pabeni@redhat.com>; kohei.enju@gmail.com; Kohei Enju
> > <kohei@enjuk.jp>
> > Subject: [Intel-wired-lan] [PATCH iwl-next v1 2/3] i40e: add support
> > for bpf_xdp_metadata_rx_hash()
> >
> > Introduce i40e_xdp_rx_hash() which takes the same approach as
> > i40e_rx_hash() to extract the hash from RX descriptors.
> >
> > Tested with X710 adapter using xdp_hw_metadata, and verified hash
> > consistency between bpf_xdp_metadata_rx_hash() and skb->hash.
> >
> > Signed-off-by: Kohei Enju <kohei@enjuk.jp>
> > ---
> >  drivers/net/ethernet/intel/i40e/i40e_main.c | 30 +++++++++++++++++++++
> >  drivers/net/ethernet/intel/i40e/i40e_type.h |  5 ++++
> >  2 files changed, 35 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> > index 7966d9cb8009..6b7e34b16a8d 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> > @@ -4,6 +4,7 @@
> >  #include <generated/utsrelease.h>
> >  #include <linux/crash_dump.h>
> >  #include <linux/intel/libie/pctype.h>
> > +#include <linux/intel/libie/rx.h>
> >  #include <linux/if_bridge.h>
> >  #include <linux/if_macvlan.h>
> >  #include <linux/module.h>
> > @@ -13585,6 +13586,34 @@ static int i40e_xdp(struct net_device *dev,
> >  	}
> >  }
> >
> > +static int i40e_xdp_rx_hash(const struct xdp_md *_ctx, u32 *hash,
> > +			    enum xdp_rss_hash_type *rss_type)
> > +{
> > +	const struct i40e_xdp_buff *ctx = (const void *)_ctx;
> > +	const union i40e_rx_desc *desc = ctx->desc;
> > +	struct libeth_rx_pt rx_ptype;
> > +	u8 raw_rx_ptype;
> > +	u64 status;
> > +
> > +	status = le64_to_cpu(desc->wb.qword1.status_error_len);
> > +	raw_rx_ptype = FIELD_GET(I40E_RXD_QW1_PTYPE_MASK, status);
> > +	rx_ptype = libie_rx_pt_parse(raw_rx_ptype);
> > +
> > +	if (!libeth_rx_pt_has_hash(ctx->xdp.rxq->dev, rx_ptype) ||
> > +	    FIELD_GET(I40E_RX_DESC_STATUS_FLTSTAT_MASK, status) !=
> > +		    I40E_RX_DESC_FLTSTAT_RSS_HASH)
> > +		return -ENODATA;
> > +
> > +	*hash = le32_to_cpu(desc->wb.qword0.hi_dword.rss);
> > +	*rss_type = rx_ptype.hash_type;
> > +
> > +	return 0;
> > +}
> > +
> > +static const struct xdp_metadata_ops i40e_xdp_metadata_ops = {
> > +	.xmo_rx_hash		= i40e_xdp_rx_hash,
> > +};
> > +
> >  static const struct net_device_ops i40e_netdev_ops = {
> >  	.ndo_open		= i40e_open,
> >  	.ndo_stop		= i40e_close,
> > @@ -13788,6 +13817,7 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
> >  	i40e_vsi_config_netdev_tc(vsi, vsi->tc_config.enabled_tc);
> >
> >  	netdev->netdev_ops = &i40e_netdev_ops;
> > +	netdev->xdp_metadata_ops = &i40e_xdp_metadata_ops;
> >  	netdev->watchdog_timeo = 5 * HZ;
> >  	i40e_set_ethtool_ops(netdev);
> >
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h
> > index ed8bbdb586da..16a65c6e5153 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_type.h
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
> > @@ -4,6 +4,7 @@
> >  #ifndef _I40E_TYPE_H_
> >  #define _I40E_TYPE_H_
> >
> > +#include <linux/bits.h>
> >  #include <uapi/linux/if_ether.h>
> >  #include "i40e_adminq.h"
> >  #include "i40e_hmc.h"
> > @@ -699,6 +700,10 @@ enum i40e_rx_desc_status_bits {
> >  	I40E_RX_DESC_STATUS_LAST /* this entry must be last!!! */
> >  };
> >
> > +#define I40E_RX_DESC_STATUS_FLTSTAT_MASK                   \
> > +	GENMASK_ULL(I40E_RX_DESC_STATUS_FLTSTAT_SHIFT + 1, \
> > +		    I40E_RX_DESC_STATUS_FLTSTAT_SHIFT)
> > +
> >  #define I40E_RXD_QW1_STATUS_SHIFT	0
> >  #define I40E_RXD_QW1_STATUS_MASK	((BIT(I40E_RX_DESC_STATUS_LAST) - 1) \
> >  					 << I40E_RXD_QW1_STATUS_SHIFT)
> > --
> > 2.51.0
> 
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>

Tested-by: Patryk Holda <patryk.holda@intel.com> 
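
The two-bit FLTSTAT mask added in the quoted patch can be illustrated with a userspace equivalent of GENMASK_ULL() and FIELD_GET(). This is a sketch, not kernel code: the MODEL_ names are hypothetical, and the shift value 12 and the "RSS hash" encoding 3 are assumptions standing in for I40E_RX_DESC_STATUS_FLTSTAT_SHIFT and I40E_RX_DESC_FLTSTAT_RSS_HASH.

```c
#include <stdint.h>

/* Userspace stand-in for GENMASK_ULL(h, l): bits l..h set, inclusive. */
#define MODEL_GENMASK_ULL(h, l) \
	((~UINT64_C(0) << (l)) & (~UINT64_C(0) >> (63 - (h))))

#define MODEL_FLTSTAT_SHIFT    12 /* assumed bit position of the field */
#define MODEL_FLTSTAT_MASK \
	MODEL_GENMASK_ULL(MODEL_FLTSTAT_SHIFT + 1, MODEL_FLTSTAT_SHIFT)
#define MODEL_FLTSTAT_RSS_HASH 3  /* assumed encoding for "RSS hash valid" */

/*
 * Userspace stand-in for FIELD_GET(): (mask & (~mask + 1)) isolates the
 * lowest set bit of the mask, so the division shifts the field down to bit 0.
 */
uint64_t model_field_get(uint64_t mask, uint64_t reg)
{
	return (reg & mask) / (mask & (~mask + 1));
}

/* Nonzero when the status word says the descriptor carries an RSS hash. */
int model_has_rss_hash(uint64_t status)
{
	return model_field_get(MODEL_FLTSTAT_MASK, status) ==
	       MODEL_FLTSTAT_RSS_HASH;
}
```

Comparing the whole field against the expected encoding, as the patch does, is stricter than testing a single bit: a status word with only one of the two bits set is rejected.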




* RE: [Intel-wired-lan] [PATCH iwl-next v4] ice: remove excessive memory allocation in ice_create_lag_recipe()
From: Holda, Patryk @ 2026-04-14 11:45 UTC
  To: Loktionov, Aleksandr, intel-wired-lan@lists.osuosl.org,
	Nguyen, Anthony L, Loktionov, Aleksandr
  Cc: netdev@vger.kernel.org, Szycik, Marcin, Joe Damato
In-Reply-To: <20260327064855.112786-1-aleksandr.loktionov@intel.com>

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Aleksandr Loktionov
> Sent: Friday, March 27, 2026 7:49 AM
> To: intel-wired-lan@lists.osuosl.org; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; Loktionov, Aleksandr
> <aleksandr.loktionov@intel.com>
> Cc: netdev@vger.kernel.org; Szycik, Marcin <marcin.szycik@intel.com>; Joe
> Damato <joe@dama.to>
> Subject: [Intel-wired-lan] [PATCH iwl-next v4] ice: remove excessive memory
> allocation in ice_create_lag_recipe()
> 
> From: Marcin Szycik <marcin.szycik@intel.com>
> 
> For some reason ice_create_lag_recipe() allocates an array of 64 struct
> ice_aqc_recipe_data_elem elements, while it only needs one (1).
> Fix it, while also using kzalloc_obj().
> 
> Signed-off-by: Marcin Szycik <marcin.szycik@intel.com>
> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> Reviewed-by: Joe Damato <joe@dama.to>
> ---
> v3 -> v4 corrected misspelled RB from Joe
> v2 -> v3 use sizeof(*new_rcp) in memcpy() to match the allocation (Joe)
> v1 -> v2 remove 'Fixes' from commit message because it's not a critical bug
> ---
>  drivers/net/ethernet/intel/ice/ice_lag.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c b/drivers/net/ethernet/intel/ice/ice_lag.c
> index 310e8fe..9ad19c3 100644
> --- a/drivers/net/ethernet/intel/ice/ice_lag.c
> +++ b/drivers/net/ethernet/intel/ice/ice_lag.c
> @@ -2418,11 +2418,11 @@ static int ice_create_lag_recipe(struct ice_hw *hw, u16 *rid,
>  	if (err)
>  		return err;
> 
> -	new_rcp = kzalloc(ICE_RECIPE_LEN * ICE_MAX_NUM_RECIPES, GFP_KERNEL);
> +	new_rcp = kzalloc_obj(*new_rcp, GFP_KERNEL);
>  	if (!new_rcp)
>  		return -ENOMEM;
> 
> -	memcpy(new_rcp, base_recipe, ICE_RECIPE_LEN);
> +	memcpy(new_rcp, base_recipe, sizeof(*new_rcp));
>  	new_rcp->content.act_ctrl_fwd_priority = prio;
>  	new_rcp->content.rid = *rid | ICE_AQ_RECIPE_ID_IS_ROOT;
>  	new_rcp->recipe_indx = *rid;

Tested-by: Patryk Holda <patryk.holda@intel.com> 
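
The allocation pattern the quoted patch moves to (one object, sized from the pointer's own type for both the allocation and the copy) looks roughly like this in userspace C. It is a sketch only: struct model_recipe and model_clone_recipe() are hypothetical, and calloc() stands in for the kernel's zeroing allocator.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct ice_aqc_recipe_data_elem. */
struct model_recipe {
	unsigned char recipe_indx;
	unsigned char rid;
	unsigned char priority;
	unsigned char pad[61];
};

/*
 * Clone one recipe. Both the allocation and the copy are sized with
 * sizeof(*new_rcp), so the two cannot drift apart and exactly one
 * element is allocated. The caller owns the result and must free() it.
 */
struct model_recipe *model_clone_recipe(const struct model_recipe *base,
					unsigned char rid, unsigned char prio)
{
	struct model_recipe *new_rcp = calloc(1, sizeof(*new_rcp));

	if (!new_rcp)
		return NULL;

	memcpy(new_rcp, base, sizeof(*new_rcp));
	new_rcp->priority = prio;
	new_rcp->rid = rid;
	new_rcp->recipe_indx = rid;
	return new_rcp;
}
```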



* RE: [Intel-wired-lan] [PATCH iwl-next v2] ice: call netif_keep_dst() once when entering switchdev mode
From: Holda, Patryk @ 2026-04-14 11:44 UTC
  To: Paul Menzel, Loktionov, Aleksandr
  Cc: intel-wired-lan@lists.osuosl.org, Nguyen, Anthony L,
	netdev@vger.kernel.org, Szycik, Marcin
In-Reply-To: <d87f554d-ef86-44c7-9585-0a3806cc5752@molgen.mpg.de>

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Paul Menzel
> Sent: Wednesday, April 8, 2026 4:28 PM
> To: Loktionov, Aleksandr <aleksandr.loktionov@intel.com>
> Cc: intel-wired-lan@lists.osuosl.org; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; netdev@vger.kernel.org; Szycik, Marcin
> <marcin.szycik@intel.com>
> Subject: Re: [Intel-wired-lan] [PATCH iwl-next v2] ice: call netif_keep_dst()
> once when entering switchdev mode
> 
> Dear Aleksandr, dear Marcin,
> 
> 
> Thank you for the patch.
> 
> Am 08.04.26 um 16:14 schrieb Aleksandr Loktionov:
> > From: Marcin Szycik <marcin.szycik@intel.com>
> >
> > netif_keep_dst() only needs to be called once for the uplink VSI, not
> > once for each port representor.  Move it from ice_eswitch_setup_repr()
> > to ice_eswitch_enable_switchdev().
> 
> It’d be great if you could share the commands needed to verify your change.
> 
> > Fixes: defd52455aee ("ice: do Tx through PF netdev in slow-path")
> > Signed-off-by: Marcin Szycik <marcin.szycik@intel.com>
> > Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> > ---
> > v1 -> v2:
> >   - Verified Fixes: tag via bisect - defd52455aee introduced the redundant
> >     per-repr call to netif_keep_dst(uplink_vsi->netdev) by changing the
> >     target netdev to the uplink VSI inside the per-representor setup
> >     function. Before that commit, each call was on a distinct repr->netdev
> >     so no Fixes: predating it applies.
> >
> >   drivers/net/ethernet/intel/ice/ice_eswitch.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/ice/ice_eswitch.c b/drivers/net/ethernet/intel/ice/ice_eswitch.c
> > index 2e4f096..c30e27b 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_eswitch.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_eswitch.c
> > @@ -117,8 +117,6 @@ static int ice_eswitch_setup_repr(struct ice_pf *pf, struct ice_repr *repr)
> >   	if (!repr->dst)
> >   		return -ENOMEM;
> >
> > -	netif_keep_dst(uplink_vsi->netdev);
> > -
> >   	dst = repr->dst;
> >   	dst->u.port_info.port_id = vsi->vsi_num;
> >   	dst->u.port_info.lower_dev = uplink_vsi->netdev;
> > @@ -312,6 +310,8 @@ static int ice_eswitch_enable_switchdev(struct ice_pf *pf)
> >   	if (ice_eswitch_br_offloads_init(pf))
> >   		goto err_br_offloads;
> >
> > +	netif_keep_dst(uplink_vsi->netdev);
> > +
> >   	pf->eswitch.is_running = true;
> >
> >   	return 0;
> 
> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
> 
> 
> Kind regards,
> 
> Paul

Tested-by: Patryk Holda <patryk.holda@intel.com> 



* RE: [Intel-wired-lan] [PATCH net v3 5/5] iavf: refactor virtchnl polling into single function
From: Loktionov, Aleksandr @ 2026-04-14 11:43 UTC
  To: Jose Ignacio Tornos Martinez, netdev@vger.kernel.org
  Cc: intel-wired-lan@lists.osuosl.org, jesse.brandeburg@intel.com,
	Nguyen, Anthony L, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, Kitszel, Przemyslaw
In-Reply-To: <20260414110006.124286-6-jtornosm@redhat.com>



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Jose Ignacio Tornos Martinez
> Sent: Tuesday, April 14, 2026 1:00 PM
> To: netdev@vger.kernel.org
> Cc: intel-wired-lan@lists.osuosl.org; jesse.brandeburg@intel.com;
> Nguyen, Anthony L <anthony.l.nguyen@intel.com>; davem@davemloft.net;
> edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; Jose Ignacio
> Tornos Martinez <jtornosm@redhat.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>
> Subject: [Intel-wired-lan] [PATCH net v3 5/5] iavf: refactor virtchnl
> polling into single function

To me, this looks like it should go to net-next, since it is a refactoring.

> 
> At this moment, the driver has two separate functions for polling
> virtchnl messages from the admin queue:
> - iavf_poll_virtchnl_msg() for init-time (no timeout, no completion
>   handler)
> - iavf_poll_virtchnl_response() for runtime (with timeout, calls
>   completion)
> 
> Refactor by enhancing iavf_poll_virtchnl_msg() to handle both use
> cases:
> 1. Init-time mode (timeout_ms=0):
>   - Polls until matching opcode found or queue empty
>   - Returns raw message data without processing through completion
>     handler
>   - Exits immediately on empty queue (no sleep/retry)
> 2. Runtime mode (timeout_ms>0):
>   - Polls with timeout using condition callback or opcode check
>   - Processes all messages through iavf_virtchnl_completion()
>   - Supports custom completion callback (takes priority) or falls back
>     to checking adapter->current_op against expected opcode
>   - Uses pending parameter to skip sleep when more messages queued
>   - Uses 50-75 usec sleep (due to commit 9e3f23f44f32 ("i40e: reduce
>     wait time for adminq command completion"))
> 
> By unifying message handling, both init-time and runtime messages can
> be processed through the completion handler when appropriate, ensuring
> consistent state updates and maintaining backward compatibility with
> all existing call sites.
> 
> Suggested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
> ---
>  drivers/net/ethernet/intel/iavf/iavf.h        |   9 +-
>  drivers/net/ethernet/intel/iavf/iavf_main.c   |  13 +-
>  .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 247 ++++++++---------

...

> --
> 2.53.0
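
Since the diff is elided above, the two modes described in the quoted commit message can be sketched as a small userspace model. Everything here is hypothetical: struct model_adapter, model_poll() and the return codes are illustration only, the custom condition callback is left out, and each idle poll consumes one unit of the timeout budget instead of sleeping.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_OP_NONE 0

struct model_msg { int opcode; };

struct model_adapter {
	const struct model_msg *queue; /* pending messages, oldest first */
	size_t queue_len;
	size_t queue_head;
	int current_op;                /* operation awaiting a reply */
	int completed;                 /* messages run through completion */
};

/* Take the oldest queued message; false when the queue is empty. */
static bool model_pop(struct model_adapter *a, struct model_msg *out)
{
	if (a->queue_head == a->queue_len)
		return false;
	*out = a->queue[a->queue_head++];
	return true;
}

/* Stand-in completion handler: clears current_op when its reply arrives. */
static void model_completion(struct model_adapter *a, const struct model_msg *m)
{
	a->completed++;
	if (m->opcode == a->current_op)
		a->current_op = MODEL_OP_NONE;
}

int model_poll(struct model_adapter *a, int expected_op,
	       unsigned int timeout_ms, struct model_msg *out)
{
	unsigned int budget = timeout_ms; /* idle polls allowed; no real sleep */
	struct model_msg m;

	for (;;) {
		bool got = model_pop(a, &m);

		if (timeout_ms == 0) {
			/* Init-time mode: raw message, no completion, no retry. */
			if (!got)
				return -EAGAIN;
			if (m.opcode == expected_op) {
				*out = m;
				return 0;
			}
			continue;
		}

		/* Runtime mode: every message goes through completion. */
		if (got) {
			model_completion(a, &m);
			if (a->current_op != expected_op)
				return 0;
			continue; /* more may be queued, so skip the wait */
		}
		if (budget-- == 0)
			return -ETIMEDOUT;
		/* A real driver would sleep briefly here before retrying. */
	}
}
```

In this model a zero timeout selects the init-time behaviour, which keeps a single entry point for both kinds of caller; whether the real function returns the same codes is not something the quoted text states.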



* RE: [Intel-wired-lan] [PATCH net v2 4/4] ice: skip unnecessary VF reset when setting trust
From: Loktionov, Aleksandr @ 2026-04-14 11:41 UTC
  To: Jose Ignacio Tornos Martinez, netdev@vger.kernel.org
  Cc: intel-wired-lan@lists.osuosl.org, jesse.brandeburg@intel.com,
	Nguyen, Anthony L, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com
In-Reply-To: <20260407165206.1121317-5-jtornosm@redhat.com>



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Jose Ignacio Tornos Martinez
> Sent: Tuesday, April 7, 2026 6:52 PM
> To: netdev@vger.kernel.org
> Cc: intel-wired-lan@lists.osuosl.org; jesse.brandeburg@intel.com;
> Nguyen, Anthony L <anthony.l.nguyen@intel.com>; davem@davemloft.net;
> edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; Jose Ignacio
> Tornos Martinez <jtornosm@redhat.com>
> Subject: [Intel-wired-lan] [PATCH net v2 4/4] ice: skip unnecessary VF
> reset when setting trust
> 
> Similar to the i40e fix, ice_set_vf_trust() unconditionally calls
> ice_reset_vf() when the trust setting changes.
> 
> The ice driver already has logic to clean up MAC LLDP filters when
> removing trust, which is the only operation that requires filter
> synchronization. After this cleanup, the VF reset is only necessary if
> there were actually filters to remove.
> 
> For all other trust state changes (setting trust, or removing trust
> when no filters exist), the reset is unnecessary as filter
> synchronization happens naturally through normal VF operations.
> 
> Fix by only triggering the VF reset when removing trust AND filters
> were actually cleaned up (num_mac_lldp was non-zero).
> 
> This saves some time and eliminates unnecessary service disruption
> when changing VF trust settings if not necessary.
> 
> Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_sriov.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_sriov.c b/drivers/net/ethernet/intel/ice/ice_sriov.c
> index 7e00e091756d..23f692b1e86c 100644
> --- a/drivers/net/ethernet/intel/ice/ice_sriov.c
> +++ b/drivers/net/ethernet/intel/ice/ice_sriov.c
> @@ -1399,14 +1399,19 @@ int ice_set_vf_trust(struct net_device *netdev, int vf_id, bool trusted)
> 
>  	mutex_lock(&vf->cfg_lock);
> 
> -	while (!trusted && vf->num_mac_lldp)
> -		ice_vf_update_mac_lldp_num(vf, ice_get_vf_vsi(vf), false);
> -
>  	vf->trusted = trusted;
> -	ice_reset_vf(vf, ICE_VF_RESET_NOTIFY);
>  	dev_info(ice_pf_to_dev(pf), "VF %u is now %strusted\n",
>  		 vf_id, trusted ? "" : "un");
> 
> +	/* Only reset VF if removing trust and there are MAC LLDP filters
> +	 * to clean up. Reset is needed to ensure filter removal completes.
> +	 */
> +	if (!trusted && vf->num_mac_lldp) {
> +		while (vf->num_mac_lldp)
> +			ice_vf_update_mac_lldp_num(vf, ice_get_vf_vsi(vf), false);
> +		ice_reset_vf(vf, ICE_VF_RESET_NOTIFY);
> +	}
> +
>  	mutex_unlock(&vf->cfg_lock);
> 
>  out_put_vf:
> --
> 2.53.0

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>

^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH net v3 2/5] i40e: skip unnecessary VF reset when setting trust
From: Loktionov, Aleksandr @ 2026-04-14 11:41 UTC (permalink / raw)
  To: Jose Ignacio Tornos Martinez, netdev@vger.kernel.org
  Cc: intel-wired-lan@lists.osuosl.org, jesse.brandeburg@intel.com,
	Nguyen, Anthony L, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com
In-Reply-To: <20260414110006.124286-3-jtornosm@redhat.com>



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Jose Ignacio Tornos Martinez
> Sent: Tuesday, April 14, 2026 1:00 PM
> To: netdev@vger.kernel.org
> Cc: intel-wired-lan@lists.osuosl.org; jesse.brandeburg@intel.com;
> Nguyen, Anthony L <anthony.l.nguyen@intel.com>; davem@davemloft.net;
> edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; Jose Ignacio
> Tornos Martinez <jtornosm@redhat.com>
> Subject: [Intel-wired-lan] [PATCH net v3 2/5] i40e: skip unnecessary
> VF reset when setting trust
> 
> When VF trust is changed, i40e_ndo_set_vf_trust() always calls
> i40e_vc_reset_vf() to sync MAC/VLAN filters. However, this reset is
> only necessary when trust is removed from a VF that has ADQ (advanced
> queue) filters, which need to be deleted.
> 
> In all other cases, the reset causes a ~10 second delay during which:
> - VF must reinitialize completely
> - Any in-progress operations (like bonding enslave) fail with timeouts
> - VF is unavailable
> 
> The MAC/VLAN filter sync will happen naturally through the normal VF
> operations and doesn't require a forced reset.
> 
> Fix by only resetting when actually needed: when removing trust from a
> VF that has ADQ cloud filters. For all other trust changes, just
> update the trust flag and let normal operation continue.
> 
> Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> index a26c3d47ec15..fea267af7afe 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> @@ -4987,16 +4987,21 @@ int i40e_ndo_set_vf_trust(struct net_device *netdev, int vf_id, bool setting)
>  	set_bit(__I40E_MACVLAN_SYNC_PENDING, pf->state);
>  	pf->vsi[vf->lan_vsi_idx]->flags |= I40E_VSI_FLAG_FILTER_CHANGED;
> 
> -	i40e_vc_reset_vf(vf, true);
>  	dev_info(&pf->pdev->dev, "VF %u is now %strusted\n",
>  		 vf_id, setting ? "" : "un");
> 
> +	/* Only reset VF if we're removing trust and it has ADQ cloud filters.
> +	 * Cloud filters can only be added when trusted, so they must be
> +	 * removed when trust is revoked. Other trust changes don't require
> +	 * reset - MAC/VLAN filter sync happens through normal operation.
> +	 */
>  	if (vf->adq_enabled) {
>  		if (!vf->trusted) {
>  			dev_info(&pf->pdev->dev,
>  				 "VF %u no longer Trusted, deleting all cloud filters\n",
>  				 vf_id);
>  			i40e_del_all_cloud_filters(vf);
> +			i40e_vc_reset_vf(vf, true);
>  		}
>  	}
> 
> --
> 2.53.0

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>

^ permalink raw reply

* Re: [PATCH net] net: airoha: Fix VIP configuration for AN7583 SoC
From: patchwork-bot+netdevbpf @ 2026-04-14 11:40 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, horms,
	linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <20260412-airoha-7583-vip-fix-v1-1-c35e02b054bb@kernel.org>

Hello:

This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Sun, 12 Apr 2026 09:57:29 +0200 you wrote:
> EN7581 and AN7583 SoCs have different VIP definitions. Introduce
> get_vip_port callback in airoha_eth_soc_data struct in order to take
> into account EN7581 and AN7583 VIP register layout and definition
> differences.
> Introduce nbq parameter in airoha_gdm_port struct. At the moment nbq
> is set statically to value previously used in airhoha_set_gdm2_loopback
> routine and it will be read from device tree in subsequent patches.
> 
> [...]

Here is the summary with links:
  - [net] net: airoha: Fix VIP configuration for AN7583 SoC
    https://git.kernel.org/netdev/net/c/1acdfbdb516b

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH v2 nf] netfilter: nf_flow_table_ip: Introduce nf_flow_vlan_push()
From: Pablo Neira Ayuso @ 2026-04-14 11:38 UTC (permalink / raw)
  To: Eric Woudstra
  Cc: Florian Westphal, Phil Sutter, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, netfilter-devel,
	netdev
In-Reply-To: <20260414112120.248744-1-ericwouds@gmail.com>

On Tue, Apr 14, 2026 at 01:21:20PM +0200, Eric Woudstra wrote:
> Calling skb_reset_mac_header() before calling skb_vlan_push() does
> remove the error:
> 
> "skb_vlan_push got skb with skb->data not at mac header (offset 18)"
> 
> But the inner vlan tag is still not inserted correctly.
> 
> skb_vlan_push() uses __vlan_insert_inner_tag() to insert the tag
> at offset ETH_HLEN. But the inner tag should only be pushed, without
> offset, similar to nf_flow_pppoe_push().

It is double-tagged VLAN that is broken, right? I observed this once,
but I have been tied up with a few other things.

> Fixes: c653d5a78f34 ("netfilter: flowtable: inline vlan encapsulation in xmit path")
> Fixes: a3aca98aec9a ("netfilter: nf_flow_table_ip: reset mac header before vlan push")
> Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
> 
> ---
> 
>  net/netfilter/nf_flow_table_ip.c | 25 ++++++++++++++++++++++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
> index fd56d663cb5b..0086f8a1a0d6 100644
> --- a/net/netfilter/nf_flow_table_ip.c
> +++ b/net/netfilter/nf_flow_table_ip.c
> @@ -544,6 +544,26 @@ static int nf_flow_offload_forward(struct nf_flowtable_ctx *ctx,
>  	return 1;
>  }
>  
> +static int nf_flow_vlan_push(struct sk_buff *skb, __be16 proto, u16 id)
> +{
> +	if (skb_vlan_tag_present(skb)) {
> +		struct vlan_hdr *vhdr;
> +
> +		if (skb_cow_head(skb, VLAN_HLEN))
> +			return -1;
> +
> +		__skb_push(skb, VLAN_HLEN);
> +		skb_reset_network_header(skb);
> +		vhdr = (struct vlan_hdr *)(skb->data);
> +		vhdr->h_vlan_TCI = htons(id);
> +		vhdr->h_vlan_encapsulated_proto = skb->protocol;
> +		skb->protocol = proto;
> +	} else {
> +		__vlan_hwaccel_put_tag(skb, proto, id);
> +	}
> +	return 0;
> +}
> +
>  static int nf_flow_pppoe_push(struct sk_buff *skb, u16 id)
>  {
>  	int data_len = skb->len + sizeof(__be16);
> @@ -738,9 +758,8 @@ static int nf_flow_encap_push(struct sk_buff *skb,
>  		switch (tuple->encap[i].proto) {
>  		case htons(ETH_P_8021Q):
>  		case htons(ETH_P_8021AD):
> -			skb_reset_mac_header(skb);
> -			if (skb_vlan_push(skb, tuple->encap[i].proto,
> -					  tuple->encap[i].id) < 0)
> +			if (nf_flow_vlan_push(skb, tuple->encap[i].proto,
> +					      tuple->encap[i].id) < 0)
>  				return -1;
>  			break;
>  		case htons(ETH_P_PPP_SES):
> -- 
> 2.53.0
> 

^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH iwl-net 1/2] idpf: do not enable XDP if queue based scheduling is not supported
From: Holda, Patryk @ 2026-04-14 11:37 UTC (permalink / raw)
  To: Hay, Joshua A, intel-wired-lan@lists.osuosl.org; +Cc: netdev@vger.kernel.org
In-Reply-To: <20260406233236.3585504-2-joshua.a.hay@intel.com>



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Joshua Hay
> Sent: Tuesday, April 7, 2026 1:33 AM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org
> Subject: [Intel-wired-lan] [PATCH iwl-net 1/2] idpf: do not enable XDP if
> queue based scheduling is not supported
> 
> The current XDP implementation uses queue based scheduling for its TxQs.
> If the FW does not advertise support for queue based scheduling, do not
> enable XDP. Add the missing capability check at the start of the XDP
> configuration. This will temporarily break XDP while a flow based
> implementation is worked on, as well as while FWs with queue based by
> default are rolled out.
> 
> Fixes: 705457e7211f ("idpf: implement XDP_SETUP_PROG in ndo_bpf for splitq")
> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
> ---
>  drivers/net/ethernet/intel/idpf/xdp.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/idpf/xdp.c b/drivers/net/ethernet/intel/idpf/xdp.c
> index 18a6e7062863..9c3bdb193684 100644
> --- a/drivers/net/ethernet/intel/idpf/xdp.c
> +++ b/drivers/net/ethernet/intel/idpf/xdp.c
> @@ -511,6 +511,13 @@ int idpf_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>  	if (!idpf_is_queue_model_split(vport->dflt_qv_rsrc.txq_model))
>  		goto notsupp;
> 
> +	if (!idpf_is_cap_ena(vport->adapter, IDPF_OTHER_CAPS,
> +			     VIRTCHNL2_CAP_SPLITQ_QSCHED)) {
> +		NL_SET_ERR_MSG_MOD(xdp->extack,
> +				   "Device does not support requested XDP Tx scheduling mode");
> +		goto notsupp;
> +	}
> +
>  	switch (xdp->command) {
>  	case XDP_SETUP_PROG:
>  		ret = idpf_xdp_setup_prog(vport, xdp);
> --
> 2.39.2

Tested-by: Patryk Holda <patryk.holda@intel.com> 


^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH iwl-net v2] idpf: fix xdp crash in soft reset error path
From: Holda, Patryk @ 2026-04-14 11:36 UTC (permalink / raw)
  To: Simon Horman, Tantilov, Emil S
  Cc: daniel@iogearbox.net, ast@kernel.org, willemb@google.com,
	stable@vger.kernel.org, decot@google.com, bpf@vger.kernel.org,
	Nguyen, Anthony L, Kitszel, Przemyslaw,
	intel-wired-lan@lists.osuosl.org, edumazet@google.com,
	netdev@vger.kernel.org, pabeni@redhat.com, andrew+netdev@lunn.ch,
	kuba@kernel.org, davem@davemloft.net, sdf@fomichev.me,
	Loktionov, Aleksandr, Lobakin, Aleksander,
	john.fastabend@gmail.com, hawk@kernel.org
In-Reply-To: <20260321091753.GT74886@horms.kernel.org>



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Simon Horman
> Sent: Saturday, March 21, 2026 10:18 AM
> To: Tantilov, Emil S <emil.s.tantilov@intel.com>
> Cc: daniel@iogearbox.net; ast@kernel.org; willemb@google.com;
> stable@vger.kernel.org; decot@google.com; bpf@vger.kernel.org; Nguyen,
> Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; intel-wired-lan@lists.osuosl.org;
> edumazet@google.com; netdev@vger.kernel.org; pabeni@redhat.com;
> andrew+netdev@lunn.ch; kuba@kernel.org; davem@davemloft.net;
> sdf@fomichev.me; Loktionov, Aleksandr <aleksandr.loktionov@intel.com>;
> Lobakin, Aleksander <aleksander.lobakin@intel.com>;
> john.fastabend@gmail.com; hawk@kernel.org
> Subject: Re: [Intel-wired-lan] [PATCH iwl-net v2] idpf: fix xdp crash in soft
> reset error path
> 
> On Fri, Mar 20, 2026 at 02:35:42PM -0700, Tantilov, Emil S wrote:
> >
> >
> > On 3/20/2026 10:48 AM, Simon Horman wrote:
> > > This is an AI-generated review of your patch. The human sending this
> > > email has considered the AI review valid, or at least plausible.
> > >
> > > For local reproduction steps see
> > > https://netdev-ai.bots.linux.dev/ai-local.html
> > > ---
> > > idpf: fix xdp crash in soft reset error path
> > >
> > > This commit fixes a NULL pointer dereference that occurs when
> > > idpf_vport_open() fails during soft reset. The fix restores
> > > vport->xdp_prog in the error path and updates the restart check in
> > > idpf_xsk_pool_setup() to use IDPF_VPORT_UP instead of netif_running().
> > >
> > > > Fixes: 3d57b2c00f09 ("idpf: add XSk pool initialization")
> > >
> > > The Fixes: tag may not be targeting the correct commit. The primary
> > > bug being fixed is the NULL pointer dereference crash in
> > > > idpf_xdp_setup_prog() that occurs when soft reset fails and
> > > > vport->xdp_prog is not restored.
> >
> > This is not exactly true, this is just one of the instances that would
> > cause idpf_qp_switch() to be called. [1]
> >
> > >
> > > Looking at the git history, this missing restoration was introduced
> > > > in commit 705457e7211f ("idpf: implement XDP_SETUP_PROG in ndo_bpf
> > > > for splitq"), where idpf_xdp_setup_prog() was first implemented.
> > > While commit
> > > 3d57b2c00f09 introduced the secondary issue with the netif_running()
> > > check, the main crash fix addresses code from commit 705457e7211f.
> > >
> > > Should the Fixes: tag point to commit 705457e7211f instead?
> >
> > The reason I chose commit 3d57b2c00f09 is because it is the commit
> > introducing the function where the crash is occurring:
> > [ 3179.284770] RIP: 0010:idpf_find_rxq_vec+0x17/0x30 [idpf] ...
> > [ 3179.291937] Call Trace:
> > [ 3179.292392]  <TASK>
> > [ 3179.292843]  idpf_qp_switch+0x25/0x820 [idpf]
> >
> > The setting of the restart variable is where the above commits "meet",
> > > in that both conditions - netif_running() and idpf_xdp_enabled() [1]
> > > can be wrong:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue.git/tree/drivers/net/ethernet/intel/idpf/xsk.c#n571
> >
> > which would end up calling idpf_qp_switch() instead of taking the
> > alternate path:
> > 	restart = idpf_xdp_enabled(vport) && netif_running(vport->netdev);
> > 	if (!restart)
> > 		goto pool;
> >
> > Which was introduced by 3d57b2c00f09.
> 
> Thanks for the clarification.
> I agree that using 3d57b2c00f09 makes sense.
> 
> ...

Tested-by: Patryk Holda <patryk.holda@intel.com> 


^ permalink raw reply

* Re: [PATCH net 1/1] net: caif: clear client service pointer on teardown
From: patchwork-bot+netdevbpf @ 2026-04-14 11:30 UTC (permalink / raw)
  To: Ren Wei
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, sjur.brandeland,
	yifanwucs, tomapufckgml, yuantan098, bird, enjou1224z, zcliangcn
In-Reply-To: <9f3d37847c0037568aae698ca23cd47c6691acb0.1775897577.git.zcliangcn@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Sat, 11 Apr 2026 23:10:26 +0800 you wrote:
> From: Zhengchuan Liang <zcliangcn@gmail.com>
> 
> `caif_connect()` can tear down an existing client after remote shutdown by
> calling `caif_disconnect_client()` followed by `caif_free_client()`.
> `caif_free_client()` releases the service layer referenced by
> `adap_layer->dn`, but leaves that pointer stale.
> 
> [...]

Here is the summary with links:
  - [net,1/1] net: caif: clear client service pointer on teardown
    https://git.kernel.org/netdev/net/c/f7cf8ece8cee

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH v5] net: caif: fix stack out-of-bounds write in cfctrl_link_setup()
From: Simon Horman @ 2026-04-14 11:29 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Kangzheng Gu, davem, edumazet, kuba, kees, thorsten.blum, arnd,
	sjur.brandeland, netdev, linux-kernel, stable
In-Reply-To: <255224dc-0a55-4a0c-95f3-b84d4c6b3897@redhat.com>

On Mon, Apr 13, 2026 at 11:30:53AM +0200, Paolo Abeni wrote:
> On 4/12/26 3:57 PM, Simon Horman wrote:
> > I am wondering if it would be best to follow the pattern for
> > writing linkparam.u.utility.name elsewhere in this function.
> > That:
> > 1. Uses a somewhat more succinct loop control structure
> > 2. Silently truncates input without updating cmdrsp if overrun would occur
> > 
> > Something like this (compile tested only!):
> > 
> > diff --git a/net/caif/cfctrl.c b/net/caif/cfctrl.c
> > index c6cc2bfed65d..ba184c11386e 100644
> > --- a/net/caif/cfctrl.c
> > +++ b/net/caif/cfctrl.c
> > @@ -15,6 +15,7 @@
> >  #include <net/caif/cfctrl.h>
> >  
> >  #define container_obj(layr) container_of(layr, struct cfctrl, serv.layer)
> > +#define RFM_VOLUME_LEN 20
> >  #define UTILITY_NAME_LENGTH 16
> >  #define CFPKT_CTRL_PKT_LEN 20
> >  
> > @@ -414,10 +415,11 @@ static int cfctrl_link_setup(struct cfctrl *cfctrl, struct cfpkt *pkt, u8 cmdrsp
> >  		 */
> >  		linkparam.u.rfm.connid = cfpkt_extr_head_u32(pkt);
> >  		cp = (u8 *) linkparam.u.rfm.volume;
> > -		for (tmp = cfpkt_extr_head_u8(pkt);
> > -		     cfpkt_more(pkt) && tmp != '\0';
> > -		     tmp = cfpkt_extr_head_u8(pkt))
> > +		caif_assert(sizeof(linkparam.u.rfm.volume) >= RFM_VOLUME_LEN);
> > +		for(i = 0; i < RFM_VOLUME_LEN - 1 && cfpkt_more(pkt); i++) {
> > +			tmp = cfpkt_extr_head_u8(pkt);
> >  			*cp++ = tmp;
> > +		}
> >  		*cp = '\0';
> >  
> >  		if (CFCTRL_ERR_BIT & cmdrsp)
> 
> I agree that the code suggested by Simon is clearer. Note that AFAICS it
> lacks an additional `tmp!= '\0'` check to break the loop, but even with
> that added it should be preferable.

Sorry, I left out the `tmp != '\0'` check.
That was unintentional and I agree it should be there.
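
For illustration only, the loop with both termination conditions could
look like the user-space sketch below. `struct pkt`, `pkt_more()`,
`pkt_extr_head_u8()` and `copy_volume()` are hypothetical stand-ins for
the CAIF packet helpers, not the kernel API, and this is not the patch
that was posted.

```c
#include <assert.h>
#include <stddef.h>

#define RFM_VOLUME_LEN 20

/* Hypothetical stand-in for a CAIF packet: a read cursor over a buffer. */
struct pkt {
	const unsigned char *data;
	size_t len;
	size_t pos;
};

/* Non-zero while unread bytes remain in the packet. */
static int pkt_more(const struct pkt *p)
{
	return p->pos < p->len;
}

/* Consume and return the next byte; the caller must check pkt_more(). */
static unsigned char pkt_extr_head_u8(struct pkt *p)
{
	assert(pkt_more(p));
	return p->data[p->pos++];
}

/*
 * Copy a NUL-terminated name from the packet into dst, which must hold
 * RFM_VOLUME_LEN bytes. The loop stops on NUL, on end of packet, or once
 * RFM_VOLUME_LEN - 1 bytes are stored, so dst is always terminated and
 * longer input is silently truncated.
 * Returns the number of bytes stored, excluding the terminator.
 */
static size_t copy_volume(struct pkt *p, char *dst)
{
	size_t i;

	for (i = 0; i < RFM_VOLUME_LEN - 1 && pkt_more(p); i++) {
		unsigned char tmp = pkt_extr_head_u8(p);

		if (tmp == '\0')
			break;
		dst[i] = (char)tmp;
	}
	dst[i] = '\0';
	return i;
}
```

With a 64-byte input that contains no NUL, `copy_volume()` stores 19
bytes plus the terminator and never writes past `dst[19]`.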

^ permalink raw reply

* Re: Re: [PATCH,net-next] tcp: Add TCP ROCCET congestion control module.
From: Lukas Prause @ 2026-04-14 11:23 UTC (permalink / raw)
  To: Neal Cardwell, Tim Fuechsel
  Cc: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Kuniyuki Iwashima, linux-kernel,
	netdev
In-Reply-To: <CADVnQymmsispHew4-frsuBBfObZHdSbH+jfP-9aSW1HguK_N4A@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 6597 bytes --]

Thanks for the very detailed review of our code.
We will incorporate your comments regarding documentation and variable
usage into a new version of our code.


> Please reference figures in the paper and mention specific concrete
> numerical examples of latency reductions to quantify these statements.

Figures 5 and 6 show the performance of ROCCET in stationary and mobile
scenarios (https://arxiv.org/pdf/2510.25281). In the analyzed scenario,
we have observed a lower sRTT with ROCCET than with BBRv3 and CUBIC. The
observed throughput was marginally lower than that of BBRv3, but still
on a similar level. A detailed quantitative evaluation can be found in
the paper in sections VI and VII.


> Can you please elaborate on this statement here? AFAICT from figures 7
> and 8 in https://arxiv.org/pdf/2510.25281 it seems ROCCET is
> essentially starved by CUBIC when sharing a bottleneck with CUBIC when
> the bottleneck has 2*BDP or more of buffering. AFAICT it sounds like
> ROCCET does have "fairness issues when sharing a link with TCP CUBIC"?

Our main use case is a connection where the bottleneck link is in the
cellular network, where the bottleneck queue is typically not shared
between flows. "Fairness" between flows is being implemented by the base
station's scheduler. In this scenario, ROCCET achieves its objective to
not "bloat" its own queue.

We have performed additional fairness experiments in non-cellular
networks (figures 7 and 8). Here we show that even when used in other
types of networks, ROCCET does not cause harm (see
https://dl.acm.org/doi/10.1145/3365609.3365855) to other congestion control.


> Please specify what side effect or side effects ROCCET is claiming to
> solve (presumably bufferbloat?).
The side effect we observe in cellular networks is that, in particular,
for loss-based congestion control, the cwnd often gets 'frozen' at a
size that is too large for the BDP of the current link. This effect is
caused by the TCP cwnd validation, which at some point stops increasing
the cwnd because it assumes that the sender is application-limited.
However, this often leads to a cwnd size that is too large for the link,
but too small to cause a congestion event by overfilling the buffer. The
result is a standing queue that causes permanently high RTTs. Figure 2
in the paper (https://arxiv.org/pdf/2510.25281) shows the described
behaviour for a single TCP CUBIC flow.

> Expressed in isolation like this, that sounds potentially dangerous.
> Please mention what signal(s) ROCCET uses to exit slow start if it's
> not using loss.
>
> In addition, from reading the code AFAICT the connection does use loss
> to exit slow start (see my remarks below in this message). So AFAICT
> this summary seems inaccurate, or at least misleading?
You are right, the summary is misleading. In the code we submitted,
there are three conditions for exiting slow start:
The first is packet loss (as you already mentioned, without a cwnd
reduction). The second is when the srRTT calculated by ROCCET exceeds an
upper bound and the ACK rate, sampled in 100ms time intervals, differs
by 10 segments. The third is when the growth of the cwnd is stopped by
the TCP cwnd validation (which considers the connection as
application-limited).


> If no lower RTT is found for 10 seconds, the algorithm interpolates
> the `min_rtt` upwards towards the current RTT.
>
> +  If the path is persistently congested (e.g., a large buffer is
> constantly full), the `min_rtt` baseline will drift up.
>
> +  This makes the algorithm less sensitive to queueing delay over
> time, potentially defeating the purpose of reducing bufferbloat in the
> long run. Contrast this with BBR, which actively drains the queue
> (using the ProbeRTT mechanism) to try to find the true physical
> minimum RTT.
>
> Can you please add a comment explaining why the ROCCET algorithm takes
> this approach, and how the algorithm expects to avoid queues that
> ratchet ever higher?
We added this functionality for the edge case of long-lived fat flows,
which are experiencing routing changes, to detect a higher base RTT.
Since this functionality is disabled by default and can also cause
problems with min_RTT detection, we have decided to remove it.
The measurement results in our paper have been obtained with this
functionality disabled.


> Here, `cnt` is incremented by `1` on every call, regardless of the
> `acked` value (number of packets ACKed in this event).
You are right, we will change this.


> +  With the default `ack_rate_diff_ca` of `200`, this condition will
> become true for $sum_cwnd * 100 / sum_acked >= 200$, i.e.
> $num_acks_per_round * 100 >= 200$. So AFAICT we expect this condition
> to be true if there are 2 or more ACKs in a round trip. This makes
> `bw_limit_detect` effectively a no-op or always-on trigger rather than
> a true detector of queue growth or bandwidth limits.
The purpose of this part of the code was to detect an increasing queue
by monitoring data sent and acknowledged in combination with an
increasing sRTT over 5 RTT time intervals. In the steady state of a TCP
connection, the sending rate of the TCP sender should be equal to the
receiver's ack rate, due to TCP self-clocking. The idea behind this code
was to check if the cwnd is still correlated to the sending rate. If
this is not the case and we also observe increasing RTTs, we assume the
TCP sender is filling a buffer. However, we have made a mistake when
calculating sum_cwnd:
We are accumulating the cwnd on each ack event, instead of each RTT,
which, as you mentioned, would make more sense. Because this leads to
the erroneous behaviour that you described, we will remove this part of
the code for now until we have evaluated the intended implementation.
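
The arithmetic can be checked with a small user-space sketch. It is not
the submitted ROCCET code: the function names are hypothetical, and it
assumes a steady state in which cwnd is constant and each round trip
delivers exactly cwnd segments, split evenly over the ACKs of that round.

```c
#include <assert.h>
#include <stdint.h>

/* Accumulated cwnd as a percentage of accumulated ACKed segments. */
static uint32_t ratio_pct(uint64_t sum_cwnd, uint64_t sum_acked)
{
	assert(sum_acked > 0);
	return (uint32_t)(sum_cwnd * 100 / sum_acked);
}

/* Behaviour described as the mistake: cwnd is added on every ACK event. */
static uint32_t ratio_sampled_per_ack(uint32_t cwnd, uint32_t acks_per_round,
				      uint32_t rounds)
{
	uint64_t sum_cwnd = 0, sum_acked = 0;
	uint32_t r, a;

	assert(acks_per_round > 0 && cwnd % acks_per_round == 0);
	for (r = 0; r < rounds; r++) {
		for (a = 0; a < acks_per_round; a++) {
			sum_cwnd += cwnd;
			sum_acked += cwnd / acks_per_round;
		}
	}
	return ratio_pct(sum_cwnd, sum_acked);
}

/* Intended behaviour: cwnd is added once per round trip. */
static uint32_t ratio_sampled_per_rtt(uint32_t cwnd, uint32_t acks_per_round,
				      uint32_t rounds)
{
	uint64_t sum_cwnd = 0, sum_acked = 0;
	uint32_t r, a;

	assert(acks_per_round > 0 && cwnd % acks_per_round == 0);
	for (r = 0; r < rounds; r++) {
		sum_cwnd += cwnd;
		for (a = 0; a < acks_per_round; a++)
			sum_acked += cwnd / acks_per_round;
	}
	return ratio_pct(sum_cwnd, sum_acked);
}
```

Under these assumptions the per-ACK variant yields
acks_per_round * 100, so a threshold of 200 is reached with as few as
two ACKs per round trip, while the per-RTT variant stays at 100.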


> Did the experiments in the paper use the approach documented in the
> paper, or the approach documented in this code? They are very
> different, AFAICT.
The experiments were performed using the submitted code. This means that
the mentioned code snippet always evaluates to true, so that ROCCET only
reacts to changes in latency, which is different from what we described
in the paper.


> Having a module parameter to ignore loss in this way makes it too easy
> for users to cause excessive congestion. I would urge you to remove
> that module parameter. Researchers can add that sort of mechanism in
> their own code for research.
That is true, we will remove this part of the implementation.

Thanks,
Lukas

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4891 bytes --]

^ permalink raw reply

* [PATCH AUTOSEL 6.19-6.12] net: sfp: add quirks for Hisense and HSGQ GPON ONT SFP modules
From: Sasha Levin @ 2026-04-14 11:24 UTC (permalink / raw)
  To: patches, stable
  Cc: John Pavlick, Russell King (Oracle), Marcin Nita, Jakub Kicinski,
	Sasha Levin, linux, andrew, hkallweit1, davem, edumazet, pabeni,
	netdev, linux-kernel
In-Reply-To: <20260414112509.410217-1-sashal@kernel.org>

From: John Pavlick <jspavlick@posteo.net>

[ Upstream commit 95aca8602ef70ffd3d971675751c81826e124f90 ]

Several GPON ONT SFP sticks based on Realtek RTL960x report
1000BASE-LX at 1300MBd in their EEPROM but can operate at 2500base-X.
On hosts capable of 2500base-X (e.g. Banana Pi R3 / MT7986), the
kernel negotiates only 1G because it trusts the incorrect EEPROM data.

Add quirks for:
- Hisense-Leox LXT-010S-H
- Hisense ZNID-GPON-2311NA
- HSGQ HSGQ-XPON-Stick

Each quirk advertises 2500base-X and ignores TX_FAULT during the
module's ~40s Linux boot time.

Tested on Banana Pi R3 (MT7986) with OpenWrt 25.12.1, confirmed
2.5Gbps link and full throughput with flow offloading.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Suggested-by: Marcin Nita <marcin.nita@leolabs.pl>
Signed-off-by: John Pavlick <jspavlick@posteo.net>
Link: https://patch.msgid.link/20260406132321.72563-1-jspavlick@posteo.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have all the information needed. Let me compile my full analysis.

---

## Complete Analysis

### PHASE 1: COMMIT MESSAGE FORENSICS

**Step 1.1: Subject Line**
- Subsystem: `net: sfp:`
- Action verb: "add" (quirks)
- Summary: Adding hardware quirks for Hisense and HSGQ GPON ONT SFP
  modules
- Record: [net: sfp] [add] [hardware quirks for GPON ONT SFP modules
  with incorrect EEPROM data]

**Step 1.2: Tags**
- Reviewed-by: Russell King (Oracle) — the SFP subsystem maintainer
- Suggested-by: Marcin Nita — suggested investigating sfp.c quirks as a
  solution
- Signed-off-by: John Pavlick (author)
- Link:
  https://patch.msgid.link/20260406132321.72563-1-jspavlick@posteo.net
- Signed-off-by: Jakub Kicinski (netdev maintainer, applied the patch)
- No Cc: stable (expected — that's why we're reviewing)
- No Fixes: tag (expected — this is a quirk addition, not a code fix)
- Record: Notable: Russell King, the SFP subsystem maintainer/author,
  reviewed this. Strong quality signal.

**Step 1.3: Commit Body**
- Bug: GPON ONT SFP sticks report 1000BASE-LX / 1300MBd in EEPROM but
  actually support 2500base-X
- Symptom: Kernel negotiates only 1G because it trusts incorrect EEPROM
  data
- Affected hardware: Hisense-Leox LXT-010S-H, Hisense ZNID-GPON-2311NA,
  HSGQ HSGQ-XPON-Stick
- All based on Realtek RTL960x chipset
- Tested: Banana Pi R3 (MT7986) with OpenWrt 25.12.1, confirmed 2.5Gbps
  link
- TX_FAULT quirk needed during module's ~40s Linux boot time
- Record: Real-world hardware problem limiting link speed. Users get 1G
  instead of 2.5G.

**Step 1.4: Hidden Bug Fix Detection**
- This is not a "hidden" bug fix — it is an explicit hardware quirk
  addition to work around incorrect EEPROM data. This falls squarely
  into the "QUIRKS and WORKAROUNDS" exception category for stable.

### PHASE 2: DIFF ANALYSIS

**Step 2.1: Inventory**
- Files changed: 1 (`drivers/net/phy/sfp.c`)
- Lines added: 16 (including comments)
- Lines removed: 0
- Functions modified: None — only the `sfp_quirks[]` static const array
  is extended
- Scope: Single-file, table-only addition
- Record: Extremely contained — 3 new entries in a quirk table, with
  explanatory comments.

**Step 2.2: Code Flow Change**
- Before: These three SFP modules (Hisense-Leox LXT-010S-H, Hisense
  ZNID-GPON-2311NA, HSGQ HSGQ-XPON-Stick) have no quirk entries, so the
  kernel reads their EEPROM data literally and negotiates 1G
- After: These modules are matched by vendor/part strings and:
  1. `sfp_quirk_2500basex` enables 2500base-X mode advertisement
  2. `sfp_fixup_ignore_tx_fault` ignores the TX_FAULT signal during boot

**Step 2.3: Bug Mechanism**
- Category: Hardware workaround (h)
- The modules have incorrect EEPROM data (report 1000BASE-LX but support
  2500base-X)
- The quirks use the exact same pattern as many existing entries (e.g.,
  HUAWEI MA5671A, FS GPON-ONU-34-20BI)
- Record: Hardware quirk — identical pattern to existing accepted
  entries.

**Step 2.4: Fix Quality**
- Obviously correct: Uses exact same macro and functions as ~10 other
  existing entries
- Minimal/surgical: Only adds data to a static table; no logic changes
- Regression risk: Zero for users without these modules (quirks matched
  by vendor/part string)
- For users WITH these modules: enables 2.5G link (improvement) and
  ignores TX_FAULT during boot
- Record: Highest possible confidence — data-only addition using
  established infrastructure.

### PHASE 3: GIT HISTORY INVESTIGATION

**Step 3.1: Blame**
- The `sfp_quirks[]` table was introduced by Russell King in commit
  23571c7b9643 (2022-09-13)
- The `sfp_quirk_2500basex` function and `sfp_fixup_ignore_tx_fault`
  function have existed since at least v6.1
- Record: Infrastructure is mature and present in all active stable
  trees.

**Step 3.2: Fixes Tag**
- No Fixes: tag (expected for quirk additions). N/A.

**Step 3.3: File History**
- SFP quirk additions are extremely frequent — ~17 quirk-related commits
  since v6.6
- This is a well-established pattern in the kernel community
- Record: Standalone commit, no dependencies on other patches.

**Step 3.4: Author**
- John Pavlick is a community contributor (not subsystem maintainer)
- But the patch was reviewed by Russell King (SFP subsystem
  author/maintainer) and applied by Jakub Kicinski (netdev maintainer)
- Record: Properly reviewed by the right maintainers.

**Step 3.5: Dependencies**
- The patch uses `SFP_QUIRK()` macro, `sfp_quirk_2500basex`, and
  `sfp_fixup_ignore_tx_fault`
- All three exist in v6.1, v6.6, and v6.12 stable trees (verified)
- Record: No dependencies. Completely self-contained.

### PHASE 4: MAILING LIST RESEARCH

**Step 4.1: Original Patch Discussion**
- Found via lore: v3 of the patch, submitted 2026-04-06
- v1→v2: Added Suggested-by tag
- v2→v3: Fixed inaccurate commit message about MT7986 SerDes
  capabilities
- Applied by Jakub Kicinski to netdev/net.git (main) as commit
  95aca8602ef7
- Record: Clean submission history, no objections.

**Step 4.2: Reviewers**
- Russell King (Oracle) — SFP subsystem maintainer — Reviewed-by
- Applied by Jakub Kicinski — netdev maintainer
- Record: Reviewed by the right people.

**Step 4.3-4.5: Bug Reports / Related / Stable Discussion**
- No formal bug report — this is a hardware enablement quirk
- The underlying problem is that these GPON SFP sticks' EEPROM
  incorrectly reports capabilities
- No stable-specific discussion found; no prior nomination

### PHASE 5: CODE SEMANTIC ANALYSIS

**Step 5.1-5.4: Functions**
- No functions are modified. Only static data (the `sfp_quirks[]` array)
  is extended.
- The quirk matching happens in `sfp_lookup_quirk()` which iterates the
  table and matches vendor/part strings
- The matched `sfp_quirk_2500basex` and `sfp_fixup_ignore_tx_fault`
  functions are called during SFP module insertion
- Record: No code flow changes — purely data-driven matching.
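
The table-driven matching described above can be sketched in user space as
follows. This is a simplified illustration, not the kernel's
`sfp_lookup_quirk()`: the `struct quirk` layout, `field_len()`, and
`lookup_quirk()` are hypothetical names, and the sketch assumes the EEPROM
vendor and part fields are 16 bytes wide and space-padded.

```c
#include <stddef.h>
#include <string.h>

#define EEPROM_FIELD_LEN 16	/* assumed width of vendor/part fields */

/* Hypothetical, simplified stand-in for the kernel's struct sfp_quirk. */
struct quirk {
	const char *vendor;
	const char *part;
	const char *modes;	/* name of the link-mode quirk, illustration only */
	const char *fixup;	/* name of the fixup, illustration only */
};

/* The three entries added by the patch, expressed as plain data. */
static const struct quirk quirks[] = {
	{ "Hisense-Leox", "LXT-010S-H",
	  "sfp_quirk_2500basex", "sfp_fixup_ignore_tx_fault" },
	{ "Hisense", "ZNID-GPON-2311NA",
	  "sfp_quirk_2500basex", "sfp_fixup_ignore_tx_fault" },
	{ "HSGQ", "HSGQ-XPON-Stick",
	  "sfp_quirk_2500basex", "sfp_fixup_ignore_tx_fault" },
};

/* Length of a space-padded field once trailing padding is ignored. */
static size_t field_len(const char *field, size_t max)
{
	while (max > 0 && field[max - 1] == ' ')
		max--;
	return max;
}

/*
 * Return the first entry whose vendor and part both match exactly,
 * or NULL. Both inputs are EEPROM_FIELD_LEN bytes, not NUL-terminated.
 */
const struct quirk *lookup_quirk(const char *vendor, const char *part)
{
	size_t vlen = field_len(vendor, EEPROM_FIELD_LEN);
	size_t plen = field_len(part, EEPROM_FIELD_LEN);
	size_t i;

	for (i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++) {
		const struct quirk *q = &quirks[i];

		if (strlen(q->vendor) == vlen &&
		    memcmp(q->vendor, vendor, vlen) == 0 &&
		    strlen(q->part) == plen &&
		    memcmp(q->part, part, plen) == 0)
			return q;
	}
	return NULL;
}
```

Because a module that matches no entry simply gets NULL back, adding rows
to such a table cannot change behaviour for any other module, which is the
basis for the "data-only" risk assessment.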

**Step 5.5: Similar Patterns**
- Exact same pattern used by:
  - HUAWEI MA5671A (sfp_quirk_2500basex + sfp_fixup_ignore_tx_fault)
  - FS GPON-ONU-34-20BI (sfp_quirk_2500basex +
    sfp_fixup_ignore_tx_fault)
  - ALCATELLUCENT G010SP (sfp_quirk_2500basex +
    sfp_fixup_ignore_tx_fault)
- Record: Identical pattern to multiple existing accepted entries.

### PHASE 6: STABLE TREE ANALYSIS

**Step 6.1: Code Existence in Stable**
- `sfp_quirk_2500basex` exists in v6.1, v6.6, v6.12 (verified)
- `sfp_fixup_ignore_tx_fault` exists in v6.1, v6.6, v6.12 (verified)
- `SFP_QUIRK()` 4-argument macro exists in all stable trees (verified)
- Record: All needed infrastructure exists in all active stable trees.

**Step 6.2: Backport Complications**
- Minor context difference: In mainline, HUAWEI MA5671A uses
  `sfp_fixup_ignore_tx_fault_and_los` (changed by commit 9f9c31bacaae),
  while in v6.6 and v6.12 it still uses `sfp_fixup_ignore_tx_fault`.
  This affects the context lines around the insertion point.
- The Lantech entries also differ (SFP_QUIRK_S vs SFP_QUIRK_M,
  additional 8330-265D entry)
- This means the patch will need minor context adjustment (fuzz or
  manual resolution) for older trees
- Record: Expected minor context conflicts, trivially resolvable.

**Step 6.3: Related Fixes Already in Stable**
- No — these specific modules (Hisense-Leox, Hisense ZNID, HSGQ) have no
  existing quirks in any tree.

### PHASE 7: SUBSYSTEM AND MAINTAINER CONTEXT

**Step 7.1: Subsystem Criticality**
- Subsystem: net/phy (SFP transceiver support)
- Criticality: IMPORTANT — SFP modules are used in many networking
  setups, particularly in GPON/fiber deployments and embedded/router
  platforms (OpenWrt, etc.)
- Record: [net/phy/sfp] [IMPORTANT]

**Step 7.2: Subsystem Activity**
- Very active — 31 changes since v6.6, including many quirk additions
- SFP quirk additions to stable are a well-established practice
- Record: Actively maintained, frequent quirk additions.

### PHASE 8: IMPACT AND RISK ASSESSMENT

**Step 8.1: Affected Users**
- Users of Hisense-Leox LXT-010S-H, Hisense ZNID-GPON-2311NA, and HSGQ
  HSGQ-XPON-Stick SFP modules
- These are GPON ONT SFP sticks commonly used in fiber-to-the-home
  setups and by OpenWrt users
- Record: Driver-specific, but affects a real user population in the
  fiber networking community.

**Step 8.2: Trigger Conditions**
- Every time these SFP modules are inserted into a host capable of
  2500base-X
- 100% reproducible — the EEPROM always reports wrong data
- Record: Deterministic, always triggers, no race or timing dependency.

**Step 8.3: Failure Mode**
- Without quirk: Link operates at 1G instead of 2.5G — loss of 60%
  bandwidth
- This is a functional issue, not a crash or security issue
- Severity: MEDIUM (hardware not working at full capability)
- Record: Performance/capability limitation, not crash or corruption.

**Step 8.4: Risk-Benefit**
- BENEFIT: HIGH — enables proper 2.5G operation for these GPON SFP
  modules
- RISK: VERY LOW — data-only addition, zero impact on users without
  these specific modules, uses well-tested infrastructure
- Record: Very favorable risk-benefit ratio.

### PHASE 9: FINAL SYNTHESIS

**Step 9.1: Evidence Summary**

FOR backporting:
- Falls squarely into the "SFP/NETWORK QUIRKS" exception category
  (explicitly called out as YES for stable)
- 16 lines added, zero lines removed — purely additive data
- Uses exact same pattern as many existing entries already in stable
  trees
- All required infrastructure (macros, functions) exists in all active
  stable trees
- Reviewed by Russell King (SFP subsystem maintainer)
- Applied by Jakub Kicinski (netdev maintainer)
- Tested on real hardware with confirmed results
- Zero regression risk for unaffected users
- Fixes real hardware limitation for GPON fiber users

AGAINST backporting:
- Not a crash/security/corruption fix — it's hardware enablement
- Minor context adjustments needed for older stable trees
- No explicit Cc: stable nomination (but this is expected per our
  guidelines)

**Step 9.2: Stable Rules Checklist**
1. Obviously correct and tested? YES — data-only addition, reviewed by
   subsystem maintainer, tested on hardware
2. Fixes a real bug? YES — SFP modules operate at wrong speed due to
   incorrect EEPROM
3. Important issue? MEDIUM — hardware not operating at full capability
4. Small and contained? YES — 16 lines in one file, data-only
5. No new features or APIs? CORRECT — no new features, just device quirk
   entries
6. Can apply to stable? YES — with minor context adjustment

**Step 9.3: Exception Categories**
This is a **hardware quirk/workaround** — one of the explicitly listed
exception categories that is "YES for stable" even though it technically
"adds code." SFP quirks are specifically called out in the analysis
guidelines.

**Step 9.4: Decision**
This is a textbook SFP quirk addition. The guidelines explicitly state:
"SFP_QUIRK_* for optical modules with broken behavior... These are all
YES for stable." The patch is minimal, uses existing infrastructure
available in all stable trees, is reviewed by the subsystem maintainer,
was tested on real hardware, and carries essentially zero regression
risk.

### Verification

- [Phase 1] Parsed tags: Reviewed-by Russell King (SFP maintainer),
  Signed-off-by Jakub Kicinski (netdev maintainer), Suggested-by Marcin
  Nita, Link to lore
- [Phase 2] Diff analysis: 16 lines added (3 SFP_QUIRK entries +
  comments) to static const sfp_quirks[] array, zero lines removed,
  single file
- [Phase 3] git blame: sfp_quirks[] table introduced by Russell King in
  23571c7b9643 (2022-09-13), infrastructure present since v6.1
- [Phase 3] Verified sfp_quirk_2500basex exists in v6.1 (line 355), v6.6
  (line 399), v6.12 (line 424)
- [Phase 3] Verified sfp_fixup_ignore_tx_fault exists in v6.1 (line
  325), v6.6 (line 348), v6.12 (line 358)
- [Phase 3] Verified SFP_QUIRK() 4-argument macro exists in v6.1, v6.6,
  v6.12
- [Phase 3] No dependencies — standalone commit confirmed
- [Phase 4] Lore thread found: v3 submission, applied to netdev/net, no
  NAKs or concerns
- [Phase 4] b4 dig found original submission; Russell King CC'd and
  provided Reviewed-by
- [Phase 4] Patch went through v1→v2→v3, applied version is v3 (latest)
- [Phase 5] No function modifications — only static data table extended
- [Phase 6] Minor context conflict expected: HUAWEI MA5671A entry uses
  sfp_fixup_ignore_tx_fault in v6.6/v6.12 but
  sfp_fixup_ignore_tx_fault_and_los in mainline; Lantech entries differ
  (SFP_QUIRK_M vs SFP_QUIRK_S)
- [Phase 6] Confirmed all needed infrastructure exists in all active
  stable trees
- [Phase 7] SFP subsystem is actively maintained with frequent quirk
  additions
- [Phase 8] Zero regression risk for unaffected users; deterministic
  2.5G enablement for affected hardware

**YES**

 drivers/net/phy/sfp.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index 7a85b758fb1e6..c62e3f364ea73 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -543,6 +543,22 @@ static const struct sfp_quirk sfp_quirks[] = {
 	SFP_QUIRK("HUAWEI", "MA5671A", sfp_quirk_2500basex,
 		  sfp_fixup_ignore_tx_fault_and_los),
 
+	// Hisense LXT-010S-H is a GPON ONT SFP (sold as LEOX LXT-010S-H) that
+	// can operate at 2500base-X, but reports 1000BASE-LX / 1300MBd in its
+	// EEPROM
+	SFP_QUIRK("Hisense-Leox", "LXT-010S-H", sfp_quirk_2500basex,
+		  sfp_fixup_ignore_tx_fault),
+
+	// Hisense ZNID-GPON-2311NA can operate at 2500base-X, but reports
+	// 1000BASE-LX / 1300MBd in its EEPROM
+	SFP_QUIRK("Hisense", "ZNID-GPON-2311NA", sfp_quirk_2500basex,
+		  sfp_fixup_ignore_tx_fault),
+
+	// HSGQ HSGQ-XPON-Stick can operate at 2500base-X, but reports
+	// 1000BASE-LX / 1300MBd in its EEPROM
+	SFP_QUIRK("HSGQ", "HSGQ-XPON-Stick", sfp_quirk_2500basex,
+		  sfp_fixup_ignore_tx_fault),
+
 	// Lantech 8330-262D-E and 8330-265D can operate at 2500base-X, but
 	// incorrectly report 2500MBd NRZ in their EEPROM.
 	// Some 8330-265D modules have inverted LOS, while all of them report
-- 
2.53.0



* [PATCH bpf-next 2/2] selftests/bpf: Cover TCP_NODELAY in hdr opt callback
From: KaFai Wan @ 2026-04-14 11:23 UTC (permalink / raw)
  To: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
	ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
	yonghong.song, jolsa, shuah, kafai.wan, sdf, netdev, linux-kernel,
	bpf, linux-kselftest
In-Reply-To: <20260414112310.1285783-1-kafai.wan@linux.dev>

Add a sockops test program that enables
BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG on connection setup and calls
bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB.

Exercise the connection by sending data after the socket is
established. Before the fix, this setup can recurse through
tcp_push_pending_frames() and bpf_skops_hdr_opt_len() until the
kernel hits a stack guard page. After the fix, the connection
continues to make forward progress and the data exchange completes.

Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
---
 .../bpf/prog_tests/tcp_hdr_options.c          | 34 +++++++++++++++++++
 .../bpf/progs/test_misc_tcp_hdr_options.c     | 18 ++++++++++
 2 files changed, 52 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
index 56685fc03c7e..f361f9c7bf59 100644
--- a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
+++ b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
@@ -513,6 +513,39 @@ static void misc(void)
 	bpf_link__destroy(link);
 }
 
+static void hdr_sockopt(void)
+{
+	const char send_msg[] = "MISC!!!";
+	char recv_msg[sizeof(send_msg)];
+	const unsigned int nr_data = 2;
+	struct bpf_link *link;
+	struct sk_fds sk_fds;
+	int i, ret;
+
+	link = bpf_program__attach_cgroup(misc_skel->progs.misc_hdr_sockopt, cg_fd);
+	if (!ASSERT_OK_PTR(link, "attach_cgroup(misc_hdr_sockopt)"))
+		return;
+
+	if (sk_fds_connect(&sk_fds, false)) {
+		bpf_link__destroy(link);
+		return;
+	}
+
+	for (i = 0; i < nr_data; i++) {
+		ret = send(sk_fds.active_fd, send_msg, sizeof(send_msg), 0);
+		if (!ASSERT_EQ(ret, sizeof(send_msg), "send(msg)"))
+			goto check_linum;
+
+		ret = read(sk_fds.passive_fd, recv_msg, sizeof(recv_msg));
+		if (!ASSERT_EQ(ret, sizeof(send_msg), "read(msg)"))
+			goto check_linum;
+	}
+
+check_linum:
+	sk_fds_close(&sk_fds);
+	bpf_link__destroy(link);
+}
+
 struct test {
 	const char *desc;
 	void (*run)(void);
@@ -526,6 +559,7 @@ static struct test tests[] = {
 	DEF_TEST(fastopen_estab),
 	DEF_TEST(fin),
 	DEF_TEST(misc),
+	DEF_TEST(hdr_sockopt),
 };
 
 void test_tcp_hdr_options(void)
diff --git a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
index d487153a839d..e1dc7246193e 100644
--- a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
+++ b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
@@ -326,4 +326,22 @@ int misc_estab(struct bpf_sock_ops *skops)
 	return CG_OK;
 }
 
+SEC("sockops")
+int misc_hdr_sockopt(struct bpf_sock_ops *skops)
+{
+	int true_val = 1;
+
+	switch (skops->op) {
+	case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
+	case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
+		set_hdr_cb_flags(skops, 0);
+		break;
+	case BPF_SOCK_OPS_HDR_OPT_LEN_CB:
+		bpf_setsockopt(skops, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
+		break;
+	}
+
+	return 0;
+}
+
 char _license[] SEC("license") = "GPL";
-- 
2.43.0



* [PATCH bpf-next 1/2] bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks
From: KaFai Wan @ 2026-04-14 11:23 UTC (permalink / raw)
  To: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
	ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
	yonghong.song, jolsa, shuah, kafai.wan, sdf, netdev, linux-kernel,
	bpf, linux-kselftest
  Cc: Quan Sun, Yinhao Hu, Kaiyan Mei
In-Reply-To: <20260414112310.1285783-1-kafai.wan@linux.dev>

A BPF_SOCK_OPS program can enable
BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call
bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB.

That reaches __tcp_sock_set_nodelay(), which may call
tcp_push_pending_frames(). The transmit path then computes TCP
options again, re-enters bpf_skops_hdr_opt_len(), and invokes the
same BPF callback recursively. This can loop until the kernel
stack overflows.

TCP_NODELAY is not safe from the header option callback context.
Reject it with -EOPNOTSUPP when TCP header option callbacks are
enabled on the socket, so the callback cannot recurse back into
tcp_push_pending_frames() through do_tcp_setsockopt().
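
The re-entrancy described above can be illustrated with a small user-space
model. This is only a sketch under stated assumptions: `struct sock_model`,
`model_push_pending_frames()`, `model_set_nodelay()`, and `MAX_DEPTH` are
hypothetical names, and the depth cap merely stands in for the finite
kernel stack. The `reject_nodelay` field switches between the behaviour
before and after this patch.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_DEPTH 64	/* stand-in for the finite kernel stack */

struct sock_model {
	bool hdr_opt_cb_enabled;	/* models BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG */
	bool reject_nodelay;		/* false: before the fix, true: after */
	bool nodelay;
	int depth;			/* current callback nesting */
	int max_depth_seen;
	int last_cb_ret;		/* what the callback's setsockopt returned */
};

int model_set_nodelay(struct sock_model *sk, int val);

/*
 * Models the transmit path: computing the TCP option length runs the
 * header option callback, which in this scenario sets TCP_NODELAY.
 */
void model_push_pending_frames(struct sock_model *sk)
{
	if (!sk->hdr_opt_cb_enabled || sk->depth >= MAX_DEPTH)
		return;	/* at MAX_DEPTH a real kernel stack would be exhausted */

	sk->depth++;
	if (sk->depth > sk->max_depth_seen)
		sk->max_depth_seen = sk->depth;
	sk->last_cb_ret = model_set_nodelay(sk, 1);
	sk->depth--;
}

/* Models the TCP_NODELAY case, with the patch's check made optional. */
int model_set_nodelay(struct sock_model *sk, int val)
{
	if (val && sk->reject_nodelay && sk->hdr_opt_cb_enabled)
		return -EOPNOTSUPP;

	sk->nodelay = val != 0;
	if (val)
		model_push_pending_frames(sk);	/* re-enters the callback */
	return 0;
}
```

With `reject_nodelay` unset, one call to `model_push_pending_frames()`
nests until the cap is hit; with it set, the callback runs once and its
setsockopt attempt fails with -EOPNOTSUPP.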

Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/
Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
---
 net/ipv4/tcp.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 202a4e57a218..7ac4c98be19d 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4004,7 +4004,10 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
 
 	switch (optname) {
 	case TCP_NODELAY:
-		__tcp_sock_set_nodelay(sk, val);
+		if (val && BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG))
+			err = -EOPNOTSUPP;
+		else
+			__tcp_sock_set_nodelay(sk, val);
 		break;
 
 	case TCP_THIN_LINEAR_TIMEOUTS:
-- 
2.43.0



* [PATCH bpf-next 0/2] bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks
From: KaFai Wan @ 2026-04-14 11:23 UTC (permalink / raw)
  To: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
	ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
	yonghong.song, jolsa, shuah, kafai.wan, sdf, netdev, linux-kernel,
	bpf, linux-kselftest

This small patchset avoids infinite recursion in bpf_skops_hdr_opt_len()
triggered by a TCP_NODELAY setsockopt.

---
KaFai Wan (2):
  bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks
  selftests/bpf: Cover TCP_NODELAY in hdr opt callback

 net/ipv4/tcp.c                                |  5 ++-
 .../bpf/prog_tests/tcp_hdr_options.c          | 34 +++++++++++++++++++
 .../bpf/progs/test_misc_tcp_hdr_options.c     | 18 ++++++++++
 3 files changed, 56 insertions(+), 1 deletion(-)

-- 
2.43.0



* [PATCH v2 nf] netfilter: nf_flow_table_ip: Introduce nf_flow_vlan_push()
From: Eric Woudstra @ 2026-04-14 11:21 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Eric Woudstra
  Cc: netfilter-devel, netdev

Calling skb_reset_mac_header() before calling skb_vlan_push() does
remove the error:

"skb_vlan_push got skb with skb->data not at mac header (offset 18)"

But the inner vlan tag is still not inserted correctly.

skb_vlan_push() uses __vlan_insert_inner_tag() to insert the tag
at offset ETH_HLEN. But the inner tag should only be pushed, without
offset, similar to nf_flow_pppoe_push().
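
To illustrate "pushed, without offset", here is a minimal user-space model
of prepending a VLAN header directly in front of the current packet data.
It is a sketch, not the kernel code: `struct pkt` and
`pkt_vlan_push_front()` are hypothetical names, the protocol field is kept
in host byte order for simplicity, and the headroom check stands in for
skb_cow_head().

```c
#include <stddef.h>
#include <stdint.h>

#define VLAN_HLEN 4

/* Hypothetical stand-in for the parts of struct sk_buff used here. */
struct pkt {
	uint8_t buf[64];
	size_t data;		/* offset of first valid byte, models skb->data */
	size_t len;		/* number of valid bytes from data */
	uint16_t protocol;	/* host byte order here, models skb->protocol */
};

/*
 * Prepend a 4-byte VLAN header (TCI, then encapsulated protocol) at the
 * very front of the packet data, with no ETH_HLEN offset.
 * Returns 0 on success, -1 if there is not enough headroom.
 */
int pkt_vlan_push_front(struct pkt *p, uint16_t proto, uint16_t id)
{
	uint8_t *hdr;

	if (p->data < VLAN_HLEN)
		return -1;

	p->data -= VLAN_HLEN;
	p->len += VLAN_HLEN;
	hdr = &p->buf[p->data];

	/* Both fields are written big-endian, as on the wire. */
	hdr[0] = (uint8_t)(id >> 8);
	hdr[1] = (uint8_t)(id & 0xff);
	hdr[2] = (uint8_t)(p->protocol >> 8);
	hdr[3] = (uint8_t)(p->protocol & 0xff);

	/* The packet's outermost protocol is now the VLAN ethertype. */
	p->protocol = proto;
	return 0;
}
```

The existing payload is not moved: the new header occupies the headroom
immediately before it, which is what distinguishes this from an insertion
at offset ETH_HLEN.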

Fixes: c653d5a78f34 ("netfilter: flowtable: inline vlan encapsulation in xmit path")
Fixes: a3aca98aec9a ("netfilter: nf_flow_table_ip: reset mac header before vlan push")
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>

---

 net/netfilter/nf_flow_table_ip.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index fd56d663cb5b..0086f8a1a0d6 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -544,6 +544,26 @@ static int nf_flow_offload_forward(struct nf_flowtable_ctx *ctx,
 	return 1;
 }
 
+static int nf_flow_vlan_push(struct sk_buff *skb, __be16 proto, u16 id)
+{
+	if (skb_vlan_tag_present(skb)) {
+		struct vlan_hdr *vhdr;
+
+		if (skb_cow_head(skb, VLAN_HLEN))
+			return -1;
+
+		__skb_push(skb, VLAN_HLEN);
+		skb_reset_network_header(skb);
+		vhdr = (struct vlan_hdr *)(skb->data);
+		vhdr->h_vlan_TCI = htons(id);
+		vhdr->h_vlan_encapsulated_proto = skb->protocol;
+		skb->protocol = proto;
+	} else {
+		__vlan_hwaccel_put_tag(skb, proto, id);
+	}
+	return 0;
+}
+
 static int nf_flow_pppoe_push(struct sk_buff *skb, u16 id)
 {
 	int data_len = skb->len + sizeof(__be16);
@@ -738,9 +758,8 @@ static int nf_flow_encap_push(struct sk_buff *skb,
 		switch (tuple->encap[i].proto) {
 		case htons(ETH_P_8021Q):
 		case htons(ETH_P_8021AD):
-			skb_reset_mac_header(skb);
-			if (skb_vlan_push(skb, tuple->encap[i].proto,
-					  tuple->encap[i].id) < 0)
+			if (nf_flow_vlan_push(skb, tuple->encap[i].proto,
+					      tuple->encap[i].id) < 0)
 				return -1;
 			break;
 		case htons(ETH_P_PPP_SES):
-- 
2.53.0



* Re: [PATCH v2] netfilter: nfnetlink_osf: fix null-ptr-deref in nf_osf_ttl
From: Florian Westphal @ 2026-04-14 11:14 UTC (permalink / raw)
  To: Kito Xu (veritas501)
  Cc: pablo, coreteam, davem, edumazet, ffmancera, horms, kuba,
	linux-kernel, netdev, netfilter-devel, pabeni, phil
In-Reply-To: <20260414104900.2617863-1-hxzene@gmail.com>

Kito Xu (veritas501) <hxzene@gmail.com> wrote:
> nf_osf_ttl() calls __in_dev_get_rcu(skb->dev) and passes the result
> to in_dev_for_each_ifa_rcu() without checking for NULL. When the
> receiving device has no IPv4 configuration (ip_ptr is NULL),
> __in_dev_get_rcu() returns NULL and in_dev_for_each_ifa_rcu()
> dereferences it unconditionally, causing a kernel crash.
> 
> This can happen when a packet arrives on a device that has had its
> IPv4 configuration removed (e.g., MTU set below IPV4_MIN_MTU causing
> inetdev_destroy) or on a device that was never assigned an IPv4
> address, while an xt_osf or nft_osf rule with TTL_LESS mode is
> active and the packet TTL exceeds the fingerprint TTL.
> 
> Add a NULL check for in_dev before using it. When in_dev is NULL,
> return 0 (no match) since source-address locality cannot be
> determined without IPv4 addresses on the device.
> 
> KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
> RIP: 0010:nf_osf_match_one+0x204/0xa70
> Call Trace:
>  <IRQ>
>  nf_osf_match+0x2f8/0x780
>  xt_osf_match_packet+0x11c/0x1f0
>  ipt_do_table+0x7fe/0x12b0
>  nf_hook_slow+0xac/0x1e0
>  ip_rcv+0x123/0x370
>  __netif_receive_skb_one_core+0x166/0x1b0
>  process_backlog+0x197/0x590
>  __napi_poll+0xa1/0x540
>  net_rx_action+0x401/0xd80
>  handle_softirqs+0x19f/0x610
>  </IRQ>
> 
> Fixes: a218dc82f0b5 ("netfilter: nft_osf: Add ttl option support")
> Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
> Signed-off-by: Kito Xu (veritas501) <hxzene@gmail.com>

The other __in_dev_get_rcu() callers in netfilter check return value, so:

Reviewed-by: Florian Westphal <fw@strlen.de>
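
For readers following along, the guard described in the quoted commit
message can be sketched in user space roughly as below. All type and
function names (`struct netdev_model`, `src_is_local()`, and so on) are
hypothetical; addresses are host-order integers for simplicity, and the
list walk only stands in for in_dev_for_each_ifa_rcu().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures involved. */
struct ifaddr_model {
	uint32_t addr;
	uint32_t mask;
	const struct ifaddr_model *next;
};

struct in_dev_model {
	const struct ifaddr_model *ifa_list;
};

struct netdev_model {
	const struct in_dev_model *ip_ptr;	/* NULL: no IPv4 configuration */
};

/*
 * Return 1 if saddr lies in a subnet configured on dev, else 0.
 * A device without IPv4 configuration yields 0 (no match) instead of
 * dereferencing a NULL pointer.
 */
int src_is_local(const struct netdev_model *dev, uint32_t saddr)
{
	const struct in_dev_model *in_dev = dev->ip_ptr;
	const struct ifaddr_model *ifa;

	if (!in_dev)
		return 0;

	for (ifa = in_dev->ifa_list; ifa; ifa = ifa->next) {
		if (((saddr ^ ifa->addr) & ifa->mask) == 0)
			return 1;
	}
	return 0;
}
```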


* Re: [PATCH 1/1] net: strparser: fix skb_head leak in strp_abort_strp()
From: patchwork-bot+netdevbpf @ 2026-04-14 11:10 UTC (permalink / raw)
  To: Ren Wei
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, nate.karstens, sd,
	linux, Julia.Lawall, tom, yifanwucs, tomapufckgml, yuantan098,
	bird, rakukuip
In-Reply-To: <ade3857a9404999ce9a1c27ec523efc896072678.1775482694.git.rakukuip@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Sat, 11 Apr 2026 23:10:10 +0800 you wrote:
> From: Luxiao Xu <rakukuip@gmail.com>
> 
> When the stream parser is aborted, for example after a message assembly timeout,
> it can still hold a reference to a partially assembled message in
> strp->skb_head.
> 
> That skb is not released in strp_abort_strp(), which leaks the partially
> assembled message and can be triggered repeatedly to exhaust memory.
> 
> [...]

Here is the summary with links:
  - [1/1] net: strparser: fix skb_head leak in strp_abort_strp()
    https://git.kernel.org/netdev/net/c/fe72340daaf1

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




