Netdev List


* Re: [PATCH v1] net: liquidio: resolve VF pci_dev on demand for FLR requests
From: Simon Horman @ 2026-04-21 15:33 UTC
  To: Yuho Choi
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Andrew Lunn, Eric Dumazet, Kory Maincent, Vadim Fedorenko,
	Marco Crivellari, linux-kernel, Myeonghun Pak, Ijae Kim,
	Taegyu Kim
In-Reply-To: <20260420023304.57105-1-dbgh9129@gmail.com>

On Sun, Apr 19, 2026 at 10:33:04PM -0400, Yuho Choi wrote:
> The PF SR-IOV enable path caches VF pci_dev pointers in
> dpiring_to_vfpcidev_lut[] by iterating with pci_get_device(). Those
> entries do not own a reference, because the iterator drops the previous
> device reference on each step. The cached pointer is then dereferenced
> later when handling OCTEON_VF_FLR_REQUEST.
> 
> This can leave stale VF pci_dev pointers in the lookup table and makes
> the FLR path rely on a PCI device object whose lifetime is not pinned.
> 
> Drop the long-lived lookup table and resolve the VF pci_dev only when an
> FLR request arrives. Use the PF's SR-IOV metadata to derive the VF's
> bus/devfn, get a referenced pci_dev for immediate use, issue the FLR,
> and then drop the reference.
> 
> Fixes: ca6139ffc67ee ("liquidio CN23XX: sysfs VF config support")
> Fixes: 8c978d059224 ("liquidio CN23XX: Mailbox support")
> Co-developed-by: Myeonghun Pak <mhun512@gmail.com>
> Signed-off-by: Myeonghun Pak <mhun512@gmail.com>
> Co-developed-by: Ijae Kim <ae878000@gmail.com>
> Signed-off-by: Ijae Kim <ae878000@gmail.com>
> Co-developed-by: Taegyu Kim <tmk5904@psu.edu>
> Signed-off-by: Taegyu Kim <tmk5904@psu.edu>
> Signed-off-by: Yuho Choi <dbgh9129@gmail.com>

As this fixes code present in the net tree, it should be targeted
at that tree, like this:

Subject: [PATCH net] ...

In this case the CI defaulted to the net-next tree, which might be
harmless, but please keep this in mind for next time.

...

> diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
> index ad685f5d0a136..b967c7928b4a7 100644
> --- a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
> +++ b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
> @@ -26,6 +26,29 @@
>  #include "octeon_mailbox.h"
>  #include "cn23xx_pf_device.h"
>  
> +static struct pci_dev *lio_vf_pci_dev_by_qno(struct octeon_device *oct, u32 q_no)
> +{
> +	int vfidx, bus, devfn;
> +
> +	if (!oct->sriov_info.rings_per_vf)
> +		return NULL;
> +
> +	if (q_no % oct->sriov_info.rings_per_vf)
> +		return NULL;
> +
> +	vfidx = q_no / oct->sriov_info.rings_per_vf;
> +	if (vfidx >= oct->sriov_info.num_vfs_alloced)
> +		return NULL;
> +
> +	bus = pci_iov_virtfn_bus(oct->pci_dev, vfidx);

When applied against net-next this causes a linker error with x86_64
allmodconfig (at least) because pci_iov_virtfn_bus is not defined.

> +	devfn = pci_iov_virtfn_devfn(oct->pci_dev, vfidx);
> +	if (bus < 0 || devfn < 0)
> +		return NULL;
> +
> +	return pci_get_domain_bus_and_slot(pci_domain_nr(oct->pci_dev->bus),
> +					   bus, devfn);
> +}
> +
>  /**
>   * octeon_mbox_read:
>   * @mbox: Pointer mailbox

-- 
pw-bot: changes-requested
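
The index arithmetic in the quoted lio_vf_pci_dev_by_qno() helper can be
tried out in user space. The following is only a sketch of the three
checks shown in the diff, not driver code: vf_index_for_queue() is a
hypothetical name, plain integers stand in for the oct->sriov_info
fields, and the PCI lookup that follows in the patch is left out.

```c
#include <stdint.h>

/*
 * User-space model of the checks in the quoted helper (hypothetical
 * function, not part of the liquidio driver). Returns the VF index
 * for q_no, or -1 where the quoted code would return NULL.
 */
static int vf_index_for_queue(uint32_t q_no, uint32_t rings_per_vf,
			      uint32_t num_vfs_alloced)
{
	uint32_t vfidx;

	if (!rings_per_vf)		/* avoid dividing by zero */
		return -1;

	if (q_no % rings_per_vf)	/* q_no is not a multiple of rings_per_vf */
		return -1;

	vfidx = q_no / rings_per_vf;
	if (vfidx >= num_vfs_alloced)	/* index beyond the allocated VFs */
		return -1;

	return (int)vfidx;
}
```

With 4 rings per VF and 4 allocated VFs, queue 8 maps to VF index 2,
while queue 9 (not a multiple of 4) and queue 16 (index 4, out of range)
are both rejected.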


* [syzbot] [net?] [nfs?] KASAN: slab-out-of-bounds Read in cache_seq_start_rcu
From: syzbot @ 2026-04-21 15:34 UTC
  To: Dai.Ngo, anna, chuck.lever, davem, edumazet, horms, jlayton, kuba,
	linux-kernel, linux-nfs, neil, netdev, okorniev, pabeni,
	syzkaller-bugs, tom, trondmy

Hello,

syzbot found the following issue on:

HEAD commit:    b4e07588e743 tracing: tell git to ignore the generated 'un..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=14b522d2580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=45528449ee7e2c2f
dashboard link: https://syzkaller.appspot.com/bug?extid=60cfa08822470bbebe44
compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13f52c36580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=13432cce580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/c05a9c593011/disk-b4e07588.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/779dda6a7608/vmlinux-b4e07588.xz
kernel image: https://storage.googleapis.com/syzbot-assets/5e7d794a3ff0/bzImage-b4e07588.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+60cfa08822470bbebe44@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: slab-out-of-bounds in __cache_seq_start net/sunrpc/cache.c:1351 [inline]
BUG: KASAN: slab-out-of-bounds in cache_seq_start_rcu+0x3fe/0x420 net/sunrpc/cache.c:1399
Read of size 8 at addr ffff88802ae92800 by task syz.0.17/6044

CPU: 1 UID: 0 PID: 6044 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/18/2026
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0x13d/0x4b0 mm/kasan/report.c:482
 kasan_report+0xdf/0x1d0 mm/kasan/report.c:595
 __cache_seq_start net/sunrpc/cache.c:1351 [inline]
 cache_seq_start_rcu+0x3fe/0x420 net/sunrpc/cache.c:1399
 seq_read_iter+0x2c1/0x1270 fs/seq_file.c:226
 seq_read+0x33b/0x4c0 fs/seq_file.c:163
 pde_read fs/proc/inode.c:308 [inline]
 proc_reg_read+0x240/0x330 fs/proc/inode.c:320
 vfs_read+0x1e4/0xb30 fs/read_write.c:572
 ksys_pread64 fs/read_write.c:765 [inline]
 __do_sys_pread64 fs/read_write.c:773 [inline]
 __se_sys_pread64 fs/read_write.c:770 [inline]
 __x64_sys_pread64+0x1eb/0x250 fs/read_write.c:770
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x10b/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f88b8d9c819
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffc567f4a98 EFLAGS: 00000246 ORIG_RAX: 0000000000000011
RAX: ffffffffffffffda RBX: 00007f88b9015fa0 RCX: 00007f88b8d9c819
RDX: 0000000000000566 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00007f88b8e32c91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000080000002 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f88b9015fac R14: 00007f88b9015fa0 R15: 00007f88b9015fa0
 </TASK>

Allocated by task 5958:
 kasan_save_stack+0x30/0x50 mm/kasan/common.c:57
 kasan_save_track+0x14/0x30 mm/kasan/common.c:78
 poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
 __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:415
 kasan_kmalloc include/linux/kasan.h:263 [inline]
 __do_kmalloc_node mm/slub.c:5295 [inline]
 __kmalloc_noprof+0x301/0x850 mm/slub.c:5307
 kmalloc_noprof include/linux/slab.h:954 [inline]
 kzalloc_noprof include/linux/slab.h:1188 [inline]
 cache_create_net+0xa2/0x1f0 net/sunrpc/cache.c:1733
 nfsd_export_init+0x62/0x250 fs/nfsd/export.c:1536
 nfsd_net_init+0x69/0x3e0 fs/nfsd/nfsctl.c:2209
 ops_init+0x1e2/0x5f0 net/core/net_namespace.c:137
 setup_net+0x118/0x3a0 net/core/net_namespace.c:446
 copy_net_ns+0x46f/0x7c0 net/core/net_namespace.c:579
 create_new_namespaces+0x3ea/0xac0 kernel/nsproxy.c:132
 unshare_nsproxy_namespaces+0xf2/0x220 kernel/nsproxy.c:234
 ksys_unshare+0x438/0xab0 kernel/fork.c:3245
 __do_sys_unshare kernel/fork.c:3319 [inline]
 __se_sys_unshare kernel/fork.c:3317 [inline]
 __x64_sys_unshare+0x31/0x40 kernel/fork.c:3317
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x10b/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff88802ae92000
 which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 0 bytes to the right of
 allocated 2048-byte region [ffff88802ae92000, ffff88802ae92800)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2ae90
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0xfff00000000040(head|node=0|zone=1|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 00fff00000000040 ffff88813fe86000 dead000000000100 dead000000000122
raw: 0000000000000000 0000000800080008 00000000f5000000 0000000000000000
head: 00fff00000000040 ffff88813fe86000 dead000000000100 dead000000000122
head: 0000000000000000 0000000800080008 00000000f5000000 0000000000000000
head: 00fff00000000003 fffffffffffffe01 00000000ffffffff 00000000ffffffff
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 5958, tgid 5958 (syz-executor), ts 108313513495, free_ts 108309100167
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x153/0x170 mm/page_alloc.c:1858
 prep_new_page mm/page_alloc.c:1866 [inline]
 get_page_from_freelist+0x11a6/0x33b0 mm/page_alloc.c:3946
 __alloc_frozen_pages_noprof+0x27c/0x2bc0 mm/page_alloc.c:5226
 alloc_slab_page mm/slub.c:3278 [inline]
 allocate_slab mm/slub.c:3467 [inline]
 new_slab+0xa6/0x6c0 mm/slub.c:3525
 refill_objects+0x277/0x420 mm/slub.c:7251
 refill_sheaf mm/slub.c:2816 [inline]
 __pcs_replace_empty_main+0x375/0x650 mm/slub.c:4651
 alloc_from_pcs mm/slub.c:4749 [inline]
 slab_alloc_node mm/slub.c:4883 [inline]
 __kmalloc_cache_noprof+0x493/0x6f0 mm/slub.c:5410
 kmalloc_noprof include/linux/slab.h:950 [inline]
 netdev_create_hash net/core/dev.c:12922 [inline]
 netdev_init+0x151/0x3c0 net/core/dev.c:12942
 ops_init+0x1e2/0x5f0 net/core/net_namespace.c:137
 setup_net+0x118/0x3a0 net/core/net_namespace.c:446
 copy_net_ns+0x46f/0x7c0 net/core/net_namespace.c:579
 create_new_namespaces+0x3ea/0xac0 kernel/nsproxy.c:132
 unshare_nsproxy_namespaces+0xf2/0x220 kernel/nsproxy.c:234
 ksys_unshare+0x438/0xab0 kernel/fork.c:3245
 __do_sys_unshare kernel/fork.c:3319 [inline]
 __se_sys_unshare kernel/fork.c:3317 [inline]
 __x64_sys_unshare+0x31/0x40 kernel/fork.c:3317
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x10b/0xf80 arch/x86/entry/syscall_64.c:94
page last free pid 5958 tgid 5958 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 __free_pages_prepare mm/page_alloc.c:1402 [inline]
 __free_frozen_pages+0x747/0x1040 mm/page_alloc.c:2943
 qlink_free mm/kasan/quarantine.c:163 [inline]
 qlist_free_all+0x47/0xf0 mm/kasan/quarantine.c:179
 kasan_quarantine_reduce+0x1a0/0x1f0 mm/kasan/quarantine.c:286
 __kasan_slab_alloc+0x69/0x90 mm/kasan/common.c:350
 kasan_slab_alloc include/linux/kasan.h:253 [inline]
 slab_post_alloc_hook mm/slub.c:4569 [inline]
 slab_alloc_node mm/slub.c:4898 [inline]
 kmem_cache_alloc_node_noprof+0x25a/0x6f0 mm/slub.c:4950
 __alloc_skb+0x140/0x710 net/core/skbuff.c:702
 alloc_skb include/linux/skbuff.h:1383 [inline]
 nlmsg_new include/net/netlink.h:1055 [inline]
 netlink_ack+0x117/0xb80 net/netlink/af_netlink.c:2487
 netlink_rcv_skb+0x333/0x420 net/netlink/af_netlink.c:2556
 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
 netlink_unicast+0x585/0x850 net/netlink/af_netlink.c:1344
 netlink_sendmsg+0x8b0/0xda0 net/netlink/af_netlink.c:1894
 sock_sendmsg_nosec net/socket.c:787 [inline]
 __sock_sendmsg net/socket.c:802 [inline]
 __sys_sendto+0x468/0x4b0 net/socket.c:2265
 __do_sys_sendto net/socket.c:2272 [inline]
 __se_sys_sendto net/socket.c:2268 [inline]
 __x64_sys_sendto+0xe0/0x1c0 net/socket.c:2268
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x10b/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Memory state around the buggy address:
 ffff88802ae92700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88802ae92780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88802ae92800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                   ^
 ffff88802ae92880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88802ae92900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
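
The "0 bytes to the right" wording in the report above follows from the
addresses it prints: the object starts at ffff88802ae92000 and is 2048
(0x800) bytes long, so the faulting address ffff88802ae92800 is the
first byte past its end. A minimal user-space sketch of that arithmetic,
where bytes_past_end() is a hypothetical helper and not a kernel
function:

```c
#include <stdint.h>

/*
 * Hypothetical helper: distance of addr from the end of the object
 * [base, base + size). Only meaningful when addr >= base + size.
 */
static uint64_t bytes_past_end(uint64_t base, uint64_t size, uint64_t addr)
{
	return addr - (base + size);
}
```

Calling bytes_past_end(0xffff88802ae92000, 2048, 0xffff88802ae92800)
yields 0, matching the report.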


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite the report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


* [syzbot] [sctp?] WARNING: refcount bug in sctp_association_hold
From: syzbot @ 2026-04-21 15:34 UTC
  To: davem, edumazet, horms, kuba, linux-kernel, linux-sctp,
	lucien.xin, marcelo.leitner, netdev, pabeni, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    c1f49dea2b8f Merge tag 'mm-hotfixes-stable-2026-04-19-00-1..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15de0e6a580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=507c1c0a12a79510
dashboard link: https://syzkaller.appspot.com/bug?extid=61bdf856ff699245c643
compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
userspace arch: i386

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-c1f49dea.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/98ce9fed1a97/vmlinux-c1f49dea.xz
kernel image: https://storage.googleapis.com/syzbot-assets/b02e163ec959/bzImage-c1f49dea.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+61bdf856ff699245c643@syzkaller.appspotmail.com

------------[ cut here ]------------
refcount_t: addition on 0; use-after-free.
WARNING: lib/refcount.c:25 at refcount_warn_saturate+0x111/0x130 lib/refcount.c:25, CPU#0: swapper/0/0
Modules linked in:
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:refcount_warn_saturate+0x111/0x130 lib/refcount.c:25
Code: 06 e8 e3 e8 11 fd 48 8d 3d 8c d6 ef 0b 67 48 0f b9 3a e8 d2 e8 11 fd 5b 5d c3 cc cc cc cc e8 c6 e8 11 fd 48 8d 3d 7f d6 ef 0b <67> 48 0f b9 3a e8 b5 e8 11 fd 5b 5d e9 0e de a2 06 48 89 df e8 a6
RSP: 0000:ffffc90000007bd8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888075d2e004 RCX: ffffffff84f6dc0b
RDX: ffffffff8e4955c0 RSI: ffffffff84f6dcba RDI: ffffffff90e6b340
RBP: 0000000000000002 R08: 0000000000000005 R09: 0000000000000004
R10: 0000000000000002 R11: 0000000000000000 R12: ffff888075d2e004
R13: 0000000000000002 R14: ffff88804fd3cbd0 R15: ffff888050620000
FS:  0000000000000000(0000) GS:ffff8880970ee000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000080066018 CR3: 000000004c3cc000 CR4: 0000000000352ef0
Call Trace:
 <IRQ>
 __refcount_add include/linux/refcount.h:289 [inline]
 __refcount_inc include/linux/refcount.h:366 [inline]
 refcount_inc include/linux/refcount.h:383 [inline]
 sctp_association_hold+0x9f/0xb0 net/sctp/associola.c:843
 sctp_generate_timeout_event+0x292/0x3f0 net/sctp/sm_sideeffect.c:284
 call_timer_fn+0x19a/0x640 kernel/time/timer.c:1748
 expire_timers kernel/time/timer.c:1799 [inline]
 __run_timers+0x75f/0xaf0 kernel/time/timer.c:2374
 __run_timer_base kernel/time/timer.c:2386 [inline]
 __run_timer_base kernel/time/timer.c:2378 [inline]
 run_timer_base+0x114/0x190 kernel/time/timer.c:2395
 run_timer_softirq+0x1a/0x50 kernel/time/timer.c:2405
 handle_softirqs+0x1ea/0xa00 kernel/softirq.c:622
 __do_softirq kernel/softirq.c:656 [inline]
 invoke_softirq kernel/softirq.c:496 [inline]
 __irq_exit_rcu+0x162/0x210 kernel/softirq.c:735
 irq_exit_rcu+0x9/0x30 kernel/softirq.c:752
 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1061 [inline]
 sysvec_apic_timer_interrupt+0xa3/0xc0 arch/x86/kernel/apic/apic.c:1061
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
RIP: 0010:steal_cookie_task kernel/sched/core.c:6401 [inline]
RIP: 0010:sched_core_balance+0x3fd/0xea0 kernel/sched/core.c:6422
Code: 7e 48 48 89 4c 24 40 e8 61 df c2 09 48 8b 4c 24 40 e9 3a 01 00 00 49 8d 7f 48 e8 4e df c2 09 e8 79 76 3a 00 fb 80 7c 24 30 00 <0f> 85 92 00 00 00 8d 4d 01 48 63 d1 49 39 d4 73 48 83 f9 08 74 41
RSP: 0000:ffffffff8e407b80 EFLAGS: 00000246
RAX: 000000000046d04f RBX: ffff88802b23b3c8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff8df51613 RDI: ffffffff8c1c0200
RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001
R13: dffffc0000000000 R14: ffff88802b33b380 R15: ffff88802b23b380
 do_balance_callbacks kernel/sched/core.c:5017 [inline]
 __balance_callbacks+0x21d/0x6e0 kernel/sched/core.c:5073
 __schedule+0x31b9/0x6820 kernel/sched/core.c:7191
 schedule_idle+0x54/0x80 kernel/sched/core.c:7308
 do_idle+0x2dd/0x590 kernel/sched/idle.c:381
 cpu_startup_entry+0x4f/0x60 kernel/sched/idle.c:451
 rest_init+0x251/0x260 init/main.c:762
 start_kernel+0x484/0x490 init/main.c:1220
 x86_64_start_reservations+0x24/0x30 arch/x86/kernel/head64.c:310
 x86_64_start_kernel+0x12b/0x130 arch/x86/kernel/head64.c:291
 common_startup_64+0x13e/0x148
 </TASK>
----------------
Code disassembly (best guess), 1 bytes skipped:
   0:	e8 e3 e8 11 fd       	call   0xfd11e8e8
   5:	48 8d 3d 8c d6 ef 0b 	lea    0xbefd68c(%rip),%rdi        # 0xbefd698
   c:	67 48 0f b9 3a       	ud1    (%edx),%rdi
  11:	e8 d2 e8 11 fd       	call   0xfd11e8e8
  16:	5b                   	pop    %rbx
  17:	5d                   	pop    %rbp
  18:	c3                   	ret
  19:	cc                   	int3
  1a:	cc                   	int3
  1b:	cc                   	int3
  1c:	cc                   	int3
  1d:	e8 c6 e8 11 fd       	call   0xfd11e8e8
  22:	48 8d 3d 7f d6 ef 0b 	lea    0xbefd67f(%rip),%rdi        # 0xbefd6a8
* 29:	67 48 0f b9 3a       	ud1    (%edx),%rdi <-- trapping instruction
  2e:	e8 b5 e8 11 fd       	call   0xfd11e8e8
  33:	5b                   	pop    %rbx
  34:	5d                   	pop    %rbp
  35:	e9 0e de a2 06       	jmp    0x6a2de48
  3a:	48 89 df             	mov    %rbx,%rdi
  3d:	e8                   	.byte 0xe8
  3e:	a6                   	cmpsb  %es:(%rdi),%ds:(%rsi)




* [syzbot] [net?] kernel BUG in pn_socket_autobind
From: syzbot @ 2026-04-21 15:35 UTC
  To: courmisch, davem, edumazet, horms, kuba, linux-kernel, netdev,
	pabeni, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    b4e07588e743 tracing: tell git to ignore the generated 'un..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10aa8e6a580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=95ee3fe1c5a8ab57
dashboard link: https://syzkaller.appspot.com/bug?extid=b3c0e6a240078433c42b
compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-b4e07588.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/bb7067cc2bb4/vmlinux-b4e07588.xz
kernel image: https://storage.googleapis.com/syzbot-assets/46b9d3bae153/bzImage-b4e07588.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+b3c0e6a240078433c42b@syzkaller.appspotmail.com

netlink: 'syz.3.104': attribute type 2 has an invalid length.
------------[ cut here ]------------
kernel BUG at net/phonet/socket.c:213!
Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
CPU: 1 UID: 0 PID: 6429 Comm: syz.3.104 Tainted: G             L      syzkaller #0 PREEMPT(full) 
Tainted: [L]=SOFTLOCKUP
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:pn_socket_autobind net/phonet/socket.c:213 [inline]
RIP: 0010:pn_socket_autobind+0x14c/0x170 net/phonet/socket.c:202
Code: 00 00 00 00 48 8b 44 24 58 65 48 2b 05 35 85 47 09 75 2a 48 83 c4 60 89 d8 5b 5d 41 5c 41 5d c3 cc cc cc cc e8 f5 0f 3d f7 90 <0f> 0b e8 ad c8 aa f7 eb 9e e8 06 c9 aa f7 e9 6c ff ff ff e8 5c b0
RSP: 0018:ffffc900037ffa30 EFLAGS: 00010287
RAX: 000000000000084d RBX: 0000000000000000 RCX: ffffc90007544000
RDX: 0000000000080000 RSI: ffffffff8acc6aeb RDI: ffff888034de8000
RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff920006fff46
R13: dffffc0000000000 R14: 1ffff920006fff61 R15: ffffc900037ffe48
FS:  00007fdb20e7f6c0(0000) GS:ffff8880d63e1000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fce31d5ee9c CR3: 000000002d58c000 CR4: 0000000000352ef0
Call Trace:
 <TASK>
 pn_socket_sendmsg+0x43/0xe0 net/phonet/socket.c:421
 sock_sendmsg_nosec net/socket.c:787 [inline]
 __sock_sendmsg net/socket.c:802 [inline]
 ____sys_sendmsg+0x9e1/0xb70 net/socket.c:2698
 ___sys_sendmsg+0x190/0x1e0 net/socket.c:2752
 __sys_sendmsg+0x170/0x220 net/socket.c:2784
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x10b/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fdb1ff9c819
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fdb20e7f028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fdb20215fa0 RCX: 00007fdb1ff9c819
RDX: 0000000000008000 RSI: 0000200000000380 RDI: 0000000000000006
RBP: 00007fdb20032c91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fdb20216038 R14: 00007fdb20215fa0 R15: 00007ffc7b77c6b8
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:pn_socket_autobind net/phonet/socket.c:213 [inline]
RIP: 0010:pn_socket_autobind+0x14c/0x170 net/phonet/socket.c:202
Code: 00 00 00 00 48 8b 44 24 58 65 48 2b 05 35 85 47 09 75 2a 48 83 c4 60 89 d8 5b 5d 41 5c 41 5d c3 cc cc cc cc e8 f5 0f 3d f7 90 <0f> 0b e8 ad c8 aa f7 eb 9e e8 06 c9 aa f7 e9 6c ff ff ff e8 5c b0
RSP: 0018:ffffc900037ffa30 EFLAGS: 00010287
RAX: 000000000000084d RBX: 0000000000000000 RCX: ffffc90007544000
RDX: 0000000000080000 RSI: ffffffff8acc6aeb RDI: ffff888034de8000
RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff920006fff46
R13: dffffc0000000000 R14: 1ffff920006fff61 R15: ffffc900037ffe48
FS:  00007fdb20e7f6c0(0000) GS:ffff8880d63e1000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fce31d5ee9c CR3: 000000002d58c000 CR4: 0000000000352ef0




* [syzbot] [net?] BUG: spinlock already unlocked in lec_atm_close
From: syzbot @ 2026-04-21 15:35 UTC
  To: davem, edumazet, horms, kuba, linux-kernel, netdev, pabeni,
	syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    1f5ffc672165 Fix mismerge of the arm64 / timer-core interr..
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=12b85906580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=95729ed00549063a
dashboard link: https://syzkaller.appspot.com/bug?extid=49a9c7b281bb147f8eb3
compiler:       Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/6b552538b97f/disk-1f5ffc67.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/724a3a1d69d7/vmlinux-1f5ffc67.xz
kernel image: https://storage.googleapis.com/syzbot-assets/ea684969e2c2/bzImage-1f5ffc67.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+49a9c7b281bb147f8eb3@syzkaller.appspotmail.com

BUG: spinlock already unlocked on CPU#0, syz.2.25/6044
 lock: 0xffff888032ea0ea0, .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1
CPU: 0 UID: 0 PID: 6044 Comm: syz.2.25 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/18/2026
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 spin_bug kernel/locking/spinlock_debug.c:78 [inline]
 debug_spin_unlock kernel/locking/spinlock_debug.c:101 [inline]
 do_raw_spin_unlock+0x143/0x210 kernel/locking/spinlock_debug.c:141
 __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:177 [inline]
 _raw_spin_unlock_irqrestore+0x23/0x80 kernel/locking/spinlock.c:198
 spin_unlock_irqrestore include/linux/spinlock.h:408 [inline]
 lec_arp_destroy net/atm/lec.c:1543 [inline]
 lec_atm_close+0x976/0x9e0 net/atm/lec.c:497
 vcc_destroy_socket net/atm/common.c:181 [inline]
 vcc_release+0x10f/0x580 net/atm/common.c:205
 svc_release+0x6c/0xd0 net/atm/svc.c:95
 __sock_release net/socket.c:722 [inline]
 sock_close+0xc3/0x240 net/socket.c:1514
 __fput+0x44f/0xa60 fs/file_table.c:510
 task_work_run+0x1d9/0x270 kernel/task_work.c:233
 get_signal+0x11eb/0x1330 kernel/signal.c:2811
 arch_do_signal_or_restart+0xbc/0x830 arch/x86/kernel/signal.c:337
 __exit_to_user_mode_loop kernel/entry/common.c:64 [inline]
 exit_to_user_mode_loop+0x86/0x480 kernel/entry/common.c:98
 __exit_to_user_mode_prepare include/linux/irq-entry-common.h:207 [inline]
 syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:238 [inline]
 syscall_exit_to_user_mode include/linux/entry-common.h:328 [inline]
 do_syscall_64+0x33e/0xf80 arch/x86/entry/syscall_64.c:100
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f1af9f9c819
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f1afae30028 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: 0000000000000000 RBX: 00007f1afa215fa0 RCX: 00007f1af9f9c819
RDX: 0000000000000000 RSI: 00000000000061d0 RDI: 0000000000000005
RBP: 00007f1afa032c91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f1afa216038 R14: 00007f1afa215fa0 R15: 00007ffdc6f7add8
 </TASK>
------------[ cut here ]------------
pvqspinlock: lock 0xffff888032ea0ea0 has corrupted value 0x0!
WARNING: kernel/locking/qspinlock_paravirt.h:506 at __pv_queued_spin_unlock_slowpath+0x1a5/0x270 kernel/locking/qspinlock_paravirt.h:504, CPU#0: syz.2.25/6044
Modules linked in:
CPU: 0 UID: 0 PID: 6044 Comm: syz.2.25 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/18/2026
RIP: 0010:__pv_queued_spin_unlock_slowpath+0x1ce/0x270 kernel/locking/qspinlock_paravirt.h:504
Code: 04 48 89 df be 04 00 00 00 e8 4e 58 7b f6 48 89 d8 48 c1 e8 03 42 0f b6 04 28 84 c0 0f 85 87 00 00 00 8b 13 4c 89 f7 48 89 de <67> 48 0f b9 3a eb 98 48 c7 c7 30 c8 82 8e 4c 89 f6 4c 89 fa e8 f9
RSP: 0018:ffffc90005087920 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff888032ea0ea0 RCX: ffffffff8bb5d542
RDX: 0000000000000000 RSI: ffff888032ea0ea0 RDI: ffffffff90357220
RBP: 1ffff110065d41d5 R08: ffff888032ea0ea3 R09: 1ffff110065d41d4
R10: dffffc0000000000 R11: ffffed10065d41d5 R12: dffffc0000000000
R13: dffffc0000000000 R14: ffffffff90357220 R15: ffff888032ea0ea8
FS:  00007f1afae306c0(0000) GS:ffff888125245000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1994be92f8 CR3: 000000007abce000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 __raw_callee_save___pv_queued_spin_unlock_slowpath+0x15/0x30
 .slowpath+0x9/0x18
 pv_queued_spin_unlock arch/x86/include/asm/paravirt-spinlock.h:40 [inline]
 queued_spin_unlock arch/x86/include/asm/paravirt-spinlock.h:72 [inline]
 do_raw_spin_unlock+0xf5/0x210 kernel/locking/spinlock_debug.c:142
 __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:177 [inline]
 _raw_spin_unlock_irqrestore+0x23/0x80 kernel/locking/spinlock.c:198
 spin_unlock_irqrestore include/linux/spinlock.h:408 [inline]
 lec_arp_destroy net/atm/lec.c:1543 [inline]
 lec_atm_close+0x976/0x9e0 net/atm/lec.c:497
 vcc_destroy_socket net/atm/common.c:181 [inline]
 vcc_release+0x10f/0x580 net/atm/common.c:205
 svc_release+0x6c/0xd0 net/atm/svc.c:95
 __sock_release net/socket.c:722 [inline]
 sock_close+0xc3/0x240 net/socket.c:1514
 __fput+0x44f/0xa60 fs/file_table.c:510
 task_work_run+0x1d9/0x270 kernel/task_work.c:233
 get_signal+0x11eb/0x1330 kernel/signal.c:2811
 arch_do_signal_or_restart+0xbc/0x830 arch/x86/kernel/signal.c:337
 __exit_to_user_mode_loop kernel/entry/common.c:64 [inline]
 exit_to_user_mode_loop+0x86/0x480 kernel/entry/common.c:98
 __exit_to_user_mode_prepare include/linux/irq-entry-common.h:207 [inline]
 syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:238 [inline]
 syscall_exit_to_user_mode include/linux/entry-common.h:328 [inline]
 do_syscall_64+0x33e/0xf80 arch/x86/entry/syscall_64.c:100
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f1af9f9c819
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f1afae30028 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: 0000000000000000 RBX: 00007f1afa215fa0 RCX: 00007f1af9f9c819
RDX: 0000000000000000 RSI: 00000000000061d0 RDI: 0000000000000005
RBP: 00007f1afa032c91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f1afa216038 R14: 00007f1afa215fa0 R15: 00007ffdc6f7add8
 </TASK>
----------------
Code disassembly (best guess):
   0:	04 48                	add    $0x48,%al
   2:	89 df                	mov    %ebx,%edi
   4:	be 04 00 00 00       	mov    $0x4,%esi
   9:	e8 4e 58 7b f6       	call   0xf67b585c
   e:	48 89 d8             	mov    %rbx,%rax
  11:	48 c1 e8 03          	shr    $0x3,%rax
  15:	42 0f b6 04 28       	movzbl (%rax,%r13,1),%eax
  1a:	84 c0                	test   %al,%al
  1c:	0f 85 87 00 00 00    	jne    0xa9
  22:	8b 13                	mov    (%rbx),%edx
  24:	4c 89 f7             	mov    %r14,%rdi
  27:	48 89 de             	mov    %rbx,%rsi
* 2a:	67 48 0f b9 3a       	ud1    (%edx),%rdi <-- trapping instruction
  2f:	eb 98                	jmp    0xffffffc9
  31:	48 c7 c7 30 c8 82 8e 	mov    $0xffffffff8e82c830,%rdi
  38:	4c 89 f6             	mov    %r14,%rsi
  3b:	4c 89 fa             	mov    %r15,%rdx
  3e:	e8                   	.byte 0xe8
  3f:	f9                   	stc


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


* [PATCH bpf v2] bpf: Fix NULL pointer dereference in bpf_sk_storage_clone and diag paths
From: Weiming Shi @ 2026-04-21 15:50 UTC (permalink / raw)
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Martin KaFai Lau, Alexei Starovoitov, Amery Hung,
	Leon Hwang, Kees Cook, Fushuai Wang, Menglong Dong, netdev, bpf,
	Xiang Mei, Weiming Shi

bpf_selem_unlink_nofail() sets SDATA(selem)->smap to NULL before
removing the selem from the storage hlist. A concurrent RCU reader in
bpf_sk_storage_clone() can observe the selem still on the list with
smap already NULL, causing a NULL pointer dereference.

 general protection fault, probably for non-canonical address 0xdffffc000000000a:
 KASAN: null-ptr-deref in range [0x0000000000000050-0x0000000000000057]
 RIP: 0010:bpf_sk_storage_clone+0x1cd/0xaa0 net/core/bpf_sk_storage.c:174
 Call Trace:
  <IRQ>
  sk_clone+0xfed/0x1980 net/core/sock.c:2591
  inet_csk_clone_lock+0x30/0x760 net/ipv4/inet_connection_sock.c:1222
  tcp_create_openreq_child+0x35/0x2680 net/ipv4/tcp_minisocks.c:571
  tcp_v4_syn_recv_sock+0x123/0xf90 net/ipv4/tcp_ipv4.c:1729
  tcp_check_req+0x8e1/0x2580 include/net/tcp.h:855
  tcp_v4_rcv+0x1845/0x3b80 net/ipv4/tcp_ipv4.c:2347

Add a NULL check for smap in bpf_sk_storage_clone().
bpf_sk_storage_diag_put_all() has the same unprotected dereference
pattern, so add a NULL check there as well.

Fixes: 5d800f87d0a5 ("bpf: Support lockless unlink when freeing map or local storage")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
---
v2:
 Drop the NULL check in diag_get(); the caller already checks smap for
 NULL.

 net/core/bpf_sk_storage.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index f8338acebf077..67901de5b9c65 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -172,7 +172,7 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
 		struct bpf_map *map;
 
 		smap = rcu_dereference(SDATA(selem)->smap);
-		if (!(smap->map.map_flags & BPF_F_CLONE))
+		if (!smap || !(smap->map.map_flags & BPF_F_CLONE))
 			continue;
 
 		/* Note that for lockless listeners adding new element
@@ -599,6 +599,8 @@ static int bpf_sk_storage_diag_put_all(struct sock *sk, struct sk_buff *skb,
 	saved_len = skb->len;
 	hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) {
 		smap = rcu_dereference(SDATA(selem)->smap);
+		if (!smap)
+			continue;
 		diag_size += nla_value_size(smap->map.value_size);
 
 		if (nla_stgs && diag_get(SDATA(selem), skb))
-- 
2.43.0
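
The skip-on-NULL pattern used by both hunks above can be looked at in
isolation. The following is only a minimal userspace sketch: it does not
model RCU or the concurrent unlink, and struct map_model, struct
elem_model, F_CLONE and count_clonable() are hypothetical names made up
for illustration, not kernel definitions.

```c
#include <stddef.h>

#define F_CLONE 0x1u	/* illustrative flag, stands in for BPF_F_CLONE */

/* Hypothetical stand-in for the map a storage element points to. */
struct map_model {
	unsigned int flags;
	unsigned int value_size;
};

/* Hypothetical stand-in for a storage element on the list. */
struct elem_model {
	struct map_model *smap;	/* may be NULL while an unlink is in progress */
	struct elem_model *next;
};

/* Count elements that would be cloned, skipping entries whose smap is gone. */
static int count_clonable(const struct elem_model *head)
{
	int n = 0;

	for (const struct elem_model *e = head; e; e = e->next) {
		const struct map_model *smap = e->smap;

		/* Test the pointer before dereferencing it for the flags. */
		if (!smap || !(smap->flags & F_CLONE))
			continue;
		n++;
	}
	return n;
}
```

With a three-element list where one element has F_CLONE set, one has a
NULL smap and one has no flags, count_clonable() counts only the first
and never dereferences the NULL entry.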



* [PATCH bpf-next v4 0/4] bpf: Reject TCP_NODELAY in TCP header option
From: KaFai Wan @ 2026-04-21 15:58 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, memxor,
	song, yonghong.song, jolsa, sdf, davem, edumazet, kuba, pabeni,
	horms, dsahern, shuah, ihor.solodrai, kafai.wan, jiayuan.chen,
	hoyeon.lee, ameryhung, bpf, linux-kernel, netdev, linux-kselftest

This small patchset avoids infinite recursion in TCP header option callbacks
and bpf-tcp-cc callbacks triggered via the TCP_NODELAY setsockopt.

v4:
 - Fix the test case for TCP header option callbacks (Martin and Jiayuan)
 - Reject TCP_NODELAY in bpf-tcp-cc callbacks (AI and Martin)
 - Add a test case for bpf-tcp-cc

v3:
 - Remove CONFIG_INET check and add comment (Martin and Jiayuan)
 - Fix the test case (Martin)
 https://lore.kernel.org/bpf/20260417092035.2299913-1-kafai.wan@linux.dev/

v2:
 - Reject TCP_NODELAY in bpf_sock_ops_setsockopt() (AI and Martin)
 https://lore.kernel.org/bpf/20260416112308.1820332-1-kafai.wan@linux.dev/

v1:
 https://lore.kernel.org/bpf/20260414112310.1285783-1-kafai.wan@linux.dev/

---
KaFai Wan (4):
  bpf: Reject TCP_NODELAY in TCP header option callbacks
  bpf: Reject TCP_NODELAY in bpf-tcp-cc
  selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks
  selftests/bpf: Verify bpf-tcp-cc rejects TCP_NODELAY

 include/linux/bpf.h                           |  1 +
 net/core/filter.c                             | 30 +++++++++++++++++++
 net/ipv4/bpf_tcp_ca.c                         |  2 +-
 .../selftests/bpf/prog_tests/bpf_tcp_ca.c     |  4 +++
 .../bpf/prog_tests/tcp_hdr_options.c          |  6 ++++
 tools/testing/selftests/bpf/progs/bpf_cubic.c | 12 ++++++++
 .../bpf/progs/test_misc_tcp_hdr_options.c     | 15 +++++++++-
 7 files changed, 68 insertions(+), 2 deletions(-)

-- 
2.43.0



* [PATCH bpf-next v4 1/4] bpf: Reject TCP_NODELAY in TCP header option callbacks
From: KaFai Wan @ 2026-04-21 15:58 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, memxor,
	song, yonghong.song, jolsa, sdf, davem, edumazet, kuba, pabeni,
	horms, dsahern, shuah, ihor.solodrai, kafai.wan, jiayuan.chen,
	hoyeon.lee, ameryhung, bpf, linux-kernel, netdev, linux-kselftest
  Cc: Quan Sun, Yinhao Hu, Kaiyan Mei
In-Reply-To: <20260421155804.135786-1-kafai.wan@linux.dev>

A BPF_SOCK_OPS program can enable
BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call
bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB or
BPF_SOCK_OPS_WRITE_HDR_OPT_CB.

In these callbacks, bpf_setsockopt(TCP_NODELAY) can reach
__tcp_sock_set_nodelay(), which can call tcp_push_pending_frames().

From BPF_SOCK_OPS_HDR_OPT_LEN_CB, tcp_push_pending_frames() can call
tcp_current_mss(), which calls tcp_established_options() and re-enters
bpf_skops_hdr_opt_len().

BPF_SOCK_OPS_HDR_OPT_LEN_CB
  -> bpf_setsockopt(TCP_NODELAY)
    -> tcp_push_pending_frames()
      -> tcp_current_mss()
        -> tcp_established_options()
          -> bpf_skops_hdr_opt_len()
            -> BPF_SOCK_OPS_HDR_OPT_LEN_CB

From BPF_SOCK_OPS_WRITE_HDR_OPT_CB, tcp_push_pending_frames() can call
tcp_write_xmit(), which calls tcp_transmit_skb().  That path recomputes
header option length through tcp_established_options() and
bpf_skops_hdr_opt_len() before re-entering bpf_skops_write_hdr_opt().

BPF_SOCK_OPS_WRITE_HDR_OPT_CB
  -> bpf_setsockopt(TCP_NODELAY)
    -> tcp_push_pending_frames()
      -> tcp_write_xmit()
        -> tcp_transmit_skb()
          -> tcp_established_options()
            -> bpf_skops_hdr_opt_len()
          -> bpf_skops_write_hdr_opt()
            -> BPF_SOCK_OPS_WRITE_HDR_OPT_CB

This leads to unbounded recursion and can overflow the kernel stack.

Reject TCP_NODELAY with -EOPNOTSUPP in bpf_sock_ops_setsockopt()
when bpf_setsockopt() is called from
BPF_SOCK_OPS_HDR_OPT_LEN_CB or BPF_SOCK_OPS_WRITE_HDR_OPT_CB.

Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/
Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
---
 net/core/filter.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 5fa9189eb772..96849f4c1fbc 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5833,6 +5833,12 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
 	if (!is_locked_tcp_sock_ops(bpf_sock))
 		return -EOPNOTSUPP;

+	/* TCP_NODELAY triggers tcp_push_pending_frames() and re-enters these callbacks. */
+	if ((bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB ||
+	     bpf_sock->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB) &&
+	    level == SOL_TCP && optname == TCP_NODELAY)
+		return -EOPNOTSUPP;
+
 	return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen);
 }

-- 
2.43.0
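
The decision made by the new check in bpf_sock_ops_setsockopt() can be
sketched as a pure function. This is a userspace model under stated
assumptions: enum op_model, LEVEL_TCP_MODEL, OPT_NODELAY_MODEL and
setsockopt_guard() are hypothetical names for illustration; only the
-EOPNOTSUPP return value and the shape of the condition are taken from
the patch.

```c
#include <errno.h>

/* Illustrative callback identifiers, not the kernel's BPF_SOCK_OPS_* values. */
enum op_model {
	OP_HDR_OPT_LEN,
	OP_WRITE_HDR_OPT,
	OP_PASSIVE_ESTABLISHED,
};

/* Illustrative stand-ins for SOL_TCP and TCP_NODELAY. */
#define LEVEL_TCP_MODEL		6
#define OPT_NODELAY_MODEL	1

/*
 * Return -EOPNOTSUPP when the NODELAY option is requested from one of the
 * two header option callbacks, 0 when the request may proceed.
 */
static int setsockopt_guard(enum op_model op, int level, int optname)
{
	if ((op == OP_HDR_OPT_LEN || op == OP_WRITE_HDR_OPT) &&
	    level == LEVEL_TCP_MODEL && optname == OPT_NODELAY_MODEL)
		return -EOPNOTSUPP;

	return 0;
}
```

Only the combination of callback, level and option name is rejected, so
other options in the same callbacks, and the same option in other
callbacks, are unaffected by the guard.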


* [PATCH bpf-next v4 2/4] bpf: Reject TCP_NODELAY in bpf-tcp-cc
From: KaFai Wan @ 2026-04-21 15:58 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, memxor,
	song, yonghong.song, jolsa, sdf, davem, edumazet, kuba, pabeni,
	horms, dsahern, shuah, ihor.solodrai, kafai.wan, jiayuan.chen,
	hoyeon.lee, ameryhung, bpf, linux-kernel, netdev, linux-kselftest
In-Reply-To: <20260421155804.135786-1-kafai.wan@linux.dev>

A BPF TCP congestion control program can call bpf_setsockopt() from
its callbacks. In current kernels, if it calls
bpf_setsockopt(TCP_NODELAY) from cwnd_event_tx_start(), the call can
re-enter the TCP transmit path before the outer tcp_transmit_skb()
has completed and advanced the send head.

This can re-trigger CA_EVENT_TX_START and lead to unbounded recursion:

  tcp_transmit_skb()
    -> tcp_event_data_sent()
      -> tcp_ca_event(sk, CA_EVENT_TX_START)
        -> cwnd_event_tx_start()
          -> bpf_setsockopt(TCP_NODELAY)
            -> tcp_push_pending_frames()
              -> tcp_write_xmit()
                -> tcp_transmit_skb()

This leads to unbounded recursion and can overflow the kernel stack.

Reject TCP_NODELAY with -EOPNOTSUPP for bpf-tcp-cc by introducing
a dedicated setsockopt proto for BPF_PROG_TYPE_STRUCT_OPS TCP
congestion control programs.

Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
Suggested-by: Martin KaFai Lau <martin.lau@linux.dev>
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
---
 include/linux/bpf.h   |  1 +
 net/core/filter.c     | 24 ++++++++++++++++++++++++
 net/ipv4/bpf_tcp_ca.c |  2 +-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3cb6b9e70080..cf75da8a12bd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3725,6 +3725,7 @@ extern const struct bpf_func_proto bpf_for_each_map_elem_proto;
 extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;
 extern const struct bpf_func_proto bpf_sk_setsockopt_proto;
 extern const struct bpf_func_proto bpf_sk_getsockopt_proto;
+extern const struct bpf_func_proto bpf_sk_setsockopt_nodelay_proto;
 extern const struct bpf_func_proto bpf_unlocked_sk_setsockopt_proto;
 extern const struct bpf_func_proto bpf_unlocked_sk_getsockopt_proto;
 extern const struct bpf_func_proto bpf_find_vma_proto;
diff --git a/net/core/filter.c b/net/core/filter.c
index 96849f4c1fbc..1140f4b55ab5 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5688,6 +5688,30 @@ const struct bpf_func_proto bpf_sk_getsockopt_proto = {
 	.arg5_type	= ARG_CONST_SIZE,
 };
 
+BPF_CALL_5(bpf_sk_setsockopt_nodelay, struct sock *, sk, int, level,
+	   int, optname, char *, optval, int, optlen)
+{
+	/*
+	 * TCP_NODELAY triggers tcp_push_pending_frames() and re-enters
+	 * CA_EVENT_TX_START in bpf_tcp_cc, reject it in all bpf_tcp_cc.
+	 */
+	if (level == SOL_TCP && optname == TCP_NODELAY)
+		return -EOPNOTSUPP;
+
+	return _bpf_setsockopt(sk, level, optname, optval, optlen);
+}
+
+const struct bpf_func_proto bpf_sk_setsockopt_nodelay_proto = {
+	.func		= bpf_sk_setsockopt_nodelay,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_BTF_ID_SOCK_COMMON,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_MEM | MEM_RDONLY,
+	.arg5_type	= ARG_CONST_SIZE,
+};
+
 BPF_CALL_5(bpf_unlocked_sk_setsockopt, struct sock *, sk, int, level,
 	   int, optname, char *, optval, int, optlen)
 {
diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index 008edc7f6688..791e15063237 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -168,7 +168,7 @@ bpf_tcp_ca_get_func_proto(enum bpf_func_id func_id,
 		 */
 		if (prog_ops_moff(prog) !=
 		    offsetof(struct tcp_congestion_ops, release))
-			return &bpf_sk_setsockopt_proto;
+			return &bpf_sk_setsockopt_nodelay_proto;
 		return NULL;
 	case BPF_FUNC_getsockopt:
 		/* Since get/setsockopt is usually expected to
-- 
2.43.0



* [PATCH bpf-next v4 3/4] selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks
From: KaFai Wan @ 2026-04-21 15:58 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, memxor,
	song, yonghong.song, jolsa, sdf, davem, edumazet, kuba, pabeni,
	horms, dsahern, shuah, ihor.solodrai, kafai.wan, jiayuan.chen,
	hoyeon.lee, ameryhung, bpf, linux-kernel, netdev, linux-kselftest
In-Reply-To: <20260421155804.135786-1-kafai.wan@linux.dev>

Add a sockops selftest for the TCP_NODELAY restriction in
BPF_SOCK_OPS_HDR_OPT_LEN_CB and BPF_SOCK_OPS_WRITE_HDR_OPT_CB.

With BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG enabled,
bpf_setsockopt(TCP_NODELAY) returns -EOPNOTSUPP from
BPF_SOCK_OPS_HDR_OPT_LEN_CB and BPF_SOCK_OPS_WRITE_HDR_OPT_CB, avoiding
unbounded recursion and kernel stack overflow.

Other cases continue to work as before, including
BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB.

Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 .../selftests/bpf/prog_tests/tcp_hdr_options.c    |  6 ++++++
 .../bpf/progs/test_misc_tcp_hdr_options.c         | 15 ++++++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
index 56685fc03c7e..21632e0946c5 100644
--- a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
+++ b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
@@ -507,6 +507,12 @@ static void misc(void)
 
 	ASSERT_EQ(misc_skel->bss->nr_hwtstamp, 0, "nr_hwtstamp");
 
+	ASSERT_TRUE(misc_skel->data->nodelay_est_ok, "nodelay_est_ok");
+
+	ASSERT_TRUE(misc_skel->data->nodelay_hdr_len_reject, "nodelay_hdr_len_reject");
+
+	ASSERT_TRUE(misc_skel->data->nodelay_write_hdr_reject, "nodelay_write_hdr_reject");
+
 check_linum:
 	ASSERT_FALSE(check_error_linum(&sk_fds), "check_error_linum");
 	sk_fds_close(&sk_fds);
diff --git a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
index d487153a839d..e77ec6791092 100644
--- a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
+++ b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
@@ -29,6 +29,10 @@ unsigned int nr_syn = 0;
 unsigned int nr_fin = 0;
 unsigned int nr_hwtstamp = 0;
 
+bool nodelay_est_ok = true;
+bool nodelay_hdr_len_reject = true;
+bool nodelay_write_hdr_reject = true;
+
 /* Check the header received from the active side */
 static int __check_active_hdr_in(struct bpf_sock_ops *skops, bool check_syn)
 {
@@ -300,7 +304,7 @@ static int handle_passive_estab(struct bpf_sock_ops *skops)
 SEC("sockops")
 int misc_estab(struct bpf_sock_ops *skops)
 {
-	int true_val = 1;
+	int true_val = 1, false_val = 0, ret;
 
 	switch (skops->op) {
 	case BPF_SOCK_OPS_TCP_LISTEN_CB:
@@ -316,10 +320,19 @@ int misc_estab(struct bpf_sock_ops *skops)
 	case BPF_SOCK_OPS_PARSE_HDR_OPT_CB:
 		return handle_parse_hdr(skops);
 	case BPF_SOCK_OPS_HDR_OPT_LEN_CB:
+		ret = bpf_setsockopt(skops, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
+		nodelay_hdr_len_reject &= ret == -EOPNOTSUPP;
+
 		return handle_hdr_opt_len(skops);
 	case BPF_SOCK_OPS_WRITE_HDR_OPT_CB:
+		ret = bpf_setsockopt(skops, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
+		nodelay_write_hdr_reject &= ret == -EOPNOTSUPP;
+
 		return handle_write_hdr_opt(skops);
 	case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
+		ret = bpf_setsockopt(skops, SOL_TCP, TCP_NODELAY, &false_val, sizeof(false_val));
+		nodelay_est_ok &= ret == 0;
+
 		return handle_passive_estab(skops);
 	}
 
-- 
2.43.0



* [PATCH bpf-next v4 4/4] selftests/bpf: Verify bpf-tcp-cc rejects TCP_NODELAY
From: KaFai Wan @ 2026-04-21 15:58 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, memxor,
	song, yonghong.song, jolsa, sdf, davem, edumazet, kuba, pabeni,
	horms, dsahern, shuah, ihor.solodrai, kafai.wan, jiayuan.chen,
	hoyeon.lee, ameryhung, bpf, linux-kernel, netdev, linux-kselftest
In-Reply-To: <20260421155804.135786-1-kafai.wan@linux.dev>

Add a bpf_tcp_ca selftest for the TCP_NODELAY restriction in
bpf-tcp-cc.

Update bpf_cubic to exercise init() and cwnd_event_tx_start(),
and check that both callbacks reject bpf_setsockopt(TCP_NODELAY)
with -EOPNOTSUPP.

Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
---
 tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c |  4 ++++
 tools/testing/selftests/bpf/progs/bpf_cubic.c       | 12 ++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
index f829b6f09bc9..4f632aa3a79e 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
@@ -112,6 +112,10 @@ static void test_cubic(void)
 
 	ASSERT_EQ(cubic_skel->bss->bpf_cubic_acked_called, 1, "pkts_acked called");
 
+	ASSERT_TRUE(cubic_skel->data->nodelay_init_reject, "init reject nodelay option");
+	ASSERT_TRUE(cubic_skel->data->nodelay_cwnd_event_tx_start_reject,
+		    "cwnd_event_tx_start reject nodelay option");
+
 	bpf_link__destroy(link);
 	bpf_cubic__destroy(cubic_skel);
 }
diff --git a/tools/testing/selftests/bpf/progs/bpf_cubic.c b/tools/testing/selftests/bpf/progs/bpf_cubic.c
index ce18a4db813f..b941ab3ebad5 100644
--- a/tools/testing/selftests/bpf/progs/bpf_cubic.c
+++ b/tools/testing/selftests/bpf/progs/bpf_cubic.c
@@ -16,6 +16,7 @@
 
 #include "bpf_tracing_net.h"
 #include <bpf/bpf_tracing.h>
+#include <errno.h>
 
 char _license[] SEC("license") = "GPL";
 
@@ -170,10 +171,17 @@ static void bictcp_hystart_reset(struct sock *sk)
 	ca->sample_cnt = 0;
 }
 
+bool nodelay_init_reject = true;
+bool nodelay_cwnd_event_tx_start_reject = true;
+
 SEC("struct_ops")
 void BPF_PROG(bpf_cubic_init, struct sock *sk)
 {
 	struct bpf_bictcp *ca = inet_csk_ca(sk);
+	int true_val = 1, ret;
+
+	ret = bpf_setsockopt(sk, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
+	nodelay_init_reject &= ret == -EOPNOTSUPP;
 
 	bictcp_reset(ca);
 
@@ -189,8 +197,12 @@ void BPF_PROG(bpf_cubic_cwnd_event_tx_start, struct sock *sk)
 {
 	struct bpf_bictcp *ca = inet_csk_ca(sk);
 	__u32 now = tcp_jiffies32;
+	int true_val = 1, ret;
 	__s32 delta;
 
+	ret = bpf_setsockopt(sk, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
+	nodelay_cwnd_event_tx_start_reject &= ret == -EOPNOTSUPP;
+
 	delta = now - tcp_sk(sk)->lsndtime;
 
 	/* We were application limited (idle) for a while.
-- 
2.43.0



* Re: [PATCH iproute2] ss: fix vsock port filter
From: Luigi Leonardi @ 2026-04-21 16:03 UTC (permalink / raw)
  To: Stefano Garzarella; +Cc: stefanha, netdev
In-Reply-To: <aeeEXS0n3VpM791Y@sgarzare-redhat>

On Tue, Apr 21, 2026 at 04:07:41PM +0200, Stefano Garzarella wrote:
>On Tue, Apr 21, 2026 at 02:35:12PM +0200, Luigi Leonardi wrote:
>>parse_hostcond() uses get_u32() to parse the vsock port into the
>>aafilter.port field, which is a long. On 64-bit systems, get_u32()
>>only writes the lower 32 bits, leaving the upper 32 bits set from
>>the -1 initialization. This causes the port comparison
>>"a->port != s->rport" in run_ssfilter() to always fail, since the
>>corrupted long value never matches the int rport.
>>
>>Fix by using get_long() instead, consistent with how AF_PACKET and
>>AF_NETLINK handle the same field.
>>
>>Fixes: c759116a0b2b ("ss: add AF_VSOCK support")
>
>Could this be more related to commit 012cb515 ("ss: change aafilter port
>from int to long (inode support)")?
>
>I don't know this code at all, just asking.
>
>Stefano

oh yes, you are right!

Luigi
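
The failure mode from the quoted commit message can be reproduced
outside of ss. A minimal sketch, assuming the 32-bit value is stored
through a pointer to the long field: store_u32(), parse_port_u32(),
parse_port_long() and port_matches() are hypothetical helpers for
illustration, not iproute2 functions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 32-bit parser: writes only 4 bytes at dst. */
static void store_u32(void *dst, uint32_t val)
{
	memcpy(dst, &val, sizeof(val));
}

/* Buggy variant: 32-bit store into a long that was preset to -1. */
static long parse_port_u32(uint32_t port)
{
	long field = -1;

	store_u32(&field, port);
	return field;
}

/* Fixed variant: the whole long is overwritten. */
static long parse_port_long(long port)
{
	long field = -1;

	field = port;
	return field;
}

/* Models the "a->port != s->rport" test: returns 1 when the ports match. */
static int port_matches(long filter_port, int rport)
{
	return filter_port == rport;
}
```

Where long is 64 bits wide, parse_port_u32(1234) leaves half of the
field at its all-ones initial value, so port_matches() against 1234
returns 0. parse_port_long(1234) matches as expected.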



* Re: [PATCH 0/3] mptcp: add RECVERR and MSG_ERRQUEUE support
From: Matthieu Baerts @ 2026-04-21 16:07 UTC (permalink / raw)
  To: David Carlier, netdev, mptcp
  Cc: martineau, geliang, davem, edumazet, kuba, pabeni, horms
In-Reply-To: <20260421152216.38127-1-devnexen@gmail.com>

Hi David,

On 21/04/2026 17:22, David Carlier wrote:
> MPTCP already advertises IP_RECVERR/IPV6_RECVERR as supported, but the
> parent socket does not currently provide usable MSG_ERRQUEUE handling.
> 
> This series wires the MPTCP socket up to the IPv4/IPv6 error queue
> paths. It propagates RECVERR-related sockopts to existing and future
> subflows, makes poll() report pending errqueue activity through the
> parent socket, and allows recvmsg(MSG_ERRQUEUE) on the MPTCP socket to
> consume queued errors with the parent socket ABI.
> 
> The series also handles mixed-family subflows by applying the matching
> sockopt according to each subflow family, and avoids silently losing an
> error skb if requeueing to the parent socket fails under rmem pressure.
Thank you for this series!

Even if I agree it would be good to have full MSG_ERRQUEUE support,
net-next is currently closed, and only bug fixes are accepted, see:

  https://docs.kernel.org/process/maintainer-netdev.html

pw-bot: defer

Instead, I suggest switching the discussions only to the MPTCP ML if
that's OK. If the CI is happy, someone will try to review it over there,
when time permits. If not, please send the new versions only to the
MPTCP ML, with the 'PATCH mptcp-next' prefix, and ideally on top of the
'export' (or 'for-review') branch of our tree. For more details:

  https://www.mptcp.dev/contributing.html#kernel-development

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.



* [PATCH net v2 1/1] net/sched: cls_flower: avoid stale mask references after delete
From: Ren Wei @ 2026-04-21 16:03 UTC (permalink / raw)
  To: netdev
  Cc: jhs, jiri, davem, edumazet, kuba, pabeni, horms, sbrivio, vladbu,
	yuantan098, yifanwucs, tomapufckgml, bird, kanolyc, z1652074432,
	n05ec

From: Yuhang Zheng <z1652074432@gmail.com>

cls_flower keeps filter and mask state separately. After a filter is
removed or replaced, some paths can still need the mask data associated
with that filter.

Cache the mask key and dissector in struct cls_fl_filter when the mask
is assigned, and use the cached copies in dump and offload paths. This
avoids depending on the external mask object's lifetime after delete or
replace.

Fixes: 92149190067d ("net: sched: flower: set unlocked flag for flower proto ops")
Cc: stable@kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Tested-by: Yucheng Lu <kanolyc@gmail.com>
Signed-off-by: Yuhang Zheng <z1652074432@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
---

changes in v2:
- target the net tree instead of nf
- fix the Fixes tag to the first triggerable introduction
- correct the Yuan Tan Reported-by address and add the forwarder sign-off
- v1 Link: https://lore.kernel.org/all/0fdcae6ac3e07afbbd43958f6b42e2ed6281e3d2.1773559972.git.z1652074432@gmail.com

 net/sched/cls_flower.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 099ff6a3e1f5..c1f10b4ec748 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -124,8 +124,10 @@ struct cls_fl_head {
 
 struct cls_fl_filter {
 	struct fl_flow_mask *mask;
+	struct flow_dissector mask_dissector;
 	struct rhash_head ht_node;
 	struct fl_flow_key mkey;
+	struct fl_flow_key mask_key;
 	struct tcf_exts exts;
 	struct tcf_result res;
 	struct fl_flow_key key;
@@ -445,6 +447,12 @@ static void fl_destroy_filter_work(struct work_struct *work)
 	__fl_destroy_filter(f);
 }
 
+static void fl_filter_copy_mask(struct cls_fl_filter *f)
+{
+	f->mask_key = f->mask->key;
+	f->mask_dissector = f->mask->dissector;
+}
+
 static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f,
 				 bool rtnl_held, struct netlink_ext_ack *extack)
 {
@@ -476,8 +484,8 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 	tc_cls_common_offload_init(&cls_flower.common, tp, f->flags, extack);
 	cls_flower.command = FLOW_CLS_REPLACE;
 	cls_flower.cookie = (unsigned long) f;
-	cls_flower.rule->match.dissector = &f->mask->dissector;
-	cls_flower.rule->match.mask = &f->mask->key;
+	cls_flower.rule->match.dissector = &f->mask_dissector;
+	cls_flower.rule->match.mask = &f->mask_key;
 	cls_flower.rule->match.key = &f->mkey;
 	cls_flower.classid = f->res.classid;
 
@@ -2489,6 +2497,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 	err = fl_check_assign_mask(head, fnew, fold, mask);
 	if (err)
 		goto unbind_filter;
+	fl_filter_copy_mask(fnew);
 
 	err = fl_ht_insert_unique(fnew, fold, &in_ht);
 	if (err)
@@ -2705,8 +2714,8 @@ static int fl_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 		cls_flower.command = add ?
 			FLOW_CLS_REPLACE : FLOW_CLS_DESTROY;
 		cls_flower.cookie = (unsigned long)f;
-		cls_flower.rule->match.dissector = &f->mask->dissector;
-		cls_flower.rule->match.mask = &f->mask->key;
+		cls_flower.rule->match.dissector = &f->mask_dissector;
+		cls_flower.rule->match.mask = &f->mask_key;
 		cls_flower.rule->match.key = &f->mkey;
 
 		err = tc_setup_offload_action(&cls_flower.rule->action, &f->exts,
@@ -3709,7 +3718,7 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, void *fh,
 		goto nla_put_failure_locked;
 
 	key = &f->key;
-	mask = &f->mask->key;
+	mask = &f->mask_key;
 	skip_hw = tc_skip_hw(f->flags);
 
 	if (fl_dump_key(skb, net, key, mask))
-- 
2.43.0
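
The idea behind fl_filter_copy_mask() is to snapshot the mask data into
the filter so that later readers do not depend on the shared mask
object. A minimal userspace sketch, with the key and dissector reduced
to plain integers: struct mask_model, struct filter_model and both
helpers are hypothetical names, not the cls_flower definitions.

```c
/* Hypothetical stand-in for the shared mask object. */
struct mask_model {
	unsigned int key;
	unsigned int dissector;
};

/* Hypothetical stand-in for a filter that caches its mask data. */
struct filter_model {
	struct mask_model *mask;	/* shared object, lifetime managed elsewhere */
	unsigned int mask_key;		/* cached copy, owned by the filter */
	unsigned int mask_dissector;	/* cached copy, owned by the filter */
};

/* Snapshot the mask into the filter at assignment time. */
static void filter_copy_mask(struct filter_model *f)
{
	f->mask_key = f->mask->key;
	f->mask_dissector = f->mask->dissector;
}

/* Reader that touches only the cached copy, never f->mask. */
static unsigned int filter_dump_mask_key(const struct filter_model *f)
{
	return f->mask_key;
}
```

Once filter_copy_mask() has run, filter_dump_mask_key() keeps returning
the snapshot even if the shared mask is later changed or released. The
cost is one extra key and dissector per filter.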



* Re: [PATCH 18/23] cpu/hotplug: Add a new cpuhp_offline_cb() API
From: Thomas Gleixner @ 2026-04-21 16:17 UTC (permalink / raw)
  To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Catalin Marinas, Will Deacon,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Guenter Roeck, Frederic Weisbecker, Paul E. McKenney,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Anna-Maria Behnsen, Ingo Molnar,
	Chen Ridong, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: cgroups, linux-doc, linux-kernel, linux-arm-kernel, linux-hyperv,
	linux-hwmon, rcu, netdev, linux-kselftest, Costa Shulyupin,
	Qiliang Yuan, Waiman Long
In-Reply-To: <20260421030351.281436-19-longman@redhat.com>

On Mon, Apr 20 2026 at 23:03, Waiman Long wrote:
> Add a new cpuhp_offline_cb() API that allows us to offline a set of
> CPUs one-by-one, run the given callback function and then bring those
> CPUs back online again while inhibiting any concurrent CPU hotplug
> operations from happening.

Please provide a properly structured change log which explains the
context, the problem and the solution in separate paragraphs and this
order. This is not new. It's documented...

> This new API can be used to enable runtime adjustment of nohz_full and
> isolcpus boot command line options. A new cpuhp_offline_cb_mode flag
> is also added to signal that the system is in this offline callback
> transient state so that some hotplug operations can be optimized out
> if we choose to.

We chose nothing.

> +#include <linux/cpumask_types.h>

What for? This header only needs a 'struct cpumask' forward declaration
so that the compiler can handle the pointer argument, no?

> +typedef int (*cpuhp_cb_t)(void *arg);

You couldn't come up with a more generic name for this, right?

>  struct device;
>  
>  extern int lockdep_is_cpus_held(void);
> @@ -29,6 +31,8 @@ void clear_tasks_mm_cpumask(int cpu);
>  int remove_cpu(unsigned int cpu);
>  int cpu_device_down(struct device *dev);
>  void smp_shutdown_nonboot_cpus(unsigned int primary_cpu);
> +int cpuhp_offline_cb(struct cpumask *mask, cpuhp_cb_t func, void *arg);

Ditto.

> +extern bool cpuhp_offline_cb_mode;

Groan. The only users are in the cpusets code which invokes this muck
and should therefore know what's going on, no?

>  #else /* CONFIG_HOTPLUG_CPU */
>  
> @@ -43,6 +47,11 @@ static inline void cpu_hotplug_disable(void) { }
>  static inline void cpu_hotplug_enable(void) { }
>  static inline int remove_cpu(unsigned int cpu) { return -EPERM; }
>  static inline void smp_shutdown_nonboot_cpus(unsigned int primary_cpu) { }
> +static inline int cpuhp_offline_cb(struct cpumask *mask, cpuhp_cb_t func, void *arg)
> +{
> +	return -EPERM;

-EPERM?

> +/**
> + * cpuhp_offline_cb - offline CPUs, invoke callback function & online CPUs afterward
> + * @mask: A mask of CPUs to be taken offline and then online
> + * @func: A callback function to be invoked while the given CPUs are offline
> + * @arg:  Argument to be passed back to the callback function
> + *
> + * Return: 0 if successful, an error code otherwise
> + */
> +int cpuhp_offline_cb(struct cpumask *mask, cpuhp_cb_t func, void *arg)
> +{
> +	int off_cpu, on_cpu, ret, ret2 = 0;
> +
> +	if (WARN_ON_ONCE(cpumask_empty(mask) ||
> +	   !cpumask_subset(mask, cpu_online_mask)))
> +		return -EINVAL;

No line break required. You have 100 characters.

But what's worse is that the access to cpu_online_mask is not protected
against a concurrent CPU hotplug operation.

> +
> +	pr_debug("%s: begin (CPU list = %*pbl)\n", __func__, cpumask_pr_args(mask));

Tracing?

> +	lock_device_hotplug();
> +	cpuhp_offline_cb_mode = true;
> +	/*
> +	 * If all offline operations succeed, off_cpu should become nr_cpu_ids.
> +	 */
> +	for_each_cpu(off_cpu, mask) {
> +		ret = device_offline(get_cpu_device(off_cpu));
> +		if (unlikely(ret))
> +			break;
> +	}
> +	if (!ret)
> +		ret = func(arg);
> +
> +	/* Bring previously offline CPUs back online */
> +	for_each_cpu(on_cpu, mask) {
> +		int retries = 0;
> +
> +		if (on_cpu == off_cpu)
> +			break;
> +
> +retry:
> +		ret2 = device_online(get_cpu_device(on_cpu));
> +
> +		/*
> +		 * With the unlikely event that CPU hotplug is disabled while
> +		 * this operation is in progress, we will need to wait a bit
> +		 * for hotplug to hopefully be re-enabled again. If not, print
> +		 * a warning and return the error.
> +		 *
> +		 * cpu_hotplug_disabled is supposed to be accessed while
> +		 * holding the cpu_add_remove_lock mutex. So we need to
> +		 * use the data_race() macro to access it here.
> +		 */
> +		while ((ret2 == -EBUSY) && data_race(cpu_hotplug_disabled) &&
> +		       (++retries <= 5)) {
> +			msleep(20);
> +			if (!data_race(cpu_hotplug_disabled))
> +				goto retry;
> +		}
> +		if (ret2) {
> +			pr_warn("%s: Failed to bring CPU %d back online!\n",
> +				__func__, on_cpu);

Provide a proper text and not this silly __func__ thing.

> +			break;
> +		}
> +	}

TBH. This is unreviewable gunk and the whole 'unlikely event that CPU
hotplug is disabled' is just a lazy hack.

All of this can be avoided including this made up callback function.

It's not rocket science to provide:

     1) A function which serializes against any other CPU hotplug
        related action.

     2) A function which brings the CPUs in a given CPU mask down

     3) A function which brings the CPUs in a given CPU mask up

     4) A function which undoes #1

Yeah I know, it's more work and not convoluted enough. But see below.

That brings me to that other hack namely cpuhp_offline_cb_mode, which
you self described as such in patch 21/23:

> +	/*
> +	 * Hack: In cpuhp_offline_cb_mode, pretend all partitions are empty
> +	 * to prevent unnecessary partition invalidation.
> +	 */
> +	if (cpuhp_offline_cb_mode)
> +		return false;
> +

We are not merging hacks. End of story. But you knew that already, no?

Let's take a step back and see what you really need to achieve:

  1) Update tick_nohz_full_mask
  2) Update the managed interrupt mask
  3) Update CPU sets

Independent of the direction of this update you need to ensure that the
affected functionality keeps working correctly.

You achieve that by bulk offlining the affected CPUs, invoking a magic
callback and then bulk onlining the affected CPUs again, which requires
that ill defined cpuhp_offline_cb_mode hackery and probably some more
hacks all over the place.

You can achieve the same by doing CPU by CPU operations in the right
order without this mode hack, when you establish proper limitations for
this:

  At no point in time is it allowed to empty a CPU set or an affected CPU
  mask, except when you completely undo the isolation of CPUs.

  That can be computed upfront w/o changing anything at all. Once the
  validity is established, the update can proceed. Or you can leave it
  to user space which can keep the pieces if it gets it wrong.

That's a reasonable limitation as there is absolutely zero justification
to support something like:

       housekeeping_cpus = [CPU 0], isolated_cpus = [CPU 1]
  ---> housekeeping_cpus = [CPU 1], isolated_cpus = [CPU 0]

just because we can with enough horrible hacks.

If you get that out of the way, then a CPU by CPU update becomes the
obvious and simplest solution. The ordering constraints can be computed
in user space upfront and there is no reason to do any of this in the
kernel itself except for an eventual validation step. It might be a tad
slower, but this is anything but a hotpath operation.
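
A validation step of that kind could look roughly like the userspace
sketch below. It is only an illustration under stated assumptions: the
masks are modeled as 64-bit words, update_order_valid() is a hypothetical
name, and the limitation is read as "neither mask may become empty during
the sequence; the isolated mask may be empty only after the last step,
i.e. when isolation is completely undone".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Check a proposed CPU-by-CPU update order (hypothetical helper).
 * Each step flips one CPU between the housekeeping and the isolated mask.
 * The order is rejected as soon as the housekeeping mask becomes empty, or
 * the isolated mask becomes empty before the final step.
 */
bool update_order_valid(uint64_t housekeeping, uint64_t isolated,
			const unsigned int *order, size_t nsteps)
{
	/* A CPU must not be in both masks at once. */
	assert((housekeeping & isolated) == 0);
	assert(order != NULL || nsteps == 0);

	for (size_t i = 0; i < nsteps; i++) {
		uint64_t bit;

		if (order[i] >= 64)
			return false;
		bit = UINT64_C(1) << order[i];

		if (housekeeping & bit) {
			/* housekeeping -> isolated */
			housekeeping &= ~bit;
			isolated |= bit;
		} else if (isolated & bit) {
			/* isolated -> housekeeping */
			isolated &= ~bit;
			housekeeping |= bit;
		} else {
			/* CPU is in neither mask */
			return false;
		}

		if (!housekeeping)
			return false;
		if (!isolated && i + 1 != nsteps)
			return false;
	}
	return true;
}
```

With this reading, the [CPU 0] <-> [CPU 1] swap is rejected for either
order, while a four CPU swap in the order 0, 2, 1, 3 starting from
housekeeping = {0,1}, isolated = {2,3} passes.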

Just for the record. I suggested exactly this more than a year ago and
it's still the right thing to do.

And of course neither your cover letter nor any of the patches give a
proper rationale why you think that your bulk hackery is better. For the
very simple reason that there is no rationale at all.

This bulk muck is doomed when your ultimate goal is to avoid the stop
machine dance. With a per CPU update it is actually doable without more
ill defined hacks all over the place.

   1) Bring down the CPU to CPUHP_AP_SCHED_WAIT_EMPTY, which is the last
      state before stop machine is invoked.

      At that point:

         - no user space thread is running on the CPU anymore

         - everything related to this CPU has been shut down or moved
           elsewhere

         - interrupt managed device queues are quiesced if the CPU was
           the last online one in the queue affinity mask. If not the
           interrupt might still be affine to the CPU, but there is at
           least one other CPU available in the mask.

   2) Update the tick NOHZ handover

      This can be done without going into stop machine by providing a
      hotplug callback right between CPUHP_AP_SMPBOOT_THREADS and
      CPUHP_AP_IRQ_AFFINITY_ONLINE.

      That's trivial enough to achieve and can work independently of
      NOHZ full.

   3) Rework the affinity management, so that interrupt affinities can
      be reassigned in the CPUHP_AP_IRQ_AFFINITY_ONLINE state.

      That needs a lot of thought, but there is no real reason why it
      can't work.

   4) Flip the housekeeping CPU masks in sched_cpu_wait_empty() after
      balance_hotplug_wait().

   5) Bring the CPU online again.

For #2 and #3 to work you need a separate CPU mask which avoids touching
the CPU online mask. For #3 this needs some more work to avoid reassigning the
interrupts once sparse_irq_lock is dropped, but the bulk is achieved
with the separate CPU mask.

No?

Thanks,

        tglx

^ permalink raw reply

* Re: [PATCH net-deletions] net: remove ax25 and amateur radio (hamradio) subsystem
From: Dan Cross @ 2026-04-21 16:17 UTC (permalink / raw)
  To: stephen
  Cc: Jakub Kicinski, davem, netdev, edumazet, pabeni, andrew+netdev,
	horms, corbet, skhan, federico.vaga, carlos.bilbao, avadhut.naik,
	alexs, si.yanteng, dzm91, 2023002089, tsbogend, dsahern,
	jani.nikula, mchehab+huawei, gregkh, jirislaby, tytso, herbert,
	ebiggers, johannes.berg, geert, pablo, tglx, mashiro.chen, mingo,
	dqfext, jreuter, sdf, pkshih, enelsonmoore, mkl, toke, kees,
	jlayton, wangliang74, aha310510, takamitz, kuniyu, linux-doc,
	linux-mips
In-Reply-To: <20260421065507.2c5e3ba7@phoenix.local>

On Tue, Apr 21, 2026 at 9:55 AM Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Mon, 20 Apr 2026 19:18:23 -0700
> Jakub Kicinski <kuba@kernel.org> wrote:
> > Remove the amateur radio (AX.25, NET/ROM, ROSE) protocol implementation
> > and all associated hamradio device drivers from the kernel tree.
> > This set of protocols has long been a huge bug/syzbot magnet,
> > and since nobody stepped up to help us deal with the influx
> > of the AI-generated bug reports we need to move it out of tree
> > to protect our sanity.
> >
> > The code is moved to an out-of-tree repo:
> > https://github.com/linux-netdev/mod-orphan
> > if it's cleaned up and reworked there we can accept it back.
>
> It would be good if these protocols could be done in userspace
> or with BPF?

Folks on linux-hams seem to be converging on a userspace
implementation.

The amateur radio protocols are more or less specific to low-speed
links; they are not particularly coupled to anything else that
requires running in the kernel, and the main coupling point (IP over
AX.25) can be implemented via TAP/TUN.

There are several popular packages that already implement AX.25 and
NET/ROM in user-space (for the interested, LinBPQ seems to be the
canonical example).  The main missing piece is ROSE, but it is likely
easier to add that to an existing package, or potentially something
brand new, than keep it in the kernel.

There's no compelling reason to keep these protocols in the kernel,
whether in-tree or out-of-tree; at least, one has not been
articulated.

        - Dan C.

^ permalink raw reply

* Re: [PATCH net] netconsole: avoid out-of-bounds access on empty string in trim_newline()
From: Simon Horman @ 2026-04-21 16:22 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Matthew Wood, netdev, linux-kernel, kernel-team,
	stable
In-Reply-To: <20260420-netcons_trim_newline-v1-1-dc35889aeedf@debian.org>

On Mon, Apr 20, 2026 at 03:18:36AM -0700, Breno Leitao wrote:
> trim_newline() unconditionally dereferences s[len - 1] after computing
> len = strnlen(s, maxlen). When the string is empty, len is 0 and the
> expression underflows to s[(size_t)-1], reading (and potentially
> writing) one byte before the buffer.
> 
> The two callers feed trim_newline() with the result of strscpy() from
> configfs store callbacks (dev_name_store, userdatum_value_store).
> configfs guarantees count >= 1 reaches the callback, but the byte
> itself can be NUL: a userspace write(fd, "\0", 1) leaves the
> destination empty after strscpy() and triggers the underflow. The OOB
> write only fires if the adjacent byte happens to be '\n', so this is
> not a security issue, but the access is undefined behaviour either way.
> 
> This pattern is commonly flagged by LLM-based code reviewers. While it
> is not a security fix, the underlying access is undefined behaviour and
> the change is small and self-contained, so it is a reasonable candidate
> for the stable trees.
> 
> Guard the dereference on a non-zero length.
> 
> Fixes: ae001dc67907 ("net: netconsole: move newline trimming to function")
> Cc: stable@vger.kernel.org
> Signed-off-by: Breno Leitao <leitao@debian.org>

Reviewed-by: Simon Horman <horms@kernel.org>

Sashiko has provided some feedback on this patch.
I do not believe that should hold up progress of this patch.
But I'd appreciate it if you could look over that feedback
and see if any follow-up is warranted.
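
For reference, the guard described in the commit message can be modeled
in userspace roughly as follows. This is a sketch, not the netconsole
source: it assumes the helper takes a buffer and a maximum length as the
commit message describes, and it uses memchr() as a portable stand-in for
strnlen().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace model of the guarded helper; not the kernel implementation. */
void trim_newline(char *s, size_t maxlen)
{
	const char *nul;
	size_t len;

	assert(s != NULL);

	/* Portable equivalent of len = strnlen(s, maxlen). */
	nul = memchr(s, '\0', maxlen);
	len = nul ? (size_t)(nul - s) : maxlen;

	/* Only look at s[len - 1] when there is at least one character. */
	if (len > 0 && s[len - 1] == '\n')
		s[len - 1] = '\0';
}
```

An empty buffer is left untouched, and a trailing newline is still
stripped from a non-empty one.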

Thanks!

^ permalink raw reply

* Re: [PATCH bpf v2] bpf: Fix NULL pointer dereference in bpf_sk_storage_clone and diag paths
From: bot+bpf-ci @ 2026-04-21 16:24 UTC (permalink / raw)
  To: bestswngs, davem, edumazet, kuba, pabeni
  Cc: horms, martin.lau, ast, ameryhung, leon.hwang, kees, wangfushuai,
	menglong8.dong, netdev, bpf, xmei5, bestswngs, ast, andrii,
	daniel, martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260421155024.997077-2-bestswngs@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1555 bytes --]

> diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
> index 14eb7812bda4..b2cd2b0292cb 100644
> --- a/net/core/bpf_sk_storage.c
> +++ b/net/core/bpf_sk_storage.c

[ ... ]

> @@ -596,6 +596,8 @@ static int bpf_sk_storage_diag_put_all(struct sock *sk, struct sk_buff *skb,
>  	saved_len = skb->len;
>  	hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) {
>  		smap = rcu_dereference(SDATA(selem)->smap);
> +		if (!smap)
> +			continue;
>  		diag_size += nla_value_size(smap->map.value_size);
>
>  		if (nla_stgs && diag_get(SDATA(selem), skb))

Martin KaFai Lau pointed out in the v1 review that the NULL check for
smap doesn't fully address the race condition. The issue is that
diag_get() will re-dereference sdata->smap independently:

net/core/bpf_sk_storage.c:diag_get() {
    ...
    smap = rcu_dereference(sdata->smap);
    ...
}

Between the NULL check above and the call to diag_get(), smap could
become NULL again, leading to the same race condition.

Martin suggested passing smap as a parameter to diag_get() to avoid
the second dereference. This would ensure diag_get() uses the same
smap pointer that was already checked.

Could the fix be updated to pass smap to diag_get() as Martin
suggested?

Reference: https://lore.kernel.org/bpf/2026420182243.zAiN.martin.lau@linux.dev/


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24732581685

^ permalink raw reply

* Re: [PATCH] netfilter: xt_HL: add pr_fmt, default case and NULL checks
From: kernel test robot @ 2026-04-21 16:24 UTC (permalink / raw)
  To: Marino Dzalto, pablo, fw
  Cc: llvm, oe-kbuild-all, netfilter-devel, coreteam, netdev,
	linux-kernel, Marino Dzalto
In-Reply-To: <20260403193929.89449-1-marino.dzalto@gmail.com>

Hi Marino,

kernel test robot noticed the following build errors:

[auto build test ERROR on netfilter-nf/main]
[also build test ERROR on linus/master v7.0 next-20260420]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Marino-Dzalto/netfilter-xt_HL-add-pr_fmt-default-case-and-NULL-checks/20260420-185652
base:   https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git main
patch link:    https://lore.kernel.org/r/20260403193929.89449-1-marino.dzalto%40gmail.com
patch subject: [PATCH] netfilter: xt_HL: add pr_fmt, default case and NULL checks
config: s390-allmodconfig (https://download.01.org/0day-ci/archive/20260422/202604220024.iITt6Hv8-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260422/202604220024.iITt6Hv8-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202604220024.iITt6Hv8-lkp@intel.com/

All errors (new ones prefixed by >>):

>> net/netfilter/xt_hl.c:34:6: error: cannot assign to variable 'ttl' with const-qualified type 'const u8' (aka 'const unsigned char')
      34 |         ttl = ip_hdr(skb)->ttl;
         |         ~~~ ^
   net/netfilter/xt_hl.c:29:11: note: variable 'ttl' declared const here
      29 |         const u8 ttl;
         |         ~~~~~~~~~^~~
   1 error generated.
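
One possible shape of a fix, sketched in userspace: a const-qualified
local has to be initialized at its declaration, so if the value can only
be read after the NULL check, the qualifier has to go. The struct and
enum names below are hypothetical stand-ins, not the netfilter types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the IP header and the match info. */
struct hdr_model { uint8_t ttl; };
enum ttl_mode { TTL_EQ, TTL_NE, TTL_LT, TTL_GT };
struct ttl_info_model { enum ttl_mode mode; uint8_t ttl; };

bool ttl_match_model(const struct hdr_model *hdr,
		     const struct ttl_info_model *info)
{
	uint8_t ttl;	/* not const: it is assigned after the NULL check */

	assert(info != NULL);
	if (!hdr)
		return false;

	ttl = hdr->ttl;

	switch (info->mode) {
	case TTL_EQ:
		return ttl == info->ttl;
	case TTL_NE:
		return ttl != info->ttl;
	case TTL_LT:
		return ttl < info->ttl;
	case TTL_GT:
		return ttl > info->ttl;
	default:
		return false;
	}
}
```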


vim +34 net/netfilter/xt_hl.c

    25	
    26	static bool ttl_mt(const struct sk_buff *skb, struct xt_action_param *par)
    27	{
    28		const struct ipt_ttl_info *info = par->matchinfo;
    29		const u8 ttl;
    30	
    31		if (!skb)
    32			return false;
    33	
  > 34		ttl = ip_hdr(skb)->ttl;
    35	
    36		switch (info->mode) {
    37		case IPT_TTL_EQ:
    38			return ttl == info->ttl;
    39		case IPT_TTL_NE:
    40			return ttl != info->ttl;
    41		case IPT_TTL_LT:
    42			return ttl < info->ttl;
    43		case IPT_TTL_GT:
    44			return ttl > info->ttl;
    45		default:
    46			pr_warn("Unknown TTL match mode: %d\n", info->mode);
    47			return false;
    48		}
    49	}
    50	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* Re: [PATCH bpf v2] bpf: Fix NULL pointer dereference in bpf_sk_storage_clone and diag paths
From: Amery Hung @ 2026-04-21 16:29 UTC (permalink / raw)
  To: bot+bpf-ci
  Cc: bestswngs, davem, edumazet, kuba, pabeni, horms, martin.lau, ast,
	leon.hwang, kees, wangfushuai, menglong8.dong, netdev, bpf, xmei5,
	andrii, daniel, eddyz87, yonghong.song, clm, ihor.solodrai
In-Reply-To: <cc500964d83d95307126b05cc7557309b498bafce0c514fa3f1656dd6413ce71@mail.kernel.org>

On Tue, Apr 21, 2026 at 9:24 AM <bot+bpf-ci@kernel.org> wrote:
>
> > diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
> > index 14eb7812bda4..b2cd2b0292cb 100644
> > --- a/net/core/bpf_sk_storage.c
> > +++ b/net/core/bpf_sk_storage.c
>
> [ ... ]
>
> > @@ -596,6 +596,8 @@ static int bpf_sk_storage_diag_put_all(struct sock *sk, struct sk_buff *skb,
> >       saved_len = skb->len;
> >       hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) {
> >               smap = rcu_dereference(SDATA(selem)->smap);
> > +             if (!smap)
> > +                     continue;
> >               diag_size += nla_value_size(smap->map.value_size);
> >
> >               if (nla_stgs && diag_get(SDATA(selem), skb))
>
> Martin KaFai Lau pointed out in the v1 review that the NULL check for
> smap doesn't fully address the race condition. The issue is that
> diag_get() will re-dereference sdata->smap independently:
>
> net/core/bpf_sk_storage.c:diag_get() {
>     ...
>     smap = rcu_dereference(sdata->smap);
>     ...
> }
>
> Between the NULL check above and the call to diag_get(), smap could
> become NULL again, leading to the same race condition.
>
> Martin suggested passing smap as a parameter to diag_get() to avoid
> the second dereference. This would ensure diag_get() uses the same
> smap pointer that was already checked.
>
> Could the fix be updated to pass smap to diag_get() as Martin
> suggested?
>
> Reference: https://lore.kernel.org/bpf/2026420182243.zAiN.martin.lau@linux.dev/
>

CI is right. Please pass smap to diag_get() as Martin suggested. It
will be the caller's responsibility to do rcu_dereference() and
null-check it. Note that there is more than one caller of diag_get().
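
A rough userspace model of that calling convention, to make the point
concrete. All names here are hypothetical, C11 atomics stand in for
rcu_dereference(), and object lifetime (which RCU provides in the kernel)
is not modeled at all:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct smap_model {
	unsigned int value_size;
};

struct sdata_model {
	/* May be cleared by another thread at any time. */
	_Atomic(struct smap_model *) smap;
};

/* The callee uses the pointer the caller already loaded and checked. */
unsigned int diag_get_model(const struct smap_model *smap)
{
	assert(smap != NULL);
	return smap->value_size;
}

/*
 * The caller loads once, checks once, and passes the same pointer down.
 * Returns 0 when the element has no map and is skipped.
 */
unsigned int put_one_model(struct sdata_model *sdata)
{
	struct smap_model *smap = atomic_load(&sdata->smap);

	if (!smap)
		return 0;
	return diag_get_model(smap);
}
```

Since the callee never reloads sdata->smap, a concurrent store of NULL
after the caller's check cannot be observed by it.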

>
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
>
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24732581685

^ permalink raw reply

* Re: [PATCH net v1] net: validate skb->napi_id in RX tracepoints
From: Simon Horman @ 2026-04-21 16:33 UTC (permalink / raw)
  To: Kohei Enju
  Cc: netdev, linux-trace-kernel, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers
In-Reply-To: <20260420105427.162816-1-kohei@enjuk.jp>

On Mon, Apr 20, 2026 at 10:54:23AM +0000, Kohei Enju wrote:
> Since commit 2bd82484bb4c ("xps: fix xps for stacked devices"),
> skb->napi_id shares storage with sender_cpu. RX tracepoints using
> net_dev_rx_verbose_template read skb->napi_id directly and can therefore
> report sender_cpu values as if they were NAPI IDs.
> 
> For example, on the loopback path this can report 1 as napi_id, where 1
> comes from raw_smp_processor_id() + 1 in the XPS path:
> 
>   # bpftrace -e 'tracepoint:net:netif_rx_entry{ print(args->napi_id); }'
>   # taskset -c 0 ping -c 1 ::1
> 
> Report only valid NAPI IDs in these tracepoints and use 0 otherwise.
> 
> Fixes: 2bd82484bb4c ("xps: fix xps for stacked devices")
> Signed-off-by: Kohei Enju <kohei@enjuk.jp>

Reviewed-by: Simon Horman <horms@kernel.org>

> ---
>  include/trace/events/net.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/include/trace/events/net.h b/include/trace/events/net.h
> index fdd9ad474ce3..dbc2c5598e35 100644
> --- a/include/trace/events/net.h
> +++ b/include/trace/events/net.h
> @@ -10,6 +10,7 @@
>  #include <linux/if_vlan.h>
>  #include <linux/ip.h>
>  #include <linux/tracepoint.h>
> +#include <net/busy_poll.h>
>  
>  TRACE_EVENT(net_dev_start_xmit,
>  
> @@ -208,7 +209,8 @@ DECLARE_EVENT_CLASS(net_dev_rx_verbose_template,
>  	TP_fast_assign(
>  		__assign_str(name);
>  #ifdef CONFIG_NET_RX_BUSY_POLL
> -		__entry->napi_id = skb->napi_id;
> +		__entry->napi_id = napi_id_valid(skb->napi_id) ?
> +				   skb->napi_id : 0;

Note to self: the key is that if the storage at napi_id is
being used as a sender_cpu then napi_id_valid() returns false, because
the valid values for a sender_cpu are disjoint from those
of a valid napi_id. This can be seen clearly in the
implementation of napi_id_valid() and the comment above it.
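
A small userspace model of that disjointness, for illustration only. It
assumes, from my reading of net/busy_poll.h, that valid NAPI IDs start at
NR_CPUS + 1, and it takes the "CPU number + 1" encoding of sender_cpu from
the commit message. NR_CPUS = 64 and the helper names other than
napi_id_valid() are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption for illustration; the kernel takes NR_CPUS from its config. */
#define NR_CPUS 64u
/* Assumed lower bound: NAPI IDs start above the largest sender_cpu value. */
#define MIN_NAPI_ID (NR_CPUS + 1u)

bool napi_id_valid(unsigned int napi_id)
{
	return napi_id >= MIN_NAPI_ID;
}

/* sender_cpu is stored as CPU number + 1, so it lies in 1..NR_CPUS. */
unsigned int sender_cpu_encode(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return cpu + 1u;
}

/* Value the tracepoint would record with the patch applied. */
unsigned int traced_napi_id(unsigned int raw)
{
	return napi_id_valid(raw) ? raw : 0u;
}
```

Under these assumptions every encoded sender_cpu value is reported as 0,
and anything at or above MIN_NAPI_ID is passed through unchanged.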

>  #else
>  		__entry->napi_id = 0;
>  #endif
> -- 
> 2.51.0
> 

^ permalink raw reply

* Re: [PATCH net v4 0/5] net: mana: Fix probe/remove error path bugs
From: Simon Horman @ 2026-04-21 16:49 UTC (permalink / raw)
  To: Erni Sri Satya Vennela
  Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, ssengar, dipayanroy, gargaditya,
	shirazsaleem, kees, kotaranov, leon, shacharr, stephen,
	linux-hyperv, netdev, linux-kernel
In-Reply-To: <20260420124741.1056179-1-ernis@linux.microsoft.com>

On Mon, Apr 20, 2026 at 05:47:34AM -0700, Erni Sri Satya Vennela wrote:
> Fix five bugs in mana_probe()/mana_remove() error handling that can
> cause warnings on uninitialized work structs, NULL pointer dereferences,
> masked errors, and resource leaks when early probe steps fail.
> 
> Patches 1-2 move work struct initialization (link_change_work and
> gf_stats_work) to before any error path that could trigger
> mana_remove(), preventing WARN_ON in __flush_work() or debug object
> warnings when sync cancellation runs on uninitialized work structs.
> 
> Patch 3 guards mana_remove() against double invocation. If PM resume
> fails, mana_probe() calls mana_remove() which sets gdma_context and
> driver_data to NULL. A failed resume does not unbind the driver, so
> when the device is eventually unbound, mana_remove() is called again
> and dereferences NULL, causing a kernel panic. An early return on
> NULL gdma_context or driver_data makes the second call harmless.
> 
> Patch 4 prevents add_adev() from overwriting a port probe error,
> which could leave the driver in a broken state with NULL ports while
> reporting success.
> 
> Patch 5 changes 'goto out' to 'break' in mana_remove()'s port loop
> so that mana_destroy_eq() is always reached, preventing EQ leaks when
> a NULL port is encountered.
> ---
> Changes in v4:
> * Correct Fixes tag from ca9c54d2d6a5 to 635096a86edb
> * Correct Fixes tag from ced82fce77e9 to a69839d4327d

Thanks for the updates.

For the series:

Reviewed-by: Simon Horman <horms@kernel.org>


I see that Sashiko provided feedback on patch 4/5.
However, as it notes, the issue it flags is addressed in patch 5/5.
No further action required AFAICS.


^ permalink raw reply

* Re: [PATCH bpf-next v4 1/4] bpf: Reject TCP_NODELAY in TCP header option callbacks
From: bot+bpf-ci @ 2026-04-21 16:51 UTC (permalink / raw)
  To: kafai.wan, ast, daniel, john.fastabend, andrii, martin.lau,
	eddyz87, memxor, song, yonghong.song, jolsa, sdf, davem, edumazet,
	kuba, pabeni, horms, dsahern, shuah, ihor.solodrai, kafai.wan,
	jiayuan.chen, hoyeon.lee, ameryhung, bpf, linux-kernel, netdev,
	linux-kselftest
  Cc: 2022090917019, dddddd, M202472210, ast, andrii, daniel,
	martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260421155804.135786-2-kafai.wan@linux.dev>

[-- Attachment #1: Type: text/plain, Size: 2641 bytes --]

> diff --git a/net/core/filter.c b/net/core/filter.c
> index 5fa9189eb772..96849f4c1fbc 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5833,6 +5833,12 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
>  	if (!is_locked_tcp_sock_ops(bpf_sock))
>  		return -EOPNOTSUPP;
>
> +	/* TCP_NODELAY triggers tcp_push_pending_frames() and re-enters these callbacks. */
> +	if ((bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB ||
> +	     bpf_sock->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB) &&
> +	    level == SOL_TCP && optname == TCP_NODELAY)
> +		return -EOPNOTSUPP;
> +

Does this same recursion vulnerability exist for BPF TCP congestion
control algorithms using BPF_PROG_TYPE_STRUCT_OPS?

If a BPF congestion control algorithm invokes bpf_setsockopt(TCP_NODELAY)
from its cwnd_event callback when handling CA_EVENT_TX_START, could it
trigger unbounded recursion through this path:

BPF_PROG_TYPE_STRUCT_OPS cwnd_event callback (CA_EVENT_TX_START)
  -> bpf_setsockopt(TCP_NODELAY)
    -> tcp_push_pending_frames()
      -> tcp_write_xmit()
        -> tcp_transmit_skb()
          -> tcp_event_data_sent()
            -> cwnd_event()
              -> BPF congestion control cwnd_event callback

The author confirmed this issue in the v3 discussion thread and Martin
KaFai Lau suggested creating a new func_proto
(bpf_sk_setsockopt_nodelay_proto) to reject TCP_NODELAY for all struct_ops
programs, but this patch only addresses BPF_SOCK_OPS callbacks.

Reference: https://lore.kernel.org/bpf/20260418092235.98444C19424@smtp.kernel.org/


Does this fix also need to block TCP_CORK when the value is 0?

When bpf_setsockopt() is called with SOL_TCP and TCP_CORK with value 0
(disabling cork), the call path goes through __tcp_sock_set_cork() in
net/ipv4/tcp.c, which unconditionally calls tcp_push_pending_frames(sk)
when cork is disabled:

__tcp_sock_set_cork()
  -> tcp_push_pending_frames()
    -> tcp_current_mss()
      -> tcp_established_options()
        -> bpf_skops_hdr_opt_len()
          -> BPF_SOCK_OPS_HDR_OPT_LEN_CB

This triggers the same recursion chain described in the commit message,
but the patch only blocks TCP_NODELAY, not TCP_CORK with value 0.

Reference: https://lore.kernel.org/bpf/5fea6b5e9d1a13e72edf8222a8ebe50cbb660b37e62b87b65dab6f6f317943ec@mail.kernel.org/

>  	return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen);
>  }
>


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24733356810

^ permalink raw reply

* Re: [PATCH net] netconsole: avoid out-of-bounds access on empty string in trim_newline()
From: Breno Leitao @ 2026-04-21 16:55 UTC (permalink / raw)
  To: Simon Horman
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Matthew Wood, netdev, linux-kernel, kernel-team,
	stable
In-Reply-To: <20260421162219.GF651125@horms.kernel.org>

On Tue, Apr 21, 2026 at 05:22:19PM +0100, Simon Horman wrote:
> On Mon, Apr 20, 2026 at 03:18:36AM -0700, Breno Leitao wrote:
> > trim_newline() unconditionally dereferences s[len - 1] after computing
> > len = strnlen(s, maxlen). When the string is empty, len is 0 and the
> > expression underflows to s[(size_t)-1], reading (and potentially
> > writing) one byte before the buffer.
> > 
> > The two callers feed trim_newline() with the result of strscpy() from
> > configfs store callbacks (dev_name_store, userdatum_value_store).
> > configfs guarantees count >= 1 reaches the callback, but the byte
> > itself can be NUL: a userspace write(fd, "\0", 1) leaves the
> > destination empty after strscpy() and triggers the underflow. The OOB
> > write only fires if the adjacent byte happens to be '\n', so this is
> > not a security issue, but the access is undefined behaviour either way.
> > 
> > This pattern is commonly flagged by LLM-based code reviewers. While it
> > is not a security fix, the underlying access is undefined behaviour and
> > the change is small and self-contained, so it is a reasonable candidate
> > for the stable trees.
> > 
> > Guard the dereference on a non-zero length.
> > 
> > Fixes: ae001dc67907 ("net: netconsole: move newline trimming to function")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> 
> Reviewed-by: Simon Horman <horms@kernel.org>
> 
> Sashiko has provided some feedback on this patch.
> I do not believe that should hold up progress of this patch.
> But I'd appreciate it if you could look over that feedback
> and see if any follow-up is warranted.

Thanks for the review. I've had a quick look, and it is complaining
about problems that are not regressions but other issues in the code,
which I will need to check more carefully tomorrow.

https://sashiko.dev/#/patchset/20260420-netcons_trim_newline-v1-1-dc35889aeedf%40debian.org

Thanks,
--breno

^ permalink raw reply

* Re: [syzbot] [kvm?] [net?] [virt?] BUG: sleeping function called from invalid context in vhost_get_avail_idx
From: Kohei Enju @ 2026-04-21 17:11 UTC (permalink / raw)
  To: syzbot; +Cc: jasowang, linux-kernel, mst, netdev, syzkaller-bugs
In-Reply-To: <69e6a414.050a0220.24bfd3.002d.GAE@google.com>

On 04/20 15:09, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    8541d8f725c6 Merge tag 'mtd/for-7.1' of git://git.kernel.o..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=136454ce580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=7e54da1916e8d11f
> dashboard link: https://syzkaller.appspot.com/bug?extid=6985cb8e543ea90ba8ee
> compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15d264ce580000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=143ec1ba580000
> 
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-8541d8f7.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/22dfea2c37c2/vmlinux-8541d8f7.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/e2f93ad68fe3/bzImage-8541d8f7.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+6985cb8e543ea90ba8ee@syzkaller.appspotmail.com
> 
> BUG: sleeping function called from invalid context at drivers/vhost/vhost.c:1527
> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6110, name: vhost-6109
> preempt_count: 1, expected: 0
> RCU nest depth: 0, expected: 0
> 2 locks held by vhost-6109/6110:
>  #0: ffff888055624cb0 (&vq->mutex/1){+.+.}-{4:4}, at: handle_tx+0x2d/0x160 drivers/vhost/net.c:971
>  #1: ffff888055620248 (&vq->mutex){+.+.}-{4:4}, at: vhost_net_busy_poll+0x9c/0x730 drivers/vhost/net.c:554
> Preemption disabled at:
> [<ffffffff88f1a006>] vhost_net_busy_poll+0x1c6/0x730 drivers/vhost/net.c:563

I think the blamed commit may be commit 030881372460 ("vhost_net: basic
polling support"), since it introduced preempt_{disable,enable}() around
the busy-poll loop, which calls a sleepable function inside the loop.

Also, from the changelog of the series,

https://lore.kernel.org/netdev/1448435489-5949-4-git-send-email-jasowang@redhat.com/T/#u

  Changes from RFC V1:
  ...
  - Disable preemption during busy looping to make sure local_clock() was
    correctly used.

So my understanding is that preempt_disable() was introduced to keep
local_clock() based timeout accounting on a single CPU, rather than as a
requirement of busy polling itself.

If my understanding is correct, migrate_disable() is sufficient here
instead of preempt_disable(), avoiding sleepable accesses from a
preempt-disabled context.

#syz test

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 80965181920c..c6536cad9c4f 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -560,7 +560,7 @@ static void vhost_net_busy_poll(struct vhost_net *net,
        busyloop_timeout = poll_rx ? rvq->busyloop_timeout:
                                     tvq->busyloop_timeout;

-       preempt_disable();
+       migrate_disable();
        endtime = busy_clock() + busyloop_timeout;

        while (vhost_can_busy_poll(endtime)) {
@@ -577,7 +577,7 @@ static void vhost_net_busy_poll(struct vhost_net *net,
                cpu_relax();
        }

-       preempt_enable();
+       migrate_enable();

        if (poll_rx || sock_has_rx_data(sock))
                vhost_net_busy_poll_try_queue(net, vq);


> CPU: 0 UID: 0 PID: 6110 Comm: vhost-6109 Not tainted syzkaller #0 PREEMPT(full) 
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> Call Trace:
>  <TASK>
>  __dump_stack lib/dump_stack.c:94 [inline]
>  dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
>  __might_resched.cold+0x1ec/0x232 kernel/sched/core.c:9162
>  __might_fault+0x8b/0x140 mm/memory.c:7322
>  vhost_get_avail_idx+0x31c/0x4f0 drivers/vhost/vhost.c:1527
>  vhost_vq_avail_empty drivers/vhost/vhost.c:3206 [inline]
>  vhost_vq_avail_empty+0xa9/0xe0 drivers/vhost/vhost.c:3199
>  vhost_net_busy_poll+0x297/0x730 drivers/vhost/net.c:574
>  vhost_net_tx_get_vq_desc drivers/vhost/net.c:610 [inline]
>  get_tx_bufs.constprop.0+0x338/0x600 drivers/vhost/net.c:650
>  handle_tx_copy+0x28c/0x12e0 drivers/vhost/net.c:778
>  handle_tx+0x139/0x160 drivers/vhost/net.c:985
>  vhost_run_work_list+0x183/0x220 drivers/vhost/vhost.c:454
>  vhost_task_fn+0x156/0x430 kernel/vhost_task.c:49
>  ret_from_fork+0x72b/0xd50 arch/x86/kernel/process.c:158
>  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
>  </TASK>
> 
> 
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
> 
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> 
> If the report is already addressed, let syzbot know by replying with:
> #syz fix: exact-commit-title
> 
> If you want syzbot to run the reproducer, reply with:
> #syz test: git://repo/address.git branch-or-commit-hash
> If you attach or paste a git patch, syzbot will apply it before testing.
> 
> If you want to overwrite report's subsystems, reply with:
> #syz set subsystems: new-subsystem
> (See the list of subsystem names on the web dashboard)
> 
> If the report is a duplicate of another one, reply with:
> #syz dup: exact-subject-of-another-report
> 
> If you want to undo deduplication, reply with:
> #syz undup

^ permalink raw reply related
