linux-hams.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [syzbot] [hams?] KASAN: slab-use-after-free Read in nr_add_node
@ 2025-09-25 16:35 syzbot
  2025-10-18 20:37 ` syzbot
  0 siblings, 1 reply; 27+ messages in thread
From: syzbot @ 2025-09-25 16:35 UTC (permalink / raw)
  To: davem, edumazet, horms, kuba, linux-hams, linux-kernel, netdev,
	pabeni, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    846bd2225ec3 Add linux-next specific files for 20250919
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=130c08e2580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=135377594f35b576
dashboard link: https://syzkaller.appspot.com/bug?extid=2860e75836a08b172755
compiler:       Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13f3fe42580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/c53d48022f8a/disk-846bd222.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/483534e784c8/vmlinux-846bd222.xz
kernel image: https://storage.googleapis.com/syzbot-assets/721b36eec9b3/bzImage-846bd222.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+2860e75836a08b172755@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: slab-use-after-free in nr_add_node+0x251b/0x2580 net/netrom/nr_route.c:248
Read of size 2 at addr ffff888078ae2632 by task syz.0.4719/15936

CPU: 0 UID: 0 PID: 15936 Comm: syz.0.4719 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025
Call Trace:
 <TASK>
 dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0xca/0x240 mm/kasan/report.c:482
 kasan_report+0x118/0x150 mm/kasan/report.c:595
 nr_add_node+0x251b/0x2580 net/netrom/nr_route.c:248
 nr_rt_ioctl+0x9f1/0xb00 net/netrom/nr_route.c:651
 sock_do_ioctl+0xdc/0x300 net/socket.c:1241
 sock_ioctl+0x576/0x790 net/socket.c:1362
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f29de38ec29
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f29df216038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f29de5d5fa0 RCX: 00007f29de38ec29
RDX: 0000200000000000 RSI: 000000000000890b RDI: 0000000000000004
RBP: 00007f29de411e41 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f29de5d6038 R14: 00007f29de5d5fa0 R15: 00007ffedaf2ceb8
 </TASK>

Allocated by task 15939:
 kasan_save_stack mm/kasan/common.c:56 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
 poison_kmalloc_redzone mm/kasan/common.c:400 [inline]
 __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:417
 kasan_kmalloc include/linux/kasan.h:262 [inline]
 __kmalloc_cache_noprof+0x3d5/0x6f0 mm/slub.c:5723
 kmalloc_noprof include/linux/slab.h:957 [inline]
 nr_add_node+0x804/0x2580 net/netrom/nr_route.c:146
 nr_rt_ioctl+0x9f1/0xb00 net/netrom/nr_route.c:651
 sock_do_ioctl+0xdc/0x300 net/socket.c:1241
 sock_ioctl+0x576/0x790 net/socket.c:1362
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 15936:
 kasan_save_stack mm/kasan/common.c:56 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
 __kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:587
 kasan_save_free_info mm/kasan/kasan.h:406 [inline]
 poison_slab_object mm/kasan/common.c:252 [inline]
 __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:284
 kasan_slab_free include/linux/kasan.h:234 [inline]
 slab_free_hook mm/slub.c:2507 [inline]
 slab_free mm/slub.c:6557 [inline]
 kfree+0x19a/0x6d0 mm/slub.c:6765
 nr_add_node+0x1cbe/0x2580 net/netrom/nr_route.c:246
 nr_rt_ioctl+0x9f1/0xb00 net/netrom/nr_route.c:651
 sock_do_ioctl+0xdc/0x300 net/socket.c:1241
 sock_ioctl+0x576/0x790 net/socket.c:1362
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff888078ae2600
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 50 bytes inside of
 freed 64-byte region [ffff888078ae2600, ffff888078ae2640)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x78ae2
flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 00fff00000000000 ffff88801b0418c0 ffffea0000c51900 dead000000000004
raw: 0000000000000000 0000000000200020 00000000f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x52c40(GFP_NOFS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 5233, tgid 5233 (udevd), ts 92365645188, free_ts 92323224662
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x240/0x2a0 mm/page_alloc.c:1850
 prep_new_page mm/page_alloc.c:1858 [inline]
 get_page_from_freelist+0x21e4/0x22c0 mm/page_alloc.c:3869
 __alloc_frozen_pages_noprof+0x181/0x370 mm/page_alloc.c:5159
 alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2416
 alloc_slab_page mm/slub.c:3023 [inline]
 allocate_slab+0x96/0x3a0 mm/slub.c:3196
 new_slab mm/slub.c:3250 [inline]
 ___slab_alloc+0xe94/0x1920 mm/slub.c:4626
 __slab_alloc+0x65/0x100 mm/slub.c:4745
 __slab_alloc_node mm/slub.c:4821 [inline]
 slab_alloc_node mm/slub.c:5232 [inline]
 __do_kmalloc_node mm/slub.c:5601 [inline]
 __kmalloc_noprof+0x471/0x7f0 mm/slub.c:5614
 kmalloc_noprof include/linux/slab.h:961 [inline]
 kzalloc_noprof include/linux/slab.h:1094 [inline]
 tomoyo_encode2 security/tomoyo/realpath.c:45 [inline]
 tomoyo_encode+0x28b/0x550 security/tomoyo/realpath.c:80
 tomoyo_realpath_from_path+0x58d/0x5d0 security/tomoyo/realpath.c:283
 tomoyo_get_realpath security/tomoyo/file.c:151 [inline]
 tomoyo_path_perm+0x213/0x4b0 security/tomoyo/file.c:822
 security_inode_getattr+0x12f/0x330 security/security.c:2416
 vfs_getattr fs/stat.c:259 [inline]
 vfs_fstat fs/stat.c:281 [inline]
 __do_sys_newfstat fs/stat.c:555 [inline]
 __se_sys_newfstat fs/stat.c:550 [inline]
 __x64_sys_newfstat+0xfc/0x200 fs/stat.c:550
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
page last free pid 5528 tgid 5528 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 free_pages_prepare mm/page_alloc.c:1394 [inline]
 __free_frozen_pages+0xbc4/0xd30 mm/page_alloc.c:2906
 rcu_do_batch kernel/rcu/tree.c:2605 [inline]
 rcu_core+0xcab/0x1770 kernel/rcu/tree.c:2861
 handle_softirqs+0x286/0x870 kernel/softirq.c:622
 __do_softirq kernel/softirq.c:656 [inline]
 invoke_softirq kernel/softirq.c:496 [inline]
 __irq_exit_rcu+0xca/0x1f0 kernel/softirq.c:723
 irq_exit_rcu+0x9/0x30 kernel/softirq.c:739
 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1052 [inline]
 sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1052
 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697

Memory state around the buggy address:
 ffff888078ae2500: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff888078ae2580: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff888078ae2600: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                                     ^
 ffff888078ae2680: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff888078ae2700: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
==================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [syzbot] [hams?] KASAN: slab-use-after-free Read in nr_add_node
  2025-09-25 16:35 [syzbot] [hams?] KASAN: slab-use-after-free Read in nr_add_node syzbot
@ 2025-10-18 20:37 ` syzbot
  2025-10-19  5:02   ` Brahmajit Das
                     ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: syzbot @ 2025-10-18 20:37 UTC (permalink / raw)
  To: davem, edumazet, horms, kuba, linux-hams, linux-kernel, netdev,
	pabeni, syzkaller-bugs

syzbot has found a reproducer for the following issue on:

HEAD commit:    f406055cb18c Merge tag 'arm64-fixes' of git://git.kernel.o..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11cf767c580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
dashboard link: https://syzkaller.appspot.com/bug?extid=2860e75836a08b172755
compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=155f1de2580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12b6b52f980000

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-f406055c.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/a4db2a99bfb1/vmlinux-f406055c.xz
kernel image: https://storage.googleapis.com/syzbot-assets/91d1ca420bac/bzImage-f406055c.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+2860e75836a08b172755@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: slab-use-after-free in nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
Read of size 4 at addr ffff888054b8cc30 by task syz.3.7839/22393

CPU: 3 UID: 0 PID: 22393 Comm: syz.3.7839 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0xcd/0x630 mm/kasan/report.c:482
 kasan_report+0xe0/0x110 mm/kasan/report.c:595
 nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
 nr_rt_ioctl+0x11b7/0x29b0 net/netrom/nr_route.c:651
 nr_ioctl+0x19a/0x2d0 net/netrom/af_netrom.c:1254
 sock_do_ioctl+0x118/0x280 net/socket.c:1254
 sock_ioctl+0x227/0x6b0 net/socket.c:1375
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl fs/ioctl.c:583 [inline]
 __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f7f4fb8efc9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f7f509c5038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f7f4fde5fa0 RCX: 00007f7f4fb8efc9
RDX: 0000200000000280 RSI: 000000000000890b RDI: 0000000000000004
RBP: 00007f7f4fc11f91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f7f4fde6038 R14: 00007f7f4fde5fa0 R15: 00007ffd1bd56f18
 </TASK>

Allocated by task 22380:
 kasan_save_stack+0x33/0x60 mm/kasan/common.c:56
 kasan_save_track+0x14/0x30 mm/kasan/common.c:77
 poison_kmalloc_redzone mm/kasan/common.c:400 [inline]
 __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:417
 kmalloc_noprof include/linux/slab.h:957 [inline]
 nr_add_node+0xe4e/0x2c00 net/netrom/nr_route.c:146
 nr_rt_ioctl+0x11b7/0x29b0 net/netrom/nr_route.c:651
 nr_ioctl+0x19a/0x2d0 net/netrom/af_netrom.c:1254
 sock_do_ioctl+0x118/0x280 net/socket.c:1254
 sock_ioctl+0x227/0x6b0 net/socket.c:1375
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl fs/ioctl.c:583 [inline]
 __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 22393:
 kasan_save_stack+0x33/0x60 mm/kasan/common.c:56
 kasan_save_track+0x14/0x30 mm/kasan/common.c:77
 __kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:587
 kasan_save_free_info mm/kasan/kasan.h:406 [inline]
 poison_slab_object mm/kasan/common.c:252 [inline]
 __kasan_slab_free+0x5f/0x80 mm/kasan/common.c:284
 kasan_slab_free include/linux/kasan.h:234 [inline]
 slab_free_hook mm/slub.c:2523 [inline]
 slab_free mm/slub.c:6611 [inline]
 kfree+0x2b8/0x6d0 mm/slub.c:6818
 nr_neigh_put include/net/netrom.h:143 [inline]
 nr_neigh_put include/net/netrom.h:137 [inline]
 nr_add_node+0x23b9/0x2c00 net/netrom/nr_route.c:246
 nr_rt_ioctl+0x11b7/0x29b0 net/netrom/nr_route.c:651
 nr_ioctl+0x19a/0x2d0 net/netrom/af_netrom.c:1254
 sock_do_ioctl+0x118/0x280 net/socket.c:1254
 sock_ioctl+0x227/0x6b0 net/socket.c:1375
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl fs/ioctl.c:583 [inline]
 __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff888054b8cc00
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 48 bytes inside of
 freed 64-byte region [ffff888054b8cc00, ffff888054b8cc40)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x54b8c
flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 00fff00000000000 ffff88801b4428c0 dead000000000100 dead000000000122
raw: 0000000000000000 0000000000200020 00000000f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x52cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 6074, tgid 6074 (syz-executor), ts 126099390276, free_ts 126098882568
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x1c0/0x230 mm/page_alloc.c:1850
 prep_new_page mm/page_alloc.c:1858 [inline]
 get_page_from_freelist+0x10a3/0x3a30 mm/page_alloc.c:3884
 __alloc_frozen_pages_noprof+0x25f/0x2470 mm/page_alloc.c:5183
 alloc_pages_mpol+0x1fb/0x550 mm/mempolicy.c:2416
 alloc_slab_page mm/slub.c:3039 [inline]
 allocate_slab mm/slub.c:3212 [inline]
 new_slab+0x24a/0x360 mm/slub.c:3266
 ___slab_alloc+0xdc4/0x1ae0 mm/slub.c:4636
 __slab_alloc.constprop.0+0x63/0x110 mm/slub.c:4755
 __slab_alloc_node mm/slub.c:4831 [inline]
 slab_alloc_node mm/slub.c:5253 [inline]
 __kmalloc_cache_noprof+0x477/0x780 mm/slub.c:5743
 kmalloc_noprof include/linux/slab.h:957 [inline]
 __team_option_inst_add+0xbf/0x330 drivers/net/team/team_core.c:160
 __team_option_inst_add_option drivers/net/team/team_core.c:182 [inline]
 __team_options_register+0x355/0x720 drivers/net/team/team_core.c:276
 team_options_register drivers/net/team/team_core.c:341 [inline]
 team_init+0x5c8/0xce0 drivers/net/team/team_core.c:1658
 register_netdevice+0x653/0x2270 net/core/dev.c:11221
 team_newlink+0xb4/0x190 drivers/net/team/team_core.c:2213
 rtnl_newlink_create net/core/rtnetlink.c:3833 [inline]
 __rtnl_newlink net/core/rtnetlink.c:3950 [inline]
 rtnl_newlink+0xc45/0x2000 net/core/rtnetlink.c:4065
 rtnetlink_rcv_msg+0x95e/0xe90 net/core/rtnetlink.c:6954
 netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2552
page last free pid 6074 tgid 6074 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 free_pages_prepare mm/page_alloc.c:1394 [inline]
 __free_frozen_pages+0x7df/0x1160 mm/page_alloc.c:2906
 selinux_genfs_get_sid security/selinux/hooks.c:1357 [inline]
 inode_doinit_with_dentry+0xacb/0x12e0 security/selinux/hooks.c:1555
 selinux_d_instantiate+0x26/0x30 security/selinux/hooks.c:6523
 security_d_instantiate+0x142/0x1a0 security/security.c:4148
 d_instantiate+0x5c/0x90 fs/dcache.c:1961
 __debugfs_create_file+0x286/0x6b0 fs/debugfs/inode.c:459
 debugfs_create_file_full+0x41/0x60 fs/debugfs/inode.c:469
 ref_tracker_dir_debugfs+0x19d/0x290 lib/ref_tracker.c:441
 ref_tracker_dir_init include/linux/ref_tracker.h:70 [inline]
 alloc_netdev_mqs+0x314/0x1550 net/core/dev.c:11907
 rtnl_create_link+0xc08/0xf90 net/core/rtnetlink.c:3641
 rtnl_newlink_create net/core/rtnetlink.c:3823 [inline]
 __rtnl_newlink net/core/rtnetlink.c:3950 [inline]
 rtnl_newlink+0xb69/0x2000 net/core/rtnetlink.c:4065
 rtnetlink_rcv_msg+0x95e/0xe90 net/core/rtnetlink.c:6954
 netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2552
 netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline]
 netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1346
 netlink_sendmsg+0x8c8/0xdd0 net/netlink/af_netlink.c:1896
 sock_sendmsg_nosec net/socket.c:727 [inline]
 __sock_sendmsg net/socket.c:742 [inline]
 __sys_sendto+0x4a3/0x520 net/socket.c:2244

Memory state around the buggy address:
 ffff888054b8cb00: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc
 ffff888054b8cb80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff888054b8cc00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                                     ^
 ffff888054b8cc80: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
 ffff888054b8cd00: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
==================================================================


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [syzbot] [hams?] KASAN: slab-use-after-free Read in nr_add_node
  2025-10-18 20:37 ` syzbot
@ 2025-10-19  5:02   ` Brahmajit Das
  2025-10-19  5:21     ` syzbot
  2025-10-19  5:10   ` Brahmajit Das
  2025-10-20  8:13   ` [PATCH] netrom: Prevent race conditions between multiple add route Lizhi Xu
  2 siblings, 1 reply; 27+ messages in thread
From: Brahmajit Das @ 2025-10-19  5:02 UTC (permalink / raw)
  To: syzbot
  Cc: davem, edumazet, horms, kuba, linux-hams, linux-kernel, netdev,
	pabeni, syzkaller-bugs

On 18.10.2025 13:37, syzbot wrote:
> syzbot has found a reproducer for the following issue on:
> 
> HEAD commit:    f406055cb18c Merge tag 'arm64-fixes' of git://git.kernel.o..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=11cf767c580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
> dashboard link: https://syzkaller.appspot.com/bug?extid=2860e75836a08b172755
> compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=155f1de2580000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12b6b52f980000
> 
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-f406055c.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/a4db2a99bfb1/vmlinux-f406055c.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/91d1ca420bac/bzImage-f406055c.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+2860e75836a08b172755@syzkaller.appspotmail.com
> 
> ==================================================================
> BUG: KASAN: slab-use-after-free in nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
> Read of size 4 at addr ffff888054b8cc30 by task syz.3.7839/22393
> 

#syz test

diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
index b94cb2ffbaf8..5fa7d9febbbb 100644
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -245,7 +245,9 @@ static int __must_check nr_add_node(ax25_address *nr, const char *mnemonic,
 				nr_node->routes[2].neighbour->count--;
 				nr_neigh_put(nr_node->routes[2].neighbour);
 
-				if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)
+				if (nr_node->routes[2].neighbour &&
+				    nr_node->routes[2].neighbour->count == 0 &&
+				    !nr_node->routes[2].neighbour->locked)
 					nr_remove_neigh(nr_node->routes[2].neighbour);
 
 				nr_node->routes[2].quality   = quality;

-- 
Regards,
listout

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [syzbot] [hams?] KASAN: slab-use-after-free Read in nr_add_node
  2025-10-18 20:37 ` syzbot
  2025-10-19  5:02   ` Brahmajit Das
@ 2025-10-19  5:10   ` Brahmajit Das
  2025-10-19  5:10     ` syzbot
  2025-10-20  8:13   ` [PATCH] netrom: Prevent race conditions between multiple add route Lizhi Xu
  2 siblings, 1 reply; 27+ messages in thread
From: Brahmajit Das @ 2025-10-19  5:10 UTC (permalink / raw)
  To: syzbot
  Cc: davem, edumazet, horms, kuba, linux-hams, linux-kernel, netdev,
	pabeni, syzkaller-bugs

On 18.10.2025 13:37, syzbot wrote:
> syzbot has found a reproducer for the following issue on:
> 
> HEAD commit:    f406055cb18c Merge tag 'arm64-fixes' of git://git.kernel.o..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=11cf767c580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
> dashboard link: https://syzkaller.appspot.com/bug?extid=2860e75836a08b172755
> compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=155f1de2580000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12b6b52f980000
> 
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-f406055c.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/a4db2a99bfb1/vmlinux-f406055c.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/91d1ca420bac/bzImage-f406055c.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+2860e75836a08b172755@syzkaller.appspotmail.com
> 
> ==================================================================
> BUG: KASAN: slab-use-after-free in nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
> Read of size 4 at addr ffff888054b8cc30 by task syz.3.7839/22393

#syz test linux-next

diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
index b94cb2ffbaf8..5fa7d9febbbb 100644
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -245,7 +245,9 @@ static int __must_check nr_add_node(ax25_address *nr, const char *mnemonic,
 				nr_node->routes[2].neighbour->count--;
 				nr_neigh_put(nr_node->routes[2].neighbour);
 
-				if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)
+				if (nr_node->routes[2].neighbour &&
+				    nr_node->routes[2].neighbour->count == 0 &&
+				    !nr_node->routes[2].neighbour->locked)
 					nr_remove_neigh(nr_node->routes[2].neighbour);
 
 				nr_node->routes[2].quality   = quality;

-- 
Regards,
listout

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [syzbot] [hams?] KASAN: slab-use-after-free Read in nr_add_node
  2025-10-19  5:10   ` Brahmajit Das
@ 2025-10-19  5:10     ` syzbot
  0 siblings, 0 replies; 27+ messages in thread
From: syzbot @ 2025-10-19  5:10 UTC (permalink / raw)
  To: listout
  Cc: davem, edumazet, horms, kuba, linux-hams, linux-kernel, listout,
	netdev, pabeni, syzkaller-bugs

> On 18.10.2025 13:37, syzbot wrote:
>> syzbot has found a reproducer for the following issue on:
>> 
>> HEAD commit:    f406055cb18c Merge tag 'arm64-fixes' of git://git.kernel.o..
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=11cf767c580000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
>> dashboard link: https://syzkaller.appspot.com/bug?extid=2860e75836a08b172755
>> compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=155f1de2580000
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12b6b52f980000
>> 
>> Downloadable assets:
>> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-f406055c.raw.xz
>> vmlinux: https://storage.googleapis.com/syzbot-assets/a4db2a99bfb1/vmlinux-f406055c.xz
>> kernel image: https://storage.googleapis.com/syzbot-assets/91d1ca420bac/bzImage-f406055c.xz
>> 
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+2860e75836a08b172755@syzkaller.appspotmail.com
>> 
>> ==================================================================
>> BUG: KASAN: slab-use-after-free in nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
>> Read of size 4 at addr ffff888054b8cc30 by task syz.3.7839/22393
>
> #syz test linux-next

want either no args or 2 args (repo, branch), got 1

>
> diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
> index b94cb2ffbaf8..5fa7d9febbbb 100644
> --- a/net/netrom/nr_route.c
> +++ b/net/netrom/nr_route.c
> @@ -245,7 +245,9 @@ static int __must_check nr_add_node(ax25_address *nr, const char *mnemonic,
>  				nr_node->routes[2].neighbour->count--;
>  				nr_neigh_put(nr_node->routes[2].neighbour);
>  
> -				if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)
> +				if (nr_node->routes[2].neighbour &&
> +				    nr_node->routes[2].neighbour->count == 0 &&
> +				    !nr_node->routes[2].neighbour->locked)
>  					nr_remove_neigh(nr_node->routes[2].neighbour);
>  
>  				nr_node->routes[2].quality   = quality;
>
> -- 
> Regards,
> listout

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [syzbot] [hams?] KASAN: slab-use-after-free Read in nr_add_node
  2025-10-19  5:02   ` Brahmajit Das
@ 2025-10-19  5:21     ` syzbot
  0 siblings, 0 replies; 27+ messages in thread
From: syzbot @ 2025-10-19  5:21 UTC (permalink / raw)
  To: davem, edumazet, horms, kuba, linux-hams, linux-kernel, listout,
	netdev, pabeni, syzkaller-bugs

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
KASAN: slab-use-after-free Read in nr_add_node

==================================================================
BUG: KASAN: slab-use-after-free in nr_add_node+0x25e5/0x2c10 net/netrom/nr_route.c:249
Read of size 4 at addr ffff88805466ddb0 by task syz.1.4225/15237

CPU: 2 UID: 0 PID: 15237 Comm: syz.1.4225 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0xcd/0x630 mm/kasan/report.c:482
 kasan_report+0xe0/0x110 mm/kasan/report.c:595
 nr_add_node+0x25e5/0x2c10 net/netrom/nr_route.c:249
 nr_rt_ioctl+0x11b7/0x29b0 net/netrom/nr_route.c:653
 nr_ioctl+0x19a/0x2d0 net/netrom/af_netrom.c:1254
 sock_do_ioctl+0x118/0x280 net/socket.c:1254
 sock_ioctl+0x227/0x6b0 net/socket.c:1375
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl fs/ioctl.c:583 [inline]
 __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f4229d8efc9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f422ac41038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f4229fe5fa0 RCX: 00007f4229d8efc9
RDX: 0000200000000280 RSI: 000000000000890b RDI: 0000000000000004
RBP: 00007f4229e11f91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f4229fe6038 R14: 00007f4229fe5fa0 R15: 00007ffe9d4efb98
 </TASK>

Allocated by task 15224:
 kasan_save_stack+0x33/0x60 mm/kasan/common.c:56
 kasan_save_track+0x14/0x30 mm/kasan/common.c:77
 poison_kmalloc_redzone mm/kasan/common.c:400 [inline]
 __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:417
 kmalloc_noprof include/linux/slab.h:957 [inline]
 nr_add_node+0xe4e/0x2c10 net/netrom/nr_route.c:146
 nr_rt_ioctl+0x11b7/0x29b0 net/netrom/nr_route.c:653
 nr_ioctl+0x19a/0x2d0 net/netrom/af_netrom.c:1254
 sock_do_ioctl+0x118/0x280 net/socket.c:1254
 sock_ioctl+0x227/0x6b0 net/socket.c:1375
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl fs/ioctl.c:583 [inline]
 __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 15237:
 kasan_save_stack+0x33/0x60 mm/kasan/common.c:56
 kasan_save_track+0x14/0x30 mm/kasan/common.c:77
 __kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:587
 kasan_save_free_info mm/kasan/kasan.h:406 [inline]
 poison_slab_object mm/kasan/common.c:252 [inline]
 __kasan_slab_free+0x5f/0x80 mm/kasan/common.c:284
 kasan_slab_free include/linux/kasan.h:234 [inline]
 slab_free_hook mm/slub.c:2530 [inline]
 slab_free mm/slub.c:6619 [inline]
 kfree+0x2b8/0x6d0 mm/slub.c:6826
 nr_neigh_put include/net/netrom.h:143 [inline]
 nr_neigh_put include/net/netrom.h:137 [inline]
 nr_add_node+0x23c3/0x2c10 net/netrom/nr_route.c:246
 nr_rt_ioctl+0x11b7/0x29b0 net/netrom/nr_route.c:653
 nr_ioctl+0x19a/0x2d0 net/netrom/af_netrom.c:1254
 sock_do_ioctl+0x118/0x280 net/socket.c:1254
 sock_ioctl+0x227/0x6b0 net/socket.c:1375
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl fs/ioctl.c:583 [inline]
 __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff88805466dd80
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 48 bytes inside of
 freed 64-byte region [ffff88805466dd80, ffff88805466ddc0)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5466d
flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 00fff00000000000 ffff88801b4428c0 ffffea0000867b00 dead000000000004
raw: 0000000000000000 0000000000200020 00000000f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x52c40(GFP_NOFS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 5345, tgid 5345 (udevd), ts 58557594470, free_ts 0
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x1c0/0x230 mm/page_alloc.c:1850
 prep_new_page mm/page_alloc.c:1858 [inline]
 get_page_from_freelist+0x10a3/0x3a30 mm/page_alloc.c:3884
 __alloc_frozen_pages_noprof+0x25f/0x2470 mm/page_alloc.c:5183
 alloc_pages_mpol+0x1fb/0x550 mm/mempolicy.c:2416
 alloc_slab_page mm/slub.c:3046 [inline]
 allocate_slab mm/slub.c:3219 [inline]
 new_slab+0x24a/0x360 mm/slub.c:3273
 ___slab_alloc+0xdc4/0x1ae0 mm/slub.c:4643
 __slab_alloc.constprop.0+0x63/0x110 mm/slub.c:4762
 __slab_alloc_node mm/slub.c:4838 [inline]
 slab_alloc_node mm/slub.c:5260 [inline]
 __do_kmalloc_node mm/slub.c:5633 [inline]
 __kmalloc_noprof+0x501/0x880 mm/slub.c:5646
 kmalloc_noprof include/linux/slab.h:961 [inline]
 kzalloc_noprof include/linux/slab.h:1094 [inline]
 tomoyo_encode2+0x100/0x3e0 security/tomoyo/realpath.c:45
 tomoyo_encode+0x29/0x50 security/tomoyo/realpath.c:80
 tomoyo_realpath_from_path+0x18f/0x6e0 security/tomoyo/realpath.c:283
 tomoyo_get_realpath security/tomoyo/file.c:151 [inline]
 tomoyo_check_open_permission+0x2ab/0x3c0 security/tomoyo/file.c:771
 tomoyo_file_open+0x6b/0x90 security/tomoyo/tomoyo.c:334
 security_file_open+0x84/0x1e0 security/security.c:3183
 do_dentry_open+0x596/0x1530 fs/open.c:942
 vfs_open+0x82/0x3f0 fs/open.c:1097
page_owner free stack trace missing

Memory state around the buggy address:
 ffff88805466dc80: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc
 ffff88805466dd00: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc
>ffff88805466dd80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                                     ^
 ffff88805466de00: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc
 ffff88805466de80: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc
==================================================================


Tested on:

commit:         1c64efcb Merge tag 'rust-rustfmt' of git://git.kernel...
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12a8bde2580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
dashboard link: https://syzkaller.appspot.com/bug?extid=2860e75836a08b172755
compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch:          https://syzkaller.appspot.com/x/patch.diff?x=13e20de2580000


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] netrom: Prevent race conditions between multiple add route
  2025-10-18 20:37 ` syzbot
  2025-10-19  5:02   ` Brahmajit Das
  2025-10-19  5:10   ` Brahmajit Das
@ 2025-10-20  8:13   ` Lizhi Xu
  2025-10-20 10:10     ` Dan Carpenter
  2 siblings, 1 reply; 27+ messages in thread
From: Lizhi Xu @ 2025-10-20  8:13 UTC (permalink / raw)
  To: syzbot+2860e75836a08b172755
  Cc: davem, edumazet, horms, kuba, linux-hams, linux-kernel, netdev,
	pabeni, syzkaller-bugs

The root cause of the problem is that multiple different tasks initiate
NETROM_NODE commands to add new routes, there is no lock between them to
protect the same nr_neigh.
Task0 may add the nr_neigh.refcount value of 1 on Task1 to routes[2].
When Task3 executes nr_neigh_put(nr_node->routes[2].neighbour), it will
release the neighbour because its refcount value is 1.

In this case, the following situation causes a UAF:

Task0					Task1
=====					=====
nr_add_node()
nr_neigh_get_dev()			nr_add_node()
					nr_node->routes[2].neighbour->count--
					nr_neigh_put(nr_node->routes[2].neighbour);
					nr_remove_neigh(nr_node->routes[2].neighbour)
nr_node->routes[2].neighbour = nr_neigh
nr_neigh_hold(nr_neigh);

The solution to the problem is to use a lock to synchronize each add a route
to node.

syzbot reported:
BUG: KASAN: slab-use-after-free in nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
Read of size 4 at addr ffff888051e6e9b0 by task syz.1.2539/8741

Call Trace:
 <TASK>
 nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248

Reported-by: syzbot+2860e75836a08b172755@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2860e75836a08b172755
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
---
 net/netrom/nr_route.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
index b94cb2ffbaf8..ae1e5ee1f52f 100644
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -102,7 +102,9 @@ static int __must_check nr_add_node(ax25_address *nr, const char *mnemonic,
 	struct nr_neigh *nr_neigh;
 	int i, found;
 	struct net_device *odev;
+	static DEFINE_MUTEX(add_node_lock);
 
+	guard(mutex)(&add_node_lock);
 	if ((odev=nr_dev_get(nr)) != NULL) {	/* Can't add routes to ourself */
 		dev_put(odev);
 		return -EINVAL;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] netrom: Prevent race conditions between multiple add route
  2025-10-20  8:13   ` [PATCH] netrom: Prevent race conditions between multiple add route Lizhi Xu
@ 2025-10-20 10:10     ` Dan Carpenter
  2025-10-20 11:02       ` [PATCH V2] " Lizhi Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Dan Carpenter @ 2025-10-20 10:10 UTC (permalink / raw)
  To: Lizhi Xu
  Cc: syzbot+2860e75836a08b172755, davem, edumazet, horms, kuba,
	linux-hams, linux-kernel, netdev, pabeni, syzkaller-bugs

On Mon, Oct 20, 2025 at 04:13:59PM +0800, Lizhi Xu wrote:
> The root cause of the problem is that multiple different tasks initiate
> NETROM_NODE commands to add new routes, there is no lock between them to
> protect the same nr_neigh.
> Task0 may add the nr_neigh.refcount value of 1 on Task1 to routes[2].
> When Task3 executes nr_neigh_put(nr_node->routes[2].neighbour), it will

s/Task3/Task1/

> release the neighbour because its refcount value is 1.
> 

The refcount would be 2 and then drop to zero.  Both nr_neigh_put() and
nr_remove_neigh() drop the refcount.

> In this case, the following situation causes a UAF:
> 
> Task0					Task1
> =====					=====
> nr_add_node()
> nr_neigh_get_dev()			nr_add_node()
> 					nr_node->routes[2].neighbour->count--

Does this line really matter in terms of the use after free?

> 					nr_neigh_put(nr_node->routes[2].neighbour);
> 					nr_remove_neigh(nr_node->routes[2].neighbour)
> nr_node->routes[2].neighbour = nr_neigh
> nr_neigh_hold(nr_neigh);


This chart is confusing.  It says that that the nr_neigh_hold() is the use
after free.  But we called nr_remove_neigh(nr_node->routes[2].neighbour)
before we assigned nr_node->routes[2].neighbour = nr_neigh...

The sysbot report says that the free happens on:

	r_neigh_put(nr_node->routes[2].neighbour);

and the use after free happens on the next line:

	if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)

Which does suggest that somewhere the refcount is 1 when it should be
at least 2...  It could be that two threads call nr_neigh_put() at
basically the same time, but that doesn't make sense either because
we're holding the nr_node_lock(nr_node)...

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH V2] netrom: Prevent race conditions between multiple add route
  2025-10-20 10:10     ` Dan Carpenter
@ 2025-10-20 11:02       ` Lizhi Xu
  2025-10-20 12:25         ` Dan Carpenter
  0 siblings, 1 reply; 27+ messages in thread
From: Lizhi Xu @ 2025-10-20 11:02 UTC (permalink / raw)
  To: dan.carpenter
  Cc: davem, edumazet, horms, kuba, linux-hams, linux-kernel, lizhi.xu,
	netdev, pabeni, syzbot+2860e75836a08b172755, syzkaller-bugs

The root cause of the problem is that multiple different tasks initiate
NETROM_NODE commands to add new routes, there is no lock between them to
protect the same nr_neigh.
Task0 may add the nr_neigh.refcount value of 1 on Task1 to routes[2].

When Task2 executes nr_neigh_put(nr_node->routes[2].neighbour), it will
release the neighbour because its refcount value is 1.

In this case, the following situation causes a UAF:

Task0					Task1						Task2
=====					=====						=====
nr_add_node()
nr_neigh_get_dev()			nr_add_node()
					nr_node_lock()
					nr_node->routes[2].neighbour->count--
					nr_neigh_put(nr_node->routes[2].neighbour);
					nr_remove_neigh(nr_node->routes[2].neighbour)
					nr_node_unlock()
nr_node_lock()
nr_node->routes[2].neighbour = nr_neigh
nr_neigh_hold(nr_neigh);								nr_add_node()
											nr_neigh_put()

The solution to the problem is to use a lock to synchronize each add a route
to node.

syzbot reported:
BUG: KASAN: slab-use-after-free in nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
Read of size 4 at addr ffff888051e6e9b0 by task syz.1.2539/8741

Call Trace:
 <TASK>
 nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248

Reported-by: syzbot+2860e75836a08b172755@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2860e75836a08b172755
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
---
V1 -> V2: update comments for cause uaf

 net/netrom/nr_route.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
index b94cb2ffbaf8..ae1e5ee1f52f 100644
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -102,7 +102,9 @@ static int __must_check nr_add_node(ax25_address *nr, const char *mnemonic,
 	struct nr_neigh *nr_neigh;
 	int i, found;
 	struct net_device *odev;
+	static DEFINE_MUTEX(add_node_lock);
 
+	guard(mutex)(&add_node_lock);
 	if ((odev=nr_dev_get(nr)) != NULL) {	/* Can't add routes to ourself */
 		dev_put(odev);
 		return -EINVAL;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH V2] netrom: Prevent race conditions between multiple add route
  2025-10-20 11:02       ` [PATCH V2] " Lizhi Xu
@ 2025-10-20 12:25         ` Dan Carpenter
  2025-10-20 12:33           ` Dan Carpenter
                             ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Dan Carpenter @ 2025-10-20 12:25 UTC (permalink / raw)
  To: Lizhi Xu
  Cc: davem, edumazet, horms, kuba, linux-hams, linux-kernel, netdev,
	pabeni, syzbot+2860e75836a08b172755, syzkaller-bugs

On Mon, Oct 20, 2025 at 07:02:44PM +0800, Lizhi Xu wrote:
> The root cause of the problem is that multiple different tasks initiate
> NETROM_NODE commands to add new routes, there is no lock between them to
> protect the same nr_neigh.
> Task0 may add the nr_neigh.refcount value of 1 on Task1 to routes[2].
> 
> When Task2 executes nr_neigh_put(nr_node->routes[2].neighbour), it will
> release the neighbour because its refcount value is 1.
> 
> In this case, the following situation causes a UAF:
> 
> Task0					Task1						Task2
> =====					=====						=====
> nr_add_node()
> nr_neigh_get_dev()			nr_add_node()
> 					nr_node_lock()
> 					nr_node->routes[2].neighbour->count--
> 					nr_neigh_put(nr_node->routes[2].neighbour);
> 					nr_remove_neigh(nr_node->routes[2].neighbour)
> 					nr_node_unlock()
> nr_node_lock()
> nr_node->routes[2].neighbour = nr_neigh
> nr_neigh_hold(nr_neigh);								nr_add_node()
> 											nr_neigh_put()
> 
> The solution to the problem is to use a lock to synchronize each add a route
> to node.

This chart is still not right.  Let me add line numbers to your chart:

netrom/nr_route.c
   214          nr_node_lock(nr_node);
   215  
   216          if (quality != 0)
   217                  strscpy(nr_node->mnemonic, mnemonic);
   218  
   219          for (found = 0, i = 0; i < nr_node->count; i++) {
   220                  if (nr_node->routes[i].neighbour == nr_neigh) {
   221                          nr_node->routes[i].quality   = quality;
   222                          nr_node->routes[i].obs_count = obs_count;
   223                          found = 1;
   224                          break;
   225                  }
   226          }
   227  
   228          if (!found) {
   229                  /* We have space at the bottom, slot it in */
   230                  if (nr_node->count < 3) {
   231                          nr_node->routes[2] = nr_node->routes[1];
   232                          nr_node->routes[1] = nr_node->routes[0];
   233  
   234                          nr_node->routes[0].quality   = quality;
   235                          nr_node->routes[0].obs_count = obs_count;
   236                          nr_node->routes[0].neighbour = nr_neigh;
   237  
   238                          nr_node->which++;
   239                          nr_node->count++;
   240                          nr_neigh_hold(nr_neigh);
   241                          nr_neigh->count++;
   242                  } else {
   243                          /* It must be better than the worst */
   244                          if (quality > nr_node->routes[2].quality) {
   245                                  nr_node->routes[2].neighbour->count--;
   246                                  nr_neigh_put(nr_node->routes[2].neighbour);
   247  
   248                                  if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)
   249                                          nr_remove_neigh(nr_node->routes[2].neighbour);
   250  
   251                                  nr_node->routes[2].quality   = quality;
   252                                  nr_node->routes[2].obs_count = obs_count;
   253                                  nr_node->routes[2].neighbour = nr_neigh;
   254  
   255                                  nr_neigh_hold(nr_neigh);
   256                                  nr_neigh->count++;
   257                          }
   258                  }
   259          }


Task0					Task1						Task2
=====					=====						=====
[97] nr_add_node()
[113] nr_neigh_get_dev()		[97] nr_add_node()
					[214] nr_node_lock()
					[245] nr_node->routes[2].neighbour->count--
					[246] nr_neigh_put(nr_node->routes[2].neighbour);
					[248] nr_remove_neigh(nr_node->routes[2].neighbour)
					[283] nr_node_unlock()
[214] nr_node_lock()
[253] nr_node->routes[2].neighbour = nr_neigh
[254] nr_neigh_hold(nr_neigh);								[97] nr_add_node()
											[XXX] nr_neigh_put()
                                                                                        ^^^^^^^^^^^^^^^^^^^^

These charts are supposed to be chronological so [XXX] is wrong because the
use after free happens on line [248].  Do we really need three threads to
make this race work?

> 
> syzbot reported:
> BUG: KASAN: slab-use-after-free in nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
                                                                                     ^^^

> Read of size 4 at addr ffff888051e6e9b0 by task syz.1.2539/8741

I'm sure you tested your patch and that it fixes the bug, but I just
wonder if it's the best possible fix?

regards,
dan carpenter



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V2] netrom: Prevent race conditions between multiple add route
  2025-10-20 12:25         ` Dan Carpenter
@ 2025-10-20 12:33           ` Dan Carpenter
  2025-10-20 12:57           ` Dan Carpenter
  2025-10-20 13:34           ` Lizhi Xu
  2 siblings, 0 replies; 27+ messages in thread
From: Dan Carpenter @ 2025-10-20 12:33 UTC (permalink / raw)
  To: Lizhi Xu
  Cc: davem, edumazet, horms, kuba, linux-hams, linux-kernel, netdev,
	pabeni, syzbot+2860e75836a08b172755, syzkaller-bugs

On Mon, Oct 20, 2025 at 03:25:40PM +0300, Dan Carpenter wrote:
> On Mon, Oct 20, 2025 at 07:02:44PM +0800, Lizhi Xu wrote:
> > The root cause of the problem is that multiple different tasks initiate
> > NETROM_NODE commands to add new routes, there is no lock between them to
> > protect the same nr_neigh.
> > Task0 may add the nr_neigh.refcount value of 1 on Task1 to routes[2].
> > 
> > When Task2 executes nr_neigh_put(nr_node->routes[2].neighbour), it will
> > release the neighbour because its refcount value is 1.
> > 
> > In this case, the following situation causes a UAF:
> > 
> > Task0					Task1						Task2
> > =====					=====						=====
> > nr_add_node()
> > nr_neigh_get_dev()			nr_add_node()
> > 					nr_node_lock()
> > 					nr_node->routes[2].neighbour->count--
> > 					nr_neigh_put(nr_node->routes[2].neighbour);
> > 					nr_remove_neigh(nr_node->routes[2].neighbour)
> > 					nr_node_unlock()
> > nr_node_lock()
> > nr_node->routes[2].neighbour = nr_neigh
> > nr_neigh_hold(nr_neigh);								nr_add_node()
> > 											nr_neigh_put()
> > 
> > The solution to the problem is to use a lock to synchronize each add a route
> > to node.
> 
> This chart is still not right.  Let me add line numbers to your chart:
> 
> netrom/nr_route.c
>    214          nr_node_lock(nr_node);
>    215  
>    216          if (quality != 0)
>    217                  strscpy(nr_node->mnemonic, mnemonic);
>    218  
>    219          for (found = 0, i = 0; i < nr_node->count; i++) {
>    220                  if (nr_node->routes[i].neighbour == nr_neigh) {
>    221                          nr_node->routes[i].quality   = quality;
>    222                          nr_node->routes[i].obs_count = obs_count;

Should we call nr_neigh->count++ if we found it?  I guess I don't
really understand what nr_neigh->count is counting...  It would be
really nice if someone added some comments explaining how the ref
counting worked.

>    223                          found = 1;
>    224                          break;
>    225                  }
>    226          }

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V2] netrom: Prevent race conditions between multiple add route
  2025-10-20 12:25         ` Dan Carpenter
  2025-10-20 12:33           ` Dan Carpenter
@ 2025-10-20 12:57           ` Dan Carpenter
  2025-10-20 13:34           ` Lizhi Xu
  2 siblings, 0 replies; 27+ messages in thread
From: Dan Carpenter @ 2025-10-20 12:57 UTC (permalink / raw)
  To: Lizhi Xu
  Cc: davem, edumazet, horms, kuba, linux-hams, linux-kernel, netdev,
	pabeni, syzbot+2860e75836a08b172755, syzkaller-bugs

On Mon, Oct 20, 2025 at 03:25:40PM +0300, Dan Carpenter wrote:
> netrom/nr_route.c
>    214          nr_node_lock(nr_node);

I guess nr_node is different for each thread?

>    215  
>    216          if (quality != 0)
>    217                  strscpy(nr_node->mnemonic, mnemonic);
>    218  
>    219          for (found = 0, i = 0; i < nr_node->count; i++) {
>    220                  if (nr_node->routes[i].neighbour == nr_neigh) {
>    221                          nr_node->routes[i].quality   = quality;
>    222                          nr_node->routes[i].obs_count = obs_count;
>    223                          found = 1;
>    224                          break;
>    225                  }
>    226          }
>    227  
>    228          if (!found) {
>    229                  /* We have space at the bottom, slot it in */
>    230                  if (nr_node->count < 3) {
>    231                          nr_node->routes[2] = nr_node->routes[1];
>    232                          nr_node->routes[1] = nr_node->routes[0];
>    233  
>    234                          nr_node->routes[0].quality   = quality;
>    235                          nr_node->routes[0].obs_count = obs_count;
>    236                          nr_node->routes[0].neighbour = nr_neigh;
>    237  
>    238                          nr_node->which++;
>    239                          nr_node->count++;
>    240                          nr_neigh_hold(nr_neigh);
>    241                          nr_neigh->count++;
>    242                  } else {
>    243                          /* It must be better than the worst */
>    244                          if (quality > nr_node->routes[2].quality) {
>    245                                  nr_node->routes[2].neighbour->count--;
>    246                                  nr_neigh_put(nr_node->routes[2].neighbour);
>    247  
>    248                                  if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)
>    249                                          nr_remove_neigh(nr_node->routes[2].neighbour);

Does neighbour->count means how many nr_node pointers have it in ->routes[]?
I wish this code had comments.
KTDOO: add comments explaining all the counters in netrom/nr_route.c

>    250  
>    251                                  nr_node->routes[2].quality   = quality;
>    252                                  nr_node->routes[2].obs_count = obs_count;
>    253                                  nr_node->routes[2].neighbour = nr_neigh;
>    254  
>    255                                  nr_neigh_hold(nr_neigh);
>    256                                  nr_neigh->count++;

Could we just add some locking to this if statement only?  I had thought
that nr_node_lock() serialized it but now I think maybe not?  Or maybe we
could increment the counters before assigning it?

			nr_neigh->count++;
			nr_neigh_hold(nr_neigh);
			nr_node->routes[2].neighbour = nr_neigh;

I'm not an expert in concerency.  Does calling nr_neigh_hold() mean th
that the nr_neigh->count++ will happen before the assignment?


>    257                          }
>    258                  }
>    259          }

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH V2] netrom: Prevent race conditions between multiple add route
  2025-10-20 12:25         ` Dan Carpenter
  2025-10-20 12:33           ` Dan Carpenter
  2025-10-20 12:57           ` Dan Carpenter
@ 2025-10-20 13:34           ` Lizhi Xu
  2025-10-20 13:49             ` Lizhi Xu
  2 siblings, 1 reply; 27+ messages in thread
From: Lizhi Xu @ 2025-10-20 13:34 UTC (permalink / raw)
  To: dan.carpenter
  Cc: davem, edumazet, horms, kuba, linux-hams, linux-kernel, lizhi.xu,
	netdev, pabeni, syzbot+2860e75836a08b172755, syzkaller-bugs


On Mon, 20 Oct 2025 15:25:40 +0300, Dan Carpenter wrote:
> Task0					Task1						Task2
> =====					=====						=====
> [97] nr_add_node()
> [113] nr_neigh_get_dev()		[97] nr_add_node()
> 					[214] nr_node_lock()
> 					[245] nr_node->routes[2].neighbour->count--
> 					[246] nr_neigh_put(nr_node->routes[2].neighbour);
> 					[248] nr_remove_neigh(nr_node->routes[2].neighbour)
> 					[283] nr_node_unlock()
> [214] nr_node_lock()
> [253] nr_node->routes[2].neighbour = nr_neigh
> [254] nr_neigh_hold(nr_neigh);							[97] nr_add_node()
> 											[XXX] nr_neigh_put()
>                                                                                         ^^^^^^^^^^^^^^^^^^^^
> 
> These charts are supposed to be chronological so [XXX] is wrong because the
> use after free happens on line [248].  Do we really need three threads to
> make this race work?
The UAF problem occurs in Task2. Task1 sets the refcount of nr_neigh to 1,
then Task0 adds it to routes[2]. Task2 releases routes[2].neighbour after
executing [XXX]nr_neigh_put().

BR,
Lizhi

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH V2] netrom: Prevent race conditions between multiple add route
  2025-10-20 13:34           ` Lizhi Xu
@ 2025-10-20 13:49             ` Lizhi Xu
  2025-10-20 17:59               ` Dan Carpenter
  0 siblings, 1 reply; 27+ messages in thread
From: Lizhi Xu @ 2025-10-20 13:49 UTC (permalink / raw)
  To: lizhi.xu
  Cc: dan.carpenter, davem, edumazet, horms, kuba, linux-hams,
	linux-kernel, netdev, pabeni, syzbot+2860e75836a08b172755,
	syzkaller-bugs

On Mon, 20 Oct 2025 21:34:56 +0800, Lizhi Xu wrote:
> > Task0					Task1						Task2
> > =====					=====						=====
> > [97] nr_add_node()
> > [113] nr_neigh_get_dev()		[97] nr_add_node()
> > 					[214] nr_node_lock()
> > 					[245] nr_node->routes[2].neighbour->count--
> > 					[246] nr_neigh_put(nr_node->routes[2].neighbour);
> > 					[248] nr_remove_neigh(nr_node->routes[2].neighbour)
> > 					[283] nr_node_unlock()
> > [214] nr_node_lock()
> > [253] nr_node->routes[2].neighbour = nr_neigh
> > [254] nr_neigh_hold(nr_neigh);							[97] nr_add_node()
> > 											[XXX] nr_neigh_put()
> >                                                                                         ^^^^^^^^^^^^^^^^^^^^
> > 
> > These charts are supposed to be chronological so [XXX] is wrong because the
> > use after free happens on line [248].  Do we really need three threads to
> > make this race work?
> The UAF problem occurs in Task2. Task1 sets the refcount of nr_neigh to 1,
> then Task0 adds it to routes[2]. Task2 releases routes[2].neighbour after
> executing [XXX]nr_neigh_put().
Execution Order:
1 -> Task0
[113] nr_neigh_get_dev() // After execution, the refcount value is 3

2 -> Task1
[246] nr_neigh_put(nr_node->routes[2].neighbour);   // After execution, the refcount value is 2
[248] nr_remove_neigh(nr_node->routes[2].neighbour) // After execution, the refcount value is 1

3 -> Task0
[253] nr_node->routes[2].neighbour = nr_neigh       // nr_neigh's refcount value is 1 and add it to routes[2]

4 -> Task2
[XXX] nr_neigh_put(nr_node->routes[2].neighbour)    // After execution, neighhour is freed
if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)  // Uaf occurs this line when accessing neighbour->count

BR,
Lizhi

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V2] netrom: Prevent race conditions between multiple add route
  2025-10-20 13:49             ` Lizhi Xu
@ 2025-10-20 17:59               ` Dan Carpenter
  2025-10-21  2:05                 ` Lizhi Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Dan Carpenter @ 2025-10-20 17:59 UTC (permalink / raw)
  To: Lizhi Xu
  Cc: davem, edumazet, horms, kuba, linux-hams, linux-kernel, netdev,
	pabeni, syzbot+2860e75836a08b172755, syzkaller-bugs

On Mon, Oct 20, 2025 at 09:49:12PM +0800, Lizhi Xu wrote:
> On Mon, 20 Oct 2025 21:34:56 +0800, Lizhi Xu wrote:
> > > Task0					Task1						Task2
> > > =====					=====						=====
> > > [97] nr_add_node()
> > > [113] nr_neigh_get_dev()		[97] nr_add_node()
> > > 					[214] nr_node_lock()
> > > 					[245] nr_node->routes[2].neighbour->count--
> > > 					[246] nr_neigh_put(nr_node->routes[2].neighbour);
> > > 					[248] nr_remove_neigh(nr_node->routes[2].neighbour)
> > > 					[283] nr_node_unlock()
> > > [214] nr_node_lock()
> > > [253] nr_node->routes[2].neighbour = nr_neigh
> > > [254] nr_neigh_hold(nr_neigh);							[97] nr_add_node()
> > > 											[XXX] nr_neigh_put()
> > >                                                                                         ^^^^^^^^^^^^^^^^^^^^
> > > 
> > > These charts are supposed to be chronological so [XXX] is wrong because the
> > > use after free happens on line [248].  Do we really need three threads to
> > > make this race work?
> > The UAF problem occurs in Task2. Task1 sets the refcount of nr_neigh to 1,
> > then Task0 adds it to routes[2]. Task2 releases routes[2].neighbour after
> > executing [XXX]nr_neigh_put().
> Execution Order:
> 1 -> Task0
> [113] nr_neigh_get_dev() // After execution, the refcount value is 3
> 
> 2 -> Task1
> [246] nr_neigh_put(nr_node->routes[2].neighbour);   // After execution, the refcount value is 2
> [248] nr_remove_neigh(nr_node->routes[2].neighbour) // After execution, the refcount value is 1
> 
> 3 -> Task0
> [253] nr_node->routes[2].neighbour = nr_neigh       // nr_neigh's refcount value is 1 and add it to routes[2]
> 
> 4 -> Task2
> [XXX] nr_neigh_put(nr_node->routes[2].neighbour)    // After execution, neighhour is freed
> if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)  // Uaf occurs this line when accessing neighbour->count

Let's step back a bit and look at the bigger picture design.  (Which is
completely undocumented so we're just guessing).

When we put nr_neigh into nr_node->routes[] we bump the nr_neigh_hold()
reference count and nr_neigh->count++, then when we remove it from
->routes[] we drop the reference and do nr_neigh->count--.

If it's the last reference (and we are not holding ->locked) then we
remove it from the &nr_neigh_list and drop the reference count again and
free it.  So we drop the reference count twice.  This is a complicated
design with three variables: nr_neigh_hold(), nr_neigh->count and
->locked.  Why can it not just be one counter nr_neigh_hold().  So
instead of setting locked = true we would just take an extra reference?
The nr_neigh->count++ would be replaced with nr_neigh_hold() as well.

Because that's fundamentally the problem, right?  We call
nr_neigh_get_dev() so we think we're holding a reference and we're
safe, but we don't realize that calling neighbour->count-- can
result in dropping two references.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V2] netrom: Prevent race conditions between multiple add route
  2025-10-20 17:59               ` Dan Carpenter
@ 2025-10-21  2:05                 ` Lizhi Xu
  2025-10-21  6:36                   ` Dan Carpenter
  0 siblings, 1 reply; 27+ messages in thread
From: Lizhi Xu @ 2025-10-21  2:05 UTC (permalink / raw)
  To: dan.carpenter
  Cc: lizhi.xu, davem, edumazet, horms, kuba, linux-hams, linux-kernel,
	netdev, pabeni, syzbot+2860e75836a08b172755, syzkaller-bugs

On Mon, 20 Oct 2025 20:59:24 +0300, Dan Carpenter wrote:
> On Mon, Oct 20, 2025 at 09:49:12PM +0800, Lizhi Xu wrote:
> > On Mon, 20 Oct 2025 21:34:56 +0800, Lizhi Xu wrote:
> > > > Task0					Task1						Task2
> > > > =====					=====						=====
> > > > [97] nr_add_node()
> > > > [113] nr_neigh_get_dev()		[97] nr_add_node()
> > > > 					[214] nr_node_lock()
> > > > 					[245] nr_node->routes[2].neighbour->count--
> > > > 					[246] nr_neigh_put(nr_node->routes[2].neighbour);
> > > > 					[248] nr_remove_neigh(nr_node->routes[2].neighbour)
> > > > 					[283] nr_node_unlock()
> > > > [214] nr_node_lock()
> > > > [253] nr_node->routes[2].neighbour = nr_neigh
> > > > [254] nr_neigh_hold(nr_neigh);							[97] nr_add_node()
> > > > 											[XXX] nr_neigh_put()
> > > >                                                                                         ^^^^^^^^^^^^^^^^^^^^
> > > >
> > > > These charts are supposed to be chronological so [XXX] is wrong because the
> > > > use after free happens on line [248].  Do we really need three threads to
> > > > make this race work?
> > > The UAF problem occurs in Task2. Task1 sets the refcount of nr_neigh to 1,
> > > then Task0 adds it to routes[2]. Task2 releases routes[2].neighbour after
> > > executing [XXX]nr_neigh_put().
> > Execution Order:
> > 1 -> Task0
> > [113] nr_neigh_get_dev() // After execution, the refcount value is 3
> >
> > 2 -> Task1
> > [246] nr_neigh_put(nr_node->routes[2].neighbour);   // After execution, the refcount value is 2
> > [248] nr_remove_neigh(nr_node->routes[2].neighbour) // After execution, the refcount value is 1
> >
> > 3 -> Task0
> > [253] nr_node->routes[2].neighbour = nr_neigh       // nr_neigh's refcount value is 1 and add it to routes[2]
> >
> > 4 -> Task2
> > [XXX] nr_neigh_put(nr_node->routes[2].neighbour)    // After execution, neighhour is freed
> > if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)  // Uaf occurs this line when accessing neighbour->count
> 
> Let's step back a bit and look at the bigger picture design.  (Which is
> completely undocumented so we're just guessing).
> 
> When we put nr_neigh into nr_node->routes[] we bump the nr_neigh_hold()
> reference count and nr_neigh->count++, then when we remove it from
> ->routes[] we drop the reference and do nr_neigh->count--.
> 
> If it's the last reference (and we are not holding ->locked) then we
> remove it from the &nr_neigh_list and drop the reference count again and
> free it.  So we drop the reference count twice.  This is a complicated
> design with three variables: nr_neigh_hold(), nr_neigh->count and
> ->locked.  Why can it not just be one counter nr_neigh_hold().  So
> instead of setting locked = true we would just take an extra reference?
> The nr_neigh->count++ would be replaced with nr_neigh_hold() as well.
locked controls whether the neighbor quality can be automatically updated;
count controls the number of different routes a neighbor is linked to;
refcount is simply used to manage the neighbor lifecycle.
> 
> Because that's fundamentally the problem, right?  We call
> nr_neigh_get_dev() so we think we're holding a reference and we're
> safe, but we don't realize that calling neighbour->count-- can
> result in dropping two references.
After nr_neigh_get_dev() retrieves a neighbor, there shouldn't be an
unfinished nr_add_node() call operating on the neighbor in the route.
Therefore, we need to use a lock before the nr_neigh_get_dev() operation
begins to ensure that the neighbor is added atomically to the routing table.

BR,
Lizhi

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V2] netrom: Prevent race conditions between multiple add route
  2025-10-21  2:05                 ` Lizhi Xu
@ 2025-10-21  6:36                   ` Dan Carpenter
  2025-10-21  8:34                     ` Lizhi Xu
  2025-10-21  8:35                     ` [PATCH V3] netrom: Prevent race conditions between neighbor operations Lizhi Xu
  0 siblings, 2 replies; 27+ messages in thread
From: Dan Carpenter @ 2025-10-21  6:36 UTC (permalink / raw)
  To: Lizhi Xu
  Cc: davem, edumazet, horms, kuba, linux-hams, linux-kernel, netdev,
	pabeni, syzbot+2860e75836a08b172755, syzkaller-bugs

On Tue, Oct 21, 2025 at 10:05:33AM +0800, Lizhi Xu wrote:
> On Mon, 20 Oct 2025 20:59:24 +0300, Dan Carpenter wrote:
> > On Mon, Oct 20, 2025 at 09:49:12PM +0800, Lizhi Xu wrote:
> > > On Mon, 20 Oct 2025 21:34:56 +0800, Lizhi Xu wrote:
> > > > > Task0					Task1						Task2
> > > > > =====					=====						=====
> > > > > [97] nr_add_node()
> > > > > [113] nr_neigh_get_dev()		[97] nr_add_node()
> > > > > 					[214] nr_node_lock()
> > > > > 					[245] nr_node->routes[2].neighbour->count--
> > > > > 					[246] nr_neigh_put(nr_node->routes[2].neighbour);
> > > > > 					[248] nr_remove_neigh(nr_node->routes[2].neighbour)
> > > > > 					[283] nr_node_unlock()
> > > > > [214] nr_node_lock()
> > > > > [253] nr_node->routes[2].neighbour = nr_neigh
> > > > > [254] nr_neigh_hold(nr_neigh);							[97] nr_add_node()
> > > > > 											[XXX] nr_neigh_put()
> > > > >                                                                                         ^^^^^^^^^^^^^^^^^^^^
> > > > >
> > > > > These charts are supposed to be chronological so [XXX] is wrong because the
> > > > > use after free happens on line [248].  Do we really need three threads to
> > > > > make this race work?
> > > > The UAF problem occurs in Task2. Task1 sets the refcount of nr_neigh to 1,
> > > > then Task0 adds it to routes[2]. Task2 releases routes[2].neighbour after
> > > > executing [XXX]nr_neigh_put().
> > > Execution Order:
> > > 1 -> Task0
> > > [113] nr_neigh_get_dev() // After execution, the refcount value is 3
> > >
> > > 2 -> Task1
> > > [246] nr_neigh_put(nr_node->routes[2].neighbour);   // After execution, the refcount value is 2
> > > [248] nr_remove_neigh(nr_node->routes[2].neighbour) // After execution, the refcount value is 1
> > >
> > > 3 -> Task0
> > > [253] nr_node->routes[2].neighbour = nr_neigh       // nr_neigh's refcount value is 1 and add it to routes[2]
> > >
> > > 4 -> Task2
> > > [XXX] nr_neigh_put(nr_node->routes[2].neighbour)    // After execution, neighhour is freed
> > > if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)  // Uaf occurs this line when accessing neighbour->count
> > 
> > Let's step back a bit and look at the bigger picture design.  (Which is
> > completely undocumented so we're just guessing).
> > 
> > When we put nr_neigh into nr_node->routes[] we bump the nr_neigh_hold()
> > reference count and nr_neigh->count++, then when we remove it from
> > ->routes[] we drop the reference and do nr_neigh->count--.
> > 
> > If it's the last reference (and we are not holding ->locked) then we
> > remove it from the &nr_neigh_list and drop the reference count again and
> > free it.  So we drop the reference count twice.  This is a complicated
> > design with three variables: nr_neigh_hold(), nr_neigh->count and
> > ->locked.  Why can it not just be one counter nr_neigh_hold().  So
> > instead of setting locked = true we would just take an extra reference?
> > The nr_neigh->count++ would be replaced with nr_neigh_hold() as well.
> locked controls whether the neighbor quality can be automatically updated;

I'm not sure your patch fixes the bug because we could still race against
nr_del_node().

I'm not saying get rid of locked completely, I'm saying get rid of code like
this:
		if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)
			nr_remove_neigh(nr_node->routes[2].neighbour);

Right now, locked serves as a special kind of reference count, because we
don't drop the reference if it's true.

> count controls the number of different routes a neighbor is linked to;

Sure, that is interesting information for the user, so keep it around to
print in the proc file, but don't use it as a reference count.

> refcount is simply used to manage the neighbor lifecycle.

The bug is caused because our reference counting is bad.

So right now what happens is we allocate nr_neigh and we put it on the
&nr_neigh_list.  Then we lock it or we add it to ->routes[] and each of
those has a different reference count.  Then when we drop those references
we do:

		if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)
			nr_remove_neigh(nr_node->routes[2].neighbour);

This removes it from the list, and hopefully this is the last reference
and it frees it.

It would be much simpler to say, we only use nr_neigh_hold()/put() for
reference counting.  When we set locked we do:

	nr_neigh_hold(nr_neigh);
	nr_neigh->locked  = true;

Incrementing the refcount means it can't be freed.

Then when we remove nr_neigh from ->routes[] we wouldn't "remove it from
the list", instead we would just drop a reference.  When we dropped the
last reference, nr_neigh_put() would remove it from the list.

My proposal would be a behavior change because right now what happens is:

1: allocate nr_neigh
2: add it to ->routes[]
3: remove it from ->routes[]
   (freed automatically because we drop two references)

Now it would be:
1: allocate nr_neigh
2: add it to ->routes[]
3: remove it from ->routes[]
4: needs to be freed manually with nr_del_neigh().

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V2] netrom: Prevent race conditions between multiple add route
  2025-10-21  6:36                   ` Dan Carpenter
@ 2025-10-21  8:34                     ` Lizhi Xu
  2025-10-21  8:35                     ` [PATCH V3] netrom: Prevent race conditions between neighbor operations Lizhi Xu
  1 sibling, 0 replies; 27+ messages in thread
From: Lizhi Xu @ 2025-10-21  8:34 UTC (permalink / raw)
  To: dan.carpenter
  Cc: lizhi.xu, davem, edumazet, horms, kuba, linux-hams, linux-kernel,
	netdev, pabeni, syzbot+2860e75836a08b172755, syzkaller-bugs

On Tue, 21 Oct 2025 09:36:47 +0300, Dan Carpenter wrote:
> On Tue, Oct 21, 2025 at 10:05:33AM +0800, Lizhi Xu wrote:
> > On Mon, 20 Oct 2025 20:59:24 +0300, Dan Carpenter wrote:
> > > On Mon, Oct 20, 2025 at 09:49:12PM +0800, Lizhi Xu wrote:
> > > > On Mon, 20 Oct 2025 21:34:56 +0800, Lizhi Xu wrote:
> > > > > > Task0					Task1						Task2
> > > > > > =====					=====						=====
> > > > > > [97] nr_add_node()
> > > > > > [113] nr_neigh_get_dev()		[97] nr_add_node()
> > > > > > 					[214] nr_node_lock()
> > > > > > 					[245] nr_node->routes[2].neighbour->count--
> > > > > > 					[246] nr_neigh_put(nr_node->routes[2].neighbour);
> > > > > > 					[248] nr_remove_neigh(nr_node->routes[2].neighbour)
> > > > > > 					[283] nr_node_unlock()
> > > > > > [214] nr_node_lock()
> > > > > > [253] nr_node->routes[2].neighbour = nr_neigh
> > > > > > [254] nr_neigh_hold(nr_neigh);							[97] nr_add_node()
> > > > > > 											[XXX] nr_neigh_put()
> > > > > >                                                                                         ^^^^^^^^^^^^^^^^^^^^
> > > > > >
> > > > > > These charts are supposed to be chronological so [XXX] is wrong because the
> > > > > > use after free happens on line [248].  Do we really need three threads to
> > > > > > make this race work?
> > > > > The UAF problem occurs in Task2. Task1 sets the refcount of nr_neigh to 1,
> > > > > then Task0 adds it to routes[2]. Task2 releases routes[2].neighbour after
> > > > > executing [XXX]nr_neigh_put().
> > > > Execution Order:
> > > > 1 -> Task0
> > > > [113] nr_neigh_get_dev() // After execution, the refcount value is 3
> > > >
> > > > 2 -> Task1
> > > > [246] nr_neigh_put(nr_node->routes[2].neighbour);   // After execution, the refcount value is 2
> > > > [248] nr_remove_neigh(nr_node->routes[2].neighbour) // After execution, the refcount value is 1
> > > >
> > > > 3 -> Task0
> > > > [253] nr_node->routes[2].neighbour = nr_neigh       // nr_neigh's refcount value is 1 and add it to routes[2]
> > > >
> > > > 4 -> Task2
> > > > [XXX] nr_neigh_put(nr_node->routes[2].neighbour)    // After execution, neighhour is freed
> > > > if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)  // Uaf occurs this line when accessing neighbour->count
> > >
> > > Let's step back a bit and look at the bigger picture design.  (Which is
> > > completely undocumented so we're just guessing).
> > >
> > > When we put nr_neigh into nr_node->routes[] we bump the nr_neigh_hold()
> > > reference count and nr_neigh->count++, then when we remove it from
> > > ->routes[] we drop the reference and do nr_neigh->count--.
> > >
> > > If it's the last reference (and we are not holding ->locked) then we
> > > remove it from the &nr_neigh_list and drop the reference count again and
> > > free it.  So we drop the reference count twice.  This is a complicated
> > > design with three variables: nr_neigh_hold(), nr_neigh->count and
> > > ->locked.  Why can it not just be one counter nr_neigh_hold().  So
> > > instead of setting locked = true we would just take an extra reference?
> > > The nr_neigh->count++ would be replaced with nr_neigh_hold() as well.
> > locked controls whether the neighbor quality can be automatically updated;
>
> I'm not sure your patch fixes the bug because we could still race against
> nr_del_node().
This is fine in this issue, but for rigor, I'll add locks to all related
ioctl and route frame operations to maintain synchronization.
I will send V3 patch to improve it.
> 
> I'm not saying get rid of locked completely, I'm saying get rid of code like
> this:
> 		if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)
> 			nr_remove_neigh(nr_node->routes[2].neighbour);
> 
> Right now, locked serves as a special kind of reference count, because we
> don't drop the reference if it's true.
I don't think this is correct.
> 
> > count controls the number of different routes a neighbor is linked to;
> 
> Sure, that is interesting information for the user, so keep it around to
> print in the proc file, but don't use it as a reference count.
> 
> > refcount is simply used to manage the neighbor lifecycle.
> 
> The bug is caused because our reference counting is bad.
> 
> So right now what happens is we allocate nr_neigh and we put it on the
> &nr_neigh_list.  Then we lock it or we add it to ->routes[] and each of
> those has a different reference count.  Then when we drop those references
> we do:
> 
> 		if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)
> 			nr_remove_neigh(nr_node->routes[2].neighbour);
> 
> This removes it from the list, and hopefully this is the last reference
> and it frees it.
> 
> It would be much simpler to say, we only use nr_neigh_hold()/put() for
> reference counting.  When we set locked we do:
> 
> 	nr_neigh_hold(nr_neigh);
> 	nr_neigh->locked  = true;
> 
> Incrementing the refcount means it can't be freed.
No, setting locked = 1 is only done in nr_add_neigh(), and nr_neigh_hold()
is not executed, and the refcount value is 1.
> 
> Then when we remove nr_neigh from ->routes[] we wouldn't "remove it from
> the list", instead we would just drop a reference.  When we dropped the
> last reference, nr_neigh_put() would remove it from the list.
> 
> My proposal would be a behavior change because right now what happens is:
> 
> 1: allocate nr_neigh
> 2: add it to ->routes[]
> 3: remove it from ->routes[]
>    (freed automatically because we drop two references)
No, No, I know where your analysis went wrong, it is here.

The problem is not when allocating neigh and adding it to routes[2],
but when nr_add_node is executed twice later, one is Task0 as I mentioned
above, and the other is Task1.
After Task1 moves the neighbor out of routes, Task0 uses nr_neigh_get_dev()
to get the neighbor that Task1 moved out, and then adds it to its routes.
This is wrong. Task0 should not use nr_neigh_get_dev() to obtain the neighbor
before other tasks move it out. This will interfere with the reference count
of the neighbor, which is the root cause of the problem.

BR,
Lizhi

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH V3] netrom: Prevent race conditions between neighbor operations
  2025-10-21  6:36                   ` Dan Carpenter
  2025-10-21  8:34                     ` Lizhi Xu
@ 2025-10-21  8:35                     ` Lizhi Xu
  2025-10-23 11:44                       ` Paolo Abeni
  1 sibling, 1 reply; 27+ messages in thread
From: Lizhi Xu @ 2025-10-21  8:35 UTC (permalink / raw)
  To: dan.carpenter
  Cc: davem, edumazet, horms, kuba, linux-hams, linux-kernel, lizhi.xu,
	netdev, pabeni, syzbot+2860e75836a08b172755, syzkaller-bugs

The root cause of the problem is that multiple different tasks initiate
SIOCADDRT & NETROM_NODE commands to add new routes, there is no lock
between them to protect the same nr_neigh.

Task0 can add the nr_neigh.refcount value of 1 on Task1 to routes[2].
When Task2 executes nr_neigh_put(nr_node->routes[2].neighbour), it will
release the neighbour because its refcount value is 1.

In this case, the following situation causes a UAF on Task2:

Task0					Task1						Task2
=====					=====						=====
nr_add_node()
nr_neigh_get_dev()			nr_add_node()
					nr_node_lock()
					nr_node->routes[2].neighbour->count--
					nr_neigh_put(nr_node->routes[2].neighbour);
					nr_remove_neigh(nr_node->routes[2].neighbour)
					nr_node_unlock()
nr_node_lock()
nr_node->routes[2].neighbour = nr_neigh
nr_neigh_hold(nr_neigh);								nr_add_node()
											nr_neigh_put()
											if (nr_node->routes[2].neighbour->count
Description of the UAF triggering process:
First, Task 0 executes nr_neigh_get_dev() to set neighbor refcount to 3.
Then, Task 1 puts the same neighbor from its routes[2] and executes
nr_remove_neigh() because the count is 0. After these two operations,
the neighbor's refcount becomes 1. Then, Task 0 acquires the nr node
lock and writes it to its routes[2].neighbour.
Finally, Task 2 executes nr_neigh_put(nr_node->routes[2].neighbour) to
release the neighbor. The subsequent execution of the neighbor->count
check triggers a UAF.

The solution to the problem is to use a lock to synchronize each add a
route to node, but for rigor, I'll add locks to related ioctl and route
frame operations to maintain synchronization.

syzbot reported:
BUG: KASAN: slab-use-after-free in nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
Read of size 4 at addr ffff888051e6e9b0 by task syz.1.2539/8741

Call Trace:
 <TASK>
 nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248

Reported-by: syzbot+2860e75836a08b172755@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2860e75836a08b172755
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
---
V1 -> V2: update comments for cause uaf
V2 -> V3: sync neighbor operations in ioctl and route frame, update comments

 net/netrom/nr_route.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
index b94cb2ffbaf8..debe3e925338 100644
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -40,6 +40,7 @@ static HLIST_HEAD(nr_node_list);
 static DEFINE_SPINLOCK(nr_node_list_lock);
 static HLIST_HEAD(nr_neigh_list);
 static DEFINE_SPINLOCK(nr_neigh_list_lock);
+static DEFINE_MUTEX(neighbor_lock);
 
 static struct nr_node *nr_node_get(ax25_address *callsign)
 {
@@ -633,6 +634,8 @@ int nr_rt_ioctl(unsigned int cmd, void __user *arg)
 	ax25_digi digi;
 	int ret;
 
+	guard(mutex)(&neighbor_lock);
+
 	switch (cmd) {
 	case SIOCADDRT:
 		if (copy_from_user(&nr_route, arg, sizeof(struct nr_route_struct)))
@@ -765,6 +768,7 @@ int nr_route_frame(struct sk_buff *skb, ax25_cb *ax25)
 	nr_dest = (ax25_address *)(skb->data + 7);
 
 	if (ax25 != NULL) {
+		guard(mutex)(&neighbor_lock);
 		ret = nr_add_node(nr_src, "", &ax25->dest_addr, ax25->digipeat,
 				  ax25->ax25_dev->dev, 0,
 				  READ_ONCE(sysctl_netrom_obsolescence_count_initialiser));
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH V3] netrom: Prevent race conditions between neighbor operations
  2025-10-21  8:35                     ` [PATCH V3] netrom: Prevent race conditions between neighbor operations Lizhi Xu
@ 2025-10-23 11:44                       ` Paolo Abeni
  2025-10-23 11:54                         ` Eric Dumazet
                                           ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Paolo Abeni @ 2025-10-23 11:44 UTC (permalink / raw)
  To: Lizhi Xu, dan.carpenter
  Cc: davem, edumazet, horms, kuba, linux-hams, linux-kernel, netdev,
	syzbot+2860e75836a08b172755, syzkaller-bugs

On 10/21/25 10:35 AM, Lizhi Xu wrote:
> The root cause of the problem is that multiple different tasks initiate
> SIOCADDRT & NETROM_NODE commands to add new routes, there is no lock
> between them to protect the same nr_neigh.
> 
> Task0 can add the nr_neigh.refcount value of 1 on Task1 to routes[2].
> When Task2 executes nr_neigh_put(nr_node->routes[2].neighbour), it will
> release the neighbour because its refcount value is 1.
> 
> In this case, the following situation causes a UAF on Task2:
> 
> Task0					Task1						Task2
> =====					=====						=====
> nr_add_node()
> nr_neigh_get_dev()			nr_add_node()
> 					nr_node_lock()
> 					nr_node->routes[2].neighbour->count--
> 					nr_neigh_put(nr_node->routes[2].neighbour);
> 					nr_remove_neigh(nr_node->routes[2].neighbour)
> 					nr_node_unlock()
> nr_node_lock()
> nr_node->routes[2].neighbour = nr_neigh
> nr_neigh_hold(nr_neigh);								nr_add_node()
> 											nr_neigh_put()
> 											if (nr_node->routes[2].neighbour->count
> Description of the UAF triggering process:
> First, Task 0 executes nr_neigh_get_dev() to set neighbor refcount to 3.
> Then, Task 1 puts the same neighbor from its routes[2] and executes
> nr_remove_neigh() because the count is 0. After these two operations,
> the neighbor's refcount becomes 1. Then, Task 0 acquires the nr node
> lock and writes it to its routes[2].neighbour.
> Finally, Task 2 executes nr_neigh_put(nr_node->routes[2].neighbour) to
> release the neighbor. The subsequent execution of the neighbor->count
> check triggers a UAF.
> 
> The solution to the problem is to use a lock to synchronize each add a
> route to node, but for rigor, I'll add locks to related ioctl and route
> frame operations to maintain synchronization.

I think that adding another locking mechanism on top of an already
complex and not well understood locking and reference infra is not the
right direction.

Why reordering the statements as:

	if (nr_node->routes[2].neighbour->count == 0 &&
!nr_node->routes[2].neighbour->locked)
		nr_remove_neigh(nr_node->routes[2].neighbour);
	nr_neigh_put(nr_node->routes[2].neighbour);

is not enough?

> syzbot reported:
> BUG: KASAN: slab-use-after-free in nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
> Read of size 4 at addr ffff888051e6e9b0 by task syz.1.2539/8741
> 
> Call Trace:
>  <TASK>
>  nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
> 
> Reported-by: syzbot+2860e75836a08b172755@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=2860e75836a08b172755
> Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>



> ---
> V1 -> V2: update comments for cause uaf
> V2 -> V3: sync neighbor operations in ioctl and route frame, update comments
> 
>  net/netrom/nr_route.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
> index b94cb2ffbaf8..debe3e925338 100644
> --- a/net/netrom/nr_route.c
> +++ b/net/netrom/nr_route.c
> @@ -40,6 +40,7 @@ static HLIST_HEAD(nr_node_list);
>  static DEFINE_SPINLOCK(nr_node_list_lock);
>  static HLIST_HEAD(nr_neigh_list);
>  static DEFINE_SPINLOCK(nr_neigh_list_lock);
> +static DEFINE_MUTEX(neighbor_lock);
>  
>  static struct nr_node *nr_node_get(ax25_address *callsign)
>  {
> @@ -633,6 +634,8 @@ int nr_rt_ioctl(unsigned int cmd, void __user *arg)
>  	ax25_digi digi;
>  	int ret;
>  
> +	guard(mutex)(&neighbor_lock);

See:

https://elixir.bootlin.com/linux/v6.18-rc1/source/Documentation/process/maintainer-netdev.rst#L395

/P


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V3] netrom: Prevent race conditions between neighbor operations
  2025-10-23 11:44                       ` Paolo Abeni
@ 2025-10-23 11:54                         ` Eric Dumazet
  2025-10-23 12:41                         ` Lizhi Xu
  2025-10-24 10:45                         ` [PATCH V3] netrom: Prevent race conditions between neighbor operations Dan Carpenter
  2 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2025-10-23 11:54 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Lizhi Xu, dan.carpenter, davem, horms, kuba, linux-hams,
	linux-kernel, netdev, syzbot+2860e75836a08b172755, syzkaller-bugs

On Thu, Oct 23, 2025 at 4:44 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 10/21/25 10:35 AM, Lizhi Xu wrote:
> > The root cause of the problem is that multiple different tasks initiate
> > SIOCADDRT & NETROM_NODE commands to add new routes, there is no lock
> > between them to protect the same nr_neigh.
> >
> > Task0 can add the nr_neigh.refcount value of 1 on Task1 to routes[2].
> > When Task2 executes nr_neigh_put(nr_node->routes[2].neighbour), it will
> > release the neighbour because its refcount value is 1.
> >
> > In this case, the following situation causes a UAF on Task2:
> >
> > Task0                                 Task1                                           Task2
> > =====                                 =====                                           =====
> > nr_add_node()
> > nr_neigh_get_dev()                    nr_add_node()
> >                                       nr_node_lock()
> >                                       nr_node->routes[2].neighbour->count--
> >                                       nr_neigh_put(nr_node->routes[2].neighbour);
> >                                       nr_remove_neigh(nr_node->routes[2].neighbour)
> >                                       nr_node_unlock()
> > nr_node_lock()
> > nr_node->routes[2].neighbour = nr_neigh
> > nr_neigh_hold(nr_neigh);                                                              nr_add_node()
> >                                                                                       nr_neigh_put()
> >                                                                                       if (nr_node->routes[2].neighbour->count
> > Description of the UAF triggering process:
> > First, Task 0 executes nr_neigh_get_dev() to set neighbor refcount to 3.
> > Then, Task 1 puts the same neighbor from its routes[2] and executes
> > nr_remove_neigh() because the count is 0. After these two operations,
> > the neighbor's refcount becomes 1. Then, Task 0 acquires the nr node
> > lock and writes it to its routes[2].neighbour.
> > Finally, Task 2 executes nr_neigh_put(nr_node->routes[2].neighbour) to
> > release the neighbor. The subsequent execution of the neighbor->count
> > check triggers a UAF.
> >
> > The solution to the problem is to use a lock to synchronize each add a
> > route to node, but for rigor, I'll add locks to related ioctl and route
> > frame operations to maintain synchronization.
>
> I think that adding another locking mechanism on top of an already
> complex and not well understood locking and reference infra is not the
> right direction.
>
> Why reordering the statements as:
>
>         if (nr_node->routes[2].neighbour->count == 0 &&
> !nr_node->routes[2].neighbour->locked)
>                 nr_remove_neigh(nr_node->routes[2].neighbour);
>         nr_neigh_put(nr_node->routes[2].neighbour);
>
> is not enough?
>
> > syzbot reported:
> > BUG: KASAN: slab-use-after-free in nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
> > Read of size 4 at addr ffff888051e6e9b0 by task syz.1.2539/8741
> >
> > Call Trace:
> >  <TASK>
> >  nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
> >
> > Reported-by: syzbot+2860e75836a08b172755@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=2860e75836a08b172755
> > Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
>
>
>
> > ---
> > V1 -> V2: update comments for cause uaf
> > V2 -> V3: sync neighbor operations in ioctl and route frame, update comments
> >
> >  net/netrom/nr_route.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
> > index b94cb2ffbaf8..debe3e925338 100644
> > --- a/net/netrom/nr_route.c
> > +++ b/net/netrom/nr_route.c
> > @@ -40,6 +40,7 @@ static HLIST_HEAD(nr_node_list);
> >  static DEFINE_SPINLOCK(nr_node_list_lock);
> >  static HLIST_HEAD(nr_neigh_list);
> >  static DEFINE_SPINLOCK(nr_neigh_list_lock);
> > +static DEFINE_MUTEX(neighbor_lock);
> >
> >  static struct nr_node *nr_node_get(ax25_address *callsign)
> >  {
> > @@ -633,6 +634,8 @@ int nr_rt_ioctl(unsigned int cmd, void __user *arg)
> >       ax25_digi digi;
> >       int ret;
> >
> > +     guard(mutex)(&neighbor_lock);
>
> See:
>
> https://elixir.bootlin.com/linux/v6.18-rc1/source/Documentation/process/maintainer-netdev.rst#L395
>

I would also try to use a single spinlock : ie fuse together
nr_node_list_lock and nr_neigh_list_lock

Having two locks for something that is primarily used by fuzzers
nowadays is wasting our time.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V3] netrom: Prevent race conditions between neighbor operations
  2025-10-23 11:44                       ` Paolo Abeni
  2025-10-23 11:54                         ` Eric Dumazet
@ 2025-10-23 12:41                         ` Lizhi Xu
  2025-10-23 13:50                           ` [PATCH V4] netrom: Preventing the use of abnormal neighbor Lizhi Xu
  2025-10-24 10:45                         ` [PATCH V3] netrom: Prevent race conditions between neighbor operations Dan Carpenter
  2 siblings, 1 reply; 27+ messages in thread
From: Lizhi Xu @ 2025-10-23 12:41 UTC (permalink / raw)
  To: pabeni
  Cc: dan.carpenter, davem, edumazet, horms, kuba, linux-hams,
	linux-kernel, lizhi.xu, netdev, syzbot+2860e75836a08b172755,
	syzkaller-bugs

On Thu, 23 Oct 2025 13:44:18 +0200, Paolo Abeni wrote:
> > The root cause of the problem is that multiple different tasks initiate
> > SIOCADDRT & NETROM_NODE commands to add new routes, there is no lock
> > between them to protect the same nr_neigh.
> >
> > Task0 can add the nr_neigh.refcount value of 1 on Task1 to routes[2].
> > When Task2 executes nr_neigh_put(nr_node->routes[2].neighbour), it will
> > release the neighbour because its refcount value is 1.
> >
> > In this case, the following situation causes a UAF on Task2:
> >
> > Task0					Task1						Task2
> > =====					=====						=====
> > nr_add_node()
> > nr_neigh_get_dev()			nr_add_node()
> > 					nr_node_lock()
> > 					nr_node->routes[2].neighbour->count--
> > 					nr_neigh_put(nr_node->routes[2].neighbour);
> > 					nr_remove_neigh(nr_node->routes[2].neighbour)
> > 					nr_node_unlock()
> > nr_node_lock()
> > nr_node->routes[2].neighbour = nr_neigh
> > nr_neigh_hold(nr_neigh);								nr_add_node()
> > 											nr_neigh_put()
> > 											if (nr_node->routes[2].neighbour->count
> > Description of the UAF triggering process:
> > First, Task 0 executes nr_neigh_get_dev() to set neighbor refcount to 3.
> > Then, Task 1 puts the same neighbor from its routes[2] and executes
> > nr_remove_neigh() because the count is 0. After these two operations,
> > the neighbor's refcount becomes 1. Then, Task 0 acquires the nr node
> > lock and writes it to its routes[2].neighbour.
> > Finally, Task 2 executes nr_neigh_put(nr_node->routes[2].neighbour) to
> > release the neighbor. The subsequent execution of the neighbor->count
> > check triggers a UAF.
> >
> > The solution to the problem is to use a lock to synchronize each add a
> > route to node, but for rigor, I'll add locks to related ioctl and route
> > frame operations to maintain synchronization.
> 
> I think that adding another locking mechanism on top of an already
> complex and not well understood locking and reference infra is not the
> right direction.
> 
> Why reordering the statements as:
> 
> 	if (nr_node->routes[2].neighbour->count == 0 &&
> !nr_node->routes[2].neighbour->locked)
> 		nr_remove_neigh(nr_node->routes[2].neighbour);
> 	nr_neigh_put(nr_node->routes[2].neighbour);
> 
> is not enough?
This is not enough, the same uaf will appear, nr_remove_neigh() will also
execute nr_neigh_put(), and then executing nr_neigh_put() again will
trigger the uaf.
> 
> > syzbot reported:
> > BUG: KASAN: slab-use-after-free in nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
> > Read of size 4 at addr ffff888051e6e9b0 by task syz.1.2539/8741
> >
> > Call Trace:
> >  <TASK>
> >  nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
> >
> > Reported-by: syzbot+2860e75836a08b172755@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=2860e75836a08b172755
> > Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
> 
> 
> 
> > ---
> > V1 -> V2: update comments for cause uaf
> > V2 -> V3: sync neighbor operations in ioctl and route frame, update comments
> >
> >  net/netrom/nr_route.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
> > index b94cb2ffbaf8..debe3e925338 100644
> > --- a/net/netrom/nr_route.c
> > +++ b/net/netrom/nr_route.c
> > @@ -40,6 +40,7 @@ static HLIST_HEAD(nr_node_list);
> >  static DEFINE_SPINLOCK(nr_node_list_lock);
> >  static HLIST_HEAD(nr_neigh_list);
> >  static DEFINE_SPINLOCK(nr_neigh_list_lock);
> > +static DEFINE_MUTEX(neighbor_lock);
> >
> >  static struct nr_node *nr_node_get(ax25_address *callsign)
> >  {
> > @@ -633,6 +634,8 @@ int nr_rt_ioctl(unsigned int cmd, void __user *arg)
> >  	ax25_digi digi;
> >  	int ret;
> >
> > +	guard(mutex)(&neighbor_lock);
> 
> See:
> 
> https://elixir.bootlin.com/linux/v6.18-rc1/source/Documentation/process/maintainer-netdev.rst#L395
Using guard is not recommended. I'll reconsider.

BR,
Lizhi

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH V4] netrom: Preventing the use of abnormal neighbor
  2025-10-23 12:41                         ` Lizhi Xu
@ 2025-10-23 13:50                           ` Lizhi Xu
  2025-10-28 14:13                             ` Paolo Abeni
  0 siblings, 1 reply; 27+ messages in thread
From: Lizhi Xu @ 2025-10-23 13:50 UTC (permalink / raw)
  To: lizhi.xu
  Cc: dan.carpenter, davem, edumazet, horms, kuba, linux-hams,
	linux-kernel, netdev, pabeni, syzbot+2860e75836a08b172755,
	syzkaller-bugs

The root cause of the problem is that multiple different tasks initiate
SIOCADDRT & NETROM_NODE commands to add new routes, there is no lock
between them to protect the same nr_neigh.

Task0 can add the nr_neigh.refcount value of 1 on Task1 to routes[2].
When Task2 executes nr_neigh_put(nr_node->routes[2].neighbour), it will
release the neighbour because its refcount value is 1.

In this case, the following situation causes a UAF on Task2:

Task0					Task1						Task2
=====					=====						=====
nr_add_node()
nr_neigh_get_dev()			nr_add_node()
					nr_node_lock()
					nr_node->routes[2].neighbour->count--
					nr_neigh_put(nr_node->routes[2].neighbour);
					nr_remove_neigh(nr_node->routes[2].neighbour)
					nr_node_unlock()
nr_node_lock()
nr_node->routes[2].neighbour = nr_neigh
nr_neigh_hold(nr_neigh);								nr_add_node()
											nr_neigh_put()
											if (nr_node->routes[2].neighbour->count
Description of the UAF triggering process:
First, Task 0 executes nr_neigh_get_dev() to set neighbor refcount to 3.
Then, Task 1 puts the same neighbor from its routes[2] and executes
nr_remove_neigh() because the count is 0. After these two operations,
the neighbor's refcount becomes 1. Then, Task 0 acquires the nr node
lock and writes it to its routes[2].neighbour.
Finally, Task 2 executes nr_neigh_put(nr_node->routes[2].neighbour) to
release the neighbor. The subsequent execution of the neighbor->count
check triggers a UAF.

Filter out neighbors with a refcount of 1 to avoid unsafe conditions.

syzbot reported:
BUG: KASAN: slab-use-after-free in nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
Read of size 4 at addr ffff888051e6e9b0 by task syz.1.2539/8741

Call Trace:
 <TASK>
 nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248

Reported-by: syzbot+2860e75836a08b172755@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2860e75836a08b172755
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
---
V1 -> V2: update comments for cause uaf
V2 -> V3: sync neighbor operations in ioctl and route frame, update comments
V3 -> V4: Preventing the use of neighbors with a reference count of 1

 net/netrom/nr_route.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
index b94cb2ffbaf8..1ef2743a5ec0 100644
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -100,7 +100,7 @@ static int __must_check nr_add_node(ax25_address *nr, const char *mnemonic,
 {
 	struct nr_node  *nr_node;
 	struct nr_neigh *nr_neigh;
-	int i, found;
+	int i, found, ret = 0;
 	struct net_device *odev;
 
 	if ((odev=nr_dev_get(nr)) != NULL) {	/* Can't add routes to ourself */
@@ -212,6 +212,10 @@ static int __must_check nr_add_node(ax25_address *nr, const char *mnemonic,
 		return 0;
 	}
 	nr_node_lock(nr_node);
+	if (refcount_read(&nr_neigh->refcount) == 1) {
+		ret = -EINVAL;
+		goto out;
+	}
 
 	if (quality != 0)
 		strscpy(nr_node->mnemonic, mnemonic);
@@ -279,10 +283,11 @@ static int __must_check nr_add_node(ax25_address *nr, const char *mnemonic,
 		}
 	}
 
+out:
 	nr_neigh_put(nr_neigh);
 	nr_node_unlock(nr_node);
 	nr_node_put(nr_node);
-	return 0;
+	return ret;
 }
 
 static void nr_remove_node_locked(struct nr_node *nr_node)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH V3] netrom: Prevent race conditions between neighbor operations
  2025-10-23 11:44                       ` Paolo Abeni
  2025-10-23 11:54                         ` Eric Dumazet
  2025-10-23 12:41                         ` Lizhi Xu
@ 2025-10-24 10:45                         ` Dan Carpenter
  2 siblings, 0 replies; 27+ messages in thread
From: Dan Carpenter @ 2025-10-24 10:45 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Lizhi Xu, davem, edumazet, horms, kuba, linux-hams, linux-kernel,
	netdev, syzbot+2860e75836a08b172755, syzkaller-bugs

On Thu, Oct 23, 2025 at 01:44:18PM +0200, Paolo Abeni wrote:
> Why reordering the statements as:
> 
> 	if (nr_node->routes[2].neighbour->count == 0 &&
> !nr_node->routes[2].neighbour->locked)
> 		nr_remove_neigh(nr_node->routes[2].neighbour);
> 	nr_neigh_put(nr_node->routes[2].neighbour);
> 
> is not enough?

There are so many unfortunate things like this:

net/netrom/nr_route.c
   243                          /* It must be better than the worst */
   244                          if (quality > nr_node->routes[2].quality) {
   245                                  nr_node->routes[2].neighbour->count--;

++/-- are not atomic.

   246                                  nr_neigh_put(nr_node->routes[2].neighbour);
   247  
   248                                  if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)
   249                                          nr_remove_neigh(nr_node->routes[2].neighbour);
   250  
   251                                  nr_node->routes[2].quality   = quality;
   252                                  nr_node->routes[2].obs_count = obs_count;
   253                                  nr_node->routes[2].neighbour = nr_neigh;

This line should come after the next two lines.

   254  
   255                                  nr_neigh_hold(nr_neigh);
   256                                  nr_neigh->count++;
   257                          }

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V4] netrom: Preventing the use of abnormal neighbor
  2025-10-23 13:50                           ` [PATCH V4] netrom: Preventing the use of abnormal neighbor Lizhi Xu
@ 2025-10-28 14:13                             ` Paolo Abeni
  2025-10-29  2:59                               ` Lizhi Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Abeni @ 2025-10-28 14:13 UTC (permalink / raw)
  To: Lizhi Xu
  Cc: dan.carpenter, davem, edumazet, horms, kuba, linux-hams,
	linux-kernel, netdev, syzbot+2860e75836a08b172755, syzkaller-bugs

On 10/23/25 3:50 PM, Lizhi Xu wrote:
> The root cause of the problem is that multiple different tasks initiate
> SIOCADDRT & NETROM_NODE commands to add new routes, there is no lock
> between them to protect the same nr_neigh.
> 
> Task0 can add the nr_neigh.refcount value of 1 on Task1 to routes[2].
> When Task2 executes nr_neigh_put(nr_node->routes[2].neighbour), it will
> release the neighbour because its refcount value is 1.
> 
> In this case, the following situation causes a UAF on Task2:
> 
> Task0					Task1						Task2
> =====					=====						=====
> nr_add_node()
> nr_neigh_get_dev()			nr_add_node()
> 					nr_node_lock()
> 					nr_node->routes[2].neighbour->count--
> 					nr_neigh_put(nr_node->routes[2].neighbour);
> 					nr_remove_neigh(nr_node->routes[2].neighbour)
> 					nr_node_unlock()
> nr_node_lock()
> nr_node->routes[2].neighbour = nr_neigh
> nr_neigh_hold(nr_neigh);								nr_add_node()
> 											nr_neigh_put()
> 											if (nr_node->routes[2].neighbour->count
> Description of the UAF triggering process:
> First, Task 0 executes nr_neigh_get_dev() to set neighbor refcount to 3.
> Then, Task 1 puts the same neighbor from its routes[2] and executes
> nr_remove_neigh() because the count is 0. After these two operations,
> the neighbor's refcount becomes 1. Then, Task 0 acquires the nr node
> lock and writes it to its routes[2].neighbour.
> Finally, Task 2 executes nr_neigh_put(nr_node->routes[2].neighbour) to
> release the neighbor. The subsequent execution of the neighbor->count
> check triggers a UAF.

I looked at the code quite a bit and I think this could possibly avoid
the above mentioned race, but this whole area looks quite confusing to me.

I think it would be helpful if you could better describe the relevant
scenario starting from the initial setup (no nodes, no neighs).

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V4] netrom: Preventing the use of abnormal neighbor
  2025-10-28 14:13                             ` Paolo Abeni
@ 2025-10-29  2:59                               ` Lizhi Xu
  2025-11-13  6:33                                 ` Lizhi Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Lizhi Xu @ 2025-10-29  2:59 UTC (permalink / raw)
  To: pabeni
  Cc: dan.carpenter, davem, edumazet, horms, kuba, linux-hams,
	linux-kernel, lizhi.xu, netdev, syzbot+2860e75836a08b172755,
	syzkaller-bugs

On Tue, 28 Oct 2025 15:13:37 +0100, Paolo Abeni wrote:
> > The root cause of the problem is that multiple different tasks initiate
> > SIOCADDRT & NETROM_NODE commands to add new routes, there is no lock
> > between them to protect the same nr_neigh.
> >
> > Task0 can add the nr_neigh.refcount value of 1 on Task1 to routes[2].
> > When Task2 executes nr_neigh_put(nr_node->routes[2].neighbour), it will
> > release the neighbour because its refcount value is 1.
> >
> > In this case, the following situation causes a UAF on Task2:
> >
> > Task0					Task1						Task2
> > =====					=====						=====
> > nr_add_node()
> > nr_neigh_get_dev()			nr_add_node()
> > 					nr_node_lock()
> > 					nr_node->routes[2].neighbour->count--
> > 					nr_neigh_put(nr_node->routes[2].neighbour);
> > 					nr_remove_neigh(nr_node->routes[2].neighbour)
> > 					nr_node_unlock()
> > nr_node_lock()
> > nr_node->routes[2].neighbour = nr_neigh
> > nr_neigh_hold(nr_neigh);								nr_add_node()
> > 											nr_neigh_put()
> > 											if (nr_node->routes[2].neighbour->count
> > Description of the UAF triggering process:
> > First, Task 0 executes nr_neigh_get_dev() to set neighbor refcount to 3.
> > Then, Task 1 puts the same neighbor from its routes[2] and executes
> > nr_remove_neigh() because the count is 0. After these two operations,
> > the neighbor's refcount becomes 1. Then, Task 0 acquires the nr node
> > lock and writes it to its routes[2].neighbour.
> > Finally, Task 2 executes nr_neigh_put(nr_node->routes[2].neighbour) to
> > release the neighbor. The subsequent execution of the neighbor->count
> > check triggers a UAF.
> 
> I looked at the code quite a bit and I think this could possibly avoid
> the above mentioned race, but this whole area looks quite confusing to me.
> 
> I think it would be helpful if you could better describe the relevant
> scenario starting from the initial setup (no nodes, no neighs).
OK. Let me fill in the origin of neigh.

Task3
=====
nr_add_node()
[146]if ((nr_neigh = kmalloc(sizeof(*nr_neigh), GFP_ATOMIC)) == NULL)
[253]nr_node->routes[2].neighbour = nr_neigh;
[255]nr_neigh_hold(nr_neigh);
[256]nr_neigh->count++;

neigh is created on line 146 in nr_add_node(), and added to node on
lines 253-256. It occurs before all Task0, Task1, and Task2.

Note:
1. [x], x is line number.
2. During my debugging process, I didn't pay attention to where the node
was created, and I apologize that I cannot provide the relevant creation
process.

BR,
Lizhi

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V4] netrom: Preventing the use of abnormal neighbor
  2025-10-29  2:59                               ` Lizhi Xu
@ 2025-11-13  6:33                                 ` Lizhi Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Lizhi Xu @ 2025-11-13  6:33 UTC (permalink / raw)
  To: lizhi.xu
  Cc: dan.carpenter, davem, edumazet, horms, kuba, linux-hams,
	linux-kernel, netdev, syzbot+2860e75836a08b172755, syzkaller-bugs

On Wed, 29 Oct 2025 10:59:04 +0800, Lizhi Xu wrote:
> > > The root cause of the problem is that multiple different tasks initiate
> > > SIOCADDRT & NETROM_NODE commands to add new routes, there is no lock
> > > between them to protect the same nr_neigh.
> > >
> > > Task0 can add the nr_neigh.refcount value of 1 on Task1 to routes[2].
> > > When Task2 executes nr_neigh_put(nr_node->routes[2].neighbour), it will
> > > release the neighbour because its refcount value is 1.
> > >
> > > In this case, the following situation causes a UAF on Task2:
> > >
> > > Task0					Task1						Task2
> > > =====					=====						=====
> > > nr_add_node()
> > > nr_neigh_get_dev()			nr_add_node()
> > > 					nr_node_lock()
> > > 					nr_node->routes[2].neighbour->count--
> > > 					nr_neigh_put(nr_node->routes[2].neighbour);
> > > 					nr_remove_neigh(nr_node->routes[2].neighbour)
> > > 					nr_node_unlock()
> > > nr_node_lock()
> > > nr_node->routes[2].neighbour = nr_neigh
> > > nr_neigh_hold(nr_neigh);								nr_add_node()
> > > 											nr_neigh_put()
> > > 											if (nr_node->routes[2].neighbour->count
> > > Description of the UAF triggering process:
> > > First, Task 0 executes nr_neigh_get_dev() to set neighbor refcount to 3.
> > > Then, Task 1 puts the same neighbor from its routes[2] and executes
> > > nr_remove_neigh() because the count is 0. After these two operations,
> > > the neighbor's refcount becomes 1. Then, Task 0 acquires the nr node
> > > lock and writes it to its routes[2].neighbour.
> > > Finally, Task 2 executes nr_neigh_put(nr_node->routes[2].neighbour) to
> > > release the neighbor. The subsequent execution of the neighbor->count
> > > check triggers a UAF.
> > 
> > I looked at the code quite a bit and I think this could possibly avoid
> > the above mentioned race, but this whole area looks quite confusing to me.
> > 
> > I think it would be helpful if you could better describe the relevant
> > scenario starting from the initial setup (no nodes, no neighs).
> OK. Let me fill in the origin of neigh.
> 
> Task3
> =====
> nr_add_node()
> [146]if ((nr_neigh = kmalloc(sizeof(*nr_neigh), GFP_ATOMIC)) == NULL)
> [253]nr_node->routes[2].neighbour = nr_neigh;
> [255]nr_neigh_hold(nr_neigh);
> [256]nr_neigh->count++;
> 
> neigh is created on line 146 in nr_add_node(), and added to node on
> lines 253-256. It occurs before all Task0, Task1, and Task2.
> 
> Note:
> 1. [x], x is line number.
> 2. During my debugging process, I didn't pay attention to where the node
> was created, and I apologize that I cannot provide the relevant creation
> process.
Hi everyone, 
Today is my last day at WindRiver. Starting tomorrow, my email address
lizhi.xu@windriver.com will no longer be used;
I will use eadavis@qq.com thereafter.

BR,
Lizhi

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2025-11-13  6:33 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-25 16:35 [syzbot] [hams?] KASAN: slab-use-after-free Read in nr_add_node syzbot
2025-10-18 20:37 ` syzbot
2025-10-19  5:02   ` Brahmajit Das
2025-10-19  5:21     ` syzbot
2025-10-19  5:10   ` Brahmajit Das
2025-10-19  5:10     ` syzbot
2025-10-20  8:13   ` [PATCH] netrom: Prevent race conditions between multiple add route Lizhi Xu
2025-10-20 10:10     ` Dan Carpenter
2025-10-20 11:02       ` [PATCH V2] " Lizhi Xu
2025-10-20 12:25         ` Dan Carpenter
2025-10-20 12:33           ` Dan Carpenter
2025-10-20 12:57           ` Dan Carpenter
2025-10-20 13:34           ` Lizhi Xu
2025-10-20 13:49             ` Lizhi Xu
2025-10-20 17:59               ` Dan Carpenter
2025-10-21  2:05                 ` Lizhi Xu
2025-10-21  6:36                   ` Dan Carpenter
2025-10-21  8:34                     ` Lizhi Xu
2025-10-21  8:35                     ` [PATCH V3] netrom: Prevent race conditions between neighbor operations Lizhi Xu
2025-10-23 11:44                       ` Paolo Abeni
2025-10-23 11:54                         ` Eric Dumazet
2025-10-23 12:41                         ` Lizhi Xu
2025-10-23 13:50                           ` [PATCH V4] netrom: Preventing the use of abnormal neighbor Lizhi Xu
2025-10-28 14:13                             ` Paolo Abeni
2025-10-29  2:59                               ` Lizhi Xu
2025-11-13  6:33                                 ` Lizhi Xu
2025-10-24 10:45                         ` [PATCH V3] netrom: Prevent race conditions between neighbor operations Dan Carpenter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).