linux-kernel.vger.kernel.org archive mirror
* [syzbot] [bpf?] [net?] possible deadlock in xsk_notifier (3)
@ 2025-06-24 18:51 syzbot
  2025-06-25 13:44 ` Jason Xing
  0 siblings, 1 reply; 9+ messages in thread
From: syzbot @ 2025-06-24 18:51 UTC
  To: andrii, ast, bjorn, bpf, daniel, davem, edumazet, horms,
	jonathan.lemon, kuba, linux-kernel, maciej.fijalkowski,
	magnus.karlsson, netdev, pabeni, sdf, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    78f4e737a53e Merge tag 'for-6.16/dm-fixes' of git://git.ke..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11b48f0c580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=12ec1a20ad573841
dashboard link: https://syzkaller.appspot.com/bug?extid=e67ea9c235b13b4f0020
compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/3ff97b2d201b/disk-78f4e737.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/1968f46c8915/vmlinux-78f4e737.xz
kernel image: https://storage.googleapis.com/syzbot-assets/3455e371b965/bzImage-78f4e737.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+e67ea9c235b13b4f0020@syzkaller.appspotmail.com

netlink: 4 bytes leftover after parsing attributes in process `syz.1.1331'.
======================================================
WARNING: possible circular locking dependency detected
6.16.0-rc3-syzkaller-00042-g78f4e737a53e #0 Not tainted
------------------------------------------------------
syz.1.1331/11144 is trying to acquire lock:
ffff888054b136b0 (&xs->mutex){+.+.}-{4:4}, at: xsk_notifier+0x101/0x280 net/xdp/xsk.c:1649

but task is already holding lock:
ffff888052f43d58 (&net->xdp.lock){+.+.}-{4:4}, at: xsk_notifier+0xa4/0x280 net/xdp/xsk.c:1645

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&net->xdp.lock){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/mutex.c:602 [inline]
       __mutex_lock+0x199/0xb90 kernel/locking/mutex.c:747
       xsk_notifier+0xa4/0x280 net/xdp/xsk.c:1645
       notifier_call_chain+0xbc/0x410 kernel/notifier.c:85
       call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:2230
       call_netdevice_notifiers_extack net/core/dev.c:2268 [inline]
       call_netdevice_notifiers net/core/dev.c:2282 [inline]
       unregister_netdevice_many_notify+0xf9d/0x2700 net/core/dev.c:12077
       unregister_netdevice_many net/core/dev.c:12140 [inline]
       unregister_netdevice_queue+0x305/0x3f0 net/core/dev.c:11984
       register_netdevice+0x18f1/0x2270 net/core/dev.c:11149
       lapbeth_new_device drivers/net/wan/lapbether.c:420 [inline]
       lapbeth_device_event+0x5b1/0xbe0 drivers/net/wan/lapbether.c:462
       notifier_call_chain+0xbc/0x410 kernel/notifier.c:85
       call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:2230
       call_netdevice_notifiers_extack net/core/dev.c:2268 [inline]
       call_netdevice_notifiers net/core/dev.c:2282 [inline]
       __dev_notify_flags+0x12c/0x2e0 net/core/dev.c:9497
       netif_change_flags+0x108/0x160 net/core/dev.c:9526
       dev_change_flags+0xba/0x250 net/core/dev_api.c:68
       devinet_ioctl+0x11d5/0x1f50 net/ipv4/devinet.c:1200
       inet_ioctl+0x3a7/0x3f0 net/ipv4/af_inet.c:1001
       sock_do_ioctl+0x118/0x280 net/socket.c:1190
       sock_ioctl+0x227/0x6b0 net/socket.c:1311
       vfs_ioctl fs/ioctl.c:51 [inline]
       __do_sys_ioctl fs/ioctl.c:907 [inline]
       __se_sys_ioctl fs/ioctl.c:893 [inline]
       __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:893
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #1 (&dev_instance_lock_key#20){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/mutex.c:602 [inline]
       __mutex_lock+0x199/0xb90 kernel/locking/mutex.c:747
       netdev_lock include/linux/netdevice.h:2756 [inline]
       netdev_lock_ops include/net/netdev_lock.h:42 [inline]
       xsk_bind+0x37c/0x1570 net/xdp/xsk.c:1189
       __sys_bind_socket net/socket.c:1810 [inline]
       __sys_bind_socket net/socket.c:1802 [inline]
       __sys_bind+0x1a7/0x260 net/socket.c:1841
       __do_sys_bind net/socket.c:1846 [inline]
       __se_sys_bind net/socket.c:1844 [inline]
       __x64_sys_bind+0x72/0xb0 net/socket.c:1844
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #0 (&xs->mutex){+.+.}-{4:4}:
       check_prev_add kernel/locking/lockdep.c:3168 [inline]
       check_prevs_add kernel/locking/lockdep.c:3287 [inline]
       validate_chain kernel/locking/lockdep.c:3911 [inline]
       __lock_acquire+0x126f/0x1c90 kernel/locking/lockdep.c:5240
       lock_acquire kernel/locking/lockdep.c:5871 [inline]
       lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5828
       __mutex_lock_common kernel/locking/mutex.c:602 [inline]
       __mutex_lock+0x199/0xb90 kernel/locking/mutex.c:747
       xsk_notifier+0x101/0x280 net/xdp/xsk.c:1649
       notifier_call_chain+0xbc/0x410 kernel/notifier.c:85
       call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:2230
       call_netdevice_notifiers_extack net/core/dev.c:2268 [inline]
       call_netdevice_notifiers net/core/dev.c:2282 [inline]
       unregister_netdevice_many_notify+0xf9d/0x2700 net/core/dev.c:12077
       rtnl_delete_link net/core/rtnetlink.c:3511 [inline]
       rtnl_dellink+0x3cb/0xa80 net/core/rtnetlink.c:3553
       rtnetlink_rcv_msg+0x95e/0xe90 net/core/rtnetlink.c:6944
       netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2534
       netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
       netlink_unicast+0x53d/0x7f0 net/netlink/af_netlink.c:1339
       netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1883
       sock_sendmsg_nosec net/socket.c:712 [inline]
       __sock_sendmsg net/socket.c:727 [inline]
       ____sys_sendmsg+0xa98/0xc70 net/socket.c:2566
       ___sys_sendmsg+0x134/0x1d0 net/socket.c:2620
       __sys_sendmsg+0x16d/0x220 net/socket.c:2652
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

other info that might help us debug this:

Chain exists of:
  &xs->mutex --> &dev_instance_lock_key#20 --> &net->xdp.lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&net->xdp.lock);
                               lock(&dev_instance_lock_key#20);
                               lock(&net->xdp.lock);
  lock(&xs->mutex);

 *** DEADLOCK ***

2 locks held by syz.1.1331/11144:
 #0: ffffffff9034e4a8 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock net/core/rtnetlink.c:80 [inline]
 #0: ffffffff9034e4a8 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_net_lock include/linux/rtnetlink.h:130 [inline]
 #0: ffffffff9034e4a8 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_dellink+0x277/0xa80 net/core/rtnetlink.c:3545
 #1: ffff888052f43d58 (&net->xdp.lock){+.+.}-{4:4}, at: xsk_notifier+0xa4/0x280 net/xdp/xsk.c:1645

stack backtrace:
CPU: 1 UID: 0 PID: 11144 Comm: syz.1.1331 Not tainted 6.16.0-rc3-syzkaller-00042-g78f4e737a53e #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
 print_circular_bug+0x275/0x350 kernel/locking/lockdep.c:2046
 check_noncircular+0x14c/0x170 kernel/locking/lockdep.c:2178
 check_prev_add kernel/locking/lockdep.c:3168 [inline]
 check_prevs_add kernel/locking/lockdep.c:3287 [inline]
 validate_chain kernel/locking/lockdep.c:3911 [inline]
 __lock_acquire+0x126f/0x1c90 kernel/locking/lockdep.c:5240
 lock_acquire kernel/locking/lockdep.c:5871 [inline]
 lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5828
 __mutex_lock_common kernel/locking/mutex.c:602 [inline]
 __mutex_lock+0x199/0xb90 kernel/locking/mutex.c:747
 xsk_notifier+0x101/0x280 net/xdp/xsk.c:1649
 notifier_call_chain+0xbc/0x410 kernel/notifier.c:85
 call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:2230
 call_netdevice_notifiers_extack net/core/dev.c:2268 [inline]
 call_netdevice_notifiers net/core/dev.c:2282 [inline]
 unregister_netdevice_many_notify+0xf9d/0x2700 net/core/dev.c:12077
 rtnl_delete_link net/core/rtnetlink.c:3511 [inline]
 rtnl_dellink+0x3cb/0xa80 net/core/rtnetlink.c:3553
 rtnetlink_rcv_msg+0x95e/0xe90 net/core/rtnetlink.c:6944
 netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2534
 netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
 netlink_unicast+0x53d/0x7f0 net/netlink/af_netlink.c:1339
 netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1883
 sock_sendmsg_nosec net/socket.c:712 [inline]
 __sock_sendmsg net/socket.c:727 [inline]
 ____sys_sendmsg+0xa98/0xc70 net/socket.c:2566
 ___sys_sendmsg+0x134/0x1d0 net/socket.c:2620
 __sys_sendmsg+0x16d/0x220 net/socket.c:2652
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f97c7b8e929
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f97c8abc038 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f97c7db5fa0 RCX: 00007f97c7b8e929
RDX: 0000000000000000 RSI: 0000200000000040 RDI: 0000000000000003
RBP: 00007f97c7c10b39 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f97c7db5fa0 R15: 00007fff09d1ae48
 </TASK>
batman_adv: batadv0: Removing interface: batadv_slave_1


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


* Re: [syzbot] [bpf?] [net?] possible deadlock in xsk_notifier (3)
  2025-06-24 18:51 [syzbot] [bpf?] [net?] possible deadlock in xsk_notifier (3) syzbot
@ 2025-06-25 13:44 ` Jason Xing
  2025-06-25 15:06   ` Stanislav Fomichev
  0 siblings, 1 reply; 9+ messages in thread
From: Jason Xing @ 2025-06-25 13:44 UTC
  To: syzbot
  Cc: andrii, ast, bjorn, bpf, daniel, davem, edumazet, horms,
	jonathan.lemon, kuba, linux-kernel, maciej.fijalkowski,
	magnus.karlsson, netdev, pabeni, sdf, syzkaller-bugs

On Wed, Jun 25, 2025 at 2:51 AM syzbot
<syzbot+e67ea9c235b13b4f0020@syzkaller.appspotmail.com> wrote:
> Chain exists of:
>   &xs->mutex --> &dev_instance_lock_key#20 --> &net->xdp.lock
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&net->xdp.lock);
>                                lock(&dev_instance_lock_key#20);
>                                lock(&net->xdp.lock);
>   lock(&xs->mutex);

I don't think the race map above is quite right.

My understanding is as shown below.

CPU 0                                   CPU 1
-----                                   -----
unregister_netdevice_many_notify()
                                        xsk_bind()
netdev_lock_ops(dev);
                                        mutex_lock(&xs->mutex);
                                        netdev_lock_ops(dev);
xsk_notifier()
mutex_lock(&net->xdp.lock);
mutex_lock(&xs->mutex);

So this is an ABBA deadlock, IIUC.
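
For illustration only, the cycle in the report can be reproduced with a
small userspace model. This is a sketch, not kernel code and not how
lockdep is implemented: the enum labels and the helpers reset_orders(),
reaches() and record_order() are hypothetical names made up for this
example. It records "lock B was taken while holding lock A" edges and
flags the edge that closes a loop.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical labels for the three lock classes named in the report. */
enum lock_class { XS_MUTEX, DEV_INSTANCE_LOCK, XDP_LOCK, NUM_LOCK_CLASSES };

/* order[a][b] is true once some path was seen taking b while holding a. */
static bool order[NUM_LOCK_CLASSES][NUM_LOCK_CLASSES];

/* Forget every ordering recorded so far. */
void reset_orders(void)
{
	memset(order, 0, sizeof(order));
}

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
bool reaches(enum lock_class from, enum lock_class to, bool *visited)
{
	if (from == to)
		return true;
	visited[from] = true;
	for (int next = 0; next < NUM_LOCK_CLASSES; next++) {
		if (order[from][next] && !visited[next] &&
		    reaches((enum lock_class)next, to, visited))
			return true;
	}
	return false;
}

/*
 * Record "wanted is acquired while held is held". Returns true if the new
 * edge closes a cycle, i.e. wanted already (transitively) precedes held.
 */
bool record_order(enum lock_class held, enum lock_class wanted)
{
	bool visited[NUM_LOCK_CLASSES] = { false };
	bool cycle = reaches(wanted, held, visited);

	order[held][wanted] = true;
	return cycle;
}
```

Feeding it the three orderings from the report, xs->mutex then instance
lock (xsk_bind), instance lock then net->xdp.lock (notifier under the
instance lock), and net->xdp.lock then xs->mutex (xsk_notifier), makes
the third call report a cycle, which is the condition the splat warns
about.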

Thanks,
Jason



* Re: [syzbot] [bpf?] [net?] possible deadlock in xsk_notifier (3)
  2025-06-25 13:44 ` Jason Xing
@ 2025-06-25 15:06   ` Stanislav Fomichev
  2025-06-25 15:38     ` Jason Xing
  0 siblings, 1 reply; 9+ messages in thread
From: Stanislav Fomichev @ 2025-06-25 15:06 UTC
  To: Jason Xing
  Cc: syzbot, andrii, ast, bjorn, bpf, daniel, davem, edumazet, horms,
	jonathan.lemon, kuba, linux-kernel, maciej.fijalkowski,
	magnus.karlsson, netdev, pabeni, sdf, syzkaller-bugs

On 06/25, Jason Xing wrote:
> On Wed, Jun 25, 2025 at 2:51 AM syzbot
> <syzbot+e67ea9c235b13b4f0020@syzkaller.appspotmail.com> wrote:
> > Chain exists of:
> >   &xs->mutex --> &dev_instance_lock_key#20 --> &net->xdp.lock
> >
> >  Possible unsafe locking scenario:
> >
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(&net->xdp.lock);
> >                                lock(&dev_instance_lock_key#20);
> >                                lock(&net->xdp.lock);
> >   lock(&xs->mutex);
> 
> I don't think the race map above is quite right.
>
> My understanding is as shown below.
>
> CPU 0                                   CPU 1
> -----                                   -----
> unregister_netdevice_many_notify()
>                                         xsk_bind()
> netdev_lock_ops(dev);
>                                         mutex_lock(&xs->mutex);
>                                         netdev_lock_ops(dev);
> xsk_notifier()
> mutex_lock(&net->xdp.lock);
> mutex_lock(&xs->mutex);
>
> So this is an ABBA deadlock, IIUC.

Since we can't (easily) control the ordering in notifiers, it looks like
we need to align the xsk_bind ordering (to be instance lock -> xs->mutex).
Let me know if you want to take a stab at this; otherwise I'll try to
send a fix.
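
As a rough illustration of why aligning the order helps (a userspace
sketch under assumed names, not the actual patch): give each lock a rank
that follows the order the notifier path already uses, and only allow a
lock to be taken when its rank is higher than that of every lock already
held. The rank values, order_is_valid() and the three path arrays below
are hypothetical and exist only for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical ranks: a lock may only be taken while every held lock has
 * a lower rank. The values mirror the order seen on the notifier path.
 */
enum lock_rank {
	RANK_DEV_INSTANCE = 0,
	RANK_XDP_LOCK = 1,
	RANK_XS_MUTEX = 2,
};

/* Returns true if taking the locks in this sequence respects the ranks. */
bool order_is_valid(const enum lock_rank *seq, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (seq[i] <= seq[i - 1])
			return false;
	}
	return true;
}

/* Notifier path from the report: instance lock, xdp.lock, xs->mutex. */
const enum lock_rank notifier_path[] = {
	RANK_DEV_INSTANCE, RANK_XDP_LOCK, RANK_XS_MUTEX,
};

/* xsk_bind() as reported: xs->mutex first, then the instance lock. */
const enum lock_rank bind_path_current[] = {
	RANK_XS_MUTEX, RANK_DEV_INSTANCE,
};

/* xsk_bind() with the suggested ordering: instance lock, then xs->mutex. */
const enum lock_rank bind_path_proposed[] = {
	RANK_DEV_INSTANCE, RANK_XS_MUTEX,
};
```

With these ranks, the notifier path and the proposed bind path both pass
order_is_valid(), while the current bind path fails, which matches the
inversion lockdep reported.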


* Re: [syzbot] [bpf?] [net?] possible deadlock in xsk_notifier (3)
  2025-06-25 15:06   ` Stanislav Fomichev
@ 2025-06-25 15:38     ` Jason Xing
  2025-06-25 15:46       ` Stanislav Fomichev
  0 siblings, 1 reply; 9+ messages in thread
From: Jason Xing @ 2025-06-25 15:38 UTC
  To: Stanislav Fomichev
  Cc: syzbot, andrii, ast, bjorn, bpf, daniel, davem, edumazet, horms,
	jonathan.lemon, kuba, linux-kernel, maciej.fijalkowski,
	magnus.karlsson, netdev, pabeni, sdf, syzkaller-bugs

On Wed, Jun 25, 2025 at 11:06 PM Stanislav Fomichev
<stfomichev@gmail.com> wrote:
>
> On 06/25, Jason Xing wrote:
> > On Wed, Jun 25, 2025 at 2:51 AM syzbot
> > <syzbot+e67ea9c235b13b4f0020@syzkaller.appspotmail.com> wrote:
> > >
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit:    78f4e737a53e Merge tag 'for-6.16/dm-fixes' of git://git.ke..
> > > git tree:       upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=11b48f0c580000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=12ec1a20ad573841
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=e67ea9c235b13b4f0020
> > > compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > >
> > > Downloadable assets:
> > > disk image: https://storage.googleapis.com/syzbot-assets/3ff97b2d201b/disk-78f4e737.raw.xz
> > > vmlinux: https://storage.googleapis.com/syzbot-assets/1968f46c8915/vmlinux-78f4e737.xz
> > > kernel image: https://storage.googleapis.com/syzbot-assets/3455e371b965/bzImage-78f4e737.xz
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+e67ea9c235b13b4f0020@syzkaller.appspotmail.com
> > >
> > > netlink: 4 bytes leftover after parsing attributes in process `syz.1.1331'.
> > > ======================================================
> > > WARNING: possible circular locking dependency detected
> > > 6.16.0-rc3-syzkaller-00042-g78f4e737a53e #0 Not tainted
> > > ------------------------------------------------------
> > > syz.1.1331/11144 is trying to acquire lock:
> > > ffff888054b136b0 (&xs->mutex){+.+.}-{4:4}, at: xsk_notifier+0x101/0x280 net/xdp/xsk.c:1649
> > >
> > > but task is already holding lock:
> > > ffff888052f43d58 (&net->xdp.lock){+.+.}-{4:4}, at: xsk_notifier+0xa4/0x280 net/xdp/xsk.c:1645
> > >
> > > which lock already depends on the new lock.
> > >
> > >
> > > the existing dependency chain (in reverse order) is:
> > >
> > > -> #2 (&net->xdp.lock){+.+.}-{4:4}:
> > >        __mutex_lock_common kernel/locking/mutex.c:602 [inline]
> > >        __mutex_lock+0x199/0xb90 kernel/locking/mutex.c:747
> > >        xsk_notifier+0xa4/0x280 net/xdp/xsk.c:1645
> > >        notifier_call_chain+0xbc/0x410 kernel/notifier.c:85
> > >        call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:2230
> > >        call_netdevice_notifiers_extack net/core/dev.c:2268 [inline]
> > >        call_netdevice_notifiers net/core/dev.c:2282 [inline]
> > >        unregister_netdevice_many_notify+0xf9d/0x2700 net/core/dev.c:12077
> > >        unregister_netdevice_many net/core/dev.c:12140 [inline]
> > >        unregister_netdevice_queue+0x305/0x3f0 net/core/dev.c:11984
> > >        register_netdevice+0x18f1/0x2270 net/core/dev.c:11149
> > >        lapbeth_new_device drivers/net/wan/lapbether.c:420 [inline]
> > >        lapbeth_device_event+0x5b1/0xbe0 drivers/net/wan/lapbether.c:462
> > >        notifier_call_chain+0xbc/0x410 kernel/notifier.c:85
> > >        call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:2230
> > >        call_netdevice_notifiers_extack net/core/dev.c:2268 [inline]
> > >        call_netdevice_notifiers net/core/dev.c:2282 [inline]
> > >        __dev_notify_flags+0x12c/0x2e0 net/core/dev.c:9497
> > >        netif_change_flags+0x108/0x160 net/core/dev.c:9526
> > >        dev_change_flags+0xba/0x250 net/core/dev_api.c:68
> > >        devinet_ioctl+0x11d5/0x1f50 net/ipv4/devinet.c:1200
> > >        inet_ioctl+0x3a7/0x3f0 net/ipv4/af_inet.c:1001
> > >        sock_do_ioctl+0x118/0x280 net/socket.c:1190
> > >        sock_ioctl+0x227/0x6b0 net/socket.c:1311
> > >        vfs_ioctl fs/ioctl.c:51 [inline]
> > >        __do_sys_ioctl fs/ioctl.c:907 [inline]
> > >        __se_sys_ioctl fs/ioctl.c:893 [inline]
> > >        __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:893
> > >        do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > >        do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> > >        entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > >
> > > -> #1 (&dev_instance_lock_key#20){+.+.}-{4:4}:
> > >        __mutex_lock_common kernel/locking/mutex.c:602 [inline]
> > >        __mutex_lock+0x199/0xb90 kernel/locking/mutex.c:747
> > >        netdev_lock include/linux/netdevice.h:2756 [inline]
> > >        netdev_lock_ops include/net/netdev_lock.h:42 [inline]
> > >        xsk_bind+0x37c/0x1570 net/xdp/xsk.c:1189
> > >        __sys_bind_socket net/socket.c:1810 [inline]
> > >        __sys_bind_socket net/socket.c:1802 [inline]
> > >        __sys_bind+0x1a7/0x260 net/socket.c:1841
> > >        __do_sys_bind net/socket.c:1846 [inline]
> > >        __se_sys_bind net/socket.c:1844 [inline]
> > >        __x64_sys_bind+0x72/0xb0 net/socket.c:1844
> > >        do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > >        do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> > >        entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > >
> > > -> #0 (&xs->mutex){+.+.}-{4:4}:
> > >        check_prev_add kernel/locking/lockdep.c:3168 [inline]
> > >        check_prevs_add kernel/locking/lockdep.c:3287 [inline]
> > >        validate_chain kernel/locking/lockdep.c:3911 [inline]
> > >        __lock_acquire+0x126f/0x1c90 kernel/locking/lockdep.c:5240
> > >        lock_acquire kernel/locking/lockdep.c:5871 [inline]
> > >        lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5828
> > >        __mutex_lock_common kernel/locking/mutex.c:602 [inline]
> > >        __mutex_lock+0x199/0xb90 kernel/locking/mutex.c:747
> > >        xsk_notifier+0x101/0x280 net/xdp/xsk.c:1649
> > >        notifier_call_chain+0xbc/0x410 kernel/notifier.c:85
> > >        call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:2230
> > >        call_netdevice_notifiers_extack net/core/dev.c:2268 [inline]
> > >        call_netdevice_notifiers net/core/dev.c:2282 [inline]
> > >        unregister_netdevice_many_notify+0xf9d/0x2700 net/core/dev.c:12077
> > >        rtnl_delete_link net/core/rtnetlink.c:3511 [inline]
> > >        rtnl_dellink+0x3cb/0xa80 net/core/rtnetlink.c:3553
> > >        rtnetlink_rcv_msg+0x95e/0xe90 net/core/rtnetlink.c:6944
> > >        netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2534
> > >        netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
> > >        netlink_unicast+0x53d/0x7f0 net/netlink/af_netlink.c:1339
> > >        netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1883
> > >        sock_sendmsg_nosec net/socket.c:712 [inline]
> > >        __sock_sendmsg net/socket.c:727 [inline]
> > >        ____sys_sendmsg+0xa98/0xc70 net/socket.c:2566
> > >        ___sys_sendmsg+0x134/0x1d0 net/socket.c:2620
> > >        __sys_sendmsg+0x16d/0x220 net/socket.c:2652
> > >        do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > >        do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> > >        entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > >
> > > other info that might help us debug this:
> > >
> > > Chain exists of:
> > >   &xs->mutex --> &dev_instance_lock_key#20 --> &net->xdp.lock
> > >
> > >  Possible unsafe locking scenario:
> > >
> > >        CPU0                    CPU1
> > >        ----                    ----
> > >   lock(&net->xdp.lock);
> > >                                lock(&dev_instance_lock_key#20);
> > >                                lock(&net->xdp.lock);
> > >   lock(&xs->mutex);
> >
> > I feel the above race map is not that right?
> >
> > My understanding is as shown below.
> > CPU 0                                   CPU 1
> > -----                                   -----
> > unregister_netdevice_many_notify()
> >                                         xsk_bind()
> > netdev_lock_ops(dev);
> >                                         mutex_lock(&xs->mutex);
> >                                         netdev_lock_ops(dev);
> > xsk_notifier()
> > mutex_lock(&net->xdp.lock);
> > mutex_lock(&xs->mutex);
> >
> > So ABBA lock case happens, IIUC.
>
> Since we can't (easily) control the ordering in notifiers, looks like
> we need to align xsk_bind ordering (to be instance lock -> xs->mutex).
> LMK if you want to take a stab at this; otherwise I'll try to send a
> fix.

I'm still learning the af_xdp. Sure, I'm interested in it, just a bit
worried if I'm capable of completing it. I will try then.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [syzbot] [bpf?] [net?] possible deadlock in xsk_notifier (3)
  2025-06-25 15:38     ` Jason Xing
@ 2025-06-25 15:46       ` Stanislav Fomichev
  2025-06-25 20:48         ` Stanislav Fomichev
  0 siblings, 1 reply; 9+ messages in thread
From: Stanislav Fomichev @ 2025-06-25 15:46 UTC (permalink / raw)
  To: Jason Xing
  Cc: syzbot, andrii, ast, bjorn, bpf, daniel, davem, edumazet, horms,
	jonathan.lemon, kuba, linux-kernel, maciej.fijalkowski,
	magnus.karlsson, netdev, pabeni, sdf, syzkaller-bugs

On 06/25, Jason Xing wrote:
> On Wed, Jun 25, 2025 at 11:06 PM Stanislav Fomichev
> <stfomichev@gmail.com> wrote:
> >
> > On 06/25, Jason Xing wrote:
> > > On Wed, Jun 25, 2025 at 2:51 AM syzbot
> > > <syzbot+e67ea9c235b13b4f0020@syzkaller.appspotmail.com> wrote:
> > > >
> > > > Hello,
> > > >
> > > > syzbot found the following issue on:
> > > >
> > > > HEAD commit:    78f4e737a53e Merge tag 'for-6.16/dm-fixes' of git://git.ke..
> > > > git tree:       upstream
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=11b48f0c580000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=12ec1a20ad573841
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=e67ea9c235b13b4f0020
> > > > compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> > > >
> > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > >
> > > > Downloadable assets:
> > > > disk image: https://storage.googleapis.com/syzbot-assets/3ff97b2d201b/disk-78f4e737.raw.xz
> > > > vmlinux: https://storage.googleapis.com/syzbot-assets/1968f46c8915/vmlinux-78f4e737.xz
> > > > kernel image: https://storage.googleapis.com/syzbot-assets/3455e371b965/bzImage-78f4e737.xz
> > > >
> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > Reported-by: syzbot+e67ea9c235b13b4f0020@syzkaller.appspotmail.com
> > > >
> > > > netlink: 4 bytes leftover after parsing attributes in process `syz.1.1331'.
> > > > ======================================================
> > > > WARNING: possible circular locking dependency detected
> > > > 6.16.0-rc3-syzkaller-00042-g78f4e737a53e #0 Not tainted
> > > > ------------------------------------------------------
> > > > syz.1.1331/11144 is trying to acquire lock:
> > > > ffff888054b136b0 (&xs->mutex){+.+.}-{4:4}, at: xsk_notifier+0x101/0x280 net/xdp/xsk.c:1649
> > > >
> > > > but task is already holding lock:
> > > > ffff888052f43d58 (&net->xdp.lock){+.+.}-{4:4}, at: xsk_notifier+0xa4/0x280 net/xdp/xsk.c:1645
> > > >
> > > > which lock already depends on the new lock.
> > > >
> > > >
> > > > the existing dependency chain (in reverse order) is:
> > > >
> > > > -> #2 (&net->xdp.lock){+.+.}-{4:4}:
> > > >        __mutex_lock_common kernel/locking/mutex.c:602 [inline]
> > > >        __mutex_lock+0x199/0xb90 kernel/locking/mutex.c:747
> > > >        xsk_notifier+0xa4/0x280 net/xdp/xsk.c:1645
> > > >        notifier_call_chain+0xbc/0x410 kernel/notifier.c:85
> > > >        call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:2230
> > > >        call_netdevice_notifiers_extack net/core/dev.c:2268 [inline]
> > > >        call_netdevice_notifiers net/core/dev.c:2282 [inline]
> > > >        unregister_netdevice_many_notify+0xf9d/0x2700 net/core/dev.c:12077
> > > >        unregister_netdevice_many net/core/dev.c:12140 [inline]
> > > >        unregister_netdevice_queue+0x305/0x3f0 net/core/dev.c:11984
> > > >        register_netdevice+0x18f1/0x2270 net/core/dev.c:11149
> > > >        lapbeth_new_device drivers/net/wan/lapbether.c:420 [inline]
> > > >        lapbeth_device_event+0x5b1/0xbe0 drivers/net/wan/lapbether.c:462
> > > >        notifier_call_chain+0xbc/0x410 kernel/notifier.c:85
> > > >        call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:2230
> > > >        call_netdevice_notifiers_extack net/core/dev.c:2268 [inline]
> > > >        call_netdevice_notifiers net/core/dev.c:2282 [inline]
> > > >        __dev_notify_flags+0x12c/0x2e0 net/core/dev.c:9497
> > > >        netif_change_flags+0x108/0x160 net/core/dev.c:9526
> > > >        dev_change_flags+0xba/0x250 net/core/dev_api.c:68
> > > >        devinet_ioctl+0x11d5/0x1f50 net/ipv4/devinet.c:1200
> > > >        inet_ioctl+0x3a7/0x3f0 net/ipv4/af_inet.c:1001
> > > >        sock_do_ioctl+0x118/0x280 net/socket.c:1190
> > > >        sock_ioctl+0x227/0x6b0 net/socket.c:1311
> > > >        vfs_ioctl fs/ioctl.c:51 [inline]
> > > >        __do_sys_ioctl fs/ioctl.c:907 [inline]
> > > >        __se_sys_ioctl fs/ioctl.c:893 [inline]
> > > >        __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:893
> > > >        do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > > >        do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> > > >        entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > > >
> > > > -> #1 (&dev_instance_lock_key#20){+.+.}-{4:4}:
> > > >        __mutex_lock_common kernel/locking/mutex.c:602 [inline]
> > > >        __mutex_lock+0x199/0xb90 kernel/locking/mutex.c:747
> > > >        netdev_lock include/linux/netdevice.h:2756 [inline]
> > > >        netdev_lock_ops include/net/netdev_lock.h:42 [inline]
> > > >        xsk_bind+0x37c/0x1570 net/xdp/xsk.c:1189
> > > >        __sys_bind_socket net/socket.c:1810 [inline]
> > > >        __sys_bind_socket net/socket.c:1802 [inline]
> > > >        __sys_bind+0x1a7/0x260 net/socket.c:1841
> > > >        __do_sys_bind net/socket.c:1846 [inline]
> > > >        __se_sys_bind net/socket.c:1844 [inline]
> > > >        __x64_sys_bind+0x72/0xb0 net/socket.c:1844
> > > >        do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > > >        do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> > > >        entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > > >
> > > > -> #0 (&xs->mutex){+.+.}-{4:4}:
> > > >        check_prev_add kernel/locking/lockdep.c:3168 [inline]
> > > >        check_prevs_add kernel/locking/lockdep.c:3287 [inline]
> > > >        validate_chain kernel/locking/lockdep.c:3911 [inline]
> > > >        __lock_acquire+0x126f/0x1c90 kernel/locking/lockdep.c:5240
> > > >        lock_acquire kernel/locking/lockdep.c:5871 [inline]
> > > >        lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5828
> > > >        __mutex_lock_common kernel/locking/mutex.c:602 [inline]
> > > >        __mutex_lock+0x199/0xb90 kernel/locking/mutex.c:747
> > > >        xsk_notifier+0x101/0x280 net/xdp/xsk.c:1649
> > > >        notifier_call_chain+0xbc/0x410 kernel/notifier.c:85
> > > >        call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:2230
> > > >        call_netdevice_notifiers_extack net/core/dev.c:2268 [inline]
> > > >        call_netdevice_notifiers net/core/dev.c:2282 [inline]
> > > >        unregister_netdevice_many_notify+0xf9d/0x2700 net/core/dev.c:12077
> > > >        rtnl_delete_link net/core/rtnetlink.c:3511 [inline]
> > > >        rtnl_dellink+0x3cb/0xa80 net/core/rtnetlink.c:3553
> > > >        rtnetlink_rcv_msg+0x95e/0xe90 net/core/rtnetlink.c:6944
> > > >        netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2534
> > > >        netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
> > > >        netlink_unicast+0x53d/0x7f0 net/netlink/af_netlink.c:1339
> > > >        netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1883
> > > >        sock_sendmsg_nosec net/socket.c:712 [inline]
> > > >        __sock_sendmsg net/socket.c:727 [inline]
> > > >        ____sys_sendmsg+0xa98/0xc70 net/socket.c:2566
> > > >        ___sys_sendmsg+0x134/0x1d0 net/socket.c:2620
> > > >        __sys_sendmsg+0x16d/0x220 net/socket.c:2652
> > > >        do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > > >        do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> > > >        entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > > >
> > > > other info that might help us debug this:
> > > >
> > > > Chain exists of:
> > > >   &xs->mutex --> &dev_instance_lock_key#20 --> &net->xdp.lock
> > > >
> > > >  Possible unsafe locking scenario:
> > > >
> > > >        CPU0                    CPU1
> > > >        ----                    ----
> > > >   lock(&net->xdp.lock);
> > > >                                lock(&dev_instance_lock_key#20);
> > > >                                lock(&net->xdp.lock);
> > > >   lock(&xs->mutex);
> > >
> > > I feel the above race map is not that right?
> > >
> > > My understanding is as shown below.
> > > CPU 0                                   CPU 1
> > > -----                                   -----
> > > unregister_netdevice_many_notify()
> > >                                         xsk_bind()
> > > netdev_lock_ops(dev);
> > >                                         mutex_lock(&xs->mutex);
> > >                                         netdev_lock_ops(dev);
> > > xsk_notifier()
> > > mutex_lock(&net->xdp.lock);
> > > mutex_lock(&xs->mutex);
> > >
> > > So ABBA lock case happens, IIUC.
> >
> > Since we can't (easily) control the ordering in notifiers, looks like
> > we need to align xsk_bind ordering (to be instance lock -> xs->mutex).
> > LMK if you want to take a stab at this; otherwise I'll try to send a
> > fix.
> 
> I'm still learning the af_xdp. Sure, I'm interested in it, just a bit
> worried if I'm capable of completing it. I will try then.

SG, thanks! If you need more details lmk, but basically we need to reorder
netdev_lock_ops() and mutex_lock(&xs->mutex)+XSK_READY check.
And similarly for cleanup (out_unlock/out_release) path.
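
A rough userspace sketch of the ordering suggested here, for illustration
only. All types and helpers (toy_lock, toy_acquire(), bind_sketch(), the
SKETCH_* states) are hypothetical stand-ins, not the real xsk_bind(); the
toy lock does no blocking and merely asserts that locks are taken in rank
order and released in reverse. A later message in this thread questions
whether reordering alone is sufficient, so treat this as a picture of the
idea rather than a fix.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a mutex: tracks ownership and ordering, never blocks. */
struct toy_lock {
	int rank;	/* position in the agreed order; lower is taken first */
	int prev;	/* rank that was on top before this lock was taken */
	bool held;
};

static int top_rank;	/* rank of the most recently taken lock, 0 if none */

static void toy_acquire(struct toy_lock *l)
{
	assert(!l->held && l->rank > top_rank);	/* enforce the lock order */
	l->held = true;
	l->prev = top_rank;
	top_rank = l->rank;
}

static void toy_release(struct toy_lock *l)
{
	assert(l->held && l->rank == top_rank);	/* release in reverse order */
	l->held = false;
	top_rank = l->prev;
}

enum xsk_state_sketch { SKETCH_XSK_READY, SKETCH_XSK_BOUND };

struct xsk_sketch {
	struct toy_lock mutex;		/* stands in for xs->mutex */
	enum xsk_state_sketch state;
};

struct dev_sketch {
	struct toy_lock instance_lock;	/* stands in for the instance lock */
};

/* Instance lock first, then the socket mutex, then the state check. */
static int bind_sketch(struct xsk_sketch *xs, struct dev_sketch *dev)
{
	int err = 0;

	toy_acquire(&dev->instance_lock);
	toy_acquire(&xs->mutex);
	if (xs->state != SKETCH_XSK_READY) {
		err = -EBUSY;
		goto out_unlock;
	}
	xs->state = SKETCH_XSK_BOUND;
out_unlock:
	/* Cleanup mirrors the acquisition order in reverse. */
	toy_release(&xs->mutex);
	toy_release(&dev->instance_lock);
	return err;
}
```

With the instance lock at rank 1 and the socket mutex at rank 2, a first
bind_sketch() call succeeds and a second one returns -EBUSY, with both
locks released on either path.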

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [syzbot] [bpf?] [net?] possible deadlock in xsk_notifier (3)
  2025-06-25 15:46       ` Stanislav Fomichev
@ 2025-06-25 20:48         ` Stanislav Fomichev
  2025-06-25 21:03           ` Jakub Kicinski
  0 siblings, 1 reply; 9+ messages in thread
From: Stanislav Fomichev @ 2025-06-25 20:48 UTC (permalink / raw)
  To: Jason Xing
  Cc: syzbot, andrii, ast, bjorn, bpf, daniel, davem, edumazet, horms,
	jonathan.lemon, kuba, linux-kernel, maciej.fijalkowski,
	magnus.karlsson, netdev, pabeni, sdf, syzkaller-bugs

On 06/25, Stanislav Fomichev wrote:
> On 06/25, Jason Xing wrote:
> > On Wed, Jun 25, 2025 at 11:06 PM Stanislav Fomichev
> > <stfomichev@gmail.com> wrote:
> > >
> > > On 06/25, Jason Xing wrote:
> > > > On Wed, Jun 25, 2025 at 2:51 AM syzbot
> > > > <syzbot+e67ea9c235b13b4f0020@syzkaller.appspotmail.com> wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > syzbot found the following issue on:
> > > > >
> > > > > HEAD commit:    78f4e737a53e Merge tag 'for-6.16/dm-fixes' of git://git.ke..
> > > > > git tree:       upstream
> > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=11b48f0c580000
> > > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=12ec1a20ad573841
> > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=e67ea9c235b13b4f0020
> > > > > compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> > > > >
> > > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > > >
> > > > > Downloadable assets:
> > > > > disk image: https://storage.googleapis.com/syzbot-assets/3ff97b2d201b/disk-78f4e737.raw.xz
> > > > > vmlinux: https://storage.googleapis.com/syzbot-assets/1968f46c8915/vmlinux-78f4e737.xz
> > > > > kernel image: https://storage.googleapis.com/syzbot-assets/3455e371b965/bzImage-78f4e737.xz
> > > > >
> > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > > Reported-by: syzbot+e67ea9c235b13b4f0020@syzkaller.appspotmail.com
> > > > >
> > > > > netlink: 4 bytes leftover after parsing attributes in process `syz.1.1331'.
> > > > > ======================================================
> > > > > WARNING: possible circular locking dependency detected
> > > > > 6.16.0-rc3-syzkaller-00042-g78f4e737a53e #0 Not tainted
> > > > > ------------------------------------------------------
> > > > > syz.1.1331/11144 is trying to acquire lock:
> > > > > ffff888054b136b0 (&xs->mutex){+.+.}-{4:4}, at: xsk_notifier+0x101/0x280 net/xdp/xsk.c:1649
> > > > >
> > > > > but task is already holding lock:
> > > > > ffff888052f43d58 (&net->xdp.lock){+.+.}-{4:4}, at: xsk_notifier+0xa4/0x280 net/xdp/xsk.c:1645
> > > > >
> > > > > which lock already depends on the new lock.
> > > > >
> > > > >
> > > > > the existing dependency chain (in reverse order) is:
> > > > >
> > > > > -> #2 (&net->xdp.lock){+.+.}-{4:4}:
> > > > >        __mutex_lock_common kernel/locking/mutex.c:602 [inline]
> > > > >        __mutex_lock+0x199/0xb90 kernel/locking/mutex.c:747
> > > > >        xsk_notifier+0xa4/0x280 net/xdp/xsk.c:1645
> > > > >        notifier_call_chain+0xbc/0x410 kernel/notifier.c:85
> > > > >        call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:2230
> > > > >        call_netdevice_notifiers_extack net/core/dev.c:2268 [inline]
> > > > >        call_netdevice_notifiers net/core/dev.c:2282 [inline]
> > > > >        unregister_netdevice_many_notify+0xf9d/0x2700 net/core/dev.c:12077
> > > > >        unregister_netdevice_many net/core/dev.c:12140 [inline]
> > > > >        unregister_netdevice_queue+0x305/0x3f0 net/core/dev.c:11984
> > > > >        register_netdevice+0x18f1/0x2270 net/core/dev.c:11149
> > > > >        lapbeth_new_device drivers/net/wan/lapbether.c:420 [inline]
> > > > >        lapbeth_device_event+0x5b1/0xbe0 drivers/net/wan/lapbether.c:462
> > > > >        notifier_call_chain+0xbc/0x410 kernel/notifier.c:85
> > > > >        call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:2230
> > > > >        call_netdevice_notifiers_extack net/core/dev.c:2268 [inline]
> > > > >        call_netdevice_notifiers net/core/dev.c:2282 [inline]
> > > > >        __dev_notify_flags+0x12c/0x2e0 net/core/dev.c:9497
> > > > >        netif_change_flags+0x108/0x160 net/core/dev.c:9526
> > > > >        dev_change_flags+0xba/0x250 net/core/dev_api.c:68
> > > > >        devinet_ioctl+0x11d5/0x1f50 net/ipv4/devinet.c:1200
> > > > >        inet_ioctl+0x3a7/0x3f0 net/ipv4/af_inet.c:1001
> > > > >        sock_do_ioctl+0x118/0x280 net/socket.c:1190
> > > > >        sock_ioctl+0x227/0x6b0 net/socket.c:1311
> > > > >        vfs_ioctl fs/ioctl.c:51 [inline]
> > > > >        __do_sys_ioctl fs/ioctl.c:907 [inline]
> > > > >        __se_sys_ioctl fs/ioctl.c:893 [inline]
> > > > >        __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:893
> > > > >        do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > > > >        do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> > > > >        entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > > > >
> > > > > -> #1 (&dev_instance_lock_key#20){+.+.}-{4:4}:
> > > > >        __mutex_lock_common kernel/locking/mutex.c:602 [inline]
> > > > >        __mutex_lock+0x199/0xb90 kernel/locking/mutex.c:747
> > > > >        netdev_lock include/linux/netdevice.h:2756 [inline]
> > > > >        netdev_lock_ops include/net/netdev_lock.h:42 [inline]
> > > > >        xsk_bind+0x37c/0x1570 net/xdp/xsk.c:1189
> > > > >        __sys_bind_socket net/socket.c:1810 [inline]
> > > > >        __sys_bind_socket net/socket.c:1802 [inline]
> > > > >        __sys_bind+0x1a7/0x260 net/socket.c:1841
> > > > >        __do_sys_bind net/socket.c:1846 [inline]
> > > > >        __se_sys_bind net/socket.c:1844 [inline]
> > > > >        __x64_sys_bind+0x72/0xb0 net/socket.c:1844
> > > > >        do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > > > >        do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> > > > >        entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > > > >
> > > > > -> #0 (&xs->mutex){+.+.}-{4:4}:
> > > > >        check_prev_add kernel/locking/lockdep.c:3168 [inline]
> > > > >        check_prevs_add kernel/locking/lockdep.c:3287 [inline]
> > > > >        validate_chain kernel/locking/lockdep.c:3911 [inline]
> > > > >        __lock_acquire+0x126f/0x1c90 kernel/locking/lockdep.c:5240
> > > > >        lock_acquire kernel/locking/lockdep.c:5871 [inline]
> > > > >        lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5828
> > > > >        __mutex_lock_common kernel/locking/mutex.c:602 [inline]
> > > > >        __mutex_lock+0x199/0xb90 kernel/locking/mutex.c:747
> > > > >        xsk_notifier+0x101/0x280 net/xdp/xsk.c:1649
> > > > >        notifier_call_chain+0xbc/0x410 kernel/notifier.c:85
> > > > >        call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:2230
> > > > >        call_netdevice_notifiers_extack net/core/dev.c:2268 [inline]
> > > > >        call_netdevice_notifiers net/core/dev.c:2282 [inline]
> > > > >        unregister_netdevice_many_notify+0xf9d/0x2700 net/core/dev.c:12077
> > > > >        rtnl_delete_link net/core/rtnetlink.c:3511 [inline]
> > > > >        rtnl_dellink+0x3cb/0xa80 net/core/rtnetlink.c:3553
> > > > >        rtnetlink_rcv_msg+0x95e/0xe90 net/core/rtnetlink.c:6944
> > > > >        netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2534
> > > > >        netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
> > > > >        netlink_unicast+0x53d/0x7f0 net/netlink/af_netlink.c:1339
> > > > >        netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1883
> > > > >        sock_sendmsg_nosec net/socket.c:712 [inline]
> > > > >        __sock_sendmsg net/socket.c:727 [inline]
> > > > >        ____sys_sendmsg+0xa98/0xc70 net/socket.c:2566
> > > > >        ___sys_sendmsg+0x134/0x1d0 net/socket.c:2620
> > > > >        __sys_sendmsg+0x16d/0x220 net/socket.c:2652
> > > > >        do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > > > >        do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> > > > >        entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > > > >
> > > > > other info that might help us debug this:
> > > > >
> > > > > Chain exists of:
> > > > >   &xs->mutex --> &dev_instance_lock_key#20 --> &net->xdp.lock
> > > > >
> > > > >  Possible unsafe locking scenario:
> > > > >
> > > > >        CPU0                    CPU1
> > > > >        ----                    ----
> > > > >   lock(&net->xdp.lock);
> > > > >                                lock(&dev_instance_lock_key#20);
> > > > >                                lock(&net->xdp.lock);
> > > > >   lock(&xs->mutex);
> > > >
> > > > I feel the above race map is not that right?
> > > >
> > > > My understanding is as shown below.
> > > > CPU 0                                   CPU 1
> > > > -----                                   -----
> > > > unregister_netdevice_many_notify()
> > > >                                         xsk_bind()
> > > > netdev_lock_ops(dev);
> > > >                                         mutex_lock(&xs->mutex);
> > > >                                         netdev_lock_ops(dev);
> > > > xsk_notifier()
> > > > mutex_lock(&net->xdp.lock);
> > > > mutex_lock(&xs->mutex);
> > > >
> > > > So ABBA lock case happens, IIUC.
> > >
> > > Since we can't (easily) control the ordering in notifiers, looks like
> > > we need to align xsk_bind ordering (to be instance lock -> xs->mutex).
> > > LMK if you want to take a stab at this; otherwise I'll try to send a
> > > fix.
> > 
> > I'm still learning the af_xdp. Sure, I'm interested in it, just a bit
> > worried if I'm capable of completing it. I will try then.
> 
> SG, thanks! If you need more details lmk, but basically we need to reorder
> netdev_lock_ops() and mutex_lock(&xs->mutex)+XSK_READY check.
> And similarly for cleanup (out_unlock/out_release) path.

Jakub just told me that I'm wrong and it looks similar to commit
f0433eea4688 ("net: don't mix device locking in dev_close_many()
calls"). So this is not as easy as flipping the lock ordering :-(

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [syzbot] [bpf?] [net?] possible deadlock in xsk_notifier (3)
  2025-06-25 20:48         ` Stanislav Fomichev
@ 2025-06-25 21:03           ` Jakub Kicinski
  2025-06-25 23:37             ` Stanislav Fomichev
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2025-06-25 21:03 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Jason Xing, syzbot, andrii, ast, bjorn, bpf, daniel, davem,
	edumazet, horms, jonathan.lemon, linux-kernel, maciej.fijalkowski,
	magnus.karlsson, netdev, pabeni, sdf, syzkaller-bugs

On Wed, 25 Jun 2025 13:48:03 -0700 Stanislav Fomichev wrote:
> > > I'm still learning the af_xdp. Sure, I'm interested in it, just a bit
> > > worried if I'm capable of completing it. I will try then.  
> > 
> > SG, thanks! If you need more details lmk, but basically we need to reorder
> > netdev_lock_ops() and mutex_lock(&xs->mutex)+XSK_READY check.
> > And similarly for cleanup (out_unlock/out_release) path.  
> 
> Jakub just told me that I'm wrong and it looks similar to commit
> f0433eea4688 ("net: don't mix device locking in dev_close_many()
> calls"). So this is not as easy as flipping the lock ordering :-(

I don't think registering a netdev from the NETDEV_UP event of another
netdev is going to play well with instance locks and lockdep.
This is likely a false positive but if syzbot keeps complaining
we could:

diff --git a/drivers/net/wan/lapbether.c b/drivers/net/wan/lapbether.c
index 995a7207bdf8..f357a7ac70ac 100644
--- a/drivers/net/wan/lapbether.c
+++ b/drivers/net/wan/lapbether.c
@@ -81,7 +81,7 @@ static struct lapbethdev *lapbeth_get_x25_dev(struct net_device *dev)
 
 static __inline__ int dev_is_ethdev(struct net_device *dev)
 {
-       return dev->type == ARPHRD_ETHER && strncmp(dev->name, "dummy", 5);
+       return dev->type == ARPHRD_ETHER && !netdev_need_ops_lock(dev);
 }
 
IDK what the dummy hack is there for, it's been like that since
git began..
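
For readers comparing the two predicates in the diff above, here is a small
userspace sketch. struct netdev_sketch and its needs_ops_lock field are
hypothetical; the field stands in for whatever netdev_need_ops_lock(dev)
would return for a real device, which this sketch does not try to model.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_ARPHRD_ETHER 1	/* same value as ARPHRD_ETHER */

/* Hypothetical, trimmed-down stand-in for struct net_device. */
struct netdev_sketch {
	unsigned short type;
	char name[16];
	bool needs_ops_lock;	/* assumption: models netdev_need_ops_lock() */
};

/* Current check: Ethernet device whose name does not start with "dummy". */
static bool is_ethdev_old(const struct netdev_sketch *dev)
{
	assert(dev);
	return dev->type == SKETCH_ARPHRD_ETHER &&
	       strncmp(dev->name, "dummy", 5) != 0;
}

/* Proposed check: Ethernet device that does not use the instance lock. */
static bool is_ethdev_new(const struct netdev_sketch *dev)
{
	assert(dev);
	return dev->type == SKETCH_ARPHRD_ETHER && !dev->needs_ops_lock;
}
```

Under these assumptions, an ops-locked Ethernet device passes the old
check but not the new one, while a device named "dummy0" that does not
need the ops lock is rejected by the old check and accepted by the new
one, since the name test is dropped.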

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [syzbot] [bpf?] [net?] possible deadlock in xsk_notifier (3)
  2025-06-25 21:03           ` Jakub Kicinski
@ 2025-06-25 23:37             ` Stanislav Fomichev
  2025-06-26  0:24               ` Jason Xing
  0 siblings, 1 reply; 9+ messages in thread
From: Stanislav Fomichev @ 2025-06-25 23:37 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jason Xing, syzbot, andrii, ast, bjorn, bpf, daniel, davem,
	edumazet, horms, jonathan.lemon, linux-kernel, maciej.fijalkowski,
	magnus.karlsson, netdev, pabeni, sdf, syzkaller-bugs

On 06/25, Jakub Kicinski wrote:
> On Wed, 25 Jun 2025 13:48:03 -0700 Stanislav Fomichev wrote:
> > > > I'm still learning the af_xdp. Sure, I'm interested in it, just a bit
> > > > worried if I'm capable of completing it. I will try then.  
> > > 
> > > SG, thanks! If you need more details lmk, but basically we need to reorder
> > > > netdev_lock_ops() and mutex_lock(&xs->mutex)+XSK_READY check.
> > > And similarly for cleanup (out_unlock/out_release) path.  
> > 
> > Jakub just told me that I'm wrong and it looks similar to commit
> > f0433eea4688 ("net: don't mix device locking in dev_close_many()
> > calls"). So this is not as easy as flipping the lock ordering :-(
> 
> I don't think registering a netdev from the NETDEV_UP event of another
> netdev is going to play well with instance locks and lockdep.
> This is likely a false positive but if syzbot keeps complaining
> we could:
> 
> diff --git a/drivers/net/wan/lapbether.c b/drivers/net/wan/lapbether.c
> index 995a7207bdf8..f357a7ac70ac 100644
> --- a/drivers/net/wan/lapbether.c
> +++ b/drivers/net/wan/lapbether.c
> @@ -81,7 +81,7 @@ static struct lapbethdev *lapbeth_get_x25_dev(struct net_device *dev)
>  
>  static __inline__ int dev_is_ethdev(struct net_device *dev)
>  {
> -       return dev->type == ARPHRD_ETHER && strncmp(dev->name, "dummy", 5);
> +       return dev->type == ARPHRD_ETHER && !netdev_need_ops_lock(dev);
>  }
>  
> IDK what the dummy hack is there for, it's been like that since
> git began..

Agreed. The driver itlself looks interesting. IIUC, when loaded, it
unconditionally creates virtual netdev for any eth device in the init
ns. A bit surprised that syzbot enables it, none of my machines have it
enabled.
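
To make the one-line change in the diff above easier to follow, here is a
minimal userspace sketch of the two predicates. It is not the driver code:
struct toy_dev, TOY_ARPHRD_ETHER and the need_ops_lock field are hypothetical
stand-ins (the field only models what netdev_need_ops_lock() would report for
a device), and which real devices need the ops lock is deliberately left as an
assumption. The point is only that strncmp() returns 0 on a match, so the old
check accepts every Ethernet-type device whose name does not begin with
"dummy", while the proposed check ignores the name entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for ARPHRD_ETHER, which is 1 in <linux/if_arp.h>. */
#define TOY_ARPHRD_ETHER 1

/* Toy device with only the fields the predicates look at.
 * This is NOT struct net_device. */
struct toy_dev {
	int type;
	char name[16];
	bool need_ops_lock; /* assumption: models netdev_need_ops_lock(dev) */
};

/* Old check: Ethernet type and name does not start with "dummy".
 * strncmp() yields 0 on a match, so a "dummy*" name makes this false. */
int toy_is_ethdev_old(const struct toy_dev *dev)
{
	return dev->type == TOY_ARPHRD_ETHER &&
	       strncmp(dev->name, "dummy", 5);
}

/* Proposed check: Ethernet type and the device does not need the ops lock. */
int toy_is_ethdev_new(const struct toy_dev *dev)
{
	return dev->type == TOY_ARPHRD_ETHER && !dev->need_ops_lock;
}

/* Small usage example; the device names and flags are made up. */
void toy_demo(void)
{
	struct toy_dev plain  = { TOY_ARPHRD_ETHER, "eth0",   false };
	struct toy_dev dummy  = { TOY_ARPHRD_ETHER, "dummy0", false };
	struct toy_dev locked = { TOY_ARPHRD_ETHER, "eth1",   true  };

	assert(toy_is_ethdev_old(&plain) && toy_is_ethdev_new(&plain));
	/* Name-based filter only exists in the old predicate. */
	assert(!toy_is_ethdev_old(&dummy) && toy_is_ethdev_new(&dummy));
	/* Lock-based filter only exists in the new predicate. */
	assert(toy_is_ethdev_old(&locked) && !toy_is_ethdev_new(&locked));
}
```

Calling toy_demo() from a main() exercises all three cases. Under these
assumptions the two predicates differ in both directions: a "dummy*" device
that does not need the ops lock would newly be accepted, and any
ops-locked Ethernet device would newly be skipped.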

* Re: [syzbot] [bpf?] [net?] possible deadlock in xsk_notifier (3)
  2025-06-25 23:37             ` Stanislav Fomichev
@ 2025-06-26  0:24               ` Jason Xing
  0 siblings, 0 replies; 9+ messages in thread
From: Jason Xing @ 2025-06-26  0:24 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Jakub Kicinski, syzbot, andrii, ast, bjorn, bpf, daniel, davem,
	edumazet, horms, jonathan.lemon, linux-kernel, maciej.fijalkowski,
	magnus.karlsson, netdev, pabeni, sdf, syzkaller-bugs

On Thu, Jun 26, 2025 at 7:37 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
>
> On 06/25, Jakub Kicinski wrote:
> > On Wed, 25 Jun 2025 13:48:03 -0700 Stanislav Fomichev wrote:
> > > > > I'm still learning the af_xdp. Sure, I'm interested in it, just a bit
> > > > > worried if I'm capable of completing it. I will try then.
> > > >
> > > > SG, thanks! If you need more details lmk, but basically we need to reorder
> > > > netdev_lock_ops() and mutex_lock(&xs->mutex)+XSK_READY check.
> > > > And similarly for cleanup (out_unlock/out_release) path.
> > >
> > > Jakub just told me that I'm wrong and it looks similar to commit
> > > f0433eea4688 ("net: don't mix device locking in dev_close_many()
> > > calls"). So this is not as easy as flipping the lock ordering :-(
> >
> > I don't think registering a netdev from the NETDEV_UP event of another
> > netdev is going to play well with instance locks and lockdep.
> > This is likely a false positive but if syzbot keeps complaining
> > we could:
> >
> > diff --git a/drivers/net/wan/lapbether.c b/drivers/net/wan/lapbether.c
> > index 995a7207bdf8..f357a7ac70ac 100644
> > --- a/drivers/net/wan/lapbether.c
> > +++ b/drivers/net/wan/lapbether.c
> > @@ -81,7 +81,7 @@ static struct lapbethdev *lapbeth_get_x25_dev(struct net_device *dev)
> >
> >  static __inline__ int dev_is_ethdev(struct net_device *dev)
> >  {
> > -       return dev->type == ARPHRD_ETHER && strncmp(dev->name, "dummy", 5);
> > +       return dev->type == ARPHRD_ETHER && !netdev_need_ops_lock(dev);
> >  }
> >
> > IDK what the dummy hack is there for, it's been like that since
> > git began..
>
> Agreed. The driver itself looks interesting. IIUC, when loaded, it
> unconditionally creates virtual netdev for any eth device in the init
> ns. A bit surprised that syzbot enables it, none of my machines have it
> enabled.

An interesting case, I find. Thank you both for the detailed explanation :)

Thanks,
Jason

end of thread, other threads:[~2025-06-26  0:24 UTC | newest]

Thread overview: 9+ messages
2025-06-24 18:51 [syzbot] [bpf?] [net?] possible deadlock in xsk_notifier (3) syzbot
2025-06-25 13:44 ` Jason Xing
2025-06-25 15:06   ` Stanislav Fomichev
2025-06-25 15:38     ` Jason Xing
2025-06-25 15:46       ` Stanislav Fomichev
2025-06-25 20:48         ` Stanislav Fomichev
2025-06-25 21:03           ` Jakub Kicinski
2025-06-25 23:37             ` Stanislav Fomichev
2025-06-26  0:24               ` Jason Xing