* [syzbot] [virt?] [net?] possible deadlock in vsock_linger
@ 2025-10-21 0:02 syzbot
2025-10-21 8:27 ` Stefano Garzarella
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: syzbot @ 2025-10-21 0:02 UTC (permalink / raw)
To: davem, edumazet, horms, kuba, linux-kernel, netdev, pabeni,
sgarzare, syzkaller-bugs, virtualization
Hello,
syzbot found the following issue on:
HEAD commit: d9043c79ba68 Merge tag 'sched_urgent_for_v6.18_rc2' of git..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=130983cd980000
kernel config: https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17f0f52f980000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11ea9734580000
Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-d9043c79.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/0546b6eaf1aa/vmlinux-d9043c79.xz
kernel image: https://storage.googleapis.com/syzbot-assets/81285b4ada51/bzImage-d9043c79.xz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com
======================================================
WARNING: possible circular locking dependency detected
syzkaller #0 Not tainted
------------------------------------------------------
syz.0.17/6098 is trying to acquire lock:
ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1679 [inline]
ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
but task is already holding lock:
ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (vsock_register_mutex){+.+.}-{4:4}:
__mutex_lock_common kernel/locking/mutex.c:598 [inline]
__mutex_lock+0x193/0x1060 kernel/locking/mutex.c:760
vsock_registered_transport_cid net/vmw_vsock/af_vsock.c:560 [inline]
vsock_find_cid net/vmw_vsock/af_vsock.c:570 [inline]
__vsock_bind+0x1b5/0xa10 net/vmw_vsock/af_vsock.c:752
vsock_bind+0xc6/0x120 net/vmw_vsock/af_vsock.c:1002
__sys_bind_socket net/socket.c:1874 [inline]
__sys_bind_socket net/socket.c:1866 [inline]
__sys_bind+0x1a7/0x260 net/socket.c:1905
__do_sys_bind net/socket.c:1910 [inline]
__se_sys_bind net/socket.c:1908 [inline]
__x64_sys_bind+0x72/0xb0 net/socket.c:1908
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #0 (sk_lock-AF_VSOCK){+.+.}-{0:0}:
check_prev_add kernel/locking/lockdep.c:3165 [inline]
check_prevs_add kernel/locking/lockdep.c:3284 [inline]
validate_chain kernel/locking/lockdep.c:3908 [inline]
__lock_acquire+0x126f/0x1c90 kernel/locking/lockdep.c:5237
lock_acquire kernel/locking/lockdep.c:5868 [inline]
lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5825
lock_sock_nested+0x41/0xf0 net/core/sock.c:3720
lock_sock include/net/sock.h:1679 [inline]
vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
virtio_transport_close net/vmw_vsock/virtio_transport_common.c:1271 [inline]
virtio_transport_release+0x52a/0x640 net/vmw_vsock/virtio_transport_common.c:1291
vsock_assign_transport+0x320/0x900 net/vmw_vsock/af_vsock.c:502
vsock_connect+0x201/0xee0 net/vmw_vsock/af_vsock.c:1578
__sys_connect_file+0x141/0x1a0 net/socket.c:2102
__sys_connect+0x13b/0x160 net/socket.c:2121
__do_sys_connect net/socket.c:2127 [inline]
__se_sys_connect net/socket.c:2124 [inline]
__x64_sys_connect+0x72/0xb0 net/socket.c:2124
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
other info that might help us debug this:
Possible unsafe locking scenario:
        CPU0                    CPU1
        ----                    ----
   lock(vsock_register_mutex);
                                lock(sk_lock-AF_VSOCK);
                                lock(vsock_register_mutex);
   lock(sk_lock-AF_VSOCK);
*** DEADLOCK ***
1 lock held by syz.0.17/6098:
#0: ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469
stack backtrace:
CPU: 3 UID: 0 PID: 6098 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
print_circular_bug+0x275/0x350 kernel/locking/lockdep.c:2043
check_noncircular+0x14c/0x170 kernel/locking/lockdep.c:2175
check_prev_add kernel/locking/lockdep.c:3165 [inline]
check_prevs_add kernel/locking/lockdep.c:3284 [inline]
validate_chain kernel/locking/lockdep.c:3908 [inline]
__lock_acquire+0x126f/0x1c90 kernel/locking/lockdep.c:5237
lock_acquire kernel/locking/lockdep.c:5868 [inline]
lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5825
lock_sock_nested+0x41/0xf0 net/core/sock.c:3720
lock_sock include/net/sock.h:1679 [inline]
vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
virtio_transport_close net/vmw_vsock/virtio_transport_common.c:1271 [inline]
virtio_transport_release+0x52a/0x640 net/vmw_vsock/virtio_transport_common.c:1291
vsock_assign_transport+0x320/0x900 net/vmw_vsock/af_vsock.c:502
vsock_connect+0x201/0xee0 net/vmw_vsock/af_vsock.c:1578
__sys_connect_file+0x141/0x1a0 net/socket.c:2102
__sys_connect+0x13b/0x160 net/socket.c:2121
__do_sys_connect net/socket.c:2127 [inline]
__se_sys_connect net/socket.c:2124 [inline]
__x64_sys_connect+0x72/0xb0 net/socket.c:2124
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f767bf8efc9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff0a2857b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 00007f767c1e5fa0 RCX: 00007f767bf8efc9
RDX: 0000000000000010 RSI: 0000200000000000 RDI: 0000000000000004
RBP: 00007f767c011f91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f767c1e5fa0 R14: 00007f767c1e5fa0 R15: 0000000000000003
</TASK>
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.
If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)
If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report
If you want to undo deduplication, reply with:
#syz undup
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
2025-10-21 0:02 [syzbot] [virt?] [net?] possible deadlock in vsock_linger syzbot
@ 2025-10-21 8:27 ` Stefano Garzarella
2025-10-21 10:48 ` Stefano Garzarella
2025-10-21 10:09 ` Stefano Garzarella
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2025-10-21 8:27 UTC (permalink / raw)
To: syzbot, Michal Luczaj
Cc: davem, edumazet, horms, kuba, linux-kernel, netdev, pabeni,
syzkaller-bugs, virtualization
Hi Michal,
On Mon, Oct 20, 2025 at 05:02:56PM -0700, syzbot wrote:
>Hello,
>
>syzbot found the following issue on:
>
>HEAD commit: d9043c79ba68 Merge tag 'sched_urgent_for_v6.18_rc2' of git..
>git tree: upstream
>console output: https://syzkaller.appspot.com/x/log.txt?x=130983cd980000
>kernel config: https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
>dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
>compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17f0f52f980000
>C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11ea9734580000
>
>Downloadable assets:
>disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-d9043c79.raw.xz
>vmlinux: https://storage.googleapis.com/syzbot-assets/0546b6eaf1aa/vmlinux-d9043c79.xz
>kernel image: https://storage.googleapis.com/syzbot-assets/81285b4ada51/bzImage-d9043c79.xz
>
>IMPORTANT: if you fix the issue, please add the following tag to the commit:
>Reported-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com
>
>======================================================
>WARNING: possible circular locking dependency detected
>syzkaller #0 Not tainted
>------------------------------------------------------
>syz.0.17/6098 is trying to acquire lock:
>ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1679 [inline]
>ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
Could this be related to our recent work on linger in vsock?
>
>but task is already holding lock:
>ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469
>
>which lock already depends on the new lock.
>
>
>the existing dependency chain (in reverse order) is:
>
>-> #1 (vsock_register_mutex){+.+.}-{4:4}:
> __mutex_lock_common kernel/locking/mutex.c:598 [inline]
> __mutex_lock+0x193/0x1060 kernel/locking/mutex.c:760
> vsock_registered_transport_cid net/vmw_vsock/af_vsock.c:560 [inline]
Ah, no, maybe this is related to commit 209fd720838a ("vsock:
Fix transport_{g2h,h2g} TOCTOU"), where we added locking in
vsock_find_cid().
Maybe we can just move the checks at the top of __vsock_bind() to the
callers. I mean:
    /* First ensure this socket isn't already bound. */
    if (vsock_addr_bound(&vsk->local_addr))
        return -EINVAL;
    /* Now bind to the provided address or select appropriate values if
     * none are provided (VMADDR_CID_ANY and VMADDR_PORT_ANY). Note that
     * like AF_INET prevents binding to a non-local IP address (in most
     * cases), we only allow binding to a local CID.
     */
    if (addr->svm_cid != VMADDR_CID_ANY && !vsock_find_cid(addr->svm_cid))
        return -EADDRNOTAVAIL;
We have 2 callers: vsock_auto_bind() and vsock_bind().
vsock_auto_bind() already checks whether the socket is bound and, if
not, sets VMADDR_CID_ANY, so it can skip those checks.
In vsock_bind() we can do the checks before lock_sock(sk), at least the
check on vm_addr that calls vsock_find_cid().
I'm preparing a patch to do this.
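To make the idea concrete, here is a rough user-space sketch of the reordering, using pthread mutexes in place of the kernel locks. It is not the kernel patch: every `toy_*` name, `TOY_CID_ANY`, and the registered CID value 3 are made up for illustration. The point is only that the CID lookup, which takes the registry mutex, runs before the socket lock is taken, so the bind path never nests the registry mutex inside the socket lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

#define TOY_CID_ANY 0xFFFFFFFFu /* hypothetical stand-in for VMADDR_CID_ANY */

/* Hypothetical stand-ins for vsock_register_mutex and a registered CID. */
static pthread_mutex_t toy_register_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned int toy_registered_cid = 3;

struct toy_sock {
	pthread_mutex_t lock;	/* stand-in for the socket lock */
	bool bound;
	unsigned int cid;
};

static void toy_sock_init(struct toy_sock *sk)
{
	pthread_mutex_init(&sk->lock, NULL);
	sk->bound = false;
	sk->cid = 0;
}

/* Takes the registry mutex, so it must be called without sk->lock held. */
static bool toy_find_cid(unsigned int cid)
{
	bool found;

	pthread_mutex_lock(&toy_register_mutex);
	found = (cid == toy_registered_cid);
	pthread_mutex_unlock(&toy_register_mutex);
	return found;
}

static int toy_bind(struct toy_sock *sk, unsigned int cid)
{
	int err = 0;

	/* CID check first, while no socket lock is held. */
	if (cid != TOY_CID_ANY && !toy_find_cid(cid))
		return -EADDRNOTAVAIL;

	pthread_mutex_lock(&sk->lock);

	/* The "already bound" check needs socket state, so it stays locked. */
	if (sk->bound) {
		err = -EINVAL;
		goto out;
	}
	sk->bound = true;
	sk->cid = cid;
out:
	pthread_mutex_unlock(&sk->lock);
	return err;
}

/* Usage example: unknown CID, then a first bind, then a second bind. */
void toy_bind_example(void)
{
	struct toy_sock sk;

	toy_sock_init(&sk);
	assert(toy_bind(&sk, 42) == -EADDRNOTAVAIL);
	assert(toy_bind(&sk, TOY_CID_ANY) == 0);
	assert(toy_bind(&sk, 3) == -EINVAL);
}
```

This only models the bind path. It says nothing about other paths that may take the two locks in the opposite order.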
Stefano
> vsock_find_cid net/vmw_vsock/af_vsock.c:570 [inline]
> __vsock_bind+0x1b5/0xa10 net/vmw_vsock/af_vsock.c:752
> vsock_bind+0xc6/0x120 net/vmw_vsock/af_vsock.c:1002
> __sys_bind_socket net/socket.c:1874 [inline]
> __sys_bind_socket net/socket.c:1866 [inline]
> __sys_bind+0x1a7/0x260 net/socket.c:1905
> __do_sys_bind net/socket.c:1910 [inline]
> __se_sys_bind net/socket.c:1908 [inline]
> __x64_sys_bind+0x72/0xb0 net/socket.c:1908
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
>-> #0 (sk_lock-AF_VSOCK){+.+.}-{0:0}:
> check_prev_add kernel/locking/lockdep.c:3165 [inline]
> check_prevs_add kernel/locking/lockdep.c:3284 [inline]
> validate_chain kernel/locking/lockdep.c:3908 [inline]
> __lock_acquire+0x126f/0x1c90 kernel/locking/lockdep.c:5237
> lock_acquire kernel/locking/lockdep.c:5868 [inline]
> lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5825
> lock_sock_nested+0x41/0xf0 net/core/sock.c:3720
> lock_sock include/net/sock.h:1679 [inline]
> vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
> virtio_transport_close net/vmw_vsock/virtio_transport_common.c:1271 [inline]
> virtio_transport_release+0x52a/0x640 net/vmw_vsock/virtio_transport_common.c:1291
> vsock_assign_transport+0x320/0x900 net/vmw_vsock/af_vsock.c:502
> vsock_connect+0x201/0xee0 net/vmw_vsock/af_vsock.c:1578
> __sys_connect_file+0x141/0x1a0 net/socket.c:2102
> __sys_connect+0x13b/0x160 net/socket.c:2121
> __do_sys_connect net/socket.c:2127 [inline]
> __se_sys_connect net/socket.c:2124 [inline]
> __x64_sys_connect+0x72/0xb0 net/socket.c:2124
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
>other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(vsock_register_mutex);
>                                lock(sk_lock-AF_VSOCK);
>                                lock(vsock_register_mutex);
>   lock(sk_lock-AF_VSOCK);
>
> *** DEADLOCK ***
>
>1 lock held by syz.0.17/6098:
> #0: ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469
>
>stack backtrace:
>CPU: 3 UID: 0 PID: 6098 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
>Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
>Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:94 [inline]
> dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
> print_circular_bug+0x275/0x350 kernel/locking/lockdep.c:2043
> check_noncircular+0x14c/0x170 kernel/locking/lockdep.c:2175
> check_prev_add kernel/locking/lockdep.c:3165 [inline]
> check_prevs_add kernel/locking/lockdep.c:3284 [inline]
> validate_chain kernel/locking/lockdep.c:3908 [inline]
> __lock_acquire+0x126f/0x1c90 kernel/locking/lockdep.c:5237
> lock_acquire kernel/locking/lockdep.c:5868 [inline]
> lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5825
> lock_sock_nested+0x41/0xf0 net/core/sock.c:3720
> lock_sock include/net/sock.h:1679 [inline]
> vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
> virtio_transport_close net/vmw_vsock/virtio_transport_common.c:1271 [inline]
> virtio_transport_release+0x52a/0x640 net/vmw_vsock/virtio_transport_common.c:1291
> vsock_assign_transport+0x320/0x900 net/vmw_vsock/af_vsock.c:502
> vsock_connect+0x201/0xee0 net/vmw_vsock/af_vsock.c:1578
> __sys_connect_file+0x141/0x1a0 net/socket.c:2102
> __sys_connect+0x13b/0x160 net/socket.c:2121
> __do_sys_connect net/socket.c:2127 [inline]
> __se_sys_connect net/socket.c:2124 [inline]
> __x64_sys_connect+0x72/0xb0 net/socket.c:2124
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>RIP: 0033:0x7f767bf8efc9
>Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
>RSP: 002b:00007fff0a2857b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
>RAX: ffffffffffffffda RBX: 00007f767c1e5fa0 RCX: 00007f767bf8efc9
>RDX: 0000000000000010 RSI: 0000200000000000 RDI: 0000000000000004
>RBP: 00007f767c011f91 R08: 0000000000000000 R09: 0000000000000000
>R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>R13: 00007f767c1e5fa0 R14: 00007f767c1e5fa0 R15: 0000000000000003
> </TASK>
>
>
>---
>This report is generated by a bot. It may contain errors.
>See https://goo.gl/tpsmEJ for more information about syzbot.
>syzbot engineers can be reached at syzkaller@googlegroups.com.
>
>syzbot will keep track of this issue. See:
>https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
>If the report is already addressed, let syzbot know by replying with:
>#syz fix: exact-commit-title
>
>If you want syzbot to run the reproducer, reply with:
>#syz test: git://repo/address.git branch-or-commit-hash
>If you attach or paste a git patch, syzbot will apply it before testing.
>
>If you want to overwrite report's subsystems, reply with:
>#syz set subsystems: new-subsystem
>(See the list of subsystem names on the web dashboard)
>
>If the report is a duplicate of another one, reply with:
>#syz dup: exact-subject-of-another-report
>
>If you want to undo deduplication, reply with:
>#syz undup
>
* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
2025-10-21 0:02 [syzbot] [virt?] [net?] possible deadlock in vsock_linger syzbot
2025-10-21 8:27 ` Stefano Garzarella
@ 2025-10-21 10:09 ` Stefano Garzarella
2025-10-21 10:11 ` syzbot
2025-10-21 10:16 ` Stefano Garzarella
2025-10-21 10:59 ` Stefano Garzarella
3 siblings, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2025-10-21 10:09 UTC (permalink / raw)
To: syzbot
Cc: davem, edumazet, horms, kuba, linux-kernel, netdev, pabeni,
syzkaller-bugs, virtualization, Michal Luczaj
On Mon, Oct 20, 2025 at 05:02:56PM -0700, syzbot wrote:
>Hello,
>
>syzbot found the following issue on:
>
>HEAD commit: d9043c79ba68 Merge tag 'sched_urgent_for_v6.18_rc2' of git..
>git tree: upstream
>console output: https://syzkaller.appspot.com/x/log.txt?x=130983cd980000
>kernel config: https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
>dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
>compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17f0f52f980000
>C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11ea9734580000
#syz test
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -565,6 +565,11 @@ static u32 vsock_registered_transport_cid(const struct vsock_transport **transpo
 	return cid;
 }
 
+/* vsock_find_cid() must be called outside lock_sock/release_sock
+ * section to avoid a potential lock inversion deadlock with
+ * vsock_assign_transport() where `vsock_register_mutex` is taken when
+ * `sk_lock-AF_VSOCK` is already held.
+ */
 bool vsock_find_cid(unsigned int cid)
 {
 	if (cid == vsock_registered_transport_cid(&transport_g2h))
@@ -735,23 +740,14 @@ static int __vsock_bind_dgram(struct vsock_sock *vsk,
 	return vsk->transport->dgram_bind(vsk, addr);
 }
 
+/* The caller must ensure the socket is not already bound and provide a valid
+ * `addr` to bind (VMADDR_CID_ANY, or a CID assigned to a transport).
+ */
 static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
 	int retval;
 
-	/* First ensure this socket isn't already bound. */
-	if (vsock_addr_bound(&vsk->local_addr))
-		return -EINVAL;
-
-	/* Now bind to the provided address or select appropriate values if
-	 * none are provided (VMADDR_CID_ANY and VMADDR_PORT_ANY).  Note that
-	 * like AF_INET prevents binding to a non-local IP address (in most
-	 * cases), we only allow binding to a local CID.
-	 */
-	if (addr->svm_cid != VMADDR_CID_ANY && !vsock_find_cid(addr->svm_cid))
-		return -EADDRNOTAVAIL;
-
 	switch (sk->sk_socket->type) {
 	case SOCK_STREAM:
 	case SOCK_SEQPACKET:
@@ -991,15 +987,33 @@ vsock_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 {
 	int err;
 	struct sock *sk;
+	struct vsock_sock *vsk;
 	struct sockaddr_vm *vm_addr;
 
 	sk = sock->sk;
+	vsk = vsock_sk(sk);
 
 	if (vsock_addr_cast(addr, addr_len, &vm_addr) != 0)
 		return -EINVAL;
 
+	/* Like AF_INET prevents binding to a non-local IP address (in most
+	 * cases), we only allow binding to a local CID.
+	 */
+	if (vm_addr->svm_cid != VMADDR_CID_ANY &&
+	    !vsock_find_cid(vm_addr->svm_cid))
+		return -EADDRNOTAVAIL;
+
 	lock_sock(sk);
+
+	/* Ensure this socket isn't already bound. */
+	if (vsock_addr_bound(&vsk->local_addr)) {
+		err = -EINVAL;
+		goto out;
+	}
+
 	err = __vsock_bind(sk, vm_addr);
+
+out:
 	release_sock(sk);
 
 	return err;
* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
2025-10-21 10:09 ` Stefano Garzarella
@ 2025-10-21 10:11 ` syzbot
0 siblings, 0 replies; 13+ messages in thread
From: syzbot @ 2025-10-21 10:11 UTC (permalink / raw)
To: davem, edumazet, horms, kuba, linux-kernel, mhal, netdev, pabeni,
sgarzare, syzkaller-bugs, virtualization
Hello,
syzbot tried to test the proposed patch but the build/boot failed:
failed to apply patch:
checking file net/vmw_vsock/af_vsock.c
patch: **** unexpected end of file in patch
Tested on:
commit: 6548d364 Merge tag 'cgroup-for-6.18-rc2-fixes' of git:..
git tree: upstream
kernel config: https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
compiler:
patch: https://syzkaller.appspot.com/x/patch.diff?x=17204e7c580000
* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
2025-10-21 0:02 [syzbot] [virt?] [net?] possible deadlock in vsock_linger syzbot
2025-10-21 8:27 ` Stefano Garzarella
2025-10-21 10:09 ` Stefano Garzarella
@ 2025-10-21 10:16 ` Stefano Garzarella
2025-10-21 10:30 ` syzbot
2025-10-21 10:59 ` Stefano Garzarella
3 siblings, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2025-10-21 10:16 UTC (permalink / raw)
To: syzbot
Cc: davem, edumazet, horms, kuba, linux-kernel, netdev, pabeni,
syzkaller-bugs, virtualization
[-- Attachment #1: Type: text/plain, Size: 694 bytes --]
On Mon, Oct 20, 2025 at 05:02:56PM -0700, syzbot wrote:
>Hello,
>
>syzbot found the following issue on:
>
>HEAD commit: d9043c79ba68 Merge tag 'sched_urgent_for_v6.18_rc2' of git..
>git tree: upstream
>console output: https://syzkaller.appspot.com/x/log.txt?x=130983cd980000
>kernel config: https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
>dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
>compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17f0f52f980000
>C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11ea9734580000
#syz test
[-- Attachment #2: 0001-TODO.patch --]
[-- Type: text/plain, Size: 2743 bytes --]
From c32c21ea301aadc56160a57ddcd99f836a49f028 Mon Sep 17 00:00:00 2001
From: Stefano Garzarella <sgarzare@redhat.com>
Date: Tue, 21 Oct 2025 12:12:24 +0200
Subject: [PATCH] TODO
From: Stefano Garzarella <sgarzare@redhat.com>
---
net/vmw_vsock/af_vsock.c | 38 ++++++++++++++++++++++++++------------
1 file changed, 26 insertions(+), 12 deletions(-)
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 4c2db6cca557..5434fe6a1d6b 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -565,6 +565,11 @@ static u32 vsock_registered_transport_cid(const struct vsock_transport **transpo
 	return cid;
 }
 
+/* vsock_find_cid() must be called outside lock_sock/release_sock
+ * section to avoid a potential lock inversion deadlock with
+ * vsock_assign_transport() where `vsock_register_mutex` is taken when
+ * `sk_lock-AF_VSOCK` is already held.
+ */
 bool vsock_find_cid(unsigned int cid)
 {
 	if (cid == vsock_registered_transport_cid(&transport_g2h))
@@ -735,23 +740,14 @@ static int __vsock_bind_dgram(struct vsock_sock *vsk,
 	return vsk->transport->dgram_bind(vsk, addr);
 }
 
+/* The caller must ensure the socket is not already bound and provide a valid
+ * `addr` to bind (VMADDR_CID_ANY, or a CID assigned to a transport).
+ */
 static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
 	int retval;
 
-	/* First ensure this socket isn't already bound. */
-	if (vsock_addr_bound(&vsk->local_addr))
-		return -EINVAL;
-
-	/* Now bind to the provided address or select appropriate values if
-	 * none are provided (VMADDR_CID_ANY and VMADDR_PORT_ANY).  Note that
-	 * like AF_INET prevents binding to a non-local IP address (in most
-	 * cases), we only allow binding to a local CID.
-	 */
-	if (addr->svm_cid != VMADDR_CID_ANY && !vsock_find_cid(addr->svm_cid))
-		return -EADDRNOTAVAIL;
-
 	switch (sk->sk_socket->type) {
 	case SOCK_STREAM:
 	case SOCK_SEQPACKET:
@@ -991,15 +987,33 @@ vsock_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 {
 	int err;
 	struct sock *sk;
+	struct vsock_sock *vsk;
 	struct sockaddr_vm *vm_addr;
 
 	sk = sock->sk;
+	vsk = vsock_sk(sk);
 
 	if (vsock_addr_cast(addr, addr_len, &vm_addr) != 0)
 		return -EINVAL;
 
+	/* Like AF_INET prevents binding to a non-local IP address (in most
+	 * cases), we only allow binding to a local CID.
+	 */
+	if (vm_addr->svm_cid != VMADDR_CID_ANY &&
+	    !vsock_find_cid(vm_addr->svm_cid))
+		return -EADDRNOTAVAIL;
+
 	lock_sock(sk);
+
+	/* Ensure this socket isn't already bound. */
+	if (vsock_addr_bound(&vsk->local_addr)) {
+		err = -EINVAL;
+		goto out;
+	}
+
 	err = __vsock_bind(sk, vm_addr);
+
+out:
 	release_sock(sk);
 
 	return err;
--
2.51.0
* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
2025-10-21 10:16 ` Stefano Garzarella
@ 2025-10-21 10:30 ` syzbot
0 siblings, 0 replies; 13+ messages in thread
From: syzbot @ 2025-10-21 10:30 UTC (permalink / raw)
To: davem, edumazet, horms, kuba, linux-kernel, netdev, pabeni,
sgarzare, syzkaller-bugs, virtualization
Hello,
syzbot has tested the proposed patch but the reproducer is still triggering an issue:
possible deadlock in vsock_linger
======================================================
WARNING: possible circular locking dependency detected
syzkaller #0 Not tainted
------------------------------------------------------
syz.0.17/6384 is trying to acquire lock:
ffff888055028b18 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1679 [inline]
ffff888055028b18 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1080
but task is already holding lock:
ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (vsock_register_mutex){+.+.}-{4:4}:
__mutex_lock_common kernel/locking/mutex.c:598 [inline]
__mutex_lock+0x193/0x1060 kernel/locking/mutex.c:760
vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469
vsock_connect+0x201/0xee0 net/vmw_vsock/af_vsock.c:1592
__sys_connect_file+0x141/0x1a0 net/socket.c:2102
__sys_connect+0x13b/0x160 net/socket.c:2121
__do_sys_connect net/socket.c:2127 [inline]
__se_sys_connect net/socket.c:2124 [inline]
__x64_sys_connect+0x72/0xb0 net/socket.c:2124
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #0 (sk_lock-AF_VSOCK){+.+.}-{0:0}:
check_prev_add kernel/locking/lockdep.c:3165 [inline]
check_prevs_add kernel/locking/lockdep.c:3284 [inline]
validate_chain kernel/locking/lockdep.c:3908 [inline]
__lock_acquire+0x126f/0x1c90 kernel/locking/lockdep.c:5237
lock_acquire kernel/locking/lockdep.c:5868 [inline]
lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5825
lock_sock_nested+0x41/0xf0 net/core/sock.c:3720
lock_sock include/net/sock.h:1679 [inline]
vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1080
virtio_transport_close net/vmw_vsock/virtio_transport_common.c:1271 [inline]
virtio_transport_release+0x52a/0x640 net/vmw_vsock/virtio_transport_common.c:1291
vsock_assign_transport+0x320/0x900 net/vmw_vsock/af_vsock.c:502
vsock_connect+0x201/0xee0 net/vmw_vsock/af_vsock.c:1592
__sys_connect_file+0x141/0x1a0 net/socket.c:2102
__sys_connect+0x13b/0x160 net/socket.c:2121
__do_sys_connect net/socket.c:2127 [inline]
__se_sys_connect net/socket.c:2124 [inline]
__x64_sys_connect+0x72/0xb0 net/socket.c:2124
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
other info that might help us debug this:
Possible unsafe locking scenario:
        CPU0                    CPU1
        ----                    ----
   lock(vsock_register_mutex);
                                lock(sk_lock-AF_VSOCK);
                                lock(vsock_register_mutex);
   lock(sk_lock-AF_VSOCK);
*** DEADLOCK ***
1 lock held by syz.0.17/6384:
#0: ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469
stack backtrace:
CPU: 1 UID: 0 PID: 6384 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
print_circular_bug+0x275/0x350 kernel/locking/lockdep.c:2043
check_noncircular+0x14c/0x170 kernel/locking/lockdep.c:2175
check_prev_add kernel/locking/lockdep.c:3165 [inline]
check_prevs_add kernel/locking/lockdep.c:3284 [inline]
validate_chain kernel/locking/lockdep.c:3908 [inline]
__lock_acquire+0x126f/0x1c90 kernel/locking/lockdep.c:5237
lock_acquire kernel/locking/lockdep.c:5868 [inline]
lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5825
lock_sock_nested+0x41/0xf0 net/core/sock.c:3720
lock_sock include/net/sock.h:1679 [inline]
vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1080
virtio_transport_close net/vmw_vsock/virtio_transport_common.c:1271 [inline]
virtio_transport_release+0x52a/0x640 net/vmw_vsock/virtio_transport_common.c:1291
vsock_assign_transport+0x320/0x900 net/vmw_vsock/af_vsock.c:502
vsock_connect+0x201/0xee0 net/vmw_vsock/af_vsock.c:1592
__sys_connect_file+0x141/0x1a0 net/socket.c:2102
__sys_connect+0x13b/0x160 net/socket.c:2121
__do_sys_connect net/socket.c:2127 [inline]
__se_sys_connect net/socket.c:2124 [inline]
__x64_sys_connect+0x72/0xb0 net/socket.c:2124
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f300598efc9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f3006912038 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 00007f3005be5fa0 RCX: 00007f300598efc9
RDX: 0000000000000010 RSI: 0000200000000000 RDI: 0000000000000004
RBP: 00007f3005a11f91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f3005be6038 R14: 00007f3005be5fa0 R15: 00007ffdba0a0048
</TASK>
Tested on:
commit: 6548d364 Merge tag 'cgroup-for-6.18-rc2-fixes' of git:..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=162a5492580000
kernel config: https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=17a04e7c580000
* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
2025-10-21 8:27 ` Stefano Garzarella
@ 2025-10-21 10:48 ` Stefano Garzarella
2025-10-21 12:19 ` Stefano Garzarella
0 siblings, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2025-10-21 10:48 UTC (permalink / raw)
To: syzbot, Michal Luczaj
Cc: davem, edumazet, horms, kuba, linux-kernel, netdev, pabeni,
syzkaller-bugs, virtualization
On Tue, 21 Oct 2025 at 10:27, Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> Hi Michal,
>
> On Mon, Oct 20, 2025 at 05:02:56PM -0700, syzbot wrote:
> >Hello,
> >
> >syzbot found the following issue on:
> >
> >HEAD commit: d9043c79ba68 Merge tag 'sched_urgent_for_v6.18_rc2' of git..
> >git tree: upstream
> >console output: https://syzkaller.appspot.com/x/log.txt?x=130983cd980000
> >kernel config: https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
> >dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
> >compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> >syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17f0f52f980000
> >C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11ea9734580000
> >
> >Downloadable assets:
> >disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-d9043c79.raw.xz
> >vmlinux: https://storage.googleapis.com/syzbot-assets/0546b6eaf1aa/vmlinux-d9043c79.xz
> >kernel image: https://storage.googleapis.com/syzbot-assets/81285b4ada51/bzImage-d9043c79.xz
> >
> >IMPORTANT: if you fix the issue, please add the following tag to the commit:
> >Reported-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com
> >
> >======================================================
> >WARNING: possible circular locking dependency detected
> >syzkaller #0 Not tainted
> >------------------------------------------------------
> >syz.0.17/6098 is trying to acquire lock:
> >ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1679 [inline]
> >ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
>
> Could this be related to our recent work on linger in vsock?
>
> >
> >but task is already holding lock:
> >ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469
> >
> >which lock already depends on the new lock.
> >
> >
> >the existing dependency chain (in reverse order) is:
> >
> >-> #1 (vsock_register_mutex){+.+.}-{4:4}:
> > __mutex_lock_common kernel/locking/mutex.c:598 [inline]
> > __mutex_lock+0x193/0x1060 kernel/locking/mutex.c:760
> > vsock_registered_transport_cid net/vmw_vsock/af_vsock.c:560 [inline]
>
> Ah, no maybe this is related to commit 209fd720838a ("vsock:
> Fix transport_{g2h,h2g} TOCTOU") where we added locking in
> vsock_find_cid().
>
> Maybe we can just move the checks on top of __vsock_bind() to the
> caller. I mean:
>
> /* First ensure this socket isn't already bound. */
> if (vsock_addr_bound(&vsk->local_addr))
> return -EINVAL;
>
> /* Now bind to the provided address or select appropriate values if
> * none are provided (VMADDR_CID_ANY and VMADDR_PORT_ANY). Note that
> * like AF_INET prevents binding to a non-local IP address (in most
> * cases), we only allow binding to a local CID.
> */
> if (addr->svm_cid != VMADDR_CID_ANY && !vsock_find_cid(addr->svm_cid))
> return -EADDRNOTAVAIL;
>
> We have 2 callers: vsock_auto_bind() and vsock_bind().
>
> vsock_auto_bind() is already checking if the socket is already bound,
> if not is setting VMADDR_CID_ANY, so we can skip those checks.
>
> In vsock_bind() we can do the checks before lock_sock(sk), at least the
> checks on vm_addr, calling vsock_find_cid().
>
> I'm preparing a patch to do this.
Hmm, no, this is more related to vsock_linger(), where sk_wait_event()
releases and then re-acquires the sk_lock.
So, it should be related to commit 687aa0c5581b ("vsock: Fix
transport_* TOCTOU"), where we take vsock_register_mutex in
vsock_assign_transport() and hold it while calling
vsk->transport->release().
So, maybe we need to move the release and vsock_deassign_transport()
after unlocking vsock_register_mutex.
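To make the ordering problem concrete, here is a rough userspace sketch of the dependency tracking that lockdep does, under the assumption that only two locks matter. All the model_* names are hypothetical and exist only for this illustration; they are not kernel APIs. The linger step stands in for sk_wait_event() dropping and re-taking the sock lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the two locks in the report. */
enum model_lock { SK_LOCK, REGISTER_MUTEX, MODEL_NLOCKS };

static bool held[MODEL_NLOCKS];
/* edge[a][b] is set once lock b has been taken while lock a was held. */
static bool edge[MODEL_NLOCKS][MODEL_NLOCKS];
static bool inversion_seen;

static void model_reset(void)
{
	memset(held, 0, sizeof(held));
	memset(edge, 0, sizeof(edge));
	inversion_seen = false;
}

static void model_acquire(enum model_lock l)
{
	assert(!held[l]);
	for (int h = 0; h < MODEL_NLOCKS; h++) {
		if (!held[h])
			continue;
		if (edge[l][h])		/* the opposite order was recorded earlier */
			inversion_seen = true;
		edge[h][l] = true;
	}
	held[l] = true;
}

static void model_release(enum model_lock l)
{
	assert(held[l]);
	held[l] = false;
}

/* bind() path from the report: sock lock first, then the register mutex. */
static void model_bind(void)
{
	model_acquire(SK_LOCK);
	model_acquire(REGISTER_MUTEX);
	model_release(REGISTER_MUTEX);
	model_release(SK_LOCK);
}

/* Lingering drops the sock lock and takes it again. */
static void model_linger(void)
{
	model_release(SK_LOCK);
	model_acquire(SK_LOCK);
}

/* connect() path; the flag selects the current or the proposed ordering. */
static void model_connect(bool release_under_mutex)
{
	model_acquire(SK_LOCK);
	model_acquire(REGISTER_MUTEX);
	if (release_under_mutex)
		model_linger();		/* re-takes SK_LOCK with the mutex held */
	model_release(REGISTER_MUTEX);
	if (!release_under_mutex)
		model_linger();		/* mutex already dropped, no new edge */
	model_release(SK_LOCK);
}
```

In this model, model_connect(true) sets inversion_seen, while model_connect(false) leaves it clear, which is the effect expected from moving the release after the unlock.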
Stefano
>
> Stefano
>
>
> > vsock_find_cid net/vmw_vsock/af_vsock.c:570 [inline]
> > __vsock_bind+0x1b5/0xa10 net/vmw_vsock/af_vsock.c:752
> > vsock_bind+0xc6/0x120 net/vmw_vsock/af_vsock.c:1002
> > __sys_bind_socket net/socket.c:1874 [inline]
> > __sys_bind_socket net/socket.c:1866 [inline]
> > __sys_bind+0x1a7/0x260 net/socket.c:1905
> > __do_sys_bind net/socket.c:1910 [inline]
> > __se_sys_bind net/socket.c:1908 [inline]
> > __x64_sys_bind+0x72/0xb0 net/socket.c:1908
> > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >
> >-> #0 (sk_lock-AF_VSOCK){+.+.}-{0:0}:
> > check_prev_add kernel/locking/lockdep.c:3165 [inline]
> > check_prevs_add kernel/locking/lockdep.c:3284 [inline]
> > validate_chain kernel/locking/lockdep.c:3908 [inline]
> > __lock_acquire+0x126f/0x1c90 kernel/locking/lockdep.c:5237
> > lock_acquire kernel/locking/lockdep.c:5868 [inline]
> > lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5825
> > lock_sock_nested+0x41/0xf0 net/core/sock.c:3720
> > lock_sock include/net/sock.h:1679 [inline]
> > vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
> > virtio_transport_close net/vmw_vsock/virtio_transport_common.c:1271 [inline]
> > virtio_transport_release+0x52a/0x640 net/vmw_vsock/virtio_transport_common.c:1291
> > vsock_assign_transport+0x320/0x900 net/vmw_vsock/af_vsock.c:502
> > vsock_connect+0x201/0xee0 net/vmw_vsock/af_vsock.c:1578
> > __sys_connect_file+0x141/0x1a0 net/socket.c:2102
> > __sys_connect+0x13b/0x160 net/socket.c:2121
> > __do_sys_connect net/socket.c:2127 [inline]
> > __se_sys_connect net/socket.c:2124 [inline]
> > __x64_sys_connect+0x72/0xb0 net/socket.c:2124
> > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >
> >other info that might help us debug this:
> >
> > Possible unsafe locking scenario:
> >
> > CPU0 CPU1
> > ---- ----
> > lock(vsock_register_mutex);
> > lock(sk_lock-AF_VSOCK);
> > lock(vsock_register_mutex);
> > lock(sk_lock-AF_VSOCK);
> >
> > *** DEADLOCK ***
> >
> >1 lock held by syz.0.17/6098:
> > #0: ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469
> >
> >stack backtrace:
> >CPU: 3 UID: 0 PID: 6098 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
> >Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> >Call Trace:
> > <TASK>
> > __dump_stack lib/dump_stack.c:94 [inline]
> > dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
> > print_circular_bug+0x275/0x350 kernel/locking/lockdep.c:2043
> > check_noncircular+0x14c/0x170 kernel/locking/lockdep.c:2175
> > check_prev_add kernel/locking/lockdep.c:3165 [inline]
> > check_prevs_add kernel/locking/lockdep.c:3284 [inline]
> > validate_chain kernel/locking/lockdep.c:3908 [inline]
> > __lock_acquire+0x126f/0x1c90 kernel/locking/lockdep.c:5237
> > lock_acquire kernel/locking/lockdep.c:5868 [inline]
> > lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5825
> > lock_sock_nested+0x41/0xf0 net/core/sock.c:3720
> > lock_sock include/net/sock.h:1679 [inline]
> > vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
> > virtio_transport_close net/vmw_vsock/virtio_transport_common.c:1271 [inline]
> > virtio_transport_release+0x52a/0x640 net/vmw_vsock/virtio_transport_common.c:1291
> > vsock_assign_transport+0x320/0x900 net/vmw_vsock/af_vsock.c:502
> > vsock_connect+0x201/0xee0 net/vmw_vsock/af_vsock.c:1578
> > __sys_connect_file+0x141/0x1a0 net/socket.c:2102
> > __sys_connect+0x13b/0x160 net/socket.c:2121
> > __do_sys_connect net/socket.c:2127 [inline]
> > __se_sys_connect net/socket.c:2124 [inline]
> > __x64_sys_connect+0x72/0xb0 net/socket.c:2124
> > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >RIP: 0033:0x7f767bf8efc9
> >Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
> >RSP: 002b:00007fff0a2857b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
> >RAX: ffffffffffffffda RBX: 00007f767c1e5fa0 RCX: 00007f767bf8efc9
> >RDX: 0000000000000010 RSI: 0000200000000000 RDI: 0000000000000004
> >RBP: 00007f767c011f91 R08: 0000000000000000 R09: 0000000000000000
> >R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> >R13: 00007f767c1e5fa0 R14: 00007f767c1e5fa0 R15: 0000000000000003
> > </TASK>
> >
> >
> >---
> >This report is generated by a bot. It may contain errors.
> >See https://goo.gl/tpsmEJ for more information about syzbot.
> >syzbot engineers can be reached at syzkaller@googlegroups.com.
> >
> >syzbot will keep track of this issue. See:
> >https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> >
> >If the report is already addressed, let syzbot know by replying with:
> >#syz fix: exact-commit-title
> >
> >If you want syzbot to run the reproducer, reply with:
> >#syz test: git://repo/address.git branch-or-commit-hash
> >If you attach or paste a git patch, syzbot will apply it before testing.
> >
> >If you want to overwrite report's subsystems, reply with:
> >#syz set subsystems: new-subsystem
> >(See the list of subsystem names on the web dashboard)
> >
> >If the report is a duplicate of another one, reply with:
> >#syz dup: exact-subject-of-another-report
> >
> >If you want to undo deduplication, reply with:
> >#syz undup
> >
* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
2025-10-21 0:02 [syzbot] [virt?] [net?] possible deadlock in vsock_linger syzbot
` (2 preceding siblings ...)
2025-10-21 10:16 ` Stefano Garzarella
@ 2025-10-21 10:59 ` Stefano Garzarella
2025-10-21 11:19 ` syzbot
3 siblings, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2025-10-21 10:59 UTC (permalink / raw)
To: syzbot
Cc: davem, edumazet, horms, kuba, linux-kernel, netdev, pabeni,
syzkaller-bugs, virtualization
[-- Attachment #1: Type: text/plain, Size: 694 bytes --]
On Mon, Oct 20, 2025 at 05:02:56PM -0700, syzbot wrote:
>Hello,
>
>syzbot found the following issue on:
>
>HEAD commit: d9043c79ba68 Merge tag 'sched_urgent_for_v6.18_rc2' of git..
>git tree: upstream
>console output: https://syzkaller.appspot.com/x/log.txt?x=130983cd980000
>kernel config: https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
>dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
>compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17f0f52f980000
>C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11ea9734580000
#syz test
[-- Attachment #2: 0001-TODO.patch --]
[-- Type: text/plain, Size: 2286 bytes --]
From 456534cbdbc7312fa1cddfb7aa50b771725c0a53 Mon Sep 17 00:00:00 2001
From: Stefano Garzarella <sgarzare@redhat.com>
Date: Tue, 21 Oct 2025 12:51:45 +0200
Subject: [PATCH] TODO
From: Stefano Garzarella <sgarzare@redhat.com>
---
net/vmw_vsock/af_vsock.c | 38 +++++++++++++++++++-------------------
1 file changed, 19 insertions(+), 19 deletions(-)
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 4c2db6cca557..89b4dbb859a5 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -487,12 +487,26 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 		goto err;
 	}
 
-	if (vsk->transport) {
-		if (vsk->transport == new_transport) {
-			ret = 0;
-			goto err;
-		}
+	if (vsk->transport == new_transport) {
+		ret = 0;
+		goto err;
+	}
 
+	/* We increase the module refcnt to prevent the transport unloading
+	 * while there are open sockets assigned to it.
+	 */
+	if (!new_transport || !try_module_get(new_transport->module)) {
+		ret = -ENODEV;
+		goto err;
+	}
+
+	/* It's safe to release the mutex after a successful try_module_get().
+	 * Whichever transport `new_transport` points at, it won't go away until
+	 * the last module_put() below or in vsock_deassign_transport().
+	 */
+	mutex_unlock(&vsock_register_mutex);
+
+	if (vsk->transport) {
 		/* transport->release() must be called with sock lock acquired.
 		 * This path can only be taken during vsock_connect(), where we
 		 * have already held the sock lock. In the other cases, this
@@ -512,20 +526,6 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 		vsk->peer_shutdown = 0;
 	}
 
-	/* We increase the module refcnt to prevent the transport unloading
-	 * while there are open sockets assigned to it.
-	 */
-	if (!new_transport || !try_module_get(new_transport->module)) {
-		ret = -ENODEV;
-		goto err;
-	}
-
-	/* It's safe to release the mutex after a successful try_module_get().
-	 * Whichever transport `new_transport` points at, it won't go away until
-	 * the last module_put() below or in vsock_deassign_transport().
-	 */
-	mutex_unlock(&vsock_register_mutex);
-
 	if (sk->sk_type == SOCK_SEQPACKET) {
 		if (!new_transport->seqpacket_allow ||
 		    !new_transport->seqpacket_allow(remote_cid)) {
--
2.51.0
* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
2025-10-21 10:59 ` Stefano Garzarella
@ 2025-10-21 11:19 ` syzbot
0 siblings, 0 replies; 13+ messages in thread
From: syzbot @ 2025-10-21 11:19 UTC (permalink / raw)
To: davem, edumazet, horms, kuba, linux-kernel, netdev, pabeni,
sgarzare, syzkaller-bugs, virtualization
Hello,
syzbot has tested the proposed patch and the reproducer did not trigger any issue:
Reported-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com
Tested-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com
Tested on:
commit: 6548d364 Merge tag 'cgroup-for-6.18-rc2-fixes' of git:..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1534c3cd980000
kernel config: https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=11a90d2f980000
Note: testing is done by a robot and is best-effort only.
* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
2025-10-21 10:48 ` Stefano Garzarella
@ 2025-10-21 12:19 ` Stefano Garzarella
2025-11-15 16:00 ` Michal Luczaj
0 siblings, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2025-10-21 12:19 UTC (permalink / raw)
To: syzbot, Michal Luczaj
Cc: davem, edumazet, horms, kuba, linux-kernel, netdev, pabeni,
syzkaller-bugs, virtualization
On Tue, 21 Oct 2025 at 12:48, Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Tue, 21 Oct 2025 at 10:27, Stefano Garzarella <sgarzare@redhat.com> wrote:
> >
> > Hi Michal,
> >
> > On Mon, Oct 20, 2025 at 05:02:56PM -0700, syzbot wrote:
> > >Hello,
> > >
> > >syzbot found the following issue on:
> > >
> > >HEAD commit: d9043c79ba68 Merge tag 'sched_urgent_for_v6.18_rc2' of git..
> > >git tree: upstream
> > >console output: https://syzkaller.appspot.com/x/log.txt?x=130983cd980000
> > >kernel config: https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
> > >dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
> > >compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> > >syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17f0f52f980000
> > >C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11ea9734580000
> > >
> > >Downloadable assets:
> > >disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-d9043c79.raw.xz
> > >vmlinux: https://storage.googleapis.com/syzbot-assets/0546b6eaf1aa/vmlinux-d9043c79.xz
> > >kernel image: https://storage.googleapis.com/syzbot-assets/81285b4ada51/bzImage-d9043c79.xz
> > >
> > >IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > >Reported-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com
> > >
> > >======================================================
> > >WARNING: possible circular locking dependency detected
> > >syzkaller #0 Not tainted
> > >------------------------------------------------------
> > >syz.0.17/6098 is trying to acquire lock:
> > >ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1679 [inline]
> > >ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
> >
> > Could this be related to our recent work on linger in vsock?
> >
> > >
> > >but task is already holding lock:
> > >ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469
> > >
> > >which lock already depends on the new lock.
> > >
> > >
> > >the existing dependency chain (in reverse order) is:
> > >
> > >-> #1 (vsock_register_mutex){+.+.}-{4:4}:
> > > __mutex_lock_common kernel/locking/mutex.c:598 [inline]
> > > __mutex_lock+0x193/0x1060 kernel/locking/mutex.c:760
> > > vsock_registered_transport_cid net/vmw_vsock/af_vsock.c:560 [inline]
> >
> > Ah, no maybe this is related to commit 209fd720838a ("vsock:
> > Fix transport_{g2h,h2g} TOCTOU") where we added locking in
> > vsock_find_cid().
> >
> > Maybe we can just move the checks on top of __vsock_bind() to the
> > caller. I mean:
> >
> > /* First ensure this socket isn't already bound. */
> > if (vsock_addr_bound(&vsk->local_addr))
> > return -EINVAL;
> >
> > /* Now bind to the provided address or select appropriate values if
> > * none are provided (VMADDR_CID_ANY and VMADDR_PORT_ANY). Note that
> > * like AF_INET prevents binding to a non-local IP address (in most
> > * cases), we only allow binding to a local CID.
> > */
> > if (addr->svm_cid != VMADDR_CID_ANY && !vsock_find_cid(addr->svm_cid))
> > return -EADDRNOTAVAIL;
> >
> > We have 2 callers: vsock_auto_bind() and vsock_bind().
> >
> > vsock_auto_bind() is already checking if the socket is already bound,
> > if not is setting VMADDR_CID_ANY, so we can skip those checks.
> >
> > In vsock_bind() we can do the checks before lock_sock(sk), at least the
> > checks on vm_addr, calling vsock_find_cid().
> >
> > I'm preparing a patch to do this.
>
> mmm, no, this is more related to vsock_linger() where sk_wait_event()
> releases and locks again the sk_lock.
> So, it should be related to commit 687aa0c5581b ("vsock: Fix
> transport_* TOCTOU") where we take vsock_register_mutex in
> vsock_assign_transport() while calling vsk->transport->release().
>
> So, maybe we need to move the release and vsock_deassign_transport()
> after unlocking vsock_register_mutex.
I implemented this here:
https://lore.kernel.org/netdev/20251021121718.137668-1-sgarzare@redhat.com/
syzbot successfully tested it.
Stefano
* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
2025-10-21 12:19 ` Stefano Garzarella
@ 2025-11-15 16:00 ` Michal Luczaj
2025-11-17 9:57 ` Stefano Garzarella
0 siblings, 1 reply; 13+ messages in thread
From: Michal Luczaj @ 2025-11-15 16:00 UTC (permalink / raw)
To: Stefano Garzarella, syzbot
Cc: davem, edumazet, horms, kuba, linux-kernel, netdev, pabeni,
syzkaller-bugs, virtualization
On 10/21/25 14:19, Stefano Garzarella wrote:
> On Tue, 21 Oct 2025 at 12:48, Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> On Tue, 21 Oct 2025 at 10:27, Stefano Garzarella <sgarzare@redhat.com> wrote:
>>>
>>> Hi Michal,
>>>
>>> On Mon, Oct 20, 2025 at 05:02:56PM -0700, syzbot wrote:
>>>> Hello,
>>>>
>>>> syzbot found the following issue on:
>>>>
>>>> HEAD commit: d9043c79ba68 Merge tag 'sched_urgent_for_v6.18_rc2' of git..
>>>> git tree: upstream
>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=130983cd980000
>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
>>>> compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17f0f52f980000
>>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11ea9734580000
>>>>
>>>> Downloadable assets:
>>>> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-d9043c79.raw.xz
>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/0546b6eaf1aa/vmlinux-d9043c79.xz
>>>> kernel image: https://storage.googleapis.com/syzbot-assets/81285b4ada51/bzImage-d9043c79.xz
>>>>
>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>> Reported-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com
>>>>
>>>> ======================================================
>>>> WARNING: possible circular locking dependency detected
>>>> syzkaller #0 Not tainted
>>>> ------------------------------------------------------
>>>> syz.0.17/6098 is trying to acquire lock:
>>>> ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1679 [inline]
>>>> ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
>>>
>>> Could this be related to our recent work on linger in vsock?
>>>
>>>>
>>>> but task is already holding lock:
>>>> ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469
>>>>
>>>> which lock already depends on the new lock.
>>>>
>>>>
>>>> the existing dependency chain (in reverse order) is:
>>>>
>>>> -> #1 (vsock_register_mutex){+.+.}-{4:4}:
>>>> __mutex_lock_common kernel/locking/mutex.c:598 [inline]
>>>> __mutex_lock+0x193/0x1060 kernel/locking/mutex.c:760
>>>> vsock_registered_transport_cid net/vmw_vsock/af_vsock.c:560 [inline]
>>>
>>> Ah, no maybe this is related to commit 209fd720838a ("vsock:
>>> Fix transport_{g2h,h2g} TOCTOU") where we added locking in
>>> vsock_find_cid().
>>>
>>> Maybe we can just move the checks on top of __vsock_bind() to the
>>> caller. I mean:
>>>
>>> /* First ensure this socket isn't already bound. */
>>> if (vsock_addr_bound(&vsk->local_addr))
>>> return -EINVAL;
>>>
>>> /* Now bind to the provided address or select appropriate values if
>>> * none are provided (VMADDR_CID_ANY and VMADDR_PORT_ANY). Note that
>>> * like AF_INET prevents binding to a non-local IP address (in most
>>> * cases), we only allow binding to a local CID.
>>> */
>>> if (addr->svm_cid != VMADDR_CID_ANY && !vsock_find_cid(addr->svm_cid))
>>> return -EADDRNOTAVAIL;
>>>
>>> We have 2 callers: vsock_auto_bind() and vsock_bind().
>>>
>>> vsock_auto_bind() is already checking if the socket is already bound,
>>> if not is setting VMADDR_CID_ANY, so we can skip those checks.
>>>
>>> In vsock_bind() we can do the checks before lock_sock(sk), at least the
>>> checks on vm_addr, calling vsock_find_cid().
>>>
>>> I'm preparing a patch to do this.
>>
>> mmm, no, this is more related to vsock_linger() where sk_wait_event()
>> releases and locks again the sk_lock.
>> So, it should be related to commit 687aa0c5581b ("vsock: Fix
>> transport_* TOCTOU") where we take vsock_register_mutex in
>> vsock_assign_transport() while calling vsk->transport->release().
>>
>> So, maybe we need to move the release and vsock_deassign_transport()
>> after unlocking vsock_register_mutex.
>
> I implemented this here:
> https://lore.kernel.org/netdev/20251021121718.137668-1-sgarzare@redhat.com/
>
> syzbot successfully tested it.
>
> Stefano
Hi Stefano
Apologies for missing this, I was away for a couple of weeks.
Turns out it's vsock_connect()'s reset-on-signal that strikes again. While
you've fixed the lock order inversion (thank you), being able to reset an
established socket, combined with SO_LINGER's lock-release-lock dance,
still leads to crashes.
I think it goes like this: if the user hits connect() with a signal right
after the connection is established (which implies an assigned transport),
`sk_state` gets set to TCP_CLOSING and `state` to SS_UNCONNECTED.
SS_UNCONNECTED means connect() can be retried. If the re-connect() is for a
different CID, transport reassignment takes place. That involves
transport->release() of the old transport. Because
`sk_state == TCP_CLOSING`, vsock_linger() is called. Lingering temporarily
releases the socket lock, which opens a window for another thread doing
connect(). Basically thread-1 can release resources from under thread-0.
That breaks assumptions made elsewhere, e.g. virtio_transport_unsent_bytes()
does not expect a disappearing transport.
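The window can be modelled in userspace roughly as follows. This is only a single-threaded sketch under my own assumptions: the model_* names are made up, the `racer` callback stands in for whatever thread-1 does while the lock is dropped, and the generation counter exists purely so the model can report that the state changed. It is not something the kernel code has.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-socket transport state. */
struct model_vsock {
	void *trans;		/* plays the role of the transport's private data */
	int generation;		/* bumped whenever trans is torn down */
};

/* What a racing connect() does once it gets hold of the socket lock. */
static void model_racing_teardown(struct model_vsock *vsk)
{
	assert(vsk != NULL);
	free(vsk->trans);
	vsk->trans = NULL;
	vsk->generation++;
}

/*
 * Stand-in for the linger wait: the socket lock is dropped, `racer`
 * (if any) runs, and the lock is taken again. Returns true when the
 * state seen before the wait is still valid afterwards.
 */
static bool model_linger_wait(struct model_vsock *vsk,
			      void (*racer)(struct model_vsock *))
{
	int seen;

	assert(vsk != NULL);
	seen = vsk->generation;

	/* socket lock dropped here */
	if (racer)
		racer(vsk);
	/* socket lock re-taken here */

	return vsk->generation == seen && vsk->trans != NULL;
}
```

With no racer the wait returns true. With model_racing_teardown() as the racer it returns false, which corresponds to the point where the real code goes on to touch freed memory.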
BUG: KASAN: slab-use-after-free in _raw_spin_lock_bh+0x34/0x40
Read of size 1 at addr ffff888107c99420 by task a.out/1385
CPU: 6 UID: 1000 PID: 1385 Comm: a.out Tainted: G E 6.18.0-rc5+ #241 PREEMPT(voluntary)
Call Trace:
dump_stack_lvl+0x7e/0xc0
print_report+0x170/0x4de
kasan_report+0xc2/0x180
__kasan_check_byte+0x3a/0x50
lock_acquire+0xb2/0x300
_raw_spin_lock_bh+0x34/0x40
virtio_transport_unsent_bytes+0x3b/0x80
vsock_linger+0x263/0x370
virtio_transport_release+0x3ff/0x510
vsock_assign_transport+0x358/0x780
vsock_connect+0x5a2/0xc40
__sys_connect+0xde/0x110
__x64_sys_connect+0x6e/0xc0
do_syscall_64+0x94/0xbb0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Allocated by task 1384:
kasan_save_stack+0x1c/0x40
kasan_save_track+0x10/0x30
__kasan_kmalloc+0x92/0xa0
virtio_transport_do_socket_init+0x48/0x320
vsock_assign_transport+0x4ff/0x780
vsock_connect+0x5a2/0xc40
__sys_connect+0xde/0x110
__x64_sys_connect+0x6e/0xc0
do_syscall_64+0x94/0xbb0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Freed by task 1384:
kasan_save_stack+0x1c/0x40
kasan_save_track+0x10/0x30
__kasan_save_free_info+0x37/0x50
__kasan_slab_free+0x63/0x80
kfree+0x142/0x6a0
virtio_transport_destruct+0x86/0x170
vsock_assign_transport+0x3a8/0x780
vsock_connect+0x5a2/0xc40
__sys_connect+0xde/0x110
__x64_sys_connect+0x6e/0xc0
do_syscall_64+0x94/0xbb0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
I suppose there are many ways this chain of events can be stopped, but I
see it as yet another reason to simplify vsock_connect(): do not let it
"reset" an already established socket. I guess that would do the trick.
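As a very rough illustration of the idea (not a patch), the signal path could be modelled like this. Everything named model_* or MODEL_* is hypothetical, the states are simplified stand-ins for `sk_state` and `state`, and returning success for an already established socket is just one possible choice:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for sk_state and sock->state. */
enum model_sk_state { MODEL_CLOSE, MODEL_SYN_SENT, MODEL_ESTABLISHED, MODEL_CLOSING };
enum model_sock_state { MODEL_SS_UNCONNECTED, MODEL_SS_CONNECTING, MODEL_SS_CONNECTED };

struct model_sock {
	enum model_sk_state sk_state;
	enum model_sock_state state;
};

/*
 * Called when a signal interrupts the wait in connect().
 * With `guard` set, an established socket is left alone.
 */
static int model_connect_signal(struct model_sock *s, bool guard)
{
	assert(s != NULL);

	if (guard && s->sk_state == MODEL_ESTABLISHED) {
		/* The handshake already completed: do not reset the socket. */
		s->state = MODEL_SS_CONNECTED;
		return 0;
	}

	/* Behaviour as described above: the socket is reset. */
	s->sk_state = MODEL_CLOSING;
	s->state = MODEL_SS_UNCONNECTED;
	return -EINTR;
}

/* A retry of connect() is only possible from SS_UNCONNECTED. */
static bool model_can_reconnect(const struct model_sock *s)
{
	assert(s != NULL);
	return s->state == MODEL_SS_UNCONNECTED;
}
```

With the guard, an established socket never gets back to the unconnected state, so the re-connect() that triggers transport reassignment and lingering cannot start.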
What do you think?
Thanks,
Michal
* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
2025-11-15 16:00 ` Michal Luczaj
@ 2025-11-17 9:57 ` Stefano Garzarella
2025-11-17 21:00 ` Michal Luczaj
0 siblings, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2025-11-17 9:57 UTC (permalink / raw)
To: Michal Luczaj
Cc: syzbot, davem, edumazet, horms, kuba, linux-kernel, netdev,
pabeni, syzkaller-bugs, virtualization
On Sat, Nov 15, 2025 at 05:00:28PM +0100, Michal Luczaj wrote:
>On 10/21/25 14:19, Stefano Garzarella wrote:
>> On Tue, 21 Oct 2025 at 12:48, Stefano Garzarella <sgarzare@redhat.com> wrote:
>>>
>>> On Tue, 21 Oct 2025 at 10:27, Stefano Garzarella <sgarzare@redhat.com> wrote:
>>>>
>>>> Hi Michal,
>>>>
>>>> On Mon, Oct 20, 2025 at 05:02:56PM -0700, syzbot wrote:
>>>>> Hello,
>>>>>
>>>>> syzbot found the following issue on:
>>>>>
>>>>> HEAD commit: d9043c79ba68 Merge tag 'sched_urgent_for_v6.18_rc2' of git..
>>>>> git tree: upstream
>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=130983cd980000
>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
>>>>> compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>>>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17f0f52f980000
>>>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11ea9734580000
>>>>>
>>>>> Downloadable assets:
>>>>> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-d9043c79.raw.xz
>>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/0546b6eaf1aa/vmlinux-d9043c79.xz
>>>>> kernel image: https://storage.googleapis.com/syzbot-assets/81285b4ada51/bzImage-d9043c79.xz
>>>>>
>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>>> Reported-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com
>>>>>
>>>>> ======================================================
>>>>> WARNING: possible circular locking dependency detected
>>>>> syzkaller #0 Not tainted
>>>>> ------------------------------------------------------
>>>>> syz.0.17/6098 is trying to acquire lock:
>>>>> ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1679 [inline]
>>>>> ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
>>>>
>>>> Could this be related to our recent work on linger in vsock?
>>>>
>>>>>
>>>>> but task is already holding lock:
>>>>> ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469
>>>>>
>>>>> which lock already depends on the new lock.
>>>>>
>>>>>
>>>>> the existing dependency chain (in reverse order) is:
>>>>>
>>>>> -> #1 (vsock_register_mutex){+.+.}-{4:4}:
>>>>> __mutex_lock_common kernel/locking/mutex.c:598 [inline]
>>>>> __mutex_lock+0x193/0x1060 kernel/locking/mutex.c:760
>>>>> vsock_registered_transport_cid net/vmw_vsock/af_vsock.c:560 [inline]
>>>>
>>>> Ah, no maybe this is related to commit 209fd720838a ("vsock:
>>>> Fix transport_{g2h,h2g} TOCTOU") where we added locking in
>>>> vsock_find_cid().
>>>>
>>>> Maybe we can just move the checks on top of __vsock_bind() to the
>>>> caller. I mean:
>>>>
>>>> /* First ensure this socket isn't already bound. */
>>>> if (vsock_addr_bound(&vsk->local_addr))
>>>> return -EINVAL;
>>>>
>>>> /* Now bind to the provided address or select appropriate values if
>>>> * none are provided (VMADDR_CID_ANY and VMADDR_PORT_ANY). Note that
>>>> * like AF_INET prevents binding to a non-local IP address (in most
>>>> * cases), we only allow binding to a local CID.
>>>> */
>>>> if (addr->svm_cid != VMADDR_CID_ANY && !vsock_find_cid(addr->svm_cid))
>>>> return -EADDRNOTAVAIL;
>>>>
>>>> We have 2 callers: vsock_auto_bind() and vsock_bind().
>>>>
>>>> vsock_auto_bind() is already checking if the socket is already bound,
>>>> if not is setting VMADDR_CID_ANY, so we can skip those checks.
>>>>
>>>> In vsock_bind() we can do the checks before lock_sock(sk), at least the
>>>> checks on vm_addr, calling vsock_find_cid().
>>>>
>>>> I'm preparing a patch to do this.
>>>
>>> mmm, no, this is more related to vsock_linger() where sk_wait_event()
>>> releases and locks again the sk_lock.
>>> So, it should be related to commit 687aa0c5581b ("vsock: Fix
>>> transport_* TOCTOU") where we take vsock_register_mutex in
>>> vsock_assign_transport() while calling vsk->transport->release().
>>>
>>> So, maybe we need to move the release and vsock_deassign_transport()
>>> after unlocking vsock_register_mutex.
>>
>> I implemented this here:
>> https://lore.kernel.org/netdev/20251021121718.137668-1-sgarzare@redhat.com/
>>
>> syzbot successfully tested it.
>>
>> Stefano
>
>Hi Stefano
Hi Michal!
>
>Apologies for missing this, I was away for a couple of weeks.
Don't worry at all!
>
>Turns out it's vsock_connect()'s reset-on-signal that strikes again. While
>you've fixed the lock order inversion (thank you), being able to reset an
>established socket, combined with SO_LINGER's lock-release-lock dance,
>still leads to crashes.
Yeah, I see!
>
>I think it goes like this: if user hits connect() with a signal right after
>connection is established (which implies an assigned transport), `sk_state`
>gets set to TCP_CLOSING and `state` to SS_UNCONNECTED. SS_UNCONNECTED means
>connect() can be retried. If re-connect() is for a different CID, transport
>reassignment takes place. That involves transport->release() of the old
>transport. Because `sk_state == TCP_CLOSING`, vsock_linger() is called.
>Lingering temporarily releases socket lock. Which can be raced by another
>thread doing connect(). Basically thread-1 can release resources from under
>thread-0. That breaks the assumptions, e.g. virtio_transport_unsent_bytes()
>does not expect a disappearing transport.
Makes sense to me!
>
>BUG: KASAN: slab-use-after-free in _raw_spin_lock_bh+0x34/0x40
>Read of size 1 at addr ffff888107c99420 by task a.out/1385
>CPU: 6 UID: 1000 PID: 1385 Comm: a.out Tainted: G E
>6.18.0-rc5+ #241 PREEMPT(voluntary)
>Call Trace:
> dump_stack_lvl+0x7e/0xc0
> print_report+0x170/0x4de
> kasan_report+0xc2/0x180
> __kasan_check_byte+0x3a/0x50
> lock_acquire+0xb2/0x300
> _raw_spin_lock_bh+0x34/0x40
> virtio_transport_unsent_bytes+0x3b/0x80
> vsock_linger+0x263/0x370
> virtio_transport_release+0x3ff/0x510
> vsock_assign_transport+0x358/0x780
> vsock_connect+0x5a2/0xc40
> __sys_connect+0xde/0x110
> __x64_sys_connect+0x6e/0xc0
> do_syscall_64+0x94/0xbb0
> entry_SYSCALL_64_after_hwframe+0x4b/0x53
>
>Allocated by task 1384:
> kasan_save_stack+0x1c/0x40
> kasan_save_track+0x10/0x30
> __kasan_kmalloc+0x92/0xa0
> virtio_transport_do_socket_init+0x48/0x320
> vsock_assign_transport+0x4ff/0x780
> vsock_connect+0x5a2/0xc40
> __sys_connect+0xde/0x110
> __x64_sys_connect+0x6e/0xc0
> do_syscall_64+0x94/0xbb0
> entry_SYSCALL_64_after_hwframe+0x4b/0x53
>
>Freed by task 1384:
> kasan_save_stack+0x1c/0x40
> kasan_save_track+0x10/0x30
> __kasan_save_free_info+0x37/0x50
> __kasan_slab_free+0x63/0x80
> kfree+0x142/0x6a0
> virtio_transport_destruct+0x86/0x170
> vsock_assign_transport+0x3a8/0x780
> vsock_connect+0x5a2/0xc40
> __sys_connect+0xde/0x110
> __x64_sys_connect+0x6e/0xc0
> do_syscall_64+0x94/0xbb0
> entry_SYSCALL_64_after_hwframe+0x4b/0x53
>
>I suppose there are many ways this chain of events can be stopped, but I
>see it as yet another reason to simplify vsock_connect(): do not let it
>"reset" an already established socket. I guess that would do the trick.
>What do you think?

I agree, we should do that. Do you have time to take a look?

Thanks for the help!
Stefano
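
As a rough illustration of "do not let connect() reset an already established socket", here is a small userspace state-machine sketch. It is one possible reading of the suggestion, not the eventual patch: the model_* names, the 'established' flag and the three states are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel API. */
enum model_state { MODEL_UNCONNECTED, MODEL_CONNECTING, MODEL_CONNECTED };

struct model_conn {
	enum model_state state;
	bool established; /* set once the peer has accepted the connection */
};

/*
 * Called when a signal interrupts a blocking connect().
 * Returns 0 if the connection is kept, -EINTR if the attempt is abandoned.
 */
static int model_connect_interrupted(struct model_conn *c)
{
	assert(c->state == MODEL_CONNECTING);

	if (c->established) {
		/* The handshake already finished: keep the connection. */
		c->state = MODEL_CONNECTED;
		return 0;
	}

	/* Nothing was established, so resetting is safe. */
	c->state = MODEL_UNCONNECTED;
	return -EINTR;
}

/* A retry, and thus a transport reassignment, needs the unconnected state. */
static bool model_can_reconnect(const struct model_conn *c)
{
	return c->state == MODEL_UNCONNECTED;
}
```

With this rule an established socket never returns to the unconnected state through the signal path, so the re-connect() and transport reassignment described above cannot start from it.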
* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
2025-11-17 9:57 ` Stefano Garzarella
@ 2025-11-17 21:00 ` Michal Luczaj
0 siblings, 0 replies; 13+ messages in thread
From: Michal Luczaj @ 2025-11-17 21:00 UTC (permalink / raw)
To: Stefano Garzarella
Cc: syzbot, davem, edumazet, horms, kuba, linux-kernel, netdev,
pabeni, syzkaller-bugs, virtualization
On 11/17/25 10:57, Stefano Garzarella wrote:
> On Sat, Nov 15, 2025 at 05:00:28PM +0100, Michal Luczaj wrote:
>> On 10/21/25 14:19, Stefano Garzarella wrote:
>>> On Tue, 21 Oct 2025 at 12:48, Stefano Garzarella <sgarzare@redhat.com> wrote:
>>>>
>>>> On Tue, 21 Oct 2025 at 10:27, Stefano Garzarella <sgarzare@redhat.com> wrote:
>>>>>
>>>>> Hi Michal,
>>>>>
>>>>> On Mon, Oct 20, 2025 at 05:02:56PM -0700, syzbot wrote:
>>>>>> Hello,
>>>>>>
>>>>>> syzbot found the following issue on:
>>>>>>
>>>>>> HEAD commit: d9043c79ba68 Merge tag 'sched_urgent_for_v6.18_rc2' of git..
>>>>>> git tree: upstream
>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=130983cd980000
>>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
>>>>>> compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>>>>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17f0f52f980000
>>>>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11ea9734580000
>>>>>>
>>>>>> Downloadable assets:
>>>>>> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-d9043c79.raw.xz
>>>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/0546b6eaf1aa/vmlinux-d9043c79.xz
>>>>>> kernel image: https://storage.googleapis.com/syzbot-assets/81285b4ada51/bzImage-d9043c79.xz
>>>>>>
>>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>>>> Reported-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com
>>>>>>
>>>>>> ======================================================
>>>>>> WARNING: possible circular locking dependency detected
>>>>>> syzkaller #0 Not tainted
>>>>>> ------------------------------------------------------
>>>>>> syz.0.17/6098 is trying to acquire lock:
>>>>>> ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1679 [inline]
>>>>>> ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
>>>>>
>>>>> Could this be related to our recent work on linger in vsock?
>>>>>
>>>>>>
>>>>>> but task is already holding lock:
>>>>>> ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469
>>>>>>
>>>>>> which lock already depends on the new lock.
>>>>>>
>>>>>>
>>>>>> the existing dependency chain (in reverse order) is:
>>>>>>
>>>>>> -> #1 (vsock_register_mutex){+.+.}-{4:4}:
>>>>>> __mutex_lock_common kernel/locking/mutex.c:598 [inline]
>>>>>> __mutex_lock+0x193/0x1060 kernel/locking/mutex.c:760
>>>>>> vsock_registered_transport_cid net/vmw_vsock/af_vsock.c:560 [inline]
>>>>>
>>>>> Ah, no maybe this is related to commit 209fd720838a ("vsock:
>>>>> Fix transport_{g2h,h2g} TOCTOU") where we added locking in
>>>>> vsock_find_cid().
>>>>>
>>>>> Maybe we can just move the checks on top of __vsock_bind() to the
>>>>> caller. I mean:
>>>>>
>>>>> /* First ensure this socket isn't already bound. */
>>>>> if (vsock_addr_bound(&vsk->local_addr))
>>>>> return -EINVAL;
>>>>>
>>>>> /* Now bind to the provided address or select appropriate values if
>>>>> * none are provided (VMADDR_CID_ANY and VMADDR_PORT_ANY). Note that
>>>>> * like AF_INET prevents binding to a non-local IP address (in most
>>>>> * cases), we only allow binding to a local CID.
>>>>> */
>>>>> if (addr->svm_cid != VMADDR_CID_ANY && !vsock_find_cid(addr->svm_cid))
>>>>> return -EADDRNOTAVAIL;
>>>>>
>>>>> We have 2 callers: vsock_auto_bind() and vsock_bind().
>>>>>
>>>>> vsock_auto_bind() is already checking if the socket is already bound,
>>>>> if not is setting VMADDR_CID_ANY, so we can skip those checks.
>>>>>
>>>>> In vsock_bind() we can do the checks before lock_sock(sk), at least the
>>>>> checks on vm_addr, calling vsock_find_cid().
>>>>>
>>>>> I'm preparing a patch to do this.
>>>>
>>>> mmm, no, this is more related to vsock_linger() where sk_wait_event()
>>>> releases and locks again the sk_lock.
>>>> So, it should be related to commit 687aa0c5581b ("vsock: Fix
>>>> transport_* TOCTOU") where we take vsock_register_mutex in
>>>> vsock_assign_transport() while calling vsk->transport->release().
>>>>
>>>> So, maybe we need to move the release and vsock_deassign_transport()
>>>> after unlocking vsock_register_mutex.
>>>
>>> I implemented this here:
>>> https://lore.kernel.org/netdev/20251021121718.137668-1-sgarzare@redhat.com/
>>>
>>> syzbot successfully tested it.
>>>
>>> Stefano
>>
>> Hi Stefano
>
> Hi Michal!
>
>>
>> Apologies for missing this, I was away for a couple of weeks.
>
> Don't worry at all!
>
>>
>> Turns out it's vsock_connect()'s reset-on-signal that strikes again. While
>> you've fixed the lock order inversion (thank you), being able to reset an
>> established socket, combined with SO_LINGER's lock-release-lock dance,
>> still leads to crashes.
>
> Yeah, I see!
>
>>
>> I think it goes like this: if user hits connect() with a signal right after
>> connection is established (which implies an assigned transport), `sk_state`
>> gets set to TCP_CLOSING and `state` to SS_UNCONNECTED. SS_UNCONNECTED means
>> connect() can be retried. If re-connect() is for a different CID, transport
>> reassignment takes place. That involves transport->release() of the old
>> transport. Because `sk_state == TCP_CLOSING`, vsock_linger() is called.
>> Lingering temporarily releases socket lock. Which can be raced by another
>> thread doing connect(). Basically thread-1 can release resources from under
>> thread-0. That breaks the assumptions, e.g. virtio_transport_unsent_bytes()
>> does not expect a disappearing transport.
>
> Makes sense to me!
>
>>
>> BUG: KASAN: slab-use-after-free in _raw_spin_lock_bh+0x34/0x40
>> Read of size 1 at addr ffff888107c99420 by task a.out/1385
>> CPU: 6 UID: 1000 PID: 1385 Comm: a.out Tainted: G E
>> 6.18.0-rc5+ #241 PREEMPT(voluntary)
>> Call Trace:
>> dump_stack_lvl+0x7e/0xc0
>> print_report+0x170/0x4de
>> kasan_report+0xc2/0x180
>> __kasan_check_byte+0x3a/0x50
>> lock_acquire+0xb2/0x300
>> _raw_spin_lock_bh+0x34/0x40
>> virtio_transport_unsent_bytes+0x3b/0x80
>> vsock_linger+0x263/0x370
>> virtio_transport_release+0x3ff/0x510
>> vsock_assign_transport+0x358/0x780
>> vsock_connect+0x5a2/0xc40
>> __sys_connect+0xde/0x110
>> __x64_sys_connect+0x6e/0xc0
>> do_syscall_64+0x94/0xbb0
>> entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>
>> Allocated by task 1384:
>> kasan_save_stack+0x1c/0x40
>> kasan_save_track+0x10/0x30
>> __kasan_kmalloc+0x92/0xa0
>> virtio_transport_do_socket_init+0x48/0x320
>> vsock_assign_transport+0x4ff/0x780
>> vsock_connect+0x5a2/0xc40
>> __sys_connect+0xde/0x110
>> __x64_sys_connect+0x6e/0xc0
>> do_syscall_64+0x94/0xbb0
>> entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>
>> Freed by task 1384:
>> kasan_save_stack+0x1c/0x40
>> kasan_save_track+0x10/0x30
>> __kasan_save_free_info+0x37/0x50
>> __kasan_slab_free+0x63/0x80
>> kfree+0x142/0x6a0
>> virtio_transport_destruct+0x86/0x170
>> vsock_assign_transport+0x3a8/0x780
>> vsock_connect+0x5a2/0xc40
>> __sys_connect+0xde/0x110
>> __x64_sys_connect+0x6e/0xc0
>> do_syscall_64+0x94/0xbb0
>> entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>
>> I suppose there are many ways this chain of events can be stopped, but I
>> see it as yet another reason to simplify vsock_connect(): do not let it
>> "reset" an already established socket. I guess that would do the trick.
>> What do you think?
>
> I agree, we should do that. Do you have time to take a look?

Sure, here's a patch:
https://lore.kernel.org/netdev/20251117-vsock-interrupted-connect-v1-1-bc021e907c3f@rbox.co/

Michal
end of thread, other threads:[~2025-11-17 21:00 UTC | newest]
Thread overview: 13+ messages
2025-10-21 0:02 [syzbot] [virt?] [net?] possible deadlock in vsock_linger syzbot
2025-10-21 8:27 ` Stefano Garzarella
2025-10-21 10:48 ` Stefano Garzarella
2025-10-21 12:19 ` Stefano Garzarella
2025-11-15 16:00 ` Michal Luczaj
2025-11-17 9:57 ` Stefano Garzarella
2025-11-17 21:00 ` Michal Luczaj
2025-10-21 10:09 ` Stefano Garzarella
2025-10-21 10:11 ` syzbot
2025-10-21 10:16 ` Stefano Garzarella
2025-10-21 10:30 ` syzbot
2025-10-21 10:59 ` Stefano Garzarella
2025-10-21 11:19 ` syzbot