[syzbot] [virt?] [net?] possible deadlock in vsock

virtualization.lists.linux-foundation.org archive mirror

* [syzbot] [virt?] [net?] possible deadlock in vsock_linger
@ 2025-10-21  0:02 syzbot
  2025-10-21  8:27 ` Stefano Garzarella
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: syzbot @ 2025-10-21  0:02 UTC (permalink / raw)
  To: davem, edumazet, horms, kuba, linux-kernel, netdev, pabeni,
	sgarzare, syzkaller-bugs, virtualization

Hello,

syzbot found the following issue on:

HEAD commit:    d9043c79ba68 Merge tag 'sched_urgent_for_v6.18_rc2' of git..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=130983cd980000
kernel config:  https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17f0f52f980000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11ea9734580000

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-d9043c79.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/0546b6eaf1aa/vmlinux-d9043c79.xz
kernel image: https://storage.googleapis.com/syzbot-assets/81285b4ada51/bzImage-d9043c79.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
syzkaller #0 Not tainted
------------------------------------------------------
syz.0.17/6098 is trying to acquire lock:
ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1679 [inline]
ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066

but task is already holding lock:
ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (vsock_register_mutex){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/mutex.c:598 [inline]
       __mutex_lock+0x193/0x1060 kernel/locking/mutex.c:760
       vsock_registered_transport_cid net/vmw_vsock/af_vsock.c:560 [inline]
       vsock_find_cid net/vmw_vsock/af_vsock.c:570 [inline]
       __vsock_bind+0x1b5/0xa10 net/vmw_vsock/af_vsock.c:752
       vsock_bind+0xc6/0x120 net/vmw_vsock/af_vsock.c:1002
       __sys_bind_socket net/socket.c:1874 [inline]
       __sys_bind_socket net/socket.c:1866 [inline]
       __sys_bind+0x1a7/0x260 net/socket.c:1905
       __do_sys_bind net/socket.c:1910 [inline]
       __se_sys_bind net/socket.c:1908 [inline]
       __x64_sys_bind+0x72/0xb0 net/socket.c:1908
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #0 (sk_lock-AF_VSOCK){+.+.}-{0:0}:
       check_prev_add kernel/locking/lockdep.c:3165 [inline]
       check_prevs_add kernel/locking/lockdep.c:3284 [inline]
       validate_chain kernel/locking/lockdep.c:3908 [inline]
       __lock_acquire+0x126f/0x1c90 kernel/locking/lockdep.c:5237
       lock_acquire kernel/locking/lockdep.c:5868 [inline]
       lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5825
       lock_sock_nested+0x41/0xf0 net/core/sock.c:3720
       lock_sock include/net/sock.h:1679 [inline]
       vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
       virtio_transport_close net/vmw_vsock/virtio_transport_common.c:1271 [inline]
       virtio_transport_release+0x52a/0x640 net/vmw_vsock/virtio_transport_common.c:1291
       vsock_assign_transport+0x320/0x900 net/vmw_vsock/af_vsock.c:502
       vsock_connect+0x201/0xee0 net/vmw_vsock/af_vsock.c:1578
       __sys_connect_file+0x141/0x1a0 net/socket.c:2102
       __sys_connect+0x13b/0x160 net/socket.c:2121
       __do_sys_connect net/socket.c:2127 [inline]
       __se_sys_connect net/socket.c:2124 [inline]
       __x64_sys_connect+0x72/0xb0 net/socket.c:2124
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(vsock_register_mutex);
                               lock(sk_lock-AF_VSOCK);
                               lock(vsock_register_mutex);
  lock(sk_lock-AF_VSOCK);

 *** DEADLOCK ***

1 lock held by syz.0.17/6098:
 #0: ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469

stack backtrace:
CPU: 3 UID: 0 PID: 6098 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
 print_circular_bug+0x275/0x350 kernel/locking/lockdep.c:2043
 check_noncircular+0x14c/0x170 kernel/locking/lockdep.c:2175
 check_prev_add kernel/locking/lockdep.c:3165 [inline]
 check_prevs_add kernel/locking/lockdep.c:3284 [inline]
 validate_chain kernel/locking/lockdep.c:3908 [inline]
 __lock_acquire+0x126f/0x1c90 kernel/locking/lockdep.c:5237
 lock_acquire kernel/locking/lockdep.c:5868 [inline]
 lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5825
 lock_sock_nested+0x41/0xf0 net/core/sock.c:3720
 lock_sock include/net/sock.h:1679 [inline]
 vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
 virtio_transport_close net/vmw_vsock/virtio_transport_common.c:1271 [inline]
 virtio_transport_release+0x52a/0x640 net/vmw_vsock/virtio_transport_common.c:1291
 vsock_assign_transport+0x320/0x900 net/vmw_vsock/af_vsock.c:502
 vsock_connect+0x201/0xee0 net/vmw_vsock/af_vsock.c:1578
 __sys_connect_file+0x141/0x1a0 net/socket.c:2102
 __sys_connect+0x13b/0x160 net/socket.c:2121
 __do_sys_connect net/socket.c:2127 [inline]
 __se_sys_connect net/socket.c:2124 [inline]
 __x64_sys_connect+0x72/0xb0 net/socket.c:2124
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f767bf8efc9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff0a2857b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 00007f767c1e5fa0 RCX: 00007f767bf8efc9
RDX: 0000000000000010 RSI: 0000200000000000 RDI: 0000000000000004
RBP: 00007f767c011f91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f767c1e5fa0 R14: 00007f767c1e5fa0 R15: 0000000000000003
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
  2025-10-21  0:02 [syzbot] [virt?] [net?] possible deadlock in vsock_linger syzbot
@ 2025-10-21  8:27 ` Stefano Garzarella
  2025-10-21 10:48   ` Stefano Garzarella
  2025-10-21 10:09 ` Stefano Garzarella
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2025-10-21  8:27 UTC (permalink / raw)
  To: syzbot, Michal Luczaj
  Cc: davem, edumazet, horms, kuba, linux-kernel, netdev, pabeni,
	syzkaller-bugs, virtualization

Hi Michal,

On Mon, Oct 20, 2025 at 05:02:56PM -0700, syzbot wrote:
>Hello,
>
>syzbot found the following issue on:
>
>HEAD commit:    d9043c79ba68 Merge tag 'sched_urgent_for_v6.18_rc2' of git..
>git tree:       upstream
>console output: https://syzkaller.appspot.com/x/log.txt?x=130983cd980000
>kernel config:  https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
>dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
>compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17f0f52f980000
>C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11ea9734580000
>
>Downloadable assets:
>disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-d9043c79.raw.xz
>vmlinux: https://storage.googleapis.com/syzbot-assets/0546b6eaf1aa/vmlinux-d9043c79.xz
>kernel image: https://storage.googleapis.com/syzbot-assets/81285b4ada51/bzImage-d9043c79.xz
>
>IMPORTANT: if you fix the issue, please add the following tag to the commit:
>Reported-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com
>
>======================================================
>WARNING: possible circular locking dependency detected
>syzkaller #0 Not tainted
>------------------------------------------------------
>syz.0.17/6098 is trying to acquire lock:
>ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1679 [inline]
>ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066

Could this be related to our recent work on linger in vsock?

>
>but task is already holding lock:
>ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469
>
>which lock already depends on the new lock.
>
>
>the existing dependency chain (in reverse order) is:
>
>-> #1 (vsock_register_mutex){+.+.}-{4:4}:
>       __mutex_lock_common kernel/locking/mutex.c:598 [inline]
>       __mutex_lock+0x193/0x1060 kernel/locking/mutex.c:760
>       vsock_registered_transport_cid net/vmw_vsock/af_vsock.c:560 [inline]

Ah, no, maybe this is related to commit 209fd720838a ("vsock:
Fix transport_{g2h,h2g} TOCTOU"), where we added locking in
vsock_find_cid().

Maybe we can just move the checks at the top of __vsock_bind() to the
callers. I mean:

	/* First ensure this socket isn't already bound. */
	if (vsock_addr_bound(&vsk->local_addr))
		return -EINVAL;

	/* Now bind to the provided address or select appropriate values if
	 * none are provided (VMADDR_CID_ANY and VMADDR_PORT_ANY).  Note that
	 * like AF_INET prevents binding to a non-local IP address (in most
	 * cases), we only allow binding to a local CID.
	 */
	if (addr->svm_cid != VMADDR_CID_ANY && !vsock_find_cid(addr->svm_cid))
		return -EADDRNOTAVAIL;

We have 2 callers: vsock_auto_bind() and vsock_bind().

vsock_auto_bind() already checks whether the socket is bound and, if
not, sets VMADDR_CID_ANY, so we can skip those checks there.

In vsock_bind() we can do the checks before lock_sock(sk), or at least
the check on vm_addr that calls vsock_find_cid().

I'm preparing a patch to do this.
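
To illustrate the ordering idea, here is a minimal user-space sketch
(not the kernel code). It assumes pthread mutexes as stand-ins for
vsock_register_mutex and sk_lock-AF_VSOCK; every model_* name and the
MODEL_* constants are hypothetical and exist only for this example. The
CID lookup takes the registry mutex and finishes before the socket lock
is taken, so the bind path never holds both locks at once. It models
only the bind-side ordering, nothing about connect or linger.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for vsock_register_mutex. */
static pthread_mutex_t register_mutex = PTHREAD_MUTEX_INITIALIZER;

#define MODEL_CID_ANY   0xffffffffu /* models VMADDR_CID_ANY */
#define MODEL_LOCAL_CID 3u          /* assumed: the only registered CID */

/* Hypothetical stand-in for a socket; `lock` models sk_lock-AF_VSOCK. */
struct model_sock {
	pthread_mutex_t lock;
	bool bound;
	unsigned int cid;
};

/* Per-thread bookkeeping: how many model locks the caller holds now,
 * and the highest nesting depth it ever reached.
 */
static _Thread_local int locks_held;
static _Thread_local int max_locks_held;

static void model_lock(pthread_mutex_t *m)
{
	pthread_mutex_lock(m);
	if (++locks_held > max_locks_held)
		max_locks_held = locks_held;
}

static void model_unlock(pthread_mutex_t *m)
{
	assert(locks_held > 0);
	locks_held--;
	pthread_mutex_unlock(m);
}

static void model_sock_init(struct model_sock *sk)
{
	pthread_mutex_init(&sk->lock, NULL);
	sk->bound = false;
	sk->cid = MODEL_CID_ANY;
}

/* Models vsock_find_cid(): takes and drops the registry mutex. */
static bool model_find_cid(unsigned int cid)
{
	bool found;

	model_lock(&register_mutex);
	found = (cid == MODEL_LOCAL_CID);
	model_unlock(&register_mutex);

	return found;
}

/* Models the proposed vsock_bind() ordering. */
static int model_bind(struct model_sock *sk, unsigned int cid)
{
	int err = 0;

	/* CID lookup first, while the socket lock is NOT held. */
	if (cid != MODEL_CID_ANY && !model_find_cid(cid))
		return -EADDRNOTAVAIL;

	model_lock(&sk->lock);

	/* The "already bound" check reads socket state, so it stays
	 * under the socket lock.
	 */
	if (sk->bound) {
		err = -EINVAL;
		goto out;
	}

	sk->bound = true;
	sk->cid = cid;
out:
	model_unlock(&sk->lock);
	return err;
}
```

After model_sock_init(), calling model_bind() with an unregistered CID
fails before the socket lock is touched, and max_locks_held stays at 1
for any sequence of model_bind() calls, since the two locks are never
nested in this path.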

Stefano


>       vsock_find_cid net/vmw_vsock/af_vsock.c:570 [inline]
>       __vsock_bind+0x1b5/0xa10 net/vmw_vsock/af_vsock.c:752
>       vsock_bind+0xc6/0x120 net/vmw_vsock/af_vsock.c:1002
>       __sys_bind_socket net/socket.c:1874 [inline]
>       __sys_bind_socket net/socket.c:1866 [inline]
>       __sys_bind+0x1a7/0x260 net/socket.c:1905
>       __do_sys_bind net/socket.c:1910 [inline]
>       __se_sys_bind net/socket.c:1908 [inline]
>       __x64_sys_bind+0x72/0xb0 net/socket.c:1908
>       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>       do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
>       entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
>-> #0 (sk_lock-AF_VSOCK){+.+.}-{0:0}:
>       check_prev_add kernel/locking/lockdep.c:3165 [inline]
>       check_prevs_add kernel/locking/lockdep.c:3284 [inline]
>       validate_chain kernel/locking/lockdep.c:3908 [inline]
>       __lock_acquire+0x126f/0x1c90 kernel/locking/lockdep.c:5237
>       lock_acquire kernel/locking/lockdep.c:5868 [inline]
>       lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5825
>       lock_sock_nested+0x41/0xf0 net/core/sock.c:3720
>       lock_sock include/net/sock.h:1679 [inline]
>       vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
>       virtio_transport_close net/vmw_vsock/virtio_transport_common.c:1271 [inline]
>       virtio_transport_release+0x52a/0x640 net/vmw_vsock/virtio_transport_common.c:1291
>       vsock_assign_transport+0x320/0x900 net/vmw_vsock/af_vsock.c:502
>       vsock_connect+0x201/0xee0 net/vmw_vsock/af_vsock.c:1578
>       __sys_connect_file+0x141/0x1a0 net/socket.c:2102
>       __sys_connect+0x13b/0x160 net/socket.c:2121
>       __do_sys_connect net/socket.c:2127 [inline]
>       __se_sys_connect net/socket.c:2124 [inline]
>       __x64_sys_connect+0x72/0xb0 net/socket.c:2124
>       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>       do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
>       entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
>other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
>       CPU0                    CPU1
>       ----                    ----
>  lock(vsock_register_mutex);
>                               lock(sk_lock-AF_VSOCK);
>                               lock(vsock_register_mutex);
>  lock(sk_lock-AF_VSOCK);
>
> *** DEADLOCK ***
>
>1 lock held by syz.0.17/6098:
> #0: ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469
>
>stack backtrace:
>CPU: 3 UID: 0 PID: 6098 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
>Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
>Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:94 [inline]
> dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
> print_circular_bug+0x275/0x350 kernel/locking/lockdep.c:2043
> check_noncircular+0x14c/0x170 kernel/locking/lockdep.c:2175
> check_prev_add kernel/locking/lockdep.c:3165 [inline]
> check_prevs_add kernel/locking/lockdep.c:3284 [inline]
> validate_chain kernel/locking/lockdep.c:3908 [inline]
> __lock_acquire+0x126f/0x1c90 kernel/locking/lockdep.c:5237
> lock_acquire kernel/locking/lockdep.c:5868 [inline]
> lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5825
> lock_sock_nested+0x41/0xf0 net/core/sock.c:3720
> lock_sock include/net/sock.h:1679 [inline]
> vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
> virtio_transport_close net/vmw_vsock/virtio_transport_common.c:1271 [inline]
> virtio_transport_release+0x52a/0x640 net/vmw_vsock/virtio_transport_common.c:1291
> vsock_assign_transport+0x320/0x900 net/vmw_vsock/af_vsock.c:502
> vsock_connect+0x201/0xee0 net/vmw_vsock/af_vsock.c:1578
> __sys_connect_file+0x141/0x1a0 net/socket.c:2102
> __sys_connect+0x13b/0x160 net/socket.c:2121
> __do_sys_connect net/socket.c:2127 [inline]
> __se_sys_connect net/socket.c:2124 [inline]
> __x64_sys_connect+0x72/0xb0 net/socket.c:2124
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>RIP: 0033:0x7f767bf8efc9
>Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
>RSP: 002b:00007fff0a2857b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
>RAX: ffffffffffffffda RBX: 00007f767c1e5fa0 RCX: 00007f767bf8efc9
>RDX: 0000000000000010 RSI: 0000200000000000 RDI: 0000000000000004
>RBP: 00007f767c011f91 R08: 0000000000000000 R09: 0000000000000000
>R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>R13: 00007f767c1e5fa0 R14: 00007f767c1e5fa0 R15: 0000000000000003
> </TASK>
>
>
>---
>This report is generated by a bot. It may contain errors.
>See https://goo.gl/tpsmEJ for more information about syzbot.
>syzbot engineers can be reached at syzkaller@googlegroups.com.
>
>syzbot will keep track of this issue. See:
>https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
>If the report is already addressed, let syzbot know by replying with:
>#syz fix: exact-commit-title
>
>If you want syzbot to run the reproducer, reply with:
>#syz test: git://repo/address.git branch-or-commit-hash
>If you attach or paste a git patch, syzbot will apply it before testing.
>
>If you want to overwrite report's subsystems, reply with:
>#syz set subsystems: new-subsystem
>(See the list of subsystem names on the web dashboard)
>
>If the report is a duplicate of another one, reply with:
>#syz dup: exact-subject-of-another-report
>
>If you want to undo deduplication, reply with:
>#syz undup
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
  2025-10-21  0:02 [syzbot] [virt?] [net?] possible deadlock in vsock_linger syzbot
  2025-10-21  8:27 ` Stefano Garzarella
@ 2025-10-21 10:09 ` Stefano Garzarella
  2025-10-21 10:11   ` syzbot
  2025-10-21 10:16 ` Stefano Garzarella
  2025-10-21 10:59 ` Stefano Garzarella
  3 siblings, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2025-10-21 10:09 UTC (permalink / raw)
  To: syzbot
  Cc: davem, edumazet, horms, kuba, linux-kernel, netdev, pabeni,
	syzkaller-bugs, virtualization, Michal Luczaj

On Mon, Oct 20, 2025 at 05:02:56PM -0700, syzbot wrote:
>Hello,
>
>syzbot found the following issue on:
>
>HEAD commit:    d9043c79ba68 Merge tag 'sched_urgent_for_v6.18_rc2' of git..
>git tree:       upstream
>console output: https://syzkaller.appspot.com/x/log.txt?x=130983cd980000
>kernel config:  https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
>dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
>compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17f0f52f980000
>C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11ea9734580000

#syz test

--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -565,6 +565,11 @@ static u32 vsock_registered_transport_cid(const struct vsock_transport **transpo
         return cid;
  }

+/* vsock_find_cid() must be called outside lock_sock/release_sock
+ * section to avoid a potential lock inversion deadlock with
+ * vsock_assign_transport() where `vsock_register_mutex` is taken when
+ * `sk_lock-AF_VSOCK` is already held.
+ */
  bool vsock_find_cid(unsigned int cid)
  {
         if (cid == vsock_registered_transport_cid(&transport_g2h))
@@ -735,23 +740,14 @@ static int __vsock_bind_dgram(struct vsock_sock *vsk,
         return vsk->transport->dgram_bind(vsk, addr);
  }

+/* The caller must ensure the socket is not already bound and provide a valid
+ * `addr` to bind (VMADDR_CID_ANY, or a CID assigned to a transport).
+ */
  static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
  {
         struct vsock_sock *vsk = vsock_sk(sk);
         int retval;

-       /* First ensure this socket isn't already bound. */
-       if (vsock_addr_bound(&vsk->local_addr))
-               return -EINVAL;
-
-       /* Now bind to the provided address or select appropriate values if
-        * none are provided (VMADDR_CID_ANY and VMADDR_PORT_ANY).  Note that
-        * like AF_INET prevents binding to a non-local IP address (in most
-        * cases), we only allow binding to a local CID.
-        */
-       if (addr->svm_cid != VMADDR_CID_ANY && !vsock_find_cid(addr->svm_cid))
-               return -EADDRNOTAVAIL;
-
         switch (sk->sk_socket->type) {
         case SOCK_STREAM:
         case SOCK_SEQPACKET:
@@ -991,15 +987,33 @@ vsock_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
  {
         int err;
         struct sock *sk;
+       struct vsock_sock *vsk;
         struct sockaddr_vm *vm_addr;

         sk = sock->sk;
+       vsk = vsock_sk(sk);

         if (vsock_addr_cast(addr, addr_len, &vm_addr) != 0)
                 return -EINVAL;

+       /* Like AF_INET prevents binding to a non-local IP address (in most
+        * cases), we only allow binding to a local CID.
+        */
+       if (vm_addr->svm_cid != VMADDR_CID_ANY &&
+           !vsock_find_cid(vm_addr->svm_cid))
+               return -EADDRNOTAVAIL;
+
         lock_sock(sk);
+
+       /* Ensure this socket isn't already bound. */
+       if (vsock_addr_bound(&vsk->local_addr)) {
+               err = -EINVAL;
+               goto out;
+       }
+
         err = __vsock_bind(sk, vm_addr);
+
+out:
         release_sock(sk);

         return err;


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
  2025-10-21 10:09 ` Stefano Garzarella
@ 2025-10-21 10:11   ` syzbot
  0 siblings, 0 replies; 13+ messages in thread
From: syzbot @ 2025-10-21 10:11 UTC (permalink / raw)
  To: davem, edumazet, horms, kuba, linux-kernel, mhal, netdev, pabeni,
	sgarzare, syzkaller-bugs, virtualization

Hello,

syzbot tried to test the proposed patch but the build/boot failed:

failed to apply patch:
checking file net/vmw_vsock/af_vsock.c
patch: **** unexpected end of file in patch

Tested on:

commit:         6548d364 Merge tag 'cgroup-for-6.18-rc2-fixes' of git:..
git tree:       upstream
kernel config:  https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
compiler:       
patch:          https://syzkaller.appspot.com/x/patch.diff?x=17204e7c580000

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
  2025-10-21  0:02 [syzbot] [virt?] [net?] possible deadlock in vsock_linger syzbot
  2025-10-21  8:27 ` Stefano Garzarella
  2025-10-21 10:09 ` Stefano Garzarella
@ 2025-10-21 10:16 ` Stefano Garzarella
  2025-10-21 10:30   ` syzbot
  2025-10-21 10:59 ` Stefano Garzarella
  3 siblings, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2025-10-21 10:16 UTC (permalink / raw)
  To: syzbot
  Cc: davem, edumazet, horms, kuba, linux-kernel, netdev, pabeni,
	syzkaller-bugs, virtualization

[-- Attachment #1: Type: text/plain, Size: 694 bytes --]

On Mon, Oct 20, 2025 at 05:02:56PM -0700, syzbot wrote:
>Hello,
>
>syzbot found the following issue on:
>
>HEAD commit:    d9043c79ba68 Merge tag 'sched_urgent_for_v6.18_rc2' of git..
>git tree:       upstream
>console output: https://syzkaller.appspot.com/x/log.txt?x=130983cd980000
>kernel config:  https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
>dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
>compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17f0f52f980000
>C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11ea9734580000

#syz test


[-- Attachment #2: 0001-TODO.patch --]
[-- Type: text/plain, Size: 2743 bytes --]

From c32c21ea301aadc56160a57ddcd99f836a49f028 Mon Sep 17 00:00:00 2001
From: Stefano Garzarella <sgarzare@redhat.com>
Date: Tue, 21 Oct 2025 12:12:24 +0200
Subject: [PATCH] TODO

From: Stefano Garzarella <sgarzare@redhat.com>

---
 net/vmw_vsock/af_vsock.c | 38 ++++++++++++++++++++++++++------------
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 4c2db6cca557..5434fe6a1d6b 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -565,6 +565,11 @@ static u32 vsock_registered_transport_cid(const struct vsock_transport **transpo
 	return cid;
 }
 
+/* vsock_find_cid() must be called outside lock_sock/release_sock
+ * section to avoid a potential lock inversion deadlock with
+ * vsock_assign_transport() where `vsock_register_mutex` is taken when
+ * `sk_lock-AF_VSOCK` is already held.
+ */
 bool vsock_find_cid(unsigned int cid)
 {
 	if (cid == vsock_registered_transport_cid(&transport_g2h))
@@ -735,23 +740,14 @@ static int __vsock_bind_dgram(struct vsock_sock *vsk,
 	return vsk->transport->dgram_bind(vsk, addr);
 }
 
+/* The caller must ensure the socket is not already bound and provide a valid
+ * `addr` to bind (VMADDR_CID_ANY, or a CID assigned to a transport).
+ */
 static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
 	int retval;
 
-	/* First ensure this socket isn't already bound. */
-	if (vsock_addr_bound(&vsk->local_addr))
-		return -EINVAL;
-
-	/* Now bind to the provided address or select appropriate values if
-	 * none are provided (VMADDR_CID_ANY and VMADDR_PORT_ANY).  Note that
-	 * like AF_INET prevents binding to a non-local IP address (in most
-	 * cases), we only allow binding to a local CID.
-	 */
-	if (addr->svm_cid != VMADDR_CID_ANY && !vsock_find_cid(addr->svm_cid))
-		return -EADDRNOTAVAIL;
-
 	switch (sk->sk_socket->type) {
 	case SOCK_STREAM:
 	case SOCK_SEQPACKET:
@@ -991,15 +987,33 @@ vsock_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 {
 	int err;
 	struct sock *sk;
+	struct vsock_sock *vsk;
 	struct sockaddr_vm *vm_addr;
 
 	sk = sock->sk;
+	vsk = vsock_sk(sk);
 
 	if (vsock_addr_cast(addr, addr_len, &vm_addr) != 0)
 		return -EINVAL;
 
+	/* Like AF_INET prevents binding to a non-local IP address (in most
+	 * cases), we only allow binding to a local CID.
+	 */
+	if (vm_addr->svm_cid != VMADDR_CID_ANY &&
+	    !vsock_find_cid(vm_addr->svm_cid))
+		return -EADDRNOTAVAIL;
+
 	lock_sock(sk);
+
+	/* Ensure this socket isn't already bound. */
+	if (vsock_addr_bound(&vsk->local_addr)) {
+		err = -EINVAL;
+		goto out;
+	}
+
 	err = __vsock_bind(sk, vm_addr);
+
+out:
 	release_sock(sk);
 
 	return err;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
  2025-10-21 10:16 ` Stefano Garzarella
@ 2025-10-21 10:30   ` syzbot
  0 siblings, 0 replies; 13+ messages in thread
From: syzbot @ 2025-10-21 10:30 UTC (permalink / raw)
  To: davem, edumazet, horms, kuba, linux-kernel, netdev, pabeni,
	sgarzare, syzkaller-bugs, virtualization

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
possible deadlock in vsock_linger

======================================================
WARNING: possible circular locking dependency detected
syzkaller #0 Not tainted
------------------------------------------------------
syz.0.17/6384 is trying to acquire lock:
ffff888055028b18 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1679 [inline]
ffff888055028b18 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1080

but task is already holding lock:
ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (vsock_register_mutex){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/mutex.c:598 [inline]
       __mutex_lock+0x193/0x1060 kernel/locking/mutex.c:760
       vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469
       vsock_connect+0x201/0xee0 net/vmw_vsock/af_vsock.c:1592
       __sys_connect_file+0x141/0x1a0 net/socket.c:2102
       __sys_connect+0x13b/0x160 net/socket.c:2121
       __do_sys_connect net/socket.c:2127 [inline]
       __se_sys_connect net/socket.c:2124 [inline]
       __x64_sys_connect+0x72/0xb0 net/socket.c:2124
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #0 (sk_lock-AF_VSOCK){+.+.}-{0:0}:
       check_prev_add kernel/locking/lockdep.c:3165 [inline]
       check_prevs_add kernel/locking/lockdep.c:3284 [inline]
       validate_chain kernel/locking/lockdep.c:3908 [inline]
       __lock_acquire+0x126f/0x1c90 kernel/locking/lockdep.c:5237
       lock_acquire kernel/locking/lockdep.c:5868 [inline]
       lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5825
       lock_sock_nested+0x41/0xf0 net/core/sock.c:3720
       lock_sock include/net/sock.h:1679 [inline]
       vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1080
       virtio_transport_close net/vmw_vsock/virtio_transport_common.c:1271 [inline]
       virtio_transport_release+0x52a/0x640 net/vmw_vsock/virtio_transport_common.c:1291
       vsock_assign_transport+0x320/0x900 net/vmw_vsock/af_vsock.c:502
       vsock_connect+0x201/0xee0 net/vmw_vsock/af_vsock.c:1592
       __sys_connect_file+0x141/0x1a0 net/socket.c:2102
       __sys_connect+0x13b/0x160 net/socket.c:2121
       __do_sys_connect net/socket.c:2127 [inline]
       __se_sys_connect net/socket.c:2124 [inline]
       __x64_sys_connect+0x72/0xb0 net/socket.c:2124
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(vsock_register_mutex);
                               lock(sk_lock-AF_VSOCK);
                               lock(vsock_register_mutex);
  lock(sk_lock-AF_VSOCK);

 *** DEADLOCK ***

1 lock held by syz.0.17/6384:
 #0: ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469

stack backtrace:
CPU: 1 UID: 0 PID: 6384 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
 print_circular_bug+0x275/0x350 kernel/locking/lockdep.c:2043
 check_noncircular+0x14c/0x170 kernel/locking/lockdep.c:2175
 check_prev_add kernel/locking/lockdep.c:3165 [inline]
 check_prevs_add kernel/locking/lockdep.c:3284 [inline]
 validate_chain kernel/locking/lockdep.c:3908 [inline]
 __lock_acquire+0x126f/0x1c90 kernel/locking/lockdep.c:5237
 lock_acquire kernel/locking/lockdep.c:5868 [inline]
 lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5825
 lock_sock_nested+0x41/0xf0 net/core/sock.c:3720
 lock_sock include/net/sock.h:1679 [inline]
 vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1080
 virtio_transport_close net/vmw_vsock/virtio_transport_common.c:1271 [inline]
 virtio_transport_release+0x52a/0x640 net/vmw_vsock/virtio_transport_common.c:1291
 vsock_assign_transport+0x320/0x900 net/vmw_vsock/af_vsock.c:502
 vsock_connect+0x201/0xee0 net/vmw_vsock/af_vsock.c:1592
 __sys_connect_file+0x141/0x1a0 net/socket.c:2102
 __sys_connect+0x13b/0x160 net/socket.c:2121
 __do_sys_connect net/socket.c:2127 [inline]
 __se_sys_connect net/socket.c:2124 [inline]
 __x64_sys_connect+0x72/0xb0 net/socket.c:2124
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f300598efc9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f3006912038 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 00007f3005be5fa0 RCX: 00007f300598efc9
RDX: 0000000000000010 RSI: 0000200000000000 RDI: 0000000000000004
RBP: 00007f3005a11f91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f3005be6038 R14: 00007f3005be5fa0 R15: 00007ffdba0a0048
 </TASK>


Tested on:

commit:         6548d364 Merge tag 'cgroup-for-6.18-rc2-fixes' of git:..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=162a5492580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch:          https://syzkaller.appspot.com/x/patch.diff?x=17a04e7c580000


* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
  2025-10-21  8:27 ` Stefano Garzarella
@ 2025-10-21 10:48   ` Stefano Garzarella
  2025-10-21 12:19     ` Stefano Garzarella
  0 siblings, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2025-10-21 10:48 UTC (permalink / raw)
  To: syzbot, Michal Luczaj
  Cc: davem, edumazet, horms, kuba, linux-kernel, netdev, pabeni,
	syzkaller-bugs, virtualization

On Tue, 21 Oct 2025 at 10:27, Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> Hi Michal,
>
> On Mon, Oct 20, 2025 at 05:02:56PM -0700, syzbot wrote:
> >Hello,
> >
> >syzbot found the following issue on:
> >
> >HEAD commit:    d9043c79ba68 Merge tag 'sched_urgent_for_v6.18_rc2' of git..
> >git tree:       upstream
> >console output: https://syzkaller.appspot.com/x/log.txt?x=130983cd980000
> >kernel config:  https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
> >dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
> >compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> >syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17f0f52f980000
> >C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11ea9734580000
> >
> >Downloadable assets:
> >disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-d9043c79.raw.xz
> >vmlinux: https://storage.googleapis.com/syzbot-assets/0546b6eaf1aa/vmlinux-d9043c79.xz
> >kernel image: https://storage.googleapis.com/syzbot-assets/81285b4ada51/bzImage-d9043c79.xz
> >
> >IMPORTANT: if you fix the issue, please add the following tag to the commit:
> >Reported-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com
> >
> >======================================================
> >WARNING: possible circular locking dependency detected
> >syzkaller #0 Not tainted
> >------------------------------------------------------
> >syz.0.17/6098 is trying to acquire lock:
> >ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1679 [inline]
> >ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
>
> Could this be related to our recent work on linger in vsock?
>
> >
> >but task is already holding lock:
> >ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469
> >
> >which lock already depends on the new lock.
> >
> >
> >the existing dependency chain (in reverse order) is:
> >
> >-> #1 (vsock_register_mutex){+.+.}-{4:4}:
> >       __mutex_lock_common kernel/locking/mutex.c:598 [inline]
> >       __mutex_lock+0x193/0x1060 kernel/locking/mutex.c:760
> >       vsock_registered_transport_cid net/vmw_vsock/af_vsock.c:560 [inline]
>
> Ah, no maybe this is related to commit 209fd720838a ("vsock:
> Fix transport_{g2h,h2g} TOCTOU") where we added locking in
> vsock_find_cid().
>
> Maybe we can just move the checks on top of __vsock_bind() to the
> caller. I mean:
>
>         /* First ensure this socket isn't already bound. */
>         if (vsock_addr_bound(&vsk->local_addr))
>                 return -EINVAL;
>
>         /* Now bind to the provided address or select appropriate values if
>          * none are provided (VMADDR_CID_ANY and VMADDR_PORT_ANY).  Note that
>          * like AF_INET prevents binding to a non-local IP address (in most
>          * cases), we only allow binding to a local CID.
>          */
>         if (addr->svm_cid != VMADDR_CID_ANY && !vsock_find_cid(addr->svm_cid))
>                 return -EADDRNOTAVAIL;
>
> We have 2 callers: vsock_auto_bind() and vsock_bind().
>
> vsock_auto_bind() is already checking if the socket is already bound,
> if not is setting VMADDR_CID_ANY, so we can skip those checks.
>
> In vsock_bind() we can do the checks before lock_sock(sk), at least the
> checks on vm_addr, calling vsock_find_cid().
>
> I'm preparing a patch to do this.

Hmm, no, this is more related to vsock_linger(), where sk_wait_event()
releases and then re-acquires the sk_lock.
So it should be related to commit 687aa0c5581b ("vsock: Fix
transport_* TOCTOU"), where we hold vsock_register_mutex in
vsock_assign_transport() while calling vsk->transport->release().

So maybe we need to move the release and vsock_deassign_transport()
after unlocking vsock_register_mutex.
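
To make the ordering problem concrete, here is a minimal userspace sketch of
the two paths. It is not kernel code: `order_tracker`, `bind_path()` and
`connect_path()` are hypothetical names, and the tracker is a toy that only
catches direct two-lock inversions, nothing like real lockdep. It assumes a
single task at a time, so one `held[]` array is enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes standing in for sk_lock and vsock_register_mutex. */
enum lock_class { SK_LOCK, REGISTER_MUTEX, NR_CLASSES };

struct order_tracker {
	bool held[NR_CLASSES];             /* locks held by the current task */
	bool edge[NR_CLASSES][NR_CLASSES]; /* edge[a][b]: b was taken while a was held */
};

static void tracker_init(struct order_tracker *t)
{
	memset(t, 0, sizeof(*t));
}

/* Record "held -> c" edges; return true if the reverse edge was seen before. */
static bool tracker_lock(struct order_tracker *t, enum lock_class c)
{
	bool inversion = false;

	for (int a = 0; a < NR_CLASSES; a++) {
		if (!t->held[a] || a == (int)c)
			continue;
		if (t->edge[c][a])
			inversion = true;
		t->edge[a][c] = true;
	}
	t->held[c] = true;
	return inversion;
}

static void tracker_unlock(struct order_tracker *t, enum lock_class c)
{
	assert(t->held[c]);
	t->held[c] = false;
}

/* bind(): sk_lock first, then the register mutex for the CID lookup. */
static bool bind_path(struct order_tracker *t)
{
	bool bad = tracker_lock(t, SK_LOCK);

	bad |= tracker_lock(t, REGISTER_MUTEX);
	tracker_unlock(t, REGISTER_MUTEX);
	tracker_unlock(t, SK_LOCK);
	return bad;
}

/* connect(): release_under_mutex selects the ordering before or after the fix. */
static bool connect_path(struct order_tracker *t, bool release_under_mutex)
{
	bool bad = tracker_lock(t, SK_LOCK);

	bad |= tracker_lock(t, REGISTER_MUTEX);
	if (!release_under_mutex)
		tracker_unlock(t, REGISTER_MUTEX);

	/* Lingering drops sk_lock while waiting and takes it again. */
	tracker_unlock(t, SK_LOCK);
	bad |= tracker_lock(t, SK_LOCK);

	if (release_under_mutex)
		tracker_unlock(t, REGISTER_MUTEX);
	tracker_unlock(t, SK_LOCK);
	return bad;
}
```

With a fresh tracker, `connect_path(&t, true)` reports an inversion because
sk_lock is re-taken while the mutex is held, and `connect_path(&t, false)`
does not.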

Stefano

>
> Stefano
>
>
> >       vsock_find_cid net/vmw_vsock/af_vsock.c:570 [inline]
> >       __vsock_bind+0x1b5/0xa10 net/vmw_vsock/af_vsock.c:752
> >       vsock_bind+0xc6/0x120 net/vmw_vsock/af_vsock.c:1002
> >       __sys_bind_socket net/socket.c:1874 [inline]
> >       __sys_bind_socket net/socket.c:1866 [inline]
> >       __sys_bind+0x1a7/0x260 net/socket.c:1905
> >       __do_sys_bind net/socket.c:1910 [inline]
> >       __se_sys_bind net/socket.c:1908 [inline]
> >       __x64_sys_bind+0x72/0xb0 net/socket.c:1908
> >       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> >       do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
> >       entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >
> >-> #0 (sk_lock-AF_VSOCK){+.+.}-{0:0}:
> >       check_prev_add kernel/locking/lockdep.c:3165 [inline]
> >       check_prevs_add kernel/locking/lockdep.c:3284 [inline]
> >       validate_chain kernel/locking/lockdep.c:3908 [inline]
> >       __lock_acquire+0x126f/0x1c90 kernel/locking/lockdep.c:5237
> >       lock_acquire kernel/locking/lockdep.c:5868 [inline]
> >       lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5825
> >       lock_sock_nested+0x41/0xf0 net/core/sock.c:3720
> >       lock_sock include/net/sock.h:1679 [inline]
> >       vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
> >       virtio_transport_close net/vmw_vsock/virtio_transport_common.c:1271 [inline]
> >       virtio_transport_release+0x52a/0x640 net/vmw_vsock/virtio_transport_common.c:1291
> >       vsock_assign_transport+0x320/0x900 net/vmw_vsock/af_vsock.c:502
> >       vsock_connect+0x201/0xee0 net/vmw_vsock/af_vsock.c:1578
> >       __sys_connect_file+0x141/0x1a0 net/socket.c:2102
> >       __sys_connect+0x13b/0x160 net/socket.c:2121
> >       __do_sys_connect net/socket.c:2127 [inline]
> >       __se_sys_connect net/socket.c:2124 [inline]
> >       __x64_sys_connect+0x72/0xb0 net/socket.c:2124
> >       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> >       do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
> >       entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >
> >other info that might help us debug this:
> >
> > Possible unsafe locking scenario:
> >
> >       CPU0                    CPU1
> >       ----                    ----
> >  lock(vsock_register_mutex);
> >                               lock(sk_lock-AF_VSOCK);
> >                               lock(vsock_register_mutex);
> >  lock(sk_lock-AF_VSOCK);
> >
> > *** DEADLOCK ***
> >
> >1 lock held by syz.0.17/6098:
> > #0: ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469
> >
> >stack backtrace:
> >CPU: 3 UID: 0 PID: 6098 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
> >Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> >Call Trace:
> > <TASK>
> > __dump_stack lib/dump_stack.c:94 [inline]
> > dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
> > print_circular_bug+0x275/0x350 kernel/locking/lockdep.c:2043
> > check_noncircular+0x14c/0x170 kernel/locking/lockdep.c:2175
> > check_prev_add kernel/locking/lockdep.c:3165 [inline]
> > check_prevs_add kernel/locking/lockdep.c:3284 [inline]
> > validate_chain kernel/locking/lockdep.c:3908 [inline]
> > __lock_acquire+0x126f/0x1c90 kernel/locking/lockdep.c:5237
> > lock_acquire kernel/locking/lockdep.c:5868 [inline]
> > lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5825
> > lock_sock_nested+0x41/0xf0 net/core/sock.c:3720
> > lock_sock include/net/sock.h:1679 [inline]
> > vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
> > virtio_transport_close net/vmw_vsock/virtio_transport_common.c:1271 [inline]
> > virtio_transport_release+0x52a/0x640 net/vmw_vsock/virtio_transport_common.c:1291
> > vsock_assign_transport+0x320/0x900 net/vmw_vsock/af_vsock.c:502
> > vsock_connect+0x201/0xee0 net/vmw_vsock/af_vsock.c:1578
> > __sys_connect_file+0x141/0x1a0 net/socket.c:2102
> > __sys_connect+0x13b/0x160 net/socket.c:2121
> > __do_sys_connect net/socket.c:2127 [inline]
> > __se_sys_connect net/socket.c:2124 [inline]
> > __x64_sys_connect+0x72/0xb0 net/socket.c:2124
> > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >RIP: 0033:0x7f767bf8efc9
> >Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
> >RSP: 002b:00007fff0a2857b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
> >RAX: ffffffffffffffda RBX: 00007f767c1e5fa0 RCX: 00007f767bf8efc9
> >RDX: 0000000000000010 RSI: 0000200000000000 RDI: 0000000000000004
> >RBP: 00007f767c011f91 R08: 0000000000000000 R09: 0000000000000000
> >R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> >R13: 00007f767c1e5fa0 R14: 00007f767c1e5fa0 R15: 0000000000000003
> > </TASK>
> >
> >
> >---
> >This report is generated by a bot. It may contain errors.
> >See https://goo.gl/tpsmEJ for more information about syzbot.
> >syzbot engineers can be reached at syzkaller@googlegroups.com.
> >
> >syzbot will keep track of this issue. See:
> >https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> >
> >If the report is already addressed, let syzbot know by replying with:
> >#syz fix: exact-commit-title
> >
> >If you want syzbot to run the reproducer, reply with:
> >#syz test: git://repo/address.git branch-or-commit-hash
> >If you attach or paste a git patch, syzbot will apply it before testing.
> >
> >If you want to overwrite report's subsystems, reply with:
> >#syz set subsystems: new-subsystem
> >(See the list of subsystem names on the web dashboard)
> >
> >If the report is a duplicate of another one, reply with:
> >#syz dup: exact-subject-of-another-report
> >
> >If you want to undo deduplication, reply with:
> >#syz undup
> >


* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
  2025-10-21  0:02 [syzbot] [virt?] [net?] possible deadlock in vsock_linger syzbot
                   ` (2 preceding siblings ...)
  2025-10-21 10:16 ` Stefano Garzarella
@ 2025-10-21 10:59 ` Stefano Garzarella
  2025-10-21 11:19   ` syzbot
  3 siblings, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2025-10-21 10:59 UTC (permalink / raw)
  To: syzbot
  Cc: davem, edumazet, horms, kuba, linux-kernel, netdev, pabeni,
	syzkaller-bugs, virtualization

[-- Attachment #1: Type: text/plain, Size: 694 bytes --]

On Mon, Oct 20, 2025 at 05:02:56PM -0700, syzbot wrote:
>Hello,
>
>syzbot found the following issue on:
>
>HEAD commit:    d9043c79ba68 Merge tag 'sched_urgent_for_v6.18_rc2' of git..
>git tree:       upstream
>console output: https://syzkaller.appspot.com/x/log.txt?x=130983cd980000
>kernel config:  https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
>dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
>compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17f0f52f980000
>C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11ea9734580000

#syz test


[-- Attachment #2: 0001-TODO.patch --]
[-- Type: text/plain, Size: 2286 bytes --]

From 456534cbdbc7312fa1cddfb7aa50b771725c0a53 Mon Sep 17 00:00:00 2001
From: Stefano Garzarella <sgarzare@redhat.com>
Date: Tue, 21 Oct 2025 12:51:45 +0200
Subject: [PATCH] TODO

From: Stefano Garzarella <sgarzare@redhat.com>

---
 net/vmw_vsock/af_vsock.c | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 4c2db6cca557..89b4dbb859a5 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -487,12 +487,26 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 		goto err;
 	}
 
-	if (vsk->transport) {
-		if (vsk->transport == new_transport) {
-			ret = 0;
-			goto err;
-		}
+	if (vsk->transport == new_transport) {
+		ret = 0;
+		goto err;
+	}
 
+	/* We increase the module refcnt to prevent the transport unloading
+	 * while there are open sockets assigned to it.
+	 */
+	if (!new_transport || !try_module_get(new_transport->module)) {
+		ret = -ENODEV;
+		goto err;
+	}
+
+	/* It's safe to release the mutex after a successful try_module_get().
+	 * Whichever transport `new_transport` points at, it won't go away until
+	 * the last module_put() below or in vsock_deassign_transport().
+	 */
+	mutex_unlock(&vsock_register_mutex);
+
+	if (vsk->transport) {
 		/* transport->release() must be called with sock lock acquired.
 		 * This path can only be taken during vsock_connect(), where we
 		 * have already held the sock lock. In the other cases, this
@@ -512,20 +526,6 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 		vsk->peer_shutdown = 0;
 	}
 
-	/* We increase the module refcnt to prevent the transport unloading
-	 * while there are open sockets assigned to it.
-	 */
-	if (!new_transport || !try_module_get(new_transport->module)) {
-		ret = -ENODEV;
-		goto err;
-	}
-
-	/* It's safe to release the mutex after a successful try_module_get().
-	 * Whichever transport `new_transport` points at, it won't go away until
-	 * the last module_put() below or in vsock_deassign_transport().
-	 */
-	mutex_unlock(&vsock_register_mutex);
-
 	if (sk->sk_type == SOCK_SEQPACKET) {
 		if (!new_transport->seqpacket_allow ||
 		    !new_transport->seqpacket_allow(remote_cid)) {
-- 
2.51.0


* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
  2025-10-21 10:59 ` Stefano Garzarella
@ 2025-10-21 11:19   ` syzbot
  0 siblings, 0 replies; 13+ messages in thread
From: syzbot @ 2025-10-21 11:19 UTC (permalink / raw)
  To: davem, edumazet, horms, kuba, linux-kernel, netdev, pabeni,
	sgarzare, syzkaller-bugs, virtualization

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com
Tested-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com

Tested on:

commit:         6548d364 Merge tag 'cgroup-for-6.18-rc2-fixes' of git:..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1534c3cd980000
kernel config:  https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch:          https://syzkaller.appspot.com/x/patch.diff?x=11a90d2f980000

Note: testing is done by a robot and is best-effort only.

* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
  2025-10-21 10:48   ` Stefano Garzarella
@ 2025-10-21 12:19     ` Stefano Garzarella
  2025-11-15 16:00       ` Michal Luczaj
  0 siblings, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2025-10-21 12:19 UTC (permalink / raw)
  To: syzbot, Michal Luczaj
  Cc: davem, edumazet, horms, kuba, linux-kernel, netdev, pabeni,
	syzkaller-bugs, virtualization

On Tue, 21 Oct 2025 at 12:48, Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Tue, 21 Oct 2025 at 10:27, Stefano Garzarella <sgarzare@redhat.com> wrote:
> >
> > Hi Michal,
> >
> > On Mon, Oct 20, 2025 at 05:02:56PM -0700, syzbot wrote:
> > >Hello,
> > >
> > >syzbot found the following issue on:
> > >
> > >HEAD commit:    d9043c79ba68 Merge tag 'sched_urgent_for_v6.18_rc2' of git..
> > >git tree:       upstream
> > >console output: https://syzkaller.appspot.com/x/log.txt?x=130983cd980000
> > >kernel config:  https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
> > >dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
> > >compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> > >syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17f0f52f980000
> > >C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11ea9734580000
> > >
> > >Downloadable assets:
> > >disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-d9043c79.raw.xz
> > >vmlinux: https://storage.googleapis.com/syzbot-assets/0546b6eaf1aa/vmlinux-d9043c79.xz
> > >kernel image: https://storage.googleapis.com/syzbot-assets/81285b4ada51/bzImage-d9043c79.xz
> > >
> > >IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > >Reported-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com
> > >
> > >======================================================
> > >WARNING: possible circular locking dependency detected
> > >syzkaller #0 Not tainted
> > >------------------------------------------------------
> > >syz.0.17/6098 is trying to acquire lock:
> > >ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1679 [inline]
> > >ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
> >
> > Could this be related to our recent work on linger in vsock?
> >
> > >
> > >but task is already holding lock:
> > >ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469
> > >
> > >which lock already depends on the new lock.
> > >
> > >
> > >the existing dependency chain (in reverse order) is:
> > >
> > >-> #1 (vsock_register_mutex){+.+.}-{4:4}:
> > >       __mutex_lock_common kernel/locking/mutex.c:598 [inline]
> > >       __mutex_lock+0x193/0x1060 kernel/locking/mutex.c:760
> > >       vsock_registered_transport_cid net/vmw_vsock/af_vsock.c:560 [inline]
> >
> > Ah, no maybe this is related to commit 209fd720838a ("vsock:
> > Fix transport_{g2h,h2g} TOCTOU") where we added locking in
> > vsock_find_cid().
> >
> > Maybe we can just move the checks on top of __vsock_bind() to the
> > caller. I mean:
> >
> >         /* First ensure this socket isn't already bound. */
> >         if (vsock_addr_bound(&vsk->local_addr))
> >                 return -EINVAL;
> >
> >         /* Now bind to the provided address or select appropriate values if
> >          * none are provided (VMADDR_CID_ANY and VMADDR_PORT_ANY).  Note that
> >          * like AF_INET prevents binding to a non-local IP address (in most
> >          * cases), we only allow binding to a local CID.
> >          */
> >         if (addr->svm_cid != VMADDR_CID_ANY && !vsock_find_cid(addr->svm_cid))
> >                 return -EADDRNOTAVAIL;
> >
> > We have 2 callers: vsock_auto_bind() and vsock_bind().
> >
> > vsock_auto_bind() is already checking if the socket is already bound,
> > if not is setting VMADDR_CID_ANY, so we can skip those checks.
> >
> > In vsock_bind() we can do the checks before lock_sock(sk), at least the
> > checks on vm_addr, calling vsock_find_cid().
> >
> > I'm preparing a patch to do this.
>
> mmm, no, this is more related to vsock_linger() where sk_wait_event()
> releases and locks again the sk_lock.
> So, it should be related to commit 687aa0c5581b ("vsock: Fix
> transport_* TOCTOU") where we take vsock_register_mutex in
> vsock_assign_transport() while calling vsk->transport->release().
>
> So, maybe we need to move the release and vsock_deassign_transport()
> after unlocking vsock_register_mutex.

I implemented this here:
https://lore.kernel.org/netdev/20251021121718.137668-1-sgarzare@redhat.com/

syzbot successfully tested it.

Stefano


* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
  2025-10-21 12:19     ` Stefano Garzarella
@ 2025-11-15 16:00       ` Michal Luczaj
  2025-11-17  9:57         ` Stefano Garzarella
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Luczaj @ 2025-11-15 16:00 UTC (permalink / raw)
  To: Stefano Garzarella, syzbot
  Cc: davem, edumazet, horms, kuba, linux-kernel, netdev, pabeni,
	syzkaller-bugs, virtualization

On 10/21/25 14:19, Stefano Garzarella wrote:
> On Tue, 21 Oct 2025 at 12:48, Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> On Tue, 21 Oct 2025 at 10:27, Stefano Garzarella <sgarzare@redhat.com> wrote:
>>>
>>> Hi Michal,
>>>
>>> On Mon, Oct 20, 2025 at 05:02:56PM -0700, syzbot wrote:
>>>> Hello,
>>>>
>>>> syzbot found the following issue on:
>>>>
>>>> HEAD commit:    d9043c79ba68 Merge tag 'sched_urgent_for_v6.18_rc2' of git..
>>>> git tree:       upstream
>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=130983cd980000
>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
>>>> compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17f0f52f980000
>>>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11ea9734580000
>>>>
>>>> Downloadable assets:
>>>> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-d9043c79.raw.xz
>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/0546b6eaf1aa/vmlinux-d9043c79.xz
>>>> kernel image: https://storage.googleapis.com/syzbot-assets/81285b4ada51/bzImage-d9043c79.xz
>>>>
>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>> Reported-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com
>>>>
>>>> ======================================================
>>>> WARNING: possible circular locking dependency detected
>>>> syzkaller #0 Not tainted
>>>> ------------------------------------------------------
>>>> syz.0.17/6098 is trying to acquire lock:
>>>> ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1679 [inline]
>>>> ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
>>>
>>> Could this be related to our recent work on linger in vsock?
>>>
>>>>
>>>> but task is already holding lock:
>>>> ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469
>>>>
>>>> which lock already depends on the new lock.
>>>>
>>>>
>>>> the existing dependency chain (in reverse order) is:
>>>>
>>>> -> #1 (vsock_register_mutex){+.+.}-{4:4}:
>>>>       __mutex_lock_common kernel/locking/mutex.c:598 [inline]
>>>>       __mutex_lock+0x193/0x1060 kernel/locking/mutex.c:760
>>>>       vsock_registered_transport_cid net/vmw_vsock/af_vsock.c:560 [inline]
>>>
>>> Ah, no maybe this is related to commit 209fd720838a ("vsock:
>>> Fix transport_{g2h,h2g} TOCTOU") where we added locking in
>>> vsock_find_cid().
>>>
>>> Maybe we can just move the checks on top of __vsock_bind() to the
>>> caller. I mean:
>>>
>>>         /* First ensure this socket isn't already bound. */
>>>         if (vsock_addr_bound(&vsk->local_addr))
>>>                 return -EINVAL;
>>>
>>>         /* Now bind to the provided address or select appropriate values if
>>>          * none are provided (VMADDR_CID_ANY and VMADDR_PORT_ANY).  Note that
>>>          * like AF_INET prevents binding to a non-local IP address (in most
>>>          * cases), we only allow binding to a local CID.
>>>          */
>>>         if (addr->svm_cid != VMADDR_CID_ANY && !vsock_find_cid(addr->svm_cid))
>>>                 return -EADDRNOTAVAIL;
>>>
>>> We have 2 callers: vsock_auto_bind() and vsock_bind().
>>>
>>> vsock_auto_bind() is already checking if the socket is already bound,
>>> if not is setting VMADDR_CID_ANY, so we can skip those checks.
>>>
>>> In vsock_bind() we can do the checks before lock_sock(sk), at least the
>>> checks on vm_addr, calling vsock_find_cid().
>>>
>>> I'm preparing a patch to do this.
>>
>> mmm, no, this is more related to vsock_linger() where sk_wait_event()
>> releases and locks again the sk_lock.
>> So, it should be related to commit 687aa0c5581b ("vsock: Fix
>> transport_* TOCTOU") where we take vsock_register_mutex in
>> vsock_assign_transport() while calling vsk->transport->release().
>>
>> So, maybe we need to move the release and vsock_deassign_transport()
>> after unlocking vsock_register_mutex.
> 
> I implemented this here:
> https://lore.kernel.org/netdev/20251021121718.137668-1-sgarzare@redhat.com/
> 
> sysbot successfully tested it.
> 
> Stefano

Hi Stefano

Apologies for missing this, I was away for a couple of weeks.

Turns out it's vsock_connect()'s reset-on-signal that strikes again. While
you've fixed the lock order inversion (thank you), being able to reset an
established socket, combined with SO_LINGER's lock-release-lock dance,
still leads to crashes.

I think it goes like this: if the user interrupts connect() with a signal
right after the connection is established (which implies an assigned
transport), `sk_state` gets set to TCP_CLOSING and `state` to SS_UNCONNECTED.
SS_UNCONNECTED means connect() can be retried. If the re-connect() is for a
different CID, transport reassignment takes place. That involves
transport->release() of the old transport. Because `sk_state == TCP_CLOSING`,
vsock_linger() is called. Lingering temporarily releases the socket lock,
which lets another thread doing connect() race in. Basically, thread-1 can
release resources from under thread-0. That breaks assumptions elsewhere, e.g.
virtio_transport_unsent_bytes() does not expect a disappearing transport.

BUG: KASAN: slab-use-after-free in _raw_spin_lock_bh+0x34/0x40
Read of size 1 at addr ffff888107c99420 by task a.out/1385
CPU: 6 UID: 1000 PID: 1385 Comm: a.out Tainted: G            E       6.18.0-rc5+ #241 PREEMPT(voluntary)
Call Trace:
 dump_stack_lvl+0x7e/0xc0
 print_report+0x170/0x4de
 kasan_report+0xc2/0x180
 __kasan_check_byte+0x3a/0x50
 lock_acquire+0xb2/0x300
 _raw_spin_lock_bh+0x34/0x40
 virtio_transport_unsent_bytes+0x3b/0x80
 vsock_linger+0x263/0x370
 virtio_transport_release+0x3ff/0x510
 vsock_assign_transport+0x358/0x780
 vsock_connect+0x5a2/0xc40
 __sys_connect+0xde/0x110
 __x64_sys_connect+0x6e/0xc0
 do_syscall_64+0x94/0xbb0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

Allocated by task 1384:
 kasan_save_stack+0x1c/0x40
 kasan_save_track+0x10/0x30
 __kasan_kmalloc+0x92/0xa0
 virtio_transport_do_socket_init+0x48/0x320
 vsock_assign_transport+0x4ff/0x780
 vsock_connect+0x5a2/0xc40
 __sys_connect+0xde/0x110
 __x64_sys_connect+0x6e/0xc0
 do_syscall_64+0x94/0xbb0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

Freed by task 1384:
 kasan_save_stack+0x1c/0x40
 kasan_save_track+0x10/0x30
 __kasan_save_free_info+0x37/0x50
 __kasan_slab_free+0x63/0x80
 kfree+0x142/0x6a0
 virtio_transport_destruct+0x86/0x170
 vsock_assign_transport+0x3a8/0x780
 vsock_connect+0x5a2/0xc40
 __sys_connect+0xde/0x110
 __x64_sys_connect+0x6e/0xc0
 do_syscall_64+0x94/0xbb0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
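
The window described above can be modelled with a small deterministic
userspace sketch. Everything here is hypothetical (`toy_sock`, `toy_linger()`,
the `racer_*` callbacks); the "other thread" is simulated by a callback that
runs while the lock is dropped, and the freed state is represented by a NULL
pointer, so the model itself never touches freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket with per-transport private state. */
struct toy_sock {
	bool locked; /* models the socket lock */
	int *trans;  /* models the transport's per-socket state */
};

/* What a second thread may do while the lock is dropped. */
typedef void (*racer_fn)(struct toy_sock *sk);

static void racer_none(struct toy_sock *sk)
{
	(void)sk;
}

/* A second connect(): takes the lock, tears down the transport state. */
static void racer_reassign(struct toy_sock *sk)
{
	if (sk->locked)
		return; /* cannot get in while the lock is held */
	sk->locked = true;
	sk->trans = NULL;
	sk->locked = false;
}

/*
 * Models the lock-release-lock dance of lingering. Returns true if the
 * transport state seen before the wait is still the one seen after it.
 */
static bool toy_linger(struct toy_sock *sk, racer_fn racer)
{
	int *before = sk->trans;

	assert(sk->locked); /* the caller must hold the lock */
	sk->locked = false; /* dropped while waiting */
	racer(sk);          /* anything can happen here */
	sk->locked = true;  /* re-taken before returning */

	return sk->trans == before;
}
```

In this model, lingering with `racer_none` leaves the state untouched, while
`racer_reassign` changes it behind the waiter's back. That is the situation a
caller dereferencing the old state after the wait is not prepared for.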

I suppose there are many ways this chain of events can be stopped, but I
see it as yet another reason to simplify vsock_connect(): do not let it
"reset" an already established socket. I guess that would do the trick.
What do you think?

Thanks,
Michal

* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
  2025-11-15 16:00       ` Michal Luczaj
@ 2025-11-17  9:57         ` Stefano Garzarella
  2025-11-17 21:00           ` Michal Luczaj
  0 siblings, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2025-11-17  9:57 UTC (permalink / raw)
  To: Michal Luczaj
  Cc: syzbot, davem, edumazet, horms, kuba, linux-kernel, netdev,
	pabeni, syzkaller-bugs, virtualization

On Sat, Nov 15, 2025 at 05:00:28PM +0100, Michal Luczaj wrote:
>On 10/21/25 14:19, Stefano Garzarella wrote:
>> On Tue, 21 Oct 2025 at 12:48, Stefano Garzarella <sgarzare@redhat.com> wrote:
>>>
>>> On Tue, 21 Oct 2025 at 10:27, Stefano Garzarella <sgarzare@redhat.com> wrote:
>>>>
>>>> Hi Michal,
>>>>
>>>> On Mon, Oct 20, 2025 at 05:02:56PM -0700, syzbot wrote:
>>>>> Hello,
>>>>>
>>>>> syzbot found the following issue on:
>>>>>
>>>>> HEAD commit:    d9043c79ba68 Merge tag 'sched_urgent_for_v6.18_rc2' of git..
>>>>> git tree:       upstream
>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=130983cd980000
>>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
>>>>> compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>>>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17f0f52f980000
>>>>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11ea9734580000
>>>>>
>>>>> Downloadable assets:
>>>>> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-d9043c79.raw.xz
>>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/0546b6eaf1aa/vmlinux-d9043c79.xz
>>>>> kernel image: https://storage.googleapis.com/syzbot-assets/81285b4ada51/bzImage-d9043c79.xz
>>>>>
>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>>> Reported-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com
>>>>>
>>>>> ======================================================
>>>>> WARNING: possible circular locking dependency detected
>>>>> syzkaller #0 Not tainted
>>>>> ------------------------------------------------------
>>>>> syz.0.17/6098 is trying to acquire lock:
>>>>> ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1679 [inline]
>>>>> ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
>>>>
>>>> Could this be related to our recent work on linger in vsock?
>>>>
>>>>>
>>>>> but task is already holding lock:
>>>>> ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469
>>>>>
>>>>> which lock already depends on the new lock.
>>>>>
>>>>>
>>>>> the existing dependency chain (in reverse order) is:
>>>>>
>>>>> -> #1 (vsock_register_mutex){+.+.}-{4:4}:
>>>>>       __mutex_lock_common kernel/locking/mutex.c:598 [inline]
>>>>>       __mutex_lock+0x193/0x1060 kernel/locking/mutex.c:760
>>>>>       vsock_registered_transport_cid net/vmw_vsock/af_vsock.c:560 [inline]
>>>>
>>>> Ah, no maybe this is related to commit 209fd720838a ("vsock:
>>>> Fix transport_{g2h,h2g} TOCTOU") where we added locking in
>>>> vsock_find_cid().
>>>>
>>>> Maybe we can just move the checks on top of __vsock_bind() to the
>>>> caller. I mean:
>>>>
>>>>         /* First ensure this socket isn't already bound. */
>>>>         if (vsock_addr_bound(&vsk->local_addr))
>>>>                 return -EINVAL;
>>>>
>>>>         /* Now bind to the provided address or select appropriate values if
>>>>          * none are provided (VMADDR_CID_ANY and VMADDR_PORT_ANY).  Note that
>>>>          * like AF_INET prevents binding to a non-local IP address (in most
>>>>          * cases), we only allow binding to a local CID.
>>>>          */
>>>>         if (addr->svm_cid != VMADDR_CID_ANY && !vsock_find_cid(addr->svm_cid))
>>>>                 return -EADDRNOTAVAIL;
>>>>
>>>> We have 2 callers: vsock_auto_bind() and vsock_bind().
>>>>
>>>> vsock_auto_bind() already checks whether the socket is bound and,
>>>> if not, sets VMADDR_CID_ANY, so we can skip those checks.
>>>>
>>>> In vsock_bind() we can do the checks before lock_sock(sk), at least the
>>>> checks on vm_addr, calling vsock_find_cid().
>>>>
>>>> I'm preparing a patch to do this.
>>>
>>> mmm, no, this is more related to vsock_linger() where sk_wait_event()
>>> releases and re-acquires the sk_lock.
>>> So, it should be related to commit 687aa0c5581b ("vsock: Fix
>>> transport_* TOCTOU") where we take vsock_register_mutex in
>>> vsock_assign_transport() while calling vsk->transport->release().
>>>
>>> So, maybe we need to move the release and vsock_deassign_transport()
>>> after unlocking vsock_register_mutex.
>>
>> I implemented this here:
>> https://lore.kernel.org/netdev/20251021121718.137668-1-sgarzare@redhat.com/
>>
>> syzbot successfully tested it.
>>
>> Stefano
>
>Hi Stefano

Hi Michal!

>
>Apologies for missing this, I was away for a couple of weeks.

Don't worry at all!

>
>Turns out it's vsock_connect()'s reset-on-signal that strikes again. While
>you've fixed the lock order inversion (thank you), being able to reset an
>established socket, combined with SO_LINGER's lock-release-lock dance,
>still leads to crashes.

Yeah, I see!

>
>I think it goes like this: if user hits connect() with a signal right after
>connection is established (which implies an assigned transport), `sk_state`
>gets set to TCP_CLOSING and `state` to SS_UNCONNECTED. SS_UNCONNECTED means
>connect() can be retried. If re-connect() is for a different CID, transport
>reassignment takes place. That involves transport->release() of the old
>transport. Because `sk_state == TCP_CLOSING`, vsock_linger() is called.
>Lingering temporarily releases the socket lock, which another thread doing
>connect() can race with. Basically thread-1 can release resources from under
>thread-0. That breaks assumptions, e.g. virtio_transport_unsent_bytes()
>does not expect a disappearing transport.

Makes sense to me!
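
To make the hazard concrete, here is a minimal userspace sketch in C. It is
only a model: model_sock, model_transport, model_wait(), model_reassign() and
model_linger() are hypothetical names, not kernel APIs, and the second
thread's connect() is simulated by a callback that runs while the lock is
dropped. The re-check after re-taking the lock is a generic mitigation shown
for illustration; it is not the fix discussed in this thread, which is to stop
connect() from resetting an established socket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_transport {
	int unsent_bytes;
};

struct model_sock {
	bool locked;                       /* stands in for sk_lock */
	struct model_transport *transport; /* stands in for per-socket transport state */
};

/* Runs while the lock is dropped; models another thread's connect(). */
typedef void (*race_fn)(struct model_sock *sk);

/* Models transport reassignment freeing the old transport state. */
static void model_reassign(struct model_sock *sk)
{
	free(sk->transport);
	sk->transport = NULL;
}

/* Models the wait: drop the lock, let the other party run, re-take it. */
static void model_wait(struct model_sock *sk, race_fn other)
{
	sk->locked = false;
	if (other)
		other(sk);
	sk->locked = true;
}

/*
 * Returns the unsent byte count, or -1 if the transport went away while
 * the lock was dropped. Without the NULL check, the dereference below
 * would touch freed state, which is the shape of the KASAN report.
 */
static int model_linger(struct model_sock *sk, race_fn other)
{
	assert(sk->locked); /* caller must hold the lock on entry */

	model_wait(sk, other);

	/* State observed before the wait may be stale: revalidate. */
	if (!sk->transport)
		return -1;

	return sk->transport->unsent_bytes;
}
```

Calling model_linger(&sk, NULL) models an uncontended linger, while
model_linger(&sk, model_reassign) models thread-1 tearing down the transport
in the window where thread-0 has dropped the lock.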

>
>BUG: KASAN: slab-use-after-free in _raw_spin_lock_bh+0x34/0x40
>Read of size 1 at addr ffff888107c99420 by task a.out/1385
>CPU: 6 UID: 1000 PID: 1385 Comm: a.out Tainted: G            E 6.18.0-rc5+ #241 PREEMPT(voluntary)
>Call Trace:
> dump_stack_lvl+0x7e/0xc0
> print_report+0x170/0x4de
> kasan_report+0xc2/0x180
> __kasan_check_byte+0x3a/0x50
> lock_acquire+0xb2/0x300
> _raw_spin_lock_bh+0x34/0x40
> virtio_transport_unsent_bytes+0x3b/0x80
> vsock_linger+0x263/0x370
> virtio_transport_release+0x3ff/0x510
> vsock_assign_transport+0x358/0x780
> vsock_connect+0x5a2/0xc40
> __sys_connect+0xde/0x110
> __x64_sys_connect+0x6e/0xc0
> do_syscall_64+0x94/0xbb0
> entry_SYSCALL_64_after_hwframe+0x4b/0x53
>
>Allocated by task 1384:
> kasan_save_stack+0x1c/0x40
> kasan_save_track+0x10/0x30
> __kasan_kmalloc+0x92/0xa0
> virtio_transport_do_socket_init+0x48/0x320
> vsock_assign_transport+0x4ff/0x780
> vsock_connect+0x5a2/0xc40
> __sys_connect+0xde/0x110
> __x64_sys_connect+0x6e/0xc0
> do_syscall_64+0x94/0xbb0
> entry_SYSCALL_64_after_hwframe+0x4b/0x53
>
>Freed by task 1384:
> kasan_save_stack+0x1c/0x40
> kasan_save_track+0x10/0x30
> __kasan_save_free_info+0x37/0x50
> __kasan_slab_free+0x63/0x80
> kfree+0x142/0x6a0
> virtio_transport_destruct+0x86/0x170
> vsock_assign_transport+0x3a8/0x780
> vsock_connect+0x5a2/0xc40
> __sys_connect+0xde/0x110
> __x64_sys_connect+0x6e/0xc0
> do_syscall_64+0x94/0xbb0
> entry_SYSCALL_64_after_hwframe+0x4b/0x53
>
>I suppose there are many ways this chain of events can be stopped, but I
>see it as yet another reason to simplify vsock_connect(): do not let it
>"reset" an already established socket. I guess that would do the trick.
>What do you think?

I agree, we should do that. Do you have time to take a look?

Thanks for the help!
Stefano



* Re: [syzbot] [virt?] [net?] possible deadlock in vsock_linger
  2025-11-17  9:57         ` Stefano Garzarella
@ 2025-11-17 21:00           ` Michal Luczaj
  0 siblings, 0 replies; 13+ messages in thread
From: Michal Luczaj @ 2025-11-17 21:00 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: syzbot, davem, edumazet, horms, kuba, linux-kernel, netdev,
	pabeni, syzkaller-bugs, virtualization

On 11/17/25 10:57, Stefano Garzarella wrote:
> On Sat, Nov 15, 2025 at 05:00:28PM +0100, Michal Luczaj wrote:
>> On 10/21/25 14:19, Stefano Garzarella wrote:
>>> On Tue, 21 Oct 2025 at 12:48, Stefano Garzarella <sgarzare@redhat.com> wrote:
>>>>
>>>> On Tue, 21 Oct 2025 at 10:27, Stefano Garzarella <sgarzare@redhat.com> wrote:
>>>>>
>>>>> Hi Michal,
>>>>>
>>>>> On Mon, Oct 20, 2025 at 05:02:56PM -0700, syzbot wrote:
>>>>>> Hello,
>>>>>>
>>>>>> syzbot found the following issue on:
>>>>>>
>>>>>> HEAD commit:    d9043c79ba68 Merge tag 'sched_urgent_for_v6.18_rc2' of git..
>>>>>> git tree:       upstream
>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=130983cd980000
>>>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=f3e7b5a3627a90dd
>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=10e35716f8e4929681fa
>>>>>> compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>>>>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17f0f52f980000
>>>>>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11ea9734580000
>>>>>>
>>>>>> Downloadable assets:
>>>>>> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-d9043c79.raw.xz
>>>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/0546b6eaf1aa/vmlinux-d9043c79.xz
>>>>>> kernel image: https://storage.googleapis.com/syzbot-assets/81285b4ada51/bzImage-d9043c79.xz
>>>>>>
>>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>>>> Reported-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com
>>>>>>
>>>>>> ======================================================
>>>>>> WARNING: possible circular locking dependency detected
>>>>>> syzkaller #0 Not tainted
>>>>>> ------------------------------------------------------
>>>>>> syz.0.17/6098 is trying to acquire lock:
>>>>>> ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1679 [inline]
>>>>>> ffff8880363b8258 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_linger+0x25e/0x4d0 net/vmw_vsock/af_vsock.c:1066
>>>>>
>>>>> Could this be related to our recent work on linger in vsock?
>>>>>
>>>>>>
>>>>>> but task is already holding lock:
>>>>>> ffffffff906260a8 (vsock_register_mutex){+.+.}-{4:4}, at: vsock_assign_transport+0xf2/0x900 net/vmw_vsock/af_vsock.c:469
>>>>>>
>>>>>> which lock already depends on the new lock.
>>>>>>
>>>>>>
>>>>>> the existing dependency chain (in reverse order) is:
>>>>>>
>>>>>> -> #1 (vsock_register_mutex){+.+.}-{4:4}:
>>>>>>       __mutex_lock_common kernel/locking/mutex.c:598 [inline]
>>>>>>       __mutex_lock+0x193/0x1060 kernel/locking/mutex.c:760
>>>>>>       vsock_registered_transport_cid net/vmw_vsock/af_vsock.c:560 [inline]
>>>>>
>>>>> Ah, no maybe this is related to commit 209fd720838a ("vsock:
>>>>> Fix transport_{g2h,h2g} TOCTOU") where we added locking in
>>>>> vsock_find_cid().
>>>>>
>>>>> Maybe we can just move the checks on top of __vsock_bind() to the
>>>>> caller. I mean:
>>>>>
>>>>>         /* First ensure this socket isn't already bound. */
>>>>>         if (vsock_addr_bound(&vsk->local_addr))
>>>>>                 return -EINVAL;
>>>>>
>>>>>         /* Now bind to the provided address or select appropriate values if
>>>>>          * none are provided (VMADDR_CID_ANY and VMADDR_PORT_ANY).  Note that
>>>>>          * like AF_INET prevents binding to a non-local IP address (in most
>>>>>          * cases), we only allow binding to a local CID.
>>>>>          */
>>>>>         if (addr->svm_cid != VMADDR_CID_ANY && !vsock_find_cid(addr->svm_cid))
>>>>>                 return -EADDRNOTAVAIL;
>>>>>
>>>>> We have 2 callers: vsock_auto_bind() and vsock_bind().
>>>>>
>>>>> vsock_auto_bind() already checks whether the socket is bound and,
>>>>> if not, sets VMADDR_CID_ANY, so we can skip those checks.
>>>>>
>>>>> In vsock_bind() we can do the checks before lock_sock(sk), at least the
>>>>> checks on vm_addr, calling vsock_find_cid().
>>>>>
>>>>> I'm preparing a patch to do this.
>>>>
>>>> mmm, no, this is more related to vsock_linger() where sk_wait_event()
>>>> releases and re-acquires the sk_lock.
>>>> So, it should be related to commit 687aa0c5581b ("vsock: Fix
>>>> transport_* TOCTOU") where we take vsock_register_mutex in
>>>> vsock_assign_transport() while calling vsk->transport->release().
>>>>
>>>> So, maybe we need to move the release and vsock_deassign_transport()
>>>> after unlocking vsock_register_mutex.
>>>
>>> I implemented this here:
>>> https://lore.kernel.org/netdev/20251021121718.137668-1-sgarzare@redhat.com/
>>>
>>> syzbot successfully tested it.
>>>
>>> Stefano
>>
>> Hi Stefano
> 
> Hi Michal!
> 
>>
>> Apologies for missing this, I was away for a couple of weeks.
> 
> Don't worry at all!
> 
>>
>> Turns out it's vsock_connect()'s reset-on-signal that strikes again. While
>> you've fixed the lock order inversion (thank you), being able to reset an
>> established socket, combined with SO_LINGER's lock-release-lock dance,
>> still leads to crashes.
> 
> Yeah, I see!
> 
>>
>> I think it goes like this: if user hits connect() with a signal right after
>> connection is established (which implies an assigned transport), `sk_state`
>> gets set to TCP_CLOSING and `state` to SS_UNCONNECTED. SS_UNCONNECTED means
>> connect() can be retried. If re-connect() is for a different CID, transport
>> reassignment takes place. That involves transport->release() of the old
>> transport. Because `sk_state == TCP_CLOSING`, vsock_linger() is called.
>> Lingering temporarily releases the socket lock, which another thread doing
>> connect() can race with. Basically thread-1 can release resources from under
>> thread-0. That breaks assumptions, e.g. virtio_transport_unsent_bytes()
>> does not expect a disappearing transport.
> 
> Makes sense to me!
> 
>>
>> BUG: KASAN: slab-use-after-free in _raw_spin_lock_bh+0x34/0x40
>> Read of size 1 at addr ffff888107c99420 by task a.out/1385
>> CPU: 6 UID: 1000 PID: 1385 Comm: a.out Tainted: G            E 6.18.0-rc5+ #241 PREEMPT(voluntary)
>> Call Trace:
>> dump_stack_lvl+0x7e/0xc0
>> print_report+0x170/0x4de
>> kasan_report+0xc2/0x180
>> __kasan_check_byte+0x3a/0x50
>> lock_acquire+0xb2/0x300
>> _raw_spin_lock_bh+0x34/0x40
>> virtio_transport_unsent_bytes+0x3b/0x80
>> vsock_linger+0x263/0x370
>> virtio_transport_release+0x3ff/0x510
>> vsock_assign_transport+0x358/0x780
>> vsock_connect+0x5a2/0xc40
>> __sys_connect+0xde/0x110
>> __x64_sys_connect+0x6e/0xc0
>> do_syscall_64+0x94/0xbb0
>> entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>
>> Allocated by task 1384:
>> kasan_save_stack+0x1c/0x40
>> kasan_save_track+0x10/0x30
>> __kasan_kmalloc+0x92/0xa0
>> virtio_transport_do_socket_init+0x48/0x320
>> vsock_assign_transport+0x4ff/0x780
>> vsock_connect+0x5a2/0xc40
>> __sys_connect+0xde/0x110
>> __x64_sys_connect+0x6e/0xc0
>> do_syscall_64+0x94/0xbb0
>> entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>
>> Freed by task 1384:
>> kasan_save_stack+0x1c/0x40
>> kasan_save_track+0x10/0x30
>> __kasan_save_free_info+0x37/0x50
>> __kasan_slab_free+0x63/0x80
>> kfree+0x142/0x6a0
>> virtio_transport_destruct+0x86/0x170
>> vsock_assign_transport+0x3a8/0x780
>> vsock_connect+0x5a2/0xc40
>> __sys_connect+0xde/0x110
>> __x64_sys_connect+0x6e/0xc0
>> do_syscall_64+0x94/0xbb0
>> entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>
>> I suppose there are many ways this chain of events can be stopped, but I
>> see it as yet another reason to simplify vsock_connect(): do not let it
>> "reset" an already established socket. I guess that would do the trick.
>> What do you think?
> 
> I agree, we should do that. Do you have time to take a look?

Sure, here's a patch:
https://lore.kernel.org/netdev/20251117-vsock-interrupted-connect-v1-1-bc021e907c3f@rbox.co/

Michal



end of thread (newest message: 2025-11-17 21:00 UTC)

Thread overview: 13+ messages
2025-10-21  0:02 [syzbot] [virt?] [net?] possible deadlock in vsock_linger syzbot
2025-10-21  8:27 ` Stefano Garzarella
2025-10-21 10:48   ` Stefano Garzarella
2025-10-21 12:19     ` Stefano Garzarella
2025-11-15 16:00       ` Michal Luczaj
2025-11-17  9:57         ` Stefano Garzarella
2025-11-17 21:00           ` Michal Luczaj
2025-10-21 10:09 ` Stefano Garzarella
2025-10-21 10:11   ` syzbot
2025-10-21 10:16 ` Stefano Garzarella
2025-10-21 10:30   ` syzbot
2025-10-21 10:59 ` Stefano Garzarella
2025-10-21 11:19   ` syzbot
