* [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue @ 2023-05-30 7:30 syzbot 2023-05-30 11:24 ` Michael S. Tsirkin 0 siblings, 1 reply; 16+ messages in thread From: syzbot @ 2023-05-30 7:30 UTC (permalink / raw) To: jasowang, kvm, linux-kernel, mst, netdev, syzkaller-bugs, virtualization Hello, syzbot found the following issue on: HEAD commit: 933174ae28ba Merge tag 'spi-fix-v6.4-rc3' of git://git.ker.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=138d4ae5280000 kernel config: https://syzkaller.appspot.com/x/.config?x=f389ffdf4e9ba3f0 dashboard link: https://syzkaller.appspot.com/bug?extid=d0d442c22fa8db45ff0e compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2 Unfortunately, I don't have any reproducer for this issue yet. Downloadable assets: disk image: https://storage.googleapis.com/syzbot-assets/21a81b8c2660/disk-933174ae.raw.xz vmlinux: https://storage.googleapis.com/syzbot-assets/b4951d89e238/vmlinux-933174ae.xz kernel image: https://storage.googleapis.com/syzbot-assets/21eb405303cc/bzImage-933174ae.xz IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+d0d442c22fa8db45ff0e@syzkaller.appspotmail.com general protection fault, probably for non-canonical address 0xdffffc000000000e: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077] CPU: 0 PID: 29845 Comm: syz-executor.4 Not tainted 6.4.0-rc3-syzkaller-00032-g933174ae28ba #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/16/2023 RIP: 0010:vhost_work_queue drivers/vhost/vhost.c:259 [inline] RIP: 0010:vhost_work_queue+0xfc/0x150 drivers/vhost/vhost.c:248 Code: 00 00 fc ff df 48 89 da 48 c1 ea 03 80 3c 02 00 75 56 48 b8 00 00 00 00 00 fc ff df 48 8b 1b 48 8d 7b 70 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 42 48 8b 7b 70 e8 95 9e ae f9 5b 5d 41 5c 41 5d e9 RSP: 0018:ffffc9000333faf8 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc9000d84d000 RDX: 000000000000000e RSI: ffffffff841221d7 RDI: 0000000000000070 RBP: ffff88804b6b95b0 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88804b6b00b0 R13: 0000000000000000 R14: ffff88804b6b95e0 R15: ffff88804b6b95c8 FS: 00007f3b445ec700(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2e423000 CR3: 000000005d734000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 000000000000003b DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: <TASK> vhost_transport_send_pkt+0x268/0x520 drivers/vhost/vsock.c:288 virtio_transport_send_pkt_info+0x54c/0x820 net/vmw_vsock/virtio_transport_common.c:250 virtio_transport_connect+0xb1/0xf0 net/vmw_vsock/virtio_transport_common.c:813 vsock_connect+0x37f/0xcd0 net/vmw_vsock/af_vsock.c:1414 __sys_connect_file+0x153/0x1a0 net/socket.c:2003 __sys_connect+0x165/0x1a0 net/socket.c:2020 __do_sys_connect net/socket.c:2030 [inline] __se_sys_connect net/socket.c:2027 [inline] __x64_sys_connect+0x73/0xb0 net/socket.c:2027 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f3b4388c169 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 
002b:00007f3b445ec168 EFLAGS: 00000246 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 00007f3b439ac050 RCX: 00007f3b4388c169 RDX: 0000000000000010 RSI: 0000000020000140 RDI: 0000000000000004 RBP: 00007f3b438e7ca1 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007f3b43acfb1f R14: 00007f3b445ec300 R15: 0000000000022000 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:vhost_work_queue drivers/vhost/vhost.c:259 [inline] RIP: 0010:vhost_work_queue+0xfc/0x150 drivers/vhost/vhost.c:248 Code: 00 00 fc ff df 48 89 da 48 c1 ea 03 80 3c 02 00 75 56 48 b8 00 00 00 00 00 fc ff df 48 8b 1b 48 8d 7b 70 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 42 48 8b 7b 70 e8 95 9e ae f9 5b 5d 41 5c 41 5d e9 RSP: 0018:ffffc9000333faf8 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc9000d84d000 RDX: 000000000000000e RSI: ffffffff841221d7 RDI: 0000000000000070 RBP: ffff88804b6b95b0 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88804b6b00b0 R13: 0000000000000000 R14: ffff88804b6b95e0 R15: ffff88804b6b95c8 FS: 00007f3b445ec700(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2e428000 CR3: 000000005d734000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 000000000000003b DR6: 00000000ffff0ff0 DR7: 0000000000000400 ---------------- Code disassembly (best guess), 5 bytes skipped: 0: 48 89 da mov %rbx,%rdx 3: 48 c1 ea 03 shr $0x3,%rdx 7: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) b: 75 56 jne 0x63 d: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax 14: fc ff df 17: 48 8b 1b mov (%rbx),%rbx 1a: 48 8d 7b 70 lea 0x70(%rbx),%rdi 1e: 48 89 fa mov %rdi,%rdx 21: 48 c1 ea 03 shr $0x3,%rdx * 25: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction 29: 75 42 jne 0x6d 2b: 48 8b 7b 70 mov 0x70(%rbx),%rdi 2f: e8 95 9e ae f9 callq 0xf9ae9ec9 34: 5b pop %rbx 35: 5d pop %rbp 36: 41 5c pop %r12 38: 41 5d pop %r13 3a: e9 .byte 0xe9 --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkaller@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. If the bug is already fixed, let syzbot know by replying with: #syz fix: exact-commit-title If you want to change bug's subsystems, reply with: #syz set subsystems: new-subsystem (See the list of subsystem names on the web dashboard) If the bug is a duplicate of another bug, reply with: #syz dup: exact-subject-of-another-report If you want to undo deduplication, reply with: #syz undup ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue 2023-05-30 7:30 [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue syzbot @ 2023-05-30 11:24 ` Michael S. Tsirkin 2023-05-30 13:44 ` Stefano Garzarella 0 siblings, 1 reply; 16+ messages in thread From: Michael S. Tsirkin @ 2023-05-30 11:24 UTC (permalink / raw) To: syzbot Cc: jasowang, kvm, linux-kernel, netdev, syzkaller-bugs, virtualization, Stefano Garzarella, stefanha On Tue, May 30, 2023 at 12:30:06AM -0700, syzbot wrote: > Hello, > > syzbot found the following issue on: > > HEAD commit: 933174ae28ba Merge tag 'spi-fix-v6.4-rc3' of git://git.ker.. > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=138d4ae5280000 > kernel config: https://syzkaller.appspot.com/x/.config?x=f389ffdf4e9ba3f0 > dashboard link: https://syzkaller.appspot.com/bug?extid=d0d442c22fa8db45ff0e > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2 > > Unfortunately, I don't have any reproducer for this issue yet. > > Downloadable assets: > disk image: https://storage.googleapis.com/syzbot-assets/21a81b8c2660/disk-933174ae.raw.xz > vmlinux: https://storage.googleapis.com/syzbot-assets/b4951d89e238/vmlinux-933174ae.xz > kernel image: https://storage.googleapis.com/syzbot-assets/21eb405303cc/bzImage-933174ae.xz > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: syzbot+d0d442c22fa8db45ff0e@syzkaller.appspotmail.com > > general protection fault, probably for non-canonical address 0xdffffc000000000e: 0000 [#1] PREEMPT SMP KASAN > KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077] > CPU: 0 PID: 29845 Comm: syz-executor.4 Not tainted 6.4.0-rc3-syzkaller-00032-g933174ae28ba #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/16/2023 > RIP: 0010:vhost_work_queue drivers/vhost/vhost.c:259 [inline] > RIP: 0010:vhost_work_queue+0xfc/0x150 drivers/vhost/vhost.c:248 > Code: 00 00 fc ff df 48 89 da 48 c1 ea 03 80 3c 02 00 75 56 48 b8 00 00 00 00 00 fc ff df 48 8b 1b 48 8d 7b 70 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 42 48 8b 7b 70 e8 95 9e ae f9 5b 5d 41 5c 41 5d e9 > RSP: 0018:ffffc9000333faf8 EFLAGS: 00010202 > RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc9000d84d000 > RDX: 000000000000000e RSI: ffffffff841221d7 RDI: 0000000000000070 > RBP: ffff88804b6b95b0 R08: 0000000000000001 R09: 0000000000000000 > R10: 0000000000000001 R11: 0000000000000000 R12: ffff88804b6b00b0 > R13: 0000000000000000 R14: ffff88804b6b95e0 R15: ffff88804b6b95c8 > FS: 00007f3b445ec700(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000001b2e423000 CR3: 000000005d734000 CR4: 00000000003506f0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 000000000000003b DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Call Trace: > <TASK> > vhost_transport_send_pkt+0x268/0x520 drivers/vhost/vsock.c:288 > virtio_transport_send_pkt_info+0x54c/0x820 net/vmw_vsock/virtio_transport_common.c:250 > virtio_transport_connect+0xb1/0xf0 net/vmw_vsock/virtio_transport_common.c:813 > vsock_connect+0x37f/0xcd0 net/vmw_vsock/af_vsock.c:1414 > __sys_connect_file+0x153/0x1a0 net/socket.c:2003 > __sys_connect+0x165/0x1a0 net/socket.c:2020 > __do_sys_connect net/socket.c:2030 [inline] > __se_sys_connect net/socket.c:2027 [inline] > __x64_sys_connect+0x73/0xb0 net/socket.c:2027 > do_syscall_x64 arch/x86/entry/common.c:50 
[inline] > do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > RIP: 0033:0x7f3b4388c169 > Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 > RSP: 002b:00007f3b445ec168 EFLAGS: 00000246 ORIG_RAX: 000000000000002a > RAX: ffffffffffffffda RBX: 00007f3b439ac050 RCX: 00007f3b4388c169 > RDX: 0000000000000010 RSI: 0000000020000140 RDI: 0000000000000004 > RBP: 00007f3b438e7ca1 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > R13: 00007f3b43acfb1f R14: 00007f3b445ec300 R15: 0000000000022000 > </TASK> > Modules linked in: > ---[ end trace 0000000000000000 ]--- > RIP: 0010:vhost_work_queue drivers/vhost/vhost.c:259 [inline] > RIP: 0010:vhost_work_queue+0xfc/0x150 drivers/vhost/vhost.c:248 > Code: 00 00 fc ff df 48 89 da 48 c1 ea 03 80 3c 02 00 75 56 48 b8 00 00 00 00 00 fc ff df 48 8b 1b 48 8d 7b 70 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 42 48 8b 7b 70 e8 95 9e ae f9 5b 5d 41 5c 41 5d e9 > RSP: 0018:ffffc9000333faf8 EFLAGS: 00010202 > RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc9000d84d000 > RDX: 000000000000000e RSI: ffffffff841221d7 RDI: 0000000000000070 > RBP: ffff88804b6b95b0 R08: 0000000000000001 R09: 0000000000000000 > R10: 0000000000000001 R11: 0000000000000000 R12: ffff88804b6b00b0 > R13: 0000000000000000 R14: ffff88804b6b95e0 R15: ffff88804b6b95c8 > FS: 00007f3b445ec700(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000001b2e428000 CR3: 000000005d734000 CR4: 00000000003506e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 000000000000003b DR6: 00000000ffff0ff0 DR7: 0000000000000400 > ---------------- > Code disassembly (best guess), 5 bytes skipped: > 0: 48 89 da mov %rbx,%rdx > 3: 48 c1 ea 03 shr $0x3,%rdx > 7: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) > b: 75 56 jne 0x63 > d: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax > 14: fc ff df > 17: 48 8b 1b mov (%rbx),%rbx > 1a: 48 8d 7b 70 lea 0x70(%rbx),%rdi > 1e: 48 89 fa mov %rdi,%rdx > 21: 48 c1 ea 03 shr $0x3,%rdx > * 25: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction > 29: 75 42 jne 0x6d > 2b: 48 8b 7b 70 mov 0x70(%rbx),%rdi > 2f: e8 95 9e ae f9 callq 0xf9ae9ec9 > 34: 5b pop %rbx > 35: 5d pop %rbp > 36: 41 5c pop %r12 > 38: 41 5d pop %r13 > 3a: e9 .byte 0xe9 Stefano, Stefan, take a look? > > --- > This report is generated by a bot. It may contain errors. > See https://goo.gl/tpsmEJ for more information about syzbot. > syzbot engineers can be reached at syzkaller@googlegroups.com. > > syzbot will keep track of this issue. See: > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > > If the bug is already fixed, let syzbot know by replying with: > #syz fix: exact-commit-title > > If you want to change bug's subsystems, reply with: > #syz set subsystems: new-subsystem > (See the list of subsystem names on the web dashboard) > > If the bug is a duplicate of another bug, reply with: > #syz dup: exact-subject-of-another-report > > If you want to undo deduplication, reply with: > #syz undup ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue 2023-05-30 11:24 ` Michael S. Tsirkin @ 2023-05-30 13:44 ` Stefano Garzarella 2023-05-30 15:58 ` Mike Christie 2023-05-30 16:00 ` Stefano Garzarella 0 siblings, 2 replies; 16+ messages in thread From: Stefano Garzarella @ 2023-05-30 13:44 UTC (permalink / raw) To: Michael S. Tsirkin, Mike Christie Cc: syzbot, jasowang, kvm, linux-kernel, netdev, syzkaller-bugs, virtualization, stefanha On Tue, May 30, 2023 at 1:24 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Tue, May 30, 2023 at 12:30:06AM -0700, syzbot wrote: > > Hello, > > > > syzbot found the following issue on: > > > > HEAD commit: 933174ae28ba Merge tag 'spi-fix-v6.4-rc3' of git://git.ker.. > > git tree: upstream > > console output: https://syzkaller.appspot.com/x/log.txt?x=138d4ae5280000 > > kernel config: https://syzkaller.appspot.com/x/.config?x=f389ffdf4e9ba3f0 > > dashboard link: https://syzkaller.appspot.com/bug?extid=d0d442c22fa8db45ff0e > > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2 > > > > Unfortunately, I don't have any reproducer for this issue yet. > > > > Downloadable assets: > > disk image: https://storage.googleapis.com/syzbot-assets/21a81b8c2660/disk-933174ae.raw.xz > > vmlinux: https://storage.googleapis.com/syzbot-assets/b4951d89e238/vmlinux-933174ae.xz > > kernel image: https://storage.googleapis.com/syzbot-assets/21eb405303cc/bzImage-933174ae.xz > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > Reported-by: syzbot+d0d442c22fa8db45ff0e@syzkaller.appspotmail.com > > > > general protection fault, probably for non-canonical address 0xdffffc000000000e: 0000 [#1] PREEMPT SMP KASAN > > KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077] > > CPU: 0 PID: 29845 Comm: syz-executor.4 Not tainted 6.4.0-rc3-syzkaller-00032-g933174ae28ba #0 > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/16/2023 > > RIP: 0010:vhost_work_queue drivers/vhost/vhost.c:259 [inline] > > RIP: 0010:vhost_work_queue+0xfc/0x150 drivers/vhost/vhost.c:248 > > Code: 00 00 fc ff df 48 89 da 48 c1 ea 03 80 3c 02 00 75 56 48 b8 00 00 00 00 00 fc ff df 48 8b 1b 48 8d 7b 70 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 42 48 8b 7b 70 e8 95 9e ae f9 5b 5d 41 5c 41 5d e9 > > RSP: 0018:ffffc9000333faf8 EFLAGS: 00010202 > > RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc9000d84d000 > > RDX: 000000000000000e RSI: ffffffff841221d7 RDI: 0000000000000070 > > RBP: ffff88804b6b95b0 R08: 0000000000000001 R09: 0000000000000000 > > R10: 0000000000000001 R11: 0000000000000000 R12: ffff88804b6b00b0 > > R13: 0000000000000000 R14: ffff88804b6b95e0 R15: ffff88804b6b95c8 > > FS: 00007f3b445ec700(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 0000001b2e423000 CR3: 000000005d734000 CR4: 00000000003506f0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 000000000000003b DR6: 00000000ffff0ff0 DR7: 0000000000000400 > > Call Trace: > > <TASK> > > vhost_transport_send_pkt+0x268/0x520 drivers/vhost/vsock.c:288 > > virtio_transport_send_pkt_info+0x54c/0x820 net/vmw_vsock/virtio_transport_common.c:250 > > virtio_transport_connect+0xb1/0xf0 net/vmw_vsock/virtio_transport_common.c:813 > > vsock_connect+0x37f/0xcd0 net/vmw_vsock/af_vsock.c:1414 > > __sys_connect_file+0x153/0x1a0 net/socket.c:2003 > > __sys_connect+0x165/0x1a0 net/socket.c:2020 > > __do_sys_connect 
net/socket.c:2030 [inline] > > __se_sys_connect net/socket.c:2027 [inline] > > __x64_sys_connect+0x73/0xb0 net/socket.c:2027 > > do_syscall_x64 arch/x86/entry/common.c:50 [inline] > > do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 > > entry_SYSCALL_64_after_hwframe+0x63/0xcd > > RIP: 0033:0x7f3b4388c169 > > Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 > > RSP: 002b:00007f3b445ec168 EFLAGS: 00000246 ORIG_RAX: 000000000000002a > > RAX: ffffffffffffffda RBX: 00007f3b439ac050 RCX: 00007f3b4388c169 > > RDX: 0000000000000010 RSI: 0000000020000140 RDI: 0000000000000004 > > RBP: 00007f3b438e7ca1 R08: 0000000000000000 R09: 0000000000000000 > > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > > R13: 00007f3b43acfb1f R14: 00007f3b445ec300 R15: 0000000000022000 > > </TASK> > > Modules linked in: > > ---[ end trace 0000000000000000 ]--- > > RIP: 0010:vhost_work_queue drivers/vhost/vhost.c:259 [inline] > > RIP: 0010:vhost_work_queue+0xfc/0x150 drivers/vhost/vhost.c:248 > > Code: 00 00 fc ff df 48 89 da 48 c1 ea 03 80 3c 02 00 75 56 48 b8 00 00 00 00 00 fc ff df 48 8b 1b 48 8d 7b 70 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 42 48 8b 7b 70 e8 95 9e ae f9 5b 5d 41 5c 41 5d e9 > > RSP: 0018:ffffc9000333faf8 EFLAGS: 00010202 > > RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc9000d84d000 > > RDX: 000000000000000e RSI: ffffffff841221d7 RDI: 0000000000000070 > > RBP: ffff88804b6b95b0 R08: 0000000000000001 R09: 0000000000000000 > > R10: 0000000000000001 R11: 0000000000000000 R12: ffff88804b6b00b0 > > R13: 0000000000000000 R14: ffff88804b6b95e0 R15: ffff88804b6b95c8 > > FS: 00007f3b445ec700(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 0000001b2e428000 CR3: 000000005d734000 CR4: 00000000003506e0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 000000000000003b DR6: 00000000ffff0ff0 DR7: 0000000000000400 > > ---------------- > > Code disassembly (best guess), 5 bytes skipped: > > 0: 48 89 da mov %rbx,%rdx > > 3: 48 c1 ea 03 shr $0x3,%rdx > > 7: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) > > b: 75 56 jne 0x63 > > d: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax > > 14: fc ff df > > 17: 48 8b 1b mov (%rbx),%rbx > > 1a: 48 8d 7b 70 lea 0x70(%rbx),%rdi > > 1e: 48 89 fa mov %rdi,%rdx > > 21: 48 c1 ea 03 shr $0x3,%rdx > > * 25: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction > > 29: 75 42 jne 0x6d > > 2b: 48 8b 7b 70 mov 0x70(%rbx),%rdi > > 2f: e8 95 9e ae f9 callq 0xf9ae9ec9 > > 34: 5b pop %rbx > > 35: 5d pop %rbp > > 36: 41 5c pop %r12 > > 38: 41 5d pop %r13 > > 3a: e9 .byte 0xe9 > > > Stefano, Stefan, take a look? I'll take a look. From a first glance, it looks like an issue when we call vhost_work_queue(). @Mike, does that ring any bells since you recently looked at that code? Thanks, Stefano ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue
From: Mike Christie @ 2023-05-30 15:58 UTC
To: Stefano Garzarella, Michael S. Tsirkin
Cc: syzbot, jasowang, kvm, linux-kernel, netdev, syzkaller-bugs, virtualization, stefanha

On 5/30/23 8:44 AM, Stefano Garzarella wrote:
>
> From a first glance, it looks like an issue when we call vhost_work_queue().
> @Mike, does that ring any bells since you recently looked at that code?

I see the bug. I needed to have set the dev->worker after setting worker->vtsk,
like below:

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index a92af08e7864..7bd95984a501 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -564,7 +564,6 @@ static int vhost_worker_create(struct vhost_dev *dev)
 	if (!worker)
 		return -ENOMEM;
 
-	dev->worker = worker;
 	worker->kcov_handle = kcov_common_handle();
 	init_llist_head(&worker->work_list);
 	snprintf(name, sizeof(name), "vhost-%d", current->pid);
@@ -576,6 +575,7 @@ static int vhost_worker_create(struct vhost_dev *dev)
 	}
 
 	worker->vtsk = vtsk;
+	dev->worker = worker;
 	vhost_task_start(vtsk);
 	return 0;
* Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue
From: Mike Christie @ 2023-05-30 16:01 UTC
To: Stefano Garzarella, Michael S. Tsirkin
Cc: syzbot, jasowang, kvm, linux-kernel, netdev, syzkaller-bugs, virtualization, stefanha

On 5/30/23 10:58 AM, Mike Christie wrote:
> On 5/30/23 8:44 AM, Stefano Garzarella wrote:
>>
>> From a first glance, it looks like an issue when we call vhost_work_queue().
>> @Mike, does that ring any bells since you recently looked at that code?
>
> I see the bug. I needed to have set the dev->worker after setting worker->vtsk,
> like below:
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index a92af08e7864..7bd95984a501 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -564,7 +564,6 @@ static int vhost_worker_create(struct vhost_dev *dev)
> 	if (!worker)
> 		return -ENOMEM;
>
> -	dev->worker = worker;
> 	worker->kcov_handle = kcov_common_handle();
> 	init_llist_head(&worker->work_list);
> 	snprintf(name, sizeof(name), "vhost-%d", current->pid);
> @@ -576,6 +575,7 @@ static int vhost_worker_create(struct vhost_dev *dev)
> 	}
>
> 	worker->vtsk = vtsk;

Shoot, oh wait, I think I needed an smp_wmb() to always make sure worker->vtsk
is set before dev->worker, or vhost_work_queue() could still end up seeing
dev->worker set before worker->vtsk, right?

> +	dev->worker = worker;
> 	vhost_task_start(vtsk);
> 	return 0;
* Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue 2023-05-30 16:01 ` Mike Christie @ 2023-05-30 16:11 ` Stefano Garzarella 0 siblings, 0 replies; 16+ messages in thread From: Stefano Garzarella @ 2023-05-30 16:11 UTC (permalink / raw) To: Mike Christie Cc: Michael S. Tsirkin, syzbot, jasowang, kvm, linux-kernel, netdev, syzkaller-bugs, virtualization, stefanha On Tue, May 30, 2023 at 11:01:11AM -0500, Mike Christie wrote: >On 5/30/23 10:58 AM, Mike Christie wrote: >> On 5/30/23 8:44 AM, Stefano Garzarella wrote: >>> >>> From a first glance, it looks like an issue when we call vhost_work_queue(). >>> @Mike, does that ring any bells since you recently looked at that code? >> >> I see the bug. needed to have set the dev->worker after setting worker->vtsk Yes, I came to the same conclusion (see my email sent at the same time :-). >> like below: >> >> >> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c >> index a92af08e7864..7bd95984a501 100644 >> --- a/drivers/vhost/vhost.c >> +++ b/drivers/vhost/vhost.c >> @@ -564,7 +564,6 @@ static int vhost_worker_create(struct vhost_dev *dev) >> if (!worker) >> return -ENOMEM; >> >> - dev->worker = worker; >> worker->kcov_handle = kcov_common_handle(); >> init_llist_head(&worker->work_list); >> snprintf(name, sizeof(name), "vhost-%d", current->pid); >> @@ -576,6 +575,7 @@ static int vhost_worker_create(struct vhost_dev *dev) >> } >> >> worker->vtsk = vtsk; > >Shoot, oh wait, I think I needed a smp_wmb to always make sure worker->vtask >is set before dev->worker or vhost_work_queue could still end up seeing >dev->worker set before worker->vtsk right? But should we pair smp_wmb() with an smp_rmb() wherever we check dev->worker? Thanks, Stefano ^ permalink raw reply [flat|nested] 16+ messages in thread
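For context, the publish/consume ordering being discussed here would look
roughly like the sketch below. It is abridged and untested, and it is not a
patch from this thread; smp_store_release()/smp_load_acquire() would be the
more idiomatic way to express the same ordering.

/* Writer side, vhost_worker_create(), abridged sketch: */
	worker->vtsk = vtsk;
	/* Publish the worker only after it is fully initialized. */
	smp_wmb();
	dev->worker = worker;

/* Reader side, vhost_work_queue(), abridged sketch: */
	struct vhost_worker *worker = READ_ONCE(dev->worker);

	if (!worker)
		return;
	/* Pairs with the smp_wmb() in vhost_worker_create(). */
	smp_rmb();
	llist_add(&work->node, &worker->work_list);
	wake_up_process(worker->vtsk->task);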
* Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue 2023-05-30 13:44 ` Stefano Garzarella 2023-05-30 15:58 ` Mike Christie @ 2023-05-30 16:00 ` Stefano Garzarella 2023-05-30 16:09 ` Mike Christie 1 sibling, 1 reply; 16+ messages in thread From: Stefano Garzarella @ 2023-05-30 16:00 UTC (permalink / raw) To: Michael S. Tsirkin, Mike Christie Cc: syzbot, jasowang, kvm, linux-kernel, netdev, syzkaller-bugs, virtualization, stefanha On Tue, May 30, 2023 at 3:44 PM Stefano Garzarella <sgarzare@redhat.com> wrote: > > On Tue, May 30, 2023 at 1:24 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Tue, May 30, 2023 at 12:30:06AM -0700, syzbot wrote: > > > Hello, > > > > > > syzbot found the following issue on: > > > > > > HEAD commit: 933174ae28ba Merge tag 'spi-fix-v6.4-rc3' of git://git.ker.. > > > git tree: upstream > > > console output: https://syzkaller.appspot.com/x/log.txt?x=138d4ae5280000 > > > kernel config: https://syzkaller.appspot.com/x/.config?x=f389ffdf4e9ba3f0 > > > dashboard link: https://syzkaller.appspot.com/bug?extid=d0d442c22fa8db45ff0e > > > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2 > > > > > > Unfortunately, I don't have any reproducer for this issue yet. > > > > > > Downloadable assets: > > > disk image: https://storage.googleapis.com/syzbot-assets/21a81b8c2660/disk-933174ae.raw.xz > > > vmlinux: https://storage.googleapis.com/syzbot-assets/b4951d89e238/vmlinux-933174ae.xz > > > kernel image: https://storage.googleapis.com/syzbot-assets/21eb405303cc/bzImage-933174ae.xz > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > Reported-by: syzbot+d0d442c22fa8db45ff0e@syzkaller.appspotmail.com > > > > > > general protection fault, probably for non-canonical address 0xdffffc000000000e: 0000 [#1] PREEMPT SMP KASAN > > > KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077] > > > CPU: 0 PID: 29845 Comm: syz-executor.4 Not tainted 6.4.0-rc3-syzkaller-00032-g933174ae28ba #0 > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/16/2023 > > > RIP: 0010:vhost_work_queue drivers/vhost/vhost.c:259 [inline] > > > RIP: 0010:vhost_work_queue+0xfc/0x150 drivers/vhost/vhost.c:248 > > > Code: 00 00 fc ff df 48 89 da 48 c1 ea 03 80 3c 02 00 75 56 48 b8 00 00 00 00 00 fc ff df 48 8b 1b 48 8d 7b 70 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 42 48 8b 7b 70 e8 95 9e ae f9 5b 5d 41 5c 41 5d e9 > > > RSP: 0018:ffffc9000333faf8 EFLAGS: 00010202 > > > RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc9000d84d000 > > > RDX: 000000000000000e RSI: ffffffff841221d7 RDI: 0000000000000070 > > > RBP: ffff88804b6b95b0 R08: 0000000000000001 R09: 0000000000000000 > > > R10: 0000000000000001 R11: 0000000000000000 R12: ffff88804b6b00b0 > > > R13: 0000000000000000 R14: ffff88804b6b95e0 R15: ffff88804b6b95c8 > > > FS: 00007f3b445ec700(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > CR2: 0000001b2e423000 CR3: 000000005d734000 CR4: 00000000003506f0 > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > > DR3: 000000000000003b DR6: 00000000ffff0ff0 DR7: 0000000000000400 > > > Call Trace: > > > <TASK> > > > vhost_transport_send_pkt+0x268/0x520 drivers/vhost/vsock.c:288 > > > virtio_transport_send_pkt_info+0x54c/0x820 net/vmw_vsock/virtio_transport_common.c:250 > > > virtio_transport_connect+0xb1/0xf0 net/vmw_vsock/virtio_transport_common.c:813 > > > 
vsock_connect+0x37f/0xcd0 net/vmw_vsock/af_vsock.c:1414 > > > __sys_connect_file+0x153/0x1a0 net/socket.c:2003 > > > __sys_connect+0x165/0x1a0 net/socket.c:2020 > > > __do_sys_connect net/socket.c:2030 [inline] > > > __se_sys_connect net/socket.c:2027 [inline] > > > __x64_sys_connect+0x73/0xb0 net/socket.c:2027 > > > do_syscall_x64 arch/x86/entry/common.c:50 [inline] > > > do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 > > > entry_SYSCALL_64_after_hwframe+0x63/0xcd > > > RIP: 0033:0x7f3b4388c169 > > > Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 > > > RSP: 002b:00007f3b445ec168 EFLAGS: 00000246 ORIG_RAX: 000000000000002a > > > RAX: ffffffffffffffda RBX: 00007f3b439ac050 RCX: 00007f3b4388c169 > > > RDX: 0000000000000010 RSI: 0000000020000140 RDI: 0000000000000004 > > > RBP: 00007f3b438e7ca1 R08: 0000000000000000 R09: 0000000000000000 > > > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > > > R13: 00007f3b43acfb1f R14: 00007f3b445ec300 R15: 0000000000022000 > > > </TASK> > > > Modules linked in: > > > ---[ end trace 0000000000000000 ]--- > > > RIP: 0010:vhost_work_queue drivers/vhost/vhost.c:259 [inline] > > > RIP: 0010:vhost_work_queue+0xfc/0x150 drivers/vhost/vhost.c:248 > > > Code: 00 00 fc ff df 48 89 da 48 c1 ea 03 80 3c 02 00 75 56 48 b8 00 00 00 00 00 fc ff df 48 8b 1b 48 8d 7b 70 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 42 48 8b 7b 70 e8 95 9e ae f9 5b 5d 41 5c 41 5d e9 > > > RSP: 0018:ffffc9000333faf8 EFLAGS: 00010202 > > > RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc9000d84d000 > > > RDX: 000000000000000e RSI: ffffffff841221d7 RDI: 0000000000000070 > > > RBP: ffff88804b6b95b0 R08: 0000000000000001 R09: 0000000000000000 > > > R10: 0000000000000001 R11: 0000000000000000 R12: ffff88804b6b00b0 > > > R13: 0000000000000000 R14: ffff88804b6b95e0 R15: ffff88804b6b95c8 > > > FS: 00007f3b445ec700(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000 > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > CR2: 0000001b2e428000 CR3: 000000005d734000 CR4: 00000000003506e0 > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > > DR3: 000000000000003b DR6: 00000000ffff0ff0 DR7: 0000000000000400 > > > ---------------- > > > Code disassembly (best guess), 5 bytes skipped: > > > 0: 48 89 da mov %rbx,%rdx > > > 3: 48 c1 ea 03 shr $0x3,%rdx > > > 7: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) > > > b: 75 56 jne 0x63 > > > d: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax > > > 14: fc ff df > > > 17: 48 8b 1b mov (%rbx),%rbx > > > 1a: 48 8d 7b 70 lea 0x70(%rbx),%rdi > > > 1e: 48 89 fa mov %rdi,%rdx > > > 21: 48 c1 ea 03 shr $0x3,%rdx > > > * 25: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction > > > 29: 75 42 jne 0x6d > > > 2b: 48 8b 7b 70 mov 0x70(%rbx),%rdi > > > 2f: e8 95 9e ae f9 callq 0xf9ae9ec9 > > > 34: 5b pop %rbx > > > 35: 5d pop %rbp > > > 36: 41 5c pop %r12 > > > 38: 41 5d pop %r13 > > > 3a: e9 .byte 0xe9 > > > > > > Stefano, Stefan, take a look? > > I'll take a look. > > From a first glance, it looks like an issue when we call vhost_work_queue(). > @Mike, does that ring any bells since you recently looked at that code? I think it is partially related to commit 6e890c5d5021 ("vhost: use vhost_tasks for worker threads") and commit 1a5f8090c6de ("vhost: move worker thread fields to new struct"). Maybe that commits just highlighted the issue and it was already existing. 
In this case I think there is a race between vhost_worker_create() and
vhost_transport_send_pkt(). vhost_transport_send_pkt() calls
vhost_work_queue() without holding the vhost device mutex, so it can run
while vhost_worker_create() has set dev->worker but has not yet set
worker->vtsk.

Before commit 1a5f8090c6de ("vhost: move worker thread fields to new
struct"), dev->worker was set when everything was ready, but maybe it was
just a case of the instructions not being re-ordered and the problem could
still occur.

This happens because VHOST_VSOCK_SET_GUEST_CID can be called before
VHOST_SET_OWNER, and then vhost_transport_send_pkt() finds the guest's CID
and tries to send it a packet.

But is it correct to handle VHOST_VSOCK_SET_GUEST_CID before
VHOST_SET_OWNER? QEMU always calls VHOST_SET_OWNER before anything, but I
don't know about the other VMMs.

So, could it be an acceptable solution to reject VHOST_VSOCK_SET_GUEST_CID
before VHOST_SET_OWNER? I mean something like this:

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 6578db78f0ae..33fc0805d189 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -829,7 +829,12 @@ static long vhost_vsock_dev_ioctl(struct file *f, unsigned int ioctl,
 	case VHOST_VSOCK_SET_GUEST_CID:
 		if (copy_from_user(&guest_cid, argp, sizeof(guest_cid)))
 			return -EFAULT;
-		return vhost_vsock_set_cid(vsock, guest_cid);
+		mutex_lock(&vsock->dev.mutex);
+		r = vhost_dev_check_owner(&vsock->dev);
+		if (!r)
+			r = vhost_vsock_set_cid(vsock, guest_cid);
+		mutex_unlock(&vsock->dev.mutex);
+		return r;
 	case VHOST_VSOCK_SET_RUNNING:
 		if (copy_from_user(&start, argp, sizeof(start)))
 			return -EFAULT;

In the documentation, we say:

/* Set current process as the (exclusive) owner of this file descriptor.  This
 * must be called before any other vhost command.  Further calls to
 * VHOST_OWNER_SET fail until VHOST_OWNER_RESET is called. */

This should prevent the issue, but could break a wrong userspace.

Other ideas that I have in mind are:
- hold vsock->dev.mutex while calling vhost_work_queue() (performance
  degradation?)
- use RCU to protect dev->worker

WDYT?

Thanks,
Stefano
* Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue
From: Mike Christie @ 2023-05-30 16:09 UTC
To: Stefano Garzarella, Michael S. Tsirkin
Cc: syzbot, jasowang, kvm, linux-kernel, netdev, syzkaller-bugs, virtualization, stefanha

On 5/30/23 11:00 AM, Stefano Garzarella wrote:
> I think it is partially related to commit 6e890c5d5021 ("vhost: use
> vhost_tasks for worker threads") and commit 1a5f8090c6de ("vhost: move
> worker thread fields to new struct"). Maybe that commits just
> highlighted the issue and it was already existing.

See my mail about the crash. I agree with your analysis about worker->vtsk
not being set yet. It's a bug from my commit, where I should not have set
it so early, or I should be checking for

if (dev->worker && worker->vtsk)

instead of

if (dev->worker)

One question about the behavior before my commit, though, and what we want
in the end going forward. Before that patch we would just drop work if
vhost_work_queue was called before VHOST_SET_OWNER. Was that
correct/expected? The call to vhost_work_queue in vhost_vsock_start was
only seeing the works queued after VHOST_SET_OWNER. Did you want works
queued before that?
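For reference, the pre-6.4 behavior being asked about: before commit
6e890c5d5021, dev->worker was the worker task itself and vhost_work_queue()
looked roughly like this (paraphrased from v6.3), so work queued before
VHOST_SET_OWNER was simply never added to the list:

void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
{
	/* No VHOST_SET_OWNER yet: the work is silently dropped. */
	if (!dev->worker)
		return;

	if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) {
		/* We can only add the work to the list after we're
		 * sure it was not in the list.
		 * test_and_set_bit() implies a memory barrier.
		 */
		llist_add(&work->node, &dev->work_list);
		wake_up_process(dev->worker);
	}
}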
* Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue 2023-05-30 16:09 ` Mike Christie @ 2023-05-30 16:17 ` Stefano Garzarella 2023-05-30 16:30 ` michael.christie 0 siblings, 1 reply; 16+ messages in thread From: Stefano Garzarella @ 2023-05-30 16:17 UTC (permalink / raw) To: Mike Christie Cc: Michael S. Tsirkin, syzbot, jasowang, kvm, linux-kernel, netdev, syzkaller-bugs, virtualization, stefanha On Tue, May 30, 2023 at 11:09:09AM -0500, Mike Christie wrote: >On 5/30/23 11:00 AM, Stefano Garzarella wrote: >> I think it is partially related to commit 6e890c5d5021 ("vhost: use >> vhost_tasks for worker threads") and commit 1a5f8090c6de ("vhost: move >> worker thread fields to new struct"). Maybe that commits just >> highlighted the issue and it was already existing. > >See my mail about the crash. Agree with your analysis about worker->vtsk >not being set yet. It's a bug from my commit where I should have not set >it so early or I should be checking for > >if (dev->worker && worker->vtsk) > >instead of > >if (dev->worker) Yes, though, in my opinion the problem may persist depending on how the instructions are reordered. Should we protect dev->worker() with an RCU to be safe? > >One question about the behavior before my commit though and what we want in >the end going forward. Before that patch we would just drop work if >vhost_work_queue was called before VHOST_SET_OWNER. Was that correct/expected? I think so, since we ask the guest to call VHOST_SET_OWNER, before any other command. > >The call to vhost_work_queue in vhost_vsock_start was only seeing the >works queued after VHOST_SET_OWNER. Did you want works queued before that? > Yes, for example if an application in the host has tried to connect and is waiting for a timeout, we already have work queued up to flush as soon as we start the device. (See commit 0b841030625c ("vhost: vsock: kick send_pkt worker once device is started")). Thanks, Stefano ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue 2023-05-30 16:17 ` Stefano Garzarella @ 2023-05-30 16:30 ` michael.christie 2023-05-31 7:27 ` Stefano Garzarella 0 siblings, 1 reply; 16+ messages in thread From: michael.christie @ 2023-05-30 16:30 UTC (permalink / raw) To: Stefano Garzarella Cc: Michael S. Tsirkin, syzbot, jasowang, kvm, linux-kernel, netdev, syzkaller-bugs, virtualization, stefanha On 5/30/23 11:17 AM, Stefano Garzarella wrote: > On Tue, May 30, 2023 at 11:09:09AM -0500, Mike Christie wrote: >> On 5/30/23 11:00 AM, Stefano Garzarella wrote: >>> I think it is partially related to commit 6e890c5d5021 ("vhost: use >>> vhost_tasks for worker threads") and commit 1a5f8090c6de ("vhost: move >>> worker thread fields to new struct"). Maybe that commits just >>> highlighted the issue and it was already existing. >> >> See my mail about the crash. Agree with your analysis about worker->vtsk >> not being set yet. It's a bug from my commit where I should have not set >> it so early or I should be checking for >> >> if (dev->worker && worker->vtsk) >> >> instead of >> >> if (dev->worker) > > Yes, though, in my opinion the problem may persist depending on how the > instructions are reordered. Ah ok. > > Should we protect dev->worker() with an RCU to be safe? For those multiple worker patchsets Jason had asked me about supporting where we don't have a worker while we are swapping workers around. To do that I had added rcu around the dev->worker. I removed it in later patchsets because I didn't think anyone would use it. rcu would work for your case and for what Jason had requested. ^ permalink raw reply [flat|nested] 16+ messages in thread
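For context, an RCU-based scheme along those lines might look roughly like
the untested sketch below. It assumes dev->worker stays a pointer, gains an
__rcu annotation, and that vhost_worker_create()/vhost_worker_free() callers
are already serialized against each other; these are assumptions for the
sketch, not code from the patchsets mentioned above.

void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
{
	struct vhost_worker *worker;

	rcu_read_lock();
	worker = rcu_dereference(dev->worker);
	if (worker && !test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) {
		llist_add(&work->node, &worker->work_list);
		wake_up_process(worker->vtsk->task);
	}
	rcu_read_unlock();
}

static void vhost_worker_free(struct vhost_dev *dev)
{
	/* Callers are assumed to be serialized, hence the "1". */
	struct vhost_worker *worker =
		rcu_dereference_protected(dev->worker, 1);

	if (!worker)
		return;

	RCU_INIT_POINTER(dev->worker, NULL);
	/* Wait for in-flight vhost_work_queue() callers to finish. */
	synchronize_rcu();

	WARN_ON(!llist_empty(&worker->work_list));
	vhost_task_stop(worker->vtsk);
	kfree(worker);
}

/* In vhost_worker_create(), publish only after the worker is fully set up: */
	worker->vtsk = vtsk;
	rcu_assign_pointer(dev->worker, worker);
	vhost_task_start(vtsk);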
* Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue 2023-05-30 16:30 ` michael.christie @ 2023-05-31 7:27 ` Stefano Garzarella 2023-05-31 15:15 ` Mike Christie 0 siblings, 1 reply; 16+ messages in thread From: Stefano Garzarella @ 2023-05-31 7:27 UTC (permalink / raw) To: michael.christie Cc: Michael S. Tsirkin, syzbot, jasowang, kvm, linux-kernel, netdev, syzkaller-bugs, virtualization, stefanha On Tue, May 30, 2023 at 6:30 PM <michael.christie@oracle.com> wrote: > > On 5/30/23 11:17 AM, Stefano Garzarella wrote: > > On Tue, May 30, 2023 at 11:09:09AM -0500, Mike Christie wrote: > >> On 5/30/23 11:00 AM, Stefano Garzarella wrote: > >>> I think it is partially related to commit 6e890c5d5021 ("vhost: use > >>> vhost_tasks for worker threads") and commit 1a5f8090c6de ("vhost: move > >>> worker thread fields to new struct"). Maybe that commits just > >>> highlighted the issue and it was already existing. > >> > >> See my mail about the crash. Agree with your analysis about worker->vtsk > >> not being set yet. It's a bug from my commit where I should have not set > >> it so early or I should be checking for > >> > >> if (dev->worker && worker->vtsk) > >> > >> instead of > >> > >> if (dev->worker) > > > > Yes, though, in my opinion the problem may persist depending on how the > > instructions are reordered. > > Ah ok. > > > > > Should we protect dev->worker() with an RCU to be safe? > > For those multiple worker patchsets Jason had asked me about supporting > where we don't have a worker while we are swapping workers around. To do > that I had added rcu around the dev->worker. I removed it in later patchsets > because I didn't think anyone would use it. > > rcu would work for your case and for what Jason had requested. Yeah, so you already have some patches? Do you want to send it to solve this problem? Thanks, Stefano ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue 2023-05-31 7:27 ` Stefano Garzarella @ 2023-05-31 15:15 ` Mike Christie 2023-05-31 16:27 ` Mike Christie 0 siblings, 1 reply; 16+ messages in thread From: Mike Christie @ 2023-05-31 15:15 UTC (permalink / raw) To: Stefano Garzarella Cc: Michael S. Tsirkin, syzbot, jasowang, kvm, linux-kernel, netdev, syzkaller-bugs, virtualization, stefanha On 5/31/23 2:27 AM, Stefano Garzarella wrote: > On Tue, May 30, 2023 at 6:30 PM <michael.christie@oracle.com> wrote: >> >> On 5/30/23 11:17 AM, Stefano Garzarella wrote: >>> On Tue, May 30, 2023 at 11:09:09AM -0500, Mike Christie wrote: >>>> On 5/30/23 11:00 AM, Stefano Garzarella wrote: >>>>> I think it is partially related to commit 6e890c5d5021 ("vhost: use >>>>> vhost_tasks for worker threads") and commit 1a5f8090c6de ("vhost: move >>>>> worker thread fields to new struct"). Maybe that commits just >>>>> highlighted the issue and it was already existing. >>>> >>>> See my mail about the crash. Agree with your analysis about worker->vtsk >>>> not being set yet. It's a bug from my commit where I should have not set >>>> it so early or I should be checking for >>>> >>>> if (dev->worker && worker->vtsk) >>>> >>>> instead of >>>> >>>> if (dev->worker) >>> >>> Yes, though, in my opinion the problem may persist depending on how the >>> instructions are reordered. >> >> Ah ok. >> >>> >>> Should we protect dev->worker() with an RCU to be safe? >> >> For those multiple worker patchsets Jason had asked me about supporting >> where we don't have a worker while we are swapping workers around. To do >> that I had added rcu around the dev->worker. I removed it in later patchsets >> because I didn't think anyone would use it. >> >> rcu would work for your case and for what Jason had requested. > > Yeah, so you already have some patches? > > Do you want to send it to solve this problem? > Yeah, I'll break them out and send them later today when I can retest rebased patches. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue 2023-05-31 15:15 ` Mike Christie @ 2023-05-31 16:27 ` Mike Christie 2023-06-01 7:47 ` Stefano Garzarella 0 siblings, 1 reply; 16+ messages in thread From: Mike Christie @ 2023-05-31 16:27 UTC (permalink / raw) To: Stefano Garzarella Cc: Michael S. Tsirkin, syzbot, jasowang, kvm, linux-kernel, netdev, syzkaller-bugs, virtualization, stefanha On 5/31/23 10:15 AM, Mike Christie wrote: >>> rcu would work for your case and for what Jason had requested. >> Yeah, so you already have some patches? >> >> Do you want to send it to solve this problem? >> > Yeah, I'll break them out and send them later today when I can retest > rebased patches. > Just one question. Do you core vhost developers consider RCU more complex or switching to READ_ONCE/WRITE_ONCE? I am asking because for this immediate regression fix we could just switch to the latter like below to just fix the crash if we think that is more simple. I think RCU is just a little more complex/invasive because it will have the extra synchronize_rcu calls. diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index a92af08e7864..03fd47a22a73 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -235,7 +235,7 @@ void vhost_dev_flush(struct vhost_dev *dev) { struct vhost_flush_struct flush; - if (dev->worker) { + if (READ_ONCE(dev->worker.vtsk)) { init_completion(&flush.wait_event); vhost_work_init(&flush.work, vhost_flush_work); @@ -247,7 +247,9 @@ EXPORT_SYMBOL_GPL(vhost_dev_flush); void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work) { - if (!dev->worker) + struct vhost_task *vtsk = READ_ONCE(dev->worker.vtsk); + + if (!vtsk) return; if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) { @@ -255,8 +257,8 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work) * sure it was not in the list. * test_and_set_bit() implies a memory barrier. 
*/ - llist_add(&work->node, &dev->worker->work_list); - wake_up_process(dev->worker->vtsk->task); + llist_add(&work->node, &dev->worker.work_list); + wake_up_process(vtsk->task); } } EXPORT_SYMBOL_GPL(vhost_work_queue); @@ -264,7 +266,7 @@ EXPORT_SYMBOL_GPL(vhost_work_queue); /* A lockless hint for busy polling code to exit the loop */ bool vhost_has_work(struct vhost_dev *dev) { - return dev->worker && !llist_empty(&dev->worker->work_list); + return !llist_empty(&dev->worker.work_list); } EXPORT_SYMBOL_GPL(vhost_has_work); @@ -468,7 +470,7 @@ void vhost_dev_init(struct vhost_dev *dev, dev->umem = NULL; dev->iotlb = NULL; dev->mm = NULL; - dev->worker = NULL; + memset(&dev->worker, 0, sizeof(dev->worker)); dev->iov_limit = iov_limit; dev->weight = weight; dev->byte_weight = byte_weight; @@ -542,46 +544,38 @@ static void vhost_detach_mm(struct vhost_dev *dev) static void vhost_worker_free(struct vhost_dev *dev) { - struct vhost_worker *worker = dev->worker; + struct vhost_task *vtsk = READ_ONCE(dev->worker.vtsk); - if (!worker) + if (!vtsk) return; - dev->worker = NULL; - WARN_ON(!llist_empty(&worker->work_list)); - vhost_task_stop(worker->vtsk); - kfree(worker); + vhost_task_stop(vtsk); + WARN_ON(!llist_empty(&dev->worker.work_list)); + WRITE_ONCE(dev->worker.vtsk, NULL); } static int vhost_worker_create(struct vhost_dev *dev) { - struct vhost_worker *worker; struct vhost_task *vtsk; char name[TASK_COMM_LEN]; int ret; - worker = kzalloc(sizeof(*worker), GFP_KERNEL_ACCOUNT); - if (!worker) - return -ENOMEM; - - dev->worker = worker; - worker->kcov_handle = kcov_common_handle(); - init_llist_head(&worker->work_list); + dev->worker.kcov_handle = kcov_common_handle(); + init_llist_head(&dev->worker.work_list); snprintf(name, sizeof(name), "vhost-%d", current->pid); - vtsk = vhost_task_create(vhost_worker, worker, name); + vtsk = vhost_task_create(vhost_worker, &dev->worker, name); if (!vtsk) { ret = -ENOMEM; goto free_worker; } - worker->vtsk = vtsk; + WRITE_ONCE(dev->worker.vtsk, vtsk); vhost_task_start(vtsk); return 0; free_worker: - kfree(worker); - dev->worker = NULL; + WRITE_ONCE(dev->worker.vtsk, NULL); return ret; } diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 0308638cdeee..305ec8593d46 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -154,7 +154,7 @@ struct vhost_dev { struct vhost_virtqueue **vqs; int nvqs; struct eventfd_ctx *log_ctx; - struct vhost_worker *worker; + struct vhost_worker worker; struct vhost_iotlb *umem; struct vhost_iotlb *iotlb; spinlock_t iotlb_lock; ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue 2023-05-31 16:27 ` Mike Christie @ 2023-06-01 7:47 ` Stefano Garzarella 2023-06-01 16:33 ` Mike Christie 0 siblings, 1 reply; 16+ messages in thread From: Stefano Garzarella @ 2023-06-01 7:47 UTC (permalink / raw) To: Mike Christie Cc: Michael S. Tsirkin, syzbot, jasowang, kvm, linux-kernel, netdev, syzkaller-bugs, virtualization, stefanha On Wed, May 31, 2023 at 11:27:12AM -0500, Mike Christie wrote: >On 5/31/23 10:15 AM, Mike Christie wrote: >>>> rcu would work for your case and for what Jason had requested. >>> Yeah, so you already have some patches? >>> >>> Do you want to send it to solve this problem? >>> >> Yeah, I'll break them out and send them later today when I can retest >> rebased patches. >> > >Just one question. Do you core vhost developers consider RCU more complex >or switching to READ_ONCE/WRITE_ONCE? I am asking because for this immediate >regression fix we could just switch to the latter like below to just fix >the crash if we think that is more simple. > >I think RCU is just a little more complex/invasive because it will have the >extra synchronize_rcu calls. Yes, you may be right, in this case we should just need READ_ONCE/WRITE_ONCE if dev->worker is no longer a pointer. > > >diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c >index a92af08e7864..03fd47a22a73 100644 >--- a/drivers/vhost/vhost.c >+++ b/drivers/vhost/vhost.c >@@ -235,7 +235,7 @@ void vhost_dev_flush(struct vhost_dev *dev) > { > struct vhost_flush_struct flush; > >- if (dev->worker) { >+ if (READ_ONCE(dev->worker.vtsk)) { > init_completion(&flush.wait_event); > vhost_work_init(&flush.work, vhost_flush_work); > >@@ -247,7 +247,9 @@ EXPORT_SYMBOL_GPL(vhost_dev_flush); > > void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work) > { >- if (!dev->worker) >+ struct vhost_task *vtsk = READ_ONCE(dev->worker.vtsk); >+ >+ if (!vtsk) > return; > > if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) { >@@ -255,8 +257,8 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work) > * sure it was not in the list. > * test_and_set_bit() implies a memory barrier. 
> */ >- llist_add(&work->node, &dev->worker->work_list); >- wake_up_process(dev->worker->vtsk->task); >+ llist_add(&work->node, &dev->worker.work_list); >+ wake_up_process(vtsk->task); > } > } > EXPORT_SYMBOL_GPL(vhost_work_queue); >@@ -264,7 +266,7 @@ EXPORT_SYMBOL_GPL(vhost_work_queue); > /* A lockless hint for busy polling code to exit the loop */ > bool vhost_has_work(struct vhost_dev *dev) > { >- return dev->worker && !llist_empty(&dev->worker->work_list); >+ return !llist_empty(&dev->worker.work_list); > } > EXPORT_SYMBOL_GPL(vhost_has_work); > >@@ -468,7 +470,7 @@ void vhost_dev_init(struct vhost_dev *dev, > dev->umem = NULL; > dev->iotlb = NULL; > dev->mm = NULL; >- dev->worker = NULL; >+ memset(&dev->worker, 0, sizeof(dev->worker)); > dev->iov_limit = iov_limit; > dev->weight = weight; > dev->byte_weight = byte_weight; >@@ -542,46 +544,38 @@ static void vhost_detach_mm(struct vhost_dev *dev) > > static void vhost_worker_free(struct vhost_dev *dev) > { >- struct vhost_worker *worker = dev->worker; >+ struct vhost_task *vtsk = READ_ONCE(dev->worker.vtsk); > >- if (!worker) >+ if (!vtsk) > return; > >- dev->worker = NULL; >- WARN_ON(!llist_empty(&worker->work_list)); >- vhost_task_stop(worker->vtsk); >- kfree(worker); >+ vhost_task_stop(vtsk); >+ WARN_ON(!llist_empty(&dev->worker.work_list)); >+ WRITE_ONCE(dev->worker.vtsk, NULL); The patch LGTM, I just wonder if we should set dev->worker to zero here, but maybe we don't need to. Thanks, Stefano > } > > static int vhost_worker_create(struct vhost_dev *dev) > { >- struct vhost_worker *worker; > struct vhost_task *vtsk; > char name[TASK_COMM_LEN]; > int ret; > >- worker = kzalloc(sizeof(*worker), GFP_KERNEL_ACCOUNT); >- if (!worker) >- return -ENOMEM; >- >- dev->worker = worker; >- worker->kcov_handle = kcov_common_handle(); >- init_llist_head(&worker->work_list); >+ dev->worker.kcov_handle = kcov_common_handle(); >+ init_llist_head(&dev->worker.work_list); > snprintf(name, sizeof(name), "vhost-%d", current->pid); > >- vtsk = vhost_task_create(vhost_worker, worker, name); >+ vtsk = vhost_task_create(vhost_worker, &dev->worker, name); > if (!vtsk) { > ret = -ENOMEM; > goto free_worker; > } > >- worker->vtsk = vtsk; >+ WRITE_ONCE(dev->worker.vtsk, vtsk); > vhost_task_start(vtsk); > return 0; > > free_worker: >- kfree(worker); >- dev->worker = NULL; >+ WRITE_ONCE(dev->worker.vtsk, NULL); > return ret; > } > >diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h >index 0308638cdeee..305ec8593d46 100644 >--- a/drivers/vhost/vhost.h >+++ b/drivers/vhost/vhost.h >@@ -154,7 +154,7 @@ struct vhost_dev { > struct vhost_virtqueue **vqs; > int nvqs; > struct eventfd_ctx *log_ctx; >- struct vhost_worker *worker; >+ struct vhost_worker worker; > struct vhost_iotlb *umem; > struct vhost_iotlb *iotlb; > spinlock_t iotlb_lock; > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue
From: Mike Christie @ 2023-06-01 16:33 UTC
To: Stefano Garzarella
Cc: Michael S. Tsirkin, syzbot, jasowang, kvm, linux-kernel, netdev, syzkaller-bugs, virtualization, stefanha

On 6/1/23 2:47 AM, Stefano Garzarella wrote:
>>
>> static void vhost_worker_free(struct vhost_dev *dev)
>> {
>> -	struct vhost_worker *worker = dev->worker;
>> +	struct vhost_task *vtsk = READ_ONCE(dev->worker.vtsk);
>>
>> -	if (!worker)
>> +	if (!vtsk)
>> 		return;
>>
>> -	dev->worker = NULL;
>> -	WARN_ON(!llist_empty(&worker->work_list));
>> -	vhost_task_stop(worker->vtsk);
>> -	kfree(worker);
>> +	vhost_task_stop(vtsk);
>> +	WARN_ON(!llist_empty(&dev->worker.work_list));
>> +	WRITE_ONCE(dev->worker.vtsk, NULL);
>
> The patch LGTM, I just wonder if we should set dev->worker to zero here,

We might want to just set kcov_handle to zero for now.

In 6.3 and older, I think we could do:

1. vhost_dev_set_owner could successfully set dev->worker.
2. vhost_transport_send_pkt runs vhost_work_queue and sees worker is set
   and adds the vhost_work to the work_list.
3. vhost_dev_set_owner fails in vhost_attach_cgroups, so we stop the worker
   before the work can be run and set worker to NULL.
4. We clear kcov_handle and return.

We leave the work on the work_list.

5. Userspace can then retry vhost_dev_set_owner. If that works, then the
   work gets executed ok eventually.

OR

Userspace can just close the device. vhost_vsock_dev_release would
eventually call vhost_dev_cleanup (vhost_dev_flush won't see a worker so
will just return), and that will hit the WARN_ON but we would proceed ok.

If I do a memset of the worker, then if userspace were to retry
VHOST_SET_OWNER, we would lose the queued work since the work_list would
get zero'd. I think it's unlikely this ever happens, but you know best so
let me know if this is a real issue.
* Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue
From: Stefano Garzarella @ 2023-06-05 8:26 UTC
To: Mike Christie
Cc: Michael S. Tsirkin, syzbot, jasowang, kvm, linux-kernel, netdev, syzkaller-bugs, virtualization, stefanha

On Thu, Jun 01, 2023 at 11:33:09AM -0500, Mike Christie wrote:
>On 6/1/23 2:47 AM, Stefano Garzarella wrote:
>>>
>>> static void vhost_worker_free(struct vhost_dev *dev)
>>> {
>>> -	struct vhost_worker *worker = dev->worker;
>>> +	struct vhost_task *vtsk = READ_ONCE(dev->worker.vtsk);
>>>
>>> -	if (!worker)
>>> +	if (!vtsk)
>>> 		return;
>>>
>>> -	dev->worker = NULL;
>>> -	WARN_ON(!llist_empty(&worker->work_list));
>>> -	vhost_task_stop(worker->vtsk);
>>> -	kfree(worker);
>>> +	vhost_task_stop(vtsk);
>>> +	WARN_ON(!llist_empty(&dev->worker.work_list));
>>> +	WRITE_ONCE(dev->worker.vtsk, NULL);
>>
>> The patch LGTM, I just wonder if we should set dev->worker to zero here,
>
>We might want to just set kcov_handle to zero for now.
>
>In 6.3 and older, I think we could do:
>
>1. vhost_dev_set_owner could successfully set dev->worker.
>2. vhost_transport_send_pkt runs vhost_work_queue and sees worker
>is set and adds the vhost_work to the work_list.
>3. vhost_dev_set_owner fails in vhost_attach_cgroups, so we stop
>the worker before the work can be run and set worker to NULL.
>4. We clear kcov_handle and return.
>
>We leave the work on the work_list.
>
>5. Userspace can then retry vhost_dev_set_owner. If that works, then the
>work gets executed ok eventually.
>
>OR
>
>Userspace can just close the device. vhost_vsock_dev_release would
>eventually call vhost_dev_cleanup (vhost_dev_flush won't see a worker
>so will just return), and that will hit the WARN_ON but we would
>proceed ok.
>
>If I do a memset of the worker, then if userspace were to retry
>VHOST_SET_OWNER, we would lose the queued work since the work_list would
>get zero'd. I think it's unlikely this ever happens, but you know best
>so let me know if this a real issue.

I don't think it's a problem. Though you're right, we could hide the
warning and thus future bugs, so better to do as you proposed.

Thanks,
Stefano