From: "Michael S. Tsirkin" <mst@redhat.com>
To: Dmitry Vyukov <dvyukov@google.com>
Cc: Hillf Danton <hdanton@sina.com>,
syzbot <syzbot+3140b17cb44a7b174008@syzkaller.appspotmail.com>,
jasowang@redhat.com, linux-kernel@vger.kernel.org,
syzkaller-bugs@googlegroups.com
Subject: Re: [syzbot] kernel BUG in vhost_get_vq_desc
Date: Sun, 20 Feb 2022 08:10:06 -0500 [thread overview]
Message-ID: <20220220075826-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <CACT4Y+Y_d2T4-TR2g_EPkaudWGxeEM8rrpQSkBANzCbme10ZtA@mail.gmail.com>
On Sun, Feb 20, 2022 at 01:31:02PM +0100, Dmitry Vyukov wrote:
> On Sun, 20 Feb 2022 at 13:16, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > On Sat, 19 Feb 2022 05:01:10 -0800
> > > > > > Hello,
> > > > > >
> > > > > > syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> > > > > > kernel BUG in vhost_get_vq_desc
> > > > >
> > > > > The WARNING: CPU: 1 PID: 4052 at drivers/vhost/vhost.c:715 got quiesced.
> > > > > >
> > > > > > ------------[ cut here ]------------
> > > > > > kernel BUG at drivers/vhost/vhost.c:2338!
> > > > >
> > > > > Given the mutex_lock(&vq->mutex) in vhost_vsock_handle_tx_kick(), this
> > > > > report proves that the bug is bogus.
> > > > >
> > > > > > invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> > > > > > CPU: 0 PID: 4071 Comm: vhost-4070 Not tainted 5.17.0-rc4-syzkaller-00054-gf71077a4d84b-dirty #0
> > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > > > > > RIP: 0010:vhost_get_vq_desc+0x1dc5/0x2350 drivers/vhost/vhost.c:2338
> > > > > > Code: 00 00 00 48 c7 c6 20 2c 9d 8a 48 c7 c7 98 a6 8e 8d 48 89 ca 48 c1 e1 04 48 01 d9 e8 25 59 28 fd e9 74 ff ff ff e8 cb c7 a1 fa <0f> 0b e8 c4 c7 a1 fa 48 8b 54 24 18 48 b8 00 00 00 00 00 fc ff df
> > > > > > RSP: 0018:ffffc900028bfb78 EFLAGS: 00010293
> > > > > > RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
> > > > > > RDX: ffff88801cbd1d00 RSI: ffffffff86d71655 RDI: 0000000000000003
> > > > > > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
> > > > > > R10: ffffffff86d7072d R11: 0000000000000000 R12: 0000000000000000
> > > > > > R13: 0000000000000000 R14: ffff88806ffc4bb0 R15: dffffc0000000000
> > > > > > FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
> > > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > CR2: 0000000000000002 CR3: 000000001d077000 CR4: 00000000003506f0
> > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > Call Trace:
> > > > > > <TASK>
> > > > > > vhost_vsock_handle_tx_kick+0x277/0xa20 drivers/vhost/vsock.c:522
> > > > > > vhost_worker+0x2e9/0x3e0 drivers/vhost/vhost.c:374
> > > > > > kthread+0x2e9/0x3a0 kernel/kthread.c:377
> > > > > > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> > > > > > </TASK>
> > > > > > Modules linked in:
> > > > > > ---[ end trace 0000000000000000 ]---
> > > > > > RIP: 0010:vhost_get_vq_desc+0x1dc5/0x2350 drivers/vhost/vhost.c:2338
> > > > > > Code: 00 00 00 48 c7 c6 20 2c 9d 8a 48 c7 c7 98 a6 8e 8d 48 89 ca 48 c1 e1 04 48 01 d9 e8 25 59 28 fd e9 74 ff ff ff e8 cb c7 a1 fa <0f> 0b e8 c4 c7 a1 fa 48 8b 54 24 18 48 b8 00 00 00 00 00 fc ff df
> > > > > > RSP: 0018:ffffc900028bfb78 EFLAGS: 00010293
> > > > > > RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
> > > > > > RDX: ffff88801cbd1d00 RSI: ffffffff86d71655 RDI: 0000000000000003
> > > > > > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
> > > > > > R10: ffffffff86d7072d R11: 0000000000000000 R12: 0000000000000000
> > > > > > R13: 0000000000000000 R14: ffff88806ffc4bb0 R15: dffffc0000000000
> > > > > > FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
> > > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > CR2: 00007fc7293991d0 CR3: 000000001d077000 CR4: 00000000003506e0
> > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > >
> > > > > >
> > > > > > Tested on:
> > > > > >
> > > > > > commit: f71077a4 Merge tag 'mmc-v5.17-rc1-2' of git://git.kern..
> > > > > > git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
> > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=11e82d7a700000
> > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=a78b064590b9f912
> > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=3140b17cb44a7b174008
> > > > > > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> > > > > > patch: https://syzkaller.appspot.com/x/patch.diff?x=11857326700000
> > > > >
> > > > > Attempted fix is bail out if anything eerie is detected in terms of the
> > > > > notify flag.
> > > >
> > > Hello Mike,
> > >
> > > Thanks for taking a look at it.
> > >
> > > > I mean this will fix the warning for sure, but do we understand how
> > > > it might have triggered?
> > >
> > > Based on what's fed to BUG_ON in the hunk below, it was the update of
> > > used_flag behind our back that pulled the trigger.
> > >
> > > The bigger pain is, given the mutex_lock(&vq->mutex) in
> > > vhost_vsock_handle_tx_kick(), I find nothing to do about it now after
> > > scratching scalp twenty minutes other than detecting the update.
> >
> > Right. I think it's highly likely a use after free.
> > How about poisoning the vq struct with some value before freeing
> > so we can catch that?
>
> syzbot config enables KASAN, which catches most use-after-frees. So
> unless there is something very special about this code, I wouldn't
> assume this is a use-after-free.
> Some racy use-after-frees may be caught as both use-after-frees and
> other types of bugs with lower probability. I see 8 bugs on the syzbot
> dashboard that mention "vhost" but none of the are use-after-frees.
>
Hmm okay.
Well we also have the (non reproducible)
WARN_ON(!llist_empty(&dev->work_list));
trigger.
So I think what happens is that there's some worker still running
when we call vhost_vq_reset.
Here's what is supposed to stop it:
vhost_vsock_stop(vsock);
vhost_vsock_flush(vsock);
vhost_dev_stop(&vsock->dev);
after this point, there should be no new work.
However I wonder why do we flush before we stop everything.
Maybe this is what it's about.
#syz test: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ f71077a4d84b
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index d6ca1c7ad513..b31c3a78dbff 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -754,8 +754,8 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
vsock_for_each_connected_socket(vhost_vsock_reset_orphans);
vhost_vsock_stop(vsock);
- vhost_vsock_flush(vsock);
vhost_dev_stop(&vsock->dev);
+ vhost_vsock_flush(vsock);
spin_lock_bh(&vsock->send_pkt_list_lock);
while (!list_empty(&vsock->send_pkt_list)) {
WARNING: multiple messages have this Message-ID (diff)
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Dmitry Vyukov <dvyukov@google.com>
Cc: Hillf Danton <hdanton@sina.com>,
syzbot <syzbot+3140b17cb44a7b174008@syzkaller.appspotmail.com>,
jasowang@redhat.com, linux-kernel@vger.kernel.org,
syzkaller-bugs@googlegroups.com
Subject: Re: [syzbot] kernel BUG in vhost_get_vq_desc
Date: Sun, 20 Feb 2022 08:29:17 -0500 [thread overview]
Message-ID: <20220220075826-mutt-send-email-mst@kernel.org> (raw)
Message-ID: <20220220132917.zwFWfG5esFlcAz2VPwrmDlRWRAkb3yCGyXiG4H6mhwQ@z> (raw)
In-Reply-To: <CACT4Y+Y_d2T4-TR2g_EPkaudWGxeEM8rrpQSkBANzCbme10ZtA@mail.gmail.com>
On Sun, Feb 20, 2022 at 01:31:02PM +0100, Dmitry Vyukov wrote:
> On Sun, 20 Feb 2022 at 13:16, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > On Sat, 19 Feb 2022 05:01:10 -0800
> > > > > > Hello,
> > > > > >
> > > > > > syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> > > > > > kernel BUG in vhost_get_vq_desc
> > > > >
> > > > > The WARNING: CPU: 1 PID: 4052 at drivers/vhost/vhost.c:715 got quiesced.
> > > > > >
> > > > > > ------------[ cut here ]------------
> > > > > > kernel BUG at drivers/vhost/vhost.c:2338!
> > > > >
> > > > > Given the mutex_lock(&vq->mutex) in vhost_vsock_handle_tx_kick(), this
> > > > > report proves that the bug is bogus.
> > > > >
> > > > > > invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> > > > > > CPU: 0 PID: 4071 Comm: vhost-4070 Not tainted 5.17.0-rc4-syzkaller-00054-gf71077a4d84b-dirty #0
> > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > > > > > RIP: 0010:vhost_get_vq_desc+0x1dc5/0x2350 drivers/vhost/vhost.c:2338
> > > > > > Code: 00 00 00 48 c7 c6 20 2c 9d 8a 48 c7 c7 98 a6 8e 8d 48 89 ca 48 c1 e1 04 48 01 d9 e8 25 59 28 fd e9 74 ff ff ff e8 cb c7 a1 fa <0f> 0b e8 c4 c7 a1 fa 48 8b 54 24 18 48 b8 00 00 00 00 00 fc ff df
> > > > > > RSP: 0018:ffffc900028bfb78 EFLAGS: 00010293
> > > > > > RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
> > > > > > RDX: ffff88801cbd1d00 RSI: ffffffff86d71655 RDI: 0000000000000003
> > > > > > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
> > > > > > R10: ffffffff86d7072d R11: 0000000000000000 R12: 0000000000000000
> > > > > > R13: 0000000000000000 R14: ffff88806ffc4bb0 R15: dffffc0000000000
> > > > > > FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
> > > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > CR2: 0000000000000002 CR3: 000000001d077000 CR4: 00000000003506f0
> > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > Call Trace:
> > > > > > <TASK>
> > > > > > vhost_vsock_handle_tx_kick+0x277/0xa20 drivers/vhost/vsock.c:522
> > > > > > vhost_worker+0x2e9/0x3e0 drivers/vhost/vhost.c:374
> > > > > > kthread+0x2e9/0x3a0 kernel/kthread.c:377
> > > > > > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> > > > > > </TASK>
> > > > > > Modules linked in:
> > > > > > ---[ end trace 0000000000000000 ]---
> > > > > > RIP: 0010:vhost_get_vq_desc+0x1dc5/0x2350 drivers/vhost/vhost.c:2338
> > > > > > Code: 00 00 00 48 c7 c6 20 2c 9d 8a 48 c7 c7 98 a6 8e 8d 48 89 ca 48 c1 e1 04 48 01 d9 e8 25 59 28 fd e9 74 ff ff ff e8 cb c7 a1 fa <0f> 0b e8 c4 c7 a1 fa 48 8b 54 24 18 48 b8 00 00 00 00 00 fc ff df
> > > > > > RSP: 0018:ffffc900028bfb78 EFLAGS: 00010293
> > > > > > RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
> > > > > > RDX: ffff88801cbd1d00 RSI: ffffffff86d71655 RDI: 0000000000000003
> > > > > > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
> > > > > > R10: ffffffff86d7072d R11: 0000000000000000 R12: 0000000000000000
> > > > > > R13: 0000000000000000 R14: ffff88806ffc4bb0 R15: dffffc0000000000
> > > > > > FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
> > > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > CR2: 00007fc7293991d0 CR3: 000000001d077000 CR4: 00000000003506e0
> > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > >
> > > > > >
> > > > > > Tested on:
> > > > > >
> > > > > > commit: f71077a4 Merge tag 'mmc-v5.17-rc1-2' of git://git.kern..
> > > > > > git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
> > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=11e82d7a700000
> > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=a78b064590b9f912
> > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=3140b17cb44a7b174008
> > > > > > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> > > > > > patch: https://syzkaller.appspot.com/x/patch.diff?x=11857326700000
> > > > >
> > > > > Attempted fix is bail out if anything eerie is detected in terms of the
> > > > > notify flag.
> > > >
> > > Hello Mike,
> > >
> > > Thanks for taking a look at it.
> > >
> > > > I mean this will fix the warning for sure, but do we understand how
> > > > it might have triggered?
> > >
> > > Based on what's fed to BUG_ON in the hunk below, it was the update of
> > > used_flag behind our back that pulled the trigger.
> > >
> > > The bigger pain is, given the mutex_lock(&vq->mutex) in
> > > vhost_vsock_handle_tx_kick(), I find nothing to do about it now after
> > > scratching scalp twenty minutes other than detecting the update.
> >
> > Right. I think it's highly likely a use after free.
> > How about poisoning the vq struct with some value before freeing
> > so we can catch that?
>
> syzbot config enables KASAN, which catches most use-after-frees. So
> unless there is something very special about this code, I wouldn't
> assume this is a use-after-free.
> Some racy use-after-frees may be caught as both use-after-frees and
> other types of bugs with lower probability. I see 8 bugs on the syzbot
> dashboard that mention "vhost" but none of the are use-after-frees.
>
Okay, for starters let's try to make sure whether what we are seeing is
actually accessing a vsock that is being released.
#syz test: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ f71077a4d84b
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index d6ca1c7ad513..2dbc64f072e8 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -58,6 +58,7 @@ struct vhost_vsock {
u32 guest_cid;
bool seqpacket_allow;
+ bool dead;
};
static u32 vhost_transport_get_local_cid(void)
@@ -106,6 +107,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
/* Avoid further vmexits, we're already processing the virtqueue */
vhost_disable_notify(&vsock->dev, vq);
+ WARN_ON(vsock->dead);
do {
struct virtio_vsock_pkt *pkt;
@@ -128,6 +130,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
list_del_init(&pkt->list);
spin_unlock_bh(&vsock->send_pkt_list_lock);
+ WARN_ON(vsock->dead);
head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
&out, &in, NULL, NULL);
if (head < 0) {
@@ -510,6 +513,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
goto out;
vhost_disable_notify(&vsock->dev, vq);
+ WARN_ON(vsock->dead);
do {
if (!vhost_vsock_more_replies(vsock)) {
/* Stop tx until the device processes already
@@ -519,6 +523,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
goto no_more_replies;
}
+ WARN_ON(vsock->dead);
head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
&out, &in, NULL, NULL);
if (head < 0)
@@ -678,6 +683,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
}
vsock->guest_cid = 0; /* no CID assigned yet */
+ vsock->dead = false;
atomic_set(&vsock->queued_replies, 0);
@@ -754,8 +760,9 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
vsock_for_each_connected_socket(vhost_vsock_reset_orphans);
vhost_vsock_stop(vsock);
- vhost_vsock_flush(vsock);
vhost_dev_stop(&vsock->dev);
+ vhost_vsock_flush(vsock);
+ vsock->dead = true;
spin_lock_bh(&vsock->send_pkt_list_lock);
while (!list_empty(&vsock->send_pkt_list)) {
next prev parent reply other threads:[~2022-02-20 13:10 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20220219125100.835-1-hdanton@sina.com>
2022-02-19 13:01 ` [syzbot] kernel BUG in vhost_get_vq_desc syzbot
2022-02-21 13:09 ` Stefano Garzarella
[not found] ` <20220221133646.1551-1-hdanton@sina.com>
2022-02-21 13:45 ` Stefano Garzarella
2022-02-21 13:59 ` Michael S. Tsirkin
2022-02-21 14:04 ` Stefano Garzarella
[not found] ` <20220220014715.921-1-hdanton@sina.com>
2022-02-20 2:10 ` syzbot
2022-02-21 14:09 ` Stefano Garzarella
2022-02-21 14:25 ` syzbot
2022-02-20 10:08 ` Michael S. Tsirkin
[not found] ` <20220220110941.980-1-hdanton@sina.com>
2022-02-20 12:16 ` Michael S. Tsirkin
2022-02-20 12:31 ` Dmitry Vyukov
2022-02-20 13:10 ` Michael S. Tsirkin [this message]
2022-02-20 13:29 ` Michael S. Tsirkin
2022-02-20 13:20 ` syzbot
[not found] <20220222031128.1850-1-hdanton@sina.com>
2022-02-22 4:07 ` syzbot
[not found] <20220222001455.1737-1-hdanton@sina.com>
2022-02-22 0:26 ` syzbot
[not found] <20220221140558.1618-1-hdanton@sina.com>
2022-02-21 14:14 ` syzbot
[not found] <20220221054115.1270-1-hdanton@sina.com>
2022-02-21 5:51 ` syzbot
[not found] <20220221040745.1177-1-hdanton@sina.com>
2022-02-21 4:18 ` syzbot
[not found] ` <20220221085227.1356-1-hdanton@sina.com>
2022-02-21 9:17 ` Michael S. Tsirkin
[not found] ` <20220221101538.1415-1-hdanton@sina.com>
2022-02-21 10:48 ` Michael S. Tsirkin
[not found] ` <20220221130022.1494-1-hdanton@sina.com>
2022-02-21 13:58 ` Michael S. Tsirkin
2022-02-21 12:46 ` syzbot
[not found] <20220221021208.1109-1-hdanton@sina.com>
2022-02-21 2:26 ` syzbot
[not found] <20220219114936.747-1-hdanton@sina.com>
2022-02-19 12:00 ` syzbot
2022-02-12 22:47 syzbot
2022-02-18 1:21 ` syzbot
2022-02-18 11:37 ` Michael S. Tsirkin
2022-02-18 11:37 ` Michael S. Tsirkin
2022-03-02 8:29 ` Lee Jones
2022-03-02 8:29 ` Lee Jones
2022-03-02 9:18 ` Stefano Garzarella
2022-03-02 9:18 ` Stefano Garzarella
2022-03-02 9:23 ` Stefano Garzarella
2022-03-02 9:23 ` Stefano Garzarella
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220220075826-mutt-send-email-mst@kernel.org \
--to=mst@redhat.com \
--cc=dvyukov@google.com \
--cc=hdanton@sina.com \
--cc=jasowang@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=syzbot+3140b17cb44a7b174008@syzkaller.appspotmail.com \
--cc=syzkaller-bugs@googlegroups.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.