public inbox for netdev@vger.kernel.org
From: Mike Christie <michael.christie@oracle.com>
To: "Michael S. Tsirkin" <mst@redhat.com>, Hillf Danton <hdanton@sina.com>
Cc: syzbot <syzbot+a9528028ab4ca83e8bac@syzkaller.appspotmail.com>,
	eperezma@redhat.com, jasowang@redhat.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	syzkaller-bugs@googlegroups.com, virtualization@lists.linux.dev
Subject: Re: [syzbot] INFO: task hung in vhost_worker_killed (2)
Date: Wed, 7 Jan 2026 12:07:24 -0600	[thread overview]
Message-ID: <a71abaa5-478b-4fb7-9015-abe5f10909ef@oracle.com> (raw)
In-Reply-To: <20260106024033-mutt-send-email-mst@kernel.org>

On 1/6/26 1:57 AM, Michael S. Tsirkin wrote:
> On Tue, Jan 06, 2026 at 09:46:30AM +0800, Hillf Danton wrote:
>>> taking vq mutex in a kill handler is probably not wise.
>>> we should have a separate lock just for handling worker
>>> assignment.
>>>
>> Better not before showing us the root cause of the hung to
>> avoid adding a blind lock.
> 
> Well I think it's pretty clear but the issue is that just another lock
> is not enough, we have bigger problems with this mutex.
> 
> It's held around userspace accesses so if the vhost thread gets into
> uninterruptible sleep holding that, a userspace thread trying to take it
> with mutex_lock will be uninterruptible.
> 
> So it propagates the uninterruptible status between vhost and a
> userspace thread.
> 
> It's not a new issue but the new(ish) thread management APIs make
> it more visible.
> 
> Here it's the kill handler that got hung but it's not really limited
> to that, any ioctl can do that, and I do not want to add another
> lock on data path.
> 

Above, are you saying that the kill handler and an ioctl are both trying
to take the virtqueue->mutex in this bug?

I've been trying to replicate this for a while, but I can't hit what's
shown in the lockdep info from the initial email. We only see the
kill handler trying to take the virtqueue->mutex. Is the theory that the
reported locking info is incomplete, i.e. a userspace thread is in an
ioctl that took the mutex, but it's not reported below?

Originally I used the vhost_dev->mutex for the locking in vhost_worker_killed,
but I saw we could take that during ioctls that do a flush, so I added the
vhost_worker->mutex for some of the locking.

If the virtqueue->mutex is also an issue I can do a patch.



Showing all locks held in the system:
1 lock held by khungtaskd/32:
 #0: ffffffff8df41aa0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
 #0: ffffffff8df41aa0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:867 [inline]
 #0: ffffffff8df41aa0 (rcu_read_lock){....}-{1:3}, at: debug_show_all_locks+0x2e/0x180 kernel/locking/lockdep.c:6775
2 locks held by getty/5579:
 #0: ffff88814e3cb0a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:243
 #1: ffffc9000332b2f0 (&ldata->atomic_read_lock){+.+.}-{4:4}, at: n_tty_read+0x449/0x1460 drivers/tty/n_tty.c:2211
1 lock held by syz-executor/5978:
 #0: ffffffff8df475f8 (rcu_state.exp_mutex){+.+.}-{4:4}, at: exp_funnel_lock kernel/rcu/tree_exp.h:311 [inline]
 #0: ffffffff8df475f8 (rcu_state.exp_mutex){+.+.}-{4:4}, at: synchronize_rcu_expedited+0x2b1/0x6e0 kernel/rcu/tree_exp.h:956
2 locks held by syz.5.259/7601:
3 locks held by vhost-7617/7618:
 #0: ffff888054cc68e8 (&vtsk->exit_mutex){+.+.}-{4:4}, at: vhost_task_fn+0x322/0x430 kernel/vhost_task.c:54
 #1: ffff888024646a80 (&worker->mutex){+.+.}-{4:4}, at: vhost_worker_killed+0x57/0x390 drivers/vhost/vhost.c:470
 #2: ffff8880550c0258 (&vq->mutex){+.+.}-{4:4}, at: vhost_worker_killed+0x12b/0x390 drivers/vhost/vhost.c:476
1 lock held by syz-executor/7850:
 #0: ffffffff8df475f8 (rcu_state.exp_mutex){+.+.}-{4:4}, at: exp_funnel_lock kernel/rcu/tree_exp.h:343 [inline]
 #0: ffffffff8df475f8 (rcu_state.exp_mutex){+.+.}-{4:4}, at: synchronize_rcu_expedited+0x36e/0x6e0 kernel/rcu/tree_exp.h:956
1 lock held by syz.2.640/9940:
4 locks held by syz.3.641/9946:
3 locks held by syz.1.642/9954:




Thread overview: 5+ messages
2026-01-05  8:42 [syzbot] INFO: task hung in vhost_worker_killed (2) syzbot
2026-01-05  9:22 ` Michael S. Tsirkin
2026-01-06  1:46   ` Hillf Danton
2026-01-06  7:57     ` Michael S. Tsirkin
2026-01-07 18:07       ` Mike Christie [this message]
