From: "Michael S. Tsirkin" <mst@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: David Airlie <airlied@linux.ie>,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org, vgoyal@redhat.com
Subject: Re: [RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt()
Date: Tue, 24 Aug 2021 07:31:29 -0400 [thread overview]
Message-ID: <20210824072622-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20210824105944.172659-1-stefanha@redhat.com>
On Tue, Aug 24, 2021 at 11:59:43AM +0100, Stefan Hajnoczi wrote:
> While investigating an unhandled irq from vring_interrupt() with virtiofs I
> stumbled onto a possible race that also affects virtio_gpu. This theory is
> based on code inspection and hopefully you can point out something that makes
> this a non-issue in practice :).
>
> The vring_interrupt() function returns IRQ_NONE when an MSI-X interrupt is
> taken with no used (completed) buffers in the virtqueue. The kernel disables
> the irq and the driver is no longer receives irqs when this happens:
>
> irq 77: nobody cared (try booting with the "irqpoll" option)
> ...
> handlers:
> [<00000000a40a49bb>] vring_interrupt
> Disabling IRQ #77
>
> Consider the following:
>
> 1. An virtiofs irq is handled and the virtio_fs_requests_done_work() work
> function is scheduled to dequeue used buffers:
> vring_interrupt() -> virtio_fs_vq_done() -> schedule_work()
>
> 2. The device adds more used requests and just before...
>
> 3. ...virtio_fs_requests_done_work() empties the virtqueue with
> virtqueue_get_buf().
>
> 4. The device raises the irq and vring_interrupt() is called after
> virtio_fs_requests_done_work emptied the virtqueue:
>
> irqreturn_t vring_interrupt(int irq, void *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
>
> if (!more_used(vq)) {
> pr_debug("virtqueue interrupt with no work for %p\n", vq);
> return IRQ_NONE;
> ^^^^^^^^^^^^^^^^
>
> I have included a patch that switches virtiofs from spin_lock() to
> spin_lock_irqsave() to prevent vring_interrupt() from running while the
> virtqueue is processed from a work function.
>
> virtio_gpu has a similar case where virtio_gpu_dequeue_ctrl_func() and
> virtio_gpu_dequeue_cursor_func() work functions only use spin_lock().
> I think this can result in the same false unhandled irq problem as virtiofs.
>
> This race condition could in theory affect all drivers. The VIRTIO
> specification says:
>
> Neither of these notification suppression methods are reliable, as they are
> not synchronized with the device, but they serve as useful optimizations.
>
> If virtqueue_disable_cb() is just a hint and might not disable virtqueue irqs
> then virtio_net and other drivers have a problem because because an irq could
> be raised while the driver is dequeuing used buffers. I think we haven't seen
> this because software VIRTIO devices honor virtqueue_disable_cb(). Hardware
> devices might cache the value and not disable notifications for some time...
>
> Have I missed something?
>
> The virtiofs patch I attached is being stress tested to see if the unhandled
> irqs still occur.
>
> Stefan Hajnoczi (1):
> fuse: disable local irqs when processing vq completions
>
> fs/fuse/virtio_fs.c | 15 ++++++++++-----
> 1 file changed, 10 insertions(+), 5 deletions(-)
Fundamentally it is not a problem to have an unhandled IRQ
once in a while. It's only a problem if this happens time
after time.
Does the kernel you are testing include
commit 8d622d21d24803408b256d96463eac4574dcf067
Author: Michael S. Tsirkin <mst@redhat.com>
Date: Tue Apr 13 01:19:16 2021 -0400
virtio: fix up virtio_disable_cb
?
If not it's worth checking whether the latest kernel still
has the issue.
Note cherry picking that commit isn't trivial there are
a bunch of dependencies.
If you want to try on an old kernel, disable event index.
> --
> 2.31.1
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
WARNING: multiple messages have this Message-ID (diff)
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: virtualization@lists.linux-foundation.org,
Gerd Hoffmann <kraxel@redhat.com>,
linux-kernel@vger.kernel.org, David Airlie <airlied@linux.ie>,
vgoyal@redhat.com, jasowang@redhat.com
Subject: Re: [RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt()
Date: Tue, 24 Aug 2021 07:31:29 -0400 [thread overview]
Message-ID: <20210824072622-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20210824105944.172659-1-stefanha@redhat.com>
On Tue, Aug 24, 2021 at 11:59:43AM +0100, Stefan Hajnoczi wrote:
> While investigating an unhandled irq from vring_interrupt() with virtiofs I
> stumbled onto a possible race that also affects virtio_gpu. This theory is
> based on code inspection and hopefully you can point out something that makes
> this a non-issue in practice :).
>
> The vring_interrupt() function returns IRQ_NONE when an MSI-X interrupt is
> taken with no used (completed) buffers in the virtqueue. The kernel disables
> the irq and the driver is no longer receives irqs when this happens:
>
> irq 77: nobody cared (try booting with the "irqpoll" option)
> ...
> handlers:
> [<00000000a40a49bb>] vring_interrupt
> Disabling IRQ #77
>
> Consider the following:
>
> 1. An virtiofs irq is handled and the virtio_fs_requests_done_work() work
> function is scheduled to dequeue used buffers:
> vring_interrupt() -> virtio_fs_vq_done() -> schedule_work()
>
> 2. The device adds more used requests and just before...
>
> 3. ...virtio_fs_requests_done_work() empties the virtqueue with
> virtqueue_get_buf().
>
> 4. The device raises the irq and vring_interrupt() is called after
> virtio_fs_requests_done_work emptied the virtqueue:
>
> irqreturn_t vring_interrupt(int irq, void *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
>
> if (!more_used(vq)) {
> pr_debug("virtqueue interrupt with no work for %p\n", vq);
> return IRQ_NONE;
> ^^^^^^^^^^^^^^^^
>
> I have included a patch that switches virtiofs from spin_lock() to
> spin_lock_irqsave() to prevent vring_interrupt() from running while the
> virtqueue is processed from a work function.
>
> virtio_gpu has a similar case where virtio_gpu_dequeue_ctrl_func() and
> virtio_gpu_dequeue_cursor_func() work functions only use spin_lock().
> I think this can result in the same false unhandled irq problem as virtiofs.
>
> This race condition could in theory affect all drivers. The VIRTIO
> specification says:
>
> Neither of these notification suppression methods are reliable, as they are
> not synchronized with the device, but they serve as useful optimizations.
>
> If virtqueue_disable_cb() is just a hint and might not disable virtqueue irqs
> then virtio_net and other drivers have a problem because because an irq could
> be raised while the driver is dequeuing used buffers. I think we haven't seen
> this because software VIRTIO devices honor virtqueue_disable_cb(). Hardware
> devices might cache the value and not disable notifications for some time...
>
> Have I missed something?
>
> The virtiofs patch I attached is being stress tested to see if the unhandled
> irqs still occur.
>
> Stefan Hajnoczi (1):
> fuse: disable local irqs when processing vq completions
>
> fs/fuse/virtio_fs.c | 15 ++++++++++-----
> 1 file changed, 10 insertions(+), 5 deletions(-)
Fundamentally it is not a problem to have an unhandled IRQ
once in a while. It's only a problem if this happens time
after time.
Does the kernel you are testing include
commit 8d622d21d24803408b256d96463eac4574dcf067
Author: Michael S. Tsirkin <mst@redhat.com>
Date: Tue Apr 13 01:19:16 2021 -0400
virtio: fix up virtio_disable_cb
?
If not it's worth checking whether the latest kernel still
has the issue.
Note cherry picking that commit isn't trivial there are
a bunch of dependencies.
If you want to try on an old kernel, disable event index.
> --
> 2.31.1
>
next prev parent reply other threads:[~2021-08-24 11:31 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-24 10:59 [RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt() Stefan Hajnoczi
2021-08-24 10:59 ` Stefan Hajnoczi
2021-08-24 10:59 ` [RFC PATCH 1/1] fuse: disable local irqs when processing vq completions Stefan Hajnoczi
2021-08-24 10:59 ` Stefan Hajnoczi
2021-08-24 11:31 ` Michael S. Tsirkin [this message]
2021-08-24 11:31 ` [RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt() Michael S. Tsirkin
2021-08-24 16:36 ` Stefan Hajnoczi
2021-08-24 16:36 ` Stefan Hajnoczi
2021-08-26 15:42 ` Stefan Hajnoczi
2021-08-26 15:42 ` Stefan Hajnoczi
2021-09-06 15:18 ` Stefan Hajnoczi
2021-09-06 15:18 ` Stefan Hajnoczi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210824072622-mutt-send-email-mst@kernel.org \
--to=mst@redhat.com \
--cc=airlied@linux.ie \
--cc=linux-kernel@vger.kernel.org \
--cc=stefanha@redhat.com \
--cc=vgoyal@redhat.com \
--cc=virtualization@lists.linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.