From: Alex Williamson <alex.williamson@redhat.com>
To: Timothy Pearson <tpearson@raptorengineering.com>
Cc: kvm <kvm@vger.kernel.org>, linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [PATCH v2] vfio/pci: Fix INTx handling on legacy non-PCI 2.3 devices
Date: Mon, 22 Sep 2025 12:01:22 -0600 [thread overview]
Message-ID: <20250922120122.4e9942e8.alex.williamson@redhat.com> (raw)
In-Reply-To: <912864077.1743059.1758561514856.JavaMail.zimbra@raptorengineeringinc.com>
On Mon, 22 Sep 2025 12:18:34 -0500 (CDT)
Timothy Pearson <tpearson@raptorengineering.com> wrote:
> PCI devices prior to PCI 2.3 both use level interrupts and do not support
> interrupt masking, leading to a failure when passed through to a KVM guest on
> at least the ppc64 platform. This failure manifests as receiving and
> acknowledging a single interrupt in the guest, while the device continues to
> assert the level interrupt indicating a need for further servicing.
>
> When lazy IRQ masking is used on DisINTx- (non-PCI 2.3) hardware, the following
> sequence occurs:
>
> * Level IRQ assertion on device
> * IRQ marked disabled in kernel
> * Host interrupt handler exits without clearing the interrupt on the device
> * Eventfd is delivered to userspace
> * Host interrupt controller sees still-active INTX, reasserts IRQ
> * Host kernel ignores disabled IRQ
> * Guest processes IRQ and clears device interrupt
> * Software mask removed by VFIO driver
This isn't the sequence that was previously identified as the issue.
An interrupt controller that reasserts the IRQ when it remains active
is not susceptible to the issue, and is what I think we generally
expect on x86. I understand that we believe this issue manifests
exactly because the interrupt controller does not reassert an interrupt
that remains active. I think the sequence is:
* device asserts INTx
* vfio_intx_handler() calls disable_irq_nosync() to mark IRQ disabled
* interrupt delivered to userspace via eventfd
* userspace driver/VM services interrupt
* device de-asserts INTx
* device re-asserts INTx
* interrupt received while IRQ disabled is masked at controller
* VMM performs EOI via unmask ioctl, enable_irq() clears disable and
unmasks IRQ
* interrupt controller does not reassert interrupt to the host
The fix then, aiui, is that disabling the unlazy mode masks the
interrupt at disable_irq_nosync(), the same sequence of de-asserting
and re-asserting the interrupt occurs at the controller, but since the
controller was masked at the new rising edge, it will send the
interrupt when umasked.
> The behavior is now platform-dependent. Some platforms (amd64) will continue
> to spew IRQs for as long as the INTX line remains asserted, therefore the IRQ
> will be handled by the host as soon as the mask is dropped. Others (ppc64) will
> only send the one request, and if it is not handled no further interrupts will
> be sent. The former behavior theoretically leaves the system vulnerable to
> interrupt storm, and the latter will result in the device stalling after
> receiving exactly one interrupt in the guest.
>
> Work around this by disabling lazy IRQ masking for DisINTx- INTx devices.
> ---
> drivers/vfio/pci/vfio_pci_intrs.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 123298a4dc8f..d8637b53d051 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -304,6 +304,9 @@ static int vfio_intx_enable(struct vfio_pci_core_device *vdev,
>
> vdev->irq_type = VFIO_PCI_INTX_IRQ_INDEX;
>
> + if (!vdev->pci_2_3)
> + irq_set_status_flags(pdev->irq, IRQ_DISABLE_UNLAZY);
> +
> ret = request_irq(pdev->irq, vfio_intx_handler,
> irqflags, ctx->name, ctx);
> if (ret) {
This branch is an example of where we're not clearing the flag on
error. Thanks,
Alex
> @@ -352,6 +355,8 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
> vfio_virqfd_disable(&ctx->unmask);
> vfio_virqfd_disable(&ctx->mask);
> free_irq(pdev->irq, ctx);
> + if (!vdev->pci_2_3)
> + irq_clear_status_flags(pdev->irq, IRQ_DISABLE_UNLAZY);
> if (ctx->trigger)
> eventfd_ctx_put(ctx->trigger);
> kfree(ctx->name);
next prev parent reply other threads:[~2025-09-22 18:01 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-22 17:18 [PATCH v2] vfio/pci: Fix INTx handling on legacy non-PCI 2.3 devices Timothy Pearson
2025-09-22 18:01 ` Alex Williamson [this message]
2025-09-22 18:25 ` Timothy Pearson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250922120122.4e9942e8.alex.williamson@redhat.com \
--to=alex.williamson@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=tpearson@raptorengineering.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).