* [PATCH v2] hw/i386/pc: Fix level interrupt sharing for Xen event channel GSI
@ 2025-01-08 12:12 David Woodhouse
0 siblings, 0 replies; 5+ messages in thread
From: David Woodhouse @ 2025-01-08 12:12 UTC (permalink / raw)
To: Bernhard Beschow, qemu-devel, Michael S. Tsirkin
Cc: Thomas Huth, Paul Durrant, Paolo Bonzini, Richard Henderson,
Eduardo Habkost, Marcel Apfelbaum
[-- Attachment #1: Type: text/plain, Size: 8826 bytes --]
From: David Woodhouse <dwmw@amazon.co.uk>
The system GSIs are not designed for sharing. One device might assert a
shared interrupt with qemu_set_irq() and another might deassert it, and
the level from the first device is lost.
This could be solved by refactoring the x86 GSI code to use an OrIrq
device, but that still wouldn't be ideal.
The best answer would be to have a 'resample' callback which is invoked
when the interrupt is acked at the interrupt controller, and causes the
devices to re-trigger the interrupt if it should still be pending. This
is the model that VFIO in Linux uses, with a 'resampler' eventfd that
actually unmasks the interrupt on the hardware device and thus triggers
a new interrupt from it if needed.
As things stand, QEMU currently doesn't use that VFIO interface
correctly, and just bashes on the resampler for every MMIO access to the
device "just in case". Which requires unmapping and trapping the MMIO
while an interrupt is pending!
For the Xen callback GSI, QEMU does something similar — a flag is set
which triggers a poll on *every* vmexst to see if the GSI should be
deasserted.
Proper resampler support would solve all of that, but is a task for
later which has already been on the TODO list for a while.
Since the Xen event channel GSI support *already* has hooks into the PC
gsi_handler() code for routing GSIs to PIRQs, we can use that for a
simpler bug fix.
So... remember the externally-driven state of the line (from e.g. PCI
INTx) and set the logical OR of that with the GSI. As a bonus, we now
only need to enable the polling of vcpu_info on vmexit if the Xen
callback GSI is the *only* reason the corresponding line is asserted.
Closes: https://gitlab.com/qemu-project/qemu/-/issues/2731
Fixes: ddf0fd9ae1fd ("hw/xen: Support HVM_PARAM_CALLBACK_TYPE_GSI callback")
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
v2:
• Fix comment indentation.
• Check '*level' not 'level' when actually asserting PIRQ.
• Call kvm_xen_set_callback_asserted() when needed.
• Reword commit message to explain why not using OrIrq.
hw/i386/kvm/xen_evtchn.c | 60 +++++++++++++++++++++++++++++++---------
hw/i386/kvm/xen_evtchn.h | 2 +-
hw/i386/x86-common.c | 32 +++++++++++++--------
3 files changed, 69 insertions(+), 25 deletions(-)
diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index bd2a3cbee0..58484f308e 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -140,6 +140,8 @@ struct XenEvtchnState {
uint64_t callback_param;
bool evtchn_in_kernel;
+ bool setting_callback_gsi;
+ int extern_gsi_level;
uint32_t callback_gsi;
QEMUBH *gsi_bh;
@@ -431,9 +433,22 @@ void xen_evtchn_set_callback_level(int level)
}
if (s->callback_gsi && s->callback_gsi < s->nr_callback_gsis) {
- qemu_set_irq(s->callback_gsis[s->callback_gsi], level);
- if (level) {
- /* Ensure the vCPU polls for deassertion */
+ /*
+ * Ugly, but since we hold the BQL we can set this flag so that
+ * xen_evtchn_set_gsi() can tell the difference between this code
+ * setting the GSI, and an external device (PCI INTx) doing so.
+ */
+ s->setting_callback_gsi = true;
+ /* Do not deassert the line if an external device is asserting it. */
+ qemu_set_irq(s->callback_gsis[s->callback_gsi],
+ level || s->extern_gsi_level);
+ s->setting_callback_gsi = false;
+
+ /*
+ * If the callback GSI is the only one asserted, ensure the status
+ * is polled for deassertion in kvm_arch_post_run().
+ */
+ if (level && !s->extern_gsi_level) {
kvm_xen_set_callback_asserted();
}
}
@@ -1596,7 +1611,7 @@ static int allocate_pirq(XenEvtchnState *s, int type, int gsi)
return pirq;
}
-bool xen_evtchn_set_gsi(int gsi, int level)
+bool xen_evtchn_set_gsi(int gsi, int *level)
{
XenEvtchnState *s = xen_evtchn_singleton;
int pirq;
@@ -1608,16 +1623,35 @@ bool xen_evtchn_set_gsi(int gsi, int level)
}
/*
- * Check that that it *isn't* the event channel GSI, and thus
- * that we are not recursing and it's safe to take s->port_lock.
- *
- * Locking aside, it's perfectly sane to bail out early for that
- * special case, as it would make no sense for the event channel
- * GSI to be routed back to event channels, when the delivery
- * method is to raise the GSI... that recursion wouldn't *just*
- * be a locking issue.
+ * For the callback_gsi we need to implement a logical OR of the event
+ * channel GSI and the external input (e.g. from PCI INTx), because
+ * QEMU itself doesn't support shared level interrupts via demux or
+ * resamplers.
*/
if (gsi && gsi == s->callback_gsi) {
+ /* Remember the external state of the GSI pin (e.g. from PCI INTx) */
+ if (!s->setting_callback_gsi) {
+ s->extern_gsi_level = *level;
+
+ /*
+ * Don't allow the external device to deassert the line if the
+ * eveht channel GSI should still be asserted.
+ */
+ if (!s->extern_gsi_level) {
+ struct vcpu_info *vi = kvm_xen_get_vcpu_info_hva(0);
+ if (vi && vi->evtchn_upcall_pending) {
+ /* Need to poll for deassertion */
+ kvm_xen_set_callback_asserted();
+ *level = 1;
+ }
+ }
+ }
+
+ /*
+ * The event channel GSI cannot be routed to PIRQ, as that would make
+ * no sense. It could also deadlock on s->port_lock, if we proceed.
+ * So bail out now.
+ */
return false;
}
@@ -1628,7 +1662,7 @@ bool xen_evtchn_set_gsi(int gsi, int level)
return false;
}
- if (level) {
+ if (*level) {
int port = s->pirq[pirq].port;
s->pirq_gsi_set |= (1U << gsi);
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index b740acfc0d..0521ebc092 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -23,7 +23,7 @@ void xen_evtchn_set_callback_level(int level);
int xen_evtchn_set_port(uint16_t port);
-bool xen_evtchn_set_gsi(int gsi, int level);
+bool xen_evtchn_set_gsi(int gsi, int *level);
void xen_evtchn_snoop_msi(PCIDevice *dev, bool is_msix, unsigned int vector,
uint64_t addr, uint32_t data, bool is_masked);
void xen_evtchn_remove_pci_device(PCIDevice *dev);
diff --git a/hw/i386/x86-common.c b/hw/i386/x86-common.c
index a7d46c3105..97b4f7d4a0 100644
--- a/hw/i386/x86-common.c
+++ b/hw/i386/x86-common.c
@@ -450,8 +450,27 @@ static long get_file_size(FILE *f)
void gsi_handler(void *opaque, int n, int level)
{
GSIState *s = opaque;
+ bool bypass_ioapic = false;
trace_x86_gsi_interrupt(n, level);
+
+#ifdef CONFIG_XEN_EMU
+ /*
+ * Xen delivers the GSI to the Legacy PIC (not that Legacy PIC
+ * routing actually works properly under Xen). And then to
+ * *either* the PIRQ handling or the I/OAPIC depending on whether
+ * the former wants it.
+ *
+ * Additionally, this hook allows the Xen event channel GSI to
+ * work around QEMU's lack of support for shared level interrupts,
+ * by keeping track of the externally driven state of the pin and
+ * implementing a logical OR with the state of the evtchn GSI.
+ */
+ if (xen_mode == XEN_EMULATE) {
+ bypass_ioapic = xen_evtchn_set_gsi(n, &level);
+ }
+#endif
+
switch (n) {
case 0 ... ISA_NUM_IRQS - 1:
if (s->i8259_irq[n]) {
@@ -460,18 +479,9 @@ void gsi_handler(void *opaque, int n, int level)
}
/* fall through */
case ISA_NUM_IRQS ... IOAPIC_NUM_PINS - 1:
-#ifdef CONFIG_XEN_EMU
- /*
- * Xen delivers the GSI to the Legacy PIC (not that Legacy PIC
- * routing actually works properly under Xen). And then to
- * *either* the PIRQ handling or the I/OAPIC depending on
- * whether the former wants it.
- */
- if (xen_mode == XEN_EMULATE && xen_evtchn_set_gsi(n, level)) {
- break;
+ if (!bypass_ioapic) {
+ qemu_set_irq(s->ioapic_irq[n], level);
}
-#endif
- qemu_set_irq(s->ioapic_irq[n], level);
break;
case IO_APIC_SECONDARY_IRQBASE
... IO_APIC_SECONDARY_IRQBASE + IOAPIC_NUM_PINS - 1:
--
2.47.0
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v2] hw/i386/pc: Fix level interrupt sharing for Xen event channel GSI
@ 2025-01-07 16:32 David Woodhouse
2025-01-07 16:38 ` Paul Durrant
2025-01-07 16:44 ` Philippe Mathieu-Daudé
0 siblings, 2 replies; 5+ messages in thread
From: David Woodhouse @ 2025-01-07 16:32 UTC (permalink / raw)
To: qemu-devel, Thomas Huth
Cc: Paul Durrant, Paolo Bonzini, Richard Henderson, Eduardo Habkost,
Michael S. Tsirkin, Marcel Apfelbaum
[-- Attachment #1: Type: text/plain, Size: 7836 bytes --]
From: David Woodhouse <dwmw@amazon.co.uk>
The system GSIs are not designed for sharing. One device might assert a
shared interrupt with qemu_set_irq() and another might deassert it, and
the level from the first device is lost.
This could be solved by using a multiplexer which functions as an OR
gate, much like the PCI code already implements for pci_set_irq() for
muxing the INTx lines.
Alternatively, it could be solved by having a 'resample' callback which
is invoked when the interrupt is acked at the interrupt controller, and
causes the devices to re-trigger the interrupt if it should still be
pending. This is the model that VFIO in Linux uses, with a 'resampler'
eventfd that actually unmasks the interrupt on the hardware device and
thus triggers a new interrupt from it if needed. QEMU currently doesn't
use that VFIO interface correctly, and just bashes on the resampler for
every MMIO access to the device "just in case".
This does neither of those. The Xen event channel GSI support *already*
has hooks into the PC gsi_handler() code, for routing GSIs to PIRQs. So
we can implement the logical OR of the external input (from PCI INTx,
serial etc.) with the Xen event channel GSI by allowing that existing
hook to modify the 'level' being asserted.
Closes: https://gitlab.com/qemu-project/qemu/-/issues/2731
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
v2:
• Fix deassertion of PIRQ GSI.
hw/i386/kvm/xen_evtchn.c | 50 +++++++++++++++++++++++++++++++---------
hw/i386/kvm/xen_evtchn.h | 2 +-
hw/i386/x86-common.c | 32 ++++++++++++++++---------
3 files changed, 61 insertions(+), 23 deletions(-)
diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index bd2a3cbee0..93f68790b0 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -140,6 +140,8 @@ struct XenEvtchnState {
uint64_t callback_param;
bool evtchn_in_kernel;
+ bool setting_callback_gsi;
+ int extern_gsi_level;
uint32_t callback_gsi;
QEMUBH *gsi_bh;
@@ -431,7 +433,16 @@ void xen_evtchn_set_callback_level(int level)
}
if (s->callback_gsi && s->callback_gsi < s->nr_callback_gsis) {
- qemu_set_irq(s->callback_gsis[s->callback_gsi], level);
+ /*
+ * Ugly, but since we hold the BQL we can set this flag so that
+ * xen_evtchn_set_gsi() can tell the difference between this code
+ * setting the GSI, and an external device (PCI INTx) doing so.
+ */
+ s->setting_callback_gsi = true;
+ /* Do not deassert the line if an external device is asserting it. */
+ qemu_set_irq(s->callback_gsis[s->callback_gsi],
+ level || s->extern_gsi_level);
+ s->setting_callback_gsi = false;
if (level) {
/* Ensure the vCPU polls for deassertion */
kvm_xen_set_callback_asserted();
@@ -1596,7 +1607,7 @@ static int allocate_pirq(XenEvtchnState *s, int type, int gsi)
return pirq;
}
-bool xen_evtchn_set_gsi(int gsi, int level)
+bool xen_evtchn_set_gsi(int gsi, int *level)
{
XenEvtchnState *s = xen_evtchn_singleton;
int pirq;
@@ -1608,16 +1619,33 @@ bool xen_evtchn_set_gsi(int gsi, int level)
}
/*
- * Check that that it *isn't* the event channel GSI, and thus
- * that we are not recursing and it's safe to take s->port_lock.
- *
- * Locking aside, it's perfectly sane to bail out early for that
- * special case, as it would make no sense for the event channel
- * GSI to be routed back to event channels, when the delivery
- * method is to raise the GSI... that recursion wouldn't *just*
- * be a locking issue.
+ * For the callback_gsi we need to implement a logical OR of the event
+ * channel GSI and the external input (e.g. from PCI INTx), because
+ * QEMU itself doesn't support shared level interrupts via demux or
+ * resamplers.
*/
if (gsi && gsi == s->callback_gsi) {
+ /* Remember the external state of the GSI pin (e.g. from PCI INTx) */
+ if (!s->setting_callback_gsi) {
+ s->extern_gsi_level = *level;
+
+ /*
+ * Don't allow the external device to deassert the line if the
+ * eveht channel GSI should still be asserted.
+ */
+ if (!s->extern_gsi_level) {
+ struct vcpu_info *vi = kvm_xen_get_vcpu_info_hva(0);
+ if (vi && vi->evtchn_upcall_pending) {
+ *level = 1;
+ }
+ }
+ }
+
+ /*
+ * The event channel GSI cannot be routed to PIRQ, as that would make
+ * no sense. It could also deadlock on s->port_lock, if we proceed.
+ * So bail out now.
+ */
return false;
}
@@ -1628,7 +1656,7 @@ bool xen_evtchn_set_gsi(int gsi, int level)
return false;
}
- if (level) {
+ if (*level) {
int port = s->pirq[pirq].port;
s->pirq_gsi_set |= (1U << gsi);
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index b740acfc0d..0521ebc092 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -23,7 +23,7 @@ void xen_evtchn_set_callback_level(int level);
int xen_evtchn_set_port(uint16_t port);
-bool xen_evtchn_set_gsi(int gsi, int level);
+bool xen_evtchn_set_gsi(int gsi, int *level);
void xen_evtchn_snoop_msi(PCIDevice *dev, bool is_msix, unsigned int vector,
uint64_t addr, uint32_t data, bool is_masked);
void xen_evtchn_remove_pci_device(PCIDevice *dev);
diff --git a/hw/i386/x86-common.c b/hw/i386/x86-common.c
index a7d46c3105..13badc26a5 100644
--- a/hw/i386/x86-common.c
+++ b/hw/i386/x86-common.c
@@ -450,8 +450,27 @@ static long get_file_size(FILE *f)
void gsi_handler(void *opaque, int n, int level)
{
GSIState *s = opaque;
+ bool bypass_ioapic = false;
trace_x86_gsi_interrupt(n, level);
+
+#ifdef CONFIG_XEN_EMU
+ /*
+ * Xen delivers the GSI to the Legacy PIC (not that Legacy PIC
+ * routing actually works properly under Xen). And then to
+ * *either* the PIRQ handling or the I/OAPIC depending on
+ * whether the former wants it.
+ *
+ * Additionally, this hook allows the Xen event channel GSI to
+ * work around QEMU's lack of support for shared level interrupts,
+ * by keeping track of the externally driven state of the pin and
+ * implementing a logical OR with the state of the evtchn GSI.
+ */
+ if (xen_mode == XEN_EMULATE) {
+ bypass_ioapic = xen_evtchn_set_gsi(n, &level);
+ }
+#endif
+
switch (n) {
case 0 ... ISA_NUM_IRQS - 1:
if (s->i8259_irq[n]) {
@@ -460,18 +479,9 @@ void gsi_handler(void *opaque, int n, int level)
}
/* fall through */
case ISA_NUM_IRQS ... IOAPIC_NUM_PINS - 1:
-#ifdef CONFIG_XEN_EMU
- /*
- * Xen delivers the GSI to the Legacy PIC (not that Legacy PIC
- * routing actually works properly under Xen). And then to
- * *either* the PIRQ handling or the I/OAPIC depending on
- * whether the former wants it.
- */
- if (xen_mode == XEN_EMULATE && xen_evtchn_set_gsi(n, level)) {
- break;
+ if (!bypass_ioapic) {
+ qemu_set_irq(s->ioapic_irq[n], level);
}
-#endif
- qemu_set_irq(s->ioapic_irq[n], level);
break;
case IO_APIC_SECONDARY_IRQBASE
... IO_APIC_SECONDARY_IRQBASE + IOAPIC_NUM_PINS - 1:
--
2.47.0
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v2] hw/i386/pc: Fix level interrupt sharing for Xen event channel GSI
2025-01-07 16:32 David Woodhouse
@ 2025-01-07 16:38 ` Paul Durrant
2025-01-07 16:58 ` David Woodhouse
2025-01-07 16:44 ` Philippe Mathieu-Daudé
1 sibling, 1 reply; 5+ messages in thread
From: Paul Durrant @ 2025-01-07 16:38 UTC (permalink / raw)
To: David Woodhouse, qemu-devel, Thomas Huth
Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
Michael S. Tsirkin, Marcel Apfelbaum
On 07/01/2025 16:32, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> The system GSIs are not designed for sharing. One device might assert a
> shared interrupt with qemu_set_irq() and another might deassert it, and
> the level from the first device is lost.
>
> This could be solved by using a multiplexer which functions as an OR
> gate, much like the PCI code already implements for pci_set_irq() for
> muxing the INTx lines.
>
> Alternatively, it could be solved by having a 'resample' callback which
> is invoked when the interrupt is acked at the interrupt controller, and
> causes the devices to re-trigger the interrupt if it should still be
> pending. This is the model that VFIO in Linux uses, with a 'resampler'
> eventfd that actually unmasks the interrupt on the hardware device and
> thus triggers a new interrupt from it if needed. QEMU currently doesn't
> use that VFIO interface correctly, and just bashes on the resampler for
> every MMIO access to the device "just in case".
>
> This does neither of those. The Xen event channel GSI support *already*
> has hooks into the PC gsi_handler() code, for routing GSIs to PIRQs. So
> we can implement the logical OR of the external input (from PCI INTx,
> serial etc.) with the Xen event channel GSI by allowing that existing
> hook to modify the 'level' being asserted.
>
> Closes: https://gitlab.com/qemu-project/qemu/-/issues/2731
> Reported-by: Thomas Huth <thuth@redhat.com>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> v2:
> • Fix deassertion of PIRQ GSI.
>
> hw/i386/kvm/xen_evtchn.c | 50 +++++++++++++++++++++++++++++++---------
> hw/i386/kvm/xen_evtchn.h | 2 +-
> hw/i386/x86-common.c | 32 ++++++++++++++++---------
> 3 files changed, 61 insertions(+), 23 deletions(-)
>
[snip]
> diff --git a/hw/i386/x86-common.c b/hw/i386/x86-common.c
> index a7d46c3105..13badc26a5 100644
> --- a/hw/i386/x86-common.c
> +++ b/hw/i386/x86-common.c
> @@ -450,8 +450,27 @@ static long get_file_size(FILE *f)
> void gsi_handler(void *opaque, int n, int level)
> {
> GSIState *s = opaque;
> + bool bypass_ioapic = false;
>
> trace_x86_gsi_interrupt(n, level);
> +
> +#ifdef CONFIG_XEN_EMU
> + /*
> + * Xen delivers the GSI to the Legacy PIC (not that Legacy PIC
> + * routing actually works properly under Xen). And then to
> + * *either* the PIRQ handling or the I/OAPIC depending on
> + * whether the former wants it.
> + *
> + * Additionally, this hook allows the Xen event channel GSI to
> + * work around QEMU's lack of support for shared level interrupts,
> + * by keeping track of the externally driven state of the pin and
> + * implementing a logical OR with the state of the evtchn GSI.
> + */
Looks like something went wrong with the indent here.
> + if (xen_mode == XEN_EMULATE) {
> + bypass_ioapic = xen_evtchn_set_gsi(n, &level);
> + }
> +#endif
> +
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2] hw/i386/pc: Fix level interrupt sharing for Xen event channel GSI
2025-01-07 16:38 ` Paul Durrant
@ 2025-01-07 16:58 ` David Woodhouse
0 siblings, 0 replies; 5+ messages in thread
From: David Woodhouse @ 2025-01-07 16:58 UTC (permalink / raw)
To: paul, qemu-devel, Thomas Huth
Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
Michael S. Tsirkin, Marcel Apfelbaum
[-- Attachment #1: Type: text/plain, Size: 926 bytes --]
On Tue, 2025-01-07 at 16:38 +0000, Paul Durrant wrote:
>
> > +#ifdef CONFIG_XEN_EMU
> > + /*
> > + * Xen delivers the GSI to the Legacy PIC (not that Legacy PIC
> > + * routing actually works properly under Xen). And then to
> > + * *either* the PIRQ handling or the I/OAPIC depending on
> > + * whether the former wants it.
> > + *
> > + * Additionally, this hook allows the Xen event channel GSI to
> > + * work around QEMU's lack of support for shared level interrupts,
> > + * by keeping track of the externally driven state of the pin and
> > + * implementing a logical OR with the state of the evtchn GSI.
> > + */
>
> Looks like something went wrong with the indent here.
Fixed the indentation for the comment to 4 spaces in my tree; thanks.
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2] hw/i386/pc: Fix level interrupt sharing for Xen event channel GSI
2025-01-07 16:32 David Woodhouse
2025-01-07 16:38 ` Paul Durrant
@ 2025-01-07 16:44 ` Philippe Mathieu-Daudé
1 sibling, 0 replies; 5+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-01-07 16:44 UTC (permalink / raw)
To: David Woodhouse, qemu-devel, Thomas Huth
Cc: Paul Durrant, Paolo Bonzini, Richard Henderson, Eduardo Habkost,
Michael S. Tsirkin, Marcel Apfelbaum, Bernhard Beschow
(Cc'ing Bernhard)
On 7/1/25 17:32, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> The system GSIs are not designed for sharing. One device might assert a
> shared interrupt with qemu_set_irq() and another might deassert it, and
> the level from the first device is lost.
>
> This could be solved by using a multiplexer which functions as an OR
> gate, much like the PCI code already implements for pci_set_irq() for
> muxing the INTx lines.
>
> Alternatively, it could be solved by having a 'resample' callback which
> is invoked when the interrupt is acked at the interrupt controller, and
> causes the devices to re-trigger the interrupt if it should still be
> pending. This is the model that VFIO in Linux uses, with a 'resampler'
> eventfd that actually unmasks the interrupt on the hardware device and
> thus triggers a new interrupt from it if needed. QEMU currently doesn't
> use that VFIO interface correctly, and just bashes on the resampler for
> every MMIO access to the device "just in case".
>
> This does neither of those. The Xen event channel GSI support *already*
> has hooks into the PC gsi_handler() code, for routing GSIs to PIRQs. So
> we can implement the logical OR of the external input (from PCI INTx,
> serial etc.) with the Xen event channel GSI by allowing that existing
> hook to modify the 'level' being asserted.
>
> Closes: https://gitlab.com/qemu-project/qemu/-/issues/2731
> Reported-by: Thomas Huth <thuth@redhat.com>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> v2:
> • Fix deassertion of PIRQ GSI.
>
> hw/i386/kvm/xen_evtchn.c | 50 +++++++++++++++++++++++++++++++---------
> hw/i386/kvm/xen_evtchn.h | 2 +-
> hw/i386/x86-common.c | 32 ++++++++++++++++---------
> 3 files changed, 61 insertions(+), 23 deletions(-)
>
> diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
> index bd2a3cbee0..93f68790b0 100644
> --- a/hw/i386/kvm/xen_evtchn.c
> +++ b/hw/i386/kvm/xen_evtchn.c
> @@ -140,6 +140,8 @@ struct XenEvtchnState {
>
> uint64_t callback_param;
> bool evtchn_in_kernel;
> + bool setting_callback_gsi;
> + int extern_gsi_level;
> uint32_t callback_gsi;
>
> QEMUBH *gsi_bh;
> @@ -431,7 +433,16 @@ void xen_evtchn_set_callback_level(int level)
> }
>
> if (s->callback_gsi && s->callback_gsi < s->nr_callback_gsis) {
> - qemu_set_irq(s->callback_gsis[s->callback_gsi], level);
> + /*
> + * Ugly, but since we hold the BQL we can set this flag so that
> + * xen_evtchn_set_gsi() can tell the difference between this code
> + * setting the GSI, and an external device (PCI INTx) doing so.
> + */
> + s->setting_callback_gsi = true;
> + /* Do not deassert the line if an external device is asserting it. */
> + qemu_set_irq(s->callback_gsis[s->callback_gsi],
> + level || s->extern_gsi_level);
> + s->setting_callback_gsi = false;
> if (level) {
> /* Ensure the vCPU polls for deassertion */
> kvm_xen_set_callback_asserted();
> @@ -1596,7 +1607,7 @@ static int allocate_pirq(XenEvtchnState *s, int type, int gsi)
> return pirq;
> }
>
> -bool xen_evtchn_set_gsi(int gsi, int level)
> +bool xen_evtchn_set_gsi(int gsi, int *level)
> {
> XenEvtchnState *s = xen_evtchn_singleton;
> int pirq;
> @@ -1608,16 +1619,33 @@ bool xen_evtchn_set_gsi(int gsi, int level)
> }
>
> /*
> - * Check that that it *isn't* the event channel GSI, and thus
> - * that we are not recursing and it's safe to take s->port_lock.
> - *
> - * Locking aside, it's perfectly sane to bail out early for that
> - * special case, as it would make no sense for the event channel
> - * GSI to be routed back to event channels, when the delivery
> - * method is to raise the GSI... that recursion wouldn't *just*
> - * be a locking issue.
> + * For the callback_gsi we need to implement a logical OR of the event
> + * channel GSI and the external input (e.g. from PCI INTx), because
> + * QEMU itself doesn't support shared level interrupts via demux or
> + * resamplers.
> */
> if (gsi && gsi == s->callback_gsi) {
> + /* Remember the external state of the GSI pin (e.g. from PCI INTx) */
> + if (!s->setting_callback_gsi) {
> + s->extern_gsi_level = *level;
> +
> + /*
> + * Don't allow the external device to deassert the line if the
> + * eveht channel GSI should still be asserted.
> + */
> + if (!s->extern_gsi_level) {
> + struct vcpu_info *vi = kvm_xen_get_vcpu_info_hva(0);
> + if (vi && vi->evtchn_upcall_pending) {
> + *level = 1;
> + }
> + }
> + }
> +
> + /*
> + * The event channel GSI cannot be routed to PIRQ, as that would make
> + * no sense. It could also deadlock on s->port_lock, if we proceed.
> + * So bail out now.
> + */
> return false;
> }
>
> @@ -1628,7 +1656,7 @@ bool xen_evtchn_set_gsi(int gsi, int level)
> return false;
> }
>
> - if (level) {
> + if (*level) {
> int port = s->pirq[pirq].port;
>
> s->pirq_gsi_set |= (1U << gsi);
> diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
> index b740acfc0d..0521ebc092 100644
> --- a/hw/i386/kvm/xen_evtchn.h
> +++ b/hw/i386/kvm/xen_evtchn.h
> @@ -23,7 +23,7 @@ void xen_evtchn_set_callback_level(int level);
>
> int xen_evtchn_set_port(uint16_t port);
>
> -bool xen_evtchn_set_gsi(int gsi, int level);
> +bool xen_evtchn_set_gsi(int gsi, int *level);
> void xen_evtchn_snoop_msi(PCIDevice *dev, bool is_msix, unsigned int vector,
> uint64_t addr, uint32_t data, bool is_masked);
> void xen_evtchn_remove_pci_device(PCIDevice *dev);
> diff --git a/hw/i386/x86-common.c b/hw/i386/x86-common.c
> index a7d46c3105..13badc26a5 100644
> --- a/hw/i386/x86-common.c
> +++ b/hw/i386/x86-common.c
> @@ -450,8 +450,27 @@ static long get_file_size(FILE *f)
> void gsi_handler(void *opaque, int n, int level)
> {
> GSIState *s = opaque;
> + bool bypass_ioapic = false;
>
> trace_x86_gsi_interrupt(n, level);
> +
> +#ifdef CONFIG_XEN_EMU
> + /*
> + * Xen delivers the GSI to the Legacy PIC (not that Legacy PIC
> + * routing actually works properly under Xen). And then to
> + * *either* the PIRQ handling or the I/OAPIC depending on
> + * whether the former wants it.
> + *
> + * Additionally, this hook allows the Xen event channel GSI to
> + * work around QEMU's lack of support for shared level interrupts,
> + * by keeping track of the externally driven state of the pin and
> + * implementing a logical OR with the state of the evtchn GSI.
> + */
> + if (xen_mode == XEN_EMULATE) {
> + bypass_ioapic = xen_evtchn_set_gsi(n, &level);
> + }
> +#endif
> +
> switch (n) {
> case 0 ... ISA_NUM_IRQS - 1:
> if (s->i8259_irq[n]) {
> @@ -460,18 +479,9 @@ void gsi_handler(void *opaque, int n, int level)
> }
> /* fall through */
> case ISA_NUM_IRQS ... IOAPIC_NUM_PINS - 1:
> -#ifdef CONFIG_XEN_EMU
> - /*
> - * Xen delivers the GSI to the Legacy PIC (not that Legacy PIC
> - * routing actually works properly under Xen). And then to
> - * *either* the PIRQ handling or the I/OAPIC depending on
> - * whether the former wants it.
> - */
> - if (xen_mode == XEN_EMULATE && xen_evtchn_set_gsi(n, level)) {
> - break;
> + if (!bypass_ioapic) {
> + qemu_set_irq(s->ioapic_irq[n], level);
> }
> -#endif
> - qemu_set_irq(s->ioapic_irq[n], level);
> break;
> case IO_APIC_SECONDARY_IRQBASE
> ... IO_APIC_SECONDARY_IRQBASE + IOAPIC_NUM_PINS - 1:
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-01-08 12:21 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-08 12:12 [PATCH v2] hw/i386/pc: Fix level interrupt sharing for Xen event channel GSI David Woodhouse
-- strict thread matches above, loose matches on Subject: below --
2025-01-07 16:32 David Woodhouse
2025-01-07 16:38 ` Paul Durrant
2025-01-07 16:58 ` David Woodhouse
2025-01-07 16:44 ` Philippe Mathieu-Daudé
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).