* [PATCH] KVM: x86: skip userspace IOAPIC EOI exit when Directed EOI is enabled
@ 2025-09-18 16:25 Jon Kohler
2025-09-22 21:51 ` Sean Christopherson
2025-10-24 12:08 ` Khushit Shah
0 siblings, 2 replies; 15+ messages in thread
From: Jon Kohler @ 2025-09-18 16:25 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, kvm,
linux-kernel
Cc: jon, Khushit Shah
From: Khushit Shah <khushit.shah@nutanix.com>
Problem:
We observed Windows w/ HyperV getting stuck during boot because of
level triggered interrupt storm. This is because KVM currently
does not respect Directed EOI bit set by guest in split-irqchip
mode.
We observed the following ACTUAL sequence on Windows guests with
Directed EOI enabled:
1. Guest issues an APIC EOI.
2. The interrupt is injected into L2 and serviced.
3. Guest issues an IOAPIC EOI.
But, with the current behavior in split-irqchip mode:
1. Guest issues an APIC EOI.
2. KVM exits to userspace and QEMU's ioapic_service reasserts the
interrupt because the line is not yet deasserted.
3. Steps 1 and 2 keep looping, and hence no progress is made.
(logs at the bug linked below).
This is because in split-irqchip mode, KVM requests a userspace IOAPIC
EOI exit on every APIC EOI. However, if the guest sets the Directed EOI
bit in the APIC Spurious Interrupt Vector Register (SPIV, bit 12), per
the x2APIC specification, the APIC does not broadcast EOIs to the IOAPIC.
In this case, it is the guest's responsibility to explicitly EOI the
IOAPIC by writing to its EOI register.
kernel-irqchip mode already handles this similarly in
kvm_ioapic_update_eoi_one().
Link: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com/
Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
---
arch/x86/kvm/lapic.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0725d2cae742..a81e71ad5bda 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1473,6 +1473,10 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
/* Request a KVM exit to inform the userspace IOAPIC. */
if (irqchip_split(apic->vcpu->kvm)) {
+ /* EOI the ioapic only if the Directed EOI is disabled. */
+ if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
+ return;
+
apic->vcpu->arch.pending_ioapic_eoi = vector;
kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
return;
--
2.43.0
* Re: [PATCH] KVM: x86: skip userspace IOAPIC EOI exit when Directed EOI is enabled
2025-09-18 16:25 [PATCH] KVM: x86: skip userspace IOAPIC EOI exit when Directed EOI is enabled Jon Kohler
@ 2025-09-22 21:51 ` Sean Christopherson
2025-09-23 1:26 ` Huang, Kai
2025-09-23 3:32 ` Khushit Shah
2025-10-24 12:08 ` Khushit Shah
1 sibling, 2 replies; 15+ messages in thread
From: Sean Christopherson @ 2025-09-22 21:51 UTC (permalink / raw)
To: Jon Kohler
Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, kvm, linux-kernel, Khushit Shah
It'd be super helpful for downstream folks to mention "split IRQCHIP" somewhere
in the shortlog, e.g.
KVM: x86: Suppress EOI broadcasts with split IRQCHIP if Directed EOI is enabled
On Thu, Sep 18, 2025, Jon Kohler wrote:
> From: Khushit Shah <khushit.shah@nutanix.com>
>
> Problem:
Please read Documentation/process/maintainer-kvm-x86.rst. I'll die on the hill
that leading with a problem statement results in a less efficient changelog.
And nowhere in the changelog does it state what change is actually being made.
The shortlog calls it out, but the shortlog isn't always visible, e.g. when
reviewing the initial patch email.
> We observed Windows w/ HyperV getting stuck during boot because of
No "we". Over time, who "we" is can become unclear.
> level triggered interrupt storm. This is because KVM currently
> does not respect Directed EOI bit set by guest in split-irqchip
> mode.
>
> We observed the following ACTUAL sequence on Windows guests with
Uber nit, generally speaking, use asterisks or underscores to emphasize a word.
Using all caps suggests there is special meaning to the word, e.g. that it's an
acronym or something. I am guilty of using ALL CAPS, but I'm pretty sure only
for "do NOT", where I think/hope the intent is clear. For whatever reason, I
kept doing double-takes when reading this sentence.
Though to be honest, I would omit the whole "actual" part. There's an assumption
that everyone is acting in good faith, i.e. that contributors aren't fabricating
a bug report or making things up.
> Directed EOI enabled:
> 1. Guest issues an APIC EOI.
> 2. The interrupt is injected into L2 and serviced.
> 3. Guest issues an IOAPIC EOI.
>
> But, with the current behavior in split-irqchip mode:
> 1. Guest issues an APIC EOI.
> 2. KVM exits to userspace and QEMU's ioapic_service reasserts the
> interrupt because the line is not yet deasserted.
> 3. Steps 1 and 2 keep looping, and hence no progress is made.
> (logs at the bug linked below).
Eh, for the logs, there's a bug report, just use Closes. And honestly, I would
shorten all of the above. This is a fairly straightforward bug, providing a
super detailed play-by-play actually makes it _harder_ to understand what's
going on.
> This is because in split-irqchip mode, KVM requests a userspace IOAPIC
Please don't use QEMU's terminology when describing KVM bugs.
> EOI exit on every APIC EOI.
This is wrong. Every *intercepted* EOI, but not all EOIs are intercepted.
> However, if the guest sets the Directed EOI bit in the APIC Spurious
> Interrupt Vector Register (SPIV, bit 12), per the x2APIC specification,
There is no singular x2APIC specification. x2APIC is defined by both Intel's SDM
and AMD's APM. Usually the Intel and AMD specs are compatible, key word "usually".
Case in point, I can't find anything in the APM that suggests AMD CPUs support
"Suppress EOI Broadcasts".
Of course, that could just be an APM documentation bug, since the bit appears
writable on Turin (though not on Milan).
> the APIC does not broadcast EOIs to the IOAPIC. In this case, it is the
> guest's responsibility to explicitly EOI the IOAPIC by writing to its EOI
> register.
>
> kernel-irqchip mode already handles this similarly in
> kvm_ioapic_update_eoi_one().
Hmm, I'm pretty sure that's dead code since commit 0bcc3fb95b97 ("KVM: lapic: stop
advertising DIRECTED_EOI when in-kernel IOAPIC is in use").
And Fudge with a capital 'F', because I'm pretty sure we're going to need at least
a new CAP, and probably a quirk too.
As per the aforementioned commit, advertising DIRECTED_EOI without an I/O APIC
that emulates the EOI register will make for a very sad guest. So simply fixing
KVM will break existing setups, even though it's unequivocally the correct behavior.
And unfortunately, I _know_ there is at least one VMM in production that doesn't
support EOIs in the I/O APIC.
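For readers unfamiliar with what "emulating the EOI register" involves on the I/O APIC side, here is a rough user-space sketch in C. It assumes the usual redirection table entry layout (vector in bits 7:0, Remote IRR in bit 14, trigger mode in bit 15) and a 24-pin I/O APIC. The `ioapic_model` struct and `ioapic_model_eoi()` are hypothetical names for illustration only and are not taken from KVM or from any VMM.

```c
#include <stdint.h>

#define IOAPIC_NUM_PINS   24
#define RTE_VECTOR_MASK   UINT64_C(0xff)
#define RTE_REMOTE_IRR    (UINT64_C(1) << 14)
#define RTE_TRIGGER_LEVEL (UINT64_C(1) << 15)

/* Hypothetical, minimal I/O APIC state: just the redirection table. */
struct ioapic_model {
	uint64_t rte[IOAPIC_NUM_PINS];
};

/*
 * Model of a guest write of 'vector' to the I/O APIC EOI register: clear
 * Remote IRR on every level-triggered entry that is in service for that
 * vector.  Returns the number of entries that were cleared.
 */
static int ioapic_model_eoi(struct ioapic_model *s, uint8_t vector)
{
	int cleared = 0;

	for (int pin = 0; pin < IOAPIC_NUM_PINS; pin++) {
		uint64_t e = s->rte[pin];

		if ((e & RTE_VECTOR_MASK) != vector)
			continue;
		/* Only level-triggered entries with Remote IRR set need an EOI. */
		if (!(e & RTE_TRIGGER_LEVEL) || !(e & RTE_REMOTE_IRR))
			continue;

		s->rte[pin] = e & ~RTE_REMOTE_IRR;
		cleared++;
	}
	return cleared;
}
```

A userspace I/O APIC that lacks this register has no way to clear Remote IRR once the local APIC stops broadcasting EOIs, which is why suppressing the broadcast unconditionally would leave such a VMM with interrupts stuck in service.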
> Link: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com/
As above, this should be Closes, not Link.
Closes: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com
This also needs:
Fixes: 7543a635aa09 ("KVM: x86: Add KVM exit for IOAPIC EOIs")
Cc: stable@vger.kernel.org
> Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
> ---
> arch/x86/kvm/lapic.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 0725d2cae742..a81e71ad5bda 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1473,6 +1473,10 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>
> /* Request a KVM exit to inform the userspace IOAPIC. */
> if (irqchip_split(apic->vcpu->kvm)) {
> + /* EOI the ioapic only if the Directed EOI is disabled. */
Capitalize I/O APIC (ugh, the above uses IOAPIC, and further above uses ioapic).
I'd just avoid mentioning I/O APIC at this point, it should be super obvious
what's happening.
"the Directed EOI" doesn't parse. Directed EOI is a feature, not a "thing".
"the ioapic" is also wrong in the sense that KVM has no idea how many I/O APICs
are being emulated by userspace. Could be 1, could be 0 or 2.
Don't bother sending a v2, at least not yet. I'll follow-up with various folks
to try and figure out the least awful way to get out of this mess, and will
probably post a v2 with a CAP and/or quirk.
In the meantime, I have the below sitting in a local branch:
--
From: Khushit Shah <khushit.shah@nutanix.com>
Date: Thu, 18 Sep 2025 09:25:28 -0700
Subject: [PATCH] KVM: x86: Suppress EOI broadcasts with split IRQCHIP if
Directed EOI is enabled
Do not generate a KVM_EXIT_IOAPIC_EOI exit to userspace when handling EOIs
for a split IRQCHIP and the vCPU has enabled Directed EOIs in its local
APIC, i.e. if the guest has set "Suppress EOI Broadcasts" in Intel
parlance.
Incorrectly broadcasting EOIs can lead to a potentially fatal interrupt
storm if the IRQ line is still asserted and userspace reacts to the EOI by
re-injecting the IRQ. E.g. Windows with Hyper-V enabled gets stuck during
boot when running under QEMU with a split IRQCHIP.
Note, Suppress EOI Broadcasts is defined only in Intel's SDM, not in AMD's
APM. But the bit is writable on some AMD CPUs, e.g. Turin, and KVM's ABI
is to support Directed EOI (KVM's name) irrespective of guest CPU vendor.
Note #2, KVM doesn't support Directed EOIs for its in-kernel I/O APIC.
See commit 0bcc3fb95b97 ("KVM: lapic: stop advertising DIRECTED_EOI when
in-kernel IOAPIC is in use").
Fixes: 7543a635aa09 ("KVM: x86: Add KVM exit for IOAPIC EOIs")
Cc: stable@vger.kernel.org
Closes: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com
Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
Link: https://lore.kernel.org/r/20250918162529.640943-1-jon@nutanix.com
[sean: rewrite changelog and comment]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/lapic.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5fc437341e03..4d77112b887d 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1429,6 +1429,15 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
/* Request a KVM exit to inform the userspace IOAPIC. */
if (irqchip_split(apic->vcpu->kvm)) {
+ /*
+ * Don't exit to userspace if the guest has enabled Directed
+ * EOI, a.k.a. Suppress EOI Broadcasts, in which the local APIC
+ * doesn't broadcast EOIs (the the guest must EOI the target
+ * I/O APIC(s) directly).
+ */
+ if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
+ return;
+
apic->vcpu->arch.pending_ioapic_eoi = vector;
kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
return;
base-commit: 07e27ad16399afcd693be20211b0dfae63e0615f
--
* Re: [PATCH] KVM: x86: skip userspace IOAPIC EOI exit when Directed EOI is enabled
2025-09-22 21:51 ` Sean Christopherson
@ 2025-09-23 1:26 ` Huang, Kai
2025-09-23 3:32 ` Khushit Shah
1 sibling, 0 replies; 15+ messages in thread
From: Huang, Kai @ 2025-09-23 1:26 UTC (permalink / raw)
To: Kohler, Jon, seanjc@google.com
Cc: khushit.shah@nutanix.com, x86@kernel.org,
dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
tglx@linutronix.de, bp@alien8.de, kvm@vger.kernel.org,
pbonzini@redhat.com, linux-kernel@vger.kernel.org
> From: Khushit Shah <khushit.shah@nutanix.com>
> Date: Thu, 18 Sep 2025 09:25:28 -0700
> Subject: [PATCH] KVM: x86: Suppress EOI broadcasts with split IRQCHIP if
> Directed EOI is enabled
>
> Do not generate a KVM_EXIT_IOAPIC_EOI exit to userspace when handling EOIs
> for a split IRQCHIP and the vCPU has enabled Directed EOIs in its local
> APIC, i.e. if the guest has set "Suppress EOI Broadcasts" in Intel
> parlance.
>
> Incorrectly broadcasting EOIs can lead to a potentially fatal interrupt
> storm if the IRQ line is still asserted and userspace reacts to the EOI by
> re-injecting the IRQ. E.g. Windows with Hyper-V enabled gets stuck during
> boot when running under QEMU with a split IRQCHIP.
>
> Note, Suppress EOI Broadcasts is defined only in Intel's SDM, not in AMD's
> APM. But the bit is writable on some AMD CPUs, e.g. Turin, and KVM's ABI
> is to support Directed EOI (KVM's name) irrespective of guest CPU vendor.
>
> Note #2, KVM doesn't support Directed EOIs for its in-kernel I/O APIC.
> See commit 0bcc3fb95b97 ("KVM: lapic: stop advertising DIRECTED_EOI when
> in-kernel IOAPIC is in use").
>
> Fixes: 7543a635aa09 ("KVM: x86: Add KVM exit for IOAPIC EOIs")
> Cc: stable@vger.kernel.org
> Closes: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com
> Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
> Link: https://lore.kernel.org/r/20250918162529.640943-1-jon@nutanix.com
> [sean: rewrite changelog and comment]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
> ---
> arch/x86/kvm/lapic.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 5fc437341e03..4d77112b887d 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1429,6 +1429,15 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>
> /* Request a KVM exit to inform the userspace IOAPIC. */
> if (irqchip_split(apic->vcpu->kvm)) {
> + /*
> + * Don't exit to userspace if the guest has enabled Directed
> + * EOI, a.k.a. Suppress EOI Broadcasts, in which the local APIC
> + * doesn't broadcast EOIs (the the guest must EOI the target
> + * I/O APIC(s) directly).
> + */
> + if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> + return;
> +
> apic->vcpu->arch.pending_ioapic_eoi = vector;
> kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
> return;
>
> base-commit: 07e27ad16399afcd693be20211b0dfae63e0615f
> --
* Re: [PATCH] KVM: x86: skip userspace IOAPIC EOI exit when Directed EOI is enabled
2025-09-22 21:51 ` Sean Christopherson
2025-09-23 1:26 ` Huang, Kai
@ 2025-09-23 3:32 ` Khushit Shah
2025-10-03 14:24 ` Khushit Shah
1 sibling, 1 reply; 15+ messages in thread
From: Khushit Shah @ 2025-09-23 3:32 UTC (permalink / raw)
To: Sean Christopherson
Cc: Jon Kohler, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86@kernel.org, H. Peter Anvin,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Thanks for the very detailed review.
> On 23 Sep 2025, at 3:21 AM, Sean Christopherson <seanjc@google.com> wrote:
>
> Don't bother sending a v2, at least not yet. I'll follow-up with various folks
> to try and figure out the least awful way to get out of this mess, and will
> probably post a v2 with a CAP and/or quirk.
I am happy to help with v2!
Reported-by: Khushit Shah <khushit.shah@nutanix.com>
Tested-by: Khushit Shah <khushit.shah@nutanix.com>
> On 23 Sep 2025, at 3:21 AM, Sean Christopherson <seanjc@google.com> wrote:
>
> /* Request a KVM exit to inform the userspace IOAPIC. */
> if (irqchip_split(apic->vcpu->kvm)) {
> + /*
> + * Don't exit to userspace if the guest has enabled Directed
> + * EOI, a.k.a. Suppress EOI Broadcasts, in which the local APIC
> + * doesn't broadcast EOIs (the the guest must EOI the target
> + * I/O APIC(s) directly).
> + */
> +
Nit: small typo in the comment (“the the”).
* Re: [PATCH] KVM: x86: skip userspace IOAPIC EOI exit when Directed EOI is enabled
2025-09-23 3:32 ` Khushit Shah
@ 2025-10-03 14:24 ` Khushit Shah
2025-10-24 20:21 ` Sean Christopherson
0 siblings, 1 reply; 15+ messages in thread
From: Khushit Shah @ 2025-10-03 14:24 UTC (permalink / raw)
To: Sean Christopherson
Cc: Jon Kohler, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86@kernel.org, H. Peter Anvin,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Hi Sean,
Any updates on this?
I suggest adding a new KVM capability that disables advertising support for EOI
broadcast suppression when using split-irqchip. It is similar in spirit to
KVM_CAP_X2APIC_API for x2APIC quirks.
By default, we still assume the userspace I/O APIC implements the EOI register.
If it does not, userspace can set a flag before vCPU creation (after selecting
split-irqchip mode) to disable EOI broadcast suppression. This should be a
per-VM flag, as all APICs will share the same behavior. I am sharing a
preliminary diff for discussion. The earlier fix can sit on top of this. This just
allows disabling EOI broadcast suppression under split-irqchip.
What are your thoughts on this? If this seems reasonable, I can send a proper
patch.
Apologies if sending an inline diff isn’t standard procedure.
Thanks,
Khushit
---
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f19a76d3ca0e..8e087232dbcd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1457,6 +1457,8 @@ struct kvm_arch {
bool disabled_lapic_found;
+ bool disable_eoi_broadcast_suppression_support;
+
bool x2apic_format;
bool x2apic_broadcast_quirk_disabled;
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 0f15d683817d..e822ed4310f5 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -879,6 +879,8 @@ struct kvm_sev_snp_launch_finish {
__u64 pad1[4];
};
+#define KVM_SPLIT_IRQCHIP_API_DISABLE_EOI_BROADCAST_SUPPRESSION (1ULL << 0)
+
#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 4d77112b887d..1a077b5a75d7 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -558,7 +558,8 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
* IOAPIC.
*/
if (guest_cpu_cap_has(vcpu, X86_FEATURE_X2APIC) &&
- !ioapic_in_kernel(vcpu->kvm))
+ !ioapic_in_kernel(vcpu->kvm) &&
+ !vcpu->kvm->arch.disable_eoi_broadcast_suppression_support)
v |= APIC_LVR_DIRECTED_EOI;
kvm_lapic_set_reg(apic, APIC_LVR, v);
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 706b6fd56d3c..9884c780138a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4785,6 +4785,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_VM_TSC_CONTROL:
r = kvm_caps.has_tsc_control;
break;
+ case KVM_CAP_SPLIT_IRQCHIP_API:
+ r = KVM_SPLIT_IRQCHIP_API_DISABLE_EOI_BROADCAST_SUPPRESSION;
+ break;
case KVM_CAP_X2APIC_API:
r = KVM_X2APIC_API_VALID_FLAGS;
break;
@@ -6455,6 +6458,23 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
mutex_unlock(&kvm->lock);
break;
}
+ case KVM_CAP_SPLIT_IRQCHIP_API: {
+ mutex_lock(&kvm->lock);
+ if (!irqchip_split(kvm)) {
+ r = -ENXIO;
+ goto split_irqchip_api_unlock;
+ }
+ if (kvm->created_vcpus) {
+ r = -EINVAL;
+ goto split_irqchip_api_unlock;
+ }
+ kvm->arch.disable_eoi_broadcast_suppression_support = (cap->args[0]
+ & KVM_SPLIT_IRQCHIP_API_DISABLE_EOI_BROADCAST_SUPPRESSION) != 0;
+ r = 0;
+split_irqchip_api_unlock:
+ mutex_unlock(&kvm->lock);
+ break;
+ }
case KVM_CAP_X2APIC_API:
r = -EINVAL;
if (cap->args[0] & ~KVM_X2APIC_API_VALID_FLAGS)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f0f0d49d2544..732a93f9365e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -962,6 +962,7 @@ struct kvm_enable_cap {
#define KVM_CAP_ARM_EL2_E2H0 241
#define KVM_CAP_RISCV_MP_STATE_RESET 242
#define KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED 243
+#define KVM_CAP_SPLIT_IRQCHIP_API 244
struct kvm_irq_routing_irqchip {
__u32 irqchip;
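The ordering constraints in the proposed KVM_CAP_SPLIT_IRQCHIP_API handler (split IRQCHIP must already be selected, and no vCPUs may exist yet) can be summarized with a small stand-alone model. This is only a sketch that mirrors the checks in the preliminary diff above; `vm_model` and `vm_model_enable_split_irqchip_api()` are hypothetical names, and no locking is modeled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the flag proposed in the diff above (illustrative copy). */
#define DISABLE_EOI_BROADCAST_SUPPRESSION (UINT64_C(1) << 0)

/* Hypothetical stand-in for the few bits of VM state the handler looks at. */
struct vm_model {
	bool split_irqchip;
	unsigned int created_vcpus;
	bool disable_eoi_broadcast_suppression_support;
};

/* Returns 0 on success or a negative errno, following the proposed handler. */
static int vm_model_enable_split_irqchip_api(struct vm_model *vm, uint64_t flags)
{
	/* The capability only makes sense after split IRQCHIP is selected. */
	if (!vm->split_irqchip)
		return -ENXIO;

	/* The APIC version register is set up at vCPU creation, so it is too late. */
	if (vm->created_vcpus)
		return -EINVAL;

	vm->disable_eoi_broadcast_suppression_support =
		(flags & DISABLE_EOI_BROADCAST_SUPPRESSION) != 0;
	return 0;
}
```

Under this model a VMM would enable split IRQCHIP first, then set the flag, and only then create vCPUs; any other order fails without changing state.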
* Re: [PATCH] KVM: x86: skip userspace IOAPIC EOI exit when Directed EOI is enabled
2025-09-18 16:25 [PATCH] KVM: x86: skip userspace IOAPIC EOI exit when Directed EOI is enabled Jon Kohler
2025-09-22 21:51 ` Sean Christopherson
@ 2025-10-24 12:08 ` Khushit Shah
1 sibling, 0 replies; 15+ messages in thread
From: Khushit Shah @ 2025-10-24 12:08 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: Jon Kohler, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86@kernel.org, H. Peter Anvin, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org
Hi All,
Just following up on the patch series. The initial fix and the proposed KVM_CAP_SPLIT_IRQCHIP_API addition are both ready for further discussion or revision if needed.
Let me know if there’s been any movement on this or if I should post a v2.
Thanks,
Khushit
* Re: [PATCH] KVM: x86: skip userspace IOAPIC EOI exit when Directed EOI is enabled
2025-10-03 14:24 ` Khushit Shah
@ 2025-10-24 20:21 ` Sean Christopherson
2025-10-31 12:46 ` Khushit Shah
0 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2025-10-24 20:21 UTC (permalink / raw)
To: Khushit Shah
Cc: Jon Kohler, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86@kernel.org, H. Peter Anvin,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org
On Fri, Oct 03, 2025, Khushit Shah wrote:
> Hi Sean,
>
> Any updates on this?
Sorry, fell into the classic pattern of "I'll do that one tomorrow...".
> I suggest adding a new KVM capability that disables advertising support for EOI
> broadcast suppression when using split-irqchip. It is similar in spirit to
> KVM_CAP_X2APIC_API for x2APIC quirks.
>
> By default, we still assume the userspace I/O APIC implements the EOI register.
> If it does not, userspace can set a flag before vCPU creation (after selecting
> split-irqchip mode) to disable EOI broadcast suppression. This should be a
> per-VM flag, as all APICs will share the same behavior. I am sharing a
> preliminary diff for discussion. The earlier fix can sit on top of this. This just
> allows disabling EOI broadcast suppression under split-irqchip.
>
> What are your thoughts on this? If this seems reasonable, I can send a proper
> patch.
Make it a quirk instead of a capability. This is definitely a KVM bug, it's just
unfortunately one that we can't fix without breaking userspace :-/
And I'm pretty sure we want to quirk the exit to userspace, not the enumeration
of and support for the feature, e.g. so that an updated userspace VMM can disable
the quirk on a live update/migration and take advantage of the fanciness without
having to wait for guests to reboot.
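To make the "quirk the exit, not the enumeration" idea concrete, the decision can be written as a two-input truth table. The sketch below is a user-space model, not KVM code: it assumes Directed EOI is bit 12 of SPIV (as stated in the original changelog) and that the quirk follows the usual KVM convention of being enabled by default until userspace disables it. `exits_to_userspace_on_eoi()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 12 of the Spurious Interrupt Vector Register (illustrative name). */
#define SPIV_DIRECTED_EOI (UINT32_C(1) << 12)

/*
 * Hypothetical model of the quirk-gated check: should a local APIC EOI
 * still be forwarded to the userspace I/O APIC?
 *
 *   quirk enabled  (legacy behavior): always forward, ignore the guest's bit
 *   quirk disabled (fixed behavior):  forward only if the guest has not
 *                                     enabled Directed EOI
 */
static bool exits_to_userspace_on_eoi(uint32_t spiv, bool quirk_enabled)
{
	bool suppress = (spiv & SPIV_DIRECTED_EOI) != 0;

	return quirk_enabled || !suppress;
}
```

Because the guest-visible APIC version register is unchanged in this scheme, a VMM that implements the I/O APIC EOI register could flip the quirk off at live update or migration time and a running guest that already set the bit would start getting the correct behavior immediately.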
Can you also start with the below changelog+comment? I massaged in anticipation
of applying v1 before I realized it would break userspace :-)
E.g. with the quirk stubbed in (obviously not tested in any capacity):
--
From: Khushit Shah <khushit.shah@nutanix.com>
Date: Thu, 18 Sep 2025 09:25:28 -0700
Subject: [PATCH] KVM: x86: Suppress EOI broadcasts with split IRQCHIP if
Directed EOI is enabled
Do not generate a KVM_EXIT_IOAPIC_EOI exit to userspace when handling EOIs
for a split IRQCHIP and the vCPU has enabled Directed EOIs in its local
APIC, i.e. if the guest has set "Suppress EOI Broadcasts" in Intel
parlance.
Incorrectly broadcasting EOIs can lead to a potentially fatal interrupt
storm if the IRQ line is still asserted and userspace reacts to the EOI by
re-injecting the IRQ. E.g. Windows with Hyper-V enabled gets stuck during
boot when running under QEMU with a split IRQCHIP.
Note, Suppress EOI Broadcasts is defined only in Intel's SDM, not in AMD's
APM. But the bit is writable on some AMD CPUs, e.g. Turin, and KVM's ABI
is to support Directed EOI (KVM's name) irrespective of guest CPU vendor.
Note #2, KVM doesn't support Directed EOIs for its in-kernel I/O APIC.
See commit 0bcc3fb95b97 ("KVM: lapic: stop advertising DIRECTED_EOI when
in-kernel IOAPIC is in use").
Fixes: 7543a635aa09 ("KVM: x86: Add KVM exit for IOAPIC EOIs")
Cc: stable@vger.kernel.org
Closes: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com
Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
Link: https://lore.kernel.org/r/20250918162529.640943-1-jon@nutanix.com
[sean: rewrite changelog and comment]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/lapic.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5fc437341e03..56542239cc6b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1429,6 +1429,17 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
/* Request a KVM exit to inform the userspace IOAPIC. */
if (irqchip_split(apic->vcpu->kvm)) {
+ /*
+ * Don't exit to userspace if the guest has enabled Directed
+ * EOI, a.k.a. Suppress EOI Broadcasts, in which case the local
+ * APIC doesn't broadcast EOIs (the guest must EOI the
+ * target I/O APIC(s) directly).
+ */
+ if ((kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
+ !kvm_check_has_quirk(apic->vcpu->kvm,
+ KVM_X86_QUIRK_IGNORE_SUPPRESS_EOI_BROADCAST))
+ return;
+
apic->vcpu->arch.pending_ioapic_eoi = vector;
kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
return;
base-commit: 07e27ad16399afcd693be20211b0dfae63e0615f
--
* Re: [PATCH] KVM: x86: skip userspace IOAPIC EOI exit when Directed EOI is enabled
2025-10-24 20:21 ` Sean Christopherson
@ 2025-10-31 12:46 ` Khushit Shah
2025-10-31 17:28 ` Sean Christopherson
0 siblings, 1 reply; 15+ messages in thread
From: Khushit Shah @ 2025-10-31 12:46 UTC (permalink / raw)
To: Sean Christopherson
Cc: Jon Kohler, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86@kernel.org, H. Peter Anvin,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Hi Sean,
Thanks for the reply.
> On 25 Oct 2025, at 1:51 AM, Sean Christopherson <seanjc@google.com> wrote:
>
> Make it a quirk instead of a capability. This is definitely a KVM bug, it's just
> unfortunately one that we can't fix without breaking userspace :-/
I don’t think this approach fully addresses the issue.
For example, consider the same Windows guest running with a userspace
I/O APIC that has no EOI registers. The guest will set the Suppress EOI
Broadcast bit because KVM advertises support for it (see
kvm_apic_set_version).
If the quirk is enabled, an interrupt storm will occur.
If the quirk is disabled, userspace will never receive the EOI
notification.
For context, Windows with Credential Guard (CG) handles the interrupt in the following order:
1. Interrupt for L2 arrives.
2. L1 APIC EOIs the interrupt.
3. L1 resumes L2 and injects the interrupt.
4. L2 EOIs after servicing.
5. L1 performs the I/O APIC EOI.
Guest is not doing anything theoretically wrong here.
The root issue is that KVM advertises support for EOI broadcast
suppression without knowing whether userspace supports it.
Even my previous proposal doesn’t completely solve this. A potential
way to fix it without breaking userspace would be to let userspace
explicitly indicate whether it supports EOI broadcast suppression
(i.e. whether it implements EOI registers). By default, KVM should
assume userspace does *not* support EOI broadcast suppression,
contrary to the current behavior.
This way, unmodified userspace remains unaffected, and updated
userspace can opt in when it truly supports EOI broadcast suppression.
Am I missing something?
Regards,
Khushit
* Re: [PATCH] KVM: x86: skip userspace IOAPIC EOI exit when Directed EOI is enabled
2025-10-31 12:46 ` Khushit Shah
@ 2025-10-31 17:28 ` Sean Christopherson
2025-11-03 4:36 ` Khushit Shah
2025-11-04 9:55 ` Huang, Kai
0 siblings, 2 replies; 15+ messages in thread
From: Sean Christopherson @ 2025-10-31 17:28 UTC (permalink / raw)
To: Khushit Shah
Cc: Jon Kohler, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86@kernel.org, H. Peter Anvin,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org
On Fri, Oct 31, 2025, Khushit Shah wrote:
> Hi Sean,
>
> Thanks for the reply.
>
> > On 25 Oct 2025, at 1:51 AM, Sean Christopherson <seanjc@google.com> wrote:
> >
> > Make it a quirk instead of a capability. This is definitely a KVM bug, it's just
> > unfortunately one that we can't fix without breaking userspace :-/
>
> I don’t think this approach fully addresses the issue.
>
> For example, consider the same Windows guest running with a userspace
> I/O APIC that has no EOI registers. The guest will set the Suppress EOI
> Broadcast bit because KVM advertises support for it (see
> kvm_apic_set_version).
>
> If the quirk is enabled, an interrupt storm will occur.
> If the quirk is disabled, userspace will never receive the EOI
> notification.
Uh, why not?
> For context, Windows with Credential Guard (CG) handles the interrupt in the following order:
> 1. Interrupt for L2 arrives.
> 2. L1 APIC EOIs the interrupt.
> 3. L1 resumes L2 and injects the interrupt.
> 4. L2 EOIs after servicing.
> 5. L1 performs the I/O APIC EOI.
And at #5, the MMIO access to the I/O APIC gets routed to userspace for emulation.
> Guest is not doing anything theoretically wrong here.
>
> The root issue is that KVM advertises support for EOI broadcast
> suppression without knowing whether userspace supports it.
That's the whole point of the quirk; userspace should disable the quirk if and
only if it supports the I/O APIC EOI extension.
> Even my previous proposal doesn’t completely solve this. A potential
> way to fix it without breaking userspace would be to let userspace
> explicitly indicate whether it supports EOI broadcast suppression
> (i.e. whether it implements EOI registers). By default, KVM should
> assume userspace does *not* support EOI broadcast suppression,
> contrary to the current behavior.
But as I mentioned in my previous reply, that requires a guest reboot to take
effect.
> This way, unmodified userspace remains unaffected,
Not entirely, no. If you strictly scope "userspace" to mean the VMM code, then
yes, that statement is true. But changing the virtual CPU model that is presented
to the guest is absolutely going to affect userspace, in the sense that a guest
will see what appears to be different CPUs.
> and updated userspace can opt in when it truly supports EOI broadcast
> suppression.
>
> Am I missing something?
I think so? It's also possible I'm missing something :-)
And all of the above said, I'm not at all opposed to giving userspace control
over whether or not Suppress EOI Broadcast is advertised to the guest. Quite
the opposite actually. It's just that I also want to provide a fix that allows
for fixing the worst of the issue without needing a guest reboot, and without
having to change the virtual CPU model that's exposed to the guest.
So, what if we do both? And to avoid spreading the damage all over the place,
use KVM_CAP_X2APIC_API? Compile tested only...
From: Khushit Shah <khushit.shah@nutanix.com>
Date: Fri, 31 Oct 2025 09:25:59 -0700
Subject: [PATCH] KVM: x86: Add x2APIC "features" to control EOI broadcast
suppression
Add two flags for KVM_CAP_X2APIC_API to allow userspace to control support
for Suppress EOI Broadcasts, which KVM completely mishandles. When x2APIC
support was first added, KVM incorrectly advertised and "enabled" Suppress
EOI Broadcast, without fully supporting the I/O APIC side of the equation,
i.e. without adding directed EOI to KVM's in-kernel I/O APIC.
That flaw was carried over to split IRQCHIP support, i.e. KVM advertised
support for Suppress EOI Broadcasts irrespective of whether or not the
userspace I/O APIC implementation supported directed EOIs. Even worse,
KVM didn't actually suppress EOI broadcasts, i.e. userspace VMMs without
support for directed EOI came to rely on the "spurious" broadcasts.
KVM "fixed" the in-kernel I/O APIC implementation by completely disabling
support for Suppress EOI Broadcasts in commit 0bcc3fb95b97 ("KVM: lapic:
stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use"), but
didn't do anything to remedy userspace I/O APIC implementations.
KVM's bogus handling of Suppress EOI Broadcasts is problematic when the guest
relies on interrupts being masked in the I/O APIC until well after the
initial local APIC EOI. E.g. Windows with Credential Guard enabled
handles interrupts in the following order:
1. Interrupt for L2 arrives.
2. L1 APIC EOIs the interrupt.
3. L1 resumes L2 and injects the interrupt.
4. L2 EOIs after servicing.
5. L1 performs the I/O APIC EOI.
Because KVM EOIs the I/O APIC at step #2, the guest can get an interrupt
storm, e.g. if the IRQ line is still asserted and userspace reacts to the
EOI by re-injecting the IRQ, because the guest doesn't de-assert the line
until step #4, and doesn't expect the interrupt to be re-enabled until
step #5.
Unfortunately, simply "fixing" the bug isn't an option, as KVM has no way
of knowing if the userspace I/O APIC supports directed EOIs, i.e.
suppressing EOI broadcasts would result in interrupts being stuck masked
in the userspace I/O APIC due to step #5 being ignored by userspace. And
fully disabling support for Suppress EOI Broadcast is also undesirable, as
picking up the fix would require a guest reboot, *and* more importantly
would change the virtual CPU model exposed to the guest without any buy-in
from userspace.
Add two flags to allow userspace to choose exactly how to solve the
immediate issue, and in the long term to allow userspace to control the
virtual CPU model that is exposed to the guest (KVM should never have
enabled support for Suppress EOI Broadcast without a userspace opt-in).
Note, Suppress EOI Broadcasts is defined only in Intel's SDM, not in AMD's
APM. But the bit is writable on some AMD CPUs, e.g. Turin, and KVM's ABI
is to support Directed EOI (KVM's name) irrespective of guest CPU vendor.
Fixes: 7543a635aa09 ("KVM: x86: Add KVM exit for IOAPIC EOIs")
Closes: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com
Cc: stable@vger.kernel.org
Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
Documentation/virt/kvm/api.rst | 14 ++++++++++++--
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/include/uapi/asm/kvm.h | 6 ++++--
arch/x86/kvm/lapic.c | 13 +++++++++++++
arch/x86/kvm/x86.c | 11 ++++++++---
5 files changed, 39 insertions(+), 7 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 57061fa29e6a..4bfd4ed6afa4 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7800,8 +7800,10 @@ Will return -EBUSY if a VCPU has already been created.
Valid feature flags in args[0] are::
- #define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
- #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
+ #define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
+ #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
+ #define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST_QUIRK (1ULL << 2)
+ #define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST (1ULL << 3)
Enabling KVM_X2APIC_API_USE_32BIT_IDS changes the behavior of
KVM_SET_GSI_ROUTING, KVM_SIGNAL_MSI, KVM_SET_LAPIC, and KVM_GET_LAPIC,
@@ -7814,6 +7816,14 @@ as a broadcast even in x2APIC mode in order to support physical x2APIC
without interrupt remapping. This is undesirable in logical mode,
where 0xff represents CPUs 0-7 in cluster 0.
+Setting KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST_QUIRK overrides KVM's quirky
+behavior of not actually suppressing EOI broadcasts for split IRQ chips when
+support for Suppress EOI Broadcasts is advertised to the guest.
+
+Setting KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST disables support for Suppress
+EOI Broadcasts entirely, i.e. instructs KVM to NOT advertise support to the
+guest and thus disallow enabling EOI broadcast suppression in SPIV.
+
7.8 KVM_CAP_S390_USER_INSTR0
----------------------------
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 48598d017d6f..fdf4f99de630 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1480,6 +1480,8 @@ struct kvm_arch {
bool x2apic_format;
bool x2apic_broadcast_quirk_disabled;
+ bool disable_suppress_eoi_broadcast_quirk;
+ bool x2apic_disable_suppress_eoi_broadcast;
bool has_mapped_host_mmio;
bool guest_can_read_msr_platform_info;
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index d420c9c066d4..955b854b4b82 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -913,8 +913,10 @@ struct kvm_sev_snp_launch_finish {
__u64 pad1[4];
};
-#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
-#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
+#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
+#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
+#define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST_QUIRK (1ULL << 2)
+#define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST (1ULL << 3)
struct kvm_hyperv_eventfd {
__u32 conn_id;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0ae7f913d782..f83abbcf136f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -562,6 +562,7 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
* IOAPIC.
*/
if (guest_cpu_cap_has(vcpu, X86_FEATURE_X2APIC) &&
+ !vcpu->kvm->arch.x2apic_disable_suppress_eoi_broadcast &&
!ioapic_in_kernel(vcpu->kvm))
v |= APIC_LVR_DIRECTED_EOI;
kvm_lapic_set_reg(apic, APIC_LVR, v);
@@ -1517,6 +1518,18 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
/* Request a KVM exit to inform the userspace IOAPIC. */
if (irqchip_split(apic->vcpu->kvm)) {
+ /*
+ * Don't exit to userspace if the guest has enabled Directed
+ * EOI, a.k.a. Suppress EOI Broadcasts, in which case the local
+ * APIC doesn't broadcast EOIs (the guest must EOI the target
+ * I/O APIC(s) directly). Ignore the suppression if userspace
+ * has NOT disabled KVM's quirk (KVM advertised support for
+ * Suppress EOI Broadcasts without actually suppressing EOIs).
+ */
+ if ((kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
+ apic->vcpu->kvm->arch.disable_suppress_eoi_broadcast_quirk)
+ return;
+
apic->vcpu->arch.pending_ioapic_eoi = vector;
kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
return;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b4b5d2d09634..b82840104c53 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -121,8 +121,10 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
#define KVM_CAP_PMU_VALID_MASK KVM_PMU_CAP_DISABLE
-#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
- KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
+#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
+ KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK | \
+ KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST_QUIRK | \
+ KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
static void update_cr8_intercept(struct kvm_vcpu *vcpu);
static void process_nmi(struct kvm_vcpu *vcpu);
@@ -6783,7 +6785,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
kvm->arch.x2apic_format = true;
if (cap->args[0] & KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
kvm->arch.x2apic_broadcast_quirk_disabled = true;
-
+ if (cap->args[0] & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST_QUIRK)
+ kvm->arch.disable_suppress_eoi_broadcast_quirk = true;
+ if (cap->args[0] & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
+ kvm->arch.x2apic_disable_suppress_eoi_broadcast = true;
r = 0;
break;
case KVM_CAP_X86_DISABLE_EXITS:
base-commit: 4361f5aa8bfcecbab3fc8db987482b9e08115a6a
--
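For illustration only (not part of the posted patch): a minimal sketch of how a userspace VMM might pass the two proposed flags to KVM_CAP_X2APIC_API via KVM_ENABLE_CAP. The flag names and bit positions are taken from the diff above and are not in released uapi headers, so they are defined locally as assumptions; the helper names and the policy in x2apic_eoi_flags() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Assumption: names and values as proposed in this thread, not yet uAPI. */
#ifndef KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST_QUIRK
#define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST_QUIRK	(1ULL << 2)
#endif
#ifndef KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST
#define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST	(1ULL << 3)
#endif

/*
 * Hypothetical policy helper: disable the quirk only if the userspace
 * I/O APIC implements the EOI register; otherwise optionally hide the
 * feature from the guest (which changes the virtual CPU model).
 */
static uint64_t x2apic_eoi_flags(bool ioapic_has_eoi_reg, bool hide_feature)
{
	if (ioapic_has_eoi_reg)
		return KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST_QUIRK;
	if (hide_feature)
		return KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST;
	return 0;	/* keep KVM's legacy (quirky) behavior */
}

/* Returns 0 on success, -errno on failure. */
static int enable_x2apic_api(int vm_fd, uint64_t flags)
{
	struct kvm_enable_cap cap;

	/* Only bits 0-3 are defined by the proposal in this thread. */
	assert((flags & ~0xfULL) == 0);

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_X2APIC_API;
	cap.args[0] = flags;

	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap) ? -errno : 0;
}
```

A VMM would OR these bits with whatever KVM_X2APIC_API flags it already passes. On kernels without the patch, the call is expected to fail because the new bits are not in KVM_X2APIC_API_VALID_FLAGS, so a fallback path is needed.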
* Re: [PATCH] KVM: x86: skip userspace IOAPIC EOI exit when Directed EOI is enabled
2025-10-31 17:28 ` Sean Christopherson
@ 2025-11-03 4:36 ` Khushit Shah
2025-11-03 16:57 ` Sean Christopherson
2025-11-04 9:55 ` Huang, Kai
1 sibling, 1 reply; 15+ messages in thread
From: Khushit Shah @ 2025-11-03 4:36 UTC (permalink / raw)
To: Sean Christopherson
Cc: Jon Kohler, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86@kernel.org, H. Peter Anvin,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Hi Sean,
> On 31 Oct 2025, at 10:58 PM, Sean Christopherson <seanjc@google.com> wrote:
>
>> Hi Sean,
>>
>> Thanks for the reply.
>>
>>> On 25 Oct 2025, at 1:51 AM, Sean Christopherson <seanjc@google.com> wrote:
>>>
>>> Make it a quirk instead of a capability. This is definitely a KVM bug, it's just
>>> unfortunately one that we can't fix without breaking userspace :-/
>>
>> I don’t think this approach fully addresses the issue.
>>
>> For example, consider the same Windows guest running with a userspace
>> I/O APIC that has no EOI registers. The guest will set the Suppress EOI
>> Broadcast bit because KVM advertises support for it (see
>> kvm_apic_set_version).
>>
>> If the quirk is enabled, an interrupt storm will occur.
>> If the quirk is disabled, userspace will never receive the EOI
>> notification.
>
> Uh, why not?
>
>> For context, Windows with CG handles the interrupt in the following order:
>> 1. Interrupt for L2 arrives.
>> 2. L1 APIC EOIs the interrupt.
>> 3. L1 resumes L2 and injects the interrupt.
>> 4. L2 EOIs after servicing.
>> 5. L1 performs the I/O APIC EOI.
>
> And at #5, the MMIO access to the I/O APIC gets routed to userspace for emulation.
Yes, but userspace does not have an I/O APIC EOI register, so it will just be a
meaningless MMIO write, resulting in the IRQ line being kept masked.
> On 31 Oct 2025, at 10:58 PM, Sean Christopherson <seanjc@google.com> wrote:
>
> That's the whole point of the quirk; userspace should disable the quirk if and
> only if it supports the I/O APIC EOI extension.
So, sadly, if the quirk is kept enabled (no I/O APIC EOI extension) and we do
not want a guest reboot, the original Windows interrupt storm bug will persist?
Unless we also update userspace to handle the EOI register write regardless,
as the damage was already done at power-on.
> On 31 Oct 2025, at 10:58 PM, Sean Christopherson <seanjc@google.com> wrote:
>
>> and updated userspace can opt in when it truly supports EOI broadcast
>> suppression.
>>
>> Am I missing something?
>
> I think so? It's also possible I'm missing something :-)
I am just thinking that the original Windows bug is not solved in all cases,
i.e. for an already powered-on Windows guest with a userspace I/O APIC that
does not have an EOI register.
Also, instead of a knob to disable Suppress EOI Broadcast, I think the patch
should have a knob to enable it; that way, at least for unmodified userspace,
the buggy situation is never reached.
Other than this, your patch makes perfect sense. If you want, I can polish it
and test it along with the QEMU-side changes.
Regards,
Khushit.
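For illustration only: a toy model of the point above, i.e. what a userspace I/O APIC does with the guest's directed EOI write depending on whether it implements the EOI register. The register offset (0x40) and the redirection-entry bit positions follow the I/O APIC programming model as I understand it; the struct and function names are hypothetical and do not come from QEMU or any other VMM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_IOAPIC_NUM_PINS	24
#define TOY_IOAPIC_REG_EOI	0x40		/* directed EOI register */
#define TOY_RTE_VECTOR_MASK	0xffULL		/* bits 7:0 */
#define TOY_RTE_REMOTE_IRR	(1ULL << 14)	/* delivered, awaiting EOI */
#define TOY_RTE_TRIGGER_LEVEL	(1ULL << 15)	/* level-triggered */

struct toy_ioapic {
	uint64_t rte[TOY_IOAPIC_NUM_PINS];	/* redirection table entries */
	bool has_eoi_register;
};

/*
 * Handle a 32-bit MMIO write. Only the EOI register is modeled; all other
 * registers are omitted. Returns the number of entries whose Remote IRR
 * was cleared.
 */
static int toy_ioapic_mmio_write(struct toy_ioapic *s, uint32_t offset,
				 uint32_t val)
{
	int cleared = 0;
	size_t i;

	assert(s != NULL);

	/* Without the EOI register the write is silently dropped. */
	if (offset != TOY_IOAPIC_REG_EOI || !s->has_eoi_register)
		return 0;

	for (i = 0; i < TOY_IOAPIC_NUM_PINS; i++) {
		uint64_t e = s->rte[i];

		if ((e & TOY_RTE_TRIGGER_LEVEL) && (e & TOY_RTE_REMOTE_IRR) &&
		    (e & TOY_RTE_VECTOR_MASK) == (val & 0xff)) {
			s->rte[i] &= ~TOY_RTE_REMOTE_IRR;
			cleared++;
		}
	}
	return cleared;
}
```

In the model, an I/O APIC without the register leaves Remote IRR set forever once broadcasts are truly suppressed, which is the "stuck masked" outcome described above.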
* Re: [PATCH] KVM: x86: skip userspace IOAPIC EOI exit when Directed EOI is enabled
2025-11-03 4:36 ` Khushit Shah
@ 2025-11-03 16:57 ` Sean Christopherson
2025-11-04 5:08 ` Khushit Shah
0 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2025-11-03 16:57 UTC (permalink / raw)
To: Khushit Shah
Cc: Jon Kohler, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86@kernel.org, H. Peter Anvin,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org
On Mon, Nov 03, 2025, Khushit Shah wrote:
> Hi Sean,
>
> > On 31 Oct 2025, at 10:58 PM, Sean Christopherson <seanjc@google.com> wrote:
> >
> >> Hi Sean,
> >>
> >> Thanks for the reply.
> >>
> >>> On 25 Oct 2025, at 1:51 AM, Sean Christopherson <seanjc@google.com> wrote:
> >>>
> >>> Make it a quirk instead of a capability. This is definitely a KVM bug, it's just
> >>> unfortunately one that we can't fix without breaking userspace :-/
> >>
> >> I don’t think this approach fully addresses the issue.
> >>
> >> For example, consider the same Windows guest running with a userspace
> >> I/O APIC that has no EOI registers. The guest will set the Suppress EOI
> >> Broadcast bit because KVM advertises support for it (see
> >> kvm_apic_set_version).
> >>
> >> If the quirk is enabled, an interrupt storm will occur.
> >> If the quirk is disabled, userspace will never receive the EOI
> >> notification.
> >
> > Uh, why not?
> >
> >> For context, Windows with CG handles the interrupt in the following order:
> >> 1. Interrupt for L2 arrives.
> >> 2. L1 APIC EOIs the interrupt.
> >> 3. L1 resumes L2 and injects the interrupt.
> >> 4. L2 EOIs after servicing.
> >> 5. L1 performs the I/O APIC EOI.
> >
> > And at #5, the MMIO access to the I/O APIC gets routed to userspace for emulation.
>
> Yes, but the userspace does not have I/O APIC EOI register and so it will just be a
> meaningless MMIO write, resulting in the IRQ line being kept masked.
Why on earth would userspace disable the quirk without proper support?
> > On 31 Oct 2025, at 10:58 PM, Sean Christopherson <seanjc@google.com> wrote:
> >
> > That's the whole point of the quirk; userspace should disable the quirk if and
> > only if it supports the I/O APIC EOI extension.
>
>
> Sadly, so if the quirk is kept enabled (no I/O APIC EOI extension) and if we do
> not want a guest reboot, the original windows interrupt storm bug will persist?
Well, yeah, if you don't fix the bug it'll keep causing problems.
> Unless we also update the userspace to handle the EOI register write nonetheless,
> as damage has been done on the time of power on.
>
> > On 31 Oct 2025, at 10:58 PM, Sean Christopherson <seanjc@google.com> wrote:
> >
> >> and updated userspace can opt in when it truly supports EOI broadcast
> >> suppression.
> >>
> >> Am I missing something?
> >
> > I think so? It's also possible I'm missing something :-)
>
> I am just thinking that the original Windows bug is not solved for all the cases,
> i.e A powered on Windows guest with userspace I/O APIC that does not have
> EOI register.
Userspace _must_ change one way or the other. Either that or you livepatch your
kernel to carry an out-of-tree hack-a-fix to avoid updating userspace.
> Also, in the patch instead of a knob to disable suppress EOI broadcast, I think
> we should have a knob to enable, this way at least for unmodified userspace
> the buggy situation is never reached.
No. Having a bug that prevents booting certain guests is bad. Introducing a
change that potentially breaks existing setups is worse. Yes, it's unfortunate
that userspace needs to be updated to fully remedy the issue. But unless you're
livepatching the kernel, userspace should be updated anyways on a full reboot.
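For illustration only: a deliberately simplified simulation of the five-step Windows/Credential Guard sequence discussed in this thread, contrasting "EOI broadcast at step 2" with "broadcast suppressed". It is a toy state machine, not KVM or QEMU code; all names are hypothetical, and it assumes the level-triggered line is re-delivered as soon as Remote IRR clears while the line is still asserted.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_state {
	bool line_asserted;	/* device holds the level-triggered line high */
	bool remote_irr;	/* I/O APIC: delivered, waiting for an EOI */
};

/* Count interrupt deliveries until the line goes quiet (or the cap hits). */
static int toy_count_deliveries(bool broadcast_on_lapic_eoi, int max_deliveries)
{
	struct toy_state s = { .line_asserted = true, .remote_irr = false };
	int deliveries = 0;

	assert(max_deliveries > 0);

	while (s.line_asserted && !s.remote_irr && deliveries < max_deliveries) {
		/* Step 1: the I/O APIC delivers the interrupt to L1. */
		s.remote_irr = true;
		deliveries++;

		/* Step 2: L1 EOIs its local APIC. */
		if (broadcast_on_lapic_eoi) {
			/*
			 * The EOI reaches the I/O APIC while the line is
			 * still high, so the interrupt is re-delivered
			 * before L1 can resume L2: no forward progress.
			 */
			s.remote_irr = false;
			continue;
		}

		/* Steps 3-4: L2 services the device; the line is de-asserted. */
		s.line_asserted = false;

		/* Step 5: L1 writes the I/O APIC EOI register. */
		s.remote_irr = false;
	}
	return deliveries;
}
```

With the broadcast the loop only stops at the artificial cap, mirroring the storm; with suppression plus a working directed EOI the interrupt is delivered once.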
* Re: [PATCH] KVM: x86: skip userspace IOAPIC EOI exit when Directed EOI is enabled
2025-11-03 16:57 ` Sean Christopherson
@ 2025-11-04 5:08 ` Khushit Shah
0 siblings, 0 replies; 15+ messages in thread
From: Khushit Shah @ 2025-11-04 5:08 UTC (permalink / raw)
To: Sean Christopherson
Cc: Jon Kohler, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86@kernel.org, H. Peter Anvin,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Hi Sean,
Agreed. I was too focused on fixing this purely in KVM and didn’t account for userspace realities. The quirk plus x2APIC flag makes sense.
I’ll make qemu side changes, test it as a whole and report back with results.
Thanks,
Khushit
* Re: [PATCH] KVM: x86: skip userspace IOAPIC EOI exit when Directed EOI is enabled
2025-10-31 17:28 ` Sean Christopherson
2025-11-03 4:36 ` Khushit Shah
@ 2025-11-04 9:55 ` Huang, Kai
2025-11-05 16:33 ` Sean Christopherson
1 sibling, 1 reply; 15+ messages in thread
From: Huang, Kai @ 2025-11-04 9:55 UTC (permalink / raw)
To: khushit.shah@nutanix.com, seanjc@google.com
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, bp@alien8.de,
Kohler, Jon, hpa@zytor.com, tglx@linutronix.de,
dave.hansen@linux.intel.com, pbonzini@redhat.com,
kvm@vger.kernel.org, mingo@redhat.com
[...]
> KVM's bogus handling of Supress EOI Broad is problematic when the guest
> relies on interrupts being masked in the I/O APIC until well after the
> initial local APIC EOI. E.g. Windows with Credential Guard enabled
> handles interrupts in the following order:
>
> the interrupt in the following order:
This sentence is broken and is not needed.
> 1. Interrupt for L2 arrives.
> 2. L1 APIC EOIs the interrupt.
> 3. L1 resumes L2 and injects the interrupt.
> 4. L2 EOIs after servicing.
> 5. L1 performs the I/O APIC EOI.
>
[...]
> @@ -1517,6 +1518,18 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>
> /* Request a KVM exit to inform the userspace IOAPIC. */
> if (irqchip_split(apic->vcpu->kvm)) {
> + /*
> + * Don't exit to userspace if the guest has enabled Directed
> + * EOI, a.k.a. Suppress EOI Broadcasts, in which case the local
> + * APIC doesn't broadcast EOIs (the guest must EOI the target
> + * I/O APIC(s) directly). Ignore the suppression if userspace
> + * has NOT disabled KVM's quirk (KVM advertised support for
> + * Suppress EOI Broadcasts without actually suppressing EOIs).
> + */
> + if ((kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
> + apic->vcpu->kvm->arch.disable_suppress_eoi_broadcast_quirk)
> + return;
> +
I found the name 'disable_suppress_eoi_broadcast_quirk' kinda confusing,
since it can be interpreted in two ways:
- the quirk is 'suppress_eoi_broadcast', and this boolean is to disable
this quirk.
- the quirk is 'disable_suppress_eoi_broadcast'.
And in either case, the final meaning is KVM needs to "disable suppress EOI
broadcast" when that boolean is true, which in turn means KVM actually needs
to "broadcast EOI" IIUC. But the above check seems to do the opposite.
Perhaps "ignore suppress EOI broadcast" in your previous version is better?
Also, IIUC the quirk only applies to userspace IOAPIC, so is it better to
include "split IRQCHIP" in the name? Otherwise people may think it also
applies to in-kernel IOAPIC.
Btw, personally I also found "directed EOI" is more understandable than
"suppress EOI broadcast". How about using "directed EOI" in the code
instead? E.g.,
s/disable_suppress_eoi_broadcast/disable_directed_eoi
s/KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST/KVM_X2APIC_DISABLE_DIRECTED_EOI
It is shorter, and KVM is already using APIC_LVR_DIRECTED_EOI anyway.
* Re: [PATCH] KVM: x86: skip userspace IOAPIC EOI exit when Directed EOI is enabled
2025-11-04 9:55 ` Huang, Kai
@ 2025-11-05 16:33 ` Sean Christopherson
2025-11-05 21:33 ` Huang, Kai
0 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2025-11-05 16:33 UTC (permalink / raw)
To: Kai Huang
Cc: khushit.shah@nutanix.com, linux-kernel@vger.kernel.org,
x86@kernel.org, bp@alien8.de, Jon Kohler, hpa@zytor.com,
tglx@linutronix.de, dave.hansen@linux.intel.com,
pbonzini@redhat.com, kvm@vger.kernel.org, mingo@redhat.com
On Tue, Nov 04, 2025, Kai Huang wrote:
>
> [...]
>
>
> > KVM's bogus handling of Supress EOI Broad is problematic when the guest
> > relies on interrupts being masked in the I/O APIC until well after the
> > initial local APIC EOI. E.g. Windows with Credential Guard enabled
> > handles interrupts in the following order:
> >
> > the interrupt in the following order:
>
> This sentence is broken and is not needed.
>
> > 1. Interrupt for L2 arrives.
> > 2. L1 APIC EOIs the interrupt.
> > 3. L1 resumes L2 and injects the interrupt.
> > 4. L2 EOIs after servicing.
> > 5. L1 performs the I/O APIC EOI.
> >
>
> [...]
>
> > @@ -1517,6 +1518,18 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
> >
> > /* Request a KVM exit to inform the userspace IOAPIC. */
> > if (irqchip_split(apic->vcpu->kvm)) {
> > + /*
> > + * Don't exit to userspace if the guest has enabled Directed
> > + * EOI, a.k.a. Suppress EOI Broadcasts, in which case the local
> > + * APIC doesn't broadcast EOIs (the guest must EOI the target
> > + * I/O APIC(s) directly). Ignore the suppression if userspace
> > + * has NOT disabled KVM's quirk (KVM advertised support for
> > + * Suppress EOI Broadcasts without actually suppressing EOIs).
> > + */
> > + if ((kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
> > + apic->vcpu->kvm->arch.disable_suppress_eoi_broadcast_quirk)
> > + return;
> > +
>
> I found the name 'disable_suppress_eoi_broadcast_quirk' kinda confusing,
> since it can be interpreted in two ways:
>
> - the quirk is 'suppress_eoi_broadcast', and this boolean is to disable
> this quirk.
> - the quirk is 'disable_suppress_eoi_broadcast'.
I hear you, but all of KVM's quirks are phrased exactly like this:
KVM_CAP_DISABLE_QUIRKS
KVM_CAP_DISABLE_QUIRKS2
KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK
disable_slot_zap_quirk
> And in either case, the final meaning is KVM needs to "disable suppress EOI
> broadcast" when that boolean is true,
No. The flag says "Disable KVM's 'Suppress EOI-broadcast' Quirk", where the
quirk is that KVM always broadcasts even when broadcasts are supposed to be
suppressed.
> which in turn means KVM actually needs to "broadcast EOI" IIUC. But the
> above check seems does the opposite.
>
> Perhaps "ignore suppress EOI broadcast" in your previous version is better?
Hmm, I wanted to specifically call out that the behavior is a quirk. At the
risk of being too verbose, maybe DISABLE_IGNORE_SUPPRESS_EOI_BROADCAST_QUIRK?
And then to keep line lengths sane, grab "kvm" locally so that we can end up with:
/* Request a KVM exit to inform the userspace IOAPIC. */
if (irqchip_split(kvm)) {
/*
* Don't exit to userspace if the guest has enabled Directed
* EOI, a.k.a. Suppress EOI Broadcasts, in which case the local
* APIC doesn't broadcast EOIs (the guest must EOI the target
* I/O APIC(s) directly). Ignore the suppression if userspace
* has NOT disabled KVM's quirk (KVM advertised support for
* Suppress EOI Broadcasts without actually suppressing EOIs).
*/
if ((kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
kvm->arch.disable_ignore_suppress_eoi_broadcast_quirk)
return;
> Also, IIUC the quirk only applies to userspace IOAPIC, so is it better to
> include "split IRQCHIP" to the name? Otherwise people may think it also
> applies to in-kernel IOAPIC.
Eh, I'd prefer to solve that through documentation and comments. The name is
already brutally long.
> Btw, personally I also found "directed EOI" is more understandable than
> "suppress EOI broadcast". How about using "directed EOI" in the code
> instead? E.g.,
>
> s/disable_suppress_eoi_broadcast/disable_directed_eoi
> s/KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST/KVM_X2APIC_DISABLE_DIRECTED_EOI
>
> It is shorter, and KVM is already using APIC_LVR_DIRECTED_EOI anyway.
It's also wrong. Directed EOI is the I/O APIC feature, the local APIC (CPU)
feature is "Suppress EOI-broadcasts" or "EOI-broadcast suppression". Conflating
those two features is largely what led to this mess in the first place, so I'd
strongly prefer not to bleed that confusion into KVM's uAPI.
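For illustration only: the combined effect of the two knobs, written as a standalone truth-table model of the hunks quoted in this thread (the kvm_apic_set_version() check and the kvm_ioapic_send_eoi() check). It is not kernel code; the struct, field, and function names are hypothetical, and bit 12 of SPIV is assumed to match APIC_SPIV_DIRECTED_EOI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SPIV_DIRECTED_EOI	(1u << 12)	/* assumed APIC_SPIV_DIRECTED_EOI */

struct toy_vm_cfg {
	bool guest_has_x2apic;
	bool ioapic_in_kernel;		/* false for split IRQCHIP */
	bool quirk_disabled;		/* ..._SUPPRESS_EOI_BROADCAST_QUIRK set */
	bool suppress_disabled;		/* ..._DISABLE_SUPPRESS_EOI_BROADCAST set */
};

/* Is Suppress EOI-broadcast advertised in the APIC version register? */
static bool toy_advertises_suppress(const struct toy_vm_cfg *c)
{
	assert(c != NULL);
	return c->guest_has_x2apic && !c->suppress_disabled &&
	       !c->ioapic_in_kernel;
}

/* Does a local APIC EOI trigger an exit to the userspace I/O APIC? */
static bool toy_eoi_exits_to_userspace(const struct toy_vm_cfg *c,
					uint32_t spiv)
{
	assert(c != NULL);

	if (c->ioapic_in_kernel)
		return false;	/* handled by the in-kernel I/O APIC instead */

	/* Honor the guest's suppression only if the quirk is disabled. */
	if ((spiv & TOY_SPIV_DIRECTED_EOI) && c->quirk_disabled)
		return false;

	return true;
}
```

The model makes the naming argument concrete: with the quirk left enabled, the exit (i.e. the "broadcast") happens regardless of the guest's SPIV setting.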
* Re: [PATCH] KVM: x86: skip userspace IOAPIC EOI exit when Directed EOI is enabled
2025-11-05 16:33 ` Sean Christopherson
@ 2025-11-05 21:33 ` Huang, Kai
0 siblings, 0 replies; 15+ messages in thread
From: Huang, Kai @ 2025-11-05 21:33 UTC (permalink / raw)
To: seanjc@google.com
Cc: mingo@redhat.com, khushit.shah@nutanix.com, x86@kernel.org,
bp@alien8.de, hpa@zytor.com, Kohler, Jon,
linux-kernel@vger.kernel.org, tglx@linutronix.de,
pbonzini@redhat.com, kvm@vger.kernel.org,
dave.hansen@linux.intel.com
> >
> > > @@ -1517,6 +1518,18 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
> > >
> > > /* Request a KVM exit to inform the userspace IOAPIC. */
> > > if (irqchip_split(apic->vcpu->kvm)) {
> > > + /*
> > > + * Don't exit to userspace if the guest has enabled Directed
> > > + * EOI, a.k.a. Suppress EOI Broadcasts, in which case the local
> > > + * APIC doesn't broadcast EOIs (the guest must EOI the target
> > > + * I/O APIC(s) directly). Ignore the suppression if userspace
> > > + * has NOT disabled KVM's quirk (KVM advertised support for
> > > + * Suppress EOI Broadcasts without actually suppressing EOIs).
> > > + */
> > > + if ((kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
> > > + apic->vcpu->kvm->arch.disable_suppress_eoi_broadcast_quirk)
> > > + return;
> > > +
> >
> > I found the name 'disable_suppress_eoi_broadcast_quirk' kinda confusing,
> > since it can be interpreted in two ways:
> >
> > - the quirk is 'suppress_eoi_broadcast', and this boolean is to disable
> > this quirk.
> > - the quirk is 'disable_suppress_eoi_broadcast'.
>
> I hear you, but all of KVM's quirks are phrased exactly like this:
>
> KVM_CAP_DISABLE_QUIRKS
> KVM_CAP_DISABLE_QUIRKS2
> KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK
> disable_slot_zap_quirk
Fair enough. It follows KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK which
disables a quirk named "broadcast".
I was looking at below though:
#define KVM_X86_QUIRK_LINT0_REENABLED (1 << 0)
#define KVM_X86_QUIRK_CD_NW_CLEARED (1 << 1)
#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE (1 << 2)
#define KVM_X86_QUIRK_OUT_7E_INC_RIP (1 << 3)
#define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT (1 << 4)
#define KVM_X86_QUIRK_FIX_HYPERCALL_INSN (1 << 5)
#define KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS (1 << 6)
#define KVM_X86_QUIRK_SLOT_ZAP_ALL (1 << 7)
#define KVM_X86_QUIRK_STUFF_FEATURE_MSRS (1 << 8)
#define KVM_X86_QUIRK_IGNORE_GUEST_PAT (1 << 9)
Where we can tell very clearly about the name of the quirk.
And AFAICT the name tells what KVM actually does (I didn't check them all
though) -- e.g., for the SLOT_ZAP_ALL quirk, when a VM has this quirk, KVM
zaps all rather than only one slot.
I guess this is how I got confused about the "SUPPRESS_EOI_BROADCAST" quirk --
I thought it was "KVM suppresses EOI broadcast while it should not", but it
actually means the opposite ...
>
> > And in either case, the final meaning is KVM needs to "disable suppress EOI
> > broadcast" when that boolean is true,
>
> No. The flag says "Disable KVM's 'Suppress EOI-broadcast' Quirk", where the
> quirk is that KVM always broadcasts even when broadcasts are supposed to be
> suppressed.
... as you said here. :-)
>
> > which in turn means KVM actually needs to "broadcast EOI" IIUC. But the
> > above check seems does the opposite.
> >
> > Perhaps "ignore suppress EOI broadcast" in your previous version is better?
>
> Hmm, I wanted to specifically call out that the behavior is a quirk. At the
> risk of being too verbose, maybe DISABLE_IGNORE_SUPPRESS_EOI_BROADCAST_QUIRK?
I think it reflects the behaviour of the quirk better, so I kinda prefer
this.
>
> And then to keep line lengths sane, grab "kvm" locally so that we can end up with:
>
> /* Request a KVM exit to inform the userspace IOAPIC. */
> if (irqchip_split(kvm)) {
> /*
> * Don't exit to userspace if the guest has enabled Directed
> * EOI, a.k.a. Suppress EOI Broadcasts, in which case the local
> * APIC doesn't broadcast EOIs (the guest must EOI the target
> * I/O APIC(s) directly). Ignore the suppression if userspace
> * has NOT disabled KVM's quirk (KVM advertised support for
> * Suppress EOI Broadcasts without actually suppressing EOIs).
> */
> if ((kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
> kvm->arch.disable_ignore_suppress_eoi_broadcast_quirk)
> return;
>
> > Also, IIUC the quirk only applies to userspace IOAPIC, so is it better to
> > include "split IRQCHIP" to the name? Otherwise people may think it also
> > applies to in-kernel IOAPIC.
>
> Eh, I'd prefer to solve that through documentation and comments. The name is
> already brutally long.
I still kinda prefer the explicitness, but no problem with skipping this part.
Btw, hate to say it, but the existing x2APIC macros have an "_API" suffix
after "KVM_X2APIC", which the new flags lack:
#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
>
> > Btw, personally I also found "directed EOI" is more understandable than
> > "suppress EOI broadcast". How about using "directed EOI" in the code
> > instead? E.g.,
> >
> > s/disable_suppress_eoi_broadcast/disable_directed_eoi
> > s/KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST/KVM_X2APIC_DISABLE_DIRECTED_EOI
> >
> > It is shorter, and KVM is already using APIC_LVR_DIRECTED_EOI anyway.
>
> It's also wrong. Directed EOI is the I/O APIC feature, the local APIC (CPU)
> feature is "Suppress EOI-broadcasts" or "EOI-broadcast suppression". Conflating
> those two features is largely what led to this mess in the first place, so I'd
> strongly prefer not to bleed that confusion into KVM's uAPI.
OK fair enough.
Thread overview: 15+ messages
2025-09-18 16:25 [PATCH] KVM: x86: skip userspace IOAPIC EOI exit when Directed EOI is enabled Jon Kohler
2025-09-22 21:51 ` Sean Christopherson
2025-09-23 1:26 ` Huang, Kai
2025-09-23 3:32 ` Khushit Shah
2025-10-03 14:24 ` Khushit Shah
2025-10-24 20:21 ` Sean Christopherson
2025-10-31 12:46 ` Khushit Shah
2025-10-31 17:28 ` Sean Christopherson
2025-11-03 4:36 ` Khushit Shah
2025-11-03 16:57 ` Sean Christopherson
2025-11-04 5:08 ` Khushit Shah
2025-11-04 9:55 ` Huang, Kai
2025-11-05 16:33 ` Sean Christopherson
2025-11-05 21:33 ` Huang, Kai
2025-10-24 12:08 ` Khushit Shah