* [PATCH v5 0/3] KVM: x86: Add userspace control for Suppress EOI Broadcast
@ 2025-12-29 11:17 Khushit Shah
2025-12-29 11:17 ` [PATCH v5 1/3] KVM: x86: Refactor suppress EOI broadcast logic Khushit Shah
` (3 more replies)
0 siblings, 4 replies; 28+ messages in thread
From: Khushit Shah @ 2025-12-29 11:17 UTC
To: seanjc, pbonzini, kai.huang, dwmw2
Cc: mingo, x86, bp, hpa, linux-kernel, kvm, dave.hansen, tglx, jon,
shaju.abraham, Khushit Shah
Suppress EOI Broadcast (SEOIB) is an x2APIC feature that stops the local
APIC from broadcasting EOIs to I/O APICs. When enabled, guests must write
directly to the EOI Register of the specific I/O APIC (available in I/O
APIC version 0x20+) to EOI the interrupt.
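As a rough illustration of the guest-side contract described above, a
well-behaved guest would check both the local APIC version register and the
I/O APIC version before suppressing broadcasts. The sketch below is a
standalone model, not kernel code: the bit positions mirror Linux's
APIC_LVR_DIRECTED_EOI and APIC_SPIV_DIRECTED_EOI, while the two helper
names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined in Linux's apicdef.h. */
#define APIC_LVR_DIRECTED_EOI   (1u << 24)  /* LAPIC advertises SEOIB */
#define APIC_SPIV_DIRECTED_EOI  (1u << 12)  /* guest enables SEOIB */

/* Hypothetical helper: may the guest safely suppress EOI broadcasts? */
static bool guest_may_suppress_eoi_broadcast(uint32_t lapic_lvr,
					     uint32_t ioapic_version_reg)
{
	/* The I/O APIC version lives in the low byte of its version register. */
	uint8_t ioapic_version = ioapic_version_reg & 0xff;

	/* Both halves are needed: LAPIC support and an I/O APIC with an EOIR. */
	return (lapic_lvr & APIC_LVR_DIRECTED_EOI) && ioapic_version >= 0x20;
}

/* Hypothetical helper: the SPIV value such a guest would program. */
static uint32_t guest_spiv_value(uint32_t spiv, uint32_t lapic_lvr,
				 uint32_t ioapic_version_reg)
{
	if (guest_may_suppress_eoi_broadcast(lapic_lvr, ioapic_version_reg))
		return spiv | APIC_SPIV_DIRECTED_EOI;
	return spiv & ~APIC_SPIV_DIRECTED_EOI;
}
```

Guests that skip the I/O APIC version check are the ones that can end up
with level-triggered interrupts that are never EOI'd.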
KVM has historically mishandled SEOIB support. When x2APIC was introduced,
KVM advertised the feature without implementing the I/O APIC side (directed
EOI). This flaw carried over to split IRQCHIP mode, where KVM always
advertised support but didn't actually honor the guest's decision to
suppress EOI broadcast, and kept broadcasting EOIs to userspace.
The broken behavior can cause interrupt storms on guests that perform I/O
APIC EOI well after LAPIC EOI (e.g. Windows with Credential Guard enabled).
KVM "fixed" in-kernel IRQCHIP by not advertising SEOIB support, but
split IRQCHIP was never fixed. Rather than silently changing guest-visible
behavior, this series adds userspace control via KVM_CAP_X2APIC_API flags,
allowing VMMs to explicitly enable or disable SEOIB support. When enabled
with in-kernel IRQCHIP, KVM uses I/O APIC version 0x20 which provides the
EOI Register for directed EOI.
The series maintains backward compatibility: if neither flag is set,
legacy behavior is preserved. Modern VMMs should explicitly set either
KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST or
KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST.
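For VMM authors, the flag handling can be modelled roughly as below. This
is a self-contained sketch rather than the kernel implementation: the two
flag values are the ones proposed in this series, while enum seoib_mode and
seoib_mode_from_flags() are hypothetical names used only for illustration.
It assumes no earlier call has already selected a mode.

```c
#include <errno.h>
#include <stdint.h>

/* Flag values as proposed by this series (bits 2 and 3 of args[0]). */
#define KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST   (1ULL << 2)
#define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST  (1ULL << 3)

/* Hypothetical stand-in for the per-VM mode tracked by KVM. */
enum seoib_mode { SEOIB_QUIRKED, SEOIB_ENABLED, SEOIB_DISABLED };

/*
 * Returns 0 and stores the resulting mode, or -EINVAL (leaving *mode
 * untouched) if both flags are set at once.
 */
static int seoib_mode_from_flags(uint64_t flags, enum seoib_mode *mode)
{
	const uint64_t both = KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST |
			      KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST;

	if ((flags & both) == both)
		return -EINVAL;

	if (flags & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST)
		*mode = SEOIB_ENABLED;
	else if (flags & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
		*mode = SEOIB_DISABLED;
	else
		*mode = SEOIB_QUIRKED;	/* neither flag: legacy behavior */
	return 0;
}
```

In a real VMM the flags would be passed in args[0] of struct kvm_enable_cap
with cap set to KVM_CAP_X2APIC_API, via the KVM_ENABLE_CAP ioctl on the VM
file descriptor.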
Tested:
- No flags set: legacy quirky behavior preserved.
- ENABLE flag set: SEOIB advertised, in-kernel IRQCHIP uses I/O APIC
version 0x20.
- DISABLE flag set: SEOIB not advertised.
Changes in v5:
- Split into 3-patch series (refactor, I/O APIC 0x20 support, userspace
control)
- Extended to support in-kernel IRQCHIP mode.
- I/O APIC version 0x20 is used when enabling with in-kernel IRQCHIP
David Woodhouse (1):
KVM: x86/ioapic: Implement support for I/O APIC version 0x20 with EOIR
Khushit Shah (2):
KVM: x86: Refactor suppress EOI broadcast logic
KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
Documentation/virt/kvm/api.rst | 28 +++++++++++-
arch/x86/include/asm/kvm_host.h | 7 +++
arch/x86/include/uapi/asm/kvm.h | 6 ++-
arch/x86/kvm/ioapic.c | 43 ++++++++++++++++---
arch/x86/kvm/ioapic.h | 19 +++++----
arch/x86/kvm/lapic.c | 75 +++++++++++++++++++++++++++++----
arch/x86/kvm/lapic.h | 3 ++
arch/x86/kvm/trace.h | 17 ++++++++
arch/x86/kvm/x86.c | 15 ++++++-
9 files changed, 186 insertions(+), 27 deletions(-)
--
2.39.3
* [PATCH v5 1/3] KVM: x86: Refactor suppress EOI broadcast logic
2025-12-29 11:17 [PATCH v5 0/3] KVM: x86: Add userspace control for Suppress EOI Broadcast Khushit Shah
@ 2025-12-29 11:17 ` Khushit Shah
2026-01-02 16:23 ` David Woodhouse
2026-01-13 23:11 ` Sean Christopherson
2025-12-29 11:17 ` [PATCH v5 2/3] KVM: x86/ioapic: Implement support for I/O APIC version 0x20 with EOIR Khushit Shah
` (2 subsequent siblings)
3 siblings, 2 replies; 28+ messages in thread
From: Khushit Shah @ 2025-12-29 11:17 UTC
To: seanjc, pbonzini, kai.huang, dwmw2
Cc: mingo, x86, bp, hpa, linux-kernel, kvm, dave.hansen, tglx, jon,
shaju.abraham, Khushit Shah
Extract the suppress EOI broadcast (Directed EOI) logic into helper
functions and move the check from kvm_ioapic_update_eoi_one() to
kvm_ioapic_update_eoi() (required for a later patch). Prepare
kvm_ioapic_send_eoi() to honor Suppress EOI Broadcast in split IRQCHIP
mode.
Introduce two helper functions:
- kvm_lapic_advertise_suppress_eoi_broadcast(): determines whether KVM
should advertise Suppress EOI Broadcast support to the guest
- kvm_lapic_respect_suppress_eoi_broadcast(): determines whether KVM should
honor the guest's request to suppress EOI broadcasts
This refactoring prepares for I/O APIC version 0x20 support and userspace
control of suppress EOI broadcast behavior.
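To make the split between the two helpers concrete, the decision made on
the EOI path can be sketched as a standalone model. This is only an
illustration under simplifying assumptions: the function names are
hypothetical, the helpers take a plain bool instead of a struct kvm, and the
RTC EOI tracking that still runs in kvm_ioapic_update_eoi() is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define APIC_SPIV_DIRECTED_EOI (1u << 12)

/* Legacy behavior of the two helpers, keyed only on where the I/O APIC lives. */
static bool advertise_seoib(bool ioapic_in_kernel)
{
	return !ioapic_in_kernel;
}

static bool respect_seoib(bool ioapic_in_kernel)
{
	return ioapic_in_kernel;
}

/* Is a local APIC EOI still broadcast to the I/O APIC pins? */
static bool lapic_eoi_is_broadcast(uint32_t spiv, bool ioapic_in_kernel)
{
	/* Skip the broadcast only if the guest asked for it and KVM honors it. */
	if ((spiv & APIC_SPIV_DIRECTED_EOI) && respect_seoib(ioapic_in_kernel))
		return false;
	return true;
}
```

With the legacy values, a split-IRQCHIP guest that sets the SPIV bit still
sees its EOIs forwarded to userspace, which is the quirk this patch
preserves.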
Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
---
arch/x86/kvm/ioapic.c | 12 +++++++---
arch/x86/kvm/lapic.c | 53 ++++++++++++++++++++++++++++++++++++-------
arch/x86/kvm/lapic.h | 3 +++
3 files changed, 57 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 2c2783296aed..6bf8d110aece 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -545,7 +545,6 @@ static void kvm_ioapic_update_eoi_one(struct kvm_vcpu *vcpu,
int trigger_mode,
int pin)
{
- struct kvm_lapic *apic = vcpu->arch.apic;
union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[pin];
/*
@@ -560,8 +559,7 @@ static void kvm_ioapic_update_eoi_one(struct kvm_vcpu *vcpu,
kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, pin);
spin_lock(&ioapic->lock);
- if (trigger_mode != IOAPIC_LEVEL_TRIG ||
- kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
+ if (trigger_mode != IOAPIC_LEVEL_TRIG)
return;
ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
@@ -591,10 +589,16 @@ static void kvm_ioapic_update_eoi_one(struct kvm_vcpu *vcpu,
void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode)
{
int i;
+ struct kvm_lapic *apic = vcpu->arch.apic;
struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
spin_lock(&ioapic->lock);
rtc_irq_eoi(ioapic, vcpu, vector);
+
+ if ((kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
+ kvm_lapic_respect_suppress_eoi_broadcast(ioapic->kvm))
+ goto out;
+
for (i = 0; i < IOAPIC_NUM_PINS; i++) {
union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
@@ -602,6 +606,8 @@ void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode)
continue;
kvm_ioapic_update_eoi_one(vcpu, ioapic, trigger_mode, i);
}
+
+out:
spin_unlock(&ioapic->lock);
}
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0ae7f913d782..2c24fd8d815f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -105,6 +105,39 @@ bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
apic_test_vector(vector, apic->regs + APIC_IRR);
}
+bool kvm_lapic_advertise_suppress_eoi_broadcast(struct kvm *kvm)
+{
+ /*
+ * The default in-kernel I/O APIC emulates the 82093AA and does not
+ * implement an EOI register. Some guests (e.g. Windows with the
+ * Hyper-V role enabled) disable LAPIC EOI broadcast without checking
+ * the I/O APIC version, which can cause level-triggered interrupts to
+ * never be EOI'd.
+ *
+ * To avoid this, KVM must not advertise Suppress EOI Broadcast support
+ * when using the default in-kernel I/O APIC.
+ *
+ * Historically, in split IRQCHIP mode, KVM always advertised Suppress
+ * EOI Broadcast support but did not actually suppress EOIs, resulting
+ * in quirky behavior.
+ */
+ return !ioapic_in_kernel(kvm);
+}
+
+bool kvm_lapic_respect_suppress_eoi_broadcast(struct kvm *kvm)
+{
+ /*
+ * Returns true if KVM should honor the guest's request to suppress EOI
+ * broadcasts, i.e. actually implement Suppress EOI Broadcast.
+ *
+ * Historically, in split IRQCHIP mode, KVM ignored the suppress EOI
+ * broadcast bit set by the guest and broadcasts EOIs to the userspace
+ * I/O APIC. For In-kernel I/O APIC, the support itself is not
+ * advertised, but if bit was set by the guest, it was respected.
+ */
+ return ioapic_in_kernel(kvm);
+}
+
__read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu);
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_has_noapic_vcpu);
@@ -554,15 +587,9 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
v = APIC_VERSION | ((apic->nr_lvt_entries - 1) << 16);
- /*
- * KVM emulates 82093AA datasheet (with in-kernel IOAPIC implementation)
- * which doesn't have EOI register; Some buggy OSes (e.g. Windows with
- * Hyper-V role) disable EOI broadcast in lapic not checking for IOAPIC
- * version first and level-triggered interrupts never get EOIed in
- * IOAPIC.
- */
+
if (guest_cpu_cap_has(vcpu, X86_FEATURE_X2APIC) &&
- !ioapic_in_kernel(vcpu->kvm))
+ kvm_lapic_advertise_suppress_eoi_broadcast(vcpu->kvm))
v |= APIC_LVR_DIRECTED_EOI;
kvm_lapic_set_reg(apic, APIC_LVR, v);
}
@@ -1517,6 +1544,16 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
/* Request a KVM exit to inform the userspace IOAPIC. */
if (irqchip_split(apic->vcpu->kvm)) {
+ /*
+ * Don't exit to userspace if the guest has enabled Directed
+ * EOI, a.k.a. Suppress EOI Broadcasts, in which case the local
+ * APIC doesn't broadcast EOIs (the guest must EOI the target
+ * I/O APIC(s) directly).
+ */
+ if ((kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
+ kvm_lapic_respect_suppress_eoi_broadcast(apic->vcpu->kvm))
+ return;
+
apic->vcpu->arch.pending_ioapic_eoi = vector;
kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
return;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 282b9b7da98c..fe2db0f1d190 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -231,6 +231,9 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
+bool kvm_lapic_advertise_suppress_eoi_broadcast(struct kvm *kvm);
+bool kvm_lapic_respect_suppress_eoi_broadcast(struct kvm *kvm);
+
void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu);
void kvm_bitmap_or_dest_vcpus(struct kvm *kvm, struct kvm_lapic_irq *irq,
--
2.39.3
* [PATCH v5 2/3] KVM: x86/ioapic: Implement support for I/O APIC version 0x20 with EOIR
2025-12-29 11:17 [PATCH v5 0/3] KVM: x86: Add userspace control for Suppress EOI Broadcast Khushit Shah
2025-12-29 11:17 ` [PATCH v5 1/3] KVM: x86: Refactor suppress EOI broadcast logic Khushit Shah
@ 2025-12-29 11:17 ` Khushit Shah
2025-12-29 11:39 ` David Woodhouse
2025-12-29 11:17 ` [PATCH v5 3/3] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression Khushit Shah
2026-01-29 4:49 ` [PATCH v5 4/3] KVM: selftests: Add test cases for EOI suppression modes David Woodhouse
3 siblings, 1 reply; 28+ messages in thread
From: Khushit Shah @ 2025-12-29 11:17 UTC
To: seanjc, pbonzini, kai.huang, dwmw2
Cc: mingo, x86, bp, hpa, linux-kernel, kvm, dave.hansen, tglx, jon,
shaju.abraham, David Woodhouse, Khushit Shah
From: David Woodhouse <dwmw@amazon.co.uk>
Introduce support for I/O APIC version 0x20, which includes the EOI
Register (EOIR) for directed EOI. The EOI register allows guests to
perform EOIs to individual I/O APICs instead of relying on broadcast EOIs
from the local APIC.
When Suppress EOI Broadcast (SEOIB) capability is advertised to the guest,
guests that enable it will EOI individual I/O APICs by writing to their
EOI register instead of relying on broadcast EOIs from the LAPIC. Hence,
when SEOIB is advertised (so that guests can use it if they choose), use
I/O APIC version 0x20 to provide the EOI register. This prepares for a
userspace API that will allow explicit control of SEOIB support, providing
a consistent interface for both in-kernel and split IRQCHIP mode.
Add a tracepoint (kvm_ioapic_directed_eoi) to track directed EOIs for
debugging and observability.
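For reference, the value a guest reads back from the I/O APIC version
register can be reproduced with a small standalone sketch. It assumes the
24 pins of KVM's x86 I/O APIC (KVM_IOAPIC_NUM_PINS); ioapic_version_reg()
is a hypothetical name, not a function in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define IOAPIC_NUM_PINS         24    /* KVM_IOAPIC_NUM_PINS on x86 */
#define IOAPIC_VERSION_ID       0x11  /* default, no EOI register */
#define IOAPIC_VERSION_ID_EOIR  0x20  /* provides the EOI register */

static uint32_t ioapic_version_reg(bool eoir_supported)
{
	/* Bits 7:0 hold the version. */
	uint32_t result = eoir_supported ? IOAPIC_VERSION_ID_EOIR
					 : IOAPIC_VERSION_ID;

	/* Bits 23:16 hold the maximum redirection entry index (pins - 1). */
	result |= ((IOAPIC_NUM_PINS - 1) & 0xffu) << 16;
	return result;
}
```

Under these assumptions the register reads 0x00170011 by default and
0x00170020 once the EOI register is exposed.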
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
---
arch/x86/kvm/ioapic.c | 31 +++++++++++++++++++++++++++++--
arch/x86/kvm/ioapic.h | 19 +++++++++++--------
arch/x86/kvm/trace.h | 17 +++++++++++++++++
3 files changed, 57 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 6bf8d110aece..eea1eb7845c4 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -48,8 +48,11 @@ static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic)
switch (ioapic->ioregsel) {
case IOAPIC_REG_VERSION:
- result = ((((IOAPIC_NUM_PINS - 1) & 0xff) << 16)
- | (IOAPIC_VERSION_ID & 0xff));
+ if (kvm_lapic_advertise_suppress_eoi_broadcast(ioapic->kvm))
+ result = IOAPIC_VERSION_ID_EOIR;
+ else
+ result = IOAPIC_VERSION_ID;
+ result |= ((IOAPIC_NUM_PINS - 1) & 0xff) << 16;
break;
case IOAPIC_REG_APIC_ID:
@@ -57,6 +60,10 @@ static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic)
result = ((ioapic->id & 0xf) << 24);
break;
+ case IOAPIC_REG_BOOT_CONFIG:
+ result = 0x01; /* Processor bus */
+ break;
+
default:
{
u32 redir_index = (ioapic->ioregsel - 0x10) >> 1;
@@ -701,6 +708,26 @@ static int ioapic_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
ioapic_write_indirect(ioapic, data);
break;
+ case IOAPIC_REG_EOIR:
+ /*
+ * The EOI register is supported (and version 0x20 advertised)
+ * when userspace explicitly enables suppress EOI broadcast.
+ */
+ if (kvm_lapic_advertise_suppress_eoi_broadcast(vcpu->kvm)) {
+ u8 vector = data & 0xff;
+ int i;
+
+ trace_kvm_ioapic_directed_eoi(vcpu, vector);
+ rtc_irq_eoi(ioapic, vcpu, vector);
+ for (i = 0; i < IOAPIC_NUM_PINS; i++) {
+ union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
+
+ if (ent->fields.vector != vector)
+ continue;
+ kvm_ioapic_update_eoi_one(vcpu, ioapic, ent->fields.trig_mode, i);
+ }
+ }
+ break;
default:
break;
}
diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index bf28dbc11ff6..f219577f738c 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -11,7 +11,8 @@ struct kvm_vcpu;
#define IOAPIC_NUM_PINS KVM_IOAPIC_NUM_PINS
#define MAX_NR_RESERVED_IOAPIC_PINS KVM_MAX_IRQ_ROUTES
-#define IOAPIC_VERSION_ID 0x11 /* IOAPIC version */
+#define IOAPIC_VERSION_ID 0x11 /* Default IOAPIC version */
+#define IOAPIC_VERSION_ID_EOIR 0x20 /* IOAPIC version with EOIR support */
#define IOAPIC_EDGE_TRIG 0
#define IOAPIC_LEVEL_TRIG 1
@@ -19,13 +20,15 @@ struct kvm_vcpu;
#define IOAPIC_MEM_LENGTH 0x100
/* Direct registers. */
-#define IOAPIC_REG_SELECT 0x00
-#define IOAPIC_REG_WINDOW 0x10
-
-/* Indirect registers. */
-#define IOAPIC_REG_APIC_ID 0x00 /* x86 IOAPIC only */
-#define IOAPIC_REG_VERSION 0x01
-#define IOAPIC_REG_ARB_ID 0x02 /* x86 IOAPIC only */
+#define IOAPIC_REG_SELECT 0x00
+#define IOAPIC_REG_WINDOW 0x10
+#define IOAPIC_REG_EOIR 0x40 /* version 0x20+ only */
+
+/* INDIRECT registers. */
+#define IOAPIC_REG_APIC_ID 0x00 /* x86 IOAPIC only */
+#define IOAPIC_REG_VERSION 0x01
+#define IOAPIC_REG_ARB_ID 0x02 /* x86 IOAPIC only */
+#define IOAPIC_REG_BOOT_CONFIG 0x03 /* x86 IOAPIC only */
/*ioapic delivery mode*/
#define IOAPIC_FIXED 0x0
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index e79bc9cb7162..6902758353a9 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -315,6 +315,23 @@ TRACE_EVENT(kvm_ioapic_delayed_eoi_inj,
(__entry->e & (1<<15)) ? "level" : "edge",
(__entry->e & (1<<16)) ? "|masked" : "")
);
+
+TRACE_EVENT(kvm_ioapic_directed_eoi,
+ TP_PROTO(struct kvm_vcpu *vcpu, u8 vector),
+ TP_ARGS(vcpu, vector),
+
+ TP_STRUCT__entry(
+ __field( __u32, apicid )
+ __field( __u8, vector )
+ ),
+
+ TP_fast_assign(
+ __entry->apicid = vcpu->vcpu_id;
+ __entry->vector = vector;
+ ),
+
+ TP_printk("apicid %x vector %u", __entry->apicid, __entry->vector)
+);
#endif
TRACE_EVENT(kvm_msi_set_irq,
--
2.39.3
* [PATCH v5 3/3] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
2025-12-29 11:17 [PATCH v5 0/3] KVM: x86: Add userspace control for Suppress EOI Broadcast Khushit Shah
2025-12-29 11:17 ` [PATCH v5 1/3] KVM: x86: Refactor suppress EOI broadcast logic Khushit Shah
2025-12-29 11:17 ` [PATCH v5 2/3] KVM: x86/ioapic: Implement support for I/O APIC version 0x20 with EOIR Khushit Shah
@ 2025-12-29 11:17 ` Khushit Shah
2026-01-02 16:41 ` David Woodhouse
2026-01-29 4:49 ` [PATCH v5 4/3] KVM: selftests: Add test cases for EOI suppression modes David Woodhouse
3 siblings, 1 reply; 28+ messages in thread
From: Khushit Shah @ 2025-12-29 11:17 UTC
To: seanjc, pbonzini, kai.huang, dwmw2
Cc: mingo, x86, bp, hpa, linux-kernel, kvm, dave.hansen, tglx, jon,
shaju.abraham, Khushit Shah, stable
Add two flags for KVM_CAP_X2APIC_API to allow userspace to control support
for Suppress EOI Broadcasts, which KVM completely mishandles. When x2APIC
support was first added, KVM incorrectly advertised and "enabled" Suppress
EOI Broadcast, without fully supporting the I/O APIC side of the equation,
i.e. without adding directed EOI to KVM's in-kernel I/O APIC.
That flaw was carried over to split IRQCHIP support, i.e. KVM advertised
support for Suppress EOI Broadcasts irrespective of whether or not the
userspace I/O APIC implementation supported directed EOIs. Even worse,
KVM didn't actually suppress EOI broadcasts, i.e. userspace VMMs without
support for directed EOI came to rely on the "spurious" broadcasts.
KVM "fixed" the in-kernel I/O APIC implementation by completely disabling
support for Suppress EOI Broadcasts in commit 0bcc3fb95b97 ("KVM: lapic:
stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use"), but
didn't do anything to remedy userspace I/O APIC implementations.
KVM's bogus handling of Suppress EOI Broadcast is problematic when the
guest relies on interrupts being masked in the I/O APIC until well after
the initial local APIC EOI. E.g. Windows with Credential Guard enabled
handles interrupts in the following order:
1. Interrupt for L2 arrives.
2. L1 APIC EOIs the interrupt.
3. L1 resumes L2 and injects the interrupt.
4. L2 EOIs after servicing.
5. L1 performs the I/O APIC EOI.
Because KVM EOIs the I/O APIC at step #2, the guest can get an interrupt
storm, e.g. if the IRQ line is still asserted and userspace reacts to the
EOI by re-injecting the IRQ, because the guest doesn't de-assert the line
until step #4, and doesn't expect the interrupt to be re-enabled until
step #5.
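The sequence above can be walked through with a deliberately tiny model.
This is a toy sketch, not a description of KVM internals: it tracks one
level-triggered line and a latched "remote IRR" bit, assumes the device
de-asserts the line only when L2 services it, and counts re-deliveries
during a single pass through steps 1-5. The function name is hypothetical.

```c
#include <stdbool.h>

/* Returns how often the interrupt is re-delivered in one pass of steps 1-5. */
static int spurious_redeliveries(bool eoi_broadcast_at_step2)
{
	bool line_asserted, remote_irr;
	int redelivered = 0;

	/* Step 1: the device asserts the line; the I/O APIC latches it. */
	line_asserted = true;
	remote_irr = true;

	/* Step 2: L1 EOIs its local APIC. */
	if (eoi_broadcast_at_step2) {
		remote_irr = false;
		if (line_asserted) {
			/* Line still asserted: the interrupt fires again. */
			remote_irr = true;
			redelivered++;
		}
	}

	/* Steps 3-4: L2 services the device, which de-asserts the line. */
	line_asserted = false;

	/* Step 5: L1 performs the directed I/O APIC EOI. */
	if (remote_irr) {
		remote_irr = false;
		if (line_asserted)
			redelivered++;
	}

	return redelivered;
}
```

In this model a broadcast at step #2 yields one re-delivery per pass, and
each re-delivery restarts the sequence, which is how a storm builds up. With
the broadcast suppressed, the only EOI reaches the I/O APIC after the line
has been de-asserted and nothing is re-delivered.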
Unfortunately, simply "fixing" the bug isn't an option, as KVM has no way
of knowing if the userspace I/O APIC supports directed EOIs, i.e.
suppressing EOI broadcasts would result in interrupts being stuck masked
in the userspace I/O APIC due to step #5 being ignored by userspace. And
fully disabling support for Suppress EOI Broadcast is also undesirable, as
picking up the fix would require a guest reboot, *and* more importantly
would change the virtual CPU model exposed to the guest without any buy-in
from userspace.
Add KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST and
KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST flags to allow userspace to
explicitly enable or disable support for Suppress EOI Broadcasts. This
gives userspace control over the virtual CPU model exposed to the guest,
as KVM should never have enabled support for Suppress EOI Broadcast without
userspace opt-in. Not setting either flag will result in legacy quirky
behavior for backward compatibility.
When KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST is set and using in-kernel
IRQCHIP mode, KVM will use I/O APIC version 0x20, which includes support
for the EOI Register.
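The resulting behavior for each mode and IRQCHIP type can be summarised
with a small standalone model. It is a sketch only: enum seoib_mode, struct
seoib_policy and seoib_policy_for() are hypothetical names, and the model
ignores the separate requirement that the guest CPU model includes x2APIC
before anything is advertised.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-VM mode selected by userspace. */
enum seoib_mode { SEOIB_QUIRKED, SEOIB_ENABLED, SEOIB_DISABLED };

struct seoib_policy {
	bool advertise;	/* expose SEOIB in the LAPIC version register */
	bool respect;	/* honor the guest's SPIV suppress bit */
};

static struct seoib_policy seoib_policy_for(enum seoib_mode mode,
					    bool ioapic_in_kernel)
{
	struct seoib_policy p = { false, false };

	switch (mode) {
	case SEOIB_ENABLED:
		p.advertise = true;
		p.respect = true;
		break;
	case SEOIB_DISABLED:
		/* Neither advertised nor honored. */
		break;
	case SEOIB_QUIRKED:
		/* Legacy: split IRQCHIP advertises but does not suppress. */
		p.advertise = !ioapic_in_kernel;
		p.respect = ioapic_in_kernel;
		break;
	}
	return p;
}
```

Only the quirked mode depends on where the I/O APIC is emulated; the two
explicit modes behave the same for in-kernel and split IRQCHIP.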
Note, Suppress EOI Broadcasts is defined only in Intel's SDM, not in AMD's
APM. But the bit is writable on some AMD CPUs, e.g. Turin, and KVM's ABI
is to support Directed EOI (KVM's name) irrespective of guest CPU vendor.
Fixes: 7543a635aa09 ("KVM: x86: Add KVM exit for IOAPIC EOIs")
Closes: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com
Cc: stable@vger.kernel.org
Suggested-by: David Woodhouse <dwmw2@infradead.org>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
---
Documentation/virt/kvm/api.rst | 28 +++++++++++++--
arch/x86/include/asm/kvm_host.h | 7 ++++
arch/x86/include/uapi/asm/kvm.h | 6 ++--
arch/x86/kvm/lapic.c | 64 ++++++++++++++++++++++-----------
arch/x86/kvm/x86.c | 15 ++++++--
5 files changed, 93 insertions(+), 27 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 57061fa29e6a..ad15ca519afc 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7800,8 +7800,10 @@ Will return -EBUSY if a VCPU has already been created.
Valid feature flags in args[0] are::
- #define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
- #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
+ #define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
+ #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
+ #define KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST (1ULL << 2)
+ #define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST (1ULL << 3)
Enabling KVM_X2APIC_API_USE_32BIT_IDS changes the behavior of
KVM_SET_GSI_ROUTING, KVM_SIGNAL_MSI, KVM_SET_LAPIC, and KVM_GET_LAPIC,
@@ -7814,6 +7816,28 @@ as a broadcast even in x2APIC mode in order to support physical x2APIC
without interrupt remapping. This is undesirable in logical mode,
where 0xff represents CPUs 0-7 in cluster 0.
+Setting KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST instructs KVM to enable
+Suppress EOI Broadcasts. KVM will advertise support for Suppress EOI
+Broadcast to the guest and suppress LAPIC EOI broadcasts when the guest
+sets the Suppress EOI Broadcast bit in the SPIV register. When using
+in-kernel IRQCHIP mode, enabling this capability will cause KVM to use
+I/O APIC version 0x20, which includes support for the EOI Register for
+directed EOI.
+
+Setting KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST disables support for
+Suppress EOI Broadcasts entirely, i.e. instructs KVM to NOT advertise
+support to the guest.
+
+Modern VMMs should either enable KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST
+or KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST. If not, legacy quirky
+behavior will be used by KVM: in split IRQCHIP mode, KVM will advertise
+support for Suppress EOI Broadcasts but not actually suppress EOI
+broadcasts; for in-kernel IRQCHIP mode, KVM will not advertise support for
+Suppress EOI Broadcasts.
+
+Setting both KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST and
+KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST will fail with an EINVAL error.
+
7.8 KVM_CAP_S390_USER_INSTR0
----------------------------
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 48598d017d6f..4a6d94dc7a2a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1229,6 +1229,12 @@ enum kvm_irqchip_mode {
KVM_IRQCHIP_SPLIT, /* created with KVM_CAP_SPLIT_IRQCHIP */
};
+enum kvm_suppress_eoi_broadcast_mode {
+ KVM_SUPPRESS_EOI_BROADCAST_QUIRKED, /* Legacy behavior */
+ KVM_SUPPRESS_EOI_BROADCAST_ENABLED, /* Enable Suppress EOI broadcast */
+ KVM_SUPPRESS_EOI_BROADCAST_DISABLED /* Disable Suppress EOI broadcast */
+};
+
struct kvm_x86_msr_filter {
u8 count;
bool default_allow:1;
@@ -1480,6 +1486,7 @@ struct kvm_arch {
bool x2apic_format;
bool x2apic_broadcast_quirk_disabled;
+ enum kvm_suppress_eoi_broadcast_mode suppress_eoi_broadcast_mode;
bool has_mapped_host_mmio;
bool guest_can_read_msr_platform_info;
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index d420c9c066d4..d30241429fa8 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -913,8 +913,10 @@ struct kvm_sev_snp_launch_finish {
__u64 pad1[4];
};
-#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
-#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
+#define KVM_X2APIC_API_USE_32BIT_IDS (_BITULL(0))
+#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (_BITULL(1))
+#define KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST (_BITULL(2))
+#define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST (_BITULL(3))
struct kvm_hyperv_eventfd {
__u32 conn_id;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 2c24fd8d815f..36a5af218802 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -107,21 +107,31 @@ bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
bool kvm_lapic_advertise_suppress_eoi_broadcast(struct kvm *kvm)
{
- /*
- * The default in-kernel I/O APIC emulates the 82093AA and does not
- * implement an EOI register. Some guests (e.g. Windows with the
- * Hyper-V role enabled) disable LAPIC EOI broadcast without checking
- * the I/O APIC version, which can cause level-triggered interrupts to
- * never be EOI'd.
- *
- * To avoid this, KVM must not advertise Suppress EOI Broadcast support
- * when using the default in-kernel I/O APIC.
- *
- * Historically, in split IRQCHIP mode, KVM always advertised Suppress
- * EOI Broadcast support but did not actually suppress EOIs, resulting
- * in quirky behavior.
- */
- return !ioapic_in_kernel(kvm);
+ switch (kvm->arch.suppress_eoi_broadcast_mode) {
+ case KVM_SUPPRESS_EOI_BROADCAST_ENABLED:
+ return true;
+ case KVM_SUPPRESS_EOI_BROADCAST_DISABLED:
+ return false;
+ case KVM_SUPPRESS_EOI_BROADCAST_QUIRKED:
+ /*
+ * The default in-kernel I/O APIC emulates the 82093AA and does not
+ * implement an EOI register. Some guests (e.g. Windows with the
+ * Hyper-V role enabled) disable LAPIC EOI broadcast without
+ * checking the I/O APIC version, which can cause level-triggered
+ * interrupts to never be EOI'd.
+ *
+ * To avoid this, KVM must not advertise Suppress EOI Broadcast
+ * support when using the default in-kernel I/O APIC.
+ *
+ * Historically, in split IRQCHIP mode, KVM always advertised
+ * Suppress EOI Broadcast support but did not actually suppress
+ * EOIs, resulting in quirky behavior.
+ */
+ return !ioapic_in_kernel(kvm);
+ default:
+ WARN_ON_ONCE(1);
+ return false;
+ }
}
bool kvm_lapic_respect_suppress_eoi_broadcast(struct kvm *kvm)
@@ -129,13 +139,25 @@ bool kvm_lapic_respect_suppress_eoi_broadcast(struct kvm *kvm)
/*
* Returns true if KVM should honor the guest's request to suppress EOI
* broadcasts, i.e. actually implement Suppress EOI Broadcast.
- *
- * Historically, in split IRQCHIP mode, KVM ignored the suppress EOI
- * broadcast bit set by the guest and broadcasts EOIs to the userspace
- * I/O APIC. For In-kernel I/O APIC, the support itself is not
- * advertised, but if bit was set by the guest, it was respected.
*/
- return ioapic_in_kernel(kvm);
+ switch (kvm->arch.suppress_eoi_broadcast_mode) {
+ case KVM_SUPPRESS_EOI_BROADCAST_ENABLED:
+ return true;
+ case KVM_SUPPRESS_EOI_BROADCAST_DISABLED:
+ return false;
+ case KVM_SUPPRESS_EOI_BROADCAST_QUIRKED:
+ /*
+ * Historically, in split IRQCHIP mode, KVM ignored the suppress
+ * EOI broadcast bit set by the guest and broadcasts EOIs to the
+ * userspace I/O APIC. For In-kernel I/O APIC, the support itself
+ * is not advertised, but if bit was set by the guest, it was
+ * respected.
+ */
+ return ioapic_in_kernel(kvm);
+ default:
+ WARN_ON_ONCE(1);
+ return false;
+ }
}
__read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c9c2aa6f4705..5d56b0384dcc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -121,8 +121,10 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
#define KVM_CAP_PMU_VALID_MASK KVM_PMU_CAP_DISABLE
-#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
- KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
+#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
+ KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK | \
+ KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST | \
+ KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
static void update_cr8_intercept(struct kvm_vcpu *vcpu);
static void process_nmi(struct kvm_vcpu *vcpu);
@@ -6778,11 +6780,20 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
if (cap->args[0] & ~KVM_X2APIC_API_VALID_FLAGS)
break;
+ if ((cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) &&
+ (cap->args[0] & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST))
+ break;
+
if (cap->args[0] & KVM_X2APIC_API_USE_32BIT_IDS)
kvm->arch.x2apic_format = true;
if (cap->args[0] & KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
kvm->arch.x2apic_broadcast_quirk_disabled = true;
+ if (cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST)
+ kvm->arch.suppress_eoi_broadcast_mode = KVM_SUPPRESS_EOI_BROADCAST_ENABLED;
+ if (cap->args[0] & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
+ kvm->arch.suppress_eoi_broadcast_mode = KVM_SUPPRESS_EOI_BROADCAST_DISABLED;
+
r = 0;
break;
case KVM_CAP_X86_DISABLE_EXITS:
--
2.39.3
* Re: [PATCH v5 2/3] KVM: x86/ioapic: Implement support for I/O APIC version 0x20 with EOIR
2025-12-29 11:17 ` [PATCH v5 2/3] KVM: x86/ioapic: Implement support for I/O APIC version 0x20 with EOIR Khushit Shah
@ 2025-12-29 11:39 ` David Woodhouse
2025-12-29 12:21 ` Khushit Shah
0 siblings, 1 reply; 28+ messages in thread
From: David Woodhouse @ 2025-12-29 11:39 UTC
To: Khushit Shah, seanjc, pbonzini, kai.huang
Cc: mingo, x86, bp, hpa, linux-kernel, kvm, dave.hansen, tglx, jon,
shaju.abraham, David Woodhouse
On 29 December 2025 11:17:07 GMT, Khushit Shah <khushit.shah@nutanix.com> wrote:
>From: David Woodhouse <dwmw@amazon.co.uk>
>
>Introduce support for I/O APIC version 0x20, which includes the EOI
>Register (EOIR) for directed EOI. The EOI register allows guests to
>perform EOIs to individual I/O APICs instead of relying on broadcast EOIs
>from the local APIC.
>
>When Suppress EOI Broadcast (SEOIB) capability is advertised to the guest,
>guests that enable it will EOI individual I/O APICs by writing to their
>EOI register instead of relying on broadcast EOIs from the LAPIC. Hence,
>when SEOIB is advertised (so that guests can use it if they choose), use
>I/O APIC version 0x20 to provide the EOI register. This prepares for a
>userspace API that will allow explicit control of SEOIB support, providing
>a consistent interface for both in-kernel and split IRQCHIP mode.
>
>Add a tracepoint (kvm_ioapic_directed_eoi) to track directed EOIs for
>debugging and observability.
>
>Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
>---
> arch/x86/kvm/ioapic.c | 31 +++++++++++++++++++++++++++++--
> arch/x86/kvm/ioapic.h | 19 +++++++++++--------
> arch/x86/kvm/trace.h | 17 +++++++++++++++++
> 3 files changed, 57 insertions(+), 10 deletions(-)
>
>diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>index 6bf8d110aece..eea1eb7845c4 100644
>--- a/arch/x86/kvm/ioapic.c
>+++ b/arch/x86/kvm/ioapic.c
>@@ -48,8 +48,11 @@ static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic)
>
> switch (ioapic->ioregsel) {
> case IOAPIC_REG_VERSION:
>- result = ((((IOAPIC_NUM_PINS - 1) & 0xff) << 16)
>- | (IOAPIC_VERSION_ID & 0xff));
>+ if (kvm_lapic_advertise_suppress_eoi_broadcast(ioapic->kvm))
>+ result = IOAPIC_VERSION_ID_EOIR;
>+ else
>+ result = IOAPIC_VERSION_ID;
>+ result |= ((IOAPIC_NUM_PINS - 1) & 0xff) << 16;
I think that wants to depend on _respect_, not _advertise_? Otherwise you're changing existing behaviour in the legacy/quirk case where the VMM neither explicitly enables nor disables the feature.
> break;
>
> case IOAPIC_REG_APIC_ID:
>@@ -57,6 +60,10 @@ static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic)
> result = ((ioapic->id & 0xf) << 24);
> break;
>
>+ case IOAPIC_REG_BOOT_CONFIG:
>+ result = 0x01; /* Processor bus */
>+ break;
>+
> default:
> {
> u32 redir_index = (ioapic->ioregsel - 0x10) >> 1;
>@@ -701,6 +708,26 @@ static int ioapic_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
> ioapic_write_indirect(ioapic, data);
> break;
>
>+ case IOAPIC_REG_EOIR:
>+ /*
>+ * The EOI register is supported (and version 0x20 advertised)
>+ * when userspace explicitly enables suppress EOI broadcast.
>+ */
>+ if (kvm_lapic_advertise_suppress_eoi_broadcast(vcpu->kvm)) {
I'm torn, but I suspect this one should be conditional on _respect_ too. A guest shouldn't be trying this register unless the version register suggests that it exists anyway.
>+ u8 vector = data & 0xff;
>+ int i;
>+
>+ trace_kvm_ioapic_directed_eoi(vcpu, vector);
>+ rtc_irq_eoi(ioapic, vcpu, vector);
>+ for (i = 0; i < IOAPIC_NUM_PINS; i++) {
>+ union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
>+
>+ if (ent->fields.vector != vector)
>+ continue;
>+ kvm_ioapic_update_eoi_one(vcpu, ioapic, ent->fields.trig_mode, i);
>+ }
>+ }
>+ break;
> default:
> break;
> }
>diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
>index bf28dbc11ff6..f219577f738c 100644
>--- a/arch/x86/kvm/ioapic.h
>+++ b/arch/x86/kvm/ioapic.h
>@@ -11,7 +11,8 @@ struct kvm_vcpu;
>
> #define IOAPIC_NUM_PINS KVM_IOAPIC_NUM_PINS
> #define MAX_NR_RESERVED_IOAPIC_PINS KVM_MAX_IRQ_ROUTES
>-#define IOAPIC_VERSION_ID 0x11 /* IOAPIC version */
>+#define IOAPIC_VERSION_ID 0x11 /* Default IOAPIC version */
>+#define IOAPIC_VERSION_ID_EOIR 0x20 /* IOAPIC version with EOIR support */
> #define IOAPIC_EDGE_TRIG 0
> #define IOAPIC_LEVEL_TRIG 1
>
>@@ -19,13 +20,15 @@ struct kvm_vcpu;
> #define IOAPIC_MEM_LENGTH 0x100
>
> /* Direct registers. */
>-#define IOAPIC_REG_SELECT 0x00
>-#define IOAPIC_REG_WINDOW 0x10
>-
>-/* Indirect registers. */
>-#define IOAPIC_REG_APIC_ID 0x00 /* x86 IOAPIC only */
>-#define IOAPIC_REG_VERSION 0x01
>-#define IOAPIC_REG_ARB_ID 0x02 /* x86 IOAPIC only */
>+#define IOAPIC_REG_SELECT 0x00
>+#define IOAPIC_REG_WINDOW 0x10
>+#define IOAPIC_REG_EOIR 0x40 /* version 0x20+ only */
>+
>+/* INDIRECT registers. */
>+#define IOAPIC_REG_APIC_ID 0x00 /* x86 IOAPIC only */
>+#define IOAPIC_REG_VERSION 0x01
>+#define IOAPIC_REG_ARB_ID 0x02 /* x86 IOAPIC only */
>+#define IOAPIC_REG_BOOT_CONFIG 0x03 /* x86 IOAPIC only */
>
> /*ioapic delivery mode*/
> #define IOAPIC_FIXED 0x0
>diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
>index e79bc9cb7162..6902758353a9 100644
>--- a/arch/x86/kvm/trace.h
>+++ b/arch/x86/kvm/trace.h
>@@ -315,6 +315,23 @@ TRACE_EVENT(kvm_ioapic_delayed_eoi_inj,
> (__entry->e & (1<<15)) ? "level" : "edge",
> (__entry->e & (1<<16)) ? "|masked" : "")
> );
>+
>+TRACE_EVENT(kvm_ioapic_directed_eoi,
>+ TP_PROTO(struct kvm_vcpu *vcpu, u8 vector),
>+ TP_ARGS(vcpu, vector),
>+
>+ TP_STRUCT__entry(
>+ __field( __u32, apicid )
>+ __field( __u8, vector )
>+ ),
>+
>+ TP_fast_assign(
>+ __entry->apicid = vcpu->vcpu_id;
>+ __entry->vector = vector;
>+ ),
>+
>+ TP_printk("apicid %x vector %u", __entry->apicid, __entry->vector)
>+);
> #endif
>
> TRACE_EVENT(kvm_msi_set_irq,
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v5 2/3] KVM: x86/ioapic: Implement support for I/O APIC version 0x20 with EOIR
2025-12-29 11:39 ` David Woodhouse
@ 2025-12-29 12:21 ` Khushit Shah
2025-12-29 13:01 ` David Woodhouse
0 siblings, 1 reply; 28+ messages in thread
From: Khushit Shah @ 2025-12-29 12:21 UTC (permalink / raw)
To: David Woodhouse
Cc: seanjc@google.com, pbonzini@redhat.com, kai.huang@intel.com,
mingo@redhat.com, x86@kernel.org, bp@alien8.de, hpa@zytor.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
dave.hansen@linux.intel.com, tglx@linutronix.de, Jon Kohler,
Shaju Abraham, David Woodhouse
> On 29 Dec 2025, at 5:09 PM, David Woodhouse <dwmw2@infradead.org> wrote:
>
> On 29 December 2025 11:17:07 GMT, Khushit Shah <khushit.shah@nutanix.com> wrote:
>> From: David Woodhouse <dwmw@amazon.co.uk>
>>
>> Introduce support for I/O APIC version 0x20, which includes the EOI
>> Register (EOIR) for directed EOI. The EOI register allows guests to
>> perform EOIs to individual I/O APICs instead of relying on broadcast EOIs
>> from the local APIC.
>>
>> When Suppress EOI Broadcast (SEOIB) capability is advertised to the guest,
>> guests that enable it will EOI individual I/O APICs by writing to their
>> EOI register instead of relying on broadcast EOIs from the LAPIC. Hence,
>> when SEOIB is advertised (so that guests can use it if they choose), use
>> I/O APIC version 0x20 to provide the EOI register. This prepares for a
>> userspace API that will allow explicit control of SEOIB support, providing
>> a consistent interface for both in-kernel and split IRQCHIP mode.
>>
>> Add a tracepoint (kvm_ioapic_directed_eoi) to track directed EOIs for
>> debugging and observability.
>>
>> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>> Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
>> ---
>> arch/x86/kvm/ioapic.c | 31 +++++++++++++++++++++++++++++--
>> arch/x86/kvm/ioapic.h | 19 +++++++++++--------
>> arch/x86/kvm/trace.h | 17 +++++++++++++++++
>> 3 files changed, 57 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>> index 6bf8d110aece..eea1eb7845c4 100644
>> --- a/arch/x86/kvm/ioapic.c
>> +++ b/arch/x86/kvm/ioapic.c
>> @@ -48,8 +48,11 @@ static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic)
>>
>> switch (ioapic->ioregsel) {
>> case IOAPIC_REG_VERSION:
>> - result = ((((IOAPIC_NUM_PINS - 1) & 0xff) << 16)
>> - | (IOAPIC_VERSION_ID & 0xff));
>> + if (kvm_lapic_advertise_suppress_eoi_broadcast(ioapic->kvm))
>> + result = IOAPIC_VERSION_ID_EOIR;
>> + else
>> + result = IOAPIC_VERSION_ID;
>> + result |= ((IOAPIC_NUM_PINS - 1) & 0xff) << 16;
>
> I think that wants to depend on _respect_ not _advertise_? Otherwise you're changing existing behaviour in the legacy/quirk case where the VMM neither explicitly enables nor disables the feature.
I think _advertise_ is correct: in the legacy case with the in-kernel IRQCHIP, _advertise_ is false. For the in-kernel IRQCHIP, _advertise_ is only true when SEOIB is explicitly *enabled*.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v5 2/3] KVM: x86/ioapic: Implement support for I/O APIC version 0x20 with EOIR
2025-12-29 12:21 ` Khushit Shah
@ 2025-12-29 13:01 ` David Woodhouse
2025-12-29 15:16 ` Khushit Shah
0 siblings, 1 reply; 28+ messages in thread
From: David Woodhouse @ 2025-12-29 13:01 UTC (permalink / raw)
To: Khushit Shah
Cc: seanjc@google.com, pbonzini@redhat.com, kai.huang@intel.com,
mingo@redhat.com, x86@kernel.org, bp@alien8.de, hpa@zytor.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
dave.hansen@linux.intel.com, tglx@linutronix.de, Jon Kohler,
Shaju Abraham
On Mon, 2025-12-29 at 12:21 +0000, Khushit Shah wrote:
>
> > On 29 Dec 2025, at 5:09 PM, David Woodhouse <dwmw2@infradead.org>
> > wrote:
> >
> > On 29 December 2025 11:17:07 GMT, Khushit Shah
> > <khushit.shah@nutanix.com> wrote:
> > > From: David Woodhouse <dwmw@amazon.co.uk>
> > >
> > > Introduce support for I/O APIC version 0x20, which includes the
> > > EOI
> > > Register (EOIR) for directed EOI. The EOI register allows guests
> > > to
> > > perform EOIs to individual I/O APICs instead of relying on
> > > broadcast EOIs
> > > from the local APIC.
> > >
> > > When Suppress EOI Broadcast (SEOIB) capability is advertised to
> > > the guest,
> > > guests that enable it will EOI individual I/O APICs by writing to
> > > their
> > > EOI register instead of relying on broadcast EOIs from the
> > > LAPIC. Hence,
> > > when SEOIB is advertised (so that guests can use it if they
> > > choose), use
> > > I/O APIC version 0x20 to provide the EOI register. This prepares
> > > for a
> > > userspace API that will allow explicit control of SEOIB support,
> > > providing
> > > a consistent interface for both in-kernel and split IRQCHIP mode.
> > >
> > > Add a tracepoint (kvm_ioapic_directed_eoi) to track directed EOIs
> > > for
> > > debugging and observability.
> > >
> > > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> > > Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
> > > ---
> > > arch/x86/kvm/ioapic.c | 31 +++++++++++++++++++++++++++++--
> > > arch/x86/kvm/ioapic.h | 19 +++++++++++--------
> > > arch/x86/kvm/trace.h | 17 +++++++++++++++++
> > > 3 files changed, 57 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> > > index 6bf8d110aece..eea1eb7845c4 100644
> > > --- a/arch/x86/kvm/ioapic.c
> > > +++ b/arch/x86/kvm/ioapic.c
> > > @@ -48,8 +48,11 @@ static unsigned long
> > > ioapic_read_indirect(struct kvm_ioapic *ioapic)
> > >
> > > switch (ioapic->ioregsel) {
> > > case IOAPIC_REG_VERSION:
> > > - result = ((((IOAPIC_NUM_PINS - 1) & 0xff) << 16)
> > > - | (IOAPIC_VERSION_ID & 0xff));
> > > + if (kvm_lapic_advertise_suppress_eoi_broadcast(ioapic->kvm))
> > > + result = IOAPIC_VERSION_ID_EOIR;
> > > + else
> > > + result = IOAPIC_VERSION_ID;
> > > + result |= ((IOAPIC_NUM_PINS - 1) & 0xff) << 16;
> >
> > I think that wants to depend on _respect_ not _advertise_?
> > Otherwise you're changing existing behaviour in the legacy/quirk
> > > case where the VMM neither explicitly enables nor disables the
> > feature.
>
> I think _advertise_ is correct, as for legacy case, in kernel IRQCHIP
> mode, _advertise_ is false. For kernel IRQCHIP, _advertise_ is only
> true when *enabled*.
Hm? IIUC kvm_lapic_advertise_suppress_eoi_broadcast() is true whenever
userspace *hasn't* set KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST
(either userspace has explicitly *enabled* it instead, or userspace has
done neither and we should preserve the legacy behaviour).
If the kernel I/O APIC is enabled when userspace has not explicitly
either enabled or disabled EOI suppression, then the I/O APIC should
advertise precisely the same features as before. As far as I can tell,
this will make the kernel I/O APIC advertise the newer version and
support the EOI register in that legacy case, which it shouldn't?
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v5 2/3] KVM: x86/ioapic: Implement support for I/O APIC version 0x20 with EOIR
2025-12-29 13:01 ` David Woodhouse
@ 2025-12-29 15:16 ` Khushit Shah
2025-12-29 15:36 ` David Woodhouse
0 siblings, 1 reply; 28+ messages in thread
From: Khushit Shah @ 2025-12-29 15:16 UTC (permalink / raw)
To: David Woodhouse
Cc: seanjc@google.com, pbonzini@redhat.com, kai.huang@intel.com,
mingo@redhat.com, x86@kernel.org, bp@alien8.de, hpa@zytor.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
dave.hansen@linux.intel.com, tglx@linutronix.de, Jon Kohler,
Shaju Abraham
> On 29 Dec 2025, at 6:31 PM, David Woodhouse <dwmw2@infradead.org> wrote:
>
> Hm? IIUC kvm_lapic_advertise_suppress_eoi_broadcast() is true whenever
> userspace *hasn't* set KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST
> (either userspace has explicitly *enabled* it instead, or userspace has
> done neither and we should preserve the legacy behaviour).
The legacy behaviour for "kvm_lapic_advertise_suppress_eoi_broadcast()" is:
- true for split IRQCHIP (userspace I/O APIC)
- false for in-kernel IRQCHIP
The in-kernel IRQCHIP case was "fixed" by commit 0bcc3fb95b97 ("KVM: lapic:
stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use"), which made
it return false when IOAPIC is in-kernel.
With this series, in QUIRKED mode the function still returns !ioapic_in_kernel(),
preserving that exact legacy behavior. The I/O APIC version 0x20 (with EOIR)
is only used when userspace explicitly sets the ENABLE flag.
The comments in patch 1 explain this in more detail ;)
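To make the version-register point concrete, here is a minimal stand-alone sketch of the IOAPIC_REG_VERSION read path from the quoted hunk. It is not the kernel code: ioapic_version_reg() is a hypothetical helper, the boolean stands in for kvm_lapic_advertise_suppress_eoi_broadcast(), and the pin count assumes KVM_IOAPIC_NUM_PINS is 24.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Version IDs as in the quoted ioapic.h hunk. */
#define IOAPIC_VERSION_ID      0x11 /* default: no EOI register */
#define IOAPIC_VERSION_ID_EOIR 0x20 /* provides the EOI register */
/* Assumption: KVM_IOAPIC_NUM_PINS is 24. */
#define IOAPIC_NUM_PINS        24

/*
 * Hypothetical model of reading IOAPIC_REG_VERSION.
 * Bits 7:0 hold the version, bits 23:16 the maximum redirection entry.
 */
uint32_t ioapic_version_reg(bool advertise_seoib)
{
	uint32_t result = advertise_seoib ? IOAPIC_VERSION_ID_EOIR
					  : IOAPIC_VERSION_ID;

	/* The version must fit in the low byte of the register. */
	assert((result & ~0xffu) == 0);

	result |= (uint32_t)((IOAPIC_NUM_PINS - 1) & 0xff) << 16;
	return result;
}
```

With these assumptions, a guest sees 0x00170011 when SEOIB is not advertised and 0x00170020 when it is, so only the low byte changes between the two cases.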
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v5 2/3] KVM: x86/ioapic: Implement support for I/O APIC version 0x20 with EOIR
2025-12-29 15:16 ` Khushit Shah
@ 2025-12-29 15:36 ` David Woodhouse
2025-12-29 15:57 ` Khushit Shah
0 siblings, 1 reply; 28+ messages in thread
From: David Woodhouse @ 2025-12-29 15:36 UTC (permalink / raw)
To: Khushit Shah
Cc: seanjc@google.com, pbonzini@redhat.com, kai.huang@intel.com,
mingo@redhat.com, x86@kernel.org, bp@alien8.de, hpa@zytor.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
dave.hansen@linux.intel.com, tglx@linutronix.de, Jon Kohler,
Shaju Abraham
On 29 December 2025 15:16:40 GMT, Khushit Shah <khushit.shah@nutanix.com> wrote:
>
>
>> On 29 Dec 2025, at 6:31 PM, David Woodhouse <dwmw2@infradead.org> wrote:
>>
>> Hm? IIUC kvm_lapic_advertise_suppress_eoi_broadcast() is true whenever
>> userspace *hasn't* set KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST
>> (either userspace has explicitly *enabled* it instead, or userspace has
>> done neither and we should preserve the legacy behaviour).
>
>The legacy behaviour for "kvm_lapic_advertise_suppress_eoi_broadcast()" is:
>- true for split IRQCHIP (userspace I/O APIC)
>- false for in-kernel IRQCHIP
>
>The in-kernel IRQCHIP case was "fixed" by commit 0bcc3fb95b97 ("KVM: lapic:
>stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use"), which made
>it return false when IOAPIC is in-kernel.
>
>With this series, in QUIRKED mode the function still returns !ioapic_in_kernel(),
>preserving that exact legacy behavior. The I/O APIC version 0x20 (with EOIR)
>is only used when userspace explicitly sets the ENABLE flag.
>
>The comments in patch 1 explain this in more detail ;)
Ah, OK. So in the case of in-kernel I/O APIC, kvm_lapic_advertise_suppress_eoi_broadcast() and kvm_lapic_respect_suppress_eoi_broadcast() are the same. In that case we can choose the one which is easier to understand and doesn't need the reader to refer back to an earlier commit? I accept your correction; the patch is correct.
But I think I still prefer the check to be on _respect_ as it's clearer that it's part of the new behaviour that is only introduced with this series.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v5 2/3] KVM: x86/ioapic: Implement support for I/O APIC version 0x20 with EOIR
2025-12-29 15:36 ` David Woodhouse
@ 2025-12-29 15:57 ` Khushit Shah
2026-01-02 16:17 ` David Woodhouse
0 siblings, 1 reply; 28+ messages in thread
From: Khushit Shah @ 2025-12-29 15:57 UTC (permalink / raw)
To: David Woodhouse
Cc: seanjc@google.com, pbonzini@redhat.com, kai.huang@intel.com,
mingo@redhat.com, x86@kernel.org, bp@alien8.de, hpa@zytor.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
dave.hansen@linux.intel.com, tglx@linutronix.de, Jon Kohler,
Shaju Abraham
> On 29 Dec 2025, at 9:06 PM, David Woodhouse <dwmw2@infradead.org> wrote:
>
> Ah, OK. So in the case of in-kernel I/O APIC, kvm_lapic_advertise_suppress_eoi_broadcast() and kvm_lapic_respect_suppress_eoi_broadcast() are the same. In that case we can choose the one which is easier to understand and doesn't need the reader to refer back to an earlier commit? I accept your correction; the patch is correct.
>
> But I think I still prefer the check to be on _respect_ as it's clearer that it's part of the new behaviour that is only introduced with this series.
We can't use `_respect_` here because in QUIRKED mode with in-kernel IRQCHIP:
advertise = false (version 0x11 advertised, no EOIR register)
respect = true (legacy quirk: honor SPIV bit even if not advertised)
It is true that when SEOIB is not advertised, the bit should not be
respected. However, the legacy KVM implementation still respected the
SPIV bit in kvm_ioapic_update_eoi_one() even when not advertising SEOIB.
I've preserved that legacy behavior in `_respect_` for QUIRKED mode.
I think the logic is straightforward: if we advertise SEOIB while using in-kernel
IRQCHIP, use I/O APIC version 0x20.
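The advertise/respect split described above can be written down as a small truth table. This is only a sketch of the behaviour as described in this thread, not the code from patch 1: the enum and function names are hypothetical, and the DISABLED row for _respect_ is an assumption (a guest is not expected to set a bit that is not advertised).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for enum kvm_suppress_eoi_broadcast_mode. */
enum seoib_mode {
	SEOIB_QUIRKED = 0,	/* neither flag set: legacy behaviour */
	SEOIB_ENABLED,		/* userspace set the ENABLE flag */
	SEOIB_DISABLED,		/* userspace set the DISABLE flag */
};

/* Is Suppress EOI Broadcast advertised in the LAPIC version register? */
bool seoib_advertise(enum seoib_mode mode, bool ioapic_in_kernel)
{
	switch (mode) {
	case SEOIB_ENABLED:
		return true;
	case SEOIB_DISABLED:
		return false;
	case SEOIB_QUIRKED:
		/* Legacy: advertised only with a userspace I/O APIC. */
		return !ioapic_in_kernel;
	}
	assert(0 && "unknown mode");
	return false;
}

/* Does KVM honour the guest's SPIV suppress bit? */
bool seoib_respect(enum seoib_mode mode, bool ioapic_in_kernel)
{
	switch (mode) {
	case SEOIB_ENABLED:
		return true;
	case SEOIB_DISABLED:
		return false;	/* assumption, see lead-in */
	case SEOIB_QUIRKED:
		/* Legacy: honoured only with the in-kernel I/O APIC. */
		return ioapic_in_kernel;
	}
	assert(0 && "unknown mode");
	return false;
}

/* Version reported by the in-kernel I/O APIC (low byte only). */
uint8_t in_kernel_ioapic_version(enum seoib_mode mode)
{
	return seoib_advertise(mode, true) ? 0x20 : 0x11;
}
```

In QUIRKED mode with the in-kernel IRQCHIP the two helpers disagree (advertise is false, respect is true), which is why keying the I/O APIC version on _respect_ would report 0x20 in the legacy case.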
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v5 2/3] KVM: x86/ioapic: Implement support for I/O APIC version 0x20 with EOIR
2025-12-29 15:57 ` Khushit Shah
@ 2026-01-02 16:17 ` David Woodhouse
2026-01-12 3:22 ` Khushit Shah
0 siblings, 1 reply; 28+ messages in thread
From: David Woodhouse @ 2026-01-02 16:17 UTC (permalink / raw)
To: Khushit Shah
Cc: seanjc@google.com, pbonzini@redhat.com, kai.huang@intel.com,
mingo@redhat.com, x86@kernel.org, bp@alien8.de, hpa@zytor.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
dave.hansen@linux.intel.com, tglx@linutronix.de, Jon Kohler,
Shaju Abraham
On Mon, 2025-12-29 at 15:57 +0000, Khushit Shah wrote:
>
>
> We can't use `_respect_` here because in QUIRKED mode with in-kernel IRQCHIP:
>
> advertise = false (version 0x11 advertised, no EOIR register)
> respect = true (legacy quirk: honor SPIV bit even if not advertised)
Oh wow, right. Since commit 0bcc3fb95b97a ("KVM: lapic: stop
advertising DIRECTED_EOI when in-kernel IOAPIC is in use"), KVM with
the in-kernel I/O APIC will *not* advertise the EOI suppression in the
local APIC version register… but does actually honour the DIRECTED_EOI
bit if the guest sets it anyway.
While with a userspace I/O APIC, KVM *will* advertise it, but not
honour it.
Yay.
So yes, your code is the best way to do it. Sorry for the noise.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v5 1/3] KVM: x86: Refactor suppress EOI broadcast logic
2025-12-29 11:17 ` [PATCH v5 1/3] KVM: x86: Refactor suppress EOI broadcast logic Khushit Shah
@ 2026-01-02 16:23 ` David Woodhouse
2026-01-12 4:15 ` Khushit Shah
2026-01-13 23:11 ` Sean Christopherson
1 sibling, 1 reply; 28+ messages in thread
From: David Woodhouse @ 2026-01-02 16:23 UTC (permalink / raw)
To: Khushit Shah, seanjc, pbonzini, kai.huang
Cc: mingo, x86, bp, hpa, linux-kernel, kvm, dave.hansen, tglx, jon,
shaju.abraham
On Mon, 2025-12-29 at 11:17 +0000, Khushit Shah wrote:
> Extract the suppress EOI broadcast (Directed EOI) logic into helper
> functions and move the check from kvm_ioapic_update_eoi_one() to
> kvm_ioapic_update_eoi() (required for a later patch). Prepare
> kvm_ioapic_send_eoi() to honor Suppress EOI Broadcast in split IRQCHIP
> mode.
>
> Introduce two helper functions:
> - kvm_lapic_advertise_suppress_eoi_broadcast(): determines whether KVM
> should advertise Suppress EOI Broadcast support to the guest
> - kvm_lapic_respect_suppress_eoi_broadcast(): determines whether KVM should
> honor the guest's request to suppress EOI broadcasts
>
> This refactoring prepares for I/O APIC version 0x20 support and userspace
> control of suppress EOI broadcast behavior.
>
> Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
Looks good to me, thanks for pushing this through to completion!
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Nit: Ideally I would prefer to see an explicit 'no functional
change intended' and a reference to commit 0bcc3fb95b97a.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v5 3/3] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
2025-12-29 11:17 ` [PATCH v5 3/3] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression Khushit Shah
@ 2026-01-02 16:41 ` David Woodhouse
2026-01-12 3:27 ` Khushit Shah
0 siblings, 1 reply; 28+ messages in thread
From: David Woodhouse @ 2026-01-02 16:41 UTC (permalink / raw)
To: Khushit Shah, seanjc, pbonzini, kai.huang
Cc: mingo, x86, bp, hpa, linux-kernel, kvm, dave.hansen, tglx, jon,
shaju.abraham, stable
On Mon, 2025-12-29 at 11:17 +0000, Khushit Shah wrote:
> Add two flags for KVM_CAP_X2APIC_API to allow userspace to control support
> for Suppress EOI Broadcasts, which KVM completely mishandles. When x2APIC
> support was first added, KVM incorrectly advertised and "enabled" Suppress
> EOI Broadcast, without fully supporting the I/O APIC side of the equation,
> i.e. without adding directed EOI to KVM's in-kernel I/O APIC.
>
> That flaw was carried over to split IRQCHIP support, i.e. KVM advertised
> support for Suppress EOI Broadcasts irrespective of whether or not the
> userspace I/O APIC implementation supported directed EOIs. Even worse,
> KVM didn't actually suppress EOI broadcasts, i.e. userspace VMMs without
> support for directed EOI came to rely on the "spurious" broadcasts.
>
> KVM "fixed" the in-kernel I/O APIC implementation by completely disabling
> support for Suppress EOI Broadcasts in commit 0bcc3fb95b97 ("KVM: lapic:
> stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use"), but
> didn't do anything to remedy userspace I/O APIC implementations.
>
> KVM's bogus handling of Suppress EOI Broadcast is problematic when the
> guest relies on interrupts being masked in the I/O APIC until well after
> the initial local APIC EOI. E.g. Windows with Credential Guard enabled
> handles interrupts in the following order:
> 1. Interrupt for L2 arrives.
> 2. L1 APIC EOIs the interrupt.
> 3. L1 resumes L2 and injects the interrupt.
> 4. L2 EOIs after servicing.
> 5. L1 performs the I/O APIC EOI.
>
> Because KVM EOIs the I/O APIC at step #2, the guest can get an interrupt
> storm, e.g. if the IRQ line is still asserted and userspace reacts to the
> EOI by re-injecting the IRQ, because the guest doesn't de-assert the line
> until step #4, and doesn't expect the interrupt to be re-enabled until
> step #5.
>
> Unfortunately, simply "fixing" the bug isn't an option, as KVM has no way
> of knowing if the userspace I/O APIC supports directed EOIs, i.e.
> suppressing EOI broadcasts would result in interrupts being stuck masked
> in the userspace I/O APIC due to step #5 being ignored by userspace. And
> fully disabling support for Suppress EOI Broadcast is also undesirable, as
> picking up the fix would require a guest reboot, *and* more importantly
> would change the virtual CPU model exposed to the guest without any buy-in
> from userspace.
>
> Add KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST and
> KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST flags to allow userspace to
> explicitly enable or disable support for Suppress EOI Broadcasts. This
> gives userspace control over the virtual CPU model exposed to the guest,
> as KVM should never have enabled support for Suppress EOI Broadcast without
> userspace opt-in. Not setting either flag will result in legacy quirky
> behavior for backward compatibility.
>
> When KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST is set and using in-kernel
> IRQCHIP mode, KVM will use I/O APIC version 0x20, which includes support
> for the EOI Register.
>
> Note, Suppress EOI Broadcasts is defined only in Intel's SDM, not in AMD's
> APM. But the bit is writable on some AMD CPUs, e.g. Turin, and KVM's ABI
> is to support Directed EOI (KVM's name) irrespective of guest CPU vendor.
>
> Fixes: 7543a635aa09 ("KVM: x86: Add KVM exit for IOAPIC EOIs")
> Closes: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com
> Cc: stable@vger.kernel.org
Do we want the Cc:stable? And if we do we'd want it on all three
patches, surely?
> Suggested-by: David Woodhouse <dwmw2@infradead.org>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Although...
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1229,6 +1229,12 @@ enum kvm_irqchip_mode {
> KVM_IRQCHIP_SPLIT, /* created with KVM_CAP_SPLIT_IRQCHIP */
> };
>
> +enum kvm_suppress_eoi_broadcast_mode {
> + KVM_SUPPRESS_EOI_BROADCAST_QUIRKED, /* Legacy behavior */
I believe it's cosmetic but I think I'd be slightly happier with an
explicit '= 0' on that, as we rely on that field being initialised to
zero with the allocation of struct kvm, don't we?
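A small user-space sketch of why the explicit '= 0' documents the intent: a zero-filled allocation yields the QUIRKED mode without any further initialisation. The names below are hypothetical stand-ins (the real field lives in KVM's per-VM state, which the kernel allocates zeroed), and calloc() plays the role of that zeroing allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for enum kvm_suppress_eoi_broadcast_mode. */
enum seoib_mode {
	SEOIB_QUIRKED = 0,	/* must be zero: the default after zeroed alloc */
	SEOIB_ENABLED,
	SEOIB_DISABLED,
};

/* Hypothetical stand-in for the per-VM state holding the mode. */
struct vm_model {
	enum seoib_mode seoib_mode;
};

/* Allocate zero-filled state, as a zeroing allocator would. */
struct vm_model *vm_model_alloc(void)
{
	return calloc(1, sizeof(struct vm_model));
}

/* Return the mode a freshly allocated VM starts in. */
enum seoib_mode vm_model_default_mode(void)
{
	struct vm_model *vm = vm_model_alloc();
	enum seoib_mode mode;

	assert(vm != NULL);
	mode = vm->seoib_mode;	/* never written, so still all-zero */
	free(vm);
	return mode;
}
```

C already assigns 0 to the first enumerator, so the '= 0' changes nothing functionally; it only guards against someone later inserting a new enumerator ahead of QUIRKED.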
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v5 2/3] KVM: x86/ioapic: Implement support for I/O APIC version 0x20 with EOIR
2026-01-02 16:17 ` David Woodhouse
@ 2026-01-12 3:22 ` Khushit Shah
0 siblings, 0 replies; 28+ messages in thread
From: Khushit Shah @ 2026-01-12 3:22 UTC (permalink / raw)
To: David Woodhouse
Cc: seanjc@google.com, pbonzini@redhat.com, kai.huang@intel.com,
mingo@redhat.com, x86@kernel.org, bp@alien8.de, hpa@zytor.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
dave.hansen@linux.intel.com, tglx@linutronix.de, Jon Kohler,
Shaju Abraham
> On 2 Jan 2026, at 9:47 PM, David Woodhouse <dwmw2@infradead.org> wrote:
>
> On Mon, 2025-12-29 at 15:57 +0000, Khushit Shah wrote:
>>
>>
>> We can't use `_respect_` here because in QUIRKED mode with in-kernel IRQCHIP:
>>
>> advertise = false (version 0x11 advertised, no EOIR register)
>> respect = true (legacy quirk: honor SPIV bit even if not advertised)
>
> Oh wow, right. Since commit 0bcc3fb95b97a ("KVM: lapic: stop
> advertising DIRECTED_EOI when in-kernel IOAPIC is in use"), KVM with
> the in-kernel I/O APIC will *not* advertise the EOI suppression in the
> local APIC version register… but does actually honour the DIRECTED_EOI
> bit if the guest sets it anyway.
>
> While with a userspace I/O APIC, KVM *will* advertise it, but not
> honour it.
>
> Yay.
>
> So yes, your code is the best way to do it. Sorry for the noise.
Thanks for the review!
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v5 3/3] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
2026-01-02 16:41 ` David Woodhouse
@ 2026-01-12 3:27 ` Khushit Shah
0 siblings, 0 replies; 28+ messages in thread
From: Khushit Shah @ 2026-01-12 3:27 UTC (permalink / raw)
To: David Woodhouse
Cc: seanjc@google.com, pbonzini@redhat.com, kai.huang@intel.com,
mingo@redhat.com, x86@kernel.org, bp@alien8.de, hpa@zytor.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
dave.hansen@linux.intel.com, tglx@linutronix.de, Jon Kohler,
Shaju Abraham, stable@vger.kernel.org
> On 2 Jan 2026, at 10:11 PM, David Woodhouse <dwmw2@infradead.org> wrote:
>
> On Mon, 2025-12-29 at 11:17 +0000, Khushit Shah wrote:
>> Add two flags for KVM_CAP_X2APIC_API to allow userspace to control support
>> for Suppress EOI Broadcasts, which KVM completely mishandles. When x2APIC
>> support was first added, KVM incorrectly advertised and "enabled" Suppress
>> EOI Broadcast, without fully supporting the I/O APIC side of the equation,
>> i.e. without adding directed EOI to KVM's in-kernel I/O APIC.
>>
>> That flaw was carried over to split IRQCHIP support, i.e. KVM advertised
>> support for Suppress EOI Broadcasts irrespective of whether or not the
>> userspace I/O APIC implementation supported directed EOIs. Even worse,
>> KVM didn't actually suppress EOI broadcasts, i.e. userspace VMMs without
>> support for directed EOI came to rely on the "spurious" broadcasts.
>>
>> KVM "fixed" the in-kernel I/O APIC implementation by completely disabling
>> support for Suppress EOI Broadcasts in commit 0bcc3fb95b97 ("KVM: lapic:
>> stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use"), but
>> didn't do anything to remedy userspace I/O APIC implementations.
>>
>> KVM's bogus handling of Suppress EOI Broadcast is problematic when the
>> guest relies on interrupts being masked in the I/O APIC until well after
>> the initial local APIC EOI. E.g. Windows with Credential Guard enabled
>> handles interrupts in the following order:
>> 1. Interrupt for L2 arrives.
>> 2. L1 APIC EOIs the interrupt.
>> 3. L1 resumes L2 and injects the interrupt.
>> 4. L2 EOIs after servicing.
>> 5. L1 performs the I/O APIC EOI.
>>
>> Because KVM EOIs the I/O APIC at step #2, the guest can get an interrupt
>> storm, e.g. if the IRQ line is still asserted and userspace reacts to the
>> EOI by re-injecting the IRQ, because the guest doesn't de-assert the line
>> until step #4, and doesn't expect the interrupt to be re-enabled until
>> step #5.
>>
>> Unfortunately, simply "fixing" the bug isn't an option, as KVM has no way
>> of knowing if the userspace I/O APIC supports directed EOIs, i.e.
>> suppressing EOI broadcasts would result in interrupts being stuck masked
>> in the userspace I/O APIC due to step #5 being ignored by userspace. And
>> fully disabling support for Suppress EOI Broadcast is also undesirable, as
>> picking up the fix would require a guest reboot, *and* more importantly
>> would change the virtual CPU model exposed to the guest without any buy-in
>> from userspace.
>>
>> Add KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST and
>> KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST flags to allow userspace to
>> explicitly enable or disable support for Suppress EOI Broadcasts. This
>> gives userspace control over the virtual CPU model exposed to the guest,
>> as KVM should never have enabled support for Suppress EOI Broadcast without
>> userspace opt-in. Not setting either flag will result in legacy quirky
>> behavior for backward compatibility.
>>
>> When KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST is set and using in-kernel
>> IRQCHIP mode, KVM will use I/O APIC version 0x20, which includes support
>> for the EOI Register.
>>
>> Note, Suppress EOI Broadcasts is defined only in Intel's SDM, not in AMD's
>> APM. But the bit is writable on some AMD CPUs, e.g. Turin, and KVM's ABI
>> is to support Directed EOI (KVM's name) irrespective of guest CPU vendor.
>>
>> Fixes: 7543a635aa09 ("KVM: x86: Add KVM exit for IOAPIC EOIs")
>> Closes: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com
>> Cc: stable@vger.kernel.org
>
> Do we want the Cc:stable? And if we do we'd want it on all three
> patches, surely?
>
>> Suggested-by: David Woodhouse <dwmw2@infradead.org>
>> Co-developed-by: Sean Christopherson <seanjc@google.com>
>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>> Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
>
> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
>
> Although...
>
>
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1229,6 +1229,12 @@ enum kvm_irqchip_mode {
>> KVM_IRQCHIP_SPLIT, /* created with KVM_CAP_SPLIT_IRQCHIP */
>> };
>>
>> +enum kvm_suppress_eoi_broadcast_mode {
>> + KVM_SUPPRESS_EOI_BROADCAST_QUIRKED, /* Legacy behavior */
>
>
> I believe it's cosmetic but I think I'd be slightly happier with an
> explicit '= 0' on that, as we rely on that field being initialised to
> zero with the allocation of struct kvm, don't we?
Acked.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v5 1/3] KVM: x86: Refactor suppress EOI broadcast logic
2026-01-02 16:23 ` David Woodhouse
@ 2026-01-12 4:15 ` Khushit Shah
2026-01-13 23:40 ` David Woodhouse
2026-01-14 0:10 ` Sean Christopherson
0 siblings, 2 replies; 28+ messages in thread
From: Khushit Shah @ 2026-01-12 4:15 UTC (permalink / raw)
To: David Woodhouse
Cc: seanjc@google.com, pbonzini@redhat.com, kai.huang@intel.com,
mingo@redhat.com, x86@kernel.org, bp@alien8.de, hpa@zytor.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
dave.hansen@linux.intel.com, tglx@linutronix.de, Jon Kohler,
Shaju Abraham
> On 2 Jan 2026, at 9:53 PM, David Woodhouse <dwmw2@infradead.org> wrote:
>
> On Mon, 2025-12-29 at 11:17 +0000, Khushit Shah wrote:
>> Extract the suppress EOI broadcast (Directed EOI) logic into helper
>> functions and move the check from kvm_ioapic_update_eoi_one() to
>> kvm_ioapic_update_eoi() (required for a later patch). Prepare
>> kvm_ioapic_send_eoi() to honor Suppress EOI Broadcast in split IRQCHIP
>> mode.
>>
>> Introduce two helper functions:
>> - kvm_lapic_advertise_suppress_eoi_broadcast(): determines whether KVM
>> should advertise Suppress EOI Broadcast support to the guest
>> - kvm_lapic_respect_suppress_eoi_broadcast(): determines whether KVM should
>> honor the guest's request to suppress EOI broadcasts
>>
>> This refactoring prepares for I/O APIC version 0x20 support and userspace
>> control of suppress EOI broadcast behavior.
>>
>> Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
>
> Looks good to me, thanks for pushing this through to completion!
>
>
> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
>
> Nit: Ideally I would prefer to see an explicit 'no functional
> change intended' and a reference to commit 0bcc3fb95b97a.
I took another careful look at the refactor specifically through the
“no functional change” lens.
The legacy behavior with the in-kernel IRQCHIP can be summarized as:
- Suppress EOI Broadcast (SEOIB) is not advertised to the guest.
- If the guest nevertheless enables SEOIB, it is honored (already in
unsupported territory).
- Even in that case, the legacy code still ends up calling
kvm_notify_acked_irq() in kvm_ioapic_update_eoi_one().
With the refactor, kvm_notify_acked_irq() is no longer reached in this
specific legacy scenario when the guest enables SEOIB despite it not
being advertised. I believe this is acceptable, as the guest is relying
on an unadvertised feature.
For non-QUIRKED configurations, the behavior is also correct:
- When SEOIB is ENABLED, kvm_notify_acked_irq() is called on EOIR write,
when enabled by guest.
- When SEOIB is DISABLED, kvm_notify_acked_irq() is called on EOI
broadcast.
I would appreciate others chiming in if they see a reason to preserve
the legacy ack behavior even in the unsupported case.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v5 1/3] KVM: x86: Refactor suppress EOI broadcast logic
2025-12-29 11:17 ` [PATCH v5 1/3] KVM: x86: Refactor suppress EOI broadcast logic Khushit Shah
2026-01-02 16:23 ` David Woodhouse
@ 2026-01-13 23:11 ` Sean Christopherson
1 sibling, 0 replies; 28+ messages in thread
From: Sean Christopherson @ 2026-01-13 23:11 UTC (permalink / raw)
To: Khushit Shah
Cc: pbonzini, kai.huang, dwmw2, mingo, x86, bp, hpa, linux-kernel,
kvm, dave.hansen, tglx, jon, shaju.abraham
On Mon, Dec 29, 2025, Khushit Shah wrote:
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 0ae7f913d782..2c24fd8d815f 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -105,6 +105,39 @@ bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
> apic_test_vector(vector, apic->regs + APIC_IRR);
> }
>
> +bool kvm_lapic_advertise_suppress_eoi_broadcast(struct kvm *kvm)
This can be static, its only caller is kvm_apic_set_version().
> +{
> + /*
> + * The default in-kernel I/O APIC emulates the 82093AA and does not
> + * implement an EOI register. Some guests (e.g. Windows with the
> + * Hyper-V role enabled) disable LAPIC EOI broadcast without checking
> + * the I/O APIC version, which can cause level-triggered interrupts to
> + * never be EOI'd.
> + *
> + * To avoid this, KVM must not advertise Suppress EOI Broadcast support
> + * when using the default in-kernel I/O APIC.
> + *
> + * Historically, in split IRQCHIP mode, KVM always advertised Suppress
> + * EOI Broadcast support but did not actually suppress EOIs, resulting
> + * in quirky behavior.
> + */
> + return !ioapic_in_kernel(kvm);
> +}
> +
> +bool kvm_lapic_respect_suppress_eoi_broadcast(struct kvm *kvm)
I don't see any point in forcing every caller to check SPIV *and* this helper.
Just do:
bool kvm_lapic_suppress_eoi_broadcast(struct kvm_lapic *apic)
{
struct kvm *kvm = apic->vcpu->kvm;
if (!(kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI))
return false;
switch (kvm->arch.suppress_eoi_broadcast_mode) {
...
}
}
And then callers are much more readable, e.g. (spoiler alert if you haven't read
my other mail, which I haven't sent yet):
if (trigger_mode != IOAPIC_LEVEL_TRIG ||
kvm_lapic_suppress_eoi_broadcast(apic))
return;
and
/* Request a KVM exit to inform the userspace IOAPIC. */
if (irqchip_split(apic->vcpu->kvm)) {
/*
* Don't exit to userspace if the guest has enabled Directed
* EOI, a.k.a. Suppress EOI Broadcasts, in which case the local
* APIC doesn't broadcast EOIs (the guest must EOI the target
* I/O APIC(s) directly).
*/
if (kvm_lapic_suppress_eoi_broadcast(apic))
return;
apic->vcpu->arch.pending_ioapic_eoi = vector;
kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
return;
}
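
For readers following along, here is a minimal standalone model (plain
userspace C, not kernel code) of how the combined helper could behave once the
switch is filled in the way the single-patch version later in this thread does
it. The enum and the model_*() names are hypothetical stand-ins used only for
this illustration; the SPIV check and the in-kernel/split distinction are
passed in as plain booleans instead of being read from a vCPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three modes used by the later patch. */
enum seoib_mode {
	SEOIB_QUIRKED,	/* neither flag set: legacy behavior */
	SEOIB_ENABLED,	/* userspace opted in */
	SEOIB_DISABLED,	/* userspace opted out */
};

/* Should the LAPIC version register advertise Directed EOI? */
bool model_advertise(enum seoib_mode mode, bool ioapic_in_kernel)
{
	switch (mode) {
	case SEOIB_ENABLED:
		return true;
	case SEOIB_DISABLED:
		return false;
	case SEOIB_QUIRKED:
		/* Legacy: advertised only when the I/O APIC is in userspace. */
		return !ioapic_in_kernel;
	}
	return false;
}

/* Should the EOI broadcast actually be suppressed for this LAPIC? */
bool model_suppress(enum seoib_mode mode, bool ioapic_in_kernel,
		    bool spiv_directed_eoi)
{
	/* The guest must have set the Directed EOI bit in SPIV first. */
	if (!spiv_directed_eoi)
		return false;

	switch (mode) {
	case SEOIB_ENABLED:
		return true;
	case SEOIB_DISABLED:
		return false;
	case SEOIB_QUIRKED:
		/* Legacy: honored only with the in-kernel I/O APIC. */
		return ioapic_in_kernel;
	}
	return false;
}

/* Usage: the legacy split-IRQCHIP quirk advertises but never suppresses. */
void model_seoib_demo(void)
{
	assert(model_advertise(SEOIB_QUIRKED, false));
	assert(!model_suppress(SEOIB_QUIRKED, false, true));
	assert(model_advertise(SEOIB_ENABLED, false));
	assert(model_suppress(SEOIB_ENABLED, false, true));
}
```

The point of folding the SPIV check into the helper is visible in the model:
callers pass the LAPIC and get a single yes/no answer.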
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v5 1/3] KVM: x86: Refactor suppress EOI broadcast logic
2026-01-12 4:15 ` Khushit Shah
@ 2026-01-13 23:40 ` David Woodhouse
2026-01-14 0:10 ` Sean Christopherson
1 sibling, 0 replies; 28+ messages in thread
From: David Woodhouse @ 2026-01-13 23:40 UTC (permalink / raw)
To: Khushit Shah
Cc: seanjc@google.com, pbonzini@redhat.com, kai.huang@intel.com,
mingo@redhat.com, x86@kernel.org, bp@alien8.de, hpa@zytor.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
dave.hansen@linux.intel.com, tglx@linutronix.de, Jon Kohler,
Shaju Abraham
On 12 January 2026 04:15:37 GMT, Khushit Shah <khushit.shah@nutanix.com> wrote:
>
>
>> On 2 Jan 2026, at 9:53 PM, David Woodhouse <dwmw2@infradead.org> wrote:
>>
>> On Mon, 2025-12-29 at 11:17 +0000, Khushit Shah wrote:
>>> Extract the suppress EOI broadcast (Directed EOI) logic into helper
>>> functions and move the check from kvm_ioapic_update_eoi_one() to
>>> kvm_ioapic_update_eoi() (required for a later patch). Prepare
>>> kvm_ioapic_send_eoi() to honor Suppress EOI Broadcast in split IRQCHIP
>>> mode.
>>>
>>> Introduce two helper functions:
>>> - kvm_lapic_advertise_suppress_eoi_broadcast(): determines whether KVM
>>> should advertise Suppress EOI Broadcast support to the guest
>>> - kvm_lapic_respect_suppress_eoi_broadcast(): determines whether KVM should
>>> honor the guest's request to suppress EOI broadcasts
>>>
>>> This refactoring prepares for I/O APIC version 0x20 support and userspace
>>> control of suppress EOI broadcast behavior.
>>>
>>> Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
>>
>> Looks good to me, thanks for pushing this through to completion!
>>
>>
>> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
>>
>> Nit: Ideally I would prefer to see an explicit 'no functional
>> change intended' and a reference to commit 0bcc3fb95b97a.
>
>
>I took another careful look at the refactor specifically through the
>“no functional change” lens.
This is, of course, exactly why I make people type those words explicitly :)
>The legacy behavior with the in-kernel IRQCHIP can be summarized as:
>- Suppress EOI Broadcast (SEOIB) is not advertised to the guest.
>- If the guest nevertheless enables SEOIB, it is honored (already in
> unsupported territory).
>- Even in that case, the legacy code still ends up calling
> kvm_notify_acked_irq() in kvm_ioapic_update_eoi_one().
>
>With the refactor, kvm_notify_acked_irq() is no longer reached in this
>specific legacy scenario when the guest enables SEOIB despite it not
>being advertised. I believe this is acceptable, as the guest is relying
>on an unadvertised feature.
That sounds sensible as you describe it.
Note that we did advertise this in the past and then silently stopped doing
so, potentially underneath already-running guests, but that (commit
0bcc3fb95b97a) was back in 2018, so I guess there won't be many "innocent
victim" guests around any more who genuinely did see the feature advertised.
>For non-QUIRKED configurations, the behavior is also correct:
>- When SEOIB is ENABLED, kvm_notify_acked_irq() is called on EOIR write,
> when enabled by guest.
>- When SEOIB is DISABLED, kvm_notify_acked_irq() is called on EOI
> broadcast.
>
>I would appreciate others chiming in if they see a reason to preserve
>the legacy ack behavior even in the unsupported case.
LGTM. Thanks.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v5 1/3] KVM: x86: Refactor suppress EOI broadcast logic
2026-01-12 4:15 ` Khushit Shah
2026-01-13 23:40 ` David Woodhouse
@ 2026-01-14 0:10 ` Sean Christopherson
2026-01-16 4:41 ` Khushit Shah
2026-01-16 9:01 ` David Woodhouse
1 sibling, 2 replies; 28+ messages in thread
From: Sean Christopherson @ 2026-01-14 0:10 UTC (permalink / raw)
To: Khushit Shah
Cc: David Woodhouse, pbonzini@redhat.com, kai.huang@intel.com,
mingo@redhat.com, x86@kernel.org, bp@alien8.de, hpa@zytor.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
dave.hansen@linux.intel.com, tglx@linutronix.de, Jon Kohler,
Shaju Abraham
On Mon, Jan 12, 2026, Khushit Shah wrote:
> > On 2 Jan 2026, at 9:53 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> >
> > On Mon, 2025-12-29 at 11:17 +0000, Khushit Shah wrote:
> >> Extract the suppress EOI broadcast (Directed EOI) logic into helper
> >> functions and move the check from kvm_ioapic_update_eoi_one() to
> >> kvm_ioapic_update_eoi() (required for a later patch). Prepare
> >> kvm_ioapic_send_eoi() to honor Suppress EOI Broadcast in split IRQCHIP
> >> mode.
> >>
> >> Introduce two helper functions:
> >> - kvm_lapic_advertise_suppress_eoi_broadcast(): determines whether KVM
> >> should advertise Suppress EOI Broadcast support to the guest
> >> - kvm_lapic_respect_suppress_eoi_broadcast(): determines whether KVM should
> >> honor the guest's request to suppress EOI broadcasts
> >>
> >> This refactoring prepares for I/O APIC version 0x20 support and userspace
> >> control of suppress EOI broadcast behavior.
> >>
> >> Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
> >
> > Looks good to me, thanks for pushing this through to completion!
> >
> >
> > Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
> >
> > Nit: Ideally I would prefer to see an explicit 'no functional
> > change intended' and a reference to commit 0bcc3fb95b97a.
>
>
> I took another careful look at the refactor specifically through the
> “no functional change” lens.
>
> The legacy behavior with the in-kernel IRQCHIP can be summarized as:
> - Suppress EOI Broadcast (SEOIB) is not advertised to the guest.
> - If the guest nevertheless enables SEOIB, it is honored (already in
> unsupported territory).
No, KVM will drop the attempt to enable SEOIB, because APIC_LVR won't have
APIC_LVR_DIRECTED_EOI (KVM fully controls the version info, e.g. calls
kvm_apic_set_version() even in kvm_apic_set_state() after copying user state).
case APIC_SPIV: {
u32 mask = 0x3ff;
if (kvm_lapic_get_reg(apic, APIC_LVR) & APIC_LVR_DIRECTED_EOI)
mask |= APIC_SPIV_DIRECTED_EOI;
apic_set_spiv(apic, val & mask);
if (!(val & APIC_SPIV_APIC_ENABLED)) {
int i;
for (i = 0; i < apic->nr_lvt_entries; i++) {
kvm_lapic_set_reg(apic, APIC_LVTx(i),
kvm_lapic_get_reg(apic, APIC_LVTx(i)) | APIC_LVT_MASKED);
}
apic_update_lvtt(apic);
atomic_set(&apic->lapic_timer.pending, 0);
}
break;
}
It _is_ possible for the virtual APIC to end up with the bit set, because KVM
doesn't sanitize APIC_SPIV during kvm_apic_set_state().
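
To make the masking concrete, here is a minimal standalone model of the SPIV
write path quoted above (plain userspace C, not kernel code). The bit positions
are assumed to match arch/x86/include/asm/apicdef.h, and model_spiv_write() is
a hypothetical name used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match arch/x86/include/asm/apicdef.h. */
#define APIC_LVR_DIRECTED_EOI	(1u << 24)
#define APIC_SPIV_DIRECTED_EOI	(1u << 12)

/*
 * Hypothetical model of the APIC_SPIV write handler: the Directed EOI bit is
 * writable only if the version register (LVR) advertises the feature.
 */
uint32_t model_spiv_write(uint32_t lvr, uint32_t val)
{
	uint32_t mask = 0x3ff;

	if (lvr & APIC_LVR_DIRECTED_EOI)
		mask |= APIC_SPIV_DIRECTED_EOI;

	return val & mask;
}

/* Usage: the same guest write, with and without LVR advertising the bit. */
void model_spiv_write_demo(void)
{
	uint32_t val = 0x1ff | APIC_SPIV_DIRECTED_EOI;

	/* Not advertised: the Directed EOI bit is silently dropped. */
	assert(model_spiv_write(0, val) == 0x1ff);
	/* Advertised: the bit sticks. */
	assert(model_spiv_write(APIC_LVR_DIRECTED_EOI, val) == val);
}
```

The model covers only guest register writes; state loaded through
kvm_apic_set_state() does not go through this masking, which is how the bit
can still end up set.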
> - Even in that case, the legacy code still ends up calling
> kvm_notify_acked_irq() in kvm_ioapic_update_eoi_one().
>
> With the refactor, kvm_notify_acked_irq() is no longer reached in this
> specific legacy scenario when the guest enables SEOIB despite it not
> being advertised. I believe this is acceptable, as the guest is relying
> on an unadvertised feature.
Except that it needs to work when it's re-enabled in a few patches. And as per
commit c806a6ad35bf ("KVM: x86: call irq notifiers with directed EOI") and
https://bugzilla.kernel.org/show_bug.cgi?id=82211, allegedly KVM needs to notify
listeners in this case.
Given that KVM didn't actually implement Directed EOI in the in-kernel I/O APIC,
it's certainly debatable as to whether or not that still holds true, i.e. it may
have been a misdiagnosed root cause. But I have zero interest in finding out
the hard way, especially since the in-kernel I/O APIC is slowly being deprecated,
and _especially_ not in patches that will be Cc'd stable.
So while I agree it would be nice to simultaneously enable the in-kernel I/O APIC,
I want to prioritize landing the fix for split IRQCHIP. And if we're clever,
enabling in-kernel I/O APIC support in the future shouldn't require any new uAPI,
since we can document the limitation and not advertise
KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST in KVM_CAP_X2APIC_API when run on a VM
without a split IRQCHIP. Then if support is ever added broadly, we can drop the
relevant code that requires irqchip_split() and update the documentation to say
that userspace needs to query KVM_CAP_X2APIC_API on a VM fd to determine whether
or not the flag is supported for an in-kernel I/O APIC.
If someone has a strong need and use case for supporting Suppress EOI Broadcast for
an in-kernel I/O APIC, then they can have the honor of proving that things like
Windows and Xen play nice with KVM's implementation. And they can do that on top.
Compile tested only, but this is what I'd like to go with for now (in a single
patch, because IMO isolating the refactoring isn't a net positive without patch 2/3).
--
From: Khushit Shah <khushit.shah@nutanix.com>
Date: Mon, 29 Dec 2025 11:17:06 +0000
Subject: [PATCH] KVM: x86: Add x2APIC "features" to control EOI broadcast
suppression
Add two flags for KVM_CAP_X2APIC_API to allow userspace to control support
for Suppress EOI Broadcasts when using a split IRQCHIP (I/O APIC emulated
by userspace), which KVM completely mishandles. When x2APIC support was
first added, KVM incorrectly advertised and "enabled" Suppress EOI
Broadcast, without fully supporting the I/O APIC side of the equation,
i.e. without adding directed EOI to KVM's in-kernel I/O APIC.
That flaw was carried over to split IRQCHIP support, i.e. KVM advertised
support for Suppress EOI Broadcasts irrespective of whether or not the
userspace I/O APIC implementation supported directed EOIs. Even worse,
KVM didn't actually suppress EOI broadcasts, i.e. userspace VMMs without
support for directed EOI came to rely on the "spurious" broadcasts.
KVM "fixed" the in-kernel I/O APIC implementation by completely disabling
support for Suppress EOI Broadcasts in commit 0bcc3fb95b97 ("KVM: lapic:
stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use"), but
didn't do anything to remedy userspace I/O APIC implementations.
KVM's bogus handling of Suppress EOI Broadcast is problematic when the
guest relies on interrupts being masked in the I/O APIC until well after
the initial local APIC EOI. E.g. Windows with Credential Guard enabled
handles interrupts in the following order:
1. Interrupt for L2 arrives.
2. L1 APIC EOIs the interrupt.
3. L1 resumes L2 and injects the interrupt.
4. L2 EOIs after servicing.
5. L1 performs the I/O APIC EOI.
Because KVM EOIs the I/O APIC at step #2, the guest can get an interrupt
storm, e.g. if the IRQ line is still asserted and userspace reacts to the
EOI by re-injecting the IRQ, because the guest doesn't de-assert the line
until step #4, and doesn't expect the interrupt to be re-enabled until
step #5.
Unfortunately, simply "fixing" the bug isn't an option, as KVM has no way
of knowing if the userspace I/O APIC supports directed EOIs, i.e.
suppressing EOI broadcasts would result in interrupts being stuck masked
in the userspace I/O APIC due to step #5 being ignored by userspace. And
fully disabling support for Suppress EOI Broadcast is also undesirable, as
picking up the fix would require a guest reboot, *and* more importantly
would change the virtual CPU model exposed to the guest without any buy-in
from userspace.
Add KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST and
KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST flags to allow userspace to
explicitly enable or disable support for Suppress EOI Broadcasts. This
gives userspace control over the virtual CPU model exposed to the guest,
as KVM should never have enabled support for Suppress EOI Broadcast without
userspace opt-in. Not setting either flag will result in legacy quirky
behavior for backward compatibility.
Disallow fully enabling SUPPRESS_EOI_BROADCAST when using an in-kernel
I/O APIC, as KVM's history/support is just as tragic. E.g. it's not clear
that commit c806a6ad35bf ("KVM: x86: call irq notifiers with directed EOI")
was entirely correct, i.e. it may have simply papered over the lack of
Directed EOI emulation in the I/O APIC.
Note, Suppress EOI Broadcasts is defined only in Intel's SDM, not in AMD's
APM. But the bit is writable on some AMD CPUs, e.g. Turin, and KVM's ABI
is to support Directed EOI (KVM's name) irrespective of guest CPU vendor.
Fixes: 7543a635aa09 ("KVM: x86: Add KVM exit for IOAPIC EOIs")
Closes: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com
Cc: stable@vger.kernel.org
Suggested-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
Documentation/virt/kvm/api.rst | 28 +++++++++++-
arch/x86/include/asm/kvm_host.h | 7 +++
arch/x86/include/uapi/asm/kvm.h | 6 ++-
arch/x86/kvm/ioapic.c | 2 +-
arch/x86/kvm/lapic.c | 76 +++++++++++++++++++++++++++++----
arch/x86/kvm/lapic.h | 2 +
arch/x86/kvm/x86.c | 21 ++++++++-
7 files changed, 127 insertions(+), 15 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 01a3abef8abb..f1f1d2e5dc7c 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7835,8 +7835,10 @@ Will return -EBUSY if a VCPU has already been created.
Valid feature flags in args[0] are::
- #define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
- #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
+ #define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
+ #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
+ #define KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST (1ULL << 2)
+ #define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST (1ULL << 3)
Enabling KVM_X2APIC_API_USE_32BIT_IDS changes the behavior of
KVM_SET_GSI_ROUTING, KVM_SIGNAL_MSI, KVM_SET_LAPIC, and KVM_GET_LAPIC,
@@ -7849,6 +7851,28 @@ as a broadcast even in x2APIC mode in order to support physical x2APIC
without interrupt remapping. This is undesirable in logical mode,
where 0xff represents CPUs 0-7 in cluster 0.
+Setting KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST instructs KVM to enable
+Suppress EOI Broadcasts. KVM will advertise support for Suppress EOI
+Broadcast to the guest and suppress LAPIC EOI broadcasts when the guest
+sets the Suppress EOI Broadcast bit in the SPIV register. This flag is
+supported only when using a split IRQCHIP.
+
+Setting KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST disables support for
+Suppress EOI Broadcasts entirely, i.e. instructs KVM to NOT advertise
+support to the guest.
+
+Modern VMMs should either enable KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST
+or KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST. If not, legacy quirky
+behavior will be used by KVM: in split IRQCHIP mode, KVM will advertise
+support for Suppress EOI Broadcasts but not actually suppress EOI
+broadcasts; for in-kernel IRQCHIP mode, KVM will not advertise support for
+Suppress EOI Broadcasts.
+
+Setting both KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST and
+KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST will fail with an EINVAL error,
+as will setting KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST without a split
+IRQCHIP.
+
7.8 KVM_CAP_S390_USER_INSTR0
----------------------------
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ecd4019b84b7..125bd9a4b807 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1238,6 +1238,12 @@ enum kvm_irqchip_mode {
KVM_IRQCHIP_SPLIT, /* created with KVM_CAP_SPLIT_IRQCHIP */
};
+enum kvm_suppress_eoi_broadcast_mode {
+ KVM_SUPPRESS_EOI_BROADCAST_QUIRKED, /* Legacy behavior */
+ KVM_SUPPRESS_EOI_BROADCAST_ENABLED, /* Enable Suppress EOI broadcast */
+ KVM_SUPPRESS_EOI_BROADCAST_DISABLED /* Disable Suppress EOI broadcast */
+};
+
struct kvm_x86_msr_filter {
u8 count;
bool default_allow:1;
@@ -1487,6 +1493,7 @@ struct kvm_arch {
bool x2apic_format;
bool x2apic_broadcast_quirk_disabled;
+ enum kvm_suppress_eoi_broadcast_mode suppress_eoi_broadcast_mode;
bool has_mapped_host_mmio;
bool guest_can_read_msr_platform_info;
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 7ceff6583652..1b0ad5440b99 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -914,8 +914,10 @@ struct kvm_sev_snp_launch_finish {
__u64 pad1[4];
};
-#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
-#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
+#define KVM_X2APIC_API_USE_32BIT_IDS (_BITULL(0))
+#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (_BITULL(1))
+#define KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST (_BITULL(2))
+#define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST (_BITULL(3))
struct kvm_hyperv_eventfd {
__u32 conn_id;
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 9a99d01b111c..a38a8e2ac70b 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -555,7 +555,7 @@ static void kvm_ioapic_update_eoi_one(struct kvm_vcpu *vcpu,
spin_lock(&ioapic->lock);
if (trigger_mode != IOAPIC_LEVEL_TRIG ||
- kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
+ kvm_lapic_suppress_eoi_broadcast(apic))
return;
ent->fields.remote_irr = 0;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 78c39341b2a5..c175a021e1a9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -105,6 +105,63 @@ bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
apic_test_vector(vector, apic->regs + APIC_IRR);
}
+static bool kvm_lapic_advertise_suppress_eoi_broadcast(struct kvm *kvm)
+{
+ switch (kvm->arch.suppress_eoi_broadcast_mode) {
+ case KVM_SUPPRESS_EOI_BROADCAST_ENABLED:
+ return true;
+ case KVM_SUPPRESS_EOI_BROADCAST_DISABLED:
+ return false;
+ case KVM_SUPPRESS_EOI_BROADCAST_QUIRKED:
+ /*
+ * The default in-kernel I/O APIC emulates the 82093AA and does not
+ * implement an EOI register. Some guests (e.g. Windows with the
+ * Hyper-V role enabled) disable LAPIC EOI broadcast without
+ * checking the I/O APIC version, which can cause level-triggered
+ * interrupts to never be EOI'd.
+ *
+ * To avoid this, KVM doesn't advertise Suppress EOI Broadcast
+ * support when using the default in-kernel I/O APIC.
+ *
+ * Historically, in split IRQCHIP mode, KVM always advertised
+ * Suppress EOI Broadcast support but did not actually suppress
+ * EOIs, resulting in quirky behavior.
+ */
+ return !ioapic_in_kernel(kvm);
+ default:
+ WARN_ON_ONCE(1);
+ return false;
+ }
+}
+
+bool kvm_lapic_suppress_eoi_broadcast(struct kvm_lapic *apic)
+{
+ struct kvm *kvm = apic->vcpu->kvm;
+
+ if (!(kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI))
+ return false;
+
+ switch (kvm->arch.suppress_eoi_broadcast_mode) {
+ case KVM_SUPPRESS_EOI_BROADCAST_ENABLED:
+ return true;
+ case KVM_SUPPRESS_EOI_BROADCAST_DISABLED:
+ return false;
+ case KVM_SUPPRESS_EOI_BROADCAST_QUIRKED:
+ /*
+ * Historically, in split IRQCHIP mode, KVM ignored the suppress
+ * EOI broadcast bit set by the guest and broadcast EOIs to the
+ * userspace I/O APIC. For the in-kernel I/O APIC, support itself
+ * is not advertised and can only be enabled via KVM_SET_APIC_STATE,
+ * and KVM's I/O APIC doesn't emulate Directed EOIs; but if the
+ * feature is enabled, it is respected (with odd behavior).
+ */
+ return ioapic_in_kernel(kvm);
+ default:
+ WARN_ON_ONCE(1);
+ return false;
+ }
+}
+
__read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu);
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_has_noapic_vcpu);
@@ -554,15 +611,9 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
v = APIC_VERSION | ((apic->nr_lvt_entries - 1) << 16);
- /*
- * KVM emulates 82093AA datasheet (with in-kernel IOAPIC implementation)
- * which doesn't have EOI register; Some buggy OSes (e.g. Windows with
- * Hyper-V role) disable EOI broadcast in lapic not checking for IOAPIC
- * version first and level-triggered interrupts never get EOIed in
- * IOAPIC.
- */
+
if (guest_cpu_cap_has(vcpu, X86_FEATURE_X2APIC) &&
- !ioapic_in_kernel(vcpu->kvm))
+ kvm_lapic_advertise_suppress_eoi_broadcast(vcpu->kvm))
v |= APIC_LVR_DIRECTED_EOI;
kvm_lapic_set_reg(apic, APIC_LVR, v);
}
@@ -1517,6 +1568,15 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
/* Request a KVM exit to inform the userspace IOAPIC. */
if (irqchip_split(apic->vcpu->kvm)) {
+ /*
+ * Don't exit to userspace if the guest has enabled Directed
+ * EOI, a.k.a. Suppress EOI Broadcasts, in which case the local
+ * APIC doesn't broadcast EOIs (the guest must EOI the target
+ * I/O APIC(s) directly).
+ */
+ if (kvm_lapic_suppress_eoi_broadcast(apic))
+ return;
+
apic->vcpu->arch.pending_ioapic_eoi = vector;
kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
return;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 71c80fa020e0..cf8aed8c95ea 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -239,6 +239,8 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
+bool kvm_lapic_suppress_eoi_broadcast(struct kvm_lapic *apic);
+
void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu);
void kvm_bitmap_or_dest_vcpus(struct kvm *kvm, struct kvm_lapic_irq *irq,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3d4e07f9cff5..82d893d262fa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -121,8 +121,10 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
#define KVM_CAP_PMU_VALID_MASK KVM_PMU_CAP_DISABLE
-#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
- KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
+#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
+ KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK | \
+ KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST | \
+ KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
static void update_cr8_intercept(struct kvm_vcpu *vcpu);
static void process_nmi(struct kvm_vcpu *vcpu);
@@ -4943,6 +4945,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
break;
case KVM_CAP_X2APIC_API:
r = KVM_X2APIC_API_VALID_FLAGS;
+ if (kvm && !irqchip_split(kvm))
+ r &= ~KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST;
break;
case KVM_CAP_NESTED_STATE:
r = kvm_x86_ops.nested_ops->get_state ?
@@ -6751,11 +6755,24 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
if (cap->args[0] & ~KVM_X2APIC_API_VALID_FLAGS)
break;
+ if ((cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) &&
+ (cap->args[0] & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST))
+ break;
+
+ if ((cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) &&
+ !irqchip_split(kvm))
+ break;
+
if (cap->args[0] & KVM_X2APIC_API_USE_32BIT_IDS)
kvm->arch.x2apic_format = true;
if (cap->args[0] & KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
kvm->arch.x2apic_broadcast_quirk_disabled = true;
+ if (cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST)
+ kvm->arch.suppress_eoi_broadcast_mode = KVM_SUPPRESS_EOI_BROADCAST_ENABLED;
+ if (cap->args[0] & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
+ kvm->arch.suppress_eoi_broadcast_mode = KVM_SUPPRESS_EOI_BROADCAST_DISABLED;
+
r = 0;
break;
case KVM_CAP_X86_DISABLE_EXITS:
base-commit: f62b64b970570c92fe22503b0cdc65be7ce7fc7c
--
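
As a rough aid for VMM authors, the flag validation that the patch above adds
to KVM_CAP_X2APIC_API can be modeled outside the kernel as follows. This is a
sketch under stated assumptions, not the kernel code: model_check_x2apic_flags()
is a hypothetical helper, the flag values are copied from the uapi hunk above,
and split_irqchip stands in for irqchip_split(kvm).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined by the uapi hunk of the patch. */
#define KVM_X2APIC_API_USE_32BIT_IDS			(1ULL << 0)
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK		(1ULL << 1)
#define KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST	(1ULL << 2)
#define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST	(1ULL << 3)

#define MODEL_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS |		\
			   KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK |	\
			   KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST |	\
			   KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)

/*
 * Hypothetical helper: returns 0 if the flags would be accepted, -EINVAL
 * otherwise, following the checks in the x86.c hunk.
 */
int model_check_x2apic_flags(uint64_t flags, bool split_irqchip)
{
	/* Unknown bits are rejected. */
	if (flags & ~MODEL_VALID_FLAGS)
		return -EINVAL;

	/* ENABLE and DISABLE are mutually exclusive. */
	if ((flags & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) &&
	    (flags & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST))
		return -EINVAL;

	/* ENABLE is accepted only with a split IRQCHIP. */
	if ((flags & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) &&
	    !split_irqchip)
		return -EINVAL;

	return 0;
}

/* Usage: ENABLE needs a split IRQCHIP, DISABLE works in either mode. */
void model_check_x2apic_flags_demo(void)
{
	assert(model_check_x2apic_flags(
		KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST, true) == 0);
	assert(model_check_x2apic_flags(
		KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST, false) == -EINVAL);
	assert(model_check_x2apic_flags(
		KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST, false) == 0);
}
```

Passing neither flag is also accepted and leaves the VM in the legacy quirky
mode described in the documentation hunk.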
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v5 1/3] KVM: x86: Refactor suppress EOI broadcast logic
2026-01-14 0:10 ` Sean Christopherson
@ 2026-01-16 4:41 ` Khushit Shah
2026-01-16 9:01 ` David Woodhouse
1 sibling, 0 replies; 28+ messages in thread
From: Khushit Shah @ 2026-01-16 4:41 UTC (permalink / raw)
To: Sean Christopherson
Cc: David Woodhouse, pbonzini@redhat.com, kai.huang@intel.com,
mingo@redhat.com, x86@kernel.org, bp@alien8.de, hpa@zytor.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
dave.hansen@linux.intel.com, tglx@linutronix.de, Jon Kohler,
Shaju Abraham
> On 14 Jan 2026, at 5:40 AM, Sean Christopherson <seanjc@google.com> wrote:
>
> case APIC_SPIV: {
> u32 mask = 0x3ff;
> if (kvm_lapic_get_reg(apic, APIC_LVR) & APIC_LVR_DIRECTED_EOI)
> mask |= APIC_SPIV_DIRECTED_EOI;
> apic_set_spiv(apic, val & mask);
> if (!(val & APIC_SPIV_APIC_ENABLED)) {
> int i;
>
> for (i = 0; i < apic->nr_lvt_entries; i++) {
> kvm_lapic_set_reg(apic, APIC_LVTx(i),
> kvm_lapic_get_reg(apic, APIC_LVTx(i)) | APIC_LVT_MASKED);
> }
> apic_update_lvtt(apic);
> atomic_set(&apic->lapic_timer.pending, 0);
>
> }
> break;
> }
>
> It _is_ possible for the virtual APIC to end up with the bit set, because KVM
> doesn't sanitize APIC_SPIV during kvm_apic_set_state().
Ohh Wow! Okay.
> On 14 Jan 2026, at 5:40 AM, Sean Christopherson <seanjc@google.com> wrote:
>
> Except that it needs to work when it's re-enabled in a few patches. And as per
> commit c806a6ad35bf ("KVM: x86: call irq notifiers with directed EOI") and
> https://bugzilla.kernel.org/show_bug.cgi?id=82211, allegedly KVM needs to notify
> listeners in this case.
>
> Given that KVM didn't actually implement Directed EOI in the in-kernel I/O APIC,
> it's certainly debatable as to whether or not that still holds true, i.e. it may
> have been a misdiagnosed root cause. But I have zero interest in finding out
> the hard way, especially since the in-kernel I/O APIC is slowly being deprecated,
> and _especially_ not in patches that will be Cc'd stable.
>
> So while I agree it would be nice to simultaneously enable the in-kernel I/O APIC,
> I want to prioritize landing the fix for split IRQCHIP. And if we're clever,
> enabling in-kernel I/O APIC support in the future shouldn't require any new uAPI,
> since we can document the limitation and not advertise
> KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST in KVM_CAP_X2APIC_API when run on a VM
> without a split IRQCHIP. Then if support is ever added broadly, we can drop the
> relevant code that requires irqchip_split() and update the documentation to say
> that userspace needs to query KVM_CAP_X2APIC_API on a VM fd to determine whether
> or not the flag is supported for an in-kernel I/O APIC.
>
> If someone has a strong need and use case for supporting Suppress EOI Broadcast for
> an in-kernel I/O APIC, then they can have the honor of proving that things like
> Windows and Xen play nice with KVM's implementation. And they can do that on top.
I agree that we shouldn't find out the hard way. I am okay with prioritising the
fix for split IRQCHIP.
> On 14 Jan 2026, at 5:40 AM, Sean Christopherson <seanjc@google.com> wrote:
>
> --
> From: Khushit Shah <khushit.shah@nutanix.com>
> Date: Mon, 29 Dec 2025 11:17:06 +0000
> Subject: [PATCH] KVM: x86: Add x2APIC "features" to control EOI broadcast
> suppression
>
> Add two flags for KVM_CAP_X2APIC_API to allow userspace to control support
> for Suppress EOI Broadcasts when using a split IRQCHIP (I/O APIC emulated
> by userspace), which KVM completely mishandles. When x2APIC support was
> first added, KVM incorrectly advertised and "enabled" Suppress EOI
> Broadcast, without fully supporting the I/O APIC side of the equation,
> i.e. without adding directed EOI to KVM's in-kernel I/O APIC.
>
> That flaw was carried over to split IRQCHIP support, i.e. KVM advertised
> support for Suppress EOI Broadcasts irrespective of whether or not the
> userspace I/O APIC implementation supported directed EOIs. Even worse,
> KVM didn't actually suppress EOI broadcasts, i.e. userspace VMMs without
> support for directed EOI came to rely on the "spurious" broadcasts.
>
> KVM "fixed" the in-kernel I/O APIC implementation by completely disabling
> support for Suppress EOI Broadcasts in commit 0bcc3fb95b97 ("KVM: lapic:
> stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use"), but
> didn't do anything to remedy userspace I/O APIC implementations.
>
> KVM's bogus handling of Suppress EOI Broadcast is problematic when the
> guest relies on interrupts being masked in the I/O APIC until well after
> the initial local APIC EOI. E.g. Windows with Credential Guard enabled
> handles interrupts in the following order:
> 1. Interrupt for L2 arrives.
> 2. L1 APIC EOIs the interrupt.
> 3. L1 resumes L2 and injects the interrupt.
> 4. L2 EOIs after servicing.
> 5. L1 performs the I/O APIC EOI.
>
> Because KVM EOIs the I/O APIC at step #2, the guest can get an interrupt
> storm, e.g. if the IRQ line is still asserted and userspace reacts to the
> EOI by re-injecting the IRQ, because the guest doesn't de-assert the line
> until step #4, and doesn't expect the interrupt to be re-enabled until
> step #5.
>
> Unfortunately, simply "fixing" the bug isn't an option, as KVM has no way
> of knowing if the userspace I/O APIC supports directed EOIs, i.e.
> suppressing EOI broadcasts would result in interrupts being stuck masked
> in the userspace I/O APIC due to step #5 being ignored by userspace. And
> fully disabling support for Suppress EOI Broadcast is also undesirable, as
> picking up the fix would require a guest reboot, *and* more importantly
> would change the virtual CPU model exposed to the guest without any buy-in
> from userspace.
>
> Add KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST and
> KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST flags to allow userspace to
> explicitly enable or disable support for Suppress EOI Broadcasts. This
> gives userspace control over the virtual CPU model exposed to the guest,
> as KVM should never have enabled support for Suppress EOI Broadcast without
> userspace opt-in. Not setting either flag will result in legacy quirky
> behavior for backward compatibility.
>
> Disallow fully enabling SUPPRESS_EOI_BROADCAST when using an in-kernel
> I/O APIC, as KVM's history/support is just as tragic. E.g. it's not clear
> that commit c806a6ad35bf ("KVM: x86: call irq notifiers with directed EOI")
> was entirely correct, i.e. it may have simply papered over the lack of
> Directed EOI emulation in the I/O APIC.
>
> Note, Suppress EOI Broadcasts is defined only in Intel's SDM, not in AMD's
> APM. But the bit is writable on some AMD CPUs, e.g. Turin, and KVM's ABI
> is to support Directed EOI (KVM's name) irrespective of guest CPU vendor.
>
> Fixes: 7543a635aa09 ("KVM: x86: Add KVM exit for IOAPIC EOIs")
> Closes: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com
> Cc: stable@vger.kernel.org
> Suggested-by: David Woodhouse <dwmw2@infradead.org>
> Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> Documentation/virt/kvm/api.rst | 28 +++++++++++-
> arch/x86/include/asm/kvm_host.h | 7 +++
> arch/x86/include/uapi/asm/kvm.h | 6 ++-
> arch/x86/kvm/ioapic.c | 2 +-
> arch/x86/kvm/lapic.c | 76 +++++++++++++++++++++++++++++----
> arch/x86/kvm/lapic.h | 2 +
> arch/x86/kvm/x86.c | 21 ++++++++-
> 7 files changed, 127 insertions(+), 15 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 01a3abef8abb..f1f1d2e5dc7c 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -7835,8 +7835,10 @@ Will return -EBUSY if a VCPU has already been created.
>
> Valid feature flags in args[0] are::
>
> - #define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
> - #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
> + #define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
> + #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
> + #define KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST (1ULL << 2)
> + #define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST (1ULL << 3)
>
> Enabling KVM_X2APIC_API_USE_32BIT_IDS changes the behavior of
> KVM_SET_GSI_ROUTING, KVM_SIGNAL_MSI, KVM_SET_LAPIC, and KVM_GET_LAPIC,
> @@ -7849,6 +7851,28 @@ as a broadcast even in x2APIC mode in order to support physical x2APIC
> without interrupt remapping. This is undesirable in logical mode,
> where 0xff represents CPUs 0-7 in cluster 0.
>
> +Setting KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST instructs KVM to enable
> +Suppress EOI Broadcasts. KVM will advertise support for Suppress EOI
> +Broadcast to the guest and suppress LAPIC EOI broadcasts when the guest
> +sets the Suppress EOI Broadcast bit in the SPIV register. This flag is
> +supported only when using a split IRQCHIP.
> +
> +Setting KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST disables support for
> +Suppress EOI Broadcasts entirely, i.e. instructs KVM to NOT advertise
> +support to the guest.
> +
> +Modern VMMs should either enable KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST
> +or KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST. If not, legacy quirky
> +behavior will be used by KVM: in split IRQCHIP mode, KVM will advertise
> +support for Suppress EOI Broadcasts but not actually suppress EOI
> +broadcasts; for in-kernel IRQCHIP mode, KVM will not advertise support for
> +Suppress EOI Broadcasts.
> +
> +Setting both KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST and
> +KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST will fail with an EINVAL error,
> +as will setting KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST without a split
> +IRQCHIP.
> +
> 7.8 KVM_CAP_S390_USER_INSTR0
> ----------------------------
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index ecd4019b84b7..125bd9a4b807 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1238,6 +1238,12 @@ enum kvm_irqchip_mode {
> KVM_IRQCHIP_SPLIT, /* created with KVM_CAP_SPLIT_IRQCHIP */
> };
>
> +enum kvm_suppress_eoi_broadcast_mode {
> + KVM_SUPPRESS_EOI_BROADCAST_QUIRKED, /* Legacy behavior */
> + KVM_SUPPRESS_EOI_BROADCAST_ENABLED, /* Enable Suppress EOI broadcast */
> + KVM_SUPPRESS_EOI_BROADCAST_DISABLED /* Disable Suppress EOI broadcast */
> +};
> +
> struct kvm_x86_msr_filter {
> u8 count;
> bool default_allow:1;
> @@ -1487,6 +1493,7 @@ struct kvm_arch {
>
> bool x2apic_format;
> bool x2apic_broadcast_quirk_disabled;
> + enum kvm_suppress_eoi_broadcast_mode suppress_eoi_broadcast_mode;
>
> bool has_mapped_host_mmio;
> bool guest_can_read_msr_platform_info;
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index 7ceff6583652..1b0ad5440b99 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -914,8 +914,10 @@ struct kvm_sev_snp_launch_finish {
> __u64 pad1[4];
> };
>
> -#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
> -#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
> +#define KVM_X2APIC_API_USE_32BIT_IDS (_BITULL(0))
> +#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (_BITULL(1))
> +#define KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST (_BITULL(2))
> +#define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST (_BITULL(3))
>
> struct kvm_hyperv_eventfd {
> __u32 conn_id;
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index 9a99d01b111c..a38a8e2ac70b 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -555,7 +555,7 @@ static void kvm_ioapic_update_eoi_one(struct kvm_vcpu *vcpu,
> spin_lock(&ioapic->lock);
>
> if (trigger_mode != IOAPIC_LEVEL_TRIG ||
> - kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> + kvm_lapic_suppress_eoi_broadcast(apic))
> return;
>
> ent->fields.remote_irr = 0;
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 78c39341b2a5..c175a021e1a9 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -105,6 +105,63 @@ bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
> apic_test_vector(vector, apic->regs + APIC_IRR);
> }
>
> +static bool kvm_lapic_advertise_suppress_eoi_broadcast(struct kvm *kvm)
> +{
> + switch (kvm->arch.suppress_eoi_broadcast_mode) {
> + case KVM_SUPPRESS_EOI_BROADCAST_ENABLED:
> + return true;
> + case KVM_SUPPRESS_EOI_BROADCAST_DISABLED:
> + return false;
> + case KVM_SUPPRESS_EOI_BROADCAST_QUIRKED:
> + /*
> + * The default in-kernel I/O APIC emulates the 82093AA and does not
> + * implement an EOI register. Some guests (e.g. Windows with the
> + * Hyper-V role enabled) disable LAPIC EOI broadcast without
> + * checking the I/O APIC version, which can cause level-triggered
> + * interrupts to never be EOI'd.
> + *
> + * To avoid this, KVM doesn't advertise Suppress EOI Broadcast
> + * support when using the default in-kernel I/O APIC.
> + *
> + * Historically, in split IRQCHIP mode, KVM always advertised
> + * Suppress EOI Broadcast support but did not actually suppress
> + * EOIs, resulting in quirky behavior.
> + */
> + return !ioapic_in_kernel(kvm);
> + default:
> + WARN_ON_ONCE(1);
> + return false;
> + }
> +}
> +
> +bool kvm_lapic_suppress_eoi_broadcast(struct kvm_lapic *apic)
> +{
> + struct kvm *kvm = apic->vcpu->kvm;
> +
> + if (!(kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI))
> + return false;
> +
> + switch (kvm->arch.suppress_eoi_broadcast_mode) {
> + case KVM_SUPPRESS_EOI_BROADCAST_ENABLED:
> + return true;
> + case KVM_SUPPRESS_EOI_BROADCAST_DISABLED:
> + return false;
> + case KVM_SUPPRESS_EOI_BROADCAST_QUIRKED:
> + /*
> + * Historically, in split IRQCHIP mode, KVM ignored the suppress
> + * EOI broadcast bit set by the guest and broadcast EOIs to the
> + * userspace I/O APIC. For the in-kernel I/O APIC, the support itself
> + * is not advertised, can only be enabled via KVM_SET_APIC_STATE,
> + * and KVM's I/O APIC doesn't emulate Directed EOIs; but if the
> + * feature is enabled, it is respected (with odd behavior).
> + */
> + return ioapic_in_kernel(kvm);
> + default:
> + WARN_ON_ONCE(1);
> + return false;
> + }
> +}
> +
> __read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu);
> EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_has_noapic_vcpu);
>
> @@ -554,15 +611,9 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
>
> v = APIC_VERSION | ((apic->nr_lvt_entries - 1) << 16);
>
> - /*
> - * KVM emulates 82093AA datasheet (with in-kernel IOAPIC implementation)
> - * which doesn't have EOI register; Some buggy OSes (e.g. Windows with
> - * Hyper-V role) disable EOI broadcast in lapic not checking for IOAPIC
> - * version first and level-triggered interrupts never get EOIed in
> - * IOAPIC.
> - */
> +
> if (guest_cpu_cap_has(vcpu, X86_FEATURE_X2APIC) &&
> - !ioapic_in_kernel(vcpu->kvm))
> + kvm_lapic_advertise_suppress_eoi_broadcast(vcpu->kvm))
> v |= APIC_LVR_DIRECTED_EOI;
> kvm_lapic_set_reg(apic, APIC_LVR, v);
> }
> @@ -1517,6 +1568,15 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>
> /* Request a KVM exit to inform the userspace IOAPIC. */
> if (irqchip_split(apic->vcpu->kvm)) {
> + /*
> + * Don't exit to userspace if the guest has enabled Directed
> + * EOI, a.k.a. Suppress EOI Broadcasts, in which case the local
> + * APIC doesn't broadcast EOIs (the guest must EOI the target
> + * I/O APIC(s) directly).
> + */
> + if (kvm_lapic_suppress_eoi_broadcast(apic))
> + return;
> +
> apic->vcpu->arch.pending_ioapic_eoi = vector;
> kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
> return;
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 71c80fa020e0..cf8aed8c95ea 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -239,6 +239,8 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
>
> bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
>
> +bool kvm_lapic_suppress_eoi_broadcast(struct kvm_lapic *apic);
> +
> void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu);
>
> void kvm_bitmap_or_dest_vcpus(struct kvm *kvm, struct kvm_lapic_irq *irq,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3d4e07f9cff5..82d893d262fa 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -121,8 +121,10 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
>
> #define KVM_CAP_PMU_VALID_MASK KVM_PMU_CAP_DISABLE
>
> -#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
> - KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
> +#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
> + KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK | \
> + KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST | \
> + KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
>
> static void update_cr8_intercept(struct kvm_vcpu *vcpu);
> static void process_nmi(struct kvm_vcpu *vcpu);
> @@ -4943,6 +4945,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> break;
> case KVM_CAP_X2APIC_API:
> r = KVM_X2APIC_API_VALID_FLAGS;
> + if (kvm && !irqchip_split(kvm))
> + r &= ~KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST;
> break;
> case KVM_CAP_NESTED_STATE:
> r = kvm_x86_ops.nested_ops->get_state ?
> @@ -6751,11 +6755,24 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> if (cap->args[0] & ~KVM_X2APIC_API_VALID_FLAGS)
> break;
>
> + if ((cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) &&
> + (cap->args[0] & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST))
> + break;
> +
> + if ((cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) &&
> + !irqchip_split(kvm))
> + break;
> +
> if (cap->args[0] & KVM_X2APIC_API_USE_32BIT_IDS)
> kvm->arch.x2apic_format = true;
> if (cap->args[0] & KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
> kvm->arch.x2apic_broadcast_quirk_disabled = true;
>
> + if (cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST)
> + kvm->arch.suppress_eoi_broadcast_mode = KVM_SUPPRESS_EOI_BROADCAST_ENABLED;
> + if (cap->args[0] & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
> + kvm->arch.suppress_eoi_broadcast_mode = KVM_SUPPRESS_EOI_BROADCAST_DISABLED;
> +
> r = 0;
> break;
> case KVM_CAP_X86_DISABLE_EXITS:
LGTM, will test this and reply on this thread.
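For anyone skimming the x86.c hunk above, the acceptance rules for the new flags can be sketched in isolation as below. This is only a userspace model of the checks in kvm_vm_ioctl_enable_cap(), not kernel code: check_x2apic_flags() is a hypothetical helper, and the flag macros are redefined locally (with the values proposed in the patch) on the assumption that installed uapi headers may predate them.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as proposed in the patch; defined locally for this sketch. */
#define KVM_X2APIC_API_USE_32BIT_IDS              (1ULL << 0)
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK    (1ULL << 1)
#define KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST  (1ULL << 2)
#define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST (1ULL << 3)

#define X2APIC_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS |             \
			    KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK |   \
			    KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST | \
			    KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)

/*
 * Hypothetical helper: returns 0 if the patched KVM would accept 'flags'
 * for KVM_CAP_X2APIC_API on a VM whose IRQCHIP is (or is not) split,
 * and -EINVAL otherwise.
 */
int check_x2apic_flags(uint64_t flags, bool split_irqchip)
{
	/* Unknown bits are rejected. */
	if (flags & ~X2APIC_VALID_FLAGS)
		return -EINVAL;

	/* ENABLE and DISABLE are mutually exclusive. */
	if ((flags & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) &&
	    (flags & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST))
		return -EINVAL;

	/* ENABLE is only accepted with a split IRQCHIP. */
	if ((flags & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) && !split_irqchip)
		return -EINVAL;

	return 0;
}
```

A VMM could run a check like this before issuing KVM_ENABLE_CAP so that a rejected combination is reported with a clearer message than a bare EINVAL.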
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v5 1/3] KVM: x86: Refactor suppress EOI broadcast logic
2026-01-14 0:10 ` Sean Christopherson
2026-01-16 4:41 ` Khushit Shah
@ 2026-01-16 9:01 ` David Woodhouse
2026-01-16 10:02 ` Khushit Shah
1 sibling, 1 reply; 28+ messages in thread
From: David Woodhouse @ 2026-01-16 9:01 UTC (permalink / raw)
To: Sean Christopherson, Khushit Shah
Cc: pbonzini@redhat.com, kai.huang@intel.com, mingo@redhat.com,
x86@kernel.org, bp@alien8.de, hpa@zytor.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
dave.hansen@linux.intel.com, tglx@linutronix.de, Jon Kohler,
Shaju Abraham
On Tue, 2026-01-13 at 16:10 -0800, Sean Christopherson wrote:
> Except that it needs to work when it's re-enabled in a few patches. And as per
> commit c806a6ad35bf ("KVM: x86: call irq notifiers with directed EOI") and
> https://bugzilla.kernel.org/show_bug.cgi?id=82211, allegedly KVM needs to notify
> listeners in this case.
But KVM *will* notify listeners, surely? When the guest issues the EOI
via the I/O APIC EOIR register.
For that commit to have made any difference, Xen *has* to have been
buggy, enabling directed EOI in the local APIC despite the I/O APIC not
having the required support. Thus interrupts never got EOI'd at all,
and sure, the notifiers didn't get called.
> Given that KVM didn't actually implement Directed EOI in the in-kernel I/O APIC,
> it's certainly debatable as to whether or not that still holds true, i.e. it may
> have been a misdiagnosed root cause. But I have zero interest in finding out
> the hard way, especially since the in-kernel I/O APIC is slowly being deprecated,
> and _especially_ not in patches that will be Cc'd stable.
Isn't that *exactly* the issue we knew we were resolving properly by
implementing the EOIR in the I/O APIC?
We should test, sure. But I don't think the existence of that commit
should make us throw our hands up in the air and be too scared of just
fixing it properly.
> So while I agree it would be nice to simultaneously enable the in-kernel I/O APIC,
> I want to prioritize landing the fix for split IRQCHIP. And if we're clever,
> enabling in-kernel I/O APIC support in the future shouldn't require any new uAPI,
> since we can document the limitation and not advertise
> KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST in KVM_CAP_X2APIC_API when run on a VM
> without a split IRQCHIP. Then if support is ever added broadly, we can drop the
> relevant code that requires irqchip_split() and update the documentation to say
> that userspace needs to query KVM_CAP_X2APIC_API on a VM fd to determine whether
> or not the flag is supported for an in-kernel I/O APIC.
>
> If someone has a strong need and use case for supporting Suppress EOI Broadcast for
> an in-kernel I/O APIC, then they can have the honor of proving that things like
> Windows and Xen play nice with KVM's implementation. And they can do that on top.
>
> Compile tested only, but this is what I'd like to go with for now (in a single
> patch, because IMO isolating the refactoring isn't a net positive without patch 2/3).
I dislike this. It's just another wart. And it looks like userspace can
still check the cap and set KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST,
and *then* add the in-kernel I/O APIC afterwards?
If you're concerned about what to backport to stable, then arguably
it's *only* KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST which should be
backported, as that's the bug, and _ENABLE_ is a new feature?
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v5 1/3] KVM: x86: Refactor suppress EOI broadcast logic
2026-01-16 9:01 ` David Woodhouse
@ 2026-01-16 10:02 ` Khushit Shah
2026-01-16 17:34 ` Sean Christopherson
0 siblings, 1 reply; 28+ messages in thread
From: Khushit Shah @ 2026-01-16 10:02 UTC (permalink / raw)
To: David Woodhouse
Cc: Sean Christopherson, pbonzini@redhat.com, kai.huang@intel.com,
mingo@redhat.com, x86@kernel.org, bp@alien8.de, hpa@zytor.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
dave.hansen@linux.intel.com, tglx@linutronix.de, Jon Kohler,
Shaju Abraham
> On 16 Jan 2026, at 2:31 PM, David Woodhouse <dwmw2@infradead.org> wrote:
>
> But KVM *will* notify listeners, surely? When the guest issues the EOI
> via the I/O APIC EOIR register.
>
> For that commit to have made any difference, Xen *has* to have been
> buggy, enabling directed EOI in the local APIC despite the I/O APIC not
> having the required support. Thus interrupts never got EOI'd at all,
> and sure, the notifiers didn't get called.
You are describing
0bcc3fb95b97 ("KVM: lapic: stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use")
Since then I guess this issue should have been fixed?! As
c806a6ad35bf ("KVM: x86: call irq notifiers with directed EOI") was much earlier.
> On 16 Jan 2026, at 2:31 PM, David Woodhouse <dwmw2@infradead.org> wrote:
>
> If you're concerned about what to backport to stable, then arguably
> it's *only* KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST which should be
> backported, as that's the bug, and _ENABLE_ is a new feature?
I think neither DISABLE nor ENABLE is a new feature, at least for split IRQCHIP.
It’s just giving user-space a way to fix the bug in the way they like, because that’s how
it should have been from the beginning.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v5 1/3] KVM: x86: Refactor suppress EOI broadcast logic
2026-01-16 10:02 ` Khushit Shah
@ 2026-01-16 17:34 ` Sean Christopherson
2026-01-23 13:04 ` Khushit Shah
0 siblings, 1 reply; 28+ messages in thread
From: Sean Christopherson @ 2026-01-16 17:34 UTC (permalink / raw)
To: Khushit Shah
Cc: David Woodhouse, pbonzini@redhat.com, kai.huang@intel.com,
mingo@redhat.com, x86@kernel.org, bp@alien8.de, hpa@zytor.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
dave.hansen@linux.intel.com, tglx@linutronix.de, Jon Kohler,
Shaju Abraham
On Fri, Jan 16, 2026, Khushit Shah wrote:
> > On 16 Jan 2026, at 2:31 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> >
> > But KVM *will* notify listeners, surely? When the guest issues the EOI
> > via the I/O APIC EOIR register.
> >
> > For that commit to have made any difference, Xen *has* to have been
> > buggy, enabling directed EOI in the local APIC despite the I/O APIC not
> > having the required support. Thus interrupts never got EOI'd at all,
> > and sure, the notifiers didn't get called.
Oh, I 100% agree there were bugs aplenty on both sides, but that's exactly why I
don't want to add support for the in-kernel I/O APIC without a strong reason for
doing so.
> You are describing
> 0bcc3fb95b97 ("KVM: lapic: stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use")
> Since then I guess this issue should have been fixed?! As
> c806a6ad35bf ("KVM: x86: call irq notifiers with directed EOI") was much earlier.
>
> > On 16 Jan 2026, at 2:31 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> >
> > If you're concerned about what to backport to stable, then arguably
> > it's *only* KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST which should be
> > backported, as that's the bug, and _ENABLE_ is a new feature?
>
> I think neither DISABLE nor ENABLE is a new feature, at least for split
> IRQCHIP. It’s just giving user-space a way to fix the bug in the way they
> like, because that’s how it should have been from the beginning.
Ya. I don't see ENABLE (for split IRQCHIP) as a new feature, because it's the
only way for userspace to fix its setups without changing the virtual CPU model
exposed to the guest.
For better or worse, the aforementioned commit 0bcc3fb95b97 ("KVM: lapic: stop
advertising DIRECTED_EOI when in-kernel IOAPIC is in use") already clobbered the
virtual model when using an in-kernel I/O APIC. Even though KVM (AFAIK) got away
with the switcheroo then, I am strongly opposed to _KVM_ changing the virtual CPU
model. I.e. I want to give userspace the ability to choose how to address the
issue, because only userspace (or rather, the platform owner) knows whether or
not its I/O APIC implementation plays nice with ENABLE, whether it's riskier to
continue with QUIRK vs. DISABLE, etc.
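To make the QUIRK vs. ENABLE vs. DISABLE trade-off concrete, here is a rough userspace model of the two helpers from the patch under discussion (kvm_lapic_advertise_suppress_eoi_broadcast() and kvm_lapic_suppress_eoi_broadcast()). It is a sketch under assumptions, not the kernel implementation: the enum and function names are made up for illustration, the guest's SPIV bit is passed in as a plain bool, and the x2APIC CPUID check that the real code also performs before advertising is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative names; the patch uses KVM_SUPPRESS_EOI_BROADCAST_*. */
enum seoib_mode {
	SEOIB_QUIRKED,	/* neither flag set: legacy behavior */
	SEOIB_ENABLED,	/* KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST */
	SEOIB_DISABLED,	/* KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST */
};

/* Is Directed EOI advertised in the local APIC version register? */
bool seoib_advertised(enum seoib_mode mode, bool ioapic_in_kernel)
{
	switch (mode) {
	case SEOIB_ENABLED:
		return true;
	case SEOIB_DISABLED:
		return false;
	case SEOIB_QUIRKED:
		/* Legacy: advertised only with a userspace I/O APIC. */
		return !ioapic_in_kernel;
	}
	return false;
}

/* Does a local APIC EOI skip the broadcast to the I/O APIC(s)? */
bool seoib_suppressed(enum seoib_mode mode, bool ioapic_in_kernel,
		      bool guest_set_spiv_bit)
{
	/* Nothing is suppressed unless the guest asked for it. */
	if (!guest_set_spiv_bit)
		return false;

	switch (mode) {
	case SEOIB_ENABLED:
		return true;
	case SEOIB_DISABLED:
		return false;
	case SEOIB_QUIRKED:
		/*
		 * Legacy: split IRQCHIP ignores the guest's bit and keeps
		 * broadcasting; the in-kernel I/O APIC respects the bit.
		 */
		return ioapic_in_kernel;
	}
	return false;
}
```

The quirk shows up as the combination seoib_advertised(SEOIB_QUIRKED, false) being true while seoib_suppressed(SEOIB_QUIRKED, false, true) is false: with a split IRQCHIP the feature is advertised, but the guest's request is not honored.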
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v5 1/3] KVM: x86: Refactor suppress EOI broadcast logic
2026-01-16 17:34 ` Sean Christopherson
@ 2026-01-23 13:04 ` Khushit Shah
0 siblings, 0 replies; 28+ messages in thread
From: Khushit Shah @ 2026-01-23 13:04 UTC (permalink / raw)
To: Sean Christopherson
Cc: David Woodhouse, pbonzini@redhat.com, kai.huang@intel.com,
mingo@redhat.com, x86@kernel.org, bp@alien8.de, hpa@zytor.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
dave.hansen@linux.intel.com, tglx@linutronix.de, Jon Kohler,
Shaju Abraham
Tested the patch and works as expected. Sent v6.
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v5 4/3] KVM: selftests: Add test cases for EOI suppression modes
2025-12-29 11:17 [PATCH v5 0/3] KVM: x86: Add userspace control for Suppress EOI Broadcast Khushit Shah
` (2 preceding siblings ...)
2025-12-29 11:17 ` [PATCH v5 3/3] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression Khushit Shah
@ 2026-01-29 4:49 ` David Woodhouse
2026-01-29 15:19 ` Sean Christopherson
3 siblings, 1 reply; 28+ messages in thread
From: David Woodhouse @ 2026-01-29 4:49 UTC (permalink / raw)
To: Khushit Shah, seanjc, pbonzini, kai.huang
Cc: mingo, x86, bp, hpa, linux-kernel, kvm, dave.hansen, tglx, jon,
shaju.abraham
From: David Woodhouse <dwmw@amazon.co.uk>
Rather than being frightened of doing the right thing for the in-kernel
I/O APIC because "there might be bugs", let's add selftests for it to
make sure it behaves correctly. For both in-kernel I/O APIC and
userspace, exercise the following modes:
• Legacy "quirk" behaviour (this test shows the same results on both
old kernels and on kernels with this patch series in default mode).
• KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST mode, both with the guest
enabling APIC_SPIV_DIRECTED_EOI and without.
• KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST mode.
Testing quirk mode (no flags)...
IOAPIC v0x11, LVR directed_eoi=0, SPIV directed_eoi=0, remote_irr=0
Testing explicit enable...
IOAPIC v0x20, LVR directed_eoi=1, SPIV directed_eoi=1, remote_irr=1
Testing explicit enable (guest doesn't use)...
IOAPIC v0x20, LVR directed_eoi=1, SPIV directed_eoi=0, remote_irr=0
Testing explicit disable...
IOAPIC v0x11, LVR directed_eoi=0, SPIV directed_eoi=0, remote_irr=0
All tests passed
=== Testing split IRQCHIP mode ===
Testing quirk mode (no flags)...
Split IRQCHIP: LVR directed_eoi=1, SPIV directed_eoi=0, got_eoi_exit=1
Testing explicit enable...
Split IRQCHIP: LVR directed_eoi=1, SPIV directed_eoi=1, got_eoi_exit=0
Testing explicit enable (guest doesn't use)...
Split IRQCHIP: LVR directed_eoi=1, SPIV directed_eoi=0, got_eoi_exit=1
Testing explicit disable...
Split IRQCHIP: LVR directed_eoi=0, SPIV directed_eoi=0, got_eoi_exit=1
All tests passed
There didn't seem to be a way for selftests to use split irqchip mode
until now, so this adds vm_irqchip_mode modelled on the existing
vm_guest_mode enum.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 148d427ff24b..01c59bf8b79f 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -122,6 +122,7 @@ TEST_GEN_PROGS_x86 += x86/vmx_nested_tsc_scaling_test
TEST_GEN_PROGS_x86 += x86/apic_bus_clock_test
TEST_GEN_PROGS_x86 += x86/xapic_ipi_test
TEST_GEN_PROGS_x86 += x86/xapic_state_test
+TEST_GEN_PROGS_x86 += x86/suppress_eoi_test
TEST_GEN_PROGS_x86 += x86/xcr0_cpuid_test
TEST_GEN_PROGS_x86 += x86/xss_msr_test
TEST_GEN_PROGS_x86 += x86/debug_regs
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index d3f3e455c031..c4eb0e95bae9 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -209,6 +209,14 @@ kvm_static_assert(sizeof(struct vm_shape) == sizeof(uint64_t));
shape; \
})
+enum vm_irqchip_mode {
+ VM_IRQCHIP_AUTO,
+ VM_IRQCHIP_KERNEL,
+ VM_IRQCHIP_SPLIT,
+};
+
+extern enum vm_irqchip_mode vm_irqchip_mode;
+
#if defined(__aarch64__)
extern enum vm_guest_mode vm_mode_default;
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 1a93d6361671..4858c10f7530 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1687,21 +1687,34 @@ void *addr_gpa2alias(struct kvm_vm *vm, vm_paddr_t gpa)
return (void *) ((uintptr_t) region->host_alias + offset);
}
+enum vm_irqchip_mode vm_irqchip_mode = VM_IRQCHIP_AUTO;
+
/* Create an interrupt controller chip for the specified VM. */
void vm_create_irqchip(struct kvm_vm *vm)
{
int r;
/*
- * Allocate a fully in-kernel IRQ chip by default, but fall back to a
- * split model (x86 only) if that fails (KVM x86 allows compiling out
- * support for KVM_CREATE_IRQCHIP).
+ * Create IRQ chip based on vm_irqchip_mode:
+ * - VM_IRQCHIP_AUTO: Try in-kernel, fall back to split if not supported
+ * - VM_IRQCHIP_KERNEL: Force in-kernel IRQ chip
+ * - VM_IRQCHIP_SPLIT: Force split IRQ chip (x86 only)
*/
- r = __vm_ioctl(vm, KVM_CREATE_IRQCHIP, NULL);
- if (r && errno == ENOTTY && kvm_has_cap(KVM_CAP_SPLIT_IRQCHIP))
+ if (vm_irqchip_mode == VM_IRQCHIP_SPLIT) {
+ TEST_ASSERT(kvm_has_cap(KVM_CAP_SPLIT_IRQCHIP),
+ "Split IRQ chip not supported");
vm_enable_cap(vm, KVM_CAP_SPLIT_IRQCHIP, 24);
- else
+ } else if (vm_irqchip_mode == VM_IRQCHIP_KERNEL) {
+ r = __vm_ioctl(vm, KVM_CREATE_IRQCHIP, NULL);
TEST_ASSERT_VM_VCPU_IOCTL(!r, KVM_CREATE_IRQCHIP, r, vm);
+ } else {
+ /* VM_IRQCHIP_AUTO */
+ r = __vm_ioctl(vm, KVM_CREATE_IRQCHIP, NULL);
+ if (r && errno == ENOTTY && kvm_has_cap(KVM_CAP_SPLIT_IRQCHIP))
+ vm_enable_cap(vm, KVM_CAP_SPLIT_IRQCHIP, 24);
+ else
+ TEST_ASSERT_VM_VCPU_IOCTL(!r, KVM_CREATE_IRQCHIP, r, vm);
+ }
vm->has_irqchip = true;
}
diff --git a/tools/testing/selftests/kvm/x86/suppress_eoi_test.c b/tools/testing/selftests/kvm/x86/suppress_eoi_test.c
new file mode 100644
index 000000000000..ea14690b3116
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/suppress_eoi_test.c
@@ -0,0 +1,441 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test KVM's handling of Suppress EOI Broadcast in x2APIC mode.
+ */
+#include "kvm_util.h"
+#include "processor.h"
+#include "test_util.h"
+#include "apic.h"
+
+#define TEST_VECTOR 0xa2
+#define TEST_IRQ 5
+#define APIC_LVR_DIRECTED_EOI (1 << 24)
+#define APIC_SPIV_DIRECTED_EOI (1 << 12)
+
+#define APIC_ISR 0x100
+#define APIC_LVTPC 0x340
+#define APIC_LVT0 0x350
+#define APIC_LVT1 0x360
+#define APIC_LVTERR 0x370
+
+#define IOAPIC_BASE_GPA 0xfec00000
+#define IOAPIC_REG_SELECT 0x00
+#define IOAPIC_REG_WINDOW 0x10
+#define IOAPIC_REG_VERSION 0x01
+#define IOAPIC_REG_EOI 0x40
+
+static uint32_t ioapic_version;
+
+static void guest_irq_handler(struct ex_regs *regs)
+{
+}
+
+static uint32_t ioapic_read_reg(uint32_t reg)
+{
+ volatile uint32_t *ioapic = (volatile uint32_t *)IOAPIC_BASE_GPA;
+ ioapic[0] = reg;
+ return ioapic[4];
+}
+
+static void ioapic_write_reg(uint32_t reg, uint32_t val)
+{
+ volatile uint32_t *ioapic = (volatile uint32_t *)IOAPIC_BASE_GPA;
+ ioapic[0] = reg;
+ ioapic[4] = val;
+}
+
+static void guest_code(uint64_t use_directed)
+{
+ uint64_t apic_base;
+ uint32_t spiv;
+
+ /* Enable x2APIC mode */
+ apic_base = rdmsr(MSR_IA32_APICBASE);
+ wrmsr(MSR_IA32_APICBASE, apic_base | X2APIC_ENABLE);
+
+ /* Mask all LVT entries to prevent spurious interrupts/NMIs */
+ x2apic_write_reg(APIC_LVTT, APIC_LVT_MASKED);
+ x2apic_write_reg(APIC_LVTPC, APIC_LVT_MASKED);
+ x2apic_write_reg(APIC_LVT0, APIC_LVT_MASKED);
+ x2apic_write_reg(APIC_LVT1, APIC_LVT_MASKED);
+ x2apic_write_reg(APIC_LVTERR, APIC_LVT_MASKED);
+
+ /* Enable APIC */
+ x2apic_write_reg(APIC_SPIV, APIC_SPIV_APIC_ENABLED | TEST_VECTOR);
+
+ /* Read IOAPIC version */
+ ioapic_version = ioapic_read_reg(IOAPIC_REG_VERSION);
+
+ /* Conditionally set APIC_SPIV_DIRECTED_EOI based on flag */
+ if (use_directed) {
+ spiv = x2apic_read_reg(APIC_SPIV);
+ x2apic_write_reg(APIC_SPIV, spiv | APIC_SPIV_DIRECTED_EOI);
+ }
+ spiv = x2apic_read_reg(APIC_SPIV);
+
+ GUEST_SYNC(ioapic_version | (spiv << 16));
+
+ /* Enable interrupts and wait for interrupt to be delivered */
+ asm volatile("sti; hlt");
+
+ /* Write EOI to trigger broadcast (or not) */
+ x2apic_write_reg(APIC_EOI, 0);
+
+ GUEST_SYNC(0);
+
+ /* If IOAPIC v0x20, write directed EOI to clear remote_irr */
+ if ((ioapic_version & 0xff) == 0x20) {
+ ioapic_write_reg(IOAPIC_REG_EOI, TEST_VECTOR);
+ GUEST_SYNC(1);
+ }
+
+ GUEST_DONE();
+}
+
+static void test_suppress_eoi(uint64_t x2apic_flags, bool expect_advertised, bool expect_implemented,
+ bool use_directed)
+{
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+ struct kvm_lapic_state lapic;
+ struct kvm_irqchip chip;
+ struct kvm_irq_level irq_level;
+ struct ucall uc;
+ uint32_t lvr, ioapic_ver, spiv_after;
+ bool remote_irr_set;
+
+ use_directed = use_directed;
+
+ vm = vm_create(1);
+
+ if (x2apic_flags)
+ vm_enable_cap(vm, KVM_CAP_X2APIC_API, x2apic_flags);
+
+ if (!vm->has_irqchip)
+ vm_create_irqchip(vm);
+
+ vcpu = vm_vcpu_add(vm, 0, guest_code);
+ vcpu_args_set(vcpu, 1, use_directed);
+
+ vm_install_exception_handler(vm, TEST_VECTOR, guest_irq_handler);
+
+ /* Map IOAPIC for guest access */
+ virt_map(vm, IOAPIC_BASE_GPA, IOAPIC_BASE_GPA, 1);
+
+ /* Configure level-triggered interrupt in IOAPIC */
+ chip.chip_id = KVM_IRQCHIP_IOAPIC;
+ vm_ioctl(vm, KVM_GET_IRQCHIP, &chip);
+
+ chip.chip.ioapic.redirtbl[TEST_IRQ].fields.vector = TEST_VECTOR;
+ chip.chip.ioapic.redirtbl[TEST_IRQ].fields.delivery_mode = 0; /* fixed */
+ chip.chip.ioapic.redirtbl[TEST_IRQ].fields.dest_mode = 0; /* physical */
+ chip.chip.ioapic.redirtbl[TEST_IRQ].fields.mask = 0; /* unmasked */
+ chip.chip.ioapic.redirtbl[TEST_IRQ].fields.trig_mode = 1; /* level */
+ chip.chip.ioapic.redirtbl[TEST_IRQ].fields.dest_id = vcpu->id;
+
+ vm_ioctl(vm, KVM_SET_IRQCHIP, &chip);
+
+ vcpu_run(vcpu);
+ TEST_ASSERT_EQ(get_ucall(vcpu, &uc), UCALL_SYNC);
+ ioapic_ver = uc.args[1] & 0xff;
+ spiv_after = (uc.args[1] >> 16) & 0xffff;
+
+ /* Inject level-triggered interrupt */
+ irq_level.irq = TEST_IRQ;
+ irq_level.level = 1;
+ vm_ioctl(vm, KVM_IRQ_LINE, &irq_level);
+
+ /* De-assert immediately so we only get one interrupt */
+ irq_level.level = 0;
+ vm_ioctl(vm, KVM_IRQ_LINE, &irq_level);
+
+ /* Guest receives interrupt and writes EOI */
+ vcpu_run(vcpu);
+
+ /* Check what ucall we got */
+ int ucall_type = get_ucall(vcpu, &uc);
+
+ /* Handle guest completion based on what it did */
+ if (ucall_type == UCALL_SYNC) {
+ /* Guest has more to do */
+ vcpu_run(vcpu);
+ ucall_type = get_ucall(vcpu, &uc);
+
+ if (ucall_type == UCALL_SYNC) {
+ /* Guest wrote EOIR */
+ vcpu_run(vcpu);
+ TEST_ASSERT_EQ(get_ucall(vcpu, &uc), UCALL_DONE);
+ } else {
+ TEST_ASSERT_EQ(ucall_type, UCALL_DONE);
+ }
+ } else {
+ TEST_ASSERT_EQ(ucall_type, UCALL_DONE);
+ }
+
+ /* Check remote_irr after all guest EOI activity */
+ chip.chip_id = KVM_IRQCHIP_IOAPIC;
+ vm_ioctl(vm, KVM_GET_IRQCHIP, &chip);
+ remote_irr_set = chip.chip.ioapic.redirtbl[TEST_IRQ].fields.remote_irr;
+
+ /* De-assert IRQ line */
+ irq_level.level = 0;
+ vm_ioctl(vm, KVM_IRQ_LINE, &irq_level);
+
+ /* Check LAPIC LVR */
+ vcpu_ioctl(vcpu, KVM_GET_LAPIC, &lapic);
+ lvr = *(u32 *)&lapic.regs[APIC_LVR];
+
+ printf(" IOAPIC v0x%x, LVR directed_eoi=%d, SPIV directed_eoi=%d, remote_irr=%d\n",
+ ioapic_ver, !!(lvr & APIC_LVR_DIRECTED_EOI),
+ !!(spiv_after & APIC_SPIV_DIRECTED_EOI), remote_irr_set);
+
+ if (expect_advertised) {
+ TEST_ASSERT(lvr & APIC_LVR_DIRECTED_EOI,
+ "Expected APIC_LVR_DIRECTED_EOI, got LVR=0x%x", lvr);
+ } else {
+ TEST_ASSERT(!(lvr & APIC_LVR_DIRECTED_EOI),
+ "Expected no APIC_LVR_DIRECTED_EOI, got LVR=0x%x", lvr);
+ }
+
+ /* Check IOAPIC version */
+ if (expect_implemented) {
+ TEST_ASSERT(ioapic_ver == 0x20,
+ "Expected IOAPIC v0x20 (with EOIR), got 0x%x", ioapic_ver);
+ } else {
+ TEST_ASSERT(ioapic_ver == 0x11,
+ "Expected IOAPIC v0x11 (no EOIR), got 0x%x", ioapic_ver);
+ }
+
+ /* Check SPIV and remote_irr based on whether guest used directed EOI */
+ if (use_directed) {
+ TEST_ASSERT(spiv_after & APIC_SPIV_DIRECTED_EOI,
+ "Expected APIC_SPIV_DIRECTED_EOI set, got SPIV=0x%x", spiv_after);
+ TEST_ASSERT(remote_irr_set,
+ "Expected remote_irr set (EOI suppressed), got cleared");
+ } else {
+ TEST_ASSERT(!(spiv_after & APIC_SPIV_DIRECTED_EOI),
+ "Expected APIC_SPIV_DIRECTED_EOI clear, got SPIV=0x%x", spiv_after);
+ TEST_ASSERT(!remote_irr_set,
+ "Expected remote_irr cleared (EOI broadcast), got set");
+ }
+
+ kvm_vm_free(vm);
+}
+
+static void guest_code_split(uint64_t use_directed)
+{
+ uint64_t apic_base;
+ uint32_t spiv;
+
+ /* Enable x2APIC mode */
+ apic_base = rdmsr(MSR_IA32_APICBASE);
+ wrmsr(MSR_IA32_APICBASE, apic_base | X2APIC_ENABLE);
+
+ /* Mask all LVT entries */
+ x2apic_write_reg(APIC_LVTT, APIC_LVT_MASKED);
+ x2apic_write_reg(APIC_LVTPC, APIC_LVT_MASKED);
+ x2apic_write_reg(APIC_LVT0, APIC_LVT_MASKED);
+ x2apic_write_reg(APIC_LVT1, APIC_LVT_MASKED);
+ x2apic_write_reg(APIC_LVTERR, APIC_LVT_MASKED);
+
+ /* Enable APIC */
+ x2apic_write_reg(APIC_SPIV, APIC_SPIV_APIC_ENABLED | TEST_VECTOR);
+
+ /* Conditionally set APIC_SPIV_DIRECTED_EOI */
+ if (use_directed) {
+ spiv = x2apic_read_reg(APIC_SPIV);
+ x2apic_write_reg(APIC_SPIV, spiv | APIC_SPIV_DIRECTED_EOI);
+ }
+ spiv = x2apic_read_reg(APIC_SPIV);
+
+ GUEST_SYNC(spiv);
+
+ /* Enable interrupts and wait for interrupt */
+ asm volatile("sti; hlt");
+
+ /* Write EOI */
+ x2apic_write_reg(APIC_EOI, 0);
+
+ GUEST_DONE();
+}
+
+static void test_suppress_eoi_split(uint64_t x2apic_flags, bool expect_advertised, bool use_directed)
+{
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+ struct kvm_lapic_state lapic;
+ struct ucall uc;
+ uint32_t lvr, spiv_after;
+ bool got_eoi_exit;
+ enum vm_irqchip_mode saved_mode = vm_irqchip_mode;
+
+ vm_irqchip_mode = VM_IRQCHIP_SPLIT;
+ vm = vm_create(1);
+ vm_irqchip_mode = saved_mode;
+
+ if (x2apic_flags)
+ vm_enable_cap(vm, KVM_CAP_X2APIC_API, x2apic_flags);
+
+ vcpu = vm_vcpu_add(vm, 0, guest_code_split);
+ vcpu_args_set(vcpu, 1, use_directed);
+
+ /* Set up IRQ routing so kernel knows userspace IOAPIC handles TEST_IRQ */
+ struct kvm_irq_routing *routing = calloc(1, sizeof(*routing) + sizeof(routing->entries[0]));
+ routing->nr = 1;
+ routing->entries[0].gsi = TEST_IRQ;
+ routing->entries[0].type = KVM_IRQ_ROUTING_MSI;
+ routing->entries[0].u.msi.address_lo = 0xfee00000; /* Dest ID 0 */
+ routing->entries[0].u.msi.address_hi = 0;
+ routing->entries[0].u.msi.data = TEST_VECTOR | (1 << 15); /* Level-triggered */
+ __vm_ioctl(vm, KVM_SET_GSI_ROUTING, routing);
+ free(routing);
+
+ vm_install_exception_handler(vm, TEST_VECTOR, guest_irq_handler);
+
+ vcpu_run(vcpu);
+ TEST_ASSERT_EQ(get_ucall(vcpu, &uc), UCALL_SYNC);
+ spiv_after = uc.args[1];
+
+ /* Inject via GSI (which routes to MSI) */
+ struct kvm_irq_level irq_level = {
+ .irq = TEST_IRQ,
+ .level = 1,
+ };
+ vm_ioctl(vm, KVM_IRQ_LINE, &irq_level);
+ irq_level.level = 0;
+ vm_ioctl(vm, KVM_IRQ_LINE, &irq_level);
+
+ /* Guest receives interrupt and writes EOI */
+ vcpu_run(vcpu);
+
+ /* Check if we got KVM_EXIT_IOAPIC_EOI */
+ got_eoi_exit = (vcpu->run->exit_reason == KVM_EXIT_IOAPIC_EOI &&
+ vcpu->run->eoi.vector == TEST_VECTOR);
+
+ /* If we got EOI exit, continue guest to finish */
+ if (got_eoi_exit) {
+ vcpu_run(vcpu);
+ }
+
+ /* Let guest finish */
+ int ucall_type = get_ucall(vcpu, &uc);
+ if (ucall_type == UCALL_SYNC) {
+ vcpu_run(vcpu);
+ ucall_type = get_ucall(vcpu, &uc);
+ if (ucall_type == UCALL_SYNC) {
+ vcpu_run(vcpu);
+ TEST_ASSERT_EQ(get_ucall(vcpu, &uc), UCALL_DONE);
+ } else {
+ TEST_ASSERT_EQ(ucall_type, UCALL_DONE);
+ }
+ } else {
+ TEST_ASSERT_EQ(ucall_type, UCALL_DONE);
+ }
+
+ /* Check LAPIC LVR */
+ vcpu_ioctl(vcpu, KVM_GET_LAPIC, &lapic);
+ lvr = *(u32 *)&lapic.regs[APIC_LVR];
+
+ printf(" Split IRQCHIP: LVR directed_eoi=%d, SPIV directed_eoi=%d, got_eoi_exit=%d\n",
+ !!(lvr & APIC_LVR_DIRECTED_EOI),
+ !!(spiv_after & APIC_SPIV_DIRECTED_EOI), got_eoi_exit);
+
+ if (expect_advertised) {
+ TEST_ASSERT(lvr & APIC_LVR_DIRECTED_EOI,
+ "Expected APIC_LVR_DIRECTED_EOI, got LVR=0x%x", lvr);
+ } else {
+ TEST_ASSERT(!(lvr & APIC_LVR_DIRECTED_EOI),
+ "Expected no APIC_LVR_DIRECTED_EOI, got LVR=0x%x", lvr);
+ }
+
+ /* Check EOI exit based on whether guest used directed EOI */
+ if (use_directed) {
+ TEST_ASSERT(spiv_after & APIC_SPIV_DIRECTED_EOI,
+ "Expected APIC_SPIV_DIRECTED_EOI set, got SPIV=0x%x", spiv_after);
+ TEST_ASSERT(!got_eoi_exit,
+ "Expected no EOI exit (suppressed), but got one");
+ } else {
+ TEST_ASSERT(!(spiv_after & APIC_SPIV_DIRECTED_EOI),
+ "Expected APIC_SPIV_DIRECTED_EOI clear, got SPIV=0x%x", spiv_after);
+ if (expect_advertised) {
+ /* Quirk mode: advertised but should still broadcast */
+ if (!got_eoi_exit) {
+ printf(" Note: No EOI exit in quirk mode (old kernel behavior)\n");
+ }
+ } else {
+ /* Feature not advertised, no EOI exits expected */
+ }
+ }
+
+ kvm_vm_free(vm);
+}
+
+int main(void)
+{
+ int cap;
+
+ TEST_REQUIRE(kvm_has_cap(KVM_CAP_X2APIC_API));
+ TEST_REQUIRE(kvm_has_cap(KVM_CAP_IRQCHIP));
+
+ cap = kvm_check_cap(KVM_CAP_X2APIC_API);
+
+ /*
+ * Test that KVM correctly handles the suppress EOI broadcast flags.
+ * Note: The actual behavior depends on the kernel implementation.
+ * This test documents the expected behavior per the commit messages.
+ *
+ * Quirk mode: Legacy behavior (in-kernel: not advertised; split: advertised)
+ * Explicit enable: Advertise and implement
+ * Explicit disable: Don't advertise or implement
+ */
+
+ printf("Testing quirk mode (no flags)...\n");
+ test_suppress_eoi(0, false, false, false);
+
+ if (cap & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) {
+ printf("Testing explicit enable...\n");
+ test_suppress_eoi(KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST, true, true, true);
+
+ printf("Testing explicit enable (guest doesn't use)...\n");
+ test_suppress_eoi(KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST, true, true, false);
+ } else {
+ printf("Skipping explicit enable (not supported)...\n");
+ }
+
+ if (cap & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST) {
+ printf("Testing explicit disable...\n");
+ test_suppress_eoi(KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST, false, false, false);
+ } else {
+ printf("Skipping explicit disable (not supported)...\n");
+ }
+
+ printf("In-kernel IRQCHIP tests passed\n");
+
+ /* Test split irqchip mode */
+ printf("\n=== Testing split IRQCHIP mode ===\n");
+
+ printf("Testing quirk mode (no flags)...\n");
+ test_suppress_eoi_split(0, true, false); /* Quirk: advertised in split mode */
+
+ if (cap & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) {
+ printf("Testing explicit enable...\n");
+ test_suppress_eoi_split(KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST, true, true);
+
+ printf("Testing explicit enable (guest doesn't use)...\n");
+ test_suppress_eoi_split(KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST, true, false);
+ } else {
+ printf("Skipping explicit enable (not supported)...\n");
+ }
+
+ if (cap & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST) {
+ printf("Testing explicit disable...\n");
+ test_suppress_eoi_split(KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST, false, false);
+ } else {
+ printf("Skipping explicit disable (not supported)...\n");
+ }
+
+ printf("All tests passed\n");
+ return 0;
+}
+
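For readers following the assertions above, the intended semantics can be sketched as a small standalone model. This is not KVM code: every `model_*` name is hypothetical, and the sketch assumes only what the cover letter states, namely that once SEOIB is enabled and the guest has set the SPIV bit, a LAPIC EOI is no longer broadcast and the guest must write the I/O APIC EOI register itself. It also simplifies the I/O APIC to a single pin and ignores vector matching.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one level-triggered I/O APIC pin (not KVM code). */
struct model_pin {
	bool remote_irr;	/* latched on delivery, cleared by an EOI that reaches the pin */
};

/* Hypothetical model of the local APIC state relevant to SEOIB. */
struct model_lapic {
	bool seoib_enabled;	/* VMM explicitly enabled Suppress EOI Broadcast */
	bool spiv_directed_eoi;	/* guest set the suppress bit in SPIV */
};

/* Deliver a level-triggered interrupt: the pin latches remote_irr. */
static void model_deliver(struct model_pin *pin)
{
	assert(pin != NULL);
	pin->remote_irr = true;
}

/*
 * Guest writes the LAPIC EOI register. Returns true if the EOI was
 * broadcast to the I/O APIC, false if the broadcast was suppressed.
 */
static bool model_lapic_eoi(const struct model_lapic *lapic,
			    struct model_pin *pin)
{
	assert(lapic != NULL && pin != NULL);

	/* Suppression applies only if the feature is honored AND the guest asked. */
	if (lapic->seoib_enabled && lapic->spiv_directed_eoi)
		return false;

	pin->remote_irr = false;
	return true;
}

/* Guest writes the I/O APIC EOI register (directed EOI). */
static void model_ioapic_eoir(struct model_pin *pin)
{
	assert(pin != NULL);
	pin->remote_irr = false;
}
```

In the legacy split-IRQCHIP quirk, `seoib_enabled` would be false even though the guest may have set the SPIV bit, so the model keeps broadcasting, which is the behaviour the series sets out to make controllable.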
* Re: [PATCH v5 4/3] KVM: selftests: Add test cases for EOI suppression modes
2026-01-29 4:49 ` [PATCH v5 4/3] KVM: selftests: Add test cases for EOI suppression modes David Woodhouse
@ 2026-01-29 15:19 ` Sean Christopherson
2026-01-29 15:58 ` David Woodhouse
0 siblings, 1 reply; 28+ messages in thread
From: Sean Christopherson @ 2026-01-29 15:19 UTC (permalink / raw)
To: David Woodhouse
Cc: Khushit Shah, pbonzini, kai.huang, mingo, x86, bp, hpa,
linux-kernel, kvm, dave.hansen, tglx, jon, shaju.abraham
On Wed, Jan 28, 2026, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> Rather than being frightened of doing the right thing for the in-kernel
> I/O APIC because "there might be bugs",
I'm not worried about bugs per se, I'm worried about breaking existing guests.
Even if KVM is 100% perfect, changes in behavior can still break guests,
especially for a feature like this where it seems like everyone got it wrong.
And as I said before, I'm not opposed to supporting directed EOI in the in-kernel
I/O APIC, but (a) I don't want to do it in conjunction with the fixes for stable@,
and (b) I'd prefer to not bother unless there's an actual use case for doing so.
The in-kernel I/O APIC isn't being deprecated, but AFAIK it's being de-prioritized
by pretty much every VMM. I.e. the risk vs. reward isn't there for me.
* Re: [PATCH v5 4/3] KVM: selftests: Add test cases for EOI suppression modes
2026-01-29 15:19 ` Sean Christopherson
@ 2026-01-29 15:58 ` David Woodhouse
2026-02-04 0:00 ` Sean Christopherson
0 siblings, 1 reply; 28+ messages in thread
From: David Woodhouse @ 2026-01-29 15:58 UTC (permalink / raw)
To: Sean Christopherson
Cc: Khushit Shah, pbonzini, kai.huang, mingo, x86, bp, hpa,
linux-kernel, kvm, dave.hansen, tglx, jon, shaju.abraham
On Thu, 2026-01-29 at 07:19 -0800, Sean Christopherson wrote:
> On Wed, Jan 28, 2026, David Woodhouse wrote:
> > From: David Woodhouse <dwmw@amazon.co.uk>
> >
> > Rather than being frightened of doing the right thing for the in-kernel
> > I/O APIC because "there might be bugs",
>
> I'm not worried about bugs per se, I'm worried about breaking existing guests.
> Even if KVM is 100% perfect, changes in behavior can still break guests,
> especially for a feature like this where it seems like everyone got it wrong.
There's the potential for guest bugs when the local APIC actually
starts honouring the DIRECTED_EOI bit in the SPIV register, sure. At
that point, the guest *has* to do the direct EOI (and it has to work).
But that's why we kept the 'quirk' mode as the default unless userspace
explicitly opts in. And it's true for the split-irqchip too; fixing the
behaviour is the whole point of this exercise.
I don't see why supporting precisely the same behaviour in the kernel
irqchip is any different in that respect.
> And as I said before, I'm not opposed to supporting directed EOI in the in-kernel
> I/O APIC, but (a) I don't want to do it in conjunction with the fixes for stable@,
> and (b) I'd prefer to not bother unless there's an actual use case for doing so.
> The in-kernel I/O APIC isn't being deprecated, but AFAIK it's being de-prioritized
> by pretty much every VMM. I.e. the risk vs. reward isn't there for me.
I tend to favour the simplicity, with _ENABLE and _DISABLE just quietly
doing what their name implies without any of that nonsense about
"except if you have a kernel irqchip".
But as you wish. Most of this test case should be fine on v6 of the
patch which dropped in-kernel I/O APIC support. All the tests are
conditional on the corresponding support being advertised, so it just
needs updating to correctly detect the in-kernel _ENABLE support in
case that does get added. How did we say we would advertise that?
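The per-flag gating that the test's main() performs on the KVM_CAP_X2APIC_API mask can be sketched as a pure function. This is only an illustration: the `SKETCH_*` bit positions are assumptions (the real values come from the UAPI header), `sketch_pick_seoib_flags()` and `vmm_has_eoir` are hypothetical names, and falling back to the disable flag when enable is unavailable is one possible VMM policy rather than anything the series mandates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only; use the real UAPI header. */
#define SKETCH_ENABLE_SEOIB	(1ULL << 2)
#define SKETCH_DISABLE_SEOIB	(1ULL << 3)

/*
 * Choose the SEOIB flag to request, given the mask reported for
 * KVM_CAP_X2APIC_API and whether the (hypothetical) VMM's I/O APIC
 * model implements the EOI register needed for directed EOI.
 */
static uint64_t sketch_pick_seoib_flags(uint64_t cap_mask, bool vmm_has_eoir)
{
	uint64_t flags = 0;

	if (vmm_has_eoir && (cap_mask & SKETCH_ENABLE_SEOIB))
		flags = SKETCH_ENABLE_SEOIB;
	else if (cap_mask & SKETCH_DISABLE_SEOIB)
		flags = SKETCH_DISABLE_SEOIB;
	/* else: neither flag is supported, so legacy quirk behavior remains */

	/* Sketch invariant: never request both flags at once. */
	assert(flags != (SKETCH_ENABLE_SEOIB | SKETCH_DISABLE_SEOIB));
	return flags;
}
```

A caller would pass the returned value to KVM_ENABLE_CAP only when it is non-zero, mirroring the `if (x2apic_flags)` guard in the test.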
* Re: [PATCH v5 4/3] KVM: selftests: Add test cases for EOI suppression modes
2026-01-29 15:58 ` David Woodhouse
@ 2026-02-04 0:00 ` Sean Christopherson
0 siblings, 0 replies; 28+ messages in thread
From: Sean Christopherson @ 2026-02-04 0:00 UTC (permalink / raw)
To: David Woodhouse
Cc: Khushit Shah, pbonzini, kai.huang, mingo, x86, bp, hpa,
linux-kernel, kvm, dave.hansen, tglx, jon, shaju.abraham
On Thu, Jan 29, 2026, David Woodhouse wrote:
> On Thu, 2026-01-29 at 07:19 -0800, Sean Christopherson wrote:
> > On Wed, Jan 28, 2026, David Woodhouse wrote:
> > > From: David Woodhouse <dwmw@amazon.co.uk>
> > >
> > > Rather than being frightened of doing the right thing for the in-kernel
> > > I/O APIC because "there might be bugs",
> >
> > I'm not worried about bugs per se, I'm worried about breaking existing guests.
> > Even if KVM is 100% perfect, changes in behavior can still break guests,
> > especially for a feature like this where it seems like everyone got it wrong.
>
> There's the potential for guest bugs when the local APIC actually
> starts honouring the DIRECTED_EOI bit in the SPIV register, sure. At
> that point, the guest *has* to do the direct EOI (and it has to work).
>
> But that's why we kept the 'quirk' mode as the default unless userspace
> explicitly opts in. And it's true for the split-irqchip too; fixing the
> behaviour is the whole point of this exercise.
>
> I don't see why supporting precisely the same behaviour in the kernel
> irqchip is any different in that respect.
Conceptually, nothing. But fixing the in-kernel I/O APIC is more invasive, it's
not currently broken (KVM doesn't advertise DIRECTED_EOI or SUPPRESS_EOI_BROADCAST),
no one is lining up to actually utilize the functionality, *and* there are some
historical warts in KVM that need to be addressed.
Add it all up, and for me at least, the risk vs. reward is very different for
split vs. fully in-kernel irqchips.
> > And as I said before, I'm not opposed to supporting directed EOI in the in-kernel
> > I/O APIC, but (a) I don't want to do it in conjunction with the fixes for stable@,
> > and (b) I'd prefer to not bother unless there's an actual use case for doing so.
> > The in-kernel I/O APIC isn't being deprecated, but AFAIK it's being de-prioritized
> > by pretty much every VMM. I.e. the risk vs. reward isn't there for me.
>
> I tend to favour the simplicity, with _ENABLE and _DISABLE just quietly
> doing what their name implies without any of that nonsense about
> "except if you have a kernel irqchip".
But they _don't_ do what their name implies if there's no in-kernel local APIC.
I.e. userspace needs to read the docs and do the right thing anyways.
> But as you wish. Most of this test case should be fine on v6 of the
> patch which dropped in-kernel I/O APIC support. All the tests are
> conditional on the corresponding support being advertised, so it just
> needs updating to correctly detect the in-kernel _ENABLE support in
> case that does get added. How did we say we would advertise that?
A doc update plus this:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 67e666921a12..d711493f9c69 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4934,7 +4934,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
break;
case KVM_CAP_X2APIC_API:
r = KVM_X2APIC_API_VALID_FLAGS;
- if (kvm && !irqchip_split(kvm))
+ if (kvm && !irqchip_in_kernel(kvm))
r &= ~KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST;
break;
case KVM_CAP_NESTED_STATE:
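The effect of the one-line change above on what userspace sees can be sketched outside the kernel. The model below is hypothetical throughout: the `sketch_*` names are invented, the two SEOIB bit positions are assumptions, and it relies on the understanding that the in-kernel predicate is true for both split and fully in-kernel irqchips (i.e. whenever the local APIC is emulated by KVM).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_USE_32BIT_IDS		(1ULL << 0)
#define SKETCH_DISABLE_BROADCAST_QUIRK	(1ULL << 1)
#define SKETCH_ENABLE_SEOIB		(1ULL << 2)	/* assumed bit position */
#define SKETCH_DISABLE_SEOIB		(1ULL << 3)	/* assumed bit position */
#define SKETCH_VALID_FLAGS	(SKETCH_USE_32BIT_IDS | \
				 SKETCH_DISABLE_BROADCAST_QUIRK | \
				 SKETCH_ENABLE_SEOIB | SKETCH_DISABLE_SEOIB)

enum sketch_irqchip {
	SKETCH_IRQCHIP_NONE,	/* everything emulated in userspace */
	SKETCH_IRQCHIP_SPLIT,	/* LAPIC in kernel, I/O APIC in userspace */
	SKETCH_IRQCHIP_KERNEL,	/* LAPIC and I/O APIC in kernel */
};

static bool sketch_irqchip_split(enum sketch_irqchip mode)
{
	return mode == SKETCH_IRQCHIP_SPLIT;
}

static bool sketch_irqchip_in_kernel(enum sketch_irqchip mode)
{
	assert(mode <= SKETCH_IRQCHIP_KERNEL);
	return mode != SKETCH_IRQCHIP_NONE;
}

/* Before the change: ENABLE is reported only for a split irqchip. */
static uint64_t sketch_cap_before(bool have_vm, enum sketch_irqchip mode)
{
	uint64_t r = SKETCH_VALID_FLAGS;

	if (have_vm && !sketch_irqchip_split(mode))
		r &= ~SKETCH_ENABLE_SEOIB;
	return r;
}

/* After the change: ENABLE is hidden only when there is no in-kernel LAPIC. */
static uint64_t sketch_cap_after(bool have_vm, enum sketch_irqchip mode)
{
	uint64_t r = SKETCH_VALID_FLAGS;

	if (have_vm && !sketch_irqchip_in_kernel(mode))
		r &= ~SKETCH_ENABLE_SEOIB;
	return r;
}
```

Under these assumptions the only VM configuration whose reported mask changes is the fully in-kernel irqchip, which is what a selftest would probe to decide whether to run the in-kernel enable cases.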