* [PATCH v6] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
@ 2026-01-23 12:56 Khushit Shah
2026-01-27 2:21 ` Khushit Shah
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Khushit Shah @ 2026-01-23 12:56 UTC (permalink / raw)
To: seanjc, pbonzini, kai.huang, dwmw2
Cc: mingo, x86, bp, hpa, linux-kernel, kvm, dave.hansen, tglx, jon,
shaju.abraham, Khushit Shah, stable
Add two flags for KVM_CAP_X2APIC_API to allow userspace to control support
for Suppress EOI Broadcasts when using a split IRQCHIP (I/O APIC emulated
by userspace), which KVM completely mishandles. When x2APIC support was
first added, KVM incorrectly advertised and "enabled" Suppress EOI
Broadcast, without fully supporting the I/O APIC side of the equation,
i.e. without adding directed EOI to KVM's in-kernel I/O APIC.
That flaw was carried over to split IRQCHIP support, i.e. KVM advertised
support for Suppress EOI Broadcasts irrespective of whether or not the
userspace I/O APIC implementation supported directed EOIs. Even worse,
KVM didn't actually suppress EOI broadcasts, i.e. userspace VMMs without
support for directed EOI came to rely on the "spurious" broadcasts.
KVM "fixed" the in-kernel I/O APIC implementation by completely disabling
support for Suppress EOI Broadcasts in commit 0bcc3fb95b97 ("KVM: lapic:
stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use"), but
didn't do anything to remedy userspace I/O APIC implementations.
KVM's bogus handling of Suppress EOI Broadcast is problematic when the
guest relies on interrupts being masked in the I/O APIC until well after
the initial local APIC EOI. E.g. Windows with Credential Guard enabled
handles interrupts in the following order:
1. Interrupt for L2 arrives.
2. L1 APIC EOIs the interrupt.
3. L1 resumes L2 and injects the interrupt.
4. L2 EOIs after servicing.
5. L1 performs the I/O APIC EOI.
Because KVM EOIs the I/O APIC at step #2, the guest can get an interrupt
storm, e.g. if the IRQ line is still asserted and userspace reacts to the
EOI by re-injecting the IRQ, because the guest doesn't de-assert the line
until step #4, and doesn't expect the interrupt to be re-enabled until
step #5.
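
The ordering problem above can be illustrated with a toy model. This is a
minimal sketch, not KVM code: the struct, the function names, and the
single-line I/O APIC are assumptions made for illustration, and one extra
delivery stands in for the storm.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one level-triggered line behind an I/O APIC (hypothetical). */
struct toy_line {
	bool asserted;   /* device still holds the line high */
	bool remote_irr; /* I/O APIC is waiting for an EOI */
	int deliveries;  /* times the interrupt reached the guest */
};

static void toy_deliver(struct toy_line *line)
{
	if (line->asserted && !line->remote_irr) {
		line->remote_irr = true;
		line->deliveries++;
	}
}

static void toy_ioapic_eoi(struct toy_line *line)
{
	line->remote_irr = false;
	toy_deliver(line); /* level-triggered: re-deliver if still asserted */
}

/* Walk the five steps; returns the number of deliveries (1 is the ideal). */
int toy_run_sequence(bool lapic_eoi_broadcasts)
{
	struct toy_line line = { .asserted = true };

	toy_deliver(&line);           /* 1. interrupt for L2 arrives */
	if (lapic_eoi_broadcasts)     /* 2. L1 local APIC EOI */
		toy_ioapic_eoi(&line);
	/* 3. L1 resumes L2 and injects the interrupt (no I/O APIC effect) */
	line.asserted = false;        /* 4. L2 services the device, line drops */
	toy_ioapic_eoi(&line);        /* 5. L1 performs the I/O APIC EOI */

	assert(!line.remote_irr);     /* nothing may be left pending */
	return line.deliveries;
}
```

In this model, toy_run_sequence(true) sees a second delivery at step #2
because the line is still asserted, while toy_run_sequence(false) sees only
the original one.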
Unfortunately, simply "fixing" the bug isn't an option, as KVM has no way
of knowing if the userspace I/O APIC supports directed EOIs, i.e.
suppressing EOI broadcasts would result in interrupts being stuck masked
in the userspace I/O APIC due to step #5 being ignored by userspace. And
fully disabling support for Suppress EOI Broadcast is also undesirable, as
picking up the fix would require a guest reboot, *and* more importantly
would change the virtual CPU model exposed to the guest without any buy-in
from userspace.
Add KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST and
KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST flags to allow userspace to
explicitly enable or disable support for Suppress EOI Broadcasts. This
gives userspace control over the virtual CPU model exposed to the guest,
as KVM should never have enabled support for Suppress EOI Broadcast without
userspace opt-in. Not setting either flag will result in legacy quirky
behavior for backward compatibility.
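
The three resulting modes can be summarized as a small decision table. The
sketch below mirrors the two helpers in the lapic.c hunk of this patch, but
it is a standalone model: the enum and function names are shortened,
hypothetical stand-ins, and the boolean parameters replace the kernel's
struct kvm state.

```c
#include <assert.h>
#include <stdbool.h>

enum seoib_mode { SEOIB_QUIRKED, SEOIB_ENABLED, SEOIB_DISABLED };

/* Should the local APIC version register advertise the feature? */
bool seoib_advertised(enum seoib_mode mode, bool ioapic_in_kernel)
{
	switch (mode) {
	case SEOIB_ENABLED:
		return true;
	case SEOIB_DISABLED:
		return false;
	case SEOIB_QUIRKED:
		/* Legacy: advertised only with a userspace I/O APIC. */
		return !ioapic_in_kernel;
	}
	assert(0 && "unknown mode");
	return false;
}

/* Should a local APIC EOI skip the broadcast to the I/O APIC? */
bool seoib_suppressed(enum seoib_mode mode, bool ioapic_in_kernel,
		      bool guest_set_spiv_bit)
{
	if (!guest_set_spiv_bit)
		return false;

	switch (mode) {
	case SEOIB_ENABLED:
		return true;
	case SEOIB_DISABLED:
		return false;
	case SEOIB_QUIRKED:
		/* Legacy: the bit is honored only by the in-kernel I/O APIC. */
		return ioapic_in_kernel;
	}
	assert(0 && "unknown mode");
	return false;
}
```

Note the asymmetry in the quirked mode: with a split IRQCHIP the feature is
advertised but never honored, which is the behavior older VMMs came to rely
on.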
Disallow fully enabling SUPPRESS_EOI_BROADCAST when using an in-kernel
I/O APIC, as KVM's history/support is just as tragic. E.g. it's not clear
that commit c806a6ad35bf ("KVM: x86: call irq notifiers with directed EOI")
was entirely correct, i.e. it may have simply papered over the lack of
Directed EOI emulation in the I/O APIC.
Note, Suppress EOI Broadcasts is defined only in Intel's SDM, not in AMD's
APM. But the bit is writable on some AMD CPUs, e.g. Turin, and KVM's ABI
is to support Directed EOI (KVM's name) irrespective of guest CPU vendor.
Fixes: 7543a635aa09 ("KVM: x86: Add KVM exit for IOAPIC EOIs")
Closes: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com
Cc: stable@vger.kernel.org
Suggested-by: David Woodhouse <dwmw2@infradead.org>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
---
Documentation/virt/kvm/api.rst | 28 +++++++++++-
arch/x86/include/asm/kvm_host.h | 7 +++
arch/x86/include/uapi/asm/kvm.h | 6 ++-
arch/x86/kvm/ioapic.c | 2 +-
arch/x86/kvm/lapic.c | 76 +++++++++++++++++++++++++++++----
arch/x86/kvm/lapic.h | 2 +
arch/x86/kvm/x86.c | 21 ++++++++-
7 files changed, 127 insertions(+), 15 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 01a3abef8abb..f1f1d2e5dc7c 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7835,8 +7835,10 @@ Will return -EBUSY if a VCPU has already been created.
Valid feature flags in args[0] are::
- #define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
- #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
+ #define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
+ #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
+ #define KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST (1ULL << 2)
+ #define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST (1ULL << 3)
Enabling KVM_X2APIC_API_USE_32BIT_IDS changes the behavior of
KVM_SET_GSI_ROUTING, KVM_SIGNAL_MSI, KVM_SET_LAPIC, and KVM_GET_LAPIC,
@@ -7849,6 +7851,28 @@ as a broadcast even in x2APIC mode in order to support physical x2APIC
without interrupt remapping. This is undesirable in logical mode,
where 0xff represents CPUs 0-7 in cluster 0.
+Setting KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST instructs KVM to enable
+Suppress EOI Broadcasts. KVM will advertise support for Suppress EOI
+Broadcast to the guest and suppress LAPIC EOI broadcasts when the guest
+sets the Suppress EOI Broadcast bit in the SPIV register. This flag is
+supported only when using a split IRQCHIP.
+
+Setting KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST disables support for
+Suppress EOI Broadcasts entirely, i.e. instructs KVM to NOT advertise
+support to the guest.
+
+Modern VMMs should either enable KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST
+or KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST. If not, legacy quirky
+behavior will be used by KVM: in split IRQCHIP mode, KVM will advertise
+support for Suppress EOI Broadcasts but not actually suppress EOI
+broadcasts; for in-kernel IRQCHIP mode, KVM will not advertise support for
+Suppress EOI Broadcasts.
+
+Setting both KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST and
+KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST will fail with an EINVAL error,
+as will setting KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST without a split
+IRQCHIP.
+
7.8 KVM_CAP_S390_USER_INSTR0
----------------------------
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5a3bfa293e8b..c27b3e5f60c2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1226,6 +1226,12 @@ enum kvm_irqchip_mode {
KVM_IRQCHIP_SPLIT, /* created with KVM_CAP_SPLIT_IRQCHIP */
};
+enum kvm_suppress_eoi_broadcast_mode {
+ KVM_SUPPRESS_EOI_BROADCAST_QUIRKED, /* Legacy behavior */
+ KVM_SUPPRESS_EOI_BROADCAST_ENABLED, /* Enable Suppress EOI broadcast */
+ KVM_SUPPRESS_EOI_BROADCAST_DISABLED /* Disable Suppress EOI broadcast */
+};
+
struct kvm_x86_msr_filter {
u8 count;
bool default_allow:1;
@@ -1475,6 +1481,7 @@ struct kvm_arch {
bool x2apic_format;
bool x2apic_broadcast_quirk_disabled;
+ enum kvm_suppress_eoi_broadcast_mode suppress_eoi_broadcast_mode;
bool has_mapped_host_mmio;
bool guest_can_read_msr_platform_info;
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 7ceff6583652..1b0ad5440b99 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -914,8 +914,10 @@ struct kvm_sev_snp_launch_finish {
__u64 pad1[4];
};
-#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
-#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
+#define KVM_X2APIC_API_USE_32BIT_IDS (_BITULL(0))
+#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (_BITULL(1))
+#define KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST (_BITULL(2))
+#define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST (_BITULL(3))
struct kvm_hyperv_eventfd {
__u32 conn_id;
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 2c2783296aed..a26fa4222f29 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -561,7 +561,7 @@ static void kvm_ioapic_update_eoi_one(struct kvm_vcpu *vcpu,
spin_lock(&ioapic->lock);
if (trigger_mode != IOAPIC_LEVEL_TRIG ||
- kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
+ kvm_lapic_suppress_eoi_broadcast(apic))
return;
ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 1597dd0b0cc6..d2a821420d28 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -105,6 +105,63 @@ bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
apic_test_vector(vector, apic->regs + APIC_IRR);
}
+static bool kvm_lapic_advertise_suppress_eoi_broadcast(struct kvm *kvm)
+{
+ switch (kvm->arch.suppress_eoi_broadcast_mode) {
+ case KVM_SUPPRESS_EOI_BROADCAST_ENABLED:
+ return true;
+ case KVM_SUPPRESS_EOI_BROADCAST_DISABLED:
+ return false;
+ case KVM_SUPPRESS_EOI_BROADCAST_QUIRKED:
+ /*
+ * The default in-kernel I/O APIC emulates the 82093AA and does not
+ * implement an EOI register. Some guests (e.g. Windows with the
+ * Hyper-V role enabled) disable LAPIC EOI broadcast without
+ * checking the I/O APIC version, which can cause level-triggered
+ * interrupts to never be EOI'd.
+ *
+ * To avoid this, KVM doesn't advertise Suppress EOI Broadcast
+ * support when using the default in-kernel I/O APIC.
+ *
+ * Historically, in split IRQCHIP mode, KVM always advertised
+ * Suppress EOI Broadcast support but did not actually suppress
+ * EOIs, resulting in quirky behavior.
+ */
+ return !ioapic_in_kernel(kvm);
+ default:
+ WARN_ON_ONCE(1);
+ return false;
+ }
+}
+
+bool kvm_lapic_suppress_eoi_broadcast(struct kvm_lapic *apic)
+{
+ struct kvm *kvm = apic->vcpu->kvm;
+
+ if (!(kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI))
+ return false;
+
+ switch (kvm->arch.suppress_eoi_broadcast_mode) {
+ case KVM_SUPPRESS_EOI_BROADCAST_ENABLED:
+ return true;
+ case KVM_SUPPRESS_EOI_BROADCAST_DISABLED:
+ return false;
+ case KVM_SUPPRESS_EOI_BROADCAST_QUIRKED:
+ /*
+ * Historically, in split IRQCHIP mode, KVM ignored the suppress
+ * EOI broadcast bit set by the guest and broadcasts EOIs to the
+ * userspace I/O APIC. For In-kernel I/O APIC, the support itself
+ * is not advertised, can only be enabled via KVM_SET_APIC_STATE,
+ * and KVM's I/O APIC doesn't emulate Directed EOIs; but if the
+ * feature is enabled, it is respected (with odd behavior).
+ */
+ return ioapic_in_kernel(kvm);
+ default:
+ WARN_ON_ONCE(1);
+ return false;
+ }
+}
+
__read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu);
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_has_noapic_vcpu);
@@ -554,15 +611,9 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
v = APIC_VERSION | ((apic->nr_lvt_entries - 1) << 16);
- /*
- * KVM emulates 82093AA datasheet (with in-kernel IOAPIC implementation)
- * which doesn't have EOI register; Some buggy OSes (e.g. Windows with
- * Hyper-V role) disable EOI broadcast in lapic not checking for IOAPIC
- * version first and level-triggered interrupts never get EOIed in
- * IOAPIC.
- */
+
if (guest_cpu_cap_has(vcpu, X86_FEATURE_X2APIC) &&
- !ioapic_in_kernel(vcpu->kvm))
+ kvm_lapic_advertise_suppress_eoi_broadcast(vcpu->kvm))
v |= APIC_LVR_DIRECTED_EOI;
kvm_lapic_set_reg(apic, APIC_LVR, v);
}
@@ -1517,6 +1568,15 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
/* Request a KVM exit to inform the userspace IOAPIC. */
if (irqchip_split(apic->vcpu->kvm)) {
+ /*
+ * Don't exit to userspace if the guest has enabled Directed
+ * EOI, a.k.a. Suppress EOI Broadcasts, in which case the local
+ * APIC doesn't broadcast EOIs (the guest must EOI the target
+ * I/O APIC(s) directly).
+ */
+ if (kvm_lapic_suppress_eoi_broadcast(apic))
+ return;
+
apic->vcpu->arch.pending_ioapic_eoi = vector;
kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
return;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 282b9b7da98c..e5f5a222eced 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -231,6 +231,8 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
+bool kvm_lapic_suppress_eoi_broadcast(struct kvm_lapic *apic);
+
void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu);
void kvm_bitmap_or_dest_vcpus(struct kvm *kvm, struct kvm_lapic_irq *irq,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 63afdb6bb078..e64b61091d2d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -121,8 +121,10 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
#define KVM_CAP_PMU_VALID_MASK KVM_PMU_CAP_DISABLE
-#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
- KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
+#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
+ KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK | \
+ KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST | \
+ KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
static void update_cr8_intercept(struct kvm_vcpu *vcpu);
static void process_nmi(struct kvm_vcpu *vcpu);
@@ -4931,6 +4933,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
break;
case KVM_CAP_X2APIC_API:
r = KVM_X2APIC_API_VALID_FLAGS;
+ if (kvm && !irqchip_split(kvm))
+ r &= ~KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST;
break;
case KVM_CAP_NESTED_STATE:
r = kvm_x86_ops.nested_ops->get_state ?
@@ -6748,11 +6752,24 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
if (cap->args[0] & ~KVM_X2APIC_API_VALID_FLAGS)
break;
+ if ((cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) &&
+ (cap->args[0] & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST))
+ break;
+
+ if ((cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) &&
+ !irqchip_split(kvm))
+ break;
+
if (cap->args[0] & KVM_X2APIC_API_USE_32BIT_IDS)
kvm->arch.x2apic_format = true;
if (cap->args[0] & KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
kvm->arch.x2apic_broadcast_quirk_disabled = true;
+ if (cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST)
+ kvm->arch.suppress_eoi_broadcast_mode = KVM_SUPPRESS_EOI_BROADCAST_ENABLED;
+ if (cap->args[0] & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
+ kvm->arch.suppress_eoi_broadcast_mode = KVM_SUPPRESS_EOI_BROADCAST_DISABLED;
+
r = 0;
break;
case KVM_CAP_X86_DISABLE_EXITS:
--
2.39.3
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v6] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
2026-01-23 12:56 [PATCH v6] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression Khushit Shah
@ 2026-01-27 2:21 ` Khushit Shah
2026-01-27 2:41 ` Khushit Shah
2026-01-27 21:09 ` David Woodhouse
2026-02-04 0:10 ` Sean Christopherson
2 siblings, 1 reply; 17+ messages in thread
From: Khushit Shah @ 2026-01-27 2:21 UTC (permalink / raw)
To: seanjc@google.com, pbonzini@redhat.com, kai.huang@intel.com,
dwmw2@infradead.org
Cc: mingo@redhat.com, x86@kernel.org, bp@alien8.de, hpa@zytor.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
dave.hansen@linux.intel.com, tglx@linutronix.de, Jon Kohler,
Shaju Abraham, stable@vger.kernel.org
If no one has any further comments, let’s get this merged?
* Re: [PATCH v6] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
2026-01-27 2:21 ` Khushit Shah
@ 2026-01-27 2:41 ` Khushit Shah
0 siblings, 0 replies; 17+ messages in thread
From: Khushit Shah @ 2026-01-27 2:41 UTC (permalink / raw)
To: seanjc@google.com, pbonzini@redhat.com, kai.huang@intel.com,
dwmw2@infradead.org
Cc: mingo@redhat.com, x86@kernel.org, bp@alien8.de, hpa@zytor.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
dave.hansen@linux.intel.com, tglx@linutronix.de, Jon Kohler,
Shaju Abraham, stable@vger.kernel.org
If no one has any further comments, let’s get this merged?
* Re: [PATCH v6] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
2026-01-23 12:56 [PATCH v6] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression Khushit Shah
2026-01-27 2:21 ` Khushit Shah
@ 2026-01-27 21:09 ` David Woodhouse
2026-01-27 21:49 ` Sean Christopherson
2026-02-04 0:10 ` Sean Christopherson
2 siblings, 1 reply; 17+ messages in thread
From: David Woodhouse @ 2026-01-27 21:09 UTC (permalink / raw)
To: Khushit Shah, seanjc, pbonzini, kai.huang
Cc: mingo, x86, bp, hpa, linux-kernel, kvm, dave.hansen, tglx, jon,
shaju.abraham, stable
On Fri, 2026-01-23 at 12:56 +0000, Khushit Shah wrote:
>
> @@ -4931,6 +4933,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> break;
> case KVM_CAP_X2APIC_API:
> r = KVM_X2APIC_API_VALID_FLAGS;
> + if (kvm && !irqchip_split(kvm))
> + r &= ~KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST;
> break;
> case KVM_CAP_NESTED_STATE:
> r = kvm_x86_ops.nested_ops->get_state ?
> @@ -6748,11 +6752,24 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> if (cap->args[0] & ~KVM_X2APIC_API_VALID_FLAGS)
> break;
>
> + if ((cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) &&
> + (cap->args[0] & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST))
> + break;
> +
> + if ((cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) &&
> + !irqchip_split(kvm))
> + break;
> +
> if (cap->args[0] & KVM_X2APIC_API_USE_32BIT_IDS)
Is it possible to set KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST and
*then* create the in-kernel I/O APIC?
* Re: [PATCH v6] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
2026-01-27 21:09 ` David Woodhouse
@ 2026-01-27 21:49 ` Sean Christopherson
2026-01-27 22:36 ` David Woodhouse
0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2026-01-27 21:49 UTC (permalink / raw)
To: David Woodhouse
Cc: Khushit Shah, pbonzini, kai.huang, mingo, x86, bp, hpa,
linux-kernel, kvm, dave.hansen, tglx, jon, shaju.abraham, stable
On Tue, Jan 27, 2026, David Woodhouse wrote:
> On Fri, 2026-01-23 at 12:56 +0000, Khushit Shah wrote:
> >
> > @@ -4931,6 +4933,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > break;
> > case KVM_CAP_X2APIC_API:
> > r = KVM_X2APIC_API_VALID_FLAGS;
> > + if (kvm && !irqchip_split(kvm))
> > + r &= ~KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST;
> > break;
> > case KVM_CAP_NESTED_STATE:
> > r = kvm_x86_ops.nested_ops->get_state ?
> > @@ -6748,11 +6752,24 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > if (cap->args[0] & ~KVM_X2APIC_API_VALID_FLAGS)
> > break;
> >
> > + if ((cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) &&
> > + (cap->args[0] & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST))
> > + break;
> > +
> > + if ((cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) &&
> > + !irqchip_split(kvm))
> > + break;
> > +
> > if (cap->args[0] & KVM_X2APIC_API_USE_32BIT_IDS)
>
> Is it possible to set KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST and
> *then* create the in-kernel I/O APIC?
Nope, we should be good on that front, kvm->arch.irqchip_mode can't be changed
once it's set. I.e. the irqchip_split() check could get a false negative if it's
racing with KVM_CREATE_IRQCHIP, but it can't get a false positive and thus
incorrectly allow KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST.
* Re: [PATCH v6] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
2026-01-27 21:49 ` Sean Christopherson
@ 2026-01-27 22:36 ` David Woodhouse
2026-01-28 2:22 ` Huang, Kai
2026-01-28 14:44 ` Sean Christopherson
0 siblings, 2 replies; 17+ messages in thread
From: David Woodhouse @ 2026-01-27 22:36 UTC (permalink / raw)
To: Sean Christopherson
Cc: Khushit Shah, pbonzini, kai.huang, mingo, x86, bp, hpa,
linux-kernel, kvm, dave.hansen, tglx, jon, shaju.abraham, stable
On Tue, 2026-01-27 at 13:49 -0800, Sean Christopherson wrote:
>
> Nope, we should be good on that front, kvm->arch.irqchip_mode can't be changed
> once it's set. I.e. the irqchip_split() check could get a false negative if it's
> racing with KVM_CREATE_IRQCHIP, but it can't get a false positive and thus
> incorrectly allow KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST.
Ah, so userspace which checks all the kernel's capabilities *first*
will not see KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST advertised,
because it needs to enable KVM_CAP_SPLIT_IRQCHIP first?
I guess that's tolerable¹ but the documentation could make it clearer,
perhaps? I can see VMMs silently failing to detect the feature because
they just don't set split-irqchip before checking for it?
¹ although I still kind of hate it and would have preferred to have the
I/O APIC patch; userspace still has to intentionally *enable* that
combination. But OK, I've reluctantly conceded that.
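
The probing pitfall can be sketched as follows. This is a standalone model
of the KVM_CAP_X2APIC_API case in the kvm_vm_ioctl_check_extension() hunk of
the v6 patch, not real ioctl code: the bit positions come from the patch,
while the shortened macro names, the toy_ functions, and the enum are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X2APIC_USE_32BIT_IDS           (1ULL << 0)
#define X2APIC_DISABLE_BROADCAST_QUIRK (1ULL << 1)
#define X2APIC_ENABLE_SEOIB            (1ULL << 2)
#define X2APIC_DISABLE_SEOIB           (1ULL << 3)
#define X2APIC_VALID_FLAGS (X2APIC_USE_32BIT_IDS | \
			    X2APIC_DISABLE_BROADCAST_QUIRK | \
			    X2APIC_ENABLE_SEOIB | X2APIC_DISABLE_SEOIB)

enum toy_irqchip { TOY_IRQCHIP_NONE, TOY_IRQCHIP_KERNEL, TOY_IRQCHIP_SPLIT };

/*
 * Flags reported for the capability. has_vm is false when the query is
 * made without a VM, matching the "kvm &&" guard in the patch.
 */
uint64_t toy_check_x2apic_api(bool has_vm, enum toy_irqchip mode)
{
	uint64_t r = X2APIC_VALID_FLAGS;

	/* Without a VM there is no irqchip to inspect. */
	assert(has_vm || mode == TOY_IRQCHIP_NONE);

	if (has_vm && mode != TOY_IRQCHIP_SPLIT)
		r &= ~X2APIC_ENABLE_SEOIB;
	return r;
}

/* Does a VM-scoped probe report the ENABLE flag? */
bool toy_probe_sees_enable(bool split_configured_before_probe)
{
	enum toy_irqchip mode = split_configured_before_probe ?
				TOY_IRQCHIP_SPLIT : TOY_IRQCHIP_NONE;

	return (toy_check_x2apic_api(true, mode) & X2APIC_ENABLE_SEOIB) != 0;
}
```

Under these assumptions, a VMM that probes its VM before enabling
KVM_CAP_SPLIT_IRQCHIP gets a result without the ENABLE bit, even though the
same kernel would report it once the split IRQCHIP is configured.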
* Re: [PATCH v6] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
2026-01-27 22:36 ` David Woodhouse
@ 2026-01-28 2:22 ` Huang, Kai
2026-01-28 3:48 ` David Woodhouse
2026-01-28 14:44 ` Sean Christopherson
1 sibling, 1 reply; 17+ messages in thread
From: Huang, Kai @ 2026-01-28 2:22 UTC (permalink / raw)
To: seanjc@google.com, dwmw2@infradead.org
Cc: shaju.abraham@nutanix.com, khushit.shah@nutanix.com,
x86@kernel.org, bp@alien8.de, stable@vger.kernel.org,
hpa@zytor.com, linux-kernel@vger.kernel.org, mingo@redhat.com,
dave.hansen@linux.intel.com, pbonzini@redhat.com,
kvm@vger.kernel.org, Kohler, Jon, tglx@linutronix.de
On Tue, 2026-01-27 at 14:36 -0800, David Woodhouse wrote:
> On Tue, 2026-01-27 at 13:49 -0800, Sean Christopherson wrote:
> >
> > Nope, we should be good on that front, kvm->arch.irqchip_mode can't be changed
> > once it's set. I.e. the irqchip_split() check could get a false negative if it's
> > racing with KVM_CREATE_IRQCHIP, but it can't get a false positive and thus
> > incorrectly allow KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST.
>
> Ah, so userspace which checks all the kernel's capabilities *first*
> will not see KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST advertised,
> because it needs to enable KVM_CAP_SPLIT_IRQCHIP first?
>
> I guess that's tolerable¹ but the documentation could make it clearer,
> perhaps? I can see VMMs silently failing to detect the feature because
> they just don't set split-irqchip before checking for it?
>
>
> ¹ although I still kind of hate it and would have preferred to have the
> I/O APIC patch; userspace still has to intentionally *enable* that
> combination. But OK, I've reluctantly conceded that.
To make it even more robust, perhaps we can grab kvm->lock mutex in
kvm_vm_ioctl_enable_cap() for KVM_CAP_X2APIC_API, so that it won't race with
KVM_CREATE_IRQCHIP (which already grabs kvm->lock) and
KVM_CAP_SPLIT_IRQCHIP?
Even more, we can add an additional check in KVM_CREATE_IRQCHIP to return
-EINVAL when it sees kvm->arch.suppress_eoi_broadcast_mode is
KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST?
* Re: [PATCH v6] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
2026-01-28 2:22 ` Huang, Kai
@ 2026-01-28 3:48 ` David Woodhouse
[not found] ` <SA2PR02MB756478359EE9185285ACE6158891A@SA2PR02MB7564.namprd02.prod.outlook.com>
2026-01-28 6:15 ` Huang, Kai
0 siblings, 2 replies; 17+ messages in thread
From: David Woodhouse @ 2026-01-28 3:48 UTC (permalink / raw)
To: Huang, Kai, seanjc@google.com
Cc: shaju.abraham@nutanix.com, khushit.shah@nutanix.com,
x86@kernel.org, bp@alien8.de, stable@vger.kernel.org,
hpa@zytor.com, linux-kernel@vger.kernel.org, mingo@redhat.com,
dave.hansen@linux.intel.com, pbonzini@redhat.com,
kvm@vger.kernel.org, Kohler, Jon, tglx@linutronix.de
On Wed, 2026-01-28 at 02:22 +0000, Huang, Kai wrote:
>
> > Ah, so userspace which checks all the kernel's capabilities *first*
> > will not see KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST advertised,
> > because it needs to enable KVM_CAP_SPLIT_IRQCHIP first?
> >
> > I guess that's tolerable¹ but the documentation could make it clearer,
> > perhaps? I can see VMMs silently failing to detect the feature because
> > they just don't set split-irqchip before checking for it?
> >
> >
> > ¹ although I still kind of hate it and would have preferred to have the
> > I/O APIC patch; userspace still has to intentionally *enable* that
> > combination. But OK, I've reluctantly conceded that.
>
> To make it even more robust, perhaps we can grab kvm->lock mutex in
> kvm_vm_ioctl_enable_cap() for KVM_CAP_X2APIC_API, so that it won't race with
> KVM_CREATE_IRQCHIP (which already grabs kvm->lock) and
> KVM_CAP_SPLIT_IRQCHIP?
>
> Even more, we can add an additional check in KVM_CREATE_IRQCHIP to return
> -EINVAL when it sees kvm->arch.suppress_eoi_broadcast_mode is
> KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST?
If we do that, then the query for KVM_CAP_X2APIC_API could advertise
KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST for a freshly created VM,
even before userspace has used *either* KVM_CREATE_IRQCHIP or
KVM_CAP_SPLIT_IRQCHIP?
That would be slightly better than the existing proposed awfulness
where the kernel doesn't *admit* to having the _ENABLE_ capability
until userspace first enables the KVM_CAP_SPLIT_IRQCHIP.
* Re: [PATCH v6] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
[not found] ` <SA2PR02MB756478359EE9185285ACE6158891A@SA2PR02MB7564.namprd02.prod.outlook.com>
@ 2026-01-28 5:17 ` Khushit Shah
2026-01-28 5:32 ` David Woodhouse
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Khushit Shah @ 2026-01-28 5:17 UTC (permalink / raw)
To: David Woodhouse, Huang, Kai, seanjc@google.com
Cc: Shaju Abraham, x86@kernel.org, bp@alien8.de,
stable@vger.kernel.org, hpa@zytor.com,
linux-kernel@vger.kernel.org, mingo@redhat.com,
dave.hansen@linux.intel.com, pbonzini@redhat.com,
kvm@vger.kernel.org, Jon Kohler, tglx@linutronix.de
> On 28 Jan 2026, at 9:27 AM, Khushit Shah <khushit.shah@nutanix.com> wrote:
>
>
> On 28/01/26, 9:19 AM, "David Woodhouse" <dwmw2@infradead.org> wrote:
>
> On Wed, 2026-01-28 at 02:22 +0000, Huang, Kai wrote:
> >
> > > Ah, so userspace which checks all the kernel's capabilities *first*
> > > will not see KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST advertised,
> > > because it needs to enable KVM_CAP_SPLIT_IRQCHIP first?
> > >
> > > I guess that's tolerable¹ but the documentation could make it clearer,
> > > perhaps? I can see VMMs silently failing to detect the feature because
> > > they just don't set split-irqchip before checking for it?
> > >
> > >
> > > ¹ although I still kind of hate it and would have preferred to have the
> > > I/O APIC patch; userspace still has to intentionally *enable* that
> > > combination. But OK, I've reluctantly conceded that.
> >
> > To make it even more robust, perhaps we can grab kvm->lock mutex in
> > kvm_vm_ioctl_enable_cap() for KVM_CAP_X2APIC_API, so that it won't race with
> > KVM_CREATE_IRQCHIP (which already grabs kvm->lock) and
> > KVM_CAP_SPLIT_IRQCHIP?
> >
> > Even more, we can add an additional check in KVM_CREATE_IRQCHIP to return
> > -EINVAL when it sees kvm->arch.suppress_eoi_broadcast_mode is
> > KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST?
>
> If we do that, then the query for KVM_CAP_X2APIC_API could advertise
> KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST for a freshly created VM,
> even before userspace has used *either* KVM_CREATE_IRQCHIP or
> KVM_CAP_SPLIT_IRQCHIP?
>
> That would be slightly better than the existing proposed awfulness
> where the kernel doesn't *admit* to having the _ENABLE_ capability
> until userspace first enables the KVM_CAP_SPLIT_IRQCHIP.
How about we make an explicit _ENABLE_ bit for split IRQCHIP?
When/if the in-kernel IRQCHIP starts supporting I/O APIC version 0x20, we
can add a separate bit for that in the CAP.
This way:
- The flag name (KVM_X2APIC_SPLIT_ENABLE_SEOIB) is self-documenting.
- We always advertise it in KVM_CHECK_EXTENSION.
- Enabling requires split IRQCHIP to be configured first.
- Mutex protects against races with KVM_CAP_SPLIT_IRQCHIP.
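
The points above can be modeled roughly as follows. This is a standalone
sketch under stated assumptions, not the code in the diff: the toy_ names
and struct toy_vm are hypothetical, only the two new flag bits are modeled,
and the kvm->lock protection is only noted in a comment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SPLIT_ENABLE_SEOIB (1ULL << 2) /* models KVM_X2APIC_SPLIT_ENABLE_SEOIB */
#define TOY_DISABLE_SEOIB      (1ULL << 3) /* models KVM_X2APIC_DISABLE_SEOIB */

enum toy_seoib_mode { TOY_SEOIB_QUIRKED, TOY_SEOIB_ENABLED, TOY_SEOIB_DISABLED };

struct toy_vm {
	bool irqchip_split;
	enum toy_seoib_mode seoib_mode;
};

/* Both flags are always advertised, whatever the irqchip state. */
uint64_t toy_check_extension(void)
{
	return TOY_SPLIT_ENABLE_SEOIB | TOY_DISABLE_SEOIB;
}

/* In the kernel this would run with kvm->lock held (not modeled here). */
int toy_enable_x2apic_api(struct toy_vm *vm, uint64_t flags)
{
	assert(vm);

	if (flags & ~toy_check_extension())
		return -EINVAL;
	if ((flags & TOY_SPLIT_ENABLE_SEOIB) && (flags & TOY_DISABLE_SEOIB))
		return -EINVAL;
	/* Enabling requires the split IRQCHIP to be configured first. */
	if ((flags & TOY_SPLIT_ENABLE_SEOIB) && !vm->irqchip_split)
		return -EINVAL;

	if (flags & TOY_SPLIT_ENABLE_SEOIB)
		vm->seoib_mode = TOY_SEOIB_ENABLED;
	if (flags & TOY_DISABLE_SEOIB)
		vm->seoib_mode = TOY_SEOIB_DISABLED;
	return 0;
}
```

With this shape, a VMM can detect the flag before choosing an irqchip model
and gets an explicit -EINVAL, rather than a silently missing bit, if it
tries to enable the feature without a split IRQCHIP.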
Diff below (compile tested):
---
From: Khushit Shah <khushit.shah@nutanix.com>
Date: Fri, 23 Jan 2026 12:45:46 +0000
Subject: [PATCH] KVM: x86: Add x2APIC "features" to control EOI broadcast
suppression
Add two flags for KVM_CAP_X2APIC_API to allow userspace to control support
for Suppress EOI Broadcasts when using a split IRQCHIP (I/O APIC emulated
by userspace), which KVM completely mishandles. When x2APIC support was
first added, KVM incorrectly advertised and "enabled" Suppress EOI
Broadcast, without fully supporting the I/O APIC side of the equation,
i.e. without adding directed EOI to KVM's in-kernel I/O APIC.
That flaw was carried over to split IRQCHIP support, i.e. KVM advertised
support for Suppress EOI Broadcasts irrespective of whether or not the
userspace I/O APIC implementation supported directed EOIs. Even worse,
KVM didn't actually suppress EOI broadcasts, i.e. userspace VMMs without
support for directed EOI came to rely on the "spurious" broadcasts.
KVM "fixed" the in-kernel I/O APIC implementation by completely disabling
support for Suppress EOI Broadcasts in commit 0bcc3fb95b97 ("KVM: lapic:
stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use"), but
didn't do anything to remedy userspace I/O APIC implementations.
KVM's bogus handling of Suppress EOI Broadcast is problematic when the
guest relies on interrupts being masked in the I/O APIC until well after
the initial local APIC EOI. E.g. Windows with Credential Guard enabled
handles interrupts in the following order:
1. Interrupt for L2 arrives.
2. L1 APIC EOIs the interrupt.
3. L1 resumes L2 and injects the interrupt.
4. L2 EOIs after servicing.
5. L1 performs the I/O APIC EOI.
Because KVM EOIs the I/O APIC at step #2, the guest can get an interrupt
storm, e.g. if the IRQ line is still asserted and userspace reacts to the
EOI by re-injecting the IRQ, because the guest doesn't de-assert the line
until step #4, and doesn't expect the interrupt to be re-enabled until
step #5.
Unfortunately, simply "fixing" the bug isn't an option, as KVM has no way
of knowing if the userspace I/O APIC supports directed EOIs, i.e.
suppressing EOI broadcasts would result in interrupts being stuck masked
in the userspace I/O APIC due to step #5 being ignored by userspace. And
fully disabling support for Suppress EOI Broadcast is also undesirable, as
picking up the fix would require a guest reboot, *and* more importantly
would change the virtual CPU model exposed to the guest without any buy-in
from userspace.
Add KVM_X2APIC_SPLIT_ENABLE_SEOIB and KVM_X2APIC_DISABLE_SEOIB flags to
allow userspace to explicitly enable (for split IRQCHIP) or disable support
for Suppress EOI Broadcasts. This gives userspace control over the virtual
CPU model exposed to the guest, as KVM should never have enabled support
for Suppress EOI Broadcast without userspace opt-in. Not setting either
flag will result in legacy quirky behavior for backward compatibility.
Disallow fully enabling SUPPRESS_EOI_BROADCAST when using an in-kernel
I/O APIC, as KVM's history/support is just as tragic. E.g. it's not clear
that commit c806a6ad35bf ("KVM: x86: call irq notifiers with directed EOI")
was entirely correct, i.e. it may have simply papered over the lack of
Directed EOI emulation in the I/O APIC.
Note, Suppress EOI Broadcasts is defined only in Intel's SDM, not in AMD's
APM. But the bit is writable on some AMD CPUs, e.g. Turin, and KVM's ABI
is to support Directed EOI (KVM's name) irrespective of guest CPU vendor.
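The resulting behavior per mode can be summarized with a small standalone
model. This is only a sketch: the function and enum names are hypothetical,
and the logic is a simplified restatement of what the commit message
describes (advertise vs. actually suppress), not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; this models the described behavior, not kernel code. */
enum seoib_mode {
	SEOIB_QUIRKED,   /* neither flag set: legacy behavior */
	SEOIB_ENABLED,   /* KVM_X2APIC_SPLIT_ENABLE_SEOIB */
	SEOIB_DISABLED,  /* KVM_X2APIC_DISABLE_SEOIB */
};

/* Is Directed EOI advertised to the guest in the LAPIC version register? */
static bool seoib_advertised(enum seoib_mode mode, bool ioapic_in_kernel)
{
	switch (mode) {
	case SEOIB_ENABLED:
		return true;
	case SEOIB_DISABLED:
		return false;
	case SEOIB_QUIRKED:
		/* Legacy: advertised only when the I/O APIC lives in userspace. */
		return !ioapic_in_kernel;
	}
	assert(0 && "invalid mode");
	return false;
}

/* Does a LAPIC EOI skip the broadcast to the I/O APIC? */
static bool seoib_suppressed(enum seoib_mode mode, bool ioapic_in_kernel,
			     bool guest_set_spiv_bit)
{
	/* Nothing is suppressed unless the guest set the SPIV bit. */
	if (!guest_set_spiv_bit)
		return false;

	switch (mode) {
	case SEOIB_ENABLED:
		return true;
	case SEOIB_DISABLED:
		return false;
	case SEOIB_QUIRKED:
		/* Legacy: split IRQCHIP ignores the bit, in-kernel honors it. */
		return ioapic_in_kernel;
	}
	assert(0 && "invalid mode");
	return false;
}
```

The quirked row is the interesting one: with a split IRQCHIP the feature is
advertised but never honored, which is exactly the mismatch that leads to
the interrupt storm described earlier.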
Fixes: 7543a635aa09 ("KVM: x86: Add KVM exit for IOAPIC EOIs")
Closes: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com
Cc: stable@vger.kernel.org
Suggested-by: David Woodhouse <dwmw2@infradead.org>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
---
Documentation/virt/kvm/api.rst | 26 ++++++++++-
arch/x86/include/asm/kvm_host.h | 7 +++
arch/x86/include/uapi/asm/kvm.h | 6 ++-
arch/x86/kvm/ioapic.c | 2 +-
arch/x86/kvm/lapic.c | 76 +++++++++++++++++++++++++++++----
arch/x86/kvm/lapic.h | 2 +
arch/x86/kvm/x86.c | 24 ++++++++++-
7 files changed, 128 insertions(+), 15 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 01a3abef8abb..6244ac6d865f 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7835,8 +7835,10 @@ Will return -EBUSY if a VCPU has already been created.
Valid feature flags in args[0] are::
- #define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
- #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
+ #define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
+ #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
+ #define KVM_X2APIC_SPLIT_ENABLE_SEOIB (1ULL << 2)
+ #define KVM_X2APIC_DISABLE_SEOIB (1ULL << 3)
Enabling KVM_X2APIC_API_USE_32BIT_IDS changes the behavior of
KVM_SET_GSI_ROUTING, KVM_SIGNAL_MSI, KVM_SET_LAPIC, and KVM_GET_LAPIC,
@@ -7849,6 +7851,26 @@ as a broadcast even in x2APIC mode in order to support physical x2APIC
without interrupt remapping. This is undesirable in logical mode,
where 0xff represents CPUs 0-7 in cluster 0.
+Setting KVM_X2APIC_SPLIT_ENABLE_SEOIB instructs KVM to enable Suppress EOI
+Broadcasts when using a split IRQCHIP. KVM will advertise support for
+Suppress EOI Broadcast to the guest and suppress LAPIC EOI broadcasts when
+the guest sets the Suppress EOI Broadcast bit in the SPIV register. This
+flag requires split IRQCHIP to be configured first via
+KVM_CAP_SPLIT_IRQCHIP.
+
+Setting KVM_X2APIC_DISABLE_SEOIB disables support for Suppress EOI Broadcasts
+entirely, i.e. instructs KVM to NOT advertise support to the guest.
+
+Modern VMMs should explicitly set one of the two flags as appropriate:
+KVM_X2APIC_SPLIT_ENABLE_SEOIB for split IRQCHIP with directed EOI support,
+or KVM_X2APIC_DISABLE_SEOIB otherwise. If neither flag is set, legacy quirky
+behavior will be preserved: in split IRQCHIP mode, KVM will advertise support
+for Suppress EOI Broadcasts but not actually suppress EOI broadcasts; for
+in-kernel IRQCHIP mode, KVM will not advertise support.
+
+Setting both KVM_X2APIC_SPLIT_ENABLE_SEOIB and KVM_X2APIC_DISABLE_SEOIB will
+fail with an EINVAL error.
+
7.8 KVM_CAP_S390_USER_INSTR0
----------------------------
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5a3bfa293e8b..c27b3e5f60c2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1226,6 +1226,12 @@ enum kvm_irqchip_mode {
KVM_IRQCHIP_SPLIT, /* created with KVM_CAP_SPLIT_IRQCHIP */
};
+enum kvm_suppress_eoi_broadcast_mode {
+ KVM_SUPPRESS_EOI_BROADCAST_QUIRKED, /* Legacy behavior */
+ KVM_SUPPRESS_EOI_BROADCAST_ENABLED, /* Enable Suppress EOI broadcast */
+ KVM_SUPPRESS_EOI_BROADCAST_DISABLED /* Disable Suppress EOI broadcast */
+};
+
struct kvm_x86_msr_filter {
u8 count;
bool default_allow:1;
@@ -1475,6 +1481,7 @@ struct kvm_arch {
bool x2apic_format;
bool x2apic_broadcast_quirk_disabled;
+ enum kvm_suppress_eoi_broadcast_mode suppress_eoi_broadcast_mode;
bool has_mapped_host_mmio;
bool guest_can_read_msr_platform_info;
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 7ceff6583652..bef6a3e06d94 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -914,8 +914,10 @@ struct kvm_sev_snp_launch_finish {
__u64 pad1[4];
};
-#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
-#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
+#define KVM_X2APIC_API_USE_32BIT_IDS (_BITULL(0))
+#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (_BITULL(1))
+#define KVM_X2APIC_SPLIT_ENABLE_SEOIB (_BITULL(2))
+#define KVM_X2APIC_DISABLE_SEOIB (_BITULL(3))
struct kvm_hyperv_eventfd {
__u32 conn_id;
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 2c2783296aed..a26fa4222f29 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -561,7 +561,7 @@ static void kvm_ioapic_update_eoi_one(struct kvm_vcpu *vcpu,
spin_lock(&ioapic->lock);
if (trigger_mode != IOAPIC_LEVEL_TRIG ||
- kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
+ kvm_lapic_suppress_eoi_broadcast(apic))
return;
ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 1597dd0b0cc6..74c09ba4b280 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -105,6 +105,63 @@ bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
apic_test_vector(vector, apic->regs + APIC_IRR);
}
+static bool kvm_lapic_advertise_suppress_eoi_broadcast(struct kvm *kvm)
+{
+ switch (kvm->arch.suppress_eoi_broadcast_mode) {
+ case KVM_SUPPRESS_EOI_BROADCAST_ENABLED:
+ return true;
+ case KVM_SUPPRESS_EOI_BROADCAST_DISABLED:
+ return false;
+ case KVM_SUPPRESS_EOI_BROADCAST_QUIRKED:
+ /*
+ * The default in-kernel I/O APIC emulates the 82093AA and does not
+ * implement an EOI register. Some guests (e.g. Windows with the
+ * Hyper-V role enabled) disable LAPIC EOI broadcast without
+ * checking the I/O APIC version, which can cause level-triggered
+ * interrupts to never be EOI'd.
+ *
+ * To avoid this, KVM doesn't advertise Suppress EOI Broadcast
+ * support when using the default in-kernel I/O APIC.
+ *
+ * Historically, in split IRQCHIP mode, KVM always advertised
+ * Suppress EOI Broadcast support but did not actually suppress
+ * EOIs, resulting in quirky behavior.
+ */
+ return !ioapic_in_kernel(kvm);
+ default:
+ WARN_ON_ONCE(1);
+ return false;
+ }
+}
+
+bool kvm_lapic_suppress_eoi_broadcast(struct kvm_lapic *apic)
+{
+ struct kvm *kvm = apic->vcpu->kvm;
+
+ if (!(kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI))
+ return false;
+
+ switch (kvm->arch.suppress_eoi_broadcast_mode) {
+ case KVM_SUPPRESS_EOI_BROADCAST_ENABLED:
+ return true;
+ case KVM_SUPPRESS_EOI_BROADCAST_DISABLED:
+ return false;
+ case KVM_SUPPRESS_EOI_BROADCAST_QUIRKED:
+ /*
+ * Historically, in split IRQCHIP mode, KVM ignored the suppress
+ * EOI broadcast bit set by the guest and broadcast EOIs to the
+ * userspace I/O APIC. For the in-kernel I/O APIC, support is not
+ * advertised and can only be enabled via KVM_SET_APIC_STATE, and
+ * KVM's I/O APIC doesn't emulate Directed EOIs; but if the
+ * feature is enabled, it is respected (with odd behavior).
+ */
+ return ioapic_in_kernel(kvm);
+ default:
+ WARN_ON_ONCE(1);
+ return false;
+ }
+}
+
__read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu);
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_has_noapic_vcpu);
@@ -554,15 +611,9 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
v = APIC_VERSION | ((apic->nr_lvt_entries - 1) << 16);
- /*
- * KVM emulates 82093AA datasheet (with in-kernel IOAPIC implementation)
- * which doesn't have EOI register; Some buggy OSes (e.g. Windows with
- * Hyper-V role) disable EOI broadcast in lapic not checking for IOAPIC
- * version first and level-triggered interrupts never get EOIed in
- * IOAPIC.
- */
+
if (guest_cpu_cap_has(vcpu, X86_FEATURE_X2APIC) &&
- !ioapic_in_kernel(vcpu->kvm))
+ kvm_lapic_advertise_suppress_eoi_broadcast(vcpu->kvm))
v |= APIC_LVR_DIRECTED_EOI;
kvm_lapic_set_reg(apic, APIC_LVR, v);
}
@@ -1517,6 +1568,15 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
/* Request a KVM exit to inform the userspace IOAPIC. */
if (irqchip_split(apic->vcpu->kvm)) {
+ /*
+ * Don't exit to userspace if the guest has enabled Directed
+ * EOI, a.k.a. Suppress EOI Broadcasts, in which case the local
+ * APIC doesn't broadcast EOIs (the guest must EOI the target
+ * I/O APIC(s) directly).
+ */
+ if (kvm_lapic_suppress_eoi_broadcast(apic))
+ return;
+
apic->vcpu->arch.pending_ioapic_eoi = vector;
kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
return;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 282b9b7da98c..e5f5a222eced 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -231,6 +231,8 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
+bool kvm_lapic_suppress_eoi_broadcast(struct kvm_lapic *apic);
+
void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu);
void kvm_bitmap_or_dest_vcpus(struct kvm *kvm, struct kvm_lapic_irq *irq,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 63afdb6bb078..3c0af3fcbbb9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -121,8 +121,10 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
#define KVM_CAP_PMU_VALID_MASK KVM_PMU_CAP_DISABLE
-#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
- KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
+#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
+ KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK | \
+ KVM_X2APIC_SPLIT_ENABLE_SEOIB | \
+ KVM_X2APIC_DISABLE_SEOIB)
static void update_cr8_intercept(struct kvm_vcpu *vcpu);
static void process_nmi(struct kvm_vcpu *vcpu);
@@ -6748,12 +6750,30 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
if (cap->args[0] & ~KVM_X2APIC_API_VALID_FLAGS)
break;
+ if ((cap->args[0] & KVM_X2APIC_SPLIT_ENABLE_SEOIB) &&
+ (cap->args[0] & KVM_X2APIC_DISABLE_SEOIB))
+ break;
+
+ mutex_lock(&kvm->lock);
+
+ /* SPLIT_ENABLE_SEOIB requires split irqchip to be configured. */
+ if ((cap->args[0] & KVM_X2APIC_SPLIT_ENABLE_SEOIB) &&
+ !irqchip_split(kvm))
+ goto x2apic_api_unlock;
+
if (cap->args[0] & KVM_X2APIC_API_USE_32BIT_IDS)
kvm->arch.x2apic_format = true;
if (cap->args[0] & KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
kvm->arch.x2apic_broadcast_quirk_disabled = true;
+ if (cap->args[0] & KVM_X2APIC_SPLIT_ENABLE_SEOIB)
+ kvm->arch.suppress_eoi_broadcast_mode = KVM_SUPPRESS_EOI_BROADCAST_ENABLED;
+ if (cap->args[0] & KVM_X2APIC_DISABLE_SEOIB)
+ kvm->arch.suppress_eoi_broadcast_mode = KVM_SUPPRESS_EOI_BROADCAST_DISABLED;
+
r = 0;
+x2apic_api_unlock:
+ mutex_unlock(&kvm->lock);
break;
case KVM_CAP_X86_DISABLE_EXITS:
r = -EINVAL;
--
2.39.3
* Re: [PATCH v6] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
2026-01-28 5:17 ` Khushit Shah
@ 2026-01-28 5:32 ` David Woodhouse
2026-01-28 6:40 ` Huang, Kai
2026-01-28 15:04 ` Sean Christopherson
2 siblings, 0 replies; 17+ messages in thread
From: David Woodhouse @ 2026-01-28 5:32 UTC (permalink / raw)
To: Khushit Shah, Huang, Kai, seanjc@google.com
Cc: Shaju Abraham, x86@kernel.org, bp@alien8.de,
stable@vger.kernel.org, hpa@zytor.com,
linux-kernel@vger.kernel.org, mingo@redhat.com,
dave.hansen@linux.intel.com, pbonzini@redhat.com,
kvm@vger.kernel.org, Jon Kohler, tglx@linutronix.de
On Wed, 2026-01-28 at 05:17 +0000, Khushit Shah wrote:
>
> How about we make an explicit _ENABLE_ bit for split IRQCHIP?
> When/if in-kernel IRQCHIP starts supporting I/O APIC 0x20, we
> can add a separate bit for that in the CAP.
>
> This way:
> - The flag name (KVM_X2APIC_SPLIT_ENABLE_SEOIB) is self-documenting.
> - We always advertise it in KVM_CHECK_EXTENSION.
> - Enabling requires split IRQCHIP to be configured first.
> - Mutex protects against races with KVM_CAP_SPLIT_IRQCHIP.
Ick. The more we iterate on this, the more convinced I am that we
should just enable it for I/O APIC at the same time. Userspace has to
explicitly opt in to the combination of kernel I/O APIC and SEOIB
anyway.
So I'll just bow out of the conversation; do whatever you think best.
* Re: [PATCH v6] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
2026-01-28 3:48 ` David Woodhouse
[not found] ` <SA2PR02MB756478359EE9185285ACE6158891A@SA2PR02MB7564.namprd02.prod.outlook.com>
@ 2026-01-28 6:15 ` Huang, Kai
2026-01-28 14:57 ` Sean Christopherson
1 sibling, 1 reply; 17+ messages in thread
From: Huang, Kai @ 2026-01-28 6:15 UTC (permalink / raw)
To: seanjc@google.com, dwmw2@infradead.org
Cc: Kohler, Jon, khushit.shah@nutanix.com, x86@kernel.org,
bp@alien8.de, hpa@zytor.com, mingo@redhat.com,
linux-kernel@vger.kernel.org, dave.hansen@linux.intel.com,
pbonzini@redhat.com, stable@vger.kernel.org,
shaju.abraham@nutanix.com, kvm@vger.kernel.org,
tglx@linutronix.de
On Tue, 2026-01-27 at 19:48 -0800, David Woodhouse wrote:
> On Wed, 2026-01-28 at 02:22 +0000, Huang, Kai wrote:
> >
> > > Ah, so userspace which checks all the kernel's capabilities *first*
> > > will not see KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST advertised,
> > > because it needs to enable KVM_CAP_SPLIT_IRQCHIP first?
> > >
> > > I guess that's tolerable¹ but the documentation could make it clearer,
> > > perhaps? I can see VMMs silently failing to detect the feature because
> > > they just don't set split-irqchip before checking for it?
> > >
> > >
> > > ¹ although I still kind of hate it and would have preferred to have the
> > > I/O APIC patch; userspace still has to intentionally *enable* that
> > > combination. But OK, I've reluctantly conceded that.
> >
> > To make it even more robust, perhaps we can grab kvm->lock mutex in
> > kvm_vm_ioctl_enable_cap() for KVM_CAP_X2APIC_API, so that it won't race with
> > KVM_CREATE_IRQCHIP (which already grabs kvm->lock) and
> > KVM_CAP_SPLIT_IRQCHIP?
> >
> > Even more, we can add additional check in KVM_CREATE_IRQCHIP to return -
> > EINVAL when it sees kvm->arch.suppress_eoi_broadcast_mode is
> > KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST?
>
> If we do that, then the query for KVM_CAP_X2APIC_API could advertise
> the KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST for a freshly created KVM,
> even before userspace has enabled *either* KVM_CREATE_IRQCHIP nor
> KVM_CAP_SPLIT_IRQCHIP?
No IIUC it doesn't change that?
The change I mentioned above is only related to "enable" part, but not
"query" part.
The "query" is done via kvm_vm_ioctl_check_extension(KVM_CAP_X2APIC_API),
and in this patch, it does:
@@ -4931,6 +4933,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
break;
case KVM_CAP_X2APIC_API:
r = KVM_X2APIC_API_VALID_FLAGS;
+ if (kvm && !irqchip_split(kvm))
+ r &= ~KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST;
IIRC if this is called before KVM_CREATE_IRQCHIP and KVM_CAP_SPLIT_IRQCHIP,
then !irqchip_split() will be true, so it will NOT advertise
KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST.
If it is called after KVM_CAP_SPLIT_IRQCHIP, then it will advertise
KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST.
Btw, it doesn't grab kvm->lock either, so theoretically it could race with
KVM_CREATE_IRQCHIP and kvm_vm_ioctl_enable_cap(KVM_CAP_SPLIT_IRQCHIP) too.
>
> That would be slightly better than the existing proposed awfulness
> where the kernel doesn't *admit* to having the _ENABLE_ capability
> until userspace first enables the KVM_CAP_SPLIT_IRQCHIP.
We could also make kvm_vm_ioctl_check_extension(KVM_CAP_X2APIC_API)
_always_ advertise KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST if that's
better.
I suppose what we need is to document such behaviour -- that although
KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST is advertised as supported, it
cannot be enabled together with KVM_CREATE_IRQCHIP -- one will fail
depending on which is called first.
As a bonus, it can get rid of the "calling irqchip_split() w/o holding
kvm->lock" awfulness too.
* Re: [PATCH v6] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
2026-01-28 5:17 ` Khushit Shah
2026-01-28 5:32 ` David Woodhouse
@ 2026-01-28 6:40 ` Huang, Kai
2026-01-28 15:04 ` Sean Christopherson
2 siblings, 0 replies; 17+ messages in thread
From: Huang, Kai @ 2026-01-28 6:40 UTC (permalink / raw)
To: khushit.shah@nutanix.com, seanjc@google.com, dwmw2@infradead.org
Cc: Kohler, Jon, x86@kernel.org, bp@alien8.de, hpa@zytor.com,
mingo@redhat.com, linux-kernel@vger.kernel.org,
dave.hansen@linux.intel.com, pbonzini@redhat.com,
shaju.abraham@nutanix.com, stable@vger.kernel.org,
kvm@vger.kernel.org, tglx@linutronix.de
> > >
> > > > Ah, so userspace which checks all the kernel's capabilities *first*
> > > > will not see KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST advertised,
> > > > because it needs to enable KVM_CAP_SPLIT_IRQCHIP first?
> > > > >
> > > > > I guess that's tolerable¹ but the documentation could make it clearer,
> > > > > perhaps? I can see VMMs silently failing to detect the feature because
> > > > > they just don't set split-irqchip before checking for it?
> > > > >
> > > > > ¹ although I still kind of hate it and would have preferred to have the
> > > > > I/O APIC patch; userspace still has to intentionally *enable* that
> > > > > combination. But OK, I've reluctantly conceded that.
> > > >
> > > > To make it even more robust, perhaps we can grab kvm->lock mutex in
> > > > kvm_vm_ioctl_enable_cap() for KVM_CAP_X2APIC_API, so that it won't race with
> > > > KVM_CREATE_IRQCHIP (which already grabs kvm->lock) and
> > > > KVM_CAP_SPLIT_IRQCHIP?
> > > >
> > > > Even more, we can add additional check in KVM_CREATE_IRQCHIP to return
> > > > -EINVAL when it sees kvm->arch.suppress_eoi_broadcast_mode is
> > > > KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST?
> >
> > If we do that, then the query for KVM_CAP_X2APIC_API could advertise
> > the KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST for a freshly created KVM,
> > even before userspace has enabled *either* KVM_CREATE_IRQCHIP nor
> > KVM_CAP_SPLIT_IRQCHIP?
> >
> > That would be slightly better than the existing proposed awfulness
> > where the kernel doesn't *admit* to having the _ENABLE_ capability
> > until userspace first enables the KVM_CAP_SPLIT_IRQCHIP.
>
>
> How about we make an explicit _ENABLE_ bit for split IRQCHIP?
> When/if in-kernel IRQCHIP starts supporting I/O APIC 0x20, we
> can add a separate bit for that in the CAP.
>
> This way:
> - The flag name (KVM_X2APIC_SPLIT_ENABLE_SEOIB) is self-documenting.
> - We always advertise it in KVM_CHECK_EXTENSION.
> - Enabling requires split IRQCHIP to be configured first.
> - Mutex protects against races with KVM_CAP_SPLIT_IRQCHIP.
>
> Diff below (compile tested):
>
(Somehow I only saw this reply after I replied to David.)
Looks better to me, so Ack.
* Re: [PATCH v6] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
2026-01-27 22:36 ` David Woodhouse
2026-01-28 2:22 ` Huang, Kai
@ 2026-01-28 14:44 ` Sean Christopherson
1 sibling, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2026-01-28 14:44 UTC (permalink / raw)
To: David Woodhouse
Cc: Khushit Shah, pbonzini, kai.huang, mingo, x86, bp, hpa,
linux-kernel, kvm, dave.hansen, tglx, jon, shaju.abraham, stable
On Tue, Jan 27, 2026, David Woodhouse wrote:
> On Tue, 2026-01-27 at 13:49 -0800, Sean Christopherson wrote:
> >
> > Nope, we should be good on that front, kvm->arch.irqchip_mode can't be changed
> > once its set. I.e. the irqchip_split() check could get a false negative if it's
> > racing with KVM_CREATE_IRQCHIP, but it can't get a false positive and thus
> > incorrectly allow KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST.
>
> Ah, so userspace which checks all the kernel's capabilities *first*
> will not see KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST advertised,
> because it needs to enable KVM_CAP_SPLIT_IRQCHIP first?
Only if userspace creates a VM and uses that to check capabilities, in which case
KVM is 100% right to say that KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST isn't
supported. If userspace checks the system-scoped ioctl, i.e. with @kvm=NULL, it
will see KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST.
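To make the system-scoped versus VM-scoped distinction concrete, here is a
minimal userspace model of the check being discussed (the hunk quoted
elsewhere in this thread from the earlier revision, not the v6 patch). All
names, including the struct and the bit position chosen for the enable
flag, are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the KVM_X2APIC_* flags; bit positions are assumed. */
#define X2APIC_USE_32BIT_IDS		(1ULL << 0)
#define X2APIC_DISABLE_BROADCAST_QUIRK	(1ULL << 1)
#define X2APIC_ENABLE_SEOIB		(1ULL << 2)
#define X2APIC_DISABLE_SEOIB		(1ULL << 3)
#define X2APIC_VALID_FLAGS	(X2APIC_USE_32BIT_IDS | \
				 X2APIC_DISABLE_BROADCAST_QUIRK | \
				 X2APIC_ENABLE_SEOIB | \
				 X2APIC_DISABLE_SEOIB)

/* Hypothetical stand-in for the per-VM state; not the kernel's struct kvm. */
struct vm_model {
	bool irqchip_split;
};

/*
 * Models the capability query. A NULL vm means the system-scoped query
 * (on /dev/kvm), which reports maximal support; a VM-scoped query hides
 * the enable flag until the VM has a split IRQCHIP.
 */
static uint64_t check_x2apic_api(const struct vm_model *vm)
{
	uint64_t r = X2APIC_VALID_FLAGS;

	if (vm && !vm->irqchip_split)
		r &= ~X2APIC_ENABLE_SEOIB;

	/* Sanity check: never report bits outside the valid set. */
	assert((r & ~X2APIC_VALID_FLAGS) == 0);
	return r;
}
```

Under this model, a VMM that probes a freshly created VM before configuring
the IRQCHIP does not see the enable flag, while a probe at system scope does.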
> I guess that's tolerable¹ but the documentation could make it clearer,
> perhaps? I can see VMMs silently failing to detect the feature because
> they just don't set split-irqchip before checking for it?
Hmm, if we want to improve that particular documentation, then we should do so
in the description of KVM_CHECK_EXTENSION itself, which currently says:
Based on their initialization different VMs may have different capabilities.
It is thus encouraged to use the vm ioctl to query for capabilities (available
with KVM_CAP_CHECK_EXTENSION_VM on the vm fd)
Because there are multiple capabilities that are conditionally supported based on
the VM type/configuration, i.e. this behavior isn't novel.
> ¹ although I still kind of hate it and would have preferred to have the
> I/O APIC patch; userspace still has to intentionally *enable* that
> combination. But OK, I've reluctantly conceded that.
Eh, KVM really should be returning '0' for the VM-scoped check of KVM_CAP_X2APIC_API,
and disallowing the capability, if the VM doesn't have an in-kernel local APIC.
Because enabling any of the KVM_X2APIC_API_* options without a local APIC doesn't
actually do anything.
I say that because I'd be very tempted to "fix" that by restricting new flags to
VMs with irqchip_in_kernel(), at which point userspace needs to get the ordering
right anyways.
* Re: [PATCH v6] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
2026-01-28 6:15 ` Huang, Kai
@ 2026-01-28 14:57 ` Sean Christopherson
2026-01-28 21:10 ` Huang, Kai
0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2026-01-28 14:57 UTC (permalink / raw)
To: Kai Huang
Cc: dwmw2@infradead.org, Jon Kohler, khushit.shah@nutanix.com,
x86@kernel.org, bp@alien8.de, hpa@zytor.com, mingo@redhat.com,
linux-kernel@vger.kernel.org, dave.hansen@linux.intel.com,
pbonzini@redhat.com, stable@vger.kernel.org,
shaju.abraham@nutanix.com, kvm@vger.kernel.org,
tglx@linutronix.de
On Wed, Jan 28, 2026, Kai Huang wrote:
> On Tue, 2026-01-27 at 19:48 -0800, David Woodhouse wrote:
> > On Wed, 2026-01-28 at 02:22 +0000, Huang, Kai wrote:
> > >
> > > > Ah, so userspace which checks all the kernel's capabilities *first*
> > > > will not see KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST advertised,
> > > > because it needs to enable KVM_CAP_SPLIT_IRQCHIP first?
> > > >
> > > > I guess that's tolerable¹ but the documentation could make it clearer,
> > > > perhaps? I can see VMMs silently failing to detect the feature because
> > > > they just don't set split-irqchip before checking for it?
> > > >
> > > >
> > > > ¹ although I still kind of hate it and would have preferred to have the
> > > > I/O APIC patch; userspace still has to intentionally *enable* that
> > > > combination. But OK, I've reluctantly conceded that.
> > >
> > > To make it even more robust, perhaps we can grab kvm->lock mutex in
> > > kvm_vm_ioctl_enable_cap() for KVM_CAP_X2APIC_API, so that it won't race with
> > > KVM_CREATE_IRQCHIP (which already grabs kvm->lock) and
> > > KVM_CAP_SPLIT_IRQCHIP?
> > >
> > > Even more, we can add additional check in KVM_CREATE_IRQCHIP to return -
> > > EINVAL when it sees kvm->arch.suppress_eoi_broadcast_mode is
> > > KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST?
> >
> > If we do that, then the query for KVM_CAP_X2APIC_API could advertise
> > the KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST for a freshly created KVM,
> > even before userspace has enabled *either* KVM_CREATE_IRQCHIP nor
> > KVM_CAP_SPLIT_IRQCHIP?
>
> No IIUC it doesn't change that?
>
> The change I mentioned above is only related to "enable" part, but not
> "query" part.
>
> The "query" is done via kvm_vm_ioctl_check_extension(KVM_CAP_X2APIC_API),
> and in this patch, it does:
>
> @@ -4931,6 +4933,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long
> ext)
> break;
> case KVM_CAP_X2APIC_API:
> r = KVM_X2APIC_API_VALID_FLAGS;
> + if (kvm && !irqchip_split(kvm))
> + r &= ~KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST;
>
> IIRC if this is called before KVM_CREATE_IRQCHIP and KVM_CAP_SPLIT_IRQCHIP,
> then !irqchip_split() will be true, so it will NOT advertise
> KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST.
>
> If it is called after KVM_CAP_SPLIT_IRQCHIP, then it will advertise
> KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST.
Yep. And when called at system-scope, i.e. with @kvm=NULL, userspace will see
the maximal support with KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST.
> Btw, it doesn't grab kvm->lock either, so theoretically it could race with
> KVM_CREATE_IRQCHIP and kvm_vm_ioctl_enable_cap(KVM_CAP_SPLIT_IRQCHIP) too.
That's totally fine.
> > That would be slightly better than the existing proposed awfulness
> > where the kernel doesn't *admit* to having the _ENABLE_ capability
> > until userspace first enables the KVM_CAP_SPLIT_IRQCHIP.
>
> We could also make kvm_vm_ioctl_check_extension(KVM_CAP_X2APIC_API) to
> _always_ advertise KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST if that's
> better.
No, because then we'd need new uAPI if we add support for ENABLE_SUPPRESS_EOI_BROADCAST
with an in-kernel I/O APIC.
> I suppose what we need is to document such behaviour -- that albeit
> KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST is advertise as supposed, but it
> cannot be enabled together with KVM_CREATE_IRQCHIP -- one will fail
> depending on which is called first.
No, we don't need to explicitly document this, because it's super duper basic
multi-threaded programming. KVM only needs to document that
KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST requires a VM with KVM_CAP_SPLIT_IRQCHIP.
> As a bonus, it can get rid of "calling irqchip_split() w/o holding
> kvm->lock" awfulness too.
No, it's not awfulness. It's userspace's responsibility to not be stupid. KVM
taking kvm->lock changes *nothing*. All holding kvm->lock does is serialize KVM
code, it doesn't prevent a race. I.e. it just changes whether tasks are racing
to acquire kvm->lock versus racing against irqchip_mode.
If userspace invokes KVM_CAP_SPLIT_IRQCHIP and KVM_ENABLE_CAP concurrently on two
separate tasks, then KVM_ENABLE_CAP will fail ~50% of the time regardless of
whether or not KVM takes kvm->lock.
CPU0 CPU1
1. Locked Failure
----------------------------------------------------
lock(kvm->lock)
KVM_ENABLE_CAP = EINVAL
unlock(kvm->lock)
lock(kvm->lock)
KVM_CAP_SPLIT_IRQCHIP = 0
unlock(kvm->lock)
2. Locked Success
----------------------------------------------------
lock(kvm->lock)
KVM_CAP_SPLIT_IRQCHIP = 0
unlock(kvm->lock)
lock(kvm->lock)
KVM_ENABLE_CAP = 0
unlock(kvm->lock)
3. Lockless Failure
----------------------------------------------------
KVM_ENABLE_CAP = EINVAL
KVM_CAP_SPLIT_IRQCHIP = 0
4. Lockless Success
----------------------------------------------------
CPU0 CPU1
KVM_CAP_SPLIT_IRQCHIP = 0
KVM_ENABLE_CAP = 0
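The four cases above reduce to two serial orders, since a lock only
guarantees that the calls run one after the other, not which one runs
first. A deterministic sketch of that point follows; the struct and
function names are hypothetical and the model deliberately ignores
everything except the split-IRQCHIP dependency.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-VM state; not the kernel's struct kvm. */
struct vm_model {
	bool irqchip_split;
	bool seoib_enabled;
};

/* Models enabling KVM_CAP_SPLIT_IRQCHIP: always succeeds in this sketch. */
static int set_split_irqchip(struct vm_model *vm)
{
	assert(vm);
	vm->irqchip_split = true;
	return 0;
}

/* Models the SEOIB enable path: requires split IRQCHIP to be set already. */
static int enable_seoib(struct vm_model *vm)
{
	assert(vm);
	if (!vm->irqchip_split)
		return -EINVAL;
	vm->seoib_enabled = true;
	return 0;
}

/*
 * Runs the two calls back to back in the given order and returns the
 * result of the SEOIB enable. Whether or not a lock serializes the calls,
 * these are the only two outcomes userspace can observe.
 */
static int enable_result(bool split_first)
{
	struct vm_model vm = { false, false };
	int r;

	if (split_first) {
		set_split_irqchip(&vm);
		r = enable_seoib(&vm);
	} else {
		r = enable_seoib(&vm);
		set_split_irqchip(&vm);
	}
	return r;
}
```

The outcome depends only on the order in which userspace issues the calls,
so a VMM that needs the enable to succeed has to sequence them itself.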
* Re: [PATCH v6] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
2026-01-28 5:17 ` Khushit Shah
2026-01-28 5:32 ` David Woodhouse
2026-01-28 6:40 ` Huang, Kai
@ 2026-01-28 15:04 ` Sean Christopherson
2 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2026-01-28 15:04 UTC (permalink / raw)
To: Khushit Shah
Cc: David Woodhouse, Kai Huang, Shaju Abraham, x86@kernel.org,
bp@alien8.de, stable@vger.kernel.org, hpa@zytor.com,
linux-kernel@vger.kernel.org, mingo@redhat.com,
dave.hansen@linux.intel.com, pbonzini@redhat.com,
kvm@vger.kernel.org, Jon Kohler, tglx@linutronix.de
On Wed, Jan 28, 2026, Khushit Shah wrote:
>
> > On 28 Jan 2026, at 9:27 AM, Khushit Shah <khushit.shah@nutanix.com> wrote:
> >
> >
> > On 28/01/26, 9:19 AM, "David Woodhouse" <dwmw2@infradead.org> wrote:
> >
> > On Wed, 2026-01-28 at 02:22 +0000, Huang, Kai wrote:
> > >
> > > > Ah, so userspace which checks all the kernel's capabilities *first*
> > > > will not see KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST advertised,
> > > > because it needs to enable KVM_CAP_SPLIT_IRQCHIP first?
> > > > >
> > > > > I guess that's tolerable¹ but the documentation could make it clearer,
> > > > > perhaps? I can see VMMs silently failing to detect the feature because
> > > > > they just don't set split-irqchip before checking for it?
> > > > >
> > > > > ¹ although I still kind of hate it and would have preferred to have the
> > > > > I/O APIC patch; userspace still has to intentionally *enable* that
> > > > > combination. But OK, I've reluctantly conceded that.
> > > >
> > > > To make it even more robust, perhaps we can grab kvm->lock mutex in
> > > > kvm_vm_ioctl_enable_cap() for KVM_CAP_X2APIC_API, so that it won't race with
> > > > KVM_CREATE_IRQCHIP (which already grabs kvm->lock) and
> > > > KVM_CAP_SPLIT_IRQCHIP?
> > > >
> > > > Even more, we can add additional check in KVM_CREATE_IRQCHIP to return
> > > > -EINVAL when it sees kvm->arch.suppress_eoi_broadcast_mode is
> > > > KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST?
> >
> > If we do that, then the query for KVM_CAP_X2APIC_API could advertise
> > the KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST for a freshly created KVM,
> > even before userspace has enabled *either* KVM_CREATE_IRQCHIP nor
> > KVM_CAP_SPLIT_IRQCHIP?
> >
> > That would be slightly better than the existing proposed awfulness
> > where the kernel doesn't *admit* to having the _ENABLE_ capability
> > until userspace first enables the KVM_CAP_SPLIT_IRQCHIP.
No. If userspace wants to see if *KVM* supports the feature, then userspace can
do KVM_CHECK_EXTENSION on /dev/kvm. If userspace does KVM_CHECK_EXTENSION on a
VM fd, then KVM absolutely must report exactly what that VM supports.
> How about we make an explicit _ENABLE_ bit for split IRQCHIP?
> When/if in-kernel IRQCHIP starts supporting I/O APIC 0x20, we
> can add a separate bit for that in the CAP.
NAK. Conditionally enumerating support for a feature based on the configuration
of the VM has been KVM's documented behavior since KVM_CHECK_EXTENSION was added
by commit 92b591a4c46b ("KVM: Allow KVM_CHECK_EXTENSION on the vm fd").
I don't see any reason why KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST needs to do
something different.
* Re: [PATCH v6] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
2026-01-28 14:57 ` Sean Christopherson
@ 2026-01-28 21:10 ` Huang, Kai
0 siblings, 0 replies; 17+ messages in thread
From: Huang, Kai @ 2026-01-28 21:10 UTC (permalink / raw)
To: seanjc@google.com
Cc: dwmw2@infradead.org, khushit.shah@nutanix.com, bp@alien8.de,
x86@kernel.org, tglx@linutronix.de, hpa@zytor.com, Kohler, Jon,
linux-kernel@vger.kernel.org, dave.hansen@linux.intel.com,
mingo@redhat.com, pbonzini@redhat.com, stable@vger.kernel.org,
kvm@vger.kernel.org, shaju.abraham@nutanix.com
On Wed, 2026-01-28 at 06:57 -0800, Sean Christopherson wrote:
> On Wed, Jan 28, 2026, Kai Huang wrote:
> > On Tue, 2026-01-27 at 19:48 -0800, David Woodhouse wrote:
> > > On Wed, 2026-01-28 at 02:22 +0000, Huang, Kai wrote:
> > > >
> > > > > Ah, so userspace which checks all the kernel's capabilities *first*
> > > > > will not see KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST advertised,
> > > > > because it needs to enable KVM_CAP_SPLIT_IRQCHIP first?
> > > > >
> > > > > I guess that's tolerable¹ but the documentation could make it clearer,
> > > > > perhaps? I can see VMMs silently failing to detect the feature because
> > > > > they just don't set split-irqchip before checking for it?
> > > > >
> > > > >
> > > > > ¹ although I still kind of hate it and would have preferred to have the
> > > > > I/O APIC patch; userspace still has to intentionally *enable* that
> > > > > combination. But OK, I've reluctantly conceded that.
> > > >
> > > > To make it even more robust, perhaps we can grab kvm->lock mutex in
> > > > kvm_vm_ioctl_enable_cap() for KVM_CAP_X2APIC_API, so that it won't race with
> > > > KVM_CREATE_IRQCHIP (which already grabs kvm->lock) and
> > > > KVM_CAP_SPLIT_IRQCHIP?
> > > >
> > > > Even more, we can add an additional check in KVM_CREATE_IRQCHIP to return
> > > > -EINVAL when it sees kvm->arch.suppress_eoi_broadcast_mode is
> > > > KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST?
> > >
> > > If we do that, then the query for KVM_CAP_X2APIC_API could advertise
> > > the KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST for a freshly created KVM,
> > > even before userspace has enabled *either* KVM_CREATE_IRQCHIP or
> > > KVM_CAP_SPLIT_IRQCHIP?
> >
> > No IIUC it doesn't change that?
> >
> > The change I mentioned above is only related to "enable" part, but not
> > "query" part.
> >
> > The "query" is done via kvm_vm_ioctl_check_extension(KVM_CAP_X2APIC_API),
> > and in this patch, it does:
> >
> > @@ -4931,6 +4933,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >  		break;
> >  	case KVM_CAP_X2APIC_API:
> >  		r = KVM_X2APIC_API_VALID_FLAGS;
> > +		if (kvm && !irqchip_split(kvm))
> > +			r &= ~KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST;
> >
> > IIRC if this is called before KVM_CREATE_IRQCHIP and KVM_CAP_SPLIT_IRQCHIP,
> > then !irqchip_split() will be true, so it will NOT advertise
> > KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST.
> >
> > If it is called after KVM_CAP_SPLIT_IRQCHIP, then it will advertise
> > KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST.
>
> Yep. And when called at system-scope, i.e. with @kvm=NULL, userspace will see
> the maximal support with KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST.
Yep.
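For reference, the three query cases above can be modeled in a few lines of
standalone C. This is only a sketch of the logic in the quoted hunk, not kernel
code: the model_* names, the enum, and the bit values are all made up for
illustration and do not match the real uAPI definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values, for this sketch only (not the real uAPI values). */
#define MODEL_OTHER_X2APIC_FLAGS   0x3ULL
#define MODEL_ENABLE_SUPPRESS_EOI  (1ULL << 2)
#define MODEL_VALID_FLAGS \
	(MODEL_OTHER_X2APIC_FLAGS | MODEL_ENABLE_SUPPRESS_EOI)

/* Hypothetical stand-in for the VM's irqchip mode. */
enum model_irqchip_mode {
	MODEL_IRQCHIP_NONE,	/* fresh VM, no irqchip configured yet */
	MODEL_IRQCHIP_KERNEL,	/* in-kernel I/O APIC */
	MODEL_IRQCHIP_SPLIT,	/* I/O APIC emulated by userspace */
};

struct model_vm {
	enum model_irqchip_mode irqchip_mode;
};

/*
 * Mirrors the shape of the quoted check: a NULL vm (system scope) reports
 * every flag, a VM without a split irqchip has the ENABLE flag masked off.
 */
static uint64_t model_check_x2apic_api(const struct model_vm *vm)
{
	uint64_t r = MODEL_VALID_FLAGS;

	if (vm && vm->irqchip_mode != MODEL_IRQCHIP_SPLIT)
		r &= ~MODEL_ENABLE_SUPPRESS_EOI;

	return r;
}
```

So a VMM that wants a VM-scoped answer has to query after it has set up the
split irqchip; querying a fresh VM reports the flag as absent.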
>
> > Btw, it doesn't grab kvm->lock either, so theoretically it could race with
> > KVM_CREATE_IRQCHIP and kvm_vm_ioctl_enable_cap(KVM_CAP_SPLIT_IRQCHIP) too.
>
> That's totally fine.
>
> > > That would be slightly better than the existing proposed awfulness
> > > where the kernel doesn't *admit* to having the _ENABLE_ capability
> > > until userspace first enables the KVM_CAP_SPLIT_IRQCHIP.
> >
> > We could also make kvm_vm_ioctl_check_extension(KVM_CAP_X2APIC_API)
> > _always_ advertise KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST if that's
> > better.
>
> No, because then we'd need new uAPI if we add support for ENABLE_SUPPRESS_EOI_BROADCAST
> with an in-kernel I/O APIC.
That's my concern too (wasn't quite sure about that, though).
I thought we could document that the in-kernel I/O APIC doesn't work with
ENABLE_SUPPRESS_EOI_BROADCAST for now, but that we may support it in the future.
>
> > I suppose what we need is to document such behaviour -- that although
> > KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST is advertised as supported, it
> > cannot be enabled together with KVM_CREATE_IRQCHIP -- one will fail
> > depending on which is called first.
>
> No, we don't need to explicitly document this, because it's super duper basic
> multi-threaded programming. KVM only needs to document that
> KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST requires a VM with KVM_CAP_SPLIT_IRQCHIP.
>
> > As a bonus, it can get rid of "calling irqchip_split() w/o holding
> > kvm->lock" awfulness too.
>
> No, it's not awfulness. It's userspace's responsibility to not be stupid. KVM
> taking kvm->lock changes *nothing*.
>
Right, it doesn't change any result.
> All holding kvm->lock does is serialize KVM
> code, it doesn't prevent a race. I.e. it just changes whether tasks are racing
> to acquire kvm->lock versus racing against irqchip_mode.
>
> If userspace invokes KVM_CAP_SPLIT_IRQCHIP and KVM_ENABLE_CAP concurrently on two
> separate tasks, then KVM_ENABLE_CAP will fail ~50% of the time regardless of
> whether or not KVM takes kvm->lock.
>
Fair enough. Thanks for the clarification :-)
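
To convince myself, here is a small standalone model of the point about
kvm->lock: a lock only decides which of the two calls runs first, so replaying
both serialized orders shows every outcome the lock can produce. The model_*
names and the specific error codes are assumptions made for this sketch, not
taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the VM's irqchip mode. */
enum model_irqchip_mode {
	MODEL_IRQCHIP_NONE,
	MODEL_IRQCHIP_KERNEL,
	MODEL_IRQCHIP_SPLIT,
};

struct model_vm {
	enum model_irqchip_mode irqchip_mode;
	bool suppress_eoi_enabled;
};

/* Models enabling the split irqchip; the error code is an assumption. */
static int model_enable_split_irqchip(struct model_vm *vm)
{
	if (vm->irqchip_mode != MODEL_IRQCHIP_NONE)
		return -EEXIST;
	vm->irqchip_mode = MODEL_IRQCHIP_SPLIT;
	return 0;
}

/* Models enabling suppress-EOI-broadcast; requires a split irqchip. */
static int model_enable_suppress_eoi(struct model_vm *vm)
{
	if (vm->irqchip_mode != MODEL_IRQCHIP_SPLIT)
		return -EINVAL;
	vm->suppress_eoi_enabled = true;
	return 0;
}

/*
 * Replays one fully serialized ordering of the two calls, which is all a
 * lock could guarantee, and returns the result of the suppress-EOI enable.
 */
static int model_run(bool split_first)
{
	struct model_vm vm = { MODEL_IRQCHIP_NONE, false };
	int ret;

	if (split_first) {
		model_enable_split_irqchip(&vm);
		ret = model_enable_suppress_eoi(&vm);
	} else {
		ret = model_enable_suppress_eoi(&vm);
		model_enable_split_irqchip(&vm);
	}
	return ret;
}
```

In this model, model_run(true) succeeds and model_run(false) fails, with or
without a lock around each call; only userspace issuing the calls in order
makes the result deterministic.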
* Re: [PATCH v6] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
2026-01-23 12:56 [PATCH v6] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression Khushit Shah
2026-01-27 2:21 ` Khushit Shah
2026-01-27 21:09 ` David Woodhouse
@ 2026-02-04 0:10 ` Sean Christopherson
2 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2026-02-04 0:10 UTC (permalink / raw)
To: Sean Christopherson, pbonzini, kai.huang, dwmw2, Khushit Shah
Cc: mingo, x86, bp, hpa, linux-kernel, kvm, dave.hansen, tglx, jon,
shaju.abraham, stable
On Fri, 23 Jan 2026 12:56:25 +0000, Khushit Shah wrote:
> Add two flags for KVM_CAP_X2APIC_API to allow userspace to control support
> for Suppress EOI Broadcasts when using a split IRQCHIP (I/O APIC emulated
> by userspace), which KVM completely mishandles. When x2APIC support was
> first added, KVM incorrectly advertised and "enabled" Suppress EOI
> Broadcast, without fully supporting the I/O APIC side of the equation,
> i.e. without adding directed EOI to KVM's in-kernel I/O APIC.
>
> [...]
Applied to kvm-x86 misc, with some minor formatting tweaks. Thanks!
[1/1] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
https://github.com/kvm-x86/linux/commit/6517dfbcc918
--
https://github.com/kvm-x86/linux/tree/next
Thread overview: 17+ messages
2026-01-23 12:56 [PATCH v6] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression Khushit Shah
2026-01-27 2:21 ` Khushit Shah
2026-01-27 2:41 ` Khushit Shah
2026-01-27 21:09 ` David Woodhouse
2026-01-27 21:49 ` Sean Christopherson
2026-01-27 22:36 ` David Woodhouse
2026-01-28 2:22 ` Huang, Kai
2026-01-28 3:48 ` David Woodhouse
[not found] ` <SA2PR02MB756478359EE9185285ACE6158891A@SA2PR02MB7564.namprd02.prod.outlook.com>
2026-01-28 5:17 ` Khushit Shah
2026-01-28 5:32 ` David Woodhouse
2026-01-28 6:40 ` Huang, Kai
2026-01-28 15:04 ` Sean Christopherson
2026-01-28 6:15 ` Huang, Kai
2026-01-28 14:57 ` Sean Christopherson
2026-01-28 21:10 ` Huang, Kai
2026-01-28 14:44 ` Sean Christopherson
2026-02-04 0:10 ` Sean Christopherson