* [PATCH 6.18.y] KVM: VMX: Update SVI during runtime APICv activation
@ 2026-06-12 21:10 Jon Kohler
2026-06-12 23:58 ` Sean Christopherson
2026-06-13 14:51 ` Sasha Levin
0 siblings, 2 replies; 3+ messages in thread
From: Jon Kohler @ 2026-06-12 21:10 UTC
To: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, kvm,
linux-kernel
Cc: jonmkohler, Dongli Zhang, Chao Gao, stable, Gulshan Gabel,
Jon Kohler
From: Dongli Zhang <dongli.zhang@oracle.com>
commit b2849bec936be642b5420801f902337f2507648e upstream.
The APICv (apic->apicv_active) can be activated or deactivated at runtime,
for instance, because of APICv inhibit reasons. Intel VMX employs different
mechanisms to virtualize LAPIC based on whether APICv is active.
When APICv is activated at runtime, GUEST_INTR_STATUS is used to configure
and report the current pending IRR and ISR states. Unless a specific vector
is explicitly included in EOI_EXIT_BITMAP, its EOI will not be trapped to
KVM. Intel VMX automatically clears the corresponding ISR bit based on the
GUEST_INTR_STATUS.SVI field.
When APICv is deactivated at runtime, the VM_ENTRY_INTR_INFO_FIELD is used
to specify the next interrupt vector to invoke upon VM-entry. The
VMX IDT_VECTORING_INFO_FIELD is used to report un-invoked vectors on
VM-exit. EOIs are always trapped to KVM, so the software can manually clear
pending ISR bits.
There are scenarios where, with APICv activated at runtime, a guest-issued
EOI may not be able to clear the pending ISR bit.
Taking vector 236 as an example, here is one scenario.
1. Suppose APICv is inactive. Vector 236 is pending in the IRR.
2. To handle KVM_REQ_EVENT, KVM moves vector 236 from the IRR to the ISR,
and configures the VM_ENTRY_INTR_INFO_FIELD via vmx_inject_irq().
3. After VM-entry, vector 236 is invoked through the guest IDT. At this
point, the data in VM_ENTRY_INTR_INFO_FIELD is no longer valid. The guest
interrupt handler for vector 236 is invoked.
4. Suppose a VM-exit occurs very early in the guest interrupt handler,
before the EOI is issued.
5. Nothing is reported through the IDT_VECTORING_INFO_FIELD because
vector 236 has already been invoked in the guest.
6. Now, suppose APICv is activated. Before the next VM-entry, KVM calls
kvm_vcpu_update_apicv() to activate APICv.
7. Unfortunately, GUEST_INTR_STATUS.SVI is not configured, although
vector 236 is still pending in the ISR.
8. After VM-entry, the guest finally issues the EOI for vector 236.
However, because SVI is not configured, vector 236 is not cleared.
9. ISR is stalled forever on vector 236.
Here is another scenario.
1. Suppose APICv is inactive. Vector 236 is pending in the IRR.
2. To handle KVM_REQ_EVENT, KVM moves vector 236 from the IRR to the ISR,
and configures the VM_ENTRY_INTR_INFO_FIELD via vmx_inject_irq().
3. VM-exit occurs immediately after the next VM-entry. Vector 236 is
not invoked through the guest IDT. Instead, it is saved to the
IDT_VECTORING_INFO_FIELD during the VM-exit.
4. KVM calls kvm_queue_interrupt() to re-queue the un-invoked vector 236
into vcpu->arch.interrupt. A KVM_REQ_EVENT is requested.
5. Now, suppose APICv is activated. Before the next VM-entry, KVM calls
kvm_vcpu_update_apicv() to activate APICv.
6. Although APICv is now active, KVM still uses the legacy
VM_ENTRY_INTR_INFO_FIELD to re-inject vector 236. GUEST_INTR_STATUS.SVI is
not configured.
7. After the next VM-entry, vector 236 is invoked through the guest IDT.
Finally, an EOI occurs. However, due to the lack of GUEST_INTR_STATUS.SVI
configuration, vector 236 is not cleared from the ISR.
8. ISR is stalled forever on vector 236.
Using QEMU as an example, vector 236 is stuck in ISR forever.
(qemu) info lapic 1
dumping local APIC state for CPU 1
LVT0 0x00010700 active-hi edge masked ExtINT (vec 0)
LVT1 0x00010400 active-hi edge masked NMI
LVTPC 0x00000400 active-hi edge NMI
LVTERR 0x000000fe active-hi edge Fixed (vec 254)
LVTTHMR 0x00010000 active-hi edge masked Fixed (vec 0)
LVTT 0x000400ec active-hi edge tsc-deadline Fixed (vec 236)
Timer DCR=0x0 (divide by 2) initial_count = 0 current_count = 0
SPIV 0x000001ff APIC enabled, focus=off, spurious vec 255
ICR 0x000000fd physical edge de-assert no-shorthand
ICR2 0x00000000 cpu 0 (X2APIC ID)
ESR 0x00000000
ISR 236
IRR 37(level) 236
The issue isn't applicable to AMD SVM as KVM simply writes vmcb01 directly
irrespective of whether L1 (vmcb01) or L2 (vmcb02) is active (unlike VMX,
there is no need/cost to switch between VMCBs). In addition,
APICV_INHIBIT_REASON_IRQWIN ensures AMD SVM AVIC is not activated until
the last interrupt is EOI'd.
Fix the bug by configuring Intel VMX GUEST_INTR_STATUS.SVI if APICv is
activated at runtime.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://patch.msgid.link/20251110063212.34902-1-dongli.zhang@oracle.com
[sean: call out that SVM writes vmcb01 directly, tweak comment]
Link: https://patch.msgid.link/20251205231913.441872-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
(cherry picked from commit b2849bec936be642b5420801f902337f2507648e)
Cc: stable@vger.kernel.org # 6.6.x and above
Cc: Gulshan Gabel <gulshan.gabel@nutanix.com>
Signed-off-by: Jon Kohler <jon@nutanix.com>
---
This issue is pervasive and has been observed in production with QEMU
as the VMM. One scenario where this occurs is with Windows guests that
use the AutoEOI feature, which inhibits APICv
(APICV_INHIBIT_REASON_HYPERV).
The observed sequence is:
1. While the guest is servicing vector 209, the VM is live migrated:
before the guest issues the EOI, the VM is paused and migrated, and
the LAPIC state (including ISR/IRR) is saved on the source. Up to
this point, APICv has been inhibited by AutoEOI.
2. Upon arrival at the destination, the LAPIC state is restored via
kvm_apic_set_state(). At this point, MSRs have not been loaded yet,
so the inhibit is not in place; apicv_active is true, and
vmx_hwapic_isr_update() writes SVI=209 into GUEST_INTR_STATUS.
3. When MSRs are subsequently loaded, the Hyper-V AutoEOI state is
restored, causing KVM to set APICV_INHIBIT_REASON_HYPERV. On the
first KVM_RUN, __kvm_vcpu_update_apicv() transitions apicv_active to
false and disables VID, leaving SVI=209 stale in GUEST_INTR_STATUS.
4. When the VM is rebooted from inside the guest, kvm_lapic_reset()
zeroes ISR/IRR in the virtual APIC page but does not update
GUEST_INTR_STATUS because apicv_active is false, so SVI=209 persists.
5. During the bootloader sequence, the guest clears the Hyper-V AutoEOI
inhibit, and apicv_active transitions back to true. The stale
SVI=209 is now live, causing hardware to block delivery of every
virtual interrupt whose priority class is equal to or lower than
that of vector 209.
6. In the observed case, the UEFI timer interrupt (vector 32) is
blocked in IRR. The guest later reprograms this vector. When APICv
is subsequently inhibited again and the software interrupt path
takes over, the stale IRR entry is injected into the wrong handler,
and the guest panics.
With this fix, when APICv is reactivated in step 5, the SVI is
recalculated from the current virtual ISR, which is the expected
behavior.
In addition, the vmx_apicv_updates_test selftest fails with the
following signature without this patch, and passes with it applied.
./vmx_apicv_updates_test
Random seed: 0x6b8b4567
==== Test Assertion Failure ====
x86/vmx_apicv_updates_test.c:88: x2apic_read_reg(APIC_ISR + APIC_VECTOR_TO_REG_OFFSET(GOOD_IPI_VECTOR)) == 0
pid=154616 tid=154616 errno=4 - Interrupted system call
1 0x0000000000411e41: assert_on_unhandled_exception at processor.c:778
2 0x000000000040593d: _vcpu_run at kvm_util.c:1664
3 0x000000000040595a: vcpu_run at kvm_util.c:1675
4 0x0000000000400d1a: main at vmx_apicv_updates_test.c:131
5 0x000000000042e9f8: __libc_start_main at ??:?
6 0x0000000000400f3d: _start at ??:?
0x1 != 0x0 (x2apic_read_reg(APIC_ISR + APIC_VECTOR_TO_REG_OFFSET(GOOD_IPI_VECTOR)) != 0)
arch/x86/kvm/vmx/vmx.c | 9 ---------
arch/x86/kvm/x86.c | 7 +++++++
2 files changed, 7 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c084f48e2b0b..b7798ced7b50 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6886,15 +6886,6 @@ void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
* VM-Exit, otherwise L1 with run with a stale SVI.
*/
if (is_guest_mode(vcpu)) {
- /*
- * KVM is supposed to forward intercepted L2 EOIs to L1 if VID
- * is enabled in vmcs12; as above, the EOIs affect L2's vAPIC.
- * Note, userspace can stuff state while L2 is active; assert
- * that VID is disabled if and only if the vCPU is in KVM_RUN
- * to avoid false positives if userspace is setting APIC state.
- */
- WARN_ON_ONCE(vcpu->wants_to_run &&
- nested_cpu_has_vid(get_vmcs12(vcpu)));
to_vmx(vcpu)->nested.update_vmcs01_hwapic_isr = true;
return;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ad2b7158b9c8..a21ebe04aa23 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10950,9 +10950,16 @@ void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
* pending. At the same time, KVM_REQ_EVENT may not be set as APICv was
* still active when the interrupt got accepted. Make sure
* kvm_check_and_inject_events() is called to check for that.
+ *
+ * Update SVI when APICv gets enabled, otherwise SVI won't reflect the
+ * highest bit in vISR and the next accelerated EOI in the guest won't
+ * be virtualized correctly (the CPU uses SVI to determine which vISR
+ * vector to clear).
*/
if (!apic->apicv_active)
kvm_make_request(KVM_REQ_EVENT, vcpu);
+ else
+ kvm_apic_update_hwapic_isr(vcpu);
out:
preempt_enable();
--
2.43.0
* Re: [PATCH 6.18.y] KVM: VMX: Update SVI during runtime APICv activation
From: Sean Christopherson @ 2026-06-12 23:58 UTC
To: Jon Kohler
Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, kvm, linux-kernel, jonmkohler,
Dongli Zhang, Chao Gao, stable, Gulshan Gabel
On Fri, Jun 12, 2026, Jon Kohler wrote:
> From: Dongli Zhang <dongli.zhang@oracle.com>
>
> commit b2849bec936be642b5420801f902337f2507648e upstream.
...
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
> Link: https://patch.msgid.link/20251110063212.34902-1-dongli.zhang@oracle.com
> [sean: call out that SVM writes vmcb01 directly, tweak comment]
> Link: https://patch.msgid.link/20251205231913.441872-2-seanjc@google.com
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> (cherry picked from commit b2849bec936be642b5420801f902337f2507648e)
> Cc: stable@vger.kernel.org # 6.6.x and above
> Cc: Gulshan Gabel <gulshan.gabel@nutanix.com>
> Signed-off-by: Jon Kohler <jon@nutanix.com>
> ---
>
> This issue is pervasive and has been observed in production with QEMU
> as the VMM. One scenario where this occurs is with Windows guests that
> use the AutoEOI feature, which inhibits APICv
> (APICV_INHIBIT_REASON_HYPERV).
Gah, sorry, my bad. I don't know why I didn't tag this for stable@.
Acked-by: Sean Christopherson <seanjc@google.com>
* Re: [PATCH 6.18.y] KVM: VMX: Update SVI during runtime APICv activation
From: Sasha Levin @ 2026-06-13 14:51 UTC
To: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, kvm,
linux-kernel
Cc: Sasha Levin, jonmkohler, Dongli Zhang, Chao Gao, stable,
Gulshan Gabel, Jon Kohler
On Fri, Jun 12, 2026 at 02:10:01PM -0700, Jon Kohler wrote:
> From: Dongli Zhang <dongli.zhang@oracle.com>
>
> commit b2849bec936be642b5420801f902337f2507648e upstream.
Queued for 6.18.y, thanks. (And thanks Sean for the ack.)
--
Thanks,
Sasha