public inbox for kvm@vger.kernel.org
* [PATCH] KVM: VMX: Fall back to IRR scan when PIR is empty despite PID.ON being set
@ 2026-04-28  7:03 Chenyi Qiang
  2026-04-28  7:45 ` Paolo Bonzini
  2026-04-28 11:10 ` Chao Gao
  0 siblings, 2 replies; 7+ messages in thread
From: Chenyi Qiang @ 2026-04-28  7:03 UTC (permalink / raw)
  To: kvm
  Cc: Chenyi Qiang, Sean Christopherson, Jim Mattson, Paolo Bonzini,
	Gao Chao, Farrah Chen

Fall back to kvm_lapic_find_highest_irr() in vmx_sync_pir_to_irr() when
PID.ON is set but PIR turns out to be empty, to correctly report the
highest pending interrupt from the existing IRR.

In a nested VM stress test, the following WARNING fires in
vmx_check_nested_events() when kvm_cpu_has_interrupt() reports a pending
interrupt but the subsequent kvm_apic_has_interrupt() (which invokes
vmx_sync_pir_to_irr() again) returns -1:

  WARNING: CPU: 99 PID: 57767 at arch/x86/kvm/vmx/nested.c:4449 vmx_check_nested_events+0x6bf/0x6e0 [kvm_intel]
  Call Trace:
   kvm_check_and_inject_events
   vcpu_enter_guest.constprop.0
   vcpu_run
   kvm_arch_vcpu_ioctl_run
   kvm_vcpu_ioctl
   __x64_sys_ioctl
   do_syscall_64
   entry_SYSCALL_64_after_hwframe

The root cause is a race between vmx_sync_pir_to_irr() on the target vCPU
and __vmx_deliver_posted_interrupt() on a sender vCPU.  The sender
performs two individually-atomic operations that are not a single
transaction:

  1. pi_test_and_set_pir(vector)  -- sets the PIR bit
  2. pi_test_and_set_on()         -- sets PID.ON

The following interleaving triggers the bug:

  Sender vCPU (IPI):              Target vCPU (1st sync_pir_to_irr):
  B1: set PIR[vector]
                                  A1: pi_clear_on()
                                  A2: pi_harvest_pir() -> sees B1 bit
                                  A3: xchg() -> consumes bit, PIR=0
                                      (1st sync returns correct max_irr)
  B2: set PID.ON = 1

                                  Target vCPU (2nd sync_pir_to_irr):
                                  C1: pi_test_on() -> TRUE (from B2)
                                  C2: pi_clear_on() -> ON=0
                                  C3: pi_harvest_pir() -> PIR empty
                                  C4: *max_irr = -1, early return
                                      IRR NOT SCANNED

The interrupt is not lost (it resides in the IRR from the first sync and
is recovered on the next vcpu_enter_guest() iteration), but the incorrect
max_irr causes a spurious WARNING and a wasted L2 VM-Enter/VM-Exit cycle.
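
The interleaving above can be replayed deterministically in a
single-threaded toy model.  This is only a sketch: the sim_* names are
hypothetical, PIR and IRR are reduced to one 64-bit word each, and the
model assumes PID.ON was already set when the first sync runs (the
description above does not say what set it).  The fallback flag stands
in for the IRR rescan proposed by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Single-threaded toy state: one 64-bit word each for PIR and IRR. */
struct sim_vcpu {
	uint64_t pir;
	uint64_t irr;
	bool on;
};

/* Highest set bit in the toy IRR, or -1 if it is empty. */
static int sim_find_highest_irr(const struct sim_vcpu *v)
{
	for (int bit = 63; bit >= 0; bit--)
		if (v->irr & (UINT64_C(1) << bit))
			return bit;
	return -1;
}

/*
 * Toy sync: if ON is set, clear it and move PIR into IRR.  An empty PIR
 * yields -1 without looking at IRR, unless the fallback is enabled.
 */
static int sim_sync_pir_to_irr(struct sim_vcpu *v, bool fallback)
{
	int max_irr;

	if (!v->on)
		return sim_find_highest_irr(v);

	v->on = false;
	if (v->pir) {
		v->irr |= v->pir;
		v->pir = 0;
		max_irr = sim_find_highest_irr(v);
	} else {
		max_irr = -1;
	}
	if (max_irr == -1 && fallback)
		max_irr = sim_find_highest_irr(v);
	return max_irr;
}

/* Replays B1, A1-A3, B2, C1-C4; returns the second sync's result. */
static int sim_replay(int vector, bool fallback)
{
	struct sim_vcpu v = { .on = true };	/* assumption: ON already set */

	assert(vector >= 0 && vector < 64);
	v.pir |= UINT64_C(1) << vector;			/* B1 */
	int first = sim_sync_pir_to_irr(&v, fallback);	/* A1-A3 */
	assert(first == vector);
	v.on = true;					/* B2 */
	return sim_sync_pir_to_irr(&v, fallback);	/* C1-C4 */
}
```

In this model, sim_replay(0x31, false) returns -1 even though vector
0x31 is still pending in the toy IRR, while sim_replay(0x31, true)
returns 0x31.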

Fixes: b41f8638b9d3 ("KVM: VMX: Isolate pure loads from atomic XCHG when processing PIR")
Reported-by: Farrah Chen <farrah.chen@intel.com>
Assisted-by: GitHub Copilot:Claude Opus 4.6
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>

---
A nested VM stress test triggers the WARNING call trace shown above. An
AI-provided analysis identified the race condition and suggested the
fix, which looks reasonable to me. With the patch applied, the WARNING
cannot be reproduced in overnight stress testing.
---
 arch/x86/kvm/vmx/vmx.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 8b24e682535b..e2da29371e00 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7153,6 +7153,16 @@ int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
 		smp_mb__after_atomic();
 		got_posted_interrupt =
 			kvm_apic_update_irr(vcpu, vt->pi_desc.pir, &max_irr);
+		/*
+		 * If PID.ON was set but PIR is empty, another CPU may have
+		 * set PID.ON via __vmx_deliver_posted_interrupt() after a
+		 * previous sync already consumed the PIR bits.  In this
+		 * case, kvm_apic_update_irr() will not have scanned the
+		 * existing IRR, so fall back to scanning the IRR directly
+		 * to correctly report the highest pending interrupt.
+		 */
+		if (max_irr == -1)
+			max_irr = kvm_lapic_find_highest_irr(vcpu);
 	} else {
 		max_irr = kvm_lapic_find_highest_irr(vcpu);
 		got_posted_interrupt = false;
-- 
2.43.5



Thread overview: 7+ messages
2026-04-28  7:03 [PATCH] KVM: VMX: Fall back to IRR scan when PIR is empty despite PID.ON being set Chenyi Qiang
2026-04-28  7:45 ` Paolo Bonzini
2026-04-28  8:27   ` Chenyi Qiang
2026-04-28 15:50     ` Sean Christopherson
2026-04-29  1:08       ` Chenyi Qiang
2026-04-29 12:58         ` Sean Christopherson
2026-04-28 11:10 ` Chao Gao
