From: Paolo Bonzini <pbonzini@redhat.com>
To: Chenyi Qiang <chenyi.qiang@intel.com>, kvm@vger.kernel.org
Cc: Sean Christopherson <seanjc@google.com>,
	Jim Mattson <jmattson@google.com>, Gao Chao <chao.gao@intel.com>,
	Farrah Chen <farrah.chen@intel.com>
Subject: Re: [PATCH] KVM: VMX: Fall back to IRR scan when PIR is empty despite PID.ON being set
Date: Tue, 28 Apr 2026 09:45:42 +0200
Message-ID: <3235eb76-9b28-4000-920a-491659927e67@redhat.com>
In-Reply-To: <20260428070349.1633238-1-chenyi.qiang@intel.com>

On 4/28/26 09:03, Chenyi Qiang wrote:
> Fall back to kvm_lapic_find_highest_irr() in vmx_sync_pir_to_irr() when
> PID.ON is set but PIR turns out to be empty, to correctly report the
> highest pending interrupt from the existing IRR.
> 
> In a nested VM stress test, the following WARNING fires in
> vmx_check_nested_events() when kvm_cpu_has_interrupt() reports a pending
> interrupt but the subsequent kvm_apic_has_interrupt() (which invokes
> vmx_sync_pir_to_irr() again) returns -1:
> 
>    WARNING: CPU: 99 PID: 57767 at arch/x86/kvm/vmx/nested.c:4449 vmx_check_nested_events+0x6bf/0x6e0 [kvm_intel]
>    Call Trace:
>     kvm_check_and_inject_events
>     vcpu_enter_guest.constprop.0
>     vcpu_run
>     kvm_arch_vcpu_ioctl_run
>     kvm_vcpu_ioctl
>     __x64_sys_ioctl
>     do_syscall_64
>     entry_SYSCALL_64_after_hwframe
> 
> The root cause is a race between vmx_sync_pir_to_irr() on the target vCPU
> and __vmx_deliver_posted_interrupt() on a sender vCPU.  The sender
> performs two individually-atomic operations that are not a single
> transaction:
> 
>    1. pi_test_and_set_pir(vector)  -- sets the PIR bit
>    2. pi_test_and_set_on()         -- sets PID.ON
> 
> The following interleaving triggers the bug:
> 
>    Sender vCPU (IPI):              Target vCPU (1st sync_pir_to_irr):
>    B1: set PIR[vector]
>                                    A1: pi_clear_on()
>                                    A2: pi_harvest_pir() -> sees B1 bit
>                                    A3: xchg() -> consumes bit, PIR=0
>                                        (1st sync returns correct max_irr)
>    B2: set PID.ON = 1
> 
>                                    Target vCPU (2nd sync_pir_to_irr):
>                                    C1: pi_test_on() -> TRUE (from B2)
>                                    C2: pi_clear_on() -> ON=0
>                                    C3: pi_harvest_pir() -> PIR empty
>                                    C4: *max_irr = -1, early return
>                                        IRR NOT SCANNED
> 
> The interrupt is not lost (it resides in the IRR from the first sync and
> is recovered on the next vcpu_enter_guest() iteration), but the incorrect
> max_irr causes a spurious WARNING and a wasted L2 VM-Enter/VM-Exit cycle.
> 
> Fixes: b41f8638b9d3 ("KVM: VMX: Isolate pure loads from atomic XCHG when processing PIR")
> Reported-by: Farrah Chen <farrah.chen@intel.com>
> Assisted-by: GitHub Copilot:Claude Opus 4.6
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> 
> ---
> There is a WARNING call trace during a nested VM stress test. AI
> provided an analysis of a race condition and the related fix, which
> looks reasonable to me. With the patch applied, the WARNING cannot
> be reproduced in overnight stress testing.
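
For reference, the interleaving quoted above can be replayed
deterministically in a small userspace model.  This is only a sketch
under simplifying assumptions: struct toy_vcpu, toy_sync_pir_to_irr()
and replay_race() are made-up names, PIR and IRR are collapsed into a
single 32-bit word each, bit numbers stand in for vectors, and there
are no atomics because the "sender" and "target" steps are simply
executed in the racy order on one thread.  It is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy state: one word of PIR, one word of IRR, and PID.ON. */
struct toy_vcpu {
	uint32_t pir;
	uint32_t irr;
	bool on;
};

/* Index of the highest set bit, or -1 if no bit is set. */
static int toy_highest_bit(uint32_t v)
{
	int bit;

	for (bit = 31; bit >= 0; bit--)
		if (v & (1u << bit))
			return bit;
	return -1;
}

/*
 * Toy stand-in for the sync path.  With fallback == false it mimics the
 * behaviour described in the report: ON set + empty PIR yields -1 without
 * looking at the IRR.  With fallback == true the IRR is scanned instead.
 */
static int toy_sync_pir_to_irr(struct toy_vcpu *v, bool fallback)
{
	uint32_t harvested;

	if (!v->on)
		return toy_highest_bit(v->irr);

	v->on = false;			/* models pi_clear_on() */
	harvested = v->pir;		/* models harvesting PIR ... */
	v->pir = 0;			/* ... and consuming its bits */

	if (!harvested)
		return fallback ? toy_highest_bit(v->irr) : -1;

	v->irr |= harvested;
	return toy_highest_bit(v->irr);
}

/* Replays B1, A1-A3, B2, C1-C4 and returns what the second sync reports. */
int replay_race(bool fallback)
{
	/* Assumption: an earlier, complete post of bit 3 left ON set. */
	struct toy_vcpu v = { .pir = 1u << 3, .irr = 0, .on = true };
	int first;

	v.pir |= 1u << 5;				/* B1: sender sets PIR bit */
	first = toy_sync_pir_to_irr(&v, fallback);	/* A1-A3: first sync */
	assert(first == 5);				/* first sync is correct */
	v.on = true;					/* B2: sender sets ON late */

	return toy_sync_pir_to_irr(&v, fallback);	/* C1-C4: second sync */
}
```

In this model replay_race(false) reports -1 even though bit 5 is
still pending in the IRR, while replay_race(true) reports 5.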

The analysis of the race is correct and changing the logic is the
right thing to do, but I would change __kvm_apic_update_irr()
directly, either like this:

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e3ec4d8607c1..5ee14d6bc288 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -669,12 +669,14 @@ bool __kvm_apic_update_irr(unsigned long *pir, void *regs, int *max_irr)
  	u32 irr_val, prev_irr_val;
  	int max_updated_irr;
  
+	if (!pi_harvest_pir(pir, pir_vals)) {
+		*max_irr = apic_find_highest_vector(regs + APIC_IRR);
+		return false;
+	}
+
  	max_updated_irr = -1;
  	*max_irr = -1;
  
-	if (!pi_harvest_pir(pir, pir_vals))
-		return false;
-
  	for (i = vec = 0; i <= 7; i++, vec += 32) {
  		u32 *p_irr = (u32 *)(regs + APIC_IRR + i * 0x10);
  

Or even ignore the return value of pi_harvest_pir() altogether and
always go through the loop below, for simplicity.
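
A rough userspace sketch of that second option, to show why an empty
PIR then needs no special case: the loop visits every IRR word anyway,
so the highest pending vector is found whether or not anything was
harvested.  The names (toy_update_irr() and friends) and the plain
non-atomic array accesses are assumptions made for illustration; the
real function works on the APIC register page and uses atomic
operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_WORDS 8	/* 8 x 32 bits = 256 vectors */

/* Index of the highest set bit; the caller guarantees v != 0. */
static int toy_fls(uint32_t v)
{
	int bit = 31;

	while (!(v & (1u << bit)))
		bit--;
	return bit;
}

/*
 * Merge PIR into IRR and report the highest pending vector in *max_irr.
 * No early return on an empty PIR: every IRR word is always inspected.
 * Returns true only if a newly posted vector is also the highest one.
 */
bool toy_update_irr(uint32_t pir[TOY_WORDS], uint32_t irr[TOY_WORDS],
		    int *max_irr)
{
	int max_updated_irr = -1;
	int i, vec;

	*max_irr = -1;

	for (i = 0, vec = 0; i < TOY_WORDS; i++, vec += 32) {
		uint32_t posted = pir[i];
		uint32_t old = irr[i];

		pir[i] = 0;		/* consume the posted bits */
		irr[i] = old | posted;

		if (irr[i] != old)
			max_updated_irr = toy_fls(irr[i] ^ old) + vec;
		if (irr[i])
			*max_irr = toy_fls(irr[i]) + vec;
	}

	return max_updated_irr != -1 && max_updated_irr == *max_irr;
}

/* Empty PIR, vector 37 already pending in IRR: what does the scan report? */
int toy_max_irr_with_empty_pir(void)
{
	uint32_t pir[TOY_WORDS] = { 0 };
	uint32_t irr[TOY_WORDS] = { 0 };
	int max_irr;
	bool updated;

	irr[1] = 1u << 5;	/* word 1, bit 5 -> vector 37 */
	updated = toy_update_irr(pir, irr, &max_irr);
	assert(!updated);	/* nothing new was posted */

	return max_irr;
}
```

Here toy_max_irr_with_empty_pir() yields 37 rather than -1.  The cost
of this variant is walking all eight words even when PIR is empty.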

Paolo

> ---
>   arch/x86/kvm/vmx/vmx.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 8b24e682535b..e2da29371e00 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7153,6 +7153,16 @@ int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
>   		smp_mb__after_atomic();
>   		got_posted_interrupt =
>   			kvm_apic_update_irr(vcpu, vt->pi_desc.pir, &max_irr);
> +		/*
> +		 * If PID.ON was set but PIR is empty, another CPU may have
> +		 * set PID.ON via __vmx_deliver_posted_interrupt() after a
> +		 * previous sync already consumed the PIR bits.  In this
> +		 * case, kvm_apic_update_irr() will not have scanned the
> +		 * existing IRR, so fall back to scanning the IRR directly
> +		 * to correctly report the highest pending interrupt.
> +		 */
> +		if (max_irr == -1)
> +			max_irr = kvm_lapic_find_highest_irr(vcpu);
>   	} else {
>   		max_irr = kvm_lapic_find_highest_irr(vcpu);
>   		got_posted_interrupt = false;

