linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] KVM: x86: Latch INITs only in specific CPU states in KVM_SET_VCPU_EVENTS
@ 2025-08-27 15:27 Fei Li
  2025-08-27 16:01 ` Sean Christopherson
  0 siblings, 1 reply; 5+ messages in thread
From: Fei Li @ 2025-08-27 15:27 UTC (permalink / raw)
  To: seanjc, pbonzini, tglx, mingo, bp, dave.hansen, liran.alon, hpa,
	wanpeng.li
  Cc: kvm, x86, linux-kernel, Fei Li, stable

Commit ff90afa75573 ("KVM: x86: Evaluate latched_init in
KVM_SET_VCPU_EVENTS when vCPU not in SMM") changes KVM_SET_VCPU_EVENTS
handler to set pending LAPIC INIT event regardless of if vCPU is in
SMM mode or not.

However, latch INIT without checking CPU state exists race condition,
which causes the loss of INIT event. This is fatal during the VM
startup process because it will cause some AP to never switch to
non-root mode. Just as commit f4ef19108608 ("KVM: X86: Fix loss of
pending INIT due to race") said:
      BSP                          AP
                     kvm_vcpu_ioctl_x86_get_vcpu_events
                       events->smi.latched_init = 0

                     kvm_vcpu_block
                       kvm_vcpu_check_block
                         schedule

send INIT to AP
                     kvm_vcpu_ioctl_x86_set_vcpu_events
                     (e.g. `info registers -a` when VM starts/reboots)
                       if (events->smi.latched_init == 0)
                         clear INIT in pending_events

                     kvm_apic_accept_events
                       test_bit(KVM_APIC_INIT, &pe) == false
                         vcpu->arch.mp_state maintains UNINITIALIZED

send SIPI to AP
                     kvm_apic_accept_events
                       test_bit(KVM_APIC_SIPI, &pe) == false
                         vcpu->arch.mp_state will never change to RUNNABLE
                         (defy: UNINITIALIZED => INIT_RECEIVED => RUNNABLE)
                           AP will never switch to non-root operation

In such race result, VM hangs. E.g., BSP loops in SeaBIOS's SMPLock and
AP will never be reset, and qemu hmp "info registers -a" shows:
CPU#0
EAX=00000002 EBX=00000002 ECX=00000000 EDX=00020000
ESI=00000000 EDI=00000000 EBP=00000008 ESP=00006c6c
EIP=000ef570 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
......
CPU#1
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00080660
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
......

Fix this by handling latched INITs only in specific CPU states (SMM,
VMX non-root mode, SVM with GIF=0) in KVM_SET_VCPU_EVENTS.

Cc: stable@vger.kernel.org
Fixes: ff90afa75573 ("KVM: x86: Evaluate latched_init in KVM_SET_VCPU_EVENTS when vCPU not in SMM")
Signed-off-by: Fei Li <lifei.shirley@bytedance.com>
---
 arch/x86/kvm/x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a1c49bc681c46..7001b2af00ed1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5556,7 +5556,7 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
 			return -EINVAL;
 #endif
 
-		if (lapic_in_kernel(vcpu)) {
+		if (!kvm_apic_init_sipi_allowed(vcpu) && lapic_in_kernel(vcpu)) {
 			if (events->smi.latched_init)
 				set_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events);
 			else
-- 
2.39.2 (Apple Git-143)


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] KVM: x86: Latch INITs only in specific CPU states in KVM_SET_VCPU_EVENTS
  2025-08-27 15:27 [PATCH] KVM: x86: Latch INITs only in specific CPU states in KVM_SET_VCPU_EVENTS Fei Li
@ 2025-08-27 16:01 ` Sean Christopherson
  2025-08-27 16:08   ` Paolo Bonzini
  0 siblings, 1 reply; 5+ messages in thread
From: Sean Christopherson @ 2025-08-27 16:01 UTC (permalink / raw)
  To: Fei Li
  Cc: pbonzini, tglx, mingo, bp, dave.hansen, liran.alon, hpa,
	wanpeng.li, kvm, x86, linux-kernel, stable

On Wed, Aug 27, 2025, Fei Li wrote:
> Commit ff90afa75573 ("KVM: x86: Evaluate latched_init in
> KVM_SET_VCPU_EVENTS when vCPU not in SMM") changes KVM_SET_VCPU_EVENTS
> handler to set pending LAPIC INIT event regardless of if vCPU is in
> SMM mode or not.
> 
> However, latch INIT without checking CPU state exists race condition,
> which causes the loss of INIT event. This is fatal during the VM
> startup process because it will cause some AP to never switch to
> non-root mode. Just as commit f4ef19108608 ("KVM: X86: Fix loss of
> pending INIT due to race") said:
>       BSP                          AP
>                      kvm_vcpu_ioctl_x86_get_vcpu_events
>                        events->smi.latched_init = 0
> 
>                      kvm_vcpu_block
>                        kvm_vcpu_check_block
>                          schedule
> 
> send INIT to AP
>                      kvm_vcpu_ioctl_x86_set_vcpu_events
>                      (e.g. `info registers -a` when VM starts/reboots)
>                        if (events->smi.latched_init == 0)
>                          clear INIT in pending_events

This is a QEMU bug, no?  IIUC, it's invoking kvm_vcpu_ioctl_x86_set_vcpu_events()
with stale data.  I'm also a bit confused as to how QEMU is even gaining control
of the vCPU to emit KVM_SET_VCPU_EVENTS if the vCPU is in kvm_vcpu_block().

>                      kvm_apic_accept_events
>                        test_bit(KVM_APIC_INIT, &pe) == false
>                          vcpu->arch.mp_state maintains UNINITIALIZED
> 
> send SIPI to AP
>                      kvm_apic_accept_events
>                        test_bit(KVM_APIC_SIPI, &pe) == false
>                          vcpu->arch.mp_state will never change to RUNNABLE
>                          (defy: UNINITIALIZED => INIT_RECEIVED => RUNNABLE)
>                            AP will never switch to non-root operation
> 
> In such race result, VM hangs. E.g., BSP loops in SeaBIOS's SMPLock and
> AP will never be reset, and qemu hmp "info registers -a" shows:
> CPU#0
> EAX=00000002 EBX=00000002 ECX=00000000 EDX=00020000
> ESI=00000000 EDI=00000000 EBP=00000008 ESP=00006c6c
> EIP=000ef570 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ......
> CPU#1
> EAX=00000000 EBX=00000000 ECX=00000000 EDX=00080660
> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
> EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0000 00000000 0000ffff 00009300
> CS =f000 ffff0000 0000ffff 00009b00
> ......
> 
> Fix this by handling latched INITs only in specific CPU states (SMM,
> VMX non-root mode, SVM with GIF=0) in KVM_SET_VCPU_EVENTS.
> 
> Cc: stable@vger.kernel.org
> Fixes: ff90afa75573 ("KVM: x86: Evaluate latched_init in KVM_SET_VCPU_EVENTS when vCPU not in SMM")
> Signed-off-by: Fei Li <lifei.shirley@bytedance.com>
> ---
>  arch/x86/kvm/x86.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a1c49bc681c46..7001b2af00ed1 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5556,7 +5556,7 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
>  			return -EINVAL;
>  #endif
>  
> -		if (lapic_in_kernel(vcpu)) {
> +		if (!kvm_apic_init_sipi_allowed(vcpu) && lapic_in_kernel(vcpu)) {
>  			if (events->smi.latched_init)
>  				set_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events);
>  			else
> -- 
> 2.39.2 (Apple Git-143)
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] KVM: x86: Latch INITs only in specific CPU states in KVM_SET_VCPU_EVENTS
  2025-08-27 16:01 ` Sean Christopherson
@ 2025-08-27 16:08   ` Paolo Bonzini
  2025-08-28 15:13     ` [External] " Fei Li
  0 siblings, 1 reply; 5+ messages in thread
From: Paolo Bonzini @ 2025-08-27 16:08 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Fei Li, tglx, mingo, bp, dave.hansen, liran.alon, hpa, wanpeng.li,
	kvm, x86, linux-kernel, stable

On Wed, Aug 27, 2025 at 6:01 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Aug 27, 2025, Fei Li wrote:
> > Commit ff90afa75573 ("KVM: x86: Evaluate latched_init in
> > KVM_SET_VCPU_EVENTS when vCPU not in SMM") changes KVM_SET_VCPU_EVENTS
> > handler to set pending LAPIC INIT event regardless of if vCPU is in
> > SMM mode or not.
> >
> > However, latch INIT without checking CPU state exists race condition,
> > which causes the loss of INIT event. This is fatal during the VM
> > startup process because it will cause some AP to never switch to
> > non-root mode. Just as commit f4ef19108608 ("KVM: X86: Fix loss of
> > pending INIT due to race") said:
> >       BSP                          AP
> >                      kvm_vcpu_ioctl_x86_get_vcpu_events
> >                        events->smi.latched_init = 0
> >
> >                      kvm_vcpu_block
> >                        kvm_vcpu_check_block
> >                          schedule
> >
> > send INIT to AP
> >                      kvm_vcpu_ioctl_x86_set_vcpu_events
> >                      (e.g. `info registers -a` when VM starts/reboots)
> >                        if (events->smi.latched_init == 0)
> >                          clear INIT in pending_events
>
> This is a QEMU bug, no?

I think I agree.

> IIUC, it's invoking kvm_vcpu_ioctl_x86_set_vcpu_events()
> with stale data.

More precisely, it's not expecting other vCPUs to change the pending
events asynchronously.

> I'm also a bit confused as to how QEMU is even gaining control
> of the vCPU to emit KVM_SET_VCPU_EVENTS if the vCPU is in
> kvm_vcpu_block().

With a signal. :)

Paolo


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [External] Re: [PATCH] KVM: x86: Latch INITs only in specific CPU states in KVM_SET_VCPU_EVENTS
  2025-08-27 16:08   ` Paolo Bonzini
@ 2025-08-28 15:13     ` Fei Li
  2025-08-28 16:44       ` Paolo Bonzini
  0 siblings, 1 reply; 5+ messages in thread
From: Fei Li @ 2025-08-28 15:13 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: tglx, mingo, bp, dave.hansen, liran.alon, hpa, wanpeng.li, kvm,
	x86, linux-kernel, stable


On 8/28/25 12:08 AM, Paolo Bonzini wrote:
> On Wed, Aug 27, 2025 at 6:01 PM Sean Christopherson <seanjc@google.com> wrote:
>> On Wed, Aug 27, 2025, Fei Li wrote:
>>> Commit ff90afa75573 ("KVM: x86: Evaluate latched_init in
>>> KVM_SET_VCPU_EVENTS when vCPU not in SMM") changes KVM_SET_VCPU_EVENTS
>>> handler to set pending LAPIC INIT event regardless of if vCPU is in
>>> SMM mode or not.
>>>
>>> However, latch INIT without checking CPU state exists race condition,
>>> which causes the loss of INIT event. This is fatal during the VM
>>> startup process because it will cause some AP to never switch to
>>> non-root mode. Just as commit f4ef19108608 ("KVM: X86: Fix loss of
>>> pending INIT due to race") said:
>>>        BSP                          AP
>>>                       kvm_vcpu_ioctl_x86_get_vcpu_events
>>>                         events->smi.latched_init = 0
>>>
>>>                       kvm_vcpu_block
>>>                         kvm_vcpu_check_block
>>>                           schedule
>>>
>>> send INIT to AP
>>>                       kvm_vcpu_ioctl_x86_set_vcpu_events
>>>                       (e.g. `info registers -a` when VM starts/reboots)
>>>                         if (events->smi.latched_init == 0)
>>>                           clear INIT in pending_events
>> This is a QEMU bug, no?
> I think I agree.
Actually this is a bug triggered by one monitor tool in our production 
environment. This monitor executes 'info registers -a' hmp at a fixed 
frequency, even during VM startup process, which makes some AP stay in 
KVM_MP_STATE_UNINITIALIZED forever. But thisrace only occurs with 
extremely low probability, about 1~2 VM hangs per week.

Considering other emulators, like cloud-hypervisor and firecracker maybe 
also have similar potential race issues, I think KVM had better do some 
handling. But anyway, I will check Qemu code to avoid such race. Thanks 
for both of your comments. 🙂

Have a nice day, thanks
Fei
>
>> IIUC, it's invoking kvm_vcpu_ioctl_x86_set_vcpu_events()
>> with stale data.
> More precisely, it's not expecting other vCPUs to change the pending
> events asynchronously.
Yes, will sort out a more complete calling process later.
>
>> I'm also a bit confused as to how QEMU is even gaining control
>> of the vCPU to emit KVM_SET_VCPU_EVENTS if the vCPU is in
>> kvm_vcpu_block().
> With a signal. :)
>
> Paolo
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [External] Re: [PATCH] KVM: x86: Latch INITs only in specific CPU states in KVM_SET_VCPU_EVENTS
  2025-08-28 15:13     ` [External] " Fei Li
@ 2025-08-28 16:44       ` Paolo Bonzini
  0 siblings, 0 replies; 5+ messages in thread
From: Paolo Bonzini @ 2025-08-28 16:44 UTC (permalink / raw)
  To: Fei Li
  Cc: Sean Christopherson, tglx, mingo, bp, dave.hansen, liran.alon,
	hpa, wanpeng.li, kvm, x86, linux-kernel, stable

On Thu, Aug 28, 2025 at 5:13 PM Fei Li <lifei.shirley@bytedance.com> wrote:
> Actually this is a bug triggered by one monitor tool in our production
> environment. This monitor executes 'info registers -a' hmp at a fixed
> frequency, even during VM startup process, which makes some AP stay in
> KVM_MP_STATE_UNINITIALIZED forever. But this race only occurs with
> extremely low probability, about 1~2 VM hangs per week.
>
> Considering other emulators, like cloud-hypervisor and firecracker maybe
> also have similar potential race issues, I think KVM had better do some
> handling. But anyway, I will check Qemu code to avoid such race. Thanks
> for both of your comments. 🙂

If you can check whether other emulators invoke KVM_SET_VCPU_EVENTS in
similar cases, that of course would help understanding the situation
better.

In QEMU, it is possible to delay KVM_GET_VCPU_EVENTS until after all
vCPUs have halted.

Paolo


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-08-28 16:44 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-27 15:27 [PATCH] KVM: x86: Latch INITs only in specific CPU states in KVM_SET_VCPU_EVENTS Fei Li
2025-08-27 16:01 ` Sean Christopherson
2025-08-27 16:08   ` Paolo Bonzini
2025-08-28 15:13     ` [External] " Fei Li
2025-08-28 16:44       ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).