qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Fei Li <lifei.shirley@bytedance.com>
To: pbonzini@redhat.com, mtosatti@redhat.com, seanjc@google.com,
	kvm@vger.kernel.org
Cc: qemu-devel@nongnu.org
Subject: Re: [PATCH] KVM: x86: Restrict writeback of SMI VCPU state
Date: Tue, 9 Sep 2025 14:42:51 +0800	[thread overview]
Message-ID: <f6706599-0b8e-44bb-9f53-4e6a4db24b6d@bytedance.com> (raw)
In-Reply-To: <20250909063327.14263-1-lifei.shirley@bytedance.com>



On 9/9/25 2:33 PM, Fei Li wrote:
> Recently, we meet a SMI race bug triggered by one monitor tool in our
> production environment. This monitor executes 'info registers -a' hmp
> at a fixed frequency, even during VM startup process, which makes
> some AP stay in KVM_MP_STATE_UNINITIALIZED forever, thus VM hangs.
> 
> The complete calling processes for the SMI race are as follows:
> 
> //thread1                      //thread2               //thread3
> `info registers -a` hmp [1]    AP(vcpu1) thread [2]    BSP(vcpu0) send INIT/SIPI [3]
> 
>                                 [2]
>                                 KVM: KVM_RUN and then
>                                      schedule() in kvm_vcpu_block() loop
> 
> [1]
> for each cpu: cpu_synchronize_state
> if !qemu_thread_is_self()
> 1. insert to cpu->work_list, and handle asynchronously
> 2. then kick the AP(vcpu1) by sending SIG_IPI/SIGUSR1 signal
> 
>                                 [2]
>                                 KVM: checks signal_pending, breaks loop and returns -EINTR
>                                 Qemu: break kvm_cpu_exec loop, run
>                                       1. qemu_wait_io_event()
>                                       => process_queued_cpu_work => cpu->work_list.func()
>                                          e.i. do_kvm_cpu_synchronize_state() callback
>                                          => kvm_arch_get_registers
>                                             => kvm_get_mp_state /* KVM: get_mpstate also calls
>                                                kvm_apic_accept_events() to handle INIT and SIPI */
>                                       => cpu->vcpu_dirty = true;
>                                       // end of qemu_wait_io_event
> 
>                                                         [3]
>                                                         SeaBIOS: BSP enters non-root mode and runs reset_vector() in SeaBIOS.
>                                                                  send INIT and then SIPI by writing APIC_ICR during smp_scan
>                                                         KVM: BSP(vcpu0) exits, then => handle_apic_write
>                                                              => kvm_lapic_reg_write => kvm_apic_send_ipi to all APs
>                                                              => for each AP: __apic_accept_irq, e.g. for AP(vcpu1)
>                                                              ==> case APIC_DM_INIT: apic->pending_events = (1UL << KVM_APIC_INIT)
>                                                                  (not kick the AP yet)
>                                                              ==> case APIC_DM_STARTUP: set_bit(KVM_APIC_SIPI, &apic->pending_events)
>                                                                  (not kick the AP yet)
> 
>                                 [2]
>                                 Qemu continue:
>                                      2. kvm_cpu_exec()
>                                      => if (cpu->vcpu_dirty):
>                                         => kvm_arch_put_registers
>                                            => kvm_put_vcpu_events
>                                 KVM: kvm_vcpu_ioctl_x86_set_vcpu_events
>                                      => clear_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events);
>                                         e.i. pending_events changes from 11b to 10b
>                                        // end of kvm_vcpu_ioctl_x86_set_vcpu_events
>                                 Qemu: => after put_registers, cpu->vcpu_dirty = false;
>                                       => kvm_vcpu_ioctl(cpu, KVM_RUN, 0)
>                                 KVM: KVM_RUN
>                                      => schedule() in kvm_vcpu_block() until Qemu's next SIG_IPI/SIGUSR1 signal
>                                      /* But AP(vcpu1)'s mp_state will never change from KVM_MP_STATE_UNINITIALIZED
>                                        to KVM_MP_STATE_INIT_RECEIVED, even then to KVM_MP_STATE_RUNNABLE without
>                                        handling INIT inside kvm_apic_accept_events(), considering BSP will never
>                                        send INIT/SIPI again during smp_scan. Then AP(vcpu1) will never enter
>                                        non-root mode */
> 
>                                                         [3]
>                                                         SeaBIOS: waits CountCPUs == expected_cpus_count and loops forever
>                                                                  e.i. the AP(vcpu1) stays: EIP=0000fff0 && CS =f000 ffff0000
>                                                                  and BSP(vcpu0) appears 100% utilized as it is in a while loop.
> 
> To fix this, avoid clobbering SMI when not putting "reset" state, just
> like NMI abd SIPI does.
> 
> Signed-off-by: Fei Li <lifei.shirley@bytedance.com>
> ---
>   target/i386/kvm/kvm.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 369626f8c8..598661799a 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -5056,7 +5056,7 @@ static int kvm_put_vcpu_events(X86CPU *cpu, int level)
>   
>       events.sipi_vector = env->sipi_vector;
>   
> -    if (has_msr_smbase) {
> +    if (has_msr_smbase && level >= KVM_PUT_RESET_STATE) {
>           events.flags |= KVM_VCPUEVENT_VALID_SMM;
>           events.smi.smm = !!(env->hflags & HF_SMM_MASK);
>           events.smi.smm_inside_nmi = !!(env->hflags2 & HF2_SMM_INSIDE_NMI_MASK);

Hi all,
The previous context is from one KVM topic: 
https://lore-kernel.gnuweeb.org/lkml/f904b674-98ba-4e13-a64c-fd30b6ac4a2e@bytedance.com/T/#m80f426b55e272f9b257e2d7f6ff4902b7eb60178

Please help to review this patch, thanks for your time.

Have a nice day, thanks
Fei



  reply	other threads:[~2025-09-09 12:59 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-09  6:33 [PATCH] KVM: x86: Restrict writeback of SMI VCPU state Fei Li
2025-09-09  6:42 ` Fei Li [this message]
2025-09-26 14:54 ` Fei Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f6706599-0b8e-44bb-9f53-4e6a4db24b6d@bytedance.com \
    --to=lifei.shirley@bytedance.com \
    --cc=kvm@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=seanjc@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).