From: Fei Li <lifei.shirley@bytedance.com>
To: pbonzini@redhat.com, mtosatti@redhat.com, seanjc@google.com,
kvm@vger.kernel.org
Cc: qemu-devel@nongnu.org
Subject: Re: [PATCH] KVM: x86: Restrict writeback of SMI VCPU state
Date: Tue, 9 Sep 2025 14:42:51 +0800 [thread overview]
Message-ID: <f6706599-0b8e-44bb-9f53-4e6a4db24b6d@bytedance.com> (raw)
In-Reply-To: <20250909063327.14263-1-lifei.shirley@bytedance.com>
On 9/9/25 2:33 PM, Fei Li wrote:
> Recently, we meet a SMI race bug triggered by one monitor tool in our
> production environment. This monitor executes 'info registers -a' hmp
> at a fixed frequency, even during VM startup process, which makes
> some AP stay in KVM_MP_STATE_UNINITIALIZED forever, thus VM hangs.
>
> The complete calling processes for the SMI race are as follows:
>
> //thread1 //thread2 //thread3
> `info registers -a` hmp [1] AP(vcpu1) thread [2] BSP(vcpu0) send INIT/SIPI [3]
>
> [2]
> KVM: KVM_RUN and then
> schedule() in kvm_vcpu_block() loop
>
> [1]
> for each cpu: cpu_synchronize_state
> if !qemu_thread_is_self()
> 1. insert to cpu->work_list, and handle asynchronously
> 2. then kick the AP(vcpu1) by sending SIG_IPI/SIGUSR1 signal
>
> [2]
> KVM: checks signal_pending, breaks loop and returns -EINTR
> Qemu: break kvm_cpu_exec loop, run
> 1. qemu_wait_io_event()
> => process_queued_cpu_work => cpu->work_list.func()
> e.i. do_kvm_cpu_synchronize_state() callback
> => kvm_arch_get_registers
> => kvm_get_mp_state /* KVM: get_mpstate also calls
> kvm_apic_accept_events() to handle INIT and SIPI */
> => cpu->vcpu_dirty = true;
> // end of qemu_wait_io_event
>
> [3]
> SeaBIOS: BSP enters non-root mode and runs reset_vector() in SeaBIOS.
> send INIT and then SIPI by writing APIC_ICR during smp_scan
> KVM: BSP(vcpu0) exits, then => handle_apic_write
> => kvm_lapic_reg_write => kvm_apic_send_ipi to all APs
> => for each AP: __apic_accept_irq, e.g. for AP(vcpu1)
> ==> case APIC_DM_INIT: apic->pending_events = (1UL << KVM_APIC_INIT)
> (not kick the AP yet)
> ==> case APIC_DM_STARTUP: set_bit(KVM_APIC_SIPI, &apic->pending_events)
> (not kick the AP yet)
>
> [2]
> Qemu continue:
> 2. kvm_cpu_exec()
> => if (cpu->vcpu_dirty):
> => kvm_arch_put_registers
> => kvm_put_vcpu_events
> KVM: kvm_vcpu_ioctl_x86_set_vcpu_events
> => clear_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events);
> e.i. pending_events changes from 11b to 10b
> // end of kvm_vcpu_ioctl_x86_set_vcpu_events
> Qemu: => after put_registers, cpu->vcpu_dirty = false;
> => kvm_vcpu_ioctl(cpu, KVM_RUN, 0)
> KVM: KVM_RUN
> => schedule() in kvm_vcpu_block() until Qemu's next SIG_IPI/SIGUSR1 signal
> /* But AP(vcpu1)'s mp_state will never change from KVM_MP_STATE_UNINITIALIZED
> to KVM_MP_STATE_INIT_RECEIVED, even then to KVM_MP_STATE_RUNNABLE without
> handling INIT inside kvm_apic_accept_events(), considering BSP will never
> send INIT/SIPI again during smp_scan. Then AP(vcpu1) will never enter
> non-root mode */
>
> [3]
> SeaBIOS: waits CountCPUs == expected_cpus_count and loops forever
> e.i. the AP(vcpu1) stays: EIP=0000fff0 && CS =f000 ffff0000
> and BSP(vcpu0) appears 100% utilized as it is in a while loop.
>
> To fix this, avoid clobbering SMI when not putting "reset" state, just
> like NMI abd SIPI does.
>
> Signed-off-by: Fei Li <lifei.shirley@bytedance.com>
> ---
> target/i386/kvm/kvm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 369626f8c8..598661799a 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -5056,7 +5056,7 @@ static int kvm_put_vcpu_events(X86CPU *cpu, int level)
>
> events.sipi_vector = env->sipi_vector;
>
> - if (has_msr_smbase) {
> + if (has_msr_smbase && level >= KVM_PUT_RESET_STATE) {
> events.flags |= KVM_VCPUEVENT_VALID_SMM;
> events.smi.smm = !!(env->hflags & HF_SMM_MASK);
> events.smi.smm_inside_nmi = !!(env->hflags2 & HF2_SMM_INSIDE_NMI_MASK);
Hi all,
The previous context is from one KVM topic:
https://lore-kernel.gnuweeb.org/lkml/f904b674-98ba-4e13-a64c-fd30b6ac4a2e@bytedance.com/T/#m80f426b55e272f9b257e2d7f6ff4902b7eb60178
Please help to review this patch, thanks for your time.
Have a nice day, thanks
Fei
next prev parent reply other threads:[~2025-09-09 12:59 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-09 6:33 [PATCH] KVM: x86: Restrict writeback of SMI VCPU state Fei Li
2025-09-09 6:42 ` Fei Li [this message]
2025-09-26 14:54 ` Fei Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f6706599-0b8e-44bb-9f53-4e6a4db24b6d@bytedance.com \
--to=lifei.shirley@bytedance.com \
--cc=kvm@vger.kernel.org \
--cc=mtosatti@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=seanjc@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).