From: Fei Li <lifei.shirley@bytedance.com>
To: pbonzini@redhat.com, mtosatti@redhat.com, seanjc@google.com,
kvm@vger.kernel.org, Jan Kiszka <jan.kiszka@siemens.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [PATCH] KVM: x86: Restrict writeback of SMI VCPU state
Date: Fri, 26 Sep 2025 22:54:05 +0800
Message-ID: <5db494c6-c8dd-4073-bea0-5a62fce170e9@bytedance.com>
In-Reply-To: <20250909063327.14263-1-lifei.shirley@bytedance.com>
Dear maintainers,
Could you please help review the patch "[PATCH] KVM: x86: Restrict
writeback of SMI VCPU state"? It fixes a race condition that causes a VM
hang when `info registers -a` is run frequently via HMP during VM startup.
The issue occurs because the unrestricted SMI state writeback conflicts
with the vCPU initialization sequence.
We would greatly appreciate knowing whether this patch properly resolves
the race condition; once it is validated, we would like to apply it to
our production environment. Please let me know if further details are
needed. :)
Best regards, and thanks again!
Fei
On 9/9/25 2:33 PM, Fei Li wrote:
> Recently, we hit an SMI race bug triggered by a monitoring tool in our
> production environment. This tool executes the 'info registers -a' HMP
> command at a fixed frequency, even during VM startup, which leaves
> some APs stuck in KVM_MP_STATE_UNINITIALIZED forever, so the VM hangs.
>
> The complete call sequence for the SMI race is as follows:
>
> //thread1 //thread2 //thread3
> `info registers -a` hmp [1] AP(vcpu1) thread [2] BSP(vcpu0) send INIT/SIPI [3]
>
> [2]
> KVM: KVM_RUN and then
> schedule() in kvm_vcpu_block() loop
>
> [1]
> for each cpu: cpu_synchronize_state
> if !qemu_thread_is_self()
> 1. insert to cpu->work_list, and handle asynchronously
> 2. then kick the AP(vcpu1) by sending SIG_IPI/SIGUSR1 signal
>
> [2]
> KVM: checks signal_pending, breaks loop and returns -EINTR
> Qemu: breaks the kvm_cpu_exec loop, then runs
> 1. qemu_wait_io_event()
> => process_queued_cpu_work => cpu->work_list.func()
> i.e. do_kvm_cpu_synchronize_state() callback
> => kvm_arch_get_registers
> => kvm_get_mp_state /* KVM: get_mpstate also calls
> kvm_apic_accept_events() to handle INIT and SIPI */
> => cpu->vcpu_dirty = true;
> // end of qemu_wait_io_event
>
> [3]
> SeaBIOS: BSP enters non-root mode and runs reset_vector() in SeaBIOS.
> send INIT and then SIPI by writing APIC_ICR during smp_scan
> KVM: BSP(vcpu0) exits, then => handle_apic_write
> => kvm_lapic_reg_write => kvm_apic_send_ipi to all APs
> => for each AP: __apic_accept_irq, e.g. for AP(vcpu1)
> ==> case APIC_DM_INIT: apic->pending_events = (1UL << KVM_APIC_INIT)
> (the AP is not kicked yet)
> ==> case APIC_DM_STARTUP: set_bit(KVM_APIC_SIPI, &apic->pending_events)
> (the AP is not kicked yet)
>
> [2]
> Qemu continues:
> 2. kvm_cpu_exec()
> => if (cpu->vcpu_dirty):
> => kvm_arch_put_registers
> => kvm_put_vcpu_events
> KVM: kvm_vcpu_ioctl_x86_set_vcpu_events
> => clear_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events);
> i.e. pending_events changes from 11b to 10b
> // end of kvm_vcpu_ioctl_x86_set_vcpu_events
> Qemu: => after put_registers, cpu->vcpu_dirty = false;
> => kvm_vcpu_ioctl(cpu, KVM_RUN, 0)
> KVM: KVM_RUN
> => schedule() in kvm_vcpu_block() until Qemu's next SIG_IPI/SIGUSR1 signal
> /* But AP(vcpu1)'s mp_state will never change from KVM_MP_STATE_UNINITIALIZED
> to KVM_MP_STATE_INIT_RECEIVED, and then to KVM_MP_STATE_RUNNABLE, because
> INIT is never handled inside kvm_apic_accept_events() and the BSP never
> sends INIT/SIPI again during smp_scan. AP(vcpu1) therefore never enters
> non-root mode */
>
> [3]
> SeaBIOS: waits for CountCPUs == expected_cpus_count and loops forever
> i.e. the AP(vcpu1) stays at: EIP=0000fff0 && CS =f000 ffff0000
> and BSP(vcpu0) appears 100% utilized because it spins in a while loop.
>
> To fix this, avoid clobbering SMI when not putting "reset" state, just
> as is already done for NMI and SIPI.
>
> Signed-off-by: Fei Li <lifei.shirley@bytedance.com>
> ---
> target/i386/kvm/kvm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 369626f8c8..598661799a 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -5056,7 +5056,7 @@ static int kvm_put_vcpu_events(X86CPU *cpu, int level)
>
> events.sipi_vector = env->sipi_vector;
>
> - if (has_msr_smbase) {
> + if (has_msr_smbase && level >= KVM_PUT_RESET_STATE) {
> events.flags |= KVM_VCPUEVENT_VALID_SMM;
> events.smi.smm = !!(env->hflags & HF_SMM_MASK);
> events.smi.smm_inside_nmi = !!(env->hflags2 & HF2_SMM_INSIDE_NMI_MASK);
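
For readers following the trace, the interleaving can be replayed in a
small standalone model. This is only a sketch under stated assumptions,
not QEMU or KVM code: struct model_vcpu, replay_race() and the other
helpers are hypothetical names, the bit positions are inferred from the
"11b to 10b" note in the trace, has_msr_smbase is taken to be true, and
the SMM writeback is modeled as unconditionally clearing the pending INIT
bit, as the trace describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Names echo the trace; values and the state machine are assumptions. */
enum { APIC_INIT_BIT = 0, APIC_SIPI_BIT = 1 };
enum mp_state { MP_UNINITIALIZED, MP_INIT_RECEIVED, MP_RUNNABLE };
enum { PUT_RUNTIME_STATE = 1, PUT_RESET_STATE = 2, PUT_FULL_STATE = 3 };

struct model_vcpu {
    unsigned long pending_events;
    enum mp_state mp_state;
};

/* [3] BSP writes APIC_ICR: INIT and SIPI are latched, AP is not kicked. */
static void bsp_send_init_sipi(struct model_vcpu *ap)
{
    ap->pending_events = 1UL << APIC_INIT_BIT;
    ap->pending_events |= 1UL << APIC_SIPI_BIT;
}

/* [2] Register writeback: writing SMM state clears the pending INIT bit.
 * With the patch, SMM state is only written at reset level or above. */
static void put_vcpu_events(struct model_vcpu *ap, int level, bool patched)
{
    bool write_smm = patched ? (level >= PUT_RESET_STATE) : true;

    if (write_smm)
        ap->pending_events &= ~(1UL << APIC_INIT_BIT);
}

/* Simplified event handling: SIPI only takes effect after INIT was seen. */
static void accept_events(struct model_vcpu *ap)
{
    if (ap->pending_events & (1UL << APIC_INIT_BIT)) {
        ap->pending_events &= ~(1UL << APIC_INIT_BIT);
        ap->mp_state = MP_INIT_RECEIVED;
    }
    if (ap->pending_events & (1UL << APIC_SIPI_BIT)) {
        ap->pending_events &= ~(1UL << APIC_SIPI_BIT);
        if (ap->mp_state == MP_INIT_RECEIVED)
            ap->mp_state = MP_RUNNABLE;
    }
}

/* Replays the order from the trace: INIT/SIPI latched, then a
 * runtime-level writeback, then the next KVM_RUN handles events. */
static enum mp_state replay_race(bool patched)
{
    struct model_vcpu ap = { 0, MP_UNINITIALIZED };

    bsp_send_init_sipi(&ap);
    put_vcpu_events(&ap, PUT_RUNTIME_STATE, patched);
    accept_events(&ap);
    return ap.mp_state;
}

#ifdef RACE_MODEL_MAIN
int main(void)
{
    assert(replay_race(false) == MP_UNINITIALIZED); /* AP stuck */
    assert(replay_race(true) == MP_RUNNABLE);       /* AP starts */
    puts("model matches the trace");
    return 0;
}
#endif
```

Building with -DRACE_MODEL_MAIN gives a standalone program. In this model
the unpatched path drops INIT before it is handled, so the later SIPI is
ignored and the AP stays uninitialized; with the level check in place the
runtime-level writeback leaves pending_events untouched.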