public inbox for kvm@vger.kernel.org
From: Sean Christopherson <seanjc@google.com>
To: zhanghao <76824143@qq.com>
Cc: pbonzini@redhat.com, kvm@vger.kernel.org,
	zhanghao <zhanghao1@kylinos.cn>
Subject: Re: [PATCH 2/3] KVM: x86: Skip IN_GUEST_MODE vCPUs in kvm_vcpu_on_spin main loop
Date: Thu, 12 Mar 2026 18:02:22 -0700	[thread overview]
Message-ID: <abNiHqMe0mqKIsML@google.com> (raw)
In-Reply-To: <tencent_AE2873502605DBDD4CD1E810F06C410F0105@qq.com>

On Thu, Mar 12, 2026, zhanghao wrote:
> On Tue, Feb 17, 2026, Sean Christopherson wrote:
> > On Sun, Feb 15, 2026, 76824143@qq.com wrote:
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index 476ecdb18bdd..663df3a121c8 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -4026,6 +4026,10 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
> > >  		vcpu = xa_load(&kvm->vcpu_array, idx);
> > >  		if (!READ_ONCE(vcpu->ready))
> > >  			continue;
> > > +
> > > +		if (READ_ONCE(vcpu->mode) == IN_GUEST_MODE)
> > > +			continue;
> > 
> > This should generally not happen, as vcpu->ready should only be true when a vCPU
> > is scheduled out.  Although it does look like there's a race in kvm_vcpu_wake_up()
> > where vcpu->ready could be left %true, e.g. if the task was delayed or preempted
> > after __kvm_vcpu_wake_up(), before the "WRITE_ONCE(vcpu->ready, true)".  Not sure
> > how best to handle that scenario.
> > 
> > > +
> > >  		if (kvm_vcpu_is_blocking(vcpu) && !vcpu_dy_runnable(vcpu))
> > >  			continue;
> > >  
> > > -- 
> > > 2.39.2
> > >
> 
> Thank you for reviewing this patch. As pointed out in the discussion,
> this addresses a race condition where kvm_vcpu_wake_up() can set
> "ready=true" while kvm_sched_in() is concurrently setting the vCPU to
> running state. Without this check, we might attempt to yield to a vCPU
> that is already running, which is futile.

Right, but I want to solve this by eliminating the race, not by papering over
the issue in kvm_vcpu_on_spin().  I don't see a sane way of handling the cross-vCPU
writes without atomics, and I'd prefer to avoid the complexity that comes with
that.

What if, instead of having the waker set vcpu->ready, we remove the flag entirely
and instead do a best-effort detection of the "vCPU was blocking, is now awake,
but probably hasn't been scheduled in yet" case.  We can use a combination of flags to
detect that, and thanks to vcpu->scheduled_out, I'm pretty sure the false positive
rate would be on par with the existing vcpu->ready check (for the preempted case).

The idea is to identify the case where the vCPU is in the blocking sequence, but
not actually blocking, wants to run, and is scheduled out.

Compile tested only (and I don't love the name of the helper).

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0b5d48e75b65..ac849d879b73 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10393,7 +10393,7 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
 
        rcu_read_unlock();
 
-       if (!target || !READ_ONCE(target->ready))
+       if (!target || !kvm_vcpu_is_runnable_and_scheduled_out(target))
                goto no_yield;
 
        /* Ignore requests to yield to self */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 581b217abc33..70739179af58 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1746,6 +1746,15 @@ static inline bool kvm_vcpu_is_blocking(struct kvm_vcpu *vcpu)
        return rcuwait_active(kvm_arch_vcpu_get_wait(vcpu));
 }
 
+static inline bool kvm_vcpu_is_runnable_and_scheduled_out(struct kvm_vcpu *vcpu)
+{
+       return READ_ONCE(vcpu->preempted) ||
+              (READ_ONCE(vcpu->scheduled_out) &&
+               READ_ONCE(vcpu->wants_to_run) &&
+               READ_ONCE(vcpu->stat.generic.blocking) &&
+               !kvm_vcpu_is_blocking(vcpu));
+}
+
 #ifdef __KVM_HAVE_ARCH_INTC_INITIALIZED
 /*
  * returns true if the virtual interrupt controller is initialized and
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9faf70ccae7a..9f71e32daac5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -455,7 +455,6 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
        kvm_vcpu_set_in_spin_loop(vcpu, false);
        kvm_vcpu_set_dy_eligible(vcpu, false);
        vcpu->preempted = false;
-       vcpu->ready = false;
        preempt_notifier_init(&vcpu->preempt_notifier, &kvm_preempt_ops);
        vcpu->last_used_slot = NULL;
 
@@ -3803,7 +3802,6 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_halt);
 bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
 {
        if (__kvm_vcpu_wake_up(vcpu)) {
-               WRITE_ONCE(vcpu->ready, true);
                ++vcpu->stat.generic.halt_wakeup;
                return true;
        }
@@ -4008,7 +4006,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
                        continue;
 
                vcpu = xa_load(&kvm->vcpu_array, idx);
-               if (!READ_ONCE(vcpu->ready))
+               if (!kvm_vcpu_is_runnable_and_scheduled_out(vcpu))
                        continue;
                if (kvm_vcpu_is_blocking(vcpu) && !vcpu_dy_runnable(vcpu))
                        continue;
@@ -6393,7 +6391,6 @@ static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
        struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
 
        WRITE_ONCE(vcpu->preempted, false);
-       WRITE_ONCE(vcpu->ready, false);
 
        __this_cpu_write(kvm_running_vcpu, vcpu);
        kvm_arch_vcpu_load(vcpu, cpu);
@@ -6408,10 +6405,9 @@ static void kvm_sched_out(struct preempt_notifier *pn,
 
        WRITE_ONCE(vcpu->scheduled_out, true);
 
-       if (task_is_runnable(current) && vcpu->wants_to_run) {
+       if (task_is_runnable(current) && vcpu->wants_to_run)
                WRITE_ONCE(vcpu->preempted, true);
-               WRITE_ONCE(vcpu->ready, true);
-       }
+
        kvm_arch_vcpu_put(vcpu);
        __this_cpu_write(kvm_running_vcpu, NULL);
 }


Thread overview: 10+ messages
     [not found] <20260215140402.24659-1-76824143@qq.com>
2026-02-15 14:04 ` [PATCH 1/3] KVM: x86: Enhance kvm_vcpu_eligible_for_directed_yield to detect golden targets 76824143
2026-02-17 15:37   ` Sean Christopherson
2026-02-15 14:04 ` [PATCH 2/3] KVM: x86: Skip IN_GUEST_MODE vCPUs in kvm_vcpu_on_spin main loop 76824143
2026-02-17 15:41   ` Sean Christopherson
2026-03-12  5:57     ` zhanghao
2026-03-13  1:02       ` Sean Christopherson [this message]
2026-03-19  8:40         ` zhanghao
2026-02-15 14:04 ` [PATCH 3/3] KVM: x86: Use dynamic try count based on vCPU count 76824143
2026-02-17 16:21   ` Sean Christopherson
2026-03-12  6:24     ` zhanghao
