* [PATCH] KVM: Drop kvm_vcpu.ready to squash race where "ready" can get stuck "true"
@ 2026-04-09 21:33 Sean Christopherson
2026-04-14 9:09 ` zhanghao
2026-04-14 16:06 ` Paolo Bonzini
0 siblings, 2 replies; 5+ messages in thread
From: Sean Christopherson @ 2026-04-09 21:33 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, zhanghao, Wanpeng Li
Drop kvm_vcpu.ready and instead detect the case where a recently awakened
vCPU is runnable but not yet scheduled in by explicitly, manually checking
for a target vCPU that is (a) scheduled out, (b) wants to run, (c) is
marked as blocking in its stat, but (d) not actually flagged as blocking.
I.e. treat a runnable vCPU that's in the blocking sequence but not truly
blocking as a candidate for directed yield.
Keying off vcpu->stat.generic.blocking will yield some number of false
positives, e.g. if the vCPU is preempted _before_ blocking, but the rate of
false positives should be roughly the same as the existing approach, as
kvm_sched_out() would previously mark the vCPU as ready when it's scheduled
out and runnable.
Eliminating the write to vcpu->ready in kvm_vcpu_wake_up() fixes a race
where vcpu->ready could be set *after* the target vCPU is scheduled in,
e.g. if the task waking the target vCPU is preempted (or otherwise delayed)
after waking the vCPU, but before setting vcpu->ready. Hitting the race
leads to a very degraded state as KVM will constantly attempt to schedule
in a vCPU that is already running.
Fixes: d73eb57b80b9 ("KVM: Boost vCPUs that are delivering interrupts")
Reported-by: zhanghao <zhanghao1@kylinos.cn>
Closes: https://lore.kernel.org/all/tencent_AE2873502605DBDD4CD1E810F06C410F0105@qq.com
Cc: stable@vger.kernel.org
Cc: Wanpeng Li <kernellwp@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 2 +-
include/linux/kvm_host.h | 9 +++++++++
virt/kvm/kvm_main.c | 10 +++-------
3 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0a1b63c63d1a..eebb2eb39ec0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10399,7 +10399,7 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
rcu_read_unlock();
- if (!target || !READ_ONCE(target->ready))
+ if (!target || !kvm_vcpu_is_runnable_and_scheduled_out(target))
goto no_yield;
/* Ignore requests to yield to self */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7981e9cab2eb..241a976e3410 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1753,6 +1753,15 @@ static inline bool kvm_vcpu_is_blocking(struct kvm_vcpu *vcpu)
return rcuwait_active(kvm_arch_vcpu_get_wait(vcpu));
}
+static inline bool kvm_vcpu_is_runnable_and_scheduled_out(struct kvm_vcpu *vcpu)
+{
+ return READ_ONCE(vcpu->preempted) ||
+ (READ_ONCE(vcpu->scheduled_out) &&
+ READ_ONCE(vcpu->wants_to_run) &&
+ READ_ONCE(vcpu->stat.generic.blocking) &&
+ !kvm_vcpu_is_blocking(vcpu));
+}
+
#ifdef __KVM_HAVE_ARCH_INTC_INITIALIZED
/*
* returns true if the virtual interrupt controller is initialized and
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9faf70ccae7a..9f71e32daac5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -455,7 +455,6 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
kvm_vcpu_set_in_spin_loop(vcpu, false);
kvm_vcpu_set_dy_eligible(vcpu, false);
vcpu->preempted = false;
- vcpu->ready = false;
preempt_notifier_init(&vcpu->preempt_notifier, &kvm_preempt_ops);
vcpu->last_used_slot = NULL;
@@ -3803,7 +3802,6 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_halt);
bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
{
if (__kvm_vcpu_wake_up(vcpu)) {
- WRITE_ONCE(vcpu->ready, true);
++vcpu->stat.generic.halt_wakeup;
return true;
}
@@ -4008,7 +4006,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
continue;
vcpu = xa_load(&kvm->vcpu_array, idx);
- if (!READ_ONCE(vcpu->ready))
+ if (!kvm_vcpu_is_runnable_and_scheduled_out(vcpu))
continue;
if (kvm_vcpu_is_blocking(vcpu) && !vcpu_dy_runnable(vcpu))
continue;
@@ -6393,7 +6391,6 @@ static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
WRITE_ONCE(vcpu->preempted, false);
- WRITE_ONCE(vcpu->ready, false);
__this_cpu_write(kvm_running_vcpu, vcpu);
kvm_arch_vcpu_load(vcpu, cpu);
@@ -6408,10 +6405,9 @@ static void kvm_sched_out(struct preempt_notifier *pn,
WRITE_ONCE(vcpu->scheduled_out, true);
- if (task_is_runnable(current) && vcpu->wants_to_run) {
+ if (task_is_runnable(current) && vcpu->wants_to_run)
WRITE_ONCE(vcpu->preempted, true);
- WRITE_ONCE(vcpu->ready, true);
- }
+
kvm_arch_vcpu_put(vcpu);
__this_cpu_write(kvm_running_vcpu, NULL);
}
base-commit: b89df297a47e641581ee67793592e5c6ae0428f4
--
2.53.0.1213.gd9a14994de-goog
* Re: [PATCH] KVM: Drop kvm_vcpu.ready to squash race where "ready" can get stuck "true"
From: zhanghao @ 2026-04-14 9:09 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, zhanghao, Wanpeng Li
On Thu, Apr 09, 2026, Sean Christopherson wrote:
> Drop kvm_vcpu.ready and instead detect the case where a recently awakened
> vCPU is runnable but not yet scheduled in by explicitly, manually checking
> for a target vCPU that is (a) scheduled out, (b) wants to run, (c) is
> marked as blocking in its stat, but (d) not actually flagged as blocking.
> I.e. treat a runnable vCPU that's in the blocking sequence but not truly
> blocking as a candidate for directed yield.
>
> Keying off vcpu->stat.generic.blocking will yield some number of false
> positives, e.g. if the vCPU is preempted _before_ blocking, but the rate of
> false positives should be roughly the same as the existing approach, as
> kvm_sched_out() would previously mark the vCPU as ready when it's scheduled
> out and runnable.
>
> Eliminating the write to vcpu->ready in kvm_vcpu_wake_up() fixes a race
> where vcpu->ready could be set *after* the target vCPU is scheduled in,
> e.g. if the task waking the target vCPU is preempted (or otherwise delayed)
> after waking the vCPU, but before setting vcpu->ready. Hitting the race
> leads to a very degraded state as KVM will constantly attempt to schedule
> in a vCPU that is already running.
...
kvm_vcpu.ready still exists in struct kvm_vcpu, but there are no remaining
users after this patch, so it looks like a leftover cleanup item.
Best regards,
zhanghao
* Re: [PATCH] KVM: Drop kvm_vcpu.ready to squash race where "ready" can get stuck "true"
From: Sean Christopherson @ 2026-04-14 13:44 UTC (permalink / raw)
To: zhanghao; +Cc: Paolo Bonzini, kvm, linux-kernel, zhanghao, Wanpeng Li
On Tue, Apr 14, 2026, zhanghao wrote:
> On Thu, Apr 09, 2026, Sean Christopherson wrote:
> > Drop kvm_vcpu.ready and instead detect the case where a recently awakened
> > vCPU is runnable but not yet scheduled in by explicitly, manually checking
> > for a target vCPU that is (a) scheduled out, (b) wants to run, (c) is
> > marked as blocking in its stat, but (d) not actually flagged as blocking.
> > I.e. treat a runnable vCPU that's in the blocking sequence but not truly
> > blocking as a candidate for directed yield.
> >
> > Keying off vcpu->stat.generic.blocking will yield some number of false
> > positives, e.g. if the vCPU is preempted _before_ blocking, but the rate of
> > false positives should be roughly the same as the existing approach, as
> > kvm_sched_out() would previously mark the vCPU as ready when it's scheduled
> > out and runnable.
> >
> > Eliminating the write to vcpu->ready in kvm_vcpu_wake_up() fixes a race
> > where vcpu->ready could be set *after* the target vCPU is scheduled in,
> > e.g. if the task waking the target vCPU is preempted (or otherwise delayed)
> > after waking the vCPU, but before setting vcpu->ready. Hitting the race
> > leads to a very degraded state as KVM will constantly attempt to schedule
> > in a vCPU that is already running.
...
> kvm_vcpu.ready still exists in struct kvm_vcpu, but there are no remaining
> users after this patch, so it looks like a leftover cleanup item.
Gah, I intended to remove kvm_vcpu.ready, not sure how I didn't. Thanks!
* Re: [PATCH] KVM: Drop kvm_vcpu.ready to squash race where "ready" can get stuck "true"
From: Paolo Bonzini @ 2026-04-14 16:06 UTC (permalink / raw)
To: Sean Christopherson; +Cc: kvm, linux-kernel, zhanghao, Wanpeng Li
On 4/9/26 23:33, Sean Christopherson wrote:
> +static inline bool kvm_vcpu_is_runnable_and_scheduled_out(struct kvm_vcpu *vcpu)
> +{
> + return READ_ONCE(vcpu->preempted) ||
> + (READ_ONCE(vcpu->scheduled_out) &&
> + READ_ONCE(vcpu->wants_to_run) &&
wants_to_run doesn't seem important here, because blocking will never be
set outside KVM_RUN (unlike scheduled_out which can be set within any
vcpu_load/vcpu_put pair, if you're unlucky enough).
> + READ_ONCE(vcpu->stat.generic.blocking) &&
> + !kvm_vcpu_is_blocking(vcpu));
If you get here you have done the finish_rcuwait() in kvm_vcpu_block(),
meaning that you've already been scheduled in, haven't you? So, you
would need something like this:
static inline bool kvm_vcpu_is_runnable_and_scheduled_out(struct
kvm_vcpu *vcpu)
{
if (READ_ONCE(vcpu->preempted))
return true;
if (!READ_ONCE(vcpu->scheduled_out))
return false;
if (!READ_ONCE(vcpu->stat.generic.blocking))
return false;
return rcuwait_was_woken(kvm_arch_vcpu_get_wait(vcpu));
}
// in rcuwait.h
static inline bool rcuwait_was_woken(struct rcuwait *w)
{
guard(rcu)();
struct task_struct *t = rcu_access_pointer(w->task);
return t && !task_is_runnable(t);
}
Paolo
* Re: [PATCH] KVM: Drop kvm_vcpu.ready to squash race where "ready" can get stuck "true"
From: Sean Christopherson @ 2026-04-14 22:38 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kvm, linux-kernel, zhanghao, Wanpeng Li
On Tue, Apr 14, 2026, Paolo Bonzini wrote:
> On 4/9/26 23:33, Sean Christopherson wrote:
> > +static inline bool kvm_vcpu_is_runnable_and_scheduled_out(struct kvm_vcpu *vcpu)
> > +{
> > + return READ_ONCE(vcpu->preempted) ||
> > + (READ_ONCE(vcpu->scheduled_out) &&
> > + READ_ONCE(vcpu->wants_to_run) &&
>
> wants_to_run doesn't seem important here, because blocking will never be set
> outside KVM_RUN (unlike scheduled_out which can be set within any
> vcpu_load/vcpu_put pair, if you're unlucky enough).
Oh, good point.
> > + READ_ONCE(vcpu->stat.generic.blocking) &&
> > + !kvm_vcpu_is_blocking(vcpu));
>
> If you get here you have done the finish_rcuwait() in kvm_vcpu_block(),
> meaning that you've already been scheduled in, haven't you?
Gah, yes. I didn't realize finish_rcuwait() is what actually completes the
wakeup from KVM's perspective.
> So, you would need something like this:
>
> static inline bool kvm_vcpu_is_runnable_and_scheduled_out(struct kvm_vcpu *vcpu)
> {
> if (READ_ONCE(vcpu->preempted))
> return true;
>
> if (!READ_ONCE(vcpu->scheduled_out))
> return false;
> if (!READ_ONCE(vcpu->stat.generic.blocking))
Hmm, I think this could actually be:
if (!kvm_vcpu_is_blocking(vcpu))
return false;
Because my use of vcpu->stat.generic.blocking was purely due to missing that
finish_rcuwait() is effectively what clears "blocking". That would narrow the
window for false positives a little, e.g. would at least wait until after the
kvm_arch_vcpu_blocking() call to treat the vCPU as blocking.
> return false;
> return rcuwait_was_woken(kvm_arch_vcpu_get_wait(vcpu));
> }
>
> // in rcuwait.h
> static inline bool rcuwait_was_woken(struct rcuwait *w)
> {
> guard(rcu)();
> struct task_struct *t = rcu_access_pointer(w->task);
> return t && !task_is_runnable(t);
Ah, and I missed the task_is_runnable() check guarding vcpu->ready. I suspect I
assumed kvm_vcpu_on_spin() would do that check.