From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 12 Mar 2026 18:02:22 -0700
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
Mime-Version: 1.0
References: <20260215140402.24659-1-76824143@qq.com>
Subject:
Re: [PATCH 2/3] KVM: x86: Skip IN_GUEST_MODE vCPUs in kvm_vcpu_on_spin main loop
From: Sean Christopherson
To: zhanghao <76824143@qq.com>
Cc: pbonzini@redhat.com, kvm@vger.kernel.org, zhanghao
Content-Type: text/plain; charset="us-ascii"

On Thu, Mar 12, 2026, zhanghao wrote:
> On Tue, Feb 17, 2026, Sean Christopherson wrote:
> > On Sun, Feb 15, 2026, 76824143@qq.com wrote:
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index 476ecdb18bdd..663df3a121c8 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -4026,6 +4026,10 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
> > >  		vcpu = xa_load(&kvm->vcpu_array, idx);
> > >  		if (!READ_ONCE(vcpu->ready))
> > >  			continue;
> > > +
> > > +		if (READ_ONCE(vcpu->mode) == IN_GUEST_MODE)
> > > +			continue;
> >
> > This should generally not happen, as vcpu->ready should only be true when a vCPU
> > is scheduled out.  Although it does look like there's a race in kvm_vcpu_wake_up()
> > where vcpu->ready could be left %true, e.g. if the task was delayed or preempted
> > after __kvm_vcpu_wake_up(), before the "WRITE_ONCE(vcpu->ready, true)".  Not sure
> > how best to handle that scenario.
> >
> > > +
> > >  		if (kvm_vcpu_is_blocking(vcpu) && !vcpu_dy_runnable(vcpu))
> > >  			continue;
> > >
> > > --
> > > 2.39.2
>
> Thank you for reviewing this patch.  As pointed out in the discussion,
> this addresses a race condition where kvm_vcpu_wake_up() can set
> "ready=true" while kvm_sched_in() is concurrently setting the vCPU to
> running state.  Without this check, we might attempt to yield to a vCPU
> that is already running, which is futile.

Right, but I want to solve this by eliminating the race, not by papering over
the issue in kvm_vcpu_on_spin().  I don't see a sane way of handling the
cross-vCPU writes without atomics, and I'd prefer to avoid the complexity that
comes with that.
What if instead of having the waker set vcpu->ready, we remove it entirely and
instead do a best-effort detection of the "vCPU was blocking, is now awake, but
probably hasn't been scheduled in yet".  We can use a combination of flags to
detect that, and thanks to vcpu->scheduled_out, I'm pretty sure the false
positive rate would be on par with the existing vcpu->ready check (for the
preempted case).

The idea is to identify the case where the vCPU is in the blocking sequence,
but not actually blocking, wants to run, and is scheduled out.

Compile tested only (and I don't love the name of the helper).

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0b5d48e75b65..ac849d879b73 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10393,7 +10393,7 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
 
 	rcu_read_unlock();
 
-	if (!target || !READ_ONCE(target->ready))
+	if (!target || !kvm_vcpu_is_runnable_and_scheduled_out(target))
 		goto no_yield;
 
 	/* Ignore requests to yield to self */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 581b217abc33..70739179af58 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1746,6 +1746,15 @@ static inline bool kvm_vcpu_is_blocking(struct kvm_vcpu *vcpu)
 	return rcuwait_active(kvm_arch_vcpu_get_wait(vcpu));
 }
 
+static inline bool kvm_vcpu_is_runnable_and_scheduled_out(struct kvm_vcpu *vcpu)
+{
+	return READ_ONCE(vcpu->preempted) ||
+	       (READ_ONCE(vcpu->scheduled_out) &&
+		READ_ONCE(vcpu->wants_to_run) &&
+		READ_ONCE(vcpu->stat.generic.blocking) &&
+		!kvm_vcpu_is_blocking(vcpu));
+}
+
 #ifdef __KVM_HAVE_ARCH_INTC_INITIALIZED
 /*
  * returns true if the virtual interrupt controller is initialized and
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9faf70ccae7a..9f71e32daac5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -455,7 +455,6 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 	kvm_vcpu_set_in_spin_loop(vcpu, false);
 	kvm_vcpu_set_dy_eligible(vcpu, false);
 	vcpu->preempted = false;
-	vcpu->ready = false;
 	preempt_notifier_init(&vcpu->preempt_notifier, &kvm_preempt_ops);
 	vcpu->last_used_slot = NULL;
@@ -3803,7 +3802,6 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_halt);
 bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
 {
 	if (__kvm_vcpu_wake_up(vcpu)) {
-		WRITE_ONCE(vcpu->ready, true);
 		++vcpu->stat.generic.halt_wakeup;
 		return true;
 	}
@@ -4008,7 +4006,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 			continue;
 
 		vcpu = xa_load(&kvm->vcpu_array, idx);
-		if (!READ_ONCE(vcpu->ready))
+		if (!kvm_vcpu_is_runnable_and_scheduled_out(vcpu))
 			continue;
 		if (kvm_vcpu_is_blocking(vcpu) && !vcpu_dy_runnable(vcpu))
 			continue;
@@ -6393,7 +6391,6 @@ static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
 	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
 
 	WRITE_ONCE(vcpu->preempted, false);
-	WRITE_ONCE(vcpu->ready, false);
 
 	__this_cpu_write(kvm_running_vcpu, vcpu);
 	kvm_arch_vcpu_load(vcpu, cpu);
@@ -6408,10 +6405,9 @@ static void kvm_sched_out(struct preempt_notifier *pn,
 
 	WRITE_ONCE(vcpu->scheduled_out, true);
 
-	if (task_is_runnable(current) && vcpu->wants_to_run) {
+	if (task_is_runnable(current) && vcpu->wants_to_run)
 		WRITE_ONCE(vcpu->preempted, true);
-		WRITE_ONCE(vcpu->ready, true);
-	}
+
 	kvm_arch_vcpu_put(vcpu);
 	__this_cpu_write(kvm_running_vcpu, NULL);
 }