From: Wanpeng Li <kernellwp@gmail.com>
To: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Paolo Bonzini <pbonzini@redhat.com>,
Sean Christopherson <seanjc@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>,
Vincent Guittot <vincent.guittot@linaro.org>,
Juri Lelli <juri.lelli@redhat.com>,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Wanpeng Li <wanpengli@tencent.com>
Subject: [PATCH 09/10] KVM: Implement IPI-aware directed yield candidate selection
Date: Mon, 10 Nov 2025 11:32:30 +0800 [thread overview]
Message-ID: <20251110033232.12538-10-kernellwp@gmail.com> (raw)
In-Reply-To: <20251110033232.12538-1-kernellwp@gmail.com>
From: Wanpeng Li <wanpengli@tencent.com>
Integrate IPI tracking with directed yield to improve scheduling when
vCPUs spin waiting for IPI responses.
Implement priority-based candidate selection in kvm_vcpu_on_spin()
with three tiers: Priority 1 uses kvm_vcpu_is_ipi_receiver() to
identify confirmed IPI targets within the recency window, addressing
lock holders spinning on IPI acknowledgment. Priority 2 leverages
existing kvm_arch_dy_has_pending_interrupt() for compatibility with
arch-specific fast paths. Priority 3 falls back to conventional
preemption-based logic when yield_to_kernel_mode is requested,
providing a safety net for non-IPI scenarios.
Add kvm_vcpu_is_good_yield_candidate() helper to consolidate these
checks, preventing over-aggressive boosting while enabling targeted
optimization when IPI patterns are detected.
Performance testing (16 pCPUs host, 16 vCPUs/VM):
Dedup (simlarge):
2 VMs: +47.1% throughput
3 VMs: +28.1% throughput
4 VMs: +1.7% throughput
VIPS (simlarge):
2 VMs: +26.2% throughput
3 VMs: +12.7% throughput
4 VMs: +6.0% throughput
Gains stem from effective directed yield when vCPUs spin on IPI
delivery, reducing synchronization overhead. The improvement is most
pronounced at moderate overcommit (2-3 VMs) where contention reduction
outweighs context switching cost.
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
virt/kvm/kvm_main.c | 52 +++++++++++++++++++++++++++++++++++++--------
1 file changed, 43 insertions(+), 9 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 495e769c7ddf..9cf44b6b396d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3968,6 +3968,47 @@ bool __weak kvm_vcpu_is_ipi_receiver(struct kvm_vcpu *sender, struct kvm_vcpu *r
return false;
}
+/*
+ * IPI-aware candidate selection for directed yield
+ *
+ * Priority order:
+ * 1) Confirmed IPI receiver of 'me' within a short window (always boost)
+ * 2) Arch-provided fast pending interrupt (user-mode boost)
+ * 3) Kernel-mode yield: preempted-in-kernel vCPU (traditional boost)
+ * 4) Otherwise, be conservative
+ */
+static bool kvm_vcpu_is_good_yield_candidate(struct kvm_vcpu *me, struct kvm_vcpu *vcpu,
+ bool yield_to_kernel_mode)
+{
+ /* Priority 1: recently targeted IPI receiver */
+ if (kvm_vcpu_is_ipi_receiver(me, vcpu))
+ return true;
+
+ /* Priority 2: fast pending-interrupt hint (arch-specific). */
+ if (kvm_arch_dy_has_pending_interrupt(vcpu))
+ return true;
+
+ /*
+ * Minimal preempted gate for remaining cases:
+ * - If the target is neither a confirmed IPI receiver nor has a fast
+ * pending interrupt, require that the target has been preempted.
+ * - If yielding to kernel mode is requested, additionally require
+ * that the target was preempted while in kernel mode.
+ *
+ * This avoids expanding the candidate set too aggressively and helps
+ * prevent overboost in workloads where the IPI context is not
+ * involved.
+ */
+ if (!READ_ONCE(vcpu->preempted))
+ return false;
+
+ if (yield_to_kernel_mode &&
+ !kvm_arch_vcpu_preempted_in_kernel(vcpu))
+ return false;
+
+ return true;
+}
+
void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
{
int nr_vcpus, start, i, idx, yielded;
@@ -4015,15 +4056,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
if (kvm_vcpu_is_blocking(vcpu) && !vcpu_dy_runnable(vcpu))
continue;
- /*
- * Treat the target vCPU as being in-kernel if it has a pending
- * interrupt, as the vCPU trying to yield may be spinning
- * waiting on IPI delivery, i.e. the target vCPU is in-kernel
- * for the purposes of directed yield.
- */
- if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
- !kvm_arch_dy_has_pending_interrupt(vcpu) &&
- !kvm_arch_vcpu_preempted_in_kernel(vcpu))
+ /* IPI-aware candidate selection */
+ if (!kvm_vcpu_is_good_yield_candidate(me, vcpu, yield_to_kernel_mode))
continue;
if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
--
2.43.0
next prev parent reply other threads:[~2025-11-10 3:33 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-10 3:32 [PATCH 00/10] sched/kvm: Semantics-aware vCPU scheduling for oversubscribed KVM Wanpeng Li
2025-11-10 3:32 ` [PATCH 01/10] sched: Add vCPU debooster infrastructure Wanpeng Li
2025-11-10 3:32 ` [PATCH 02/10] sched/fair: Add rate-limiting and validation helpers Wanpeng Li
2025-11-12 6:40 ` K Prateek Nayak
2025-11-12 6:44 ` K Prateek Nayak
2025-11-13 13:36 ` Wanpeng Li
2025-11-13 12:00 ` Wanpeng Li
2025-11-10 3:32 ` [PATCH 03/10] sched/fair: Add cgroup LCA finder for hierarchical yield Wanpeng Li
2025-11-12 6:50 ` K Prateek Nayak
2025-11-13 8:59 ` Wanpeng Li
2025-11-10 3:32 ` [PATCH 04/10] sched/fair: Add penalty calculation and application logic Wanpeng Li
2025-11-12 7:25 ` K Prateek Nayak
2025-11-13 13:25 ` Wanpeng Li
2025-11-10 3:32 ` [PATCH 05/10] sched/fair: Wire up yield deboost in yield_to_task_fair() Wanpeng Li
2025-11-10 5:16 ` kernel test robot
2025-11-10 5:16 ` kernel test robot
2025-11-10 3:32 ` [PATCH 06/10] KVM: Fix last_boosted_vcpu index assignment bug Wanpeng Li
2025-11-21 0:35 ` Sean Christopherson
2025-11-21 0:38 ` Sean Christopherson
2025-11-21 11:46 ` Wanpeng Li
2025-11-10 3:32 ` [PATCH 07/10] KVM: x86: Add IPI tracking infrastructure Wanpeng Li
2025-11-10 3:32 ` [PATCH 08/10] KVM: x86/lapic: Integrate IPI tracking with interrupt delivery Wanpeng Li
2025-11-10 3:32 ` Wanpeng Li [this message]
2025-11-10 3:39 ` [PATCH 10/10] KVM: Relaxed boost as safety net Wanpeng Li
2025-11-10 12:02 ` [PATCH 00/10] sched/kvm: Semantics-aware vCPU scheduling for oversubscribed KVM Christian Borntraeger
2025-11-12 5:01 ` Wanpeng Li
2025-11-18 8:11 ` Christian Borntraeger
2025-11-18 14:19 ` Wanpeng Li
2025-11-11 6:28 ` K Prateek Nayak
2025-11-12 4:54 ` Wanpeng Li
2025-11-12 6:07 ` K Prateek Nayak
2025-11-13 5:37 ` Wanpeng Li
2025-11-13 4:42 ` K Prateek Nayak
2025-11-13 8:33 ` Wanpeng Li
2025-11-13 9:48 ` K Prateek Nayak
2025-11-13 13:56 ` Wanpeng Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251110033232.12538-10-kernellwp@gmail.com \
--to=kernellwp@gmail.com \
--cc=juri.lelli@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=seanjc@google.com \
--cc=tglx@linutronix.de \
--cc=vincent.guittot@linaro.org \
--cc=wanpengli@tencent.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox