From: Wanpeng Li <kernellwp@gmail.com>
To: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <seanjc@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Wanpeng Li <wanpengli@tencent.com>,
	Richie Buturla <richie@linux.ibm.com>
Subject: [PATCH v3 04/10] sched/fair: Credit nominated next-buddy in yield_to_task_fair()
Date: Fri, 12 Jun 2026 09:33:49 +0800
Message-ID: <20260612013355.59231-5-kernellwp@gmail.com>
In-Reply-To: <20260612013355.59231-1-kernellwp@gmail.com>

From: Wanpeng Li <wanpengli@tencent.com>

After set_next_buddy() nominates the yield_to() target at every level of
its sched_entity hierarchy, walk that same hierarchy and credit bounded
EEVDF lag to each not-yet-eligible entity. This allows pick_eevdf()'s
PICK_BUDDY path to select the nominated target instead of dropping the
hint at the first ineligible group entity.

Gate the walk with YIELD_TO_LAG_CREDIT. With the feature disabled,
yield_to_task_fair() keeps the existing forfeit-based behavior.

yield_to() holds both rq locks via double_rq_lock(), so touching the
target task's cfs_rqs, including remote cfs_rqs, is safe. Stop the walk
where set_next_buddy() stopped, and skip delayed or throttled entities.

Refresh the target rq clock when it differs from the local rq so the
per-level update_curr() calls observe current rq_clock values. The local
rq still uses the existing yield_task_fair() path in this change.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
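Note for reviewers (not part of the commit): the walk described above can
be sketched as a small userspace model. This is a minimal sketch under
stated assumptions, not kernel code. The toy_* types and toy_credit_walk()
are hypothetical stand-ins; in particular, the credit step here just raises
a negative vlag to a caller-supplied margin, whereas the real
eevdf_credit_entity_vlag() derives a bounded margin itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_se;

/* Hypothetical stand-in for struct cfs_rq. */
struct toy_cfs_rq {
	struct toy_se *next;	/* buddy nominated at this level */
	bool throttled;		/* stands in for throttled_hierarchy() */
};

/* Hypothetical stand-in for struct sched_entity. */
struct toy_se {
	struct toy_se *parent;		/* NULL at the top level */
	struct toy_cfs_rq *cfs_rq;	/* queue this entity sits on */
	bool sched_delayed;
	long vlag;			/* assumption: < 0 means not yet eligible */
};

/*
 * Walk from the target upwards and credit each not-yet-eligible level.
 * Stop at the first level that was not nominated, is delayed, or is
 * throttled, mirroring the three break conditions in the patch.
 * Returns the number of levels that received credit.
 */
int toy_credit_walk(struct toy_se *se, long margin)
{
	int credited = 0;

	assert(margin >= 0);

	for (; se; se = se->parent) {
		struct toy_cfs_rq *cfs_rq = se->cfs_rq;

		if (cfs_rq->next != se)
			break;
		if (se->sched_delayed)
			break;
		if (cfs_rq->throttled)
			break;

		/* Assumed credit rule: lift negative lag to the margin. */
		if (se->vlag < 0) {
			se->vlag = margin;
			credited++;
		}
	}

	return credited;
}
```

In this model a second call over the same chain credits nothing, which is
the property the patch relies on when it says no rate limiting is needed.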
 kernel/sched/fair.c | 44 +++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 41 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c28682fedb36..48f65a4f1923 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9346,8 +9346,8 @@ static void put_prev_task_fair(struct rq *rq, struct task_struct *prev, struct t
  * depth so it stays eligible across several picks. The caller clamps it to
  * entity_lag()'s legal bound, so EEVDF fairness is preserved.
  */
-static u64 __maybe_unused
-eevdf_persistent_margin(struct cfs_rq *cfs_rq, struct sched_entity *se)
+static u64 eevdf_persistent_margin(struct cfs_rq *cfs_rq,
+				   struct sched_entity *se)
 {
 	u64 base = sysctl_sched_base_slice;
 	unsigned int n = cfs_rq->h_nr_queued;
@@ -9379,7 +9379,7 @@ eevdf_persistent_margin(struct cfs_rq *cfs_rq, struct sched_entity *se)
  * Idempotent once @se holds the margin. Caller must hold
  * rq_of(cfs_rq)->lock with rq_clock up to date.
  */
-static void __maybe_unused
+static void
 eevdf_credit_entity_vlag(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	u64 avruntime, credit, want, margin, max_slice, lag_limit;
@@ -9488,6 +9488,7 @@ static void yield_task_fair(struct rq *rq)
 static bool yield_to_task_fair(struct rq *rq, struct task_struct *p)
 {
 	struct sched_entity *se = &p->se;
+	struct rq *p_rq = task_rq(p);
 
 	/* !se->on_rq also covers throttled task */
 	if (!se->on_rq)
@@ -9496,6 +9497,43 @@ static bool yield_to_task_fair(struct rq *rq, struct task_struct *p)
 	/* Tell the scheduler that we'd really like se to run next. */
 	set_next_buddy(se);
 
+	/* Without lag credit, keep the existing forfeit-based yield. */
+	if (!sched_feat(YIELD_TO_LAG_CREDIT)) {
+		yield_task_fair(rq);
+		return true;
+	}
+
+	/*
+	 * Walk the ancestor chain set_next_buddy() just nominated and credit
+	 * bounded lag to each not-yet-eligible level so pick_eevdf() returns
+	 * it. yield_to() holds both rq locks via double_rq_lock(), so touching
+	 * p's cfs_rqs (possibly on another CPU) is safe; the primitive is
+	 * idempotent, so no rate limiting is needed.
+	 *
+	 * Only refresh p_rq's clock when it differs from the local rq. A
+	 * remote p_rq must be refreshed so the per-level update_curr() is
+	 * accurate. In the same-rq case we skip it: the credit is a
+	 * best-effort hint and the rq clock is recent enough, while the
+	 * trailing yield_task_fair() would otherwise make this a second
+	 * update_rq_clock() on the same rq and trip the WARN_DOUBLE_CLOCK
+	 * check.
+	 */
+	if (rq != p_rq)
+		update_rq_clock(p_rq);
+
+	for_each_sched_entity(se) {
+		struct cfs_rq *cfs_rq = cfs_rq_of(se);
+
+		if (cfs_rq->next != se)
+			break;
+		if (se->sched_delayed)
+			break;
+		if (throttled_hierarchy(cfs_rq))
+			break;
+
+		eevdf_credit_entity_vlag(cfs_rq, se);
+	}
+
 	yield_task_fair(rq);
 
 	return true;
--
2.43.0