From: Wanpeng Li <kernellwp@gmail.com>
To: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <seanjc@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Wanpeng Li <wanpengli@tencent.com>
Subject: [PATCH 03/10] sched/fair: Add cgroup LCA finder for hierarchical yield
Date: Mon, 10 Nov 2025 11:32:24 +0800
Message-ID: <20251110033232.12538-4-kernellwp@gmail.com>
In-Reply-To: <20251110033232.12538-1-kernellwp@gmail.com>

From: Wanpeng Li <wanpengli@tencent.com>

Implement yield_deboost_find_lca() to locate the lowest common ancestor
(LCA) in the cgroup hierarchy for EEVDF-aware yield operations.

The LCA is the hierarchy level at which vruntime adjustments should be
applied so that fairness is maintained across cgroup boundaries. This
matters for virtualization workloads where vCPUs may be organized in
nested cgroups.

For CONFIG_FAIR_GROUP_SCHED, walk up both entity hierarchies by
aligning depths, then ascend together until a common cfs_rq is found.
For a flat hierarchy, verify that both entities share the same cfs_rq.

Validate that meaningful contention exists (nr_queued > 1) and ensure
the yielding entity has a non-zero slice for safe penalty calculation.

The function operates under rq->lock protection. This static helper
will be integrated in subsequent patches.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
kernel/sched/fair.c | 60 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 60 insertions(+)
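
For readers who want to trace the depth-aligned walk outside the kernel,
here is a minimal userspace sketch. It is an illustration only and not part
of the patch: toy_se, toy_cfs_rq, toy_find_lca() and toy_demo() are
hypothetical stand-ins that assume each entity carries a depth, a parent
pointer and a pointer to the queue it sits on, mirroring the fields the
function in the diff below reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the fields the walk needs. */
struct toy_cfs_rq {
	unsigned int nr_queued;
};

struct toy_se {
	int depth;                 /* 0 for entities queued on the root queue */
	struct toy_se *parent;     /* NULL for entities queued on the root queue */
	struct toy_cfs_rq *cfs_rq; /* queue this entity is enqueued on */
	unsigned long slice;
};

bool toy_find_lca(struct toy_se *y, struct toy_se *t,
		  struct toy_se **y_out, struct toy_se **t_out,
		  struct toy_cfs_rq **rq_out)
{
	/* Step 1: lift the deeper entity until both sit at the same depth. */
	while (y && t && y->depth != t->depth) {
		if (t->depth > y->depth)
			t = t->parent;
		else
			y = y->parent;
	}

	/* Step 2: ascend in lockstep until both entities share a queue. */
	while (y && t && y->cfs_rq != t->cfs_rq) {
		y = y->parent;
		t = t->parent;
	}
	if (!y || !t)
		return false;

	/* Step 3: the same gates as the patch: contention and a non-zero slice. */
	if (y->cfs_rq->nr_queued <= 1 || !y->slice)
		return false;

	*y_out = y;
	*t_out = t;
	*rq_out = y->cfs_rq;
	return true;
}

/*
 * Build root -> { A -> y, B -> C -> t } and check that the walk stops at
 * the root queue with A and B as the competing entities.
 * Returns 0 when the result matches that expectation.
 */
int toy_demo(void)
{
	struct toy_cfs_rq root = { .nr_queued = 2 };
	struct toy_cfs_rq rq_a = { .nr_queued = 1 };
	struct toy_cfs_rq rq_b = { .nr_queued = 1 };
	struct toy_cfs_rq rq_c = { .nr_queued = 1 };

	struct toy_se a = { .depth = 0, .parent = NULL, .cfs_rq = &root, .slice = 3000000 };
	struct toy_se b = { .depth = 0, .parent = NULL, .cfs_rq = &root, .slice = 3000000 };
	struct toy_se c = { .depth = 1, .parent = &b, .cfs_rq = &rq_b, .slice = 3000000 };
	struct toy_se y = { .depth = 1, .parent = &a, .cfs_rq = &rq_a, .slice = 3000000 };
	struct toy_se t = { .depth = 2, .parent = &c, .cfs_rq = &rq_c, .slice = 3000000 };

	struct toy_se *y_lca = NULL, *t_lca = NULL;
	struct toy_cfs_rq *common = NULL;

	if (!toy_find_lca(&y, &t, &y_lca, &t_lca, &common))
		return 1;

	assert(common->nr_queued > 1); /* guaranteed by the contention gate */
	return (y_lca == &a && t_lca == &b && common == &root) ? 0 : 2;
}
```

In this layout t is one level deeper than y, so step 1 lifts t to group C.
C and y still sit on different queues, so step 2 lifts both once more and
stops at A and B, which share the root queue. Setting root.nr_queued to 1
or a.slice to 0 makes toy_find_lca() return false, matching the validation
described in the changelog.
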
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a7dc21c2dbdb..740c002b8f1c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9058,6 +9058,66 @@ static bool __maybe_unused yield_deboost_validate_tasks(struct rq *rq, struct ta
 	return true;
 }
 
+/*
+ * Find the lowest common ancestor (LCA) in the cgroup hierarchy for EEVDF.
+ * We walk up both entity hierarchies under rq->lock protection.
+ * Task migration requires task_rq_lock, ensuring parent chains remain stable.
+ * We locate the first common cfs_rq where both entities coexist, representing
+ * the appropriate level for vruntime adjustments and EEVDF field updates
+ * (deadline, vlag) to maintain scheduler consistency.
+ */
+static bool __maybe_unused yield_deboost_find_lca(struct sched_entity *se_y, struct sched_entity *se_t,
+				    struct sched_entity **se_y_lca_out,
+				    struct sched_entity **se_t_lca_out,
+				    struct cfs_rq **cfs_rq_common_out)
+{
+	struct sched_entity *se_y_lca, *se_t_lca;
+	struct cfs_rq *cfs_rq_common;
+
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	se_t_lca = se_t;
+	se_y_lca = se_y;
+
+	while (se_t_lca && se_y_lca && se_t_lca->depth != se_y_lca->depth) {
+		if (se_t_lca->depth > se_y_lca->depth)
+			se_t_lca = se_t_lca->parent;
+		else
+			se_y_lca = se_y_lca->parent;
+	}
+
+	while (se_t_lca && se_y_lca) {
+		if (cfs_rq_of(se_t_lca) == cfs_rq_of(se_y_lca)) {
+			cfs_rq_common = cfs_rq_of(se_t_lca);
+			goto found_lca;
+		}
+		se_t_lca = se_t_lca->parent;
+		se_y_lca = se_y_lca->parent;
+	}
+	return false;
+#else
+	if (cfs_rq_of(se_y) != cfs_rq_of(se_t))
+		return false;
+	cfs_rq_common = cfs_rq_of(se_y);
+	se_y_lca = se_y;
+	se_t_lca = se_t;
+#endif
+
+found_lca:
+	if (!se_y_lca || !se_t_lca)
+		return false;
+
+	if (cfs_rq_common->nr_queued <= 1)
+		return false;
+
+	if (!se_y_lca->slice)
+		return false;
+
+	*se_y_lca_out = se_y_lca;
+	*se_t_lca_out = se_t_lca;
+	*cfs_rq_common_out = cfs_rq_common;
+	return true;
+}
+
 /*
  * sched_yield() is very simple
  */
--
2.43.0