* [PATCH v2 sched_ext/for-7.1] sched_ext: idle: Prioritize idle SMT sibling
@ 2026-03-20 17:28 Andrea Righi
2026-03-21 18:42 ` Tejun Heo
From: Andrea Righi @ 2026-03-20 17:28 UTC (permalink / raw)
To: Tejun Heo, David Vernet, Changwoo Min
Cc: Daniel Jordan, Emil Tsalapatis, Daniel Hodges, sched-ext,
linux-kernel, Cheng-Yang Chou
In the default built-in idle CPU selection policy, when @prev_cpu is
busy and no fully idle core is available, try to place the task on its
SMT sibling if that sibling is idle, before searching for any other idle
CPU in the same LLC.

Migration to the sibling is cheap and keeps the task on the same core,
preserving L1 cache locality and reducing wakeup latency.

On large SMT systems this appears to consistently boost throughput by
roughly 2-3% on CPU-bound workloads (running as many tasks as there are
SMT cores).
Cc: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
Changes in v2:
- Update scx_select_cpu_dfl() documentation to reflect the SMT sibling
preference (Cheng-Yang Chou)
- Link to v1: https://lore.kernel.org/all/20260318003842.762275-1-arighi@nvidia.com/
kernel/sched/ext_idle.c | 26 ++++++++++++++++++++++----
1 file changed, 22 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
index c7e4052626979..d9596427b5aa1 100644
--- a/kernel/sched/ext_idle.c
+++ b/kernel/sched/ext_idle.c
@@ -424,18 +424,24 @@ static inline bool task_affinity_all(const struct task_struct *p)
* - prefer the last used CPU to take advantage of cached data (L1, L2) and
* branch prediction optimizations.
*
- * 3. Pick a CPU within the same LLC (Last-Level Cache):
+ * 3. Prefer @prev_cpu's SMT sibling:
+ * - if @prev_cpu is busy and no fully idle core is available, try to
+ * place the task on an idle SMT sibling of @prev_cpu; keeping the
+ * task on the same core makes migration cheaper, preserves L1 cache
+ * locality and reduces wakeup latency.
+ *
+ * 4. Pick a CPU within the same LLC (Last-Level Cache):
* - if the above conditions aren't met, pick a CPU that shares the same
* LLC, if the LLC domain is a subset of @cpus_allowed, to maintain
* cache locality.
*
- * 4. Pick a CPU within the same NUMA node, if enabled:
+ * 5. Pick a CPU within the same NUMA node, if enabled:
* - choose a CPU from the same NUMA node, if the node cpumask is a
* subset of @cpus_allowed, to reduce memory access latency.
*
- * 5. Pick any idle CPU within the @cpus_allowed domain.
+ * 6. Pick any idle CPU within the @cpus_allowed domain.
*
- * Step 3 and 4 are performed only if the system has, respectively,
+ * Step 4 and 5 are performed only if the system has, respectively,
* multiple LLCs / multiple NUMA nodes (see scx_selcpu_topo_llc and
* scx_selcpu_topo_numa) and they don't contain the same subset of CPUs.
*
@@ -616,6 +622,18 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
goto out_unlock;
}
+ /*
+ * Use @prev_cpu's sibling if it's idle.
+ */
+ if (sched_smt_active()) {
+ for_each_cpu_and(cpu, cpu_smt_mask(prev_cpu), allowed) {
+ if (cpu == prev_cpu)
+ continue;
+ if (scx_idle_test_and_clear_cpu(cpu))
+ goto out_unlock;
+ }
+ }
+
/*
* Search for any idle CPU in the same LLC domain.
*/
--
2.53.0