public inbox for sched-ext@lists.linux.dev
* [PATCH sched_ext/for-7.1] sched_ext: idle: Prioritize idle SMT sibling
@ 2026-03-18  0:38 Andrea Righi
  2026-03-18  1:11 ` Cheng-Yang Chou
  2026-03-20 15:57 ` Daniel Jordan
  0 siblings, 2 replies; 5+ messages in thread
From: Andrea Righi @ 2026-03-18  0:38 UTC (permalink / raw)
  To: Tejun Heo, David Vernet, Changwoo Min
  Cc: Emil Tsalapatis, Daniel Hodges, sched-ext, linux-kernel

In the default built-in idle CPU selection policy, when @prev_cpu is
busy and no fully idle core is available, try to place the task on its
SMT sibling if that sibling is idle, before searching any other idle CPU
in the same LLC.

Migration to the sibling is cheap and keeps the task on the same core,
preserving L1 cache and reducing wakeup latency.

On large SMT systems this appears to consistently boost throughput by
roughly 2-3% on CPU-bound workloads (running a number of tasks equal to
the number of SMT cores).

Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/ext_idle.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
index c7e4052626979..e0c57355b33b8 100644
--- a/kernel/sched/ext_idle.c
+++ b/kernel/sched/ext_idle.c
@@ -616,6 +616,18 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
 		goto out_unlock;
 	}
 
+	/*
+	 * Use @prev_cpu's sibling if it's idle.
+	 */
+	if (sched_smt_active()) {
+		for_each_cpu_and(cpu, cpu_smt_mask(prev_cpu), allowed) {
+			if (cpu == prev_cpu)
+				continue;
+			if (scx_idle_test_and_clear_cpu(cpu))
+				goto out_unlock;
+		}
+	}
+
 	/*
 	 * Search for any idle CPU in the same LLC domain.
 	 */
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH sched_ext/for-7.1] sched_ext: idle: Prioritize idle SMT sibling
  2026-03-18  0:38 [PATCH sched_ext/for-7.1] sched_ext: idle: Prioritize idle SMT sibling Andrea Righi
@ 2026-03-18  1:11 ` Cheng-Yang Chou
  2026-03-20 16:23   ` Andrea Righi
  2026-03-20 15:57 ` Daniel Jordan
  1 sibling, 1 reply; 5+ messages in thread
From: Cheng-Yang Chou @ 2026-03-18  1:11 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Tejun Heo, David Vernet, Changwoo Min, Emil Tsalapatis,
	Daniel Hodges, sched-ext, linux-kernel, Ching-Chun Huang,
	Chia-Ping Tsai

Hi Andrea,

On Wed, Mar 18, 2026 at 01:38:42AM +0100, Andrea Righi wrote:
> In the default built-in idle CPU selection policy, when @prev_cpu is
> busy and no fully idle core is available, try to place the task on its
> SMT sibling if that sibling is idle, before searching any other idle CPU
> in the same LLC.
> 
> Migration to the sibling is cheap and keeps the task on the same core,
> preserving L1 cache and reducing wakeup latency.
> 
> On large SMT systems this appears to consistently boost throughput by
> roughly 2-3% on CPU-bound workloads (running a number of tasks equal to
> the number of SMT cores).
> 
> Signed-off-by: Andrea Righi <arighi@nvidia.com>
> ---
>  kernel/sched/ext_idle.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
> index c7e4052626979..e0c57355b33b8 100644
> --- a/kernel/sched/ext_idle.c
> +++ b/kernel/sched/ext_idle.c
> @@ -616,6 +616,18 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
>  		goto out_unlock;
>  	}
>  
> +	/*
> +	 * Use @prev_cpu's sibling if it's idle.
> +	 */
> +	if (sched_smt_active()) {
> +		for_each_cpu_and(cpu, cpu_smt_mask(prev_cpu), allowed) {
> +			if (cpu == prev_cpu)
> +				continue;
> +			if (scx_idle_test_and_clear_cpu(cpu))
> +				goto out_unlock;
> +		}
> +	}
> +
>  	/*
>  	 * Search for any idle CPU in the same LLC domain.
>  	 */
> -- 
> 2.53.0
> 

Overall looks good, just a nit:

The block comment at the top of scx_select_cpu_dfl() still lists 5
steps. With this patch a new step should be added and the numbering
updated accordingly.
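For instance, the new step could be slotted in between the fully-idle-core search and the LLC scan, something along these lines (the wording and numbering here are hypothetical, not the actual comment text in ext_idle.c):

```
 * 4. Pick @prev_cpu's idle SMT sibling, if any, before considering
 *    any other idle CPU in the same LLC domain.
```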

-- 
Thanks,
Cheng-Yang


* Re: [PATCH sched_ext/for-7.1] sched_ext: idle: Prioritize idle SMT sibling
  2026-03-18  0:38 [PATCH sched_ext/for-7.1] sched_ext: idle: Prioritize idle SMT sibling Andrea Righi
  2026-03-18  1:11 ` Cheng-Yang Chou
@ 2026-03-20 15:57 ` Daniel Jordan
  2026-03-20 16:28   ` Andrea Righi
  1 sibling, 1 reply; 5+ messages in thread
From: Daniel Jordan @ 2026-03-20 15:57 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Tejun Heo, David Vernet, Changwoo Min, Emil Tsalapatis,
	Daniel Hodges, sched-ext, linux-kernel

Hi Andrea,

On Wed, Mar 18, 2026 at 01:38:42AM +0100, Andrea Righi wrote:
> In the default built-in idle CPU selection policy, when @prev_cpu is
> busy and no fully idle core is available, try to place the task on its
> SMT sibling if that sibling is idle, before searching any other idle CPU
> in the same LLC.
> 
> Migration to the sibling is cheap and keeps the task on the same core,
> preserving L1 cache and reducing wakeup latency.

Seems reasonable.

> On large SMT systems this appears to consistently boost throughput by
> roughly 2-3% on CPU-bound workloads (running a number of tasks equal to
> the number of SMT cores).

What workloads, out of curiosity?


* Re: [PATCH sched_ext/for-7.1] sched_ext: idle: Prioritize idle SMT sibling
  2026-03-18  1:11 ` Cheng-Yang Chou
@ 2026-03-20 16:23   ` Andrea Righi
  0 siblings, 0 replies; 5+ messages in thread
From: Andrea Righi @ 2026-03-20 16:23 UTC (permalink / raw)
  To: Cheng-Yang Chou
  Cc: Tejun Heo, David Vernet, Changwoo Min, Emil Tsalapatis,
	Daniel Hodges, sched-ext, linux-kernel, Ching-Chun Huang,
	Chia-Ping Tsai

Hi Cheng-Yang,

On Wed, Mar 18, 2026 at 09:11:29AM +0800, Cheng-Yang Chou wrote:
> Hi Andrea,
> 
> On Wed, Mar 18, 2026 at 01:38:42AM +0100, Andrea Righi wrote:
> > In the default built-in idle CPU selection policy, when @prev_cpu is
> > busy and no fully idle core is available, try to place the task on its
> > SMT sibling if that sibling is idle, before searching any other idle CPU
> > in the same LLC.
> > 
> > Migration to the sibling is cheap and keeps the task on the same core,
> > preserving L1 cache and reducing wakeup latency.
> > 
> > On large SMT systems this appears to consistently boost throughput by
> > roughly 2-3% on CPU-bound workloads (running a number of tasks equal to
> > the number of SMT cores).
> > 
> > Signed-off-by: Andrea Righi <arighi@nvidia.com>
> > ---
> >  kernel/sched/ext_idle.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
> > index c7e4052626979..e0c57355b33b8 100644
> > --- a/kernel/sched/ext_idle.c
> > +++ b/kernel/sched/ext_idle.c
> > @@ -616,6 +616,18 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
> >  		goto out_unlock;
> >  	}
> >  
> > +	/*
> > +	 * Use @prev_cpu's sibling if it's idle.
> > +	 */
> > +	if (sched_smt_active()) {
> > +		for_each_cpu_and(cpu, cpu_smt_mask(prev_cpu), allowed) {
> > +			if (cpu == prev_cpu)
> > +				continue;
> > +			if (scx_idle_test_and_clear_cpu(cpu))
> > +				goto out_unlock;
> > +		}
> > +	}
> > +
> >  	/*
> >  	 * Search for any idle CPU in the same LLC domain.
> >  	 */
> > -- 
> > 2.53.0
> > 
> 
> Overall looks good, just a nit:
> 
> The block comment at the top of scx_select_cpu_dfl() still lists 5
> steps. With this patch a new step should be added and the numbering
> updated accordingly.

Ah yes, good catch, we should update the comment as well. Will send a v2.

Thanks,
-Andrea


* Re: [PATCH sched_ext/for-7.1] sched_ext: idle: Prioritize idle SMT sibling
  2026-03-20 15:57 ` Daniel Jordan
@ 2026-03-20 16:28   ` Andrea Righi
  0 siblings, 0 replies; 5+ messages in thread
From: Andrea Righi @ 2026-03-20 16:28 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Tejun Heo, David Vernet, Changwoo Min, Emil Tsalapatis,
	Daniel Hodges, sched-ext, linux-kernel

Hi Daniel,

On Fri, Mar 20, 2026 at 11:57:02AM -0400, Daniel Jordan wrote:
> Hi Andrea,
> 
> On Wed, Mar 18, 2026 at 01:38:42AM +0100, Andrea Righi wrote:
> > In the default built-in idle CPU selection policy, when @prev_cpu is
> > busy and no fully idle core is available, try to place the task on its
> > SMT sibling if that sibling is idle, before searching any other idle CPU
> > in the same LLC.
> > 
> > Migration to the sibling is cheap and keeps the task on the same core,
> > preserving L1 cache and reducing wakeup latency.
> 
> Seems reasonable.
> 
> > On large SMT systems this appears to consistently boost throughput by
> > roughly 2-3% on CPU-bound workloads (running a number of tasks equal to
> > the number of SMT cores).
> 
> What workloads out of curiosity?

On the "server" side, I've used an internal benchmark suite; the workload
showing the best results (3% speedup) is based on NVBLAS, but doing pure
CPU activity.

I've also tested this locally (AMD Ryzen 9 laptop) with the usual gaming
benchmarks, checking avg fps and tail latency. I noticed small improvements
for this use case as well (still in the 1-2% range, nothing impressive, but
it looks consistent).

Thanks,
-Andrea

