All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrea Righi <arighi@nvidia.com>
To: Joel Fernandes <joelagnelf@nvidia.com>
Cc: linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>,
	David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>
Subject: Re: [PATCH RFC] sched_ext: Choose prev_cpu if idle and cache affine without WF_SYNC
Date: Mon, 17 Mar 2025 18:20:53 +0100	[thread overview]
Message-ID: <Z9hZ9fgtGNd8DeEf@gpd3> (raw)
In-Reply-To: <20250317082803.3071809-1-joelagnelf@nvidia.com>

Hi Joel,

On Mon, Mar 17, 2025 at 04:28:02AM -0400, Joel Fernandes wrote:
> Consider that the previous CPU is cache affined to the waker's CPU and
> is idle. Currently, scx's default select function only selects the
> previous CPU in this case if WF_SYNC request is also made to wakeup on the
> waker's CPU.
> 
> This means, without WF_SYNC, the previous CPU being cache affined to the
> waker and is idle is not considered. This seems extreme. WF_SYNC is not
> normally passed to the wakeup path outside of some IPC drivers but it is
> very possible that the task is cache hot on previous CPU and shares
> cache with the waker CPU. Lets avoid too many migrations and select the
> previous CPU in such cases.
> 
> This change is consistent with the fair scheduler's behavior as well. In
> select_idle_sibling(), the previous CPU is selected if it is cache
> affined with the target. This is done regardless of WF_SYNC and before
> any scanning of fully idle cores is done.
> 
> One difference still exists though between SCX and CFS in this regard, in CFS
> we first check if the target CPU is idle before checking if the previous CPU is
> idle. However that could be a matter of choice and in the future, that behavior
> could also be unified.
> 
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---
>  kernel/sched/ext.c | 24 +++++++++++-------------
>  1 file changed, 11 insertions(+), 13 deletions(-)
> 
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 5a81d9a1e31f..3b1a489e2aaf 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -3479,7 +3479,7 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
>  {
>  	const struct cpumask *llc_cpus = NULL;
>  	const struct cpumask *numa_cpus = NULL;
> -	s32 cpu;
> +	s32 cpu = smp_processor_id();
>  
>  	*found = false;
>  
> @@ -3507,22 +3507,20 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
>  			llc_cpus = llc_span(prev_cpu);
>  	}
>  
> +	/*
> +	 * If the waker's CPU is cache affine and prev_cpu is idle, then avoid
> +	 * a migration.
> +	 */
> +	if (cpus_share_cache(cpu, prev_cpu) &&
> +		test_and_clear_cpu_idle(prev_cpu)) {
> +		cpu = prev_cpu;
> +		goto cpu_found;
> +	}
> +

One potential issue that I see is that when SMT is enabled, you may end up
using prev_cpu also when the other sibling is busy. Maybe we should check
if prev_cpu is set in the SMT idle cpumask.

Also, last time I tried a similar change I was regressing a lot of
benchmarks. Maybe we should repeat the tests and get some numbers.

>  	/*
>  	 * If WAKE_SYNC, try to migrate the wakee to the waker's CPU.
>  	 */
>  	if (wake_flags & SCX_WAKE_SYNC) {
> -		cpu = smp_processor_id();
> -
> -		/*
> -		 * If the waker's CPU is cache affine and prev_cpu is idle,
> -		 * then avoid a migration.
> -		 */
> -		if (cpus_share_cache(cpu, prev_cpu) &&
> -		    test_and_clear_cpu_idle(prev_cpu)) {
> -			cpu = prev_cpu;
> -			goto cpu_found;
> -		}
> -
>  		/*
>  		 * If the waker's local DSQ is empty, and the system is under
>  		 * utilized, try to wake up @p to the local DSQ of the waker.
> -- 
> 2.43.0
> 

Thanks,
-Andrea

  parent reply	other threads:[~2025-03-17 17:21 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-17  8:28 [PATCH RFC] sched_ext: Choose prev_cpu if idle and cache affine without WF_SYNC Joel Fernandes
2025-03-17 17:08 ` Tejun Heo
2025-03-17 17:30   ` Andrea Righi
2025-03-17 17:44     ` Peter Zijlstra
2025-03-18  5:17       ` Andrea Righi
2025-03-17 22:11     ` Joel Fernandes
2025-03-18  5:09       ` Andrea Righi
2025-03-18 17:00         ` Joel Fernandes
2025-03-18 17:46           ` Andrea Righi
2025-03-18 22:37             ` Joel Fernandes
2025-03-17 22:07   ` Joel Fernandes
2025-03-17 22:25     ` Tejun Heo
2025-03-17 22:45       ` Joel Fernandes
2025-03-18  0:12   ` Libo Chen
2025-03-18  0:14     ` Tejun Heo
2025-03-17 17:20 ` Andrea Righi [this message]
2025-03-17 22:44   ` Joel Fernandes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z9hZ9fgtGNd8DeEf@gpd3 \
    --to=arighi@nvidia.com \
    --cc=bsegall@google.com \
    --cc=changwoo@igalia.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=joelagnelf@nvidia.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tj@kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=void@manifault.com \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.