Re: [PATCH 09/10] sched_ext: idle: Get rid of the scx_selcpu_topo_numa logic


From: Andrea Righi <arighi@nvidia.com>
To: Yury Norov <yury.norov@gmail.com>
Cc: Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 09/10] sched_ext: idle: Get rid of the scx_selcpu_topo_numa logic
Date: Tue, 24 Dec 2024 09:58:25 +0100
Message-ID: <Z2p3sSZJRCIfS9jA@gpd3>
In-Reply-To: <Z2n0xDaP7Ulq1DSg@yury-ThinkPad>

On Mon, Dec 23, 2024 at 03:39:56PM -0800, Yury Norov wrote:
> On Fri, Dec 20, 2024 at 04:11:41PM +0100, Andrea Righi wrote:
> > With the introduction of separate per-NUMA node cpumasks, we
> > automatically track idle CPUs within each NUMA node.
> > 
> > This makes the special logic for determining idle CPUs in each NUMA node
> > redundant and unnecessary, so we can get rid of it.
> 
> But it looks like you do more than that... 
> 
> > Signed-off-by: Andrea Righi <arighi@nvidia.com>
> > ---
> >  kernel/sched/ext_idle.c | 93 ++++++++++-------------------------------
> >  1 file changed, 23 insertions(+), 70 deletions(-)
> > 
> > diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
> > index 013deaa08f12..b36e93da1b75 100644
> > --- a/kernel/sched/ext_idle.c
> > +++ b/kernel/sched/ext_idle.c
> > @@ -82,7 +82,6 @@ static void idle_masks_init(void)
> >  }
> >  
> >  static DEFINE_STATIC_KEY_FALSE(scx_selcpu_topo_llc);
> > -static DEFINE_STATIC_KEY_FALSE(scx_selcpu_topo_numa);
> >  
> >  /*
> >   * Return the node id associated to a target idle CPU (used to determine
> > @@ -259,25 +258,6 @@ static unsigned int numa_weight(s32 cpu)
> >  	return sg->group_weight;
> >  }
> >  
> > -/*
> > - * Return the cpumask representing the NUMA domain of @cpu (or NULL if the NUMA
> > - * domain is not defined).
> > - */
> > -static struct cpumask *numa_span(s32 cpu)
> > -{
> > -	struct sched_domain *sd;
> > -	struct sched_group *sg;
> > -
> > -	sd = rcu_dereference(per_cpu(sd_numa, cpu));
> > -	if (!sd)
> > -		return NULL;
> > -	sg = sd->groups;
> > -	if (!sg)
> > -		return NULL;
> > -
> > -	return sched_group_span(sg);
> 
> I didn't find llc_span() and node_span() in vanilla kernel. Does this series
> have prerequisites?

This patch set is based on the sched_ext/for-6.14 branch:
https://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git/

I mentioned sched_ext/for-6.14 in the cover letter, but maybe that wasn't
very clear; I should have included the git repo URL as well.

> 
> > -}
> > -
> >  /*
> >   * Return true if the LLC domains do not perfectly overlap with the NUMA
> >   * domains, false otherwise.
> > @@ -329,7 +309,7 @@ static bool llc_numa_mismatch(void)
> >   */
> >  static void update_selcpu_topology(struct sched_ext_ops *ops)
> >  {
> > -	bool enable_llc = false, enable_numa = false;
> > +	bool enable_llc = false;
> >  	unsigned int nr_cpus;
> >  	s32 cpu = cpumask_first(cpu_online_mask);
> >  
> > @@ -348,41 +328,34 @@ static void update_selcpu_topology(struct sched_ext_ops *ops)
> >  	if (nr_cpus > 0) {
> >  		if (nr_cpus < num_online_cpus())
> >  			enable_llc = true;
> > +		/*
> > +		 * No need to enable LLC optimization if the LLC domains are
> > +		 * perfectly overlapping with the NUMA domains when per-node
> > +		 * cpumasks are enabled.
> > +		 */
> > +		if ((ops->flags & SCX_OPS_BUILTIN_IDLE_PER_NODE) &&
> > +		    !llc_numa_mismatch())
> > +			enable_llc = false;
> 
> This doesn't sound like redundancy removal. I may be wrong, but this
> looks like a sort of optimization. If so, it deserves to be a separate
> patch.

So, the initial idea was to replace the current NUMA awareness logic with
the per-node cpumasks.

But in fact, we're making this change:

 - before:
   - NUMA-awareness logic implicitly enabled only if the node CPUs don't
     perfectly overlap with the LLC CPUs (otherwise it would be redundant)

 - after:
   - NUMA-awareness logic explicitly enabled when the scx scheduler sets
     SCX_OPS_BUILTIN_IDLE_PER_NODE in .flags (and in this case LLC awareness
     is implicitly disabled if the node and LLC CPUs overlap perfectly)
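To make the behavior change concrete, here is a rough user-space model of the two decision paths. It is a sketch, not kernel code: struct topo_info, struct topo_sel, topo_before() and topo_after() are hypothetical names, the inputs stand for the values update_selcpu_topology() derives from the first online CPU, and the "before" path assumes the for-6.14 code enables NUMA awareness only when the node spans fewer than all online CPUs and llc_numa_mismatch() returns true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the topology inputs (not a kernel structure). */
struct topo_info {
	unsigned int nr_online;   /* models num_online_cpus() */
	unsigned int llc_weight;  /* CPUs in the LLC domain of the first online CPU */
	unsigned int numa_weight; /* CPUs in the NUMA node of the first online CPU */
	bool llc_numa_mismatch;   /* models the result of llc_numa_mismatch() */
};

/* Models the scx_selcpu_topo_llc / scx_selcpu_topo_numa static keys. */
struct topo_sel {
	bool enable_llc;
	bool enable_numa;
};

/* Before: NUMA awareness is implicit, driven only by the topology. */
static struct topo_sel topo_before(const struct topo_info *t)
{
	struct topo_sel s = { false, false };

	assert(t->llc_weight <= t->nr_online && t->numa_weight <= t->nr_online);
	/* Multiple LLC domains among the online CPUs. */
	if (t->llc_weight > 0 && t->llc_weight < t->nr_online)
		s.enable_llc = true;
	/* Multiple nodes, and the nodes don't perfectly overlap with the LLCs. */
	if (t->numa_weight < t->nr_online && t->llc_numa_mismatch)
		s.enable_numa = true;
	return s;
}

/*
 * After (this patch): the NUMA key is gone; NUMA awareness comes from the
 * per-node idle cpumasks when per_node (SCX_OPS_BUILTIN_IDLE_PER_NODE) is set.
 */
static struct topo_sel topo_after(const struct topo_info *t, bool per_node)
{
	struct topo_sel s = { false, false };

	assert(t->llc_weight <= t->nr_online);
	if (t->llc_weight > 0 && t->llc_weight < t->nr_online)
		s.enable_llc = true;
	/* LLC == node: the per-node cpumasks already cover the LLC step. */
	if (per_node && !t->llc_numa_mismatch)
		s.enable_llc = false;
	return s;
}
```

Note that with per_node set to false, topo_after() never enables any NUMA awareness, which is the gap the proposal that follows would close.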

Maybe a better approach would be to keep the old NUMA/LLC logic exactly as
it is in sched_ext/for-6.14 if SCX_OPS_BUILTIN_IDLE_PER_NODE is not
specified, and otherwise use the new logic (implicitly disabling
scx_selcpu_topo_numa).

This way, this "removal" patch would only implement the logic to disable
scx_selcpu_topo_numa when SCX_OPS_BUILTIN_IDLE_PER_NODE is used.
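A minimal sketch of that proposed split, again as a user-space model rather than the actual patch: proposed_selcpu_topology() is a hypothetical helper, per_node stands in for SCX_OPS_BUILTIN_IDLE_PER_NODE being set in ops->flags, and the two output flags stand for the scx_selcpu_topo_llc / scx_selcpu_topo_numa keys. The "flag not set" branch assumes the for-6.14 NUMA condition (node narrower than the online CPUs and llc_numa_mismatch() true).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the proposed update_selcpu_topology() behavior.
 * Inputs mirror the values derived from the first online CPU.
 */
static void proposed_selcpu_topology(unsigned int nr_online,
				     unsigned int llc_weight,
				     unsigned int numa_weight,
				     bool llc_numa_mismatch, bool per_node,
				     bool *enable_llc, bool *enable_numa)
{
	assert(llc_weight <= nr_online && numa_weight <= nr_online);

	/* LLC awareness only matters with multiple LLC domains. */
	*enable_llc = llc_weight > 0 && llc_weight < nr_online;
	*enable_numa = false;

	if (!per_node) {
		/* Flag not set: keep the existing for-6.14 logic unchanged. */
		*enable_numa = numa_weight < nr_online && llc_numa_mismatch;
		return;
	}

	/*
	 * Flag set: the per-node idle cpumasks provide NUMA awareness, so the
	 * NUMA key stays off, and LLC awareness is dropped when the LLC and
	 * NUMA domains overlap perfectly (it would scan the same CPUs twice).
	 */
	if (!llc_numa_mismatch)
		*enable_llc = false;
}
```

Schedulers that don't opt in keep exactly the current behavior, and only those setting the flag see the NUMA key disabled.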

-Andrea


Thread overview: 30+ messages
2024-12-20 15:11 [PATCHSET v8 sched_ext/for-6.14] sched_ext: split global idle cpumask into per-NUMA cpumasks Andrea Righi
2024-12-20 15:11 ` [PATCH 01/10] sched/topology: introduce for_each_numa_hop_node() / sched_numa_hop_node() Andrea Righi
2024-12-23 21:18   ` Yury Norov
2024-12-24  7:54     ` Andrea Righi
2024-12-24 17:33       ` Yury Norov
2024-12-20 15:11 ` [PATCH 02/10] sched_ext: Move built-in idle CPU selection policy to a separate file Andrea Righi
2024-12-24 21:21   ` Tejun Heo
2024-12-20 15:11 ` [PATCH 03/10] sched_ext: idle: introduce check_builtin_idle_enabled() helper Andrea Righi
2024-12-20 15:11 ` [PATCH 04/10] sched_ext: idle: use assign_cpu() to update the idle cpumask Andrea Righi
2024-12-23 22:26   ` Yury Norov
2024-12-20 15:11 ` [PATCH 05/10] sched_ext: idle: clarify comments Andrea Righi
2024-12-23 22:28   ` Yury Norov
2024-12-20 15:11 ` [PATCH 06/10] sched_ext: Introduce SCX_OPS_NODE_BUILTIN_IDLE Andrea Righi
2024-12-20 15:11 ` [PATCH 07/10] sched_ext: Introduce per-node idle cpumasks Andrea Righi
2024-12-24  4:05   ` Yury Norov
2024-12-24  8:18     ` Andrea Righi
2024-12-24 17:59       ` Yury Norov
2024-12-20 15:11 ` [PATCH 08/10] sched_ext: idle: introduce SCX_PICK_IDLE_NODE Andrea Righi
2024-12-24  2:48   ` Yury Norov
2024-12-24  3:53     ` Yury Norov
2024-12-24  8:37       ` Andrea Righi
2024-12-24 18:15         ` Yury Norov
2024-12-24  8:22     ` Andrea Righi
2024-12-24 21:29       ` Tejun Heo
2024-12-20 15:11 ` [PATCH 09/10] sched_ext: idle: Get rid of the scx_selcpu_topo_numa logic Andrea Righi
2024-12-23 23:39   ` Yury Norov
2024-12-24  8:58     ` Andrea Righi [this message]
2024-12-20 15:11 ` [PATCH 10/10] sched_ext: idle: Introduce NUMA aware idle cpu kfunc helpers Andrea Righi
2024-12-24  0:57   ` Yury Norov
2024-12-24  9:32     ` Andrea Righi
