All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	Tim Chen <tim.c.chen@intel.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Libo Chen <libo.chen@oracle.com>,
	Abel Wu <wuyun.abel@bytedance.com>,
	Len Brown <len.brown@intel.com>,
	linux-kernel@vger.kernel.org, Chen Yu <yu.c.chen@intel.com>,
	K Prateek Nayak <kprateek.nayak@amd.com>,
	"Gautham R . Shenoy" <gautham.shenoy@amd.com>,
	Zhao Liu <zhao1.liu@intel.com>,
	Vinicius Costa Gomes <vinicius.gomes@intel.com>,
	Arjan Van De Ven <arjan.van.de.ven@intel.com>
Subject: Re: [PATCH v3 1/2] sched: Create architecture specific sched domain distances
Date: Mon, 15 Sep 2025 14:37:38 +0200	[thread overview]
Message-ID: <20250915123738.GD3245006@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <1aa0ae94e95c45c8f3353f12e6494907df339632.1757614784.git.tim.c.chen@linux.intel.com>

On Thu, Sep 11, 2025 at 11:30:56AM -0700, Tim Chen wrote:
> Allow architecture specific sched domain NUMA distances that can be
> modified from NUMA node distances for the purpose of building NUMA
> sched domains.
> 
> The actual NUMA distances are kept separately.  This allows for NUMA
> domain levels modification when building sched domains for specific
> architectures.
> 
> Consolidate the recording of unique NUMA distances in an array to
> sched_record_numa_dist() so the function can be reused to record NUMA
> distances when the NUMA distance metric is changed.
> 
> No functional change if there's no arch specific NUMA distances
> are being defined.

Keeping both metrics side-by-side is confusing -- and not very well
justified by the above.

Is there any appreciable benefit to mixing the two like this?

> 
> Co-developed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> ---
>  include/linux/sched/topology.h |   2 +
>  kernel/sched/topology.c        | 114 ++++++++++++++++++++++++++++-----
>  2 files changed, 99 insertions(+), 17 deletions(-)
> 
> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> index 5263746b63e8..4f58e78ca52e 100644
> --- a/include/linux/sched/topology.h
> +++ b/include/linux/sched/topology.h
> @@ -59,6 +59,8 @@ static inline int cpu_numa_flags(void)
>  #endif
>  
>  extern int arch_asym_cpu_priority(int cpu);
> +extern int arch_sched_node_distance(int from, int to);
> +extern int sched_avg_remote_numa_distance;
>  
>  struct sched_domain_attr {
>  	int relax_domain_level;
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 977e133bb8a4..6c0ff62322cb 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1591,10 +1591,13 @@ static void claim_allocations(int cpu, struct sched_domain *sd)
>  enum numa_topology_type sched_numa_topology_type;
>  
>  static int			sched_domains_numa_levels;
> +static int			sched_numa_node_levels;
>  static int			sched_domains_curr_level;
>  
>  int				sched_max_numa_distance;
> +int				sched_avg_remote_numa_distance;
>  static int			*sched_domains_numa_distance;
> +static int			*sched_numa_node_distance;
>  static struct cpumask		***sched_domains_numa_masks;
>  #endif /* CONFIG_NUMA */
>  
> @@ -1808,10 +1811,10 @@ bool find_numa_distance(int distance)
>  		return true;
>  
>  	rcu_read_lock();
> -	distances = rcu_dereference(sched_domains_numa_distance);
> +	distances = rcu_dereference(sched_numa_node_distance);
>  	if (!distances)
>  		goto unlock;
> -	for (i = 0; i < sched_domains_numa_levels; i++) {
> +	for (i = 0; i < sched_numa_node_levels; i++) {
>  		if (distances[i] == distance) {
>  			found = true;
>  			break;

I'm assuming (because its not actually stated anywhere) that
sched_numa_$FOO is based on the SLIT table, while sched_domain_$FOO is
the modified thing.

And you're saying it makes a significant difference to
preferred_group_nid()?

> +static int sched_record_numa_dist(int offline_node, int (*n_dist)(int, int),
> +		int **dist, int *levels)
> +

That's a coding style fail; use cino=(0:0.

>  {
> -	struct sched_domain_topology_level *tl;
>  	unsigned long *distance_map;
>  	int nr_levels = 0;
>  	int i, j;
>  	int *distances;
> -	struct cpumask ***masks;
>  
>  	/*
>  	 * O(nr_nodes^2) de-duplicating selection sort -- in order to find the
> @@ -1902,17 +1923,17 @@ void sched_init_numa(int offline_node)
>  	 */
>  	distance_map = bitmap_alloc(NR_DISTANCE_VALUES, GFP_KERNEL);
>  	if (!distance_map)
> -		return;
> +		return -ENOMEM;
>  
>  	bitmap_zero(distance_map, NR_DISTANCE_VALUES);
>  	for_each_cpu_node_but(i, offline_node) {
>  		for_each_cpu_node_but(j, offline_node) {
> -			int distance = node_distance(i, j);
> +			int distance = n_dist(i, j);
>  
>  			if (distance < LOCAL_DISTANCE || distance >= NR_DISTANCE_VALUES) {
>  				sched_numa_warn("Invalid distance value range");
>  				bitmap_free(distance_map);
> -				return;
> +				return -EINVAL;
>  			}
>  
>  			bitmap_set(distance_map, distance, 1);
> @@ -1927,17 +1948,66 @@ void sched_init_numa(int offline_node)
>  	distances = kcalloc(nr_levels, sizeof(int), GFP_KERNEL);
>  	if (!distances) {
>  		bitmap_free(distance_map);
> -		return;
> +		return -ENOMEM;
>  	}
> -
>  	for (i = 0, j = 0; i < nr_levels; i++, j++) {
>  		j = find_next_bit(distance_map, NR_DISTANCE_VALUES, j);
>  		distances[i] = j;
>  	}
> -	rcu_assign_pointer(sched_domains_numa_distance, distances);
> +	*dist = distances;
> +	*levels = nr_levels;
>  
>  	bitmap_free(distance_map);
>  
> +	return 0;
> +}
> +
> +static int avg_remote_numa_distance(int offline_node)
> +{
> +	int i, j;
> +	int distance, nr_remote = 0, total_distance = 0;
> +
> +	for_each_cpu_node_but(i, offline_node) {
> +		for_each_cpu_node_but(j, offline_node) {
> +			distance = node_distance(i, j);
> +
> +			if (distance >= REMOTE_DISTANCE) {
> +				nr_remote++;
> +				total_distance += distance;
> +			}
> +		}
> +	}
> +	if (nr_remote)
> +		return total_distance / nr_remote;
> +	else
> +		return REMOTE_DISTANCE;
> +}
> +
> +void sched_init_numa(int offline_node)
> +{
> +	struct sched_domain_topology_level *tl;
> +	int nr_levels, nr_node_levels;
> +	int i, j;
> +	int *distances, *domain_distances;
> +	struct cpumask ***masks;
> +
> +	if (sched_record_numa_dist(offline_node, numa_node_dist, &distances,
> +				   &nr_node_levels))
> +		return;
> +
> +	WRITE_ONCE(sched_avg_remote_numa_distance,
> +		   avg_remote_numa_distance(offline_node));

What is the point of all this? sched_avg_remote_numa_distance isn't
actually used anywhere. I'm thinking it doesn't want to be in this patch
at the very least.

  parent reply	other threads:[~2025-09-15 12:37 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-11 18:30 [PATCH v3 0/2] Fix NUMA sched domain build errors for GNR and CWF Tim Chen
2025-09-11 18:30 ` [PATCH v3 1/2] sched: Create architecture specific sched domain distances Tim Chen
2025-09-12  3:23   ` K Prateek Nayak
2025-09-15 16:44     ` Tim Chen
2025-09-17  6:45       ` K Prateek Nayak
2025-09-12  5:24   ` Chen, Yu C
2025-09-15 16:49     ` Tim Chen
2025-09-15 17:16     ` Tim Chen
2025-09-15 12:37   ` Peter Zijlstra [this message]
2025-09-15 17:13     ` Tim Chen
2025-09-15 20:04       ` Tim Chen
2025-09-11 18:30 ` [PATCH v3 2/2] sched: Fix sched domain build error for GNR, CWF in SNC-3 mode Tim Chen
2025-09-12  5:08   ` K Prateek Nayak
2025-09-15 17:15     ` Tim Chen
2025-09-12  5:39   ` Chen, Yu C
2025-09-12  9:23     ` K Prateek Nayak
2025-09-12 11:59       ` Chen, Yu C
2025-09-15 12:46   ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250915123738.GD3245006@noisy.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=arjan.van.de.ven@intel.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=gautham.shenoy@amd.com \
    --cc=juri.lelli@redhat.com \
    --cc=kprateek.nayak@amd.com \
    --cc=len.brown@intel.com \
    --cc=libo.chen@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=tim.c.chen@intel.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=vincent.guittot@linaro.org \
    --cc=vinicius.gomes@intel.com \
    --cc=vschneid@redhat.com \
    --cc=wuyun.abel@bytedance.com \
    --cc=yu.c.chen@intel.com \
    --cc=zhao1.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.