All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kyle Meyer <kyle.meyer@hpe.com>
To: tim.c.chen@linux.intel.com, bp@alien8.de,
	dave.hansen@linux.intel.com, mingo@redhat.com,
	peterz@infradead.org, tglx@kernel.org, vinicius.gomes@intel.com
Cc: brgerst@gmail.com, hpa@zytor.com, kprateek.nayak@amd.com,
	linux-kernel@vger.kernel.org, patryk.wlazlyn@linux.intel.com,
	rafael.j.wysocki@intel.com, russ.anderson@hpe.com,
	x86@kernel.org, yu.c.chen@intel.com, zhao1.liu@intel.com
Subject: Re: [PATCH v2] sched/topology: Check average distances to remote packages
Date: Mon, 23 Feb 2026 10:42:34 -0600	[thread overview]
Message-ID: <aZyDel6NYyrAsLrl@hpe.com> (raw)
In-Reply-To: <aYPjOgiO_XsFWnWu@hpe.com>

On Wed, Feb 04, 2026 at 06:24:31PM -0600, Kyle Meyer wrote:
> Granite Rapids (GNR) and Clearwater Forest (CWF) average distances to
> remote packages to fix scheduler domains, see [1] for more information.
> 
> A warning and backtrace are printed when sub-NUMA clustering (SNC) is
> enabled and there are more than 2 packages because the average distances
> to remote packages could be different, skewing the single average remote
> distance.
> 
> This is unnecessary when the average distances to remote packages are
> the same.
> 
> Support single average remote distance on systems with more than 2
> packages, preventing unnecessary warnings and backtraces, by checking if
> average distances to remote packages are the same.
> 
> [1] commit 4d6dd05d07d0 ("sched/topology: Fix sched domain build error for GNR, CWF in SNC-3 mode").
> 
> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
> ---
> 
> The warning and backtrace were noticed on a 16 socket GNR system with SNC-2 enabled.
> 
> v1:
> * https://lore.kernel.org/all/aXjvLjTCRe8d3UFD@hpe.com/
> 
> v1 -> v2:
> * Initialize pkg_total_distance and pkg_nr_remote to NULL, as suggested by Tim.
> 
> ---
>  arch/x86/kernel/smpboot.c | 69 ++++++++++++++++++++++++++++-----------
>  1 file changed, 50 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 5cd6950ab672..dc8f15bd2e19 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -518,27 +518,69 @@ static int avg_remote_numa_distance(void)
>  {
>  	int i, j;
>  	int distance, nr_remote, total_distance;
> +	int max_pkgs = topology_max_packages();
> +	int cpu, pkg, pkg_avg_distance;
> +	int *pkg_total_distance = NULL, *pkg_nr_remote = NULL;
>  
>  	if (sched_avg_remote_distance > 0)
>  		return sched_avg_remote_distance;
>  
> +	sched_avg_remote_distance = REMOTE_DISTANCE;
> +
>  	nr_remote = 0;
>  	total_distance = 0;
> +
> +	pkg_total_distance = kcalloc(max_pkgs, sizeof(int), GFP_KERNEL);
> +	if (!pkg_total_distance)
> +		goto cleanup;
> +
> +	pkg_nr_remote = kcalloc(max_pkgs, sizeof(int), GFP_KERNEL);
> +	if (!pkg_nr_remote)
> +		goto cleanup;
> +
>  	for_each_node_state(i, N_CPU) {
>  		for_each_node_state(j, N_CPU) {
>  			distance = node_distance(i, j);
>  
> -			if (distance >= REMOTE_DISTANCE) {
> -				nr_remote++;
> -				total_distance += distance;
> -			}
> +			if (distance < REMOTE_DISTANCE)
> +				continue;
> +
> +			nr_remote++;
> +			total_distance += distance;
> +
> +			cpu = cpumask_first(cpumask_of_node(j));
> +			if (cpu >= nr_cpu_ids)
> +				continue;
> +
> +			pkg = topology_physical_package_id(cpu);
> +			pkg_total_distance[pkg] += distance;
> +			pkg_nr_remote[pkg]++;
>  		}
>  	}
> -	if (nr_remote)
> -		sched_avg_remote_distance = total_distance / nr_remote;
> -	else
> -		sched_avg_remote_distance = REMOTE_DISTANCE;
>  
> +	if (!nr_remote)
> +		goto cleanup;
> +
> +	sched_avg_remote_distance = total_distance / nr_remote;
> +
> +	/*
> +	 * Single average remote distance won't be appropriate if different
> +	 * packages have different distances to remote packages.
> +	 */
> +	for (i = 0; i < max_pkgs; i++) {
> +		if (!pkg_nr_remote[i])
> +			continue;
> +
> +		pkg_avg_distance = pkg_total_distance[i] / pkg_nr_remote[i];
> +
> +		pr_debug("sched: Avg. distance to remote package %d: %d\n", i, pkg_avg_distance);
> +
> +		if (pkg_avg_distance != sched_avg_remote_distance)
> +			WARN_ONCE(1, "sched: Avg. distances to remote packages are different\n");
> +	}
> +cleanup:
> +	kfree(pkg_nr_remote);
> +	kfree(pkg_total_distance);
>  	return sched_avg_remote_distance;
>  }
>  
> @@ -564,18 +606,7 @@ int arch_sched_node_distance(int from, int to)
>  		 * in the remote package in the same sched group.
>  		 * Simplify NUMA domains and avoid extra NUMA levels including
>  		 * different remote NUMA nodes and local nodes.
> -		 *
> -		 * GNR and CWF don't expect systems with more than 2 packages
> -		 * and more than 2 hops between packages. Single average remote
> -		 * distance won't be appropriate if there are more than 2
> -		 * packages as average distance to different remote packages
> -		 * could be different.
>  		 */
> -		WARN_ONCE(topology_max_packages() > 2,
> -			  "sched: Expect only up to 2 packages for GNR or CWF, "
> -			  "but saw %d packages when building sched domains.",
> -			  topology_max_packages());
> -
>  		d = avg_remote_numa_distance();
>  	}
>  	return d;
> -- 
> 2.52.0
> 

Just a friendly ping.

Thanks,
Kyle Meyer

  reply	other threads:[~2026-02-23 17:12 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-05  0:24 [PATCH v2] sched/topology: Check average distances to remote packages Kyle Meyer
2026-02-23 16:42 ` Kyle Meyer [this message]
2026-02-23 17:03 ` Peter Zijlstra
2026-02-25  1:43   ` Kyle Meyer
2026-02-25  9:05     ` Chen, Yu C
2026-02-25 12:30     ` Peter Zijlstra
2026-02-25 13:36       ` Peter Zijlstra
2026-02-25 15:39       ` Chen, Yu C
2026-02-25 15:44         ` Peter Zijlstra
2026-02-25 16:32           ` Peter Zijlstra
2026-02-25 16:40             ` Peter Zijlstra
2026-02-25 21:37             ` Tim Chen
2026-02-25 22:30               ` Peter Zijlstra
2026-02-25 22:54                 ` Peter Zijlstra
2026-02-25 22:55                 ` Tim Chen
2026-02-25 23:29                   ` Kyle Meyer
2026-02-26 18:14                     ` Tim Chen
2026-02-25 16:41       ` Kyle Meyer
2026-02-25 16:49         ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aZyDel6NYyrAsLrl@hpe.com \
    --to=kyle.meyer@hpe.com \
    --cc=bp@alien8.de \
    --cc=brgerst@gmail.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=kprateek.nayak@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=patryk.wlazlyn@linux.intel.com \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=russ.anderson@hpe.com \
    --cc=tglx@kernel.org \
    --cc=tim.c.chen@linux.intel.com \
    --cc=vinicius.gomes@intel.com \
    --cc=x86@kernel.org \
    --cc=yu.c.chen@intel.com \
    --cc=zhao1.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.