From: Kyle Meyer <kyle.meyer@hpe.com>
To: tim.c.chen@linux.intel.com, bp@alien8.de,
	dave.hansen@linux.intel.com, mingo@redhat.com,
	peterz@infradead.org, tglx@kernel.org, vinicius.gomes@intel.com
Cc: brgerst@gmail.com, hpa@zytor.com, kprateek.nayak@amd.com,
	linux-kernel@vger.kernel.org, patryk.wlazlyn@linux.intel.com,
	rafael.j.wysocki@intel.com, russ.anderson@hpe.com,
	x86@kernel.org, yu.c.chen@intel.com, zhao1.liu@intel.com
Subject: Re: [PATCH v2] sched/topology: Check average distances to remote packages
Date: Mon, 23 Feb 2026 10:42:34 -0600
Message-ID: <aZyDel6NYyrAsLrl@hpe.com>
In-Reply-To: <aYPjOgiO_XsFWnWu@hpe.com>

On Wed, Feb 04, 2026 at 06:24:31PM -0600, Kyle Meyer wrote:
> Granite Rapids (GNR) and Clearwater Forest (CWF) average the distances
> to remote packages to fix scheduler domains; see [1] for more information.
>
> A warning and backtrace are printed when sub-NUMA clustering (SNC) is
> enabled and there are more than 2 packages because the average distances
> to remote packages could be different, skewing the single average remote
> distance.
>
> This is unnecessary when the average distances to remote packages are
> the same.
>
> Support single average remote distance on systems with more than 2
> packages, preventing unnecessary warnings and backtraces, by checking if
> average distances to remote packages are the same.
>
> [1] commit 4d6dd05d07d0 ("sched/topology: Fix sched domain build error for GNR, CWF in SNC-3 mode").
>
> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
> ---
>
> The warning and backtrace were noticed on a 16-socket GNR system with SNC-2 enabled.
>
> v1:
> * https://lore.kernel.org/all/aXjvLjTCRe8d3UFD@hpe.com/
>
> v1 -> v2:
> * Initialize pkg_total_distance and pkg_nr_remote to NULL, as suggested by Tim.
>
> ---
> arch/x86/kernel/smpboot.c | 69 ++++++++++++++++++++++++++++-----------
> 1 file changed, 50 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 5cd6950ab672..dc8f15bd2e19 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -518,27 +518,69 @@ static int avg_remote_numa_distance(void)
>  {
>  	int i, j;
>  	int distance, nr_remote, total_distance;
> +	int max_pkgs = topology_max_packages();
> +	int cpu, pkg, pkg_avg_distance;
> +	int *pkg_total_distance = NULL, *pkg_nr_remote = NULL;
>
>  	if (sched_avg_remote_distance > 0)
>  		return sched_avg_remote_distance;
>
> +	sched_avg_remote_distance = REMOTE_DISTANCE;
> +
>  	nr_remote = 0;
>  	total_distance = 0;
> +
> +	pkg_total_distance = kcalloc(max_pkgs, sizeof(int), GFP_KERNEL);
> +	if (!pkg_total_distance)
> +		goto cleanup;
> +
> +	pkg_nr_remote = kcalloc(max_pkgs, sizeof(int), GFP_KERNEL);
> +	if (!pkg_nr_remote)
> +		goto cleanup;
> +
>  	for_each_node_state(i, N_CPU) {
>  		for_each_node_state(j, N_CPU) {
>  			distance = node_distance(i, j);
>
> -			if (distance >= REMOTE_DISTANCE) {
> -				nr_remote++;
> -				total_distance += distance;
> -			}
> +			if (distance < REMOTE_DISTANCE)
> +				continue;
> +
> +			nr_remote++;
> +			total_distance += distance;
> +
> +			cpu = cpumask_first(cpumask_of_node(j));
> +			if (cpu >= nr_cpu_ids)
> +				continue;
> +
> +			pkg = topology_physical_package_id(cpu);
> +			pkg_total_distance[pkg] += distance;
> +			pkg_nr_remote[pkg]++;
>  		}
>  	}
> -	if (nr_remote)
> -		sched_avg_remote_distance = total_distance / nr_remote;
> -	else
> -		sched_avg_remote_distance = REMOTE_DISTANCE;
>
> +	if (!nr_remote)
> +		goto cleanup;
> +
> +	sched_avg_remote_distance = total_distance / nr_remote;
> +
> +	/*
> +	 * Single average remote distance won't be appropriate if different
> +	 * packages have different distances to remote packages.
> +	 */
> +	for (i = 0; i < max_pkgs; i++) {
> +		if (!pkg_nr_remote[i])
> +			continue;
> +
> +		pkg_avg_distance = pkg_total_distance[i] / pkg_nr_remote[i];
> +
> +		pr_debug("sched: Avg. distance to remote package %d: %d\n", i, pkg_avg_distance);
> +
> +		if (pkg_avg_distance != sched_avg_remote_distance)
> +			WARN_ONCE(1, "sched: Avg. distances to remote packages are different\n");
> +	}
> +cleanup:
> +	kfree(pkg_nr_remote);
> +	kfree(pkg_total_distance);
>  	return sched_avg_remote_distance;
>  }
>
> @@ -564,18 +606,7 @@ int arch_sched_node_distance(int from, int to)
>  		 * in the remote package in the same sched group.
>  		 * Simplify NUMA domains and avoid extra NUMA levels including
>  		 * different remote NUMA nodes and local nodes.
> -		 *
> -		 * GNR and CWF don't expect systems with more than 2 packages
> -		 * and more than 2 hops between packages. Single average remote
> -		 * distance won't be appropriate if there are more than 2
> -		 * packages as average distance to different remote packages
> -		 * could be different.
>  		 */
> -		WARN_ONCE(topology_max_packages() > 2,
> -			  "sched: Expect only up to 2 packages for GNR or CWF, "
> -			  "but saw %d packages when building sched domains.",
> -			  topology_max_packages());
> -
>  		d = avg_remote_numa_distance();
>  	}
>  	return d;
> --
> 2.52.0
>

Just a friendly ping.

Thanks,
Kyle Meyer
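
For anyone skimming the thread, the check in the quoted patch can be
modeled in userspace. This is only a minimal sketch under stated
assumptions, not the kernel code: avg_remote_distance(), the fixed
4-node/4-package sizes, and both example matrices are hypothetical, one
node per package is assumed (no SNC) to keep it short, and
REMOTE_DISTANCE is taken as 20, the generic kernel default.

```c
#define REMOTE_DISTANCE 20	/* assumed: generic kernel default */
#define NR_NODES 4		/* hypothetical example size */
#define NR_PKGS 4		/* hypothetical example size */

/* Hypothetical mapping: one NUMA node per package. */
const int identity_pkg[NR_NODES] = { 0, 1, 2, 3 };

/* Hypothetical 4-package system, every remote package equally far. */
const int uniform_dist[NR_NODES][NR_NODES] = {
	{ 10, 21, 21, 21 },
	{ 21, 10, 21, 21 },
	{ 21, 21, 10, 21 },
	{ 21, 21, 21, 10 },
};

/* Hypothetical 4-package system where package 3 is farther away. */
const int skewed_dist[NR_NODES][NR_NODES] = {
	{ 10, 21, 21, 31 },
	{ 21, 10, 21, 31 },
	{ 21, 21, 10, 31 },
	{ 31, 31, 31, 10 },
};

/*
 * Return the overall average remote distance. *uniform is set to 0 if
 * any package's own average differs from it (the case where the patch
 * would warn), and to 1 otherwise.
 */
int avg_remote_distance(const int dist[NR_NODES][NR_NODES],
			const int node_pkg[NR_NODES], int *uniform)
{
	int pkg_total[NR_PKGS] = { 0 }, pkg_nr[NR_PKGS] = { 0 };
	int total = 0, nr = 0, avg, i, j;

	*uniform = 1;

	for (i = 0; i < NR_NODES; i++) {
		for (j = 0; j < NR_NODES; j++) {
			if (dist[i][j] < REMOTE_DISTANCE)
				continue;

			nr++;
			total += dist[i][j];

			/* Charge the distance to the destination node's package. */
			pkg_total[node_pkg[j]] += dist[i][j];
			pkg_nr[node_pkg[j]]++;
		}
	}

	if (!nr)
		return REMOTE_DISTANCE;

	avg = total / nr;

	for (i = 0; i < NR_PKGS; i++) {
		if (pkg_nr[i] && pkg_total[i] / pkg_nr[i] != avg)
			*uniform = 0;
	}

	return avg;
}
```

With uniform_dist the per-package averages all equal the overall
average, so a single average remote distance is fine even with more
than 2 packages. With skewed_dist they differ, which is the situation
the remaining WARN_ONCE() in the patch is meant to flag.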