* [PATCH] sched/topology: Check average distances to remote packages
@ 2026-01-27 17:02 Kyle Meyer
2026-02-04 21:42 ` Tim Chen
2026-02-04 22:58 ` Tim Chen
0 siblings, 2 replies; 4+ messages in thread
From: Kyle Meyer @ 2026-01-27 17:02 UTC (permalink / raw)
To: bp, dave.hansen, mingo, peterz, tglx, tim.c.chen, vinicius.gomes
Cc: brgerst, hpa, kprateek.nayak, linux-kernel, patryk.wlazlyn,
rafael.j.wysocki, russ.anderson, x86, yu.c.chen, zhao1.liu
Granite Rapids (GNR) and Clearwater Forest (CWF) average distances to
remote packages to fix scheduler domains, see [1] for more information.
A warning and backtrace are printed when sub-NUMA clustering (SNC) is
enabled and there are more than 2 packages because the average distances
to remote packages could be different, skewing the single average remote
distance.
This is unnecessary when the average distances to remote packages are
the same.
Support single average remote distance on systems with more than 2
packages, preventing unnecessary warnings and backtraces by checking if
average distances to remote packages are the same.
[1] commit 4d6dd05d07d0 ("sched/topology: Fix sched domain build error for GNR, CWF in SNC-3 mode").
Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
---
The warning and backtrace were noticed on a 16 socket GNR system with SNC-2 enabled.
---
arch/x86/kernel/smpboot.c | 70 ++++++++++++++++++++++++++++-----------
1 file changed, 51 insertions(+), 19 deletions(-)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 5cd6950ab672..4467716f4054 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -518,27 +518,70 @@ static int avg_remote_numa_distance(void)
{
int i, j;
int distance, nr_remote, total_distance;
+ int max_pkgs = topology_max_packages();
+ int cpu, pkg, pkg_avg_distance;
+ int *pkg_total_distance;
+ int *pkg_nr_remote;
if (sched_avg_remote_distance > 0)
return sched_avg_remote_distance;
+ sched_avg_remote_distance = REMOTE_DISTANCE;
+
nr_remote = 0;
total_distance = 0;
+
+ pkg_total_distance = kcalloc(max_pkgs, sizeof(int), GFP_KERNEL);
+ if (!pkg_total_distance)
+ goto cleanup;
+
+ pkg_nr_remote = kcalloc(max_pkgs, sizeof(int), GFP_KERNEL);
+ if (!pkg_nr_remote)
+ goto cleanup;
+
for_each_node_state(i, N_CPU) {
for_each_node_state(j, N_CPU) {
distance = node_distance(i, j);
- if (distance >= REMOTE_DISTANCE) {
- nr_remote++;
- total_distance += distance;
- }
+ if (distance < REMOTE_DISTANCE)
+ continue;
+
+ nr_remote++;
+ total_distance += distance;
+
+ cpu = cpumask_first(cpumask_of_node(j));
+ if (cpu >= nr_cpu_ids)
+ continue;
+
+ pkg = topology_physical_package_id(cpu);
+ pkg_total_distance[pkg] += distance;
+ pkg_nr_remote[pkg]++;
}
}
- if (nr_remote)
- sched_avg_remote_distance = total_distance / nr_remote;
- else
- sched_avg_remote_distance = REMOTE_DISTANCE;
+ if (!nr_remote)
+ goto cleanup;
+
+ sched_avg_remote_distance = total_distance / nr_remote;
+
+ /*
+ * Single average remote distance won't be appropriate if different
+ * packages have different distances to remote packages.
+ */
+ for (i = 0; i < max_pkgs; i++) {
+ if (!pkg_nr_remote[i])
+ continue;
+
+ pkg_avg_distance = pkg_total_distance[i] / pkg_nr_remote[i];
+
+ pr_debug("sched: Avg. distance to remote package %d: %d\n", i, pkg_avg_distance);
+
+ if (pkg_avg_distance != sched_avg_remote_distance)
+ WARN_ONCE(1, "sched: Avg. distances to remote packages are different\n");
+ }
+cleanup:
+ kfree(pkg_nr_remote);
+ kfree(pkg_total_distance);
return sched_avg_remote_distance;
}
@@ -564,18 +607,7 @@ int arch_sched_node_distance(int from, int to)
* in the remote package in the same sched group.
* Simplify NUMA domains and avoid extra NUMA levels including
* different remote NUMA nodes and local nodes.
- *
- * GNR and CWF don't expect systems with more than 2 packages
- * and more than 2 hops between packages. Single average remote
- * distance won't be appropriate if there are more than 2
- * packages as average distance to different remote packages
- * could be different.
*/
- WARN_ONCE(topology_max_packages() > 2,
- "sched: Expect only up to 2 packages for GNR or CWF, "
- "but saw %d packages when building sched domains.",
- topology_max_packages());
-
d = avg_remote_numa_distance();
}
return d;
--
2.52.0
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [PATCH] sched/topology: Check average distances to remote packages
2026-01-27 17:02 [PATCH] sched/topology: Check average distances to remote packages Kyle Meyer
@ 2026-02-04 21:42 ` Tim Chen
2026-02-04 22:58 ` Tim Chen
1 sibling, 0 replies; 4+ messages in thread
From: Tim Chen @ 2026-02-04 21:42 UTC (permalink / raw)
To: Kyle Meyer, bp, dave.hansen, mingo, peterz, tglx, vinicius.gomes
Cc: brgerst, hpa, kprateek.nayak, linux-kernel, patryk.wlazlyn,
rafael.j.wysocki, russ.anderson, x86, yu.c.chen, zhao1.liu
On Tue, 2026-01-27 at 11:02 -0600, Kyle Meyer wrote:
> Granite Rapids (GNR) and Clearwater Forest (CWF) average distances to
> remote packages to fix scheduler domains, see [1] for more information.
>
> A warning and backtrace are printed when sub-NUMA clustering (SNC) is
> enabled and there are more than 2 packages because the average distances
> to remote packages could be different, skewing the single average remote
> distance.
>
> This is unnecessary when the average distances to remote packages are
> the same.
>
> Support single average remote distance on systems with more than 2
> packages, preventing unnecessary warnings and backtraces by checking if
> average distances to remote packages are the same.
Yes, I agree that if there are more than 2 packages but the packages are the
same average distance with each other (symmetric package topology), that
should be okay and no warnings is necessary.
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
>
> [1] commit 4d6dd05d07d0 ("sched/topology: Fix sched domain build error for GNR, CWF in SNC-3 mode").
>
> Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
> ---
>
> The warning and backtrace were noticed on a 16 socket GNR system with SNC-2 enabled.
>
> ---
> arch/x86/kernel/smpboot.c | 70 ++++++++++++++++++++++++++++-----------
> 1 file changed, 51 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 5cd6950ab672..4467716f4054 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -518,27 +518,70 @@ static int avg_remote_numa_distance(void)
> {
> int i, j;
> int distance, nr_remote, total_distance;
> + int max_pkgs = topology_max_packages();
> + int cpu, pkg, pkg_avg_distance;
> + int *pkg_total_distance;
> + int *pkg_nr_remote;
>
> if (sched_avg_remote_distance > 0)
> return sched_avg_remote_distance;
>
> + sched_avg_remote_distance = REMOTE_DISTANCE;
> +
> nr_remote = 0;
> total_distance = 0;
> +
> + pkg_total_distance = kcalloc(max_pkgs, sizeof(int), GFP_KERNEL);
> + if (!pkg_total_distance)
> + goto cleanup;
> +
> + pkg_nr_remote = kcalloc(max_pkgs, sizeof(int), GFP_KERNEL);
> + if (!pkg_nr_remote)
> + goto cleanup;
> +
> for_each_node_state(i, N_CPU) {
> for_each_node_state(j, N_CPU) {
> distance = node_distance(i, j);
>
> - if (distance >= REMOTE_DISTANCE) {
> - nr_remote++;
> - total_distance += distance;
> - }
> + if (distance < REMOTE_DISTANCE)
> + continue;
> +
> + nr_remote++;
> + total_distance += distance;
> +
> + cpu = cpumask_first(cpumask_of_node(j));
> + if (cpu >= nr_cpu_ids)
> + continue;
> +
> + pkg = topology_physical_package_id(cpu);
> + pkg_total_distance[pkg] += distance;
> + pkg_nr_remote[pkg]++;
> }
> }
> - if (nr_remote)
> - sched_avg_remote_distance = total_distance / nr_remote;
> - else
> - sched_avg_remote_distance = REMOTE_DISTANCE;
>
> + if (!nr_remote)
> + goto cleanup;
> +
> + sched_avg_remote_distance = total_distance / nr_remote;
> +
> + /*
> + * Single average remote distance won't be appropriate if different
> + * packages have different distances to remote packages.
> + */
> + for (i = 0; i < max_pkgs; i++) {
> + if (!pkg_nr_remote[i])
> + continue;
> +
> + pkg_avg_distance = pkg_total_distance[i] / pkg_nr_remote[i];
> +
> + pr_debug("sched: Avg. distance to remote package %d: %d\n", i, pkg_avg_distance);
> +
> + if (pkg_avg_distance != sched_avg_remote_distance)
> + WARN_ONCE(1, "sched: Avg. distances to remote packages are different\n");
> + }
> +cleanup:
> + kfree(pkg_nr_remote);
> + kfree(pkg_total_distance);
> return sched_avg_remote_distance;
> }
>
> @@ -564,18 +607,7 @@ int arch_sched_node_distance(int from, int to)
> * in the remote package in the same sched group.
> * Simplify NUMA domains and avoid extra NUMA levels including
> * different remote NUMA nodes and local nodes.
> - *
> - * GNR and CWF don't expect systems with more than 2 packages
> - * and more than 2 hops between packages. Single average remote
> - * distance won't be appropriate if there are more than 2
> - * packages as average distance to different remote packages
> - * could be different.
> */
> - WARN_ONCE(topology_max_packages() > 2,
> - "sched: Expect only up to 2 packages for GNR or CWF, "
> - "but saw %d packages when building sched domains.",
> - topology_max_packages());
> -
> d = avg_remote_numa_distance();
> }
> return d;
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [PATCH] sched/topology: Check average distances to remote packages
2026-01-27 17:02 [PATCH] sched/topology: Check average distances to remote packages Kyle Meyer
2026-02-04 21:42 ` Tim Chen
@ 2026-02-04 22:58 ` Tim Chen
2026-02-04 23:07 ` Kyle Meyer
1 sibling, 1 reply; 4+ messages in thread
From: Tim Chen @ 2026-02-04 22:58 UTC (permalink / raw)
To: Kyle Meyer, bp, dave.hansen, mingo, peterz, tglx, vinicius.gomes
Cc: brgerst, hpa, kprateek.nayak, linux-kernel, patryk.wlazlyn,
rafael.j.wysocki, russ.anderson, x86, yu.c.chen, zhao1.liu
On Tue, 2026-01-27 at 11:02 -0600, Kyle Meyer wrote:
> Granite Rapids (GNR) and Clearwater Forest (CWF) average distances to
> remote packages to fix scheduler domains, see [1] for more information.
>
> A warning and backtrace are printed when sub-NUMA clustering (SNC) is
> enabled and there are more than 2 packages because the average distances
> to remote packages could be different, skewing the single average remote
> distance.
>
> This is unnecessary when the average distances to remote packages are
> the same.
>
> Support single average remote distance on systems with more than 2
> packages, preventing unnecessary warnings and backtraces by checking if
> average distances to remote packages are the same.
>
> [1] commit 4d6dd05d07d0 ("sched/topology: Fix sched domain build error for GNR, CWF in SNC-3 mode").
>
> Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
> ---
>
> The warning and backtrace were noticed on a 16 socket GNR system with SNC-2 enabled.
>
> ---
> arch/x86/kernel/smpboot.c | 70 ++++++++++++++++++++++++++++-----------
> 1 file changed, 51 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 5cd6950ab672..4467716f4054 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -518,27 +518,70 @@ static int avg_remote_numa_distance(void)
> {
> int i, j;
> int distance, nr_remote, total_distance;
> + int max_pkgs = topology_max_packages();
> + int cpu, pkg, pkg_avg_distance;
> + int *pkg_total_distance;
> + int *pkg_nr_remote;
Minor nit, pkg_nr_remote needs to be initialized as NULL.
Otherwise if allocation of pkg_total_distance failed and we
go to cleanup, we can call kfree(pkg_nr_remote) with a non
null uninitialized pkg_nr_remote pointer. Maybe do
int *pkg_total_distance=NULL, *pkg_nr_remote=NULL;
in declaration.
>
> if (sched_avg_remote_distance > 0)
> return sched_avg_remote_distance;
>
> + sched_avg_remote_distance = REMOTE_DISTANCE;
> +
> nr_remote = 0;
> total_distance = 0;
> +
> + pkg_total_distance = kcalloc(max_pkgs, sizeof(int), GFP_KERNEL);
> + if (!pkg_total_distance)
> + goto cleanup;
> +
> + pkg_nr_remote = kcalloc(max_pkgs, sizeof(int), GFP_KERNEL);
> + if (!pkg_nr_remote)
> + goto cleanup;
> +
> for_each_node_state(i, N_CPU) {
> for_each_node_state(j, N_CPU) {
> distance = node_distance(i, j);
>
> - if (distance >= REMOTE_DISTANCE) {
> - nr_remote++;
> - total_distance += distance;
> - }
> + if (distance < REMOTE_DISTANCE)
> + continue;
> +
> + nr_remote++;
> + total_distance += distance;
> +
> + cpu = cpumask_first(cpumask_of_node(j));
> + if (cpu >= nr_cpu_ids)
> + continue;
> +
> + pkg = topology_physical_package_id(cpu);
> + pkg_total_distance[pkg] += distance;
> + pkg_nr_remote[pkg]++;
> }
> }
> - if (nr_remote)
> - sched_avg_remote_distance = total_distance / nr_remote;
> - else
> - sched_avg_remote_distance = REMOTE_DISTANCE;
>
> + if (!nr_remote)
> + goto cleanup;
> +
> + sched_avg_remote_distance = total_distance / nr_remote;
> +
> + /*
> + * Single average remote distance won't be appropriate if different
> + * packages have different distances to remote packages.
> + */
> + for (i = 0; i < max_pkgs; i++) {
> + if (!pkg_nr_remote[i])
> + continue;
> +
> + pkg_avg_distance = pkg_total_distance[i] / pkg_nr_remote[i];
> +
> + pr_debug("sched: Avg. distance to remote package %d: %d\n", i, pkg_avg_distance);
> +
> + if (pkg_avg_distance != sched_avg_remote_distance)
> + WARN_ONCE(1, "sched: Avg. distances to remote packages are different\n");
> + }
> +cleanup:
> + kfree(pkg_nr_remote);
> + kfree(pkg_total_distance);
> return sched_avg_remote_distance;
> }
>
> @@ -564,18 +607,7 @@ int arch_sched_node_distance(int from, int to)
> * in the remote package in the same sched group.
> * Simplify NUMA domains and avoid extra NUMA levels including
> * different remote NUMA nodes and local nodes.
> - *
> - * GNR and CWF don't expect systems with more than 2 packages
> - * and more than 2 hops between packages. Single average remote
> - * distance won't be appropriate if there are more than 2
> - * packages as average distance to different remote packages
> - * could be different.
> */
> - WARN_ONCE(topology_max_packages() > 2,
> - "sched: Expect only up to 2 packages for GNR or CWF, "
> - "but saw %d packages when building sched domains.",
> - topology_max_packages());
> -
> d = avg_remote_numa_distance();
> }
> return d;
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [PATCH] sched/topology: Check average distances to remote packages
2026-02-04 22:58 ` Tim Chen
@ 2026-02-04 23:07 ` Kyle Meyer
0 siblings, 0 replies; 4+ messages in thread
From: Kyle Meyer @ 2026-02-04 23:07 UTC (permalink / raw)
To: Tim Chen
Cc: bp, dave.hansen, mingo, peterz, tglx, vinicius.gomes, brgerst,
hpa, kprateek.nayak, linux-kernel, patryk.wlazlyn,
rafael.j.wysocki, russ.anderson, x86, yu.c.chen, zhao1.liu
On Wed, Feb 04, 2026 at 02:58:46PM -0800, Tim Chen wrote:
> On Tue, 2026-01-27 at 11:02 -0600, Kyle Meyer wrote:
> > Granite Rapids (GNR) and Clearwater Forest (CWF) average distances to
> > remote packages to fix scheduler domains, see [1] for more information.
> >
> > A warning and backtrace are printed when sub-NUMA clustering (SNC) is
> > enabled and there are more than 2 packages because the average distances
> > to remote packages could be different, skewing the single average remote
> > distance.
> >
> > This is unnecessary when the average distances to remote packages are
> > the same.
> >
> > Support single average remote distance on systems with more than 2
> > packages, preventing unnecessary warnings and backtraces by checking if
> > average distances to remote packages are the same.
> >
> > [1] commit 4d6dd05d07d0 ("sched/topology: Fix sched domain build error for GNR, CWF in SNC-3 mode").
> >
> > Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
> > ---
> >
> > The warning and backtrace were noticed on a 16 socket GNR system with SNC-2 enabled.
> >
> > ---
> > arch/x86/kernel/smpboot.c | 70 ++++++++++++++++++++++++++++-----------
> > 1 file changed, 51 insertions(+), 19 deletions(-)
> >
> > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > index 5cd6950ab672..4467716f4054 100644
> > --- a/arch/x86/kernel/smpboot.c
> > +++ b/arch/x86/kernel/smpboot.c
> > @@ -518,27 +518,70 @@ static int avg_remote_numa_distance(void)
> > {
> > int i, j;
> > int distance, nr_remote, total_distance;
> > + int max_pkgs = topology_max_packages();
> > + int cpu, pkg, pkg_avg_distance;
> > + int *pkg_total_distance;
> > + int *pkg_nr_remote;
>
> Minor nit, pkg_nr_remote needs to be initialized as NULL.
> Otherwise if allocation of pkg_total_distance failed and we
> go to cleanup, we can call kfree(pkg_nr_remote) with a non
> null uninitialized pkg_nr_remote pointer. Maybe do
>
> int *pkg_total_distance=NULL, *pkg_nr_remote=NULL;
>
> in declaration.
Thanks for the review! I'll send a v2.
> > if (sched_avg_remote_distance > 0)
> > return sched_avg_remote_distance;
> >
> > + sched_avg_remote_distance = REMOTE_DISTANCE;
> > +
> > nr_remote = 0;
> > total_distance = 0;
> > +
> > + pkg_total_distance = kcalloc(max_pkgs, sizeof(int), GFP_KERNEL);
> > + if (!pkg_total_distance)
> > + goto cleanup;
> > +
> > + pkg_nr_remote = kcalloc(max_pkgs, sizeof(int), GFP_KERNEL);
> > + if (!pkg_nr_remote)
> > + goto cleanup;
> > +
> > for_each_node_state(i, N_CPU) {
> > for_each_node_state(j, N_CPU) {
> > distance = node_distance(i, j);
> >
> > - if (distance >= REMOTE_DISTANCE) {
> > - nr_remote++;
> > - total_distance += distance;
> > - }
> > + if (distance < REMOTE_DISTANCE)
> > + continue;
> > +
> > + nr_remote++;
> > + total_distance += distance;
> > +
> > + cpu = cpumask_first(cpumask_of_node(j));
> > + if (cpu >= nr_cpu_ids)
> > + continue;
> > +
> > + pkg = topology_physical_package_id(cpu);
> > + pkg_total_distance[pkg] += distance;
> > + pkg_nr_remote[pkg]++;
> > }
> > }
> > - if (nr_remote)
> > - sched_avg_remote_distance = total_distance / nr_remote;
> > - else
> > - sched_avg_remote_distance = REMOTE_DISTANCE;
> >
> > + if (!nr_remote)
> > + goto cleanup;
> > +
> > + sched_avg_remote_distance = total_distance / nr_remote;
> > +
> > + /*
> > + * Single average remote distance won't be appropriate if different
> > + * packages have different distances to remote packages.
> > + */
> > + for (i = 0; i < max_pkgs; i++) {
> > + if (!pkg_nr_remote[i])
> > + continue;
> > +
> > + pkg_avg_distance = pkg_total_distance[i] / pkg_nr_remote[i];
> > +
> > + pr_debug("sched: Avg. distance to remote package %d: %d\n", i, pkg_avg_distance);
> > +
> > + if (pkg_avg_distance != sched_avg_remote_distance)
> > + WARN_ONCE(1, "sched: Avg. distances to remote packages are different\n");
> > + }
> > +cleanup:
> > + kfree(pkg_nr_remote);
> > + kfree(pkg_total_distance);
> > return sched_avg_remote_distance;
> > }
> >
> > @@ -564,18 +607,7 @@ int arch_sched_node_distance(int from, int to)
> > * in the remote package in the same sched group.
> > * Simplify NUMA domains and avoid extra NUMA levels including
> > * different remote NUMA nodes and local nodes.
> > - *
> > - * GNR and CWF don't expect systems with more than 2 packages
> > - * and more than 2 hops between packages. Single average remote
> > - * distance won't be appropriate if there are more than 2
> > - * packages as average distance to different remote packages
> > - * could be different.
> > */
> > - WARN_ONCE(topology_max_packages() > 2,
> > - "sched: Expect only up to 2 packages for GNR or CWF, "
> > - "but saw %d packages when building sched domains.",
> > - topology_max_packages());
> > -
> > d = avg_remote_numa_distance();
> > }
> > return d;
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-02-04 23:25 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-27 17:02 [PATCH] sched/topology: Check average distances to remote packages Kyle Meyer
2026-02-04 21:42 ` Tim Chen
2026-02-04 22:58 ` Tim Chen
2026-02-04 23:07 ` Kyle Meyer
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox