From: Kyle Meyer <kyle.meyer@hpe.com>
To: Tim Chen <tim.c.chen@linux.intel.com>
Cc: bp@alien8.de, dave.hansen@linux.intel.com, mingo@redhat.com,
peterz@infradead.org, tglx@kernel.org, vinicius.gomes@intel.com,
brgerst@gmail.com, hpa@zytor.com, kprateek.nayak@amd.com,
linux-kernel@vger.kernel.org, patryk.wlazlyn@linux.intel.com,
rafael.j.wysocki@intel.com, russ.anderson@hpe.com,
x86@kernel.org, yu.c.chen@intel.com, zhao1.liu@intel.com
Subject: Re: [PATCH] sched/topology: Check average distances to remote packages
Date: Wed, 4 Feb 2026 17:07:06 -0600 [thread overview]
Message-ID: <aYPRGsMaicbWglFy@hpe.com> (raw)
In-Reply-To: <a24c496e6c8f4d2cb44cc276455f98ba33be5ea6.camel@linux.intel.com>
On Wed, Feb 04, 2026 at 02:58:46PM -0800, Tim Chen wrote:
> On Tue, 2026-01-27 at 11:02 -0600, Kyle Meyer wrote:
> > Granite Rapids (GNR) and Clearwater Forest (CWF) average distances to
> > remote packages to fix scheduler domains, see [1] for more information.
> >
> > A warning and backtrace are printed when sub-NUMA clustering (SNC) is
> > enabled and there are more than 2 packages because the average distances
> > to remote packages could be different, skewing the single average remote
> > distance.
> >
> > This is unnecessary when the average distances to remote packages are
> > the same.
> >
> > Support single average remote distance on systems with more than 2
> > packages, preventing unnecessary warnings and backtraces by checking if
> > average distances to remote packages are the same.
> >
> > [1] commit 4d6dd05d07d0 ("sched/topology: Fix sched domain build error for GNR, CWF in SNC-3 mode").
> >
> > Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
> > ---
> >
> > The warning and backtrace were noticed on a 16 socket GNR system with SNC-2 enabled.
> >
> > ---
> > arch/x86/kernel/smpboot.c | 70 ++++++++++++++++++++++++++++-----------
> > 1 file changed, 51 insertions(+), 19 deletions(-)
> >
> > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > index 5cd6950ab672..4467716f4054 100644
> > --- a/arch/x86/kernel/smpboot.c
> > +++ b/arch/x86/kernel/smpboot.c
> > @@ -518,27 +518,70 @@ static int avg_remote_numa_distance(void)
> > {
> > int i, j;
> > int distance, nr_remote, total_distance;
> > + int max_pkgs = topology_max_packages();
> > + int cpu, pkg, pkg_avg_distance;
> > + int *pkg_total_distance;
> > + int *pkg_nr_remote;
>
> Minor nit, pkg_nr_remote needs to be initialized as NULL.
> Otherwise if allocation of pkg_total_distance failed and we
> go to cleanup, we can call kfree(pkg_nr_remote) with a non
> null uninitialized pkg_nr_remote pointer. Maybe do
>
> int *pkg_total_distance=NULL, *pkg_nr_remote=NULL;
>
> in declaration.
Thanks for the review! I'll send a v2.
> > if (sched_avg_remote_distance > 0)
> > return sched_avg_remote_distance;
> >
> > + sched_avg_remote_distance = REMOTE_DISTANCE;
> > +
> > nr_remote = 0;
> > total_distance = 0;
> > +
> > + pkg_total_distance = kcalloc(max_pkgs, sizeof(int), GFP_KERNEL);
> > + if (!pkg_total_distance)
> > + goto cleanup;
> > +
> > + pkg_nr_remote = kcalloc(max_pkgs, sizeof(int), GFP_KERNEL);
> > + if (!pkg_nr_remote)
> > + goto cleanup;
> > +
> > for_each_node_state(i, N_CPU) {
> > for_each_node_state(j, N_CPU) {
> > distance = node_distance(i, j);
> >
> > - if (distance >= REMOTE_DISTANCE) {
> > - nr_remote++;
> > - total_distance += distance;
> > - }
> > + if (distance < REMOTE_DISTANCE)
> > + continue;
> > +
> > + nr_remote++;
> > + total_distance += distance;
> > +
> > + cpu = cpumask_first(cpumask_of_node(j));
> > + if (cpu >= nr_cpu_ids)
> > + continue;
> > +
> > + pkg = topology_physical_package_id(cpu);
> > + pkg_total_distance[pkg] += distance;
> > + pkg_nr_remote[pkg]++;
> > }
> > }
> > - if (nr_remote)
> > - sched_avg_remote_distance = total_distance / nr_remote;
> > - else
> > - sched_avg_remote_distance = REMOTE_DISTANCE;
> >
> > + if (!nr_remote)
> > + goto cleanup;
> > +
> > + sched_avg_remote_distance = total_distance / nr_remote;
> > +
> > + /*
> > + * Single average remote distance won't be appropriate if different
> > + * packages have different distances to remote packages.
> > + */
> > + for (i = 0; i < max_pkgs; i++) {
> > + if (!pkg_nr_remote[i])
> > + continue;
> > +
> > + pkg_avg_distance = pkg_total_distance[i] / pkg_nr_remote[i];
> > +
> > + pr_debug("sched: Avg. distance to remote package %d: %d\n", i, pkg_avg_distance);
> > +
> > + if (pkg_avg_distance != sched_avg_remote_distance)
> > + WARN_ONCE(1, "sched: Avg. distances to remote packages are different\n");
> > + }
> > +cleanup:
> > + kfree(pkg_nr_remote);
> > + kfree(pkg_total_distance);
> > return sched_avg_remote_distance;
> > }
> >
> > @@ -564,18 +607,7 @@ int arch_sched_node_distance(int from, int to)
> > * in the remote package in the same sched group.
> > * Simplify NUMA domains and avoid extra NUMA levels including
> > * different remote NUMA nodes and local nodes.
> > - *
> > - * GNR and CWF don't expect systems with more than 2 packages
> > - * and more than 2 hops between packages. Single average remote
> > - * distance won't be appropriate if there are more than 2
> > - * packages as average distance to different remote packages
> > - * could be different.
> > */
> > - WARN_ONCE(topology_max_packages() > 2,
> > - "sched: Expect only up to 2 packages for GNR or CWF, "
> > - "but saw %d packages when building sched domains.",
> > - topology_max_packages());
> > -
> > d = avg_remote_numa_distance();
> > }
> > return d;
prev parent reply other threads:[~2026-02-04 23:25 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-27 17:02 [PATCH] sched/topology: Check average distances to remote packages Kyle Meyer
2026-02-04 21:42 ` Tim Chen
2026-02-04 22:58 ` Tim Chen
2026-02-04 23:07 ` Kyle Meyer [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aYPRGsMaicbWglFy@hpe.com \
--to=kyle.meyer@hpe.com \
--cc=bp@alien8.de \
--cc=brgerst@gmail.com \
--cc=dave.hansen@linux.intel.com \
--cc=hpa@zytor.com \
--cc=kprateek.nayak@amd.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=patryk.wlazlyn@linux.intel.com \
--cc=peterz@infradead.org \
--cc=rafael.j.wysocki@intel.com \
--cc=russ.anderson@hpe.com \
--cc=tglx@kernel.org \
--cc=tim.c.chen@linux.intel.com \
--cc=vinicius.gomes@intel.com \
--cc=x86@kernel.org \
--cc=yu.c.chen@intel.com \
--cc=zhao1.liu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox