Date: Mon, 23 Feb 2026 18:03:14 +0100
From: Peter Zijlstra
To: Kyle Meyer
Cc: tim.c.chen@linux.intel.com, bp@alien8.de, dave.hansen@linux.intel.com,
 mingo@redhat.com, tglx@kernel.org, vinicius.gomes@intel.com,
 brgerst@gmail.com, hpa@zytor.com, kprateek.nayak@amd.com,
 linux-kernel@vger.kernel.org, patryk.wlazlyn@linux.intel.com,
 rafael.j.wysocki@intel.com, russ.anderson@hpe.com, x86@kernel.org,
 yu.c.chen@intel.com, zhao1.liu@intel.com
Subject: Re: [PATCH v2] sched/topology: Check average distances to remote packages
Message-ID: <20260223170314.GU1395266@noisy.programming.kicks-ass.net>

On Wed, Feb 04, 2026 at 06:24:26PM -0600, Kyle Meyer wrote:
> Granite Rapids (GNR) and Clearwater Forest (CWF) average distances to
> remote packages to fix scheduler domains, see [1] for more information.
>
> A warning and backtrace are printed when sub-NUMA clustering (SNC) is
> enabled and there are more than 2 packages because the average distances
> to remote packages could be different, skewing the single average remote
> distance.

But earlier Tim said these systems will not have more than 2 packages.
So what's what? What do these new systems look like?

> This is unnecessary when the average distances to remote packages are
> the same.
>
> Support single average remote distance on systems with more than 2
> packages, preventing unnecessary warnings and backtraces, by checking if
> average distances to remote packages are the same.
> ---
>  arch/x86/kernel/smpboot.c | 69 ++++++++++++++++++++++++++++-----------
>  1 file changed, 50 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 5cd6950ab672..dc8f15bd2e19 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -518,27 +518,69 @@ static int avg_remote_numa_distance(void)
>  {
>  	int i, j;
>  	int distance, nr_remote, total_distance;
> +	int max_pkgs = topology_max_packages();
> +	int cpu, pkg, pkg_avg_distance;
> +	int *pkg_total_distance = NULL, *pkg_nr_remote = NULL;

Can you order those declarations the normal reverse xmas-tree way?
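For reference, the check the commit message describes (compare each remote package's average distance against the single global average) can be sketched in userspace C roughly as below. This is a minimal illustration, not the patch: avg_remote_distance() is a hypothetical name, dist[][] stands in for a node distance matrix, node_pkg[] is a hypothetical array of package indices that is assumed dense (which, as noted further down, topology_physical_package_id() does not guarantee), and REMOTE_DISTANCE uses the kernel's value of 20. Locals are declared in reverse xmas-tree order (longest line first), the convention asked for above.

```c
#include <assert.h>
#include <stdbool.h>

#define REMOTE_DISTANCE	20	/* same value as the kernel's REMOTE_DISTANCE */
#define MAX_NODES	16	/* sketch-only bounds */
#define MAX_PKGS	8

/*
 * Hypothetical sketch: average all remote (>= REMOTE_DISTANCE) entries of
 * dist[][], and also per package of the destination node j, as the patch
 * does. *uniform is set to false if any package's average differs from the
 * global one. node_pkg[] is assumed to hold dense indices 0..nr_pkgs-1.
 */
static int avg_remote_distance(int nr_nodes, const int dist[][MAX_NODES],
			       const int *node_pkg, int nr_pkgs, bool *uniform)
{
	int pkg_total[MAX_PKGS] = { 0 }, pkg_nr[MAX_PKGS] = { 0 };
	int total = 0, nr_remote = 0, avg;
	int i, j;

	assert(nr_nodes <= MAX_NODES && nr_pkgs <= MAX_PKGS);

	*uniform = true;
	for (i = 0; i < nr_nodes; i++) {
		for (j = 0; j < nr_nodes; j++) {
			int d = dist[i][j];

			if (d < REMOTE_DISTANCE)
				continue;

			total += d;
			nr_remote++;

			/* Attribute the distance to the package of node j. */
			pkg_total[node_pkg[j]] += d;
			pkg_nr[node_pkg[j]]++;
		}
	}

	if (!nr_remote)
		return REMOTE_DISTANCE;

	/* Integer division, matching the kernel's averaging. */
	avg = total / nr_remote;
	for (i = 0; i < nr_pkgs; i++) {
		if (pkg_nr[i] && pkg_total[i] / pkg_nr[i] != avg)
			*uniform = false;
	}

	return avg;
}
```

With a hypothetical SNC-2 style matrix (nodes 0-1 in package 0, nodes 2-3 in package 1, distances 10 local, 12 within a package, 21 across), every package averages 21 and the single average is consistent; with three packages where one pair sits at 31 instead of 21, the global average is 24 while the per-package averages are 21 and 26, so the single average no longer describes every package.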
>  	if (sched_avg_remote_distance > 0)
>  		return sched_avg_remote_distance;
>  
> +	sched_avg_remote_distance = REMOTE_DISTANCE;
> +
>  	nr_remote = 0;
>  	total_distance = 0;
> +
> +	pkg_total_distance = kcalloc(max_pkgs, sizeof(int), GFP_KERNEL);
> +	if (!pkg_total_distance)
> +		goto cleanup;
> +
> +	pkg_nr_remote = kcalloc(max_pkgs, sizeof(int), GFP_KERNEL);
> +	if (!pkg_nr_remote)
> +		goto cleanup;
> +
>  	for_each_node_state(i, N_CPU) {
>  		for_each_node_state(j, N_CPU) {
>  			distance = node_distance(i, j);
>  
> -			if (distance >= REMOTE_DISTANCE) {
> -				nr_remote++;
> -				total_distance += distance;
> -			}
> +			if (distance < REMOTE_DISTANCE)
> +				continue;
> +
> +			nr_remote++;
> +			total_distance += distance;
> +
> +			cpu = cpumask_first(cpumask_of_node(j));
> +			if (cpu >= nr_cpu_ids)
> +				continue;
> +
> +			pkg = topology_physical_package_id(cpu);
> +			pkg_total_distance[pkg] += distance;
> +			pkg_nr_remote[pkg]++;

This is broken: physical_package_id is not guaranteed to be dense, so it
cannot be used to index arrays sized by topology_max_packages().

>  		}
>  	}
> -	if (nr_remote)
> -		sched_avg_remote_distance = total_distance / nr_remote;
> -	else
> -		sched_avg_remote_distance = REMOTE_DISTANCE;
>  
> +	if (!nr_remote)
> +		goto cleanup;
> +
> +	sched_avg_remote_distance = total_distance / nr_remote;
> +
> +	/*
> +	 * Single average remote distance won't be appropriate if different
> +	 * packages have different distances to remote packages.
> +	 */
> +	for (i = 0; i < max_pkgs; i++) {
> +		if (!pkg_nr_remote[i])
> +			continue;
> +
> +		pkg_avg_distance = pkg_total_distance[i] / pkg_nr_remote[i];
> +
> +		pr_debug("sched: Avg. distance to remote package %d: %d\n", i, pkg_avg_distance);
> +
> +		if (pkg_avg_distance != sched_avg_remote_distance)
> +			WARN_ONCE(1, "sched: Avg. distances to remote packages are different\n");
> +	}

This is pretty yuck. Also, what's with the pr_debug() stuff?

Anyway, that function was fairly magical, and now it is nearly
impenetrable. If we want this, it needs comments. Definitely more
comments, preferably with nice pictures in them.
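To make the density point concrete: the patch sizes its arrays with topology_max_packages() but indexes them with topology_physical_package_id(), so on a hypothetical two-package machine whose physical ids were 0 and 8, pkg_total_distance[8] would land past the end of a two-entry allocation. A minimal userspace sketch of one way to decouple the two remaps each distinct physical id to the next free dense slot. pkg_dense_index() is a hypothetical helper for illustration only, not a kernel API, and the ids used below are made up.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel API): map a possibly sparse physical
 * package id onto a dense slot in [0, max). ids[] remembers which physical
 * id owns each slot and *nr counts the slots handed out so far.
 * Returns the slot index, or -1 if all max slots are already taken.
 */
static int pkg_dense_index(int phys_id, int ids[], int *nr, int max)
{
	int k;

	assert(*nr >= 0 && *nr <= max);

	/* Reuse the slot if this physical id has been seen before. */
	for (k = 0; k < *nr; k++) {
		if (ids[k] == phys_id)
			return k;
	}

	if (*nr == max)
		return -1;

	/* First sighting: claim the next free slot. */
	ids[*nr] = phys_id;
	return (*nr)++;
}
```

With hypothetical physical ids 0 and 8 and max = 2, the slots come out as 0 and 1, so arrays sized by the package count stay in bounds, and an unexpected third id is reported as -1 instead of writing out of bounds. A linear scan is adequate here because a machine has only a handful of packages.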
> +cleanup:
> +	kfree(pkg_nr_remote);
> +	kfree(pkg_total_distance);
>  	return sched_avg_remote_distance;
>  }
>  
> @@ -564,18 +606,7 @@ int arch_sched_node_distance(int from, int to)
>  		 * in the remote package in the same sched group.
>  		 * Simplify NUMA domains and avoid extra NUMA levels including
>  		 * different remote NUMA nodes and local nodes.
> -		 *
> -		 * GNR and CWF don't expect systems with more than 2 packages
> -		 * and more than 2 hops between packages. Single average remote
> -		 * distance won't be appropriate if there are more than 2
> -		 * packages as average distance to different remote packages
> -		 * could be different.
>  		 */
> -		WARN_ONCE(topology_max_packages() > 2,
> -			  "sched: Expect only up to 2 packages for GNR or CWF, "
> -			  "but saw %d packages when building sched domains.",
> -			  topology_max_packages());
> -
>  		d = avg_remote_numa_distance();
>  	}
>  	return d;
> -- 
> 2.52.0
>