From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <62f610811c1b1cba7e282b6e855baba11f7f49a6.camel@linux.intel.com>
Subject: Re: [PATCH v2] sched/topology: Check average distances to remote packages
From: Tim Chen
To: Peter Zijlstra, "Chen, Yu C"
Cc: Kyle Meyer, bp@alien8.de, dave.hansen@linux.intel.com, mingo@redhat.com,
 tglx@kernel.org, vinicius.gomes@intel.com, brgerst@gmail.com, hpa@zytor.com,
 kprateek.nayak@amd.com, linux-kernel@vger.kernel.org,
 patryk.wlazlyn@linux.intel.com, rafael.j.wysocki@intel.com,
 russ.anderson@hpe.com, x86@kernel.org, zhao1.liu@intel.com
Date: Wed, 25 Feb 2026 13:37:11 -0800
In-Reply-To: <20260225163246.GX1395416@noisy.programming.kicks-ass.net>
References: <20260223170314.GU1395266@noisy.programming.kicks-ass.net>
 <20260225123052.GN3016024@noisy.programming.kicks-ass.net>
 <20260225154409.GD1282955@noisy.programming.kicks-ass.net>
 <20260225163246.GX1395416@noisy.programming.kicks-ass.net>
Content-Type: text/plain; charset="UTF-8"
User-Agent: Evolution 3.58.1 (3.58.1-1.fc43)
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0

On Wed, 2026-02-25 at 17:32 +0100, Peter Zijlstra wrote:
> On Wed, Feb 25, 2026 at 04:44:09PM +0100, Peter Zijlstra wrote:
>
> > Yes, so this assumes that all u sized clusters on the trace are similar
> > and 'sane' without verification.
>
> That gave me an idea; how's this then?

Sorry I was sick for a few days. Just catching up on this thread here.

I think your patch takes care of both GNR SNC-3 with 3 compute dies
(with non-symmetric remote distances) and generic SNC-2 with 2 dies
(symmetric distances) very well. Minor suggestion below for the patch.

Will ask the original GNR teams with the problem to try it out.

>
> ---
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 5cd6950ab672..b1e464fd98c0 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -513,33 +513,99 @@ static void __init build_sched_topology(void)
>  }
>  
>  #ifdef CONFIG_NUMA
> -static int sched_avg_remote_distance;
> -static int avg_remote_numa_distance(void)
> +
> +static bool slit_cluster_symmetric(int i, int j, int n)
>  {
> -	int i, j;
> -	int distance, nr_remote, total_distance;
> +	WARN_ON_ONCE((i % n) || (j % n));
>  
> -	if (sched_avg_remote_distance > 0)
> -		return sched_avg_remote_distance;
> -
> -	nr_remote = 0;
> -	total_distance = 0;
> -	for_each_node_state(i, N_CPU) {
> -		for_each_node_state(j, N_CPU) {
> -			distance = node_distance(i, j);
> -
> -			if (distance >= REMOTE_DISTANCE) {
> -				nr_remote++;
> -				total_distance += distance;
> -			}
> +	for (int k = i; k < i + n; k++) {
> +		for (int l = k; l < j + n; l++) {
> +			if (node_distance(k, l) != node_distance(k, l))
> +				return false;
>  		}
>  	}
> -	if (nr_remote)
> -		sched_avg_remote_distance = total_distance / nr_remote;
> -	else
> -		sched_avg_remote_distance = REMOTE_DISTANCE;
>  
> -	return sched_avg_remote_distance;
> +	return true;
> +}
> +
> +static bool slit_cluster_match(int i, int j, int x, int y, int n)

Seems like we only call this function with i==j and x==y (i.e. cluster
at i and cluster at x). Can we simplify?

Thanks.

Tim

> +{
> +	WARN_ON_ONCE((i % n) || (j % n) || (x % n) || (y % n));
> +
> +	for (int k = 0; k < n; k++) {
> +		for (int l = k; l < n; l++) {
> +			if (node_distance(i + k, j + l) != node_distance(x + k, y + l))
> +				return false;
> +		}
> +	}
> +
> +	return true;
> +}
> +
> +/*
> + * Find the largest symmetric,repeating cluster in an attempt to identify the
> + * unit size.
> + */
> +static int slit_cluster_size(void)
> +{
> +	int nodes = num_possible_nodes();
> +
> +	/*
> +	 * There are at least 2 packages; so half-nodes is the largest
> +	 * possible unit, go down from that.
> +	 */
> +	for (int u = nodes / 2; u; u--) {
> +		/*
> +		 * If u doesn't divide nodes, it can't be a unit.
> +		 */
> +		if (nodes % u)
> +			continue;
> +
> +		/*
> +		 * Unit must be symmetric,
> +		 */
> +		if (!slit_cluster_symmetric(0, 0, u))
> +			continue;
> +
> +		/*
> +		 * and repeating.
> +		 */
> +		if (slit_cluster_match(0, 0, u, u, u))
> +			return u;
> +	}
> +
> +	return nodes;
> +}
> +
> +static int slit_cluster_distance(int i, int j)
> +{
> +	static int u = 0;
> +	long d = 0;
> +	int x, y;
> +
> +	if (!u)
> +		u = slit_cluster_size();
> +
> +	/*
> +	 * Is this a unit cluster on the trace?
> +	 */
> +	if ((i / u) == (j / u))
> +		return node_distance(i, j);
> +
> +	/*
> +	 * Off-trace cluster, return average of the cluster to force symmetry.
> +	 */
> +	x = i - (i % u);
> +	y = j - (j % u);
> +
> +	for (i = x; i < x + u; i++) {
> +		for (j = y; j < y + u; j++) {
> +			d += node_distance(i, j);
> +			d += node_distance(j, i);
> +		}
> +	}
> +
> +	return d / (2*u*u);
>  }
>  
>  int arch_sched_node_distance(int from, int to)
> @@ -550,8 +616,7 @@ int arch_sched_node_distance(int from, int to)
>  	case INTEL_GRANITERAPIDS_X:
>  	case INTEL_ATOM_DARKMONT_X:
>  
> -		if (!x86_has_numa_in_package || topology_max_packages() == 1 ||
> -		    d < REMOTE_DISTANCE)
> +		if (!x86_has_numa_in_package || topology_max_packages() == 1)
>  			return d;
>  
>  		/*
> @@ -564,19 +629,8 @@ int arch_sched_node_distance(int from, int to)
>  		 * in the remote package in the same sched group.
>  		 * Simplify NUMA domains and avoid extra NUMA levels including
>  		 * different remote NUMA nodes and local nodes.
> -		 *
> -		 * GNR and CWF don't expect systems with more than 2 packages
> -		 * and more than 2 hops between packages. Single average remote
> -		 * distance won't be appropriate if there are more than 2
> -		 * packages as average distance to different remote packages
> -		 * could be different.
>  		 */
> -		WARN_ONCE(topology_max_packages() > 2,
> -			  "sched: Expect only up to 2 packages for GNR or CWF, "
> -			  "but saw %d packages when building sched domains.",
> -			  topology_max_packages());
> -
> -		d = avg_remote_numa_distance();
> +		return slit_cluster_distance(from, to);
>  	}
>  	return d;
>  }
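
For illustration, a minimal user-space sketch of the simplification suggested
above, assuming slit_cluster_match() only ever compares two on-diagonal
clusters (i == j and x == y). This is not part of the posted patch: the
slit[][] table and its values are made up, node_distance() here is a stand-in
for the kernel helper, and slit_cluster_avg_distance() is a hypothetical name
that mirrors the averaging done in slit_cluster_distance(), with the unit size
passed in instead of being detected.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES 4

/* Made-up SLIT-like table: 2 packages x 2 nodes each (illustrative values). */
static const int slit[NR_NODES][NR_NODES] = {
	{ 10, 12, 21, 23 },
	{ 12, 10, 23, 25 },
	{ 21, 23, 10, 12 },
	{ 23, 25, 12, 10 },
};

/* User-space stand-in for the kernel's node_distance(). */
static int node_distance(int from, int to)
{
	return slit[from][to];
}

/*
 * Three-argument variant: compare the on-diagonal cluster starting at
 * node i with the on-diagonal cluster starting at node x, n nodes each.
 */
static bool slit_cluster_match(int i, int x, int n)
{
	/* Both clusters must start on a unit boundary. */
	assert((i % n) == 0 && (x % n) == 0);

	for (int k = 0; k < n; k++) {
		for (int l = k; l < n; l++) {
			if (node_distance(i + k, i + l) != node_distance(x + k, x + l))
				return false;
		}
	}

	return true;
}

/*
 * Same arithmetic as slit_cluster_distance() in the quoted patch, but the
 * unit size u is an argument (hypothetical helper, for illustration only).
 */
static int slit_cluster_avg_distance(int i, int j, int u)
{
	long d = 0;
	int x = i - (i % u);
	int y = j - (j % u);

	/* Same unit cluster: keep the raw distance. */
	if ((i / u) == (j / u))
		return node_distance(i, j);

	/* Different clusters: average both directions over the whole block. */
	for (int a = x; a < x + u; a++) {
		for (int b = y; b < y + u; b++) {
			d += node_distance(a, b);
			d += node_distance(b, a);
		}
	}

	return (int)(d / (2 * u * u));
}
```

With this table and u = 2, slit_cluster_match(0, 2, 2) holds, distances inside
a package are returned unchanged, and every cross-package pair collapses to the
block average of 23.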