public inbox for linux-kernel@vger.kernel.org
* [PATCH v4 0/2] Fix NUMA sched domain build errors for GNR and CWF
@ 2025-09-19 17:50 Tim Chen
  2025-09-19 17:50 ` [PATCH v4 1/2] sched: Create architecture specific sched domain distances Tim Chen
  2025-09-19 17:50 ` [PATCH v4 2/2] sched/topology: Fix sched domain build error for GNR, CWF in SNC-3 mode Tim Chen
  0 siblings, 2 replies; 8+ messages in thread
From: Tim Chen @ 2025-09-19 17:50 UTC
  To: Peter Zijlstra, Ingo Molnar
  Cc: Tim Chen, Juri Lelli, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Valentin Schneider, Tim Chen, Vincent Guittot, Len Brown,
	linux-kernel, Chen Yu, K Prateek Nayak, Gautham R . Shenoy,
	Zhao Liu, Vinicius Costa Gomes, Arjan Van De Ven

While testing Granite Rapids (GNR) and Clearwater Forest (CWF) in
SNC-3 mode, we encountered sched domain build errors in dmesg.
The scheduler domain code did not expect asymmetric node distances
from the local node to nodes in the remote package. Multiple distances
to different remote nodes led to multiple groupings of partial sets of
remote nodes with local nodes, and to too many sched domain hierarchy
levels.

Simplify the remote node distances for the purpose of building sched
domains for GNR and CWF. Replace the distances to nodes in the same
remote package with the average distance to those nodes. This fixes the
domain build errors and reduces the number of NUMA sched domain levels.
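
As a rough illustration of the averaging idea only (this is not the code
from the patches), the userspace sketch below keeps in-package distances
and replaces every cross-package distance with one average. The node
count, the node_to_pkg() mapping, and all function names are assumptions
made for this example, and averaging over all node pairs between two
packages is one possible reading of "average distance".

```c
#include <stdbool.h>

#define NR_NODES      6	/* assumption: 2 packages x 3 SNC nodes */
#define NODES_PER_PKG 3

/* Hypothetical mapping: nodes 0-2 are package 0, nodes 3-5 are package 1. */
int node_to_pkg(int node)
{
	return node / NODES_PER_PKG;
}

/* Average distance over all node pairs (i in pkg_a, j in pkg_b). */
int avg_pkg_distance(int slit[NR_NODES][NR_NODES], int pkg_a, int pkg_b)
{
	int sum = 0, cnt = 0;

	for (int i = 0; i < NR_NODES; i++) {
		for (int j = 0; j < NR_NODES; j++) {
			if (node_to_pkg(i) == pkg_a && node_to_pkg(j) == pkg_b) {
				sum += slit[i][j];
				cnt++;
			}
		}
	}
	return cnt ? sum / cnt : 0;
}

/* Keep in-package distances; replace cross-package ones with the average. */
void build_sched_dist(int slit[NR_NODES][NR_NODES],
		      int out[NR_NODES][NR_NODES])
{
	for (int i = 0; i < NR_NODES; i++) {
		for (int j = 0; j < NR_NODES; j++) {
			int pi = node_to_pkg(i), pj = node_to_pkg(j);

			out[i][j] = (pi == pj) ? slit[i][j]
					       : avg_pkg_distance(slit, pi, pj);
		}
	}
}

/* Count distinct values in a table; entries outside 0-255 are ignored. */
int count_distinct(int dist[NR_NODES][NR_NODES])
{
	bool seen[256] = { false };
	int n = 0;

	for (int i = 0; i < NR_NODES; i++) {
		for (int j = 0; j < NR_NODES; j++) {
			int d = dist[i][j];

			if (d < 0 || d > 255 || seen[d])
				continue;
			seen[d] = true;
			n++;
		}
	}
	return n;
}
```

Comparing count_distinct() on the input and output tables shows how the
number of unique distances shrinks, which is what limits the number of
NUMA levels the scheduler has to build.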

The actual SLIT NUMA node distances are kept separately when the node
distances are modified for building sched domains. NUMA balancing still
needs to use the actual distances to locate the remote node that is
closest to a task's numa_group.
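
The split between actual and sched-domain distances can be pictured with
the userspace sketch below. It is an assumption-laden illustration, not
the kernel implementation: struct numa_dist_tables, arch_dist_fn, and
toy_flatten_remote() are hypothetical names, and the second table is
allocated only when the hook actually changes a distance.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical arch hook: returns the distance to use for sched domains. */
typedef int (*arch_dist_fn)(int from, int to, int slit_dist);

struct numa_dist_tables {
	int nr_nodes;
	int *actual;	/* distances as reported by firmware */
	int *sched;	/* distances for sched domains; may alias 'actual' */
};

/* Toy hook for illustration: flatten every distance above 21 to 30. */
int toy_flatten_remote(int from, int to, int slit_dist)
{
	(void)from;
	(void)to;
	return slit_dist > 21 ? 30 : slit_dist;
}

int numa_dist_tables_init(struct numa_dist_tables *t, const int *slit,
			  int nr_nodes, arch_dist_fn arch_dist)
{
	size_t n = (size_t)nr_nodes * nr_nodes;
	int modified = 0;

	t->nr_nodes = nr_nodes;
	t->actual = malloc(n * sizeof(*t->actual));
	if (!t->actual)
		return -1;
	memcpy(t->actual, slit, n * sizeof(*t->actual));
	t->sched = t->actual;		/* share one table by default */

	if (!arch_dist)
		return 0;

	/* First pass: does the hook change anything at all? */
	for (int i = 0; i < nr_nodes && !modified; i++)
		for (int j = 0; j < nr_nodes; j++)
			if (arch_dist(i, j, slit[i * nr_nodes + j]) !=
			    slit[i * nr_nodes + j]) {
				modified = 1;
				break;
			}
	if (!modified)
		return 0;

	/* Second table only when distances were really modified. */
	t->sched = malloc(n * sizeof(*t->sched));
	if (!t->sched) {
		free(t->actual);
		t->actual = NULL;
		return -1;
	}
	for (int i = 0; i < nr_nodes; i++)
		for (int j = 0; j < nr_nodes; j++)
			t->sched[i * nr_nodes + j] =
				arch_dist(i, j, slit[i * nr_nodes + j]);
	return 0;
}

void numa_dist_tables_free(struct numa_dist_tables *t)
{
	if (t->sched != t->actual)
		free(t->sched);
	free(t->actual);
	t->sched = t->actual = NULL;
}
```

In this sketch a consumer such as NUMA balancing would read t->actual,
while domain construction would read t->sched; when no hook modifies the
distances, both pointers refer to the same table.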

Thanks to Prateek, Chen Yu and Peter for reviewing previous
versions of the patches and providing valuable feedback.
Please add your Reviewed-by if this version looks okay to you.

Thanks.

Tim

Changes in v4:
- Move average node distance computation to x86 specific code
- Put all the changes under CONFIG_NUMA.
- Use __free() to simplify code.
- Allocate separate distance array only if node distances are
  modified.
- Assert that we don't have more than 2 packages for GNR/CWF
  when replacing remote node distances with average remote node
  distance.
- Comment and code style cleanups.
- Link to v3:
  https://lore.kernel.org/lkml/cover.1757614784.git.tim.c.chen@linux.intel.com/

Changes in v3:
- Simplify sched_record_numa_dist() by getting rid of max distance
  computation.
- Minor cleanups.
- Link to v2:
  https://lore.kernel.org/lkml/61a6adbb845c148361101e16737307c8aa7ee362.1757097030.git.tim.c.chen@linux.intel.com/

Changes in v2:
- Allow an architecture to modify the NUMA distances used for
  building sched domains, in order to simplify the NUMA domains.
  Maintain the NUMA distances used for building sched domains
  separately from the actual NUMA distances.
- Use average remote node distance as the distance to nodes in remote
  packages for GNR and CWF.
- Remove the original fix for topology_span_sane(), which is superseded
  by a better fix from Prateek:
  https://lore.kernel.org/lkml/175688671425.1920.13690753997160836570.tip-bot2@tip-bot2/
- Link to v1: https://lore.kernel.org/lkml/cover.1755893468.git.tim.c.chen@linux.intel.com/


Tim Chen (2):
  sched: Create architecture specific sched domain distances
  sched/topology: Fix sched domain build error for GNR, CWF in SNC-3
    mode

 arch/x86/kernel/smpboot.c      |  70 ++++++++++++++++++++
 include/linux/sched/topology.h |   1 +
 kernel/sched/topology.c        | 117 ++++++++++++++++++++++++++-------
 3 files changed, 166 insertions(+), 22 deletions(-)

-- 
2.32.0



Thread overview: 8+ messages
2025-09-19 17:50 [PATCH v4 0/2] Fix NUMA sched domain build errors for GNR and CWF Tim Chen
2025-09-19 17:50 ` [PATCH v4 1/2] sched: Create architecture specific sched domain distances Tim Chen
2025-09-27 12:34   ` Chen, Yu C
2025-09-29 22:18     ` Tim Chen
2025-09-30  2:28       ` Chen, Yu C
2025-09-30 17:30         ` Tim Chen
2025-10-01  1:10           ` Chen, Yu C
2025-09-19 17:50 ` [PATCH v4 2/2] sched/topology: Fix sched domain build error for GNR, CWF in SNC-3 mode Tim Chen
