From: Tim Chen <tim.c.chen@linux.intel.com>
To: K Prateek Nayak <kprateek.nayak@amd.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	Tim Chen <tim.c.chen@intel.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Libo Chen <libo.chen@oracle.com>,
	Abel Wu <wuyun.abel@bytedance.com>,
	Len Brown <len.brown@intel.com>,
	linux-kernel@vger.kernel.org, Chen Yu <yu.c.chen@intel.com>,
	"Gautham R . Shenoy" <gautham.shenoy@amd.com>,
	Zhao Liu <zhao1.liu@intel.com>,
	Vinicius Costa Gomes <vinicius.gomes@intel.com>,
	Arjan Van De Ven <arjan.van.de.ven@intel.com>
Subject: Re: [PATCH v3 2/2] sched: Fix sched domain build error for GNR, CWF in SNC-3 mode
Date: Mon, 15 Sep 2025 10:15:15 -0700	[thread overview]
Message-ID: <fc688b2dc7d6dcc27bf86a17b291962aeac18bb1.camel@linux.intel.com> (raw)
In-Reply-To: <91ab8136-64f3-45e3-9fec-567aab193353@amd.com>

On Fri, 2025-09-12 at 10:38 +0530, K Prateek Nayak wrote:
> Hello Tim,
> 
> On 9/12/2025 12:00 AM, Tim Chen wrote:
> > It is possible for Granite Rapids (GNR) and Clearwater Forest
> > (CWF) to have up to 3 dies per package. When sub-NUMA clustering (SNC-3)
> > is enabled, each die becomes a separate NUMA node in the package,
> > with different distances between the dies within the same package.
> > 
> > For example, on GNR we see the following NUMA distances for a 2-socket
> > system with 3 dies per socket:
> > 
> >     package 1       package 2
> > 	----------------
> > 	|               |
> >     ---------       ---------
> >     |   0   |       |   3   |
> >     ---------       ---------
> > 	|               |
> >     ---------       ---------
> >     |   1   |       |   4   |
> >     ---------       ---------
> > 	|               |
> >     ---------       ---------
> >     |   2   |       |   5   |
> >     ---------       ---------
> > 	|               |
> > 	----------------
> > 
> > node distances:
> > node     0    1    2    3    4    5
> > 0:      10   15   17   21   28   26
> > 1:      15   10   15   23   26   23
> > 2:      17   15   10   26   23   21
> > 3:      21   28   26   10   15   17
> > 4:      23   26   23   15   10   15
> > 5:      26   23   21   17   15   10
> > 
> > The node distances above lead to two problems:
> > 
> > 1. Asymmetric routes between nodes in different packages lead to
> > asymmetric scheduler domain views, depending on which node you are
> > on. The current scheduler code fails to build domains properly with
> > asymmetric distances.
> > 
> > 2. Multiple distinct distances to the respective dies (tiles) on the
> > remote package create too many levels of domain hierarchy, each
> > grouping a different subset of nodes across the two packages.
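Both problems can be read directly off the distance table. The short Python check below is an illustrative sketch. It assumes nodes 0-2 form package 1 and nodes 3-5 form package 2, as in the diagram, and its helper names are hypothetical. It lists node pairs whose distance depends on direction. It also lists the distinct cross-package distances each node sees, since each of these can become a separate NUMA level.

```python
# GNR SNC-3 node distance table from the example (row i = distances from node i).
DIST = [
    [10, 15, 17, 21, 28, 26],
    [15, 10, 15, 23, 26, 23],
    [17, 15, 10, 26, 23, 21],
    [21, 28, 26, 10, 15, 17],
    [23, 26, 23, 15, 10, 15],
    [26, 23, 21, 17, 15, 10],
]
NODES_PER_PACKAGE = 3  # assumption: nodes 0-2 in package 1, nodes 3-5 in package 2


def package_of(node):
    """Package index of a node under the assumed 3-nodes-per-package layout."""
    return node // NODES_PER_PACKAGE


def asymmetric_pairs(dist):
    """Return (i, j, d[i][j], d[j][i]) for every i < j where the two directions differ."""
    n = len(dist)
    return [(i, j, dist[i][j], dist[j][i])
            for i in range(n) for j in range(i + 1, n)
            if dist[i][j] != dist[j][i]]


def distinct_remote_distances(dist, node):
    """Sorted distinct distances from `node` to the nodes of the other package."""
    return sorted({dist[node][j] for j in range(len(dist))
                   if package_of(j) != package_of(node)})


if __name__ == "__main__":
    for i, j, dij, dji in asymmetric_pairs(DIST):
        print(f"d({i},{j})={dij} but d({j},{i})={dji}")
    for node in range(len(DIST)):
        print(f"node {node}: remote distances {distinct_remote_distances(DIST, node)}")
```

For this table it reports d(0,4)=28 vs d(4,0)=23 and d(1,3)=23 vs d(3,1)=28. Nodes 0, 2, 3 and 5 each see three distinct remote distances.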
> > 
> > For example, the above GNR-X topology leads to the NUMA domains below:
> > 
> > Sched domains from the perspective of a CPU in node 0, where the numbers
> > in brackets represent node numbers:
> > 
> > NUMA-level 1    [0,1] [2]
> > NUMA-level 2    [0,1,2] [3]
> > NUMA-level 3    [0,1,2,3] [5]
> > NUMA-level 4    [0,1,2,3,5] [4]
> > 
> > Sched domains from the perspective of a CPU in node 4:
> > NUMA-level 1    [4] [3,5]
> > NUMA-level 2    [3,4,5] [0,2]
> > NUMA-level 3    [0,2,3,4,5] [1]
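The way these levels follow from the distance table can be reproduced with a small Python model. This is a sketch, not the kernel's sched_init_numa() code. From a given node, each distinct distance in that node's row defines a level whose span contains every node at or below that distance. Each printed line pairs the previous span with the nodes newly added, mirroring the listings above. The model ignores CPU masks, domain degeneration and how the kernel actually builds groups, and the helper names are hypothetical. For node 0 it also prints the distance-15 level ([0] [1]), which the listing above does not show.

```python
# GNR SNC-3 node distance table from the example (row i = distances from node i).
DIST = [
    [10, 15, 17, 21, 28, 26],
    [15, 10, 15, 23, 26, 23],
    [17, 15, 10, 26, 23, 21],
    [21, 28, 26, 10, 15, 17],
    [23, 26, 23, 15, 10, 15],
    [26, 23, 21, 17, 15, 10],
]


def numa_spans(dist, node):
    """Simplified model of NUMA domain spans as seen from `node`.

    For each distinct distance d in node's row (increasing), the span is every
    node j with dist[node][j] <= d. A distance that does not grow the span
    (here only the local distance) is skipped. Returns (d, span) pairs.
    """
    spans = []
    prev = {node}
    for d in sorted(set(dist[node])):
        span = {j for j, dj in enumerate(dist[node]) if dj <= d}
        if span == prev:
            continue
        spans.append((d, sorted(span)))
        prev = span
    return spans


def print_levels(dist, node):
    """Print each level as '[previous span] [newly added nodes]'."""
    prev = [node]
    for d, span in numa_spans(dist, node):
        added = [j for j in span if j not in prev]
        print(f"distance {d}: {prev} {added}")
        prev = span


if __name__ == "__main__":
    for node in (0, 4):
        print(f"node {node}:")
        print_levels(DIST, node)
```

From node 0 this yields five levels, and from node 4 only three. That is the mismatch in perspective described in the next paragraph.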
> > 
> > The scheduler group peers used for load balancing differ between the
> > perspectives of CPUs in node 0 and node 4. An improper task could be
> > chosen when balancing between groups such as [0,2,3,4,5] [1]. Ideally,
> > tasks on nodes 0 or 2, which are in the same package as node 1, should
> > be chosen first. Instead, tasks on the remote-package nodes 3, 4 and 5
> > could be chosen with equal probability, which can lead to excessive
> > cross-package migrations and load imbalance between packages. We should
> > not group a subset of remote nodes together with local nodes.
> >
> > Simplify the remote distances for CWF-X and GNR-X for the purpose of
> > building sched domains; this maintains symmetry and leads to a more
> > reasonable load-balancing hierarchy.
> > 
> > The sched domains from the perspective of a CPU in node 0 are now:
> > NUMA-level 1    [0,1] [2]
> > NUMA-level 2    [0,1,2] [3,4,5]
> > 
> > The sched domains from the perspective of a CPU in node 4 are now:
> > NUMA-level 1    [4] [3,5]
> > NUMA-level 2    [3,4,5] [0,1,2]
> > 
> > The balancing perspective is now the same from node 0 and node 4, and
> > load is balanced equally between packages.
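The effect of simplifying the remote distances can be checked with a small Python model. It assumes nodes 0-2 form package 1 and nodes 3-5 form package 2. Purely as an illustrative assumption, it replaces every cross-package distance with the integer average of all cross-package entries, which is 24 for this table. The patch's actual choice of value may differ, and the helper names are hypothetical. It then recomputes the spans with a simplified model: nodes within each distinct distance of the given node, ignoring degeneration and group construction.

```python
# GNR SNC-3 node distance table from the example (row i = distances from node i).
DIST = [
    [10, 15, 17, 21, 28, 26],
    [15, 10, 15, 23, 26, 23],
    [17, 15, 10, 26, 23, 21],
    [21, 28, 26, 10, 15, 17],
    [23, 26, 23, 15, 10, 15],
    [26, 23, 21, 17, 15, 10],
]
NODES_PER_PACKAGE = 3  # assumption: nodes 0-2 in package 1, nodes 3-5 in package 2


def same_package(i, j):
    """True if nodes i and j are in the same package under the assumed layout."""
    return i // NODES_PER_PACKAGE == j // NODES_PER_PACKAGE


def simplify_remote(dist):
    """Replace every cross-package distance with one uniform value.

    Assumption for illustration: the uniform value is the integer average of
    all cross-package entries. Intra-package distances are kept unchanged.
    """
    n = len(dist)
    remote = [dist[i][j] for i in range(n) for j in range(n) if not same_package(i, j)]
    uniform = sum(remote) // len(remote)
    return [[dist[i][j] if same_package(i, j) else uniform for j in range(n)]
            for i in range(n)]


def numa_spans(dist, node):
    """Simplified span model: nodes within each distinct distance of `node`."""
    spans, prev = [], {node}
    for d in sorted(set(dist[node])):
        span = {j for j, dj in enumerate(dist[node]) if dj <= d}
        if span != prev:
            spans.append((d, sorted(span)))
            prev = span
    return spans


if __name__ == "__main__":
    simple = simplify_remote(DIST)
    n = len(simple)
    print("symmetric:", all(simple[i][j] == simple[j][i] for i in range(n) for j in range(n)))
    for node in (0, 4):
        print(f"node {node}: {numa_spans(simple, node)}")
```

The intra-package blocks are already symmetric, so a uniform remote value makes the whole table symmetric. Any uniform value strictly larger than the largest intra-package distance (17 here) gives the same spans. From node 0 these are [0,1], [0,1,2] and then all nodes. From node 4 they are [3,4,5] and then all nodes.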
> > 
> > Tested-by: Zhao Liu <zhao1.liu@intel.com>
> > Co-developed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> > Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> 
> Feel free to include:
> 
> Reviewed-and-tested-by: K Prateek Nayak <kprateek.nayak@amd.com>

Thanks for reviewing and testing.

Tim

Thread overview: 18+ messages
2025-09-11 18:30 [PATCH v3 0/2] Fix NUMA sched domain build errors for GNR and CWF Tim Chen
2025-09-11 18:30 ` [PATCH v3 1/2] sched: Create architecture specific sched domain distances Tim Chen
2025-09-12  3:23   ` K Prateek Nayak
2025-09-15 16:44     ` Tim Chen
2025-09-17  6:45       ` K Prateek Nayak
2025-09-12  5:24   ` Chen, Yu C
2025-09-15 16:49     ` Tim Chen
2025-09-15 17:16     ` Tim Chen
2025-09-15 12:37   ` Peter Zijlstra
2025-09-15 17:13     ` Tim Chen
2025-09-15 20:04       ` Tim Chen
2025-09-11 18:30 ` [PATCH v3 2/2] sched: Fix sched domain build error for GNR, CWF in SNC-3 mode Tim Chen
2025-09-12  5:08   ` K Prateek Nayak
2025-09-15 17:15     ` Tim Chen [this message]
2025-09-12  5:39   ` Chen, Yu C
2025-09-12  9:23     ` K Prateek Nayak
2025-09-12 11:59       ` Chen, Yu C
2025-09-15 12:46   ` Peter Zijlstra
