From: Chen Yu <yu.c.chen@intel.com>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
<linux-kernel@vger.kernel.org>,
<linux-tip-commits@vger.kernel.org>, Tejun Heo <tj@kernel.org>,
Tim Chen <tim.c.chen@intel.com>, <x86@kernel.org>,
Gautham Shenoy <gautham.shenoy@amd.com>
Subject: Re: [tip: sched/core] sched/fair: Multi-LLC select_idle_sibling()
Date: Mon, 17 Jul 2023 09:09:12 +0800 [thread overview]
Message-ID: <ZLSUuEdwDQ2lbR/Z@chenyu5-mobl2> (raw)
In-Reply-To: <8702a92f-317e-c38f-48ec-5ac373ba5072@amd.com>
Hi Prateek,
On 2023-07-13 at 09:13:29 +0530, K Prateek Nayak wrote:
> Hello Chenyu,
>
> >
> > Tested on Sapphire Rapids, which has 2 x 56C/112T and 224 CPUs in total. C-states
> > deeper than C1E are disabled. Turbo is disabled. CPU frequency governor is performance.
> >
> > The baseline is v6.4-rc1 tip:sched/core, on top of
> > commit 637c9509f3db ("sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()")
> >
> > patch0: this SD_IDLE_SIBLING patch with above change to TOPOLOGY_SD_FLAGS
> > patch1: hack patch to split 1 LLC domain into 4 smaller LLC domains(with some fixes on top of
> > https://lore.kernel.org/lkml/ZJKjvx%2FNxooM5z1Y@chenyu5-mobl2.ccr.corp.intel.com/)
> > The test data in the above link is invalid due to bugs in the hack patch, which are fixed in this version.
> >
> >
> > Baseline vs Baseline+patch0:
> > There is not much difference between the two, which is expected because Sapphire Rapids
> > does not have multiple LLC domains within 1 NUMA node (also considering the run-to-run variation):
> >
[snip]
> >
> > Baseline+patch1 vs Baseline+patch0+patch1:
> >
> > With multiple LLC domains in 1 NUMA node, SD_IDLE_SIBLING brings improvement
> > to hackbench/schbench, while it degrades netperf/tbench. This is aligned
> > with what was observed previously: if the waker and wakee wake each other up
> > frequently, they should be put together for cache locality, while for
> > tasks that do not share resources, always choosing an idle CPU is better.
> > Maybe in the future we can revisit SIS_SHORT and terminate the scan in
> > select_idle_node() if the waker and wakee have a close relationship with
> > each other.
>
> Gautham and I were discussing this and realized that when calling
> ttwu_queue_wakelist(), in a simulated split-LLC case, ttwu_queue_cond()
> will recommend using the wakelist and send an IPI despite the
> groups of the DIE domain sharing the cache in your case.
>
> Can you check if the following change helps the regression?
> (Note: Completely untested and there may be other such cases lurking
> around that we've not yet considered)
>
Good point. There are quite a few cpus_share_cache() checks in the code, and they
could behave differently when the simulated split-LLC is enabled. For example,
the chance of choosing the previous CPU, or a recent_used_cpu, is lower in
select_idle_sibling(), because the range covered by cpus_share_cache() shrinks.
I launched netperf (224 threads) and hackbench (2 groups) with the below patch
applied, and there was not much difference (considering the run-to-run variation).
patch2: the cpus_share_cache() change below.
Baseline+patch1 vs Baseline+patch0+patch1+patch2:
netperf
=======
case            load          baseline(std%)  compare%( std%)
TCP_RR          224-threads    1.00 (  2.36)   -0.19 (  2.30)

hackbench
=========
case            load          baseline(std%)  compare%( std%)
process-pipe    2-groups       1.00 (  4.78)   -6.28 (  9.42)
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a68d1276bab0..a8cab1c81aca 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3929,7 +3929,7 @@ static inline bool ttwu_queue_cond(struct task_struct *p, int cpu)
>  	 * If the CPU does not share cache, then queue the task on the
>  	 * remote rqs wakelist to avoid accessing remote data.
>  	 */
> -	if (!cpus_share_cache(smp_processor_id(), cpu))
> +	if (cpu_to_node(smp_processor_id()) != cpu_to_node(cpu))
>  		return true;
> 
>  	if (cpu == smp_processor_id())
> --
>
Then I tried a hack patch3 in select_idle_node() to keep client/server 1:1 wakeup
pairs together. For netperf there is a 1:1 waker/wakee relationship, while for
hackbench it is 1:16 waker/wakee by default (verified by bpftrace).
patch3:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5904da690f59..3bdfbd546f14 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7161,6 +7161,11 @@ select_idle_node(struct task_struct *p, struct sched_domain *sd, int target)
 	if (!parent || parent->flags & SD_NUMA)
 		return -1;
 
+	/* Tasks pair should be put on local LLC as much as possible. */
+	if (current->last_wakee == p && p->last_wakee == current &&
+	    !current->wakee_flips && !p->wakee_flips)
+		return -1;
+
 	sg = parent->groups;
 	do {
 		int cpu = cpumask_first(sched_group_span(sg));
--
2.25.1
Baseline+patch1 vs Baseline+patch0+patch1+patch3:
netperf
=======
case            load          baseline(std%)  compare%( std%)
TCP_RR          224-threads    1.00 (  2.36)  +804.31 (  2.88)

hackbench
=========
case            load          baseline(std%)  compare%( std%)
process-pipe    2-groups       1.00 (  4.78)   -6.28 (  6.69)
It brings the performance of netperf back, while more or less keeping the
improvement of hackbench (considering the run-to-run variance).
thanks,
Chenyu