public inbox for linux-kernel@vger.kernel.org
From: Chen Yu <yu.c.chen@intel.com>
To: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Ingo Molnar <mingo@redhat.com>,
	"Vincent Guittot" <vincent.guittot@linaro.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Tim Chen <tim.c.chen@intel.com>, Aaron Lu <aaron.lu@intel.com>,
	"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mel Gorman <mgorman@suse.de>,
	K Prateek Nayak <kprateek.nayak@amd.com>,
	"Gautham R . Shenoy" <gautham.shenoy@amd.com>,
	Chen Yu <yu.chen.surf@gmail.com>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 0/3] Introduce SIS_CACHE to choose previous CPU during task wakeup
Date: Sun, 26 Nov 2023 17:25:48 +0800
Message-ID: <ZWMPHC+uWdojgDB3@chenyu5-mobl2.ccr.corp.intel.com>
In-Reply-To: <23e9a0f2-be96-4eb6-0242-2865180c1d6c@linux.ibm.com>

On 2023-11-26 at 14:14:20 +0530, Madadi Vineeth Reddy wrote:
> Hi Chen Yu,
> 
> On 21/11/23 13:09, Chen Yu wrote:
> > v1  -> v2:
> > - Move the task sleep duration from sched_entity to task_struct. (Aaron Lu)
> > - Refine the task sleep duration calculation based on task's previous running
> >   CPU. (Aaron Lu)
> > - Limit the cache-hot idle CPU scan depth to reduce the time spent on
> >   searching, to fix the regression. (K Prateek Nayak)
> > - Add test results of real-life workloads, per request from Ingo:
> >     Daytrader on a Power system. (Madadi Vineeth Reddy)
> >     OLTP workload on Xeon Sapphire Rapids.
> > - Refined the commit log, added Reviewed-by tag to PATCH 1/3
> >   (Mathieu Desnoyers).
> > 
> > RFC -> v1:
> > - drop RFC
> > - Only record the short sleeping time for each task, to better honor the
> >   burst sleeping tasks. (Mathieu Desnoyers)
> > - Keep the forward movement monotonic for runqueue's cache-hot timeout value.
> >   (Mathieu Desnoyers, Aaron Lu)
> > - Introduce a new helper function cache_hot_cpu() that considers
> >   rq->cache_hot_timeout. (Aaron Lu)
> > - Add analysis of why inhibiting task migration could bring better throughput
> >   for some benchmarks. (Gautham R. Shenoy)
> > - Choose the first cache-hot CPU if all idle CPUs are cache-hot in
> >   select_idle_cpu(), to avoid possible task stacking on the waker's CPU.
> >   (K Prateek Nayak)
> > 
> > Thanks for the comments and tests!
> > 
> > ----------------------------------------------------------------------
> > 
> > This series aims to continue the discussion of how to make it easier
> > for the wakee to choose its previous CPU.
> > 
> > When task p is woken up, the scheduler leverages select_idle_sibling()
> > to find an idle CPU for it. p's previous CPU is usually a preference
> > because it can improve cache locality. However in many cases, the
> > previous CPU has already been taken by other wakees, thus p has to
> > find another idle CPU.
> > 
> > Inhibiting task migration could benefit many workloads. Inspired by
> > Mathieu's proposal to limit the task migration ratio[1], introduce
> > SIS_CACHE. It considers the sleep time of the task for better
> > task placement. Based on the task's short sleeping history, tag p's
> > previous CPU as cache-hot. Later when p is woken up, it can choose
> > its previous CPU in select_idle_sibling(). When another task is
> > woken up, skip this cache-hot idle CPU and try the next idle CPU
> > when possible. The idea of SIS_CACHE is to optimize the idle CPU
> > scan sequence. The extra scan time is minimized by restricting the
> > scan depth of cache-hot CPUs to 50% of the scan depth of SIS_UTIL.
> > 
> > This test is based on tip/sched/core, on top of
> > Commit ada87d23b734
> > ("x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram")
> > 
> > This patch set has shown 15% ~ 70% improvements for client/server
> > workloads like netperf and tbench. It shows a 0.7% improvement in
> > OLTP, with 0.2% run-to-run variation, on a Xeon system with 240 CPUs.
> > There is a 2% improvement in another real-life workload, Daytrader,
> > per Madadi's test on a Power system with 96 CPUs. Prateek
> > has helped check that there is no obvious microbenchmark regression
> > in v2 on a 3rd Generation EPYC system with 128 CPUs.
> > 
> 
> Tested the patch on a Power system with 46 cores, for a total of 368 CPUs.
> The system has 8 NUMA nodes.
> 
> Below are some of the benchmark results.
> 
> schbench(new) 99.0th latency (lower is better)
> ========
> case            load            baseline[pct imp](std%)        SIS_CACHE[pct imp](std%)
> normal          1-mthreads      1.00 [ 0.00]( 4.34)            1.02 [ -2.00]( 5.98)
> normal          2-mthreads      1.00 [ 0.00]( 13.95)           1.08 [ -8.00]( 10.39)
> normal          4-mthreads      1.00 [ 0.00]( 6.20)            0.94 [ +6.00]( 10.90)
> normal          6-mthreads      1.00 [ 0.00]( 12.76)           1.03 [ -3.00]( 9.33)
> 
> It seems schbench is not much impacted by this patch (the pct imp of schbench is within the std%).
> I expected some regression in wakeup latency while searching for an idle CPU which is not cache-hot,
> but I guess limiting the search depth has helped.
>

I think so. Cutting the cache-hot CPU scan depth to 50% also seems to cure the regression
reported by Prateek.
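
To illustrate the idea being discussed, here is a minimal user-space sketch in
Python. It is not the kernel code from this series. Only the names
cache_hot_cpu() and cache_hot_timeout come from the changelog; the Cpu class,
the select_idle_cpu() signature, the "now" timestamp and the exact way the 50%
budget is consumed are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    idle: bool
    # Hypothetical stand-in for rq->cache_hot_timeout: the CPU counts as
    # cache-hot for a recently slept task until this timestamp.
    cache_hot_timeout: int = 0


def cache_hot_cpu(cpu: Cpu, now: int) -> bool:
    """True while the CPU's cache-hot window has not yet expired."""
    return now < cpu.cache_hot_timeout


def select_idle_cpu(cpus: list, now: int, scan_depth: int) -> int:
    """Return the index of the chosen CPU, or -1 if no idle CPU was seen.

    Assumption: at most half of scan_depth may be spent skipping cache-hot
    idle CPUs; once that budget is used up, fall back to the first cache-hot
    idle CPU that was seen.
    """
    hot_budget = scan_depth // 2
    first_hot = -1
    for i, cpu in enumerate(cpus[:scan_depth]):
        if not cpu.idle:
            continue
        if cache_hot_cpu(cpu, now):
            if first_hot < 0:
                first_hot = i      # remember it as a fallback
            hot_budget -= 1
            if hot_budget <= 0:
                break              # stop skipping cache-hot CPUs
            continue
        return i                   # idle and not cache-hot: take it
    return first_hot


# CPU 0 is busy, CPU 1 is idle but cache-hot, CPU 2 is idle and cold.
cpus = [Cpu(False), Cpu(True, 100), Cpu(True), Cpu(True)]
print(select_idle_cpu(cpus, now=50, scan_depth=4))  # prints 2
```

In this sketch the fallback to the first cache-hot CPU mirrors the RFC -> v1
changelog item about all idle CPUs being cache-hot, and the halved budget
bounds the extra time spent skipping.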
 
> 
> producer_consumer avg time/access (lower is better)
> ========
> loads per consumer iteration   baseline[pct imp](std%)        SIS_CACHE[pct imp](std%)
> 5                              1.00 [ 0.00]( 0.00)            0.93 [ +7.00]( 4.77)
> 10                             1.00 [ 0.00]( 0.00)            1.00 [  0.00]( 0.00)
> 20                             1.00 [ 0.00]( 0.00)            1.00 [  0.00]( 0.00)
> 
> The patch's main goal of improving cache locality is reflected here: SIS_CACHE improves this
> workload only when the number of loads per consumer iteration is lower.
> 
> 
> hackbench normalized time in seconds (lower is better)
> ========
> case            load         baseline[pct imp](std%)        SIS_CACHE[pct imp](std%)
> process-sockets 1-groups     1.00 [ 0.00]( 4.78)            0.99 [ +1.00]( 6.45)
> process-sockets 2-groups     1.00 [ 0.00]( 0.97)            1.02 [ -2.00]( 1.87)
> process-sockets 4-groups     1.00 [ 0.00]( 3.63)            1.01 [ -1.00]( 2.96)
> process-sockets 8-groups     1.00 [ 0.00]( 0.43)            1.00 [  0.00]( 0.27)
> process-pipe    1-groups     1.00 [ 0.00](23.77)            0.88 [+12.00](22.77)
> process-pipe    2-groups     1.00 [ 0.00]( 3.44)            1.03 [ -3.00]( 4.00)
> process-pipe    4-groups     1.00 [ 0.00]( 2.41)            0.98 [ +2.00]( 3.88)
> process-pipe    8-groups     1.00 [ 0.00]( 7.09)            1.07 [ -7.00]( 4.25)
> threads-pipe    1-groups     1.00 [ 0.00](18.47)            1.11 [-11.00](24.21)
> threads-pipe    2-groups     1.00 [ 0.00]( 6.45)            0.97 [ +3.00]( 5.58)
> threads-pipe    4-groups     1.00 [ 0.00]( 5.63)            0.96 [ +2.00]( 5.90)
> threads-pipe    8-groups     1.00 [ 0.00]( 1.65)            1.03 [ -3.00]( 3.97)
> threads-sockets 1-groups     1.00 [ 0.00]( 2.00)            1.00 [  0.00]( 0.65)
> threads-sockets 2-groups     1.00 [ 0.00]( 1.69)            1.02 [ -2.00]( 1.48)
> threads-sockets 4-groups     1.00 [ 0.00]( 5.66)            1.01 [ -1.00]( 3.56)
> threads-sockets 8-groups     1.00 [ 0.00]( 0.26)            0.99 [ +1.00]( 0.36)
> 
> hackbench is not impacted.
> 
> 
> Daytrader throughput (higher is better)
> ========
> instances,users                baseline[pct imp](std%)        SIS_CACHE[pct imp](std%)
> 3,30                           1.00 [ 0.00]( 2.30)            1.02 [ +2.00]( 1.64)
> 3,60                           1.00 [ 0.00]( 0.55)            1.01 [ +1.00]( 1.41)
> 3,90                           1.00 [ 0.00]( 1.20)            1.02 [ +2.00]( 1.04)
> 3,120                          1.00 [ 0.00]( 0.84)            1.02 [ +2.00]( 1.02)
> 
> A real-life workload like Daytrader benefits slightly from this patch.
> 
> 
> Tested-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
>

Thanks!

Best,
Chenyu 
> Thanks and Regards
> Madadi Vineeth Reddy
