From: Alex Shi <alex.shi@intel.com>
To: Alex Shi <alex.shi@intel.com>
Cc: mingo@redhat.com, peterz@infradead.org, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com, bp@alien8.de,
	pjt@google.com, namhyung@kernel.org, efault@gmx.de,
	vincent.guittot@linaro.org, gregkh@linuxfoundation.org,
	preeti@linux.vnet.ibm.com, viresh.kumar@linaro.org,
	linux-kernel@vger.kernel.org
Subject: Re: [patch v6 0/21] sched: power aware scheduling
Date: Thu, 04 Apr 2013 08:57:58 +0800
Message-ID: <515CD016.6050202@intel.com>
In-Reply-To: <1364654108-16307-1-git-send-email-alex.shi@intel.com>

On 03/30/2013 10:34 PM, Alex Shi wrote:
> This patch set implements and completes the rough power aware scheduling
> proposal: https://lkml.org/lkml/2012/8/13/139.

BTW, this task packing feature allows more CPU frequency boost, because part
of the cores stay idle. Since frequency boost is more power efficient, it
helps performance/watt a lot, as the 16/32-thread kbuild results show:

         powersaving              performance
> x = 2    189.416 /228 23          193.355 /209 24
> x = 4    215.728 /132 35          219.69 /122 37
> x = 8    244.31 /75 54            252.709 /68 58
> x = 16   299.915 /43 77           259.127 /58 66
> x = 32   341.221 /35 83           323.418 /38 81
>
> Data format, e.g. 189.416 /228 23:
> 	189.416: average Watts during compilation
> 	228: compile time in seconds
> 	23:  scaled performance/watt = 1000000 / seconds / watts
>
> 
> The code is also available in this git tree:
> https://github.com/alexshi/power-scheduling.git power-scheduling
> 
> The patch set defines a new policy, 'powersaving', that tries to pack tasks
> at each sched group level. This can save significant power when the number
> of tasks in the system is no more than the number of logical CPUs (LCPUs).
> 
> As mentioned in the power aware scheduling proposal, power aware
> scheduling has 2 assumptions:
> 1, race to idle is helpful for power saving
> 2, fewer active sched groups reduce CPU power consumption
> 
> The first assumption makes the performance policy take over scheduling
> when any group is busy.
> The second assumption makes power aware scheduling try to pack dispersed
> tasks into fewer groups.
> 
> Compared to the removed power balance, this power balance has the
> following advantages:
> 1, simpler sysfs interface
> 	only 2 sysfs interfaces VS 2 interfaces for each LCPU
> 2, covers all CPU topologies
> 	effective on all domain levels VS only works on SMT/MC domains
> 3, less task migration
> 	mutually exclusive perf/power LB VS balancing power on top of balanced performance
> 4, considers system load thrashing
> 	yes VS no
> 5, considers transitory tasks
> 	yes VS no
> 
> BTW, like sched NUMA, power aware scheduling is also a kind of
> CPU-locality-oriented scheduling.
> 
> Thanks for the comments/suggestions from PeterZ, Linus Torvalds, Andrew
> Morton, Ingo, Len Brown, Arjan, Borislav Petkov, PJT, Namhyung Kim, Mike
> Galbraith, Greg, Preeti, Morten Rasmussen, Rafael etc.
> 
> Since the patch can pack tasks into fewer groups perfectly, here is just
> some performance/power testing data:
> =========================================
> $ for ((i = 0; i < x; i++)); do while true; do :; done & done
> 
> On my SNB laptop with 4 cores * HT (data is average Watts):
>          powersaving     performance
> x = 8    72.9482         72.6702
> x = 4    61.2737         66.7649
> x = 2    44.8491         59.0679
> x = 1    43.225          43.0638
> 
> On an SNB EP machine with 2 sockets * 8 cores * HT (average Watts):
>          powersaving     performance
> x = 32   393.062         395.134
> x = 16   277.438         376.152
> x = 8    209.33          272.398
> x = 4    199             238.309
> x = 2    175.245         210.739
> x = 1    174.264         173.603
> 
> 
> Benchmark with a fluctuating number of tasks, 'make -j <x> vmlinux',
> on my SNB EP machine with 2 sockets * 8 cores * HT:
>          powersaving              performance
> x = 2    189.416 /228 23          193.355 /209 24
> x = 4    215.728 /132 35          219.69 /122 37
> x = 8    244.31 /75 54            252.709 /68 58
> x = 16   299.915 /43 77           259.127 /58 66
> x = 32   341.221 /35 83           323.418 /38 81
> 
> Data format, e.g. 189.416 /228 23:
> 	189.416: average Watts during compilation
> 	228: compile time in seconds
> 	23:  scaled performance/watt = 1000000 / seconds / watts
> kbuild performance is better with 16/32 threads because lazy power
> balance reduces context switches, and the CPU has more chance to boost
> under powersaving balance.
> 
> Some performance testing results:
> ---------------------------------
> 
> Tested benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
> hackbench, fileio-cfq of sysbench, dbench, aiostress, multithreaded
> loopback netperf, on my Core2, NHM, WSM and SNB platforms.
> 
> Results:
> A, no clear performance change found with the 'performance' policy.
> B, specjbb2005 drops 5~7% with the powersaving policy, with both openjdk
>    and jrockit.
> C, hackbench drops 40% with the powersaving policy on SNB 4-socket platforms.
> Others show no clear change.
> 
> ===
> Changelog:
> V6 change:
> a, remove 'balance' policy.
> b, consider RT task effect in balancing
> c, use avg_idle as burst wakeup indicator
> d, balance on task utilization in fork/exec/wakeup.
> e, no power balancing on SMT domain.
> 
> V5 change:
> a, change sched_policy to sched_balance_policy
> b, split fork/exec/wake power balancing into 3 patches and refresh
> commit logs
> c, other minor cleanups
> 
> V4 change:
> a, fix a few bugs and clean up code following comments from Morten
> Rasmussen, Mike Galbraith and Namhyung Kim. Thanks!
> b, take Morten Rasmussen's suggestion to use different criteria for
> different policy in transitory task packing.
> c, shorter latency in power aware scheduling.
> 
> V3 change:
> a, use nr_running and utilisation in periodic power balancing.
> b, try packing small exec/wake tasks on a running CPU rather than an idle CPU.
> 
> V2 change:
> a, add lazy power scheduling to deal with kbuild-like benchmarks.
> 
> 
> -- Thanks Alex
> [patch v6 01/21] Revert "sched: Introduce temporary FAIR_GROUP_SCHED
> [patch v6 02/21] sched: set initial value of runnable avg for new
> [patch v6 03/21] sched: only count runnable avg on cfs_rq's
> [patch v6 04/21] sched: add sched balance policies in kernel
> [patch v6 05/21] sched: add sysfs interface for sched_balance_policy
> [patch v6 06/21] sched: log the cpu utilization at rq
> [patch v6 07/21] sched: add new sg/sd_lb_stats fields for incoming
> [patch v6 08/21] sched: move sg/sd_lb_stats struct ahead
> [patch v6 09/21] sched: scale_rt_power rename and meaning change
> [patch v6 10/21] sched: get rq potential maximum utilization
> [patch v6 11/21] sched: detect wakeup burst with rq->avg_idle
> [patch v6 12/21] sched: add power aware scheduling in fork/exec/wake
> [patch v6 13/21] sched: using avg_idle to detect bursty wakeup
> [patch v6 14/21] sched: packing transitory tasks in wakeup power
> [patch v6 15/21] sched: add power/performance balance allow flag
> [patch v6 16/21] sched: pull all tasks from source group
> [patch v6 17/21] sched: no balance for prefer_sibling in power
> [patch v6 18/21] sched: add new members of sd_lb_stats
> [patch v6 19/21] sched: power aware load balance
> [patch v6 20/21] sched: lazy power balance
> [patch v6 21/21] sched: don't do power balance on share cpu power
> 


-- 
Thanks
    Alex
