linux-kernel.vger.kernel.org archive mirror
From: Arjan van de Ven <arjan@linux.intel.com>
To: Borislav Petkov <bp@alien8.de>, Alex Shi <alex.shi@intel.com>,
	Alex Shi <lkml.alex@gmail.com>,
	rob@landley.net, mingo@redhat.com, peterz@infradead.org,
	gregkh@linuxfoundation.org, andre.przywara@amd.com, rjw@sisk.pl,
	paul.gortmaker@windriver.com, akpm@linux-foundation.org,
	paulmck@linux.vnet.ibm.com, linux-kernel@vger.kernel.org,
	pjt@google.com, vincent.guittot@linaro.org,
	Preeti U Murthy <preeti@linux.vnet.ibm.com>
Subject: Re: [PATCH 0/18] sched: simplified fork, enable load average into LB and power awareness scheduling
Date: Tue, 11 Dec 2012 08:40:40 -0800	[thread overview]
Message-ID: <50C76208.107@linux.intel.com> (raw)
In-Reply-To: <20121211161320.GA28827@liondog.tnic>

On 12/11/2012 8:13 AM, Borislav Petkov wrote:
> On Tue, Dec 11, 2012 at 08:03:01AM -0800, Arjan van de Ven wrote:
>> On 12/11/2012 7:48 AM, Borislav Petkov wrote:
>>> On Tue, Dec 11, 2012 at 08:10:20PM +0800, Alex Shi wrote:
>>>> Another testing of parallel compress with pigz on Linus' git tree.
>>>> results show we get much better performance/power with powersaving and
>>>> balance policy:
>>>>
>>>> testing command:
>>>> #pigz -k -c  -p$x -r linux* &> /dev/null
>>>>
>>>> On a NHM EP box
>>>>           powersaving               balance   	         performance
>>>> x = 4    166.516 /88 68           170.515 /82 71         165.283 /103 58
>>>> x = 8    173.654 /61 94           177.693 /60 93         172.31 /76 76
>>>
>>> This looks funny: so "performance" is eating less watts than
>>> "powersaving" and "balance" on NHM. Could it be that the average watts
>>> measurements on NHM are not correct/precise..? On SNB they look as
>>> expected, according to your scheme.
>>
>> well... it's not always beneficial to group or to spread out
>> it depends on cache behavior mostly which is best
>
> Let me try to understand what this means: so "performance" above with
> 8 threads means that those threads are spread out across more than one
> socket, no?
>
> If so, this would mean that you have a smaller amount of tasks on each
> socket, thus the smaller wattage.
>
> The "powersaving" method OTOH fills up the one socket up to the brim,
> thus the slightly higher consumption due to all threads being occupied.
>
> Is that it?

not sure.

by and large, power efficiency is the same as performance efficiency, with some twists.
or, to reword that more clearly:
if you waste performance on something that runs inefficiently, you're wasting power as well.
now, you might have some hardware effects that can then save you power... but those effects
then first need to overcome the waste from the performance inefficiency... and that almost never happens.
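(to put rough numbers on it, all made up: energy is power times time. say the package draws about 100W either way; a job that takes 60s costs ~6000J, and the same job slowed 10% by some inefficiency takes 66s and costs ~6600J. whatever hardware effect you were counting on has to claw back more than those 600J just to break even.)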

for example, if you have two workloads that each barely fit inside the last level cache...
it's much more efficient to spread these over two sockets... where each gets its own full LLC
to use.
If you grouped them together, both would thrash the cache all the time and run inefficiently --> bad for power.
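as a toy userspace sketch of "spread" (not what the scheduler itself does; pin_to_socket() is a made-up helper and the cpu numbering, socket 0 = cpus 0-7, socket 1 = cpus 8-15, is just an assumption for a 2-socket box), you can hand each cache-hungry process its own socket with plain sched_setaffinity():

#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>

/* illustration only: give one cache-hungry process a whole socket (and
 * thus a whole LLC) to itself.  the cpu numbering is an assumption;
 * real topology lives under /sys/devices/system/cpu/cpuN/topology/ */
static int pin_to_socket(pid_t pid, int socket)
{
        cpu_set_t mask;
        int cpu, first = socket * 8;

        CPU_ZERO(&mask);
        for (cpu = first; cpu < first + 8; cpu++)
                CPU_SET(cpu, &mask);

        return sched_setaffinity(pid, sizeof(mask), &mask);
}

pin_to_socket(pid_a, 0) plus pin_to_socket(pid_b, 1) and neither workload can thrash the other's cache.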

now, on the other hand, if you have two threads of a process that share a bunch of data structures,
and you spread these over 2 sockets, you end up bouncing data between the two sockets a lot,
running inefficiently --> bad for power.
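the mirror image, "group", again only as a sketch with the same assumed cpu numbering and a made-up helper name: keep both threads on one socket so the shared lines stay in one LLC instead of bouncing over the interconnect.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* illustration only: confine two data-sharing threads to socket 0
 * (assumed to be cpus 0-7) so their shared state stays in one LLC */
static int group_on_socket0(pthread_t a, pthread_t b)
{
        cpu_set_t mask;
        int cpu;

        CPU_ZERO(&mask);
        for (cpu = 0; cpu < 8; cpu++)
                CPU_SET(cpu, &mask);

        if (pthread_setaffinity_np(a, sizeof(mask), &mask))
                return -1;
        return pthread_setaffinity_np(b, sizeof(mask), &mask);
}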


having said all this, if you have two tasks that don't have such cache effects, the most efficient way
of running things will be on 2 hyperthreading halves... it's very hard to beat the power efficiency of that.
But this assumes the tasks don't compete for resources much at the HT level, and achieve good scaling.
and this still has to compete with "race to halt", because if you're done quicker, you can put the memory
into self refresh quicker.
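(race to halt with made-up numbers: 80W for 60s is 4800J, while 60W for 90s is 5400J, and in the slower case the DRAM also spends 30 extra seconds powered up before it can drop into self refresh. so the run with the higher instantaneous power can still win on total energy.)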

none of this stuff is easy for humans or computer programs to determine ahead of time... or sometimes even afterwards.
heck, even for just performance it's really really hard already, never mind adding power.

my personal gut feeling is that we should just optimize this scheduler stuff for performance, and that
we're going to be doing quite well on power already if we achieve that.



Thread overview: 56+ messages
2012-12-10  8:22 [PATCH 0/18] sched: simplified fork, enable load average into LB and power awareness scheduling Alex Shi
2012-12-10  8:22 ` [PATCH 01/18] sched: select_task_rq_fair clean up Alex Shi
2012-12-11  4:23   ` Preeti U Murthy
2012-12-11  5:28     ` Alex Shi
2012-12-11  6:30       ` Preeti U Murthy
2012-12-11 11:53         ` Alex Shi
2012-12-12  5:26           ` Preeti U Murthy
2012-12-21  4:28         ` Namhyung Kim
2012-12-23 12:17           ` Alex Shi
2012-12-10  8:22 ` [PATCH 02/18] sched: fix find_idlest_group mess logical Alex Shi
2012-12-11  5:08   ` Preeti U Murthy
2012-12-11  5:29     ` Alex Shi
2012-12-11  5:50       ` Preeti U Murthy
2012-12-11 11:55         ` Alex Shi
2012-12-10  8:22 ` [PATCH 03/18] sched: don't need go to smaller sched domain Alex Shi
2012-12-10  8:22 ` [PATCH 04/18] sched: remove domain iterations in fork/exec/wake Alex Shi
2012-12-10  8:22 ` [PATCH 05/18] sched: load tracking bug fix Alex Shi
2012-12-10  8:22 ` [PATCH 06/18] sched: set initial load avg of new forked task as its load weight Alex Shi
2012-12-21  4:33   ` Namhyung Kim
2012-12-23 12:00     ` Alex Shi
2012-12-10  8:22 ` [PATCH 07/18] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task Alex Shi
2012-12-12  3:57   ` Preeti U Murthy
2012-12-12  5:52     ` Alex Shi
2012-12-13  8:45     ` Alex Shi
2012-12-21  4:35       ` Namhyung Kim
2012-12-23 11:42         ` Alex Shi
2012-12-10  8:22 ` [PATCH 08/18] sched: consider runnable load average in move_tasks Alex Shi
2012-12-12  4:41   ` Preeti U Murthy
2012-12-12  6:26     ` Alex Shi
2012-12-21  4:43       ` Namhyung Kim
2012-12-23 12:29         ` Alex Shi
2012-12-10  8:22 ` [PATCH 09/18] Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking" Alex Shi
2012-12-10  8:22 ` [PATCH 10/18] sched: add sched_policy in kernel Alex Shi
2012-12-10  8:22 ` [PATCH 11/18] sched: add sched_policy and it's sysfs interface Alex Shi
2012-12-10  8:22 ` [PATCH 12/18] sched: log the cpu utilization at rq Alex Shi
2012-12-10  8:22 ` [PATCH 13/18] sched: add power aware scheduling in fork/exec/wake Alex Shi
2012-12-10  8:22 ` [PATCH 14/18] sched: add power/performance balance allowed flag Alex Shi
2012-12-10  8:22 ` [PATCH 15/18] sched: don't care if the local group has capacity Alex Shi
2012-12-10  8:22 ` [PATCH 16/18] sched: pull all tasks from source group Alex Shi
2012-12-10  8:22 ` [PATCH 17/18] sched: power aware load balance, Alex Shi
2012-12-10  8:22 ` [PATCH 18/18] sched: lazy powersaving balance Alex Shi
2012-12-11  0:51 ` [PATCH 0/18] sched: simplified fork, enable load average into LB and power awareness scheduling Alex Shi
2012-12-11 12:10   ` Alex Shi
2012-12-11 15:48     ` Borislav Petkov
2012-12-11 16:03       ` Arjan van de Ven
2012-12-11 16:13         ` Borislav Petkov
2012-12-11 16:40           ` Arjan van de Ven [this message]
2012-12-12  9:52             ` Amit Kucheria
2012-12-12 13:55               ` Alex Shi
2012-12-12 14:21                 ` Vincent Guittot
2012-12-13  2:51                   ` Alex Shi
2012-12-12 14:41             ` Borislav Petkov
2012-12-13  3:07               ` Alex Shi
2012-12-13 11:35                 ` Borislav Petkov
2012-12-14  1:56                   ` Alex Shi
2012-12-12  1:14           ` Alex Shi
