From: Alex Shi <alex.shi@intel.com>
To: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: peterz@infradead.org, mingo@kernel.org,
	preeti@linux.vnet.ibm.com, vincent.guittot@linaro.org,
	efault@gmx.de, pjt@google.com, linux-kernel@vger.kernel.org,
	linaro-kernel@lists.linaro.org, arjan@linux.intel.com,
	len.brown@intel.com, corbet@lwn.net, tglx@linutronix.de
Subject: Re: [RFC] Comparison of power-efficient scheduling patch sets
Date: Fri, 31 May 2013 09:17:08 +0800
Message-ID: <51A7FA14.70902@intel.com>
In-Reply-To: <20130530134718.GB32728@e103034-lin>

On 05/30/2013 09:47 PM, Morten Rasmussen wrote:
> Hi,
> 
> A number of patch sets related to power-efficient scheduling have been
> posted over the last couple of months. Most of them do not have much
> data to back them up, so I decided to do some testing.
> 
> Common for all of the patch sets that I have tested, except one, is that
> they attempt to pack tasks on as few cpus as possible to allow the
> remaining cpus to enter deeper sleep states - a strategy that should
> make sense on most platforms that support per-cpu power gating and
> multi-socket machines.
> 
> Kernel: 3.9
> 
> Patch sets:
> rlb-v4: sched: use runnable load based balance (Alex Shi)
>         <https://lkml.org/lkml/2013/4/27/13>

Thanks for the valuable comparison!

The target of the runnable load balance is performance. It still tries to
disperse tasks across as many CPUs as possible. :)
The latest version, v7, removes the 6th patch (the wake_affine change) from v4,
fixes a sleep time double counting issue, and removes
blocked_load_avg from the tg load.
http://comments.gmane.org/gmane.linux.kernel/1498988
Enjoy!
> pas-v7: sched: power aware scheduling (Alex Shi)
>         <https://lkml.org/lkml/2013/4/3/732>

We are still discussing this patch set internally before updating
it. Sorry for the late response on this patch set!

> pst-v3: sched: packing small tasks (Vincent Guittot)
>         <https://lkml.org/lkml/2013/3/22/183>
> pst-v4: sched: packing small tasks (Vincent Guittot)
>         <https://lkml.org/lkml/2013/4/25/396>
> 
> Configuration:
> pas-v7: Set to "powersaving" mode.
> pst-v4: Set to "Full" packing mode.
> 
> Platform:
> ARM TC2 (test-chip), 2xCortex-A15 + 3xCortex-A7. Cortex-A15s disabled.
> 
> Measurement technique:
> Time spent non-idle (not in idle state) for each cpu based on cpuidle
> ftrace events. TC2 does not have per-core power-gating, so packing
> inside the A7 cluster does not lead to any significant power savings.
> Note that any product grade hardware (TC2 is a test-chip) will very
> likely have per-core power-gating, so in those cases packing will have
> an appreciable effect on power savings.
> Measuring non-idle time rather than power should give a more clear idea
> about the effect of the patch sets given that the idle back-end is
> highly implementation specific.
> 
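
For anyone who wants to reproduce this kind of measurement, here is a
minimal sketch of how non-idle % per cpu could be derived from cpu_idle
trace events. It is not the script used for the numbers in this thread.
It assumes the text output format of the power:cpu_idle event, where
state=4294967295 marks an idle exit, and it assumes a cpu whose first
event is an idle exit was idle from the start of the window. The function
name and the window definition (first to last parsed event) are my own
choices.

```python
import re
from collections import defaultdict

# Matches the timestamp and the cpu_idle payload of an ftrace text line, e.g.
#   <idle>-0 [001] d..2 1234.567890: cpu_idle: state=1 cpu_id=1
EVENT_RE = re.compile(r"(\d+\.\d+): cpu_idle: state=(\d+) cpu_id=(\d+)")
IDLE_EXIT = 4294967295  # (u32)-1, logged when a cpu leaves idle


def non_idle_percent(lines):
    """Return {cpu: non-idle %} over the window spanned by the parsed events."""
    events = []
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            events.append((float(m.group(1)), int(m.group(3)), int(m.group(2))))
    if not events:
        return {}
    events.sort()
    start, end = events[0][0], events[-1][0]
    window = end - start

    idle_since = {}                  # cpu -> time it entered idle, if idle now
    idle_total = defaultdict(float)  # cpu -> accumulated idle seconds
    seen = set()
    for ts, cpu, state in events:
        if state == IDLE_EXIT:
            # Assumption: a cpu whose first event is an exit was idle since start.
            entered = idle_since.pop(cpu, None if cpu in seen else start)
            if entered is not None:
                idle_total[cpu] += ts - entered
        else:
            # Keep the first entry time if the cpu only changes idle state.
            idle_since.setdefault(cpu, ts)
        seen.add(cpu)

    # Cpus still idle at the end of the trace stay idle until the last event.
    for cpu, entered in idle_since.items():
        idle_total[cpu] += end - entered

    if window <= 0:
        return {cpu: 0.0 for cpu in sorted(seen)}
    return {cpu: 100.0 * (1.0 - idle_total[cpu] / window) for cpu in sorted(seen)}
```

Feeding it the lines of a trace file, e.g. non_idle_percent(open(path)),
gives one percentage per cpu that appears in the trace.
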
> Benchmarks:
> audio playback (Android): 30s mp3 file playback on Android.
> bbench+audio (Android): Web page rendering while doing mp3 playback.
> andebench_native (Android): Android benchmark running in native mode.
> cyclictest: Short periodic tasks.
> 
> Results:
> Two runs for each patch set.
> 
> audio playback (Android) SMP
> non-idle %  cpu 0  cpu 1  cpu 2
> 3.9_1       11.96   2.86   2.48
> 3.9_2       12.64   2.81   1.88
> rlb-v4_1    12.61   2.44   1.90
> rlb-v4_2    12.45   2.44   1.90
> pas-v7_1    16.17   0.03   0.24
> pas-v7_2    16.08   0.28   0.07
> pst-v3_1    15.18   2.76   1.70
> pst-v3_2    15.13   0.80   0.38
> pst-v4_1    16.14   0.05   0.00
> pst-v4_2    16.34   0.06   0.00
> 
> bbench+audio (Android) SMP
> non-idle %  cpu 0  cpu 1  cpu 2  render time
> 3.9_1       25.00  20.73  21.22   812
> 3.9_2       24.29  19.78  22.34   795
> rlb-v4_1    23.84  19.36  22.74   782
> rlb-v4_2    24.07  19.36  22.74   797
> pas-v7_1    28.29  17.86  16.01   869
> pas-v7_2    28.62  18.54  15.05   908
> pst-v3_1    29.14  20.59  21.72   830
> pst-v3_2    27.69  18.81  20.06   830
> pst-v4_1    42.20  13.63   2.29   880
> pst-v4_2    41.56  14.40   2.17   935
> 
> andebench_native (8 threads) (Android) SMP
> non-idle %  cpu 0  cpu 1  cpu 2  Score
> 3.9_1       99.22  98.88  99.61   4139
> 3.9_2       99.56  99.31  99.46   4148
> rlb-v4_1    99.49  99.61  99.53   4153
> rlb-v4_2    99.56  99.61  99.53   4149
> pas-v7_1    99.53  99.59  99.29   4149
> pas-v7_2    99.42  99.63  99.48   4150
> pst-v3_1    97.89  99.33  99.42   4097
> pst-v3_2    99.16  99.62  99.42   4097
> pst-v4_1    99.34  99.01  99.59   4146
> pst-v4_2    99.49  99.52  99.20   4146
> 
> cyclictest SMP
> non-idle %  cpu 0  cpu 1  cpu 2
> 3.9_1        9.13   8.88   8.41
> 3.9_2       10.27   8.02   6.30
> rlb-v4_1     8.88   8.09   8.11
> rlb-v4_2     8.49   8.09   8.11
> pas-v7_1    10.20   0.02  11.50
> pas-v7_2     7.86  14.31   0.02
> pst-v3_1    20.44   8.68   7.97
> pst-v3_2    20.41   0.78   1.00
> pst-v4_1    21.32   0.21   0.05
> pst-v4_2    21.56   0.21   0.04
> 
> Overall, pas-v7 seems to do a fairly good job at packing. The idle time
> distribution seems to be somewhere between pst-v3 and the more
> aggressive pst-v4 for all the benchmarks. pst-v4 manages to keep two
> cpus nearly idle (<0.25% non-idle) for both cyclictest and audio, which
> is better than both pst-v3 and pas-v7. pas-v7 fails to pack cyclictest.
> Packing does come at a cost which can be seen for bbench+audio, where
> pst-v3 and rlb-v4 get better render times than pas-v7 and pst-v4 which
> do more aggressive packing. rlb-v4 does not pack, it is only included
> for reference.
> 
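
To put a number on the packing cost mentioned above, the render times in
the bbench+audio table can be averaged over the two runs and compared
with plain 3.9. This is only a sketch of the arithmetic. The values are
copied from the table; the helper names are mine, and two runs per patch
set is too few to say anything about variance.

```python
# Render times copied from the quoted bbench+audio table (two runs each).
# Lower is better.
RENDER_TIMES = {
    "3.9": (812, 795),
    "rlb-v4": (782, 797),
    "pas-v7": (869, 908),
    "pst-v3": (830, 830),
    "pst-v4": (880, 935),
}


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def overhead_vs_baseline(times, baseline="3.9"):
    """Return {patch set: % change of mean render time relative to baseline}."""
    base = mean(times[baseline])
    return {name: 100.0 * (mean(runs) / base - 1.0) for name, runs in times.items()}


if __name__ == "__main__":
    for name, pct in overhead_vs_baseline(RENDER_TIMES).items():
        print(f"{name:8s} {pct:+6.1f}%")
```

With these inputs the two aggressive packers come out roughly 11% to 13%
slower than 3.9 on mean render time, pst-v3 about 3% slower, and rlb-v4
slightly faster.
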
> From a packing perspective pst-v4 seems to do the best job for the
> workloads that I have tested on ARM TC2. The less aggressive packing in
> pst-v3 may be a better choice in terms of performance.
> 
> I'm well aware that these tests are heavily focused on mobile workloads.
> I would therefore encourage people to share your test results for your
> workloads on your platforms to complete the picture. Comments are also
> welcome.
> 
> Thanks,
> Morten
> 
> 


-- 
Thanks
    Alex
