From: Catalin Marinas <catalin.marinas@arm.com>
To: Arjan van de Ven <arjan@linux.intel.com>
Cc: David Lang <david@lang.hm>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
Preeti U Murthy <preeti@linux.vnet.ibm.com>,
"Rafael J. Wysocki" <rjw@rjwysocki.net>,
Ingo Molnar <mingo@kernel.org>,
Morten Rasmussen <Morten.Rasmussen@arm.com>,
"alex.shi@intel.com" <alex.shi@intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Vincent Guittot <vincent.guittot@linaro.org>,
Mike Galbraith <efault@gmx.de>, "pjt@google.com" <pjt@google.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
linaro-kernel <linaro-kernel@lists.linaro.org>,
"len.brown@intel.com" <len.brown@intel.com>,
"corbet@lwn.net" <corbet@lwn.net>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
Linux PM list <linux-pm@vger.kernel.org>
Subject: Re: power-efficient scheduling design
Date: Wed, 12 Jun 2013 18:04:47 +0100
Message-ID: <20130612170447.GF31646@arm.com>
In-Reply-To: <51B892C4.6090800@linux.intel.com>

On Wed, Jun 12, 2013 at 04:24:52PM +0100, Arjan van de Ven wrote:
> >>> This isn't in the fastpath, it's in the rebalancing logic.
> >>
> >> the reality is much more complex unfortunately.
> >> C and P states hang together tightly, and even C state on one core
> >> impacts other cores' performance, just like P state selection on one
> >> core impacts other cores.
> >>
> >> (at least for x86, we should really stop talking as if the OS picks
> >> the "frequency", that's just not the case anymore)
> >
> > I agree, the reality is very complex. But we should go back and analyse
> > what problem we are trying to solve, what each framework is trying to
> > address.
> >
> > When viewed separately from the scheduler, cpufreq and cpuidle governors
> > do the right thing. But they both base their action on the CPU load
> > (balance) decided by the scheduler and it's the latter that we are
> > trying to adjust (and we are still debating what the right approach is).
> >
> > Since such information seems too complex to be moved into the scheduler,
> > why don't we get cpufreq in charge of restricting the load balancing to
> > certain CPUs? It already tracks the load/idle time to (gradually) change
> > the P state. Depending on the governor/policy, it could decide that (for
>
> (btw in case you missed it, for Intel HW we no longer use cpufreq anymore)

Do you mean the intel_pstate.c code? Indeed, it doesn't use much of
cpufreq: just setpolicy, and it is on its own afterwards. Separating it
from the framework probably has real benefits for Intel processors, but
it would make a unified scheduler/cpufreq/cpuidle solution harder (just
a remark, I'm not saying it's good or bad; there are many opinions
against the unified solution, and ARM could do the same for
configurations like big.LITTLE).

But such a driver could still interact with the scheduler to control its
load balancing. At a quick look (I'm not familiar with this driver), it
tracks the per-CPU load and increases or decreases the P-state (similar
to a cpufreq governor). It could just as well track the total load and
(depending on the hardware configuration) put some CPUs into a
lower-performance P-state (or even a C-state) and tell the scheduler to
avoid them.

One way to control the load-balancing ratio is via something like
arch_scale_freq_power(). We could tweak the scheduler further so that
cpu_power == 0 means "do not schedule anything on this CPU".

So my proposal is to move the load-balancing hints (load ratio, CPUs to
avoid, etc.) out of the scheduler and into drivers like intel_pstate.c
or cpufreq governors. We could then focus on getting the best
performance out of the scheduler (e.g. quicker migration) without it
being concerned with power consumption.
> I do agree the scheduler needs to get integrated a bit better, in that it
> has some better knowledge, and to be honest, we likely need to switch from
> giving tasks credit for "time consumed" to giving them credit for something like
> "cycles consumed" or "instructions executed" or a mix thereof.
> So that a task that runs on a slower CPU (for either policy choice reasons or
> due to hardware capabilities), it gets charged less than when it runs fast.

I agree, this would be useful in optimising the scheduler so that it
makes the right task placement/migration decisions (but, as I said
above, keeping the power aspect transparent to the scheduler).
--
Catalin