From: Catalin Marinas <catalin.marinas@arm.com>
To: Arjan van de Ven <arjan@linux.intel.com>
Cc: Morten Rasmussen <Morten.Rasmussen@arm.com>,
	Ingo Molnar <mingo@kernel.org>,
	"alex.shi@intel.com" <alex.shi@intel.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"preeti@linux.vnet.ibm.com" <preeti@linux.vnet.ibm.com>,
	"vincent.guittot@linaro.org" <vincent.guittot@linaro.org>,
	"efault@gmx.de" <efault@gmx.de>,
	"pjt@google.com" <pjt@google.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linaro-kernel@lists.linaro.org" <linaro-kernel@lists.linaro.org>,
	"len.brown@intel.com" <len.brown@intel.com>,
	"corbet@lwn.net" <corbet@lwn.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>
Subject: Re: power-efficient scheduling design
Date: Tue, 18 Jun 2013 20:06:25 +0100
Message-ID: <20130618190625.GA9065@MacBook-Pro.local>
In-Reply-To: <51C07ABC.2080704@linux.intel.com>

On Tue, Jun 18, 2013 at 04:20:28PM +0100, Arjan van de Ven wrote:
> On 6/14/2013 9:05 AM, Morten Rasmussen wrote:
> > Looking at the discussion it seems that people have slightly different
> > views, but most agree that the goal is an integrated scheduling,
> > frequency, and idle policy like you pointed out from the beginning.
> 
> ... except that such a solution does not really work for Intel hardware.

I think it can work (see below).

> The OS does not get to really pick the CPU "frequency" (never mind that
> frequency is not what gets controlled), the hardware picks the frequency.
> The OS can do some level of requests (best to think of this as a percentage
> more than frequency) but what you actually get is more often than not
> not what you asked for.

Morten's proposal does not try to "pick" a frequency. The P-state change
is still done gradually based on the load (so we still have an adaptive
loop). The load (total or per-task) can be tracked in an arch-specific
way (using aperf/mperf on x86).
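
As a rough illustration of the arch-specific tracking mentioned above,
the aperf/mperf ratio over a sampling window could be derived from two
counter snapshots along these lines. This is only a sketch: struct
perf_sample and aperf_mperf_pct() are hypothetical names, not existing
kernel interfaces, and the snapshots are assumed to have been read from
the APERF/MPERF MSRs elsewhere.

```c
#include <stdint.h>

/* Hypothetical snapshot of the two counters (not a kernel structure). */
struct perf_sample {
	uint64_t aperf;	/* cycles at the actual frequency */
	uint64_t mperf;	/* cycles at the fixed reference frequency */
};

/*
 * aperf/mperf between two snapshots, in percent. Returns 0 when mperf
 * did not advance, i.e. the CPU did not run during the window.
 * Unsigned subtraction keeps the deltas correct across counter wrap.
 */
static unsigned int aperf_mperf_pct(const struct perf_sample *prev,
				    const struct perf_sample *cur)
{
	uint64_t da = cur->aperf - prev->aperf;
	uint64_t dm = cur->mperf - prev->mperf;

	if (dm == 0)
		return 0;

	return (unsigned int)(da * 100 / dm);
}
```

A value of 100 or more would correspond to the "aperf/mperf >= 1" case
discussed further down.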

The difference from what intel_pstate.c does now is that the power
scheduler has a view of the total load (across all CPUs) and the
run-queue content. It can "guide" the load balancer into favouring one
or two CPUs and ignoring the rest (using cpu_power).

If several CPUs have a small aperf/mperf ratio, it can decide to use
fewer CPUs at a higher aperf/mperf by telling the load balancer not to
use the others (cpu_power = 1). All of this is continuously re-adjusted
to cope with changes in the load and hardware variations like turbo
boost.

Similarly, if a CPU has aperf/mperf >= 1, it keeps increasing the
P-state (depending on the policy). Once the CPU gets to the highest
level, depending on the number of threads in the run-queue (opening
more CPUs doesn't make sense for a single thread), it can open up other
CPUs and let the load balancer use them.
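
To make the loop described in the last two paragraphs more concrete,
here is a minimal user-space sketch of a single adjustment step. It is
not Morten's patch set: struct cpu_state, power_sched_step(), the
low_pct threshold and the CPU_POWER_* constants are all assumptions
made up for illustration, and a real policy would be more careful
about which CPU it opens or closes.

```c
#include <assert.h>
#include <stddef.h>

#define CPU_POWER_DEFAULT	1024	/* assumed nominal cpu_power */
#define CPU_POWER_AVOID		1	/* hint: load balancer should not use */

/* Hypothetical per-CPU view kept by the power scheduler. */
struct cpu_state {
	unsigned int ratio_pct;		/* aperf/mperf in percent */
	unsigned int pstate;		/* 0 = lowest performance level */
	unsigned int nr_running;	/* threads in the run-queue */
	unsigned int cpu_power;		/* hint given to the load balancer */
};

/* One adaptive step over all CPUs; meant to be called periodically. */
static void power_sched_step(struct cpu_state *cpus, size_t n,
			     unsigned int max_pstate, unsigned int low_pct)
{
	size_t i, lightly_loaded = 0, last_light = 0;
	int want_more = 0;

	assert(n > 0);

	for (i = 0; i < n; i++) {
		struct cpu_state *c = &cpus[i];

		if (c->cpu_power == CPU_POWER_AVOID)
			continue;		/* closed CPUs take no part */

		if (c->ratio_pct >= 100) {
			if (c->pstate < max_pstate)
				c->pstate++;	/* ramp up gradually */
			else if (c->nr_running > 1)
				want_more = 1;	/* saturated, work queued */
		} else if (c->ratio_pct < low_pct) {
			lightly_loaded++;
			last_light = i;
		}
	}

	if (want_more) {
		/* Open the first closed CPU, if there is one. */
		for (i = 0; i < n; i++) {
			if (cpus[i].cpu_power == CPU_POWER_AVOID) {
				cpus[i].cpu_power = CPU_POWER_DEFAULT;
				break;
			}
		}
	} else if (lightly_loaded >= 2) {
		/* Several CPUs nearly idle: steer work away from one. */
		cpus[last_light].cpu_power = CPU_POWER_AVOID;
	}
}
```

The point of the sketch is only that the P-state and the cpu_power
hints are adjusted in the same step, from the same view of the load.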

> You can look in hindsight what kind of performance you got (from some basic
> counters in MSRs), and the scheduler can use that to account backwards to what some process
> got. But to predict what you will get in the future...... that's near impossible
> on any realistic system nowadays (and even more so in the future).

We don't need absolute figures matching load to P-states but we'll
continue with an adaptive system. What we have now is also an adaptive
system but with independent decisions taken by the load balancer and the
P-state driver. The load balancer can even get confused by the cpufreq
decisions and move tasks around unnecessarily. With Morten's proposal we
get the power scheduler to adjust the P-state while giving hints to the
load balancer at the same time (it adjusts both; it doesn't try to
re-adjust itself after the load balancer).

> Treating "frequency" (well, "performance") and idle separately is also a false thing to do
> (yes I know in 3.9/3.10 we still do that for Intel hw, but we're working
> on fixing that). They are by no means separate things. One guy's idle state
> is the other guy's power budget (and thus performance)!

I agree.

-- 
Catalin
