public inbox for linux-kernel@vger.kernel.org
From: Peter Zijlstra <peterz@infradead.org>
To: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: mingo@kernel.org, rjw@rjwysocki.net, markgross@thegnar.org,
	vincent.guittot@linaro.org, catalin.marinas@arm.com,
	linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [5/11] issue 5: Frequency and uarch invariant task load
Date: Wed, 8 Jan 2014 13:31:18 +0100
Message-ID: <20140108123118.GS30183@twins.programming.kicks-ass.net>
In-Reply-To: <1389111587-5923-6-git-send-email-morten.rasmussen@arm.com>

On Tue, Jan 07, 2014 at 04:19:41PM +0000, Morten Rasmussen wrote:
> Potential solution: Frequency invariance has been proposed before [1]
> where the task load is scaled by the cur/max freq ratio. Another
> possibility is to use hardware counters if such are available on the
> platform.
> 
> [1] https://lkml.org/lkml/2013/4/16/289

Right, I just had a look at those patches.. they're not horrible but I
think they're missing a few opportunities.

My main objection to them is that I think the newly introduced
max_capacity is exactly what the current cpu_power thing is -- then
again, I still haven't let the entire thing sink in well enough.

Not to mention we need to fix some of the cpu_power abuse -- like the
correlation to capacity, which, as stated in previous emails, should be
sorted using utilization.

So DVFS certainly makes sense, and would indeed be required in order to
make sensible decisions in the face of P states. It is needed even in
the face of funny hardware like Intel's, which pretty much ignores
whatever you tell it and does its own merry thing.


A few random thoughts:

 - I think for SMP-nice we want to migrate from /max_capacity to
   /curr_capacity; because SMP-nice cares about 100% utilization
   regardless of the actual P state. If we're somehow forced into a
   lower P state (thermal or otherwise) fairness is best served by
   normalizing at the rate we're actually running at, not the potential
   maximal.

 - We need to re-think SMT and turbo-bins in general; I think we can
   treat those two as effectively the same thing. This does mean Intel
   chips will have a dual layer of this goo, and we can currently barely
   deal with the one SMT layer, let alone do something sensible with two.

   To clarify, a single SMT thread will generally go 'faster' on its own
   since it doesn't need to compete with the other thread(s) for core
   resources, but together they might better utilize the core resources
   giving an over-all throughput win.

   Similarly for turbo bins, a single core can go faster on its own since
   it doesn't have competition for energy and thermal constraints, but
   together cores can probably achieve greater throughput.

   So we need a better way to describe this capacity dependency and
   variability.

   I'm fairly sure ARM doesn't do SMT, but they certainly suffer from
   thermal caps and can thus have effective turbo bins, even though
   they're not explicit and magic like with Intel.

   And of course the honorary mention goes to Power7 which has
   asymmetric bins -- let's hope they fix it and nobody else thinks them
   a great idea.

 - For hardware without P state controls, or hardware that pretty much
   ignores them, we need means of obtaining the max and curr capacity.

   Intel has the APERF and MPERF registers, which count at the actual
   frequency and at a fixed frequency, respectively. Using them is a bit
   tricky since APERF doesn't count when idle, but once the idle time is
   filtered out they do provide a current performance ratio.

   From that we could obtain a max performance ratio by using a wide
   window max on the current value or somesuch.

   Again, SMT and turbo-bins will complicate matters..

   Other CPUs that have magic P state control might not provide such
   registers, which would mean using PMU resources instead, and that
   would completely blow :/




Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140108123118.GS30183@twins.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=catalin.marinas@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=markgross@thegnar.org \
    --cc=mingo@kernel.org \
    --cc=morten.rasmussen@arm.com \
    --cc=rjw@rjwysocki.net \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox