Re: [RFC][PATCH v5 00/14] sched: packing tasks

linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [RFC][PATCH v5 00/14] sched: packing tasks
       [not found] <1382097147-30088-1-git-send-email-vincent.guittot@linaro.org>
@ 2013-11-11 11:33 ` Catalin Marinas
  2013-11-11 16:36   ` Peter Zijlstra
                     ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Catalin Marinas @ 2013-11-11 11:33 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar,
	Paul Turner, Morten Rasmussen, Chris Metcalf, Tony Luck,
	alex.shi@intel.com, Preeti U Murthy, linaro-kernel,
	len.brown@intel.com, l.majewski, Jonathan Corbet,
	Rafael J. Wysocki, Paul McKenney, Arjan van de Ven, linux-pm

Hi Vincent,

(cross-posting to linux-pm as it was agreed to follow up on this list)

On 18 October 2013 12:52, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> This is the 5th version of the previously named "packing small tasks" patchset.
> "small" has been removed because the patchset doesn't only target small tasks
> anymore.
>
> This patchset takes advantage of the new per-task load tracking that is
> available in the scheduler to pack the tasks in a minimum number of
> CPU/Cluster/Core. The packing mechanism takes into account the power gating
> topology of the CPUs to minimize the number of power domains that need to be
> powered on simultaneously.

As a general comment, it's not clear how this set of patches address
the bigger problem of energy aware scheduling, mainly because we
haven't yet defined _what_ we want from the scheduler, what the
scenarios are, constraints, are we prepared to give up some
performance (speed, latency) for power, how much.

This packing heuristics may work for certain SoCs and workloads but,
for example, there are modern ARM SoCs where the P-state has a much
bigger effect on power and it's more energy-efficient to keep two CPUs
in lower P-state than packing all tasks onto one, even though they may
be gated independently. In such cases _small_ task packing (for some
definition of 'small') would be more useful than general packing but
even this is just heuristics that saves power for particular workloads
without fully defining/addressing the problem.

I would rather start by defining the main goal and working backwards
to an algorithm. We may as well find that task packing based on this
patch set is sufficient but we may also get packing-like behaviour as
a side effect of a broader approach (better energy cost awareness). An
important aspect even in the mobile space is keeping the performance
as close as possible to the standard scheduler while saving a bit more
power. Just trying to reduce the number of non-idle CPUs may not meet
this requirement.

So, IMO, defining the power topology is a good starting point and I
think it's better to separate the patches from the energy saving
algorithms like packing. We need to agree on what information we have
(C-state details, coupling, power gating) and what we can/need to
expose to the scheduler. This can be revisited once we start
implementing/refining the energy awareness.

2nd step is how the _current_ scheduler could use such information
while keeping the current overall system behaviour (how much of
cpuidle we should move into the scheduler).

Question for Peter/Ingo: do you want the scheduler to decide on which
C-state a CPU should be in or we still leave this to a cpuidle
layer/driver?

My understanding from the recent discussions is that the scheduler
should decide directly on the C-state (or rather the deepest C-state
possible since we don't want to duplicate the backend logic for
synchronising CPUs going up or down). This means that the scheduler
needs to know about C-state target residency, wake-up latency (I think
we can leave coupled C-states to the backend, there is some complex
synchronisation which I wouldn't duplicate).

Alternatively (my preferred approach), we get the scheduler to predict
and pass the expected residency and latency requirements down to a
power driver and read back the actual C-states for making task
placement decisions. Some of the menu governor prediction logic could
be turned into a library and used by the scheduler. Basically what
this tries to achieve is better scheduler awareness of the current
C-states decided by a cpuidle/power driver based on the scheduler
constraints.

3rd step is optimising the scheduler for energy saving, taking into
account the information added by the previous steps and possibly
adding some more. This stage however has several sub-steps (that can
be worked on in parallel to the steps above):

a) Define use-cases, typical workloads, acceptance criteria
(performance, latency requirements).

b) Set of benchmarks simulating the scenarios above. I wouldn't bother
with linsched since a power model is never realistic enough. It's
better to run those benchmarks on real hardware and either estimate
the energy based on the C/P states or, depending on SoC, read some
sensors, energy probes. If the scheduler maintainers want to reproduce
the numbers, I'm pretty sure we can ship some boards.

c) Start defining/implementing scheduler algorithm to do optimal task placement.

d) Assess the implementation against benchmarks at (b) *and* other
typical performance benchmarks (whether it's for servers, mobile,
Android etc). At this point we'll most likely go back and refine the
previous steps.

So far we've jumped directly to (c) because we had some scenarios in
mind that needed optimising but those haven't been written down and we
don't have a clear way to assess the impact. There is more here than
simply maximising the idle time. Ideally the scheduler should have an
estimate of the overall energy cost, the cost per task, run-queue, the
energy implications of moving the tasks to another run-queue, possibly
taking the P-state into account (but not 'picking' a P-state).

Anyway, I think we need to address the first steps and think about the
algorithm once we have the bigger picture of what we try to solve.

Thanks.

-- 
Catalin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH v5 00/14] sched: packing tasks
  2013-11-11 11:33 ` [RFC][PATCH v5 00/14] sched: packing tasks Catalin Marinas
@ 2013-11-11 16:36   ` Peter Zijlstra
  2013-11-11 16:39     ` Arjan van de Ven
                       ` (2 more replies)
  2013-11-11 16:38   ` Peter Zijlstra
                     ` (2 subsequent siblings)
  3 siblings, 3 replies; 22+ messages in thread
From: Peter Zijlstra @ 2013-11-11 16:36 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Vincent Guittot, Linux Kernel Mailing List, Ingo Molnar,
	Paul Turner, Morten Rasmussen, Chris Metcalf, Tony Luck,
	alex.shi@intel.com, Preeti U Murthy, linaro-kernel,
	len.brown@intel.com, l.majewski, Jonathan Corbet,
	Rafael J. Wysocki, Paul McKenney, Arjan van de Ven, linux-pm

On Mon, Nov 11, 2013 at 11:33:45AM +0000, Catalin Marinas wrote:

tl;dr :-) Still trying to wrap my head around how to do that weird
topology Vincent raised..

> Question for Peter/Ingo: do you want the scheduler to decide on which
> C-state a CPU should be in or we still leave this to a cpuidle
> layer/driver?

I think the can leave most of that in a driver; right along with how to
prod the hardware to actually get into that state.

I think the most important parts are what is now 'generic' code; stuff
that guestimates the idle-time and so forth.

I think the scheduler simply wants to say: we expect to go idle for X
ns, we want a guaranteed wakeup latency of Y ns -- go do your thing.

I think you also raised the point in that we do want some feedback as to
the cost of waking up particular cores to better make decisions on which
to wake. That is indeed so.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH v5 00/14] sched: packing tasks
  2013-11-11 11:33 ` [RFC][PATCH v5 00/14] sched: packing tasks Catalin Marinas
  2013-11-11 16:36   ` Peter Zijlstra
@ 2013-11-11 16:38   ` Peter Zijlstra
  2013-11-11 16:40     ` Arjan van de Ven
  2013-11-12 10:36     ` Vincent Guittot
  2013-11-11 16:54   ` Morten Rasmussen
  2013-11-12 12:35   ` Vincent Guittot
  3 siblings, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2013-11-11 16:38 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Vincent Guittot, Linux Kernel Mailing List, Ingo Molnar,
	Paul Turner, Morten Rasmussen, Chris Metcalf, Tony Luck,
	alex.shi@intel.com, Preeti U Murthy, linaro-kernel,
	len.brown@intel.com, l.majewski, Jonathan Corbet,
	Rafael J. Wysocki, Paul McKenney, Arjan van de Ven, linux-pm

On Mon, Nov 11, 2013 at 11:33:45AM +0000, Catalin Marinas wrote:
> My understanding from the recent discussions is that the scheduler
> should decide directly on the C-state (or rather the deepest C-state
> possible since we don't want to duplicate the backend logic for
> synchronising CPUs going up or down). This means that the scheduler
> needs to know about C-state target residency, wake-up latency (I think
> we can leave coupled C-states to the backend, there is some complex
> synchronisation which I wouldn't duplicate).
> 
> Alternatively (my preferred approach), we get the scheduler to predict
> and pass the expected residency and latency requirements down to a
> power driver and read back the actual C-states for making task
> placement decisions. Some of the menu governor prediction logic could
> be turned into a library and used by the scheduler. Basically what
> this tries to achieve is better scheduler awareness of the current
> C-states decided by a cpuidle/power driver based on the scheduler
> constraints.

Ah yes.. so I _think_ the scheduler wants to eventually know about idle
topology constraints. But we can get there in a gradual fashion I hope.

Like the package C states on x86 -- for those to be effective the
scheduler needs to pack tasks and keep entire packages idle for as long
as possible.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH v5 00/14] sched: packing tasks
  2013-11-11 16:36   ` Peter Zijlstra
@ 2013-11-11 16:39     ` Arjan van de Ven
  2013-11-11 18:18       ` Catalin Marinas
  2013-11-12 17:40     ` Catalin Marinas
  2013-11-25 18:55     ` Daniel Lezcano
  2 siblings, 1 reply; 22+ messages in thread
From: Arjan van de Ven @ 2013-11-11 16:39 UTC (permalink / raw)
  To: Peter Zijlstra, Catalin Marinas
  Cc: Vincent Guittot, Linux Kernel Mailing List, Ingo Molnar,
	Paul Turner, Morten Rasmussen, Chris Metcalf, Tony Luck,
	alex.shi@intel.com, Preeti U Murthy, linaro-kernel,
	len.brown@intel.com, l.majewski, Jonathan Corbet,
	Rafael J. Wysocki, Paul McKenney, linux-pm


> I think the scheduler simply wants to say: we expect to go idle for X
> ns, we want a guaranteed wakeup latency of Y ns -- go do your thing.

as long as Y normally is "large" or "infinity" that is ok ;-)
(a smaller Y will increase power consumption and decrease system performance)



> I think you also raised the point in that we do want some feedback as to
> the cost of waking up particular cores to better make decisions on which
> to wake. That is indeed so.

having a hardware driver give a prefered CPU ordering for wakes can indeed be useful.
(I'm doubtful that changing the recommendation for each idle is going to pay off,
but proof is in the pudding; there are certainly long term effects where this can help)

>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH v5 00/14] sched: packing tasks
  2013-11-11 16:38   ` Peter Zijlstra
@ 2013-11-11 16:40     ` Arjan van de Ven
  2013-11-12 10:36     ` Vincent Guittot
  1 sibling, 0 replies; 22+ messages in thread
From: Arjan van de Ven @ 2013-11-11 16:40 UTC (permalink / raw)
  To: Peter Zijlstra, Catalin Marinas
  Cc: Vincent Guittot, Linux Kernel Mailing List, Ingo Molnar,
	Paul Turner, Morten Rasmussen, Chris Metcalf, Tony Luck,
	alex.shi@intel.com, Preeti U Murthy, linaro-kernel,
	len.brown@intel.com, l.majewski, Jonathan Corbet,
	Rafael J. Wysocki, Paul McKenney, linux-pm

On 11/11/2013 8:38 AM, Peter Zijlstra wrote:
> Like the package C states on x86 -- for those to be effective the
> scheduler needs to pack tasks and keep entire packages idle for as long
> as possible.

"package" C states on x86 are not really per package... but system wide.
the name is very confusing.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH v5 00/14] sched: packing tasks
  2013-11-11 11:33 ` [RFC][PATCH v5 00/14] sched: packing tasks Catalin Marinas
  2013-11-11 16:36   ` Peter Zijlstra
  2013-11-11 16:38   ` Peter Zijlstra
@ 2013-11-11 16:54   ` Morten Rasmussen
  2013-11-11 18:31     ` Catalin Marinas
  2013-11-12 12:35   ` Vincent Guittot
  3 siblings, 1 reply; 22+ messages in thread
From: Morten Rasmussen @ 2013-11-11 16:54 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Vincent Guittot, Linux Kernel Mailing List, Peter Zijlstra,
	Ingo Molnar, Paul Turner, Chris Metcalf, Tony Luck,
	alex.shi@intel.com, Preeti U Murthy, linaro-kernel,
	len.brown@intel.com, l.majewski@samsung.com, Jonathan Corbet,
	Rafael J. Wysocki, Paul McKenney, Arjan van de Ven,
	linux-pm@vger.kernel.org

On Mon, Nov 11, 2013 at 11:33:45AM +0000, Catalin Marinas wrote:
> Hi Vincent,
> 
> (cross-posting to linux-pm as it was agreed to follow up on this list)
> 
> On 18 October 2013 12:52, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> > This is the 5th version of the previously named "packing small tasks" patchset.
> > "small" has been removed because the patchset doesn't only target small tasks
> > anymore.
> >
> > This patchset takes advantage of the new per-task load tracking that is
> > available in the scheduler to pack the tasks in a minimum number of
> > CPU/Cluster/Core. The packing mechanism takes into account the power gating
> > topology of the CPUs to minimize the number of power domains that need to be
> > powered on simultaneously.
> 
> As a general comment, it's not clear how this set of patches address
> the bigger problem of energy aware scheduling, mainly because we
> haven't yet defined _what_ we want from the scheduler, what the
> scenarios are, constraints, are we prepared to give up some
> performance (speed, latency) for power, how much.
> 
> This packing heuristics may work for certain SoCs and workloads but,
> for example, there are modern ARM SoCs where the P-state has a much
> bigger effect on power and it's more energy-efficient to keep two CPUs
> in lower P-state than packing all tasks onto one, even though they may
> be gated independently. In such cases _small_ task packing (for some
> definition of 'small') would be more useful than general packing but
> even this is just heuristics that saves power for particular workloads
> without fully defining/addressing the problem.

When it comes to packing, I think the important things to figure out is
when to do it and how much. Those questions can only be answered when
the performance/energy trade-offs are known for the particular platform.
Packing seems to be a good idea for very small tasks, but I'm not so
sure about medium and big tasks. Packing the latter could lead to worse
performance (latency).

> 
> I would rather start by defining the main goal and working backwards
> to an algorithm. We may as well find that task packing based on this
> patch set is sufficient but we may also get packing-like behaviour as
> a side effect of a broader approach (better energy cost awareness). An
> important aspect even in the mobile space is keeping the performance
> as close as possible to the standard scheduler while saving a bit more

With the exception of big.LITTLE where we want to out-perform the
standard scheduler while saving power.

> power. Just trying to reduce the number of non-idle CPUs may not meet
> this requirement.
> 
> 
> So, IMO, defining the power topology is a good starting point and I
> think it's better to separate the patches from the energy saving
> algorithms like packing. We need to agree on what information we have
> (C-state details, coupling, power gating) and what we can/need to
> expose to the scheduler. This can be revisited once we start
> implementing/refining the energy awareness.
> 
> 2nd step is how the _current_ scheduler could use such information
> while keeping the current overall system behaviour (how much of
> cpuidle we should move into the scheduler).
> 
> Question for Peter/Ingo: do you want the scheduler to decide on which
> C-state a CPU should be in or we still leave this to a cpuidle
> layer/driver?
> 
> My understanding from the recent discussions is that the scheduler
> should decide directly on the C-state (or rather the deepest C-state
> possible since we don't want to duplicate the backend logic for
> synchronising CPUs going up or down). This means that the scheduler
> needs to know about C-state target residency, wake-up latency (I think
> we can leave coupled C-states to the backend, there is some complex
> synchronisation which I wouldn't duplicate).

It would be nice and simple to hide the complexity of the coupled
C-states, but we would loose the ability to prefer waking up cpus in a
cluster/package that already has non-idle cpus over cpus in a
cluster/package that has entered the coupled C-state. If we just know
the requested C-state of a cpu we can't tell the difference as it is
now.

> 
> Alternatively (my preferred approach), we get the scheduler to predict
> and pass the expected residency and latency requirements down to a
> power driver and read back the actual C-states for making task
> placement decisions. Some of the menu governor prediction logic could
> be turned into a library and used by the scheduler. Basically what
> this tries to achieve is better scheduler awareness of the current
> C-states decided by a cpuidle/power driver based on the scheduler
> constraints.

It might be easier to deal with the couple C-states using this approach.

> 
> 3rd step is optimising the scheduler for energy saving, taking into
> account the information added by the previous steps and possibly
> adding some more. This stage however has several sub-steps (that can
> be worked on in parallel to the steps above):
> 
> a) Define use-cases, typical workloads, acceptance criteria
> (performance, latency requirements).
> 
> b) Set of benchmarks simulating the scenarios above. I wouldn't bother
> with linsched since a power model is never realistic enough. It's
> better to run those benchmarks on real hardware and either estimate
> the energy based on the C/P states or, depending on SoC, read some
> sensors, energy probes. If the scheduler maintainers want to reproduce
> the numbers, I'm pretty sure we can ship some boards.
> 
> c) Start defining/implementing scheduler algorithm to do optimal task placement.
> 
> d) Assess the implementation against benchmarks at (b) *and* other
> typical performance benchmarks (whether it's for servers, mobile,
> Android etc). At this point we'll most likely go back and refine the
> previous steps.
> 
> So far we've jumped directly to (c) because we had some scenarios in
> mind that needed optimising but those haven't been written down and we
> don't have a clear way to assess the impact. There is more here than
> simply maximising the idle time. Ideally the scheduler should have an
> estimate of the overall energy cost, the cost per task, run-queue, the
> energy implications of moving the tasks to another run-queue, possibly
> taking the P-state into account (but not 'picking' a P-state).

The energy cost depends strongly on the P-state. I'm not sure if we can
avoid using at least a rough estimate of the P-state or a similar
metric in the energy cost estimation.

> 
> Anyway, I think we need to address the first steps and think about the
> algorithm once we have the bigger picture of what we try to solve.

I agree that we need to have the bigger picture in mind from the
beginning to avoid introducing changes that we later change again or
revert.

Morten

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH v5 00/14] sched: packing tasks
  2013-11-11 16:39     ` Arjan van de Ven
@ 2013-11-11 18:18       ` Catalin Marinas
  2013-11-11 18:20         ` Arjan van de Ven
                           ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Catalin Marinas @ 2013-11-11 18:18 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Peter Zijlstra, Vincent Guittot, Linux Kernel Mailing List,
	Ingo Molnar, Paul Turner, Morten Rasmussen, Chris Metcalf,
	Tony Luck, alex.shi@intel.com, Preeti U Murthy, linaro-kernel,
	len.brown@intel.com, l.majewski@samsung.com, Jonathan Corbet,
	Rafael J. Wysocki, Paul McKenney, linux-pm@vger.kernel.org

On Mon, Nov 11, 2013 at 04:39:45PM +0000, Arjan van de Ven wrote:
> > I think the scheduler simply wants to say: we expect to go idle for X
> > ns, we want a guaranteed wakeup latency of Y ns -- go do your thing.
> 
> as long as Y normally is "large" or "infinity" that is ok ;-)
> (a smaller Y will increase power consumption and decrease system performance)

Cpuidle already takes a latency into account via pm_qos. The scheduler
could pass this information down to the hardware driver or the cpuidle
driver could use pm_qos directly (as it's currently done in governors).

The scheduler may have its own requirements in terms of latency (e.g.
some real-time thread) and we could extend the pm_qos API with
per-thread information. But so far we don't have a way to pass such
per-thread requirements from user space (unless we assume that any
real-time thread has some fixed latency requirements). I suggest we
ignore this per-thread part until we find an actual need.

> > I think you also raised the point in that we do want some feedback as to
> > the cost of waking up particular cores to better make decisions on which
> > to wake. That is indeed so.
> 
> having a hardware driver give a prefered CPU ordering for wakes can indeed be useful.
> (I'm doubtful that changing the recommendation for each idle is going to pay off,
> but proof is in the pudding; there are certainly long term effects where this can help)

The ordering is based on the actual C-state, so a simple way is to wake
up the CPU in the shallowest C-state. With asymmetric configurations
(big.LITTLE) we have different costs for the same C-state, so this would
come in handy.

Even for symmetric configuration, the cost of moving a task to a CPU
includes wake-up cost plus the run-time cost which depends on the
P-state after wake-up (that's much trickier since we can't easily
estimate the cost of a P-state and it may change once you place a task
on it).

-- 
Catalin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH v5 00/14] sched: packing tasks
  2013-11-11 18:18       ` Catalin Marinas
@ 2013-11-11 18:20         ` Arjan van de Ven
  2013-11-12 12:06         ` Morten Rasmussen
  2013-11-12 16:48         ` Arjan van de Ven
  2 siblings, 0 replies; 22+ messages in thread
From: Arjan van de Ven @ 2013-11-11 18:20 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Peter Zijlstra, Vincent Guittot, Linux Kernel Mailing List,
	Ingo Molnar, Paul Turner, Morten Rasmussen, Chris Metcalf,
	Tony Luck, alex.shi@intel.com, Preeti U Murthy, linaro-kernel,
	len.brown@intel.com, l.majewski@samsung.com, Jonathan Corbet,
	Rafael J. Wysocki, Paul McKenney, linux-pm@vger.kernel.org

On 11/11/2013 10:18 AM, Catalin Marinas wrote:

>
> Even for symmetric configuration, the cost of moving a task to a CPU
> includes wake-up cost plus the run-time cost which depends on the
> P-state after wake-up (that's much trickier since we can't easily
> estimate the cost of a P-state and it may change once you place a task
> on it).

yup including cache refill times (assuming you picked C states
that flushed the cache, which will be the common case... but even
if not, since you're moving at task the likelyhood of cache coldness is high)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH v5 00/14] sched: packing tasks
  2013-11-11 16:54   ` Morten Rasmussen
@ 2013-11-11 18:31     ` Catalin Marinas
  2013-11-11 19:26       ` Arjan van de Ven
  0 siblings, 1 reply; 22+ messages in thread
From: Catalin Marinas @ 2013-11-11 18:31 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: Vincent Guittot, Linux Kernel Mailing List, Peter Zijlstra,
	Ingo Molnar, Paul Turner, Chris Metcalf, Tony Luck,
	alex.shi@intel.com, Preeti U Murthy, linaro-kernel,
	len.brown@intel.com, l.majewski@samsung.com, Jonathan Corbet,
	Rafael J. Wysocki, Paul McKenney, Arjan van de Ven,
	linux-pm@vger.kernel.org

On Mon, Nov 11, 2013 at 04:54:54PM +0000, Morten Rasmussen wrote:
> On Mon, Nov 11, 2013 at 11:33:45AM +0000, Catalin Marinas wrote:
> > I would rather start by defining the main goal and working backwards
> > to an algorithm. We may as well find that task packing based on this
> > patch set is sufficient but we may also get packing-like behaviour as
> > a side effect of a broader approach (better energy cost awareness). An
> > important aspect even in the mobile space is keeping the performance
> > as close as possible to the standard scheduler while saving a bit more
> 
> With the exception of big.LITTLE where we want to out-perform the
> standard scheduler while saving power.

Good point. Maybe we should start with a separate set of patches for
improving the performance on asymmetric configurations like big.LITTLE
while ignoring (deferring) the power aspect. Things like placing bigger
threads on bigger CPUs and so on (you know better what's needed here ;).

> > My understanding from the recent discussions is that the scheduler
> > should decide directly on the C-state (or rather the deepest C-state
> > possible since we don't want to duplicate the backend logic for
> > synchronising CPUs going up or down). This means that the scheduler
> > needs to know about C-state target residency, wake-up latency (I think
> > we can leave coupled C-states to the backend, there is some complex
> > synchronisation which I wouldn't duplicate).
> 
> It would be nice and simple to hide the complexity of the coupled
> C-states, but we would loose the ability to prefer waking up cpus in a
> cluster/package that already has non-idle cpus over cpus in a
> cluster/package that has entered the coupled C-state. If we just know
> the requested C-state of a cpu we can't tell the difference as it is
> now.

I agree, we can't rely on the requested C-state but the _actual_ state
and this means querying the hardware driver. Can we abstract this via
some interface which provides the cost of waking up a CPU? This could
take into account the state of the other CPUs in the cluster and the
scheduler is simply concerned with the wake-up costs.

> > Alternatively (my preferred approach), we get the scheduler to predict
> > and pass the expected residency and latency requirements down to a
> > power driver and read back the actual C-states for making task
> > placement decisions. Some of the menu governor prediction logic could
> > be turned into a library and used by the scheduler. Basically what
> > this tries to achieve is better scheduler awareness of the current
> > C-states decided by a cpuidle/power driver based on the scheduler
> > constraints.
> 
> It might be easier to deal with the couple C-states using this approach.

We already have drivers taking care of the couple C-states, so it means
passing the information back to the scheduler in some way (actual
C-state or wake-up cost).

It would be nice if we can describe the wake-up costs statically while
considering coupled C-states but it needs more thinking.

-- 
Catalin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH v5 00/14] sched: packing tasks
  2013-11-11 18:31     ` Catalin Marinas
@ 2013-11-11 19:26       ` Arjan van de Ven
  2013-11-11 22:43         ` Nicolas Pitre
  2013-11-11 23:43         ` Catalin Marinas
  0 siblings, 2 replies; 22+ messages in thread
From: Arjan van de Ven @ 2013-11-11 19:26 UTC (permalink / raw)
  To: Catalin Marinas, Morten Rasmussen
  Cc: Vincent Guittot, Linux Kernel Mailing List, Peter Zijlstra,
	Ingo Molnar, Paul Turner, Chris Metcalf, Tony Luck,
	alex.shi@intel.com, Preeti U Murthy, linaro-kernel,
	len.brown@intel.com, l.majewski@samsung.com, Jonathan Corbet,
	Rafael J. Wysocki, Paul McKenney, linux-pm@vger.kernel.org

On 11/11/2013 10:31 AM, Catalin Marinas wrote:
> I agree, we can't rely on the requested C-state but the_actual_  state
> and this means querying the hardware driver. Can we abstract this via
> some interface which provides the cost of waking up a CPU? This could
> take into account the state of the other CPUs in the cluster and the
> scheduler is simply concerned with the wake-up costs.

can you even query this without actually waking up the cpu and asking ???


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH v5 00/14] sched: packing tasks
  2013-11-11 19:26       ` Arjan van de Ven
@ 2013-11-11 22:43         ` Nicolas Pitre
  2013-11-11 23:43         ` Catalin Marinas
  1 sibling, 0 replies; 22+ messages in thread
From: Nicolas Pitre @ 2013-11-11 22:43 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Catalin Marinas, Morten Rasmussen, Vincent Guittot,
	Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar,
	Paul Turner, Chris Metcalf, Tony Luck, alex.shi@intel.com,
	Preeti U Murthy, linaro-kernel, len.brown@intel.com,
	l.majewski@samsung.com, Jonathan Corbet, Rafael J. Wysocki,
	Paul McKenney, linux-pm@vger.kernel.org

On Mon, 11 Nov 2013, Arjan van de Ven wrote:

> On 11/11/2013 10:31 AM, Catalin Marinas wrote:
> > I agree, we can't rely on the requested C-state but the_actual_  state
> > and this means querying the hardware driver. Can we abstract this via
> > some interface which provides the cost of waking up a CPU? This could
> > take into account the state of the other CPUs in the cluster and the
> > scheduler is simply concerned with the wake-up costs.
> 
> can you even query this without actually waking up the cpu and asking ???

On those systems we're interested in we sure can.


Nicolas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH v5 00/14] sched: packing tasks
  2013-11-11 19:26       ` Arjan van de Ven
  2013-11-11 22:43         ` Nicolas Pitre
@ 2013-11-11 23:43         ` Catalin Marinas
  1 sibling, 0 replies; 22+ messages in thread
From: Catalin Marinas @ 2013-11-11 23:43 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Morten Rasmussen, Vincent Guittot, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Paul Turner, Chris Metcalf, Tony Luck,
	alex.shi@intel.com, Preeti U Murthy, linaro-kernel,
	len.brown@intel.com, l.majewski@samsung.com, Jonathan Corbet,
	Rafael J. Wysocki, Paul McKenney, linux-pm@vger.kernel.org

On 11 Nov 2013, at 19:26, Arjan van de Ven <arjan@linux.intel.com> wrote:
> On 11/11/2013 10:31 AM, Catalin Marinas wrote:
>> I agree, we can't rely on the requested C-state but the_actual_  state
>> and this means querying the hardware driver. Can we abstract this via
>> some interface which provides the cost of waking up a CPU? This could
>> take into account the state of the other CPUs in the cluster and the
>> scheduler is simply concerned with the wake-up costs.
> 
> can you even query this without actually waking up the cpu and asking ???

Even if you don’t have additional hardware to query the state of a CPU
without waking it up, we could have a per-CPU variable storing the
actual C-states as selected by the arch backend.  This doesn’t need to
be precise but lets say only 90% accurate would probably be enough for
the scheduler.

Catalin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH v5 00/14] sched: packing tasks
  2013-11-11 16:38   ` Peter Zijlstra
  2013-11-11 16:40     ` Arjan van de Ven
@ 2013-11-12 10:36     ` Vincent Guittot
  1 sibling, 0 replies; 22+ messages in thread
From: Vincent Guittot @ 2013-11-12 10:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Catalin Marinas, Linux Kernel Mailing List, Ingo Molnar,
	Paul Turner, Morten Rasmussen, Chris Metcalf, Tony Luck,
	alex.shi@intel.com, Preeti U Murthy, linaro-kernel,
	len.brown@intel.com, Lukasz Majewski, Jonathan Corbet,
	Rafael J. Wysocki, Paul McKenney, Arjan van de Ven,
	linux-pm@vger.kernel.org

On 11 November 2013 17:38, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Nov 11, 2013 at 11:33:45AM +0000, Catalin Marinas wrote:
>> My understanding from the recent discussions is that the scheduler
>> should decide directly on the C-state (or rather the deepest C-state
>> possible since we don't want to duplicate the backend logic for
>> synchronising CPUs going up or down). This means that the scheduler
>> needs to know about C-state target residency, wake-up latency (I think
>> we can leave coupled C-states to the backend, there is some complex
>> synchronisation which I wouldn't duplicate).
>>
>> Alternatively (my preferred approach), we get the scheduler to predict
>> and pass the expected residency and latency requirements down to a
>> power driver and read back the actual C-states for making task
>> placement decisions. Some of the menu governor prediction logic could
>> be turned into a library and used by the scheduler. Basically what
>> this tries to achieve is better scheduler awareness of the current
>> C-states decided by a cpuidle/power driver based on the scheduler
>> constraints.
>
> Ah yes.. so I _think_ the scheduler wants to eventually know about idle
> topology constraints. But we can get there in a gradual fashion I hope.
>
> Like the package C states on x86 -- for those to be effective the
> scheduler needs to pack tasks and keep entire packages idle for as long
> as possible.

That's the purpose of patches 12, 13 and 14. To get the current wakeup
latency of a core and use it when selecting a core

>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH v5 00/14] sched: packing tasks
  2013-11-11 18:18       ` Catalin Marinas
  2013-11-11 18:20         ` Arjan van de Ven
@ 2013-11-12 12:06         ` Morten Rasmussen
  2013-11-12 16:48         ` Arjan van de Ven
  2 siblings, 0 replies; 22+ messages in thread
From: Morten Rasmussen @ 2013-11-12 12:06 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Arjan van de Ven, Peter Zijlstra, Vincent Guittot,
	Linux Kernel Mailing List, Ingo Molnar, Paul Turner,
	Chris Metcalf, Tony Luck, alex.shi@intel.com, Preeti U Murthy,
	linaro-kernel, len.brown@intel.com, l.majewski@samsung.com,
	Jonathan Corbet, Rafael J. Wysocki, Paul McKenney,
	linux-pm@vger.kernel.org

On Mon, Nov 11, 2013 at 06:18:05PM +0000, Catalin Marinas wrote:
> On Mon, Nov 11, 2013 at 04:39:45PM +0000, Arjan van de Ven wrote:
> > having a hardware driver give a prefered CPU ordering for wakes can indeed be useful.
> > (I'm doubtful that changing the recommendation for each idle is going to pay off,
> > but proof is in the pudding; there are certainly long term effects where this can help)
> 
> The ordering is based on the actual C-state, so a simple way is to wake
> up the CPU in the shallowest C-state. With asymmetric configurations
> (big.LITTLE) we have different costs for the same C-state, so this would
> come in handy.

Asymmetric configurations add a bit of extra fun to deal with as you
don't want to pick the cpu in the shallowest C-state if it is the wrong
type of cpu for the task waking up. That goes for both big and little
cpus in big.LITTLE.

So the hardware driver would need to know which cpus that are suitable
targets for the task, or we need to somehow limit the driver query to
suitable cpus, or the driver should return a list of cpus guaranteed to
include cpus of all types (big, little, whatever...).

Morten

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH v5 00/14] sched: packing tasks
  2013-11-11 11:33 ` [RFC][PATCH v5 00/14] sched: packing tasks Catalin Marinas
                     ` (2 preceding siblings ...)
  2013-11-11 16:54   ` Morten Rasmussen
@ 2013-11-12 12:35   ` Vincent Guittot
  3 siblings, 0 replies; 22+ messages in thread
From: Vincent Guittot @ 2013-11-12 12:35 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar,
	Paul Turner, Morten Rasmussen, Chris Metcalf, Tony Luck,
	Preeti U Murthy, linaro-kernel, len.brown@intel.com,
	Lukasz Majewski, Jonathan Corbet, Rafael J. Wysocki,
	Paul McKenney, Arjan van de Ven, linux-pm@vger.kernel.org,
	Daniel Lezcano, Tuukka Tikkanen, Alex Shi

On 11 November 2013 12:33, Catalin Marinas <catalin.marinas@arm.com> wrote:
> Hi Vincent,
>
> (cross-posting to linux-pm as it was agreed to follow up on this list)
>

<snip>

>
> So, IMO, defining the power topology is a good starting point and I
> think it's better to separate the patches from the energy saving
> algorithms like packing. We need to agree on what information we have

Daniel and Tuukka who are working in cpuidle consolidation in the
scheduler, are also interested in using similar topology information
than me. I have made 1 patchset only because the information was only
used here. So I can probably make a separate patchset of the power
topology description in DT.

Vincent

> (C-state details, coupling, power gating) and what we can/need to
> expose to the scheduler. This can be revisited once we start
> implementing/refining the energy awareness.
>

<snip>
>
> --
> Catalin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH v5 00/14] sched: packing tasks
  2013-11-11 18:18       ` Catalin Marinas
  2013-11-11 18:20         ` Arjan van de Ven
  2013-11-12 12:06         ` Morten Rasmussen
@ 2013-11-12 16:48         ` Arjan van de Ven
  2013-11-12 23:14           ` Catalin Marinas
  2 siblings, 1 reply; 22+ messages in thread
From: Arjan van de Ven @ 2013-11-12 16:48 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Peter Zijlstra, Vincent Guittot, Linux Kernel Mailing List,
	Ingo Molnar, Paul Turner, Morten Rasmussen, Chris Metcalf,
	Tony Luck, alex.shi@intel.com, Preeti U Murthy, linaro-kernel,
	len.brown@intel.com, l.majewski@samsung.com, Jonathan Corbet,
	Rafael J. Wysocki, Paul McKenney, linux-pm@vger.kernel.org

On 11/11/2013 10:18 AM, Catalin Marinas wrote:
> The ordering is based on the actual C-state, so a simple way is to wake
> up the CPU in the shallowest C-state. With asymmetric configurations
> (big.LITTLE) we have different costs for the same C-state, so this would
> come in handy.

btw I was considering something else; in practice CPUs will be in the deepest state..
... at which point I was going to go with some other metrics of what is best from a platform level


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH v5 00/14] sched: packing tasks
  2013-11-11 16:36   ` Peter Zijlstra
  2013-11-11 16:39     ` Arjan van de Ven
@ 2013-11-12 17:40     ` Catalin Marinas
  2013-11-25 18:55     ` Daniel Lezcano
  2 siblings, 0 replies; 22+ messages in thread
From: Catalin Marinas @ 2013-11-12 17:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vincent Guittot, Linux Kernel Mailing List, Ingo Molnar,
	Paul Turner, Morten Rasmussen, Chris Metcalf, Tony Luck,
	alex.shi@intel.com, Preeti U Murthy, linaro-kernel,
	len.brown@intel.com, l.majewski@samsung.com, Jonathan Corbet,
	Rafael J. Wysocki, Paul McKenney, Arjan van de Ven,
	linux-pm@vger.kernel.org

On Mon, Nov 11, 2013 at 04:36:30PM +0000, Peter Zijlstra wrote:
> On Mon, Nov 11, 2013 at 11:33:45AM +0000, Catalin Marinas wrote:
> 
> tl;dr :-) Still trying to wrap my head around how to do that weird
> topology Vincent raised..

Long email, I know, but topology discussion is a good start ;).

To summarise the rest, I don't see full task packing as useful but
rather getting packing as a result of other decisions (like trying to
estimate the cost of task placement and refining the algorithm from
there). There are ARM SoCs where maximising idle time does not always
mean maximising the energy saving even if the cores can be power-gated
individually (unless you have small workload that doesn't increase the
P-state on the packing CPU).

> > Question for Peter/Ingo: do you want the scheduler to decide on which
> > C-state a CPU should be in or we still leave this to a cpuidle
> > layer/driver?
> 
> I think the can leave most of that in a driver; right along with how to
> prod the hardware to actually get into that state.
> 
> I think the most important parts are what is now 'generic' code; stuff
> that guestimates the idle-time and so forth.
> 
> I think the scheduler simply wants to say: we expect to go idle for X
> ns, we want a guaranteed wakeup latency of Y ns -- go do your thing.

Sounds good (and I think the Linaro guys started looking into this).

> I think you also raised the point in that we do want some feedback as to
> the cost of waking up particular cores to better make decisions on which
> to wake. That is indeed so.

It depends on how we end up implementing the energy awareness in the
scheduler but too simple topology (just which CPUs can be power-gated)
is not that useful.

In a very simplistic and ideal world (note the 'ideal' part), we could
estimate the energy cost of a CPU for a period T:

E = sum(P(Cx) * Tx) + sum(wake-up-energy) + sum(P(Ty) * Ty)

  P(Cx): power in C-state x
  wake-up-energy: the cost of waking up from various C-states
  P(Ty): power of running task y (which also depends on the P-state)
  sum(Tx) + sum(Ty) = T

Assuming that we have such information and can predict (based on past
usage) what the task loads will be, together with other
performance/latency constraints, an 'ideal' scheduler would always
choose the correct C/P states and task placements for optimal energy.
However, the reality is different and even so it would be an NP problem.

But we can try to come up with some "guestimates" based on parameters
provided by the SoC (via DT or ACPI tables or just some low-level
driver/arch code). The scheduler does its best according on these
parameters at certain times (task wake-up, idle balance) while the SoC
can still tune the behaviour.

If we roughly estimate the energy cost of a run-queue and the energy
cost of individual tasks on that run-queue (based on their load and
P-state), we could estimate the cost of moving or waking the
task on another CPU (where the task's cost may change depending on
asymmetric configurations or different P-state). We don't even need to
be precise in the energy costs but just some relative numbers so that
the scheduler can favour one CPU or another. If we ignore P-state costs
and only consider C-states and symmetric configurations, we probably get
a behaviour similar to Vincent's task packing patches.

The information we have currently for C-states is target residency and
exit latency. From these I think we can only infer the wake-up energy
cost not how much we save but placing a CPU into that state. So if we
want the scheduler to decide whether to pack or spread (from an energy
cost perspective), we need additional information in the topology.

Alternatively we could have a power driver which dynamically returns
some estimates every time the scheduler asks for them, with a power
driver for each SoC (which is already the case for ARM SoCs).

-- 
Catalin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH v5 00/14] sched: packing tasks
  2013-11-12 16:48         ` Arjan van de Ven
@ 2013-11-12 23:14           ` Catalin Marinas
  2013-11-13 16:13             ` Arjan van de Ven
  0 siblings, 1 reply; 22+ messages in thread
From: Catalin Marinas @ 2013-11-12 23:14 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Peter Zijlstra, Vincent Guittot, linux-kernel, Ingo Molnar,
	Paul Turner, Morten Rasmussen, Chris Metcalf, Tony Luck,
	alex.shi@intel.com, Preeti U Murthy, linaro-kernel,
	len.brown@intel.com, l.majewski@samsung.com, Jonathan Corbet,
	Rafael J. Wysocki, Paul McKenney, linux-pm@vger.kernel.org

On 12 Nov 2013, at 16:48, Arjan van de Ven <arjan@linux.intel.com> wrote:
> On 11/11/2013 10:18 AM, Catalin Marinas wrote:
>> The ordering is based on the actual C-state, so a simple way is to wake
>> up the CPU in the shallowest C-state. With asymmetric configurations
>> (big.LITTLE) we have different costs for the same C-state, so this would
>> come in handy.
> 
> btw I was considering something else; in practice CPUs will be in the deepest state..
> ... at which point I was going to go with some other metrics of what is best from a platform level

I agree, other metrics are needed.  The problem is that we currently
only have (relatively, guessed from the target residency) the cost of
transition from a C-state to a P-state (for the latter, not sure which).
But we don’t know what the power (saving) on that C-state is nor the one 
at a P-state (and vendors reluctant to provide such information). So the 
best the scheduler can do is optimise the wake-up cost and blindly assume 
that deeper C-state on a CPU is more efficient than lower P-states on two 
other CPUs (or the other way around).

If we find a good use for such metrics in the scheduler, I think the
vendors would be more open to providing at least some relative (rather
than absolute) numbers.

Catalin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH v5 00/14] sched: packing tasks
  2013-11-12 23:14           ` Catalin Marinas
@ 2013-11-13 16:13             ` Arjan van de Ven
  2013-11-13 16:45               ` Catalin Marinas
  0 siblings, 1 reply; 22+ messages in thread
From: Arjan van de Ven @ 2013-11-13 16:13 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Peter Zijlstra, Vincent Guittot, linux-kernel, Ingo Molnar,
	Paul Turner, Morten Rasmussen, Chris Metcalf, Tony Luck,
	alex.shi@intel.com, Preeti U Murthy, linaro-kernel,
	len.brown@intel.com, l.majewski@samsung.com, Jonathan Corbet,
	Rafael J. Wysocki, Paul McKenney, linux-pm@vger.kernel.org

On 11/12/2013 3:14 PM, Catalin Marinas wrote:
> On 12 Nov 2013, at 16:48, Arjan van de Ven <arjan@linux.intel.com> wrote:
>> On 11/11/2013 10:18 AM, Catalin Marinas wrote:
>>> The ordering is based on the actual C-state, so a simple way is to wake
>>> up the CPU in the shallowest C-state. With asymmetric configurations
>>> (big.LITTLE) we have different costs for the same C-state, so this would
>>> come in handy.
>>
>> btw I was considering something else; in practice CPUs will be in the deepest state..
>> ... at which point I was going to go with some other metrics of what is best from a platform level
>
> I agree, other metrics are needed.  The problem is that we currently
> only have (relatively, guessed from the target residency) the cost of
> transition from a C-state to a P-state (for the latter, not sure which).
> But we don’t know what the power (saving) on that C-state is nor the one
> at a P-state (and vendors reluctant to provide such information). So the
> best the scheduler can do is optimise the wake-up cost and blindly assume
> that deeper C-state on a CPU is more efficient than lower P-states on two
> other CPUs (or the other way around).

for picking the cpu to wake on there are also low level physical kind of things
we'd want to take into account on the intel side.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH v5 00/14] sched: packing tasks
  2013-11-13 16:13             ` Arjan van de Ven
@ 2013-11-13 16:45               ` Catalin Marinas
  2013-11-13 17:56                 ` Arjan van de Ven
  0 siblings, 1 reply; 22+ messages in thread
From: Catalin Marinas @ 2013-11-13 16:45 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Peter Zijlstra, Vincent Guittot, linux-kernel, Ingo Molnar,
	Paul Turner, Morten Rasmussen, Chris Metcalf, Tony Luck,
	alex.shi@intel.com, Preeti U Murthy, linaro-kernel,
	len.brown@intel.com, l.majewski@samsung.com, Jonathan Corbet,
	Rafael J. Wysocki, Paul McKenney, linux-pm@vger.kernel.org

On Wed, Nov 13, 2013 at 04:13:57PM +0000, Arjan van de Ven wrote:
> On 11/12/2013 3:14 PM, Catalin Marinas wrote:
> > On 12 Nov 2013, at 16:48, Arjan van de Ven <arjan@linux.intel.com> wrote:
> >> On 11/11/2013 10:18 AM, Catalin Marinas wrote:
> >>> The ordering is based on the actual C-state, so a simple way is to wake
> >>> up the CPU in the shallowest C-state. With asymmetric configurations
> >>> (big.LITTLE) we have different costs for the same C-state, so this would
> >>> come in handy.
> >>
> >> btw I was considering something else; in practice CPUs will be in the deepest state..
> >> ... at which point I was going to go with some other metrics of what is best from a platform level
> >
> > I agree, other metrics are needed.  The problem is that we currently
> > only have (relatively, guessed from the target residency) the cost of
> > transition from a C-state to a P-state (for the latter, not sure which).
> > But we don’t know what the power (saving) on that C-state is nor the one
> > at a P-state (and vendors reluctant to provide such information). So the
> > best the scheduler can do is optimise the wake-up cost and blindly assume
> > that deeper C-state on a CPU is more efficient than lower P-states on two
> > other CPUs (or the other way around).
> 
> for picking the cpu to wake on there are also low level physical kind of things
> we'd want to take into account on the intel side.

Are these static and could they be hidden behind some cost number in a
topology description? If they are dynamic, we would need arch or driver
hooks to give some cost or priority number that the scheduler can use.

-- 
Catalin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH v5 00/14] sched: packing tasks
  2013-11-13 16:45               ` Catalin Marinas
@ 2013-11-13 17:56                 ` Arjan van de Ven
  0 siblings, 0 replies; 22+ messages in thread
From: Arjan van de Ven @ 2013-11-13 17:56 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Peter Zijlstra, Vincent Guittot, linux-kernel, Ingo Molnar,
	Paul Turner, Morten Rasmussen, Chris Metcalf, Tony Luck,
	alex.shi@intel.com, Preeti U Murthy, linaro-kernel,
	len.brown@intel.com, l.majewski@samsung.com, Jonathan Corbet,
	Rafael J. Wysocki, Paul McKenney, linux-pm@vger.kernel.org


>> for picking the cpu to wake on there are also low level physical kind of things
>> we'd want to take into account on the intel side.
>
> Are these static and could they be hidden behind some cost number in a
> topology description? If they are dynamic, we would need arch or driver
> hooks to give some cost or priority number that the scheduler can use.

they're dynamic but slow moving (say, reevaluated once per second)

so we could have a static table that some driver updates async


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH v5 00/14] sched: packing tasks
  2013-11-11 16:36   ` Peter Zijlstra
  2013-11-11 16:39     ` Arjan van de Ven
  2013-11-12 17:40     ` Catalin Marinas
@ 2013-11-25 18:55     ` Daniel Lezcano
  2 siblings, 0 replies; 22+ messages in thread
From: Daniel Lezcano @ 2013-11-25 18:55 UTC (permalink / raw)
  To: Peter Zijlstra, Catalin Marinas
  Cc: Vincent Guittot, Linux Kernel Mailing List, Ingo Molnar,
	Paul Turner, Morten Rasmussen, Chris Metcalf, Tony Luck,
	alex.shi@intel.com, Preeti U Murthy, linaro-kernel,
	len.brown@intel.com, l.majewski, Jonathan Corbet,
	Rafael J. Wysocki, Paul McKenney, Arjan van de Ven, linux-pm

On 11/11/2013 05:36 PM, Peter Zijlstra wrote:
> On Mon, Nov 11, 2013 at 11:33:45AM +0000, Catalin Marinas wrote:
>
> tl;dr :-) Still trying to wrap my head around how to do that weird
> topology Vincent raised..
>
>> Question for Peter/Ingo: do you want the scheduler to decide on which
>> C-state a CPU should be in or we still leave this to a cpuidle
>> layer/driver?
>
> I think the can leave most of that in a driver; right along with how to
> prod the hardware to actually get into that state.
>
> I think the most important parts are what is now 'generic' code; stuff
> that guestimates the idle-time and so forth.
>
> I think the scheduler simply wants to say: we expect to go idle for X
> ns, we want a guaranteed wakeup latency of Y ns -- go do your thing.

Hi Peter,

IIUC, for full integration in the scheduler, we should eradicate the 
idle task and the related code tied with it, no ?

> I think you also raised the point in that we do want some feedback as to
> the cost of waking up particular cores to better make decisions on which
> to wake. That is indeed so.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2013-11-25 18:55 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1382097147-30088-1-git-send-email-vincent.guittot@linaro.org>
2013-11-11 11:33 ` [RFC][PATCH v5 00/14] sched: packing tasks Catalin Marinas
2013-11-11 16:36   ` Peter Zijlstra
2013-11-11 16:39     ` Arjan van de Ven
2013-11-11 18:18       ` Catalin Marinas
2013-11-11 18:20         ` Arjan van de Ven
2013-11-12 12:06         ` Morten Rasmussen
2013-11-12 16:48         ` Arjan van de Ven
2013-11-12 23:14           ` Catalin Marinas
2013-11-13 16:13             ` Arjan van de Ven
2013-11-13 16:45               ` Catalin Marinas
2013-11-13 17:56                 ` Arjan van de Ven
2013-11-12 17:40     ` Catalin Marinas
2013-11-25 18:55     ` Daniel Lezcano
2013-11-11 16:38   ` Peter Zijlstra
2013-11-11 16:40     ` Arjan van de Ven
2013-11-12 10:36     ` Vincent Guittot
2013-11-11 16:54   ` Morten Rasmussen
2013-11-11 18:31     ` Catalin Marinas
2013-11-11 19:26       ` Arjan van de Ven
2013-11-11 22:43         ` Nicolas Pitre
2013-11-11 23:43         ` Catalin Marinas
2013-11-12 12:35   ` Vincent Guittot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).