Re: [PATCH v3 16/22] sched: add power aware scheduling in fork/exec/wake


From: Morten Rasmussen <morten.rasmussen@arm.com>
To: Alex Shi <alex.shi@intel.com>
Cc: "mingo@redhat.com" <mingo@redhat.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"arjan@linux.intel.com" <arjan@linux.intel.com>,
	"bp@alien8.de" <bp@alien8.de>, "pjt@google.com" <pjt@google.com>,
	"namhyung@kernel.org" <namhyung@kernel.org>,
	"efault@gmx.de" <efault@gmx.de>,
	"vincent.guittot@linaro.org" <vincent.guittot@linaro.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"preeti@linux.vnet.ibm.com" <preeti@linux.vnet.ibm.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 16/22] sched: add power aware scheduling in fork/exec/wake
Date: Wed, 16 Jan 2013 14:27:30 +0000
Message-ID: <20130116142730.GA30805@e103034-lin>
In-Reply-To: <50F6426D.7030201@intel.com>

On Wed, Jan 16, 2013 at 06:02:21AM +0000, Alex Shi wrote:
> On 01/15/2013 12:09 AM, Morten Rasmussen wrote:
> > On Fri, Jan 11, 2013 at 07:08:45AM +0000, Alex Shi wrote:
> >> On 01/10/2013 11:01 PM, Morten Rasmussen wrote:
> >>> On Sat, Jan 05, 2013 at 08:37:45AM +0000, Alex Shi wrote:
> >>>> This patch adds power aware scheduling in fork/exec/wake. It tries to
> >>>> select a cpu from the busiest group that still has spare utilization.
> >>>> That lets the other groups save power.
> >>>>
> >>>> The trade-off is adding power aware statistics collection to the group
> >>>> search. But since the collection only happens when power scheduling is
> >>>> eligible, in the worst case hackbench drops by only about 2% with the
> >>>> powersaving/balance policy. There is no clear change for the
> >>>> performance policy.
> >>>>
> >>>> I had tried to use rq load avg utilisation in this balancing, but
> >>>> the utilisation needs a long time to accumulate, so it is unfit for
> >>>> any burst balancing. So I use nr_running as the instant rq utilisation.
> >>>
> >>> So you effectively use a mix of nr_running (counting tasks) and PJT's
> >>> tracked load for balancing?
> >>
> >> No, just the task count here.
> >>>
> >>> The slow reaction time of the tracked load of a cpu/rq is an
> >>> interesting problem. Would it be possible to use it if you maintained a
> >>> sched group runnable_load_avg similar to cfs_rq->runnable_load_avg, where
> >>> the load contribution of a task is added when the task is enqueued and
> >>> removed again if it migrates to another cpu?
> >>> This way you would know the new load of the sched group/domain instantly
> >>> when you migrate a task there. It might not be precise, as the load
> >>> contribution of the task to some extent depends on the load of the cpu
> >>> where it is running. But it would probably be a fair estimate, which is
> >>> quite likely to be better than just counting tasks (nr_running).
> >>
> >> For the power scenario, we only require the task count to be less than
> >> the LCPU count; the load weight does not matter, since whatever its load
> >> weight, a task can only burn one LCPU.
> >>
> > 
> > True, but you miss opportunities for power saving when you have many
> > light tasks (> LCPU). Currently, the sd_utils < threshold check will go
> > for SCHED_POLICY_PERFORMANCE if the number of tasks (sd_utils) is greater
> > than the domain weight/capacity, irrespective of the actual load caused
> > by those tasks.
> > 
> > If you used tracked task load weight for sd_utils instead, you would be
> > able to go for power saving in scenarios with many light tasks as well.
> 
> Yes, that's right from the power point of view. But for performance,
> it's better to spread tasks across different LCPUs to save context switch
> (CS) cost. And if the cpu usage is nearly full, we don't know whether some
> tasks really want more cpu time.

If the cpu is nearly full according to its tracked load, it should not be
used for packing more tasks. It is the nearly idle scenario that I am
more interested in. If you have lots of tasks with tracked load <10%, then
why not pack them? The performance impact should be minimal.
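To make the packing argument concrete, here is a minimal C sketch contrasting a count-based check with a utilization-based one. The struct, the function names and the 80% per-LCPU headroom are hypothetical illustrations, not code from the patch set or the kernel; utilization is expressed as a percentage of one LCPU.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical per-group statistics for illustration only; this is not
 * the kernel's struct sg_lb_stats.
 */
struct group_stats {
	unsigned int group_weight; /* number of LCPUs in the group */
	unsigned int nr_running;   /* instantaneous task count */
	unsigned int util_pct;     /* summed tracked utilization, in % of one LCPU */
};

/* Count-based test, in the spirit of the patch's sd_utils < threshold check. */
static bool fits_by_count(const struct group_stats *gs)
{
	return gs->nr_running < gs->group_weight;
}

/*
 * Load-based test: keep packing while the summed tracked utilization
 * leaves headroom. The 80% per-LCPU margin is an arbitrary assumption.
 */
static bool fits_by_util(const struct group_stats *gs)
{
	assert(gs->group_weight > 0);
	return gs->util_pct < gs->group_weight * 80;
}
```

With five tasks at about 8% each on a two-LCPU group, the count-based check refuses to pack (5 tasks vs. 2 LCPUs), while the utilization-based check accepts them (40% against a 160% budget). A group that is genuinely busy fails both checks.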

Furthermore, nr_running is just a snapshot of the current runqueue
status. The combination of runnable and blocked load should give a
better overall view of the cpu loads.
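A minimal sketch of why a snapshot can mislead, assuming a simplified model with no geometric decay: a CPU whose task has just gone to sleep reports nr_running == 0, yet the load it is expected to carry is still visible as blocked load. The struct and helper names are hypothetical; only the runnable/blocked split mirrors the load tracking discussed above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified per-CPU view, loosely modelled on the runnable/blocked
 * split of tracked load. Field names are illustrative; decay is ignored.
 */
struct cpu_view {
	unsigned int nr_running;     /* snapshot: tasks on the runqueue now */
	unsigned long runnable_load; /* load of tasks currently runnable */
	unsigned long blocked_load;  /* load of tasks sleeping on this CPU */
};

/* A sleeping task wakes up on this CPU: its contribution becomes runnable. */
static void task_wakes(struct cpu_view *c, unsigned long contrib)
{
	assert(c->blocked_load >= contrib);
	c->nr_running++;
	c->blocked_load -= contrib;
	c->runnable_load += contrib;
}

/* A running task goes to sleep: nr_running drops, but its load is kept as blocked. */
static void task_sleeps(struct cpu_view *c, unsigned long contrib)
{
	assert(c->nr_running > 0 && c->runnable_load >= contrib);
	c->nr_running--;
	c->runnable_load -= contrib;
	c->blocked_load += contrib;
}

/* Combined per-CPU view: runnable plus blocked load. */
static unsigned long cpu_total_load(const struct cpu_view *c)
{
	return c->runnable_load + c->blocked_load;
}

/* Group-level view: sum over member CPUs, recomputed from per-CPU fields. */
static unsigned long group_total_load(const struct cpu_view *cpus, size_t n)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += cpu_total_load(&cpus[i]);
	return sum;
}
```

After task_sleeps() the CPU looks idle by nr_running, but cpu_total_load() still reports the sleeping task's contribution, so a packing decision based on the combined load would not treat that CPU as empty.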

> Even in the power sched policy, we still want to get better performance
> if it's possible. :)

I agree, if it comes for free in terms of power. In my opinion it is
acceptable to sacrifice a bit of performance to save power when using a
power sched policy, as long as the performance regression can be
justified by the power savings. How to trade off power against
performance will of course depend on the system and its usage. My point
is just that with multiple sched policies (performance, balance and
power, as you propose) it should be acceptable to focus on power for the
power policy and let users that only/mostly care about performance use
the balance or performance policy.

> > 
> >>>> +
> >>>> +		if (sched_policy == SCHED_POLICY_POWERSAVING)
> >>>> +			threshold = sgs.group_weight;
> >>>> +		else
> >>>> +			threshold = sgs.group_capacity;
> >>>
> >>> Is group_capacity larger or smaller than group_weight on your platform?
> >>
> >> I guess most of your confusion comes from capacity != weight here.
> >>
> >> In most Intel CPUs, a core's cpu power (with 2 HTs) is usually 1178,
> >> just above a normal cpu power of 1024. But the capacity is still 1,
> >> while the group weight is 2.
> >>
> > 
> > Thanks for clarifying. To the best of my knowledge there are no
> > guidelines for how to specify cpu power, so it may be a bit dangerous to
> > assume that capacity < weight when capacity is based on cpu power.
> 
> Sure. I also just got them from the code, and don't know how other
> archs differentiate them.
> But currently, this cpu power concept seems to work fine.

Yes, it seems to work fine for your test platform. I just want to
highlight that the assumption you make might not be valid for other
architectures. I know that cpu power is not widely used, but that may
change with the increasing focus on power aware scheduling.
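For reference, the arithmetic behind the capacity/weight relationship can be sketched as follows, assuming group capacity is derived by rounding the summed cpu power to the nearest multiple of SCHED_POWER_SCALE (1024), as the load balancer of this era did via DIV_ROUND_CLOSEST. POWER_SCALE, the policy enum and pick_threshold() are hypothetical stand-ins, and the 1536-per-LCPU platform is an invented example.

```c
#include <assert.h>

/* Stand-in for SCHED_POWER_SCALE (1 << 10) in kernels of this era. */
#define POWER_SCALE 1024UL

/* Round-to-nearest division, equivalent to DIV_ROUND_CLOSEST for unsigned values. */
static unsigned long div_round_closest(unsigned long x, unsigned long d)
{
	assert(d > 0);
	return (x + d / 2) / d;
}

/* Capacity derived from the group's summed cpu power; weight is the LCPU count. */
static unsigned long group_capacity(unsigned long group_power)
{
	return div_round_closest(group_power, POWER_SCALE);
}

/* Hypothetical mirror of the patch's threshold selection. */
enum policy { POLICY_POWERSAVING, POLICY_BALANCE };

static unsigned long pick_threshold(enum policy p, unsigned long weight,
				    unsigned long capacity)
{
	return p == POLICY_POWERSAVING ? weight : capacity;
}
```

Two HT siblings summing to 1178 give capacity 1 with weight 2; two non-HT cores at 1024 each give capacity 2, equal to the weight; two LCPUs at a hypothetical 1536 each give capacity 3, greater than the weight of 2, so the balance-policy threshold would exceed the powersaving one.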

Morten

> > 
> > You could have architectures where the cpu power of each LCPU (HT, core,
> > cpu, whatever LCPU is on the particular platform) is greater than 1024
> > for most LCPUs. In that case, the capacity < weight assumption fails.
> > Also, on non-HT systems it is quite likely that you will have capacity =
> > weight.
> 
> yes.
> > 
> > Morten
> > 
> > 
> 
> 
> -- 
> Thanks Alex
>


Thread overview: 91+ messages
2013-01-05  8:37 [PATCH V3 0/22] sched: simplified fork, enable load average into LB and power awareness scheduling Alex Shi
2013-01-05  8:37 ` [PATCH v3 01/22] sched: set SD_PREFER_SIBLING on MC domain to reduce a domain level Alex Shi
2013-01-05  8:37 ` [PATCH v3 02/22] sched: select_task_rq_fair clean up Alex Shi
2013-01-11  4:57   ` Preeti U Murthy
2013-01-05  8:37 ` [PATCH v3 03/22] sched: fix find_idlest_group mess logical Alex Shi
2013-01-11  4:59   ` Preeti U Murthy
2013-01-05  8:37 ` [PATCH v3 04/22] sched: don't need go to smaller sched domain Alex Shi
2013-01-09 17:38   ` Morten Rasmussen
2013-01-10  3:16     ` Mike Galbraith
2013-01-11  5:02   ` Preeti U Murthy
2013-01-05  8:37 ` [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake Alex Shi
2013-01-09 18:21   ` Morten Rasmussen
2013-01-11  2:46     ` Alex Shi
2013-01-11 10:07       ` Morten Rasmussen
2013-01-11 14:50         ` Alex Shi
2013-01-14  8:55         ` li guang
2013-01-14  9:18           ` Alex Shi
2013-01-11  4:56     ` Preeti U Murthy
2013-01-11  8:01       ` li guang
2013-01-11 14:56         ` Alex Shi
2013-01-14  9:03           ` li guang
2013-01-15  2:34             ` Alex Shi
2013-01-16  1:54               ` li guang
2013-01-11 10:54       ` Morten Rasmussen
2013-01-16  5:43       ` Alex Shi
2013-01-16  7:41         ` Alex Shi
2013-01-05  8:37 ` [PATCH v3 06/22] sched: load tracking bug fix Alex Shi
2013-01-05  8:37 ` [PATCH v3 07/22] sched: set initial load avg of new forked task Alex Shi
2013-01-11  5:10   ` Preeti U Murthy
2013-01-11  5:44     ` Alex Shi
2013-01-05  8:37 ` [PATCH v3 08/22] sched: update cpu load after task_tick Alex Shi
2013-01-05  8:37 ` [PATCH v3 09/22] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task Alex Shi
2013-01-05  8:56   ` Alex Shi
2013-01-06  7:54     ` Alex Shi
2013-01-06 18:31       ` Linus Torvalds
2013-01-07  7:00         ` Preeti U Murthy
2013-01-08 14:27         ` Alex Shi
2013-01-11  6:31         ` Alex Shi
2013-01-21 14:47           ` Alex Shi
2013-01-22  3:20             ` Alex Shi
2013-01-22  6:55               ` Mike Galbraith
2013-01-22  7:50                 ` Alex Shi
2013-01-22  9:52                   ` Mike Galbraith
2013-01-23  0:36                     ` Alex Shi
2013-01-23  1:47                       ` Mike Galbraith
2013-01-23  2:01                         ` Alex Shi
2013-01-05  8:37 ` [PATCH v3 10/22] sched: consider runnable load average in move_tasks Alex Shi
2013-01-05  8:37 ` [PATCH v3 11/22] sched: consider runnable load average in effective_load Alex Shi
2013-01-10 11:28   ` Morten Rasmussen
2013-01-11  3:26     ` Alex Shi
2013-01-14 12:01       ` Morten Rasmussen
2013-01-16  5:30         ` Alex Shi
2013-01-05  8:37 ` [PATCH v3 12/22] Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking" Alex Shi
2013-01-05  8:37 ` [PATCH v3 13/22] sched: add sched_policy in kernel Alex Shi
2013-01-05  8:37 ` [PATCH v3 14/22] sched: add sched_policy and it's sysfs interface Alex Shi
2013-01-14  6:53   ` Namhyung Kim
2013-01-14  8:11     ` Alex Shi
2013-01-05  8:37 ` [PATCH v3 15/22] sched: log the cpu utilization at rq Alex Shi
2013-01-10 11:40   ` Morten Rasmussen
2013-01-11  3:30     ` Alex Shi
2013-01-14 13:59       ` Morten Rasmussen
2013-01-16  5:53         ` Alex Shi
2013-01-05  8:37 ` [PATCH v3 16/22] sched: add power aware scheduling in fork/exec/wake Alex Shi
2013-01-10 15:01   ` Morten Rasmussen
2013-01-11  7:08     ` Alex Shi
2013-01-14 16:09       ` Morten Rasmussen
2013-01-16  6:02         ` Alex Shi
2013-01-16 14:27           ` Morten Rasmussen [this message]
2013-01-17  5:47             ` Namhyung Kim
2013-01-18 13:41               ` Alex Shi
2013-01-14  7:03   ` Namhyung Kim
2013-01-14  8:30     ` Alex Shi
2013-01-05  8:37 ` [PATCH v3 17/22] sched: packing small tasks in wake/exec balancing Alex Shi
2013-01-10 17:17   ` Morten Rasmussen
2013-01-11  3:47     ` Alex Shi
2013-01-14  7:13       ` Namhyung Kim
2013-01-16  6:11         ` Alex Shi
2013-01-16 12:52           ` Namhyung Kim
2013-01-14 17:00       ` Morten Rasmussen
2013-01-16  7:32         ` Alex Shi
2013-01-16 15:08           ` Morten Rasmussen
2013-01-18 14:06             ` Alex Shi
2013-01-05  8:37 ` [PATCH v3 18/22] sched: add power/performance balance allowed flag Alex Shi
2013-01-05  8:37 ` [PATCH v3 19/22] sched: pull all tasks from source group Alex Shi
2013-01-05  8:37 ` [PATCH v3 20/22] sched: don't care if the local group has capacity Alex Shi
2013-01-05  8:37 ` [PATCH v3 21/22] sched: power aware load balance, Alex Shi
2013-01-05  8:37 ` [PATCH v3 22/22] sched: lazy powersaving balance Alex Shi
2013-01-14  8:39   ` Namhyung Kim
2013-01-14  8:45     ` Alex Shi
2013-01-09 17:16 ` [PATCH V3 0/22] sched: simplified fork, enable load average into LB and power awareness scheduling Morten Rasmussen
2013-01-10  3:49   ` Alex Shi
