From: Peter Zijlstra <peterz@infradead.org>
To: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>, Venki Pallipadi <venki@google.com>,
Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>,
Mike Galbraith <efault@gmx.de>,
linux-kernel <linux-kernel@vger.kernel.org>,
Tim Chen <tim.c.chen@linux.jf.intel.com>,
"Shi, Alex" <alex.shi@intel.com>
Subject: Re: [patch 3/6] sched, nohz: sched group, domain aware nohz idle load balancing
Date: Tue, 29 Nov 2011 10:44:19 +0100
Message-ID: <1322559859.2921.190.camel@twins>
In-Reply-To: <1322524316.21329.64.camel@sbsiddha-desk.sc.intel.com>
On Mon, 2011-11-28 at 15:51 -0800, Suresh Siddha wrote:
> On Thu, 2011-11-24 at 03:47 -0800, Peter Zijlstra wrote:
> > On Fri, 2011-11-18 at 15:03 -0800, Suresh Siddha wrote:
> > > +	for_each_domain(cpu, sd) {
> > > +		struct sched_group *sg = sd->groups;
> > > +		struct sched_group_power *sgp = sg->sgp;
> > > +		int nr_busy = atomic_read(&sgp->nr_busy_cpus);
> > > +
> > > +		if (nr_busy > 1 && (nr_busy * SCHED_LOAD_SCALE > sgp->power))
> > > +			goto need_kick;
> >
> > This looks wrong, its basically always true for a box with HT.
>
> In the presence of two busy HT siblings, we need to do the idle load
> balance to figure out if the load from the busy core can be migrated to
> any other idle core/sibling in the platform. And at this point, we
> already know there are idle cpu's in the platform.

We might have to; this nr_busy doesn't mean a cpu is actually busy, just
that it's not nohz, it might very well be idle.

> But you are right. using group power like the above is not right. For
> example in the case of two sockets with each socket having dual core
> with no HT, if one socket is completely busy with another completely
> idle, we would like to identify this. But the group power of that socket
> will be 2 * SCHED_POWER_SCALE.

Right, minus whatever time was taken by interrupts and RT tasks.

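To make the arithmetic in the exchange above concrete, here is a small userspace sketch of the original check. It is only an illustration under stated assumptions: both scale constants are taken to be 1024, HT_PAIR_POWER is an assumed approximate power for a group of two HT siblings, and power_check_kicks() is a hypothetical helper, not a kernel function.

```c
#include <stdbool.h>

/* Assumed values: both scales taken as 1024 for this sketch. */
#define SCHED_LOAD_SCALE  1024UL
#define SCHED_POWER_SCALE 1024UL

/* Assumption: approximate power of a group made of two HT siblings. */
#define HT_PAIR_POWER 1178UL

/*
 * Hypothetical helper: the condition from the original patch, lifted
 * out of the kernel so the numbers can be tried in userspace.
 */
static bool power_check_kicks(unsigned long nr_busy, unsigned long group_power)
{
	return nr_busy > 1 && nr_busy * SCHED_LOAD_SCALE > group_power;
}
```

Under these assumptions, two non-nohz HT siblings always satisfy the condition (2 * 1024 > 1178), while a fully busy dual-core socket at full power does not (2 * 1024 is not greater than 2 * SCHED_POWER_SCALE). It would only fire there once interrupt or RT time has lowered the group power.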
> In the older kernels, for the domains which were sharing package
> resources, we were setting the group power to SCHED_POWER_SCALE for the
> default performance mode. And I had that old code in mind, while
> doing the above check.
>
> I will modify the above check to:
>
> 	if (sd->flags & SD_SHARE_PKG_RESOURCES && nr_busy > 1)
> 		goto need_kick;
>
> This way, if there is a SMT/MC domain with more than one busy cpu in the
> group, then we will request idle load balancing.

Potentially more than 1 busy, right? And we do the balancing just in
case there are indeed busy cpus.

I think it's useful to mention somewhere nearby that this nr_busy
measure we use is an upper bound on the number of actually busy cpus.

> Current mainline code kicks the idle load balancer if there are two busy
> cpus in the system. The above-mentioned modification makes this decision
> somewhat better. For example, two busy cpu's in two different sockets
> or two busy cpu's in a dual-core single socket system will never kick
> idle load balancer (as there is no need).

Except in power balance mode (although that's probably busted anyway),
where we want to aggregate the two tasks over two sockets onto one
socket.

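The revised check and the examples discussed above can be modelled in userspace roughly as follows. This is a sketch, not the kernel code: struct domain_level, nohz_kick_needed_model() and the flag value used here are hypothetical stand-ins, and each array entry represents one domain level as seen from the cpu making the decision.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value for this sketch; the kernel defines its own. */
#define SD_SHARE_PKG_RESOURCES 0x0200

/* Hypothetical model of one sched-domain level as seen from a cpu. */
struct domain_level {
	int flags;
	/* Non-nohz cpus in the local group: an upper bound on busy cpus. */
	int nr_busy;
};

/* Kick if any level sharing package resources may hold >1 busy cpu. */
static bool nohz_kick_needed_model(const struct domain_level *levels, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if ((levels[i].flags & SD_SHARE_PKG_RESOURCES) &&
		    levels[i].nr_busy > 1)
			return true;
	}
	return false;
}
```

In this model a level that shares package resources and counts two non-nohz cpus requests a kick, while two such cpus that only meet at a level without the flag (different sockets, or a dual-core socket without HT) do not. The power balance case mentioned above is not modelled.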
> In future we can add more heuristics to kick the idle load balancer only
> when it is really necessary (for example when there is a real imbalance
> between the highest and lowest loaded groups etc). Only catch is to
> identify those scenarios without adding much penalty to the busy cpu
> which is identifying the imbalance and kicking the idle load balancer.
> Above proposed approach is the simplest approach that is trying to do
> better than the current logic we have in the kernel now.

Fair enough, and yeah, each patch only needs to do better and eventually
we'll get somewhere ;-)

> Any more thoughts in making the kick decisions (for doing idle load
> balancing) more robust are welcome.

Ha, I'll certainly share them if I have any.. I'd still love to split
this up somehow, but I've yet to figure out how to do so without
observing the whole machine.

Thread overview: 22+ messages
2011-11-18 23:03 [patch 0/6] sched, nohz: load balancing patches Suresh Siddha
2011-11-18 23:03 ` [patch 1/6] sched, nohz: introduce nohz_flags in the struct rq Suresh Siddha
2011-11-24 10:24 ` Peter Zijlstra
2011-11-28 23:59 ` Suresh Siddha
2011-11-29 9:47 ` Peter Zijlstra
2011-11-18 23:03 ` [patch 2/6] sched, nohz: track nr_busy_cpus in the sched_group_power Suresh Siddha
2011-11-18 23:03 ` [patch 3/6] sched, nohz: sched group, domain aware nohz idle load balancing Suresh Siddha
2011-11-24 11:47 ` Peter Zijlstra
2011-11-28 23:51 ` Suresh Siddha
2011-11-29 9:44 ` Peter Zijlstra [this message]
2011-12-01 1:03 ` Suresh Siddha
2011-12-01 1:17 ` Suresh Siddha
2011-12-01 8:36 ` Peter Zijlstra
2011-11-24 11:53 ` Peter Zijlstra
2011-11-28 23:58 ` Suresh Siddha
2011-11-29 9:45 ` Peter Zijlstra
2011-11-18 23:03 ` [patch 4/6] sched, nohz: cleanup the find_new_ilb() using sched groups nr_busy_cpus Suresh Siddha
2011-11-18 23:03 ` [patch 5/6] sched: disable sched feature TTWU_QUEUE by default Suresh Siddha
2011-11-19 4:30 ` Mike Galbraith
2011-11-19 4:41 ` Mike Galbraith
2011-11-18 23:03 ` [patch 6/6] sched: fix the sched group node allocation for SD_OVERLAP domain Suresh Siddha
2011-12-06 9:51 ` [tip:sched/core] sched: Fix the sched group node allocation for SD_OVERLAP domains tip-bot for Suresh Siddha