From: Peter Zijlstra <peterz@infradead.org>
To: Michael Neuling <mikey@neuling.org>
Cc: Ingo Molnar <mingo@elte.hu>,
linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
ego@in.ibm.com
Subject: Re: [PATCHv4 2/2] powerpc: implement arch_scale_smt_power for Power7
Date: Tue, 23 Feb 2010 17:24:41 +0100
Message-ID: <1266942281.11845.521.camel@laptop>
In-Reply-To: <23662.1266905307@neuling.org>
On Tue, 2010-02-23 at 17:08 +1100, Michael Neuling wrote:
> I have some comments on the code inline but...
>
> So when I run this, I don't get processes pulled down to the lower
> threads. A simple test case: running 1 CPU-intensive process at
> SCHED_OTHER on a machine with a 2-way SMT system (a POWER6, but with
> SD_ASYM_PACKING enabled). The single process doesn't move to the lower
> threads as I'd hoped.
>
> Also, are you sure you want to put this in generic code? It seems to be
> quite POWER7-specific functionality, so it would logically be better in
> arch/powerpc. I guess some other arch *might* need it, but it seems
> unlikely.
Well, there are no arch hooks in the load-balancing (aside from the
recent cpu_power stuff, and that really is the wrong thing to poke at
for this), and I did hear some other people express interest in such a
constraint.
Also, load-balancing is complex enough as it is, so I prefer to keep
everything in the generic code where possible. Clearly things like
sched_domain creation need arch topology bits, and the arch_scale*
things require other arch information like cpu frequency.
> > @@ -2493,6 +2494,28 @@ static inline void update_sg_lb_stats(st
> > DIV_ROUND_CLOSEST(group->cpu_power, SCHED_LOAD_SCALE);
> > }
> >
> > +static int update_sd_pick_busiest(struct sched_domain *sd,
> > + struct sd_lb_stats *sds,
> > + struct sched_group *sg,
> > + struct sg_lb_stats *sgs)
> > +{
> > + if (sgs->sum_nr_running > sgs->group_capacity)
> > + return 1;
> > +
> > + if (sgs->group_imb)
> > + return 1;
> > +
> > + if ((sd->flags & SD_ASYM_PACKING) && sgs->sum_nr_running) {
>
> If we are asymmetric packing...
>
>
> > + if (!sds->busiest)
> > + return 1;
>
> This just seems to be a null pointer check.
>
> From the tracing I've done, this is always true (always NULL) at this
> point so we return here.
Right, so we need to have a busiest group to take a task from, if there
is no busiest yet, take this group.
And in your scenario, with there being only a single task, we'd only hit
this once at most, so yes it makes sense this is always NULL.
> > +
> > + if (group_first_cpu(sds->busiest) < group_first_cpu(sg))
> > + return 1;
>
> I'm a bit lost as to what this is for. Any clues you could provide
> would be appreciated. :-)
>
> Is the first cpu in this domain's busiest group before the first cpu in
> this group? If so, pick this as the busiest?
>
> Should this be the other way around if we want to pack the busiest to
> the first cpu? Mark it as the busiest if it's after (not before).
>
> Is group_first_cpu guaranteed to give us the first physical cpu (ie.
> thread 0 in our case) or are these virtualised at this point?
>
> I'm not seeing this hit anyway due to the null pointer check above.
So this says: all things being equal, if we already have a busiest, but
this candidate (sg) is higher than the current (busiest), take this
one.
The idea is to move the highest SMT task down.
> > @@ -2562,6 +2585,38 @@ static inline void update_sd_lb_stats(st
> > } while (group != sd->groups);
> > }
> >
> > +int __weak sd_asym_packing_arch(void)
> > +{
> > + return 0;
> > +}
arch_sd_asym_packing() is what you used in topology.h
> > +static int check_asym_packing(struct sched_domain *sd,
> > + struct sd_lb_stats *sds,
> > + unsigned long *imbalance)
> > +{
> > + int i, cpu, busiest_cpu;
> > +
> > + if (!(sd->flags & SD_ASYM_PACKING))
> > + return 0;
> > +
> > + if (!sds->busiest)
> > + return 0;
> > +
> > + i = 0;
> > + busiest_cpu = group_first_cpu(sds->busiest);
> > + for_each_cpu(cpu, sched_domain_span(sd)) {
> > + i++;
> > + if (cpu == busiest_cpu)
> > + break;
> > + }
> > +
> > + if (sds->total_nr_running > i)
> > + return 0;
>
> This seems to be the core of the packing logic.
>
> We make sure the busiest_cpu is not past total_nr_running. If it is we
> mark as imbalanced. Correct?
>
> It seems if a non zero thread/group had a pile of processes running on
> it and a lower thread had much less, this wouldn't fire, but I'm
> guessing normal load balancing would kick in that case to fix the
> imbalance.
>
> Any corrections to my ramblings appreciated :-)
Right, so we're concerned with the scenario where there are fewer tasks
than SMT siblings; if there are more, they should all be running and the
regular load-balancer will deal with it.
If there are fewer, the group will normally be balanced, and we fall out
and end up in check_asym_packing().
So what I tried doing with that loop is detect if there's a hole in the
packing before busiest. Now that I think about it, what we need to check
is if this_cpu (the removed cpu argument) is idle and less than busiest.
So something like:
static int check_asym_packing(struct sched_domain *sd,
			      struct sd_lb_stats *sds,
			      int this_cpu, unsigned long *imbalance)
{
	int busiest_cpu;

	if (!(sd->flags & SD_ASYM_PACKING))
		return 0;

	if (!sds->busiest)
		return 0;

	busiest_cpu = group_first_cpu(sds->busiest);
	if (cpu_rq(this_cpu)->nr_running || this_cpu > busiest_cpu)
		return 0;

	*imbalance = (sds->max_load * sds->busiest->cpu_power) /
			SCHED_LOAD_SCALE;
	return 1;
}
Does that make sense?
I still see two problems with this, though: regular load-balancing only
balances on the first cpu of a domain (see the *balance = 0 condition
in update_sg_lb_stats()), which means that if SMT[12] are idle we'll
not pull properly. Also, nohz balancing might mess this up further.
We could maybe play some games with the balance decision in
update_sg_lb_stats() for SD_ASYM_PACKING domains and idle == CPU_IDLE;
no ideas yet on nohz though.