From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Ken Chen <kenchen@google.com>
Cc: Chris Friesen <cfriesen@nortel.com>, Ingo Molnar <mingo@elte.hu>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: busted CFS group load balancer?
Date: Tue, 18 Nov 2008 13:30:43 +0100
Message-ID: <1227011444.29743.15.camel@lappy.programming.kicks-ass.net>
In-Reply-To: <b040c32a0811172333k5a89ccfy6438bf66d32dca5d@mail.gmail.com>
On Mon, 2008-11-17 at 23:33 -0800, Ken Chen wrote:
> On Mon, Nov 17, 2008 at 9:19 PM, Peter Zijlstra wrote:
> > Note that with larger cpu count and/or lower group weight we'll quickly
> > run into numerical trouble...
> >
> > I would recommend trying this with the minimum weight in the order of
> > 8-16 times number of cpus on your system.
> >
> > There is only so much one can do with 10 bit fixed precision math :/
>
> That is probably one of many problems. I also found that the
> updates to the per-cpu task_group's sched_entity load weight
> (tg->se[cpu]->load.weight) are very problematic and erratic.
>
> The total rq_weight is calculated at the beginning of tg_shares_up(),
>
>     for_each_cpu_mask(i, sd->span) {
>         rq_weight += tg->cfs_rq[i]->load.weight;
>         shares += tg->cfs_rq[i]->shares;
>     }
>
> However, the scaling of the per-cpu se->load.weight in
> __update_group_shares_cpu() does another lookup of
> tg->cfs_rq[cpu]->load.weight at a different time. The value of
> cfs_rq[cpu].load.weight isn't always consistent across these two
> times. Due to this inconsistency in the values read from the per-cpu
> cfs_rq, I've seen tg->se[cpu]->load.weight jumping all over the place.
> In our environment, the cpu loads are very dynamic: processes are
> queued and dequeued at a high rate.
Ok, if your load values are unstable on the order of the
load-balance interval then you're hosed too; the same is true for the
normal smp load-balancer.

The cgroup load-balancer makes that even more problematic.

Again, there's just very little you can do about that, except increase
the coupling between cpus and thereby increase the overhead. Try
decreasing sysctl_sched_shares_ratelimit.
> I'm also very troubled with this calculation in __update_group_shares_cpu():
>
> shares = (sd_shares * rq_weight) / (sd_rq_weight + 1);
>
> Won't you have a rounding problem here? The value 'shares' will
> gradually decrease with each iteration of __update_group_shares_cpu()?
Yes it will, however at the top of the sched-domain tree it's reset:

    if (!sd->parent || !(sd->parent->flags & SD_LOAD_BALANCE))
        shares = tg->shares;