Re: busted CFS group load balancer?

From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Ken Chen <kenchen@google.com>
Cc: Chris Friesen <cfriesen@nortel.com>, Ingo Molnar <mingo@elte.hu>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: busted CFS group load balancer?
Date: Tue, 18 Nov 2008 14:48:53 +0100
Message-ID: <1227016133.29743.17.camel@lappy.programming.kicks-ass.net>
In-Reply-To: <1227011444.29743.15.camel@lappy.programming.kicks-ass.net>

On Tue, 2008-11-18 at 13:30 +0100, Peter Zijlstra wrote:
> On Mon, 2008-11-17 at 23:33 -0800, Ken Chen wrote:
> > On Mon, Nov 17, 2008 at 9:19 PM, Peter Zijlstra wrote:
> > > Note that with larger cpu count and/or lower group weight we'll quickly
> > > run into numerical trouble...
> > >
> > > I would recommend trying this with the minimum weight in the order of
> > > 8-16 times number of cpus on your system.
> > >
> > > There is only so much one can do with 10 bit fixed precision math :/
> > 
> > That is probably one of the many problems.  I also found that the
> > updates to the per-cpu task_group's sched_entity load weight
> > (tg->se[cpu]->load.weight) are very problematic and very erratic.
> > 
> > The total rq_weight is calculated at the beginning of tg_shares_up(),
> > 
> >         for_each_cpu_mask(i, sd->span) {
> >                 rq_weight += tg->cfs_rq[i]->load.weight;
> >                 shares += tg->cfs_rq[i]->shares;
> >         }
> > 
> > However, the scaling of the per-cpu se->load.weight in
> > __update_group_shares_cpu() takes another lookup of
> > tg->cfs_rq[cpu]->load.weight at a different time.  The
> > cfs_rq[cpu].load.weight values aren't always consistent across these
> > two times.  Because of this inconsistency in the values read from the
> > per-cpu cfs_rq, I've seen tg->se[cpu]->load.weight jumping all over the
> > place.  In our environment, the cpu loads are very dynamic: processes
> > are queued and dequeued at a high rate.
> 
> Ok, if your load values are very unstable in the order of the
> load-balance interval then you're hosed too, the same is true for the
> normal smp load-balancer.
> 
> The cgroup load-balancer makes that even more problematic.
> 
> Again, there's just very little you can do about that, except increase
> the coupling between cpus and thereby increase the overhead. Try
> decreasing sysctl_sched_shares_ratelimit.


Also, lower sysctl_sched_shares_thresh to 1 or 0.
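
To make the two effects discussed above concrete, here is a minimal
user-space sketch; it is not the kernel code. cpu_shares() and
distribute() are hypothetical helpers invented for illustration. They
assume each cpu's portion is group_shares * rq_weight / sum_rq_weight in
integer math, with the sum taken from one snapshot of the per-cpu
weights and the numerator from a later read.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a kernel function): one cpu's portion of the
 * group shares, proportional to its runqueue weight, in integer math.
 */
static unsigned long cpu_shares(unsigned long group_shares,
                                unsigned long rq_weight,
                                unsigned long sum_rq_weight)
{
    if (sum_rq_weight == 0)
        return 0;
    return (group_shares * rq_weight) / sum_rq_weight;
}

/*
 * Hypothetical helper: sum the weights from an earlier snapshot (snap),
 * then scale each cpu using a later read (now), mimicking two lookups of
 * the same per-cpu value at different times.  Returns the total shares
 * handed out across all cpus.
 */
static unsigned long distribute(unsigned long group_shares,
                                const unsigned long *snap,
                                const unsigned long *now,
                                size_t ncpus)
{
    unsigned long sum = 0, total = 0;
    size_t i;

    assert(ncpus > 0);
    for (i = 0; i < ncpus; i++)
        sum += snap[i];
    for (i = 0; i < ncpus; i++)
        total += cpu_shares(group_shares, now[i], sum);
    return total;
}
```

With four cpus at weight 1024 and group shares of 1024, consistent reads
hand out exactly 1024. If two of the cpus have doubled to 2048 by the
second read, the total becomes 1536, so the group is over-weighted until
the next update. With a group weight of 2, each cpu gets
2 * 1024 / 4096 = 0, which is the numerical trouble with low weights.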
