public inbox for linux-kernel@vger.kernel.org
From: Nick Piggin <npiggin@suse.de>
To: Mike Galbraith <efault@gmx.de>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: newidle balancing in NUMA domain?
Date: Mon, 23 Nov 2009 16:11:52 +0100
Message-ID: <20091123151152.GA19175@wotan.suse.de>
In-Reply-To: <1258987059.6193.73.camel@marge.simson.net>

On Mon, Nov 23, 2009 at 03:37:39PM +0100, Mike Galbraith wrote:
> On Mon, 2009-11-23 at 12:22 +0100, Nick Piggin wrote:
> > Hi,
> > 
> > I wonder why it was decided to do newidle balancing in the NUMA
> > domain? And with newidle_idx == 0 at that.
> > 
> > This means that every time the CPU goes idle, every CPU in the
> > system gets a remote cacheline or two hit. Not very nice O(n^2)
> > behaviour on the interconnect. Not to mention trashing our
> > NUMA locality.
> 
> Painful on little boxen too if left unchained.

Yep. It's an order of magnitude more expensive to go on the
interconnect rather than stay in LLC. So even on little systems,
new idle balancing can become an order of magnitude more
expensive.

On slightly larger systems, where you have an order of magnitude
more cores on remote nodes than local, new idle balancing can now
be two orders of magnitude more expensive.
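
The scaling argument above can be put into a back-of-envelope sketch.
The unit costs (1 for touching a cacheline that stays in the LLC, 10 for
one that crosses the interconnect) and the CPU counts are assumptions
chosen only to match the "order of magnitude" wording, not measurements,
and the function names are hypothetical.

```python
def newidle_cost(local_cpus, remote_cpus, local_cost=1, remote_cost=10):
    """Cost of one newidle pass that touches a runqueue cacheline per CPU.

    local_cost and remote_cost are assumed unit costs, not measurements.
    """
    return local_cpus * local_cost + remote_cpus * remote_cost


def slowdown(local_cpus, remote_cpus):
    """Ratio of a NUMA-wide pass to a pass confined to the local LLC."""
    return newidle_cost(local_cpus, remote_cpus) / newidle_cost(local_cpus, 0)


if __name__ == "__main__":
    print(slowdown(4, 4))    # as many remote CPUs as local ones
    print(slowdown(4, 40))   # ten times more remote CPUs than local ones
```

Under these assumptions the first case comes out near one order of
magnitude and the second near two, which is the shape of the claim; the
real ratio depends on the machine.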


> > And then I see some proposal to do ratelimiting of newidle
> > balancing :( Seems like hack upon hack making behaviour much more
> > complex.
> 
> That's mine, and yeah, it is hackish.  It just keeps newidle at bay for
> high speed switchers while keeping it available to kick start CPUs for
> fork/exec loads.  Suggestions welcome.  I have a threaded testcase
> (x264) where turning the thing off costs ~40% throughput.  Take that
> same testcase (or ilk) to a big NUMA beast, and performance will very
> likely suck just as bad as it does on my little Q6600 box.
> 
> Other than that, I'd be most happy to see the thing crawl back in its
> cave and _die_ despite the little gain it provides for a kbuild.  It has
> been (is) very annoying.

Wait, you say it was activated to improve fork/exec CPU utilization?
For the x264 load? What do you mean by this? Do you mean it is doing
a lot of fork/exec/exits and load is not being spread quickly enough?
Or that NUMA allocations get screwed up because tasks don't get spread
out quickly enough before running?

In either case, I think newidle balancing is maybe not the right solution.
newidle balancing only checks the system state when the destination
CPU goes idle. fork events increase load at the source CPU. So for
example if you find newidle helps to pick up forks, then if the
newidle event happens to come in before the fork, we'll have to wait
for the next rebalance event.
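
The ordering problem described above can be illustrated with a toy
timeline. This is a minimal sketch, assuming a forked task can only be
pulled by a balance event on the destination CPU that happens at or
after the fork; the function name and the timestamps are hypothetical
and nothing here models the real scheduler.

```python
from bisect import bisect_left


def pickup_delay(fork_time, balance_times):
    """Time a forked task waits until the destination CPU next balances.

    balance_times holds the moments at which the destination CPU runs a
    balance pass (newidle or periodic). Returns None if no pass happens
    at or after the fork.
    """
    times = sorted(balance_times)
    i = bisect_left(times, fork_time)  # first pass at or after the fork
    if i == len(times):
        return None
    return times[i] - fork_time


if __name__ == "__main__":
    # Destination CPU goes newidle at t=5; the next periodic pass is at t=100.
    print(pickup_delay(4, [5, 100]))  # fork just before the newidle event
    print(pickup_delay(6, [5, 100]))  # fork just after it
```

A fork that lands just after the newidle event gets no benefit from it
and waits for the next pass, which is why acting at the fork itself may
be the more reliable place to spread load.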

So possibly making fork/exec balancing more aggressive might be a
better approach. This can be done by reducing the damping idx, or
perhaps some other conditions to reduce e.g. imbalance_pct or something
for forkexec balancing. Probably needs some studying of the workload
to work out why forkexec is failing.
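
To show what lowering imbalance_pct would change, here is a simplified
sketch of a percentage threshold test of the kind used in domain
balancing. It assumes the rule "treat the domain as balanced unless the
busiest load exceeds the local load by more than imbalance_pct percent";
the function name and the load values are hypothetical, and this is not
the kernel's actual fork/exec code path.

```python
def is_imbalanced(busiest_load, this_load, imbalance_pct):
    """Return True when busiest_load exceeds this_load by the threshold.

    imbalance_pct is a percentage: 125 means the busiest group must carry
    more than 125% of the local load before balancing is considered.
    """
    return 100 * busiest_load > imbalance_pct * this_load


if __name__ == "__main__":
    # Same loads, two thresholds: a lower imbalance_pct triggers sooner.
    print(is_imbalanced(1200, 1024, 125))
    print(is_imbalanced(1200, 1024, 110))
```

With these example loads the 125 threshold reports balanced and the 110
threshold reports imbalanced, so a smaller value makes the balancer act
on smaller differences.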


> > One "symptom" of bad mutex contention can be that increasing the
> > balancing rate can help a bit to reduce idle time (because it
> > can get the woken thread which is holding a semaphore to run ASAP
> > after we run out of runnable tasks in the system due to them 
> > hitting contention on that semaphore).
> 
> Yes, when mysql+oltp starts jamming up, load balancing helps bust up the
> logjam somewhat, but that's not at all why newidle was activated..

OK good to know.

 
> > I really hope this change wasn't done in order to help -rt or
> > something sad like sysbench on MySQL.
> 
> Newidle was activated to improve fork/exec CPU utilization.  A nasty
> side effect is that it tries to rip other loads to tatters.
> 
> > And btw, I'll stay out of mentioning anything about CFS development,
> > but it really sucks to be continually making significant changes to
> > domains balancing *and* per-runqueue scheduling at the same time :(
> > It makes it even difficult to bisect things.
> 
> Yeah, balancing got jumbled up with desktop tweakage.  Much fallout this
> round, and some things still to be fixed back up.

OK. This would be great if fixing up involves making things closer
to what they were rather than adding more complex behaviour on top
of other changes that broke stuff. And doing it in 2.6.32 would be
kind of nice...

