From: Andrew Theurer <habanero@us.ibm.com>
To: linux-kernel@vger.kernel.org
Cc: Andrew Theurer <habanero@us.ibm.com>,
	nickpiggin@yahoo.com.au, anton@samba.org, tbrian@us.ibm.com
Subject: Re: Database regression due to scheduler changes ?
Date: Tue, 08 Nov 2005 20:14:01 -0600
Message-ID: <43715B69.5040609@us.ibm.com>
In-Reply-To: <43715361.3070802@us.ibm.com>

Nick wrote:

>> I would also take a look at removing SD_WAKE_IDLE from the flags.
>> This flag should make balancing more aggressive, but it can have
>> problems when applied to a NUMA domain due to too much task
>> movement.
>
> Anton wrote:
> I was wondering how ppc64 ended up with different parameters in the NODE
> definitions (added SD_BALANCE_NEWIDLE and SD_WAKE_IDLE) and it looks
> like it was Andrew :)
>
> http://lkml.org/lkml/2004/11/2/205

FWIW I changed all arches, but most (except ppc) were changed back.  At
the time we had data showing that the more aggressive wake-idle and
newidle balancing was good for workloads like OLTP.
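
As a rough illustration of what "removing the flags" means, here is a
minimal sketch of a sched-domain flag word with the two aggressive idle
flags cleared. The flag names SD_BALANCE_NEWIDLE and SD_WAKE_IDLE come
from this thread and SD_LOAD_BALANCE is another kernel flag name; the bit
values and the helper drop_aggressive_idle() are assumptions made up for
this example and are not copied from any kernel header.

```c
/* Illustrative bit values only; NOT taken from a kernel header. */
enum sd_flag_sketch {
	SD_LOAD_BALANCE    = 1 << 0,	/* periodic balancing enabled */
	SD_BALANCE_NEWIDLE = 1 << 1,	/* balance when a CPU is about to go idle */
	SD_WAKE_IDLE       = 1 << 2,	/* on wakeup, prefer an idle CPU */
};

/* Hypothetical helper: clear the two aggressive idle flags, keep the rest. */
static unsigned int drop_aggressive_idle(unsigned int flags)
{
	return flags & ~(unsigned int)(SD_BALANCE_NEWIDLE | SD_WAKE_IDLE);
}
```

With these example values, a NODE-level flag word of
SD_LOAD_BALANCE | SD_BALANCE_NEWIDLE | SD_WAKE_IDLE would reduce to
SD_LOAD_BALANCE alone.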

Brian, do you have cpu util numbers and runqueue lengths for both tests?

>
> It looks like balancing was not aggressive enough on his workload too.
> I'm a bit uneasy with only ppc64 having the two flags though.

Brian wrote:

> We suspect the regression was introduced in the scheduler changes
> that went into 2.6.13-rc1.  However, the regression was hidden
> from us by a bug in include/asm-ppc64/topology.h that made ppc64
> look non-NUMA from 2.6.13-rc1 through 2.6.13-rc4.  That bug was
> fixed in 2.6.13-rc5.  Unfortunately the workload does not run to
> completion on 2.6.12 or 2.6.13-rc1.

Brian, I am not sure if you were thinking of a particular set of sched 
changes, but I suspect it might be one or more in the list below (my 
guess is the first and last).  Would it be possible to back out these 
change-sets from 2.6.13-rc5 and see if there is any difference?  FWIW, 
even if they do help, I am not suggesting, yet, that they should be 
reverted.  I am hoping there is some compromise that can work better in 
all situations.

-Andrew

commit cafb20c1f9976a70d633bb1e1c8c24eab00e4e80
Author: Nick Piggin <nickpiggin@yahoo.com.au>
Date:   Sat Jun 25 14:57:17 2005 -0700

    [PATCH] sched: no aggressive idle balancing
    
    Remove the very aggressive idle stuff that has recently gone into 2.6 - it is
    going against the direction we are trying to go.  Hopefully we can regain
    performance through other methods.
    
    Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit a3f21bce1fefdf92a4d1705e888d390b10f3ac6f
Author: Nick Piggin <nickpiggin@yahoo.com.au>
Date:   Sat Jun 25 14:57:15 2005 -0700

    [PATCH] sched: tweak affine wakeups
    
    Do less affine wakeups.  We're trying to reduce dbt2-pgsql idle time
    regressions here...  make sure we don't move tasks the wrong way in an
    imbalance condition.  Also, remove the cache coldness requirement from the
    calculation - this seems to induce sharp cutoff points where behaviour will
    suddenly change on some workloads if the load creeps slightly over or under
    some point.  It is good for periodic balancing because in that case we
    otherwise have no other context to determine what task to move.
    
    But also make a minor tweak to "wake balancing" - the imbalance tolerance is
    now set at half the domain's imbalance, so we get the opportunity to do wake
    balancing before the more random periodic rebalancing gets performed.
    
    Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
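
One way to read "half the domain's imbalance" is sketched below, assuming
the domain's tolerance is expressed as a percentage of at least 100 (for
example 125 meaning 25% slack). The function names, parameter names, and
the exact comparison are assumptions for illustration and are not claimed
to match the patch.

```c
/*
 * Sketch: halve the part of the domain's imbalance percentage that
 * exceeds 100. Assumes imbalance_pct >= 100.
 */
static unsigned int wake_imbalance_pct(unsigned int imbalance_pct)
{
	return 100 + (imbalance_pct - 100) / 2;
}

/*
 * Sketch: return 1 when this CPU's load, scaled by the halved tolerance,
 * does not exceed the other CPU's load, i.e. pulling the woken task here
 * would not move it the wrong way.
 */
static int wake_balance_allows(unsigned long this_load,
			       unsigned long other_load,
			       unsigned int imbalance_pct)
{
	return wake_imbalance_pct(imbalance_pct) * this_load
		<= 100UL * other_load;
}
```

With an assumed domain value of 125, the wake-time tolerance becomes 112,
so wake balancing triggers at a smaller load difference than periodic
rebalancing would need.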

commit 7897986bad8f6cd50d6149345aca7f6480f49464
Author: Nick Piggin <nickpiggin@yahoo.com.au>
Date:   Sat Jun 25 14:57:13 2005 -0700

    [PATCH] sched: balance timers
    
    Do CPU load averaging over a number of different intervals.  Allow each
    interval to be chosen by sending a parameter to source_load and target_load.
    0 is instantaneous, idx > 0 returns a decaying average with the most recent
    sample weighted at 2^(idx-1).  To a maximum of 3 (could be easily increased).
    
    So generally a higher number will result in more conservative balancing.
    
    Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
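
The averaging described above can be sketched roughly as follows. This is
a minimal illustration, not the patch itself: struct rq_sketch,
update_cpu_load_sketch() and load_for_idx() are hypothetical names, and
the round-up on rising load is an assumption. It keeps three decaying
averages, where slot i gives the newest sample a weight of 1/2^i, so
idx 1..3 map to slots 0..2.

```c
#define NR_LOAD_IDX 3	/* maximum idx described in the commit message */

/* Hypothetical stand-in for the per-CPU runqueue load history. */
struct rq_sketch {
	unsigned long cpu_load[NR_LOAD_IDX];
};

/* Fold one new load sample into each decaying average. */
static void update_cpu_load_sketch(struct rq_sketch *rq, unsigned long now)
{
	for (int i = 0; i < NR_LOAD_IDX; i++) {
		unsigned long scale = 1UL << i;
		unsigned long old = rq->cpu_load[i];
		unsigned long sample = now;

		/* Assumption: round the division up while load is rising. */
		if (sample > old)
			sample += scale - 1;
		rq->cpu_load[i] = (old * (scale - 1) + sample) / scale;
	}
}

/* idx 0 is the instantaneous load; idx > 0 selects a slower average. */
static unsigned long load_for_idx(const struct rq_sketch *rq,
				  unsigned long now, int idx)
{
	if (idx == 0)
		return now;
	return rq->cpu_load[idx - 1];
}
```

Starting from zero and feeding a constant load, the higher-idx slots climb
toward that load more slowly, which is why a higher number gives more
conservative balancing decisions.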

commit 99b61ccf0bf0e9a85823d39a5db6a1519caeb13d
Author: Nick Piggin <nickpiggin@yahoo.com.au>
Date:   Sat Jun 25 14:57:12 2005 -0700

    [PATCH] sched: less aggressive idle balancing
    
    Remove the special casing for idle CPU balancing.  Things like this are
    hurting for example on SMT, where a single sibling being idle doesn't really
    warrant a really aggressive pull over the NUMA domain, for example.
    
    Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
