public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Nick Piggin <piggin@cyberone.com.au>
To: Rick Lindsley <ricklind@us.ibm.com>, akpm@osdl.org
Cc: linux-kernel@vger.kernel.org, mjbligh@us.ibm.com, dvhltc@us.ibm.com
Subject: Re: [PATCH] Load balancing problem in 2.6.2-mm1
Date: Sat, 07 Feb 2004 10:08:10 +1100	[thread overview]
Message-ID: <40241E5A.6080105@cyberone.com.au> (raw)
In-Reply-To: <200402062249.i16Mn0013884@owlet.beaverton.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 445 bytes --]

Rick Lindsley wrote:

>    No you definitely still need the test... this is what I mean:
>    
>           if (avg_load > this_load)
>                    *imbalance = min(max_load - avg_load, avg_load - this_load);
>            else
>                    *imbalance = 0;
>
>Ah, yes, ok. I thought you were saying it would be zero because of code
>elsewhere.  Yup, that'll work.
>
>Rick
>
>

OK, I moved the rescaling too. Andrew, please apply.



[-- Attachment #2: sched-fix.patch --]
[-- Type: text/plain, Size: 2848 bytes --]


From: Rick Lindsley <ricklind@us.ibm.com>

In find_busiest_group(), after we exit the do/while, we select our
imbalance.  But max_load, avg_load, and this_load are all unsigned,
so min(x,y) will make a bad choice if max_load < avg_load < this_load
(that is, a choice between two negative [very large] numbers).

Unfortunately, there is a bug when max_load never gets changed from zero
(look in the loop and think what happens if the only load on the machine
is being created by cpu groups of which we are a member). And you have
a recipe for some really bogus values for imbalance.

Even if you fix the max_load == 0 bug, there will still be times when
avg_load - this_load will be negative (thus very large) and you'll make
the decision to move stuff when you shouldn't have.

This patch allows for this_load to set max_load, which if I understand
the logic properly is correct. With this patch applied, the algorithm is
*much* more conservative ... maybe *too* conservative but that's for 
another round of testing ...



 linux-2.6-npiggin/kernel/sched.c |   24 +++++++++++++++---------
 1 files changed, 15 insertions(+), 9 deletions(-)

diff -puN kernel/sched.c~sched-fix kernel/sched.c
--- linux-2.6/kernel/sched.c~sched-fix	2004-02-06 20:41:27.000000000 +1100
+++ linux-2.6-npiggin/kernel/sched.c	2004-02-07 09:59:48.000000000 +1100
@@ -1407,14 +1407,13 @@ find_busiest_group(struct sched_domain *
 		total_nr_cpus += nr_cpus;
 		avg_load /= nr_cpus;
 
+		if (avg_load > max_load)
+			max_load = avg_load;
+
 		if (local_group) {
 			this_load = avg_load;
-			goto nextgroup;
-		}
-
-		if (avg_load >= max_load) {
+		} else if (avg_load >= max_load) {
 			busiest = group;
-			max_load = avg_load;
 			busiest_nr_cpus = nr_cpus;
 		}
 nextgroup:
@@ -1437,11 +1436,18 @@ nextgroup:
 	 * reduce the max loaded cpu below the average load, as either of these
 	 * actions would just result in more rebalancing later, and ping-pong
 	 * tasks around. Thus we look for the minimum possible imbalance.
+	 * Negative imbalances (*we* are more loaded than anyone else) will
+	 * be counted as no imbalance for these purposes -- we can't fix that
+	 * by pulling tasks to us.  Be careful of negative numbers as they'll
+	 * appear as very large values with unsigned longs.
 	 */
-	*imbalance = min(max_load - avg_load, avg_load - this_load);
-
-	/* Get rid of the scaling factor now, rounding *up* as we divide */
-	*imbalance = (*imbalance + SCHED_LOAD_SCALE - 1) >> SCHED_LOAD_SHIFT;
+	if (avg_load >= this_load) {
+		*imbalance = min(max_load - avg_load, avg_load - this_load);
+		/* Get rid of the scaling factor, rounding *up* as we divide */
+		*imbalance = (*imbalance + SCHED_LOAD_SCALE - 1)
+						>> SCHED_LOAD_SHIFT;
+	} else
+		*imbalance = 0;
 
 	if (*imbalance == 0) {
 		if (package_idle != NOT_IDLE && domain->flags & SD_FLAG_IDLE

_

  reply	other threads:[~2004-02-06 23:10 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-02-06  9:24 [PATCH] Load balancing problem in 2.6.2-mm1 Rick Lindsley
2004-02-06  9:38 ` Nick Piggin
2004-02-06 18:13   ` Rick Lindsley
2004-02-06 21:57     ` Nick Piggin
2004-02-06 22:30       ` Rick Lindsley
2004-02-06 22:40         ` Nick Piggin
2004-02-06 22:49           ` Rick Lindsley
2004-02-06 23:08             ` Nick Piggin [this message]
2004-02-06 10:30 ` Anton Blanchard
2004-02-06 18:15   ` Rick Lindsley
2004-02-06 18:39     ` Martin J. Bligh
2004-02-06 22:02       ` Nick Piggin
2004-02-06 22:34         ` Rick Lindsley
2004-02-06 22:48           ` Nick Piggin
2004-02-06 22:42         ` Martin J. Bligh
2004-02-06 22:53           ` Nick Piggin
2004-02-06 23:11           ` Rick Lindsley
2004-02-06 23:20             ` Nick Piggin
2004-02-06 23:33               ` Martin J. Bligh
2004-02-06 23:41                 ` Nick Piggin
2004-02-06 23:47                   ` Martin J. Bligh
2004-02-07  0:11                     ` Nick Piggin
2004-02-07  0:25                       ` Martin J. Bligh
2004-02-07  0:31                         ` Nick Piggin
2004-02-07  9:50                           ` Anton Blanchard
2004-02-08  0:40                             ` Rick Lindsley
2004-02-08  1:12                               ` Anton Blanchard
2004-02-08  1:21                                 ` Nick Piggin
2004-02-08  1:41                                   ` Anton Blanchard
2004-02-08  3:20                                     ` Nick Piggin
2004-02-08  3:57                                       ` Anton Blanchard
2004-02-08  4:05                                         ` Nick Piggin
2004-02-08 12:14                                           ` Anton Blanchard
2004-02-08  1:22                                 ` Anton Blanchard
2004-02-09 16:37                       ` Timothy Miller
2004-02-09 16:43                         ` Martin J. Bligh
2004-02-06 18:33   ` Martin J. Bligh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=40241E5A.6080105@cyberone.com.au \
    --to=piggin@cyberone.com.au \
    --cc=akpm@osdl.org \
    --cc=dvhltc@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mjbligh@us.ibm.com \
    --cc=ricklind@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox