All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nick Piggin <piggin@cyberone.com.au>
To: Rick Lindsley <ricklind@us.ibm.com>
Cc: akpm@osdl.org, linux-kernel@vger.kernel.org, mjbligh@us.ibm.com,
	dvhltc@us.ibm.com
Subject: Re: [PATCH] Load balancing problem in 2.6.2-mm1
Date: Fri, 06 Feb 2004 20:38:25 +1100	[thread overview]
Message-ID: <40236091.40807@cyberone.com.au> (raw)
In-Reply-To: <200402060924.i169OWx30517@owlet.beaverton.ibm.com>



Rick Lindsley wrote:

>Nick, Andrew --
>
>Found a problem in Nick's code which had it way overbalancing much of
>the time.  I've included a patch below.
>
>I had been porting my schedstats patch to the -mm tree and noticed some
>huge imbalances (on the order of 33 million) being "corrected" and at
>first I thought it was a mis-port. But it was right.  We really were
>deciding 33 million processes had to move.  Of course, we never found
>that many, but we still moved as many as can_migrate_task() would allow.
>
>In find_busiest_group(), after we exit the do/while, we select our
>imbalance.  But max_load, avg_load, and this_load are all unsigned,
>so min(x,y) will make a bad choice if max_load < avg_load < this_load
>(that is, a choice between two negative [very large] numbers).
>
>Unfortunately, there is a bug when max_load never gets changed from zero
>(look in the loop and think what happens if the only load on the machine
>is being created by cpu groups of which we are a member). And you have
>a recipe for some really bogus values for imbalance.
>
>Even if you fix the max_load == 0 bug, there will still be times when
>avg_load - this_load will be negative (thus very large) and you'll make
>the decision to move stuff when you shouldn't have.
>
>This patch allows for this_load to set max_load, which if I understand
>the logic properly is correct.  It then adds a check to imbalance to make
>sure a negative number hasn't been coerced into a large positive number.
>With this patch applied, the algorithm is *much* more conservative ...
>maybe *too* conservative but that's for another round of testing ...
>
>Rick
>
>diff -rup linux-2.6.2-mm1/kernel/sched.c linux-2.6.2-mm1-fix/kernel/sched.c
>--- linux-2.6.2-mm1/kernel/sched.c	Thu Feb  5 14:47:17 2004
>+++ linux-2.6.2-mm1-fix/kernel/sched.c	Thu Feb  5 21:44:04 2004
>@@ -1352,7 +1624,7 @@ static struct sched_group *
> find_busiest_group(struct sched_domain *domain, int this_cpu,
> 				unsigned long *imbalance, enum idle_type idle)
> {
>-	unsigned long max_load, avg_load, total_load, this_load;
>+	unsigned long max_load, avg_load, total_load, this_load, load_diff;
> 	int modify, total_nr_cpus, busiest_nr_cpus = 0;
> 	enum idle_type package_idle = IDLE;
> 	struct sched_group *busiest = NULL, *group = domain->groups;
>@@ -1407,14 +1679,13 @@ find_busiest_group(struct sched_domain *
> 		total_nr_cpus += nr_cpus;
> 		avg_load /= nr_cpus;
> 
>+		if (avg_load > max_load)
>+			max_load = avg_load;
>+
> 		if (local_group) {
> 			this_load = avg_load;
>-			goto nextgroup;
>-		}
>-
>-		if (avg_load >= max_load) {
>+		} else if (avg_load >= max_load) {
> 			busiest = group;
>-			max_load = avg_load;
> 			busiest_nr_cpus = nr_cpus;
> 		}
> nextgroup:
>@@ -1437,8 +1708,19 @@ nextgroup:
> 	 * reduce the max loaded cpu below the average load, as either of these
> 	 * actions would just result in more rebalancing later, and ping-pong
> 	 * tasks around. Thus we look for the minimum possible imbalance.
>+	 * Negative imbalances (*we* are more loaded than anyone else) will
>+	 * be counted as no imbalance for these purposes -- we can't fix that
>+	 * by pulling tasks to us.  Be careful of negative numbers as they'll
>+	 * appear as very large values with unsigned longs.
> 	 */
>-	*imbalance = min(max_load - avg_load, avg_load - this_load);
>+	if (avg_load > this_load)
>+		load_diff = avg_load - this_load;
>+	else
>+		load_diff = 0;
>+
>+	*imbalance = min(max_load - avg_load, load_diff);
>+	if ((long)*imbalance < 0)
>+		*imbalance = 0;
> 
>

You're right Rick, thanks for catching this. Why do you have
this last test though? This shouldn't be possible?


  reply	other threads:[~2004-02-06  9:58 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-02-06  9:24 [PATCH] Load balancing problem in 2.6.2-mm1 Rick Lindsley
2004-02-06  9:38 ` Nick Piggin [this message]
2004-02-06 18:13   ` Rick Lindsley
2004-02-06 21:57     ` Nick Piggin
2004-02-06 22:30       ` Rick Lindsley
2004-02-06 22:40         ` Nick Piggin
2004-02-06 22:49           ` Rick Lindsley
2004-02-06 23:08             ` Nick Piggin
2004-02-06 10:30 ` Anton Blanchard
2004-02-06 18:15   ` Rick Lindsley
2004-02-06 18:39     ` Martin J. Bligh
2004-02-06 22:02       ` Nick Piggin
2004-02-06 22:34         ` Rick Lindsley
2004-02-06 22:48           ` Nick Piggin
2004-02-06 22:42         ` Martin J. Bligh
2004-02-06 22:53           ` Nick Piggin
2004-02-06 23:11           ` Rick Lindsley
2004-02-06 23:20             ` Nick Piggin
2004-02-06 23:33               ` Martin J. Bligh
2004-02-06 23:41                 ` Nick Piggin
2004-02-06 23:47                   ` Martin J. Bligh
2004-02-07  0:11                     ` Nick Piggin
2004-02-07  0:25                       ` Martin J. Bligh
2004-02-07  0:31                         ` Nick Piggin
2004-02-07  9:50                           ` Anton Blanchard
2004-02-08  0:40                             ` Rick Lindsley
2004-02-08  1:12                               ` Anton Blanchard
2004-02-08  1:21                                 ` Nick Piggin
2004-02-08  1:41                                   ` Anton Blanchard
2004-02-08  3:20                                     ` Nick Piggin
2004-02-08  3:57                                       ` Anton Blanchard
2004-02-08  4:05                                         ` Nick Piggin
2004-02-08 12:14                                           ` Anton Blanchard
2004-02-08  1:22                                 ` Anton Blanchard
2004-02-09 16:37                       ` Timothy Miller
2004-02-09 16:43                         ` Martin J. Bligh
2004-02-06 18:33   ` Martin J. Bligh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=40236091.40807@cyberone.com.au \
    --to=piggin@cyberone.com.au \
    --cc=akpm@osdl.org \
    --cc=dvhltc@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mjbligh@us.ibm.com \
    --cc=ricklind@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.