From: Nick Piggin <piggin@cyberone.com.au>
To: Rick Lindsley <ricklind@us.ibm.com>
Cc: akpm@osdl.org, linux-kernel@vger.kernel.org, mjbligh@us.ibm.com,
dvhltc@us.ibm.com
Subject: Re: [PATCH] Load balancing problem in 2.6.2-mm1
Date: Fri, 06 Feb 2004 20:38:25 +1100 [thread overview]
Message-ID: <40236091.40807@cyberone.com.au> (raw)
In-Reply-To: <200402060924.i169OWx30517@owlet.beaverton.ibm.com>
Rick Lindsley wrote:
>Nick, Andrew --
>
>Found a problem in Nick's code which had it way overbalancing much of
>the time. I've included a patch below.
>
>I had been porting my schedstats patch to the -mm tree and noticed some
>huge imbalances (on the order of 33 million) being "corrected" and at
>first I thought it was a mis-port. But it was right. We really were
>deciding 33 million processes had to move. Of course, we never found
>that many, but we still moved as many as can_migrate_task() would allow.
>
>In find_busiest_group(), after we exit the do/while, we select our
>imbalance. But max_load, avg_load, and this_load are all unsigned,
>so min(x,y) will make a bad choice if max_load < avg_load < this_load
>(that is, a choice between two negative [very large] numbers).
>
>Unfortunately, there is a bug when max_load never gets changed from zero
>(look in the loop and think what happens if the only load on the machine
>is being created by cpu groups of which we are a member). And you have
>a recipe for some really bogus values for imbalance.
>
>Even if you fix the max_load == 0 bug, there will still be times when
>avg_load - this_load will be negative (thus very large) and you'll make
>the decision to move stuff when you shouldn't have.
>
>This patch allows for this_load to set max_load, which if I understand
>the logic properly is correct. It then adds a check to imbalance to make
>sure a negative number hasn't been coerced into a large positive number.
>With this patch applied, the algorithm is *much* more conservative ...
>maybe *too* conservative but that's for another round of testing ...
>
>Rick
>
>diff -rup linux-2.6.2-mm1/kernel/sched.c linux-2.6.2-mm1-fix/kernel/sched.c
>--- linux-2.6.2-mm1/kernel/sched.c Thu Feb 5 14:47:17 2004
>+++ linux-2.6.2-mm1-fix/kernel/sched.c Thu Feb 5 21:44:04 2004
>@@ -1352,7 +1624,7 @@ static struct sched_group *
> find_busiest_group(struct sched_domain *domain, int this_cpu,
> unsigned long *imbalance, enum idle_type idle)
> {
>- unsigned long max_load, avg_load, total_load, this_load;
>+ unsigned long max_load, avg_load, total_load, this_load, load_diff;
> int modify, total_nr_cpus, busiest_nr_cpus = 0;
> enum idle_type package_idle = IDLE;
> struct sched_group *busiest = NULL, *group = domain->groups;
>@@ -1407,14 +1679,13 @@ find_busiest_group(struct sched_domain *
> total_nr_cpus += nr_cpus;
> avg_load /= nr_cpus;
>
>+ if (avg_load > max_load)
>+ max_load = avg_load;
>+
> if (local_group) {
> this_load = avg_load;
>- goto nextgroup;
>- }
>-
>- if (avg_load >= max_load) {
>+ } else if (avg_load >= max_load) {
> busiest = group;
>- max_load = avg_load;
> busiest_nr_cpus = nr_cpus;
> }
> nextgroup:
>@@ -1437,8 +1708,19 @@ nextgroup:
> * reduce the max loaded cpu below the average load, as either of these
> * actions would just result in more rebalancing later, and ping-pong
> * tasks around. Thus we look for the minimum possible imbalance.
>+ * Negative imbalances (*we* are more loaded than anyone else) will
>+ * be counted as no imbalance for these purposes -- we can't fix that
>+ * by pulling tasks to us. Be careful of negative numbers as they'll
>+ * appear as very large values with unsigned longs.
> */
>- *imbalance = min(max_load - avg_load, avg_load - this_load);
>+ if (avg_load > this_load)
>+ load_diff = avg_load - this_load;
>+ else
>+ load_diff = 0;
>+
>+ *imbalance = min(max_load - avg_load, load_diff);
>+ if ((long)*imbalance < 0)
>+ *imbalance = 0;
>
>
You're right Rick, thanks for catching this. Why do you have
this last test though? This shouldn't be possible?
next prev parent reply other threads:[~2004-02-06 9:58 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2004-02-06 9:24 [PATCH] Load balancing problem in 2.6.2-mm1 Rick Lindsley
2004-02-06 9:38 ` Nick Piggin [this message]
2004-02-06 18:13 ` Rick Lindsley
2004-02-06 21:57 ` Nick Piggin
2004-02-06 22:30 ` Rick Lindsley
2004-02-06 22:40 ` Nick Piggin
2004-02-06 22:49 ` Rick Lindsley
2004-02-06 23:08 ` Nick Piggin
2004-02-06 10:30 ` Anton Blanchard
2004-02-06 18:15 ` Rick Lindsley
2004-02-06 18:39 ` Martin J. Bligh
2004-02-06 22:02 ` Nick Piggin
2004-02-06 22:34 ` Rick Lindsley
2004-02-06 22:48 ` Nick Piggin
2004-02-06 22:42 ` Martin J. Bligh
2004-02-06 22:53 ` Nick Piggin
2004-02-06 23:11 ` Rick Lindsley
2004-02-06 23:20 ` Nick Piggin
2004-02-06 23:33 ` Martin J. Bligh
2004-02-06 23:41 ` Nick Piggin
2004-02-06 23:47 ` Martin J. Bligh
2004-02-07 0:11 ` Nick Piggin
2004-02-07 0:25 ` Martin J. Bligh
2004-02-07 0:31 ` Nick Piggin
2004-02-07 9:50 ` Anton Blanchard
2004-02-08 0:40 ` Rick Lindsley
2004-02-08 1:12 ` Anton Blanchard
2004-02-08 1:21 ` Nick Piggin
2004-02-08 1:41 ` Anton Blanchard
2004-02-08 3:20 ` Nick Piggin
2004-02-08 3:57 ` Anton Blanchard
2004-02-08 4:05 ` Nick Piggin
2004-02-08 12:14 ` Anton Blanchard
2004-02-08 1:22 ` Anton Blanchard
2004-02-09 16:37 ` Timothy Miller
2004-02-09 16:43 ` Martin J. Bligh
2004-02-06 18:33 ` Martin J. Bligh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=40236091.40807@cyberone.com.au \
--to=piggin@cyberone.com.au \
--cc=akpm@osdl.org \
--cc=dvhltc@us.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mjbligh@us.ibm.com \
--cc=ricklind@us.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox