From: Peter Williams <pwil3058@bigpond.net.au>
To: Andrew Morton <akpm@osdl.org>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>,
Ingo Molnar <mingo@elte.hu>,
Nick Piggin <nickpiggin@yahoo.com.au>,
Con Kolivas <kernel@kolivas.org>,
"Chen, Kenneth W" <kenneth.w.chen@intel.com>,
Mike Galbraith <efault@gmx.de>,
linux-kernel@vger.kernel.org
Subject: [PATCH] sched: make sure busiest group and run queue are pullable
Date: Sat, 25 Mar 2006 14:40:30 +1100 [thread overview]
Message-ID: <4424BBAE.2060005@bigpond.net.au> (raw)
In-Reply-To: <4424A298.70706@bigpond.net.au>
[-- Attachment #1: Type: text/plain, Size: 1910 bytes --]
Peter Williams wrote:
> Peter Williams wrote:
>> Siddha, Suresh B wrote:
>>> more issues with smpnice patch...
>>>
>>> a) consider a 4-way system (simple SMP system with no HT and cores)
>>> scenario
>>> where a high priority task (nice -20) is running on P0 and two normal
>>> priority tasks running on P1. load balance with smp nice code
>>> will never be able to detect an imbalance and hence will never move
>>> one of the normal priority tasks on P1 to idle cpus P2 or P3.
>>
>> Why?
>
> OK, I think I know why. The load balancing code will always decide that
> P0 is the busiest CPU, right?
Attached is a patch that addresses this problem. The strategies
employed are:
1. for find_busiest_group() only consider groups that have at least one
CPU with more than one task running as candidates for "busiest", and
2. for find_busiest_queue() only consider queues that have more than one
running tasks as candidates for "busiest".
I think that the overhead gains from earlier abandonment of load
balancing attempts that would eventually (most probably -- see next
paragraph) be abandoned anyway will compensate for the extra overhead
introduced in these functions.
I think that the only likely behavioural changes for an all tasks have
nice==0 system is that without these checks there is a small chance that
a "busiest" that only has one runnable task (and for which move_tasks()
would eventually not move any tasks) when these tests are made may
actually acquire extra runnable tasks before the locks are taken in
preparation for calling move_tasks() and, therefore, load balancing may
actually take place. I think that this effect can be safely ignored.
Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>
Peter
--
Peter Williams pwil3058@bigpond.net.au
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
[-- Attachment #2: smpnice-modify-busiest-searches --]
[-- Type: text/plain, Size: 1646 bytes --]
Index: MM-2.6.X/kernel/sched.c
===================================================================
--- MM-2.6.X.orig/kernel/sched.c 2006-03-25 13:43:06.000000000 +1100
+++ MM-2.6.X/kernel/sched.c 2006-03-25 13:56:37.000000000 +1100
@@ -2115,6 +2115,7 @@ find_busiest_group(struct sched_domain *
int local_group;
int i;
unsigned long sum_nr_running, sum_weighted_load;
+ unsigned int nr_loaded_cpus = 0; /* where nr_running > 1 */
local_group = cpu_isset(this_cpu, group->cpumask);
@@ -2135,6 +2136,8 @@ find_busiest_group(struct sched_domain *
avg_load += load;
sum_nr_running += rq->nr_running;
+ if (rq->nr_running > 1)
+ ++nr_loaded_cpus;
sum_weighted_load += rq->raw_weighted_load;
}
@@ -2149,7 +2152,7 @@ find_busiest_group(struct sched_domain *
this = group;
this_nr_running = sum_nr_running;
this_load_per_task = sum_weighted_load;
- } else if (avg_load > max_load) {
+ } else if (nr_loaded_cpus && avg_load > max_load) {
max_load = avg_load;
busiest = group;
busiest_nr_running = sum_nr_running;
@@ -2258,16 +2261,16 @@ out_balanced:
static runqueue_t *find_busiest_queue(struct sched_group *group,
enum idle_type idle)
{
- unsigned long load, max_load = 0;
- runqueue_t *busiest = NULL;
+ unsigned long max_load = 0;
+ runqueue_t *busiest = NULL, *rqi;
int i;
for_each_cpu_mask(i, group->cpumask) {
- load = weighted_cpuload(i);
+ rqi = cpu_rq(i);
- if (load > max_load) {
- max_load = load;
- busiest = cpu_rq(i);
+ if (rqi->nr_running > 1 && rqi->raw_weighted_load > max_load) {
+ max_load = rqi->raw_weighted_load;
+ busiest = rqi;
}
}
next prev parent reply other threads:[~2006-03-25 3:40 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-03-22 23:51 cpu scheduler merge plans Andrew Morton
2006-03-22 22:57 ` kernel
2006-03-23 1:37 ` Siddha, Suresh B
2006-03-23 22:06 ` Peter Williams
2006-03-23 0:31 ` Nick Piggin
2006-03-23 1:16 ` Peter Williams
2006-03-24 23:45 ` more smpnice patch issues Siddha, Suresh B
2006-03-25 0:56 ` Peter Williams
2006-03-25 1:53 ` Peter Williams
2006-03-25 3:40 ` Peter Williams [this message]
2006-03-26 23:24 ` Peter Williams
2006-03-26 23:43 ` [PATCH] sched: smpnice prevent integer arithmetic wrap problems Peter Williams
2006-03-23 5:03 ` cpu scheduler merge plans Ingo Molnar
2006-03-23 5:13 ` Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4424BBAE.2060005@bigpond.net.au \
--to=pwil3058@bigpond.net.au \
--cc=akpm@osdl.org \
--cc=efault@gmx.de \
--cc=kenneth.w.chen@intel.com \
--cc=kernel@kolivas.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=nickpiggin@yahoo.com.au \
--cc=suresh.b.siddha@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.