All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Williams <pwil3058@bigpond.net.au>
To: Andrew Morton <akpm@osdl.org>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>,
	Ingo Molnar <mingo@elte.hu>,
	Nick Piggin <nickpiggin@yahoo.com.au>,
	Con Kolivas <kernel@kolivas.org>,
	"Chen, Kenneth W" <kenneth.w.chen@intel.com>,
	Mike Galbraith <efault@gmx.de>,
	linux-kernel@vger.kernel.org
Subject: [PATCH] sched: make sure busiest group and run queue are pullable
Date: Sat, 25 Mar 2006 14:40:30 +1100	[thread overview]
Message-ID: <4424BBAE.2060005@bigpond.net.au> (raw)
In-Reply-To: <4424A298.70706@bigpond.net.au>

[-- Attachment #1: Type: text/plain, Size: 1910 bytes --]

Peter Williams wrote:
> Peter Williams wrote:
>> Siddha, Suresh B wrote:
>>> more issues with smpnice patch...
>>>
>>> a) consider a 4-way system (simple SMP system with no HT and cores) 
>>> scenario
>>> where a high priority task (nice -20) is running on P0 and two normal
>>> priority tasks running on P1. load balance with smp nice code
>>> will never be able to detect an imbalance and hence will never move 
>>> one of the normal priority tasks on P1 to idle cpus P2 or P3.
>>
>> Why?
> 
> OK, I think I know why.  The load balancing code will always decide that 
> P0 is the busiest CPU, right?

Attached is a patch that addresses this problem.  The strategies 
employed are:

1. for find_busiest_group() only consider groups that have at least one 
CPU with more than one task running as candidates for "busiest", and

2. for find_busiest_queue() only consider queues that have more than one 
running tasks as candidates for "busiest".

I think that the overhead gains from earlier abandonment of load 
balancing attempts that would eventually (most probably -- see next 
paragraph) be abandoned anyway will compensate for the extra overhead 
introduced in these functions.

I think that the only likely behavioural changes for an all tasks have 
nice==0 system is that without these checks there is a small chance that 
a "busiest" that only has one runnable task (and for which move_tasks() 
would eventually not move any tasks) when these tests are made may 
actually acquire extra runnable tasks before the locks are taken in 
preparation for calling move_tasks() and, therefore, load balancing may 
actually take place.  I think that this effect can be safely ignored.

Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

[-- Attachment #2: smpnice-modify-busiest-searches --]
[-- Type: text/plain, Size: 1646 bytes --]

Index: MM-2.6.X/kernel/sched.c
===================================================================
--- MM-2.6.X.orig/kernel/sched.c	2006-03-25 13:43:06.000000000 +1100
+++ MM-2.6.X/kernel/sched.c	2006-03-25 13:56:37.000000000 +1100
@@ -2115,6 +2115,7 @@ find_busiest_group(struct sched_domain *
 		int local_group;
 		int i;
 		unsigned long sum_nr_running, sum_weighted_load;
+		unsigned int nr_loaded_cpus = 0; /* where nr_running > 1 */
 
 		local_group = cpu_isset(this_cpu, group->cpumask);
 
@@ -2135,6 +2136,8 @@ find_busiest_group(struct sched_domain *
 
 			avg_load += load;
 			sum_nr_running += rq->nr_running;
+			if (rq->nr_running > 1)
+				++nr_loaded_cpus;
 			sum_weighted_load += rq->raw_weighted_load;
 		}
 
@@ -2149,7 +2152,7 @@ find_busiest_group(struct sched_domain *
 			this = group;
 			this_nr_running = sum_nr_running;
 			this_load_per_task = sum_weighted_load;
-		} else if (avg_load > max_load) {
+		} else if (nr_loaded_cpus && avg_load > max_load) {
 			max_load = avg_load;
 			busiest = group;
 			busiest_nr_running = sum_nr_running;
@@ -2258,16 +2261,16 @@ out_balanced:
 static runqueue_t *find_busiest_queue(struct sched_group *group,
 	enum idle_type idle)
 {
-	unsigned long load, max_load = 0;
-	runqueue_t *busiest = NULL;
+	unsigned long max_load = 0;
+	runqueue_t *busiest = NULL, *rqi;
 	int i;
 
 	for_each_cpu_mask(i, group->cpumask) {
-		load = weighted_cpuload(i);
+		rqi = cpu_rq(i);
 
-		if (load > max_load) {
-			max_load = load;
-			busiest = cpu_rq(i);
+		if (rqi->nr_running > 1 && rqi->raw_weighted_load > max_load) {
+			max_load = rqi->raw_weighted_load;
+			busiest = rqi;
 		}
 	}
 

  reply	other threads:[~2006-03-25  3:40 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-03-22 23:51 cpu scheduler merge plans Andrew Morton
2006-03-22 22:57 ` kernel
2006-03-23  1:37   ` Siddha, Suresh B
2006-03-23 22:06     ` Peter Williams
2006-03-23  0:31 ` Nick Piggin
2006-03-23  1:16 ` Peter Williams
2006-03-24 23:45   ` more smpnice patch issues Siddha, Suresh B
2006-03-25  0:56     ` Peter Williams
2006-03-25  1:53       ` Peter Williams
2006-03-25  3:40         ` Peter Williams [this message]
2006-03-26 23:24     ` Peter Williams
2006-03-26 23:43       ` [PATCH] sched: smpnice prevent integer arithmetic wrap problems Peter Williams
2006-03-23  5:03 ` cpu scheduler merge plans Ingo Molnar
2006-03-23  5:13   ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4424BBAE.2060005@bigpond.net.au \
    --to=pwil3058@bigpond.net.au \
    --cc=akpm@osdl.org \
    --cc=efault@gmx.de \
    --cc=kenneth.w.chen@intel.com \
    --cc=kernel@kolivas.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=nickpiggin@yahoo.com.au \
    --cc=suresh.b.siddha@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.