* [patch] sched: improve pinned task handling again!
@ 2005-04-02  2:58 Siddha, Suresh B
From: Siddha, Suresh B @ 2005-04-02  2:58 UTC (permalink / raw)
  To: mingo, nickpiggin; +Cc: akpm, linux-kernel, kenneth.w.chen

On Wednesday, February 23, 2005 11:17 PM, Nick Piggin wrote:
>John Hawkes explained the problem best:
>  A large number of processes that are pinned to a single CPU results
>  in every other CPU's load_balance() seeing this overloaded CPU as
>  "busiest", yet move_tasks() never finds a task to pull-migrate.  This
>  condition occurs during module unload, but can also occur as a
>  denial-of-service using sys_sched_setaffinity().  Several hundred
>  CPUs performing this fruitless load_balance() will livelock on the
>  busiest CPU's runqueue lock.  A smaller number of CPUs will livelock
>  if the pinned task count gets high.
>
>Expanding slightly on John's patch, this one attempts to work out
>whether the balancing failure has been due to too many tasks pinned
>on the runqueue. This allows it to be basically invisible to the
>regular balancing paths (ie. when there are no pinned tasks). We can
>use this extra knowledge to shut down the balancing faster, and ensure
>the migration threads don't start running which is another problem
>observed in the wild.

This time Ken Chen brought up this issue -- no, it has nothing to do with
the industry db benchmark ;-)

Even with Nick's above-mentioned patch in -mm, I see system livelocks if,
for example, I have 7000 processes pinned onto one cpu (this is on the
fastest 8-way system I have access to). I am sure there are other systems
where this problem can be encountered even with a smaller pin count.
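For reference, a minimal userspace sketch of that scenario looks like the
below (purely illustrative, not the actual test we ran; the process count
and target cpu are arbitrary):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	cpu_set_t mask;
	int i;

	CPU_ZERO(&mask);
	CPU_SET(0, &mask);		/* pin everything to cpu 0 */

	for (i = 0; i < 7000; i++) {
		if (fork() == 0) {
			if (sched_setaffinity(0, sizeof(mask), &mask) < 0)
				perror("sched_setaffinity");
			for (;;)
				;	/* spin so the task stays runnable */
		}
	}
	pause();			/* parent just sits */
	return 0;
}

With that many runnable pinned tasks, every other cpu's load_balance()
keeps picking cpu 0 as busiest and finds nothing it can pull.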

We tried to fix this issue, but as you know there is no good way of
fixing it without letting the regular paths know about it.

Our proposed solution is appended; we tried to minimize the effect on the
fast path.  It builds on Nick's patch: once this situation is detected,
load_balance() does no more move_tasks() calls as long as the busiest cpu
stays the same and the processes queued on it keep the same cpu affinity
(tracked via the runqueue's "generation_num"), as sketched below.
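In outline (a condensed sketch of the mechanism; the real code is in the
patch that follows):

	/* in load_balance(), when move_tasks() failed with all_pinned: */
	this_rq->busiest_rq = busiest;
	this_rq->busiest_generation_num = busiest->generation_num;

	/* on subsequent load_balance() calls, before trying to pull: */
	if (this_rq->busiest_rq == busiest &&
	    this_rq->busiest_generation_num == busiest->generation_num)
		goto out_balanced;	/* nothing can have become movable */

generation_num is bumped whenever a task is enqueued on that runqueue or
a task already on it changes its cpu affinity, so a stale match can only
mean the pinned situation persists.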

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>


--- linux-2.6.12-rc1-mm4/kernel/sched.c	2005-04-01 10:07:24.183605296 -0800
+++ linux/kernel/sched.c	2005-04-01 18:40:21.120799664 -0800
@@ -205,9 +205,16 @@
 	/*
 	 * nr_running and cpu_load should be in the same cacheline because
 	 * remote CPUs use both these fields when doing load calculation.
+	 * generation_num also needs to be in the same cacheline as nr_running.
 	 */
-	unsigned long nr_running;
+	unsigned int nr_running;
 #ifdef CONFIG_SMP
+	/*
+	 * generation_num gets incremented in the following cases
+	 * 	- a process moves to this runqueue
+	 *	- cpu affinity of a process on this runqueue is changed
+	 */
+	unsigned int generation_num;
 	unsigned long cpu_load[3];
 #endif
 	unsigned long long nr_switches;
@@ -237,6 +244,8 @@
 
 	task_t *migration_thread;
 	struct list_head migration_queue;
+	runqueue_t *busiest_rq;
+	unsigned int busiest_generation_num;
 #endif
 
 #ifdef CONFIG_SCHEDSTATS
@@ -598,6 +607,9 @@
 {
 	enqueue_task(p, rq->active);
 	rq->nr_running++;
+#ifdef CONFIG_SMP
+	rq->generation_num++;
+#endif
 }
 
 /*
@@ -1670,6 +1682,7 @@
 	src_rq->nr_running--;
 	set_task_cpu(p, this_cpu);
 	this_rq->nr_running++;
+	this_rq->generation_num++;
 	enqueue_task(p, this_array);
 	p->timestamp = (p->timestamp - src_rq->timestamp_last_tick)
 				+ this_rq->timestamp_last_tick;
@@ -1998,6 +2011,14 @@
 
 	schedstat_add(sd, lb_imbalance[idle], imbalance);
 
+	/*
+	 * If all tasks on busiest_cpu were pinned and could not be moved
+	 * to this_cpu, and busiest_cpu's generation_num has not changed
+	 * since our last load_balance, then we are balanced.
+	 */
+	if (unlikely(this_rq->busiest_rq == busiest &&
+		     this_rq->busiest_generation_num == busiest->generation_num))
+		goto out_balanced;
 	nr_moved = 0;
 	if (busiest->nr_running > 1) {
 		/*
@@ -2013,8 +2034,12 @@
 		spin_unlock(&busiest->lock);
 
 		/* All tasks on this runqueue were pinned by CPU affinity */
-		if (unlikely(all_pinned))
+		if (unlikely(all_pinned)) {
+			this_rq->busiest_rq = busiest;
+			this_rq->busiest_generation_num = busiest->generation_num;
 			goto out_balanced;
+		} else
+			this_rq->busiest_rq = NULL;
 	}
 
 	spin_unlock(&this_rq->lock);
@@ -4146,8 +4171,10 @@
 
 	p->cpus_allowed = new_mask;
 	/* Can the task run on the task's current CPU? If so, we're done */
-	if (cpu_isset(task_cpu(p), new_mask))
+	if (cpu_isset(task_cpu(p), new_mask)) {
+		rq->generation_num++;
 		goto out;
+	}
 
 	if (migrate_task(p, any_online_cpu(new_mask), &req)) {
 		/* Need help from migration thread: drop lock and wait. */
