linux-rt-users.vger.kernel.org archive mirror
* [PATCH] sched: fix race in schedule
@ 2008-03-10 18:01 Hiroshi Shimamoto
  2008-03-10 18:36 ` Peter Zijlstra
  2008-03-20  5:44 ` Sripathi Kodi
  0 siblings, 2 replies; 18+ messages in thread
From: Hiroshi Shimamoto @ 2008-03-10 18:01 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, linux-rt-users, hpj

Hi Ingo,

I found a race condition in the scheduler.
It was first reported here:
http://lkml.org/lkml/2008/2/26/459

It took a while to investigate, and I didn't have much time last week.
It is hard to reproduce, but a little easier on -rt because -rt has
preemptible spinlocks and RCU.

Could you please check the scenario and the patch?
It will be needed for -stable, too.

---
From: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>

There is a race condition between schedule() and some dequeue/enqueue
functions: rt_mutex_setprio(), __setscheduler() and sched_move_task().

When scheduling to idle, idle_balance() is called to pull tasks from
other busy processors. It might drop the rq lock.
This means that those three functions can encounter on_rq=0 and running=1.
The current task should be put whenever it is running, even if it is no
longer on the runqueue.

Here is a possible scenario:
   CPU0                               CPU1
    |                              schedule()
    |                              ->deactivate_task()
    |                              ->idle_balance()
    |                              -->load_balance_newidle()
rt_mutex_setprio()                     |
    |                              --->double_lock_balance()
    *get lock                          *rel lock
    * on_rq=0, running=1               |
    * sched_class is changed           |
    *rel lock                          *get lock
    :                                  |
                                       :
                                   ->put_prev_task_rt()
                                   ->pick_next_task_fair()
                                       => panic

The current process on CPU1 (P1) is scheduling out. P1 has been
deactivated, and the scheduler looks for another process on another
CPU's runqueue because CPU1 is about to go idle. idle_balance(),
load_balance_newidle() and double_lock_balance() are called, and
double_lock_balance() can drop the rq lock. Meanwhile, CPU0 is trying
to boost the priority of P1. As a result of the boost, only P1's prio
and sched_class are changed to RT; the sched entities of P1 and P1's
group are never put. This leaves the cfs_rq invalid, because it has
curr but no leaf. pick_next_task_fair() is then called and the kernel
panics.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
---
 kernel/sched.c |   38 ++++++++++++++++----------------------
 1 files changed, 16 insertions(+), 22 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 52b9867..eedf748 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4268,11 +4268,10 @@ void rt_mutex_setprio(struct task_struct *p, int prio)
 	oldprio = p->prio;
 	on_rq = p->se.on_rq;
 	running = task_current(rq, p);
-	if (on_rq) {
+	if (on_rq)
 		dequeue_task(rq, p, 0);
-		if (running)
-			p->sched_class->put_prev_task(rq, p);
-	}
+	if (running)
+		p->sched_class->put_prev_task(rq, p);
 
 	if (rt_prio(prio))
 		p->sched_class = &rt_sched_class;
@@ -4281,10 +4280,9 @@ void rt_mutex_setprio(struct task_struct *p, int prio)
 
 	p->prio = prio;
 
+	if (running)
+		p->sched_class->set_curr_task(rq);
 	if (on_rq) {
-		if (running)
-			p->sched_class->set_curr_task(rq);
-
 		enqueue_task(rq, p, 0);
 
 		check_class_changed(rq, p, prev_class, oldprio, running);
@@ -4581,19 +4579,17 @@ recheck:
 	update_rq_clock(rq);
 	on_rq = p->se.on_rq;
 	running = task_current(rq, p);
-	if (on_rq) {
+	if (on_rq)
 		deactivate_task(rq, p, 0);
-		if (running)
-			p->sched_class->put_prev_task(rq, p);
-	}
+	if (running)
+		p->sched_class->put_prev_task(rq, p);
 
 	oldprio = p->prio;
 	__setscheduler(rq, p, policy, param->sched_priority);
 
+	if (running)
+		p->sched_class->set_curr_task(rq);
 	if (on_rq) {
-		if (running)
-			p->sched_class->set_curr_task(rq);
-
 		activate_task(rq, p, 0);
 
 		check_class_changed(rq, p, prev_class, oldprio, running);
@@ -7617,11 +7613,10 @@ void sched_move_task(struct task_struct *tsk)
 	running = task_current(rq, tsk);
 	on_rq = tsk->se.on_rq;
 
-	if (on_rq) {
+	if (on_rq)
 		dequeue_task(rq, tsk, 0);
-		if (unlikely(running))
-			tsk->sched_class->put_prev_task(rq, tsk);
-	}
+	if (unlikely(running))
+		tsk->sched_class->put_prev_task(rq, tsk);
 
 	set_task_rq(tsk, task_cpu(tsk));
 
@@ -7630,11 +7625,10 @@ void sched_move_task(struct task_struct *tsk)
 		tsk->sched_class->moved_group(tsk);
 #endif
 
-	if (on_rq) {
-		if (unlikely(running))
-			tsk->sched_class->set_curr_task(rq);
+	if (unlikely(running))
+		tsk->sched_class->set_curr_task(rq);
+	if (on_rq)
 		enqueue_task(rq, tsk, 0);
-	}
 
 	task_rq_unlock(rq, &flags);
 }
-- 
1.5.4.1



Thread overview: 18+ messages
2008-03-10 18:01 [PATCH] sched: fix race in schedule Hiroshi Shimamoto
2008-03-10 18:36 ` Peter Zijlstra
2008-03-10 20:01   ` Hiroshi Shimamoto
2008-03-10 20:34     ` Peter Zijlstra
2008-03-10 20:54       ` Hiroshi Shimamoto
2008-03-10 21:01         ` Peter Zijlstra
2008-03-10 21:07           ` Hiroshi Shimamoto
2008-03-11  2:12       ` Hiroshi Shimamoto
2008-03-11  8:40         ` Peter Zijlstra
2008-03-11 17:10           ` Hiroshi Shimamoto
2008-03-11 23:38             ` Dmitry Adamushko
2008-03-12 13:27               ` Peter Zijlstra
2008-03-12 14:48                 ` Dmitry Adamushko
2008-03-12 14:57                   ` Peter Zijlstra
2008-03-14 17:58                     ` Hiroshi Shimamoto
2008-03-14 22:47                       ` Dmitry Adamushko
2008-03-14 22:57                         ` Peter Zijlstra
2008-03-20  5:44 ` Sripathi Kodi
