All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>,
	linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org,
	hpj@urpla.net
Subject: Re: [PATCH] sched: fix race in schedule
Date: Mon, 10 Mar 2008 13:01:41 -0700	[thread overview]
Message-ID: <47D593A5.5060906@ct.jp.nec.com> (raw)
In-Reply-To: <1205174197.8514.159.camel@twins>

Peter Zijlstra wrote:
> On Mon, 2008-03-10 at 11:01 -0700, Hiroshi Shimamoto wrote:
>> Hi Ingo,
>>
>> I found a race condition in scheduler.
>> The first report is the below;
>> http://lkml.org/lkml/2008/2/26/459
>>
>> It took a bit long time to investigate and I couldn't have much time last week.
>> It is hard to reproduce but -rt is little easier because it has preemptible
>> spin lock and rcu.
>>
>> Could you please check the scenario and the patch.
>> It will be needed for the stable, too.
>>
>> ---
>> From: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
>>
>> There is a race condition between schedule() and some dequeue/enqueue
>> functions; rt_mutex_setprio(), __setscheduler() and sched_move_task().
>>
>> When scheduling to idle, idle_balance() is called to pull tasks from
>> other busy processor. It might drop the rq lock.
>> It means that those 3 functions encounter on_rq=0 and running=1.
>> The current task should be put when running.
>>
>> Here is a possible scenario;
>>    CPU0                               CPU1
>>     |                              schedule()
>>     |                              ->deactivate_task()
>>     |                              ->idle_balance()
>>     |                              -->load_balance_newidle()
>> rt_mutex_setprio()                     |
>>     |                              --->double_lock_balance()
>>     *get lock                          *rel lock
>>     * on_rq=0, ruuning=1               |
>>     * sched_class is changed           |
>>     *rel lock                          *get lock
>>     :                                  |
>>                                        :
>>                                    ->put_prev_task_rt()
>>                                    ->pick_next_task_fair()
>>                                        => panic
>>
>> The current process of CPU1(P1) is scheduling. Deactivated P1,
>> and the scheduler looks for another process on other CPU's runqueue
>> because CPU1 will be idle. idle_balance(), load_balance_newidle()
>> and double_lock_balance() are called and double_lock_balance() could
>> drop the rq lock. On the other hand, CPU0 is trying to boost the
>> priority of P1. The result of boosting only P1's prio and sched_class
>> are changed to RT. The sched entities of P1 and P1's group are never
>> put. It makes cfs_rq invalid, because the cfs_rq has curr and no leaf,
>> but pick_next_task_fair() is called, then the kernel panics.
> 
> Very nice catch, this had me puzzled for a while. I'm not quite sure I
> fully understand. Could you explain why the below isn't sufficient?

thanks, your patch looks nice to me.
I had focused setprio, on_rq=0 and running=1 situation, it makes me to
fix these functions.
But one point, I've just noticed. I'm not sure on same situation against
sched_rt. I think the pre_schedule() of rt has chance to drop rq lock.
Is it OK?

> 
> ---
> diff --git a/kernel/sched.c b/kernel/sched.c
> index a0c79e9..ebd9fc5 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -4067,10 +4067,11 @@ need_resched_nonpreemptible:
>  		prev->sched_class->pre_schedule(rq, prev);
>  #endif
>  
> +	prev->sched_class->put_prev_task(rq, prev);
> +
>  	if (unlikely(!rq->nr_running))
>  		idle_balance(cpu, rq);
>  
> -	prev->sched_class->put_prev_task(rq, prev);
>  	next = pick_next_task(rq, prev);
>  
>  	sched_info_switch(prev, next);
> 
> 

  reply	other threads:[~2008-03-10 20:02 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-03-10 18:01 [PATCH] sched: fix race in schedule Hiroshi Shimamoto
2008-03-10 18:36 ` Peter Zijlstra
2008-03-10 20:01   ` Hiroshi Shimamoto [this message]
2008-03-10 20:34     ` Peter Zijlstra
2008-03-10 20:54       ` Hiroshi Shimamoto
2008-03-10 21:01         ` Peter Zijlstra
2008-03-10 21:07           ` Hiroshi Shimamoto
2008-03-11  2:12       ` Hiroshi Shimamoto
2008-03-11  8:40         ` Peter Zijlstra
2008-03-11 17:10           ` Hiroshi Shimamoto
2008-03-11 23:38             ` Dmitry Adamushko
2008-03-12 13:27               ` Peter Zijlstra
2008-03-12 14:48                 ` Dmitry Adamushko
2008-03-12 14:57                   ` Peter Zijlstra
2008-03-14 17:58                     ` Hiroshi Shimamoto
2008-03-14 22:47                       ` Dmitry Adamushko
2008-03-14 22:57                         ` Peter Zijlstra
2008-03-20  5:44 ` Sripathi Kodi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47D593A5.5060906@ct.jp.nec.com \
    --to=h-shimamoto@ct.jp.nec.com \
    --cc=hpj@urpla.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.