From: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>,
Ingo Molnar <mingo@elte.hu>,
linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org,
hpj@urpla.net, stable <stable@kernel.org>
Subject: Re: [PATCH] sched: fix race in schedule
Date: Fri, 14 Mar 2008 10:58:56 -0700 [thread overview]
Message-ID: <47DABCE0.2070505@ct.jp.nec.com> (raw)
In-Reply-To: <1205333829.8514.249.camel@twins>
Peter Zijlstra wrote:
> On Wed, 2008-03-12 at 15:48 +0100, Dmitry Adamushko wrote:
>> On 12/03/2008, Peter Zijlstra <peterz@infradead.org> wrote:
>>> [ ... ]
>>>
>>> > > Before begin, I can tell that se->on_rq is changed at enqueue_task() or
>>> > > dequeue_task() in sched.c.
>>> > >
>>> > > Here is the flow to panic which I got;
>>> > >
>>> > > CPU0 CPU1
>>> > > | schedule()
>>> > > | ->deactivate_task()
>>> > >
>>> > > | -->dequeue_task()
>>> > > | * on_rq=0
>>> > > | ->put_prev_task_fair()
>>> > >
>>> > > | ->idle_balance()
>>> > > | -->load_balance_newidle()
>>> > >
>>> > > (a wakeup function) |
>>> > >
>>> > > | --->double_lock_balance()
>>> > > *get lock *rel lock
>>> > >
>>> > > * wake up target is CPU1's curr |
>>> > > ->enqueue_task() |
>>> > > * on_rq=1 |
>>> > > ->rt_mutex_setprio() |
>>> > > * on_rq=1, ruuning=1 |
>>> > > -->dequeue_task()!! |
>>> > > -->put_prev_task_fair()!! |
>>> >
>>> > humm... this one should have caused the problem.
>>> >
>>> > ->put_prev_task() has been previously done in schedule() so we get 2
>>> > consequent ->put_prev_task() without set_curr_task/pick_next_task()
>>> > being called in between
>>> > [ as a result, __enqueue_entitty() is called twice for CPU1's curr and
>>> > that definitely corrupts an rb-tree ]
>>> >
>>> > your initial patch doesn't have this problem. humm... logically-wise,
>>> > it looks like a change of the 'current' which can be expressed by a
>>> > pair :
>>> >
>>> > (1) put_prev_task() + (2) pick_next_task() or set_curr_task()
>>> > (both end up calling set_next_entity())
>>> >
>>> > has to be 'atomic' wrt the rq->lock.
>>> >
>>> > For schedule() that also involves a change of rq->curr.
>>>
>>>
>>> Right, this seems to 'rely' on rq->curr lagging behind put_prev_task().
>>> So by doing something like:
>>>
>>>
>>> ---
>>> diff --git a/kernel/sched.c b/kernel/sched.c
>>>
>>> index a0c79e9..62d796f 100644
>>>
>>> --- a/kernel/sched.c
>>> +++ b/kernel/sched.c
>>>
>>> @@ -4061,6 +4061,8 @@ need_resched_nonpreemptible:
>>> }
>>> switch_count = &prev->nvcsw;
>>> }
>>>
>>> + prev->sched_class->put_prev_task(rq, prev);
>>>
>>> + rq->curr = rq->idle;
>>>
>>>
>>> #ifdef CONFIG_SMP
>>> if (prev->sched_class->pre_schedule)
>>>
>>> @@ -4070,14 +4072,13 @@ need_resched_nonpreemptible:
>>>
>>> if (unlikely(!rq->nr_running))
>>> idle_balance(cpu, rq);
>>>
>>> - prev->sched_class->put_prev_task(rq, prev);
>>> next = pick_next_task(rq, prev);
>>>
>>> + rq->curr = next;
>>>
>>>
>>> sched_info_switch(prev, next);
>>>
>>>
>>> if (likely(prev != next)) {
>>> rq->nr_switches++;
>>> - rq->curr = next;
>>> ++*switch_count;
>>>
>>> context_switch(rq, prev, next); /* unlocks the rq */
>>> ---
>> hum, I'm quite suspicious about this approach... mainly, due to the
>> fact that I think your next assumption is wrong:
>> (unless we specify 'running' wrt to whom)
>>
>>> We would avoid being considered running while we're not.
>>>
>> the fact is that we are (i.e. 'prev') actually running on a cpu until
>> some point in context_switch().
>>
>> At the very least, the load balancer has to exactly know who is the
>> 'current' on other cpus to treat such tasks differently.
>>
>> With this patch, the load balancer can be confused/broken by the fact
>> that 'prev' is considered for migration as a "not-on-rq and
>> not-running" task [ from another cpu at the moment when either
>> pre_schedule() or idle_balance() drop a rq->lock of prev's cpu ].
>>
>> well, the version of task_current() for __ARCH_WANT_UNLOCKED_CTXSW
>> would fix it if used by default.
>>
>> But maybe there is something esle that would be exposed by the
>> 'rq->curr = rq->idle' manipulation... I can't provide examples right
>> now though (I need to think on it).
>
> I share your concerns, I don't really like it either. Just spewing out
> ideas here - bad ideas it seems :-/
>
> Ingo also suggested moving the balance calls right before
> deactivate_task(), but that gives a whole other set of head-aches.
>
Well, what will we do about this issue?
I see you're thinking to fix inconsistency in scheduler, right?
I agree about it.
However, I don't think it's good to remain this issue long time in
the -stable kernel.
Could you please let me know what I can do?
thanks,
Hiroshi Shimamoto
next prev parent reply other threads:[~2008-03-14 17:59 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-03-10 18:01 [PATCH] sched: fix race in schedule Hiroshi Shimamoto
2008-03-10 18:36 ` Peter Zijlstra
2008-03-10 20:01 ` Hiroshi Shimamoto
2008-03-10 20:34 ` Peter Zijlstra
2008-03-10 20:54 ` Hiroshi Shimamoto
2008-03-10 21:01 ` Peter Zijlstra
2008-03-10 21:07 ` Hiroshi Shimamoto
2008-03-11 2:12 ` Hiroshi Shimamoto
2008-03-11 8:40 ` Peter Zijlstra
2008-03-11 17:10 ` Hiroshi Shimamoto
2008-03-11 23:38 ` Dmitry Adamushko
2008-03-12 13:27 ` Peter Zijlstra
2008-03-12 14:48 ` Dmitry Adamushko
2008-03-12 14:57 ` Peter Zijlstra
2008-03-14 17:58 ` Hiroshi Shimamoto [this message]
2008-03-14 22:47 ` Dmitry Adamushko
2008-03-14 22:57 ` Peter Zijlstra
2008-03-20 5:44 ` Sripathi Kodi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=47DABCE0.2070505@ct.jp.nec.com \
--to=h-shimamoto@ct.jp.nec.com \
--cc=dmitry.adamushko@gmail.com \
--cc=hpj@urpla.net \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rt-users@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=peterz@infradead.org \
--cc=stable@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.