linux-rt-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Dmitry Adamushko" <dmitry.adamushko@gmail.com>
To: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Hiroshi Shimamoto" <h-shimamoto@ct.jp.nec.com>,
	"Ingo Molnar" <mingo@elte.hu>,
	linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org,
	hpj@urpla.net, stable <stable@kernel.org>
Subject: Re: [PATCH] sched: fix race in schedule
Date: Wed, 12 Mar 2008 15:48:49 +0100	[thread overview]
Message-ID: <b647ffbd0803120748x2bfc8fbctda15ce3df08fcf5b@mail.gmail.com> (raw)
In-Reply-To: <1205328457.8514.244.camel@twins>

On 12/03/2008, Peter Zijlstra <peterz@infradead.org> wrote:
>
> [ ... ]
>
>  > >  Before begin, I can tell that se->on_rq is changed at enqueue_task() or
>  > >  dequeue_task() in sched.c.
>  > >
>  > >  Here is the flow to panic which I got;
>  > >
>  > >    CPU0                               CPU1
>  > >     |                              schedule()
>  > >     |                              ->deactivate_task()
>  > >
>  > >     |                              -->dequeue_task()
>  > >     |                                  * on_rq=0
>  > >     |                              ->put_prev_task_fair()
>  > >
>  > >     |                              ->idle_balance()
>  > >     |                              -->load_balance_newidle()
>  > >
>  > > (a wakeup function)                    |
>  > >
>  > >     |                              --->double_lock_balance()
>  > >     *get lock                          *rel lock
>  > >
>  > >     * wake up target is CPU1's curr    |
>  > >  ->enqueue_task()                      |
>  > >     * on_rq=1                          |
>  > >  ->rt_mutex_setprio()                  |
>  > >     * on_rq=1, ruuning=1               |
>  > >  -->dequeue_task()!!                   |
>  > >  -->put_prev_task_fair()!!             |
>  >
>  > humm... this one should have caused the problem.
>  >
>  > ->put_prev_task() has been previously done in schedule() so we get 2
>  > consequent ->put_prev_task() without set_curr_task/pick_next_task()
>  > being called in between
>  > [ as a result, __enqueue_entitty() is called twice for CPU1's curr and
>  > that definitely corrupts an rb-tree ]
>  >
>  > your initial patch doesn't have this problem. humm... logically-wise,
>  > it looks like a change of the 'current' which can be expressed by a
>  > pair :
>  >
>  > (1) put_prev_task() + (2) pick_next_task() or set_curr_task()
>  > (both end up calling set_next_entity())
>  >
>  > has to be 'atomic' wrt the rq->lock.
>  >
>  > For schedule() that also involves a change of rq->curr.
>
>
> Right, this seems to 'rely' on rq->curr lagging behind put_prev_task().
>  So by doing something like:
>
>
>  ---
>  diff --git a/kernel/sched.c b/kernel/sched.c
>
> index a0c79e9..62d796f 100644
>
> --- a/kernel/sched.c
>  +++ b/kernel/sched.c
>
> @@ -4061,6 +4061,8 @@ need_resched_nonpreemptible:
>                 }
>                 switch_count = &prev->nvcsw;
>         }
>
> +       prev->sched_class->put_prev_task(rq, prev);
>
> +       rq->curr = rq->idle;
>
>
>   #ifdef CONFIG_SMP
>         if (prev->sched_class->pre_schedule)
>
> @@ -4070,14 +4072,13 @@ need_resched_nonpreemptible:
>
>         if (unlikely(!rq->nr_running))
>                 idle_balance(cpu, rq);
>
>  -       prev->sched_class->put_prev_task(rq, prev);
>         next = pick_next_task(rq, prev);
>
> +       rq->curr = next;
>
>
>         sched_info_switch(prev, next);
>
>
>         if (likely(prev != next)) {
>                 rq->nr_switches++;
>  -               rq->curr = next;
>                 ++*switch_count;
>
>                 context_switch(rq, prev, next); /* unlocks the rq */
>  ---

hum, I'm quite suspicious about this approach... mainly, due to the
fact that I think your next assumption is wrong:
(unless we specify 'running' wrt to whom)

>
>  We would avoid being considered running while we're not.
>

the fact is that we are (i.e. 'prev') actually running on a cpu until
some point in context_switch().

At the very least, the load balancer has to exactly know who is the
'current' on other cpus to treat such tasks differently.

With this patch, the load balancer can be confused/broken by the fact
that 'prev' is considered for migration as a "not-on-rq and
not-running" task [ from another cpu at the moment when either
pre_schedule() or idle_balance() drop a rq->lock of prev's cpu ].

well, the version of task_current() for __ARCH_WANT_UNLOCKED_CTXSW
would fix it if used by default.

But maybe there is something esle that would be exposed by the
'rq->curr = rq->idle' manipulation... I can't provide examples right
now though (I need to think on it).


-- 
Best regards,
Dmitry Adamushko

  reply	other threads:[~2008-03-12 14:48 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-03-10 18:01 [PATCH] sched: fix race in schedule Hiroshi Shimamoto
2008-03-10 18:36 ` Peter Zijlstra
2008-03-10 20:01   ` Hiroshi Shimamoto
2008-03-10 20:34     ` Peter Zijlstra
2008-03-10 20:54       ` Hiroshi Shimamoto
2008-03-10 21:01         ` Peter Zijlstra
2008-03-10 21:07           ` Hiroshi Shimamoto
2008-03-11  2:12       ` Hiroshi Shimamoto
2008-03-11  8:40         ` Peter Zijlstra
2008-03-11 17:10           ` Hiroshi Shimamoto
2008-03-11 23:38             ` Dmitry Adamushko
2008-03-12 13:27               ` Peter Zijlstra
2008-03-12 14:48                 ` Dmitry Adamushko [this message]
2008-03-12 14:57                   ` Peter Zijlstra
2008-03-14 17:58                     ` Hiroshi Shimamoto
2008-03-14 22:47                       ` Dmitry Adamushko
2008-03-14 22:57                         ` Peter Zijlstra
2008-03-20  5:44 ` Sripathi Kodi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b647ffbd0803120748x2bfc8fbctda15ce3df08fcf5b@mail.gmail.com \
    --to=dmitry.adamushko@gmail.com \
    --cc=h-shimamoto@ct.jp.nec.com \
    --cc=hpj@urpla.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=stable@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).