From: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
To: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@elte.hu>,
	Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com>,
	Motohiro Kosaki <kosaki.motohiro@jp.fujitsu.com>,
	Linux Kernel ML <linux-kernel@vger.kernel.org>
Subject: Re: [BUG] TASK_DEAD task is able to be woken up in special condition
Date: Wed, 21 Dec 2011 21:14:14 -0500
Message-ID: <4EF29276.5050309@gmail.com>
In-Reply-To: <20111222094241.C691.E1E9C6FF@jp.fujitsu.com>

(12/21/11 7:42 PM), Yasunori Goto wrote:
>
> Hello
>
> I found that a TASK_DEAD task can be woken up under a special condition.
> I would like to report this bug. Please check it.
>
> Here is the sequence in which it occurs.
>
> ----------------------------------+-----------------------------
>                                    |
>             CPU A                  |             CPU B
> ----------------------------------+-----------------------------
> TASK A calls exit()....
>
> do_exit()
>
>    exit_mm()
>      down_read(mm->mmap_sem);
>
>      rwsem_down_failed_common()
>
>        set TASK_UNINTERRUPTIBLE
>        set waiter.task <= task A
>        list_add to sem->wait_list
>             :
>        raw_spin_unlock_irq()
>        (I/O interrupt occurred)
>
>                                        __rwsem_do_wake(mmap_sem)
>
>                                          list_del(&waiter->list);
>                                          waiter->task = NULL
>                                          wake_up_process(task A)
>                                            try_to_wake_up()
>                                               (task is still
>                                                 TASK_UNINTERRUPTIBLE,
>                                                 p->on_rq is still 1.)
>
>                                                ttwu_do_wakeup()
>                                                   (*A)
>                                                     :
>       (I/O interrupt handler finished)
>
>        if (!waiter.task)
>            schedule() is not called
>            because waiter.task is NULL.
>
>        tsk->state = TASK_RUNNING
>
>            :
>                                                check_preempt_curr();
>                                                    :
>    task->state = TASK_DEAD
>                                                (*B)
>                                          <---    set TASK_RUNNING (*C)
>
>
>
>       schedule()
>       (exit task is running again)
>       BUG_ON() is called!
> --------------------------------------------------------
>
>
> You probably think that the execution time between (*A) and (*B) is very
> short because interrupts are disabled, and that setting TASK_RUNNING at
> (*C) must therefore happen before TASK_DEAD is set.
>
>
> HOWEVER, if an SMI occurs between (*A) and (*B),
> (*C) can execute AFTER TASK_DEAD is set!
> Then the exited task is scheduled again, and BUG_ON() is called....
>
> This is a very bad scenario.
> I suppose this can also occur on a guest system of a
> virtual machine.
>
> Please fix it.
>
> I suppose task->pi_lock should be held when task->state is changed to
> TASK_DEAD, as in the following patch (not tested yet), because
> try_to_wake_up() holds it before checking the task state.
>
>
> Thanks,
>
> ----
> Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
>
> ---
>   kernel/exit.c |    3 +++
>   1 file changed, 3 insertions(+)
>
> Index: linux-3.2-rc4/kernel/exit.c
> ===================================================================
> --- linux-3.2-rc4.orig/kernel/exit.c
> +++ linux-3.2-rc4/kernel/exit.c
> @@ -1038,8 +1038,11 @@ NORET_TYPE void do_exit(long code)
>
>   	preempt_disable();
>   	exit_rcu();
> +
> +	spin_lock(&tsk->pi_lock, flags);
>   	/* causes final put_task_struct in finish_task_switch(). */
>   	tsk->state = TASK_DEAD;
> +	spin_unlock(&tsk->pi_lock, flags);
>   	schedule();
>   	BUG();
>   	/* Avoid "noreturn function does return".  */

I suspect this is not only a TASK_DEAD issue but a fundamental rwsem
issue, because a lot of places assume that "current->state = newstate"
is safe and needs no synchronization. So I worry that losing
TASK_UNINTERRUPTIBLE could cause a catastrophe just like TASK_DEAD does.
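
To illustrate the concern, here is a minimal single-threaded replay of
the interleaving in user space. It is only a sketch, not kernel code:
enum model_state, struct model_task, waker_check(), waker_commit() and
replay_race() are hypothetical names made up for this example, and the
"delay" stands in for the SMI between (*A) and (*C).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel task states (illustrative only). */
enum model_state { MODEL_RUNNING, MODEL_UNINTERRUPTIBLE, MODEL_DEAD };

struct model_task {
	enum model_state state;
};

/* (*A): the waker looks at the state and decides whether to wake. */
static bool waker_check(const struct model_task *t)
{
	return t->state != MODEL_RUNNING;
}

/* (*C): the waker's store of the running state, possibly much later. */
static void waker_commit(struct model_task *t)
{
	t->state = MODEL_RUNNING;
}

/*
 * Replay the sequence from the report in program order.
 * newstate is what the woken task stores with a plain assignment
 * after it skipped schedule(); waker_delayed models the stall
 * between the waker's check and its store.
 */
static enum model_state replay_race(enum model_state newstate,
				    bool waker_delayed)
{
	struct model_task t = { .state = MODEL_UNINTERRUPTIBLE };
	bool will_wake = waker_check(&t);	/* (*A) */

	/* The task starts out sleeping, so the waker always decides to wake. */
	assert(will_wake);

	if (!waker_delayed)
		waker_commit(&t);		/* (*C) in the usual order */

	/* Sleeper: waiter.task is NULL, so schedule() is skipped. */
	t.state = MODEL_RUNNING;
	t.state = newstate;			/* unsynchronized store */

	if (waker_delayed)
		waker_commit(&t);		/* late (*C) overwrites newstate */

	return t.state;
}
```

In this model, replay_race(MODEL_DEAD, true) ends in MODEL_RUNNING, and
so does replay_race(MODEL_UNINTERRUPTIBLE, true): whichever state the
task stores is lost once the waker's store arrives late.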

How about the following patch? Anyway, rwsem_down_failed_common() is
definitely a slow path, so killing a micro-optimization is not much of
a problem, I guess.



diff --git a/lib/rwsem.c b/lib/rwsem.c
index 410aa11..e2a0c9a 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -208,9 +208,9 @@ rwsem_down_failed_common(struct rw_semaphore *sem,

         /* wait to be given the lock */
         for (;;) {
+               schedule();
                 if (!waiter.task)
                         break;
-               schedule();
                 set_task_state(tsk, TASK_UNINTERRUPTIBLE);
         }
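
The control-flow change made by this reordering can be sketched with a
small user-space model (again, not kernel code). struct model_waiter,
model_schedule(), wait_original() and wait_patched() are hypothetical
names, and the model assumes the waker has cleared waiter.task by the
time schedule() returns. It only counts how often each loop ordering
goes through schedule().

```c
#include <assert.h>
#include <stdbool.h>

struct model_waiter {
	bool task;		/* models waiter.task != NULL */
	int schedule_calls;	/* how often the loop called schedule() */
};

/* Assumption: the lock has been granted by the time schedule() returns. */
static void model_schedule(struct model_waiter *w)
{
	w->schedule_calls++;
	w->task = false;
}

/* Original ordering: test waiter.task first, then schedule(). */
static int wait_original(bool granted_early)
{
	struct model_waiter w = { .task = !granted_early, .schedule_calls = 0 };

	for (;;) {
		if (!w.task)
			break;
		model_schedule(&w);
		/* set_task_state(tsk, TASK_UNINTERRUPTIBLE) would go here */
	}
	assert(!w.task);	/* the lock must be granted on exit */
	return w.schedule_calls;
}

/* Reordered loop: always schedule() once before testing waiter.task. */
static int wait_patched(bool granted_early)
{
	struct model_waiter w = { .task = !granted_early, .schedule_calls = 0 };

	for (;;) {
		model_schedule(&w);
		if (!w.task)
			break;
		/* set_task_state(tsk, TASK_UNINTERRUPTIBLE) would go here */
	}
	assert(!w.task);	/* the lock must be granted on exit */
	return w.schedule_calls;
}
```

In this model the two orderings differ only when the lock was granted
before the loop was entered: wait_original(true) never calls
schedule(), while wait_patched(true) always passes through it once.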




