From: Oleg Nesterov <oleg@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
Yasunori Goto <y-goto@jp.fujitsu.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Ingo Molnar <mingo@elte.hu>
Cc: Dan Carpenter <dan.carpenter@oracle.com>,
Michael Davidson <md@google.com>,
Suleiman Souhlal <suleiman@google.com>,
Julien Tinnes <jln@google.com>, Aaron Durbin <adurbin@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Tejun Heo <tj@kernel.org>, Roland McGrath <roland@hack.frob.com>,
Alexander Gordeev <agordeev@redhat.com>
Subject: TASK_DEAD && ttwu() again (Was: ensure arch_ptrace/ptrace_request can never race with SIGKILL)
Date: Wed, 23 Jan 2013 20:19:46 +0100 [thread overview]
Message-ID: <20130123191946.GA19101@redhat.com> (raw)
In-Reply-To: <20130121194723.GA18775@redhat.com>
To avoid the confusion, this is not connected to ptrace_freeze_traced()
changes...
With or without these changes, there is another problem: a spurious
wakeup from try_to_wake_up(TASK_NORMAL) which doesn't necessarily see
the "right" task->state.
As for ptrace_stop() this is purely theoretical, but I thought that
perhaps it makes sense to extract the "mb + unlock_wait(pi_lock)" code
from do_exit() into the generic helper,
set_current_state_sync_because_we_cant_tolerate_a_wrong_wakeup().
But when I look at this code again I am not sure it is right.
Let me remind the problem. To oversimplify, we have
try_to_wake_up(task, state)
{
lock(task->pi_lock);
if (task->state & state)
task->state = TASK_RUNNING;
unlock(task->pi_lock);
}
And this means that a task doing
current->state = STATE_1;
// no schedule() in between
current->state = STATE_2;
schedule();
can be actually woken up by try_to_wake_up(STATE_1) even if it already
sleeps in STATE_2.
Usually this is fine, any wait_event-like code should be careful. But
sometimes we can't afford the false wakeup, that is why do_wait() roughly
does
do_exit()
{
// down_read(mmap_sem) can do this without schedule()
current->state = TASK_UNINTERRUPTIBLE;
current->state = TASK_RUNNING;
mb();
spin_unlock_wait(current->pi_lock);
current->state = TASK_DEAD;
schedule();
}
This should ensure that any subsequent (after unlock_wait) try_to_wake_up()
can't see state == UNINTERRUPTIBLE and I think this works.
But. Somehow we missed the fact (I think) that we also need to serialize
unlock_wait() and "state = TASK_DEAD". The code above can be reordered,
do_exit()
{
// down_read(mmap_sem) can do this without schedule()
current->state = TASK_UNINTERRUPTIBLE;
current->state = TASK_RUNNING;
mb();
current->state = TASK_DEAD;
// !!! ttwu() can change ->state here !!!
spin_unlock_wait(current->pi_lock);
schedule();
}
and we have the same problem again. So _I think_ that we we need another
mb() after unlock_wait() ?
And, afaics, in theory we can't simply move the current mb() down, after
unlock_wait(). (again, only in theory, if nothing else we should have
the implicit barrrers after we played with ->state in the past).
Or perhaps we should modify ttwu_do_wakeup() to not blindly set RUNNING,
say, cmpxchg(old_state, RUNNING). But this is not simple/nice.
Or I missed something?
Oleg.
--- x/kernel/exit.c
+++ x/kernel/exit.c
@@ -869,8 +869,15 @@ void do_exit(long code)
* To avoid it, we have to wait for releasing tsk->pi_lock which
* is held by try_to_wake_up()
*/
+
smp_mb();
raw_spin_unlock_wait(&tsk->pi_lock);
+ /*
+ * The first mb() ensures that after that try_to_wake_up() must see
+ * state == TASK_RUNNING. We need another one to ensure that we set
+ * TASK_DEAD only after ->pi_lock is really unlocked.
+ */
+ smp_mb();
/* causes final put_task_struct in finish_task_switch(). */
tsk->state = TASK_DEAD;
next prev parent reply other threads:[~2013-01-23 19:21 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20130115172613.GA3011@redhat.com>
[not found] ` <20130116181830.GA6469@redhat.com>
[not found] ` <CA+55aFzkWqSEzw9oa5JodrM2NWE0H_AF7xyzRhd+DQ=PB=ZT2A@mail.gmail.com>
[not found] ` <20130118153700.GA27915@redhat.com>
[not found] ` <CA+55aFxEow_-PoX0xFa07yOi6az=6uVx8zeOsfToErmzh7dB8A@mail.gmail.com>
[not found] ` <20130118172854.GA29753@redhat.com>
[not found] ` <20130118175224.GA520@redhat.com>
[not found] ` <CA+55aFyEsU-pkX557A-m+xoGkA_v+fXEyA8z8HbJ5J8K1jObeg@mail.gmail.com>
[not found] ` <20130118185559.GA3773@redhat.com>
[not found] ` <CA+55aFy=newnMbx53HipyWbRs2mUUPSqXXCpSfDLW78gkro37g@mail.gmail.com>
2013-01-20 19:24 ` [PATCH 0/4] (Was: ptrace: prevent PTRACE_SETREGS from corrupting stack) Oleg Nesterov
2013-01-20 19:25 ` [PATCH 1/4] ptrace: introduce signal_wake_up_state() and ptrace_signal_wake_up() Oleg Nesterov
2013-01-20 19:25 ` [PATCH 2/4] ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL Oleg Nesterov
2013-01-20 19:46 ` [PATCH v2 " Oleg Nesterov
2013-01-20 20:21 ` [PATCH " Linus Torvalds
2013-01-21 17:21 ` Oleg Nesterov
2013-01-21 18:27 ` Linus Torvalds
2013-01-21 19:47 ` [PATCH v3 0/3] " Oleg Nesterov
2013-01-21 19:47 ` [PATCH v3 1/3] ptrace: introduce signal_wake_up_state() and ptrace_signal_wake_up() Oleg Nesterov
2013-01-21 19:48 ` [PATCH v3 2/3] ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL Oleg Nesterov
2013-01-22 17:52 ` [PATCH v4 " Oleg Nesterov
2013-01-21 19:48 ` [PATCH v3 3/3] wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task Oleg Nesterov
2013-01-22 17:51 ` [PATCH v3 0/3] ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL Oleg Nesterov
2013-01-23 19:19 ` Oleg Nesterov [this message]
2013-01-23 19:50 ` TASK_DEAD && ttwu() again (Was: ensure arch_ptrace/ptrace_request can never race with SIGKILL) Tejun Heo
2013-01-24 18:50 ` Oleg Nesterov
2013-01-20 19:25 ` [PATCH 3/4] ia64: kill thread_matches(), unexport ptrace_check_attach() Oleg Nesterov
2013-01-20 20:23 ` Linus Torvalds
2013-01-20 19:26 ` [PATCH 4/4] wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task Oleg Nesterov
2013-01-20 19:35 ` [PATCH 0/4] (Was: ptrace: prevent PTRACE_SETREGS from corrupting stack) Oleg Nesterov
2013-01-23 18:00 ` Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130123191946.GA19101@redhat.com \
--to=oleg@redhat.com \
--cc=a.p.zijlstra@chello.nl \
--cc=adurbin@google.com \
--cc=agordeev@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=dan.carpenter@oracle.com \
--cc=jln@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=md@google.com \
--cc=mingo@elte.hu \
--cc=roland@hack.frob.com \
--cc=suleiman@google.com \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=y-goto@jp.fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.