From: Oleg Nesterov <oleg@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>,
Jan Kratochvil <jan.kratochvil@redhat.com>,
Lennart Poettering <lpoetter@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Michal Schmidt <mschmidt@redhat.com>,
Roland McGrath <roland@hack.frob.com>, Tejun Heo <tj@kernel.org>,
linux-kernel@vger.kernel.org
Subject: [PATCH 2/5] wait: introduce EXIT_TRACE to avoid the racy EXIT_DEAD->EXIT_ZOMBIE transition
Date: Thu, 20 Feb 2014 18:39:02 +0100 [thread overview]
Message-ID: <20140220173902.GA21853@redhat.com> (raw)
In-Reply-To: <20140220173838.GA21825@redhat.com>
wait_task_zombie() first does EXIT_ZOMBIE->EXIT_DEAD transition and
drops tasklist_lock. If this task is not the natural child and it is
traced, we change its state back to EXIT_ZOMBIE for ->real_parent.
The last transition is racy, this is even documented in 50b8d257486a
"ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE
race". wait_consider_task() tries to detect this transition and clear
->notask_error but we can't rely on ptrace_reparented(), debugger can
exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE.
And there is another problem which were missed before: this transition
can also race with reparent_leader() which doesn't reset >exit_signal
if EXIT_DEAD, assuming that this task must be reaped by someone else.
So the tracee can be re-parented with ->exit_signal != SIGCHLD, and if
/sbin/init doesn't use __WALL it becomes unreapable. This was fixed by
the previous commit, but it was the temporary hack.
1. Add the new exit_state, EXIT_TRACE. It means that the task is the
traced zombie, debugger is going to detach and notify its natural
parent.
This new state is actually EXIT_ZOMBIE | EXIT_DEAD. This way we
can avoid the changes in proc/kgdb code, get_task_state() still
reports "X (dead)" in this case.
Note: with or without this change userspace can see Z -> X -> Z
transition. Not really bad, but probably makes sense to fix.
2. Change wait_task_zombie() to use EXIT_TRACE instead of EXIT_DEAD
if we need to notify the ->real_parent.
3. Revert the previous hack in reparent_leader(), now that EXIT_DEAD
is always the final state we can safely ignore such a task.
4. Change wait_consider_task() to check EXIT_TRACE separately and kill
the racy and no longer needed ptrace_reparented() case.
If ptrace == T an EXIT_TRACE thread should be simply ignored, the
owner of this state is going to ptrace_unlink() this task. We can
pretend that it was already removed from ->ptraced list.
Otherwise we should skip this thread too but clear ->notask_error,
we must be the natural parent and debugger is going to untrace and
notify us. IOW, this doesn't differ from "EXIT_ZOMBIE && p->ptrace"
even if the task was already untraced.
Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Reported-and-tested-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
include/linux/sched.h | 1 +
kernel/exit.c | 50 ++++++++++++++++++++----------------------------
2 files changed, 22 insertions(+), 29 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a781dec..b3593ac 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -206,6 +206,7 @@ print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq);
/* in tsk->exit_state */
#define EXIT_ZOMBIE 16
#define EXIT_DEAD 32
+#define EXIT_TRACE (EXIT_ZOMBIE | EXIT_DEAD)
/* in tsk->state again */
#define TASK_DEAD 64
#define TASK_WAKEKILL 128
diff --git a/kernel/exit.c b/kernel/exit.c
index 5281522..c702824 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -560,6 +560,9 @@ static void reparent_leader(struct task_struct *father, struct task_struct *p,
struct list_head *dead)
{
list_move_tail(&p->sibling, &p->real_parent->children);
+
+ if (p->exit_state == EXIT_DEAD)
+ return;
/*
* If this is a threaded reparent there is no need to
* notify anyone anything has happened.
@@ -567,19 +570,9 @@ static void reparent_leader(struct task_struct *father, struct task_struct *p,
if (same_thread_group(p->real_parent, father))
return;
- /*
- * We don't want people slaying init.
- *
- * Note: we do this even if it is EXIT_DEAD, wait_task_zombie()
- * can change ->exit_state to EXIT_ZOMBIE. If this is the final
- * state, do_notify_parent() was already called and ->exit_signal
- * doesn't matter.
- */
+ /* We don't want people slaying init. */
p->exit_signal = SIGCHLD;
- if (p->exit_state == EXIT_DEAD)
- return;
-
/* If it has exited notify the new parent about this child's death. */
if (!p->ptrace &&
p->exit_state == EXIT_ZOMBIE && thread_group_empty(p)) {
@@ -1045,17 +1038,13 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
return wait_noreap_copyout(wo, p, pid, uid, why, status);
}
+ traced = ptrace_reparented(p);
/*
- * Try to move the task's state to DEAD
- * only one thread is allowed to do this:
+ * Move the task's state to DEAD/TRACE, only one thread can do this.
*/
- state = xchg(&p->exit_state, EXIT_DEAD);
- if (state != EXIT_ZOMBIE) {
- BUG_ON(state != EXIT_DEAD);
+ state = traced ? EXIT_TRACE : EXIT_DEAD;
+ if (cmpxchg(&p->exit_state, EXIT_ZOMBIE, state) != EXIT_ZOMBIE)
return 0;
- }
-
- traced = ptrace_reparented(p);
/*
* It can be ptraced but not reparented, check
* thread_group_leader() to filter out sub-threads.
@@ -1116,7 +1105,7 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
/*
* Now we are sure this task is interesting, and no other
- * thread can reap it because we set its state to EXIT_DEAD.
+ * thread can reap it because we its state == DEAD/TRACE.
*/
read_unlock(&tasklist_lock);
@@ -1161,14 +1150,14 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
* If this is not a sub-thread, notify the parent.
* If parent wants a zombie, don't release it now.
*/
+ state = EXIT_DEAD;
if (thread_group_leader(p) &&
- !do_notify_parent(p, p->exit_signal)) {
- p->exit_state = EXIT_ZOMBIE;
- p = NULL;
- }
+ !do_notify_parent(p, p->exit_signal))
+ state = EXIT_ZOMBIE;
+ p->exit_state = state;
write_unlock_irq(&tasklist_lock);
}
- if (p != NULL)
+ if (state == EXIT_DEAD)
release_task(p);
return retval;
@@ -1364,12 +1353,15 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace,
}
/* dead body doesn't have much to contribute */
- if (unlikely(p->exit_state == EXIT_DEAD)) {
+ if (unlikely(p->exit_state == EXIT_DEAD))
+ return 0;
+
+ if (unlikely(p->exit_state == EXIT_TRACE)) {
/*
- * But do not ignore this task until the tracer does
- * wait_task_zombie()->do_notify_parent().
+ * ptrace == 0 means we are the natural parent. In this case
+ * we should clear notask_error, debugger will notify us.
*/
- if (likely(!ptrace) && unlikely(ptrace_reparented(p)))
+ if (likely(!ptrace))
wo->notask_error = 0;
return 0;
}
--
1.5.5.1
next prev parent reply other threads:[~2014-02-20 17:39 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-02-20 17:38 [PATCH 0/5] kill the racy EXIT_ZOMBIE->EXIT_DEAD->EXIT_ZOMBIE transition Oleg Nesterov
2014-02-20 17:38 ` [PATCH 1/5] wait: fix reparent_leader() vs EXIT_DEAD->EXIT_ZOMBIE race Oleg Nesterov
2014-02-20 17:39 ` Oleg Nesterov [this message]
2014-02-20 17:39 ` [PATCH 3/5] wait: use EXIT_TRACE only if thread_group_leader(zombie) Oleg Nesterov
2014-02-20 17:39 ` [PATCH 4/5] wait: completely ignore the EXIT_DEAD tasks Oleg Nesterov
2014-02-20 17:39 ` [PATCH 5/5] wait: swap EXIT_ZOMBIE and EXIT_DEAD to hide EXIT_TRACE from user-space Oleg Nesterov
2014-02-20 19:48 ` [PATCH 0/5] kill the racy EXIT_ZOMBIE->EXIT_DEAD->EXIT_ZOMBIE transition Tejun Heo
2014-02-24 15:51 ` Oleg Nesterov
2014-02-26 16:55 ` [PATCH 0/2] wait: WSTOPPED & ptrace fixes Oleg Nesterov
2014-02-26 16:55 ` [PATCH 1/2] wait: WSTOPPED|WCONTINUED hangs if a zombie child is traced by real_parent Oleg Nesterov
2014-02-26 16:56 ` [PATCH 2/2] wait: WSTOPPED|WCONTINUED doesn't work if a zombie leader is traced by another process Oleg Nesterov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140220173902.GA21853@redhat.com \
--to=oleg@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=jan.kratochvil@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lpoetter@redhat.com \
--cc=mschmidt@redhat.com \
--cc=roland@hack.frob.com \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=viro@ZenIV.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.