From: ebiederm@xmission.com (Eric W. Biederman)
To: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
Pavel Emelyanov <xemul@parallels.com>,
Cyrill Gorcunov <gorcunov@openvz.org>,
Louis Rilling <louis.rilling@kerlabs.com>,
Mike Galbraith <efault@gmx.de>
Subject: [PATCH] pidns: Guarantee that the pidns init will be the last pidns process reaped. v2
Date: Mon, 21 May 2012 18:20:31 -0600 [thread overview]
Message-ID: <87d35x5ank.fsf_-_@xmission.com> (raw)
In-Reply-To: <20120521124414.GA20391@redhat.com> (Oleg Nesterov's message of "Mon, 21 May 2012 14:44:14 +0200")
Today we have a two-fold bug. Sometimes release_task on pid == 1 in a
pid namespace can run before other processes in a pid namespace have had
release task called. With the result that pid_ns_release_proc can be
called before the last proc_flus_task() is done using
upid->ns->proc_mnt, resulting in the use of a stale pointer. This same
set of circumstances can lead to waitpid(...) returning for a processes
started with clone(CLONE_NEWPID) before the every process in the pid
namespace has actually exited.
To fix this modify zap_pid_ns_processess wait until all other processes
in the pid namespace have exited, even EXIT_DEAD zombies.
The delay_group_leader and related tests ensure that the thread gruop
leader will be the last thread of a process group to be reaped, or to
become EXIT_DEAD and self reap. With the change to zap_pid_ns_processes
we get the guarantee that pid == 1 in a pid namespace will be the last
task that release_task is called on.
With pid == 1 being the last task to pass through release_task
pid_ns_release_proc can no longer be called too early nor can wait
return before all of the EXIT_DEAD tasks in a pid namespace have exited.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
Andrew can you replace your earlier version of this patch in your tree
with this one, after Oleg takes a look at it. I think this is about
as simple and maintainable and obvious as we can make this bug fix.
kernel/exit.c | 13 ++++++++++++-
kernel/pid_namespace.c | 11 +++++++++++
2 files changed, 23 insertions(+), 1 deletions(-)
diff --git a/kernel/exit.c b/kernel/exit.c
index d8bd3b42..abc4fc0 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -64,15 +64,26 @@ static void exit_mm(struct task_struct * tsk);
static void __unhash_process(struct task_struct *p, bool group_dead)
{
nr_threads--;
- detach_pid(p, PIDTYPE_PID);
if (group_dead) {
+ struct task_struct *parent;
+
detach_pid(p, PIDTYPE_PGID);
detach_pid(p, PIDTYPE_SID);
list_del_rcu(&p->tasks);
list_del_init(&p->sibling);
__this_cpu_dec(process_counts);
+
+ /* If we are the last child process in a pid namespace
+ * to be reaped notify the child_reaper.
+ */
+ parent = p->real_parent;
+ if ((task_active_pid_ns(p)->child_reaper == parent) &&
+ list_empty(&parent->children) &&
+ (parent->flags & PF_EXITING))
+ wake_up_process(parent);
}
+ detach_pid(p, PIDTYPE_PID);
list_del_rcu(&p->thread_group);
}
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index b98b0ed..ba1cbb8 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -189,6 +189,17 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
rc = sys_wait4(-1, NULL, __WALL, NULL);
} while (rc != -ECHILD);
+ read_lock(&tasklist_lock);
+ for (;;) {
+ __set_current_state(TASK_UNINTERRUPTIBLE);
+ if (list_empty(¤t->children))
+ break;
+ read_unlock(&tasklist_lock);
+ schedule();
+ read_lock(&tasklist_lock);
+ }
+ read_unlock(&tasklist_lock);
+
if (pid_ns->reboot)
current->signal->group_exit_code = pid_ns->reboot;
--
1.7.5.4
next prev parent reply other threads:[~2012-05-22 0:20 UTC|newest]
Thread overview: 71+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-04-28 9:19 [RFC PATCH] namespaces: fix leak on fork() failure Mike Galbraith
2012-04-28 14:26 ` Oleg Nesterov
2012-04-29 4:13 ` Mike Galbraith
2012-04-29 7:57 ` Eric W. Biederman
2012-04-29 9:49 ` Mike Galbraith
2012-04-29 16:58 ` Oleg Nesterov
2012-04-30 2:59 ` Eric W. Biederman
2012-04-30 3:25 ` Mike Galbraith
2012-05-02 12:40 ` Oleg Nesterov
2012-05-02 17:37 ` Eric W. Biederman
2012-04-30 3:01 ` [PATCH] " Mike Galbraith
[not found] ` <m1zk9rmyh4.fsf@fess.ebiederm.org>
2012-05-01 20:42 ` Andrew Morton
2012-05-03 3:12 ` Mike Galbraith
2012-05-03 14:56 ` Mike Galbraith
2012-05-04 4:27 ` Mike Galbraith
2012-05-04 7:55 ` Eric W. Biederman
2012-05-04 8:34 ` Mike Galbraith
2012-05-04 9:45 ` Mike Galbraith
2012-05-04 14:13 ` Eric W. Biederman
2012-05-04 14:49 ` Mike Galbraith
2012-05-04 15:36 ` Eric W. Biederman
2012-05-04 16:57 ` Mike Galbraith
2012-05-04 20:29 ` Eric W. Biederman
2012-05-05 5:56 ` Mike Galbraith
2012-05-05 6:08 ` Mike Galbraith
2012-05-05 7:12 ` Mike Galbraith
2012-05-05 11:37 ` Eric W. Biederman
2012-05-07 21:51 ` [PATCH] vfs: Speed up deactivate_super for non-modular filesystems Eric W. Biederman
2012-05-07 22:17 ` Al Viro
2012-05-07 23:56 ` Paul E. McKenney
2012-05-08 1:07 ` Eric W. Biederman
2012-05-08 4:53 ` Mike Galbraith
2012-05-09 7:55 ` Nick Piggin
2012-05-09 11:02 ` Eric W. Biederman
2012-05-09 11:02 ` Eric W. Biederman
2012-05-15 8:40 ` Nick Piggin
2012-05-16 0:34 ` Eric W. Biederman
2012-05-16 0:34 ` Eric W. Biederman
2012-05-09 13:59 ` Paul E. McKenney
2012-05-04 8:03 ` [PATCH] Re: [RFC PATCH] namespaces: fix leak on fork() failure Eric W. Biederman
2012-05-04 8:19 ` Mike Galbraith
2012-05-04 8:54 ` Mike Galbraith
2012-05-07 0:32 ` [PATCH 0/3] pidns: Closing the pid namespace exit race Eric W. Biederman
2012-05-07 0:33 ` [PATCH 1/3] pidns: Use task_active_pid_ns in do_notify_parent Eric W. Biederman
2012-05-07 0:35 ` [PATCH 2/3] pidns: Guarantee that the pidns init will be the last pidns process reaped Eric W. Biederman
2012-05-08 22:50 ` Andrew Morton
2012-05-16 18:39 ` Oleg Nesterov
2012-05-16 19:34 ` Oleg Nesterov
2012-05-16 20:54 ` Eric W. Biederman
2012-05-17 17:00 ` Oleg Nesterov
2012-05-17 21:46 ` Eric W. Biederman
2012-05-18 12:39 ` Oleg Nesterov
2012-05-19 0:03 ` Eric W. Biederman
2012-05-21 12:44 ` Oleg Nesterov
2012-05-22 0:16 ` Eric W. Biederman
2012-05-22 0:20 ` Eric W. Biederman [this message]
2012-05-22 16:54 ` [PATCH] pidns: Guarantee that the pidns init will be the last pidns process reaped. v2 Oleg Nesterov
2012-05-22 19:23 ` Andrew Morton
2012-05-23 14:52 ` Oleg Nesterov
2012-05-25 15:15 ` [PATCH -mm] pidns-guarantee-that-the-pidns-init-will-be-the-last-pidns-process-r eaped-v2-fix-fix Oleg Nesterov
2012-05-25 15:59 ` [PATCH -mm 0/1] pidns: find_new_reaper() can no longer switch to init_pid_ns.child_reaper Oleg Nesterov
2012-05-25 16:00 ` [PATCH -mm 1/1] " Oleg Nesterov
2012-05-25 21:43 ` Eric W. Biederman
2012-05-27 19:10 ` [PATCH v2 -mm 0/1] " Oleg Nesterov
2012-05-27 19:11 ` [PATCH v2 -mm 1/1] " Oleg Nesterov
2012-05-29 6:34 ` Eric W. Biederman
2012-05-25 21:25 ` [PATCH -mm] pidns-guarantee-that-the-pidns-init-will-be-the-last-pidns-process-r eaped-v2-fix-fix Eric W. Biederman
2012-05-27 18:41 ` [PATCH -mm v2] " Oleg Nesterov
2012-05-07 0:35 ` [PATCH 3/3] pidns: Make killed children autoreap Eric W. Biederman
2012-05-08 22:51 ` Andrew Morton
2012-04-30 13:57 ` [RFC PATCH] namespaces: fix leak on fork() failure Mike Galbraith
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87d35x5ank.fsf_-_@xmission.com \
--to=ebiederm@xmission.com \
--cc=akpm@linux-foundation.org \
--cc=efault@gmx.de \
--cc=gorcunov@openvz.org \
--cc=linux-kernel@vger.kernel.org \
--cc=louis.rilling@kerlabs.com \
--cc=oleg@redhat.com \
--cc=xemul@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.