From: Oleg Nesterov <oleg@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>,
"Eric W. Biederman" <ebiederm@xmission.com>,
Louis Rilling <louis.rilling@kerlabs.com>,
Mike Galbraith <efault@gmx.de>,
Pavel Emelyanov <xemul@parallels.com>,
linux-kernel@vger.kernel.org
Subject: [PATCH 1/2] pidns: guarantee that the pidns init will be the last pidns process reaped
Date: Wed, 30 May 2012 20:15:00 +0200 [thread overview]
Message-ID: <20120530181500.GA20130@redhat.com> (raw)
In-Reply-To: <20120530181429.GA19989@redhat.com>
From: Eric W. Biederman <ebiederm@xmission.com>
Today we have a two-fold bug. Sometimes release_task on pid == 1 in a
pid namespace can run before other processes in a pid namespace have had
release task called. With the result that pid_ns_release_proc can be
called before the last proc_flus_task() is done using
upid->ns->proc_mnt, resulting in the use of a stale pointer. This same
set of circumstances can lead to waitpid(...) returning for a processes
started with clone(CLONE_NEWPID) before the every process in the pid
namespace has actually exited.
To fix this modify zap_pid_ns_processess wait until all other processes
in the pid namespace have exited, even EXIT_DEAD zombies.
The delay_group_leader and related tests ensure that the thread gruop
leader will be the last thread of a process group to be reaped, or to
become EXIT_DEAD and self reap. With the change to zap_pid_ns_processes
we get the guarantee that pid == 1 in a pid namespace will be the last
task that release_task is called on.
With pid == 1 being the last task to pass through release_task
pid_ns_release_proc can no longer be called too early nor can wait
return before all of the EXIT_DEAD tasks in a pid namespace have exited.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/exit.c | 14 +++++++++++++-
kernel/pid_namespace.c | 20 ++++++++++++++++++++
2 files changed, 33 insertions(+), 1 deletions(-)
diff --git a/kernel/exit.c b/kernel/exit.c
index 910a071..4fd4c55 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -64,7 +64,6 @@ static void exit_mm(struct task_struct * tsk);
static void __unhash_process(struct task_struct *p, bool group_dead)
{
nr_threads--;
- detach_pid(p, PIDTYPE_PID);
if (group_dead) {
detach_pid(p, PIDTYPE_PGID);
detach_pid(p, PIDTYPE_SID);
@@ -72,7 +71,20 @@ static void __unhash_process(struct task_struct *p, bool group_dead)
list_del_rcu(&p->tasks);
list_del_init(&p->sibling);
__this_cpu_dec(process_counts);
+ /*
+ * If we are the last child process in a pid namespace to be
+ * reaped, notify the reaper sleeping zap_pid_ns_processes().
+ */
+ if (IS_ENABLED(CONFIG_PID_NS)) {
+ struct task_struct *parent = p->real_parent;
+
+ if ((task_active_pid_ns(p)->child_reaper == parent) &&
+ list_empty(&parent->children) &&
+ (parent->flags & PF_EXITING))
+ wake_up_process(parent);
+ }
}
+ detach_pid(p, PIDTYPE_PID);
list_del_rcu(&p->thread_group);
}
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 57bc1fd..41ed867 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -179,11 +179,31 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
}
read_unlock(&tasklist_lock);
+ /* Firstly reap the EXIT_ZOMBIE children we may have. */
do {
clear_thread_flag(TIF_SIGPENDING);
rc = sys_wait4(-1, NULL, __WALL, NULL);
} while (rc != -ECHILD);
+ /*
+ * sys_wait4() above can't reap the TASK_DEAD children.
+ * Make sure they all go away, see __unhash_process().
+ */
+ for (;;) {
+ bool need_wait = false;
+
+ read_lock(&tasklist_lock);
+ if (!list_empty(¤t->children)) {
+ __set_current_state(TASK_UNINTERRUPTIBLE);
+ need_wait = true;
+ }
+ read_unlock(&tasklist_lock);
+
+ if (!need_wait)
+ break;
+ schedule();
+ }
+
if (pid_ns->reboot)
current->signal->group_exit_code = pid_ns->reboot;
--
1.5.5.1
next prev parent reply other threads:[~2012-05-30 18:16 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20120530175745.GA19327@redhat.com>
2012-05-30 18:14 ` [PATCH 0/2] pidns: premature pid_ns_release_proc() fixes Oleg Nesterov
2012-05-30 18:15 ` Oleg Nesterov [this message]
2012-05-31 10:32 ` [PATCH 1/2] pidns: guarantee that the pidns init will be the last pidns process reaped Pavel Emelyanov
2012-06-13 21:52 ` Andrew Morton
2012-06-13 22:17 ` Eric W. Biederman
2012-06-14 17:20 ` Oleg Nesterov
2012-05-30 18:15 ` [PATCH 2/2] pidns: find_new_reaper() can no longer switch to init_pid_ns.child_reaper Oleg Nesterov
2012-05-31 10:33 ` Pavel Emelyanov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120530181500.GA20130@redhat.com \
--to=oleg@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=ebiederm@xmission.com \
--cc=efault@gmx.de \
--cc=gorcunov@openvz.org \
--cc=linux-kernel@vger.kernel.org \
--cc=louis.rilling@kerlabs.com \
--cc=xemul@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.