All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Louis Rilling <louis.rilling@kerlabs.com>,
	Mike Galbraith <efault@gmx.de>,
	Pavel Emelyanov <xemul@parallels.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] pidns: guarantee that the pidns init will be the last pidns process reaped
Date: Wed, 13 Jun 2012 14:52:48 -0700	[thread overview]
Message-ID: <20120613145248.f5caab7e.akpm@linux-foundation.org> (raw)
In-Reply-To: <20120530181500.GA20130@redhat.com>

On Wed, 30 May 2012 20:15:00 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> Today we have a two-fold bug.  Sometimes release_task on pid == 1 in a
> pid namespace can run before other processes in a pid namespace have had
> release task called.  With the result that pid_ns_release_proc can be
> called before the last proc_flus_task() is done using
> upid->ns->proc_mnt, resulting in the use of a stale pointer.  This same
> set of circumstances can lead to waitpid(...) returning for a processes
> started with clone(CLONE_NEWPID) before the every process in the pid
> namespace has actually exited.
> 
> To fix this modify zap_pid_ns_processess wait until all other processes
> in the pid namespace have exited, even EXIT_DEAD zombies.
> 
> The delay_group_leader and related tests ensure that the thread gruop
> leader will be the last thread of a process group to be reaped, or to
> become EXIT_DEAD and self reap.  With the change to zap_pid_ns_processes
> we get the guarantee that pid == 1 in a pid namespace will be the last
> task that release_task is called on.
> 
> With pid == 1 being the last task to pass through release_task
> pid_ns_release_proc can no longer be called too early nor can wait
> return before all of the EXIT_DEAD tasks in a pid namespace have exited.
> 
> ...
>
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -64,7 +64,6 @@ static void exit_mm(struct task_struct * tsk);
>  static void __unhash_process(struct task_struct *p, bool group_dead)
>  {
>  	nr_threads--;
> -	detach_pid(p, PIDTYPE_PID);
>  	if (group_dead) {
>  		detach_pid(p, PIDTYPE_PGID);
>  		detach_pid(p, PIDTYPE_SID);
> @@ -72,7 +71,20 @@ static void __unhash_process(struct task_struct *p, bool group_dead)
>  		list_del_rcu(&p->tasks);
>  		list_del_init(&p->sibling);
>  		__this_cpu_dec(process_counts);
> +		/*
> +		 * If we are the last child process in a pid namespace to be
> +		 * reaped, notify the reaper sleeping zap_pid_ns_processes().
> +		 */
> +		if (IS_ENABLED(CONFIG_PID_NS)) {
> +			struct task_struct *parent = p->real_parent;
> +
> +			if ((task_active_pid_ns(p)->child_reaper == parent) &&
> +			    list_empty(&parent->children) &&
> +			    (parent->flags & PF_EXITING))
> +				wake_up_process(parent);
> +		}
>  	}
> +	detach_pid(p, PIDTYPE_PID);
>  	list_del_rcu(&p->thread_group);
>  }
>  
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index 57bc1fd..41ed867 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -179,11 +179,31 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
>  	}
>  	read_unlock(&tasklist_lock);
>  
> +	/* Firstly reap the EXIT_ZOMBIE children we may have. */
>  	do {
>  		clear_thread_flag(TIF_SIGPENDING);
>  		rc = sys_wait4(-1, NULL, __WALL, NULL);
>  	} while (rc != -ECHILD);
>  
> +	/*
> +	 * sys_wait4() above can't reap the TASK_DEAD children.
> +	 * Make sure they all go away, see __unhash_process().
> +	 */
> +	for (;;) {
> +		bool need_wait = false;
> +
> +		read_lock(&tasklist_lock);
> +		if (!list_empty(&current->children)) {
> +			__set_current_state(TASK_UNINTERRUPTIBLE);
> +			need_wait = true;
> +		}
> +		read_unlock(&tasklist_lock);
> +
> +		if (!need_wait)
> +			break;
> +		schedule();

This sleep is terminated by the above wake_up_process(), yes?

But that wake_up_process() only happens if CONFIG_PID_NS.  It's
unobvious (to me) that we can't get stuck in D state if
CONFIG_PID_NS=n.  If bug, please fix.  If not bug, please make obvious
to me ;)

<looks at the locking a bit>

<gets distracted>

That tty_kref_put() in __exit_signal() is running with tasklist_lock
held, yes?  It does a ton of work and calls out to random drivers and
none of this needs tasklist_lock.  Seems risky.

  parent reply	other threads:[~2012-06-13 21:52 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20120530175745.GA19327@redhat.com>
2012-05-30 18:14 ` [PATCH 0/2] pidns: premature pid_ns_release_proc() fixes Oleg Nesterov
2012-05-30 18:15   ` [PATCH 1/2] pidns: guarantee that the pidns init will be the last pidns process reaped Oleg Nesterov
2012-05-31 10:32     ` Pavel Emelyanov
2012-06-13 21:52     ` Andrew Morton [this message]
2012-06-13 22:17       ` Eric W. Biederman
2012-06-14 17:20         ` Oleg Nesterov
2012-05-30 18:15   ` [PATCH 2/2] pidns: find_new_reaper() can no longer switch to init_pid_ns.child_reaper Oleg Nesterov
2012-05-31 10:33     ` Pavel Emelyanov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120613145248.f5caab7e.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=ebiederm@xmission.com \
    --cc=efault@gmx.de \
    --cc=gorcunov@openvz.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=louis.rilling@kerlabs.com \
    --cc=oleg@redhat.com \
    --cc=xemul@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.